Retrieval-Augmented Generation (RAG) — Definition

Retrieval-Augmented Generation (RAG) is a technique for improving language-model output by retrieving relevant external information at query time and supplying it to the model as context before it generates an answer. Instead of relying only on what a model learned during training, a RAG system searches a knowledge source — documents, a database, a wiki — finds the passages most relevant to the user's question, and includes them in the prompt. The model then composes a response grounded in that retrieved material.
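
The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the word-overlap `score` function is a deliberately crude stand-in for a real retriever, the prompt wording is an assumption, and `generate` is a hypothetical callable representing whatever language model the system uses.

```python
from typing import Callable, List


def score(query: str, passage: str) -> int:
    """Toy relevance score: count of distinct query words that appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words)


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages with the highest toy score."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Place the retrieved passages in the prompt ahead of the question."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}"
    )


def answer(
    query: str,
    passages: List[str],
    generate: Callable[[str], str],  # hypothetical model call, supplied by the caller
    k: int = 2,
) -> str:
    """Retrieve, assemble the prompt, then hand it to the model."""
    return generate(build_prompt(query, retrieve(query, passages, k)))
```

Passing the model in as a plain function keeps the sketch independent of any particular provider; in a real system, `retrieve` would typically query an index rather than score every passage in memory.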

RAG addresses two practical limits of language models: a fixed knowledge cutoff and a tendency to produce fluent but incorrect statements (hallucinations). Grounding answers in retrieved source text reduces fabrication, though it does not eliminate it, and lets a system answer questions about private, recent, or domain-specific data the base model never saw. The quality of a RAG system depends heavily on retrieval quality, meaning how well it surfaces the right passages for a given query.

Retrieval is commonly implemented with semantic (vector) search over text embeddings, which matches on meaning rather than exact words. Some systems instead use keyword or substring matching, which is simpler but can miss relevant material that uses different phrasing. In an AI-workforce setting, RAG is what lets agents answer using a customer's own documents and records rather than generic knowledge, and the retrieval method materially affects answer accuracy.
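
The difference between the two retrieval styles can be illustrated with a small Python sketch. The vectors below are hand-assigned toy values chosen only for this example. They are not output from any real embedding model, and the names `TOY_EMBEDDINGS`, `keyword_search`, and `semantic_search` are hypothetical.

```python
import math
from typing import Dict, List

# Hand-assigned toy vectors standing in for real embeddings (made-up values).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "How do I fix my car?": [0.9, 0.1, 0.0],
    "Automobile repair guide": [0.8, 0.2, 0.1],
    "Carpet cleaning tips": [0.0, 0.1, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def keyword_search(term: str, docs: List[str]) -> List[str]:
    """Substring matching: keep documents that contain the term literally."""
    return [d for d in docs if term.lower() in d.lower()]


def semantic_search(query: str, docs: List[str]) -> str:
    """Vector matching: return the document whose toy vector is closest to the query's."""
    q = TOY_EMBEDDINGS[query]
    return max(docs, key=lambda d: cosine(q, TOY_EMBEDDINGS[d]))


docs = ["Automobile repair guide", "Carpet cleaning tips"]
print(keyword_search("car", docs))
print(semantic_search("How do I fix my car?", docs))
```

With these toy values, the substring search for "car" matches only the carpet document (because "Carpet" contains "car") and misses the automobile guide, while the vector comparison selects the automobile guide. Real embeddings are produced by a trained model and have far more dimensions, but the comparison step works the same way.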

