Retrieval augmented generation (RAG) is a natural language processing (NLP) technique that combines the strengths of both retrieval-based and generation-based artificial intelligence (AI) models.
For example, RAG can give a large language model (LLM) knowledge of the current price of Bitcoin when that LLM is asked how much $BTC one can buy for $1,000.
This story quickly explains RAG in the context of LLMs from the ground up.
Natural language processing (NLP) refers to the branch of computer science — and more specifically, the branch of artificial intelligence or AI — concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
NLP combines rule-based modeling of human language with statistical, machine learning, and deep learning models.
Together, these technologies enable computers to process human language in the form of text or voice data and to “understand” its full meaning, complete with the speaker or writer’s intent and sentiment.
Popular generative AI models like Bard, ChatGPT, and Grok are applied use cases of NLP.
Retrieval-based models in NLP are designed to select an appropriate response from a predefined set of responses based on the input query.
These models compare the input text (a question or query) with a database of predefined responses.
The system identifies the most suitable response by measuring the similarity between the input and stored responses using techniques like cosine similarity or other semantic matching methods.
Retrieval-based models are efficient for tasks like question-answering, where the responses are often fact-based and readily available in a structured form.
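The retrieval step can be sketched with a toy bag-of-words vectorizer and cosine similarity. This is a minimal illustration only: real systems typically use learned embeddings and a vector database, and the stored responses below are made-up examples.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, responses):
    """Return the stored response most similar to the query."""
    query_vec = vectorize(query)
    return max(responses, key=lambda r: cosine_similarity(query_vec, vectorize(r)))

responses = [
    "Bitcoin is a decentralized digital currency.",
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
]
print(retrieve("what is bitcoin?", responses))
```

Here the query "what is bitcoin?" shares the most words with the first stored response, so that response is selected.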
Generation-based (or generative) models create responses from scratch. These models use complex algorithms, often based on neural networks, to generate human-like text.
Unlike retrieval-based models, generation-based models do not rely on predefined responses.
Instead, they learn to generate responses by predicting the next word or sequence of words based on the context provided by the input.
This ability to generate novel, contextually appropriate responses makes generation-based models highly versatile and suitable for tasks like creative writing, translation, and dialogue where responses must be diverse and contextually rich.
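A toy bigram model makes this next-word prediction concrete. Real generative models are neural networks trained on billions of words with far longer contexts, but the core mechanic — predict the next word given what came before — is the same.

```python
from collections import Counter, defaultdict

# Tiny training corpus; real models train on billions of words.
corpus = "the cat sat on the mat and the cat sat on the rug".split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(start, length=5):
    """Greedily extend `start`, predicting the most likely next word each step."""
    words = [start]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))
```

Notice that no response is stored anywhere; the output is assembled word by word from learned statistics, which is what distinguishes generation from retrieval.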
Foundation models (like OpenAI’s ChatGPT or Meta’s LLaMA) are usually trained offline, leaving the model unaware of any data created after it was trained. On top of that, foundation models are trained on very general domain corpora, making them less effective for domain-specific tasks, though more effective for general ones.
Retrieval Augmented Generation (RAG) can be used to retrieve data from outside a foundation model and augment input prompts by adding the relevant retrieved data in context.
Simply put:
Problem — AI models are trained before they are deployed, and have no built-in way to acquire new subject matter once they are in production
Solution — RAG lets those AI models pull in contextually relevant data from a new or evolving environment and focus their response (output) on the situation at hand
Step by step:
A user inputs a prompt
The prompt is compared against a database or other source of information that the foundation model is not aware of
The retrieval-based model returns specific context that will help the generation-based model (the foundation model) respond optimally, and that context is added to the prompt
The prompt (now with context) is given to the generation-based model, which responds better than it would have without the additional context
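The four steps above can be sketched end to end. Everything here is illustrative: the knowledge base is a plain dict standing in for a vector store, the keyword match stands in for semantic retrieval, and `generate` is a stub where a real LLM API call would go.

```python
# Facts the foundation model was never trained on (made-up examples).
knowledge_base = {
    "bitcoin price": "As of today, 1 BTC trades at $43,000.",
    "weather": "It is currently 55F and sunny.",
}

def retrieve_context(prompt):
    """Steps 2-3: find stored facts whose topic keywords appear in the prompt."""
    words = set(prompt.lower().split())
    return [fact for topic, fact in knowledge_base.items()
            if set(topic.split()) & words]

def generate(prompt):
    """Stand-in for the generation-based model (a real LLM API call)."""
    return f"[LLM response to: {prompt}]"

def rag(prompt):
    """Steps 1-4: augment the user prompt with retrieved context, then generate."""
    context = retrieve_context(prompt)
    augmented = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + prompt
    return generate(augmented)

print(rag("How much bitcoin can I buy for $1,000?"))
```

The prompt mentioning "bitcoin" pulls in only the Bitcoin fact, so the model receives the current price in context — exactly the $BTC scenario from the introduction.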
And that’s all there is to it. In short, RAG is how Bard knows the temperature outside, how long it is going to take to get to the park, and how long it has really been since you’ve touched grass.
As always, thanks for reading.