What is RAG in Gen AI?

Retrieval augmented generation (RAG) is a natural language processing (NLP) technique that combines the strengths of both retrieval-based and generation-based artificial intelligence (AI) models.

For example, RAG can give a large language model (LLM) knowledge of the current price of Bitcoin when that LLM is asked how much $BTC one can buy for $1,000.

This story quickly explains RAG in the context of LLMs from the ground up.

What is Natural Language Processing?

Natural language processing (NLP) refers to the branch of computer science — and more specifically, the branch of artificial intelligence or AI — concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.

NLP combines rule-based modeling of human language with statistical, machine learning, and deep learning models.

Together, these technologies enable computers to process human language in the form of text or voice data and to “understand” its full meaning, complete with the speaker or writer’s intent and sentiment.

Popular generative AI models like Bard, ChatGPT, and Grok are applications of NLP.

Retrieval-Based Models in NLP

Retrieval-based models in NLP are designed to select an appropriate response from a predefined set of responses based on the input query.

  1. These models compare the input text (a question or query) with a database of predefined responses.

  2. The system identifies the most suitable response by measuring the similarity between the input and stored responses using techniques like cosine similarity or other semantic matching methods.

Retrieval-based models are efficient for tasks like question-answering, where the responses are often fact-based and readily available in a structured form.
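
To make the matching step concrete, here is a minimal sketch of retrieval by cosine similarity over TF-IDF vectors, using scikit-learn. The toy response database and query are illustrative assumptions, not any particular system's data.

```python
# Minimal retrieval-based matching: embed texts as TF-IDF vectors and
# pick the stored response most similar to the query (cosine similarity).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy "database" of predefined responses (illustrative only).
responses = [
    "Bitcoin's price is updated every minute on the exchange.",
    "The park is a ten-minute walk from the station.",
    "RAG combines retrieval-based and generation-based models.",
]

query = "What is RAG?"

# Vectorize the stored responses and the query together
# so they share one vocabulary.
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(responses + [query])

# Compare the query (last row) against every stored response.
scores = cosine_similarity(vectors[-1], vectors[:-1])[0]

# The highest-scoring response is the retrieval result.
print(responses[scores.argmax()])
# -> "RAG combines retrieval-based and generation-based models."
```

Production systems typically swap TF-IDF for learned embeddings and a vector database, but the compare-and-select pattern is the same.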

Generation-Based Models in NLP

Generation-based (or generative) models create responses from scratch. These models use complex algorithms, often based on neural networks, to generate human-like text.

  1. Unlike retrieval-based models, generation-based models do not rely on predefined responses.

  2. Instead, they learn to generate responses by predicting the next word or sequence of words based on the context provided by the input.

This ability to generate novel, contextually appropriate responses makes generation-based models highly versatile and suitable for tasks like creative writing, translation, and dialogue where responses must be diverse and contextually rich.
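
To illustrate next-word prediction, here is a minimal sketch using Hugging Face's transformers library. GPT-2 is assumed here only because it is small and freely downloadable; any causal language model demonstrates the same idea.

```python
# Minimal generation-based modeling: the model writes a continuation by
# repeatedly predicting the next token; no predefined responses are involved.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator("Retrieval augmented generation is", max_new_tokens=25)
print(result[0]["generated_text"])
```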

Retrieval vs Generation

[Figure: comparison of retrieval-based vs generation-based models]

Why is RAG Useful?

Foundation models (like OpenAI’s ChatGPT or Meta’s LLaMA) are usually trained offline, leaving them unaware of any data created after training. On top of that, foundation models are trained on very general-domain corpora, which makes them effective for general tasks but less so for domain-specific ones.

Retrieval Augmented Generation (RAG) can be used to retrieve data from outside a foundation model and augment input prompts by adding the relevant retrieved data in context.

[Figure: how RAG retrieves external data and augments the prompt. Source: AWS]

Simply put:

  1. Problem — AI models are trained before they are deployed and have no built-in way to take in new subject matter once they are live

  2. Solution — RAG lets those models pull in contextually relevant data from a new or evolving environment and tailor their responses to the situation at hand


Step by step (a runnable end-to-end sketch follows the list):

  1. A user inputs a prompt

  2. The prompt is compared against a database or other source of information that the foundation model is not aware of

  3. The retrieval-based model returns specific context that will help the generation-based model (the foundation model) respond well, and that context is added to the prompt

  4. The prompt, now augmented with context, is given to the generation-based model, which responds better than it would have without the additional context
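
Putting those four steps together, here is a minimal end-to-end sketch. The document store, the TF-IDF retriever, and the GPT-2 model are illustrative stand-ins; a real deployment would use a vector database and a far more capable foundation model.

```python
# End-to-end RAG sketch: retrieve context the model was never trained on,
# add it to the prompt, then generate an answer from the augmented prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Step 2: a source of information the foundation model is not aware of.
documents = [
    "As of this morning, the price of Bitcoin is $43,250.",  # made-up figure
    "The park closes at 9 pm on weekdays.",
]

def retrieve(prompt: str) -> str:
    """Step 3: return the stored document most similar to the prompt."""
    vectors = TfidfVectorizer().fit_transform(documents + [prompt])
    scores = cosine_similarity(vectors[-1], vectors[:-1])[0]
    return documents[scores.argmax()]

# Step 1: a user inputs a prompt.
prompt = "How much Bitcoin can I buy for $1,000?"

# Step 3 (continued): add the retrieved context to the prompt.
context = retrieve(prompt)
augmented_prompt = f"Context: {context}\n\nQuestion: {prompt}\n\nAnswer:"

# Step 4: the generation-based model responds, now grounded in the context.
generator = pipeline("text-generation", model="gpt2")
result = generator(augmented_prompt, max_new_tokens=30)
print(result[0]["generated_text"])
```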

And that’s all there is to it. In short, RAG is how Bard knows the temperature outside, how long it is going to take to get to the park, and how long it has really been since you’ve touched grass.

As always, thanks for reading.

Sources

Banjara, Babina. “RAG’s Innovative Approach to Unifying Retrieval and Generation in NLP.” Analytics Vidhya, 17 Nov. 2023, www.analyticsvidhya.com/blog/2023/10/rags-innovative-approach-to-unifying-retrieval-and-generation-in-nlp.

Lewis, Patrick, et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” arXiv.org, 12 Apr. 2021, arxiv.org/abs/2005.11401.

“What Is Natural Language Processing?” IBM, www.ibm.com/topics/natural-language-processing. Accessed 2 Dec. 2023.

Williams, Kesha. “Debiasing AI Using Amazon SageMaker.” Amazon, LinkedIn, 2019, docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html.

Robson, Winston. “What Is RAG in Generative AI?” Medium, Dropout Analytics, Dec. 2023, medium.com/dropout-analytics/what-is-rag-in-generative-ai-f5b8c13575f8.
