RAG — Retrieval Augmented Generation

Large Language Models, RAG and data management.

Cobus Greyling
5 min read · Aug 23, 2023


I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

Considering the Venn diagram below, three main considerations for the LLM use case in general are shown, together with their relationships to each other.

User Input Context / User Intent

Any user input will have a certain context. The context is the frame of reference of the user, which most probably informs their query. In traditional chatbot lingo, this is the intent of the user; the intention which moved the user to interact with the system.

LLM Knowledge

This is the knowledge baked into the LLM; in other words, the knowledge the LLM was trained on, which has a definite cut-off in terms of recency and current affairs.

The LLM has the ability to act as the backbone of conversational dialog management and to formulate succinct responses via natural language generation (NLG).

External Contextual Reference Data

This is the supplementary data which is retrieved in chunks; semantic search and embeddings are used to ensure contextually relevant data is returned from the available corpus.
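
As a rough sketch of this retrieval step, the snippet below chunks a corpus, embeds the chunks and the query, and ranks chunks by cosine similarity. The embed() function here is a toy hashed bag-of-words stand-in, not a real embedding model; in practice a sentence-embedding model or embedding API would be used.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy stand-in for a real embedding model: hashed bag-of-words.
    # A production system would call a sentence-embedding model or API here.
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v

def chunk(document: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking of the source corpus.
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Semantic search: rank chunks by cosine similarity to the query embedding.
    q = embed(query)
    scores = [
        float(np.dot(q, embed(c)) / (np.linalg.norm(q) * np.linalg.norm(embed(c)) + 1e-9))
        for c in chunks
    ]
    top = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in top]
```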

Commonality 1

This is where the user input is directly submitted via prompt engineering to the LLM. There is no contextual reference given, and these are the instances where the LLM hallucinates or returns contextually irrelevant data.

Hallucination is where an LLM returns highly plausible, credible and succinct answers which are factually incorrect.

Commonality 2

Here the user input and intent are combined with data sources, sans the LLM. And this brings us back to the traditional ailments of chatbots:

  • No NLG exists, and pre-defined messages are required in a state-machine fashion, presented at each dialog turn (see the sketch after this list).
  • Dialog management must be performed via a flow and cannot be done via a few-shot approach. Where the LLM backbone is missing, a certain level of resilience is lost.
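
To make the contrast concrete, below is a minimal sketch of such a flow-based chatbot: every turn returns a pre-defined message selected by an explicit state machine, with no NLG and no LLM fallback. The state names, keywords and canned messages are invented purely for illustration.

```python
# Minimal flow-based chatbot: no NLG, only pre-defined messages per state.
FLOW = {
    "start": {
        "message": "Hi! Would you like to check a balance or make a payment?",
        "transitions": {"balance": "balance", "payment": "payment"},
    },
    "balance": {"message": "Your balance is R1,250.", "transitions": {}},
    "payment": {"message": "Please enter the amount you want to pay.", "transitions": {}},
}

def dialog_turn(state: str, user_input: str) -> tuple[str, str]:
    # Crude keyword matching stands in for intent detection; there is no
    # few-shot or generative fallback, so unmatched input just repeats the prompt.
    for keyword, next_state in FLOW[state]["transitions"].items():
        if keyword in user_input.lower():
            return next_state, FLOW[next_state]["message"]
    return state, FLOW[state]["message"]

state = "start"
print(FLOW[state]["message"])
state, reply = dialog_turn(state, "I want to check my balance")
print(reply)  # -> Your balance is R1,250.
```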

Commonality 3

The LLM is fine-tuned on relevant data; the fine-tuned model can work well for industry-specific implementations like medical, legal and engineering use cases.

But this fine-tuned model is also frozen in time, and without a contextual reference for each specific input it will be more accurate in general, but not tuned for each and every specific user input.

Commonality 4

This is the RAG approach, as explained below.

Retrieval Augmented Generation (RAG) combines information retrieval and generative models.

By injecting the prompt with relevant and contextual supporting information, the LLM can generate informed and contextually accurate responses to user input.

Below is a complete workflow of how a RAG solution can be implemented. By making use of a vector store and semantic search, relevant and semantically accurate data can be retrieved.
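
As an illustrative sketch of how such a workflow can be wired together, the snippet below reuses the retrieve() function from the earlier snippet. The prompt template and the complete() placeholder are assumptions for illustration; in a real implementation complete() would call whichever LLM completion API is used.

```python
# Illustrative RAG workflow: retrieve supporting chunks via semantic search,
# inject them into the prompt, then ask the LLM to answer from that context.

PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say that you do not know.

Context:
{context}

Question: {question}
Answer:"""

def complete(prompt: str) -> str:
    # Placeholder for the LLM call (e.g. a hosted completion or chat API).
    raise NotImplementedError("Wire this up to your LLM provider.")

def rag_answer(question: str, corpus_chunks: list[str]) -> str:
    context_chunks = retrieve(question, corpus_chunks, top_k=3)  # semantic search
    prompt = PROMPT_TEMPLATE.format(
        context="\n---\n".join(context_chunks),
        question=question,
    )
    return complete(prompt)
```

The key design choice is that the model is instructed to answer only from the injected context, which is what counters the hallucination described under Commonality 1.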

Below is a practical working example of RAG implemented using the Vellum framework.

Using a RAG approach, the entry point is defined and the input is: What happened in 1934?

The document is searched (1), and an extract is returned. In turn, the extract is submitted as context to the LLM with the same question (2), but this time the document extract serves as a contextual reference for the prompt. Finally, the correct answer is given.


Considering the image below, Scenario 1 does not have any context, and for the question What happened in 1934? a list of 10 world occurrences is returned, based on the internal knowledge of the LLM.

This list is not incorrect in any way, but does not address the relevant context; in this case the relevant context is South Africa.

At the bottom of the image, Scenario 2 answers the same question, but a RAG approach is followed: a contextual reference is given in the prompt, and the LLM generates a response making use of that contextual reference.
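
In prompt terms, the only difference between the two scenarios is whether a retrieved extract is present. The snippet below is an illustrative reconstruction of the two prompts, not the exact prompts from the screenshots; the extract content is a placeholder.

```python
question = "What happened in 1934?"

# Scenario 1: no contextual reference -- the LLM answers from its internal
# knowledge and returns generic world events for 1934.
scenario_1_prompt = question

# Scenario 2: RAG -- the retrieved extract anchors the answer to the relevant
# context (here, South Africa). The extract text is a placeholder.
document_extract = "<retrieved extract covering South African events of 1934>"

scenario_2_prompt = f"""Use the context below to answer the question.

Context:
{document_extract}

Question: {question}
Answer:"""
```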


⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

