Fine-Tuning LLMs With Retrieval Augmented Generation (RAG)

LlamaIndex recently published a notebook based on a study which explores Large Language Model fine-tuning with retrieval augmentation.

4 min read · Oct 9, 2023


This approach is a novel implementation of RAG called RA-DIT (Retrieval Augmented Dual Instruction Tuning), where the RAG dataset (query, retrieved context and response) is used to fine-tune an LLM. In turn, retriever fine-tuning is performed via a supervised process with data generated by the LLM. The retriever training data can be generated via both supervised and unsupervised tasks, but supervision via an AI-accelerated suite makes more sense.

This approach allows for the retriever to become more contextually relevant and aligned with the LLM.

The code snippet below shows how the context is fetched via a retriever for each datapoint.

from llama_index import VectorStoreIndex
from llama_index.prompts import PromptTemplate

# Prompt template used to combine each retrieved context chunk with the query
qa_prompt_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)

# Retriever over the vector index, fetching the single most similar chunk per query
vector_retriever = vector_index.as_retriever(similarity_top_k=1)
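Below is a minimal sketch (not verbatim from the notebook) of how the retrieved context and the prompt template can be combined per datapoint: each question is answered by the LLM against its top retrieved chunk, and the resulting (query, context, response) triples become the fine-tuning examples. The questions list and the llm object are assumed to exist already (for example, questions generated from the source document and an LLM instantiated via LlamaIndex).

# Assumed: `questions` is a list of generated queries and `llm` is a LlamaIndex LLM
data_points = []
for query_str in questions:
    retrieved_nodes = vector_retriever.retrieve(query_str)  # top-1 chunk for this query
    context_str = retrieved_nodes[0].get_content()
    fmt_prompt = qa_prompt_tmpl.format(
        context_str=context_str, query_str=query_str
    )
    response = llm.complete(fmt_prompt)  # generate a context-grounded answer
    data_points.append((query_str, context_str, str(response)))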

And here is the formatting of the training data for OpenAI’s fine-tuning endpoints.

import json

def save_openai_data(dataset, out_path):
    """Write (query, response) pairs as OpenAI chat fine-tuning JSONL."""
    out_fp = open(out_path, "w")
    # TODO: try with different system prompts
    system_prompt = {
        "role": "system",
        "content": "You are a helpful assistant helping to answer questions about the Llama 2 paper.",
    }
    train_qr_pairs = dataset.qr_pairs
    for line in train_qr_pairs:
        query, response = line
        user_prompt = {"role": "user", "content": query}
        assistant_prompt = {"role": "assistant", "content": response}
        out_dict = {
            "messages": [system_prompt, user_prompt, assistant_prompt],
        }
        out_fp.write(json.dumps(out_dict) + "\n")
    out_fp.close()

save_openai_data(train_dataset, "data_rag/qa_pairs_openai.jsonl")
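Once the JSONL file is written, a fine-tuning job can be launched. The sketch below uses the OpenAI Python SDK directly (the notebook itself may rely on LlamaIndex's fine-tuning helpers instead); the model name is illustrative.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training file produced by save_openai_data()
training_file = client.files.create(
    file=open("data_rag/qa_pairs_openai.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job against a base chat model (model name is illustrative)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)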

Fine-tuning with RAG uses the RAG output as training data for LLM fine-tuning, and in turn the fine-tuning teaches the LLM to better interpret the use-case context.

The human-in-the-loop approach allows for a supervised process where responses can be curated, and the retriever in turn trains on this feedback.

Retrieval-augmented language models (RALMs) with human curation improve performance by addressing the long tail of human interaction and by drawing up-to-date knowledge from external data stores.

  1. The LLM is updated for improved use of retrieved information.
  2. The retriever is updated with relevant use-case examples.

This approach leads to both improved knowledge utilisation and enhanced contextual awareness.

In Conclusion

  1. Fine-tuning and RAG are not mutually exclusive, and using both in tandem is ideal, especially considering the simplicity OpenAI has introduced to fine-tuning.
  2. RA-DIT (Retrieval Augmented Dual Instruction Tuning) creates a symbiotic relationship between fine-tuning and RAG where RAG data is used for fine-tuning and in turn the retriever is updated with user data.
  3. A human-supervised process of curating data for retriever training requires an AI-accelerated latent space, or as HumanFirst refers to it, a natural language data productivity suite.
  4. This dual instruction tuning approach makes for a supervised continuous improvement cycle and addresses the long-tail of user conversations.

The image above shows the RA-DIT approach, which fine-tunes the LLM and the retriever separately. For a given example, the LM-ft component updates the LLM to maximise the likelihood of the correct answer given the retrieval-augmented prompt, while the R-ft component updates the retriever to minimise the KL-divergence between the retriever score distribution and the LLM preference.
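As a rough illustration of the R-ft objective (a simplified sketch with toy numbers, not the paper’s implementation): the retriever’s distribution over retrieved chunks is pushed towards the LLM’s preference, i.e. how likely each chunk makes the correct answer.

import torch
import torch.nn.functional as F

retriever_scores = torch.tensor([2.1, 0.3, -1.0], requires_grad=True)  # toy similarity scores for 3 chunks
lm_answer_logprobs = torch.tensor([-1.2, -3.5, -4.0])  # toy log p_LM(answer | chunk) for the same chunks

log_p_retriever = F.log_softmax(retriever_scores, dim=-1)  # retriever distribution (log space)
p_lm = F.softmax(lm_answer_logprobs, dim=-1)               # LLM preference distribution

# KL(LLM preference || retriever distribution), minimised w.r.t. the retriever parameters
r_ft_loss = F.kl_div(log_p_retriever, p_lm, reduction="sum")
r_ft_loss.backward()  # gradients flow back into the retriever scores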

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
