RAG Implementations Are Becoming More Agent-Like

Basic RAG implementations have several vulnerabilities, and as efforts to address these weaknesses progress, RAG implementations are evolving into agentic approaches.

5 min read · Apr 8, 2024

Introduction

It’s fascinating to see that, as generative AI frameworks advance, there’s a widespread convergence towards what’s considered good design.

For instance, consider the evolution of prompt engineering: prompts have progressed into templates featuring placeholders for injecting variables.

This progression further led to prompt chaining, eventually culminating in the development of autonomous agents equipped with a multitude of tools. These tools are accessed and used as the agent sees fit.

In a similar trajectory, the RAG framework has undergone a notable transformation.

Initially, the basic RAG setup was deemed satisfactory. However, now there’s a growing trend of integrating additional intelligence into the RAG stack, along with incorporating various other elements into the RAG architecture.

What’s Wrong With Standard RAG?

Firstly, the structure of prompts is increasingly crucial in RAG architectures, with techniques such as Chain-of-Thought being introduced into prompts created for RAG implementations.

Simply injecting prompts with contextual reference data is no longer sufficient; now, careful attention is paid to prompt wording to optimize performance.
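As a rough illustration of the point above, here is a minimal sketch of a RAG prompt template that injects retrieved context and also asks for step-by-step reasoning. The template wording, the placeholder names and the build_prompt helper are my own assumptions for illustration, not taken from any specific framework.

```python
from string import Template

# Hypothetical template: the wording and placeholder names are illustrative only.
RAG_COT_PROMPT = Template(
    "Answer the question using only the context below.\n"
    "Context:\n$context\n\n"
    "Question: $question\n\n"
    "Think through the relevant facts step by step, "
    "then give the final answer on a line starting with 'Answer:'."
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks and the question into the template."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return RAG_COT_PROMPT.substitute(context=context, question=question)


prompt = build_prompt(
    "What was the revenue in March?",
    ["March revenue was 120.", "March cost was 80."],
)
print(prompt)
```

The only difference from plain context injection is the final instruction, which is where the Chain-of-Thought style wording lives.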

Secondly, it’s acknowledged that RAG exhibits static characteristics in two key aspects.

RAG often fails to consider the conversation’s context or extend its consideration beyond the current dialogue turn.
Additionally, the decision-making process regarding retrieval is typically governed by static rules, lacking adaptability.

Thirdly, there’s a growing concern about unnecessary overhead, particularly unoptimised retrievals and additional text that incur unwanted cost and inference latency.

Fourthly, multi-step approaches and classifiers are employed to determine the best response or utilise multiple data stores, sometimes solely to classify the user request. These classifiers often rely on annotated data to train models for these specialised tasks.

The image illustrates where the three key decision-making points sit in the RAG approach. Numerous recently published papers focus on solving for these three areas in the RAG pipeline:

Knowing when to retrieve and from where to retrieve.
Performing evaluation, correction or at least some kind of quality check on the retrieved data.
Lastly, post-generation checks need to be performed. In some implementations, multiple generations are run and the best result is selected. There are also frameworks that perform truthfulness checks on the generated result.
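The three decision points listed above can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the document store, the word-overlap score and the fake_generate stub (which stands in for several LLM generations) are all hypothetical, and a real pipeline would use a retriever, a trained or LLM-based grader and a model call instead.

```python
import re
from typing import Callable

# Toy document store; the contents are made up for illustration.
DOCS = {
    "refunds": "Refunds are processed within 14 days of the return request.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Share of the words in `a` that also occur in `b` (a crude relevance proxy)."""
    return len(words(a) & words(b)) / max(len(words(a)), 1)


def should_retrieve(query: str) -> bool:
    """Decision 1: retrieve only when the query mentions a known topic."""
    return any(topic in query.lower() for topic in DOCS)


def retrieve(query: str) -> list[str]:
    return [text for topic, text in DOCS.items() if topic in query.lower()]


def check_retrieved(query: str, chunks: list[str], threshold: float = 0.1) -> list[str]:
    """Decision 2: drop retrieved chunks that score too low against the query."""
    return [chunk for chunk in chunks if overlap(query, chunk) >= threshold]


def pick_best(candidates: list[str], chunks: list[str]) -> str:
    """Decision 3: of several generations, keep the one best grounded in the context."""
    context = " ".join(chunks)
    return max(candidates, key=lambda cand: overlap(cand, context))


def answer(query: str, generate: Callable[[str, list[str]], list[str]]) -> str:
    chunks = check_retrieved(query, retrieve(query)) if should_retrieve(query) else []
    candidates = generate(query, chunks)
    return pick_best(candidates, chunks) if chunks else candidates[0]


def fake_generate(query: str, chunks: list[str]) -> list[str]:
    """Hypothetical stand-in for running multiple LLM generations."""
    return ["Refunds take about 30 days.", "Refunds are processed within 14 days."]


print(answer("How long do refunds take?", fake_generate))
```

Each function maps to one decision point, which makes it easy to see where a static rule could later be swapped for something more adaptive.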

Agent-Like RAG (Agentic RAG)

I deduced the architecture in the image below from a LlamaIndex tutorial on what they refer to as Agentic RAG. This is where RAG is implemented in an agent-like fashion.

Each of the tools is related to a document or set of documents, which is described. The description of each tool allows the agent to know which tool to select, or which tools to combine.

User queries can be submitted where a question spans a number of documents. Complex questions can be asked, and the agent combines the different tools (which access different documents) to synthesise an answer.

For instance, a number of agents each cover financial statements spanning a number of months. What if the RAG agent is asked to calculate the profit (revenue minus cost) over a defined period of three months?

A standard RAG implementation will not be able to answer this complex user query, which spans multiple documents with data that needs to be synthesised and processed.
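To make the profit example concrete, here is a minimal sketch of the idea in plain Python. It does not use the LlamaIndex API; the Tool class, the monthly figures and the keyword-based select_tools function are all hypothetical, and the keyword match is only a stand-in for an LLM reading the tool descriptions and choosing which tools to call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    month: str
    description: str         # what an agent would read to decide on tool selection
    run: Callable[[], dict]  # returns the figures held in one document


# Made-up monthly statements; each tool wraps one "document".
STATEMENTS = {
    "january": {"revenue": 100, "cost": 60},
    "february": {"revenue": 120, "cost": 70},
    "march": {"revenue": 90, "cost": 50},
    "april": {"revenue": 110, "cost": 65},
}

TOOLS = [
    Tool(
        name=f"{month}_statement",
        month=month,
        description=f"Financial statement for {month}",
        run=lambda month=month: STATEMENTS[month],
    )
    for month in STATEMENTS
]


def select_tools(query: str, tools: list[Tool]) -> list[Tool]:
    """Stand-in for the agent's tool choice: keep tools whose month is mentioned."""
    lowered = query.lower()
    return [tool for tool in tools if tool.month in lowered]


def profit_over(query: str) -> int:
    """Call every selected tool, then synthesise: profit = revenue - cost, summed."""
    figures = [tool.run() for tool in select_tools(query, TOOLS)]
    return sum(f["revenue"] - f["cost"] for f in figures)


total = profit_over("What was the profit for January, February and March?")
print(total)  # prints 130
```

The point of the sketch is the split of work: selection decides which documents matter, each tool returns its own data, and a final step combines the results, which a single retrieve-then-generate pass does not do.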

In Conclusion

I have previously discussed how various approaches and technologies used to be distinct, but now we are witnessing a convergence of these technologies.

Considering Agent-based RAG, a few key principles emerge…

RAG is one of the most popular enterprise LLM implementation types, and Agentic RAG is a natural progression of it.
The term Agentic RAG applies where an agent approach is followed for a RAG implementation, adding resilience, reasoning and intelligence.
It is a good illustration of multi-agent orchestration.
This architecture serves as a good reference framework for how scaling an agent can be optimised with a second tier of smaller worker-agents.
Agentic RAG is an example of a controlled and well-defined autonomous agent implementation.
It is easy to envisage how this architecture can grow and expand across an organisation as more sub-bots are added.

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

Written by Cobus Greyling
