RAG Implementations Are Becoming More Agent-Like

Basic, foundational RAG implementations have several vulnerabilities, and as efforts to address these weaknesses progress, RAG implementations are evolving into agentic approaches.

Cobus Greyling
5 min read · Apr 8, 2024



It’s fascinating to see that, with the advancements in generative AI frameworks, there is widespread convergence towards what is considered good design.

For instance, consider the evolution of prompt engineering: prompts have progressed into templates featuring placeholders for injecting variables.
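As a minimal sketch of this idea (the template text and function names here are illustrative, not from any particular framework), a prompt template is just a string with placeholders that are filled with runtime values:

```python
# Illustrative prompt template with placeholders for injected variables.
RAG_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_prompt(context: str, question: str) -> str:
    """Fill the template's placeholders with runtime values."""
    return RAG_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    context="Acme's Q1 revenue was $10M.",
    question="What was Acme's Q1 revenue?",
)
```

In a RAG setting, the `context` placeholder is typically filled with the retrieved reference data.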

This progression further led to prompt chaining, eventually culminating in the development of autonomous agents equipped with a multitude of tools. These tools are accessed and used as the agent sees fit.

In a similar trajectory, the RAG framework has undergone a notable transformation.

Initially, the basic RAG setup was deemed satisfactory. However, there is now a growing trend of integrating additional intelligence into the RAG stack, along with incorporating various other elements into the architecture.

What’s Wrong With Standard RAG?

Firstly, the structure of prompts is increasingly crucial in RAG architectures, with techniques such as Chain-of-Thought being introduced into prompts created for RAG implementations.

Simply injecting prompts with contextual reference data is no longer sufficient; now, careful attention is paid to prompt wording to optimize performance.
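A hedged sketch of what this wording change might look like, building on the plain contextual injection above (the Chain-of-Thought phrasing is one common variant, not a canonical formulation):

```python
# Same contextual injection, but with Chain-of-Thought wording added
# to steer the model's reasoning (phrasing is illustrative).
def build_cot_prompt(context: str, question: str) -> str:
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Think step by step: first list the relevant facts from the "
        "context, then reason from them, then state the final answer."
    )

p = build_cot_prompt("Acme's Q1 revenue was $10M.", "What was Q1 revenue?")
```

The only difference from a plain RAG prompt is the final instruction, yet in practice this kind of wording can measurably change answer quality.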

Secondly, it’s acknowledged that RAG exhibits static characteristics in two key aspects.

  1. RAG often fails to consider the conversation’s context or extend its consideration beyond the current dialogue turn.
  2. Additionally, the decision-making process regarding retrieval is typically governed by static rules, lacking adaptability.
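The contrast between a static retrieval rule and a context-aware one can be sketched as follows (both heuristics are hypothetical toy examples, not production logic):

```python
# A static rule ignores dialogue state entirely.
def static_should_retrieve(turn: str) -> bool:
    return True  # always retrieve, regardless of context

# A context-aware rule can skip retrieval for follow-up turns that
# recent history already covers (toy heuristic: pronoun follow-ups).
def contextual_should_retrieve(turn: str, history: list[str]) -> bool:
    words = [w.strip("?.,!") for w in turn.lower().split()]
    follow_up = any(w in ("it", "that", "they") for w in words)
    return not (follow_up and history)
```

A real implementation would use an LLM or trained classifier for this decision; the point is that the decision becomes a function of the conversation, not a fixed rule.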

Thirdly, there’s a growing concern regarding unnecessary overhead, particularly regarding unoptimised retrievals and additional text that incurs unwanted costs and inference latency.

Fourthly, multi-step approaches and classifiers are employed to determine the best response or to route requests across multiple data stores; sometimes a classifier is used solely to categorise the user request. These classifiers often rely on annotated data to train models for these specialised tasks.
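A toy sketch of such a request classifier routing queries to one of several data stores (the keyword approach here is a stand-in; as noted above, real systems typically train a model on annotated data):

```python
# Hypothetical keyword-based router across data stores.
STORE_KEYWORDS = {
    "finance": {"revenue", "cost", "profit", "invoice"},
    "hr": {"leave", "salary", "hiring"},
}

def classify_store(query: str, default: str = "general") -> str:
    """Route a query to the data store whose keywords it mentions."""
    words = {w.strip("?.,!") for w in query.lower().split()}
    for store, keywords in STORE_KEYWORDS.items():
        if words & keywords:
            return store
    return default
```

Each such classification step adds latency and cost before any retrieval or generation happens, which is part of the overhead concern raised above.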

The image illustrates where the three key decision-making points are in the RAG approach. Numerous recently published papers focus on solving for these three areas in the RAG pipeline:

  1. Knowing when to retrieve and from where to retrieve.
  2. Performing evaluation, correction or at least some kind of quality check on the retrieved data.
  3. Lastly, post-generation checks need to be performed. In some implementations, multiple generations are run and the best result selected. There are also frameworks performing truthfulness checks on the generated result.
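The three decision points above can be wired into one pipeline, sketched below. Every component (`should_retrieve`, `retrieve`, `grade_doc`, `generate`, `grade_answer`) is a hypothetical stand-in for a real retriever, grader, or LLM call:

```python
def answer_query(query, should_retrieve, retrieve, grade_doc,
                 generate, grade_answer, n_candidates=3):
    # 1. Decide whether (and from where) to retrieve.
    docs = retrieve(query) if should_retrieve(query) else []
    # 2. Quality-check / filter the retrieved data.
    docs = [d for d in docs if grade_doc(query, d)]
    # 3. Post-generation check: run several generations, keep the best.
    candidates = [generate(query, docs) for _ in range(n_candidates)]
    return max(candidates, key=lambda a: grade_answer(query, a))

# Usage with deterministic toy stand-ins:
result = answer_query(
    "What was the profit?",
    should_retrieve=lambda q: True,
    retrieve=lambda q: ["relevant doc", "noise"],
    grade_doc=lambda q, d: d != "noise",
    generate=lambda q, docs: f"answer from {len(docs)} doc(s)",
    grade_answer=lambda q, a: len(a),
)
```

The structure makes the three intervention points explicit: each lambda above is where the recent papers mentioned insert additional intelligence.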

Agent-Like RAG (Agentic RAG)

Consider the image below: it shows an architecture I deduced from a LlamaIndex tutorial on what they refer to as Agentic RAG, where RAG is implemented in an agent-like fashion.

Each tool is related to a document or set of documents and carries a description. The description of each tool allows the agent to know which tool to select, or which tools to combine.
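A minimal sketch of description-based tool selection (in the real LlamaIndex setup the LLM itself reads the descriptions and chooses; the keyword-overlap scoring below is a hypothetical stand-in to make the mechanism concrete):

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str  # the agent selects tools based on this text

def select_tools(query: str, tools: list[Tool], top_k: int = 2) -> list[Tool]:
    """Score each tool's description against the query; keep the best."""
    q = {w.strip("?.,!") for w in query.lower().split()}
    def score(t: Tool) -> int:
        return len(q & set(t.description.lower().split()))
    return sorted(tools, key=score, reverse=True)[:top_k]

tools = [
    Tool("jan_statement", "january financial statement with revenue and cost"),
    Tool("hr_policy", "human resources leave policy document"),
]
picked = select_tools("What were january revenue and cost?", tools, top_k=1)
```

The key design point is that each document's tool is self-describing, so adding a new document means adding a new tool with a description rather than retraining a router.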

User queries can be submitted where a question spans a number of documents. Complex questions can be asked, and the agent synthesises the outputs of the different tools (which access different documents) to answer the question.

For instance, a number of agents might each cover financial statements spanning a number of months. What if the RAG agent is asked to calculate the profit (revenue minus cost) over a defined period of three months?

A standard RAG implementation will not be able to answer this complex user query, which spans multiple documents and requires data to be synthesised and processed.
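The monthly-statements example can be sketched as follows. The figures and per-month "tools" are entirely hypothetical; each dictionary entry stands in for a retrieval tool over one monthly statement, and the agent combines their outputs to compute profit over the period:

```python
# Stand-in for per-document retrieval tools, one per monthly statement.
MONTHLY_DOCS = {
    "jan": {"revenue": 120.0, "cost": 80.0},
    "feb": {"revenue": 150.0, "cost": 90.0},
    "mar": {"revenue": 130.0, "cost": 85.0},
}

def profit_over(months: list[str]) -> float:
    """Query each month's document tool, then synthesise the figures."""
    figures = [MONTHLY_DOCS[m] for m in months]
    revenue = sum(f["revenue"] for f in figures)
    cost = sum(f["cost"] for f in figures)
    return revenue - cost

# profit_over(["jan", "feb", "mar"]) → 145.0
```

A single retrieve-then-generate pass cannot do this reliably, because it requires multiple targeted retrievals plus arithmetic over the combined results, which is exactly what the agent layer adds.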

In Conclusion

I have previously discussed how various approaches and technologies used to be distinct, but now we are witnessing a convergence of these technologies.

Considering Agent-based RAG, a few key principles emerge…

  1. RAG is one of the most popular enterprise LLM implementation types, and Agentic RAG is a natural progression of it.
  2. The term Agentic RAG describes following an agent approach in a RAG implementation to add resilience, reasoning and intelligence to the RAG implementation.
  3. It is a good illustration of multi-agent orchestration.
  4. This architecture serves as a good reference framework of how scaling an agent can be optimised with a second tier of smaller worker-agents.
  5. Agentic RAG is an example of a controlled and well defined autonomous agent implementation.
  6. It is easy to envisage how this architecture can grow and expand across an organisation as more sub-bots are added.

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.



