
The State of AI Agents

6 min read · 5 days ago


There have been a number of sobering studies on the realities of implementing AI Agents…

The existence of the Pareto Frontier (cost vs accuracy) has been well examined and documented.

NVIDIA has been taking the lead in moving away from the idea of an AI Agent being underpinned by a single LLM.

They are advocating the use of multiple Small Language Models (SLMs) and orchestrating the SLMs in an Agentic workflow.

Latency and cost have also been impediments to AI agent use.

Cost is being addressed by orchestrating multiple open-sourced SLMs, each fine-tuned for a specific task within an AI Agent.

Latency is addressed by running tasks in parallel within an AI Agent, or by having multiple AI Agents within an agentic workflow and executing those agents in parallel for complex tasks.
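As a minimal sketch of this parallelism, independent sub-tasks can be fanned out concurrently so total latency approaches that of the slowest task rather than the sum of all of them. The task names and latencies below are purely illustrative, not from any specific framework:

```python
import asyncio

# Hypothetical sub-tasks an agent might fan out; the delays stand in
# for calls to different fine-tuned SLMs or tools.
async def call_slm(task: str, delay: float) -> str:
    await asyncio.sleep(delay)  # placeholder for a real model/tool call
    return f"result of {task}"

async def run_agent() -> list[str]:
    # gather() runs the awaitables concurrently and preserves order.
    return await asyncio.gather(
        call_slm("summarise", 0.2),
        call_slm("classify", 0.1),
        call_slm("extract entities", 0.15),
    )

results = asyncio.run(run_agent())
print(results)
```

With three sequential calls the wall-clock time would be roughly 0.45 s; run concurrently it is roughly 0.2 s, the slowest task.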

Lighter AI Agents… NVIDIA has also been backing the growing trend of agentic workflows in which the orchestrating AI Agents (the “conductors” that plan, route tasks, and synthesise outputs) are leaning toward being more lightweight and less code-intensive to implement.

Agentic workflows are easy to build in general, especially when presented via a no-code GUI builder that handles tasks like versioning, collaboration and more.

Drift and continuous improvement are addressed with what NVIDIA refers to as a data flywheel: a continuous feedback loop that channels training data back into improving the different models orchestrated in the AI Agent or agentic workflow.
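A data flywheel can be sketched as a loop that records interactions together with user feedback, then exports the approved pairs as candidate fine-tuning data. This is my own simplified illustration, not NVIDIA's implementation; real pipelines add curation, evaluation and automated fine-tuning jobs:

```python
import json

# In-memory log of agent interactions; a real flywheel would persist
# these and feed them into a curation/fine-tuning pipeline.
interaction_log: list[dict] = []

def record(prompt: str, response: str, thumbs_up: bool) -> None:
    # Capture every interaction with its user feedback signal.
    interaction_log.append(
        {"prompt": prompt, "response": response, "thumbs_up": thumbs_up}
    )

def export_training_data() -> str:
    # Keep only positively rated pairs as candidate fine-tuning data (JSONL).
    approved = [r for r in interaction_log if r["thumbs_up"]]
    return "\n".join(
        json.dumps({"prompt": r["prompt"], "completion": r["response"]})
        for r in approved
    )

record("What is our refund policy?", "Refunds within 30 days.", True)
record("Summarise this contract.", "It is a contract.", False)
print(export_training_data())
```

Only the positively rated interaction survives the export; the poor answer is filtered out before it can reinforce bad behaviour.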

Most of these challenges I detailed in an earlier post.

Also, for a long time there has been this notion of different levels of AI Agents.

Typically there are 5 or 6 levels defined, in a number of studies.

HuggingFace also published a study where they detail the shifting relationship between humans and AI Agents…

This is why I found this recent study interesting (the image below), where the levels were extended from the typical 5 to 6.

At one extreme is Level 0, where the human dominates and AI plays no role. At the other extreme, Level 5, the human has no role and the AI takes full control.

And in between are these steps of going from one level to another.

This approach really resonates with me because it is so sober in dissecting the gradual transition of task automation.

Where are we now, based on this chart?


We have been introducing assistance from the days of Chatbots and NLU models…

…currently we are in the process of transferring tasks from human dominance to AI dominance. This is happening on a task-by-task basis.

Hence this movement must not be seen as a single wave hitting enterprises and our way of work, but rather as a deliberate approach of identifying the tasks (for now) best suited to move along this trajectory.

The image below breaks down the traditional four pillars of AI Agents.

In this article I will stay with this for the sake of simplicity. But to some degree this approach has been upended by Agentic Workflows and multiple smaller language models.

A recent study also introduced a data component: data lakes.

So the focus is not only on context/memory and tools, but also on giving the AI Agent access to vast amounts of data to help it perform user requests.
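As an illustration, the pillars plus the data component can be modelled as a simple container. The class and field names here are my own shorthand for the sake of the example, not a standard API:

```python
from dataclasses import dataclass, field
from typing import Callable

# A toy container for an agent's pillars: model, tools, memory,
# plus the newer data component (e.g. data-lake tables it may query).
@dataclass
class DataAgent:
    model: str                                               # underlying LLM/SLM
    tools: dict[str, Callable] = field(default_factory=dict) # callable tools
    memory: list[str] = field(default_factory=list)          # context / memory
    data_sources: list[str] = field(default_factory=list)    # data-lake access

    def remember(self, note: str) -> None:
        self.memory.append(note)

agent = DataAgent(model="small-lm-v1", data_sources=["sales_lake.orders"])
agent.remember("user prefers concise answers")
print(agent.memory)
```

The point of the sketch is the shape: tools and memory alone describe a traditional agent; the `data_sources` field is what the study's "Data Agent" framing adds.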


The table below (from the study) shows the contrast between traditional AI Agents and what the study terms Data Agents.


The future?

For AI Agents to be truly autonomous and able to address virtually any task, they need to be able to write code.

On this front OpenAI and xAI introduced functions / tools which are able to write code.

Hence the AI Agent gets an instruction for which no pre-existing tool or function exists, but it has the ability to write code to achieve the goal.

There exist limitations…

Sandbox Restrictions for Security…AI models operate within tightly controlled sandboxes to mitigate risks, prohibiting arbitrary software installs or direct internet access.

They’re confined to pre-loaded libraries, forcing clever workarounds like built-in proxies.

While this setup enables basic functionality, it blocks the creation of truly innovative tools that demand unrestricted resources — limiting adaptability to predefined, bounded tasks.
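A toy way to see the restriction: execute generated code against a stripped-down builtins table, so that imports and file access are simply unavailable. This is only an illustration of the idea; real sandboxes (containers, microVMs) are far stronger than removing builtins:

```python
# Only these callables are visible to the executed code.
SAFE_BUILTINS = {"len": len, "sum": sum, "range": range, "print": print}

def run_sandboxed(code: str) -> dict:
    # Replacing __builtins__ removes __import__, open, etc. from scope.
    namespace = {"__builtins__": SAFE_BUILTINS}
    exec(code, namespace)
    return namespace

ok = run_sandboxed("total = sum(range(5))")
print(ok["total"])

try:
    # Imports fail because __import__ is missing from the builtins table.
    run_sandboxed("import os")
except ImportError as exc:
    print("blocked:", exc)
```

Exactly as the article describes: basic, pre-approved functionality works, while anything needing outside resources is blocked, which is also what forces the "clever workarounds" mentioned above.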

Debugging remains a weak spot: models often hallucinate subtle bugs in intricate logic, and finite token limits (coupled with cost constraints) make exhaustive testing impractical.

They excel at 70–80% of straightforward objectives but stumble on edge cases, frequently demanding human intervention.

In my experience, AI Agents generate code that looks flawless, only for execution in notebooks or live environments to reveal flaws — necessitating rounds of re-prompting.

The bigger the code block, the steeper the challenge. Autocomplete and highly contextual use in IDEs work quite well.

Ambiguous instructions derail progress, with agents chasing tangential sub-goals instead of the core intent.

They’re far from self-directed; precision is essential to keep them aligned.

This is why detailed, step-by-step breakdowns are non-negotiable.

Tools like Gemini thrive in Colab notebooks precisely because they leverage rich contextual anchors — iterating on proven, working code cell by cell, rather than from scratch.

If you are still reading, thank you…I hope it was helpful, any feedback will be welcome…

Below are a number of resources that might be of help on this topic.


Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. Language Models, AI Agents, Agentic Apps, Dev Frameworks & Data-Driven Tools shaping tomorrow.



Written by Cobus Greyling

I’m passionate about exploring the intersection of AI & language. www.cobusgreyling.com
