
AI Agents are not Ready Yet
No company wants to pour resources into developing software only to see it become irrelevant due to general advancements in AI…
Everyone’s trying to crack the code…what’s the next big framework for harnessing language models?
The danger for start-ups is betting everything on a single technique, such as RAG (Retrieval-Augmented Generation) or prompt engineering, as we have seen in the past.
But the best approach to prepare for the future isn’t about picking one winner…
It’s about building adaptable, integrated systems that can evolve with the landscape.
Companies should invest in flexible frameworks that combine multiple AI capabilities, allowing them to pivot as new technologies emerge — without being locked into a single, potentially fleeting trend.
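One way to stay flexible is to hide each model provider behind a thin interface so application code never hard-codes a single vendor. A minimal sketch, assuming nothing about any particular framework (the names `CompletionBackend`, `EchoBackend`, and `Pipeline` are hypothetical illustrations):

```python
from abc import ABC, abstractmethod

class CompletionBackend(ABC):
    """Thin interface so application code never depends on one provider."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class EchoBackend(CompletionBackend):
    """Stand-in backend for tests; a real one would wrap a provider SDK."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class Pipeline:
    """Application logic depends only on the interface, so backends swap freely."""
    def __init__(self, backend: CompletionBackend):
        self.backend = backend

    def answer(self, question: str) -> str:
        return self.backend.complete(question)

pipeline = Pipeline(EchoBackend())
print(pipeline.answer("hello"))  # echo: hello
```

Swapping providers then means swapping one constructor argument, not rewriting the pipeline.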
Avoiding Over-Reliance
The graph below shows how interest in technologies like GPT-3 spiked early (peaking at an index of 50 in 2021) before fading (down to 20 by 2025), while RAG and Agentic Workflows are surging (nearing 80 and 60 respectively by 2025). Betting on any single one risks obsolescence.
Problem-First Focus
Build solutions that address real user needs and market demands, not just the hottest AI trends.
Future-Proofing
Adaptable systems can integrate rising trends.

Where AI Agents Are Advancing
AI Agents are benefiting from standardised architectures and enhanced basic functionalities like language understanding and task automation.
These improvements are driven by widespread adoption and research in foundational AI technologies.
Lagging Elements
- Developer Tools: While general-purpose frameworks exist, tools tailored specifically for AI agent development (e.g., agent-specific IDEs) are underdeveloped.
- Collaboration Environments: Systems for AI Agents to work together or with humans in real-time are not yet mature.
- Security and Risk Compliance: Robust standards and tools to ensure agents operate safely and comply with regulations are lagging.
- Debugging and Granular Tuning: Pinpointing and fixing issues in complex AI agents remains challenging due to their “black box” nature.
- Inspectability: Understanding why an AI Agent makes a specific decision is still difficult, limiting trust and adoption in critical applications.
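Until dedicated inspectability tooling matures, developers often approximate it by tracing every agent step themselves. A rough sketch of that idea (the `traced` helper and `plan` step are made up for illustration):

```python
import time

def traced(step_fn, log):
    """Wrap an agent step so every call and its result are recorded for inspection."""
    def wrapper(*args, **kwargs):
        result = step_fn(*args, **kwargs)
        log.append({"step": step_fn.__name__,
                    "args": args,
                    "result": result,
                    "time": time.time()})
        return result
    return wrapper

log = []

def plan(goal):
    # Stand-in for a real agent step that would call an LLM.
    return f"plan for: {goal}"

plan = traced(plan, log)
plan("ship the feature")
print(log[0]["step"], "->", log[0]["result"])  # plan -> plan for: ship the feature
```

Even a flat log like this makes it possible to answer "which step produced this output?", which is the first half of the inspectability problem.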

In formative interviews with AI Agent developers, Microsoft Research identified three core challenges:
- Difficulty reviewing long AI Agent conversations to localise errors
- Lack of support in current tools for interactive debugging
- The need for tool support to iterate on AI Agent configuration
Based on these needs, Microsoft Research developed an interactive multi-agent debugging tool, AGDebugger, with a UI for browsing and sending messages, the ability to edit and reset prior agent messages, and an overview visualisation for navigating complex message histories.
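The research describes the tool's UI rather than its internals, but the core idea of an editable, resettable message history can be sketched roughly as follows. All class names here are assumptions for illustration, not AGDebugger's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    receiver: str
    content: str

@dataclass
class Session:
    """One conversation run; each reset forks a new session."""
    messages: list = field(default_factory=list)

class MessageHistory:
    def __init__(self):
        self.sessions = [Session()]

    @property
    def current(self):
        return self.sessions[-1]

    def append(self, msg):
        self.current.messages.append(msg)

    def reset_to(self, index, edited=None):
        """Fork the current session, keeping messages before `index` and
        optionally substituting an edited message at that point."""
        kept = list(self.current.messages[:index])
        if edited is not None:
            kept.append(edited)
        fork = Session(messages=kept)
        self.sessions.append(fork)
        return fork
```

From a forked session, a debugger front-end would re-run the agent team with the truncated (or edited) context, which is the reset behaviour described in the study.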

In a two-part user study with 14 participants, Microsoft Research identified common user strategies for steering agents and highlighted the importance of interactive message resets for debugging.
Their studies deepen our understanding of interfaces for debugging agentic workflows, which are becoming increasingly important.
How can we design systems that enable developers to effectively debug multi-agent AI teams?
How do developers use such a system to debug and improve agent workflows in practice?
Some participants noted that iterating on AI Agent configurations is currently a slow and arduous process.
While debugging, developers are continuously tweaking their AI Agent configurations by changing the system prompts, adding or removing AI Agents from the team, or altering the selection of available tools.
At present, developers must restart the workflows from the beginning to test the effectiveness of any given change.
In cases where errors arise later in the conversation, developers must then wait considerable time to observe any impacts.
Moreover, due to the stochastic nature of LLMs, the same errors might not always occur, requiring multiple run-throughs to gain confidence in a remediation.
All of this slows down the debugging process considerably.
To this end, participants expressed a desire to “freeze” the conversations at critical points and then iterate on potential fixes while the problematic context is isolated and in memory.
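Conceptually, "freezing" a conversation is just checkpointing the mutable agent state so a fix can be retried from the same context repeatedly. A minimal sketch under that assumption (the `Checkpoint` class is hypothetical):

```python
import copy

class Checkpoint:
    """Freeze a snapshot of conversation state; restore a fresh copy per retry."""
    def __init__(self, state):
        self._frozen = copy.deepcopy(state)

    def restore(self):
        # Return a fresh copy so retries cannot corrupt the snapshot.
        return copy.deepcopy(self._frozen)

state = {"system_prompt": "You are a planner.", "messages": ["user: build it"]}
frozen = Checkpoint(state)

# A retry mutates its own copy freely...
attempt = frozen.restore()
attempt["messages"].append("agent: step 1")

# ...while the frozen context stays intact for the next attempt.
print(len(frozen.restore()["messages"]))  # 1
```

Because every `restore()` yields an independent deep copy, the stochastic re-runs mentioned above all start from the identical problematic context.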
Developer Requirements
Understand messages exchanged between AI Agents.
An AI Agent debugging tool needs to expose the messages sent between AI Agents so that users can understand the details of the conversation and how the AI Agents are progressing through tasks.
This is important for identifying where errors are happening in the workflow.
Interrupt the conversation and send new messages.
Users should be able to pause/interrupt the workflow at any point, and send new messages to the AI Agents.
Reset back to a previous point in the workflow.
Once a failure point is identified, users need the ability to reset to an earlier point in the workflow in order to experiment with steering agents down alternate paths.
Change AI Agent configurations.
An AI Agent debugging tool should let users change AI Agent configurations, such as the prompts or models used, in order to experiment with fixes.
Resetting supports two kinds of what-if questions:
(1) What happens if I retry the workflow from this point?
(2) What would have happened if this alternative message had been produced?
AGDebugger helps users interactively debug and steer their agent teams.
Users can interactively send new messages, control the flow of messages, and see the history of agent messages.
Users can revert to earlier points in the workflow by resetting and editing messages.
The overview visualisation helps users make sense of long conversations and the history of edits.

The interactive overview above summarises the AI Agent conversation.
Each reset forks the current conversation and creates a new conversation session, represented as a new column.
Users can toggle the message colour to represent the message type, sender, or receiver. Message details are shown on hover and clicking navigates to the full message in the Message History view.
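As a rough text-based stand-in for that overview (the rendering and field names here are assumptions, not the tool's implementation), each session becomes a column and each message a label keyed by sender, receiver, or type:

```python
def render_overview(sessions, colour_by="sender"):
    """Render each session as a column of message labels, one label per message."""
    lines = []
    for i, messages in enumerate(sessions):
        labels = " -> ".join(m[colour_by] for m in messages)
        lines.append(f"session {i}: {labels}")
    return "\n".join(lines)

sessions = [
    [{"sender": "planner", "receiver": "coder", "type": "task"},
     {"sender": "coder", "receiver": "planner", "type": "result"}],
    # A reset forked the conversation after the first message:
    [{"sender": "planner", "receiver": "coder", "type": "task"}],
]
print(render_overview(sessions, colour_by="sender"))
# session 0: planner -> coder
# session 1: planner
```

Switching `colour_by` to `"type"` or `"receiver"` mirrors the colour-toggle behaviour described above, just in plain text.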

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.
