AI Agent Hype Is Colliding With Reality
The Hype of AI Agent Autonomy is Cooling and Human Supervision is Making a Comeback
Supervision
There is a correction taking place in how we see the role of AI Agents. Let me give a quick breakdown here…
HuggingFace released a study in which they argue that AI Agents should not be fully autonomous, especially while adequate safeguards have yet to be developed.
They suggest an approach with varying levels of human supervision.
This is something I have been advocating for a while now: levels of agency, each paired with an appropriate level of human supervision.
Rather than jumping straight to fully autonomous AI Agents, we should incorporate varying levels of agency or autonomy into everyday applications.
This offers a more balanced path to automation.
This way, users can have the benefits of AI assistance while maintaining control over key decisions and actions.
By gradually introducing agency, applications can provide adaptive support and smart suggestions without overstepping into full autonomy.
This fosters a collaborative environment where human oversight and AI capabilities work together in harmony.
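To make graded agency concrete, here is a minimal Python sketch. The level names and the gating rule are my own illustration, not a taxonomy from the HuggingFace study.

```python
from enum import IntEnum

class AgencyLevel(IntEnum):
    SUGGEST_ONLY = 0      # AI proposes, the human performs every action
    APPROVE_EACH = 1      # AI acts, but each action needs explicit approval
    APPROVE_RISKY = 2     # AI acts freely; only risky actions are gated
    FULLY_AUTONOMOUS = 3  # no human gate: the mode the study cautions against

def requires_approval(level: AgencyLevel, action_is_risky: bool) -> bool:
    """Decide whether a proposed action must be shown to the human first."""
    if level == AgencyLevel.FULLY_AUTONOMOUS:
        return False
    if level == AgencyLevel.APPROVE_RISKY:
        return action_is_risky
    return True  # SUGGEST_ONLY and APPROVE_EACH always involve the human
```

An application can launch at a conservative level and only relax the gate as safeguards mature.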
Modern Work
Recently Prasanna Arikala stated that workers spend up to 30% of their time searching for documents and information.
Jerry Liu also recently stated that knowledge workers dedicate 50–80% or more of their time to analysing, synthesising, and creating unstructured data — consider how much of your day is spent reading reports, documentation, presentations, or crafting new versions of these materials.
Hence Agentic RAG or Knowledge Agents can help to automate these repetitive knowledge tasks.
By doing so, we can free up valuable time, enabling people to focus on critical thinking, better decision-making, and reducing burnout.
Early use cases already emerging include research assistants, automated workflows, and report generation, showcasing the transformative potential of this approach.
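As a rough illustration of what such a Knowledge Agent loop might look like, here is a minimal sketch. The retriever, synthesiser and stopping check are passed in as placeholders for whatever stack you use; this does not reflect any specific framework's API.

```python
from typing import Callable

def knowledge_agent(
    question: str,
    retrieve: Callable[[str, list[str]], list[str]],  # fetch passages, given what we have
    synthesise: Callable[[str, list[str]], str],      # LLM call that drafts an answer
    is_complete: Callable[[str], bool],               # stopping check on the draft
    max_rounds: int = 3,
) -> str:
    """Agentic RAG loop: the agent decides whether to keep retrieving or stop."""
    context: list[str] = []
    draft = ""
    for _ in range(max_rounds):
        context += retrieve(question, context)  # agent chooses what to look up next
        draft = synthesise(question, context)
        if is_complete(draft):
            break  # good enough: hand the draft back for human review
    return draft
```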
Task Classification
The study also looks at classifying tasks according to domain or intent, together with the risk involved.
The Number of Actions is important, as AI Agents need to project how many steps or actions a given task will require. As the number of actions increases, so does the cost, along with the likelihood of inaccuracies being introduced.
Named Concepts refers to the number of named concepts provided in each task. According to prior work, most people can only hold 5 to 9 concepts in mind at the same time.
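These dimensions map naturally onto a simple task profile. The field names and the supervision heuristic below are my own illustration, with thresholds chosen purely for the example.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    domain: str          # e.g. "productivity", "finance"
    intent: str          # e.g. "set_alarm", "generate_report"
    risk: str            # "low", "medium" or "high"
    num_actions: int     # projected steps: more steps mean more cost and more error surface
    named_concepts: int  # concepts the user must track; 5 to 9 is the usual limit

def needs_close_supervision(task: TaskProfile) -> bool:
    """Crude heuristic: risky, long or concept-heavy tasks keep a human in the loop."""
    return task.risk == "high" or task.num_actions > 9 or task.named_concepts > 9
```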
Human Supervision
Consider the image below… it is a good example of introducing agency in a measured fashion.
The user makes a request: "I need to set an alarm for every weekday morning at 7:30, and then cancel the alarm for Thursday, changing it to 8:00 in the evening."
This is a compound, multi-intent utterance, yet the Agentic Assistant breaks the request down into a sequence of tasks and sub-tasks. The user then has the option to delete steps, or to refine steps by splitting them.
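A plan like the one in the image can be represented as an editable list of steps. The decomposition below is my own sketch of the alarm request, with delete and split operations mirroring the options offered to the user.

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    description: str
    sub_steps: list["PlanStep"] = field(default_factory=list)

# One plausible decomposition of the compound alarm request.
plan = [
    PlanStep("Set a recurring 7:30 AM alarm for Monday to Friday"),
    PlanStep("Cancel the 7:30 AM alarm for Thursday"),
    PlanStep("Set a new Thursday alarm for 8:00 PM"),
]

def delete_step(plan: list[PlanStep], index: int) -> None:
    """The user removes a step they do not want executed."""
    plan.pop(index)

def split_step(step: PlanStep, parts: list[str]) -> None:
    """The user refines a coarse step into finer sub-steps."""
    step.sub_steps = [PlanStep(part) for part in parts]
```

Nothing executes until the user is satisfied with the step list.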
Users recognise good planning when they see it…
But they can't necessarily come up with the plan themselves.
User involvement in the planning and execution of tasks does not significantly improve trust or lead to better-calibrated trust in the outcomes.
In fact, participation in planning can sometimes harm plan quality, particularly in tasks where the initial plan is already strong, potentially leading to worse performance during execution.
The results suggest that user involvement does not inherently help build trust.
Instead, the quality of the plan itself plays a crucial role, showing a strong positive correlation with trust in both planning and execution.
When plans are well-structured and of high quality, users tend to trust the AI Agent more, and that trust is appropriately calibrated.
However, when plans are of lower quality, users struggle to adjust their trust levels.
This misalignment may stem from the convincing nature of plans generated by AI, which often appear logical and plausible at first glance.
Successful Planning Does Not Equal Successful Execution
User involvement in planning and execution can enhance overall task performance, particularly by improving execution accuracy.
As the data shows, when users participate in planning, they can help refine imperfect plans, such as addressing errors in grammar or structure, which in turn boosts execution accuracy.
Additionally, user involvement during the execution phase leads to the highest levels of accuracy across most tasks studied.
Analysis reveals that even with high-quality plans, LLM agents can still make mistakes during execution due to prediction errors — such as incorrect action names or parameters — or prediction failures, where valid actions are not provided.
Since deployed LLM services lack guarantees for reliability in planning or execution, user oversight becomes critical.
By actively engaging in plan quality control and monitoring risky actions, users can ensure that only correct and safe actions are carried out, leading to better and more reliable task outcomes.
Considering the image above, notice the plan-edit option for users, as well as the user-involved execution in the execution phase, where the user can manually select proposed actions and plans.
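Put together, user-involved execution amounts to a gate in front of every proposed action. The sketch below is my own illustration; the agent's action prediction, the risk check and the approval prompt are passed in as hypothetical callables.

```python
from typing import Callable, Optional

def supervised_execute(
    steps: list[str],
    propose: Callable[[str], Optional[str]],  # agent's action prediction (may err)
    is_risky: Callable[[str], bool],          # flags actions worth a second look
    approve: Callable[[str], bool],           # prompt shown to the user
    run: Callable[[str], None],               # actually perform the action
) -> None:
    """User-involved execution: risky actions only run with explicit approval."""
    for step in steps:
        action = propose(step)
        if action is None:  # prediction failure: no valid action was offered
            print(f"No valid action for step: {step}")
            continue
        if is_risky(action) and not approve(action):
            print(f"Skipped by user: {action}")
            continue
        run(action)  # only correct, user-approved actions are carried out
```

Only actions that pass the gate are executed, which is exactly the oversight the study argues for.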