
Fundamentals of Autonomous AI Agents

4 min readOct 29, 2025


The new paper Fundamentals of Building Autonomous LLM Agents lays it out like a blueprint for digital minds.

True autonomy and the future of AI Agents are not a matter of introducing bigger and more capable Language Models…

The future of AI Agents and Agentic AI is about orchestrating Language Models into a closed cognitive loop powered by four interconnected pillars:

Perception (sensing the world)

Reasoning (planning and adapting)

Memory (learning from experience) and

Action (executing in the real world).

Connect them right, and your agent evolves from reactive conversational UI to proactive thinker.

Let’s take it pillar-by-pillar…


Perception

Perception is the first pillar; think of it as the AI Agent seeing and perceiving the world around it.

Part of perception can be a trigger that sets a process off…

Agents can’t act without understanding the environment they live in. This pillar handles inputs like screenshots in the case of computer-using AI Agents, audio input, text, structured data like tables and documents, or API feeds.

Text is the primary modality of input and usually serves as a starting point for implementation…

When AI Agents act on input from activities like web browsing and computer use, screenshots are taken, and each screenshot is evaluated for its contents.

Think bounding boxes that guide the model to focus on specific objects.
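A minimal sketch of how such a perception input could be structured, with bounding boxes rendered as text the model can reason over. The class and field names here are illustrative assumptions, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class BoundingBox:
    # Pixel coordinates of a region the model should focus on.
    label: str
    x: int
    y: int
    width: int
    height: int

@dataclass
class ScreenPerception:
    # One screenshot plus the regions detected in it (illustrative names).
    screenshot_path: str
    boxes: list = field(default_factory=list)

    def describe(self) -> str:
        # Render the detected regions as plain text for the LLM prompt.
        lines = [f"Screenshot: {self.screenshot_path}"]
        for b in self.boxes:
            lines.append(f"- {b.label} at ({b.x}, {b.y}), {b.width}x{b.height}")
        return "\n".join(lines)

obs = ScreenPerception("home.png", [BoundingBox("Search button", 40, 12, 80, 24)])
print(obs.describe())
```

In practice the bounding boxes would come from a vision model or accessibility tree; the text rendering is one common way to hand them to a language model.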

The drive now is to perceive human input and images, and to perceive via web browsing and computer use. But the future will be to perceive not only the digital worlds we as humans live in, but also to navigate and move through the physical world.

Reasoning

The way I think of reasoning is the ability of an AI Agent to take a complex and compound instruction, and decompose it into a sequence of sub-steps.

The sub-steps are sequenced in a logical order, and executed. With each sub-step that is completed, the AI Agent assesses the outcome of the step.

The breakdown below shows the sequence of evaluating the task, taking an action, observing the outcome…and iterating until the final answer is reached.

Source: Kore.ai

So agents don’t stop at planning; they self-critique via observation, or as some call it, reflection.

The result is a system that evaluates outputs, corrects errors, and logs feedback for future runs.
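The evaluate, act, observe, iterate sequence described above can be sketched as a simple loop. `plan_step`, `execute`, and `is_goal_met` are hypothetical stand-ins for LLM and tool calls; here they are simulated so the loop runs as-is:

```python
def plan_step(task, history):
    # A real agent would prompt an LLM here; we simulate sequential steps.
    return f"step-{len(history) + 1}"

def execute(action):
    # A real agent would invoke a tool here; we simulate the observation.
    return f"result of {action}"

def is_goal_met(history):
    # A real agent would have the LLM judge completion; we stop after 3 steps.
    return len(history) >= 3

def run_agent(task, max_iters=10):
    history = []
    for _ in range(max_iters):
        action = plan_step(task, history)       # reason: pick the next sub-step
        observation = execute(action)           # act: carry it out
        history.append((action, observation))   # reflect: log for self-critique
        if is_goal_met(history):                # evaluate: are we done?
            break
    return history

trace = run_agent("book a flight")
print(len(trace))  # 3 iterations in this toy simulation
```

The `max_iters` cap is a common safeguard so an agent that never satisfies its goal check still terminates.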

This sequence has been augmented in recent times with agentic workflows, where complex tasks are broken down into sub-tasks and executed in parallel.
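One way to sketch that parallel fan-out, assuming the sub-tasks are independent. `run_subtask` is a hypothetical placeholder for a per-sub-task agent or tool call:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subtask(name):
    # Placeholder: a real workflow would hand each sub-task to an
    # agent or tool call, often an I/O-bound API request.
    return f"{name}: done"

def run_parallel(subtasks):
    # Independent sub-tasks fan out in parallel; results come back
    # in the original order and can then be merged by the planner.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_subtask, subtasks))

results = run_parallel(["search flights", "check visa rules", "find hotels"])
print(results)
```

Threads suit I/O-bound tool calls; frameworks often use async task graphs instead, but the fan-out/merge shape is the same.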


Memory

Memory informs the context…and for any conversation, context is of the utmost importance.

In any conversation, if context is missing or lacking, we first spend time establishing it.

Context can be described as the working set of information actively used in the moment to generate a response.

Memory is typically the stored data.

Memory can be stored and retrieved in different places, as shown in the pyramid below…it varies from very general at the base of the pyramid to highly specialised and personalised at the top.
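A minimal sketch of that layered lookup: recall starts at the most personalised layer and falls back to the general base. The layer names and stored entries are illustrative assumptions, not from the paper:

```python
# Ordered from most specialised/personalised (top of the pyramid)
# to most general (the base). Contents are illustrative.
MEMORY_LAYERS = [
    ("user",    {"preferred_seat": "aisle"}),
    ("session", {"current_task": "booking a flight"}),
    ("general", {"airports": "IATA codes identify airports"}),
]

def recall(key):
    # Search from the most specialised layer down to the general base,
    # so personal memories override general knowledge when both match.
    for layer_name, store in MEMORY_LAYERS:
        if key in store:
            return layer_name, store[key]
    return None, None

layer, value = recall("current_task")
print(layer, value)  # session booking a flight
```

Whatever `recall` returns is what gets placed into the context window, which is the working-set/stored-data distinction drawn above.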

Action

From thought to doing…the element through which AI Agents take action is tools.

Tools are the hands and feet of AI Agents.

I always try and emphasise the role of tools in enabling autonomy, interoperability and real-world execution.

I view tools as the bridge between LLMs and external systems, allowing agents to perform actions like API calls, code execution, web browsing to achieve an objective, or GUI interactions.

In the past I have stressed that their effectiveness depends on thoughtful integration with memory, reasoning, and protocols.
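A minimal sketch of that bridge: a registry of named tools plus a dispatcher that executes the structured tool call a function-calling LLM would emit. The tool names and return values are illustrative assumptions:

```python
TOOLS = {}

def tool(fn):
    # Register a function so the agent can invoke it by name.
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # a real tool would call a weather API here

@tool
def run_code(snippet: str) -> str:
    return f"executed: {snippet}"  # a real tool would sandbox execution

def dispatch(tool_call):
    # tool_call mimics the structured output of a function-calling LLM:
    # a tool name plus keyword arguments.
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

print(dispatch({"name": "get_weather", "arguments": {"city": "Berlin"}}))
```

The dispatcher's return value would be fed back to the model as an observation, closing the perception–reasoning–memory–action loop described above.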


Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. Language Models, AI Agents, Agentic Apps, Dev Frameworks & Data-Driven Tools shaping tomorrow.



Written by Cobus Greyling

www.cobusgreyling.com
