Step-Wise Controllable Agents From LlamaIndex

A few elements that agents have been lacking are inspectability, observability and control.

Cobus Greyling
5 min read · Apr 10, 2024



What continues to excite me are the advances in autonomous agents, together with the merging of RAG and agent functionality. A few months ago, LlamaIndex introduced a new lower-level agent API…

  • This new agent API allows users to step through and control an agent at a more granular level.
  • Task creation and task execution are separated; hence a task can be created and executed at a later stage.
  • Users can view each step, together with the upcoming steps.
  • Step-wise execution can be used with ReAct and OpenAI function calling.

As the image below shows, users can create a task, decide when to run it, and control the input for each step.

I think this development from LlamaIndex adds inspectability and explainability to agents.

Agent Architecture

The LlamaIndex agent is composed of AgentRunner objects, which interface with AgentWorkers.

AgentRunners are orchestrators which:

  • Store state
  • Store conversational memory
  • Create tasks
  • Maintain tasks
  • Run steps for each task
  • Present the user-facing, high-level interface

AgentWorkers take care of:

  • Selecting and using tools
  • Selecting the LLM to use

AgentWorkers also control the step-wise execution of each task. When an AgentWorker is given an input step, it is responsible for executing that step and generating the next one.
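The division of labour described above can be illustrated with a toy model. This is a minimal sketch, not LlamaIndex's implementation: `ToyRunner`, `ToyWorker` and the fixed `plan` list are hypothetical stand-ins, and the plan replaces the reasoning an LLM would normally do when choosing the next tool call.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[int, int], int]


class ToyWorker:
    """Owns the tools and decides how a single step is executed."""

    def __init__(self, tools: Dict[str, Tool]) -> None:
        self.tools = tools

    def run_step(self, plan: List[Tuple[str, int]], state: Dict[str, int]) -> bool:
        """Execute the next planned tool call; return True once the plan is exhausted."""
        tool_name, operand = plan.pop(0)
        state["value"] = self.tools[tool_name](state["value"], operand)
        return not plan


class ToyRunner:
    """Stores per-task state and memory, and exposes the user-facing calls."""

    def __init__(self, worker: ToyWorker) -> None:
        self.worker = worker
        self.tasks: Dict[int, dict] = {}
        self.memory: List[str] = []

    def create_task(self, start: int, plan: List[Tuple[str, int]]) -> int:
        """Register a task without running it and return its id."""
        task_id = len(self.tasks) + 1
        self.tasks[task_id] = {"plan": list(plan), "state": {"value": start}, "done": False}
        return task_id

    def run_step(self, task_id: int) -> bool:
        """Delegate one step to the worker and record the result in memory."""
        task = self.tasks[task_id]
        task["done"] = self.worker.run_step(task["plan"], task["state"])
        self.memory.append(f"task {task_id}: value={task['state']['value']}")
        return task["done"]


worker = ToyWorker({"multiply": lambda a, b: a * b, "add": lambda a, b: a + b})
runner = ToyRunner(worker)
# Creating the task runs nothing; execution starts only when run_step is called.
task_id = runner.create_task(start=121, plan=[("multiply", 3), ("add", 42)])
while not runner.run_step(task_id):
    pass
print(runner.memory)
```

The runner never touches a tool directly and the worker never stores anything between calls, which is the separation that lets a user pause between steps.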

Tasks & Steps


A Task is a high-level task which takes in a user query and passes along additional information like memory.


A TaskStep represents a single step, which feeds into an AgentWorker and generates a TaskStepOutput. One Task can cover multiple Steps.


A TaskStepOutput is the output and result of a step that has executed.
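The relationship between the three objects can be sketched with plain dataclasses. These are simplified, hypothetical versions written for illustration; the real LlamaIndex classes carry more fields, and the second step's input string is invented for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, List


def new_id() -> str:
    """Generate a unique identifier for a task or step."""
    return str(uuid.uuid4())


@dataclass
class Task:
    """High-level task: the user query plus shared context such as memory."""
    input: str
    memory: List[str] = field(default_factory=list)
    task_id: str = field(default_factory=new_id)


@dataclass
class TaskStep:
    """One unit of work that belongs to a task."""
    task_id: str
    input: str
    step_id: str = field(default_factory=new_id)


@dataclass
class TaskStepOutput:
    """Result of executing one step, plus any follow-up steps."""
    output: Any
    task_step: TaskStep
    next_steps: List[TaskStep]
    is_last: bool


task = Task(input="What is (121 * 3) + 42?")
first = TaskStep(task_id=task.task_id, input=task.input)
second = TaskStep(task_id=task.task_id, input="add 42 to the previous result")

first_out = TaskStepOutput(output=121 * 3, task_step=first, next_steps=[second], is_last=False)
second_out = TaskStepOutput(output=first_out.output + 42, task_step=second, next_steps=[], is_last=True)

# Every output can be traced back to its task and step through the ids.
for out in (first_out, second_out):
    print(out.task_step.task_id, out.task_step.step_id, out.output, out.is_last)
```

Both steps share one task id but have distinct step ids, so one Task spans multiple TaskSteps and each TaskStepOutput says whether more work remains.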

Notebook Walkthrough

Below, two very basic tools are created: first a multiply tool and then an addition tool. The resulting function tools are named multiply_tool and add_tool.

# Import path for llama-index 0.10 and later
from llama_index.core.tools import FunctionTool

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result integer"""
    return a * b

# The docstring becomes the tool description the agent sees
multiply_tool = FunctionTool.from_defaults(fn=multiply)

def add(a: int, b: int) -> int:
    """Add two integers and return the result integer"""
    return a + b

add_tool = FunctionTool.from_defaults(fn=add)

tools = [multiply_tool, add_tool]
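Because the tools wrap ordinary Python functions, the answer the agent should reach can be checked without an LLM. A minimal sketch, with the two functions redefined so the snippet stands alone (FunctionTool is not needed for this check):

```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result integer"""
    return a * b


def add(a: int, b: int) -> int:
    """Add two integers and return the result integer"""
    return a + b


# The agent is expected to chain the tools: multiply first, then add.
intermediate = multiply(121, 3)
result = add(intermediate, 42)
print(intermediate, result)
```

The intermediate value 363 and the final value 405 are the same numbers that appear in the agent output later in the walkthrough.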

Keep in mind that you will have to add your OpenAI API key to the notebook made available by LlamaIndex, as seen below…

import os
import openai
os.environ['OPENAI_API_KEY'] = "<Your API Key Goes Here>"
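Pasting a key into a notebook makes it easy to leak when the notebook is shared. One hedged alternative is to read the key from the environment and fail early if it is missing. `require_api_key` below is a hypothetical helper written for this example, not part of LlamaIndex or the OpenAI client.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored under `name`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key


# An explicit mapping is passed here so the example does not depend on the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```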

The agent is defined (llm is assumed to be an LLM instance created earlier in the notebook):

agent = OpenAIAgent.from_tools(tools, llm=llm, verbose=True)

The agent is asked the question: What is (121 * 3) + 42?

As seen below, the agent is called directly and works out which of the two tools to call, and in which order, to solve the calculation.

response = agent.chat("What is (121 * 3) + 42?")

Notice how verbose and detailed the output from the agent is:

AgentChatResponse(response='The result of (121 * 3) + 42 is 405.', 
raw_input={'args': (), 'kwargs': {'a': 121, 'b': 3}},
raw_input={'args': (),
'kwargs': {'a': 363, 'b': 42}},

Working With Tasks

Below, the task is defined. Defining the task does not run it at this stage.

task = agent.create_task("What is (121 * 3) + 42?")

In the image below, the agent's run-step functionality is invoked. Because the query consists of two steps, two steps need to be executed to reach the final result.

Below, (1) shows the query with which the user can ask the agent whether the step is the final step or whether further steps need to run. Hence the user can check the status of the agent.

In the example below, the value returned is True, which means that the agent has completed its run.

And lastly, (2) the final response is shown…
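The run-check-finalize cycle can be wrapped in a small loop. This is a sketch under the assumption that the agent exposes `run_step(task_id)`, whose result has an `is_last` attribute, and `finalize_response(task_id)`, as used in the LlamaIndex step-wise notebook. `run_task_stepwise`, `FakeAgent` and `FakeStepOutput` are hypothetical names; the fake agent exists only so the loop can be exercised without an API key.

```python
from typing import Any


def run_task_stepwise(agent: Any, task_id: str, max_steps: int = 10) -> Any:
    """Run one step at a time until a step reports is_last, then finalize."""
    for step_number in range(1, max_steps + 1):
        step_output = agent.run_step(task_id)
        print(f"step {step_number}: is_last={step_output.is_last}")
        if step_output.is_last:
            return agent.finalize_response(task_id)
    raise RuntimeError(f"task {task_id} did not finish within {max_steps} steps")


class FakeStepOutput:
    """Stand-in for a step result; only carries the is_last flag."""

    def __init__(self, is_last: bool) -> None:
        self.is_last = is_last


class FakeAgent:
    """Stand-in agent that finishes after a fixed number of steps."""

    def __init__(self, steps_needed: int) -> None:
        self.steps_needed = steps_needed
        self.steps_run = 0

    def run_step(self, task_id: str) -> FakeStepOutput:
        self.steps_run += 1
        return FakeStepOutput(is_last=self.steps_run >= self.steps_needed)

    def finalize_response(self, task_id: str) -> str:
        return "The result of (121 * 3) + 42 is 405."


fake = FakeAgent(steps_needed=2)
final = run_task_stepwise(fake, task_id="demo-task")
print(final)
```

With a real agent, the call would be `run_task_stepwise(agent, task.task_id)`. The `max_steps` cap is a design choice of this sketch so that a task which never reports `is_last` cannot loop forever, and the space between iterations is where a user could inspect or steer the agent.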

In Conclusion

I have always had the feeling that autonomous agents are too autonomous and lack a level of inspectability, interpretability and human involvement.

LangChain is addressing these requirements from a post-run perspective in the form of LangSmith. Within LangSmith the chain formed by the agent is visible, and can be decomposed for a more granular view.

With the Step-Wise approach, granularity and steerability are available at runtime.

The Step-Wise approach also allows for traceability: each task and step has an ID, and when the next step is taken, the task ID is logged together with the step ID.

raw_input={'args': (), 'kwargs': {'a': 121, 'b': 3}},
raw_output=363)], source_nodes=[]),
input='What is (121 * 3) + 42?',
step_state={}, next_steps={},
step_state={}, next_steps={}, prev_steps={}, is_ready=True)], is_last=False)


I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.



