Introducing Inspectability To HuggingFace SmolAgents
Inspecting AI Agent runs can be a challenge, and no agent environment is complete without full inspectability of those runs.
Why Should You Log Your Agent Runs?
Traditional pre-defined graph approaches require matching user requests to the closest intent, restricting runs to specific flow paths.
When users deviate from these predefined paths, the user experience degrades, and the request is typically considered to have moved out of domain.
AI Agents, on the other hand, dynamically generate chains in real time, offering a greater level of flexibility and autonomy.
To ensure transparency, it’s crucial to have full visibility into the AI Agent’s decision chains and nodes. Telemetry that maps out the agent’s path makes it possible, for instance, to identify latency issues or errors efficiently.
Raw logs can be difficult to interpret, but an interactive visual representation simplifies exploration and debugging.
To standardise instrumentation, HuggingFace uses OpenTelemetry.
OpenTelemetry allows you to seamlessly integrate an instrumentation layer, run your agents as usual, and automatically log everything to your observability platform of choice.
Getting Started
First, install the necessary packages. This example uses Phoenix by Arize AI, a powerful tool for collecting and inspecting logs.
However, any OpenTelemetry-compatible platform can be used for this purpose.
pip install smolagents
pip install arize-phoenix opentelemetry-sdk opentelemetry-exporter-otlp openinference-instrumentation-smolagents
The Phoenix server can then be started and accessed via a browser (by default at http://localhost:6006)…
python -m phoenix.server.main serve
Below is a view from the terminal…
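If you would rather stay inside a notebook, Phoenix can also be launched programmatically; a minimal sketch, assuming the arize-phoenix package installed above:

# Launch the Phoenix app from Python instead of the CLI; the returned
# session exposes the URL where the UI is served (port 6006 by default)
import phoenix as px

session = px.launch_app()
print(session.url)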
A small snippet of code needs to be added to your Python script, as shown below. The approach is reminiscent of how LangSmith telemetry is added to a LangChain application.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Phoenix receives OTLP traces on port 6006 by default
endpoint = "http://0.0.0.0:6006/v1/traces"

trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))

# Instrument smolagents so every agent run is traced automatically
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)
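Because the exporter is pluggable, any OpenTelemetry-compatible backend can receive these traces. As a quick sanity check that instrumentation is emitting spans at all, you can print them straight to the terminal with the SDK’s built-in ConsoleSpanExporter; a minimal sketch, assuming only the packages installed above:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Print each finished span to stdout instead of sending it to Phoenix
trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)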
And below is the complete AI Agent code, which can be copied and pasted into a single Python file.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Send traces to the local Phoenix server
endpoint = "http://0.0.0.0:6006/v1/traces"
trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)

####

from typing import Optional

from smolagents import HfApiModel, LiteLLMModel, TransformersModel, tool
from smolagents.agents import CodeAgent, ToolCallingAgent

# Choose which inference type to use!
available_inferences = ["hf_api", "transformers", "ollama", "litellm"]
chosen_inference = "transformers"

print(f"Chosen inference: '{chosen_inference}'")

if chosen_inference == "hf_api":
    model = HfApiModel(model_id="meta-llama/Llama-3.3-70B-Instruct")

elif chosen_inference == "transformers":
    model = TransformersModel(
        model_id="HuggingFaceTB/SmolLM2-1.7B-Instruct",
        device_map="auto",
        max_new_tokens=1000,
    )

elif chosen_inference == "ollama":
    model = LiteLLMModel(
        model_id="ollama_chat/llama3.2",
        api_base="http://localhost:11434",  # replace with a remote OpenAI-compatible server if necessary
        api_key="your-api-key",  # replace with an API key if necessary
        num_ctx=8192,  # Ollama's default is 2048, which will often fail. 8192 works for easy tasks; more is better. See https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator to estimate the VRAM needed for the selected model.
    )

elif chosen_inference == "litellm":
    # For Anthropic: change model_id below to 'anthropic/claude-3-5-sonnet-latest'
    model = LiteLLMModel(model_id="gpt-4o")

@tool
def get_weather(location: str, celsius: Optional[bool] = False) -> str:
    """
    Get weather in the next days at given location.
    Secretly this tool does not care about the location, it hates the weather everywhere.

    Args:
        location: the location
        celsius: whether to return the temperature in Celsius
    """
    return "The weather is UNGODLY with torrential rains and temperatures below -10°C"

agent = ToolCallingAgent(tools=[get_weather], model=model)
print("ToolCallingAgent:", agent.run("What's the weather like in Paris?"))

agent = CodeAgent(tools=[get_weather], model=model)
print("CodeAgent:", agent.run("What's the weather like in Paris?"))
Once the AI Agent is run, the traces are visible in the telemetry GUI. Note the agent name, the chains, and the LLM and tool steps within the run, with input, output, latency and more.
Below is a deeper view, in which the input message to the model, including the system prompt, is visible…
Finally, telemetry provides real-time visibility into AI Agents’ decision-making processes, helping to diagnose issues like latency, errors, or unexpected behaviour.
By mapping AI Agent workflows, telemetry enables developers to understand how different nodes interact, ensuring smoother and more efficient execution.
Without proper logging and monitoring, debugging AI Agents can be challenging, but telemetry simplifies troubleshooting through structured insights and visual representations.