Introducing Inspectability To HuggingFace SmolAgents
Inspecting AI Agent runs can be a challenge, and no agent environment is complete without full inspectability of those runs.
Why Should You Log Your Agent Runs?
Traditional pre-defined graph approaches require matching user requests to the closest intent, restricting runs to specific flow paths.
When users deviate from these predefined paths, the user experience degrades, and the request is typically considered to have moved out of domain.
AI Agents, on the other hand, dynamically generate chains in real time, offering a greater level of flexibility and autonomy.
To ensure transparency, it’s crucial to have full visibility into the AI Agent’s decision chains and nodes. Telemetry that maps out the agent’s path makes it possible, for instance, to identify latency issues or errors efficiently.
Raw logs can be difficult to interpret, but an interactive visual representation simplifies exploration and debugging.
To standardise instrumentation, HuggingFace uses OpenTelemetry.
OpenTelemetry allows you to seamlessly integrate an instrumentation layer, run your agents as usual, and automatically log everything to your observability platform of choice.
Getting Started
First, install the necessary packages. This example uses Phoenix by Arize AI, a powerful tool for collecting and inspecting logs.
However, any OpenTelemetry-compatible platform can be used for this purpose.
pip install smolagents
pip install arize-phoenix opentelemetry-sdk opentelemetry-exporter-otlp openinference-instrumentation-smolagents
The Phoenix server can then be started and accessed via a browser (by default at http://localhost:6006)…
python -m phoenix.server.main serve
Below is a view from the terminal…
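If you would rather stay inside a notebook, Phoenix can also be launched programmatically; a minimal sketch, assuming the arize-phoenix package installed above:

# Launch the Phoenix app from Python instead of the CLI; the returned
# session exposes the URL where the UI is served (port 6006 by default)
import phoenix as px

session = px.launch_app()
print(session.url)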
A small snippet of code needs to be added to your Python script, as shown below. The approach is reminiscent of how LangSmith telemetry is added to a LangChain application.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Phoenix receives OTLP traces on port 6006 by default
endpoint = "http://0.0.0.0:6006/v1/traces"

trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))

# Instrument smolagents so every agent run is traced automatically
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)
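Because the exporter is pluggable, any OpenTelemetry-compatible backend can receive these traces. As a quick sanity check that instrumentation is emitting spans at all, you can print them straight to the terminal with the SDK’s built-in ConsoleSpanExporter; a minimal sketch, assuming only the packages installed above:

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Print each finished span to stdout instead of sending it to Phoenix
trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)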
And below is the complete AI Agent code, which can be copied and pasted into a single Python file.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Send traces to the local Phoenix server
endpoint = "http://0.0.0.0:6006/v1/traces"
trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)

####

from typing import Optional

from smolagents import HfApiModel, LiteLLMModel, TransformersModel, tool
from smolagents.agents import CodeAgent, ToolCallingAgent

# Choose which inference type to use!
available_inferences = ["hf_api", "transformers", "ollama", "litellm"]
chosen_inference = "transformers"

print(f"Chosen inference: '{chosen_inference}'")

if chosen_inference == "hf_api":
    model = HfApiModel(model_id="meta-llama/Llama-3.3-70B-Instruct")

elif chosen_inference == "transformers":
    model = TransformersModel(
        model_id="HuggingFaceTB/SmolLM2-1.7B-Instruct",
        device_map="auto",
        max_new_tokens=1000,
    )

elif chosen_inference == "ollama":
    model = LiteLLMModel(
        model_id="ollama_chat/llama3.2",
        api_base="http://localhost:11434",  # replace with a remote OpenAI-compatible server if necessary
        api_key="your-api-key",  # replace with an API key if necessary
        num_ctx=8192,  # Ollama's default is 2048, which will often fail. 8192 works for easy tasks; more is better. See https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator to estimate the VRAM needed for the selected model.
    )

elif chosen_inference == "litellm":
    # For Anthropic: change model_id below to 'anthropic/claude-3-5-sonnet-latest'
    model = LiteLLMModel(model_id="gpt-4o")

@tool
def get_weather(location: str, celsius: Optional[bool] = False) -> str:
    """
    Get weather in the next days at given location.
    Secretly this tool does not care about the location, it hates the weather everywhere.

    Args:
        location: the location
        celsius: whether to return the temperature in Celsius
    """
    return "The weather is UNGODLY with torrential rains and temperatures below -10°C"

agent = ToolCallingAgent(tools=[get_weather], model=model)
print("ToolCallingAgent:", agent.run("What's the weather like in Paris?"))

agent = CodeAgent(tools=[get_weather], model=model)
print("CodeAgent:", agent.run("What's the weather like in Paris?"))
Once the AI Agent is run, the traces are visible in the telemetry GUI. Note the agent name, the chains, and the LLM and tool steps within the run, with input, output, latency and more.
Below is a deeper view, in which the input message to the model, including the system prompt, is visible…
Finally, telemetry provides real-time visibility into AI Agents’ decision-making processes, helping to diagnose issues like latency, errors, or unexpected behaviour.
By mapping AI Agent workflows, telemetry enables developers to understand how different nodes interact, ensuring smoother and more efficient execution.
Without proper logging and monitoring, debugging AI Agents can be challenging, but telemetry simplifies troubleshooting through structured insights and visual representations.