Using 🦜🛠️LangSmith To Inspect LangChain Agents

Even though LangChain has removed many blockers to creating LLM-based generative apps, it is still deceptively hard to take applications from prototype to production. LangChain has identified application performance as the biggest blocker, and LangSmith is here to close the gap between prototype and production.

Cobus Greyling
5 min read · Aug 8, 2023


I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

As you will see in the practical code example below, LangSmith closes the loop between debugging, testing, evaluation and monitoring.

LangSmith lends insight into what the prompt sent to the LLM looks like, after the template has been formatted.
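To make the idea concrete, here is a minimal sketch of what "formatting a template" means, using plain Python string formatting rather than the actual LangChain PromptTemplate API; the template text and variable names are illustrative only:

```python
# Illustrative only: a plain-Python stand-in for a prompt template.
# The template text and variable names below are hypothetical.
template = (
    "Answer the following question as best you can.\n"
    "You have access to the following tools: {tools}\n"
    "Question: {question}"
)

# The fully formatted string is the kind of final prompt LangSmith
# surfaces in the trace, after variable substitution.
prompt = template.format(
    tools="Search, Calculator",
    question="What is the height of Mount Everest?",
)
print(prompt)
```

LangSmith shows this post-substitution view, which is what the LLM actually receives.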

The sequence followed by the agent, and how it rotates between tools, is decomposed step by step and can be inspected.


The sequence followed by the agent is visible, and the tokens used and cost can be tracked for each agent iteration.

Below you see how a single call to an agent is decomposed into all the steps followed by the agent, and the tools used. With the LLM input and output along the way.

Notice how the tokens and latency are given within the agent trace; and how it is possible to navigate from node to node with relevant data being surfaced.

LangSmith Agent View

The Python Code

The code block below is a simple yet complete example of how to create a LangChain agent, and how to incorporate the LangSmith tracing calls in the code.

The last four lines in the code below:

  • Set tracing to true,
  • Define the endpoint,
  • Set the project API key, and
  • Set the LangSmith project name.
pip install langchain
pip install -U langsmith
pip install google-search-results
pip install openai

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

import os
os.environ["OPENAI_API_KEY"] = "xxxxxxxxxxxxxxxxxx"
os.environ["SERPAPI_API_KEY"] = "xxxxxxxxxxxxxxxxxx"
llm = OpenAI(temperature=0, model_name='gpt-4')

from uuid import uuid4

unique_id = uuid4().hex[0:8]

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "xxxxxxxxxxxxxxxxxx"
os.environ["LANGCHAIN_PROJECT"] = "Agent_1"


Below, the tools are loaded for the agent:

  • SerpApi (search), and
  • LLM Math (which itself makes use of the LLM).

And the agent is initialized:

tools = load_tools(["serpapi", "llm-math"], llm=llm)

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

Here is the question which will be posed to the agent:

What is the square root of the height in metres of what is commonly considered as the highest mountain on Earth?

The code to run the agent with the question:

agent.run("What is the square root of the height in metres of what is commonly considered as the highest mountain on Earth?")

And the answer returned by the agent:

The square root of the height in meters of Mount Everest is approximately 94.07.

Here is the full agent response:

> Entering new AgentExecutor chain...
The highest mountain on Earth is commonly considered to be Mount Everest. I need to find out its height in meters and then calculate the square root of that number.
Action: Search
Action Input: "Height of Mount Everest in meters"
Observation: 8,848.9 m
Thought:Now that I know the height of Mount Everest in meters, I need to calculate the square root of 8848.9.
Action: Calculator
Action Input: sqrt(8848.9)
Observation: Answer: 94.06859199541577
Thought:I now know the final answer
Final Answer: The square root of the height in meters of Mount Everest is approximately 94.07.

> Finished chain.
The square root of the height in meters of Mount Everest is approximately 94.07.
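The Calculator step can be reproduced locally to sanity-check the agent's final answer, using the 8,848.9 m height the Search tool observed above:

```python
import math

# The height the Search tool observed for Mount Everest.
height_m = 8848.9

# The Calculator step: square root of the height.
root = math.sqrt(height_m)
print(round(root, 2))  # → 94.07, matching the agent's final answer
```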

In the image below five projects are visible; a specific project is accessed via the project name set in the code with LANGCHAIN_PROJECT.

For our test, all agent runs are logged against the Agent_1 project in the following way: os.environ["LANGCHAIN_PROJECT"] = "Agent_1".

Back To LangSmith

In the view below you can see the agent has run 13 times, 6,349 tokens have been used, and the P50 and P99 latencies are visible.
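For readers unfamiliar with the metrics, P50 and P99 are simply the 50th and 99th percentiles of run latency; a quick sketch with hypothetical latency values:

```python
import statistics

# Hypothetical per-run latencies in seconds; the real values come
# from the runs LangSmith logged for the project.
latencies = [0.8, 1.1, 1.3, 1.9, 2.2, 2.4, 3.1, 3.3, 4.0, 4.2, 5.5, 6.1, 9.8]

# quantiles(n=100) returns the 99 cut points P1..P99.
cuts = statistics.quantiles(latencies, n=100, method="inclusive")
p50, p99 = cuts[49], cuts[98]
print(f"P50: {p50:.2f}s  P99: {p99:.2f}s")
```

P50 is the typical run; P99 exposes the slow tail that a plain average would hide.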

Clicking through one level deeper, into the Agent_1 project, the individual runs are visible, both successes and failures, along with the agent type and agent description.

The view below is one I really like, as it breaks down the Agent’s behaviour chain by chain, showing the tool used in each step.

For instance, with the Search step selected below, zero tokens were used, the SerpApi tool was invoked, and the step took 3.13 seconds.

Clicking on the Search tool, the input and output of the tool are visible.

What is really handy about the agent breakdown:

  • Transparency is introduced in terms of the agent's behaviour.
  • It is clear which requests invoke which tools, especially in instances where an agent has a larger number of tools.
  • The time and tokens (cost) spent by each tool are visible, and it is possible to optimise the agent at a tool level.
  • As seen below, the prompt template used is visible in LangSmith, lending insight into how the prompt template is constituted and what the final version looks like, prior to LLM submission.
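The tokens-to-cost conversion mentioned above can be sketched as follows; the prompt/completion split is hypothetical, and the rates assume GPT-4 8K-context pricing at the time of writing ($0.03 per 1K prompt tokens, $0.06 per 1K completion tokens):

```python
# Hypothetical prompt/completion split of the 6,349 tokens the
# project view reports; pricing assumes GPT-4 8K-context rates
# ($0.03 / 1K prompt tokens, $0.06 / 1K completion tokens).
prompt_tokens = 5200
completion_tokens = 1149

cost = prompt_tokens / 1000 * 0.03 + completion_tokens / 1000 * 0.06
print(f"${cost:.4f}")  # estimated spend across all runs
```

Because LangSmith reports tokens per step, the same arithmetic can be applied per tool to find the expensive parts of an agent.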


