LangSmith
I was fortunate to get early access to the LangSmith platform, and in this article you will find practical code examples and demonstration applications for it. LangSmith, by LangChain, is a platform for testing, evaluating and monitoring LLM calls from Generative Apps.
I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
Integrating seamlessly with LangChain, arguably the leading open-source framework for building with LLMs, LangSmith enables makers to manage & monitor LLM calls from chains and intelligent agents.
LangSmith is a web-based GUI for testing and monitoring LLM calls from Generative Apps / LLM Applications.
LangSmith is again a reminder that LLM-based solutions and companies are superseded on a regular basis by free/open-sourced technologies, including standard functionality offered by LLM providers. Whilst there is a huge market developing under Generative AI, startups still need to focus on stellar UX and on solving for particular vulnerabilities within the ecosystem via a layer of differentiating proprietary software.
For starters, here are a few key considerations:
- LangSmith is not a flow builder or designer and does not supersede application flow builders like Flowise and LangFlow.
- Also, LangSmith is not focussed on prompt performance per se, like ChainForge or a product like Flux.
- Currently LangSmith does not assist with comparing prompts at scale; but LangSmith does have a playground where experimentation is possible.
- The focus of LangSmith is on managing the link between LangChain applications and Large Language Models (LLMs).
- By quantifying LLM performance, users can optimise LLM interactions; using multiple LLMs, or migrating between LLMs, becomes easier and more justifiable.
- LangSmith is ushering in an era where the LLM becomes a utility and Generative Apps become multi-LLM based, with Gen-Apps migrating between LLMs based on cost, performance and latency.
- Metrics are logged to LangSmith from a LangChain application by setting a handful of environment variables (tracing flag, project name, endpoint and API key) in the LangChain code.
- Key metrics LangSmith surfaces are run count, latency (P50, P99) & token usage per application call.
- The playground currently only provides access to OpenAI models and only for the run type LLM, and not chains.
- LLM Application data (Chain conversations, prompts, etc) can be stored, edited, rerun & managed within LangSmith.
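The multi-LLM idea from the list above can be sketched as a small selection routine. This is a minimal sketch and not LangSmith functionality: the `ModelStats` class, the `pick_model` function, the model names and all the numbers are hypothetical, standing in for metrics you would read off a dashboard such as LangSmith's.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelStats:
    """Hypothetical per-model metrics, e.g. copied from a monitoring dashboard."""
    name: str
    p50_latency_s: float       # median latency in seconds
    cost_per_1k_tokens: float  # cost in USD per 1,000 tokens


def pick_model(candidates: List[ModelStats], max_latency_s: float) -> Optional[ModelStats]:
    """Return the cheapest model whose median latency fits the budget, or None."""
    eligible = [m for m in candidates if m.p50_latency_s <= max_latency_s]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Illustrative numbers only; they are not measurements of any real model.
candidates = [
    ModelStats("model-a", p50_latency_s=1.2, cost_per_1k_tokens=0.0020),
    ModelStats("model-b", p50_latency_s=0.6, cost_per_1k_tokens=0.0300),
    ModelStats("model-c", p50_latency_s=3.5, cost_per_1k_tokens=0.0005),
]

choice = pick_model(candidates, max_latency_s=2.0)
print(choice.name if choice else "no model meets the latency budget")
```

With a two-second latency budget the sketch picks `model-a`: `model-c` is cheaper but too slow, and `model-b` is faster but more expensive. A real Gen-App would likely weigh output quality as well, which is left out here to keep the example short.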
The image below shows the basic landing page of LangSmith. At the bottom-left is basic housekeeping with API Key management, documentation and user management.
The current projects are all listed here.
Any LLM can be used within LangSmith; in this HuggingFace example I made use of google/flan-t5-xxl. However, the playground is currently only available for OpenAI.
You can also see three projects listed with run-count, total tokens, latency, etc. displayed.
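To make the project-level numbers concrete, here is a minimal sketch of how run count, total tokens and P50/P99 latency could be derived from a list of runs. The run data is made up, and the nearest-rank percentile method is an assumption; how LangSmith itself computes its percentiles is not stated here.

```python
import math
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical runs: (latency in seconds, total tokens) per application call.
runs = [(0.8, 120), (1.1, 95), (0.9, 130), (4.2, 310), (1.0, 101)]

latencies = [latency for latency, _ in runs]
summary = {
    "run_count": len(runs),
    "total_tokens": sum(tokens for _, tokens in runs),
    "p50_latency_s": percentile(latencies, 50),
    "p99_latency_s": percentile(latencies, 99),
}
print(summary)
```

The single slow run (4.2 seconds) barely moves the P50 but sets the P99, which is why both figures are worth watching.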
As seen below, each project includes both a Python and a TypeScript example of how to incorporate the LangSmith project into your code. Multiple applications can log to one LangSmith project by referencing the same unique project identifier.
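As a rough illustration of several applications sharing one project, the environment variables used in this article's examples can be wrapped in a small helper. The function name `configure_langsmith` is hypothetical and the API key is a placeholder; only the variable names and the endpoint come from the examples in this article.

```python
import os


def configure_langsmith(
    project: str,
    api_key: str,
    endpoint: str = "https://api.smith.langchain.com",
) -> None:
    """Set the tracing environment variables so runs are logged to one project."""
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = project
    os.environ["LANGCHAIN_ENDPOINT"] = endpoint
    os.environ["LANGCHAIN_API_KEY"] = api_key


# Any application that calls this with the same project name
# should have its runs listed under that one project.
configure_langsmith(project="Basic_Project_1", api_key="xxxxxxxxxxxxxxxxxx")
print(os.environ["LANGCHAIN_PROJECT"])
```

The helper has to run before the LangChain objects are created, so that tracing is already switched on when the first LLM call is made.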
Single Prompt Example
Below is the complete code from the notebook. This is the simplest example I could put together, with the LangChain parameters and a simple, single-question OpenAI prompt.
pip install -U langsmith
pip install langchain
pip install openai
import os

# LangSmith tracing configuration: runs are logged to the named project.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Basic_Project_1"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "xxxxxxxxxxxxxxxxxx"  # Replace with your LangSmith API key
os.environ["OPENAI_API_KEY"] = "xxxxxxxxxxxxxxxxxx"  # Replace with your OpenAI API key

from langchain.chat_models import ChatOpenAI

# Send a single question to the default OpenAI chat model.
llm = ChatOpenAI()
llm.predict("What is the general weather in Montreal during summer?")
The output from the LLM is shown below.
The general weather in Montreal during summer is warm and humid.
The average temperature ranges from around 20°C (68°F) to 30°C (86°F),
with occasional heatwaves pushing temperatures above 30°C (86°F).
Summers in Montreal also tend to be quite sunny, with occasional
thunderstorms and rainfall. It is advisable to pack light and breathable
clothing, as well as sunscreen and umbrellas.
The results are immediately visible within LangSmith. When opening the project Basic_Project_1, which was referenced from the Python code, the run is displayed.
Drilling down further, the trace window displays the latency, which LLM was used and the number of tokens used. The data from the run is visible and can be shared or saved to a dataset; the run can also be rated in a human-in-the-loop fashion. Or, the playground can be opened for further manual intervention…
The playground is currently only available for OpenAI, but is a good environment for creating prompts and data.
Considering all available LLM playgrounds, the best playground environment currently available in terms of functionality and the sheer number of models available is the Vercel playground.
Prompt Chaining & Agents
Below is the code of a LangChain Agent which creates an LLM Chain to answer a slightly ambiguous question: What is the year of birth, of the man who is commonly regarded as the father of the iPhone?
pip install -U langsmith
pip install langchain
pip install openai
pip install huggingface_hub

import os
from getpass import getpass

from langchain import HuggingFaceHub
from langchain import PromptTemplate, LLMChain

# Prompt for the Hugging Face token so it is not hard-coded in the notebook.
HUGGINGFACEHUB_API_TOKEN = getpass()

# LangSmith tracing configuration: runs are logged to the named project.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Basic_Project_1"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "xxxxxxxxxxxxxxxxx"  # Update to your API key
os.environ["OPENAI_API_KEY"] = "xxxxxxxxxxxxxxxxx"
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

question = "What is the year of birth, of the man who is commonly regarded as the father of the iPhone? "

# The template nudges the model to reason step by step before answering.
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

# Model hosted on the Hugging Face Hub.
repo_id = "google/flan-t5-xxl"
llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)

# Combine prompt and model into a chain, then run it on the question.
llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run(question))
And the output:
Steve Jobs was born in 1955.
Steve Jobs is commonly regarded as the father of the iPhone.
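To see what text the model actually receives, the template from the code above can be rendered on its own. This sketch uses plain `str.format` rather than LangChain's PromptTemplate so that it runs without any dependencies; for a simple single-variable template like this one the result should be the same, but that equivalence is an assumption.

```python
# Same template and question as in the chain example above.
template = """Question: {question}
Answer: Let's think step by step."""

question = "What is the year of birth, of the man who is commonly regarded as the father of the iPhone? "

# Fill the {question} placeholder to get the final prompt text.
rendered = template.format(question=question)
print(rendered)
```

The trailing "Let's think step by step." line is what encourages the model to first name the person and only then give the year, which matches the two-part answer shown above.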
After running the code, there is a new entry in our project, showing that the run type was Chain, along with the latency and the number of tokens used. For more complex Generative Apps, the run can be expanded.
There is the option to view a trace of the LLM Agent call; this goes a long way towards demystifying Autonomous Agents and their interaction with LLMs.
The interaction can be rated manually whilst adding the data to a dataset.
In the dataset, it is possible to re-run data against other LLMs; the data can also be edited, and more.
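Conceptually, re-running a dataset against other LLMs amounts to sending each stored input to each model and comparing the outputs. The sketch below illustrates only that idea and does not use the LangSmith API: `rerun_dataset` and the two stub models are hypothetical, and the stubs return fixed strings instead of calling a real LLM.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for real LLM calls: each takes a prompt and returns text.
def stub_model_a(prompt: str) -> str:
    return "Steve Jobs was born in 1955."


def stub_model_b(prompt: str) -> str:
    return "1955"


def rerun_dataset(
    examples: List[Dict[str, str]],
    models: Dict[str, Callable[[str], str]],
) -> List[Dict[str, str]]:
    """Send every stored input to every model and collect the outputs side by side."""
    results = []
    for example in examples:
        row = {"input": example["input"], "reference": example["output"]}
        for name, model in models.items():
            row[name] = model(example["input"])
        results.append(row)
    return results


# One stored example, taken from the chain run earlier in this article.
examples = [
    {
        "input": "What is the year of birth, of the man who is commonly "
                 "regarded as the father of the iPhone?",
        "output": "Steve Jobs was born in 1955.",
    }
]

results = rerun_dataset(examples, {"model_a": stub_model_a, "model_b": stub_model_b})
for row in results:
    print(row["model_a"], "|", row["model_b"])
```

Swapping a stub for a function that calls a real model would turn this into a simple side-by-side comparison, which is the kind of evidence that makes migrating between LLMs justifiable.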
When sharing an LLM run, a publicly available URL is created. Below are two screens with the details that can be viewed.
In Conclusion
There is a definite organic segmentation taking place in the LLM ecosystem.
Meta AI, HuggingFace and others are disrupting the notion that OpenAI will have a stranglehold on LLM availability.
LLMs are becoming a utility, and an ecosystem of tools is forming around them.
This ecosystem is also self-segmenting into different disciplines, ranging from prompt management and optimisation to Gen-App and flow builders and data management at scale.
As mentioned before, so-called products with poor UX and no differentiating IP are and will be superseded by the growing ambit of the standard LLM offering, open-sourced tools, or other more astute and forward-thinking start-ups.