LangChain Chatbot Framework With Retrievers

In recent times, there has been significant attention on agents, though concerns have emerged regarding their level of autonomy. However, with the LangChain Chatbot Framework utilising Retrievers, it becomes possible to construct highly adaptable conversational UIs.

Cobus Greyling
6 min read · May 7, 2024


Introduction

In my previous post I shared some ideas on how LLMs have disrupted the chatbot development landscape, and how LLMs introduced AI to the different levels of chatbot development, ranging from dialog flows to Natural Language Generation (NLG) and Natural Language Understanding (NLU).

In the previous article, I also shared a working Python code example of the simplest implementation of the LangChain Chatbot Framework.

Domain Specific Knowledge

One thing chatbot use-cases and RAG have taught us is that organisations are interested in domain-specific implementations, and conversational UIs need to be flexible when domain-specific knowledge is introduced to the chatbot.

Having a working LangChain chatbot for general conversations with memory included is one thing. But for practical implementations, external data is a necessity. LangChain addresses this with retrievers.

LangChain Retrievers

A retriever serves as an interface designed to return documents in response to unstructured (natural language) queries. The documents need not be in a vector store per se and can range in format.

Unlike a vector store, which is more specific, a retriever doesn’t necessarily store documents but focuses solely on retrieving them.

While vector stores can function as the foundation of a retriever, various other retriever types exist, offering different approaches to document retrieval.
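To make the interface concrete, below is a minimal sketch of a custom retriever (my own illustration, not from the LangChain documentation): any class that extends BaseRetriever and implements _get_relevant_documents can act as a retriever, no vector store required.

from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever


class KeywordRetriever(BaseRetriever):
    """Hypothetical retriever returning documents that contain the query string."""

    documents: List[Document]

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # No vector store involved; any lookup strategy satisfies the interface.
        return [
            doc for doc in self.documents
            if query.lower() in doc.page_content.lower()
        ]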

Here you can find a table containing the index types, use-cases and a description of the different retrievers.

The primary use of a Retriever is to extract domain specific knowledge for the chatbot.

Working Code Example

Again, below in this article you will find a working code example, which you can copy and paste into a Colab notebook. The only change you will have to make is to add your own OpenAI API key in the area marked.

os.environ['OPENAI_API_KEY'] = str("<Your OpenAI Key Goes Here>")

I’m sure there are ways to optimise the code below, but I tried to make it as self-explanatory as possible.

The example uses the LangSmith documentation as the domain-specific source, and the data is stored in a vector store for retrieval.

Elements To Note

In the code below I added comments highlighting the elements to take note of.

# I had to run install quite a bit to get all the requirements covered.
%pip install --upgrade --quiet langchain-chroma beautifulsoup4
%pip install langchain_community
%pip install langchain_text_splitters
%pip install langchain_openai
%pip install langchain
#####
import os
os.environ['OPENAI_API_KEY'] = str("<Your API Key Goes Here>")
#####
# The document loader is used to pull data from a web URL
#####
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/overview")
data = loader.load()
#####
# Split the document into smaller chunks that the LLM context window
# can handle; the chunks are stored in a vector database in the next step
#####
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
#####
# Embed & store chunks in the vector database
#####
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
#####
# create a retriever from our initialized vectorstore
# k is the number of chunks to retrieve
#####
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
docs = retriever.invoke("how can langsmith help with testing?")

docs
#####
# Handling Documents
#####
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

chat = ChatOpenAI(model="gpt-3.5-turbo-1106")

question_answering_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user's questions based on the below context:\n\n{context}",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

document_chain = create_stuff_documents_chain(chat, question_answering_prompt)
#####
from langchain.memory import ChatMessageHistory

demo_ephemeral_chat_history = ChatMessageHistory()

demo_ephemeral_chat_history.add_user_message("how can langsmith help with testing?")

document_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
        "context": docs,
    }
)
#####
# Creating a retrieval chain
#####
from typing import Dict

from langchain_core.runnables import RunnablePassthrough


def parse_retriever_input(params: Dict):
    return params["messages"][-1].content


retrieval_chain = RunnablePassthrough.assign(
    context=parse_retriever_input | retriever,
).assign(
    answer=document_chain,
)
#####
response = retrieval_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    }
)

response
#####
demo_ephemeral_chat_history.add_ai_message(response["answer"])

demo_ephemeral_chat_history.add_user_message("tell me more about that!")

retrieval_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    },
)
#####
retrieval_chain_with_only_answer = (
    RunnablePassthrough.assign(
        context=parse_retriever_input | retriever,
    )
    | document_chain
)

retrieval_chain_with_only_answer.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    },
)
#####
# Query transformation
#####
retriever.invoke("how can langsmith help with testing?")
#####
retriever.invoke("tell me more about that!")
#####
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableBranch

# We need a prompt that we can pass into an LLM to generate a transformed search query

chat = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0.2)

query_transform_prompt = ChatPromptTemplate.from_messages(
    [
        MessagesPlaceholder(variable_name="messages"),
        (
            "user",
            "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation. Only respond with the query, nothing else.",
        ),
    ]
)

query_transforming_retriever_chain = RunnableBranch(
    (
        lambda x: len(x.get("messages", [])) == 1,
        # If there is only one message, we just pass that message's content to the retriever
        (lambda x: x["messages"][-1].content) | retriever,
    ),
    # If there are multiple messages, we pass the inputs to an LLM chain to transform the query, then pass to the retriever
    query_transform_prompt | chat | StrOutputParser() | retriever,
).with_config(run_name="chat_retriever_chain")
########################################################
document_chain = create_stuff_documents_chain(chat, question_answering_prompt)

conversational_retrieval_chain = RunnablePassthrough.assign(
    context=query_transforming_retriever_chain,
).assign(
    answer=document_chain,
)

demo_ephemeral_chat_history = ChatMessageHistory()
########################################################
demo_ephemeral_chat_history.add_user_message("how can langsmith help with testing?")

response = conversational_retrieval_chain.invoke(
    {"messages": demo_ephemeral_chat_history.messages},
)

demo_ephemeral_chat_history.add_ai_message(response["answer"])

response
#####
demo_ephemeral_chat_history.add_user_message("tell me more about that!")

conversational_retrieval_chain.invoke(
    {"messages": demo_ephemeral_chat_history.messages}
)
#####

Handling Documents

A helper function called create_stuff_documents_chain is used to seamlessly integrate all input documents into the prompt, managing formatting as well.

Additionally, the ChatPromptTemplate.from_messages method is used to structure the message input intended for the model, incorporating a MessagesPlaceholder where chat history messages will be directly injected.
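As a small illustration (my own example, reusing document_chain from the code above with a made-up document), the chain stuffs each Document into the {context} variable before calling the model:

from langchain_core.documents import Document
from langchain_core.messages import HumanMessage

# Hypothetical one-document example; the page_content is invented for illustration.
document_chain.invoke(
    {
        "context": [Document(page_content="LangSmith lets you visualise test results.")],
        "messages": [HumanMessage(content="How can LangSmith help with testing?")],
    }
)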

Creating A Retrieval Chain

The retriever fetches information pertinent to the last message provided by the user.

This message is extracted and used to retrieve relevant documents, which are then appended to the current chain as context.

Subsequently, the context, along with the previous messages, is passed into the document chain to generate the final answer.

To facilitate this process, the RunnablePassthrough.assign() method is used to pass intermediate steps through at each invocation.
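To see what this does in isolation, here is a minimal, self-contained sketch (my own example, not from the chains above): each assign() call passes the input dict through unchanged and adds one computed key to it.

from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# 'doubled' is a hypothetical key added purely for illustration.
chain = RunnablePassthrough.assign(
    doubled=RunnableLambda(lambda d: d["x"] * 2)
)

print(chain.invoke({"x": 3}))
# {'x': 3, 'doubled': 6}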

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn
