Using LangChain To Create Large Language Model (LLM) Applications Via HuggingFace

LangChain is an open-source framework that facilitates the creation of LLM-based applications and chatbots.

Cobus Greyling
7 min read · Jan 31, 2023


I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.

What makes LangChain important is the need to move past the playground and experimentation phase, towards productionising Large Language Model (LLM) functionality.

The answer to developing on LLMs is not necessarily the existing Conversational AI development frameworks, even though they are one of the options.

Nor should bespoke pro-code frameworks that interface to and leverage LLMs be seen as inevitable.

As you will see in this article, LangChain is an alternative: a structured and intuitive framework for creating LLM-based applications and conversational interfaces.

LangChain also contributes to a shared understanding and way of working among LLM developers. A uniform approach can help standardise LLM implementations and expectations, while demystifying market expectations on cost and performance.

In a recent article I wrote about the challenges of conversation flow design when leveraging Large Language Models (LLMs).

I also illustrated a few scenarios showing how LangChain can be used to create chatbots and other LLM-based applications.

This article covers two aspects of LLMs:

1️⃣ An example of using LangChain to interface to the HuggingFace Inference API for a QnA chatbot.

2️⃣ A few practical examples illustrating how to introduce context into the conversation via a few-shot learning approach, using LangChain and HuggingFace.

Setting up HuggingFace🤗 For QnA Bot

You will need to create a free account at HuggingFace, then head to settings under your profile. As seen below, I created an access token with the name LangChain.

Below is the complete Python code for the LangChain QnA bot interfacing to HuggingFace. Notice where you will have to add your HuggingFace API token, and where the question is added.

# In a notebook, first run: pip install langchain[all]
import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "xxxxxxxxxxxxxxxxxxx"

from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm = HuggingFaceHub(repo_id="google/flan-t5-xl", model_kwargs={"temperature": 1e-10})

# Chain the prompt and the model together.
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "When was Google founded?"

print(llm_chain.run(question))

And the result from the query:

Google was founded in 1998. The final answer: 1998.

Any HuggingFace model can be accessed by navigating to the model on the HuggingFace website and clicking the copy icon, as shown below. In the code, set repo_id equal to the clipboard contents.

For example, as shown in the image, the reference to the bloom model is copied:

repo_id="bigscience/bloom"
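
For instance, a minimal sketch of pointing the chain from the earlier example at BLOOM; only the repo_id changes, and the prompt and LLMChain are reused as-is:

# Swap the model by changing repo_id; the rest of the chain is unchanged.
llm = HuggingFaceHub(repo_id="bigscience/bloom", model_kwargs={"temperature": 1e-10})
llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run("When was Google founded?"))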

With the notebook on the left, and the HuggingFace model card result on the right:

Two caveats I need to add. Firstly, some of the HuggingFace models I referenced from the Colab Notebook timed out; this might be related to being on a free tier or the like.

Secondly, the results do seem to differ between the Colab Notebook and model card queries. In general, LLMs are non-deterministic, meaning that identical inputs can yield different outputs. There are options to set temperature, response length, etc., but a small amount of variability may remain.
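
To narrow (though not eliminate) this variability, the sampling parameters can be pinned via model_kwargs. A minimal sketch, assuming the Inference API honours the temperature and max_length parameters for this model:

# A near-zero temperature makes output as deterministic as the API allows;
# max_length caps the length of the generated response.
llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 1e-10, "max_length": 64},
)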

Few-Shot Learning Contextual Chatbot

This example demonstrates the simplest way conversational context can be managed within an LLM-based chatbot…

An LLM can be used in a generative approach, as seen below in the OpenAI playground example. The initial input (red block number 1) is submitted to the LLM.

This initial prompt contains a description of the chatbot and the first human input.

Red block number 2: the response from the LLM (in this case text-davinci-003).

Red block number 3: to continue the conversation, blocks 1 and 2 plus the new block 3 are submitted.

Red block number 4: the LLM responds with block 4; this response is informed by and premised on the context of blocks 1, 2 and 3, which are submitted together.

So, in simple terms, each dialog turn is buffered with the conversational memory… and I know what you are thinking: this buffer can get too large in terms of sheer size and LLM cost.

Longer conversations can be handled in two ways:

  1. Truncating the conversational history, hence removing the first portion of the conversation at set stages. This approach is analogous to limiting log files to a certain size via rolling logs (see the sketch after this list).
  2. Making use of LLMs to summarise the conversation history as the conversation continues.
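
A minimal, framework-agnostic sketch of the first approach, where the history is held as a plain list of (speaker, text) turns; add_turn and build_context are illustrative helpers, not LangChain APIs:

MAX_TURNS = 10  # keep only the most recent turns, like a rolling log

history = []  # list of (speaker, text) tuples

def add_turn(speaker, text):
    history.append((speaker, text))
    # Truncate: drop the oldest turns once the buffer exceeds the limit.
    del history[:-MAX_TURNS]

def build_context():
    # Render the retained turns as the context passed to the LLM.
    return "\n".join(f"{speaker}: {text}" for speaker, text in history)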

Within LangChain, ConversationBufferMemory can be used as a type of memory that collates all the previous input and output text, and adds it to the context passed with each dialog turn sent from the user.

What I like is that LangChain has three approaches to managing context:

⦿ Buffering: This option allows you to pass the last N interactions in as contextual reference, where N is a fixed number.

⦿ Summary: Summarising the conversation and making use of the summary instead of the verbatim dialog. Compared to buffering, summarisation compresses the contextual information. There will be some loss, but engineered prompt lengths will remain within bounds.

⦿ Combination: A combination of buffering and summarisation, where a summary is generated together with the verbatim responses from previous interactions. This approach is more balanced: for example, the last five dialog turns can be verbatim, while older dialogs are summarised.
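
A hedged sketch of how these three approaches map onto LangChain memory classes; I am assuming the ConversationBufferWindowMemory, ConversationSummaryMemory and ConversationSummaryBufferMemory classes and the parameters shown, so check the current documentation, as names can change between versions:

from langchain.chains.conversation.memory import (
    ConversationBufferWindowMemory,
    ConversationSummaryMemory,
    ConversationSummaryBufferMemory,
)

# Buffering: keep the last k interactions verbatim.
buffer_memory = ConversationBufferWindowMemory(k=5)

# Summary: use the LLM itself to summarise the history as it grows.
summary_memory = ConversationSummaryMemory(llm=llm)

# Combination: recent turns verbatim, older turns summarised, bounded by tokens.
combined_memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=650)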

And finally, below is a code example referencing the HuggingFace Inference API, with the user input:

from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

conversation.predict(input="Hi there!")

And the LLM response:

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi there!
AI:

> Finished chain.
Hi there!

User input:

conversation.predict(input="Tell me more about yourself?")

And again the response from the LLM:

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi there!
AI: Hi there!
Human: Tell me more about yourself?
AI:

> Finished chain.
I'm a student at the University of Washington.

In Conclusion

The best way to get started is to head to the LangChain documentation and start prototyping in a Notebook. 🙂

⭐️ Please follow me on LinkedIn for updates on Conversational AI ⭐️


https://www.linkedin.com/in/cobusgreyling

https://langchain-hub-ui-production.up.railway.app/
