LlamaIndex Chat Engine
LlamaIndex is a toolkit for easily connecting Large Language Models (LLMs) to external data sources, including documents, web pages and more. By default LlamaIndex uses the OpenAI GPT-3 (text-davinci-003) model, and there are underlying features which leverage LangChain.
Underlying LLMs
By default, LlamaIndex uses the OpenAI GPT-3 text-davinci-003 model. To make use of LlamaIndex in its default install state, you will need to define your OpenAI API Key:
os.environ['OPENAI_API_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxx"
You can change the underlying LLM and its configuration; LlamaIndex achieves this by leveraging LangChain. You will, of course, need to define the environment keys and tokens required by whichever LLM you use.
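As a sketch of what this looks like, assuming the LLMPredictor and ServiceContext abstractions from the LlamaIndex versions current at the time of writing, a LangChain LLM can be swapped in like so:
from langchain.llms import OpenAI
from llama_index import LLMPredictor, ServiceContext
# Wrap any LangChain LLM; here OpenAI with explicit settings, purely as an illustration
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
# Pass the service context when building an index so the chat engine uses this LLM
# index = VectorStoreIndex.from_documents(documents, service_context=service_context)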
LlamaIndex At Its Core
LlamaIndex is a toolkit to easily connect LLMs with external data.
Data connectors link to documents, web pages, Slack, Discord, and more.
LlamaIndex Chat Engine
The LlamaIndex Chat Engine is an interface which enables you to have a conversation with your data.
The conversation enabled by the LlamaIndex Chat Engine is not merely a single dialog turn of question and answer.
It allows for a multi-turn, contextually aware conversation in which follow-up questions can implicitly reference earlier turns via conversation memory.
There are two modes:
- Condense Question Mode
- Agent Mode
Chat Engine — Condense Question Mode
Condense question and answer mode is a simple chat interface built on top of a query engine.
For each interaction:
- A standalone question is condensed from the conversation context and the last user message.
- The query engine is queried with the condensed question for a response.
This approach is simple, and works for questions directly related to the knowledge base.
Below is the code to install and run the LlamaIndex application within a notebook. You will see that I needed to install html2text, since the document I reference is a web URL.
pip install llama_index
pip install html2text
import os
import openai
# Set the OpenAI API key used by LlamaIndex
os.environ['OPENAI_API_KEY'] = "xxxxxxxxxxxxxx"
from llama_index import VectorStoreIndex, SimpleWebPageReader
# Read the web page and convert its HTML to text (this is why html2text is needed)
data = SimpleWebPageReader(html_to_text=True).load_data(["https://en.wikipedia.org/wiki/South_Africa"])
# Build a vector index over the document and expose it as a chat engine
index = VectorStoreIndex.from_documents(data)
chat_engine = index.as_chat_engine(verbose=True)
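The line above relies on the default chat mode. To be explicit, the chat_mode argument can be set; the 'condense_question' value below is based on the LlamaIndex versions current at the time of writing:
# Explicitly request condense-question mode instead of relying on the default
chat_engine = index.as_chat_engine(chat_mode='condense_question', verbose=True)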
Below is a highly contextual question based on the document provided:
response = chat_engine.chat('What languages are spoken there?')
And below is the response:
The languages spoken in South Africa are Zulu, Xhosa, Afrikaans, English,
Pedi, Tswana, Southern Sotho, Tsonga, Swazi, Venda, and Southern Ndebele.
Additionally, Fanagalo, Khoe, Lobedu, Nama, Northern Ndebele, Phuthi,
and South African Sign Language are also spoken, as well as
European languages such as Italian, Portuguese, Dutch, German, and Greek,
and Indian languages such as Gujarati, Hindi, Tamil, Telugu, and Urdu.
French is spoken by migrants from Francophone Africa.
Here a highly contextual follow-up question is asked; it is contextual not only with reference to the supplied document, but also to the previous question:
response = chat_engine.chat('Of those, which are the two minorities?')
And the correct response is received:
Fanagalo and Khoe are two languages spoken by minorities in South Africa.
Agent Mode
ReAct is an agent-based chat mode built on top of a query engine over your data, implemented via a LangChain agent.
The two lines of code further below can simply be added at the bottom of the existing code.
For each chat interaction, the agent enters a ReAct loop:
- Decide whether the query engine tool should be used.
- (Optional) Use the query engine tool and observe its output.
- Decide whether to repeat or give a final response.
Agent mode is flexible, as the agent can choose whether or not to query the knowledge base.
The performance is also more dependent on the quality of the LLM.
chat_engine = index.as_chat_engine(chat_mode='react', verbose=True)
response = chat_engine.chat('Use the tool to answer: What happened in the year 1652?')
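When experimenting with multi-turn behaviour, it can be useful to clear the conversation memory between runs; a minimal usage note, assuming the reset() method exposed on the chat engine interface:
# Clear the conversation memory so the next chat starts a fresh dialog
chat_engine.reset()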
Lastly
Below are three lines of code which can be run in the notebook to create an interactive, looping chat interface. It is quite a neat way to test a conversational interface:
from llama_index.chat_engine import SimpleChatEngine
chat_engine = SimpleChatEngine.from_defaults()
chat_engine.chat_repl()
And below is the chat interface view as seen in a Colab notebook:
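Note that SimpleChatEngine.from_defaults() converses directly with the underlying LLM and does not reference the index created earlier. To run the same looping interface over your own data, the REPL can also be started from the index-based chat engine; a minimal sketch, assuming chat_repl() is available on that engine as well:
# Run the interactive REPL on the chat engine that is backed by the index
index.as_chat_engine(verbose=True).chat_repl()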
⭐️ Please follow me on LinkedIn for updates on LLMs ⭐️
I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.