Chunking For LLMs Using Haystack & HuggingFace

In upcoming articles I will focus on the most efficient methods available for presenting LLMs with reference data at inference.

Cobus Greyling
5 min read · Aug 28, 2023


I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

Retrieval Augmented Generation (RAG) is an approach that allows LLM-based systems to access data which serves as a contextual reference at inference.

This technique enables in-context learning without costly fine-tuning, making the use of LLMs more cost-efficient.

The practical example below, with working code, shows how information is chunked, loaded into the Haystack document store, and retrieved at inference time based on the user query.
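The dataset used in this example is already chunked, but as a minimal sketch of the idea, a long text can be split into passages of roughly 200 words. The chunk_text helper below is illustrative only and is not part of the Haystack code that follows.

def chunk_text(text, max_words=200):
    # Split a long text into word-based chunks of at most max_words words.
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

long_text = "..."  # any long reference document
chunks = chunk_text(long_text)
print(len(chunks), "chunks")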

Below is the HuggingFace dataset as seen in the dataset viewer. You can see how the information pertaining to the seven wonders of the world is chunked or divided into 151 records, each holding contextually relevant text of about 200 words.
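As a quick check, the dataset can also be loaded and inspected directly with the datasets library. The snippet below assumes the text of each record is stored in a content field, which is what Haystack's document store expects when the records are written later on.

from datasets import load_dataset

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
print(len(dataset))                     # 151 records
first = dataset[0]
print(len(first["content"].split()))    # approximate word count of the chunk
print(first["content"][:300])           # preview of the chunk text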


This image shows the basic sequence of events: the document store is defined, the chunked text is uploaded, and relevant text is retrieved with the user query or question.

For this example the LLM used is google/flan-t5-large. When this LLM is presented with the question: What was constructed to celebrate the successful defence of Rhodes city?

The answer given is Saint John’s Cathedral, which is not completely inaccurate, but is contextually wrong.

However, if the same question is asked with a contextual reference at inference time, the response from the LLM is: The Colossus of Rhodes was erected in the city of Rhodes, on the Greek island of the same name, by Chares of Lindos in 280 BC.

Below, Haystack is circumvented and the query is submitted directly to the Google LLM which will be used in the RAG code: google/flan-t5-large. On the left the question is posed in isolation, without any contextual reference. On the right the same question is asked with a contextual reference; notice the significant improvement in contextual accuracy.
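This comparison can be reproduced with a few lines of code. The sketch below assumes the transformers library is used to call google/flan-t5-large directly, and the context passage is a paraphrased chunk included for illustration only.

from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-large")

question = "What was constructed to celebrate the successful defence of Rhodes city?"

# Without context: the model answers from its parametric knowledge only.
print(generator(question)[0]["generated_text"])

# With context: the same question, prefixed with a relevant chunk of reference text.
context = (
    "The Colossus of Rhodes was a statue of the Greek sun-god Helios, "
    "erected in the city of Rhodes by Chares of Lindos to celebrate the "
    "successful defence of Rhodes against the siege of Demetrius Poliorcetes."
)
prompt = f"Related text: {context}\n\nQuestion: {question}\n\nAnswer:"
print(generator(prompt)[0]["generated_text"])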


The code below illustrates the automation of a RAG workflow using a Haystack pipeline. The dataset of chunked documents accessed by the Haystack document store is hosted on HuggingFace.

The query is submitted to the Haystack framework, which uses keyword-based (BM25) retrieval to find relevant contextual data. The two documents returned had the IDs 5dcd01886fcb24322578ceb49c96cc3e & b3de1a673c1eb2876585405395a10c3d.

The content associated with these two IDs best matches the user query, and hence these two documents were included in the prompt.
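To see which chunks are selected, the retriever defined in the code further down can also be called on its own. The attribute names (id, score, content) follow Haystack's Document class.

# Run after the retriever is created in the code below.
docs = retriever.retrieve(
    query="What was constructed to celebrate the successful defence of Rhodes city?",
    top_k=2,
)
for doc in docs:
    print(doc.id, doc.score)
    print(doc.content[:200], "\n")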

Below is the complete Python code you can run in a notebook. Notice where the dataset is defined, the model name and path in prompt_node, and the prompt.

The command print(output["answers"][0].answer) prints only the response to the question.

The command print(output) returns the complete prompt, the documents included and other information. This is really useful for debugging and testing.

%%bash

pip install --upgrade pip
pip install "farm-haystack[colab]"
pip install "datasets>=2.6.1"

from haystack.telemetry import tutorial_running
# Optional: Haystack tutorial telemetry; can be omitted.
tutorial_running(22)

from haystack.document_stores import InMemoryDocumentStore
# In-memory document store with BM25 enabled for keyword-based retrieval.
document_store = InMemoryDocumentStore(use_bm25=True)

from datasets import load_dataset
# Load the pre-chunked seven-wonders dataset from HuggingFace and index it.
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
document_store.write_documents(dataset)

from haystack.nodes import BM25Retriever
# Retriever that returns the two chunks most relevant to the query.
retriever = BM25Retriever(document_store=document_store, top_k=2)

from haystack.nodes import PromptNode, PromptTemplate, AnswerParser

# Prompt template that injects the retrieved documents and the user query.
rag_prompt = PromptTemplate(
    prompt="""Related text: {join(documents)} \n\n Question: {query} \n\n Answer:""",
    output_parser=AnswerParser(),
)
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=rag_prompt)

from haystack.pipelines import Pipeline

# Two-node pipeline: the retriever feeds its documents into the prompt node.
pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

output = pipe.run(query="What was constructed to celebrate the successful defence of Rhodes city?")

# Print only the parsed answer.
print(output["answers"][0].answer)

And the response from the RAG pipeline:

The Colossus of Rhodes was erected in the city of Rhodes, 
on the Greek island of the same name, by Chares of Lindos in 280 BC.
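For deeper debugging, as mentioned above, print(output) returns the full dict. It can also be inspected key by key; the exact keys vary with the Haystack version, so listing them first is safer than assuming.

print(output.keys())                     # e.g. query, answers, documents, ...
for doc in output.get("documents", []):  # retrieved chunks, if present in the output
    print(doc.id, doc.content[:100])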

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️



