Scaleable Prompt Pipelines For LLMs

In a previous post I wrote about the evolution of prompt engineering. Creating highly scaleable, enterprise-grade LLM-based applications demands a pipeline approach to prompts.

5 min read · Apr 18, 2023

LLM prompts yield better results when a contextual reference is defined within the prompt. As seen below, the prompt consists of four labels: instruction, context, question and answer.

The instruction tells the LLM how to respond, and the question is the input from the user. The context acts as a reference that guides the LLM on how to answer the question, while the answer label is left open for the LLM to complete.
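The four-label layout can be sketched as a plain string template. This is a minimal illustration only: the template wording, the `build_prompt` helper and the example instruction text are assumptions made for this sketch, not the exact prompt used in the Haystack example.

```python
# Hypothetical four-label template: instruction, context, question, answer.
PROMPT_TEMPLATE = (
    "Instruction: {instruction}\n\n"
    "Context: {context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)


def build_prompt(instruction: str, context: str, question: str) -> str:
    """Fill the template; the trailing 'Answer:' label is left open for the LLM."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        context=context.strip(),
        question=question.strip(),
    )


prompt = build_prompt(
    instruction="Answer the question using only the context. "
    "If the context does not contain the answer, say so.",
    context="Haystack is an open-source framework by deepset for building "
    "search and question answering systems.",
    question="Who develops Haystack?",
)
print(prompt)
```

Keeping the template separate from the values makes it clear which parts are static (the labels and, typically, the instruction) and which parts change per request (the context and the question).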

Composing a prompt manually and running it against an LLM is a straightforward process. However, compiling contextual prompts at scale requires automating the prompt creation process.

In the diagram below, the red arrow indicates the point where automation is required: based on the user's question, a semantic search is performed and the relevant piece of text is retrieved from the Knowledge Store. This text snippet then acts as the contextual reference for the prompt.

The example below, which shows how a prompt pipeline can be created for contextual prompts, is from Haystack.

The documents are stored in a Document Store, from which the question answering system retrieves the answers to user questions. For this example, Elasticsearch is used. Elasticsearch runs separately from Haystack.

Next in the pipeline is the Document Retriever. The retriever sifts through all the Documents and returns only those that are relevant to the question.

The Retriever performs document retrieval by sweeping through a DocumentStore and returning a set of candidate Documents that are relevant to the query. (BM25Retriever) ~ Haystack
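To make the idea of "sweeping through" documents and returning ranked candidates more concrete, here is a small pure-Python sketch of BM25 scoring. It is not Haystack's BM25Retriever or Elasticsearch's implementation; the `tokenize` and `bm25_rank` helpers, the parameter defaults (`k1=1.5`, `b=0.75`) and the three sample documents are assumptions chosen for illustration.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(
    query: str, documents: list[str], k1: float = 1.5, b: float = 0.75
) -> list[tuple[float, str]]:
    """Return (score, document) pairs sorted from most to least relevant."""
    if not documents:
        return []
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    scored = []
    for doc, tokens in zip(documents, tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency is dampened and normalised by document length.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


documents = [
    "Arya Stark is the younger daughter of Eddard Stark, Lord of Winterfell.",
    "Jon Snow joins the Night's Watch at the Wall.",
    "Daenerys Targaryen hatches three dragons.",
]
ranked = bm25_rank("Who is the father of Arya Stark?", documents)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

BM25 is a lexical ranking function: it rewards documents that share rare terms with the query and does not understand meaning. In a production pipeline this step is handled by the document store and retriever rather than by hand-written scoring.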

The Document Reader scans the text received from the Retriever and extracts the top answers. For this demo the RoBERTa question answering model is used.

The retrieved answer can then be submitted to an LLM as the context of the engineered prompt.

Below you can see the instruction section of the prompt, which is static for this use-case. The context was retrieved via the process described above. The user question is inserted into the prompt and the stop sequence defined for the prompt is “Answer:”.

The LLM, in this case text-davinci-003, formulates an answer based on the context provided within the prompt.
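The flow described above (retrieve context, insert it and the user question into a static instruction, then send the prompt to the LLM) could be wired together roughly as follows. This is a sketch under stated assumptions: `build_request`, `run_pipeline`, the instruction wording and the `max_tokens` and `temperature` values are hypothetical, and the retriever and LLM are passed in as plain callables so that no real service is called. The model name and the "Answer:" stop sequence are taken from the text.

```python
from typing import Callable

# Static instruction for this use case (hypothetical wording).
INSTRUCTION = "Answer the question as truthfully as possible using the provided context."


def build_request(question: str, context: str, model: str = "text-davinci-003") -> dict:
    """Assemble completion-style request parameters for one contextual prompt."""
    prompt = (
        f"{INSTRUCTION}\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 100,    # assumed value
        "temperature": 0.0,   # assumed value
        "stop": ["Answer:"],  # stop sequence named in the text
    }


def run_pipeline(
    question: str,
    retrieve: Callable[[str], str],
    complete: Callable[[dict], str],
) -> str:
    """Retrieve context for the question, build the prompt, and ask the LLM."""
    context = retrieve(question)
    request = build_request(question, context)
    return complete(request).strip()


# Stand-ins for the real retriever/reader and the real LLM call.
def fake_retrieve(question: str) -> str:
    return "Arya Stark is the younger daughter of Eddard Stark."


def fake_complete(request: dict) -> str:
    return " Eddard Stark "


answer = run_pipeline("Who is the father of Arya Stark?", fake_retrieve, fake_complete)
print(answer)
```

Passing the retriever and the LLM in as functions keeps the pipeline testable: the stubs can later be swapped for a Haystack retriever and reader and for a real completion call without changing `run_pipeline`.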

The Haystack example uses Game of Thrones as its domain, which is a broad, general domain, so the likelihood of OpenAI’s LLMs answering this question correctly is fairly good. As can be seen in the image below, the top prompt has no context, while the bottom prompt has the context injected.

It is interesting to see how the context influences the LLM output.

However, for a narrower, private domain, for instance company-specific knowledge and QnA, context will be essential.

In Conclusion

The last mile of AI implementations has been in the spotlight of late. The challenge is that companies are not realising business value and cost savings via AI as anticipated.

The solution to this problem lies on a few fronts:

In the case of Conversational UI/AI, customer intents rather than business intents should be used as the reference, that is, the conversations customers want to have. To determine customer intent, a bottom-up, data-centric approach to Conversational AI design is required.
LLMs require accurate data. This data is essential for few-shot or one-shot learning, where examples are injected into the prompt.
The next level is fine-tuning LLMs for domain- and company-specific implementations. Again, astute data curation and the conversion of unstructured data into structured LLM training data are crucial.
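As a small illustration of the few-shot point above, curated question and answer pairs can be injected ahead of the new question. The `build_few_shot_prompt` helper and the sample pairs are hypothetical and stand in for real, curated company data.

```python
from __future__ import annotations


def build_few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Place curated question/answer pairs before the new question."""
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return f"{shots}\n\nQuestion: {question}\nAnswer:"


# Hypothetical curated examples; real ones would come from company data.
examples = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the sign-in page."),
    ("Where can I download my invoice?", "Invoices are listed under Billing in your account."),
]
few_shot_prompt = build_few_shot_prompt(examples, "How do I change my email address?")
print(few_shot_prompt)
```

With a single example pair this becomes a one-shot prompt; the quality of the injected examples directly shapes the quality of the answer.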

I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language, ranging from LLMs, Chatbots, Voicebots and Development Frameworks to Data-Centric latent spaces and more.

https://www.linkedin.com/in/cobusgreyling

The Cobus Quadrant™ Of NLU Design

NLU design is vital to planning and continuously improving Conversational AI experiences.

cobusgreyling.medium.com

The Cobus Quadrant™ Of Conversation Design Capabilities

∗ This is part one of a two part series, please also take a look part two, the Cobus Quadrant of NLU Design.

cobusgreyling.medium.com

Google Colaboratory

colab.research.google.com

Build a Scalable Question Answering System | Haystack

Level: Beginner Time to complete: 20 minutes Nodes Used: ElasticsearchDocumentStore, BM25Retriever, FARMReader Goal…

haystack.deepset.ai

Generative AI Prompt Pipelines

Prompt Pipelines extend prompt templates by automatically injecting contextual reference data for each prompt.

cobusgreyling.medium.com

Preventing LLM Hallucination With Contextual Prompt Engineering — An Example From OpenAI

Even for LLMs, context is very important for increased accuracy and addressing hallucination. From the examples below…

cobusgreyling.medium.com

Written by Cobus Greyling