LLM Drift, Prompt Drift & Cascading

Prompt Chaining can be performed manually or automatically. The manual approach entails crafting chains by hand, for instance with a GUI chain-building tool. Autonomous Agents create chains on the fly as they execute, making use of the tools at their disposal. Both approaches are susceptible to cascading, LLM drift and prompt drift.

5 min read · Feb 23, 2024

LLM Drift

LLM Drift refers to marked changes in LLM responses over a relatively short period of time. It is not caused by LLMs being inherently non-deterministic, nor by slight changes in prompt wording, but by fundamental changes to the LLM itself.

A recent study found that over a period of four months, the response accuracy of GPT-4 and GPT-3.5 fluctuated considerably, sometimes for the better but, more alarmingly, also for the worse.

The study found that both GPT-3.5 and GPT-4 varied significantly and that there was performance degradation on some tasks.

Our findings highlight the need to continuously monitor LLMs behaviour over time. — Source

The schematic below shows the fluctuation in model accuracy over a period of four months. In some cases the degradation is quite stark, with a loss of more than 60% in accuracy.
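The call for continuous monitoring can be sketched as a small regression check: the same fixed evaluation set is replayed against each model snapshot, and a drop in accuracy between consecutive snapshots raises an alert. This is a minimal sketch under stated assumptions, not the study's method. The snapshot functions are hypothetical stand-ins for real LLM calls, and the toy questions and the 10-point threshold are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fixed evaluation set: (prompt, expected answer) pairs.
EVAL_SET: List[Tuple[str, str]] = [
    ("Is 17 a prime number? Answer yes or no.", "yes"),
    ("Is 21 a prime number? Answer yes or no.", "no"),
    ("Is 29 a prime number? Answer yes or no.", "yes"),
    ("Is 33 a prime number? Answer yes or no.", "no"),
]

def accuracy(ask_model: Callable[[str], str],
             eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of prompts whose normalised answer matches the expected one."""
    correct = sum(
        1 for prompt, expected in eval_set
        if ask_model(prompt).strip().lower() == expected
    )
    return correct / len(eval_set)

def detect_drift(history: Dict[str, float], threshold: float = 0.10) -> List[str]:
    """Flag consecutive snapshots whose accuracy dropped by more than threshold."""
    alerts = []
    labels = list(history)
    for prev, curr in zip(labels, labels[1:]):
        drop = history[prev] - history[curr]
        if drop > threshold:
            alerts.append(f"{prev} -> {curr}: accuracy fell by {drop:.0%}")
    return alerts

# Stand-ins for two snapshots of the "same" model (assumption: a real
# implementation would call an LLM API here instead).
def march_snapshot(prompt: str) -> str:
    number = int(prompt.split()[1])
    return "yes" if all(number % d for d in range(2, number)) else "no"

def june_snapshot(prompt: str) -> str:
    return "Yes"  # simulated drift: the model now answers "yes" to everything

history = {
    "march": accuracy(march_snapshot, EVAL_SET),
    "june": accuracy(june_snapshot, EVAL_SET),
}
alerts = detect_drift(history)
print(history)
print(alerts)
```

Run on a schedule against a production model, the same loop would turn silent drift into an explicit alert.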

Prompt Drift

The output of LLMs is non-deterministic: the exact same input, sent to the same LLM at different times, will most probably yield different responses.

In itself this is not a problem; the wording can differ while the ground truth remains the same.

However, there are instances where the LLM's response is aberrant for other reasons. For instance, models are deprecated and migration becomes necessary, as we saw recently when OpenAI deprecated a number of models. The prompt remains the same, but the underlying model it references changes.

The data injected into the prompt at inference might also differ from one call to the next. Suffice it to say, all of these factors contribute to a phenomenon known as prompt drift.

Prompt Drift is the phenomenon where a prompt yields different responses over time due to model changes, model migration or changes in the data injected into the prompt at inference.

In short, prompt drift can be caused by:

Model-inspired tangents,
Incorrect problem extraction,
LLM randomness and creative surprises.

Prompt management and testing interfaces have emerged in response, such as ChainForge and, more recently, LangSmith from LangChain, together with commercial offerings like Vellum and others.

There is a definite market need to ensure that generative applications (Gen-Apps) can be tested prior to large language model migration or deprecation.
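A pre-migration test could be sketched as follows: a fixed set of prompts is replayed against the current model and the candidate replacement, and any response that diverges too far is flagged for review. This is only an illustration. `current_model` and `candidate_model` are hypothetical stand-ins for real LLM calls, and string similarity from Python's `difflib` is a crude proxy chosen to keep the example dependency-free. It does not reproduce the interface of any of the tools named above.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def migration_report(prompts: List[str],
                     current: Callable[[str], str],
                     candidate: Callable[[str], str],
                     min_similarity: float = 0.8) -> List[Dict[str, object]]:
    """Return the prompts whose responses differ too much between the two models."""
    flagged = []
    for prompt in prompts:
        old, new = current(prompt), candidate(prompt)
        score = similarity(old, new)
        if score < min_similarity:
            flagged.append({"prompt": prompt, "old": old, "new": new,
                            "score": round(score, 2)})
    return flagged

# Hypothetical canned responses standing in for two model versions.
OLD_ANSWERS = {
    "Capital of France?": "Paris",
    "Capital of Japan?": "Tokyo",
}
NEW_ANSWERS = {
    "Capital of France?": "Paris",
    "Capital of Japan?": "I cannot help with that request.",
}

def current_model(prompt: str) -> str:
    return OLD_ANSWERS[prompt]

def candidate_model(prompt: str) -> str:
    return NEW_ANSWERS[prompt]

flagged = migration_report(list(OLD_ANSWERS), current_model, candidate_model)
for item in flagged:
    print(item)
```

In practice a semantic comparison (embeddings, or a task-specific grader) would likely replace the string ratio, since valid answers can differ in wording while the ground truth stays the same.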

And if a generative application could be largely agnostic to the underlying LLM, so much the better. One avenue for moving closer to this is to leverage the in-context learning (ICL) capabilities of large language models.
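With in-context learning, the instruction, the reference data and a few worked examples all travel inside the prompt itself, so the same prompt text can be sent to whichever model sits underneath. A minimal sketch of such a prompt builder follows. `build_icl_prompt` and the sample data are hypothetical, and the layout of the prompt is one assumption among many possible ones.

```python
from typing import List, Tuple

def build_icl_prompt(instruction: str,
                     examples: List[Tuple[str, str]],
                     context: str,
                     question: str) -> str:
    """Assemble instruction, injected context and few-shot examples into one prompt."""
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return (
        f"{instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"{shots}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_icl_prompt(
    instruction="Answer using only the context. Reply with a single word.",
    examples=[
        ("Which plan includes phone support?", "Premium"),
        ("Which plan is free?", "Basic"),
    ],
    context="Basic is free. Premium costs 10 USD and includes phone support.",
    question="Which plan costs 10 USD?",
)
print(prompt)
```

Because the model is asked to rely on what is in the prompt rather than on what it memorised, the application leans less on the behaviour of any one model version, although it does not remove the need for testing.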

Cascading

Cascading occurs when one of the nodes in a chain introduces an aberration or deviation, and this unexpected output is carried over to the next node, where it will most probably be exacerbated.

With each node, the output deviates further and further from the intended outcome.

Consider the image below:

1. The user input can be unexpected or unplanned for in the chained application, producing an unforeseen output from the node.
2. The previous node’s output can be inaccurate or carry a degree of deviation which is exacerbated in the current node.
3. The LLM response can also be unexpected, because LLMs are non-deterministic. Point three is where prompt drift or LLM drift can be introduced.
4. The output from Node 2 is then carried over to the next node, and the deviation cascades.
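The four points above can be illustrated with a toy chain. In this sketch each node is a plain Python function standing in for a prompt sent to an LLM (an assumption made to keep the example runnable). The first node deviates from its expected output, and the deviation is carried through to the final answer. The optional per-node checks show one possible mitigation, halting the chain at the node where the deviation first appears. All names here are hypothetical.

```python
from typing import Callable, List, Optional

Node = Callable[[str], str]
Check = Callable[[str], bool]

class ChainHalted(Exception):
    """Raised when a node's output fails its check."""

def run_chain(nodes: List[Node], user_input: str,
              checks: Optional[List[Check]] = None) -> str:
    """Pass the output of each node to the next, optionally validating each step."""
    value = user_input
    for index, node in enumerate(nodes):
        value = node(value)
        if checks is not None and not checks[index](value):
            raise ChainHalted(
                f"node {index + 1} produced unexpected output: {value!r}"
            )
    return value

def extract_city(text: str) -> str:
    # Simulated deviation: the "LLM" returns a sentence instead of a bare city name.
    return "The user seems to be asking about Paris, or maybe London."

def build_query(city: str) -> str:
    return f"weather in {city}"

def answer(query: str) -> str:
    return f"Here is the forecast for: {query}"

nodes = [extract_city, build_query, answer]
question = "What's the weather like in Paris?"

# Without checks, the deviation from node 1 flows into every later node.
unguarded = run_chain(nodes, question)
print(unguarded)

# With checks, the chain stops where the deviation is introduced.
checks = [
    lambda out: len(out.split()) <= 3,          # a city name should be short
    lambda out: out.startswith("weather in "),  # query must follow the template
    lambda out: True,                           # no check on the final answer
]
try:
    run_chain(nodes, question, checks)
    halted_at = None
except ChainHalted as err:
    halted_at = str(err)
print(halted_at)
```

The checks are deliberately simple. The point is only that validating each node's output keeps an early deviation from compounding further down the chain.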

In Closing

Prompt chaining should not be viewed in isolation; rather, Prompt Engineering should be considered a discipline which consists of several legs.

The wording or technique followed when prompting the LLM is also important and has a demonstrable effect on the quality of the output.

Prompt Engineering is the foundation of Chaining and the discipline of Prompt Engineering is very simple and accessible.

However, as the LLM landscape develops, prompts are becoming programmable (templates and context injection via RAG) and are being incorporated into increasingly complex structures.

Hence chaining is being supported by elements like Agents, Pipelines, Chain-of-Thought Reasoning, etc.

I’m currently the Chief Evangelist @ Kore.ai. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

How is ChatGPT's behavior changing over time?

GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models…

arxiv.org

PromptChainer: Chaining Large Language Model Prompts through Visual Programming

While LLMs can effectively help prototype single ML functionalities, many real-world applications involve complex tasks…

arxiv.org

Language Model Cascades

Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a…

arxiv.org

An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned…

arxiv.org

Written by Cobus Greyling