Prompt Chaining

Visual programming tools are emerging that chain large language model (LLM) prompts into an application, which in most cases is a conversational UI.

Cobus Greyling
Apr 6, 2023


Some Background

A number of chatbot development frameworks have added LLM functionality, intended to complement their existing chatbot development affordances. However, these LLM implementations have been very similar to one another.

This initial implementation was followed by a second wave of conversational flow generation and conversation design assistants, spearheaded by Cognigy and subsequently followed by Yellow AI and Kore AI.

LLMs are highly versatile with open-ended capabilities.

Voiceflow & Botpress have introduced functionality resembling prompt chaining, to the extent that a prompt can be submitted to an LLM and values can be extracted from the generated response.

In this article I want to consider the underlying principles when designing and building a native prompt chaining visual programming interface.

Native Prompt Chaining User Interface

When chaining Large Language Model prompts through a Visual Programming UI, the largest part of the functionality will be the GUI which facilitates the authoring process.

Below is an image of what such a GUI for prompt engineering and prompt chain authoring might look like. This design originated from research performed by the University of Washington and Google.


The interface needs to offer granularity on multiple levels of the authoring process. In any chain, the individual steps are linked by connections, and these connections are part of what must be authored.

Any prompt chaining or LLM chaining tool must support data transformation between the steps of the chain. Considering that in most cases the LLM will return unstructured data, there will be a level of structuring and data transformation required.
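As a minimal sketch of such a transformation step, the Python snippet below turns free-form LLM text into a list that a next step can consume. The function name `parse_numbered_list` and the sample output are assumptions made for illustration; they do not come from any specific tool.

```python
import re


def parse_numbered_list(llm_text: str) -> list[str]:
    """Turn free-form LLM text containing a numbered list into a list of items."""
    items = []
    for line in llm_text.splitlines():
        # Match lines such as "1. item", "2) item" or "3 - item".
        match = re.match(r"^\s*\d+\s*[.)\-]\s*(.+?)\s*$", line)
        if match:
            items.append(match.group(1))
    return items


# Hypothetical raw output from an upstream "brainstorm" step.
raw_output = """Sure! Here are some ideas:
1. Visit the harbour
2) Hike the mountain trail
3. Try the local market
Let me know if you want more."""

ideas = parse_numbered_list(raw_output)
print(ideas)
```

The conversational filler around the list is discarded, so the next prompt in the chain receives only the structured items.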

A second element is the slightly unpredictable nature of LLMs. An LLM can generate multiple different responses to the same prompt. This gives rise to the challenge of cascading undesired responses, where unexpected data is propagated through the rest of the chain.
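One way to contain this, sketched below under stated assumptions, is to validate each step's output before passing it on, and to resubmit the prompt when validation fails. The names `run_step_with_guard` and `fake_llm` are hypothetical; `fake_llm` merely stands in for a real model call.

```python
from typing import Callable


def run_step_with_guard(
    call_llm: Callable[[str], str],
    prompt: str,
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Resubmit the prompt until the output passes validation, or fail loudly."""
    for _ in range(max_attempts):
        output = call_llm(prompt)
        if is_valid(output):
            return output
    # Stop the chain here instead of propagating unexpected data downstream.
    raise ValueError(f"No valid output after {max_attempts} attempts")


# Stand-in for a real model: returns an unusable answer first, then a usable one.
_responses = iter(["I am not sure.", "positive"])


def fake_llm(prompt: str) -> str:
    return next(_responses)


label = run_step_with_guard(
    fake_llm,
    "Classify the sentiment as positive or negative: 'Great service!'",
    is_valid=lambda text: text.strip().lower() in {"positive", "negative"},
)
print(label)
```

Failing loudly at the step where the problem occurs keeps a single bad response from silently shaping every later step.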

Because there is no compiling of code, prompts can be customised at run-time without any specific model training required.

New tasks can easily be absorbed by the LLM Chained application, simply by taking in natural language instructions called prompts.
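The sketch below illustrates this idea with Python's standard `string.Template`. The `PROMPTS` registry, the task names and `build_prompt` are assumptions for illustration only; the point is that a new task is added by registering a new natural language instruction at run-time.

```python
from string import Template

# Hypothetical prompt templates; each task is only a natural language instruction.
PROMPTS = {
    "summarise": Template("Summarise the following text in $n sentences:\n$text"),
    "translate": Template("Translate the following text into $language:\n$text"),
}


def build_prompt(task: str, **values: object) -> str:
    """Fill in a template at run-time; raises KeyError if a value is missing."""
    return PROMPTS[task].substitute(**values)


# Adding a task at run-time requires no compiling and no model training.
PROMPTS["title"] = Template("Suggest a short title for this text:\n$text")

prompt = build_prompt("title", text="Prompt chaining links several LLM calls.")
print(prompt)
```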

Granted, many real-world applications involve complex and often parallel multi-step tasks which pose a challenge for a single instance of chains.

Considering the functionality stack below, a prompt chaining application based on Large Language Models (LLMs) & Generative AI should have three main components:

⏺ LLMs



Adapted from source


For many chains/nodes, the LLM output can be used unedited, exactly as supplied by the LLM. In most cases this output is either presented directly to the user, or it requires data transformation before being passed to the next chain in the LLM App (aka Gen App).

The LLM output determines which chain to branch to next, hence the chain/node edges need to contain rules and conditions.
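A minimal sketch of edges carrying conditions could look like the following. The `Edge` class, `next_node` and the node names are hypothetical; each condition here is a plain predicate over the LLM output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Edge:
    """A connection to a target node, taken only if its condition holds."""
    target: str
    condition: Callable[[str], bool]


def next_node(llm_output: str, edges: list[Edge], default: str) -> str:
    """Return the target of the first edge whose condition matches the output."""
    for edge in edges:
        if edge.condition(llm_output):
            return edge.target
    return default


# Hypothetical edges leaving an "understand request" node.
edges = [
    Edge("refund_flow", lambda out: "refund" in out.lower()),
    Edge("summarise_first", lambda out: len(out) > 200),
]

print(next_node("The customer is asking for a REFUND.", edges, default="fallback"))
print(next_node("Unclear request.", edges, default="fallback"))
```

Edges are evaluated in order, so the order of the list acts as a priority, and the `default` target catches output that no rule anticipated.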

A last consideration is using the LLM as a classifier. LLMs can be implemented in a generative or a predictive capacity. Generative AI is the most popular and most accessible side of LLMs. Predictive use is trickier and requires more fine-tuning, or at least precise prompts.

Leveraging an LLM as a classifier (hence predictive) within a chain makes the decision-making nodes of the chaining application intelligent and flexible.
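To make this concrete, here is a rough sketch of a classifier node. It assumes the model is reachable through some callable `call_llm` (hypothetical, as are `classify` and `fake_llm`), constrains the answer to a fixed label set through a precise prompt, and falls back to a default label when the answer cannot be mapped.

```python
from typing import Callable


def classify(
    call_llm: Callable[[str], str],
    text: str,
    labels: list[str],
    fallback: str = "other",
) -> str:
    """Ask the model for exactly one label and normalise its free-text answer."""
    prompt = (
        "Classify the message into exactly one of these labels: "
        + ", ".join(labels)
        + f".\nMessage: {text}\nLabel:"
    )
    answer = call_llm(prompt).strip().lower()
    # Simple substring match; assumes no label is contained in another label.
    for label in labels:
        if label.lower() in answer:
            return label
    return fallback


# Stand-in for a real model call, keyed on a word in the message.
def fake_llm(prompt: str) -> str:
    message = prompt.split("Message:")[1]
    return " Billing." if "money" in message else "I am not sure"


print(classify(fake_llm, "I want my money back", ["billing", "technical"]))
print(classify(fake_llm, "Hello there", ["billing", "technical"]))
```

The returned label can then drive the branching decision, while the fallback keeps an off-script answer from selecting an arbitrary branch.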

There will always be the temptation to quickly code or script decision making into the edges, also referred to as the chain-transition points. This is a quick fix, but it again introduces rigidity into the LLM chaining application.


⏺ Helpers

Helpers assist in the evaluation of LLM output, considering aspects like politeness, empathy, etc. Helpers can also re-rank LLM outputs or resubmit prompts.
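As an illustration, a re-ranking helper could be sketched as below. The keyword-based `politeness_score` is a deliberately crude assumption; in practice the score could come from a dedicated model or from another LLM prompt.

```python
# Hypothetical markers used as a crude proxy for politeness.
POLITE_MARKERS = ("please", "thank", "sorry", "happy to help")


def politeness_score(text: str) -> int:
    """Count polite markers in the text (illustrative heuristic only)."""
    lowered = text.lower()
    return sum(lowered.count(marker) for marker in POLITE_MARKERS)


def rerank(candidates: list[str]) -> list[str]:
    """Order candidate LLM outputs from most to least polite."""
    return sorted(candidates, key=politeness_score, reverse=True)


candidates = [
    "No.",
    "Sorry, that is not possible. Thank you for asking.",
    "Please try again later.",
]
best = rerank(candidates)[0]
print(best)
```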

As granularity is a requirement for any production implementation, tweaking the prompt chained application will be important. A form of scripting will be required for processing and transformation of data.

Scripting encapsulates the development affordances given to the chain author to implement control measures, data transformation and more.


⏺ Communication

Communication takes place on a few levels. There is user data input in the form of unstructured conversational data.

There are also other user actions, like asking the user to select the best response or to disambiguate between a few possibilities.

And of course third-party systems (APIs) will be interrogated for any enterprise or production implementation.

Considering The Design Of PromptChainer

The PromptChainer interface has a chain view visualiser where the chain structure with node-edges or “between-chains” can be created, edited, read and deleted [A].

The node viewer supports the implementing, improving and testing of each individual node or chain [B]. The editing of prompts for each node can be performed here.

PromptChainer also supports running the chain end-to-end [C] with options to clear the cache, view run logs, etc.
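The sketch below shows, in simplified form, what end-to-end execution with a cache and a run log might involve. It is not PromptChainer's implementation; `ChainRunner`, the templates and `fake_llm` are assumptions for illustration.

```python
from typing import Callable


class ChainRunner:
    """Runs prompt templates in sequence, caching model calls and logging each step."""

    def __init__(self, call_llm: Callable[[str], str]) -> None:
        self.call_llm = call_llm
        self.cache: dict[str, str] = {}
        self.log: list[str] = []

    def run(self, templates: list[str], user_input: str) -> str:
        data = user_input
        for index, template in enumerate(templates, start=1):
            # The output of the previous step fills the {input} slot of the next.
            prompt = template.format(input=data)
            if prompt in self.cache:
                data = self.cache[prompt]
                self.log.append(f"step {index}: cache hit")
            else:
                data = self.call_llm(prompt)
                self.cache[prompt] = data
                self.log.append(f"step {index}: called model")
        return data

    def clear_cache(self) -> None:
        self.cache.clear()


# Stand-in for a real model call that records how often it is invoked.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return prompt.upper()


templates = ["Extract keywords: {input}", "Write a title from: {input}"]
runner = ChainRunner(fake_llm)
first = runner.run(templates, "hello")
second = runner.run(templates, "hello")
print(runner.log)
```

With the cache in place, the second run reuses both earlier results, which is useful when only one node is being edited. Clearing the cache forces a fresh call for every step.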


Based on the user research performed by the University of Washington and Google Research, discoveries were made on how users build and debug chains:

[1] Users want to build chains not only to mitigate LLM limitations, but also to make their applications extensible and scalable.

[2] Some users built one step of a chain at a time, while others sketched out abstract placeholders for all steps before filling them in.

[3] The interactions between multiple LLM prompts and chains can be complex, requiring both local and global debugging of prompts.

Considering the image below, here is a description for each node or chain:

[1] Define the input to a chain.

[2,3] Use LLM output to filter and branch out inputs.

[4] Use the LLM output directly as the node output.

[5] Pre-implemented JavaScript functions for typical data transformation.

[6] Use the LLM output directly as the node output.

[7] Call external functions to connect professional services with LLMs.

[8] User-defined JS functions, in case pre-defined helpers are insufficient.

[9] Use the LLM output directly as the node output.

[10] Filter or re-rank LLM outputs by human-designed criteria, e.g., politeness. Enables external (end user) editing on intermediate data points.
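For readers who prefer to see the node list as a data structure, the numbered nodes above could be modelled roughly as follows. The enum member names are my own labels, chosen for illustration; only the descriptions are taken from the list.

```python
from collections import Counter
from enum import Enum


class NodeKind(Enum):
    """Kinds of nodes, with member names invented for this sketch."""
    DATA_INPUT = "Define the input to a chain"
    LLM_CLASSIFIER = "Use LLM output to filter and branch out inputs"
    GENERIC_LLM = "Use the LLM output directly as the node output"
    PROCESSING = "Pre-implemented JavaScript functions for typical data transformation"
    API_CALL = "Call external functions to connect professional services with LLMs"
    CUSTOM_SCRIPT = "User-defined JS functions"
    EVALUATION = "Filter or re-rank LLM outputs by human-designed criteria"


# The ten numbered nodes from the list, mapped to their kind.
CHAIN_NODES = {
    1: NodeKind.DATA_INPUT,
    2: NodeKind.LLM_CLASSIFIER,
    3: NodeKind.LLM_CLASSIFIER,
    4: NodeKind.GENERIC_LLM,
    5: NodeKind.PROCESSING,
    6: NodeKind.GENERIC_LLM,
    7: NodeKind.API_CALL,
    8: NodeKind.CUSTOM_SCRIPT,
    9: NodeKind.GENERIC_LLM,
    10: NodeKind.EVALUATION,
}

counts = Counter(kind.name for kind in CHAIN_NODES.values())
print(counts)
```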

Adapted from source

In Conclusion

There are a few terms used currently for applications built on Foundation Models and Large Language Models. These terms are Gen Apps, Generative Apps, LLM Apps, Prompt Chaining, LLM Chaining and more.

Suffice it to say that there is a need to build on large models and to harness the power of LLMs.

This need will be underpinned by the following principles:

  1. Prompts
  2. Leveraging LLM Predictive and Generative capabilities.
  3. Chaining of Prompts, both in parallel and in series.
  4. A visual editor for programming chains/nodes and the edges between them.
  5. LLM Response data transformation.
  6. LLM Chained Applications will be conversational for both input and output, or the output can also encapsulate an RPA component.
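The third principle, chaining in series and in parallel, can be sketched in a few lines of Python. The step functions below are plain stand-ins for LLM-backed nodes, and `run_series` and `run_parallel` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Step = Callable[[str], str]


def run_series(steps: list[Step], data: str) -> str:
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


def run_parallel(steps: list[Step], data: str) -> list[str]:
    """Send the same input to every step at once; results keep the step order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda step: step(data), steps))


# Stand-ins for LLM-backed nodes.
def extract_topic(text: str) -> str:
    return text.split(":")[0]


def draft_pros(topic: str) -> str:
    return f"Pros of {topic}"


def draft_cons(topic: str) -> str:
    return f"Cons of {topic}"


topic = run_series([str.strip, extract_topic], "  remote work: is it here to stay?  ")
branches = run_parallel([draft_pros, draft_cons], topic)
report = " | ".join(branches)
print(report)
```

Threads suit this pattern because real model calls are mostly spent waiting on the network, so independent branches can overlap.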


I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.

PromptChainer: Chaining Large Language Model Prompts through Visual Programming.


