ChainForge Is A Prompt Engineering GUI Toolkit
ChainForge enables you to build evaluation logic to measure prompt performance, LLM drift, and model robustness.
I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
ChainForge is a GUI for building evaluation logic to compare models, experiment with prompt templates, and perform generation auditing. ChainForge can be installed locally or run directly in a Chrome browser.
A number of additions have been made to ChainForge; these include (as seen below) chat turn nodes. Chat nodes are in step with OpenAI's move to deprecate completion and insertion modes and focus on chat modes.
With chat turn nodes, a conversation can be created while passing context from node to node. Hence a conversational UI built via prompt chaining can be simulated.
Multiple conversations can be run in parallel across different LLMs.
Chat messages can be templated, and the underlying LLM can be updated and changed along the way for each node.
Chat nodes are important for generation auditing of conversational interfaces. Each node can be inspected to detect prompt drift, LLM drift, and related issues.
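To make the chaining idea concrete, here is a minimal Python sketch of what a chain of chat turn nodes does: each turn appends to a shared message history so context carries forward to the next node. The `call_llm` function is a stub standing in for whichever chat-model backend a node is configured with; the function names and message format are illustrative, not ChainForge's internal API.

```python
def call_llm(messages):
    # Stub backend: a real node would send `messages` to the
    # configured chat model and return its reply text.
    return f"(reply given {len(messages)} messages of context)"

def chat_turn(history, user_message):
    """One chat turn node: append the user message, get a reply,
    and return the extended history for the next node."""
    history = history + [{"role": "user", "content": user_message}]
    reply = call_llm(history)
    return history + [{"role": "assistant", "content": reply}]

# Context accumulates from node to node, simulating a conversation.
history = [{"role": "system", "content": "You are a helpful assistant."}]
history = chat_turn(history, "Name a continent.")
history = chat_turn(history, "What is its largest country?")
```

Because each node receives and returns the full history, swapping the underlying LLM at any turn (as ChainForge allows) only changes `call_llm`, not the chaining logic.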
The image below shows how an expected or ground-truth response can be defined via tabular data input.
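A ground-truth check of this kind can be sketched as a small evaluation function. The `Response` class below is a stand-in for the response object an evaluator receives, with the model's output text and the tabular ground-truth value; the field names `.text` and `.var` are assumptions for illustration.

```python
class Response:
    """Stand-in for an evaluated LLM response: the output text plus
    the input variables, including the expected (ground-truth) answer."""
    def __init__(self, text, var):
        self.text = text
        self.var = var

def evaluate(response):
    # True when the model's answer matches the ground-truth column,
    # ignoring case and surrounding whitespace.
    return response.text.strip().lower() == response.var["expected"].strip().lower()

print(evaluate(Response("Paris", {"expected": "paris"})))  # → True
```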
Below, a text fields node is defined with the seven continents. A prompt node is templated on the text fields, followed by chat turn nodes. In the chat turn nodes, the previously used LLMs can be reused, or a new LLM can be defined.
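The text-fields-to-prompt step amounts to simple template substitution: each field value fills the template's slot, producing one prompt per field. A minimal sketch (the variable name `continent` is illustrative):

```python
continents = ["Africa", "Antarctica", "Asia", "Australia",
              "Europe", "North America", "South America"]

# One prompt is generated per text field value.
template = "What is the largest country in {continent}?"
prompts = [template.format(continent=c) for c in continents]

print(prompts[0])  # → What is the largest country in Africa?
```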
For each chat turn node, an inspect node can be defined to view the LLM responses.
The response selector has options for grouped lists or tables; below you see the output for each of the models referenced.
Consider the LLM Scorer below. The LLM Scorer uses a single model to score other LLMs' responses, by means of a scoring prompt in which you define how the LLM must perform the scoring.
In this case, the LLM Scorer prompt is:
Respond with ‘true’ if the text is positive, and respond with ‘false’ if the text is negative.
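The LLM-as-judge pattern behind the scorer can be sketched as follows: a single judge model receives the scoring prompt plus each response text and returns a verdict. The `judge` function here is a stub standing in for a real chat-model call; a production setup would send both strings to the scoring LLM.

```python
SCORING_PROMPT = ("Respond with 'true' if the text is positive, "
                  "and respond with 'false' if the text is negative.")

def judge(scoring_prompt, text):
    # Stub judge: crude keyword sentiment in place of a real LLM call,
    # purely so the scoring flow is runnable.
    return "true" if "great" in text.lower() else "false"

def score(responses):
    """Score each LLM response with the single judge model."""
    return {text: judge(SCORING_PROMPT, text) for text in responses}

print(score(["This product is great!",
             "This was a terrible experience."]))
```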
⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️