ChainForge is an open-source visual programming interface for LLM flows and more…

Cobus Greyling
5 min read · May 29

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language, including NLU design, evaluation & optimisation, data-centric prompt tuning, and LLM observability, evaluation & fine-tuning.

For tips on installation, go to the end of this article.

The objective of ChainForge is comparing and evaluating prompts and model responses.

Most users will construct a quick flow to examine prompt structure and compare the responses of different models.

ChainForge is premised on the following three tasks:

Prompt Permutations

With ChainForge you can set up prompt templates and chain multiple prompts together, as seen below. Multiple input methods can be used for prompt templates:

  • CSV Nodes,
  • Text,
  • Python output, etc.

Consider the example below where a CSV node populates a prompt template, chained to a prompt with further instructions for the LLM.
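Conceptually, a prompt template with several input variables expands into every combination of its values. The sketch below mimics that expansion in plain Python; the template text and variable names are illustrative, not ChainForge's internal API.

```python
from itertools import product

# Hypothetical template and variable values, standing in for what a
# CSV node feeding a ChainForge prompt template would supply.
template = "Summarise the following {genre} book: {title}"
values = {
    "genre": ["sci-fi", "fantasy"],
    "title": ["Dune", "The Hobbit"],
}

# Expand the cross product of all variable values into concrete prompts.
keys = list(values)
prompts = [
    template.format(**dict(zip(keys, combo)))
    for combo in product(*(values[k] for k in keys))
]

for p in prompts:
    print(p)  # four permutations: 2 genres x 2 titles
```

Each generated prompt is then sent to every selected model, which is what makes side-by-side comparison possible.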

Evaluation Nodes

Evaluation nodes are Python-script-based components that test LLM responses for expected behaviour.

Consider the image below, where GPT-3.5 and GPT-4 are tested for prompt-injection vulnerability.

Two nodes are used, both holding a set of instructions.

Notice the prompt node, the simple Python evaluation script, and the output.

From this example it’s clear that GPT-3.5 is more susceptible to prompt injection.
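An evaluation node runs a Python function named evaluate over each LLM response. The sketch below is a minimal, self-contained version of such a check; the mock Response class and the injected-phrase test ("LOL") are illustrative stand-ins, not the exact script from the image.

```python
class Response:
    """Stand-in for the response object an evaluator node receives;
    only the .text attribute is assumed here."""
    def __init__(self, text):
        self.text = text

def evaluate(response):
    # Flag the response as compromised if the injected phrase leaked
    # through: a robust model should have ignored the injection.
    return "LOL" in response.text.upper()

# Quick check against two mock responses.
print(evaluate(Response("Sure thing. LOL")))           # injection leaked
print(evaluate(Response("Here is the summary you asked for.")))
```

Because the function returns a boolean per response, ChainForge can aggregate the results per model, which is how the susceptibility comparison between GPT-3.5 and GPT-4 is produced.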

Visualisation Nodes

The default visualisation features are quite good, as can be seen below. The level of visualisation is highly dependent on the data processing performed in the Python script.

Consider the image below: if you return a dictionary with more than one key, the metrics are plotted in a parallel coordinates plot.
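A multi-metric evaluation function might look like the sketch below; the metric names are made up for illustration, and each dictionary key becomes one axis of the parallel coordinates plot.

```python
class Response:
    """Stand-in for the response object an evaluator node receives."""
    def __init__(self, text):
        self.text = text

def evaluate(response):
    text = response.text
    # Returning a dict with more than one key triggers the
    # parallel coordinates visualisation.
    return {
        "length": len(text),           # response length in characters
        "has_code": "```" in text,     # does it contain a code block?
        "exclamations": text.count("!"),  # rough tone proxy
    }

print(evaluate(Response("Hello! Here is some code: ```print(1)```")))
```

Returning a single number instead would fall back to a simpler plot, so the shape of the return value effectively selects the visualisation.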



It would be unfair to compare ChainForge with LangFlow or Flowise (both based on LangChain), but invariably one does draw comparisons.

LangFlow and Flowise are both developed as fully fledged chaining applications with Agent capability; both leverage LangChain.

As mentioned earlier, ChainForge was built for specific purposes, and it also works well as a playground for:

  • Model Testing
  • Model Comparison
  • Prompt Engineering & Templating
  • Data Visualisation

Additional features which would help are:


The addition of tabs would be a great help, allowing users to keep multiple flows open and switch between them.


Exposing flows via an API would also be helpful. Currently flows can be exported and imported, and hence shared as files, but API functionality would make integration easier.

Component Palette

Currently ChainForge has only seven components, listed below, along with Export & Import options. An improvement would be a palette or pane on the left listing the components with a description of each one’s functionality.


Only two commands are required to install and run ChainForge:

pip install chainforge
chainforge serve

When installing ChainForge via the macOS Terminal, you will most probably run into the following error:

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> greenlet

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

To solve this, run xcode-select --install. Once the installation is successful and the serve command is run, the following is shown in the terminal:

chainforge serve      
Serving SocketIO server on port 8001...
Serving Flask server on port 8000...
* Serving Flask app 'chainforge.flask_app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://localhost:8000

Once installed and running, ChainForge is accessible via Chrome at http://localhost:8000/.

As seen below, the only configuration you need to perform is adding the API key for your LLM.
