ChainBuddy
An AI Agent System for Generating LLM Pipelines.
Some Background
The evolution of prompting techniques in large language models has mirrored the growing complexity of tasks that AI is expected to handle.
Single-shot prompting was the starting point, where a model was given one example and expected to perform the task accurately based on that alone.
Multi-shot prompting improved this by offering multiple examples, allowing the model to refine its understanding of patterns and expectations.
Moving further, prompt templates emerged, standardising input structures to ensure consistency and reduce variance in model outputs. This was achieved by defining variables or placeholders in a prompt, which are replaced with concrete values at runtime.
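To make this concrete, here is a minimal Python sketch of a prompt template with placeholders filled in at runtime; the template text and variable names are purely illustrative.

```python
# A minimal prompt-template sketch: placeholders in braces are
# filled with concrete values at runtime. Names are illustrative.
TEMPLATE = (
    "You are a customer support assistant for {company}.\n"
    "Answer the following question in {tone} tone:\n"
    "{question}"
)

def render_prompt(company: str, tone: str, question: str) -> str:
    # str.format substitutes each placeholder with its runtime value
    return TEMPLATE.format(company=company, tone=tone, question=question)

print(render_prompt("Acme", "a friendly", "How do I reset my password?"))
```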
As tasks became more complex, chaining entered the scene, breaking down multi-step tasks into sequential prompts where the output of one prompt becomes the input for the next.
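Below is a minimal sketch of chaining, where the output of one prompt is fed into the next; call_llm() is a hypothetical stand-in for whichever chat-completion client you use.

```python
# A minimal chaining sketch: the output of one prompt becomes the
# input of the next. call_llm() is a hypothetical stand-in for any
# chat-completion API call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM client here")

def summarise_then_translate(document: str) -> str:
    # Step 1: summarise the source document
    summary = call_llm(f"Summarise the following text in three sentences:\n{document}")
    # Step 2: feed the summary (step 1's output) into the next prompt
    return call_llm(f"Translate this summary into French:\n{summary}")
```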
Later, to address the challenge of factual inaccuracy, Retrieval-Augmented Generation (RAG) was introduced, combining retrieval mechanisms with language models to ensure that generated text is grounded in real-time, accurate data sources.
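A rough sketch of the RAG pattern, assuming a hypothetical retrieve() function over some document store and the same hypothetical call_llm() stand-in, looks like this:

```python
# A minimal RAG sketch: retrieved passages are injected into the
# prompt so the answer is grounded in source documents.
# retrieve() and call_llm() are hypothetical stand-ins.
from typing import List

def retrieve(query: str, k: int = 3) -> List[str]:
    raise NotImplementedError("plug in a vector store or search index here")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM client here")

def answer_with_rag(question: str) -> str:
    passages = retrieve(question)
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```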
ChainBuddy is an AI-powered assistant that automatically generates starter LLM pipelines (“flows”) given an initial prompt. ~ Source
ChainForge
When I saw Ian Arawjo’s name in this study, I was immediately reminded of ChainForge, his innovative tool for building evaluation logic around model selection, prompt templating, and generation auditing.
ChainForge offers an intuitive GUI that can be installed locally or run directly from a Chrome browser, making it accessible and versatile for various AI development tasks.
Significant updates have been made to ChainForge, including the introduction of chat turn nodes, which align with OpenAI’s shift from completion and insertion modes to focus on chat-based interactions.
Users can run multiple conversations in parallel across different language models, template chat messages, and even switch LLMs mid-conversation for each node.
This flexibility is invaluable for generation auditing, where each chat node can be inspected for issues like Prompt Drift or LLM Drift, ensuring more robust conversational AI systems.
ChainForge’s evolution makes it a powerful tool for anyone focused on building and auditing conversational AI interfaces.
Back To ChainBuddy
ChainBuddy is based on ChainForge and leverages LangGraph from LangChain.
ChainBuddy seems like it will work well as a web-based user interface and as an agentic decomposition and flow creator of sorts. It is not a traditional idea creator, but rather a tool that can compare the responses of different LLMs and also create a pipeline.
A pipeline that can, in turn, serve as a tool for an agentic application. ChainBuddy can thus be seen as a technical tool which helps builders decompose complex and ambiguous queries into a sequence of steps.
Acting as a logic tool of sorts.
- ChainBuddy is a chat-based AI assistant that interacts with users to understand their needs and goals, then generates editable and interactive LLM pipelines to get them started.
- The research also provides insights into the future of AI-powered interfaces while reflecting on potential risks, such as users becoming overly dependent on AI for generating LLM pipelines.
The Missing Link
The study recognises that there is a whole slew of agent development frameworks, but identifies that a higher level of abstraction is needed: one where both users and AI/ML experts can set up pipelines, automated evaluations and visualisations of LLM behaviour.
Conversational UI
The ChainBuddy assistant is a chat-based tool located in the bottom-left corner of the ChainForge platform. I must say, I checked ChainForge at the time of writing this article, and ChainBuddy was not yet available.
Within the ChainBuddy chat UI, a user can begin by describing a problem via the chat interface. The assistant then engages in a Q&A session to clarify the user’s intent, asking up to three key questions.
Users can respond to each question individually, and once the clarification process is complete, trigger the assistant to generate a flow by clicking a button.
The generated flow can be reviewed or modified as needed, with the option to request a new version if necessary. The assistant interface is kept simple, with the complexity focused on the agent architecture and flow generation capabilities.
ChainBuddy is built on LangGraph, a library designed for constructing stateful, multi-actor applications with LLMs.
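As a rough illustration of what a stateful LangGraph application looks like, the sketch below wires a requirement-gathering step to a flow-generation step; the state fields and node logic are illustrative placeholders, not ChainBuddy's actual implementation.

```python
# A minimal LangGraph sketch of a stateful, two-step graph:
# a requirement-gathering node followed by a flow-generation node.
from typing import TypedDict, List
from langgraph.graph import StateGraph, END

class FlowState(TypedDict):
    user_request: str
    requirements: List[str]
    flow_spec: str

def gather_requirements(state: FlowState) -> dict:
    # In ChainBuddy, this step would be an LLM-driven Q&A with the user.
    return {"requirements": [f"Goal: {state['user_request']}"]}

def generate_flow(state: FlowState) -> dict:
    # In ChainBuddy, this step would emit ChainForge nodes and edges.
    return {"flow_spec": f"flow covering {len(state['requirements'])} requirement(s)"}

graph = StateGraph(FlowState)
graph.add_node("gather", gather_requirements)
graph.add_node("generate", generate_flow)
graph.set_entry_point("gather")
graph.add_edge("gather", "generate")
graph.add_edge("generate", END)

app = graph.compile()
result = app.invoke({"user_request": "Compare two prompts across models",
                     "requirements": [], "flow_spec": ""})
print(result["flow_spec"])
```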
Early User Feedback
Intent clarification: Users often input prompts that lack enough detail to generate a tailored workflow. To address this, ChainBuddy adopts an interactive chat format for better clarification.
Structured Input: While the system initially used an open-ended, ChatGPT-like chat to ask users questions, long responses made it difficult for users to reply naturally.
To improve this, a form-filling approach was implemented, allowing users to respond only to the questions they choose, while limiting the number and length of questions.
Feedback: Users suggested helpful features like visualising the system’s progress and including explainable AI elements to clarify how the system generates results or visualisations.
Flow Editing: Some users expressed interest in continuing the conversation with ChainBuddy to refine or expand on existing workflows. While the feature was desired, it was too complex to implement fully within the scope of the study.
Final Thoughts
ChainBuddy is a great example of a conversational UI being used, via an agentic approach, to gather user information such as goals, requirements, preferences and more.
ChainBuddy serves as a higher abstraction layer that helps users convert their requirements into a pipeline. And even though this pipeline might not be the final product, it serves as a step in solving the blank canvas problem.
It gives the user a starting point which can be edited in order to reach a final state.
Interoperability might be a consideration, as flows need to be exported, or at least recreated, in other IDEs.
Task Decomposition & Task Assignment To Purpose-Specific Agents
Inspired by concepts of advanced prompting and projects aimed at enhancing autonomous agents, ChainBuddy’s agentic system is designed to generate long-term plans tailored to user requirements.
This approach entails breaking down tasks into specific, manageable actions, enabling individual agents to execute these actions and return structured data to upstream agents.
This architecture enhances both efficiency and accuracy by allowing each agent to concentrate on a single task.
The system includes a Requirement Gathering Chat Assistant, which interacts with users to clarify their intent and gather context for their problems, laying the groundwork for subsequent planning.
The Planner Agent develops a comprehensive implementation plan based on the user’s specified goals, receiving contextual information about all accessible nodes within the system, including their names, descriptions, and permissible connections.
Each task within the plan is delegated to a dedicated Task-Specific Agent, facilitating focused execution.
These tasks correspond to different nodes in the user interface that require generation, allowing for the deployment of smaller, less powerful models for execution tasks while reserving larger, more capable models for planning.
Connection Agents then process the output from task-specific agents, generating connections between nodes and determining initial positions within the interface.
Finally, the Post-hoc Reviewer Agent assesses the generated workflow against the initial user criteria provided to the Planner, possessing the capability to prompt the Planner for revisions if necessary.
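The sketch below shows one way the described roles could be wired together with LangGraph, including a conditional edge that lets the reviewer send the workflow back to the planner; the state schema, node bodies and routing logic are my own assumptions, not the authors' code.

```python
# A hedged sketch of the described agent roles wired up with LangGraph:
# Planner -> Task agents -> Connection agent -> Reviewer, where the
# Reviewer can route back to the Planner for a revision.
from typing import TypedDict, List
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict):
    user_goal: str
    plan: List[str]         # tasks produced by the Planner
    nodes: List[str]        # UI nodes produced by task-specific agents
    connections: List[str]  # edges produced by the Connection agent
    approved: bool

def planner(state: PipelineState) -> dict:
    # Larger model: break the user goal into tasks (placeholder logic).
    return {"plan": [f"create prompt node for: {state['user_goal']}",
                     "create evaluation node"]}

def task_agents(state: PipelineState) -> dict:
    # Smaller models: one UI node generated per planned task.
    return {"nodes": [f"node<{task}>" for task in state["plan"]]}

def connection_agent(state: PipelineState) -> dict:
    # Wire consecutive nodes together and place them in the interface.
    pairs = zip(state["nodes"], state["nodes"][1:])
    return {"connections": [f"{a} -> {b}" for a, b in pairs]}

def reviewer(state: PipelineState) -> dict:
    # Post-hoc check of the generated workflow against the user's goal.
    return {"approved": len(state["connections"]) > 0}

def route_after_review(state: PipelineState) -> str:
    # If the reviewer is unhappy, prompt the Planner for a revision.
    return END if state["approved"] else "planner"

graph = StateGraph(PipelineState)
graph.add_node("planner", planner)
graph.add_node("task_agents", task_agents)
graph.add_node("connection_agent", connection_agent)
graph.add_node("reviewer", reviewer)
graph.set_entry_point("planner")
graph.add_edge("planner", "task_agents")
graph.add_edge("task_agents", "connection_agent")
graph.add_edge("connection_agent", "reviewer")
graph.add_conditional_edges("reviewer", route_after_review)

app = graph.compile()
print(app.invoke({"user_goal": "compare two prompts across models",
                  "plan": [], "nodes": [], "connections": [], "approved": False}))
```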
I’m currently the Chief Evangelist @ Kore.ai. I explore and write about all things at the intersection of AI & language; ranging from Language Models, AI Agents, Agentic Applications, Development Frameworks, Data-Centric Productivity Suites & more…