LangChain, LangSmith & LLM Guided Tree-of-Thought

The Three-of-Thought (ToT) technique takes inspiration from the way human minds solve complex reasoning tasks by trial and error. In this approach, the mind explores the solution space through a thought process resembling a tree, enabling backtracking when needed. This article considers the ToT research paper and how LangChain implemented the ToT approach.

Cobus Greyling
6 min readSep 13


I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

A seen in the image below, the LLM is still the backbone of the autonomous agent. But the LLM is augmented with the following modules:

  • Prompter Agent,
  • Checker Module,
  • Memory module, and
  • ToT controller.

When solving a problem, the modules engage in a multi-round conversation with the LLM. This is typical of LLM-based autonomous agents, where a chain is created on the fly and executed sequentially, while polling the LLM multiple times.

Considering the LangSmith image below, the total number of tokens used is visible, with the two latency categories.

This image shows the Trace section, which holds the complete chain created for this agent, with the input and beneath it the output. I have mentioned this numerous times in the past, but LangSmith does give detailed break down at each step of the chain, with the cost (tokens) and latency.

The conversation and state history (context) is stored in the memory module. This makes it possible for the agent to refer to previous sections of the thought process, and perhaps take a different route from there.

ToT Strategy (Source)

In order to test the effectiveness of the ToT technique, the paper implemented a ToT-based agent to solve a Sudoku Puzzle.

Experimental results show that the ToT framework can significantly increase the success rate of Sudoku puzzle solving. [Source]

A vulnerability the paper identifies is that the LLMs generation is based on the preceding sequence, and backward editing is overlooked.

However, when we as humans solve a problem, we most probably backtrack to previous iterations if the derived step is incorrect.

This approach of backtracking negates the danger of the LLM reaching an inconclusive or a no answer scenario.

Secondly, to establish correctness, a practice for us as humans is to carry out tests at every step of the problem-solving process.

This ensures the credibility of the final solution. The paper stats that auto-regressive language models don’t explicitly perform logical correctness checks as it generates a new token based on the previous tokens.

This limits LLM capacity to correct own mistakes. A minor error could be amplified as the model generates more tokens, this is often referred to cascading. Hence leading to solution quality deterioration and making it difficult to recover from mistakes.

Cascading has been identified very early on as a danger of manually created prompt chains. However, considering that an autonomous agent creates a chain of prompts on the fly, it is still susceptible to cascading.

This strategy solves a problem through a multi-round conversation between the LLM and the prompter agent. [Source]


The image above shows the success rate across four approaches: Zero Shot (zs), One Shot (os), Few Shot (fs) and Tree-of-Thought (tot).

Granted, the implementation of a Zero, One or Few Shot approach is far simpler than implementing a ToT autonomous agent.

Here is complete working code for the Tree-Of-Thought agents, which you can copy and paste into a notebook. All you will need to update is the OpenAI API key and your LangSmith API key.

pip install langchain
pip install langchain_experimental
pip install -U langsmith
pip install openai


import os
from uuid import uuid4

unique_id = uuid4().hex[0:8]
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"Agent Tot"
os.environ["LANGCHAIN_ENDPOINT"] = ""
os.environ["LANGCHAIN_API_KEY"] = "xxxxxxxxxxxxxxxxxxxxxxxx"
os.environ['OPENAI_API_KEY'] = str("xxxxxxxxxxxxxxxxxxxxxxxx")


from langchain.llms import OpenAI
llm = OpenAI(temperature=1, max_tokens=512, model="text-davinci-003")


sudoku_puzzle = "3,*,*,2|1,*,3,*|*,1,*,3|4,*,*,1"
sudoku_solution = "3,4,1,2|1,2,3,4|2,1,4,3|4,3,2,1"
problem_description = f"""

- This is a 4x4 Sudoku puzzle.
- The * represents a cell to be filled.
- The | character separates rows.
- At each step, replace one or more * with digits 1-4.
- There must be no duplicate digits in any row, column or 2x2 subgrid.
- Keep the known digits from previous valid thoughts in place.
- Each thought can be a partial or the final solution.

# The following code implement a simple rule based checker for
# a specific 4x4 sudoku puzzle.

from typing import Tuple
from langchain_experimental.tot.checker import ToTChecker
from langchain_experimental.tot.thought import ThoughtValidity
import re

class MyChecker(ToTChecker):
def evaluate(self, problem_description: str, thoughts: Tuple[str, ...] = ()) -> ThoughtValidity:
last_thought = thoughts[-1]
clean_solution = last_thought.replace(" ", "").replace('"', "")
regex_solution = clean_solution.replace("*", ".").replace("|", "\\|")
if sudoku_solution in clean_solution:
return ThoughtValidity.VALID_FINAL
elif, sudoku_solution):
return ThoughtValidity.VALID_INTERMEDIATE
return ThoughtValidity.INVALID

# Testing the MyChecker class above:

checker = MyChecker()
assert checker.evaluate("", ("3,*,*,2|1,*,3,*|*,1,*,3|4,*,*,1",)) == ThoughtValidity.VALID_INTERMEDIATE
assert checker.evaluate("", ("3,4,1,2|1,2,3,4|2,1,4,3|4,3,2,1",)) == ThoughtValidity.VALID_FINAL
assert checker.evaluate("", ("3,4,1,2|1,2,3,4|2,1,4,3|4,3,*,1",)) == ThoughtValidity.VALID_INTERMEDIATE
assert checker.evaluate("", ("3,4,1,2|1,2,3,4|2,1,4,3|4,*,3,1",)) == ThoughtValidity.INVALID

# Initialize and run the ToT chain,
# with maximum number of interactions k set to 30 and
# the maximum number child thoughts c set to 8.

from langchain_experimental.tot.base import ToTChain

tot_chain = ToTChain(llm=llm, checker=MyChecker(), k=30, c=5, verbose=True, verbose_llm=False)


And the output from the agent, the iterations and back-tracking is very much visible in the output.

> Entering new ToTChain chain...
Starting the ToT solve procedure.
/usr/local/lib/python3.10/dist-packages/langchain/chains/ UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.
Thought: 3,4,*,2|1,*,3,*|*,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,*,3,*|*,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,*|*,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,4|*,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,*|1,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,*|*,2,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,*|*,1,1,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,*|*,1,*,4|4,*,*,1
Thought: 3,4,1,2|1,2,3,*|*,1,*,1|4,4,*,1
Thought: 3,4,1,2|1,2,3,*|1,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,*|*,1,2,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,*|*,1,*,3|4,1,*,1
Thought: 3,4,1,2|1,2,3,*|*,1,*,3|4,*,1,1
Thought: 3,4,1,2|1,*,3,4|*,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,4|*,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,4|2,1,*,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,4|2,1,4,3|4,*,*,1
Thought: 3,4,1,2|1,2,3,4|2,1,4,3|4,1,*,*
Thought: 3,4,1,2|1,2,3,4|2,1,4,3|4,2,*,*
Thought: 3,4,1,2|1,2,3,4|2,1,4,3|4,3,*,*
Thought: 3,4,1,2|1,2,3,4|2,1,4,3|4,3,1,*
Thought: 3,4,1,2|1,2,3,4|2,1,4,3|4,3,2,*
Thought: 3,4,1,2|1,2,3,4|2,1,4,3|4,3,2,1

> Finished chain.

Below the output as viewed in the Colab notebook.



Cobus Greyling

Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; NLP/NLU/LLM, Chat/Voicebots, CCAI.