Chain-of-Verification Reduces Hallucination in LLMs

A recent study introduced Chain-of-Verification (CoVe), a prompt-engineering approach to reducing LLM hallucination. The approach can be simulated in an LLM playground, and a feature request for a CoVe implementation has already been submitted to LangChain.


The generation by an LLM of highly plausible and convincing, yet factually incorrect, information is termed hallucination.

The CoVe paper, published on 25 September 2023, views hallucination as an unsolved problem of large language models. Even though significant progress has been made in mitigating hallucination, it remains unsolved when relying on the LLM alone.

Various prompt-engineering techniques have illustrated how flexibly LLMs respond to well-structured prompts.

The CoVe approach again shows that LLMs can deliberate on their own responses and correct their mistakes.

Source — Chain-of-Verification feature request for LangChain.

We find that independent verification questions tend to provide more accurate facts than those in the original long-form answer, and hence improve the correctness of the overall response. Source

The playground example below shows a basic implementation of CoVe. The system description is "Answer the following questions:", and the LLM is then asked to name politicians born in New York (1).

A list of names is returned (2); the model used is gpt-3.5-turbo. Unbeknownst to us, some of these names are wrong and should not be on the list.

What is really insightful is that when the names are queried individually (3, 4), the LLM generates the correct answer.
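This pattern, where a batched list question produces errors while individual fact-checks succeed, can be sketched as follows. The `llm` function here is a hypothetical stub with canned answers standing in for a real chat-completion call (e.g. to gpt-3.5-turbo); the prompts are illustrative assumptions, not the playground's exact wording.

```python
# Sketch: why per-fact verification questions help.
# `llm` is a hypothetical stub standing in for a real model call;
# its canned answers illustrate a batched response containing errors.

def llm(prompt: str) -> str:
    canned = {
        "Name some politicians who were born in New York.":
            # Batched answer: plausible, but not every name is correct.
            "Hillary Clinton, Donald Trump, Michael Bloomberg",
        "Where was Hillary Clinton born?": "Chicago, Illinois",
        "Where was Donald Trump born?": "Queens, New York",
        "Where was Michael Bloomberg born?": "Boston, Massachusetts",
    }
    return canned[prompt]

baseline = llm("Name some politicians who were born in New York.")
names = [n.strip() for n in baseline.split(",")]

# Querying each name individually surfaces facts the batched answer got wrong.
verified = [n for n in names if "New York" in llm(f"Where was {n} born?")]
print(verified)  # only the names whose birthplace check passes
```

With these canned answers, only the name whose individual birthplace check mentions New York survives verification, which is exactly the filtering effect CoVe relies on.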

The image below shows the four steps of CoVe in the playground: (1) the initial query and (2) the baseline response. In most instances the baseline response would be used as-is, giving rise to the phenomenon of errors cascading through chained applications.

Plan verification (3) is performed on the baseline response, using the same LLM, to formulate questions that check the answers generated for the query. From there the final verified response (4) can be created, based on the outcome of the verification checks against the baseline response.

The Chain-of-Verification (CoVe) approach thus performs four core steps:

  1. Generate Baseline Response: Given a query, generate the response using the LLM.
  2. Plan Verifications: Given both query and baseline response, generate a list of verification questions that could help to self-analyse whether there are any mistakes in the original response.
  3. Execute Verifications: Answer each verification question in turn, and hence check the answer against the original response to check for inconsistencies or mistakes.
  4. Generate Final Verified Response: Given the discovered inconsistencies (if any), generate a revised response incorporating the verification results.
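The four steps above can be sketched end to end. The prompt wording and the `llm` callable below are illustrative assumptions, not the paper's exact templates; in practice each call would go to an LLM such as gpt-3.5-turbo.

```python
# Sketch of the four CoVe steps with a hypothetical `llm` callable.
# Prompt wording is an assumption, not the paper's exact templates.

def chain_of_verification(query: str, llm) -> str:
    # 1. Generate Baseline Response
    baseline = llm(f"Answer the following question:\n{query}")

    # 2. Plan Verifications: derive fact-checking questions from the baseline
    plan = llm(
        "List one verification question per factual claim in this answer, "
        f"one per line.\nQuestion: {query}\nAnswer: {baseline}"
    )
    questions = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Execute Verifications: answer each question independently,
    # so each fact is checked outside the context of the long-form answer
    evidence = [(q, llm(q)) for q in questions]

    # 4. Generate Final Verified Response: revise the draft using the evidence
    joined = "\n".join(f"Q: {q}\nA: {a}" for q, a in evidence)
    return llm(
        f"Original question: {query}\n"
        f"Draft answer: {baseline}\n"
        f"Verification results:\n{joined}\n"
        "Rewrite the draft, correcting any claims the verification contradicts."
    )
```

Passing the model call in as a parameter keeps the sketch testable and makes it easy to swap in a LangChain chain or a raw API client.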

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.




Cobus Greyling
