Teaching Small Language Models to Reason

Chain-Of-Thought prompting has been so successful that it gave rise to what some refer to as the Chain-Of-X phenomenon. Google Research explored how to use LLMs to generate a CoT data ontology for existing datasets, and then how to fine-tune smaller Language Models on those CoTs.

Cobus Greyling
3 min read · Jul 10, 2024



It is well established that Chain-Of-Thought prompting improves the reasoning capabilities of large language models.

Google asserts that reasoning capabilities emerge only in models with at least tens of billions of parameters. This research from Google explores transferring these capabilities to smaller models via knowledge distillation.

They fine-tuned a student model using the Chain-Of-Thought outputs from a larger teacher model.

Researchers from Google found that this method improves task performance in arithmetic, common sense, and symbolic reasoning datasets.

Chain-Of-Thought (CoT)

Chain of thought (CoT) prompting teaches Language Models (LMs) to decompose a reasoning task into a series of intermediate steps.

This prompting has been shown to significantly increase the task accuracy of large language models (LLMs) across common sense, symbolic and mathematical reasoning datasets.

However, CoT prompting does not improve the reasoning capabilities of smaller LMs, which mostly produce illogical CoTs. Notably, CoT prompting even reduces the accuracy of models with fewer than 10 billion parameters.

Research attributes this to abilities such as semantic understanding and symbolic mapping emerging only at larger model scales.

The Method

Google Research proposes a two-step pipeline for CoT (Chain-Of-Thought) knowledge distillation.

Annotation with CoT Reasoning

  1. Use a teacher model, like PaLM 540B or GPT-3 175B, to annotate an existing supervised dataset with CoT reasoning.
  2. Perform few-shot prompting with 8 examples to generate CoTs, adapting prompts to provide the target answer after the question and before the example CoT. This helps correct small mistakes.
  3. Remove incorrect CoTs based on the target answer to ensure quality.
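The annotation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the `Q:` / `Target:` / `A:` prompt layout, the function names, and the convention that a CoT ends with "The answer is X." are all assumptions made for this example, and the teacher outputs are hard-coded strings standing in for real model calls.

```python
import re

def build_annotation_prompt(exemplars, question, target):
    """Build a few-shot prompt in which the target answer comes after
    each question and before the CoT (hypothetical layout)."""
    parts = [
        f"Q: {ex['question']}\nTarget: {ex['answer']}\nA: {ex['cot']}"
        for ex in exemplars
    ]
    # The new question ends with an open "A:" for the teacher to complete.
    parts.append(f"Q: {question}\nTarget: {target}\nA:")
    return "\n\n".join(parts)

# Assumption: every CoT states its final answer as "The answer is X."
ANSWER_RE = re.compile(r"The answer is\s*(.+?)\.?\s*$")

def extract_answer(cot):
    """Return the final answer stated in a CoT, or None if there is none."""
    match = ANSWER_RE.search(cot.strip())
    return match.group(1).strip() if match else None

def filter_correct(annotations):
    """Keep only the CoTs whose final answer matches the dataset target."""
    return [a for a in annotations
            if extract_answer(a["cot"]) == str(a["target"])]

# One exemplar shown for brevity; the article describes using 8.
exemplars = [{
    "question": ("Roger has 5 tennis balls. He buys 2 more cans of tennis "
                 "balls. Each can has 3 tennis balls. How many tennis balls "
                 "does he have now?"),
    "answer": "11",
    "cot": ("Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 tennis balls. 5 + 6 = 11. The answer is 11."),
}]

prompt = build_annotation_prompt(
    exemplars, "A box holds 4 pens. How many pens are in 3 boxes?", "12")

# Hard-coded stand-ins for teacher completions (no model is called here).
annotations = [
    {"target": "12", "cot": "Each box holds 4 pens. 3 * 4 = 12. The answer is 12."},
    {"target": "12", "cot": "There are 3 boxes. 3 + 4 = 7. The answer is 7."},
]
kept = filter_correct(annotations)
```

Here `kept` retains only the first annotation, because the second CoT ends in an answer that disagrees with the target.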

Fine-Tuning the Student Model

  1. Fine-tune a student model using teacher forcing.
  2. Provide the question as input and the CoT and answer as the target.
  3. Because the student is trained to produce the CoT itself, no prompting is needed.
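To make the fine-tuning setup concrete, here is a small sketch of how one training example could be laid out. It is an illustration under stated assumptions rather than the study's implementation: the function names, the "The answer is X." suffix, the `<bos>` / `<eos>` markers, and the whitespace tokenization are all hypothetical choices made for this example.

```python
def make_finetuning_example(question, cot, answer):
    """Question is the input; CoT followed by the answer is the target."""
    return {"input": question, "target": f"{cot} The answer is {answer}."}

def teacher_forcing_pairs(target_tokens, bos="<bos>", eos="<eos>"):
    """With teacher forcing, the decoder is fed the gold target shifted
    right by one position and is trained to predict the next gold token,
    instead of being fed its own previous predictions."""
    decoder_inputs = [bos] + list(target_tokens)
    labels = list(target_tokens) + [eos]
    return decoder_inputs, labels

example = make_finetuning_example(
    question="A box holds 4 pens. How many pens are in 3 boxes?",
    cot="Each box holds 4 pens. 3 * 4 = 12.",
    answer="12",
)

# Whitespace splitting stands in for a real tokenizer (assumption).
tokens = example["target"].split()
decoder_inputs, labels = teacher_forcing_pairs(tokens)

# At step t the student sees decoder_inputs[: t + 1] and must predict labels[t].
for step in range(3):
    print(decoder_inputs[: step + 1], "->", labels[step])
```

Note that the input holds only the question, with no few-shot exemplars, since the CoT is part of what the student learns to generate.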

An overview of the proposed method is shown in the figure below:

[Figure: overview of the proposed method.]

In Conclusion

This study is another good example of how prompt engineering techniques that have proven effective are making their way into language model training. In this way, prompt engineering is influencing the training data topology.

This is also another example of an LLM being used to generate or augment training data for a Small Language Model.

Thirdly, the first step involves annotating an existing supervised dataset with CoT reasoning generated by a teacher model. There have been a number of studies in which very granular, fine-grained data is created via human annotation and supervision; here, the teacher model does the annotating.


I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.


