Giving Language Models A First Step Advantage

There have been a number of studies on how to improve the performance of Language Models, & several of these focussed on selecting the best inference result from a number of samples.

Cobus Greyling

--

This study, on the other hand, focusses on improving the quality of the reasoning path by starting the multi-step reasoning in the right way.

The study refers to this as the first step advantage.

We focus on enabling smaller models to learn how to start correctly.


QuestCoT Prompting

QuestCoT is a self-questioning mechanism designed to improve reasoning in smaller language models.

Before solving a problem, the model asks itself how to start, identifies the optimal reasoning chain, and then follows it, leading to significant accuracy gains across various multi-step mathematical reasoning tasks.

Figure: Comparison between the Chain-of-Thought (CoT) approach and QuestCoT. The CoT approach enables a Language Model (LM) to generate accurate answers through multiple samplings, yet it frequently struggles to confidently select the correct one. QuestCoT, conversely, utilises self-question-guided generation, which helps the model choose the appropriate reasoning chain with higher confidence.
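To make the contrast concrete, here is a minimal sketch of the two prompting styles in Python. The exact wording of both templates is an assumption for illustration, not the paper's verbatim prompts.

```python
# Illustrative prompt templates. The wording is hypothetical,
# not the verbatim prompts from the QuestCoT paper.

COT_PROMPT = """Question: {question}
Let's think step by step."""

QUESTCOT_PROMPT = """Question: {question}
Before solving, ask yourself how to start.
Self-question: What is the correct first step?
First step:"""

question = "Sam buys 7 pens at $2 each and pays with a $20 bill. How much change does he get?"
print(COT_PROMPT.format(question=question))
print(QUESTCOT_PROMPT.format(question=question))
```

The only difference is where the generation begins: CoT launches straight into a chain, while QuestCoT forces the model to commit to a first step before the chain unfolds.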

Model Orchestration

To assist smaller models in taking the starting step, QuestCoT prompts the smaller model to first ask itself how to start, before proceeding with a chain of reasoning.

On various multi-step mathematical reasoning datasets and across multiple smaller models, the study shows that getting the start right leads to significant performance gains across all models.
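A minimal sketch of this two-stage orchestration, assuming a hypothetical generate() wrapper around whichever small model you are calling (the helper name and prompt wording are illustrative, not from the paper):

```python
def generate(prompt: str) -> str:
    """Hypothetical wrapper around a small LM's text-generation API."""
    raise NotImplementedError("plug in your model call here")

def questcot_answer(question: str) -> str:
    # Stage 1: the model asks itself how to start and answers that question.
    start_prompt = (
        f"Question: {question}\n"
        "Ask yourself: how should I start solving this?\n"
        "First step:"
    )
    first_step = generate(start_prompt)

    # Stage 2: the chain of reasoning continues from the chosen first step.
    solve_prompt = (
        f"Question: {question}\n"
        f"First step: {first_step}\n"
        "Continue the reasoning step by step and give the final answer."
    )
    return generate(solve_prompt)
```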

Over the years, large language models (LLMs) have improved their reasoning abilities by explaining their intermediate thoughts.

This trend has been extended to smaller models through:

  1. Pre-training,
  2. Fine-tuning, or
  3. Knowledge distillation.

Model accuracy improves significantly when multiple reasoning chains are generated, indicating that the model understands how to answer the given problem.

However, models often struggle to select the correct initial chain, and once they start down an incorrect reasoning path, the autoregressive nature of decoding makes it difficult to recover.

The study shows that if a smaller model initiates an incorrect reasoning chain, it will continue down that incorrect path. Conversely, if the initial step is correctly determined, the model can successfully complete tasks that it would otherwise find challenging.

Figure: Accuracy comparison between baseline (no guidance) and LLM guidance (GPT-4) for the Mistral-7B model on the GSM8K dataset; 2–8 is the number of steps required to solve the problem.
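The guidance setup itself is straightforward to reproduce. A sketch assuming hypothetical gpt4_generate() and mistral_generate() wrappers (both names are placeholders for your own API calls):

```python
def gpt4_generate(prompt: str) -> str:
    """Placeholder for a call to a larger guidance model such as GPT-4."""
    raise NotImplementedError

def mistral_generate(prompt: str) -> str:
    """Placeholder for a call to a smaller model such as Mistral-7B."""
    raise NotImplementedError

def guided_answer(question: str) -> str:
    # The large model supplies only the correct first reasoning step...
    first_step = gpt4_generate(
        f"Question: {question}\nGive only the correct first step of the solution."
    )
    # ...and the small model completes the chain from that start.
    return mistral_generate(
        f"Question: {question}\n"
        f"First step: {first_step}\n"
        "Continue step by step and state the final answer."
    )
```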

Self-Consistency Decoding

The process in which a language model generates multiple answers and then selects the best one is called self-consistency decoding.

This technique involves sampling diverse reasoning paths, generating multiple solutions, and choosing the most consistent or majority answer to improve accuracy in reasoning tasks.
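In code, self-consistency decoding amounts to sampling several chains at temperature > 0 and taking a majority vote over the extracted final answers. A minimal sketch, assuming a hypothetical sample_chain() helper:

```python
import re
from collections import Counter

def sample_chain(question: str) -> str:
    """Hypothetical sampler: one temperature > 0 generation of a full reasoning chain."""
    raise NotImplementedError

def extract_answer(chain: str) -> str:
    # Naive extraction: treat the last number in the chain as the final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", chain)
    return numbers[-1] if numbers else ""

def self_consistency(question: str, n_samples: int = 20) -> str:
    answers = [extract_answer(sample_chain(question)) for _ in range(n_samples)]
    # Majority vote: the most frequent final answer wins.
    votes = Counter(a for a in answers if a)
    return votes.most_common(1)[0][0] if votes else ""
```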

Self-consistency confirms that models frequently know how to answer a given problem when sampled multiple times; the bottleneck is committing to the correct chain. When the initial step is correctly determined, the model can successfully complete tasks that it would otherwise find challenging.

Small Language Models (SLMs)

What about the ability of smaller models to solve a reasoning task?

What is the importance of taking the correct first step in reasoning?

And how can smaller models learn how to take the correct first step?

The study observed that smaller models can answer a reasoning question when sampled multiple times, but fail to select the correct reasoning chain on the first attempt.

Hence the usefulness of the first-step guidance provided by LLMs. The performance of the pre-trained models increases by 2–3X when a larger model such as GPT-4 is used for first-step guidance.

Figure: Venn diagram showing when different strategies got the solutions right.

Finally

Smaller language models often encounter difficulties in taking the correct first step during reasoning tasks, but their performance improves significantly when this step is corrected.

This phenomenon has been demonstrated by using large language models (LLMs) to guide smaller models in establishing the correct reasoning chain.

To enable smaller models to initiate reasoning independently, the QuestCoT approach is proposed, which employs question-based self-guidance to help these models determine how to start effectively.

The effectiveness of QuestCoT has been validated across four multi-step mathematical reasoning datasets using various open-source small models, resulting in notable performance improvements.

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.
