SmolLM2: Powerful, Lightweight Language Models From HuggingFace
HuggingFace released a family of compact Language Models available in three sizes. The focus in training these models was on problem solving and task completion.
Some Background
The challenge in training Language Models is producing a model that is highly capable yet lightweight enough to run on a device. One can think of these as two opposing forces…
A second trend is that models are being released as a family of models; for instance, SmolLM2 is a family of compact language models available in three sizes: 135M, 360M and 1.7B parameters.
Releasing a language model as a family of models with different sizes helps an organisation deploy models optimally for various use cases while maintaining continuity and similar model behaviour.
By using a family of models, organisations can optimise for cost, speed and accuracy while maintaining a unified AI behaviour.
This approach ensures a smooth transition between models as needs change, avoiding major inconsistencies in model responses across different use cases.
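As a rough sketch of what this looks like in practice (my own illustration, not code from the HuggingFace release), the snippet below keeps the loading and inference code identical and only swaps the checkpoint per use case; the tier names are hypothetical, while the checkpoint names are the SmolLM2 Instruct variants published on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical tiers mapped to the three SmolLM2 Instruct checkpoints on the Hub
CHECKPOINTS = {
    "on-device": "HuggingFaceTB/SmolLM2-135M-Instruct",
    "edge-server": "HuggingFaceTB/SmolLM2-360M-Instruct",
    "backend": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
}

def load_model(tier: str, device: str = "cpu"):
    # The loading code stays the same for every size in the family
    checkpoint = CHECKPOINTS[tier]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    return tokenizer, model

tokenizer, model = load_model("on-device")  # swap the tier, keep everything else unchanged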
Data curation has an especially outsized influence for smaller models, as their limited capacity must be carefully optimised for learning core knowledge and fundamental capabilities rather than memorising incidental facts.
~ HuggingFace
Most Important
Recently, Language Models have shifted away from being purely knowledge-intensive. Previously, embedding vast amounts of knowledge into models was crucial, but the issue of hallucination emerged.
This challenge has been mitigated through In-Context Learning and Retrieval-Augmented Generation (RAG), which provide contextual reference data at inference time. To scale RAG efficiently, frameworks have been developed around Language Models.
As a result, the focus has moved from knowledge density to enhancing model capabilities, such as reasoning and problem-solving.
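As a minimal sketch of this idea, the snippet below injects retrieved reference text into a SmolLM2 chat prompt at inference time; the retrieve helper and the text it returns are hypothetical stand-ins for a real retriever and document store.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def retrieve(question: str) -> str:
    # Hypothetical retriever: a real RAG setup would query a search index or vector store
    return "Policy excerpt: refunds are processed within 14 days of purchase."

question = "How long do refunds take?"
context = retrieve(question)

# The retrieved context is placed inside the prompt, so the answer is grounded
# in the supplied text rather than in memorised knowledge
messages = [{"role": "user", "content": f"Using only this context:\n{context}\n\nAnswer the question: {question}"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))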
Data Design
In the paper HuggingFace released on how the models were trained, they spend quite a bit of time on how the data was designed and curated in order to create a model that is lightweight enough to run on-device while still having advanced problem-solving, reasoning and task-completion capabilities.
This process provides a valuable way of tailoring LMs to provide helpful responses rather than simply attempting to continue the input as taught during pre-training.
~ HuggingFace
For instance, they stayed away from data resulting from large-scale web scrapes and large volumes of unstructured data.
The paper describes training on such data as the process of memorising large amounts of general and incidental facts.
They (SmolLM2) are capable of solving a wide range of tasks while being lightweight enough to run on-device. ~ HuggingFace
Rather, HuggingFace’s approach focused on two key streams of data…
Firstly, they focused on data that advances core knowledge and fundamental capabilities.
Secondly, they introduced specialised data, divided into three main groups.
The first is software code, the second is maths-related data, and the third, and most important, is data that advances reasoning: data holding reasoning tasks.
Additional Data Stream
Three additional task-specific datasets were introduced: Smol-Constraint, Smol-Rewrite and Smol-Summarization…
Smol-Constraint is a training dataset used in SmolLM2 to teach the model to follow specific rules and produce structured outputs, ensuring it adheres to predefined constraints during generation.
This helps prevent hallucinations by keeping the model’s responses within a controlled and predictable range.
Smol-Rewrite focuses on improving model expressiveness by training it to rephrase or restructure outputs while maintaining semantic consistency.
It enhances the model’s ability to generate diverse yet accurate responses, which is especially useful in creative writing and paraphrasing tasks.
Smol-Summarization optimises the model for concise and relevant content generation, improving its efficiency in distilling key information from longer texts.
By incorporating these three datasets, SmolLM2 achieves better control, coherence, and adaptability across different tasks, as the illustrative prompts below suggest.
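To make the three task styles concrete, here are illustrative prompts in the spirit of each dataset (my own examples, not actual records from the SmolLM2 training data); any of them can be sent to SmolLM2-Instruct through the same chat template used in the practical example below.
# Hypothetical prompts in the style of the three task-specific datasets
constraint_prompt = "List three benefits of exercise. Answer in exactly three bullet points, each under ten words."
rewrite_prompt = "Rewrite this sentence in a formal tone: 'Hey, the meeting got pushed to Friday.'"
summarise_prompt = "Summarise the following email in two sentences: <email text here>"

# Swap in any of the three prompts and run the generation code from the Practical Example section
messages = [{"role": "user", "content": constraint_prompt}]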
Practical Example
Below is a practical example of running inference against the model; I ran the Python code below from the command line in a terminal window.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
device = "cpu"  # use "cuda" for GPU or "cpu" for CPU

# Load the tokenizer and the model onto the chosen device
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Wrap the question in the chat format and render it with the model's chat template
messages = [{"role": "user", "content": "What is the capital of France."}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)

# Tokenise the prompt and generate a short, low-temperature completion
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
And the output…
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
What is the capital of France.<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
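The decoded output includes the chat-template markers and the echoed prompt. If only the answer text is wanted, a small variation on the last line should strip both, assuming the template markers are registered as special tokens in the tokenizer…
# Keep only the newly generated tokens and drop the special chat-template markers
answer = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(answer)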
Finally
Language Models have evolved beyond simply memorising vast amounts of incidental facts, shifting toward a more curated and efficient learning process.
Instead of training on raw, unfiltered data, modern models are trained on datasets curated through a human-supervised process to ensure quality and relevance.
This curation balances performance and model size, preventing unnecessary bloat while maintaining strong generalisation capabilities.
More importantly, models are now trained for specific behaviours, such as logical reasoning, structured problem-solving and task-oriented execution.
Rather than relying on internalised knowledge alone, contextual data is dynamically injected during inference through techniques like Retrieval-Augmented Generation (RAG).
This approach reduces hallucinations and improves factual accuracy, ensuring models generate responses grounded in relevant external data.
That is why the emphasis has been able to shift from raw knowledge retention to enhancing a model’s ability to reason, adapt and complete tasks efficiently.
As a result, modern Language Models are more specialised, capable and reliable, aligning better with real-world applications and user needs.