Small Language Models (SLMs) & Classification
Small language models (SLMs), enhanced through supervised fine-tuning and sophisticated prompt engineering, deliver near-perfect accuracy for targeted applications.
SLMs are rapidly gaining prominence, with fine-tuning enabling them to excel at specific, specialised tasks that demand precision and efficiency.
As agentic applications grow increasingly complex, tasks are now routinely decomposed into orchestrated agentic workflows. In these workflows, multiple models collaborate in seamless sequences to handle intricate processes.
As previously discussed, OpenAI’s Deep Research API exemplifies this approach, leveraging SLMs for disambiguation and prompt optimisation to boost performance.
A recent study demonstrates that combining prompt optimisation with fine-tuned SLMs achieves exceptional levels of accuracy and efficiency in classification tasks.
Increasing the size of the model or the depth of the classification head yields only marginal performance improvements.
Traditionally, text classification has been a cornerstone of chatbot development — now, SLMs are redefining what’s possible in this space.
In traditional chatbots, the foundational step has always been intent detection — classifying the user’s purpose to guide the interaction effectively.
Consider the OpenAI Agent Builder UI (as shown below): One of the introductory examples showcases an agentic application orchestrating multiple AI agents in a coordinated workflow.
Right at the outset, a dedicated classification node handles initial routing, with full flexibility to select the underlying model.
Building on the research outlined earlier, picture deploying a fine-tuned SLM for this classification step.
The result is a dramatic optimisation of latency, accuracy and cost, making agentic systems not just smarter, but faster and more economical to scale.
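As a minimal sketch of what such a routing node could look like, the snippet below classifies a user message with a fine-tuned SLM served behind an OpenAI-compatible endpoint. The model name `slm-intent-router`, the endpoint URL and the intent labels are all illustrative assumptions, not OpenAI's actual implementation.

```python
from openai import OpenAI

# Hypothetical fine-tuned SLM served behind an OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

INTENTS = ["billing", "technical_support", "sales", "general"]

def route(user_message: str) -> str:
    """Classify the user's intent so the workflow can route to the right agent."""
    response = client.chat.completions.create(
        model="slm-intent-router",  # placeholder name for the fine-tuned SLM
        messages=[
            {"role": "system",
             "content": f"Classify the user message into exactly one of: "
                        f"{', '.join(INTENTS)}. Reply with the label only."},
            {"role": "user", "content": user_message},
        ],
        temperature=0,
    )
    label = response.choices[0].message.content.strip()
    return label if label in INTENTS else "general"  # fall back on unexpected output

print(route("My last invoice was charged twice."))  # expected: billing
```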
Back to the study: SLMs can classify real-world texts such as emails or legislation remarkably well, but only if you train them a little.
Fine-tuning models is not as daunting as many think; making use of no-code GUIs from Cohere, OpenAI and others, models can be fine-tuned on small amounts of training data.
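For those who prefer code to a GUI, the same flow is available programmatically. A sketch using OpenAI's fine-tuning API follows, where the training file `intents.jsonl` and the choice of base model are assumptions for illustration:

```python
from openai import OpenAI

client = OpenAI()

# Programmatic equivalent of the no-code fine-tuning flow; the file contents
# and base-model choice here are illustrative assumptions.
upload = client.files.create(file=open("intents.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-4o-mini-2024-07-18",  # a small model that supports fine-tuning
)
print(job.id, job.status)
```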
Without fine-tuning? The SLMs fail, giving near-random answers on hard material like long papers or multilingual emails. Adding a few examples to the prompt, however, delivers a quick boost of around 10%.
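A quick sketch of that few-shot trick, reusing the intent labels from the routing example above; the example messages and their labels are invented for illustration:

```python
# Few-shot prompting: prepend a handful of labelled examples so an
# un-tuned SLM sees the task format before the real message arrives.
FEW_SHOT = [
    ("Why was I billed twice this month?", "billing"),
    ("The app crashes every time I open it.", "technical_support"),
    ("Do you offer discounts on annual plans?", "sales"),
]

def few_shot_messages(user_message: str) -> list[dict]:
    """Build a chat prompt with worked examples ahead of the target message."""
    messages = [{
        "role": "system",
        "content": ("Classify each message into one of: billing, "
                    "technical_support, sales, general. Reply with the label only."),
    }]
    for text, label in FEW_SHOT:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": user_message})
    return messages
```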
With fine-tuning? Yes! Simple techniques like supervised fine-tuning (SFT) take them to 99% accuracy on easy tasks (EU legislation) and 80–90% on tough ones (academic papers and emails), using far less compute than large models (2GB vs 86GB of memory).
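To make the SFT step concrete, here is a minimal sketch using Hugging Face's `transformers` library to attach a classification head to a 1B-parameter Llama model. The model name, dataset file and hyperparameters are assumptions, not the study's exact configuration:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL = "meta-llama/Llama-3.2-1B"  # assumed base model (gated on Hugging Face)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token

# Sequence-classification head on top of the SLM; four illustrative intents.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=4)
model.config.pad_token_id = tokenizer.pad_token_id

# Assumed CSV with columns "text" and "label" (integer class ids 0-3).
dataset = load_dataset("csv", data_files="intents.csv")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-intent-classifier",
                           per_device_train_batch_size=8,
                           num_train_epochs=3,
                           learning_rate=2e-5),
    train_dataset=dataset,
    data_collator=DataCollatorWithPadding(tokenizer),  # pad per batch
)
trainer.train()
```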
The practical recipe: use tiny Llama models (around 1B parameters) plus fine-tuning for fast, cheap everyday business use.
The big lesson: train on good data, skip fancy reasoning prompts (they waste time), and SLMs beat the giants at everyday work without requiring fancy hardware.
Considering the image below, under methods and types: prompt engineering entails no training, just better instructions, while fine-tuning trains the model on your data for a better fit.
As previously noted, the architectures exposed through OpenAI’s APIs offer a revealing glimpse: their commercial endpoints don’t simply route to a single language model.
Instead, intricate agentic workflows hum behind the scenes, orchestrating the flow of information to drive peak optimisation.
