
Why Small Language Models (SLMs) Are Revolutionising Agentic Workflows

3 min read · Oct 13, 2025


SLMs excel at retrieval-augmented generation (RAG), tool orchestration, and dynamic decision-making.

Agentic Workflows necessitate the orchestration of multiple AI Agents and Language Models to create a solution.

Recent research, including NVIDIA’s forward-looking analysis, shows that small language models (SLMs, typically 1–12B parameters) aren’t just viable — they’re often superior for these tasks.


From robust function calling and structured decoding to programmatic tool use and selection, SLMs perform well where precision and efficiency matter most.

The study found that 80–90% of agentic tasks fall into the “SLM is good enough” category.
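As a rough illustration of what that “good enough” category looks like in practice, here is a minimal Python sketch of SLM-driven tool selection. The call_slm function, the tool registry, and the canned response are hypothetical placeholders rather than any particular vendor’s API; the point is that the model only has to emit a small, structured choice, while deterministic code does the actual work.

```python
import json

# Hypothetical tool registry an agent can draw from.
TOOLS = {
    "search_orders": lambda customer_id: f"orders for {customer_id}",
    "get_weather": lambda city: f"forecast for {city}",
}

def call_slm(prompt: str) -> str:
    """Placeholder for a call to a small (1-12B parameter) model endpoint."""
    # A real implementation would send the prompt to your SLM of choice.
    return '{"tool": "search_orders", "arguments": {"customer_id": "C-1042"}}'

def select_and_run_tool(user_request: str) -> str:
    prompt = (
        "Choose one tool and its arguments as JSON with keys 'tool' and 'arguments'.\n"
        f"Available tools: {list(TOOLS)}\nRequest: {user_request}"
    )
    choice = json.loads(call_slm(prompt))                 # SLM emits a small, structured decision
    return TOOLS[choice["tool"]](**choice["arguments"])   # deterministic code does the work

print(select_and_run_tool("Show me the orders for customer C-1042"))
```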

This change in paradigm, from single AI Agents to Agentic Workflows in which multiple AI Agents are orchestrated together with other elements, moves us away from relying on a single, resource-hungry behemoth model toward intelligent orchestration of multiple, specialised models.


SLMs deliver near-LLM-level performance, matching or even surpassing larger models on agentic benchmarks.

SLMs have a 10 to 30 times lower cost as opposed to LLMs, while retaining 80–87% of the performance.

SLMs =

  • 10–100× lower token costs,
  • Dramatically reduced latency, and
  • Lower energy use,

making them ideal for edge inference and high-volume production (a rough cost sketch follows below).
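To make the cost argument concrete, here is a back-of-envelope comparison. The per-1K-token prices, call volume, and token counts are hypothetical placeholders chosen only to illustrate the 10–30× ratio; substitute your own provider’s rates.

```python
# Hypothetical per-1K-token prices; substitute your provider's real rates.
SLM_PRICE_PER_1K = 0.0002   # small model, e.g. self-hosted or budget tier
LLM_PRICE_PER_1K = 0.0060   # frontier model

CALLS_PER_DAY = 1_000_000
TOKENS_PER_CALL = 800        # prompt + completion, rough average

def daily_cost(price_per_1k: float) -> float:
    """Total daily spend for a given per-1K-token price."""
    return CALLS_PER_DAY * TOKENS_PER_CALL / 1000 * price_per_1k

slm_cost, llm_cost = daily_cost(SLM_PRICE_PER_1K), daily_cost(LLM_PRICE_PER_1K)
print(f"SLM: ${slm_cost:,.0f}/day, LLM: ${llm_cost:,.0f}/day, ratio: {llm_cost / slm_cost:.0f}x")
```

At these illustrative rates the gap is 30×, and the absolute difference only grows as call volume scales.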

Model cost and performance ratio, with efficiency.

For everyday agentic workloads — function calling, JSON-structured outputs, and tool orchestration — SLMs are ideal.

They achieve schema validity rates above 99% with guided decoding, ensuring reliable, parseable results without the bloat of LLMs.
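A minimal sketch of that contract is shown below: every model response is checked against a JSON Schema before downstream code touches it. True guided decoding constrains the token sampler itself; the validate-and-retry loop here is a simplified stand-in, and call_slm is again a hypothetical model call.

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

# The schema every agent response must satisfy before it is acted upon.
TOOL_CALL_SCHEMA = {
    "type": "object",
    "properties": {
        "tool": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["tool", "arguments"],
    "additionalProperties": False,
}

def call_slm(prompt: str) -> str:
    """Placeholder for a schema-prompted small-model call."""
    return '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'

def structured_call(prompt: str, retries: int = 2) -> dict:
    for _ in range(retries + 1):
        raw = call_slm(prompt)
        try:
            parsed = json.loads(raw)
            validate(parsed, TOOL_CALL_SCHEMA)   # reject anything off-schema
            return parsed
        except (json.JSONDecodeError, ValidationError):
            continue                             # re-ask; true guided decoding avoids this loop
    raise RuntimeError("model failed to produce schema-valid output")
```

Libraries that implement grammar- or schema-constrained decoding push this guarantee into the sampler itself, which is how schema validity rates above 99% are reached.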

You’re getting “good enough”, sometimes better, performance at 10–30× lower cost, which scales well when you’re handling thousands or millions of agent calls daily.


That said, LLMs still hold the edge in the toughest 10–20% of tasks:

  • complex multi-hop reasoning,
  • open-domain synthesis with long-range dependencies, and
  • safety-critical judgments under distribution shifts.

Here, that extra 10–15% performance uplift justifies the premium.

The key?

An SLM-default architecture with uncertainty-aware routing: start small for routine structured tasks, escalate to LLMs only when needed.

This hybrid approach retains 80–90% of agentic tasks in the efficient SLM lane, unlocking sustainable, scalable AI without compromise.
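One way such an SLM-default, uncertainty-aware router could look is sketched below. The confidence signal is the mean token log-probability of the SLM’s draft, a common but simplified proxy; the threshold, model names, and generate placeholder are assumptions rather than a prescribed design.

```python
import math

CONFIDENCE_THRESHOLD = 0.80   # tune on a validation set of routed tasks

def generate(model: str, prompt: str) -> tuple[str, list[float]]:
    """Placeholder: return (text, per-token log-probabilities) from some inference API."""
    return "draft answer from the model", [-0.1, -0.2, -0.05]

def confidence(token_logprobs: list[float]) -> float:
    # Geometric-mean token probability as a cheap uncertainty proxy.
    return math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))

def route(prompt: str) -> str:
    draft, logprobs = generate("slm-8b", prompt)        # SLM-first by default
    if confidence(logprobs) >= CONFIDENCE_THRESHOLD:
        return draft                                    # stay in the cheap lane
    answer, _ = generate("frontier-llm", prompt)        # escalate the hard 10-20%
    return answer
```

In production you would calibrate the threshold on a labelled set of routed tasks so that roughly 80–90% of traffic stays in the SLM lane, matching the split described above.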

Various studies have shown that SLMs are often superior for agentic workflows like RAG, robust function calling, structured decoding, and programmatic tool use and tool selection.

So this moves away from a single behemoth of a model toward orchestrating multiple models.


Written by Cobus Greyling

I’m passionate about exploring the intersection of AI & language. www.cobusgreyling.com
