
NVIDIA Tool-Calling Data Flywheel for Smarter, Smaller Language Models: A Practical Guide

This practical example shows how to fine-tune LLMs to optimise them for tool calling.

5 min read · May 16, 2025


Fine-tuning language models is poised to become as routine as daily software updates.

Smaller models optimised for specific tasks are gaining traction. NVIDIA’s NeMo framework and microservices platform are at the forefront of this shift, enabling developers to fine-tune models like Llama-3.2-1B-Instruct with precision and scale.

Using a data flywheel approach creates a self-reinforcing cycle in which user interactions generate data that improves the models.

I’ve seen data flywheel approaches in action with chatbot and voicebot systems, including weekly training of ASR acoustic models and daily NLU model updates.

Routine Fine-Tuning of Language Models

NVIDIA’s approach scales this principle by integrating a data flywheel:

  1. user interactions generate feedback data,
  2. which is curated,
  3. used to fine-tune models,
  4. evaluated, and
  5. deployed with guardrails to ensure accuracy and safety.

Again, this mirrors my experience with chatbots and voicebots, where continuous data loops improved NLU and ASR performance. NVIDIA’s NeMo microservices amplify this process, making it modular and GPU-accelerated for enterprise-grade applications.
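To make the loop concrete, here is a minimal, runnable sketch of one flywheel turn. Every stage function is a hypothetical stand-in for the corresponding NeMo microservice; only the control flow is taken from the five steps above:

def collect_feedback(interactions):
    # 1. Keep only interactions the user explicitly rated (assumed schema).
    return [i for i in interactions if "rating" in i]

def curate(feedback):
    # 2. Promote well-rated turns to training examples.
    return [{"prompt": f["prompt"], "target": f["response"]}
            for f in feedback if f["rating"] >= 4]

def fine_tune(base_model, dataset):
    # 3. Stand-in for a NeMo Customizer fine-tuning job.
    return {"base": base_model, "trained_on": len(dataset)}

def evaluate(model):
    # 4. Stand-in for NeMo Evaluator; returns a dummy accuracy score.
    return 0.9 if model.get("trained_on") else 0.8

def deploy_with_guardrails(model):
    # 5. Stand-in for serving the candidate behind NeMo Guardrails.
    print(f"deployed candidate trained on {model['trained_on']} examples")

interactions = [
    {"prompt": "Weather in Paris?", "response": "…", "rating": 5},
    {"prompt": "Book a table", "response": "…", "rating": 2},
]
candidate = fine_tune("llama-3.2-1b-instruct",
                      curate(collect_feedback(interactions)))
if evaluate(candidate) > evaluate({"base": "llama-3.2-1b-instruct"}):
    deploy_with_guardrails(candidate)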

Access

I made use of the NVIDIA LaunchPad UI, as seen below…

NVIDIA LaunchPad UI

It allows access to a development environment with two H100 PCIe NVL GPUs. Below is the output from the command-line UI after running the command:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

Output…

Notebook 1: Setup and Data Preparation

The first notebook, 00_setup.ipynb, lays the foundation for the tool-calling workflow.

It guides you through configuring the environment, installing dependencies, and preparing the xLAM function-calling dataset.

This dataset contains examples of function calls, enabling the model to learn how to identify and execute tools.

The setup leverages NVIDIA’s NeMo Datastore for efficient data management and NeMo Entity Store for structured knowledge representation.

This step is critical for creating a high-quality dataset, akin to curating training data for ASR or NLU models, ensuring the model learns from relevant, well-structured examples.
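For illustration, here is a sketch of turning one xLAM-style record into a chat-format training example. It assumes the public Salesforce/xlam-function-calling-60k schema, in which the query, tools and answers fields are JSON-encoded strings; the notebook’s exact target format may differ:

import json

def to_chat_example(record):
    tools = json.loads(record["tools"])      # function definitions available to the model
    answers = json.loads(record["answers"])  # ground-truth calls: name plus arguments
    return {
        "messages": [
            {"role": "user", "content": record["query"]},
            {"role": "assistant", "tool_calls": [
                {"type": "function",
                 "function": {"name": a["name"],
                              "arguments": json.dumps(a["arguments"])}}
                for a in answers]},
        ],
        "tools": tools,
    }

record = {
    "query": "What is 2 plus 2?",
    "tools": json.dumps([{"name": "add",
                          "parameters": {"a": "int", "b": "int"}}]),
    "answers": json.dumps([{"name": "add",
                            "arguments": {"a": 2, "b": 2}}]),
}
print(json.dumps(to_chat_example(record), indent=2))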

Notebook 2: Fine-Tuning Llama-3.2-1B-Instruct

The second notebook, 01_finetune.ipynb, focuses on fine-tuning the Llama-3.2-1B-Instruct model using the xLAM dataset.

NVIDIA’s NeMo Customizer microservice streamlines this process, applying GPU-accelerated techniques to adapt the model for tool-calling tasks.

Fine-tuning adjusts the model’s weights to recognize patterns in function-calling data.

This step is where the data flywheel begins to spin: curated data from user interactions (or synthetic equivalents) refines the model, improving its ability to detect and invoke tools accurately. The result is a lightweight, task-specific model optimized for performance.
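As a rough sketch, a customization job can be launched over the NeMo Customizer REST API along these lines. The endpoint path and payload shape follow NVIDIA’s NeMo microservices documentation, but the URL, dataset name and hyperparameters below are placeholder assumptions to adapt to your deployment:

import requests

NEMO_URL = "http://nemo.test"  # placeholder: your NeMo microservices endpoint

resp = requests.post(
    f"{NEMO_URL}/v1/customization/jobs",
    json={
        "config": "meta/llama-3.2-1b-instruct",
        "dataset": {"name": "xlam-tool-calling", "namespace": "default"},
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "lora",  # LoRA keeps the 1B base weights frozen
            "epochs": 2,
            "batch_size": 16,
            "learning_rate": 1e-4,
        },
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("id"))  # poll this job id until training completes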

Notebook 3: Inference with Tool-Calling

The third notebook, 02_inference.ipynb, shows how the fine-tuned model performs tool-calling during inference. Using NVIDIA NIM inference microservices, the model processes queries, identifies relevant tools, and executes API calls or dynamic workflows.

This is similar to a chatbot detecting user intent and querying a backend service.
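Because NIM exposes an OpenAI-compatible API, the inference call can be sketched as follows. The base URL, model name and get_weather tool are illustrative assumptions; the tools schema is the standard OpenAI function-calling format:

from openai import OpenAI

client = OpenAI(base_url="http://nim.test/v1",  # placeholder NIM endpoint
                api_key="not-needed-locally")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama-3.2-1b-instruct-ft",  # assumed name of the fine-tuned model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# A well-tuned model returns a structured call instead of free text.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)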

The notebook integrates NeMo Guardrails to enforce safety constraints, ensuring the model’s outputs are accurate and compliant.

This real-time interaction generates new data, feeding the flywheel for future iterations, much like user interactions in my chatbot projects drove continuous improvement.

Notebook 4: Evaluation and Continuous Improvement

The final notebook, 03_evaluation.ipynb, uses NeMo Evaluator to assess the fine-tuned model’s performance.

The evaluation results inform the next cycle of data curation and fine-tuning, closing the data flywheel loop.
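As a hand-rolled stand-in for the kind of check NeMo Evaluator automates, tool-call accuracy can be measured as an exact match on function name and arguments; the record shapes below are assumptions for illustration:

def tool_call_accuracy(predictions, references):
    # Fraction of examples where name and arguments match exactly.
    hits = sum(p["name"] == r["name"] and p["arguments"] == r["arguments"]
               for p, r in zip(predictions, references))
    return hits / len(references)

references = [{"name": "get_weather", "arguments": {"city": "Paris"}}]
predictions = [{"name": "get_weather", "arguments": {"city": "Paris"}}]
print(f"tool-call accuracy: {tool_call_accuracy(predictions, references):.2f}")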

By automating evaluation, NVIDIA ensures models remain aligned with evolving business requirements, just as frequent NLU retraining kept our voicebots responsive.

Finally

As fine-tuning becomes more common, NVIDIA’s tool-calling data flywheel offers a look into the future of AI development.

The four notebooks in the GenerativeAIExamples repository demonstrate a complete workflow — setup, fine-tuning, inference, and evaluation — that empowers developers to create efficient, specialised LLMs.

This approach, rooted in the same data-driven principles I’ve applied in chatbot and voicebot systems, is set to democratise AI customisation.

By scaling the data flywheel, NVIDIA is paving the way for a world where optimised, smaller models deliver big impact.
