
NVIDIA’s Data Flywheel Is Powering Continuous AI Agent Improvement

NVIDIA has a framework it refers to as a data flywheel: a process focused on the continuous improvement of AI Agents.

May 14, 2025


With this article, I want to kick off a series exploring NVIDIA’s data strategy.

Through this journey, I aim to deepen my own understanding of their approach and share clear, insightful explanations from my own perspective.

Why I believe this approach from NVIDIA is important…

Consider the graph below, from research out of Princeton University…

The line described as the Pareto frontier marks the boundary beyond which accuracy cannot be improved by measures that merely add cost.

The conclusion from the study is that jointly optimising accuracy and cost yields better AI Agents.
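To make the idea concrete, below is a minimal Python sketch of how such a frontier is computed over a set of agent configurations. The configurations and their scores are purely illustrative, not taken from the Princeton study.

```python
# A minimal sketch of finding the cost/accuracy Pareto frontier.
# The agent configurations and scores below are illustrative only.

agents = [
    {"name": "gpt-large",   "cost": 4.00, "accuracy": 0.87},
    {"name": "gpt-small",   "cost": 0.40, "accuracy": 0.71},
    {"name": "small+lora",  "cost": 0.55, "accuracy": 0.85},
    {"name": "large+retry", "cost": 9.50, "accuracy": 0.88},
    {"name": "large+naive", "cost": 5.00, "accuracy": 0.80},
]

def pareto_frontier(points):
    """Keep only points not dominated by another point
    (dominated = another point is strictly cheaper AND at least as accurate)."""
    frontier = []
    for p in points:
        dominated = any(
            q["cost"] < p["cost"] and q["accuracy"] >= p["accuracy"]
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda p: p["cost"])

for agent in pareto_frontier(agents):
    print(f'{agent["name"]}: cost={agent["cost"]}, accuracy={agent["accuracy"]}')
```

Here “large+naive” is dropped because “gpt-large” is both cheaper and more accurate; everything remaining represents a genuine accuracy/cost trade-off.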

Hence the importance of NVIDIA’s approach of improving accuracy through an astute data strategy.

Using customisation techniques, you can optimise smaller models to match the accuracy of much larger models, thus reducing latency and total cost of ownership (TCO). ~ NVIDIA

Model Drift (AI Agent Drift)

NVIDIA uses the term “model drift” to describe the shifting behaviour of an AI Agent over time…I think this principle of “Agent Drift” is important to grasp…

There is a focus on telemetry for AI Agents, but the real challenge is putting measures into practice to continuously improve them. Otherwise it becomes a hit-and-miss scenario where changes are implemented with uncertain outcomes.

Considering the image below, the sources of declining accuracy of an AI application over time include…

  1. The lack of a continuous process for updating enterprise knowledge, leaving information incomplete and outdated.
  2. Tools play an important role in AI Agent architecture, so a managed approach to tool management is important, as are variations in tool responses.
  3. The third and most difficult aspect is user behaviour…user needs change as new products and services are introduced, changed or deprecated.

All of these factors have an impact on the AI Agent or AI Application. The hard part is that there are times when accuracy and experience improve for no apparent reason. But in most cases user experience will decline over time, and without a clear data strategy optimisation is impossible.

The inputs to a system, the tools it leverages & their responses all evolve continuously. Without a mechanism to adapt, accuracy inevitably declines. ~ NVIDIA

As newer and more capable models emerge, continuously evaluating them (alongside their fine-tuned variants) against user interaction data can ensure sustained performance and adaptability.
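A simple way to put this into practice is to re-run a fixed evaluation set on a schedule and compare the result against a stored baseline. Below is a minimal sketch of such a drift check; `run_agent`, the evaluation cases and the thresholds are all hypothetical stand-ins for your own deployment.

```python
# A minimal sketch of detecting agent drift: re-run a fixed evaluation
# set on a schedule and compare accuracy against a stored baseline.
# `run_agent` is a hypothetical stand-in for calling your deployed agent.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str

EVAL_SET = [
    EvalCase("What is the refund window?", "30 days"),
    EvalCase("Which plan includes SSO?", "Enterprise"),
]

BASELINE_ACCURACY = 0.90   # measured when the agent was last validated
DRIFT_THRESHOLD = 0.05     # alert if accuracy drops more than 5 points

def run_agent(prompt: str) -> str:
    """Hypothetical: call your deployed agent and return its answer."""
    raise NotImplementedError

def check_for_drift() -> None:
    correct = sum(
        case.expected.lower() in run_agent(case.prompt).lower()
        for case in EVAL_SET
    )
    accuracy = correct / len(EVAL_SET)
    if BASELINE_ACCURACY - accuracy > DRIFT_THRESHOLD:
        print(f"Drift detected: {accuracy:.2f} vs baseline {BASELINE_ACCURACY:.2f}")
```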

What is NVIDIA NeMo?

The name NeMo in the NVIDIA NeMo framework stands for Neural Modules.

It reflects the framework’s modular architecture, where neural modules are interconnectable building blocks used to construct, train and customise AI models and processes.

The term emphasises the toolkit’s focus on flexible, reusable components for building conversational and generative AI systems.

As I have mentioned, in a follow-up article I will illustrate how these different Neural Modules can be orchestrated into an AI strategy.

Some Practical Examples

OpenAI has placed a focus on model distillation, as I have written in a recent article…

Model providers aim to lock in organisations by encouraging investment in their models through fine-tuning, creating dependency on their ecosystem.

The NeMo LoRA approach offers model independence, as well as an accelerated approach to data engineering.

For example, NVIDIA states that by fine-tuning a Llama 3.2 1B Instruct model on the xLAM dataset (~60,000 tool calling examples), it is possible to achieve tool calling accuracy close to a Llama 3.1 70B Instruct model, thereby reducing the model size by 70x.

This is a practical example of model optimisation…
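To illustrate what such training data looks like, here is the general shape of a single tool-calling example, loosely modelled on function-calling datasets like xLAM. The field names and the example itself are illustrative; the exact schema of the real dataset may differ.

```python
# Illustrative shape of one tool-calling training example: a user query,
# the tools available to the model, and the expected tool call(s).
# Field names are assumptions, not the verified xLAM schema.

example = {
    "query": "What is the weather in Berlin tomorrow?",
    "tools": [
        {
            "name": "get_weather",
            "description": "Get a weather forecast for a city.",
            "parameters": {
                "city": {"type": "string"},
                "date": {"type": "string"},
            },
        }
    ],
    "answers": [
        {"name": "get_weather", "arguments": {"city": "Berlin", "date": "tomorrow"}}
    ],
}
```

Tens of thousands of such examples teach a small model when to call a tool and how to fill its arguments, which is exactly the capability measured in the accuracy comparison above.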

NeMo’s Ambit

NVIDIA NeMo is a comprehensive platform for building custom generative AI, including:

  • Large language models (LLMs),
  • Vision language models (VLMs),
  • Retrieval models, video models & speech AI.

In articles to follow, I will cover the following tasks with practical examples:

  • Preparing data for fine-tuning and evaluation
  • Customising the model using LoRA fine-tuning
  • Assessing the accuracy of the customised model
  • Implementing guardrails to ensure safe LLM behaviour

What is LoRA Fine-Tuning?

LoRA (Low-Rank Adaptation) is an efficient fine-tuning technique for large language models (LLMs) that reduces computational and memory demands while maintaining performance.

LoRA fine-tunes a pre-trained LLM by freezing the original weights and training only a small low-rank update, expressed as the product of two small matrices, A and B, rather than updating the entire model.

Fine-tune for specific tasks efficiently

LoRA significantly reduces the number of trainable parameters, lowering GPU memory and compute requirements.

Only the small A and B matrices need to be saved, not the entire model, making it easy to store and share fine-tuned versions.

Multiple task-specific LoRA adapters can be created for the same base model, allowing quick switching between tasks without retraining the full model.

Since the original weights are frozen, the model retains its broad capabilities while adapting to specific tasks.

LoRA is ideal for scenarios where resources are limited, multiple task-specific models are needed, or rapid iteration is required. It’s widely used in applications like chatbots, domain-specific AI (e.g., legal or medical), and personalised AI agents.
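To show the mechanics behind the A and B matrices mentioned above, here is a minimal PyTorch sketch of the LoRA idea (not NeMo’s implementation): the pre-trained weight W is frozen and only the low-rank factors are trained, so the effective weight becomes W + (α/r)·BA.

```python
# A minimal sketch of the LoRA idea in PyTorch (not NeMo's implementation).
# The pretrained weight is frozen; only the low-rank A and B are trained.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scale = alpha / rank
        # A projects down to the low rank, B projects back up.
        # B starts at zero so the adapter initially changes nothing.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the trainable low-rank update.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")
```

For a 4096×4096 layer at rank 8 this trains roughly 65K parameters against nearly 17M frozen ones, which is why adapters are cheap to store and quick to swap per task.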

For practical implementation, NVIDIA NeMo’s tutorials guide users through preparing data, applying LoRA, evaluating results, and adding guardrails, making it accessible for enterprise and research use.
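As a small taste of the guardrails step, the open-source NeMo Guardrails library can wrap an LLM with a rails configuration. The snippet below assumes the library’s current Python API and a `./config` directory containing a `config.yml` (and optional Colang flows) that define the model and the rails to enforce.

```python
# A short example using the open-source NeMo Guardrails library.
# Assumes ./config holds a config.yml defining the model and rails.

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "What can you help me with?"}
])
print(response["content"])
```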

In Closing

NeMo enables a process of continuously enhancing AI Agents with up-to-date information.

It streamlines the process by curating AI and human feedback, refining and evaluating models, and deploying with guardrails and retrieval-augmented generation (RAG) to ensure optimal performance.

With NVIDIA NeMo — secure, scalable, enterprise-grade software within NVIDIA AI Foundry — you can develop and maintain high-performing AI agents. Build, customise and deploy multimodal generative and agentic AI applications effortlessly.

NVIDIA NeMo microservices offer modular tools for:

  • Customising (NeMo Customizer),
  • Evaluating (NeMo Evaluator), and
  • Securing (NeMo Guardrails) LLMs.

These microservices optimise AI applications on Kubernetes clusters, whether on-premises or in the cloud.

This pipeline enables data flywheels to continuously improve agents with a steady stream of data.
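Conceptually, one turn of that flywheel looks something like the sketch below. Every helper is a hypothetical placeholder for the corresponding step (telemetry, curation, customisation, evaluation, guarded deployment), not a real NeMo API.

```python
# A high-level sketch of one data-flywheel iteration. All helpers are
# hypothetical placeholders, not real NeMo microservice calls.

def collect_production_interactions() -> list[dict]:
    """Pull recent user/agent interactions from telemetry."""
    return [{"prompt": "...", "response": "...", "feedback": "thumbs_up"}]

def curate_training_data(logs: list[dict]) -> list[dict]:
    """Keep only interactions with positive feedback for training."""
    return [log for log in logs if log["feedback"] == "thumbs_up"]

def lora_finetune(base_model: str, dataset: list[dict]) -> str:
    """Fine-tune a LoRA adapter on the curated data; return its ID."""
    return f"{base_model}-lora-v2"

def evaluate(model_id: str) -> float:
    """Score the candidate on a held-out evaluation set."""
    return 0.86

def deploy_with_guardrails(model_id: str) -> None:
    """Roll out the candidate behind guardrails."""
    print(f"deployed {model_id}")

def flywheel_iteration(base_model: str, baseline_accuracy: float) -> None:
    logs = collect_production_interactions()
    dataset = curate_training_data(logs)
    candidate = lora_finetune(base_model, dataset)
    if evaluate(candidate) > baseline_accuracy:
        deploy_with_guardrails(candidate)

flywheel_iteration("llama-3.2-1b-instruct", baseline_accuracy=0.80)
```

The loop only promotes a candidate that beats the current baseline, which is what keeps the flywheel turning in the right direction.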


Written by Cobus Greyling

I’m passionate about exploring the intersection of AI & language. www.cobusgreyling.com
