
NVIDIA Riva 2.0 Is Now Available

And How To Get Started With NVIDIA Riva For Conversational AI Services

Cobus Greyling
10 min read · Apr 7, 2022



Around February 2021, early access was granted to Jarvis 1.0 Beta. On 28 July 2021, NVIDIA Jarvis was rebranded to Riva. I always thought the name Jarvis was already too widely used. The good news is that the core technologies, performance and roadmap remain unchanged.

On 28 March 2022 NVIDIA Riva 2.0 was launched.

As of now, NVIDIA Riva 2.0 is available in the NVIDIA NGC™ catalog. Riva includes pipelines for deploying real-time speech AI in multiple languages for:

  • Customer care applications,
  • Transcription,
  • and virtual assistants.

The latest Riva version includes:

  • ASR in multiple languages: English, Spanish, German, Russian, and Mandarin.
  • High-quality TTS voices customizable for unique voice fonts.
  • Domain-specific customization with TAO Toolkit or NVIDIA NeMo for unparalleled accuracy in accent, domain, and country-specific jargon.
  • Support to run in cloud, on-prem, and on embedded platforms.

Virtual Voice Assistant (Voicebot) in its most basic architecture.

NVIDIA Riva Enterprise was also announced, a commercial offering with unlimited automatic speech recognition (ASR) and text-to-speech (TTS) usage and full-service support.

The launch includes:

  • State-of-the-art ASR with real-time, world-class accuracy for English, Spanish, German, Russian, and Mandarin.
  • Human-like female and male TTS voices with fine-grained control for voice expressivity.
  • The Riva custom voice recorder with 30 minutes of input voice data.
  • A free trial of Riva Enterprise on NVIDIA LaunchPad with step-by-step guided labs and ready-to-use software, sample data, and applications.

Startup view of the voicebot

Some Initial Observations


  • NVIDIA Riva Enterprise is available via a free trial through NVIDIA LaunchPad, but an application must be submitted for consideration.
  • The promise from NVIDIA is fast-tracking enterprise speech AI projects.


  • Riva is not a conversational AI development framework. It supplies STT, TTS, NLP and NLU (Intents, Entities, Named Entities).
  • Dialog State Management is not catered for. NVIDIA Riva has a Google Dialogflow and a Rasa demo. This speaks to the flexibility of the Riva framework, which can be integrated with such diverse approaches to dialog management.
  • Riva Studio helps you build applications such as chatbots, virtual assistants and multimodal virtual assistants that leverage Riva skills. Early access to NVIDIA Riva Studio is limited to developers working on text-based chatbots. I have applied for early access, with no success. Studio might be Riva’s dialog state management solution.
  • Companies making use of Riva as underlying technology for virtual assistants are: Snap (I guess via their acquisition), T-Mobile, RingCentral, and (Who leads the 2022 Gartner Magic Quadrant for Enterprise Conversational AI Platforms).

More On NVIDIA Riva

NVIDIA Riva is a GPU-accelerated SDK and application framework for developing multimodal conversational AI applications.

The focus is on low latency (less than 300 milliseconds) and high-performance demands.

It is a high-performance conversational AI solution incorporating speech and visual cues, often referred to as face-speed. Face-speed includes gaze detection, lip activity, etc.

Sequence of events we will follow to get to a working prototype.

The multimodal aspect of Riva is best understood in the context of where NVIDIA wants to take Riva in terms of functionality.

This includes:

  • ASR (Automatic Speech Recognition) / STT (Speech To Text)
  • NLU (Natural Language Understanding)
  • Gesture Recognition
  • Lip Activity Detection
  • Object Detection
  • Gaze Detection
  • Sentiment Detection

Again, what is exciting about this collection of functionality is that Riva is poised to become a true Conversational Agent.

The NVIDIA Riva Demo Weather Bot

Day to day, as humans we communicate not only by voice, but also by detecting the gaze of the speaker, lip activity, etc.

Another key focus area of Riva is transfer learning.

There is a significant cost saving in taking the advanced base models of Riva and repurposing them for specific uses. The functionality currently available in Riva 1.0 Beta includes:

  • ASR/STT,
  • TTS and
  • NLU.

Positives & Considerations

The positives are overwhelming…

  • Implementations can be cloud, or local/edge.
  • Riva speaks to mission critical, industrial strength cognitive services & Conversational AI.
  • A new framework for high-performance ASR/STT, TTS and NLU.
  • Developers have access to transfer learning and can leverage the investment made by NVIDIA.
  • The NVIDIA GPU environment addresses mission critical requirements, where latency must be kept to a minimum.
  • Clear roadmap for Riva in terms of the near future and imminent features.
  • Riva addresses requirements for ambient ubiquitous interfaces.


And the considerations:

  • Access, development and deployment seem daunting and the framework appears complicated. In this article I want to debunk access apprehensions. However, production deployment will most certainly be complex.
  • Most probably, for a production environment, specific hardware considerations will be paramount, especially where cloud/connectivity latency cannot be tolerated.

As per NVIDIA:

Developers at enterprises can easily fine-tune state-of-the-art models on their data to achieve a deeper understanding of their specific context and optimize for inference to offer end-to-end real-time services that run in less than 300 milliseconds (ms) and deliver 7x higher throughput on GPUs compared with CPUs.

The Riva framework includes pre-trained conversational AI models, tools in the NVIDIA AI Toolkit, and optimized end-to-end services for speech, vision, and natural language understanding (NLU) tasks.

Getting Started

As mentioned, NVIDIA Riva is well suited for cloud or edge computing.

Edge computing is computing on localized servers and devices to improve speed and reduce latency. Instead of relying entirely on cloud computing providers, edge computing first processes data locally.

It is easy to be overwhelmed when getting started with something like Riva.

This article is not a tutorial, but rather a guide on how to:

  • Start as small and simple as possible.
  • Become familiar with the environment, to some extent at least.
  • Spiral outwards from this initial prototype in measured iterations, adding functionality and complexity.

Graphics Processing Unit

A requirement for experimenting and building with Riva is access to a GPU. Specifically, Riva needs an NVIDIA GPU based on the Turing or Volta architecture.

This is the one big impediment to experimentation. In this story I look at one of the cost-effective options you can use to access Riva and start building amazing Conversational AI experiences.

The following are requirements for a successful Riva install:

  • Access to an NVIDIA GPU based on the Turing or Volta architecture.
  • An NVIDIA GPU Cloud (NGC) account that you are logged into.
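
If you already have a machine at hand, one way to check the first requirement is to ask the driver for the GPU name. The sketch below assumes `nvidia-smi` is installed. The `gpu_arch` helper and its name patterns are hypothetical and cover only a few common Turing and Volta cards, so treat it as a starting point rather than a complete lookup.

```shell
# gpu_arch is a hypothetical helper: it maps a GPU product name to its
# architecture. The patterns are a partial list, not an official table.
gpu_arch() {
  case "$1" in
    *T4*|*"RTX 20"*|*"Quadro RTX"*) echo "Turing" ;;
    *V100*|*"TITAN V"*)             echo "Volta" ;;
    *)                              echo "unknown" ;;
  esac
}

# On a machine with the NVIDIA driver installed, feed it the reported name:
#   gpu_arch "$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
gpu_arch "Tesla T4"
```

An "unknown" result only means the name is not in this short list; it says nothing about whether Riva will run.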

Hardware Requirements

Let’s solve the GPU access problem first. I opted to make use of an NVIDIA Deep Learning AMI (Amazon Machine Image). This is available on AWS EC2 and can be created in a few minutes.

Choosing the NVIDIA Deep Learning AMI

Once the EC2 (Elastic Compute Cloud) instance is running, be sure to stop it when not in use. You are charged per hour while the instance is running, and for prototyping and experimenting there is no need to run it 24/7.
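
Stopping and starting can also be done from a terminal rather than the EC2 portal. The following is a minimal sketch that assumes the AWS CLI is installed and configured on your own machine. The `ec2_power` wrapper, the instance ID and the region are placeholders made up for illustration.

```shell
# ec2_power is a hypothetical wrapper around the AWS CLI.
#   $1 = start | stop, $2 = instance ID, $3 = region
# By default it only prints the command; set DRY_RUN=0 to execute it
# (this needs the AWS CLI installed and credentials configured).
ec2_power() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "aws ec2 $1-instances --instance-ids $2 --region $3"
  else
    aws ec2 "$1-instances" --instance-ids "$2" --region "$3"
  fi
}

# Placeholder instance ID and region; substitute your own.
ec2_power stop i-0123456789abcdef0 us-east-1
```

Printing by default is a deliberate choice here, so that pasting the snippet never stops or starts an instance by accident.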

It is also worthwhile to compare the cost of different regions; I found the cost differs significantly from one region to another.

Adding Disk Space to the EC2 Instance.

As your installation runs (which we will get to later), you will find the standard 32 GB storage does not suffice. I increased it to 256 GB. Storage can easily be increased via the EC2 portal on AWS; as seen in the image above.
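
Before starting a long download it can help to check how much space is actually free on the instance. This is a small sketch using standard `df` and `awk`. The `free_gb` helper and the 100 GB threshold are assumptions made for illustration, not figures from NVIDIA.

```shell
# free_gb is a hypothetical helper: whole gigabytes free on the file system
# that holds the given path (defaults to /).
free_gb() {
  df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# 100 GB is an arbitrary threshold chosen for this sketch, not a Riva figure.
if [ "$(free_gb /)" -lt 100 ]; then
  echo "Only $(free_gb /) GB free; consider growing the volume first."
else
  echo "$(free_gb /) GB free."
fi
```

If the number does not change after you enlarge the volume in the EC2 portal, the partition and file system on the instance probably still need to be extended. On Ubuntu this is usually done with `growpart` and `resize2fs`, with device names that vary per instance type (check with `lsblk`).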

Accessing Your Hardware

Once your EC2 Ubuntu instance is up and running, you need to connect to it. The easiest way is via PuTTY, so install PuTTY on your machine.

PuTTY launch screen.

When creating the EC2 instance, you are presented with an option to download a PEM key. Download this key file and save it somewhere on your machine.

You will need it to create a private key with the PuTTY key generator.

Once the private key (*.ppk) is generated, go to SSH and then Auth within PuTTY and select this file.

This is an effective and lightweight way to connect to your Ubuntu machine. At this stage your AWS machine is up and running and you have access to it via the command line.
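
On macOS or Linux, the OpenSSH client can use the downloaded PEM file directly, with no conversion to *.ppk. The sketch below prints the equivalent login command. The `ssh_login_cmd` helper, the key path and the host name are placeholders, and the `ubuntu` user name is an assumption that depends on the AMI.

```shell
# ssh_login_cmd is a hypothetical helper that prints the OpenSSH command
# matching the PuTTY setup above.
#   $1 = path to the downloaded .pem key, $2 = public DNS name or IP
# The "ubuntu" user is an assumption; check the AMI's documentation.
ssh_login_cmd() {
  echo "ssh -i $1 ubuntu@$2"
}

# OpenSSH refuses private keys that other users can read, so restrict the
# file first:  chmod 400 ~/riva-key.pem
# Key path and host name below are placeholders.
ssh_login_cmd ~/riva-key.pem ec2-host.example.com
```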

PuTTY Startup screen after login.

Next, let’s take a look at the software requirements…

Software Requirements

Now that we have our hardware up and running with access, we need to start installing Riva and launch some of the test and demo applications.

Your starting point is to access the NVIDIA NGC website.

NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing.

Create a user profile on NGC; this is free and quick to do. After you have created your profile, be sure to check your email for the confirmation message and click confirm.

Generate an API Key within NVIDIA NGC and save it.

Be sure to save the API key as seen in the image above; we will use it to authenticate on our AMI.

This process can be performed in a browser on your own machine, since at this stage we do not have any GUI or desktop access to our virtual machine.

Installing NGC On The AMI

Back on your virtual machine on AWS, from the PuTTY application command line, you can execute all the installs and actions.

Before installing Riva, you need to install the NGC command line tool (NGC CLI) with these commands:

wget -O && unzip -o && chmod u+x ngc
md5sum -c ngc.md5
echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
ngc config set

During this installation process, you will be prompted for the API key you generated on NGC.

The NGC CLI installation procedure for Ubuntu

Deploying Riva On The AMI

Deploying Riva takes a while, and it is here that the installation might halt because of disk space requirements.

The option shown below is the Quick Start scripts approach to set up a local workstation and deploy the Riva services using Docker.

Download script.

ngc registry resource download-version nvidia/riva/riva_quickstart:1.4.0-beta

Initialize and start Riva. The initialization step downloads and prepares Docker images and models. This step takes quite a while.

cd riva_quickstart_v1.4.0-beta
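
In the Quick Start resource, initializing and starting Riva is done by shell scripts shipped in the downloaded directory. The sketch below assumes those scripts are named `riva_init.sh` and `riva_start.sh`, so check the directory listing for your version. The `run_riva_quickstart` wrapper is hypothetical and simply runs the two scripts in order, stopping at the first failure.

```shell
# run_riva_quickstart is a hypothetical wrapper.
#   $1 = path to the downloaded Quick Start directory
# It assumes the directory contains riva_init.sh and riva_start.sh, and it
# runs in a subshell so the caller's working directory is left unchanged.
run_riva_quickstart() (
  cd "$1" || exit 1
  for script in riva_init.sh riva_start.sh; do
    if [ ! -f "$script" ]; then
      echo "missing $script in $1" >&2
      exit 1
    fi
    bash "$script" || exit 1
  done
)

# Usage, from the directory where the download landed:
#   run_riva_quickstart riva_quickstart_v1.4.0-beta
```

Under the same assumption, the client container used in the next step is started with `riva_start_client.sh` from that directory.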

From inside the client container, start a Jupyter Notebook instance with the command below. You can then try the different services using the provided Jupyter notebooks and access them from your local machine.

jupyter notebook --ip= --allow-root --notebook-dir=/work/notebooks

Copy the token to use in the browser.

SSH Tunneling

In order to access the Jupyter Notebook on your machine, you will need to create an SSH tunnel to the AMI. This sounds more daunting than it is.

The PuTTY option on the left to set up an SSH tunnel between your workstation and the AMI.

The PuTTY utility makes tunneling easy to set up. Once the tunnel is configured, click the Open button to connect to the server via SSH and tunnel the desired ports.

Navigate to http://localhost:8000 (or whatever port you chose) in a web browser on your local machine to connect to Jupyter Notebook running on the AMI server.
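
For readers using OpenSSH instead of PuTTY, the same tunnel can be expressed as a single command. This is a sketch under assumptions: the `tunnel_cmd` helper, the key path and the host name are placeholders, the `ubuntu` user depends on the AMI, and 8888 is Jupyter's default port, which may differ from what your notebook server reports.

```shell
# tunnel_cmd is a hypothetical helper that prints an OpenSSH port-forwarding
# command.
#   $1 = .pem key, $2 = host, $3 = local port, $4 = notebook port on the AMI
# -N opens the tunnel without starting a remote shell; -L forwards the ports.
tunnel_cmd() {
  echo "ssh -i $1 -N -L $3:localhost:$4 ubuntu@$2"
}

# 8888 is Jupyter's default port and an assumption here; use whatever port
# the notebook server reported when it started.
tunnel_cmd ~/riva-key.pem ec2-host.example.com 8000 8888
```

While that command runs, http://localhost:8000 on your workstation is forwarded to the notebook port on the AMI.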

To continue past this point, copy and paste the token from the PuTTY session where you launched the notebook instance.

To continue, enter the token which was presented from the last command sent.

This view should be familiar to you, and opening the folders takes you into a good presentation of how you might go about interacting with the Riva services.

Demo applications which can be run within Jupyter Notebook.


The services available now via Riva are:

  • Speech recognition trained on thousands of hours of speech data with stream or batch mode.
  • Speech synthesis available in batch and streaming mode.
  • NLU APIs with a host of services.

The advent of Riva will surely be a jolt to the current marketplace, especially for embedded conversational AI solutions. The freedom of installation and the open architecture will stand NVIDIA in good stead. As noted, production architecture and deployment will demand careful consideration.


