Launching NVIDIA Riva Notebooks For Language AI Services

And How To Get You Started…

Cobus Greyling

8 min read · Aug 9, 2021

Introduction

What is Riva from NVIDIA?

NVIDIA Riva Weather demo chatbot deployed on an AMI and accessed via SSH tunnel

NVIDIA recently released Riva, which is described as an application framework for Multimodal Conversational AI.

The focus is on low latency, easy access to deep learning, and high performance demands.

The multimodal aspect of Riva is best understood in the context of where NVIDIA wants to take Riva in terms of functionality.

These include:

ASR (Automatic Speech Recognition)
STT (Speech To Text)
NLU (Natural Language Understanding)
Gesture Recognition
Lip Activity Detection
Object Detection
Gaze Detection
Sentiment Detection

Riva will be a living service, living in the user’s environment and surfacing via different devices and environments (car, home, office, phone, etc.). This leads to a phenomenon known as ambient orchestration, where patterns are detected in user behavior and specific information is surfaced at the right time, in the right place, on the right device. Hence these living services are orchestrated to follow the user’s touchpoints.

This story is an overview of what is available in the NVIDIA Riva demo Jupyter Notebooks and what can be learned from them.

What is exciting about this collection of functionality is that Riva is also poised to become a true Conversational Agent.

We communicate as humans not only in voice, but by detecting the gaze of the speaker, lip activity etc.

Another key focus area of Riva is transfer learning. There is a significant cost saving in taking the advanced base models of Riva and repurposing them for specific uses.

The functionality which is currently available in Riva includes:

Automatic Speech Recognition (ASR/STT),
Speech Synthesis (TTS) and
Natural Language Processing & Understanding (NLU)

Here is an example of the NVIDIA Riva Weather demo chatbot running in a browser, deployed on an AMI and accessed via SSH tunnel.

Virtual Voice Assistant

This Virtual Assistant sample application demonstrates how to use Riva AI Services, specifically ASR, NLP, and TTS, to build a simple but complete conversational AI application.

It demonstrates receiving input via speech from the user, interpreting the query via an intent recognition and slot filling approach, compiling a response, and speaking this back to the user in a natural voice.

Read more about the installation process here.

To install and run the Riva Voicebot demo, start your Riva services:

cd riva_quickstart_v1.4.0-beta
bash riva_init.sh

Download the samples image from NGC.

docker pull nvcr.io/nvidia/riva/riva-speech-client:1.4.0-beta-samples

Run the service within a Docker container.

docker run  -it --rm -p 8009:8009 nvcr.io/nvidia/riva/riva-speech-client:1.4.0-beta-samples /bin/bash

Within this directory…

cd samples/virtual-assistant

Edit config.py with the right Riva IP, hosting port and your weatherstack API access key (from https://weatherstack.com/). Then, start the server.

Getting your weatherstack API, on the free tier…

Getting your API Access Key…

Get Your API Access key and update config.py
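Before pasting the key into config.py, it can help to confirm that the key works on its own. Below is a minimal sketch that only builds the request URL. The endpoint and the access_key and query parameter names are as I recall them from the public weatherstack documentation, so treat them as assumptions and check them against your dashboard. The helper name current_weather_url is made up for this illustration.

```python
from urllib.parse import urlencode

# Assumed weatherstack "current weather" endpoint; verify against the
# weatherstack documentation for your plan.
WEATHERSTACK_CURRENT = "http://api.weatherstack.com/current"


def current_weather_url(access_key, query):
    """Build a URL for a current-weather lookup (hypothetical helper)."""
    params = urlencode({"access_key": access_key, "query": query})
    return f"{WEATHERSTACK_CURRENT}?{params}"


# Replace the placeholder with your own key, then open the URL in a browser
# or fetch it with the HTTP client of your choice.
print(current_weather_url("YOUR_ACCESS_KEY", "Santa Clara"))
```

If the key is valid, the response should be a JSON document describing the current weather for the queried location.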

Start the service…

python3 main.py

Below you can see the NVIDIA Riva weather bot accessible at the URL https://127.0.0.1:8009/rivaWeather.

Again, you will have to set up SSH tunneling from your virtual machine. Read about that here.
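As a rough illustration of the tunnel, the sketch below assembles a standard ssh local port forward for port 8009, the port the demo is published on. The host address, user name and key file are placeholders, and tunnel_command is a hypothetical helper, not something shipped with Riva.

```python
import shlex


def tunnel_command(remote_host, user, port=8009, key_file=None):
    """Build an `ssh -L` command forwarding a local port to the same port on the VM."""
    # -N: do not run a remote command, just keep the forward open.
    cmd = ["ssh", "-N", "-L", f"{port}:localhost:{port}"]
    if key_file:
        cmd += ["-i", key_file]
    cmd.append(f"{user}@{remote_host}")
    return cmd


# Placeholder values: substitute your own VM address, user and key file.
print(shlex.join(tunnel_command("203.0.113.10", "ubuntu", key_file="riva-demo.pem")))
```

Run the printed command on your local machine and leave it open; the bot should then be reachable in your local browser on port 8009.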

For a closer look at example code for ASR, TTS and NLU, see the Jupyter Notebook examples…

Jupyter Notebook Examples

The functionality on display via Jupyter notebook:

Offline ASR Example
Core NLP Service Examples
TTS Service Example
Riva NLP Service Examples

In short, here are a few extracts from the example applications.

The commands to start the Riva client container and launch Jupyter Notebook:

bash riva_start_client.sh
jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/work/notebooks

Add Punctuation To Text

Adding punctuation is a very useful feature, especially in the following use cases:

When user speech from a voicebot interaction is transcribed and presented on a display.
Archiving user input speech in text format.
When text is generated, this function can also be helpful.

Example Input:

add punctuation to this sentence here please ok
do you have any red nvidia shirts
i need one cpu four gpus and lots of memory
for my new computer it's going to be very cool

Example Output:

Add punctuation to this sentence here, please, Ok?
Do you have any red Nvidia shirts?
I need one cpu, four gpus and lots of memory for my new computer. It's going to be very cool.

Named Entities

In NLP, a named entity is a real-world object, such as a person, place, company or product.

These named entities can be abstract or have a physical existence. Below are examples of named entities being detected by Riva NLU.

Named Entities code block in the Jupyter Notebook

Example Input:

Jensen Huang is the CEO of NVIDIA Corporation, located in Santa Clara, California.

Example Output:

Named Entities:
  jensen huang (PER)
  nvidia corporation (ORG)
  santa clara (LOC)
  california (LOC)
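Once the entities are detected, a typical next step is to group them by type. The sketch below works on the printed output shown above rather than on the Riva response object itself, so it makes no assumptions about the client API. The function name group_entities is made up for this illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "  santa clara (LOC)" from the printed output.
ENTITY_LINE = re.compile(r"^\s*(?P<text>.+?)\s+\((?P<label>[A-Z]+)\)\s*$")


def group_entities(printed_output):
    """Group entity strings by their label (PER, ORG, LOC, ...)."""
    grouped = defaultdict(list)
    for line in printed_output.splitlines():
        match = ENTITY_LINE.match(line)
        if match:  # the "Named Entities:" header line has no label and is skipped
            grouped[match.group("label")].append(match.group("text"))
    return dict(grouped)


sample = """Named Entities:
  jensen huang (PER)
  nvidia corporation (ORG)
  santa clara (LOC)
  california (LOC)"""

print(group_entities(sample))
```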

Text Classification

Riva NLP has a default text classification feature.

Intents & Entities With Input Domain

This is a good example of how an NLU API can be implemented to extract intents and entities; or as Riva refers to them, slots. The input domain is defined upfront.

Intent and Slot Classification with Input Domain

Intents & Entities

Here the domain is not provided; the intent and slots are shown with their scores.

Example Input:

Is it going to rain tomorrow?

Example Output:

intent {
  class_name: "weather.rainfall"
  score: 0.9661880135536194
}
slots {
  token: "tomorrow"
  label {
    class_name: "weatherforecastdaily"
    score: 0.5325539708137512
  }
}
slots {
  token: "?"
  label {
    class_name: "weatherplace"
    score: 0.6895459890365601
  }
}
domain_str: "weather"
domain {
  class_name: "weather"
  score: 0.9975590109825134
}
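Notice in the output above that the question mark was tagged as a weatherplace slot, with a higher score than the slot for "tomorrow". Some post-processing is therefore worth considering before acting on the slots. Here is a minimal sketch, assuming the response has first been copied into plain Python dictionaries. The dictionary layout, the usable_slots helper and the 0.5 threshold are all my own illustration and not part of Riva.

```python
import string

# The output above, copied by hand into plain dictionaries (assumed layout).
response = {
    "intent": {"class_name": "weather.rainfall", "score": 0.9661880135536194},
    "slots": [
        {"token": "tomorrow", "class_name": "weatherforecastdaily", "score": 0.5325539708137512},
        {"token": "?", "class_name": "weatherplace", "score": 0.6895459890365601},
    ],
}


def usable_slots(slots, min_score=0.5):
    """Drop slots whose token is only punctuation or whose score is too low."""
    kept = []
    for slot in slots:
        only_punctuation = all(ch in string.punctuation for ch in slot["token"])
        if not only_punctuation and slot["score"] >= min_score:
            kept.append(slot)
    return kept


for slot in usable_slots(response["slots"]):
    print(slot["token"], slot["class_name"], round(slot["score"], 3))
```

A score threshold alone would not have helped here, since the spurious slot scored higher than the real one; hence the extra punctuation check.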

Weather Intent Batch Queries

What I particularly like about this function is the intent of weather in this case, and the sub-intents of cloudy, rainfall or humidity.

This can also be useful for real-time disambiguation in conversations.

Example Input:

"Is it currently cloudy in Tokyo?",
"What is the annual rainfall in Pune?",
"What is the humidity going to be tomorrow?"

Example Output:

[weather.cloudy]	Is it currently cloudy in Tokyo?
[weather.rainfall]	What is the annual rainfall in Pune?
[weather.humidity]	What is the humidity going to be tomorrow?
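To make use of the sub-intents, the dotted labels can be split into a domain and a sub-intent and the queries grouped accordingly. This is a small sketch on the output shown above; the helper names split_intent and group_by_sub_intent are hypothetical, and the results are typed in by hand rather than read from a live Riva response.

```python
from collections import defaultdict


def split_intent(label):
    """Split a label like 'weather.rainfall' into (domain, sub_intent)."""
    domain, _, sub_intent = label.partition(".")
    return domain, sub_intent


def group_by_sub_intent(results):
    """Group (label, query) pairs by domain, then by sub-intent."""
    grouped = defaultdict(lambda: defaultdict(list))
    for label, query in results:
        domain, sub_intent = split_intent(label)
        grouped[domain][sub_intent].append(query)
    return {domain: dict(subs) for domain, subs in grouped.items()}


results = [
    ("weather.cloudy", "Is it currently cloudy in Tokyo?"),
    ("weather.rainfall", "What is the annual rainfall in Pune?"),
    ("weather.humidity", "What is the humidity going to be tomorrow?"),
]

for sub_intent, queries in group_by_sub_intent(results)["weather"].items():
    print(sub_intent, queries)
```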

A Few Other Random Examples

Offline Automatic Speech Recognition Example

Conclusion

The advent of Riva will surely be a jolt to the current marketplace, especially for embedded conversational AI solutions. The freedom of installation that comes with an open architecture will stand NVIDIA in good stead. Deployment and production architecture will demand careful consideration.

The NVIDIA Riva team made sure the documentation is thorough and comprehensive. The demo and quick start applications are of great help in getting started, especially in an environment which is complex and can be very tricky to prototype in.

The services available now via Riva are:

Speech recognition trained on 7000 hours of speech data with stream or batch mode.
Speech synthesis available in batch and streaming mode.
NLU APIs with a host of services.

One impediment is the requirement for an NVIDIA GPU based on the Turing or Volta architecture.

Subscribe to my newsletter.

NLP/NLU, Chatbots, Voice, Conversational UI/UX, CX Designer, Developer, Ubiquitous User Interfaces, Ambient…

cobusgreyling.me

Cobus Greyling - Medium

Read writing from Cobus Greyling on Medium. NLP/NLU, Chatbots, Voice, Conversational UI/UX, CX Designer, Developer…

cobusgreyling.medium.com

NVIDIA: World Leader in Artificial Intelligence Computing

NVIDIA, inventor of the GPU, which creates interactive graphics on laptops, workstations, mobile devices, notebooks…

www.nvidia.com

NVIDIA Riva

NVIDIA RIVA NVIDIA Riva is a GPU-accelerated SDK for building multimodal conversational AI applications that deliver…

developer.nvidia.com

Graphics processing unit - Wikipedia

A graphics processing unit ( GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory…

en.wikipedia.org

NVIDIA Riva Documentation

NVIDIA Deep Learning Riva Documentation - Last updated July 27, 2021 - Send Feedback - NVIDIA Riva Speech Skills…

docs.nvidia.com

NVIDIA Releases Riva 1.0 Beta for Building Real-Time Conversational AI Services | NVIDIA Developer…

Today, NVIDIA released the Riva 1.0 Beta which includes an end-to-end workflow for building and deploying real-time…

news.developer.nvidia.com
