Riva brings deep learning to the masses. The multimodal aspect of Riva is best understood in the context of where NVIDIA wants to take Riva in terms of functionality.
Within NVIDIA GPU Cloud, also known as NGC, there is a catalog of various implementation scenarios. Each catalog item holds step-by-step instructions and scripts for creating deep learning models, with sample performance and accuracy metrics to compare your results against.
These notebooks provide guidance on creating models for language translation, text-to-speech, text classification and more.
These are the NVIDIA Riva Transfer Learning Toolkit (TLT) catalog items at your disposal. Each catalog item has step-by-step instructions on how to install and launch the Jupyter Notebooks.
What is exciting about this collection of functionality is that Riva is poised to become a true Conversational Agent. As humans we communicate not only by voice, but also by detecting the gaze of the speaker, lip activity and so on.
Another key focus area of Riva is transfer learning. There is a significant cost saving in taking the advanced base models of Riva and repurposing them for specific uses.
The functionality currently available in Riva includes ASR, TTS and NLU. Edge installation is a huge benefit.
Setting Up Your Environment
Access to NVIDIA software, Jupyter Notebooks and demo applications is easy, and resources are abundant. The only impediment is access to an NVIDIA GPU based on the Turing or Volta architecture.
In this article I look at one of the more cost-effective ways to access such infrastructure: an AWS EC2 instance.
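As a rough way to confirm that an instance exposes a suitable GPU, the sketch below asks `nvidia-smi` for the installed GPU names and maps two well-known data center models to their architecture. The lookup table is an assumption made for illustration and is deliberately incomplete (only the V100 and T4 are listed); it is not an official compatibility list.

```python
import re
import subprocess

# Illustrative, deliberately incomplete lookup: model token -> architecture.
KNOWN_MODELS = {"V100": "Volta", "T4": "Turing"}


def guess_architecture(gpu_name):
    """Return 'Volta' or 'Turing' if a known model token is in the name, else None."""
    for token in re.split(r"[\s-]+", gpu_name.strip()):
        arch = KNOWN_MODELS.get(token.upper())
        if arch:
            return arch
    return None


def local_gpu_names():
    """Ask nvidia-smi for GPU names; return [] when the tool is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for name in local_gpu_names():
        print(name, "->", guess_architecture(name) or "unknown to this sketch")
```

A GPU reported as unknown here is not necessarily unsupported; it only means the model is missing from this small table.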
NVIDIA Riva Notebooks
To get you started, NVIDIA Riva has quite a few Jupyter Notebook examples available which you can step through. These cover different speech implementations, including speech-to-text, text-to-speech, named entities, intent & slot detection and more.
When clicking on each of the catalog items, you will see a list of commands to execute in order to launch the notebook. These commands are fairly accurate and execution is not a problem.
When NGC commands are used, the command line prompts for an API key, which must be obtained from the NVIDIA NGC Setup page.
In this article I explain the installation, SSH and tunneling process in detail. An SSH tunnel on port 8888 is required to launch the Jupyter Notebook in a browser on your local machine.
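The tunnel itself is a single `ssh` invocation with local port forwarding. Here is a minimal sketch that assembles that command; the host name, key file and the `ubuntu` login user are placeholders assumed for illustration, so substitute the values for your own EC2 instance.

```python
import shlex


def build_tunnel_command(host, key_path, user="ubuntu", port=8888):
    """Build an ssh command forwarding local `port` to the same port on the remote host."""
    return [
        "ssh",
        "-i", key_path,                      # private key for the instance
        "-N",                                # forwarding only, no remote command
        "-L", f"{port}:localhost:{port}",    # local port -> remote localhost port
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical host and key file, shown only to illustrate the shape of the command.
    cmd = build_tunnel_command("ec2-host.example.com", "my-key.pem")
    print(shlex.join(cmd))
```

Run the printed command in a local terminal and leave it open. While the tunnel is up, the notebook served on port 8888 of the instance is reachable at localhost:8888 in your local browser.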
The notebook takes you through the process of defining directories, training models and exporting to a .riva file, followed by the deployment workflow that consumes the .riva file and deploys it to Riva.
My first thought was that getting past the point of my own installation and running the demos would be very daunting…seeing this is an NVIDIA and deep learning environment.
But on the contrary, getting to grips with Riva on a demo application level was straightforward when following the documentation. After running this basic demo voicebot, what are the next steps?
The voicebot that integrates Rasa with Riva is a step up in complexity and a logical next step. Perusing the Jupyter Notebooks also provides good examples of how to interact with the APIs.
Positives & Considerations
The positives are overwhelming…
- Implementations can be cloud, or local/edge.
- Riva speaks to mission critical, industrial strength cognitive services & Conversational AI.
- A new framework for high-performance ASR, TTS and NLU.
- Developers have access to transfer learning and can leverage the investment made by NVIDIA.
- The NVIDIA GPU environment addresses mission critical requirements, where latency can be negated.
- Clear roadmap for Riva in terms of the near future and imminent features.
- Riva addresses requirements for ambient ubiquitous interfaces.
The considerations…
- Access, development and deployment seem daunting and the framework appears complicated. In this article I want to debunk access apprehensions. However, production deployment will most certainly be complex.
- For a production environment, specific hardware considerations will most probably be paramount, especially where cloud/connectivity latency cannot be tolerated.
The services available now via Riva are:
- Speech recognition trained on thousands of hours of speech data with stream or batch mode.
- Speech synthesis available in batch and streaming mode.
- NLU APIs with a host of services.
The advent of Riva will surely be a jolt to the current marketplace, especially for embedded conversational AI solutions. The freedom of installation and the open architecture will stand NVIDIA in good stead. As noted, production architecture and deployment will demand careful consideration.