To develop a virtual assistant with a speech interface, four key elements are required. The first is Speech Recognition, also referred to as Automatic Speech Recognition (ASR) or Speech-To-Text: transcribing the user's speech into text. This is the input user touch point.
The second is the output user touch point: the conversion of text into speech, preferably natural-sounding speech. This is also referred to as Text-To-Speech (TTS) or Speech Synthesis.
These two elements need low latency, preferably less than 300 milliseconds. They also need to be trained.
The remaining two elements are synonymous with text-based conversational agents: dialog management and Natural Language Understanding (NLU). Rasa is the avant-garde when it comes to these two elements.
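The four elements can be pictured as a chain that every user turn passes through. The following is a minimal sketch and not Riva or Rasa code: each function (`recognize_speech`, `understand`, `next_action`, `synthesize_speech`) is a hypothetical stand-in, and the 300 millisecond budget is simply the figure mentioned above, applied to the two speech stages.

```python
import time

LATENCY_BUDGET_MS = 300  # target from the text for the two speech stages


def recognize_speech(audio: bytes) -> str:
    """Stand-in for ASR: pretend the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


def understand(text: str) -> dict:
    """Stand-in for NLU: a keyword check instead of a trained model."""
    intent = "ask_weather" if "weather" in text.lower() else "small_talk"
    return {"intent": intent, "text": text}


def next_action(parsed: dict) -> str:
    """Stand-in for dialog management: map an intent to a reply."""
    replies = {
        "ask_weather": "Which city would you like the weather for?",
        "small_talk": "Happy to chat.",
    }
    return replies[parsed["intent"]]


def synthesize_speech(text: str) -> bytes:
    """Stand-in for TTS: return bytes where real audio would be."""
    return text.encode("utf-8")


def timed(stage, payload):
    """Run one stage and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = stage(payload)
    return result, (time.perf_counter() - start) * 1000


def handle_turn(audio: bytes):
    """Pass one user utterance through all four elements."""
    text, asr_ms = timed(recognize_speech, audio)
    parsed = understand(text)
    reply = next_action(parsed)
    speech, tts_ms = timed(synthesize_speech, reply)
    within_budget = asr_ms < LATENCY_BUDGET_MS and tts_ms < LATENCY_BUDGET_MS
    return speech, within_budget


speech, within_budget = handle_turn(b"What is the weather like?")
print(speech.decode("utf-8"), within_budget)
```

In a real deployment the two timed stages would be network calls to the speech services, which is where the latency budget starts to matter.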
The second configuration for the Riva & Rasa demo application is where the Natural Language Processing is performed by Riva.
Currently, ASR, NLU and TTS models are available in NVIDIA Riva, trained on thousands of hours of speech data.
On the roadmap of Riva are other cognitive elements like computer vision. The vision component includes lip activity, gaze detection, gesture detection and more.
I first heard about this from Fjord Design & Innovation, where they referred to some of these elements as a phenomenon called face speed.
Face speed refers to the cues and hints we pick up from gestures, facial expressions and lip activity.
By incorporating these elements in its roadmap, Riva is poised to become a true conversational agent, taking cues from the speaker’s appearance.
What makes this collaboration between NVIDIA and Rasa so compelling is that it combines two technological environments that need each other as much as they complement each other.
This is an avenue to speech-enable a Rasa digital assistant.
In the Medium article I wrote on getting started with your NVIDIA Riva environment, you will find a step-by-step guide to set up a Virtual Machine Instance using AWS EC2. Cost is always a consideration if you are just experimenting, especially if you are charged in a weaker currency.
The EC2 instance can also be started and stopped in order to save on costs.
SSH tunnels work wonders for accessing URLs on the VM, but latency is a problem when testing the conversational agent by voice.
Rasa is a complete chatbot framework solution for any implementation where the user input is not voice, in other words text input, which includes conversational components like buttons, links, etc.
It needs to be noted that from a Conversational AI perspective Rasa has all the features and elements required.
Elements contributing to Rasa being a good option for the NVIDIA Riva environment:
- Free to download and use.
- Contained and complete chatbot framework.
- Open architecture for integration.
- Install anywhere.
The additions Rasa requires to be speech-enabled are:
- Automatic Speech Recognition (aka Speech-To-Text)
- Speech Synthesis (aka Text-To-Speech)
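One way to wire these two additions around Rasa is through Rasa's REST channel, which accepts a JSON body with `sender` and `message` fields at `/webhooks/rest/webhook`. The sketch below only builds the request. It assumes a Rasa server at `http://localhost:5005` with the REST channel enabled in `credentials.yml`, and `transcript` stands in for whatever text the ASR service returns. The sender ID and helper names are hypothetical, and this is not necessarily how the Riva demo connects the two.

```python
import json
import urllib.request

RASA_URL = "http://localhost:5005/webhooks/rest/webhook"  # assumed local server


def build_rasa_request(transcript: str, sender_id: str = "voice-user"):
    """Wrap an ASR transcript in the JSON body the REST channel expects."""
    body = json.dumps({"sender": sender_id, "message": transcript})
    return urllib.request.Request(
        RASA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replies_to_text(replies: list) -> str:
    """Join the text parts of Rasa's reply list into one string for TTS."""
    return " ".join(r["text"] for r in replies if "text" in r)


request = build_rasa_request("what is the weather in Cape Town")
# Sending it needs a running server, so that step is left commented out:
# with urllib.request.urlopen(request) as response:
#     replies = json.load(response)
sample_replies = [{"recipient_id": "voice-user", "text": "It is sunny."}]
print(replies_to_text(sample_replies))
```

The string returned by `replies_to_text` is what would be handed to the speech synthesis service.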
I would be remiss not to mention that the NLP capability of Riva is significant, hence the two architectural approaches mentioned at the start. It need not be a choice between the NLU/NLP of Riva or Rasa. The two can be used in conjunction, complementing each other.
The basic sequence of events here shows how the power of Riva NLP and Rasa’s NLU capability can be leveraged, especially for longer input.
One last thought on why Rasa: it is currently the only industrial-strength conversational framework that employs machine learning for its dialog management, which on most other systems is currently a state machine.
With Rasa’s vision of deprecating intent classification and also the dialog (or bot script), the flexibility matches the vision of Riva.
Running The Demo
To run the demo and also validate your installation, follow the step-by-step instructions found here. There are two modes to run the conversational agent, one is with Rasa NLU, and the other with Riva NLP.
The conversational agent is served on https://0.0.0.0:5555/rivaWeather and looks like a slimmed-down version of what you see in the official demo videos.
The demo can handle small talk to some degree.
To run the weather bot, be sure to add the Weather API key to your Riva configuration. I had trouble with the Rasa Weather action extracting the key, so I hard-coded it in the action.
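A gentler alternative to hard-coding is to read the key from an environment variable inside the action. This is a minimal sketch and not the demo's actual `weather.py`: the variable name `WEATHER_API_KEY` and the helper `load_weather_api_key` are assumptions made for illustration.

```python
import os


def load_weather_api_key(env=None) -> str:
    """Return the weather API key from the environment, or fail loudly."""
    # Accept an explicit mapping so the function is easy to test
    env = os.environ if env is None else env
    key = env.get("WEATHER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("WEATHER_API_KEY is not set")
    return key


# Usage with an explicit mapping, so the example does not touch the real environment
print(load_weather_api_key({"WEATHER_API_KEY": "demo-key"}))
```

Failing with a clear error when the key is missing makes the problem visible at startup rather than at the first weather request.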
(rasa) root@156ggcbd3bg9:/workspace/samples/rasa-chatbot/rasa-weatherbot/actions# vim weather.py
You will also need to set up the network configuration for the demo to work. There are two locations in the code base that have to be configured for inter-service communication.
Accessing the conversational agent via a browser on my machine is enabled with an SSH tunnel set up to port 5555 on the EC2 instance.
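For reference, a local port forward of this kind uses the `-L` option of `ssh`. The sketch below only assembles the command. The key path, user name and host are placeholders, not values from my setup.

```python
import shlex


def tunnel_command(host: str, user: str, key_path: str, port: int = 5555) -> list:
    """Build an ssh command that forwards a local port to the same port on the VM."""
    return [
        "ssh",
        "-i", key_path,                    # private key for the instance
        "-N",                              # no remote command, tunnel only
        "-L", f"{port}:localhost:{port}",  # local port -> same port on the VM
        f"{user}@{host}",
    ]


cmd = tunnel_command("ec2-host.example.com", "ubuntu", "keys/my-key.pem")
print(shlex.join(cmd))
```

With the tunnel open, the page should be reachable in the local browser under `localhost` on port 5555 instead of the `0.0.0.0` address shown on the VM.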
NVIDIA Riva has an ambitious roadmap to become an embedded voice assistant with speech and visual capabilities. A medium like a phone call will not do justice to the abilities of Riva. It is better suited to being embedded in an application on a phone, smart device or smart home with audio and vision.
As mentioned, the Riva NLP capabilities are astute and the state management can be facilitated within Riva. Integration with existing text-based digital assistants will stand Riva in good stead.
Cobus Greyling - Medium