The Importance Of Prototyping: NVIDIA Jarvis Virtual Assistant Application

You Will Learn A Lot By Installing & Getting The Demo To Run…

Cobus Greyling

--

Firstly…

If you are deeply and truly curious, you are someone who likes to understand, and who likes to get something working by creating your own environment and installing it yourself.

Reading documentation, guides and tutorials does lend some degree of insight.

But mastery starts with creating an environment, whether virtual or physical hardware, then setting up the development and runtime environment, and running at least a demonstration application.

Start with a basic application and grow it, with iterations building on each other.

Getting stuck at the point of reading creates a false sense of mastery and understanding. It is only once you start building the environment and getting demo applications to run that you learn by doing.

When I got access to the NVIDIA Jarvis framework, I really wanted to get to grips with the environment. It is easy to be overwhelmed when getting started with a framework like Jarvis; there are often mental barriers when confronted with deep learning and an advanced environment like NVIDIA's.

Virtual Voice Assistant (Voicebot) in its most basic architecture.

Jarvis brings deep learning to the masses. This article is not a tutorial, but rather a guide on how to:

  • Start as small and simple as possible.
  • Become familiar with the environment, to some extent at least.
  • Spiral outwards from an initial prototype with measured iterations, adding functionality and complexity.

The multimodal aspect of Jarvis is best understood in the context of where NVIDIA wants to take Jarvis in terms of functionality.

NVIDIA Jarvis Cognitive Components

What is exciting about this collection of functionality is that Jarvis is poised to become a true Conversational Agent. As humans we communicate not only through voice, but also by detecting the speaker's gaze, lip activity and so on.

Another key focus area of Jarvis is transfer learning. There are significant cost savings in taking the advanced base models of Jarvis and repurposing them for specific uses.

The functionality which is currently available in Jarvis 1.0 Beta includes ASR, TTS and NLU.

Virtual Voice Assistant

This Virtual Assistant sample application demonstrates how to use Jarvis AI Services, specifically ASR, NLP, and TTS, to build a simple but complete conversational AI application.

Startup view of the voicebot

It demonstrates receiving input via speech from the user, interpreting the query via an intent recognition and slot filling approach, compiling a response, and speaking this back to the user in a natural voice.
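In code, that loop looks something like the sketch below. This is a minimal Python illustration, not the sample's actual implementation; the transcribe, interpret, fetch_weather and synthesize stubs are hypothetical stand-ins for the Jarvis ASR, NLU and TTS calls.

# A minimal, self-contained sketch of the voicebot loop. The stubs below
# are hypothetical stand-ins for the Jarvis ASR, NLU and TTS services.

def transcribe(audio):
    # ASR: in the real application this is a streaming call to Jarvis ASR.
    return "what is the weather in Paris"

def interpret(text):
    # NLU: intent recognition and slot filling, done by Jarvis NLP for real.
    if "weather" in text:
        return "weather", {"location": text.split()[-1]}
    return "unknown", {}

def fetch_weather(city):
    # Fulfilment: the demo queries the weatherstack REST API here.
    return "sunny, 21 degrees"

def synthesize(reply):
    # TTS: in the real application this is a call to Jarvis TTS.
    print("BOT:", reply)

def handle_utterance(audio):
    text = transcribe(audio)            # speech in, text out
    intent, slots = interpret(text)     # intent + slots
    if intent == "weather":
        reply = "The weather in {} is {}.".format(
            slots["location"], fetch_weather(slots["location"]))
    else:
        reply = "Sorry, I did not catch that."
    synthesize(reply)                   # text in, speech out

handle_utterance(b"")

The value of the sketch is that each stub maps onto one Jarvis service, which mirrors how the sample application is put together.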

Read more about the installation process here.

To install and run the Jarvis Voicebot demo, start your Jarvis services:

cd jarvis_quickstart_v1.0.0-b.2
bash jarvis_start.sh

Download the samples image from NGC.

docker pull nvcr.io/nvidia/jarvis/jarvis-speech-client:1.0.0-b.1-samples

Run the service within a Docker container.

docker run -it --rm -p 8009:8009 nvcr.io/nvidia/jarvis/jarvis-speech-client:1.0.0-b.1-samples /bin/bash

Within the container, change to the sample directory…

cd samples/jarvis-weather

Edit config.py with the right Jarvis IP, hosting port and your weatherstack API access key (from https://weatherstack.com/). Then, start the server.
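As a rough guide, the fields to edit look like the snippet below. The exact key names depend on the sample version, so treat these as assumptions and check them against your own config.py:

# Illustrative config.py values; key names are assumptions, verify them
# against the actual file shipped with the sample.
client_config = {
    "JARVIS_SPEECH_API_URL": "localhost:50051",  # IP:port where Jarvis is serving
    "WEATHERSTACK_ACCESS_KEY": "your_key_here",  # key from weatherstack.com
    "PORT": 8009,                                # port the web app listens on
}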

Getting your weatherstack API key, on the free tier…

https://weatherstack.com/

Getting your API Access Key…

Get Your API Access key and update config.py

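It is worth sanity-checking the key before wiring it into the bot. A quick test against the weatherstack current-weather endpoint (note the free tier is HTTP only) could look like this:

import requests

# Quick sanity check of a weatherstack access key.
ACCESS_KEY = "your_key_here"  # placeholder for your own key
response = requests.get(
    "http://api.weatherstack.com/current",
    params={"access_key": ACCESS_KEY, "query": "London"},
)
data = response.json()
print(data.get("current", data))  # weather data on success, error payload otherwise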

Start the service…

python3 main.py

Below you can see the NVIDIA Jarvis weather bot accessible on the URL https://127.0.0.1:8009/jarvisWeather/. Again, you will have to set up SSH tunneling from your virtual machine. Read about that here.

NVIDIA Jarvis chatbot & Voicebot Demo

To take a closer look at example code for ASR, TTS and NLU, see the Jupyter Notebook examples.
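As a taste of what the notebooks cover, a basic offline ASR request looks roughly like the snippet below. The module and stub names are as I recall them from the Jarvis 1.0 Beta Python client and may differ in your version, so verify against the notebooks themselves:

import grpc

# Jarvis gRPC client modules; names per the 1.0 Beta client as I recall
# them, verify against your installation.
import jarvis_api.audio_pb2 as ja
import jarvis_api.jarvis_asr_pb2 as jasr
import jarvis_api.jarvis_asr_pb2_grpc as jasr_srv

channel = grpc.insecure_channel("localhost:50051")  # Jarvis speech server
client = jasr_srv.JarvisASRStub(channel)

# Read a mono 16 kHz WAV file and ask Jarvis to transcribe it.
with open("sample.wav", "rb") as fh:
    content = fh.read()

config = jasr.RecognitionConfig(
    encoding=ja.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=16000,
    language_code="en-US",
    max_alternatives=1,
)
request = jasr.RecognizeRequest(config=config, audio=content)
response = client.Recognize(request)
print(response.results[0].alternatives[0].transcript)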

Conclusion

My first thought was that doing my own installation and getting the demos to run would be very daunting…this being NVIDIA, deep learning and so on.

But on the contrary, getting to grips with Jarvis on a demo application level was straightforward when following the documentation. After running this basic demo voicebot, what are the next steps?

The voicebot where Rasa is integrated with Jarvis is a step up in complexity and a logical next step. Perusing the Jupyter Notebooks also provides good examples of how to interact with the APIs.

--
