Launching NVIDIA Jarvis Notebooks For Language AI Services

And How To Get You Started…

Cobus Greyling
Mar 11, 2021

Introduction

What is Jarvis from NVIDIA?

NVIDIA recently released Jarvis, which is described as an application framework for Multimodal Conversational AI.

The focus is on low latency, easy access to deep learning, and high performance demands.

The multimodal aspect of Jarvis is best understood in the context of where NVIDIA wants to take Jarvis in terms of functionality. This includes:

  • ASR (Automatic Speech Recognition)
  • STT (Speech To Text)
  • NLU (Natural Language Understanding)
  • Gesture Recognition
  • Lip Activity Detection
  • Object Detection
  • Gaze Detection
  • Sentiment Detection

Jarvis will be a living service, living in the user’s environment and surfacing via different devices and environments (car, home, office, phone etc.). This leads to a phenomenon known as ambient orchestration, where patterns are detected in user behavior and specific information is surfaced at the right time, in the right place, on the right device. Hence these living services are orchestrated to follow the user’s touchpoints.

This story is an overview of what is available in the NVIDIA Jarvis demo Jupyter Notebooks and what can be learned from them.

Basic Conversational AI Architecture

What is exciting about this collection of functionality is that Jarvis is also poised to become a true Conversational Agent.

We communicate as humans not only in voice, but by detecting the gaze of the speaker, lip activity etc.

Another key focus area of Jarvis is transfer learning. There is a significant cost saving in taking the advanced base models of Jarvis and repurposing them for specific uses.

The functionality which is currently available in Jarvis 1.0 Beta includes:

  • Automatic Speech Recognition (ASR/STT),
  • Speech Synthesis (TTS) and
  • Natural Language Processing & Understanding (NLU)

Here is an example of the NVIDIA Jarvis Weather demo chatbot running in a browser, deployed on an AMI and accessed via an SSH tunnel.

Software Requirements

Once we have our hardware up and running with access, we need to install Jarvis and launch some of the test and demo applications.

Your starting point is the NVIDIA NGC website.

NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing.

Create a user profile on NGC; this is free and quick to do. After you have created your profile, be sure to check your inbox for the confirmation email and click confirm.

Generate an API Key within NVIDIA NGC and save it.

Be sure to save the API key; we will use it to authenticate on our AMI.

This process can be performed on your desktop machine, since at this stage we do not have any GUI or desktop access to our virtual machine.

Installing NGC On The AMI

Back on your virtual machine on AWS, from the PuTTY application command line, you can execute all the installs and actions.

Before installing Jarvis, you need to install the NGC command line tool (NGC CLI) with these commands:

wget -O ngccli_cat_linux.zip https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip && unzip -o ngccli_cat_linux.zip && chmod u+x ngc
md5sum -c ngc.md5
echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
ngc config set

During this installation process, you will be prompted for the API key which you generated on NGC.

The NGC CLI installation procedure for Ubuntu.
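
Before moving on, it can help to confirm that the CLI really ended up on your PATH. The snippet below is a minimal sketch of such a check and is not part of the NVIDIA instructions. The function name `missing_tools` is hypothetical, and checking for `docker` as well is my assumption, based on the Docker-based deployment used in the next step.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "ngc" is the CLI installed above; "docker" is an assumption, since the
    # Quick Start scripts deploy the Jarvis services using Docker.
    absent = missing_tools(["ngc", "docker"])
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All required tools found.")
```

If `ngc` is reported as missing, opening a new shell session (so that `~/.bash_profile` is read again) is the first thing I would try.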

Deploying Jarvis On The AMI

Deploying Jarvis takes a while, and it is here that the installation might halt due to disk space requirements.
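
Since running out of disk space is the most likely failure, a quick check before starting can save time. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and the threshold you pass in is your own estimate; I am not quoting an official NVIDIA figure.

```python
import shutil


def free_gigabytes(path="."):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(path, required_gb):
    """Return True if at least `required_gb` GiB are free at `path`."""
    return free_gigabytes(path) >= required_gb


if __name__ == "__main__":
    # 50 is an arbitrary placeholder, not a documented requirement.
    print(f"Free space: {free_gigabytes('.'):.1f} GiB")
    print("Enough space:", has_enough_space(".", 50))
```

Run it from the directory (or volume) where Docker stores its images and where the models will be downloaded, as that is the filesystem that fills up.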

The option shown below is the Quick Start scripts approach to set up a local workstation and deploy the Jarvis services using Docker.

Download the Quick Start scripts.

ngc registry resource download-version nvidia/jarvis/jarvis_quickstart:1.0.0-b.1

Initialize and start Jarvis. The initialization step downloads and prepares Docker images and models. This step takes quite a while.

cd jarvis_quickstart_v1.0.0-b.1
bash jarvis_init.sh
bash jarvis_start.sh

Start the client container with bash jarvis_start_client.sh, then start a Jupyter Notebook instance from inside it so that you can try the different services using the provided notebooks. You will access the notebook from your local machine.

jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/work/notebooks

Copy the token to use in the browser.

SSH Tunneling

To access the Jupyter Notebook from your own machine, you will need to create an SSH tunnel to the AMI. This sounds more daunting than it is.

The PuTTY option on the left is used to set up an SSH tunnel between your workstation and the AMI.

The PuTTY utility makes tunneling easy to set up. Once the tunnel is configured, click the Open button to connect to the server via SSH and tunnel the desired ports.

Navigate to http://localhost:8000 (or whatever port you chose) in a web browser on your local machine to connect to Jupyter Notebook running on the AMI server.
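
If you are not on PuTTY, the same tunnel can be opened with the OpenSSH client. The sketch below only builds the command so you can inspect it before running it. The function name, the `ubuntu` user, the host name and the key file are all hypothetical placeholders, and remote port 8888 is an assumption (it is Jupyter's default port); use whatever port your notebook instance reports.

```python
def build_tunnel_command(host, user="ubuntu", local_port=8000,
                         remote_port=8888, key_path=None):
    """Build an OpenSSH command forwarding local_port to remote_port on host."""
    # -N: do not run a remote command, only forward ports.
    # -L: forward a local port to host:port as seen from the server.
    command = ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}"]
    if key_path:
        command += ["-i", key_path]  # private key used to log in to the AMI
    command.append(f"{user}@{host}")
    return command


if __name__ == "__main__":
    # Placeholder host and key; substitute your own instance details.
    print(" ".join(build_tunnel_command("my-ami.example.com",
                                        key_path="my-key.pem")))
    # prints: ssh -N -L 8000:localhost:8888 -i my-key.pem ubuntu@my-ami.example.com
```

With the tunnel open, http://localhost:8000 on your workstation reaches the notebook server on the AMI, exactly as with the PuTTY setup.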

To continue past this point, copy the token from the PuTTY session where you launched the notebook instance and paste it into the browser.
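
Jupyter prints the token as part of a URL of the form `http://127.0.0.1:8888/?token=...`. If you prefer not to pick it out by hand, a small helper like the one below will do it. This is a minimal sketch; the function name and the example token are made up for illustration.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(url):
    """Return the `token` query parameter from a Jupyter URL, or None."""
    values = parse_qs(urlparse(url).query).get("token")
    return values[0] if values else None


if __name__ == "__main__":
    # The token below is a made-up example, not a real one.
    print(extract_token("http://127.0.0.1:8888/?token=abc123"))
    # prints: abc123
```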

This view should be familiar to you, and opening the folders takes you into a good presentation of how you might go about interacting with the Jarvis services.

Demo applications which can be run within Jupyter Notebook.

Jupyter Notebooks Examples

The functionality on display via Jupyter notebook:

  • Offline ASR Example
  • Core NLP Service Examples
  • TTS Service Example
  • Jarvis NLP Service Examples

In short, here are a few extracts from the example applications.

The commands to run to start the Jarvis services, the client container and the Jupyter Notebook:

bash jarvis_start.sh
bash jarvis_start_client.sh
jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/work/notebooks

Add Punctuation To Text

Adding punctuation is a very useful feature, especially in instances where user speech is converted to text (ASR/STT) and the resulting text requires punctuation. Think of cases where the text is displayed while the user speaks.

Another example is archiving user input speech in text format. This function can also be helpful with natural language generation.

Punctuation Model as seen in notebook.

Example Input:

add punctuation to this sentence here please ok
do you have any red nvidia shirts
i need one cpu four gpus and lots of memory
for my new computer it's going to be very cool

Example Output:

Add punctuation to this sentence here, please, Ok?
Do you have any red Nvidia shirts?
I need one cpu, four gpus and lots of memory for my new computer. It's going to be very cool.

You will see that sentences are created, commas are added and question marks are placed where appropriate.

Named Entities

In NLP, a named entity is a real-world object, such as a person, place, company or product.

These named entities can be abstract or have a physical existence. Below are examples of named entities being detected by Jarvis NLU.

Named Entities code block in the Jupyter Notebook

Example Input:

Jensen Huang is the CEO of NVIDIA Corporation, located in Santa Clara, California.

Example Output:

Named Entities:
jensen huang (PER)
nvidia corporation (ORG)
santa clara (LOC)
california (LOC)

Named entities are helpful for creating context, and in future articles I will be looking at creating custom models.
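
To show how this output can feed context into a conversation, here is a minimal sketch that groups the detected entities by label. It works on the printed text shown above, not on the Jarvis response object itself, and the function name `group_entities` is my own.

```python
import re
from collections import defaultdict

# Matches lines such as "santa clara (LOC)".
ENTITY_LINE = re.compile(r"^(?P<text>.+?)\s*\((?P<label>[A-Z]+)\)$")


def group_entities(lines):
    """Group 'text (LABEL)' lines into a dict of label -> list of texts."""
    grouped = defaultdict(list)
    for line in lines:
        match = ENTITY_LINE.match(line.strip())
        if match:  # lines such as the "Named Entities:" header are skipped
            grouped[match.group("label")].append(match.group("text"))
    return dict(grouped)


output = """Named Entities:
jensen huang (PER)
nvidia corporation (ORG)
santa clara (LOC)
california (LOC)"""

print(group_entities(output.splitlines()))
# prints: {'PER': ['jensen huang'], 'ORG': ['nvidia corporation'], 'LOC': ['santa clara', 'california']}
```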

Text Classification

Jarvis NLP has a default text classification feature.

Text Classification code block in the Jupyter Notebook

Intents & Entities With Input Domain

This is a good example of how an NLU API can be implemented to extract intents and entities, or slots, as Jarvis refers to them. The input domain is defined upfront.

Intents & Entities code block in the Jupyter Notebook

Intents & Entities

Here the domain is not provided. The intent and each slot are shown with their scores.

Example Input:

Is it going to rain tomorrow?

Example Output:

intent {
  class_name: "weather.rainfall"
  score: 0.9661880135536194
}
slots {
  token: "tomorrow"
  label {
    class_name: "weatherforecastdaily"
    score: 0.5325539708137512
  }
}
slots {
  token: "?"
  label {
    class_name: "weatherplace"
    score: 0.6895459890365601
  }
}
domain_str: "weather"
domain {
  class_name: "weather"
  score: 0.9975590109825134
}
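
Note that in this output the question mark is returned as a slot labelled weatherplace, with a higher score than the slot for "tomorrow", so a score threshold alone would not remove it. Below is a minimal sketch of a post-processing step that drops punctuation-only tokens first. The result is represented as a plain dictionary for illustration (the notebook prints a structured response object), and both `clean_slots` and the 0.5 threshold are my own assumptions.

```python
def clean_slots(slots, min_score=0.5):
    """Keep slots whose token has a letter or digit and whose score passes."""
    return [
        slot for slot in slots
        if any(ch.isalnum() for ch in slot["token"])
        and slot["score"] >= min_score
    ]


# Values copied from the example output above, flattened into a dict.
result = {
    "intent": {"class_name": "weather.rainfall", "score": 0.9661880135536194},
    "slots": [
        {"token": "tomorrow", "class_name": "weatherforecastdaily",
         "score": 0.5325539708137512},
        {"token": "?", "class_name": "weatherplace",
         "score": 0.6895459890365601},
    ],
}

for slot in clean_slots(result["slots"]):
    print(slot["token"], slot["class_name"])
# prints: tomorrow weatherforecastdaily
```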

Weather Intent Batch Queries

What I particularly like about this function is that it returns the weather intent together with a sub-intent, in this case cloudy, rainfall or humidity.

This can also be useful for real-time disambiguation in conversations.

Example Input:

"Is it currently cloudy in Tokyo?",
"What is the annual rainfall in Pune?",
"What is the humidity going to be tomorrow?"

Example Output:

[weather.cloudy] Is it currently cloudy in Tokyo?
[weather.rainfall] What is the annual rainfall in Pune?
[weather.humidity] What is the humidity going to be tomorrow?
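
The dotted intent names make it easy to separate the domain from the sub-intent in your own code, for example to route a query to the right handler. This is a minimal sketch that parses the printed lines above; `parse_results` is a hypothetical helper of mine, not a Jarvis function.

```python
import re

# Matches lines such as "[weather.cloudy] Is it currently cloudy in Tokyo?"
RESULT_LINE = re.compile(
    r"^\[(?P<domain>[^.\]]+)\.(?P<sub_intent>[^\]]+)\]\s*(?P<query>.+)$"
)


def parse_results(lines):
    """Return a list of (domain, sub_intent, query) tuples."""
    parsed = []
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match:
            parsed.append((match.group("domain"),
                           match.group("sub_intent"),
                           match.group("query")))
    return parsed


lines = [
    "[weather.cloudy] Is it currently cloudy in Tokyo?",
    "[weather.rainfall] What is the annual rainfall in Pune?",
    "[weather.humidity] What is the humidity going to be tomorrow?",
]

for domain, sub_intent, query in parse_results(lines):
    print(domain, sub_intent, query, sep=" | ")
```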

Conclusion

The advent of Jarvis will surely be a jolt to the current marketplace, especially for embedded conversational AI solutions. The freedom of installation and the open architecture will stand NVIDIA in good stead. Deployment and production architecture will demand careful consideration.

The NVIDIA Jarvis team made sure the documentation is thorough and comprehensive. The demo and quick start applications are of great help in getting started, especially in an environment that is complex and can be very tricky to prototype in.

The services available now via Jarvis are:

  • Speech recognition trained on 7000 hours of speech data with stream or batch mode.
  • Speech synthesis available in batch and streaming mode.
  • NLU APIs with a host of services.

One impediment is the requirement for an NVIDIA GPU based on the Turing or Volta architecture.
