I was privileged to be selected for early access to Jarvis 1.0 Beta.
Today NVIDIA released Jarvis, which is described as an application framework for Multimodal Conversational AI.
The focus is on low latency, less than 300 milliseconds, and high performance demands.
It is a high performance conversational AI solution incorporating speech and visual cues, often referred to as face-speed. Face-speed includes gaze detection, lip activity etc.
The multimodal aspect of Jarvis is best understood in the context of where NVIDIA wants to take Jarvis in terms of functionality.
- ASR (Automatic Speech Recognition)
- STT (Speech To Text)
- NLU (Natural Language Understanding)
- Gesture Recognition
- Lip Activity Detection
- Object Detection
- Gaze Detection
- Sentiment Detection
Again, what is exciting about this collection of functionality, is that Jarvis is poised to become a true Conversational Agent.
Day to day, as humans we communicate not only in voice, but by detecting the gaze of the speaker, lip activity etc.
Another key focus area of Jarvis is transfer learning.
There is significant cost saving when it comes to taking the advanced base models of Jarvis and repurposing them for specific uses. The functionality which is currently available in Jarvis 1.0 Beta includes:
- STT and
Positives & Considerations
The positives are overwhelming…
- Implementations can be cloud, or local/edge.
- Jarvis speaks to mission critical, industrial strength cognitive services & Conversational AI.
- A new framework for high-performance ASR, STT and NLU.
- Developers have access to transfer learning, leveraging the investment made by NVIDIA.
- The NVIDIA GPU environment addresses mission critical requirements, where latency can be negated.
- Clear roadmap for Jarvis in terms of imminent features.
- Jarvis addresses requirements for ambient ubiquitous interfaces.
The considerations:
- Access, development and deployment seem daunting and the framework appears complicated. In this article I want to debunk access apprehensions. However, production deployment will most certainly be complex.
- Most probably for a production environment specific hardware considerations will be paramount; especially where cloud/connectivity latency cannot be tolerated.
As per NVIDIA:
Developers at enterprises can easily fine-tune state-of-art-models on their data to achieve a deeper understanding of their specific context and optimize for inference to offer end-to-end real-time services that run in less than 300 milliseconds (ms) and delivers 7x higher throughput on GPUs compared with CPUs.
The Jarvis framework includes pre-trained conversational AI models, tools in the NVIDIA AI Toolkit, and optimized end-to-end services for speech, vision, and natural language understanding (NLU) tasks.
As mentioned, NVIDIA Jarvis is well suited for cloud or edge computing.
Edge computing is computing on localized servers and devices to facilitate speed and negate latency. Instead of relying entirely on cloud computing providers, edge computing first processes data locally.
It is easy to be overwhelmed when getting started with something like Jarvis.
This article is not a tutorial, but rather a guide on how to:
- Start as small and simple as possible.
- Become familiar with the environment, to some extent at least.
- Spiral outwards from this initial prototype in measured iterations, adding functionality and complexity.
Graphics Processing Unit
A requirement for experimenting and building with Jarvis is access to a GPU. Specifically, in the case of Jarvis, an NVIDIA GPU based on the Turing or Volta architecture.
This is the one big impediment to experimentation…in this story I am looking at one of the cost effective options you can make use of to access Jarvis and start building amazing Conversational AI experiences.
The following are requirements for a successful Jarvis install:
- Access to an NVIDIA GPU based on the Turing or Volta architecture.
- An account on NVIDIA GPU Cloud (NGC), and being logged in to it.
Let’s solve the GPU access problem first. I opted to make use of an NVIDIA Deep Learning AMI (Amazon Machine Image). This is available on AWS EC2 and can be created in a few minutes.
Once the EC2 (Elastic Compute Cloud) instance is running, be sure to stop it when not in use. The charge is per hour while the instance is running, and for prototyping and experimenting there is no need to run it 24/7.
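To get a feel for how much stopping the instance saves, the per-hour billing can be worked through with a small shell helper. This is only a rough sketch: the rate of 1.00 USD per hour is a placeholder and not an actual AWS price, monthly_cost is a made-up helper name, and the instance ID in the comment is a dummy value.

```shell
# Estimate a monthly bill from an hourly rate and daily hours of use.
# Usage: monthly_cost HOURLY_RATE_USD HOURS_PER_DAY [DAYS_PER_MONTH]
monthly_cost() {
  awk -v rate="$1" -v hours="$2" -v days="${3:-30}" \
    'BEGIN { printf "%.2f\n", rate * hours * days }'
}

monthly_cost 1.00 24   # left running around the clock
monthly_cost 1.00 4    # four hours of prototyping per day

# Stopping the instance from the AWS CLI (dummy instance ID):
#   aws ec2 stop-instances --instance-ids i-0123456789abcdef0
```

At the placeholder rate this prints 720.00 and 120.00. Keep in mind that a stopped instance still incurs charges for its attached storage.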
It is also worthwhile to compare the cost of different regions; I found the cost differs significantly from one region to another.
As your installation runs (which we will get to later), you will find that the standard 32 GB of storage does not suffice. I increased it to 256 GB. Storage can easily be increased via the EC2 portal on AWS.
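Since running out of storage only shows up midway through the install, it may help to check the free space up front. A minimal sketch, assuming the Docker images and models land on the root filesystem; the 100 GB threshold is purely illustrative and not an NVIDIA figure, and free_gb and has_space are made-up helper names.

```shell
# Illustrative threshold in GB; not an official requirement.
REQUIRED_GB=100

# Print the free space, in whole GB, on the filesystem holding the given path.
free_gb() {
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if the path has at least the given number of GB free.
has_space() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

if has_space / "$REQUIRED_GB"; then
  echo "OK: enough free space to proceed"
else
  echo "Low disk space: grow the volume before installing"
fi
```

After growing the volume in the EC2 portal, the partition and filesystem may also need to be extended on the instance before the extra space shows up here.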
Accessing Your Hardware
Once your EC2 Ubuntu instance is up and running, you obviously need to connect to it. The easiest way is via PuTTY. Install PuTTY on your machine…
When creating the EC2 instance, you are presented with an option to download a PEM key. Download this key file and save it somewhere on your machine.
You will need it to create a private key making use of the PuTTY key generator.
Once the private key is generated (*.ppk), you need to click within PuTTY on SSH and Auth, and select this file.
This is an effective and lightweight way to connect to your Ubuntu machine. At this stage your AWS machine is up and running and you have access to it via the command line.
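If you are on macOS or Linux, the PEM file can be used directly with OpenSSH, with no conversion to *.ppk. The sketch below makes a few assumptions: ubuntu is taken to be the login user for the Ubuntu-based AMI, ami_login is a made-up helper name, and the key path and hostname in the example are placeholders.

```shell
# Open an interactive shell on the AMI with OpenSSH.
# Usage: ami_login KEY_FILE HOST
ami_login() {
  chmod 400 "$1"            # ssh rejects private keys readable by other users
  ssh -i "$1" "ubuntu@$2"   # "ubuntu" is assumed to be the AMI's login user
}

# Example (placeholder key path and hostname):
#   ami_login ~/.ssh/jarvis-ami.pem ec2-203-0-113-10.compute-1.amazonaws.com
```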
Next, let’s take a look at the software requirements…
Now that we have our hardware up and running with access, we need to start installing Jarvis and launch some of the test and demo applications.
Your starting point is to access the NVIDIA NGC website.
NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing.
Create a user profile on NGC; this is free and quick to perform. After you have created your profile, be sure to check your email for the confirmation email and click confirm.
Be sure to save the API key; we will use it to authenticate on our AMI.
This process is best performed on your local machine, as at this stage we do not have any GUI or desktop access to our virtual machine.
Installing NGC On The AMI
Back on your virtual machine on AWS, from the PuTTY application command line, you can execute all the installs and actions.
Before installing Jarvis, you need to install the NGC command line tool (NGC CLI) with these commands:
wget -O ngccli_cat_linux.zip https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip && unzip -o ngccli_cat_linux.zip && chmod u+x ngc
md5sum -c ngc.md5
echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bash_profile && source ~/.bash_profile
ngc config set
During this installation process, you will be prompted for an API key, which you accessed while registering on NGC.
Deploying Jarvis On The AMI
Deploying Jarvis takes a while, and it is here that the installation might halt if disk space runs out.
The option shown below is the Quick Start scripts approach to set up a local workstation and deploy the Jarvis services using Docker.
ngc registry resource download-version nvidia/jarvis/jarvis_quickstart:1.0.0-b.1
Initialize and start Jarvis. The initialization step downloads and prepares Docker images and models. This step takes quite a while.
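The initialize and start steps can be sketched as a single shell function. Treat the names as assumptions: the directory jarvis_quickstart_v1.0.0-b.1 and the scripts jarvis_init.sh, jarvis_start.sh and jarvis_start_client.sh are what the quick start download is assumed to contain, so compare them against the contents of your own download before running anything. deploy_jarvis is a made-up helper name.

```shell
# Run the quick start scripts in order, stopping at the first failure.
# Directory and script names are assumptions; check them against your download.
# Usage: deploy_jarvis [QUICKSTART_DIR]
deploy_jarvis() {
  cd "${1:-jarvis_quickstart_v1.0.0-b.1}" || return 1
  bash jarvis_init.sh  || return 1   # downloads Docker images and models (slow)
  bash jarvis_start.sh || return 1   # starts the Jarvis services
  bash jarvis_start_client.sh        # opens a shell in the client container
}

# Example, run from the folder where the download was saved:
#   deploy_jarvis
```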
From inside the client container, start a Jupyter Notebook instance by running the command below. The provided notebooks let you try the different services, and you can access them from your local machine.
jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/work/notebooks
In order to access the Jupyter Notebook on your machine, you will need to create an SSH tunnel to the AMI. This sounds more daunting than it is.
The PuTTY utility makes tunneling easy to set up. Click the Open button to connect to the server via SSH and tunnel the desired ports.
Then open http://localhost:8000 (or whatever port you chose) in a web browser on your local machine to connect to Jupyter Notebook running on the AMI server.
To continue past this point, copy and paste the token from the PuTTY session where you launched the notebook instance.
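For readers not using PuTTY, the same tunnel can be opened with OpenSSH. This is a sketch under a few assumptions: the notebook is taken to listen on port 8888 on the AMI (Jupyter's usual default, so adjust it if your launch output shows a different port), ubuntu is assumed to be the login user, jupyter_tunnel is a made-up helper name, and the key path and hostname are placeholders.

```shell
# Forward a local port to the Jupyter port on the AMI.
# Usage: jupyter_tunnel KEY_FILE HOST [LOCAL_PORT] [REMOTE_PORT]
jupyter_tunnel() {
  key_file=$1
  host=$2
  local_port=${3:-8000}    # port you browse to on your own machine
  remote_port=${4:-8888}   # assumed Jupyter port on the AMI
  echo "Opening tunnel; once connected, browse to http://localhost:${local_port}"
  # -N: forward ports only, no remote shell; -L: local port forwarding
  ssh -i "$key_file" -N -L "${local_port}:localhost:${remote_port}" "ubuntu@${host}"
}

# Example (placeholder key path and hostname):
#   jupyter_tunnel ~/.ssh/jarvis-ami.pem ec2-203-0-113-10.compute-1.amazonaws.com
```

The command stays in the foreground while the tunnel is open; press Ctrl+C to close it.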
This view should be familiar to you, and opening the folders takes you into a good presentation of how you might go about interacting with the Jarvis services.
The services available now via Jarvis are:
- Speech recognition trained on 7000 hours of speech data with stream or batch mode.
- Speech synthesis available in batch and streaming mode.
- NLU APIs with a host of services.
The advent of Jarvis will surely be a jolt to the current marketplace, especially with embedded conversational AI solutions. The freedom of installation and the open architecture will stand NVIDIA in good stead. As noted, production architecture and deployment will demand careful consideration.