Train NVIDIA Intent and Slot Classification Models Using Transfer Learning

And How To Deploy Models To The Jarvis Format


A big part of any conversational interface is knowing who is speaking, and when. It also means knowing who the user is speaking to: someone in their environment, or the conversational interface. And it means knowing when the user is done speaking, and hence when the dialog turn needs to occur.

Part of the roadmap of Jarvis is the inclusion of additional cognitive elements like computer vision. The vision component addresses lip activity, gaze detection, gesture detection and more.

Conversational AI Skills

I first heard about this concept from Fjord Design & Innovation, where they referred to these elements as a phenomenon called face speed.

Face speed refers to the cues and hints we pick up from gestures, facial expressions and lip activity during a conversation.

Subsequently we use these cues to manage the conversation in terms of turn-taking and the other conversational elements mentioned above.

By incorporating these elements in their roadmap, Jarvis is poised to become a true conversational agent, taking cues from the speaker’s appearance.

Transfer Learning with NVIDIA TLT

Jarvis comes packed with pretrained models, which you can use as a starting point when training with your own data to significantly increase accuracy.

NVIDIA Transfer Learning Toolkit (TLT) is a Python-based toolkit for taking purpose-built, pretrained neural models and customizing them with your own data.

The goal of the TLT is to make optimized, state-of-the-art, pretrained models easily retrainable on custom enterprise data with zero coding.

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task.
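To make the idea concrete, here is a toy sketch in plain Python. It is not TLT code. It assumes a tiny "pretrained" encoder whose weights stay frozen, and trains only a small new classification head on data for a new task. All names (`PRETRAINED_WEIGHTS`, `frozen_features`, `train_head`, `predict`) and the data are invented for illustration.

```python
import math

# Hypothetical "pretrained" encoder weights. In transfer learning these were
# learned on an earlier task; here they are fixed and never updated.
PRETRAINED_WEIGHTS = ((0.5, -0.2), (0.1, 0.8))


def frozen_features(x):
    """Map a raw input vector to features using the frozen encoder."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=200, learning_rate=0.5):
    """Train only a new logistic-regression head on top of the frozen features."""
    head = [0.0] * len(PRETRAINED_WEIGHTS)
    bias = 0.0
    for _ in range(epochs):
        for x, label in examples:
            feats = frozen_features(x)
            pred = sigmoid(sum(h * f for h, f in zip(head, feats)) + bias)
            error = pred - label  # gradient of the log loss w.r.t. the logit
            head = [h - learning_rate * error * f for h, f in zip(head, feats)]
            bias -= learning_rate * error
    return head, bias


def predict(head, bias, x):
    """Classify an input with the frozen encoder plus the newly trained head."""
    feats = frozen_features(x)
    score = sigmoid(sum(h * f for h, f in zip(head, feats)) + bias)
    return 1 if score >= 0.5 else 0


# Small labelled set for the *new* task (invented data).
new_task = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 0.0], 1), ([0.0, 2.0], 0)]
head, bias = train_head(new_task)
predictions = [predict(head, bias, x) for x, _ in new_task]
print(predictions)
```

The point of the sketch is the split: the encoder is reused as-is, and only the small head is fitted to the new data, which is why far less data and training time are needed than when starting from scratch.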

Transfer Learning Toolkit (TLT) is a simple and easy-to-use Python based AI toolkit for taking purpose-built AI models and customizing them with users’ own data.

Developers, researchers and software partners building Conversational AI and Vision AI can leverage TLT to avoid the hassle of training from scratch, and significantly accelerate their workflow.

According to NVIDIA:

It can reduce an 80-hour workload to an 8-hour one, reducing a data scientist’s workload by 90%. With the new TLT 3.0 release, the toolkit makes a significant turn and starts supporting the most useful conversational AI models.

TLT is made easy with a zero-coding approach, with Python scripts available. Hence you do not need a deep understanding of models, or specific expertise in deep learning or coding.

TLT creates workflows for each model in terms of:

  • Data Preparation
  • Training
  • Fine Tuning
  • Inference Exports

Intents & Entities ~ 101

An Intent is the user’s intention with their utterance, or engagement with your bot. Think of intents as verbs, or working words. An utterance or single dialog from a user needs to be distilled into an intent.

NVIDIA Jarvis Jupyter Notebook. Here the domain is not provided, the intent and slot are shown with the score.

Entities can be seen as nouns and are often referred to as slots. These are usually things like dates, times, cities, names, brands etc. Capturing these entities is crucial for taking action based on the user’s intent.

Think of a travel bot: capturing the cities of departure and destination, along with travel mode, costs, dates and times, is at the foundation of the interface. Yet this is the hardest part of the NLU process. Keep in mind that the user enters data in an unstructured way and in no particular order.

We as humans identify entities based on the context we detect and hence we know where to pick out a city name; even though we have never previously heard the city name.
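One common way to represent the output of a joint intent and slot model is a single intent label per utterance plus one slot tag per token. The sketch below uses the widely used BIO tagging convention (`B-` begins an entity, `I-` continues it, `O` is outside any entity). Whether the Jarvis notebook uses exactly this scheme is an assumption, and the intent name, slot names and example utterance are invented.

```python
def extract_slots(tokens, tags):
    """Group BIO-tagged tokens into (slot_name, text) pairs."""
    slots = []
    current_name, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that was still open.
            if current_name is not None:
                slots.append((current_name, " ".join(current_tokens)))
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or a stray I- tag) ends the open entity, if any.
            if current_name is not None:
                slots.append((current_name, " ".join(current_tokens)))
            current_name, current_tokens = None, []
    if current_name is not None:
        slots.append((current_name, " ".join(current_tokens)))
    return slots


tokens = ["fly", "from", "cape", "town", "to", "paris", "tomorrow"]
tags = ["O", "O", "B-from_city", "I-from_city", "O", "B-to_city", "B-date"]
result = {"intent": "book_flight", "slots": extract_slots(tokens, tags)}
print(result)
```

Note that the city is recognised from its position in the sentence ("from ... to ..."), not from a fixed list of city names, which mirrors the contextual detection described above.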

Make sure the vocabulary for an intent is specific to the intent it is meant for. Avoid having intents which overlap.

NVIDIA Jarvis Weather App with contextual entities.

For example, if you have a chatbot which handles travel arrangements such as flights and hotels, you can choose:

  • To have these two user utterances and ideas as separate intents
  • Or use the same intent with two entities for specific data inside the utterance; be it flights or hotels.

If the vocabulary of two intents is the same, combine the intents and use entities.

Take a look at the following two user utterances:

  • Book a flight
  • Book a hotel

Both use the same wording, “book a”. The format is the same so it should be the same intent with different entities. One entity being flight and the other hotel.
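As a rough illustration of the single-intent option, the sketch below keeps one `book` intent and lets an entity carry the difference between the two utterances. The keyword matching is a deliberately naive stand-in for a trained model, and the names `BOOKING_TYPES` and `parse_booking` are hypothetical.

```python
BOOKING_TYPES = ("flight", "hotel")  # hypothetical entity values


def parse_booking(utterance):
    """Return one shared intent plus a booking_type entity, or None if no match."""
    words = utterance.lower().split()
    if "book" not in words:
        return None
    # First word that names something bookable; None if the user did not say.
    booking_type = next((w for w in words if w in BOOKING_TYPES), None)
    return {"intent": "book", "entities": {"booking_type": booking_type}}


for text in ("Book a flight", "Book a hotel"):
    print(text, "->", parse_booking(text))
```

Adding a third bookable item, such as a rental car, then only means adding an entity value rather than defining and training a whole new intent.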

Notebook To Train Joint Intent Detection and Slot Filling Model

If you go to this web page, you will see the commands to execute in order to run the Jupyter notebook for intent slot classification. Again, you will need access to a compatible NVIDIA GPU; this is required.

Page for Accessing the intent classification notebook.

You will see the commands to create a virtual environment and activate it, and to install the TLT Python package.

One thing which can trip you up, and which is not mentioned in the procedure, is that you need access to the NVIDIA GPU Cloud (NGC) and you need to be logged into NGC.

Check NGC:

(Jarvis_NLP) ubuntu@ip-xxx-xx-x-xxx:~$ md5sum -c ngc.md5
ngc: OK

And set it with your NGC API key.

(Jarvis_NLP) ubuntu@ip-xxx-xx-x-xxx:~$ md5sum -c ngc.md5
ngc: OK
(Jarvis_NLP) ubuntu@ip-xxx-xx-x-xxx:~$ ngc config set
Enter API key [********************************************************************************M2Mw]. Choices: [<VALID_APIKEY>, 'no-apikey']:

After running this command, access NGC to retrieve your API key and enter it in the terminal window.

For downloading the notebook you get two options: Wget Resource and CLI Command. I did not have success with the Wget command, but the CLI command worked perfectly.

ngc registry resource download-version "nvidia/tlt-jarvis/intentslotclassification_notebook:v1.0"

And lastly, run the notebook with this command…

jupyter notebook --ip 0.0.0.0 --allow-root --port 8888

After running the command, you will see all the access details and the port the notebook is published on. The token is important: you need to enter it in the web browser once you have accessed the URL.

PuTTY window after the Jupyter Notebook command has run.

In the PuTTY window you see the URL to follow to the notebook. The token shown needs to be entered on the landing page. Take note: even though the command explicitly states port 8888, the notebook is served on port 8889, most likely because port 8888 was already in use and Jupyter moved on to the next free port.

Missing this change in port number when setting up an SSH tunnel can cost you a lot of time.
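One way to avoid this is to read the port from the URL that Jupyter prints, rather than from the command you typed. The sketch below parses such a URL with Python's standard library and builds a matching OpenSSH port-forwarding command, the command-line equivalent of a PuTTY tunnel. The sample URL, token, user and host are placeholders, not values from my session, and `tunnel_command` is a hypothetical helper.

```python
from urllib.parse import urlparse, parse_qs


def tunnel_command(jupyter_url, ssh_target):
    """Return (port, token, ssh command) for the URL printed by Jupyter."""
    parsed = urlparse(jupyter_url)
    port = parsed.port
    token = parse_qs(parsed.query).get("token", [None])[0]
    # Forward the same local port to the port the notebook is actually served on.
    command = f"ssh -N -L {port}:localhost:{port} {ssh_target}"
    return port, token, command


# Placeholder URL in the shape Jupyter prints on startup.
port, token, command = tunnel_command(
    "http://127.0.0.1:8889/?token=abc123", "ubuntu@your-server"
)
print(port, token)
print(command)
```

With the tunnel in place, the notebook is reachable in a local browser on the same port number, and the token can be pasted on the landing page.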

The start of the notebook sequence of events

Something to take special note of is the section on the mounts of the Docker containers. Further down in the document, the relevant paths need to be set again for data, configuration and results.
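As a hedged sketch of keeping those paths consistent: the TLT launcher is understood to read a JSON mounts file that maps host directories to paths inside the container. The file name `~/.tlt_mounts.json`, the `Mounts`, `source` and `destination` keys, and all directory names below are assumptions to verify against your copy of the notebook. `build_mounts` is a hypothetical helper.

```python
import json
import os
import tempfile


def build_mounts(host_dirs):
    """Map each host directory to a container path named after its role."""
    return {
        "Mounts": [
            {"source": os.path.abspath(path), "destination": f"/{role}"}
            for role, path in host_dirs.items()
        ]
    }


# Hypothetical host directories for data, specs (configuration) and results.
workdir = tempfile.mkdtemp()
host_dirs = {role: os.path.join(workdir, role) for role in ("data", "specs", "results")}
for path in host_dirs.values():
    os.makedirs(path, exist_ok=True)

mounts = build_mounts(host_dirs)

# Written to a scratch file here; the real location is assumed to be ~/.tlt_mounts.json.
mounts_file = os.path.join(workdir, "tlt_mounts.json")
with open(mounts_file, "w") as handle:
    json.dump(mounts, handle, indent=4)

print(json.dumps(mounts, indent=4))
```

Defining the host directories once and deriving both the mounts file and the later path settings from them reduces the chance that the container paths for data, configuration and results drift apart.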


Depending on your level of knowledge and insight, I would advise running through the notebook example first. Stepping through the notebook gave me a better sense of clarity.

Subsequently you can do the on-machine install and configuration using this page as your guide.

NVIDIA does provide detailed documentation, and navigating the Jarvis and NLP environments is easier than anticipated. I have found that most impediments to installing and running software are system related, not NVIDIA software related.

