A big part of any conversational interface is knowing who is speaking and when: whether the user is addressing someone in their environment or the conversational interface itself, and when the user is done speaking, so the interface can detect when the dialog turn needs to occur.
Part of the roadmap of Jarvis is the inclusion of additional cognitive elements like computer vision. The vision component addresses lip activity, gaze detection, gesture detection and more.
I first heard about this concept from Fjord Design & Innovation, where they referred to these elements as a phenomenon called face speed.
Face Speed is the cues and hints we pick up from gestures, facial expressions and lip activity during a conversation.
Subsequently we use these cues to manage the conversation in terms of turn-taking and the other conversational elements mentioned above.
By incorporating these elements in its roadmap, Jarvis is poised to become a true conversational agent, taking cues from the speaker’s appearance.
Transfer Learning with NVIDIA TLT
Jarvis comes packed with pretrained models which you can use to significantly increase accuracy when training with your own data.
NVIDIA Transfer Learning Toolkit (TLT) is a Python-based toolkit for taking purpose-built, pretrained neural models and customizing them with your own data.
The goal of the TLT is to make optimized, state-of-the-art, pretrained models easily retrainable on custom enterprise data with zero coding.
Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you take a model trained on one task and re-train it on a different task.
Developers, researchers and software partners building Conversational AI and Vision AI can leverage TLT to avoid the hassle of training from scratch, and significantly accelerate their workflow.
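The idea behind transfer learning can be sketched in a few lines of plain Python (an illustration of the concept only, not the TLT API): the pretrained "backbone" weights are reused as-is, and only a new task-specific head is re-initialised and trained on your own data.

```python
# Conceptual sketch of transfer learning (illustration only, not the TLT API):
# keep the pretrained backbone weights, replace and re-train only the head.
pretrained = {
    "backbone": [0.42, -0.17, 0.88],  # learned features, reused as-is
    "head": [0.91, -0.33],            # task-specific layer, to be replaced
}

def transfer(pretrained_model, new_head_size):
    """Reuse the backbone, re-initialise the head for the new task."""
    return {
        "backbone": list(pretrained_model["backbone"]),  # transferred features
        "head": [0.0] * new_head_size,                   # fresh, to be trained
    }

new_model = transfer(pretrained, new_head_size=4)
print(new_model["backbone"])  # identical to the pretrained backbone
print(new_model["head"])      # freshly initialised: [0.0, 0.0, 0.0, 0.0]
```

Because only the small head starts from scratch, far less data and compute are needed than when training the whole model, which is where the time savings NVIDIA quotes come from.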
According to NVIDIA:
It can reduce an 80-hour workload to an 8-hour one, reducing a data scientist’s workload by 90%. With the new TLT 3.0 release, the toolkit makes a significant turn and starts supporting the most useful conversational AI models.
TLT takes a zero-coding approach, with ready-made Python scripts available. Hence you do not need a deep understanding of the models, or specific expertise in deep learning or low-level coding.
TLT creates workflows for each model in terms of:
- Data Preparation
- Fine Tuning
- Inference Exports
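In the TLT 3.0 launcher these stages map onto per-model CLI commands. A hypothetical sequence for the intent and slot task might look like the following; the spec file names and paths are placeholders, and the exact subcommands should be verified against the TLT documentation for your model:

```shell
# Hypothetical TLT 3.0 launcher workflow for intent/slot classification.
# Spec files (*.yaml) and subcommand names are assumptions for illustration.
tlt intent_slot_classification dataset_convert -e convert.yaml   # data preparation
tlt intent_slot_classification train -e train.yaml               # fine-tuning
tlt intent_slot_classification export -e export.yaml             # inference export
```

Each stage is driven by a spec file rather than code, which is what makes the zero-coding claim possible.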
Intents & Entities ~ 101
An Intent is the user’s intention with their utterance, or engagement with your bot. Think of intents as verbs, or working words. An utterance or single dialog from a user needs to be distilled into an intent.
Entities can be seen as nouns; often they are referred to as slots. These are usually things like dates, times, cities, names, brands etc. Capturing these entities is crucial for taking action based on the user’s intent.
Think of a travel bot: capturing the cities of departure and destination, along with travel mode, costs, dates and times, is at the foundation of the interface. Yet this is the hardest part of the NLU process. Keep in mind the user enters data randomly and unstructured, in no particular order.
We as humans identify entities based on the context we detect, and hence we know where to pick out a city name, even if we have never heard that city name before.
Make sure the vocabulary for an intent is specific to the intent it is meant for. Avoid having intents which overlap.
For example, if you have a chatbot which handles travel arrangements such as flights and hotels, you can choose:
- To have these two user utterances and ideas as separate intents
- Or use the same intent with two entities for specific data inside the utterance; be it flights or hotels.
If the vocabulary between two intents is the same, combine them into a single intent and use entities.
Take a look at the following two user utterances:
- Book a flight
- Book a hotel
Both use the same wording, “book a”. The format is the same, so it should be a single intent with different entities: one entity being flight and the other hotel.
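This combined-intent design can be sketched with a toy rule-based classifier. This is plain Python for illustration only, not the Jarvis or TLT API; a trained model would of course generalise far beyond a hard-coded phrase:

```python
# Toy illustration of one "book" intent with a booking-type entity,
# instead of two overlapping "book a flight" / "book a hotel" intents.
BOOKING_ENTITIES = {"flight", "hotel"}

def classify(utterance: str) -> dict:
    """Return the detected intent and any booking-type entities."""
    tokens = utterance.lower().split()
    if tokens[:2] == ["book", "a"]:
        entities = [t for t in tokens if t in BOOKING_ENTITIES]
        return {"intent": "book", "entities": entities}
    return {"intent": "unknown", "entities": []}

print(classify("Book a flight"))  # {'intent': 'book', 'entities': ['flight']}
print(classify("Book a hotel"))   # {'intent': 'book', 'entities': ['hotel']}
```

One intent, one code path for fulfilment, with the entity deciding whether a flight or a hotel is booked.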
Notebook To Train Joint Intent Detection and Slot Filling Model
If you go to this web page, you will see the commands to execute in order to run the Jupyter notebook for intent and slot classification. Again, you will need access to a compatible NVIDIA GPU; this is a requirement.
You will see the commands to create a virtual environment, activate it, and install the TLT Python package.
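The commands are roughly the following; the package names are taken from NVIDIA’s TLT 3.0 quick-start and may change, so verify them against the page you are following:

```shell
# Create and activate a virtual environment, then install the TLT launcher.
# Package names follow NVIDIA's TLT 3.0 quick-start; verify before running.
python3 -m venv Jarvis_NLP
source Jarvis_NLP/bin/activate
pip install nvidia-pyindex   # adds NVIDIA's pip package index
pip install nvidia-tlt       # the TLT launcher CLI
```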
One thing which may trip you up, and which is not mentioned in the procedure, is that you need access to the NVIDIA GPU Cloud (NGC) and you need to be logged into NGC.
Then configure the CLI with your NGC API key:
(Jarvis_NLP) ubuntu@ip-xxx-xx-x-xxx:~$ md5sum -c ngc.md5
(Jarvis_NLP) ubuntu@ip-xxx-xx-x-xxx:~$ ngc config set
Enter API key [********************************************************************************M2Mw]. Choices: [<VALID_APIKEY>, 'no-apikey']:
After running this command, go to https://ngc.nvidia.com/setup/api-key to retrieve your NGC API key and enter it in the terminal window.
For downloading the notebook you get two options: Wget Resource and CLI Command. I did not have success with the Wget command; the CLI command worked perfectly.
ngc registry resource download-version "nvidia/tlt-jarvis/intentslotclassification_notebook:v1.0"
And lastly, run the notebook with this command:
jupyter notebook --ip 0.0.0.0 --allow-root --port 8888
After running the command, the PuTTY window shows all the access details: the URL to follow, the port the notebook is published on, and a token which needs to be entered on the landing page once you have accessed the URL. Take note: even though the command explicitly states port 8888, the notebook is served on port 8889.
Missing this change in port number when setting up an SSH tunnel can cost you a lot of time.
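A tunnel forwarding the served port might look like the following; the key file and host are placeholders, not values from the procedure:

```shell
# Forward local port 8889 to the notebook's port 8889 on the instance.
# Note the port: 8889, not the 8888 stated on the jupyter command line.
ssh -i mykey.pem -L 8889:localhost:8889 ubuntu@ip-xxx-xx-x-xxx
```

With the tunnel up, http://localhost:8889 in your local browser reaches the notebook’s landing page, where the token is entered.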
Something to take special note of is the section on the mounts of the Docker containers. Further down in the document, the relevant paths need to be set again for data, configuration and results.
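The TLT launcher reads these mounts from a JSON file (by default `~/.tlt_mounts.json`), mapping host directories into the container. A minimal sketch, with host paths that are assumptions for illustration:

```json
{
    "Mounts": [
        {"source": "/home/ubuntu/tlt-experiments/data", "destination": "/data"},
        {"source": "/home/ubuntu/tlt-experiments/specs", "destination": "/specs"},
        {"source": "/home/ubuntu/tlt-experiments/results", "destination": "/results"}
    ]
}
```

The paths you set later in the notebook for data, configuration and results must refer to the container-side destinations, not the host-side sources.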
Depending on your level of knowledge and insight, I would advise running through the notebook example first. Stepping through the notebook gave me a better sense of clarity.
Subsequently you can do the on-machine install and configuration using this page as your guide.
NVIDIA does provide detailed documentation, and navigating the Jarvis and NLP environments is easier than anticipated. I have found that most impediments to installing and running software are system related, not related to the NVIDIA software itself.