Voicebots & The Importance Of Face Speed

Challenges In Deploying and Managing Speech Interfaces

Cobus Greyling

Jun 24, 2022

Introduction

I have written extensively on voicebots from the perspectives of:

Design
Measuring Success
Converting a chatbot to a voicebot
Hardened speech software and hardware environments like NVIDIA Riva

Voicebots, or speech interfaces, are especially difficult to get right because the interaction is synchronous. Add to this the additional moving parts: Automatic Speech Recognition (ASR), managing Word Error Rate (WER), and speech synthesis.
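WER is the usual way to quantify how well the ASR layer is doing. It counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the recognised transcript into the reference transcript, then divides by the number of words in the reference (N): WER = (S + D + I) / N. The minimal Python sketch below computes this with a word-level edit distance. The function name `word_error_rate` is illustrative. The sketch assumes only lower-casing and whitespace tokenisation, whereas real scoring pipelines normally apply fuller text normalisation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: lower-casing and whitespace tokenisation only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # match or substitution
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "i would like to check my account balance"
    hyp = "i would like to check my count balance please"
    # One substitution (account -> count) plus one insertion (please) over 8 words.
    print(f"WER: {word_error_rate(ref, hyp):.3f}")  # WER: 0.250
```

Because insertions are counted, WER can exceed 1.0. A recogniser that hallucinates extra words can therefore score worse than one that returns nothing.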

All these elements are discussed in detail in the articles listed in the footer.

In this story I discuss the concept of Face Speed and consider cases where Face Speed is not always possible, such as a phone call.

Living Services

Few reports have had as profound an impact on my thinking as The Era of Living Services, the 2015 report from Fjord Design & Innovation.

It opened my mind to the idea of user interfaces as living services that continuously adapt to the user.

This means thinking of services as ambient, existing within the user’s environment and orchestrated based on the user’s movement and behaviour, all the while surfacing the right data, at the right time, via the right medium or interface.

Speech and chat are just two of many user interfaces. Other interfaces or input methods include gestures, facial expressions, and user routines and behaviours.

Together, these elements constitute an ever-changing, adaptive, ambient and orchestrated service or interface.

The multimodal aspect of NVIDIA Riva is best understood in this context. Riva’s available user interfaces include:

ASR (Automatic Speech Recognition)
TTS (Text To Speech)
NLU (Natural Language Understanding)

And more specifically…

Gesture Recognition
Lip Activity Detection
Object Detection
Gaze Detection
Sentiment Detection

Combined, these make for the ultimate and truly humanlike speech interface.

Face Speed

We are used to facial expressions being part of our conversations; we intuitively read each other’s faces while we talk.

As we make interfaces more humanlike, users will expect them to be synchronous and instantaneous, and will be less tolerant of delays and of a computer that is visibly “thinking”.

As users, we expect conversational interfaces to respond at face speed.

Designers are anthropomorphising user interfaces, making them more human-like and conversation-like. By implication, users then expect the interface to have human-like, face-speed characteristics.

There are two challenges here. The first is that the user is presented with a simple, natural interface in which they feel very much at home. The interface is simplified by removing complexity, but that complexity has to be accommodated and hosted somewhere else. In conversational systems, it is hosted within the conversational interface itself, out of the user’s sight. Hence, taking complexity away from the user means adding complexity to conversational design and development.

The second challenge concerns the GUI: the more user affordances are added graphically, the worse the interface gets. Thus, with a GUI, less is more when it comes to design.

By contrast, with a voice or conversational interface, the more complexity the better, because conversation design affordances are invisible from the user’s perspective.

Lastly…

Face Speed has two components: the speed at which data is delivered, and the conversational affordances that come with face speed, such as detecting who is speaking and reading gestures, expressions and more.

NVIDIA Riva aims to solve for this, but for voicebots reached via a phone call, where there is no visual channel, this will remain a problem. Turn-taking and barge-in are two of the biggest challenges at this stage.

The answer might lie not in trying to make the conversation too natural, but in having a cue: something that serves as a signal or suggestion for turn-taking.
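On a phone call the voicebot only has the audio channel. Turn-taking is therefore typically approximated with voice activity detection. A run of silence after the caller speaks marks the end of their turn, and sustained caller speech while the bot is talking is treated as barge-in. The sketch below is a deliberately simplified, hypothetical `TurnManager` in Python that works on per-frame RMS energy values. The threshold, the frame counts and the `"CUE"` event are illustrative assumptions, not NVIDIA Riva or telephony-platform APIs. The `"CUE"` event stands in for an earcon or short prompt that explicitly hands the turn to the caller. Production systems would use a trained VAD model and the ASR engine's own endpointing rather than a raw energy threshold.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class State(Enum):
    BOT_SPEAKING = auto()
    LISTENING = auto()
    USER_SPEAKING = auto()


@dataclass
class TurnManager:
    """Hypothetical energy-based turn-taking and barge-in detector.

    All thresholds are illustrative assumptions, counted in audio frames
    (for example 20 ms of audio per frame).
    """
    speech_threshold: float = 0.02  # RMS level treated as speech
    barge_in_frames: int = 5        # sustained speech needed to interrupt the bot
    end_of_turn_frames: int = 25    # trailing silence that ends the caller's turn
    state: State = State.BOT_SPEAKING
    events: List[str] = field(default_factory=list)
    _speech_run: int = field(default=0, init=False)
    _silence_run: int = field(default=0, init=False)

    def bot_finished_prompt(self) -> None:
        # Explicit cue: signal that the floor now belongs to the caller.
        self.events.append("CUE")
        self.state = State.LISTENING

    def process_frame(self, rms: float) -> None:
        is_speech = rms >= self.speech_threshold
        if self.state is State.BOT_SPEAKING:
            # Barge-in: the caller talks over the bot for long enough to count.
            self._speech_run = self._speech_run + 1 if is_speech else 0
            if self._speech_run >= self.barge_in_frames:
                self.events.append("BARGE_IN")  # e.g. stop TTS playback here
                self.state = State.USER_SPEAKING
                self._silence_run = 0
        elif self.state is State.LISTENING:
            if is_speech:
                self.state = State.USER_SPEAKING
                self._silence_run = 0
        else:  # State.USER_SPEAKING
            self._silence_run = 0 if is_speech else self._silence_run + 1
            if self._silence_run >= self.end_of_turn_frames:
                self.events.append("END_OF_TURN")  # hand the transcript to NLU
                self.state = State.BOT_SPEAKING  # assume the bot replies next
                self._speech_run = 0


if __name__ == "__main__":
    tm = TurnManager()
    # The caller talks over the bot's prompt for six frames, triggering barge-in.
    for rms in [0.0, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]:
        tm.process_frame(rms)
    # Twenty-five silent frames then close the caller's turn.
    for _ in range(25):
        tm.process_frame(0.0)
    print(tm.events)  # ['BARGE_IN', 'END_OF_TURN']
```

The trade-off sits in the two frame counts. A short `barge_in_frames` makes the bot feel responsive but lets coughs and line noise interrupt it. A long `end_of_turn_frames` avoids cutting callers off mid-thought at the cost of a slower, less face-speed reply. An explicit cue sidesteps part of this guesswork by telling the caller when it is their turn.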

Cobus Greyling - City of Johannesburg, Gauteng, South Africa | Professional Profile | LinkedIn

Rasa Hero. NLP/NLU, Chatbots, Voice, Conversational UI/UX, CX Designer, Developer, Ubiquitous User Interfaces…

www.linkedin.com

Cobus Greyling - Medium

Read writing from Cobus Greyling on Medium. NLP/NLU, Chatbots, Voice, Conversational UI/UX, CX Designer, Developer…

cobusgreyling.medium.com

Eliza Language Technology Community - Language Technology: Conversational AI, NLP/NLP, CCAI…

ELIZA - Where language technology enthusiasts unite.

www.eliza.community

Read This Before Converting Your Chatbot To A Voicebot

There Are Telling Differences Between Text and Voice Interfaces

cobusgreyling.medium.com

Design Different For Voicebots Versus Chatbots

…and Why You Cannot Just Voice Enable Your Chatbot

cobusgreyling.medium.com

Measuring Chatbot & Voicebot Success

And Why The Metrics Need To Keep Each-other In Check

cobusgreyling.medium.com

NVIDIA Riva 2.0 Is Now Available

And How To Get Started With NVIDIA Riva For Conversational AI Services

cobusgreyling.medium.com
