Voicebots & The Importance Of Face Speed
Challenges In Deploying and Managing Speech Interfaces
I have written extensively on voicebots from the perspectives of:
- Measuring Success
- Converting a chatbot to a voicebot
- Hardened speech software & hardware environments like NVIDIA Riva.
However, voicebots or speech interfaces are especially difficult to get right due to the synchronous nature of the interaction. Add to this the additional moving parts of Automatic Speech Recognition (ASR), managing the Word Error Rate (WER), and speech synthesis.
All these elements are discussed in detail in the articles listed in the footer.
In this story I would like to discuss the concept of Face Speed, and consider instances where Face Speed is not always possible, like a phone call.
Few reports have had such a profound impact on my thought process as the 2015 report from Fjord Design & Innovation titled The Era of Living Services.
It opened my mind to the idea of user interfaces as living services that continuously adapt to the user.
It frames services as ambient, existing within the user’s environment and orchestrated based on user movement and behaviour, all the while surfacing the right data, at the right time, via the right medium or interface.
Speech or chat are just two of a number of user interfaces. Other interfaces or input methods include gestures, facial expressions, user routines and behaviours, etc.
All of the elements mentioned above constitute an ever-changing, adapting, ambient orchestrated service or interface.
For example, the multimodal aspect of NVIDIA Riva is best understood in this context, given Riva’s available user interfaces:
- ASR (Automatic Speech Recognition)
- TTS (Text To Speech)
- NLU (Natural Language Understanding)
And more specifically…
- Gesture Recognition
- Lip Activity Detection
- Object Detection
- Gaze Detection
- Sentiment Detection
The ultimate and truly humanlike speech interface.
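As an illustrative sketch of how such multimodal signals might be orchestrated, the snippet below fuses a few of them into a single turn-taking decision. The class and field names are hypothetical, not taken from any NVIDIA Riva API:

```python
from dataclasses import dataclass

# Hypothetical fused snapshot of multimodal signals; the field names are
# illustrative assumptions, not NVIDIA Riva identifiers.
@dataclass
class MultimodalFrame:
    transcript: str        # ASR output for the current utterance
    lip_activity: bool     # is the user's mouth still moving?
    gaze_on_screen: bool   # is the user attending to the interface?
    sentiment: float       # -1.0 (negative) .. 1.0 (positive)

def should_respond(frame: MultimodalFrame) -> bool:
    """Decide whether the interface should take its turn.

    Respond only when the user has produced a transcript, has stopped
    speaking (no lip activity), and is attending to the interface.
    """
    return (
        bool(frame.transcript.strip())
        and not frame.lip_activity
        and frame.gaze_on_screen
    )

frame = MultimodalFrame("what is my balance", lip_activity=False,
                        gaze_on_screen=True, sentiment=0.2)
print(should_respond(frame))  # True: user finished speaking and is attending
```

The point of the sketch is that a decision which is trivial for humans reading a face becomes an explicit fusion of sensor streams for a machine.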
We are used to facial expressions being part of our conversations. We intuitively read each other’s faces while we converse.
As we make interfaces more humanlike, users will expect them to be synchronous and instantaneous. Users will be less tolerant of delays and of a computer that is visibly thinking.
We as users are expecting conversational interfaces to respond at face speed.
For example, designers are anthropomorphising user interfaces, making them more human-like and conversation-like. By implication, users then expect the interface to have human-like face-speed characteristics.
There are two challenges here. The first is that the user is presented with a simple and natural interface where they feel very much at home. The interface is simplified by removing complexity, but this complexity needs to be accommodated and hosted somewhere else. In the case of conversational systems, the complexity is hosted within the user interface itself. Hence, taking complexity away from the user means adding complexity to conversational design and development.
The second challenge is that with a GUI, the more user affordances are added graphically, the worse the interface gets. Thus with a GUI, less is more when it comes to design.
By contrast, with a voice or conversational interface, more complexity can be better, because conversation design affordances are invisible from a user perspective.
Face Speed has two components. The first is the speed at which data is delivered. The second is the set of conversational affordances of face speed: detecting who is speaking, reading gestures and expressions, and more.
NVIDIA Riva aims to solve for this, but for voicebots accessed via a phone call, this will remain a problem. Turn-taking and barge-in are two of the biggest challenges at this stage.
The answer might lie in not trying to make the conversation too natural, but rather in having a cue: something serving as a signal or suggestion for turn-taking.
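One way such a cue could work on a phone call is a simple silence-based end-of-turn detector that fires once the caller has been quiet for a set interval, at which point the bot could play a short tone as its turn-taking signal. The sketch below simulates this with per-frame audio energy values; the threshold and timings are illustrative assumptions, not telephony standards:

```python
# Energy-based end-of-turn detector: after SILENCE_FRAMES consecutive
# frames below ENERGY_THRESHOLD, signal that the bot may take its turn
# (e.g. by playing a short tone as the turn-taking cue).

ENERGY_THRESHOLD = 0.05   # illustrative normalised energy floor
SILENCE_FRAMES = 25       # e.g. 25 x 20 ms frames = 500 ms of silence

def detect_turn_end(frame_energies):
    """Return the index of the frame where the turn-taking cue should
    fire, or None if the caller never pauses long enough."""
    quiet = 0
    for i, energy in enumerate(frame_energies):
        quiet = quiet + 1 if energy < ENERGY_THRESHOLD else 0
        if quiet >= SILENCE_FRAMES:
            return i
    return None

# Simulated call audio: 30 frames of speech, then 30 frames of silence.
energies = [0.4] * 30 + [0.01] * 30
print(detect_turn_end(energies))  # 54: cue fires after 25 silent frames
```

A fixed silence window is a crude stand-in for true barge-in handling, but it illustrates the trade-off: an explicit cue sacrifices some naturalness in exchange for unambiguous turn-taking.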
Read This Before Converting Your Chatbot To A Voicebot
There Are Telling Differences Between Text and Voice Interfaces
Design Different For Voicebots Versus Chatbots
…and Why You Cannot Just Voice Enable Your Chatbot
Measuring Chatbot & Voicebot Success
And Why The Metrics Need To Keep Each-other In Check