Three Key Voicebot Design Considerations

Voicebots pose a number of unique challenges compared to text-based chatbots. The hardest problem to solve is managing dialog turn-taking in the conversation. Turn-taking also needs to be considered in light of other elements like barge-in and background noise. When we as humans have a telephone conversation, there is a sequence of events. We first exchange pleasantries, then we agree on the reason for the call (intent). This is followed by a process of deciding who goes first, and managing interruptions (barge-in).

Cobus Greyling

5 min read · Sep 7, 2022

The TL;DR

Voicebots have the distinct disadvantage that their design affordances are invisible to the user.
Small talk needs to be planned for, as users are more prone to small talk on a telephone call than in a chatbot interface, and are in general more verbose.
The intent of the call needs to be firmly established at the start of the call. After intent is established, the voicebot needs to take charge of the dialog; this requires accurate intent detection.
Barge-in should not be too sensitive and must be triggered explicitly rather than implicitly.
Detecting background noise via an acoustic model is fairly accurate, and callers should rather be advised to move to a quiet place or call back later than have the voicebot try to cancel the noise.

Small Talk

Small talk is part of our day-to-day conversation process, and generally in conversations there is an introductory small talk section. This is where the user introduces themselves and often states from where they are calling.

Small talk does not need to be extensive, but basic courtesy should be built into the voicebot.

Disambiguation Menus

After small talk, the intent of the conversation needs to be established. This is the same for our human-to-human conversations, where intent is established early on and henceforth underpins the conversation.

The first step is to ask the user to state the reason for their call using only a few words.

This step could be skipped if a propensity engine or some kind of business system lookup could be used to glean the reason for the call upfront.

The voicebot needs to take command of the narrative, as will be explained in the next step…however, certainty on the intent is required prior to this step.

Updated: Your Chatbot Should Be Able To Disambiguate

Looking At The Approach Of HumanFirst, Watson Assistant & Cognigy…

cobusgreyling.medium.com

The surest way of confirming intent is to make use of disambiguation menus.

Here is an example of how disambiguation menus can be used. Say, for instance, someone calls their mobile operator and states that their phone was stolen; a subsequent disambiguation menu can then be presented. The menu items are all related to a phone being lost or stolen.

Hence a disambiguation menu can be seen as a theme that groups a collection of related intents. The disambiguation menu can ask the user,

Would you like to…
~ Block your line,
~ Blacklist your device,
~ Perform a SIM swap,
or get suggestions for a new device?

An example of how disambiguation is performed within IBM Watson Assistant.
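The lost-or-stolen menu above could be modelled roughly as follows. This is only a minimal sketch and not how Watson Assistant or any other framework implements disambiguation: the theme name, the intent names, the stop-word list and the keyword matching are all assumptions made for illustration. In practice the caller's reply would go through the NLU engine rather than a word-overlap check.

```python
from typing import Optional

# Hypothetical theme -> (intent, spoken label) mapping. All names are
# illustrative and do not come from any specific framework.
DISAMBIGUATION_MENUS = {
    "phone_lost_or_stolen": [
        ("block_line", "block your line"),
        ("blacklist_device", "blacklist your device"),
        ("sim_swap", "perform a SIM swap"),
        ("new_device_suggestions", "get suggestions for a new device"),
    ],
}

# Filler words ignored when matching a reply against the menu labels.
STOP_WORDS = {"your", "a", "for", "get", "perform"}


def build_menu_prompt(theme: str) -> str:
    """Render the menu for a theme as a single spoken prompt."""
    labels = [label for _, label in DISAMBIGUATION_MENUS[theme]]
    if len(labels) == 1:
        return f"Would you like to {labels[0]}?"
    return f"Would you like to {', '.join(labels[:-1])}, or {labels[-1]}?"


def resolve_choice(theme: str, utterance: str) -> Optional[str]:
    """Return the intent whose label shares the most keywords with the reply.

    Returns None when nothing matches or when the best score is tied,
    so the voicebot can re-prompt instead of guessing.
    """
    words = set(utterance.lower().split())
    scores = []
    for intent, label in DISAMBIGUATION_MENUS[theme]:
        keywords = set(label.lower().split()) - STOP_WORDS
        scores.append((len(words & keywords), intent))
    scores.sort(reverse=True)
    best_score, best_intent = scores[0]
    if best_score == 0:
        return None
    if len(scores) > 1 and scores[1][0] == best_score:
        return None
    return best_intent
```

Returning None on a tie is a deliberate choice here: a reply like "block my device" overlaps with more than one menu item, and re-prompting is safer than silently picking one.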

Take Command Of The Narrative

Once intent is clearly established and ambiguity is removed as much as possible, the voicebot needs to take command of the narrative and the dialog turns.

Especially in a domain-specific corporate implementation, there will be longer customer procedures to navigate, and hence a fixed sequence of events the user needs to go through.

Barge-in should be explicit and not implicit. With implicit barge-in, random noises, the user coughing, etc. are seen as a barge-in and break the dialog flow.

An explicit barge-in is often when the user says something which is unrelated to the current dialog turn. At this juncture the user can be asked if they would like to repeat their input, or end the current process and talk about something else…
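One way to picture the difference between implicit and explicit barge-in is a small classifier that runs on whatever was captured while the voicebot was speaking. The sketch below is an assumption-laden illustration: the DialogTurn structure, the intent names and the 0.7 confidence threshold are all hypothetical, and the transcript, intent and confidence are assumed to come from an existing ASR and NLU pipeline.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class DialogTurn:
    """The prompt the voicebot is busy with and the intents it expects back."""
    prompt: str
    expected_intents: Set[str]


def classify_interruption(
    transcript: str,
    intent: str,
    confidence: float,
    turn: DialogTurn,
    min_confidence: float = 0.7,  # hypothetical threshold, tune per deployment
) -> str:
    """Classify audio captured while the voicebot is speaking.

    Returns one of:
      "ignore"   - empty or low-confidence input (cough, random noise)
      "answer"   - input that belongs to the current dialog turn
      "barge_in" - a confident utterance unrelated to the current turn
    """
    if not transcript.strip() or confidence < min_confidence:
        return "ignore"
    if intent in turn.expected_intents:
        return "answer"
    return "barge_in"


# Prompt to play when an explicit barge-in is detected.
BARGE_IN_PROMPT = (
    "Would you like to repeat your input, or end the current process "
    "and talk about something else?"
)
```

Only the "barge_in" outcome interrupts the flow; everything else either continues the current turn or is dropped, which keeps coughs and background sounds from derailing a longer procedure.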

Accidental Noise Detection

For the process of converting speech into text (automatic speech recognition), an acoustic model can be trained on a sample of customer utterances. In my experience, an acoustic model significantly improves the accuracy of transcribing voice to text.

Lessons I Learnt From Launching A Voicebot

I wanted to write a definitive guide based on personal experiences while launching a voicebot. In this article I…

cobusgreyling.medium.com

Something else I found by accident was that the acoustic model transcribed noise in a very specific way. The phrases transcribed for noise were consistent enough to detect a noisy call and advise the user to move to a quiet spot or call back later.
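A rough sketch of how such a check might look is given below. The noise phrases listed are purely hypothetical placeholders, since the actual phrases depend on the acoustic model and would have to be collected from transcripts of known noisy calls. The threshold of two noisy transcripts is likewise an assumption.

```python
from typing import Iterable

# Hypothetical placeholders. In practice these would be harvested from
# transcripts of calls that are known to contain background noise.
NOISE_PHRASES = {"hmm", "uh", "the the", "oh oh"}

QUIET_PLACE_PROMPT = (
    "I am having trouble hearing you. Please move to a quiet spot, "
    "or call back later."
)


def is_noise(transcript: str) -> bool:
    """True when the whole transcript matches a known noise phrase."""
    # Lowercase and collapse whitespace before comparing.
    normalized = " ".join(transcript.lower().split())
    return normalized in NOISE_PHRASES


def should_advise_quiet_place(
    recent_transcripts: Iterable[str],
    threshold: int = 2,  # hypothetical: noisy transcripts needed to trigger
) -> bool:
    """True when enough recent transcripts look like transcribed noise."""
    noisy = sum(1 for t in recent_transcripts if is_noise(t))
    return noisy >= threshold
```

Requiring more than one noisy transcript before playing the prompt avoids sending a caller away because of a single cough or door slam.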

In Closing

Solving for conversational turn-taking and barge-in will be difficult without elements like:

Gesture Recognition
Lip Activity Detection
Object Detection
Gaze Detection

However, making use of the design principles I list in this article will go a long way in improving a voicebot’s NPS, CSAT, containment and problem resolution rate.

https://www.linkedin.com/in/cobusgreyling

Voicebots & The Importance Of Face Speed

Challenges In Deploying and Managing Speech Interfaces

cobusgreyling.medium.com

Measuring Chatbot & Voicebot Success

And Why The Metrics Need To Keep Each-other In Check

cobusgreyling.medium.com

Read This Before Converting Your Chatbot To A Voicebot

There Are Telling Differences Between Text and Voice Interfaces

cobusgreyling.medium.com

Design Different For Voicebots Versus Chatbots

…and Why You Cannot Just Voice Enable Your Chatbot

cobusgreyling.medium.com
