Three Design Challenges When Launching A Phone-Based Voicebot
While reading this article, keep in mind that a phone call to a contact centre is often a customer's last resort, perhaps followed only by a visit to a physical walk-in centre.
Despite the difficulties I lay out in this article, a voicebot is an important investment that companies should be working to get right.
1️⃣ NLU Design
User input for voicebots is more verbose than for chatbots, and users are more prone to repeating phrases.
Add to this challenges like authentication, callers mixing languages, customer-specific terms for products and services, digression, self-correction, ambiguity, etc.
All of these factors make voice utterance and conversation data far “noisier” than text data; hence the signal-to-noise ratio is much lower.
Astute NLU design is required for effective disambiguation, designing for the long tail of NLU, and aligning the NLU model with existing user intents.
The image above shows real-world bot transcript extracts for the same stolen-phone intent. Notice the contrast in user input between the chatbot and voicebot transcripts.
2️⃣ ASR Design
Phone-based voicebots receive considerably lower-quality user utterances than devices that capture speech directly via an array of microphones, as is the case with Amazon Alexa/Echo.
Add the problem of background noise to the poor voice quality of user utterances… From implementing a voicebot with a large telco and analysing thousands of user utterances, I found that around 40% of voicebot calls have considerable background noise.
This is obviously not a problem for chatbots…and smart speakers tend to sit in quieter places like the home or car.
The solution is to build an acoustic model from representative user utterances, and to include the NLU training data in the ASR training data.
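One lightweight way to feed NLU vocabulary back into the ASR side, when you cannot retrain the acoustic model itself, is to rescore the ASR n-best list toward domain phrases. The sketch below assumes a hypothetical n-best output of `(hypothesis, score)` pairs from any ASR engine; the domain phrases would come from your NLU training data.

```python
# Domain vocabulary drawn (hypothetically) from NLU training data.
DOMAIN_PHRASES = {"sim swap", "stolen phone", "data bundle"}

def rescore(nbest: list[tuple[str, float]], boost: float = 0.1):
    """Add a small bonus per domain phrase found in a hypothesis,
    then return the n-best list re-ranked by adjusted score."""
    def adjusted(item):
        text, score = item
        hits = sum(p in text.lower() for p in DOMAIN_PHRASES)
        return score + boost * hits
    return sorted(nbest, key=adjusted, reverse=True)

# A mis-heard "sin swap" outranks "sim swap" acoustically,
# but the domain boost flips the ranking.
ranked = rescore([("my sin swap request", 0.62),
                  ("my sim swap request", 0.58)])
```

This kind of rescoring is a crude stand-in for proper phrase boosting or a custom language model, but it illustrates why aligning ASR and NLU training data matters: the acoustically best hypothesis is often not the one your NLU can act on.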
Also, I would be remiss not to mention that among ASR technologies, Whisper is likely to play a bigger part in the near future. Getting up and running and doing a first analysis poses less of a challenge with an open-source ASR tool like Whisper available.
3️⃣ Conversation Design
Launching a voicebot is a daunting task due to the sheer number of moving parts, as seen in the image below, and the synchronous nature of voicebot conversations.
Smart speakers are often command-and-control oriented…acting as a voice gateway or interface for issuing single commands. Complex customer support issues, by contrast, demand multi-turn conversations, often with compound (multiple) intents.
Chatbots have the advantage that user input can be constrained with directed dialogs and visible design affordances, launching small and addressing a very specific use case. Voicebots, in turn, are hampered by invisible affordances and the ephemeral nature of voice.
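Because a voicebot cannot fall back on buttons or menus, the directed dialog has to be carried entirely by prompts. A minimal multi-turn slot-filling loop captures this; the intent and slot names below are illustrative, not from any specific framework.

```python
# Required slots per intent (illustrative names).
REQUIRED_SLOTS = {
    "report_stolen_phone": ["phone_number", "incident_date"],
}

def next_turn(intent: str, slots: dict[str, str]) -> str:
    """Return the bot's next move: prompt for the first missing
    required slot, or confirm once every slot is filled."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if slot not in slots:
            return f"ask:{slot}"
    return "confirm"
```

Each `ask:` result maps to a spoken prompt (“What is the number of the stolen phone?”), which is the voice equivalent of a chatbot's visible form field: the affordance is created turn by turn instead of being shown up front.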
Please follow me on LinkedIn for the latest updates on Conversational AI. 🙏🏽
I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.