How To Program The IBM Voice Agent With Watson
Demystifying The Process Of Creating A Virtual Voice Agent
Introduction
A call is placed over a traditional telephone connection, and the call is answered in voice…the only difference is that you are not speaking to a human, but to a bot.
Below is a diagram of the services that are orchestrated to create the voice agent. It needs to be noted that more recent documentation refers to IBM Voice Agent with Watson, while earlier documentation refers to IBM Voice Gateway.
The commands and session variables we will set still keep with the naming convention of Voice Gateway.
This video takes you through the process of setting up the Elastic SIP Trunk with Twilio, setting up the Voice Agent as a SIP gateway, and orchestrating the cloud services.
Change The Text To Speech Voice In-Call
The voice of the bot can be changed on the fly. You can write a routine where the user says “I want to speak to Mike”, and Mike answers and takes the call from there.
{
  "output": {
    "text": {
      "values": [
        "Hi this is Mike! How can I help?"
      ],
      "selection_policy": "sequential"
    }
  },
  "context": {
    "vgwTTSConfigSettings": {
      "config": {
        "voice": "en-US_MichaelVoice"
      }
    }
  }
}
This is the JSON portion you will need to embed in one of the Watson Assistant dialog nodes.
The same can be done for any of the other voices…
If the user says, “I want to speak to Kate”, a dialog node with the following JSON is called:
{
  "output": {
    "text": {
      "values": [
        "This is Kate, and I am the British English voice. How can I help?"
      ],
      "selection_policy": "sequential"
    }
  },
  "context": {
    "vgwTTSConfigSettings": {
      "config": {
        "voice": "en-GB_KateVoice"
      }
    }
  }
}
The TTS service is updated on the fly, within the same call, and it is as if the call is handed over to another person.
Change The Assistant Language In-Call
The language of the call can also be changed on the fly, within a live call. A user might say, “Can we speak Italian?” or “Can we speak French?” In this case the voicebot can switch to a different language altogether.
Should a user say, “I want to speak German”…or should language detection sense that the user is speaking German…the language and TTS voice can be updated.
{
  "output": {
    "text": {
      "values": [
        "Willkommen bei dieser IBM Watson-Demonstration. Was möchtest du mich fragen?"
      ],
      "selection_policy": "sequential"
    }
  },
  "context": {
    "vgwTTSConfigSettings": {
      "config": {
        "voice": "de-DE_BirgitVoice"
      }
    }
  }
}
In the JSON portion you can see that the language and locale are changed to a specific German TTS voice.
In this fashion any available language, locale or voice can be invoked and this all happens in-call.
Here is an example of changing the language to Italian.
{
  "output": {
    "text": {
      "values": [
        "Adesso posso interpretare l'italiano. Cosa vorresti provare dopo?"
      ],
      "selection_policy": "sequential"
    }
  },
  "context": {
    "vgwTTSConfigSettings": {
      "config": {
        "voice": "it-IT_FrancescaVoice"
      }
    }
  }
}
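The JSON above only switches the text-to-speech voice. For the assistant to also transcribe the caller’s speech in the new language, the speech-to-text configuration can be updated in the same context block via vgwSTTConfigSettings. Below is a sketch for the Italian case; the model name follows the Watson Speech to Text naming convention, so verify it against the models available in your service instance:

```json
{
  "context": {
    "vgwSTTConfigSettings": {
      "config": {
        "model": "it-IT_BroadbandModel"
      }
    },
    "vgwTTSConfigSettings": {
      "config": {
        "voice": "it-IT_FrancescaVoice"
      }
    }
  }
}
```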
Below is the configuration within Watson Assistant. You can choose to open the JSON editor or edit the node via the graphical user interface.
Handling Voice Agent Response Timeouts
The detection and handling of a response timeout is fairly standard; catching this event allows for handling the call intelligently.
Silence during a voice call must be avoided at all costs.
What I like about the voice agent gateway is that vgwPostResponseTimeout can be caught directly as an intent within Watson Assistant.
This illustrates the level of integration between the two elements.
The Voice Gateway can be managed from within Watson Assistant on an intent basis.
The response of the assistant can be defined at that point in the conversation.
Or any other action can be taken, like transferring the call to a live service representative.
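As a sketch of how this looks in practice: when the caller stays silent past the timeout, the Voice Gateway sends the literal text vgwPostResponseTimeout to Watson Assistant, so a dialog node conditioned on that utterance (or an intent trained on it) can re-prompt the caller. The wording of the response below is illustrative:

```json
{
  "output": {
    "text": {
      "values": [
        "Are you still there? You can also say agent to speak to a representative."
      ],
      "selection_policy": "sequential"
    }
  }
}
```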
Ending A Call From The Agent Side
The IBM Voice Gateway has a few commands that allow you to program your Voice Gateway very efficiently. One of these is vgwActHangup. This command can be used to issue a hangup and end the call from the program’s side.
{
  "output": {
    "vgwAction": {
      "command": "vgwActHangup"
    },
    "generic": [
      {
        "response_type": "text",
        "values": [],
        "selection_policy": "sequential"
      }
    ]
  }
}
The JSON code can be added to the conversational node within Watson Assistant as shown here.
The typical approach would be to have an intent which catches anything hangup- or end-the-call-related. It is best practice to have a confirmation node: do you want to end the call, yes or no.
On confirmation from the user, the call termination can be invoked and the call ended.
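For example, the “yes” branch of the confirmation node can combine a closing message with the vgwActHangup command. This assumes the gateway plays the response text before executing the hangup action; the farewell wording is illustrative:

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActHangup"
    },
    "generic": [
      {
        "response_type": "text",
        "values": [
          "Thank you for calling. Goodbye."
        ],
        "selection_policy": "sequential"
      }
    ]
  }
}
```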
Conclusion
Conversational interfaces are becoming pervasive and are expanding into different mediums. In this case, Conversational AI is extending into a traditional medium like a voice call.
Callers are not confined to the DTMF menu or keypad anymore and are allowed to speak freely. Obviously there will be challenges which will impede the perceived quality of the service.
Background noise, voice quality during the call and initial user screening will always dictate the user experience.