Voice Agent with IBM Watson Is Dead

Long Live Watson Assistant Phone Integration

Cobus Greyling
6 min read · Apr 5, 2021

Introduction

On 1 March 2021, Voice Agent with Watson was deprecated. I have written four Medium posts on it, listed at the end of this story. It was a standalone service in the IBM Cloud catalog and consisted of four elements:

Watson Service Orchestration
  • Speech To Text (Converting User Speech Into Text)
  • Watson Assistant (Conversational AI)
  • Text To Speech (Converting Bot Text Into Voice)
  • SIP Gateway

Voice Agent let you add a voice channel to your chatbot.

The heart of the conversational experience is Watson Assistant.

Watson Assistant hosted the following four conversational elements:

  • Intents
  • Entities
  • Dialogs
  • State Management

The other two components of the voicebot are Speech To Text (STT) and, working in the opposite direction, Text To Speech (TTS).
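To make the division of labour concrete, here is one call turn flowing through the elements listed above. This is a minimal sketch, not how IBM implemented Voice Agent: `speech_to_text`, `assistant_reply`, `text_to_speech` and `ConversationState` are hypothetical stand-ins, and the "services" are offline stubs so the flow itself is easy to follow.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the real services; none of these are IBM APIs.
def speech_to_text(audio: bytes) -> str:
    """Stub STT: pretend the caller's audio is already UTF-8 text."""
    return audio.decode("utf-8")


def text_to_speech(text: str) -> bytes:
    """Stub TTS: pretend synthesized audio is just the encoded text."""
    return text.encode("utf-8")


@dataclass
class ConversationState:
    """State management: context carried between turns of one call."""
    turns: int = 0
    context: dict = field(default_factory=dict)


def assistant_reply(text: str, state: ConversationState) -> str:
    """Stub assistant: a trivial keyword 'intent' plus turn counting."""
    state.turns += 1
    if "balance" in text.lower():
        state.context["last_intent"] = "check_balance"
        return "Your balance is available in the app."
    return "Sorry, could you rephrase that?"


def handle_turn(caller_audio: bytes, state: ConversationState) -> bytes:
    """One voicebot turn: SIP audio -> STT -> assistant -> TTS -> SIP audio."""
    transcript = speech_to_text(caller_audio)
    reply = assistant_reply(transcript, state)
    return text_to_speech(reply)
```

Calling `handle_turn(b"What is my balance?", ConversationState())` runs the whole chain once. The point of the shape is that only the assistant holds conversational state; STT and TTS are stateless converters on either side of it.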

IBM Multilingual Virtual Voice Agent Demo

In truth, it never made much sense for this service to be a product separate from Watson Assistant.

As Watson Assistant's integration options grew, voice had to be brought in alongside them.

Voice is now housed entirely within Watson Assistant, as the newly released Phone integration!

This is in line with the broader industry push toward telephony integration, most notably from Google with the launch of Dialogflow CX.

IBM is promising many more tools and features for deploying assistants on the phone, or for spreading a single assistant instance across multiple mediums.

Adding Integration For Watson Assistant

First, when you open any of your existing Voice Agent with Watson instances, a warning alerts you to the service deprecation. A migration path is provided, which is both convenient and necessary.

Voice Agent With Watson Warning in the IBM Cloud console.

This move makes sense because it bolsters IBM's integration options. Microsoft leads the pack when it comes to the variety of integration options available out of the box.

IBM Watson Assistant Page

Over the last few months, IBM seems to have put a strong focus on building out Watson Assistant's quick integration options.

In the image above, two icons are marked on the left: one for Assistants and one for Skills.

An assistant can be seen as a collection of skills. IBM provides three types of skills:

  • Actions
  • Dialog
  • Search

Before you add an integration, at least one skill must be added to the assistant. A single assistant can incorporate actions, dialog and search skills. Each has its own use cases and implementation, and combining them extends the abilities of your conversational agent.

Once your assistant and its skills are set up, you can choose an integration from the available options, where Phone now appears as a new option.

A few of the integration options available for IBM Watson Assistant. You will see Phone listed there as an option.

It is tempting to deploy a text or chat based conversational agent to voice without any modifications. Read this before deploying a chatbot as a voicebot.

Phone Integration

Once the Phone integration is selected, give this integration a name. IBM refers frequently to the integration options available via Twilio. I had previously set up a phone number on Twilio that reaches the Watson Assistant Phone service. You can view the demo video here.

Watson Assistant Phone Integration Start Page

I only looked at the basic service, shown below. When you set up an Elastic SIP Trunk on Twilio, you get to choose a number, and I was surprised to find support for South African numbers.

Watson Assistant Phone SIP Trunk Configuration

Enter the number you reserved on Twilio here, then enter the SIP address on the Twilio side. Linking the SIP trunk from Twilio to IBM Watson Assistant is surprisingly straightforward.

Twilio Elastic SIP Trunking

This is especially helpful for quick demos and prototypes. On the Twilio side, the relevant product is Elastic SIP Trunking.

Important elements here are reserving your number and entering the SIP address from IBM Cloud.

Copy the SIP address and add it to your SIP trunking provider’s origination configuration; in this case Twilio.

  1. SIP uniform resource identifier (URI) example: sips:public.voip.us-south.assistant.watson.cloud.ibm.com

Create a phone number with your provider, assign it to the SIP trunk, then enter it in the phone number section of the Watson Assistant Phone integration.
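The two values that have to line up across IBM and Twilio are the SIP URI and the phone number. A small pre-flight check like the one below can catch typos before you paste them into either console. This is a minimal sketch under stated assumptions: `parse_sip_uri` and `origination_config` are hypothetical helpers, only the simple `scheme:host` URI form shown above is handled, and the `friendly_name`/`sip_url`/`priority`/`weight`/`enabled` keys are meant to mirror the fields of a Twilio origination URL, so check them against Twilio's own documentation before relying on them. Nothing here calls Twilio or IBM.

```python
import re
from dataclasses import dataclass

# E.164: a '+', then up to 15 digits, the first of which is non-zero.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


@dataclass(frozen=True)
class SipUri:
    scheme: str  # "sip", or "sips" for TLS
    host: str


def parse_sip_uri(uri: str) -> SipUri:
    """Split a simple 'sips:host' URI; reject anything that is not sip/sips."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    host = rest.split(";", 1)[0]  # drop URI parameters such as ;transport=tls
    if not host:
        raise ValueError(f"missing host in {uri!r}")
    return SipUri(scheme.lower(), host)


def origination_config(sip_uri: str, phone_number: str,
                       name: str = "watson-assistant") -> dict:
    """Hypothetical helper: collect the values to enter on the Twilio side."""
    parsed = parse_sip_uri(sip_uri)
    if not E164.match(phone_number):
        raise ValueError(f"phone number must be E.164, e.g. +27...: {phone_number!r}")
    return {
        "phone_number": phone_number,     # assigned to the trunk, entered in Watson
        "tls": parsed.scheme == "sips",   # sips: implies an encrypted connection
        "origination_url": {              # assumed to mirror Twilio's fields
            "friendly_name": name,
            "sip_url": sip_uri,
            "priority": 1,
            "weight": 1,
            "enabled": True,
        },
    }
```

For example, `origination_config("sips:public.voip.us-south.assistant.watson.cloud.ibm.com", "+27210000000")` (a placeholder South African number) returns the bundle with `tls` set to `True`, while a locally formatted number such as `"0210000000"` raises `ValueError`.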

Speech Services

Speech services give your chatbot the ability to speak and listen. Speech services consist of:

  • Speech To Text (Automatic Speech Recognition)
  • Text To Speech (Speech Synthesis)

The most convenient route is to click Create new instances, which launches a process that creates these services on your behalf.

Speech services (TTS & STT) need to be created, or existing services can be used.

You will be presented with a confirmation dialog prior to the service creation.

Speech Services Creation Confirmation Screen

If you refresh your console and check your list of services, the newly created Speech To Text and Text To Speech services will be visible.

Listed as new services in your console.

Within the speech services section, you can choose a Base language model, used to transcribe incoming calls to text, and a Base voice model for your assistant to speak with.

Speech Services Options

Training will definitely be required for the Speech To Text model. This training can be done with audio recordings (a custom acoustic model) or with text (a custom language model).
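For the text-based route, the raw material is a plain-text corpus of the kinds of things callers actually say. The example utterances already written for your assistant are a natural starting point. The sketch below only prepares such a corpus. `build_corpus` is a hypothetical helper, the one-utterance-per-line layout is my assumption of a sensible format, and uploading and training the custom model through the Speech To Text customization interface is left out.

```python
import re


def build_corpus(utterances: list[str]) -> str:
    """Turn example utterances into a plain-text corpus, one per line.

    Collapses runs of whitespace, drops empty entries and case-insensitive
    duplicates, and keeps the first-seen order.
    """
    seen = set()
    lines = []
    for utterance in utterances:
        text = re.sub(r"\s+", " ", utterance).strip()
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            lines.append(text)
    return "\n".join(lines) + "\n" if lines else ""
```

For example, `build_corpus(["What is my  balance?", "what is my balance?", "  ", "Transfer R500 to savings"])` keeps two lines: `What is my balance?` and `Transfer R500 to savings`. Deduplicating keeps a handful of repeated training examples from dominating the corpus.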

Twilio Configuration

This video is a short overview of the Twilio & IBM connection.

Twilio Setup for SIP Number

Conclusion

Conversational interfaces are becoming pervasive and are expanding into different mediums. In this case, conversational AI is extending into a traditional medium: the voice call. Google's split of Dialogflow into ES and CX is testament to this, with CX focusing on collaboration and telephony integration.

Callers are no longer confined to DTMF menus and the keypad; they can speak freely. There will, however, be challenges that impede the perceived quality of the service.

Background noise, voice quality during the call and initial user screening will always dictate the user experience.

Cobus Greyling

I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI. www.cobusgreyling.com