Photo by Walid Amghar on Unsplash

A Review Of Nuance Mix Conversational AI Platform

And How It Can Be Used To Quickly Create & Maintain Exceptional Customer Experience

Cobus Greyling
9 min read · Apr 12, 2022

And Why It Fits Perfectly Into Microsoft’s Current Conversational AI Environment

Introduction

On 4 March 2022 Microsoft announced that its acquisition of Nuance had been completed. Many had seen Nuance as having missed the boat on Conversational AI: a veteran marginalised in an industry it should have dominated.

But this has changed, for two reasons:

  • The Microsoft acquisition
  • Nuance Mix

February 2020 saw the beta release of Nuance Mix, and looking at their release notes, there has been a steady cadence of 2 to 3 releases per month. This level of activity and development speaks of continued value delivery and investment into an already astounding product.

When creating a project in Nuance Mix, three standard options are available: IVR, and Digital Virtual Assistant in text or voice. The required modalities can be selected, for instance TTS, DTMF and Interactivity (visual design affordances). The ASR portion is set up at a later stage, if required.

Nuance Mix is a complete, cloud-based SaaS solution with all the components required for an end-to-end Conversational AI implementation. The key elements of Nuance Mix are:

  1. Dialog State Management & Development environment.
  2. NLU component (Intents & Entities)
  3. Analytics & Reporting
  4. Text-To-Speech (Speech Synthesis)
  5. Speech-To-Text (ASR, Automatic Speech Recognition)

Nuance Mix Is Addressing Microsoft’s Biggest Vulnerability In Conversational AI — Dialog State Development & Management

Nuance Mix has the added advantage now of leveraging Azure Cloud to augment their presence and negate any latency concerns.

Microsoft has powerful Conversational AI components in:

  • Microsoft Speech Studio (TTS & STT)
  • LUIS
  • Azure bot service (pro-code)
  • Microsoft Bot Framework (pro-code)
  • Power Virtual Agents

However, since July 2021 there have not been any updates to Bot Framework Composer, notwithstanding Microsoft affirming their commitment to the product. Composer has clear and definite vulnerabilities. Microsoft was also excluded from the Gartner report because it does not have a singular stand-alone platform.

Nuance Mix has a well organised dialog state development and management environment. Their approach is one of a design canvas, which most platforms follow.

Enter Nuance Mix. Mix addresses all the ailments Microsoft has been experiencing:

  1. Mix is a singular stand-alone platform.
  2. Most importantly, it has a dialog state management system. This has been Microsoft’s Achilles’ heel. Even though they have a solid pro-code environment in Bot Framework, the no-code/low-code environment was lacking.
  3. There can be a good exchange between Microsoft and Nuance. For example, Nuance is now on their eleventh-generation Automatic Speech Recognition (ASR) engine!
  4. Microsoft covers most bases now. Enterprises wanting to create their own framework and architecture can opt for the traditional approach of assembling Microsoft components. Those wanting a singular standalone platform can opt for Nuance Mix.
  5. Looking at the IDC report below, the combined market share of Nuance and Microsoft is significant. After prototyping on Nuance Mix, the IDC rating seems slightly harsh, and Gartner extending only an honourable mention even more so.

The Positives Of Nuance Mix

  • Accessibility: as with Cognigy, Kore AI and others, easy access is possible without any payment hook.
  • When deploying a project, under data hosts, Azure East US showed up, as seen below. This is a huge advantage for Nuance Mix to leverage the power of the second largest cloud provider in the world. Azure has at least 21% world market share, second only to AWS at 33%.
  • This bodes well for localised or region-specific implementations, especially IVR and voice implementations where latency is critical.
  • Nuance Mix has a very strong set of offerings: ASR, Speech Synthesis, Dialog Management, NLU and more. It is in all regards a complete solution. As seen below, the solution is all-encompassing in its approach. Seemingly no medium or conversational requirement falls outside its ambit. When creating a build from a project, NLU, ASR and Dialog can be selected to add to the run-time solution.
  • It is interesting how different platforms converge on a good idea. A good case in point is NLU, where platforms follow very much the same approach. With dialog state management and development the jury is still out, and there are four distinct approaches. There seems to be a convergence on a design canvas approach, as seen below. This is used by Cognigy, Kore AI and Google Dialogflow, to name a few, and Nuance Mix also opted for it. However, there is still much to be said for a pro-code approach. There are also Rasa’s Machine Learning stories, a good idea whose time has not yet come. And then of course there is the linear state machine approach of Watson Assistant, which is innovative in its own way.
The dialog design canvas of Nuance Mix, with the conversation flow in the middle, the dialog elements on the left and conditional settings on the right.
  • A dialog can be tested in a test pane, and the conversation is mapped out in real time on the dialog canvas. This is also the case with Cognigy.
  • A vast array of language options is available for NLU, TTS & STT.
  • Very much a multi-modal approach, with various mediums and affordances being defined.
  • Really exceptional documentation, with documentation and help very well integrated into the workspace.
  • While building the prototype, my workspace was split across three tabs: NLU, Dialog development and Project settings. This makes for a convenient extended workspace on multiple screens.
  • Nuance Mix has built-in functionality in its intents. This has become a common trend with platforms like Kore AI, Cognigy and HumanFirst, to name a few, which started shifting some intelligence away from the dialog canvas to the NLU portion. The switching of intents on and off is quite reminiscent of the Cognigy approach.
Weights can be added to certain intents, and intents can be switched on and off. Markers show whether an intent is assigned and whether annotation is pending.
  • NLU can be tested separately, a feature quite reminiscent of the Alexa Developer Console. A sample utterance can be entered and the intent match is displayed. The developer can then click on add sample to add the training example to the intent. A JSON document with more detail is also generated. The JSON is a good source of disambiguation data to be used within the call flow.
  • A nice touch is a feature called auto-intent. Sample sentences are analysed and suggested intents are given. A toggle allows for the identification of new intents. This is very reminiscent of the work HumanFirst is doing with their solution. However, there might only be a 5% overlap, as HumanFirst’s offering is quite vast.
  • Built-in system health checks
  • There is a laser focus on enhanced management tools for the Conversational AI environment. The design canvas is displayed below, and from here NLU, Variables, Messages, Events & Data can be accessed.
  • Intents can be associated with entities. This tight feedback loop between an intent and one or more entities is an important contributor to accuracy.
The intent drinks is associated with the entities COFFEE_SIZE and OperatingHours.
  • Intent annotation can also be added, as seen below.
Added intent annotation with compound entities per intent, which is now commonplace.
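
The earlier point about the NLU test JSON being a source of disambiguation data can be sketched roughly as follows. This is only an illustration: the JSON shape (an `intents` list with `name` and `confidence` fields), the intent names and the threshold values are all assumptions, not the actual Nuance Mix response format.

```python
import json

# Hypothetical NLU result; the field names are illustrative assumptions,
# not the actual Nuance Mix response schema.
SAMPLE_RESULT = json.loads("""
{
  "utterance": "I want a large coffee",
  "intents": [
    {"name": "ORDER_DRINK", "confidence": 0.62},
    {"name": "ASK_OPERATING_HOURS", "confidence": 0.55},
    {"name": "CANCEL_ORDER", "confidence": 0.04}
  ]
}
""")


def candidates_for_disambiguation(result, min_confidence=0.3, max_gap=0.15):
    """Return the intent names that score close enough to the top intent.

    One name means a clear match, several names mean the dialog should ask
    a clarifying question, and an empty list means no usable match.
    """
    ranked = sorted(result["intents"], key=lambda i: i["confidence"], reverse=True)
    if not ranked or ranked[0]["confidence"] < min_confidence:
        return []
    top = ranked[0]["confidence"]
    return [
        i["name"]
        for i in ranked
        if i["confidence"] >= min_confidence and top - i["confidence"] <= max_gap
    ]


print(candidates_for_disambiguation(SAMPLE_RESULT))
# ['ORDER_DRINK', 'ASK_OPERATING_HOURS']
```

With two intents scoring this close together, a call flow could prompt the user to choose between them instead of silently committing to the top match.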

When you create a project, the following options are available:

  • Rich text: Lets you specify text messages that can be displayed on any screen, such as SMS messages. It also provides the ability to include richer content in messages, such as HTML tags that can be used in a web chat.
  • Audio Script: Lets you specify recorded audio files to play, as well as backup text to be rendered using text-to-speech (TTS) when the audio file is not available.
  • TTS: Lets you specify text that can be spoken using speech synthesis. Applies to channels intended for TTS only. This modality allows you to use the DLGaaS TTS streaming feature.
  • Interactivity: Lets you add interactive elements to the message, such as buttons and clickable links.
  • DTMF: Lets you add support for Dual-Tone Multi-Frequency tones as user input.
The basic sequence of events as defined by Nuance Mix.
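
To make the modalities listed above a little more concrete, here is a minimal sketch of a message that carries content for several modalities, including the Audio Script behaviour of falling back to TTS when a recording is not available. The `Message` class, its field names and the `render` function are hypothetical and are not part of the Nuance Mix API.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Message:
    """One prompt with optional content per modality (names are illustrative)."""
    rich_text: Optional[str] = None    # text or HTML for screens and web chat
    audio_file: Optional[str] = None   # recorded audio to play
    backup_text: Optional[str] = None  # read out by TTS if the audio is missing
    tts_text: Optional[str] = None     # text intended for speech synthesis


def render(message: Message, channel: str, available_audio: Set[str]) -> str:
    """Pick what a given channel should present for this message."""
    if channel == "text":
        return f"DISPLAY: {message.rich_text}"
    if channel == "voice":
        if message.audio_file in available_audio:
            return f"PLAY: {message.audio_file}"
        # Fall back to synthesised speech when the recording is unavailable.
        spoken = message.backup_text or message.tts_text
        return f"SPEAK: {spoken}"
    raise ValueError(f"unknown channel: {channel}")


greeting = Message(
    rich_text="<b>Welcome</b>",
    audio_file="welcome.wav",
    backup_text="Welcome",
)
print(render(greeting, "voice", available_audio={"welcome.wav"}))  # PLAY: welcome.wav
print(render(greeting, "voice", available_audio=set()))            # SPEAK: Welcome
print(render(greeting, "text", available_audio=set()))             # DISPLAY: <b>Welcome</b>
```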

A Few Considerations

  • Cost is an aspect I did not consider. With so many cloud offerings currently available, cost will definitely play a major role in the decision-making process.
  • Nuance Mix has not added any structure to its intents, specifically hierarchical or nested intents, as we have seen with Cognigy, Kore AI, HumanFirst and a few others.
  • There is some structure to entities within Mix, with List, Relationship, Freeform & Regex-based types. Here Mix can benefit greatly from Microsoft LUIS and its machine-learned entities.
Four types of Entities exist, List, Relationship, Freeform & Regex-based.
  • There is no universal language option available for NLU. This is a bit of an inhibitor. Cognigy offers one, and IBM Watson Assistant also has a universal language option. Obviously this approach cannot be followed with STT & TTS, but it is quite plausible for NLU. There might be an argument for having one language which is available across all possible segments of a digital assistant.
The universal option as from the Cognigy framework.
  • One wonders how open Nuance will be to a local, private installation or instance of Mix. This is often a requirement of larger enterprises.
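
As a rough illustration of two of the entity types mentioned above, List and Regex-based, the sketch below matches both kinds against an utterance. The definitions are hypothetical (only the COFFEE_SIZE name comes from the prototype) and the matching logic is a simplification for illustration, not how Mix implements entity recognition.

```python
import re

# Hypothetical entity definitions; values and the ORDER_NUMBER entity are
# illustrative assumptions only.
LIST_ENTITIES = {
    "COFFEE_SIZE": {"small": ["small", "short"], "large": ["large", "big"]},
}
REGEX_ENTITIES = {
    "ORDER_NUMBER": re.compile(r"\b[A-Z]{2}\d{4}\b"),
}


def extract_entities(utterance):
    """Return {entity_name: value} for list and regex matches in the utterance."""
    found = {}
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    # List entity: map any known literal to its canonical value.
    for name, values in LIST_ENTITIES.items():
        for canonical, literals in values.items():
            if words & set(literals):
                found[name] = canonical
    # Regex entity: the matched text itself is the value.
    for name, pattern in REGEX_ENTITIES.items():
        match = pattern.search(utterance)
        if match:
            found[name] = match.group(0)
    return found


print(extract_entities("Where is my big latte, order AB1234?"))
# {'COFFEE_SIZE': 'large', 'ORDER_NUMBER': 'AB1234'}
```

Both types are rule-like: every literal or pattern has to be enumerated up front, which is the gap machine-learned entities are meant to fill.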

Table Stakes & Verticals

As more complete end-to-end cloud-based Conversational AI solutions are launched, the table stakes increase considerably. With low table stakes (existing market functionality and innovation) it is easy for companies to go to market and differentiate themselves. Hence the early entrants differentiated themselves with a lower level of functionality.

With the current state of play, with such advanced and augmented environments, the table stakes are high, and considerable work needs to be done just to reach market parity. From there, differentiation still needs to be achieved, not to speak of the market share which needs to be fought for.

That is why I believe the horizontal growth and the entry of new platforms will slow down.

And as the market matures, new products will emerge servicing the Conversational AI market. These products will find themselves solving for one or more (not too many) of the verticals defined here. Subsequently, these vertical-focussed products will make for a next round of acquisitions, as a convenient avenue for larger players to differentiate their products and grow niche functionality.

A company like HumanFirst comes to mind, serving verticals 1 and 7 with an approach of automated Intent-Driven Development based on conversational data and customer utterances. There are other products also focussed on quality and automated testing.

Conclusion

The marketplace is becoming crowded and Conversational AI reports of 2023 will be interesting.

Cobus Greyling

I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI. www.cobusgreyling.com