
Analysis Of The Deloitte Report On Conversational AI Innovation

And Where Vectors Of Innovation Are Emerging

Cobus Greyling
7 min read · Jul 11, 2022



There has been much focus on end-to-end Conversational AI frameworks, and one-stop-shops attempting to deliver a complete solution for any and all implementations.

Only a few vendors have opted for a more disparate approach, allowing customers to compose their own bespoke architecture from different components. Amongst these providers are Microsoft and NVIDIA (with Riva).

Is it feasible for one solution provider to supply all the best-of-breed components required for an ever-scaling conversational AI framework? Or are we seeing a wave of independent vectors of innovation that will take Conversational AI to the next step?

Conversational AI, What’s Next?

Deloitte recently dissected the question “What is Conversational AI?” in an attempt to define Conversational AI and its different components.

Their analysis shows that Conversational AI brings together eight technology components, listed below…

Deloitte Report
  1. NLP, which can include Large Language Models.
  2. Intent Recognition, which needs to be managed at a granular level and be adaptable to changes in user input.
  3. Entity Recognition, which must be able to absorb complex user input.
  4. Fulfilment: dialog state management needs to be informed in-conversation by API lookups, conditions, etc.
  5. Voice Optimisation: turn-taking, barge-in and digression remain a challenge in speech interfaces.
  6. Text To Speech: flexible and natural synthesised speech.
  7. Machine Learning, which is necessary to improve intent recognition; LLMs also have a role to play here.
  8. Contextual Awareness, which is necessary for natural conversations: disambiguation, digression, in-conversation context, etc.

Thinking of innovation, table stakes are high with the current framework providers (Cognigy, Kore AI, Boost AI, IBM Watson Assistant, Nuance Mix, Oracle Digital Assistant, to name a few). New successful entrants are highly unlikely because reaching product parity is a big ask.

And beyond parity, product differentiation and market share need to be achieved. So as innovation continues, vectors of innovation are emerging…

Five Areas Of Rapid Innovation

This rapid innovation is driven by the impediments and limitations of the current ecosystems.

These technology advances also mean that organisations and enterprises will have to settle on multiple tools to get their conversational AI offering right.

Chatbots currently represent the top use of AI in enterprises, and their adoption rates are expected to almost double over the next two to five years.

~ Deloitte

Deloitte analysed patents to see where innovation is taking place, and identified five areas of rapid innovation:

  1. Training Chatbots
  2. Managing Complex Conversations & Compound User Utterances
  3. Personalisation
  4. Improving Voicebots
  5. Narrow Domain implementations
Source: Top patent topics for conversational AI

Let’s break down these Five Areas Of Rapid Innovation…

Training Chatbots

The largest number of patents, and the current focus, is on training conversational agents. With this laser focus on, and investment in, training chatbots, it is clearly seen as an avenue for significant improvement and an opportunity for growth.

Under training I also include efforts like Intent-Driven Development and maturing tooling to make intents more accurate and flexible, creating nested or hierarchical intents which are adaptable to changes in user behaviour.

There are a number of data analysis and training tools with functionality like adding & augmenting training data, tagging data, clustering, annotation, autogenerated suggestions, disambiguation, etc.

Managing Complex Conversations & Compound User Utterances

Complex and compound user input is something solution providers are focussing on. Part of this problem is not only compound intents and entities in a single user utterance, but also the problem of orchestration.

Beyond compound utterances, there is the use-case where a user utters a simple command, but executing it demands that multiple processes be kicked off and orchestrated in the background, potentially with the requirement to surface progress and feedback to the user.
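This kind of fan-out can be sketched with a few lines of asynchronous Python. The command and the back-end steps below are hypothetical, stand-ins for real API calls; the point is only to show one command triggering several concurrent processes whose progress is gathered and surfaced back to the conversation.

```python
import asyncio

# Hypothetical back-end steps triggered by a single user command.
# Each is an independent process the orchestrator runs concurrently;
# the sleeps stand in for real API calls.
async def update_address(progress):
    await asyncio.sleep(0.01)
    progress.append("address updated")

async def reissue_documents(progress):
    await asyncio.sleep(0.02)
    progress.append("documents reissued")

async def notify_billing(progress):
    await asyncio.sleep(0.01)
    progress.append("billing notified")

async def orchestrate(command):
    """Fan one user command out to several processes, gather progress."""
    progress = []
    await asyncio.gather(
        update_address(progress),
        reissue_documents(progress),
        notify_billing(progress),
    )
    # Surface a single consolidated status back to the conversation.
    return f"'{command}': " + ", ".join(sorted(progress))

result = asyncio.run(orchestrate("move my account to the new address"))
print(result)
```

In a real framework each step would also report intermediate status, so the bot can tell the user what is still pending rather than going silent while the background work completes.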

Is this one of the problems OneReach AI is trying to solve for?


Personalisation

Hyper-personalisation means addressing not only an audience of one, but also changes in user behaviour and sentiment per engagement. This demands a level of flexibility within conversational agents which adds more overhead if not handled in an innovative fashion.

Deloitte divides personalisation into three categories:

Before The Conversation

Segmentation and classification are identified as avenues for this category, and Large Language Models have a role to play here. Segmenting conversations using embeddings to detect hidden clusters of user behaviour is vital, as is using semantic search to find related information based on intent and meaning.
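A minimal sketch of embedding-based semantic search, assuming pre-computed embeddings: the utterances are invented and the 3-dimensional vectors are hand-crafted so the example is self-contained, whereas in practice they would come from an embedding model (for example an LLM embedding API) with hundreds of dimensions. The same vector-space distances also drive the clustering described above.

```python
import numpy as np

# Hypothetical utterances with hand-crafted 3-d "embeddings".
# Real embeddings would come from an embedding model.
utterances = [
    "I love this service",       # positive sentiment
    "Great, thank you so much",  # positive sentiment
    "This is terrible",          # negative sentiment
    "I want to cancel",          # negative sentiment
    "What are your hours?",      # informational
    "When do you open?",         # informational
]
embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.0],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_vec, k=2):
    """Rank utterances by cosine similarity to the query embedding."""
    scores = [(cosine(query_vec, e), u) for u, e in zip(utterances, embeddings)]
    return [u for _, u in sorted(scores, reverse=True)[:k]]

# A query embedding lying close to the "informational" region.
query = np.array([0.05, 0.15, 0.85])
top = semantic_search(query)
print(top)
```

Because similarity is measured on meaning vectors rather than keywords, the informational utterances surface together even though they share no words with each other.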

The example above, from the Co:here LLM, is based on a model trained with different sentiments: utterances are grouped into positive and negative clusters, ranging in degrees.

During The Conversation

The in-conversation experience is very much dictated by the dialog state management approach taken. Eventually dialog state management needs to move from a rigid state-machine approach to a more flexible approach, informed by the recent, in-conversation input from the user.
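The rigid state-machine approach can be illustrated with a minimal sketch; the states, intents and transitions below are hypothetical:

```python
# A minimal, rigid state-machine dialog manager: the kind of fixed
# flow dialog state management needs to move beyond.
# States, intents and transitions are hypothetical.
TRANSITIONS = {
    ("greeting", "ask_loan"): "collect_amount",
    ("collect_amount", "give_amount"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "collect_amount",
}

def step(state, intent):
    """Advance the dialog; any unmapped input leaves the state unchanged."""
    return TRANSITIONS.get((state, intent), state)

state = "greeting"
for intent in ["ask_loan", "give_amount", "yes"]:
    state = step(state, intent)
print(state)
```

The rigidity is visible in the fallback: any utterance outside the mapped transitions is simply ignored, so digression, disambiguation and context shifts all have to be wired in by hand, node by node.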

Recently I wrote on the idea of intent documents and following the approach of a more generative bot response model.

Lastly, the in-conversation experience is also highly customised for each user…

If the customer is impatient, the agent will increase the rate of speech; if the customer seems highly dissatisfied, the digital agent will involve a human agent in the conversation.

~ Deloitte

After The Conversation

Report, learn and improve. Is auto-learning one of the plausible approaches to automate after-conversation improvement?

Above, from IBM Watson Assistant analytics, the heat-map shows the co-occurrence count of the top N node pairs in disambiguation lists; moving the cursor over each square displays count information.

You can see that users showing interest in a loan progress to the node for taking a loan. Showing interest and asking about payment also co-occur, as do taking a loan and payment.
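The counts behind such a heat-map are straightforward to compute. A minimal sketch, with made-up node names and disambiguation lists standing in for real Watson Assistant logs:

```python
from collections import Counter
from itertools import combinations

# Hypothetical disambiguation lists: each inner list holds the dialog
# nodes offered together in one disambiguation prompt (names made up).
disambiguation_lists = [
    ["take_a_loan", "loan_interest"],
    ["take_a_loan", "loan_interest", "payment"],
    ["payment", "take_a_loan"],
    ["loan_interest", "payment"],
    ["loan_interest", "take_a_loan"],
]

def node_pair_counts(lists):
    """Count how often each pair of nodes co-occurs in the same list."""
    counts = Counter()
    for nodes in lists:
        # Sort so ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(nodes), 2):
            counts[pair] += 1
    return counts

counts = node_pair_counts(disambiguation_lists)
print(counts.most_common(3))
```

Pairs with high counts are candidates for merging or for sharper intent training, since users evidently cannot tell the two nodes apart.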

Improving Voicebots

Innovation for voicebots is to be found in:

  • Background noise management
  • ASR improvements
  • Speech-to-Speech technology
  • Conversation management in turn taking, barge-in, etc.
  • Provision for multiple & minor languages

Narrow Domain implementations

Domain-specific implementations and industry-focussed solutions from vendors are commonplace.

And Lastly…

From A Horizontal Perspective…

As the conversational AI landscape develops, there seems to be a significant focus on bootstrapping, in an attempt to speed up development time, save costs and simplify the process.

These methods of bootstrapping include leveraging LLMs, Question & Answer with quick replies, knowledge bases with vector embeddings, etc.

There is also a focus on voice and voice-call automation, which drives technologies like speech-to-speech, innovative ASR, speech synthesis, etc.

I tend to see these frameworks as horizontal solutions, trying to connect the components of the conversational experience end-to-end, with NLU (Intents & Entities), Dialog State Management, API Abstraction layer, Voice Gateway Integration, etc.

Some vendors attempt to supply all the components while other frameworks act as an aggregator.

For instance, NVIDIA Riva excels at ASR, TTS, edge AI installations and face-speed. However, NVIDIA Riva leverages other technologies for dialog state management, for instance Google Dialogflow CX and Rasa. There is nothing wrong with this approach. Cognigy, for instance, has built their framework in a modular fashion, so their conversational components can be used in isolation externally for NLU or Dialog Management, or the whole framework can be white-labeled.

From A Vertical Perspective…

This is where the growth and innovation is coming from…

The five areas of rapid innovation discussed by Deloitte can also be represented as verticals extending out of the horizontal frameworks. These verticals can also be extended into seven vertical vectors.

That is why I believe horizontal growth and the entry of new platforms will slow down, and new opportunities with rapid growth will emerge from these verticals. These verticals will solve for problems the frameworks are not addressing, or where the overhead of addressing them is too high.



Cobus Greyling

I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI.