Will Conversational AI Implementations Become More Fragmented?
Recently I wrote about five emerging Conversational AI trends: Intent-Driven Design & Development, bootstrapping, voice, Large Language Models and, lastly, fragmentation. In this article I want to develop the idea of fragmentation further with a few practical examples.
Introduction
I’d be remiss not to acknowledge Sherry Comes for her unintended input to this article. Recently Sherry and I exchanged messages on LinkedIn about the idea of fragmentation.
Considering Conversational AI, there are two forces at play currently: consolidation and fragmentation.
Consolidation
Gartner places much emphasis on a Conversational AI platform being self-contained. This is the premise which underpins the Gartner Magic Quadrant report for Conversational AI; if a platform is not self-contained, it does not qualify. Microsoft and NVIDIA Riva, for instance, have chosen not to present a single self-contained platform, but rather to make the Conversational AI components available for users to craft and engineer their ideal environment.
Another example of consolidation is Gartner’s “Critical Capabilities for Enterprise Conversational AI Platforms”, where emphasis is placed on vendor product scores for certain use cases, including Human Resources, IT Service Desk and Customer Service.
Fast-track templates are also becoming commonplace; IBM Watson Assistant, for example, recently released actions templates to fast-track the build process.
There is a plethora of examples, but the question remains: how helpful are pre-sets, templates and the like? Do they really help organisations implement faster and more effectively?
Fragmentation
Does it make sense not to try to build one single monolith of a solution, but rather to focus on orchestration and managing the user experience at a very granular level?
I have asked this question before: is a framework like OneReach AI focusing on becoming a single Conversational AI portal, acting as an orchestration engine and aggregator for Conversational AI experiences? Becoming the Twilio of Language Technology?
Here also consider the direction in which Cognigy is moving with the Cognigy AI Marketplace.
Where Do You Start?
A recent Gartner report on the chatbot development cycle found that when designing a chatbot, companies mistakenly focus on selecting the right technology, with attention on feature matrices and comparisons, instead of focusing on the use cases and which use cases to cover.
Once the intent-based use cases are defined, the technology choices become much easier.
When use cases are not clearly defined, their dependencies are neglected, and this in turn leads to delays.
Sherry Comes came up with the idea of using biomimicry to describe the complexity of a Conversational AI implementation.
For instance, if you want to build a conversational interface for ordering food, the solution needs the following:
- Accuracy is paramount. Every single technology and process step must be performed with the highest level of accuracy.
- The ability to determine that the customer wants to place an order.
- Ability to hear what the customer is ordering (Speaker, Microphone and Noise Filtering Technologies)
- Ability to recognise what the customer is saying (Automatic Speech Recognition — ASR) and the ability to convert what the customer is saying into digital form so we can understand it (Speech to Text (STT))
- Ability to understand what is being said (NLU, Contextual Awareness, Semantic Analysis and Entity Recognition)
- Ability to process what the customer is saying (Natural Language Processing (NLP))
- Ability to manage turn taking, users interrupting and correcting themselves, etc.
- Ability to pull data from the menu and ordering corpus to inform the Dialog Manager for the right prompt and/or information that should be spoken back to the customer (Fulfilment)
- Ability to play the order back to the user, and handle complex scenarios where the user cancels a specific item or changes the details of a specific item.
- Ability to design a natural conversation (Conversational AI Design)
- Ability to engage in a humanlike conversation to deliver an optimised experience (Neural, synthesised voice, TTS).
- Ability to respond and have a conversation with the customer (Natural Language Generation (NLG), Dynamic Text to Speech (TTS))
- Ability to continually learn and improve the acoustic model, language model and data model to respond appropriately to the customer (Machine Learning, Supervised and Unsupervised Testing, Tuning and Training, and Reinforcement Learning)
- Ability to complete the order and transmit the order appropriately to the Point of Sale system and to the Order Fulfilment system (Network Performance and APIs)
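The chain of abilities above can be sketched as a pipeline of swappable stages, which is exactly where fragmentation enters: each stage could be served by a different vendor. The sketch below is a minimal, hypothetical illustration in Python; every function is a stand-in stub (the names are my own, not any vendor’s API), and in a real system each stage would wrap an actual ASR, NLU, dialog-management or TTS service.

```python
# Hypothetical sketch of one conversational turn in a food-ordering bot.
# Each stage is a stub standing in for a real (possibly third-party) service.
from dataclasses import dataclass, field

@dataclass
class OrderState:
    """Running order held by the Dialog Manager across turns."""
    items: list = field(default_factory=list)

def recognise_speech(audio: bytes) -> str:
    """Stand-in for ASR/STT: convert audio into a text transcript."""
    return audio.decode("utf-8")  # placeholder: treat bytes as the transcript

def understand(transcript: str) -> dict:
    """Stand-in for NLU: extract an intent and an item entity."""
    words = transcript.lower().split()
    if "cancel" in words:
        return {"intent": "cancel_item", "item": words[-1]}
    return {"intent": "add_item", "item": words[-1]}

def fulfil(state: OrderState, nlu: dict) -> str:
    """Stand-in for dialog management / fulfilment: update the order
    and choose the next prompt to speak back."""
    if nlu["intent"] == "add_item":
        state.items.append(nlu["item"])
        return f"Added {nlu['item']}. Anything else?"
    if nlu["intent"] == "cancel_item" and nlu["item"] in state.items:
        state.items.remove(nlu["item"])
        return f"Removed {nlu['item']}."
    return "Sorry, I didn't catch that."

def speak(prompt: str) -> str:
    """Stand-in for NLG/TTS: in production this would return synthesised audio."""
    return prompt

def handle_turn(state: OrderState, audio: bytes) -> str:
    """One turn: ASR -> NLU -> fulfilment -> TTS."""
    return speak(fulfil(state, understand(recognise_speech(audio))))
```

Because each stage sits behind its own small interface, a team could source `recognise_speech` from one vendor and `understand` from another without touching the rest of the turn loop, which is the practical upside of a fragmented, orchestrated architecture.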
Conclusion
There might be instances where templates and pre-sets can work quite well and scale reasonably. Overall I believe we will see more fragmented, multi-vendor implementations. This will especially be the case with the proliferation of voicebots and other more complex implementations.
The objective is not to default to one of the two approaches, nor to decide which approach is right and which is wrong. Rather, the methodology should be correct from the start, as described in this article.