The Current Conversational AI & Chatbot Landscape

And How To Choose The Right Solution

Cobus Greyling
11 min read · Nov 22, 2021

Introduction

“We shape our tools and, thereafter, our tools shape us.” — John Culkin (1967)

Initially the technical and design decisions are easy. However, as the technology grows and the chatbot scales, those design and technical decisions become harder and loaded with ramifications. Hence careful initial consideration is necessary, especially if an investment is being made; otherwise a prototype/test approach can be followed.

Making the right technology decisions at the start of your chatbot journey has a significant influence on what your chatbot’s trajectory will be.

Choose and shape your tools wisely.

Later in the process, those tools will shape and influence the way you plan, develop and scale your chatbot.

Chatbot development tools and frameworks can be roughly divided into three categories.

Category 1

The open source, more technical NLP tools and chatbot development frameworks. Typically, these tools:

  • Can be installed anywhere
  • Have an open architecture
  • Are open source
  • Have no or limited GUI
  • Are configuration file and pro-code focused
  • Follow a machine learning approach
  • Have a higher barrier to entry
  • Scale well
  • Demand astute technical planning for installation & operational management
  • Are often used as underlying technology by Category 3 software
  • Allow new features to be developed and the platform enhanced

Category 2

  • Often used by large-scale commercial offerings
  • Cloud based. In some instances geographic regions can be selected
  • Seen as safe bets for large organizations
  • Solutions range from pro-code, low-code to no-code
  • Lower barrier to entry
  • GUI focused
  • Little to no insight or control as to what happens under the hood.
  • Rigid rule-based dialog state management
  • Cost is most often not negotiable

Category 3

  • These are independent alternatives for Conversational AI, providing an encapsulated product
  • The technology under the hood is often not made known
  • Independent, alternative solution providers
  • Frequently built using open-source NLP tools
  • Often innovative approaches to the challenges of Dialog State Design, development and management
  • Low-code to no-code approach
  • The possibility of being acquired
  • Price is often more negotiable
  • Feature requests are more likely to be accommodated
  • Lower barrier to entry and to get going

Fine-Tuning

In general, the aim of most chatbot development frameworks is to create an environment which allows medium-level technical people easy onboarding.

Performing only NLP allows for a simple data-in, data-out environment.

As a conversational agent grows and evolves, more complexity is introduced through elements like dialog management and maintaining context.

As chatbot development frameworks move from a no-code environment all the way up to native code (pro-code), the ability to fine-tune increases. In most cases the barrier to entry also increases.

Hence there needs to be flexibility, but also an interface to develop and manage the dialog state. The challenge is to have a natural and adaptive dialog which is also predictable and manageable.

The more no-code or low-code the solution becomes, the more the fine-tuning options diminish. The more fine-tuning, the more complexity.

There has been much talk about the low-code approach to software development, how it acts as a catalyst for rapid development, and how it acts as a vehicle for delivering solutions with minimal bespoke hand-coding.

Low-code interfaces are made available via a single tool or a collection of tools which are very graphic in nature and initially intuitive to use, thus delivering the guise of rapid onboarding and speeding up the process of delivering solutions to production.

As with many approaches of this nature, initially it seems like a very good idea. However, as functionality, complexity and scaling start playing a role, huge impediments are encountered.

When someone refers to the ability or the extent to which fine-tuning can be performed, what exactly are they referring to? In this section we are going to step through a few common elements which constitute fine-tuning.

  • Forms & Slots
  • Intents
  • Entities
  • Natural Language Generation (NLG)
  • Dialog Management
  • Digression
  • Disambiguation

General Trends In Category 1 & 2

For starters, there are six general chatbot trends emerging…

1️⃣ There has been growing activity in voice/speech interfaces, particularly access via a phone call, and not necessarily via a dedicated voice assistant device. IBM Watson Voice Agent was launched in 2018, but from March 2021 it is being deprecated and fully integrated into Watson Assistant as the newly released phone integration. Google Dialogflow CX and NVIDIA Riva were also launched.

2️⃣ Deprecation of intents. This is also referred to as end-to-end learning. Intent deprecation introduces more flexibility in terms of user inputs and matching those inputs to a dialog node. There is, however, a loss of fine-tuning ability, so how this will play out in practice remains to be seen. One scenario is that intent-less skills are built by business units rather than technical teams, and that these skills act as an extension of an existing assistant.

3️⃣ Intents and entities continue to merge, and contextual annotation of entities within the intent or utterance is becoming commonplace and very necessary. Compound entities are also becoming more important. The merging of intents and entities is a process where entities are tightly coupled with intents, resulting in an efficient feedback loop.

4️⃣ Data structures are being introduced to entities… This trend is visible with Rasa, the Alexa Conversations tool and especially Microsoft LUIS. Rasa calls it Entities Roles & Groups, AWS calls it Slots with Properties, and Microsoft LUIS calls them ML entities, which can be decomposed. Cisco MindMeld has also spent time on building entities out.
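
To make the idea of structured entities more concrete, here is a minimal sketch in plain Python. The `Entity` and `AnnotatedUtterance` classes, their field names and the example utterance are all hypothetical; they do not follow the schema of Rasa, Alexa or LUIS, and only illustrate how a role or group lets two entities of the same type be told apart within one annotated utterance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """One annotated span inside an utterance (hypothetical schema)."""
    text: str                    # the words the user typed
    type: str                    # entity type, e.g. "city"
    role: Optional[str] = None   # distinguishes same-type entities
    group: Optional[str] = None  # ties related entities together


@dataclass
class AnnotatedUtterance:
    """A training example where intent and entities are annotated together."""
    text: str
    intent: str
    entities: List[Entity] = field(default_factory=list)

    def by_role(self, role: str) -> Optional[Entity]:
        """Return the first entity carrying the given role, if any."""
        return next((e for e in self.entities if e.role == role), None)


# Made-up example: two "city" entities that only differ by role.
example = AnnotatedUtterance(
    text="Book a flight from Berlin to Lisbon",
    intent="book_flight",
    entities=[
        Entity(text="Berlin", type="city", role="departure"),
        Entity(text="Lisbon", type="city", role="destination"),
    ],
)

print(example.by_role("destination").text)
```

Without the role, both cities would simply be detected as `city`, and the dialog logic would have to guess which one is the destination.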

5️⃣ Edge installations are becoming more important… NVIDIA Riva and Rasa come to mind as install-anywhere options.

6️⃣ Deprecation of the State Machine is inevitable, and Rasa is leading the charge here. IBM is introducing automation to their Dialog Management system with customer effort scores and auto disambiguation menus; Watson Actions also needs to be mentioned. Most frameworks converge on ideas like intents, entities and dialog messaging, and follow similar approaches. When it comes to dialog state development and management, however, the approaches differ significantly. NVIDIA is working on Riva Studio, which will most probably include dialog state development, something which is not part of Riva now. The current Riva demos make use of Rasa and Google Dialogflow for dialog management.
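
For readers less familiar with the term, the following minimal sketch shows what a hard-coded dialog state machine can look like. The states, intents and transition table are invented for illustration and are not taken from any of the frameworks discussed here. The point is that every path has to be enumerated up front, which is the rigidity this trend moves away from.

```python
from typing import Dict, List, Tuple

# (current_state, detected_intent) -> next_state; all names are hypothetical.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "greet"): "ask_need",
    ("ask_need", "check_balance"): "ask_account",
    ("ask_account", "provide_account"): "show_balance",
    ("show_balance", "goodbye"): "end",
}

FALLBACK_STATE = "fallback"


def next_state(state: str, intent: str) -> str:
    """Look up the next dialog state; unplanned input falls through to a fallback."""
    return TRANSITIONS.get((state, intent), FALLBACK_STATE)


def run_dialog(intents: List[str]) -> List[str]:
    """Walk the state machine over a sequence of intents and record each state."""
    state = "start"
    visited = [state]
    for intent in intents:
        state = next_state(state, intent)
        visited.append(state)
        if state == FALLBACK_STATE:
            break  # a digression the designer did not anticipate
    return visited


# The planned "happy path" works as designed.
print(run_dialog(["greet", "check_balance", "provide_account", "goodbye"]))
# An unplanned turn immediately lands in the fallback state.
print(run_dialog(["greet", "goodbye"]))
```

Any user turn that is not in the table ends up in the fallback state, so the designer has to keep adding transitions by hand as conversations become more varied.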

Overview Of Development Environment

Environments are generally very similar in their approach to tools available for crafting a conversational interface.

Considering what’s available, chatbot development environments can still be segmented into 4 distinct groups for Categories 1 and 2 mentioned above.

These being:

  • Leading Commercial Cloud Offerings
  • NLU / NLP Tools (mostly open source)
  • The Avant-Garde & Edge
  • The Use-the-Cloud-You’re-In

Category 1: The Avant-Garde

Here Rasa really finds itself alone at the forefront. Recently, from a speech access perspective, NVIDIA Riva arrived on the scene. Riva does have two impediments. First, it requires access to an NVIDIA GPU based on their Turing or Volta architecture. Second, the Riva dialog development and management feature is under development and has not been released yet.

RASA

Rasa follows a unique path in wanting to deprecate the state machine with its hard-coded dialog flows/trees. Together with their Conversation Driven Design (CDD) in the form of Rasa-X, this is a very compelling option.

Their entities are contextually aware and they follow an approach where entities and intents really merge.

Compound entities are part of the offering. Entities can be segmented according to roles and groups.

Deprecation of intents has been announced and initiated.

Based on their expansion, funding, developer advocacy and events, this is a company to watch.

Hopefully the bigger players will emulate them. One of their strong points is developer advocacy and being the technology of choice for seed projects.

RASA has succeeded in creating a loyal developer following.

Category 1: NLU /NLP Tools

There are also tools (some of them open source) like Hugging Face, spaCy, Apache OpenNLP, Rasa NLU and others which can be used to process natural language in your environment.

Some organizations are creating their own chatbot framework making use of these tools.

This is the harder route and is more time consuming, but if you have an existing environment, augmenting it with natural language processing capability by making use of these tools is a viable option.

The power of most of these open source tools is truly astonishing. With the documentation available, they can serve as a “no software cost” point of departure for a first foray into natural language processing. It needs to be noted that in some cases enterprise costs exist.

Category 2: Leading Commercial Cloud Offering

The leading commercial cloud environments attract customers and users purely through their natural language processing prowess and presence, and their ease of use without installation and environment management.

Among these I count IBM Watson Assistant, Microsoft Bot Framework / Composer / LUIS / Virtual Agents, Google Dialogflow etc.

Established companies gravitate to these environments, at significant cost of course. These are seen as a safe bet, to meet their Conversational AI requirements.

They are seen as chatbot tool providers in and of themselves.

Scaling of any enterprise solution will not be an issue and continuous development and augmentation of the tools are a given. Resources abound with technical material, tutorials and more.

Category 2: Use-the-Cloud-You’re-In

I cannot help but feel Amazon Lex and Oracle Digital Assistant (ODA) find themselves in this group. My sense is that someone will not easily opt for ODA or Lex if they do not have an existing attachment with Oracle or AWS from a cloud perspective.

This is especially so if the existing attachment is Oracle Cloud or Oracle Mobile Cloud Enterprise, or with AWS via Echo & Alexa.

Another impediment with ODA is cost. Free access plays a huge role in developer adoption and the platform gaining that critical mass. We have seen this with IBM being very accessible in terms of their free tier with an abundance of functionality.

Microsoft has gone a long way toward more accessible tools, especially with developer environments. Rasa, even though a relatively late starter, has invested much time and effort in developer advocacy. Google Dialogflow is also popular and often a point of departure for companies exploring NLU and NLP.

ODA is not accessible enough and the existing impediments to experimenting and prototyping are not helping.

Cross-Industry Trends

  • Intent deprecation.
  • Intent Disambiguation with auto learning menus.
  • The merging of intents and entities
  • Deprecation of the State Machine. Or at least, towards a more conversational like interface.
  • Complex entities; introducing entities with properties, groups, roles etc.

Chatbot Growth In Capability

There is both horizontal and vertical growth in chatbot technology.

From the diagram above it is clear where this growth is taking place:

Vertical — Technology

The Conversational UI is moving away from a structured, preset menu and keyword driven interface, towards unstructured natural language input and longer conversational input. Users are allowed to disambiguate when two or three intents are close in score, and this is used as a mechanism for autolearning.
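
The disambiguation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular platform implements it: the function name `disambiguation_options`, the `margin` and `max_options` parameters, and the example scores are all hypothetical.

```python
from typing import Dict, List


def disambiguation_options(
    scores: Dict[str, float],
    margin: float = 0.1,
    max_options: int = 3,
) -> List[str]:
    """Return the intents to offer the user.

    A single-item list means the top intent is a clear winner; a longer
    list means the runners-up are within `margin` of the top score and a
    disambiguation menu should be shown.
    """
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_score = ranked[0][1]
    close = [name for name, score in ranked if top_score - score <= margin]
    return close[:max_options]


# Hypothetical confidence scores from an NLU model.
scores = {"check_balance": 0.46, "transfer_money": 0.41, "open_account": 0.08}
print(disambiguation_options(scores))
```

When more than one option is returned, the bot presents them as a menu. The option the user picks could then be stored alongside the original utterance as a labeled example, which is one way the autolearning mentioned above might be fed.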

Horizontal — User Experience

In this dimension the bot is transforming from a messaging bot to a truly conversational interface, moving away from click navigation to eventual unrestricted compound natural language.

The Digital Employee

The end-game is where the digital employee, emerging from the chatbot environment, has evolved into areas of text and speech.

With contextual awareness on four levels:

  • Within the Current Conversation
  • From Previous Conversations
  • From CRM & Other Customer/User Related Data Sources
  • Across different mediums

The digital employee will grow across different mediums and modalities, mastering languages with detection, translation, tone and sentiment, and automatically categorizing conversations.

Mediums will include devices like Google Home, Amazon Echo, traditional IVR and more. Just as humans can converse in text or voice, the digital employee will be able to converse in text or voice.

Chatbot Offerings Rating Matrix

In rating the nine chatbot solutions I looked at nine key points. Obviously NLU capability is key in terms of intents and entities. I was especially harsh on the extent to which entities can be applied in a compound fashion, annotated and detected contextually with decomposition.

Dialog and state development and management are also key points; ease of development is important, as is the extent to which collaboration is possible.

The other elements are self-explanatory.

Key to Ratings

For different organizations, different elements are important and will guide their thinking and eventually determine their judgement. For instance, even though Lex does not feature in many respects, if a company is steeped in AWS for other services, Lex might be the right choice.

The same goes for Oracle, MindMeld etc.
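
One way to apply a matrix like this to your own situation is to weight each criterion by how much it matters to your organization. The sketch below is only an illustration of that idea: the function `weighted_score`, the criterion names, the offerings "Framework A" and "Framework B" and all of the numbers are made-up placeholders, not ratings from this article.

```python
from typing import Dict


def weighted_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of criterion ratings; weights need not sum to 1."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(ratings.get(c, 0.0) * w for c, w in weights.items()) / total_weight


# Placeholder ratings on a 1-5 scale for two made-up offerings.
ratings = {
    "Framework A": {"nlu": 5, "dialog_tools": 4, "cloud_fit": 2},
    "Framework B": {"nlu": 3, "dialog_tools": 3, "cloud_fit": 4},
}

# An organization already committed to one cloud weights cloud fit heavily.
weights = {"nlu": 1, "dialog_tools": 1, "cloud_fit": 3}

ranking = sorted(
    ratings,
    key=lambda name: weighted_score(ratings[name], weights),
    reverse=True,
)
print(ranking)
```

With equal weights the first placeholder offering scores higher, but once cloud fit is weighted heavily the order flips, which mirrors the Lex example above.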

Chatbot Rating Matrix

Graphic Call Flow / Dialog Development Tools

For larger organizations and bigger teams, collaboration is important. Ease of sharing portions of the dialog and co-creating is paramount. Hence organizations have a need for graphic development environments. Other teams prefer a more flexible native code approach.

Rating of GUI Form Call Flow Development & Editing

IBM Watson Assistant made a big addition with the launch of Actions.

Rasa with their tool called Rasa-X is so unique that it is hard to accurately categorize with the other environments. Rasa-X is graphic, it allows for editing and development, but is far more comprehensive.

The Riva (formerly Jarvis) dialog development and management feature is under development and has not been released yet.

NLU

Natural Language Understanding Capability

Natural Language Understanding underpins the capabilities of the chatbot. Without entity detection and intent recognition all efforts to understand the user come to naught.

On some elements of a chatbot environment, improvisation can go a long way. This is not the case with NLU. LUIS has exceptional entity categorization and functionality, including decomposable entities. IBM Watson Assistant can also be counted as one of the leaders, along with Rasa and NVIDIA Riva.

I also looked at the integration of the NLU components into the other chatbot components. This is where Microsoft excels with their growing chatbot real-estate.

Scalability

Maturity of any framework is tested in an enterprise environment where implementations with diverse use-cases and ever expanding scale are present.

Scalability & Enterprise Readiness

Enterprise readiness is an evaluation criterion which does not enjoy the attention it deserves. By the time vulnerabilities are detected, too much money and time have often already been invested in the technology.

Conclusion

This is a mere overview based on a matrix with points of assessment I personally deem important.

And again, how important a particular point on the matrix is to you or your organization will influence your judgement.

In the final analysis the software is to serve a purpose in your organization and current cloud landscape. The offering best suited for that purpose is the best choice for you.

Cobus Greyling

I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI. www.cobusgreyling.com