Updated: The Current Conversational AI & Chatbot Landscape

And The Right Approach To Selecting A Solution

Cobus Greyling

--

Introduction

“We shape our tools and, thereafter, our tools shape us.” — John Culkin (1967)

Initially, the technical and design decisions are easy. However, as demands on the technology grow and the chatbot scales, those design and technical decisions become harder and loaded with ramifications. Hence careful initial consideration is necessary, especially if an investment is being made; otherwise a prototype/test approach can be followed.

Making astute technology decisions at the inception of your chatbot journey has a significant impact on what your chatbot’s trajectory will be.

Choose and shape your tools wisely.

Because, later in the process, those tools will shape and influence the way you plan, develop and scale your chatbot. Impediments are usually system or framework related.

Chatbot development tools and frameworks can be divided into four categories, roughly…

Category 1

The open-source, more technical NLP tools and chatbot development frameworks. Typically, these tools:

  • Can be installed anywhere
  • Have an open architecture
  • Are open source
  • Offer no or only a limited GUI
  • Are configuration-file and pro-code focused
  • Follow a machine learning approach
  • Have a higher barrier to entry
  • Scale well
  • Demand astute technical planning for installation and operational management
  • Are often used as underlying enabling technology by Category 3 software
  • Allow new features to be developed and the platform to be enhanced

Category 2

  • Often used by large-scale commercial offerings
  • Cloud based
  • In some instances specific geographic regions can be selected
  • Seen as safe bets for large organizations
  • Solutions range from pro-code, low-code to no-code
  • Lower barrier to entry
  • GUI focused
  • Little to no insight or control as to what happens under the hood
  • Little to no user influence on the product roadmap
  • Rigid rule-based dialog state management
  • Cost is most often not negotiable
  • Collaboration and group-design-development focused

Category 3

  • These are independent alternatives for Conversational AI, providing an encapsulated product
  • The enabling technology under the hood is often not made known
  • Independent, alternative solution providers
  • Frequently built using open-source NLP tools
  • Innovative approaches are often followed to the challenges of dialog state design, development and management
  • Low-code to no-code approach
  • The possibility exists of these companies being acquired
  • Price is often more negotiable
  • Feature requests are more likely to be accommodated
  • Lower barrier to entry and to getting going

Category 4

  • Natural Language Processing and Understanding tools
  • Text or conversations can be analyzed for intent, named entities and custom-defined entities
  • Tasks like summarization, keyword extraction, language detection etc. can often be performed
  • Data annotation and training-data improvement GUI tools are available in some cases
  • Tools for managing training data are also included
  • Easily accessible, but with a higher technical barrier to entry
  • Ideal for an initial NLP pass on user input prior to NLU
  • Not a chatbot development framework
  • Does not include features like dialog state management, chatbot response management etc.
  • Focused on wider language processing implementations and not just conversational agents
  • Often used for non-real-time, offline conversational text processing
  • Often used as underlying technology by Category 3 software

Overview Of Development Environment

Environments are generally very similar in their approach to tools available for crafting a conversational interface.

Considering what’s available, chatbot development environments can still be segmented into 4 distinct groups for Categories 1 and 2 mentioned above.

These being:

  • Leading Commercial Cloud Offerings
  • NLU / NLP Tools (mostly open source)
  • The Avant-Garde & Edge
  • The Use-the-Cloud-You’re-In

“Our Age of Anxiety is, in great part, the result of trying to do today’s jobs with yesterday’s tools!” ― Marshall McLuhan

Category 1

MindMeld finds itself in the same fold as Rasa in terms of being a complete chatbot development framework which can be installed anywhere.

MindMeld was founded in 2011 and acquired by Cisco in 2017. Subsequently, Cisco announced that it was open-sourcing the MindMeld Conversational AI platform.

The MindMeld command line interface.

There is quite a bit of activity on the MindMeld GitHub. However, MindMeld 4.3 was released in July 2020.

The last package was released in October 2020…there used to be a cadence of two to three packages per year. Seemingly, releases and packages are slowing down.

Installing MindMeld on an Ubuntu instance is straightforward, and an array of blueprint example applications is available.

The basic structure of a Cisco MindMeld Conversational AI application. Apps can be seen as an Assistant, Domains as skills. Followed by Intents and Entities.

MindMeld is a Python-based machine learning framework for Conversational AI. Open-source libraries used include TensorFlow, scikit-learn and NumPy.

Elasticsearch is used to power the Question and Answer portion for MindMeld.

Data can be structured in a JSON format, and made searchable by making use of Elasticsearch. This acts as a knowledge base resource available within MindMeld.
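As an illustration of such a knowledge-base document (a minimal sketch — the field names below are hypothetical, not MindMeld's actual schema):

```python
import json

# Hypothetical knowledge-base entry; field names are illustrative only.
store = {
    "id": "store_352",
    "name": "Main Street Store",
    "address": "742 Main Street",
    "open_time": "08:00",
    "close_time": "18:00",
}

# Serialized like this, documents can be loaded into Elasticsearch
# and queried as a knowledge-base resource from within MindMeld.
print(json.dumps(store))
```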

Rasa follows a unique path in terms of wanting to deprecate the state machine with its hard-coded dialog flows/trees.

RASA

Together with their Conversation Driven Design (CDD) in the form of Rasa-X this is a very compelling option.

Their entities are contextually aware and they follow an approach where entities and intents really merge.

Compound entities are part of the offering. Entities can be segmented according to roles and groups.

Deprecation of intents has been announced and initiated.

Based on their expansion, funding, developer advocacy and events, this is a company to watch.

Hopefully the bigger players will emulate them. One of their strong points is developer advocacy and being the technology of choice for seed projects.

RASA has succeeded in creating a loyal developer following.

On 28 July 2021, NVIDIA Jarvis was rebranded to NVIDIA Riva. I always thought the name Jarvis was already too widely used. The good news is that the core technologies, performance and roadmap remain unchanged.

NVIDIA Riva is an application framework for Multimodal Conversational AI.

NVIDIA Riva is a GPU-accelerated SDK for developing multimodal conversational AI applications.

According to NVIDIA, the only changes from a user perspective:

  • The name “Jarvis” has been replaced with “Riva” in APIs, NGC containers, and other developer resources.
  • Older APIs and applications that use the term “Jarvis” will continue to work, but these APIs will be deprecated in favor of the new APIs. Hence a migration strategy needs to be thought through.
  • Performance achievements and optimizations remain unchanged with this change.
  • A new version of Transfer Learning Toolkit that uses Riva in place of Jarvis APIs should be available soon.

The focus is on low latency (less than 300 milliseconds) and meeting high performance demands.

It is a high-performance conversational AI solution incorporating speech and visual cues, often referred to as face-speed. Face-speed includes gaze detection, lip activity etc.

The multimodal aspect of Riva is best understood in the context of where NVIDIA wants to take Riva in terms of functionality.

This includes:

  • ASR (Automatic Speech Recognition) / STT (Speech To Text)
  • NLU (Natural Language Understanding)
  • Gesture Recognition
  • Lip Activity Detection
  • Object Detection
  • Gaze Detection
  • Sentiment Detection

Again, what is exciting about this collection of functionality, is that Riva is poised to become a true Conversational Agent.

DeepPavlov finds itself definitely at the higher end of the spectrum; being a native/pro-code framework with a machine learning approach.

DeepPavlov refers to a Semantic Frame. This includes Natural Language Understanding, encompassing Domain Detection, Intent and Entities.

In the DeepPavlov world, a digital agent is constituted by a collection of skills, managed by a Skill Manager.

A skill is made up of different Components.
  • A skill fulfills the user goal in a particular domain.
  • A Model is any NLP model that doesn’t necessarily communicate with the user in natural language.
  • Components are reusable functional parts of a model or skill.
  • There are rule-based models and ML models.
  • ML models can be trained independently and, in an end-to-end mode, be joined in a chain.
  • The Skill Manager selects the correct skill to generate the response.
  • A chainer builds a model pipeline from heterogeneous components (rule-based/ML/DL).

Category 2

The leading commercial cloud environments attract customers and users to them purely for their:

  • natural language processing prowess and presence,
  • ease of use without installation and
  • environment management.

Among these I count IBM Watson Assistant, Microsoft Bot Framework / Composer / LUIS / Virtual Agents, Google Dialogflow etc.

Established companies gravitate to these environments, at significant cost of course. These are seen as a safe bet, to meet their Conversational AI requirements.

They are seen as chatbot tool providers in and of themselves.

Scaling of any enterprise solution will not be an issue and continuous development and augmentation of the tools are a given. Resources abound with technical material, tutorials and more.

I cannot help but feel Amazon Lex with Oracle Digital Assistant (ODA) find themselves in this group. My sense is that someone will not easily opt for ODA or Lex if they do not have an existing attachment with Oracle or AWS from a cloud perspective.

Especially if the existing attachment is Oracle Cloud or Oracle Mobile Cloud Enterprise. Or with AWS via Echo & Alexa.

Another impediment with ODA is cost. Free access plays a huge role in developer adoption and the platform gaining that critical mass. We have seen this with IBM being very accessible in terms of their free tier with an abundance of functionality.

Microsoft has gone a long way in more accessible tools, especially with developer environments. Google Dialogflow is also popular and often a point of departure for companies exploring NLU and NLP.

Category 4

🤗 HuggingFace…If a company can lower the barrier to entry for AI in general, and Conversational AI in particular, there is bound to be interest. Ease of initial access needs to be two-fold:

  • Technical and
  • Cost.

Obviously also whilst presenting a compelling value proposition.

Being able to access and experiment with software via Jupyter Notebooks at no cost, and without demanding too much technical knowledge, is important for creating critical mass in adoption.

Simple example of sentiment analysis on a sentence.
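That example can be sketched in a few lines (a minimal sketch; this assumes the transformers library is installed, and a default English sentiment model is downloaded on first use):

```python
# pip install transformers
from transformers import pipeline

# Build a ready-to-use sentiment-analysis pipeline with a default model.
classifier = pipeline("sentiment-analysis")

# The pipeline returns a list with one dict per input sentence.
result = classifier("I love how easy this library is to use.")[0]
print(result["label"], round(result["score"], 3))
```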

This is why 🤗 HuggingFace is thriving with its easily accessible, open-source library for a number of natural language processing tasks.

There are striking similarities in the NLP functionality of GPT-3 and 🤗 HuggingFace, with the latter obviously leading in the areas of functionality, flexibility and fine-tuning.

Named Entity Recognition using the NER pipeline.
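The NER pipeline can be sketched similarly (again assuming the transformers library; the "simple" aggregation strategy merges word-piece tokens back into whole entities):

```python
from transformers import pipeline

# Named Entity Recognition with a default pretrained model.
ner = pipeline("ner", aggregation_strategy="simple")

entities = ner("Hugging Face is based in New York City.")
for entity in entities:
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```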

Pretrained models for Natural Language Understanding (NLU) tasks allow for rapid prototyping and instant functionality. Transfer learning is a technique to train a machine learning model for a task by using knowledge from another task.

🤗 HuggingFace is democratizing NLP; this is being achieved by acting as a catalyst and making research-level work in NLP accessible to mere mortals.

It is important to understand that 🤗 HuggingFace is a Natural Language Processing problem-solving company, and not a chatbot development framework company per se.

Their pipelines and models can be used to augment a chatbot framework to perform various tasks. However, elements like the operational implementation and management of intents and entities are not part of their ambit, nor are dialog development and management.

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.

Natural Language Processing and Understanding can be light weight and easy to implement. It is within anyone’s grasp to create some Python code to process natural language input, and expose it as an API.

The simplest example of a spaCy implementation.

spaCy is an NLP tool and not a chatbot development framework.

It does not cater for dialog scripts, NLG, dialog state management etc.

But what makes spaCy all the more interesting is that it can be implemented as a language-processing API assisting an existing chatbot implementation. Especially in instances where users submit longer input, the chatbot will do well to work with only a specific span or tokens from the utterance.

Also, it can be used for offline post-processing of user conversations.

Positives

  • Quick and easy to start prototyping with.
  • Excellent documentation and tutorials
  • Custom models can be trained.
  • Good resource to serve as an introduction to NLP.
  • Good avenue to familiarize yourself with basic NLP concepts.

Considerations

  • Large sets of data are required for training custom models.
  • Complex implementations can become very technical.
  • Minor languages might pose a challenge.

Rasa NLU API…There is a commonly held belief that when it comes to Natural Language Processing (NLP) you are at the mercy of the giants of cloud computing, these giants being IBM, AWS, Microsoft and Google, to name a few.

The good news is, you are not!

Another challenge when it comes to NLP is that organizations often do not want their data to cross country borders, or to vest in some commercial cloud environment where it is hard to enforce laws pertaining to the protection of personal information.

There are a few chatbot platforms which have a clear separation between the NLU portion and the dialog management and integration portions. This allows for the development of a stand-alone NLU API.

Rasa chatbot architecture with NLU portion marked.

The Rasa architecture gives you the opportunity to have a NLU API which can also be used for natural language understanding tasks not related to live conversations. This includes conversations archived on email, live agent conversations etc.

With a single command the API is launched on port 5005. But first, make sure you have activated your anaconda virtual environment with:

conda activate rasa2

My virtual environment is called rasa2.

Run the API:

rasa run --enable-api -m models/nlu-20200917-225530.tar.gz

You can access the API on the URL

http://localhost:5005/model/parse

Interact with your API via a client like Postman.
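The same request Postman sends can be made from Python's standard library (a sketch; the endpoint is taken from the URL above, and it assumes the Rasa server started with `rasa run` is listening on port 5005):

```python
import json
import urllib.request

RASA_NLU_URL = "http://localhost:5005/model/parse"

def build_payload(text):
    """JSON body the /model/parse endpoint expects."""
    return json.dumps({"text": text}).encode("utf-8")

def parse_utterance(text, url=RASA_NLU_URL):
    """POST an utterance to the Rasa NLU API and return the parse result."""
    request = urllib.request.Request(
        url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The response carries the recognized intent and any entities.
        return json.loads(response.read())
```

Calling `parse_utterance("I want to book a flight")` against a running server returns a JSON object with the recognized intent and an entities list.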

Sending a JSON query string to the Rasa API

Largely Category 3 Offering Rating Matrix

In rating the nine chatbot solutions I looked at nine key points. Obviously NLU capability is key in terms of intents and entities. I was especially harsh on the extent to which entities can be applied in a compound fashion, annotated and detected contextually with decomposition.

Dialog and state development and management are also key points; ease of development is important, as is the extent to which collaboration is possible.

The other elements are self-explanatory.

Key to Ratings

For different organizations, disparate elements are important and will guide their thinking and eventually determine their judgement. For instance, even though Lex does not feature in many respects, if a company is steeped in AWS for other services, Lex might be the right choice.

The same goes for Oracle, Cisco / MindMeld etc.

Chatbot Rating Matrix

Graphic Call Flow / Dialog Development Tools

For larger organizations and bigger teams, collaboration is important. Ease of sharing portions of the dialog and co-creating is paramount. Hence organizations have a need for graphic development environments. Other teams prefer a more flexible native code approach.

Rating of GUI Form Call Flow Development & Editing

IBM Watson Assistant made a big addition with the launch of Actions.

Rasa with their tool called Rasa-X is so unique that it is hard to accurately categorize with the other environments. Rasa-X is graphic; it allows for editing and development, but is far more comprehensive.

The Riva dialog development and management feature is under development and has not been released yet.

Natural Language Understanding

Natural Language Understanding Capability

Natural Language Understanding underpins the capabilities of the chatbot. Without entity detection and intent recognition all efforts to understand the user come to naught.

On some elements of a chatbot environment, improvisation can go a long way. This is not the case with NLU. LUIS has exceptional entity categorization and functionality. This includes decomposable entities. IBM Watson Assistant can also be counted as one of the leaders, with RASA & NVIDIA Riva.

I also looked at the integration of the NLU components into the other chatbot components. This is where Microsoft excels with its growing chatbot real estate.

Scalability

Maturity of any framework is tested in an enterprise environment where implementations with diverse use-cases and ever expanding scale are present.

Scalability & Enterprise Readiness

Enterprise readiness is an evaluation criterion which does not enjoy the attention it deserves. Once vulnerabilities are detected, too much money and time have already been invested in the technology.

Conclusion

This is a mere overview based on a matrix with points of assessment I personally deem as important.

And again, how important a particular point on the matrix is to you or your organization will influence your judgement.

In the final analysis the software is to serve a purpose in your organization and current cloud landscape. The offering best suited for that purpose is the best choice for you.
