Conversational AI Explained In The Simplest Terms
In this article I consider the three basic technologies underpinning Conversational AI, together with the basic components of all chatbot development frameworks. Lastly, I look at how Large Language Models (LLMs) can find their way into chatbot development.
Introduction
Sometimes it makes sense to distill functionality down to its simplest form. Below you will find the three main elements of a conversational AI system, followed by how all current chatbot development frameworks are constituted.
Lastly, I look at the role Large Language Models (LLMs) will play. I don’t see the introduction of LLMs as a sudden revolution. However, I do see how chatbot functionality will be offloaded to LLMs, as explained below.
Conversational AI Basics
The diagram below defines the three main elements of conversational AI:
- Understanding
- Reasoning
- Generation
Understanding
Understanding is currently addressed with technologies like NLU and NLP. These technologies are used to extract meaning and intent from user utterances. Intent detection is the process of assigning user utterances to a predefined intent.
Once the intent is detected, the next step is to extract entities. Intents can be seen as verbs, and entities as nouns. Named entities can help with the automation of entity detection.
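To make intent detection concrete, here is a minimal sketch that assigns an utterance to a predefined intent by counting keyword overlap. The intent names, keywords and the `out_of_scope` fallback are assumptions made purely for illustration; a production NLU engine would use a trained classifier rather than keyword matching.

```python
import re

# Hypothetical intents and keywords, invented for this illustration.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "account", "statement"},
    "transfer_money": {"transfer", "send", "pay"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def detect_intent(utterance, fallback="out_of_scope"):
    """Return the predefined intent whose keywords overlap most with the utterance."""
    tokens = tokenize(utterance)
    scores = {
        intent: len(tokens & keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no keyword matched at all, fall back instead of guessing.
    return best if scores[best] > 0 else fallback


print(detect_intent("What is my account balance?"))
print(detect_intent("Tell me a joke"))
```

The fallback matters in practice: an utterance that matches no predefined intent should be routed to a fallback rather than forced into the nearest intent.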
Reasoning
Reasoning is the ability to predict and also learn the next conversational step based on situational and contextual awareness.
Reasoning is currently fulfilled primarily by a dialog tree, which is in essence a state machine. Each state (flow node or conversational node) has one or more conditions that determine which connected flow node the conversation should progress to.
It sounds counterintuitive for a machine learning / artificial intelligence solution to still rely on a complex state machine driven by conditions.
So we might call it reasoning, but reasoning and generation are very static and pre-defined in the vast majority of instances.
Large Language Models (LLMs) are addressing reasoning and generation, but fine-tuning is still not as granular as it needs to be.
Other impediments are scaling, and predictability from a compliance and UX perspective.
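The dialog tree described in this section can be sketched as a small state machine. The node names, the context keys (`intent`, `entities`) and the conditions below are hypothetical. They only illustrate how each flow node carries conditions that decide the next node, and do not reflect any particular framework.

```python
# Each flow node maps to a list of (condition, next_node) pairs.
# The first condition that holds decides where the conversation goes.
# All node names and context keys are assumptions for this sketch.
DIALOG_TREE = {
    "start": [
        (lambda ctx: ctx["intent"] == "transfer_money", "ask_amount"),
        (lambda ctx: ctx["intent"] == "check_balance", "show_balance"),
    ],
    "ask_amount": [
        (lambda ctx: "amount" in ctx["entities"], "confirm_transfer"),
        (lambda ctx: True, "ask_amount"),  # keep asking until an amount is given
    ],
    "confirm_transfer": [],  # terminal node
    "show_balance": [],      # terminal node
}


def next_node(current, ctx):
    """Return the next flow node given the current node and conversation context."""
    for condition, target in DIALOG_TREE[current]:
        if condition(ctx):
            return target
    return current  # no condition matched, so stay on the current node


ctx = {"intent": "transfer_money", "entities": {}}
node = next_node("start", ctx)
print(node)
ctx["entities"]["amount"] = 50
print(next_node(node, ctx))
```

Every transition here is written by hand, which is why this kind of reasoning is static and pre-defined, and why it becomes hard to maintain as the number of nodes grows.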
Generation
Generation in this case refers to Natural Language Generation (NLG). Virtually all chatbots have a set of pre-defined messages which are presented to the user at a specific point in time.
There are different approaches to managing this text. Some frameworks have a message abstraction layer where the messages can be viewed and managed. Other frameworks allow conditions to be set on a message. For instance, a high-consequence message can be marked to always prompt for a confirmation.
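A message abstraction layer with a per-message condition could look roughly like the sketch below. The message IDs, the `needs_confirmation` flag and the confirmation wording are assumptions for illustration, not taken from any specific framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    needs_confirmation: bool = False  # hypothetical flag for high-consequence messages


# All bot messages live in one place, separate from the dialog flow,
# so they can be viewed and edited without touching the flow logic.
MESSAGES = {
    "greeting": Message("Hello, how can I help you today?"),
    "confirm_transfer": Message(
        "You are about to transfer {amount}.", needs_confirmation=True
    ),
}


def render(message_id, **slots):
    """Fill the message template and append a confirmation prompt when required."""
    message = MESSAGES[message_id]
    text = message.text.format(**slots)
    if message.needs_confirmation:
        text += " Do you want to continue? (yes/no)"
    return text


print(render("greeting"))
print(render("confirm_transfer", amount="$50"))
```

Keeping the text in a catalog like this is what makes the messages static: every response the bot can give has to be written in advance.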
Current Chatbot Development Frameworks
1️⃣ Intents
Let’s consider intents differently. The Google search engine can be seen as a single dialog-turn chatbot. The main aim of Google is to determine your intent, and then return relevant information based on the discovered intent. The way we search has inadvertently changed: we no longer search using keywords, but in natural language and full sentences.
Google is the biggest intent discovery machine in the world!
When developing a chatbot, existing customer conversations are a great source of information for compiling a list of possible user intents.
2️⃣ Entities
Entities can be seen as nouns.
Entities are the pieces of information (nouns) in the user input that are relevant to the user’s intentions.
Recognising entities in the user’s input helps you to craft more useful, targeted responses.
Rasa, for example, introduces structure to entities by defining entity types.
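The idea of typed entities can be illustrated independently of any framework. The sketch below is not Rasa's training-data format or API. It is a plain Python approximation in which each hypothetical entity type (`amount`, `account_type`) is defined by a regular expression.

```python
import re

# Hypothetical entity types, each defined by a simple pattern.
ENTITY_PATTERNS = {
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
    "account_type": re.compile(r"\b(?:savings|checking|credit)\b", re.IGNORECASE),
}


def extract_entities(utterance):
    """Return every match in the utterance, labeled with its entity type."""
    found = []
    for entity_type, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(utterance):
            found.append({"type": entity_type, "value": match.group(0)})
    return found


print(extract_entities("Move $50 from my savings account"))
```

Labeling each value with a type is what lets the dialog flow ask for exactly the piece of information that is still missing.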
3️⃣ Dialog Flow
The process of developing the dialog flow can be one of the most tedious and laborious tasks in creating a chatbot. It can become complex, and changes made in one area can inadvertently impact another area. A lack of consistency can also lead to unplanned user experiences. Scaling this environment is tricky, especially if you want to scale across a large organisation.
4️⃣ Script
The importance of the script is that it informs the user of what the next step is, or what options are available at a particular point in the conversation. It can also be used to manage user expectations. A breakdown in the conversation is often due to the dialog not being accurate and intelligible.
Large Language Models (LLMs)
In the media, LLMs are often referred to as a monolith of immensely large language sets which will magically address many language technology problems.
However, the functionality of LLMs can be divided into five main categories. These categories can be helpful in finding the specific language technology application for the LLM.
Below you will see the functionality of Large Language Models grouped into 5 areas…
1️⃣ Clustering
The clustering of utterances and sentences is analogous to intent detection, but in an unsupervised and automated fashion. Sentences, user utterances or conversations can be clustered so that each cluster contains semantically similar sentences. An example of this is the HumanFirst & Cohere POC work.
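As a rough illustration of unsupervised clustering, the sketch below groups utterances by similarity without any predefined intents. It makes a significant simplification: word-count vectors and cosine similarity stand in for the semantic embeddings an LLM would provide, and the greedy grouping and the `threshold` value are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(utterances, threshold=0.5):
    """Greedily add each utterance to the first cluster whose first member is similar enough."""
    clusters = []
    for utterance in utterances:
        vector = vectorize(utterance)
        for members in clusters:
            if cosine(vector, vectorize(members[0])) >= threshold:
                members.append(utterance)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([utterance])
    return clusters


for group in cluster([
    "check my balance",
    "check my account balance",
    "send money to anna",
    "send money to bob",
]):
    print(group)
```

Each resulting cluster can then be reviewed and named by a person, which turns raw conversation logs into candidate intents.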
2️⃣ Dialog Management / State Management
GODEL and Blender Bot are exploring avenues in managing conversations and determining the most probable next dialog turn. (Technology: GODEL (Microsoft), DialoGPT (Microsoft), Blender Bot (Meta AI))
3️⃣ Generation (Including Natural Language Generation)
Generation not only generates bot messages and responses, but maintains bot state, contextual awareness and session context. For examples of this, see BLOOM, Goose AI, EleutherAI, OpenAI, Cohere, AI21Labs.
4️⃣ Question & Answer
Question & Answer is being addressed by KI-NLP (Knowledge Intensive NLP). Broad domain and general questions can be answered, without querying an API or leveraging a traditional knowledge base. Technologies here are Sphere (Meta AI), Commercial Search Engines, Wikipedia, etc.
5️⃣ Language Translation
Language Translation is available on various platforms; Meta AI NLLB is the most notable.