Four Emerging Approaches To Chatbot Intent Deprecation
Looking At The Approaches Of Four Chatbot Platforms
Introduction
Traditionally, each and every conceivable user input needs to be assigned to an intent. During transcript review, if user input does not neatly match an existing intent, a new one has to be invented.
This strait-laced layer of categorizing a user utterance according to an intent is rigid and inflexible: a fixed set of categories ends up managing the conversation.
Hence, within a chatbot, the first line of conversation facilitation is intent recognition.
And herein lies the challenge: in most chatbot platforms a machine learning model of sorts is used to assign a user utterance to a specific intent.
From here the intent is tied to a specific point in the state machine (aka the dialog tree). As you can see from the sequence below, the user input “I am thinking of buying a dog.” is matched to the intent Buy Dog, and from here the intents are hardcoded to dialog entry points.
Below you see a single dialog node from IBM Watson Assistant, where the heading says: “If assistant recognizes”. Under this heading a very static and fixed condition, or set of conditions, can be defined.
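As a rough illustration of this intent layer, here is a minimal sketch using the Watson Assistant v2 Python SDK (ibm-watson) to send an utterance and inspect which intent the model assigned to it. The API key, service URL, assistant ID and intent names are placeholders, not values from a real skill.

# Minimal sketch of the classic intent layer, assuming the ibm-watson Python SDK.
# Credentials, service URL and assistant ID are placeholders.
from ibm_watson import AssistantV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("YOUR_API_KEY")
assistant = AssistantV2(version="2021-06-14", authenticator=authenticator)
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")

# Stateless message call: the utterance goes in, the recognized intents come back.
response = assistant.message_stateless(
    assistant_id="YOUR_ASSISTANT_ID",
    input={
        "message_type": "text",
        "text": "I am thinking of buying a dog.",
        "options": {"alternate_intents": True},
    },
).get_result()

# Each candidate intent carries a confidence score; the dialog node whose
# condition references the top intent (for example #Buy_Dog) becomes the entry point.
for intent in response["output"]["intents"]:
    print(intent["intent"], intent["confidence"])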
Why Is This A Problem?
The list of intents is a fixed, hardcoded and predefined reference within a chatbot. Any conceivable user input needs to be anticipated and mapped to a single intent.
Again, the list of intents is rigid and fixed, and, as mentioned earlier, each intent is linked to a portion of the pre-defined dialog. There might be a machine learning model which decides which intent fits the utterance best, but the list of intents itself is preset.
But what if the layer of intents could be deprecated and user input mapped directly to the dialog?
This development is crucial in order to move from a messaging bot to a conversational AI interface.
This layer of intents is also a layer of translation which muddies the conversational waters.
How would it look if intents were optional and could be bypassed, with user input directly mapped to a user story? In other words, end-to-end training, with user utterances mapped directly to the dialog.
Here I look at the approaches followed by:
- IBM Watson Assistant with Actions
- Amazon Alexa Conversations
- Microsoft Power Virtual Agent
- Rasa End-To-End Training
IBM Watson Assistant Actions
An action is activated or assigned to the conversation when the assistant receives a customer message that it has learned to recognize.
You could think of these as the phrases that define the action, and they really encapsulate the idea of an intent.
Training examples
With each example utterance, your assistant learns when this is the right action for what a customer wants.
You enter phrases a person might use to express their goal.
The approach taken with Actions is an extremely non-technical one.
The interface is intuitive and requires virtually no prior development knowledge or training.
User input variables (entities) are picked up automatically and given a descriptive reference.
Conversational steps can be re-arranged and moved freely to update the flow of the dialog.
Updates are saved automatically, and machine learning takes place in the background.
And the application (action) can be tested in a preview pane.
There is something about Actions which reminds me of Microsoft’s Power Virtual Agent interface.
The same general idea is there, but with Watson the interface is simpler and more minimalistic.
And perhaps more of a natural extension of the current functionality.
- You can think of an action as an encapsulation of an intent. Or the fulfillment of an intent.
- An action is a single conversation to fulfill an intent and capture the entities.
- A single action is not intended to stretch across multiple intents or be a horizontally focused conversation.
- Think of an action as a narrow vertical and very specific conversation.
Below you see a single Actions skill called BankBalance with two actions listed under it.
How To Use Actions
Firstly, Actions should be seen as another type of skill, complementing the two existing skill types:
- dialog skills and
- search skills.
Actions must not be seen as a replacement for dialogs.
Secondly, actions can be used as a standalone implementation for very simple applications. Such simple implementations may include customer satisfaction surveys, customer or user registration etc. Short and specific conversations.
Thirdly, and most importantly, actions can be used as a plugin or supporting element to dialog skills.
Of course, your assistant can run 100% on Actions, but this is highly unlikely or at least not advisable.
The best implementation scenario is where the backbone of your assistant is constituted by one or more dialog skills, and Actions, together with something like a search skill, are used to enhance certain functionality within the dialog.
This approach can allow business units to develop their own actions, due to the friendly interface. Subsequently, these Actions can then be plugged into a dialog.
This approach is convenient if you have a module which changes on a regular basis, but you want to minimize impact on a complex dialog environment.
Within a dialog node, a specific action that is linked to the same Assistant as this dialog skill can be invoked. The dialog skill is paused until the action is completed.
An action can also be seen as a module which can be used and reused from multiple dialog threads.
When adding actions to a dialog skill, consideration needs to be given to the invocation priority.
If you add only an actions skill to the assistant, the action skill starts the conversation. If you add both a dialog skill and actions skill to an assistant, the dialog skill starts the conversation. And actions are recognized only if you configure the dialog skill to call them.
Fourthly, if you are looking for a tool to develop prototypes, demos or proof of concepts, Actions can stand you in good stead.
Mention needs to be made of the built-in constrained user input, where options are presented. Creating a more structured input supports the capabilities of Actions.
Disambiguation between Actions within an Actions skill is possible and can be toggled on or off. This is a very handy piece of functionality. It should address intent conflicts to a large extent.
System actions are available and these are bound to grow.
How NOT To Use Actions
It does not seem sensible to build a complete digital assistant/chatbot with Actions, or at least not as a standalone conversational interface. There is the allure of rapid initial progress and having something to show. However, there are a few problems you are bound to encounter.
Conversations within an action are segmented or grouped according to user utterance types. Should there be conflicts or overlaps, inconsistencies can be introduced to the chatbot.
Entity management is not as strong within Actions as it is with Dialog skills. Collection of entities with a slot filling approach is fine.
But for more advanced conversations, where entities need to be defined and detected contextually, Actions will not suffice. Compound entities per user utterance will also pose a challenge.
Compound intents, or multiple intents per user utterance, are problematic.
If you are used to implementing conversational digression, Actions will not suffice.
Positives
- Conversational topics can be addressed in a modular fashion.
- Conversational steps can be dynamically ordered by drag and drop.
- Collaboration
- Variable management is easy and conversational from a design perspective.
- Conditions can be set.
- Complexity is masked and simplicity is surfaced.
- Design and Development are combined.
- Integration with current solutions and developed products
- Formatting of conversational presentation.
Negatives
- If used in isolation, scaling impediments will be encountered.
- Still a state machine approach.
- Linear design interface.
Amazon Alexa Conversations
You provide Alexa with a set of dialogs to demonstrate the functionalities required for the skill.
AC builds a statistical model which interprets customer inputs and predicts the best response.
Alexa Conversations has a similar option for defining entities, though not as complete and comprehensive as LUIS. Within Conversations you can define entities, which Amazon refers to as Slots.
The aim during the conversation is to fill these slots (entities). Within Conversations you can create a slot with multiple properties attached to it. These properties can be seen as sub-slots or sub-categories which together constitute the higher-order entity.
Alexa Conversations introduces a new slot type, custom with properties (PCS).
This constitutes a collection of slots which are hierarchical, and it can be used to pass structured data between build-time components such as API Definitions and Response Templates.
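To make the idea of a compound slot concrete, here is a purely conceptual Python sketch; it is not the Alexa Conversations schema itself, and the slot name and properties are invented for illustration.

# Conceptual sketch only: a compound slot as a higher-order entity made up of
# sub-properties, in the spirit of Alexa Conversations' custom slot types with
# properties (PCS). Names and structure are illustrative, not the Alexa schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CityDate:
    """A hypothetical compound slot passed between build-time components."""
    city: str                          # sub-slot / property
    date: str                          # sub-slot / property
    time_of_day: Optional[str] = None  # optional property

# The compound slot can be decomposed into its constituent properties when an
# API definition or response template only needs part of the structure.
trip = CityDate(city="Cape Town", date="2021-03-01")
print(trip.city, trip.date)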
Deprecation Of Rigid State Machine Dialog Management
Deprecating the state machine for dialog management demands a more abstract approach; many are not comfortable with relinquishing control to an AI model.
The aim of Alexa Conversations (AC) is to furnish developers with the tools to build a more natural-feeling Alexa skill with fewer lines of code. AC is an AI-driven approach to dialog management that enables the creation of skills that users can interact with in a natural, unconstrained manner.
This AI-driven approach is more abstract, but also more conversation-driven from a development perspective. Sample dialogs are important, together with the annotation of data.
You provide Alexa with a set of dialogs to demonstrate the functionalities required for the skill.
The build time systems behind Alexa Conversations will take the dialogs and create thousands of variations of these examples.
This build process takes quite a while to complete.
Fortunately any errors are surfaced at the start of the process, which is convenient.
AC builds a statistical model which interprets customer inputs and predicts the best response.
From that information, AC is able to make accurate assumptions.
AC uses AI to bridge the gap between the voice applications you can build manually and the vast range of possible conversations.
Framework Components
The five build-time components are:
- Dialogs
- Slots
- Utterance Sets
- Response Templates
- API Definitions
Dialogs
Dialogs are really example conversations between the user and Alexa which you define. You can see the conversation is multi-turn, and the complexity is really up to you to define.
For the prototype there are three entities or slots we want to capture, and four dialog examples with four utterances each were sufficient. Again, these conversations or dialogs will be used by AC to create an AI model to produce a natural and adaptive dialog model.
Slots
Slots are really the entities you would like to fill during the conversation. Should the user utter all three required slots in the first utterance, the conversation will only have one dialog turn.
The conversation can be longer, of course, should it take more conversation turns to solicit the relevant information from the user to fill the slots. The interesting part is the two types of slots or entities: the custom-defined slots with values, and those with properties.
Alexa Conversations introduces custom slot types with properties (PCS) to define the data passed between components. They can be singular or compound. As stated previously, compound entities or slots can be decomposed.
Compound entities which can be decomposed will grow in adoption, and you will start seeing them used in more frameworks.
Utterance Sets
Utterance Sets are groups of utterances that users may say to Alexa, which can include slots. They are used when annotating User Input turns in a Dialog.
This is the one big drawback I see in AC: the fact that for each permutation of slots/entities, examples need to be defined.
For example:
1. abc
2. a
3. b
4. c
5. ab
6. bc
7. ac
For the three slots/entities, seven example sets need to be given. Imagine how this expands, should you have more slots/entities.
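The combinatorial growth is easy to quantify; the short sketch below is plain Python (not Alexa tooling) and simply enumerates the non-empty slot combinations for three and for five slots.

# Plain Python illustration of how utterance-set permutations grow with the
# number of slots: each non-empty combination of slots potentially needs its
# own set of example utterances.
from itertools import combinations

def slot_combinations(slots):
    """Return every non-empty combination of the given slots."""
    return [
        combo
        for size in range(1, len(slots) + 1)
        for combo in combinations(slots, size)
    ]

print(len(slot_combinations(["a", "b", "c"])))            # 7 combinations
print(len(slot_combinations(["a", "b", "c", "d", "e"])))  # 31 combinations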
Response Templates
Responses are how Alexa responds to users in the form of audio and visual elements. They are used when annotating Alexa Response turns in a Dialog.
API Definitions
API Definitions define the interfaces to your back-end service, using arguments as inputs and returns as outputs.
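As a sketch of what typically sits behind such a definition, the hypothetical back-end function below takes the declared arguments as inputs and returns a structured result that a response template could render. The function name, arguments and return fields are invented, and the wiring from the skill's endpoint to this function is omitted.

# Hypothetical back-end function behind an Alexa Conversations API definition:
# declared arguments come in, a structured result goes out. Names and fields
# are invented for illustration; the skill endpoint wiring is not shown.
from typing import Dict

def get_account_balance(account_type: str) -> Dict[str, object]:
    """Return the data a response template could render back to the user."""
    # In a real skill this would call a banking back-end or database.
    balances = {"cheque": 1520.50, "savings": 10300.00}  # stubbed values
    return {
        "accountType": account_type,
        "balance": balances.get(account_type, 0.0),
        "currency": "USD",
    }

print(get_account_balance("savings"))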
AC is definitely a move in the right direction…
The Good
- The advent of compound slots/entities which can be decomposed. Adding data structures to Entities.
- Deprecating the state machine and creating an AI model to manage the conversation.
- Making voice assistants more conversational.
- Contextually annotated entities/slots.
- Error messages during the building of the model were descriptive and helpful.
The Not So Good
- It might sound negligible, but building the model takes a while. I found that the errors in my model were surfaced at the beginning of the model-building process, and training stopped. Should your model have no errors, the build is long.
- Defining utterance sets is cumbersome. Creating utterance sets for all possible permutations when you have a large number of slots/entities is not ideal.
- It is complex, especially compared to an environment like Rasa. The art is to improve the conversational experience by introducing complex AI models; while simultaneously simplifying the development environment.
Microsoft Power Virtual Agent
And Azure QnA Maker
Companies introducing Conversational Interfaces are looking to scale as quickly as possible. A huge focus is placed on UX and CX, and often due diligence is not done on the chatbot framework and basic architecture.
As the chatbot in particular and the Conversational AI environment in general grow, more often than not problems are encountered in scaling the chatbot. It becomes hard to extend the environment.
This is especially the case when a graphic environment is used for dialog management and the conversational nodes instead of native code.
The Problem of Code
Traditionally a chatbot ecosystem consists of three components:
- Intents
- Entities
- Dialog Flow and Conversational Nodes
The conversational nodes also include the dialog, the dialog being the wording displayed to the user; the output the chatbot gives the user.
Intents and Entities are defined within a GUI (Graphical User Interface), and often the output is a custom NLU API.
But the conversational portion is the hard work and takes the longest.
Within the Microsoft environment, the tool available was the Bot Framework, with which a digital assistant or Skills could be created. This necessitated native code (C# in most cases).
Microsoft has extended their Conversational AI offering with an environment they call Power Virtual Agent (PVA). Before we look at the PVA functionality, it is important to note the following…
The PVA is a good design, prototype and wire-frame environment.
The PVA is an excellent tool to get your chatbot going; a fairly advanced chatbot with API integration can be crafted with it, and the dialog authoring canvas is advanced in functionality.
Invariably, in time your chatbot is going to outgrow PVA, and then what? This is where the Microsoft Conversational AI ecosystem is ready to let you extend.
Extend To Bot Framework Skills
Power Virtual Agents enables you to extend your bot using Azure Bot Framework Skills. If you have already built and deployed bots in your organization (using Bot Framework pro-code tools) for specific scenarios, you can convert bots to a Skill and embed the Skill within a Power Virtual Agents bot.
You can combine experiences by linking re-usable conversational components, known as Skills.
Within an Enterprise, this could be creating one parent bot bringing together multiple sub-bots owned by different teams, or more broadly leveraging common capabilities provided by other developers.
Skills are themselves bots, invoked remotely, and a Skill developer template (.NET, TS) is available to facilitate the creation of new Skills.
What Are Topics
One of the key components of PVA is Topics. When you create bots with Power Virtual Agents, you author and edit topics.
Topics are discrete conversation paths that, when used together within a single bot, allow for users to have a conversation with a bot that feels natural and flows appropriately.
These topics can be seen as different customer journeys or dialog paths within your chatbot. These topics are invoked by trigger phrases. These trigger phrases are defined by you as the user, and can be seen as a LUIS light.
You don’t need any other NLU/NLP configuration apart from these trigger phrases. Variants of trigger phrases will also activate the closest match. Hence a model of sorts is created from the trigger phrases, and close matches are possible.
I view this as almost a LUIS light version. Topics are a combination of Intents and the actual conversational dialog.
Entities
Conversations in Power Virtual Agents center around natural language understanding, which is the ability to understand user intent.
One fundamental aspect of natural language understanding is to identify entities in user dialog. Entities are crucial in any conversation, after intent has been established. Entities can be seen as nouns. An entity can be viewed as an information unit that represents a certain type of a real-world subject, like a phone number, zip code, city, or even a person’s name.
The Smart matching option enables the bot’s understanding of natural language. This can help match misspellings, grammar variations, and words with similar meanings.
If the bot isn’t matching enough related words, enhance the bot’s understanding further by adding synonyms to your list items.
Language Understanding (LUIS)
LUIS can be used as an additional NLP resource, as an API to identify valuable information in conversations. LUIS can be used for an initial NLP high pass, or as a fallback if the right topic cannot be determined. LUIS integrates seamlessly with the Azure Bot Service, making it easy to create a sophisticated bot and to scale your solution.
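Below is a minimal sketch of using LUIS as a standalone NLU resource through its v3 prediction REST endpoint; the endpoint, app ID and prediction key are placeholders, and the URL shape should be verified against your own LUIS resource.

# Minimal sketch of calling LUIS as a standalone NLU service via the v3
# prediction REST endpoint. Endpoint, app ID and key are placeholders; verify
# the URL against your own LUIS prediction resource.
import requests

ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
APP_ID = "YOUR_LUIS_APP_ID"
PREDICTION_KEY = "YOUR_PREDICTION_KEY"

def predict(utterance: str) -> dict:
    url = f"{ENDPOINT}/luis/prediction/v3.0/apps/{APP_ID}/slots/production/predict"
    params = {"query": utterance, "subscription-key": PREDICTION_KEY}
    return requests.get(url, params=params).json()

result = predict("Where is my nearest branch?")
# The prediction carries the top intent, scored alternatives and entities,
# which a PVA topic or Bot Framework skill can then act on.
print(result["prediction"]["topIntent"])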
Creating A Bot in Power Virtual Agent
Final Thoughts on PVA
There are many similarities between Power Virtual Agents and IBM Watson Assistant Actions. PVA has the approach of topics, different conversation paths, which are invoked by a collection of example user utterances. These utterances replace defined intents.
Rasa End-To-End Training
No Intent Stories
This is one of two ways of approaching no-intent stories. Below is the simplest approach: a no-intent conversation living in the same training file as other intent-based stories.
Glaringly, intent and action are absent.
- story: No Intent Story
  steps:
  - user: "hello"
  - bot: "Hello human!"
  - user: "Where is my nearest branch?"
  - bot: "We are digital! Who needs branches."
  - user: "Thanks anyway"
  - bot: "You are welcome. No to branches!"
Below you can see a conversation which invokes this story, and the deviations from the trained story are obvious.
The next step is to look at a hybrid approach, where no-intent dialogs can be introduced into existing stories.
Hybrid Approach
Looking at the story below, you will see the story name flowing into the intent name and action… and then user input is caught sans any intent, followed by an action.
- story: account_checking
  steps:
  - intent: tiers
  - action: utter_tiers
  - user: "just give that to me again?"
  - action: utter_tiers
Here is the conversation with the chatbot:
ML Story is defined on the left, and an interactive test conversation on the right. Rasa X and interactive learning are not yet available.
Contextual Conversations
From a user perspective, the context of the conversation exists in the user’s mind. Someone might be ordering a pizza, ask for extra cheese, and then say in the next dialog, “That is too expensive”.
From a user perspective, the message is to cancel the extra cheese. From a dialog and contextual perspective, this is not so obvious. Building truly contextually aware chatbots is not an easy feat.
Rasa wants the context of the conversation to affect the prediction of the next dialog.
Looking at possible advantages and disadvantages…
Disadvantages:
- With solutions like Rasa, Microsoft LUIS, Amazon Lex etc., the NLU service is separate from the dialog/state machine component. This means the NLU model (with its intents and entities) can be utilized via an API as a separate resource by different parts of the organization, as shown in the sketch after this list. With the deprecation of intents, end-to-end development and training is used, the NLU portion merges with the dialog portion, and that advantage of using NLU as a service is lost.
- Perhaps this gives rise to a scenario where more user stories need to be created and where user stories cannot be too similar, seeing that the conversation pivots on and relies heavily on the training data in the ML stories.
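To ground the first point: while NLU still runs as its own component, any system in the organization can call it over HTTP. The sketch below assumes a Rasa Open Source server started with the API enabled (rasa run --enable-api) on the default port, and uses its /model/parse endpoint.

# Sketch of the "NLU as a shared service" pattern referred to above: post text
# to a running Rasa server's /model/parse endpoint and get intent plus entities
# back, without touching dialog management. Assumes `rasa run --enable-api`
# serving a trained model on localhost:5005.
import requests

def parse(utterance: str) -> dict:
    response = requests.post(
        "http://localhost:5005/model/parse",
        json={"text": utterance},
    )
    return response.json()

result = parse("Where is my nearest branch?")
# This separation is exactly what merges away once intents are deprecated and
# user utterances are trained end-to-end against the dialog.
print(result["intent"], result["entities"])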
Advantages:
- Rasa is the avant-garde of the chatbot world, pushing the boundaries. Intent deprecation is inevitable. If any chatbot will attempt this successfully, it will be Rasa.
- The case is not made for 100% dedication to intents or no-intents. No-intent user stories can be used as a quick solution, especially for chit-chat and small talk. This is done already, to some degree: think of Microsoft’s QnA Maker, which is not intent-based but is limited in scope and functionality, or of IBM Watson Assistant’s new Actions, which are really a quick and easy way to roll out a simple dialog interface but do not serve as a complete solution.
Conclusion
It is evident that the vision of deprecating intents is shared, with intents replaced by a grouping of utterances which invoke an action on the chatbot’s side.
And, somehow, the void left by the intents’ absence needs to be filled by a grouping of user input examples. The importance of this has also been highlighted as one of the milestones of reaching Conversational AI maturity and true flexibility.
These four organizations, Amazon, IBM, Microsoft & Rasa, are solving the same problem, but in very different ways…
An important aspect for me is to what extent the proposed approach is extendable and how well it merges with the current solution. You could say clarity of vision and execution. Here Rasa leads, followed by IBM.