Dialog Management Considerations for Chatbots
Chatbot User Experience is Determined by Dialog Management Efficiency
Introduction
Each transition between dialogs is critical in keeping the conversation moving.
Conversations with a conversational user interface can take place via text or speech.
The most common speech input devices are the smart assistants we have in our homes, be it Amazon Echo (Alexa) or Google Home. These conversational interfaces have gone mainstream.
Then there are text-based conversational interfaces, which live within our messaging applications. We tend to opt for voice in more private settings, for instance at home or in the car.
We revert to text in public and shared spaces. We also opt for text when the conversation is bound to include a transaction with payment.
Conversational Journey Management
Most often in conversational journeys where the user makes use of a voice smart assistant, the dialog consists of one or two dialog turns, for example asking what the weather is or checking travel time.
With text-based conversations, like chatbots, multiple dialog turns are involved, hence management of the dialog becomes critical. For instance, if a user wants to make a travel booking or a restaurant reservation, the dialog will be longer.
Your chatbot typically has a domain, a specific area of concern, be it travel, banking, utilities etc. Grounding is important to establish a shared understanding of the conversation's scope.
You will see many chatbot conversations start with a number of dialogs initiated by the chatbot. The sole purpose of these messages is to ground the conversation going forward.
Secondly, the initiative can sit with the user (user-directed initiative) or with the system (system-directed initiative). In human conversations the initiative is exchanged between the two parties in a natural way; this is known as mixed initiative. The chatbot must allow for mixed initiative. Ideally the initiative sits with the user, and once the intent is discovered, system-directed initiative takes over to fulfill the intent.
If the initiative is not managed, the flow of dialog can end up being brittle: the user struggles to inject an intent and further the dialog or, even worse, the chatbot drops out of the dialog.
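A minimal sketch of how initiative can shift within one dialog, assuming a hypothetical `REQUIRED` table of intents and the slots each one needs. The user holds the initiative until an intent is known, the system then directs the dialog to collect missing slots, and a new intent from the user mid-task is accepted rather than breaking the flow. The `handle_turn` function and all intent, slot and action names are illustrative only.

```python
# Hypothetical intents and the slots each one needs before it can be fulfilled.
REQUIRED = {
    "book_flight": ["destination", "date"],
    "book_table": ["people", "time"],
}

def handle_turn(state: dict, intent: str, entities: dict) -> str:
    """Return the bot's next action; `state` holds the active task and its slots."""
    if intent in REQUIRED and intent != state.get("task"):
        # User initiative: a new (or switched) intent starts a new task
        # instead of the bot rejecting input it did not expect.
        state["task"], state["slots"] = intent, {}
    task = state.get("task")
    if task is None:
        return "ask_how_can_i_help"  # still waiting for the user to state an intent
    state["slots"].update({k: v for k, v in entities.items() if k in REQUIRED[task]})
    missing = [slot for slot in REQUIRED[task] if slot not in state["slots"]]
    # System-directed initiative: drive the dialog toward fulfilling the intent.
    return f"ask_{missing[0]}" if missing else f"fulfil_{task}"
```

For example, after `handle_turn(state, "book_flight", {})` asks for a destination, a user who replies with `book_table` and `{"people": 4}` is moved to the new task and asked for a time, rather than the bot dropping out.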
Conversational Interface Elements
The basic architecture of most chatbots can be divided into two portions: Natural Language Understanding (NLU) and Dialog Management.
NLU development is the process of creating an API to which user utterances can be submitted, and which returns the intent and entities of each utterance.
Commercial cloud-based NLU environments like IBM Watson, Microsoft LUIS, AWS Lex etc. have excellent GUIs. There are also a number of open-source solutions like https://rasa.com and https://spacy.io. These interfaces facilitate the rapid creation of NLU models, which can be made available via an API.
The main elements constituting the NLU are intents and entities. Intents can be seen as verbs, expressing what the user wants to do. Entities can be seen as nouns. In the case of a travel bot, entities will be destination, times, dates etc.
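To make the intent and entity split concrete, here is a minimal sketch of the contract an NLU API fulfils: an utterance goes in, an intent (with a confidence) and a set of entities come out. It is a toy keyword matcher, not how IBM Watson, LUIS, Lex or Rasa work internally; the intent names, keyword lists, `NluResult` and `parse` are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword lists; a real NLU model learns this from training examples.
INTENT_KEYWORDS = {
    "book_flight": ["book", "flight", "fly"],
    "check_weather": ["weather", "rain", "sunny"],
}
KNOWN_DESTINATIONS = {"london", "paris", "cape town"}
DATE_PATTERN = re.compile(
    r"\b(today|tomorrow|monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"
)

@dataclass
class NluResult:
    intent: str
    confidence: float
    entities: dict = field(default_factory=dict)

def parse(utterance: str) -> NluResult:
    text = utterance.lower()
    tokens = re.findall(r"[a-z]+", text)
    # Intent (the "verb"): score each intent by how many of its keywords occur.
    scores = {
        intent: sum(tok in keywords for tok in tokens)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    total = sum(scores.values())
    if total == 0:
        return NluResult("fallback", 0.0)
    best = max(scores, key=scores.get)
    # Entities (the "nouns"): a destination and a simple date expression.
    entities = {}
    for city in KNOWN_DESTINATIONS:
        if city in text:
            entities["destination"] = city
    date = DATE_PATTERN.search(text)
    if date:
        entities["date"] = date.group(1)
    return NluResult(best, scores[best] / total, entities)
```

For example, `parse("Book a flight to Paris tomorrow")` returns `NluResult(intent='book_flight', confidence=1.0, entities={'destination': 'paris', 'date': 'tomorrow'})`.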
From a developer perspective, NLU creation is very approachable and all required tools are incorporated in the interface.
The second portion, Dialog Management (DM), is where the challenge lies, and it is in DM that the major chatbot development environments differ in their approaches.
Dialog Management Methods
DM is the process of deciding which Dialog State to present to the user. This is ultimately steered by the Conversation Objective. But on a dialog turn-by-turn basis the user input, conversational context and external data serve as input to the Dialog Logic component.
The Dialog Logic component in turn decides on a dialog node or state to move the conversation to and present to the user.
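A minimal sketch of such a Dialog Logic component, assuming the NLU output, the conversational context and external data arrive as plain dictionaries. The `dialog_logic` function and the intent and state names are hypothetical; the point is that all three inputs can influence which state comes next.

```python
from typing import Any

def dialog_logic(nlu: dict[str, Any], context: dict[str, Any],
                 external: dict[str, Any]) -> str:
    """Return the name of the next dialog state (hypothetical state names)."""
    intent = nlu.get("intent")
    if intent == "check_balance":
        # External data (e.g. a billing system lookup) decides which state to show.
        if not external.get("account_found", False):
            return "ask_account_number"
        return "present_balance"
    if intent == "fallback":
        # Context lets the bot escalate instead of looping on the same prompt.
        context["fallback_count"] = context.get("fallback_count", 0) + 1
        if context["fallback_count"] >= 2:
            return "handover_to_agent"
        return "rephrase_prompt"
    return "main_menu"
```

Here the same `fallback` intent leads to different states depending on context: the first miss asks the user to rephrase, the second hands over to an agent.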
Broadly, there are two dialog management approaches: Handcrafted and Probabilistic.
The Handcrafted approach is more palatable and approachable for traditional developers, while the Probabilistic approach can seem abstract and intangible.
But there are trade-offs. The Handcrafted approach yields high conversational resilience but is less human-like. The Probabilistic approach is more human-like, but the conversation might be less resilient and can even be experienced as brittle.
The Handcrafted approach to state management can be divided into:
- Finite State
- Frame Based
Finite State is self-explanatory: a rigid set of rules, with each state having a fixed number of transitions to other states under fixed conditions. The system steps through the dialog and asks the user questions sequentially.
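A finite-state dialog can be sketched as a fixed transition table. The restaurant-booking states and the `step` and `run` helpers below are hypothetical; note how any input not listed as a transition simply leaves the dialog where it is, which is the rigidity described above.

```python
# Hypothetical restaurant-booking flow: state -> (prompt, {user answer: next state}).
STATES = {
    "start": ("Would you like to make a reservation? (yes/no)",
              {"yes": "ask_people", "no": "goodbye"}),
    "ask_people": ("For how many people? (2/4)", {"2": "confirm", "4": "confirm"}),
    "confirm": ("Shall I confirm the booking? (yes/no)",
                {"yes": "done", "no": "start"}),
    "done": ("Your table is booked.", {}),
    "goodbye": ("Goodbye.", {}),
}

def step(state: str, user_input: str) -> str:
    """Follow a fixed transition; unrecognised input keeps the dialog in place."""
    _, transitions = STATES[state]
    return transitions.get(user_input.strip().lower(), state)

def run(inputs: list[str]) -> list[str]:
    """Feed user answers through the machine and record every state visited."""
    state = "start"
    visited = [state]
    for answer in inputs:
        if not STATES[state][1]:  # terminal state: no transitions left
            break
        state = step(state, answer)
        visited.append(state)
    return visited
```

For example, `run(["yes", "five", "2", "yes"])` returns `['start', 'ask_people', 'ask_people', 'confirm', 'done']`: the unexpected "five" keeps the user stuck in `ask_people`.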
Frame Based allows for more flexibility by adding a data model (the frame) to the dialog tree. Slots can be filled in any sequence, over one or more iterations, and the system prompts for any outstanding data. This is the basis of most commercial offerings: Google Dialogflow, Amazon Lex, IBM Watson Assistant etc.
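Frame-based slot filling can be sketched in a few lines, assuming a hypothetical travel frame with three slots. The user may supply slots in any order, or several at once, and the system only prompts for what is still missing. `PROMPTS`, `update_frame` and `next_prompt` are illustrative names, not any vendor's API.

```python
from typing import Optional

# Hypothetical travel-booking frame: slot name -> prompt used while the slot is empty.
PROMPTS = {
    "destination": "Where would you like to travel to?",
    "date": "On what date would you like to travel?",
    "passengers": "How many passengers are travelling?",
}

def update_frame(frame: dict, entities: dict) -> dict:
    """Fill whichever known slots the user supplied, in any order; ignore the rest."""
    filled = {slot: value for slot, value in entities.items() if slot in PROMPTS}
    return {**frame, **filled}

def next_prompt(frame: dict) -> Optional[str]:
    """Prompt for the first outstanding slot, or return None when the frame is complete."""
    for slot, prompt in PROMPTS.items():
        if slot not in frame:
            return prompt
    return None
```

If the user opens with "Two of us on 4 April", an NLU layer might yield `{"date": "4 April", "passengers": 2}`; after `update_frame`, `next_prompt` asks only for the destination.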
Frameworks which employ the Probabilistic approach include PyDial and Chatterbot. Rasa is seen as a hybrid approach. Probabilistic approaches learn a dialog policy, the strategy of what to say next in the conversation, from transcriptions of real conversations.
So instead of defining rules for the dialog strategy by hand, probabilistic DM takes a different approach by learning the rules from actual conversations.
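As a toy illustration of learning a dialog policy from data rather than writing rules, the sketch below counts which system action followed each (previous action, user intent) pair in a handful of hypothetical annotated transcripts, and picks the most frequent one. Frameworks such as PyDial and Rasa use far richer models, such as reinforcement-learned or neural policies; this maximum-likelihood lookup table only shows the principle, and all names in it are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical annotated transcripts: each turn pairs the user's intent with
# the action the agent took in response.
TRANSCRIPTS = [
    [("greet", "utter_greet"), ("book_table", "ask_people"),
     ("inform_people", "ask_time"), ("inform_time", "confirm")],
    [("book_table", "ask_people"), ("inform_people", "ask_time"),
     ("inform_time", "confirm")],
    [("greet", "utter_greet"), ("book_table", "ask_time"),
     ("inform_time", "ask_people"), ("inform_people", "confirm")],
]

def learn_policy(transcripts):
    """Count how often each action followed a (previous action, user intent) state."""
    counts = defaultdict(Counter)
    for dialog in transcripts:
        prev_action = "start"
        for intent, action in dialog:
            counts[(prev_action, intent)][action] += 1
            prev_action = action
    return counts

def probability(counts, prev_action, intent, action):
    """Maximum-likelihood estimate of P(action | previous action, intent)."""
    seen = counts.get((prev_action, intent), Counter())
    total = sum(seen.values())
    return seen[action] / total if total else 0.0

def predict(counts, prev_action, intent, default="fallback"):
    """Choose the most frequently observed action; fall back for unseen states."""
    seen = counts.get((prev_action, intent))
    if not seen:
        return default
    return seen.most_common(1)[0][0]
```

Nobody wrote the rule that `inform_people` after `ask_people` leads to `ask_time`; it emerges from the transcripts, together with a probability (2/3 here) reflecting how consistently it was observed.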
Attracting Developers
Access to developer tools plays a huge role in developer and organisational adoption. Rasa is an example of successful developer advocacy, establishing a community, tutorials etc.
IBM Watson Assistant has a very friendly UI with extensive documentation and tutorials, both written and video.
Even though Microsoft has a very astute NLU environment in LUIS, with the most advanced entity structures, it lacked a graphical dialog development and management system. Microsoft did introduce the Power Virtual Agents environment, and the Bot Framework is well established, but an enterprise-grade GUI was lacking.
This has changed with the introduction of Composer for Microsoft Bot Framework. More on this tool in a later article.
Natural Language Generation
Language is part and parcel of a chatbot: it is the wording that informs the user of the dialog state and the options available. It is really the user interface, and all the user has to inform their next action.
Hence the importance of the right wording, presenting the maximum information to the user.
The best approach is to make the script presented to the user live, and generate it on the fly.
The most basic approach to Natural Language Generation (NLG) is infusing static text with variable data. For instance, “Your {Premium Plus} data package has a balance of {$464,00} and is payable on {4 April}.” This is analogous to the way an IVR presents data to the caller.
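The template approach can be sketched with plain Python string formatting. The template wording mirrors the example above; the `render_balance` function and the choice between a "due" and a "settled" template are hypothetical, showing one small step from static text toward wording that adapts to the data.

```python
# Hypothetical templates; the placeholders are filled with live account data.
TEMPLATES = {
    "due": "Your {package} data package has a balance of {balance} "
           "and is payable on {due_date}.",
    "settled": "Your {package} data package is fully paid up; "
               "nothing is due on {due_date}.",
}

def render_balance(package: str, balance_cents: int, due_date: str) -> str:
    """Pick a template based on the data, then infuse it with the variable values."""
    key = "due" if balance_cents > 0 else "settled"
    # Format the amount with a decimal comma, matching the example above.
    balance = f"${balance_cents // 100},{balance_cents % 100:02d}"
    return TEMPLATES[key].format(package=package, balance=balance, due_date=due_date)
```

For example, `render_balance("Premium Plus", 46400, "4 April")` returns "Your Premium Plus data package has a balance of $464,00 and is payable on 4 April."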
The degree to which natural language is generated and is dynamic can obviously vary.
The ideal is to not even have templates which are populated with real-time data, but fully generate natural language.
These two videos show some prototyping done, where models were created with test data and natural language was generated on the fly.
Conclusion
Read more about this subject in detail in the link below. I have also published more than 60 stories on Medium related to this topic and other chatbot related subjects.