The IBM Watson Assistant Architecture Should Look Like This
And Why Multiple Orchestrated Skills Make Sense
Introduction
IBM Watson Assistant is currently focused on Action Skills, with Watson Discovery acting as a Search Skill to back up user intents not covered by Actions.
There are signs that Dialog Skills will be introduced at some stage in the future, but the current moratorium on creating new instances with a Dialog Skill does not help the IBM cause.
In the assistant settings, Dialog Skills are displayed as a future, coming-soon feature. The duration of this moratorium on Dialog Skills is not known, and neither is the shape or form future Dialog Skills will take.
I certainly hope there will be continuity in Dialog Skills’ functionality; otherwise the inevitable rework of existing development will add overhead.
It does seem like the IBM Watson Assistant team is focusing on Action Skills and seems convinced that this is sufficient, for now at least.
The approach from IBM should not be a search for a single silver bullet that solves all conversational AI challenges. Rather, the problem should be attacked on multiple fronts, with a digital assistant to which complexity can be added.
Orchestration
The ideal scenario would be one where users can create an assistant by implementing and orchestrating multiple skills within a single digital assistant.
Orchestration can be rules-based, driven by user input and NLU results. Certain combinations of intents and entities, contextual awareness based on user entry points, profiles gleaned from API calls and so on can all play a part.
There is also an opportunity to create an NLU model specifically for orchestration and leverage it within the NLU sections of the dialog skills. A rough sketch of such rules-based routing follows.
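To make this concrete, here is a minimal sketch of what rules-based orchestration could look like, assuming a simple NLU result containing intents and entities. The skill names, confidence threshold and entry-point logic are all illustrative assumptions, not Watson Assistant functionality.

```python
# Hypothetical sketch of a rules-based skill orchestrator. The skill names,
# the 0.7 threshold and the shape of the NLU result are assumptions made
# for illustration, not part of the Watson Assistant API.

def route_to_skill(nlu_result: dict, entry_point: str) -> str:
    """Pick a skill based on intents, entities and the user's entry point."""
    intents = nlu_result.get("intents", [])
    entities = {e["entity"] for e in nlu_result.get("entities", [])}
    top = intents[0] if intents else None

    # Contextual awareness: users arriving from the billing page with an
    # invoice number are routed to the billing dialog skill first.
    if entry_point == "billing_page" and "invoice_number" in entities:
        return "billing_dialog_skill"

    # High-confidence intent: hand over to the matching dialog skill
    # (the naming convention here is an assumption).
    if top and top["confidence"] >= 0.7:
        return f"{top['intent']}_dialog_skill"

    # No confident intent: fall back to the search skill.
    return "search_skill"
```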
Digression and disambiguation between skills would be a huge plus. Search and Action Skills should be used primarily as extensions of Dialog Skills.
There are instances where Action and Search Skills can be used in stand-alone mode, but these will be limited in scope and chatbot functionality.
Imagine an Assistant constituted by one or more Dialog Skills, each addressing a different part of the business. These Dialog Skills could be orchestrated within the assistant: switched on or off, prioritized or deprioritized, and tweaked in terms of when they are invoked, as sketched below.
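As a thought experiment, this is roughly how such a skill registry could be expressed. Watson Assistant does not expose this configuration today, so every name and field below is hypothetical.

```python
# Hypothetical sketch of how Dialog Skills could be registered, toggled and
# prioritized within an assistant. The structure is an assumption made for
# illustration only.

SKILL_REGISTRY = [
    {"name": "claims_dialog_skill",  "enabled": True,  "priority": 1},
    {"name": "billing_dialog_skill", "enabled": True,  "priority": 2},
    {"name": "hr_dialog_skill",      "enabled": False, "priority": 3},  # switched off
]

def eligible_skills():
    """Return enabled skills, highest priority (lowest number) first."""
    active = [s for s in SKILL_REGISTRY if s["enabled"]]
    return sorted(active, key=lambda s: s["priority"])
```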
Smaller assistants can be completely developed using one or more Action Skills.
The ideal would be an Assistant made up of multiple orchestrated Dialog Skills, with smaller dialogs handled as extensions built with Action Skills, and one or more Search Skills employed alongside them.
A further consideration is adding connectors for popular databases like MongoDB, SQL Server, Cloudant and so on. In the meantime, a dialog skill webhook pointed at a small service is one way to get similar behavior, as in the sketch below.
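Below is a minimal sketch of such a database “connector” using Flask and pymongo. The endpoint path, payload shape and collection names are my own assumptions, not an IBM-provided connector.

```python
# A minimal sketch of a database lookup exposed to a dialog skill as a
# webhook. The /order-lookup path, the payload fields and the "shop.orders"
# collection are assumptions for illustration.

from flask import Flask, request, jsonify
from pymongo import MongoClient

app = Flask(__name__)
client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
orders = client["shop"]["orders"]

@app.route("/order-lookup", methods=["POST"])
def order_lookup():
    payload = request.get_json()
    order_id = payload.get("order_id")  # passed from the dialog skill's webhook call
    order = orders.find_one({"order_id": order_id}, {"_id": 0})
    return jsonify(order or {"error": "order not found"})

if __name__ == "__main__":
    app.run()
```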
The Search Skill can be used in a standalone scenario where you want to create a searchable knowledge base, but this will run into scaling impediments when conversations need to be specific.
Action Skills can be used for a quick survey or a slot-filling chatbot. The fact that Action Skills are Watson Assistant’s first foray into end-to-end intent-less conversations is exciting, but these skills cannot handle complex dialog configurations, digression, disambiguation, auto-learning and the like.
Dialog Skills should be the backbone of any conversation, augmented and complemented by Search and Action Skills.
Dialog Skills
This should be the main skill in the assistant. All conversational agents should be anchored by one or more Dialog Skills.
The Dialog Skill allows for defining intents and entities (the NLU structure), while conversations are defined by a dialog tree.
A graphical dialog editor is available, and scripting can also be used. Dialog responses are defined here as well.
Key benefits of the Dialog Skill are:
- Disambiguation
- Self-Learning
- Intent Recommendations (NLU)
- Intent Conflicts (NLU)
- Compound Intents (NLU)
- Irrelevance Detection
Below is a simple example of a dialog configuration for multiple intents, illustrating how conditions are used in a Dialog Skill.
I went with the simplest dialog structure possible for this example; the conditions are shown in the sketch that follows. The idea is for the conversation to skip through the initial dialog nodes and evaluate the conditions on each.
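Represented as Python dicts, the node configuration behind an example like this looks roughly as follows. The intent names and response texts are placeholders; the node and condition structure mirrors the shape of a dialog skill’s JSON export.

```python
# A minimal sketch of dialog nodes as they appear in a Dialog Skill's JSON,
# written here as Python dicts. The intents (#book_flight, #check_weather)
# and response texts are placeholders.

dialog_nodes = [
    {
        "dialog_node": "welcome_node",
        "conditions": "welcome",
        "output": {"generic": [{"response_type": "text",
                                "values": [{"text": "Hi! How can I help?"}]}]},
    },
    {
        # Condition matching two intents in one user utterance; evaluated
        # before the single-intent node below.
        "dialog_node": "flight_and_weather",
        "conditions": "#book_flight && #check_weather",
        "output": {"generic": [{"response_type": "text",
                                "values": [{"text": "Let's book your flight and check the weather."}]}]},
    },
    {
        "dialog_node": "flight_only",
        "conditions": "#book_flight",
        "output": {"generic": [{"response_type": "text",
                                "values": [{"text": "Let's book your flight."}]}]},
    },
    {
        # Catch-all evaluated last, once the nodes above have been skipped.
        "dialog_node": "fallback",
        "conditions": "anything_else",
        "output": {"generic": [{"response_type": "text",
                                "values": [{"text": "Could you rephrase that?"}]}]},
    },
]
```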
Watson Assistant’s dialog creation and management web environment is powerful and feature rich. It is continuously evolving with new functionality visible every so often.
Actions
From 9 February 2022, all new instances of IBM Watson Assistant (WA) point to the new interface. This new interface or experience is Actions-based rather than based on independent NLU/Dialog Skills.
For new instances, the previous interface is seemingly inaccessible. For existing implementations in the previous interface, however, no work will be lost when switching between the two environments.
For existing implementations, you can switch back to the classic/previous experience at any time by clicking Switch to classic experience from the account menu.
Firstly, Actions should be seen as another type of skill to complement the two existing skills:
- Dialog Skills and
- Search Skills.
Actions cannot be seen as a replacement for dialogs.
Secondly, actions can be used as a standalone implementation for very simple applications. Such implementations may include customer satisfaction surveys, customer or user registration and the like: short, specific conversations.
Thirdly, and most importantly, actions can be used as a plugin or supporting element to dialog skills.
Of course, your assistant can run 100% on Actions, but this is highly unlikely, or at least not advisable.
The best implementation scenario is one where the backbone of your assistant is constituted by one or more dialog skills, with Actions used to enhance certain functionality within the dialog, alongside something like a search skill.
This approach allows business units to develop their own Actions, thanks to the friendly interface. These Actions can then be plugged into a dialog.
This approach is convenient if you have a module which changes on a regular basis, but you want to minimize impact on a complex dialog environment.
Within a dialog node, a specific action that is linked to the same Assistant as this dialog skill can be invoked. The dialog skill is paused until the action is completed.
An action can also be seen as a module which can be used and reused from multiple dialog threads.
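Conceptually, the pause-and-resume behavior can be pictured like this. None of these names are Watson Assistant APIs; it is purely an illustration of the flow.

```python
# A conceptual sketch only: how an orchestrator might pause a dialog skill,
# run an action to completion, and resume. DialogState and run_action are
# hypothetical names, not the Watson Assistant API.

from dataclasses import dataclass, field

@dataclass
class DialogState:
    node: str
    context: dict = field(default_factory=dict)

def invoke_action_from_dialog(state: DialogState, action_name: str) -> DialogState:
    """Pause the dialog, run the named action, then resume with its result."""
    paused_at = state.node                            # remember where the dialog stopped
    result = run_action(action_name, state.context)   # hypothetical action runner
    state.context[f"{action_name}_result"] = result   # make result available to the dialog
    state.node = paused_at                            # resume at the paused node
    return state

def run_action(action_name: str, context: dict) -> dict:
    # Stand-in for the action skill executing its steps to completion.
    return {"status": "completed", "action": action_name}
```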
When adding actions to a dialog skill, consideration needs to be given to the invocation priority.
If you add only an actions skill to the assistant, the actions skill starts the conversation. If you add both a dialog skill and an actions skill, the dialog skill starts the conversation, and actions are recognized only if you configure the dialog skill to call them.
Fourthly, if you are looking for a tool to develop prototypes, demos or proof of concepts, Actions can stand you in good stead.
Mention must be made of the built-in constrained user input, where options are presented to the user. Creating more structured input plays to the strengths of Actions.
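For reference, a constrained-input (options) response has roughly this shape in a skill’s JSON, shown here as a Python dict. The title, labels and values are placeholders.

```python
# A minimal sketch of an options response as it appears in a skill's JSON,
# written as a Python dict. The survey wording is a placeholder.

options_response = {
    "response_type": "option",
    "title": "How would you rate our service?",
    "options": [
        {"label": "Good",    "value": {"input": {"text": "good"}}},
        {"label": "Average", "value": {"input": {"text": "average"}}},
        {"label": "Poor",    "value": {"input": {"text": "poor"}}},
    ],
}
```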
Disambiguation between Actions within an Action Skill is possible and can be toggled on or off. This is very handy functionality and should address intent conflicts to a large extent.
System actions are available and these are bound to grow.
How NOT To Use Actions
It does not seem sensible to build a complete digital assistant or chatbot with Actions alone, at least not as a standalone conversational interface. There is an allure of rapid initial progress and having something to show, but there are a few problems you are bound to encounter.
Conversations within an action are segmented or grouped according to intents. Should there be intent conflicts or overlaps, inconsistencies can be introduced into the chatbot.
Entity management is not as strong within Actions as it is within Dialog Skills. Collecting entities with a slot-filling approach is fine.
But for more advanced conversations, where entities need to be defined and detected contextually, Actions will not suffice. Compound entities within a single user utterance will also pose a challenge.
Compound intents, i.e. multiple intents per user utterance, are problematic as well.
If you are used to implementing conversational digression, Actions will not suffice.
Search
Among others, there have been two general notions within the chatbot framework ecosystem.
The first is the deprecation of intents; there are four emerging approaches to this.
The second is the deprecation of the state machine. This is necessary to introduce a more flexible conversational flow. The leader in this space is currently Rasa.
But there is another way to introduce more flexibility to a state-machine-driven dialog management environment where all conversational paths and responses are pre-defined.
This is by introducing a feature where, if no intent is detected with high confidence, the dialog defaults to searching a knowledge base and responds with the result.
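In rough Python, the fallback pattern looks something like this. The threshold and the search helper are assumptions; in Watson Assistant this role is typically played by a Search Skill backed by Watson Discovery.

```python
# A sketch of the low-confidence fallback pattern. The 0.5 threshold and the
# search_knowledge_base helper are assumptions made for illustration.

CONFIDENCE_THRESHOLD = 0.5

def respond(nlu_output: dict, user_text: str) -> str:
    intents = nlu_output.get("intents", [])
    if intents and intents[0]["confidence"] >= CONFIDENCE_THRESHOLD:
        # High confidence: let the dialog tree handle the turn.
        return handle_with_dialog(intents[0]["intent"])
    # No confident intent: default to searching the knowledge base.
    return search_knowledge_base(user_text)

def handle_with_dialog(intent: str) -> str:
    return f"Routing to dialog node for #{intent}"

def search_knowledge_base(query: str) -> str:
    # Stand-in for a document search (e.g. Watson Discovery, Wikipedia).
    return f"Top search result for: {query}"
```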
This is not unique to any one chatbot framework. NVIDIA Riva, which was released recently, has integration examples that use Wikipedia as a searchable knowledge base. Other platforms like MindMeld, Rasa and Microsoft’s also make provision for such functionality. Obviously these systems vary in complexity and implementation steps.
Conclusion
From the examples above you should have a good idea of how these three skills can be orchestrated. The Search Skill can be used in a standalone scenario where you want to create a searchable knowledge base, but this will run into scaling impediments when conversations need to be specific.
Action Skills can be used for a quick survey or a slot-filling chatbot. The fact that Actions are Watson Assistant’s first foray into end-to-end intent-less conversations is exciting, but these skills cannot handle complex dialog configurations, digression, disambiguation, auto-learning and the like.
Dialog Skills should be the backbone of any conversation, augmented and complemented by Search and Action Skills.