Botpress Is Releasing OpenBook Private Beta Within The Next Few Weeks
Botpress Recently Announced The Release Of OpenBook, A “Next-Generation NLU Engine”
Botpress recently announced the release of OpenBook, a “next-generation NLU engine”…
What’s clear is that with this announcement, Botpress is taking a strong stance in favour of an “intent-less” NLU paradigm shift (which reminds me of the similar stance Rasa took with regard to intent-less dialog management).
Most notably, the following argument for going in this direction is at the top of their website marketing copy:
Intents don’t scale.
Since intent-based engines are classifiers, each intent directly impacts the performance of all other intents. The maintenance efforts are exponential with the number of intents.
Data Preparation & Practicalities
I’ve explored knowledge-based information-retrieval approaches in the past (Watson Discovery, MindMeld, etc.) that are truly intent-less; however, after reviewing OpenBook’s white paper, I was left with more questions than answers, which I hope will become clear once I receive my private beta invite in a few weeks’ time 🙂
Specifically, the white paper states that:
Unlike the intent-based approaches, chatbots built on OpenBook do not require the input of questions, their variations, and various answers to function. Instead, chatbots built on OpenBook can be deployed as soon as a book is provided to the platform.
It sounds like building a “Book” is less work than building a list or taxonomy of intents; in practice, however, I’m not so sure!
Looking at the instructions for creating the Book used in their benchmarks, we see the following (with examples):
- Put your name at the top of the sheet you create.
- Please fill in 3 REALISTIC questions per fact. (see example)
- Please fill in 5 combined questions per paragraph in the *** Combined question section (see example)
- The questions should be one sentence long
- Repeat the fact number for each associated question
- See the example of the output below, copy this formatting (copy the formatting, not the questions)
*** EXAMPLE ***
*6.01 The Montreal Hotel & Suites offers a range of convenient amenities
6.01 Since you are a budget hotel I'm wondering what amenities might be available?
6.01 My wife likes hotels with amenities and I assume that you have a few?
6.01 Do you have amenities beyond just breakfast?
*6.02 The amenities include a spa and massage therapists, hairdressing services, laundry and deep cleaning services, and breakfast every morning.
6.02 I need my hair cut and am wondering if you can do it?
6.02 Are there any laundry shops nearby?
6.02 Do you have anything I can do to relax?
*6.03 Breakfast is available at an extra cost.
6.03 Is breakfast included normally in the booking?
6.03 Is it possible to organize breakfast?
6.03 My kids like pancakes, I expect they can have them?
*6*** Combined Questions
6*** I need my hair cut but also wondering if you can do my laundry?
6*** Is it possible to have a massage after breakfast?
6*** I'm assuming you have amenities beyond breakfast, like laundry?
6*** I assume you have breakfast in the morning but not sure if I will be asked to pay?
6*** Does budge hotel mean no amenities or can I get breakfast for free?
If I’m understanding correctly:
- Building a Book requires adding at least 3 examples/variations per “fact”,
- As well as an additional 5 examples/variations per “combined fact” (one set per paragraph),
- With the correct markdown formatting.
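To make the format concrete, here is a minimal Python sketch that parses Book-style lines and checks them against the two counting rules from the instructions. The line prefixes (`*6.01` for a fact, `6.01` for one of its questions, `6***` for a combined question) are inferred from the white paper's example rather than from any published spec, treating the number before the dot as the "paragraph" is my assumption, and `parse_book` and `check_rules` are hypothetical helpers, not part of any Botpress tooling.

```python
import re
from collections import defaultdict

# Line prefixes inferred from the white paper's example (an assumption, not a spec):
#   "*6.01 <fact>"              a fact statement
#   "6.01 <question>"           a question tied to fact 6.01
#   "*6*** Combined Questions"  header for paragraph 6's combined questions
#   "6*** <question>"           a combined question for paragraph 6
FACT = re.compile(r"^\*(\d+\.\d+)\s+(.*)$")
QUESTION = re.compile(r"^(\d+\.\d+)\s+(.*)$")
COMBINED_HEADER = re.compile(r"^\*\d+\*\*\*")
COMBINED = re.compile(r"^(\d+)\*\*\*\s+(.*)$")


def parse_book(text):
    """Hypothetical helper: split Book text into facts, questions and combined questions."""
    facts, questions, combined = {}, defaultdict(list), defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or COMBINED_HEADER.match(line):
            continue  # blank line or section header
        if m := FACT.match(line):
            facts[m.group(1)] = m.group(2)
        elif m := COMBINED.match(line):
            combined[m.group(1)].append(m.group(2))
        elif m := QUESTION.match(line):
            questions[m.group(1)].append(m.group(2))
    return facts, dict(questions), dict(combined)


def check_rules(facts, questions, combined, per_fact=3, per_paragraph=5):
    """Hypothetical helper: report facts and paragraphs below the required counts."""
    problems = []
    for fact_id in facts:
        n = len(questions.get(fact_id, []))
        if n < per_fact:
            problems.append(f"fact {fact_id}: {n} of {per_fact} questions")
    # Assumption: "6.01" belongs to paragraph "6".
    for para in sorted({fact_id.split(".")[0] for fact_id in facts}):
        n = len(combined.get(para, []))
        if n < per_paragraph:
            problems.append(f"paragraph {para}: {n} of {per_paragraph} combined questions")
    return problems


SAMPLE = """\
*6.01 The Montreal Hotel & Suites offers a range of convenient amenities
6.01 Since you are a budget hotel I'm wondering what amenities might be available?
6.01 My wife likes hotels with amenities and I assume that you have a few?
6.01 Do you have amenities beyond just breakfast?
*6.03 Breakfast is available at an extra cost.
6.03 Is breakfast included normally in the booking?
6.03 Is it possible to organize breakfast?
*6*** Combined Questions
6*** Is it possible to have a massage after breakfast?
"""

facts, questions, combined = parse_book(SAMPLE)
print(check_rules(facts, questions, combined))
# ['fact 6.03: 2 of 3 questions', 'paragraph 6: 1 of 5 combined questions']
```

Even this trimmed sample shows how much bookkeeping the format carries: every question must repeat its fact number exactly, or it silently ends up attached to the wrong fact.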
There are 151 facts in the demo Book. This means that building a useful Book requires writing and formatting a minimum of roughly 450 variations, plus an additional 145 combined-fact variations, for a total of roughly 600 utterances.
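Spelled out with the figures quoted here (151 facts, three questions per fact, and 145 combined-fact variations), the count works out as follows; this is my own arithmetic on those numbers:

$$
\underbrace{151 \times 3}_{\text{per-fact variations}} + \underbrace{145}_{\text{combined variations}} = 453 + 145 = 598 \approx 600
$$

Each additional fact adds a fixed three questions to this total, so the authoring effort grows linearly with the number of facts, much as it does when labelling a fixed number of training examples per intent.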
It’s not clear to me how this solves the problem, or how it differs from managing 150 intents and their accompanying training examples. Specifically, the marketing statement “the maintenance efforts are exponential with the number of intents” feels somewhat misleading, since from what I gather, OpenBook’s “facts” are essentially equivalent to “intents” (in terms of data structure) and require the same amount of effort.
Also not clear is whether the benchmark numbers provided by Botpress take into account the time required to create and format this training data.
I’m fairly certain that if I were given the fact:[variation, variation, variation…] data ready in CSV format, loading it into something like Amazon Lex, Dialogflow, Watson, or any other intent-based NLU provider (where fact = intent and variation = training example) could be done in a few minutes.
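As a rough illustration of how little work the conversion itself is, the sketch below flattens a fact-to-variations mapping into a two-column `intent,example` CSV. The `fact_6_03` naming scheme and the generic two-column layout are assumptions for illustration only; Amazon Lex, Dialogflow and Watson each have their own import formats, so a real export would need a provider-specific final step.

```python
import csv
import io
import re


def to_intent_id(fact_id):
    """Hypothetical naming scheme: fact "6.03" becomes intent "fact_6_03"."""
    return "fact_" + re.sub(r"\W", "_", fact_id)


def book_to_csv(variations_by_fact):
    """Flatten {fact_id: [variation, ...]} into generic intent,example CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["intent", "example"])
    for fact_id, variations in variations_by_fact.items():
        intent = to_intent_id(fact_id)
        for text in variations:
            writer.writerow([intent, text])
    return buffer.getvalue()


book = {
    "6.03": [
        "Is breakfast included normally in the booking?",
        "Is it possible to organize breakfast?",
        "My kids like pancakes, I expect they can have them?",
    ],
}
print(book_to_csv(book), end="")
# intent,example
# fact_6_03,Is breakfast included normally in the booking?
# fact_6_03,Is it possible to organize breakfast?
# fact_6_03,"My kids like pancakes, I expect they can have them?"
```

Using `csv.writer` rather than joining strings by hand matters here: user questions often contain commas, and the writer quotes those fields automatically.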
OpenBook’s NLU is probably powered by some kind of semantic similarity, since Botpress says it’s based on large language models. They are definitely not the first to provide this kind of capability, but I can imagine that this approach leads to improved, or at least different, results compared to traditional “trained” NLU models.
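Botpress hasn't published how OpenBook matches questions to facts, so the following is only a sketch of the general semantic-similarity idea: vectorise the user's question and each fact, then return the closest fact if it clears a threshold. To keep the example dependency-free, a bag-of-words count vector stands in for a real language-model embedding; `embed`, `best_fact`, and the 0.2 threshold are all hypothetical choices.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an LLM embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_fact(question, facts, threshold=0.2):
    """Return (fact_id, score) of the most similar fact, or (None, score) if below threshold."""
    q = embed(question)
    scored = [(fact_id, cosine(q, embed(text))) for fact_id, text in facts.items()]
    fact_id, score = max(scored, key=lambda pair: pair[1])
    return (fact_id, score) if score >= threshold else (None, score)


facts = {
    "6.01": "The Montreal Hotel & Suites offers a range of convenient amenities",
    "6.03": "Breakfast is available at an extra cost.",
}
print(best_fact("Is breakfast available?", facts))   # matches fact 6.03
print(best_fact("Where can I park my car?", facts))  # no word overlap: (None, 0.0)
```

The stand-in only works when the question shares words with a fact: “Do you serve food in the morning?” shares no words with the breakfast fact, so this sketch cannot match it. That gap is exactly what dense embeddings from a large language model are meant to close.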
As you can see, I’m very eager to get to the bottom of the “intent vs. intent-less” claims and debate, because I get the feeling that the premise (that intents don’t scale) is really a question of the tooling used to manage the underlying data.
A lot of platforms have recognised this and are starting to make it much easier to address the long tail of NLU by making intent data management more powerful. Specifically, some of the leading conversational AI platforms (Cognigy, Nuance Mix, Kore.ai, etc.) recognise that intents need to evolve beyond the flat lists they are today if we want to make better use of them.
This is leading to what I’d call “intent augmentation” features and capabilities being added to unlock more value around intents:
- Hierarchical and nested intents
- Intents that can be switched on & off
- Thresholds that can be set on a per-intent level
- Sub-patterns within intents, sub-intents and follow-up intents
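Two of these features, on/off switches and per-intent thresholds, are easy to picture as a post-processing step on a classifier's confidence scores. The sketch below is not any particular platform's implementation; `IntentConfig`, `resolve_intent`, the intent names, and the 0.5 default threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntentConfig:
    enabled: bool = True    # intent can be switched on or off
    threshold: float = 0.5  # minimum confidence required for this intent only


def resolve_intent(scores, configs, fallback="fallback"):
    """Pick the best-scoring intent that is enabled and clears its own threshold.

    scores:  {intent_name: confidence} from some upstream classifier
    configs: {intent_name: IntentConfig}; missing entries use the defaults
    """
    candidates = []
    for intent, score in scores.items():
        config = configs.get(intent, IntentConfig())
        if config.enabled and score >= config.threshold:
            candidates.append((score, intent))
    return max(candidates)[1] if candidates else fallback


configs = {
    "cancel_booking": IntentConfig(threshold=0.8),  # costly to get wrong: stricter
    "spa_offers": IntentConfig(enabled=False),      # e.g. spa closed, switched off
}
scores = {"cancel_booking": 0.72, "spa_offers": 0.9, "breakfast_info": 0.55}
print(resolve_intent(scores, configs))  # breakfast_info
```

Raising the threshold only on an intent that is costly to get wrong keeps the rest of the model permissive, which is the kind of fine-grained control a flat intent list with one global threshold lacks.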
Even more exciting is the availability of solutions focused on the data management and scalability of NLU itself. HumanFirst is one of the leading platforms in this category: it provides a no-code data engineering solution that integrates with conversational AI platforms (like Rasa, Dialogflow, etc.) and allows teams to build NLU models from the bottom up (starting from the raw unstructured data).
It offers aided workflows that allow continuous improvement and scalability, for example:
- Intent Merging
- Intent Splitting
- Built-In Model Evaluation Reports
- Intent Disambiguation, etc.
HumanFirst claims that its customers are able to scale NLU models to hundreds or even thousands of intents and entities this way, while continuing to use their existing intent-based frameworks, like Dialogflow or Watson, for dialog management and business logic.
I’ve been exploring a number of platforms recently, and I’m convinced that the pain that led to solutions like OpenBook comes from their developers having had to build and scale intent-based NLU models using tools like Excel in the past. This is indeed painful, as most conversation designers will know 🙂.
I’m not convinced that moving from Excel to a markdown format (as OpenBook requires) is any improvement.
In fact, I’d argue that maintaining the training data in a markdown word format will be waaaaaaay more painful than Excel at scale.
For example, it appears that continuously improving the Book using new unstructured data (for example, from the questions users ask the bot once deployed) requires copy-pasting from the conversation logs into this markdown format. This is not only a very painful methodology, but it also doesn’t solve the problem of identifying new facts that should be part of the knowledge base and introducing them into the Book without creating ambiguity (which is where a tool like HumanFirst shines).
I was hoping to get more answers and clarification during the VoiceLunch presentation of OpenBook; however, the talk was less technical than I had hoped, so I’ll need to play around with the beta before I can post an update. 🙂
- Cobus Greyling - Medium: Read writing from Cobus Greyling on Medium. NLP/NLU, Chatbots, Voice, Conversational UI/UX, CX Designer, Developer…
- Build Chatbots | Chatbot for Developers: Botpress is a modern developer stack to build enterprise and open-source chatbots. Learn about our conversational AI…
- Read the Botpress OpenBook whitepaper.
- No-code tooling for NLU: The complete productivity suite to transform natural language into business insights and AI training data
- Watson Knowledge Studio - Overview: Teach IBM Watson the language of your domain with custom models that identify entities and relationships unique to your…
- Knowledge Base Actions: Knowledge base actions enable you to handle the following kind of conversations: A common problem in conversational AI…