Botpress Is Releasing OpenBook Private Beta Within The Next Few Weeks

Botpress Recently Announced The Release Of OpenBook, A “Next-Generation NLU Engine”

Cobus Greyling
8 min read · May 10, 2022

Introduction

Botpress recently announced the release of OpenBook, a “next-generation NLU engine”…

Currently we can’t play with OpenBook, as the private beta only opens in a few weeks’ time. The only information available is the OpenBook whitepaper.

What’s clear is that with this announcement, Botpress is taking a firm stance in affirming an “intent-less” NLU paradigm shift (which reminds me of the similarly bold stance Rasa took with regard to intent-less dialog management).

https://openbook.botpress.com

Most notably, the following argument for going in this direction is at the top of their website marketing copy:

Intents don’t scale.
Since intent-based engines are classifiers, each intent directly impacts the performance of all other intents. The maintenance efforts are exponential with the number of intents.

Data Preparation & Practicalities

I’ve explored knowledge-based information-retrieval approaches in the past (Watson Discovery, MindMeld, etc.) that are truly intent-less; however, after reviewing OpenBook’s whitepaper, I was left with more questions than answers… questions I hope will be cleared up once I receive my private beta invite in a few weeks’ time 🙂

Specifically, the whitepaper states:

Unlike the intent-based approaches, chatbots built on OpenBook do not require the input of questions, their variations, and various answers to function. Instead, chatbots built on OpenBook can be deployed as soon as a book is provided to the platform.

It sounds like building a “Book” is less work than building a list or taxonomy of intents… however, in practice, I’m not so sure!

Looking at the instructions for creating the Book used in their benchmarks, we see the following (with examples):

If I’m understanding correctly:

  • Building a Book requires adding at least 3 examples/variations per “fact”,
  • As well as an additional 5 examples/variations per “combined fact”,
  • All with the correct markdown formatting.

There are 151 facts in the demo Book — this means that building a Book that will be useful requires formatting and creating a minimum of 450 variations, with an additional 145 combined fact variations (for a total of 600+ utterances).
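As a quick sanity check on those numbers (the fact and variation counts come from the instructions above; the ~29 combined facts is my own assumption, inferred from 145 ÷ 5):

```python
# Quick sanity check on the authoring effort for the demo Book.
# Fact and variation counts come from the article above; the number of
# combined facts (~29) is my own assumption, inferred from 145 / 5.
facts = 151
variations_per_fact = 3
combined_facts = 29                  # assumption: 145 combined-fact variations / 5 each
variations_per_combined_fact = 5

fact_variations = facts * variations_per_fact                        # 453
combined_variations = combined_facts * variations_per_combined_fact  # 145
total = fact_variations + combined_variations                        # 598, i.e. roughly 600

print(fact_variations, combined_variations, total)
```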

It’s not clear to me how this solves, or differs from, managing 150 intents and their accompanying training examples. Specifically, the marketing statement that “the maintenance efforts are exponential with the number of intents” feels somewhat misleading, since from what I gather, OpenBook’s “facts” are essentially equivalent to “intents” (in terms of data structure) and require the same amount of effort.

Benchmarking

Also not clear is whether the benchmark numbers provided by Botpress take into account the time required to create and format this training data.

I’m fairly certain that if I were provided the data ready in CSV format, loading it into something like Amazon Lex, DialogFlow, Watson or any other intent-based NLU provider could be done in a few minutes.
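To make that concrete, here is a minimal sketch of what that import boils down to; the CSV columns and file name are hypothetical, and each vendor layers its own bulk-import API or console upload on top of this structure:

```python
import csv
from collections import defaultdict

# Hypothetical CSV with two columns, "utterance" and "intent" -- group the
# utterances by intent, which is the shape every intent-based NLU provider
# (Lex, Dialogflow, Watson, ...) expects for a bulk import.
intents = defaultdict(list)
with open("training_data.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        intents[row["intent"]].append(row["utterance"])

for name, utterances in intents.items():
    print(f"{name}: {len(utterances)} training phrases")
    # From here it is one API call (or console upload) per intent,
    # e.g. create_intent in the Dialogflow ES and Watson Assistant SDKs.
```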

OpenBook’s NLU is probably powered by some kind of semantic similarity, since they say it’s based on large language models. They are definitely not the first to provide this kind of capability; however, I do imagine the performance can lead to improved, or at least different, results compared to traditional “trained” NLU models.
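As an illustration of what such an intent-less lookup could look like, here is a minimal retrieval sketch using semantic similarity with the sentence-transformers library; this is my own guess at the general technique, not OpenBook’s actual implementation:

```python
from sentence_transformers import SentenceTransformer, util

# A generic semantic-similarity baseline: embed the "facts" once, then answer
# a user question by returning the closest fact. This is NOT OpenBook's
# implementation, just an illustration of the general technique.
model = SentenceTransformer("all-MiniLM-L6-v2")

facts = [
    "Our support line is open weekdays from 9am to 5pm.",
    "Refunds are processed within 5 business days.",
    "The premium plan includes priority support.",
]
fact_embeddings = model.encode(facts, convert_to_tensor=True)

question = "When can I call support?"
question_embedding = model.encode(question, convert_to_tensor=True)

scores = util.cos_sim(question_embedding, fact_embeddings)[0]
best = scores.argmax().item()
print(facts[best], float(scores[best]))
```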

As you can see, I’m very eager to get to the bottom of the “intent vs. intent-less” claims and debate, because I get the feeling that the premise (that intents don’t scale) is really a question of the tooling used to manage the underlying data.

https://www.humanfirst.ai

Intent Management

A lot of platforms have recognised this and are starting to make it much easier to address the longer tail of NLU by making intent data management more powerful. Specifically, some of the leading conversational AI platforms (Cognigy, Nuance Mix, Kore.ai, etc.) recognise that intents need to evolve beyond the flat lists they are today.

To make better use of them, what I’d call “intent augmentation” features and capabilities are being added to unlock more value around intents (a hypothetical schema is sketched after this list):

  • Hierarchical and nested intents
  • Intents that can be switched on & off
  • Weights
  • Thresholds that can be set on a per-intent level
  • Sub-patterns within intents, sub-intents and follow-up intents
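As a sketch of what such an augmented intent schema might look like (the field names below are my own invention, not any specific vendor’s format):

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema illustrating "intent augmentation": hierarchy, per-intent
# thresholds, weights, on/off switches and follow-ups. The field names are
# invented for illustration; Cognigy, Nuance Mix, Kore.ai etc. each have their
# own representation.
@dataclass
class Intent:
    name: str
    examples: List[str]
    enabled: bool = True                 # intents that can be switched on & off
    weight: float = 1.0                  # relative weight in scoring
    confidence_threshold: float = 0.6    # threshold set on a per-intent level
    parent: Optional[str] = None         # hierarchical / nested intents
    followups: List[str] = field(default_factory=list)  # follow-up intents

refund = Intent(
    name="billing.refund",
    examples=["I want my money back", "How do I get a refund?"],
    parent="billing",
    confidence_threshold=0.75,
    followups=["billing.refund.status"],
)
```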

Even more exciting is the availability of solutions focused on solving the data management and scalability of NLU itself. HumanFirst is one of the leading platforms in this category, and provides a no-code data engineering solution that integrates with conversational AI platforms (like Rasa, DialogFlow, etc.) and allows teams to build NLU models from the bottom up (starting from the raw unstructured data).

It also provides aided workflows that allow continuous improvement and scalability, for example (a toy illustration follows this list):

  • Intent Merging,
  • Intent Splitting,
  • Built-In Model Evaluation Reports,
  • Intent Disambiguation etc.
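Under the hood, these are data operations over the training examples; here is a toy illustration of what an intent merge boils down to (the intent names and data structure are invented for illustration):

```python
# Toy illustration of "intent merging" as a data operation: two intents that
# keep getting confused are collapsed into one, pooling and de-duplicating
# their training examples. The intent names and structure are invented.
training_data = {
    "cancel_order": ["cancel my order", "I want to cancel"],
    "cancel_subscription": ["cancel my subscription", "stop my plan", "I want to cancel"],
}

def merge_intents(data, sources, target):
    merged = []
    for name in sources:
        merged.extend(data.pop(name, []))
    data[target] = list(dict.fromkeys(merged))  # de-duplicate, keep order
    return data

merge_intents(training_data, ["cancel_order", "cancel_subscription"], "cancel")
print(training_data)
```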

HumanFirst claims that their customers are able to scale NLU models to hundreds or thousands of intents and entities this way, while continuing to use their existing intent-based frameworks like DialogFlow or Watson for dialog management and business logic.

Conclusion

I’ve been exploring a number of platforms recently, and I’m convinced that the pain that led to solutions like OpenBook comes from their developers having had to build and scale intent-based NLU models using tools like Excel in the past. This is indeed painful, as most conversation designers will know 🙂.

I’m not convinced that moving from Excel to a markdown format (like OpenBook requires) is any improvement.

In fact, I’d argue that maintaining the training data in a markdown format will be waaaaaaay more painful than Excel at scale.

For example, it appears that continuously improving the Book with new unstructured data (for example, the questions users ask the bot once it’s deployed) requires copy-pasting from the conversation logs into this markdown format. This is not only a very painful methodology; it also doesn’t solve the problem of identifying new facts that should be part of the knowledge, and introducing them into the Book without creating ambiguity (which is where a tool like HumanFirst shines).

I was curious to get more answers and clarification during the VoiceLunch presentation of OpenBook; however, it was a less technical talk than I had hoped, so I think I’ll need to play around with the beta before I can post an update 🙂

If you’re curious about OpenBook, the Botpress team is holding a webinar to explain how it works and why it’s game-changing: if you attend, I’d love your feedback and thoughts!

Read the Botpress OpenBook whitepaper.
