The Cobus Quadrant™ Of NLU Design

NLU design is vital to planning and continuously improving Conversational AI experiences.

What Is NLU Design?

NLU Design is an end-to-end methodology to transform unstructured data into highly accurate and custom NLU.

1️⃣ Explore

Centralise Unstructured Data (Conversational, Utterances, Documents). Designing highly quality NLU starts from the ground truth: unstructured data. The ability to ingest and centralise different types of data and formats (including multi-turn conversational dialogs) is critical.

2️⃣ Design

Human-In-The-Loop (HITL) Intent & Entity Discovery & ML-Assisted Labelling. Human-In-The-Loop training helps with the initial labelling of clusters which can be leveraged for future unsupervised clustering. Machine Assisted Labelling fast-tracks manual tasks.

3️⃣ Evaluate

Model Evaluation & Fine-Tuning Results involves the ability to generate test a trained model’s performance (using metrics like F1 score, accuracy etc) against any number of NLU providers, using techniques like K-fold split and test datasets.

4️⃣ Improve

Intent Refactoring Flows (Merge, Split). Intents needs to be flexible, in terms of splitting intents, merging, or creating sub/nested intents, etc. The ability to re-use and import existing labeled data across projects also leads to high-quality data.


HumanFirst is the only tool focused on providing the full NLU design end-to-end capabilities; with a pluggable data pipeline it allows teams to integrate different NLU providers (for model training and evaluation) as well as ability to incorporate large language models to power the core ML-assisted workflows (like semantic search & clustering).

Snorkel AI

Snorkel AI has a programatic approach to data exploration and labelling. Their focus is to accelerate time to value with a transformative programmatic approach to data labelling.

Rasa X

Rasa X serves as a NLU inbox for reviewing customer conversations, filtering conversations on set criteria and annotation of entities and intents.


Cognigy has an intent analyser where intent training records can be imported. With a Human-In-The-Loop approach, records can be manually added to an intent, skipped or ignored. Export and import of the Intent Trainer records are possible by date range.

Nuance Mix

Nuance Mix auto-intent functionality analyse and group semantically similar sentences. In turn these clusters can be examined by the user by accepting or rejecting entries by visual inspection.

Oracle Digital Assistant

Documents from HTML and PDF can be uploaded and intents defined. Intent names are auto-generated together with a list of auto-generated utterances for each intent. The auto-generated sentences for each identified intent reminds of Yellow AI’s DynamicNLP.

IBM Watson

What I like about the IBM Watson approach is the ease of supervision by the user. Data can be uploaded in bulk, but the inspecting and adding of recommendations are manual allowing for a consistent and controlled augmentation of the skill.

DialogFlow CX

DialogFlow CX has a built-in test feature to assist in finding bugs and prevent regressions. Test cases can be created using the simulator to define the desired outcomes.

Kore AI

Kore AI has a batch testing facility and a dashboard displaying test summary results for test coverage, performance and training recommendations. Multiple test suites can be used for validations of intent identification capabilities of a NLU model.

Amazon Lex

The two big disadvantages of Lex V2 intent detection implementation is data size, 10,000 records are required. Added to this, data must be in a Contact Lens output files JSON format.


Dashbot is pivoting from a reporting tool to a data discovery tool focussing on analysing customer conversations and clustering those conversations into semantically similar clusters with a visual representation of those clusters.


QBox focusses on analysis and benchmarking chatbot training data. Training data can be visualised to gain insights into how NLP data is affecting the NLP model.


Botium focusses on testing in the form of regression, end-to-end, voice, security and NLU performance.

Yellow AI

Yellow AI does have test and comparison capabilities for intents and entities, however it does not seem as advanced as competing frameworks like Cognigy or Kore AI.

In Closing

The aim of this comparison is to explore the intersection of NLU design and the tools which are out there. Some of the frameworks are very much closed and there are areas where I made assumptions.



Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; NLP/NLU/LLM, Chat/Voicebots, CCAI.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Cobus Greyling

Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; NLP/NLU/LLM, Chat/Voicebots, CCAI.