Implementing Data-Centric AI For NLU Models

Andrew Ng coined and is championing the concept of Data-Centric AI: the discipline of engineering the input data for AI models. The same principles apply to discovering and structuring NLU training data.

Cobus Greyling
Nov 28, 2022

Introduction

In Conversational AI, the development of chatbots and voicebots has focused heavily on frameworks, conversation design and NLU benchmarking.

Development frameworks have reached high efficiency in conversation state development and conversation design, and an increasing number of vendors agree that the differentiation between NLU models is becoming negligible.

This raises the question: how can the current state of platform parity be broken and true CX differentiation achieved?

The answer lies in a data-centric approach to creating NLU training data…

Data-Centric Intent Discovery & Development

Chatbot development is in dire need of a data-centric approach, where laser focus is given to selecting unstructured data and turning it into NLU design and training data.

Chatbots fail primarily for two reasons. The first is that developed intents are not aligned with user intents. The second is that intents are not flexible; you need to be able to do the following easily and on an ongoing basis:
▪️ Merge intents
▪️ Split intents
▪️ Create hierarchical or nested intents
▪️ Discover and maintain intents

A data-centric approach to chatbot development begins with defining intents based on existing customer conversations. An intent is in essence a grouping or cluster of semantically similar utterances or sentences. The intent name is the label describing the cluster or grouping of utterances.

The example shown here creates clusters of semantically similar text using Cohere embeddings. The output is a list of recently published AI papers.

Various tools can create these groupings or clusters; Cohere embeddings, used in the example above, are only one option.
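The grouping idea itself can be sketched without any particular tool. The snippet below is a minimal illustration only: the `embed` function is a bag-of-words stand-in for a real embedding model (it is not the Cohere API), and the greedy clustering rule, the `threshold` value and the sample utterances are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(utterance):
    """Stand-in embedding: a bag-of-words count vector (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z']+", utterance.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_utterances(utterances, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough, else start a new one."""
    clusters = []  # each cluster is a list of utterances; element 0 is the seed
    for utterance in utterances:
        vector = embed(utterance)
        for cluster in clusters:
            if cosine(vector, embed(cluster[0])) >= threshold:
                cluster.append(utterance)
                break
        else:
            clusters.append([utterance])
    return clusters


# Hypothetical customer utterances.
conversations = [
    "I want to cancel my order",
    "please cancel my order",
    "where is my refund",
    "when is my refund coming",
]
clusters = cluster_utterances(conversations)

# The intent name (the label) is still supplied by a person looking at each cluster.
for number, cluster in enumerate(clusters, start=1):
    print(f"cluster {number}: {cluster}")
```

With a real embedding model in place of `embed`, utterances that share meaning but not vocabulary would also land in the same cluster, which is the point of using semantic embeddings.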

Another graphical tool for exploring and saving similar sentences is Bulk.

Below is an example of Bulk showing how a cluster can be graphically selected and the designated sentences displayed. The utterances in the selection constitute an intent, and the grouping can be saved as part of the engineering process of structuring NLU training data.
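The select-and-save step can be pictured in code as well. This is a rough sketch and not how Bulk is implemented: Bulk is an interactive tool, whereas here a plain rectangle stands in for the graphical selection, and the `Point` class, the coordinates and the intent name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """One utterance placed on a 2-D projection of its embedding."""
    x: float
    y: float
    text: str


def select_region(points, x_min, x_max, y_min, y_max):
    """Return the utterances whose coordinates fall inside the rectangle."""
    return [p.text for p in points if x_min <= p.x <= x_max and y_min <= p.y <= y_max]


# Made-up coordinates standing in for a real 2-D projection.
points = [
    Point(0.10, 0.20, "reset my password"),
    Point(0.15, 0.25, "I forgot my password"),
    Point(0.90, 0.80, "what are your opening hours"),
]

training_data = {}
# The intent name is chosen by the person making the selection.
training_data["password_reset"] = select_region(points, 0.0, 0.5, 0.0, 0.5)
print(training_data)
```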

⭐️ Please follow me on LinkedIn for updates on Conversational AI ⭐️

Considering the image below, the process of creating intents from existing conversational data increases the overlap of existing customer conversations (customer intents) with developed intents. Alignment between these two elements is crucial for a successful Conversational AI deployment.

Human-In-The-Loop Intent Management

Intents are indeed the frontline of any chatbot implementation and define which conversations users can have. For reasons of efficiency and scalability, intent creation and management demand an accelerated latent space where an AI-assisted weak-supervision approach can be followed.
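One simple reading of AI-assisted labelling with a human in the loop is sketched below: the machine proposes an intent for each new utterance, and a person accepts or rejects the proposal. The token-overlap (Jaccard) heuristic, the `min_score` cut-off, the `approve` callback and the sample data are assumptions for illustration; a production tool would more likely rely on embeddings.

```python
import re


def tokens(text):
    """Lower-cased word set of an utterance."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of words two utterances have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_intent(utterance, labelled, min_score=0.2):
    """Return (intent, score) for the closest labelled example, or (None, score) if too weak."""
    best_intent, best_score = None, 0.0
    for intent, examples in labelled.items():
        for example in examples:
            score = jaccard(tokens(utterance), tokens(example))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < min_score:
        return None, best_score
    return best_intent, best_score


def review(utterances, labelled, approve):
    """Add an utterance to an intent only when the human reviewer approves the suggestion."""
    unresolved = []
    for utterance in utterances:
        intent, _ = suggest_intent(utterance, labelled)
        if intent is not None and approve(utterance, intent):
            labelled[intent].append(utterance)
        else:
            unresolved.append(utterance)
    return unresolved


# Hypothetical starting intents and incoming utterances.
labelled = {
    "cancel_order": ["cancel my order"],
    "track_order": ["where is my parcel"],
}
incoming = ["please cancel my order now", "where is my parcel today", "do you sell gift cards"]

# Stand-in for a person clicking accept or reject; here every suggestion is accepted.
unresolved = review(incoming, labelled, approve=lambda utterance, intent: True)
print(labelled)
print(unresolved)
```

Utterances left in `unresolved` match no existing intent well enough, which makes them natural candidates for intent discovery.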

Intent management is an ongoing task and necessitates an accelerated no-code latent space where data-centric best practices can be implemented.

As seen in the image above, intent management is not only about managing labels and training data, but also about managing the intents themselves. This includes splitting intents, merging them, arranging them in hierarchies and moving them.
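The operations named above can be made concrete with a small data structure. The `IntentStore` class below is a hypothetical in-memory sketch, not the data model of any particular product; it keeps each intent's utterances plus an optional parent link to express the hierarchy.

```python
class IntentStore:
    """Hypothetical store: intent name -> utterances, plus optional parent links."""

    def __init__(self):
        self.utterances = {}
        self.parent = {}

    def add(self, name, examples, parent=None):
        self.utterances[name] = list(examples)
        self.parent[name] = parent

    def merge(self, sources, target):
        """Combine several intents into one top-level intent, keeping every utterance."""
        merged = []
        for name in sources:
            merged.extend(self.utterances.pop(name))
            self.parent.pop(name)
        self.add(target, merged)

    def split(self, source, child, belongs_to_child):
        """Move matching utterances into a new intent nested under the source."""
        moved = [u for u in self.utterances[source] if belongs_to_child(u)]
        self.utterances[source] = [u for u in self.utterances[source] if not belongs_to_child(u)]
        self.add(child, moved, parent=source)

    def move(self, name, new_parent):
        """Re-home an intent under a different parent in the hierarchy."""
        self.parent[name] = new_parent


# Made-up intents and utterances.
store = IntentStore()
store.add("billing", ["my bill is wrong", "refund my last payment", "I want a refund"])
store.add("invoice_copy", ["send me my invoice"])
store.add("invoice_resend", ["resend the invoice please"])

store.merge(["invoice_copy", "invoice_resend"], "invoice_request")
store.split("billing", "refund", lambda u: "refund" in u)
store.move("invoice_request", "billing")

print(store.utterances)
print(store.parent)
```

Because every operation only moves utterances between labels, none of the training data is lost when the intent structure changes.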

An ongoing process of NLU design and intent management ensures that the intent layer of a Conversational AI implementation remains flexible and adapts to users’ conversations.

I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language, ranging from LLMs, Chatbots, Voicebots and Development Frameworks to Data-Centric latent spaces and more.

https://www.linkedin.com/in/cobusgreyling
