Intents Are Not Going Away…RoNID Is A New Intent Discovery Framework

The Robust New Intent Discovery (RoNID) framework strives to identify known intents and reasonably deduce new intent groups in open domain scenarios.

Apr 26, 2024

The Problem

Traditional chatbot systems rely heavily on intents. These intents are based solely on pre-defined (often thought-up) assumptions about the conversations users are believed to want to have.

Hence traditional intent models can only identify pre-defined, constrained intent classes. Attempts to remedy this limitation have included out-of-domain detection and knowledge base fallback, which has more recently morphed into RAG approaches.

New user intents continuously emerge from customer-facing implementations. They often arise from new products and services introduced by an organisation, or from system failures, product defects, problems with product or service onboarding, and more.

These new intents need to be dynamically uncovered and clustered. RoNID aims to create a framework, including RLHF via weak supervision, in which new intents are identified and pseudo-labels are confirmed.

Introduction

The study focusses on establishing reliable pseudo-labels and obtaining cluster-friendly discriminative representations.

The two modules used are:

  1. Reliable pseudo-label generation module, and
  2. Cluster-friendly representation learning module.

RoNID generates reliable synthetic labels (pseudo-labels) and cluster-friendly representations.

In simpler terms, RoNID creates accurate labels and organises data in a way that makes it easier to understand. This is done through two main steps:

Label Generation

RoNID assigns accurate pseudo-labels to the data by solving an optimal transport problem; these labels provide clear guidance for the representation learning step that follows.
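One way to picture pseudo-label generation as an optimisation problem is to treat it as a balanced assignment (optimal transport) between utterances and clusters. The sketch below is a minimal illustration using Sinkhorn-Knopp iterations. The function name, the toy scores, the `epsilon` value and the equal-cluster-size constraint are all assumptions made for illustration; this is not the study's implementation.

```python
import math

def sinkhorn_pseudo_labels(scores, n_iters=50, epsilon=0.5):
    """Turn an N x K score matrix into balanced soft assignments.

    Rows are utterances, columns are clusters. After the loop each row
    sums to 1 and each column sums to roughly N / K (assumed equal sizes).
    """
    n, k = len(scores), len(scores[0])
    # Exponentiate scores: a higher score means a cheaper assignment.
    q = [[math.exp(s / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: every cluster receives the same total mass n / k.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * (n / k) / col_sums[j] for j in range(k)]
             for i in range(n)]
        # Row step: every utterance distributes a total mass of 1.
        normalised = []
        for row in q:
            total = sum(row)
            normalised.append([v / total for v in row])
        q = normalised
    # Hard pseudo-label = cluster with the largest share of the row.
    hard = [max(range(k), key=lambda j: row[j]) for row in q]
    return q, hard

# Hypothetical similarity scores for 4 utterances and 2 clusters.
scores = [[2.0, 0.0], [1.8, 0.2], [1.1, 0.9], [0.1, 1.9]]
naive = [max(range(2), key=lambda j: row[j]) for row in scores]
soft, hard = sinkhorn_pseudo_labels(scores)
print(naive)  # [0, 0, 0, 1]
print(hard)   # [0, 0, 1, 1]
```

A plain argmax places three of the four utterances in the first cluster. The balanced assignment moves the borderline third utterance to the second cluster, which is the kind of correction that can stop one cluster from absorbing everything.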

Representation Learning

RoNID organises the data so that similar items are grouped together (intra-cluster compactness), while different groups are well-separated (inter-cluster separation). This step makes it easier to see patterns and differences in the data.
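To make compactness and separation concrete, the sketch below measures both on a handful of 2D points. The intent names, the coordinates and the two helper metrics are hypothetical and chosen only for illustration; real utterance embeddings have hundreds of dimensions, and the study may measure these properties differently.

```python
import math

def centroid(points):
    """Mean point of a cluster."""
    return [sum(col) / len(points) for col in zip(*points)]

def intra_cluster_distance(clusters):
    """Mean distance from each point to its own centroid (lower = more compact)."""
    total, count = 0.0, 0
    for points in clusters.values():
        c = centroid(points)
        total += sum(math.dist(p, c) for p in points)
        count += len(points)
    return total / count

def inter_cluster_distance(clusters):
    """Mean distance between centroids of different clusters (higher = better separated)."""
    cents = [centroid(points) for points in clusters.values()]
    pairs = [(a, b) for i, a in enumerate(cents) for b in cents[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical embeddings: one known intent and one newly discovered intent.
clusters = {
    "check_balance": [[0.0, 0.0], [0.0, 2.0]],
    "new_intent": [[4.0, 0.0], [4.0, 2.0]],
}
print(intra_cluster_distance(clusters))  # 1.0
print(inter_cluster_distance(clusters))  # 4.0
```

A cluster-friendly representation pushes the first number down and the second number up.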

By repeating these steps, RoNID creates a reliable model with accurate labels and well-organised data. Tests show that this method outperforms previous techniques by a large margin, improving results by 1 to 4 points on various benchmarks.

Intent & Dialog

Accurately understanding and identifying user intent is important to downstream task-oriented dialogue systems, and has a direct influence on the user experience. If the intent is identified incorrectly, the conversation flow presented to the user does not match the user’s intent.

Subsequently, the user tries to digress from one flow to another; if digression is not planned for, the user becomes even more frustrated.

Out-Of-Domain

The image below, from the study…

Scenario (a), at the top, shows how known and novel (new) intents are grouped together. Scenario (b) shows the RoNID approach, where known and new/unknown intents are separated based on reliable pseudo-labels and cluster representations.

NID

Semi-supervised NID typically employs the k-means algorithm for pseudo-label assignment and learns discriminative intent features.
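For reference, k-means pseudo-labelling can be sketched in a few lines. This is a bare-bones version with fixed starting centroids so that the result is deterministic; the toy embeddings and starting points are assumptions, and production systems would normally use a library implementation on real sentence embeddings.

```python
import math

def kmeans_pseudo_labels(embeddings, init_centroids, n_iters=10):
    """Assign each embedding the index of its nearest centroid as a pseudo-label."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(n_iters):
        # Assignment step: the nearest centroid becomes the pseudo-label.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
            for x in embeddings
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [x for x, label in zip(embeddings, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2D utterance embeddings forming two loose groups.
embeddings = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
labels, centroids = kmeans_pseudo_labels(embeddings, [[0.0, 0.0], [1.0, 1.0]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Nothing in k-means constrains cluster sizes or checks label quality, so noisy assignments feed straight into training. That is the weakness reliable pseudo-label generation is meant to address.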

The RoNID framework obtains dependable pseudo-labels by solving an optimal transport problem in one step; in the other step, it learns to organise data into well-separated clusters by combining intra-cluster and inter-cluster contrastive losses.

Finally

In this study, the researchers introduce an EM-optimised RoNID framework for the NID problem. It consists of two main parts: a reliable pseudo-label generation module and a cluster-friendly representation learning module.

The pseudo-label generation module ensures accurate supervision by solving an optimal transport problem to assign precise pseudo-labels. The representation learning module enhances the quality of representations by focusing on both intra-cluster compactness and inter-cluster separation. This helps in distinguishing between known and novel intents.

Their experiments show that RoNID is effective and performs better than previous state-of-the-art methods.

RoNID uses an iterative approach to improve model performance by creating reliable pseudo-labels and organising data into clusters.

The method involves three main steps:

  1. First, pre-train a feature extractor using both labelled and unlabelled data for better knowledge transfer.
  2. Then, enhance the accuracy of pseudo-labels by solving an optimal transport problem.
  3. Finally, introduce intra-cluster and inter-cluster contrastive learning to create distinct clusters of representations for both known and novel intents.
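The contrastive step can be illustrated with a simplified, InfoNCE-style loss in which utterances sharing a pseudo-label act as positives and all others act as negatives. This is an assumed stand-in and not the study's exact loss formulation; the function names, the temperature and the toy embeddings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster_contrastive_loss(embeddings, labels, temperature=0.5):
    """Mean over anchors of -log(positive similarity mass / total similarity mass)."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos, total = 0.0, 0.0
        for j, (other, other_label) in enumerate(zip(embeddings, labels)):
            if i == j:
                continue  # an anchor is never compared with itself
            weight = math.exp(cosine(anchor, other) / temperature)
            total += weight
            if other_label == label:
                pos += weight  # same pseudo-label: pull together
        if pos > 0:  # skip anchors with no same-cluster partner
            losses.append(-math.log(pos / total))
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical embeddings: two points near each axis.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
good = cluster_contrastive_loss(embeddings, [0, 0, 1, 1])
bad = cluster_contrastive_loss(embeddings, [0, 1, 0, 1])
print(good < bad)  # True
```

The loss is low when pseudo-labels agree with the geometry of the embeddings and high when they do not, which is why the quality of the pseudo-labels matters so much for this step.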

To provide high-quality supervision signals for the representation learning module, the study proposes generating reliable pseudo-labels to guide model training, thereby transforming unsupervised training samples into pseudo-supervised samples.

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI. www.cobusgreyling.com