Solving For The Long Tail Of Intent Distribution

The long tail of intent distribution can be successfully addressed by leveraging the first two steps of NLU Design

Cobus Greyling
4 min read · Nov 16, 2022


General Assumptions

The discipline of NLU Design addresses a number of key challenges in developing a virtual assistant.

One of these key challenges is solving for the long tail of NLU in general, and the long tail of intent distribution in particular.

There is a school of thought that long tail intents can be neglected and dismissed as edge cases.

The argument is made that solving for long tail intents is not really feasible and energy should rather be focussed on the well-worn intents and conversation paths most users will follow.


These assumptions are wrong for the following reasons…

Firstly, considering the 2x2 matrix below, the long tail constitutes a high percentage of conversational intents within the chatbot domain. Hence the long tail represents a significant number of the intents (conversations) users want to have.

Secondly, the long tail already exists within the organisation's current customer conversations, but the appropriate methods and technologies are not being used to detect those intents and include them when designing NLU training data.
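To make the first point concrete, here is a minimal sketch of how the weight of the long tail could be quantified once conversations carry intent labels. The function name `tail_share`, the intent names and the counts are all hypothetical and made up purely for illustration; they are not measurements from any real dataset.

```python
from collections import Counter


def tail_share(intent_labels, head_size):
    """Return how many intents fall outside the top `head_size` intents,
    and what share of all conversations those tail intents account for."""
    counts = Counter(intent_labels)
    ranked = counts.most_common()  # (intent, count) pairs, most frequent first
    total = sum(counts.values())
    head_total = sum(n for _, n in ranked[:head_size])
    return {
        "intents_in_tail": max(len(ranked) - head_size, 0),
        "tail_traffic_share": (total - head_total) / total if total else 0.0,
    }


# Illustrative, invented data: two well-worn intents and six rarer ones.
labels = ["reset_password"] * 40 + ["track_order"] * 30
for rare_intent in ["update_address", "gift_card", "invoice_copy",
                    "change_email", "loyalty_points", "store_hours"]:
    labels += [rare_intent] * 5

report = tail_share(labels, head_size=2)
print(report)
```

In this invented sample, each tail intent is individually small, yet together the six of them make up 30 of the 100 conversations. That is the pattern to look for in real transcripts before dismissing the tail as edge cases.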

Some still see solving for the long tail of intents as difficult because, previously, the line between the short and long tail was artificially constrained by the inability of existing technology to understand context across different intents.

For example, bots with 30–100 intents were the norm, and these intents were usually thought up by the team, with user example phrases made up or artificially generated. Unfortunately, in most instances of chatbot development this is still the case.

With NLU Design, the long tail can be drastically shortened, or completely eradicated…here’s how: ⬇️

How To Leverage NLU Design

1️⃣ As I mentioned earlier, the long tail of intents, topics, semantic similarities, etc. are all embedded within existing customer conversations.

2️⃣ These conversations can be in the form of conversation transcripts, call logs, customer reviews, emails, surveys and more.

3️⃣ Centralised unstructured data (conversations, utterances, documents, etc.) can be explored using semantic search & unsupervised clustering.

4️⃣ Clusters of similarity are in essence intents, which can be scaled in cluster size and granularity. Clusters/intents should then be named, split, merged or organised according to hierarchies.

NLU Design ensures that ALL customer intents are detected, surfaced and quantified, while addressing the long tail of intent distribution in a complete and sustainable manner.

5️⃣ Intent & entity discovery & ML-assisted labelling can be performed as continued maintenance of the NLU model.
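The steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how any particular NLU Design tool works: bag-of-words cosine similarity stands in for real sentence embeddings, a greedy single-pass threshold clustering stands in for a proper unsupervised clustering algorithm, and the function names (`vectorize`, `cluster_utterances`, `suggest_intent`), the utterances and the intent names are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_utterances(utterances, threshold=0.4):
    """Greedy clustering: each utterance joins the cluster whose first
    utterance (the seed) is most similar, if that similarity reaches
    `threshold`; otherwise it starts a new cluster."""
    clusters = []  # each entry: {"seed": Counter, "members": [str, ...]}
    for text in utterances:
        vec = vectorize(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["seed"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"seed": vec, "members": [text]})
        else:
            best["members"].append(text)
    return [cluster["members"] for cluster in clusters]


def suggest_intent(text, named_clusters, threshold=0.4):
    """Assisted labelling: suggest the named cluster containing the most
    similar utterance, or None if nothing is close enough (a candidate
    for a newly discovered intent)."""
    vec = vectorize(text)
    best_name, best_sim = None, threshold
    for name, members in named_clusters.items():
        sim = max(cosine(vec, vectorize(member)) for member in members)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


# Invented utterances standing in for transcripts, emails, reviews, etc.
utterances = [
    "I want to reset my password",
    "how do I reset my password",
    "reset password please",
    "where is my order",
    "track my order status",
    "cancel my subscription",
]

clusters = cluster_utterances(utterances, threshold=0.4)

# Step 4: a human names the discovered clusters (names are hypothetical).
named = dict(zip(["reset_password", "track_order", "cancel_subscription"],
                 clusters))

for name, members in named.items():
    print(name, members)
```

Raising `threshold` splits the data into more, smaller clusters, and lowering it merges them, which is the granularity control described in step 4. For step 5, `suggest_intent` proposes an existing intent for a new utterance and returns `None` when nothing is similar enough, flagging the utterance for review as a possible new intent.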


And Lastly

In most cases, chatbots fail because the business intents that were developed are not aligned with customer intents. The process I outlined here ensures that the starting point is indeed existing customer intents and the conversations customers want to have.

The process of NLU Design supersedes the antiquated process of gathering business requirements for a conversational experience.

I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.


