
Testing Complex Utterances With The Co:here & HumanFirst Integration

Intent Detection With Longer User Utterances

Cobus Greyling
5 min read · Jun 21, 2022



This is the second post I have written on the HumanFirst and Co:here integration; you can find the first post here.

For this prototype I used the ATIS (Airline Travel Information System) dataset from Kaggle. This dataset consists of 5,000 user utterances which are labeled with intents. Some utterances are labeled with hierarchical intents, in order to test how this feature is managed within HumanFirst Studio.

Also, within HumanFirst Studio I created two sets of tests:

  • The first test was based on the dataset with labeled intents.
  • The second test was performed on the same dataset with all intent labeling removed.


The objectives of this article are the following…

  • To steer away from simple, clean data and instead make use of longer, more complex user utterances with compound intents and entities. Below are a few example sentences from the dataset to illustrate the utterance types:

    i would like a list of round trip flights between indianapolis and orlando florida for the twenty seventh and the twenty eighth of december

    i would like to find out what flights there are on friday june eleventh from st. petersburg to milwaukee and then from milwaukee to tacoma thank you

    i would like to find a flight from kansas city to salt lake city on delta and arriving at about 8 o'clock in the evening could you please tell me the aircraft and the flight number thank you

    i want to travel from kansas city to chicago round trip leaving wednesday june sixteenth arriving in chicago at around 7 o'clock in the evening and returning the next day arriving in kansas city at around 7 o'clock in the evening which airlines fly that route
  • To test disambiguation within HumanFirst Studio.
  • The HumanFirst Studio supports hierarchical intents, and the objective was to put this feature to the test with the Co:here integration.
  • Lastly, testing the complex utterances labeled with hierarchical intents, followed by testing the utterances with all intent labels removed.

Observations From Prototyping

Here is a summary of observations from the prototyping…

  • The trend continues: Co:here returns an outright high-confidence intent for test utterances, while HumanFirst NLU identifies the same intent with a lower confidence percentage, followed by two or three alternatives, or false positives.
  • These false positives are useful for creating conversational context and disambiguation menus.
  • In evaluation, Co:here has better F1, precision and recall scores than HumanFirst NLU, but Co:here does not take intents with fewer than 10 utterances into consideration.
  • Co:here performs consistently well on intents with limited training data.
  • The Co:here clustering seemed to be more granular than that of HumanFirst NLU, with the same clustering configuration. This more refined clustering leads to greater accuracy in defining intents.
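As a reminder of what those evaluation scores measure, per-intent precision, recall and F1 can be computed directly from gold versus predicted labels. The sketch below uses plain Python and made-up toy labels; it is illustrative only, not how either platform computes its metrics internally.

```python
# Sketch: per-intent precision, recall and F1 from gold vs. predicted
# intent labels. The label lists below are invented toy data.

def intent_f1(gold, pred, intent):
    """Return (precision, recall, f1) for one intent."""
    tp = sum(1 for g, p in zip(gold, pred) if g == intent and p == intent)
    fp = sum(1 for g, p in zip(gold, pred) if g != intent and p == intent)
    fn = sum(1 for g, p in zip(gold, pred) if g == intent and p != intent)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["atis_flight", "atis_flight", "atis_ground_service", "atis_flight"]
pred = ["atis_flight", "atis_ground_service", "atis_ground_service", "atis_flight"]

# For atis_flight: precision 1.0, recall ≈ 0.67, F1 ≈ 0.8
print(intent_f1(gold, pred, "atis_flight"))
```

Scores like these are typically averaged across intents; dropping intents with very little training data (as Co:here does below 10 utterances) naturally shifts the averages.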

NLU Endpoint Query

In this specific comparison non-hierarchical intents were used, and again Co:here returns a single intent with high confidence, where HumanFirst NLU finds the same intent, but with additional alternatives, or false positives. Again, the false positives are useful for context, disambiguation and follow-up questions.

In both instances the atis_flight_no intent was found, with confidences of 99% for Co:here and 72% for HumanFirst NLU.
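One simple way to turn a ranked prediction list like this into a disambiguation menu is to keep any alternative whose confidence falls within a fixed margin of the top intent. The sketch below is my own illustration; the 0.20 margin and the confidence values are arbitrary assumptions, not anything prescribed by Co:here or HumanFirst.

```python
# Sketch: deriving disambiguation candidates from a ranked intent list.
# The margin and the confidence values are arbitrary illustrative choices.

def disambiguation_menu(predictions, margin=0.20):
    """predictions: list of (intent, confidence), sorted descending.
    Returns (top_intent, alternatives within the margin of the top)."""
    top_intent, top_conf = predictions[0]
    alternatives = [i for i, c in predictions[1:] if top_conf - c <= margin]
    return top_intent, alternatives

# Invented HumanFirst-style ranking: one match plus weaker alternatives.
hf_predictions = [
    ("atis_flight_no", 0.72),
    ("atis_flight", 0.61),
    ("atis_airline", 0.35),
]
print(disambiguation_menu(hf_predictions))
```

With a narrow, high-confidence ranking like Co:here's, the alternatives list stays empty and the bot can answer directly; a flatter ranking yields candidates for a follow-up question.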

The following example illustrates how well HumanFirst Studio handles hierarchical intents for both NLU environments. In the Co:here example the primary intent atis_ground_service is identified, with the sub-intent atis_ground_fare.

HumanFirst NLU also identified the same hierarchical intent, atis_ground_service/atis_ground_fare, together with a false positive on the parent intent atis_ground_service.
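Slash-delimited hierarchical labels like these are straightforward to split into parent and sub-intent programmatically. The helper below is a minimal sketch under my own assumption about the label format; it is not part of the HumanFirst or Co:here APIs.

```python
# Sketch: splitting a slash-delimited hierarchical intent label into a
# (parent, child) pair. The slash format is an assumption for illustration.

def split_intent(label: str):
    """Return (parent, child); child is None for a flat intent."""
    parts = label.split("/", 1)
    return (parts[0], parts[1]) if len(parts) == 2 else (parts[0], None)

for label in ["atis_flight", "atis_ground_service/atis_ground_fare"]:
    print(split_intent(label))
```

Treating parent and child separately makes it easy to fall back to the parent intent when only the sub-intent is uncertain.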


The HumanFirst Studio has an exceptional disambiguation tool, where an intent can be disambiguated against other intents.

Once an intent is selected, the individual utterances of the intent are displayed with confidences, and utterances can be moved across intents. This helps to negate intent overlap and ambiguity.

As marked in the image, during disambiguation Co:here has higher confidences and, by implication, fewer false positives.


It is evident from the POC that HumanFirst Studio has the flexibility to integrate with different NLU APIs, including Large Language Models. This enables HumanFirst Studio to address the long tail of NLU by leveraging LLMs.

This is also achieved via a non-technical, no-code interface, democratising access to advanced language processing.

It has to be stated that the LLMs do yield better results, but smaller models have a definite place and purpose, with advantages like local or private-cloud installation, better cost management, and the ability to leverage open source.



Cobus Greyling

I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI.