Photo by Luke Paris on Unsplash

Four Implementations For Conversational AI

Select The Right Technology For The Right Purpose

Cobus Greyling
15 min read · Oct 7, 2021


Introduction

A very exciting subfield of Artificial Intelligence is Conversational Artificial Intelligence.

In essence, it ingests human conversation in text or audio format and subsequently structures that unstructured conversational data. Based on analysis of the structured data, action can be taken.

In the past, conversational data was also referred to as dark data, as the information in the data was not really accessible; this has changed.

Processing conversational data can be performed in real time or offline. It is important to understand what problem you are trying to solve, and then decide on the right technology.

In broad terms there are four general Conversational AI implementation and use-case scenarios:

  • Conversational Agents
  • Natural Language Processing (NLP)
  • Text-In-Text-Out
  • Codex

Brief Overview: Four Conversational AI Use-Cases

Looking at a breakdown of the key features of each implementation approach and use-case scenario…

In the following section each of these four will be discussed in greater detail.

Conversational Agents

A real-time conversational agent implementation, using voice or text, requires a chatbot development framework.

This development framework demands fine-tuning in the areas of Natural Language Understanding (NLU), Dialog State Management and Bot Messages/Response Text.

The Four Pillars Of Traditional Chatbot Architecture

So, one can say that traditionally chatbots, or conversational AI agents, are constituted by a four-pillar architecture.

This architecture is largely universal across commercial chatbot platforms, consisting of:

  • Intents (NLU)
  • Entities (NLU)
  • Bot Responses (aka Script / Bot Dialog)
  • State Machine (Dialog) Management

This is a real-time, turn-by-turn conversation between the user and the chatbot.

Natural Language Processing (NLP)

There is a big difference between Natural Language Processing (NLP) tools and a chatbot development framework.

A typical implementation and positioning of NLP within a chatbot framework environment.

Common NLP tools include Q&A, classification, summarization, keyword extraction, named entity extraction, etc.

These tools can be implemented as a top tier in the chatbot technology stack, acting as a pre-processing layer for user input.

This processing can include sentence boundary detection, language identification etc.
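
Purely as an illustration, such a first high-pass layer could look something like the minimal sketch below, assuming the langdetect and NLTK packages are installed; the preprocess helper is hypothetical and not tied to any chatbot framework.

import nltk
from langdetect import detect
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)  # sentence boundary model

def preprocess(user_input):
    # First high-pass over raw user input before it reaches the NLU layer.
    return {
        "language": detect(user_input),          # e.g. 'en', 'de'
        "sentences": sent_tokenize(user_input),  # sentence boundary detection
    }

print(preprocess("I moved to Johannesburg last year. Can you update my address?"))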

Apart from a first high-pass layer for chatbots, NLP can be used to structure conversational data like live agent chats with customers, emails, phone conversations, etc.

NLP can be used to summarize text, extract named entities, key words, concepts and more. HuggingFace and OpenAI’s Language API based on GPT-3 lead the charge here.

Text-In-Text-Out

The OpenAI Language API based on GPT-3 finds itself, to a large extent, alone in this category. It is a language API with a text-in-text-out approach which can muster a completely natural and interactive conversation, with seamless, coherent and complete natural language generation (NLG), dialog state management and conversational context all maintained automatically, from only a few lines of training data.

Currently, the OpenAI API (GPT-3) cannot completely replace a chatbot development framework. It is not intended to be a chatbot development framework. As I mentioned, it lacks the fine-tuning elements in terms of intents, entities, dialog state management and bot return dialog or prompts.

This example of the GPT-3 playground shows how the chatbot is described in one sentence.

However, an ideal implementation of GPT-3 will be where it is used to augment and enhance an existing chatbot.

The OpenAI documentation clearly states that GPT-3 provides a general-purpose interface for text-in and text-out procedures.

Hence the OpenAI API is ideal for performing virtually any text-based language task, and in this lies its differentiator.

Most API’s are designed to perform a single language task. Such as sentiment, intent extraction, named entitles etc. Below you will find a a few examples of how GPT-3 can support and augment a current chatbot implementation.

Codex

Often, technology in its infancy seems rudimentary, awkward and redundant. Invariably discussions ensue on the new tech’s viability and right to exist, comparing it to technologies steeped in history and innumerable iterations.

Codex needs to be seen for what it is: a first foray into a new field, and very impressive at that.

What makes Codex interesting is that it is a new application of Natural Language Understanding (NLU).

In this instance, NLU is not used for self-service, or customer care etc. NLU is not used for bi-directional conversation or even a text-in-text-out situation.

Instead, we are going from highly unstructured natural conversational input to a highly structured medium: code.

In essence, OpenAI Codex is an AI system that translates natural language into code.

Codex powers GitHub Copilot, which OpenAI built and launched in partnership with GitHub recently. Codex can interpret simple commands in natural language and create and execute code: NLU to applications.
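
As a rough sketch of what this looks like programmatically, a natural language instruction can be sent to the completions endpoint and the generated code read back. Codex is in limited beta, so the engine name (davinci-codex) and the parameters below are assumptions that may change.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

response = openai.Completion.create(
    engine="davinci-codex",            # assumed beta engine name
    prompt="/* Add a dropdown list with the months of the year */",
    max_tokens=150,
    temperature=0,
)
print(response["choices"][0]["text"])  # the JavaScript Codex generates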

This is a new implementation of natural language, one could argue.

Conversational Agents

Most chatbot architectures consist of four pillars; these are typically intents, entities, the dialog flow (state machine), and scripts.

Traditional Chatbot Architecture

The First Pillar: Intents

In most chatbot design endeavors, the process starts with intents. But what are intents? Think of it like this: a large part of this thing we call the human experience is intent discovery. If a clerk or general assistant is behind a desk and a customer walks up to them, the first action from the assistant is intent discovery: trying to discover the intention of the person entering the store, bank, company etc.

We perform intent discovery dozens of times a day, without even thinking of it.

The Google search engine can be considered a single dialog-turn chatbot. The main aim of Google is to discover your intent, and then return relevant information based on the discovered intent. Even the way we search has inadvertently changed; we do not search with keywords anymore, but in natural language.

Intents can be seen as purposes or goals expressed in a customer’s dialog input. By recognizing the intent expressed in a customer’s input, the assistant can select an applicable next action.

Current customer conversations can be instrumental in compiling a list of possible user intents. These conversations can come from speech analytics data (call recordings) or live agent chat conversations. Lastly, think of intents as the verb.

The Second Pillar: Entities

Entities can be seen as the nouns.

Entities are the information in the user input that is relevant to the user’s intentions.

Intents can be seen as verbs (the action a user wants to execute), while entities represent nouns (for example: the city, the date, the time, the brand, the product). Consider this: when the intent is to get a weather forecast, the relevant location and date entities are required before the application can return an accurate forecast.

Recognizing entities in the user’s input helps you to craft more useful, targeted responses. For example, you might have a #buy_something intent. When a user makes a request that triggers the #buy_something intent, the assistant's response should reflect an understanding of what the something is that the customer wants to buy. You can add a product entity, and then use it to extract information from the user input about the product that the customer is interested in.
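
Purely to make these two NLU pillars concrete, here is a rough, framework-agnostic sketch; the intent patterns, product list and understand function are hypothetical and far simpler than a real NLU model.

import re

# Hypothetical intent patterns and entity values, for illustration only.
INTENT_PATTERNS = {
    "#buy_something": re.compile(r"\b(buy|purchase|order)\b", re.I),
    "#get_weather": re.compile(r"\b(weather|forecast)\b", re.I),
}
PRODUCT_ENTITIES = ["laptop", "phone", "headphones"]

def understand(utterance):
    # Intent: the verb, the action the user wants to execute.
    intent = next((name for name, pattern in INTENT_PATTERNS.items()
                   if pattern.search(utterance)), None)
    # Entities: the nouns relevant to that intent.
    entities = [p for p in PRODUCT_ENTITIES if p in utterance.lower()]
    return {"intent": intent, "entities": entities}

print(understand("I would like to buy a new laptop"))
# {'intent': '#buy_something', 'entities': ['laptop']}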

The Third Pillar: Dialog Flow

The dialog contains the blocks or states a user navigates between. Each dialog node is associated with one or more intents and/or entities.

Dialog Example from IBM Watson Assistant

The intents and entities constitute the condition on which that dialog is accessed.

The dialog contains the output to the customer in the form of a dialog, or script…or wording if you like.

This is one of the most boring and laborious tasks in creating a chatbot. It can become complex and changes made in one area can inadvertently impact another area.

A lack of consistency can also lead to unplanned user experiences. Scaling this environment is tricky especially if you want to scale across a large organization.
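
To make the idea concrete, here is a minimal sketch of a dialog state machine that uses intent conditions to select the next node and its response text; the node names, conditions and responses are hypothetical and continue the NLU sketch shown above.

# Hypothetical dialog nodes: each has a condition (intents/entities) and a response.
DIALOG_NODES = [
    {"name": "weather_node",
     "condition": lambda nlu: nlu["intent"] == "#get_weather",
     "response": "Which city would you like the forecast for?"},
    {"name": "purchase_node",
     "condition": lambda nlu: nlu["intent"] == "#buy_something",
     "response": "Which product are you interested in?"},
    {"name": "anything_else",
     "condition": lambda nlu: True,
     "response": "Sorry, I did not understand that. Could you rephrase?"},
]

def next_response(nlu_result):
    # The first node whose condition matches handles this dialog turn.
    for node in DIALOG_NODES:
        if node["condition"](nlu_result):
            return node["response"]

print(next_response({"intent": "#get_weather", "entities": []}))
# Which city would you like the forecast for?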

The Fourth Pillar: Script

Scripts are the wording, the messages you will be displaying to the user during the course of the conversation to direct the dialog, and also inform the user.

IBM Watson Dialog Node with Node Response Text

The script is often neglected as it is seen as the easy part of the chatbot development process.

The underlying reason for this may be that the script is often addressed at the end of the process and, not being technical in nature, is seen as menial.

The importance of the script should be considered in the light that it informs the user of what the next step is, what options are available at that particular point of the conversation, or what is expected of the user.

A breakdown in the conversation is often due to the dialog not being accurate. Multiple dialogs can be sent, combining messages. On inaction from the user, follow-up explanatory messages can be sent.

Natural Language Processing (NLP)

Here are a few practical examples of how 🤗 HuggingFace can be implemented within an existing chatbot development framework. Many of these functions exist in the OpenAI Language API (GPT-3) and other NLU tools.

Sentiment Analysis

Classifying sequences according to positive or negative sentiments.

Input:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I am not impressed with their slow and unfriendly service.")

Output:

[{'label': 'NEGATIVE', 'score': 0.9987296462059021}]

Question And Answer

Input:

q_a = pipeline("question-answering")

context = "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System, being larger than only Mercury. In English, Mars carries the name of the Roman god of war and is often referred to as the Red Planet. The latter refers to the effect of the iron oxide prevalent on Mars's surface, which gives it a reddish appearance distinctive among the astronomical bodies visible to the naked eye.[18] Mars is a terrestrial planet with a thin atmosphere, with surface features reminiscent of the impact craters of the Moon and the valleys, deserts and polar ice caps of Earth."

question = "Who is the Roman God of war?"

q_a({"question": question, "context": context})

Output:

{'answer': 'Mars', 'end': 4, 'score': 0.4511910080909729, 'start': 0}

Text Generation

Input:

text = "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System."text_generator = pipeline("text-generation")text_generator(text)

Output:

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

[{'generated_text': 'Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System.\n\nThe Sun takes about 70 percent of the solar wind energy that travels to the Sun and about 100 percent is stored in the Sun.\n\n'}]

Named Entity Recognition

Input:

ner = pipeline("ner")

text = "Johannesburg is located in South Africa in the continent of Africa"

ner(text)

Output:

[{'end': 12, 'entity': 'I-LOC', 'index': 1, 'score': 0.9987455, 'start': 0, 'word': 'Johannesburg'},
 {'end': 32, 'entity': 'I-LOC', 'index': 5, 'score': 0.99958795, 'start': 27, 'word': 'South'},
 {'end': 39, 'entity': 'I-LOC', 'index': 6, 'score': 0.9996102, 'start': 33, 'word': 'Africa'},
 {'end': 66, 'entity': 'I-LOC', 'index': 14, 'score': 0.6802336, 'start': 60, 'word': 'Africa'}]

Translation

English to German:

translator = pipeline("translation_en_to_de")text = "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System."translator(text)[{'translation_text': 'Mars ist der vierte Planet der Sonne und der zweitkleinste Planet im Sonnensystem.'}]

English to French:

translator = pipeline("translation_en_to_fr")text = "Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System."translator(text)[{'translation_text': 'Mars est la quatrième planète du Soleil et la deuxième plus petite planète du Système solaire.'}]

Text-In-Text-Out

The GPT-3 based OpenAI Language API is a text-in-text-out API. It is not a chatbot development framework.

Above is an extract from the OpenAI playground, where a single line of training data is given. Each dialog turn is denoted with Human and AI.

A chatbot development framework demands the existence of seven elements which constitute fine-tuning (there can be more):

  1. Forms & Slots
  2. Intents
  3. Entities
  4. Natural Language Generation (NLG)
  5. Dialog Management
  6. Digression
  7. Disambiguation

Training Data: “The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.” Intent-less and natural language generation.

The OpenAI API allows for basic training with a few lines of example utterances, giving the API a basic gist of your application.
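
As a sketch of this text-in-text-out pattern, the single training line shown above can be sent to the completions endpoint together with the dialog turns; the engine name and sampling parameters below are illustrative and may differ in your setup.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

prompt = (
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful, creative, clever, and very friendly.\n\n"
    "Human: Hello, who are you?\n"
    "AI:"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=60,
    temperature=0.9,
    stop=["\n", "Human:", "AI:"],
)
print(response["choices"][0]["text"].strip())  # the AI's next dialog turn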

Fine-tuning is available, but again this is a way of influencing the text-out portion, and not comprehensive fine-tuning per se.

Fine-tuning in OpenAI Language API based on GPT-3 is currently in beta and might change considerably in the near future.

The OpenAI API is impressive nonetheless, but implementation scenarios are currently limited.

Some implementation scenarios can include:

  1. A standalone general chatbot for general conversation. Or a question-and-answer Wikipedia-like chatbot.
  2. A companion bot, for general conversation or assistance.
  3. Copywriting assistant for idea generation, summaries, title and description generation and content.
  4. A supporting language API for a conventional chatbot implementation. These can cover the tasks below (a sketch of one such call follows the list):
  • Grammar Correction
  • Text Summarization
  • Keywords
  • Parse Unstructured Data
  • Classification
  • Extract Contact Information
  • Summarize For A Second Grader
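
As a sketch of one of these supporting tasks, Summarize For A Second Grader can be expressed as a plain completion call; the prompt wording and parameters below are illustrative, not a prescribed recipe.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

text = ("Mars is the fourth planet from the Sun and the second-smallest "
        "planet in the Solar System, being larger than only Mercury.")

response = openai.Completion.create(
    engine="davinci",
    prompt="Summarize this for a second-grade student:\n\n" + text + "\n\nSummary:",
    max_tokens=60,
    temperature=0.3,
)
print(response["choices"][0]["text"].strip())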

Codex

One can argue that Codex has created its very own niche.

There are some definite niche applications; these can include:

  • Solving coding challenges and problems in certain routines.
  • Establishing best practice.
  • Quality assurance.
  • Interactive learning.
  • Generating specific components for subsequent human review.
  • Debugging.
  • Generating code comments.

Codex will not be replacing developers; at least not in the foreseeable future.

One challenge is to understand what the problem is we are trying to solve, especially if that problem is complex.

Secondly, the input to Codex needs to be modularized and broken down for Codex to process.

Here are a few initial observations…

  • Don’t get too creative and elaborate in your natural language description.
  • You are basically writing the algorithm out line by line in natural language.
  • Everything does not have to be explicit. Implicit commands are picked up amazingly well.
  • Spelling mistakes are also detected and catered for.
  • Somehow context is maintained.
  • Code can be reviewed in the side pane, manually edited and saved.
  • You can refer to functions, variables etc. from the natural language perspective.
  • Codex is ideal as a tool to automate tasks and create utilities.
  • Codex is well suited as a tool to learn what well-formed code looks like.

The third challenge is to use natural language as the input for Codex to generate the code.

This is obviously the harder avenue, and when crafting code like this, it is evident that:

  • Problems need to be broken down into a smaller step-by-step like process.
  • I have seen Codex detect implicit coding elements, but for the most part instructions need to be explicit.
  • Codex does not do well with ambiguity; you need some coding knowledge to be able to break your problem down into sequential steps.
  • See Codex as a coding assistant. The ultimate Stack Overflow-like resource with instant answers.

Do not see it as a software and solution orchestrator. It’s a tool to create AI-enabled developers.

The Codex window for JavaScript very much resembles a chatbot interface: simplistic and minimalistic.

Practical JavaScript Example

One of the easiest ways to get started with Codex is opting for JavaScript. There are a few implementations of JavaScript you can start with:

  • A simple website
  • Games
  • Web components like color and date pickers
  • Date and time functions

In the example below I wanted to create a fairly complicated dual-player application. The red and blue blocks can be moved up and down with keyboard keys. The scores can be reset and the game stopped and started.

Below you see the sequence of NLU inputs which was used to create the application. Areas which were tricky were defining what a hit is, and what must be counted as a point.

I was able to say, make the image ball smaller, and Codex can respond to that. This is a very relative and perhaps ambiguous input, which is well executed.

When an ambiguous input is issued like: make button bigger, the context is maintained and the editing is attributed to the button created in the previous step.

create a blueBlock on the left
move the block up when q is pressed and down when a is pressed
create a redBlock on the right
make the background green
move the redBlcok up when o is pressed and down when p is pressed
Add the image called ball: https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.6n_nHXX-Cn10IKC0T7A72AHaHa%26pid%3DApi&f=1
crop the image ball circularly
make the image ball smaller
disable scrollbars
animate the ball to move both horizontally and vertically bouncing off the sides
draw a vertical line in the middle of the screen
make the line white and of 20px width
create a variable called redScore
create a variable called blueScore
Display the value of the variable redScore in the top right
make it bigger and bold
Display the value of the variable blueScore in the top left
make it bigger and bold
make the blueScoreDisplay red
make the redScoreDisplay reod
check every 10ms if the image ball and the blueBlock overlaps
if true, increment blueScore
check every 10ms if the image ball and the redBlock overlaps
place a button on the bottom left corner called blueButton
when clicked, zero the variable blueScore
make button bigger
place a button on the bottom right corner called RedButton
when clicked, zero the variable redScore
make the button bigger
add a white horizontal line in the middle of the screen 20px wide
Add a large button called Stop in the middle bottom of the screen
when the button is clicked, freeze the image ball
Add a large button called Start in the middle bottom of the screen
when the button is clicked, unfreeze the image ball
make the start button larger
make the stop button larger
make the start button blue
make the stop button red

You need to have an idea of what you want to start with and achieve. Then break it into smaller steps, and execute those steps.

Each step can be tested as you go along. If you just continue without testing at regular intervals, there are sure to be aberrations in the application, with steps not executing correctly.

Another feature is adding web components like date pickers or color pickers.

The utterance Add a dropdown list with the months of the year created this dropdown.

Just by saying, add a dropdown list with the months of the year, the JavaScript is created and displayed.

The code can be copied out of the Codex environment, or exported to JSFiddle.

Hence my notion of using Codex as a quick reference. In a sense Codex reminds me of a site like Stack Overflow. The only difference being that as you ask your question, the code is created for you automatically, without waiting for an expert to respond.

Depending on the cost of Codex and general availability, it could evolve as a fast and efficient technical resource.

/* Add a dropdown list with the months of the year */
var months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'];
var monthSelect = document.createElement('select');
for (var i = 0; i < months.length; i++) {
  var monthOption = document.createElement('option');
  monthOption.innerHTML = months[i];
  monthSelect.appendChild(monthOption);
}
document.body.appendChild(monthSelect);

In this last example a timer is set for 10 second increments.

A square is added, and Codex defaults to the size you see below. Lastly, the square color changes every 10 seconds.

Creating a timer for every 10 seconds on which the date/time updates and the color of the square changes.

Conclusion

Chatbot development frameworks continue to evolve at a rapid rate, with the introduction of concepts like intent deprecation, Natural Language Generation and Dialog State Machine deprecation.

Also in some instances intents and entities are closely coupled for a tight feedback loop.

NLP technology is becoming more accessible and easier to implement and use.

And OpenAI is actively creating completely new Language Understanding and processing interfaces.


Cobus Greyling

I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI. www.cobusgreyling.com