Chatbots: Creating Natural Language From Structured Data

Presenting Structured Data In An Unstructured Format

Cobus Greyling

Introduction

Let us first have a look at the basic translation taking place within a chatbot…

The allure of a chatbot is being able to input unstructured data. We are so used to having to structure our input and data according to what the user interface dictates.

Here chatbots come along, and allow us to enter our data in a conversational manner.

And by implication, unstructured.

The Continuous Chatbot Process: Structuring & Unstructuring Data

For user input, the chatbot must structure the data. A large part of this structuring can include the following activities:

  • Sentence Boundary Detection (helpful for longer input)
  • Language Detection (scenarios where users speak different languages)
  • Intent Detection
  • Determining Entities
  • and more…
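As a rough illustration of the structuring activities listed above, here is a minimal rule-based sketch in Python. The keyword lists, the city list and all function names are made up for this example; a production chatbot would rely on a trained NLU model rather than keyword matching, and language detection is left out entirely.

```python
import re

# Hypothetical keyword and entity lists, for illustration only.
INTENT_KEYWORDS = {
    "get_weather": ("weather", "forecast", "temperature"),
    "goodbye": ("bye", "goodbye"),
}
KNOWN_CITIES = ("New York", "London", "Cape Town")


def split_sentences(text):
    """Naive sentence boundary detection: split after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def detect_intent(sentence):
    """Return the first intent that has a keyword in the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "unknown"


def extract_entities(sentence):
    """Return the known cities mentioned in the sentence."""
    lowered = sentence.lower()
    return {"city": [c for c in KNOWN_CITIES if c.lower() in lowered]}


def structure(text):
    """Turn free-form user input into one structured record per sentence."""
    return [
        {"text": s, "intent": detect_intent(s), "entities": extract_entities(s)}
        for s in split_sentences(text)
    ]


for record in structure("Hi there. What is the weather in New York?"):
    print(record)
```

The point is only the shape of the result: free text goes in, and a list of records with an intent and entities comes out.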

Inversely, the data output to the user must be unstructured again into natural language…

Speaking To The User

After the chatbot has determined the appropriate response, the data that needs to be presented to the user is in a structured format.

In the case of a weather bot, the data you want to present to the user might look something like this:

{
  "id": 803,
  "main": "Clouds",
  "description": "broken clouds",
  "icon": "http://openweathermap.org/img/wn/04d@2x.png",
  "weather": "Clouds",
  "temp": 80,
  "high": 82,
  "low": 78,
  "city": "New York"
}

Under normal circumstances, presenting this to a user via a mobile app or website is standard procedure. With a conversational interface, it is a whole different matter.

We need to convert the data into conversation, hence unstructure it. This brings us to the continuous process of structuring and unstructuring data.

This process of unstructuring data into conversation is referred to as Natural Language Generation (NLG).
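To make this concrete, here is a minimal sketch of turning the weather record into a sentence with a fixed template. The field names come from the JSON above; the function name and the wording of the sentence are assumptions made for this example.

```python
# Subset of the weather record shown earlier in the article.
weather = {
    "description": "broken clouds",
    "temp": 80,
    "high": 82,
    "low": 78,
    "city": "New York",
}


def weather_to_text(data):
    """Render a structured weather record as a conversational sentence."""
    return (
        f"It is {data['temp']}° in {data['city']} right now with "
        f"{data['description']}. Expect a high of {data['high']}° "
        f"and a low of {data['low']}°."
    )


print(weather_to_text(weather))
```

A single template like this is the simplest possible form of NLG: the structure of the sentence is fixed and only the values change.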

Natural language generation

Natural language generation is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations.

Basics Of Natural Language Generation (NLG)

As with everything, NLG can be performed on various levels of complexity. The simplest approach is to have a one-to-one match of return codes and phrases.

If the API returns 0, the bot responds with “Thank you, your request has been logged”.

Else if the API responds with 1, the bot responds with “Sorry, something went wrong, try again later.”

You could see this as a very basic form of unstructuring data.
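In code, this one-to-one match is little more than a lookup table. The sketch below uses the two return codes mentioned above; the fallback phrase for unexpected codes is my own addition.

```python
# Return codes and phrases from the example above.
RESPONSES = {
    0: "Thank you, your request has been logged.",
    1: "Sorry, something went wrong, try again later.",
}

# Assumption: a generic phrase for any return code that is not listed.
FALLBACK = "Sorry, I could not process the result of your request."


def respond(return_code):
    """Map an API return code to the phrase the bot should say."""
    return RESPONSES.get(return_code, FALLBACK)


print(respond(0))
print(respond(1))
```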

The Illusion of Liveness

Of course, you can take this one step further by creating an illusion of liveness. Some development environments, like IBM Watson Assistant, allow multiple responses to be defined per conversational node.

IBM Watson Assistant — Assistant Responses

These responses can then be set to random or sequential. In the example shown here, there is a list of goodbye messages. This list of messages can be extensive, and set to be different every time the user says goodbye to the chatbot.

This presents the user with the idea of an unscripted and spontaneous agent.
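The random and sequential settings are easy to picture in plain Python. This is only a sketch of the idea, not how Watson Assistant implements it; the class name and the goodbye messages are made up for the example.

```python
import itertools
import random


class ResponseVariations:
    """Hold several phrasings of one response and pick one per turn."""

    def __init__(self, messages, mode="sequential", rng=None):
        if not messages:
            raise ValueError("at least one message is required")
        if mode not in ("sequential", "random"):
            raise ValueError("mode must be 'sequential' or 'random'")
        self._messages = list(messages)
        self._mode = mode
        self._cycle = itertools.cycle(self._messages)
        self._rng = rng or random.Random()

    def pick(self):
        """Return the next message according to the configured mode."""
        if self._mode == "random":
            return self._rng.choice(self._messages)
        # Sequential: walk the list in order and start over at the end.
        return next(self._cycle)


goodbyes = ResponseVariations(["Bye!", "See you later.", "Take care."])
for _ in range(4):
    print(goodbyes.pick())
```

In sequential mode the fourth call wraps around to the first message again; in random mode the same message can come up twice in a row.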

Scripted Language Generation

Taking matters one step further means creating a language generation script.

Microsoft’s Bot Framework Composer has a Bot Response option on the left, where you can define the bot responses.

Language Generation Script

In the marked example, a Language Generation script is defined called:

#DescribeWeather

The purpose of this example is to take the response from the weather API, and transform it into more natural sounding language. If the API returns “Dust”, we want our chatbot dialog to return: “There’s dust in the air” etc.

We can create multiple such scripts quickly and easily for different APIs and scenarios.

Calling Language Generation Script From Dialog

And within the Send a response element, we can reference the language script for user feedback:

- @{DescribeWeather(dialog.weather)} and the temp is @{dialog.weather.temp}°
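For readers who do not use Composer, the same idea can be sketched in Python. This is an analogy and not Language Generation syntax: only the “Dust” phrase comes from the article, the other phrases and the function names are assumptions, and the `dialog` dictionary stands in for Composer's dialog scope.

```python
WEATHER_PHRASES = {
    "Dust": "There's dust in the air",  # phrase from the article
    "Clouds": "It's cloudy",            # assumption, for illustration
    "Rain": "It's raining",             # assumption, for illustration
}


def describe_weather(weather):
    """Translate the API's weather keyword into a natural phrase."""
    main = weather.get("weather") or "unknown"
    return WEATHER_PHRASES.get(main, f"The weather is {main.lower()}")


def send_response(dialog):
    """Python stand-in for the response line shown above."""
    weather = dialog["weather"]
    return f"{describe_weather(weather)} and the temp is {weather['temp']}°"


print(send_response({"weather": {"weather": "Dust", "temp": 80}}))
```

The dialog only calls the script by name; how the phrase is chosen stays inside the script, which is what makes the responses predictable and easy to change.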

This affords us a predictable and standardized avenue of crafting responses for the user. Just think of multiple user languages in a chatbot, where the language generator can be used to respond to the user in a particular language.
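The multi-language case could be sketched like this, with one set of templates per locale. The locale codes, template keys, German wording and the fallback to English are all assumptions made for the example.

```python
from string import Template

# Hypothetical response templates, grouped per locale.
TEMPLATES = {
    "en": {"weather": Template("It is $temp° in $city.")},
    "de": {"weather": Template("In $city sind es $temp°.")},
}
DEFAULT_LOCALE = "en"


def render(key, locale, **values):
    """Render a response in the user's language, falling back to English."""
    templates = TEMPLATES.get(locale, TEMPLATES[DEFAULT_LOCALE])
    return templates[key].substitute(**values)


print(render("weather", "en", temp=80, city="New York"))
print(render("weather", "de", temp=80, city="New York"))
```

The dialog flow stays the same for every language; only the template set that is looked up changes.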

Ease Of Scaling

One issue chatbot endeavors often run into is scaling. Invariably there comes a stage where the environment and framework need to be reconsidered.

Segmenting chatbot elements as much as possible helps to a large degree.

Segmenting the script/dialog from the dialog flow is also prudent, and the Language Generator speaks to this.

But why not take it even a step further…

The Inverse of Natural Language Understanding

NLG is a software process where structured data is transformed into natural conversational language for output to the user. In other words, structured data is presented in an unstructured manner to the user. Think of NLG as the inverse of NLU.

With NLU we are taking the unstructured conversational input from the user (natural language) and structuring it for our software process. With NLG, we are taking structured data from the backend and state machines, and turning it into unstructured data: conversational output in human language.

Commercial NLG is emerging, and forward-looking solution providers are looking at incorporating it into their solutions. At this stage you might be struggling to get your mind around the practicalities of this. Below are two practical examples which might help.

Fake Product Review Generator

For this example I took close to 580,000 product reviews and created a TensorFlow model from that.

Fake Product Review using Natural Language Generation

By providing key words or a phrase, a product review can be generated. This product review can be seen as natural language generation.

A fictitious review is generated from a corpus of review data, based on a key word.

Imagine if the chatbot had access to a corpus of response data, and a response was generated based on key words or values. Each response would be unique in a sense.
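To give a feel for generating text from a corpus and a key word, here is a toy sketch. Note that it swaps in a simple word-level Markov chain instead of the TensorFlow model used for the review generator, and the three-line corpus and function names are made up for the example.

```python
import random
from collections import defaultdict


def build_chain(corpus):
    """Map each word to the list of words that follow it in the corpus."""
    chain = defaultdict(list)
    for line in corpus:
        words = line.split()
        for current, following in zip(words, words[1:]):
            chain[current.lower()].append(following)
    return chain


def generate(chain, seed, max_words=12, rng=None):
    """Grow a sentence from a seed word by repeatedly picking a follower."""
    rng = rng or random.Random()
    output = [seed]
    current = seed.lower()
    # Stop at the word limit or when the current word has no known follower.
    while len(output) < max_words and chain.get(current):
        following = rng.choice(chain[current])
        output.append(following)
        current = following.lower()
    return " ".join(output)


# Tiny stand-in corpus; the article's model was built on far more data.
corpus = [
    "great battery life and great sound",
    "great sound quality for the price",
    "battery life is shorter than expected",
]
chain = build_chain(corpus)
print(generate(chain, "great"))
```

Every run can produce a different sentence from the same key word, which is the property that makes corpus-based responses feel less scripted. A neural model does the same job with far better fluency.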

Fake News Headline Generator

In the video below, I got a data set from kaggle.com with about 185,000 records.

Natural Language Generation with Google’s Colab Notebook in Python

Each of these records was a newspaper headline, which I used to create a TensorFlow model.

Based on this model, I could then enter one or two intents, and random “fake” (hence non-existent) headlines were generated.

There are a host of parameters which can be used to tweak the output.
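The parameters used for the headline generator are not listed here, so the following is only an illustration of one parameter that text generators commonly expose: temperature. It rescales the model's next-word probabilities before a word is sampled. The function names and the example probabilities are assumptions.

```python
import random


def apply_temperature(probs, temperature):
    """Rescale probabilities: below 1.0 sharpens them, above 1.0 flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_word(word_probs, temperature=1.0, rng=None):
    """Sample one word from a {word: probability} mapping."""
    rng = rng or random.Random()
    words = list(word_probs)
    adjusted = apply_temperature([word_probs[w] for w in words], temperature)
    return rng.choices(words, weights=adjusted, k=1)[0]


# Hypothetical next-word probabilities produced by some model.
next_word = {"minister": 0.6, "president": 0.3, "penguin": 0.1}
print(apply_temperature(list(next_word.values()), 0.5))  # more predictable
print(apply_temperature(list(next_word.values()), 2.0))  # more surprising
print(sample_word(next_word, temperature=2.0))
```

A low temperature keeps the output close to the most likely wording, while a high temperature lets unlikely words through more often.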

Conclusion

We have seen growth in the way input data is processed by chatbots. Multiple intents can be detected, with multiple entities. Relations and types of entities can also be identified. The flexibility is astounding in many cases.

Yet we have not seen the same degree of advancement and flexibility in the chatbot script. Users judge a chatbot by its script and how appropriate and lifelike each response is. The script also informs the user of the current conversation state and how to proceed; hence its importance.

Written by Cobus Greyling

I’m passionate about exploring the intersection of AI & language. www.cobusgreyling.com