Developing A Voicebot Using Google Dialogflow CX

And How Does CX Compare To Other Environments…

Cobus Greyling
9 min read · Nov 19, 2020



Looking at a previous Medium story I wrote on Dialogflow, which is now referred to as Dialogflow ES, the one big issue I raised was the lack of a dialog development and management environment.

Google Dialogflow CX Main Console

This meant that ES had to be used as an API, with native code handling the dialog and state management.

This has been solved with Dialogflow CX.

In other words, CX provides an interface where the state of the conversation can be managed.

Currently, in the market, there are four distinct groupings in approaching dialog design and development…

Dialog configurations, design canvas, native code and ML stories.

CX falls in the design canvas category and really reminds me of the design approach Botsociety follows in their design interface.

CX is an enterprise voicebot (voice first) focused tool with an elaborate design canvas for collaboration. CX is firmly embedded as an extension of the Google Cloud Platform offering.

Dialogflow CX Negatives

  • The need to continually save your work and changes as you move along. I like the approach of IBM Watson Assistant where work and changes are committed on the fly with no need to save changes explicitly.
  • There is no training indicator, so you are tempted to test a change immediately, only to find it has not yet taken effect.
  • Pricing; really exorbitant; especially for emerging markets.
  • State machine driven. The design canvas can become very complex and tricky for larger implementations.
  • Seemingly for Agent utterances you need to press enter to capture the change and then save. The saving overhead I find laborious.

Dialogflow CX Positives

  • The web development console is very responsive, uncluttered and clean. The opposite of the Alexa development console.
  • Leveraging the power of Google for Speech To Text.
  • Convenient option to replay a sequence of utterances in the Simulator.
  • If you are used to chatbot IDEs, the learning curve is not steep at all; there are no abstract terms or functionality.
  • Managing context within pages and flows is easy. Some form of digression can be accommodated.
  • Having the option to end the flow and/or end the session is valuable. There are instances where you want to keep the session active; many users presume a chat is asynchronous, analogous to human conversations.
  • On saving, error messages are descriptive and helpful.

Creating A Project

The first step in creating a CX application is to create a new project within Google Cloud Platform. This is the first indication that CX is strongly aligned with the Google Cloud Platform.

Within CX you can have various projects. Projects are the highest order of organizing your assistants. Within a project you can have multiple Agents.

Creating a new project within Google Cloud Platform

Within the project you can view the agents defined.


You need to think of a CX agent as a company-wide virtual agent handling customer interactions. For an assistant the top building block is the Agent. This is analogous to the approach followed by IBM Watson Assistant, where various skills constitute the agent.

Agents within the Project.

Here is the agent view within the project, with three agents defined. When creating a new agent, a name is given and an array of language options and locales is available.

Creating an Agent with the language options at your disposal.

One of the impediments for emerging markets is the lack of support for local languages or vernacular.

The agent facilitates the conversation with your user. CX will translate the user text or audio into structured data for the flows to understand.

A Dialogflow agent is similar to a human call center agent. You train them both to handle expected conversation scenarios, and your training does not need to be overly explicit.

Flows & Pages

Here is an overview of a very basic Travel Agent bot…but first, what is a flow and a page?


For instance, an assistant for a bank will have multiple complex dialogs which can be arranged according to conversational topics.

The four flows we are going to create in our application.

These topics can include transfers, forex, bonds, credit card etc.

In turn, each topic demands multiple conversational turns for an agent to acquire the relevant information from the end-user.

Flows are used to define these topics and the associated conversational paths.

The advantage here is that the assistant can be divided into different flows, and these flows can be assigned to different teams or squads of developers.


A CX conversation is developed and visually represented as a state machine. The states of the conversation are represented by pages.

Each flow has a Start, End Flow and End Session page. Additional pages can be added.

At any point in time, in the conversation, one page is the current page, considered as the active page.

A Dialogflow CX conversation (session) can be described and visualized as a state machine.

The states of a CX session are represented by pages.
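The state machine idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how CX is implemented and not the CX API; the page names, intent names and the `ROUTES` table are all hypothetical.

```python
# Hypothetical sketch of a conversation as a state machine.
# Page and intent names are invented for illustration; this is not the CX API.

# Each page maps a matched intent to the next page.
ROUTES = {
    "Start": {"book_travel": "Travel Detail", "cancel": "End Session"},
    "Travel Detail": {"confirm": "End Flow", "cancel": "End Session"},
}

# Pages that stop the replay once they become the active page.
TERMINAL_PAGES = {"End Flow", "End Session"}


def next_page(current_page, intent):
    """Return the page the conversation moves to; stay on the current page if no route matches."""
    return ROUTES.get(current_page, {}).get(intent, current_page)


def run(intents, start="Start"):
    """Replay a sequence of matched intents and record the active page after each turn."""
    page = start
    visited = [page]
    for intent in intents:
        if page in TERMINAL_PAGES:
            break
        page = next_page(page, intent)
        visited.append(page)
    return visited


if __name__ == "__main__":
    print(run(["book_travel", "confirm"]))
```

At any point exactly one page is active, and a matched intent is what moves the conversation to the next page.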

Within the page, intents and entities are collected. This is a huge step up from ES.

ES really lacks a design canvas to map out conversational paths.


Intents are the frontline of your chatbot, discovering the intention of your user. Intents can be seen as the verb; the action or goal the user wants to achieve.

Intents usually move the conversation from state to state.

Within CX, Intents are defined by a name, and a few training examples. Intents can do just that, capture the user intent.

Or, as shown below, entities can be defined within the intent example, contextually.

Intent is defined with a name and training examples with entities defined contextually.
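To make the idea of contextually annotated training examples more concrete, here is a minimal Python sketch of how such an intent could be represented. The class names, the intent name `travel.book` and the example phrase are assumptions for illustration only; they do not reflect the CX data model or export format.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    text: str       # the span inside the training phrase
    parameter: str  # the parameter (entity) this span fills


@dataclass
class TrainingPhrase:
    text: str
    annotations: list = field(default_factory=list)


@dataclass
class Intent:
    name: str
    phrases: list = field(default_factory=list)

    def parameters(self):
        """Collect the distinct parameter names annotated across all training phrases."""
        names = {a.parameter for p in self.phrases for a in p.annotations}
        return sorted(names)


# Hypothetical intent with one annotated training example.
book_travel = Intent(
    name="travel.book",
    phrases=[
        TrainingPhrase(
            text="I want to travel from Cape Town to London",
            annotations=[
                Annotation(text="Cape Town", parameter="from_city"),
                Annotation(text="London", parameter="to_city"),
            ],
        )
    ],
)

if __name__ == "__main__":
    print(book_travel.parameters())
```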


Entities can be seen as the nouns: contextually sensitive information. If they are not detected successfully, the bot ends up re-prompting the user for information already entered.

Entity Options within CX

Entities only (no synonyms)

Use synonyms to extract semantically similar words and map them back to the same value. For example, the entity green onion might require the synonym scallion.
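Conceptually, synonym mapping is a lookup from any known surface form back to one canonical value. A minimal Python sketch, using the green onion example; the function names are hypothetical and this is not how CX stores or resolves entities internally:

```python
# Hypothetical entity definition: canonical value -> list of synonyms.
ENTITY_SYNONYMS = {
    "green onion": ["scallion"],
}


def build_lookup(entity_synonyms):
    """Map every canonical value and synonym (lowercased) to its canonical value."""
    lookup = {}
    for value, synonyms in entity_synonyms.items():
        lookup[value.lower()] = value
        for synonym in synonyms:
            lookup[synonym.lower()] = value
    return lookup


def resolve(term, lookup):
    """Return the canonical entity value for a term, or None if it is unknown."""
    return lookup.get(term.strip().lower())


LOOKUP = build_lookup(ENTITY_SYNONYMS)

if __name__ == "__main__":
    print(resolve("Scallion", LOOKUP))
```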

Regexp entities

You can specify Google RE2 regular expressions in the values and they will be used during query classification to extract parameters.
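As a rough illustration of a regexp entity, the sketch below uses Python's built-in `re` module as a stand-in for RE2. The booking reference format (three capital letters followed by four digits) is an assumption made up for this example. Note that Python's `re` accepts constructs RE2 does not, such as backreferences and lookaround, so the pattern deliberately stays within the common subset.

```python
import re

# Hypothetical booking-reference format: three capital letters followed by four digits.
BOOKING_REF = re.compile(r"\b[A-Z]{3}[0-9]{4}\b")


def extract_booking_ref(utterance):
    """Return the first booking reference found in the utterance, or None."""
    match = BOOKING_REF.search(utterance)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_booking_ref("My reference is ABC1234, thanks"))
```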

Automatically add entities

Dialogflow uses machine learning to fill out your entity list based on existing entries.

Fuzzy matching

Fuzzy matching is the type of parameter extraction that matches an entity approximately (rather than exactly). It will try to find matches even when users misspell words or enter only part of words from entity entries.
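The general idea of approximate matching can be sketched with Python's standard `difflib` module. This is a stand-in only: the matching algorithm CX uses is not described here, and the city list and `cutoff` value are assumptions chosen for the example.

```python
import difflib

# Hypothetical entity entries for a travel bot.
CITIES = ["Amsterdam", "Cape Town", "Johannesburg", "London"]


def fuzzy_city(user_text, cutoff=0.7):
    """Return the closest known city for a possibly misspelled input, or None."""
    lowered = {city.lower(): city for city in CITIES}
    matches = difflib.get_close_matches(
        user_text.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


if __name__ == "__main__":
    print(fuzzy_city("Amsterdm"))
```

Raising the cutoff makes the match stricter; lowering it tolerates more spelling errors at the risk of false matches.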

Entity Exclusions

Entity exclusions help you manage irrelevance: phrases or words which should never be matched. For instance, in our travel bot, we can exclude city names we do not offer travel arrangements to.

Build & Manage

The process of writing or creating a voicebot will start with the Manage tab, where you will define intents and entities. Webhooks and routing will come into play later.

Build & Manage Tabs

Under the Build tab, flows can be created, and subsequently pages for each flow.

There is a default start page, which is mandatory. For each flow there is a Start, End Flow and End Session page.

Default Start Node

Within the flow node, there are three options to manage the dialog and how the conversation moves from state to state.

  • Route according to intents
  • Route according to conditions
  • Handle event

Default Start Node with multiple intent routes.

For starters, the easiest way to manage state is to have routing performed by intents.

For the Travel Bot, the three flows we break into are all handled by an intent router. Based on the intent detected, the dialog can be managed.

As you hover over the different intent routers, the link to the next flow is highlighted.

Variables & Entities

The next step is to present variables or parameters to the user, spoken back by the bot. In the travel detail flow we capture compound entities from the user utterance.

Compound entities per user utterance

These entities, once captured, can be presented to the user in the following way:

You want to travel to $session.params.to_city from  $session.params.from_city ?
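The substitution of parameter references in a response can be mimicked with a small Python sketch. The `render` function and the example values are hypothetical; the sketch only imitates the `$session.params.<name>` reference style shown above and is not the CX implementation.

```python
import re

# Matches references such as $session.params.to_city and captures the parameter name.
PARAM_REF = re.compile(r"\$session\.params\.([A-Za-z_][A-Za-z0-9_]*)")


def render(template, params):
    """Replace each $session.params.<name> with its value; leave unknown references untouched."""

    def substitute(match):
        name = match.group(1)
        return str(params[name]) if name in params else match.group(0)

    return PARAM_REF.sub(substitute, template)


if __name__ == "__main__":
    message = "You want to travel to $session.params.to_city from $session.params.from_city?"
    print(render(message, {"to_city": "London", "from_city": "Cape Town"}))
```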

Variables can be set in a page; this can assist with managing the conversation and setting conditions.

A session variable / parameter is created called CheckPoint and set to a value of “true”.

The entities and session variable are stored in the following format:


Variables or parameters can be used to create conditional triggers which decide where the dialog will be routed to. In the example below, fulfillment only occurs in this state if both the intent is matched and the conditional rule is satisfied.

Conditional Routing
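The logic of a route that needs both an intent match and a satisfied condition can be expressed as a short Python sketch. The route structure, the intent name `confirm_booking` and the target page are assumptions for illustration; only the `CheckPoint` parameter with the value "true" comes from the example above. This is not CX's condition syntax.

```python
# Hypothetical route: fires only when the intent matches AND the condition holds.
ROUTE = {
    "intent": "confirm_booking",
    "condition": lambda params: params.get("CheckPoint") == "true",
    "target": "Booking Confirmed",
}


def route_fires(matched_intent, session_params, route):
    """Return True if every requirement the route defines is satisfied."""
    intent = route.get("intent")
    condition = route.get("condition")
    intent_ok = intent is None or intent == matched_intent
    condition_ok = condition is None or condition(session_params)
    return intent_ok and condition_ok


if __name__ == "__main__":
    print(route_fires("confirm_booking", {"CheckPoint": "true"}, ROUTE))
    print(route_fires("confirm_booking", {}, ROUTE))
```

A route that defines only an intent or only a condition is treated as needing just that one requirement.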

Agent Export

Agent export is not in ASCII format, but rather a binary file. While it is convenient to export your application for backup, sharing and versioning, having it in a binary format is not ideal.

All other commercial solutions allow for export of the NLU and dialog management components in an ASCII format.

If the file is ASCII, it can be viewed and inspected visually and even manipulated to be used across platforms and facilitate migration.

Testing & Feedback

Testing can be automated with test cases, and analytics can be viewed out of the box. In the image below the validation is shown, with categorization according to:

  • Info
  • Warning
  • Error

The bot might work well in testing; but warnings can help prevent vulnerabilities when going live…

Test Cases, Validation & Analytics allow for testing and feedback.

There are a host of other settings which are available; one negative of big cloud environments is a lack of control in terms of what happens under the hood.

Some of the agent settings available


There are a host of other agent settings available which will come into play once the bot is connected to telephony.

Areas which I did not cover include Automatic Speech Recognition (ASR), also known as Speech To Text (STT). This is speaker-language dependent and a very specialized field. Google has vast amounts of data and resources to leverage to create an exceptional STT engine.

Speech synthesis (Text To Speech, TTS) is another area where Google will excel and offer users an exceptional product.

Dialogflow CX seems like the product to make Google Duplex a reality for companies and organizations…

All at a cost of course…


