
The Future of Chatbots With Web 3.0

What Are Some Of The Design & Development Implications


What is Web 3.0? It is the third generation of the Internet…a network underpinned by intelligent interfaces and interactions.

Web 3.0 will be constituted by software, with the browser acting as the interface or access medium.

An Amazon Echo Show skill integrated with the Mercedes-Benz API for vehicle specifications and images. The user can ask questions regarding Mercedes vehicles, and a topical interactive display is rendered which can be viewed, listened to and navigated by touch. Follow-up speech commands can also be issued.

Currently, the closest comparison to Web 3.0 is probably devices like the Amazon Echo Show or Google Nest Hub, where multiple modalities are combined into one user experience.

The user issues speech input and the display renders a user interface with images, text and/or video. The content can be viewed, listened to, with touch or speech navigation.

This multi-modal approach lowers cognitive load, as user input is primarily via voice rather than typing. Media is presented to the user via text, images/video and speech. Refining queries based on what is presented will most probably happen via touch navigation.

Hence we will see full renderings of contextual content based on the spoken user intent.

A big advantage of Web 3.0 is that a very small team can make significant breakthroughs, since it is software based.

Amongst other key elements, Web 3.0 will be defined by personalized bots which serve users in specific ways.

A demonstration of available templates and layouts of an Amazon Echo Show skill. Click navigation, with display options, audio or follow-up questions.

These bots will facilitate intelligent interactions with the user and all relevant devices.

Bots will interact via voice, text and contextual data, focusing on customer service, support, information, sales, recommendations and more.

Intelligent conversational bots will not only communicate in text but any appropriate media.

This new iteration of the web will have pervasive bots which will surface in various ways and linked to the context of the user’s interaction.

Imagine a user is reading through your website, and at a particular point they can click on text which takes them to a conversational interface which is contextually linked to where the user clicked.

More about this later…

Another speculative illustration of Web 3.0, with a speech interface to issue commands to the Mercedes-Benz vehicle API. The display changes based on speech input.

The two tables below attempt to quantify and describe the differences between Web 2.0 and Web 3.0. This is obviously from a conversational perspective; there are other components which will contribute to Web 3.0.

A broad overview of what the shift might entail…

A more detailed view of how the user interface and experience will change…

User interfaces involve complexity, and that complexity needs to vest somewhere. The traditional approach (Web 2.0) is to surface the complexity to the user via the user interface, adding to the user’s cognitive load and limiting input to typing, increasingly so on a mobile phone.

With Web 3.0, simplicity is surfaced to the user. This means the complexity needs to move under the hood, to be addressed by the framework or platform. It makes development and implementation trickier, with added overhead, but allows for a simple, customized and multi-modal user interface.

User experience is also about how the user feels after using the interface. Lowering cognitive load contributes to this improved user experience.

Rich Chatbot Content via iframes

I particularly like iframes, as rich and dynamic content can be included in the bot response, leveraging existing functionality. This prototype was built using IBM Watson Assistant. A title can be added to the iframe, but this is optional.

Navigation related questions are responded to with navigation options. The chatbot is enabled to retrieve the relevant content and present it to the user.

A link to Google Maps displaying a particular route is shown.

And obviously the URL to the contents of the iframe needs to be added.

Please note that only the URL is required; the iframe HTML tags should not be included when defining the iframe source.
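As a minimal sketch, a dialog node response of this kind looks roughly as follows in Watson Assistant’s JSON editor. The Wikipedia URL and title here are illustrative, not taken from the prototype:

```json
{
  "output": {
    "generic": [
      {
        "response_type": "iframe",
        "source": "https://en.wikipedia.org/wiki/Chatbot",
        "title": "Chatbot"
      }
    ]
  }
}
```

Note that the source value is the bare URL, as mentioned above, with no surrounding iframe tags.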

Maps and navigation options are good use-cases for this type of bot-response.

Other use-cases which come to mind are general information pieces from other sources, like Wikipedia.

Think of payment processing via iframes, providing access to seamless integration into payment gateways from within the chatbot.

This keeps the user within their medium of choice and negates the friction of moving the user from one medium to another.

Lastly a simple example of linking to Wikipedia and displaying the information within the chatbot frame.

An iframe implementation of linking to a mobile website contextual to the query from the user.

Here are the iframe configuration options from the Watson Assistant dialog development interface.


Initializing Web Chat With Custom Options

You can create a link to web chat which will open the chatbot window and start the conversation at a specific dialog node, hence maintaining the context of the conversation.

Links within a text can trigger contextual conversations.

This can be useful, for example, if you want to send an email to your customers with a link for something like “Click here to change your debit order date”.

Once clicked, the link will take the user to a website, open web chat, and begin the conversation at the “change debit order date” dialog node.

Here is a view of the JavaScript enabling the functionality of Watson Assistant…

window.watsonAssistantChatOptions = {
  integrationID: "YOUR_INTEGRATION_ID",
  region: "YOUR_REGION",
  serviceInstanceID: "YOUR_SERVICE_INSTANCE_ID",
  pageLinkConfig: {
    linkIDs: {
      'u35': { text: 'I would like to change my debit order date' },
      'r23': { text: 'I need to reset my password' }
    }
  },
  onLoad: function (instance) { instance.render(); }
};
setTimeout(function () {
  const t = document.createElement('script');
  t.src = "https://web-chat.global.assistant.watson.appdomain.cloud/versions/" +
    (window.watsonAssistantChatOptions.clientVersion || 'latest') + "/WatsonAssistantChatEntry.js";
  document.head.appendChild(t);
});

Your skill will need to recognize this text and start the conversation appropriately. This text will not be visible to the end user.
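For illustration, such a link could be embedded in an email or on a web page as below. The example.com URL is hypothetical, and this sketch assumes web chat picks the link ID up from a URL parameter; check the web chat documentation for the exact parameter name:

```html
<!-- 'u35' is one of the linkIDs defined in watsonAssistantChatOptions.
     The wa_lid parameter name is an assumption; verify against the
     current web chat documentation. -->
<a href="https://www.example.com/account?wa_lid=u35">
  Click here to change your debit order date
</a>
```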

Rich Media With Images, Audio Files & Videos

To enable Web 3.0, various design affordances need to be easily accessible from a chatbot or conversational agent perspective.

An ever-growing list of assistant response options from a bot perspective in IBM Watson Assistant.

It would make sense to make these affordances part and parcel of the bot responses.

In the dialog development environment of IBM Watson Assistant, various assistant responses can be easily defined.

Embedding audio and video files linked to specific intents.

The list ranges from the basic to the more feature rich:

  • text,
  • option buttons,
  • images,
  • audio,
  • video,
  • iframes and
  • connecting to a human agent.
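As a minimal sketch, audio and video responses of this kind can be defined in the dialog node’s JSON editor roughly as follows. The media URLs and titles are placeholders:

```json
{
  "output": {
    "generic": [
      {
        "response_type": "audio",
        "source": "https://example.com/media/intro.mp3",
        "title": "Introduction"
      },
      {
        "response_type": "video",
        "source": "https://example.com/media/tutorial.mp4",
        "title": "Tutorial"
      }
    ]
  }
}
```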

It is interesting how bot responses have evolved and how comprehensive explanations, tutorials and more can be presented to the user.

The rich design affordances are not locked into a web design or mediation layer, but are configured within the dialog development framework.

The web presentation configuration is feature rich and various elements can be set and viewed in real-time.

Button Options

Buttons have been around for a while, but need to be mentioned. They are especially helpful in constraining responses from the user.

A special use-case for buttons is disambiguation. This is when two to five intents are applicable to the user’s utterance, and these options are presented to the user to select the most appropriate one.

Hence affording the user the opportunity to disambiguate the dialog turn, and giving the bot the opportunity to learn from user input for future iterations.
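A button response of this kind can be sketched in the JSON editor as an option response. The title and labels below are illustrative:

```json
{
  "output": {
    "generic": [
      {
        "response_type": "option",
        "preference": "button",
        "title": "Did you mean:",
        "options": [
          {
            "label": "Change debit order date",
            "value": { "input": { "text": "Change debit order date" } }
          },
          {
            "label": "Reset my password",
            "value": { "input": { "text": "Reset my password" } }
          }
        ]
      }
    ]
  }
}
```

The selected option is fed back into the assistant as the user input text, which is what allows the bot to learn from the disambiguation.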

The representation of the buttons within the try it out pane, and then the web preview.


With Web 3.0, chatbots will be accessed from the web, emails, text messages and more.

Conversations will be shorter, with users dropping into a chatbot conversation to perform specific tasks…or chat to a live agent.

Contextual awareness will be important…vertically and horizontally.

Vertically: as users resume conversations via the same medium, the chatbot should be fully aware of previous conversations, and this must inform the context of the current conversation.

Horizontally: as users move from one medium to another. Having had a conversation over the phone with a service representative, and later initiating a conversation with the chatbot about the same issue, contextual awareness must be maintained.



Cobus Greyling

Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; NLP/NLU/LLM, Chat/Voicebots, CCAI.