Chatbots And The Challenge Of Invisible Affordances
Designing Affordances For A Conversational UX
The challenge of invisible affordances in the world of Conversational Interfaces.
The sentence in French on René Magritte’s painting reads: “This is not a pipe”. But surely this statement is incorrect. It is indeed a pipe, a very well-defined and clearly visible pipe.
However, this raises the question: can you pack and light this pipe? Can you cradle it in your hand? Can you indeed smoke it?
So the sentence is correct after all: this is indeed not a pipe. It is merely an image of a pipe.
Just as the map is not the terrain, and the word is not the thing, this is only a painting of a pipe.
A likeness of a pipe, without the affordances of a pipe.
Hence the treachery: a representation that holds none of the affordances of what is being represented.
Affordances in UX
Affordance is what the environment offers the individual; James J. Gibson coined the term in 1966.
Affordances are the tools, methods and models available to an individual in any given environment.
Should you find yourself before a closed door, you will detect the affordances available to you to open it: push, pull, a door handle, a button, an intercom, and so on. Without affordances you cannot navigate the environment, and should you have a disability, special affordances should be made available to you.
The original definition in psychology includes all transactions that are possible between an individual and their environment. When the concept was applied to design, it started also referring to only those physical action possibilities of which one is aware.
Affordances in certain environments quickly accumulate patterns of behaviour and models of convention, which developers and UX designers need to abide by.
Think of the Apple iPhone X, which introduced a whole array of affordances previously not known or available. Users became aware and eventually attuned to their new user experience environment.
User experience (UX) transcends the user interface (UI); it speaks to how a user feels while accessing the affordances of the environment in which they find themselves.
Here is the challenge: affordances need to coalesce with the existing knowledge and intuition of users.
In 1988, Donald Norman appropriated the term affordances in the context of human–machine interaction to refer to just those action possibilities that are readily perceivable by an actor.
This new definition of “action possibilities” has now become synonymous with Gibson’s work, although Gibson himself never made any reference to action possibilities in any of his writing.
Affordances in Conversational AI
Conversational AI (conversational UIs, voice user interfaces) allows users to interact with an interface via voice or speech. Digital assistants (Siri, Google Home, Alexa) afford the user the freedom of speech without touch or any graphical interface.
Often, where there are no visual affordances, no implicit or explicit guidance can be given to the user. Due to the nascent nature of Conversational AI, it is virtually impossible to leverage the user’s existing experience or knowledge.
Because conversation is strongly associated with humans rather than machines, the expectations of the user are often much higher than the capability of the interface. Hence, in most cases, what the user says exceeds what the interface was designed to handle.
A certain level of patience and perseverance is required from the user before affinity and sympathy for the interface can develop.
Conversational Components are the different elements available within a medium.
Within Facebook Messenger there are buttons, carousels, menus, quick replies and the like.
This allows the developer to leverage these conversational components to the full, which negates any notion of a pure natural-language conversational interface.
It is merely a graphic interface living within a conversational medium. Or, if you like, a graphic user interface made available within a conversational container.
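As a rough sketch, the conversational components mentioned above map onto structured payloads. The snippet below builds a quick-reply message in the shape the Facebook Messenger Send API expects; the recipient ID, titles and payload strings are placeholder values for illustration.

```python
# A minimal sketch of a Messenger Send API payload using quick replies,
# one of the conversational components available in the medium.
# Recipient ID, titles and payloads are placeholder values.

def build_quick_reply_message(recipient_id, text, options):
    """Wrap plain text plus tappable quick-reply options into the
    JSON structure the Messenger Send API expects."""
    return {
        "recipient": {"id": recipient_id},
        "message": {
            "text": text,
            "quick_replies": [
                {"content_type": "text", "title": title, "payload": payload}
                for title, payload in options
            ],
        },
    }

payload = build_quick_reply_message(
    "1234567890",  # placeholder page-scoped user ID
    "What would you like to do?",
    [("Check balance", "BALANCE"), ("Make a payment", "PAYMENT")],
)
```

The graphical elements (buttons the user taps) are thus just structured data riding inside the conversational container, which is exactly the point: a GUI delivered through a chat medium.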
Is this wrong? No.
It takes the interface to the user’s medium of choice, making it more accessible and removing the friction that apps present.
The problem with such an approach is that you are heavily dependent on the affordances of the medium. Should you want to roll your chatbot out on WhatsApp, the same conversational components will not be available, and you will have to opt for NLU, keyword spotting or a menu-driven solution, where the menu is constituted by keywords or numbers the user must input to navigate the UI.
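A minimal sketch of such a menu-driven fallback might look like the following; the menu options and intent names are invented for illustration.

```python
# On a text-only medium the "menu" is just numbered options the user
# types back. Unrecognised input re-presents the menu.
MENU = {
    "1": "balance",
    "2": "payments",
    "3": "support",
}

MENU_TEXT = (
    "Please reply with a number:\n"
    "1. Account balance\n"
    "2. Payments\n"
    "3. Talk to support"
)

def route(user_input):
    """Map the user's typed number to an intent, or re-show the menu."""
    choice = user_input.strip()
    if choice in MENU:
        return MENU[choice]
    return MENU_TEXT  # unrecognised input: repeat the menu
```

Navigation here is purely by convention (type a number), not by conversation; the medium affords nothing richer.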
Is it a chatbot?
Technically, yes; as it lives in a multi-threaded asynchronous conversational medium.
But is it conversational in nature? One would have to argue no.
So in the second example a rudimentary approach needs to be followed for one simple reason: the medium does not afford the rich functionality and is primarily text-based.
Another challenge is that the conversational input from the user is unstructured, and the interface needs to structure that input after the fact.
Inversely, the output to the user starts as structured data, which needs to be unstructured into conversational form. The temptation is therefore always there to tweak the conversational interface in such a way as to lend more structure to it. Regardless of the medium, many organisations are lured into introducing structure into the conversation from the system’s side. This structure is not fully compatible with the purist’s view, and hence does not completely leverage the NLU technology available.
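To make the structuring step concrete, here is a toy illustration of turning unstructured input into structure via keyword spotting and a simple regular expression; the intents, keywords and the "amount" entity are all assumptions for the sake of the example, not a real NLU engine.

```python
import re

# Toy keyword-spotting "NLU": map free text to an intent and pull out
# one simple entity (a monetary amount). Intents and keywords are
# invented for illustration.
INTENT_KEYWORDS = {
    "check_balance": ["balance", "how much"],
    "make_payment": ["pay", "payment", "transfer"],
}

def structure(utterance):
    """Turn an unstructured utterance into a structured record."""
    text = utterance.lower()
    intent = next(
        (name for name, kws in INTENT_KEYWORDS.items()
         if any(kw in text for kw in kws)),
        "unknown",
    )
    amount = re.search(r"\b(\d+(?:\.\d{2})?)\b", text)
    return {"intent": intent, "amount": amount.group(1) if amount else None}
```

Real NLU layers do far more (context, synonyms, confidence scores), but the shape of the problem is the same: free text in, structured intent and entities out.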
Presenting the user with structure does, however, help the conversation along and acts as a rudimentary affordance. Obviously, this is easier with text bots, or chatbots; the moment we use voice bots, it becomes harder.
But a voice bot can still furnish the user with hints about what they can say.
When it comes to chatbots, the aim should be to make them conversational and not merely present a GUI within the chat window. The chatbot should not merely recreate what lives in USSD or the mobile web, which is a great temptation. It means going the extra mile to create the Natural Language Understanding (NLU) layer that converts unstructured linguistic input into structure.
It might make sense to attempt to understand the natural-language, unstructured input for two or three turns before introducing the fixed menu.
This is akin to the approach often followed in IVR, where Advanced Speech Recognition (ASR) was attempted for a few turns before the DTMF menu was introduced.
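That escalation strategy can be sketched as a small piece of dialogue-management logic. The three-attempt threshold and the function names here are assumptions for illustration; `understand` stands in for whatever NLU layer is in use.

```python
# Sketch of the escalation described above: try free-form understanding
# a few times, then degrade to a fixed menu -- much as IVRs fell back
# from ASR to DTMF. MAX_NLU_ATTEMPTS is an assumed threshold.
MAX_NLU_ATTEMPTS = 3

def handle_turn(utterance, failed_attempts, understand):
    """`understand` is any NLU callable returning an intent or None.

    Returns a (response_mode, new_failed_attempts) pair, where
    response_mode is "nlu", "reprompt" or "menu".
    """
    intent = understand(utterance)
    if intent is not None:
        return ("nlu", 0)  # understood: reset the failure counter
    failed_attempts += 1
    if failed_attempts >= MAX_NLU_ATTEMPTS:
        return ("menu", failed_attempts)  # degrade to the fixed menu
    return ("reprompt", failed_attempts)  # ask the user to rephrase
```

The user gets the richer conversational experience when it works, and a guaranteed path to completion when it does not.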