An Overview Of The Voicebot Conversational AI Landscape

And how and why the ecosystem is fragmenting.

Cobus Greyling
3 min read · Nov 10, 2022


Below is a graphic depicting the current voicebot conversational AI technology landscape in terms of voicebot frameworks, orchestration engines, LLMs, NLU Design, ASR and Speech Synthesis.

What is evident from this image is the fragmentation of the voicebot conversational AI ecosystem. This fragmentation is driven by three major factors.

The first factor is the introduction and proliferation of voice as a modality and the rising importance of voice as an integral part of implementing Contact Centre AI (CCAI).

The second factor is that Conversational AI (CAI) implementations are scaling in functionality and complexity. This scaling necessitates good data practices like NLU Design, which were largely absent in the past.

The third factor is the advent of Large Language Models (LLMs) and the implementation of LLMs in voicebots. Generative models are a subset of LLMs which can be used for bot response design, semantic search and, to a limited degree, dialog management.

Voice orchestration is a capability that an increasing number of chatbot development frameworks are introducing. It allows developers to voice-enable their chatbots by selecting the following within the development environment:

  • An automatic speech recognition (ASR/Speech-To-Text) provider.
  • A speech synthesis provider (Text-To-Speech/TTS).
  • In some cases, an audio/SIP gateway that is baked into the solution.
  • The Conversation Design and Response Design (Message Abstraction) elements, adapted so that portions of the flow and messaging can be designed for voice-specific implementations.
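
To make the idea of voice orchestration more concrete, here is a minimal Python sketch of what such a selection could look like. It is not taken from any particular framework: the class names, field names and provider strings ("example-asr", "example-tts") are all hypothetical and serve only to illustrate the elements listed above, namely a chosen ASR provider, a chosen TTS provider, an optional gateway, and a message abstraction with a voice-specific variant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceConfig:
    """Hypothetical per-bot voice settings; every field name is illustrative."""
    asr_provider: str                   # speech-to-text vendor selected for the bot
    tts_provider: str                   # text-to-speech vendor selected for the bot
    sip_gateway: Optional[str] = None   # set only if a gateway is bundled


@dataclass(frozen=True)
class Message:
    """One bot response with an optional voice-specific variant."""
    text: str
    voice: Optional[str] = None


def render(message: Message, channel: str) -> str:
    """Return the voice variant on the voice channel if one exists, else the text."""
    if channel == "voice" and message.voice is not None:
        return message.voice
    return message.text


# Placeholder provider names, not real products.
config = VoiceConfig(asr_provider="example-asr", tts_provider="example-tts")

greeting = Message(
    text="Hi! Pick an option: [Billing] [Support]",
    voice="Hi! Would you like billing or support?",
)
```

Calling render(greeting, "voice") returns the spoken phrasing, while any other channel falls back to the text version. The point of the message abstraction is that only the portions of the flow which need to differ for voice have to be designed twice.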

Lastly, there is significant innovation in the areas of TTS and STT by companies like Respeecher and Deepgram, to name only two.

