Three Ways In Which Whisper Is Advancing ChatGPT
Whisper will advance ChatGPT in three ways: by introducing disfluent, spoken-style input; by creating more data for ChatGPT to access; and by moving LLMs towards Multi-Modal Foundation Models.
In this article I want to consider why Whisper, OpenAI's ASR model, will benefit ChatGPT, and why OpenAI releasing the two APIs simultaneously is no coincidence…
Conversations transcribed from speech via ASR are vastly different from conversations generated by typed user input. The reason is the disfluency of speech: spoken language is far less structured than typed conversation.
Disfluency is what makes developing a successful voicebot so hard.
Disfluencies are breaks, irregularities, or non-lexical vocables that interrupt the flow of otherwise fluent speech: fillers such as uh and um, silent pauses, repeated words, self-corrections, and restatements of prior context.
Speech disfluency will therefore introduce a new paradigm for ChatGPT in terms of the text data submitted to it.
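To make the difference concrete, here is a minimal sketch of stripping the most obvious fillers from a transcript. This is a crude first pass, and the filler list is an assumption; real disfluency handling (repetitions, self-corrections) needs far more context than a regular expression can see.

```python
import re

def strip_fillers(transcript: str) -> str:
    """Remove standalone filler words (uh, um, er, ah) from a transcript.

    The filler set is an assumed, illustrative list; repetitions and
    self-corrections are deliberately left untouched.
    """
    # Match a whole filler word, an optional trailing comma, and trailing space.
    pattern = r"\b(?:uh|um|er|ah)\b,?\s*"
    cleaned = re.sub(pattern, "", transcript, flags=re.IGNORECASE)
    # Collapse any double spaces left behind by the removal.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

For example, `strip_fillers("So, um, I was, uh, thinking")` yields `"So, I was, thinking"`, while typed input would rarely contain such fillers in the first place.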
I hasten to add that transcribed audio is not necessarily low-quality language data, merely different. A key indicator of the general quality of transcribed audio is the Word Error Rate (WER).
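WER is the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and the ASR output, divided by the number of words in the reference. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For instance, `wer("the cat sat on the mat", "the cat sat on mat")` is 1/6: one dropped word out of six reference words.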
[In any conversation] meaning is established through turns.
~ John Taylor
Access to More Data
There is an interesting paper that investigates the growth in data usage against the total stock of unlabelled data available on the internet. It concludes that high-quality language data will be exhausted by 2026, while low-quality language data and images will be exhausted much later.
Coupling the Whisper API with the ChatGPT API gives the OpenAI GPT models a whole new source of data. Vast amounts of audio data can now be accessed and put to work, with a low barrier to entry in terms of both technical expertise and cost.
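As a sketch of that coupling, assuming the March 2023 `openai` Python SDK (v0.27; later SDK versions replaced these calls), with a placeholder file name and prompt:

```python
def build_chat_messages(transcript: str) -> list:
    """Wrap a Whisper transcript in a ChatGPT-style message payload."""
    return [
        {"role": "system", "content": "Summarise the transcribed conversation."},
        {"role": "user", "content": transcript},
    ]

def transcribe_and_summarise(audio_path: str) -> str:
    """Whisper API (speech -> text), then ChatGPT API (text -> summary)."""
    import openai  # imported lazily so the sketch can be read without the SDK

    with open(audio_path, "rb") as f:
        transcript = openai.Audio.transcribe("whisper-1", f)["text"]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=build_chat_messages(transcript),
    )
    return response["choices"][0]["message"]["content"]
```

Calling `transcribe_and_summarise("meeting.mp3")` (a hypothetical file) would route recorded speech straight into ChatGPT, which is exactly the low-barrier pipeline the simultaneous API release enables.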
Large Language Models ➡️ Foundation Models ➡️ Multi-Modal AI
There has been a shift in Large Language Models towards including additional modalities, and audio was the first step towards a multi-modal approach.
Large models are sprawling into non-language tasks, yet they remain the foundation of many applications and services.
A Foundation Model can be language-only in functionality, or it can also cover other modalities like voice, images, video, etc.
Read more here:
Large Language Models, Foundation Models & Multi-Modal Models
I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.