The Large Language Model Landscape
The number of commercial and open LLM providers has exploded in the last two years, and there are now many options to choose from for all types of language tasks. And while the main way of interacting with LLMs is still via APIs and rudimentary Playgrounds, I expect an ecosystem of tooling that accelerates their wide adoption to be a growing market in the near future.
Below is a graphic depicting the current Large Language Model (LLM) landscape in terms of functionality, offerings and the tooling ecosystem.
- Large Language Models (LLMs) functionality can be segmented into five areas: Knowledge Answering, Translation, Text Generation, Response Generation and Classification.
- Classification is arguably the most important to today’s enterprise needs, and text generation the most impressive and versatile.
- The commercial, more general-purpose offerings are Cohere, GooseAI, OpenAI and AI21labs; GooseAI currently focuses only on generation.
- The open-source offerings are Sphere, NLLB, Blender Bot, DialoGPT, GODEL and BLOOM.
- The tooling ecosystem is still in a nascent state with many areas of opportunity.
The various LLM offerings cover these five areas of functionality to varying degrees.
Classification is a form of supervised learning in which text is assigned to predefined classes. It is related to clustering, a form of unsupervised learning in which semantically similar text is grouped together without any pre-existing classes.
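As a toy illustration of the supervised side, the sketch below assigns a new text to the class of its most similar labelled example. Word-overlap vectors stand in for real LLM embeddings, and all the data here is invented for illustration:

```python
from collections import Counter
from math import sqrt

def bow(text):
    # Bag-of-words vector: a crude stand-in for an LLM embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Labelled examples define the predefined classes (supervised learning).
examples = [
    ("my card was charged twice", "billing"),
    ("the app crashes on startup", "technical"),
    ("how do i cancel my subscription", "billing"),
    ("the screen freezes when i log in", "technical"),
]

def classify(text):
    # Assign the class of the most similar labelled example.
    return max(examples, key=lambda ex: cosine(bow(text), bow(ex[0])))[1]

print(classify("i was billed twice this month"))  # → billing
```

Clustering would instead group the same texts with no labels at all, discovering the billing/technical split rather than being told it.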
Response Generation is the notion of deriving a dialog flow from example conversations and taking a machine learning approach to it: a model determines the next dialog to present to the user, based on the immediate conversation history and the most probable next turn.
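A minimal retrieval-style sketch of this idea, using word-overlap similarity in place of a trained model; the example conversations are invented:

```python
from collections import Counter
from math import sqrt

def vec(text):
    # Word-overlap vector: a crude stand-in for a learned representation.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Example conversations supply (history, next dialog) pairs.
training_pairs = [
    ("hi i need help with my order", "Sure, can you share your order number?"),
    ("my delivery has not arrived", "Sorry to hear that. Let me check the shipping status."),
    ("i want to change my address", "No problem. What is the new address?"),
]

def next_dialog(history):
    # Pick the reply whose example context best matches the immediate history.
    return max(training_pairs, key=lambda p: cosine(vec(history), vec(p[0])))[1]

print(next_dialog("hello my order has not arrived yet"))
```

A real response-generation model (e.g. DialoGPT) generates the next turn rather than retrieving it, but the selection-by-history intuition is the same.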
Text Generation can be described as the meta capability of LLMs: text is generated from a short description, with or without example data. Generation is a function shared by virtually all LLMs. It can also be steered extensively with few-shot learning data; how that data is cast (prompt engineering) determines how the examples will be used.
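A sketch of how few-shot data is cast into a prompt. The instruction wording and the `Review:`/`Sentiment:` framing are illustrative choices, not any particular provider's format:

```python
# Labelled data cast as a few-shot prompt: the framing of the examples
# determines how the model will use them.
examples = [
    ("The food was cold and the service slow.", "Negative"),
    ("Absolutely loved the new interface!", "Positive"),
]

def build_prompt(examples, query):
    lines = ["Classify the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The trailing "Sentiment:" cues the model to complete with a label.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_prompt(examples, "The checkout process was painless.")
print(prompt)
```

Recasting the same two examples under a different instruction (say, "Rewrite each review politely") would make the model use them in an entirely different way; that is the essence of prompt engineering.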
Translation is where text is translated from one language to another, directly and without any intermediary language.
Knowledge Answering is an implementation of what is called Knowledge Intensive NLP (KI-NLP), where broad-domain and general questions can be answered without querying an API or leveraging a traditional knowledge base. KI-NLP is not a web search, but a self-contained knowledge base underpinned by semantic search.
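The idea of a self-contained knowledge base underpinned by semantic search can be sketched as retrieval over an embedding space. Here word-overlap vectors stand in for real embeddings, and the passages are invented:

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Token-overlap vector: a crude stand-in for an LLM embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A self-contained knowledge base: no web search, no external API.
knowledge_base = [
    "The Eiffel Tower is located in Paris, France.",
    "Water boils at 100 degrees Celsius at sea level.",
    "Photosynthesis converts sunlight into chemical energy.",
]

def answer(question):
    # Retrieve the passage closest to the question in the embedding space.
    return max(knowledge_base, key=lambda p: cosine(embed(question), embed(p)))

print(answer("where is the eiffel tower"))
```

In a real KI-NLP system the retrieved passage would then feed a generative model that phrases the final answer, rather than being returned verbatim.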
The offerings covered here are Cohere, OpenAI, AI21labs, GooseAI, Blender Bot, DialoGPT, GODEL, BLOOM, NLLB and Sphere.
The open-source implementations tend to be less comprehensive and more specific in their implementation focus.
The tooling ecosystem spans data-centric tooling, playgrounds, notebooks, prompt engineering tools and hosting.
LLMs & Playgrounds
LLMs are accessed as APIs, so the barebones tooling required to make use of them is the command line, a development environment or Jupyter Notebooks. Cohere is doing a really great job of pushing out content that shows how to apply LLMs to real-life use-cases with simple scripts and integrations.
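The barebones pattern behind such scripts is a single HTTP POST carrying a prompt and a few tuning options. The endpoint, field names and key below are placeholders for illustration, not any specific vendor's API:

```python
import json

# Illustrative placeholder endpoint, not a real provider URL.
API_URL = "https://api.example-llm-provider.com/v1/generate"

def build_request(prompt, max_tokens=50, temperature=0.7):
    # A generation request is just a prompt plus a few tuning options.
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    }
    body = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    return headers, json.dumps(body)

headers, body = build_request("Write a product description for a smart kettle.")
# In a real script: response = requests.post(API_URL, headers=headers, data=body)
print(body)
```

Each vendor names these fields slightly differently, but the prompt/max-tokens/temperature trio appears in essentially every playground and API.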
Vendors also clearly realise that to make experimenting and adopting LLMs easier, they need to provide no-code environments in the form of Playgrounds that expose the different tasks and tuning options: these are a great starting point to understand what can be achieved.
Below is the GooseAI playground, which takes a very similar approach to the other LLM providers.
These playgrounds allow you to experiment with "prompt engineering", which is the way to explore the mind-blowing text generation capabilities. Note: I'm quite surprised that we haven't (yet) seen a bigger explosion of third-party tools and marketplaces focused on LLM "prompt engineering", the way we have around image generation models like DALL-E and, more recently, Stable Diffusion.
I'm eager to see LLMs more deeply integrated within the "core" workflows required to develop conversational AI and other use-cases like analytics; it seems clear that LLM APIs and their embedding spaces are positioned to unlock more powerful:
- Semantic search (useful to explore unstructured data)
- Clustering (needed to identify topics of conversations or intents)
- Entity extraction (via text generation)
- Classification (either via few-shot learning examples, or fine-tuning the actual models)
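To make one of these bullets concrete, entity extraction can be cast as text generation: prompt the model to emit "entity: type" lines, then parse them back out. The completion below is simulated, standing in for what an LLM API would actually return:

```python
def extraction_prompt(text):
    # Cast extraction as a generation task via the prompt.
    return (
        "Extract the named entities from the text as 'entity: type' lines.\n"
        f"Text: {text}\nEntities:\n"
    )

def parse_entities(completion):
    # Parse the model's generated lines back into structured data.
    entities = {}
    for line in completion.strip().splitlines():
        if ":" in line:
            name, _, kind = line.partition(":")
            entities[name.strip()] = kind.strip()
    return entities

# Simulated completion for: "Ada Lovelace worked with Charles Babbage in London."
completion = "Ada Lovelace: person\nCharles Babbage: person\nLondon: location"
print(parse_entities(completion))
```

The fragility of this parsing step is exactly why I expect data-centric tooling, rather than raw playgrounds, to own these workflows.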
I don't expect enterprise customers to do this type of work within vendor Playgrounds - instead I expect these will be the types of features incorporated within third-party tools (either the conversational AI platforms themselves, or specialised data-centric solutions) that will be powered by the LLM APIs.
Finally, LLMs are massive models, and they are expensive and difficult to run.
Most of the technologies mentioned here (apart from the commercial LLMs) are accessible via 🤗HuggingFace.
You can interact with models using Spaces, Model Cards or via hosted inference APIs. There are options for training, deployment and hosting, although the hosting and compute demands of these models are substantial and not easily justified for every use-case.
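The hosted inference API follows a simple request shape: POST the inputs to the model's endpoint with a bearer token. The sketch below only builds the request (no token or network access needed), and the model ID is just an example:

```python
import json

def build_inference_request(model_id, text, token="hf_..."):
    # 🤗HuggingFace hosted inference: one endpoint per model ID.
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    headers = {"Authorization": f"Bearer {token}"}  # token is a placeholder
    payload = json.dumps({"inputs": text})
    return url, headers, payload

url, headers, payload = build_inference_request(
    "bigscience/bloom", "The five functional areas of LLMs are"
)
# In a real script: response = requests.post(url, headers=headers, data=payload)
print(url)
```

This is what makes the open-source models accessible without provisioning GPUs yourself, at least for experimentation.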
LLMs are not chatbot development frameworks, and the two should not be conflated. There are specific LLM use-cases in conversational AI, and chatbot and voicebot implementations can definitely benefit from leveraging LLMs.
Further reading:
- Large Language Models Are Being Open-Sourced: the cost of hosted solutions is coming down.
- No-code tooling for NLU: a productivity suite to transform natural language into business insights and AI training data.
- Bootstrapping A Chatbot With A Large Language Model: how to harness the power of OpenAI in creating a chatbot from scratch.
- Language Translation Using Meta AI NLLB (No Language Left Behind) And SMS: the NLLB project has open-sourced models capable of performing translation directly between 200 languages.
- What Is KI-NLP And How Can It Be Used For Conversational AI?: Knowledge Intensive Natural Language Processing is well suited to answering questions rather than searching.
- BLOOM, the BigScience Large Open-science Open-Access Multilingual Language Model: an overview of BLOOM and its practical implementations.
- Using DialoGPT For Conversational Response Generation: on fast-tracking conversational AI development.
- What Is GODEL (Large-Scale Pre-Training for Goal-Directed Dialog)?: announced by Microsoft in May 2022, GODEL is designed for general-domain conversation and is fully open-sourced.
- Meta AI’s Blender Bot 3.0 Is An Open Source Chatbot With Long-Term Memory & Internet Search: announced on 5 August 2022, Blender Bot 3 is the first publicly available 175B-parameter chatbot.