Fine-Tuning Large Language Models

In this article I consider the fine-tuning of large language models and how it compares to zero and few shot approaches.

Cobus Greyling
4 min read · Dec 7, 2022


The large language model (LLM) landscape is expanding rapidly, with a number of recent additions, the most notable being ChatGPT.

There are three approaches to reaching desired outcomes with LLMs:

1️⃣ Zero Shot Learning
2️⃣ Few Shot Learning
3️⃣ Custom/Fine-Tuned Models

Zero Shot Learning is a scenario where an LLM can recognise a wide range of user requests without being explicitly trained on that user input. It can be argued that the emerging discipline of Prompt Engineering forms part of zero shot learning.
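As a minimal sketch of the zero-shot idea, here is an example using the openai Python library (v0.x, current at the time of writing); the prompt describes the task in natural language with no worked examples. The sentiment-classification task is purely illustrative:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Zero-shot: the prompt states the task, but contains no examples of
# the desired input/output behaviour.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=(
        "Classify the sentiment of this sentence as Positive or Negative:\n\n"
        "\"The onboarding flow was confusing and slow.\"\n\n"
        "Sentiment:"
    ),
    max_tokens=5,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```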

Few Shot Learning is an approach where the LLM makes a prediction based on a very limited number of examples. These examples can be stored in clusters and retrieved at run time to be included in the prompt during the conversation. You can find more detail on this approach here.
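The sketch below illustrates the few-shot pattern with the same library: a handful of labelled examples are placed in the prompt ahead of the new input, so the model can infer the pattern. The intent-classification examples are hypothetical:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Few-shot: a small number of labelled examples precede the new input.
few_shot_prompt = (
    "Classify the intent of each utterance.\n\n"
    "Utterance: I want to move money to my savings account\n"
    "Intent: transfer_funds\n\n"
    "Utterance: What is my current balance?\n"
    "Intent: check_balance\n\n"
    "Utterance: Please block my card, I lost it\n"
    "Intent: block_card\n\n"
    "Utterance: Can you send $50 to my brother?\n"
    "Intent:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=few_shot_prompt,
    max_tokens=10,
    temperature=0,
    stop=["\n"],
)
print(response["choices"][0]["text"].strip())
```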

Custom / Fine-Tuned Models are a longer-term approach in which a fine-tuned model is built on top of one of the standard base models offered by the LLM provider.

Below is an example from the OpenAI Playground where custom fine-tuned models are listed together with the standard OpenAI Language API models.

When compiling a query in the OpenAI Playground, the model drop-down is visible at the top right. There are a few standard models to choose from, each of which is optimised for different tasks.
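The same list can be retrieved programmatically. A minimal sketch, assuming the openai Python library (v0.x): fine-tuned models appear alongside the base models, with ids containing ":ft-".

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# List the models visible to your account, including any fine-tuned models.
models = openai.Model.list()
for m in models["data"]:
    print(m["id"])
```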

The steps to create a custom model are:

  1. Select an existing base model best suited to the envisioned task.
  2. Curate and prepare your custom training data.
  3. Format the data as mandated by the LLM provider.
  4. Upload the data and start the training process (a sketch of steps 3 to 6 follows after this list).
  5. All of the training takes place in the cloud.
  6. Depending on the size of the base model you select and the amount of training data, the training time can differ significantly.

Source: The GPT-3 models understand and generate natural language. These are the four main models, with different levels of power for different intended tasks. Davinci is the most capable model; Ada is the fastest.
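As an illustration of steps 3 to 6, here is a minimal sketch using the openai Python library (v0.x, current at the time of writing) and the prompt/completion JSONL format expected by the OpenAI fine-tuning endpoint at the time. The file name and example records are hypothetical:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Step 3: training data is a JSONL file, one prompt/completion pair per line, e.g.:
# {"prompt": "Customer: Where is my order?\nAgent:", "completion": " Let me check the status for you."}
# {"prompt": "Customer: How do I reset my password?\nAgent:", "completion": " You can reset it from the login page."}

# Step 4: upload the prepared file, then start a fine-tune job on a base model.
training_file = openai.File.create(
    file=open("train_data.jsonl", "rb"),  # hypothetical file name
    purpose="fine-tune",
)

job = openai.FineTune.create(
    training_file=training_file["id"],
    model="davinci",  # base model chosen in step 1
)

# Steps 5 and 6: training runs in the cloud; poll the job until it completes.
status = openai.FineTune.retrieve(id=job["id"])
print(status["status"], status["fine_tuned_model"])
```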

Once training is complete, the LLM platform sends you a notification and the model reference becomes available for use in your projects.
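The fine-tuned model is then referenced by name, just like a standard model. A sketch, where the model id is a placeholder for the fine_tuned_model value returned when your job completes:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# "davinci:ft-your-org-2022-12-07-00-00-00" is a placeholder model id.
response = openai.Completion.create(
    model="davinci:ft-your-org-2022-12-07-00-00-00",
    prompt="Customer: Where is my order?\nAgent:",
    max_tokens=50,
    temperature=0,
    stop=["\n"],
)
print(response["choices"][0]["text"].strip())
```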

⭐️ Please follow me on LinkedIn for updates on Conversational AI ⭐️

I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.

https://www.linkedin.com/in/cobusgreyling
