How To Create A Custom Fine-Tuned Prediction Model Using Base GPT-3 Models

The functionality of Large Language Models (LLMs) can be divided into two broad categories: generative and predictive.

Cobus Greyling
8 min read · Feb 2, 2023


Considering the generative and predictive capabilities of LLMs, the generative capabilities have received the most attention. I would argue that LLMs excel at generative tasks; these tasks are immensely impressive and require only zero or few-shot learning.

The nascent trend of Prompt Engineering has also contributed to the emphasis on the generative side.

The image below lists the most common LLM tasks from a Conversational AI Development Framework perspective, split between generative and predictive.

The predictive aspect of LLMs is more critical to get right, especially considering the downstream actions which will be premised on the predicted result.

The most common predictive scenario for chatbots is intent detection. In essence, an intent is a classification of the user’s utterance. The utterance is obviously not predetermined or known to the chatbot in advance, hence the importance and the challenge of getting the intent prediction right. For example, the utterance “I want to cancel my order” might be classified to a cancel_order intent, which in turn triggers the cancellation flow.
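To make this concrete, below is a hypothetical sketch of how a chatbot action is premised on the predicted intent; the classify_intent function is a stand-in for any real classifier, such as the fine-tuned model built later in this article.

# Hypothetical sketch: a chatbot action premised on a predicted intent.
# classify_intent is a stand-in for a real classifier, for example the
# fine-tuned model created later in this article.
def classify_intent(utterance: str) -> str:
    # Placeholder logic purely for illustration.
    return "motorcycles" if "bars" in utterance else "autos"

def handle(utterance: str) -> str:
    intent = classify_intent(utterance)
    # The chatbot's next action hinges entirely on this predicted label,
    # which is why prediction accuracy is critical.
    if intent == "motorcycles":
        return "Routing to the motorcycles flow..."
    return "Routing to the autos flow..."

print(handle("So how do I steer when my hands aren't on the bars?"))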

A custom fine-tuned LLM model can be created for both generative and predictive tasks. Currently fine-tuning LLMs is a novelty, but for mass adoption of LLMs in more formal and enterprise settings, fine-tuning will become mainstream.

The complete example below illustrates how to create an OpenAI GPT-3 (Ada) fine-tuned model for classification of text into one of two classes.

The image below shows the final product, where the fine-tuned model is listed under “fine-tunes”.

Let’s get started. Below is the code to access the training data via scikit-learn; the last command lists the categories of data archived from the original 20 Newsgroups website.

from sklearn.datasets import fetch_20newsgroups
from pprint import pprint

# Download the training split of the 20 Newsgroups dataset and list its categories.
newsgroups_train = fetch_20newsgroups(subset='train')
pprint(list(newsgroups_train.target_names))

These are the 20 categories available; from these we will make use of rec.autos and rec.motorcycles.

['alt.atheism',
'comp.graphics',
'comp.os.ms-windows.misc',
'comp.sys.ibm.pc.hardware',
'comp.sys.mac.hardware',
'comp.windows.x',
'misc.forsale',
'rec.autos',
'rec.motorcycles',
'rec.sport.baseball',
'rec.sport.hockey',
'sci.crypt',
'sci.electronics',
'sci.med',
'sci.space',
'soc.religion.christian',
'talk.politics.guns',
'talk.politics.mideast',
'talk.politics.misc',
'talk.religion.misc']

The code below fetches the two categories we are interested in and assigns the data to vehicles_dataset:

from sklearn.datasets import fetch_20newsgroups
import pandas as pd
import openai

# Fetch only the two vehicle-related categories from the training split.
categories = ['rec.autos', 'rec.motorcycles']
vehicles_dataset = fetch_20newsgroups(subset='train', shuffle=True, random_state=42, categories=categories)
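A quick optional check confirms that only the two requested categories were loaded:

print(vehicles_dataset.target_names)  # ['rec.autos', 'rec.motorcycles']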

Below, a record from the dataset is printed:

print(vehicles_dataset['data'][10])

From the result below it is evident that the data is very messy; the entries are not clean and can very easily contain ambiguity.

From: stlucas@gdwest.gd.com (Joseph St. Lucas)
Subject: Re: Dumbest automotive concepts of all time
Organization: General Dynamics Corp.
Distribution: usa
Lines: 10

Don't have a list of what's been said before, so hopefully not repeating.

How about horizontally mounted oil filters (like on my Ford) that, no
matter how hard you try, will spill out their half quart on the bottom
of the car when you change them?

--
Joe St.Lucas stlucas@gdwest.gd.com Standard Disclaimers Apply
General Dynamics Space Systems, San Diego
Work is something to keep me busy between Ultimate Frisbee games.

Now we can check the total number of records, and how many examples there are for autos and motorcycles respectively.

# Count all records, plus per-class counts (0 = rec.autos, 1 = rec.motorcycles).
len_all = len(vehicles_dataset.data)
len_autos = len([e for e in vehicles_dataset.target if e == 0])
len_motorcycles = len([e for e in vehicles_dataset.target if e == 1])
print(f"Total examples: {len_all}, Autos examples: {len_autos}, Motorcycles examples: {len_motorcycles}")

The printed result:

Total examples: 1192, Autos examples: 594, Motorcycles examples: 598

The next step is converting the data into the JSONL format defined by OpenAI. Below is an example of the format.

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
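For this dataset each record pairs a raw newsgroup post with its label; an abbreviated, illustrative record (the actual prompts are the full posts):

{"prompt": "From: stlucas@gdwest.gd.com (Joseph St. Lucas)\nSubject: Re: Dumbest automotive concepts of all time ...", "completion": "autos"}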

The code to convert the data…

import pandas as pd

# Derive a short label ('autos' or 'motorcycles') from each record's category name.
labels = [vehicles_dataset.target_names[x].split('.')[-1] for x in vehicles_dataset['target']]
# Strip leading and trailing whitespace from each post.
texts = [text.strip() for text in vehicles_dataset['data']]
df = pd.DataFrame(zip(texts, labels), columns=['prompt', 'completion'])
df.head()

And the result, as seen in the Colab Notebook:

Lastly, the data frame is converted to a JSONL file named vehicles.jsonl:

df.to_json("vehicles.jsonl", orient='records', lines=True)
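To verify the export, a minimal check is to inspect the first line of the file:

# Print the first JSONL record to confirm the file was written as expected.
with open("vehicles.jsonl") as f:
    print(f.readline())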

Now the OpenAI command-line utility can be used to analyse the JSONL file.

!openai tools fine_tunes.prepare_data -f vehicles.jsonl -q

With the result of the analysis displayed below…

Analyzing...

- Your file contains 1192 prompt-completion pairs
- Based on your data it seems like you're trying to fine-tune a model for classification
- For classification, we recommend you try one of the faster and cheaper models, such as `ada`
- For classification, you can estimate the expected model performance by keeping a held out dataset, which is not used for training
- There are 5 examples that are very long. These are rows: [38, 203, 910, 1057, 1130]
For conditional generation, and for classification the examples shouldn't be longer than 2048 tokens.
- Your data does not contain a common separator at the end of your prompts. Having a separator string appended to the end of the prompt makes it clearer to the fine-tuned model where the completion should begin. See https://beta.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples. If you intend to do open-ended generation, then you should leave the prompts empty
- The completion should start with a whitespace character (` `). This tends to produce better results due to the tokenization we use. See https://beta.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details

Based on the analysis we will perform the following actions:
- [Recommended] Remove 5 long examples [Y/n]: Y
- [Recommended] Add a suffix separator `\n\n###\n\n` to all prompts [Y/n]: Y
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: Y
- [Recommended] Would you like to split into training and validation set? [Y/n]: Y


Your data will be written to a new JSONL file. Proceed [Y/n]: Y

Wrote modified files to `vehicles_prepared_train.jsonl` and `vehicles_prepared_valid.jsonl`
Feel free to take a look!

Now use that file when fine-tuning:
> openai api fine_tunes.create -t "vehicles_prepared_train.jsonl" -v "vehicles_prepared_valid.jsonl" --compute_classification_metrics --classification_positive_class " motorcycles"

After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string `\n\n###\n\n` for the model to start generating completions, rather than continuing with the prompt. Make sure to include `stop=["s"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 30.82 minutes to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.
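After these transformations each training record carries the \n\n###\n\n separator at the end of the prompt and a leading space in the completion; an abbreviated, illustrative record from vehicles_prepared_train.jsonl:

{"prompt": "From: stlucas@gdwest.gd.com (Joseph St. Lucas)\nSubject: Re: Dumbest automotive concepts of all time ...\n\n###\n\n", "completion": " autos"}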

Now we can start the training process; from this point an OpenAI API key is required.

The command to start the fine-tuning is a single line, with the base GPT-3 model defined at the end via the -m flag; in this case it is ada. I wanted to make use of davinci, but its cost is extremely high as opposed to ada, which is one of the original base GPT-3 models.

!openai --api-key 'xxxxxxxxxxxxxxxxx' api fine_tunes.create -t "vehicles_prepared_train.jsonl" -v "vehicles_prepared_valid.jsonl" --compute_classification_metrics --classification_positive_class " autos" -m ada

The output from the training process:

Upload progress: 100% 1.35M/1.35M [00:00<00:00, 1.59Git/s]
Uploaded file from vehicles_prepared_train.jsonl: file-qN7D2kAh9h5Ui1XnZuDNrPgm
Upload progress: 100% 320k/320k [00:00<00:00, 623Mit/s]
Uploaded file from vehicles_prepared_valid.jsonl: file-ijOCUihdypRPrcTzodxN9Pa6
Created fine-tune: ft-xPIJ4BIM4giuXY4JOQ9rno2v
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-01-31 05:44:52] Created fine-tune: ft-xPIJ4BIM4giuXY4JOQ9rno2v
[2023-01-31 05:46:21] Fine-tune costs $0.65
[2023-01-31 05:46:22] Fine-tune enqueued. Queue number: 0
[2023-01-31 05:46:24] Fine-tune started
[2023-01-31 05:49:03] Completed epoch 1/4
[2023-01-31 05:51:36] Completed epoch 2/4
[2023-01-31 05:54:07] Completed epoch 3/4
[2023-01-31 05:56:38] Completed epoch 4/4
[2023-01-31 05:57:11] Uploaded model: ada:ft-personal-2023-01-31-05-57-11
[2023-01-31 05:57:12] Uploaded result file: file-EzPswfO3vXl3RXqbIGr4qebS
[2023-01-31 05:57:12] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m ada:ft-personal-2023-01-31-05-57-11 -p <YOUR_PROMPT>
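As the log notes, Ctrl-C interrupts the event stream without cancelling the fine-tune. If the stream is interrupted, it can be resumed with fine_tunes.follow, using the job ID from the log above (same API key placeholder as before):

!openai --api-key 'xxxxxxxxxxxxxxxxx' api fine_tunes.follow -i ft-xPIJ4BIM4giuXY4JOQ9rno2v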

And lastly, the model is queried with an arbitrary sentence: “So how do I steer when my hands aren't on the bars?”

openai.api_key = "xxxxxxxxxxxxxxxxx"

ft_model = 'ada:ft-personal-2023-01-31-05-57-11'
sample_utterance = """So how do I steer when my hands aren't on the bars?"""
# Append the separator the model was trained with, request a single completion
# token at temperature 0, and return the top 2 token log probabilities.
res = openai.Completion.create(model=ft_model, prompt=sample_utterance + '\n\n###\n\n', max_tokens=1, temperature=0, logprobs=2)
res['choices'][0]['text']

The correct answer, motorcycles, is returned.
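Since logprobs=2 was requested, the response also contains the log probabilities of the top candidate tokens, which can be converted into a rough confidence score for the classification; a minimal sketch using the res object from above:

import math

# Log probabilities of the top tokens for the single predicted completion token.
top = res['choices'][0]['logprobs']['top_logprobs'][0]
# Convert log probabilities to probabilities per candidate label token.
probs = {token: math.exp(lp) for token, lp in top.items()}
print(probs)  # e.g. {' motorcycles': 0.98, ' autos': 0.02} (illustrative values)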

Another example, with the sentence: “Is countersteering like benchracing only with a taller seat, so your feet aren't on the floor?”

ft_model = 'ada:ft-personal-2023-01-31-05-57-11'
sample_utterance ="""Is countersteering like benchracing only with a taller seat, so your feet aren't on the floor?"""
res = openai.Completion.create(model=ft_model, prompt=sample_utterance + '\n\n###\n\n', max_tokens=1, temperature=0, logprobs=2)
res['choices'][0]['text']

And again the correct result, motorcycles, is returned.

The process of fine-tuning LLMs is not currently receiving the attention it deserves. However, as production implementations of LLMs grow, there will be more focus on fine-tuning for enhanced performance.
