Fine-Tuning With The OpenAI Language API

And How This Enables You To Leverage GPT-3


On 13 July 2021 OpenAI enabled fine-tuning for all users who have API access. Some elements of this feature are currently in beta, hence some parameters will most probably change.

The idea from OpenAI is that fine-tuning of this nature affords users the opportunity to train a model, which should yield answers in keeping with the training data rather than of a general nature.

All tests were performed using the OpenAI CLI (Command Line Interface).

In some instances cURL, the Playground or Python code can be used. However, the OpenAI CLI lends the best structure to the training process.

Once a model has been fine-tuned, you won’t need to provide examples in the prompt anymore.
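To make the difference concrete, here is a minimal sketch contrasting the two prompt styles. The helper names and the `Q:`/`A:` layout are illustrative assumptions, not an OpenAI-defined format. With a base model the prompt has to carry worked examples. With a fine-tuned model those examples live in the training data, so only the question is sent.

```python
def few_shot_prompt(examples, question):
    """Base model: prepend worked (question, answer) examples to the new question."""
    # Hypothetical Q:/A: layout; any consistent pattern would serve.
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in examples)
    return f"{shots}Q: {question}\nA:"


def fine_tuned_prompt(question):
    """Fine-tuned model: the examples were learned during training, so send the question alone."""
    return question


print(few_shot_prompt([("Is Paris the capital of France?", "Yes")], "Is Rome in Spain?"))
print(fine_tuned_prompt("Is Rome in Spain?"))
```

The shorter prompt also means fewer tokens per request, which is part of the appeal of fine-tuning.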

For a general-purpose chatbot, the training data can be minimal.

Perhaps 20 examples per intent, at most, for starters. However, when creating a data set for training, it is advised that you use a few hundred training examples.

For classification, at least 100 training examples are required per class; for some tasks, more than 500 records of training data are demanded.
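Given these thresholds, it is worth checking the class balance before uploading anything. A minimal sketch, assuming each classification example is a dict whose `completion` holds the class label. The function name and data layout are my assumptions; the threshold of 100 is the figure quoted above.

```python
from collections import Counter

MIN_PER_CLASS = 100  # per-class minimum quoted above for classification


def underrepresented_classes(examples, minimum=MIN_PER_CLASS):
    """Return {label: count} for every class with fewer than `minimum` examples.

    Assumes each example is a dict with a "completion" key holding the class label.
    """
    counts = Counter(ex["completion"].strip() for ex in examples)
    return {label: n for label, n in sorted(counts.items()) if n < minimum}
```

An empty result means every class meets the minimum. Anything returned needs more data before training.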

This is not in keeping with other environments like Rasa, IBM Watson Assistant, Microsoft LUIS etc., where astounding results can be achieved with relatively few training examples.

At a high level, fine-tuning involves the following steps:

  1. Prepare and upload training data
  2. Train a new fine-tuned model
  3. Use your fine-tuned model

The Prototype Environment

I found the easiest way to run the OpenAI CLI was to spin up an Ubuntu instance on AWS, and run the commands via SSH and PuTTY.

The OpenAI CLI is very responsive and easy to use. Its level of simplicity should yield good adoption.

openai api completions.create -m ada:ft-user-sdfsfaefsfdsfsfdfdfsfjvlss-2021-07-30-19-19-27 -p <YOUR_PROMPT>

The trained model can be invoked with the command above, referencing the model ID.
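To script these calls rather than type them over SSH, one option is to wrap the same CLI command in Python's `subprocess`. This sketch uses only the `-m` and `-p` flags shown above. It assumes the `openai` CLI is installed and an API key is configured; the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess


def completion_command(model_id: str, prompt: str) -> list[str]:
    """Build the argument list for `openai api completions.create` (flags as used above)."""
    return ["openai", "api", "completions.create", "-m", model_id, "-p", prompt]


def run_completion(model_id: str, prompt: str) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError if the command fails."""
    result = subprocess.run(
        completion_command(model_id, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing a list instead of a single shell string means prompts containing quotes or question marks need no escaping.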

# List all created fine-tunes
openai api fine_tunes.list

All the models you have trained are listed with the list command. Training does take a while, and at this stage a fine-tuned model responds sluggishly to user prompts: slow for chat, and definitely not suited to a voicebot.
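When several training jobs are queued, it helps to filter the list down to models that are ready to use. A minimal sketch, assuming the list command prints JSON with a `data` array whose entries carry `status` and `fine_tuned_model` fields. Those field names are my assumption, so check them against your own output; the sample values in any test are made up.

```python
from __future__ import annotations

import json


def ready_models(list_output: str) -> list[str]:
    """Return fine-tuned model names whose training job has succeeded.

    Assumes `list_output` is JSON with a "data" array of jobs, each carrying
    "status" and "fine_tuned_model" keys (field names are an assumption).
    """
    payload = json.loads(list_output)
    return [
        job["fine_tuned_model"]
        for job in payload.get("data", [])
        if job.get("status") == "succeeded" and job.get("fine_tuned_model")
    ]
```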

More On The Language API, Fine-Tuning & Leveraging GPT-3

For the prototype I created a JSONL file with 1,500 entries of questions and answers from Kaggle.

{"prompt":"Did the U.S. join the League of Nations?",
{"prompt":"Where was the League of Nations created?",
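JSONL is simply one JSON object per line, here with `prompt` and `completion` keys. Below is a minimal sketch of producing such a file from question/answer pairs with the standard library. The function name and the sample pair are illustrative and not taken from the Kaggle data.

```python
import json


def to_jsonl(pairs) -> str:
    """Serialise (question, answer) pairs as JSONL: one {"prompt", "completion"} object per line."""
    lines = [json.dumps({"prompt": q, "completion": a}) for q, a in pairs]
    return "\n".join(lines) + "\n"


# Illustrative pair only; not from the original data set.
pairs = [("Is Paris the capital of France?", "Yes")]
with open("qa.jsonl", "w", encoding="utf-8") as f:
    f.write(to_jsonl(pairs))
```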

GPT-3 fine-tuning supports classification, sentiment analysis, entity extraction, open-ended generation etc. The challenge will always be to allow users to train the conversational interface:

  • With as little data as possible,
  • whilst creating stable and predictable conversations,
  • and allowing for managing the environment (and collaboration).

OpenAI provides tooling to prepare and upload the training data. Before uploading, the OpenAI CLI assesses the training data…

openai tools fine_tunes.prepare_data -f qa.txt

And responds with suggestions…

Analyzing...
- Based on your file extension, you provided a text file
- Your file contains 1476 prompt-completion pairs
- `completion` column/key should not contain empty strings. These are rows: [1475]
Based on the analysis we will perform the following actions:
- [Necessary] Your format `TXT` will be converted to `JSONL`
- [Necessary] Remove 1 rows with empty completions
- [Recommended] Remove 159 duplicate rows [Y/n]: Y
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: Y
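The row-level fixes proposed above are easy to reason about. This sketch applies them to a list of records: drop empty completions, drop duplicates, and prefix each completion with a space. It is an approximation for illustration, not the tool's actual implementation. Treating a row as a duplicate only when both prompt and completion match is my assumption.

```python
def clean_records(records):
    """Apply prepare_data-style fixes to a list of {"prompt", "completion"} dicts."""
    cleaned = []
    seen = set()
    for rec in records:
        prompt, completion = rec["prompt"], rec["completion"]
        # [Necessary] drop rows whose completion is empty or whitespace-only
        if not completion.strip():
            continue
        # [Recommended] drop duplicates (assumed: same prompt and same completion)
        key = (prompt, completion.strip())
        if key in seen:
            continue
        seen.add(key)
        # [Recommended] start every completion with a single whitespace character
        cleaned.append({"prompt": prompt, "completion": " " + completion.lstrip()})
    return cleaned
```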

Training is initiated with the command:

openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> -m <BASE_MODEL>
openai api fine_tunes.create -t qa.jsonl -m ada

The training job is queued and can take quite a while to complete before the model can be tested. With current, established chatbot development environments, quick iterations are possible, comprising:

  1. Compile training data
  2. Train
  3. Test
  4. Make changes
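To keep this compile/train/test loop honest, it helps to hold some examples back so the model is tested on questions it was not trained on. A minimal sketch of a seeded, repeatable split follows. The 10% test fraction, the seed and the function name are arbitrary choices for illustration.

```python
import random


def split_examples(examples, test_fraction=0.1, seed=42):
    """Shuffle a copy of `examples` deterministically and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

A fixed seed keeps the split identical between iterations, so changes in results reflect changes to the training data rather than a different test set.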

With GPT-3, it seems that steering the model using training data will be hard.

The model can be tested in the following way:

openai api completions.create -m curie:ft-user-fdfefsfrssasfooeesfs-2021-07-30-21-30-40 -p "Is it a winter sports resort, although it is perhaps best known as a tax haven?"
Is it a winter sports resort, although it is perhaps best known as a tax haven? Yes. It is a winter sports resort.

Something interesting emerges from the response. The training data was:

{"prompt":"Is it a winter sports resort , although it is perhaps best known as a tax haven ?","completion":"Yes"}

And the response from GPT-3 was:

Yes. It is a winter sports resort.

This is a very conversational augmentation of the short training completion, “Yes”.
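Because the model may elaborate on a short training completion like this, exact string matching is too strict when testing. One hedged option is to normalise both strings and accept a response that begins with the expected answer at a word boundary. The normalisation rules and the prefix criterion below are my assumptions, not an OpenAI evaluation method.

```python
import re


def normalise(text: str) -> str:
    """Lower-case, remove spaces before punctuation, and collapse runs of whitespace."""
    text = re.sub(r"\s+([,.?!;:])", r"\1", text.strip().lower())
    return re.sub(r"\s+", " ", text)


def matches(expected: str, response: str) -> bool:
    """True if the normalised response starts with the normalised expected answer at a word boundary."""
    exp, resp = normalise(expected), normalise(response)
    if not resp.startswith(exp):
        return False
    rest = resp[len(exp):]
    # Word boundary check, so "yes" does not match "yesterday".
    return rest == "" or not rest[0].isalnum()


def accuracy(pairs) -> float:
    """Fraction of (expected, response) pairs that match; 0.0 for no pairs."""
    pairs = list(pairs)
    return sum(matches(e, r) for e, r in pairs) / len(pairs) if pairs else 0.0


print(matches("Yes", " Yes. It is a winter sports resort."))
```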


The OpenAI Language API, as a conversational environment, is definitely moving in the right direction. With the Language API, it seems as if OpenAI started at the opposite end from other chatbot framework providers.

They introduced a low-code level 4/5 chatbot that lacked reliable responses and fine-tuning.

A custom-trained model can be tested in the Playground, where it yields a custom-trained response. Using the Playground makes it easier to switch between models and test different scenarios.

Fine-tuning is the avenue to a more reliable or predictable chatbot; especially for a corporate or enterprise solution.

Some considerations:

  • Training on smaller samples of data will help with benchmarking and quick iterations.
  • Defining entities contextually within intent examples is important. I did not test this feature, as it requires at least 500 training examples.
  • Having different trained models to manage can be a challenge. An abstraction layer will most probably be required to determine which model is applicable in a specific scenario.
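Here is one possible shape for such an abstraction layer: a registry mapping scenarios to fine-tuned model IDs, with a fallback default. The scenario names and model IDs are hypothetical placeholders. The naive keyword match is only a stand-in; a real router would more likely use a classifier.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ModelRouter:
    """Map scenarios to fine-tuned model IDs, falling back to a default model."""
    default_model: str
    routes: dict = field(default_factory=dict)    # scenario -> model ID
    keywords: dict = field(default_factory=dict)  # scenario -> set of trigger words

    def register(self, scenario, model_id, trigger_words):
        self.routes[scenario] = model_id
        self.keywords[scenario] = {w.lower() for w in trigger_words}

    def model_for(self, user_text):
        """Return the model of the first registered scenario whose trigger words appear in the text."""
        words = set(re.findall(r"[a-z0-9]+", user_text.lower()))
        for scenario, triggers in self.keywords.items():
            if words & triggers:
                return self.routes[scenario]
        return self.default_model


# Hypothetical model IDs, for illustration only.
router = ModelRouter(default_model="ada")
router.register("history", "ada:ft-history-hypothetical", ["league", "nations"])
router.register("travel", "curie:ft-travel-hypothetical", ["resort", "ski"])
print(router.model_for("Where was the League of Nations created?"))
```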

The OpenAI Language API leveraging GPT-3 can be a disruptive force once a more structured and cohesive fine-tuning approach is reached: an approach which is conducive to collaboration within larger teams.

At times I wonder whether GPT-3 is aiming to become a general NLP/conversational tool, or whether there are ambitions for it to become a low-code chatbot development framework.

To evaluate GPT-3’s NLU/P capability accurately, it is prudent to keep the vision of OpenAI in mind…

Our API provides a general-purpose “text in, text out” interface, which makes it possible to apply it to virtually any language task. This is different from most other language APIs, which are designed for a single task, such as sentiment classification or named entity recognition.

The API runs models with weights from the GPT-3 family with many speed and throughput improvements.


