OpenAI Has Expanded Their Fine-Tuning GUI

OpenAI has simplified fine-tuning considerably by introducing a GUI for creating fine-tuned models.

6 min readOct 8, 2023


While fine-tuning changes the behaviour of the LLM and RAG provides a contextual reference for inference, fine-tuning has not received the attention it should have in the recent past. One can argue that this is due to a few reasons…

Training Data Size

In the past OpenAI advised to have more than 1,000 fine-tuning records in the training data set. Hence preparing and formatting data was challenging and time-consuming.

The new OpenAI fine-tuning process requires only 10 records for training a fine-tuned model. And the results of the fine-tuning can be demonstrated via a few simple prompts.

Training Time

The time the model took to create and train a fine-tuned model was long and following a process of rapid iterations was not feasible. Training time has shortened considerably and via the fine-tuning console and email notification the user is kept up to date.


The fine-tuning UI was in the past command line or program based, the addition of a GUI to upload data, track the progress of fine-tuning jobs, etc. will democratise the process.


The cost of fine-tuning has come down considerably making it accessible for organisation to create custom models.


Data Privacy

Data Privacy is still a consideration, with data being sent into the cloud for fine-tuning. Often enterprises demand all computing to take place via an on-premise data centre, or only in certain geographies.


Fine-tuning for completion is important; but there is a significant use-case for classification which is not receiving the attention it should. Classification in the context of traditional chatbots is known as intent detection and NLU is still relevant for classification.

Hosted & Latency

Fine-tuned models are still hosted somewhere and compliance can be hard to reach in some instances. There is a significant opportunity to meet enterprise requirements for hosting and data privacy.

Product Superseding

With each expansion of base-LLM functionality, functionality included by default in the LLM offering, a number of products are wiped out. Or put differently, superseded.


The real challenge for fine-tuning large language models in a scaleable and repeatable fashion lies with the data. And in particular data discovery, data design, data development and data delivery. More about the four D’s in a follow-up post.

OpenAI Fine-Tuning UI

The OpenAI fine-tuning UI which launched recently is very minimalistic, but effective. A list of fine-tuned models is visible on the left, with successful and failed attempts listed. On the top right new fine-tunings can be created, or users can navigate to the training files section.

Below is a view of the files section, where a JSONL file can be uploaded. The minimum size of a training file is 10 records/lines.

The training file text:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What country has the highest life expectancy?"}, {"role": "assistant", "content": "If you want to live long, go to Hong Kong!"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the most common surname in the United States?"}, {"role": "assistant", "content": "It is so common you should know it is Smith."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What country drinks the most coffee per capita?"}, {"role": "assistant", "content": "Finland, not Italy or the US."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What country has won the most World Cups?"}, {"role": "assistant", "content": "Brazil won the most soccer and football world cups."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What software company is headquartered in Redmond, Washington?"}, {"role": "assistant", "content": "Microsoft, the inventors of the reboot."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What city is known as the eternal city?"}, {"role": "assistant", "content": "Rome, but we know when it was founded..."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What artist has the most streams on Spotify?"}, {"role": "assistant", "content": "It sounds like Drake."}]}

The training file needs to be in the structure of the chat mode with roles of system and user.

Once the file is uploaded, it is vetted by OpenAI and if no anomalies are found, a status of ready is assigned.

Below the list of available models for fine-tuning is shown; babbage-002, davinci-002 and gpt-3.5-turbo-0613. There is the option to upload a new fie, or select an existing file. Something I find curious here is that the files are not listed, and users need to navigate to the files section, copy a file ID and navigate back to paste the ID.

The file ID is shown, with the model to fine-tune.

In Closing

There is a big opportunity in terms of the data; as I have mentioned data discovery, data design and data development.

The first step of data discovery, is to ensure that the training data is aligned with the conversation customers want to have. Considering the Venn diagram below, the bigger the commonality marked ‘a’ is, the more successful the model will be.

Increasing the size of commonality ‘a’ involves the process of performing data discovery on existing customer conversations, and using that data as the bedrock of training data.



Cobus Greyling

I explore and write about all things at the intersection of AI & language; LLMs/NLP/NLU, Chat/Voicebots, CCAI.