Testing OpenAI’s New AI Text Classifier For Identifying AI-Written Content

I took human-written and AI-generated text from various sources, including several LLMs, and submitted it to the OpenAI classifier. The objective was to gauge the classifier’s ability to detect the origin of the text content.

Cobus Greyling
6 min read · Feb 1

Yesterday OpenAI announced and launched a classifier trained to distinguish between AI-written and human-written text.

Each document submitted is classified into one of five classes:

1️⃣ Very unlikely AI-generated,

2️⃣ Unlikely AI-generated,

3️⃣ Unclear if it is AI-generated,

4️⃣ Possibly AI-generated, or

5️⃣ Likely AI-generated.
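For readers who want to work with these five outcomes programmatically, below is a minimal Python sketch of the labels, together with an illustrative mapping from an AI-probability score to a label. The cut-off values are my own assumption for demonstration purposes and are not OpenAI’s published thresholds.

```python
from enum import Enum

class ClassifierLabel(Enum):
    VERY_UNLIKELY = "Very unlikely AI-generated"
    UNLIKELY = "Unlikely AI-generated"
    UNCLEAR = "Unclear if it is AI-generated"
    POSSIBLY = "Possibly AI-generated"
    LIKELY = "Likely AI-generated"

def label_from_probability(p: float) -> ClassifierLabel:
    """Map a hypothetical AI-probability score to one of the five labels.

    The cut-offs below are illustrative only, not OpenAI's actual thresholds.
    """
    if p < 0.10:
        return ClassifierLabel.VERY_UNLIKELY
    if p < 0.45:
        return ClassifierLabel.UNLIKELY
    if p < 0.90:
        return ClassifierLabel.UNCLEAR
    if p < 0.98:
        return ClassifierLabel.POSSIBLY
    return ClassifierLabel.LIKELY
```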

OpenAI trained the classifier by fine-tuning a GPT model to differentiate between human-written and AI-written text. The model predicts how likely it is that a portion of text was AI-generated, covering output from a variety of sources, including ChatGPT.

I made use of AI21Labs, Cohere, text-davinci-003, ChatGPT and other sources to generate text on an arbitrary and ambiguous topic, “punctuality”, to test the classifier.

In the table below is an overview of the results, with the source of the text on the left and the classifier accuracy on the right. The details of the results are discussed in the article…

OpenAI does clearly state the following:

Our classifier is not fully reliable.

In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives).

Our classifier’s reliability typically improves as the length of the input text increases.

Compared to our previously released classifier, this new classifier is significantly more reliable on text from more recent AI systems.
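To make those percentages concrete, here is a back-of-the-envelope calculation assuming a balanced sample of 100 AI-written and 100 human-written documents. The sample sizes are my own illustrative assumption, not OpenAI’s evaluation setup.

```python
# Illustrative only: assume 100 AI-written and 100 human-written documents.
ai_docs, human_docs = 100, 100

true_positive_rate = 0.26   # share of AI text flagged "likely AI-written"
false_positive_rate = 0.09  # share of human text wrongly flagged as AI-written

flagged_ai = ai_docs * true_positive_rate          # 26 documents
flagged_human = human_docs * false_positive_rate   # 9 documents

precision = flagged_ai / (flagged_ai + flagged_human)
print(f"Documents flagged as likely AI-written: {flagged_ai + flagged_human:.0f}")
print(f"Share of those flags that are truly AI-written: {precision:.0%}")  # ~74%
```

In other words, on such a balanced sample roughly three out of four “likely AI-written” flags would be correct, while most AI-written text would slip through unflagged.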

Text Generated Via The Cohere LLM

In the image below you see text generated in the Cohere playground…the engineered prompt, in other words the instruction given to the LLM (the input), is indicated by the red arrow.

And below it, marked as output, you see the text generated by Cohere.

Below, the generated text from Cohere is copied into the AI text classifier of OpenAI. The result from the classifier is that the text is to be considered likely AI-generated; hence a correct classification with full confidence.

Text generated in the Cohere Playground is submitted here to the AI Text Classifier of OpenAI.
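For readers who prefer the API over the playground, the same generation can be reproduced with Cohere’s Python SDK. This is a minimal sketch; the prompt wording, token budget and sampling temperature are my own assumptions, not the exact settings used in the screenshot.

```python
import cohere

# Assumes a Cohere API key; prompt and parameters are illustrative.
co = cohere.Client("YOUR_COHERE_API_KEY")

response = co.generate(
    prompt="Write a short essay on the importance of punctuality.",
    max_tokens=300,
    temperature=0.8,
)

generated_text = response.generations[0].text
print(generated_text)  # this is the kind of text that was pasted into the classifier
```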

Text Generated Via AI21Labs

The same generation command was issued in the AI21Labs playground…asking the AI21Labs LLM to generate text on the importance of punctuality.

Below, the AI21Labs-generated text is submitted to the OpenAI AI Text Classifier, with the desired response. The result from the classifier is that the text is to be considered likely AI-generated; hence a correct classification with full confidence.

Text generated in the AI21Labs Playground is submitted here to the AI Text Classifier of OpenAI.
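The AI21Labs generation can likewise be scripted against AI21 Studio’s completion endpoint. Below is a rough sketch using plain HTTP requests; the model name, endpoint path, request fields and response shape follow AI21’s Studio documentation as I understand it, so treat them as assumptions to verify against the current docs.

```python
import requests

# Assumptions: j1-jumbo model, AI21 Studio completion endpoint and response shape.
API_KEY = "YOUR_AI21_API_KEY"
url = "https://api.ai21.com/studio/v1/j1-jumbo/complete"

payload = {
    "prompt": "Write a short essay on the importance of punctuality.",
    "maxTokens": 300,
    "temperature": 0.8,
    "numResults": 1,
}

resp = requests.post(url, json=payload,
                     headers={"Authorization": f"Bearer {API_KEY}"})
resp.raise_for_status()

# The generated text is expected under completions[0].data.text in the JSON response.
generated_text = resp.json()["completions"][0]["data"]["text"]
print(generated_text)
```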

ChatGPT

Below you see content generated by ChatGPT…it is rated as possibly AI-generated by the classifier, hence seen as one step closer to human-generated text as opposed to the Cohere and AI21Labs output.

I would have expected the classifier to state likely AI-generated with full confidence.

OpenAI text-davinci-003 Model

I also submitted a 500-word text generated by text-davinci-003 on the topic of punctuality and received the same answer from the classifier: possibly AI-generated.

I assumed the classifier would be able to clearly detect text generated by text-davinci-003 or ChatGPT.
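For reference, this is roughly how the text-davinci-003 sample can be produced through the OpenAI completions API, using the openai Python library as it existed at the time of writing. The prompt wording and token budget are my own approximations of the 500-word request.

```python
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"

# Ask text-davinci-003 for roughly 500 words on punctuality.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a 500-word essay on the importance of punctuality.",
    max_tokens=700,   # headroom for roughly 500 words
    temperature=0.7,
)

generated_text = response["choices"][0]["text"].strip()
print(generated_text)  # this text was then pasted into the AI Text Classifier
```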

An Essay From The Web

I copied a piece from an online essay, and the result from the classifier is ambiguous to some degree, but fairly accurate.

My Own Writing

Below is an original piece I wrote on the same subject, which was marked by OpenAI as possibly AI-generated. I would have expected a result of unclear if it is AI-generated.

But I hasten to add that the piece is short and, as I have said before, ambiguous, with not much definitive text.

Wikipedia

Considering that the AI Text Classifier was partly trained on Wikipedia, I copied a piece from Wikipedia on World War I and asked the classifier to vet the contents. Here I got the right answer, and also the strongest rating of very unlikely AI-generated.

Can ChatGPT Detect Text Origins?

The short answer is…yes.

The results are definitive, and in my few attempts, very accurate:

And the response on my own writing is also correct.

Keep In Mind

Apart from the accuracy issues stated at the beginning of this article, there are other limitations…

⏺ The text and subject I used as the premise for the writing are very generic and general. Ambiguous content like this is most probably harder to classify.

⏺ The longer the text to be analysed (> 1,000 characters), the more reliable the results are; see the short length check sketched after this list.

⏺ Human-written text is sometimes incorrectly labeled as AI-written, so there seems to be a bias towards a default classification of “AI written”.

⏺ The classifier is English only and not multilingual.

⏺ The classifier is unreliable at classifying code.

⏺ AI generated text which is edited by a human can fool the classifier.
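Because reliability drops sharply on short inputs, it is worth checking the length of a document before submitting it at all. Below is a minimal sketch; the helper function is hypothetical, while the 1,000-character floor mirrors the minimum the web tool asks for.

```python
MIN_CHARS = 1000  # the web tool asks for at least 1,000 characters

def ready_for_classifier(text: str) -> bool:
    """Hypothetical helper: return True if the text is long enough to give
    the classifier a fair chance, otherwise warn and return False."""
    length = len(text)
    if length < MIN_CHARS:
        print(f"Text is only {length} characters; results below {MIN_CHARS} "
              "characters are likely to be unreliable or rejected outright.")
        return False
    return True

# Example: a short 'punctuality' snippet would fail this check.
sample = "Punctuality is a small courtesy that signals respect for other people's time."
print(ready_for_classifier(sample))  # False
```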

The Data

OpenAI collected a dataset of AI-generated and human-written text.

The human-written text has three sources:

In Conclusion

It is evident that the accuracy of the classifier is not where it should be, and OpenAI states this fact openly: “Our classifier is not fully reliable”.

However, there are a few positives…the first is that this is a step in the right direction and will become an invaluable tool, especially for educators and educational institutions.

Responsible AI has always been front of mind for most people, and OpenAI has been very open about their focus and due diligence regarding responsible AI.

Considering all of this, the classifier is a welcome development and another example of OpenAI taking the lead.


Cobus Greyling

Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; NLP/NLU/LLM, Chat/Voicebots, CCAI.