Adding A Moderation Layer To Your OpenAI Implementations

If you don’t add a moderation layer, you are jeopardising any production implementation you might have. The moderation endpoint is free to use for monitoring the inputs and outputs of OpenAI API calls; this excludes third-party traffic.

  1. OpenAI may ask API users to make the necessary changes.
  2. Repeated or serious violations can lead to further action from OpenAI, up to and including termination of your account.
  3. This is crucial for any commercial or product-related implementation: failure to moderate usage can lead to suspension of OpenAI services, impeding the product or service you deliver.

We want everyone to use our tools safely and responsibly. That’s why we’ve created usage policies that apply to all users of OpenAI’s models, tools, and services. By following them, you’ll ensure that our technology is used for good.

~ OpenAI

OpenAI is working on an ongoing basis to improve the accuracy of the classifier, and states that it is especially focussed on improving the classifications for hate, self-harm, and violence/graphic.

pip install openai

import os
import openai

# Read the API key from the environment rather than hard-coding it
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Moderation.create(
    input="Sample text goes here"
)
output = response["results"][0]
print(output)
{
  "categories": {
    "hate": false,
    "hate/threatening": false,
    "self-harm": false,
    "sexual": false,
    "sexual/minors": false,
    "violence": false,
    "violence/graphic": false
  },
  "category_scores": {
    "hate": 4.921302206639666e-06,
    "hate/threatening": 1.0990176546599173e-09,
    "self-harm": 8.864341261016762e-09,
    "sexual": 2.6443567548994906e-05,
    "sexual/minors": 2.4819328814373876e-07,
    "violence": 2.1955165721010417e-05,
    "violence/graphic": 5.248724392004078e-06
  },
  "flagged": false
}
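Beyond reading the overall flagged verdict, you will often want to know which specific categories fired. A minimal helper for that might look as follows; note that flagged_categories is a hypothetical convenience function written for this article, not part of the OpenAI SDK.

```python
# Hypothetical helper: list the category names a moderation result flagged.
# `result` is one element of response["results"], shaped like the output above.

def flagged_categories(result: dict) -> list:
    """Return the names of all categories the moderation model flagged."""
    return [name for name, hit in result["categories"].items() if hit]

# The benign sample above flags nothing:
sample = {
    "categories": {
        "hate": False,
        "hate/threatening": False,
        "self-harm": False,
        "sexual": False,
        "sexual/minors": False,
        "violence": False,
        "violence/graphic": False,
    },
    "flagged": False,
}
print(flagged_categories(sample))  # → []
```

A list like this is useful for logging or for routing a flagged conversation to the appropriate escalation path per category.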
response = openai.Moderation.create(
    input="I hate myself and want to do harm to myself"
)
output = response["results"][0]
print(output)
{
  "categories": {
    "hate": false,
    "hate/threatening": false,
    "self-harm": true,
    "sexual": false,
    "sexual/minors": false,
    "violence": false,
    "violence/graphic": false
  },
  "category_scores": {
    "hate": 5.714087455999106e-05,
    "hate/threatening": 2.554639308982587e-07,
    "self-harm": 0.9999761581420898,
    "sexual": 2.3994387447601184e-05,
    "sexual/minors": 1.6004908331979095e-07,
    "violence": 0.027929997071623802,
    "violence/graphic": 4.723879101220518e-06
  },
  "flagged": true
}
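Since the classifier is still being improved, you may not want to rely solely on the built-in flagged verdict. One option is to apply your own, stricter thresholds to the raw category_scores. The sketch below illustrates this; the threshold values are illustrative assumptions, not OpenAI defaults.

```python
# Hypothetical sketch: apply stricter, per-category thresholds to the raw
# category_scores, rather than relying only on the built-in `flagged` verdict.
# The threshold values here are illustrative assumptions, not OpenAI defaults.

THRESHOLDS = {
    "self-harm": 0.01,  # be extra conservative on self-harm
    "violence": 0.01,
    "hate": 0.05,
}

def exceeds_thresholds(result: dict, thresholds: dict = THRESHOLDS) -> bool:
    """True if any category score crosses our own (stricter) threshold."""
    scores = result["category_scores"]
    return any(scores.get(cat, 0.0) >= limit for cat, limit in thresholds.items())

# In the self-harm example above, the 0.0279 violence score alone would
# already cross the stricter 0.01 threshold set here:
result = {"category_scores": {"self-harm": 0.99998, "violence": 0.02793, "hate": 5.7e-05}}
print(exceeds_thresholds(result))  # → True
```

Tightening thresholds trades more false positives for fewer misses, which is usually the right trade-off for sensitive categories in production.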

The response has three sections, as described by OpenAI: categories (a boolean verdict per category), category_scores (the model’s confidence per category), and flagged (whether the content violates OpenAI’s usage policies).

In Closing

Any production implementation needs to exercise due diligence in ensuring input and output are moderated carefully and persistently. It might not be feasible, however, to manually monitor every single dialogue turn on both API input and output.
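One way to moderate every turn automatically is to gate each completion call on a moderation check of both the input and the output. The sketch below shows the pattern with injectable moderate and complete functions; these are hypothetical stand-ins that in production would wrap openai.Moderation.create and a completion endpoint.

```python
# Hypothetical sketch of a moderation gate around a completion call.
# `moderate` and `complete` are passed in so the pattern stays independent
# of SDK version; in production they would wrap openai.Moderation.create
# and a completion endpoint respectively.

def moderated_call(prompt, complete, moderate):
    """Moderate the input, run the completion, then moderate the output."""
    if moderate(prompt):
        raise ValueError("Input rejected by moderation")
    answer = complete(prompt)
    if moderate(answer):
        raise ValueError("Output rejected by moderation")
    return answer

# Stub functions standing in for real API calls:
fake_moderate = lambda text: "harm" in text
fake_complete = lambda text: "Echo: " + text

print(moderated_call("Tell me a joke", fake_complete, fake_moderate))
# → Echo: Tell me a joke
```

Checking the output as well as the input matters: a benign prompt can still elicit a response that violates the usage policies.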

https://www.linkedin.com/in/cobusgreyling

Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; NLP/NLU/LLM, Chat/Voicebots, CCAI. www.humanfirst.ai

Cobus Greyling
