GPTBot

OpenAI’s Web Crawler

3 min read · Aug 9, 2023

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

There is an uproar on the internet about GPTBot and how it will be used to train GPT-5.

However, crawlers are as old as the internet, and companies like Google have long used them to perform actions for their products automatically.

Crawler and bot are generic terms for an automated process that discovers and scans websites by following links from one page to the next.

Google uses crawlers and fetchers to perform actions for its products, either automatically or triggered by user request.

Bots have therefore been used extensively by various companies; should OpenAI’s transparency, and the ability to opt out, not be lauded?

GPTBot, OpenAI’s web crawler, can be identified by its user agent token and its full user-agent string:

User agent token: GPTBot

Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
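To make the identification concrete, here is a minimal sketch of how a site could spot GPTBot in an incoming request header. The function name is_gptbot and the regular expression are illustrative assumptions, not anything published by OpenAI; the only facts taken from above are the token and the full user-agent string. Bear in mind that a User-Agent header can be spoofed, so this is a hint rather than proof.

```python
import re

# Assumption: a case-insensitive match on the "GPTBot" token is sufficient.
GPTBOT_TOKEN = re.compile(r"\bGPTBot\b", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header value contains the GPTBot token."""
    return bool(GPTBOT_TOKEN.search(user_agent or ""))


# The full user-agent string quoted above.
full_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)

print(is_gptbot(full_ua))
# A made-up browser user agent for comparison.
print(is_gptbot("Mozilla/5.0 (X11; Linux x86_64) Firefox/116.0"))
```

A check like this is mainly useful for log analysis, for example counting how often GPTBot visits before deciding whether to restrict it.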

Disallowing GPTBot

To block GPTBot from accessing a website, add the following to the site’s robots.txt:

User-agent: GPTBot
Disallow: /

GPTBot Customised Access

To let GPTBot access only parts of a site, add the GPTBot token to the site’s robots.txt like this:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
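Before deploying rules like these, it can help to check how a robots.txt parser reads them. The sketch below uses Python’s standard urllib.robotparser for that. The helper name allowed and the example.com URLs are hypothetical, and Python’s parser is only a stand-in: it shows how one standards-following parser interprets the file, not how GPTBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Rule set that blocks GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rule set that gives GPTBot customised access.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

# example.com is a placeholder domain used only for illustration.
print(allowed(BLOCK_ALL, "GPTBot", "https://example.com/any-page"))
print(allowed(PARTIAL, "GPTBot", "https://example.com/directory-1/a.html"))
print(allowed(PARTIAL, "GPTBot", "https://example.com/directory-2/b.html"))
```

Note that these rules name only GPTBot, so other crawlers are unaffected unless the file also contains entries for them.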


