How do I block ChatGPT from scraping my website?

Add a robots.txt rule that disallows OpenAI's GPTBot. This tool generates it: tick GPTBot (it is on by default), keep the path as "/", and copy the User-agent: GPTBot / Disallow: / block into the robots.txt file at your site root. GPTBot is the crawler OpenAI uses to gather training data, and OpenAI documents that it honours this rule.

What is GPTBot and should I block it?

GPTBot is OpenAI's web crawler that collects pages to train its GPT models. Blocking it keeps your content out of future training sets and has no effect on Google, Bing, or even ChatGPT's live search — those use different bots. Block it if you don't want your work used for model training; leave it if you don't mind.

Does blocking Google-Extended affect my Google Search ranking?

No. Google-Extended only controls whether your content trains Gemini and Vertex AI. Google's own documentation confirms it is separate from Googlebot, which handles Search indexing and ranking. You can disallow Google-Extended and keep ranking in Google Search exactly as before.

How do I stop AI companies from training on my content?

Block the training crawlers: GPTBot (OpenAI), Google-Extended, CCBot (Common Crawl), anthropic-ai, Applebot-Extended, meta-externalagent and cohere-ai. This tool's "Block training, keep AI search" preset selects all 7 of them plus the 7 aggressive scrapers, while leaving AI-search bots allowed so you stay visible in AI answers.

Is robots.txt enough to actually block AI crawlers?

It stops the crawlers that choose to obey it — OpenAI, Google, Anthropic and most reputable operators do. It does not physically block anything, so a rogue scraper can ignore it. For hard enforcement, add server, firewall, or Cloudflare AI-bot rules on top. robots.txt is the correct first step and the one every major AI company reads.

Will this hurt my normal SEO or Google ranking?

No. The generator never emits rules for Googlebot, Bingbot or other search crawlers — only AI-specific bots. Your pages keep indexing and ranking in traditional search engines exactly as before. That separation is the whole point of the tool.

What's the difference between AI training bots and AI search bots?

Training bots (GPTBot, Google-Extended, CCBot and 4 more) feed model pre-training datasets. AI-search bots (OAI-SearchBot, PerplexityBot, ClaudeBot and 4 more) index pages so ChatGPT, Perplexity and Claude can cite you in live answers. Blocking training protects your content; blocking AI-search can remove you from AI answer results, so many sites leave those allowed.

Where exactly do I put the generated rules?

In a plain-text file named robots.txt in your website's root directory, so it loads at https://yoursite.com/robots.txt. Paste the generated block there. If you already have a robots.txt, add these groups to it; they sit alongside any existing rules. Most CMSs (WordPress, Shopify, Wix) let you edit robots.txt, though where depends on the platform: a settings page, an SEO plugin, or a theme template.

How many AI bots does this tool cover?

21 crawlers across three groups: 7 training crawlers, 7 aggressive scrapers, and 7 AI-search assistants. Each user-agent token is taken from the operator's official documentation and linked on the page, and the output follows the Robots Exclusion Protocol (RFC 9309).


Block AI Bots in robots.txt — GPTBot, ClaudeBot & more

Stop AI crawlers from scraping your site for model training, without losing Google or Bing search traffic. Tick the bots to block, copy the rules, paste into robots.txt. Runs entirely in your browser — no signup, sources cited.

By Induwara Ashinsana, Executive Director, Ryzera Technologies. Updated Jun 25, 2026.

How it works

This generator produces a block of Robots Exclusion Protocol rules, the format every well-behaved web crawler reads at /robots.txt. The grammar is standardised in RFC 9309. For each AI crawler you choose to block, the tool emits one group:

User-agent: GPTBot
Disallow: /

User-agent names the exact crawler token, and Disallow tells it which paths to stay out of. Disallow: / means “the whole site”; a path like Disallow: /blog/ blocks only that section. To allow a bot, the tool simply leaves it out (under RFC 9309, anything not disallowed is allowed) or, when comments are on, writes an explicit empty rule for clarity:

User-agent: OAI-SearchBot
Disallow:
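
The group format described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's actual source: the function name build_robots_txt and its parameters are hypothetical, and the comment wording is simplified.

```python
def build_robots_txt(blocked, allowed=(), path="/", comments=True):
    """Return robots.txt text with one group per crawler token.

    Hypothetical helper for illustration; not the generator's real code.
    """
    if not path.startswith("/"):
        raise ValueError('path must start with "/"')

    lines = []
    if comments and blocked:
        lines.append("# Block")
    for token in blocked:
        lines += [f"User-agent: {token}", f"Disallow: {path}", ""]

    # Allowed bots are only written out when comments are on;
    # otherwise they are left out, which also means "allowed".
    if comments and allowed:
        lines.append("# Allowed")
        for token in allowed:
            lines += [f"User-agent: {token}", "Disallow:", ""]

    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["GPTBot", "CCBot"], allowed=["OAI-SearchBot"]))
```

Keeping one group per token, separated by blank lines, mirrors the output shown on this page and keeps each rule easy to remove later.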

The 21 crawler tokens are hard-coded from each operator's official documentation and grouped by purpose:

Training crawlers (7) feed model pre-training datasets — GPTBot, Google-Extended, CCBot, anthropic-ai, Applebot-Extended, meta-externalagent, cohere-ai. Blocking them keeps your content out of training data.
Aggressive scrapers (7) harvest and resell web data — Bytespider, Diffbot, omgilibot and others. Usually safe to block.
AI-search assistants (7) index pages so ChatGPT, Perplexity and Claude can cite you in live answers — blocking these can remove your site from AI answer results, so many content owners leave them allowed.

Critically, the tool never emits a rule for Googlebot, Bingbot, or any traditional search crawler — so your normal SEO is untouched. That is why blocking Google-Extended is safe: Google's own docs confirm it is separate from Googlebot and has no effect on Search ranking.
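
One way to check this separation yourself is to feed the generated rules to a robots.txt parser and ask what each crawler may fetch. The sketch below uses Python's standard urllib.robotparser; the sample rules, the helper name can_fetch and the yoursite.com URL are assumptions for illustration, and individual crawlers may interpret edge cases slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Sample output: one blocked training crawler, one explicitly allowed AI-search bot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow:
"""


def can_fetch(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://yoursite.com/blog/post"  # placeholder URL

if __name__ == "__main__":
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        print(agent, can_fetch(ROBOTS_TXT, agent, URL))
```

Googlebot has no group in the file at all, so the parser treats it as allowed, which is the behaviour the paragraph above relies on.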

One honest caveat: robots.txt is voluntary. Reputable AI companies obey it, but it does not physically block anyone, so a rogue scraper can ignore it. noai meta tags and X-Robots-Tag HTTP headers add further signals, but they are equally voluntary; for actual enforcement, add a server, firewall, or Cloudflare/WAF AI-bot rule on top. robots.txt is the correct, universal first step, and the one every major AI operator actually reads.
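
For sites that want a server-side layer on top, the idea can be sketched as a small WSGI middleware that refuses requests whose User-Agent header contains a blocked token. This is a minimal sketch under stated assumptions: the names is_blocked and block_ai_bots are hypothetical, the token list is an illustrative subset, and a real deployment would more likely use the web server, firewall, or CDN for this.

```python
BLOCKED_TOKENS = ("GPTBot", "CCBot", "Bytespider")  # illustrative subset only


def is_blocked(user_agent, tokens=BLOCKED_TOKENS):
    """Case-insensitive substring match of crawler tokens against a User-Agent header."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in tokens)


def block_ai_bots(app, tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app: answer 403 to matching user agents, pass everything else through."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT"), tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware
```

User-agent matching only catches bots that announce themselves. A scraper that spoofs a browser string passes straight through, and Google-Extended is a robots.txt control token rather than a separate user-agent string, so it cannot be filtered this way.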

Worked examples

Example 1 — Block AI training, keep AI search

A Colombo recipe blogger wants her posts kept out of LLM training sets, but still wants to appear in ChatGPT Search and Perplexity answers with attribution.

Select the 7 training crawlers + 7 scrapers (the "Block training, keep AI search" preset).
Leave the AI-search bots (OAI-SearchBot, PerplexityBot, ClaudeBot …) unticked.
Path stays "/" — block the whole site for those crawlers.
Result: 14 Disallow: / groups; AI-search bots written as explicit empty Disallow so they stay allowed.

# Block — AI training crawlers

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# … CCBot, anthropic-ai, Applebot-Extended,
#     meta-externalagent, cohere-ai …

# Allowed — these AI assistants can still cite your pages

User-agent: OAI-SearchBot
Disallow:

User-agent: PerplexityBot
Disallow:

Example 2 — Block everything AI (maximum protection)

A subscription news site wants zero AI access of any kind.

Use the "Block everything AI" preset.
Every token in all three groups is selected — 21 crawlers total.
Output: 21 User-agent groups, each with Disallow: /. No AI bot is left allowed.
Googlebot and Bingbot are still untouched, so traditional search keeps working.

# Block — AI training crawlers
User-agent: GPTBot
Disallow: /
# … +6 more training crawlers

# Block — Aggressive scrapers
User-agent: Bytespider
Disallow: /
# … +6 more scrapers

# Block — AI search & assistants
User-agent: OAI-SearchBot
Disallow: /
# … +6 more AI-search bots

Example 3 — Block one bot from one section only

A site is happy to be trained on, except for its paid /members/ area, which it wants kept from GPTBot.

Untick every bot except GPTBot.
Change the path to /members/ (it must start with "/").
Turn comments off for a minimal rule.
Result: a single group scoped to that one directory.

User-agent: GPTBot
Disallow: /members/
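
The scoping in this example can be checked with Python's standard urllib.robotparser, which applies the Disallow value as a path prefix. The URLs and the helper name gptbot_may_fetch are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: GPTBot
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def gptbot_may_fetch(path):
    """Return True if GPTBot may fetch the given path on the placeholder site."""
    return parser.can_fetch("GPTBot", "https://yoursite.com" + path)


if __name__ == "__main__":
    for path in ("/members/guide", "/members", "/blog/post"):
        print(path, gptbot_may_fetch(path))
```

Because the rule ends in a slash, it covers everything under /members/ but not the bare path /members; drop the trailing slash in the rule if that page should be covered too.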

Frequently asked questions

Sources & references

Every bot token on this page is taken from its operator's official documentation and was last cross-checked on 2026-06-25. The list is reviewed whenever a major AI operator publishes or renames a crawler.


