Block AI Bots in robots.txt — GPTBot, ClaudeBot & more
Stop AI crawlers from scraping your site for model training, without losing Google or Bing search traffic. Tick the bots to block, copy the rules, paste into robots.txt. Runs entirely in your browser — no signup, sources cited.
How it works
This generator produces a block of Robots Exclusion Protocol rules, the format every well-behaved web crawler reads at /robots.txt. The grammar is standardised in RFC 9309. For each AI crawler you choose to block, the tool emits one group:
User-agent: GPTBot
Disallow: /
User-agent names the exact crawler token, and Disallow tells it which paths to stay out of. Disallow: / means “the whole site”; a path like Disallow: /blog/ blocks only that section. To allow a bot, the tool simply leaves it out (under RFC 9309, anything not disallowed is allowed) or, when comments are on, writes an explicit empty rule for clarity:
User-agent: OAI-SearchBot
Disallow:
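The group format described above can be sketched in a few lines of Python. This is only an illustration of the output shape: the function names `render_group` and `render_rules` and their parameters are hypothetical and are not taken from the tool's actual source.

```python
def render_group(token, blocked, path="/"):
    """Return one robots.txt group for a single crawler token.

    Hypothetical helper for illustration; not the tool's real code.
    """
    # A blocked bot gets a Disallow path. An allowed bot gets an empty
    # Disallow value, which means "nothing is disallowed".
    rule = f"Disallow: {path}" if blocked else "Disallow:"
    return f"User-agent: {token}\n{rule}"


def render_rules(blocked, allowed=()):
    """Join one group per token, separated by blank lines."""
    groups = [render_group(token, blocked=True) for token in blocked]
    groups += [render_group(token, blocked=False) for token in allowed]
    return "\n\n".join(groups) + "\n"


# Block GPTBot site-wide and write an explicit "allow" group for OAI-SearchBot.
print(render_rules(["GPTBot"], allowed=["OAI-SearchBot"]))
```

Passing `path="/blog/"` to `render_group` would produce a group that blocks only that section instead of the whole site.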
The 21 crawler tokens are hard-coded from each operator's official documentation and grouped by purpose:
- Training crawlers (7) feed model pre-training datasets — GPTBot, Google-Extended, CCBot, anthropic-ai, Applebot-Extended, meta-externalagent, cohere-ai. Blocking them keeps your content out of training data.
- Aggressive scrapers (7) harvest and resell web data — Bytespider, Diffbot, omgilibot and others. Usually safe to block.
- AI-search assistants (7) index pages so ChatGPT, Perplexity and Claude can cite you in live answers — blocking these can remove your site from AI answer results, so many content owners leave them allowed.
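One way to picture the grouping is a small lookup table keyed by purpose. The sketch below is an assumption about how such a table might look, not the tool's real data: it includes only tokens named on this page, so the scraper and AI-search lists are deliberately incomplete, and placing `PerplexityBot` in the AI-search group is an assumption.

```python
# Partial, illustrative lists. The page counts 7 tokens per category,
# but only the tokens it names explicitly are reproduced here.
CRAWLERS = {
    "training": [
        "GPTBot", "Google-Extended", "CCBot", "anthropic-ai",
        "Applebot-Extended", "meta-externalagent", "cohere-ai",
    ],
    "scraper": ["Bytespider", "Diffbot", "omgilibot"],  # others not named
    "ai_search": ["OAI-SearchBot", "PerplexityBot"],    # assumed members
}


def tokens_to_block(categories):
    """Flatten the chosen categories into one ordered list of tokens."""
    return [token for name in categories for token in CRAWLERS[name]]


# A common choice: block training crawlers and scrapers, but leave the
# AI-search assistants alone so the site can still be cited in answers.
blocked = tokens_to_block(["training", "scraper"])
print(blocked)
```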
Critically, the tool never emits a rule for Googlebot, Bingbot, or any traditional search crawler — so your normal SEO is untouched. That is why blocking Google-Extended is safe: Google's own docs confirm it is separate from Googlebot and has no effect on Search ranking.
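The claim that search crawlers are unaffected can be checked locally with Python's standard-library `urllib.robotparser`. The rules and the `example.com` URL below are illustrative. Note that this parser matches user-agent tokens by case-insensitive substring, which is close to, but not identical to, what each real crawler does.

```python
from urllib.robotparser import RobotFileParser

# Sample output of the kind described above (illustrative only).
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_fetch(agent, url="https://example.com/any-page"):
    """Ask the parser whether `agent` is allowed to fetch `url`."""
    return parser.can_fetch(agent, url)


print(may_fetch("GPTBot"))           # False: blocked site-wide
print(may_fetch("Google-Extended"))  # False: blocked site-wide
print(may_fetch("OAI-SearchBot"))    # True: empty Disallow allows everything
print(may_fetch("Googlebot"))        # True: no group names it
print(may_fetch("Bingbot"))          # True: no group names it
```

Because no group names Googlebot or Bingbot and there is no `User-agent: *` group, both fall back to the default of "allowed".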
One honest caveat: robots.txt is voluntary. Reputable AI companies obey it, but it does not physically block anyone, so a rogue scraper can ignore it. For enforcement you can add noai meta tags, X-Robots-Tag HTTP headers, or a Cloudflare/WAF AI-bot rule on top. robots.txt is the correct, universal first step — and the one every major AI operator actually reads.
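For the enforcement layer mentioned above, a server-side check might look roughly like the sketch below. Everything in it is an assumption made for illustration: the token list, the function names, and the `noai` header value, which is an informal convention rather than part of any standard. A real deployment would usually rely on the web server or WAF's own rule syntax instead.

```python
# Hypothetical deny list; in practice this would mirror the bots
# already blocked in robots.txt.
BLOCKED_TOKENS = ("GPTBot", "CCBot", "Bytespider")


def is_blocked(user_agent):
    """True if the User-Agent header contains any blocked token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)


def respond(user_agent):
    """Return (status, headers) for a request, WAF-style."""
    if is_blocked(user_agent):
        # Refuse the request outright instead of relying on good behaviour.
        return "403 Forbidden", [("Content-Type", "text/plain")]
    # "noai" is an assumed, non-standard value; crawlers may ignore it.
    return "200 OK", [("Content-Type", "text/html"), ("X-Robots-Tag", "noai")]


# Illustrative User-Agent strings, not copied from any operator's docs.
print(respond("ExampleFetcher/1.0 GPTBot/1.0")[0])
print(respond("Mozilla/5.0 (compatible; Googlebot/2.1)")[0])
```

User-Agent headers can be spoofed, so a check like this stops only crawlers that identify themselves honestly.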
Sources & references
- OpenAI — GPTBot, ChatGPT-User & OAI-SearchBot documentation
- Anthropic — ClaudeBot, anthropic-ai & Claude-User crawling policy
- Google — Google crawlers overview (Google-Extended vs. Googlebot)
- Common Crawl — CCBot user-agent & robots.txt policy
- Perplexity — PerplexityBot & Perplexity-User
- Apple — Applebot & Applebot-Extended
- Robots Exclusion Protocol — RFC 9309
Every bot token on this page is taken from its operator's official documentation and was last cross-checked on 2026-06-25. The list is reviewed whenever a major AI operator publishes or renames a crawler.
Comments & feedback
Spotted a new AI crawler, a renamed token, or a bug?
Email me at [email protected] — most fixes ship within 24 hours.