When OpenAI launched GPTBot in August 2023, it gave website owners a way to explicitly allow or deny ChatGPT's crawler access to their content. Within months, a study by Originality.ai found that nearly 26% of the top 1,000 websites were blocking GPTBot, including many that had no deliberate policy to do so.
If your robots.txt has a "Disallow: /" rule for any user agent, check it carefully. A catch-all wildcard block ("User-agent: *" followed by "Disallow: /") will block AI crawlers along with everything else, unless a crawler has its own, more specific group of rules.
How to Check If You're Blocking AI Crawlers
Open your website's robots.txt file directly in your browser by navigating to yourdomain.com/robots.txt. Look for any of the following user agent strings and note which rules apply to each.
| User Agent | AI Engine | Risk if Blocked |
|---|---|---|
| GPTBot | ChatGPT / OpenAI | Invisible to 180M+ ChatGPT users |
| PerplexityBot | Perplexity AI | Excluded from all Perplexity answers |
| anthropic-ai | Claude / Anthropic | Not indexed for Claude's knowledge |
| Google-Extended | Google Gemini | Excluded from Gemini training and grounding |
| Applebot-Extended | Apple Intelligence | Not used in on-device AI features |
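If you would rather not read the file by eye, the check can be scripted. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `audit_robots` and the sample robots.txt are illustrative assumptions, not part of any crawler's tooling. Note that Python's parser is only an approximation of how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# User agent tokens taken from the table above.
AI_AGENTS = [
    "GPTBot",
    "PerplexityBot",
    "anthropic-ai",
    "Google-Extended",
    "Applebot-Extended",
]


def audit_robots(robots_txt, agents=AI_AGENTS, path="/"):
    """Return {agent: bool} saying whether each agent may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


# Hypothetical robots.txt: GPTBot is blocked outright, while everyone
# else is only kept out of /private/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for agent, allowed in audit_robots(SAMPLE).items():
        print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

To audit a live site instead of a pasted string, `RobotFileParser` can also load the file itself via `set_url("https://yourdomain.com/robots.txt")` followed by `read()`.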
The Fix: Explicit Allow Rules
The safest approach is to add explicit Allow rules for each AI crawler. A crawler follows the group that names it specifically rather than the wildcard group, so these rules keep your content accessible regardless of any broader blocking rules in your robots.txt.
    User-agent: GPTBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: anthropic-ai
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: Applebot-Extended
    Allow: /
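To see that named groups really do take precedence over a wildcard block, here is a small sketch that generates the allow groups and checks them with Python's standard-library `urllib.robotparser`. The helper `build_allow_groups`, the `EXISTING` file contents, and the `SomeOtherBot` agent are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = [
    "GPTBot",
    "PerplexityBot",
    "anthropic-ai",
    "Google-Extended",
    "Applebot-Extended",
]


def build_allow_groups(agents):
    """Render one 'User-agent / Allow: /' group per agent, blank-line separated."""
    return "\n\n".join(f"User-agent: {agent}\nAllow: /" for agent in agents) + "\n"


# Hypothetical existing file that blocks every crawler.
EXISTING = "User-agent: *\nDisallow: /\n"

# Append the explicit groups after a blank line so each group stays separate.
combined = EXISTING + "\n" + build_allow_groups(AI_AGENTS)

parser = RobotFileParser()
parser.parse(combined.splitlines())

if __name__ == "__main__":
    for agent in AI_AGENTS + ["SomeOtherBot"]:
        print(agent, parser.can_fetch(agent, "/"))
```

In this sketch the five named crawlers are allowed while any agent without its own group still falls back to the wildcard block.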
What About Content I Don't Want AI to Use?
You can selectively block AI crawlers from specific directories. For example, if you have proprietary data, gated content, or pages that shouldn't train AI models, you can allow access to public content while protecting specific paths.
    User-agent: GPTBot
    Allow: /blog/
    Allow: /about/
    Disallow: /members/
    Disallow: /proprietary-data/
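Path-level rules are easy to get subtly wrong, so it can help to test a few URLs against them before deploying. The sketch below feeds the rules above to Python's standard-library `urllib.robotparser`; the helper `gptbot_can_fetch` and the sample paths are made up for illustration. One caveat: Python's parser applies the first matching rule, whereas the robots.txt standard (RFC 9309) prefers the longest match. For this rule set the two approaches agree.

```python
from urllib.robotparser import RobotFileParser

# The selective rules shown above.
RULES = """\
User-agent: GPTBot
Allow: /blog/
Allow: /about/
Disallow: /members/
Disallow: /proprietary-data/
"""


def gptbot_can_fetch(path):
    """Check a single path against RULES for the GPTBot user agent."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch("GPTBot", path)


if __name__ == "__main__":
    # Hypothetical paths, one per rule plus one that no rule mentions.
    for path in ["/blog/post", "/about/", "/members/profile",
                 "/proprietary-data/report.csv", "/pricing"]:
        print(path, gptbot_can_fetch(path))
```

Note that a path matched by no rule, such as `/pricing` here, is allowed by default, so anything you want protected needs its own Disallow line.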
The Business Case for Allowing AI Crawlers
46% of top sites block at least one AI bot
3× more citations for AI-accessible content
0 ranking cost to allowing crawlers
This is one of the highest-leverage, lowest-effort changes you can make for answer engine optimization (AEO). It costs nothing, takes five minutes, and the upside is your brand appearing in AI answers that reach hundreds of millions of users. Check your robots.txt today.
