Authority GEO Signals · Published Mar 31, 2026

AI Bot Access via robots.txt

Allowing AI crawlers (GPTBot, ClaudeBot, PerplexityBot) to index and cite your content.

TL;DR: A page that robots.txt blocks for AI crawlers cannot be cited by the engines those crawlers serve, regardless of content quality. Several major sites blocked AI bots reactively in 2023–24 without realising the consequence: they became invisible in AI-generated answers.

Why AI Bot Access Matters

A page that robots.txt blocks for AI crawlers cannot be cited in AI-generated answers, full stop. No amount of schema markup, FAQ blocks, or authority references will help if the crawler cannot access the page in the first place. Bot access is the zeroth condition on which every other GEO signal depends.

In 2023–24, many publishers and websites reactively added AI-specific blocks to their robots.txt, often in response to concerns about training data usage. The consequence, which many did not anticipate, was exclusion from AI engine citation pools. AI search products such as Perplexity, ChatGPT search, and Google AI Overviews fetch pages with crawlers that are documented to honour robots.txt, so a page that disallows the relevant crawler drops out of the pool those products cite from.

The key AI crawler user agents to know:

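  • GPTBot: OpenAI's crawler for collecting training data (ChatGPT search uses OAI-SearchBot, and user-initiated browsing uses ChatGPT-User)
  • ClaudeBot: Anthropic's crawler
  • anthropic-ai: an older Anthropic user-agent token that many robots.txt files still reference
  • PerplexityBot: Perplexity's crawler
  • Amazonbot: Amazon's crawler (Alexa/Rufus)
  • Google-Extended: not a separate crawler but a robots.txt token that controls whether content fetched by Googlebot may be used for Gemini training and grounding; it does not affect Google Search or AI Overviews, which follow your Googlebot rules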

How to Implement

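Check your /robots.txt for Disallow rules that target these agents, including rules in a wildcard User-agent: * group, which applies to every crawler that has no group of its own. To allow AI crawlers explicitly, give each one its own group: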

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Google-Extended
Allow: /
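To confirm what a given robots.txt actually permits, Python's standard-library urllib.robotparser can evaluate it agent by agent. The sketch below is a minimal audit, assuming you already have the robots.txt text in hand; the helper name audit_robots_txt and the sample file are illustrative. Note that robotparser applies the first matching rule within a group rather than RFC 9309's longest-match rule, so treat its answers as a first pass for complex files, and it says nothing about CDN or firewall blocks.

```python
from urllib.robotparser import RobotFileParser

# The AI crawler user agents discussed in this article.
AI_AGENTS = [
    "GPTBot",
    "ClaudeBot",
    "anthropic-ai",
    "PerplexityBot",
    "Amazonbot",
    "Google-Extended",
]


def audit_robots_txt(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return, for each AI agent, whether robots.txt lets it fetch `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; blank lines separate groups.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}


if __name__ == "__main__":
    # Hypothetical robots.txt: a narrow wildcard rule plus a GPTBot block.
    sample = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""
    for agent, allowed in audit_robots_txt(sample).items():
        print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

For a live site, RobotFileParser.set_url() followed by read() fetches and parses the file instead of parse().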

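If you want to allow crawling for citations but opt out of training data usage, check each provider's documented opt-out mechanism. For example, OpenAI documents GPTBot as its training crawler and OAI-SearchBot as the crawler behind ChatGPT search, so you can disallow the first and allow the second; disallowing Google-Extended keeps content out of Gemini training and grounding without affecting Google Search.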
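One way to make these choices deliberate is to generate robots.txt from an explicit per-agent policy. The Python sketch below is a minimal example; the build_robots_txt helper and the sample POLICY are illustrative assumptions, so confirm what each token controls in the provider's own documentation before adopting any policy.

```python
# Hypothetical policy: True = allow the whole site, False = disallow it.
# Verify each token's meaning against the provider's documentation first.
POLICY = {
    "GPTBot": False,           # OpenAI's training crawler
    "OAI-SearchBot": True,     # OpenAI's crawler for ChatGPT search
    "PerplexityBot": True,     # Perplexity's crawler
    "Google-Extended": False,  # Gemini training/grounding token
}


def build_robots_txt(policy: dict[str, bool]) -> str:
    """Emit one robots.txt group per agent, separated by blank lines."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # Blank lines between groups; end the file with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(POLICY), end="")
```

Keeping the policy as data makes it easy to review alongside each provider's documentation and to regenerate the file when a provider adds a new token.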

Common Mistakes

  • Blanket Disallow: / for all bots: a wildcard group (User-agent: * with Disallow: /) blocks AI crawlers along with every other bot, unless a crawler has a group of its own, because each crawler obeys the group that names it and falls back to the wildcard group only when none does
  • Blocking at the CDN/WAF level: Cloudflare and AWS WAF bot-management rules can block AI crawlers regardless of what robots.txt says; check your firewall rules as well
  • Only checking Googlebot: confirming Googlebot access does not mean AI-specific crawlers are permitted; check each agent separately
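The wildcard mistake and its fix can be checked directly. Under RFC 9309, a crawler follows the group that names it and falls back to the User-agent: * group only when no group does, so a dedicated group for one AI agent overrides a blanket block for that agent alone. The Python sketch below uses urllib.robotparser, which follows the same fallback order; the helper name and both sample robots.txt files are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Evaluate a robots.txt string for one user agent and path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# A blanket block: every crawler without its own group is disallowed.
BLANKET = "User-agent: *\nDisallow: /\n"

# The same block plus a dedicated group that re-admits one AI crawler.
WITH_EXCEPTION = BLANKET + "\nUser-agent: PerplexityBot\nAllow: /\n"

if __name__ == "__main__":
    for name, robots in [("blanket", BLANKET), ("with exception", WITH_EXCEPTION)]:
        for agent in ("PerplexityBot", "GPTBot"):
            print(f"{name:15} {agent:14} {can_fetch(robots, agent)}")
```

Only PerplexityBot regains access in the second file; GPTBot still falls through to the wildcard block, which is why each agent needs to be checked on its own.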
