AI Bot Access via robots.txt
TL;DR — A page blocked in robots.txt cannot be cited by AI engines — ever, regardless of content quality. Several major sites blocked AI bots reactively in 2023–24 without realising the consequence: they became invisible to AI-generated answers.
Why AI Bot Access Matters
A page that is blocked in robots.txt for AI crawlers cannot be cited in AI-generated answers. Full stop. No amount of schema markup, FAQ blocks, or authority references will help if the crawler cannot access the page in the first place. Bot access is the zeroth condition that all other GEO signals depend on.
In 2023–24, many publishers and websites added AI-specific blocks to their robots.txt reactively, often in response to concerns about training data usage. The consequence, which many did not anticipate, was exclusion from AI engine citation pools. Perplexity, ChatGPT's search and browsing features, and Google AI Overviews all state that their crawlers honour robots.txt, so a page that disallows the relevant crawler drops out of the pool they can cite from. For AI Overviews the relevant crawler is Googlebot itself, not Google-Extended.
The key AI crawler user agents to know:
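- GPTBot: OpenAI's crawler for collecting model training data. ChatGPT search uses a separate agent, OAI-SearchBot, and pages ChatGPT fetches on a user's request are requested as ChatGPT-User.
- ClaudeBot: Anthropic's crawler
- anthropic-ai: Anthropic's alternate user agent
- PerplexityBot: Perplexity's crawler
- Amazonbot: Amazon's crawler (Alexa/Rufus)
- Google-Extended: a robots.txt token rather than a separate crawler. Google fetches pages with its usual crawlers and uses this token to decide whether content may be used for Gemini training and grounding. It does not affect Google Search, so AI Overviews follow Googlebot's rules.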
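A quick way to see how an existing robots.txt treats each of the agents above is Python's standard-library urllib.robotparser. This is a minimal sketch: the function names are hypothetical, and the sample robots.txt and example.com URL are placeholders. For a live check you would call parser_from_site with your own domain instead of parsing the sample text.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the list above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "anthropic-ai",
             "PerplexityBot", "Amazonbot", "Google-Extended"]


def parser_from_text(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def parser_from_site(site: str) -> RobotFileParser:
    """Fetch <site>/robots.txt. read() treats 401/403 as 'disallow all'
    and other 4xx responses (e.g. 404) as 'allow all'."""
    parser = RobotFileParser(site.rstrip("/") + "/robots.txt")
    parser.read()
    return parser


def audit_ai_access(parser: RobotFileParser, page_url: str) -> dict[str, bool]:
    """Map each AI agent token to whether it may fetch page_url."""
    return {agent: parser.can_fetch(agent, page_url) for agent in AI_AGENTS}


if __name__ == "__main__":
    # Placeholder file that blocks GPTBot only; everyone else is allowed.
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    report = audit_ai_access(parser_from_text(sample), "https://example.com/blog/post")
    for agent, ok in report.items():
        print(f"{agent:16} {'allowed' if ok else 'BLOCKED'}")
```

Treat the result as a first approximation: urllib.robotparser applies the first matching rule in a group rather than the longest match described in RFC 9309, and it treats * and $ inside paths literally. It also cannot see blocks applied at the CDN or firewall level.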
How to Implement
Check your /robots.txt for any Disallow rules targeting these agents. To explicitly allow AI crawlers:
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Google-Extended
Allow: /
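The six groups above can also be written as a single group with several User-agent lines, which the robots.txt standard (RFC 9309) permits. The sketch below uses Python's standard-library urllib.robotparser to check that both layouts give the same answers. The catch-all Disallow appended to both strings is a hypothetical addition: without it the check would pass trivially, because an agent that matches no group is allowed by default. SomeOtherBot is likewise a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

AGENTS = ["GPTBot", "ClaudeBot", "anthropic-ai",
          "PerplexityBot", "Amazonbot", "Google-Extended"]

# Hypothetical catch-all block so that "allowed" actually depends on the groups.
CATCH_ALL = "\n\nUser-agent: *\nDisallow: /\n"

# Layout used above: one group per agent, separated by blank lines.
SEPARATE = "\n\n".join(f"User-agent: {a}\nAllow: /" for a in AGENTS) + CATCH_ALL

# Alternative layout: several User-agent lines sharing one rule set.
COMBINED = "\n".join(f"User-agent: {a}" for a in AGENTS) + "\nAllow: /" + CATCH_ALL


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/guides/geo"
    for agent in AGENTS + ["SomeOtherBot"]:
        print(agent, is_allowed(SEPARATE, agent, url), is_allowed(COMBINED, agent, url))
```

The combined form is shorter to maintain; the separate form makes it easier to give one agent different rules later.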
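If you want AI engines to keep fetching pages for answers but opt out of training, check how each provider splits its user agents. OpenAI, for example, uses GPTBot for training and OAI-SearchBot for ChatGPT search, while Google uses the Google-Extended token for Gemini training and Googlebot for Search. In both cases you can Disallow the training agent and still Allow the search one.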
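As an illustration of a training opt-out that keeps answer engines working, the sketch below encodes one hypothetical policy: refuse GPTBot and Google-Extended, allow OAI-SearchBot (which OpenAI documents as its ChatGPT search crawler) and Googlebot. The agent roles are as documented by OpenAI and Google; the URL and the allowed helper are placeholders. Google-Extended never appears as a fetching user agent, but a robots.txt parser treats its group like any other, so the check still shows what the file says.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block training-oriented agents, keep search agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""


def allowed(agent: str, url: str = "https://example.com/pricing") -> bool:
    """Report whether the policy above lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ["GPTBot", "OAI-SearchBot", "Google-Extended", "Googlebot"]:
        print(f"{agent:16} {'allowed' if allowed(agent) else 'disallowed'}")
```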
Common Mistakes
- Blanket Disallow: / for all bots. A catch-all group (User-agent: * with Disallow: /) blocks AI crawlers along with every other bot, unless a crawler has its own, more specific group.
- Blocking at the CDN/WAF level. Cloudflare and AWS WAF bot management may block AI crawlers independently of robots.txt, so check your firewall rules as well.
- Only checking for Googlebot. Verifying Googlebot access doesn't mean AI-specific crawlers are permitted; check each agent separately.
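The first and third mistakes often appear together: a robots.txt that explicitly allows Googlebot but ends with a catch-all Disallow passes a Googlebot-only check while blocking the AI crawlers. The sketch below reproduces that with a hypothetical file and Python's standard-library urllib.robotparser; access_report and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot gets its own group, everything else
# falls through to the catch-all block.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

AGENTS = ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot", "Amazonbot"]


def access_report(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each agent to whether it may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


if __name__ == "__main__":
    for agent, ok in access_report(ROBOTS_TXT, "https://example.com/").items():
        print(f"{agent:14} {'allowed' if ok else 'blocked'}")
```

Only Googlebot comes out allowed here. Adding a dedicated Allow group for each AI agent, as in the configuration earlier on this page, is what exempts them from the catch-all.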