[{"data":1,"prerenderedAt":235},["ShallowReactive",2],{"learn-geo-/en/learn/geo/ai-bot-access-en":3},{"id":4,"title":5,"body":6,"description":193,"extension":194,"meta":195,"navigation":228,"path":229,"seo":230,"stem":233,"__hash__":234},"content_en/5.learn/geo/ai-bot-access.md","AI Bot Access via robots.txt",{"type":7,"value":8,"toc":186},"minimark",[9,22,27,33,42,45,85,89,100,110,116,120,157,161],[10,11,12,16,17,21],"p",{},[13,14,15],"strong",{},"TL;DR"," — A page blocked in ",[18,19,20],"code",{},"robots.txt"," cannot be cited by AI engines — ever, regardless of content quality. Several major sites blocked AI bots reactively in 2023–24 without realising the consequence: they became invisible to AI-generated answers.",[23,24,26],"h2",{"id":25},"why-ai-bot-access-matters","Why AI Bot Access Matters",[10,28,29,30,32],{},"A page that is blocked in ",[18,31,20],{}," for AI crawlers cannot be cited in AI-generated answers — full stop. No amount of schema markup, FAQ blocks, or authority references will help if the crawler cannot access the page in the first place. Bot access is the zero-th condition that all other GEO signals depend on.",[10,34,35,36,38,39,41],{},"In 2023–24, many publishers and websites added AI-specific blocks to their ",[18,37,20],{}," reactively — often in response to concerns about training data usage. The consequence, which many did not anticipate, was immediate exclusion from AI engine citation pools. 
Perplexity, ChatGPT's browsing mode, and Google AI Overviews all respect ",[18,40,20],{}," directives and will not cite pages that disallow their crawlers.",[10,43,44],{},"The key AI crawler user agents to know:",[46,47,48,55,61,67,73,79],"ul",{},[49,50,51,54],"li",{},[18,52,53],{},"GPTBot"," — OpenAI's training-data crawler (ChatGPT search and user-initiated browsing use the separate OAI-SearchBot and ChatGPT-User agents)",[49,56,57,60],{},[18,58,59],{},"ClaudeBot"," — Anthropic's crawler",[49,62,63,66],{},[18,64,65],{},"anthropic-ai"," — Anthropic's alternate user agent",[49,68,69,72],{},[18,70,71],{},"PerplexityBot"," — Perplexity's crawler",[49,74,75,78],{},[18,76,77],{},"Amazonbot"," — Amazon's crawler (Alexa/Rufus)",[49,80,81,84],{},[18,82,83],{},"Google-Extended"," — Google's robots.txt control token for Gemini training and grounding; it is not a separate crawler and does not govern AI Overviews, which rely on Googlebot",[23,86,88],{"id":87},"how-to-implement","How to Implement",[10,90,91,92,95,96,99],{},"Check your ",[18,93,94],{},"/robots.txt"," for any ",[18,97,98],{},"Disallow"," rules targeting these agents. To explicitly allow AI crawlers:",[101,102,107],"pre",{"className":103,"code":105,"language":106},[104],"language-text","User-agent: GPTBot\nAllow: /\n\nUser-agent: ClaudeBot\nAllow: /\n\nUser-agent: anthropic-ai\nAllow: /\n\nUser-agent: PerplexityBot\nAllow: /\n\nUser-agent: Amazonbot\nAllow: /\n\nUser-agent: Google-Extended\nAllow: /\n","text",[18,108,105],{"__ignoreMap":109},"",[10,111,112,113,115],{},"If you want to stay citable but opt out of training data usage, check each provider's opt-out mechanism. Some providers use separate user agents for training and for search, so a ",[18,114,98],{}," rule on the training agent alone leaves citation access intact.",[23,117,119],{"id":118},"common-mistakes","Common Mistakes",[46,121,122,139,148],{},[49,123,124,131,132,135,136,138],{},[13,125,126,127,130],{},"Blanket ",[18,128,129],{},"Disallow: /"," applied to all bots"," — a catch-all wildcard block (",[18,133,134],{},"User-agent: *"," with ",[18,137,129],{},") blocks AI crawlers along with all other bots",[49,140,141,144,145,147],{},[13,142,143],{},"Blocking at the 
CDN/WAF level"," — Cloudflare and AWS WAF bot management may block AI crawlers independently of ",[18,146,20],{},"; check your firewall rules",[49,149,150,156],{},[13,151,152,153],{},"Only checking for ",[18,154,155],{},"Googlebot"," — verifying Googlebot access doesn't mean AI-specific crawlers are permitted; check each agent separately",[23,158,160],{"id":159},"sources","Sources",[46,162,163,172,179],{},[49,164,165],{},[166,167,171],"a",{"href":168,"rel":169},"https://platform.openai.com/docs/gptbot",[170],"nofollow","OpenAI GPTBot documentation",[49,173,174],{},[166,175,178],{"href":176,"rel":177},"https://www.anthropic.com/research/crawling-policy",[170],"Anthropic crawling policy",[49,180,181],{},[166,182,185],{"href":183,"rel":184},"https://developers.google.com/search/docs/crawling-indexing/robots/intro",[170],"Google robots.txt specification",{"title":109,"searchDepth":187,"depth":187,"links":188},2,[189,190,191,192],{"id":25,"depth":187,"text":26},{"id":87,"depth":187,"text":88},{"id":118,"depth":187,"text":119},{"id":159,"depth":187,"text":160},"Allowing AI crawlers (GPTBot, ClaudeBot, PerplexityBot) to index and cite your content.","md",{"publishedAt":196,"badge":197,"type":199,"faq":200,"related":210,"cta":223},"2026-03-31",{"label":198},"Authority","guide",[201,204,207],{"question":202,"answer":203},"If I block AI bots from training data, will they still cite my pages?","It depends on the provider. OpenAI separates its agents: GPTBot collects training data, while OAI-SearchBot and ChatGPT-User handle ChatGPT search and user-initiated browsing, so blocking GPTBot alone does not block those. Not every provider draws the same line, so check each provider's documentation for its specific opt-out paths.",{"question":205,"answer":206},"How do I check which bots are currently blocked on my site?","Access your robots.txt directly at yoursite.com/robots.txt. Look for Disallow rules on User-agent: * (which applies to all bots) and on specific AI crawler agents. 
Also check your CDN/WAF settings — Cloudflare's Bot Fight Mode and similar tools can block AI crawlers at the network level.",{"question":208,"answer":209},"Should I allow all AI crawlers or only specific ones?","Allow all major AI crawlers unless you have a specific reason to block a particular one. Selective blocking (e.g., allowing Perplexity but blocking GPTBot) is possible but complex to maintain as new AI engines emerge. The default recommendation is to allow all and monitor for content misuse separately.",[211,215,219],{"title":212,"url":213,"description":214},"llms.txt","/learn/geo/llms-txt","The companion file to robots.txt that tells AI engines what your site is about.",{"title":216,"url":217,"description":218},"Content Freshness","/learn/geo/content-freshness","After enabling AI bot access, freshness signals determine citation priority.",{"title":220,"url":221,"description":222},"Schema Markup for AI Engines","/learn/geo/schema-markup","Structured data that AI crawlers read once they have access to your pages.",{"title":224,"description":225,"label":226,"url":227},"Are AI crawlers blocked on your site?","TrustData checks your robots.txt and CDN configuration for AI crawler blocks that make your content invisible.","Audit my pages","https://app.trustdata.tech",true,"/learn/geo/ai-bot-access",{"title":231,"description":232},"AI Bot Access via robots.txt — GEO Optimisation Guide","A page blocked in robots.txt cannot be cited by AI engines. GPTBot, ClaudeBot, and PerplexityBot all respect robots.txt. Verify your site isn't accidentally invisible.","5.learn/geo/ai-bot-access","F2o_iuPg_73Jzz_wILyk_OriU3QwseLtLReMbrQkAuQ",1777026711176]