Robots.txt Checker
Test any URL against any site's robots.txt. Detects allow/disallow conflicts, wildcard pattern mistakes, and AI-crawler-specific rules (GPTBot, ClaudeBot, PerplexityBot).
See exactly which crawlers can fetch which URLs on your site
A wrong line in robots.txt can do more damage to SEO traffic than an algorithm update. Push one Disallow rule live during a staging deploy and Google stops crawling your entire site until someone notices.
This checker fetches your live robots.txt, parses it the way Google's crawler does (path-prefix matching, * and $ wildcards, longest matching rule wins with Allow winning ties), and lets you test any URL against any user-agent, including the AI crawlers (GPTBot, ClaudeBot, PerplexityBot) and the Google-Extended token, which have their own rules and matter for GEO/AEO.
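A minimal Python sketch of those matching rules, assuming a single user-agent group has already been selected and URL paths are already percent-normalized. The names rule_to_regex and is_allowed are hypothetical, not the checker's actual code: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end of the URL,
    and everything else is literal. Matching always starts at the path's first char.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide one URL path (plus query string) against one user-agent group.

    rules holds (directive, pattern) pairs such as ("disallow", "/products/").
    The longest matching pattern wins, Allow wins a tie, and no match means allowed.
    """
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty rule value matches nothing
            continue
        if rule_to_regex(pattern).match(path):
            allow = directive.lower() == "allow"
            # Pattern length (including * and $) measures specificity.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


rules = [
    ("disallow", "/products/"),
    ("allow", "/products/widgets/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed(rules, "/products/widgets/blue"))  # True: the longer Allow wins
print(is_allowed(rules, "/products/gadgets/"))      # False
print(is_allowed(rules, "/docs/guide.pdf"))         # False: matches /*.pdf$
print(is_allowed(rules, "/docs/guide.pdf?page=2"))  # True: $ anchors the end
```

Compiling a regex per rule keeps the sketch short; a production parser would cache compiled patterns per group.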
Six robots.txt mistakes that quietly destroy SEO traffic
1. Disallow: / left over from staging
Someone deploys staging's robots.txt to production and the whole site becomes uncrawlable. It happens more often than you'd think: in our experience, roughly once a quarter among new clients.
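One way to catch this before it ships is a deploy-time guard. This sketch uses Python's standard urllib.robotparser; root_is_crawlable and check_file are hypothetical names, and the robots.txt path you pass to check_file depends on your build setup.

```python
from urllib.robotparser import RobotFileParser


def root_is_crawlable(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if user_agent may fetch "/" under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, "/")


def check_file(path: str, user_agent: str = "Googlebot") -> None:
    """Fail a deploy step if the robots.txt at path blocks the homepage."""
    with open(path, encoding="utf-8") as fh:
        if not root_is_crawlable(fh.read(), user_agent):
            raise SystemExit(f"{path} blocks {user_agent} from /, refusing to deploy")


print(root_is_crawlable("User-agent: *\nDisallow: /\n"))        # False: the staging file
print(root_is_crawlable("User-agent: *\nDisallow: /admin/\n"))  # True
```

Calling check_file on the robots.txt in your build output as a pipeline step makes the SystemExit fail the deploy with a readable message.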
2. Blocking JS, CSS, or image folders
Years-old advice was to block crawlers from JS and CSS to save "crawl budget". Modern Google renders your pages, so blocking JS or CSS can make them look broken to Google. Keep JS, CSS, and image folders crawlable.
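A quick audit sketch, again with urllib.robotparser: give it your robots.txt and the asset paths your pages load, and it returns the ones Googlebot may not fetch. blocked_assets is a hypothetical helper and the asset paths are made-up examples. urllib.robotparser does plain prefix matching, first match wins, and it does not evaluate * or $ inside paths, so treat the result as an approximation of Google's behavior.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt: str, asset_paths: list[str],
                   user_agent: str = "Googlebot") -> list[str]:
    """Return the asset paths that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in asset_paths if not parser.can_fetch(user_agent, p)]


robots_txt = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /assets/js/
"""
# Hypothetical asset paths; in practice, collect them from your rendered pages.
assets = ["/assets/js/app.js", "/assets/css/site.css", "/wp-includes/js/jquery/jquery.min.js"]
print(blocked_assets(robots_txt, assets))
# ['/assets/js/app.js', '/wp-includes/js/jquery/jquery.min.js']
```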
3. Conflicting Allow and Disallow paths
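Disallow: /products/ plus Allow: /products/widgets/ puts every /products/widgets/ URL under two conflicting rules. Google resolves this by specificity: the longest matching path wins no matter where it appears in the file, and Allow wins a tie. Some other parsers apply the first matching rule in file order, so the rule order that CMSs and security plugins write can still change the outcome for those crawlers. The checker shows the resolved decision per URL.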
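The disagreement is easy to reproduce. This sketch compares Python's urllib.robotparser, which applies the first matching rule in file order, with a hypothetical prefix-only longest_match_allows function in the spirit of Google's rule. parse_rules is a deliberately naive helper that assumes the file contains a single user-agent group.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /products/
Allow: /products/widgets/
"""


def first_match_allows(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """urllib.robotparser: the first rule (in file order) whose prefix matches decides."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


def parse_rules(robots_txt: str) -> list[tuple[bool, str]]:
    """Naive helper: collect (is_allow, prefix) pairs, assuming a single group."""
    rules = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key in ("allow", "disallow") and value:
            rules.append((key == "allow", value))
    return rules


def longest_match_allows(rules: list[tuple[bool, str]], path: str) -> bool:
    """Google-style: the longest matching prefix decides; Allow wins a tie."""
    best_len, best_allow = -1, True
    for is_allow, prefix in rules:
        if path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, best_allow = len(prefix), is_allow
    return best_allow


url_path = "/products/widgets/blue"
print(first_match_allows(ROBOTS_TXT, url_path))                 # False
print(longest_match_allows(parse_rules(ROBOTS_TXT), url_path))  # True
```

Same file, same URL, opposite answers: that is exactly the class of conflict worth checking per URL rather than eyeballing.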
4. User-agent: * with too-broad Disallow
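A Disallow under User-agent: * hits Google AND every other compliant crawler that has no group of its own. To block only specific bots, give them their own User-agent group. Keep in mind that a crawler with its own group follows only that group and ignores the * rules entirely.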
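A sketch of how group selection plays out, using urllib.robotparser, which also consults a matching named group before falling back to *. The robots.txt content and the "ExampleBot" agent are made-up examples.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so "Disallow: /" under * no longer applies to it.
print(parser.can_fetch("Googlebot", "/blog/post"))   # True
print(parser.can_fetch("Googlebot", "/checkout/"))   # False
# A crawler without a named group (hypothetical "ExampleBot") falls back to *.
print(parser.can_fetch("ExampleBot", "/blog/post"))  # False
```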
5. Forgetting to add a sitemap directive
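Adding Sitemap: https://yoursite.com/sitemap.xml to robots.txt is one of the cheapest one-line changes you can make for indexation: any crawler that reads the file can discover your sitemap without a separate submission. Use an absolute URL.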
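A sketch for checking the directive, using urllib.robotparser's site_maps() (Python 3.8+), which returns the Sitemap URLs found in the file or None. sitemap_problems is a hypothetical helper; it only checks that each entry is an absolute http(s) URL, not that the sitemap actually exists.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def sitemap_problems(robots_txt: str) -> list[str]:
    """List problems with the Sitemap: directives in robots_txt (empty list = OK)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    sitemaps = parser.site_maps()  # None when no Sitemap: line is present
    if not sitemaps:
        return ["no Sitemap: line found"]
    problems = []
    for url in sitemaps:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {url}")
    return problems


print(sitemap_problems("User-agent: *\nDisallow:\n\nSitemap: https://yoursite.com/sitemap.xml\n"))  # []
print(sitemap_problems("User-agent: *\nDisallow:\n"))  # ['no Sitemap: line found']
print(sitemap_problems("Sitemap: /sitemap.xml\n"))     # ['not an absolute URL: /sitemap.xml']
```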
6. Using robots.txt to hide pages from search results
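robots.txt blocks crawling, not indexing. A blocked page can still appear in search results if other sites link to it; Google just shows it without a snippet. To keep a page out of search results, use a noindex meta tag (or an X-Robots-Tag HTTP header) instead, and leave the page crawlable: if robots.txt blocks it, Google never sees the noindex.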
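To catch the blocked-noindex trap, check both signals together. In this sketch, deindex_status and RobotsMetaFinder are hypothetical names: you pass in the robots.txt text, the URL path, the page HTML, and any X-Robots-Tag header value you fetched yourself. It uses urllib.robotparser (prefix matching only) and the standard html.parser to look for a robots or googlebot meta tag containing noindex.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Sets .noindex if a <meta name="robots|googlebot"> tag contains noindex."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}  # names arrive lowercased
        if (values.get("name", "").lower() in ("robots", "googlebot")
                and "noindex" in values.get("content", "").lower()):
            self.noindex = True


def deindex_status(robots_txt: str, path: str, html: str,
                   x_robots_tag: str = "", agent: str = "Googlebot") -> str:
    """Classify a URL as 'crawl-blocked', 'noindex', or 'indexable'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, path):
        # The crawler can't fetch the page, so it never sees any noindex on it;
        # the URL can still be indexed (without a snippet) from external links.
        return "crawl-blocked"
    finder = RobotsMetaFinder()
    finder.feed(html)
    if finder.noindex or "noindex" in x_robots_tag.lower():
        return "noindex"
    return "indexable"


robots_txt = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(deindex_status(robots_txt, "/private/report", page))  # crawl-blocked
print(deindex_status(robots_txt, "/drafts/report", page))   # noindex
```

A "crawl-blocked" result on a page you meant to hide is the signal to remove the Disallow rule and let the noindex do its job.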
AI crawler robots.txt: what to allow, what to block
Most sites are still configured for Googlebot only. AI engines like ChatGPT, Claude, Perplexity, and Gemini use their own crawlers and robots.txt tokens, and those rules help decide whether your content gets used and cited in AI answers. Configure these correctly:
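GPTBot: OpenAI's crawler for collecting model training data. ChatGPT's search answers rely on a separate crawler, OAI-SearchBot, so ALLOW OAI-SearchBot if you want citations in ChatGPT answers, and GPTBot if you also want your content used for training.
ClaudeBot: Anthropic's crawler. ALLOW for Claude citations.
PerplexityBot: Perplexity's crawler for surfacing and linking sites in its answers. ALLOW for Perplexity citations.
Google-Extended: not a separate crawler but a robots.txt token (Google still fetches pages with its existing crawlers). It controls whether your content is used to train Gemini models and for grounding in Gemini; it does not affect Google Search, including AI Overviews. ALLOW for Gemini exposure.
CCBot: Common Crawl's crawler; its open dataset is used to train many LLMs. ALLOW for general LLM exposure.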
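Blocking the crawlers that fetch pages for AI answers opts you out of citations in those engines, while your competitors may have opted in. The blanket "allow Google, block everything AI" setup common in 2024 blocks those citation crawlers along with the training crawlers, which actively works against GEO.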
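A sketch for auditing the tokens above against a robots.txt, with Googlebot included as a baseline. access_report is a hypothetical helper built on urllib.robotparser, so wildcards inside paths aren't evaluated and user-agent matching is a simple case-insensitive substring test. The sample file is the blanket setup described above.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this section, plus Googlebot as a baseline for comparison.
AGENTS = ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]


def access_report(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each user-agent token to whether it may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AGENTS}


# The blanket "allow Google, block everything AI" setup.
BLANKET = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

for agent, allowed in access_report(BLANKET).items():
    print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

Every AI token falls through to the * group here, including Google-Extended, which has no group of its own in this file.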