Robots.txt validator
Paste your robots.txt content to validate syntax and directives, then test whether specific URLs would be crawled.
100% in your browser. Your robots.txt is never sent to any server.
How to use
- Paste your robots.txt content into the text area.
- Click Validate to check for syntax errors and see parsed groups.
- Use the URL tester to check if a specific path would be crawled by a specific bot.
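For example, pasting a minimal robots.txt like the one below (the paths and sitemap URL are placeholders) and then testing the path /admin/login for the Googlebot user agent should report it as blocked, while /blog/post should be allowed:

    User-agent: *
    Disallow: /admin/
    Allow: /admin/help/
    Sitemap: https://example.com/sitemap.xml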
Common use cases
- Pre-launch audit — verify that important pages are not accidentally blocked.
- Debug crawl issues — confirm whether a blocked page is caused by robots.txt.
- Wildcard testing — check that * and $ patterns match the intended URLs (see the sketch after this list).
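The TypeScript sketch below illustrates one simplified way such patterns can be tested against a path. The helper name is made up for this example, it ignores rule precedence and percent-encoding, and it is not necessarily how this tool matches URLs.

    // Rough sketch: turn a robots.txt path pattern into a RegExp.
    // "*" matches any sequence of characters; a trailing "$" anchors
    // the end of the URL. Real matchers also apply rule precedence.
    function patternToRegExp(pattern: string): RegExp {
      // Escape regex metacharacters except "*", which stays a wildcard.
      const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      const wild = escaped.replace(/\*/g, ".*");
      // A trailing "$" (escaped above to "\$") becomes an end anchor;
      // otherwise the pattern behaves as a prefix match.
      const body = wild.endsWith("\\$") ? wild.slice(0, -2) + "$" : wild;
      return new RegExp("^" + body);
    }

    console.log(patternToRegExp("/search?*").test("/search?q=shoes"));    // true
    console.log(patternToRegExp("/*.pdf$").test("/docs/guide.pdf"));      // true
    console.log(patternToRegExp("/*.pdf$").test("/docs/guide.pdf?v=2"));  // false

When both an Allow and a Disallow pattern match the same URL, real matchers such as Google's pick the most specific (longest) rule, which this sketch does not attempt.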
Also see: Sitemap Validator and Meta Tags Preview.
Frequently asked questions
- What is robots.txt?
- robots.txt is a plain text file placed at the root of a website (e.g. https://example.com/robots.txt) that instructs web crawlers which pages they may or may not access. It is part of the Robots Exclusion Protocol and is respected by major search engine crawlers.
- Does robots.txt prevent indexing?
- No. robots.txt only controls crawling — whether a bot accesses the page. A page can still be indexed if another page links to it, even if robots.txt disallows crawling. To prevent indexing, use a noindex meta tag or X-Robots-Tag response header (examples appear after this FAQ).
- What is the difference between Allow and Disallow?
- Disallow directives prevent a crawler from accessing matching paths. Allow directives explicitly permit access to paths that would otherwise be blocked by a broader Disallow. Allow rules take precedence over Disallow rules of equal specificity (see the example after this FAQ).
- What is the wildcard (*) in robots.txt?
- User-agent: * applies rules to all web crawlers. In path patterns, * matches any sequence of characters. For example, Disallow: /search?* blocks all paths starting with /search?. The $ anchor matches the end of the URL.
- Do all crawlers respect robots.txt?
- Major search engines (Google, Bing, DuckDuckGo) and reputable crawlers follow robots.txt. Malicious bots and scrapers often ignore it. robots.txt is a courtesy protocol, not a security measure.
- What is the Crawl-delay directive?
- Crawl-delay tells crawlers to wait a specified number of seconds between requests. Google does not support Crawl-delay — use Google Search Console's crawl rate settings instead. Bing and some other crawlers do support it (an example appears after this FAQ).
- Can I include a Sitemap directive in robots.txt?
- Yes. Adding Sitemap: https://example.com/sitemap.xml to robots.txt tells search engines where to find your sitemap. You can include multiple Sitemap directives. This is separate from submitting your sitemap in Search Console.
- What is the maximum size for robots.txt?
- Google processes up to 500 KiB of robots.txt. Content beyond that limit is ignored. For large sites with many rules, consider using a structured sitemap instead of extensive Disallow lists.
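For the indexing question above, indexing is blocked on the page or response itself rather than in robots.txt. Either of the following standard directives works, shown here as illustrations; note that the page must remain crawlable so the crawler can actually see the directive:

    <meta name="robots" content="noindex">

or, as an HTTP response header:

    X-Robots-Tag: noindex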
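For the Allow and Disallow question, a typical pattern is a broad Disallow with a narrower Allow carved out (the paths are placeholders):

    User-agent: *
    Disallow: /private/
    Allow: /private/annual-report.html

Here everything under /private/ is blocked, but /private/annual-report.html is crawlable because the Allow rule is the more specific (longer) match.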
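For the Crawl-delay question, the directive sits inside a user-agent group and takes a number of seconds; the group and value below are illustrative (a 10-second delay for Bing's crawler):

    User-agent: bingbot
    Crawl-delay: 10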