Robots.txt validator

Paste your robots.txt content to validate syntax and directives, then test whether specific URLs would be crawled.

100% in your browser. Your robots.txt is never sent to any server.

How to use

  1. Paste your robots.txt content into the text area.
  2. Click Validate to check for syntax errors and see parsed groups.
  3. Use the URL tester to check if a specific path would be crawled by a specific bot.
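The same kind of check the URL tester performs can be sketched outside the tool with Python's standard-library urllib.robotparser (a minimal sketch with made-up rules and URLs; note that this parser uses simple prefix matching and does not implement the * and $ wildcard extensions):

    from urllib import robotparser

    # Illustrative rules: block /admin/ for every crawler, but keep /admin/help crawlable.
    rules = [
        "User-agent: *",
        "Allow: /admin/help",
        "Disallow: /admin/",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)  # parse the rules directly; nothing is fetched over the network

    # Would these URLs be crawled?
    print(rp.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False
    print(rp.can_fetch("Googlebot", "https://example.com/admin/help"))      # True
    print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))       # True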

Common use cases

  * Check a robots.txt file for syntax errors before deploying it.
  * Test whether a specific URL would be crawled by a particular bot, such as Googlebot.
  * Verify that wildcard (*) and end-of-URL ($) patterns match the paths you intend.

Also see: Sitemap Validator and Meta Tags Preview.

Frequently asked questions

What is robots.txt?
robots.txt is a plain text file placed at the root of a website (e.g. https://example.com/robots.txt) that instructs web crawlers which pages they may or may not access. It is part of the Robots Exclusion Protocol and is respected by major search engine crawlers.
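A minimal file might look like this (the path is illustrative):

    # Applies to all crawlers: do not crawl anything under /admin/
    User-agent: *
    Disallow: /admin/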
Does robots.txt prevent indexing?
No. robots.txt only controls crawling — whether a bot accesses the page. A page can still be indexed if another page links to it, even if robots.txt disallows crawling. To prevent indexing, use a noindex meta tag or X-Robots-Tag response header.
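For example, either of these keeps a page out of the index. Note that the crawler must be able to fetch the page to see them, so the page should not also be disallowed in robots.txt:

    <!-- in the page's HTML head -->
    <meta name="robots" content="noindex">

    # or as an HTTP response header
    X-Robots-Tag: noindex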
What is the difference between Allow and Disallow?
Disallow directives prevent a crawler from accessing matching paths. Allow directives explicitly permit access to paths that would otherwise be blocked by a broader Disallow. Allow rules take precedence over Disallow rules of equal specificity.
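A short illustration with made-up paths: everything under /private/ is blocked, but the longer, more specific Allow rule keeps the press-kit subfolder crawlable:

    User-agent: *
    Disallow: /private/
    Allow: /private/press-kit/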
What is the wildcard (*) in robots.txt?
User-agent: * applies rules to all web crawlers. In path patterns, * matches any sequence of characters. For example, Disallow: /search?* blocks all paths starting with /search?. The $ anchor matches the end of the URL.
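For example (the patterns are illustrative):

    User-agent: *
    Disallow: /search?*     # any path starting with /search?
    Disallow: /*.pdf$       # any URL whose path ends in .pdf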
Do all crawlers respect robots.txt?
Major search engines (Google, Bing, DuckDuckGo) and reputable crawlers follow robots.txt. Malicious bots and scrapers often ignore it. robots.txt is a courtesy protocol, not a security measure.
What is the Crawl-delay directive?
Crawl-delay tells crawlers to wait a specified number of seconds between requests. Google does not support Crawl-delay — use Google Search Console's crawl rate settings instead. Bing and some other crawlers do support it.
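An illustrative group asking a crawler that honors the directive to wait 10 seconds between requests:

    User-agent: Bingbot
    Crawl-delay: 10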
Can I include a Sitemap directive in robots.txt?
Yes. Adding Sitemap: https://example.com/sitemap.xml to robots.txt tells search engines where to find your sitemap. You can include multiple Sitemap directives. This is separate from submitting your sitemap in Search Console.
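For example, Sitemap lines are independent of any User-agent group and can appear anywhere in the file (the URLs are illustrative):

    Sitemap: https://example.com/sitemap.xml
    Sitemap: https://example.com/news-sitemap.xml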
What is the maximum size for robots.txt?
Google processes up to 500 KB of robots.txt. Content beyond that limit is ignored. For large sites with many rules, consider using a structured sitemap instead of extensive Disallow lists.