robots.txt is the standard plain-text file in a website’s root directory (e.g. https://site.com/robots.txt) that tells search engine bots and other crawlers which URLs they may crawl.
A typical robots.txt structure:
User-agent: *
Disallow: /panel/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /

Sitemap: https://site.com/sitemap.xml
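To see how these rules combine, here is a minimal Python sketch of the matching logic. It assumes plain prefix matching (no `*` or `$` wildcards) and the "longest match wins, ties go to Allow" precedence described in RFC 9309. The names `Group`, `parse_groups` and `is_allowed` are hypothetical helpers written for this illustration, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list = field(default_factory=list)   # lower-cased user-agent tokens
    rules: list = field(default_factory=list)    # (is_allow, path_prefix) pairs


def parse_groups(text):
    """Split robots.txt text into user-agent groups (prefix rules only)."""
    groups, current = [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A user-agent line that follows rules starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None and value:
            current.rules.append((key == "allow", value))
    return groups


def is_allowed(groups, agent, path):
    """Longest matching prefix wins; a tie goes to Allow; no match means allowed."""
    agent = agent.lower()
    # Prefer the group naming this agent, otherwise fall back to "*".
    chosen = next((g for g in groups if agent in g.agents), None)
    if chosen is None:
        chosen = next((g for g in groups if "*" in g.agents), None)
    if chosen is None:
        return True
    best_len, allowed = -1, True
    for is_allow, prefix in chosen.rules:
        if path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


ROBOTS_TXT = """\
User-agent: *
Disallow: /panel/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /
Sitemap: https://site.com/sitemap.xml
"""

groups = parse_groups(ROBOTS_TXT)
print(is_allowed(groups, "Googlebot", "/wp-admin/"))                # False
print(is_allowed(groups, "Googlebot", "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(groups, "Googlebot", "/blog/post"))                # True
print(is_allowed(groups, "GPTBot", "/blog/post"))                   # False
```

The Allow line wins for admin-ajax.php because its path is longer (more specific) than the Disallow on /wp-admin/. GPTBot has its own group, so it uses only that group and ignores the rules under `*`.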
Important rules:
- It is a request, not an enforcement mechanism. Compliant bots follow it; malicious bots ignore it.
- Disallow blocks crawling, not indexing. If a disallowed URL is discovered through an external link, Google may still index the URL (without its content).
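- To prevent indexing, use the noindex meta tag, not robots.txt. The page must stay crawlable for this to work, because a bot that is not allowed to fetch the page never sees the tag.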
- Add the Sitemap URL to robots.txt. Do this in addition to submitting the sitemap in Search Console, not as a replacement for it.
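The noindex rule can be checked mechanically. The following is a minimal Python sketch that looks for a `<meta name="robots">` tag containing noindex in a page's HTML. `RobotsMetaParser` and `has_noindex` are hypothetical names used for illustration, and the sketch only inspects the HTML it is given; it does not fetch anything.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the tokens found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))                                          # True
print(has_noindex("<html><head><title>Home</title></head></html>"))  # False
```

A page that has noindex and is also disallowed in robots.txt is a conflict: the crawler cannot fetch the page, so the tag has no effect.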
Common mistakes:
- Disallowing CSS/JS files (which prevents search engines from rendering the page properly)
- Accidentally blocking the entire site with Disallow: / (by mistakenly copying it from a development environment to production)
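Both mistakes can be caught before a file goes live. The following is a minimal Python sketch of such a pre-deploy check using the standard library's `urllib.robotparser`. The function `find_blocked` and the URLs in `CRITICAL_URLS` are assumptions made for this example; substitute the pages and assets that matter on your own site.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt, must_be_crawlable):
    """Return the URLs from must_be_crawlable that a generic crawler may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_be_crawlable if not parser.can_fetch("*", url)]


# Hypothetical URLs that should always stay crawlable.
CRITICAL_URLS = [
    "https://site.com/",
    "https://site.com/assets/main.css",
    "https://site.com/assets/app.js",
]

staging_file = "User-agent: *\nDisallow: /\n"
production_file = "User-agent: *\nDisallow: /panel/\n"

print(find_blocked(staging_file, CRITICAL_URLS))     # all three URLs are blocked
print(find_blocked(production_file, CRITICAL_URLS))  # []
```

A deploy script could refuse to publish the file whenever the returned list is not empty. One caveat: the standard library parser evaluates rules in file order rather than by longest match, so for files that mix Allow and Disallow on overlapping paths its answers may differ from Google's.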
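Tip: Check robots.txt changes in Search Console. The standalone “robots.txt Tester” tool has been retired; the robots.txt report now shows the file Google fetched and any parsing problems. A single incorrect line can cause the entire site to drop out of the index.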