Robots.txt

Robots.txt is a plain-text file in a website’s root directory (e.g. https://site.com/robots.txt) that tells crawlers, such as search engine bots and AI crawlers like GPTBot, which URLs they may crawl. Its format is standardized as the Robots Exclusion Protocol (RFC 9309).

A typical robots.txt structure:

User-agent: *
Disallow: /panel/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /

Sitemap: https://site.com/sitemap.xml
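To show how a compliant crawler reads the example above, here is a minimal Python sketch of the RFC 9309 matching rules. A crawler obeys only the group whose User-agent names it, falling back to the `*` group. Within that group, the longest matching path wins, and Allow wins a tie. That is why `admin-ajax.php` stays crawlable even though `/wp-admin/` is disallowed, and why GPTBot is blocked from the whole site. The function names are hypothetical. The sketch assumes plain prefix rules with no `*` or `$` wildcards, and it expects `user_agent` to be the crawler's product token (e.g. `GPTBot`), not a full User-Agent header.

```python
from urllib.parse import urlsplit

# The example file from this entry.
ROBOTS_TXT = """\
User-agent: *
Disallow: /panel/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /

Sitemap: https://site.com/sitemap.xml
"""


def parse_groups(text):
    """Parse robots.txt into (user_agents, rules) groups; rules are (kind, path)."""
    groups, agents, rules = [], [], []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if agents and not last_was_agent:
                # A User-agent line after rules starts a new group.
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow"):
            last_was_agent = False
            if agents and value:  # an empty Disallow allows everything
                rules.append((field, value))
        # Sitemap and unknown fields do not belong to any group.
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(groups, user_agent):
    """Rules of every group naming this crawler, else those of the '*' group."""
    token = user_agent.lower()
    specific = [r for agents, rules in groups if token in agents for r in rules]
    if any(token in agents for agents, _ in groups):
        return specific
    return [r for agents, rules in groups if "*" in agents for r in rules]


def is_allowed(robots_txt, user_agent, url):
    """Longest matching rule wins; Allow wins a tie; no match means allowed."""
    parts = urlsplit(url)
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    best_kind, best_len = "allow", -1
    for kind, rule_path in rules_for(parse_groups(robots_txt), user_agent):
        if path.startswith(rule_path):
            n = len(rule_path)
            if n > best_len or (n == best_len and kind == "allow"):
                best_kind, best_len = kind, n
    return best_kind == "allow"
```

For example, `is_allowed(ROBOTS_TXT, "Googlebot", "https://site.com/wp-admin/admin-ajax.php")` is `True`, while the same call with `"GPTBot"` is `False`. The sketch implements matching by hand because Python’s standard `urllib.robotparser` returns the first rule that matches in file order, so for this file it would report `admin-ajax.php` as blocked. Google does not read the file that way.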

Important rules:

It is advisory, not enforced. Compliant bots follow it; malicious bots ignore it.
Disallow blocks crawling, not indexing. If a disallowed URL is discovered through an external link, Google may still index the URL (without its content).
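To prevent indexing, use a noindex meta tag (or the X-Robots-Tag HTTP header), not robots.txt. Leave the page crawlable: if robots.txt blocks it, the crawler never sees the noindex.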
List the sitemap in robots.txt as an absolute URL, in addition to submitting it in Search Console (not instead of it).
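The noindex rule depends on the crawl rule: a crawler can only see a noindex signal on a page it is allowed to fetch. The Python sketch below checks both places the signal can appear, the `<meta name="robots">` tag and the `X-Robots-Tag` response header. It treats the signal as ineffective when robots.txt blocks the URL. The function and class names are hypothetical. The sketch assumes the response headers arrive as a plain dict, and it ignores crawler-specific variants such as `<meta name="googlebot">` or `X-Robots-Tag: googlebot: noindex`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            self.directives.append(values.get("content", "").lower())


def noindex_is_effective(crawl_allowed, headers, html):
    """True only if the page can be crawled AND carries a noindex signal."""
    if not crawl_allowed:
        # Blocked by robots.txt: the crawler never sees the tag or header.
        return False
    signals = [v.lower() for k, v in headers.items() if k.lower() == "x-robots-tag"]
    parser = RobotsMetaParser()
    parser.feed(html)
    signals.extend(parser.directives)
    # Directives are comma-separated, e.g. "noindex, nofollow".
    return any("noindex" in s.replace(" ", "").split(",") for s in signals)
```

With a page containing `<meta name="robots" content="noindex, nofollow">`, the function returns `True` when the URL is crawlable and `False` when robots.txt disallows it. In the second case, the URL can still be indexed from external links.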

Common mistakes:

Disallowing CSS/JS files, which prevents Google from rendering the page the way users see it
Accidentally blocking the entire site with Disallow: /, usually by copying a development environment’s robots.txt to production
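Both mistakes are easy to catch automatically before a deploy. Here is a minimal Python sketch of a check that could run in CI against the production robots.txt. The `lint_robots` function is hypothetical. It flags `Disallow: /` in the `*` group and any Disallow path that targets `.css` or `.js` files. This is a rough textual check, not a full parser: it does not evaluate wildcards or Allow overrides, and it will not notice asset directories blocked by name.

```python
import re


def lint_robots(text):
    """Return warnings for a site-wide block and for blocked CSS/JS files."""
    warnings = []
    agents, in_rules = [], False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                agents, in_rules = [], False  # a new group starts
            agents.append(value.lower())
        elif field == "disallow":
            in_rules = True
            if value == "/" and "*" in agents:
                warnings.append(f"line {number}: 'Disallow: /' blocks the whole site for all bots")
            # \b keeps ".json" from matching ".js"
            if re.search(r"\.(css|js)\b", value):
                warnings.append(f"line {number}: blocks CSS/JS needed for rendering: {value}")
        elif field == "allow":
            in_rules = True
    return warnings
```

A copied development file such as `User-agent: *` followed by `Disallow: /` produces one warning for line 2. The example file in this entry produces none, because its `Disallow: /` applies only to GPTBot.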

Tip: Check robots.txt changes with the robots.txt report in Search Console, which replaced the older “robots.txt Tester” tool. A single incorrect line can stop Google from crawling the entire site, after which its pages can drop out of search results.
