SEO Term

Robots.txt

The standard text file found in a site's root directory that tells crawlers which URLs are allowed to be crawled.

Robots.txt is the standard text file located in a website’s root directory (e.g. https://site.com/robots.txt) that tells search engine bots which URL paths they may and may not crawl.

A typical robots.txt structure:

User-agent: *
Disallow: /panel/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /

Sitemap: https://site.com/sitemap.xml
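
How a crawler might read this file can be sketched in Python. The sketch below is a minimal illustration, not a full parser. It assumes the longest-match precedence described in RFC 9309 (the most specific matching path wins, and Allow wins a tie), matches user-agent names exactly, and ignores the * and $ path wildcards. The names parse_groups and is_allowed are hypothetical helpers, not part of any library.

```python
ROBOTS_TXT = """\
User-agent: *
Disallow: /panel/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /

Sitemap: https://site.com/sitemap.xml
"""


def parse_groups(text):
    """Return {user_agent_lowercase: [(kind, path_prefix), ...]}."""
    groups = {}
    current_agents = []
    reading_rules = False  # True once the current group has a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a User-agent line after rules starts a new group
                current_agents, reading_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_rules = True
            if value:  # an empty Disallow matches nothing
                for agent in current_agents:
                    groups[agent].append((field, value))
    return groups


def is_allowed(groups, user_agent, path):
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    # Simplification: exact name match, falling back to the "*" group.
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True
    for kind, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_len, allowed = len(prefix), kind == "allow"
    return allowed


groups = parse_groups(ROBOTS_TXT)
print(is_allowed(groups, "Googlebot", "/wp-admin/options.php"))     # False
print(is_allowed(groups, "Googlebot", "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(groups, "GPTBot", "/blog/"))                       # False
```

Note that not every parser uses longest-match precedence. Python's built-in urllib.robotparser, for example, applies the first matching rule in file order, so it would report admin-ajax.php as blocked for this file.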

Important rules:

  • It is advisory, not enforced. Compliant bots follow it; malicious bots ignore it, so it is not an access control mechanism.
  • A disallowed page is not blocked from indexing, but from crawling. If it is discovered through an external link, Google may still index the URL (without its content).
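  • To prevent indexing, use the noindex meta tag (or the X-Robots-Tag HTTP header), not robots.txt. The page must stay crawlable, because a crawler blocked by robots.txt never sees the noindex.
  • Add the Sitemap URL to robots.txt in addition to submitting it in Search Console, not as a replacement for it.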
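
The difference between blocking crawling and blocking indexing can be made concrete with a small sketch. It assumes the page's HTML is already available as a string and uses Python's standard html.parser to look for a robots meta tag. The names RobotsMetaParser, has_noindex, and indexing_outcome are hypothetical, and the outcome strings simply restate the rules above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, follow"
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noindex(html):
    """True if the HTML carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def indexing_outcome(crawl_allowed, html):
    """Illustrative summary of how robots.txt and noindex interact."""
    if not crawl_allowed:
        # The crawler never fetches the page, so it cannot read any noindex.
        return "not crawled; URL may still be indexed without content"
    if has_noindex(html):
        return "crawled; kept out of the index"
    return "crawled; eligible for indexing"


PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(indexing_outcome(True, PAGE))   # crawled; kept out of the index
print(indexing_outcome(False, PAGE))  # not crawled; URL may still be indexed without content
```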

Common mistakes:

  • Disallowing CSS/JS files (which prevents search engines from rendering the page the way visitors see it)
  • Accidentally blocking the entire site with Disallow: / (typically by copying a development environment's robots.txt to production)
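
Both mistakes can be caught with a simple check before a file goes live. The following is a rough sketch, assuming the file content is available as a string. The function name find_risky_rules and the asset pattern (file extensions and directory names such as /assets/ or /static/) are illustrative guesses and would need adjusting to a real site's layout.

```python
import re

# Assumption: assets are recognisable by extension or by common directory names.
ASSET_PATTERN = re.compile(r"\.(css|js)\b|/(css|js|assets|static)/", re.IGNORECASE)


def find_risky_rules(text):
    """Return warnings for a site-wide block or for blocked CSS/JS paths."""
    warnings = []
    agents, reading_rules = [], False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a User-agent line after rules starts a new group
                agents, reading_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            reading_rules = True
            if field != "disallow":
                continue
            if value == "/" and "*" in agents:
                warnings.append(f"line {number}: blocks the entire site for all crawlers")
            elif ASSET_PATTERN.search(value):
                warnings.append(f"line {number}: may block CSS/JS needed for rendering ({value})")
    return warnings


dev_file = "User-agent: *\nDisallow: /\n"
print(find_risky_rules(dev_file))
# ['line 2: blocks the entire site for all crawlers']
```

A check like this could run as part of a deployment script so that a development robots.txt never reaches production unnoticed. A Disallow: / that applies only to a named bot, such as GPTBot in the sample structure, is not flagged.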

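Tip: Check robots.txt changes in Search Console. The robots.txt report, which replaced the retired “robots.txt Tester” tool, shows the file Google last fetched along with any errors or warnings. A single incorrect line can cause the entire site to drop out of the index.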
