Robots.txt
A text file that tells search engine crawlers which pages or sections of a website they can or cannot access.
Robots.txt is a text file placed in the root directory of a website that tells search engine crawlers which pages or sections of the site they're allowed to access and crawl. It's part of the Robots Exclusion Protocol, a standard used by websites to communicate with web crawlers.
The robots.txt file can be used to keep crawlers away from sensitive or low-value areas, reduce crawl budget wasted on unimportant pages, help avoid duplicate content issues, and manage how search engines crawl your site. However, it doesn't prevent pages from being indexed: a URL blocked in robots.txt can still appear in search results if other sites link to it.
A robots.txt file uses simple syntax: a User-agent line specifies which crawler the rules apply to, and Disallow and Allow directives specify which paths that crawler may or may not crawl. You can also include a Sitemap directive pointing to your XML sitemap.
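For illustration, a minimal robots.txt file might look like the sketch below (the paths and sitemap URL are placeholders, not recommendations):

    User-agent: *
    Disallow: /admin/
    Disallow: /cart/
    Allow: /admin/public/

    User-agent: Googlebot
    Disallow: /staging/

    Sitemap: https://www.example.com/sitemap.xml

Each group starts with a User-agent line naming the crawler it applies to (* matches any crawler), followed by that crawler's Disallow and Allow rules; the Sitemap line can appear anywhere in the file.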
Common mistakes with robots.txt include accidentally blocking important pages, blocking CSS or JavaScript files that Google needs to render pages properly, and relying on robots.txt to prevent indexing (use noindex directives instead). You can check how Google interprets your robots.txt file with the robots.txt report in Google Search Console (which replaced the older robots.txt Tester tool).
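To illustrate the last point: a noindex directive lives on the page itself rather than in robots.txt, most commonly as a meta tag in the page's <head>, and it only works if crawlers are allowed to fetch the page and see it. A typical example:

    <!-- placed in the <head> of the page you want kept out of search results -->
    <meta name="robots" content="noindex">

If the same URL is also blocked in robots.txt, crawlers never fetch the page and never see the noindex tag, which is why the two shouldn't be combined for pages you want removed from the index.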