Free · No signup

Robots.txt Generator

Robots.txt is the file at your domain root that tells search engines and AI crawlers which pages they can and can't access. In 2026 it also decides whether GPTBot, ClaudeBot, and Google-Extended can train on your content. Our generator ships 5 presets (Allow all, Block all, Block AI crawlers, WordPress, Custom), per-bot toggles for 24+ crawlers across 5 categories, a path-rules editor, and a live URL tester.

100% private: Your robots.txt is generated entirely in your browser. Nothing is sent to any server.

Live preview

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: YandexBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: FacebookBot
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: LinkedInBot
Allow: /

User-agent: AhrefsBot
Allow: /

User-agent: SemrushBot
Allow: /

User-agent: MJ12bot
Allow: /

User-agent: DotBot
Allow: /

How it works

A short, focused workflow — input, run, read the result.

1. Enter your site URL. Optional; the Sitemap line is auto-filled from it.
2. Pick a preset: Allow all, Block all, Block AI crawlers, WordPress, or Custom.
3. Toggle per-bot visibility. In Custom mode, expand any category and click a bot to Allow or Block it.
4. Copy the generated robots.txt. Click Copy, then upload the file as robots.txt to your domain's top-level directory.

What is a robots.txt file, and why does it decide who trains on your content?

Robots.txt is a plain text file at your domain root that tells crawlers which pages they can fetch. In 2026 it also controls whether GPTBot, ClaudeBot, and Google-Extended can use your content for AI training. Per Cloudflare's 2025-2026 traffic data, GPTBot and ClaudeBot are among the top 4 most-blocked crawlers on the web.

Robots.txt is part of the Robots Exclusion Protocol, standardized as RFC 9309. Google, Bing, OpenAI, Anthropic, and Common Crawl all download and parse it before requesting any other URL on your site. The file must be UTF-8 plain text, served at the exact path /robots.txt, and Google ignores content beyond 500 KiB. The format is dead simple: a User-agent line names the bot, Allow or Disallow lines decide which paths it can fetch, and an optional Sitemap line points crawlers at your XML sitemap.

The 2026 nuance is AI training bots. Blocking GPTBot stops OpenAI from using your pages to train future models, but does not stop ChatGPT's search product from citing your content. Same for ClaudeBot versus Claude's search, and Google-Extended versus Google's AI Overviews in Search. PerplexityBot and Bytespider have their own user-agent strings. The most common 2026 strategy is 'block training, allow search': block the training crawlers, allow the search crawlers, and let your content stay visible in AI answers while opting out of the training set. Cloudflare's 2025-2026 data shows roughly 18-22% of the top 10,000 domains now block at least one AI training crawler.

Our generator ships 5 presets (Allow all, Block all, Block AI crawlers, WordPress, Custom), per-bot toggles for 24+ crawlers across 5 categories (traditional search, AI training, AI search, SEO tools, social), a path-rules editor for fine-grained control, and a live URL tester so you can verify a specific path is allowed or blocked. Copy the output to your site root, validate it in Google Search Console, and you are live in under 2 minutes.
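As a rough illustration of the 'block training, allow search' strategy described above, the sketch below parses a small robots.txt with Python's standard-library urllib.robotparser and reports which bots may fetch a page. The file contents, the example.com URLs, and the helper name who_can_fetch are assumptions made for this example; they are not output from the generator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: training crawlers blocked, search crawlers and
# everything else allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def who_can_fetch(robots_txt, bots, url):
    """Map each bot name to True/False for the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


BOTS = ["GPTBot", "ClaudeBot", "Google-Extended",
        "OAI-SearchBot", "Claude-SearchBot", "Googlebot"]
result = who_can_fetch(ROBOTS_TXT, BOTS, "https://example.com/blog/post")

for bot, allowed in result.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Under these assumptions the three training tokens come back blocked while the search crawlers and Googlebot stay allowed. How a real crawler behaves still depends on that crawler honoring the file.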

24+

Crawlers with per-bot toggles

5

Presets from Allow all to Custom

500 KiB

Max file size Google will parse

What this robots.txt generator covers

Every bot category, preset, and validation check in one workflow.

Allow all preset: a single User-agent: * group with an empty Disallow: line. The safe default for sites that want every crawler to access every page.
Block all preset: a single User-agent: * group with Disallow: /. Use only for staging sites and pre-launch environments that should not be crawled.
Block AI crawlers preset: blocks GPTBot, ClaudeBot, Google-Extended, CCBot, PerplexityBot, and Bytespider in one click. The 2026 opt-out-of-training baseline.
WordPress preset: disallows /wp-admin/, /wp-includes/, /wp-content/plugins/, and /readme.html. Keeps search engines from crawling your admin and core paths.
Per-bot toggles: 24+ crawlers across 5 categories, namely traditional search (Googlebot, Bingbot), AI training (GPTBot, ClaudeBot), AI search, SEO tools (AhrefsBot), and social (FacebookBot).
Path rules editor: add custom Allow and Disallow rules per user-agent. Useful for blocking /checkout/, /cart/, or /private/ while leaving the rest of the site crawlable.
Sitemap line: auto-fills the Sitemap: directive from your site URL. Helps crawlers discover new pages sooner once you ship the file.
Live URL tester: paste any path from your site, see which bots can crawl it, and confirm the rules work the way you intended before you ship.
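To make the presets and the URL tester a little more concrete, here is a minimal sketch of how the Allow all, Block all, and WordPress presets could be rendered and then checked against a path. The helpers render_group and is_allowed are hypothetical names invented for this example, and the tool's own implementation may differ. Checking is delegated to Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Paths taken from the WordPress preset description above.
WORDPRESS_DISALLOW = [
    "/wp-admin/",
    "/wp-includes/",
    "/wp-content/plugins/",
    "/readme.html",
]


def render_group(user_agent, disallow=(), allow=()):
    """Render one User-agent group as robots.txt text (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    # An empty path yields a bare "Disallow:" line, which blocks nothing.
    lines += [f"Disallow: {path}".rstrip() for path in disallow]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, path):
    """Rough stand-in for the live URL tester: may this bot fetch this path?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


allow_all = render_group("*", disallow=[""])
block_all = render_group("*", disallow=["/"])
wordpress = render_group("*", disallow=WORDPRESS_DISALLOW)

print(wordpress)
print(is_allowed(wordpress, "Googlebot", "/wp-admin/options.php"))
print(is_allowed(wordpress, "Googlebot", "/blog/hello-world/"))
```

One caveat: urllib.robotparser applies the first rule that matches, while RFC 9309 prefers the longest matching path. The two can disagree when Allow and Disallow rules overlap, so treat a local check like this as a sanity test rather than a guarantee of how a given crawler will behave.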

Who uses this robots.txt generator

Anyone who needs to control crawler access, from indie publishers to enterprise SEO teams.

In-house SEOs

You're shipping a new site next week and need a robots.txt that blocks /admin/, allows the blog, and points crawlers at the sitemap.

Pick the Custom preset, toggle the path rules, validate with the live URL tester, and ship the file to your root on launch day.

Content publishers

You publish 50 articles a month and want to opt out of AI training while still being cited in ChatGPT and Perplexity search answers.

Pick the Block AI crawlers preset, ship the file, and keep your content in AI search results while removing it from the training set.

Agencies

You manage 12 client sites and need a consistent robots.txt policy that keeps AhrefsBot and SemrushBot from burning client server resources.

Standardize on the Custom preset with SEO-tool bots blocked, save the configuration per client, and reduce wasted crawl by 30-40%.

Ecommerce managers

Your 5,000-product store has indexable /cart/, /checkout/, and /account/ pages bloating the crawl budget and diluting rankings.

Add Disallow rules for /cart/, /checkout/, and /account/ in the path editor, ship the file, and concentrate crawl on the money pages.

Developers

You're deploying a Next.js or static site and need a robots.txt that points at the auto-generated sitemap.xml at build time.

Generate the file in our tool, copy it to your public/ directory, and let your build pipeline publish it on every deploy.
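If you would rather produce the file during the build than copy it by hand, a small script can do the same job. The sketch below is one possible shape, not the tool's method. It assumes the sitemap lives at /sitemap.xml under the site URL and that static files are served from a public/ directory. The function names build_robots_txt and write_robots_txt are made up for this example, and Python is used only for illustration, since the same steps translate directly to a Node build script.

```python
from pathlib import Path
from urllib.parse import urljoin


def build_robots_txt(site_url, disallow=()):
    """Return robots.txt text with a Sitemap line derived from site_url."""
    # Assumption: the sitemap is published at <site_url>/sitemap.xml.
    sitemap_url = urljoin(site_url.rstrip("/") + "/", "sitemap.xml")
    rules = [f"Disallow: {path}" for path in disallow] or ["Disallow:"]
    lines = ["User-agent: *", *rules, "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def write_robots_txt(public_dir, site_url, disallow=()):
    """Write robots.txt into the static directory and return its path."""
    target = Path(public_dir) / "robots.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(build_robots_txt(site_url, disallow), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(build_robots_txt("https://example.com", ["/admin/"]))
```

Calling write_robots_txt("public", "https://example.com", ["/admin/"]) from a build step would place the file where a static host serves it as /robots.txt.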

Bloggers

You write 3 posts a week on a personal site and want to opt out of AI training without losing your Google traffic.

Pick the Block AI crawlers preset, ship the file in 2 minutes, and keep ranking in Google while removing your posts from the training data.

Frequently Asked Questions

Everything you need to know about robots.txt, AI crawlers, and controlling who can access your site in 2026.

What is a robots.txt file?
A plain text file at the root of your domain (e.g. https://yoursite.com/robots.txt) that tells search engines and AI crawlers which pages they're allowed to crawl. Part of the Robots Exclusion Protocol (REP), standardized as RFC 9309. Google's crawlers download and parse it before requesting any other URL. The file must be UTF-8 plain text no larger than 500 KiB (Google ignores content beyond that limit).
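The two format constraints mentioned here, UTF-8 encoding and the 500 KiB limit, are easy to check before you upload. The sketch below is a minimal pre-flight check on the raw bytes of a file. The function name check_robots_file and the wording of its messages are assumptions made for this example.

```python
MAX_ROBOTS_BYTES = 500 * 1024  # 500 KiB, the Google limit cited above


def check_robots_file(data):
    """Return a list of problems found in raw robots.txt bytes (empty if none)."""
    problems = []
    if len(data) > MAX_ROBOTS_BYTES:
        problems.append(
            f"file is {len(data)} bytes; content past "
            f"{MAX_ROBOTS_BYTES} bytes may be ignored"
        )
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")
    return problems


sample = "User-agent: *\nDisallow:\n".encode("utf-8")
print(check_robots_file(sample) or "looks fine")
```

In practice you would read the bytes with open("robots.txt", "rb").read() and run the same check on them.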

What's the difference between robots.txt and meta robots?
robots.txt is a file at your domain root that controls crawling: whether a bot is allowed to fetch a page at all. Meta robots is an HTML <meta name="robots"> tag inside a page's <head> that controls indexing and snippet display, typically with values like noindex, nofollow, noarchive, or nosnippet. A common mistake is using robots.txt to hide a page from Google, but if another site links to that page, Google can still index the URL. To actually prevent indexing, add a noindex meta tag and leave the page crawlable: a bot that is blocked by robots.txt never fetches the page, so it never sees the tag.

How do I block AI crawlers like GPTBot and ClaudeBot?
Add a User-agent: block for each AI training crawler you want to opt out of, followed by Disallow: /. The current 2026 strings are GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google's Gemini training token), CCBot (Common Crawl), PerplexityBot, and Bytespider (ByteDance). Per Cloudflare's 2025-2026 traffic data, GPTBot and ClaudeBot are among the top 4 most-blocked crawlers. The 2026 nuance: blocking the training bot does NOT block the search bots, so most marketing sites use a 'block training, allow search' strategy.
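The per-crawler blocks described in this answer are repetitive enough to generate from a list. Here is a minimal sketch that does so and then confirms, with Python's standard urllib.robotparser, that only the listed tokens end up blocked. The function name block_crawlers is hypothetical, and the list simply mirrors the user-agent strings named above.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens listed in the answer above.
AI_TRAINING_BOTS = [
    "GPTBot",
    "ClaudeBot",
    "Google-Extended",
    "CCBot",
    "PerplexityBot",
    "Bytespider",
]


def block_crawlers(bots):
    """Emit one 'User-agent' plus 'Disallow: /' group per crawler."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(groups) + "\n"


robots_txt = block_crawlers(AI_TRAINING_BOTS)
print(robots_txt)

# Sanity check: listed bots are blocked, unlisted bots fall through to allowed.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
blocked = [bot for bot in AI_TRAINING_BOTS if not parser.can_fetch(bot, "/")]
```

Because the generated file has no User-agent: * group, crawlers that are not listed, such as Googlebot or OAI-SearchBot, remain unrestricted.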

Is this robots.txt generator safe to use?
Yes, completely safe. The entire tool runs in your browser as client-side JavaScript. Your site URL, your bot choices, your path rules, and the final robots.txt are never sent to our servers, never logged, never stored, and never used to train any AI. Verify it yourself with DevTools → Network: zero outbound requests carry your inputs.

More free tools

Free, no signup required. Built by the team behind SERPView.
