robots.txt Generator

Build a clean robots.txt from user-agents, allow/disallow rules and sitemaps.

Overview

The robots.txt Generator builds a complete robots.txt file from a list of user-agent groups, each with its own Allow, Disallow, and optional Crawl-delay rules, plus one or more Sitemap references at the bottom. Output is the exact text crawlers expect at /robots.txt.
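The assembly step described above can be sketched in a few lines of Python. This is an illustration only, not the generator's actual code: the Group class, its field names, and render_robots are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One user-agent group (hypothetical structure for this sketch)."""
    user_agents: list[str]
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def render_robots(groups: list[Group], sitemaps: list[str]) -> str:
    """Render groups and sitemap URLs as robots.txt text."""
    blocks = []
    for group in groups:
        lines = [f"User-agent: {ua}" for ua in group.user_agents]
        lines += [f"Disallow: {path}" for path in group.disallow]
        lines += [f"Allow: {path}" for path in group.allow]
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        blocks.append("\n".join(lines))
    if sitemaps:
        # Sitemap lines go last, after every group.
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # One blank line between blocks, one trailing newline at end of file.
    return "\n\n".join(blocks) + "\n"


robots = render_robots(
    [Group(["*"], disallow=["/admin/"], allow=["/admin/public/"])],
    ["https://example.com/sitemap.xml"],
)
print(robots)
```

Each directive lands on its own line, which is what crawlers require; the sketch does no validation of its inputs.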

It is aimed at SEO practitioners and webmasters who are learning how to write a robots.txt file or how to block a directory from Google. Reach for it when launching a site, keeping crawlers out of a staging environment, or writing crawler-specific rules (Googlebot, Bingbot, or an AI crawler such as GPTBot).

How it works

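robots.txt follows the Robots Exclusion Protocol (RFC 9309). Each group begins with one or more User-agent: lines and contains Disallow: and Allow: rules. Rule paths start with / and may use two special characters: * matches any sequence of characters, and a trailing $ anchors the end of the URL. Both began as search-engine extensions and are now part of RFC 9309. When several rules match a URL, the longest (most specific) one wins. A new User-agent: line starts the next group, and by convention a blank line separates groups. A crawler obeys the group whose user-agent most specifically matches its name, with * as the fallback.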
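To make the path matching concrete, here is a rough Python sketch of how a crawler might evaluate Allow and Disallow patterns. It assumes the precedence described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie) and measures length in characters rather than octets. The function names pattern_to_regex and is_allowed are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence; rules are ('allow' | 'disallow', pattern)."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start of the path, like a prefix match.
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            # Longer pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = [
    ("disallow", "/admin/"),
    ("allow", "/admin/public/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed(rules, "/admin/settings"))         # False
print(is_allowed(rules, "/admin/public/logo.png"))  # True
print(is_allowed(rules, "/docs/report.pdf"))        # False
print(is_allowed(rules, "/docs/report.pdf?x=1"))    # True
```

The last case shows the effect of the $ anchor: the pattern only matches URLs that end in .pdf, so the same file with a query string is not blocked.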

Sitemap: directives are global — they apply regardless of user-agent and can appear anywhere in the file, conventionally at the bottom. The generator deduplicates rules, validates path patterns, and prevents the most common mistake: putting Disallow: / (block everything) under User-agent: * while expecting subdirectory rules to override.
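The generator's actual checks are not documented here, but deduplication and basic pattern validation could look something like the following Python sketch. The helper names clean_paths and blocks_everything, and the specific validation rules, are assumptions made for illustration.

```python
def clean_paths(paths: list[str]) -> list[str]:
    """Drop duplicate patterns (keeping first-seen order) and reject malformed ones."""
    seen = set()
    cleaned = []
    for raw in paths:
        path = raw.strip()
        # Assumed rule: a non-empty pattern starts with '/' or the '*' wildcard.
        if path and not path.startswith(("/", "*")):
            raise ValueError(f"pattern must start with '/' or '*': {path!r}")
        # Assumed rule: '$' only acts as an anchor at the very end.
        if "$" in path[:-1]:
            raise ValueError(f"'$' is only meaningful at the end: {path!r}")
        if path not in seen:
            seen.add(path)
            cleaned.append(path)
    return cleaned


def blocks_everything(disallow: list[str], allow: list[str]) -> bool:
    """True when 'Disallow: /' is present and no Allow rule carves anything out."""
    return "/" in disallow and not allow


print(clean_paths(["/admin/", " /admin/", "/tmp/*", "/*.pdf$"]))
print(blocks_everything(["/", "/admin/"], []))
```

A tool could use a check like blocks_everything to show a warning before the user publishes a file that shuts out every crawler.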

Examples

Block everything from staging:

    User-agent: *
    Disallow: /

Allow everything (empty Disallow):

    User-agent: *
    Disallow:

Block AI scrapers specifically:

    User-agent: GPTBot
    Disallow: /

A production site:

    User-agent: *
    Disallow: /admin/
    Allow: /admin/public/

    Sitemap: https://example.com/sitemap.xml
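The first three examples can be sanity-checked with Python's standard urllib.robotparser module, as in the sketch below. The helper name may_fetch and the test URL are made up for this illustration. One caveat: urllib.robotparser applies the first matching rule in file order rather than the longest match, so it is not a faithful check for files that mix Allow and Disallow, such as the production example.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


staging = "User-agent: *\nDisallow: /\n"
allow_all = "User-agent: *\nDisallow:\n"
block_gptbot = "User-agent: GPTBot\nDisallow: /\n"
page = "https://example.com/page"

print(may_fetch(staging, "Googlebot", page))       # False
print(may_fetch(allow_all, "Googlebot", page))     # True
print(may_fetch(block_gptbot, "GPTBot", page))     # False
print(may_fetch(block_gptbot, "Googlebot", page))  # True
```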

FAQ

Can robots.txt make a page unreachable?

No. It only asks polite crawlers not to crawl. Determined bots ignore it. For privacy, use authentication.

Disallow vs noindex meta?

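Disallow stops the crawler from fetching the page. A noindex meta tag tells crawlers not to index a page they have fetched. To remove a page from search results you need noindex, which means the page must stay crawlable: a crawler blocked by Disallow never sees the tag, and a disallowed URL can still be indexed from links that point to it.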

Does Google honour Crawl-delay?

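No. Googlebot ignores Crawl-delay and sets its own crawl rate based on how the server responds; the crawl-rate limiter in Search Console was retired in January 2024. To slow Googlebot temporarily, Google's documentation recommends returning 500, 503, or 429 responses. Bing honours Crawl-delay.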
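For crawlers that do honour Crawl-delay, the value is read per user-agent group. A minimal sketch using Python's standard urllib.robotparser; the robots.txt content and the 5-second delay are invented for this example.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: bingbot
Crawl-delay: 5
Disallow: /search

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The delay belongs to the bingbot group only; the fallback group sets none.
print(parser.crawl_delay("bingbot"))    # 5
print(parser.crawl_delay("Googlebot"))  # None
```

A polite crawler would wait that many seconds between requests to the host when a delay is returned.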

Where do I put robots.txt?

At the root of the host — https://example.com/robots.txt. Each subdomain needs its own; www.example.com/robots.txt does not apply to api.example.com.
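Because the file is scoped to a single host, the robots.txt that governs any given page can be derived from the page URL alone. A small Python sketch, with robots_url as a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url` (same scheme, host, port)."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with port); replace path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/post?id=1"))  # https://www.example.com/robots.txt
print(robots_url("https://api.example.com/v1/users"))        # https://api.example.com/robots.txt
```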
