Robots.txt Generator — Create Crawl Rules for SEO

Updated May 10, 2026

What's included

Features

Allow and Disallow rule editor: add, remove, and reorder crawl rules for any path on your site
3 practical presets: Public Site (blocks admin/private/tmp), Block Staging (Disallow all), Store Crawl Rules (blocks cart, checkout, account, search)
Auto Sitemap directive: generates `Sitemap: https://yourdomain.com/sitemap.xml` from your site URL automatically
Crawl-delay support: add a delay in seconds for servers that need rate limiting (respected by Bing; ignored by Googlebot and Yandex)
Host directive: declare a preferred domain for crawlers that still read it. Host is a legacy Yandex directive; Yandex now relies on 301 redirects instead, and Google ignores it
Live output preview: see the exact robots.txt file content update in real time as you change rules
Copy or download: copy to clipboard or download as robots.txt, ready to upload to your domain root
Works with [Sitemap.xml Generator](/sitemap-generator): generate both files together for a complete crawl configuration
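As a rough sketch of what a generator like this assembles, the Python function below builds the same kinds of lines from a site URL and two path lists. The function name `build_robots_txt`, its parameters, and the line order are illustrative assumptions, not the tool's actual source. It writes a single `User-agent: *` group and derives the Sitemap URL as `<origin>/sitemap.xml`, as described above.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def build_robots_txt(
    site_url: str,
    allow: list[str],
    disallow: list[str],
    include_sitemap: bool = True,
    crawl_delay: int | None = None,
    host: str | None = None,
) -> str:
    """Hypothetical helper: assemble a robots.txt body for all crawlers."""
    parts = urlsplit(site_url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {site_url!r}")
    origin = f"{parts.scheme}://{parts.netloc}"

    lines = ["User-agent: *"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")  # Googlebot ignores this line
    if host:
        lines.append(f"Host: {host}")  # legacy, Yandex-specific directive
    if include_sitemap:
        # Sitemap is not tied to any User-agent group and must be an
        # absolute URL; the blank line just keeps it visually separate.
        lines += ["", f"Sitemap: {origin}/sitemap.xml"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Roughly the "Public Site" preset: block admin, private, and tmp paths.
    print(build_robots_txt("https://example.com", [], ["/admin/", "/private/", "/tmp/"]))
```

Keeping the sitemap line outside the group mirrors how crawlers read it: Sitemap applies to every crawler, whichever group it sits near.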

About this tool

Stop Google Wasting Crawl Budget on Pages That Shouldn't Be Indexed

Runs in your browser
No install or signup
Free forever

Googlebot does not crawl every URL on your site on every visit. Each site gets a limited crawl budget, roughly the number of URLs Google is able and willing to crawl in a given period. If that budget is spent on admin panels, checkout flows, duplicate filter URLs, internal search results, or staging paths, your important product pages, blog posts, and landing pages may be crawled and indexed more slowly than they should be. The effect is largest on big sites with many URLs. A correctly configured robots.txt file helps by directing crawlers away from pages with no SEO value and toward the content that matters.

This Robots.txt Generator builds a valid robots.txt file in seconds. Enter your site URL, choose from three practical presets — Public Site, Block Staging, or Store Crawl Rules for ecommerce — and customise the Allow and Disallow paths to match your specific setup. The Sitemap directive is added automatically using your domain, pointing Googlebot straight to your XML sitemap from the moment it reads the file. Optional Crawl-delay and Host directives are available for servers that need rate limiting or canonical domain enforcement.

Robots.txt is crawler guidance, not access control. The file is publicly readable — anyone can open https://yourdomain.com/robots.txt and see every path you've blocked. Disallowed paths can still appear in search results if they're linked from other sites. For pages that must be removed from Google's index, the correct approach is to add a noindex meta tag or HTTP header on the page itself and ensure the page remains crawlable so Google can read the noindex signal. Never Disallow a URL you also need noindexed — it's one of the most common technical SEO mistakes.

The difference between robots.txt and a sitemap. A sitemap tells search engines which URLs exist and should be crawled. Robots.txt tells them which URLs to avoid. They work together: your sitemap lists every important page, and robots.txt keeps crawlers away from the ones that shouldn't be in that list. Use the Sitemap.xml Generator to build the companion sitemap file, then reference it with the Sitemap directive in this tool.
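Because the two files should agree, one simple consistency check is to confirm that no URL listed in the sitemap is blocked by robots.txt. The sketch below uses Python's standard `urllib.robotparser` and `xml.etree.ElementTree`; the helper name `blocked_sitemap_urls` and the sample data are hypothetical. Note that `urllib.robotparser` does plain prefix matching in file order and does not understand `*` or `$` wildcards, so treat it as an approximation of how Googlebot evaluates rules.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def blocked_sitemap_urls(robots_txt: str, sitemap_xml: str, agent: str = "Googlebot") -> list[str]:
    """Return sitemap <loc> URLs that robots.txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return [url for url in locs if not parser.can_fetch(agent, url)]


ROBOTS = "User-agent: *\nDisallow: /cart/\n"
SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/products/shoe</loc></url>"
    "<url><loc>https://example.com/cart/</loc></url>"
    "</urlset>"
)

print(blocked_sitemap_urls(ROBOTS, SITEMAP))  # ['https://example.com/cart/']
```

Any URL this reports is either in the sitemap by mistake or blocked by mistake; one of the two files needs fixing.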

For ecommerce sites, the biggest crawl budget drains are typically faceted filter and sort URLs (like /products?sort=price&color=red), session-based paths (/cart/, /checkout/), and duplicate category views. For developer tools, documentation, and SaaS apps, the usual suspects are /admin/, /dashboard/, /api/, staging subpaths, and any URL that requires authentication to be useful. The presets cover these common patterns; customise the Disallow list to add paths specific to your stack.

Step by step

How to Use

  1. Enter your site URL. Type your domain (e.g. https://example.com) into the site URL field. The tool uses it to auto-build the Sitemap directive as https://example.com/sitemap.xml.
  2. Choose a preset. Pick from Public Site (sensible defaults for most websites), Block Staging (Disallow: / to block all crawlers), or Store Crawl Rules (blocks cart, checkout, account, and search paths for ecommerce). The rules populate automatically; edit them to match your site.
  3. Customise Allow and Disallow paths. Add any paths you need to explicitly allow or block. Paths must start with /; Google and Bing also accept * as a wildcard and $ to mark the end of a URL. Common additions: /admin/, /api/, /staging/, /wp-admin/, /dashboard/, and duplicate or filter URL patterns like /search or /*?sort=. Remove any preset rules that don't apply to your stack.
  4. Add optional directives. Toggle the Sitemap checkbox to include or exclude the sitemap line. Add a Crawl-delay (in seconds) if your server struggles under bot load; Bing respects it, while Googlebot ignores it. Add a Host value only if a crawler you care about still reads it; Yandex, the main user of Host, now relies on 301 redirects instead.
  5. Review the output. Check the generated file in the preview panel. Confirm the paths are correct, the sitemap URL points to the right location, and there are no typos in the directives.
  6. Copy or download and deploy. Click Copy to clipboard or Download to get robots.txt. Upload it to the root of your domain so it is served at https://yourdomain.com/robots.txt, and open that URL in a browser to confirm the file loads. Then check the robots.txt report in Google Search Console (Settings > robots.txt) to confirm Google has fetched and parsed it.
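The rule that every path starts with / is easy to check before pasting a list into the editor. The sketch below is a hypothetical pre-flight check, not part of the tool: it flags entries that do not begin with / or that contain whitespace, which is why a query-string pattern has to be written as /*?sort= rather than ?sort=.

```python
from __future__ import annotations


def invalid_paths(paths: list[str]) -> list[str]:
    """Return entries that would not be valid Allow/Disallow values.

    Hypothetical check: a value must start with "/" and contain no
    whitespace. It does not reject "*" or "$", which Google and Bing
    read as a wildcard and an end-of-URL anchor.
    """
    bad = []
    for raw in paths:
        path = raw.strip()
        if not path.startswith("/") or any(ch.isspace() for ch in path):
            bad.append(raw)
    return bad


candidates = ["/admin/", "/*?sort=", "?sort=", "wp-admin/", "/my files/"]
print(invalid_paths(candidates))  # ['?sort=', 'wp-admin/', '/my files/']
```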

Real-world uses

Common Use Cases

🤖
Block Googlebot from admin and login pages
Add Disallow: /admin/, Disallow: /login/, and Disallow: /wp-admin/ to stop crawlers from spending requests on pages that require authentication and have no SEO value. On large sites these paths can take a noticeable share of crawl activity.
🛒
Protect ecommerce crawl budget from filter and session URLs
Block /cart/, /checkout/, /account/, and filtered URLs like /search or /*?sort= to prevent duplicate content and crawl budget drain. Leave /products/, /categories/, and /collections/ open so Google can crawl and index your inventory.
🚧
Stop Google indexing a staging or development site
Generate a robots.txt with Disallow: / for your staging environment to discourage crawlers. Pair with HTTP basic auth for actual access control — robots.txt alone does not prevent determined access.
🗺️
Point all crawlers to your XML sitemap
Add the Sitemap directive to every robots.txt file, regardless of other rules. It points all compliant crawlers to your canonical URL list the next time they fetch robots.txt, without waiting for them to find the sitemap through other discovery paths.
📄
Prepare robots.txt for a new site before launch
Create a baseline robots.txt before a site goes live so the first crawl is guided correctly. Add the sitemap URL, block any internal tools or test paths, and confirm the file is in place before submitting the site to Search Console.
🔍
Understand when to use noindex instead of Disallow
If a page is already in Google's index and you need it removed, noindex is the correct tool, not Disallow. Use this generator to write the crawl rules, then add noindex to the pages that should drop out of search results and make sure those pages are not Disallowed, so Google can crawl them and see the tag.
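For the staging use case, one common pattern is to serve a different robots.txt per environment so a blanket Disallow never leaks into production. The sketch below is hypothetical: the `APP_ENV` variable name and the `robots_for_environment` helper are assumptions about your deployment, and the `urllib.robotparser` check simply confirms that `Disallow: /` blocks every path for compliant crawlers. It is not access control; keep HTTP authentication on staging as well.

```python
import os
from urllib.robotparser import RobotFileParser

PRODUCTION_ROBOTS = "User-agent: *\nDisallow: /admin/\n\nSitemap: https://example.com/sitemap.xml\n"
BLOCK_ALL_ROBOTS = "User-agent: *\nDisallow: /\n"


def robots_for_environment(env: str) -> str:
    """Return production rules only for production; block everything elsewhere."""
    return PRODUCTION_ROBOTS if env == "production" else BLOCK_ALL_ROBOTS


# What the server would return at /robots.txt. APP_ENV is a hypothetical
# variable name; a missing value falls back to the safe, block-all choice.
body = robots_for_environment(os.environ.get("APP_ENV", "staging"))

parser = RobotFileParser()
parser.parse(robots_for_environment("staging").splitlines())
print(parser.can_fetch("Googlebot", "https://staging.example.com/products/"))  # False
```

Defaulting to the block-all file means a misconfigured environment errs toward hiding content rather than exposing staging to crawlers.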

Got questions?

Frequently Asked Questions

A robots.txt file is a plain text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells web crawlers which paths they may or may not visit. Compliant crawlers such as Googlebot check it before crawling. You need one to protect crawl budget: if Google spends crawl quota on admin panels, checkout flows, duplicate pages, or internal search results, it may not reach your important content as frequently. A well-configured robots.txt directs crawlers to the pages that matter and away from the ones that don't.

Disallow in robots.txt blocks a crawler from visiting a URL — the page is never crawled. noindex is an HTML meta tag or HTTP header that tells a crawler "visit the page but don't include it in search results." The critical mistake to avoid: if you Disallow a URL, Googlebot cannot see the noindex tag on it, so the page may still appear in search results if it is linked from elsewhere. To fully remove a page from Google, add noindex to the page and keep it crawlable (remove the Disallow rule). Only use Disallow to save crawl budget on content you don't care about indexing.
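The header form of noindex is straightforward to add on the server. The sketch below is a minimal WSGI app using only the standard library; the `NOINDEX_PREFIXES` paths and the page body are hypothetical. It serves those paths normally but adds `X-Robots-Tag: noindex`, which only works if robots.txt does not Disallow the same paths.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Hypothetical paths that must stay crawlable but out of search results.
NOINDEX_PREFIXES = ("/thank-you/", "/old-campaign/")


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Header equivalent of <meta name="robots" content="noindex">.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example page</title>"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000; request a /thank-you/ URL and inspect the response headers.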

Upload robots.txt to the root directory of your domain — it must be accessible at https://yourdomain.com/robots.txt exactly. For WordPress, place it in the public_html folder. For Next.js static exports, put it in the public/ folder. For Apache/Nginx, place it in the web root (usually /var/www/html/). Subdomain sites need their own robots.txt at the subdomain root. You can verify it is accessible by opening the URL directly in your browser.
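Crawlers look for robots.txt at the root of each protocol, host, and port combination, which is why a subdomain needs its own file. This small helper (a hypothetical name, standard library only) shows which robots.txt URL governs a given page.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/blog/post?ref=x"))     # https://example.com/robots.txt
print(robots_url_for("https://shop.example.com/products/shoe"))  # https://shop.example.com/robots.txt
print(robots_url_for("http://example.com:8080/page"))            # http://example.com:8080/robots.txt
```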

Ecommerce sites typically block: /cart/ (session-specific, no SEO value), /checkout/ (private), /account/ and /login/ (private user paths), /search (or ?q= query strings that generate duplicate pages), /wishlist/, /compare/, and any /admin/ paths. Keep /products/, /categories/, /collections/, /blog/, and your canonical product pages open. Blocking paginated or filtered URLs (/products?sort=price or /category/page/2/) is common but requires care — only block filters that create true duplicates, not URLs with unique content.
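Filter patterns like /products?sort=price usually need a wildcard, and Python's `urllib.robotparser` supports neither `*` nor `$`. The sketch below is a hypothetical path tester that follows the precedence Google documents (and RFC 9309 specifies): among matching rules, the longest pattern wins, and Allow wins a tie. It checks one group's rules against a path plus query string and is a simplification, not Google's parser; for example, it skips percent-encoding normalisation.

```python
from __future__ import annotations

import re


def _to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a start-anchored regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules holds ("allow" | "disallow", pattern) pairs from one group."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow value matches nothing
        if _to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


STORE_RULES = [
    ("disallow", "/cart/"),
    ("disallow", "/checkout/"),
    ("disallow", "/*?sort="),
    ("allow", "/products/"),
]

for path in ["/products/shoe", "/products?sort=price", "/cart/", "/category/page/2/"]:
    print(path, is_allowed(path, STORE_RULES))
```

Because precedence depends on pattern length rather than file order, reordering rules in the editor does not change the outcome for Googlebot, although it can for simpler parsers.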

Yes, always. The Sitemap directive tells search engines exactly where to find your XML sitemap without waiting for them to discover it through other means. Add a line like: Sitemap: https://example.com/sitemap.xml — this applies to all crawlers, not just the User-agent block it appears near. If you have multiple sitemaps (images, videos, news), add a separate Sitemap line for each. This is especially useful for new sites that haven't accumulated many inbound links yet.
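Multiple Sitemap lines are collected as a list no matter where they sit in the file. As a quick illustration with Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later; the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())
print(parser.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-images.xml']
```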

Crawl-delay specifies the minimum number of seconds a crawler should wait between requests to your server. For example, Crawl-delay: 2 asks bots to pause 2 seconds between page requests. This is useful for shared hosting or low-capacity servers that get overwhelmed by aggressive bots. Note: Googlebot ignores Crawl-delay. Google has retired the crawl rate setting in Search Console and instead recommends temporarily returning 500, 503, or 429 responses if you urgently need Googlebot to slow down. Among major search engines, Crawl-delay is mainly respected by Bing; Yandex ignores it in favour of the crawl speed setting in Yandex Webmaster.
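To see what a delay means in practice: a crawler that honours Crawl-delay: 2 and fetches one page at a time can make at most 86,400 / 2 = 43,200 requests per day. The sketch below reads the value with Python's standard `urllib.robotparser` and does that arithmetic; the sample file is hypothetical. The parser only reports the value; whether a given bot obeys it is up to that bot, and Googlebot does not.

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Crawl-delay: 2
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

delay = parser.crawl_delay("bingbot")  # 2, taken from the * group
SECONDS_PER_DAY = 24 * 60 * 60
max_requests = SECONDS_PER_DAY // delay
print(f"Crawl-delay {delay}s -> at most {max_requests} requests/day")
# Crawl-delay 2s -> at most 43200 requests/day
```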

You can use Disallow: / in the staging site's robots.txt to discourage compliant crawlers, but this is not access control — the file itself is public, and the content can still be accessed directly. For true privacy on a staging environment, use HTTP authentication (basic auth), IP allowlists, or environment-level access restrictions. The robots.txt approach works well as a second layer to reduce accidental indexing of staging content that gets discovered through links, but should never be your only protection.

No. Major search engines like Google, Bing, Yandex, and Apple's crawler respect robots.txt, but rogue bots and scrapers ignore it entirely. The robots.txt protocol is voluntary; there is no technical enforcement. For Googlebot specifically, the Crawl-delay directive is ignored. The User-agent: * rule applies to every compliant crawler that has no more specific group of its own. You can also write crawler-specific rules using the exact User-agent name, like User-agent: Googlebot to target only Google, followed by the Disallow or Allow rules for that crawler.
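One detail worth knowing when you add crawler-specific groups: a crawler follows only the most specific group that matches its name and ignores `User-agent: *` entirely, so shared rules must be repeated in each group. Python's standard `urllib.robotparser` also falls back to `*` only when no named group matches, which makes it a convenient way to check this (the sample rules are hypothetical):

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /admin/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# Googlebot matches its own group, so the /admin/ rule under * does not apply.
print(parser.can_fetch("Googlebot", "https://example.com/admin/"))   # True
print(parser.can_fetch("Googlebot", "https://example.com/drafts/"))  # False
# Other crawlers fall back to the * group.
print(parser.can_fetch("bingbot", "https://example.com/admin/"))     # False
```

To keep Googlebot out of /admin/ as well, add Disallow: /admin/ under the Googlebot group too.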