Robots.txt Generator — Create Crawl Rules for SEO
Features
robots.txt ready to upload to your domain root
About this tool
Stop Google Wasting Crawl Budget on Pages That Shouldn't Be Indexed
Googlebot gives every site a limited crawl budget: roughly, the number of URLs it can and wants to crawl in a given period. If that budget is consumed by admin panels, checkout flows, duplicate filter URLs, internal search results, or staging paths, your important product pages, blog posts, and landing pages may be crawled and indexed more slowly than they should be. Crawl budget matters most on large or frequently updated sites; small sites rarely exhaust it. A correctly configured robots.txt file helps by steering crawlers away from pages with no SEO value, leaving more of the budget for the content that matters.
This Robots.txt Generator builds a valid robots.txt file in seconds. Enter your site URL, choose one of three presets (Public Site, Block Staging, or Store Crawl Rules for ecommerce), and customise the Allow and Disallow paths to match your setup. The Sitemap directive is added automatically from your domain, so crawlers can find your XML sitemap as soon as they read the file. Optional Crawl-delay and Host directives are also available: Crawl-delay asks supporting crawlers to space out their requests, and Host is a nonstandard, Yandex-specific way to declare a canonical domain that other search engines ignore.
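As a rough illustration of what a generator like this assembles, here is a minimal Python sketch. The `build_robots_txt` function and its parameters are hypothetical, not the tool's actual code; it assumes a single `User-agent: *` group, a site URL that includes the scheme, and a sitemap at `/sitemap.xml` on the site's origin.

```python
from urllib.parse import urlsplit


def build_robots_txt(site_url, disallow, allow=(), crawl_delay=None,
                     include_sitemap=True):
    """Assemble a single-group robots.txt (hypothetical helper, not the tool's code)."""
    parts = urlsplit(site_url)
    origin = f"{parts.scheme}://{parts.netloc}"

    lines = ["User-agent: *"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        # Ignored by Googlebot; honored by some other crawlers such as Bingbot.
        lines.append(f"Crawl-delay: {crawl_delay}")
    if include_sitemap:
        # Sitemap takes an absolute URL and applies to every crawler.
        lines.append("")
        lines.append(f"Sitemap: {origin}/sitemap.xml")
    return "\n".join(lines) + "\n"


print(build_robots_txt("https://example.com/", ["/admin/", "/cart/"]))
```

Keeping the Sitemap line after a blank line makes it visually clear that it sits outside any User-agent group.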
Robots.txt is crawler guidance, not access control. The file is publicly readable — anyone can open https://yourdomain.com/robots.txt and see every path you've blocked. Disallowed paths can still appear in search results if they're linked from other sites. For pages that must be removed from Google's index, the correct approach is to add a noindex meta tag or HTTP header on the page itself and ensure the page remains crawlable so Google can read the noindex signal. Never Disallow a URL you also need noindexed — it's one of the most common technical SEO mistakes.
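The HTTP-header form of noindex is the `X-Robots-Tag: noindex` response header, which also works for non-HTML files such as PDFs. Below is a minimal sketch of WSGI middleware that adds it to responses under a set of path prefixes while leaving those pages crawlable. The `NoindexMiddleware` name and the example prefixes are hypothetical; most frameworks and web servers have their own way to set response headers.

```python
class NoindexMiddleware:
    """Add X-Robots-Tag: noindex to responses whose path starts with a given prefix."""

    def __init__(self, app, prefixes):
        self.app = app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "/")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(self.prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def demo_app(environ, start_response):
    # Stand-in for your real WSGI application.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<h1>Thank you</h1>"]


# Hypothetical prefixes: pages crawlers may fetch but that should stay out of the index.
app = NoindexMiddleware(demo_app, ["/thank-you/", "/internal-search/"])
```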
The difference between robots.txt and a sitemap. A sitemap tells search engines which URLs exist and should be crawled. Robots.txt tells them which URLs to avoid. They work together: your sitemap lists every important page, and robots.txt keeps crawlers away from the ones that shouldn't be in that list. Use the Sitemap.xml Generator to build the companion sitemap file, then reference it with the Sitemap directive in this tool.
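Because the two files should agree, a useful check is to look for URLs that the sitemap lists but robots.txt blocks. A minimal sketch using only Python's standard library; it assumes a single `<urlset>` sitemap (not a sitemap index), and `urllib.robotparser` applies simple prefix matching without Google's `*` and `$` wildcards. The sample URLs are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return <loc> URLs from a urlset sitemap that robots.txt disallows."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in locs if not rp.can_fetch(user_agent, url)]


SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/first-post</loc></url>
  <url><loc>https://example.com/cart/</loc></url>
</urlset>"""

SAMPLE_ROBOTS = "User-agent: *\nDisallow: /cart/\n"

print(blocked_sitemap_urls(SAMPLE_SITEMAP, SAMPLE_ROBOTS))
```

Any URL this reports is sending mixed signals: either drop it from the sitemap or remove the rule that blocks it.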
For ecommerce sites, the biggest crawl budget drains are typically faceted filter and sort URLs (like /products?sort=price&color=red), session-specific paths (/cart/, /checkout/), and duplicate category views. For developer tools, documentation, and SaaS apps, the usual suspects are /admin/, /dashboard/, /api/, staging subpaths, and any URL that is only useful after authentication. The presets cover these common patterns; customise the Disallow list to add paths specific to your stack.
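Allow and Disallow values match by prefix, so a missing trailing slash can block more than intended. A quick check with Python's standard-library `urllib.robotparser`, which implements plain prefix matching (no wildcards), using illustrative paths:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /api
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

# "/api" (no trailing slash) also matches "/api-docs/...", while "/admin/" does not
# match "/administration-guide".
for path in ["/api/v1/users", "/api-docs/intro", "/admin/settings", "/administration-guide"]:
    allowed = rp.can_fetch("*", "https://example.com" + path)
    print(f"{path:25} {'allowed' if allowed else 'blocked'}")
```

If your public API documentation lives at /api-docs/, writing the rule as /api/ keeps it crawlable.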
How to Use
1. Enter your site URL. Type your domain (e.g. https://example.com) into the site URL field. The tool uses it to build the Sitemap directive automatically as https://example.com/sitemap.xml.
2. Choose a preset. Pick Public Site (sensible defaults for most websites), Block Staging (Disallow: / to block all compliant crawlers), or Store Crawl Rules (blocks cart, checkout, account, and search paths for ecommerce). The rules populate automatically; edit them to match your site.
3. Customise Allow and Disallow paths. Add any paths you need to explicitly allow or block. Paths must start with /; to match a query parameter anywhere in a URL, use the * wildcard (supported by Google and Bing). Common additions: /admin/, /api/, /staging/, /wp-admin/, /dashboard/, and duplicate or filter URL patterns such as /search or /*?sort=. Remove any preset rules that don't apply to your stack.
4. Add optional directives. Toggle the Sitemap checkbox to include or exclude the sitemap line. Add a Crawl-delay (in seconds) if your server struggles under bot load; Bing honors it, but Googlebot ignores it. Add a Host value only if you need to declare a canonical domain for Yandex; other search engines ignore Host.
5. Review the output. Check the generated file in the preview panel. Confirm the paths are correct, the sitemap URL points to the right location, and there are no typos in the directive names.
6. Copy or download and deploy. Click Copy to clipboard or Download to get robots.txt. Upload it to the root of your domain; it must be accessible at https://yourdomain.com/robots.txt. Open that URL in a browser to confirm it returns the file, then check the robots.txt report in Google Search Console (Settings > robots.txt) to confirm Google has fetched the current version.
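The review step can be partly automated. Below is a minimal linter sketch in Python; the `lint_robots_txt` name and the specific checks are illustrative assumptions, not the generator's own validation, and it does not understand every extension that crawlers support.

```python
from urllib.parse import urlsplit

# Directives this sketch recognizes; anything else is flagged as a likely typo.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay", "host"}


def lint_robots_txt(text):
    """Return warnings for common robots.txt mistakes (illustrative checks only)."""
    warnings = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has an Allow/Disallow line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' separator")
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field not in KNOWN_DIRECTIVES:
            warnings.append(f"line {number}: unknown directive {field!r}")
        elif field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if not agents:
                warnings.append(f"line {number}: {field} appears before any User-agent")
            if value and not value.startswith(("/", "*")):
                warnings.append(f"line {number}: path {value!r} should start with '/'")
            # Intended on staging, disastrous if it ships to production.
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append(f"line {number}: 'Disallow: /' blocks all compliant crawlers")
        elif field == "sitemap":
            parts = urlsplit(value)
            if not (parts.scheme and parts.netloc):
                warnings.append(f"line {number}: Sitemap must be an absolute URL")
    return warnings


SAMPLE = """\
User-agent: *
Dissallow: /admin/
Disallow: ?sort=
Sitemap: /sitemap.xml
"""

for warning in lint_robots_txt(SAMPLE):
    print(warning)
```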
Frequently Asked Questions
What is a robots.txt file, and why do I need one?
A robots.txt file is a plain text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells web crawlers which paths they may or may not visit. Search engine crawlers such as Googlebot check it before crawling. You need one to protect crawl budget: if Google spends its crawl budget on admin panels, checkout flows, duplicate pages, or internal search results, it may reach your important content less frequently. A well-configured robots.txt keeps crawlers away from low-value pages so more of the budget goes to the pages that matter.
What is the difference between Disallow and noindex?
Disallow in robots.txt tells a crawler not to fetch a URL, so the page is never crawled. noindex is an HTML meta tag or HTTP header that tells a crawler "you may fetch this page, but don't include it in search results." The critical mistake to avoid: if you Disallow a URL, Googlebot cannot see the noindex signal on it, so the URL may still appear in search results if it is linked from elsewhere. To remove a page from Google's results, add noindex to the page and keep it crawlable (remove the Disallow rule). Use Disallow only to save crawl budget on content you don't need indexed.
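Because the noindex signal lives on the page itself, an audit has to look in two places: the response headers and the HTML. A minimal sketch using Python's standard `html.parser`; it only recognizes the `robots` and `googlebot` meta names and ignores shorthand values such as `none`, which are simplifying assumptions. The `has_noindex` name is hypothetical.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers):
    """True if the page carries noindex in an X-Robots-Tag header or a robots meta tag."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>', {}))
```

A check like this only tells you the signal is present; Googlebot will still miss it on any URL that robots.txt disallows.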
Where should I upload robots.txt?
Upload robots.txt to the root directory of your domain; it must be accessible at exactly https://yourdomain.com/robots.txt. For WordPress, place it in the site's root folder (often public_html on shared hosting); a physical file there replaces the virtual robots.txt that WordPress generates by default. For Next.js static exports, put it in the public/ folder. For Apache or Nginx, place it in the web root (often /var/www/html/). Each subdomain needs its own robots.txt at its own root. You can verify the file is accessible by opening the URL directly in your browser.
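The location rule is mechanical: a robots.txt file governs only the scheme, host, and port it is served from. A minimal Python sketch that maps any page URL to the robots.txt covering it; the `robots_url_for` name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the robots.txt URL that governs page_url.

    Each scheme + host + port combination has its own robots.txt, so
    subdomains and non-default ports are covered by separate files.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in ["https://example.com/blog/post?id=1",
            "https://shop.example.com/products/42",
            "http://example.com:8080/docs/"]:
    print(robots_url_for(url))
```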
Which paths should an ecommerce site block?
Ecommerce sites typically block /cart/ (session-specific, no SEO value), /checkout/ (private), /account/ and /login/ (private user paths), /search (or /*?q= patterns for query strings that generate duplicate pages), /wishlist/, /compare/, and any /admin/ paths. Keep /products/, /categories/, /collections/, /blog/, and your canonical product pages open. Blocking filtered or sorted URLs (such as /products?sort=price) is common but requires care: block only parameters that create true duplicates, not URLs with unique content. Be especially cautious with pagination (/category/page/2/), since paginated listings are often how crawlers reach deeper products.
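Python's `urllib.robotparser` does not implement the `*` and `$` wildcards or Google's precedence rules, so here is a minimal sketch of that matching logic following RFC 9309: the longest matching pattern wins, and on a tie Allow wins. It skips percent-encoding normalization, and the function names and rules are illustrative.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal, and matching starts at the path's beginning.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Apply (kind, pattern) rules: longest matching pattern wins, Allow wins ties."""
    best_length, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:  # an empty Disallow value matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = kind.lower() == "allow"
            if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
                best_length, allowed = len(pattern), is_allow
    return allowed


RULES = [
    ("Disallow", "/*?sort="),   # sorted duplicates of listing pages
    ("Disallow", "/*.pdf$"),    # PDF files, but not /file.pdf?download=1
    ("Allow", "/products/"),
]

for path in ["/products?sort=price", "/products/shoes?sort=price", "/files/guide.pdf"]:
    print(path, is_allowed(path, RULES))
```

Note the second path: because `Allow: /products/` (10 characters) is longer than `Disallow: /*?sort=` (8 characters), sorted URLs under /products/ stay crawlable. Precedence by length, not by order in the file, is what makes broad Allow rules override narrower-looking Disallow rules.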
Should I include my sitemap in robots.txt?
Yes. The Sitemap directive tells search engines exactly where to find your XML sitemap without waiting for them to discover it through other means. Add a line such as Sitemap: https://example.com/sitemap.xml, using a full absolute URL. The directive applies to all crawlers, not just the User-agent group it appears near. If you have multiple sitemaps (images, videos, news), add a separate Sitemap line for each. This is especially useful for new sites that haven't accumulated many inbound links yet.
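Python's `urllib.robotparser` (3.8 and later) reflects this: `site_maps()` returns every Sitemap line in the file, independent of the User-agent groups around it. A small sketch with illustrative URLs:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())
# Sitemap lines are collected file-wide, not per User-agent group.
print(rp.site_maps())
```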
What does Crawl-delay do?
Crawl-delay specifies the minimum number of seconds a crawler should wait between requests to your server. For example, Crawl-delay: 2 asks bots to pause 2 seconds between page requests. This is useful for shared hosting or low-capacity servers that get overwhelmed by aggressive bots. Note that Googlebot ignores Crawl-delay, and Google retired the Search Console crawl rate limiter in January 2024; to slow Googlebot down temporarily, Google recommends returning 500, 503, or 429 status codes. Crawl-delay is mainly respected by Bing and some other crawlers.
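The arithmetic is worth doing before picking a value: a delay of d seconds caps a compliant crawler at 86,400 / d requests per day. A minimal sketch using `urllib.robotparser`, which accepts only whole-number delays; the `max_requests_per_day` name and the `bingbot` group are illustrative.

```python
from urllib.robotparser import RobotFileParser

SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(robots_txt, user_agent):
    """Upper bound on daily requests if user_agent honors Crawl-delay.

    Returns None when no Crawl-delay applies. The parser does not know
    which crawlers ignore the directive (Googlebot does).
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    delay = rp.crawl_delay(user_agent)
    if not delay:
        return None
    return SECONDS_PER_DAY // delay


ROBOTS = "User-agent: bingbot\nCrawl-delay: 2\n\nUser-agent: *\nDisallow: /admin/\n"
print(max_requests_per_day(ROBOTS, "bingbot"))  # 43200
```

At 43,200 requests per day, a 2-second delay rarely limits a small site, but on a catalog with hundreds of thousands of URLs it can noticeably slow how quickly Bing picks up changes.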
Can robots.txt keep my staging site private?
You can use Disallow: / in the staging site's robots.txt to discourage compliant crawlers, but this is not access control: the file itself is public, and the content can still be accessed directly. For true privacy on a staging environment, use HTTP authentication (basic auth), IP allowlists, or environment-level access restrictions. The robots.txt approach works well as a second layer to reduce accidental crawling of staging content that gets discovered through links, but it should never be your only protection.
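For the access-control layer itself, here is a minimal sketch of HTTP Basic authentication as WSGI middleware, using only the standard library. The `BasicAuthMiddleware` class and the `preview` / `change-me` credentials are hypothetical placeholders. In practice, load credentials from the environment and serve staging over HTTPS, since Basic credentials are only base64-encoded; many teams let the web server or hosting platform enforce this instead.

```python
import base64
import hmac


class BasicAuthMiddleware:
    """Require HTTP Basic credentials before serving any page (sketch)."""

    def __init__(self, app, username, password, realm="staging"):
        self.app = app
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        self.expected = f"Basic {token}"
        self.realm = realm

    def __call__(self, environ, start_response):
        supplied = environ.get("HTTP_AUTHORIZATION", "")
        # Constant-time comparison avoids leaking how much of the value matched.
        if hmac.compare_digest(supplied.encode(), self.expected.encode()):
            return self.app(environ, start_response)
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", f'Basic realm="{self.realm}"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"Authentication required"]


def staging_app(environ, start_response):
    # Stand-in for the real staging application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"staging build"]


# Hypothetical credentials; never hard-code real ones.
app = BasicAuthMiddleware(staging_app, "preview", "change-me")
```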
Do all crawlers obey robots.txt?
No. Major search engine crawlers such as Googlebot, Bingbot, YandexBot, and Applebot respect robots.txt, but rogue bots and scrapers ignore it entirely. The robots.txt protocol is voluntary; there is no technical enforcement. Googlebot also ignores the Crawl-delay directive. The User-agent: * group applies to every compliant crawler that has no group of its own. You can write crawler-specific rules using the exact User-agent name, such as User-agent: Googlebot to target only Google, followed by the Disallow or Allow rules for that crawler. A crawler that finds a group naming it follows only that group and ignores the User-agent: * rules, so repeat any shared rules inside each named group.
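The interaction between a named group and the `*` group is easy to get wrong. A quick check with Python's `urllib.robotparser`, using illustrative rules and URLs:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

url = "https://example.com/private/report"
print(rp.can_fetch("Googlebot", url))  # Googlebot reads only its own group, which allows this URL
print(rp.can_fetch("Bingbot", url))    # Bingbot has no group, so it falls back to *
```

The fix is to copy `Disallow: /private/` into the Googlebot group as well.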