Technical SEO

Crawl Budget

Last reviewed May 2026

Crawl budget is the number of URLs a search engine crawler can and wants to fetch from a site within a given timeframe. Google's formal definition decomposes it into two factors: crawl capacity limit (called crawl rate limit in older documentation; how aggressively the bot can fetch without overloading the server) and crawl demand (how frequently Google wants to re-fetch URLs based on freshness signals, link popularity, and historical update patterns).

For sites under ~10,000 URLs, crawl budget is rarely a practical constraint — Google will discover and re-fetch nearly everything within days. For sites in the hundreds of thousands or millions of URLs (large e-commerce, programmatic SEO, faceted navigation), crawl budget becomes a primary technical SEO concern: Googlebot only gets to a fraction of URLs per crawl cycle, and inefficient site architecture wastes that budget on low-value pages.

Common crawl budget wasters include infinite parameter combinations (filtered category pages with `?color=red&size=L&sort=price...`), session ID URLs, soft 404s (pages returning 200 with "not found" content), redirect chains (each hop consumes budget), thin tag/category pages, and stale URLs that are no longer linked from anywhere but remain in the index. A site can have a perfectly good architecture and still bleed crawl budget through any of these.

Diagnosis happens at the server-log level: parsing access logs to see which URLs Googlebot actually fetched, how often, and how that compares to which URLs you want crawled. Tools include Screaming Frog Log File Analyser, Botify, OnCrawl, and Splunk-based custom pipelines. Google Search Console's Crawl Stats report shows aggregate numbers (crawls per day, response time, total bytes) without per-URL detail — useful for trend monitoring, not root-cause analysis.
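The log-level comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the tools listed: it assumes the server writes the common "combined" access-log format, and the sample lines, the `make_line` helper, and the `wanted` set are all made up for the example. It also matches Googlebot by user-agent string only, which can be spoofed; a production pipeline would verify the requesting IP as well.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per path whose user-agent string mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


def make_line(path, agent):
    """Hypothetical helper that builds one sample log line for the demo."""
    return (
        f'66.249.66.1 - - [10/May/2026:06:25:01 +0000] '
        f'"GET {path} HTTP/1.1" 200 5123 "-" "{agent}"'
    )


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"

sample_log = [
    make_line("/category/?color=red", GOOGLEBOT_UA),
    make_line("/products/widget", GOOGLEBOT_UA),
    make_line("/category/?color=red", GOOGLEBOT_UA),
    make_line("/products/widget", BROWSER_UA),  # human visit, ignored
]

# URLs you actually want crawled (illustrative).
wanted = {"/products/widget", "/products/gadget"}

hits = googlebot_hits(sample_log)
never_crawled = wanted - set(hits)  # wanted, but Googlebot never fetched them
unwanted_hits = {path: n for path, n in hits.items() if path not in wanted}
```

Here `never_crawled` lists the wanted URLs Googlebot skipped, and `unwanted_hits` shows where fetches went instead, which is the per-URL detail the Crawl Stats report does not provide.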

The same concept now applies to AI crawlers. GPTBot, ClaudeBot, PerplexityBot, and CCBot all have their own crawl budgets, often much smaller than Googlebot's. A large site that is optimized for Google's crawler but bleeds budget through redirects and thin pages may be only partially indexed by AI engines — visible in Google but absent from AI search.

Why it matters in GEO / AI search

For sites under ~10,000 URLs, the right answer is "don't optimize for crawl budget, optimize for content and structure" — Google will handle the rest. Premature crawl budget optimization on small sites is a classic distraction, often masking the real problem (thin content, weak entity signals).

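For sites at scale, the highest-leverage crawl budget moves are: (1) consolidate parameter URLs, using robots.txt to stop crawling of parameter patterns outright or `rel="canonical"` to merge duplicates (canonicalized duplicates are still fetched, though typically less often over time); (2) fix redirect chains so each link points to the final URL; (3) deal with thin pages: noindex removes them from the index, but the crawler must still fetch a page to see the directive, so crawl savings arrive only gradually, and pages with no value at all are better removed (404/410) or blocked in robots.txt; (4) ensure XML sitemaps reflect only URLs you actually want crawled and indexed. As a rough rule of thumb, each of these can free something like 10–40% of budget for high-value URLs, though the actual figure depends heavily on the site.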

In AI search, crawl budget interacts with citation surface in a non-obvious way. If GPTBot or ClaudeBot only manages to fetch a fraction of your URLs per cycle, the ones it doesn't reach are effectively invisible to ChatGPT and Claude. A well-organized site with clean architecture is therefore over-represented in AI citations relative to its raw page count — and a sprawling unoptimized site is under-represented even when it has equivalent content depth.

Examples

Parameter consolidation

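An e-commerce site with `/category/?color=red&size=L` and 200 other filter combinations per category page consolidates them via `<link rel="canonical">` pointing to the unfiltered category URL. In a scenario like this, crawl activity spent on the filtered URLs can drop substantially (on the order of 60–80%) with little SEO downside. Because a canonical is a hint and the filtered URLs are still fetched occasionally, the drop is gradual rather than immediate.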
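One way to generate the canonical target is to strip known filter parameters from the requested URL. The sketch below uses only Python's standard `urllib.parse`; the `FILTER_PARAMS` set and the `example.com` URLs are assumptions for illustration, and a real site would decide per parameter whether it changes the page's content.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters that only filter or sort the same listing.
FILTER_PARAMS = {"color", "size", "sort"}


def canonical_url(url):
    """Drop filter/sort parameters and the fragment; keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FILTER_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Render the <link> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


filtered = "https://example.com/category/?color=red&size=L"
paginated = "https://example.com/category/?color=red&page=2"

tag = canonical_link_tag(filtered)
```

Note that `page` is deliberately not in `FILTER_PARAMS` here, so paginated URLs keep their own canonical; whether that is right depends on the site.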

Redirect chain cleanup

A site migrated from `/old-url` → `/intermediate-url` → `/new-url`. Every link still pointing at `/old-url` consumes two redirect hops per crawl. Rewriting internal links directly to `/new-url` saves both crawl budget and link equity.
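The link rewrite can be automated once the redirect rules are available as a mapping. The following is a minimal sketch under that assumption: `redirects` is a hypothetical dictionary of source path to target path, and `final_target` follows it until it reaches a URL that no longer redirects, guarding against loops.

```python
def final_target(url, redirects, max_hops=10):
    """Follow a redirect map to its end; return (final_url, hop_count)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
    return url, hops


def rewrite_links(links, redirects):
    """Point every internal link straight at its final destination."""
    return [final_target(link, redirects)[0] for link in links]


# Illustrative redirect rules matching the migration described above.
redirects = {
    "/old-url": "/intermediate-url",
    "/intermediate-url": "/new-url",
}

target, hops = final_target("/old-url", redirects)
cleaned = rewrite_links(["/old-url", "/new-url", "/about"], redirects)
```

The same `final_target` result can also be used to collapse the server-side rules so that `/old-url` redirects to `/new-url` in a single hop for external links you cannot rewrite.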

Sitemap as crawl directive

The XML sitemap submitted to Search Console should contain only URLs you want indexed. Including 50,000 thin tag pages or expired event URLs tells Google to spend budget on them, which is exactly the opposite of optimization.
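A sitemap generator can enforce this rule by filtering pages before writing them out. The sketch below assumes a hypothetical `Page` record with a status code, a noindex flag, and a canonical URL (in practice these would come from a crawl or the CMS) and emits only pages that return 200, are indexable, and are their own canonical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical page record; field names are illustrative."""
    url: str
    status: int
    noindex: bool
    canonical: str


def is_sitemap_worthy(page):
    """Keep only live, indexable pages that are their own canonical."""
    return page.status == 200 and not page.noindex and page.canonical == page.url


def sitemap_xml(pages):
    """Serialize the qualifying pages as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if is_sitemap_worthy(page):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


pages = [
    Page("https://example.com/products/widget", 200, False,
         "https://example.com/products/widget"),
    Page("https://example.com/tag/blue", 200, True,
         "https://example.com/tag/blue"),          # thin tag page, noindexed
    Page("https://example.com/events/2019-meetup", 404, False,
         "https://example.com/events/2019-meetup"),  # expired event
    Page("https://example.com/category/?color=red", 200, False,
         "https://example.com/category/"),         # canonicalized filter URL
]

xml = sitemap_xml(pages)
```

With the sample data, only the widget product page ends up in the output.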

AI crawler crawl budget

GPTBot and ClaudeBot fetch far fewer URLs per day than Googlebot. A large site bleeding budget through redirects and parameter URLs may have 90%+ of pages crawled by Google but only 30% by ChatGPT's crawler — leading to systematic under-citation in ChatGPT despite strong Google performance.
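A gap like this can be measured from the same access logs used for Googlebot analysis. The sketch below assumes you have already grouped fetched paths by crawler (for example by matching user-agent substrings) and have a list of URLs you want crawled. The data is invented to mirror the 90% versus 30% scenario above; it is not a measurement.

```python
def coverage_by_bot(fetched_by_bot, wanted_urls):
    """Share of wanted URLs that each crawler fetched at least once."""
    wanted = set(wanted_urls)
    if not wanted:
        return {}
    return {
        bot: len(wanted & set(urls)) / len(wanted)
        for bot, urls in fetched_by_bot.items()
    }


# Illustrative data: ten wanted product pages.
wanted_urls = [f"/products/item-{i}" for i in range(10)]

fetched_by_bot = {
    # Googlebot reached nine wanted pages plus one parameter URL.
    "Googlebot": wanted_urls[:9] + ["/category/?color=red"],
    # GPTBot reached three wanted pages and spent other fetches on redirects.
    "GPTBot": wanted_urls[:3] + ["/old-url", "/intermediate-url"],
}

coverage = coverage_by_bot(fetched_by_bot, wanted_urls)
```

Fetches of URLs outside `wanted_urls` do not raise coverage, so a crawler that spends its requests on redirects and parameter URLs scores low even if its total request count looks healthy.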

Authority Links

Google Search Central — Crawl budget management

Official Google guidance on diagnosing and managing crawl budget at scale.

Moz — Crawl Budget

Practical introduction to the concept and remediation tactics.

Botify — The Definitive Guide to Crawl Budget

In-depth log-file analysis methodology for large sites.

Related Terms

Technical SEO

Crawl Depth

Crawl depth refers to how far into a site's link hierarchy a search engine's bot will go, usually measured in clicks from the homepage.

Technical SEO

Crawling

Crawling refers to the process whereby bots systematically browse through a website.

Technical SEO

Googlebot

Googlebot is the crawler software Google developed to crawl the web and build its searchable index.

Technical SEO

Robots.txt File

A plain-text file at the root of a domain that uses the Robots Exclusion Protocol to tell compliant crawlers which paths they may or may not request.
