The short answer

Google indexes only the first 2MB of raw, uncompressed HTML. Everything beyond that is silently ignored for ranking. Bing flags pages over 125KB with an indexing warning. AI retrieval crawlers – GPTBot, PerplexityBot, ClaudeBot – do not render JavaScript at all, so content loaded via JS is invisible to them regardless of page size.

A bloated page is not just a performance problem. It is a visibility problem. If your content is not in the first 2MB, it does not rank. If your content is rendered by JavaScript, AI engines cannot read it. If your HTML is over 125KB, Bing may not cache it fully. The crawlers that control your traffic have hard limits – and most developers have never been told what they are.

The crawl limits your developer probably never mentioned

Every search engine and AI engine that sends traffic to your site has a limit on how much of your page it will actually read. Exceed that limit and your content – however good it is – becomes invisible to the crawler. No ranking signal. No AI citation. No traffic.

Most developers focus on page load speed, Core Web Vitals, and mobile responsiveness. These matter. But the page size limits set by crawlers are a parallel issue that sits upstream of all of them. A fast page that exceeds Google’s indexing limit will still have its content truncated.

Here is the full picture across every major crawler that sends traffic in 2026.

Crawler / Engine      | Operator                 | Page size limit                 | Type           | Renders JS?
Googlebot             | Google Search            | 2MB HTML (indexing), 15MB fetch | Hard limit     | Yes (delayed)
Google-InspectionTool | Google Search Console    | 15MB fetch only                 | Fetch only     | Yes
Bingbot               | Bing / Microsoft Copilot | 125KB soft limit (HTML)         | Soft limit     | Partial
GPTBot                | OpenAI (training)        | No published limit              | Context window | No
ChatGPT-User          | OpenAI (retrieval)       | No published limit              | Context window | No
PerplexityBot         | Perplexity AI            | No published limit              | Context window | No
ClaudeBot             | Anthropic                | No published limit              | Context window | No
Applebot              | Apple (Siri, Spotlight)  | No published limit              | Context window | No

The key insight from this table: the crawlers that matter most in 2026 – Google, Bing, and every AI retrieval bot – all have constraints. They differ in type, but the result is the same. Exceed those constraints and your content does not get seen.

Google’s 2MB rule – what it actually means

On February 3, 2026, Google reorganised its crawler documentation and made explicit what had been implied for years: Googlebot indexes only the first 2MB of raw, uncompressed HTML. This is the hard limit for what gets sent to Google’s indexing pipeline. Content beyond that threshold is silently cut off.

Important distinction

The 2MB limit is applied to uncompressed HTML. Even if your server delivers a 200KB gzipped file, Googlebot truncates everything beyond 2MB once the HTML is decompressed. Compression reduces transfer size but not the indexing cut-off.
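To see why transfer size is misleading, here is a short Python sketch. The HTML content is invented for illustration – the point is that highly repetitive markup compresses dramatically, so a small gzipped transfer can hide a multi-megabyte uncompressed document:

```python
import gzip

# Build a deliberately repetitive ~3MB HTML document (hypothetical content).
row = ('<div class="product"><h3>Widget</h3><p>'
       + 'Lorem ipsum dolor sit amet. ' * 10
       + '</p></div>\n')
html = '<html><body>\n' + row * (3_000_000 // len(row)) + '\n</body></html>'

compressed = gzip.compress(html.encode('utf-8'))

print(f'Uncompressed HTML: {len(html) / 1_000_000:.1f} MB')
print(f'Gzipped transfer:  {len(compressed) / 1_000:.0f} KB')
# The 2MB indexing limit applies to the uncompressed size, not the
# much smaller gzipped transfer size shown in the network tab.
print('Over the 2MB indexing limit:', len(html) > 2_000_000)
```

A page like this would look tiny in a browser's network tab yet still lose everything past the 2MB mark at indexing time.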

The Google Search Console trap

There is a common misunderstanding here that catches developers out. When you use Google Search Console’s URL Inspection tool and run a Live Test, it shows you the complete source code – even for a 3MB page. This leads developers to believe the 2MB limit does not apply to their site.

It does. The URL Inspection tool uses Google-InspectionTool, which operates under the general 15MB fetch limit – not the 2MB indexing limit. What you see in Search Console is not what Googlebot sends to the indexer. The indexer only sees the first 2MB.

What causes pages to breach 2MB?

The median HTML page is around 30-33KB. At the 90th percentile, pages reach roughly 151KB. That is well within the 2MB threshold. But specific patterns can push pages dramatically over the limit.

E-commerce category pages

Hundreds of inline product descriptions, attributes, and reviews. Each product with embedded structured data. 1MB+ of pure HTML is common on large catalogue pages.

JavaScript-heavy frameworks

Next.js, Nuxt, and SvelteKit inject large JSON hydration payloads (such as Next.js's __NEXT_DATA__ script) directly into the HTML. On data-rich pages, these alone can run to several hundred KB.
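A quick way to gauge how much of a page is hydration JSON rather than indexable content is to measure the __NEXT_DATA__ script directly. This is a regex-based sketch on an invented page – a real audit would run it against your actual server-rendered HTML:

```python
import re

# Hypothetical server-rendered page carrying a large Next.js hydration payload.
products = ','.join(f'{{"sku":"A{i}","desc":"..."}}' for i in range(500))
html = (
    '<html><body><main><h1>Catalogue</h1><p>Visible content.</p></main>'
    '<script id="__NEXT_DATA__" type="application/json">'
    f'{{"props":{{"pageProps":{{"products":[{products}]}}}}}}'
    '</script></body></html>'
)

# Measure the hydration payload's share of the whole document.
match = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>',
                  html, re.DOTALL)
payload = match.group(1) if match else ''
share = len(payload) / len(html)
print(f'__NEXT_DATA__ payload: {len(payload):,} bytes ({share:.0%} of the HTML)')
```

When the payload dominates the document like this, the visible content competes with serialized state for the crawler's 2MB window.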

Inline CSS and JavaScript

Stylesheets and scripts embedded directly in the HTML source inflate page size without adding content. Apart from a small critical-rendering block, these should be external files.

Excessive schema markup

JSON-LD structured data is valuable for SEO and GEO. But adding exhaustive schema for dozens of items inline adds significant weight to the HTML document.

Bing’s 125KB soft limit – stricter than you think

Bing operates with a soft limit of 125KB for HTML page size. Pages that exceed this threshold trigger an “HTML size is too long” error in Bing Webmaster Tools, with the following warning: the page “risks not being fully cached” and content may not be fully acquired by the crawler.

125KB is a significantly tighter threshold than Google’s 2MB. A page that is comfortably within Google’s limits can still fail Bing’s soft cap. Given that Bing powers Microsoft Copilot – one of the most-used AI assistants for enterprise and B2B audiences – treating Bing visibility as secondary is a mistake for anyone targeting professional clients.

Why this matters for B2B

Microsoft Copilot – which powers AI answers in Bing, Windows, and Microsoft 365 – uses Bingbot as its primary retrieval crawler. If your pages exceed 125KB and are not fully cached, they may not appear in Copilot’s answers. For B2B service providers targeting enterprise or SMB clients, Copilot is an increasingly important visibility channel.

Unlike Google’s hard 2MB cut-off, Bing’s 125KB is a soft limit – the crawler may still attempt to index beyond it. But the error is a documented signal that your page is at risk of incomplete caching, and it shows up explicitly in Bing Webmaster Tools, unlike Google’s silent truncation.

AI crawlers in 2026 – a different kind of limit

In 2026, AI retrieval crawlers have overtaken traditional search crawlers in total request volume. Data from Cloudflare’s January 2026 analysis confirmed that AI-related crawlers are making 3.6x more requests than traditional search crawlers across their network. GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, and Amazonbot are now significant drivers of crawl activity.

  • 3.6x – AI crawler requests vs traditional search crawlers (2026)
  • 42% – of all AI bot requests come from OpenAI alone
  • 0 – major AI crawlers that render JavaScript

The critical difference with AI crawlers is not a documented MB limit – it is the context window constraint. Each AI retrieval bot can only process a certain amount of content per page request. Bloated pages filled with inline scripts, verbose HTML attributes, and redundant markup compete with your actual content for that limited processing window.

None of them render JavaScript

This is the most significant technical constraint for AI visibility, and most developers are unaware of it. Vercel’s analysis of nextjs.org confirmed that none of the major AI crawlers – including GPTBot, ClaudeBot, ChatGPT-User, and PerplexityBot – currently render JavaScript. If your content is loaded via JavaScript, it does not exist for AI engines.

High-risk patterns for AI visibility

Content loaded via React, Vue, or Angular client-side rendering. Lazy-loaded text and descriptions. Product details populated via AJAX. Any content that relies on JavaScript execution to appear in the DOM – all of this is invisible to every AI retrieval crawler. If you want to appear in ChatGPT answers, Perplexity citations, or Google AI Overviews, your content must be in the raw server-rendered HTML.

What AI crawlers actually want

AI retrieval crawlers optimise for clean, parseable, content-dense HTML. The less they have to wade through – in terms of inline scripts, redundant markup, and non-content code – the more of your actual content fits into their processing window. A lean, well-structured HTML document is not just good for Google. It is good for every AI engine that decides whether to cite your content.

The business cost of page bloat

Page size is not just a technical metric. It is a business metric with a direct line to revenue. Here is what happens when pages get heavy.

Search rankings drop

Content beyond Google’s 2MB is not indexed. If your primary service description, key headings, or FAQ content appears late in a bloated document, it may not rank at all.

AI engine invisibility

GPTBot and PerplexityBot cannot read JavaScript-loaded content. If key facts about your service are client-rendered, no AI engine will cite you – regardless of how authoritative your content is.

Conversion rate falls

Amazon found that every 100ms increase in load time reduced sales by 1%. Google data shows 53% of mobile users abandon a site if it takes over 3 seconds to load. Page size is the starting point for that delay.

Paid media ROI drops

Google Ads uses landing page experience as a quality score factor. Slow, bloated landing pages receive lower quality scores – which means higher cost-per-click for the same position. You pay more for less traffic.

How to keep your pages within limits

These are not abstract optimisation suggestions. They are direct, practical steps that bring measurable improvement to crawl coverage, indexing completeness, and AI engine visibility.

Move CSS and JavaScript to external files

This is the single highest-impact change on most sites. Inline styles and scripts inflate raw HTML size without adding content. Move every stylesheet to an external .css file and every script to an external .js file. For WordPress sites, this is standard practice under WordPress Coding Standards – but many page builders and theme frameworks inject significant amounts of inline code that needs to be audited and externalised.
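To find out how much inline code your pages are carrying, a rough byte count is enough to prioritise the work. The following is a regex sketch, not a full HTML parser – the sample page and its contents are hypothetical:

```python
import re

def inline_code_bytes(html: str) -> dict:
    """Rough estimate of bytes spent on inline <style>/<script> blocks.
    External scripts (src=...) have empty bodies and contribute nothing."""
    styles = re.findall(r'<style[^>]*>(.*?)</style>', html,
                        re.DOTALL | re.IGNORECASE)
    scripts = re.findall(r'<script[^>]*>(.*?)</script>', html,
                         re.DOTALL | re.IGNORECASE)
    return {
        'style_bytes': sum(len(s.encode('utf-8')) for s in styles),
        'script_bytes': sum(len(s.encode('utf-8')) for s in scripts),
    }

# Hypothetical page: a bloated inline stylesheet plus one external script.
page = (
    '<html><head><style>body { margin: 0; }' + '.x { color: red; }' * 200
    + '</style><script src="/app.js"></script></head>'
    '<body><script>console.log("hi");</script><p>Content</p></body></html>'
)
report = inline_code_bytes(page)
print(report)
```

Run against real pages, numbers in the tens of KB usually point straight at a page builder or theme framework injecting styles per element.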

Audit and reduce JSON-LD schema payload

Structured data is valuable for both SEO and GEO. But bulky schema blocks – particularly those with exhaustive product or FAQ arrays – add meaningful weight to your HTML. Audit your schema markup for efficiency. Use concise property values. Avoid duplicating content that already exists in the visible HTML.
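Schema weight is easy to measure before deciding what to trim. This sketch extracts each JSON-LD block and reports its type and size; the oversized FAQPage example is invented to show the pattern:

```python
import json
import re

def jsonld_sizes(html: str) -> list:
    """Return (type, size-in-bytes) for each JSON-LD block in the HTML."""
    blocks = re.findall(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
    out = []
    for raw in blocks:
        data = json.loads(raw)
        out.append((data.get('@type', 'unknown'), len(raw.encode('utf-8'))))
    return out

# Hypothetical page with an exhaustive FAQPage array inlined into the HTML.
faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": f"Question {i}?",
         "acceptedAnswer": {"@type": "Answer", "text": "A long answer. " * 20}}
        for i in range(40)
    ],
}
html = ('<html><head><script type="application/ld+json">'
        + json.dumps(faq) + '</script></head><body>...</body></html>')

sizes = jsonld_sizes(html)
for schema_type, size in sizes:
    print(f'{schema_type}: {size / 1000:.1f} KB')
```

A single block in the tens of KB is a candidate for trimming answer text or splitting the FAQ across subpages.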

Paginate or split long content pages

Category pages with hundreds of products, long-form content pages, and documentation hubs are the most common sources of 2MB+ HTML. Split them into logical subpages. Pagination and content hubs are not just better for crawlability – they allow you to target more specific keyword clusters and provide more relevant landing experiences.

Test actual uncompressed HTML size

Your server probably delivers gzip-compressed responses – so the 50KB transfer size shown in your browser’s network tab is not the number that matters. What matters is the uncompressed HTML size. Use tools like DebugBear, Screaming Frog, or your server logs to check the raw, uncompressed size of your HTML files. That is the number Googlebot and Bingbot measure against their limits.
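The same check can be scripted. This helper decompresses a response body and compares the uncompressed size against the two thresholds discussed above; it handles gzip only, and the sample page is hypothetical:

```python
import gzip

BING_SOFT_LIMIT = 125 * 1024          # 125KB Bingbot soft limit
GOOGLE_INDEX_LIMIT = 2 * 1024 * 1024  # 2MB Googlebot indexing limit

def check_crawl_limits(body: bytes, content_encoding: str = '') -> dict:
    """Measure the *uncompressed* HTML size and compare it against the
    crawler limits. Handles gzip only; real servers may also use br/zstd."""
    raw = gzip.decompress(body) if content_encoding == 'gzip' else body
    return {
        'uncompressed_bytes': len(raw),
        'over_bing_soft_limit': len(raw) > BING_SOFT_LIMIT,
        'over_google_index_limit': len(raw) > GOOGLE_INDEX_LIMIT,
    }

# A ~210KB page served gzip-compressed (hypothetical): the transfer is small,
# but the uncompressed size already breaches Bing's soft limit.
page = b'<html><body>' + b'<p>content</p>' * 15_000 + b'</body></html>'
report = check_crawl_limits(gzip.compress(page), content_encoding='gzip')
print(report)
```

Feed it the raw bytes and Content-Encoding header from any fetch of your own pages to get the number the crawlers actually measure.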

Ensure critical content is server-rendered

For AI engine visibility, server-side rendering (SSR) is not optional. Any content that must appear in ChatGPT answers, Perplexity citations, or Google AI Overviews needs to exist in the server-rendered HTML before JavaScript executes. Review your site architecture for content that relies on client-side rendering and move it to SSR.
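A simple sanity check is to verify that your must-be-cited phrases exist in the raw HTML, which is all a non-JS-rendering AI crawler ever sees. The page and phrases below are invented; point the same check at your own server-rendered output:

```python
def phrases_in_raw_html(raw_html: str, required_phrases: list) -> dict:
    """Check whether each key phrase exists in the raw, server-rendered
    HTML -- the only thing a crawler that does not execute JS can read."""
    return {p: (p in raw_html) for p in required_phrases}

# Hypothetical client-rendered app: the pricing copy only appears after
# JavaScript runs, so it is missing from the raw HTML entirely.
raw_html = ('<html><body><div id="root"></div>'
            '<h1>Acme Consulting</h1></body></html>')
required = ['Acme Consulting', 'Fixed-fee audits from $2,000']

result = phrases_in_raw_html(raw_html, required)
for phrase, present in result.items():
    print(f"{'OK  ' if present else 'MISS'} {phrase}")
```

Anything flagged MISS is invisible to GPTBot, PerplexityBot, and ClaudeBot until it is moved into the server-rendered output.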

Quick audit checklist

  • Check uncompressed HTML size – target under 125KB for Bing, keep key content in the first 2MB for Google
  • Move all CSS into external files – no inline <style> blocks in the HTML source
  • Move all JavaScript into external files – no inline <script> blocks except minimal critical JS
  • Verify critical content is visible in server-rendered HTML before JS executes
  • Audit JSON-LD schema for size – remove redundant properties, tighten descriptions
  • Review JavaScript framework hydration payloads (__NEXT_DATA__, etc.) – move data to API calls where possible
  • Paginate category pages with more than 50 inline product descriptions
  • Enable Bing Webmaster Tools and check for “HTML size is too long” errors
  • Confirm robots.txt allows access for ChatGPT-User, PerplexityBot, ClaudeBot – these are the retrieval bots, not the training bots
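The last checklist item can be verified with Python's standard-library robots.txt parser. The robots.txt below is a hypothetical example that blocks the training bot but allows the retrieval bots:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block OpenAI's training crawler, allow the
# retrieval bots that fetch pages to answer live user questions.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for bot in ('GPTBot', 'ChatGPT-User', 'PerplexityBot', 'ClaudeBot'):
    verdict = 'allowed' if rp.can_fetch(bot, '/services/') else 'blocked'
    print(f'{bot}: {verdict}')
```

Running this against your live robots.txt (via RobotFileParser's set_url/read) catches the common mistake of blocking retrieval bots while trying to block only training bots.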