Google Blogoscoped

Friday, June 20, 2003

Google Indexing Limit Put to Test

Google is supposed to have an indexing limit of 101K, which means everything beyond that size is cut off and can’t be searched (or found).
That’s also what you see when a larger page shows up in Google’s results: 101K is displayed on the search result page. However, the actual HTML shown in the Google Cache is larger than that, at about 150K. I arrived at about the same figure when I took the original HTML file and removed everything after the last sentence that’s being indexed (according to the Google Cache); I also removed the HTML formatting, double returns, double spacing, and so on. So in short, it seems to me Google indexes around the first 150K of a file. Now, this doesn’t mean 150K is a sensible download size for most pages. However, large single pages have their uses: for example, I provide a “chapterized” version of the books as well as a one-page all-included version for easier printing.
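If you want to repeat this kind of measurement on your own pages, here is a rough Python sketch of the steps described above. It is not the exact procedure I used. The function name `indexed_text_size` and the class `TextExtractor` are made up for illustration, and collapsing every run of whitespace into a single space is only an assumption standing in for "double returns, double spacing, and so on".

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def indexed_text_size(html, last_indexed_sentence):
    """Return the size in bytes of the text up to the last indexed sentence.

    `last_indexed_sentence` is the last sentence you can still find
    in the cached copy of the page.
    """
    # Cut the original HTML right after the last indexed sentence.
    end = html.find(last_indexed_sentence)
    if end == -1:
        raise ValueError("sentence not found in HTML")
    truncated = html[: end + len(last_indexed_sentence)]

    # Remove the HTML formatting.
    parser = TextExtractor()
    parser.feed(truncated)
    parser.close()
    text = "".join(parser.parts)

    # Assumption: collapse all whitespace runs (double returns,
    # double spacing, ...) into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    return len(text.encode("utf-8"))
```

Divide the result by 1024 to compare it with the "K" figure shown on the search result page.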

Common Sense SEO

“Clearly, over time the focus of the search engines will vary. The best way to deal with this is to not deal with it! This means that rather than tweaking a site one way today and another way tomorrow, the best way to approach optimizing a page or a whole site is to not try to beat the system. Instead of trying to “psych-out” the search engines, why not add value to the site? A “common sense” approach to search engine optimization, looking for long term results, is the way to go. When you try to help a site rank better by making it the best it can be, everybody wins.”
– Daria Goetsch, The Myth of ’Guaranteed #1 Ranking’ in Search Engine Marketing (Search Engine Guide), 2003-06-20

Live Search Queries

Are you interested in what other people are searching for? It can give you a feel for how people search, which helps when optimizing your own pages, or it can just be entertaining if you’re curious. Many search engines provide a query spy page for the “Web voyeurs” (these are often unfiltered, so expect some adult keywords). Here’s a selection:

These are less up-to-date collections, edited, just showing gains and losses, or pages containing accumulated top query lists:

This one isn’t live, but rather a “best of” (or “worst of”...) collection of search queries:

And here are the recent Google queries people used to find Google Blogoscoped.

Thanks to Rob Skelton’s Search Directory for many of the listed pages.

