Google is supposed to have an indexing limit of 101K, which means everything past that size is cut off and can’t be searched (or found).
That’s also what you see when a larger page comes up in Google: 101K is displayed on the search result page. However, the actual HTML shown in the Google Cache is larger than that, at about 150K. I arrived at about the same figure when I took the original HTML file and removed everything after the last sentence that’s indexed (according to the Google Cache); I also removed the HTML formatting, double returns, double spacing, and so on.
So in short, it seems to me Google indexes around the first 150K of a file. Now, this doesn’t mean 150K is a sensible download size for most pages. However, at Authorama.com, for example, I provide a “chapterized” version of the books as well as a one-page, all-included version for easier printing.
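For anyone who wants to repeat the measurement on their own pages, here is a rough Python sketch of the steps described above. It is only an approximation of what I did by hand: the function names and the `strip_tags` switch are made up for illustration, and whether “HTML formatting” should include the tags themselves is an assumption you can toggle.

```python
import re


def truncate_and_normalize(html: str, last_indexed_sentence: str,
                           strip_tags: bool = False) -> str:
    """Cut a page after the last indexed sentence and tidy its whitespace.

    Hypothetical helper: the sentence has to be copied by hand from the
    cached copy of the page.
    """
    pos = html.find(last_indexed_sentence)
    if pos == -1:
        raise ValueError("sentence not found in the HTML source")
    # Remove everything after the last sentence that is still indexed.
    text = html[: pos + len(last_indexed_sentence)]
    if strip_tags:
        # Assumption: a crude regex is good enough to drop tags for a size estimate.
        text = re.sub(r"<[^>]+>", "", text)
    # Collapse double (or more) returns into a single line break.
    text = re.sub(r"\n\s*\n+", "\n", text)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text


def size_kb(text: str) -> float:
    """Size of the text in kilobytes, assuming UTF-8 encoding."""
    return len(text.encode("utf-8")) / 1024
```

Calling `size_kb(truncate_and_normalize(page_source, "last sentence seen in the cache"))` on a long page gives a number to compare against the 101K shown in the search results.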
“Clearly, over time the focus of the search engines will vary. The best way to deal with this is to not deal with it! This means that rather than tweaking a site one way today and another way tomorrow, the best way to approach optimizing a page or a whole site is to not try to beat the system. Instead of trying to “psych-out” the search engines, why not add value to the site? A “common sense” approach to search engine optimization, looking for long term results, is the way to go. When you try to help a site rank better by making it the best it can be, everybody wins.”
– Daria Goetsch, The Myth of ’Guaranteed #1 Ranking’ in Search Engine Marketing (Search Engine Guide), 2003-06-20
Are you interested in what other people are searching for? It can help you get a feel for how people search, so you can optimize your own pages, or it can just be entertaining if you’re curious. Many search engines provide a query spy page for the “Web voyeurs” (these are often unfiltered, so expect some adult keywords). Here’s a selection:
Last 10 Queries Performed (AllTheWeb)
(Examples: casino news, fibre channel-karte, tablet toolings.)
Last 20 Image Queries (MediaMiner)
(Examples: tifa lockheart of final fantasy 7, wallpaper.)
Last 50 Queries (BlackStump Peek-A-Boo)
(Examples: klepper, johnson chemical, kit cars, platform boots pics.)
Last 25 Search Queries (Top 25 Live)
(Examples: babyloncastcrew5, awardsbestsellers, bandocando, thenthgate.)
Last 10 Searches (Galaxy Stargazer)
(Examples: car leasing uk, sad poems, nederland startpagina, ColdFusion.)
Top 10 Current Searches (MetaCrawler MetaSpy)
(Examples: tsk pocketpc, “simpark”, Cable Service, oldham, lord of the rings.)
Recent German Search Queries (Fireball)
(Examples: shemp, analphabetem, Hannover geht aus, targe+of+gordon, Vicky Leandros, private domain registration.)
Most Popular Queries of Last 24 hours (ImagesPro.com)
(Examples: viewer avi, 3d, eps, freeware, low-motion, Folder, converter, divx, codec.)
Last 10 Searches (MoneyWeb)
(Examples: sanlam financials, i-thor, valex financial book, Panaflo, china fund.)
Last 20 Searches (1000 Files)
(Examples: accelerator, icq, dvd pixplay, activescreenlock, fast freeware files downloader.)
The following are less up-to-date: edited collections, lists that only show gains and losses, or pages with accumulated top query lists:
Top Requested Movie Titles for Last Week (IMDB.com)
(Examples: 2 Fast 2 Furious, Matrix Reloaded, Dumb and Dumberer, Finding Nemo, The Hulk.)
Recent Queries (Popdex)
(Examples: pokemon ruby walkthrough, canon g5 review.)
Top 10 Gaining Queries (Google Zeitgeist)
(Examples: gregory peck, us open, fathers day cards, le mans, david brinkley.)
Top 50, Edited with Comments (Lycos)
(Examples: KaZaA, Father’s Day, Test the Nation, Dragonball, The Matrix: Reloaded, Pamela Anderson, Clay Aiken, Brooke Burke.)
This one isn’t live, but rather a “best of” (or “worst of”...) collection of search queries:
Disturbing Search Requests (Searchrequests.weblogs.com)
(Examples: how can i see what files someone has been accessing on my computer, “pizza box” and “euphemisms”, satirical cartoons based on loneliness and being left out, x-men society for creative anachronism.)
And here are the recent Google queries people used to find Google Blogoscoped.
Thanks to Rob Skelton’s Search Directory for many of the listed pages.