Google Blogoscoped

Thursday, November 16, 2006

Google Displays Titles For Uncrawled Pages

Did Google change the way they display non-crawled pages in their results? As you know, Google and other search engines respect the robots.txt file and crawl only what they're allowed to crawl, so they won't index, for example, search result pages or other pages that would only add noise to their web results. However, Google often shows the URL of a non-crawled page in its results anyway, apparently because they found it linked from other sites*. If I'm not mistaken, such results have so far simply been “titled” with the page's URL. Now, as Philipp of Ice Blog points out to me, they seem to be titled with something more meaningful: possibly text taken from the links pointing to the page.
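To make the robots.txt rule concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content is a hypothetical two-line excerpt that keeps every crawler out of the “/search” path. It is not a copy of any real site's file, and the URLs are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excerpt: all crawlers are kept out of /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable("http://www.google.com/search?q=example"))  # False
print(is_crawlable("http://www.google.com/about.html"))        # True
```

A disallowed URL can still turn up as a link on other sites. The crawler just never fetches it, so it never sees the page's own title.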

As an example, this page links to a Google search results page. Google's own robots.txt disallows crawling of the “/search” path, so Google cannot index that page (and learn that its title is “Sloper John Erik - Google Search”). However, they now have the backlink text and the URL, and they display both in the results for a search on erik.**
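The fallback described above might work roughly like the following sketch. It is only a guess at the observed behavior, not Google's actual algorithm. The function name display_title, its parameters, and the “most frequent link text, lower-cased” rule are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def display_title(url: str, crawlable: bool, page_title: Optional[str],
                  anchor_texts: Iterable[str]) -> str:
    """Pick a result title for one URL (hypothetical heuristic)."""
    if crawlable and page_title:
        # Crawlable page: use the <title> the crawler actually fetched.
        return page_title
    # Uncrawlable page: fall back to the most common backlink text,
    # lower-cased like the titles observed in the results.
    counts = Counter(t.strip().lower() for t in anchor_texts if t.strip())
    if counts:
        return counts.most_common(1)[0][0]
    # No usable link text: show the bare URL, as in the older behavior.
    return url


print(display_title("http://example.com/search?q=erik", False, None,
                    ["Erik", "erik", "other"]))
```

Here an uncrawlable URL with backlinks reading “Erik”, “erik” and “other” gets the title “erik”. With no backlinks at all, the bare URL is shown, as before.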

What practical implications does this have? For the user, certain exotic search results become easier to read (even though the title is still all lower-case and not necessarily very meaningful). And for developers, if it's true that backlinks change the way Google titles results, this might well be a new way to sneak a prank or two into Google's search results: a kind of Googlebombing that works only on uncrawlable pages.

*This behavior is actually somewhat controversial, but that's another topic. Google plays by the book and respects robots.txt even when they display such links in their SERPs, but some people wish they wouldn't display them at all.

**A search for returns around 8,560 pages at the moment, by the way.




This site unofficially covers Google™ and more with some rights reserved. Join our forum!