Google Blogoscoped

Tuesday, July 29, 2008

Cuil Violating Google Webmaster Guidelines?

You might have heard of Cuil, a new search engine partly created by ex-Google employees. (As a Reddit commenter summed it up in reference to Cuil’s result rankings, “It’s a great search engine if you’re not interested in finding what you’re looking for.” They get credit for being brave and writing their own web crawler, though.) What’s interesting to note is that Cuil may be violating Google’s webmaster guidelines. Not that everyone needs to comply with these, though a violation may bring a Google penalty with it.

What’s happening is that Cuil at this time does not have a robots.txt file, and its search result pages don’t offer any “noindex” declaration either... meaning they may be (and some already are) indexed by Google when people link to them. Here’s what the Google guidelines say to prevent the “search results in search results” phenomenon: “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.”
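To make the guideline a bit more concrete, here is a rough sketch of how a crawler might decide whether a search results page is off-limits. The robots.txt content, the /search path and the example.com URLs are assumptions for illustration only, not anything Cuil or Google actually publish; the check uses Python’s standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of search result pages.
# The "/search" path is an assumption made for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A search results URL is blocked by the Disallow rule.
    print(is_crawlable(ROBOTS_TXT, "http://www.example.com/search?q=blogoscoped"))
    # Other pages on the same site stay crawlable.
    print(is_crawlable(ROBOTS_TXT, "http://www.example.com/about"))
    # With no rules at all, every URL is crawlable.
    print(is_crawlable("", "http://www.example.com/search?q=blogoscoped"))
```

With rules like these in place, a well-behaved crawler would skip the result pages even when other sites link to them. With no rules, as in the last call, nothing holds the crawler back.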

But perhaps Cuil is doing all this inadvertently, and may add a robots.txt in the future.

On another note related to Cuil, some people have started to notice that irrelevant thumbnails are appearing next to Cuil result snippets. Above is an example of a “Google Analytics 2.0” software box appearing next to the result for Blogoscoped, as well as an odd-looking Google Co-Founder. If you’re looking for Blogoscoped or Eric Schmidt then that’s merely confusing, but there’s something else to it – these thumbnails, stored at CuilImg.com, also tend to be uncredited uses of images from other websites, and there is currently no obvious way to find the original page hosting a given image.

[Thanks A., GalaxySpectrum, Brinke, Andy Wong, Nicholas, Chris P., everyone who provided feedback!]

Update: DPNeal in the comments notes that there are no longer any results for the site:cuil.com/search query. I can reproduce this lack of results (there were a few before, but not many), though I don’t know if this is a data center issue or something else. [Thanks DPNeal!]

Update 2: Google’s Matt Cutts in the comments says, “We do reserve the right to remove search engine results from Google’s search engine results, as we’ve noted in our technical guidelines. For example, lycos.com has a robots.txt that lets anyone crawl it (...) but that doesn’t mean that our users want to see Lycos search results in Google’s search results.” [Thanks Matt!]
