Google Blogoscoped

Saturday, September 1, 2007

Google News Content Hosting, Duplicate Detection

Enter "china products" into Google web search, and the top result will be a Google News onebox. This onebox links to a new Google News feature: Google-hosted news articles. Google is partnering with news agencies such as the Associated Press, Agence France-Presse and the Canadian Press to bring you straight to the source content of an article.

Google combines this with a new duplicate detection feature. Google argues that it brings more diversity to News results because you see fewer copies of licensed news content. (As you may know, stories from agencies like Reuters usually appear as copies or near-copies in hundreds of newspapers.) Google News sources that don't simply republish content from AP, Reuters and others might get a traffic bonus, because they may become more visible in story clusters.
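Google hasn't said how its duplicate detection works. As an illustration only, here is a minimal sketch of one common textbook technique for spotting near-copies of wire stories: split each text into overlapping three-word "shingles" and compare the sets with Jaccard similarity. The function names and the 0.5 threshold are hypothetical choices for this example, not anything Google has described.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word windows ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc1: str, doc2: str, threshold: float = 0.5) -> bool:
    """Treat two texts as near-copies if their shingle sets overlap enough.

    The threshold is an arbitrary, hypothetical cutoff for illustration.
    """
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold


if __name__ == "__main__":
    wire = "Americans say the United States shares blame with China for unsafe products, a poll finds"
    reprint = "Americans say the United States shares blame with China for unsafe products, according to a poll"
    print(is_near_duplicate(wire, reprint))
```

A lightly edited reprint of an agency story shares most of its shingles with the original and is flagged, while an independently written article on the same topic usually shares few of them. Production systems typically use hashing schemes such as MinHash or SimHash to make this comparison scale, but the underlying idea is the same.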

Right now, Google's hosted articles, like the one titled "Poll: US Shares Blame for China Products", are remarkably clutter-free and accessible (except to outside search bots, which Google's robots.txt disallows from crawling this content). If they stay this way and the URLs are stable, they might become a preferred way for many of us to link to news agency content.

In the future, however, Google could use these pages to display its own advertisements. Hosted news articles show that sending users away as quickly as possible is no longer Google's goal, if it ever was. Contrast this with pre-IPO statements made by Google co-founder Larry Page in a 2004 Playboy interview:

Playboy: With the addition of e-mail, Froogle – your new shopping site – and Google news, plus your search engine, will Google become a portal similar to Yahoo, AOL or MSN? Many Internet companies were founded as portals. It was assumed that the more services you provided, the longer people would stay on your website and the more revenue you could generate from advertising and pay services.

Larry Page: We built a business on the opposite message. We want you to come to Google and quickly find what you want. Then we’re happy to send you to the other sites. In fact, that’s the point. The portal strategy tries to own all of the information.

Playboy: Portals attempt to create what they call sticky content to keep a user as long as possible.

Larry Page: That’s the problem. Most portals show their own content above content elsewhere on the web. We feel that’s a conflict of interest, analogous to taking money for search results. Their search engine doesn’t necessarily provide the best results; it provides the portal’s results. Google conscientiously tries to stay away from that. We want to get you out of Google and to the right place as fast as possible. It’s a very different model.

[Via Google Operating System and Google News blog.]

