Google Blogoscoped

Forum

How Google News Indexes

alek [PersonRank 10]

Friday, July 28, 2006
13 years ago, 6,981 views

Interesting summary – well done as always Philipp.

BTW, if one did some wild-card searches on Google News and scraped the results over a couple of weeks, one could probably get a semi-close count of the number of news sources.
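A minimal sketch of the counting step only, assuming the scraped source names have already been collected per day. The function name, the sample dates, and the source names "Example Gazette" and "Another Daily" are made up for illustration, and nothing here fetches anything from Google News. It tracks how the number of distinct sources grows as more days are added; once the curve flattens, the total is probably close to the real count.

```python
from typing import Dict, Iterable, List, Tuple


def cumulative_source_counts(
    daily_results: Dict[str, Iterable[str]]
) -> List[Tuple[str, int]]:
    """Return (day, distinct sources seen so far) in date order.

    daily_results maps an ISO date string to the source names scraped
    that day (hypothetical input; the scraping itself is not shown).
    """
    seen = set()
    growth = []
    for day in sorted(daily_results):
        for name in daily_results[day]:
            # Normalize so "Reuters" and "reuters " count as one source.
            seen.add(name.strip().lower())
        growth.append((day, len(seen)))
    return growth


# Made-up sample data, only to show the shape of the input.
sample = {
    "2006-07-28": ["Reuters", "BBC News", "reuters "],
    "2006-07-29": ["BBC News", "Example Gazette"],
    "2006-07-30": ["Example Gazette", "Another Daily"],
}
growth = cumulative_source_counts(sample)
for day, total in growth:
    print(day, total)
```

If the daily increase stays large after two weeks, the queries are likely too narrow and the count is still an undercount.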

Hilarious example doing a search for Eric Schmidt (Google CEO) taking a coaching job at University of North Dakota – here's the URL for those that want to check it out and confirm Philipp's screen grab is legit:
   news.google.com/news?hl=en& ...

Philipp Lenssen [PersonRank 10]

13 years ago #

Update: I added a bullet mentioning the National Vanguard removal case.
Thanks Alek.

Kirby Witmer [PersonRank 10]

13 years ago #

>>Google News does not have a policy against blogs anymore.<<

When I submitted my site some weeks ago, they told me they currently don't accept blogs that have only one author. Has that changed as well?

Philipp Lenssen [PersonRank 10]

13 years ago #

Why don't you try to get a guest blogger for 1 post? Then you can re-submit and you force them to get back to you with an explanation. Feel free to take my post for your blog, it's Creative Commons licensed :)

TOMHTML [PersonRank 10]

13 years ago #

There is a HUGE "bug" in Google News:
when, in an article, you link to an old article from the same site, Google will consider the old article to be "related to current events" and will re-index it ;)
Example of spam using that technique: 3couleurs.blogspot.com/2005/10 ...

The bug is fixed now, but I can assure you that if you make a link like that, the old article will be reconsidered... ;)

Seth Finkelstein [PersonRank 10]

13 years ago #

alek:

John Elliot has been doing some scraping:

privateradio.org/blog/i/google ...

/pd [PersonRank 10]

13 years ago #

Seth: nice pointer... but this scrape is local/nationwide only [US ISO code only].

This would have been a good/interesting analysis if the key handled "ALL" countries.

Seth Finkelstein [PersonRank 10]

13 years ago #

There's a country drop-down selector for other countries on the top of that page.

Gary Price [PersonRank 10]

13 years ago #

I think Topix.net is doing an excellent job in providing a full scope of both blogs and "mainstream" news sources in a well organized (by topic, by location, etc.) package.

They claim well over 12,000 mainstream + 15,000 blogs (see: blog.topix.net/archives/000082 ...) and do a nice job of not only clustering results but also identifying blog sources and mainstream sources. They also offer a blog only page.
topix.net/blogs

They even have a page that allows you to browse topical pages not including one for every Zip Code (U.S.) and Postal Code (Canada) and also index video from Reuters.

Of course, every topic page has an RSS feed.

Finally, Topix offers a page with a bit of detail (just a bit) about how their algo works (topix.net/topix/newsrank) and since every story has a place for users to post commentary, they offer this map.

topix.net/forum/geo

==========================

NewsNow.co.uk is also worth a look. The company is in business to sell its services to companies as a CI tool, but what they give away for free is great for browsing. The searching is poor.
newsnow.co.uk.

Combo of blog and mainstream news (blogs identified as such most of the time).
Almost 24,000 sources.

Prebuilt topical pages that autorefresh every 5 minutes.
Example:
newsnow.co.uk/newsfeed/?name=N ...

or

newsnow.co.uk/newsfeed/?name=G ...

Nice identification of news source country using flag icon.

Special feed for tech related press releases:
newsnow.co.uk/newsfeed/?name=P ...

And, if you're totally ready to geek out for news, this page offers a near real-time listing (nicely organized and color coded) as each story enters the database.
newsnow.co.uk/livefeed/

Luka [PersonRank 10]

13 years ago #

We experienced some trouble with TOMHTML when we published articles with a "Special date" written on the left of our blog (it was 05/26/2006). After that, none of our new articles were indexed.

The answer from Google News France was: "for us, the creation date of each single new article is 05/26/2006-00:00". They do not take the RSS feed into account for blogs, only information in the article itself, so handle "date" things with care!

Philipp Lenssen [PersonRank 10]

13 years ago #

For some reason the privateradio.org site only shows 237 sources. Maybe we can come up with something different that screen-scrapes searches for random words from a dictionary throughout the day, to collect more sources.
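A rough sketch of just the "random words from a dictionary" part, assuming a word list is already at hand. The function name, the demo words, and the run counts are all hypothetical, and the fetching and parsing of result pages is deliberately left out. It draws a reproducible batch of query words for each run of the day.

```python
import random
from typing import List, Sequence


def build_query_schedule(
    words: Sequence[str], runs: int, per_run: int, seed: int = 0
) -> List[List[str]]:
    """Pick `per_run` distinct query words for each of `runs` runs.

    A fixed seed makes the schedule repeatable, so a later scrape can
    be compared against an earlier one that used the same queries.
    """
    rng = random.Random(seed)
    # Lowercase and de-duplicate while keeping the original order.
    pool = list(dict.fromkeys(w.strip().lower() for w in words if w.strip()))
    k = min(per_run, len(pool))
    return [rng.sample(pool, k) for _ in range(runs)]


# Made-up word list; a real run would load a full dictionary file.
demo_words = ["election", "harvest", "bridge", "Election", "tunnel", "market"]
schedule = build_query_schedule(demo_words, runs=3, per_run=2, seed=42)
for run_number, queries in enumerate(schedule, start=1):
    print(run_number, queries)
```

Common words will mostly surface the big sources, so a large and varied word list matters more than the number of runs.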

Seth Finkelstein [PersonRank 10]

13 years ago #

Philipp: That's because "Only headlines on the home page are fetched."
But there's more data, e.g. privateradio.org/blog/i/google ...

Gary: My own blog made Topix's source list. But I almost never get any hits from them. I don't know how much of that is because I'm a Z-lister, versus Topix just not being used much.

Tadeusz Szewczyk [PersonRank 10]

13 years ago #

In my experience, Google News seems to apply some automatic filters. For instance, I once wrote about an art project that used spam for art purposes. Although all other articles from that particular site were seemingly indexed, this one was omitted.

Steve Bryant [PersonRank 1]

13 years ago #

Nice post, Philipp. I have to contest your point about Google honoring scoops, though.

Through my experience with Google Watch, which is indexed in Google News, I've noticed that a scoop will only top a news cluster for a short amount of time. After that time, new stories that may or may not provide additional information are given a prominent position. I listed one specific example in this post: googlewatch.eweek.com/blogs/go ...
   in which I argue that Google News could learn a lot from Techmeme.

As for Google discriminating against anti-Google posts, I think that's probably bunkum. We've written plenty of posts that certainly didn't favor Google, but they always seem to make it into Google News.

Chris Keating [PersonRank 0]

13 years ago #

Thanks for your insights, Philipp. If a site has not intentionally put up an "overview" page, how can it tell which page Google News is using for that purpose? And how does Google News determine what to use for the overview page? (And why not just use the home page?)

When one looks at the Private Radio rankings, there are some big surprises – some major publishers are ranked low while relatively small and/or non-prominent ones like San Jose Mercury News do very well. Any thoughts on these imbalances?

Bryan M [PersonRank 0]

13 years ago #

NewsKnife.com has pretty comprehensive Google News scrapings, although you need to pay up to see the full data. It's not that expensive for a few months' access.

Trogdor [PersonRank 6]

13 years ago #

Just noticed this. Go to:
news.google.com/news?q=UN+reso ...

Now, go to it again. The SERP is different! Now, try again. And again. You'll probably see a pattern here. Not sure what it means, though ...

Chris [PersonRank 0]

13 years ago #

I work for a small non-authority site listed in German Google News that still made it to the top position several times, not only on a news search query but on the main Google News site as well, even without scooping the news.

This even made my Warholian 15 minutes of fame, when I got the top spot as a news link on the google search for ... "google". That just about crashed my server and stayed on there for only about ten minutes.

As you can sort the news SERPs by relevance or by date, I'm still not sure what kind of mix is used for the Google News main page. The main page seems to be updated about every 15 minutes.

I also experience the different serps as described by Trogdor, especially with clustered news.

Several times, the German Google News even had bulletin board/forum threads in its news listings, but it seems that these are filtered out manually once they get the attention of the Google News team.
