Google Blogoscoped

Forum

Q: How does Google Blog Search know the target is a blog?

Juha-Matti Laurio [PersonRank 10]

Wednesday, October 28, 2009
14 years ago, 3,589 views

What are the means for Google to recognize blogs when it is crawling the Web?

I was searching for a specific blog with this query:
http://blogsearch.google.com/blogsearch?hl=en&um=1&ie=UTF-8&q=wasc+statistics&btnG=Search+Blogs

but the results are mainly news sites; only one Blogger URL appears among the first hits.
It's easy to recognize content made with WordPress, Blogger, etc., but what about the non-typical blogging platforms?

Philipp Lenssen [PersonRank 10]

14 years ago #

Do they just check if a site has an RSS feed?
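To make the idea concrete, here is a minimal sketch of what a "does this page advertise a feed?" check could look like. It assumes the check is plain feed autodiscovery, i.e. looking for `<link rel="alternate">` tags with an RSS or Atom type in the page's HTML. Nothing in this thread confirms that Google does it this way; `FeedLinkFinder`, `find_feeds` and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

# MIME types conventionally used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that point to a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so normalize to "".
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html):
    """Return the feed URLs advertised in the given HTML (hypothetical helper)."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page: a shop, not a blog, that still advertises a feed.
sample = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" href="/rss.xml">
</head><body>Shop news</body></html>"""

print(find_feeds(sample))
```

A check like this only says that a feed exists. It cannot tell a blog from any other kind of site that publishes one.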

WebSonic.nl [PersonRank 10]

14 years ago #

That could be, but e-commerce websites, for example, sometimes also have an RSS feed, and such a website is not a blog.

<<Why does Google crawl/index blogs (specifically sites notified by "WordPress XMLRPC pings") so much faster than a "normal" site submitting a revised Sitemap? What is the impact of that on the overall "quality" of the index?>>

http://www.youtube.com/user/GoogleWebmasterHelp#p/c/841CB8F9F31BF5D5/11/k8PQ3nNCYuU

Tony Ruscoe [PersonRank 10]

14 years ago #

I think the answer is that they don't know if the target is a blog. Like you say, it also includes news sites and forums. I've said it before and I'll say it again: they should have called it "Google Feed Search" or something.

/pd [PersonRank 10]

14 years ago #

Don't blogs send out pings when a post is published? Whereas web pages are static and only get crawled/spidered depending on the robots.txt?

Tony Ruscoe [PersonRank 10]

14 years ago #

Not necessarily. You could easily have a feed for a site which isn't a blog yet still sends a ping when it's updated. And not all blogs send a ping when a post is published. Google Blog Search indexes anything with a feed AFAIK.
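For illustration, this is roughly what such a ping looks like on the wire. The sketch assumes the common `weblogUpdates.ping` XML-RPC convention, which takes a site name and a site URL. It only builds and decodes the request body with Python's standard library and sends nothing. `build_ping_request` and the example site are hypothetical.

```python
import xmlrpc.client


def build_ping_request(site_name, site_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call.

    Hypothetical helper: it only serializes the call, it does not send it.
    """
    return xmlrpc.client.dumps((site_name, site_url), methodname="weblogUpdates.ping")


# Assumed example values; any site with a feed could send the same thing.
body = build_ping_request("Example Shop", "http://shop.example.com/")
print(body)

# Decoding the body shows everything a ping receiver gets to see.
params, method = xmlrpc.client.loads(body)
print(method, params)
```

The payload carries just a name and a URL, so the ping itself gives the receiver nothing to tell a blog from any other site that chooses to send one.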

mbegin [PersonRank 10]

14 years ago #

Semi-related:

<< Using RSS/Atom feeds to discover new URLs:

Google uses numerous sources to find new webpages, from links we find on the web to submitted URLs. We aim to discover new pages quickly so that users can find new content in Google search results soon after they go live. We recently launched a feature that uses RSS and Atom feeds for the discovery of new webpages.

RSS/Atom feeds have been very popular in recent years as a mechanism for content publication. They allow readers to check for new content from publishers. Using feeds for discovery allows us to get these new pages into our index more quickly than traditional crawling methods. We may use many potential sources to access updates from feeds including Reader, notification services, or direct crawls of feeds. Going forward, we might also explore mechanisms such as PubSubHubbub to identify updated items.

In order for us to use your RSS/Atom feeds for discovery, it's important that crawling these files is not disallowed by your robots.txt. To find out if Googlebot can crawl your feeds and find your pages as fast as possible, test your feed URLs with the robots.txt tester in Google Webmaster Tools. >>

http://googlewebmastercentral.blogspot.com/2009/10/using-rssatom-feeds-to-discover-new.html
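As a rough local stand-in for the robots.txt check mentioned in the quote, the sketch below uses Python's standard-library `urllib.robotparser` to test whether a feed URL is disallowed for a given user agent. This is not Google's tester, and the parser may not interpret every rule the way Googlebot does. The `feed_allowed` helper, the rules and the URLs are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def feed_allowed(robots_txt, feed_url, user_agent="Googlebot"):
    """Return True if robots_txt lets user_agent fetch feed_url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, feed_url)


# Assumed example rules: everything under /feeds/ is blocked for all crawlers.
robots = """User-agent: *
Disallow: /feeds/
"""

print(feed_allowed(robots, "http://www.example.com/feeds/posts.xml"))
print(feed_allowed(robots, "http://www.example.com/rss.xml"))
```

With these rules the first feed URL is blocked and the second is allowed, which is the kind of difference the quoted post says decides whether a feed can be used for discovery.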
