Q: How does Google Blog Search know the target is a blog? - Google Blogoscoped Forum

Juha-Matti Laurio	Wednesday, October 28, 2009 14 years ago • 3,589 views
What means does Google use to recognize blogs when it crawls the Web? I was searching for a specific blog with this query: http://blogsearch.google.com/blogsearch?hl=en&um=1&ie=UTF-8&q=wasc+statistics&btnG=Search+Blogs but the results mainly include news sites, and only one Blogger URL appears among the first hits. It's easy to recognize content made with WordPress, Blogger, etc., but what about the non-typical blogging platforms?
Philipp Lenssen	14 years ago #
Do they just check if a site has an RSS feed?
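For illustration, here is a minimal Python sketch of what "checking for a feed" could look like: scanning a page for feed autodiscovery links (`<link rel="alternate">` with an RSS or Atom MIME type). This is only an assumption about one possible heuristic, not a description of what Google actually does; the names `FeedLinkFinder` and `find_feeds` are made up for this example.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower().strip()
        if "alternate" in rel and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html):
    """Return the feed URLs advertised in the given HTML (hypothetical helper)."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

A check like this only says that a page advertises a feed. It says nothing about whether the site behind it is a blog, a shop, or a news site.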
WebSonic.nl	14 years ago #
That could be, but e-commerce websites, for example, sometimes also have an RSS feed, and those websites are not blogs. <<Why does Google crawl/index blogs (specifically sites notified by "WordPress XMLRPC pings") so much faster than a "normal" site submitting a revised Sitemap? What is the impact of that on the overall "quality" of the index?>> http://www.youtube.com/user/GoogleWebmasterHelp#p/c/841CB8F9F31BF5D5/11/k8PQ3nNCYuU
Tony Ruscoe	14 years ago #
I think the answer is that they don't know if the target is a blog. Like you say, it also includes news sites and forums. I've said it before, and I'll say it again, they should have called it "Google Feed Search" or something.
/pd	14 years ago #
Don't blogs send out pings when a post is published? Whereas webpages are static and only get crawled/spidered depending on the robots.txt?
Tony Ruscoe	14 years ago #
Not necessarily. You could easily have a feed for a site which isn't a blog yet still sends a ping when it's updated. And not all blogs send a ping when a post is published. Google Blog Search indexes anything with a feed AFAIK.
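To show why a ping proves so little, here is a rough Python sketch that builds the XML-RPC request body for the conventional `weblogUpdates.ping` call, which takes a site name and a site URL. Nothing in the payload marks the sender as a blog, so any site could send one. The helper name `build_ping` and the example site are made up, and no ping endpoint is assumed here.

```python
import xmlrpc.client


def build_ping(site_name, site_url):
    """Build the XML-RPC body of a weblogUpdates.ping call (hypothetical helper).

    Only the payload is built; nothing is sent over the network.
    """
    return xmlrpc.client.dumps((site_name, site_url), methodname="weblogUpdates.ping")


# Example values are placeholders, not a real site.
payload = build_ping("Example Shop", "http://example.com/")
print(payload)
```

Actually sending it would mean posting this body to whatever ping service the site uses, for instance through `xmlrpc.client.ServerProxy` pointed at that service's URL.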
mbegin	14 years ago #
Semi related: << Using RSS/Atom feeds to discover new URLs: Google uses numerous sources to find new webpages, from links we find on the web to submitted URLs. We aim to discover new pages quickly so that users can find new content in Google search results soon after they go live. We recently launched a feature that uses RSS and Atom feeds for the discovery of new webpages. RSS/Atom feeds have been very popular in recent years as a mechanism for content publication. They allow readers to check for new content from publishers. Using feeds for discovery allows us to get these new pages into our index more quickly than traditional crawling methods. We may use many potential sources to access updates from feeds including Reader, notification services, or direct crawls of feeds. Going forward, we might also explore mechanisms such as PubSubHubbub to identify updated items. In order for us to use your RSS/Atom feeds for discovery, it's important that crawling these files is not disallowed by your robots.txt. To find out if Googlebot can crawl your feeds and find your pages as fast as possible, test your feed URLs with the robots.txt tester in Google Webmaster Tools. >> http://googlewebmastercentral.blogspot.com/2009/10/using-rssatom-feeds-to-discover-new.html
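The quoted post recommends the robots.txt tester in Google Webmaster Tools. As a rough local approximation, assuming Python's standard `urllib.robotparser` interprets the rules closely enough (it may not match Googlebot's behavior exactly), a check could look like the sketch below. The helper name `can_crawl` and the example rules and URLs are made up.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: parses the rules from a string instead of
    downloading them, so it works offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules that would block feed discovery for every crawler.
rules = "User-agent: *\nDisallow: /feed/\n"
print(can_crawl(rules, "Googlebot", "http://example.com/feed/rss.xml"))
print(can_crawl(rules, "Googlebot", "http://example.com/post/1"))
```

With rules like these, the feed URL is disallowed while ordinary post URLs stay crawlable, which is the situation the quoted post warns about.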
