Very little information in those "answers", which sound like they have been rewritten by the PR department.
Hmm. I just searched for "Google is always working to improve search" to check if the phrase had been used before (turns out it has), and the top result is... this post, which went live 2 hours 20 minutes ago. This is pretty fast for Google to index and display. However, the "cache" link is missing for some reason... is this universal search with a temporary blog crawl result or something?
Yeah, I've been noticing this for about three weeks now. Google's indexing speed is way ahead of the competition and is very close to real-time indexing. Blogs and news seem to get indexed in the regular search results within a few hours. Regular websites seem to get indexed within 24 hours (at least the higher-PR sites).
I've been noticing the same thing (also for about 3 weeks). I don't know how they're doing it considering how spam-filled blog search is, and also how long it used to take to update the main search index. There really must have been some drastic algorithm changes to improve the short-term accuracy.
If that's the case, they probably have to rely even more heavily on PageRank to ensure someone can't spam the results just by providing fresh data. I also wonder how they make sure to keep old results which still have greater relevancy than new content.
"I also wonder how they make sure to keep old results which still have greater relevancy than new content"
Good question. But I think a lot more people are now using the web for up-to-the-minute info on a variety of events and subjects, so faster inclusion in web results makes sense.
For older content with relevant info, I think this probably has something to do with universal search, or at least with the direction Google is going with its algorithm. Older results will surface when the algorithm determines it's appropriate for the query. This also means that personalization will probably become more and more important at Google, since user intent will become a big factor in determining these things. Google's "Time View" shows that Google is thinking about this stuff.
You're right...the time view shows they can tell when a query is/was relevant, and that could probably be used to determine what's still relevant in old content. It's just amazing that they can weigh the new content versus the old content and mark their relevancy in comparison to each other, and manage not to bury the old content every time.
And to top it off, their searches don't appear to be any slower despite the improved timeliness of results. That's probably the most surprising part of all this.
I've also recently noticed the quick indexing of new pages, which Googlebot somehow achieves using half the number of hits that MSNBot sends my way.
I guess Google is discovering the new URLs from my RSS feed, retrieving them immediately, and adding them to its main index.
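Nobody here knows how Googlebot actually does it, but purely to illustrate the guess above, here is a minimal Python sketch of how a crawler could pick up new URLs from an RSS 2.0 feed. The feed contents, the example.com URLs, and the `new_urls` helper are all made up for the example; this is not a description of Google's crawler.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; the URLs are placeholders.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example blog</title>
<link>http://example.com/</link>
<item><title>Old post</title><link>http://example.com/old</link></item>
<item><title>New post</title><link>http://example.com/new</link></item>
</channel></rss>"""


def new_urls(feed_xml, already_seen):
    """Return item links from the feed that are not in already_seen."""
    root = ET.fromstring(feed_xml)
    # Only <item> links are posts; the channel-level <link> is the site itself.
    links = [item.findtext("link") for item in root.iter("item")]
    return [url for url in links if url and url not in already_seen]


if __name__ == "__main__":
    seen = {"http://example.com/old"}
    for url in new_urls(SAMPLE_FEED, seen):
        print("would fetch:", url)
```

A crawler polling the feed often would only need to fetch the handful of URLs it has not seen yet, which would fit with getting fresh pages from relatively few hits.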
<< However, the "cache" link is missing for some reason... is this universal search with a temporary blog crawl result or something? >>
I've been seeing this for a while too. From what I can gather, they're quick to index the page but slower to distribute the cached content to all their servers. And it's definitely not a feed crawl because I've seen it for pages that don't have feeds.
I think the quickest I've seen it index a page is within about 12 hours of it being published, although I can't confirm that precisely.
But there have been posts at bluehat seo basically saying that the easiest way to artificially get traffic through blackhat tactics is by optimizing for Google Images. After all, they haven't got to the point of tearing the image apart bit by bit to see what it actually is. Good article in Wired a few months ago about that. So, yes, PageRank may be the only factor that plays into the actual ranking. Everything else can be faked.
>>Do you have any suggestions to webmasters on how to use the “alt” and “title” attributes on the img element? Is there more than “just go by what the World Wide Web Consortium suggests"? And what importance do these attributes play in Google Image indexing?
>>We recommend that webmasters use these attributes to accurately help describe the image.
I've never noticed a title attribute on any tag making any difference in rankings. Do you think he thought you were asking about the title tag?
<< I've never noticed a title attribute on any tag making any difference in rankings. >>
Same here. I think he's simply implying that you should use them properly rather than trying to use them to game Google or other search engines.
> We recommend that webmasters use these
> attributes to accurately help describe the image.
The thing is, the "alt" text if used properly* isn't necessarily a description of the image. It would be plain wrong for me to use an alt text like the following for the logo above: "black letters to the left, red letters to the right, surrounded by a black rectangle, with a 'slack casual' font, reading 'Google Blogoscoped'". Instead, I just use "Google Blogoscoped," which is a *replacement for the message the image is intended to communicate*. And oftentimes, leaving the alt text empty is just fine for that purpose, because you may already describe the image in the normal text (thus communicating it to devices where images can't be displayed, or users who can't see them). A more fitting place for a description is the longdesc attribute, which no one uses of course, and also the title attribute, though I have my doubts that *Google* uses it...
*Well, in W3C-conformance...
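To make the distinction concrete, here is a small Python sketch that lists the img elements in a page and sorts their alt text into three cases: a replacement text, a deliberately empty alt, or no alt attribute at all. The file names in the sample markup and the `ImgAltAudit`, `classify`, and `audit` names are invented for the example, and it says nothing about how Google treats these attributes.

```python
from html.parser import HTMLParser


class ImgAltAudit(HTMLParser):
    """Collect (src, alt, title) for every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            # .get() returns None when the attribute is absent entirely.
            self.images.append((a.get("src"), a.get("alt"), a.get("title")))


def classify(alt):
    """Label an alt value as 'missing', 'empty', or 'replacement text'."""
    if alt is None:
        return "missing"
    if alt.strip() == "":
        return "empty"
    return "replacement text"


def audit(html):
    """Return a list of (src, label) pairs for the images in html."""
    parser = ImgAltAudit()
    parser.feed(html)
    return [(src, classify(alt)) for src, alt, _title in parser.images]


# Hypothetical markup: a logo, a purely decorative image, and a forgotten alt.
SAMPLE = """
<img src="logo.png" alt="Google Blogoscoped" title="Black and red site logo">
<img src="divider.png" alt="">
<img src="photo.jpg">
"""

if __name__ == "__main__":
    for src, label in audit(SAMPLE):
        print(src, "->", label)
```

The empty alt on the decorative image is the "leave it empty" case from above, while the image with no alt attribute at all is the one worth fixing.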
This google engineer's response is as dull as... well :)
What a PR-washed up hog.
(Edited for clarity: "...such a picture..." -> "...such as a picture...")