Google Blogoscoped

Forum

Google incorporating site speed in search rankings  (View post)

ArpitNext [PersonRank 7]

Saturday, April 10, 2010
7 years ago · 6,955 views

<<Google is incorporating site speed as one of the over 200 signals that we use in determining search rankings>>

1> mattcutts.com/blog/site-speed/

2> googlewebmastercentral.blogspo ...

The 1 comment above was made in the forum before this was blogged.

TS [PersonRank 1]

7 years ago #


Interesting, but also sort of odd, and there are quite a few questions lurking underneath this decision to announce speed as a feature.

First, what is the reason for using speed? There are at least three I can think of: (1) speed as a signal for relevance or for content quality, (2) speed as a signal for user satisfaction, independent of or beyond relevance or quality, and (3) adding speed as a signal just to give sites a strong incentive to improve their speed.

Concerning (1), I would not be surprised if speed has been used as a signal for relevance by engines already. One might speculate that speed can tell you something about the quality of the content, in that good content tends to live on faster sites (while maybe large spam farms tend to be slower as they use the cheapest hosting, but this is speculation). I am not sure about Google, but some of the large engines rely on machine-learned ranking functions – in this case, the question is not about whether using speed is fair or right, but only whether it carries a useful signal that improves relevance on average. If the font color of a page (say, pink versus blue) or the birthday of the webmaster gave you a useful signal for relevance, you would use that too, and would not have to discuss if users like pink or blue more. It's just a signal.
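To make the "it's just a signal" view concrete, here is a toy sketch of a linear ranking score in which speed is merely one weighted feature among many; all feature names, values, and weights are invented for illustration, not anything Google has disclosed:

```python
# Hypothetical features for a (query, page) pair. A real engine would
# have hundreds of features and learn the weights from judged data;
# these numbers are made up.
def relevance_score(features, weights):
    """Linear combination of signals: any feature that correlates with
    relevance on average earns a nonzero weight, whether it is term
    frequency, link count, or page load time."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"term_match": 2.0, "backlinks": 1.2, "speed": 0.3}

fast_page = {"term_match": 0.8, "backlinks": 0.5, "speed": 0.9}
slow_page = {"term_match": 0.8, "backlinks": 0.5, "speed": 0.2}

print(relevance_score(fast_page, weights))  # 2.47
print(relevance_score(slow_page, weights))  # 2.26
```

In a real learning-to-rank setup the weights would be fit from judged (query, page) pairs, so speed only earns weight to the extent it actually predicts relevance on average.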

Concerning (2), even if speed does not carry a signal for relevance, it could carry a signal for user happiness. In a machine-learning approach, this of course means that site performance has to be part of the evaluation performed by the engine's human annotators. (Search engines employ humans to get query judgments that are then fed back into the ranking function, but if the annotators are told to only look at content relevance and quality, then speed might not register as a useful feature.) Note that in both cases (1) and (2), there is no real benefit in announcing the use of speed to the public.

It seems that at least part of the reason is (3), and that Google has made a decision to push sites in order to increase average internet speed, as an editorial judgment (purely algorithmic search, my a**). This does raise some concerns: Even if we agree that a faster internet is good, should Google push this just because they can? If yes, how about pushing other issues that are considered good, say whether a company running a web site has a good recycling program? Is that Google's job? Secondly, are there other ulterior motives for this, say pushing people in the direction of Google products such as their DNS optimizations and other web site software that will over time lock them in?

So, this is interesting but also a little worrisome.

Roy [PersonRank 0]

7 years ago #

TS, interesting point re: (3), BUT... to me it seems a stretch to say that Google would encourage speed in order to encourage the use of their own products.

From a machine learning point of view, this signal is used as one of many features. There's no need to be concerned about being "penalized twice" by the introduction of a feature like speed: if PageRank is indeed exactly correlated with speed, then one of the signals will be ignored by the machine, as it will be redundant.
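A minimal numerical sketch of that redundancy argument, with hypothetical data and a minimum-norm least-squares fit standing in for whatever Google actually uses: when one feature is an exact copy of another, the fitted model just splits the weight between them, and the predictions (hence the rankings) are the same as with one feature alone.

```python
import numpy as np

# Two exactly correlated features (column 1 duplicates column 0),
# standing in for "PageRank" and "speed" in this hypothetical.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])  # target score = 2 * feature

# The minimum-norm least-squares fit splits the weight between the
# duplicated columns; predictions are identical to using one feature.
w = np.linalg.pinv(X) @ y
print(w)      # roughly [1.0, 1.0] -- total effect is still 2
print(X @ w)  # reproduces y exactly: no page is scored twice
```

This is only the idealized case of *exact* correlation; how a given learner handles merely *similar* features depends on the model and regularization.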

The only question you need to be concerned about is what Google is asking the machine to learn. Learning to rank results in order to give user satisfaction, or something else? Do you care? Should you care?

Philipp Lenssen [PersonRank 10]

7 years ago #

> There's no need to be concerned about being "penalized
> twice" by the introduction of a feature like speed: if PageRank
> is indeed exactly correlated with speed, then one of the signals will
> be ignored by the machine, as it will be redundant.

Or it could be that every signal contributes its own points to the overall relevancy mix (multiplied by its importance weight) and is added to the overall score, in which case a site would indeed be penalized (or boosted) twice. Or how do you think Google's algorithms would be able to know that it's due to high/low speed that a site got many/few backlinks?

As an example, sites which have server problems may lose tons of backlinks because their site crashes the second they make it onto a big news site like Reddit or Digg or Slashdot. The same site, due to its server problems, might now also be penalized by Google's speed signal, because sites that can't carry their load tend to be slow. So it loses twice, even when the backlink count on its own would already have reflected the speed/server problems (in this hypothetical case at least, I'm not saying it really works like that).
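The double-counting worry can be put into numbers with a purely additive scorer of the kind described above; the sites, weights, and values are made up for illustration:

```python
# Two sites identical in content quality; site B's server problems have
# already cost it backlinks (say, after crashing under a Slashdot surge).
# All weights and feature values are invented.
weights = {"content": 1.0, "backlinks": 1.0, "speed": 0.5}

site_a = {"content": 0.9, "backlinks": 0.8, "speed": 0.9}
site_b = {"content": 0.9, "backlinks": 0.4, "speed": 0.2}  # slow AND link-poor

def score(site):
    return sum(weights[k] * site[k] for k in weights)

# Without the speed term, B already trails A by the lost backlinks:
#   A: 0.9 + 0.8 = 1.7   B: 0.9 + 0.4 = 1.3
# With it, the same underlying slowness widens the gap a second time:
print(score(site_a) - score(site_b))  # gap grows from 0.40 to 0.75
```

Whether a real engine behaves like this additive toy, or learns the correlation away, is exactly what the thread is debating.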

TS [PersonRank 1]

7 years ago #

Roy: Yes, the last part about pushing people towards their own products was very speculative. I guess given the recent moves by Google that concern the core of the internet (DNS, peering) I am getting more paranoid. Just something to consider.

Concerning the discussion about PageRank and double counting, it is not clear to what degree Google (as opposed to others) uses machine learning as the main ingredient of their ranking technology. My understanding is that they need to use some ML techniques on some level to manage their hundreds of features, but they might also still use more human "ranking function hacking" than other players. But the end result is the same. And this goes to the core of cases (1), (2), and (3): (1) means no double counting if done right, and (3) means double counting speed (or overemphasizing speed) on purpose to push sites to increase theirs.

Philipp: I think you are mistaken about your concern on double counting. A sound ML approach will figure this out, at least "on average" if not for every unlucky individual site. Also, do not overestimate the importance of PageRank as in "simple link analysis" – that makes for a cute story and is thus often trotted out in articles, but things have moved on since 1998. If we want to keep on talking about PageRank as a measure of global site importance or quality, then we should assume that PageRank aggregates many signals such as links, click-through frequency, toolbar trails, spamicity, site update history, etc. So, either PageRank is not very important anymore, or it is not just based on links – take your pick :).
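For reference, the "simple link analysis" that the cute story is about is just power iteration over a damped link graph; a toy sketch with an invented three-page web, using the damping factor from the original 1998 formulation:

```python
import numpy as np

# Toy link graph: links[i][j] = 1 if page i links to page j (invented).
links = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
], dtype=float)

# Column-stochastic transition matrix: each page splits its vote
# evenly among its outgoing links.
M = (links / links.sum(axis=1, keepdims=True)).T

d = 0.85                    # classic damping factor
n = len(links)
rank = np.full(n, 1.0 / n)  # start uniform
for _ in range(100):        # power iteration to convergence
    rank = (1 - d) / n + d * M @ rank

print(rank)  # page 1, linked to by both other pages, comes out on top
```

Modern rankers would treat a score like this as one input among many, which is the point being argued above.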

Philipp Lenssen [PersonRank 10]

7 years ago #

> Philipp: I think you are mistaken about your concern on
> double counting. A sound ML approach will figure this out,
> at least "on average" if not for every unlucky individual site.

We don't know what exactly Google implements (unless you work at Google), so how can you be so sure? I'm merely pointing out the possibility that something like double counting could happen. I am not sure, but you sound as if you're sure that it can be ruled out... based on what evidence?

As far as backlinks/PageRank being of less importance these days, what makes you think that? What evidence do you see? Google keeps emphasizing they have 200 signals, but it may well be that (smart, refined, spam-disregarding) backlink counts are still a hugely important signal among those. I don't know either way, but how can you be sure to know that this is not the case?

Roy [PersonRank 0]

7 years ago #

Philipp, Google has probably the best people in the world working on the best machine learning research in the world. The idea of this "double counting" problem is pretty trivial. You describe a simple linear classifier, and I'm confident Google's classifiers have moved a little beyond that kind of starting point.

Anyway, again, the more important question is: What is Google training their classifier to maximise? If you can establish the answer to this question, I'd say you can be pretty confident that their classifiers are using the best known approaches (on the cutting edge of ML research) to maximise that metric. Perhaps their use of speed as a feature gives us a hint...?

WL [PersonRank 0]

7 years ago #

I have some queries though:

- What is your method of preloading the images?
- Have you cleared your cache before trying to load again?

Thanks!

Philipp Lenssen [PersonRank 10]

7 years ago #

> The idea of this "double counting" problem
> is pretty trivial.

But how is that trivial, do you think? After all, to discount the effect you would have to know exactly *which* backlinks are missing because a site was slow (so slow that the blogger or whoever pondered giving a backlink decided against it, and perhaps linked to the competition). A server running at its limits tends to be slow and also tends to go down when hitting the frontpage of hugely important sites like Digg or Slashdot, and this in itself can cause the loss of hundreds of backlinks because people can't see the site. For all we know, losing hundreds of backlinks *will* negatively affect your ranking in Google, never mind where you stand in the discussion of "PageRank is dead".

Perhaps the misunderstanding between our arguments is that you believe Google is merely trying to find corroborative evidence that a site is "very good". Perhaps for a lot of the queries you're right and that's enough, and then the speed signal will just be thrown into the mix without actual ranking relevance. However, I'm specifically talking about those portions of results where two sites are completely equal and relevant by all other signals, and Google's ranking will now have to decide which site to put on top – if one site is slower, it could now mean that this one site will drop to ranking #2, *even if* it was already penalized through a lack of backlinks (and would otherwise *already have been ranked better per all other signals* than the competing site, i.e. would have been shown on top and not even have been considered equal in Google's eyes to begin with).

Also, I don't think Google's algo is yet at the stage where it's self-evolving to maximum relevancy... it's still at a stage where programmer-editorial decisions are being made. They make the decision which signals to use, and there's a variety of human choices here. These decisions can be corroborated against click-through data from the real world when testing the prototype, and against the survey feedback from quality testers, but you still need to make the programmer-editorial decision of which alternatives to test against in the first place. If we presume that Google's quality testers were more satisfied after the addition of this speed signal (and we further presume that the test setup gives perfectly valid results), then perhaps all is well :)

> What is Google training their classifier to maximise?

Perhaps they're trying to maximize user happiness, as measured by click-through count, click-through speed, how long a user remains on the target site (instead of going back to do another search), how high the score returned by quality testers is, and perhaps more. Though it's a potentially dangerous metric too: imagine that there were a correct and a false answer to a research query, but the false answer is much more satisfying because its proclaimed "truth" is much easier to grasp, more positive for the user, causes less cognitive dissonance and so on. Further imagine that 90% of the researchers, say on the subject of a political discourse, only aim to back up their opinion which was already formed through government propaganda, and hence continue their search until they find their world view verified or expanded upon! Would these people consider it a "bad" result when the actual truth shines through, because they think it's a false answer?

> -What is your method of preloading the images?

I just load 50 rather highly compressed images at once, so the first ones will load first, and these are the kind of images one may spend several seconds looking at before slowly continuing to scroll down. I'm not saying this is a definitely better loading behavior (perhaps it's only a "niche" use that better fits the taste of a portion of users in comparison to a plain old paged thumbnail view, but even then that niche may have its use); I just tried to make the point that it's at least debatable, i.e. that sometimes perceived loading differs from actual loading, and you may find different examples. But generally speaking I would argue that many users just care about whether they can already start to read/use/look at the site, not specifically whether every last bit of the site has already finished loading.

Rodrigo D [PersonRank 0]

7 years ago #

Ad clicks increase considerably when a web page opens quickly. That is the reason behind this.

TS [PersonRank 1]

7 years ago #

Philipp: Roy already gave the answer. I do not work at Google, but I know that their people are quite competent, so I assume they won't make simple mistakes. Neither will the people at the other major engines.

On backlinks: My impression is that most people in the search engine research community think this is a fairly useful signal, but not one that dominates everything else. There is also some support for this impression in the literature – whenever people try to add PageRank to improve term-based rankings, the improvements tend to be moderate at best. Now, among the 300+ signals, links are probably more useful than most. But it all depends on the class of query and many other things, and no one signal apart from basic term frequencies really dominates across classes, AFAIK.

Philipp Lenssen [PersonRank 10]

7 years ago #

TS: It may not be a mere "mistake" – it could after all be, e.g., a strategic decision to boost the issue of speed more than a perceived quality improvement would justify... for instance because Google believes a faster web is a politically and socially desirable goal. I'm still looking to hear arguments that go beyond "we should trust Google will do everything right" or "their machine learning algos automatically take care of problems". Even IF you trust that they thought the problem through, I'm interested to find out what exactly you believe they might have found that takes care of it... because that could help shed light on their approaches.
