Google Blogoscoped

Forum

20-25% of searches are new every day

Philipp Lenssen [PersonRank 10]

Saturday, June 23, 2007
16 years ago, 49,277 views

Read/Write Web quotes from a presentation by Google VP of Engineering Udi Manber, who said:

"20 to 25% of the queries we see today, we have never seen before"
http://www.readwriteweb.com/archives/udi_manber_search_is_a_hard_problem.php

This isn't news, as Alan Eustace said the same thing at the Google Press Day 2006*, but it was hard to believe back then and Alan didn't seem 100% sure, so this is an interesting confirmation. Also, Matt Cutts at Search Engine Land says, "It's definitely that high. The long tail is very, very long."**

* http://blogoscoped.com/archive/2006-05-10-n76.html
** http://searchengineland.com/070622-085337.php

Tony Ruscoe [PersonRank 10]

16 years ago #

They said something similar at Google Press Day this year. I thought I'd made a note of it but can't seem to find it...

Tony Ruscoe [PersonRank 10]

16 years ago #

Ah... here's a photo of the statistic which I took at Google Press Day:

http://farm2.static.flickr.com/1295/597968628_2366739cfb.jpg

http://www.flickr.com/photos/ruscoe/597968628/in/set-72157600445949884/

Roger Browne [PersonRank 10]

16 years ago #

This implies that Google has kept records of search terms "for all time". Otherwise, a claim like "never before" is empty.

It would be easy enough for someone to use the semi-anonymised search logs that AOL released to check whether this figure is in the right ballpark.

My own guess is that the figure of 20 to 25% doesn't just refer to the text of the search term, but to the totality of the query: search text, country, personalization etc.

Philipp Lenssen [PersonRank 10]

16 years ago #

Interesting implication you bring up there Roger. Now I'm curious...

Ionut Alex. Chitu [PersonRank 10]

16 years ago #

<< My own guess is that the figure of 20 to 25% doesn't just refer to the text of the search term, but to the totality of the query: search text, country, personalization etc. >>

That's definitely not true. If you took into account all these factors, most queries would be unique.

Martin Porcheron [PersonRank 10]

16 years ago #

I wonder if they count the suggested queries, refined queries and the spelling-corrected queries as different ones?

Roger Browne [PersonRank 10]

16 years ago #

Ionut: I wonder if that's the case. I would think that most search queries come from people who are not logged in, and are for really common searches e.g. [paris hilton].

Ionut Alex. Chitu [PersonRank 10]

16 years ago #

Yes, but there are also people who enter questions. And those questions can be pretty long/complicated/diverse. There are also news stories about not-very-well-known persons/events, people who enter URLs in the search box, spelling errors, subjects that very few people are interested in.

http://farm1.static.flickr.com/16/21051526_6112a9caa7.jpg

JohnMu [PersonRank 10]

16 years ago #

Time to dig out the old AOL database :-)

Philipp Lenssen [PersonRank 10]

16 years ago #

<<This implies that Google has kept records of search terms "for all time". Otherwise, a claim like "never before" is empty.>>

I now got a statement from Google:

<<First, I want to clarify that we do not keep searches from 1998. The 20-25% we've stated is only an estimate, which is why we gave it a wide range. We cannot compute the exact number, so we gave a ballpark number, based on some reasonable assumptions.>>

So basically, they made up that number for marketing purposes ;)

Matt Cutts [PersonRank 10]

16 years ago #

Philipp, I think that's a pretty accurate estimate if you look over a time period of a month or so. So if you had queries from the last month or so, 20-25% of queries the next day would be new/unique. It also depends a little bit on whether you're defining it only as web queries, or all queries to Google (e.g. blog search, book search, patent search, etc.).
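To make the "new against the last month or so" definition concrete, here is a minimal sketch. It is not how Google computes the figure; the function names, the normalisation rule (lower case, ASCII punctuation stripped, whitespace collapsed) and the sample queries are all assumptions made for illustration. It counts by search, not by distinct query, and compares only against the history window, so a query repeated within the same day still counts as new each time.

```python
import string

# Assumed normalisation: strip ASCII punctuation, lower-case, collapse spaces.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(query: str) -> str:
    """Return a canonical form of the query text (hypothetical rule)."""
    return " ".join(query.lower().translate(_PUNCT).split())


def new_query_share(history, day):
    """Fraction of the day's searches whose text never occurs in `history`."""
    seen = {normalize(q) for q in history}
    todays = [normalize(q) for q in day]
    if not todays:
        return 0.0
    return sum(q not in seen for q in todays) / len(todays)


# Made-up sample data: `history` stands in for the previous month's log.
history = ["Paris Hilton", "google maps", "weather berlin"]
today = [
    "paris hilton!",
    "udi manber search talk",
    "Weather Berlin",
    "aol search log readme",
]

share = new_query_share(history, today)
print(f"{share:.0%} of today's searches are new")
```

Shrinking or growing `history` changes the result, which is the difference between "not this month" and "never".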

Philipp Lenssen [PersonRank 10]

16 years ago #

Good that we got that corrected, there's a pretty big difference between "not this month" and "never"...

Roger Browne [PersonRank 10]

16 years ago #

The README file for the AOL search log [1] tells us that the log contains 21,011,340 searches comprising 10,154,742 unique queries (after normalising to lower case and removing most punctuation).

In other words, the average number of occurrences for each query was about two. A quick browse [2] through some of the AOL data [3] shows that many queries appear more than twice, so there must be a large number of uniques to compensate.

A larger data set, such as Google's, would have fewer uniques, but this quick glance at AOL's logs leaves me satisfied that 20-25% uniques after a month is believable.
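The arithmetic behind "about two", as a quick check using only the two totals quoted from the README (the variable names are mine):

```python
# Totals as quoted from the AOL README above.
total_searches = 21_011_340
unique_queries = 10_154_742

# Average number of times each distinct query appears in the log.
mean_occurrences = total_searches / unique_queries

# Distinct queries as a share of all searches.
unique_share = unique_queries / total_searches

print(f"mean occurrences per distinct query: {mean_occurrences:.2f}")
print(f"distinct queries / total searches:   {unique_share:.1%}")
```

Note that this ratio counts distinct queries within the log itself; it is not the same measure as the share of one day's searches that were unseen in an earlier period.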

Coming at this from a different angle, section 2 of the published paper "A Picture of Search" by AOL/Raybeam [4] is interesting. From figure 2.1 it looks as if their data shows that the top 25% of searches came from just 5000 unique queries whereas the bottom 25% of searches came from 80 million unique queries.

Figure 2.3 sheds some light on the difference between "not this month" and "never". About 65% of May's queries also occur in June, but under 40% of May's queries also occur in the following November, and only about 25% of May's queries also occur in the following April.
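A rough sketch of that kind of month-to-month comparison. I am assuming the overlap is measured over distinct queries; the paper may weight by search volume instead. The function name and the sample queries are made up.

```python
def carryover_share(base_queries, later_queries):
    """Share of distinct queries in `base_queries` that recur in `later_queries`."""
    base = set(base_queries)
    if not base:
        return 0.0
    return len(base & set(later_queries)) / len(base)


# Made-up sample data standing in for two months of normalised queries.
may = ["paris hilton", "google maps", "cheap flights rome", "aol mail"]
june = ["paris hilton", "aol mail", "wimbledon scores"]

share = carryover_share(may, june)
print(f"{share:.0%} of May's distinct queries also occur in June")
```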

The paper includes lots of other interesting information. For example: the average search query contains 3.5 terms (words); 20% of the users perform 70% of the searches; 40% of users on a given day perform only one search that day, etc.

[1] http://www.gregsadetsky.com/aol-data/U500k_README.txt
[2] http://data.aolsearchlogs.com/log/list.cgi
[3] http://www.gregsadetsky.com/aol-data/
[4] http://www.ir.iit.edu/~abdur/publications/pos-infoscale.pdf

JohnMu [PersonRank 10]

16 years ago #

I checked the AOL database and came up with 59% unique queries – see http://www.cre8asiteforums.com/forums/index.php?showtopic=51740

Who'da thunk :-) – that's a big number.
