Google Blogoscoped

Forum

AOL Shared Private Search Queries

Seth Finkelstein [PersonRank 10]

Monday, August 7, 2006
17 years ago, 18,215 views

Now we see if the sky falls! AOL did more than give us data – it gave us an experiment! :-)

Nadir [PersonRank 0]

17 years ago #

I think Google must be very proud of AOL right now.

mc [PersonRank 3]

17 years ago #

You will also notice there is timestamp information – so it shouldn't be too hard to link an IP address to a UserID if the searcher happens to have clicked on your webpage – or in fact any government site. That is a really big privacy breach.

pri [PersonRank 5]

17 years ago #

i love this line:

AOL adds: “Please be aware that these queries are not filtered to remove any content.”

how very proud they are.... what a shame.

Philipp Lenssen [PersonRank 10]

17 years ago #

You are right MC! If you happen to be one of those sites clicked on, and you only received one hit that second, and the user happens to sign up... you can connect their full name to the rest of their searches.

I can't believe AOL did this. What were they thinking?

Users are often tracked over the course of a full three months, from March 1 to May 31. That can add up to hundreds or thousands of searches. I'm looking at random users here, some of them accidentally paste full emails (including full names) into the search box. Most everyone will reveal their home town, the jobs they're looking for, the hotels they check into, their fetish, their health problems, etc.

Philipp Lenssen [PersonRank 10]

17 years ago #

> and you only received one hit that second

Correcting my comment: you don't even need to look for that same second. You've got the AOL search referrer in your logs, which is enough. E.g. this blog and one of my other sites appear 148 times in the logs...

Seth Finkelstein [PersonRank 10]

17 years ago #

I foresee a grand distributed project to reverse-engineer all the identities!

I'm joking ... I think. I hope?

Mathias Schindler [PersonRank 10]

17 years ago #

Thanks to prefetching, you don't even have to click on the first link that comes up on the results page, if the search engine supports prefetching and so does your browser.

Ridwan [PersonRank 1]

17 years ago #

I can't believe it. AOL has kind of betrayed their users and they are saying proudly : Nothinz Filtered. On the other hand I should say Google really cares about the privacy policy and the user's privacy as well. They are the best from all aspects.

/pd [PersonRank 10]

17 years ago #

MC has actually hacked the completed data set.. that POC theory is completely viable!!

/pd [PersonRank 10]

17 years ago #

Just one question... in the CAVEAT EMPTOR section, it states

" Please understand that the data represents REAL WORLD USERS, un-edited and randomly sampled, and that AOL is not the author of this data. "

Why is AOL not the author of the data ??

Reto Meier [PersonRank 10]

17 years ago #

I *think* what they're saying is that:
a) AOL didn't 'compile' this as such, they simply logged a random sample of data. Much like Google News isn't the author of what appears on the front page.
b) AOL don't perform the actual search (Google does).

I'm thinking a more than b though.

Tiago Serafim [PersonRank 4]

17 years ago #

Wow... the internet was sooo boring.. now I have something to play with... joking

I don't think it's necessary to import the data into a database... just keep it on your HD and wait until Google Desktop Search indexes it

Niraj Sanghvi [PersonRank 10]

17 years ago #

Wow. What a staggering lack of thinking things through.

At least everything else is going well for them and will let this get overlooked. Oh, wait... http://seattletimes.nwsource.com/html/businesstechnology/2003173056_webaol03.html

John Krystynak [PersonRank 1]

17 years ago #

I think commenters here are over-reacting. The data set is anonymized, doesn't include user IP addresses, and doesn't seem to enable any tracking of specific user personal information.

The only way you'd be able to tie a query to an *IP address* is if you were the site owner and had log files and could match up timestamps. However, you already have that info if you are a site owner. I don't think anyone could do it across sites.

Before everyone pillories AOL, I think one should give them some credit too. They have a research division that is doing work on search engine use. The people in the research division decided to share data for academic use.

This data is interesting if you want to learn how people search however.

Their research "wiki" shares other data sets and has good papers on how people are using search.

Looking at the data, it's very hard to see how it can be connected to an individual. The anonymous ID is not a cookie, it's simply an integer index of a session.

The data set includes {AnonID, Query, QueryTime, ItemRank, ClickURL}.
   AnonID – an anonymous user ID number.
   Query – the query issued by the user, case shifted with
   most punctuation removed.
   QueryTime – the time at which the query was submitted for search.
   ItemRank – if the user clicked on a search result, the rank of the
   item on which they clicked is listed.
   ClickURL – if the user clicked on a search result, the domain portion of
   the URL in the clicked result is listed.

Here is some actual data from the dataset. Please tell me exactly how this is going to be tied back to someone's name?

AnonID  Query                       QueryTime            ItemRank  ClickURL
53      mapquest                    2006-03-01 15:18:21  1         mapquest.com
66      cajun candle                2006-03-01 13:20:18  1         cajuncandles.com
66      candle jars                 2006-03-01 13:22:29  1         sks-bottle.com
66      muic.com                    2006-03-01 22:42:05
66      i need a company name       2006-03-02 12:32:59
66      candle company names        2006-03-02 12:34:27
66      funstuff.alaskamarinos.com  2006-03-04 09:22:30
66      butt naked fragrance oil    2006-03-04 11:55:28  7         candlesupply.com
66      liquid candle dye           2006-03-04 13:20:05  1         craftcave.com
66      liquid candle dye           2006-03-04 13:20:05  3         candlesupply.com
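
For anyone who wants to read the format described above programmatically, here is a minimal Python sketch of a parser. It assumes the fields are tab-separated, that timestamps follow the `YYYY-MM-DD HH:MM:SS` layout seen in the sample, and that ItemRank and ClickURL are simply missing or empty on rows without a click. The names `SearchRecord` and `parse_line` are made up for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SearchRecord:
    anon_id: int
    query: str
    query_time: datetime
    item_rank: Optional[int] = None   # only set when a result was clicked
    click_url: Optional[str] = None   # only set when a result was clicked


def parse_line(line: str) -> SearchRecord:
    """Parse one tab-separated row (assumed layout, see the note above)."""
    fields = line.rstrip("\n").split("\t")
    anon_id, query, query_time = fields[0], fields[1], fields[2]
    # Rows without a click may have the last two fields missing or empty.
    rank = fields[3] if len(fields) > 3 and fields[3] else None
    url = fields[4] if len(fields) > 4 and fields[4] else None
    return SearchRecord(
        anon_id=int(anon_id),
        query=query,
        query_time=datetime.strptime(query_time, "%Y-%m-%d %H:%M:%S"),
        item_rank=int(rank) if rank is not None else None,
        click_url=url,
    )


clicked = parse_line("66\tcajun candle\t2006-03-01 13:20:18\t1\tcajuncandles.com")
no_click = parse_line("66\ti need a company name\t2006-03-02 12:32:59\t\t")
print(clicked)
print(no_click)
```

If the real files use a different delimiter or include a header row, the split and the caller would need adjusting accordingly.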

Philipp Lenssen [PersonRank 10]

17 years ago #

Update: I've included some top searches.

Philipp Lenssen [PersonRank 10]

17 years ago #

> The only way you'd be able to tie a query to an
> *IP address* is if you were the site owner and
> had log files and could match up timestamps. However,
> you already have that info if you are a site owner.

Yeah, but you didn't have that guy's complete search history over the course of *3 months*. By search referrers you just had *1* search term, which was the one leading to your site, and by definition it wasn't "private" data because you already had it publicly on your site, apparently (or else the search wouldn't have resulted in your site).

> Please tell me exactly how this is going to be tied
> back to someone's name?

I tried to give an example in the 2nd paragraph of my post. That was one example of many, many possible ways to get the identity of a searcher by looking at hundreds of his searches. I *already* found full names and addresses in the dataset even though I only analyzed tiny portions of it.

Reto Meier [PersonRank 10]

17 years ago #

John Krystynak: You're not thinking big enough. I've not got the data in front of me, but look at Philipp's example on the main page – someone's entered a whole email into the search box! With her email address included!

A lot of people regularly search for their name and websites, to check their websites' rankings in results. Going backwards, if you were to see a 'questionable' search, then filter the results to show what else that particular ID searched for – then found searches for my name and websites, well you'd have a good idea I was the one who did the original search.

This gets worse for people who know you. Your boss / partner / friends will probably be able to recognise you based on your search patterns... Then they can find out what else you searched for. Not cool.

From the examples given it seems like some people's searches make it even less of a challenge by searching for full sentences with their own (and other people's) names included.

Splasho [PersonRank 10]

17 years ago #

Yeah, in the past three months I've searched for my name, my telephone number, "links: splasho.com". Luckily it was on Google not AOL.

Adam B. [PersonRank 4]

17 years ago #

John,

I think you are under-reacting. How many times have you googled your own name? Or maybe those of a friend, relative, etc. I know I have more than a few times. Have you ever searched for a store near your house, directions, etc? I think it would be very easy to make a link between search queries and the person who makes them. When Google was forced to comply with the government, they simply submitted a random collection of search queries. None of these were tied to a specific user, anonymous or not. There were simply a few thousand random words and phrases that were entered into Google. AOL did something far, far worse. Since all data was tied to a specific random user ID, you can find out just about anything you would need to know about the person based on their searches. Why anybody thought it would be a good idea to release this info is beyond me.

Inferno [PersonRank 10]

17 years ago #

AOL is stepping towards something like this:

1) No more risk of password hacking
2) No more difficulties in finding info
How it works? No more password protection. all you need is the user ID

http://blogoscoped.com/forum/60670.html#id60732

Reto Meier [PersonRank 10]

17 years ago #

I'm truly astonished. Their attempt at anonymising users by search/replacing 'name' with 'random number' is possibly one of the most poorly thought-through concepts I've ever heard of.

Search history is *highly* identifiable. It's like following someone around for three months without knowing their name: how long would it take you to find out who they were? Three months! With a concerted Net effort I would think almost all of the 'anonymous' users could be identified before too long. Particularly as they're all AOL users who are likely to use AOL search for *all* their searches.

Philipp Lenssen [PersonRank 10]

17 years ago #

Exactly Adam. And *even though* the Google case was based on a subpoena, and *even though* it didn't request user IDs (from what we know), and *even though* Google was supposed to only hand it over to 1 group as opposed to making it public... Google actually fought that subpoena. Then again, based on subpoenas we need to remember that even Google sometimes hands out definitely private data (e.g. http://blogoscoped.com/archive/2006-07-28-n33.html). But no matter what you think of that, that's a whole different league than what AOL did...

Writer Shore [PersonRank 1]

17 years ago #

I'm somewhat surprised that none of the major news companies have any stories on this yet (Google News, CNN, etc). Does anybody have any non-blog "traditional" media reporting on this? Any reason why it wouldn't?

The only articles I've found on AOL today are items regarding their imminent elimination of 5000 jobs and their new strategy to give away most of the previously subscription-based services.

Milly [PersonRank 10]

17 years ago #

> Splasho
> Yeah, in the past three months I've searched for my name,
> my telephone number, "links: splasho.com". Luckily it was on
> Google not AOL.

Splasho, do you anonymise your Google searches? If not, what's happened to hapless AOL users might happen to you tomorrow, or next year, or in ten years: http://www.imilly.com/google-cookie.htm#aol

The people at AOL Research aren't fools or marketing twonks: they're clever and serious people doing serious (and well-intentioned) things ... who made a big mistake.

Google's clever and serious people certainly haven't been immune to making big mistakes (the privacy/security breaches of their Web Accelerator, for example). Then there's possible intrusion or extrusion security breaches (no one's security is perfect); Government intrusion (maybe next time, if not last time); future change of policy (or change of implementation of policy); future change of ownership/control; and many more possibilities.

The lesson for ordinary searchers from this debacle isn't just to avoid AOL (it could happen with *any* engine/company), but to try to avoid unnecessarily tagging your searches with a persistent 'User ID'.

Milly [PersonRank 10]

17 years ago #

Seth, I expect your 'joke' may well be right about some kind of double-edged distributed community effort, to name and shame AOL (and others insufficiently careful with such data) by gleefully naming and shaming hapless users.

But I also expect there'll be some people aiming at a more direct financial link, by reverse-engineering identities and selling them (with any associated tidbits revealed) in the phishing marketplaces :(

AN [PersonRank 3]

17 years ago #

I think what AOL means is that the users are the authors of the queries, so they are responsible for the potentially offending content in those queries.

Milly [PersonRank 10]

17 years ago #

John Battelle has something from the horse's mouth: http://battellemedia.com/archives/002792.php

"This was a screw up, and we're angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant.

Although there was no personally-identifiable data linked to these accounts, we're absolutely not defending this. It was a mistake, and we apologize. We've launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.

Here was what was mistakenly released:

* Search data for roughly 658,000 anonymized users over a three month period from March to May.

* There was no personally identifiable data provided by AOL with those records, but search queries themselves can sometimes include such information.

* According to comScore Media Metrix, the AOL search network had 42.7 million unique visitors in May, so the total data set covered roughly 1.4% of May search users.

* Roughly 20 million search records over that period, so the data included roughly 1/3 of one percent of the total searches conducted through the AOL network over that period.

* The searches included as part of this data only included U.S. searches conducted within the AOL client software."
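
As a quick sanity check on the figures in that statement, the arithmetic can be redone in a few lines of Python. This is only a sketch: the four inputs are copied from the statement itself, nothing comes from the dataset, and the variable names are invented for the example.

```python
# Figures quoted in AOL's statement.
users_in_dataset = 658_000
may_unique_visitors = 42_700_000
records_in_dataset = 20_000_000
stated_share_of_searches = (1 / 3) / 100  # "1/3 of one percent"

# Share of May search users covered by the data set, in percent.
user_share_pct = 100 * users_in_dataset / may_unique_visitors

# Total number of searches implied by the "1/3 of one percent" claim.
implied_total_searches = records_in_dataset / stated_share_of_searches

print(f"share of May users: {user_share_pct:.2f}%")
print(f"implied total searches: {implied_total_searches:.1e}")
```

Dividing 658,000 by 42.7 million gives a little over 1.5%, so the "roughly 1.4%" in the statement is in the right region but slightly low on these inputs. The second figure implies on the order of 6 billion searches over the three months.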

As usual with corporate (and many other ;) apologies, they couldn't bring themselves to be truly honest :-

"There was no personally identifiable data provided by AOL with those records, but search queries themselves can sometimes include such information."

Well that's true, as far as it goes: but the real problem (as Philipp has already demonstrated) is with enabling *linking* one such 'search query with personally identifiable data' to another query (itself with or without personally identifiable data), or another hundred, via a 'User ID' (i.e. via a GUID or 'key': http://www.imilly.com/google-cookie.htm#GUID).

John Krystynak [PersonRank 1]

17 years ago #

OK, I see your point, and admit that you have session data that could be probabilistically tied to some name, but it's still just a probabilistic link.

This type of data is available to a lot of people – ISPs, large web site and search engine employees, anyone who distributes toolbars or adware. It is available for purchase if you know where to ask. And those sources have *much more* specific information on who is typing what – i.e. they have IP, GUID and database links to your account.

I still think that AOL's attempt to release a sample of search session data for research purposes deserves some credit.

In the end if you think that what you type into a search engine, or any of your behavior on the internet is private, you are mistaken. Maybe many people don't realize that, and maybe incidents like this will make it more clear. I doubt that it will matter to most people however.

For those who care, there are options like privoxy and tor...

SomeGuy [PersonRank 0]

17 years ago #

Whew! I'm suddenly so happy that I don't go to AOL to fulfill my m*dg*ts sc*t fetish needs!

Philipp Lenssen [PersonRank 10]

17 years ago #

I posted a 2nd update with Milly's info. Found the same AOL statement on Financial Times now. Mainstream media is slowly catching up. I bet it will be all over the place in under 24hrs.

Philipp Lenssen [PersonRank 10]

17 years ago #

> This type of data is available to a lot of
> people – ISPs, large web site and search
> engine employees,

It's a good point that search engines already have this data, in fact, Google will most likely be able to connect it to your full name (whenever you're logged into the Google Account). Google also knows your emails if you use Gmail, etc. Still, that data is not completely public, and users have *some* idea that something they type into Google.com is processed by Google Inc. Or let me put it the other way round: by your measurement, it would somehow make sense for AOL to release all email data too, because AOL already has that data, so people will then better understand the privacy issues etc. But that's throwing out the baby with the bathwater as we say here.

> I still think that AOL's attempt to release a
> sample of search session data for research
> purposes deserves some credit.

I find the data extremely interesting and will do more research with it indeed. If I were an AOL user tho, I'd stop using their search. (I'm not an AOL user 'cause they at one time during the 90s didn't allow me to cancel my trial subscription to their service... since then, I never trusted them again. Right choice I guess...) They could have done something like Google Trends – help the research community and *still* think it through properly. Google Trends just delivers aggregate search data and they clearly have a threshold so no exotic (personal-only) searches turn up.
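
To make the "aggregate data with a threshold" idea concrete, here is a small Python sketch. It is not Google's actual method, which isn't described in this thread; the function name `aggregate_queries` and the cut-off of 3 distinct users are assumptions chosen only for illustration.

```python
from collections import defaultdict

MIN_USERS = 3  # hypothetical cut-off, chosen only for this example


def aggregate_queries(rows, min_users=MIN_USERS):
    """Return {query: number_of_distinct_users}, dropping rare queries.

    `rows` is an iterable of (user_id, query) pairs. User IDs are used only
    to count distinct searchers and never appear in the output.
    """
    users_per_query = defaultdict(set)
    for user_id, query in rows:
        users_per_query[query.strip().lower()].add(user_id)
    return {
        query: len(users)
        for query, users in users_per_query.items()
        if len(users) >= min_users
    }


sample = [
    (1, "mapquest"), (2, "mapquest"), (3, "MapQuest"), (3, "mapquest"),
    (2, "candle company names"),
]
print(aggregate_queries(sample))
```

The point of the design is that the published output contains no per-user key at all, and a query typed by only one or two people never shows up, so there is nothing to link one search to another.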

John Krystynak [PersonRank 1]

17 years ago #

I think a likely result is that the research divisions of the search engines will all become a lot less willing to share data in any form, due to potential liability.

So it'll be harder for researchers (and yes, marketers) to understand search behaviour.

I don't think sharing email data would extend from my argument – and I'm guessing that privacy laws would already protect that. Not sure though. Here's a question: should there be laws to guarantee privacy for users when they use search engines? I don't think there should be, but I haven't thought a lot about it – I just assume that the privacy isn't there.

In fact, I used to do things like type an address near my own, but not my actual address when doing map searches, just to add a bit of noise for people trying to correlate my searches and my address. However, now that the Google account has all my info anyways, I've given up on that weak technique.

Still, I think session data is interesting because you can try to find patterns in the search process of users. If you learn that people try certain classes of searches regularly you may improve the results or the process.

I don't think Google Trends is that useful since so much data is removed and no volume metric is provided. It's a fun toy, but I'm not sure how valuable it is as a research tool.

Philipp Lenssen [PersonRank 10]

17 years ago #

> should there be laws to guarantee privacy for
> users when they use search engines?

Let's take a look at AOL's privacy policy...

<<Your AOL Network information consists of personally identifiable information collected or received about you when you interact with the AOL Network's Web sites, services and offerings as a registered user. Depending on how you use the Network, your AOL Network information may include (...)

information about the searches you perform through the AOL Network and how you use the results of those searches; (...)

Your AOL Network information will not be shared with third parties unless it is necessary to fulfill a transaction you have requested, in other circumstances in which you have consented to the sharing of your AOL Network information, or except as described in this Privacy Policy. The AOL Network may use your AOL Network information to present offers to you on behalf of business partners and advertisers. These business partners and advertisers receive aggregate data about groups of AOL Network users, but do not receive information that personally identifies you.>>

So maybe the question is: are sites legally bound to their Privacy Policy by US law?

Travis Harris [PersonRank 10]

17 years ago #

I think we are all missing an important point here. The issue is not what information people may find about me, but what information they may find and misconstrue.

For example, I am a very strong anti abortion person. I may be looking up different kinds of birth control online to find out if they are abortive for a discussion I'm having with some other Christian guys.

(enter the fiction)

I'm an AOL user, and my wife gets her hands on this file to "spy" on me. She sees that I am looking up all sort of birth control. She is 6 months pregnant and suddenly "knows" that I am having an affair and files for a divorce! I tell her what happened, but why would she believe me? After all, she just found out I'm cheating on her. We end up getting divorced, our kids get messed up over the whole ordeal, their marriages suffer for it! The effect of a couple employees at AOL may cause HUGE damage on my family for generations to come!

Not a big family person, so this does not get you?! What if it was slightly different? Maybe my wife (due to her pregnancy) is in a very emotional state when she "finds out" and commits suicide, taking her life and the unborn baby's. Leaving a husband and two young children who really need her.

I could think up dozens of other little twists here that are horrible to even think. This could be bad!

gary price [PersonRank 10]

17 years ago #

You write "oh the irony" about Google being the first result at AOL Search. Yes, given Google's stance earlier this year it is.

However, it comes as no surprise that Google is the number one search query.
Yes, as advanced searchers (Google Blogoscoped readers) we understand the differences, but we often forget that many/some/a lot of people don't.

Look at the most popular search terms at Dogpile for 2005 and you'll see what I mean.
http://blog.searchenginewatch.com/blog/051216-184738

Kind of a double whammy for Dogpile since their results already contain Google results.

Niraj Sanghvi [PersonRank 10]

17 years ago #

The story has hit the digg frontpage: http://digg.com/security/AOL_Apologizes_For_Release_Of_User_Search_Data

AOL has publicly apologized (http://news.com.com/2100-1030_3-6102793.html?tag=nefd.top) saying:

"This was a screw-up, and we're angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant," AOL, a unit of Time Warner, said in a statement. "Although there was no personally identifiable data linked to these accounts, we're absolutely not defending this. It was a mistake, and we apologize. We've launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again."

Travis Harris [PersonRank 10]

17 years ago #

Ya know, I suddenly feel bad for downloading that file and being part of the problem. The more this word gets around, the more people could get destroyed by it. But I guess it is too late, and freedom of information is incredibly important.

Niraj Sanghvi [PersonRank 10]

17 years ago #

Travis, I don't think you're part of the problem unless you downloaded it for malicious purposes. The real problem at this point is probably the sites that have duplicated the content and are making it freely available.

alaskan.greg [PersonRank 1]

17 years ago #

Hello, I'm new to all this so please bear with me. What you guys are talking about here really scares me. Between the posts on this AOL topic and the forum rules I wasn't really sure whether to use my full real name; if I have broken the rules, Philipp, could you please let me know.

Milly, if you don't mind could you briefly explain how to "anonymize" my queries? And, if I understand you what difference would it make anyway, as one of the other posters said or implied that none of that is of initial importance if someone can track the msg back to the "IP." I hope I have that right.

Finally, if I may, reading about "caches" I had a troublesome thought, when "Google" and "Firefox" talk about caches as did Philipp above, are they talking about some storage place In Addition To My Computer's Cache?

Alaskan Greg

N0t Ma Reel nom [PersonRank 0]

17 years ago #

I rushed to download this puppy, but I see I didn't have to. So many mirrors...

I'm more amused than upset. Those of us that work in the web industry know that even if you take extra steps you can't really hide. It's not easy and the average person is constantly dropping loaves of bread everywhere they go. Given just a few pieces of information I can usually track down someone and get more information about them. Phone number, name, address, email address, etc. It's amazing what you can find on the web. The more people use it, the easier it is.

Question: Are there really many of you involved with the Internet that are surprised at lack of privacy? Just think: AOL has all the IPs, email addresses, and screen names of everyone (Not to mention MSN.com has. Now THAT'S scary!).

Here's another bit of information while we are on the subject: Search query data is the tip of the iceberg. I got it directly from someone at Alexa that they not only can see query data, they can also see all URLs you type or click on, and that includes all the pages of all the sites that you visit. I asked them if that meant they could get (limited to Alexa users) traffic stats from other sites and the answer was "yes". Gee, talk about access to competitive data...

Ok, I gotta go see if I can read through that scientific paper that used that data. I'm wondering if they saw the trees in the forest... :-)

AOLSearchLogs.com [PersonRank 0]

17 years ago #

We also have a web interface to the data. It is available at http://www.aolsearchlogs.com

t xensen [PersonRank 4]

17 years ago #

On the plus side, the data lends itself to a multitude of matchmaking possibilities:

http://stutteringhand.blogspot.com/2006/08/romance-hogs-and-aol-user-66.html

Philipp Lenssen [PersonRank 10]

17 years ago #

Weird. When I import the data into MySQL using LOAD DATA INFILE, I end up with 37.6 million queries, not 20 million as AOL says...
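
One thing that might account for part of the gap (a guess, not something confirmed in this thread): the sample rows posted earlier show the same user, query and timestamp repeated once per clicked result, so counting rows is not the same as counting queries. A rough Python sketch of the difference, assuming each row has already been split into an (AnonID, Query, QueryTime, ItemRank, ClickURL) tuple; `count_rows_and_queries` is a made-up helper name.

```python
def count_rows_and_queries(rows):
    """Return (total_rows, distinct_query_events).

    A "query event" here means a unique (AnonID, Query, QueryTime) triple;
    that definition is an assumption made for this sketch.
    """
    total_rows = 0
    events = set()
    for anon_id, query, query_time, *_click in rows:
        total_rows += 1
        events.add((anon_id, query, query_time))
    return total_rows, len(events)


rows = [
    (66, "liquid candle dye", "2006-03-04 13:20:05", 1, "craftcave.com"),
    (66, "liquid candle dye", "2006-03-04 13:20:05", 3, "candlesupply.com"),
    (66, "candle company names", "2006-03-02 12:34:27", None, None),
]
print(count_rows_and_queries(rows))
```

On the full data set an in-memory set of this size would be heavy; a `COUNT(DISTINCT ...)` over the same three columns in the database would be the more practical way to get the second number.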

Philipp Lenssen [PersonRank 10]

17 years ago #

> I wasn't really sure whether to use my full real name

Full names are preferred, but staying anonymous is accepted.

> Finally, if I may, reading about "caches" I had
> a troublesome thought, when "Google" and "Firefox" talk
> about caches as did Philipp above, are they talking about
> some storage place In Addition To My Computer's Cache?

Google stores a copy of (nearly) every web page. They also republish a copy, that's called the Google Cache (below the Google snippet, there's a link, unless your website disallows this). Don't ask me how the Google Cache can be legal under current US copyright laws, I think it's strange. :)

SEO Portal [PersonRank 1]

17 years ago #

@t xensen.. fun posting!

I'm really looking forward to all the analysis people will be doing the next couple of days.. I'm sure some really funny stuff can be found in that data ;)

Together with some really usable and disgusting information, I'm sure ;)

SEO Portal [PersonRank 1]

17 years ago #

I just opened the first few txt's (going to import in database now) and first thing I noticed is how many duplicate searches are there!
Does anyone have figures yet on how many results people click on per query on average?
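
No figures are given in this thread, but the number asked about could be computed along these lines. This is a hedged Python sketch: it assumes rows are (AnonID, Query, QueryTime, ItemRank, ClickURL) tuples with `None` for a missing click, and it treats each distinct (AnonID, Query, QueryTime) as one query, which is an assumption about how the duplicates arise. `click_stats` is a hypothetical helper.

```python
from collections import Counter


def click_stats(rows):
    """Return (avg_clicks_per_query, share_of_queries_without_click)."""
    clicks = Counter()
    for anon_id, query, query_time, item_rank, _url in rows:
        key = (anon_id, query, query_time)
        clicks[key] += 0            # make sure click-less queries are counted
        if item_rank is not None:
            clicks[key] += 1
    n = len(clicks)
    if n == 0:
        return 0.0, 0.0
    no_click = sum(1 for c in clicks.values() if c == 0)
    return sum(clicks.values()) / n, no_click / n


rows = [
    (66, "liquid candle dye", "2006-03-04 13:20:05", 1, "craftcave.com"),
    (66, "liquid candle dye", "2006-03-04 13:20:05", 3, "candlesupply.com"),
    (66, "candle company names", "2006-03-02 12:34:27", None, None),
]
print(click_stats(rows))
```

Two queries that happen to share a user, text and second would be merged by this key, so the result is an approximation at best.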

iZeitgeist [PersonRank 10]

17 years ago #

Look:

http://www.aolsearchdatabase.com/

via Techcrunch

SEO Portal [PersonRank 1]

17 years ago #

that's nice, but not what you want to do with the figures imho;

you want to get % or numbers on people's behaviour..

Reto Meier [PersonRank 10]

17 years ago #

Jesus. I've just had a play with the dataset (via iZeitgeist's link), and it's not pretty. It looks like thousands of AOL users got hit with a Paypal phishing email, and the first thing they did was enter it in to their search box:

"dear [full name removed from here] this email was sent automatically by the paypal server in response to your request to recover your password. this is done for your protection --- only you the recipient of this email can take the next step in the password recovery pr"

That's the full name – first, last, middle initials. I picked a guy at random and now I know his full name, where he lives (Ohio), where he's moving (Georgia), and what car he drives (Chevy van – he's looking to pimp it out). Turns out he's looking to insure his life as he may have stomach cancer. He's also looking to enlarge his, ah, 'assets' – but only if he wins the lottery.

Hello, AOL? Worst. Idea. Ever.

Alex Ksikes [PersonRank 10]

17 years ago #

This is really scary stuff.

How much more privacy are we gonna trade for convenience?

People should be much more aware of what's going on. In some sense it's fortunate AOL screwed up as it will raise public awareness.

Reto Meier [PersonRank 10]

17 years ago #

In their <cough>apology<cough> AOL claim that while this is a breach of their corporate policy (to not do stupid things), it's *not* a breach of their privacy policy.

I'd like very much to know if the other search engines – Google / Yahoo / etc. would consider it a breach of theirs.

Another bizarre thing, IMO, is that the mainstream media are reporting this as 'crazy bloggers force AOL to retract anonymous search queries' (see Times online), rather than 'AOL tells users: "smile and bend over"'

TheSarc [PersonRank 0]

17 years ago #

Check this out: AOL user says "just kidding" about searching "How to kill wife" 30+ times

http://thesarc.blogspot.com/2006/08/aol-user-says-just-kidding-about.html

Philipp Lenssen [PersonRank 10]

17 years ago #

TheSarc, you almost got me there...

Milly [PersonRank 10]

17 years ago #

> Milly, if you don't mind could you briefly explain how to
> "anonymize" my queries?

Greg, I don't want to spam this thread with links, so if you follow the URL in my first post you'll find my page which briefly, then in more depth, answers your questions (with links to provide even more depth, if you want it). The first few paragraphs, and FAQ 10, particularly.

But briefly, you can help to anonymise your queries by removing some (possibly all) persistent 'tags' which tie your queries together. On Google, for example, you can alter the GUID within the persistent cookie Google normally places via your browser. It's easily done using the GoogleAnon bookmarklet (provided at my page), or other means (suggestions at my page: cookie blocking or flushing; user scripts; etc).

> And, if I understand you what difference
> would it make anyway, as one of the other posters said or implied
> that none of that is of initial importance if someone can track
> the msg back to the "IP." I hope I have that right.

Generally, security/privacy measures can be important even when they're not perfect: it's usually a matter of a (personal, subjective) trade-off between convenience and effectiveness. Throwing in the towel and saying "You have no privacy. Get over it" (as Sun's Scott McNealy once did), is the best way to make it a self-fulfilling prophecy.

Specifically, with regard to IP addresses ... well, it depends :-

– If you have a permanent or persistent IP address, then yes, that makes it harder to anonymise your searches. You may or may not decide it's a reasonable trade-off for you to use an anonymising proxy (which can hide your IP, but may have convenience and speed drawbacks). Or you might decide that the risks of your IP 'tag' being retained and potentially disclosed are less than that for your GUID 'tag', and that you'll live with the lesser risk, if you can remove the greater.

- If you have a non-persistent or changeable IP address (most dialup users do, for example), then that 'tag' may only last for a single session, and thus is likely to be of much, much less risk as a 'tag'. (Which is one reason why it might be retained/logged less often or less scrupulously than specifically assigned 'tags' like cookie GUIDs or Screen Name logons). Even so, someone with access to both your search engine logs and your ISP logs (government, common owner, commercial partners, etc) could connect the dots. But that's less likely than if it's all already in one giant database.
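To make "connect the dots" concrete, here is a small Python sketch of how two separate logs could be joined on IP address and time. Everything in it is an assumption for illustration: the row layouts, the documentation-range IP address, the subscriber labels and the helper name `who_searched` are invented, and no real log format is implied.

```python
from datetime import datetime


def who_searched(search_log, isp_leases):
    """Pair each search with the subscriber who held that IP at that moment."""
    matches = []
    for ip, when, query in search_log:
        for lease_ip, start, end, subscriber in isp_leases:
            if ip == lease_ip and start <= when < end:
                matches.append((subscriber, query))
    return matches


t = datetime.fromisoformat

# Hypothetical search engine log: (ip, time, query)
search_log = [
    ("203.0.113.7", t("2006-03-01T10:05:00"), "hotels in juneau"),
    ("203.0.113.7", t("2006-03-02T21:30:00"), "back pain remedies"),
]

# Hypothetical ISP log: (ip, lease start, lease end, subscriber)
isp_leases = [
    ("203.0.113.7", t("2006-03-01T09:00:00"), t("2006-03-01T12:00:00"), "subscriber-A"),
    ("203.0.113.7", t("2006-03-02T20:00:00"), t("2006-03-02T23:00:00"), "subscriber-B"),
]

print(who_searched(search_log, isp_leases))
```

In this made-up data the same IP belongs to two different subscribers on two different days, which is why a changeable IP is a weak 'tag' on its own but still resolvable by anyone holding both logs.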

So, the best plan (in my view) is to weigh up the risks and conveniences which are of concern to you, and act accordingly. If you wouldn't be without Google Personalised Search, for example, then there's little point worrying about their cookie, or your IP. You're already betting that Google will never make a big mistake, or have a big change of heart over the balance between ethics and commerce (or rather another one, counting China), or that the government won't data-mine the lot, perhaps secretly (or hasn't already).

If the balance of your concerns is a little more toward privacy, then maybe it's worth not using 'logged-in' services or search toolbars (or only for non-sensitive searching); or using TOR to hide your IP; or blocking/flushing/subverting your cookie; or other such practices.

Only you can decide what's right for you. But giving away the least information you reasonably can, whilst still being happy about the convenience and utility of searching, seems the right path, to me.

For myself, that means using the GoogleAnon bookmarklet ('cos it's set-and-forget easy), and only using Google Account services discretely (e.g. my separate SiteMaps or Gmail Accounts don't 'know' about my Search History). If I had a permanent IP, I'd probably use TOR, but I'd still zap my cookie GUID even if I didn't. Think of it as a form of layered defence.

alaskan.greg [PersonRank 1]

17 years ago #

Philipp, thanks for responding to my post. It won't change anything for me personally, but it seems that if anybody keeps something (a web page I visited, for example) on file without expressly telling me so up front in regular print, then that is not just a matter of legality but one of ethics. But what do I know, I'm just an uninformed (read that as gullible) consumer that they make their living on.

If possible, I would still like to know what Milly was talking about..."anonymize" and how that is done.

I'm beginning to work with a Google link (Computer Hope) and their "YouOS" add on for Mozilla Firefox. What I hope to gain (within my limited abilities) is a clear grasp/picture of the foundational structure behind my computer and its OS, and then of where all the players are and what their roles/functions are when I am communicating with you and vice versa. Then, if all is well, work up from there. Anyways, Philipp, thanks again.

alaskan.greg

alaskan.greg [PersonRank 1]

17 years ago #

Milly, thank you so much for your response; it was great. It was like reading a spy mystery with suspense, say you aren't the one... Seriously, I enjoyed reading your answer and I will follow your advice.

We must have been drafting at the same time. I am, for medical reasons, slow, so you beat me to the post. I had checked the forum before I started my answer to Philipp and, not finding your post, just assumed yours wasn't there; consequently I sent the one above to Philipp.

alaskan.greg

Travis Harris [PersonRank 10]

17 years ago #

I have two questions.

1) Could the government use this data to pursue crime? Would it be admissible in court? (please answer if you know for different countries)

2) Do we yet know of any class action lawsuits to come out of this?

Philipp Lenssen [PersonRank 10]

17 years ago #

> 1) Could the government use this data to pursue
> crime? Would it be admissible in court? (please answer
> if you know for different countries)

I once asked my favorite US net-lawyer that (well known guy!):

<<In the recent case* where the government received search logs from
major search engines, what happens if the government finds "illegal"
search queries ("child porn", "how to build a bomb", "how to murder
someone" etc.) ...? Do they have the legal right to go to the search
engine to ask for the IP (and/ or user account) of that query?>>

His answer was:

<<"Yep. It is a basic principle of criminal law: if the cops are
legitimately in your house, and happen to see an illegal gun, the
fact they were not there to find a gun doesn't matter. you're toast.">>

In the Google vs gov't case it turned out the subpoena was restricted to certain usage rights, so the answer for that past case may be "no". But the answer in the current case seems to be "yes".

> 2) Do we yet know of any class action lawsuits
> to come out of this?

German Spiegel reported it's possible AOL is facing a fine of 658 million dollars, at 1000 dollars per individual liability. This might be complete speculation. I'll wait until I hear more or hear it from a US lawyer.

Gunther Eysenbach [PersonRank 1]

17 years ago #

A wiki to discuss the AOL 500k search data has been set up at http://www.jmir.org/wiki/index.php/AOL500k – everybody is invited to post observations about the data.

jilm [PersonRank 10]

17 years ago #

Really 36 million? Everywhere I've read only 20; why was it changed here?

Corsin Camichel [PersonRank 10]

17 years ago #

jilm, it's 658,000 users who searched for 20 million different things with 36 million queries

jilm [PersonRank 10]

17 years ago #

So 16 million queries were duplicates? Like if 1000 people search for "sex" it's 1 thing and 1000 queries?

If it's true... I expected many more duplicates.

Philipp Lenssen [PersonRank 10]

17 years ago #

I've imported the data into MySQL and it showed around 36 million searches, so that's when I adjusted the value. Plus I got confirmation from Corsin who was also playing around with the data. Apparently most reports on this were false.
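For anyone who wants to check the totals against the distinct counts themselves, here is a minimal Python sketch. It assumes tab-separated rows whose first two columns are a user ID and the query text; the sample rows and the `summarize` helper are invented for illustration, and this is not the MySQL import described above.

```python
import csv
import io


def summarize(lines):
    """Count total rows, distinct queries and distinct users in tab-separated rows."""
    total = 0
    queries = set()
    users = set()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        user_id, query = row[0], row[1].strip().lower()
        total += 1
        queries.add(query)
        users.add(user_id)
    return {"total": total, "distinct_queries": len(queries), "users": len(users)}


# Made-up sample rows: user ID, then query.
sample = io.StringIO(
    "101\tcheap flights\n"
    "101\tcheap flights\n"
    "102\tweather anchorage\n"
    "103\tcheap flights\n"
)
stats = summarize(sample)
print(stats)
print("repeats:", stats["total"] - stats["distinct_queries"])
```

On the figures quoted in this thread, 36 million rows minus roughly 20 million different queries leaves about 16 million repeats, and 36 million rows over 658,000 users works out to roughly 55 queries per user.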

elanghe [PersonRank 0]

17 years ago #

In light of AOL's release of its users' search information, two free services just launched to help users protect their privacy while searching online. The first, called LostintheCrowd.org (www.lostinthecrowd.org), allows you to register your search engine cookie for AOL, Ask.com, Google, MSN, or Yahoo. The Lost in the Crowd servers then run random queries on your behalf on a regular basis. The second, called Track Me Not (http://mrl.nyu.edu/%7Edhowe/TrackMeNot/), works as a Firefox extension that will submit queries for random things directly from your browser. Both services work on the idea that if you submit enough random "noise", any "signal" which may reveal your personal identity will get lost, making it difficult for the search engines, or anyone who may subpoena their data, to figure out who you really are.
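The noise-versus-signal idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how either service is implemented: the word list, the `with_decoys` helper and its parameters are assumptions, and nothing here talks to a search engine.

```python
import random


def with_decoys(real_query, vocabulary, decoys=3, rng=None):
    """Return the real query shuffled in among randomly generated decoy queries."""
    rng = rng or random.Random()
    # Each decoy is two distinct words picked at random from the vocabulary.
    batch = [" ".join(rng.sample(vocabulary, 2)) for _ in range(decoys)]
    batch.append(real_query)
    rng.shuffle(batch)
    return batch


# Hypothetical word list; a real tool would need a much larger, more natural one.
vocabulary = ["garden", "recipe", "football", "weather", "guitar", "museum", "train", "laptop"]

batch = with_decoys("flights to anchorage", vocabulary, decoys=3, rng=random.Random(42))
for query in batch:
    print(query)
```

How well this hides anything depends on how plausible the decoys look; purely random word pairs are probably easy to tell apart from what a person would actually type.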
