Google Blogoscoped

Google To Better Anonymize Server Logs

/pd [PersonRank 10]

Wednesday, March 14, 2007
17 years ago

Not sure.. but will this work? After all, server log files still need to be preserved for audit trails etc.. 5 yrs in the US (?) under the SOX act?

mrbene [PersonRank 10]

17 years ago #

What does "Cookie anonymization" mean? Does that mean that my email address will no longer appear in plain text in my mail.google.com "gmailchat" cookie?

/pd [PersonRank 10]

17 years ago #

yes.. "Sanitizing Packets for Fun and Profit" is a kewl thing :)-

"Does that mean that my email address will no longer appear in plain text in my mail.google.com "gmailchat" cookie?"

No, it means that within 12-18 months your cookie should replenish its seed values, and do so without service disruptions.. given the fact that events like DST changes are real-world happenings!

Secondly.. "anonymizing our server logs after 18-24 months" could also mean a simple find-and-replace on a parameter, swapping code "AAAA" for "ZZZZ".. with a string that is a 64-bit hashed key. A one-time-run programme that seeds with a 512-byte algo...

IMHO, RFC 2965 [http://www.ietf.org/rfc/rfc2965.txt] needs to be considered....

BUGabundo [PersonRank 7]

17 years ago #

well:
http://blogoscoped.com/patriot/search/?q=PORN
aint working!
How can I search for porn with them knowing?!!
lol

Hong Xiaowan [PersonRank 10]

17 years ago #

Funny

Roger Browne [PersonRank 10]

17 years ago #

The subtext is pretty clear: your search history is shared freely with the authorities for at least 18 months.

Hong Xiaowan [PersonRank 10]

17 years ago #

I wonder about the 18-24 months:

1. Does it mean that 18-24 months from now, all our searches will be anonymized from then on?

2. Or do they just keep each log for 18-24 months, then anonymize it after that?

mrbene [PersonRank 10]

17 years ago #

pd, moving from "AAAA" to "ZZZZ" still retains the uniqueness of IDs that got AOL into trouble when their data was released, since unique ID + lots of data = linkability. Discussed a bit under "The linkability continuum" in the entry for Pseudonymity:

http://en.wikipedia.org/wiki/Pseudonymity

In order to become mostly anonymous, their logs would have to remove all unique identifiers (whether "anonymous" IDs, email addresses, rehashed IDs, or IP addresses) and record only the queries that were performed. I say "mostly anonymous" to cover the cases where searchers specifically identify themselves in their search terms or other manner not implemented by Google.

/pd [PersonRank 10]

17 years ago #

"moving from "AAAA" to "ZZZZ" still retains the uniqueness of IDs"

Not if it's a hashing algo that is running.. similar to MD5.. It becomes a one-pass thru only.... and can't be replicated.. so the ID token() now will reflect a new value which SHOULD not have any linkage to any data.. thus TOR'ing out the linkability..

mrbene [PersonRank 10]

17 years ago #

Even if you use a uni-directional hashing algo, if you maintain the uniqueness you don't attain anonymity. I.e., if you take:

ID – URL
1 – page1
2 – page1
1 – page2

And you replace all IDs with hashed values, you'd get:

Hash – URL
0A1F02 – page1
12E40B – page1
0A1F02 – page2

You still have the same chance of putting together the actions associated with each hash to determine the identity as you would have determining the identity from the IDs (provided these are randomly assigned IDs in the first place).

Removing the email address from logs will improve privacy. Removing the last octet from the IP address recorded in logs will improve privacy. Doing a one-way hash of an ID does not improve privacy – it's exactly what got AOL in trouble when they released their logs.
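The ID/hash tables above can be checked with a short sketch. It assumes SHA-256 truncated to 6 hex characters as a stand-in for whatever one-way hash would actually be used, so the hash values it prints will differ from the illustrative 0A1F02 / 12E40B above. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def pseudonymize(user_id: str) -> str:
    """One-way hash of an ID, truncated to 6 hex characters for readability.

    Assumption: SHA-256 stands in for the unspecified hashing algo.
    """
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:6].upper()


def group_by_id(rows):
    """Collect the URLs seen for each identifier, preserving log order."""
    groups = defaultdict(list)
    for ident, url in rows:
        groups[ident].append(url)
    return groups


# The three-row log from the example above.
log = [("1", "page1"), ("2", "page1"), ("1", "page2")]
hashed_log = [(pseudonymize(uid), url) for uid, url in log]

original_groups = group_by_id(log)
hashed_groups = group_by_id(hashed_log)

# The per-identifier histories are identical; only the labels changed.
print(sorted(original_groups.values()))
print(sorted(hashed_groups.values()))
```

Both print statements show the same two histories, which is the point: the hash relabels each user but still keeps that user's actions together.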

/pd [PersonRank 10]

17 years ago #

"You still have the same chance of putting together the actions associated with each hash to determine the identity "

Agreed. Page(1,2) = ID(0A1F02), which then equates true to the above statement.

However ID(1) != ID(0A1F02), and page(1,2) cannot be associated with ID(1). The only way to do that would be to reverse the hash algo and break the encryption pattern.

and for the purpose of discussion :)- I'll have to agree that the other data elements also will have to go thru the same process. This means ...userID(), mail(), ip(), interface() and the whole host of other meta elements.

I was only articulating the possibilities of Tor'ing out a whole subset of data which is "linkable" to a person, device, location & time. These are the key elements when event forensics kicks in ...correct ?

Milly [PersonRank 10]

17 years ago #

>> How many subpoenas for server log data does Google receive
>> each year?
>> As a matter of policy, we don’t provide specifics
>> on law enforcement requests to Google.
>
> Yowza, almost sounds like they’re trying to send a subtle
> message here... after all, by law they’re not allowed to
> disclose certain government queries for user data.

I think that's a big stretch Philipp, but if true then this would need to be factored in too :-

>> Will this policy change make it more difficult for law
>> enforcement to prevent and detect crime or child
>> exploitation?
> No, current laws allow the government to request that
> companies preserve user data. We regularly comply with such
> laws.

We can't know, of course.

How about this conundrum, from the FAQ (a masterpiece of vagueness, and avoidance of their own questions, let alone the harder ones!) :-

On the one hand ...

>> What regulations or laws might require that you keep the data
>> for a longer period?
> Governments in many countries are considering laws that will
> require communications service providers to capture and
> archive telephone and internet traffic data for periods from
> 6 months to 2 years. These laws have for the most part not
> yet been enacted.

On the other ...

>> Will there be different log retention policies based on a
>> user’s country of origin?
> No. We believe in applying this as a consistent policy for the
> benefit of our users worldwide.

So they're seemingly standardising on the longest period eventually enacted by those laws? What if one country decides on 3 years, or 5 years? What does/will China, for example, require? Will they still go 'consistently' for the longest period, "for the benefit of" users worldwide?

And as for your joke about using Patriot Search, what do you make of this bit :-

>> Can users opt out of anonymization?
> We are working on a solution to make this possible.

Huh!?

Since Search History (and other Account Services) users' data is expressly excluded from this anonymisation, who on earth *but* the target audience for your Patriot Search (http://blogoscoped.com/patriot/mission.html) would *want* to opt out? Truth is stranger than fiction ... ;)

Finally (if I may), for anyone thinking this might become a reason not to bother with self-service cookie anonymisation (or better), I'd say definitely not: http://www.imilly.com/google-cookie.htm#announcement

mrbene [PersonRank 10]

17 years ago #

Not that I've been involved in any forensics, but I'd see three cases.

The first case is the AOL case – the behavior associated with the otherwise anonymous IDs prevents anonymity in some cases, since the behavior can be attributed to a single individual. It's not an automated process, but then again, forensics isn't. This does depend on as complete as possible a data set.

The second case is the typical forensics case – there's a specific IP address (or other identifier) determined to have been in use by a specific human user at a certain time. By dropping the last octet of IPs, Google is protecting the privacy of users to a certain extent – investigators would have to sift through 256x the amount of data currently yielded, but tertiary identifiers (whether the original or the hashed user ID, for example) could quickly sort that data into per-user "piles", making drilling down to the suspect not only feasible, but only marginally more difficult.
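A rough sketch of that second case, assuming IPv4 logs and that "dropping the last octet" means zeroing it (keeping only the /24 prefix). The log rows, IDs and queries are made up for illustration, and the addresses come from the 192.0.2.0/24 documentation range.

```python
import ipaddress
from collections import defaultdict


def drop_last_octet(ip: str) -> str:
    """Zero the final octet of an IPv4 address, keeping only the /24 prefix."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


# Hypothetical log rows: (client IP, pseudonymous ID, query).
rows = [
    ("192.0.2.57", "0A1F02", "query a"),
    ("192.0.2.200", "12E40B", "query b"),
    ("192.0.2.57", "0A1F02", "query c"),
]
anonymized = [(drop_last_octet(ip), uid, q) for ip, uid, q in rows]

# Every row now shares one truncated address, which covers 256 hosts ...
truncated_ips = {ip for ip, _, _ in anonymized}
hosts_per_prefix = ipaddress.ip_network("192.0.2.0/24").num_addresses

# ... but the retained ID still sorts the rows into per-user piles.
piles = defaultdict(list)
for _, uid, q in anonymized:
    piles[uid].append(q)

print(truncated_ips, hosts_per_prefix)
print(dict(piles))
```

The truncated address alone no longer separates the two users, but grouping on the second column does, which is why the extra difficulty is only marginal as long as any per-user identifier stays in the log.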

The third case is the "service improvement". If Google has created a profile based on a user ID, it'd be a hassle for them to refresh data into that from the logs if the logs contain a different ID – especially one that has had a one-way hash performed. A hassle, but not impossible.
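Why "not impossible" can be sketched as follows, under loud assumptions: the hash is deterministic and unsalted, and whoever holds the logs also still holds the list of original IDs. SHA-256 and all names here are stand-ins, not anything Google has described.

```python
import hashlib


def one_way_hash(user_id: str) -> str:
    """Deterministic, unsalted one-way hash (assumed; SHA-256 as a stand-in)."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


# Hypothetical: IDs the operator still holds in its profile store.
known_ids = ["1", "2", "3"]

# Hash every known ID once to build a hash -> ID lookup table.
lookup = {one_way_hash(uid): uid for uid in known_ids}

# A hashed log row can then be mapped back without inverting the hash.
hashed_row = (one_way_hash("2"), "page1")
recovered_id = lookup.get(hashed_row[0])
print(recovered_id)
```

No hash is reversed here; the original IDs are simply re-hashed and matched. A secret salt that is later discarded would block this lookup, though the per-hash linkability discussed earlier would remain.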

So personally I'm certain that my activity will remain logged, in an accessible manner, 2 years from now. Good on 'em for getting some good press, nice timing with the Viacom/YouTube thing, but I'm still a skeptic.
