Google Blogoscoped

Thursday, January 26, 2006

The Google Search Subpoena in Perspective

Seth Finkelstein is a consulting programmer who has done extensive civil-liberties activism. He was the first person to decrypt censorware blacklists, won an EFF Pioneer Award for his work, and has served as an expert witness in a major internet censorship case.

By now, uncounted articles have been written about the Google Search subpoena, many with errors. In retrospect, it’s helpful to closely examine the origins of the issue. As has been much discussed, the background for the subpoena is a law called the Child Online Protection Act (“COPA”), a criminal law which seeks “restriction of access by minors to materials commercially distributed by means of World Wide Web that are harmful to minors”. While a laudable goal in the abstract, the practical implications of applying such vague and broad prohibitions can lead to far-reaching censorship.

The following passage from the government’s COPA legal briefs is likely the origin of its seeking various data from search engines:

The House Report accompanying COPA further documents the serious problem that Congress sought to address. By 1998, the number of minors using the Internet had grown to 16 million. H.R. Rep. No. 775, supra, at 9. At the same time, the number of pornography Web sites had grown to 28,000. Id. at 7. Those sites offer "teasers" - free pornographic images designed to entice users to pay a fee to explore the whole site. Id. at 10. Because Web software is easy to use, "minors who can read and type are capable of conducting Web searches as easily as operating a television remote." Id. at 9-10. As a result, pornographic material on the Internet is "widely accessible" to minors. Id. at 9. While many minors deliberately search for pornographic Web sites, others accidentally stumble upon them. Id. at 10. Many pornographic sites use "copycat" Web addresses to take advantage of innocent mistakes. For example minors would find hard-core pornography by mistyping rather than Ibid. Searches using common terms such as toys, girls, boys, bambi, and doggy all lead to pornographic sites. Ibid. Most pornographic Web sites either provide no warning that their sites contain pornography or provide a warning on the very same Web page that displays pornographic teasers. Ibid.

During Supreme Court oral argument about the COPA law, the Solicitor General made the following claim:

MR. OLSON: .... But the problem with respect to the children is the material that is so widely available on the Internet that doesn’t reach the definition of – that is not as bad as obscenity. It is a wide amount of information. The legislative history described 28,000 pornographic sites in a – this is also outside the record, but if an individual goes to their Internet and – and uses an Internet search engine and – and types in the word, free porn, I did this this weekend, the – your – your computer will say that there are 6,230,000 sites available. Now that’s available now.

This was a ludicrous abuse of statistics. He had searched Google for all items which contained the word “free”, and the word “porn”, somewhere on the page (not even strictly the phrase “free porn”). And then offered the meaningless number returned as if it were somehow relevant. Indeed, this very article will now increase that number, merely by quoting him. It was (or at least, should have been) an embarrassing display of ignorance.
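To make the distinction concrete, here is a minimal sketch of the difference between matching all the words anywhere on a page and matching the exact phrase. The four sample pages are invented for illustration, and this is a toy model, not a description of how Google actually computes its result counts.

```python
import re

def words(text):
    """Lowercase a text and split it into a list of alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def matches_all_words(text, query):
    """True if every query word appears somewhere in the text, in any order."""
    page = set(words(text))
    return all(term in page for term in words(query))

def matches_phrase(text, query):
    """True if the query words appear consecutively and in order."""
    page = words(text)
    terms = words(query)
    n = len(terms)
    return any(page[i:i + n] == terms for i in range(len(page) - n + 1))

# Hypothetical pages, made up for this example.
PAGES = [
    "Free porn galleries, updated daily.",
    "The Solicitor General said he typed in the words free porn.",
    "Our filter is free to download and blocks porn sites.",
    "Free shipping on all toys.",
]

QUERY = "free porn"
all_words_count = sum(matches_all_words(p, QUERY) for p in PAGES)
phrase_count = sum(matches_phrase(p, QUERY) for p in PAGES)
print(all_words_count, phrase_count)
```

In this toy collection, three pages match both words and two contain the phrase, yet only one is an actual pornographic site. The other matches are an article quoting the query and an advertisement for a filter, which is exactly why the raw count says nothing about how many such sites exist.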

With such poor quality of evidence being proffered to justices of the Supreme Court, it’s easy to see why the government wanted to be better prepared in future trials. Indeed, the eventual decision about the COPA law required more investigation, particularly of censorware:

Second, there are substantial factual disputes remaining in the case. As mentioned above, there is a serious gap in the evidence as to the effectiveness of filtering software. See supra, at 9. For us to assume, without proof, that filters are less effective than COPA would usurp the District Court’s factfinding role. By allowing the preliminary injunction to stand and remanding for trial, we require the Government to shoulder its full constitutional burden of proof respecting the less restrictive alternative argument, rather than excuse it from doing so.

So for this evidence, a statistics professor working with the Department of Justice decided to try to use search engine queries as a basis for various estimates:

Reviewing URLs available through search engines will help us understand what sites users can find using search engines, to estimate the prevalence of harmful-to-minors (HTM) materials among such sites, to characterize those sites, and to measure the effectiveness of content filters in screening HTM materials from those sites.

Reviewing user queries to search engines will help us understand the search behavior of current web users, to estimate how often web users encounter HTM materials through searches, and to measure the effectiveness of filters in screening those materials.

Many problems have been pointed out with these ideas, to put it gently. It’s important, however, to keep in mind that the previous state-of-the-art in research evidence here was typing in the words “free” and “porn”. It’s hard to imagine a better case for never attributing to malice what can be explained by stupidity.

But when the keywords “Google”, “government”, “pornography”, and “privacy” were all mixed together, they produced an explosive reaction due to the volatility of the components. Many news reports gave the false impression that the government intended to go on a fishing expedition, sifting through personal search records in order to track down seekers of child pornography. The recent NSA wiretapping scandal provided yet another framework for suspicion.

Pragmatically, if the government were going to data-mine search engines as a source for investigation of terrorism, or even child pornographers, those actions would be surrounded by secrecy. And the public would only find out about it through leaks, not open court action. For example, an ACLU lawsuit over the PATRIOT Act was subject to extensive gag orders. And if there were a fishing expedition, since other search engines had complied with the government data requests, the net had already been spread far and wide. So from a very narrow perspective, any privacy damage had already mostly been done.

However, the relatively minor goal of statistical studies has ended up raising public awareness of the overall issues with personal data stored by search engines, and of the fear that it could be misused for criminal investigations. Search engines are almost an outsourced surveillance system, and a completely unaccountable one, since they’re private companies. Perhaps the overall lesson of the story is that information collected for business purposes could easily be abused. We should start thinking about mandating privacy protection before the abuses imagined in this case become reality.

