By now, uncounted articles have been written about the Google Search subpoena, many with errors. In retrospect, it's helpful to closely examine the origins of the issue. As has been much discussed, the background for the subpoena is a law called the Child Online Protection Act ("COPA"), a criminal law which seeks "restriction of access by minors to materials commercially distributed by means of World Wide Web that are harmful to minors". While that is a laudable goal in the abstract, applying such vague and broad prohibitions in practice can lead to far-reaching censorship.
The following passage from the government's COPA legal briefs is likely the origin of its effort to obtain various data from search engines:
The House Report accompanying COPA further documents the serious problem that Congress sought to address. By 1998, the number of minors using the Internet had grown to 16 million. H.R. Rep. No. 775, supra, at 9. At the same time, the number of pornography Web sites had grown to 28,000. Id. at 7. Those sites offer "teasers"--free pornographic images designed to entice users to pay a fee to explore the whole site. Id. at 10. Because Web software is easy to use, "minors who can read and type are capable of conducting Web searches as easily as operating a television remote." Id. at 9-10. As a result, pornographic material on the Internet is "widely accessible" to minors. Id. at 9. While many minors deliberately search for pornographic Web sites, others accidentally stumble upon them. Id. at 10. Many pornographic sites use "copycat" Web addresses to take advantage of innocent mistakes. For example minors would find hard-core pornography by mistyping www.whitehouse.com rather than www.whitehouse.gov. Ibid. Searches using common terms such as toys, girls, boys, bambi, and doggy all lead to pornographic sites. Ibid. Most pornographic Web sites either provide no warning that their sites contain pornography or provide a warning on the very same Web page that displays pornographic teasers.
During Supreme Court oral argument about the COPA law, the Solicitor General made the following claim:
MR. OLSON: .... But the problem with respect to the children is the material that is so widely available on the Internet that doesn't reach the definition of – that is not as bad as obscenity. It is a wide amount of information. The legislative history described 28,000 pornographic sites in a – this is also outside the record, but if an individual goes to their Internet and – and uses an Internet search engine and – and types in the word, free porn, I did this this weekend, the – your – your computer will say that there are 6,230,000 sites available. Now that's available now.
This was a ludicrous abuse of statistics. He had searched Google for all pages containing the word "free" and the word "porn" anywhere on the page (not even strictly the phrase "free porn"), and then offered the meaningless number returned, a rough estimate of matching pages rather than a count of sites, as if it were somehow relevant. Indeed, this very article will now increase that number, merely by quoting him. It was (or at least should have been) an embarrassing display of ignorance.
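To make the difference between "contains both words" and "contains the phrase" concrete, here is a toy model of the two kinds of matching. It is only a sketch under loud assumptions: real search engines tokenize, index, rank, and estimate result counts in far more complicated ways, and the sample pages and function names below are hypothetical, invented purely for illustration. The point it shows is that an all-words query counts a shopping page or a critical article (like this one) just as readily as an actual pornography site.

```python
import re

def tokens(text: str) -> list[str]:
    # Hypothetical, deliberately crude tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_terms(doc: str, query: str) -> bool:
    # Conjunctive (AND) match: every query word appears somewhere in the page.
    words = set(tokens(doc))
    return all(term in words for term in tokens(query))

def matches_phrase(doc: str, query: str) -> bool:
    # Phrase match: the query words appear consecutively and in order.
    doc_toks, q_toks = tokens(doc), tokens(query)
    n = len(q_toks)
    return any(doc_toks[i:i + n] == q_toks for i in range(len(doc_toks) - n + 1))

# Hypothetical sample pages, not real search results.
pages = {
    "quoting_article": 'The official typed the words "free" and "porn" into a search engine.',
    "shop_page": "Free shipping on every order. Our filter blocks porn.",
    "phrase_page": "Ads promising free porn teasers.",
    "other_page": "A study of filtering software.",
}

query = "free porn"
and_hits = [name for name, text in pages.items() if matches_all_terms(text, query)]
phrase_hits = [name for name, text in pages.items() if matches_phrase(text, query)]

print("All-words matches:", and_hits)
print("Phrase matches:   ", phrase_hits)
```

In this toy collection, three of the four pages satisfy the all-words query, but only one contains the phrase, and even that one is just a sentence about the topic. A raw match count says little about how many pornographic sites exist, let alone how many a child would encounter.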
With evidence of such poor quality being offered to the justices of the Supreme Court, it's easy to see why the government wanted to be better prepared for future proceedings. Indeed, the Supreme Court's eventual decision on the COPA law called for more investigation, particularly of censorware (filtering software):
Second, there are substantial factual disputes remaining in the case. As mentioned above, there is a serious gap in the evidence as to the effectiveness of filtering software. See supra, at 9. For us to assume, without proof, that filters are less effective than COPA would usurp the District Court's factfinding role. By allowing the preliminary injunction to stand and remanding for trial, we require the Government to shoulder its full constitutional burden of proof respecting the less restrictive alternative argument, rather than excuse it from doing so.
To gather this evidence, a statistics professor working with the Department of Justice decided to use search engine data, both indexed URLs and user queries, as the basis for various estimates:
Reviewing URLs available through search engines will help us understand what sites users can find using search engines, to estimate the prevalence of harmful-to-minors (HTM) materials among such sites, to characterize those sites, and to measure the effectiveness of content filters in screening HTM materials from those sites.
Reviewing user queries to search engines will help us understand the search behavior of current web users, to estimate how often web users encounter HTM materials through searches, and to measure the effectiveness of filters in screening those materials.
To put it gently, many problems have been pointed out with these ideas. It's important, however, to keep in mind that the previous state of the art in research evidence here was typing in the words "free" and "porn". It's hard to imagine a better case for never attributing to malice what can be explained by stupidity.
But when the keywords "Google", "government", "pornography", and "privacy" were all mixed together, the volatility of the ingredients produced an explosive reaction. Many news reports gave the false impression that the government intended to go on a fishing expedition, sifting through personal search records to track down seekers of child pornography. The recent NSA wiretapping scandal provided yet another framework for suspicion.
Pragmatically, if the government were going to data-mine search engines to investigate terrorism, or even child pornographers, those actions would be surrounded by secrecy, and the public would only find out about them through leaks, not open court action. For example, an ACLU lawsuit over the PATRIOT Act was subject to extensive gag orders. And if there were a fishing expedition, the net had already been cast far and wide, since other search engines had complied with the government's data requests. So, from a very narrow perspective, most of any privacy damage had already been done.
However, a subpoena with the relatively minor goal of supporting statistical studies has ended up raising public awareness of the broader issues surrounding personal data stored by search engines, and of fears that such data could be misused in criminal investigations. Search engines are almost an outsourced surveillance system, and a completely unaccountable one, since they're private companies. Perhaps the overall lesson of the story is that information collected for business purposes could easily be abused. We should start thinking about mandating privacy protection before the abuses imagined in this case become reality.