Berlin Search Engine Workshop

Tuesday, June 27, 2006

Yesterday I attended a workshop in Berlin on the “rising power of search engines.” It brought together academics from all over the world – the US, Greece and Germany, to name a few countries – as well as a couple of bloggers (like the very cool Klaus Patzwaldt of German At-Web). The workshop, organized by the Friedrich Ebert Foundation, ran several panels across the day, covering copyright law, censorship, search engine ethics (“webmasters including Google Analytics are doing a modern form of prostitution – data prostitution,” to paraphrase Hendrik Speck of SuMa-eV), Google as “gatekeeper” and more.

More scams found in paid than organic results

One very interesting talk was given by Benjamin Edelman from Harvard University. Ben analyzed how often spyware-laden sites show up among organic search results versus search ads, e.g. when you query Google for something like “screensavers.” It turns out, probably not surprisingly to most of you, that ads contain a much higher share of bad downloads: anything from EXEs that quietly install programs showing ads in your browser, to software scams where people make you pay for freeware, to software that renews subscription payments without making that obvious, to good old-fashioned sites that spam your email account when you register with them.

Exact numbers for the spyware ratio are, I suppose, hard to come by. Benjamin, who also works with McAfee SiteAdvisor (a Firefox plugin showing “bad apples” on Google results), says his crawl downloads executables and such to identify software scams. According to his studies, 3.1% of all sites in natural results are scams, whereas the share of bad apples among AdWords ads is 8.5%. In short, a piece of advice to search newcomers could be: don’t click on those Google ads!
Ben asks search companies to filter these sites better, and also hopes that a law might come into place making search engines more responsible for these scams (from which they benefit through ad dollars). Even though the issue has been covered in a couple of mainstream sources, Ben argues that Google has so far completely ignored the problem.
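
To put the two percentages side by side, here is a quick back-of-the-envelope sketch in Python. The variable and function names are my own, the rates are simply the figures quoted from Ben's talk, and I'm assuming they can be compared directly as the chance that a given result is a scam.

```python
# Rates as quoted in the post (share of listed sites flagged as scams).
organic_rate = 0.031  # natural (organic) results
ad_rate = 0.085       # AdWords ads


def relative_risk(exposed: float, baseline: float) -> float:
    """Return how many times more likely a scam is in `exposed` than in `baseline`."""
    return exposed / baseline


ratio = relative_risk(ad_rate, organic_rate)
print(f"Ads are roughly {ratio:.1f}x as likely as organic results to be a scam")
```

In other words, if those figures hold, a click on an ad is a bit under three times as likely to land on a bad apple as a click on a natural result.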

On a side note, it turns out that MSN has (relatively speaking) the fewest scams in ads vs organic results... however, Ben said, that’s mostly due to them having fewer ads on search results in the first place.

German search engine self-censorship: The “right” kind of censorship?

I was slightly disappointed in the presentation on German vs Chinese self-censorship, though I’m not sure whether it was the message itself (for which the messenger is not responsible, as we know) or the way it was presented. Basically, under German law censorship is not allowed; yet somehow, whenever something’s censored over here (e.g. Nazi imagery), that’s not censorship, mainly because by definition in Germany “censorship” only happens when you suppress something before publication.

Somehow, we were left with a lingering implication of a stark contrast between German censorship and Chinese censorship – you know, the one is “good” censorship and the other “bad” censorship, with the best reasoning given for that being that “German law only censors stuff that’s endangering democracy.” No one seemed to have realized that a) Google is now using German and French censorship as a defense for their Chinese censorship, a perfect illustration of how dangerous it is to set any kind of censorship precedent, and b) that the Chinese gov’t has its reasons for censoring too, and that those reasons happen to vary depending on culture. Sure, we can argue that European values are much more in tune with global human rights, but tell that to a Chinese gov’t official who’d beg to differ, arguing that too much free speech endangers the nation and leads to public uprisings and so on.

Later that evening I learned that Marcel Machill, the scientific organizer of the workshop, is one of the figures lobbying for self-regulation online, the kind we see in Germany thanks to the FSK (the “Freiwillige Selbstkontrolle”, a “voluntary self-regulation”). E.g. in “Self-regulation of Internet Content” [PDF], responsible author Machill argues that “Internet providers hosting content have an obligation to remove illegal content” and that “codes of conduct must be the product of and be enforced by self-regulatory agencies.” Interesting side note: search engines merely index the web and republish small snippets (or thumbnails); they are a symptom of those actually hosting the content, not hosts themselves.

In Germany, search engines aren’t strictly required by law to follow up on the blacklist of sites put forth by the FSK (it’s voluntary, after all) – yet, they happily do so, possibly trying to walk the path of least resistance or trying to avoid further, stronger censorship. Talk about gatekeepers! (And mind you, in the first years many of the censored results – e.g. Nazi site stormfront.org is missing – weren’t even disclosed as such by Google. That only happened in the aftermath of the Google China censorship debate.)

Google News fair use? The case of French AFP

Another presentation, by Susan Keith from the State University of New Jersey, gave insights into the legal case between Google News and AFP, a French press agency that complained, basically, that Google took its content into Google News. Well, as there is such a thing as fair use law in the US, the case revolved largely around one question: is copying a headline, a lead, and a thumbnail preview of a news story fair use or not? To find out, several points need to be discussed, e.g.:

“Does a headline state mere facts?” (’cause you can’t copyright facts, at least not by US law)
“Does Google take away from the commercial market of the news source?” (I mean, I actually think they improve the visibility and thus market share of AFP by show-casing it)
“Is there any transformative work done by Google News?” (the more you “rephrase” a news story, the less you’re infringing on the source’s copyright – obviously, Google doesn’t do much in terms of transformation, except snippeting & thumbnailing the content)

The lines are still blurred and no conclusion has been reached yet. I think it’s funny that some news sources badly want to be included in Google News, or complain if they’re shut out, yet others like AFP want to be excluded. I wonder if the “market” aspect of US fair use law recognizes positive effects as well?

See a video excerpt of Susan’s talk [WMV].

Also see the follow-up on the objectivity of Google.

by Philipp Lenssen