Sunday, June 18, 2006

Words Returning Censored Google Results

Back in 2004, Google wrote, “For Internet users in China, Google remains the only major search engine that does not censor any web pages.” Two years later, I probed Google’s self-censorship using a dictionary of 10,000 English words. In 2006, the following 9% of words return search results which Google agreed to censor in China*:

abreast, abundant, acceptable, accusation, accuse, accused, adjacent, admirable, admiration, admire, admit, admitted, adorable, adult, affected, agree, airline, aisle, alive, allah, allegation, alligator, allow, ally, almost, alphabetical, ambitious, amir, amongst, amour, analogue, ancestry, ancient, anticipate, appeal, appear, appearance, applaud, appoint, appointed, appointment, appreciate, appropriate, approve, arabic, arithmetic, armored, arms, army, arrest, arrogant, arsenal, ashes, assorted, astronaut, astronomer, asymmetric, athletics, attempt, attention, attentive, attic, attitude, attractive, authoritarian, authorization, autopsy, auxiliary, avail, available, await, awakening, awfully, awkward, babe, babel, backlog, bacteria, ban, bankrupt, bantam, barker, barred, barrier, batman, beaten, beating, became, bedroom, begin, behalf, belly, below, beneath, besides, bitter, blah, bleed, blew, blister, blown, boarding, boer, bolster, bonded, booze, bought, boulevard, boxing, breakout, breakup, breathtaking, breeding, brutal, brutality, built, business, bypass, caramel, case, catastrophe, cathedral, caught, cautious, celestial, celtic, century, ceremony, chamberlain, championship, chandler, chaos, charged, charisma, chesterfield, chinese, chow, circulate, city, clamp, clash, cleanliness, cliche, clientele, cloud, cockpit, collar, collector, colonialism, combinations, comeback, comedian, comer, commemoration, commonly, communism, complication, concentrate, condemn, condemnation, conductor, confessed, confident, confines, confirmed, congratulate, consider, consideration, constable, constructive, consume, contact, contemplate, contemporary, contend, contingent, contradict, controversial, conversational, convinced, cooker, corral, correspondent, corresponding, corrupt, costly, could, couple, courtroom, cracked, craze, credentials, credible, cricket, crowd, crowded, cultivation, cyclic, damn, dashed, deal, dealings, declaration, declared, decline, dedication, deepen, 
defeat, defensive, deficit, deliberation, delightful, democracy, democrat, democratic, denial, deny, departed, depend, deprived, depth, deputy, descended, despair, detachment, detective, detriment, devastating, dictatorship, difficulty, diffusion, dilute, dioxide, disappoint, disappointing, disappointment, disarmament, disastrous, disconnected, discrepancy, disposable, dolly, dome, dominate, donation, doubles, doubt, doubtful, drain, drastic, drawing, drove, during, dwarf, earliest, earthly, east, easy, economical, economy, educated, education, eighteen, eighteenth, eighth, elastic, elderly, election, electronic, eloquent, embrace, emission, enact, endanger, endorse, endowment, enhance, enjoy, enlargement, enquiry, entertain, entertainment, envoy, epoch, equate, equatorial, equilibrium, especial, essay, excellence, exclusively, exercises, existent, expectancy, expenditure, explanation, explode, extinct, facing, failing, failure, faint, faithfully, fallout, famed, famine, farce, fare, faulty, favourable, feedback, felix, felony, feminism, fewer, fictional, fiddle, fifteen, figured, figures, firsthand, fisherman, flashlight, fledged, fledgling, flee, flowing, flown, flux, foil, football, forehead, foreseeable, forever, formidable, fortunate, forum, found, fourteen, fourteenth, fragmentation, fraudulent, fray, from, front, fucking, further, futile, gangster, garner, gastric, generally, genocide, gentry, ghost, gilt, girlfriend, given, glance, glaze, glory, glowing, goggles, golf, grace, grandma, greatly, greet, grin, grounding, guinea, gull, gymnastics, hair, halt, halves, hamper, handout, happen, harm, harmonica, hatred, have, health, heartfelt, hearth, heated, heavenly, heir, help, heroic, herself, hesitation, highlight, himself, hinder, hollow, homage, honours, hopeful, horrific, hound, humorous, hundred, hunt, hurdle, hurry, ignore, illegal, illumination, illusion, illustrative, immature, imminent, impartial, impede, implication, impose, in, inadequate, incapable, 
incarnation, incest, included, incompetent, inconvenience, increasingly, increment, incur, indefinite, indicative, indirect, induction, ineffective, inevitable, influential, informative, inheritance, injunction, instability, insult, intact, intangible, interchangeable, interdependent, irresponsible, is, itself, jag, jersey, joke, joy, jumps, justification, keel, keep, keeper, keeping, keeps, kept, kilo, kindred, kirk, knot, laden, ladies, lain, lair, lament, lance, lancet, landed, lasting, late, lately, laughter, leakage, leave, leftist, legendary, legion, lethal, liable, liberalism, liberation, liberty, like, lily, liner, literature, looks, lousy, low, lp, luck, lunar, luxurious, magazine, maid, male, mandarin, maneuver, manners, many, margin, marked, marking, marry, marsh, mash, masthead, maternal, may, mediterranean, melodic, melt, melting, memoir, memorable, merely, methane, midst, migrant, migratory, militant, militia, mimic, miserable, misguided, mistaken, misunderstanding, misunderstood, mobilization, modify, morally, mores, morning, movement, muster, narrow, narrowly, naughty, necessarily, need, nevertheless, news, newspaper, northern, not, notice, novel, obviously, off, offer, onto, opening, ordeal, originality, originally, other, our, out, over, overdue, overlap, own, painting, pane, paragraph, peck, people, perform, perhaps, picked, pleased, poetry, point, political, politics, porn, powerful, prefer, prepared, prevailing, pristine, profile, profound, progression, prohibit, prohibition, prolonged, promising, prone, pronounce, pronounced, pronunciation, property, prophets, prose, prosecute, prosecution, prosperous, protest, province, prowess, pub, pun, purification, pursue, pursuit, quad, qualitative, question, quits, rambling, ranks, rather, realisation, reasonable, reasonably, rebuilt, recess, refreshments, refusal, regency, regiment, regret, rehearsal, reject, rejoice, relieved, rely, remover, renewable, reopen, repay, replica, reportedly, repression, 
reproduce, resemblance, resemble, resign, resignation, resigned, resilience, resonant, respectable, respective, respects, retain, retard, return, reveal, revocation, revolutionary, rewritten, rhythmic, rick, rightly, rights, ring, rivalry, robbery, role, rosemary, roughly, routine, rubbing, ruins, runs, rupture, ruthless, satan, satisfying, sausage, save, say, scare, scared, scarf, schedule, scraps, screenplay, secluded, segment, segregation, sent, sentence, sentimental, separate, serious, sessions, setback, seventeen, seventeenth, seventy, sexy, shah, shaken, shire, shocking, shortage, shortfall, shorthand, shorts, shown, shrimp, si, sibling, sideways, sighted, sing, situation, sixteen, sixteenth, skinny, slain, slim, sloppy, sluggish, sly, smashed, snoop, sob, soften, solely, solemn, solitude, sonic, sonny, sophistication, soul, sour, sovereignty, spawn, sperm, spinach, spirits, splits, sport, spotted, squeeze, stall, stance, standby, stars, start, stationery, steep, stimulate, stoke, stool, stopping, straightforward, strait, streamlined, strengthen, struck, stumble, subsidiary, substitute, suburb, suck, suffer, suitable, suitcase, sunset, super, suppose, supposedly, surge, surprise, surrender, sway, symmetrical, takeover, talented, technology, tedious, teenage, teenager, teens, tenure, terms, terrain, terror, tertiary, theoretically, therefore, thirteen, this, thorough, thread, threat, threaten, threw, thrill, thrown, thursday, ticker, tidy, tighten, timeless, timely, ting, titled, toaster, told, tolerate, toll, tongue, tooth, tory, town, tragic, transient, travelled, treat, tripod, triumphant, troop, troops, tub, tuition, tunnel, turbulent, turf, turkey, uhf, unaffected, under, underestimate, undermine, underneath, undue, uneasy, uneven, unfit, unforgettable, unmask, unnatural, unnecessary, unofficial, unpleasant, unravel, unrest, unveil, upbeat, upheld, uphold, upright, upset, upwards, utterly, vacant, vague, vastly, veil, vein, veto, violate, vip, waist, 
waits, wandering, want, wanting, wary, watchdog, watt, wearing, weekly, weigh, wen, what, whereabouts, whereby, whose, widespread, wife, willow, win, withdraw, withdrawn, withheld, wonderful, woodland, woody, wording, wore, world, worried, worthless, worthy, wrapping, wrote, yahoo, years, yorker, your, zebra.


*For this test, I checked 10,000 words from an English dictionary in web search, with Chinese language settings. (The word list consisted of the 10,000 most common words of a dictionary of over 27,000 words.) 901 – 9% – of the words checked returned censored results, with 1 or more sites missing from the top 10 results. I’m defining a result as “censored” whenever Google shows their own censorship disclaimer (“According to local laws and policies some search results are not showing” – it’s in italics below the last result, as seen on a search for e.g. “abreast” at this time). Google’s disclaimer may be inaccurate – this was the case for many censored results in Germany for years, i.e., the disclaimer was missing – but for simplicity’s sake we’ll believe them.
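
For readers who want to try something similar, here is a minimal sketch of how a check like the one described in this footnote could be automated. It is an illustration only, not necessarily how the test was run. The disclaimer string is the English wording quoted above (the wording and markup on the actual results page may differ), and `fetch_page` is a hypothetical function you would have to supply yourself to return the text of a results page for a given query.

```python
from typing import Callable, Iterable, List

# Disclaimer wording as quoted in the post (assumption: the live page
# may phrase or mark it up differently).
DISCLAIMER = "According to local laws and policies some search results are not showing"


def is_censored(page_text: str) -> bool:
    """Return True if the results page text contains the disclaimer."""
    return DISCLAIMER.lower() in page_text.lower()


def censored_words(words: Iterable[str],
                   fetch_page: Callable[[str], str]) -> List[str]:
    """Return the words whose results page shows the disclaimer.

    fetch_page is a hypothetical, caller-supplied function that takes a
    query and returns the text of its results page.
    """
    return [word for word in words if is_censored(fetch_page(word))]


def censored_share(flagged: int, total: int) -> float:
    """Percentage of checked words that returned censored results."""
    return 100.0 * flagged / total


# Stand-in pages for demonstration only; these are not real search results.
fake_pages = {
    "abreast": "result 1 ... result 9 " + DISCLAIMER,
    "bird": "result 1 ... result 10",
}
flagged = censored_words(fake_pages, lambda word: fake_pages[word])
```

With the numbers from the test, `censored_share(901, 10000)` works out to just over 9%.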

The resulting list of words is not a complete list of search queries resulting in censorship. Such a list would be pretty much endless, as words can be combined and written in different languages. Nor does the list say anything about how common or uncommon a search is, particularly as it’s not a Chinese word list, and as people don’t just enter single-word queries. (But even if we had Google’s statistics on the most common searches from China, that list wouldn’t be too meaningful: it wouldn’t include the kind of queries people are too afraid to enter, or don’t enter because they assume the results are censored anyway, which, as we can imagine, would be a word list with a higher overlap with censored results.)

Also see a previous test with multi-word queries in the neighborhood of politics. Multi-word queries often return censored results as well. For example, neither “bird” nor “flu” triggers any censorship individually, but combine them into “bird flu” and sites will be missing from the Google top 10.

Also note that Google censors sites, not words; if the test were expanded to check not the top 10 but e.g. the top 100 results for a particular search, the list of words returning self-censored results would be even longer.

This post is available as an audio file [MP3].

Edit: I saw this post being slightly misunderstood in several places so I’m changing the title to clarify the distinction between a word being censored, and the search result for a word being censored. I am changing the title from “Google Censorship Word List” to the more descriptive “Words Returning Censored Google Results”. (In the original post, I made this distinction clear in the introductory paragraph, the footnote and the comments, but I think it wasn’t too clear in the original title.)


