Google Blogoscoped

Forum

The He/She Ratio

Ionut Alex. Chitu [PersonRank 10]

Monday, April 30, 2007
17 years ago · 10,421 views

Usage of "he" vs "she" on googlesystem.blogspot.com/ (79%/ 21%). Most of the "she" mentions are in the comments.

JoeW [PersonRank 1]

17 years ago

"Usage of "he" vs "she" on www.myspace.com (15%/ 85%)"
myspace is 85% "she".

/pd [PersonRank 10]

17 years ago

Usage of "he" vs "she" on peterdawson.typepad.com (84%/ 16%)

/pd [PersonRank 10]

17 years ago

Wow... TP lives up to his advocacy for women!!

Usage of "he" vs "she" on tompeters.com (51%/ 49%)

Ludwik Trammer [PersonRank 10]

17 years ago

Pretty interesting idea, but you have to keep site categories in mind when interpreting the results. For example, political sites have more "he" because there are currently many more male politicians. On the other hand, porn sites in general have more "she", because the biggest group using them is heterosexual males (again...). Less obvious examples are of course much more interesting. But the thing to remember is that a service with a big share of one gender's pronoun isn't FOR this gender or PRO this gender or SUPPORTING this gender. It's only ABOUT this gender (as the porn sites example shows...).

/pd [PersonRank 10]

17 years ago

Good point, Ludwik... but what about dating sites?? For example:

Usage of "he" vs "she" on www.plentyoffish.com (47%/ 53%)

So would this mean that there are more females looking to date?

Philipp Lenssen [PersonRank 10]

17 years ago

> so would this mean that there are more
> females looking to date ?

Not necessarily... women on a dating site might write, "I'm looking for a man. He should be above 30 ... " etc...

Ludwik Trammer [PersonRank 10]

17 years ago

> Usage of "he" vs "she" on www.plentyoffish.com (47%/ 53%)

The difference is very small. I got 51%/49% for plentyoffish.com (without "www").

> so would this mean that there are
> more females looking to date ?

My thinking about this is the opposite of yours ;) When someone places an ad on such a site, she/he writes about herself/himself in the first person ("I"), but about a potential partner in the third person (sometimes the second person): "He should be sweet and caring", "She has to be pretty". So for every statistical woman placing her ad there are more "he"s, and for every man more "she"s.

alek [PersonRank 10]

17 years ago

How 'bout seeing if the percentages are similar/consistent if you compare "him"/"her"?

mukthar [PersonRank 7]

17 years ago

Usage of "he" vs "she" on google.com (65%/ 35%)

Roger Browne [PersonRank 10]

17 years ago

Uclue.com is 35%/65%, which I certainly would not have expected.

Answers.google.com is 68%/32%.

Ionut Alex. Chitu [PersonRank 10]

17 years ago

Usage of "he" vs "she" on en.wikipedia.org (69%/ 31%)
Usage of "he" vs "she" on blogspot.com (65%/ 35%)
Usage of "he" vs "she" on youtube.com (69%/ 31%)
Usage of "he" vs "she" on flickr.com (57%/ 43%)

Rebecca Kelley [PersonRank 0]

17 years ago

This study seems a bit ridiculous to me. When writing, it's customary to use "he" as the default gender in order to avoid sentences riddled with "he/she" (e.g. "A user on your site is more likely to convert if he/she thinks that the content is relevant to him/her"). Look at the Spanish language: when referring to a mixed-gender group (el chico, la chica), the plural takes the "los" definite article, which is masculine (los chicos).

I'm not saying that men aren't more highly represented than women; however, I think it's important to keep in mind basic grammar and writing rules.

Ludwik Trammer [PersonRank 10]

17 years ago

It is interesting to compare this result with the average ratio for the Internet as a whole, which is 65%/35%.
That means, for example, that blogspot.com is exactly average, and flickr.com has more content about women than the average site.

Jennifer Hitchcock [PersonRank 0]

17 years ago

Wow!

I am 91% girly. (It won't let me put my domain name because it has a number in it. ladylike four is the name though.)

I took a test somewhere else, though, that said I had a strong masculine side. These predictors and indicators are funny.

Ludwik Trammer [PersonRank 10]

17 years ago

> This study seems a bit ridiculous to me.

It's not ridiculous. You just have to keep all those factors in mind. I believe my previous post, which compares results with an average, should help avoid the problem you mentioned. You could even introduce a new ratio based on that. For example, flickr.com has 62% more female content than average, and YouTube 6% more male.
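One way to make the "more than average" idea concrete is to divide a site's share of each pronoun by the web-wide share (65%/35%) and subtract 1. This is an assumed reading of the suggestion, not a formula given in the thread. Under it, YouTube's 69%/31% comes out at about 6% more "he", matching the figure above, while flickr.com's 57%/43% comes out at about 23% more "she", so the 62% presumably rests on a different calculation.

```python
# Sketch of a "relative to the web average" ratio. The formula
# (site share / average share) - 1 is an assumed reading of the
# forum suggestion, not a confirmed method.

WEB_AVERAGE = (65, 35)  # "he"/"she" percentages quoted for the Internet as a whole


def relative_to_average(he_pct, she_pct, average=WEB_AVERAGE):
    """Return (extra_he, extra_she) as fractions of the average share.

    A positive value means the site uses that pronoun more than average.
    """
    avg_he, avg_she = average
    return he_pct / avg_he - 1, she_pct / avg_she - 1


sites = {"youtube.com": (69, 31), "flickr.com": (57, 43)}
for site, (he, she) in sites.items():
    extra_he, extra_she = relative_to_average(he, she)
    print(f"{site}: {extra_he:+.0%} 'he', {extra_she:+.0%} 'she' compared with average")
```

Dividing by the average (rather than subtracting percentage points) keeps small shares from looking unimportant: a 4-point gap means more on the 35% "she" side than on the 65% "he" side.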

Ludwik Trammer [PersonRank 10]

17 years ago

> I took some test somewhere else
> though that said I had a strong masculine side.

That's correct. Your content is 91% ABOUT girls. And you know WHO is interested in girls...? ;)

dan [PersonRank 0]

17 years ago

But...

is it really the opposite?

Would a guy write HE on his site? He'd write "me"... and when writing about a woman he would write "she"... so it could be that we need to flip the data.

www.vibrantorange.com

Ramibotros [PersonRank 10]

17 years ago

Interesting:
Usage of "he" vs "she" on *.blogspot.com (47%/ 53%)
Usage of "he" vs "she" on *.* (66%/ 34%)

Ramibotros [PersonRank 10]

17 years ago

As for music:
Usage of "he" vs "she" on last.fm (44%/ 56%)

Also:
Usage of "he" vs "she" on imdb.com (58%/ 42%)
Usage of "he" vs "she" on groups.google.com (73%/ 27%)

George R [PersonRank 10]

17 years ago


The following results were all taken from IP address 72.14.207.107 over a short period of time on 4/30/2007.

  804,000  site:cnn.com
1,740,000  site:cnn.com the
1,550,000  site cnn.com -he -she
1,220,000  site:cnn.com he OR she
1,220,000  site:cnn.com he
  336,000  site:cnn.com she
  883,000  site:cnn.com he -she
   65,100  site:cnn.com she -he
1,660,000  site:cnn.com -he
2,380,000  site:cnn.com -she
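These counts can be checked against simple set arithmetic: every cnn.com page either contains "he" or it does not, so count(he) + count(-he) should equal count(site:cnn.com), and the number of pages containing both words can be derived from either pronoun. The sketch below runs those checks on the posted rows, leaving out the "the" row and the "-he -she" row (whose query lacks the colon). Any mismatch means the counts cannot all be exact page totals.

```python
# Consistency checks on the cnn.com counts posted in the forum.
# If the counts were exact page totals, each pair would be equal.

counts = {
    "site": 804_000,        # site:cnn.com
    "he": 1_220_000,        # site:cnn.com he
    "she": 336_000,         # site:cnn.com she
    "he OR she": 1_220_000,
    "he -she": 883_000,
    "she -he": 65_100,
    "-he": 1_660_000,
    "-she": 2_380_000,
}


def consistency_checks(c):
    """Return {check name: (left side, right side)} for several identities."""
    both_via_he = c["he"] - c["he -she"]     # pages with both, derived from "he"
    both_via_she = c["she"] - c["she -he"]   # the same set, derived from "she"
    return {
        "he + -he = site": (c["he"] + c["-he"], c["site"]),
        "she + -she = site": (c["she"] + c["-she"], c["site"]),
        "both (via he) = both (via she)": (both_via_he, both_via_she),
        "he OR she = he + she - both": (c["he OR she"], c["he"] + c["she"] - both_via_he),
    }


for name, (left, right) in consistency_checks(counts).items():
    verdict = "ok" if left == right else "mismatch"
    print(f"{name}: {left:,} vs {right:,} ({verdict})")
```

All four checks come out as mismatches; the first two are off by more than a factor of three.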

George R [PersonRank 10]

17 years ago

Your ratios always add up to exactly 100%.
Many pages contain both "he" and "she".
Such pages should count twice, resulting in a total greater than 100%.
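A small sketch of that distinction, with made-up numbers: dividing each count by count(he) + count(she) always yields a pair summing to 100%, whereas dividing by the number of pages containing either word lets pages with both pronouns count twice. That the tool uses the first method is an assumption based on its output always summing to 100%.

```python
# Two ways to turn page counts into percentages (numbers are made up).


def normalized_split(he, she):
    """Share of count(he) + count(she); the pair always sums to 100%."""
    total = he + she
    return he / total, she / total


def share_of_pages(he, she, either):
    """Share of pages containing either word; overlap pushes the sum past 100%."""
    return he / either, she / either


# 100 pages mention at least one pronoun: 80 contain "he", 50 contain "she",
# so 80 + 50 - 100 = 30 pages contain both.
he_pages, she_pages, either_pages = 80, 50, 100

he, she = normalized_split(he_pages, she_pages)
print(f"normalized: {he:.0%} / {she:.0%}")  # 62% / 38%

he, she = share_of_pages(he_pages, she_pages, either_pages)
print(f"share of pages: {he:.0%} / {she:.0%}, total {he + she:.0%}")  # 80% / 50%, total 130%
```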

In any event, the "counts" that Google provides are of questionable value. See my previous comment.

John Honeck [PersonRank 10]

17 years ago

Hopefully Philipp doesn't rank for He:She after this post, which is a different subject altogether.

Since it's been well established that Googlebot is indeed a she, does this come into indexing decisions?

Philipp Lenssen [PersonRank 10]

17 years ago

> This study seems a bit ridiculous to me. When writing, it's
> customary to use "he" as the default gender in order to
> avoid sentences riddled with "he/she"

Ludwik hit the nail on the head when he said: "It's not ridiculous. You just have to keep all those factors in mind." The ratio I presented makes no attempt at any specific interpretation ("more 'he' means xyz" or "50% 'she' means foobar"). And there are many ways to interpret these results. For instance, Ludwik proposes to level the values according to the average of the web at large. On the other hand, someone else may consider the web at large to be biased, so they'd consider this leveling to be unfair, etc. (and some may consider "he" defaulting to "neutral" to be biased grammar itself, proposing e.g. a balanced he/she alternation). Again, there are many different interpretations of the he/she ratios.

Stubbe [PersonRank 0]

17 years ago

This is such a great tool! I'm proud to say that thestubbes.com is 74% compliant. I hope you don't mind me posting the image on my site (with a link to this fabulous tool, of course).

Andrew [PersonRank 0]

17 years ago

The experiment is flawed: it shows the number of PAGES containing the word 'he' or 'she' but doesn't take into account how frequently the words appear on each page. So a page with 'he' appearing 100 times is given the same weight as a page with 'she' appearing once.
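The difference between the two measures can be shown in a few lines: a page count only records whether a page contains a word, while an occurrence count weights each page by how often the word appears. The three pages below are invented for illustration, and tokenizing on whole words keeps "she" from also matching "he".

```python
import re

# Invented pages for illustration only.
pages = [
    "he said he would call, and he did",  # "he" three times
    "she left early",
    "she and her sister arrived",
]


def words(text):
    """Lower-case word tokens, so "she" is never counted as "he"."""
    return re.findall(r"[a-z']+", text.lower())


def page_count(pages, word):
    """Pages containing the word at least once (what a page count measures)."""
    return sum(word in words(p) for p in pages)


def occurrence_count(pages, word):
    """Every occurrence of the word across all pages."""
    return sum(words(p).count(word) for p in pages)


for label, count in (("pages", page_count), ("occurrences", occurrence_count)):
    print(f"{label}: he={count(pages, 'he')} she={count(pages, 'she')}")
# pages: he=1 she=2
# occurrences: he=3 she=2
```

The same three pages lean "she" by page count and "he" by occurrence count.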

Ludwik Trammer [PersonRank 10]

17 years ago

> it shows the number of PAGES containing
> the word 'he' or 'she' but doesn't take into account
> how frequently the words appear on each page.

Yes, and this could be considered both a good and a bad thing. One article with a lot of "she"s shouldn't change the stats for a whole service, gathered from many years of articles.
Maybe the best solution would be to count "he" and "she" in every single article separately and then give a point to the gender with more mentions in that article. Then count the points.
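The point system could be sketched like this: each article casts one vote for whichever pronoun it uses more often, ties cast none, and the votes are tallied. The function names and the tie rule are assumptions for illustration. In the example, one article packed with "she" still earns only one point, against two for "he".

```python
import re
from collections import Counter


def article_winner(text):
    """Return "he" or "she" for the pronoun used more often, or None on a tie."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    he, she = tokens["he"], tokens["she"]  # missing words count as 0
    if he == she:
        return None
    return "he" if he > she else "she"


def majority_points(articles):
    """One point per article for its dominant pronoun; ties score nothing."""
    points = Counter(article_winner(a) for a in articles)
    points.pop(None, None)
    return points


articles = [
    "she " * 10,               # one article with ten "she"s
    "he wrote a long post",
    "he replied later",
    "nobody in particular",    # no pronouns at all: a tie
]
print(majority_points(articles))  # Counter({'he': 2, 'she': 1})
```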

But Philipp's solution is better just because it's easy and clean. Philipp is showing us that we can gather many interesting stats in a simple way (do you remember his chess stats gathered from Google?). It's less about the stats and more about the process.

Andrew [PersonRank 0]

17 years ago

Linguists have been gathering stats (about word frequencies, etc) using this process for quite a while but the results are notoriously unreliable. See, for example, http://aixtal.blogspot.com/2005/02/web-googles-missing-pages-mystery.html

Rob Balder [PersonRank 0]

17 years ago

I'm also puzzled as to the significance (if any) of this number for a given site. Livejournal.com, for example, keeps statistics on their users who indicate gender (most of us) and shows about a 2:1 female:male user ratio.

Male: 1724500 (32.8%)
Female: 3533289 (67.2%)
Unspecified: 2089647

Yet the he/she for Livejournal is (66%/ 34%).

The person who did this comparison above for MySpace shows an even more extreme disparity between user demographics and ...what, speech focus? What is this measuring? I'm having a hard time seeing what this number could be used for, even in sweeping and general terms.

Andrew [PersonRank 0]

17 years ago

^
The simple answer: males don't only use 'he' in their writing and females don't only use 'she' in their writing!

Ludwik Trammer [PersonRank 10]

17 years ago

> Male: 1724500 (32.8%)
> Female: 3533289 (67.2%)
> Yet the he/she for Livejournal is (66%/ 34%).

And isn't that fascinating?
I find this and similar comparisons of different data extremely interesting. For example, in this case the percentage of posts with "he" is almost exactly the percentage of female bloggers, and the percentage of posts with "she" is almost exactly the percentage of male bloggers. It would be really interesting to compare data from more sources.
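The arithmetic behind this comparison, using the LiveJournal figures quoted above with "Unspecified" users left out (an assumption that reproduces the posted 32.8%/67.2%):

```python
def shares(male, female):
    """Percentage of male and female users, ignoring "Unspecified"."""
    total = male + female
    return 100 * male / total, 100 * female / total


male_pct, female_pct = shares(1_724_500, 3_533_289)  # LiveJournal figures quoted above
he_pct, she_pct = 66, 34                              # he/she ratio quoted for LiveJournal

print(f"female users {female_pct:.1f}% vs 'he' share {he_pct}%")
print(f"male users {male_pct:.1f}% vs 'she' share {she_pct}%")
```

Both gaps come out at roughly one percentage point.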

Kevin T. Keith [PersonRank 1]

17 years ago

I find that many valid URLs return only an error message, even some that get a lot of traffic. What gives?

Philipp Lenssen [PersonRank 10]

17 years ago

Kevin, got some examples?
Basically, all I'm doing is concatenating a "site:" command with the URL entered, and then checking the Google page count...
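A minimal sketch of that procedure, assuming the ratio is count(he) / (count(he) + count(she)) over two "site:" queries. `page_count` is a hypothetical callable standing in for however the result count is fetched; it is not a real Google API. The stub reuses the counts George R posted for `site:cnn.com he` and `site:cnn.com she`.

```python
from typing import Callable


def he_she_ratio(site: str, page_count: Callable[[str], int]) -> str:
    """Build "site:" queries for a URL and turn two page counts into percentages.

    `page_count` is a hypothetical stand-in for however the result
    count is obtained; it is not a real Google API.
    """
    he = page_count(f"site:{site} he")
    she = page_count(f"site:{site} she")
    total = he + she
    if total == 0:
        raise ValueError(f"no pages found for {site}")
    he_pct = round(100 * he / total)
    return f'Usage of "he" vs "she" on {site} ({he_pct}%/ {100 - he_pct}%)'


# Stub lookup table using the cnn.com counts from the forum.
stub_counts = {"site:cnn.com he": 1_220_000, "site:cnn.com she": 336_000}
print(he_she_ratio("cnn.com", stub_counts.__getitem__))
# Usage of "he" vs "she" on cnn.com (78%/ 22%)
```

Because the second percentage is computed as 100 minus the first, the pair always sums to 100%, regardless of how many pages contain both words.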

Stephen Tordoff [PersonRank 10]

17 years ago

Now this is interesting, and somewhat related to this:

"She invented"

http://www.google.co.uk/search?hl=en&q=%22she+invented%22&btnG=Google+Search&meta=

Stephen Tordoff [PersonRank 10]

17 years ago

Or worse, http://www.google.com/search?hl=en&q=what+have+women+invented%3F&btnG=Search

TOMHTML [PersonRank 10]

17 years ago

Found via Digg, of course ;-)

Stephen Tordoff [PersonRank 10]

17 years ago

The second one was, but I already knew about the first before reading the Digg article.

Amy Forza [PersonRank 0]

17 years ago

Women Inventors A-Z... Look it up.

James Xuan [PersonRank 10]

17 years ago

"What have women invented" is not as good as your first one, Stephen.

ssvbhalla [PersonRank 0]

17 years ago

she invented

This site unofficially covers Google™ and more with some rights reserved.