Wednesday, March 18, 2009
Can a Site Guess a User’s Gender With JavaScript?
Update: Hah. The whole thing was already thought up, and implemented as a sample, some time ago. [Thanks A B!]
My friend Nikolai and I were brainstorming a site we’ve planned, and the topic of adjusting the site’s content based on gender came up. In this case, knowing the gender would be a bit helpful but not crucial (certainly not crucial enough to bother asking users to provide the info themselves), and getting it 100% right would not be necessary either. (For the site we discussed, we also wouldn’t store any user data such as names or email addresses, so changing the site’s content based on gender would be just a temporary, per-session thing.) So we were wondering whether the following approach would be technically feasible (note that we had not yet judged whether it should be done if it could be done; we were only brainstorming):
- Identify a couple of hundred or so websites which are typically visited by men, and then identify a couple of hundred sites typically visited by women. For instance, one could check the Alexa top sites for some inspiration, or enter such things as “top woman’s portals” into Google. One may also be able to simply get a long list of popular websites and then ask the people at Mechanical Turk whether or not they’ve visited a particular site, and then ask them if they’re male or female.
- With that list in hand, one could use the JavaScript history hack which works in many common browsers. The basis of this hack is that it allows you, given a list of URLs, to check whether the user visited any of the URLs before. This can be done because you can use a hidden layer which outputs the links, with a different (CSS) style assigned to visited vs unvisited links, and then, via JavaScript, go through the links to check their applied styling to see whether the URL shows as visited. You can check the sample I published here a while ago.
- Now for every “typically male” site the user visited you could add a point, and for every “typically female” site you could subtract a point (or subtract for men and add for women, as you prefer). Then if you end up with a number crossing a certain confidence threshold – say, over 50 points plus or over 50 points minus – you could make a guess that the user is a man or a woman, and submit that data via a form, Ajax or whatnot. (If no confidence threshold is reached, your site takes a neutral stance on whatever you wanted to change based on gender.)
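The scoring step from the last bullet could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name `guessGender`, the two site lists, and the `isVisited` predicate, which merely stands in for the CSS visited-link check described above. The default threshold of 50 is the number used in the bullet.

```javascript
// Hypothetical sketch: the names and the lists are illustrative only.
// isVisited(url) is an assumed predicate standing in for the CSS history check.
function guessGender(maleSites, femaleSites, isVisited, threshold = 50) {
  let score = 0;
  for (const url of maleSites) {
    if (isVisited(url)) score += 1; // one point up per "typically male" site
  }
  for (const url of femaleSites) {
    if (isVisited(url)) score -= 1; // one point down per "typically female" site
  }
  if (score > threshold) return "male";
  if (score < -threshold) return "female";
  return "neutral"; // threshold not crossed: the site takes a neutral stance
}
```

Passing the visited check in as a function keeps the scoring separate from the history hack itself, so the threshold logic can be tried out with a plain list of URLs before any browser code is involved.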
Now, here are some things to consider:
- I find the CSS hack borderline (and perhaps beyond) in terms of user privacy. Browsers apparently don’t consider it a security risk as it is, or they might have fixed this hack, as it has been around for years. But using this hack at all, even when it’s stated in the privacy terms, may well not be the right thing to do. The point for our app during brainstorming, though, was that we don’t save any user data like names or emails. My friend and I were just throwing around technical ideas and not yet judging whether it would be OK to do this, so that was not the point of the brainstorming, but the issue definitely requires a discussion.
On that note, would it even be technically possible (and wanted?) to stop this hack from working in future browsers, that is, without breaking many completely unrelated scripts?
- Sometimes, two users may share the same browser, hence their URL histories may be mixed up. On the other hand, this could also mean that no confidence threshold would be crossed by the algorithm, so it may be OK after all.
- Is it even possible to find a sufficient list of “typically male/typically female” websites? Is there even such a thing as a “typically male” site? Would the guesses based on that list ever be good enough to be right at least, say, 80% or 90% of the time? Would the algo only identify the “soap opera cliché male/female” (loves football, beer, fast cars, must be male!)?
(One thing we also wondered: would it be enough to have your algorithm try to identify just one gender – say, only the male gender, based on typically male sites – and then, if that gender isn’t sufficiently identified, simply assume the person is of the other gender?)
- We also wondered if there’s any popular social network site (something like Facebook) which displays one default avatar image to users who identified themselves as male, and another avatar to those who identified themselves as female. For users who have not yet chosen their own avatar within the social network, could the history hack then be applied to that image? (I.e. if the female default avatar image URL shows up as visited, the user is female. This would only work if that image is shown only for your own avatar, not when avatars of other users in the network are displayed.)
- ... on a side-note: could the same algorithm approach also be used to guess a person’s age (user visiting a lot of kids sites?), their tastes and preferences, etc.?
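The one-gender-only shortcut from the parenthetical above can be sketched in the same spirit. Again, `guessOneSided`, the site list and the `isVisited` predicate are names invented for this illustration, and the threshold of 50 is borrowed from the earlier example rather than being a tested value.

```javascript
// Hypothetical sketch of the one-gender-only shortcut; all names are illustrative.
// isVisited(url) is again an assumed stand-in for the CSS history check.
function guessOneSided(maleSites, isVisited, threshold = 50) {
  const hits = maleSites.filter((url) => isVisited(url)).length;
  // Anything short of the threshold falls through to the other gender,
  // so an empty or mixed history is never reported as "unknown".
  return hits > threshold ? "male" : "female";
}
```

The trade-off shows up right away: a visitor with a freshly cleared history, or two people sharing one browser, would be labeled “female” here, whereas a two-sided score can stay neutral in those cases.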
Thoughts and comments... ?