Project Gaydar: Data-Mining Social Networks
Using data from the social network Facebook, they made a striking discovery: just by looking at a person’s online friends, they could predict whether the person was gay. They did this with a software program that looked at the gender and sexuality of a person’s friends and, using statistical analysis, made a prediction. The two students had no way of checking all of their predictions, but based on their own knowledge outside the Facebook world, their computer program appeared quite accurate for men, they said. People may be effectively “outing” themselves just by the virtual company they keep.
“When they first did it, it was absolutely striking – we said, ‘Oh my God – you can actually put some computation behind that,’ ” said Hal Abelson, a computer science professor at MIT who co-taught the course. “That pulls the rug out from a whole policy and technology perspective that the point is to give you control over your information – because you don’t have control over your information.”
. . .
Facebook spokesman Simon Axten could not respond to Jernigan and Mistree’s analysis, since it is not public, but pointed out that it is something that happens every day.
- Project ‘Gaydar’ by Carolyn Y. Johnson
Boston Globe
2009-09-20
Oops! Your sexual preference is showing.
Keep in mind that the research performed by these students is far from “high-tech”. Their work isn’t published, but it is reasonable to guess that they put together something as simple as tallying each individual’s connections: wherever an individual with an unknown sexual preference was connected to an individual with a known sexual preference, the program added to the first individual’s homosexuality indicator.
This would be a highly iterative process. However, knowing the actual sexual preference of only a small percentage of individuals, and then extrapolating across connections from one unknown to the next based upon what is known, would allow the data-mining program to indicate the sexual preference of everyone with a connection with better-than-random probability.
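To make that guess concrete, here is a minimal sketch of the kind of iterative neighbour-vote scheme described above (often called label propagation). It is not the students' program, which is unpublished. The function name `propagate_labels`, the toy friendship graph, and the placeholder labels "X" and "Y" are all assumptions made up for illustration.

```python
from collections import Counter

def propagate_labels(friends, known, max_rounds=10):
    """Guess a label for unlabelled nodes from the labels of their neighbours.

    friends: dict mapping each node to the set of nodes it is connected to.
    known:   dict mapping a few nodes to a label; these are never overwritten.
    Returns a dict of known labels plus guesses for the nodes that could be reached.
    """
    labels = dict(known)
    for _ in range(max_rounds):
        updates = {}
        for node, neighbours in friends.items():
            if node in known:
                continue
            # Count the labels (known or guessed so far) among this node's neighbours.
            votes = Counter(labels[n] for n in neighbours if n in labels)
            if not votes:
                continue  # nothing to extrapolate from yet
            ranked = votes.most_common(2)
            if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
                continue  # tie: keep whatever guess (if any) the node already has
            guess = ranked[0][0]
            if labels.get(node) != guess:
                updates[node] = guess
        if not updates:
            break  # no guess changed in this round, so stop early
        labels.update(updates)
    return labels

# Hypothetical six-person network; only "a" and "f" have a known label.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
known = {"a": "X", "f": "Y"}
result = propagate_labels(friends, known)
```

Even in this toy case, two known labels out of six are enough for the loop to assign a guess to every other node within a few rounds, which is the "unknown to unknown" extrapolation at work. A real analysis would need weighting, confidence estimates, and validation that this sketch leaves out.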
Take it a few steps further and start analyzing the other information provided – favorite books, movies, musical acts, their addresses, the content posted by users on each other’s profile pages, content posted in online journals, the content of sites linked from each user’s profile, even their names… you can build a statistically probable representation of an individual down to his or her ideology.
So, who wants to be first up against the wall?





