Facebook, private traits and attributes: Predictions from digital records of human behavior

Governments, businesses and policy makers are increasingly able to capture and organize vast quantities of personal information — a trend known as “Big Data” — to help them make better-informed decisions. However, this data can also pose a serious threat to individual privacy. In a widely publicized 2012 episode, the retailer Target was able to guess from a young woman’s recent purchases that she was pregnant, before she’d even told others she was expecting. Even criminal enterprises have gotten in on the trend.

Data gathering extends far beyond simply tracking credit card purchases and includes the information people share online about themselves and others. A 2007 report by the Pew Internet and American Life Project found that 60% of those surveyed were not concerned about how much information about them was available online and that 61% did not try to limit it; by 2012, however, Pew found that many people were using privacy settings to reduce the information available about them. Other research has indicated that social media users are typically more relaxed about privacy settings and often feel a false sense of control.

A 2013 study from the University of Cambridge [U.K.] and Microsoft Research, published in the Proceedings of the National Academy of Sciences (PNAS), “Private Traits and Attributes Are Predictable from Digital Records of Human Behavior,” examines how well information available online can predict an individual’s personal, and private, attributes. The researchers correlated the Facebook Likes of more than 58,000 volunteer users with their results on personality and intelligence tests and with information from their Facebook profiles. The study focused on whether a user’s history of Likes could accurately predict sexual orientation, ethnic origin, political views, religion, personality, intelligence, satisfaction with life, substance use, age, gender, relationship status, and “whether an individual’s parents stayed together until the individual was 12 years old.”
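The article does not spell out the prediction model. As a rough illustration only, the sketch below fits a plain logistic-regression classifier that maps a user’s Likes (one 0/1 column per page) to the probability that the user has a yes/no trait. The page names, users and labels are hypothetical, and the PNAS paper’s own pipeline differs: it first compresses the user-Like matrix with singular value decomposition and then fits regression models on the resulting components, which this sketch skips.

```python
import math

# Hypothetical user-Like matrix: one row per user, one column per page
# (1 = the user Liked that page). Page names and labels are invented.
PAGES = ["page_a", "page_b", "page_c", "page_d"]
LIKES = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
HAS_TRAIT = [1, 1, 1, 0, 0, 0]  # hypothetical yes/no trait labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, likes):
    """Estimated probability that a user with this Like vector has the trait."""
    return sigmoid(sum(w * x for w, x in zip(weights, likes)) + bias)


def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Derivative of the log-loss with respect to the logit.
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


weights, bias = train_logistic(LIKES, HAS_TRAIT)
for page, w in zip(PAGES, weights):
    # Positive weight: Liking this page pushes the prediction toward the trait.
    print(f"{page}: {w:+.2f}")
```

In this toy data, Liking `page_a` ends up with a positive weight and `page_d` with a negative one, so no single Like has to name the trait outright for the combined score to be informative.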

Key study findings include:

  • The researchers were able to accurately predict a user’s sexual orientation 88% of the time for men and 75% for women. Although fewer than 5% of gay users’ profiles were explicitly linked to gay policy or advocacy groups, “predictions rely on less informative but more popular Likes, such as ‘Britney Spears’ or ‘Desperate Housewives’ (both moderately indicative of being gay).”
  • The model was able to predict a user’s ethnic origin (95%) and gender (93%) with a high degree of accuracy. “Patterns of online behavior as expressed by Likes significantly differ between those groups, allowing for nearly perfect classification.”
  • The model predicted whether a user was Christian or Muslim (82%) or a Democrat or Republican (85%), whether they used alcohol, drugs or cigarettes (between 65% and 75%), and whether they were in a relationship (67%), with accuracy ranging from moderate to high.
  • The model was less accurate when predicting whether a user’s parents had stayed together (60%). “Individuals with parents who separated have a higher probability of liking statements preoccupied with relationships, such as ‘If I’m with you then I’m with you. I don’t want anyone else.’”
  • Prediction accuracy was lowest for satisfaction with life (a correlation of only 0.17 between predicted and actual scores), possibly because it is hard to separate short-term affect (such as a bad mood or mood swings) from longer-term happiness.
  • “Although liking ‘Barack Obama’ is clearly related to being a Democrat, it is also relatively popular among Christians, African-Americans and Homosexual individuals.”
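The percentages reported for yes/no traits above (such as 88% for sexual orientation among men or 60% for parents’ separation) are best read as discrimination rates rather than simple hit rates. The PNAS paper reports these predictions as the area under the ROC curve (AUC): the probability that a randomly chosen user who has a trait receives a higher model score than a randomly chosen user who does not, so 50% is chance and 100% is perfect separation. A minimal sketch of that pairwise calculation, using made-up scores (the function name is hypothetical):

```python
from itertools import product


def pairwise_auc(scores_with_trait, scores_without_trait):
    """Fraction of (with-trait, without-trait) user pairs in which the user
    with the trait gets the higher score; ties count as half a win.
    0.5 means chance, 1.0 means perfect separation."""
    wins = 0.0
    for s_pos, s_neg in product(scores_with_trait, scores_without_trait):
        if s_pos > s_neg:
            wins += 1.0
        elif s_pos == s_neg:
            wins += 0.5
    return wins / (len(scores_with_trait) * len(scores_without_trait))


# Made-up model scores for three users with a trait and three without.
print(pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```

Under this reading, a 60% result is only modestly better than a coin flip, while 95% means the model almost always ranks a member of one group above a member of the other.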

The researchers caution against the potential negative outcomes that ready access to this type of personal data might have: “Commercial companies, governmental institutions, or even one’s Facebook friends could use software to infer attributes such as intelligence, sexual orientation or political views [that] could pose a threat to an individual’s well-being, freedom or even life.”

In a related study, “Silent Listeners: The Evolution of Privacy and Disclosure on Facebook,” researchers found that although Facebook users exhibited more privacy-seeking behaviors, this was offset by changes to Facebook’s privacy policy. “The amount and scope of personal information that Facebook users revealed privately to other connected profiles actually increased over time and because of that, so did disclosures to ‘silent listeners’ on the network: Facebook itself, third-party apps, and (indirectly) advertisers.”

Tags: technology, Facebook, privacy

Last updated: March 25, 2013

Citation: Kosinski, Michal; Stillwell, David; Graepel, Thore. "Private Traits and Attributes Are Predictable from Digital Records of Human Behavior." PNAS, March 2013. doi: 10.1073/pnas.1218772110.