Facebook, private traits and attributes: Predictions from digital records of human behavior


Governments, businesses and policy makers are increasingly able to capture and organize vast quantities of personal information — a trend known as “Big Data” — to help them make better-informed decisions. However, this data can also pose a serious threat to individual privacy. In a widely publicized 2012 episode, the retailer Target was able to guess from a young woman’s recent purchases that she was pregnant, before she’d even told others she was expecting. Even criminal enterprises have gotten in on the trend.

Data gathering extends far beyond simply tracking credit card purchases and includes the information people share online about themselves and others. A 2007 report by the Pew Internet and American Life Project found that 60% of those surveyed were not concerned about what was available about them online and 61% did not try to control it; however, by 2012, Pew found that many people were employing privacy settings to reduce the information available about themselves. Other research has indicated that those who use social media sites are typically more relaxed about privacy settings and often feel a false sense of control.

A 2013 study from the University of Cambridge [U.K.] and Microsoft Research published in the Proceedings of the National Academy of Sciences (PNAS), “Private Traits and Attributes Are Predictable from Digital Records of Human Behavior,” examines the degree to which information available online can successfully predict an individual’s personal — and private — attributes. The researchers correlated public records of Facebook “Likes” from more than 58,000 users with results from personality and intelligence tests and information from public profiles. The study focused on whether a user’s history of Likes could accurately predict sexual orientation, ethnic origin, political views, religion, personality, intelligence, satisfaction with life, substance use, age, gender, relationship status, and “whether an individual’s parents stayed together until the individual was 21 years old.”

Key study findings include:

  • The researchers were able to accurately predict a user’s sexual orientation 88% of the time for men and 75% for women. While less than 5% of user profiles were explicitly linked to gay policy or advocacy groups, “predictions rely on less informative but more popular Likes, such as ‘Britney Spears’ or ‘Desperate Housewives’ (both moderately indicative of being gay).”
  • The model was able to predict a user’s ethnic origin (95%) and gender (93%) with a high degree of accuracy. “Patterns of online behavior as expressed by Likes significantly differ between those groups, allowing for nearly perfect classification.”
  • The model also predicted, with a high degree of accuracy, whether a user was Christian or Muslim (82%), was a Democrat or Republican (85%), used alcohol, drugs or cigarettes (between 65% and 75%), or was in a relationship (67%).
  • The model was less accurate when attempting to predict whether a user’s parents had stayed together (60%). “Individuals with parents who separated have a higher probability of liking statements preoccupied with relationships, such as ‘If I’m with you then I’m with you. I don’t want anyone else.’”
  • The low prediction accuracy for satisfaction with life (a correlation of only 0.17 between predicted and actual scores) may be linked to an inability to separate short-term affect (such as bad mood or mood swings) from longer-term happiness.
  • “Although liking ‘Barack Obama’ is clearly related to being a Democrat, it is also relatively popular among Christians, African-Americans and Homosexual individuals.”

The researchers caution against the potential negative outcomes that ready access to this type of personal data might have: “Commercial companies, governmental institutions, or even one’s Facebook friends could use software to infer attributes such as intelligence, sexual orientation or political views [that] could pose a threat to an individual’s well-being, freedom or even life.”

In a related study, “Silent Listeners: The Evolution of Privacy and Disclosure on Facebook,” researchers found that although Facebook users exhibited more privacy-seeking behaviors, this was offset by changes to Facebook’s privacy policy. “The amount and scope of personal information that Facebook users revealed privately to other connected profiles actually increased over time and because of that, so did disclosures to ‘silent listeners’ on the network: Facebook itself, third-party apps, and (indirectly) advertisers.”

Tags: technology, Facebook, privacy

    Last updated: March 25, 2013

    Citation: Kosinski, Michal; Stillwell, David; Graepel, Thore. "Private Traits and Attributes Are Predictable from Digital Records of Human Behavior." PNAS, March 2013. doi: 10.1073/pnas.1218772110.

    Analysis assignments

    Read the study-related Guardian [U.K.] article titled "'Like' It or Not, Privacy Has Changed in the Facebook Age."

    1. What key insights from the news article and the study in this lesson should reporters be aware of as they cover these issues?

    Read the full study titled “Private Traits and Attributes Are Predictable From Digital Records of Human Behavior.”

    1. What are the study's key technical terms? Which ones need to be put into language a lay audience can understand?
    2. Do the study’s authors put the research into context and show how they are advancing the state of knowledge about the subject? If so, what did the previous research indicate?
    3. What is the study’s research method? If there are statistical results, how did the scholars arrive at them?
    4. Evaluate the study's limitations. (For example, are there weaknesses in the study's data or research design?)
    5. How could the findings be misreported or misinterpreted by a reporter? In other words, what are the difficulties in conveying the data accurately? Give an example of a faulty headline or story lead.

    Newswriting and digital reporting assignments

    1. Write a lead, headline or nut graph based on the study.
    2. Spend 60 minutes exploring the issue by accessing sources of information other than the study. Write a lead (or headline or nut graph) based on the study but informed by the new information. Does the new information significantly change what one would write based on the study alone?
    3. Compose two Twitter messages of 140 characters or fewer accurately conveying the study’s findings to a general audience. Make sure to use appropriate hashtags.
    4. Choose several key quotations from the study and show how they would be set up and used in a brief blog post.
    5. Map out the structure for a 60-second video segment about the study. What combination of study findings and visual aids could be used?
    6. Find pictures and graphics that might run with a story about the study. If appropriate, also find two related videos to embed in an online posting. Be sure to evaluate the credibility and appropriateness of any materials you would aggregate and repurpose.

    Class discussion questions

    1. What is the study’s most important finding?
    2. Would members of the public intuitively understand the study’s findings? If not, what would be the most effective way to relate them?
    3. What kinds of knowledgeable sources would you interview to report the study in context?
    4. How could the study be “localized” and shown to have community implications?
    5. How might the study be explained through the stories of representative individuals? What kinds of people might a reporter feature to make such a story about the study come alive?
    6. What sorts of stories might be generated out of secondary information or ideas discussed in the study?