The possibilities of digital discrimination: Research on e-commerce, algorithms and big data
A May 2014 White House report on “big data” notes that the ability to determine the demographic traits of individuals through algorithms and aggregation of online data has a potential downside beyond just privacy concerns: Systematic discrimination.
There is a long history of denying access to bank credit and other financial services based on the communities from which applicants come — a practice called “redlining.” Likewise, the report warns, “Just as neighborhoods can serve as a proxy for racial or ethnic identity, there are new worries that big data technologies could be used to ‘digitally redline’ unwanted groups, either as customers, employees, tenants or recipients of credit.” (See materials from the report’s related research conference for scholars’ views on this and other issues.)
One vexing problem, according to the report, is that potential digital discrimination is even less likely to be pinpointed, and therefore remedied:
The technologies of automated decision-making are opaque and largely inaccessible to the average person. Yet they are assuming increasing importance and being used in contexts related to individuals’ access to health, education, employment, credit and goods and services. This combination of circumstances and technology raises difficult questions about how to ensure that discriminatory effects resulting from automated decision processes, whether intended or not, can be detected, measured, and redressed.
A 2014 research paper published subsequent to the White House report, “Big Data’s Disparate Impact,” also notes significant concerns as online industries become even more sophisticated in how they use data: “Approached without care, data mining can reproduce existing patterns of discrimination, inherit the prejudice of prior decision-makers, or simply reflect the widespread biases that persist in society. It can even have the perverse result of exacerbating existing inequalities by suggesting that historically disadvantaged groups actually deserve less favorable treatment.” The paper’s authors argue that the most likely legal basis for anti-discrimination enforcement, Title VII, is not currently adequate to stop many forms of discriminatory data mining, and “society does not have a ready answer for what to do about it.”
Such discriminatory possibilities are not hard to imagine, given the amount of personal data most people willingly and unwittingly disclose online. In a 2013 study in the Proceedings of the National Academy of Sciences (PNAS), “Private Traits and Attributes Are Predictable from Digital Records of Human Behavior,” scientists from the University of Cambridge and Microsoft Research were able to combine data on Facebook “Likes” and limited survey information to determine the following: They could accurately predict a user’s sexual orientation 88% of the time for men and 75% for women; predict a user’s ethnic origin (95%) and gender (93%) with a high degree of accuracy; and predict whether a user was Christian or Muslim (82%), a Democrat or Republican (85%), or used alcohol, drugs or cigarettes (between 65% and 75%), or was in a relationship (67%).
Of course, websites from Amazon to Netflix frequently employ “recommendation engines” that attempt to target potential preferences and identify personality and demographic types. Some online marketplaces where goods and services are exchanged feature explicit identifying information about sellers. Could this type of e-commerce lead to new digital forms of discrimination? Early Internet-based research suggested the Web could actually diminish discrimination across society, as it reduces face-to-face interactions and makes race or gender less salient in transactions such as car sales, where discrimination has traditionally been prevalent.
A 2014 study from a team of researchers at Northeastern University, “Measuring Price Discrimination and Steering on E-commerce Web Sites,” finds troubling evidence by measuring the behavior of 16 top online sites. The scholars (Aniko Hannak, Gary Soeller, David Lazer, Alan Mislove and Christo Wilson) find that “real-world data indicates that eight of these sites implement personalization” and seven sites have mechanisms that allow for personalization. Their specific findings include:
- “Cheaptickets and Orbitz implement price discrimination by offering reduced prices on hotels to
- “Expedia and Hotels.com engage in A/B testing that steers a subset of users towards more expensive hotels.”
- “Home Depot and Travelocity personalize search results for users on mobile devices.”
- “Priceline personalizes search results based on a user’s history of clicks and purchases.”
The researchers reached out to various companies for comments, which are included in the paper.
Further, researchers Benjamin G. Edelman and Michael Luca at Harvard Business School analyzed online data to assess whether or not the site Airbnb.com, which allows people to rent out lodging, might play host to forms of racial discrimination. Their 2014 paper “Digital Discrimination: The Case of Airbnb.com” examined listings for thousands of New York City landlords in mid-2012. Airbnb builds up a reputation system by allowing ratings from guests and hosts.
The study’s findings include:
- “The raw data show that non-black and black hosts receive strikingly different rents: roughly $144 versus $107 per night, on average.” However, the researchers had to control for a variety of factors that might skew an accurate comparison, such as differences in geographical location.
- “Controlling for all of these factors, non-black hosts earn roughly 12% more for a similar apartment with similar ratings and photos relative to black hosts.”
- “Despite the potential of the Internet to reduce discrimination, our results suggest that social platforms such as Airbnb may have the opposite effect. Full of salient pictures and social profiles, these platforms make it easy to discriminate — as evidenced by the significant penalty faced by a black host trying to conduct business on Airbnb.”
“Given Airbnb’s careful consideration of what information is available to guests and hosts,” Edelman and Luca note, “Airbnb might consider eliminating or reducing the prominence of host photos: It is not immediately obvious what beneficial information these photos provide, while they risk facilitating discrimination by guests. Particularly when a guest will be renting an entire property, the guest’s interaction with the host will be quite limited, and we see no real need for Airbnb to highlight the host’s picture.” (For its part, Airbnb responded to the study by saying that it prohibits discrimination in its terms of service, and that the data analyzed were both older and limited geographically.)
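The difference between the study's raw figures and its controlled estimate can be easy to misread, so a small worked example may help. The sketch below first computes the raw percentage gap implied by the $144 and $107 averages quoted above, then uses a handful of invented listings to show how a gap can shrink once like is compared with like. The listings, the neighborhood labels and the helper names (`percent_premium`, `mean_price`) are all hypothetical, and grouping by neighborhood is a simplified stand-in for the researchers' controls, not their actual method.

```python
from statistics import mean


def percent_premium(higher: float, lower: float) -> float:
    """Return how much larger `higher` is than `lower`, in percent."""
    return (higher / lower - 1.0) * 100.0


# Raw nightly averages quoted from the study: $144 versus $107.
raw_gap_pct = percent_premium(144, 107)

# HYPOTHETICAL listings: (neighborhood, host group, nightly price).
# Invented purely to illustrate the arithmetic; not data from the study.
toy_listings = [
    ("pricey", "non-black", 200), ("pricey", "non-black", 200),
    ("pricey", "non-black", 200), ("pricey", "black", 180),
    ("cheap", "non-black", 100), ("cheap", "black", 90),
    ("cheap", "black", 90), ("cheap", "black", 90),
]


def mean_price(listings, group, neighborhood=None):
    """Average price for a host group, optionally within one neighborhood."""
    prices = [
        price for hood, host_group, price in listings
        if host_group == group and (neighborhood is None or hood == neighborhood)
    ]
    return mean(prices)


# Raw comparison: ignores where the listings are located.
toy_raw_pct = percent_premium(
    mean_price(toy_listings, "non-black"), mean_price(toy_listings, "black")
)

# Stratified comparison: compute the gap inside each neighborhood, then average.
toy_within_pct = mean(
    percent_premium(
        mean_price(toy_listings, "non-black", hood),
        mean_price(toy_listings, "black", hood),
    )
    for hood in ("pricey", "cheap")
)

print(f"Raw gap from the study's averages: {raw_gap_pct:.1f}%")
print(f"Toy data, raw gap: {toy_raw_pct:.1f}%")
print(f"Toy data, within-neighborhood gap: {toy_within_pct:.1f}%")
```

The study's raw averages imply a gap of roughly 35%, well above the 12% that remains after controls. In the invented data the same pattern appears: a raw gap of about 56% falls to about 11% within neighborhoods, because the two groups' listings are spread unevenly across expensive and inexpensive areas. A reporter who cites only the raw figure would overstate the penalty; one who cites only the controlled figure should say what was controlled for.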
More data needed on algorithms
Finally, a 2014 paper by Nicholas Diakopoulos of the University of Maryland, “Algorithmic Accountability Reporting: On the Investigation of Black Boxes,” looks at issues of establishing greater transparency and “reverse engineering” algorithms to help us better understand their biases – from auto-completion on Google and Bing to targeted political email to online pricing schemes that differentiate among users. Early on in this path-breaking report, written for the Tow Center for Digital Journalism at Columbia Journalism School, Diakopoulos sums up one of the most important emerging issues for democracy as it relates to the digital world: “What we generally lack as a public is clarity about how algorithms exercise their power over us.”
Related research: A 2013 study of Google’s search engine suggested that names more commonly adopted by African-Americans may be strongly associated with criminal record search results. A related issue is the customization of search results generally based on user data. For a more precise sense of how much Google filters results, see “Personalization of Web Search,” a 2013 paper by a research team at Northeastern University. That study finds that, on average, “11.7% of search results show differences due to personalization.”
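To make a figure like "11.7% of search results show differences" more concrete, here is a minimal sketch of one way to compare the results two accounts receive for the same query. It is an illustration under stated assumptions, not the Northeastern team's code: the function names, the example domains and both result lists are hypothetical, and the two measures shown (set overlap and position-by-position differences) are common choices rather than a description of the paper's exact procedure.

```python
from itertools import zip_longest


def jaccard_index(results_a, results_b):
    """Share of distinct results that both lists have in common (1.0 = identical sets)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def share_of_positions_changed(results_a, results_b):
    """Fraction of ranked positions where the two lists show different results."""
    pairs = list(zip_longest(results_a, results_b))
    if not pairs:
        return 0.0
    changed = sum(1 for left, right in pairs if left != right)
    return changed / len(pairs)


# HYPOTHETICAL result lists for the same query from two accounts.
control_results = ["a.example", "b.example", "c.example", "d.example", "e.example"]
personalized_results = ["a.example", "c.example", "b.example", "d.example", "f.example"]

overlap = jaccard_index(control_results, personalized_results)
changed = share_of_positions_changed(control_results, personalized_results)

print(f"Set overlap (Jaccard index): {overlap:.2f}")
print(f"Share of positions that differ: {changed:.0%}")
```

In this invented example four of six distinct results are shared (an overlap of about 0.67) and three of five positions differ (60%). Search results can also vary between identical requests for reasons unrelated to the user, so a careful measurement would compare two matching control accounts first and treat only the differences beyond that baseline as personalization.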
Keywords: consumer affairs, African-American, civil rights
Read the study-related Boston Globe article titled "Online Marketplaces May Encourage Bias."
- Reporter's use of the study: Evaluate what the reporter chose to include and exclude from the study. Would the audience have acquired a clear understanding of the study's findings and limits from this article?
- Reporter's use of other material: Assess the material in the article that is not derived from the study. For example: Does the reporter place the study in the context of other research and to what effect? Does the reporter include reactions to the study from other researchers or interested parties (e.g., political groups, business leaders, or community members) and are their credentials or possible biases made clear?
Read the full study titled “Digital Discrimination: The Case of Airbnb.com.”
- What are the study's key technical terms? Which ones need to be put into language a lay audience can understand?
- Do the study’s authors put the research into context and show how they are advancing the state of knowledge about the subject? If so, what did the previous research indicate?
- What is the study’s research method? If there are statistical results, how did the scholars arrive at them?
- Evaluate the study's limitations. (For example, are there weaknesses in the study's data or research design?)
- How could the findings be misreported or misinterpreted by a reporter? In other words, what are the difficulties in conveying the data accurately? Give an example of a faulty headline or story lead.
Newswriting and digital reporting assignments
- Write a lead, headline or nut graph based on the study.
- Spend 60 minutes exploring the issue by accessing sources of information other than the study. Write a lead (or headline or nut graph) based on the study but informed by the new information. Does the new information significantly change what one would write based on the study alone?
- Compose two Twitter messages of 140 characters or fewer accurately conveying the study’s findings to a general audience. Make sure to use appropriate hashtags.
- Choose several key quotations from the study and show how they would be set up and used in a brief blog post.
- Map out the structure for a 60-second video segment about the study. What combination of study findings and visual aids could be used?
- Find pictures and graphics that might run with a story about the study. If appropriate, also find two related videos to embed in an online posting. Be sure to evaluate the credibility and appropriateness of any materials you would aggregate and repurpose.
Class discussion questions
- What is the study’s most important finding?
- Would members of the public intuitively understand the study’s findings? If not, what would be the most effective way to relate them?
- What kinds of knowledgeable sources would you interview to report the study in context?
- How could the study be “localized” and shown to have community implications?
- How might the study be explained through the stories of representative individuals? What kinds of people might a reporter feature to make such a story about the study come alive?
- What sorts of stories might be generated out of secondary information or ideas discussed in the study?