A May 2014 White House report on “big data” notes that the ability to determine the demographic traits of individuals through algorithms and aggregation of online data has a potential downside beyond just privacy concerns: systematic discrimination.
There is a long history of denying access to bank credit and other financial services based on the communities from which applicants come — a practice called “redlining.” Likewise, the report warns, “Just as neighborhoods can serve as a proxy for racial or ethnic identity, there are new worries that big data technologies could be used to ‘digitally redline’ unwanted groups, either as customers, employees, tenants or recipients of credit.” (See materials from the report’s related research conference for scholars’ views on this and other issues.)
One vexing problem, according to the report, is that potential digital discrimination is even less likely to be pinpointed, and therefore remedied:
The technologies of automated decision-making are opaque and largely inaccessible to the average person. Yet they are assuming increasing importance and being used in contexts related to individuals’ access to health, education, employment, credit and goods and services. This combination of circumstances and technology raises difficult questions about how to ensure that discriminatory effects resulting from automated decision processes, whether intended or not, can be detected, measured, and redressed.
A 2014 research paper published after the White House report, “Big Data’s Disparate Impact,” also notes significant concerns as online industries become even more sophisticated in how they use data: “Approached without care, data mining can reproduce existing patterns of discrimination, inherit the prejudice of prior decision-makers, or simply reflect the widespread biases that persist in society. It can even have the perverse result of exacerbating existing inequalities by suggesting that historically disadvantaged groups actually deserve less favorable treatment.” The paper’s authors argue that the most likely legal basis for anti-discrimination enforcement, Title VII, is not currently adequate to stop many forms of discriminatory data mining, and “society does not have a ready answer for what to do about it.”
Such discriminatory possibilities are not hard to imagine, given the amount of personal data most people willingly and unwittingly disclose online. In a 2013 study in the Proceedings of the National Academy of Sciences (PNAS), “Private Traits and Attributes Are Predictable from Digital Records of Human Behavior,” scientists from the University of Cambridge and Microsoft Research were able to combine data on Facebook “Likes” and limited survey information to determine the following: They could accurately predict a user’s sexual orientation 88% of the time for men and 75% for women; predict a user’s ethnic origin (95%) and gender (93%) with a high degree of accuracy; and predict whether a user was Christian or Muslim (82%), a Democrat or Republican (85%), or used alcohol, drugs or cigarettes (between 65% and 75%), or was in a relationship (67%).
Of course, websites from Amazon to Netflix frequently employ “recommendation engines” that attempt to target potential preferences and identify personality and demographic types. Some online marketplaces where goods and services are exchanged feature explicit identifying information about sellers. Could this type of e-commerce lead to new digital forms of discrimination? Early Internet-based research suggested the Web could actually diminish discrimination across society, as it reduces face-to-face interactions and makes race or gender less salient in transactions such as car sales, where discrimination has traditionally been prevalent.
A 2014 study from a team of researchers at Northeastern University, “Measuring Price Discrimination and Steering on E-commerce Web Sites,” however, finds troubling evidence by measuring the patterns of 16 top online sites. The scholars (Aniko Hannak, Gary Soeller, David Lazer, Alan Mislove and Christo Wilson) find that “real-world data indicates that eight of these sites implement personalization” and that seven sites have mechanisms that allow for personalization. Their specific findings include:
- “Cheaptickets and Orbitz implement price discrimination by offering reduced prices on hotels to
- “Expedia and Hotels.com engage in A/B testing that steers a subset of users towards more expensive hotels.”
- “Home Depot and Travelocity personalize search results for users on mobile devices.”
- “Priceline personalizes search results based on a user’s history of clicks and purchases.”
The researchers reached out to various companies for comments, which are included in the paper.
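To make the idea of measuring price personalization concrete, here is a minimal sketch of comparing the prices shown to two accounts for the same items. It is an illustration only, not the Northeastern team's method: the function name `price_differences`, the `tolerance` parameter and all prices are hypothetical.

```python
from typing import Dict, List


def price_differences(
    control: Dict[str, float],
    treatment: Dict[str, float],
    tolerance: float = 0.01,
) -> List[str]:
    """Return items whose price differs between two accounts
    by more than `tolerance`, measured relative to the control price."""
    flagged = []
    for item, base_price in control.items():
        # Skip items the second account never saw, and guard against bad data.
        if item not in treatment or base_price <= 0:
            continue
        relative_change = abs(treatment[item] - base_price) / base_price
        if relative_change > tolerance:
            flagged.append(item)
    return sorted(flagged)


# Hypothetical prices: a plain "control" account versus a second test account.
control_account = {"hotel_a": 120.0, "hotel_b": 95.0, "hotel_c": 210.0}
test_account = {"hotel_a": 108.0, "hotel_b": 95.0, "hotel_c": 210.0}

flagged = price_differences(control_account, test_account)
print(flagged)  # ['hotel_a']
```

A real measurement would also need to separate personalization from ordinary noise, such as prices that change between requests for reasons unrelated to the user.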
Further, researchers Benjamin G. Edelman and Michael Luca at Harvard Business School analyzed online data to assess whether or not the site Airbnb.com, which allows people to rent out lodging, might play host to forms of racial discrimination. Their 2014 paper “Digital Discrimination: The Case of Airbnb.com” examined listings for thousands of New York City landlords in mid-2012. Airbnb builds up a reputation system by allowing ratings from guests and hosts.
The study’s findings include:
- “The raw data show that non-black and black hosts receive strikingly different rents: roughly $144 versus $107 per night, on average.” However, the researchers had to control for a variety of factors that might skew an accurate comparison, such as differences in geographical location.
- “Controlling for all of these factors, non-black hosts earn roughly 12% more for a similar apartment with similar ratings and photos relative to black hosts.”
- “Despite the potential of the Internet to reduce discrimination, our results suggest that social platforms such as Airbnb may have the opposite effect. Full of salient pictures and social profiles, these platforms make it easy to discriminate — as evidenced by the significant penalty faced by a black host trying to conduct business on Airbnb.”
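The difference between the raw and the controlled figures is easier to see as arithmetic. The short sketch below uses only the numbers quoted above; the helper name `relative_gap` is an assumption made for illustration, and the 12% figure is simply restated from the study, not recomputed.

```python
def relative_gap(higher: float, lower: float) -> float:
    """Return how much larger `higher` is than `lower`, as a fraction of `lower`."""
    return (higher - lower) / lower


raw_non_black = 144.0  # average nightly rent for non-black hosts (raw data)
raw_black = 107.0      # average nightly rent for black hosts (raw data)

raw_gap = relative_gap(raw_non_black, raw_black)
controlled_gap = 0.12  # gap reported after controlling for listing characteristics

print(f"Raw gap: {raw_gap:.1%}")                # Raw gap: 34.6%
print(f"Controlled gap: {controlled_gap:.0%}")  # Controlled gap: 12%
```

In other words, factors such as location account for much of the raw difference, but a gap of roughly 12% remains after the researchers' controls.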
“Given Airbnb’s careful consideration of what information is available to guests and hosts,” Edelman and Luca note, “Airbnb might consider eliminating or reducing the prominence of host photos: It is not immediately obvious what beneficial information these photos provide, while they risk facilitating discrimination by guests. Particularly when a guest will be renting an entire property, the guest’s interaction with the host will be quite limited, and we see no real need for Airbnb to highlight the host’s picture.” (For its part, Airbnb responded to the study by saying that it prohibits discrimination in its terms of service, and that the data analyzed were both older and limited geographically.)
More data needed on algorithms
Finally, a 2014 paper by Nicholas Diakopoulos of the University of Maryland, “Algorithmic Accountability Reporting: On the Investigation of Black Boxes,” looks at issues of establishing greater transparency and “reverse engineering” algorithms to help us better understand their biases – from auto-completion on Google and Bing to targeted political email to online pricing schemes that differentiate among users. Early on in this path-breaking report, written for the Tow Center for Digital Journalism at Columbia Journalism School, Diakopoulos sums up one of the most important emerging issues for democracy as it relates to the digital world: “What we generally lack as a public is clarity about how algorithms exercise their power over us.”
Related research: A 2013 study of Google’s search engine suggested that names more commonly adopted by African-Americans may be strongly associated with criminal record search results. A related issue is the customization of search results generally based on user data. For a more precise sense of how much Google filters results, see “Personalization of Web Search,” a 2013 paper by a research team at Northeastern University. That study finds that, on average, “11.7% of search results show differences due to personalization.”
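A figure like "11.7% of search results show differences" can be read as a share of result slots that differ between users. The sketch below shows one simple way to compute such a share for two ranked lists. It is an assumption for illustration, not the metric defined in the Northeastern paper; the function name `fraction_different` and the example URLs are hypothetical.

```python
from typing import List


def fraction_different(results_a: List[str], results_b: List[str]) -> float:
    """Share of ranked positions at which two result lists show different URLs.

    If one list is shorter, its missing positions count as differences.
    """
    length = max(len(results_a), len(results_b))
    if length == 0:
        return 0.0
    differing = 0
    for position in range(length):
        a = results_a[position] if position < len(results_a) else None
        b = results_b[position] if position < len(results_b) else None
        if a != b:
            differing += 1
    return differing / length


# Hypothetical result pages for the same query, seen by two different users.
user_one = [f"site{i}.example" for i in range(10)]
user_two = list(user_one)
user_two[3] = "other.example"  # one of the ten slots shows a different result

share = fraction_different(user_one, user_two)
print(f"{share:.1%}")  # 10.0%
```

A position-by-position comparison treats a reordering as a difference; other definitions, such as comparing the sets of URLs regardless of rank, would give different numbers.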
Keywords: consumer affairs, African-American, civil rights