Evaluating online labor markets for experimental research: Amazon.com’s Mechanical Turk

The Internet may offer social science researchers a way to significantly lower labor costs by providing access to a large pool of workers. Google’s now-dormant ImageLabeler, for instance, crowdsourced the tedious task of assigning text labels to millions of images, and Amazon’s Mechanical Turk service (MTurk) supports the outsourcing of brief cognitive tasks or surveys. But how useful for careful research are these pools of individuals who simply have a computer, an Internet connection and a few minutes to spare?

A 2012 study published in Political Analysis, “Evaluating Online Labor Markets for Experimental Research: Amazon.com’s Mechanical Turk,” explores the viability of conducting social science experiments using MTurk participants. The researchers, based at the Massachusetts Institute of Technology, Yale University and the University of California at Berkeley, compare a typical MTurk recruit pool with the convenience, Internet and face-to-face samples used in earlier research, focusing on study costs and sample quality.

The study’s findings include:

“The MTurk sample does not perfectly match the demographic and attitudinal characteristics of the U.S. population but does not present a wildly distorted view of the U.S. population, either.” MTurk works best for random population sampling; it is less successful with studies that require more precisely defined populations.
The MTurk convenience sample (a sample drawn from whichever respondents are available) was 60% female and 83.5% white, with a mean age of 32.2 and an average of 14.9 years of education. Earlier student samples had a mean age of 20.3 years, while adult samples had 75.5% female participants with 5.48 years of education. “The MTurk respondent pool has attractive characteristics — even apart from issues of cost.”
The MTurk sample was only slightly more female (60% versus 58%) and less educated (14.9 years of schooling versus 16.2) than respondents to the 2008 American National Election Panel Study. MTurk participants were also more interested in politics in general, more likely to identify as Democrats and “substantially more liberal in their ideology.”
MTurk recruits “fare worse in comparison to [traditional national surveys] on demographic characteristics related to life cycle events.” Recruits are more likely to live in the northeastern United States, to have never married (51%), to rent rather than own their home (53%) and to have no affiliation with a major religion (42%).
With respect to the costs associated with study participant identification and recruitment, “even the highest pay rate we have used on MTurk of $.50 for a 5-min survey (an effective hourly rate of $6) is still associated with a per-respondent cost of $.55 (including Amazon.com’s 10% surcharge) or $.11 per survey minute. By contrast, per subject costs for typical undergraduate samples are about $5 to $10, for non-student campus samples about $30 … and for temporary agency subjects between $15 and $20.”
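
The cost figures quoted in the last finding can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `mturk_cost` and its parameters are hypothetical, and it assumes the 10% surcharge is applied to the payment made to each respondent, as the quotation describes.

```python
def mturk_cost(pay_per_survey, minutes, surcharge_rate=0.10):
    """Return (effective hourly rate, cost per respondent, cost per survey minute).

    Hypothetical helper for illustration only; the surcharge is assumed
    to be a flat percentage added to the respondent's pay.
    """
    # What the respondent earns, scaled up to a full hour
    hourly_rate = pay_per_survey * 60 / minutes
    # What the researcher pays per respondent, surcharge included
    cost_per_respondent = pay_per_survey * (1 + surcharge_rate)
    # Researcher's cost spread over the length of the survey
    cost_per_minute = cost_per_respondent / minutes
    return hourly_rate, cost_per_respondent, cost_per_minute


# The study's highest pay rate: $.50 for a 5-minute survey
hourly, per_respondent, per_minute = mturk_cost(0.50, 5)
print(f"hourly rate:         ${hourly:.2f}")          # $6.00
print(f"cost per respondent: ${per_respondent:.2f}")  # $0.55
print(f"cost per minute:     ${per_minute:.2f}")      # $0.11
```

These values match the $6, $.55 and $.11 figures in the quotation, against the $5 to $30 per-subject range it gives for other sample types.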

The researchers note that “MTurk subjects are often more representative of the general population and substantially less expensive to recruit…. Put simply, despite possible self-selection concerns, the MTurk subject pool is no worse than convenience samples used by other researchers in political science.” They caution, though, that MTurk recruits are typically younger and more liberal than the general public and pay more attention to tasks, differences that could compromise the integrity of research.
