Evaluating online labor markets for experimental research: Amazon.com’s Mechanical Turk
The Internet may offer social science researchers a way to significantly lower labor costs by providing access to a large pool of workers. Google’s now-dormant ImageLabeler, for instance, crowdsourced the tedious task of assigning text labels to millions of images, and Amazon’s Mechanical Turk service (MTurk) supports the outsourcing of brief cognitive tasks or surveys. But how useful for careful research are these pools of random individuals who have a computer, an Internet connection and a few minutes to spare?
A 2012 study published in Political Analysis, “Evaluating Online Labor Markets for Experimental Research: Amazon.com’s Mechanical Turk,” explores the viability of conducting social science experiments using MTurk participants. The researchers, based at the Massachusetts Institute of Technology, Yale University and the University of California at Berkeley, compare the composition of a typical MTurk recruit pool to population (convenience, Internet and face-to-face) samples used in earlier research studies with respect to study costs and sample quality.
The study’s findings include:
- “The MTurk sample does not perfectly match the demographic and attitudinal characteristics of the U.S. population but does not present a wildly distorted view of the U.S. population, either.” MTurk works best for random population sampling; it is less successful with studies that require more precisely defined populations.
- The MTurk population convenience sample (surveying a population based on available respondents) had 60% female and 83.5% white participants, with a mean age of 32.2 and 14.9 years of education. Earlier student samples had a mean age of 20.3 years, while adult samples had 75.5% female participants with 5.48 years of education. “The MTurk respondent pool has attractive characteristics — even apart from issues of cost.”
- The MTurk sample was only slightly more female (60% versus 58%) and less educated (14.9 years of schooling versus 16.2) than the sample in the 2008 American National Election Panel Study. MTurk participants were also more interested in politics in general, more likely to identify as Democrats and “substantially more liberal in their ideology.”
- MTurk recruits “fare worse in comparison to [traditional national surveys] on demographic characteristics related to life cycle events.” Recruits are more likely to live in the northeastern United States, have never married (51%), rent versus own their home (53%) and not be affiliated with a major religion (42%).
- With respect to the costs associated with study participant identification and recruitment, “even the highest pay rate we have used on MTurk of $.50 for a 5-min survey (an effective hourly rate of $6) is still associated with a per-respondent cost of $.55 (including Amazon.com’s 10% surcharge) or $.11 per survey minute. By contrast, per subject costs for typical undergraduate samples are about $5 to $10, for non-student campus samples about $30 … and for temporary agency subjects between $15 and $20.”
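The cost figures in the quotation above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `mturk_cost` and its parameters are hypothetical, and the only inputs taken from the study are the $.50 payment, the 5-minute survey length and the 10% surcharge.

```python
def mturk_cost(pay_per_response, survey_minutes, surcharge_rate=0.10):
    """Return (hourly_rate, cost_per_respondent, cost_per_minute) in dollars.

    Hypothetical helper for illustration; the surcharge default is the
    10% figure quoted in the study, not a current Amazon price.
    """
    # What a worker would earn per hour at this pay rate
    hourly_rate = pay_per_response * 60 / survey_minutes
    # What the researcher pays per respondent, surcharge included
    cost_per_respondent = pay_per_response * (1 + surcharge_rate)
    # Researcher's cost for each minute of survey time
    cost_per_minute = cost_per_respondent / survey_minutes
    return hourly_rate, cost_per_respondent, cost_per_minute


hourly, per_respondent, per_minute = mturk_cost(0.50, 5)
print(f"Effective hourly rate: ${hourly:.2f}")        # $6.00
print(f"Cost per respondent:   ${per_respondent:.2f}")  # $0.55
print(f"Cost per survey minute: ${per_minute:.2f}")     # $0.11
```

At these rates, a 500-person sample would cost about $275, compared with $2,500 to $5,000 at the quoted $5 to $10 per undergraduate subject.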
The researchers note that “MTurk subjects are often more representative of the general population and substantially less expensive to recruit…. Put simply, despite possible self-selection concerns, the MTurk subject pool is no worse than convenience samples used by other researchers in political science.” They caution, though, that MTurk recruits are typically younger and more liberal than the general public and pay more attention to tasks, factors that could compromise the integrity of research.
Read the issue-related PsychCentral blog post titled "Mechanical Turk to the Rescue of Psychological Research?"
- What key issues does this raise about survey research?
- What are the study's key technical term(s)? Which ones need to be put into language a lay audience can understand?
- Do the study’s authors put the research into context and show how they are advancing the state of knowledge about the subject? If so, what did the previous research indicate?
- What is the study’s research method? If there are statistical results, how did the scholars arrive at them?
- Evaluate the study's limitations. (For example, are there weaknesses in the study's data or research design?)
- How could the findings be misreported or misinterpreted by a reporter? In other words, what are the difficulties in conveying the data accurately? Give an example of a faulty headline or story lead.
Newswriting and digital reporting assignments
- Write a lead, headline or nut graph based on the study.
- Spend 60 minutes exploring the issue by accessing sources of information other than the study. Write a lead (or headline or nut graph) based on the study but informed by the new information. Does the new information significantly change what one would write based on the study alone?
- Compose two Twitter messages of 140 characters or fewer accurately conveying the study’s findings to a general audience. Make sure to use appropriate hashtags.
- Choose several key quotations from the study and show how they would be set up and used in a brief blog post.
- Map out the structure for a 60-second video segment about the study. What combination of study findings and visual aids could be used?
- Find pictures and graphics that might run with a story about the study. If appropriate, also find two related videos to embed in an online posting. Be sure to evaluate the credibility and appropriateness of any materials you would aggregate and repurpose.
Class discussion questions
- What is the study’s most important finding?
- Would members of the public intuitively understand the study’s findings? If not, what would be the most effective way to relate them?
- What kinds of knowledgeable sources would you interview to report the study in context?
- How could the study be “localized” and shown to have community implications?
- How might the study be explained through the stories of representative individuals? What kinds of people might a reporter feature to make such a story about the study come alive?
- What sorts of stories might be generated out of secondary information or ideas discussed in the study?