Data Collection in a Flat World: Strengths and Weaknesses of Mechanical Turk Samples

Cynthia Cryder

Journal of Behavioral Decision Making, J. Behav. Dec. Making (2012) Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/bdm.1753 Data Collection in a Flat World: The Strengths and Weaknesses of Mechanical Turk Samples JOSEPH K. GOODMAN1, CYNTHIA E. CRYDER1* and AMAR CHEEMA2 1 Olin Business School, Washington University in St. Louis, Saint Louis, MO, USA 2 McIntire School of Commerce, University of Virginia, Charlottesville, VA, USA ABSTRACT Mechanical Turk (MTurk), an online labor system run by Amazon.com, provides quick, easy, and inexpensive access to online research participants. As use of MTurk has grown, so have questions from behavioral researchers about its participants, reliability, and low compensation. In this article, we review recent research about MTurk and compare MTurk participants with community and student samples on a set of personality dimensions and classic decision-making biases. Across two studies, we ﬁnd many similarities between MTurk participants and traditional samples, but we also ﬁnd important differences. For instance, MTurk participants are less likely to pay attention to experimental materials, reducing statistical power. They are more likely to use the Internet to ﬁnd answers, even with no incentive for correct responses. MTurk participants have attitudes about money that are different from a community sample’s attitudes but similar to students’ attitudes. Finally, MTurk participants are less extraverted and have lower self-esteem than other participants, presenting challenges for some research domains. Despite these differences, MTurk participants produce reliable results consistent with standard decision-making biases: they are present biased, risk-averse for gains, risk-seeking for losses, show delay/expedite asymmetries, and show the certainty effect—with almost no signiﬁcant differences in effect sizes from other samples. We conclude that MTurk offers a highly valuable opportunity for data collection and recommend that researchers using MTurk (1) include screening questions that gauge attention and language comprehension; (2) avoid questions with factual answers; and (3) consider how individual differences in ﬁnancial and social domains may inﬂuence results. Copyright © 2012 John Wiley & Sons, Ltd. key words research methods; surveys; sampling; online research; external validity INTRODUCTION It is now possible for more people than ever to collaborate and compete in real time with more other people on more different kinds of work from more different corners of the planet and on a more equal footing than at any previous time in the history of the world (Friedman, 2005, p. 8). The world’s increased connectivity and access to information, called “ﬂatness” by recent best-selling author Thomas Friedman, has changed not only business practices and social communication but also how academic researchers function. Connectivity allows new research methods and ﬁndings to be rapidly disseminated and adopted by researchers worldwide. Connectivity also provides unique advantages in human behavior research by providing more convenient access to broader groups of human participants than has been possible before, most notably through a recently developed online labor market called “Mechanical Turk” or “MTurk” (www. mturk.com). MTurk, run by Amazon.com and named after an 18th century automated chess machine, provides an online workforce that allows people to complete work, or “Human Intelligence Tasks” (HITs), in exchange for money. Psychologists, behavioral economists, theoretical biologists, and consumer behavior researchers have recently started to recruit online study participants on MTurk. MTurk participants complete HITs, including academic studies, around the clock, allowing rapid data *Correspondence to: Cynthia E. Cryder, Olin Business School, Washington University in St. Louis, One Brookings Dr., CB 1133, Saint Louis, MO 63130, USA. E-mail: cryder@wustl.edu Copyright © 2012 John Wiley & Sons, Ltd. collection for as little as 10 cents per participant. MTurk is quickly being adopted by the research community; our own informal investigations of its use through MTurk postings and colleague surveys found that researchers from at least 16 of the top 30 US universities collect behavioral data via MTurk. However, at this time, MTurk is not yet widely accepted as a participant source. A main concern about using MTurk for research is the belief that participants who are willing to participate in a study for only 10 cents must be unusual. Most importantly, they might be unusual in ways that challenge the validity of research investigations. Therefore, researchers in various ﬁelds have begun to look into data from MTurk participants to examine these concerns. These researchers have veriﬁed that MTurk demographic responses are accurate (Rand, 2011), validated the psychometric properties of MTurk responses (Buhrmester, Kwang, & Gosling, 2011), and replicated some of the classic ﬁndings in behavioral economics (Horton, Rand, & Zeckhauser, 2011; Suri & Watts, 2011) and decision-making research (Paolacci, Chandler, & Ipeirotis, 2010). Except for one study showing that MTurk workers were more risk-averse than non-MTurk participants (Paolacci et al., 2010) in the Asian Disease problem (Tversky & Kahneman, 1981), research has not identiﬁed signiﬁcant differences between MTurk participants and traditional samples. Although the body of evidence validating MTurk for use in behavioral research is growing, signiﬁcant concerns remain. In particular, researchers and reviewers worry that MTurk workers do not pay sufﬁcient attention to study materials. Additionally, researchers are concerned about the growing number of international participants on MTurk who may provide different responses because of language Journal of Behavioral Decision Making or cultural differences. Furthermore, MTurk workers, who participate in studies for extremely low amounts of money, may be especially peculiar in their attitudes about money and time, which are variables of particular interest to decision-making researchers. In this paper, we directly compare MTurk samples with traditional student and community samples on several dimensions, including attention tests, a set of personality measures that include ﬁnancial attitudes, and decision-making biases. Consistent with recent MTurk ﬁndings, we show that MTurk generally provides an excellent opportunity for inexpensive and efﬁcient behavioral data collection with reliable results. However, we also ﬁnd notable differences for MTurk participants that researchers should consider before using MTurk for their own research. We initially give a brief background about MTurk and the current state of behavioral research being conducted on MTurk. MECHANICAL TURK “MTurk” uses Internet crowd sourcing to connect potential workers with jobs or tasks, called “HITs.” Both the tasks and the compensation are provided by the “requester” (an organization or individual), which pays MTurk “workers” based on the quality of their submitted work. At any moment, there are more than 100 000 HITs available to MTurk workers ranging in payment from $.01 to more than $10. The tasks vary widely in content and include, for example, conducting Web searches, editing audio transcripts, and completing questionnaires (see Mason & Suri (2012) for an excellent comprehensive guide about using MTurk). Workers participate in studies for multiple reasons, including supplemental income and enjoyment (Buhrmester et al., 2011; Paolacci et al., 2010). Paolacci et al. (2010) report that 61% of MTurk workers state that earning additional money motivates their work (though less than 14% state that it is a primary source of income) and 41% report completing HITs for entertainment. MTurk workers are a diverse group of individuals in terms of location, with fewer than half (47%) living in the United States and about one-third living in India (Paolacci et al., 2010). Surveying MTurk workers (n = 103), we found that they learn about MTurk primarily from news articles and blogs (42%), friends (27%), and Internet searches (25%; only 3% learn about it from Amazon itself). Nevertheless, MTurk workers are not signiﬁcant outliers in terms of their general demographics. Compared with the general US population, MTurk workers have a similar income distribution (with a slightly lower mean), are only slightly younger on average, have fewer children on average (though an average number for their age), and they tend to spend a day or less per week completing HITs (Ipeirotis, 2010). Recently, behavioral researchers have dramatically increased their use of MTurk for data collection because of several advantages that MTurk offers. First, MTurk samples are inexpensive; a researcher can pay as little as 10–50 cents per participant for a short study. Second, MTurk allows for rapid data collection; often, an entire study can be completed within hours. Third, MTurk answers a call from some Copyright © 2012 John Wiley & Sons, Ltd. behavioral researchers for more representative samples compared with traditional student samples (e.g., Lynch, 1982; Parker & Fischoff, 2005; Peterson, 2001; Winer, 1999). MTurk reaches a more diverse population than even some community samples, allowing researchers to gain generalizability to broader populations (Buhrmester et al., 2011) and to test research questions across cultures (Eriksson & Simpson, 2010). Because of the high interest in MTurk, recent research has attempted to systematically investigate MTurk’s usefulness and reliability as a participant source. Rand (2011) found that demographic responses from MTurk were truthful and consistent; for example, more than 95% of MTurk participants reported their country location correctly as veriﬁed by IP address matching. Buhrmester et al. (2011) found that common ﬂuctuations in compensation do not affect MTurk data quality. Buhrmester et al. (2011) also found that responses from MTurk participants were at least as reliable as those obtained from other non-MTurk samples and were also more representative of the general population compared with student participants and other online participants. Finally, MTurk participants exhibit similar judgment and decision biases (i.e., framing effects, the conjunction fallacy, and outcome bias) compared with students and online discussion board participants (Paolacci et al., 2010). Overall, these investigations conclude that MTurk is a high-quality source of behavioral participants. RESEARCH QUESTIONS Although research to date has widely endorsed MTurk, some pressing questions about using MTurk remain in the minds of both researchers and reviewers. Initially, do MTurk participants pay sufﬁcient attention to study materials? A big concern is that MTurk participants are especially prone to skimming through study materials because they are unsupervised and poorly compensated. One study found high levels of attentiveness among MTurk participants (as measured by response to the question, “While watching the television, have you ever had a fatal heart attack?”; Paolacci et al., 2010). However, the answer to this screening question was easily recognizable, and almost all participants (95%), regardless of source, answered the question correctly. Because research has shown that unsupervised participants are less likely to pay attention to instructions (Oppenheimer, Meyvis, & Davidenko, 2009), we wished to further explore this issue by using more demanding attention checks. In both of our studies, we included a modiﬁed Instructional Manipulation Check (IMC; Oppenheimer et al., 2009) that was designed to assess whether or not participants were reading instructions carefully. We also examined whether English as a Second Language (ESL) speakers, a growing portion of MTurk participants, struggled with following these experiment instructions; demographic comparisons ﬁnd that as many as one-third of MTurk participants are located in India alone, a proportion that is increasing with time (Paolacci et al., 2010). Language barriers J. Behav. Dec. Making (2012) DOI: 10.1002/bdm J. K. Goodman et al. may compromise ESL participants’ ability to comprehend and follow detailed directions. A related question is whether MTurk participants have different cognitive capabilities compared with non-MTurk participants. This is an area of concern because cognitive predispositions have been found to be important moderators of decision patterns (Chatterjee, Heath, Milberg, & France, 2000; LeBoeuf & Shaﬁr, 2003). Many MTurk participants (41%) cite enjoyment as a major reason for participating in MTurk studies (Paolacci et al., 2010), and therefore, it is possible that MTurk participants intrinsically enjoy cognitive activities more than community and student samples. Alternatively, college students, who by deﬁnition have chosen to pursue a higher education (arguably a cognitively rigorous endeavor), may be more likely to engage in effortful cognitive processing. To investigate these potential differences, we tested for differences in the Cognitive Reﬂection Test (CRT; Frederick, 2005) between MTurk participants and both student and community samples. Although some research has examined the psychometric standards of MTurk participants’ responses (Buhrmester et al., 2011), no work to date has examined whether personality features of MTurk participants differ from traditional student and community samples. To gain understanding of these dimensions, we administered standard personality inventories. Speciﬁcally, we administered the Ten-Item Personality Inventory (TIPI; Gosling, Rentfrow, & Swann, 2003), which captures the Big Five dimensions of personality (John & Srivistava, 1999). The Big Five personality dimensions comprise the most widely used and measured model of individual differences in psychology (Gosling et al., 2003). We also included a measure of global selfesteem (the single item self-esteem scale; Robins, Hendin, & Trzesniewski, 2001). Global self-esteem is also one of the most commonly studied constructs in psychology, predicting outcomes such as academic achievement, occupational success, and relationship health (Trzesniewski, Donnellan, & Robins, 2003). Of special interest to decision-making and consumer behavior researchers are differences in the valuation and spending of money and time. Because MTurk participants are willing to complete Web tasks in exchange for little money, they are likely to differ on how they value money, material goods, and time. In both studies, we administered the time versus money scale (Cryder & Loewenstein, 2010) to measure how participants value time versus money, the Material Values Scale (MVS; Richins, 2004) to determine how they value material possessions, and the TightwadSpendthrift (TW-ST) scale (Rick, Cryder, & Loewenstein, 2008) to measure how painful it is for them to spend money. Given the low compensation of MTurk participants, they might also respond unusually to decision tasks involving money and risk. Although previous research (Paolacci et al., 2010) has shown that MTurk participants exhibit some classic Judgment and Decision Making effects, speciﬁcally, framing effects (in the Asian Disease problem), the conjunction fallacy (Linda problem), and the outcome bias (physician problem), we do not yet know whether MTurk participants differ in terms of biases that deal with money Copyright © 2012 John Wiley & Sons, Ltd. Data Collection in a Flat World and payoffs. In Study 2, we explored this proposition by testing for present bias and discounting asymmetries (Loewenstein, 1988; Malkoc & Zauberman, 2006; Thaler, 1981), risk aversion for gains, risk-seeking for losses, and the certainty effect (Barron & Erev, 2003; Hertwig, Barron, Weber, & Erev, 2004; Kahneman & Tversky, 1979; Tversky & Kahneman, 1992). Finally, MTurk also allows unique opportunities for its participants, who are completely unsupervised, to use the Internet as a resource. Although in some studies, this ability for participants to use the Internet may actually serve as a beneﬁt, in others, it may serve as a hindrance, such as when participants can seek answers to factual study questions. We examine this issue using a classic judgment task that is designed to illustrate anchoring and adjustment and which relies on factual answers. STUDY 1: COMPARING MTURK PARTICIPANTS WITH COMMUNITY PARTICIPANTS In Study 1, we compared MTurk participants with a community sample from a large city in the northeastern United States. The study compared the two samples on several dimensions: individual differences, cognitive activity, native language, and an anchoring-and-adjustment judgment heuristic. Participants and procedure We sampled 107 MTurk participants, restricting participation to participants with an approval rate of at least 95% (i.e., 95% or more of that participant’s previous submissions were approved by requesters; a 95% approval rate is the default MTurk cutoff). Participants received $.10 and were told that the study would take about 10 minutes. We posted a link for a Web-based study administered by Qualtrics, an online questionnaire software company, into an MTurk HIT. At the end of the questionnaire, participants received instructions to enter a unique code in the MTurk HIT to verify that they completed the study to receive payment. Although MTurk allows only one worker ID per person and the same ID may only do a HIT once, Qualtrics restrictions were also set to allow one response per IP address to provide additional protection against participants completing the study multiple times. The community sample consisted of 60 participants in a commercial area of a middle class urban neighborhood. Participants were approached on the street by a research assistant and asked to participate in a short academic research study. They were paid $5 for a group of studies that included a paper-and-pencil version of this study. Although the two samples were paid signiﬁcantly different amounts for the same study, the payments were intended to approximate the market rate for participation payment for each sample. Participants in both samples completed identical survey measures in an identical order. To provide a comparison of basic individual differences between samples, participants initially answered questions from the TIPI (Gosling et al., 2003) that J. Behav. Dec. Making (2012) DOI: 10.1002/bdm Journal of Behavioral Decision Making measured the Big Five dimensions of personality (extraversion, agreeableness, conscientiousness, emotional stability, and openness to experience; two traits per dimension, averaged, 1 = Disagree Strongly to 7 = Agree Strongly) and responded to a single-item self-esteem measure (“I have high self-esteem” 1 = Not very true of me to 5 = Very true of me; Robins et al., 2001). Participants then responded to three questions from the CRT to measure the extent to which the participant engaged in deliberate System 2 processing (Frederick, 2005). Next, participants answered several questionnaires related to money. Eight questions measured the extent to which participants value money versus time (Cryder & Loewenstein, 2010), four questions were from the TW-ST scale and measured the pain of paying (Rick et al., 2008), and nine questions were from the short version of the MVS (Richins, 2004). Next, participants completed a classic anchoring and adjustment task that included a factual question (Oppenheimer, LeBoeuf, & Brewer, 2008; Tversky & Kahneman, 1974). Participants entered the last two digits of their phone number, indicated whether the number of countries on the continent of Africa is higher or lower than that number, and then estimated the number of countries in Africa. Finally, participants answered demographic questions (age, gender, education, and native language) and ﬁnished with a modiﬁed IMC (Oppenheimer et al., 2009; see Appendix). The IMC gauged whether participants were paying careful attention to the instructions. For the IMC, participants read a paragraph about decision-making research, followed by a question that asked “what was this study about?” Participants were provided several multiple choice answers, including the obvious answer “your opinions and behaviors.” However, the paragraph also included special instructions. These special instructions told participants to ignore the provided multiple choice answers, “. . .select the box marked ‘other’ and type ‘decision making’ on the line below.” We note that in this study, the IMC included some answer choices that were obviously incorrect (such as “lions” and “tigers”). Such unusual options may have allowed some participants who were skimming questions and answers to notice that something was out of the ordinary, to pay more attention, and to successfully pass the IMC. This IMC design was intended to identify the group of participants paying the very least attention and to keep most participants in the sample. As will be seen in the results, this IMC still identiﬁed a signiﬁcant number of participants who were below this (low) threshold of attention. Results Demographics The MTurk sample did not signiﬁcantly differ from the community sample in age (M = 33 for both groups), gender (female: MTurk = 58.8% versus Community = 51.7%) or education (modal and median education was 4-year bachelor’s degree). More MTurk participants (27.5%) had ESL compared with the community sample (10.5%, w2(1, n = 159) = 5.79, p < .05) presumably because of 25.5% of MTurk participants living outside the US (21% from India). As expected, ESL Copyright © 2012 John Wiley & Sons, Ltd. was correlated with residency (r = .64, p < .001): of non-US participants, 80.8% had ESL, whereas 9.8% of US participants had ESL. Sample characteristics are summarized in Table 1. Attention and cognition Our ﬁrst research question asked whether MTurk participants paid as much attention to study materials as other participants. In this study, MTurk participants were just as likely to answer the IMC question correctly (81.1%) as the community sample (84.5%, w2(1, n = 167) < 1), suggesting that the two samples were equally likely to read and follow instructions. Although non-US participants were less likely to correctly answer the IMC (70.4%) compared with US participants (88.5%, w2(1, n = 167) = 5.51, p < .05), it seems that the difference is largely because of language comprehension: We found a signiﬁcant difference between ESL and non-ESL participants in rates of answering the IMC correctly (non-ESL = 81% versus ESL = 61%, w2(1, n = 135) = 4.94, p < .05). Our subsequent analyses include only participants who correctly answered the IMC; at the end of the results section, we examine whether our results change when including participants who failed the IMC and if the changes are similar when we exclude ESL or non-US participants. A second question was whether MTurk participants, who are paid very little for their efforts, are as able to think through highly cognitive problems compared with other participants. Our results suggest that MTurk (M) CRT scores were not signiﬁcantly different from the community (C) sample (MM = 1.17 versus MC = .96, F(1, 133) = 1.10, p > .2). Money, time, and individual differences Given that MTurk participants are willing to participate in studies for such small amounts of money, one might expect that they would value their time less and be tighter with their spending than would community participants. Consistent with this notion, MTurk participants valued their time slightly less and were more willing to participate in more tasks in exchange for money than were community participants. MTurk participants indicated they would participate in signiﬁcantly more time-consuming tasks in exchange for money than would the community (MM = 5.13 versus MC = 4.46, F(1, 132) = 5.22, p < .05). MTurk participants also scored lower on the TW-ST scale, indicating that they are tighter with their spending than the community group (MM = 13.69 versus MC = 15.86, F(1, 126) = 7.35, p < .01). Finally, the MTurk group scored higher on the MVS (MM = 25.62 versus MC = 23.27, F(1, 132) = 3.89, p = .05), suggesting greater materialistic values than the community participants. The results from the TIPI scale (Gosling et al., 2003) that measures the Big Five Personality factors showed that MTurk participants scored lower on extraversion (MM = 3.66 versus MC = 4.56, F(1, 133) = 9.60, p < .01) and emotional stability than community participants (MM = 4.23 versus MC = 4.80, F(1, 133) = 4.88, p < .05). Similarly, MTurk participants showed marginally lower self-esteem than J. Behav. Dec. Making (2012) DOI: 10.1002/bdm J. K. Goodman et al. Data Collection in a Flat World Table 1. Results—Study 1 MTurk Community IMC ESL** Age USA** Female 81.1% 84.5% 27.5% 10.5% 33.5 32.9 74.5% 100% 58.8% 51.7% Cognitive reﬂection test Mean SE MTurk 1.17 0.12 Community 0.96 0.16 Time value of money** Mean SE 5.13 0.18 4.46 0.23 Tightwadspendthrift** Mean SE 13.69 0.46 15.86 0.66 Material values* Agreeableness Conscientiousness Emotional stability** Mean SE 4.23 0.16 4.80 0.21 Extraversion** MTurk Community Mean 3.66 4.56 SE 0.17 0.23 Mean 4.81 5.09 SE 0.13 0.17 Mean 5.33 5.52 SE 0.14 0.18 Mean 25.62 23.27 SE 0.71 0.95 Openness to experience** Mean SE 4.98 0.12 5.77 0.15 Selfesteem* 3.3 0.12 3.67 0.15 **p < .05; MTurk participants are statistically different from community participants. *p < .06; MTurk participants are marginally different from community participants. i p < .05; MTurk IMC interaction. IMC, Instructional Manipulation Check; ESL, English as a Second Language; SE, standard error. did the community sample (MM = 3.30 versus MC = 3.67, F(1, 133) = 3.92, p = .055). Despite being active users of new technologies, MTurk participants also scored lower on openness to experience (MM = 4.98 versus MC = 5.77, F(1, 133) = 16.98, p < .001). There was no signiﬁcant difference between samples on the dimensions of agreeableness and conscientiousness. Anchoring and adjustment To examine the anchoring and adjustment data, we regressed participants’ country number estimates on their phone number anchors (mean-centered), sample group, and their interaction. We found that the larger the participants’ phone number, the greater the participants’ estimate of the number of countries in Africa (intercept = 41.37, b = .19, t(129) = 3.20, p < .01). However, we also found a signiﬁcant interaction (b = .13, t(129) = 3.03, p < .01) indicating that although the community showed a signiﬁcant anchoring and adjustment effect (b = .45, t(129) = 4.31, p < .01), MTurk participants did not (b = .06, t(129) < 1). Further analysis revealed that MTurk participants might be using their freedom from experimenter monitoring to check factual answers on the Internet. Ten percent of MTurk respondents “estimated” the correct number of countries in Africa as reported by a quick Google search (53 countries) or Wikipedia query (54 countries), whereas none of the community participants gave these precisely correct responses (w2(1, n = 156) = 5.98, p < .02). Because this tendency to fact-check is important for many types of investigations, we ran two short follow-up anchoring and adjustment studies to better understand when factchecking is likely to be a problem. One potential reason for looking up answers is that participants feel pressure to give correct answers to receive their participation payments. To test this proposition, we ran the same anchoring and adjustment study on MTurk while telling half the participants that they would get paid regardless of their answer. This intervention did not signiﬁcantly affect the rate of correct answers Copyright © 2012 John Wiley & Sons, Ltd. (14.3%) compared with a control condition (21%, w2(1, n = 94) < 1). To further examine the factors that inﬂuence cheating incidence, we ran a second study that manipulated (1) compensation and (2) whether or not we told participants the following: “Please do NOT use external sources like the Internet to search for the correct answer.” In a 3 (compensation: none, $.10, $1) 2 (instructions: control versus do not search) between subject factorial design experiment, we found main effects for compensation and instructions (with no signiﬁcant interactions; see Table 2 for percentages). First, participants who were paid for correct answers were signiﬁcantly more likely to look up a correct answer (w2(1, n = 270) = 17.21, p < .001), and the amount of payment had only a marginal effect ($.1 = 36.2% versus $1 = 49.6%, w2(1, n = 270) = 3.03, p = .08). Second, simply asking MTurk participants to not use the Internet to search for the correct answer was sufﬁcient to signiﬁcantly reduce cheating from 40.1% to 27.2% (w2(1, n = 270) = 5.76, p < .05), although it did not eliminate cheating entirely. Instructional Manipulation Check, English as a second language, and US participants The ﬁnal issue we addressed was how including participants who failed the IMC affected our results, as well as whether ESL and non-US participants showed different patterns. The goal of the IMC is to identify participants who are not following instructions, potentially increasing Type II error. Our results showed that few conclusions change when we include all participants, though not surprisingly, statistical Table 2. Results—Study 1 follow-up: percentage of participants “estimating” the correct number of countries in Africa (i.e., percentage searching online) Control Additional “do not check” instruction No payment $.10 $1 23% 8% 38% 35% 60% 39% J. Behav. Dec. Making (2012) DOI: 10.1002/bdm Journal of Behavioral Decision Making signiﬁcance suffers. We found only one signiﬁcant MTurk by IMC interaction (on emotional stability, F(1, 159) = 4.24, p < .05). As an even more conservative test, we also examined whether any of our signiﬁcant results became nonsigniﬁcant, or reversed, when those who failed the IMC were included in the analysis. Of our signiﬁcant ﬁndings, two became nonsigniﬁcant when we included those who failed the IMC: material values (F(1, 157) = 2.32, p = .13) and emotional stability (F(1, 164) = 2.09, p = .15). None of our effects showed a signiﬁcant opposite result. In sum, the results show that including participants who incorrectly answered the IMC adds statistical noise and potentially increases the incidence of Type II error. Given the growing number of international MTurk participants, we also examined whether participants with ESL and non-US participants differed signiﬁcantly from the rest of our sample. As previously noted, ESL and non-US participants were more likely to fail the IMC, but the correlations were modest (rESL, IMCfail = .18, p < .05; rnon-US, IMCfail = .16, p = .05). Examining only participants who passed the IMC, we did not ﬁnd any signiﬁcant MTurk by ESL interactions (ps > .5). Examining all participants, we found only one signiﬁcant MTurk by ESL interaction, on emotional stability, in which non-ESL participants showed a larger difference than the ESL participants (F(1, 154) = 4.23, p < .05). Limiting our analysis to only non-ESL participants, we ﬁnd that one of our signiﬁcant results became nonsigniﬁcant (material values, F(1, 121) = 1.43, p > .2). We ﬁnd similar results when we limit our analysis to only US participants: two of our signiﬁcant results became nonsigniﬁcant (material values, F(1, 129) < 1, and emotional stability, F(1, 130) = 2.62, p > .1). In sum, ESL and non-US participants are more likely to fail the IMC, but more signiﬁcant effects emerge when we ﬁlter by IMC instead of by ESL or by location. Discussion In Study 1, we found several similarities between MTurk and community participants. Initially, the two groups were similar in gender and age and similar in cognitive effort and ability as measured by the CRT, and according to performance on the IMC, they were equally likely to follow instructions. Recall that we used a 95% worker approval rating on MTurk as a prerequisite for a worker to complete our survey. This is the default cutoff level used on MTurk. We speculate that using a lower cutoff (or no cutoff) would have led to a larger proportion of MTurk participants failing the IMC, but we do not have empirical evidence to test this hypothesis. There were important differences between groups as well. MTurk participants differed in terms of their valuation of money, time, and material goods. MTurk participants valued money more than time compared with the community sample, and they were more likely to be self-reported tightwads (versus spendthrifts). They also scored higher on the MVS, which measures materialism and the meaning of material goods in participants’ lives. Interestingly, a coherent pattern emerged on the basic individual difference measures. MTurk participants were less Copyright © 2012 John Wiley & Sons, Ltd. extraverted and more socially unstable than the community participants. MTurk participants also showed slightly lower self-esteem, which is related to extraversion (Robins et al., 2001) and relationship health (Trzesniewski et al., 2003). Although not predicted, these results could possibly be explained by (a) the demographic diversity of the sample; (b) something unique about MTurk participants, such as a general predilection toward introversion and other related dispositional constructs traits; and/or (c) something unique about the community sample, such as the potential extraverted nature a group of individuals who complied with a request from a stranger on the street to participate in an experiment. In Study 2, we further investigate these basic personality characteristics of the MTurk sample to determine whether the observed differences are more likely because of the unique nature MTurk participants or community participants. Finally, we also found a difference in terms of heuristic use: MTurk participants did not seem to rely upon a prior anchor when estimating the number of countries in Africa. Instead, it appears that at least some MTurk respondents were looking up the correct answer online. Follow-up studies indicated that even small incentives amplify the tendency to cheat and that requests to not look up answers signiﬁcantly decrease, but do not entirely eliminate, cheating. STUDY 2: COMPARING MTURK PARTICIPANTS WITH STUDENT PARTICIPANTS Our main goal in Study 2 was to compare MTurk participants to a traditional undergraduate student research pool, and while doing so, gain a greater understanding of the unique features of MTurk respondents. In this study, we also wanted to control for survey format (paper and pencil versus computer based). It is possible that differences between the MTurk participants and the community participants in the previous study were inﬂuenced by differences in questionnaire format—the community sample used paper and pencil instruments, whereas the MTurk sample used a computer. Participants and procedure We sampled 207 participants from MTurk, again restricting the sample to participants with an approval rate of 95% or higher. Participants received $.20 for completing the study and were told that the HIT takes approximately 15 minutes to complete. We posted a link to the Qualtrics questionnaire in the HIT, and after completing the questionnaire, participants were instructed to return to the HIT and enter their unique code. Actual average time of completion (excluding ﬁve outliers who each took over 1 hour) was 16 minutes 35 seconds. We restricted the HIT so that it could only be completed once per MTurk ID, and we also restricted Qualtrics to accept only one response per IP address. We sampled 131 students from a Midwestern university in the United States. Participants received credit in their introductory business courses for participation in a 30-minute session that included several studies. Half of the students were J. Behav. Dec. Making (2012) DOI: 10.1002/bdm J. K. Goodman et al. Data Collection in a Flat World randomly placed in a computerized condition and half in an equivalent paper-and-pencil condition. Average time of completion on the computer was 12 minutes 41 seconds, and with paper and pencil, 14 minutes 21 seconds. To further examine differences in decision making related to money and payoffs, participants initially responded to four risky choice prospects, adapted from previous research, testing risk aversion for gains, risk-seeking for losses, and the certainty effect (Barron & Erev, 2003; Hertwig et al., 2004; Kahneman & Tversky, 1979; Tversky & Kahneman, 1992). The four problems are included in Table 3. Participants then responded to a discounting task adapted from previous literature testing present bias (Thaler, 1981) and the delay/expedite asymmetry (Loewenstein, 1988; Malkoc & Zauberman, 2006). Then, as in Study 1, participants completed the CRT (Frederick, 2005), the 10-item inventory for the Big Five personality dimensions (TIPI; Gosling et al., 2003), the single-item self-esteem scale (Robins et al., 2001), the time versus money scale (Cryder & Loewenstein, 2010), the TW-ST (Rick et al., 2008), and the MVS (Richins, 2004). New to this study, participants also completed the Maximizer/Satisﬁcer Scale (Nenkov, Morrin, Ward, Schwartz, & Hulland, 2008) to examine cognitive predispositions and preference for optimal outcomes at the expense of extra effort. Finally, participants completed the demographic questions from Study 1 and a new, more conservative version of the IMC (Oppenheimer et al., 2009) to test whether participants were closely reading the instructions of the experiment. This version of the IMC required more careful reading to give a “correct” answer than the version used in Study 1 by including reasonable, but incorrect, answer choices that were less likely to trigger suspicion (see Appendix). Demographics Not surprisingly, MTurk (M) participants were signiﬁcantly older than the student (S) participants (MM = 31.0 versus MS = 19.4, F(1, 334) = 164, p < .001), and the median level of education was higher (MdnM = “bachelor’s degree”) than the students (MdnS = “some college”, Wilcoxon Z = 10.52, p .001). As in Study 1, more MTurk participants had ESL (57.0%) compared with the student sample (20.3%, w2(1, n = 335) = 57.01, p < .001), and 62.8% of MTurk participants were living outside the US (52.2% from India). As in Study 1, ESL was correlated with residency (rESL, non-US = .82, p < .001): of non-US participants, 87.8% had ESL, whereas 9.4% of US participants had ESL. Sample characteristics are summarized in Table 4. Cognition and attention The IMC tests whether participants were paying attention and following instructions. In Study 2, we used a signiﬁcantly more difﬁcult IMC test than in Study 1 to see if increased difﬁculty mattered. We found that MTurk participants were signiﬁcantly less likely than student participants to correctly answer the more difﬁcult IMC question (MM = 66.2% versus MS = 88.5%, w2(1, n = 338) = 19.51, p < .001). We also again found a signiﬁcant difference in rates of answering the IMC correctly between non-ESL (71%) and ESL participants (29%, w2(1, n = 335) = 54.27, p < .001). As in Study 1, our subsequent analyses only consider those participants who followed instructions on the IMC. At the end of this results section, we again discuss how our results change when we include those who failed the IMC and if the changes are similar when we exclude ESL or non-US participants. Students scored signiﬁcantly higher on the CRT than did the MTurk participants (MM = 1.23 versus MS = 1.70, F(1, 250) = 10.49, p < .01). Note that there was no difference between MTurk participants and the community sample in Study 1, suggesting that it may be student participants who are unique in terms of their reﬂective thought. Despite differences between students and MTurk participants in deliberative thought on the CRT, we did not ﬁnd any differences in terms of preference for optimal outcomes as measured by the Maximizer/Satisﬁcer Scale (MM = 37.36 versus MS = 38.72, F(1, 250) = 2.25, p > .1). Results We created two orthogonal contrast codes to capture the nested design of the study (Rosenthal, Rosnow, & Rubin, 2000). The ﬁrst code compared the MTurk participants with the student group, and the second code compared the computerized and paper and pencil participants within the student group. The second contrast code (computer versus paper and pencil) was generally not signiﬁcant; thus, we report the data collapsed across conditions noting any differences that approach signiﬁcance (i.e., p < .1) in Table 4. Table 3. Risk prospects—Study 2 Alternative 1 Prospect Prospect Prospect Prospect 1 2 3 4 Alternative 2 P(R) $, Probability E(X) $, Probability E(X) $4, .8 $320, .8 $4, .2 $3, 1.0 $3.20 $256.00 $0.80 $3.00 $3, 1.0 $240, 1.0 $3, .25 $4, .8 $3.00 $240.00 $0.75 $3.20 MTurk 15% 14%* 48%** 76% Student computer Student paper 21% 21% 70% 82% 15% 25% 81% 76% *p < .06; MTurk < students. **p < .001; MTurk < students. Underlined choice predicted using Prospect Theory. E(X), expected value of alternative; P(R), percentage selecting alternative 1. Copyright © 2012 John Wiley & Sons, Ltd. J. Behav. Dec. Making (2012) DOI: 10.1002/bdm Journal of Behavioral Decision Making Table 4. Results—Study 2 IMC** ESL** Some Current USA** Female** college** student** Age** MTurk Student computer Student paper 66.2% 86.4% 90.8% 56.3% 9.4% 20.0% 37.2% 100% 100% Time versus money MTurk Student computer Student paper Cognitive Reﬂection Test**† Mean SE 1.23 0.1 1.68 0.15 1.71 0.15 SE 0.12 0.19 0.19 Tightwadspendthrift~ Mean SE 14.56 0.34 14.14 0.52 13.96 0.52 Extraversion** Agreeablenessb Conscientiousness** MTurk Student computer Student paper Mean 4.03 4.97 4.69 SE 0.12 0.19 0.19 Mean 5.49 5.26 5.27 Mean 5.06 4.66 5.09 42.5% 53.9% 59.4% 92.3% 100.0% 100.0% SE 0.11 0.16 0.16 Mean 5.18 5.43 5.58 23.7% 100% 100% SE 0.10 0.16 0.16 Mean 31.0 19.5 19.3 Maximizer/ Satisﬁcer Mean 37.37 38.79 38.64 SE 0.61 0.95 0.93 Material values Mean 27.57 27.84 27.52 SE 0.54 0.83 0.83 Emotional Stability** Mean SE 4.47 0.12 4.98 0.18 4.66 0.18 Openness to Experience Mean SE 5.08 0.09 5.20 0.14 5.32 0.14 SelfEsteem** Mean SE 3.46 0.08 3.77 0.13 3.72 0.13 **p < .05; MTurk participants are signiﬁcantly different from student participants. p < .06; student computer participants are marginally different from student paper participants. i p < .05; MTurk versus students IMC interaction. ~ p < .1; MTurk versus students IMC interaction. † p < .05; MTurk versus students ESL interaction. IMC, Instructional Manipulation Check; ESL, English as a Second Language; SE, standard error. b Money, time, and individual differences In terms of basic individual differences, we see results similar to Study 1. Most notably and consistent with Study 1, MTurk participants were less extraverted (MM = 4.03 versus MS = 4.83, F(1, 250) = 19.75, p < .001), less emotionally stable (MM = 4.47 versus MS = 4.82, F(1, 250) = 4.17, p < .05), and had less self-esteem (MM = 3.46 versus MS = 3.75, F(1, 250) = 5.18, p < .05) than student participants. Differing from Study 1, MTurk participants were also less conscientious than the student participants (MM = 5.18 versus MS = 5.50, F(1, 250) = 4.43, p < .05) and did not show a signiﬁcant difference in their openness to new experiences (MM = 5.08 versus MS = 5.26, F(1, 250) = 1.81, p > .1). Although Study 1 found that MTurk participants valued money (versus time) more than the community, Study 2 found that MTurk participants did not value their money versus time differently than student participants (MM = 5.49 versus MS = 5.28, F(1, 250) = 1.52, p > .2). Furthermore, the MTurk group was just as tightwad as students (MM = 14.56, MS = 14.05, F(1, 248) = 1.02, p > .2) and showed no differences from students in material values (MM = 27.57, MS = 27.69, F(1, 249) < 1). These similarities between students and MTurk participants on spending constructs could be because both students and MTurk participants live with constrained budgets. Given the low compensation of MTurk participants, a goal of Study 2 was to examine whether MTurk participants differed from undergraduates on decision tasks related to money and risk, speciﬁcally, risk aversion for gains, riskseeking for losses, the certainty effect (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992), present bias (Thaler, 1981), and delay/expedite asymmetries (Loewenstein, 1988). Copyright © 2012 John Wiley & Sons, Ltd. Both student and MTurk participants were risk-averse for gains for both small (P(R)1: MTurk = .15, Student = .18) and large gambles (P(R)2: MTurk = .14, Student = .28), and they were risk-seeking for losses (P(R)4: MTurk = .76, Student = .79). Both groups exhibited the classic certainty effect (P(R)1 MTurk = .15 versus P(R)3 MTurk = .48, F(1, 136) = 50.75, p < .001; P(R)1 Student = .18 versus P(R)3 Student = .76, F(1, 115) = 116.79, p < .001). Our results suggest that Prospect Theory patterns of risk taking do apply to both MTurk and student participants, although we did ﬁnd some minor differences. MTurk participants were marginally more riskaverse for one of the four prospects (Prospect 2) than the student participants (w2(1, n = 253) = 3.58, p = .058), and the student sample showed a signiﬁcantly stronger certainty effect (F(1, 250) = 130.27, p < .001). Time preference results show that although both groups showed statistically signiﬁcant present bias in a delay frame (M3months = 4.50 versus M12months = 2.06, F(1, 233) = 217.86, p < .001) and expedite frame (M3months = .38 versus M12months = .20, F(1, 230) = 28.73, p < .001), MTurk participants exhibited signiﬁcantly stronger tendencies toward present bias in both frames (Delay: F(1, 233) = 4.24, p < .05; Expedite: F(1, 230) = 5.86, p < .05).1 The two groups exhibited delay/ expedite asymmetries at similar rates: present bias was greater in the delay frame compared with the expedite frame 1 We used continuously compounded discount rates as the dependent variable (Malkoc & Zauberman, 2006; Thaler, 1981). We removed from the analysis 16 participants who indicated a negative discount rate and did not understand the instructions (they were willing to pay more than $15 to expedite a $15 payment). J. Behav. Dec. Making (2012) DOI: 10.1002/bdm J. K. Goodman et al. Data Collection in a Flat World (F(1, 229) = 166.67, p < .001), and discount rates were signiﬁcantly greater when delaying 3 months versus expediting 3 months (F(1, 232) = 198.53, p < .001) and when delaying 12 months versus expediting 12 months (F(1, 229) = 145.20, p < .001). These comparisons are listed in Table 5. Instructional Manipulation Check, English as a second language, and US participants We further investigated the IMC to examine how our results would change if we included those who failed the IMC. There was only one marginally signiﬁcant MTurk by IMC interaction (F(1, 330) = 3.44, p < .1): the difference between MTurk participants and students on the TW-ST scale was greater for those who failed the IMC than those who passed. As in Study 1, our main conclusions do not change when we include participants who failed the IMC in the analyses, but statistical signiﬁcance suffers. Of the signiﬁcant ﬁndings that we found, differences in emotional stability become marginally signiﬁcant (F(1, 335) = 3.29, p = .07) when we include those who failed the IMC. Examining ESL and non-US participants, we again found that IMC failures are positively correlated with ESL and non-US location (rESL, IMCfail = .40, p < .001; rnon-US, IMCFail = .38, p < .001). We found one signiﬁcant MTurk by ESL interaction (CRT, F(1, 329) = 3.90, p < .05): the difference between MTurk and student participants on the CRT was smaller for ESL participants compared with nonESL. More importantly, we found that statistical signiﬁcance suffered when we ﬁltered by ESL and only examined non-ESL participants; two of our signiﬁcant ﬁndings became marginally signiﬁcant or nonsigniﬁcant: emotional stability (F(1, 197) < 1) and conscientiousness (F(1, 197) = 3.72, p = .06). We found similar results when we limited our analysis to non-US participants; emotional stability (F(1, 205) = 1.88, p > .15) and conscientiousness (F(1, 205) = 1.61, p > .2) became nonsigniﬁcant. In sum, ESL and non-US participants were more likely to fail the IMC, but the results suggest that it is more efﬁcient to ﬁlter by IMC because it ﬁlters fewer participants and allows us to detect more signiﬁcant differences. Table 5. Difference in asymmetries—Study 2 discount rates Delay 3 Expedite 3 versus delay versus expedite 12 months 12 months MTurk Student computer Student paper Overall and delay/expedite Delay 3 versus expedite 3 months Delay 12 versus expedite 12 months 2.8** 2.51b 0.25** 0.07 4.45 4.46 1.99 2.03 1.7 0.09 3.14 1.52 2.57 0.19 4.23 1.92 All differences are signiﬁcantly different from zero, p < .05. **p < .05; different from the student sample. b p < .1; student computer participants are marginally different from student paper participants. Copyright © 2012 John Wiley & Sons, Ltd. Discussion In many ways, MTurk participants were similar to the student sample; they had similar attitudes about money and exhibited the same classic decision-making biases. However, MTurk participants also showed important differences compared with the student participants: MTurk participants were less likely to correctly answer the IMC, they scored lower on the CRT, and they were less extraverted with lower emotional stability and self-esteem. Although MTurk participants were slightly more risk-averse and showed a stronger certainty effect than the student population (they were more likely than student participants to prefer payoffs with certainty compared with gambles with higher expected values), in general, MTurk participants exhibited the same classic decision-making biases. MTurk participants were risk-averse for gains, risk-seeking for losses, susceptible to the certainty effect, present biased, and exhibited delay/ expedite asymmetries. GENERAL DISCUSSION In two studies, we compared participants from a new and increasingly popular online labor market, MTurk, with participants from traditional community and student samples. Our results not only showed many similarities between MTurk and these traditional samples, but we also identiﬁed important dimensions on which MTurk participants differed. First, in our IMC attention test (Oppenheimer et al., 2009), which required careful reading of study materials and proﬁcient English comprehension, our results showed that MTurk participants performed more poorly compared with student participants. It is important to note, however, that participants in all groups failed at a sizeable rate. More importantly, we found that simply administering the attention test and ﬁltering participants by whether they correctly answered the IMC or not reduced statistical noise; including participants who failed the IMC reduced the likelihood of ﬁnding statistically signiﬁcant differences between groups. Although we found that IMC failure was correlated with a participant being from outside the US and ESL, the results suggest the IMC was the most efﬁcient ﬁlter: the IMC excluded fewer people and uncovered more statistically signiﬁcant differences. Second, in an anchoring and adjustment task in Study 1, we found a signiﬁcant number of MTurk participants who “estimated” the precisely correct number of countries in Africa. No community participants estimated these precisely correct answers, suggesting that MTurk participants are using their freedom from supervision to check answers online. The good news for researchers is that there are easy ways to reduce this “cheating.” A follow-up study revealed that although small incentives signiﬁcantly increased this tendency to cheat, simply asking participants to not look up answers signiﬁcantly reduced correct estimates. These ﬁndings ﬁt with the cheating behavior literature showing that subtle situational variations signiﬁcantly inﬂuence cheating rates (e.g., Mazar, Amir, & Ariely, 2008; Shu, Gino, & Bazerman, 2011). J. Behav. Dec. Making (2012) DOI: 10.1002/bdm Journal of Behavioral Decision Making Third, we found that MTurk participants were signiﬁcantly and consistently different on basic personality variables, including some Big Five personality dimensions (Gosling et al., 2003; John & Srivistava, 1999), and they also showed some differences on dimensions related to money and payoffs. Compared with non-MTurk participants, MTurk participants were less extraverted, less emotionally stable, and had lower self-esteem. MTurk participants also exhibited attitudes about money and time that were more similar to student participants than to community participants. MTurk participants valued money more than time compared with the community, were more likely to be “tightwads,” and valued material possessions more highly than the community. Compared with students, MTurk participants did not differ on these dimensions. It seems that MTurk participants may be similar to students in terms of their ﬁnancial outlook. It is important to note that we found many commonalities between MTurk participants and our traditional samples, contributing to the growing literature showing that MTurk participants give responses similar to other traditionally used samples (Buhrmester et al., 2011; Paolacci et al., 2010; Rand, 2011). MTurk participants were present biased, showed delay/expedite asymmetries, were risk-averse for gains, risk-seeking for losses, and showed the certainty effect—all with almost no signiﬁcant differences in effect sizes from other samples. Recommendations On the basis of the ﬁndings suggesting similarities between MTurk and traditional samples and the previous research showing the beneﬁts and reliability of MTurk (Buhrmester et al., 2011; Mason & Suri, 2012; Paolacci et al., 2010; Rand, 2011), we highly recommend MTurk to behavioral decisionmaking researchers because of its reliability, low cost, speed of data collection, and heterogeneity of participants. The beneﬁts notwithstanding, our results also suggest that important unique features of MTurk participants should be considered before selecting MTurk as a participant source. We recommend that researchers using MTurk take note of several factors. Initially, our ﬁndings suggest that an attention check is likely to help improve statistical power in all types of studies and reduce Type II error, but it may be especially useful with unsupervised samples such as MTurk. We caution researchers when using MTurk for studies that require participants to pay careful attention to study materials and instructions. For example, MTurk may not be appropriate for long or complicated studies in which participants may be more likely to lose attention and not follow instructions. Previous research (Paolacci et al., 2010) found that MTurk participants were equally attentive as other participants when the study was short (~5 minutes) and the attention check question had an easily identiﬁable answer. In our research, in which the studies were longer (~16 minutes) and the attention check question required careful reading, MTurk participants performed signiﬁcantly worse. Following prior research, we placed the attention check at the end of the survey and this Copyright © 2012 John Wiley & Sons, Ltd. may have increased the likelihood that participants failed the check because of fatigue. We speculate that placing the check earlier in the survey would decrease failure rates for both MTurk and traditional participants, but it is possible that MTurk participants were more susceptible to fatigue compared with the students. Our results from the CRT also suggest that MTurk participants may not be as motivated as student participants to engage in deliberate System 2 cognitive processing. Future research should continue to examine the boundaries of MTurk participants’ attention and concentration. One way to mitigate the lower attention associated with MTurk samples could be to emphasize the scientiﬁc importance of the study to participants and encourage them to be attentive. Although laboratory studies do not always employ these measures, the presence of experimenters in a university setting and/or more complicated procedures may automatically have that effect in the lab, which is difﬁcult to replicate online. We also recommend caution when researchers ask MTurk participants questions with factual answers, unless one wants to test their Internet searching abilities. Although most behavioral study questions are not factual in nature, there are many examples that are, such as the questions commonly used in the anchoring and adjustment task (Tversky & Kahneman, 1974) in which we originally uncovered the fact-checking problem, as well as those that measure competencies, for example, in the CRT (Frederick, 2005). When using such factual knowledge questions is unavoidable, researchers should consider using simple interventions that ask participants not to look up answers, as our results showed that they are effective in reducing the rate at which MTurk participants searched for correct answers. Finally, we suggest that researchers carefully consider how individual differences may inﬂuence their investigations. Speciﬁcally, researchers should take note when examining and interpreting results from MTurk participants that are related to extraversion and self-esteem—for which MTurk workers scored consistently lower—and money and spending—for which MTurk workers’ attitudes were similar to students but different from a community sample. Conclusions Recent research about the use of MTurk for behavioral research has concluded that MTurk has many beneﬁts, making it suitable for a wide range of behavioral research. We agree—we found that MTurk participants produced reliable results that are consistent with previous decisionmaking research. However, we also found important differences between MTurk participants and community and student participants. To mitigate concerns that may arise from these differences, we recommend that researchers use screening procedures to measure participants’ attention levels and take into account that MTurk participants may vary from non-MTurk participants on social and ﬁnancial traits. J. Behav. Dec. Making (2012) DOI: 10.1002/bdm J. K. Goodman et al. Data Collection in a Flat World APPENDIX. WORDING OF THE INSTRUCTIONAL MANIPULATION CHECK QUESTION: Study 1 Research in decision making shows that people, when making decisions and answering questions, prefer not to pay attention and minimize their effort as much as possible. Some studies show that over 50% of people don’t carefully read questions. If you are reading this question and have read all the other questions, please select the box marked ‘other’ and type ‘Decision Making’ in the box below. Do not select “predictions of your own behavior.” Thank you for participating and taking the time to read through the questions carefully! What was this study about? A B C D Predictions of your own behavior Lions Tigers Other _______________________________ Study 2 Research in decision making shows that people, when making decisions and answering questions, prefer not to pay attention and minimize their effort as much as possible. Some studies show that over 50% of people don’t carefully read questions. If you are reading this question and have read all the other questions, please select the box marked ‘other’ and type ‘Decision Making’ in the box below. Do not select “predictions of your own behavior.” Thank you for participating and taking the time to read through the questions carefully! What was this study about? A B C D Predictions of your own behavior Predictions of your friends’ behavior Political preferences Other _______________________________ ACKNOWLEDGEMENTS The ﬁrst two authors contributed equally to this manuscript. We thank Selin Malkoc, Reade Alexander, Josh Morris, Matt Williams, Derek Wilton, Lab Manager Rachel London, and the entire CB Research Lab for research assistance, and the Center for Behavioral Decision Research at Carnegie Mellon University for assistance in data collection. REFERENCES Barron, G., & Erev, I. (2003). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, 16, 215–233. Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of cheap, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5. Chatterjee, S., Heath, T. B., Milberg, S. J., & France, K. R. (2000). The differential processing of price in gains and losses: The Copyright © 2012 John Wiley & Sons, Ltd. effects of frame and need for cognition. Journal of Behavioral Decision Making, 13, 61–75. Cryder, C., & Loewenstein, G. (2010). The time versus money scale. Unpublished data, Olin Business School, Washington University in St. Louis. Eriksson, K., & Simpson, B. (2010). Emotional reactions to losing explain gender differences in entering a risky lottery. Judgment and Decision Making, 5, 159–163. Frederick, S. (2005). Cognitive reﬂection and decision making. Journal of Economic Perspectives, 19, 25–42. Friedman, T. L. (2005). The world is ﬂat: A brief history of the twenty-ﬁrst century. New York, NY: Farrar, Straus, and Giroux. Gosling, S. D., Rentfrow, P. J., & Swann Jr., W. B. (2003). A very brief measure of the big-ﬁve personality domains. Journal of Research in Personality, 37, 504–528. Hertwig, R., Barron G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 8, 534–539. Horton J. J., Rand D. G., & Zeckhauser R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14, 399–425. Ipeirotis, P. (2010). Demographics of Mechanical Turk. (CeDER Working Paper-10-01). New York University. Retrieved from http://hdl.handle.net/2451/29585. John, O. P., & Srivistava, S. (1999). The big ﬁve trait taxonomy: History, measurement, and theoretical perspectives. In L. A. Pervin & O. P. John (Eds.), Handbook of personality: Theory and research (pp. 102–138). New York/London: Guilford Press. Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–292. LeBoeuf, R. A., & Shaﬁr, E. (2003). Deep thoughts and shallow frames: On the susceptibility to framing effects. Journal of Behavioral Decision Making, 16, 77–92. Loewenstein, G. (1988). Frames of mind in intertemporal choice. Management Science, 34, 200–214. Lynch Jr., J. G. (1982). On the external validity of experiments in consumer research. Journal of Consumer Research, 9, 225–239. Malkoc, S. A., & Zauberman G. (2006). Deferring versus expediting consumption: The effect of outcome concreteness on sensitivity to time horizon. Journal of Marketing Research, 43, 618–627. Mason, W. & Suri, S. (2012). Conducting behavioral research on Amazon’s Mechanical Turk. Behavioral Research Methods, 44, 1–23. Mazar, N., Amir, O., Ariely, D. (2008). The dishonesty of honest people: A theory of self-concept maintenance. Journal of Marketing Research, 45, 633–644. Nenkov, G., Morrin, M., Ward, A., Schwartz, B., & Hulland J. (2008). A short form of the maximization scale: Factor structure, reliability, and validity studies. Judgment and Decision Making, 3, 371–388. Oppenheimer, D., Meyvis T., & Davidenko N. (2009). Instructional manipulation checks: Detecting satisﬁcing to increase statistical power. Journal of Experimental Social Psychology, 45, 867–872. Oppenheimer, D. M., LeBoeuf, R. A., & Brewer, N. T. (2008). Anchors aweigh: A demonstration of cross-modality anchoring and magnitude priming. Cognition, 106, 13–26. Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5, 411–419. Parker, A., & Fischoff, B. (2005). Decision-making competence: External validation through an individual-differences approach. Journal of Behavioral Decision Making, 18, 1–27. Peterson, R. A. (2001). On the use of college students in social science research: Insights from a second-order meta-analysis. Journal of Consumer Research, 28, 450–461. Rand, D. G. (2011). The promise of Mechanical Turk: How online labor markets can help theorists run behavioral experiments. Journal of Theoretical Biology, DOI: 10.1016/j.jtbi.2011.03.004. Richins, M. L. (2004). The material values scale: Measurement properties and development of a short form. Journal of Consumer Research, 31, 209–219. J. Behav. Dec. Making (2012) DOI: 10.1002/bdm Journal of Behavioral Decision Making Rick, S. I., Cryder, C. E., & Loewenstein, G. (2008). Tightwads and spendthrifts. Journal of Consumer Research, 34, 767–782. Robins, R. W., Hendin, H. M., & Trzesniewski, K. H. (2001). Measuring global self-esteem: Construct validation of a singleitem measure and the Rosenberg Self-Esteem Scale. Personality and Social Psychology Bulletin, 27, 151–161. Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge: Cambridge University Press. Shu, L. L., Gino, F., & Bazerman, M. H. (2011). Dishonest deed, clear conscience: When cheating leads to moral disengagement and motivated forgetting. Personality and Social Psychology Bulletin, 37, 330–349. Suri, S., & Watts, D. J. (2011). Cooperation and contagion in Webbased, networked public goods experiments. PLoS One, 6(3), 1–8. Thaler, R. H. (1981). Some empirical evidence on dynamic inconsistency. Economic Letters, 8, 201–207. Trzesniewski, K. H., Donnellan, M. B., & Robins, R. W. (2003). Stability of self-esteem across the life-span. Journal of Personality and Social Psychology, 84, 205–220. Tversky, A., & Kahneman D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 211, 453–458. Tversky, A., & Kahneman D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453–458. Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of Uncertainty. Journal of Risk and Uncertainty, 5, 297–323. Winer, R. S. (1999). Experimentation in the 21st century: The importance of external validity. Journal of the Academy of Marketing Science, 27, 349–358. Copyright © 2012 John Wiley & Sons, Ltd. Authors’ biographies: Joseph K. Goodman is an Assistant Professor of Marketing, Olin Business School, Washington University in St. Louis. He earned his PhD in marketing from The University of Texas at Austin. His research interests are in consumer variety seeking and choice overload, and how consumer experiences and superstitions inﬂuence happiness. Cynthia Cryder is an Assistant Professor of Marketing, Olin Business School, Washington University in St. Louis. She earned her PhD in Behavioral Decision Research from Carnegie Mellon. Her research focuses on altruistic and ﬁnancial decisions and has been published in journals including Psychological Science and the Journal of Consumer Research. Amar Cheema is an Associate Professor of Marketing, McIntire School of Commerce, University of Virginia. He earned his PhD in marketing from the University of Colorado at Boulder. His research interests include behavioral decision theory, pricing and promotion effects, auctions and online purchase behavior, and word-of-mouth inﬂuences. Authors’ addresses: Joseph K. Goodman and Cynthia E. Cryder, Olin Business School, Washington University in St. Louis, Saint Louis, MO, USA. Amar Cheema, McIntire School of Commerce, University of Virginia, Charlottesville, VA, USA. J. Behav. Dec. Making (2012) DOI: 10.1002/bdm

RELATED PAPERS

RELATED TOPICS

Log In

Data Collection in a Flat World: Strengths and Weaknesses of Mechanical Turk Samples

Data Collection in a Flat World: Strengths and Weaknesses of Mechanical Turk Samples

Related Papers

RELATED PAPERS

RELATED TOPICS