
Making better use of the crowd: how crowdsourcing can advance machine learning research

Published: 01 January 2017

Abstract

This survey provides a comprehensive overview of the landscape of crowdsourcing research, targeted at the machine learning community. We begin with an overview of the ways in which crowdsourcing can be used to advance machine learning research, focusing on four application areas: 1) data generation, 2) evaluation and debugging of models, 3) hybrid intelligence systems that leverage the complementary strengths of humans and machines to expand the capabilities of AI, and 4) crowdsourced behavioral experiments that improve our understanding of how humans interact with machine learning systems and technology more broadly. We next review the extensive literature on the behavior of crowdworkers themselves. This research, which explores the prevalence of dishonesty among crowdworkers, how workers respond to both monetary incentives and intrinsic forms of motivation, and how crowdworkers interact with each other, has immediate implications that we distill into best practices that researchers should follow when using crowdsourcing in their own research. We conclude with a discussion of additional tips and best practices that are crucial to the success of any project that uses crowdsourcing, but rarely mentioned in the literature.

Published In

The Journal of Machine Learning Research, Volume 18, Issue 1
January 2017
8830 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Revised: 01 April 2018
Published: 01 January 2017
Published in JMLR Volume 18, Issue 1

Author Tags

  1. behavioral experiments
  2. crowdsourcing
  3. data generation
  4. hybrid intelligence
  5. incentives
  6. Mechanical Turk
  7. model evaluation

Qualifiers

  • Article

