DOI: 10.1145/3459637.3482032

Aggregation Techniques in Crowdsourcing: Multiple Choice Questions and Beyond

Published: 30 October 2021

Abstract

Crowdsourcing has been leveraged in a variety of tasks and applications, primarily to gather information from human annotators in exchange for a monetary reward. The main challenge associated with crowdsourcing is the low quality of the results, which can stem from multiple causes, including bias, error, and adversarial behavior. Researchers and practitioners can apply quality control methods to prevent and detect low-quality responses. For example, worker selection methods use qualifications and attention-check questions before assigning a task. Similarly, task routing applies recommender-system techniques to identify the workers who can provide a more accurate response to a given task type. In practice, posterior quality control methods are the most common way to deal with noisy labels once they have been obtained. Such methods require task repetition, i.e., assigning the task to multiple crowd workers, followed by an aggregation mechanism (also known as truth inference) that selects the most likely answer or requests an additional label. A large number of aggregation techniques have been proposed for crowdsourcing, covering a range of task types. This tutorial presents common and recent label aggregation techniques for multiple-choice questions, multi-class labels, ratings, pairwise comparisons, and image/text annotation. We believe that the audience will benefit from the focus on this specific research area and will learn about the best techniques to apply in their own crowdsourcing projects.
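
As a concrete illustration of the aggregation step described above, the sketch below implements plain majority voting over repeated labels, the simplest baseline for multiple-choice and multi-class tasks; more elaborate truth-inference methods instead weight each worker's votes by an estimated reliability. This is a minimal sketch: the (task_id, worker_id, label) input layout, the function name, and the tie-breaking behavior are illustrative assumptions, not part of the tutorial material.

# Minimal sketch (illustrative only): majority-vote aggregation of crowd labels.
# The input layout -- a list of (task_id, worker_id, label) tuples -- is assumed here.
from collections import Counter, defaultdict

def majority_vote(annotations):
    """Return, for each task, the label chosen by the largest number of workers."""
    labels_per_task = defaultdict(list)
    for task_id, _worker_id, label in annotations:
        labels_per_task[task_id].append(label)
    # Counter.most_common(1) breaks ties by first-seen order, i.e., arbitrarily.
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in labels_per_task.items()}

# Example: task "q1" is labeled by three workers and resolves to "B" (two votes to one).
votes = [("q1", "w1", "B"), ("q1", "w2", "B"), ("q1", "w3", "A"),
         ("q2", "w1", "C"), ("q2", "w2", "C")]
print(majority_vote(votes))  # {'q1': 'B', 'q2': 'C'}

Replacing the uniform vote count with per-worker weights learned from agreement patterns leads to the confusion-matrix and worker-reliability models that the truth-inference literature builds on.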


Cited By

  • (2024) What Matters in a Measure? A Perspective from Large-Scale Search Evaluation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 282-292. https://doi.org/10.1145/3626772.3657845
  • (2023) Crowdsourcing Truth Inference via Reliability-Driven Multi-View Graph Embedding. ACM Transactions on Knowledge Discovery from Data 17(5), 1-26. https://doi.org/10.1145/3565576
  • (2023) Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 729-738. https://doi.org/10.1145/3539618.3591685

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN:9781450384469
DOI:10.1145/3459637

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. crowdsourcing
  2. label aggregation
  3. pairwise comparison
  4. quality control
  5. rating aggregations
  6. truth inference

Qualifiers

  • Tutorial

Conference

CIKM '21

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
