research-article

Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels

Authors:

Maarten de RijkeAuthors Info & Claims

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 729 - 738

https://doi.org/10.1145/3539618.3591685

Published: 18 July 2023 Publication History

Abstract

Label aggregation (LA) is the task of inferring a high-quality label for an example from multiple noisy labels generated by either human annotators or model predictions. Existing work on LA assumes a label generation process and designs a probabilistic graphical model (PGM) to learn latent true labels from observed crowd labels. However, the performance of PGM-based LA models is easily affected by the noise of the crowd labels. As a consequence, the performance of LA models differs on different datasets and no single LA model outperforms the rest on all datasets.

We extend PGM-based LA models by integrating a GP prior on the true labels. The advantage of LA models extended with a GP prior is that they can take as input crowd labels, example features, and existing pre-trained label prediction models to infer the true labels, while the original LA can only leverage crowd labels. Experimental results on both synthetic and real datasets show that any LA models extended with a GP prior and a suitable mean function achieves better performance than the underlying LA models, demonstrating the effectiveness of using a GP prior.

References

[1]

Shadi Albarqouni, Christoph Baur, Felix Achilles, Vasileios Belagiannis, Stefanie Demirci, and Nassir Navab. 2016. Aggnet: deep learning from crowds for mitosis detection in breast cancer histology images. IEEE transactions on medical imaging, Vol. 35, 5 (2016), 1313--1321.

[2]

Valerio Basile. [n.,d.]. The Perspectivist Data Manifesto. https://pdai.info/. [Online; accessed 2-January-2023].

[3]

Peng Cao, Yilun Xu, Yuqing Kong, and Yizhou Wang. 2019. Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds. In International Conference on Learning Representations.

[4]

Xi Chen, Paul N Bennett, Kevyn Collins-Thompson, and Eric Horvitz. 2013. Pairwise Ranking Aggregation in a Crowdsourced Setting. In Proceedings of the sixth ACM international conference on Web search and data mining. 193--202.

Digital Library

[5]

Zhuyun Dai and Jamie Callan. 2019. Deeper Text Understanding for IR with Contextual Neural Language Modeling. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 985--988.

Digital Library

[6]

Alexander Philip Dawid and Allan M Skene. 1979. Maximum Likelihood Estimation of Observer Error-rates using the EM Algorithm. Applied statistics (1979), 20--28.

[7]

Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. 2012. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In Proceedings of the 21st International Conference on World Wide Web. 469--478.

Digital Library

[8]

Djellel Difallah and Alessandro Checco. 2021. Aggregation Techniques in Crowdsourcing: Multiple Choice Questions and Beyond. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 4842--4844.

Digital Library

[9]

Alexey Drutsa, Valentina Fedorova, Dmitry Ustalov, Olga Megorskaya, Evfrosiniya Zerminova, and Daria Baidakova. 2020. Practice of Efficient Data Collection via Crowdsourcing: Aggregation, Incremental Relabelling, and Pricing. In Proceedings of the 13th International Conference on Web Search and Data Mining. 873--876.

Digital Library

[10]

Peter A Flach, José Hernández-Orallo, and Cèsar Ferri Ramirez. 2011. A Coherent Interpretation of AUC as a Measure of Aggregated Classification Performance. In ICML.

[11]

Meric Altug Gemalmaz and Ming Yin. 2021. Accounting for Confirmation Bias in Crowdsourced Label Aggregation. In IJCAI. 1729--1735.

[12]

Perry Groot, Adriana Birlutiu, and Tom Heskes. 2011. Learning from Multiple Annotators with Gaussian Processes. In International Conference on Artificial Neural Networks. Springer, 159--164.

[13]

Oliver Hamelijnck, Theodoros Damoulas, Kangrui Wang, and Mark Girolami. 2019. Multi-resolution Multi-task Gaussian Processes. Advances in Neural Information Processing Systems, Vol. 32 (2019).

[14]

Lei Han, Eddy Maddalena, Alessandro Checco, Cristina Sarasua, Ujwal Gadiraju, Kevin Roitero, and Gianluca Demartini. 2020. Crowd Worker Strategies in Relevance Judgment Tasks. In Proceedings of the 13th International Conference on Web Search and Data Mining. 241--249.

Digital Library

[15]

Dirk Hovy, Taylor Berg-Kirkpatrick, Ashish Vaswani, and Eduard Hovy. 2013. Learning Whom to Trust with MACE. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1120--1130.

[16]

Oana Inel, Giannis Haralabopoulos, Dan Li, Christophe Van Gysel, Zoltán Szlávik, Elena Simperl, Evangelos Kanoulas, and Lora Aroyo. 2018. Studying Topical Relevance with Evidence-based Crowdsourcing. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1253--1262.

Digital Library

[17]

Ayush Jain, Akash Das Sarma, Aditya Parameswaran, and Jennifer Widom. 2017. Understanding Workers, Developing Effective Tasks, and Enhancing Marketplace Dynamics: A Study of a Large Crowdsourcing Marketplace. Proceedings of the VLDB Endowment, Vol. 10, 7 (2017), 829--840.

Digital Library

[18]

Yuan Jin, Mark Carman, Ye Zhu, and Yong Xiang. 2020. A Technical Survey on Statistical Modelling and Design Methods for Crowdsourcing Quality Control. Artificial Intelligence, Vol. 287 (2020), 103351.

[19]

Gabriella Kazai, Jaap Kamps, and Natasa Milic-Frayling. 2013. An Analysis of Human Factors and Label Accuracy in Crowdsourcing Relevance Judgments. Information retrieval, Vol. 16, 2 (2013), 138--178.

[20]

Hyun-Chul Kim and Zoubin Ghahramani. 2012. Bayesian Classifier Combination. In Artificial Intelligence and Statistics. 619--627.

[21]

Ho Chung Law, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, and Kenji Fukumizu. 2018. Variational Learning on Aggregate Outputs with Gaussian Processes. Advances in neural information processing systems, Vol. 31 (2018).

[22]

Dan Li, Zhaochun Ren, and Evangelos Kanoulas. 2021b. CrowdGP: A Gaussian Process Model for Inferring Relevance from Crowd Annotations. In Proceedings of the Web Conference 2021. 1821--1832.

Digital Library

[23]

Shao-Yuan Li, Sheng-Jun Huang, and Songcan Chen. 2021a. Crowdsourcing Aggregation with Deep Bayesian Learning. Science China Information Sciences, Vol. 64, 3 (2021), 1--11.

[24]

Yuan Li. 2019. Probabilistic Models for Aggregating Crowdsourced Annotations. Ph.,D. Dissertation. University of Melbourne, Parkville, Victoria, Australia.

[25]

Alexander G. de G. Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. 2017. GPflow: A Gaussian Process Library Using TensorFlow. The Journal of Machine Learning Research, Vol. 18, 1 (2017), 1299--1304.

[26]

Geoffrey J McLachlan and Thriyambakam Krishnan. 2007. The EM algorithm and extensions. John Wiley & Sons.

[27]

Pablo Morales-Álvarez, Pablo Ruiz, Raúl Santos-Rodríguez, Rafael Molina, and Aggelos K Katsaggelos. 2019. Scalable and Efficient Learning from Crowds with Gaussian Processes. Information Fusion, Vol. 52 (2019), 110--127.

Digital Library

[28]

Yashar Moshfeghi and Alvaro Francisco Huertas-Rosero. 2021. A Game Theory Approach for Estimating Reliability of Crowdsourced Relevance Assessments. ACM Transactions on Information Systems (TOIS), Vol. 40, 3 (2021), 1--29.

Digital Library

[29]

Radford M Neal and Geoffrey E Hinton. 1998. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In Learning in graphical models. Springer, 355--368.

[30]

Carl Edward Rasmussen. 2004. Gaussian Processes in Machine Learning. In Advanced lectures on machine learning. Springer, 63--71.

[31]

Vikas C Raykar, Shipeng Yu, Linda H Zhao, Gerardo Hermosillo Valadez, Charles Florin, Luca Bogoni, and Linda Moy. 2010. Learning from Crowds. Journal of Machine Learning Research, Vol. 11, Apr (2010), 1297--1322.

Digital Library

[32]

Filipe Rodrigues, Francisco Pereira, and Bernardete Ribeiro. 2014. Gaussian Process Classification and Active Learning with Multiple Annotators. In International Conference on Machine Learning. 433--441.

[33]

Kevin Roitero, Alessandro Checco, Stefano Mizzaro, and Gianluca Demartini. 2022. Preferences on a Budget: Prioritizing Document Pairs when Crowdsourcing Relevance Judgments. In Proceedings of the ACM Web Conference 2022. 319--327.

Digital Library

[34]

Pablo Ruiz, Pablo Morales-Álvarez, Rafael Molina, and Aggelos K Katsaggelos. 2019. Learning from Crowds with Variational Gaussian Processes. Pattern Recognition, Vol. 88 (2019), 298--311.

Digital Library

[35]

Michael Soprano, Kevin Roitero, Francesco Bombassei De Bona, and Stefano Mizzaro. 2022. Crowd Frame: A Simple and Complete Framework to Deploy Complex Crowdsourcing Tasks Off-the-shelf. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 1605--1608.

Digital Library

[36]

Yusuke Tanaka, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, and Hiroyuki Toda. 2019. Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs. Advances in Neural Information Processing Systems, Vol. 32 (2019).

[37]

Jeroen Vuurens, Arjen P de Vries, and Carsten Eickhoff. 2011. How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy. In Proc. ACM SIGIR Workshop on Crowdsourcing for Information Retrieval (CIR?11). 21--26.

[38]

Jacob Whitehill, Ting-fan Wu, Jacob Bergsma, Javier R Movellan, and Paul L Ruvolo. 2009. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. In Advances in neural information processing systems. 2035--2043.

[39]

Hanlu Wu, Tengfei Ma, Lingfei Wu, Fangli Xu, and Shouling Ji. 2021. Exploiting Heterogeneous Graph Neural Networks with Latent Worker/Task Correlation Information for Label Aggregation in Crowdsourcing. ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 16, 2 (2021), 1--18.

[40]

Ming Wu, Qianmu Li, Jing Zhang, and Jun Hou. 2022. Label Aggregation with Clustering for Biased Crowdsourced Labeling. In 2022 14th International Conference on Machine Learning and Computing (ICMLC). 165--169.

[41]

Fariba Yousefi, Michael T Smith, and Mauricio Alvarez. 2019. Multi-task Learning for Aggregated Data Using Gaussian Processes. Advances in Neural Information Processing Systems, Vol. 32 (2019).

[42]

Jianan Zhao, Meng Qu, Chaozhuo Li, Hao Yan, Qian Liu, Rui Li, Xing Xie, and Jian Tang. 2023. Learning on Large-scale Text-attributed Graphs via Variational Inference. In The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=q0nmYciuuZN

[43]

Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, and Reynold Cheng. 2017. Truth Inference in Crowdsourcing: Is the Problem Solved? Proceedings of the VLDB Endowment, Vol. 10, 5 (2017), 541--552.

Digital Library

[44]

Yao Zhou, Fenglong Ma, Jing Gao, and Jingrui He. 2019. Optimizing the Wisdom of the Crowd: Inference, Learning, and Teaching. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 3231--3232.

Digital Library

Index Terms

Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning in probabilistic graphical models
        Bayesian network models
2. Information systems
  1. Information retrieval

Recommendations

Label Aggregation with Clustering for Biased Crowdsourced Labeling
ICMLC '22: Proceedings of the 2022 14th International Conference on Machine Learning and Computing

With the rapid development of crowdsourcing learning, amount of label aggregation methods are proposed to infer the true labels of instances from multiple noisy labels provided by inexpert crowd workers. Most of the label aggregation methods take the ...
Label Aggregation for Crowdsourcing with Bi-Layer Clustering
SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

This paper proposes a novel general label aggregation method for both binary and multi-class labeling in crowdsourcing, namely Bi-Layer Clustering (BLC), which clusters two layers of features - the conceptual-level and the physical-level features - to ...
Multi-Label Inference for Crowdsourcing
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

When acquiring labels from crowdsourcing platforms, a task may be designed to include multiple labels and the values of each label may belong to a set of various distinct options, which is the so-called multi-class multi-label annotation. To improve the ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN:9781450394086

DOI:10.1145/3539618

General Chairs:
Hsin-Hsi Chen
National Taiwan University
,
Wei-Jou (Edward) Duh
National Taiwan University
,
Hen-Hsen Huang
Academia Sinica
,
Program Chairs:
Makoto P. Kato
Spotify
,
Josiane Mothe
Universite de Toulouse
,
Barbara Poblete
University of Chile and Amazon Visiting Academic

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Dutch Ministry of Educa- tion Culture and Science

Conference

SIGIR '23

Sponsor:

SIGIR

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
89
Total Downloads

Downloads (Last 12 months)56
Downloads (Last 6 weeks)6

Reflects downloads up to 14 Oct 2024

Other Metrics

View Author Metrics

Citations

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents