DOI: 10.1145/3308558.3313599

Scalpel-CD: Leveraging Crowdsourcing and Deep Probabilistic Modeling for Debugging Noisy Training Data

Published: 13 May 2019

Abstract

This paper presents Scalpel-CD, a first-of-its-kind system that leverages both human and machine intelligence to debug noisy labels in the training data of machine learning systems. Our system identifies potentially wrong labels using a deep probabilistic model, which infers the latent class of a high-dimensional data instance by exploiting data distributions in the underlying latent feature space. To minimize crowd effort, it employs a data sampler that selects the data instances that would benefit most from being inspected by the crowd. The manually verified labels are then propagated to similar data instances in the original training data by exploiting the underlying data structure, thus scaling out the contribution of the crowd. Scalpel-CD is designed with a set of algorithmic solutions that automatically search for the optimal configuration for different types of training data, characterized by the underlying data structure, noise ratio, and noise type (random vs. structural). In a real deployment on multiple machine learning tasks, we demonstrate that Scalpel-CD improves label quality by 12.9% with only 2.8% of the instances inspected by the crowd.
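The abstract outlines a three-stage loop: flag suspect labels with a probabilistic model of the latent feature space, spend a small crowd budget on the most valuable suspects, and propagate the verified labels to similar instances. The sketch below is a minimal, hypothetical illustration of that loop, not the paper's implementation: it substitutes PCA plus k-means for the deep probabilistic model, simulates the crowd step, and uses a plain nearest-neighbor search for propagation; all names and parameters are invented for the example.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    def debug_noisy_labels(X, noisy_labels, n_classes, crowd_budget=50, seed=0):
        """Toy flag-inspect-propagate loop over a feature matrix X and an
        integer label array noisy_labels (all names here are hypothetical)."""
        # 1) Latent-class inference, standing in for the deep probabilistic
        #    model: embed the data and cluster it in the latent space.
        z = PCA(n_components=min(10, X.shape[1]), random_state=seed).fit_transform(X)
        km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit(z)
        clusters = km.labels_

        # Each cluster votes with its majority observed label; instances whose
        # observed label disagrees with that majority are suspects.
        majority = {c: np.bincount(noisy_labels[clusters == c],
                                   minlength=n_classes).argmax()
                    for c in range(n_classes)}
        inferred = np.array([majority[c] for c in clusters])
        suspects = np.flatnonzero(inferred != noisy_labels)

        # 2) Data sampler: spend the crowd budget on suspects closest to their
        #    cluster centre, i.e. those representative of many similar points.
        dist = np.linalg.norm(z - km.cluster_centers_[clusters], axis=1)
        to_inspect = suspects[np.argsort(dist[suspects])][:crowd_budget]

        # 3) Crowd inspection, simulated: a real deployment collects these
        #    labels from workers; here we simply accept the inferred class.
        verified = {int(i): int(inferred[i]) for i in to_inspect}

        # 4) Propagation: push each verified label to its nearest neighbours
        #    in the latent space, scaling out the crowd's contribution.
        cleaned = noisy_labels.copy()
        nn = NearestNeighbors(n_neighbors=5).fit(z)
        for i, label in verified.items():
            _, idx = nn.kneighbors(z[i][None, :])
            cleaned[idx.ravel()] = label
        return cleaned

The propagation step is what lets a small inspection budget reach a much larger share of the data, which is the effect behind the reported 12.9% quality gain from inspecting only 2.8% of the instances.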


Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2019


Author Tags

  1. Debugging training data
  2. crowdsourcing
  3. deep probabilistic models

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19: The Web Conference
May 13-17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

  • (2024) A.I. Robustness: a Human-Centered Perspective on Technological Challenges and Opportunities. ACM Computing Surveys. DOI: 10.1145/3665926. Online publication date: 27-May-2024.
  • (2024) Akane: Perplexity-Guided Time Series Data Cleaning. Proceedings of the ACM on Management of Data 2, 3, 1-26. DOI: 10.1145/3654993. Online publication date: 30-May-2024.
  • (2024) XCrowd: Combining Explainability and Crowdsourcing to Diagnose Models in Relation Extraction. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2097-2107. DOI: 10.1145/3627673.3679777. Online publication date: 21-Oct-2024.
  • (2023) Designing for Hybrid Intelligence: A Taxonomy and Survey of Crowd-Machine Interaction. Applied Sciences 13, 4, 2198. DOI: 10.3390/app13042198. Online publication date: 8-Feb-2023.
  • (2023) Perspective: Leveraging Human Understanding for Identifying and Characterizing Image Atypicality. Proceedings of the 28th International Conference on Intelligent User Interfaces, 650-663. DOI: 10.1145/3581641.3584096. Online publication date: 27-Mar-2023.
  • (2023) Learning from Noisy Crowd Labels with Logics. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 41-52. DOI: 10.1109/ICDE55515.2023.00011. Online publication date: Apr-2023.
  • (2022) What Should You Know? A Human-In-the-Loop Approach to Unknown Unknowns Characterization in Image Recognition. Proceedings of the ACM Web Conference 2022, 882-892. DOI: 10.1145/3485447.3512040. Online publication date: 25-Apr-2022.
  • (2022) Revisiting Bundle Recommendation. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2900-2911. DOI: 10.1145/3477495.3531904. Online publication date: 6-Jul-2022.
  • (2022) Human-in-the-Loop Rule Discovery for Micropost Event Detection. IEEE Transactions on Knowledge and Data Engineering, 1-12. DOI: 10.1109/TKDE.2022.3208345. Online publication date: 2022.
  • (2022) Nessy: A Neuro-Symbolic System for Label Noise Reduction. IEEE Transactions on Knowledge and Data Engineering, 1-12. DOI: 10.1109/TKDE.2022.3199570. Online publication date: 2022.
