DOI: 10.1145/3308558.3313599

Scalpel-CD: Leveraging Crowdsourcing and Deep Probabilistic Modeling for Debugging Noisy Training Data

Published: 13 May 2019

Abstract

This paper presents Scalpel-CD, a first-of-its-kind system that leverages both human and machine intelligence to debug noisy labels in the training data of machine learning systems. Our system identifies potentially wrong labels using a deep probabilistic model, which infers the latent class of a high-dimensional data instance by exploiting data distributions in the underlying latent feature space. To minimize crowd effort, it employs a data sampler that selects the data instances that would benefit most from being inspected by the crowd. The manually verified labels are then propagated to similar data instances in the original training data by exploiting the underlying data structure, thus scaling out the contribution of the crowd. Scalpel-CD is designed with a set of algorithmic solutions that automatically search for the optimal configuration for different types of training data, characterized by the underlying data structure, noise ratio, and noise type (random vs. structural). In a real deployment on multiple machine learning tasks, we demonstrate that Scalpel-CD improves label quality by 12.9% with only 2.8% of the instances inspected by the crowd.
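The abstract outlines a three-stage loop: flag suspect labels with a probabilistic model of the latent feature space, spend a small crowd budget on the most valuable suspects, and propagate the verified labels to similar instances. The sketch below is a minimal, hypothetical illustration of that loop, not the paper's implementation: it substitutes PCA plus k-means for the deep probabilistic model, simulates the crowd step, and uses a plain nearest-neighbor search for propagation; all names and parameters are invented for the example.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    def debug_noisy_labels(X, noisy_labels, n_classes, crowd_budget=50, seed=0):
        """Toy flag-inspect-propagate loop over a feature matrix X and an
        integer label array noisy_labels (all names here are hypothetical)."""
        # 1) Latent-class inference, standing in for the deep probabilistic
        #    model: embed the data and cluster it in the latent space.
        z = PCA(n_components=min(10, X.shape[1]), random_state=seed).fit_transform(X)
        km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit(z)
        clusters = km.labels_

        # Each cluster votes with its majority observed label; instances whose
        # observed label disagrees with that majority are suspects.
        majority = {c: np.bincount(noisy_labels[clusters == c],
                                   minlength=n_classes).argmax()
                    for c in range(n_classes)}
        inferred = np.array([majority[c] for c in clusters])
        suspects = np.flatnonzero(inferred != noisy_labels)

        # 2) Data sampler: spend the crowd budget on suspects closest to their
        #    cluster centre, i.e. those representative of many similar points.
        dist = np.linalg.norm(z - km.cluster_centers_[clusters], axis=1)
        to_inspect = suspects[np.argsort(dist[suspects])][:crowd_budget]

        # 3) Crowd inspection, simulated: a real deployment collects these
        #    labels from workers; here we simply accept the inferred class.
        verified = {int(i): int(inferred[i]) for i in to_inspect}

        # 4) Propagation: push each verified label to its nearest neighbours
        #    in the latent space, scaling out the crowd's contribution.
        cleaned = noisy_labels.copy()
        nn = NearestNeighbors(n_neighbors=5).fit(z)
        for i, label in verified.items():
            _, idx = nn.kneighbors(z[i][None, :])
            cleaned[idx.ravel()] = label
        return cleaned

The propagation step is what lets a small inspection budget reach a much larger share of the data, which is the effect behind the reported 12.9% quality gain from inspecting only 2.8% of the instances.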


Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2019


Author Tags

  1. Debugging training data
  2. crowdsourcing
  3. deep probabilistic models

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19: The Web Conference
May 13-17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

  • (2024) A.I. Robustness: a Human-Centered Perspective on Technological Challenges and Opportunities. ACM Computing Surveys. DOI: 10.1145/3665926. Online publication date: 27-May-2024.
  • (2024) Akane: Perplexity-Guided Time Series Data Cleaning. Proceedings of the ACM on Management of Data 2, 3, 1-26. DOI: 10.1145/3654993. Online publication date: 30-May-2024.
  • (2024) XCrowd: Combining Explainability and Crowdsourcing to Diagnose Models in Relation Extraction. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2097-2107. DOI: 10.1145/3627673.3679777. Online publication date: 21-Oct-2024.
  • (2023) Designing for Hybrid Intelligence: A Taxonomy and Survey of Crowd-Machine Interaction. Applied Sciences 13, 4, 2198. DOI: 10.3390/app13042198. Online publication date: 8-Feb-2023.
  • (2023) Perspective: Leveraging Human Understanding for Identifying and Characterizing Image Atypicality. Proceedings of the 28th International Conference on Intelligent User Interfaces, 650-663. DOI: 10.1145/3581641.3584096. Online publication date: 27-Mar-2023.
  • (2023) Learning from Noisy Crowd Labels with Logics. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 41-52. DOI: 10.1109/ICDE55515.2023.00011. Online publication date: Apr-2023.
  • (2022) What Should You Know? A Human-In-the-Loop Approach to Unknown Unknowns Characterization in Image Recognition. Proceedings of the ACM Web Conference 2022, 882-892. DOI: 10.1145/3485447.3512040. Online publication date: 25-Apr-2022.
  • (2022) Revisiting Bundle Recommendation. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2900-2911. DOI: 10.1145/3477495.3531904. Online publication date: 6-Jul-2022.
  • (2022) Human-in-the-Loop Rule Discovery for Micropost Event Detection. IEEE Transactions on Knowledge and Data Engineering, 1-12. DOI: 10.1109/TKDE.2022.3208345. Online publication date: 2022.
  • (2022) Nessy: A Neuro-Symbolic System for Label Noise Reduction. IEEE Transactions on Knowledge and Data Engineering, 1-12. DOI: 10.1109/TKDE.2022.3199570. Online publication date: 2022.
