short-paper

Label Aggregation for Crowdsourcing with Bi-Layer Clustering

Authors:

Victor S. Sheng,

Tao Li

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 921 - 924

https://doi.org/10.1145/3077136.3080679

Published: 07 August 2017

Abstract

This paper proposes a novel general label aggregation method for both binary and multi-class labeling in crowdsourcing, namely Bi-Layer Clustering (BLC), which clusters two layers of features - the conceptual-level and the physical-level features - to infer true labels of instances. BLC first clusters the instances using the conceptual-level features extracted from their multiple noisy labels and then performs clustering again using the physical-level features. It can facilitate tracking the uncertainty changes of the instances, so that the integrated labels that are likely to be falsely inferred on the conceptual layer can be easily corrected using the estimated labels on the physical layer. Experimental results on two real-world crowdsourcing data sets show that BLC outperforms seven state-of-the-art methods.
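
The two-pass idea in the abstract can be illustrated with a short sketch. The abstract does not say which features, clustering algorithm, or correction rule BLC uses, so everything below is an assumption made for illustration and is not the authors' algorithm: conceptual-level features are taken to be each instance's vote distribution over classes, both layers use a small k-means with deterministic farthest-point initialization, each cluster is mapped to the class with the largest summed vote share, and an instance counts as uncertain when its top vote share falls below a hypothetical `threshold`. The function names are likewise hypothetical.

```python
import numpy as np

def label_distribution(noisy_labels, n_classes):
    """Assumed conceptual-level features: fraction of votes per class.
    noisy_labels is (n_instances, n_workers); -1 marks a missing label."""
    counts = np.stack([(noisy_labels == c).sum(axis=1)
                       for c in range(n_classes)], axis=1)
    return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

def kmeans(X, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

def cluster_to_class(assign, votes, k):
    """Give every instance the class with the largest summed vote share
    inside its cluster (an assumed mapping rule)."""
    mapping = np.zeros(k, dtype=int)
    for j in range(k):
        members = votes[assign == j]
        if len(members):
            mapping[j] = members.sum(axis=0).argmax()
    return mapping[assign]

def bi_layer_sketch(noisy_labels, physical_features, n_classes, threshold=0.7):
    votes = label_distribution(noisy_labels, n_classes)
    # Layer 1: cluster on the conceptual-level (vote-based) features.
    conceptual = cluster_to_class(kmeans(votes, n_classes), votes, n_classes)
    # Layer 2: cluster again on the physical-level features.
    physical = cluster_to_class(kmeans(physical_features, n_classes),
                                votes, n_classes)
    # Assumed correction rule: trust the physical layer for uncertain instances.
    uncertain = votes.max(axis=1) < threshold
    return np.where(uncertain, physical, conceptual)

# Toy data: six instances, three workers, two classes.
# Instance 2 receives two wrong votes but sits among class-0 instances
# in the physical feature space.
labels = np.array([[0, 0, 0], [0, 0, 0], [1, 1, 0],
                   [1, 1, 1], [1, 1, 1], [1, 0, 1]])
features = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                     [5.0, 5.1], [5.2, 5.0], [5.1, 4.9]])
result = bi_layer_sketch(labels, features, n_classes=2)
print(result.tolist())
```

In this toy setup the conceptual layer alone assigns instance 2 to class 1, because two of its three votes say so; the physical layer groups it with instances 0 and 1, and the assumed correction rule switches its label to class 0. The threshold and the override rule are placeholders for whatever uncertainty tracking the full paper defines.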

Index Terms

Label Aggregation for Crowdsourcing with Bi-Layer Clustering
1. Information systems
  1. Information systems applications
    1. Data mining
      1. Clustering
  2. World Wide Web
    1. Web applications
      1. Crowdsourcing

Information & Contributors

Information

Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

August 2017

1476 pages

ISBN:9781450350228

DOI:10.1145/3077136

General Chairs:
Noriko Kando (National Institute of Informatics)
Tetsuya Sakai (Waseda University)
Hideo Joho (University of Tsukuba)

Program Chairs:
Hang Li (Huawei Noah's Ark Lab)
Arjen P. de Vries (Radboud University)
Ryen W. White (Microsoft Cortana)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

National Natural Science Foundation of China
Scientific and Technological Support Project of Jiangsu Province, China
Natural Science Foundation of Jiangsu Province, China
Postdoctoral Science Foundation of Jiangsu Province, China
China Postdoctoral Science Foundation

Conference

SIGIR '17

Sponsor:

SIGIR

SIGIR '17: The 40th International ACM SIGIR conference on research and development in Information Retrieval

August 7 - 11, 2017

Tokyo, Shinjuku, Japan

Acceptance Rates

SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

Ying Z, Zhang J, Li Q, Wu M, Sheng V (2024). A Little Truth Injection but a Big Reward: Label Aggregation With Graph Neural Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1-14. Online publication date: 2024.
https://doi.org/10.1109/TPAMI.2023.3338216
Wu G, Zhuo X, Zhou L, Bao X, Hong R, Wu X (2023). TIRA: Truth Inference via Reliability Aggregation on Object-Source Graph. IEEE Transactions on Knowledge and Data Engineering, 35(11), pages 11967-11981. Online publication date: 1-Nov-2023.
https://doi.org/10.1109/TKDE.2022.3225308
Wang S, Dang D (2023). A Generative Answer Aggregation Model for Sentence-Level Crowdsourcing Tasks. IEEE Transactions on Knowledge and Data Engineering, 35(4), pages 3299-3312. Online publication date: 1-Apr-2023.
https://doi.org/10.1109/TKDE.2022.3142821
Wu G, Zhuo X, Bao X, Hu X, Hong R, Wu X (2022). Crowdsourcing Truth Inference via Reliability-Driven Multi-View Graph Embedding. ACM Transactions on Knowledge Discovery from Data, 17(5), pages 1-26. Online publication date: 4-Oct-2022.
https://dl.acm.org/doi/10.1145/3565576
Zhang J (2022). Knowledge Learning With Crowdsourcing: A Brief Review and Systematic Perspective. IEEE/CAA Journal of Automatica Sinica, 9(5), pages 749-762. Online publication date: May-2022.
https://doi.org/10.1109/JAS.2022.105434
Tu J, Yu G, Wang J, Domeniconi C, Guo M, Zhang X (2020). CrowdWT. ACM Transactions on Knowledge Discovery from Data, 15(1), pages 1-24. Online publication date: 7-Dec-2020.
https://dl.acm.org/doi/10.1145/3421712
Chatterjee S, Mukhopadhyay A, Bhattacharyya M (2020). A Review of Judgment Analysis Algorithms for Crowdsourced Opinions. IEEE Transactions on Knowledge and Data Engineering, 32(7), pages 1234-1248. Online publication date: 1-Jul-2020.
https://doi.org/10.1109/TKDE.2019.2904064
Tao F, Jiang L, Li C (2020). Label similarity-based weighted soft majority voting and pairing for crowdsourcing. Knowledge and Information Systems. Online publication date: 14-May-2020.
https://doi.org/10.1007/s10115-020-01475-y
Zhang J, Sheng V, Wu J (2019). Crowdsourced Label Aggregation Using Bilayer Collaborative Clustering. IEEE Transactions on Neural Networks and Learning Systems, 30(10), pages 3172-3185. Online publication date: Oct-2019.
https://doi.org/10.1109/TNNLS.2018.2890148
Zhang J, Wu X (2019). Multi-Label Truth Inference for Crowdsourcing Using Mixture Models. IEEE Transactions on Knowledge and Data Engineering, pages 1-1. Online publication date: 2019.
https://doi.org/10.1109/TKDE.2019.2951668
