DOI: 10.1145/3299869.3320221
research-article

CrowdGame: A Game-Based Crowdsourcing System for Cost-Effective Data Labeling

Published: 25 June 2019

Abstract

Large-scale data labeling has become a major bottleneck for many applications, such as machine learning and data integration. This paper presents CrowdGame, a crowdsourcing system that harnesses the crowd to gather data labels cost-effectively. CrowdGame focuses on generating high-quality labeling rules that substantially reduce labeling cost while preserving quality. It first generates candidate rules, then uses a game-based crowdsourcing approach to select rules with high coverage and accuracy, and finally applies the selected rules to label the data. We have implemented CrowdGame and provide a user-friendly interface for deploying labeling applications. We will demonstrate CrowdGame in two representative data labeling scenarios: entity matching and relation extraction.
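
To make the terms "coverage" and "accuracy" concrete, the following minimal sketch scores one labeling rule on a toy entity-matching task. It is an illustration only, not the authors' method: the abstract does not give CrowdGame's rule definitions or its game-based selection procedure, so the rule `shared_token_rule`, the sample pairs, and the checked labels below are all hypothetical.

```python
from typing import Callable, Optional

# A record pair, and a rule that assigns a label or abstains (None).
Pair = tuple[str, str]
Rule = Callable[[Pair], Optional[bool]]


def shared_token_rule(pair: Pair) -> Optional[bool]:
    """Hypothetical rule: label a pair as non-matching (False) when the
    two strings share no token; abstain (None) otherwise."""
    a, b = (set(s.lower().split()) for s in pair)
    return False if not (a & b) else None


def coverage(rule: Rule, pairs: list[Pair]) -> float:
    """Fraction of pairs on which the rule does not abstain."""
    fired = [p for p in pairs if rule(p) is not None]
    return len(fired) / len(pairs) if pairs else 0.0


def accuracy(rule: Rule, labeled: dict[Pair, bool]) -> float:
    """Among checked pairs that the rule covers, the fraction it labels correctly."""
    covered = [(p, y) for p, y in labeled.items() if rule(p) is not None]
    if not covered:
        return 0.0
    return sum(rule(p) == y for p, y in covered) / len(covered)


# Toy data (assumed for illustration).
pairs = [
    ("apple iphone 8", "iphone 8 apple"),
    ("sony tv 55", "dell laptop"),
    ("canon camera", "nikon lens"),
    ("hp printer", "hp ink"),
]
# Hypothetical ground truth for a small checked sample.
checked = {pairs[0]: True, pairs[1]: False, pairs[2]: False}

print(coverage(shared_token_rule, pairs))    # fraction of pairs the rule labels
print(accuracy(shared_token_rule, checked))  # correctness on the covered, checked pairs
```

In this sketch, a rule that fires on many pairs and is rarely wrong would save the most manual labeling; how CrowdGame actually trades the two off is described in the paper, not here.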

Cited By

  • (2023) Tabular data synthesis with generative adversarial networks: design space and optimizations. The VLDB Journal 33:2 (255-280). DOI: 10.1007/s00778-023-00807-y. Online publication date: 15-Aug-2023.
  • (2022) Data Management for Machine Learning: A Survey. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2022.3148237. Online publication date: 2022.
  • (2021) CrowdRL: An End-to-End Reinforcement Learning Framework for Data Labelling. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (289-300). DOI: 10.1109/ICDE51399.2021.00032. Online publication date: Apr-2021.

Published In

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. crowdsourcing
  2. data labeling
  3. rule learning

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • the 973 Program of China
  • the Humanities and Social Sciences Base Foundation of MOE of China

Conference

SIGMOD/PODS '19: International Conference on Management of Data
June 30 - July 5, 2019
Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 paper acceptance rate: 88 of 430 submissions (20%)
Overall acceptance rate: 785 of 4,003 submissions (20%)

Cited By

  • (2020) Inspector gadget. Proceedings of the VLDB Endowment 14:1 (28-36). DOI: 10.14778/3421424.3421429. Online publication date: 27-Oct-2020.
  • (2020) Relational data synthesis using generative adversarial networks. Proceedings of the VLDB Endowment 13:12 (1962-1975). DOI: 10.14778/3407790.3407802. Online publication date: 14-Sep-2020.
