Abstract
Crowdsourcing is an effective method for obtaining large databases of manually-labeled images, which is especially important for image understanding with supervised machine learning algorithms. However, for several kinds of image labeling tasks, e.g., dog breed recognition, it is hard to achieve high-quality results. Optimizing the crowdsourcing workflow therefore involves two main problems: task allocation and result inference. For task allocation, we design a two-round crowdsourcing framework that contains a smart decision mechanism, based on information entropy, to determine whether to perform a second round of task allocation. For result inference, after quantifying the similarity of all labels, we propose two graphical models to describe the labeling process and design corresponding inference algorithms to further improve the quality of image labeling results. Extensive experiments were conducted on real-world tasks on CrowdFlower and on synthetic datasets. The experimental results demonstrate the superiority of these methods over state-of-the-art methods.
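The abstract's entropy-based decision for a second labeling round can be illustrated with a minimal sketch: compute the Shannon entropy of the first-round label distribution for an image and trigger a second round only when workers disagree enough. The function names and the threshold value below are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def needs_second_round(labels, threshold=1.0):
    """Allocate a second round only when first-round labels disagree enough.

    The threshold is a hypothetical tuning parameter; the paper's actual
    decision mechanism may differ.
    """
    return label_entropy(labels) > threshold

# Unanimous labels -> entropy 0, no second round needed.
# Mixed labels (e.g., 2x "husky", 1x "corgi", 1x "pug") -> entropy 1.5 bits,
# exceeding the threshold, so a second round is triggered.
```

Under this sketch, images whose first-round labels are already consistent skip the second round, saving budget for genuinely ambiguous tasks.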
Cite this article
Fang, YL., Sun, HL., Chen, PP. et al. Improving the Quality of Crowdsourced Image Labeling via Label Similarity. J. Comput. Sci. Technol. 32, 877–889 (2017). https://doi.org/10.1007/s11390-017-1770-7