Abstract
The capabilities of AI systems are rapidly increasing, and AIs can now complete crowdsourcing tasks as if they were human crowd workers. Developing methods that effectively aggregate the results of tasks performed by both AIs and humans has therefore become a critical problem. In this study, we revisit the Dawid-Skene model, which has been used to aggregate human votes to obtain better results in classification problems. Most state-of-the-art AI classifiers output class probabilities. Since these probabilities represent the classifiers' uncertainty, exploiting them in Dawid-Skene aggregation may yield higher-quality annotations. To this end, we introduce a variant of the Dawid-Skene model that uses the predicted probabilities directly rather than discarding them, and we conduct experiments on two real-world datasets from different domains. The experimental results show that the Dawid-Skene model with probabilities improves overall accuracy. Moreover, a detailed analysis shows that the aggregation results improved particularly for classification tasks with high uncertainty.
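The abstract does not spell out the exact formulation, but one natural way to let probability outputs enter Dawid-Skene EM is to treat each reported probability vector as fractional label counts, so that a one-hot human vote recovers the classical hard-label model. The sketch below illustrates this reading; the function name soft_dawid_skene and all implementation details are our own assumptions, not the authors' code.

```python
import numpy as np

def soft_dawid_skene(responses, n_iter=50, eps=1e-10):
    """EM aggregation where each response is a class-probability vector.

    responses: array of shape (n_items, n_workers, n_classes);
    responses[i, w] is worker w's reported distribution for item i
    (one-hot for a hard human vote, a softmax output for an AI).
    A worker who skipped an item can be encoded as an all-zero row,
    which contributes nothing to either EM step.
    Returns the posterior over true classes, shape (n_items, n_classes).
    """
    # Initialize the posterior with the mean of the reported distributions.
    q = responses.mean(axis=1)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: confusion matrices pi[w, k, l] = P(w reports l | true k)
        # and class priors, estimated from fractional (soft) counts.
        pi = np.einsum('ik,iwl->wkl', q, responses) + eps
        pi /= pi.sum(axis=2, keepdims=True)
        prior = q.mean(axis=0)
        # E-step: log-likelihood of each candidate true class; the reported
        # probabilities act as exponents, i.e. a soft multinomial likelihood.
        log_q = np.log(prior + eps) + np.einsum(
            'iwl,wkl->ik', responses, np.log(pi))
        log_q -= log_q.max(axis=1, keepdims=True)  # for numerical stability
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy usage: two items, one human (one-hot votes) and one AI (soft outputs).
human = np.array([[1.0, 0.0], [0.0, 1.0]])
ai = np.array([[0.7, 0.3], [0.4, 0.6]])
responses = np.stack([human, ai], axis=1)   # (items, workers, classes)
posterior = soft_dawid_skene(responses)
print(posterior.argmax(axis=1))             # aggregated hard labels
```

With one-hot responses, the einsum in the M-step reduces to the usual vote-counting update of Dawid and Skene, which is why this generalization does not need a separate code path for human and AI workers.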
Acknowledgements
This work was supported by JSPS KAKENHI Grant Numbers JP21H03552, JP22H00508, JP22K17944, and JP23H03405, and by JST CREST Grant Numbers JPMJCR21D1 and JPMJCR22M2.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tamura, T., Ito, H., Oyama, S., Morishima, A. (2024). Influence of AI’s Uncertainty in the Dawid-Skene Aggregation for Human-AI Crowdsourcing. In: Sserwanga, I., et al. (eds.) Wisdom, Well-Being, Win-Win. iConference 2024. Lecture Notes in Computer Science, vol 14598. Springer, Cham. https://doi.org/10.1007/978-3-031-57867-0_17
DOI: https://doi.org/10.1007/978-3-031-57867-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57866-3
Online ISBN: 978-3-031-57867-0
eBook Packages: Computer Science, Computer Science (R0)