
Combining Human and Machine Confidence in Truthfulness Assessment

Published: 28 December 2022

Abstract

Automatically detecting online misinformation at scale is a challenging, interdisciplinary problem. Deciding what counts as truthful information is often controversial and difficult even for educated experts. As the scale of the problem grows, human-in-the-loop approaches to truthfulness assessment that combine the scalability of machine learning (ML) with the accuracy of human contributions have been considered.
In this work, we look at the potential to automatically combine machine-based systems with human-based systems. The former exploit supervised ML approaches; the latter involve either crowd workers (i.e., human non-experts) or human experts. Since both ML and crowdsourcing approaches can produce a score indicating the level of confidence in their truthfulness judgments (algorithmic and self-reported, respectively), we address the question of whether such confidence scores can be used to effectively and efficiently combine three approaches: (i) machine-based methods, (ii) crowd workers, and (iii) human experts. The three approaches differ significantly: they range from abundant, cheap, fast, and scalable but less accurate, to scarce, expensive, slow, and not scalable but highly accurate.
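To make the combination idea in the abstract concrete, here is a minimal, hypothetical Python sketch of confidence-based escalation across the three assessor types. The `Judgment` dataclass, the `assess` function, the judge callables, and the thresholds are all illustrative assumptions introduced for this example, not the paper's actual method.

```python
# Minimal, hypothetical sketch of confidence-based escalation across the
# three assessor types described in the abstract. All names, thresholds,
# and the escalation rule are illustrative assumptions, not the paper's
# actual pipeline.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Judgment:
    label: str         # e.g., "true", "false", "mixed"
    confidence: float  # in [0, 1]; algorithmic for ML, self-reported for humans

Judge = Callable[[str], Judgment]

def assess(statement: str,
           ml_judge: Judge,      # cheap, fast, scalable, least accurate
           crowd_judge: Judge,   # moderate cost and accuracy
           expert_judge: Judge,  # scarce, expensive, most accurate
           ml_threshold: float = 0.9,
           crowd_threshold: float = 0.75) -> Judgment:
    """Escalate a statement to a costlier assessor only when the cheaper
    assessor's confidence falls below its threshold."""
    ml = ml_judge(statement)
    if ml.confidence >= ml_threshold:
        return ml
    crowd = crowd_judge(statement)
    if crowd.confidence >= crowd_threshold:
        return crowd
    return expert_judge(statement)  # final fallback: always accepted

# Example usage with stub judges:
verdict = assess("The Earth is flat.",
                 ml_judge=lambda s: Judgment("false", 0.97),
                 crowd_judge=lambda s: Judgment("false", 0.80),
                 expert_judge=lambda s: Judgment("false", 1.00))
```

In a sketch like this, the thresholds encode the cost/accuracy trade-off the abstract describes: raising `ml_threshold` escalates more statements to costlier but more accurate human assessors. Note also that algorithmic and self-reported confidence live on different footings, so in practice ML scores would likely need calibration before being compared against human self-assessments.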


Cited By

  • (2024) "Combining Large Language Models and Crowdsourcing for Hybrid Human-AI Misinformation Detection". In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2332–2336. https://doi.org/10.1145/3626772.3657965 (Online publication date: 10-Jul-2024)
  • (2024) "Crowdsourced Fact-checking". Information Processing and Management 61, 5. https://doi.org/10.1016/j.ipm.2024.103792 (Online publication date: 1-Sep-2024)
  • (2023) "Combining human intelligence and machine learning for fact-checking: Towards a hybrid human-in-the-loop framework". Intelligenza Artificiale 17, 2, 163–172. https://doi.org/10.3233/IA-230011 (Online publication date: 20-Dec-2023)


      Published In

      Journal of Data and Information Quality, Volume 15, Issue 1
      March 2023, 197 pages
      ISSN: 1936-1955
      EISSN: 1936-1963
      DOI: 10.1145/3578367

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 December 2022
      Online AM: 11 July 2022
      Accepted: 25 May 2022
      Revised: 31 March 2022
      Received: 23 November 2021
      Published in JDIQ Volume 15, Issue 1


      Author Tags

      1. Misinformation
      2. crowdsourcing
      3. hybrid intelligence

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • Facebook Research award
      • Australian Research Council
      • ARC Training Centre for Information Resilience
      • ARC Centre of Excellence for Automated Decision-Making and Society
