Research article · Open access

Judgment Sieve: Reducing Uncertainty in Group Judgments through Interventions Targeting Ambiguity versus Disagreement

Published: 04 October 2023
Abstract

When groups of people are tasked with making a judgment, the issue of uncertainty often arises. Existing methods to reduce uncertainty typically focus on iteratively improving the specificity of the overall task instruction. However, uncertainty can arise from multiple sources, such as ambiguity in the item being judged due to limited context, or disagreement among participants due to differing perspectives and an under-specified task. A one-size-fits-all intervention may be ineffective if it is not targeted at the right source of uncertainty. In this paper, we introduce a new workflow, Judgment Sieve, that reduces uncertainty in group judgment tasks in a targeted manner. By using measurements that separate different sources of uncertainty during an initial round of judgment elicitation, we can then select a targeted intervention, adding context or deliberation, to most effectively reduce uncertainty on each item being judged. We test our approach on two tasks, rating word-pair similarity and rating the toxicity of online comments, and show that targeted interventions reduced uncertainty for the most uncertain cases. In the top 10% of cases, we saw ambiguity reductions of 21.4% and 25.7%, and disagreement reductions of 22.2% and 11.2%, for the two tasks respectively. We also found through a simulation that our targeted approach reduced the average uncertainty scores for both sources of uncertainty, whereas under uniform approaches, reductions in average uncertainty from one source came with an increase for the other.
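
To make the targeting step concrete, below is a minimal sketch of the routing logic under assumptions of our own: ambiguity is proxied by within-rater inconsistency when the same rater judges an item twice, disagreement by the variance across raters' average judgments, and each item is routed to whichever intervention (adding context vs. deliberation) matches its dominant uncertainty source. The function names and the threshold are hypothetical illustrations, not the paper's actual measures or API.

```python
import statistics

# Hypothetical sketch of a Judgment Sieve-style routing step: score the
# two uncertainty sources separately, then pick a matching intervention.

def ambiguity_score(repeat_pairs):
    """Within-rater inconsistency: mean absolute difference between a
    rater's first and second judgment of the same item."""
    return statistics.mean(abs(a - b) for a, b in repeat_pairs)

def disagreement_score(rater_means):
    """Between-rater spread: variance of each rater's average judgment."""
    return statistics.pvariance(rater_means)

def route_item(repeat_pairs, rater_means, threshold=0.5):
    """Route an item to the intervention targeting its dominant source;
    the threshold here is an illustrative placeholder."""
    amb = ambiguity_score(repeat_pairs)
    dis = disagreement_score(rater_means)
    if max(amb, dis) < threshold:
        return "leave as-is"
    return "add context" if amb >= dis else "deliberate"

# Example: raters are internally inconsistent (ambiguity = 3.0) but agree
# on average (disagreement = 0.25), so context is added for this item.
print(route_item(repeat_pairs=[(1, 4), (2, 5)], rater_means=[2.5, 3.5]))
```

The property this sketch tries to preserve from the paper's approach is that the two uncertainty scores are computed and acted on separately, rather than being folded into a single undifferentiated uncertainty number.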




Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 7, Issue CSCW2 (October 2023), 4055 pages
EISSN: 2573-0142
Issue DOI: 10.1145/3626953
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 04 October 2023
Published in PACMHCI Volume 7, Issue CSCW2

      Author Tags

      1. ambiguity
      2. annotation
      3. calibration
      4. crowdsourcing
