Research article · Open access

Judgment Sieve: Reducing Uncertainty in Group Judgments through Interventions Targeting Ambiguity versus Disagreement

Published: 04 October 2023
Abstract

When groups of people are tasked with making a judgment, the issue of uncertainty often arises. Existing methods to reduce uncertainty typically focus on iteratively improving the specificity of the overall task instruction. However, uncertainty can arise from multiple sources, such as ambiguity in the item being judged due to limited context, or disagreement among participants due to differing perspectives and an under-specified task. A one-size-fits-all intervention may be ineffective if it is not targeted at the right source of uncertainty. In this paper, we introduce a new workflow, Judgment Sieve, that reduces uncertainty in group judgment tasks in a targeted manner. By using measurements that separate different sources of uncertainty during an initial round of judgment elicitation, we can then select a targeted intervention, adding context or deliberation, to most effectively reduce uncertainty on each item being judged. We test our approach on two tasks, rating word-pair similarity and rating the toxicity of online comments, and show that targeted interventions reduced uncertainty for the most uncertain cases. In the top 10% of cases, we saw ambiguity reductions of 21.4% and 25.7%, and disagreement reductions of 22.2% and 11.2%, for the two tasks respectively. We also found through a simulation that our targeted approach reduced the average uncertainty scores for both sources of uncertainty, whereas under uniform approaches, reductions in average uncertainty from one source came with an increase for the other.
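
To make the targeting step concrete, below is a minimal sketch of the routing logic under assumptions of our own: ambiguity is proxied by within-rater inconsistency when the same rater judges an item twice, disagreement by the variance across raters' average judgments, and each item is routed to whichever intervention (adding context vs. deliberation) matches its dominant uncertainty source. The function names and the threshold are hypothetical illustrations, not the paper's actual measures or API.

```python
import statistics

# Hypothetical sketch of a Judgment Sieve-style routing step: score the
# two uncertainty sources separately, then pick a matching intervention.

def ambiguity_score(repeat_pairs):
    """Within-rater inconsistency: mean absolute difference between a
    rater's first and second judgment of the same item."""
    return statistics.mean(abs(a - b) for a, b in repeat_pairs)

def disagreement_score(rater_means):
    """Between-rater spread: variance of each rater's average judgment."""
    return statistics.pvariance(rater_means)

def route_item(repeat_pairs, rater_means, threshold=0.5):
    """Route an item to the intervention targeting its dominant source;
    the threshold here is an illustrative placeholder."""
    amb = ambiguity_score(repeat_pairs)
    dis = disagreement_score(rater_means)
    if max(amb, dis) < threshold:
        return "leave as-is"
    return "add context" if amb >= dis else "deliberate"

# Example: raters are internally inconsistent (ambiguity = 3.0) but agree
# on average (disagreement = 0.25), so context is added for this item.
print(route_item(repeat_pairs=[(1, 4), (2, 5)], rater_means=[2.5, 3.5]))
```

The property this sketch tries to preserve from the paper's approach is that the two uncertainty scores are computed and acted on separately, rather than being folded into a single undifferentiated uncertainty number.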




Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 7, Issue CSCW2 (October 2023), 4055 pages
EISSN: 2573-0142
Issue DOI: 10.1145/3626953
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 04 October 2023
Published in PACMHCI Volume 7, Issue CSCW2

      Author Tags

      1. ambiguity
      2. annotation
      3. calibration
      4. crowdsourcing
