research-article

Fallacies of Agreement: A Critical Review of Consensus Assessment Methods for Gesture Elicitation

Published: 28 June 2018
    Abstract

    Discovering gestures that gain consensus is a key goal of gesture elicitation. To this end, HCI research has developed statistical methods to reason about agreement. We review these methods and identify three major problems. First, we show that raw agreement rates disregard agreement that occurs by chance and do not reliably capture how participants distinguish among referents. Second, we explain why current recommendations on how to interpret agreement scores rely on problematic assumptions. Third, we demonstrate that significance tests for comparing agreement rates, whether within or between participants, yield large Type I error rates (>40% for α = .05). As alternatives, we present agreement indices that are routinely used in inter-rater reliability studies and discuss how to apply them to gesture elicitation studies. We also demonstrate how to use common resampling techniques to support statistical inference with interval estimates. Finally, we apply these methods to reanalyze and reinterpret the findings of four gesture elicitation studies.
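    To make the contrast between raw and chance-corrected agreement concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' code or toolkit. It assumes each participant proposes exactly one gesture per referent and that identical gestures already share a label. The raw agreement rate is computed as the proportion of participant pairs whose proposals match (the AR of Vatavu and Wobbrock's formalization reduces to this). Fleiss' kappa is used as one example of an inter-rater reliability index; it estimates chance agreement from the pooled gesture frequencies across all referents. For interval estimates, the sketch uses a leave-one-participant-out jackknife with a normal approximation. This choice is deliberate: ordinary bootstrap resampling of participants creates duplicates, and a duplicated participant always agrees with itself, which inflates pairwise agreement. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

# table[r][p] = label of the gesture that participant p proposed for referent r.
# Assumption: one proposal per participant per referent, and identical gestures
# already share the same label (e.g., after the researchers' grouping step).


def agreement_rate(labels):
    """Share of participant pairs whose proposals for one referent match."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two proposals per referent")
    matching_pairs = sum(c * (c - 1) for c in Counter(labels).values())
    return matching_pairs / (n * (n - 1))


def mean_agreement_rate(table):
    """Raw agreement averaged over referents (no chance correction)."""
    return sum(agreement_rate(row) for row in table) / len(table)


def fleiss_kappa(table):
    """Fleiss' kappa: mean pairwise agreement corrected for chance, where chance
    agreement comes from the pooled gesture frequencies across all referents."""
    p_observed = mean_agreement_rate(table)
    pooled = Counter(label for row in table for label in row)
    total = sum(pooled.values())
    p_chance = sum((c / total) ** 2 for c in pooled.values())
    if p_chance == 1.0:
        return float("nan")  # undefined: every proposal is the same gesture
    return (p_observed - p_chance) / (1.0 - p_chance)


def jackknife_ci(table, statistic, z=1.96):
    """Normal-approximation interval (z=1.96 gives roughly 95%) built from the
    leave-one-participant-out jackknife standard error of `statistic`."""
    n = len(table[0])
    if n < 3:
        raise ValueError("need at least three participants")
    leave_one_out = []
    for i in range(n):
        # Drop participant i from every referent, then recompute the statistic.
        reduced = [[lab for j, lab in enumerate(row) if j != i] for row in table]
        leave_one_out.append(statistic(reduced))
    mean_loo = sum(leave_one_out) / n
    se = math.sqrt((n - 1) / n * sum((s - mean_loo) ** 2 for s in leave_one_out))
    estimate = statistic(table)
    return estimate - z * se, estimate + z * se


# Hypothetical toy data: 3 referents x 4 participants.
table = [
    ["swipe", "swipe", "swipe", "tap"],
    ["tap", "tap", "pinch", "pinch"],
    ["swipe", "tap", "pinch", "swipe"],
]
print(round(mean_agreement_rate(table), 3))  # 0.333
print(round(fleiss_kappa(table), 3))         # -0.021
print(jackknife_ci(table, fleiss_kappa))
```

    On this toy table, raw agreement averages about 0.33, yet kappa is slightly below zero. The pooled gesture frequencies alone would make about 35% of random pairs agree (50/144 ≈ 0.347), so the observed agreement is no better than chance. The normal-approximation interval is the simplest option. Percentile or bias-corrected intervals are alternatives, but they need a resampling scheme that avoids the self-agreement problem described above.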

    Information

    Published In

    ACM Transactions on Computer-Human Interaction, Volume 25, Issue 3
    June 2018
    217 pages
    ISSN: 1073-0516
    EISSN: 1557-7325
    DOI: 10.1145/3231919

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 June 2018
    Accepted: 01 January 2018
    Revised: 01 August 2017
    Received: 01 March 2017
    Published in TOCHI Volume 25, Issue 3

    Author Tags

    1. Gesture elicitation
    2. agreement indices
    3. agreement rates
    4. bias
    5. chance agreement
    6. confidence intervals
    7. content analysis
    8. gestures
    9. kappa coefficients
    10. replication
    11. statistics

    Qualifiers

    • Research-article
    • Research
    • Refereed
