research-article
DOI: 10.1145/2702123.2702603

How Good is 85%?: A Survey Tool to Connect Classifier Evaluation to Acceptability of Accuracy

Published: 18 April 2015

Abstract

Many HCI and ubiquitous computing systems are characterized by two important properties: their output is uncertain (it has an associated accuracy that researchers attempt to optimize), and this uncertainty is user-facing (it directly affects the quality of the user experience). Novel classifiers are typically evaluated using measures like the F1 score, but given an F-score of, e.g., 0.85, how do we know whether this performance is good enough? Is this level of uncertainty actually tolerable to users of the intended application, and do people weight precision and recall equally? We set out to develop a survey instrument that can systematically answer such questions. We introduce a new measure, acceptability of accuracy, and show how to predict it based on measures of classifier accuracy. Our tool allows us to systematically select an objective function to optimize during classifier evaluation, but can also offer new insights into how to design feedback for user-facing classification systems (e.g., by combining a seemingly low-performing classifier with appropriate feedback to make a highly usable system). It also reveals potential issues with the ubiquitous F1 measure as applied to user-facing systems.
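The abstract's question of whether people weight precision and recall equally maps onto the F-beta family of scores, of which F1 is the balanced special case. A minimal sketch (not from the paper; the confusion counts are hypothetical) showing how the choice of beta shifts the score for the same underlying classifier:

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision, recall, and F-beta from raw confusion counts.

    beta=1 gives the familiar F1 (harmonic mean of precision and recall);
    beta=2 weights recall twice as heavily as precision, beta=0.5 the reverse.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, fbeta

# Hypothetical counts: 80 true positives, 20 false positives, 10 false
# negatives. A single F1 number hides the precision/recall split.
p, r, f1 = precision_recall_fbeta(80, 20, 10)                # p=0.80, r~0.89, F1~0.84
_, _, f_half = precision_recall_fbeta(80, 20, 10, beta=0.5)  # precision-weighted, ~0.82
_, _, f_two = precision_recall_fbeta(80, 20, 10, beta=2.0)   # recall-weighted, ~0.87
```

Because this hypothetical classifier's recall exceeds its precision, the recall-weighted F2 rates it higher than the precision-weighted F0.5; a survey instrument like the one the paper proposes can indicate which weighting users of a given application actually care about.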



    Published In

    CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems
    April 2015, 4290 pages
    ISBN: 9781450331456
    DOI: 10.1145/2702123

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. accuracy
    2. accuracy acceptability
    3. classifiers
    4. inference
    5. machine learning
    6. sensors

    Qualifiers

    • Research-article

    Conference

    CHI '15: CHI Conference on Human Factors in Computing Systems
    April 18 - 23, 2015
    Seoul, Republic of Korea

    Acceptance Rates

    CHI '15 Paper Acceptance Rate: 486 of 2,120 submissions, 23%
    Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
