research-article
DOI: 10.1145/2702123.2702603

How Good is 85%?: A Survey Tool to Connect Classifier Evaluation to Acceptability of Accuracy

Published: 18 April 2015

Abstract

Many HCI and ubiquitous computing systems are characterized by two important properties: their output is uncertain (it has an associated accuracy that researchers attempt to optimize), and this uncertainty is user-facing (it directly affects the quality of the user experience). Novel classifiers are typically evaluated using measures like the F1 score, but given an F-score of, e.g., 0.85, how do we know whether this performance is good enough? Is this level of uncertainty actually tolerable to users of the intended application, and do people weight precision and recall equally? We set out to develop a survey instrument that can systematically answer such questions. We introduce a new measure, acceptability of accuracy, and show how to predict it based on measures of classifier accuracy. Our tool allows us to systematically select an objective function to optimize during classifier evaluation, but can also offer new insights into how to design feedback for user-facing classification systems (e.g., by combining a seemingly low-performing classifier with appropriate feedback to make a highly usable system). It also reveals potential issues with the ubiquitous F1 measure as applied to user-facing systems.
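The abstract's question of whether people weight precision and recall equally maps onto the F-beta family of scores, of which F1 is the balanced special case. A minimal sketch (not from the paper; the confusion counts are hypothetical) showing how the choice of beta shifts the score for the same underlying classifier:

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision, recall, and F-beta from raw confusion counts.

    beta=1 gives the familiar F1 (harmonic mean of precision and recall);
    beta=2 weights recall twice as heavily as precision, beta=0.5 the reverse.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, fbeta

# Hypothetical counts: 80 true positives, 20 false positives, 10 false
# negatives. A single F1 number hides the precision/recall split.
p, r, f1 = precision_recall_fbeta(80, 20, 10)                # p=0.80, r~0.89, F1~0.84
_, _, f_half = precision_recall_fbeta(80, 20, 10, beta=0.5)  # precision-weighted, ~0.82
_, _, f_two = precision_recall_fbeta(80, 20, 10, beta=2.0)   # recall-weighted, ~0.87
```

Because this hypothetical classifier's recall exceeds its precision, the recall-weighted F2 rates it higher than the precision-weighted F0.5; a survey instrument like the one the paper proposes can indicate which weighting users of a given application actually care about.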



    Published In

    CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems
    April 2015, 4290 pages
    ISBN: 9781450331456
    DOI: 10.1145/2702123

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. accuracy
    2. accuracy acceptability
    3. classifiers
    4. inference
    5. machine learning
    6. sensors

    Qualifiers

    • Research-article

    Conference

    CHI '15: CHI Conference on Human Factors in Computing Systems
    April 18 - 23, 2015
    Seoul, Republic of Korea

    Acceptance Rates

    CHI '15 Paper Acceptance Rate: 486 of 2,120 submissions, 23%
    Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
