Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3290605.3300234acmconferencesArticle/Chapter ViewAbstractPublication PageschiConference Proceedingsconference-collections
research-article
Open access

Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making

Published: 02 May 2019 Publication History

Abstract

Machine learning (ML) is increasingly being used in image retrieval systems for medical decision making. One application of ML is to retrieve visually similar medical images from past patients (e.g. tissue from biopsies) to reference when making a medical decision with a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is algorithmically determined to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved using a deep learning algorithm, and developed tools that empower users to cope with the search algorithm on-the-fly, communicating what types of similarity are most important at different moments in time. In two evaluations with pathologists, we found that these tools increased the diagnostic utility of images found and increased user trust in the algorithm. The tools were preferred over a traditional interface, without a loss in diagnostic accuracy. We also observed that users adopted new strategies when using refinement tools, re-purposing them to test and understand the underlying algorithm and to disambiguate ML errors from their own errors. Taken together, these findings inform future human-ML collaborative systems for expert decision-making.

References

[1]
Ceyhun Burak Akgül, Daniel L Rubin, Sandy Napel, Christopher F Beaulieu, Hayit Greenspan, and Burak Acar. 2011. Content-based image retrieval in radiology: current status and future directions. Journal of Digital Imaging 24, 2 (2011), 208--222.
[2]
Guillaume Alain and Yoshua Bengio. 2016. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644 (2016).
[3]
Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105--120.
[4]
Saleema Amershi, James Fogarty, and Daniel Weld. 2012. Regroup: Interactive machine learning for on-demand group creation in social networks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 21--30.
[5]
Hossein Azizpour, Ali Sharif Razavian, Josephine Sullivan, Atsuto Maki, and Stefan Carlsson. 2015. From generic to specific deep representations for visual recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 36--45.
[6]
David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Network Dissection: Quantifying Interpretability of Deep Visual Representations. In Computer Vision and Pattern Recognition.
[7]
Eta S Berner. 2007. Clinical decision support systems. Vol. 233. Springer.
[8]
Marshal A Blatt, Michael C Higgins, Keith I Marton, and HC Sox Jr. 1988. Medical decision making.
[9]
Carrie J Cai, Jonas Jongejan, and Jess Holbrook. 2019. The Effects of Example-Based Explanations in a Machine Learning Interface. In Proceedings of the 24th International Conference on Intelligent User Interfaces. ACM.
[10]
Matthew Chalmers and Ian MacColl. 2003. Seamful and seamless design in ubiquitous computing. In Workshop at the crossroads: The interaction of HCI and systems issues in UbiComp, Vol. 8.
[11]
Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. 2008. Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Comput. Surv. 40, 2, Article 5 (May 2008), 60 pages.
[12]
Scott Doyle, Mark Hwang, Kinsuk Shah, Anant Madabhushi, Michael Feldman, and John Tomaszeweski. 2007. Automated grading of prostate cancer using architectural and textural image features. In Biomedical imaging: from nano to macro, 2007. ISBI 2007. 4th IEEE international symposium on. IEEE, 1284--1287.
[13]
Jesse Engel, Matthew Hoffman, and Adam Roberts. 2017. Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models. Computing Research Repository abs/1711.05772 (2017).
[14]
Jonathan I Epstein, Michael J Zelefsky, Daniel D Sjoberg, Joel B Nelson, Lars Egevad, Cristina Magi-Galluzzi, Andrew J Vickers, Anil V Parwani, Victor E Reuter, Samson W Fine, et al. 2016. A contemporary prostate cancer grading system: a validated alternative to the Gleason score. European urology 69, 3 (2016), 428--435.
[15]
Motahhare Eslami, Karrie Karahalios, Christian Sandvig, Kristen Vaccaro, Aimee Rickman, Kevin Hamilton, and Alex Kirlik. 2016. First i like it, then i hide it: Folk theories of social feeds. In Proceedings of the 2016 cHI conference on human factors in computing systems. ACM, 2371--2382.
[16]
Jerry Alan Fails and Dan R Olsen Jr. 2003. Interactive machine learning. In Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 39--45.
[17]
Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, et al. 1995. Query by image and video content: The QBIC system. computer 28, 9 (1995), 23--32.
[18]
James Fogarty, Desney Tan, Ashish Kapoor, and Simon Winder. 2008. CueFlik: interactive concept learning in image search. In Proceedings of the sigchi conference on human factors in computing systems. ACM, 29--38.
[19]
Amit X Garg, Neill KJ Adhikari, Heather McDonald, M Patricia RosasArellano, PJ Devereaux, Joseph Beyene, Justina Sam, and R Brian Haynes. 2005. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. Jama 293, 10 (2005), 1223--1238.
[20]
Sandra G Hart and Lowell E Staveland. 1988. Development of NASATLX (Task Load Index): Results of empirical and theoretical research. In Advances in psychology. Vol. 52. Elsevier, 139--183.
[21]
Kevin Anthony Hoff and Masooda Bashir. 2015. Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors 57, 3 (2015), 407--434.
[22]
Avi Kak and Christina Pavlopoulou. 2002. Content-based image retrieval from large medical databases. In 3D Data Processing Visualization and Transmission, 2002. Proceedings. First International Symposium on. IEEE, 138--147.
[23]
Saif Khairat, David Marc, William Crosby, and Ali Al Sanousi. 2018. Reasons For Physicians Not Adopting Clinical Decision Support Systems: Critical Analysis. JMIR medical informatics 6, 2 (2018).
[24]
Been Kim, Elena Glassman, Brittney Johnson, and Julie Shah. 2015. iBCM: Interactive Bayesian Case Model Empowering Humans via Intuitive Interaction. (2015).
[25]
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 2018. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International Conference on Machine Learning. 2673--2682.
[26]
René F Kizilcec. 2016. How much information?: Effects of transparency on trust in an algorithmic interface. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2390--2395.
[27]
Ajay Kohli and Saurabh Jha. 2018. Why CAD failed in mammography. Journal of the American College of Radiology 15, 3 (2018), 535--537.
[28]
Todd Kulesza, Saleema Amershi, Rich Caruana, Danyel Fisher, and Denis Charles. 2014. Structured labeling for facilitating concept evolution in machine learning. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 3075--3084.
[29]
Roger C Mayer, James H Davis, and F David Schoorman. 1995. An integrative model of organizational trust. Academy of management review 20, 3 (1995), 709--734.
[30]
Neville Mehta, Raja Alomari, and Vipin Chaudhary. 2009. Content based sub-image retrieval system for high resolution pathology images using salient interest points. 2009 (09 2009), 3719--22.
[31]
B Middleton, DF Sittig, and A Wright. 2016. Clinical decision support: a 25 year retrospective and a 25 year vision. Yearbook of medical informatics 25, S 01 (2016), S103--S116.
[32]
Tomas Mikolov, Wen tau Yih, and Geoffrey Zweig. 2013. Linguistic Regularities in Continuous Space Word Representations. In HLT-NAACL.
[33]
Clara Mosquera-Lopez, Sos Agaian, Alejandro Velez-Hoyos, and Ian Thompson. 2015. Computer-aided prostate cancer diagnosis from digitized histopathology: a review on texture-based systems. IEEE reviews in biomedical engineering 8 (2015), 98--113.
[34]
Henning Müller, Nicolas Michoux, David Bandon, and Antoine Geissbuhler. 2004. A review of content-based image retrieval systems in medical applications clinical benefits and future directions. International journal of medical informatics 73, 1 (2004), 1--23.
[35]
Carlton Wayne Niblack, Ron Barber, Will Equitz, Myron D Flickner, Eduardo H Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, and Gabriel Taubin. 1993. QBIC project: querying images by content, using color, texture, and shape. In Storage and retrieval for image and video databases, Vol. 1908. International Society for Optics and Photonics, 173--188.
[36]
Raymond S Nickerson. 1998. Confirmation bias: A ubiquitous phenomenon in many guises. Review of general psychology 2, 2 (1998), 175.
[37]
Jerome A Osheroff, Jonathan M Teich, Blackford Middleton, Elaine B Steen, Adam Wright, and Don E Detmer. 2007. A roadmap for national action on clinical decision support. Journal of the American medical informatics association 14, 2 (2007), 141--145.
[38]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 1135--1144.
[39]
Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. 2014. CNN features off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 806--813.
[40]
Patrice Y Simard, Saleema Amershi, David M Chickering, Alicia Edelman Pelton, Soroush Ghorashi, Christopher Meek, Gonzalo Ramos, Jina Suh, Johan Verwey, Mo Wang, et al. 2017. Machine teaching: A new paradigm for building machine learning systems. arXiv preprint arXiv:1707.06742 (2017).
[41]
Judah ES Sklan, Andrew J Plassard, Daniel Fabbri, and Bennett A Landman. 2015. Toward content-based image retrieval with deep convolutional neural networks. In Medical Imaging 2015: Biomedical Applications in Molecular, Structural, and Functional Imaging, Vol. 9417. International Society for Optics and Photonics, 94172C.
[42]
A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. 2000. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 12 (Dec 2000), 1349--1380.
[43]
Kristen Vaccaro, Dylan Huang, Motahhare Eslami, Christian Sandvig, Kevin Hamilton, and Karrie Karahalios. 2018. The Illusion of Control: Placebo Effects of Control Settings. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 16.
[44]
Ji Wan, Dayong Wang, Steven Chu Hong Hoi, Pengcheng Wu, Jianke Zhu, Yongdong Zhang, and Jintao Li. 2014. Deep Learning for ContentBased Image Retrieval: A Comprehensive Study. In Proceedings of the 22Nd ACM International Conference on Multimedia (MM '14). ACM, New York, NY, USA, 157--166.
[45]
Qian Yang, John Zimmerman, Aaron Steinfeld, Lisa Carey, and James F Antaki. 2016. Investigating the heart pump implant decision process: opportunities for decision support tools to help. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 4477--4488.
[46]
Mussarat Yasmin, Sajjad Mohsin, and Muhammad Sharif. 2014. Intelligent Image Retrieval Techniques: A Survey. Journal of Applied Research and Technology 12, 1 (2014), 87 -- 103.
[47]
HongJiang Zhang and Zhong Su. 2002. Relevance feedback in CBIR. In Visual and Multimedia Information Management. Springer, 21--35.
[48]
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv preprint arXiv:1703.10593 (2017).

Cited By

View all
  • (2024)Validating the Accuracy of a Patient-Facing Clinical Decision Support System in Predicting Lumbar Disc Herniation: Diagnostic Accuracy StudyDiagnostics10.3390/diagnostics1417187014:17(1870)Online publication date: 26-Aug-2024
  • (2024)Human-centered AI Technologies in Human-robot Interaction for Social SettingsProceedings of the International Conference on Mobile and Ubiquitous Multimedia10.1145/3701571.3701610(501-505)Online publication date: 1-Dec-2024
  • (2024)Beyond Recommendations: From Backward to Forward AI Support of Pilots' Decision-Making ProcessProceedings of the ACM on Human-Computer Interaction10.1145/36870248:CSCW2(1-32)Online publication date: 8-Nov-2024
  • Show More Cited By

Index Terms

  1. Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
    May 2019
    9077 pages
    ISBN:9781450359702
    DOI:10.1145/3290605
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 May 2019

    Check for updates

    Badges

    • Honorable Mention

    Author Tags

    1. clinical health
    2. human-ai interaction
    3. machine learning

    Qualifiers

    • Research-article

    Conference

    CHI '19
    Sponsor:

    Acceptance Rates

    CHI '19 Paper Acceptance Rate 703 of 2,958 submissions, 24%;
    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

    Upcoming Conference

    CHI 2025
    ACM CHI Conference on Human Factors in Computing Systems
    April 26 - May 1, 2025
    Yokohama , Japan

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)2,077
    • Downloads (Last 6 weeks)270
    Reflects downloads up to 25 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Validating the Accuracy of a Patient-Facing Clinical Decision Support System in Predicting Lumbar Disc Herniation: Diagnostic Accuracy StudyDiagnostics10.3390/diagnostics1417187014:17(1870)Online publication date: 26-Aug-2024
    • (2024)Human-centered AI Technologies in Human-robot Interaction for Social SettingsProceedings of the International Conference on Mobile and Ubiquitous Multimedia10.1145/3701571.3701610(501-505)Online publication date: 1-Dec-2024
    • (2024)Beyond Recommendations: From Backward to Forward AI Support of Pilots' Decision-Making ProcessProceedings of the ACM on Human-Computer Interaction10.1145/36870248:CSCW2(1-32)Online publication date: 8-Nov-2024
    • (2024)Outcome First or Overview First? Optimizing Patient-Oriented Framework for Evidence-Based Healthcare Treatment Selections with XAI ToolsCompanion Publication of the 2024 Conference on Computer-Supported Cooperative Work and Social Computing10.1145/3678884.3681859(248-254)Online publication date: 11-Nov-2024
    • (2024)Misfitting With AI: How Blind People Verify and Contest AI ErrorsProceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility10.1145/3663548.3675659(1-17)Online publication date: 27-Oct-2024
    • (2024)Integrating Crowd and Machine Learning in an Intelligent Interface: A Case Study of Oil Spill Detection in Satellite ImagesProceedings of the 2024 International Conference on Advanced Visual Interfaces10.1145/3656650.3656680(1-9)Online publication date: 3-Jun-2024
    • (2024)DiscipLink: Unfolding Interdisciplinary Information Seeking Process via Human-AI Co-ExplorationProceedings of the 37th Annual ACM Symposium on User Interface Software and Technology10.1145/3654777.3676366(1-20)Online publication date: 13-Oct-2024
    • (2024)How Can I Signal You To Trust Me: Investigating AI Trust Signalling in Clinical Self-AssessmentsProceedings of the 2024 ACM Designing Interactive Systems Conference10.1145/3643834.3661612(525-540)Online publication date: 1-Jul-2024
    • (2024)"It depends": Configuring AI to Improve Clinical Usefulness Across ContextsProceedings of the 2024 ACM Designing Interactive Systems Conference10.1145/3643834.3660707(874-889)Online publication date: 1-Jul-2024
    • (2024)Development and translation of human-AI interaction models into working prototypes for clinical decision-makingProceedings of the 2024 ACM Designing Interactive Systems Conference10.1145/3643834.3660697(1607-1619)Online publication date: 1-Jul-2024
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media