research-article

Constructing a Classification Scheme - and its Consequences: A Field Study of Learning to Label Data for Computer Vision in a Hospital Intensive Care Unit

Published: 08 November 2024

Abstract

Research on data annotation for artificial intelligence (AI) has demonstrated that biases, power, and culture impact the ways that annotators apply labels to data and subsequently affect downstream AI systems. However, annotators can only apply labels that are available to them in the annotation classification scheme. Drawing on a 3-year ethnographic study of an R&D collaboration between medical and AI researchers, we argue that the construction of the classification scheme itself (decisions about what kinds of data can and cannot be collected, what activities can and cannot be detected in the data, what the possible annotation classes ought to be, and the rules by which an item ought to be classified into each class) dramatically shapes the annotation process and, through it, the AI. We draw on Bowker and Star's [9] classification theory to detail how the creation of a training data codebook for a computer vision algorithm in hospital intensive care units (ICUs) evolved from its original, clinically driven goal of classifying complex clinical activities into a narrower goal of identifying physical objects and simpler activities in the ICU. This work reinforces how trade-offs and decisions made long before annotators begin labeling data are highly consequential to the resulting AI system.

References

[1]
K. Annaiahshetty and N. Prasad. 2013. Expert System for Multiple Domain Experts Knowledge Acquisition in Software Design and Development. In 2013 UKSim 15th International Conference on Computer Modelling and Simulation. IEEE, Cambridge, 196--201. https://doi.org/10.1109/UKSim.2013.124
[2]
Lora Aroyo and Chris Welty. 2015. Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation. AI Magazine, Vol. 36, 1 (March 2015), 15--24. https://doi.org/10.1609/aimag.v36i1.2564
[3]
Diane E. Bailey and Stephen R. Barley. 2020. Beyond design and use: How scholars should study intelligent technologies. Information and Organization, Vol. 30, 2 (June 2020), 100286. https://doi.org/10.1016/j.infoandorg.2019.100286
[4]
Anja Bechmann and Geoffrey C Bowker. 2019. Unsupervised by any other name: Hidden layers of knowledge production in artificial intelligence on social media. Big Data & Society, Vol. 6, 1 (Jan. 2019), 1--11. https://doi.org/10.1177/2053951718819569
[5]
Lindsay Blackwell, Jill Dimond, Sarita Schoenebeck, and Cliff Lampe. 2017. Classification and Its Consequences for Online Harassment: Design Insights from HeartMob. Proceedings of the ACM on Human-Computer Interaction, Vol. 1, CSCW (Dec. 2017), 1--19. https://doi.org/10.1145/3134659
[6]
Geoffrey C. Bowker. 1994 a. Information Mythology and Infrastructure. In Information Acumen: The Understanding and Use of Knowledge in Modern Business, Lisa Bud-Frierman (Ed.). Routledge, London; New York, 231--247.
[7]
Geoffrey C. Bowker. 1994 b. Science on the run: information management and industrial geophysics at Schlumberger, 1920--1940. MIT Press, Cambridge, MA; London.
[8]
Geoffrey C. Bowker and Susan Leigh Star. 1994. Knowledge and Infrastructure in International Information Management: Problems of Classification and Coding. In Information Acumen: The Understanding and Use of Knowledge in Modern Business, Lisa Bud-Frierman (Ed.). Routledge, London; New York, 187--213.
[9]
Geoffrey C. Bowker and Susan Leigh Star. 2000. Sorting Things Out: Classification and Its Consequences. The MIT Press. https://doi.org/10.7551/mitpress/6352.001.0001
[10]
Geoffrey C. Bowker, Stefan Timmermans, and Susan Leigh Star. 1996. Infrastructure and Organizational Transformation: Classifying Nurses' Work. In Information Technology and Changes in Organizational Work: Proceedings of the IFIP WG8.2 Working Conference on Information Technology and Changes in Organizational Work, December 1995, Wanda J. Orlikowski, Geoff Walsham, Matthew R. Jones, and Janice I. Degross (Eds.). Springer US, Boston, MA, 344--370. https://doi.org/10.1007/978-0-387-34872-8_21
[11]
Kathy Charmaz. 2014. Grounded Theory in Global Perspective: Reviews by International Researchers. Qualitative Inquiry, Vol. 20, 9 (Nov. 2014), 1074--1084. https://doi.org/10.1177/1077800414545235
[12]
Justin Cheng and Dan Cosley. 2013. How annotation styles influence content and preferences. In Proceedings of the 24th ACM Conference on Hypertext and Social Media. ACM, Paris France, 214--218. https://doi.org/10.1145/2481492.2481519
[13]
Hannah Davis. 2020. A Dataset is a Worldview. https://towardsdatascience.com/a-dataset-is-a-worldview-5328216dd44d
[14]
Vania Dimitrova, Ronald Denaux, Glen Hart, Catherine Dolbear, Ian Holt, and Anthony G. Cohn. 2008. Involving Domain Experts in Authoring OWL Ontologies. In The Semantic Web - ISWC 2008, Amit Sheth, Steffen Staab, Mike Dean, Massimo Paolucci, Diana Maynard, Timothy Finin, and Krishnaprasad Thirunarayan (Eds.). Lecture Notes in Computer Science, Vol. 5318. Springer Berlin Heidelberg, Berlin, Heidelberg, 1--16. https://doi.org/10.1007/978-3-540-88564-1_1
[15]
Amy C. Edmondson. 1996. Learning from Mistakes is Easier Said Than Done: Group and Organizational Influences on the Detection and Correction of Human Error. The Journal of Applied Behavioral Science, Vol. 32, 1 (March 1996), 5--28. https://doi.org/10.1177/0021886396321001
[16]
E. Wesley Ely. 2017. The ABCDEF Bundle: Science and Philosophy of How ICU Liberation Serves Patients and Families. Critical care medicine, Vol. 45, 2 (Feb. 2017), 321--330. https://doi.org/10.1097/CCM.0000000000002175
[17]
Li Fei-Fei and Ranjay Krishna. 2022. Searching for Computer Vision North Stars. Daedalus, Vol. 151, 2 (May 2022), 85--99. https://doi.org/10.1162/daed_a_01902
[18]
Melanie Feinberg. 2017. A Design Perspective on Data. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, Denver Colorado USA, 2952--2963. https://doi.org/10.1145/3025453.3025837
[19]
Diana Forsythe and David J. Hess. 2001. Studying those who study us: an anthropologist in the world of artificial intelligence. Stanford University Press, Stanford, Calif.
[20]
R. Stuart Geiger, Kevin Yu, Yanlai Yang, Mindy Dai, Jie Qiu, Rebekah Tang, and Jenny Huang. 2020. Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From? In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. ACM, Barcelona Spain, 325--336. https://doi.org/10.1145/3351095.3372862
[21]
D. Gilhooly, S. A. Green, C. McCann, N. Black, and S. R. Moonesinghe. 2019. Barriers and facilitators to the successful development, implementation and evaluation of care bundles in acute care in hospital: a scoping review. Implementation Science, Vol. 14, 1 (May 2019), 47. https://doi.org/10.1186/s13012-019-0894-2
[22]
Karen Golden-Biddle and Karen Locke. 2007. Composing qualitative research (2nd ed.). Sage, Thousand Oaks, Calif.
[23]
Severin Hornung, Denise M. Rousseau, Jürgen Glaser, Peter Angerer, and Matthias Weigl. 2010. Beyond top-down and bottom-up work redesign: Customizing job content through idiosyncratic deals. Journal of Organizational Behavior, Vol. 31, 2--3 (Feb. 2010), 187--215. https://doi.org/10.1002/job.625
[24]
Sanjay Kairam and Jeffrey Heer. 2016. Parting Crowds: Characterizing Divergent Interpretations in Crowdsourced Annotation Tasks. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. ACM, San Francisco California USA, 1637--1648. https://doi.org/10.1145/2818048.2820016
[25]
Zelun Luo, Zane Durante, Linden Li, Wanze Xie, Ruochen Liu, Emily Jin, Zhuoyi Huang, Lun Yu Li, Jiajun Wu, Juan Carlos Niebles, Ehsan Adeli, and Fei-Fei Li. 2022. MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 5282--5298. https://proceedings.neurips.cc/paper_files/paper/2022/file/22c16986b2f50af520f56dc34d91e403-Paper-Datasets_and_Benchmarks.pdf
[26]
Zelun Luo, Wanze Xie, Siddharth Kapoor, Yiyun Liang, Michael Cooper, Juan Carlos Niebles, Ehsan Adeli, and Fei-Fei Li. 2021. MOMA: Multi-Object Multi-Actor Activity Parsing. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 17939--17955. https://proceedings.neurips.cc/paper/2021/hash/95688ba636a4720a85b3634acfec8cdd-Abstract.html
[27]
Annachiara Marra, E. Wesley Ely, Pratik P. Pandharipande, and Mayur B. Patel. 2017. The ABCDEF Bundle in Critical Care. Critical care clinics, Vol. 33, 2 (April 2017), 225--243. https://doi.org/10.1016/j.ccc.2016.12.005
[28]
Milagros Miceli, Martin Schuessler, and Tianling Yang. 2020. Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision. Proceedings of the ACM on Human-Computer Interaction, Vol. 4, CSCW2 (Oct. 2020), 1--25. https://doi.org/10.1145/3415186
[29]
Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang, and Joel T Dudley. 2018. Deep learning for healthcare: review, opportunities and challenges. Briefings in Bioinformatics, Vol. 19, 6 (Nov. 2018), 1236--1246. https://doi.org/10.1093/bib/bbx044
[30]
Michael Muller, Ingrid Lange, Dakuo Wang, David Piorkowski, Jason Tsay, Q. Vera Liao, Casey Dugan, and Thomas Erickson. 2019. How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, Glasgow Scotland UK, 1--15. https://doi.org/10.1145/3290605.3300356
[31]
Michael Muller, Christine T. Wolf, Josh Andres, Michael Desmond, Narendra Nath Joshi, Zahra Ashktorab, Aabhas Sharma, Kristina Brimijoin, Qian Pan, Evelyn Duesterwald, and Casey Dugan. 2021. Designing Ground Truth and the Social Life of Labels. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. ACM, Yokohama Japan, 1--16. https://doi.org/10.1145/3411764.3445402
[32]
Srinivas Murthy and Hannah Wunsch. 2012. Clinical review: International comparisons in critical care - lessons learned. Critical Care, Vol. 16, 2 (2012), 218. https://doi.org/10.1186/cc11140
[33]
Esther Olsen, Zhanna Novikov, Theadora Sakata, Monique H. Lambert, Javier Lorenzo, Roger Bohn, and Sara J. Singer. 2024. More isn't always better: Technology in the intensive care unit. Health Care Management Review, Vol. 49, 2 (April 2024), 127--138. https://doi.org/10.1097/HMR.0000000000000398
[34]
Samir Passi and Steven Jackson. 2017. Data Vision: Learning to See Through Algorithmic Abstraction. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, Portland Oregon USA, 2436--2447. https://doi.org/10.1145/2998181.2998331
[35]
Samir Passi and Steven J. Jackson. 2018. Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects. Proceedings of the ACM on Human-Computer Interaction, Vol. 2, CSCW (Nov. 2018), 1--28. https://doi.org/10.1145/3274405
[36]
Samir Passi and Phoebe Sengers. 2020. Making data science systems work. Big Data & Society, Vol. 7, 2 (July 2020), 1--13. https://doi.org/10.1177/2053951720939605
[37]
Kathleen H. Pine and Max Liboiron. 2015. The Politics of Measurement and Action. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, Seoul Republic of Korea, 3147--3156. https://doi.org/10.1145/2702123.2702298
[38]
Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M Aroyo. 2021. Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. ACM, Yokohama Japan, 1--15. https://doi.org/10.1145/3411764.3445518
[39]
Alexander Sorokin and David Forsyth. 2008. Utility data annotation with Amazon Mechanical Turk. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, Anchorage, AK, USA, 1--8. https://doi.org/10.1109/CVPRW.2008.4562953
[40]
James P. Spradley. 1979. The ethnographic interview. Holt, Rinehart and Winston, New York.
[41]
Susan Leigh Star (Ed.). 1995 a. The Cultures of Computing. Blackwell Publisher, Oxford, UK; Cambridge, MA, USA.
[42]
Susan Leigh Star (Ed.). 1995 b. Ecologies of Knowledge: Work and Politics in Science and Technology. State University of New York Press, Albany.
[43]
Susan Leigh Star. 1995 c. The Politics of Formal Representations: Wizards, Gurus, and Organizational Complexity. In Ecologies of Knowledge: Work and Politics in Science and Technology, Susan Leigh Star (Ed.). State University of New York Press, Albany, 88--118.
[44]
Lucy Suchman. 1995. Representations of work: Making work visible. Commun. ACM, Vol. 38, 9 (1995), 33--35.
[45]
Paola Tubaro, Antonio A Casilli, and Marion Coville. 2020. The trainer, the verifier, the imitator: Three ways in which human platform workers support artificial intelligence. Big Data & Society, Vol. 7, 1 (Jan. 2020), 1--12. https://doi.org/10.1177/2053951720919776
[46]
Ding Wang, Shantanu Prabhat, and Nithya Sambasivan. 2022. Whose AI Dream? In search of the aspiration in data annotation. In CHI Conference on Human Factors in Computing Systems. ACM, New Orleans LA USA, 1--16. https://doi.org/10.1145/3491102.3502121
[47]
Hannah Wunsch, Derek C. Angus, David A. Harrison, Olivier Collange, Robert Fowler, Eric A. J. Hoste, Nicolette F. De Keizer, Alexander Kersten, Walter T. Linde-Zwirble, Alberto Sandiumenge, and Kathryn M. Rowan. 2008. Variation in critical care services across North America and Western Europe. Critical Care Medicine, Vol. 36, 10 (Oct. 2008), 2787--93, e1--9. https://doi.org/10.1097/CCM.0b013e318186aec8
[48]
Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Tongyu Liu, and Xiaoyong Du. 2018. Cost-effective data annotation using game-based crowdsourcing. Proceedings of the VLDB Endowment, Vol. 12, 1 (Sept. 2018), 57--70. https://doi.org/10.14778/3275536.3275541
[49]
Serena Yeung, Francesca Rinaldo, Jeffrey Jopling, Bingbin Liu, Rishab Mehra, N. Lance Downing, Michelle Guo, Gabriel M. Bianconi, Alexandre Alahi, Julia Lee, Brandi Campbell, Kayla Deru, William Beninati, Li Fei-Fei, and Arnold Milstein. 2019. A computer vision system for deep learning-based detection of patient mobilization activities in the ICU. npj Digital Medicine, Vol. 2, 1 (Dec. 2019), 1--5. https://doi.org/10.1038/s41746-019-0087-z
[50]
Kristina Yordanova, Frank Kruger, and Thomas Kirste. 2018. Providing Semantic Annotation for the Carnegie Mellon University Grand Challenge Dataset. In 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, Athens, 579--584. https://doi.org/10.1109/PERCOMW.2018.8480380
[51]
Kristina Yordanova, Adeline Paiement, Max Schröder, Emma Tonkin, Przemyslaw Woznowski, Carl Magnus Olsson, Joseph Rafferty, and Timo Sztyler. 2018. Challenges in Annotation of useR Data for UbiquitOUs Systems: Results from the 1st ARDUOUS Workshop. (2018). https://doi.org/10.48550/ARXIV.1803.05843 Publisher: arXiv Version Number: 1.
[52]
Gary J. Young. 2000. Managing Organizational Transformations: Lessons from the Veterans Health Administration. California Management Review, Vol. 43, 1 (Oct. 2000), 66--82. https://doi.org/10.2307/41166066

    Published In

    Proceedings of the ACM on Human-Computer Interaction  Volume 8, Issue CSCW2
    CSCW
    November 2024
    5177 pages
    EISSN:2573-0142
    DOI:10.1145/3703902
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. classification
    2. computer vision
    3. data annotation
    4. labeling

    Qualifiers

    • Research-article
