DOI: 10.1145/3613905.3636301
CHI Conference Proceedings · Extended Abstract

LLMs as Research Tools: Applications and Evaluations in HCI Data Work

Published: 11 May 2024

Abstract

Large language models (LLMs) stand to reshape traditional methods of working with data. While LLMs unlock new and potentially useful ways of interfacing with data, their use in research processes requires methodological and critical evaluation. In this workshop, we seek to gather a community of HCI researchers interested in navigating the responsible integration of LLMs into data work: data collection, processing, and analysis. We aim to create an understanding of how LLMs are being used to work with data in HCI research, and document the early challenges and concerns that have arisen. Together, we will outline a research agenda on using LLMs as research tools to work with data by defining the open empirical and ethical evaluation questions and thus contribute to setting norms in the community. We believe CHI to be the ideal place to address these questions due to the methodologically diverse researcher attendees, the prevalence of HCI research on human interaction with new computing and data paradigms, and the community’s sense of ethics and care. Insights from this forum can contribute to other research communities grappling with related questions.


Cited By

  • (2025) The State of Large Language Models in HCI Research: Workshop Report. Interactions 32, 1 (Jan. 2025), 8–9. https://doi.org/10.1145/3705617
  • (2024) Under the (neighbor)hood: Hyperlocal Surveillance on Nextdoor. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–22. https://doi.org/10.1145/3613904.3641967
  • (2024) Bridging Quantitative and Qualitative Methods for Visualization Research: A Data/Semantics Perspective in Light of Advanced AI. In 2024 IEEE Evaluation and Beyond - Methodological Approaches for Visualization (BELIV), 119–128. https://doi.org/10.1109/BELIV64461.2024.00019

Published In

CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems
May 2024
4761 pages
ISBN:9798400703317
DOI:10.1145/3613905
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

CHI '24

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%

Article Metrics

  • Downloads (Last 12 months)842
  • Downloads (Last 6 weeks)99
Reflects downloads up to 09 Jan 2025
