Research Article | Open Access

PrISM-Q&A: Step-Aware Voice Assistant on a Smartwatch Enabled by Multimodal Procedure Tracking and Large Language Models

Published: 21 November 2024

Abstract

Voice assistants capable of answering user queries during various physical tasks have shown promise in guiding users through complex procedures. However, users often find it challenging to articulate their queries precisely, especially when unfamiliar with the specific terminology required for machine-oriented tasks. We introduce PrISM-Q&A, a novel question-answering (Q&A) interaction termed step-aware Q&A, which enhances the functionality of voice assistants on smartwatches by incorporating Human Activity Recognition (HAR) to provide the system with user context. It continuously monitors user behavior during procedural tasks via audio and motion sensors on the watch and estimates which step the user is performing. When a question is posed, the estimated step is supplied to Large Language Models (LLMs) as part of the context used to generate a response, even for inherently vague questions like "What should I do next with this?" Our studies confirmed that users preferred the convenience of our approach to that of existing voice assistants. Our real-time assistant is the first Q&A system to provide contextually situated support during tasks without camera use, paving the way for ubiquitous, intelligent assistants.
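The abstract describes a two-stage pipeline: the watch's audio and motion streams drive a procedure tracker, and the tracker's current-step estimate is injected into the LLM prompt alongside the user's spoken question so that vague references can be resolved. As a rough illustration of that last stage only, here is a minimal Python sketch; the `StepEstimate` type, the example procedure, and the prompt wording are all hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class StepEstimate:
    """Hypothetical output of the HAR-based procedure tracker."""
    index: int         # zero-based index of the estimated current step
    name: str          # human-readable step name
    confidence: float  # tracker confidence in [0, 1]


# Hypothetical procedure definition (e.g., a pour-over coffee task).
PROCEDURE = [
    "Grind the coffee beans",
    "Put the grounds in the dripper",
    "Pour hot water over the grounds",
    "Serve the coffee",
]


def build_step_aware_prompt(question: str, step: StepEstimate) -> str:
    """Compose an LLM prompt that grounds a possibly vague question
    (e.g., 'What should I do next with this?') in the tracked step."""
    steps_text = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(PROCEDURE))
    return (
        "You are a voice assistant guiding a user through a procedure.\n"
        f"Procedure steps:\n{steps_text}\n"
        f"The user is currently on step {step.index + 1} "
        f"({step.name}; tracker confidence {step.confidence:.2f}).\n"
        f"User question: {question}\n"
        "Answer concisely for speech output, using the current step to "
        "resolve vague references such as 'this' or 'next'."
    )


if __name__ == "__main__":
    estimate = StepEstimate(index=2,
                            name="Pour hot water over the grounds",
                            confidence=0.87)
    print(build_step_aware_prompt("What should I do next with this?",
                                  estimate))
```

Running the example prints a prompt in which the deictic "this" can be resolved against step 3, which is the mechanism the abstract describes; in the real system the step estimate would come from the multimodal tracker rather than being hard-coded.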

Supplemental Material

JPG File - Thumbnail image
A smartwatch-based system integrates audio and motion data to enhance the voice assistant's question answering with step context.

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 8, Issue 4
December 2024, 1788 pages
EISSN: 2474-9567
DOI: 10.1145/3705705

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 21 November 2024
Published in IMWUT Volume 8, Issue 4

Author Tags

1. context-aware
2. large language models
3. procedure tracking
4. question answering
5. task assistance

Qualifiers

• Research-article
• Research
• Refereed

Article Metrics

• 0 Total Citations
• 270 Total Downloads
• Downloads (last 12 months): 270
• Downloads (last 6 weeks): 160

Reflects downloads up to 12 Jan 2025
