Research Article | Open Access

PrISM-Q&A: Step-Aware Voice Assistant on a Smartwatch Enabled by Multimodal Procedure Tracking and Large Language Models

Published: 21 November 2024

Abstract

Voice assistants capable of answering user queries during various physical tasks have shown promise in guiding users through complex procedures. However, users often find it challenging to articulate their queries precisely, especially when unfamiliar with the specific terminology required for machine-oriented tasks. We introduce PrISM-Q&A, a novel question-answering (Q&A) interaction termed step-aware Q&A, which enhances the functionality of voice assistants on smartwatches by incorporating Human Activity Recognition (HAR) to provide the system with user context. It continuously monitors user behavior during procedural tasks via audio and motion sensors on the watch and estimates which step the user is performing. When a question is posed, the estimated step is supplied to Large Language Models (LLMs) as part of the context used to generate a response, even for inherently vague questions like "What should I do next with this?" Our studies confirmed that users preferred the convenience of our approach to that of existing voice assistants. Our real-time assistant is the first Q&A system to provide contextually situated support during tasks without camera use, paving the way for ubiquitous, intelligent assistants.
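The abstract describes a two-stage pipeline: the watch's audio and motion streams drive a procedure tracker, and the tracker's current-step estimate is injected into the LLM prompt alongside the user's spoken question so that vague references can be resolved. As a rough illustration of that last stage only, here is a minimal Python sketch; the `StepEstimate` type, the example procedure, and the prompt wording are all hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class StepEstimate:
    """Hypothetical output of the HAR-based procedure tracker."""
    index: int         # zero-based index of the estimated current step
    name: str          # human-readable step name
    confidence: float  # tracker confidence in [0, 1]


# Hypothetical procedure definition (e.g., a pour-over coffee task).
PROCEDURE = [
    "Grind the coffee beans",
    "Put the grounds in the dripper",
    "Pour hot water over the grounds",
    "Serve the coffee",
]


def build_step_aware_prompt(question: str, step: StepEstimate) -> str:
    """Compose an LLM prompt that grounds a possibly vague question
    (e.g., 'What should I do next with this?') in the tracked step."""
    steps_text = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(PROCEDURE))
    return (
        "You are a voice assistant guiding a user through a procedure.\n"
        f"Procedure steps:\n{steps_text}\n"
        f"The user is currently on step {step.index + 1} "
        f"({step.name}; tracker confidence {step.confidence:.2f}).\n"
        f"User question: {question}\n"
        "Answer concisely for speech output, using the current step to "
        "resolve vague references such as 'this' or 'next'."
    )


if __name__ == "__main__":
    estimate = StepEstimate(index=2,
                            name="Pour hot water over the grounds",
                            confidence=0.87)
    print(build_step_aware_prompt("What should I do next with this?",
                                  estimate))
```

Running the example prints a prompt in which the deictic "this" can be resolved against step 3, which is the mechanism the abstract describes; in the real system the step estimate would come from the multimodal tracker rather than being hard-coded.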

Supplemental Material

JPG File - Thumbnail image
A smartwatch-based system integrates audio and motion data to enhance the voice assistant's question answering with step context.

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 8, Issue 4
December 2024, 1788 pages
EISSN: 2474-9567
DOI: 10.1145/3705705

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 21 November 2024
Published in IMWUT Volume 8, Issue 4

Author Tags

1. context-aware
2. large language models
3. procedure tracking
4. question answering
5. task assistance

Qualifiers

• Research-article
• Research
• Refereed

Article Metrics

• 0 Total Citations
• 270 Total Downloads
• Downloads (last 12 months): 270
• Downloads (last 6 weeks): 160

Reflects downloads up to 12 Jan 2025
