Research Article · Open Access

G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

Published: 15 May 2024

Abstract

Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze, a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables, remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which combines users' gaze, visual field, and voice-based natural language queries to enable a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we identified ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm that integrates gaze data with the in-situ querying context. We then implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrated its effectiveness, achieving higher objective and subjective scores than a baseline without gaze data. We further conducted interviews and provide insights for future gaze-facilitated information querying systems.
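
To make the paradigm concrete: the pipeline the abstract describes captures an egocentric camera frame, a gaze fixation point, and a transcribed voice query, then grounds deictic words such as "this" in the gazed-at region before querying a vision-language model. Below is a minimal illustrative sketch of that flow in Python; the helper names (crop_around_gaze, build_prompt, answer_query) and the generic vlm client are assumptions for illustration, not the authors' implementation.

# Hedged sketch of a gaze-facilitated query pipeline (illustrative only).
# Assumptions: a Pillow image for the visual field, pixel gaze coordinates,
# and a hypothetical multimodal client `vlm` exposing a generate() method.
from PIL import Image

def crop_around_gaze(frame: Image.Image, gaze_xy: tuple[int, int],
                     box: int = 224) -> Image.Image:
    """Crop a patch of the egocentric frame centered on the gaze fixation,
    clamped so the crop stays inside the image bounds."""
    x, y = gaze_xy
    half = box // 2
    left = max(0, min(x - half, frame.width - box))
    top = max(0, min(y - half, frame.height - box))
    return frame.crop((left, top, left + box, top + box))

def build_prompt(voice_query: str) -> str:
    """Tell the model that the second image is the gazed region, so vague
    queries like 'what is this?' are grounded in the user's attention."""
    return ("The user is looking at the region shown in the second image. "
            f"Answer their question about it: {voice_query}")

def answer_query(vlm, frame: Image.Image, gaze_xy: tuple[int, int],
                 voice_query: str) -> str:
    """Fuse visual field, gaze crop, and voice query into one model call.
    `vlm.generate` is a placeholder for any multimodal model API."""
    gaze_crop = crop_around_gaze(frame, gaze_xy)
    return vlm.generate(images=[frame, gaze_crop],
                        prompt=build_prompt(voice_query))

Cropping around the fixation is only one plausible way to inject gaze; attention-weighting the whole frame, as in gaze-aligned vision-language modeling, would be an alternative under the same interface.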



Information

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 8, Issue 2
June 2024
1330 pages
EISSN: 2474-9567
DOI: 10.1145/3665317
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 May 2024
Published in IMWUT Volume 8, Issue 2


Author Tags

  1. gaze tracking
  2. information query
  3. information retrieval
  4. large language models
  5. smart glasses

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 738
  • Downloads (last 6 weeks): 112
Reflects downloads up to 08 Feb 2025


Cited By
  • (2024) LR-Auth: Towards Practical Implementation of Implicit User Authentication on Earbuds. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(4), 1-27. https://doi.org/10.1145/3699793. Online publication date: 21-Nov-2024.
  • (2024) SpaceBeat: Identity-aware Multi-person Vital Signs Monitoring Using Commodity WiFi. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(3), 1-23. https://doi.org/10.1145/3678590. Online publication date: 9-Sep-2024.
  • (2024) Functional Now, Wearable Later: Examining the Design Practices of Wearable Technologists. Proceedings of the 2024 ACM International Symposium on Wearable Computers, 71-81. https://doi.org/10.1145/3675095.3676615. Online publication date: 5-Oct-2024.
  • (2024) Emotion Recognition on the Go: Utilizing Wearable IMUs for Personalized Emotion Recognition. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 537-544. https://doi.org/10.1145/3675094.3678452. Online publication date: 5-Oct-2024.
  • (2024) Pushing the Limits of Acoustic Spatial Perception via Incident Angle Encoding. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(2), 1-28. https://doi.org/10.1145/3659583. Online publication date: 15-May-2024.
  • (2024) Thermal In Motion: Designing Thermal Flow Illusions with Tactile and Thermal Interaction. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-13. https://doi.org/10.1145/3654777.3676460. Online publication date: 13-Oct-2024.
  • (2024) Adaptive Metasurface-Based Acoustic Imaging using Joint Optimization. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 492-504. https://doi.org/10.1145/3643832.3661863. Online publication date: 3-Jun-2024.
  • (2024) Affective Relevance. IEEE Intelligent Systems 39(4), 12-22. https://doi.org/10.1109/MIS.2024.3391508. Online publication date: 19-Apr-2024.
