Research Article · Open Access

G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

Published: 15 May 2024

Abstract

Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze, a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables, remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which combines users' gaze, visual field, and voice-based natural language queries to enable a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we identified ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm that integrates gaze data with the in-situ querying context. We then implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrated its effectiveness, achieving higher objective and subjective scores than a baseline without gaze data. We further conducted interviews and provide insights for future gaze-facilitated information querying systems.
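
To make the paradigm concrete: the pipeline the abstract describes captures an egocentric camera frame, a gaze fixation point, and a transcribed voice query, then grounds deictic words such as "this" in the gazed-at region before querying a vision-language model. Below is a minimal illustrative sketch of that flow in Python; the helper names (crop_around_gaze, build_prompt, answer_query) and the generic vlm client are assumptions for illustration, not the authors' implementation.

# Hedged sketch of a gaze-facilitated query pipeline (illustrative only).
# Assumptions: a Pillow image for the visual field, pixel gaze coordinates,
# and a hypothetical multimodal client `vlm` exposing a generate() method.
from PIL import Image

def crop_around_gaze(frame: Image.Image, gaze_xy: tuple[int, int],
                     box: int = 224) -> Image.Image:
    """Crop a patch of the egocentric frame centered on the gaze fixation,
    clamped so the crop stays inside the image bounds."""
    x, y = gaze_xy
    half = box // 2
    left = max(0, min(x - half, frame.width - box))
    top = max(0, min(y - half, frame.height - box))
    return frame.crop((left, top, left + box, top + box))

def build_prompt(voice_query: str) -> str:
    """Tell the model that the second image is the gazed region, so vague
    queries like 'what is this?' are grounded in the user's attention."""
    return ("The user is looking at the region shown in the second image. "
            f"Answer their question about it: {voice_query}")

def answer_query(vlm, frame: Image.Image, gaze_xy: tuple[int, int],
                 voice_query: str) -> str:
    """Fuse visual field, gaze crop, and voice query into one model call.
    `vlm.generate` is a placeholder for any multimodal model API."""
    gaze_crop = crop_around_gaze(frame, gaze_xy)
    return vlm.generate(images=[frame, gaze_crop],
                        prompt=build_prompt(voice_query))

Cropping around the fixation is only one plausible way to inject gaze; attention-weighting the whole frame, as in gaze-aligned vision-language modeling, would be an alternative under the same interface.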



Information

Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 8, Issue 2
June 2024
1330 pages
EISSN: 2474-9567
DOI: 10.1145/3665317
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 May 2024
Published in IMWUT Volume 8, Issue 2


Author Tags

  1. gaze tracking
  2. information query
  3. information retrieval
  4. large language models
  5. smart glasses

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 738
  • Downloads (last 6 weeks): 112
Reflects downloads up to 08 Feb 2025


Cited By
  • (2024) LR-Auth: Towards Practical Implementation of Implicit User Authentication on Earbuds. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(4), 1-27. https://doi.org/10.1145/3699793. Online publication date: 21-Nov-2024.
  • (2024) SpaceBeat: Identity-aware Multi-person Vital Signs Monitoring Using Commodity WiFi. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(3), 1-23. https://doi.org/10.1145/3678590. Online publication date: 9-Sep-2024.
  • (2024) Functional Now, Wearable Later: Examining the Design Practices of Wearable Technologists. Proceedings of the 2024 ACM International Symposium on Wearable Computers, 71-81. https://doi.org/10.1145/3675095.3676615. Online publication date: 5-Oct-2024.
  • (2024) Emotion Recognition on the Go: Utilizing Wearable IMUs for Personalized Emotion Recognition. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 537-544. https://doi.org/10.1145/3675094.3678452. Online publication date: 5-Oct-2024.
  • (2024) Pushing the Limits of Acoustic Spatial Perception via Incident Angle Encoding. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(2), 1-28. https://doi.org/10.1145/3659583. Online publication date: 15-May-2024.
  • (2024) Thermal In Motion: Designing Thermal Flow Illusions with Tactile and Thermal Interaction. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-13. https://doi.org/10.1145/3654777.3676460. Online publication date: 13-Oct-2024.
  • (2024) Adaptive Metasurface-Based Acoustic Imaging using Joint Optimization. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 492-504. https://doi.org/10.1145/3643832.3661863. Online publication date: 3-Jun-2024.
  • (2024) Affective Relevance. IEEE Intelligent Systems 39(4), 12-22. https://doi.org/10.1109/MIS.2024.3391508. Online publication date: 19-Apr-2024.
