
Commonsense Knowledge-Driven Joint Reasoning Approach for Object Retrieval in Virtual Reality

Published: 05 December 2023

National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence (BIGAI), China

Abstract

Retrieving out-of-reach objects is a crucial task in virtual reality (VR). One of the most commonly used approaches to this task is gesture-based retrieval, which allows bare-hand, eyes-free, and direct retrieval. However, previous work has focused primarily on designing assigned gestures and has neglected context. This makes it difficult to accurately retrieve one object from a large number of objects, because of the one-to-one mapping metaphor, the limitations of finger poses, and the memory burden placed on users. It is widely accepted that objects and their contexts are related, which suggests that the object a user expects to retrieve is related to the context, including the scene and the objects the user interacts with. We therefore propose a commonsense knowledge-driven joint reasoning approach for object retrieval, in which human grasping gestures and context are modeled with an And-Or graph (AOG). This approach enables users to accurately retrieve objects from a large number of candidates using natural grasping gestures drawn from their experience of grasping physical objects. Experimental results demonstrate that the proposed approach improves retrieval accuracy. We also propose an object retrieval system based on this approach, and two user studies show that the system enables efficient object retrieval in virtual environments (VEs).
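The joint reasoning idea can be illustrated with a toy sketch. This is not the authors' AOG, grammar, or learned parameters. The grasp types, objects, probability tables, and names such as `build_retrieval_aog` are hypothetical and chosen only for illustration. The sketch assumes that the gesture factor p(grasp | object) and the context factor p(object | scene) can be multiplied under an And-node, as if they were conditionally independent. It treats the Or-node over candidate objects as a normalized posterior that ranks the candidates.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical commonsense tables (illustrative numbers, not from the paper).
# p(grasp type | object): how people typically grasp each physical object.
GRASP_GIVEN_OBJECT: Dict[str, Dict[str, float]] = {
    "mug":    {"cylindrical": 0.6, "pinch": 0.1, "hook": 0.3},
    "pen":    {"cylindrical": 0.1, "pinch": 0.8, "hook": 0.1},
    "bottle": {"cylindrical": 0.8, "pinch": 0.1, "hook": 0.1},
}
# p(object | scene): how likely each object is to be wanted in a given scene.
OBJECT_GIVEN_SCENE: Dict[str, Dict[str, float]] = {
    "kitchen": {"mug": 0.5, "pen": 0.1, "bottle": 0.4},
    "office":  {"mug": 0.3, "pen": 0.6, "bottle": 0.1},
}
EPS = 1e-9  # floor so that missing (object, grasp) pairs do not produce log(0)


@dataclass
class Terminal:
    """Leaf node holding a single probability factor."""
    name: str
    prob: float

    def log_score(self) -> float:
        return math.log(max(self.prob, EPS))


@dataclass
class AndNode:
    """And-node: all parts hold jointly, so factors multiply (log-scores add)."""
    name: str
    children: List[Terminal]

    def log_score(self) -> float:
        return sum(child.log_score() for child in self.children)


@dataclass
class OrNode:
    """Or-node: exactly one alternative (one candidate object) is selected."""
    name: str
    children: List[AndNode]

    def posterior(self) -> List[Tuple[str, float]]:
        # Normalize exp(log-score) over the alternatives and rank them, best first.
        scores = [(c.name, c.log_score()) for c in self.children]
        top = max(s for _, s in scores)  # subtract the max for numerical stability
        weights = [(n, math.exp(s - top)) for n, s in scores]
        total = sum(w for _, w in weights)
        ranked = [(n, w / total) for n, w in weights]
        return sorted(ranked, key=lambda item: item[1], reverse=True)


def build_retrieval_aog(grasp: str, scene: str) -> OrNode:
    """Or over candidate objects; each candidate is And(gesture term, context term)."""
    candidates = []
    for obj, grasp_dist in GRASP_GIVEN_OBJECT.items():
        gesture = Terminal(f"{grasp}|{obj}", grasp_dist.get(grasp, 0.0))
        context = Terminal(f"{obj}|{scene}", OBJECT_GIVEN_SCENE[scene].get(obj, 0.0))
        candidates.append(AndNode(obj, [gesture, context]))
    return OrNode("retrieve", candidates)


if __name__ == "__main__":
    for scene in ("kitchen", "office"):
        best, prob = build_retrieval_aog("cylindrical", scene).posterior()[0]
        print(scene, best, round(prob, 3))
```

With these made-up tables, the same cylindrical grasp resolves to the bottle in the kitchen but to the mug in the office. Using the gesture term alone, the bottle would be selected in both scenes. This is the kind of disambiguation the abstract attributes to combining gestures with context. To model the objects the user has interacted with, a further context term could be added as another terminal under each And-node.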

Supplementary Material

ZIP File (papers_221s4-file4.zip), supplemental
MP4 File (papers_221s4-file3.mp4), supplemental

Published In

ACM Transactions on Graphics, Volume 42, Issue 6
December 2023
1565 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3632123
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2023
Published in TOG Volume 42, Issue 6

Author Tags

  1. commonsense knowledge
  2. context-aware
  3. gesture
  4. joint reasoning
  5. object retrieval
  6. object selection
  7. virtual reality

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • National Natural Science Foundation of China

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 250
  • Downloads (last 6 weeks): 25
Reflects downloads up to 01 Nov 2024

Cited By

  • (2024) Low-light Video Enhancement with Conditional Diffusion Models and Wavelet Interscale Attentions. Proceedings of 21st ACM SIGGRAPH Conference on Visual Media Production, 1-10. DOI: 10.1145/3697294.3697304. Online publication date: 18-Nov-2024
  • (2024) Illumination-Aware Low-Light Image Enhancement with Transformer and Auto-Knee Curve. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8, 1-23. DOI: 10.1145/3664653. Online publication date: 29-Jun-2024
  • (2024) Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement. Proceedings of the 32nd ACM International Conference on Multimedia, 1534-1543. DOI: 10.1145/3664647.3681580. Online publication date: 28-Oct-2024
  • (2024) Diffusion Posterior Proximal Sampling for Image Restoration. Proceedings of the 32nd ACM International Conference on Multimedia, 214-223. DOI: 10.1145/3664647.3681556. Online publication date: 28-Oct-2024
  • (2024) DRMF: Degradation-Robust Multi-Modal Image Fusion via Composable Diffusion Prior. Proceedings of the 32nd ACM International Conference on Multimedia, 8546-8555. DOI: 10.1145/3664647.3681064. Online publication date: 28-Oct-2024
  • (2024) JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement. Proceedings of the 32nd ACM International Conference on Multimedia, 1810-1818. DOI: 10.1145/3664647.3680876. Online publication date: 28-Oct-2024
  • (2024) NICER: A New and Improved Consumed Endurance and Recovery Metric to Quantify Muscle Fatigue of Mid-Air Interactions. ACM Transactions on Graphics 43:4, 1-14. DOI: 10.1145/3658230. Online publication date: 19-Jul-2024
  • (2024) Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids. ACM SIGGRAPH 2024 Conference Papers, 1-11. DOI: 10.1145/3641519.3657402. Online publication date: 13-Jul-2024
  • (2024) PSC diffusion: patch-based simplified conditional diffusion model for low-light image enhancement. Multimedia Systems 30:4. DOI: 10.1007/s00530-024-01391-z. Online publication date: 21-Jun-2024
  • (2023) Interactive NeRF Geometry Editing With Shape Priors. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:12, 14821-14837. DOI: 10.1109/TPAMI.2023.3315068. Online publication date: 15-Sep-2023
