DOI: 10.1145/3488162.3488218
research-article

Parallax Engine: Head Controlled Motion Parallax Using Notebooks’ RGB Camera

Published: 03 January 2022

Abstract

Research on the Fish Tank Virtual Reality (FTVR) technique commonly uses specific sensors (e.g., infrared cameras and LEDs on glasses) to estimate the user’s eye position. However, estimating the face position with an RGB camera is becoming more accessible. In this work, we explore community-available face characteristics detection software to implement the FTVR technique for everyday use of 3D-enabled applications on consumer notebooks without requiring extra devices. We introduce the Parallax Engine, a solution that can be added with ease to any Unity game engine application. The solution supports two parallax-related visualization options: 1) a monoscopic FTVR mode (FishTank), which locks the virtual camera of the 3D environment to the laptop’s screen, and 2) a 2D parallax mode (Parallax2DoF), which allows horizontal and vertical displacement of the 3D scene camera. For face characteristics detection, the Parallax Engine uses a standardized interface that can receive input from different methods and currently supports three options: Google’s MediaPipe, dlib, and PoseNet. We evaluated the proposed solution with five users, who performed tasks using the different viewing and face characteristics detection options, aiming to understand how suitable it is for end-users. Apart from some detection failures with dlib, results showed overall good acceptance of both the FishTank and Parallax2DoF visualization options.
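
The two visualization options can be illustrated with a minimal Python sketch. It is not taken from the paper or from the Parallax Engine code: the function names, the coordinate convention (screen centred at the origin in the z = 0 plane, eye at z > 0), and the `gain` parameter are assumptions made for illustration. The FishTank case uses the standard off-axis (asymmetric) frustum obtained by similar triangles, and the Parallax2DoF case is reduced to a scaled x/y camera offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frustum:
    left: float
    right: float
    bottom: float
    top: float
    near: float


def fishtank_frustum(eye, screen_w, screen_h, near=0.1):
    """Off-axis frustum for a screen centred at the origin in the z = 0 plane.

    `eye` is (x, y, z) in the same units as the screen size; z > 0 is the
    distance from the eye to the screen plane. All names are hypothetical.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (z > 0)")
    scale = near / ez  # similar triangles: screen edges projected onto the near plane
    return Frustum(
        left=(-screen_w / 2 - ex) * scale,
        right=(screen_w / 2 - ex) * scale,
        bottom=(-screen_h / 2 - ey) * scale,
        top=(screen_h / 2 - ey) * scale,
        near=near,
    )


def parallax_2dof_offset(face_xy, gain=(1.0, 1.0)):
    """Map a normalized face position in [-1, 1]^2 to a camera x/y offset."""
    fx, fy = face_xy
    return (gain[0] * fx, gain[1] * fy)
```

With the eye centred in front of the screen the frustum is symmetric; as the eye moves sideways the frustum skews so that the screen edges stay fixed in the virtual scene, which is what produces the fish-tank effect. In an engine such as Unity, values like these would feed a custom projection matrix, and the eye position would come from whichever face detector is plugged in.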

Supplementary Material

MP4 File (short_video_ParallaxEngineHeadControlledMotionParallaxUsingNotebooksRGBCamera.mp4)
Supplemental video

Cited By

  • (2023) Digital Wah-Wah Guitar Effect Controlled by Mouth Movements. Computer Vision and Graphics, 31–39. https://doi.org/10.1007/978-3-031-22025-8_3. Online publication date: 11-Feb-2023

        Published In

        SVR '21: Proceedings of the 23rd Symposium on Virtual and Augmented Reality
        October 2021
        196 pages
        ISBN:9781450395526
        DOI:10.1145/3488162
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Fish tank virtual reality
        2. face characteristics detection
        3. motion parallax

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        SVR'21
        SVR'21: Symposium on Virtual and Augmented Reality
        October 18 - 21, 2021
        Virtual Event, Brazil
