DOI: 10.1145/3657242.3657244
Research article

Benchmarking analysis of human pose estimation solutions for virtual television sets

Published: 19 June 2024

Abstract

In recent years, the use of virtual television sets (VTS) has grown in traditional TV productions, online broadcast shows and streaming, for both professional and amateur applications. Nevertheless, interaction between actors or television anchors and the virtual scene remains very limited, because human body tracking is a complex problem that requires expensive equipment and high-performance software to run in real time. On the other hand, Human Pose Estimation (HPE) with low-cost devices has been a hot research topic due to its wide range of applications, including sports visualization, security and medicine. The objective of this paper is to determine whether modern human pose estimation technologies can be used as an interface between users, actors, presenters or speakers and a scene in a VTS. A comprehensive comparison of the different technologies is carried out to identify the solutions that can be used in VTS for broadcasting and streaming, thereby improving the communicative capabilities of modern VTS.

    Published In

    Interacción '24: Proceedings of the XXIV International Conference on Human Computer Interaction
    June 2024
    155 pages
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Benchmarking
    2. broadcast
    3. human computer interaction
    4. human pose estimation
    5. virtual television sets

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    INTERACCION 2024

    Acceptance Rates

    Overall Acceptance Rate 109 of 163 submissions, 67%