DOI: 10.1145/3664647.3680964

Subjective and Objective Quality-of-Experience Assessment for 3D Talking Heads

Published: 28 October 2024

Abstract

In recent years, immersive communication has emerged as a compelling alternative to traditional video communication. One promising avenue is to enhance the user's sense of immersion by transmitting three-dimensional (3D) talking heads (THs). However, 3D TH data are complex and voluminous, so transmission often introduces pronounced distortion and degrades the user experience. To address this challenge, we introduce the 3D Talking Heads Quality Assessment (THQA-3D) dataset, comprising 1,000 sets of distorted and 50 original TH mesh sequences (MSs), to facilitate quality assessment in 3D TH transmission. A subjective experiment with a novel interactive approach is conducted with recruited participants to assess the quality of the MSs in the THQA-3D dataset. Leveraging this dataset, we also propose a multimodal Quality-of-Experience (QoE) method incorporating a Large Quality Model (LQM). The method projects each MS frontally and renders the projections into videos, whose quality is then assessed by the LQM together with a variable-length video memory filter (VVMF). Additionally, tone-lip coherence and silence detection techniques are employed to characterize audio-visual coherence in 3D MS streams. Experimental evaluation shows that the proposed method achieves state-of-the-art performance on the THQA-3D dataset and is competitive on other QoE datasets. Both the THQA-3D dataset and the QoE model have been publicly released at https://github.com/zyj-2000/THQA-3D
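The abstract names silence detection as one ingredient of the audio-visual coherence features but does not say how it is done. As a rough illustration only, a generic energy-threshold detector could look like the sketch below. This is not the paper's method: the function names, the frame length, and the threshold are all assumptions chosen for the example.

```python
import math

def frame_rms(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS energy."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies

def silent_frames(samples, frame_len=400, threshold=0.01):
    """Flag each frame as silent (True) when its RMS is below the threshold.

    frame_len and threshold are illustrative defaults, not values from the paper.
    """
    return [energy < threshold for energy in frame_rms(samples, frame_len)]

def silence_ratio(samples, frame_len=400, threshold=0.01):
    """Fraction of frames flagged as silent; 0.0 when there are no full frames."""
    flags = silent_frames(samples, frame_len, threshold)
    return sum(flags) / len(flags) if flags else 0.0

# Toy signal at 16 kHz: 0.1 s of digital silence, then 0.1 s of a 220 Hz tone.
SAMPLE_RATE = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(1600)]
signal = silence + tone

print(silent_frames(signal))
print(silence_ratio(signal))
```

A per-frame silence mask of this kind could, for instance, be used to exclude non-speech segments before comparing lip motion with audio, although whether the released model does so is not stated in the abstract.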


Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. large quality model
  2. mesh
  3. multi-modal
  4. quality assessment
  5. quality-of-experience
  6. syncnet
  7. talking heads

Qualifiers

  • Research-article

Conference

MM '24
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
