Abstract
User-generated content (UGC) has become increasingly popular, driven by the widespread use of social media and mobile devices. Instant and immersive UGC video quality assessment is therefore urgently needed to provide appropriate recommendations to video reviewers prior to distribution. However, existing methods are neither efficient at assessing UGC videos, owing to their expensive frame-by-frame processing, nor suitable for deployment on devices with limited computational capability, because they require sophisticated GPU-dependent computation. In this paper, we propose a fast UGC video quality assessment method, named FastVQA, that considers both keyframe importance and human temporal memory effects. First, a novel keyframe selection strategy based on feature entropy is developed to achieve efficient and accurate feature extraction. Second, inspired by human short-term and long-term memory effects, we design a temporal feature aggregation module that takes both local content details and global semantic information into account. Experimental results show that FastVQA outperforms state-of-the-art (SOTA) methods on multiple datasets with significantly reduced CPU time, implying that FastVQA strikes a better balance between complexity and accuracy.
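To make the two ideas in the abstract concrete, the following is a minimal sketch of entropy-based keyframe selection and memory-inspired temporal pooling. Every name and parameter below (`feature_entropy`, `select_keyframes`, `aggregate_quality`, the 32-bin histogram, the recency weight `alpha`, and the equal short/long-term blend) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

def feature_entropy(feat, bins=32):
    """Shannon entropy of a frame-level feature map (higher = more informative)."""
    hist, _ = np.histogram(feat, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_keyframes(frame_feats, k):
    """Keep the indices of the k frames whose features carry the most entropy."""
    scores = np.array([feature_entropy(f) for f in frame_feats])
    return sorted(np.argsort(scores)[-k:].tolist())

def aggregate_quality(frame_scores, alpha=0.8):
    """Blend a short-term (recency-weighted) view with a long-term (global mean) view."""
    s = np.asarray(frame_scores, dtype=float)
    w = alpha ** np.arange(len(s))[::-1]  # more recent frames receive larger weights
    short_term = float((w * s).sum() / w.sum())
    long_term = float(s.mean())
    return 0.5 * short_term + 0.5 * long_term
```

Under these assumptions, `select_keyframes(feats, 8)` would restrict expensive feature extraction to the eight most informative frames, after which `aggregate_quality` pools their per-frame scores into a single video-level quality estimate.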
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research work was supported in part by the National Natural Science Foundation of China (U1903213) and the Natural Science Foundation of Sichuan Province (2022NSFSC0966).
Author information
Contributions
Yuan Zhang contributed to the conception of the study; Mingchuan Yang and Zhiwei Huang performed the experiment; Lijun He and Zijun Wu contributed significantly to analysis and manuscript preparation; Yuan Zhang, Mingchuan Yang and Zhiwei Huang performed the data analyses and wrote the manuscript.
Ethics declarations
Ethics approval
All authors contributed to the conception and design of the study. All authors read and approved the final manuscript.
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Yang, M., Huang, Z. et al. Frame importance and temporal memory effect-based fast video quality assessment for user-generated content. Appl Intell 53, 21517–21531 (2023). https://doi.org/10.1007/s10489-023-04624-2