On Content-Aware Post-Processing: Adapting Statistically Learned Models to Dynamic Content

Published: 18 September 2023

Abstract

Learning-based post-processing methods generally produce neural models that are statistically optimal on their training datasets. These models, however, neglect intrinsic variations of local video content and may fail on unseen content. To address this issue, this article proposes a content-aware approach to the post-processing of compressed videos. We develop a backbone network, called BackboneFormer, in which a Fast Transformer using Separable Self-Attention, Spatial Attention, and Channel Attention is devised to support the underlying feature embedding and aggregation. Furthermore, we introduce Meta-learning to strengthen BackboneFormer for better performance. Specifically, we propose Meta Post-Processing (Meta-PP), which leverages the Meta-learning framework to drive BackboneFormer to capture and analyze input video variations for spontaneous updating. Since the original frame is unavailable at the decoder, we devise a Compression Degradation Estimation model in which a low-complexity neural model and classic operators collaboratively estimate the compression distortion. The estimated distortion then guides BackboneFormer in dynamically updating its weighting parameters. Experimental results demonstrate that BackboneFormer alone achieves about 3.61% Bjøntegaard delta bit-rate (BD-rate) reduction over Versatile Video Coding (VVC) in the post-processing task, and “BackboneFormer + Meta-PP” attains 4.32%, costing only 50K and 61K parameters, respectively. Their computational complexity is 49K and 50K MACs per pixel, respectively, only about 16% of that of state-of-the-art methods with similar coding gains.
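The Separable Self-Attention the abstract builds on (from the mobile vision transformer line of work) replaces the n×n token-affinity matrix of standard attention with a single learned context vector, reducing the cost from quadratic to linear in the number of tokens. Below is a minimal NumPy sketch of that mechanism; the weight names (`w_i`, `w_k`, `w_v`, `w_o`) and shapes are illustrative, not taken from the article's actual BackboneFormer implementation:

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Separable self-attention sketch: O(n) in the number of tokens n,
    versus O(n^2) for standard dot-product self-attention.

    x:   (n, d) token embeddings.
    w_i: (d, 1) projects each token to a scalar importance logit.
    w_k, w_v, w_o: (d, d) key/value/output projections.
    """
    scores = softmax(x @ w_i, axis=0)           # (n, 1) weights over tokens
    context = (scores * (x @ w_k)).sum(axis=0)  # (d,)  single global context vector
    values = np.maximum(x @ w_v, 0.0)           # (n, d) ReLU-gated values
    return (values * context) @ w_o             # (n, d) context broadcast to every token
```

The key design point is that every token interacts with one shared context vector instead of with every other token, which is what makes such a "Fast Transformer" cheap enough for a ~50K-parameter post-processing network.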
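Meta-PP builds on gradient-based meta-learning (MAML-style): weights are trained so that a few gradient steps on a new input suffice to adapt the model. One first-order outer update can be sketched on a toy one-parameter regression; the toy loss, learning rates, and function name here are purely illustrative and are not the article's actual training recipe:

```python
import numpy as np

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.01):
    """One first-order MAML-style outer update on a 1-D least-squares toy.

    Each task is a pair (x, y); the per-task loss is mean((theta * x - y)^2).
    The inner step adapts theta to the task; the outer step moves the shared
    initialization so that adapted parameters fit their tasks better.
    """
    grad_sum = 0.0
    for x, y in tasks:
        # Inner loop: one gradient step adapting theta to this task.
        g = np.mean(2.0 * (theta * x - y) * x)
        theta_i = theta - inner_lr * g
        # Outer gradient, first-order approximation (gradient at theta_i).
        grad_sum += np.mean(2.0 * (theta_i * x - y) * x)
    return theta - outer_lr * grad_sum / len(tasks)
```

In the article's setting the "task" signal cannot come from the original frame (it is unavailable at the decoder), which is why the estimated compression distortion stands in as the guidance for the dynamic weight update.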
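The coding gains are reported as Bjøntegaard delta bit-rate (BD-rate): the average bit-rate difference between two rate-distortion curves at equal quality. A common way to compute it, sketched here from the standard cubic-fit formulation rather than from the article's own evaluation scripts:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta bit-rate in percent (negative = bit-rate savings).

    Fits a cubic polynomial to log-rate as a function of PSNR for each
    codec, then averages the gap over the overlapping PSNR interval.
    """
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    p_a = np.polyfit(psnr_anchor, lr_a, 3)  # log-rate as a cubic in PSNR
    p_t = np.polyfit(psnr_test, lr_t, 3)

    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))

    # Integrate each cubic over [lo, hi] and take the average difference.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

In this convention a result of about -3.61% corresponds to the 3.61% BD-rate reduction over VVC quoted in the abstract.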

Supplementary Material

3612925.supp (3612925.supp.pdf)



      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 1
      January 2024
      639 pages
      EISSN: 1551-6865
      DOI: 10.1145/3613542
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 September 2023
      Online AM: 16 August 2023
      Accepted: 28 July 2023
      Revised: 24 May 2023
      Received: 27 October 2022
      Published in TOMM Volume 20, Issue 1


      Author Tags

      1. VVC
      2. in-loop filtering
      3. post-processing
      4. transformer
      5. Meta-learning

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • National Undergraduate Training Program for Innovation and Entrepreneurship
