
Gated fusion network for SAO filter and inter frame prediction in Versatile Video Coding

Published: 01 November 2022

Abstract

In order to achieve higher coding efficiency, the Versatile Video Coding (VVC) standard includes several new components at the expense of increased decoder computational complexity. These technologies often create ringing and contouring effects on reconstructed frames at low bit rates and introduce blurring and distortion. To smooth these visual artifacts, the H.266/VVC framework supports four post-processing filter operations. State-of-the-art CNN-based in-loop filters typically deploy multiple networks for different quantization parameters and frame resolutions, which increases training resources and adds overhead to decoder frame reconstruction. This paper presents a single deep-learning-based model for the sample adaptive offset (SAO) non-linear filtering operation on the decoder side that exploits feature correlation among adjacent frames and substantiates the merits of intra–inter frame quality enhancement. We introduce a variable-filter-size dual multi-scale convolutional neural network (D-MSCNN) to attenuate compression artifacts and incorporate strided deconvolution to restore high-frequency details in the distorted frame. Our model follows sequential training across all QP values and updates the model weights accordingly. Using data augmentation, weight fusion, and residual learning, we demonstrate that the model can be trained effectively by transferring the convolution prior feature indices to the decoder to produce a dense output map. Objective measurements demonstrate that the proposed method outperforms the baseline VVC method on PSNR, MS-SSIM, and VMAF metrics and achieves an average bit-rate saving of 5.16% across different test-sequence categories.
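The abstract describes the model only in prose, so the following is a minimal PyTorch sketch of the general idea behind a dual multi-scale CNN with strided deconvolution and residual learning. The branch count, channel widths, and kernel sizes here are illustrative assumptions, not the authors' exact D-MSCNN configuration.

```python
import torch
import torch.nn as nn


class DualMSCNNSketch(nn.Module):
    """Illustrative dual multi-scale CNN with strided deconvolution.

    Layer counts, channel widths, and kernel sizes are assumptions for
    illustration, not the paper's exact D-MSCNN configuration.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        # Two parallel branches with different kernel sizes capture
        # multi-scale spatial context; the strided convolutions downsample.
        self.branch3 = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(1, channels, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(inplace=True),
        )
        # Strided deconvolution (transposed convolution) restores spatial
        # resolution and helps recover high-frequency detail.
        self.deconv = nn.ConvTranspose2d(2 * channels, channels, 4, stride=2, padding=1)
        self.out = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
        up = torch.relu(self.deconv(feats))
        # Residual learning: predict the artifact and add it back to the
        # decoded (distorted) frame rather than regressing the frame directly.
        return x + self.out(up)
```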

Highlights

We present a gated fusion-guided framework that effectively combines local (intra-frame) and temporal (inter-frame) feature heterogeneity; a sketch of the gating mechanism follows this list.
Our decoupled model includes a modified loss function that constrains pixel errors and incorporates intermediate convolution feature maps through skip connections.
The loss function can be viewed as a generalization of per-batch MSE, with image gradients added as priors for the final image reconstruction; a minimal form is sketched after this list.
A data-driven deconvolution framework is integrated into the decoder module to overcome quantization artifacts.
The end-to-end framework learns feature-map aggregation in separate sub-tasks, optimizes the parameters, and reduces noise to a greater extent.
Qualitative and quantitative evaluation shows effective artifact removal, especially in crowded target regions, and the model performs favorably against existing in-loop deep-learning models.
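The gated fusion idea in the first highlight can be illustrated with a short, hedged sketch: a learned per-pixel gate blends features extracted from the current (intra) frame with temporally aligned features from neighbouring (inter) frames. The module below is an assumption about the general mechanism, not the authors' exact design.

```python
import torch
import torch.nn as nn


class GatedFusionSketch(nn.Module):
    """Illustrative gated fusion of intra-frame and inter-frame features."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),  # per-pixel, per-channel gate in [0, 1]
        )

    def forward(self, intra_feat: torch.Tensor, inter_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([intra_feat, inter_feat], dim=1))
        # The gate adaptively weights spatial (intra) against temporal
        # (inter) information at every location.
        return g * intra_feat + (1 - g) * inter_feat
```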
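Likewise, the loss described in the third highlight, per-batch MSE generalized with an image-gradient prior, can be written down in a minimal form. The weighting term `lam` and the use of first-order finite differences are assumptions for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F


def reconstruction_loss(pred: torch.Tensor, target: torch.Tensor,
                        lam: float = 0.1) -> torch.Tensor:
    """Per-batch MSE plus an image-gradient prior (illustrative weighting)."""
    mse = F.mse_loss(pred, target)
    # First-order horizontal and vertical image gradients as priors:
    # penalize mismatch between predicted and reference gradients.
    dx = F.l1_loss(pred[..., :, 1:] - pred[..., :, :-1],
                   target[..., :, 1:] - target[..., :, :-1])
    dy = F.l1_loss(pred[..., 1:, :] - pred[..., :-1, :],
                   target[..., 1:, :] - target[..., :-1, :])
    return mse + lam * (dx + dy)
```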

Published In

Image Communication, Volume 109, Issue C, November 2022, 208 pages

Publisher

Elsevier Science Inc., United States


Author Tags

1. Artifacts
2. VVC
3. SAO
4. De-convolution
5. In-loop filter
6. Deep learning

Qualifiers

• Research-article
