
Gated fusion network for SAO filter and inter frame prediction in Versatile Video Coding

Published: 01 November 2022

Abstract

In order to achieve higher coding efficiency, the Versatile Video Coding (VVC) standard includes several new components at the expense of increased decoder computational complexity. These technologies often create ringing and contouring effects on reconstructed frames at low bit rates and introduce blurring and distortion. To smooth these visual artifacts, the H.266/VVC framework supports four post-processing filter operations. State-of-the-art CNN-based in-loop filters typically deploy multiple networks for different quantization parameters and frame resolutions, which increases training resources and adds overhead to decoder frame reconstruction. This paper presents a single deep-learning-based model for the sample adaptive offset (SAO) non-linear filtering operation on the decoder side that exploits feature correlation among adjacent frames and substantiates the merits of intra–inter frame quality enhancement. We introduce a variable-filter-size dual multi-scale convolutional neural network (D-MSCNN) to attenuate compression artifacts and incorporate strided deconvolution to restore high-frequency details in the distorted frame. Our model follows sequential training across all QP values and updates the model weights accordingly. Using data augmentation, weight fusion, and residual learning, we demonstrate that the model can be trained effectively by transferring the convolution prior feature indices to the decoder to produce a dense output map. Objective measurements demonstrate that the proposed method outperforms the baseline VVC method on PSNR, MS-SSIM, and VMAF metrics and achieves an average bit-rate saving of 5.16% across different test-sequence categories.
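The abstract describes the model only in prose, so the following is a minimal PyTorch sketch of the general idea behind a dual multi-scale CNN with strided deconvolution and residual learning. The branch count, channel widths, and kernel sizes here are illustrative assumptions, not the authors' exact D-MSCNN configuration.

```python
import torch
import torch.nn as nn


class DualMSCNNSketch(nn.Module):
    """Illustrative dual multi-scale CNN with strided deconvolution.

    Layer counts, channel widths, and kernel sizes are assumptions for
    illustration, not the paper's exact D-MSCNN configuration.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        # Two parallel branches with different kernel sizes capture
        # multi-scale spatial context; the strided convolutions downsample.
        self.branch3 = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(1, channels, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2), nn.ReLU(inplace=True),
        )
        # Strided deconvolution (transposed convolution) restores spatial
        # resolution and helps recover high-frequency detail.
        self.deconv = nn.ConvTranspose2d(2 * channels, channels, 4, stride=2, padding=1)
        self.out = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
        up = torch.relu(self.deconv(feats))
        # Residual learning: predict the artifact and add it back to the
        # decoded (distorted) frame rather than regressing the frame directly.
        return x + self.out(up)
```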

Highlights

We present a gated fusion-guided framework that effectively combines local (intra-frame) and temporal (inter-frame) feature heterogeneity; a sketch of the gating mechanism follows this list.
Our decoupled model includes a modified loss function that constrains pixel errors and incorporates intermediate convolution feature maps through skip connections.
The loss function can be viewed as a generalization of per-batch MSE, with image gradients added as priors for the final image reconstruction; a minimal form is sketched after this list.
A data-driven deconvolution framework is integrated into the decoder module to overcome quantization artifacts.
The end-to-end framework learns feature-map aggregation in separate sub-tasks, optimizes the parameters, and reduces noise to a greater extent.
Qualitative and quantitative evaluation shows effective artifact removal, especially in crowded target regions, and the model performs favorably against existing in-loop deep-learning models.
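The gated fusion idea in the first highlight can be illustrated with a short, hedged sketch: a learned per-pixel gate blends features extracted from the current (intra) frame with temporally aligned features from neighbouring (inter) frames. The module below is an assumption about the general mechanism, not the authors' exact design.

```python
import torch
import torch.nn as nn


class GatedFusionSketch(nn.Module):
    """Illustrative gated fusion of intra-frame and inter-frame features."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),  # per-pixel, per-channel gate in [0, 1]
        )

    def forward(self, intra_feat: torch.Tensor, inter_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([intra_feat, inter_feat], dim=1))
        # The gate adaptively weights spatial (intra) against temporal
        # (inter) information at every location.
        return g * intra_feat + (1 - g) * inter_feat
```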
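Likewise, the loss described in the third highlight, per-batch MSE generalized with an image-gradient prior, can be written down in a minimal form. The weighting term `lam` and the use of first-order finite differences are assumptions for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F


def reconstruction_loss(pred: torch.Tensor, target: torch.Tensor,
                        lam: float = 0.1) -> torch.Tensor:
    """Per-batch MSE plus an image-gradient prior (illustrative weighting)."""
    mse = F.mse_loss(pred, target)
    # First-order horizontal and vertical image gradients as priors:
    # penalize mismatch between predicted and reference gradients.
    dx = F.l1_loss(pred[..., :, 1:] - pred[..., :, :-1],
                   target[..., :, 1:] - target[..., :, :-1])
    dy = F.l1_loss(pred[..., 1:, :] - pred[..., :-1, :],
                   target[..., 1:, :] - target[..., :-1, :])
    return mse + lam * (dx + dy)
```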

Published In

Image Communication, Volume 109, Issue C, November 2022, 208 pages

Publisher

Elsevier Science Inc., United States


Author Tags

1. Artifacts
2. VVC
3. SAO
4. De-convolution
5. In-loop filter
6. Deep learning

Qualifiers

• Research-article
