
Deep Inter Prediction with Error-Corrected Auto-Regressive Network for Video Coding

Published: 23 January 2023

    Abstract

    Modern codecs remove the temporal redundancy of a video via inter prediction, i.e., searching previously coded frames for similar blocks and storing motion vectors to save bit-rate. However, existing codecs adopt block-level motion estimation, in which a block is regressed linearly from reference blocks and therefore cannot handle non-linear motion. In this article, we generate virtual reference frames (VRFs) from previously reconstructed frames via deep networks to offer an additional candidate that is not constrained to a linear motion structure and significantly improves coding efficiency. More specifically, we propose a novel deep Auto-Regressive Moving-Average (ARMA) model, the Error-Corrected Auto-Regressive Network (ECAR-Net), which combines the strengths of conventional statistical ARMA models and deep networks for reference frame prediction. Similar to conventional ARMA models, ECAR-Net consists of two stages: an Auto-Regression (AR) stage and an Error-Correction (EC) stage. The first predicts the signal at the current time-step from previously reconstructed frames, while the second compensates the output of the AR stage to recover finer details. Unlike statistical AR models, which focus only on short-term temporal dependency, the AR model of ECAR-Net is further equipped with a long-term dynamics mechanism, in which long-range temporal information is used to help predict motion more accurately. Furthermore, ECAR-Net works in a configuration-adaptive way, i.e., it uses different dynamics and error definitions for the Low Delay B and Random Access configurations, which improves adaptivity and generality across diverse coding scenarios. With this network design, our method achieves average BD-rate savings of 5.0% and 6.6% over HEVC for the luma component under the Low Delay B and Random Access configurations, respectively, and an average BD-rate saving of 1.54% over VVC.
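
The two-stage idea borrowed from conventional ARMA models can be illustrated on a 1-D signal. The sketch below is only a toy statistical analogue and is not ECAR-Net: the function names `lagged` and `fit_ar_ec`, the orders `p` and `q`, and the least-squares fitting are all assumptions made for illustration. An AR stage predicts each sample from the previous `p` samples, and an EC stage then predicts the remaining AR error from the previous `q` AR errors and adds it back as a correction.

```python
import numpy as np


def lagged(x, p):
    """Return rows [x[t-1], ..., x[t-p]] for t = p .. len(x)-1."""
    return np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])


def fit_ar_ec(x, p=2, q=2):
    """Toy two-stage predictor (illustrative only, not the paper's network).

    AR stage: least-squares regression of x[t] on the p previous samples.
    EC stage: least-squares regression of the AR residual on the q previous
    AR residuals; the fitted correction is added to the AR prediction.
    """
    x = np.asarray(x, dtype=float)

    # AR stage
    X = lagged(x, p)
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    ar_pred = X @ a
    resid = y - ar_pred

    # EC stage, fitted on the residuals of the AR stage
    E = lagged(resid, q)
    b, *_ = np.linalg.lstsq(E, resid[q:], rcond=None)
    ec_pred = ar_pred[q:] + E @ b

    # Drop the first q AR predictions so all outputs are time-aligned.
    return a, b, ar_pred[q:], ec_pred, y[q:]


# Synthetic ARMA(2, 1) signal, used purely as demonstration data.
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
signal = np.zeros(500)
for t in range(2, 500):
    signal[t] = (0.6 * signal[t - 1] - 0.2 * signal[t - 2]
                 + noise[t] + 0.5 * noise[t - 1])

a, b, ar_pred, ec_pred, target = fit_ar_ec(signal, p=2, q=2)
print("AR-only MSE:", np.mean((target - ar_pred) ** 2))
print("AR + EC MSE:", np.mean((target - ec_pred) ** 2))
```

Because the EC coefficients are fitted by least squares on the same samples, the corrected in-sample error cannot exceed the AR-only error; how much it helps on unseen data depends on how much structure is left in the AR residual.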

      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1s
      February 2023
      504 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3572859
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 23 January 2023
      Online AM: 01 April 2022
      Accepted: 21 March 2022
      Revised: 19 February 2022
      Received: 25 October 2021
      Published in TOMM Volume 19, Issue 1s

      Author Tags

      1. High Efficiency Video Coding (HEVC)
      2. inter prediction
      3. deep learning
      4. virtual reference frame
      5. Error-Corrected Auto-Regressive Network
      6. Versatile Video Coding (VVC)

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • National Key Research and Development Program of China
      • National Natural Science Foundation of China
      • Research achievement of Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology)
      • State Key Laboratory of Media Convergence Production Technology and Systems

      Article Metrics

      • Downloads (last 12 months): 185
      • Downloads (last 6 weeks): 13

      Cited By

      • (2024) Deep Reference Frame Generation Method for VVC Inter Prediction Enhancement. IEEE Transactions on Circuits and Systems for Video Technology 34:5 (3111-3124). DOI: 10.1109/TCSVT.2023.3299410. Online publication date: May 2024.
      • (2023) Towards Deep Reference Frame in Versatile Video Coding NNVC. 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP) (1-5). DOI: 10.1109/VCIP59821.2023.10402696. Online publication date: 4 December 2023.
      • (2022) Context-aware Pseudo-true Video Interpolation at 6G Edge. ACM Transactions on Multimedia Computing, Communications, and Applications 18:3s (1-17). DOI: 10.1145/3555313. Online publication date: 1 November 2022.
      • (2022) Deep Reference Frame Interpolation based Inter Prediction Enhancement for Versatile Video Coding. 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) (1-5). DOI: 10.1109/VCIP56404.2022.10008890. Online publication date: 13 December 2022.
