
Learning for Video Compression

Published: 01 February 2020

Abstract

One key challenge in learning-based video compression is that motion-predictive coding, a highly effective tool in conventional video compression, is difficult to train into a neural network. In this paper, we propose PixelMotionCNN (PMCNN), which comprises motion-extension and hybrid-prediction networks. PMCNN models spatiotemporal coherence to perform predictive coding effectively inside the learning network. Building on PMCNN, we further explore a learning-based framework for video compression with additional components for iterative analysis/synthesis and binarization. Experimental results demonstrate the effectiveness of the proposed scheme: although entropy coding and complex configurations are not employed in this paper, the scheme still outperforms MPEG-2 and achieves results comparable to the H.264 codec. The proposed learning-based scheme suggests a new direction for further improving the compression efficiency and functionality of future video coding.
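The abstract's core mechanism, transmitting a binarized residual between each frame and a prediction of it rather than the frame itself, can be illustrated with a deliberately simplified sketch. The sign binarizer, the fixed step size, and the "previous reconstruction as prediction" rule below are illustrative assumptions for a toy DPCM-style loop, not the paper's learned PMCNN:

```python
import numpy as np

def binarize(x):
    # Sign binarization: map each residual element to +/-1 (1 bit each),
    # a stand-in for the paper's learned binarization component.
    return np.where(x >= 0, 1.0, -1.0)

def encode_sequence(frames, step=0.25):
    """Toy frame-wise predictive coding: predict each frame as the previous
    reconstruction, binarize the residual, and update the reconstruction."""
    recon_prev = np.zeros_like(frames[0])
    codes, recons = [], []
    for frame in frames:
        residual = frame - recon_prev          # predictive coding: code only the change
        code = binarize(residual)              # binarized representation to transmit
        recon_prev = recon_prev + step * code  # decoder-side reconstruction
        codes.append(code)
        recons.append(recon_prev.copy())
    return codes, recons

# Three constant 4x4 "frames" with slowly varying intensity.
frames = [np.full((4, 4), v) for v in (0.4, 0.5, 0.3)]
codes, recons = encode_sequence(frames)
```

Because only the sign of the residual is sent, each pixel costs one bit per frame; the paper's framework replaces both the hand-made predictor and the quantizer with learned networks.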




Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 30, Issue 2
Feb. 2020
26 pages

Publisher

IEEE Press


    Qualifiers

    • Research-article


Cited By

• (2024) HPC: Hierarchical Progressive Coding Framework for Volumetric Video. Proc. 32nd ACM International Conference on Multimedia, pp. 7937–7946. doi: 10.1145/3664647.3681107
• (2024) Spatial Decomposition and Temporal Fusion Based Inter Prediction for Learned Video Compression. IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 7, pp. 6460–6473. doi: 10.1109/TCSVT.2024.3360248
• (2024) Latency-Aware Neural Architecture Performance Predictor With Query-to-Tier Technique. IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 7, pp. 5868–5883. doi: 10.1109/TCSVT.2023.3287684
• (2023) Insights From Generative Modeling for Neural Video Compression. IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 8, pp. 9908–9921. doi: 10.1109/TPAMI.2023.3260684
• (2023) Temporal Context Mining for Learned Video Compression. IEEE Trans. Multimedia, vol. 25, pp. 7311–7322. doi: 10.1109/TMM.2022.3220421
• (2023) Learning Cross-Scale Weighted Prediction for Efficient Neural Video Compression. IEEE Trans. Image Process., vol. 32, pp. 3567–3579. doi: 10.1109/TIP.2023.3287495
• (2023) Motion Compression Using Structurally Connected Neural Network. IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 6, pp. 4299–4310. doi: 10.1109/TCSVT.2023.3332911
• (2023) B-CANF: Adaptive B-Frame Coding With Conditional Augmented Normalizing Flows. IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 4, pp. 2908–2921. doi: 10.1109/TCSVT.2023.3301016
• (2023) An Iterative Threshold Algorithm of Log-Sum Regularization for Sparse Problem. IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 9, pp. 4728–4740. doi: 10.1109/TCSVT.2023.3247944
• (2023) Deep In-Loop Filtering via Multi-Domain Correlation Learning and Partition Constraint for Multiview Video Coding. IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 4, pp. 1911–1921. doi: 10.1109/TCSVT.2022.3213515
