
Learning for Video Compression

Published: 01 February 2020

Abstract

One key challenge in learning-based video compression is that motion-predictive coding, a highly effective tool in conventional video compression, is difficult to train into a neural network. In this paper, we propose PixelMotionCNN (PMCNN), which comprises motion-extension and hybrid-prediction networks. PMCNN models spatiotemporal coherence to perform predictive coding effectively inside the learning network. Building on PMCNN, we further explore a learning-based framework for video compression with additional components for iterative analysis/synthesis and binarization. Experimental results demonstrate the effectiveness of the proposed scheme: although entropy coding and complex configurations are not employed in this paper, the scheme still outperforms MPEG-2 and achieves results comparable to the H.264 codec. The proposed learning-based scheme suggests a new direction for further improving the compression efficiency and functionality of future video coding.
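The abstract's core mechanism, transmitting a binarized residual between each frame and a prediction of it rather than the frame itself, can be illustrated with a deliberately simplified sketch. The sign binarizer, the fixed step size, and the "previous reconstruction as prediction" rule below are illustrative assumptions for a toy DPCM-style loop, not the paper's learned PMCNN:

```python
import numpy as np

def binarize(x):
    # Sign binarization: map each residual element to +/-1 (1 bit each),
    # a stand-in for the paper's learned binarization component.
    return np.where(x >= 0, 1.0, -1.0)

def encode_sequence(frames, step=0.25):
    """Toy frame-wise predictive coding: predict each frame as the previous
    reconstruction, binarize the residual, and update the reconstruction."""
    recon_prev = np.zeros_like(frames[0])
    codes, recons = [], []
    for frame in frames:
        residual = frame - recon_prev          # predictive coding: code only the change
        code = binarize(residual)              # binarized representation to transmit
        recon_prev = recon_prev + step * code  # decoder-side reconstruction
        codes.append(code)
        recons.append(recon_prev.copy())
    return codes, recons

# Three constant 4x4 "frames" with slowly varying intensity.
frames = [np.full((4, 4), v) for v in (0.4, 0.5, 0.3)]
codes, recons = encode_sequence(frames)
```

Because only the sign of the residual is sent, each pixel costs one bit per frame; the paper's framework replaces both the hand-made predictor and the quantizer with learned networks.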




Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 30, Issue 2
Feb. 2020
26 pages

Publisher

IEEE Press


    Qualifiers

    • Research-article


Cited By

• (2024) HPC: Hierarchical Progressive Coding Framework for Volumetric Video. Proc. 32nd ACM International Conference on Multimedia, pp. 7937–7946. doi: 10.1145/3664647.3681107
• (2024) Spatial Decomposition and Temporal Fusion Based Inter Prediction for Learned Video Compression. IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 7, pp. 6460–6473. doi: 10.1109/TCSVT.2024.3360248
• (2024) Latency-Aware Neural Architecture Performance Predictor With Query-to-Tier Technique. IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 7, pp. 5868–5883. doi: 10.1109/TCSVT.2023.3287684
• (2023) Insights From Generative Modeling for Neural Video Compression. IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 8, pp. 9908–9921. doi: 10.1109/TPAMI.2023.3260684
• (2023) Temporal Context Mining for Learned Video Compression. IEEE Trans. Multimedia, vol. 25, pp. 7311–7322. doi: 10.1109/TMM.2022.3220421
• (2023) Learning Cross-Scale Weighted Prediction for Efficient Neural Video Compression. IEEE Trans. Image Process., vol. 32, pp. 3567–3579. doi: 10.1109/TIP.2023.3287495
• (2023) Motion Compression Using Structurally Connected Neural Network. IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 6, pp. 4299–4310. doi: 10.1109/TCSVT.2023.3332911
• (2023) B-CANF: Adaptive B-Frame Coding With Conditional Augmented Normalizing Flows. IEEE Trans. Circuits Syst. Video Technol., vol. 34, no. 4, pp. 2908–2921. doi: 10.1109/TCSVT.2023.3301016
• (2023) An Iterative Threshold Algorithm of Log-Sum Regularization for Sparse Problem. IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 9, pp. 4728–4740. doi: 10.1109/TCSVT.2023.3247944
• (2023) Deep In-Loop Filtering via Multi-Domain Correlation Learning and Partition Constraint for Multiview Video Coding. IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 4, pp. 1911–1921. doi: 10.1109/TCSVT.2022.3213515
