Deep Learning-Based Video Coding: A Review and a Case Study

Authors:

Feng Wu

ACM Computing Surveys (CSUR), Volume 53, Issue 1

Article No.: 11, Pages 1 - 35

https://doi.org/10.1145/3368405

Published: 06 February 2020

Abstract

The past decade has witnessed the great success of deep learning in many disciplines, especially computer vision and image processing. However, deep learning-based video coding remains in its infancy. We review representative works on using deep learning for image/video coding, an actively developing research area since 2015. We divide the related works into two categories: new coding schemes that are built primarily upon deep networks, and deep network-based coding tools that are intended for use within traditional coding schemes. For deep schemes, pixel probability modeling and auto-encoders are the two approaches, which can be viewed as predictive coding and transform coding, respectively. For deep tools, several techniques use deep learning to perform intra-picture prediction, inter-picture prediction, cross-channel prediction, probability distribution prediction, transform, post- or in-loop filtering, down- and up-sampling, and encoding optimizations. To encourage research on deep learning-based video coding, we present a case study of our prototype video codec, Deep Learning Video Coding (DLVC). DLVC features two deep tools, both based on convolutional neural networks (CNNs): a CNN-based in-loop filter and CNN-based block adaptive resolution coding. The source code of DLVC has been released for future research.
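
As a rough illustration of why a probability model matters for coding: an entropy coder such as arithmetic coding spends close to -log2 p(x) bits on a symbol x, so a model that predicts the pixel values better yields a shorter bitstream. The sketch below is not taken from the paper. It uses a simple frequency table as a stand-in for the distribution a deep network would predict, and the function names and the toy pixel sequence are hypothetical.

```python
import math
from collections import Counter


def ideal_code_length_bits(symbols, probs):
    """Sum of -log2 p(s) over the sequence: the cost an ideal entropy coder approaches."""
    return sum(-math.log2(probs[s]) for s in symbols)


def empirical_probs(symbols):
    """Frequency-table model; a stand-in for a learned pixel probability model."""
    counts = Counter(symbols)
    total = len(symbols)
    return {s: c / total for s, c in counts.items()}


# Toy binary "pixel" sequence (assumed data, for illustration only).
pixels = [0, 0, 0, 0, 0, 0, 1, 1]

# A model that knows nothing assigns equal probability to both values.
uniform = {0: 0.5, 1: 0.5}
# A model fitted to the data assigns higher probability to the frequent value.
fitted = empirical_probs(pixels)

bits_uniform = ideal_code_length_bits(pixels, uniform)
bits_fitted = ideal_code_length_bits(pixels, fitted)

print(f"uniform model: {bits_uniform:.2f} bits")
print(f"fitted model:  {bits_fitted:.2f} bits")
```

The fitted model needs fewer bits than the uniform one for the same sequence. A deep pixel probability model pursues the same effect by conditioning each prediction on previously coded pixels rather than on a single global frequency table.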

Index Terms

Deep Learning-Based Video Coding: A Review and a Case Study
1. Computing methodologies
  1. Computer graphics
    1. Image compression

Published In

ACM Computing Surveys Volume 53, Issue 1

January 2021

781 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3382040

Editor:
Albert Zomaya
University of Sydney, Australia

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 February 2020

Accepted: 01 October 2019

Revised: 01 September 2019

Received: 01 May 2019

Published in CSUR Volume 53, Issue 1

Qualifiers

Survey
Refereed

Funding Sources

National Natural Science Foundation of China

Article Metrics

Total Citations: 91
Total Downloads: 3,822
Downloads (Last 12 months): 582
Downloads (Last 6 weeks): 33

Reflects downloads up to 26 Jul 2024

Cited By

Mohod N, Agrawal P, and Madaan V. 2024. A Novel Approach for Surveillance Compression using Neural Network Technique. International Research Journal of Multidisciplinary Technovation (2024), 77--89. Online publication date: 23-Apr-2024. https://doi.org/10.54392/irjmt2436

Huo S, Liu D, Zhang H, Li L, Ma S, Wu F, and Gao W. 2024. Towards Hybrid-Optimization Video Coding. ACM Computing Surveys 56, 9 (2024), 1--36. Online publication date: 24-Apr-2024. https://dl.acm.org/doi/10.1145/3652148

Yang R, Liu D, Ma S, Wu F, and Gao W. 2024. Perceptual Quality-Oriented Rate Allocation via Distillation from End-to-End Image Compression. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7 (2024), 1--22. Online publication date: 29-Feb-2024. https://dl.acm.org/doi/10.1145/3650034

Wang L, Shi Y, Wang J, Chen S, Yin B, and Ling N. 2024. Graph Based Cross-Channel Transform for Color Image Compression. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 4 (2024), 1--25. Online publication date: 11-Jan-2024. https://dl.acm.org/doi/10.1145/3631710

Shao J, Zhang X, and Zhang J. 2024. Task-Oriented Communication for Edge Video Analytics. IEEE Transactions on Wireless Communications 23, 5 (2024), 4141--4154. Online publication date: May-2024. https://doi.org/10.1109/TWC.2023.3314888

Guo H, Kwong S, Ye D, and Wang S. 2024. Enhanced Context Mining and Filtering for Learned Video Compression. IEEE Transactions on Multimedia 26 (2024), 3814--3826. Online publication date: 1-Jan-2024. https://dl.acm.org/doi/10.1109/TMM.2023.3316429

Akhtar A, Li Z, and Van der Auwera G. 2024. Inter-Frame Compression for Dynamic Point Cloud Geometry Coding. IEEE Transactions on Image Processing 33 (2024), 584--594. Online publication date: 1-Jan-2024. https://dl.acm.org/doi/10.1109/TIP.2023.3343096

Jia C, Ye F, Dong F, Lin K, Chiariglione L, Ma S, Sun H, and Gao W. 2024. MPAI-EEV: Standardization Efforts of Artificial Intelligence Based End-to-End Video Coding. IEEE Transactions on Circuits and Systems for Video Technology 34, 5 (2024), 3096--3110. Online publication date: May-2024. https://doi.org/10.1109/TCSVT.2023.3310188

Jia J, Zhang Y, Zhu H, Chen Z, Liu Z, Xu X, and Liu S. 2024. Deep Reference Frame Generation Method for VVC Inter Prediction Enhancement. IEEE Transactions on Circuits and Systems for Video Technology 34, 5 (2024), 3111--3124. Online publication date: May-2024. https://doi.org/10.1109/TCSVT.2023.3299410

Lim W, Stallenberger B, Pfaff J, Schwarz H, Marpe D, and Wiegand T. 2024. Simplified CNN In-Loop Filter with fixed Classifications. In 2024 Picture Coding Symposium (PCS). 1--5. Online publication date: 12-Jun-2024. https://doi.org/10.1109/PCS60826.2024.10566438
