Fig. 2. The proposed residual learning approach for the rate-distortion optimized quantization.
II. RATE-DISTORTION OPTIMIZED QUANTIZATION

A. Optimal Rate-Distortion Optimized Quantization

RDOQ finds the optimal quantized level [11] of each transform coefficient by minimizing the rate ($R$) and distortion ($D$) cost in (1) for a given TB:

$L^{*} = \arg\min_{L} J$, where $J = D + \lambda R$.  (1)
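To make the cost in (1) concrete, below is a minimal sketch of the per-coefficient candidate search; the squared-error distortion and the toy bit model are illustrative stand-ins, not HM's CABAC-based estimates.

```python
# Minimal sketch of the per-candidate cost in eq. (1) for a single transform
# coefficient; the distortion and rate models here are toys, not HM's.
import numpy as np

def rd_cost_pick(c, delta, lam, toy_bits):
    """Pick the level minimizing J = D + lam * R for one coefficient c."""
    l_sq = int(abs(c) // delta)                  # scalar-quantized level
    candidates = {l_sq, max(l_sq - 1, 0), 0}     # l_SQ, l_SQ - 1, and 0
    best_level, best_cost = 0, np.inf
    for level in candidates:
        dist = (abs(c) - level * delta) ** 2     # D: squared reconstruction error
        rate = toy_bits(level)                   # R: stand-in bit estimate
        cost = dist + lam * rate                 # J = D + lambda * R
        if cost < best_cost:
            best_level, best_cost = level, cost
    return best_level

toy_bits = lambda level: 1.0 + 2.0 * np.log2(1 + level)  # more bits for larger levels
print(rd_cost_pick(c=13.4, delta=8.0, lam=20.0, toy_bits=toy_bits))  # -> 1
```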
RDOQ has practical implementation issues due to (i) the huge number of candidates to search to achieve optimality, and the excessive amount of computation required to evaluate the (ii) rate and (iii) distortion of those candidates.
B. Rate-Distortion Optimized Quantization in HEVC

A practical compromise in RDOQ [11] can be made by limiting the number of candidates [16] and/or simplifying the rate calculation via lookup tables [16]. The CABAC process in HEVC splits the transform coefficients of a TB (which can be of size 4×4, 8×8, 16×16, or 32×32), denoted by $C$, into one or more coefficient groups (CGs) of size 4×4, and processes them in the five internal steps shown in Fig. 1.
Firstly, scalar quantization (SQ) is executed as $L_{SQ} = \lfloor |C| / \Delta \rfloor$, where $\Delta$ denotes a quantization step size and $\lfloor \cdot \rfloor$ represents the floor operator. Secondly, the level estimation (LE) process selects the best quantized levels of a given CG from a candidate list consisting of $l_{SQ}$ and $l_{SQ} - 1$; value 0 is further considered as a third candidate if $l_{SQ} = 2$, and the LE process is bypassed if $l_{SQ} = 0$. Thirdly, the All Zero (AZ) coding group detection process decides whether to set all levels in the given CG to zero based on the rate-distortion cost. Fourthly, the last non-zero coefficient (LAST) process detects the best location for the last non-zero level. Finally, the Sign Bit Hiding (SBH) process is used to hide a sign bit for the given CG.
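These five steps can be summarized by the following skeleton, a sketch in which only SQ is concretized; the LE, AZ, LAST, and SBH steps are left as placeholders since their actual HM implementations are RD-cost driven.

```python
# Skeleton of the five internal RDOQ steps over a TB; only SQ is concretized,
# the remaining steps are placeholders for HM's RD-cost-driven logic.
import numpy as np

def scalar_quantize(C, delta):
    """Step 1 (SQ): L_SQ = floor(|C| / delta), element-wise over the TB."""
    return np.floor(np.abs(C) / delta).astype(np.int32)

def split_into_cgs(L):
    """Split an NxN TB of levels into 4x4 coefficient groups (CGs)."""
    n = L.shape[0]
    return [L[r:r + 4, c:c + 4] for r in range(0, n, 4) for c in range(0, n, 4)]

C = np.random.randn(8, 8) * 40.0          # toy DCT coefficients of an 8x8 TB
L_sq = scalar_quantize(C, delta=8.0)      # step 1: SQ
for cg in split_into_cgs(L_sq):
    pass  # steps 2-3 (LE, AZ) run per CG; step 4 (LAST) per TB; step 5 (SBH) per CG
print(L_sq)
```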
III. DEEP LEARNING-BASED RDOQ (DL-RDOQ)

A. Simplified RDOQ

This section addresses the problem of deep learning-based RDOQ, namely DL-RDOQ, which predicts the optimal quantized levels of a whole TB without estimating rate and distortion. For practical reasons, we investigate DL-RDOQ as a supervised learning problem, with possible inputs and outputs given in Table 1. In addition, noting that residual learning has proven performance benefits [10], we implement DL-RDOQ on HM following the residual learning scheme, that is, predicting the residual of a given TB, $Re \triangleq L_{SQ} - L_{RDOQ}$, as shown in Fig. 2.

To find out which RDOQ process is suitable for DL-RDOQ, we analyze the values in $Re$ at various stages. As shown in Fig. 3, except after SBH, the residual signal assumes only the three values {0, 1, 2}, with a very small probability of the residual value being equal to 2, $\Pr(re = 2)$, where $re$ is an element in the matrix $Re$. After SBH, $Re$ can additionally take the value -1. Moreover, SBH is related to the signs of the coefficients in a CG, so a single misprediction in SBH will cause a level $l$ to become $-l$, which significantly increases distortion. We therefore use the output after the LAST process as the prediction target and set the value 2 to 1 for simplification. Now, an element $re$ in the simplified $Re$ only assumes values of 0 or 1.
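The residual computation and the value-2 simplification described above amount to the following sketch (toy values):

```python
# Sketch of the residual Re = L_SQ - L_RDOQ and its simplification to a
# binary map; the arrays are toy values.
import numpy as np

def simplify_residual(L_sq, L_rdoq_after_last):
    """After LAST, elements of Re lie in {0, 1, 2}; clip the rare 2 to 1."""
    re = L_sq - L_rdoq_after_last
    return np.minimum(re, 1)   # set value 2 to 1, so re is in {0, 1}

L_sq   = np.array([[3, 1], [2, 0]])
L_rdoq = np.array([[1, 1], [1, 0]])        # toy RDOQ levels after LAST
print(simplify_residual(L_sq, L_rdoq))     # -> [[1 0] [1 0]]
```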
B. Proposed Deep Learning-Based RDOQ (DL-RDOQ)

As deep learning has drawn significant interest in the video coding community, this work studies DL-RDOQ. However, as the first research on DL-RDOQ, it is challenging to find a suitable network for RDOQ, because DL is often applied to signals in the spatial domain, as in image enhancement, while RDOQ deals with DCT-transformed signals, which have less correlation and different characteristics.

1) Deep Convolution Neural Network

Convolutional neural networks (CNNs) are a class of deep learning methods that show high performance in many recognition tasks. CNNs are well known for their low complexity, translation invariance, and weight-sharing characteristics.
Fig. 3. Distribution (%) of the residual values in $Re$ (vertical axis) at the various RDOQ stages in Fig. 1 for the Kimono sequence, Intra, TB sizes of 8×8 and 32×32. This analysis shows that we can simplify the residual values by removing the case of value 2 in $Re$, as it rarely occurs.
Fig. 4. Average L1 prediction error from the RDOQ output in eq. (2) (vertical axis) over iterations (horizontal axis, ×1000) by DL-RDOQ (implemented using FCN_VGG [14]) and scalar quantization (SQ), for the intra-coded 1st frames of Kimono, BQTerrace, and BasketballDrive with QPs 22~37. The percent numbers indicate the error reduction ratio in (3).
The reason for using a CNN to predict RDOQ is that local correlation exists in optimal quantization: rate estimation is based on context modeling for each level, depending on the previous levels, its frequency location, and its quantized value. In addition, RDOQ is processed in CG units of size 4×4. Therefore, a CNN filter could learn to predict the optimal levels of RDOQ by exploiting this local/context characteristic. This work fully follows the fully convolutional network FCN_VGG [14], which only uses convolution and ReLU activation layers, to validate the effectiveness of DL. DL-RDOQ can be parallelized on a GPU thanks to the nature of CNN frameworks such as Caffe [15]. The training input/output pair is selected as $L_{SQ}$ and $Re = L_{SQ} - L_{RDOQ}$.
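For illustration, a minimal conv+ReLU-only fully convolutional predictor is sketched below in PyTorch; the actual network follows FCN_VGG [14] under Caffe [15], so the depth and width here are assumptions rather than the authors' exact architecture.

```python
# Minimal fully convolutional stand-in for a conv+ReLU-only residual
# predictor; layer count and width are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualPredictor(nn.Module):
    """Predicts per-element logits for the binary residual Re of a TB."""
    def __init__(self, width=64, depth=5):
        super().__init__()
        layers = [nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, 1, 3, padding=1)]   # 1-channel logit map
        self.net = nn.Sequential(*layers)

    def forward(self, l_sq):        # l_sq: (batch, 1, N, N) SQ levels
        return self.net(l_sq)       # logits; sigmoid > 0.5 gives Re

model = ResidualPredictor()
logits = model(torch.randn(2, 1, 8, 8))   # two toy 8x8 TBs
print(logits.shape)                       # torch.Size([2, 1, 8, 8])
```

Being fully convolutional, one such network handles any TB size in principle, though the paper trains a separate network per TB size.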
2) Dataset Collection

To enable DL-RDOQ, a training dataset is collected from HM 16.15 under the Random Access (RA) coding configuration with QP 22, 27, 32, 37. We collect the values of scalar quantization and of RDOQ after each internal stage (LE, AZ, LAST, and SBH), together with information on the current TB (i.e., prediction mode, scan mode, CU size, TB size). The dataset is then grouped according to TB size (4×4, 8×8, 16×16, 32×32). The dataset of each size is further divided into training, testing, and validation sets with a proportion of 90:5:5.
3) Loss Function

Since the residual output is a 2D matrix having only 0 and 1 as element values, we model the problem similarly to semantic segmentation with only two labels, and employ the logistic loss function. In fact, the network structure of FCN_VGG remains identical to that for semantic segmentation [14]. It should be noted that the proposed network only mimics RDOQ and does not utilize any rate information.
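Interpreted as per-element two-label classification, the logistic loss corresponds to a binary cross-entropy over the residual map; below is a minimal PyTorch sketch, an assumed equivalent of the Caffe loss layer used here.

```python
# Two-label logistic loss over the residual map, as per-element binary
# cross-entropy; assumed equivalent of the Caffe loss layer.
import torch

loss_fn = torch.nn.BCEWithLogitsLoss()               # per-element logistic loss
logits  = torch.randn(2, 1, 8, 8)                    # network output, two 8x8 TBs
target  = torch.randint(0, 2, (2, 1, 8, 8)).float()  # ground-truth Re in {0, 1}
print(loss_fn(logits, target).item())
```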
4) Training

Four networks corresponding to the different TB sizes (4×4, 8×8, 16×16, and 32×32) are implemented under the Caffe framework [15], each trained with the dataset of the corresponding TB size. We set the learning rate to 0.0001, the momentum to 0.99, and the mini-batch size to 512, and train for a total of 80,000 iterations.

IV. EXPERIMENTAL RESULTS

A. Prediction Performance

To evaluate the prediction performance with respect to the RDOQ output, the average prediction error is computed as:

$E = \frac{1}{m} \sum_{i=1}^{m} \| \widehat{Re}_i - Re_i \|_1$,  (2)

where $\widehat{Re}_i$ denotes the predicted residual of a given TB, $Re_i$ denotes the residual of the RDOQ output, and $m$ is the total number of TBs used for evaluation. Since the residual contains only 0 or 1, the L1 error is equivalent to the average number of differing levels between DL-RDOQ and RDOQ.

The prediction results over iterations are shown in Fig. 4. The fact that the error is reduced much more by DL-RDOQ than by SQ over iterations clearly shows that DL can predict the RDOQ output successfully. It is noteworthy that the degree of reduction differs across TB sizes, which hints that designing a different network for each TB size is a necessary further investigation for future work.

To further evaluate the difference across TB sizes, we normalize the error by the TB size and name it the error reduction ratio. It represents the ratio of prediction errors between DL-RDOQ and SQ at the last iteration, and is computed as:

error reduction ratio $= \frac{E_{SQ} - E_{DL}}{E_{SQ}} \times 100$.  (3)

We observe that DL-RDOQ reduces the L1 error by 50%~63% on average compared to SQ and thus has the better error reduction ratio. We also observe that variations in the residual characteristic affect the prediction results: the smaller the probability of 1 (i.e., large QPs, large TB sizes), the poorer the prediction performance of DL-RDOQ.
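Below is a small sketch of the two metrics, assuming eq. (2) averages the per-TB L1 distances and eq. (3) measures the relative reduction of DL-RDOQ's error with respect to SQ's.

```python
# Sketch of the evaluation metrics in eqs. (2) and (3); assumes (2) averages
# per-TB L1 distances and (3) is the relative error reduction versus SQ.
import numpy as np

def avg_l1_error(pred_residuals, rdoq_residuals):
    """Eq. (2): average L1 distance between predicted and RDOQ residuals."""
    return np.mean([np.abs(p - r).sum()
                    for p, r in zip(pred_residuals, rdoq_residuals)])

def error_reduction_ratio(e_dl, e_sq):
    """Eq. (3): percentage of SQ's error removed by DL-RDOQ."""
    return (e_sq - e_dl) / e_sq * 100.0

# Toy example with two 4x4 TBs and one mispredicted level.
rdoq = [np.random.randint(0, 2, (4, 4)) for _ in range(2)]
pred = [r.copy() for r in rdoq]
pred[0][0, 0] ^= 1                                     # flip one level
print(avg_l1_error(pred, rdoq))                        # -> 0.5
print(error_reduction_ratio(e_dl=0.5, e_sq=1.25))      # -> 60.0
```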
Fig. 6. Coding performance of DL-RDOQ implemented on HM 16.15 for Kimono (1920x1080, 50fps) at All Intra (left) and Random-Access (right).
Fig. 5. The proposed residual DL-RDOQ implemented on HM 16.15 (block diagram: per-TB-size DL networks for 4×4, 8×8, 16×16, and 32×32; sign information is fed to SBH*). *SBH considers distortion only (similarly in SQDZ).

B. Coding Performance

To test DL-RDOQ in an HEVC encoding scenario, we implement the trained networks on top of the reference software HM 16.15 through the Caffe C++ interface [15]. The four trained FCN_VGG networks are used to generate the output when scalar quantized data is given as input. The DL-RDOQ prediction process does not consider sign information. The TB size is provided to choose the residual DL-RDOQ network corresponding to the given size. The predicted residual signal is subtracted from the input to obtain the RDOQ prediction after LAST. Sign information is then used in SBH to deliver the final RDOQ output, as shown in Fig. 5.
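The inference path just described can be summarized with the sketch below; the function names and the dummy predictor are placeholders for the actual HM/Caffe C++ hooks, which the paper does not list.

```python
# Sketch of the encoding-time integration: the predicted binary residual is
# subtracted from the SQ levels to approximate the RDOQ output after LAST,
# then the original signs are reapplied before SBH. Names are placeholders.
import numpy as np

def dl_rdoq_levels(C, delta, predict_residual):
    """Approximate signed RDOQ levels for a TB of transform coefficients C."""
    L_sq = np.floor(np.abs(C) / delta).astype(np.int32)   # unsigned SQ levels
    Re_hat = predict_residual(L_sq)                       # binary residual {0, 1}
    L_pred = np.maximum(L_sq - Re_hat, 0)                 # RDOQ prediction after LAST
    return L_pred * np.sign(C).astype(np.int32)           # reapply signs for SBH

# Dummy predictor standing in for the per-TB-size CNN.
dummy_net = lambda L_sq: (L_sq == 1).astype(np.int32)
C = np.array([[33.0, -9.5], [7.9, -18.2]])
print(dl_rdoq_levels(C, delta=8.0, predict_residual=dummy_net))
```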
We compare the coding performance of DL-RDOQ with RDOQ-On and RDOQ-Off on HM 16.15 [16]. SBH is on in both testing cases. For SBH, both rate and distortion are computed in the RDOQ-On case, but only distortion is computed in the DL-RDOQ and RDOQ-Off cases, because the rate is estimated in RDOQ-On but not in RDOQ-Off nor in DL-RDOQ. The rate-distortion curves for all intra (AI) and random access (RA) are shown in Fig. 6, using 10 frames of the sequence Kimono (1920x1080, 50fps) for AI and 64 frames for RA under the common test conditions [17].
DL-RDOQ performs better than RDOQ-Off (or SQDZ) while fairly approximating the performance of RDOQ-On, especially at high bit-rates. This is because, at high bit-rates, there are more residual values of 1, which leads to a smaller prediction error for DL-RDOQ. As RDOQ-Off (SQDZ) is better than SQ, DL-RDOQ shows better performance than its initial input, SQ. On the other hand, this work only utilizes deep learning in DL-RDOQ as a black-box solution, so its performance can be boosted further by fine tuning, such as adding more layers or customizing the network structure to better utilize knowledge about RDOQ.

V. CONCLUSION

This paper proposed a DL-based method to predict RDOQ without rate-distortion estimation via a residual CNN. The proposed method demonstrated that, despite using DL simply as a black box, we were able to predict RDOQ quite well and produced much better performance than RDOQ-Off.

REFERENCES

[1] G. Sullivan et al., "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Trans. Circ. Syst. Video Tech., vol. 22, no. 12, pp. 1649-1668, 2012.
[2] M. Xu et al., "Reducing complexity of HEVC: A deep learning approach," IEEE Trans. Image Process., vol. 27, no. 10, 2018.
[3] L. Thorsten and O. Jorn, "Deep learning based intra prediction mode decision for HEVC," Proc. IEEE Picture Coding Symposium, 2016.
[4] J. Li et al., "Fully connected network-based intra prediction for image coding," IEEE Trans. Image Process., vol. 27, no. 7, 2018.
[5] R. Lin et al., "Deep CNN for Decompressed Video Enhancement," Proc. IEEE Data Compression Conference, 2016.
[6] R. Yang, M. Xu, Z. Wang, and T. Li, "Multi-Frame Quality Enhancement for Compressed Video," arXiv:1803.04680, 2018.
[7] A. Oord et al., "Conditional image generation with PixelCNN decoders," Inter. Conf. Neural Info. Process. Sys., pp. 4797-4805, 2016.
[8] F. Jiang et al., "An end-to-end compression framework based on convolutional neural networks," IEEE Trans. Circ. Syst. Video Tech., 2017.
[9] C. Dong et al., "Image super-resolution using deep convolutional networks," arXiv:1501.00092, Jul. 2015.
[10] K. Zhang et al., "Beyond a Gaussian Denoiser: Residual learning of deep CNN for image denoising," IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142-3155, 2017.
[11] M. Karczewicz et al., "Rate Distortion Optimized Quantization," document ITU-T SG16 Q.6, VCEG-AH21, 2008.
[12] H. Lee et al., "Fast quantization method with simplified rate-distortion optimized quantization for an HEVC encoder," IEEE Trans. Circ. Syst. Video Tech., vol. 26, no. 1, pp. 106-116, 2016.
[13] M. Xu et al., "Simplified rate-distortion optimized quantization for HEVC," Proc. IEEE Inter. Sym. Broad. Mul. Sys. Broadcast., 2018.
[14] E. Shelhamer et al., "Fully convolutional networks for semantic segmentation," IEEE Trans. Patt. Anal. Mach. Intell., vol. 39, no. 4, pp. 640-651, 2016.
[15] Y. Jia et al., "Caffe: Convolutional Architecture for Fast Feature Embedding," ACM Inter. Conf. Multimedia, pp. 674-678, 2014.
[16] High Efficiency Video Coding Test Software 16.15, available at https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.15.
[17] F. Bossen, "Common HM Test Conditions and Software Reference Configurations," Joint Collaborative Team on Video Coding, JCTVC-L1100.
[18] X. Zhang et al., "Optimizing the Hierarchical Prediction and Coding in HEVC for Surveillance and Conference Videos With Background Modeling," IEEE Trans. Image Process., vol. 23, no. 10, pp. 4511-4526, 2014.