COMPLEXITY-BASED CONSISTENT-QUALITY ENCODING IN THE CLOUD

Jan De Cock, Zhi Li, Megha Manohara, and Anne Aaron

Netflix Inc., 100 Winchester Circle, Los Gatos, CA, United States

ABSTRACT

A cloud-based encoding pipeline which generates streams for video-on-demand distribution typically processes a wide diversity of content that exhibits varying signal characteristics. To produce the best quality video streams, the system needs to adapt the encoding to each piece of content, in an automated and scalable way. In this paper, we describe two algorithm optimizations for a distributed cloud-based encoding pipeline: (i) per-title complexity analysis for bitrate-resolution selection; and (ii) per-chunk bitrate control for consistent-quality encoding. These improvements result in a number of advantages over a simple “one-size-fits-all” encoding system, including more efficient bandwidth usage and more consistent video quality.

Index Terms: Encoding pipeline, parallel encoding, rate control

1. INTRODUCTION

Internet streaming allows video-on-demand (VOD) distributors such as Netflix to tailor their streams to the viewers’ available bandwidth and viewing device capability. Streams are pre-encoded at various bitrates applying optimized encoding recipes. On the user’s device, the client runs adaptive streaming algorithms which instantaneously select the best encode to maximize video quality while avoiding playback interruptions due to rebuffers.

Encoding with the best recipe is not a simple problem. For example, assuming a 1 Mbps bandwidth, should H.264/AVC video be streamed at 480p, 720p or 1080p resolution? At 480p, 1 Mbps will likely not exhibit encoding artifacts such as blocking or ringing, but if the user is watching on an HD device, the upsampled video will not be sharp. On the other hand, if we encode at 1080p we send a higher resolution video, but the bitrate may be too low such that most scenes will contain annoying encoding artifacts.

In a fixed-bitrate encoding system, codec parameters can be selected that produce the best quality trade-offs across different types of content. A set of bitrate-resolution pairs (referred to as a bitrate ladder) is selected such that the bitrates are sufficient to encode the stream at that resolution without significant encoding artifacts. This “one-size-fits-all” fixed bitrate ladder achieves, for most content, good quality encodes given the bitrate constraint. However, for some cases, such as scenes with high camera noise or film grain noise, a 5000 kbps stream would still exhibit blockiness in the noisy areas. On the other end, for simple content like cartoons, 5000 kbps is far more than needed to produce excellent 1080p encodes.

The titles in a VOD collection such as Netflix’s have very high diversity in signal characteristics. For example, some animation titles reach very high PSNR (45 dB or more) at bitrates of 2500 kbps or less. On the other extreme, some titles with high action scenes or significant spatial texture (camera or film grain noise) require bitrates of 8000 kbps or more to achieve an acceptable PSNR of 38 dB. Given this diversity, a one-size-fits-all scheme obviously cannot provide the best video quality for a given title and user’s allowable bandwidth. It can also waste storage and transmission bits because, in some cases, the allocated bitrate goes beyond what is necessary to achieve a perceptible improvement in video quality. Furthermore, even within a title, the signal characteristics can vary significantly, from simple talking head scenes to explosions and car chases.

In this paper we describe a cloud-based encoding system which selects optimized bitrate ladders per title. Given a selected bitrate-resolution pair, we further enhance the bitrate allocation by adapting the target bitrate of each video chunk to the complexity of that segment. The chunk-based algorithm is similar to the approach described in [1], where the authors propose multi-pass encoding to steer the bitrate of each video segment to meet maximum quality and bitrate constraints. However, for our approach we base the initial Constant Rate Factor (CRF) [2] value and the maximum bitrate of each chunk on the results of per-title complexity analysis.

This paper is structured as follows. In Section 2, we give an overview of our complexity analysis algorithm, which determines the per-title bitrate ladder. Section 3 describes enhancements to a VOD encoding pipeline which lead to per-chunk quality and rate control. In Section 4, we present results on a set of full-length titles from the Netflix catalog.

2. PER-TITLE COMPLEXITY ANALYSIS

To design the optimal per-title bitrate ladder, we select the total number of quality levels and the bitrate-resolution pair for each quality level according to several practical constraints. For example, we need backward-compatibility (streams are
playable on all previously certified devices), so we limit the resolution selection to a finite set, e.g. 1920x1080, 1280x720, 720x480, 512x384, 384x288 and 320x240. In addition, the bitrate selection is also limited to a finite set, where the adjacent bitrates have an increment of roughly 5%. For optimality, we select the bitrate-resolution pairs such that (i) at a given bitrate, the produced encode has as high a quality as possible, and (ii) the perceptual difference between two adjacent bitrates falls just below one just-noticeable difference (JND).

Fig. 1 shows an example where we encode a source at three different resolutions with various bitrates.

Fig. 1. Example encodes showing individual R-D curves and convex hull.

At each resolution, the quality of the encode monotonically increases with the bitrate, but the curve starts flattening out (A and B) when the bitrate goes above some threshold. On the other hand, a high-resolution encode may produce a quality lower than the one produced by encoding at the same bitrate but at a lower resolution (see C and D). This is because encoding more pixels with lower precision can produce a worse picture than encoding fewer pixels at higher precision combined with upsampling and interpolation. Furthermore, at very low bitrates the encoding overhead associated with every fixed-size coding block starts to dominate the bitrate consumption, leaving very few bits for encoding the actual signal. Encoding at high resolution at insufficient bitrate would produce artifacts such as blocking, ringing and contouring.

We can see that each resolution has a bitrate region in which it outperforms other resolutions. If we collect all these regions from all the resolutions available, they collectively form the convex hull. Ideally, we want to operate exactly at the convex hull, but due to practical constraints (for example, we can only select from a finite number of resolutions), we would like to select bitrate-resolution pairs that are as close to the convex hull as possible.

It is practically infeasible to construct the full bitrate-quality graphs spanning the entire quality region for each title in a VOD catalogue. To implement a practical solution in production, we perform trial encodings at different CRF values, over a finite set of resolutions. The CRF values are chosen such that they are one JND apart. For each trial encode, we measure the bitrate and quality. By interpolating curves based on the sample points, we produce bitrate-quality curves at each candidate resolution. The final per-title bitrate ladder is then derived by selecting points closest to the convex hull.

In practice, a movie or TV show is composed of scenes of varying complexity. To account for this in the generation of the optimal bitrate ladder, we only utilize segments of the video that are at the high complexity end of the title. This guarantees optimal quality for the high complexity scenes but may over-allocate bits for the simple segments of the video.

3. ENCODING PROCESS

3.1. Per-title encoding

Once the complexity of the title has been analyzed and a per-title resolution-bitrate ladder has been constructed, the encoding process is launched. For each resolution-bitrate pair, a video encode is generated in the cloud-based video encoding pipeline as follows [3,4]: The video source is divided into fixed-length chunks and each chunk is independently encoded in the parallel encoding pipeline. After all the encode chunks are completed, a video assembler stitches the bitstreams together to produce the full bitstream of the title. The target bitrate is achieved by using two-pass bitrate-based rate control on each of the chunks, with the same target average bitrate for all the chunks.

3.2. Per-chunk bitrate setting and encoding

We enhance the encoding pipeline to support per-chunk bitrate variation. For each encode chunk, we select the bitrate such that it adapts to the complexity of the video for that specific segment. As mentioned above, the complexity analysis results in optimal resolution-bitrate pairs for that title. In addition, each resolution-bitrate pair (R_i, B_i) corresponds to a specific CRF value, C_i, that was used to generate the trial encoding. This CRF number represents the consistent quality target for the title given the ladder point i. The objective of the per-chunk bitrate adaptation is to encode each chunk at resolution R_i with quality C_i and capped at bitrate B_i. Since the resolution-bitrate pairs for the title were chosen using the complex segments of the title, per-chunk adaptation results in an average bitrate across the title of less than B_i.

In particular, we apply multi-pass encoding. For each chunk n, the first pass uses CRF rate control at the desired CRF C_i, and the size of the resulting encode determines the
chunk bitrate B_{i,n}. Based on this bitrate, two options are possible: (i) When the bitrate B_{i,n} does not exceed the maximum bitrate B_i, B_{i,n} is passed on as the rate target for the second pass. The second pass is bitrate-controlled and uses the per-frame statistics from the first pass. Compared to the one-pass CRF encode, the additional second pass allows for improved bit allocation and buffer compliance while still maintaining a consistent quality target across chunks. (ii) If the per-chunk bitrate exceeds the maximum bitrate, a regular two-pass encode is started with rate target B_i, leading to three passes overall for this chunk.
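
The two options above can be sketched as a small decision function. This is a minimal illustration, not the production pipeline: the names `ChunkPlan`, `plan_chunk` and `average_kbps` and the example bitrates are hypothetical, and the encoder itself is not invoked. The function only decides, from the bitrate measured in the CRF pass, which rate target the final bitrate-controlled encode receives and how many passes the chunk costs overall.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkPlan:
    target_kbps: float  # rate target for the final bitrate-controlled encode
    total_passes: int   # encoder passes spent on this chunk overall
    capped: bool        # True when the CRF pass overshot the ladder bitrate


def plan_chunk(crf_pass_kbps: float, max_kbps: float) -> ChunkPlan:
    """Choose the rate target for chunk n from its CRF-pass bitrate B_{i,n}."""
    if crf_pass_kbps <= max_kbps:
        # Option (i): the CRF pass doubles as the first pass; one
        # bitrate-controlled pass at B_{i,n} follows (two passes in total).
        return ChunkPlan(crf_pass_kbps, 2, False)
    # Option (ii): discard the CRF result and run a regular two-pass
    # encode at the cap B_i (three passes in total).
    return ChunkPlan(max_kbps, 3, True)


def average_kbps(plans: List[ChunkPlan]) -> float:
    """Average target bitrate, assuming fixed-length (equal-duration) chunks."""
    return sum(p.target_kbps for p in plans) / len(plans)


# Hypothetical CRF-pass bitrates for four chunks at a ladder point
# with B_i = 2350 kbps; only the last chunk exceeds the cap.
crf_pass_bitrates = [300.0, 900.0, 1500.0, 2600.0]
plans = [plan_chunk(b, max_kbps=2350.0) for b in crf_pass_bitrates]
for n, plan in enumerate(plans):
    print(n, plan)
print("average target [kbps]:", average_kbps(plans))
```

Because every target is at most B_i, the average over the chunks cannot exceed B_i, which matches the observation in Section 3.2 that per-chunk adaptation lowers the average bitrate of the title.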

4. RESULTS

In this section, we present results of the complexity analysis and encoding process for a set of VOD titles in the Netflix catalog. Each title went through a cloud-based encoding pipeline consisting of source inspection, complexity analysis, multi-pass chunk encoding, and assembly [4]. To measure the quality of the encodes, we use both scaled PSNR (measured at 1080p resolution) and the Video Multi-method Assessment Fusion (VMAF) metric [3]. VMAF estimates quality by combining scores from multiple quality assessment algorithms, including Anti-Noise SNR, Detail Loss Measure [5], Visual Information Fidelity [6], and motion information, and shows a higher correlation with subjective quality scores than e.g. PSNR and SSIM.

4.1. Comparison between per-title encoding and fixed-ladder encoding

The per-title encoding scheme has the advantage that client devices can switch between resolutions at a bitrate more appropriate for each individual title. The gains in quality become apparent from the example rate-distortion graph in Fig. 2(a), where we show the fixed-rate and per-title R-D curves for a full-length episode of a drama series with moderate spatio-temporal complexity. R-D points for different resolutions are plotted in this graph, with their scaled PSNR values shown in the ordinate. When looking at the combined R-D points for a certain resolution (indicated for 480p, 720p and 1080p), the R-D curve for that particular resolution can be distinguished.

Fig. 2. Example R-D curves for fixed and per-title bitrate ladders: (a) title with moderate motion and complexity; (b) complex title containing film grain. Axes: bitrate [kbps] vs. scaled PSNR [dB]; points are labeled 480p, 720p and 1080p.

The red per-title curve shows how we are now encoding at the convex hull encompassing the individual per-resolution R-D curves. In this example, the transitions between resolutions occur at lower bitrates for the per-title ladder (e.g. 720p resolution is enabled at 1050 kbps instead of 2350 kbps). In particular at these transition points, we are obtaining substantial quality gains over the fixed-ladder approach.

Fig. 2(b) shows the R-D curve for a second example title, further illustrating the benefit of encoding at the convex hull. For this example, there are clear gains at low resolutions, but for e.g. 1080p there is little benefit in switching to this resolution at a lower bitrate. We are, however, increasing the highest bitrate point to 7500 kbps. Visual inspection shows that encoding at this higher bitrate point better preserves the film grain present in this show.

4.2. Comparison between per-chunk and per-title encoding

Per-chunk encoding leads to a more consistent video quality across the entire title. On average, we obtain PSNR and VMAF values which are similar for both per-title and per-chunk encoding. The benefit of using per-chunk encoding (based on multi-pass rate control as described in Section 3), however, lies in the reduction of the quality variation, and in the increase of the minimum quality.

The per-chunk PSNR and VMAF values for the first 500 seconds of a drama series episode are plotted in Fig. 3. In these graphs, each point represents a 10-second chunk of the episode. On average for the whole episode, the quality scores
are close to each other (e.g. a PSNR of 38.11 dB for per-chunk encoding vs. 37.98 dB for per-title encoding), but we can see significant differences in the per-chunk quality. This is more apparent from the VMAF quality scores. By using the per-chunk encoding approach, we are able to reduce the variation, and to limit the drops in quality. For this example, the standard deviation σ of the per-frame quality is reduced from 4.24 dB to 3.97 dB (PSNR), and from 10.91 to 9.16 (VMAF).

Fig. 3. Scaled PSNR and VMAF scores for the first 500 seconds of a drama series episode (x-axis: chunk index; curves: per-title and per-chunk).
(a) Scaled PSNR values per chunk. Per-title @ 365 kbps (PSNR_avg = 37.98 dB, σ = 4.24 dB) vs. per-chunk @ 370 kbps (PSNR_avg = 38.11 dB, σ = 3.97 dB).
(b) VMAF score per chunk. Per-title @ 365 kbps (VMAF_avg = 71.26, σ = 10.91) vs. per-chunk @ 370 kbps (VMAF_avg = 72.29, σ = 9.16).

To aggregate the results over multiple titles, Fig. 4 shows the cumulative distribution function (CDF) of the quality variation for per-title and per-chunk encoding, where ΔPSNR = PSNR − PSNR_avg. These statistics were collected on 10 full-length titles with varying spatio-temporal characteristics, representing more than 500,000 frames. The graph shows that the CDF for per-chunk encoding rises more sharply around zero, indicating a reduced quality variation.

Fig. 4. Cumulative distribution function for ΔPSNR for per-title and per-chunk encoding (aggregated for 10 titles).

To improve quality of experience, we are interested in maximizing the minimal quality in the encoded streams. This is particularly important for the lower resolution encodes, where the quality variation is the highest. The CDF for per-chunk encoding indicates that we are effectively increasing the quality on the lower end. When looking at the fifth percentile of the frames, we improve the quality, as indicated by the red line in Fig. 4, and as detailed in Table 1. For the 480p encodes of the test set (the lowest resolution in the table), we obtain a PSNR gain of 0.5 dB and a VMAF improvement of 3.15. The gains decrease for higher resolutions to about 0.3 dB (PSNR) and 0.45 (VMAF) for the 1080p encodes.

Table 1. Quality (scaled PSNR and VMAF) for fifth percentile of frames.

              Per-title            Per-chunk
Resolution    PSNR [dB]   VMAF     PSNR [dB]   VMAF
480p          32.85       50.61    33.35       53.76
720p          36.32       64.77    36.74       66.06
1080p         37.77       72.87    38.05       73.32

5. CONCLUSIONS

In this paper, we gave an overview of two improvements that can be implemented in a VOD encoding pipeline. Based on a complexity analysis step, we are able to determine resolution-bitrate pairs which closely reflect the convex hull of the R-D curves at different resolutions. Furthermore, we are determining the bitrate of individual chunks, by using a CRF-based multi-pass encoding process. Per-chunk encoding was shown to lead to more consistent quality across the individual chunks, and to improve the minimal quality in the video streams.
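
As a companion to the resolution selection summarized above, the following sketch shows one simplified way to pick, for each candidate bitrate, the resolution whose interpolated bitrate-quality curve is highest (criterion (i) of Section 2). It is an assumption-laden stand-in rather than the authors' method: it takes the upper envelope of piecewise-linear curves instead of a true convex hull, ignores the JND spacing between ladder points, and uses hypothetical names (`interpolate`, `build_ladder`) and made-up trial-encode samples.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# One curve per resolution: (bitrate_kbps, quality) samples from trial
# encodes, sorted by strictly increasing bitrate.
Curve = List[Tuple[float, float]]

NOT_AVAILABLE = float("-inf")


def interpolate(curve: Curve, bitrate: float) -> float:
    """Piecewise-linear quality at `bitrate`; NOT_AVAILABLE outside the samples."""
    rates = [r for r, _ in curve]
    if bitrate < rates[0] or bitrate > rates[-1]:
        return NOT_AVAILABLE
    if bitrate == rates[-1]:
        return curve[-1][1]
    hi = bisect_right(rates, bitrate)  # first sample strictly above `bitrate`
    (r0, q0), (r1, q1) = curve[hi - 1], curve[hi]
    return q0 + (q1 - q0) * (bitrate - r0) / (r1 - r0)


def build_ladder(
    curves: Dict[str, Curve], bitrates: List[float]
) -> List[Tuple[float, str, float]]:
    """For each candidate bitrate, keep the resolution with the highest quality."""
    ladder = []
    for b in bitrates:
        best = max(curves, key=lambda res: interpolate(curves[res], b))
        quality = interpolate(curves[best], b)
        if quality != NOT_AVAILABLE:  # skip bitrates no curve covers
            ladder.append((b, best, quality))
    return ladder


# Hypothetical scaled-PSNR samples [dB]; not measurements from the paper.
curves = {
    "480p": [(200.0, 30.0), (1000.0, 36.0), (3000.0, 38.0)],
    "720p": [(400.0, 29.0), (1000.0, 35.0), (2000.0, 39.0), (4000.0, 41.0)],
    "1080p": [(1000.0, 32.0), (2000.0, 38.0), (4000.0, 43.0), (6000.0, 44.0)],
}
ladder = build_ladder(curves, [500.0, 1500.0, 3000.0, 5000.0])
for bitrate, resolution, quality in ladder:
    print(f"{bitrate:6.0f} kbps -> {resolution} ({quality:.2f} dB)")
```

With these made-up samples the selected resolution steps up from 480p to 720p to 1080p as the bitrate grows, mirroring the crossover behaviour described for Fig. 1.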

6. REFERENCES

[1] Y.-C. Lin, H. Denman, and A. Kokaram, “Multipass encoding for reducing pulsing artifacts in cloud based video transcoding,” in IEEE International Conference on Image Processing (ICIP). IEEE, 2015, pp. 907–911.

[2] L. Merritt and R. Vanam, “Improved rate control and motion estimation for H.264 encoder,” in IEEE International Conference on Image Processing (ICIP). IEEE, 2007, pp. 309–312.

[3] A. Aaron, Z. Li, M. Manohara, J.Y. Lin, E.C.-H. Wu, and C.-C.J. Kuo, “Challenges in cloud based ingest and encoding for high quality streaming media,” in IEEE International Conference on Image Processing (ICIP), 2015, pp. 1732–1736.

[4] A. Aaron and D. Ronca, “High quality video encoding at scale,” Netflix Tech Blog, http://techblog.netflix.com/2015/12/high-quality-video-encoding-at-scale.html, December 9, 2015.

[5] S. Li, F. Zhang, L. Ma, and K.N. Ngan, “Image quality assessment by separately evaluating detail losses and additive impairments,” IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 935–949, Oct 2011.