Complexity-Based Consistent-Quality Encoding in The Cloud
ABSTRACT

A cloud-based encoding pipeline that generates streams for video-on-demand distribution typically processes a wide diversity of content that exhibits varying signal characteristics. To produce the best quality video streams, the system needs to adapt the encoding to each piece of content, in an automated and scalable way. In this paper, we describe two algorithm optimizations for a distributed cloud-based encoding pipeline: (i) per-title complexity analysis for bitrate-resolution selection; and (ii) per-chunk bitrate control for consistent-quality encoding. These improvements result in a number of advantages over a simple “one-size-fits-all” encoding system, including more efficient bandwidth usage and more consistent video quality.

Index Terms: Encoding pipeline, parallel encoding, rate control

1. INTRODUCTION

Internet streaming allows video-on-demand (VOD) distributors such as Netflix to tailor their streams to the viewers’ available bandwidth and viewing device capability. Streams are pre-encoded at various bitrates applying optimized encoding recipes. On the user’s device, the client runs adaptive streaming algorithms which instantaneously select the best encode to maximize video quality while avoiding playback interruptions due to rebuffers.

Encoding with the best recipe is not a simple problem. For example, assuming a 1 Mbps bandwidth, should H.264/AVC video be streamed at 480p, 720p or 1080p resolution? At 480p, 1 Mbps will likely not exhibit encoding artifacts such as blocking or ringing, but if the user is watching on an HD device, the upsampled video will not be sharp. On the other hand, if we encode at 1080p we send a higher resolution video, but the bitrate may be so low that most scenes will contain annoying encoding artifacts.

In a fixed-bitrate encoding system, codec parameters can be selected that produce the best quality trade-offs across different types of content. A set of bitrate-resolution pairs (referred to as a bitrate ladder) is selected such that the bitrates are sufficient to encode the stream at that resolution without significant encoding artifacts. This “one-size-fits-all” fixed bitrate ladder achieves, for most content, good quality encodes given the bitrate constraint. However, for some cases, such as scenes with high camera noise or film grain noise, a 5000 kbps stream would still exhibit blockiness in the noisy areas. On the other end, for simple content like cartoons, 5000 kbps is far more than needed to produce excellent 1080p encodes.

The titles in a VOD collection such as Netflix’s have very high diversity in signal characteristics. For example, some animation titles reach very high PSNR (45 dB or more) at bitrates of 2500 kbps or less. On the other extreme, some titles with high action scenes or significant spatial texture (camera or film grain noise) require bitrates of 8000 kbps or more to achieve an acceptable PSNR of 38 dB. Given this diversity, a one-size-fits-all scheme cannot provide the best video quality for a given title and user’s allowable bandwidth. It can also waste storage and transmission bits because, in some cases, the allocated bitrate goes beyond what is necessary to achieve a perceptible improvement in video quality. Furthermore, even within a title, the signal characteristics can vary significantly, from simple talking head scenes to explosions and car chases.

In this paper we describe a cloud-based encoding system which selects optimized bitrate ladders per title. Given a selected bitrate-resolution pair, we further enhance the bitrate allocation by adapting the target bitrate of each video chunk to the complexity of that segment. The chunk-based algorithm is similar to the approach described in [1], where the authors propose multi-pass encoding to steer the bitrate of each video segment to meet maximum quality and bitrate constraints. However, for our approach we base the initial Constant Rate Factor (CRF) [2] value and the maximum bitrate of each chunk on the results of per-title complexity analysis.

This paper is structured as follows. In Section 2, we give an overview of our complexity analysis algorithm, which determines the per-title bitrate ladder. Section 3 describes enhancements to a VOD encoding pipeline which lead to per-chunk quality and rate control. In Section 4, we present results on a set of full-length titles from the Netflix catalog.

2. PER-TITLE COMPLEXITY ANALYSIS

To design the optimal per-title bitrate ladder, we select the total number of quality levels and the bitrate-resolution pair for each quality level according to several practical constraints. For example, we need backward-compatibility (streams are
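As a rough illustration of the convex-hull idea used in this section for per-title ladder selection, the sketch below computes the upper convex hull of a set of trial encodes in the bitrate-quality plane. It is a minimal sketch under stated assumptions, not the pipeline's actual implementation: the `Encode` type, the function names, and all numbers in `TRIAL_ENCODES` are hypothetical, and quality is assumed to be a "higher is better" score such as PSNR.

```python
from typing import Dict, List, NamedTuple


class Encode(NamedTuple):
    resolution: str  # label such as "720p"
    bitrate: int     # kbps
    quality: float   # any "higher is better" score, e.g. PSNR in dB


def _cross(o: Encode, a: Encode, b: Encode) -> float:
    """Cross product of vectors OA and OB in the (bitrate, quality) plane."""
    return ((a.bitrate - o.bitrate) * (b.quality - o.quality)
            - (a.quality - o.quality) * (b.bitrate - o.bitrate))


def upper_convex_hull(encodes: List[Encode]) -> List[Encode]:
    # At each bitrate, keep only the best-performing resolution.
    best: Dict[int, Encode] = {}
    for e in encodes:
        if e.bitrate not in best or e.quality > best[e.bitrate].quality:
            best[e.bitrate] = e
    points = sorted(best.values(), key=lambda e: e.bitrate)

    # Monotone-chain scan: drop any point lying on or below the chord
    # between its neighbours, which leaves the upper convex hull.
    hull: List[Encode] = []
    for p in points:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical trial encodes (illustrative numbers only, not measured data).
TRIAL_ENCODES = [
    Encode("480p", 500, 34.0), Encode("480p", 1000, 35.0),
    Encode("480p", 2000, 36.0),
    Encode("720p", 500, 32.0), Encode("720p", 1000, 34.5),
    Encode("720p", 2000, 39.0), Encode("720p", 4000, 40.0),
    Encode("1080p", 1000, 31.0), Encode("1080p", 2000, 37.0),
    Encode("1080p", 4000, 42.0), Encode("1080p", 6000, 43.0),
]

hull = upper_convex_hull(TRIAL_ENCODES)
for e in hull:
    print(f"{e.bitrate:>5} kbps -> {e.resolution} ({e.quality:.1f})")
```

With these made-up points, the 1000 kbps encodes fall below the chord between the 500 kbps and 2000 kbps hull points and are dropped. A practical ladder would then pick a finite set of pairs on or near this hull, subject to the constraints discussed in this section.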
Fig. 1. Example encodes showing individual R-D curves and convex hull.

At each resolution, the quality of the encode monotonically increases with the bitrate, but the curve starts flattening out (A and B) when the bitrate goes above some threshold. On the other hand, a high-resolution encode may produce a quality lower than the one produced by encoding at the same bitrate but at a lower resolution (see C and D). This is because encoding more pixels with lower precision can produce a worse picture than encoding fewer pixels at higher precision combined with upsampling and interpolation. Furthermore, at very low bitrates the encoding overhead associated with every fixed-size coding block starts to dominate the bitrate consumption, leaving very few bits for encoding the actual signal. Encoding at high resolution at insufficient bitrate would produce artifacts such as blocking, ringing and contouring.

We can see that each resolution has a bitrate region in which it outperforms other resolutions. If we collect all these regions from all the resolutions available, they collectively form the convex hull. Ideally, we want to operate exactly at the convex hull, but due to practical constraints (for example, we can only select from a finite number of resolutions), we would like to select bitrate-resolution pairs that are as close to the convex hull as possible.

3. ENCODING PROCESS

3.1. Per-title encoding

Once the complexity of the title has been analyzed and a per-title resolution-bitrate ladder has been constructed, the encoding process is launched. For each resolution-bitrate pair, a video encode is generated in the cloud-based video encoding pipeline as follows [3,4]: The video source is divided into fixed-length chunks and each chunk is independently encoded in the parallel encoding pipeline. After all the encode chunks are completed, a video assembler stitches the bitstreams together to produce the full bitstream of the title. The target bitrate is achieved by using two-pass bitrate-based rate control on each of the chunks, with the same target average bitrate for all the chunks.

3.2. Per-chunk bitrate setting and encoding

We enhance the encoding pipeline to support per-chunk bitrate variation. For each encode chunk, we select the bitrate such that it adapts to the complexity of the video for that specific segment. As mentioned above, the complexity analysis results in optimal resolution-bitrate pairs for that title. In addition, each resolution-bitrate pair (R_i, B_i) corresponds to a specific CRF value, C_i, that was used to generate the trial encoding. This CRF number represents the consistent quality target for the title given the ladder point i. The objective of the per-chunk bitrate adaptation is to encode each chunk at resolution R_i with quality C_i and capped at bitrate B_i. Since the resolution-bitrate pairs for the title were chosen using the complex segments of the title, per-chunk adaptation results in an average bitrate across the title of less than B_i.

In particular, we apply multi-pass encoding. For each chunk n, the first pass uses CRF rate control at the desired CRF C_i, and the size of the resulting encode determines the chunk bitrate B_{i,n}. Based on this bitrate, two options are possible: (i) When the bitrate B_{i,n} does not exceed the maximum bitrate B_i, B_{i,n} is passed on as the rate target for the second pass. The second pass is bitrate-controlled and uses the per-
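The per-chunk rate decision of Section 3.2 can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names, chunk duration, and byte sizes are hypothetical, and the handling of a chunk whose first-pass bitrate exceeds B_i (capping the second-pass target at B_i) is an assumption inferred from the stated goal of encoding each chunk "capped at bitrate B_i", since the description of that case is not available here. No encoder is invoked; the first-pass sizes are simply given as input.

```python
from typing import List


def chunk_bitrate_kbps(encoded_bytes: int, duration_s: float) -> float:
    """Average bitrate B_{i,n} of a first-pass (CRF) chunk encode, in kbps."""
    return encoded_bytes * 8 / 1000 / duration_s


def second_pass_target_kbps(first_pass_kbps: float, max_kbps: float) -> float:
    # Case (i): the CRF encode fits under the ladder bitrate B_i, so its own
    # bitrate B_{i,n} becomes the second-pass rate target.
    # Assumed case (ii): otherwise the target is capped at B_i.
    return min(first_pass_kbps, max_kbps)


# Hypothetical inputs (illustrative numbers only).
B_I_KBPS = 3000.0            # ladder bitrate B_i for this resolution
CHUNK_SECONDS = 4.0          # assumed fixed chunk duration
FIRST_PASS_BYTES = [750_000, 2_000_000, 1_500_000]  # sizes of CRF encodes

targets: List[float] = [
    second_pass_target_kbps(chunk_bitrate_kbps(size, CHUNK_SECONDS), B_I_KBPS)
    for size in FIRST_PASS_BYTES
]
average_kbps = sum(targets) / len(targets)

print(targets)       # per-chunk second-pass targets in kbps
print(average_kbps)  # title-level average, never above B_I_KBPS
```

Because every target is at most B_i and simple chunks fall below it, the average over the title stays below B_i, which matches the observation at the end of the first paragraph of Section 3.2.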
[Figure: plot against bitrate [kbps] with points labeled 720p and 1080p; vertical axis name and caption not recovered]

4. RESULTS
are close to each other (e.g. a PSNR of 38.11 dB for per-chunk encoding vs. 37.98 dB for per-title encoding), but we can see significant differences in the per-chunk quality. This is more apparent from the VMAF quality scores. By using the per-chunk encoding approach, we are able to reduce the variation, and to limit the drops in quality. For this example, the standard deviation σ of the per-frame quality is reduced from 4.24 dB to 3.97 dB (PSNR), and from 10.91 to 9.16 (VMAF).

[Figure: plot with vertical axis "Scaled PSNR [dB]" and legend entries "Per-title" and "Per-chunk"; caption not recovered]

zero, indicating a reduced quality variation.

Fig. 4. Cumulative distribution function for ΔPSNR for per-title and per-chunk encoding (aggregated for 10 titles).
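The standard-deviation comparison reported in the results can be reproduced for any pair of per-frame quality traces with a few lines. The sketch below is only an illustration: the two score lists are made-up numbers, not the paper's data, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

# Hypothetical per-frame PSNR traces in dB (illustrative numbers only).
per_title_psnr = [44.0, 42.0, 36.0, 31.0, 38.0, 43.0]
per_chunk_psnr = [42.0, 41.0, 37.5, 34.0, 38.5, 41.5]

for label, scores in (("per-title", per_title_psnr),
                      ("per-chunk", per_chunk_psnr)):
    # Similar means with a smaller sigma indicate more consistent quality.
    print(f"{label}: mean={mean(scores):.2f} dB, sigma={pstdev(scores):.2f} dB")
```

In this toy example the two means are nearly equal while the per-chunk trace has the smaller spread, which is the pattern the results describe.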
6. REFERENCES