Module 8: Video Coding Basics

Lecture 41: Performance measures, Intraframe coding, Predictive and transform coding

The Lecture Contains:

Performance Measures

Intraframe Coding



Performance Measures

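One aspect of the performance of a video coding algorithm is the amount of compression it achieves.
This can be expressed in three ways:

\[ \text{compression ratio} = \frac{\text{number of bits in the original video}}{\text{number of bits in the compressed video}} \quad \text{(unitless)}, \tag{8.7} \]

\[ \text{compression} = \frac{\text{number of bits in the compressed video}}{\text{number of pels in the original video}} \quad \text{(bits/pel)}, \tag{8.8} \]

\[ \text{bit rate} = \frac{\text{number of bits in the compressed video}}{\text{duration of the video in seconds}} \quad \text{(bits/s)}. \tag{8.9} \]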
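As a rough illustration, the three compression measures can be computed as below. This is a minimal sketch that assumes the usual definitions (original bits over compressed bits, compressed bits per original pel, and compressed bits per second); the function name and all the figures in the example are made up for illustration and count luma pels only.

```python
def compression_measures(original_bits, compressed_bits, num_pels, duration_s):
    """Return (compression ratio, bits per pel, bit rate in bits/s)."""
    ratio = original_bits / compressed_bits      # unitless
    bits_per_pel = compressed_bits / num_pels    # bits/pel
    bit_rate = compressed_bits / duration_s      # bits/s
    return ratio, bits_per_pel, bit_rate


# Hypothetical example: 300 luma frames of 176 x 144 pels at 8 bits/pel,
# played at 25 frames/s (12 s), compressed to 288,000 bits.
num_pels = 300 * 176 * 144
original_bits = num_pels * 8
ratio, bpp, rate = compression_measures(original_bits, 288_000, num_pels, 12.0)

print(ratio, bpp, rate)
```

Note that the compression ratio and the bits/pel figure carry the same information once the original bit depth is known: here their product is the original 8 bits/pel.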

Another important aspect is the reconstruction quality. This can be assessed using a number of
subjective and objective measures.
Subjective measures are normally evaluated by showing the reconstructed video to a group of
subjects and asking for their views on the perceived quality. A number of subjective assessment
methodologies have been developed over the years. Examples are the double stimulus impairment
scale (DSIS), the double stimulus continuous quality scale (DSCQS), and the single stimulus
continuous quality scale (SSCQS). Despite their reliability, subjective quality experiments are
expensive and time-consuming.
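Objective measures provide cheaper and faster alternatives. One commonly used objective measure
is the mean squared error (MSE), which is defined as

\[ \text{MSE} = \frac{1}{HV} \sum_{i=0}^{H-1} \sum_{j=0}^{V-1} \left[ f(i,j) - \hat{f}(i,j) \right]^2, \tag{8.10} \]

where $H$ and $V$ are the horizontal and vertical dimensions of the frame, respectively, and
$f(i,j)$ and $\hat{f}(i,j)$ are the pel values at location $(i,j)$ of the original and reconstructed
frames, respectively. Care should be taken to include color components and to take into account any
chroma subsampling. For example, the MSE of a reconstructed 4:2:0 color frame can be calculated as

\[ \text{MSE} = \frac{2}{3HV} \left\{ \sum_{i=0}^{H-1} \sum_{j=0}^{V-1} \left[ Y(i,j) - \hat{Y}(i,j) \right]^2 + \sum_{i=0}^{H/2-1} \sum_{j=0}^{V/2-1} \left( \left[ C_b(i,j) - \hat{C}_b(i,j) \right]^2 + \left[ C_r(i,j) - \hat{C}_r(i,j) \right]^2 \right) \right\}. \tag{8.11} \]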
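The MSE calculation, including the 4:2:0 case, can be sketched as follows. This is an illustrative sketch only: frames are plain lists of rows, the function names are hypothetical, and the 4:2:0 version simply averages the squared error over every luma and chroma sample (the luma plane plus two quarter-size chroma planes).

```python
def sum_squared_error(orig, recon):
    """Sum of squared differences between two equally sized 2-D lists."""
    return sum((a - b) ** 2
               for row_o, row_r in zip(orig, recon)
               for a, b in zip(row_o, row_r))


def mse(orig, recon):
    """MSE of a single component: squared error averaged over all pels."""
    n = len(orig) * len(orig[0])
    return sum_squared_error(orig, recon) / n


def mse_420(y, cb, cr, y_rec, cb_rec, cr_rec):
    """MSE over all samples of a 4:2:0 frame (luma + two subsampled chroma planes)."""
    total = (sum_squared_error(y, y_rec)
             + sum_squared_error(cb, cb_rec)
             + sum_squared_error(cr, cr_rec))
    n = len(y) * len(y[0]) + len(cb) * len(cb[0]) + len(cr) * len(cr[0])
    return total / n


# Toy 4 x 4 luma frame with 2 x 2 chroma planes.
y = [[10, 10, 10, 10] for _ in range(4)]
y_rec = [[12, 10, 10, 10] for _ in range(4)]   # one pel per row is off by 2
cb = [[128, 128], [128, 128]]
cb_rec = [[128, 129], [128, 128]]              # one sample is off by 1
cr = [[128, 128], [128, 128]]
cr_rec = [[128, 128], [128, 128]]              # chroma Cr is reconstructed exactly

luma_mse = mse(y, y_rec)
frame_mse = mse_420(y, cb, cr, y_rec, cb_rec, cr_rec)
print(luma_mse, frame_mse)
```

In the toy frame the luma-only MSE is 16/16 = 1.0, while the full-frame value is 17/24, because the 24 samples of the 4:2:0 frame share a total squared error of 17.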




A more common form of the MSE measure is the peak signal-to-noise ratio (PSNR), which is defined
as

\[ \text{PSNR} = 10 \log_{10} \frac{f_{\max}^2}{\text{MSE}} \quad \text{(dB)}, \tag{8.12} \]

where $f_{\max}$ is the maximum possible pel value (for example, 255 for an 8-bit resolution component).
Although this measure does not always correlate well with perceived video quality, its relative
simplicity makes it a very popular choice in the video coding community. Thus, to facilitate
comparisons with other algorithms reported in the literature, we adopt the PSNR measure. If
accuracy is a major concern, then more sophisticated objective measures based on perceptual
models can be used.
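For reference, converting an MSE value to PSNR takes only a few lines. The sketch below assumes 8-bit components by default (peak value 255); the function name is illustrative.

```python
import math


def psnr(mse_value, max_value=255):
    """PSNR in dB for a given MSE; max_value is the peak pel value."""
    if mse_value == 0:
        return math.inf  # identical frames: no noise at all
    return 10 * math.log10(max_value ** 2 / mse_value)


print(round(psnr(1.0), 2))    # 48.13
print(round(psnr(100.0), 2))  # 28.13
```

Because of the logarithm, multiplying the MSE by 100 lowers the PSNR by exactly 20 dB, as the two printed values show.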

When testing a video coding algorithm, it is very important to subject it to a range of input video
sequences with different characteristics and a reasonable spread of data properties. The Moving
Picture Experts Group (MPEG) established a library of CCIR-601 test sequences divided into five
classes: class A (low spatial detail and low amount of motion), class B (medium spatial detail and
low amount of motion, or vice versa), class C (high spatial detail and medium amount of motion, or
vice versa), class D (stereoscopic), and class E (hybrid of natural and synthetic content). The first
three classes are the most relevant here. Thus, the discussion in this module uses three test
sequences: AKIYO, FOREMAN, and TABLE TENNIS, which represent classes A, B, and C, respectively. The
three sequences are at QSIF resolution and include 300 frames each. This resolution is typical of the
sequences used in very-low-bit-rate applications. Both AKIYO and TABLE TENNIS have luma
components of 176 x 120 and a frame rate of 30 frames/s, whereas FOREMAN has a luma component
of 176 x 144 and a frame rate of 25 frames/s. Figure 8.3 shows the luma component of the first
frame of each of the three test sequences.

Figure 8.3: Three test sequences




Intraframe Coding
Intraframe coding refers to video coding techniques that achieve compression by exploiting (reducing)
the high spatial correlation between neighboring pels within a video frame. Such techniques are also
known as spatial redundancy reduction techniques or still-image coding techniques.

1) Predictive Coding
Predictive coding was originally proposed by Cutler in 1952. In this method, a number of previously
coded pels are used to form a prediction of the current pel. The difference between the pel and its
prediction forms the signal to be coded. Obviously, the better the prediction, the smaller the error
signal and the more efficient the coding system.

Figure 8.4: Block diagram of a predictive coding system: (a) encoder, (b) decoder.

At the decoder, the same prediction is produced using previously decoded pels, and the received
error signal is added to reconstruct the current pel. A block diagram of a predictive coding system is
depicted in Figure 8.4.
Predictive coding is commonly referred to as differential pulse code modulation (DPCM). A special
case of this method is delta modulation (DM), which quantizes the error signal using two quantization
levels only.
Predictive coding can take many forms, depending on the design of the predictor and the quantizer
blocks. The predictor can use a linear or a nonlinear function of the previously decoded pels, it can
be 1-D (using pels from the same line) or 2-D (using pels from the same line and from previous
lines), and it can be fixed or adaptive. The quantizer also can be uniform or nonuniform, and it can be
fixed or adaptive.
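The encoder and decoder loops described above can be sketched for the simplest case: a fixed 1-D predictor that uses the previously reconstructed pel, followed by a fixed uniform quantizer. This is a minimal sketch under those assumptions, not a specific standardized scheme; the function names, the step size, and the initial prediction of 128 are illustrative choices.

```python
def quantize(error, step):
    """Uniform quantizer: map a prediction error to an integer index."""
    return int(round(error / step))


def dpcm_encode(line, step=4, initial=128):
    """Encode one line of pels; returns the quantizer indices to transmit."""
    indices = []
    prediction = initial  # the previous *reconstructed* pel, not the original
    for pel in line:
        index = quantize(pel - prediction, step)
        indices.append(index)
        # Mirror the decoder so that both sides form the same prediction.
        prediction = prediction + index * step
    return indices


def dpcm_decode(indices, step=4, initial=128):
    """Rebuild the line by adding each dequantized error to the prediction."""
    line = []
    prediction = initial
    for index in indices:
        prediction = prediction + index * step
        line.append(prediction)
    return line


line = [130, 132, 135, 140, 140, 138, 120, 119]
indices = dpcm_encode(line)
decoded = dpcm_decode(indices)
max_error = max(abs(a - b) for a, b in zip(line, decoded))
print(indices, decoded, max_error)
```

Because the encoder predicts from reconstructed rather than original pels, quantization errors do not accumulate: each decoded pel differs from the original by at most half a quantizer step.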




The minimal storage and processing requirements were partly responsible for the early popularity of
this method, when storage and processing devices were scarce and expensive resources. The
method, however, provides only a modest amount of compression. In addition, its performance is
highly dependent on the statistics of the input data, and it is very sensitive to errors (feedback through
the prediction loop can cause error propagation). As processing and storage devices became more
widely available, more complex but more efficient methods such as transform coding became more
popular. Despite this, predictive coding is still used in video coding, for example, in the lossless
coding of motion vectors.
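The error sensitivity mentioned above is easy to reproduce with a toy previous-pel decoder. In this hedged sketch (the decoder, step size, and index values are illustrative, not taken from any standard), a single transmitted index is corrupted and every later pel inherits the error through the prediction loop.

```python
def dpcm_decode(indices, step=4, initial=128):
    """Previous-pel DPCM decoder: each pel is the last pel plus a dequantized error."""
    line, prediction = [], initial
    for index in indices:
        prediction += index * step
        line.append(prediction)
    return line


clean = [0, 1, 1, 1, 0, 0, -5, 0]   # error indices as sent by the encoder
corrupt = list(clean)
corrupt[2] += 3                     # a single index damaged in transmission

good = dpcm_decode(clean)
bad = dpcm_decode(corrupt)
drift = [b - g for g, b in zip(good, bad)]
print(drift)  # [0, 0, 12, 12, 12, 12, 12, 12]
```

The error of 3 index units (12 gray levels at a step size of 4) never decays, which is why practical systems using this kind of predictor need some form of periodic reset.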
2) Transform Coding
Transform coding, developed more than two decades ago, has proven to be a very effective video
coding method. Today, it forms the basis of almost all video coding standards. Figure 8.5 shows a
block diagram of a typical transform coding system. The input frame is first segmented into
$N \times N$ blocks.

Figure 8.5: Block diagram of a transform coding system: (a) encoder, (b) decoder.




A unitary space-frequency transform is applied to each block to produce a block of transform
(spectral) coefficients that are then suitably quantized and coded. At the decoder, an inverse
transform is applied to reconstruct the frame.

The main goal of the transform is to decorrelate the pels of the input block. This is achieved by
redistributing the energy of the pels and concentrating most of it in a small set of transform
coefficients. This is known as energy compaction. The transform process can also be interpreted as
a coordinate rotation of the input or as a decomposition of the input into orthogonal basis functions
weighted by the transform coefficients.
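The coordinate-rotation view of energy compaction can be made concrete with the smallest possible case: a unitary 2-point transform that rotates pairs of neighboring pels by 45 degrees. This is an illustrative sketch with made-up pel values; the function name is hypothetical.

```python
import math


def rotate_45(a, b):
    """Unitary 2-point transform: a 45-degree rotation of the (a, b) axes."""
    s = 1 / math.sqrt(2)
    return s * (a + b), s * (a - b)


# Neighboring pels are usually similar, so the pairs lie close to the line a = b.
pairs = [(100, 102), (50, 49), (200, 203), (120, 120)]

energy_in = sum(a * a + b * b for a, b in pairs)
coeffs = [rotate_45(a, b) for a, b in pairs]
energy_first = sum(c0 * c0 for c0, _ in coeffs)
energy_second = sum(c1 * c1 for _, c1 in coeffs)

print(energy_in, energy_first + energy_second)  # equal up to rounding error
print(energy_second / energy_in)                # a very small fraction
```

The total energy is unchanged by the rotation, but almost all of it ends up in the first coefficient of each pair, so the second coefficient can be coarsely quantized or dropped at little cost.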

Compression comes about from two main mechanisms. First, low-energy coefficients can be
discarded with minimum impact on the reconstruction quality. Second, the human visual system (HVS)
has differing sensitivity to different frequencies. Thus, the retained coefficients can be quantized
according to their visual importance.
When choosing a transform, three main properties are desired: good energy compaction, data-
independent basis functions, and fast implementation.
The Karhunen-Loeve transform (KLT) is the optimal transform in an energy-compaction sense.
Unfortunately, this optimality is achieved only because the KLT basis functions depend on the
covariance matrix of the input block, that is, they are data dependent. Recomputing and transmitting
the basis functions for each block is a nontrivial task. These disadvantages severely limit the use of
the KLT in practical coding systems.
The performance of many suboptimal transforms with data-independent basis functions has been
studied. Examples are the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the
Walsh-Hadamard transform (WHT), and the Haar transform. It has been demonstrated that the DCT
has the closest energy-compaction performance to that of the optimum KLT. This has motivated the
development of a number of fast DCT algorithms. Due to these attractive features (near-optimum
energy compaction, data-independent basis functions, and fast algorithms), the DCT
has become the "workhorse" of most image and video coding standards.




The DCT was developed by Ahmed et al. in 1974. There are four slightly different versions of the
DCT, but the one commonly used for video coding is denoted by DCT-II. The 2-D DCT-II of an
$N \times N$ block of pels is given by

\[ F(u,v) = C(u)\,C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[ \frac{(2x+1)u\pi}{2N} \right] \cos\left[ \frac{(2y+1)v\pi}{2N} \right], \tag{8.13} \]

where $f(x,y)$ is the pel value at location $(x,y)$ within the block, $F(u,v)$ is the corresponding
transform coefficient, and

\[ C(u) = \begin{cases} \sqrt{1/N}, & u = 0, \\ \sqrt{2/N}, & \text{otherwise}. \end{cases} \tag{8.14} \]

The transform coefficient $F(0,0)$ at the top-left corner of the transformed block is called the DC
coefficient because it corresponds to the lowest (zero) frequency in both the horizontal and vertical
dimensions. The corresponding inverse DCT is given by

\[ f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v) \cos\left[ \frac{(2x+1)u\pi}{2N} \right] \cos\left[ \frac{(2y+1)v\pi}{2N} \right]. \tag{8.15} \]

It can be deduced from Equation (8.13) that the computational complexity of an $N \times N$ 2-D DCT
is of the order $O(N^4)$. However, one of the advantages of the DCT is that it is separable. This means
that a 2-D DCT can be separated into a pair of 1-D DCTs. Thus, to obtain the 2-D DCT of an
$N \times N$ block, a 1-D DCT is performed first on each of the N rows of the block and then on each
of the N columns of the resulting block (or vice versa). The same applies to the inverse DCT. This
reduces the complexity to $O(N^3)$. Further reductions in complexity can be achieved using a number
of fast DCT algorithms.
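The row-column decomposition can be sketched directly from the definition of the DCT-II. The code below is a plain, unoptimized illustration that assumes the orthonormal scaling (a factor of sqrt(1/N) for the zero-frequency term and sqrt(2/N) otherwise); it is not one of the fast algorithms, and the helper names and the 4 x 4 sample block are made up.

```python
import math


def c(u, n):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)


def dct_1d(v):
    """1-D DCT-II of a sequence of length N (direct O(N^2) evaluation)."""
    n = len(v)
    return [c(u, n) * sum(v[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          for x in range(n))
            for u in range(n)]


def idct_1d(coeffs):
    """Inverse of dct_1d."""
    n = len(coeffs)
    return [sum(c(u, n) * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for x in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rows_then_columns(block, transform):
    """Separable 2-D transform: 1-D on every row, then 1-D on every column."""
    rows = [transform(r) for r in block]
    cols = [transform(col) for col in transpose(rows)]
    return transpose(cols)


def dct_2d(block):
    return rows_then_columns(block, dct_1d)


def idct_2d(coeffs):
    return rows_then_columns(coeffs, idct_1d)


block = [[52, 55, 61, 66],
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]
coeffs = dct_2d(block)
restored = idct_2d(coeffs)
max_error = max(abs(a - b)
                for row_a, row_b in zip(block, restored)
                for a, b in zip(row_a, row_b))

print(round(coeffs[0][0], 2))  # 279.25, the sum of the pels divided by N
print(max_error < 1e-9)        # True: the inverse recovers the block
```

Each of the 2N one-dimensional transforms costs on the order of N^2 operations, which is where the overall N^3 figure comes from.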




Besides transform selection, a significant factor that affects transform coding performance and
computational complexity is the block size. In general, the use of smaller block sizes reduces
computational complexity. However, as will be discussed later, transform coding suffers from
blocking artifacts at very low bit rates. Such artifacts are more disturbing with smaller block sizes.
As a compromise between computational complexity and blocking artifacts, most transform coding
systems employ a block size of 8 x 8 or 16 x 16. Note that both sizes are powers of 2, which
simplifies computations.

Figure 8.6: Transform coefficient bit allocation

Another important factor in transform coding is bit allocation. This refers to the process of determining
which coefficients should be retained for coding and how coarsely each retained coefficient should be
quantized. There are two main approaches: zonal coding and threshold coding. In zonal coding the
retained coefficients are selected on the basis of maximum variance. Thus, the locations of the
retained coefficients with the largest variances are indicated by a zonal mask that is the same for all
blocks. Once the retained coefficients are decided, a number of methods can be used to decide the
number of bits allocated to each. One method is to choose the number of bits to be proportional to
the variance of the coefficient. Figure 8.6(a) shows a zonal mask with the allocated bits. Once the
number of bits allocated for each coefficient is determined, a different quantizer can be designed for
each coefficient.
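Zonal coding as described above can be sketched in two steps: estimate the variance of each coefficient position over a set of transformed blocks, then keep the highest-variance positions and share a bit budget among them in proportion to variance. This is only an illustration; the function names, the 2 x 2 toy coefficient blocks, and the bit budget are assumptions, and rounding means the allocated bits need not add up exactly to the budget.

```python
def coefficient_variances(blocks):
    """Variance of each coefficient position across a set of transformed blocks."""
    n = len(blocks[0])
    count = len(blocks)
    var = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            values = [b[u][v] for b in blocks]
            mean = sum(values) / count
            var[u][v] = sum((x - mean) ** 2 for x in values) / count
    return var


def zonal_bit_allocation(var, keep, total_bits):
    """Keep the `keep` highest-variance positions; bits proportional to variance."""
    n = len(var)
    ranked = sorted(((var[u][v], u, v) for u in range(n) for v in range(n)),
                    reverse=True)[:keep]
    kept_variance = sum(s for s, _, _ in ranked)
    bits = [[0] * n for _ in range(n)]   # 0 bits means "not retained"
    for s, u, v in ranked:
        bits[u][v] = round(total_bits * s / kept_variance)
    return bits


# Toy "training set" of four 2 x 2 coefficient blocks (made-up values).
blocks = [
    [[104, 3], [2, 1]],
    [[96, -3], [-2, -1]],
    [[104, 3], [-2, 1]],
    [[96, -3], [2, -1]],
]
variances = coefficient_variances(blocks)
bits = zonal_bit_allocation(variances, keep=3, total_bits=8)
print(variances)
print(bits)
```

The resulting table plays the role of the zonal mask: it is computed once and then applied unchanged to every block.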




One disadvantage of zonal coding is that the locations of the retained coefficients and the bits
allocated to them are fixed for all blocks. In threshold coding, however, the locations and the bit
allocation can be adapted to the characteristics of the block. For this reason, this method is
employed by most video coding standards. In threshold coding, the retained coefficients are selected
on the basis of maximum magnitude. Thus, only those coefficients whose magnitudes are above a
threshold are retained. In practice, the thresholding and the following quantization operations are
combined in one operation using a uniform threshold quantizer. In this case, a quantization matrix is
used to define the quantizer step size for each coefficient in the block.
A typical quantization matrix is given in Figure 8.6(b). Note that low-frequency coefficients (towards
the top-left corner) are more finely quantized (i.e., quantized with a smaller step size) for two
reasons. First, the DCT tends to concentrate most of the energy in low frequencies. Second, the HVS
is more sensitive to variations in low frequencies. Since in threshold coding the locations of the
retained coefficients vary from block to block, those locations need to be encoded. A commonly used
strategy is to zigzag scan the transform coefficients, as illustrated in Figure 8.6(c), in an attempt to
produce long runs of zeros, and then to encode the resulting array using run-length encoding (RLE).
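The chain of quantization, zigzag scanning, and run-length encoding can be sketched as follows. This is a simplified illustration, not the exact procedure of any standard: the 4 x 4 coefficient block and the quantization matrix are made-up values, all coefficients (including the DC term) are treated alike, and the run-length format of (zero run, value) pairs with an end-of-block marker is one common convention assumed here.

```python
def quantize_block(coeffs, qmatrix):
    """Uniform threshold quantizer: divide by the step size and round.

    Coefficients smaller than half a step become zero, i.e. are discarded.
    """
    n = len(coeffs)
    return [[int(round(coeffs[u][v] / qmatrix[u][v])) for v in range(n)]
            for u in range(n)]


def zigzag_order(n):
    """(row, column) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        # Even anti-diagonals run bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def run_length_encode(values):
    """Encode as (zero_run, nonzero_value) pairs followed by an end-of-block marker."""
    pairs, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # the trailing zeros are implied by the marker
    return pairs


# Hypothetical 4 x 4 coefficient block and step sizes (finer at low frequencies).
coeffs = [[279.0, -42.0, 10.0, 3.0],
          [-31.0, 12.0, -4.0, 1.0],
          [6.0, -5.0, 2.0, 0.5],
          [2.0, 1.0, -0.5, 0.2]]
qmatrix = [[8, 10, 16, 24],
           [10, 14, 20, 30],
           [16, 20, 28, 40],
           [24, 30, 40, 50]]

quantized = quantize_block(coeffs, qmatrix)
scanned = [quantized[u][v] for u, v in zigzag_order(4)]
encoded = run_length_encode(scanned)
print(scanned)
print(encoded)  # [(0, 35), (0, -4), (0, -3), (1, 1), (0, 1), 'EOB']
```

After quantization only five of the sixteen coefficients are nonzero, and the zigzag scan gathers them at the front of the array so that the long tail of zeros collapses into the single end-of-block marker.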
Compared to predictive coding, transform coding provides higher compression with less sensitivity to
errors and less dependence on the input data statistics. Its higher computational complexity and
storage requirements have been offset by advances in integrated circuit technology. One
disadvantage, however, is that when compression factors are pushed to the limit, three types of
artifacts start to occur: (i) "graininess" due to coarse quantization of some coefficients, (ii) "blurring"
due to the truncation of high-frequency coefficients, and (iii) "blocking artifacts," which refer to
artificial discontinuities appearing at the borders of neighboring blocks due to independent processing
of each block. Since blocking artifacts are the most disturbing, a number of methods have been
proposed to reduce them. Examples are overlapping blocks at the encoder, the use of the lapped
orthogonal transform, and postprocessing using filtering and image restoration techniques.

