

  Module 8: Video Coding Basics


  Lecture 41: Performance measures, Intraframe coding, Predictive and transform coding
 
 
 The Lecture Contains:

  Performance Measures
  Intraframe Coding

file:///D|/...Ganesh%20Rana)/MY%20COURSE_Ganesh%20Rana/Prof.%20Sumana%20Gupta/FINAL%20DVSP/lecture%2041/41_1.htm[12/31/2015 11:55:56 AM]


  
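  Performance Measures

  The compression efficiency of a video coding algorithm is commonly expressed by one of the
  following measures:

  \[ \text{compression ratio} = \frac{\text{size of original data}}{\text{size of compressed data}} \quad \text{(unitless)}, \tag{8.7} \]

  \[ \text{bits per pel} = \frac{\text{size of compressed data in bits}}{\text{number of pels}} \quad \text{(bits/pel)}, \tag{8.8} \]

  \[ \text{bit rate} = \frac{\text{size of compressed data in bits}}{\text{duration of the sequence in seconds}} \quad \text{(bits/s)}. \tag{8.9} \]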
 
  Another important aspect is the reconstruction quality. This can be assessed using a number of
subjective and objective measures.

Subjective measures are normally evaluated by showing the reconstructed video to a group of
subjects and asking for their views on the perceived quality. A number of subjective assessment
methodologies have been developed over the years. Examples are the double stimulus impairment
scale (DSIS), the double stimulus continuous quality scale (DSCQS), and the single stimulus
continuous quality scale (SSCQS). Despite their reliability, subjective quality experiments are
expensive and time-consuming.

Objective measures provide cheaper and faster alternatives. One commonly used objective measure
is the mean squared error (MSE), which is defined as

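\[ \mathrm{MSE} = \frac{1}{H V} \sum_{x=0}^{H-1} \sum_{y=0}^{V-1} \left[ f(x,y) - \hat{f}(x,y) \right]^2 \tag{8.10} \]

where $H$ and $V$ are the horizontal and vertical dimensions of the frame, respectively, and
$f(x,y)$ and $\hat{f}(x,y)$ are the pel values at location $(x,y)$ of the original and reconstructed
frames, respectively. Care should be taken to include color components and to take into account any
chroma subsampling. For example, the MSE of a reconstructed 4:2:0 color frame, in which each chroma
component has dimensions $H/2 \times V/2$, can be calculated as

\[ \mathrm{MSE} = \frac{1}{H V + 2 \cdot \frac{H}{2} \cdot \frac{V}{2}} \left[ \sum_{x=0}^{H-1} \sum_{y=0}^{V-1} \left[ f_Y(x,y) - \hat{f}_Y(x,y) \right]^2 + \sum_{x=0}^{H/2-1} \sum_{y=0}^{V/2-1} \left[ f_{C_b}(x,y) - \hat{f}_{C_b}(x,y) \right]^2 + \sum_{x=0}^{H/2-1} \sum_{y=0}^{V/2-1} \left[ f_{C_r}(x,y) - \hat{f}_{C_r}(x,y) \right]^2 \right] \tag{8.11} \]

where the subscripts $Y$, $C_b$, and $C_r$ denote the luma and the two chroma components.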

   
 

  
  A more common form of the MSE measure is the peak signal-to-noise ratio (PSNR), which is defined
  as

  \[ \mathrm{PSNR} = 10 \log_{10} \frac{f_{\max}^2}{\mathrm{MSE}} \quad \text{(dB)}, \tag{8.12} \]

  where $f_{\max}$ is the maximum possible pel value (for example, 255 for an 8-bit resolution component).
  Although this measure does not always correlate well with perceived video quality, its relative
  simplicity makes it a very popular choice in the video coding community. Thus, to facilitate
  comparisons with other algorithms reported in the literature, we adopt the PSNR measure. If
  accuracy is a major concern, then more sophisticated objective measures based on perceptual
  models can be used.
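
  The MSE and PSNR measures described above can be sketched in a few lines of Python. This is a
  minimal illustration, not a reference implementation: frames are assumed to be plain lists of
  rows, the function names (`mse`, `mse_color`, `psnr`) are hypothetical, and the color variant
  assumes that squared errors are pooled over all planes and divided by the total number of
  samples, which for 4:2:0 means a luma plane plus two quarter-size chroma planes.

```python
import math

def mse(original, reconstructed):
    """Mean squared error between two equally sized planes (lists of rows)."""
    total = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count

def mse_color(planes_orig, planes_recon):
    """MSE pooled over several planes (e.g. Y, Cb, Cr of a 4:2:0 frame).

    Assumption: every sample counts equally, so the result is the total
    squared error divided by the total number of samples in all planes.
    """
    total = 0.0
    count = 0
    for plane_o, plane_r in zip(planes_orig, planes_recon):
        n = sum(len(row) for row in plane_o)
        total += mse(plane_o, plane_r) * n
        count += n
    return total / count

def psnr(mse_value, peak=255):
    """PSNR in dB; peak is the maximum pel value (255 for 8-bit samples)."""
    if mse_value == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse_value)

# Toy 4:2:0 frame: 4x4 luma, 2x2 chroma planes.
y_o = [[100] * 4 for _ in range(4)]
y_r = [[102] * 4 for _ in range(4)]    # every luma pel off by 2
cb_o = cb_r = [[128] * 2 for _ in range(2)]
cr_o = [[128] * 2 for _ in range(2)]
cr_r = [[130] * 2 for _ in range(2)]   # every Cr pel off by 2

luma_mse = mse(y_o, y_r)                                       # 4.0
frame_mse = mse_color([y_o, cb_o, cr_o], [y_r, cb_r, cr_r])    # 80 / 24
print(luma_mse, frame_mse, round(psnr(luma_mse), 2))
```

  Note that the luma-only and pooled values differ, so a reported PSNR should always state which
  components it covers.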
 
  When testing a video coding algorithm, it is very important to subject it to a range of input video
sequences with different characteristics and a reasonable spread of data properties. The Moving
Picture Experts Group (MPEG) established a library of CCIR-601 test sequences divided into five
classes: class A (low spatial detail and low amount of motion), class B (medium spatial detail and
low amount of motion, or vice versa), class C (high spatial detail and medium amount of motion, or
vice versa), class D (stereoscopic), and class E (hybrid of natural and synthetic content). The first
three classes are the most relevant for this discussion. Thus, the discussion in this module uses
three test sequences, AKIYO, FOREMAN, and TABLE TENNIS, which represent classes A, B, and C,
respectively. The three sequences are at QSIF resolution and include 300 frames each. This resolution
is typical of the sequences used in very-low-bit-rate applications. Both AKIYO and TABLE TENNIS have
luma components of 176 x 120 and a frame rate of 30 frames/s, whereas FOREMAN has a luma component
of 176 x 144 and a frame rate of 25 frames/s. Figure 8.3 shows the luma component of the first
frame of each of the three test sequences.

Figure 8.3: Three test sequences
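
To give a feel for the numbers involved, the following sketch computes the uncompressed bit rate of
the test sequences and the compression ratio needed to reach a given channel rate. Several things
here are assumptions that the text does not state: 8 bits per sample, 4:2:0 chroma subsampling
(1.5 samples per luma pel), the hypothetical target of 64 kbit/s, and the function names. The
compression ratio is taken as original size divided by compressed size.

```python
def raw_bit_rate(width, height, fps, bits_per_sample=8, samples_per_pel=1.5):
    """Uncompressed bit rate in bits/s.

    samples_per_pel=1.5 models 4:2:0 sampling (one luma sample per pel plus
    two quarter-size chroma planes). Both defaults are assumptions.
    """
    bits_per_frame = width * height * samples_per_pel * bits_per_sample
    return int(bits_per_frame * fps)

def compression_ratio(original_bits, compressed_bits):
    """Unitless ratio of original size to compressed size."""
    return original_bits / compressed_bits

akiyo_rate = raw_bit_rate(176, 120, 30)     # also TABLE TENNIS
foreman_rate = raw_bit_rate(176, 144, 25)
target_rate = 64_000                        # hypothetical channel rate, bits/s

print(akiyo_rate, foreman_rate)                               # 7603200 7603200
print(round(compression_ratio(akiyo_rate, target_rate), 1))   # 118.8
```

Under these assumptions both formats produce the same raw rate, because the larger frame of
FOREMAN is offset by its lower frame rate.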
 
 

  
  Intraframe Coding
  Intraframe coding refers to video coding techniques that achieve compression by exploiting (reducing)
  the high spatial correlation between neighboring pels within a video frame. Such techniques are also
  known as spatial redundancy reduction techniques or still-image coding techniques.
 
  1) Predictive Coding
  Predictive coding was originally proposed by Cutler in 1952. In this method, a number of previously
  coded pels are used to form a prediction of the current pel. The difference between the pel and its
  prediction forms the signal to be coded. Obviously, the better the prediction, the smaller the error
  signal and the more efficient the coding system.
 
 

Figure 8.4: Block diagram of a predictive coding system: (a) encoder, (b) decoder.

At the decoder, the same prediction is produced using previously decoded pels, and the received
  error signal is added to reconstruct the current pel. A block diagram of a predictive coding system is
depicted in Figure 8.4.

Predictive coding is commonly referred to as differential pulse code modulation (DPCM). A special
case of this method is delta modulation (DM), which quantizes the error signal using two quantization
levels only.

Predictive coding can take many forms, depending on the design of the predictor and the quantizer
blocks. The predictor can use a linear or a nonlinear function of the previously decoded pels; it can
be 1-D (using pels from the same line) or 2-D (using pels from the same line and from previous
lines); and it can be fixed or adaptive. Likewise, the quantizer can be uniform or nonuniform, and it
can be fixed or adaptive.
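
The prediction loop can be illustrated with a minimal DPCM sketch. It is only one of the many
possible designs mentioned above: a fixed 1-D predictor that uses the previously decoded pel, and a
fixed uniform quantizer. The function names, the initial prediction of 128, and the step size are
illustrative assumptions, and no clipping to the valid pel range is performed.

```python
import math

def quantize(error, step):
    """Uniform quantizer: index of the multiple of step nearest to error."""
    return math.floor(error / step + 0.5)

def dequantize(index, step):
    return index * step

def dpcm_encode(line, step, initial_prediction=128):
    """Encode one line of pels into quantized prediction-error indices."""
    indices = []
    prediction = initial_prediction
    for pel in line:
        index = quantize(pel - prediction, step)
        indices.append(index)
        # Predict from the *reconstructed* pel, exactly as the decoder will,
        # so that quantization errors do not accumulate.
        prediction = prediction + dequantize(index, step)
    return indices

def dpcm_decode(indices, step, initial_prediction=128):
    """Rebuild the line by adding each received error to the prediction."""
    line = []
    prediction = initial_prediction
    for index in indices:
        pel = prediction + dequantize(index, step)
        line.append(pel)
        prediction = pel
    return line

line = [100, 102, 105, 105, 110, 120]
indices = dpcm_encode(line, step=4)
decoded = dpcm_decode(indices, step=4)
print(indices)   # [-7, 1, 0, 0, 2, 2]
print(decoded)   # [100, 104, 104, 104, 112, 120]
```

Because the encoder tracks the decoder's reconstruction, the error of each decoded pel stays
within half a step size. Corrupting a single index, however, shifts every later pel on the line,
which is the error propagation that predictive coding is known for.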
 
 

  
  The minimal storage and processing requirements of predictive coding were partly responsible for
  its early popularity, when storage and processing devices were scarce and expensive resources. The
  method, however, provides only a modest amount of compression. In addition, its performance is
  highly dependent on the statistics of the input data, and it is very sensitive to errors (feedback
  through the prediction loop can cause error propagation). As processing and storage devices became
  more available, more complex and more efficient methods such as transform coding became more
  popular. Despite this, predictive coding is still used in video coding, for example, in the lossless
  coding of motion vectors.

  2) Transform Coding
  Transform coding, developed more than two decades ago, has proven to be a very effective video
  coding method. Today, it forms the basis of almost all video coding standards. Figure 8.5 shows a
  block diagram of a typical transform coding system. The input frame is first segmented into N x N
  blocks.

Figure 8.5: Block diagram of a transform coding system: (a) encoder, (b) decoder.


 
 

  
  A unitary space-frequency transform is applied to each block to produce a block of transform
  (spectral) coefficients that are then suitably quantized and coded. At the decoder, an inverse
  transform is applied to reconstruct the frame.
 
  The main goal of the transform is to decorrelate the pels of the input block. This is achieved by
  redistributing the energy of the pels and concentrating most of it in a small set of transform
  coefficients. This is known as energy compaction. The transform process can also be interpreted as
  a coordinate rotation of the input or as a decomposition of the input into orthogonal basis functions
  weighted by the transform coefficients.
 
  Compression comes about from two main mechanisms. First, low-energy coefficients can be
  discarded with minimum impact on the reconstruction quality. Second, the human visual system (HVS)
  has differing sensitivity to different frequencies. Thus, the retained coefficients can be quantized
  according to their visual importance.

When choosing a transform, three main properties are desired: good energy compaction, data-
independent basis functions, and fast implementation.

The Karhunen-Loève transform (KLT) is the optimal transform in an energy-compaction sense.
Unfortunately, this optimality comes at a price: the KLT basis functions depend on the covariance
matrix of the input block. Recomputing and transmitting the basis functions for each block is a
nontrivial task. These disadvantages severely limit the use of the KLT in practical coding systems.

The performance of many suboptimal transforms with data-independent basis functions has been
studied. Examples are the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the
Walsh-Hadamard transform (WHT), and the Haar transform. It has been demonstrated that the DCT
has the closest energy-compaction performance to that of the optimum KLT. This has motivated the
development of a number of fast DCT algorithms. Due to these attractive features, i.e.,
near-optimum energy compaction, data-independent basis functions, and fast algorithms, the DCT
has become the "workhorse" of most image and video coding standards.
 
 

  
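  The DCT was developed by Ahmed et al. in 1974. There are four slightly different versions of the
  DCT, but the one commonly used for video coding is denoted by DCT-II. The 2-D DCT-II of an
  $N \times N$ block of pels is given by

  \[ F(u,v) = \frac{2}{N} C(u) C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[ \frac{(2x+1) u \pi}{2N} \right] \cos\left[ \frac{(2y+1) v \pi}{2N} \right] \tag{8.13} \]

  where $f(x,y)$ is the pel value at location $(x,y)$ within the block, $F(u,v)$ is the corresponding
  transform coefficient, and

  \[ C(w) = \begin{cases} 1/\sqrt{2}, & w = 0 \\ 1, & \text{otherwise.} \end{cases} \tag{8.14} \]

The transform coefficient $F(0,0)$ at the top-left corner of the transformed block is called the DC
coefficient because it corresponds to the lowest (zero) frequency in both the horizontal and vertical
dimensions. The corresponding inverse DCT is given by

  \[ f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u) C(v) F(u,v) \cos\left[ \frac{(2x+1) u \pi}{2N} \right] \cos\left[ \frac{(2y+1) v \pi}{2N} \right] \tag{8.15} \]

It can be deduced from Equation (8.13) that the computational complexity of an $N \times N$ 2-D DCT
is of the order $O(N^4)$, since each of the $N^2$ coefficients is a sum over $N^2$ pels. However, one
of the advantages of the DCT is that it is separable. This means that a 2-D DCT can be separated into
a pair of 1-D DCTs. Thus, to obtain the 2-D DCT of an $N \times N$ block, a 1-D DCT is performed first
on each of the N rows of the block and then on each of the N columns of the resulting block (or vice
versa). The same applies to the inverse DCT. This reduces the complexity to $O(N^3)$, since $2N$ 1-D
DCTs of $O(N^2)$ operations each are required. Further reductions in complexity can be achieved using
a number of fast DCT algorithms.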
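
The separability property can be checked numerically. The sketch below is a plain, unoptimized
illustration that assumes the common orthonormal DCT-II normalization (a scale factor of 2/N in
2-D, with C(0) = 1/sqrt(2) and C(w) = 1 otherwise); all function names are hypothetical. It computes
the 2-D DCT both directly and as row transforms followed by column transforms.

```python
import math

def _c(w):
    return math.sqrt(0.5) if w == 0 else 1.0

def dct_1d(v):
    """Orthonormal 1-D DCT-II of a list of N values."""
    n = len(v)
    return [math.sqrt(2.0 / n) * _c(u) *
            sum(v[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for x in range(n))
            for u in range(n)]

def idct_1d(c):
    """Inverse of dct_1d."""
    n = len(c)
    return [math.sqrt(2.0 / n) *
            sum(_c(u) * c[u] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for x in range(n)]

def transpose(block):
    return [list(col) for col in zip(*block)]

def _separable(block, transform_1d):
    # Transform every row, then every column of the intermediate result.
    rows_done = [transform_1d(row) for row in block]
    cols_done = [transform_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)

def dct_2d(block):
    return _separable(block, dct_1d)

def idct_2d(coeffs):
    return _separable(coeffs, idct_1d)

def dct_2d_direct(block):
    """Direct evaluation of the double sum: N^2 terms per coefficient."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = (2.0 / n) * _c(u) * _c(v) * s
    return out

flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))   # 800.0: all energy lands in the DC coefficient
```

For a flat block every coefficient other than the DC term is zero up to rounding error, which is
the extreme case of energy compaction.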
 
 
 

  
  Besides transform selection, a significant factor that affects transform coding performance and
  computational complexity is the block size. In general, the use of smaller block sizes reduces
  computational complexity. However, as will be discussed later, transform coding suffers from
  blocking artifacts at very low bit rates. Such artifacts are more disturbing with smaller block
  sizes. As a compromise between computational complexity and blocking artifacts, most transform
  coding systems employ a block size of 8 x 8 or 16 x 16. Note that both sizes are powers of 2, which
  simplifies computations.

Figure 8.6: Transform coefficient bit allocation

  Another important factor in transform coding is bit allocation. This refers to the process of determining
which coefficients should be retained for coding and how coarsely each retained coefficient should be
quantized. There are two main approaches: zonal coding and threshold coding. In zonal coding the
retained coefficients are selected on the basis of maximum variance. Thus, the locations of the
retained coefficients with the largest variances are indicated by a zonal mask that is the same for all
blocks. Once the retained coefficients are decided, a number of methods can be used to decide the
number of bits allocated to each. One method is to choose the number of bits to be proportional to
the variance of the coefficient. Figure 8.6(a) shows a zonal mask with the allocated bits. Once the
number of bits allocated for each coefficient is determined, a different quantizer can be designed for
each coefficient.
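
Zonal coding as described above can be sketched as follows. This is an illustrative assumption
about one possible realization, not the procedure of any particular standard: variances are
estimated over a set of training blocks, the `keep` positions with the largest variance form the
zonal mask, and a bit budget is shared in proportion to variance. The function names and the toy
numbers are hypothetical.

```python
def coefficient_variances(blocks):
    """Variance of each coefficient position over a list of N x N blocks."""
    n = len(blocks[0])
    count = len(blocks)
    variances = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            values = [block[u][v] for block in blocks]
            mean = sum(values) / count
            variances[u][v] = sum((x - mean) ** 2 for x in values) / count
    return variances

def zonal_bit_allocation(variances, keep, total_bits):
    """Return an N x N map of bits per coefficient (0 = discarded).

    The 'keep' positions with the largest variance are retained, and the
    bit budget is split in proportion to variance. Rounding means the
    allocated bits may not sum exactly to total_bits.
    """
    n = len(variances)
    positions = sorted(((u, v) for u in range(n) for v in range(n)),
                       key=lambda p: variances[p[0]][p[1]],
                       reverse=True)[:keep]
    retained = sum(variances[u][v] for u, v in positions)
    bits = [[0] * n for _ in range(n)]
    if retained == 0:
        return bits
    for u, v in positions:
        bits[u][v] = round(total_bits * variances[u][v] / retained)
    return bits

# Toy 2x2 example: keep the three highest-variance positions, 20 bits total.
toy_variances = [[60.0, 25.0],
                 [15.0, 0.0]]
print(zonal_bit_allocation(toy_variances, keep=3, total_bits=20))
# [[12, 5], [3, 0]]
```

The resulting map plays the role of the zonal mask with allocated bits: it is computed once and
then applied unchanged to every block.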
 
 

  
  One disadvantage of zonal coding is that the locations of the retained coefficients and the bits
  allocated to them are fixed for all blocks. In threshold coding, however, the locations and the bit
  allocation can be adapted to the characteristics of the block. For this reason, this method is
  employed by most video coding standards. In threshold coding, the retained coefficients are selected
  on the basis of maximum magnitude. Thus, only those coefficients whose magnitudes are above a
  threshold are retained. In practice, the thresholding and the following quantization operations are
  combined in one operation using a uniform threshold quantizer. In this case, a quantization matrix is
  used to define the quantizer step size for each coefficient in the block.

  A typical quantization matrix is given in Figure 8.6(b). Note that low-frequency coefficients (towards
  the top-left corner) are more finely quantized (i.e., quantized with a smaller step size) for two
  reasons. First, the DCT tends to concentrate most of the energy in low frequencies. Second, the human
  visual system (HVS) is more sensitive to variations in low frequencies. Since in threshold coding the
  locations of the retained coefficients vary from block to block, those locations need to be encoded.
  A commonly used strategy is to zigzag scan the transform coefficients, as illustrated in Figure
  8.6(c), in an attempt to produce long runs of zeros, and then to use run-length encoding (RLE) to
  encode the resulting array.
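
The chain of quantization, zigzag scanning, and run-length encoding can be sketched as below. The
details are assumptions made for illustration: the quantizer simply rounds to the nearest multiple
of the step size (actual standards often add a dead zone around zero), the 4 x 4 quantization matrix
and coefficient values are invented, and the output format of (zero run, level) pairs with an
"EOB" end-of-block marker is only one common convention.

```python
import math

def quantize_block(coeffs, qmatrix):
    """Divide each coefficient by its step size and round to nearest.

    Coefficients smaller than half a step become zero, so thresholding
    and quantization happen in a single operation.
    """
    out = []
    for coeff_row, step_row in zip(coeffs, qmatrix):
        row = []
        for c, step in zip(coeff_row, step_row):
            level = math.floor(abs(c) / step + 0.5)
            row.append(level if c >= 0 else -level)
        out.append(row)
    return out

def zigzag_order(n):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):           # s indexes the anti-diagonals
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()           # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order

def run_length_encode(values):
    """Encode as (number of preceding zeros, nonzero level) pairs plus EOB."""
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")                  # trailing zeros are implied
    return pairs

coeffs = [[80, -21, 4, 1],
          [17, 22, -2, 0],
          [-5, 2, 1, 0],
          [1, 0, 0, 0]]
qmatrix = [[8, 10, 14, 20],              # hypothetical step sizes, finer
           [10, 14, 20, 28],             # towards the top-left corner
           [14, 20, 28, 40],
           [20, 28, 40, 56]]

levels = quantize_block(coeffs, qmatrix)
scanned = [levels[r][c] for r, c in zigzag_order(4)]
print(run_length_encode(scanned))
# [(0, 10), (0, -2), (0, 2), (1, 2), 'EOB']
```

Sixteen coefficients collapse into four pairs and a marker, because the zigzag scan pushes the
zeroed high-frequency coefficients to the end of the array.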

Compared to predictive coding, transform coding provides higher compression with less sensitivity to
errors and less dependence on the input data statistics. Its higher computational complexity and
storage requirements have been offset by advances in integrated circuit technology. One
disadvantage, however, is that when compression factors are pushed to the limit, three types of
artifacts start to occur: (i) "graininess" due to coarse quantization of some coefficients, (ii) "blurring"
due to the truncation of high-frequency coefficients, and (iii) "blocking artifacts," which refer to
artificial discontinuities appearing at the borders of neighboring blocks due to independent processing
of each block. Since blocking artifacts are the most disturbing, a number of methods have been
proposed to reduce them. Examples are overlapping blocks at the encoder, the use of the lapped
orthogonal transform, and postprocessing using filtering and image restoration techniques.
 
 
