
Graphics and Multimedia Systems Unit 13

Unit 13 Video
Structure:
13.1 Introduction
Objectives
13.2 MPEG Compression Standard
MPEG-1
MPEG-2
MPEG-4
13.3 Compression Through Redundancy
Frame Types
13.4 Frame Compression
13.5 Summary
13.6 Terminal Questions
13.7 Answers

13.1 Introduction
In the previous unit, we discussed the concepts of digital audio, the role of
MIDI, its devices and messages. We also discussed how audio signals can
be processed and sampled. The previous unit concluded with a study on the
different techniques of audio compression such as Differential Pulse Code
Modulation, Adaptive Differential PCM, Adaptive Predictive Coding, and
Linear Predictive Coding. In this unit we will discuss the importance of video
compression and the various techniques used for it.
MPEG video compression is used in many current as well as emerging
products. It is at the heart of digital television products and other
applications. These applications benefit from video compression as they
require less storage space for archived video information, less bandwidth for
the transmission of the video information from one point to another or a
combination of both. It is also ideal for a wide variety of applications since it
is defined in two finalized international standards. Video data is traditionally
represented in the form of a stream of images, called frames. Consecutive
frames in such a sequence are often very similar, which creates redundancy.
In this unit we will discuss the different types of frames and the various types
of frame compression.

Objectives:
After studying this unit, you should be able to:
 list and discuss various MPEG compression standards.
 discuss the techniques of compression through redundancy.
 list and explain the types of frames.
 explain the concept of interframe and intraframe compression.

13.2 MPEG Compression Standard


Compression aims at lowering the total number of parameters required to
represent the signal, while maintaining good quality. These parameters are
then coded for transmission or storage. A result of compressing digital video
is that it becomes available as computer data, ready to be transmitted over
existing communication networks. The Moving Picture Experts Group (MPEG)
was formed by the ISO to formulate a set of standards relating to a range of
multimedia applications that involve the use of video with sound. There are
three standards, namely MPEG-1, MPEG-2 and MPEG-4. Each standard is
targeted at a particular application domain and describes how the audio and
video are compressed and integrated together. In this section we will discuss
the three standards in the context of video coding.
13.2.1 MPEG-1
The MPEG-1 video standard is defined in ISO Recommendation 11172. The
source intermediate format (SIF) resolution used is 352 x 240 pixels at
30 frames per second for NTSC and 352 x 288 pixels at 25 frames per
second for PAL.

MPEG-1 is basically used for VHS-quality audio and video on CD-ROM at a
bit rate of 1.5 Mbps. It uses a combination of I-frames only, I- and P-frames
only, or I-, P- and B-frames; it does not support D-frames. The compression
algorithm used is based on the H.261 standard, with two main differences.
The first is that timestamps are inserted to enable the decoder to
resynchronize quickly when one or more macroblocks are corrupted or
missing. The number of macroblocks between two timestamps is known as
a slice, and a slice can comprise from 1 macroblock up to the maximum
number of macroblocks in a line of the frame, which is 22. The second
difference is that MPEG-1 supports B-frames, which increases the time
interval
between I- and P-frames. The compressed bit stream produced by the
MPEG-1 video coder is hierarchical and is shown in figure 13.1.

Figure 13.1: MPEG-1 video bit stream

The compressed video is known as a sequence, which in turn consists of a
string of groups of pictures (GOPs), each containing a string of I-, P- or
B-frames/pictures in the considered sequence. Each frame is made up of N
slices, each of which comprises multiple macroblocks. Each macroblock
consists of six 8 x 8 pixel blocks, as shown in figure 13.1. For the decoder to
decode the received bit stream, each field has to be clearly specified within
the bit stream. The format of the bit stream is shown in figure 13.2.

Figure 13.2: Format of MPEG-1 bit stream
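To make this hierarchy concrete, the following minimal Python sketch models
the nesting of sequence, GOP, picture, slice, macroblock and 8 x 8 block
described above. The class and field names are invented for illustration and
are not defined by the standard.

from dataclasses import dataclass, field
from typing import List

# Hypothetical container classes mirroring the MPEG-1 bit stream nesting:
# sequence -> group of pictures -> picture -> slice -> macroblock -> 8x8 blocks.

@dataclass
class Macroblock:
    blocks: List[list] = field(default_factory=list)    # six 8x8 blocks: 4 Y, 1 Cb, 1 Cr

@dataclass
class Slice:
    macroblocks: List[Macroblock] = field(default_factory=list)  # 1 up to 22 macroblocks

@dataclass
class Picture:
    frame_type: str                                      # 'I', 'P' or 'B'
    slices: List[Slice] = field(default_factory=list)

@dataclass
class GroupOfPictures:
    pictures: List[Picture] = field(default_factory=list)

@dataclass
class Sequence:
    gops: List[GroupOfPictures] = field(default_factory=list)

# Example: a sequence holding one GOP with an I-picture followed by a P-picture.
video = Sequence(gops=[GroupOfPictures(pictures=[Picture('I'), Picture('P')])])
print(len(video.gops[0].pictures))                       # -> 2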

13.2.2 MPEG-2
The MPEG-2 video standard is defined in ISO Recommendation 13818. It is
used in the recording and transmission of studio quality audio and video.
The basic coding structure of MPEG-2 video is the same as that of MPEG-1,
with some differences. In this case there are different levels of video
resolution possible:
Low: It is based on the SIF digitization format with a resolution of 352 x 288
pixels. It is comparable with the MPEG-1 format and produces VHS-quality
video. The audio is of CD quality and the target bit rate is up to 4 Mbps.
Main: It is based on the 4:2:0 digitization format with a resolution of 720 x 576
pixels. It produces studio-quality video and audio at bit rates up to 15 Mbps,
or 20 Mbps with the 4:2:2 digitization format.
High 1440: It is based on the 4:2:0 digitization format with a resolution of
1440 x 1152 pixels. It is proposed for high-definition television (HDTV) at bit
rates up to 60 Mbps, or 80 Mbps with the 4:2:2 digitization format.
High: It is based on the 4:2:0 digitization format with a resolution of
1920 x 1152 pixels. It is proposed for wide-screen HDTV at bit rates up to
80 Mbps, or 100 Mbps with the 4:2:2 digitization format.
For each of the above levels, MPEG-2 provides five profiles: simple, main,
spatial resolution, quantization accuracy, and high. The four levels and five
profiles collectively form a two-dimensional table which acts as a framework
for all standards activities associated with MPEG-2. Since the Low level is
compatible with MPEG-1, we discuss here only the main profile at the main
level (MP@ML).
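The four levels described above can also be summarised as a small lookup
table. The Python sketch below simply restates the resolutions and target bit
rates given in the text; the dictionary layout itself is not part of the standard.

# MPEG-2 levels as described above: resolution and target bit rate
# (4:2:0 format; the higher figure in the comments applies to 4:2:2).
MPEG2_LEVELS = {
    "Low":       {"resolution": (352, 288),   "max_bit_rate_mbps": 4},
    "Main":      {"resolution": (720, 576),   "max_bit_rate_mbps": 15},   # 20 with 4:2:2
    "High 1440": {"resolution": (1440, 1152), "max_bit_rate_mbps": 60},   # 80 with 4:2:2
    "High":      {"resolution": (1920, 1152), "max_bit_rate_mbps": 80},   # 100 with 4:2:2
}

for level, params in MPEG2_LEVELS.items():
    w, h = params["resolution"]
    print(f"{level:10s} {w} x {h}  up to {params['max_bit_rate_mbps']} Mbps")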
MP@ML: The main application of MP@ML is digital television broadcasting.
Hence interlaced scanning is used with a frame refresh rate of 30 Hz for
NTSC and 25 Hz for PAL. The 4:2:0 digitization format is used and the bit
rate ranges from 4 Mbps to 15 Mbps. The coding scheme is similar to
MPEG-1 with only a difference in the scanning method. It uses interlaced
scanning instead of progressive scanning, which results in each frame
consisting of two fields (odd and even), each with a refresh rate of 60 Hz for
NTSC or 50 Hz for PAL. Therefore, for an I-frame, the DCT blocks have to be
derived for each macroblock, and there are two alternatives:

 Field Mode: This method is used for encoding the DCT blocks when a
large amount of motion is present. The time difference between successive
fields is shorter than that between successive frames, due to which the
compression ratio is higher. For example, a live cricket match can be
encoded using this mode as there is a large amount of movement.
 Frame Mode: This method is used when there is only a small amount of
movement. In this case the two fields of a frame differ very little, so the
DCT blocks are encoded from each complete frame. For example, a news
broadcast can be encoded using this mode since there is little movement
between successive fields.
Similarly, for encoding P- and B-frames, three different modes are possible:
field, frame and mixed. The field and frame modes work in the same way as
for I-frames, with the additional consideration of the preceding frame or field
for both P- and B-frames and, for B-frames, the immediately succeeding field.
In the mixed mode, the motion vectors of both the frame and field modes are
computed and the one with the smallest value is selected.
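The mixed-mode decision can be pictured as follows. This is only an
illustrative sketch, assuming a simplified sum-of-absolute-differences cost
(real encoders use more elaborate measures): both candidate predictions
are evaluated and the cheaper one is kept.

import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences - a simple prediction-error cost."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def choose_mode(current_mb: np.ndarray, frame_pred: np.ndarray, field_pred: np.ndarray) -> str:
    """Pick whichever prediction (frame mode or field mode) gives the smaller error,
    mirroring the mixed-mode rule of computing both and selecting the smaller value."""
    return "frame" if sad(current_mb, frame_pred) <= sad(current_mb, field_pred) else "field"

# Toy 16x16 macroblock and two candidate predictions.
rng = np.random.default_rng(0)
mb = rng.integers(0, 256, (16, 16))
print(choose_mode(mb, frame_pred=mb + 1, field_pred=mb + 5))   # -> "frame"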
13.2.3 MPEG-4
This standard is related to interactive audio and video over the internet and
other entertainment networks. This standard has a feature that allows the
user not only to access a video sequence, but also to influence
independently the individual elements that make up the video sequence. In
short, using the MPEG-4 standard, the user is capable not only of starting,
stopping and pausing the video, but also of repositioning, deleting and
altering the movement of individual characters within a scene. The MPEG-4
standard has a very high coding efficiency and therefore it can be used over
low bit-rate networks such as wireless networks and the PSTN (Public
Switched Telephone Network). The MPEG-4 standard is a good alternative
to the H.263 standard.
An important feature of MPEG-4 is content-based coding, which signifies that
before the video is compressed, each scene is defined in the form of a
background and one or more foreground audio-visual objects (AVOs). Each
AVO is composed of one or more video objects and/or audio objects. Taking
the example of a news broadcast, the laptop in a scene can be considered
as a single video object while the news reader can be defined using both
an audio and a video object. Similarly, each video and audio object can
further be made up of many sub-objects. In the news reader example there
is movement in his/her eyes and mouth only, so the reader's face can be
defined in the form of three sub-objects: one each for the head, eyes and
mouth. Once the content-based coding is done, the encoding of the
background and each AVO is carried out separately.
Each audio and video object is described by an object descriptor which
enables a user to manipulate the objects. The language used to describe
the objects and define functions for manipulating the shape, size and
location of the objects is called Binary Format for Scenes (BIFS). In a
complete scene there may be many AVOs and some relation may exist
between these AVOs. So the relation between the AVOs is defined by a
scene descriptor. Each video scene is segmented into a number of Video
Object Planes (VOPs), each of which corresponds to an AVO of interest. For
example, in the news broadcast example, VOP0 represents the news
reader, VOP1 represents the laptop on the table, and VOP2 represents the
background setting of the studio. Each VOP is encoded separately based
on its shape, motion and texture, as shown in figure 13.3.

Figure 13.3: VOP encoder
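As a rough illustration of content-based coding, the sketch below models a
news-broadcast scene as a background plus foreground AVOs, each mapped
to a VOP that would be encoded separately. The class names and fields are
invented for illustration; they are not BIFS syntax.

from dataclasses import dataclass, field
from typing import List

# Illustrative structures only - not actual BIFS. They just mirror the idea that
# a scene descriptor relates a set of audio-visual objects, each with its own
# object descriptor and its own video object plane (VOP).

@dataclass
class AudioVisualObject:
    name: str
    has_audio: bool
    has_video: bool
    vop_id: int                     # which VOP carries this object's video

@dataclass
class SceneDescriptor:
    background_vop: int
    objects: List[AudioVisualObject] = field(default_factory=list)

scene = SceneDescriptor(
    background_vop=2,               # VOP2: studio background
    objects=[
        AudioVisualObject("news reader", has_audio=True,  has_video=True,  vop_id=0),
        AudioVisualObject("laptop",      has_audio=False, has_video=True,  vop_id=1),
    ],
)

# Each VOP referenced here would be encoded separately by shape, motion and texture.
for avo in scene.objects:
    print(f"VOP{avo.vop_id}: {avo.name}")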

The resulting bit stream from the VOP encoder is encoded for transmission
by multiplexing the VOPs together with the related object and scene
descriptors as shown in figure 13.4(a). Similarly, at the receiver, the bit
stream is first demultiplexed and the individual VOPs decoded. The
individual VOPs together with the object and scene descriptors are then
used to create the video frame that is played out on the terminal, as shown
in figure 13.4(b).

(a)

(b)
Figure 13.4: MPEG-4 (a) Encoder (b) Decoder

The audio associated with an AVO is compressed using any one of several
algorithms, depending on the available bit rate of the transmission channel and the
sound quality required. For example, CELP can be used for video
telephony. The audio encoder and decoder are included inside the MPEG-4
encoder and decoder as shown in figure 13.4(a) & (b).
Self Assessment Questions
1. MPEG-1 does not support D-frames. (True/False)
2. The MPEG-2 video standard is defined in ISO Recommendation
____________.
3. PSTN stands for __________________________________.
a. Public Standard Telephone Network
b. Public Switched Telephone Network
c. Private Switched Telephone Network
d. Private Standard Telephone Network
4. In VOP encoder the relation between the AVOs is defined by
_________________.

13.3 Compression through Redundancy


Assume that you are watching a news broadcast on television. What would
you observe apart from listening to the news? One can notice the movement
of the person's lips or eyes, whereas the background remains the same. That
means the sequence of frames making up the news broadcast contains a
great deal of repeated information. The repetition of the same information
many times is known as redundancy. Therefore, redundancy can be
eliminated or reduced by taking into consideration only those portions of a
frame that involve some change compared with the previous frame.
Redundancy has been categorized into two types:
Spatial Redundancy: If neighboring pixels are similar within each frame
then it is known as spatial redundancy. For example, in an image of blue sky
there will be repeated pixel values. Figure 13.5 shows the spatial redundancy
attribute.

Figure 13.5: Spatial redundancy
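A simple way to see how spatial redundancy can be exploited is run-length
coding of identical neighboring pixels, as in the blue-sky example. The sketch
below is only a toy illustration; practical intra-frame coders use transform
coding rather than plain run-length coding of pixels.

from itertools import groupby

def run_length_encode(pixels):
    """Collapse runs of identical pixel values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

# A strip of "blue sky": the same pixel value repeated many times.
sky_row = [135] * 60 + [136] * 4
print(run_length_encode(sky_row))      # -> [(135, 60), (136, 4)]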

Temporal Redundancy: In a video, when adjacent frames are similar it is
known as temporal redundancy. The news broadcast example discussed
above falls under this type of redundancy. Figure 13.6 shows the temporal
redundancy attribute.
A video contains much spatial and temporal redundancy. Therefore
compression can be achieved by exploiting spatial and temporal
redundancy inherent to video. As only the difference (movement) between
the successive frames is being considered, the accuracy of the predicted
frames depends on the estimation of the difference. The method that
performs estimation is known as motion estimation. Since the estimation
method is not exact, some additional information must be sent that indicates
the small differences between the predicted and actual positions of the
moving segments involved. This process is known as motion
compensation.

Figure 13.6: Temporal redundancy
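A minimal block-matching sketch of motion estimation and compensation is
given below (Python with NumPy). The exhaustive search over a small
window is a simplification of what real encoders do: the encoder finds where
a block of the current frame best matches the previous frame (motion
estimation) and then codes only the small residual between that prediction
and the actual block (motion compensation).

import numpy as np

def estimate_motion(ref: np.ndarray, block: np.ndarray, top: int, left: int, search: int = 4):
    """Exhaustive search for the displacement of `block` (located at top, left in the
    current frame) within +/- search pixels of the same position in the reference frame."""
    h, w = block.shape
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + h <= ref.shape[0] and x + w <= ref.shape[1]:
                cost = np.abs(ref[y:y+h, x:x+w].astype(int) - block.astype(int)).sum()
                if best_cost is None or cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best

# Toy example: the current block is the reference block shifted by (2, 1).
rng = np.random.default_rng(1)
reference = rng.integers(0, 256, (32, 32))
current_block = reference[10:18, 9:17]            # appears at (8, 8) in the current frame
dy, dx = estimate_motion(reference, current_block, top=8, left=8)
prediction = reference[8+dy:16+dy, 8+dx:16+dx]
residual = current_block - prediction             # motion compensation: only this is coded
print((dy, dx), int(np.abs(residual).sum()))      # -> (2, 1) 0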

13.3.1 Frame types


Depending on the type of redundancy exploited, different types of frames
are used, as shown in figure 13.7. The details of each frame type are
explained below.

Figure 13.7: Frame types

I-frames (Intra-coded frames): Frames that are encoded independently of
any other frames are called I-frames (as shown in figure 13.7). These exploit
spatial redundancy within a frame. Each frame is treated as a separate
picture/image and the Y, Cr and Cb matrices are encoded separately using
JPEG. Therefore the compression achieved is relatively small. The first
frame of every video sequence must necessarily be an I-picture, since it
does not have any past reference. I-frames must be repeated at regular
intervals to avoid loss of the whole picture, because during transmission a
frame can get corrupted and may be lost. The number of frames/pictures
between successive I-frames is known as a Group of Pictures (GOP).
Typical GOP values are 3 - 12. When I-frames are received at the
destination, they are decoded immediately to reconstruct the original image,
since they do not depend on other frames.
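Since an I-frame is coded much like a JPEG image, the essential per-block
step is a 2-D DCT followed by quantization. The sketch below (Python with
SciPy) shows that step for one 8 x 8 block; the single uniform quantizer is an
assumption made for brevity, as real coders use a full quantization matrix.

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """2-D type-II DCT of an 8x8 block, as used in JPEG-style intra coding."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

# One 8x8 block of a smooth region: most DCT energy ends up in a few coefficients.
block = np.tile(np.linspace(100, 110, 8), (8, 1))
coeffs = dct2(block - 128)                         # level-shift as in JPEG, then transform
quantized = np.round(coeffs / 16)                  # crude uniform quantizer (illustrative only)
reconstructed = idct2(quantized * 16) + 128
print(int(np.count_nonzero(quantized)))            # only a few non-zero coefficients remain
print(float(np.abs(block - reconstructed).max()))  # small reconstruction error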
P-frame (Predictive Frame): Frames that are encoded with reference to
either the previous coded I-frame or P-frame are called P-frame. P-frames
are encoded using a combination of motion estimation and motion
compensation. P-frames exploit temporal redundancy and therefore a
P-frame holds only the changes in the image from the previous frame. The
number of frames between a P-frame and the immediately preceding
I-frame or P-frame is called the prediction span. When P-frames are
received at the destination, they are decoded first and then the resulting
information is used together with the decoded information of the preceding
I- or P-frame to derive the decoded frame contents.
B-frame (Bi-directionally predicted frame): Frames that are encoded with
reference to both the previous and future coded frames are known as
B-frame. These frames have the highest level of compression and because
they are not involved in the coding of other frames, they do not propagate
errors.

Consider the encoded frame sequence I B B P B B P B B I. Figure 13.8
shows the relation between the different frame types for this sequence of
frames.

Figure 13.8: Relation between different frames

When B-frames are received at the destination, they are first decoded and
then the resulting information is used together with the decoded information
of the preceding I- or P-frame and the immediately succeeding I- or P-frame
to derive the decoded frame contents. While decoding a B-frame, if either
the preceding or succeeding I- or P-frame is not available, the time required
to decode the B-frame increases. Therefore, to minimize the decoding time,
all the required frames should be made available before the B-frames that
depend on them, and the frames are reordered accordingly. For example, if
the uncoded frame sequence is
I B B P B B P B B I ...
then, numbering these frames 0 to 9 for easy understanding,
I0 B1 B2 P3 B4 B5 P6 B7 B8 I9 ...
the reordered (transmission) sequence would be
I0 P3 B1 B2 P6 B4 B5 I9 B7 B8 ...
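The reordering above can be expressed as a small function. The sketch
below assumes a simple GOP pattern and moves each I- or P-frame ahead
of the B-frames that reference it, which is essentially what the transmission
order does.

def transmission_order(display_order):
    """Reorder frames so every I- or P-frame is sent before the B-frames
    that reference it (a sketch for a simple GOP pattern)."""
    reordered, pending_b = [], []
    for index, ftype in enumerate(display_order):
        if ftype == 'B':
            pending_b.append((index, ftype))   # hold B-frames until their future reference is sent
        else:                                  # I- or P-frame: a reference frame
            reordered.append((index, ftype))
            reordered.extend(pending_b)
            pending_b = []
    return reordered + pending_b

frames = list("IBBPBBPBBI")
print(transmission_order(frames))
# -> [(0,'I'), (3,'P'), (1,'B'), (2,'B'), (6,'P'), (4,'B'), (5,'B'), (9,'I'), (7,'B'), (8,'B')]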

PB-frame: A PB-frame consists of two consecutive frames (a P- and a
B-frame) coded as one unit. Figure 13.9 shows this type of frame.

Figure 13.9: PB-frame

D-frame: D-frames, also called DC-frames, are independent frames in which
only the DC coefficients are encoded. D-frames are of very low quality and
are never referenced by I-, P- or B-frames.
Self Assessment Questions
5. The repetition of same information many times is known as
________________.
6. Additional information that indicates the small differences between the
predicted and actual positions of the moving segments is called
____________
a. motion estimation b. motion calculation
c. motion compensation d. motion segmentation
7. Number of frames between a P-frame and the immediately preceding
I-frame or P-frame is called ___________________.

13.4 Frame Compression


Following are the two common and powerful ways of reducing video data.
Spatial Compression: Spatial compression refers to compression applied
to a single frame of data. The frame is compressed independently of any
surrounding frames. The compression can be lossless or lossy. A spatially
compressed frame is often referred to as an "intraframe", I-frame or
keyframe. Intraframe compression looks for redundant or imperceptible data
within each frame and eliminates it, but keeps each frame separate from all
others.
Temporal Compression: Temporal compression identifies the differences
between frames and stores only those differences. Unchanged areas are
simply repeated from the previous frame(s). A temporally compressed frame
is often referred to as an "interframe" or P-frame. Interframe compression
looks for portions of the image which do not change from frame to frame
and encodes them only once. This usually involves saving a keyframe,
which is a single, complete frame, and then saving a series of delta frames
which contain only the changes for each subsequent frame. Interframe
compression is a commonly used method which works by comparing each
frame in the video with the previous one. If the frame contains areas where
nothing has moved, the system simply issues a short command that copies
that part of the previous frame, bit-for-bit, into the next one. If sections of the
frame move in a simple manner, the compressor emits a command that tells
the decompressor to shift, rotate, lighten, or darken the copy – a longer
command, but still much shorter than intraframe compression. Interframe
compression works well for programs that are likely to be simply played
back by the viewer, but can cause problems if the video sequence needs to
be edited. Since interframe compression copies data from one frame to
another, if the original frame is simply cut out (or lost in transmission), the
following frames cannot be reconstructed properly.
All CODECs use some form of intraframe compression. CODEC is short for
"Compressor/Decompressor" and refers to the particular scheme used to
reduce the video data. More efficient CODECs generally use both intraframe
and interframe compression. Intraframe CODECs are well suited for
acquisition and post-production because they keep every frame whole and
separate, making it easy to cut the video at any point. CODECs which use
both are generally better suited to distribution because they allow much
lower file sizes with higher picture quality.
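The keyframe/delta-frame idea described above can be sketched as follows.
This toy example uses plain pixel differences, with no motion compensation
or entropy coding; it only illustrates the principle that unchanged areas cost
almost nothing to store.

import numpy as np

def encode_sequence(frames, keyframe_interval=5):
    """Toy temporal compression: store a full keyframe periodically and, in between,
    only the pixel differences (delta frames) from the previous frame."""
    encoded, previous = [], None
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0:
            encoded.append(("key", frame.copy()))
        else:
            encoded.append(("delta", frame - previous))   # unchanged areas become zeros
        previous = frame
    return encoded

def decode_sequence(encoded):
    frames, previous = [], None
    for kind, data in encoded:
        previous = data.copy() if kind == "key" else previous + data
        frames.append(previous)
    return frames

# Mostly static frames (a talking head): the deltas are almost entirely zeros.
rng = np.random.default_rng(2)
base = rng.integers(0, 256, (4, 4))
sequence = [base.copy() for _ in range(6)]
sequence[3][0, 0] += 1                               # a tiny change in one frame
decoded = decode_sequence(encode_sequence(sequence))
print(all(np.array_equal(a, b) for a, b in zip(sequence, decoded)))   # -> True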
Digital video compression is always a trade-off between file size and image
quality. Lossless CODECs usually cannot reduce the data by more than
half. Most CODECs attempt to reduce file size by throwing away redundant
information first, then eliminating information that is least likely to be
perceived by the viewer (perceptual encoding). The more compression is
applied, the farther one pushes beyond the perception threshold, with the
result being a noticeable reduction in image quality.
Self Assessment Questions
8. Spatially compressed frame is often referred to as an _____________.
9. CODEC is the short form of Compressor/Decompressor. [True/False]
10. ______________________ identifies the differences between frames
and stores only those differences.

13.5 Summary
This unit provides information about the various video compression
techniques. Let us recapitulate the unit content. The MPEG family of
standards has evolved from the original MPEG-1 audio and video
compression schemes into MPEG-2, now used for digital cable, satellite and
terrestrial broadcasting, DVD, high definition and many other applications,
and MPEG-4, which includes several improvements over MPEG-2, such as
multi-directional motion vectors, ¼-pixel offsets, object coding for greater
efficiency, separate control of individual picture elements and behavior
coding for interactive use. Spatial and temporal compression are the two
categories of technique that help to compress video efficiently. Interframe
and intraframe compression save large amounts of data compared with
storing or transmitting a full description of every pixel in the original image.

13.6 Terminal Questions


1. Explain MPEG-1 video standard.
2. Discuss the role of VOP encoder.
3. Explain MPEG-4 encoder and decoder.
4. Write a brief note on spatial and temporal redundancy.
5. Explain the different types of frames.
6. Differentiate between interframe and intraframe compression.

13.7 Answers
Self Assessment Questions
1. True
2. 13818
3. (b) Public Switched Telephone Network
4. scene descriptor.
5. redundancy
6. (c) motion compensation
7. Prediction span
8. intraframe
9. True
10. Temporal compression

Terminal Questions
1. The MPEG-1 video standard is defined in ISO Recommendation 11172.
Refer subsection 13.2.1.
2. Each video scene is segmented into a number of Video Object Planes
(VOP), each of which corresponds to an AVO of interest. Refer sub
section 13.2.3.
3. The resulting bit stream from the VOP encoder is encoded for
transmission by multiplexing the VOPs together with the related object
and scene descriptors. Similarly, at the receiver, the bit stream is first
demultiplexed and the individual VOPs decoded. Refer Sub Section
13.2.3.
4. The repetition of the same information many times is known as
redundancy. Redundancy has been categorized into two types: spatial
and temporal redundancy. Refer Section 13.3.
5. Redundancy can be eliminated or reduced by taking into consideration
only those portions of a frame that involve some change compared with
the previous frame. Depending on the type of redundancy exploited,
there are different types of frames. Refer sub section 13.3.1.
6. Intraframe compression looks for redundant or imperceptible data within
each frame and eliminates it, but keeps each frame separate from all
others. Interframe compression looks for portions of the image which
don't change from frame to frame and encodes them only once. Refer
Section 13.4.
