Unit 13 Video
Structure:
13.1 Introduction
Objectives
13.2 MPEG Compression Standard
MPEG-1
MPEG-2
MPEG-4
13.3 Compression Through Redundancy
Frame Types
13.4 Frame Compression
13.5 Summary
13.6 Terminal Questions
13.7 Answers
13.1 Introduction
In the previous unit, we discussed the concepts of digital audio, the role of
MIDI, its devices and messages. We also discussed how audio signals can
be processed and sampled. The previous unit concluded with a study on the
different techniques of audio compression such as Differential Pulse Code
Modulation, Adaptive Differential PCM, Adaptive Predictive Coding, and
Linear Predictive Coding. In this unit we will discuss the importance of video
compression and the various techniques used for it.
MPEG video compression is used in many current as well as emerging
products. It is at the heart of digital television products and other
applications. These applications benefit from video compression as they
require less storage space for archived video information, less bandwidth for
the transmission of the video information from one point to another or a
combination of both. It is also ideal for a wide variety of applications since it
is defined in two finalized international standards. Video data is traditionally
represented as a stream of images, called frames. Consecutive frames in a
sequence are usually very similar to one another, and this similarity creates
redundancy. In this unit we will discuss the different types of frames and the
various types of frame compression.
Objectives:
After studying this unit, you should be able to:
list and discuss various MPEG compression standards.
discuss the techniques of compression through redundancy.
list and explain the types of frames.
explain the concept of interframe and intraframe compression.
13.2 MPEG Compression Standard
13.2.1 MPEG-1
The MPEG-1 video standard is defined in ISO Recommendation 11172. It is
used primarily for VHS-quality audio and video on CD-ROM at a bit rate of
1.5 Mbps. MPEG-1 uses a combination of I-frames only, I- and P-frames
only, or I-, P- and B-frames. It does not support D-frames. The compression
algorithm used is based on the H.261 standard, with two main differences.
The first is that timestamps are inserted to enable the decoder to
resynchronize quickly when one or more macroblocks are corrupted or
missing. The number of macroblocks between two timestamps is known as
a slice, and a slice can comprise anything from 1 macroblock up to the
maximum number of macroblocks in a frame, which is 22. The second
difference arises because B-frames are supported by MPEG-1; these
increase the time interval between successive I- and P-frames, so a larger
search window is needed for motion estimation.
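To make the slice idea concrete, the sketch below shows in Python how
slice boundaries stop a corrupted macroblock from affecting anything
beyond its own slice. It is a simplification, not the MPEG-1 bitstream
syntax: the function names and the use of None to model corruption are
assumptions made for illustration.

    class CorruptedData(Exception):
        """Raised when a macroblock fails to decode."""
        pass

    def decode_macroblock(mb):
        # Illustrative stand-in for real macroblock decoding;
        # None models a corrupted or missing macroblock.
        if mb is None:
            raise CorruptedData
        return mb

    def decode_frame(slices):
        """Decode a frame given as a list of slices (lists of macroblocks).

        If any macroblock in a slice is corrupted, the whole slice is
        concealed and decoding resynchronizes at the next slice
        boundary, so the error cannot spread across the frame.
        """
        decoded = []
        for s in slices:
            try:
                decoded.extend([decode_macroblock(mb) for mb in s])
            except CorruptedData:
                # Conceal the damaged slice, e.g. by leaving gaps that a
                # real decoder would fill from the previous frame.
                decoded.extend([None] * len(s))
        return decoded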
13.2.2 MPEG-2
The MPEG-2 video standard is defined in ISO Recommendation 13818. It is
used in the recording and transmission of studio quality audio and video.
The basic coding structure of MPEG-2 video is the same as that of MPEG-1,
with some differences. Several levels of video resolution are possible:
Low: It is based on the SIF digitization format with a resolution of 352 x 288
pixels. It is comparable with the MPEG-1 format and produces VHS-quality
video. The audio is of CD quality and the target bit rate is up to 4 Mbps.
Main: It is based on the 4:2:0 digitization format with a resolution of 720 x 576
pixels. It produces studio-quality video and audio with a bit rate of up to
15 Mbps, or 20 Mbps with the 4:2:2 digitization format.
High 1440: It is based on the 4:2:0 digitization format with a resolution of
1440 x 1152 pixels. It is proposed for high-definition television (HDTV) at bit
rates of up to 60 Mbps, or 80 Mbps with the 4:2:2 digitization format.
High: It is based on the 4:2:0 digitization format with a resolution of
1920 x 1152 pixels. It is proposed for wide-screen HDTV at bit rates of up to
80 Mbps, or 100 Mbps with the 4:2:2 digitization format.
For each of the above levels, MPEG-2 provides five profiles: simple, main,
spatial resolution, quantization accuracy, and high. The four levels and five
profiles collectively form a two-dimensional table which acts as a framework
for all standards activities associated with MPEG-2. Since the Low level is
compatible with MPEG-1, we discuss here only the main profile at the main
level (MP@ML).
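The four levels can be summarized compactly in code. The Python
dictionary below is only an illustrative record of the 4:2:0 figures quoted
above (the names and structure are our own, not part of any MPEG API):

    # Resolution and target bit rate for the four MPEG-2 levels
    # (the 4:2:2 variants quoted above are higher).
    MPEG2_LEVELS = {
        "low":       {"resolution": (352, 288),   "bit_rate_mbps": 4},
        "main":      {"resolution": (720, 576),   "bit_rate_mbps": 15},
        "high-1440": {"resolution": (1440, 1152), "bit_rate_mbps": 60},
        "high":      {"resolution": (1920, 1152), "bit_rate_mbps": 80},
    }

    def pixels_per_frame(level):
        w, h = MPEG2_LEVELS[level]["resolution"]
        return w * h

    print(pixels_per_frame("main"))  # 414720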
MP@ML: The main application of MP@ML is digital television broadcasting.
Hence interlaced scanning is used with a frame refresh rate of 30Hz for
NTSC and 25Hz for PAL. The 4:2:0 digitization format is used and the bit
rate ranges from 4Mbps to 15 Mbps. The coding scheme is similar to
MPEG-1, the only difference being the scanning method: interlaced
scanning is used instead of progressive scanning, which results in each
frame having two fields, odd and even, each with a refresh rate of 60Hz for
NTSC or 50Hz for PAL. Therefore, for an I-frame, the DCT blocks have to
be derived from each macroblock, and there are two alternatives:
Field Mode: This method is used for encoding the DCT blocks when a large
amount of motion is present. The time difference between successive fields
is shorter than that between successive frames, so predicting from the
previous field gives a higher compression ratio. For example, a live cricket
match can be encoded using this mode, as there is a large amount of
movement.
Frame Mode: This method is used when there is a small amount of
movement. In that case there is little change between the two fields of a
frame, so the DCT blocks are encoded from each complete frame. For
example, a news broadcast can be encoded using this mode, since the
movement between the fields of a frame is small. (A simple heuristic for
choosing between the two modes is sketched after this list.)
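One plausible decision heuristic, sketched below in Python, compares how
well vertically adjacent lines correlate in frame order versus within each
field. This is an assumption made for illustration, not the normative
MPEG-2 procedure:

    import numpy as np

    def dct_mode(mb):
        """Choose field- or frame-mode DCT for a 16x16 luminance macroblock.

        With little motion, vertically adjacent lines (which belong to
        different fields) still correlate well, favouring frame mode.
        With large motion, lines correlate better within the same
        field, favouring field mode.
        """
        rows = mb.astype(int)
        # Total difference between vertically adjacent lines (frame order).
        frame_diff = np.abs(rows[:-1] - rows[1:]).sum()
        # Total difference between adjacent lines of the same field.
        field_diff = (np.abs(rows[0:-2:2] - rows[2::2]).sum() +
                      np.abs(rows[1:-2:2] - rows[3::2]).sum())
        return "frame" if frame_diff <= field_diff else "field"

On a macroblock taken from fast motion, the interleaved lines differ
sharply, frame_diff exceeds field_diff, and the function returns "field".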
Similarly, for encoding P- and B-frames, three different modes are possible:
field, frame and mixed. The field and frame modes work in the same way as
for I-frames, with the additional consideration of the preceding field for both
P- and B-frames and, for B-frames, the immediately succeeding field. In the
mixed mode, the motion vectors for both the frame and field modes are
computed, and the one with the smallest value is selected.
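A minimal sketch of the mixed-mode decision follows. The prediction
inputs and the use of the sum of absolute differences (SAD) as the cost
are illustrative assumptions:

    import numpy as np

    def sad(a, b):
        """Sum of absolute differences: the usual block-matching cost."""
        return int(np.abs(a.astype(int) - b.astype(int)).sum())

    def mixed_mode(mb, frame_pred, field_pred):
        """Pick the motion-compensation mode with the smaller residual.

        frame_pred and field_pred are the 16x16 predictions produced by
        frame-mode and field-mode motion estimation respectively; the
        mode whose prediction error is smallest is the one encoded.
        """
        costs = {"frame": sad(mb, frame_pred), "field": sad(mb, field_pred)}
        return min(costs, key=costs.get)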
13.2.3 MPEG-4
This standard relates to interactive audio and video over the Internet and
other entertainment networks. It has a feature that allows the user not only
to access a video sequence, but also to influence, independently, the
individual elements that make up the video sequence. In short, using the
MPEG-4 standard, the user can be given not only start, stop and pause
functions, but also the ability to reposition, delete and alter the movement
of each character within a scene. The MPEG-4 standard has a very high
coding efficiency and can therefore be used over low bit-rate networks
such as wireless networks and PSTNs (Public Switched Telephone
Networks). The MPEG-4 standard is a good alternative to the H.263
standard.
An important feature of MPEG-4 is content-based coding, which means that
before the video is compressed, each scene is defined in the form of a
background and one or more foreground audio-visual objects (AVO). Each
AVO is composed of one or more video objects and/or audio objects. Taking
the example of a news broadcast, the laptop in a scene can be considered a
single video object, while the news reader can be defined using both an
audio and a video object. Similarly, each video and audio object can in turn
be made up of a number of sub-objects. In the news reader example there is
movement only in the reader's eyes and mouth, so the reader's face can be
defined in the form of three sub-objects: one each for the head, eyes and
mouth. Once the content-based coding is done, the background and each
AVO are encoded separately.
Each audio and video object is described by an object descriptor which
enables a user to manipulate the objects. The language used to describe
the objects and define functions for manipulating the shape, size and
location of the objects is called Binary Format for Scenes (BIFS). In a
complete scene there may be many AVOs and some relation may exist
between these AVOs. So the relation between the AVOs is defined by a
scene descriptor. Each video scene is segmented into a number of Video
Object Planes (VOP), each of which corresponds to an AVO of interest. For
example, in the news broadcast, VOP0 represents the news reader, VOP1
represents the laptop on the table, and VOP2 represents the background
setting of the studio. Each VOP is encoded separately based on its shape,
motion and texture, as shown in figure 13.3.
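The object structure of a scene can be pictured with a small data model.
The sketch below mirrors the news broadcast example; the class names
and fields are illustrative assumptions, not BIFS syntax:

    from dataclasses import dataclass, field

    @dataclass
    class AVO:
        """An audio-visual object: one or more video and/or audio objects."""
        name: str
        has_video: bool = True
        has_audio: bool = False

    @dataclass
    class SceneDescriptor:
        """Relates the AVOs that make up a complete scene."""
        avos: list = field(default_factory=list)

        def vop_table(self):
            # Each AVO of interest maps to one Video Object Plane.
            return {f"VOP{i}": avo.name for i, avo in enumerate(self.avos)}

    scene = SceneDescriptor([
        AVO("news reader", has_audio=True),
        AVO("laptop"),
        AVO("studio background"),
    ])
    print(scene.vop_table())
    # {'VOP0': 'news reader', 'VOP1': 'laptop', 'VOP2': 'studio background'}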
The resulting bit stream from the VOP encoder is encoded for transmission
by multiplexing the VOPs together with the related object and scene
descriptors, as shown in figure 13.4(a). Similarly, at the receiver, the bit
stream is first demultiplexed and the individual VOPs are decoded, as
shown in figure 13.4(b).
Figure 13.4: MPEG-4 (a) Encoder (b) Decoder
The audio associated with an AVO is compressed using one of a number of
algorithms, depending on the available bit rate of the transmission channel
and the sound quality required. For example, CELP can be used for video
telephony. The audio encoder and decoder are included within the MPEG-4
encoder and decoder, as shown in figures 13.4(a) and (b).
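A minimal sketch of the multiplex/demultiplex step in figure 13.4 follows.
The one-byte identifier and four-byte length framing are assumptions made
for illustration, not the MPEG-4 systems syntax:

    def multiplex(encoded_vops):
        """encoded_vops: dict mapping a VOP id (0-255) to its encoded bytes."""
        stream = bytearray()
        for vop_id, payload in encoded_vops.items():
            # Tag each payload with its id and length so it can be
            # routed to the right VOP decoder at the receiver.
            stream += bytes([vop_id]) + len(payload).to_bytes(4, "big") + payload
        return bytes(stream)

    def demultiplex(stream):
        vops, i = {}, 0
        while i < len(stream):
            vop_id = stream[i]
            length = int.from_bytes(stream[i + 1:i + 5], "big")
            vops[vop_id] = stream[i + 5:i + 5 + length]
            i += 5 + length
        return vops

    # Round trip: three VOP bit streams in, the same three out.
    vops = {0: b"reader", 1: b"laptop", 2: b"studio"}
    assert demultiplex(multiplex(vops)) == vops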
Self Assessment Questions
1. MPEG-1 does not support D-frames. (True/False)
2. The MPEG-2 video standard is defined in ISO Recommendation
____________.
3. PSTN stands for __________________________________.
a. Public Standard Telephone Network
b. Public Switched Telephone Network
c. Private Switched Telephone Network
d. Private Standard Telephone Network
4. In VOP encoder the relation between the AVOs is defined by
_________________.
B-frames: These frames have the highest level of compression and,
because they are not involved in the coding of other frames, they do not
propagate errors.
When B-frames are received at the destination, they are first decoded and
then the resulting information is used together with the decoded information
of the preceding I- or P-frame and the immediately succeeding I- or P-frame
to derive the decoded frame contents. While decoding a B-frame, if either
the preceding or the succeeding I- or P-frame is not available, the time
required to decode the B-frame increases. Therefore, to minimize the
decoding time, all the required frames should be made available first, and
the frames are reordered. For example, suppose the uncoded frame
sequence
is I B B P B B P B B I ...
Let us number these frames for easy understanding:
I(0) B(1) B(2) P(3) B(4) B(5) P(6) B(7) B(8) I(9) ...
Then the reordered sequence would be:
I(0) P(3) B(1) B(2) P(6) B(4) B(5) I(9) B(7) B(8) ...
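The reordering rule can be expressed in a few lines: every B-frame is held
back until the reference frame it depends on has been sent. A minimal
sketch (the function name is our own) is:

    def transmission_order(display_order):
        """display_order: list like ['I','B','B','P',...]; returns (frame, index) pairs."""
        out, pending_b = [], []
        for i, f in enumerate(display_order):
            if f == "B":
                pending_b.append((f, i))   # hold until the next reference arrives
            else:                          # I- or P-frame: a reference
                out.append((f, i))
                out.extend(pending_b)      # held B-frames can now be decoded
                pending_b = []
        return out

    seq = list("IBBPBBPBBI")
    print("".join(f for f, _ in transmission_order(seq)))
    # IPBBPBBIBB
    print(" ".join(str(i) for _, i in transmission_order(seq)))
    # 0 3 1 2 6 4 5 9 7 8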
The more compression that is applied, the farther one pushes beyond the
perception threshold, with the result being a noticeable reduction in image
quality.
Self Assessment Questions
8. Spatially compressed frame is often referred to as an _____________.
9. CODEC is the short form of Compressor/Decompressor. (True/False)
10. ______________________ identifies the differences between frames
and stores only those differences.
13.5 Summary
This unit provides information about the various video compression
techniques. Let us recapitulate the unit content. The MPEG standards have
evolved from the original MPEG-1 audio and video compression schemes
into MPEG-2, now used for digital cable, satellite and terrestrial
broadcasting, DVD, high definition and many other applications, and
MPEG-4, which includes several improvements over MPEG-2, including
multi-directional motion vectors, ¼-pixel offsets, object coding for greater
efficiency, separate control of individual picture elements and behavior
coding for interactive use. Spatial and temporal compression are the two
categories of compression technique that help compress video efficiently.
Interframe and intraframe compression save large amounts of data
compared with the task of storing or transmitting a full description of every
pixel in the original image.
13.7 Answers
Self Assessment Questions
1. True
2. 13818
3. (b) Public Switched Telephone Network
4. scene descriptor.
5. redundancy
6. (c) motion compensation
7. Prediction span
8. intraframe
9. True
10. Temporal compression
Terminal Questions
1. The MPEG-1 video standard is defined in ISO Recommendation 11172.
Refer subsection 13.2.1.
2. Each video scene is segmented into a number of Video Object Planes
(VOP), each of which corresponds to an AVO of interest. Refer subsection
13.2.3.
3. The resulting bit stream from the VOP encoder is encoded for
transmission by multiplexing the VOPs together with the related object
and scene descriptors. Similarly, at the receiver, the bit stream is first
demultiplexed and the individual VOPs decoded. Refer subsection 13.2.3.
4. The repetition of the same information many times is known as
redundancy. Redundancy has been categorized into two types: spatial
and temporal redundancy. Refer Section 13.3.
5. Redundancy can be eliminated or reduced by taking into consideration
only those portions of a frame that involve some change compared to
the previous frame. Depending on the type of redundancy exploited,
there are different types of frames. Refer subsection 13.3.1.
6. Intraframe compression looks for redundant or imperceptible data within
each frame and eliminates it, but keeps each frame separate from all
others. Interframe compression looks for portions of the image which
don't change from frame to frame and encodes them only once. Refer
Section 13.4.