MPEG Coding For Variable Bit Rate Video Transmission
For real-time transmission of broadcast-quality video on ATM-based B-ISDN,
the intraframe to interframe ratio and the quantizer scale are two key parameters
that can be used to control a video source in a network environment. Their
impact on the traffic characteristics of the coder provides insights into the cell
arrival process for an MPEG source.
Pramod Pancha and Magda El Zarki
Several coding algorithms have been proposed for variable bit rate (VBR) video. This variety in coding techniques has made the task of characterizing video sources difficult. Previous studies of bit rate statistics of video sources have utilized coding algorithms, such as conditional replenishment [1-3], that are less efficient than current coding techniques. A more recent study, which provides insights into long run statistics for coded video sequences, uses block-based intraframe discrete cosine transform (DCT) coding [4], but does not utilize any interframe coding. Models that utilize results from these studies, while important, may not be valid for source coders which utilize newer coding techniques.
A source characterization method that models the properties of a large variety of sources is ideally required for analysis. One approach for achieving such models for video sources is to identify the basic components of video coding systems and from these properties predict bit rates for sources [5]. The goal is to then obtain a universal source characterization framework using these components. The basic components that are needed to model a source, however, may vary substantially among different classes of coding algorithms. As a result, the parameters needed in order to use this approach may not be easy to obtain.
Another approach, which is the one that we have taken in this article, is to obtain video source models for coders that utilize a standard algorithm that can be applied to a multitude of video services. The output stream of a video coder that complies with the Moving Picture Experts Group (MPEG) coding standard is studied with a National Television System Committee (NTSC) quality video sequence as the input. Because the MPEG video coding algorithm has been proposed for a variety of applications, we also investigate the effect of changing the coding parameters on the statistics of interest.
The MPEG Coder
The MPEG coding algorithm was developed primarily for storage of compressed video on digital storage media. Provisions were therefore made in the algorithm to enable random access, fast forward/reverse searches, and other features when decoding from any digital storage media. However, the coding standard is flexible enough to be suitable for a much wider range of video applications. Recent applications of MPEG-like coding algorithms have appeared for a variety of video services from multimedia workstations to high definition television.
The decision parameters for the compression algorithm (source coder) were derived from the MPEG Video Simulation Model Three (SM3) report issued by the Simulation Model Editorial Group of the ISO-IEC working group. The input video sequence for the MPEG algorithm was a three-minute, 40-second sequence from the movie Star Wars. This sequence was chosen such that it contained a mix of scenes with and without motion. The sequence, which was digitized from laser disc, has a frame resolution of 512 x 480 pixels, which is similar to NTSC broadcast quality. It should be noted that this resolution is significantly higher than the 352 x 240 pixel resolution that is usually recommended for achieving video tape quality.
Two features of the algorithm that are important for characterizing the source are the coding modes and the coding layers that are utilized. These can strongly influence the output traffic characteristics of the coder. We present an outline of the coding algorithm below with emphasis on the parameters and processes that are of interest in understanding the cell generation process. A more detailed description of the MPEG coding standard can be obtained elsewhere (for example, see [6]).
PRAMOD PANCHA is working toward his Ph.D. in electrical engineering at the University of Pennsylvania.
MAGDA EL ZARKI is assistant professor in the Department of Electrical Engineering at the University of Pennsylvania and part-time professor of telecommunication networks at Delft University of Technology.
0163-6804/94/$04.00 1994 © IEEE
IEEE Communications Magazine
May 1994
Coding Layers
The MPEG simulation model utilized for generating VBR video can be organized into four layers. The layers are arranged below in order of increasing spatial size:
Block - A block is the smallest coding unit in the MPEG algorithm. It is made up of 8 x 8 pixels and can be one of three types: luminance (Y), red chrominance (Cr), and blue chrominance (Cb). The block is the basic unit in intraframe DCT coded frames.
Figure 1. Frame layer coding sequence.
Macroblock - A macroblock is the basic coding unit in the MPEG algorithm. A macroblock is a 16 x 16 pixel segment in a frame. Since each chrominance component has one-half the vertical and horizontal resolution of the luminance component, a macroblock consists of 4 Y, 1 Cr, and 1 Cb block.
Slice - A slice, which is a horizontal strip within a frame, is the basic processing unit in the MPEG coding scheme. Coding operations on blocks and macroblocks can only be performed when all pixels for a slice are available. A slice is an autonomous unit since coding a slice is done independently from its neighbors. In this study, each frame contains 30 slices of size 512 x 16 pixels.
Picture - A picture in MPEG terminology is the basic unit of display and corresponds to a single frame in a video sequence. The spatial dimensions of a frame are variable and are determined by the requirements of an application. In this study a frame size of 512 x 480 pixels corresponding to NTSC television quality was utilized.
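The layer sizes quoted above fit together arithmetically. The following is a minimal sketch, assuming only the figures given in the text (512 x 480 frames, 16 x 16 macroblocks, full-width slices one macroblock high, six blocks per macroblock); the function name is illustrative and not part of any MPEG software.

```python
# Frame geometry used in this study (from the text): 512 x 480 luminance pixels.
FRAME_WIDTH, FRAME_HEIGHT = 512, 480
MACROBLOCK_SIZE = 16                # a macroblock covers 16 x 16 pixels
BLOCKS_PER_MACROBLOCK = 4 + 1 + 1   # 4 Y + 1 Cr + 1 Cb blocks of 8 x 8


def layer_counts(width, height):
    """Return (slices per frame, macroblocks per slice,
    macroblocks per frame, blocks per frame)."""
    mbs_per_slice = width // MACROBLOCK_SIZE   # a slice is a full-width strip
    slices = height // MACROBLOCK_SIZE         # each strip is 16 pixels high
    macroblocks = slices * mbs_per_slice
    return slices, mbs_per_slice, macroblocks, macroblocks * BLOCKS_PER_MACROBLOCK


print(layer_counts(FRAME_WIDTH, FRAME_HEIGHT))  # (30, 32, 960, 5760)
```

The 30 slices per frame recovered here match the figure stated for the Slice layer.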
Coding Modes
In the MPEG coding scheme, a choice of several coding modes is available at the frame and macroblock layer. For encoding frames, three modes can be utilized: intraframe (I), predictive (P), and interpolative (B). Frames coded in the intraframe mode can only contain macroblocks coded in intraframe mode. For predictive and interpolative (interframe) coded frames, however, macroblocks can be coded in either motion-compensated or intraframe modes. The choice of coding modes influences the coding efficiency of the MPEG algorithm. A typical sequence of coding modes is shown in Fig. 1. However, not all modes may be suitable for all applications. Since interpolative coding is non-causal, it may not be suitable for some applications because it requires out-of-sequence transmission of frames. This would lead to longer reconstruction delays and larger buffers at the receiver, which may be undesirable. For situations where this extra delay is permissible, interpolative coding may be used to achieve higher compression ratios. However, it should be pointed out that the complexity of interpolative coding may make it more difficult to achieve real-time rates when coding NTSC quality television. In addition, by utilizing non-causal techniques, the video service becomes more susceptible to errors since there are more dependencies.
Figure 2. Basic MPEG coding loop.
In this study, coding has been restricted to intraframe and predictive interframe modes, and all interpolative coded frames in Fig. 1 have been replaced by predictive coded frames. The ratio of total frames (intra + predictive) to intraframes is determined by a parameter N. With this parameter the coder can be configured as an intraframe coder (N = 1) or a mixed intra/interframe coder as in Fig. 1 (N = 16).
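The role of N can be illustrated with a short sketch that labels each frame of a sequence as intraframe or predictive. It assumes, as the article does not state explicitly, that the intraframe is the first frame in every group of N frames; the function name is hypothetical.

```python
def frame_types(num_frames, n):
    """Label each frame 'I' (intraframe) or 'P' (predictive).

    Assumption: the intraframe is the first frame of every group of
    n frames, so the ratio of total frames to intraframes is n.
    """
    return ["I" if i % n == 0 else "P" for i in range(num_frames)]


# N = 1 gives a pure intraframe coder.
print("".join(frame_types(8, 1)))   # IIIIIIII

# N = 16 gives one intraframe followed by 15 predictive frames.
sequence = frame_types(32, 16)
print(sequence.count("I"), sequence.count("P"))  # 2 30
```

With a very large N relative to the sequence length, only the first frame is intraframe coded, which corresponds to the interframe end of the range.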
The basic coding loop for the MPEG algorithm is shown in Fig. 2. The main processing step of the MPEG algorithm is block-based motion compensation in interframe coding and block-based DCT in intraframe coding. The flow chart for coding a macroblock is depicted in Fig. 3. In the intraframe coding mode, a frame is processed block by block. A two dimensional 8 x 8 DCT is applied to all luminance and chrominance blocks in the frame. The coefficients obtained from this transformation are then fed to a quantizer. The step size for the quantization of each DCT coefficient is obtained from an 8 x 8 quantizer matrix. This matrix ensures that the low frequency DCT coefficients are quantized more accurately (with a small step size) while the high frequency coefficients are quantized more coarsely. The DC coefficient of the DCT, which remains fairly constant throughout a frame, is coded differentially within a slice using the DC value of the previous block as a predictor. This predictor is reset at the beginning of every slice. Once the 64 quantized coefficients for each block in a macroblock are obtained, variable length codes are generated for the macroblock. As a result of quantization, a significant proportion of the quantized coefficients will be zero valued. These blocks of coefficients can therefore be efficiently coded by relying on a combination of run length coding and modified Huffman coding. For the most frequent combinations of zero run lengths and the non-zero coefficient values that follow these zero runs, variable length codes have been defined in the MPEG standard. If no variable length code exists for a particular combination of zero run length and coefficient value, a fixed length code is used instead.
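The run length step and the differential DC coding described above can be sketched as follows. This is an illustration only: the coefficients are assumed to be already arranged in the coder's scan order, trailing zeros are simply dropped, the DC predictor reset value is a placeholder argument, and no actual variable length code tables are reproduced.

```python
def run_level_pairs(ac_coeffs):
    """Turn quantized AC coefficients (already in scan order) into
    (zero_run, level) pairs. Trailing zeros produce no pair."""
    pairs, run = [], 0
    for c in ac_coeffs:
        if c == 0:
            run += 1                 # extend the current run of zeros
        else:
            pairs.append((run, c))   # zeros skipped, then the non-zero value
            run = 0
    return pairs


def dc_differences(dc_values, predictor_reset=0):
    """Differential DC coding within one slice: each DC value is coded
    as its difference from the previous block's DC value.
    predictor_reset is a hypothetical value for the start of the slice."""
    diffs, predictor = [], predictor_reset
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc
    return diffs


ac = [5, 0, 0, -2, 1, 0, 0, 0, 0, 3] + [0] * 53   # 63 AC coefficients
print(run_level_pairs(ac))               # [(0, 5), (2, -2), (0, 1), (4, 3)]
print(dc_differences([100, 104, 101]))   # [100, 4, -3]
```

Each (zero_run, level) pair would then be mapped to a variable length code, or to a fixed length code when the standard defines no variable length code for that combination.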
Figure 3. MPEG macroblock coding flow chart.
In the predictive coding mode, frames are processed on a macroblock basis. The initial step in the coding process is macroblock based motion estimation. A square area of 32 pixels around each macroblock is chosen as the motion vector search area. A potential motion vector is identified within the search area, which minimizes the absolute macroblock difference between the current macroblock and a displaced (predicted) macroblock in the previous frame. This motion vector is then utilized in determining the coding mode for the current macroblock.
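Motion estimation of this kind can be sketched as a block matching search. The example below assumes an exhaustive search and a sum of absolute differences as the matching criterion; the article does not say how the search is carried out, and the function names, the `search` range, and the tiny test frames are illustrative only.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between the size x size block of `cur`
    at (cy, cx) and the block of `ref` at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))


def best_motion_vector(cur, ref, y, x, size=16, search=16):
    """Try every displacement within +/- `search` pixels (an assumed range)
    and return ((dy, dx), cost) for the best match inside the frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(cur, ref, y, x, ry, rx, size)
                if best is None or cost < best[1]:
                    best = ((dy, dx), cost)
    return best


# Toy 8 x 8 frames: one bright pixel moves down and right by one position.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
ref[2][3] = 9
cur[3][4] = 9
vector, cost = best_motion_vector(cur, ref, y=2, x=2, size=4, search=2)
print(vector, cost)  # (-1, -1) 0
```

The vector points from the current block back to its best match in the previous frame, and the remaining cost is the quantity a coder would compare against its mode decision threshold.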
If the absolute difference between the current macroblock and the motion compensated macroblock is less than a threshold, the motion vectors are variable length coded and transmitted. The prediction error, after motion compensation is applied to the macroblock, is encoded using DCT. The coefficients obtained from the transform of the prediction error are then quantized. Unlike the intraframe quantizer matrix, the quantization step size for these error terms is constant for all coefficients. The quantized coefficients are then variable length coded as in the intraframe mode. If the prediction error is large when compared to the total energy in the macroblock, predictive coding mode is not utilized and all blocks in the current macroblock are coded using the intraframe mode as previously described.
With a fixed quantizer matrix for each coding mode, it is possible to scale all quantization levels using a single parameter, the quantizer scale, q. In constant bit rate (CBR) coding, this parameter is utilized to dynamically vary the bit rate to ensure that the number of bits generated for a sequence is within the target range of bit rates. However, varying q leads to a variable image quality in the reconstructed frame at the receiver. In VBR video, q is not varied in order to maintain a fixed image quality for all frames.
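The effect of the quantizer scale can be illustrated with a simplified quantizer. The sketch assumes that the step size is the plain product of the matrix entry and q with rounding to the nearest integer; the exact scaling and rounding rules of the standard are not reproduced, and the matrix entries and coefficient values are made up for the example.

```python
def quantize_block(coeffs, matrix, q):
    """Quantize DCT coefficients with step size = matrix entry * q.
    Assumption: simple product and round-to-nearest, not the
    standard's exact arithmetic."""
    return [round(c / (m * q)) for c, m in zip(coeffs, matrix)]


# Hypothetical first six coefficients of a block and matching matrix entries.
coeffs = [420, -120, 60, 25, -9, 4]
matrix = [8, 16, 19, 22, 26, 27]

for q in (4, 8, 16):
    levels = quantize_block(coeffs, matrix, q)
    nonzero = sum(1 for v in levels if v != 0)
    print(q, levels, nonzero)
```

A larger q leaves fewer non-zero levels to code, which is why increasing q lowers the bit rate at the cost of image quality.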
Modeling MPEG Video
Previous statistical studies and models have restricted their attention to the number of bits generated per frame for a given coding algorithm. The total packet arrivals per frame is then obtained by converting this number to the equivalent number of ATM cells. In our study we look at the traffic characteristics of the MPEG video source at the frame, slice and macroblock layers. The statistical properties of the coder at these layers are examined to obtain a better understanding of the cell generation process [7].
The MPEG algorithm uses several parameters that, when varied, can lead to a significant change in the characteristics of a video source. Two of these parameters, N, the interframe to intraframe ratio, and q, the quantizer scale, are especially interesting as they can potentially be used to control a video source through network feedback. For example, if the network cell loss rate temporarily increases, then, by decreasing N, the frequency of intraframes in the MPEG coding sequence could be increased. Since intraframes halt error propagation, this action will ensure that the duration of error propagation is short. Similarly, if the network detects the onset of congestion, a feedback message could be sent to all video sources that would require them to decrease their bit rates. This can best be accomplished by increasing q, at the expense of a temporary loss of quality. There is therefore a need to characterize the video source not only under normal operation but also to identify the changes that occur when these parameters are modified.
Figure 4. Bits-generated-per-frame for sequence (enlarged 30-second sequence).
Frame Layer
Figure 6. Cells-per-frame autocorrelation function for standard coder.
At the frame layer, the traffic characteristics of the source can be modeled by assuming a simple packetization process. The bits generated from the MPEG coding process are placed in ATM cells. A user payload size of 44 bytes is utilized in order to accommodate a 4 byte ATM adaptation layer (AAL) header for each cell. One constraint that is imposed on the cell arrival process is that bits generated for a frame will not be combined with the bits of the next frame. This implies that at least one cell is always sent for each frame. This assumption is justified since the MPEG frame header information that must be transmitted for each frame will require transmission of at least one cell.
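The frame layer packetization rule amounts to a ceiling division with a floor of one cell. The sketch below uses the 44 byte user payload from the text (a 48 byte ATM cell payload less the 4 byte AAL header); the function name and the sample frame sizes are illustrative.

```python
import math

CELL_PAYLOAD_BYTES = 48   # payload of an ATM cell
AAL_HEADER_BYTES = 4      # adaptation layer header assumed in the text
USER_PAYLOAD_BITS = (CELL_PAYLOAD_BYTES - AAL_HEADER_BYTES) * 8  # 352 bits


def cells_per_frame(frame_bits):
    """Cells needed for one frame. Bits of different frames never share
    a cell, and at least one cell is sent for every frame."""
    return max(1, math.ceil(frame_bits / USER_PAYLOAD_BITS))


print(cells_per_frame(0))       # 1
print(cells_per_frame(352))     # 1
print(cells_per_frame(353))     # 2
print(cells_per_frame(80000))   # 228
```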
Statistical Results - The coder was initially run several times to obtain an estimate of q at a “normal” operating point. This was done by determining if the visual quality of the reconstructed frames for various q was acceptable. For a q of 8, the image quality was judged to be good for all viewed frames. When q is increased beyond this value, some blocking effects are visible in some reconstructed frames. In this study, we investigated the effect of varying q (3 values: 4, 8, and 16) on the traffic characteristics of the coded video bit stream. The effect of varying the type of coder from a pure intraframe coder (N = 1) to a mixed intra/interframe coder (N = 16) to a pure interframe coder (N = 2688) was also investigated.
Figure 7. Effect of N on cells-per-frame average.
Figure 8. Relation of frame and slice layer model.
Figure 4 illustrates a sample bits per frame time series for a (N = 16, q = 8) coder. The mean bit rate for this sample is 1.2 Mb/s. This value is within the range we would expect for this coder, since the frame resolution we utilize is higher than the format recommended in the MPEG specifications. The periodic impulses in the bits per frame trace, which occur every 16 frames, are due to the intraframes. This phenomenon is not observed with the N = 1 and N = 2688 coders. For coding schemes that only utilize intraframe or interframe techniques, the large changes in bit rate are caused mainly by scene changes that occur approximately every 6 seconds. Although these transitions as a result of scene changes still exist in mixed intra/interframe MPEG coding, their effect is masked to some degree by the periodic intraframe coding process.
The traffic characteristics of the video source are presented in Figs. 5 and 6. An interesting observation is that the mean and variance of the cell arrivals per frame for the (N = 16, q = 4) and (N = 2688, q = 4) coders are very similar. In fact, as Fig. 7 shows, when N is increased the average cell arrivals per frame decreases until N = 32 but increases for larger values of N. The initial decrease in bit rate as N increases occurs because greater compression can be achieved with predictive coding. However, if intraframe coding is not utilized for long periods, errors from predictive coding begin to accumulate, leading to less efficient coding, which causes the increase in bit rates seen in Fig. 7.
The probability mass function (pmf) or distribution of ATM cells generated per frame for a coder for different (N, q) combinations is shown in Fig. 5. It is interesting to note that the distribution for a (N = 1, q = 4) coder is quite different from a (N = 16, q = 4) coder, in that both the average number of cells per frame is higher and the spread of the distribution is greater. It can also be noticed that increasing q causes the shape of the pmf to become more one sided. This observation is somewhat similar to those made in [3], where it was shown that a source of lower resolution, like video conferencing, had a skewed probability density function when compared to a broadcast television source.
Figure 9. Cells-per-slice pmf for standard coder.
The autocorrelation function for cells generated per frame for 3 different values of N and q is shown in Fig. 6. For the case where both interframe and intraframe coding were utilized (N = 16), impulses in the autocorrelation function occur every N frames, confirming that significant correlation exists between the intraframes in the cell generation process. When only intraframe (N = 1) coding is utilized, correlation between cells generated per frame exists for up to 300 frames.
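The periodic impulses can be reproduced on a toy trace. The sketch below computes a sample autocorrelation (normalized by the lag-zero sum, one common convention, assumed here) for a synthetic cells-per-frame series in which every sixteenth frame is a large intraframe; the cell counts are invented and carry none of the scene-dependent variation of real video.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag, normalized so that
    the value at lag 0 is 1."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    total = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / total
            for k in range(max_lag + 1)]


# Synthetic trace: 300 cells for each intraframe, 100 for each predictive frame.
trace = [300 if i % 16 == 0 else 100 for i in range(160)]
acf = autocorrelation(trace, 32)
print(round(acf[8], 3), round(acf[16], 3), round(acf[32], 3))
```

The value at lag 16 stands far above the values at neighboring lags, mirroring the impulses every N frames described for the mixed coder.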
Slice Layer
If network resource allocation decisions can be made using a more detailed source characterization, a slice layer model may be a better choice than a frame layer model (since the MPEG coding process is essentially slice-based). In addition, slices are the smallest self-contained unit of variable length coded data in the MPEG coding scheme and are therefore the smallest independent units of measure. It should also be noted that the time frame in which cells are generated for an encoded slice is on the order of 1 ms. This time scale is closer to the cell level process in ATM than the cells-per-frame statistics used in previous studies of video sources and should therefore be more accurate for modeling. This leads us to the hypothesis that a better source characterization can be obtained by examining the process at the slice layer.
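The time scales involved follow from the frame rate and the frame geometry. The sketch below is a rough calculation, assuming 30 slices per frame and 32 macroblocks per slice (from the 512 x 480 frame size) and treating the frame rate as an assumption, since it is not stated here; 24 and 30 frames/s are shown for comparison.

```python
def unit_periods(frame_rate, slices_per_frame=30, macroblocks_per_slice=32):
    """Return (slice period, macroblock period) in seconds, assuming the
    frame time is divided evenly among slices and macroblocks."""
    frame_period = 1.0 / frame_rate
    slice_period = frame_period / slices_per_frame
    macroblock_period = slice_period / macroblocks_per_slice
    return slice_period, macroblock_period


for fps in (24, 30):   # assumed frame rates
    slice_s, mb_s = unit_periods(fps)
    print(f"{fps} frames/s: slice {slice_s * 1e3:.2f} ms, "
          f"macroblock {mb_s * 1e6:.1f} us")
```

Both assumed frame rates give a slice period slightly above 1 ms and a macroblock period of a few tens of microseconds.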
Figure 10. Cells-per-slice autocorrelation function for standard coder.
Figure 11. Burst size and interarrival time pmf.
Table 1. Cells-per-frame statistics for several sequences.
Indiana Jones 2
500.73
1095
604.14
Last Emperor 2
League of Our Own
809
News
1252
1452
Last Emperor 1
352.24
537.43
852
222.67
110.02
204.28
172.00
248.21
113.20
The slice layer model's relation to the frame layer model is depicted in Fig. 8. For the slice layer model we assume that the time to code each slice is essentially a constant, T, since the coding process for each macroblock is very similar. Cell arrivals from successive slices in a frame can therefore be depicted as a periodic process, where the cells generated for a slice are assumed to arrive as a burst of cells at the end of a slice. In this model, the bits of one slice will not be combined with those of other slices and one cell is always generated for every slice. Note that the number of cells generated per frame will be slightly higher than for the frame layer model since half filled cells will exist at the end of each slice. However, the increase in cell generation rate is small (e.g., 227.69 versus 242.1 cells/frame for N = 16, q = 4) and treating slices as independent packetization units can be useful for error recovery mechanisms.
Statistical Results - The statistics of the cells-per-slice process (Fig. 9) show the same trends as the cells-per-frame process (e.g., the standard deviation for mixed intra/interframe coders is proportionally larger when compared to the mean than for intraframe coders). The cells-per-slice statistics therefore capture the same properties as the cells-per-frame statistics. A comparison of the pmf of cells per slice (Fig. 9) with that of cells per frame (Fig. 5) shows the cells-per-slice distribution to be smoother. This is especially true for the high quality sources (q = 4) and for the tails of the distributions.
The autocorrelation function for the number of cells generated in a slice is extremely high for lags of up to 1200 slices (40 frames), as shown in Fig. 10. In fact, the structure of the correlations is more pronounced in the cells-per-slice data than in the cells-per-frame data because the effect of cross-correlations between different slices within a frame is separated from the correlations between the same slice in adjacent frames. The resulting periodicity that occurs in the autocorrelation function every 30 slices can be seen in Fig. 10. The correlation peaks from the intraframe components are also visible but are less pronounced than for the cells-per-frame autocorrelation function. This observation implies that it may be possible to treat each slice as an independent unit and model each frame as 30 independent slices.
Macroblock Layer
An alternate approach to modeling on a sub-frame level is to characterize the source traffic at the macroblock layer by examining the cell interarrival process. If the bits for a macroblock are packetized into cells as they are generated by the coder, the interarrival times for cells can be used to model the source. Using this technique, the interarrival distribution for bursts combined with the burst length distribution can be used to describe the cell generation process for each frame. For this model, the units of the burst interarrival time in Fig. 11 are expressed in terms of a normalized value, the time required to encode a macroblock. The real-time nature of video results in a maximum value for this normalized unit that is determined by the frame rate and the spatial dimensions (e.g., 43 µs for this video sequence). From Fig. 11, it can be observed that the distributions of both the mixed intra/interframe and intraframe source coder are approximately exponential. However, it should be noted that in intraframe coding bursts of greater than one cell per macroblock are rare, while for mixed intra/interframe coding, burst lengths of up to six cells per macroblock were possible. This reinforces the observation that the mixed intra/interframe coder generates more bursty traffic at all layers than the intraframe coder.
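The packetization overhead of the slice layer model relative to the frame layer model can be sketched by applying the same rule at both granularities. The example assumes the 44 byte user payload of the frame layer model and uses a made-up, uniform number of bits per slice; the function names are illustrative.

```python
import math

USER_PAYLOAD_BITS = 44 * 8   # 44 byte user payload per ATM cell


def cells(bits):
    """Cells for one packetization unit; at least one cell per unit."""
    return max(1, math.ceil(bits / USER_PAYLOAD_BITS))


def frame_layer_cells(slice_bits):
    """All bits of a frame are packed together."""
    return cells(sum(slice_bits))


def slice_layer_cells(slice_bits):
    """Each slice is packed on its own, leaving a partly filled last cell."""
    return sum(cells(b) for b in slice_bits)


slice_bits = [2500] * 30   # hypothetical frame: 30 slices of 2500 bits each
print(frame_layer_cells(slice_bits), slice_layer_cells(slice_bits))  # 214 240
```

When every slice carries bits, the overhead stays below one cell per slice, which is consistent in scale with the measured difference of roughly 14 cells per frame quoted for the slice layer model.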
Effect of Scene Content
In the previous section the statistical properties of video traffic for a variety of source coding parameters were examined in detail using a single video sequence as input. Another important factor in determining the characteristics of video traffic is the scene content of the input video sequence. In this context, the scene-content-related factors encompass any component that is intrinsic to a particular scene or sequence. These factors, which can vary across video sequences, include color, brightness, and motion content. By utilizing a long video sequence to collect statistics we can eliminate some local variations associated with particular scenes. However, it is also important to determine the efficacy of the coding algorithm for several different types of input sequences. This will provide an indication of the range of bit rates and associated statistics which will be generated by a source which utilizes the MPEG algorithm.
Statistics for nine two-minute sequences from a variety of sources that were digitized and encoded using the MPEG coding parameters (N = 16, q = 4) are given in Table 1. The results show the variability in the average and peak cells per frame that can exist when encoding sequences with differing characteristics. From the table, it can be observed that the bit rate for the Star Wars sequence is lower than for the other sequences. A plausible explanation for this effect is that the color content in the Star Wars sequence is fairly low (i.e., colors are close to gray scale). In addition, some of these sequences contain more image detail (i.e., more high frequency components) than the Star Wars sequence, which also results in lower compression ratios.
The average rates for the sequences vary between 2.1 Mb/s for “News” and 7.8 Mb/s for “Boxing.” The activity contained in a sequence is closely related to the average rates that are required to encode the sequences. It is evident from the results that although the average bit rates vary with the scenes contained in each sequence, some basic statistical properties of the video source traffic are quite similar even when sequences with these differing characteristics are encoded. In particular, the ratio of mean-to-standard deviation and the ratio of peak-to-mean of the cell generation process are fairly invariant for all sequences, with typical values for these ratios ranging from 1.8 to 2.7 and from 2 to 3, respectively. The consistency of values for these ratios for all sequences suggests that a large part of the variability in the traffic characteristics is a result of the encoding process rather than the scene content.
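The two ratios discussed above are straightforward to compute from a cells-per-frame trace. The sketch below uses the population standard deviation (an assumption, since the estimator is not specified) and an invented trace in place of measured data; the function name is hypothetical.

```python
import statistics


def traffic_ratios(cells_per_frame):
    """Return (mean / standard deviation, peak / mean) for a trace."""
    mean = statistics.mean(cells_per_frame)
    std = statistics.pstdev(cells_per_frame)   # population standard deviation
    return mean / std, max(cells_per_frame) / mean


# Invented trace: one 600-cell intraframe followed by fifteen 200-cell frames.
trace = ([600] + [200] * 15) * 10
mean_to_std, peak_to_mean = traffic_ratios(trace)
print(round(mean_to_std, 2), round(peak_to_mean, 2))
```

Applying the same function to each sequence's measured trace would give the per-sequence ratios that the text compares.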
Finally, an indication of the importance of multiplexing techniques for video can be seen from the autocorrelation function for the ensemble of all nine sequences shown in Fig. 12. Figure 12a shows the characteristics of this ensemble when multiplexing is performed with the intraframes of each sequence in-phase. The autocorrelation function in this case is similar to the autocorrelation function for the Star Wars sequence encoded with the identical coding parameters (Fig. 6). The same features, i.e., intraframe impulses and slow decay of the autocorrelation function, are visible in both figures. In the case where the intraframes are staggered (Fig. 12b) the periodicity associated with the intraframes is not visible. However, it should be noted that even for this best case scenario the tail of the autocorrelation is still fairly long and must be taken into account.
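The difference between in-phase and staggered intraframes can be illustrated on the aggregate cells-per-frame of nine toy sources. The sketch assumes constant, invented cell counts for intraframes and predictive frames, so it shows only how the intraframe bursts line up or spread out; it says nothing about the long tail of the autocorrelation, which comes from scene content.

```python
def source_trace(num_frames, n, offset, intra_cells=600, pred_cells=200):
    """Cells per frame for one toy source whose intraframes fall on
    frames offset, offset + n, offset + 2n, ... (cell counts are made up)."""
    return [intra_cells if (t - offset) % n == 0 else pred_cells
            for t in range(num_frames)]


def aggregate(traces):
    """Frame-by-frame sum of cells over all multiplexed sources."""
    return [sum(column) for column in zip(*traces)]


in_phase = aggregate([source_trace(64, 16, 0) for _ in range(9)])
staggered = aggregate([source_trace(64, 16, k) for k in range(9)])

print(max(in_phase), max(staggered))   # 5400 2200
print(sum(in_phase) == sum(staggered)) # True
```

Both arrangements carry the same total load, but staggering replaces the single large periodic burst with a much flatter aggregate.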
Layered MPEG Coding
Although many authors have proposed and analyzed congestion control schemes that utilize prioritization, few have addressed the details of implementing these priority mechanisms and the performance of these schemes in terms of video quality. In [4], this issue is addressed for intraframe (DCT) coded VBR video and results are shown for both voice and video in terms of signal-to-noise ratio (SNR) performance. With the current motion compensation based coding techniques, the effect of packet loss on service performance is harder to analyze.
Figure 13. MPEG coder with prioritization.
Multiresolution techniques that lead to layered coding schemes [8, 9] are particularly amenable to prioritized transmission. An MPEG-compatible algorithm combined with prioritized transmission can have practical significance in future networks. Several issues need to be addressed in order to determine if priority schemes will be useful in ATM networks. First, can the output of a VBR MPEG coder be prioritized satisfactorily? What is the cost of prioritization in terms of extra bandwidth required? What are the traffic characteristics of each priority level? How can the level of traffic in each priority be controlled?
Prioritization Scheme
In general, an efficient prioritization scheme separates components of a data stream into some relative order of importance. Given the number of priority levels, the scheme must decide on the priority assignment for each of these components. Two desirable criteria for a prioritization scheme are simplicity and the ability to generate a non-prioritized traffic stream of equal quality if required. Using these principles, the MPEG coding algorithm was modified for prioritized network transmission.
In order to satisfy the simplicity criterion, the operation of the priority coder is designed to be very similar to the MPEG coder. The coding loop for this prioritized MPEG coder is shown in Fig. 13. The main processing steps are the same as for the standard MPEG coder with the modifications to provide for prioritization shown in the highlighted box. In standard MPEG coding, there is no priority control (PC) mechanism and only a single variable length code generator is used. In our scheme, prioritization decisions can be made for the components of the coded data to give multiple priority streams. We consider a 2 priority scheme here, giving high priority (HP) and low priority (LP) streams. Each priority stream is then variable length coded independently. The PC can be used to adjust the number of components that are assigned to each priority level.
In this priority coder, the coding mode used for the current macroblock determines the exact procedure for prioritizing the components. For I frames, only one macroblock coding mode is used. In these frames, prioritization for each macroblock is performed as follows:
1) All header information is placed in the HP stream. This header information is primarily made up of code words for the macroblock address and the macroblock type.
2) The DC value of the DCT coefficients for each of the six 8 x 8 blocks in a macroblock is assigned to the HP stream.
3) For the remaining 63 coefficients in each 8 x 8 block, we define a parameter, p, which specifies the number of AC coefficients that are to be placed in the HP stream.
4) The remaining (63 - p) coefficients are transmitted in the LP stream. A macroblock address header is also added to the LP information from each macroblock.
Figure 14. Multiplexing and transmission of prioritized cells.
For P frames, two macroblock coding modes
are possible: intraframe macroblock coding mode
and motion compensated interframe coding
mode. For the first mode, the prioritization
scheme is the same as for macroblocks in the I frame.
For the motion compensated mode, priorities are
assigned as follows:
1) All header information is placed in the HP
stream as in the intraframe mode. This header
consists of the macroblock address, macroblock type,
and a coded block pattern indicating the blocks
in the current macroblock which are coded (i.e.,
the quantized prediction error is nonzero).
2) Motion vectors for each macroblock are placed
in the HP stream.
3) For the 64 DCT coefficients obtained from the
transform of the prediction error for an 8 x 8
block, the first p coefficients are assigned to the
HP stream.
4) The remaining (64 - p) coefficients are assigned
to the LP stream. The LP stream in this mode
also contains the macroblock address in order
that skipped macroblocks may be identified.
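The motion compensated case can be sketched in the same spirit. The example below is a hedged illustration rather than the authors' implementation: it assumes each coded block is a 64-entry list of quantized prediction-error coefficients in scan order, represents an uncoded block as None, and uses a hypothetical function name and record layout.

```python
from typing import Dict, List, Optional, Tuple


def prioritize_inter_macroblock(
    address: int,
    motion_vector: Tuple[int, int],
    blocks: List[Optional[List[int]]],
    p: int,
) -> Tuple[Dict, Dict]:
    """Build hypothetical HP and LP records for a motion compensated macroblock.

    blocks: one entry per 8x8 block; None marks a block whose quantized
            prediction error is all zero (not coded).
    p:      number of coefficients per coded block sent in the HP stream (1..64).
    """
    if not 1 <= p <= 64:
        raise ValueError("p must lie between 1 and 64")
    # Step 1: header (address, type, coded block pattern) goes to HP.
    # Step 2: the motion vector also goes to HP.
    hp = {
        "address": address,
        "type": "inter",
        "coded_block_pattern": [b is not None for b in blocks],
        "motion_vector": motion_vector,
        "blocks": [],
    }
    # Step 4: LP carries the macroblock address so that skipped
    # macroblocks can be identified.
    lp = {"address": address, "blocks": []}
    for block in blocks:
        if block is None:
            continue                      # nothing to send for uncoded blocks
        hp["blocks"].append(block[:p])    # step 3: first p coefficients
        lp["blocks"].append(block[p:])    # step 4: remaining (64 - p)
    return hp, lp
```

Setting p = 64 leaves every LP coefficient list empty, so all coefficient data travels in the HP record.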
It should be noted that unlike intraframe macroblocks, where the largest energy components
are in the low frequencies, in predictive coding
the large energy DCT coefficients are distributed
among all frequencies. Therefore, the assignment
of lower frequency components to higher priorities as performed in step 3 is somewhat arbitrary.
However, in order to make a simpler transition from a standard to a layered coder, we utilize the same assignment procedure in both
intraframe and predictive coding modes. In this
scheme, when p = 64, the LP stream will contain
no information, while the HP stream will be identical to the bit stream generated by a non-prioritized MPEG coder.
[Table residue; surviving entries: 8, 16; 452, 599, 275.5, 325.7, 75.6, 102.2, 267, 137.1, 34.2, 160, 96.5, 10.5]
Transmission of the HP and LP components
requires packetization of the bit stream into
ATM cells. We can visualize this process as
shown in Fig. 14. The HP and LP output bit
streams of the coder are first fed to a packetizer
and then to a multiplexer to combine the two
output streams. All HP cells for a frame are first
collected and then transmitted, followed by the
LP cells for the frame.
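A minimal sketch of this packetize-then-multiplex step is given below. It assumes that each ATM cell carries a full 48-byte payload with no adaptation-layer overhead, which may differ from the packetization used in this study; the function names are hypothetical.

```python
from typing import List

# Assumption: 48 payload bytes per ATM cell, no adaptation-layer overhead.
ATM_PAYLOAD_BITS = 48 * 8


def cells_needed(num_bits: int, payload_bits: int = ATM_PAYLOAD_BITS) -> int:
    """Number of cells required to carry num_bits (last cell may be padded)."""
    if num_bits < 0:
        raise ValueError("num_bits must be non-negative")
    return -(-num_bits // payload_bits)   # integer ceiling division


def multiplex_frame(hp_bits: int, lp_bits: int) -> List[str]:
    """Return the cell order for one frame: all HP cells, then all LP cells."""
    return ["HP"] * cells_needed(hp_bits) + ["LP"] * cells_needed(lp_bits)
```

For example, a frame with 800 HP bits and 100 LP bits yields three HP cells followed by one LP cell under these assumptions.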
An interesting feature of the prioritization scheme
described above is that the coding process is
independent of p. This ensures that the coding
Figure 16. Autocorrelation of HP traffic (q = 3, p = 16).
the degradation of image quality from the loss of low
priority components was very small. Further, the
proportion of low priority traffic at these larger
values of p was low. Therefore, there may be little
advantage in using a p higher than 16, since one
could equivalently use a non-prioritized coder in
this case.
Table 2 shows the peak, mean, and standard deviation of the cell arrivals per frame process for
several N and p. In general we observe that, for
the same p, the peak and standard deviation of
the cell arrivals for both priority streams of a
mixed intra/interframe coder are higher than those
of an intraframe coder, while the mean of the
cell arrivals is lower. As expected, when p is
decreased from 16 to 2, both the peak value and
the variance of cell arrivals for the HP stream
drop dramatically for both N = 1 and N = 16.
The sum of the mean HP and LP cell arrivals per frame
is fairly constant for all p, implying that varying p
is an efficient way of controlling the magnitude of
traffic in each of the priority streams.
The average total bit rate (sum of HP and LP)
per frame for the sequence is approximately 2.1 Mb/s
for N = 16 and 3.4 Mb/s for N = 1. The bit rates
per frame for a standard MPEG coder using the same
q value were 1.9 Mb/s for N = 16 and 2.9 Mb/s
for N = 1. Using the priority coder therefore increases the bit rate per frame by approximately 10 to
20 percent. This value is important in determining if prioritized transmission is useful, i.e., whether
prioritization yields an advantage that
warrants this increase in bandwidth.
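The quoted overhead follows directly from the four bit rates above. The short check below uses only those reported figures; the function name is hypothetical.

```python
def overhead_percent(prioritized_mbps: float, standard_mbps: float) -> float:
    """Relative bit rate increase of the priority coder, in percent."""
    return 100.0 * (prioritized_mbps - standard_mbps) / standard_mbps


# Bit rates in Mb/s as reported in the text, keyed by N.
standard = {16: 1.9, 1: 2.9}
prioritized = {16: 2.1, 1: 3.4}

for n in (16, 1):
    pct = overhead_percent(prioritized[n], standard[n])
    print(f"N = {n}: overhead of about {pct:.1f} percent")
```

This gives roughly 10.5 percent for N = 16 and 17.2 percent for N = 1, consistent with the 10 to 20 percent range stated above.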
Figure 15 shows the distribution of the cells
generated per frame for an N = 16 and an N = 1
coder, respectively. For the N = 16 coder, the distributions of the LP and HP cells generated per frame
have a similar shape. For the N = 1 coder, as p is
increased the distribution of cell arrivals for the
LP stream converges to a fixed value. This fixed
value of approximately three cells is the amount
of header (overhead) required by the LP stream.
The autocorrelation functions for the HP and
LP traffic for the N = 1 and N = 16 coders (Fig.
16) show an interesting property of the layered
coding scheme. For the HP traffic, we notice that
the correlation between intraframe coded frames
Figure 17. HP/total bit rate versus p.
process is efficient for any value of p. The drawback of this scheme is that when losses occur in
the LP stream, the coder and decoder lose synchronization until the next I frame. It should be
pointed out that in our scheme, for a given p, the
frequency components in the HP stream that are
transmitted to the decoder will always be the
same. Therefore, although synchronization may
be lost between coder and decoder, the effect of
LP cell losses is similar to applying a low pass filter. We found the visual effect of these types
of losses to be relatively mild.
In the next section, we discuss the traffic characteristics of the prioritized MPEG video coder,
in particular, the effect of p on the traffic generated by the coder. All the statistics for the coder
are presented using frame layer data.
Results
For this study, we utilized values of p between 2
and 16. When p is greater than 16, we found that
Figure 18. Smoothing video sources.
that occur every Nth frame still remains. In the
LP autocorrelation function, however, the correlation between intraframe coded frames is much
smaller. This implies that for the LP traffic, the number of cells generated per slice is less dependent
on the coding mode utilized than for the HP traffic.
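For readers who wish to reproduce this kind of plot, a sample autocorrelation of a cells-per-frame sequence can be computed as sketched below. The biased estimator (normalizing every lag by the full sequence length) is one common choice and is an assumption here, as are the function name and the synthetic trace used in the usage note; none of this is taken from the measurements in this study.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation of x for lags 0..max_lag (biased estimator)."""
    n = len(x)
    if n < 2 or not 0 <= max_lag < n:
        raise ValueError("need at least two samples and 0 <= max_lag < len(x)")
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    if variance == 0:
        raise ValueError("sequence is constant; autocorrelation undefined")
    result = []
    for lag in range(max_lag + 1):
        # Covariance between the sequence and a copy shifted by `lag` frames.
        cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
        result.append(cov / variance)
    return result
```

Applied to a made-up trace with a large frame every fourth position, such as `[300, 100, 100, 100] * 8`, the function returns a strong positive value at lag 4 and negative values at lags 1 to 3, which is the periodic pattern described for the HP traffic.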
The ratio of HP cells to the total for different
p is shown in Fig. 17. The results for the N = 1
case show that a large proportion of the traffic is
due to the low frequency DCT coefficients. For
the N = 16 case, for a given p, a smaller proportion of the total cells consists of HP cells than for
the N = 1 coder. This is mainly because in P
frames, as mentioned previously, the nonzero
values for the coded coefficients of the prediction
error will be uniformly distributed among all
DCT components. Hence, for a given p, a larger fraction of the traffic will consist of LP cells.
Smoothing video sources
Smoothing has been suggested as a mechanism
to make VBR video sources easier to manage on B-ISDN. In smoothing, cells from a source
are buffered for some time period and are then
transmitted at the average rate over this period. The
rationale for smoothing can be seen most clearly
if we examine a bursty source. When bursts occur,
without smoothing, the transmission rate required
would be the peak rate for the burst. When
smoothing is utilized, these burst periods are
time averaged with periods of lower activity,
thereby decreasing the bit rate required for
transmission. The trade-off in smoothing traffic
is the resulting added buffer requirements and
increased delay. The maximum smoothing that
can be utilized will be determined primarily by
the extra delay which can be tolerated for a service.
For video services, smoothing can be implemented
by buffering coded data for several frames and
then transmitting this data over the time period
of these frames. In [10], an analysis of an optimal
smoothing technique for video sources is presented. This technique optimizes the trade-off between
delay and buffer occupancy given the traffic characteristics of a video source. It was shown that if
the video source traffic could be predicted for
several frames, smoothing could be used advantageously.
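The frame-buffering form of smoothing described above can be illustrated with a short sketch. It assumes non-overlapping groups of a fixed number of frames, each transmitted at the group's average rate; this is a simple stand-in and not the optimal technique of [10]. The function name and the sample trace are hypothetical, and the buffering delay itself is not modeled.

```python
from typing import List, Sequence


def smooth(cells_per_frame: Sequence[float], window: int) -> List[float]:
    """Replace each group of `window` frames by the group's average cell count.

    The output has one entry per frame period and gives the number of cells
    sent in that period when the whole group is buffered and then
    transmitted at a constant rate.
    """
    if window < 1:
        raise ValueError("window must be at least one frame")
    smoothed: List[float] = []
    for start in range(0, len(cells_per_frame), window):
        group = cells_per_frame[start:start + window]
        rate = sum(group) / len(group)      # average over the buffered frames
        smoothed.extend([rate] * len(group))
    return smoothed
```

For a made-up trace `[300, 100, 100, 100] * 2`, the peak falls from 300 cells per frame without smoothing to 200 with a two-frame window and 150 with a four-frame window, while the total number of cells is unchanged.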
The traffic characteristics of smoothed video can
be viewed by considering longer time periods
than a frame. In terms of layers, smoothed video
corresponds to observing the coding process
above the frame layer, i.e., when several frames
are combined. The effect of smoothing on the
negative cumulative distribution function of cells
per frame for an N = 1 and an N = 16 MPEG
video coder is shown in Fig. 18. We notice that
for the intraframe coder the effect of smoothing
is imperceptible. This occurs because, for the
intraframe coder, high bit rates extend for long
periods, as can be seen from the autocorrelation
function. Therefore, there is little advantage in
smoothing, since the average bit rate over a small
number of frames can still be large. For the N = 16
coder, smoothing changes the traffic characteristics to some degree. From Fig. 18, it can be
observed that there is a substantial decrease in
the peak bit rate required when smoothing is utilized. This can be explained by the observation
that smoothing will reduce the size of the intraframe
peaks seen in Fig. 4. However, it should be noted
that the incremental effect of smoothing on the peak
rate when the smoothing duration is greater than
three frames is not very large. Apart from this
diminishing return from increasing the smoothing period, the increase in transmission delay
caused by smoothing will result in larger buffers
at the decoder, making large smoothing durations
less attractive in practice.
Summary
The study of the statistical properties of packet
video streams, to model video sources, is a
required step in the process of designing B-ISDN
networks to handle heterogeneous traffic. This
work is especially critical because of the high
bandwidth required for each video connection
and because of the delay-sensitive nature of real-time video. We find that the traffic characteristics
of video sources are very sensitive to changes in coding algorithms. Previous studies on the statistics
of video sources used a variety of coding algorithms, and over time many of these coding algorithms have become either outdated or unpopular.
With work on the MPEG video coding standard nearing completion (MPEG 1 is complete, MPEG 2 is
near completion), statistical studies are required
to determine the characteristics of coders that
utilize this algorithm. This work is especially
important for network modeling because of the many
proposed uses of MPEG-like algorithms for video
services from HDTV to multimedia communications.
By studying the underlying coding process of an
MPEG coder, we determined a simple yet realistic packetization process for a real video coder.
We were able to characterize a variety of video sources
by examining the coder output at different operating points (N, q). From the results we have
noted that as more compression is used (by
increasing N), the ratio of standard deviation to
the average bit rate increases, confirming that
higher compression leads to more bursty sources.
It should also be pointed out that increasing N
beyond a certain value does not increase compression. Video source traffic was also shown to
be extremely correlated, especially for intraframe
coding. For the mixed intra/interframe coding,
the periodic impulses caused by the I frames introduce an extra periodic correlation component. The
range of the statistical properties for different
input video sequences was examined and it was
shown that some of the characteristics are dependent primarily on the coding mechanism.
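The burstiness measure referred to above, the ratio of the standard deviation to the mean, can be computed as in the sketch below. The function name and both sample traces are hypothetical and serve only to show the direction of the effect; they are not data from this study.

```python
import statistics
from typing import Sequence


def burstiness(cells_per_frame: Sequence[float]) -> float:
    """Ratio of standard deviation to mean (coefficient of variation)."""
    mean = statistics.fmean(cells_per_frame)
    if mean == 0:
        raise ValueError("mean is zero; ratio undefined")
    # Population standard deviation, treating the trace as the full sequence.
    return statistics.pstdev(cells_per_frame) / mean


# Made-up traces: one with similar frame sizes, one with a large I frame
# every fourth frame followed by small predicted frames.
steady_trace = [200, 150, 150, 150] * 4
peaky_trace = [300, 100, 100, 100] * 4
print(burstiness(steady_trace), burstiness(peaky_trace))
```

The second trace has both a lower mean and a higher ratio, which is the pattern the text associates with increasing N.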
The observations that the time scale for the
cell generation process for a slice corresponds
approximately to the multiplexing layer in ATM
networks, and that a slice is the smallest self-contained coding unit in MPEG, led us to examine
the coding process at the slice layer. It was shown
that modeling video sources at this layer may be
an alternative for the VBR MPEG video coders.
A modified MPEG video coder that generates
a prioritized output bit stream is presented. The
prioritization scheme was designed based not
only on the relative importance of coded components, but also on ease of implementation and
ability to produce non-prioritized data. The
parameter p, which can be varied between 1 and
64, can be utilized in order to control the proportion of traffic assigned to high and low priorities.
With a p value of 64, the priority coder produces
standard, non-prioritized MPEG data. It was
shown that by utilizing this type of coder we can
achieve basic image quality only if HP traffic is
guaranteed.
Because of the high correlation in the bit
stream, it was shown that smoothing has very little impact on the source characteristics, in particular for the intraframe coder. For the mixed
intra/interframe coders, some improvement was
obtained from smoothing the I frames over larger
intervals. The maximum delay for a service will
dictate the maximum amount of smoothing that
can be tolerated. Smoothing and prioritization
can be combined for a better transmission environment for VBR video sources.
Acknowledgments
We would like to thank Mark Garrett at Bellcore for
his technical assistance in this study.
References
[1] B. Maglaris et al., Performance Models of Statistical Multiplexing in
Packet Video Communications, IEEE Trans. on Comm., vol. 36, no.
7, pp. 834-844, July 1988.
[2] P. Sen et al., Models for Packet Switching of Variable-Bit-Rate Video
Sources, IEEE J. on Sel. Areas in Comm., vol. 7, no. 5, pp. 865-869,
June 1989.
[3] W. Verbiest, L. Pinnoo, and B. Voeten, The Impact of the ATM Concept on Video Coding, IEEE J. on Sel. Areas in Comm., vol. 6, no.
9, pp. 1623-1632, Dec. 1988.
[4] M. Garrett and M. Vetterli, Congestion Control Strategies for Packet Video, in Proc. Fourth International Workshop on Packet Video,
Aug. 1991.
[5] R. M. Rodriguez-Dagnino, M. R. K. Khansari, and A. Leon-Garcia,
Prediction of Bit Rate Sequences of Encoded Video Signals, IEEE J.
on Sel. Areas in Comm., vol. 9, no. 3, pp. 305-313, April 1991.
[6] D. LeGall, MPEG: A Video Compression Standard for Multimedia Applications, Communications of the ACM, vol. 34, no. 4, pp. 305-313,
April 1991.
[7] T. Urabe et al., MPEGTool: An X Window-Based MPEG Encoder and
Statistics Tool, to appear in Multimedia Systems, May 1994.
(Available in the public domain; for information, e-mail to:
mpegtool@ee.upenn.edu)
[8] M. Ghanbari, An Adaptive Video Codec for ATM Networks, in Proc.
Third International Workshop on Packet Video, March 1990.
[9] G. Karlsson and M. Vetterli, Subband Coding of Video for Packet
Networks, Optical Engineering, vol. 27, no. 7, pp. 574-586, July 1988.
[10] T. Ott, A. Tabatabai, and T. V. Lakshman, A Scheme for Smoothing Delay Sensitive Traffic Offered to ATM Networks, in Proc. IEEE
INFOCOM '92, May 1992.
Biographies
PRAMOD PANCHA received his B.S.E. and M.S.E. degrees in electrical
engineering from the University of Pennsylvania, Philadelphia, in 1988
and 1991, respectively. He is currently working toward his Ph.D. in
electrical engineering at the University of Pennsylvania. His interests
are in video processing and multimedia services in B-ISDN. He is a student
member of IEEE and a member of Eta Kappa Nu.
MAGDA EL ZARKI [M '88] received her B.E.E. from Cairo University in
1979, and her M.S. and Ph.D. degrees, both in electrical engineering, from
Columbia University in 1981 and 1987, respectively. From 1981 to
1983 she was a communications network planner in the Department
of International Telecommunications at Citibank. She joined Columbia
University in 1983 as a research assistant in the Computer Communications Research Laboratory, where she was involved in the design and development of an integrated LAN testbed called MAGNET. Currently she is
an assistant professor in the Department of Electrical Engineering at
the University of Pennsylvania, Philadelphia, Pennsylvania, where she
is involved in research in telecommunication networks. She also holds
a secondary appointment in the Department of Computer and Information
Sciences. In January 1993 she was appointed part-time professor of
Telecommunication Networks in the Faculty of Electrical Engineering
at Delft University of Technology, Delft, The Netherlands. She is a
member of the Association for Computing Machinery and Sigma Xi.