MPEG coding for variable bit rate video transmission

P Pancha

For real-time transmission of broadcast-quality video on ATM-based B-ISDN, the intraframe to interframe ratio and the quantizer scale are two key parameters that can be used to control a video source in a network environment. Their impact on the traffic characteristics of the coder provides insights into the cell arrival process for an MPEG source.

Pramod Pancha and Magda El Zarki

Several coding algorithms have been proposed for variable bit rate (VBR) video. This variety in coding techniques has made the task of characterizing video sources difficult. Previous studies of bit rate statistics of video sources have used coding algorithms such as conditional replenishment [1-3], which are less efficient than current coding techniques. A more recent study, which provides insights into long-run statistics for coded video sequences, uses block-based intraframe discrete cosine transform (DCT) coding [4] but does not use any interframe coding. Models that build on results from these studies, while important, may not be valid for source coders that use newer coding techniques.

Ideally, analysis requires a source characterization method that models the properties of a large variety of sources. One approach to achieving such models for video sources is to identify the basic components of video coding systems and to predict bit rates for sources from the properties of these components [5]. The goal is then to obtain a universal source characterization framework built from these components. The basic components needed to model a source, however, may vary substantially among different classes of coding algorithms. As a result, the parameters needed to use this approach may not be easy to obtain.
Another approach, which is the one we have taken in this article, is to obtain video source models for coders that use a standard algorithm applicable to a multitude of video services. We study the output stream of a video coder that complies with the Moving Picture Experts Group (MPEG) coding standard, using a National Television Systems Committee (NTSC) quality video sequence as the input. Because the MPEG video coding algorithm has been proposed for a variety of applications, we also investigate the effect of changing the coding parameters on the statistics of interest.

The MPEG Coder

The MPEG coding algorithm was developed primarily for storage of compressed video on digital storage media. Provisions were therefore made in the algorithm to enable random access, fast forward/reverse searches, and other features when decoding from any digital storage medium. However, the coding standard is flexible enough to suit a much wider range of video applications. Recent applications of MPEG-like coding algorithms have appeared for a variety of video services, from multimedia workstations to high-definition television.

The decision parameters for the compression algorithm (source coder) were derived from the MPEG Video Simulation Model Three (SM3) report issued by the Simulation Model Editorial Group of the ISO-IEC working group. The input video sequence for the MPEG algorithm was a three-minute, 40-second sequence from the movie Star Wars, chosen so that it contained a mix of scenes with and without motion. The sequence, digitized from laser disc, has a frame resolution of 512 x 480 pixels, which is similar to NTSC broadcast quality.
It should be noted that this resolution is significantly higher than the 352 x 240 pixel resolution usually recommended for achieving video tape quality.

Two features of the algorithm that are important for characterizing the source are the coding modes and the coding layers. These can strongly influence the output traffic characteristics of the coder. Below we outline the coding algorithm, emphasizing the parameters and processes that matter for understanding the cell generation process. A more detailed description of the MPEG coding standard can be found elsewhere (for example, see [6]).

PRAMOD PANCHA is working toward his Ph.D. in electrical engineering at the University of Pennsylvania.

MAGDA EL ZARKI is assistant professor in the Department of Electrical Engineering at the University of Pennsylvania and part-time professor of telecommunication networks at Delft University of Technology.

0163-6804/94/$04.00 © 1994 IEEE
IEEE Communications Magazine, May 1994

Figure 1. Frame layer coding sequence.

Coding Layers

The MPEG simulation model used for generating VBR video can be organized into four layers, listed below in order of increasing spatial size:

Block - A block is the smallest coding unit in the MPEG algorithm. It is made up of 8 x 8 pixels and can be one of three types: luminance (Y), red chrominance (Cr), and blue chrominance (Cb). The block is the basic unit in intraframe DCT coded frames.

Macroblock - A macroblock is the basic coding unit in the MPEG algorithm. A macroblock is a 16 x 16 pixel segment in a frame. Since each chrominance component has one-half the vertical and horizontal resolution of the luminance component, a macroblock consists of 4 Y, 1 Cr, and 1 Cb block.
Slice - A slice, which is a horizontal strip within a frame, is the basic processing unit in the MPEG coding scheme. Coding operations on blocks and macroblocks can only be performed when all pixels for a slice are available. A slice is an autonomous unit, since it is coded independently of its neighbors. In this study, each frame contains 30 slices of 512 x 16 pixels.

Picture - A picture in MPEG terminology is the basic unit of display and corresponds to a single frame in a video sequence. The spatial dimensions of a frame are variable and are determined by the requirements of an application. In this study a frame size of 512 x 480 pixels, corresponding to NTSC television quality, was used.

Coding Modes

In the MPEG coding scheme, several coding modes are available at the frame and macroblock layers. Frames can be encoded in three modes: intraframe (I), predictive (P), and interpolative (B). Frames coded in intraframe mode can only contain macroblocks coded in intraframe mode. In predictive and interpolative (interframe) coded frames, however, macroblocks can be coded in either motion-compensated or intraframe mode. The choice of coding modes influences the coding efficiency of the MPEG algorithm. A typical sequence of coding modes is shown in Fig. 1.

However, not all modes may be suitable for all applications. Since interpolative coding is non-causal, it requires out-of-sequence transmission of frames, which may make it unsuitable for some applications: it leads to longer reconstruction delays and larger buffers at the receiver. Where this extra delay is permissible, interpolative coding may be used to achieve higher compression ratios.
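To make the out-of-sequence transmission concrete, the sketch below reorders a display-order pattern of frame types into a transmission order in which every B frame follows both anchor (I or P) frames it is interpolated from, and reports how many frame periods a frame is sent ahead of its display slot. The `IBBPBBP` pattern is an illustrative assumption rather than the sequence of Fig. 1, and the function names are hypothetical.

```python
def transmission_order(display_types):
    """Reorder frames from display order into a transmission order.

    A B frame is interpolated from the anchor (I or P) before it and the
    anchor after it, so the later anchor must be sent before the B frames.
    Returns (display_index, frame_type) pairs in transmission order.
    """
    order = []
    waiting_b = []  # B frames whose future anchor has not been sent yet
    for idx, ftype in enumerate(display_types):
        if ftype == "B":
            waiting_b.append((idx, ftype))
        else:
            # Send the anchor first, then the B frames that depend on it.
            order.append((idx, ftype))
            order.extend(waiting_b)
            waiting_b = []
    # Trailing B frames have no future anchor in this toy input; send as-is.
    order.extend(waiting_b)
    return order


def max_lead(order):
    """Largest number of frame periods a frame is sent before its display slot."""
    return max(idx - pos for pos, (idx, _) in enumerate(order))


if __name__ == "__main__":
    sent = transmission_order("IBBPBBP")
    print("".join(f"{t}{i}" for i, t in sent))      # I0P3B1B2P6B4B5
    print(max_lead(sent))                           # 2
    print(max_lead(transmission_order("IPPPPPP")))  # 0: no reordering needed
```

With only I and P frames, as in the configuration used in this study, the transmission order equals the display order and no reordering buffer is needed at the receiver.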
However, it should be pointed out that the complexity of interpolative coding may make it more difficult to achieve real-time rates when coding NTSC-quality television. In addition, non-causal techniques make the video service more susceptible to errors, since there are more dependencies between frames. In this study, coding has been restricted to intraframe and predictive interframe modes, and all interpolative coded frames in Fig. 1 have been replaced by predictive coded frames. The ratio of total frames (intra + predictive) to intraframes is determined by a parameter N. With this parameter the coder can be configured as an intraframe coder (N = 1) or as a mixed intra/interframe coder as in Fig. 1 (N = 16).

Figure 2. Basic MPEG coding loop.

The basic coding loop for the MPEG algorithm is shown in Fig. 2. The main processing step of the MPEG algorithm is block-based motion compensation in interframe coding and block-based DCT in intraframe coding. The flow chart for coding a macroblock is depicted in Fig. 3.

In the intraframe coding mode, a frame is processed block by block. A two-dimensional 8 x 8 DCT is applied to all luminance and chrominance blocks in the frame. The coefficients obtained from this transformation are then fed to a quantizer. The step size for the quantization of each DCT coefficient is obtained from an 8 x 8 quantizer matrix. This matrix ensures that the low-frequency DCT coefficients are quantized more accurately (with a small step size), while the high-frequency coefficients are quantized more coarsely. The DC coefficient of the DCT, which remains fairly constant throughout a frame, is coded differentially within a slice, using the DC value of the previous block as a predictor. This predictor is reset at the beginning of every slice.
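The differential coding of DC coefficients with a per-slice predictor reset can be sketched as follows. This is a simplified illustration, not the MPEG bitstream syntax: it handles one sequence of blocks of a single component in coding order (MPEG keeps separate predictors for Y, Cr, and Cb), assumes every slice contains the same number of blocks, and uses a hypothetical `reset_value` parameter in place of the value the standard actually prescribes.

```python
def dc_encode(dc_values, blocks_per_slice, reset_value=0):
    """Return DC differences, resetting the predictor at each slice start."""
    diffs = []
    predictor = reset_value
    for i, dc in enumerate(dc_values):
        if i % blocks_per_slice == 0:  # first block of a new slice
            predictor = reset_value
        diffs.append(dc - predictor)   # code the difference, not the value
        predictor = dc                 # previous block's DC predicts the next
    return diffs


def dc_decode(diffs, blocks_per_slice, reset_value=0):
    """Invert dc_encode using the same predictor and reset rule."""
    values = []
    predictor = reset_value
    for i, d in enumerate(diffs):
        if i % blocks_per_slice == 0:
            predictor = reset_value
        predictor += d
        values.append(predictor)
    return values


if __name__ == "__main__":
    dc = [100, 102, 101, 50, 52, 51]  # two toy slices of three blocks each
    diffs = dc_encode(dc, blocks_per_slice=3)
    print(diffs)                      # [100, 2, -1, 50, 2, -1]
    assert dc_decode(diffs, blocks_per_slice=3) == dc
```

Because the DC value changes slowly within a frame, most differences are small; resetting the predictor at each slice keeps every slice decodable without reference to its neighbors.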
Once the 64 quantized coefficients for each block in a macroblock are obtained, variable length codes are generated for the macroblock. As a result of quantization, a significant proportion of the quantized coefficients will be zero valued. These blocks of coefficients can therefore be coded efficiently by combining run length coding and modified Huffman coding. The MPEG standard defines variable length codes for the most frequent combinations of a zero run length and the non-zero coefficient value that follows the run. If no variable length code exists for a particular combination of zero run length and coefficient value, a fixed length code is used instead.

Figure 3. MPEG macroblock coding flow chart.

In the predictive coding mode, frames are processed on a macroblock basis. The initial step in the coding process is macroblock-based motion estimation. A square area of 32 pixels around each macroblock is chosen as the motion vector search area. Within this search area, the coder identifies the candidate motion vector that minimizes the absolute macroblock difference between the current macroblock and a displaced (predicted) macroblock in the previous frame. This motion vector is then used to determine the coding mode for the current macroblock. If the absolute difference between the current macroblock and the motion-compensated macroblock is less than a threshold, the motion vectors are variable length coded and transmitted. The prediction error remaining after motion compensation is applied to the macroblock is encoded using the DCT, and the coefficients obtained from the transform of the prediction error are then quantized. Unlike in intraframe coding, where a quantizer matrix is used, the quantization step size for these error terms is constant for all coefficients. The quantized coefficients are then variable length coded as in the intraframe mode. If the prediction error is large compared to the total energy in the macroblock, the predictive coding mode is not used and all blocks in the current macroblock are coded in intraframe mode as previously described.

With a fixed quantizer matrix for each coding mode, all quantization levels can be scaled using a single parameter, the quantizer scale q. In constant bit rate (CBR) coding, this parameter is varied dynamically so that the number of bits generated for a sequence stays within the target range of bit rates. However, varying q leads to variable image quality in the reconstructed frames at the receiver. In VBR video, q is not varied, so that a fixed image quality is maintained for all frames.

Modeling MPEG Video

Previous statistical studies and models have restricted their attention to the number of bits generated per frame for a given coding algorithm. The total packet arrivals per frame are then obtained by converting this number to the equivalent number of ATM cells. In our study we examine the traffic characteristics of the MPEG video source at the frame, slice, and macroblock layers to obtain a better understanding of the cell generation process [7].

Figure 4. Bits generated per frame for the sequence (enlarged 30-second sequence).

The MPEG algorithm uses several parameters that, when varied, can lead to a significant change in the characteristics of a video source. Two of these parameters, N, the interframe to intraframe ratio, and q, the quantizer scale, are especially interesting, as they can potentially be used to control a video source through network feedback.
For example, if the network cell loss rate temporarily increases, decreasing N would increase the frequency of intraframes in the MPEG coding sequence. Since intraframes halt error propagation, this action would keep the duration of error propagation short. Similarly, if the network detects the onset of congestion, a feedback message could be sent to all video sources requiring them to decrease their bit rates. This is best accomplished by increasing q, at the expense of a temporary loss of quality. There is therefore a need to characterize the video source not only under normal operation but also under the changes that occur when these parameters are modified.

Frame Layer

At the frame layer, the traffic characteristics of the source can be modeled by assuming a simple packetization process in which the bits generated by the MPEG coding process are placed in ATM cells. A user payload of 44 bytes per cell is used, to accommodate a 4-byte ATM adaptation layer (AAL) header in each cell. One constraint imposed on the cell arrival process is that bits generated for a frame are never combined with the bits of the next frame. This implies that at least one cell is always sent for each frame. This assumption is justified, since the MPEG frame header information that must be transmitted for each frame requires at least one cell.

Figure 6. Cells-per-frame autocorrelation function for standard coder.

Statistical Results - The coder was initially run several times to estimate q at a "normal" operating point, by judging whether the visual quality of the reconstructed frames for various q was acceptable. For a q of 8, the image quality was judged to be good for all viewed frames.
When q is increased beyond this value, some blocking effects are visible in some reconstructed frames.

Figure 7. Effect of N on cells-per-frame average.

Figure 8. Relation of frame and slice layer model.

In this study, we investigated the effect of varying q (three values: 4, 8, and 16) on the traffic characteristics of the coded video bit stream. We also investigated the effect of varying the type of coder from a pure intraframe coder (N = 1) to a mixed intra/interframe coder (N = 16) to a pure interframe coder (N = 2688).

Figure 4 illustrates a sample bits-per-frame time series for an (N = 16, q = 8) coder. The mean bit rate for this sample is 1.2 Mb/s. This value is within the range we would expect for this coder, since the frame resolution we use is higher than the format recommended in the MPEG specifications. The periodic impulses in the bits-per-frame trace, which occur every 16 frames, are due to the intraframes. This phenomenon is not observed with the N = 1 and N = 2688 coders. For coding schemes that use only intraframe or only interframe techniques, large changes in bit rate are caused mainly by scene changes, which occur approximately every 6 seconds. Although these transitions still exist in mixed intra/interframe MPEG coding, their effect is masked to some degree by the periodic intraframe coding process.

The traffic characteristics of the video source are presented in Figs. 5 and 6. An interesting observation is that the mean and variance of the cell arrivals per frame for the (N = 16, q = 4) and (N = 2688, q = 4) coders are very similar. In fact, as Fig. 7 shows, the average number of cell arrivals per frame decreases as N increases up to N = 32, but increases for larger values of N. The initial decrease in bit rate as N increases occurs because greater compression can be achieved with predictive coding.
However, if intraframe coding is not used for long periods, errors from predictive coding begin to accumulate, leading to less efficient coding, which causes the increase in bit rates seen in Fig. 7.

The probability mass function (pmf), or distribution, of ATM cells generated per frame for different (N, q) combinations is shown in Fig. 5. Note that the distribution for an (N = 1, q = 4) coder is quite different from that of an (N = 16, q = 4) coder: for the N = 1 coder, both the average number of cells per frame and the spread of the distribution are greater. Increasing q also makes the pmf more one-sided. This observation is somewhat similar to those made in [3], where it was shown that a lower-resolution source, such as video conferencing, had a skewed probability density function compared to a broadcast television source.

Figure 9. Cells-per-slice pmf for standard coder.

The autocorrelation function for cells generated per frame for three different values of N and q is shown in Fig. 6. When both interframe and intraframe coding are used (N = 16), impulses in the autocorrelation function occur every N frames, confirming that significant correlation exists between the intraframes in the cell generation process. When only intraframe (N = 1) coding is used, correlation between cells generated per frame persists for up to 300 frames.

Slice Layer

If network resource allocation decisions can be made using a more detailed source characterization, a slice layer model may be a better choice than a frame layer model, since the MPEG coding process is essentially slice-based. In addition, slices are the smallest self-contained units of variable length coded data in the MPEG coding scheme and are therefore the smallest independent units of measure.
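The layer sizes used in this study (512 x 480 frames, slices of 512 x 16 pixels, 16 x 16 macroblocks made of six 8 x 8 blocks) fix how many units each frame contains and how much time each unit gets. The sketch below works these counts out. The frame rate is an assumption, since it is not stated here, so two plausible rates are shown: at 24 frames/s one macroblock period comes to about 43 μs, and at 30 frames/s one slice period is about 1.1 ms. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameGeometry:
    width: int = 512        # pixels per line, as in the study
    height: int = 480       # lines per frame
    mb_size: int = 16       # a macroblock covers 16 x 16 luminance pixels
    blocks_per_mb: int = 6  # 4 Y + 1 Cr + 1 Cb blocks of 8 x 8

    @property
    def mbs_per_slice(self) -> int:
        # A slice here is one 16-pixel-high strip spanning the frame width.
        return self.width // self.mb_size

    @property
    def slices_per_frame(self) -> int:
        return self.height // self.mb_size

    @property
    def mbs_per_frame(self) -> int:
        return self.mbs_per_slice * self.slices_per_frame

    @property
    def blocks_per_frame(self) -> int:
        return self.mbs_per_frame * self.blocks_per_mb


def unit_times(geom: FrameGeometry, frame_rate: float) -> dict:
    """Time available per slice (ms) and per macroblock (us) at a frame rate."""
    frame_time = 1.0 / frame_rate
    return {
        "slice_ms": 1e3 * frame_time / geom.slices_per_frame,
        "macroblock_us": 1e6 * frame_time / geom.mbs_per_frame,
    }


if __name__ == "__main__":
    g = FrameGeometry()
    print(g.slices_per_frame, g.mbs_per_slice, g.mbs_per_frame, g.blocks_per_frame)
    # 30 32 960 5760
    for fps in (24.0, 30.0):  # assumed frame rates
        t = unit_times(g, fps)
        print(fps, round(t["slice_ms"], 2), round(t["macroblock_us"], 1))
```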
It should also be noted that the cells for an encoded slice are generated on a time scale of the order of 1 ms. This time scale is closer to the cell-level process in ATM than the cells-per-frame statistics used in previous studies of video sources, and should therefore be more accurate for modeling. This leads us to the hypothesis that a better source characterization can be obtained by examining the process at the slice layer.

Figure 10. Cells-per-slice autocorrelation function for standard coder.

Figure 11. Burst size and interarrival time pmf.

Table 1. Cells-per-frame statistics for several sequences.

Statistical Results - The statistics of the cells-per-slice process (Fig. 9) show the same trends as the cells-per-frame process (e.g., the standard deviation for mixed intra/interframe coders is proportionally larger relative to the mean than for intraframe coders). The cells-per-slice statistics therefore capture the same properties as the cells-per-frame statistics. A comparison of the pmf of cells per slice (Fig. 9) with that of cells per frame (Fig. 5) shows the cells-per-slice distribution to be smoother. This is especially true for the high-quality sources (q = 4) and for the tails of the distributions.

The autocorrelation function for the number of cells generated in a slice is extremely high for lags of up to 1200 slices (40 frames), as shown in Fig. 10. In fact, the structure of the correlations is more pronounced in the cells-per-slice data than in the cells-per-frame data, because the cross-correlations between different slices within a frame are separated from the correlations between the same slice in adjacent frames. The resulting periodicity, every 30 slices, can be seen in Fig. 10. The correlation peaks from the intraframe components are also visible, but are less pronounced than in the cells-per-frame autocorrelation function. This observation implies that it may be possible to treat each slice as an independent unit and model each frame as 30 independent slices.

The relation of the slice layer model to the frame layer model is depicted in Fig. 8. For the slice layer model we assume that the time to code each slice is essentially a constant T, since the coding process for each macroblock is very similar. Cell arrivals from successive slices in a frame can therefore be depicted as a periodic process, where the cells generated for a slice are assumed to arrive as a burst at the end of the slice. In this model, the bits of one slice are never combined with those of other slices, and at least one cell is always generated for every slice. Note that the number of cells generated per frame will be slightly higher than for the frame layer model, since partially filled cells will exist at the end of each slice. However, the increase in cell generation rate is small (e.g., 227.69 versus 242.1 cells/frame for N = 16, q = 4), and treating slices as independent packetization units can be useful for error recovery mechanisms.

Macroblock Layer

An alternative approach to modeling at a sub-frame level is to characterize the source traffic at the macroblock layer by examining the cell interarrival process. If the bits for a macroblock are packetized into cells as they are generated by the coder, the interarrival times of cells can be used to model the source. Using this technique, the interarrival distribution for bursts, combined with the burst length distribution, can describe the cell generation process for each frame. For this model, the burst interarrival times in Fig. 11 are expressed in a normalized unit: the time required to encode a macroblock. The real-time nature of video results in a maximum value for this normalized unit that is determined by the frame rate and the spatial dimensions (e.g., 43 μs for this video sequence).

From Fig. 11, it can be observed that the distributions for both the mixed intra/interframe and the intraframe source coders are approximately exponential. However, in intraframe coding, bursts of more than one cell per macroblock are rare, while for mixed intra/interframe coding, burst lengths of up to six cells per macroblock were possible. This reinforces the observation that the mixed intra/interframe coder generates burstier traffic at all layers than the intraframe coder.

Effect of Scene Content

In the previous section the statistical properties of video traffic for a variety of source coding parameters were examined in detail using a single video sequence as input. Another important factor in determining the characteristics of video traffic is the scene content of the input video sequence. In this context, scene-content-related factors encompass any component intrinsic to a particular scene or sequence. These factors, which can vary across video sequences, include color, brightness, and motion content. Using a long video sequence to collect statistics eliminates some local variations associated with particular scenes. However, it is also important to determine the efficacy of the coding algorithm for several different types of input sequences.
This will provide an indication of the range of bit rates and associated statistics generated by a source that uses the MPEG algorithm. Statistics for nine two-minute sequences from a variety of sources, digitized and encoded using the MPEG coding parameters (N = 16, q = 4), are given in Table 1. The results show the variability in the average and peak cells per frame that can exist when encoding sequences with differing characteristics. From the table, it can be observed that the bit rate for the Star Wars sequence is lower than for the other sequences. A plausible explanation for this effect is that the color content in the Star Wars sequence is fairly low (i.e., colors are close to gray scale). In addition, some of the other sequences contain more image detail (i.e., more high-frequency components) than the Star Wars sequence, which also results in lower compression ratios. The average rates for the sequences vary between 2.1 Mb/s for "News" and 7.8 Mb/s for "Boxing." The activity contained in a sequence is closely related to the average rate required to encode it.

It is evident from the results that although the average bit rates vary with the scenes contained in each sequence, some basic statistical properties of the video source traffic are quite similar even when sequences with these differing characteristics are encoded. In particular, the ratio of mean to standard deviation and the ratio of peak to mean of the cell generation process are fairly invariant across all sequences, with typical values ranging from 1.8 to 2.7 and from 2 to 3, respectively. The consistency of these ratios across all sequences suggests that a large part of the variability in the traffic characteristics is a result of the encoding process rather than the scene content.
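The cells-per-frame statistics above can be derived from a bits-per-frame trace under the packetization rules of the frame layer model: 44-byte (352-bit) payloads, no cell shared between frames, and at least one cell per frame. The sketch below applies those rules and computes the mean-to-standard-deviation and peak-to-mean ratios. The toy trace is made up for illustration, and using the population standard deviation is our choice, not something the article specifies. Applying the same rule per slice also shows why slice-layer packetization yields slightly more cells per frame.

```python
import statistics

PAYLOAD_BITS = 44 * 8  # 44-byte user payload per ATM cell


def cells_for(bits: int) -> int:
    """Cells needed for one packetization unit (frame or slice).

    Bits from different units never share a cell, and each unit sends
    at least one cell (its header alone needs one).
    """
    return max(1, (bits + PAYLOAD_BITS - 1) // PAYLOAD_BITS)


def cells_per_frame(bits_per_frame):
    return [cells_for(b) for b in bits_per_frame]


def traffic_ratios(cells):
    mean = statistics.fmean(cells)
    std = statistics.pstdev(cells)  # population standard deviation
    peak = max(cells)
    return {
        "mean": mean,
        "peak_to_mean": peak / mean,
        "mean_to_std": mean / std if std > 0 else float("inf"),
    }


if __name__ == "__main__":
    toy_bits = [105600, 35200, 35200, 35200]  # hypothetical: one I frame, three P frames
    cells = cells_per_frame(toy_bits)
    print(cells)  # [300, 100, 100, 100]
    print(traffic_ratios(cells))
    # Three 100-bit slices need 3 cells; the same 300 bits as one frame need 1.
    print(sum(cells_for(b) for b in (100, 100, 100)), cells_for(300))  # 3 1
```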
Finally, the importance of multiplexing techniques for video can be seen from the autocorrelation function for the ensemble of all nine sequences, shown in Fig. 12. Figure 12a shows the characteristics of this ensemble when multiplexing is performed with the intraframes of each sequence in phase. The autocorrelation function in this case is similar to that of the Star Wars sequence encoded with identical coding parameters (Fig. 6): the same features, i.e., intraframe impulses and slow decay of the autocorrelation function, are visible in both figures. When the intraframes are staggered (Fig. 12b), the periodicity associated with the intraframes is not visible. However, even in this best-case scenario the tail of the autocorrelation function is still fairly long and must be taken into account.

Layered MPEG Coding

Although many authors have proposed and analyzed congestion control schemes that use prioritization, few have addressed the details of implementing these priority mechanisms or the performance of these schemes in terms of video quality. In [4], this issue is addressed for intraframe (DCT) coded VBR video, and results are shown for both voice and video in terms of signal-to-noise ratio (SNR) performance. With current motion-compensation-based coding techniques, the effect of packet loss on service performance is harder to analyze.

Figure 13. MPEG coder with prioritization.

Multiresolution techniques that lead to layered coding schemes [8, 9] are particularly amenable to prioritized transmission. An MPEG-compatible algorithm that uses prioritized transmission can have practical significance in future networks.
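A toy model shows why staggering the intraframes removes the periodic impulses from the ensemble autocorrelation. Each hypothetical source below emits a fixed burst of cells on its intraframes (every N frames) plus uniform random noise; this is an assumption chosen to isolate the intraframe effect, not a model of the real sequences. With N sources whose intraframe phases cover all N positions, the bursts add up to a constant and the lag-N peak disappears, while in-phase multiplexing preserves it.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def toy_source(n_frames, period, phase, rng, burst=100.0, noise=10.0):
    """Hypothetical cells-per-frame trace: a burst on each intraframe
    (every `period` frames, starting at `phase`) plus uniform noise."""
    return [(burst if (t - phase) % period == 0 else 0.0) + rng.uniform(-noise, noise)
            for t in range(n_frames)]


def multiplex(traces):
    """Aggregate cells per frame across sources."""
    return [sum(frame) for frame in zip(*traces)]


if __name__ == "__main__":
    N, frames = 4, 400
    rng = random.Random(1)
    in_phase = multiplex([toy_source(frames, N, 0, rng) for _ in range(N)])
    staggered = multiplex([toy_source(frames, N, k, rng) for k in range(N)])
    print("in phase,  lag N:", round(sample_acf(in_phase, N)[N], 2))
    print("staggered, lag N:", round(sample_acf(staggered, N)[N], 2))
```

The toy only captures the intraframe component; it says nothing about the long autocorrelation tail, which in the real ensemble comes from scene content and persists even with staggering.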
Several issues need to be addressed in order to determine whether priority schemes will be useful in ATM networks. First, can the output of a VBR MPEG coder be prioritized satisfactorily? What is the cost of prioritization in terms of extra bandwidth required? What are the traffic characteristics of each priority level? How can the level of traffic in each priority be controlled?

Prioritization Scheme

In general, an efficient prioritization scheme separates the components of a data stream into some relative order of importance. Given the number of priority levels, the scheme must decide on the priority assignment for each of these components. Two desirable criteria for a prioritization scheme are simplicity and the ability to generate a non-prioritized traffic stream of equal quality if required. Following these principles, the MPEG coding algorithm was modified for prioritized network transmission.

To satisfy the simplicity criterion, the priority coder is designed to operate very similarly to the MPEG coder. The coding loop for this prioritized MPEG coder is shown in Fig. 13. The main processing steps are the same as for the standard MPEG coder, with the modifications that provide prioritization shown in the highlighted box. In standard MPEG coding, there is no priority control (PC) mechanism, and only a single variable length code generator is used. In our scheme, prioritization decisions can be made for the components of the coded data to give multiple priority streams. We consider a two-priority scheme here, giving high-priority (HP) and low-priority (LP) streams. Each priority stream is then variable length coded independently. The PC can be used to adjust the number of components assigned to each priority level.

In this priority coder, the coding mode used for the current macroblock determines the exact procedure for prioritizing the components. For I frames, only one macroblock coding mode is used.
In these frames, prioritization for each macroblock is performed as follows:

1) All header information is placed in the HP stream. This header information is primarily made up of code words for the macroblock address and the macroblock type.
2) The DC value of the DCT coefficients for each of the six 8 x 8 blocks in a macroblock is assigned to the HP stream.
3) For the remaining 63 coefficients in each 8 x 8 block, we define a parameter p, which specifies the number of AC coefficients to be placed in the HP stream.
4) The remaining (63 - p) coefficients are transmitted in the LP stream. A macroblock address header is also added to the LP information from each macroblock.

Figure 14. Multiplexing and transmission of prioritized cells.

For P frames, two macroblock coding modes are possible: the intraframe macroblock coding mode and the motion-compensated interframe coding mode. For the first mode, the prioritization scheme is the same as for macroblocks in I frames. For the motion-compensated mode, priorities are assigned as follows:

1) All header information is placed in the HP stream, as in the intraframe mode. This header consists of the macroblock address, the macroblock type, and a coded block pattern indicating which blocks in the current macroblock are coded (i.e., have a non-zero quantized prediction error).
2) Motion vectors for each macroblock are placed in the HP stream.
3) For the 64 DCT coefficients obtained from the transform of the prediction error for an 8 x 8 block, the first p coefficients are assigned to the HP stream.
4) The remaining (64 - p) coefficients are assigned to the LP stream. The LP stream in this mode also contains the macroblock address so that skipped macroblocks can be identified.
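The coefficient split in step 3 of both lists above can be sketched as a single function. It assumes the quantized coefficients of a block are already ordered from low to high frequency (so "the first p" means the lowest-frequency ones) and leaves out the handling of headers, motion vectors, and LP macroblock addresses. `split_block` is a hypothetical name, and treating p = 64 in an intraframe block like p = 63 (everything to HP) is our assumption.

```python
def split_block(coeffs, p, intra):
    """Split one 8 x 8 block's 64 quantized coefficients into HP and LP parts.

    coeffs: 64 values ordered from low to high frequency (DC first).
    intra:  True for intraframe-coded macroblocks, False for
            motion-compensated ones.
    """
    if len(coeffs) != 64:
        raise ValueError("expected 64 coefficients")
    if not 0 <= p <= 64:
        raise ValueError("p must be in 0..64")
    if intra:
        # DC always goes to HP, followed by the first p AC coefficients;
        # the remaining 63 - p AC coefficients go to LP.
        # Assumption: p = 64 behaves like p = 63 (all coefficients to HP).
        cut = min(1 + p, 64)
    else:
        # Prediction-error blocks: the first p of all 64 coefficients go
        # to HP and the remaining 64 - p go to LP.
        cut = p
    return coeffs[:cut], coeffs[cut:]


if __name__ == "__main__":
    block = list(range(64))  # stand-in coefficients, DC at index 0
    hp, lp = split_block(block, p=2, intra=True)
    print(len(hp), len(lp))  # 3 61
    hp, lp = split_block(block, p=64, intra=False)
    print(len(hp), len(lp))  # 64 0, so LP carries no coefficients
```

Because p only decides where the cut falls, the quantized coefficients themselves are the same for every p, which is consistent with the coding process being independent of p.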
It should be noted that, unlike intraframe macroblocks, where the largest energy components are in the low frequencies, in predictive coding the large energy DCT coefficients are distributed among all frequencies. Therefore, the assignment of lower frequency components to higher priorities as performed in step 3 is somewhat arbitrary. However, in order to make a simpler transition from a standard to a layered coder, we utilize the same assignment procedure in both intraframe and predictive coding modes. In this scheme, when β = 64, the LP stream will contain no information, while the HP stream will be identical to the bit stream generated by a non-prioritized MPEG coder.

Transmission of the HP and LP components requires packetization of the bit stream into ATM cells. We can visualize this process as shown in Fig. 14. The HP and LP output bit streams of the coder are first fed to a packetizer and then to a multiplexer to combine the two output streams. All HP cells for a frame are first collected and then transmitted, followed by the LP cells for the frame.

An interesting feature of the prioritization scheme described above is that the coding process is independent of β. This ensures that the coding process is efficient for any value of β. The drawback of this scheme is that when losses occur in the LP stream, the coder and decoder lose synchronization until the next I frame. It should be pointed out that in our scheme, for a given β, the frequency components in the HP stream that are transmitted to the decoder will always be the same. Therefore, although synchronization may be lost between coder and decoder, the effect of LP cell losses is similar to applying a low pass filter. We found the visual effect of these types of losses to be relatively mild.

In the next section, we discuss the traffic characteristics of the prioritized MPEG video coder, in particular the effect of β on the traffic generated by the coder. All the statistics for the coder are presented using frame layer data.

Results

For this study, we utilized values of β between 2 and 16. When β is greater than 16, we found that the degradation of image quality from the loss of low priority components was very small. Further, the proportion of low priority traffic at these larger values of β was low. Therefore, there may be little advantage in using a β higher than 16, since one could equivalently use a non-prioritized coder in this case.

Table 2 shows the peak, mean, and standard deviation of the cell arrivals per frame process for several N and β. In general we observe that, for the same β, the peak and standard deviation of the cell arrivals for both priority streams of a mixed intra/interframe coder are higher than those of an intraframe coder, while the mean of the cell arrivals is lower. As expected, when β is decreased from 16 to 2, both the peak value and the variance of cell arrivals for the HP stream drop dramatically for both N = 1 and N = 16. The mean total (HP plus LP) cell arrivals per frame is fairly constant for all β, implying that varying β is an efficient way of controlling the magnitude of traffic in each of the priority streams.

The average total bit rate (sum of HP and LP) for the sequence is approximately 2.1 Mb/s for N = 16 and 3.4 Mb/s for N = 1. The bit rates for a standard MPEG coder using the same q value were 1.9 Mb/s for N = 16 and 2.9 Mb/s for N = 1. Using the priority coder therefore increases the bit rate by approximately 10 to 20 percent. This value is important in determining whether prioritized transmission is useful, i.e., whether prioritization gives some advantage that warrants this increase in bandwidth.

Figure 15 shows the distribution of the cells generated per frame for the N = 16 and N = 1 coders, respectively. For the N = 16 coder, the distributions of the LP and HP cells generated per frame have a similar shape. For the N = 1 coder, as β is increased, the distribution of cell arrivals for the LP stream converges to a fixed value. This fixed value of approximately three cells is the amount of header (overhead) required by the LP stream.

The autocorrelation functions for the HP and LP traffic for the N = 1 and N = 16 coders (Fig. 16) show an interesting property of the layered coding scheme. For the HP traffic, we notice that the correlation between intraframe coded frames that occur every Nth frame still remains. In the LP autocorrelation function, however, the correlation between intraframe coded frames is much smaller. This implies that for the LP traffic, the number of cells generated per slice is less dependent on the coding mode utilized than it is for the HP traffic.

Figure 16. Autocorrelation of HP traffic (q = 3, β = 16).

The ratio of HP cells to the total for different β is shown in Fig. 17. The results for the N = 1 case show that a large proportion of the traffic is due to the low frequency DCT coefficients. For the N = 16 case, for a given β, a smaller proportion of the total cells consists of HP cells than for the N = 1 coder. This is mainly because in P frames, as mentioned previously, the nonzero values of the coded coefficients of the prediction error will be uniformly distributed among all DCT components. Hence, for a given β, a larger fraction of the traffic will consist of LP cells.

Figure 17. HP/total bit rate versus β.

Figure 18. Smoothing video sources.
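The per-frame packetization and multiplexing order described above for Fig. 14 (each stream packetized separately, then all HP cells of a frame sent before its LP cells), together with the HP-to-total cell ratio plotted in Fig. 17, can be sketched as follows. This is a hedged illustration: the 48-byte payload is the standard ATM cell payload, but the sketch assumes no adaptation-layer overhead, does not model the LP macroblock address headers, and uses made-up byte counts rather than data from the study. The function names are hypothetical.

```python
from typing import List, Tuple

ATM_PAYLOAD_BYTES = 48  # ATM cell payload; adaptation-layer overhead ignored (assumption)


def cells_for(nbytes: int, payload: int = ATM_PAYLOAD_BYTES) -> int:
    """Number of cells needed to carry nbytes, padding the last cell."""
    return (nbytes + payload - 1) // payload


def multiplex_frame(hp_bytes: int, lp_bytes: int) -> List[str]:
    """Transmission order for one frame: every HP cell first, then every LP cell."""
    return ["HP"] * cells_for(hp_bytes) + ["LP"] * cells_for(lp_bytes)


def hp_fraction(frames: List[Tuple[int, int]]) -> float:
    """Ratio of HP cells to total cells over (hp_bytes, lp_bytes) pairs, one per frame."""
    hp = sum(cells_for(h) for h, _ in frames)
    total = hp + sum(cells_for(l) for _, l in frames)
    return hp / total if total else 0.0


print(multiplex_frame(100, 50))  # ['HP', 'HP', 'HP', 'LP', 'LP']
print(hp_fraction([(100, 50), (96, 0)]))  # 5 HP cells out of 7 in total
```

Packetizing each stream on its own means a frame's last HP cell and last LP cell may both be partly padded, which is one small source of the extra bandwidth a prioritized coder needs.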
Smoothing Video Sources

Smoothing has been suggested as a mechanism to make VBR video sources easier to manage on B-ISDN. In smoothing, cells from a source are buffered for some time period and are then transmitted at the average rate over this period. The rationale for smoothing can be seen most clearly if we examine a bursty source. When bursts occur without smoothing, the transmission rate required would be the peak rate of the burst. When smoothing is utilized, these burst periods are time averaged with periods of lower activity, thereby decreasing the bit rate required for transmission. The trade-off in smoothing traffic is the resulting added buffer requirements and increased delay. The maximum smoothing that can be utilized will be determined primarily by the extra delay that can be tolerated for a service.

For video services, smoothing can be implemented by buffering coded data for several frames and then transmitting this data over the time period of these frames. In [10], an analysis of an optimal smoothing technique for video sources is presented. This technique optimizes the trade-off between delay and buffer occupancy given the traffic characteristics of a video source. It was shown that if the video source traffic could be predicted for several frames, smoothing could be used advantageously.

The traffic characteristics of smoothed video can be viewed by considering time periods longer than a frame. In terms of layers, smoothed video corresponds to observing the coding process above the frame layer, i.e., when several frames are combined. The effect of smoothing on the negative cumulative distribution function of cells per frame for an N = 1 and an N = 16 MPEG video coder is shown in Fig. 18. We notice that for the intraframe coder the effect of smoothing is imperceptible.
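The frame-buffering form of smoothing described above (buffer k frames, then send their cells at the average rate over those k frames) can be sketched as follows. This is a simplified illustration, not the optimal scheme of [10]; the function name `smooth_cells` and the synthetic trace (one 300-cell I frame followed by fifteen 60-cell P frames, loosely mimicking a mixed intra/interframe coder) are assumptions, not data from the study.

```python
from typing import List


def smooth_cells(cells_per_frame: List[int], k: int) -> List[float]:
    """Buffer k frames at a time and send each group at its average rate.

    Every frame slot in a group of k frames is assigned the group's mean
    cell count; a trailing partial group is averaged over its own length.
    """
    if k < 1:
        raise ValueError("smoothing period k must be at least 1 frame")
    smoothed: List[float] = []
    for start in range(0, len(cells_per_frame), k):
        group = cells_per_frame[start:start + k]
        rate = sum(group) / len(group)
        smoothed.extend([rate] * len(group))
    return smoothed


# Synthetic trace (assumed, not measured): two groups of one I frame + 15 P frames.
trace = ([300] + [60] * 15) * 2
for k in (1, 2, 3, 4, 8):
    print(k, max(smooth_cells(trace, k)))
# prints 1 300.0, 2 180.0, 3 140.0, 4 120.0, 8 90.0 (one per line)
```

On this toy trace the peak falls quickly for the first few frames of smoothing and each additional frame removes less of it, while the buffering delay grows by roughly one frame period per frame of smoothing. A trace whose high rates persist over many frames, as for the intraframe coder, would keep a high average within every group and gain little.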
Smoothing has so little effect on the intraframe coder because high bit rates extend for long periods, as can be seen from the autocorrelation function. Therefore, there is little advantage in smoothing, since the average bit rate over a small number of frames can still be large. For the N = 16 coder, smoothing changes the traffic characteristics to some degree. From Fig. 18, it can be observed that there is a substantial decrease in the peak bit rate required when smoothing is utilized. This can be explained by the observation that smoothing will reduce the size of the intraframe peaks seen in Fig. 4. However, it should be noted that the incremental effect of smoothing on the peak rate when the smoothing duration is greater than three frames is not very large. Apart from this diminishing return from increasing the smoothing period, the increase in transmission delay caused by smoothing will result in larger buffers at the decoder, making large smoothing durations less attractive in practice.

Summary

The study of the statistical properties of packet video streams, to model video sources, is a required step in the process of designing B-ISDN networks to handle heterogeneous traffic. This work is especially critical because of the high bandwidth required for each video connection and because of the delay-sensitive nature of real-time video. We find that the traffic characteristics of video sources are very sensitive to changes in coding algorithms. Previous studies on the statistics of video sources used a variety of coding algorithms, and over time many of these coding algorithms have become either outdated or unpopular. With work on the MPEG video coding standard nearing completion (MPEG-1 is complete and MPEG-2 is near completion), statistical studies are required to determine the characteristics of coders that utilize this algorithm. This work is especially important for network modeling because of the many proposed uses of MPEG-like algorithms for video services from HDTV to multimedia communications.

By studying the underlying coding process of an MPEG coder, we determined a simple yet realistic packetization process for a real video coder. We were able to characterize a variety of video sources by examining the coder output at different operating points (N, q). From the results we have noted that as more compression is used (by increasing N), the ratio of the standard deviation to the average bit rate increases, confirming that higher compression leads to more bursty sources. It should also be pointed out that increasing N beyond a certain value does not increase compression. Video source traffic was also shown to be extremely correlated, especially for intraframe coding. For mixed intra/interframe coding, the periodic impulses caused by the I frames introduce an extra periodic correlation component. The range of the statistical properties for different input video sequences was examined, and it was shown that some of the characteristics depend primarily on the coding mechanism.

The observations that the time scale of the cell generation process for a slice corresponds approximately to the multiplexing layer in ATM networks, and that a slice is the smallest self-contained coding unit in MPEG, led us to examine the coding process at the slice layer. It was shown that modeling video sources at this layer may be an alternative for VBR MPEG video coders.

A modified MPEG video coder that generates a prioritized output bit stream was presented. The prioritization scheme was designed based not only on the relative importance of coded components, but also on ease of implementation and the ability to produce non-prioritized data.
The parameter β, which can be varied between 1 and 64, can be utilized to control the proportion of traffic assigned to high and low priorities. With a β value of 64, the priority coder produces standard, non-prioritized MPEG data. It was shown that by utilizing this type of coder we can achieve basic image quality only if HP traffic is guaranteed. Because of the high correlation in the bit stream, it was shown that smoothing has very little impact on the source characteristics, in particular for the intraframe coder. For the mixed intra/interframe coders, some improvement was obtained from smoothing the I frames over larger intervals. The maximum delay for a service will dictate the maximum amount of smoothing that can be tolerated. Smoothing and prioritization can be combined for a better transmission environment for VBR video sources.

Acknowledgments

We would like to thank Mark Garrett at Bellcore for his technical assistance in this study.

References

[1] B. Maglaris et al., "Performance Models of Statistical Multiplexing in Packet Video Communications," IEEE Trans. on Comm., vol. 36, no. 7, pp. 834-844, July 1988.
[2] P. Sen et al., "Models for Packet Switching of Variable-Bit-Rate Video Sources," IEEE J. on Sel. Areas in Comm., vol. 7, no. 5, pp. 865-869, June 1989.
[3] W. Verbiest, L. Pinnoo, and B. Voeten, "The Impact of the ATM Concept on Video Coding," IEEE J. on Sel. Areas in Comm., vol. 6, no. 9, pp. 1623-1632, Dec. 1988.
[4] M. Garrett and M. Vetterli, "Congestion Control Strategies for Packet Video," in Proc. Fourth International Workshop on Packet Video, Aug. 1991.
[5] R. M. Rodriguez-Dagnino, M. R. K. Khansari, and A. Leon-Garcia, "Prediction of Bit Rate Sequences of Encoded Video Signals," IEEE J. on Sel. Areas in Comm., vol. 9, no. 3, pp. 305-313, April 1991.
[6] D. LeGall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, vol. 34, no. 4, pp. 46-58, April 1991.
[7] T. Urabe et al., "MPEGTool: An X Window-Based MPEG Encoder and Statistics Tool," to appear in Multimedia Systems, May 1994. (Available in the public domain; for information, e-mail to: mpegtool@ee.upenn.edu)
[8] M. Ghanbari, "An Adaptive Video Codec for ATM Networks," in Proc. Third International Workshop on Packet Video, March 1990.
[9] G. Karlsson and M. Vetterli, "Subband Coding of Video for Packet Networks," Optical Engineering, vol. 27, no. 7, pp. 574-586, July 1988.
[10] T. Ott, A. Tabatabai, and T. V. Lakshman, "A Scheme for Smoothing Delay-Sensitive Traffic Offered to ATM Networks," in Proc. IEEE INFOCOM '92, May 1992.

Biographies

PRAMOD PANCHA received his B.S.E. and M.S.E. degrees in electrical engineering from the University of Pennsylvania, Philadelphia, in 1988 and 1991, respectively. He is currently working toward his Ph.D. in electrical engineering at the University of Pennsylvania. His interests are in video processing and multimedia services in B-ISDN. He is a student member of IEEE and a member of Eta Kappa Nu.

MAGDA EL ZARKI [M '88] received her B.E.E. from Cairo University in 1979, and her M.S. and Ph.D. degrees, both in electrical engineering, from Columbia University in 1981 and 1987, respectively. From 1981 to 1983 she was a communications network planner in the Department of International Telecommunications at Citibank. She joined Columbia University in 1983 as a research assistant in the Computer Communications Research Laboratory, where she was involved in the design and development of an integrated LAN testbed called MAGNET. Currently she is an assistant professor in the Department of Electrical Engineering at the University of Pennsylvania, Philadelphia, Pennsylvania, where she is involved in research in telecommunication networks. She also holds a secondary appointment in the Department of Computer and Information Sciences.
In January 1993 she was appointed part-time professor of Telecommunication Networks in the Faculty of Electrical Engineering at Delft University of Technology, Delft, The Netherlands. She is a member of the Association for Computing Machinery and Sigma Xi.
