The Discrete Cosine Transform
DCT Encoding

The general equation for a 1D (N data items) DCT is defined by the following equation:

F(u) = sqrt(2/N) · Λ(u) · Σ_{i=0..N-1} f(i) · cos[ (2i+1)·u·π / (2N) ]

and the corresponding inverse 1D DCT transform is simply F⁻¹(u), i.e.:

f(i) = sqrt(2/N) · Σ_{u=0..N-1} Λ(u) · F(u) · cos[ (2i+1)·u·π / (2N) ]

where Λ(u) = 1/√2 for u = 0, and Λ(u) = 1 otherwise.
The general equation for a 2D (N by M image) DCT is defined by the following equation:

F(u,v) = sqrt(2/N) · sqrt(2/M) · Λ(u) · Λ(v) · Σ_{i=0..N-1} Σ_{j=0..M-1} f(i,j) · cos[ (2i+1)·u·π / (2N) ] · cos[ (2j+1)·v·π / (2M) ]

and the corresponding inverse 2D DCT transform is simply F⁻¹(u,v), i.e.:

f(i,j) = sqrt(2/N) · sqrt(2/M) · Σ_{u=0..N-1} Σ_{v=0..M-1} Λ(u) · Λ(v) · F(u,v) · cos[ (2i+1)·u·π / (2N) ] · cos[ (2j+1)·v·π / (2M) ]

where Λ(u) and Λ(v) are as defined above.
The input image is N by M; f(i,j) is the intensity of the pixel in row i and column j; F(u,v) is the DCT coefficient in row u and column v of the DCT matrix. For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT. Compression is achieved because the lower right values represent higher frequencies and are often small - small enough to be neglected with little visible distortion. The DCT input is an 8 by 8 array of integers. This array contains each pixel's gray scale level; 8 bit pixels have levels from 0 to 255. Therefore an 8 point DCT would be:

F(u) = (1/2) · Λ(u) · Σ_{i=0..7} f(i) · cos[ (2i+1)·u·π / 16 ]

where Λ(u) = 1/√2 for u = 0, and Λ(u) = 1 otherwise.
The output array of DCT coefficients contains integers; these can range from -1024 to 1023. It is computationally easier and more efficient to regard the DCT as a set of basis functions which, given a known input array size (8 x 8), can be precomputed and stored. This involves simply computing the values for a convolution mask (an 8 x 8 window): multiply the window values by the image pixels they overlap, sum the products, and apply the window across all rows/columns of the image. The values are calculated directly from the DCT formula. The 64 (8 x 8) DCT basis functions are illustrated in Fig 7.9.
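As a sketch of this precomputation (the function names are illustrative, and the orthonormal DCT-II scaling from the formulas above is assumed), the 8 x 8 table of basis values can be built once, after which the transform is just sums of products:

```python
import math

N = 8

def lam(u):
    # Normalisation term from the DCT formula: 1/sqrt(2) for u = 0, else 1.
    return 1 / math.sqrt(2) if u == 0 else 1.0

# Precompute the basis table once: BASIS[u][i] is the weight that the
# u-th coefficient gives to the i-th input sample.
BASIS = [[math.sqrt(2 / N) * lam(u) * math.cos((2 * i + 1) * u * math.pi / (2 * N))
          for i in range(N)]
         for u in range(N)]

def dct_8(f):
    # 8-point DCT as a sum of products against the precomputed basis.
    return [sum(BASIS[u][i] * f[i] for i in range(N)) for u in range(N)]

# A flat block of gray level 128 puts all its energy in the DC coefficient:
coeffs = dct_8([128] * 8)
```

For a constant input, every AC coefficient comes out zero and only the DC term (128·√8 ≈ 362) survives, which is the energy-compaction behaviour described above.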
Why DCT not FFT? DCT is similar to the Fast Fourier Transform (FFT), but can approximate lines well with fewer coefficients (Fig 7.10)
DCT/FFT Comparison
Computing the 2D DCT

Factoring reduces the problem to a series of 1D DCTs (Fig 7.11):
apply a 1D DCT (vertically) to the columns, then
apply a 1D DCT (horizontally) to the resultant vertical DCT above (or, alternatively, horizontal then vertical).

The equations are given by:

G(u,j) = sqrt(2/N) · Λ(u) · Σ_{i=0..N-1} f(i,j) · cos[ (2i+1)·u·π / (2N) ]
F(u,v) = sqrt(2/M) · Λ(v) · Σ_{j=0..M-1} G(u,j) · cos[ (2j+1)·v·π / (2M) ]
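A minimal sketch of this column/row factoring (the function names are illustrative, not from any particular codec; orthonormal scaling assumed):

```python
import math

def dct_1d(f):
    # Orthonormal 1D DCT-II of a list of samples.
    n = len(f)
    lam = lambda u: 1 / math.sqrt(2) if u == 0 else 1.0
    return [math.sqrt(2 / n) * lam(u) *
            sum(f[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n)) for i in range(n))
            for u in range(n)]

def dct_2d(block):
    # Pass 1: 1D DCT vertically, down each column.
    cols = [dct_1d([row[j] for row in block]) for j in range(len(block[0]))]
    g = [[cols[j][i] for j in range(len(cols))] for i in range(len(block))]
    # Pass 2: 1D DCT horizontally, across each row of the intermediate result.
    return [dct_1d(row) for row in g]

# A flat 8x8 block transforms to a single nonzero DC coefficient:
F = dct_2d([[100.0] * 8 for _ in range(8)])
```

This needs 16 length-8 1D transforms per block instead of one 64-point double sum, which is the source of the speedup the factoring provides.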
Most software implementations use fixed point arithmetic. Some fast implementations approximate coefficients so all multiplies are shifts and adds. World record is 11 multiplies and 29 adds. (C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics, Speech, and Signal Processing 1989 (ICASSP `89), pp. 988-991)
IDCT: the inverse transform can be factored and accelerated in the same way.
HISTORY OF JPEG
"Joint Photographic Experts Group" is the original name of the committee that created the JPEG format. The standard was a joint effort by three of the world's largest standards organizations: The International Organization for Standardization (ISO), the International Telegraph and Telephone Consultative Committee (CCITT), and the International Electrotechnical Commission (IEC). The JPEG project began back in 1982. The goal was to create a data compression standard that would display an image within one second down a 64 Kbits/sec ISDN line. Eventually, the format would be able to send loss-less images. The standard was intended for natural, real world scenes. It was designed to compress natural pictures that are smooth and curved and have no jagged edges.
The DCT is the transform used in JPEG compression.
The project began under ISO as Working Group 8 but later merged with CCITT. The Joint Photographic Experts Group, actually a subcommittee of ISO, was then formed in 1986 in order to avoid competing standards among the three standards organizations. After testing numerous schemes, the Adaptive Discrete Cosine Transform (DCT) was chosen as the core of the JPEG format. Three years later, the merged ISO/IEC committee gave its approval to make JPEG the standard. It was drafted as ISO Committee Draft 10918, Digital Compression and Coding of Continuous-Tone Still Images, and officially standardized as International Standard ISO 10918-1.
JPEG has been in existence for nearly a decade. Revisions updating JPEG to take advantage of current technology are in progress. This project has been under way since August 1998. The project team is developing a JPEG format that provides more compression options and better images in the same amount of space. The core of JPEG 2000 is said to be wavelet technology. The release date was set for January 2000, but implementation will probably take some time.
DCT: a technique for converting a signal into elementary frequency components.

Why do we need compression?
Image, audio, and video data require large amounts of storage space, large transmission bandwidth, and long transmission times.
Principles behind compression

Redundancy reduction: aims at removing duplication from the signal source.
Irrelevancy reduction: omits parts of the signal that will not be noticed by the signal receiver.

One-Dimensional Discrete Cosine Transform

The DCT can be written as the product of a vector (the input list) and the n x n orthogonal matrix whose rows are the basis vectors. The matrix is orthogonal, and each basis vector corresponds to a sinusoid of a certain frequency. The general equation for a 1D (N data items) DCT is:

F(u) = sqrt(2/N) · Λ(u) · Σ_{i=0..N-1} f(i) · cos[ (2i+1)·u·π / (2N) ]

The one-dimensional DCT is useful in processing one-dimensional signals such as speech waveforms. For the analysis of two-dimensional (2D) signals such as images, we need a 2D version of the DCT.
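The orthogonality claim can be checked numerically. A small sketch (assuming the orthonormal scaling used above): build the matrix whose rows are the basis vectors and take pairwise dot products.

```python
import math

n = 8
lam = lambda u: 1 / math.sqrt(2) if u == 0 else 1.0

# Rows of A are the DCT basis vectors: sampled cosines of increasing frequency.
A = [[math.sqrt(2 / n) * lam(u) * math.cos((2 * i + 1) * u * math.pi / (2 * n))
      for i in range(n)]
     for u in range(n)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Gram matrix: distinct rows are orthogonal (dot product 0),
# and each row has unit length (dot product with itself is 1).
gram = [[dot(A[u], A[v]) for v in range(n)] for u in range(n)]
```

Because the Gram matrix is the identity, the inverse DCT is simply multiplication by the transpose of A, which is why the transform is perfectly reversible.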
With DCT-based coding, the image components are either compressed entirely one at a time, or compressed by alternately interleaving 8x8 sample blocks from each in turn. For a typical 8x8 sample block from a typical source image, most of the spatial frequencies have zero or near-zero amplitude and need not be encoded. The DCT itself introduces no loss to the source image samples; it merely transforms them to a domain in which they can be more efficiently encoded. The DC coefficient, which contains a significant fraction of the total image energy, is differentially encoded. Entropy Coding (EC) achieves additional compression losslessly by encoding the quantized DCT coefficients more compactly based on their statistical characteristics. While DCT-based image coders perform very well at moderate bit rates, at higher compression ratios image quality degrades because of the artifacts resulting from the block-based DCT scheme.

Advantages and Disadvantages

The DCT does a better job of concentrating energy into lower order coefficients than the DFT for image data. The DCT is purely real; the DFT is complex. Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (unchanged by a shift of the input); this is not true for the DCT.
Cut the image up into chunks of 8x8 pixels.
Run each chunk through an 8x8 2D DCT.
Quantize the resulting coefficients (i.e. throw away unimportant information to reduce the file size - JPEG does this by dividing the coefficients by a quantization matrix in order to get long runs of zeros).
Compress the quantized coefficients using a lossless method (RLE, Huffman, arithmetic coding, etc).
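The quantization step alone can be sketched as follows. The coefficient block here is made up for illustration (large values in the upper left, as is typical for a natural-image block); the divisor table is the widely published JPEG luminance quantization matrix:

```python
# Standard JPEG luminance quantization matrix.
Q = [[16, 11, 10, 16, 24, 40, 51, 61],
     [12, 12, 14, 19, 26, 58, 60, 55],
     [14, 13, 16, 24, 40, 57, 69, 56],
     [14, 17, 22, 29, 51, 87, 80, 62],
     [18, 22, 37, 56, 68, 109, 103, 77],
     [24, 35, 55, 64, 81, 104, 113, 92],
     [49, 64, 78, 87, 103, 121, 120, 101],
     [72, 92, 95, 98, 112, 100, 103, 99]]

# Hypothetical DCT coefficients for one 8x8 block: low frequencies dominate.
coeffs = [[620, -35, 12, 4, 1, 0, 1, 0],
          [-48, 20, -8, 3, 1, 1, 0, 0],
          [10, -6, 4, 2, 0, 0, 0, 0],
          [5, 3, -2, 1, 0, 0, 0, 0],
          [2, 1, 0, 0, 0, 0, 0, 0],
          [1, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0]]

# Quantize: divide each coefficient by the matching table entry and round.
quantized = [[int(round(c / q)) for c, q in zip(crow, qrow)]
             for crow, qrow in zip(coeffs, Q)]

zeros = sum(row.count(0) for row in quantized)
```

Most of the block becomes zero after quantization, which is exactly what makes the subsequent lossless stage (RLE/Huffman/arithmetic coding) effective.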
Spatial >>> Frequency
Mathematically, the DCT is perfectly reversible and we do not lose any image definition until we start quantizing coefficients. In my simulation, I simply threw most of them out. A better quantizer would decrease precision gradually instead of simply zeroing out components. Below is the original image and reconstructions of it using only the most significant n x n coefficients.
Original image
4x4 (25%)
3x3 (14%)
2x2 (6%)
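The keep-only-n x n experiment can be sketched as follows (a rough illustration on a made-up smooth block, not the simulation that produced the images above):

```python
import math

def dct_1d(f, inverse=False):
    # Orthonormal 1D DCT-II and its inverse.
    n = len(f)
    lam = lambda u: 1 / math.sqrt(2) if u == 0 else 1.0
    if not inverse:
        return [math.sqrt(2 / n) * lam(u) *
                sum(f[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n)) for i in range(n))
                for u in range(n)]
    return [math.sqrt(2 / n) *
            sum(lam(u) * f[u] * math.cos((2 * i + 1) * u * math.pi / (2 * n)) for u in range(n))
            for i in range(n)]

def apply_2d(block, inverse=False):
    # Separable 2D (I)DCT: 1D transform on rows, then on columns.
    rows = [dct_1d(r, inverse) for r in block]
    cols = [dct_1d([rows[i][j] for i in range(len(rows))], inverse)
            for j in range(len(rows[0]))]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(rows))]

def keep(coeffs, n):
    # Zero out everything outside the top-left n x n corner.
    return [[c if (i < n and j < n) else 0.0 for j, c in enumerate(row)]
            for i, row in enumerate(coeffs)]

# Smooth hypothetical 8x8 block: a diagonal gradient.
block = [[float(10 * (i + j)) for j in range(8)] for i in range(8)]
F = apply_2d(block)

full = apply_2d(F, inverse=True)             # all 64 coefficients: exact
approx = apply_2d(keep(F, 4), inverse=True)  # 4x4 (25%) of the coefficients
```

For smooth blocks like this one, the 4x4 reconstruction is already close to the original, mirroring the image series above.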
Optimization
A 2D DCT can be evaluated more quickly using a series of 1D DCTs. The formula:

F(u,v) = sqrt(2/N) · Λ(u) · Σ_{i=0..N-1} cos[ (2i+1)·u·π / (2N) ] · ( sqrt(2/N) · Λ(v) · Σ_{j=0..N-1} f(i,j) · cos[ (2j+1)·v·π / (2N) ] )

This means that instead of performing a 2D DCT, we can perform a 1D DCT on each row and then on each column of the block we're processing. Even a naïve implementation of the separated DCT, such as the one in listing2.c, will run faster than a single-pass 2D DCT. The AAN (Arai/Agui/Nakajima) algorithm is one of the fastest known 1D DCTs. listing3.c shows a still faster 2D DCT built from 1D AAN DCTs.
1-Dimensional DCT
The 1-dimensional Forward DCT transforms a row of 8 residual values (V) into a row of 8 coefficients (C):

C(u) = (1/2) · Λ(u) · Σ_{x=0..7} V(x) · cos[ (2x+1)·u·π / 16 ],  with Λ(0) = 1/√2 and Λ(u) = 1 for u > 0
The following figure shows the result of the FDCT applied to a sample row. The residual values are represented using shades of gray (for clarity, only positive values are used here).

Sample row:
Residual values:   121   58   61  113  171  200  226  246
DCT coefficients:   61 -175   43   49   37   10    5    5
The resulting DCT coefficients do not look any more compressible at first glance. The clue to compression, however, can be seen in the decreasing magnitude of the coefficients. The later coefficients are much smaller than the first ones, which suggests that they do not contribute much to the image quality. The following graphs clarify this:
The red dots represent the original residual values, also shown at the bottom of each graph. The blue line represents the reconstructed residual values after performing an Inverse DCT on a subset of the DCT coefficients. In the first graph, only the first DCT coefficient (the DC coefficient) is used to reconstruct the values. In each following graph, an additional coefficient (AC coefficient) is added to the reconstruction. The first graph shows a straight line through the average residual value, which means that the DC coefficient represents the average of an entire row of values. By adding AC coefficients, detail is added to the image and the blue line moves closer to the original red dots. By using only 4 of the 8 coefficients, the reconstructed line is already close to the original. From the fifth graph onward, the subsequent improvements become smaller and less noticeable. This is the key to compression. Since the higher-order (high-frequency) DCT coefficients contribute less information to the image, they can be discarded while still producing a close approximation of the original.
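This graduated reconstruction is easy to reproduce in a sketch (the sample row below is hypothetical, not the one from the figure; the transform is the 8-point DCT defined above):

```python
import math

LAM = [1 / math.sqrt(2)] + [1.0] * 7

def fdct(v):
    # C(u) = 1/2 * Lam(u) * sum over x of V(x) * cos((2x+1) u pi / 16)
    return [0.5 * LAM[u] * sum(v[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                               for x in range(8))
            for u in range(8)]

def idct(c):
    # Matching inverse transform.
    return [0.5 * sum(LAM[u] * c[u] * math.cos((2 * x + 1) * u * math.pi / 16)
                      for u in range(8))
            for x in range(8)]

def reconstruct(c, k):
    # Inverse DCT using only the first k coefficients (the rest zeroed).
    return idct(c[:k] + [0.0] * (8 - k))

v = [100.0, 102.0, 105.0, 109.0, 114.0, 120.0, 127.0, 135.0]  # hypothetical residuals
c = fdct(v)

# Squared reconstruction error as more coefficients are added (k = 1..8).
errors = [sum((a - b) ** 2 for a, b in zip(reconstruct(c, k), v)) for k in range(1, 9)]
```

With k = 1 the result is a flat line at the row average (114.0 here), and the error shrinks monotonically as each AC coefficient is added, matching the sequence of graphs described above.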
Quantisation
But completely discarding coefficients is not always desirable. In certain types of images with high contrast, like textual images or cartoons, the high frequency coefficients are important to the image detail and cannot be discarded. This is why JPEG, and AIC, use quantisation, which in this context is just a fancy word for division. Each coefficient gets divided by a certain value. The higher this value, the smaller the results will be. This makes the coefficients more compressible, but also reduces image quality because the coefficients cannot be reconstructed faithfully. When you choose a quality level in JPEG or AIC, you actually set the amount of quantisation used.

JPEG uses the DCT to transform pixel values instead of residual values. It also uses a non-uniform quantisation method by which high frequency coefficients (the later coefficients) are quantised with higher values than low frequency ones. AIC performs the DCT on residual values. Several tests have shown that uniform quantisation is more appropriate in this case; in AIC, all coefficients are quantised by the same value.
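A sketch of uniform quantisation with a single step size (the coefficient values and the step of 10 are made up for illustration):

```python
def quantise(coeffs, step):
    # Divide each coefficient by the step and round to the nearest integer.
    return [int(round(c / step)) for c in coeffs]

def dequantise(levels, step):
    # Reconstruction: multiply back; the error per coefficient is at most step/2.
    return [l * step for l in levels]

coeffs = [422.0, -175.0, 43.0, 49.0, 37.0, 10.0, 5.0, 5.0]  # hypothetical row
levels = quantise(coeffs, 10)
restored = dequantise(levels, 10)
```

The quantised levels are small integers (and the smallest coefficients collapse to zero), which is what makes them cheap to entropy-code; the price is the bounded reconstruction error visible in `restored`.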
2-Dimensional DCT
The 1-dimensional DCT discussed above only takes advantage of correlation between residual values in a row. Better compression can be achieved when we take both the horizontal and vertical correlation between residual values into account. This is done by performing a 2-dimensional DCT on a block of 8x8 residual values:

C(u,v) = (1/4) · Λ(u) · Λ(v) · Σ_{x=0..7} Σ_{y=0..7} V(x,y) · cos[ (2x+1)·u·π / 16 ] · cos[ (2y+1)·v·π / 16 ]
However, a 2D DCT can also be implemented by first applying a 1D DCT on all rows, followed by a 1D DCT on all columns from the result of the first step. This is much faster than implementing a 2D DCT directly.
The DCT code used in the AIC codec is based on the code in the JPEG reference software from the Independent JPEG Group. This reference software supports different algorithms. AIC only uses the floating point algorithm, since it produces the highest quality images. It is a bit slower than the other algorithms, but on modern computers floating point calculations are performed much faster than in the old days. To speed up the calculations, the AAN (Arai, Agui and Nakajima) algorithm is used to calculate the DCT. Finally, in the last step, the DCT coefficients and prediction modes are encoded to the stream using Context Adaptive Binary Arithmetic Coding (CABAC). In JPEG, the DCT coefficients are transmitted in zigzag order to form runs of zeros which can be encoded using run length encoding. The CABAC codec does not use run length encoding, so there is no need to reorder the DCT coefficients; in AIC, the coefficients are transmitted in scan line order.