DIP Lecture Note - Image Compression
Module #6
Title: Image compression: redundancy
Explanation:
Image compression:
Data compression refers to the process of reducing the amount of data required to
represent a given quantity of information.
Data redundancy is a central issue in DIP.
Data that either provide no relevant information or simply restate what is already known
are said to contain data redundancy.
To achieve compression, we have to reduce this redundancy.
If n1 and n2 denote the number of information carrying units in two data sets that convey
the same information, the relative data redundancy R_D of the first data set can be defined as

R_D = 1 - \frac{1}{C_R}

where C_R is called the compression ratio and is given by

C_R = \frac{n_1}{n_2}
In digital image compression, 3 basic data redundancies can be identified and exploited.
They are coding redundancy, interpixel redundancy and psychovisual redundancy.
Coding Redundancy: Data compression can be achieved by encoding the data using an
appropriate encoding scheme. The elements of an encoding scheme are
i. Code: a system of symbols used to represent a body of information or set of
events (letters, numbers etc.)
ii. Code word: a sequence of symbols used to represent a piece of information
or an event
iii. Word length: number of symbols in each code word
Let a discrete random variable r_k in the interval [0, 1] represent the gray values
constituting an image, and let each r_k occur with probability p_r(r_k). Then
p_r(r_k) is given by

p_r(r_k) = \frac{n_k}{n}, \qquad k = 0, 1, 2, \ldots, L-1

where L is the number of gray levels, n_k is the number of pixels with gray level r_k
(the histogram) and n is the total number of pixels in the image.
If the number of bits used to represent r_k is l(r_k), then the average number of bits
required to represent each pixel is

L_{avg} = \sum_{k=0}^{L-1} l(r_k) \, p_r(r_k).

Thus the total number of bits required to code an M x N image is M N L_{avg}.
There are two types of codes: constant length (fixed length) coding and
variable length coding.
Consider the table given below:
Code 1 represents a fixed length coding scheme and code 2 represents a variable
length coding scheme; L_avg of code 1 is 3 bits and that of code 2 is 2.7 bits.
The resulting compression ratio is C_R = 3/2.7 = 1.11 and the redundancy is
R_D = 1 - (1/1.11) = 0.099, i.e. 9.9% of the data in the first coding scheme is
redundant. This is coding redundancy.
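These figures can be verified with a short Python computation. The probabilities and
code lengths below are assumptions chosen to match the Gonzalez and Woods example that
this note appears to follow, since the table itself is not reproduced here:

    # Hypothetical values assumed from the Gonzalez and Woods example.
    probs  = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]  # p_r(r_k)
    len_c1 = [3] * 8                      # code 1: fixed 3-bit words
    len_c2 = [2, 2, 2, 3, 4, 5, 6, 6]     # code 2: variable-length words

    L1 = sum(l * p for l, p in zip(len_c1, probs))   # 3.0 bits/pixel
    L2 = sum(l * p for l, p in zip(len_c2, probs))   # 2.7 bits/pixel
    CR = L1 / L2                                     # about 1.11
    RD = 1 - 1 / CR                                  # about 0.099
    print(L1, L2, CR, RD)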
Interpixel Redundancy:
Interpixel redundancy implies that any pixel value can be reasonably predicted from
the values of its neighbors (i.e., neighboring pixels are correlated).
Structural or geometric relationships between the objects in an image can lead to
interpixel redundancy.
A variety of names, including spatial redundancy, interframe redundancy and
geometric redundancy, have been coined to refer to these interpixel dependencies.
We use the term interpixel redundancy to encompass them all.
In order to reduce this redundancy, the pixel array is transformed into a more
efficient representation. For example, the differences between adjacent pixels can
be used to represent an image, as in the sketch below.
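A minimal Python sketch of such a difference mapping (the pixel values are made up):

    import numpy as np

    # Represent a row by its first pixel plus adjacent-pixel differences.
    row = np.array([100, 101, 101, 103, 104, 104, 104, 106])
    diffs = np.diff(row)                                  # small, highly peaked values
    restored = np.concatenate(([row[0]], row[0] + np.cumsum(diffs)))
    assert np.array_equal(restored, row)                  # the mapping is reversible

Because the differences cluster tightly around zero, they can be coded with far fewer
bits on average than the raw gray levels.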
Psychovisual Redundancy:
Certain information simply has less relative importance than other information in
normal visual processing. This information is said to be psychovisually redundant.
Unlike the other redundancies, this type of redundancy is associated with real or
quantifiable visual information.
Since its elimination results in a loss of quantitative information, the process is
referred to as quantization.
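A toy Python illustration of quantization (the pixel values are made up): dropping the
four least significant bits maps 256 gray levels onto 16, discarding detail the eye is
least sensitive to. Unlike the difference mapping above, this step is irreversible.

    import numpy as np

    pixels = np.array([12, 130, 131, 200], dtype=np.uint8)
    quantized = (pixels >> 4) << 4   # 16 levels; 130 and 131 become the same value
    # There is no way to recover the original values from `quantized`.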
Questions:
References:
Module #6
Title: Image compression: the compression model
Explanation:
The mapper transforms the input image into a format designed to reduce interpixel
redundancies.
The quantizer reduces the accuracy of the mapper's output in accordance with some
predefined fidelity criterion. This reduces the psychovisual redundancy.
The symbol encoder reduces the coding redundancy by assigning the shortest code words
to the most frequently occurring data.
The reverse process is performed in the source decoder section. A combined sketch is
given below.
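A minimal end-to-end sketch of this source encoder, reusing the ideas from the sections
above; the difference mapper, the step size and the frequency count are illustrative
stand-ins, not the model's prescribed components:

    import numpy as np
    from collections import Counter

    def source_encode(row, step=4):
        # Mapper: adjacent-pixel differences reduce interpixel redundancy
        # (a real coder would also transmit the base pixel row[0]).
        mapped = np.diff(row, prepend=row[:1])
        # Quantizer: coarser values reduce psychovisual redundancy (lossy).
        quantized = (mapped // step) * step
        # Symbol encoder: the most frequent values would now receive the
        # shortest code words (e.g. via Huffman coding).
        freqs = Counter(int(v) for v in quantized)
        return quantized, freqs

    row = np.array([100, 101, 101, 103, 104, 104, 104, 106])
    print(source_encode(row))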
Questions:
1. Explain the image compression model with the help of a block diagram.
2. Explain how the various types of redundancies are reduced by the compression system.
References:
1. “Digital image processing”, by Gonzalez and Woods
2. www.wikipedia.org
Module #6
Title: Image compression: elements of information theory
Explanation:
Shannon's first theorem (the noiseless coding theorem) defines the minimum average
code word length per source symbol that can be achieved.
A source of information with a finite ensemble of statistically independent source
symbols is called a zero-memory source.
According to this theorem, for a zero-memory source z,

H(z) = \lim_{n \to \infty} \frac{L_{avg,n}}{n}

where H(z) is the entropy of the source and L_{avg,n} is the average number of code
symbols required to represent blocks of n source symbols.
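For a zero-memory source, H(z) is the familiar first-order entropy,
H(z) = -\sum_j P(a_j) \log_2 P(a_j). A small Python check, with assumed symbol
probabilities:

    import math

    probs = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]   # assumed probabilities
    H = -sum(p * math.log2(p) for p in probs)
    print(H)                                   # about 2.14 bits/symbol

No code can achieve fewer than about 2.14 bits per symbol on average for this source.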
Questions:
References:
1. “Digital image processing”, by Gonzalez and Woods
2. www.wikipedia.org
Module #6
Title: Image compression: variable length coding (Huffman coding)
Explanation:
Huffman coding:
o The first step is to create a series of source reductions by ordering the symbol
probabilities and combining the two lowest-probability symbols into a single compound
symbol. This is repeated until a reduced source with only two symbols remains.
o The second step is to code each reduced source, starting with the smallest source
and working back to the original source.
o After the above steps, the code assignment is according to the following table.
Huffman decoding:
It is uniquely decodable because any string of code symbols can be decoded in only one way.
For the binary code of the above figure, a left-to-right scan of the encoded string
010100111100 reveals that the first valid code word is 01010, which is the code for
symbol a3. The next valid code word is 011, which corresponds to symbol a1. Continuing in
this manner reveals the completely decoded message to be a3a1a2a2a6.
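A minimal Python sketch of this left-to-right decoding. The code words for a1, a2, a3
and a6 follow from the walk-through above; those shown for a4 and a5 are assumptions
added only to complete the code table:

    codebook = {"a2": "1", "a6": "00", "a1": "011",
                "a4": "0100", "a3": "01010", "a5": "01011"}   # a4, a5 assumed

    def huffman_decode(bits, codebook):
        inverse = {code: sym for sym, code in codebook.items()}
        symbols, current = [], ""
        for bit in bits:
            current += bit
            if current in inverse:        # a complete code word has been seen
                symbols.append(inverse[current])
                current = ""
        return symbols

    print(huffman_decode("010100111100", codebook))
    # ['a3', 'a1', 'a2', 'a2', 'a6']

Because no code word is a prefix of another (the code is an instantaneous, prefix-free
code), matching the shortest complete code word at each step is unambiguous.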
Questions:
References:
1. “Digital image processing”, 3/e, by Gonzalez and Woods (Page no: 564-565)
2. www.wikipedia.org
Module #6
Title: Transform based compression techniques
Explanation:
In transform coding, a reversible, linear transform (such as the Fourier transform, the
DCT, etc.) is used to map the image into a set of transform coefficients, which are then
quantized and coded.
Transform coding is a type of data compression for "natural" data like audio signals or
photographic images. The transformation is typically lossless (perfectly reversible) on its
own but is used to enable better (more targeted) quantization, which then results in a
lower quality copy of the original input (lossy compression).
In transform coding, knowledge of the application is used to choose information to
discard, thereby lowering its bandwidth. The remaining information can then be
compressed via a variety of methods. When the output is decoded, the result may not be
identical to the original input, but is expected to be close enough for the purpose of the
application.
On the encoder side, four relatively straightforward operations are performed: subimage
decomposition, transformation, quantization and coding.
An N x N image is first subdivided into subimages of size n x n, which are then
transformed to generate (N/n)^2 subimage transform arrays, each of size n x n. The goal
of the transformation process is to decorrelate the pixels of each subimage, or to pack
as much information as possible into the smallest number of transform coefficients.
The quantization stage then selectively eliminates or more coarsely quantizes the
coefficients that carry the least information. These coefficients have the smallest
impact on reconstructed subimage quality. The encoding process terminates with a coding
step, normally a variable length code such as a Huffman code.
In the decoding stage, the reverse of all the operations performed in the encoding stage
is carried out to reconstruct the image.
Examples are image compression standards such as JPEG and JPEG2000. A toy sketch of the
encoder-side steps for one subimage is given below.
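A toy Python sketch for one 8 x 8 subimage, using SciPy's DCT; keeping only the ten
largest-magnitude coefficients stands in for the quantization stage (all parameter
choices here are illustrative, not those of any standard):

    import numpy as np
    from scipy.fftpack import dct, idct

    def dct2(block):
        return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

    def idct2(coeffs):
        return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

    block = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)
    coeffs = dct2(block)                              # transformation
    keep = np.sort(np.abs(coeffs).ravel())[-10]
    coeffs[np.abs(coeffs) < keep] = 0                 # crude "quantization"
    approx = idct2(coeffs)                            # decoder side
    print(np.abs(block - approx).mean())              # reconstruction error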
Module #6
Title: Image compression: MPEG, JPEG, JPEG2000
Explanation:
The key idea is to combine transform coding (in the form of the Discrete
Cosine Transform (DCT) of 8 × 8 pixel blocks) with predictive coding (in
the form of Differential Pulse Code Modulation (DPCM)) in order to reduce
the storage and computation costs of the compressed image, and at the same
time to give a high degree of compression and adaptability.
Since motion compensation is difficult to perform in the transform domain,
the first step in the interframe coder is to create a motion compensated
prediction error in the pixel domain.
For each block of the current frame, a prediction block in the reference frame is
found using the motion vector obtained during motion estimation, and the two are
differenced to generate the prediction error signal. This computation requires only
a single frame store in the encoder and decoder. A toy sketch follows.
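A toy Python sketch of exhaustive-search block matching and the resulting prediction
error for one 8 x 8 block (the frame contents and search range are made up):

    import numpy as np

    def best_match(ref, block, top, left, search=4):
        h, w = block.shape
        best, best_err = (top, left), np.inf
        for dy in range(-search, search + 1):          # exhaustive search over a
            for dx in range(-search, search + 1):      # small window in the reference
                y, x = top + dy, left + dx
                if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                    err = np.abs(ref[y:y+h, x:x+w] - block).sum()
                    if err < best_err:
                        best, best_err = (y, x), err
        return best

    rng = np.random.default_rng(1)
    ref = rng.integers(0, 256, (32, 32)).astype(float)   # reference frame
    cur = np.roll(ref, (2, 1), axis=(0, 1))              # current frame: shifted copy
    y, x = best_match(ref, cur[8:16, 8:16], 8, 8)
    residual = cur[8:16, 8:16] - ref[y:y+8, x:x+8]       # goes on to the 2-D DCT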
The resulting error signal is transformed using the 2-D DCT, quantized by an
adaptive quantizer, entropy encoded using a Variable-Length Coder (VLC)
and buffered for transmission over a fixed rate channel.
Pre-processing:
o Image tiling (optional) - for each image component
o DC level shifting - the same quantity is subtracted from the samples of each tile
o Color transformation (optional) - from RGB to YCbCr
The Discrete Wavelet Transform (DWT) is used to decompose each tile component into
different sub-bands.
The transform takes the form of a dyadic decomposition and uses biorthogonal wavelets.
1-D sets of samples are decomposed into low-pass and high-pass samples.
Low-pass samples represent a down-sampled, low resolution version of the original set.
High-pass samples represent a down-sampled residual version of the original set (the
details). A minimal sketch of one decomposition level is given below.
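A minimal Python sketch of one level of this decomposition using the PyWavelets
package (the package choice and the "bior2.2" biorthogonal wavelet are assumptions;
JPEG2000 itself specifies the CDF 9/7 and 5/3 wavelets):

    import numpy as np
    import pywt

    tile = np.random.default_rng(0).random((64, 64))
    cA, (cH, cV, cD) = pywt.dwt2(tile, "bior2.2")   # approximation + three detail bands
    print(cA.shape)                                  # down-sampled low-pass version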
After transformation, all coefficients are quantized using scalar quantization.
Quantization reduces the precision of the coefficients.
The coefficients in a code block are separated into bit-planes. The individual
bit-planes are coded in 1-3 coding passes.
Each of these coding passes collects contextual information about the bit-plane data.
The contextual information, along with the bit-planes, is used by the arithmetic
encoder to generate the compressed bit-stream. A toy illustration of the bit-plane
separation follows.
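A toy Python sketch of bit-plane separation (the coefficient values are made up; the
real JPEG2000 coder, EBCOT, also handles signs and the coding passes):

    import numpy as np

    coeffs = np.array([5, 3, 12, 7], dtype=np.uint8)        # quantized magnitudes
    planes = [(coeffs >> b) & 1 for b in range(7, -1, -1)]  # MSB plane first
    for b, plane in zip(range(7, -1, -1), planes):
        print(b, plane)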
Module #6
Title: Image compression: Dictionary based compression
Explanation:
LZW coding:
General purpose compression technique proposed by Lempel-Ziv-Welch (LZW).
LZW uses fixed-length codewords to represent variable-length strings of
symbols/characters that commonly occur together, e.g., words in English text.
LZW encoder and decoder build up the same dictionary dynamically while receiving
the data.
LZW places longer and longer repeated entries into a dictionary, and then emits the
code for an element, rather than the string itself, if the element has already been placed
in the dictionary.
It is conceptually very simple. At the onset of the coding process, a codebook or
dictionary containing the source symbols to be coded is constructed.
With 8-bit image data, an LZW coding method could employ 10-bit code words. The
corresponding string table would then have 2^10 = 1024 entries.
This table consists of the original 256 entries, corresponding to the original 8-bit
data, and allows 768 other entries for string codes.
The string codes are assigned during the compression process, but the actual string
table is not stored with the compressed data
During decompression the information in the string table is extracted from the
compressed data itself
For the GIF (and TIFF) image file formats the LZW algorithm is specified, but there
has been some controversy over this, since the algorithm was patented by Unisys
Corporation (the patent has since expired).
Since these image formats are widely used, other methods similar in nature to the
LZW algorithm have been developed to be used with these, or similar, image file
formats
LZW (Lempel-Ziv-Welch) coding assigns fixed-length code words to variable-length
sequences of source symbols, and requires no a priori knowledge of the probabilities
of the source symbols.
LZW is used in:
o Tagged Image File Format (TIFF)
o Graphics Interchange Format (GIF)
o Portable Document Format (PDF)
LZW was formulated in 1984
A codebook or “dictionary” containing the source symbols is constructed.
For 8-bit monochrome images, the first 256 words of the dictionary are assigned to the
gray levels 0-255
The remaining part of the dictionary is filled with sequences of the gray levels
Special features are:
o The dictionary is created while the data are being encoded, so encoding can be
done “on the fly”
o The dictionary does not need to be transmitted; it is rebuilt during decoding
o If the dictionary “overflows”, we have to reinitialize the dictionary and add
a bit to each one of the code words
o Choosing a large dictionary size avoids overflow, but degrades compression
Example: LZW coding of a 4 x 4, 8-bit image in which every row is 39 39 126 126.
The encoded output is 39 39 126 126 256 258 260 259 257 126.

Currently recognized   Pixel being   Encoded   Dictionary   Dictionary
sequence               processed     output    location     entry
                       39
39                     39            39        256          39-39
39                     126           39        257          39-126
126                    126           126       258          126-126
126                    39            126       259          126-39
39                     39
39-39                  126           256       260          39-39-126
126                    126
126-126                39            258       261          126-126-39
39                     39
39-39                  126
39-39-126              126           260       262          39-39-126-126
126                    39
126-39                 39            259       263          126-39-39
39                     126
39-126                 126           257       264          39-126-126
126                                  126
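A minimal Python sketch of an encoder that reproduces the trace above (the 512-entry
dictionary size is an arbitrary illustrative choice):

    def lzw_encode(pixels, dict_size=512):
        dictionary = {(v,): v for v in range(256)}    # first 256 entries: gray levels
        next_code, current, output = 256, (), []
        for p in pixels:
            candidate = current + (p,)
            if candidate in dictionary:
                current = candidate                   # keep growing the sequence
            else:
                output.append(dictionary[current])    # emit code for known sequence
                if next_code < dict_size:             # add the new sequence
                    dictionary[candidate] = next_code
                    next_code += 1
                current = (p,)
        if current:
            output.append(dictionary[current])
        return output

    print(lzw_encode([39, 39, 126, 126] * 4))
    # [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]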
Module #6
Title: Image compression: Vector quantization
Explanation:
Vector quantization
Vector quantization (VQ) is a classical quantization technique from signal processing that
allows the modeling of probability density functions by the distribution of prototype
vectors. It was originally used for data compression. It works by dividing a large set of
points (vectors) into groups having approximately the same number of points closest to
them. Each group is represented by its centroid point, as in k-means and some other
clustering algorithms; the position of a centroid in the codebook is known as its index.
The density matching property of vector quantization is powerful, especially for
identifying the density of large and high-dimensional data. Since data points are
represented by the index of their closest centroid, commonly occurring data have low
error, and rare data high error. This is why VQ is suitable for lossy data compression. It
can also be used for lossy data correction and density estimation.
Vector quantization is based on the competitive learning paradigm, so it is closely related
to the self-organizing map model and to sparse coding models used in deep learning
algorithms such as autoencoder.
Group into vectors (non-overlapping) and “quantize” each vector.
o For time signals… we usually form vectors from temporally-sequential samples.
o For images… we usually form vectors from spatially-sequential samples.
The source symbols are grouped into vectors, and each vector is entered
into the codebook and assigned an index in the index table (the look-up
table). Thus the encoder is created, and the decoder has the same table
as the encoder.
When a query comes, the encoder finds the closest match and transmits only
the index to the decoder side.
On receiving the index, the best match in the look-up table is found and
the corresponding data from the codebook is decoded. The decoded vectors
are then unblocked to recover the actual message. A minimal sketch follows.
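A minimal Python sketch using SciPy's scipy.cluster.vq module; the 2 x 2 block size,
the 4-entry codebook and the random "image" are all illustrative choices:

    import numpy as np
    from scipy.cluster.vq import kmeans, vq

    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, (8, 8)).astype(float)
    # Block the image into non-overlapping 2x2 blocks -> length-4 vectors.
    vectors = image.reshape(4, 2, 4, 2).swapaxes(1, 2).reshape(-1, 4)

    codebook, _ = kmeans(vectors, 4)        # learn 4 prototype vectors (centroids)
    indices, _ = vq(vectors, codebook)      # encoder: index of the closest centroid
    reconstructed = codebook[indices]       # decoder: a simple table look-up

Only the indices (and the codebook, agreed in advance) need to be transmitted, which
is where the compression comes from.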
Module #6
Title: Image compression: Wavelet- based image compression
Explanation:
Wavelet- based image compression
It utilizes an unconditional basis, whose expansion coefficients decrease to
negligible values as the index increases.
The wavelet expansion allows a more precise and localized isolation and description
of the signal characteristics. This makes the DWT very effective in image
compression applications.
Further, the inherent flexibility in choosing a wavelet gives scope to design
wavelets customized to fit individual requirements.
The basis functions employed by the Wavelet Transform are called wavelets. Wavelets
are derived from a mother wave used for the transformation; this wave is scaled and
shifted according to the variations in the signal to be analysed. By scaling and
translating the mother wavelet Ψ(t), we obtain the rest of the functions for the
transformation (the child wavelets Ψ_{a,b}(t)):

\Psi_{a,b}(t) = \frac{1}{\sqrt{a}} \, \Psi\!\left(\frac{t-b}{a}\right)

where a is the scale parameter and b is the translation parameter.
Wavelet analysis can be used to represent the image in terms of two kinds of
sub-signals: the approximation sub-signal, which captures the general trends in the
image samples, and the detail sub-signals, which contain the high frequency vertical,
horizontal and diagonal information.
There is no need to block the input image, and the basis functions have variable
length, which avoids blocking artifacts.
More robust under transmission and decoding errors.
Better matched to the HVS characteristics
Good frequency resolution at lower frequencies, good time resolution at higher
frequencies – good for natural images.
LL – approximation coefficients.
HL – Horizontal edges
LH – Vertical edges
HH – Diagonal edges
The second figure shows the 3-level decomposition of an image; a sketch of such a
decomposition is given below.
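A minimal Python sketch of a 3-level 2-D DWT decomposition using PyWavelets (both
the package and the "bior2.2" wavelet are assumptions; any DWT implementation would do):

    import numpy as np
    import pywt

    image = np.random.default_rng(0).random((64, 64))
    coeffs = pywt.wavedec2(image, wavelet="bior2.2", level=3)
    cA3 = coeffs[0]                               # LL: coarsest approximation
    for lvl, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
        print(lvl, cH.shape)                      # horizontal, vertical and diagonal
                                                  # detail sub-bands at each level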
Advantages of DWT over DCT
o Blocking artifacts are avoided, since the transform is applied to the image as a
whole rather than to independent blocks.
o The input need not be partitioned into non-overlapping blocks while coding.
o It allows good localization both in the time and the spatial frequency domain.
o It introduces inherent scaling and performs the transformation of the whole
image.
o Since it can identify more accurately the data relevant to human perception, it
achieves higher compression ratios.
o Higher flexibility: the inherent flexibility in choosing a wavelet gives scope to
design wavelets customized to fit individual requirements.