Image Compression Fundamentals
Reference: Digital Image Processing, 2nd Edition, Rafael C. Gonzalez and Richard E. Woods
Overview
Introduction
Fundamentals
Coding Redundancy
Interpixel Redundancy
Psychovisual Redundancy
Fidelity Criteria
Image Compression Models
Source Encoder and Decoder
Channel Encoder and Decoder
Elements of Information Theory
Measuring Information
The Information Channel
Fundamental Coding Theorems
Noiseless Coding Theorem
Noisy Coding Theorem
Source Coding Theorem
Error-Free Compression
Variable-Length Coding
Huffman Coding
Other Near Optimal Variable Length Codes
Arithmetic Coding
LZW Coding
Bit-Plane Coding
Bit-Plane Decomposition
Constant Area Coding
One-Dimensional Run-Length Coding
Two-Dimensional Run-Length Coding
Lossless Predictive Coding
Lossy Compression
Lossy Predictive Coding
Transform Coding
Transform Selection
Subimage Size Selection
Bit Allocation
Zonal Coding Implementation
Threshold Coding Implementation
Wavelet Coding
Wavelet Selection
Decomposition Level Selection
Quantizer Design
Image Compression Standards
Binary Image Compression Standards
One Dimensional Compression
Two Dimensional Compression
Continuous Tone Still Image Compression Standards
JPEG
Lossy Baseline Coding System
Extended Coding System
Lossless Independent Coding System
JPEG 2000
Video Compression Standards
Introduction
Need for Compression
Huge amount of digital data
Difficult to store and transmit
Solution
Reduce the amount of data required to represent a digital image
Remove redundant data
Transform the data prior to storage and transmission
Categories
Information Preserving
Lossy Compression
Fundamentals
Data compression
Difference between data and information
Data Redundancy
If n_1 and n_2 denote the number of information-carrying units in two datasets that represent the same information, the relative data redundancy R_D of the first dataset is defined as
$$R_D = 1 - \frac{1}{C_R}, \qquad C_R = \frac{n_1}{n_2},$$

where C_R is called the compression ratio.
Case 1: n_2 = n_1, so C_R = 1 and R_D = 0; the first dataset contains no redundant data.
Case 2: n_2 << n_1, so C_R becomes large and R_D approaches 1; the first dataset is highly redundant and significant compression is possible.
Case 3: n_2 >> n_1, so C_R approaches 0 and R_D becomes negative; the second dataset contains more data than the original.
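As a quick illustration of these definitions, here is a minimal Python sketch (the byte counts are hypothetical, not from the text):

```python
def compression_stats(n1: int, n2: int) -> tuple[float, float]:
    """Return (compression ratio C_R, relative redundancy R_D).

    n1: information-carrying units (e.g., bytes) in the original dataset
    n2: information-carrying units in the compressed dataset
    """
    c_r = n1 / n2
    r_d = 1.0 - 1.0 / c_r
    return c_r, r_d

# Hypothetical example: a 1,048,576-byte image stored in 131,072 bytes
c_r, r_d = compression_stats(1_048_576, 131_072)
print(f"C_R = {c_r:.1f}:1, R_D = {r_d:.3f}")   # C_R = 8.0:1, R_D = 0.875
```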
Coding Redundancy
Let a discrete random variable r_k in [0, 1] represent the gray levels of an image, and let p_r(r_k) denote the probability of occurrence of r_k:

$$p_r(r_k) = \frac{n_k}{n}, \qquad k = 0, 1, 2, \ldots, L-1,$$

where n_k is the number of times the k-th gray level appears and n is the total number of pixels. If the number of bits used to represent each value of r_k is l(r_k), then the average number of bits required to represent each pixel is

$$L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p_r(r_k).$$
Hence, the total number of bits required to code an M×N image is MN·L_avg.
For an image represented using an m-bit binary code, L_avg = m.
How to achieve data compression?
Variable-length coding: assign fewer bits to the more probable gray levels than to the less probable ones.
Find L_avg, the compression ratio, and the redundancy.
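For a concrete sense of this calculation, the sketch below (Python) computes L_avg, C_R, and R_D for an 8-level image coded with a variable-length code; the probabilities and code lengths are illustrative values, not taken from the textbook example:

```python
# Hypothetical 8-level image: gray-level probabilities and variable-length code lengths
probs        = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]
code_lengths = [2, 2, 2, 3, 4, 5, 6, 6]          # bits assigned per gray level (illustrative)

l_avg = sum(l * p for l, p in zip(code_lengths, probs))
c_r   = 3 / l_avg            # compare against a fixed 3-bit natural code
r_d   = 1 - 1 / c_r

print(f"L_avg = {l_avg:.2f} bits/pixel, C_R = {c_r:.2f}, R_D = {r_d:.2f}")
```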
Interpixel Redundancy
Related to interpixel correlation within an image.
The value of a pixel can be reasonably predicted from the values of its neighbours. The gray levels of neighboring pixels are roughly the same, so knowing the gray level of one pixel in a neighborhood gives considerable information about the gray levels of the others. The information carried by an individual pixel is therefore relatively small. These dependencies between the values of pixels in an image are called interpixel redundancy.
Autocorrelation
The autocorrelation coefficients along a single line of the image are computed as

$$\gamma(\Delta n) = \frac{A(\Delta n)}{A(0)},$$

where

$$A(\Delta n) = \frac{1}{N - \Delta n} \sum_{y=0}^{N-1-\Delta n} f(x, y)\, f(x, y + \Delta n).$$
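A minimal sketch of this computation (Python with NumPy; `row` is a hypothetical 1-D array of gray levels from one image line):

```python
import numpy as np

def autocorrelation_coeff(line: np.ndarray, delta_n: int) -> float:
    """Normalized autocorrelation gamma(delta_n) = A(delta_n) / A(0) for one image line."""
    line = line.astype(np.float64)
    N = line.size

    def A(dn: int) -> float:
        # A(dn) = 1/(N - dn) * sum_{y=0}^{N-1-dn} f(y) * f(y + dn)
        return np.sum(line[: N - dn] * line[dn:]) / (N - dn)

    return A(delta_n) / A(0)

row = np.random.randint(0, 256, size=1024)      # hypothetical image row
print([round(autocorrelation_coeff(row, d), 3) for d in (1, 5, 10)])
```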
To reduce interpixel redundancy, the image must be transformed into a more efficient format.
Example: the differences between adjacent pixels can be used to represent the image.
Transformations that remove interpixel redundancies are termed mappings.
If the original image can be reconstructed from the transformed dataset, these mappings are called reversible mappings.
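A minimal sketch of such a reversible difference mapping, assuming NumPy and an 8-bit grayscale row (the pixel values are made up):

```python
import numpy as np

def forward_difference_map(row: np.ndarray) -> np.ndarray:
    """Keep the first pixel, then store differences between adjacent pixels."""
    row = row.astype(np.int16)                    # differences may be negative
    return np.concatenate(([row[0]], np.diff(row)))

def inverse_difference_map(diffs: np.ndarray) -> np.ndarray:
    """Reconstruct the original row exactly (the mapping is reversible)."""
    return np.cumsum(diffs).astype(np.uint8)

row = np.array([52, 53, 53, 55, 58, 58, 57], dtype=np.uint8)
diffs = forward_difference_map(row)
assert np.array_equal(inverse_difference_map(diffs), row)
print(diffs)    # [52  1  0  2  3  0 -1] -- the differences cluster near zero
```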
Psychovisual Redundancy
Based on human perception
Associated with real or quantifiable visual information.
Elimination of psychovisual redundancy results in loss of
quantitative information. This is referred to as
quantization.
Quantization: the mapping of a broad range of input values to a limited number of output values.
Results in lossy data compression.
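A minimal sketch of a uniform quantizer in Python (NumPy assumed; the number of levels is an illustrative choice):

```python
import numpy as np

def uniform_quantize(img: np.ndarray, levels: int = 16) -> np.ndarray:
    """Map the broad range of 8-bit input values to a limited set of output values."""
    step = 256 // levels
    return (img // step) * step + step // 2      # one representative value per bin

img = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)   # hypothetical image
print(np.unique(uniform_quantize(img)))                         # at most 16 distinct levels
```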
Fidelity Criteria
Objective fidelity criteria
When the level of information loss can be expressed as a function of the original (input) image and of the compressed and subsequently decompressed output image.
Example: the root-mean-square (RMS) error between the input and output images.
The error between the original image f(x, y) and the decompressed image f̂(x, y) at any point is

$$e(x, y) = \hat{f}(x, y) - f(x, y),$$

and the root-mean-square error over the whole M×N image is

$$e_{rms} = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ \hat{f}(x, y) - f(x, y) \right]^2 \right]^{1/2}.$$
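A minimal sketch of the RMS error computation, assuming NumPy arrays of equal shape (the example images are random placeholders):

```python
import numpy as np

def rms_error(f: np.ndarray, f_hat: np.ndarray) -> float:
    """Root-mean-square error between original image f and decompressed image f_hat."""
    e = f_hat.astype(np.float64) - f.astype(np.float64)
    return float(np.sqrt(np.mean(e ** 2)))

f     = np.random.randint(0, 256, size=(8, 8))                         # hypothetical original
f_hat = np.clip(f + np.random.randint(-2, 3, size=f.shape), 0, 255)    # hypothetical reconstruction
print(rms_error(f, f_hat))
```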
1 2
( ), ( ),..., ( )
T
J
P a P a P a = z
The probability that the discrete source will emit symbol
a
j
is P(a
j
).
Therefore, the self-information generated by the production of a single source symbol is

$$I(a_j) = -\log P(a_j).$$

If k source symbols are generated, the average self-information obtained from the k outputs is

$$-k P(a_1)\log P(a_1) - k P(a_2)\log P(a_2) - \cdots - k P(a_J)\log P(a_J) = -k \sum_{j=1}^{J} P(a_j) \log P(a_j).$$
The average information per source output, denoted H(z), is

$$H(\mathbf{z}) = E[I(\mathbf{z})] = \sum_{j=1}^{J} P(a_j)\, I(a_j) = \sum_{j=1}^{J} P(a_j) \log \frac{1}{P(a_j)} = -\sum_{j=1}^{J} P(a_j) \log P(a_j).$$

This is called the uncertainty or entropy of the source. It is the average amount of information (in m-ary units per symbol) obtained by observing a single source output. If the source symbols are equally probable, the entropy is maximized and the source provides the maximum possible average information per source symbol.
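A minimal sketch of this computation in Python (log base 2 gives information in bits per symbol):

```python
import math

def entropy(probs: list[float]) -> float:
    """H(z) = -sum_j P(a_j) * log2(P(a_j)), in bits per source symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits (equally probable symbols maximize H)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
```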
A simple information system
The output of the channel is also a discrete random variable, which takes on values from a finite or countably infinite set of symbols {b_1, b_2, ..., b_K} called the channel alphabet B.
The finite ensemble (B, v), where

$$\mathbf{v} = [P(b_1), P(b_2), \ldots, P(b_K)]^T,$$

describes the channel output completely and thus the information received by the user.
The probability P(b_k) of a given channel output and the probability distribution of the source z are related as

$$P(b_k) = \sum_{j=1}^{J} P(b_k \mid a_j)\, P(a_j),$$

where P(b_k | a_j) is the conditional probability that output symbol b_k is received given that source symbol a_j was generated. In matrix form this is v = Qz, where Q is the channel matrix with elements q_kj = P(b_k | a_j).
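A minimal sketch of this relation (Python with NumPy; the source distribution and channel matrix below are hypothetical):

```python
import numpy as np

# Hypothetical 2-symbol source observed through a 2-output channel
z = np.array([0.7, 0.3])                 # source distribution [P(a1), P(a2)]
Q = np.array([[0.9, 0.2],                # Q[k, j] = q_kj = P(b_k | a_j); columns sum to 1
              [0.1, 0.8]])

v = Q @ z                                # output distribution [P(b1), P(b2)]
print(v)                                 # [0.69 0.31]
```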
The conditional entropy of the source given a particular received symbol b_k is

$$H(\mathbf{z} \mid b_k) = -\sum_{j=1}^{J} P(a_j \mid b_k) \log P(a_j \mid b_k),$$

the average information generated by the source given that the user receives b_k. The expected or average value over all b_k is

$$H(\mathbf{z} \mid \mathbf{v}) = \sum_{k=1}^{K} H(\mathbf{z} \mid b_k)\, P(b_k) = -\sum_{k=1}^{K} \sum_{j=1}^{J} P(a_j \mid b_k) \log P(a_j \mid b_k)\, P(b_k) = -\sum_{k=1}^{K} \sum_{j=1}^{J} P(a_j \mid b_k)\, P(b_k) \log P(a_j \mid b_k).$$

Using the conditional probability P(a_j | b_k) = P(a_j, b_k) / P(b_k), this becomes

$$H(\mathbf{z} \mid \mathbf{v}) = -\sum_{k=1}^{K} \sum_{j=1}^{J} P(a_j, b_k) \log P(a_j \mid b_k).$$
P(a_j, b_k) is the joint probability of a_j and b_k, that is, the probability that a_j is transmitted and b_k is received.
Mutual information
H(z) is the average information per source symbol,
assuming no knowledge of the output symbol.
H(z|v) is the average information per source symbol,
assuming observation of the output symbol.
The difference between H(z) and H(z|v) is the average
information received upon observing the output symbol,
and is called the mutual information of z and v, given by
I(z|v) = H(z) - H(z|v)
$$I(\mathbf{z} \mid \mathbf{v}) = H(\mathbf{z}) - H(\mathbf{z} \mid \mathbf{v}) = -\sum_{j=1}^{J} P(a_j) \log P(a_j) + \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log P(a_j \mid b_k)$$

Since

$$P(a_j) = P(a_j, b_1) + P(a_j, b_2) + \cdots + P(a_j, b_K) = \sum_{k=1}^{K} P(a_j, b_k),$$
$$I(\mathbf{z} \mid \mathbf{v}) = -\sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log P(a_j) + \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log P(a_j \mid b_k)$$
$$= \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j \mid b_k)}{P(a_j)} = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j, b_k)}{P(a_j)\, P(b_k)}.$$
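A minimal sketch of this last form (Python with NumPy; the joint distribution values are hypothetical):

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(z|v) = sum_j sum_k P(a_j,b_k) * log2( P(a_j,b_k) / (P(a_j) P(b_k)) ), in bits."""
    pa = joint.sum(axis=1, keepdims=True)      # marginal P(a_j)
    pb = joint.sum(axis=0, keepdims=True)      # marginal P(b_k)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

# Hypothetical joint distribution: joint[j, k] = P(a_j, b_k)
joint = np.array([[0.45, 0.05],
                  [0.10, 0.40]])
print(mutual_information(joint))               # ~0.397 bits per symbol
```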
Using P(a_j, b_k) = P(a_j | b_k) P(b_k) = P(b_k | a_j) P(a_j) and writing q_kj = P(b_k | a_j),

$$I(\mathbf{z} \mid \mathbf{v}) = \sum_{j=1}^{J} \sum_{k=1}^{K} P(b_k \mid a_j)\, P(a_j) \log \frac{P(b_k \mid a_j)\, P(a_j)}{P(a_j)\, P(b_k)} = \sum_{j=1}^{J} \sum_{k=1}^{K} q_{kj}\, P(a_j) \log \frac{q_{kj}\, P(a_j)}{P(a_j)\, P(b_k)} = \sum_{j=1}^{J} \sum_{k=1}^{K} q_{kj}\, P(a_j) \log \frac{q_{kj}}{P(b_k)},$$

where P(b_k) = \sum_{i=1}^{J} q_{ki}\, P(a_i).
The minimum possible value of I(z|v) is zero, which occurs when the input and output symbols are statistically independent, that is, when P(a_j, b_k) = P(a_j) P(b_k). In that case,
$$I(\mathbf{z} \mid \mathbf{v}) = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j, b_k)}{P(a_j)\, P(b_k)} = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j)\, P(b_k)}{P(a_j)\, P(b_k)} = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log 1 = 0.$$
Channel Capacity
The maximum value of I(z|v) over all possible choices of
source probabilities in the vector z is called the capacity,
C, of the channel described by channel matrix Q.
Channel capacity is the maximum rate at which
information can be transmitted reliably through the
channel.
$$C = \max_{\mathbf{z}} \big[ I(\mathbf{z} \mid \mathbf{v}) \big]$$

Two examples are considered next: a binary information source and the binary symmetric channel (BSC).
Binary Information Source
Source alphabet: A = {a_1, a_2} = {0, 1}.
Probabilities: P(a_1) = p_bs and P(a_2) = 1 - p_bs = p̄_bs, so that z = [P(a_1), P(a_2)]^T = [p_bs, 1 - p_bs]^T.
The entropy of the source is

$$H(\mathbf{z}) = -p_{bs} \log_2 p_{bs} - \bar{p}_{bs} \log_2 \bar{p}_{bs}.$$

The quantity -p log_2 p - (1 - p) log_2(1 - p) is called the binary entropy function, denoted H_bs(·). For example, H_bs(t) = -t log_2 t - (1 - t) log_2(1 - t).
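A minimal sketch of the binary entropy function in Python:

```python
import math

def h_bs(p: float) -> float:
    """Binary entropy function H_bs(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h_bs(0.5))   # 1.0 bit (maximum: equally probable symbols)
print(h_bs(0.1))   # ~0.469 bits
```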
Binary Symmetric Channel (Noisy Binary Information Channel)
Let the probability of an error during the transmission of any symbol be p_e, and let p̄_e = 1 - p_e. The channel matrix for the BSC is

$$\mathbf{Q} = \begin{bmatrix} P(b_1 \mid a_1) & P(b_1 \mid a_2) \\ P(b_2 \mid a_1) & P(b_2 \mid a_2) \end{bmatrix} = \begin{bmatrix} P(0 \mid 0) & P(0 \mid 1) \\ P(1 \mid 0) & P(1 \mid 1) \end{bmatrix} = \begin{bmatrix} 1 - p_e & p_e \\ p_e & 1 - p_e \end{bmatrix} = \begin{bmatrix} \bar{p}_e & p_e \\ p_e & \bar{p}_e \end{bmatrix}.$$
Output alphabet: B = {b_1, b_2} = {0, 1}, with v = [P(b_1), P(b_2)]^T = [P(0), P(1)]^T.
The probabilities of receiving the output symbols b_1 and b_2 can be determined from v = Qz:

$$P(0) = \bar{p}_e\, p_{bs} + p_e\, \bar{p}_{bs}$$
$$P(1) = p_e\, p_{bs} + \bar{p}_e\, \bar{p}_{bs}$$
The mutual information of the BSC can be computed as

$$I(\mathbf{z} \mid \mathbf{v}) = \sum_{j=1}^{2} \sum_{k=1}^{2} q_{kj}\, P(a_j) \log_2 \frac{q_{kj}}{\sum_{i=1}^{2} q_{ki}\, P(a_i)}$$
$$= q_{11} P(a_1) \log_2 \frac{q_{11}}{q_{11} P(a_1) + q_{12} P(a_2)} + q_{21} P(a_1) \log_2 \frac{q_{21}}{q_{21} P(a_1) + q_{22} P(a_2)}$$
$$\quad + q_{12} P(a_2) \log_2 \frac{q_{12}}{q_{11} P(a_1) + q_{12} P(a_2)} + q_{22} P(a_2) \log_2 \frac{q_{22}}{q_{21} P(a_1) + q_{22} P(a_2)}.$$
Substituting the BSC values q_11 = q_22 = p̄_e, q_12 = q_21 = p_e, P(a_1) = p_bs, and P(a_2) = p̄_bs gives

$$I(\mathbf{z} \mid \mathbf{v}) = \bar{p}_e p_{bs} \log_2 \frac{\bar{p}_e}{\bar{p}_e p_{bs} + p_e \bar{p}_{bs}} + p_e p_{bs} \log_2 \frac{p_e}{p_e p_{bs} + \bar{p}_e \bar{p}_{bs}} + p_e \bar{p}_{bs} \log_2 \frac{p_e}{\bar{p}_e p_{bs} + p_e \bar{p}_{bs}} + \bar{p}_e \bar{p}_{bs} \log_2 \frac{\bar{p}_e}{p_e p_{bs} + \bar{p}_e \bar{p}_{bs}},$$

which simplifies to

$$I(\mathbf{z} \mid \mathbf{v}) = H_{bs}(p_{bs}\bar{p}_e + \bar{p}_{bs} p_e) - H_{bs}(p_e),$$

where H_bs(p) = -p log_2 p - (1 - p) log_2(1 - p) is the binary entropy function defined earlier.
Capacity of BSC
Maximum of mutual information over all source distributions.
I(z|v) is maximum when p_bs = 1/2. This corresponds to z = [1/2, 1/2]^T, so

$$C = \max_{\mathbf{z}}\,[I(\mathbf{z} \mid \mathbf{v})] = H_{bs}\!\left(\tfrac{1}{2}(1 - p_e) + \tfrac{1}{2} p_e\right) - H_{bs}(p_e) = H_{bs}\!\left(\tfrac{1}{2}\right) - H_{bs}(p_e)$$
$$= \left[-\tfrac{1}{2}\log_2 \tfrac{1}{2} - \tfrac{1}{2}\log_2 \tfrac{1}{2}\right] - H_{bs}(p_e) = 1 - H_{bs}(p_e).$$
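A small numerical check of this result (Python; `h_bs` is the hypothetical helper sketched earlier, repeated here so the snippet is self-contained, and the maximum over p_bs is brute-forced on a grid):

```python
import math

def h_bs(p: float) -> float:
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(p_bs: float, p_e: float) -> float:
    """I(z|v) = H_bs(pbar_e*p_bs + p_e*pbar_bs) - H_bs(p_e) for the binary symmetric channel."""
    return h_bs((1 - p_e) * p_bs + p_e * (1 - p_bs)) - h_bs(p_e)

p_e = 0.1
best = max(bsc_mutual_information(p / 1000, p_e) for p in range(1, 1000))
print(best, 1 - h_bs(p_e))   # both ~0.531: capacity C = 1 - H_bs(p_e), attained at p_bs = 0.5
```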
Fundamental Coding Theorems
The Noiseless Coding Theorem, or Shannon's First Theorem, or Shannon's Source Coding Theorem for Lossless Data Compression
Applies when both the information channel and the communication system are error-free.
Defines the minimum average codeword length per source symbol that can be achieved.
Aim: to represent the source as compactly as possible.
Let the information source (A, z), with statistically independent source symbols, output an n-tuple of symbols from the source alphabet A. Then the source output takes on one of the J^n possible values, denoted α_i, from

$$A' = \{\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{J^n}\}.$$
The probability of a given α_i, P(α_i), is related to the single-symbol probabilities as

$$P(\alpha_i) = P(a_{j1})\, P(a_{j2}) \cdots P(a_{jn}),$$

and z' = {P(α_1), P(α_2), ..., P(α_{J^n})}. The entropy of this source is given by

$$H(\mathbf{z}') = -\sum_{i=1}^{J^n} P(\alpha_i) \log P(\alpha_i) = -\sum_{i=1}^{J^n} P(a_{j1}) P(a_{j2}) \cdots P(a_{jn}) \log \left[ P(a_{j1}) P(a_{j2}) \cdots P(a_{jn}) \right] = n\, H(\mathbf{z}).$$
Hence, the entropy of this zero-memory source is n times the entropy of the corresponding single-symbol source. Such a source is called the nth extension of the single-symbol source.
The self-information of α_i is log(1/P(α_i)). α_i is therefore represented by a codeword whose length l(α_i) is the smallest integer exceeding its self-information:

$$\log \frac{1}{P(\alpha_i)} \le l(\alpha_i) < \log \frac{1}{P(\alpha_i)} + 1.$$
Multiplying by P(α_i) and summing over all i gives

$$P(\alpha_i) \log \frac{1}{P(\alpha_i)} \le P(\alpha_i)\, l(\alpha_i) < P(\alpha_i) \log \frac{1}{P(\alpha_i)} + P(\alpha_i)$$
$$\sum_{i=1}^{J^n} P(\alpha_i) \log \frac{1}{P(\alpha_i)} \le \sum_{i=1}^{J^n} P(\alpha_i)\, l(\alpha_i) < \sum_{i=1}^{J^n} P(\alpha_i) \log \frac{1}{P(\alpha_i)} + 1$$
$$H(\mathbf{z}') \le L'_{avg} < H(\mathbf{z}') + 1, \qquad \text{where } L'_{avg} = \sum_{i=1}^{J^n} P(\alpha_i)\, l(\alpha_i).$$

Dividing by n and using H(z') = nH(z),

$$\frac{H(\mathbf{z}')}{n} \le \frac{L'_{avg}}{n} < \frac{H(\mathbf{z}')}{n} + \frac{1}{n}$$
$$H(\mathbf{z}) \le \frac{L'_{avg}}{n} < H(\mathbf{z}) + \frac{1}{n}$$
$$\lim_{n \to \infty} \left[ \frac{L'_{avg}}{n} \right] = H(\mathbf{z}).$$
Shannon's source coding theorem for lossless data compression states that, for any code used to represent the symbols from a source, the minimum number of bits required to represent the source symbols on average must be at least equal to the entropy of the source.
The efficiency of any encoding strategy can be defined as

$$\eta = \frac{n\, H(\mathbf{z})}{L'_{avg}} = \frac{H(\mathbf{z}')}{L'_{avg}},$$

where, as before,

$$H(\mathbf{z}) \le \frac{L'_{avg}}{n} < H(\mathbf{z}) + \frac{1}{n}.$$
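A minimal sketch that checks these bounds numerically (Python; each symbol is assigned the code length ceil(log2(1/p)), the smallest integer not less than its self-information, so this is not a full Huffman construction, and the source distribution is an illustrative choice):

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]                      # hypothetical source distribution
lengths = [math.ceil(math.log2(1 / p)) for p in probs]

H     = -sum(p * math.log2(p) for p in probs)          # source entropy H(z)
L_avg = sum(p * l for p, l in zip(probs, lengths))     # average codeword length
eta   = H / L_avg                                      # coding efficiency

print(H, L_avg, eta)                                   # 1.75 <= 1.75 < 2.75, eta = 1.0 here
assert H <= L_avg < H + 1
```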
The Noisy Coding Theorem, or Shannon's Second Theorem
Applies when the channel is noisy or prone to error.
Aim: to encode information so that communication is made reliable and the error is minimized.
Uses a repetitive coding scheme:
Encode the nth extension of the source using K-ary code sequences of length r, with K^r ≥ J^n.
Select only φ of the K^r possible code sequences as valid codewords.
A zero-memory information source generates information at a rate equal to its entropy. The nth extension of the source provides information at a rate of H(z')/n information units per symbol. If the information is coded, the maximum rate of coded information is log(φ)/r, which occurs when the φ valid codewords used to code the source are equally probable. Hence, a code of size φ and block length r is said to have a rate of

$$R = \frac{\log \varphi}{r}$$

information units per symbol.
The noisy coding theorem thus states that, for any R < C, where C is the capacity of the zero-memory channel with matrix Q, there exists an integer r and a code of block length r and rate R such that the probability of a block decoding error is less than or equal to ε, for any ε > 0. That is, the probability of error can be made arbitrarily small so long as the coded message rate is less than the capacity of the channel.
The Source Coding Theorem for Lossy Data Compression
Applies when the channel is error-free but the communication process is lossy.
Aim: information compression.
To determine the smallest rate at which information about the source can be conveyed to the user.
To encode the source so that the average distortion is less than a maximum allowable level D.
Let the information source and decoder output be defined by (A, z) and (B, v), respectively.
A nonnegative cost function ρ(a_j, b_k), called the distortion measure, is used to define the penalty associated with reproducing source output a_j with decoder output b_k.
The average value of the distortion is given by

$$d(Q) = \sum_{j=1}^{J} \sum_{k=1}^{K} \rho(a_j, b_k)\, P(a_j, b_k) = \sum_{j=1}^{J} \sum_{k=1}^{K} \rho(a_j, b_k)\, P(a_j)\, q_{kj},$$

where Q is the channel matrix. The rate distortion function R(D) is defined as

$$R(D) = \min_{Q \in Q_D} I(\mathbf{z}, \mathbf{v}),$$

where Q_D = { Q | d(Q) ≤ D } is the set of all D-admissible encoding-decoding procedures.
If D = 0, R(D) is less than or equal to the entropy of the source, that is, R(0) ≤ H(z).

$$R(D) = \min_{Q \in Q_D} I(\mathbf{z}, \mathbf{v})$$

defines the minimum rate at which information can be conveyed to the user subject to the constraint that the average distortion be less than or equal to D. I(z, v) is minimized subject to the constraints

$$q_{kj} \ge 0, \qquad \sum_{k=1}^{K} q_{kj} = 1, \qquad d(Q) = D.$$

The last constraint, d(Q) = D, indicates that the minimum information rate occurs when the maximum possible distortion is allowed.
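A minimal sketch of the average distortion computation d(Q) (Python with NumPy; the source distribution, channel matrix, and distortion values are hypothetical):

```python
import numpy as np

def average_distortion(rho: np.ndarray, z: np.ndarray, Q: np.ndarray) -> float:
    """d(Q) = sum_j sum_k rho(a_j, b_k) * P(a_j) * q_kj.

    rho[j, k] : distortion for reproducing a_j as b_k
    z[j]      : source probability P(a_j)
    Q[k, j]   : channel matrix element q_kj = P(b_k | a_j)
    """
    return float(np.sum(rho * (Q.T * z[:, None])))

# Hypothetical binary source with Hamming distortion (0 if symbols match, 1 otherwise)
z   = np.array([0.6, 0.4])
rho = np.array([[0.0, 1.0],
                [1.0, 0.0]])
Q   = np.array([[0.95, 0.10],           # columns sum to 1
                [0.05, 0.90]])
print(average_distortion(rho, z, Q))    # 0.6*0.05 + 0.4*0.10 = 0.07
```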