Chapter 6 - Predictive Coding
PREDICTIVE CODING
Lecture By,
Prof.M.Dhanalakshmi,
Asst.Prof.,
IT Dept,
SCET, Surat.
CONTEXT BASED COMPRESSION
Context-based compression schemes make minimal prior assumptions about the
statistics of the data.
Instead, they use the context of the data being
encoded and the past history of the data to
provide more efficient compression.
In previous chapters we learned that we get more
compression when the message that is being
coded has a more skewed set of probabilities.
By "skewed" we mean that certain symbols occur
with much higher probability than others in the
sequence to be encoded.
DM
Dhanalakshmi
CONTEXT BASED COMPRESSION
In context-based compression, the probability distribution used to encode each symbol is based on the history of the
sequence.
We use that history to predict the symbol being encoded; such a scheme is known as
predictive coding or context-based coding.
The prediction uses a context of order 1, 2, and so on, that is, the one, two, or more
symbols that immediately precede the symbol being encoded.
The most popular context-based algorithm is PPM.
CONTEXT BASED COMPRESSION
PPM: Prediction with Partial Match
PPM only needs to store those contexts that have
occurred in the sequence being encoded.
BASIC ALGORITHM
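The encoder first looks for the symbol in the highest-order context.
If the symbol has not occurred in that context, an escape symbol <ESC> is encoded and the
encoder moves on to the next lower-order context.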
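A minimal Python sketch of the escape logic, assuming a maximum context order of 2 and a space standing in for /b. It only decides which events get coded (escapes and the final symbol), not their probabilities. The function name `ppm_events` is hypothetical. The sketch also assumes that a context which has never been seen is skipped without an escape, because the slides do not cover that case.

```python
def ppm_events(history, symbol, max_order=2):
    """Return the events coded for `symbol`, given the already-coded `history`.

    Each event is (order, item), where item is "<ESC>" or the symbol itself.
    Order -1 stands for the fallback in which every symbol is equally likely.
    """
    events = []
    for order in range(max_order, -1, -1):
        if order > len(history):
            continue  # not enough history to form a context of this order
        context = history[len(history) - order:]
        # Letters that have followed this context somewhere in the history.
        followers = {
            history[i + order]
            for i in range(len(history) - order)
            if history[i:i + order] == context
        }
        if not followers:
            continue  # context never seen: skipped without an escape (assumption)
        if symbol in followers:
            events.append((order, symbol))
            return events
        events.append((order, "<ESC>"))
    events.append((-1, symbol))
    return events


print(ppm_events("this is ", "t"))  # [(2, '<ESC>'), (1, '<ESC>'), (0, 't')]
```

With this/bis/b already coded, t escapes out of the second-order context s/b and the first-order context /b, and is finally coded in the zero-order context. This matches the walkthrough in this section.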
EXAMPLE OF BASIC ALGORITHM
Encode the sequence
BASIC ALGORITHM
First-order context: (Table 6.3)
BASIC ALGORITHM
Second-Order Context: (Table 6.4)
BASIC ALGORITHM
Assume that the word length for arithmetic
coding is six bits, so the lower and upper limits start at
l = 000000
u = 111111
BASIC ALGORITHM
The update equations for the lower and upper limits
are:
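The equations themselves are not reproduced on the slide. The block below assumes they are the integer implementation of arithmetic coding from the earlier chapter, with a six-bit word so that u − l + 1 = 64 at the start. Here x_n is the nth symbol coded, and Cum_Count(x_n − 1) is the cumulative count of the entry just before x_n in the current context table. Under that assumption, the limits would be updated as:

```latex
\begin{aligned}
l^{(n)} &= l^{(n-1)} + \left\lfloor \frac{\left(u^{(n-1)} - l^{(n-1)} + 1\right)\,\mathrm{Cum\_Count}(x_n - 1)}{\mathrm{Total\_Count}} \right\rfloor \\
u^{(n)} &= l^{(n-1)} + \left\lfloor \frac{\left(u^{(n-1)} - l^{(n-1)} + 1\right)\,\mathrm{Cum\_Count}(x_n)}{\mathrm{Total\_Count}} \right\rfloor - 1
\end{aligned}
```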
BASIC ALGORITHM
Updated second-order context:
Context   Letter   Count   Cum_Count
th        i        1       1
          <ESC>    1       2
          Total Count      2
hi        s        1       1
          <ESC>    1       2
          Total Count      2
is        /b       2       2
          <ESC>    1       3
          Total Count      3
s/b       i        1       1
          <ESC>    1       2
          Total Count      2
/bi       s        1       1
          <ESC>    1       2
          Total Count      2
BASIC ALGORITHM
Updated first-order context:
Context   Letter   Count   Cum_Count
t         h        1       1
          <ESC>    1       2
          Total Count      2
h         i        1       1
          <ESC>    1       2
          Total Count      2
i         s        2       2
          <ESC>    1       3
          Total Count      3
s         /b       2       2
          <ESC>    1       3
          Total Count      3
/b        i        1       1
          <ESC>    1       2
          Total Count      2
BASIC ALGORITHM
Updated zero order context:
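The zero-order table is not reproduced on the slide. As a cross-check, a short Python sketch can rebuild the count tables of every order from the text coded so far, this/bis/b (a space stands in for /b). Following the tables above, the sketch assumes two things: each context carries one <ESC> with count 1, listed after its letters, and letters appear in order of first occurrence. The names `count_tables` and `with_cum_counts` are hypothetical.

```python
from collections import Counter


def count_tables(text, order):
    """Map each context of length `order` to a Counter of the letters that followed it."""
    tables = {}
    for i in range(order, len(text)):
        context = text[i - order:i]
        tables.setdefault(context, Counter())[text[i]] += 1
    return tables


def with_cum_counts(counts, esc_count=1):
    """Return (rows, total): rows are (letter, count, cum_count) with <ESC> last."""
    rows, cum = [], 0
    for letter, count in list(counts.items()) + [("<ESC>", esc_count)]:
        cum += count
        rows.append((letter, count, cum))
    return rows, cum


text = "this is "  # this/bis/b with a space for /b
for order in (2, 1, 0):
    for context, counts in count_tables(text, order).items():
        rows, total = with_cum_counts(counts)
        print(order, repr(context), rows, "Total Count", total)
```

For order 0 the rows come out as t, h, i, s, /b with cumulative counts 1, 2, 4, 6, 8, and <ESC> brings the total to 9. This agrees with the Cum_Count = 1 and Total_Count = 9 used for t later on.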
BASIC ALGORITHM
The next letter to be encoded in the sequence is t.
The second-order context is s/b. Since t has not yet occurred in this context (only i has), an <ESC> symbol is encoded.
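The second-order table gives the <ESC> in context s/b a cumulative count of 1 before it and 2 at it, with Total Count 2. Starting from l = 0 and u = 63 and applying the update equations to the <ESC>, one would get the values below. Both limits begin with the bit 1, which is the MSB shifted out in the next step.

```latex
\begin{aligned}
l &= 0 + \left\lfloor \frac{(63 - 0 + 1) \times 1}{2} \right\rfloor = 32 = 100000_2 \\
u &= 0 + \left\lfloor \frac{(63 - 0 + 1) \times 2}{2} \right\rfloor - 1 = 63 = 111111_2
\end{aligned}
```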
BASIC ALGORITHM
Again, the MSBs of l and u are the same, so we
shift the bit out and shift 0 into the LSB of l, and
1 into u, restoring l to a value of 0 and u to a
value of 63.
The transmitted sequence is now 01.
BASIC ALGORITHM
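The first-order context of t is /b, in which t has not occurred either, so a second <ESC> is encoded.
Updating the limits with the <ESC> counts of context /b (Cum_Count values 1 and 2, Total Count 2) again gives l = 32 = 100000 and u = 63 = 111111. The common MSB 1 is shifted out, l and u are restored to 0 and 63, and the transmitted sequence becomes 011.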
BASIC ALGORITHM
Check for t in the zero-order context.
t is present in the zero-order context table with
Cum_Count = 1 and Total_Count = 9.
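t occurs once, so the cumulative count before it is 1 − 1 = 0. Starting again from l = 0 and u = 63, the update equations would give the values below. The three leading bits 000 of l and u agree, so they are shifted out. These are the last three bits of 011000. After the shifts, l = 000000 and u = 110111.

```latex
\begin{aligned}
l &= 0 + \left\lfloor \frac{(63 - 0 + 1) \times 0}{9} \right\rfloor = 0 = 000000_2 \\
u &= 0 + \left\lfloor \frac{(63 - 0 + 1) \times 1}{9} \right\rfloor - 1 = 6 = 000110_2
\end{aligned}
```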
After encoding t h i s /b i s /b t, the transmitted sequence is 011000.
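The bit-level trace can be checked with a short Python sketch. It implements the update equations together with the rule of shifting out a common MSB (a 0 goes into the LSB of l and a 1 into the LSB of u). It deliberately leaves out the extra rescaling a complete integer arithmetic coder performs when the interval straddles the midpoint, so treat it as an illustration of this trace rather than a full coder. The function name `encode_step` is hypothetical. Following the "01" above, it assumes coding restarts from l = 0 and u = 63 just before the /b that follows this/bis. The counts passed in are the ones quoted in this example:

- /b in context is, before its count was updated
- <ESC> in s/b
- <ESC> in /b
- t in the zero-order context

```python
def encode_step(low, high, cum_before, cum, total, bits=6):
    """Code one event and return (low, high, emitted_bits).

    cum_before = Cum_Count(x - 1), cum = Cum_Count(x), total = Total_Count.
    The midpoint-straddling (underflow) rescaling is omitted in this sketch.
    """
    width = high - low + 1
    new_low = low + (width * cum_before) // total
    new_high = low + (width * cum) // total - 1
    msb = 1 << (bits - 1)
    mask = (1 << bits) - 1
    out = []
    # While the MSBs agree, send that bit and shift it out:
    # a 0 is shifted into the LSB of low and a 1 into the LSB of high.
    while (new_low & msb) == (new_high & msb):
        out.append("1" if new_low & msb else "0")
        new_low = (new_low << 1) & mask
        new_high = ((new_high << 1) & mask) | 1
    return new_low, new_high, "".join(out)


low, high, sent = 0, 63, ""
# (Cum_Count(x - 1), Cum_Count(x), Total_Count) for /b, <ESC>, <ESC>, t.
for cum_before, cum, total in [(0, 1, 2), (1, 2, 2), (1, 2, 2), (0, 1, 9)]:
    low, high, emitted = encode_step(low, high, cum_before, cum, total)
    sent += emitted
print(sent, format(low, "06b"), format(high, "06b"))  # 011000 000000 110111
```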
The next letter to be encoded is h.
The second-order context of h is /bt, which has no entries yet. In the first-order context t, h has
Cum_Count = 1 and
Total Count = 2.
Calculate u and l.
BURROWS-WHEELER TRANSFORM
The Burrows-Wheeler Transform (BWT)
algorithm also uses the context of the symbol
being encoded for lossless compression.
BURROWS-WHEELER TRANSFORM
Steps in BWT:
Given a sequence of length N, we create N − 1 other
sequences, each of which is a cyclic shift of the
original sequence.
These N sequences are arranged in lexicographic
order.
The encoder then transmits the sequence of length N
created by taking the last letter of each sorted,
cyclically shifted, sequence. This sequence of last
letters L, and the position of the original sequence in
the sorted list, are coded and sent to the decoder.
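A minimal Python sketch of these encoder steps. It builds all N cyclic shifts explicitly, which is fine for a classroom example but uses memory quadratic in N. The slide's example sequence is not reproduced here. The usage line assumes it is this/bis/bthe, with /b written as a space. That choice is consistent with the decoding walkthrough below: the original is sequence number 10, F[10] = t, F[4] = h and F[5] = i when rows are counted from 0. Python compares strings by character code, so the space sorts before the letters.

```python
def bwt(sequence):
    """Return (L, index): the last letters of the sorted cyclic shifts and the
    position of the original sequence in the sorted list (counting from 0)."""
    n = len(sequence)
    shifts = [sequence[i:] + sequence[:i] for i in range(n)]
    order = sorted(range(n), key=lambda i: shifts[i])  # lexicographic order
    last = "".join(shifts[i][-1] for i in order)
    return last, order.index(0)


L, index = bwt("this is the")
print(repr(L), index)  # 'sshtth ii e' 10
```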
BURROWS-WHEELER TRANSFORM
Example:
Encode the sequence
BURROWS-WHEELER TRANSFORM
Sequences sorted in lexicographical order.
BURROWS-WHEELER TRANSFORM
If we know that the original sequence is in the kth
row, then we can begin unraveling the original
sequence starting with the kth element of F.
The original sequence is sequence number 10, so the
first letter of the original sequence is F[10] = t.
To find the letter following t, we look for t in the array
L.
There are two t's in L.
Which should we use? The t in F that we are working
with is the lower of the two t's in F, so we pick the lower of the two
t's in L.
This is L[4].
Therefore, the next letter in our reconstructed
sequence is F[4] = h. The reconstructed sequence to
this point is th.
BURROWS-WHEELER TRANSFORM
To find the next letter, we look for h in the L
array. Again there are two h's. The h at F[4] is
the lower of the two h's in F, so we pick the lower of
the two h's in L.
This is L[5], so the next element
in our decoded sequence is F[5] = i. The decoded
sequence to this point is thi.
BURROWS-WHEELER TRANSFORM
Continuing in this way, L and F can be used to regenerate the entire original sequence.
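The unraveling procedure can be written as a short Python sketch; the name `inverse_bwt` is hypothetical. F is obtained simply by sorting L. The rule "the lower of the two t's in F goes with the lower of the two t's in L" generalizes to: the kth occurrence of a letter in F corresponds to the kth occurrence of the same letter in L. The usage assumes the walkthrough's L and row index, with a space for /b and rows counted from 0.

```python
def inverse_bwt(last, index):
    """Rebuild the original sequence from L and the row index of the original."""
    first = sorted(last)  # F is L with its letters sorted

    def occurrence_ranks(column):
        # ranks[i] = number of earlier entries in column holding the same letter.
        seen, ranks = {}, []
        for letter in column:
            ranks.append(seen.get(letter, 0))
            seen[letter] = ranks[-1] + 1
        return ranks

    first_rank = occurrence_ranks(first)
    # Position in L of the kth occurrence of each letter.
    position_in_last = {
        (letter, rank): i
        for i, (letter, rank) in enumerate(zip(last, occurrence_ranks(last)))
    }

    decoded, row = [], index
    for _ in range(len(last)):
        letter = first[row]
        decoded.append(letter)
        # The next letter is F at the row where this same occurrence sits in L.
        row = position_in_last[(letter, first_rank[row])]
    return "".join(decoded)


print(inverse_bwt("sshtth ii e", 10))  # this is the
```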
MOVE-TO-FRONT CODING
A coding scheme that takes advantage of long runs of
identical symbols is the move-to-front (mtf) coding.
MOVE-TO-FRONT CODING
Encoded list is 4 0 3
MOVE-TO-FRONT CODING
The next letter is t, which gets encoded as 5,
and t is moved to the top of the list.
Encoded list is 4 0 3 5.
The next letter is also a t, so it gets encoded as
a 0.
Encoded list is 4 0 3 5 0.
Continuing in this way, the complete encoded sequence is 4 0 3 5 0 1 3 5 0 1 5.
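A minimal Python sketch of move-to-front encoding and decoding. The initial ordering of the list is not shown on the slides. The usage assumes the alphabet in sorted order (/b, e, h, i, s, t, with a space for /b) and takes as input the BWT output sshtth/bii/be. That assumption reproduces the codes above.

```python
def mtf_encode(sequence, alphabet):
    """Replace each letter by its position in the list, then move it to the front."""
    table, codes = list(alphabet), []
    for letter in sequence:
        position = table.index(letter)
        codes.append(position)
        table.insert(0, table.pop(position))
    return codes


def mtf_decode(codes, alphabet):
    """Invert mtf_encode by applying the same list updates."""
    table, letters = list(alphabet), []
    for position in codes:
        letter = table.pop(position)
        letters.append(letter)
        table.insert(0, letter)
    return "".join(letters)


codes = mtf_encode("sshtth ii e", " ehist")
print(codes)  # [4, 0, 3, 5, 0, 1, 3, 5, 0, 1, 5]
```

Each run of identical letters in the input becomes a run of 0s after its first letter, which is what a following entropy coder can exploit.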
LOSSLESS IMAGE COMPRESSION
OLD JPEG STANDARD
The Joint Photographic Experts Group (JPEG) is
a joint ISO/ITU committee responsible for
developing standards for continuous-tone still-
picture coding.
The more famous standard produced by this
group is the lossy image compression standard.
The old JPEG lossless still compression standard
provides eight different predictive schemes from
which the user can select.
The first scheme makes no prediction. The next
seven are listed below.
Three of the seven are one-dimensional
predictors, and four are two-dimensional
prediction schemes.
OLD JPEG STANDARD
I(i, j) is the (i, j)th pixel of the original image, and
Î(i, j) is the predicted value for the (i, j)th pixel.
OLD JPEG STANDARD
Different images can have different structures that
can be best exploited by one of these eight modes of
prediction.
If compression is performed in a non-real-time
environment, for example for archiving,
all eight modes of prediction can be tried
and the one that gives the most compression is used.
As an example, four test images were encoded using the
various JPEG modes.
The best results, that is, the smallest compressed file
sizes, are indicated in bold in the table.
From these results we can see that different JPEG
predictors are best for different images.
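The table of predictors is not reproduced on the slides. The sketch below assumes the seven predictors of the lossless mode of the JPEG standard, written in terms of three neighbors (i indexes rows, j columns; mode 0 is the "no prediction" scheme):

- a = I(i, j−1), the west neighbor
- b = I(i−1, j), the north neighbor
- c = I(i−1, j−1), the north-west neighbor

Measuring real compressed file sizes would need a full entropy coder. Instead, the sketch picks the mode whose residuals have the lowest first-order entropy, as a stand-in for compressed size. That criterion, skipping the first row and column, and the function names are all simplifying assumptions.

```python
from collections import Counter
from math import log2

# Assumed lossless JPEG predictors: a = west, b = north, c = north-west neighbor.
PREDICTORS = {
    0: lambda a, b, c: 0,                   # no prediction
    1: lambda a, b, c: a,
    2: lambda a, b, c: b,
    3: lambda a, b, c: c,
    4: lambda a, b, c: a + b - c,
    5: lambda a, b, c: a + ((b - c) >> 1),
    6: lambda a, b, c: b + ((a - c) >> 1),
    7: lambda a, b, c: (a + b) >> 1,
}


def residuals(image, mode):
    """Residuals I(i, j) - Î(i, j) for pixels outside the first row and column."""
    predict = PREDICTORS[mode]
    return [
        image[i][j] - predict(image[i][j - 1], image[i - 1][j], image[i - 1][j - 1])
        for i in range(1, len(image))
        for j in range(1, len(image[0]))
    ]


def entropy(values):
    """First-order entropy, in bits per symbol, of a list of values."""
    counts, n = Counter(values), len(values)
    return sum(-(k / n) * log2(k / n) for k in counts.values())


def best_mode(image):
    """Mode with the lowest residual entropy (ties go to the lower mode number)."""
    return min(PREDICTORS, key=lambda mode: entropy(residuals(image, mode)))


# Each row is constant, so the west neighbor predicts every interior pixel exactly.
image = [[5] * 4, [9] * 4, [20] * 4, [22] * 4]
print(best_mode(image))  # 1
```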
CALIC
The Context Adaptive Lossless Image
Compression (CALIC) scheme is a lossless
image compression scheme that
uses both context and prediction of the pixel
values.
The CALIC scheme actually functions in two
modes, one for gray-scale images and another for
bi-level images.
We will concentrate on the compression of
gray-scale images.
In an image, a given pixel generally has a value
close to one of its neighbors. Which
neighbor has the closest value depends on the
local structure of the image.
CALIC
Depending on whether there is a horizontal or
vertical edge in the neighborhood of the pixel
being encoded, the pixel above, or the pixel to the
left, or some weighted average of neighboring
pixels may give the best prediction.
CALIC
In this figure, the pixel to be encoded has been
marked with an X.
The pixel above is called the north pixel, the pixel
to the left is the west pixel, and so on.
Note that when pixel X is being encoded, all other
marked pixels (N, W, NW, NE, WW, NN, and
NNE) are available to both encoder and decoder.
CALIC
We can get an idea of what kinds of boundaries may
or may not be in the neighborhood of X by computing:
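The formulas are not reproduced on the slide. The sketch below assumes the gradient estimates and thresholds of the gradient-adjusted predictor commonly given for CALIC:

- d_h = |W − WW| + |N − NW| + |NE − N|
- d_v = |W − NW| + |N − NN| + |NE − NNE|

A sharp edge is declared when the two differ by more than 80, and weaker edges at 32 and 8. It shows how these quantities can steer the initial prediction X̂ toward W, toward N, or toward a weighted average. It uses floating-point arithmetic for readability. The name `gap_predict` is hypothetical.

```python
def gap_predict(n, w, nw, ne, ww, nn, nne):
    """Initial prediction of X from its causal neighbors.

    Gradient formulas and the 80/32/8 thresholds are assumed, not from the slides.
    """
    d_h = abs(w - ww) + abs(n - nw) + abs(ne - n)    # variation along rows
    d_v = abs(w - nw) + abs(n - nn) + abs(ne - nne)  # variation along columns
    diff = d_v - d_h
    if diff > 80:       # sharp horizontal edge: the west pixel is the best guess
        return w
    if diff < -80:      # sharp vertical edge: the north pixel is the best guess
        return n
    pred = (n + w) / 2 + (ne - nw) / 4
    if diff > 32:       # moderate horizontal edge: lean toward W
        pred = (pred + w) / 2
    elif diff > 8:      # weak horizontal edge
        pred = (3 * pred + w) / 4
    elif diff < -32:    # moderate vertical edge: lean toward N
        pred = (pred + n) / 2
    elif diff < -8:     # weak vertical edge
        pred = (3 * pred + n) / 4
    return pred


# Bright row above, dark current row: a horizontal edge, so X is predicted from W.
print(gap_predict(n=200, w=50, nw=200, ne=200, ww=50, nn=200, nne=200))  # 50
```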
CALIC
Using the information about whether the pixel
values are changing by large or small amounts in
the vertical or horizontal direction in the
neighborhood of the pixel being encoded provides
a good initial prediction.
In order to refine this prediction, we need some
information about the interrelationships of the
pixels in the neighborhood.
Using this information, we can generate an offset
or refinement to our initial prediction.
We quantify the information about the
neighborhood by first forming the vector
CALIC
We then compare each component of this vector
with our initial prediction X̂.
If the value of the component is less than the
prediction, we replace the value with a 1;
otherwise we replace it with a 0.
Thus, we end up with an eight-component binary
vector.
If each component of the binary vector were
independent, we would end up with 256 possible
vectors.
However, because of the dependence of various
components, we actually have 144 possible
configurations.
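The vector itself is not reproduced on the slide. This sketch assumes the eight components used in the usual description of CALIC: N, W, NW, NE, NN, WW, 2N − NN and 2W − WW. The name `texture_pattern` is hypothetical.

Under that assumption, the count of 144 follows from the last two components. For example, if N < X̂ but NN ≥ X̂, then 2N − NN < X̂ is forced, so its bit is fixed. Each triple (N, NN, 2N − NN) and (W, WW, 2W − WW) therefore allows only 6 of its 8 bit combinations. Together with the two free bits for NW and NE, this gives 6 × 6 × 4 = 144 configurations.

```python
def texture_pattern(pred, n, w, nw, ne, nn, ww):
    """Binary vector: 1 where a neighborhood component is below the prediction.

    The component list is an assumption (the slide's vector is not shown).
    """
    components = [n, w, nw, ne, nn, ww, 2 * n - nn, 2 * w - ww]
    return tuple(1 if value < pred else 0 for value in components)


# Enumerate many neighborhoods around a fixed prediction and count distinct patterns.
patterns = {
    texture_pattern(2, n, w, nw, ne, nn, ww)
    for n in range(5) for w in range(5) for nn in range(5) for ww in range(5)
    for nw in (0, 4) for ne in (0, 4)
}
print(len(patterns))  # 144
```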
CALIC
We also compute a quantity that incorporates the
vertical and horizontal variations and the previous
error in prediction by
CALIC
Keep track of how much prediction error is
generated in each context and offset our initial
prediction by that amount.
This results in the final predicted value.
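A minimal sketch of this error-feedback step. It assumes each context is identified by some hashable key, for example the binary pattern together with a quantized variation value. It also assumes the offset is the running average of the errors of the initial prediction seen so far in that context. The class name `ContextBias` is hypothetical.

```python
from collections import defaultdict


class ContextBias:
    """Track the average error of the initial prediction in each context."""

    def __init__(self):
        self.error_sum = defaultdict(float)
        self.count = defaultdict(int)

    def refine(self, context, initial_prediction):
        """Final prediction = initial prediction + average error seen in this context."""
        if self.count[context] == 0:
            return initial_prediction  # no history yet: no offset
        return initial_prediction + self.error_sum[context] / self.count[context]

    def update(self, context, actual, initial_prediction):
        """Record the error once the true pixel value is known (encoder and decoder alike)."""
        self.error_sum[context] += actual - initial_prediction
        self.count[context] += 1


bias = ContextBias()
for actual, guess in [(105, 100), (103, 100), (104, 100)]:
    bias.update("ctx", actual, guess)
print(bias.refine("ctx", 100))  # 104.0
```

Both encoder and decoder can run `update` after each pixel, so the offset never has to be transmitted.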
CALIC
In order to reduce the complexity of the encoding,
rather than using the actual value of this quantity as the context,
CALIC uses the range of values in which it lies as
the context.
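Mapping a value to the range it lies in can be sketched with a binary search over user-chosen thresholds, such as the q1–q8 mentioned below. The threshold values in the usage line are placeholders for illustration, not values from the slides.

```python
from bisect import bisect_right


def quantize(value, thresholds):
    """Return the index of the range containing `value`.

    With thresholds q1 < q2 < ... < qk, values below q1 map to 0,
    values in [q1, q2) map to 1, ..., and values of at least qk map to k.
    """
    return bisect_right(thresholds, value)


q = [10, 20, 40, 80]  # placeholder thresholds
print([quantize(v, q) for v in (0, 10, 30, 200)])  # [0, 1, 2, 4]
```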
CALIC
The values of q1–q8 can be prescribed by the
user.
If the original pixel values lie between 0 and
M − 1, the differences or prediction residuals will
lie between −(M − 1) and M − 1.
Even though most of the differences will have a
magnitude close to zero, for arithmetic coding we
still have to assign a count to all possible
symbols.
This means a reduction in the size of the
intervals assigned to values that do occur, which
in turn means using a larger number of bits to
represent these values.
The CALIC algorithm attempts to resolve this
problem in a number of ways.
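One way to avoid wasting interval space relies on the decoder already knowing the prediction X̂. The residual X − X̂ can then only lie between −X̂ and M − 1 − X̂. That is just M possible values, so it can be remapped one-to-one onto 0 to M − 1. The sketch below uses an interleaving map: small non-negative and negative residuals alternate, and the remaining residuals follow in order of magnitude. Treat it as an illustration of the idea under that assumption, not as the exact mapping of the slides; `remap_residual` is a hypothetical name.

```python
def remap_residual(residual, prediction, m):
    """Map a residual in [-prediction, m - 1 - prediction] one-to-one onto 0..m-1."""
    theta = min(prediction, m - 1 - prediction)
    if 0 <= residual <= theta:
        return 2 * residual            # small non-negative residuals -> even codes
    if -theta <= residual < 0:
        return 2 * abs(residual) - 1   # small negative residuals -> odd codes
    return theta + abs(residual)       # remaining residuals, all on one side


# M = 8 and prediction 3: residuals -3..4 fill the codes 0..7 exactly once.
print([remap_residual(x - 3, 3, 8) for x in range(8)])  # [5, 3, 1, 0, 2, 4, 6, 7]
```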
CALIC
Example: consider the sequence:
is given by