Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
52 views

Dictionary Coding

This document provides an introduction to dictionary coding techniques for data compression. It discusses how dictionary coding works by constructing a list of commonly occurring patterns and encoding those patterns with indexes. It then categorizes dictionary coding techniques as either static or adaptive based on how the dictionaries are constructed. Static dictionaries are fixed before coding, while adaptive dictionaries change during the coding process based on the input data. Well-known dictionary coding algorithms like LZ77 and LZ78 that use sliding windows and adaptive dictionaries are also introduced.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
52 views

Dictionary Coding

This document provides an introduction to dictionary coding techniques for data compression. It discusses how dictionary coding works by constructing a list of commonly occurring patterns and encoding those patterns with indexes. It then categorizes dictionary coding techniques as either static or adaptive based on how the dictionaries are constructed. Static dictionaries are fixed before coding, while adaptive dictionaries change during the coding process based on the input data. Well-known dictionary coding algorithms like LZ77 and LZ78 that use sliding windows and adaptive dictionaries are also introduced.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

Introduction to Dictionary Coding

Dr. Arup Kumar Pal


Department of Computer Science & Engineering
Indian School of Mines, Dhanbad
Jharkhand-826004
E-mail: arupkrpal@gmail.com

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 1
Dictionary Coding
 Construct a list of commonly occurring patterns and
encode these patterns by transmitting their index in
the list
 In general, dictionary-based techniques works well for
highly correlated data (e.g. text), but less efficient for
data with low correlation (e.g. i.i.d. sources)

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 2
Motivation behind Dictionary Coding
 Consider an ‘English’ source with 26 letters & six punctuation
marks
 Single-symbol FLC, fixed-length encoding: 5 bps

 Four-symbol FLC, fixed-length encoding: 20 bps (324)

 If we assume uneven distribution of the symbols


 Pick a dictionary witch contains the 256 most-frequent

patterns (probability p) and encode them with 8 bits


 Encode the rest with 20 bits

 Use 1-bit prefix to distinguish the two cases

 Then, the average rate is 9p + 21(1 – p) = 21 – 12p.


If p > 0.084, 21 – 12p < 20.

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 3
Categorization of Dictionary-Based Coding
The heart of dictionary coding is the
formulation of the dictionary.
A successfully built dictionary results in data
compression; the opposite case may lead to
data expansion.
According to the ways in which dictionaries are
constructed, dictionary coding techniques can
be classified as static or adaptive.

Introduction to Image Compression


Static Dictionary Coding
A fixed dictionary,
Produced before the coding process
Used at both the transmitting and receiving ends
It is possible when the knowledge about the source
alphabet and the related strings of symbols, also known as
phrases, is sufficient.
Merit of the static approach: its simplicity.
Its drawbacks lie on
Relatively lower coding efficiency
Less flexibility compared with adaptive dictionary
techniques

Introduction to Image Compression


Static Dictionary Coding
An example of static algorithms occurs is diagram
coding.
A simple and fast coding technique.
The dictionary contains:
• all source symbols and
• some frequently used pairs of symbols.
In encoding, two symbols are checked at once to see if they
are in the dictionary.
• If so, they are replaced by the index of the two symbols in the
dictionary, and the next pair of symbols is encoded in the next step.

Introduction to Image Compression


Static Dictionary Coding
• If not, then the index of the first symbol is used to encode the first
symbol. The second symbol is combined with the third symbol to
form a new pair, which is encoded in the next step.
The diagram can be straightforwardly extended to n-
gram. In the extension, the size of the dictionary
increases and so is its coding efficiency.

Introduction to Image Compression


Static Dictionary Coding

101 100 110 111 101 100 000

Introduction to Image Compression


Sliding Window (LZ77) Algorithms
Introduction
The dictionary used is actually a portion of the input text,
which has been recently encoded.
The text that needs to be encoded is compared with the
strings of symbols in the dictionary.
The longest matched string in the dictionary is characterized
by a pointer (sometimes called a token), which is represented
by a triple of data items.
Note that this triple functions as an index to the dictionary.
In this way, a variable-length string of symbols is mapped to a
fixed-length pointer.

Introduction to Image Compression


Introduction
There is a sliding window in the LZ77 algorithms. The window
consists of two parts: a search buffer and a look-ahead buffer.
The search buffer contains: the portion of the text stream that has
recently been encoded --- the dictionary.
The look-ahead buffer contains: the text to be encoded next.
The window slides through the input text stream from
beginning to end during the entire encoding process.
The size of the search buffer is much larger than that of the
look-ahead buffer.
The sliding window: usually on the order of a few thousand symbols.
The look-ahead buffer: on the order of several tens to one hundred
symbols.

Introduction to Image Compression


Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 11
Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 12
Summary of the LZ77 Approach
The first symbol in the look-ahead buffer is the symbol
or the beginning of a string of symbols to be encoded
at the moment. Let us call it the symbol s.
In encoding, the search pointer moves to the left, away
from the symbol s, to find a match of the symbol s in
the search buffer. When there are multiple matches,
the match that produces the longest matched string is
chosen. The match is denoted by a triple <i, j, k>.

Introduction to Image Compression


Summary of the LZ77 Approach
The first item in the triple, “i”, is the offset, which is the
distance between the pointer pointing to the symbol giving
the maximum match and the symbol s.
The second item, “j”, is the length of the matched string.
The third item, “k”, is the codeword of the symbol following
the matched string in the look-ahead buffer.
At the very beginning, the content of the search
buffer can be arbitrarily selected. For instance, the
symbols in the search buffer may all be the space
symbol.

Introduction to Image Compression


Summary of the LZ77 Approach
The limitation of the approach is that if the distance
between the repeated patterns in the input text
stream is larger than the size of the search buffer,
then the approach cannot utilize the structure to
compress the text.

Introduction to Image Compression


LZ78 Algorithm
Limitations with LZ77:
 If the distance between two repeated patterns is larger than the
size of the search buffer, then the LZ77 algorithms cannot work
efficiently.
 It cannot recognize the patterns occurring some time ago because
they may have been shifted out from the buffer. In this situation,
the patterns are ignored and no matches are found.
 Increasing the sizes of the search buffer and the look-ahead buffer
seemingly will resolve the problems. A close look, however, reveals
that it also leads to increases in the number of bits required to
encode the offset and matched string length as well as an increase
in processing complexity.

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 16
LZ78 Algorithm
 LZ78 are developed to maintain a dictionary that allow
patterns to remain as entries permanently during the
whole encoding process.

LZ78 Output:

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 17
Example: LZ78 Compression
Encode (i.e., compress) the string ABBCBCABABCAABCAAB
using the LZ78 algorithm.

ABBCBCABABCAABCAAB

A is not in the Dictionary; insert it

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 18
Example: LZ78 Compression
Encode (i.e., compress) the string ABBCBCABABCAABCAAB
using the LZ78 algorithm.

ABBCBCABABCAABCAAB

B is not in the Dictionary; insert it

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 19
Example: LZ78 Compression
Encode (i.e., compress) the string ABBCBCABABCAABCAAB
using the LZ78 algorithm.

ABBCBCABABCAABCAAB

B is in the Dictionary.
BC is not in the Dictionary; insert it.

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 20
Example: LZ78 Compression
Encode (i.e., compress) the string ABBCBCABABCAABCAAB
using the LZ78 algorithm.
ABBCBCABABCAABCAAB

The compressed message is:


(0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)
Note: The above is just a representation, the
commas and parentheses are not transmitted;
Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 21
LZ78 Compression: Number of bits transmitted
Uncompressed String: ABBCBCABABCAABCAAB
Number of bits = Total number of characters ×8
= 18 × 8
= 144 bits
Compressed string( codewords):
(0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)
 Each code word consists of an integer and a character:
 The character is represented by 8 bits.
 The integer part of the codeword is represented by 3 bits.
 Total Bit=11×7=77bits

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 22
LZ78 Decompression

The decompressed message is:


ABBCBCABABCAABCAAB
Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 23
Summary of the LZ78 Approach
 No use of the sliding window.
 Instead of the triples used in the LZ77, only
pairs are used in the LZ78.
 In LZ78, only the position of the pointer to the
matched string and the symbol following the
matched string need to be encoded.
 In theory the dictionary can keep the patterns
forever after they have been seen once. In
practice, the size of the dictionary cannot grow
indefinitely. Some patterns may need to be
reinstalled when the dictionary is full.
Introduction to Image Compression
The LZW Algorithm
 LZW uses fixed-length indices to represent variable-
length strings of symbols/characters that commonly
occur together, e.g., words in English text.
 The LZW encoder and decoder build up the same
dictionary dynamically while receiving the data.
 LZW places longer and longer repeated entries into a
dictionary, and then emits the code for an element,
rather than the string itself, if the element has already
been placed in the dictionary.

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 25
Introduction to LZW

 LZW compression uses a code table, with 4096 as a


common choice for the number of table entries.
 Codes 0-255 in the code table are always assigned to
represent single bytes from the input file.
 When encoding begins the code table contains only
the first 256 entries, with the remainder of the table
being blanks.
 Compression is achieved by using codes 256 through
4095 to represent sequences of bytes.
 As the encoding continues, LZW identifies repeated
sequences in the data, and adds them to the code
book.

Introduction to Image Compression


LZW Encoding Algorithm
Repeat not end of input stream
Step 1: Find the longest match w in the
dictionary
Step 2: Output the index of w
Step 3: Put w+n in the dictionary where n
was the unmatched symbol

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 27
Example: Compression using LZW

Use the LZW algorithm to compress the string


BABAABAAA

ENCODER OUTPUT STRING TABLE


output code representing Index Entry
1 A
2 B

Introduction to Image Compression


Example 1: LZW Compression

BABAABAAA

ENCODER OUTPUT STRING TABLE


output code representing Index Entry
1 A
2 B
2 B 3 BA

Introduction to Image Compression


Example 1: LZW Compression

BABAABAAA

ENCODER OUTPUT STRING TABLE


output code representing Index Entry
1 A
2 B
2 B 3 BA
1 A 4 AB

Introduction to Image Compression


Example 1: LZW Compression

BABAABAAA

ENCODER OUTPUT STRING TABLE


output code representing Index Entry
1 A
2 B
2 B 3 BA
1 A 4 AB
3 BA 5 BAA

Introduction to Image Compression


Example 1: LZW Compression

BABAABAAA

ENCODER OUTPUT STRING TABLE


output code representing Index Entry
1 A
2 B
2 B 3 BA
1 A 4 AB
3 BA 5 BAA
4 AB 6 ABA

Introduction to Image Compression


Example 1: LZW Compression
BABAABAAA
ENCODER OUTPUT STRING TABLE
output code representing Index Entry
1 A
2 B
2 B 3 BA
1 A 4 AB
3 BA 5 BAA
4 AB 6 ABA
1 A 7 AA

Introduction to Image Compression


Example 1: LZW Compression
BABAABAAA
ENCODER OUTPUT STRING TABLE
output code representing Index Entry
1 A
2 B
2 B 3 BA
1 A 4 AB
3 BA 5 BAA
4 AB 6 ABA
1 A 7 AA
7 AA
Introduction to Image Compression
LZW Decoding Algorithm
Initialize dictionary
Decode first index to w
Repeat not end of indices
{
Step 1: Decode next index to S
Step 2: Add w with first character of S to the Dictionary
Step 3: Update w by S
}

Introduction to Image Compression Department of CSE, ISM Dhanbad August 22, 2019 35
Example: Decoding using LZW

Use the LZW algorithm to compress the string


213417 w=B

DECODER OUTPUT STRING TABLE


Index representing Index Entry
1 A
2 B 2 B

Introduction to Image Compression


Example: Decoding using LZW

Use the LZW algorithm to compress the string


213417 w = B S=A

DECODER OUTPUT STRING TABLE


Index representing Index Entry
1 A
2 B 2 B
1 A 3 BA

Introduction to Image Compression


Example: Decoding using LZW

Use the LZW algorithm to compress the string


213417 w = A S=BA

DECODER OUTPUT STRING TABLE


Index representing Index Entry
1 A
2 B 2 B
1 A 3 BA
3 BA 4 AB

Introduction to Image Compression


Example: Decoding using LZW

Use the LZW algorithm to compress the string


213417 w = BA S=AB

DECODER OUTPUT STRING TABLE


Index representing Index Entry
1 A
2 B 2 B
1 A 3 BA
3 BA 4 AB
4 AB 5 BAA

Introduction to Image Compression


Example: Decoding using LZW

Use the LZW algorithm to compress the string


213417 w = AB S=A

DECODER OUTPUT STRING TABLE


Index representing Index Entry
1 A
2 B 2 B
1 A 3 BA
3 BA 4 AB
4 AB 5 BAA
1 A 6 ABA
Introduction to Image Compression
Example: Decoding using LZW
Use the LZW algorithm to compress the string
213417 w = A S=?

DECODER OUTPUT STRING TABLE


Index representing Index Entry
1 A
2 B 2 B
1 A 3 BA
3 BA 4 AB
4 AB 5 BAA
1 A 6 ABA
7 7 A?
Introduction to Image Compression
Example: Decoding using LZW
Use the LZW algorithm to compress the string
213417 w = A S=A

DECODER OUTPUT STRING TABLE


Index representing Index Entry
1 A
2 B 2 B
1 A 3 BA
3 BA 4 AB
4 AB 5 BAA
1 A 6 ABA
7 7 A?
Introduction to Image Compression
Example: Decoding using LZW
Use the LZW algorithm to compress the string
213417 w = A S=A

DECODER OUTPUT STRING TABLE


Index representing Index Entry
1 A
2 B 2 B
1 A 3 BA
3 BA 4 AB
4 AB 5 BAA
1 A 6 ABA
7 AA 7 AA
Introduction to Image Compression
LZW Compression

Introduction to Image Compression

You might also like