
Huffman Coding, RLE, LZW


Huffman and Arithmetic Coding

Coding and Its Application


Introduction
• Huffman codes are instantaneous codes.
• Huffman codes are compact codes, i.e. they produce a code whose average length is the smallest achievable for the given number of source symbols, code alphabet, and source statistics.
• Huffman codes operate by reducing a source with q symbols to a source with r symbols, where r is the size of the code alphabet.
Introduction
• Consider the source S with q symbols $\{s_i : i = 1, 2, \dots, q\}$ and associated probabilities $\{P(s_i) : i = 1, 2, \dots, q\}$.
• Let the symbols be renumbered so that $P(s_1) \ge P(s_2) \ge \dots \ge P(s_q)$.
• Combine the last r symbols of S, $\{s_{q-r+1}, s_{q-r+2}, \dots, s_q\}$, into one symbol $\hat{s}_{q-r+1}$ with probability
$$P(\hat{s}_{q-r+1}) = \sum_{i=1}^{r} P(s_{q-r+i}).$$
This gives a reduced source with $q - r + 1$ symbols. Each reduction removes $r - 1$ symbols, and reductions are repeated until a source with exactly r symbols remains.
• The trivial r-ary compact code for the reduced source with r symbols is used to design the compact code for the preceding reduced source, and so on back to S.
Binary Huffman Coding
• The algorithm:
• Re-order the source symbols in decreasing order of symbol probability.
• Reduce the source by combining the last two symbols and re-ordering the new set in decreasing order. Repeat until only two symbols remain.
• Assign a compact code for the final reduced source. For a two-symbol source the trivial code is {0, 1}.
• Backtrack to the original source S, assigning a compact code to each preceding reduced source.
• Example: Consider a 5-symbol source with the following probabilities:
Binary Huffman Coding

The average length is 2.2 bits/symbol


The efficiency is 96.5%
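The slide's probability table did not survive extraction. To illustrate the binary algorithm, the sketch below assumes the hypothetical distribution P = (0.4, 0.2, 0.2, 0.1, 0.1), chosen because it reproduces the average length of 2.2 bits/symbol and the efficiency H(S)/L of 96.5% quoted above. The function name `huffman_code` and the heap-based merging are illustrative choices, not the slides' own implementation: merging the two least probable nodes and prefixing 0/1 to their codewords is equivalent to reducing the source and then backtracking.

```python
import heapq
import math
from itertools import count

def huffman_code(probs):
    """Binary Huffman code: repeatedly merge the two least probable nodes."""
    tiebreak = count()  # unique counter so equal probabilities never compare dicts
    # Each heap entry: (probability, tiebreak, {symbol: partial codeword})
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-symbol source
        return {s: "0" for s in probs}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)  # least probable node
        p2, _, group2 = heapq.heappop(heap)  # second least probable node
        # Backtracking step: prefix 0 to one branch and 1 to the other
        merged = {s: "0" + w for s, w in group2.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical probabilities (the slide's table is missing)
probs = {"s1": 0.4, "s2": 0.2, "s3": 0.2, "s4": 0.1, "s5": 0.1}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
print(code)
print(f"L = {avg_len:.1f} bits/symbol, efficiency = {entropy / avg_len:.1%}")
# L = 2.2 bits/symbol, efficiency = 96.5%
```

How ties are broken changes the codeword lengths for this source (for example {2, 2, 2, 3, 3} versus {1, 2, 3, 4, 4}), yet both give an average length of 2.2 bits/symbol.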

Is the Huffman code unique?


r-ary Huffman Codes
qr
 Calculate    r 1 . If α is a non-integer value then append
“dummy” symbols to the source with zero probability until
there are q  r     r 1 symbols
 Re-order the source symbols in decreasing order of symbol
probability
 Reduce the source S to S1 , then S 2 and so on by combining
the last r symbols of S j into a combined symbol and re-
ordering the new set of symbol probabilities for S j 1 in
decreasing order. For each source keep track of the position of
the combined symbol sˆq  r 1
 Terminate the source reduction when a source with exactly r
symbols is produced. For a source with q symbols the reduced
source with r symbols will be S 
r-ary Huffman Codes
• Assign a compact r-ary code for the final reduced source. For a source with r symbols the trivial code is $\{0, 1, \dots, r - 1\}$.
• Backtrack to the original source, assigning a compact code for each j-th reduced source. The compact code assigned to S, minus the code words assigned to any “dummy” symbols, is the r-ary Huffman code.
r-ary Huffman Codes
• Example: we want to design a compact quaternary code for a source with 11 symbols.
• First, we calculate $\alpha = \frac{11 - 4}{4 - 1} = \frac{7}{3} \approx 2.33$, which is not an integer.
• We need to append “dummy” symbols so that we have a source with $4 + \lceil 2.33 \rceil (4 - 1) = 4 + 3 \cdot 3 = 13$ symbols.
• The appended symbols are $s_{12}, s_{13}$ with $P(s_{12}) = P(s_{13}) = 0.00$.
RUN LENGTH CODING
Run-Length Encoding: RLE
• Replaces sequences of the same data value within a file by a count number and a single value.
• Also needs a special byte value to indicate when a count number follows.
• E.g., ASCII does not use the high-order bit of a byte, so the special byte value can be 1000 0000 in binary (we’ll call this ⊗ here).
• Suppose the following string of ASCII data has to be compressed:
ABBBBBBBBBCDEEEEF
• Using RLE compression, the compressed file takes up 10 bytes and could look like this:
A⊗9BCD⊗4EF
• Data size before compression: 17 bytes
• Data size after compression: 10 bytes
• Compression ratio: 17/10 = 1.7
• Each encoded run costs 3 bytes (marker, count, value), so to achieve any savings a run must contain at least 4 identical characters.
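The byte-oriented scheme above can be sketched as follows, assuming the marker is the byte 0x80 (1000 0000), the count is stored as one raw byte (so runs are capped at 255), and only runs of 4 or more are encoded. The function names are illustrative.

```python
MARKER = 0x80  # 1000 0000: never used by 7-bit ASCII, so it can flag a run

def rle_encode(data: bytes, min_run: int = 4) -> bytes:
    """Replace runs of >= min_run identical bytes with MARKER, count, value."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run, capping the count at 255 so it fits in one byte
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        if run >= min_run:
            out += bytes([MARKER, run, value])
        else:
            out += bytes([value]) * run  # short runs are cheaper as literals
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Expand MARKER, count, value triples; copy every other byte as-is."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            count, value = data[i + 1], data[i + 2]
            out += bytes([value]) * count
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

original = b"ABBBBBBBBBCDEEEEF"
packed = rle_encode(original)
print(len(original), len(packed))  # 17 10
assert rle_decode(packed) == original
```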
Run-length coding

• Every code word is made up of a pair (g, l), where g is the gray level and l is the number of pixels with that gray level (length, or “run”).
• E.g., the rows
56 56 56 82 82 82 83 80
56 56 56 56 56 80 80 80
create the run-length code (56, 3)(82, 3)(83, 1)(80, 1)(56, 5)(80, 3).
• The code is calculated row by row.
• Very efficient coding for binary data.
• The runs carry no position information, so the image dimensions must be stored with the coded image.
• Used in most fax machines.
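A minimal sketch of (g, l) coding, assuming a runs restart at the start of every row ("row by row") and that the image is a list of rows of gray levels. The decoder needs the image width, which is why the dimensions have to be stored alongside the code. Both function names are hypothetical.

```python
from itertools import groupby

def run_length_pairs(image):
    """(gray level, run length) pairs, computed row by row."""
    pairs = []
    for row in image:  # runs restart at the beginning of each row
        pairs.extend((g, len(list(run))) for g, run in groupby(row))
    return pairs

def decode_pairs(pairs, width):
    """Rebuild the image; this only works if the width is known."""
    pixels = [g for g, n in pairs for _ in range(n)]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

image = [
    [56, 56, 56, 82, 82, 82, 83, 80],
    [56, 56, 56, 56, 56, 80, 80, 80],
]
pairs = run_length_pairs(image)
print(pairs)  # [(56, 3), (82, 3), (83, 1), (80, 1), (56, 5), (80, 3)]
assert decode_pairs(pairs, width=8) == image
```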
Run-length coding

Compression Achieved
Original image requires 3 bits per pixel (8 × 8 × 3 = 192 bits in total).
Compressed image has 29 runs and needs 3 + 3 = 6 bits per run (29 × 6 = 174 bits in total, or 2.72 bits per pixel).
LZW - CODING
Lempel-Ziv-Welch
The History of LZW
• 1977: LZ77 is published; it is improved upon in 1978 (LZ78).
• 1981: Lempel and Ziv file for US patent 4,464,650 on LZ78 (granted 1984) for Sperry Corp.
• 1983: Welch improves on LZ78 before leaving Sperry, which files for US patent 4,558,302 on June 20, 1983 (granted Dec. 10, 1985).
• 1984: Welch publishes "A Technique for High Performance Data Compression," IEEE Computer, vol. 17, no. 6, June 1984.
• 1986: Sperry and Burroughs merge to form Unisys, which assumes ownership of US 4,558,302.

Lempel-Ziv-Welch (LZW) Compression: http://netghost.narod.ru/gff/graphics/book/ch09_04.htm
LZW Compression

• Works by building a dictionary of phrases from the input stream.
• A token or an index is used to identify each distinct phrase.
• Character sequences in the original text are replaced by codes that are determined dynamically.
• The code table is not encoded into the compressed text, because it can be reconstructed from the compressed text during decompression.
LZW Compression

• Assume the letters in the text are limited to {a, b}.
• In practice, the alphabet may be the 256-character extended ASCII set.
• The characters in the alphabet are assigned code numbers beginning at 0.
• The initial code table is:

code 0 1
key a b
LZW Compression
code 0 1 2
key a b ab
• Original text = abababbabaabbabbaabba
• p = a
• pCode = 0
• c = b
• Represent a by 0 and enter ab into the code table.
• Compressed text = 0
LZW Compression
code 0 1 2 3
key a b ab ba

• Original text = abababbabaabbabbaabba
• Compressed text = 0
• p=b
• pCode = 1
• c=a
• Represent b by 1 and enter ba into the code table.
• Compressed text = 01
LZW Compression
code 0 1 2 3 4
key a b ab ba aba

• Original text = abababbabaabbabbaabba
• Compressed text = 01
• p = ab
• pCode = 2
• c = a
• Represent ab by 2 and enter aba into the code table.
• Compressed text = 012
LZW Compression
code 0 1 2 3 4 5
key a b ab ba aba abb

• Original text = abababbabaabbabbaabba
• Compressed text = 012
• p = ab
• pCode = 2
• c = b
• Represent ab by 2 and enter abb into the code table.
• Compressed text = 0122
LZW Compression
code 0 1 2 3 4 5 6
key a b ab ba aba abb bab
• Original text = abababbabaabbabbaabba
• Compressed text = 0122
• p = ba
• pCode = 3
• c = b
• Represent ba by 3 and enter bab into the code table.
• Compressed text = 01223
LZW Compression
code 0 1 2 3 4 5 6 7
key a b ab ba aba abb bab baa
• Original text = abababbabaabbabbaabba
• Compressed text = 01223
• p = ba
• pCode = 3
• c = a
• Represent ba by 3 and enter baa into the code table.
• Compressed text = 012233
LZW Compression
code 0 1 2 3 4 5 6 7 8
key a b ab ba aba abb bab baa abba
• Original text = abababbabaabbabbaabba
• Compressed text = 012233
• p = abb
• pCode = 5
• c = a
• Represent abb by 5 and enter abba into the code table.
• Compressed text = 0122335
LZW Compression
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa
• Original text = abababbabaabbabbaabba
• Compressed text = 0122335
• p = abba
• pCode = 8
• c = a
• Represent abba by 8 and enter abbaa into the code table.
• Compressed text = 01223358
LZW Compression
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa
• Original text = abababbabaabbabbaabba
• Compressed text = 01223358
• p = abba
• pCode = 8
• c = null
• Represent abba by 8.
• Compressed text = 012233588
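The compression steps walked through above can be condensed into a short Python sketch. It keeps string keys in a dict, does not cap the table size, and the name `lzw_compress` is illustrative; it reproduces the compressed text 012233588 for the example.

```python
def lzw_compress(text, alphabet):
    """LZW compression following the p / c steps of the walkthrough."""
    table = {ch: code for code, ch in enumerate(alphabet)}  # initial code table
    if not text:
        return [], table
    codes = []
    p = text[0]
    for c in text[1:]:
        if p + c in table:
            p += c                      # pc is already known: extend the phrase
        else:
            codes.append(table[p])      # represent p by pCode
            table[p + c] = len(table)   # enter pc into the code table
            p = c
    codes.append(table[p])              # c = null: represent the final p
    return codes, table

codes, table = lzw_compress("abababbabaabbabbaabba", "ab")
print(codes)  # [0, 1, 2, 2, 3, 3, 5, 8, 8]
```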
Code Table Representation
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa

• Dictionary.
• Pairs are (key, element) = (key, code).
• Operations are: get(key) and put(key, code).
• Limit the number of codes to $2^{12}$.
• Use a hash table.
• Convert variable-length keys into fixed-length keys.
• Each key has the form pc, where the string p is a key that is already in the table.
• Replace pc with (pCode)c.
Code Table Representation

code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa

code 0 1 2 3 4 5 6 7 8 9
key a b 0b 1a 2a 2b 3b 3a 5a 8a
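Here is a sketch of the fixed-length key idea: the hash table is keyed by `(pCode, c)` tuples instead of whole strings, and at most $2^{12}$ codes are issued. What happens when the table is full is an assumption here (the sketch simply stops adding entries; some real formats reset the table instead), and `lzw_compress_pairs` is a hypothetical name.

```python
def lzw_compress_pairs(text, alphabet, max_codes=2 ** 12):
    """LZW compression whose hash-table keys are fixed-size (pCode, c) pairs."""
    base = {ch: code for code, ch in enumerate(alphabet)}  # single characters
    pairs = {}               # (pCode, c) -> code, standing in for the string pc
    next_code = len(alphabet)
    codes = []
    if not text:
        return codes, pairs
    p_code = base[text[0]]
    for c in text[1:]:
        key = (p_code, c)
        if key in pairs:
            p_code = pairs[key]          # pc is already in the table
        else:
            codes.append(p_code)         # represent p by pCode
            if next_code < max_codes:    # assumption: stop adding once full
                pairs[key] = next_code
                next_code += 1
            p_code = base[c]
    codes.append(p_code)
    return codes, pairs

codes, pairs = lzw_compress_pairs("abababbabaabbabbaabba", "ab")
print(codes)                        # [0, 1, 2, 2, 3, 3, 5, 8, 8]
print(sorted(pairs, key=pairs.get))
# [(0, 'b'), (1, 'a'), (2, 'a'), (2, 'b'), (3, 'b'), (3, 'a'), (5, 'a'), (8, 'a')]
```

The sorted keys correspond to the table row 0b 1a 2a 2b 3b 3a 5a 8a.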
LZW Decompression
code 0 1
key a b
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• Convert codes to text from left to right.
• 0 represents a.
• Decompressed text = a
• pCode = 0 and p = a.
• p = a followed by the next text character (c) is entered into the code table.
LZW Decompression
code 0 1 2
key a b ab
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• 1 represents b.
• Decompressed text = ab
• pCode = 1 and p = b.
• lastP = a followed by the first character of p is entered into the code table.
LZW Decompression
code 0 1 2 3
key a b ab ba
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• 2 represents ab.
• Decompressed text = abab
• pCode = 2 and p = ab.
• lastP = b followed by the first character of p is entered into the code table.
LZW Decompression
code 0 1 2 3 4
key a b ab ba aba
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• 2 represents ab.
• Decompressed text = ababab
• pCode = 2 and p = ab.
• lastP = ab followed by the first character of p is entered into the code table.
LZW Decompression
code 0 1 2 3 4 5
key a b ab ba aba abb
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• 3 represents ba.
• Decompressed text = abababba
• pCode = 3 and p = ba.
• lastP = ab followed by the first character of p is entered into the code table.
LZW Decompression
code 0 1 2 3 4 5 6
key a b ab ba aba abb bab
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• 3 represents ba.
• Decompressed text = abababbaba
• pCode = 3 and p = ba.
• lastP = ba followed by the first character of p is entered into the code table.
LZW Decompression
code 0 1 2 3 4 5 6 7
key a b ab ba aba abb bab baa
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• 5 represents abb.
• Decompressed text = abababbabaabb
• pCode = 5 and p = abb.
• lastP = ba followed by the first character of p is entered into the code table.
LZW Decompression
code 0 1 2 3 4 5 6 7 8
key a b ab ba aba abb bab baa abba
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• 8 represents ???
• When a code is not in the table, its key is lastP followed by the first character of lastP.
• lastP = abb
• So 8 represents abba.
LZW Decompression
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa
• Original text = abababbabaabbabbaabba
• Compressed text = 012233588

• 8 represents abba.
• Decompressed text = abababbabaabbabbaabba
• pCode = 8 and p = abba.
• lastP = abba followed by the first character of p is entered into the code table.
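The decompression walkthrough, including the special case for a code that is not yet in the table, can be sketched like this. String keys and the name `lzw_decompress` are illustrative choices.

```python
def lzw_decompress(codes, alphabet):
    """LZW decompression: rebuilds the code table while decoding."""
    table = {code: ch for code, ch in enumerate(alphabet)}  # initial code table
    if not codes:
        return ""
    last_p = table[codes[0]]
    out = [last_p]
    for code in codes[1:]:
        if code in table:
            p = table[code]
        else:
            # Code not in the table yet: its key is lastP + first char of lastP
            p = last_p + last_p[0]
        table[len(table)] = last_p + p[0]  # enter lastP + first char of p
        out.append(p)
        last_p = p
    return "".join(out)

print(lzw_decompress([0, 1, 2, 2, 3, 3, 5, 8, 8], "ab"))
# abababbabaabbabbaabba
```

The special case only occurs when the code equals the next code to be assigned, so in that branch `p` and the new table entry are the same string.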
Code Table Representation
code 0 1 2 3 4 5 6 7 8 9
key a b ab ba aba abb bab baa abba abbaa

• Dictionary.
• Pairs are (key, element) = (code, what the code represents) = (code, codeKey).
• Operations are: get(key) and put(key, element).
• Keys are integers 0, 1, 2, …
• Use a 1D array codeTable.
• codeTable[code] = codeKey.
• Each code key has the form pc, where the string p is a code key that is already in the table.
• Replace pc with (pCode)c.
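A sketch of the array representation: `code_table[code]` holds the pair `(pCode, c)`, with `pCode = None` for the single-character entries, and a string is recovered by following the chain of prefix codes. The `None` convention and the function name are assumptions made for this illustration.

```python
def lzw_decompress_array(codes, alphabet):
    """LZW decompression with code_table[code] = (pCode, c)."""
    code_table = [(None, ch) for ch in alphabet]  # single characters: no prefix

    def expand(code):
        # Follow prefix codes back to a single character, then reverse
        chars = []
        while code is not None:
            p_code, c = code_table[code]
            chars.append(c)
            code = p_code
        return "".join(reversed(chars))

    if not codes:
        return "", code_table
    last_code = codes[0]
    last_p = expand(last_code)
    out = [last_p]
    for code in codes[1:]:
        if code < len(code_table):
            p = expand(code)
            code_table.append((last_code, p[0]))  # lastP + first char of p
        else:
            # Special case: the new entry is lastP + first char of lastP
            code_table.append((last_code, last_p[0]))
            p = expand(code)
        out.append(p)
        last_code, last_p = code, p
    return "".join(out), code_table

text, code_table = lzw_decompress_array([0, 1, 2, 2, 3, 3, 5, 8, 8], "ab")
print(text)             # abababbabaabbabbaabba
print(code_table[2:])
# [(0, 'b'), (1, 'a'), (2, 'a'), (2, 'b'), (3, 'b'), (3, 'a'), (5, 'a'), (8, 'a')]
```

Each entry stores one code and one character, so every table slot has a fixed size regardless of how long the string it represents is.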
Time Complexity
• Compression.
• O(n) expected time, where n is the length of the text that is being compressed.
• Decompression.
• O(n) time, where n is the length of the decompressed text.
