Entropy Coding
Introduction to Entropy Coding
Thinh Nguyen
Oregon State University
Codes
Definitions:
English alphabet: each letter (symbol) is represented by a 7-bit codeword, as in ASCII.

Note: all the letters (symbols) in this case use the same number of bits (7); such codes are called fixed-length codes. The average number of bits per symbol (letter) is called the rate of the code.
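As a small illustration, here is a minimal Python sketch of a 7-bit fixed-length encoder; the function name is illustrative, not from the slides:

```python
# Fixed-length encoding: every symbol costs the same number of bits,
# so the rate equals the codeword length regardless of symbol probabilities.
def fixed_length_encode(text, bits=7):
    return "".join(format(ord(ch), "b").zfill(bits) for ch in text)

encoded = fixed_length_encode("code")
print(len(encoded) / len("code"))   # rate = 7.0 bits per symbol
```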
Code Rate
Suppose our source alphabet consists of four letters a1, a2, a3, and a4 with probabilities P(a1) = 0.5, P(a2) = 0.25, and P(a3) = P(a4) = 0.125. The average codeword length is

\bar{l} = \sum_{i=1}^{4} P(a_i)\, n(a_i)

where n(a_i) is the number of bits in the codeword for a_i.

Letter          | Probability | Code 1 | Code 2 | Code 3 | Code 4
a1              | 0.5         | 0      | 0      | 0      | 0
a2              | 0.25        | 0      | 1      | 10     | 01
a3              | 0.125       | 1      | 00     | 110    | 011
a4              | 0.125       | 10     | 11     | 111    | 0111
Average length  |             | 1.125  | 1.25   | 1.75   | 1.875
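As a quick check of the last row, a minimal Python sketch using the probabilities and codewords from the table:

```python
probs = [0.5, 0.25, 0.125, 0.125]
codes = {
    "Code 1": ["0", "0", "1", "10"],
    "Code 2": ["0", "1", "00", "11"],
    "Code 3": ["0", "10", "110", "111"],
    "Code 4": ["0", "01", "011", "0111"],
}
for name, codewords in codes.items():
    # Average length: sum of P(a_i) * n(a_i) over the four symbols.
    avg = sum(p * len(c) for p, c in zip(probs, codewords))
    print(name, avg)   # 1.125, 1.25, 1.75, 1.875
```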
Unique Decodability

Example: the codeword 011 is a prefix of the codeword 011101; the dangling suffix is 101.

Algorithm:
1. Examine all pairs of codewords. Whenever one codeword is a prefix of another, add the dangling suffix to a list (unless it is already in the list).
2. Compare each dangling suffix with the codewords: if a suffix is a prefix of a codeword, or a codeword is a prefix of a suffix, add the new dangling suffix to the list.
3. Repeat until either a dangling suffix is itself a codeword, in which case the code is not uniquely decodable, or no new dangling suffixes appear, in which case the code is uniquely decodable.

Consider {0, 01, 11}: 0 is a prefix of 01, giving the dangling suffix 1; comparing 1 with the codewords only reproduces the suffix 1 (from 11), so no new suffixes appear and the code is uniquely decodable. (See the sketch below.)

Consider {0, 01, 10}: 0 is a prefix of 01, giving the dangling suffix 1; 1 is a prefix of 10, giving the dangling suffix 0, which is itself a codeword, so the code is not uniquely decodable.
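A minimal Python sketch of this test (essentially the Sardinas-Patterson procedure; the function name is my own):

```python
def is_uniquely_decodable(codewords):
    """Collect dangling suffixes; the code fails to be uniquely
    decodable exactly when some dangling suffix is a codeword."""
    codewords = set(codewords)

    def dangling(a, b):
        # If a is a proper prefix of b, return the dangling suffix of b.
        if len(a) < len(b) and b.startswith(a):
            return b[len(a):]
        return None

    # Step 1: dangling suffixes from all pairs of codewords.
    suffixes = set()
    for a in codewords:
        for b in codewords:
            s = dangling(a, b)
            if s is not None:
                suffixes.add(s)

    # Steps 2-3: compare suffixes with codewords until nothing changes.
    while True:
        new = set()
        for s in suffixes:
            if s in codewords:
                return False            # dangling suffix is a codeword
            for c in codewords:
                for t in (dangling(s, c), dangling(c, s)):
                    if t is not None and t not in suffixes:
                        new.add(t)
        if not new:
            return True                 # no new suffixes appeared
        suffixes |= new

print(is_uniquely_decodable(["0", "01", "11"]))  # True
print(is_uniquely_decodable(["0", "01", "10"]))  # False
```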
Prefix Codes

A prefix code is a code in which no codeword is a prefix of any other codeword. Every prefix code is uniquely decodable.

For a code C with N codewords of lengths l_1, l_2, ..., l_N, define

K(C) = \sum_{i=1}^{N} 2^{-l_i}

Theorem 1 (Kraft-McMillan inequality): if C is uniquely decodable, then

K(C) = \sum_{i=1}^{N} 2^{-l_i} \le 1
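Tying this back to the earlier table, a small sketch computing K(C) from the codeword lengths of Codes 1-4:

```python
def kraft_sum(lengths):
    # K(C) = sum over all codewords of 2^(-l_i)
    return sum(2.0 ** -l for l in lengths)

for name, lengths in [("Code 1", [1, 1, 1, 2]), ("Code 2", [1, 1, 2, 2]),
                      ("Code 3", [1, 2, 3, 3]), ("Code 4", [1, 2, 3, 4])]:
    k = kraft_sum(lengths)
    # K(C) > 1 proves the code cannot be uniquely decodable.
    print(name, k, "violates Kraft" if k > 1 else "satisfies Kraft")
```

Codes 1 and 2 violate the inequality (K = 1.75 and 1.5), so by Theorem 1 they cannot be uniquely decodable; Codes 3 and 4 satisfy it (K = 1.0 and 0.9375).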
Proof of Theorem 1

Consider the n-th power of K(C):

[K(C)]^n = \left[ \sum_{i=1}^{N} 2^{-l_i} \right]^n = \sum_{i_1=1}^{N} \sum_{i_2=1}^{N} \cdots \sum_{i_n=1}^{N} 2^{-(l_{i_1} + l_{i_2} + \cdots + l_{i_n})}

The exponent l_{i_1} + \cdots + l_{i_n} is the total length of a sequence of n codewords; it ranges from n to nl, where l = \max_i l_i. Grouping terms by total length k:

[K(C)]^n = \sum_{k=n}^{nl} A_k 2^{-k}

where A_k is the number of n-codeword combinations with total length k. Because C is uniquely decodable, distinct combinations must produce distinct binary strings, and there are only 2^k binary strings of length k, so A_k \le 2^k. Therefore

[K(C)]^n = \sum_{k=n}^{nl} A_k 2^{-k} \le \sum_{k=n}^{nl} 2^k 2^{-k} = nl - n + 1

The right-hand side grows only linearly in n, whereas [K(C)]^n would grow exponentially if K(C) > 1. Thus,

K(C) = \sum_{i=1}^{N} 2^{-l_i} \le 1
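To see the punchline numerically, a small sketch comparing [K(C)]^n with the linear bound nl - n + 1, using Code 1's lengths (for which K(C) = 1.75 > 1):

```python
lengths = [1, 1, 1, 2]              # Code 1: K(C) = 1.75 > 1
K = sum(2.0 ** -l for l in lengths)
l_max = max(lengths)
for n in (1, 5, 10, 20):
    # A uniquely decodable code would need K**n <= n*l_max - n + 1.
    print(n, round(K ** n, 2), n * l_max - n + 1)
```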
Constructing a Prefix Code

Conversely, given integers l_1, l_2, ..., l_N satisfying \sum_{i=1}^{N} 2^{-l_i} \le 1, we can always construct a prefix code with these codeword lengths.

Assume l_1 \le l_2 \le \cdots \le l_N.

Define:

w_1 = 0, \qquad w_j = \sum_{i=1}^{j-1} 2^{l_j - l_i} \quad \text{for } j > 1

The binary representation of w_j would take up \lceil \log_2(w_j + 1) \rceil bits, and this is at most l_j:

\log_2(w_j + 1) = \log_2\left( \sum_{i=1}^{j-1} 2^{l_j - l_i} + 1 \right) = \log_2\left( \sum_{i=1}^{j} 2^{l_j - l_i} \right) = l_j + \log_2\left( \sum_{i=1}^{j} 2^{-l_i} \right) \le l_j

since \sum_{i=1}^{j} 2^{-l_i} \le 1 by assumption, so its logarithm is at most 0. In particular w_j < 2^{l_j}. Define the codeword c_j to be the l_j-bit binary representation of w_j. For example, the lengths (1, 2, 3, 3) give w = (0, 2, 6, 7) and codewords 0, 10, 110, 111: exactly Code 3 from the earlier table.
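A minimal Python sketch of this construction (the function name kraft_prefix_code is my own):

```python
def kraft_prefix_code(lengths):
    """Build a prefix code from codeword lengths satisfying Kraft's
    inequality, via w_1 = 0 and w_j = sum_{i<j} 2^(l_j - l_i)."""
    assert sum(2.0 ** -l for l in lengths) <= 1, "lengths violate Kraft"
    lengths = sorted(lengths)                # assume l_1 <= ... <= l_N
    codewords = []
    for j, lj in enumerate(lengths):
        w = sum(2 ** (lj - li) for li in lengths[:j])    # w_j
        codewords.append(format(w, "b").zfill(lj))       # w_j in l_j bits
    return codewords

print(kraft_prefix_code([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```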
It remains to show that the construction yields a prefix code. Suppose not: then for some j < k, c_j is a prefix of c_k. In that case w_j corresponds to the l_j most significant bits of w_k, i.e.,

w_j = \left\lfloor \frac{w_k}{2^{l_k - l_j}} \right\rfloor

However, w_k = \sum_{i=1}^{k-1} 2^{l_k - l_i}, and therefore

\frac{w_k}{2^{l_k - l_j}} = \sum_{i=1}^{k-1} 2^{l_j - l_i} = w_j + \sum_{i=j}^{k-1} 2^{l_j - l_i} = w_j + 1 + \sum_{i=j+1}^{k-1} 2^{l_j - l_i} \ge w_j + 1

(the i = j term contributes 2^0 = 1). Hence \lfloor w_k / 2^{l_k - l_j} \rfloor \ge w_j + 1 > w_j, which contradicts the assumption. So no codeword is a prefix of another, and the construction indeed gives a prefix code.
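As a quick sanity check of the prefix property, reusing the kraft_prefix_code sketch from above:

```python
code = kraft_prefix_code([1, 2, 3, 3])
# Prefix property: no codeword is the beginning of another codeword.
assert not any(a != b and b.startswith(a) for a in code for b in code)
```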