
A Universal Variable-to-Fixed Length Source Code

Based on Lawrence's Algorithm


Tjalling J. Tjalkens* Frans M.J. Willems
June 17, 1991

Abstract

The Lawrence algorithm (1977) is a universal binary variable-to-fixed length source coding algorithm. Here we introduce a modified version of this algorithm and investigate its asymptotic performance. For M (the segment set cardinality) large enough, we show that the rate R as a function of the source parameter θ satisfies

    R ≈ (1 + (log log M)/(2 log M)) · h(θ),

for 0 < θ < 1. Here h(θ) is the binary entropy function.
In addition to this, we prove that no codes exist that have a better asymptotic performance, thereby establishing the asymptotic optimality of our modified Lawrence code.
The asymptotic bounds show that universal variable-to-fixed length codes can have a significantly lower redundancy than universal fixed-to-variable length codes with the same number of codewords.
Index terms: Universal Source Coding, Enumerative Coding, Variable-to-Fixed Length Codes, Asymptotic Redundancy.

1 Preliminaries

A binary memoryless information source generates a sequence of independent and identically distributed random variables {X_t}, t = 1, 2, ..., each of which assumes values in the finite set X = {0, 1}, called the source alphabet. Let θ = Pr{X_t = 1} = 1 − Pr{X_t = 0}, t = 1, 2, .... Then the entropy of the source (in bits per symbol) is equal to h(θ) = −θ log(θ) − (1 − θ) log(1 − θ). (We assume throughout this paper that log(·) has base 2 and that ln(·) has base e.)
In what follows we will describe a universal variable-to-fixed length coding strategy for the class of binary memoryless sources. With these codes, the (infinite length) source

* Eindhoven University of Technology, Faculty of Electrical Engineering, P.O. Box 513, 5600 MB Eindhoven.

sequence is chopped up into sequences of variable length (segments), chosen from some finite set S of segments, and each segment is assigned to a code sequence of fixed length N = log M, where M is the number of segments in S. (Note that we ignore the rounding of log M to an integer.) This set of segments must be complete, i.e. every infinite sequence has a prefix in the segment set, since every sequence must be subdividable into segments. We also require that the set is proper, i.e. no segment in the set is a prefix of another segment in the set. In this way we guarantee a unique subdivision of every source sequence.
It is assumed that the code alphabet Y = {0, 1}. Let L(x) = k be the length of segment x = (x_1, x_2, ..., x_k). Then instead of sending the L(x) source symbols to a receiver we can send the corresponding codeword. This codeword can be used by the receiver to reconstruct the source segment. If the code is properly chosen, the average segment length L_av can be considerably higher than N, where

    L_av = Σ_{x ∈ S} Pr{X = x} L(x).    (1)

Therefore the (compression) rate R of a code, which is defined as R = N/L_av, can be smaller than one. Note that a universal code cannot be designed using the statistics of the source.
Tunstall [10] discovered a procedure for constructing an optimum segment set for a given memoryless source. For a fixed N, this construction maximizes L_av. If we form such a code for a binary source with θ < 0.5, then N + log(θ) ≤ L_av · h(θ) ≤ N. A major disadvantage of a Tunstall code is that the complete code has to be stored by both the encoder and the decoder. Note that these codes are not universal.
Lawrence [5] devised a variable-to-fixed length code that is easier to implement. Only a part of Pascal's triangle must be stored by the encoder and the decoder now. An additional feature of this code is that it is universal. This code can be seen as the variable-to-fixed length counterpart of Schalkwijk's [8] `Pascal triangle' algorithm.
In this paper we will describe a modification of this Lawrence code. Instead of using a prefix and a suffix implementation as in Schalkwijk [8] and in Lawrence [5] we compute the lexicographical indices of the segments. The lexicographical index of a segment x ∈ S equals the number of segments in S which are less than x in a lexicographical ordering. This index can be represented using log M binary symbols. We also change the segment set of the code. Both modifications yield a more natural and simple implementation of the algorithm and reduce the redundancy of the code. In the next sections we will be more specific about our `modified Lawrence' code.

2 The segment set

Just like Tunstall [10], we try to define our segment set as a set of, more or less, equiprobable sequences. However, the probability Pr{x} = (1 − θ)^a θ^b of a sequence containing a zeros and b ones is unknown to the encoder and decoder. Because we have no knowledge about the parameter θ, we assume that θ is a random variable uniformly distributed over the interval [0, 1]. In this way we do not favor one particular value above another and this seems to be a fair choice. So the source is a composite source, see Davisson [3], instead of a memoryless one. For this composite source the probability Q(x) of a sequence containing a zeros and b ones is

    Q(x) = ∫_0^1 (1 − θ)^a θ^b dθ = [ (a + b + 1) · C(a+b, b) ]^{-1},    (2)

where C(n, k) denotes the binomial coefficient. We use Q(x) to define the segments. Given a positive integer C, x = (x_1, x_2, ..., x_{L(x)}) is a segment in the code if and only if

    Q(x)^{-1} ≥ C  and  Q(x^-)^{-1} < C,    (3)

with x^- = (x_1, x_2, ..., x_{L(x)-1}). Note that we can use Pascal's triangle P(a, b) = C(a+b, b), see Figure 1, to determine whether or not a sequence is a segment. A new segment starts at the top of the triangle, x_t = 0 corresponds to a step in the a-direction, x_t = 1 to a step in the b-direction. Hence 0010001 is a segment since 7 · 6 < 82 and 8 · 21 ≥ 82. We say that (5, 2) is on the segment set boundary in the triangle, i.e. (5, 2) is a boundary point. See Appendix A.1 for the exact definition of this term. In Figure 1 we also indicate the path for 0010001.
[Pascal's triangle P(a, b), cut off at the segment set boundary; the a-axis runs down to the left, the b-axis down to the right. Not reproduced here.]

Figure 1: Modified Lawrence code with C = 82. Boundary points are underlined.
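The segment criterion (3) can be checked mechanically. Below is a small Python sketch (ours, not the authors' program) using the function f(a, b) = (a+b+1)·C(a+b, b) = 1/Q(x); since f never decreases along a path through the triangle, it suffices to test the sequence and its one-symbol-shorter prefix:

```python
from math import comb

def f(a, b):
    """Segment function: f(a,b) = (a+b+1) * C(a+b,b) = 1/Q(x) for a zeros, b ones."""
    return (a + b + 1) * comb(a + b, b)

def is_segment(bits, C):
    """Criterion (3): f >= C for the full string, f < C for it minus its last symbol."""
    a, b = bits.count(0), bits.count(1)
    ap, bp = (a - 1, b) if bits[-1] == 0 else (a, b - 1)
    return f(a, b) >= C and f(ap, bp) < C

print(is_segment([0, 0, 1, 0, 0, 0, 1], 82))  # -> True  (8*21 >= 82 while 7*6 < 82)
print(is_segment([0, 0, 1, 0, 0, 0], 82))     # -> False (still internal: 7*6 < 82)
```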

3 The coding algorithm

We have seen that the segment set boundary can be determined using Pascal's triangle, but is it also possible to find the lexicographical index of a segment in a similar way? Yes, but we have to refill the triangle first. An element M(a, b) of the new array must be equal to the number of distinct ways to reach a boundary point after we have seen a zeros and b ones. Therefore

    M(a, b) = 1                              if (a, b) is a boundary point,
    M(a, b) = M(a+1, b) + M(a, b+1)          if not.                          (4)

Observe that M(0, 0) is equal to the total number of segments in the set. M(0, 0) is a function of C and for C = 82 this total number is 256, see Figure 2, where we refilled the triangle of Figure 1 from bottom to top using (4) and starting at the boundary points. In Appendix A we show that for C large enough 2C ≤ M(0, 0) ≤ 2C(1 + ln((log M(0,0))/2)).
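As a quick numerical sanity check (not part of the paper's argument), the running example C = 82, M(0, 0) = 256 already satisfies both sides of this bound:

```python
from math import log

C, M = 82, 256                    # values from the running example (Figure 2)
lower = 2 * C                     # 2C = 164
upper = 2 * C * (1 + log(8 / 2))  # log M = log2(256) = 8; math.log is ln
print(lower <= M <= upper)        # -> True (164 <= 256 <= ~391.4)
```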


[The coding array M(a, b), refilled from the boundary upward using (4); its apex is M(0, 0) = 256. Not reproduced here.]

Figure 2: Coding array for C = 82.


To determine the lexicographical index (and the end) of a segment, the encoder uses this M(a, b) array in the following way:

i)  index := 0; a := 0; b := 0
ii) while M(a,b) ≠ 1 do
        if x(next) = 0
        then a := a + 1
        else index := index + M(a+1,b)
             b := b + 1
This lexicographical index is sent to the decoder that reconstructs the segment as follows:

i)  I := 0; a := 0; b := 0
ii) while M(a,b) ≠ 1 do
        if index < (I + M(a+1,b))
        then x(next) := 0
             a := a + 1
        else x(next) := 1
             I := I + M(a+1,b)
             b := b + 1

It will be clear that the lexicographical index of the segment 0010001 is 95 + 3 = 98,
see Figure 2. We remark that there are M (3; 0) = 95 segments starting with '000' and
M (6; 1) = 3 segments starting with '0010000'.
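The array construction (4) and the two routines above fit in a few lines of Python. The sketch below is ours, not the authors' implementation; it reproduces the running example with C = 82:

```python
from functools import lru_cache
from math import comb

C = 82  # design constant of the running example

def f(a, b):
    """Segment function (a+b+1)*C(a+b,b); a path ends once f(a,b) >= C."""
    return (a + b + 1) * comb(a + b, b)

@lru_cache(maxsize=None)
def M(a, b):
    """Coding array (4): 1 on the boundary, else M(a+1,b) + M(a,b+1)."""
    if f(a, b) >= C:
        return 1
    return M(a + 1, b) + M(a, b + 1)

def encode(bits):
    """Lexicographical index of a complete segment."""
    index = a = b = 0
    for x in bits:
        if x == 0:
            a += 1
        else:
            index += M(a + 1, b)
            b += 1
    assert M(a, b) == 1, "bits do not end exactly on the boundary"
    return index

def decode(index):
    """Rebuild the segment from its index, mirroring the decoder loop above."""
    bits, I, a, b = [], 0, 0, 0
    while M(a, b) != 1:
        if index < I + M(a + 1, b):
            bits.append(0)
            a += 1
        else:
            I += M(a + 1, b)
            bits.append(1)
            b += 1
    return bits

print(M(0, 0))                        # -> 256 segments, as in Figure 2
print(encode([0, 0, 1, 0, 0, 0, 1]))  # -> 98, i.e. 95 + 3
print(decode(98))                     # -> [0, 0, 1, 0, 0, 0, 1]
```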

4 The performance

The redundancy of a code is defined as the difference between the compression rate R and the source entropy h(θ). In this section we tabulate the redundancy of our algorithm and compare it with the universal `Pascal triangle' algorithm [8] and Lawrence's algorithm [5]. We compute the redundancy of the three codes for the code sizes 256 (i.e. 8-digit codewords) and 65536 (i.e. 16-digit codewords). The results are listed in Table 1.
                Code size 2^8                     Code size 2^16
  θ         Pascal    Law-      modif.      Pascal    Law-      modif.
            triangle  rence     algor.      triangle  rence     algor.
  0.5       0.17857   0.25799   0.24574     0.09943   0.12219   0.19436
  0.1       0.22958   0.21492   0.17505     0.13257   0.11983   0.11929
  0.01      0.37748   0.20726   0.06196     0.22517   0.04883   0.03849
  0.001     0.42016   0.24226   0.09132     0.25925   0.00480   0.00449
  0.0001    0.42740   0.24889   0.09768     0.26559   0.00329   0.00063
  0.00001   0.42842   0.24986   0.09862     0.26653   0.00381   0.00102

Table 1: The redundancies.
From this table we can see that the two variable-to-fixed length algorithms (Lawrence and the modified Lawrence algorithms) outperform the fixed-to-variable length Pascal triangle algorithm, except for high entropy sources. Also we observe that for low entropy sources the modified algorithm performs better than the original Lawrence algorithm. From the table we can conclude that the variable-to-fixed length algorithms perform well for practical code sizes.
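The table entries for the modified algorithm can be reproduced (up to our reading of the definitions, so this is a sketch, not the authors' program) by computing L_av exactly: sweep the segment tree level by level and, whenever a step crosses the boundary, credit the terminating probability mass with the segment length; then R = log M / L_av. With C = 82 as before:

```python
from math import comb, log2

def code_stats(theta, C):
    """Return (number of segments, average segment length L_av) of the
    modified Lawrence code with design constant C, for source parameter theta."""
    def f(a, b):
        return (a + b + 1) * comb(a + b, b)
    # level-by-level sweep over internal points: (a,b) -> (#paths, prob. mass)
    internal = {(0, 0): (1, 1.0)}
    segments, L_av = 0, 0.0
    while internal:
        nxt = {}
        for (a, b), (n, p) in internal.items():
            for a2, b2, q in ((a + 1, b, 1 - theta), (a, b + 1, theta)):
                if f(a2, b2) >= C:          # boundary: n segments end here
                    segments += n
                    L_av += p * q * (a2 + b2)
                else:
                    n0, p0 = nxt.get((a2, b2), (0, 0.0))
                    nxt[(a2, b2)] = (n0 + n, p0 + p * q)
        internal = nxt
    return segments, L_av

def h(theta):
    return -theta * log2(theta) - (1 - theta) * log2(1 - theta)

M, L_av = code_stats(0.01, 82)
print(M)                      # -> 256
print(log2(M) / L_av - h(0.01))  # redundancy; Table 1 lists 0.06196 for theta = 0.01
```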

5 The asymptotic performance

In the previous section we have seen that the modified algorithm compares favorably to the other universal algorithms for small code sizes. It is also interesting to see how the rate of this modified Lawrence algorithm converges as the code size increases. An asymptotic upperbound on the rate is stated in the following theorem and its proof is given in Appendix A.

Theorem 1  For any δ > 0 and any 0 < θ < 1, we have for C > C_0(δ, θ) that

    R = log M / L_av ≤ (1 + (1 + δ) · (log log M)/(2 log M)) · h(θ).    (5)

It should be noted that M increases when C increases. In particular, see the text before (11) in Appendix A where it is shown that M ≥ 2C.

6 A lowerbound to the compression rate

In this section we state a lowerbound to the compression rate achieved by any variable-to-fixed length code with M codewords for (almost) all sources θ ∈ [0, 1]. The result is summarized in the following theorem, and the proof thereof is given in Appendix B.

Theorem 2  For all δ > 0 and any variable-to-fixed length code with a large enough number M of segments we have

    R ≥ (1 + (1 − δ) · (log log M)/(2 log M)) · h(θ),

for all 0 ≤ θ ≤ 1 except for those θ in a set B whose volume tends to zero as M increases.

The proof of this theorem is based on Rissanen's converse for fixed-to-variable length codes for arbitrary sources [7]. We restrict ourselves to variable-to-fixed length codes for binary memoryless sources, although it is clear that the proof readily extends to arbitrary finite alphabet memoryless sources.

7 Conclusion

In this contribution we showed that the modified Lawrence algorithm is universal over the class of binary memoryless sources, and in addition, that the rate converges asymptotically optimally fast to the source entropy h(θ). For the class of binary memoryless sources the asymptotically optimal redundancy is h(θ) · log log M/(2 log M), where M is the number of codewords. When we compare this to Rissanen's redundancy for this case, which is log N/(2N) = log log M/(2 log M), where M again denotes the number of codewords, we see that in the VF case the asymptotic redundancy is a factor h(θ) lower than in the FV case.

An earlier converse for VF codes for memoryless sources was given by Trofimov, see [4]. The lowerbound stated there showed a uniform convergence in correspondence with Davisson's [3] result that the class of memoryless sources is `minimax universal'. However this bound is expressed in terms of the minimal average message length of a code (with respect to the class of sources) and we consider this a less realistic approach than our bound of Theorem 2, which relates the redundancy to the code size.

In a recent paper, Shtarkov [9] presented two universal VF coding schemes for m-ary memoryless sources. The first scheme achieves Trofimov's lowerbound. The upperbound for the redundancy for the second scheme for binary sources is twice as high as our upperbound (5).

Finally we want to thank the reviewer who suggested that an upperbound on the rate in a previous version might be too weak. Motivated by this remark we indeed obtained the suggested improvement and also the converse result.

Appendices

A  An upperbound on the rate of the modified Lawrence algorithm

This appendix consists of 4 subsections. Throughout these subsections we assume that 0 < θ < 1 and that ε > 0.

A.1  The shape of the boundary

For (a, b) with a = 0, 1, ... and b = 0, 1, ... we define the segment function f(·, ·) as

    f(a, b) = (a + b + 1) · C(a+b, b).    (6)

Recall that (a, b) corresponds to a sequence containing a zeros and b ones. A point (a, b) is now said to be internal if and only if f(a, b) < C. A point (a, b) is a boundary point if and only if f(a, b) ≥ C and at least one of the points (a, b−1) and (a−1, b) is internal.

Note that if (a, b) is a boundary (internal) point, (b, a) is a boundary (internal) point too. The set of boundary points (segments) is therefore symmetric. Let S be the minimal value of a + b when (a, b) is a boundary point. Then, if S is even, (S/2, S/2) must be a boundary point and (S/2, S/2 − 1) an internal point. Consequently C > f(S/2, S/2 − 1) = S · C(S−1, S/2−1) ≥ 2^{S−1}. Likewise, for S odd, we can show that C > 2^{S−1}. Hence

    S < log 2C.    (7)

To avoid degenerate codes we always assume that S ≥ 1 and thus C > 1.

Now consider a boundary point (a, b) with a ≥ b. Then, if b ≥ 1, the point (a, b−1) must be internal since f(a, b−1) ≤ f(a−1, b). Note that not both (a+1, b−1) and (a+1, b) can be boundary points. On the other hand either (a+1, b−1) or (a+1, b) must be a boundary point since f(a+1, b) ≥ f(a, b) ≥ C.

Now for 1 ≤ b ≤ S/2 let a_min(b) resp. a_max(b) be the minimal resp. maximal value of a such that (a, b) is a boundary point. The consequence of the above is that (a_min(b), b), (a_min(b)+1, b), ..., (a_max(b), b) all are (adjacent) boundary points and so is (a_max(b)+1, b−1).

If we consider a boundary point (a, 0) then (a−1, 0) must be an internal point. Note that (a+1, 0) cannot be a boundary point too.

A.2  An upper bound for the number of segments

For 0 ≤ b ≤ S/2 let M(b) be the number of segments that contain b ones. From the previous subsection it follows that

    M(b) = C(a_min(b)+b, b) + C(a_min(b)+b, b−1) + ... + C(a_max(b)+b−1, b−1) = C(a_max(b)+b, b).    (8)

Using symmetry we find for the total number of segments M that

    M = 2 · Σ_{b=0}^{⌊S/2⌋} C(a_max(b)+b, b) − N(S),    (9)

where N(S) is 0 for odd S and C(S, S/2) for even S. Note that (a_max(b), b−1) is an interior point if 2 ≤ b ≤ ⌊S/2⌋. Hence

    C > f(a_max(b), b−1) = (a_max(b)+b) · C(a_max(b)+b−1, b−1) = b · C(a_max(b)+b, b).    (10)

From C(a_max(0)+0, 0) + C(a_max(1)+1, 1) = C (and consequently M ≥ 2C) for C large enough, and (10), we obtain from (9) that

    M ≤ 2C + 2 · Σ_{b=2}^{⌊S/2⌋} C/b ≤ 2C + 2 ∫_1^{S/2} (C/b) db = 2C(1 + ln(S/2))
      ≤ 2C(1 + ln((log 2C)/2)) ≤ 2C(1 + ln((log M)/2)),    (11)

for C large enough. The third inequality is a consequence of (7).

A.3  An upper bound for the segment divergence

Recall that the probability that the source generates a sequence x containing a zeros and b ones is Pr{X = x} = (1 − θ)^a θ^b. In this subsection we will derive an upper bound for the divergence D(P‖Q) between the actual probability distribution over the segments P(x) = Pr{X = x} and the design distribution Q(x). For this divergence we can write

    D(P‖Q) = Σ_{x ∈ S} P(x) log(P(x)/Q(x))
           = Σ_{(a,b) ∈ G} Pr{(A, B) = (a, b)} log[ (1−θ)^a θ^b (a+b+1) C(a+b, b) ],    (12)

where S is the set of all segments in the code, G the set of all boundary points (a, b), and Pr{(A, B) = (a, b)} is the probability that the source generates a segment with a zeros and b ones.

Now let θ⁻ = θ/2 and θ⁺ = (1 + θ)/2. If we note that 0 < θ⁻ < θ < θ⁺ < 1 we can define

    G* = {(a, b) ∈ G : θ⁻ ≤ b/(a+b) ≤ θ⁺}.    (13)

Note that both a and b tend to infinity for (a, b) ∈ G* when C increases. This follows from the fact that for any boundary point (a, b)

    a + b = (1/2) log 2^{2(a+b)} ≥ (1/2) log((a+b+1) 2^{a+b})
          ≥ (1/2) log((a+b+1) C(a+b, b)) ≥ log √C,    (14)

and from a ≥ (1 − θ⁺)(a+b) and b ≥ θ⁻(a+b) for (a, b) ∈ G*. Therefore for C large enough both a and b will be 1 or larger for (a, b) ∈ G*. Using Stirling's approximation t! = √(2πt) · t^t · exp(−t + α/(12t)) for some 0 < α < 1, for t > 0 (see Abramowitz and Stegun [1]), we obtain for the argument of the log in (12) that

    (1−θ)^a θ^b (a+b+1) C(a+b, b)
      ≤ (a+b+1) √((a+b)/(2πab)) exp(1/12) · ((a+b)(1−θ)/a)^a ((a+b)θ/b)^b
      = (a+b+1) √((a+b)/(2πab)) exp(1/12) · 2^{−(a+b) d(b/(a+b) ‖ θ)}
      ≤ √((a+b)(a+b+1)^2/(2πab)) exp(1/12)
      ≤ √(a+b) · √(9 exp(1/6)/(8π θ⁻(1−θ⁺))) = K √(a+b),    (15)

for K = √(9 exp(1/6)/(8π θ⁻(1−θ⁺))) and where d(p‖q) = p log(p/q) + (1−p) log((1−p)/(1−q)) (with 0 ≤ p ≤ 1 and 0 < q < 1) is the (non-negative) binary divergence function. For any (a, b) ∈ G we find

    (1−θ)^a θ^b (a+b+1) C(a+b, b) ≤ (a+b+1) C(a+b, b) ≤ 4C.    (16)

The last inequality holds since for any point (a, b), a + b ≥ 1, on the boundary, there exists an interior point (a′, b′) with a′ + b′ = a + b − 1 for which (a′+b′+1) · B < C, for some binomial coefficient B.

To get an upper bound for the divergence D(P‖Q) in (12) we first consider

    Pr{(A, B) ∉ G*} = Pr{(A, B) ∈ G : B/(A+B) < θ⁻} + Pr{(A, B) ∈ G : B/(A+B) > θ⁺}
      = Σ_{i=0}^{⌈l⁻θ⁻⌉−1} C(l⁻, i) (1−θ)^{l⁻−i} θ^i + Σ_{i=⌈l⁺θ⁺⌉}^{l⁺} C(l⁺, i) (1−θ)^{l⁺−i} θ^i
      ≤ 2^{−l⁻ d(θ⁻‖θ)} + 2^{−l⁺ d(θ⁺‖θ)}
      ≤ 2 · 2^{−(log √C) τ} = 2C^{−τ/2},    (17)

where l⁻ resp. l⁺ is the value of a + b of the boundary point (a, b) ∈ G* for which b/(a+b) is minimal resp. maximal, and τ = min(d(θ⁻‖θ), d(θ⁺‖θ)). The second equality follows from inspection of the shape of the boundary, the first inequality from Chernoff's bound, the last one from inequality (14).

We now combine (15), (16), and (17) and obtain for the divergence in (12)

    D(P‖Q) ≤ Σ_{(a,b) ∈ G*} Pr{(A, B) = (a, b)} log(K √(a+b)) + Pr{(A, B) ∉ G*} log 4C
           ≤ Σ_{(a,b) ∈ G} Pr{(A, B) = (a, b)} log √(a+b) + log K + 2C^{−τ/2} log 4C
           ≤ (1/2) log(L_av) + log K + ε,    (18)

for C large enough.

A.4  An upper bound on the rate

First recall (from subsection A.2) that M ≥ 2C for C large enough. Therefore M tends to infinity when C increases. We shall use this fact several times in this subsection.

Taking logarithms in (11) and noting that 1 + ln(log(M)/2) ≤ log log M for C large enough we obtain

    log C ≥ log M − log log log M − 1.    (19)

From Massey's `leaf-node' theorem [6] it follows that H_segm = L_av · h(θ), where H_segm is the segment entropy. Hence, with log Q(x)^{-1} ≥ log C for all segments x, we have

    D(P‖Q) = Σ_{x ∈ S} P(x) log(1/Q(x)) − H_segm ≥ log C − L_av · h(θ).    (20)

Substituting the upper bound for the divergence (18) in (20) we find that

    log C ≤ L_av · h(θ) + (1/2) log L_av + log K + ε,    (21)

for C large enough. Combining (19) and (21) yields, again for C large enough, that

    L_av · h(θ) ≥ log M − (1/2) log L_av − log log log M − K′
               ≥ log M − (1/2) log log M − log log log M − K″,    (22)

where K′ = log K + ε + 1 and K″ = K′ − (1/2) log h(θ). The second inequality follows from log M ≥ H_segm = L_av · h(θ). For the rate of our modified Lawrence code we finally obtain

    R = log M / L_av ≤ log M · h(θ) / (log M − (1/2) log log M − log log log M − K″)
      = (1 − (log log M)/(2 log M) − (log log log M)/(log M) − K″/(log M))^{-1} · h(θ)
      ≤ (1 + (1 + δ) · (log log M)/(2 log M)) · h(θ),    (23)

for C large enough. This proves Theorem 1.

B  The proof of the converse

The converse presented here can be regarded as an adaptation of Rissanen's converse for fixed-to-variable length codes for arbitrary sources [7]. We restrict ourselves to variable-to-fixed length codes for binary memoryless sources. For the source parameter θ we assume that 0 ≤ θ ≤ 1.

Let δ > 0. Fix an α such that 0 < α < 1. Consider a variable-to-fixed length code with a (proper and complete) segment set S with M segments. For such a code we define the probability

    A_θ = Pr{x ∈ S : L(x) < L_min},    (24)

where L_min = ⌈α log M⌉. Note that A_θ depends on θ as is indicated by its subscript. For the entropy of the segments in terms of A_θ we find that

    H_segm ≤ h(A_θ) + A_θ log(2^{L_min} − 1) + (1 − A_θ) log M
           ≤ 1 + (1 − A_θ(1 − α)) log M.    (25)

From Massey's `leaf-node' theorem [6] it follows that H_segm = L_av · h(θ). Combining this with the upper bound for H_segm, we find for the rate of our code for given θ that

    R = log M / L_av = (log M / H_segm) · h(θ) ≥ (log M / (1 + (1 − A_θ(1 − α)) log M)) · h(θ)
      ≥ (1 + A_θ(1 − α) − 1/(log M)) · h(θ).    (26)

Note that this lower bound for R holds also for h(θ) = 0. Now we easily arrive at our first implication:

    A_θ(1 − α) ≥ (log log M)/(2 log M)  ⟹  R ≥ (1 + (log log M)/(2 log M) − 1/(log M)) · h(θ).    (27)
Next we introduce the set X_θ of segments that have a prefix of length L_min which is `θ-typical', i.e.

    X_θ = {x ∈ S : L(x) ≥ L_min ∧ |(1/L_min) Σ_{i=1}^{L_min} x_i − θ| ≤ c/√L_min},    (28)

where c > 0 is to be specified later. Note that for P_θ = Pr{x ∈ X_θ}, by the union bound and Chebyshev's inequality, we may conclude that

    P_θ ≥ 1 − Pr{x ∈ S : L(x) < L_min} − Pr{x ∈ S : |(1/L_min) Σ_{i=1}^{L_min} x_i − θ| > c/√L_min}
       ≥ 1 − A_θ − θ(1 − θ)/c² ≥ 1 − A_θ − 1/(4c²).    (29)

Let M_θ be the number of segments in X_θ; then from the `log-sum' inequality (Csiszar and Korner [2]) we obtain that

    T_θ = Σ_{x ∈ X_θ} Pr{x} log(Pr{x}/(1/M)) ≥ P_θ log(P_θ/(M_θ/M)).    (30)

Furthermore from Massey's leaf-node theorem, the log-sum inequality and the basic inequality ln t < t − 1 it follows that

    log M = L_av · h(θ) + Σ_{x ∈ S} Pr{x} log(Pr{x}/(1/M))
          ≥ L_av · h(θ) + T_θ + (1 − P_θ) log((1 − P_θ)/(1 − M_θ/M))
          ≥ L_av · h(θ) + T_θ − log e.    (31)

This, combined with L_av · h(θ) = H_segm ≤ log M, leads for M large enough to our second implication:

    T_θ ≥ (1 − δ) · (log log M)/2  ⟹  R ≥ (1 + (1 − δ) · (log log M)/(2 log M) − (log e)/(log M)) · h(θ).    (32)

From the definition

    B = {θ : A_θ(1 − α) < (log log M)/(2 log M) ∧ T_θ < (1 − δ) · (log log M)/2}    (33)

and the implications (27) and (32) it follows that for M large enough

    θ ∉ B  ⟹  R ≥ (1 + (1 − 2δ) · (log log M)/(2 log M)) · h(θ).    (34)

For θ ∈ B and M large enough however, the rate of our code may not satisfy the inequality in (34). In this converse it is our objective to show that the `volume' V of the set B, i.e. the set containing all θ's for which our code has a `small enough' redundancy, can be made arbitrarily small by increasing M. Therefore let N be the maximal number of disjoint intervals I_θ = [θ − c/√L_min, θ + c/√L_min] that can be constructed with centerpoints θ ∈ B. Let C be the set of all centerpoints. The corresponding intervals may not cover B completely, but they do if we double their sizes. To see this, suppose that some θ̂ ∈ B remained uncovered after the size of the intervals had been doubled. Then the distance between θ̂ and any centerpoint would be at least 2c/√L_min. But then the original number of intervals would not have been maximal! We can therefore bound the volume V by

    V ≤ 4N · c/√L_min.    (35)
To find an upper bound for N, consider a θ ∈ B. Then from inequality (30) and the definition of B in (33) we may conclude that

    log(M_θ/M) > log P_θ − (1/P_θ) · (1 − δ) · (log log M)/2,    (36)

while from (29) and definition (33) it follows that

    P_θ > 1 − (1/(1 − α)) · (log log M)/(2 log M) − 1/(4c²).    (37)

By choosing c = 1/√δ we can guarantee that P_θ > 1 − δ/2 for M large enough. Substituting this in (36) we obtain that for M large enough there must exist a β < 1 such that

    M_θ > M · (log M)^{−β/2}.    (38)

The intervals I_θ that correspond to all centerpoints in C are disjoint, therefore the sets X_θ corresponding to all these centerpoints must also be disjoint. Consequently

    M ≥ Σ_{θ ∈ C} M_θ > N · M · (log M)^{−β/2}.    (39)

Combining (38) and (39) yields

    N < (log M)^{β/2}.    (40)

If we substitute this bound for N in (35), set c = 1/√δ and note that L_min ≥ α log M, we obtain that

    V < (4/√(αδ)) · (log M)^{(β−1)/2}.    (41)

Since β < 1 we get that V ↓ 0 for M → ∞. Taking δ equal to half the δ of Theorem 2 finally proves the converse as stated in Theorem 2.

References
[1] M. Abramowitz and I.A. Stegun, Handbook of Mathematical Functions. New York: Dover, 1970.
[2] I. Csiszar and J. Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Budapest, Hungary: Akademiai Kiado, 1981.
[3] L.D. Davisson, "Universal noiseless coding," IEEE Trans. Inform. Theory, vol. IT-19, no. 6, 1973, pp. 783-795.
[4] R.E. Krichevsky and V.K. Trofimov, "The performance of universal encoding," IEEE Trans. Inform. Theory, vol. IT-27, no. 2, 1981, pp. 199-207.
[5] J.C. Lawrence, "A new universal coding scheme for the binary memoryless source," IEEE Trans. Inform. Theory, vol. IT-23, no. 4, 1977, pp. 466-472.
[6] J.L. Massey, "The entropy of a rooted tree with probabilities," presented at the IEEE Int. Symp. Inform. Theory, St. Jovite, Canada, Sept. 26-30, 1983.
[7] J. Rissanen, "Universal coding, information, prediction, and estimation," IEEE Trans. Inform. Theory, vol. IT-30, no. 4, July 1984, pp. 629-636.
[8] J.P.M. Schalkwijk, "An algorithm for source coding," IEEE Trans. Inform. Theory, vol. IT-18, no. 3, 1972, pp. 395-399.
[9] Yu.M. Shtarkov, "The variable-to-block universal encoding of memoryless sources," presented at the Fourth Joint Swedish-Soviet International Workshop on Information Theory, August 1989, Gotland, Sweden.
[10] B.P. Tunstall, Synthesis of Noiseless Compression Codes, Ph.D. dissertation, Georgia Institute of Technology, Sept. 1967.
