Graf 1997 MN PDF
Abstract. For a real-valued random variable whose distribution is the classical Cantor probability, the $n$-quantization error and the $n$-optimal quantization rules are calculated for every natural number $n$. Moreover, the connection between the rate of convergence of the logarithms of the quantization errors for $n$ going to infinity and the Hausdorff dimension of the Cantor set is indicated.
1. Introduction
Let $P$ denote a probability measure on $\mathbb{R}^d$. Fix $n\in\mathbb{N}$ and let $\mathcal{F}_n$ be the set of all $n$-quantizers, i.e., the set of all Borel measurable maps $f:\mathbb{R}^d\to\mathbb{R}^d$ with $\operatorname{card}(f(\mathbb{R}^d))\le n$. If $X$ is a random variable in $\mathbb{R}^d$ with distribution $P$, then, for each $f\in\mathcal{F}_n$, $f(X)$ gives a quantized version of $X$. The quantization error is defined by
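$$V_n = \inf_{f\in\mathcal{F}_n} E\|X-f(X)\|^2 = \inf_{\alpha\subset\mathbb{R}^d,\ \operatorname{card}(\alpha)\le n}\int\min_{a\in\alpha}\|x-a\|^2\,dP(x).$$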
$$P = \frac12\,P\circ S_1^{-1} + \frac12\,P\circ S_2^{-1},$$
where $P\circ S_i^{-1}$ denotes the image measure of $P$ with respect to $S_i$ ($i = 1, 2$).
sets $(J_\sigma)_{\sigma\in\{1,2\}^k}$ are just the $2^k$ intervals of length $1/3^k$ in the $k$-th level of the classical Cantor construction. The intervals $J_{\sigma1}, J_{\sigma2}$ into which $J_\sigma$ is split up at the $(k+1)$-th level are called the children of $J_\sigma$. The set $C = \bigcap_{k\in\mathbb{N}}\bigcup_{\sigma\in\{1,2\}^k} J_\sigma$ is the Cantor set and equals the support of the Cantor measure $P$.
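As an illustration (not part of the paper), the self-similarity of $P$ suggests a simple sampler. Assuming the standard contractions $S_1(x) = x/3$ and $S_2(x) = x/3 + 2/3$, a draw from $P$ is obtained by applying $S_1$ or $S_2$ with probability $1/2$ at each level, i.e. by choosing each ternary digit to be 0 or 2 at random. The helpers `sample_cantor` and `address` are hypothetical names, and the sampler truncates after a fixed number of digits, so it is only approximate.

```python
import random

def sample_cantor(rng, digits=30):
    """Approximate draw from the Cantor distribution P (hypothetical helper).

    At each level S_1 or S_2 is applied with probability 1/2, i.e. the next
    ternary digit is 0 or 2. Truncating after `digits` levels leaves an
    error of at most 3**-digits.
    """
    x, scale = 0.0, 1.0
    for _ in range(digits):
        scale /= 3.0
        if rng.random() < 0.5:
            x += 2.0 * scale  # digit 2: right-hand subinterval at this level
    return x

def address(x, k):
    """Word sigma in {1,2}^k with x in J_sigma (x is assumed to lie in C)."""
    sigma = []
    for _ in range(k):
        x *= 3.0
        if x >= 2.0:      # x was in the right third: letter 2
            sigma.append(2)
            x -= 2.0
        else:             # x was in the left third: letter 1
            sigma.append(1)
    return tuple(sigma)

rng = random.Random(0)
xs = [sample_cantor(rng) for _ in range(50_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)  # close to 1/2 and 1/8 (see Lemma 3.4)
```

The empirical frequency of each level-$k$ address is close to $2^{-k}$, in line with $P(J_\sigma) = 2^{-k}$.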
$$\int f\,dP = \frac{1}{2^k}\sum_{\sigma\in\{1,2\}^k}\int f\circ S_\sigma\,dP.$$
For later use we need the expectation $E(P)$ and the variance $V(P)$ of $P$.
Lemma 3.4.
$$E(P) = \frac12,\qquad V(P) = V_1 = \frac18,$$
and for fixed $x_0\in\mathbb{R}$,
$$\int(x-x_0)^2\,dP(x) = \Big(x_0-\frac12\Big)^2+\frac18.$$
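Proof. (i) By 3.1,
$$E(P) = \int x\,dP(x) = \frac12\int\frac{x}{3}\,dP(x)+\frac12\int\Big(\frac{x}{3}+\frac23\Big)\,dP(x) = \frac13E(P)+\frac13.$$
This equation implies $E(P) = \frac12$.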
(ii) We first calculate $\int x^2\,dP(x)$. Using 3.1 we obtain
Graf/Luschgy, The Quantization of the Cantor Distribution 117
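$$\int x^2\,dP(x) = \frac19\int x^2\,dP(x)+\frac29E(P)+\frac29 = \frac19\int x^2\,dP(x)+\frac13,$$
hence $\int x^2\,dP(x) = \frac38$.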
(iii) Using (i) and (ii) we deduce
$$V(P) = \int x^2\,dP(x)-E(P)^2 = \frac38-\frac14 = \frac18.$$
(iv) Since $\int(x-x_0)^2\,dP(x) = V(P)+(E(P)-x_0)^2$, the last equation in the lemma holds. □
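A small exact cross-check of Lemma 3.4, assuming $S_1(x) = x/3$ and $S_2(x) = x/3+2/3$: writing $M_m = \int x^m\,dP(x)$, the self-similarity of $P$ gives one linear equation for each $M_m$ in terms of the lower moments, which can be solved in exact rational arithmetic. This is a sketch for illustration; the function names are our own.

```python
from fractions import Fraction
from math import comb

def cantor_moments(m_max):
    """Exact moments M_m = integral of x**m dP(x), m = 0, ..., m_max.

    Assuming S_1(x) = x/3 and S_2(x) = x/3 + 2/3, self-similarity gives
        M_m = 3**-m * M_m + (1/2) * 3**-m * sum_{j<m} C(m, j) * 2**(m-j) * M_j,
    which is solved for M_m below.
    """
    M = [Fraction(1)]
    for m in range(1, m_max + 1):
        rhs = sum(comb(m, j) * 2 ** (m - j) * M[j] for j in range(m))
        M.append(rhs / (2 * (3 ** m - 1)))
    return M

def second_moment_about(x0, M):
    """integral of (x - x0)**2 dP(x) = M_2 - 2*x0*M_1 + x0**2."""
    return M[2] - 2 * x0 * M[1] + x0 ** 2

M = cantor_moments(3)
print(M[1], M[2], M[2] - M[1] ** 2)  # 1/2 3/8 1/8
```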
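Definition 3.5. For $n\in\mathbb{N}$ with $n\ge1$ let $\ell(n)$ be the unique natural number with $2^{\ell(n)}\le n<2^{\ell(n)+1}$. For $I\subset\{1,2\}^{\ell(n)}$ with $\operatorname{card}(I) = n-2^{\ell(n)}$ let $\alpha_n(I)$ be the set consisting of all midpoints $a_\sigma$ of intervals $J_\sigma$ with $\sigma\in\{1,2\}^{\ell(n)}\setminus I$ and of all midpoints $a_{\sigma1}, a_{\sigma2}$ of the children of $J_\sigma$ with $\sigma\in I$. Formally,
$$\alpha_n(I) = \{a_\sigma:\sigma\in\{1,2\}^{\ell(n)}\setminus I\}\cup\{a_{\sigma1},a_{\sigma2}:\sigma\in I\}.$$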
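The following sketch builds $\alpha_n(I)$ literally from Definition 3.5 with exact fractions. It assumes that $J_\sigma = S_{\sigma_1}\circ\cdots\circ S_{\sigma_k}([0,1])$ with $S_1(x) = x/3$ and $S_2(x) = x/3+2/3$, so that appending a letter to $\sigma$ yields a child of $J_\sigma$, and it encodes words $\sigma$ as tuples. The names `midpoint`, `ell` and `alpha` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def midpoint(sigma):
    """Midpoint a_sigma of J_sigma = S_{sigma_1} o ... o S_{sigma_k}([0, 1]).

    Assumes S_1(x) = x/3 and S_2(x) = x/3 + 2/3, so a letter sigma_i = 2
    shifts the interval by 2/3**i.
    """
    left = sum(Fraction(2 * (s - 1), 3 ** (i + 1)) for i, s in enumerate(sigma))
    return left + Fraction(1, 2 * 3 ** len(sigma))

def ell(n):
    """The unique l(n) with 2**l(n) <= n < 2**(l(n) + 1)."""
    return n.bit_length() - 1

def alpha(n, I):
    """alpha_n(I): midpoints of J_sigma for sigma not in I, children's midpoints for sigma in I."""
    l = ell(n)
    I = set(I)
    if len(I) != n - 2 ** l or any(len(s) != l for s in I):
        raise ValueError("I must contain n - 2**l(n) words of length l(n)")
    points = []
    for sigma in product((1, 2), repeat=l):
        if sigma in I:
            points += [midpoint(sigma + (1,)), midpoint(sigma + (2,))]
        else:
            points.append(midpoint(sigma))
    return sorted(points)

print(alpha(3, [(1,)]))  # [Fraction(1, 18), Fraction(5, 18), Fraction(5, 6)]
```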
Proposition 3.7.
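Proof. Obviously we have
It follows from Corollary 3.3 and Lemma 3.4 that, for every $k\in\mathbb{N}$ and every $\tau\in\{1,2\}^k$,
$$\int_{J_\tau}(x-a_\tau)^2\,dP(x) = \frac{1}{2^k}\int\big(S_\tau(x)-a_\tau\big)^2\,dP(x) = \frac{1}{2^k}\cdot\frac{1}{9^k}\int\Big(x-\frac12\Big)^2\,dP(x) = \frac18\cdot\frac{1}{18^k}.$$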
118 Math. Nachr. 183 (1997)
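Thus we deduce
$$\int\min_{a\in\alpha_n(I)}(x-a)^2\,dP(x) = \frac18\cdot\frac{\operatorname{card}(\{1,2\}^{\ell(n)}\setminus I)}{18^{\ell(n)}}+\frac{1}{72}\cdot\frac{\operatorname{card}(I)}{18^{\ell(n)}}.$$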
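To check this formula independently of the argument above, one can compute $\int\min_{a\in\alpha}(x-a)^2\,dP(x)$ exactly for a finite set $\alpha$: refine the intervals $J_\sigma$ until each lies in a single Voronoi cell, and then apply Lemma 3.4 to the rescaled copy of $P$ living on $J_\sigma$. The sketch assumes $S_1(x) = x/3$, $S_2(x) = x/3+2/3$ and $J_\sigma = S_{\sigma_1}\circ\cdots\circ S_{\sigma_k}([0,1])$, and it stops with an error at a fixed depth in case a cell boundary falls on the Cantor set. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def interval(sigma):
    """Endpoints of J_sigma, assuming S_1(x) = x/3 and S_2(x) = x/3 + 2/3."""
    left = sum(Fraction(2 * (s - 1), 3 ** (i + 1)) for i, s in enumerate(sigma))
    return left, left + Fraction(1, 3 ** len(sigma))

def alpha(n, I):
    """alpha_n(I) from Definition 3.5 (I: words of length l(n), card(I) = n - 2**l(n))."""
    l = n.bit_length() - 1
    points = []
    for sigma in product((1, 2), repeat=l):
        for tau in ([sigma + (1,), sigma + (2,)] if sigma in I else [sigma]):
            a, b = interval(tau)
            points.append((a + b) / 2)
    return points

def distortion(points, sigma=(), max_depth=40):
    """Exact integral over J_sigma of min_{a in points} (x - a)**2 dP(x).

    J_sigma is refined until one point a is a nearest point at both endpoints;
    then J_sigma lies in the (convex) Voronoi cell of a, and Lemma 3.4 on the
    rescaled copy of P gives 2**-k * ((a - m)**2 + 9**-k / 8), m the midpoint.
    """
    left, right = interval(sigma)
    k, mid = len(sigma), (left + right) / 2
    d2 = lambda x: min((x - p) ** 2 for p in points)
    a = min(points, key=lambda p: (mid - p) ** 2)
    if (left - a) ** 2 == d2(left) and (right - a) ** 2 == d2(right):
        return Fraction(1, 2 ** k) * ((a - mid) ** 2 + Fraction(1, 8 * 9 ** k))
    if k >= max_depth:
        raise ValueError("a Voronoi boundary lies too close to the Cantor set")
    return sum(distortion(points, sigma + (c,), max_depth) for c in (1, 2))

def closed_form(n, I):
    """(1/8) * card({1,2}^l minus I) / 18**l + (1/72) * card(I) / 18**l."""
    l = n.bit_length() - 1
    return Fraction(2 ** l - len(I), 8 * 18 ** l) + Fraction(len(I), 72 * 18 ** l)

print(distortion(alpha(3, [(1,)])), closed_form(3, [(1,)]))  # 5/648 5/648
```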
Remark 3.9. For integers $n = 2^\ell$ the upper bound takes the form $\frac18\big(\frac19\big)^\ell$ and coincides with the upper bound given by BUCKLEW and WISE [2], p. 243.
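A quick exact check of Remark 3.9, computing the error of $\alpha_n(I)$ from Proposition 3.7 (the helper name is ours): for $n = 2^\ell$ we have $I = \emptyset$, and the value should reduce to $\frac18\big(\frac19\big)^\ell$.

```python
from fractions import Fraction

def alpha_error(n):
    """Error of alpha_n(I) by Proposition 3.7, with k = card(I) = n - 2**l:
    (1/8) * (2**l - k) / 18**l + (1/72) * k / 18**l."""
    l = n.bit_length() - 1
    k = n - 2 ** l
    return Fraction(2 ** l - k, 8 * 18 ** l) + Fraction(k, 72 * 18 ** l)

for l in range(4):
    # for n = 2**l the set I is empty; compare with (1/8) * (1/9)**l
    print(2 ** l, alpha_error(2 ** l), Fraction(1, 8) * Fraction(1, 9) ** l)
```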
Lemma 4.1. Let $n\ge2$, let $\alpha = \{a_1,\dots,a_n\}$ be an optimal set of $n$-means for $P$, and assume w.l.o.g. $a_1<a_2<\dots<a_n$. Then
Proof. Assume 1 / 3 5 a l . Then Corollary 3.8, Theorem 3.1 and Lemma 3.4 imply
-!-
18
/(z - 3al)'dP(z)
Thus we get a contradiction, and $a_1<1/3$ is proved. In a similar fashion one can show that $a_n>2/3$. By the preceding considerations we get
we therefore deduce
By moving $a_{j+1}$ to $1/3$ and $a_{j+2}$ to $2/3$ we could strictly reduce the quantization error, which contradicts the optimality of $\alpha$.
Since $k\ne j$, we have $k = j+1$ or $k = j+2$. In any case (i) is proved.
Now suppose that $1/3<a_{j+1}<2/3$. Since $P(M(a_{j+1}))>0$, we know that $\frac12(a_j+a_{j+1})<1/3$ or $\frac12(a_{j+1}+a_{j+2})>2/3$. Assume that only one of these inequalities holds, say the first one. Then the quantization error can be strictly reduced by moving $a_{j+1}$ to $1/3$, which contradicts the optimality of $\alpha$. The second case can be handled similarly. Thus statement (ii) follows. □
For the following considerations we will need two more technical lemmas.
(i) $\displaystyle\int_{[0,\,1-(1/3)^n]}x\,dP(x) = 1-\Big(\frac12\Big)^n-\frac12\Big(1-\Big(\frac16\Big)^n\Big)$
and
(ii) $F\big(\frac{t}{3}\big) = \frac12F(t)$ for all $t\in[0,1]$.
F(t + h) - F ( t )
and
$$P\Big(\Big[0,\,1-\frac{1}{3^n}\Big]\Big) = 1-\frac{1}{2^n}$$
[-
J +4
1 $7 ? 1 - &r
2 dP(x) .
(4.3)
$$s_k = \frac{1}{2^k}\int\Big(1-\frac{1}{3^{k-1}}+\frac{x}{3^k}\Big)\,dP(x). \qquad (4.5)$$
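This implies
$$\int_{[0,\,1-1/3^n]}x\,dP(x) = \sum_{k=1}^{n}s_k$$
$$= \sum_{k=1}^{n}\Big(\frac12\Big)^k-\frac52\sum_{k=1}^{n}\Big(\frac16\Big)^k = \frac12\cdot\frac{1-(1/2)^n}{1-1/2}-\frac{5}{12}\cdot\frac{1-(1/6)^n}{1-1/6} = 1-\Big(\frac12\Big)^n-\frac12\Big(1-\Big(\frac16\Big)^n\Big). \qquad (4.9)$$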
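The identity just derived can be confirmed by brute force. Since $[0,1-3^{-n}]$ covers every level-$n$ interval except $J_{2\cdots2}$ (up to a single point of measure zero), and $P$ restricted to $J_\sigma$ is a rescaled copy of $P$ whose mean is the midpoint of $J_\sigma$, the integral is a finite sum. The sketch assumes $S_1(x) = x/3$, $S_2(x) = x/3+2/3$ and uses our own helper names; the summands $s_k$ are taken in the simplified form $(\frac12)^k-\frac52(\frac16)^k$ appearing in (4.9).

```python
from fractions import Fraction
from itertools import product

def midpoint(sigma):
    """Midpoint of J_sigma, assuming S_1(x) = x/3 and S_2(x) = x/3 + 2/3."""
    left = sum(Fraction(2 * (s - 1), 3 ** (i + 1)) for i, s in enumerate(sigma))
    return left + Fraction(1, 2 * 3 ** len(sigma))

def first_moment_brute(n):
    """Integral of x dP over [0, 1 - 3**-n]: all level-n intervals except J_{2...2}.

    Each J_sigma carries mass 2**-n and P restricted to J_sigma has its mean
    at the midpoint of J_sigma (E(P) = 1/2, Lemma 3.4).
    """
    return sum(Fraction(1, 2 ** n) * midpoint(sigma)
               for sigma in product((1, 2), repeat=n) if sigma != (2,) * n)

def s(k):
    # k-th summand in (4.9): (1/2)**k - (5/2) * (1/6)**k
    return Fraction(1, 2) ** k - Fraction(5, 2) * Fraction(1, 6) ** k

def first_moment_closed(n):
    return 1 - Fraction(1, 2) ** n - Fraction(1, 2) * (1 - Fraction(1, 6) ** n)

print([str(first_moment_closed(n)) for n in range(1, 4)])
```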
(4.10)
> t.
For $1/18\le t<19/162$, Lemma 4.2 yields
1 1 1
= -
23 - -1 - -2 > t.
162 6 34
Thus the lemma is proved. □
Since $M(a_1)\cap[0,1] = [0,\frac12(a_1+a_2)]$, it follows from Theorem 2.6(iii) and Lemma 4.2 that
= P ([o,
1
-1) 1 zdP(z) = F (f (a1 + a2)) 2 F (a + f a1) = f (al).
[o,*]
- 5 2 2 5 - -2 -
> - x
6 34 3 - 6 34
Since $\alpha = \{a_1,a_2,a_3\}$ is optimal, we have (using (4.17), (4.18) and (4.19) for the last inequality)
+is1(z - g)' d P ( z )
We evaluate these integrals by Lemma 3.4 and obtain
(4.20)
v
3
2 & [(5- J + is; + (- f - a) + is; + - 5 ) '+ ;]
23 1 1 1 23 1
$$V_n = \frac{1}{18}\int\min_{a\in\alpha_1}(x-3a)^2\,dP(x)+\frac{1}{18}\int\min_{a\in\alpha_2}\big(x-(3a-2)\big)^2\,dP(x). \qquad (4.22)$$
If $3\alpha_1$ is not an optimal set of $j$-means, then we could find a set $\beta\subset\mathbb{R}$ with $\operatorname{card}(\beta) = j$ and $\int\min_{b\in\beta}(x-b)^2\,dP(x)<\int\min_{a\in\alpha_1}(x-3a)^2\,dP(x)$. But then $\big(\frac13\beta\big)\cup\alpha_2$ is a set of cardinality $n$ with $\int\min_{a\in(\frac13\beta)\cup\alpha_2}(x-a)^2\,dP(x)<\int\min_{a\in\alpha}(x-a)^2\,dP(x)$, which contradicts the optimality of $\alpha$. Similarly, $3\alpha_2-2$ can be shown to be an optimal set of $(n-j)$-means. Thus (4.22) implies $V_n = \frac{1}{18}(V_j+V_{n-j})$.
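Combining this identity with the observation that every split is admissible (a $\frac13$-scaled optimal $j$-set in $[0,\frac13]$ together with a shifted optimal $(n-j)$-set in $[\frac23,1]$ has error at most $\frac1{18}(V_j+V_{n-j})$) suggests computing $V_n$ by dynamic programming. The sketch assumes, as Proposition 5.1 together with this lemma indicates, that for $n\ge2$ some optimal set has no point in $]\frac13,\frac23[$, so that the minimum over $j$ is attained; the function name is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def V(n):
    """V_n via V_1 = 1/8 and V_n = (1/18) * min_j (V_j + V_{n-j}) for n >= 2.

    Relies on the (assumed) fact that for n >= 2 some optimal set splits into
    j points in [0, 1/3] and n - j points in [2/3, 1]; every split is
    feasible, so the minimum over j gives V_n.
    """
    if n == 1:
        return Fraction(1, 8)
    return min(V(j) + V(n - j) for j in range(1, n)) / 18

print([str(V(n)) for n in range(1, 5)])  # ['1/8', '1/72', '5/648', '1/648']
```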
Proposition 4.6. (i) The only optimal set of 1-means is $\{\frac12\}$. Moreover, $V_1 = \frac18$.
(ii) The only optimal set of 2-means consists of the midpoints of $[0,\frac13]$ and $[\frac23,1]$. Moreover, $V_2 = \frac19V_1$.
(iii) There are two optimal sets of 3-means. One consists of the midpoints of $[0,\frac13]$, $[\frac23,\frac79]$ and $[\frac89,1]$, the other of the midpoints of $[0,\frac19]$, $[\frac29,\frac13]$ and $[\frac23,1]$. Moreover, $V_3 = \frac{1}{18}(V_1+V_2)$.
Proof. (i) is true since $1/2$ is the expected value of $P$ (Lemma 3.4).
(ii) An optimal set of 2-means does not contain a point from $]\frac13,\frac23[$ (Lemma 4.1). Thus Lemma 4.5 together with (i) yields that the set of midpoints of $[0,\frac13]$ and $[\frac23,1]$ is the only candidate for an optimal set of 2-means. By the existence result for $n$-means (Proposition 2.1) it is, therefore, the only optimal set of 2-means. Lemma 4.5 also implies $V_2 = \frac{1}{18}(V_1+V_1) = \frac19V_1$.
(iii) By Proposition 4.4 an optimal set $\alpha$ of 3-means does not contain a point from $]\frac13,\frac23[$. Set $\alpha_1 = \alpha\cap[0,\frac13]$ and $\alpha_2 = \alpha\cap[\frac23,1]$. If $\operatorname{card}(\alpha_1) = 2$, then $3\alpha_1$ is an optimal set of 2-means (Lemma 4.5). Hence (ii) implies that $\alpha_1$ consists of the midpoints of $[0,\frac19]$ and $[\frac29,\frac13]$. Similarly, $\alpha_2$ consists of the midpoint of $[\frac23,1]$. If $\operatorname{card}(\alpha_1) = 1$, then we get the set of midpoints of $[0,\frac13]$, $[\frac23,\frac79]$ and $[\frac89,1]$ for $\alpha$. By Lemma 4.5 there are no other possibilities for an optimal set of 3-means, and $V_3 = \frac{1}{18}(V_1+V_2)$. Using the last equality and Proposition 3.7 we see that the two sets are indeed optimal sets of 3-means. □
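Proposition 4.6 can be spot-checked with an exact distortion routine: refine the intervals $J_\sigma$ until each lies in one Voronoi cell, then apply Lemma 3.4 to the rescaled copy of $P$ on that interval. This is a sketch assuming $S_1(x) = x/3$ and $S_2(x) = x/3+2/3$; it confirms the values of $V_1, V_2, V_3$ for the stated sets and shows that the symmetric set $\{\frac16,\frac12,\frac56\}$ does worse, but it does not by itself prove optimality.

```python
from fractions import Fraction as F

def interval(sigma):
    """Endpoints of J_sigma, assuming S_1(x) = x/3 and S_2(x) = x/3 + 2/3."""
    left = sum(F(2 * (s - 1), 3 ** (i + 1)) for i, s in enumerate(sigma))
    return left, left + F(1, 3 ** len(sigma))

def distortion(points, sigma=(), max_depth=40):
    """Exact integral over J_sigma of min_{a in points} (x - a)**2 dP(x).

    J_sigma is refined until one point a is a nearest point at both endpoints;
    then J_sigma lies in the (convex) Voronoi cell of a, and Lemma 3.4 on the
    rescaled copy of P gives 2**-k * ((a - m)**2 + 9**-k / 8).
    """
    left, right = interval(sigma)
    k, mid = len(sigma), (left + right) / 2
    d2 = lambda x: min((x - p) ** 2 for p in points)
    a = min(points, key=lambda p: (mid - p) ** 2)
    if (left - a) ** 2 == d2(left) and (right - a) ** 2 == d2(right):
        return F(1, 2 ** k) * ((a - mid) ** 2 + F(1, 8 * 9 ** k))
    if k >= max_depth:
        raise ValueError("a Voronoi boundary lies too close to the Cantor set")
    return sum(distortion(points, sigma + (c,), max_depth) for c in (1, 2))

V1 = distortion([F(1, 2)])
V2 = distortion([F(1, 6), F(5, 6)])
first = [F(1, 6), F(13, 18), F(17, 18)]   # midpoints of [0,1/3], [2/3,7/9], [8/9,1]
second = [F(1, 18), F(5, 18), F(5, 6)]    # midpoints of [0,1/9], [2/9,1/3], [2/3,1]
print(V1, V2, distortion(first), distortion(second))  # 1/8 1/72 5/648 5/648
```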
Proof. For $n<4$ this follows from Lemma 4.1 and Proposition 4.4. For $n\ge4$ let $j$ be as in Lemma 4.1. Assume $a_{j+1}\in\,]\frac13,\frac23[$.
Case 1: $a_{j+1}\ge1/2$.
Then
(5.3) v, 1 J (x- $) d P ( x ) .
2
[#41
Applying Corollary 3.3 to the last integral we get
1 9
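Theorem 5.2. For $n\in\mathbb{N}$ with $n\ge1$ let $\ell(n)\in\mathbb{N}$ satisfy $2^{\ell(n)}\le n<2^{\ell(n)+1}$. A set $\alpha\subset\mathbb{R}$ is an optimal set of $n$-means if and only if there exists a subset $I\subset\{1,2\}^{\ell(n)}$ with $\operatorname{card}(I) = n-2^{\ell(n)}$ such that $\alpha$ consists of all midpoints of intervals $J_\sigma$ with $\sigma\in\{1,2\}^{\ell(n)}\setminus I$ and all midpoints of the children of $J_\sigma$ with $\sigma\in I$ (i.e., $\alpha = \alpha_n(I)$). Moreover,
$$V_n = \frac{1}{18^{\ell(n)}}\cdot\frac18\cdot\Big(2^{\ell(n)+1}-n+\frac19\big(n-2^{\ell(n)}\big)\Big).$$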
Proof. We proceed by induction on $n$. For $n<4$ the statement of the theorem is proved in Proposition 4.6.
Suppose that the assertion of the theorem holds for all $m<n$, $n\ge2$. According to Proposition 5.1 and Lemma 4.5 there exists a $j\in\{1,\dots,n-1\}$ with
$$V_n = \frac{1}{18}\big(V_j+V_{n-j}\big). \qquad (5.4)$$
Without loss of generality we may assume $j\ge n-j$.
We will show that
In particular we have
$$k\ge1. \qquad (5.8)$$
Using Corollary 3.8 and the induction hypothesis, (5.4) implies
(5.9)
(5.13)
and
$\alpha = \alpha_n(I)$.
That every set a as described in the statement of the theorem is an optimal set of
n-means follows from (5.13) and Proposition 3.7. Thus Theorem 5.2 is proved.
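As a consistency check between Theorem 5.2 and the recursion (5.4), the sketch below compares the error of $\alpha_n(I)$ from Proposition 3.7 with the value produced by $V_1 = \frac18$ and $V_n = \frac1{18}\min_j(V_j+V_{n-j})$. It also counts the optimal sets, one per admissible $I$, i.e. $\binom{2^{\ell(n)}}{n-2^{\ell(n)}}$. Helper names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def V_rec(n):
    """V_1 = 1/8 and V_n = (1/18) * min_j (V_j + V_{n-j}), cf. (5.4)."""
    if n == 1:
        return Fraction(1, 8)
    return min(V_rec(j) + V_rec(n - j) for j in range(1, n)) / 18

def V_closed(n):
    """Error of alpha_n(I) (Proposition 3.7): 18**-l * (1/8) * (2**(l+1) - n + (n - 2**l)/9)."""
    l = n.bit_length() - 1
    return Fraction(1, 8 * 18 ** l) * (2 ** (l + 1) - n + Fraction(n - 2 ** l, 9))

def number_of_optimal_sets(n):
    """One optimal set per admissible I: C(2**l, n - 2**l)."""
    l = n.bit_length() - 1
    return comb(2 ** l, n - 2 ** l)

print([number_of_optimal_sets(n) for n in range(1, 9)])  # [1, 1, 2, 1, 4, 6, 4, 1]
```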
$$8+4\beta\le12<17$$
and
$$17\le2(8+4\beta).$$
(ii) A direct calculation shows $f(1) = \frac18 = f(2)$. Now
$$f'(x) = \frac{1}{36\beta}\,x^{2/\beta-1}\big(17-(8+4\beta)x\big).$$
Thus
$$f'(1) = \frac{1}{36\beta}\big(17-(8+4\beta)\big) = \frac{1}{36\beta}(9-4\beta)>0$$
and
$$f'(2) = \frac92\cdot\frac{1}{36\beta}\big(17-2(8+4\beta)\big) = \frac{153-144-72\beta}{72\beta}<0.$$
Moreover, for $x\in[1,2]$,
$$f'(x) = 0\iff x = \frac{17}{8+4\beta}.$$
This implies that $f$ has its maximum on $[1,2]$ at the point $17/(8+4\beta)$ and that $f([1,2])$ is as described in (ii). □
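A numerical sanity check of the lemma. We assume here, consistently with the computations above, that $\beta = \log2/\log3$ and $f(x) = x^{2/\beta}(17-8x)/72$ on $[1,2]$ (the value of $n^{2/\beta}V_n$ for $n = x\,2^\ell$ obtained from the closed form of $V_n$); both identifications are our reading of the excerpt rather than quoted definitions.

```python
import math

BETA = math.log(2) / math.log(3)   # assumed: beta = log 2 / log 3

def f(x):
    """Assumed form f(x) = x**(2/beta) * (17 - 8x) / 72 on [1, 2]."""
    return x ** (2 / BETA) * (17 - 8 * x) / 72

def f_prime(x):
    """f'(x) = x**(2/beta - 1) * (17 - (8 + 4*beta) x) / (36 beta)."""
    return x ** (2 / BETA - 1) * (17 - (8 + 4 * BETA) * x) / (36 * BETA)

x_star = 17 / (8 + 4 * BETA)
print(f(1), f(2), x_star, f(x_star))
```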
Theorem 6.3. The set of accumulation points of the sequence $(n^{2/\beta}V_n)_{n\in\mathbb{N}}$ equals $\big[\frac18, f\big(\frac{17}{8+4\beta}\big)\big]$.
1 8 1 7 8
= 82' (T - p)
Now
$$x_\ell2^\ell\le x2^\ell<x_\ell2^\ell+1,$$
hence
$$x-2^{-\ell}<x_\ell\le x$$
and, therefore, $\lim_{\ell\to\infty}x_\ell = x$. Since $f$ is continuous, this implies
$$\lim_{\ell\to\infty}n_\ell^{2/\beta}V_{n_\ell} = f(x) = y.$$
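Now let $y$ be an accumulation point of the sequence $(n^{2/\beta}V_n)_{n\in\mathbb{N}}$. Then there exists a subsequence $(n_k^{2/\beta}V_{n_k})_{k\in\mathbb{N}}$ with
$$y = \lim_{k\to\infty}n_k^{2/\beta}V_{n_k} = \lim_{l\to\infty}f(x_{k_l}) = f\Big(\lim_{l\to\infty}x_{k_l}\Big)\in\Big[\frac18, f\Big(\frac{17}{8+4\beta}\Big)\Big],$$
where $x_k = n_k/2^{\ell(n_k)}\in[1,2)$ and $(x_{k_l})_{l\in\mathbb{N}}$ is a convergent subsequence of $(x_k)_{k\in\mathbb{N}}$. □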
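The argument rests on the identity $n^{2/\beta}V_n = f(n/2^{\ell(n)})$ (our reading of the computation preceding it), so along each dyadic block $2^\ell\le n<2^{\ell+1}$ the sequence runs through $f([1,2))$ on a grid of mesh $2^{-\ell}$. The sketch below checks this numerically, assuming $\beta = \log2/\log3$, $f(x) = x^{2/\beta}(17-8x)/72$, and the error of $\alpha_n(I)$ from Proposition 3.7 rewritten as $V_n = (17\cdot2^\ell-8n)/(72\cdot18^\ell)$.

```python
import math
from fractions import Fraction

BETA = math.log(2) / math.log(3)   # assumed: beta = log 2 / log 3

def V(n):
    """Closed form V_n = (17*2**l - 8n) / (72*18**l) with 2**l <= n < 2**(l+1)."""
    l = n.bit_length() - 1
    return Fraction(17 * 2 ** l - 8 * n, 72 * 18 ** l)

def f(x):
    return x ** (2 / BETA) * (17 - 8 * x) / 72

def scaled(n):
    # n**(2/beta) * V_n; should equal f(x) with x = n / 2**l in [1, 2)
    return n ** (2 / BETA) * float(V(n))

x_star = 17 / (8 + 4 * BETA)
block = [scaled(n) for n in range(2 ** 12, 2 ** 13)]
print(min(block), max(block), f(x_star))
```

Within one block the values already fill $[\frac18, f(x^*)]$ densely, which is the mechanism behind the accumulation set.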
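$$\frac{1}{72}\,n^{-2/\beta}\le V_n\le\frac98\,n^{-2/\beta},$$
hence
$$\frac12\log\frac{1}{72}-\frac1\beta\log n\le\log V_n^{1/2}\le\frac12\log\frac98-\frac1\beta\log n.$$
This implies, for $n\ge2$,
$$\frac{\log n}{\frac12\log72+\frac1\beta\log n}\le\frac{\log n}{-\log V_n^{1/2}}\le\frac{\log n}{\frac12\log\frac89+\frac1\beta\log n}.$$
Since
$$\lim_{n\to\infty}\frac{\log n}{\frac12\log72+\frac1\beta\log n} = \beta$$
and
$$\lim_{n\to\infty}\frac{\log n}{\frac12\log\frac89+\frac1\beta\log n} = \beta,$$
we get
$$\lim_{n\to\infty}\frac{\log n}{-\log V_n^{1/2}} = \beta.$$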
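Finally, a sketch of the limit $\log n/(-\log V_n^{1/2})\to\beta$, using the error of $\alpha_n(I)$ from Proposition 3.7 in the form $V_n = (17\cdot2^\ell-8n)/(72\cdot18^\ell)$ and assuming $\beta = \log2/\log3$. It works in log space so that very large $n$ can be used; convergence is slow, with an error of order $1/\log n$.

```python
import math

BETA = math.log(2) / math.log(3)   # assumed: beta = log 2 / log 3

def log_V(n):
    """log V_n from V_n = (17*2**l - 8n) / (72*18**l), 2**l <= n < 2**(l+1).

    math.log accepts arbitrarily large Python ints, so n may be huge.
    """
    l = n.bit_length() - 1
    return math.log(17 * 2 ** l - 8 * n) - math.log(72) - l * math.log(18)

def ratio(n):
    # log n / (-log V_n**(1/2)), which tends to beta
    return math.log(n) / (-0.5 * log_V(n))

for e in (10, 100, 1000):
    print(e, ratio(2 ** e + 1), BETA)
```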
Acknowledgements
We are indebted to the referees for valuable suggestions concerning an improvement of the
first version of this paper.
References
[1] ABUT, H. (Ed.): Vector Quantization. IEEE Press, New York 1990
[2] BUCKLEW, J. A., and WISE, G. L.: Multidimensional Asymptotic Quantization with r-th Power Distortion Measures. IEEE Trans. Inform. Theory 28 (1982), 239-247
[3] GRAF, S., and LUSCHGY, H.: Foundations of Quantization for Random Vectors. Preprint No. 16, Angew. Mathematik und Informatik, Universität Münster 1994
[4] GRAF, S., and LUSCHGY, H.: Consistent Estimation in the Quantization Problem for Random Vectors. In: Transactions of the 12th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Acad. Sci. Czech Republic & Charles Univ. Prague, Prague 1994, pp. 84-87
[5] HUTCHINSON, J.: Fractals and Self-Similarity. Indiana Univ. Math. J. 30 (1981), 713-747
[6] POLLARD, D.: Quantization and the Method of k-means. IEEE Trans. Inform. Theory 28 (1982), 199-205
[7] ZADOR, P. L.: Asymptotic Quantization Error of Continuous Signals and the Quantization Dimension. IEEE Trans. Inform. Theory 28 (1982), 139-149