
EE/Ma 126b Information Theory - Homework Set #4

Ling Li, ling@cs.caltech.edu

January 31, 2001

4.1 One bit quantization of a single Gaussian random variable.∗ Let the boundary be $x$. We
use one symbol to represent all $t \le x$ and another symbol for $t > x$. Since the squared-error
measure is used, the two conditional expectations should be the reproduction points.$^1$ That is,
$x_0 = \int_{-\infty}^{x} \frac{f(t)}{1/2 + A(x)}\, t\, dt$ for $t \le x$ and $x_1 = \int_{x}^{\infty} \frac{f(t)}{1/2 - A(x)}\, t\, dt$ for $t > x$, where
$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-t^2/2\sigma^2}$ is the probability density function and $A(x) = \int_0^x f(t)\, dt$. Thus the
distortion $D$ is the weighted sum of the two conditional variances:
\begin{align}
D(x) &= \big(1/2 + A(x)\big) \left[ \int_{-\infty}^{x} \frac{f(t)}{1/2 + A(x)}\, t^2\, dt - \left( \int_{-\infty}^{x} \frac{f(t)}{1/2 + A(x)}\, t\, dt \right)^{\!2} \right] \notag \\
&\quad + \big(1/2 - A(x)\big) \left[ \int_{x}^{\infty} \frac{f(t)}{1/2 - A(x)}\, t^2\, dt - \left( \int_{x}^{\infty} \frac{f(t)}{1/2 - A(x)}\, t\, dt \right)^{\!2} \right] \notag \\
&= \int_{-\infty}^{\infty} f(t)\, t^2\, dt - \frac{\sigma^4 f^2(x)}{1/2 + A(x)} - \frac{\sigma^4 f^2(x)}{1/2 - A(x)} \notag \\
&= \sigma^2 - \frac{4\sigma^4 f^2(x)}{1 - 4A^2(x)} = \sigma^2 - \frac{2\sigma^2 e^{-x^2/\sigma^2}}{\pi\big(1 - 4A^2(x)\big)}. \tag{1}
\end{align}
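As a sanity check on (1), here is a small numerical sketch (not part of the original solution; the value of $\sigma$ and the test boundaries are arbitrary) that compares the closed form against direct numerical integration of the two conditional variances:
\begin{verbatim}
import numpy as np
from scipy import integrate
from scipy.stats import norm

sigma = 1.3                                    # arbitrary standard deviation for the check

def f(t):
    return norm.pdf(t, scale=sigma)            # Gaussian density f(t)

def A(x):
    return integrate.quad(f, 0.0, x)[0]        # A(x) = int_0^x f(t) dt

def D_direct(x):
    """Distortion from the definition: weighted sum of the two conditional variances."""
    p0, p1 = 0.5 + A(x), 0.5 - A(x)            # masses of (-inf, x] and (x, inf)
    m0 = integrate.quad(lambda t: t * f(t), -np.inf, x)[0] / p0   # conditional means
    m1 = integrate.quad(lambda t: t * f(t), x, np.inf)[0] / p1
    v0 = integrate.quad(lambda t: t**2 * f(t), -np.inf, x)[0] / p0 - m0**2
    v1 = integrate.quad(lambda t: t**2 * f(t), x, np.inf)[0] / p1 - m1**2
    return p0 * v0 + p1 * v1

def D_formula(x):
    """Closed form (1): sigma^2 - 2 sigma^2 exp(-x^2/sigma^2) / (pi (1 - 4 A(x)^2))."""
    return sigma**2 - 2 * sigma**2 * np.exp(-x**2 / sigma**2) / (np.pi * (1 - 4 * A(x)**2))

for x in [0.0, 0.5, 1.0, 2.0]:
    print(f"x = {x:4.1f}   direct = {D_direct(x):.6f}   formula (1) = {D_formula(x):.6f}")
# The two columns agree, and the smallest value occurs at x = 0.
print("(pi - 2)/pi * sigma^2 =", (np.pi - 2) / np.pi * sigma**2)
\end{verbatim}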

Now we want to prove that $e^{-x^2/\sigma^2} \le 1 - 4A^2(x)$. Without loss of generality, assume $x \ge 0$.
\begin{align}
4A^2(x) &= \left( \int_{-x}^{x} f(t)\, dt \right)^{\!2} = \int_{-x}^{x}\int_{-x}^{x} f(u) f(v)\, du\, dv \notag \\
&\le \frac{1}{2\pi\sigma^2} \int_{-\sqrt{2}x}^{\sqrt{2}x} \left( \int_{-\sqrt{2x^2 - u^2}}^{\sqrt{2x^2 - u^2}} e^{-\frac{u^2 + v^2}{2\sigma^2}}\, dv \right) du \tag{2} \\
&= \frac{1}{2\pi\sigma^2} \int_0^{2\pi} d\theta \int_0^{\sqrt{2}x} e^{-\frac{r^2}{2\sigma^2}}\, r\, dr = 1 - e^{-x^2/\sigma^2}. \notag
\end{align}
The reason for (2) is that $f > 0$ and the square $[-x, x] \times [-x, x]$ lies inside the circle with
center $(0, 0)$ and radius $\sqrt{2}x$. So $e^{-x^2/\sigma^2} \le 1 - 4A^2(x)$, with equality iff $x = 0$. Then from (1),
$D(x) \ge \frac{\pi - 2}{\pi}\sigma^2$, with equality iff $x = 0$. So the minimum distortion is $\frac{\pi - 2}{\pi}\sigma^2$, achieved with $x = 0$
and $x_1 = 2\int_0^{\infty} f(t)\, t\, dt = \sqrt{\frac{2}{\pi}}\,\sigma$, $x_0 = -\sqrt{\frac{2}{\pi}}\,\sigma$.
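A quick Monte Carlo check of this conclusion (illustrative only; the value of $\sigma$, the sample size, and the seed are arbitrary choices): quantizing a zero-mean Gaussian with boundary $0$ and reproduction levels $\pm\sqrt{2/\pi}\,\sigma$ gives a mean squared error close to $\frac{\pi - 2}{\pi}\sigma^2$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                                   # arbitrary standard deviation
x = rng.normal(0.0, sigma, size=1_000_000)    # Gaussian source samples

level = np.sqrt(2.0 / np.pi) * sigma          # reproduction points +/- sqrt(2/pi) * sigma
x_hat = np.where(x > 0.0, level, -level)      # one-bit quantizer with boundary at 0

print("empirical MSE        :", np.mean((x - x_hat) ** 2))
print("(pi - 2)/pi * sigma^2:", (np.pi - 2) / np.pi * sigma**2)
\end{verbatim}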

$^1$Prof. Malik Magdon-Ismail gave me the idea of first setting the boundary, instead of setting the two reproduction
points. The latter approach cost me more than 12 hours.

4.2 Rate distortion for uniform source with Hamming distortion. The distortion with the Hamming
measure is $\bar{D} = E d(X, \hat{X}) = \Pr\{d(X, \hat{X}) = 1\}$. We have
\begin{align*}
I(X; \hat{X}) &= H(X) - H(X|\hat{X}) && \text{(definition of $I(X; \hat{X})$)} \\
&= H(X) - H(X - \hat{X}|\hat{X}) && \text{(previous homework)} \\
&= H(X) - H(X - \hat{X}, d(X, \hat{X})|\hat{X}) && \text{($d(X, \hat{X})$ is a function of $X - \hat{X}$)} \\
&= H(X) - H(d(X, \hat{X})|\hat{X}) - H(X - \hat{X}|d(X, \hat{X}), \hat{X}) && \text{(chain rule of entropy)} \\
&\ge H(X) - H(d(X, \hat{X})) - H(X - \hat{X}|d(X, \hat{X}), \hat{X}) && \text{(conditioning reduces entropy)} \\
&= H(X) - H(\bar{D}) - H(X - \hat{X}|\hat{X}, d(X, \hat{X}) = 1)\,\bar{D} && \text{($\bar{D} = \Pr\{d(X, \hat{X}) = 1\}$; $d(X, \hat{X}) = 0 \Rightarrow X - \hat{X} = 0$)} \\
&\ge \log m - H(\bar{D}) - \bar{D}\log(m - 1). && \text{($p(X)$ is uniform; for given $\hat{X}$, $|\{x - \hat{X} : x \ne \hat{X}\}| = m - 1$)}
\end{align*}

Notice that
\[
\frac{d\,[\log m - H(\bar{D}) - \bar{D}\log(m - 1)]}{d\bar{D}} = \log\frac{\bar{D}}{1 - \bar{D}} - \log(m - 1),
\]
which is less than 0 when $0 < \bar{D} < 1 - \frac{1}{m}$. Thus when $D \le 1 - \frac{1}{m}$, we have
\[
R(D) = \min_{\bar{D} \le D} I(X; \hat{X}) \ge \log m - H(D) - D\log(m - 1).
\]
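Both facts can be checked numerically. The sketch below (my own check; the alphabet size $m$, the seed, and the random test channels are arbitrary) samples conditional distributions $p(\hat{x}|x)$ with $X$ uniform and verifies $I(X; \hat{X}) \ge \log m - H(\bar{D}) - \bar{D}\log(m - 1)$, and also confirms that this lower bound is decreasing in $\bar{D}$ on $(0, 1 - \frac{1}{m})$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
m = 5                                           # arbitrary alphabet size

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))              # entropy in bits

def lower_bound(Dbar):
    return np.log2(m) - H(np.array([Dbar, 1 - Dbar])) - Dbar * np.log2(m - 1)

# 1) I(X; Xhat) >= log m - H(Dbar) - Dbar log(m-1) for random test channels, X uniform
for _ in range(5):
    W = rng.random((m, m))
    W /= W.sum(axis=1, keepdims=True)           # random p(xhat | x)
    joint = W / m                                # p(x, xhat) with p(x) = 1/m
    Dbar = 1.0 - np.trace(joint)                 # Pr{X != Xhat}
    I = H(joint.sum(1)) + H(joint.sum(0)) - H(joint.ravel())
    print(f"I = {I:.4f}  >=  bound = {lower_bound(Dbar):.4f}")

# 2) the bound is decreasing on (0, 1 - 1/m), so min over Dbar <= D sits at Dbar = D
grid = np.linspace(1e-4, 1 - 1/m - 1e-4, 200)
vals = np.array([lower_bound(d) for d in grid])
print("bound decreasing on (0, 1 - 1/m):", bool(np.all(np.diff(vals) < 0)))
\end{verbatim}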

We can design distributions $p(X|\hat{X})$ and $p(\hat{X})$ to achieve the minimum $I(X; \hat{X})$. Let $p(\hat{X})$ be the
uniform distribution. For $0 \le D \le 1 - \frac{1}{m}$, set
\[
p(X|\hat{X}) = \begin{cases} 1 - D, & X = \hat{X}; \\ D/(m - 1), & X \ne \hat{X}. \end{cases}
\]
Thus
\begin{align*}
p(X = x) &= \sum_{\hat{x}} p(X = x|\hat{X} = \hat{x})\, p(\hat{X} = \hat{x}) \\
&= \frac{1}{m}\bigg[ p(X = x|\hat{X} = x) + \sum_{\hat{x} \ne x} p(X = x|\hat{X} = \hat{x}) \bigg] \\
&= \frac{1}{m}\bigg[ 1 - D + (m - 1) \times \frac{D}{m - 1} \bigg] = \frac{1}{m},
\end{align*}
and the distortion is
\[
\Pr\{X \ne \hat{X}\} = 1 - \Pr\{X = \hat{X}\} = 1 - \sum_{\hat{x}} p(X = \hat{x}|\hat{X} = \hat{x})\, p(\hat{X} = \hat{x}) = D.
\]
So this distribution meets the requirements on the distribution of $X$ and on the distortion. Now
\begin{align*}
I(X; \hat{X}) &= H(X) - H(X|\hat{X}) \\
&= H(X) - H\!\left(1 - D, \frac{D}{m - 1}, \ldots, \frac{D}{m - 1}\right) \\
&= H(X) + (1 - D)\log(1 - D) + (m - 1) \times \frac{D}{m - 1}\log\frac{D}{m - 1} \\
&= \log m - H(D) - D\log(m - 1).
\end{align*}

Thus we know $R(D) = \log m - H(D) - D\log(m - 1)$ for $0 \le D \le 1 - \frac{1}{m}$. When $D > 1 - \frac{1}{m}$,
we can send nothing and simply choose $\hat{X}$ at random. Thus the distortion is $\Pr\{X \ne \hat{X}\} =
\frac{m - 1}{m} < D$. So obviously $R(D) = 0$ when $D > 1 - \frac{1}{m}$.
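To double-check the achieving distribution, a short numerical sketch (the alphabet size $m$ and the target $D$ are arbitrary) builds the uniform $p(\hat{X})$ and the $p(X|\hat{X})$ above and recovers the uniform marginal, the distortion $D$, and $I(X; \hat{X}) = \log m - H(D) - D\log(m - 1)$.
\begin{verbatim}
import numpy as np

m, D = 6, 0.3                                   # arbitrary alphabet size and target distortion
assert D <= 1 - 1/m

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))              # entropy in bits

# p(x | xhat): 1-D on the diagonal, D/(m-1) off the diagonal; p(xhat) uniform
P_x_given_xhat = np.full((m, m), D / (m - 1))
np.fill_diagonal(P_x_given_xhat, 1 - D)
joint = P_x_given_xhat / m                      # joint[xhat, x] = p(xhat) p(x|xhat)

px = joint.sum(axis=0)                          # marginal of X: should be 1/m each
distortion = 1 - np.trace(joint)                # Pr{X != Xhat}: should equal D
I = H(px) + H(joint.sum(axis=1)) - H(joint.ravel())

print("p(X) =", px)
print("distortion =", distortion)
print("I(X;Xhat) =", I)
print("log m - H(D) - D log(m-1) =",
      np.log2(m) - H(np.array([D, 1 - D])) - D * np.log2(m - 1))
\end{verbatim}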

4.3 Erasure distortion. Let $\{0, E, 1\}$ denote the set $\hat{\mathcal{X}}$, where `E' stands for erasure. Since $d(0, 1) =
d(1, 0) = \infty$, we must have $p(0, 1) = p(1, 0) = 0$ for a finite distortion. Thus
\[
D = p(0, E) + p(1, E) = p_{\hat{X}}(E),
\]
and
\begin{align}
I(X; \hat{X}) &= H(X) - H(X|\hat{X}) \notag \\
&= 1 - H(X|\hat{X} = E)\, p_{\hat{X}}(E) \notag \\
&\ge 1 - D. \tag{3}
\end{align}

For $D \le 1$, we can set $p_{\hat{X}}(0) = p_{\hat{X}}(1) = \frac{1 - D}{2}$, $p_{\hat{X}}(E) = D$, and $p(X|\hat{X} = E) = \frac{1}{2}$. Then
$p_X(x) = p_{\hat{X}}(x) + \frac{1}{2}\, p_{\hat{X}}(E) = \frac{1}{2}$, meeting the requirement that $X \sim \mathrm{Bernoulli}(\frac{1}{2})$, and
equality holds in (3). Thus $R(D) = 1 - D$ when $0 \le D \le 1$. When $D > 1$, obviously $R(D) = 0$.
A simple strategy to achieve $R(D)$ is to erase $X$ at random with probability $1 - R(D)$. Since
$X$ is uniformly distributed, with this strategy $p(0, E) = p(1, E) = \frac{1}{2}\big(1 - R(D)\big)$ and
$p(0, 1) = p(1, 0) = 0$. Thus, from the above discussion, the rate is $1 - (1 - R(D)) = R(D)$.
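Analogously, a small numerical sketch for the erasure case (the value of $D$ is arbitrary) builds $p_{\hat{X}} = \big(\frac{1 - D}{2}, D, \frac{1 - D}{2}\big)$ with the conditional above and checks the Bernoulli($\frac{1}{2}$) marginal, the distortion, and $I(X; \hat{X}) = 1 - D$.
\begin{verbatim}
import numpy as np

D = 0.35                                       # arbitrary target distortion in [0, 1]

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))             # entropy in bits

# columns of joint are Xhat in {0, E, 1}, rows are X in {0, 1}
p_xhat = np.array([(1 - D) / 2, D, (1 - D) / 2])
p_x_given_xhat = np.array([[1.0, 0.5, 0.0],    # p(X=0 | Xhat = 0, E, 1)
                           [0.0, 0.5, 1.0]])   # p(X=1 | Xhat = 0, E, 1)
joint = p_x_given_xhat * p_xhat                # joint[x, xhat]

px = joint.sum(axis=1)                         # marginal of X: should be (1/2, 1/2)
distortion = joint[0, 1] + joint[1, 1]         # only X -> E contributes distortion 1
I = H(px) + H(p_xhat) - H(joint.ravel())

print("p(X) =", px, " distortion =", distortion, " I =", I, " 1 - D =", 1 - D)
\end{verbatim}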

4.4 Bounds on the rate distortion function for squared error distortion. With $D$ as the upper bound
on the distortion, we have
\[
\sigma^2(X - \hat{X}) = E(X - \hat{X})^2 - [E(X - \hat{X})]^2 \le E(X - \hat{X})^2 \le D.
\]
Thus $h(X - \hat{X}) \le \frac{1}{2}\log\big(2\pi e\,\sigma^2(X - \hat{X})\big) \le \frac{1}{2}\log(2\pi e D)$. So
\begin{align*}
I(X; \hat{X}) &= h(X) - h(X|\hat{X}) = h(X) - h(X - \hat{X}|\hat{X}) \\
&\ge h(X) - h(X - \hat{X}) \\
&\ge h(X) - \frac{1}{2}\log(2\pi e D),
\end{align*}
and
\[
R(D) = \min_{E(X - \hat{X})^2 \le D} I(X; \hat{X}) \ge h(X) - \frac{1}{2}\log(2\pi e D).
\]

For $Z \sim \mathcal{N}\!\left(0, \frac{D\sigma^2}{\sigma^2 - D}\right)$ independent of $X$, and $\hat{X} = \frac{\sigma^2 - D}{\sigma^2}(X + Z)$, the distortion is
\begin{align*}
E(X - \hat{X})^2 &= E\left( \frac{D}{\sigma^2}\, X - \frac{\sigma^2 - D}{\sigma^2}\, Z \right)^{\!2} \\
&= \left( \frac{D}{\sigma^2} \right)^{\!2} E X^2 + \left( \frac{\sigma^2 - D}{\sigma^2} \right)^{\!2} E Z^2 \\
&= \left( \frac{D}{\sigma^2} \right)^{\!2} \sigma^2 + \left( \frac{\sigma^2 - D}{\sigma^2} \right)^{\!2} \frac{D\sigma^2}{\sigma^2 - D} \\
&= D.
\end{align*}
And it is surprising to find out that for a nonzero constant $a$,$^\dagger$
\[
h(X|aY) = h(X|Y), \qquad I(X; aY) = I(X; Y).
\]
You can find the proof in the footnote. Thus the mutual information is
\begin{align}
I(X; \hat{X}) &= I(X; X + Z) \notag \\
&= h(X + Z) - h(X + Z|X) \notag \\
&= h(X + Z) - h(Z) \notag \\
&\le \frac{1}{2}\log 2\pi e\!\left(\sigma^2 + \frac{D\sigma^2}{\sigma^2 - D}\right) - \frac{1}{2}\log 2\pi e\,\frac{D\sigma^2}{\sigma^2 - D} \tag{4} \\
&= \frac{1}{2}\log\frac{\sigma^2}{D}. \notag
\end{align}
The reason for (4) is that $Z$ is Gaussian, the Gaussian maximizes differential entropy for a given variance, and
\[
\sigma^2(X + Z) = \sigma^2(X) + \sigma^2(Z) = \sigma^2 + \frac{D\sigma^2}{\sigma^2 - D}.
\]

For such an $\hat{X}$, the distortion is bounded by $D$, so we get
\[
R(D) \le \frac{1}{2}\log\frac{\sigma^2}{D}.
\]
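A Monte Carlo sketch of this test channel (illustrative; $\sigma^2$, $D$, the sample size, and the seed are arbitrary): the empirical distortion should be close to $D$, and since $(X, \hat{X})$ are jointly Gaussian the mutual information can be computed from the correlation as $-\frac{1}{2}\log(1 - \rho^2)$, which should be close to $\frac{1}{2}\log\frac{\sigma^2}{D}$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
sigma2, D, n = 4.0, 1.0, 1_000_000            # arbitrary source variance, distortion, sample size

X = rng.normal(0.0, np.sqrt(sigma2), n)
Z = rng.normal(0.0, np.sqrt(D * sigma2 / (sigma2 - D)), n)   # Z ~ N(0, D sigma^2/(sigma^2 - D))
Xhat = (sigma2 - D) / sigma2 * (X + Z)                       # the test channel of the text

print("E(X - Xhat)^2 =", np.mean((X - Xhat) ** 2), " target D =", D)

# (X, Xhat) are jointly Gaussian, so I(X; Xhat) = -1/2 log2(1 - rho^2)
rho = np.corrcoef(X, Xhat)[0, 1]
print("I estimate          =", -0.5 * np.log2(1 - rho**2))
print("0.5 log2(sigma^2/D) =", 0.5 * np.log2(sigma2 / D))
\end{verbatim}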

Since for a Gaussian random variable with variance $\sigma^2$ the rate distortion function is exactly $R(D) = \frac{1}{2}\log\frac{\sigma^2}{D}$,
which attains this upper bound, it is harder to describe the Gaussian random variable than other
random variables with the same variance.

$^\dagger$Let $Z = aY$. Then $f_Z(z) = \frac{1}{|a|} f_Y(\frac{z}{a})$ and $f_{X,Z}(x, z) = \frac{1}{|a|} f_{X,Y}(x, \frac{z}{a})$. Thus, with a change of variables,
\begin{align*}
h(X|aY) &= -\int_S f_{X,Z}(x, z) \log \frac{f_{X,Z}(x, z)}{f_Z(z)}\, dx\, dz \\
&= -\int_S \frac{1}{|a|} f_{X,Y}\!\left(x, \tfrac{z}{a}\right) \log \frac{f_{X,Y}(x, \frac{z}{a})}{f_Y(\frac{z}{a})}\, dx\, dz \\
&= -\int_S f_{X,Y}(x, y) \log \frac{f_{X,Y}(x, y)}{f_Y(y)}\, dx\, dy \\
&= h(X|Y).
\end{align*}
This can also be proved by
\[
I(X; aY) = h(aY) - h(aY|X) = h(Y) + \log|a| - h(Y|X) - \log|a| = I(X; Y).
\]

4.5 Properties of optimal rate distortion code. The conditions for equality in each step are listed
at the right side:

\begin{align*}
nR &\ge H(\hat{X}^n) && \text{($\hat{X}^n$ is uniformly distributed)} \\
&\ge H(\hat{X}^n) - H(\hat{X}^n|X^n) && \text{($\hat{X}^n$ is a deterministic function of $X^n$)} \\
&= H(X^n) - H(X^n|\hat{X}^n) \\
&= \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i|\hat{X}^n, X_1^{i-1}) \\
&\ge \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i|\hat{X}_i) && \text{(independent encoding among the $X_i$)} \\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_i) \\
&\ge \sum_{i=1}^{n} R\big(E d(X_i, \hat{X}_i)\big) && \text{(optimal for each single $X_i$)} \\
&\ge nR\!\left(\frac{1}{n}\sum_{i=1}^{n} E d(X_i, \hat{X}_i)\right) && \text{(same distortion on each $X_i$)} \\
&= nR(D).
\end{align*}
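The last inequality uses the convexity of $R(D)$ (Jensen's inequality). As an illustration (not part of the solution), a quick numerical check with the Hamming $R(D)$ from Problem 4.2 and an arbitrary $m$ that the average of $R$ over a set of per-letter distortions is at least $R$ of the average distortion:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
m = 4                                               # arbitrary alphabet size (as in Problem 4.2)

def R(D):
    """R(D) = log m - H(D) - D log(m-1) for 0 <= D <= 1 - 1/m, else 0 (Problem 4.2)."""
    D = np.clip(D, 1e-12, None)
    val = np.log2(m) + D * np.log2(D) + (1 - D) * np.log2(1 - D) - D * np.log2(m - 1)
    return np.where(D <= 1 - 1/m, val, 0.0)

for _ in range(5):
    Di = rng.uniform(0.0, 1 - 1/m, size=10)         # per-letter distortions Ed(X_i, Xhat_i)
    avg_R = float(np.mean(R(Di)))                   # (1/n) sum_i R(Ed(X_i, Xhat_i))
    R_avg = float(R(np.mean(Di)))                   # R((1/n) sum_i Ed(X_i, Xhat_i))
    print(f"mean R(D_i) = {avg_R:.4f}  >=  R(mean D_i) = {R_avg:.4f}")
\end{verbatim}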
