University of Groningen
Entropy of a bit-shift channel
Baggen, Stan; Balakirsky, Vladimir; Denteneer, Dee; Egner, Sebastian; Hollmann, Henk;
Tolhuizen, Ludo; Verbitskiy, Evgeny
Document Version
Publisher's PDF, also known as Version of record
Publication date:
2006
Link to publication in University of Groningen/UMCG research database
Citation for published version (APA):
Baggen, S., Balakirsky, V., Denteneer, D., Egner, S., Hollmann, H., Tolhuizen, L., & Verbitskiy, E. (2006).
Entropy of a bit-shift channel. In EPRINTS-BOOK-TITLE University of Groningen, Johann Bernoulli Institute
for Mathematics and Computer Science.
IMS Lecture Notes–Monograph Series
Dynamics & Stochastics
Vol. 48 (2006) 274–285
© Institute of Mathematical Statistics, 2006
DOI: 10.1214/074921706000000293
Entropy of a bit-shift channel
Stan Baggen¹, Vladimir Balakirsky², Dee Denteneer¹,
Sebastian Egner¹, Henk Hollmann¹, Ludo Tolhuizen¹ and
Evgeny Verbitskiy¹
Philips Research Laboratories Eindhoven and Eindhoven University of Technology
Abstract: We consider a simple transformation (coding) of an iid source called
a bit-shift channel. This simple transformation occurs naturally in magnetic
or optical data storage. The resulting process is not Markov of any order. We
discuss methods of computing the entropy of the transformed process, and
study some of its properties.
Results presented in this paper originate from the discussions we had at the “Coding
Club”, the weekly seminar on coding theory at the Philips Research Laboratories
in Eindhoven. Mike Keane also attends this seminar whenever his active travelling
schedule permits. We would like to use this opportunity to thank Mike for
his active participation, for pleasant and fruitful discussions, and for his inspiration,
which we had the pleasure to share.
1. Bit-shift channel
In this paper we consider a simplified model for errors occurring in the readout of
digital information stored on an optical recording medium like the Compact Disk
(CD) or the Digital Versatile Disk (DVD). For more detailed information on optical
storage see [9] or [16].
On optical disks the information is stored in a reflectivity pattern. For technical
reasons, it is advantageous to use only two states, i.e. “low” and “high” reflectivity.
Figure 1 shows the disk surfaces for two types of DVD. While the presence of
only 2 states greatly simplifies the detection of the state, it reduces the maximum
spatial frequency, and hence storage capacity.
In this situation it is better not to encode the information in the reflectivity state
itself but rather in the location of the transitions: the reflectivity pattern consists
of an alternating sequence of “high” and “low” marks of varying length (an integer
multiple of some small length unit), while each mark exceeds a minimal length,
say d + 1 units. Hence, this “run-length limited” (RLL) encoding makes sure no
mark is too short for the disk, while the information density is only limited by the
accuracy of determining the length of the marks, or equivalently the location of the
transitions. For technical reasons (to recover the length unit from the signal itself)
another constraint is imposed: no mark may exceed k + 1 units, k > d. (For the
CD, (d, k) = (2, 10).)
¹ Philips Research Laboratories, Prof. Holstlaan 4, 5656 AA, Eindhoven, The Netherlands, e-mail: stan.baggen@philips.com; dee.denteneer@philips.com; sebastian.egner@philips.com; henk.d.l.hollmann@philips.com; ludo.tolhuizen@philips.com; evgeny.verbitskiy@philips.com
² TU Eindhoven, 5600 MB, Eindhoven, The Netherlands, e-mail: V.B.Balakirsky@tue.nl
AMS 2000 subject classifications: primary 94A17, 28D20; secondary 58F11.
Keywords and phrases: Markov and hidden Markov processes, Entropy, channel capacity, Gibbs
vs. non-Gibbs.
Fig 1. Images of DVD disks. The left image shows a DVD-ROM. The track is formed by pressing
the disk mechanically onto a master disk. On the right is an image of a rewritable disk. The
resolution has been increased to demonstrate the irregularities in the track produced by the laser.
These irregularities lead to higher probabilities of jitter errors.
It is customary to describe RLL sequences by their transitions: a (d, k)-RLL
sequence has at least d and at most k ‘0’s between consecutive ‘1’s. So a “high” mark of 4
units, followed by a “low” of 3 units, followed by a “high” of 4 units corresponds to
the RLL-sequence 100010010001 written to the disk.
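As a small illustration of this convention, the following Python sketch maps mark lengths to the transition sequence and checks the (d, k) constraint. The function names `marks_to_rll` and `is_dk_rll` are ours, not the paper's, and the closing ‘1’ is assumed to mark the transition that ends the last mark.

```python
def marks_to_rll(mark_lengths):
    """Map mark lengths (in units) to the transition sequence.

    Each mark of length x contributes a '1' (the transition that opens it)
    followed by x - 1 zeros; a final '1' closes the last mark.
    """
    pieces = []
    for x in mark_lengths:
        if x < 1:
            raise ValueError("mark lengths must be positive")
        pieces.append("1" + "0" * (x - 1))
    return "".join(pieces) + "1"


def is_dk_rll(bits, d, k):
    """Check that every run of zeros between consecutive ones has length in [d, k]."""
    runs = bits.strip("0").split("1")[1:-1]
    return all(d <= len(run) <= k for run in runs)


print(marks_to_rll([4, 3, 4]))  # the example from the text
```

For the marks of 4, 3 and 4 units the sketch reproduces the sequence 100010010001, which satisfies the (2, 10) constraint of the CD.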
At the time the RLL-sequence is read from disk, the transitions (the ‘1’s) might
be detected at different positions due to noise, inter-symbol interference, clock jitter,
and other distortions. In the simplest version of this “bit-shift channel model” each
‘1’ may be detected one unit early, on time, or one unit late with the probabilities
(ε, 1 − 2ε, ε), 0 ≤ ε ≤ 1/2, and the shifts are independent.
More formally, suppose $X$ is the length of a continuous interval of low or high
marks on the disk. Then, after reading, the detected length is

$$Y = X + \omega_{\mathrm{left}} - \omega_{\mathrm{right}}, \tag{1}$$

where $\omega_{\mathrm{left}}, \omega_{\mathrm{right}}$ take values in $\{-1, 0, 1\}$. Here $\omega = 1, 0, -1$ means that the transition between the “low”-“high” or “high”-“low” runs was detected one time unit
too early, correctly, or one unit too late, respectively. Note that for two consecutive
intervals, $\omega_{\mathrm{right}}$ of the first interval is $\omega_{\mathrm{left}}$ of the second. The simplest model
for the distribution of the time shifts $\omega_{\mathrm{left}}$ is to assume that they are independent for
different intervals (runs), and

$$\mathbb{P}(\omega_{\mathrm{left}} = -1) = \mathbb{P}(\omega_{\mathrm{left}} = 1) = \varepsilon, \qquad \mathbb{P}(\omega_{\mathrm{left}} = 0) = 1 - 2\varepsilon,$$

for some $\varepsilon \in [0, 1/2]$.
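The model (1) is easy to simulate. The sketch below is only an illustration under the stated independence assumption; the function name `simulate_bit_shift` and its arguments are hypothetical. It draws one shift per transition and shares each shift between the two neighbouring runs.

```python
import random


def simulate_bit_shift(run_lengths, eps, rng=None):
    """Apply independent transition shifts to a list of run lengths.

    Each of the len(run_lengths) + 1 transitions gets a shift of +1 (early),
    0 (on time) or -1 (late) with probabilities (eps, 1 - 2*eps, eps).
    The detected length follows equation (1): Y = X + w_left - w_right.
    """
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 1/2]")
    rng = rng or random.Random()
    shifts = rng.choices((1, 0, -1), weights=(eps, 1 - 2 * eps, eps),
                         k=len(run_lengths) + 1)
    # The right shift of run i is the left shift of run i + 1.
    return [x + shifts[i] - shifts[i + 1] for i, x in enumerate(run_lengths)]


print(simulate_bit_shift([3, 5, 2, 7, 4], eps=0.1, rng=random.Random(0)))
```

Because neighbouring runs share a transition, the shifts telescope: the total detected length differs from the total written length by at most 2 units, whatever the number of runs.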
An important question then is: given (d, k), ε, and some distribution for the
input sequences (e.g. run-lengths uniformly distributed in {d, . . . , k}), what is the
mutual information between input and output sequences? In other words, how much
can be learned, on average, about the input from observing the output? The problem
of computing the mutual information is equivalent to computing the entropy
of the output sequence, see [2].
The supremum of this mutual information over all possible measures on the space
of input sequences is called the “channel capacity”.
1.1. Model
Let us describe the bit-shift channel as a continuous transformation (factor) of a
certain subshift of finite type. Let $\mathcal{A} = \{d, \ldots, k\}$, where $d, k \in \mathbb{N}$, $d < k$ and $d \ge 2$.
The input space then is $\mathcal{A}^{\mathbb{Z}} = \{x = (x_i) : x_i \in \mathcal{A}\}$. Consider also a finite alphabet
$\Omega$ with 9 symbols

$$\Omega = \{(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)\}.$$

Finally, consider a subshift of finite type $\Omega_J \subset \Omega^{\mathbb{Z}}$ defined as

$$\Omega_J = \bigl\{(\omega_n) \in \Omega^{\mathbb{Z}} : \omega_{n,2} = \omega_{n+1,1} \text{ for all } n \in \mathbb{Z}\bigr\},$$

where $\omega_n = (\omega_{n,1}, \omega_{n,2})$. The factor map $\varphi$ is defined on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ as follows:
$y = \varphi(x, \omega)$ with

$$y_n = x_n + \omega_{n,1} - \omega_{n,2} \quad \text{for all } n. \tag{2}$$

Note that the output space $\mathcal{O} = \varphi(\mathcal{A}^{\mathbb{Z}} \times \Omega_J)$ is a subshift of $\mathcal{B}^{\mathbb{Z}}$, with $\mathcal{B} = \{d-2, \ldots, k+2\}$. Clearly, $\mathcal{O} \ne \mathcal{B}^{\mathbb{Z}}$. For example, $(d-2, d-2)$ cannot occur in
any output sequence. Indeed, if $y_n = d-2$, then $x_n = d$, $\omega_{n,1} = -1$ and $\omega_{n,2} = 1$.
But then, $y_{n+1} = x_{n+1} + \omega_{n+1,1} - \omega_{n+1,2} = x_{n+1} + \omega_{n,2} - \omega_{n+1,2} \ge d + 1 - 1 = d$.
With a similar argument, one concludes that for any $L \ge 1$ the words

$$[d-2, \underbrace{d, \ldots, d}_{L \text{ times}}, d-2] \quad \text{or} \quad [d-2, \underbrace{d, \ldots, d}_{L \text{ times}}, d-1]$$

do not occur in any output sequence $y \in \mathcal{O}$. Therefore, there is an infinite number
of minimal forbidden words, i.e., forbidden words all of whose subwords are allowed.
Hence, $\mathcal{O}$ is not a subshift of finite type. However, since $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ is a subshift of
finite type, $\mathcal{O}$ is sofic [13].
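The forbidden words can be checked by brute force for small parameters. The following sketch is an illustration only: the function `output_words` is our own name, and the shift sequence is represented by one value per transition, so that the right shift of run i equals the left shift of run i + 1, as required in the definition of the subshift above.

```python
from itertools import product


def output_words(d, k, n):
    """Return the set of all length-n output words of the bit-shift channel.

    A word arises from runs x_0..x_{n-1} in {d..k} and transition shifts
    s_0..s_n in {-1, 0, 1} via y_i = x_i + s_i - s_{i+1}, as in equation (2).
    """
    words = set()
    for x in product(range(d, k + 1), repeat=n):
        for s in product((-1, 0, 1), repeat=n + 1):
            words.add(tuple(x[i] + s[i] - s[i + 1] for i in range(n)))
    return words


d, k = 2, 3
pairs = output_words(d, k, 2)
triples = output_words(d, k, 3)
print((d - 2, d - 2) in pairs)        # forbidden pair
print((d - 2, d, d - 2) in triples)   # forbidden word with L = 1
print((d - 2, d, d) in triples)       # an allowed continuation
```

The enumeration grows exponentially in n, so it is only useful as a sanity check of the argument, not as a way to describe the sofic shift.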
1.2. Capacity of a bit-shift channel
Suppose $J$ is a measure on $\Omega_J$. For example, in this paper we will be mainly interested in Markov measures $J_\varepsilon$ on $\Omega_J$, obtained in a natural way from Bernoulli
measures on $\{-1, 0, 1\}$ with probabilities $\varepsilon$, $1 - 2\varepsilon$, and $\varepsilon$, respectively. If $\mathbb{P}$ is a
translation invariant measure on $\mathcal{A}^{\mathbb{Z}}$, then we obtain a measure $Q$ on $\mathcal{O}$, which is
the push forward of $\mathbb{P} \times J$. We use the standard notation $Q = (\mathbb{P} \times J) \circ \varphi^{-1}$.
From the information-theoretical point of view, an important quantity is the
capacity of the channel. The capacity of a bit-shift channel specified by $J$ is defined
as

$$C_{\mathrm{bitshift}}(J) = \sup_{\mathbb{P} \in \mathcal{P}(\mathcal{A}^{\mathbb{Z}})} h\bigl((\mathbb{P} \times J) \circ \varphi^{-1}\bigr) - h(J), \tag{3}$$

where the supremum is taken over $\mathcal{P}(\mathcal{A}^{\mathbb{Z}})$, the set of all translation invariant
probability measures on $\mathcal{A}^{\mathbb{Z}}$, and $h(\cdot)$ is the entropy.
Even for ‘Bernoulli’ measures $J_\varepsilon$, the capacity $C_{\mathrm{bitshift}}(J_\varepsilon)$ is not known. It is
relatively easy to see that the supremum in (3) is achieved. However, the properties
of maximizing measures are not known. It is expected that maximizing measures
are not Markov of any order. Finally, if one is interested in the topological entropy of
$\mathcal{O}$,

$$h_{\mathrm{top}}(\mathcal{O}) = \sup_{Q \in \mathcal{P}(\mathcal{O})} h(Q),$$

then $h_{\mathrm{top}}(\mathcal{O})$ is easily computable using standard methods [13] or using the efficient
numerical approach of [8].
2. Entropy of a bit-shift channel
Suppose that $\{X_n\}$ are independent identically distributed random variables taking
values in $\mathcal{A} = \{d, \ldots, k\}$, and let $\mathbb{P}$ be the corresponding distribution. What is the
entropy of $Q = (\mathbb{P} \times J_\varepsilon) \circ \varphi^{-1}$?
Note that $(X_n, \omega_n)$ is a Markov chain, and, hence, $Y_n$, given by (2), is a function
of a Markov chain.
Let us start by recalling some methods of computing the entropy of processes
which are functions of Markov chains. Suppose $X = \{X_n\}$, $n \in \mathbb{Z}$, is a stationary
ergodic Markov chain taking values in a finite alphabet $\mathcal{A}$. Let $\varphi : \mathcal{A} \to \mathcal{B}$ be
some map, and consider a process $Y = \{Y_n\}$, defined by

$$Y_n = \varphi(X_n) \quad \text{for all } n \in \mathbb{Z}.$$
The following result [3, 4], see also [5, Theorem 4.4.1], provides sharp estimates on
the entropy of $Y$.
Theorem 2.1. If $X$ is a Markov chain and $Y = \varphi(X)$, then for every $n \ge 1$ one
has

$$H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n) \le h(Y) \le H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n).$$

Moreover, as $n \nearrow \infty$,

$$H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n) \nearrow h(Y), \qquad H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n) \searrow h(Y).$$

Birch [3, 4] has shown that under some additional conditions, the convergence is
in fact exponential:

$$|h(Y) - H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n)| \le C\rho^n, \qquad |h(Y) - H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n)| \le C\rho^n,$$

where $\rho \in (0, 1)$ is independent of the factor map $\varphi$.
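For a small chain the two bounds of Theorem 2.1 can be evaluated directly by enumerating paths. The sketch below is a brute-force illustration, not the method of the paper; the names `theorem_bounds`, `P`, `pi` and `phi` are our own, entropies are in bits, and `pi` is assumed to be the stationary distribution of `P` (in the example `P` is doubly stochastic, so `pi` is uniform).

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy (bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def cond_entropy_of_first(joint):
    """H(first coordinate | remaining coordinates) for a joint distribution."""
    rest = defaultdict(float)
    for key, p in joint.items():
        rest[key[1:]] += p
    return entropy(joint) - entropy(rest)


def theorem_bounds(P, pi, phi, n):
    """Return (H(Y0|Y1..Y_{n-1},X_n), H(Y0|Y1..Y_n)) by enumerating paths.

    P   : transition matrix (list of rows) of the chain X
    pi  : its stationary distribution
    phi : phi[x] is the output symbol of state x
    """
    joint_yx = defaultdict(float)  # law of (Y0, ..., Y_{n-1}, X_n)
    joint_y = defaultdict(float)   # law of (Y0, ..., Y_n)
    for path in product(range(len(pi)), repeat=n + 1):
        p = pi[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a][b]
        if p == 0:
            continue
        ys = tuple(phi[x] for x in path)
        joint_yx[ys[:n] + (path[n],)] += p
        joint_y[ys] += p
    return cond_entropy_of_first(joint_yx), cond_entropy_of_first(joint_y)


P = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
pi = [1 / 3] * 3
phi = [0, 1, 1]  # states 1 and 2 are merged into one output symbol
for n in range(1, 6):
    print(n, theorem_bounds(P, pi, phi, n))
```

The cost grows like the number of states to the power n + 1, which is why the refinement strategies of Section 2.1 matter in practice.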
Let us give a proof of Theorem 2.1, since it is very short and provides us with
some useful intuition.
Proof of Theorem 2.1. An upper estimate of $h(Y)$ in terms of $H(Y_0 \mid Y_1, \ldots, Y_n)$
and the monotonic convergence of $H(Y_0 \mid Y_1, \ldots, Y_n)$ to $h(Y)$ are standard facts. For
the lower estimate we proceed as follows: for any $m \in \mathbb{N}$ one has

$$\begin{aligned}
H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n)
&= H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n, \ldots, X_{n+m}) && (4)\\
&= H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n, \ldots, Y_{n+m}, X_n, \ldots, X_{n+m}) && (5)\\
&\le H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n, \ldots, Y_{n+m}), && (6)
\end{aligned}$$

where in (4) we used the Markov property of $X$, and (5), (6) follow from the
standard properties of conditional entropies. Since

$$h(Y) = \lim_{m \to \infty} H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n, \ldots, Y_{n+m}),$$

we obtain the lower estimate of $h(Y)$. Moreover, using standard properties of conditional entropies, we immediately conclude that $H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n)$ is monotonically increasing with $n$.
To prove that the lower bound actually converges to $h(Y)$, we proceed as follows.
Note that

$$\begin{aligned}
H(X_{n+1}) &\ge H(X_{n+1} \mid Y_n) - H(X_{n+1} \mid Y_n, \ldots, Y_0)\\
&= \sum_{i=0}^{n-1} \bigl[ H(X_{n+1} \mid Y_n, \ldots, Y_{i+1}) - H(X_{n+1} \mid Y_n, \ldots, Y_i) \bigr]\\
&= \sum_{i=0}^{n-1} \bigl[ H(X_{n-i+1} \mid Y_{n-i}, \ldots, Y_1) - H(X_{n-i+1} \mid Y_{n-i}, \ldots, Y_0) \bigr]\\
&= \sum_{j=1}^{n} \bigl[ H(X_{j+1} \mid Y_j, \ldots, Y_1) - H(X_{j+1} \mid Y_j, \ldots, Y_0) \bigr] = \sum_{j=1}^{n} c_j,
\end{aligned}$$

where the third line uses stationarity and

$$c_j = H(X_{j+1} \mid Y_j, \ldots, Y_1) - H(X_{j+1} \mid Y_j, \ldots, Y_0), \qquad j = 1, \ldots, n.$$

Since $c_j \ge 0$ and $\sum_{j=1}^{n} c_j \le H(X_1) < \infty$ for all $n$, we conclude that $c_n \to 0$ as
$n \to \infty$. Moreover,

$$\begin{aligned}
c_n &= H(X_{n+1} \mid Y_n, \ldots, Y_1) - H(X_{n+1} \mid Y_n, \ldots, Y_0)\\
&= H(Y_1, \ldots, Y_n, X_{n+1}) - H(Y_1, \ldots, Y_n) - H(Y_0, Y_1, \ldots, Y_n, X_{n+1}) + H(Y_0, Y_1, \ldots, Y_n)\\
&= H(Y_0 \mid Y_1, \ldots, Y_n) - H(Y_0 \mid Y_1, \ldots, Y_n, X_{n+1}).
\end{aligned}$$

Finally, since $H(Y_0 \mid Y_1, \ldots, Y_n)$ converges to $h(Y)$, so does $H(Y_0 \mid Y_1, \ldots, Y_n, X_{n+1})$.
Let us conclude this section with one general remark. Suppose $Y$ is a factor of
$X$, i.e. $Y = \varphi(X)$, where $X$ is some ergodic process. For $n, m \in \mathbb{N}$, let

$$d_{n,m} = H(Y_0 \mid Y_1, \ldots, Y_n, X_{n+1}, \ldots, X_{n+m}).$$

Note that $d_{n,m} \ge d_{n,m+1} \ge 0$, and hence $\lim_{m \to \infty} d_{n,m} =: D_n$ exists. Since for any
$n, m \in \mathbb{N}$

$$d_{n,m} = H(Y_0 \mid Y_1, \ldots, Y_n, X_{n+1}, \ldots, X_{n+m}) \le H(Y_0 \mid Y_1, \ldots, Y_n, Y_{n+1}, \ldots, Y_{n+m}),$$

we conclude that $D_n \le h(Y)$. Note also that since $d_{n,m} \le d_{n+1,m-1}$, one has
$D_{n+1} \ge D_n$.
The natural question is under which conditions $D_n$ converges to $h(Y)$ as
$n \to \infty$. For this we need a certain regularity of the conditional probabilities of the
$X$-process. For example, if conditional probabilities are continuous, i.e., if

$$r_n = \sup_{X_0, \ldots, X_n} \ \sup_{X', X''} \bigl| \mathbb{P}(X_0 \mid X_1, \ldots, X_n, X'_{n+1}, X'_{n+2}, \ldots) - \mathbb{P}(X_0 \mid X_1, \ldots, X_n, X''_{n+1}, X''_{n+2}, \ldots) \bigr| \to 0, \qquad n \to \infty,$$

then $D_n \to h(Y)$. Gibbs measures and g-measures (see Section 4) have continuous
conditional probabilities.
2.1. Entropy via a prefix code
In this section we recall the approach to efficient computation of entropies of factor
processes $Y = \varphi(X)$, where $X$ is Markov, which was originally proposed in [2, 7].
The inequalities of Theorem 2.1 can be rewritten as follows

$$\sum_{y_1^n \in \mathcal{B}^n} \mathbb{P}(y_1^n)\, H_{\mathbb{P}(\cdot \mid y_1^n)}(Y_0 \mid X_{n+1}) \le h(Y) \le \sum_{y_1^n \in \mathcal{B}^n} \mathbb{P}(y_1^n)\, H_{\mathbb{P}(\cdot \mid y_1^n)}(Y_0), \tag{7}$$

where we use the following notation

$$y_1^n = (y_1, y_2, \ldots, y_n) \in \mathcal{B}^n, \qquad \mathbb{P}(y_1^n) = \mathbb{P}(Y_1 = y_1, \ldots, Y_n = y_n), \qquad \mathbb{P}(\cdot \mid y_1^n) = \mathbb{P}(\cdot \mid Y_1 = y_1, \ldots, Y_n = y_n).$$

The subindex $\mathbb{P}(\cdot \mid y_1^n)$ in (7) stresses that the entropy of $Y_0$ and the conditional
entropy of $Y_0$ given $X_{n+1}$ are computed using $\mathbb{P}(\cdot \mid y_1^n)$.
Note that the sum in (7) is taken over elements of a partition of $\mathcal{B}^{\mathbb{Z}}$ into cylinders
of length $n$:

$$U_n = \bigl\{ [y_1^n] : y_1^n \in \mathcal{B}^n \bigr\}, \qquad [y_1^n] = \{\tilde{y} \in \mathcal{B}^{\mathbb{Z}} : \tilde{y}_1 = y_1, \ldots, \tilde{y}_n = y_n\}.$$

In fact, an estimate similar to (7) holds for any partition of $\mathcal{B}^{\mathbb{Z}}$ into cylindric sets,
see [7, Theorem 1].
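Theorem 2.2. Let $W$ be a finite partition of $\mathcal{B}^{\mathbb{Z}}$ into cylindric sets:

$$W = \bigl\{ [w_i] \bigr\}_{i=1}^{M}, \qquad w_i = (w_{i,1}, \ldots, w_{i,l_i}).$$

Then

$$\sum_{w \in W} h_1(w) \le h(Y) \le \sum_{w \in W} h(w), \tag{8}$$

where

$$h_1(w) = \mathbb{P}(Y_1 \ldots Y_{|w|} = w)\, H(Y_0 \mid Y_1 \ldots Y_{|w|} = w, X_{|w|+1}),$$

$$h(w) = \mathbb{P}(Y_1 \ldots Y_{|w|} = w)\, H(Y_0 \mid Y_1 \ldots Y_{|w|} = w).$$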
Theorem 2.2 leads to the following algorithm. Suppose $W$ is some partition
into cylinders. We can refine the partition $W$ by removing a certain word $w$ from
$W$ and adding all words of the form $wb$, where $b \in \mathcal{B}$, i.e.,

$$W' = \bigl( W \setminus \{w\} \bigr) \cup \{wb \mid b \in \mathcal{B}\}. \tag{9}$$

Suppose $\{W_k\}_{k \ge 1}$ is a sequence of partitions such that for each $k$, $W_{k+1}$ is a refinement of $W_k$ as in (9), and at each step a word $w \in W_k$ is selected such that

$$h(w) - h_1(w) = \max_{u \in W_k} \bigl[ h(u) - h_1(u) \bigr]. \tag{10}$$

The greedy strategy (10), as well as some other strategies (e.g., uniform, $|w| = \min_{u \in W} |u|$), guarantees the convergence of the upper and lower estimates in (8),
i.e.,

$$\lim_{k \to \infty} \sum_{w \in W_k} \bigl[ h(w) - h_1(w) \bigr] = 0.$$
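A minimal sketch of the refinement (9) with the greedy rule (10) for a generic function of a finite Markov chain is given below. It is our own illustration under stated assumptions, not the implementation of [2, 7]: the names `word_terms`, `extend` and `greedy_bounds` are hypothetical, `pi` must be the stationary distribution of `P`, entropies are in bits, and each word w stores the table alpha[y0][x] = P(Y_0 = y0, Y_1 ... Y_l = w, X_{l+1} = x), from which h(w) and h_1(w) follow.

```python
import heapq
from math import log2


def word_terms(alpha):
    """Return (h1(w), h(w)) from alpha[y0][x] = P(Y0=y0, Y_1..Y_l=w, X_{l+1}=x)."""
    rows = list(alpha.values())
    total = sum(map(sum, rows))              # P(Y_1..Y_l = w)
    cols = [sum(col) for col in zip(*rows)]  # P(Y_1..Y_l = w, X_{l+1} = x)
    h_up = -sum(q * log2(q / total) for q in map(sum, rows) if q > 0)
    h_lo = -sum(a * log2(a / cols[x])
                for row in rows for x, a in enumerate(row) if a > 0)
    return h_lo, h_up


def extend(alpha, b, P, phi):
    """Append output symbol b to the word: require phi(X_{l+1}) = b, then step."""
    n = len(P)
    return {y0: [sum(row[x] * P[x][x2] for x in range(n) if phi[x] == b)
                 for x2 in range(n)]
            for y0, row in alpha.items()}


def greedy_bounds(P, pi, phi, steps):
    """Lower and upper bounds of (8) after `steps` greedy refinements (10)."""
    n, symbols = len(P), sorted(set(phi))
    # Empty word: alpha[y0][x1] = P(Y0 = y0, X1 = x1).
    root = {b: [sum(pi[x0] * P[x0][x1] for x0 in range(n) if phi[x0] == b)
                for x1 in range(n)] for b in symbols}
    lo, up = word_terms(root)
    lower, upper = lo, up
    # heapq is a min-heap, so store minus the gap; the counter breaks ties.
    heap = [(lo - up, 0, root, lo, up)]
    counter = 1
    for _ in range(steps):
        _, _, alpha, lo, up = heapq.heappop(heap)
        lower, upper = lower - lo, upper - up
        for b in symbols:
            child = extend(alpha, b, P, phi)
            if sum(map(sum, child.values())) == 0:
                continue  # the word wb has probability zero
            lo, up = word_terms(child)
            lower, upper = lower + lo, upper + up
            heapq.heappush(heap, (lo - up, counter, child, lo, up))
            counter += 1
    return lower, upper


P = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
pi = [1 / 3] * 3  # P is doubly stochastic, so the uniform law is stationary
print(greedy_bounds(P, pi, [0, 1, 1], 200))
```

For the bit-shift channel the hidden state would be the pair (X_n, ω_n) and the output map would be given by (2); the toy chain above only illustrates the bookkeeping. Keeping the running sums `lower` and `upper` avoids re-summing over the whole partition after each refinement.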
2.2. Entropy via renewal times
As before, suppose that $\{X_n\}$ are independent and identically distributed in $\{d, \ldots, k\}$ with $\mathbb{P}(X_i = \ell) = p_\ell$, $\ell = d, \ldots, k$. Assume also that $p_d > 0$.
Another method for estimating the entropy is based on the following observation.
Suppose $Y_n = d - 2$ for some $n$. This implies that $X_n = d$, $\omega_n = (-1, 1)$. Since the
sequence $\{\omega_k\}$ forms a Markov chain, $(\ldots, \omega_1, \ldots, \omega_{n-1})$ and $(\omega_{n+1}, \omega_{n+2}, \ldots)$ are
independent given $\omega_n$. Therefore, since $Y_n = d - 2$ implies $\omega_n = (-1, 1)$, we conclude
that $(\ldots, \omega_1, \ldots, \omega_{n-1})$ and $(\omega_{n+1}, \omega_{n+2}, \ldots)$ are independent given $Y_n = d - 2$.
Moreover, since the $X_n$ form an iid sequence, $(\ldots, Y_{n-2}, Y_{n-1})$ and $(Y_{n+1}, Y_{n+2}, \ldots)$
are also independent given $Y_n = d - 2$.
Consider our subshift $\mathcal{O}$, and a set $C = [d - 2] = \{y \in \mathcal{O} : y_0 = d - 2\}$. Let
$S : \mathcal{O} \to \mathcal{O}$ be the left shift, and consider the induced map $S_C$ on $C$:

$$S_C(y) = S^{R_C(y)}(y),$$

where $R_C(y) = \min\{k \ge 1 : y_k = d - 2\}$. On $C$, the induced map $S_C$ has a natural
Bernoulli partition

$$\bigl\{ [d-2, y_1, \ldots, y_r, d-2] : y_j \in \mathcal{B},\ y_j \ne d-2,\ j = 1, \ldots, r,\ r \in \mathbb{N} \bigr\}.$$

Finally, by the Abramov formula [1]

$$h(Q) = - \sum_{r=1}^{\infty} \ \sum_{y_1, \ldots, y_r \ne d-2} Q([d-2, y_1, \ldots, y_r, d-2]) \log Q([d-2, y_1, \ldots, y_r, d-2]) + Q([d-2]) \log Q([d-2]). \tag{11}$$

Computation of the entropy of images of Markov measures using renewal times
and the induced map was used in the past, see e.g. [15]. However, in the case of the bit-shift
channel, the method based on (11) is extremely inefficient.
3. Numerics
For illustration we present a numerical computation of the entropy using the prefix
code method described in Section 2.1.
The algorithm constructs a sequence of refined partitions Wk as described above.
A particularly useful strategy is given by (10). This “greedy” heuristic selects the
cylinder most responsible for the difference between the upper and lower bounds, in the hope
that refining this cylinder will tighten the bounds quickly. This strategy is not
optimal (as can be shown by example) but it has three advantages. Firstly, the
bounds converge (eventually). Secondly, if in a particular word w ∈ W the last
symbol is the “renewal” symbol d−2 (similarly k+2), this word will never be refined
again. Thirdly, the next cylinder to expand can be found quickly by representing
W as a “priority queue” data structure.
For illustration, we run the algorithm for the model of the jitter channel described in Section 1.1. The parameters are inspired by the Compact Disc: the error-correction and modulation system of the CD essentially produces an RLL-sequence
with parameters (d, k) = (2, 10). We model the run-lengths as independent identically distributed random variables with probabilities $p_\ell = p_2 \gamma^{\ell - 2}$, $\ell \in \{2, \ldots, 10\}$,
where $\gamma = 0.658$ and $p_2$ is chosen such that $\sum_{\ell} p_\ell = 1$. This truncated geometric
model with $\gamma = 0.658$ is a very good approximation of the (marginal) run-length
distribution observed on the CD.

Fig 2. Mutual information I(Y ; X ) = h(Y ) − h(Jε ) as a function of ε, for (d, k) = (2, 10) and
the truncated geometric distribution for X . [Plot: horizontal axis, epsilon (logarithmic, 0.001 to 0.1); vertical axis, bits/run (1.4 to 2.6); curves: I(X;Y) and rate EFM/CD.]
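The truncated geometric run-length distribution can be tabulated in a few lines. The sketch below is an illustration only; the function name `truncated_geometric` is ours, and the normalisation simply divides by the sum of the unnormalised weights.

```python
def truncated_geometric(d, k, gamma):
    """Return {l: p_l} with p_l proportional to gamma**(l - d) for l = d..k."""
    weights = [gamma ** (l - d) for l in range(d, k + 1)]
    norm = sum(weights)
    return {l: w / norm for l, w in zip(range(d, k + 1), weights)}


p = truncated_geometric(2, 10, 0.658)
for length, prob in p.items():
    print(length, round(prob, 4))
```

By the finite geometric sum, the first probability equals (1 − γ)/(1 − γ⁹) for (d, k) = (2, 10).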
Figure 2 shows the mutual information
I(Y ; X ) = h(Y ) − h(Jε ) = h(Y ) + 2ε log ε + (1 − 2ε) log(1 − 2ε)
as a function of ε. The horizontal line represents the rate designed for the last stage
of the encoding used in the CD (the so-called EFM code). If the jitter is so strong
that the mutual information drops below this rate, reliable decoding is impossible.
In practice, similar plots are used to evaluate the performance of particular encoding
schemes with respect to various distortions introduced by the physical channel.
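The jitter term in this formula is elementary to evaluate. In the sketch below, the function names are ours, base-2 logarithms are assumed because Figure 2 reports bits per run, and the output entropy `h_y` has to be supplied by some estimate such as the bounds of Section 2.1.

```python
from math import log2


def jitter_entropy(eps):
    """Entropy (bits) of the shift distribution (eps, 1 - 2*eps, eps)."""
    return -sum(p * log2(p) for p in (eps, 1 - 2 * eps, eps) if p > 0)


def mutual_information(h_y, eps):
    """I(Y; X) = h(Y) - h(J_eps), given an estimate h_y of the output entropy."""
    return h_y - jitter_entropy(eps)


for eps in (0.001, 0.01, 0.1):
    print(eps, jitter_entropy(eps))
```

The jitter entropy vanishes at ε = 0 and reaches its maximum log₂ 3 at ε = 1/3, where the three shifts are equally likely.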
Figure 3 compares the greedy and uniform heuristics. The standard estimate
H(Y0 |Y1 , . . . , Yn ) in fact corresponds to the uniform refinement. Observe a superior
rate of convergence for the greedy refinement strategy.
4. Thermodynamics of jittered measures
Bernoulli and Markov measures belong to a wider class of the so-called Gibbs
measures. Bernoulli and Markov measures are also examples of g-measures.
In the seminal paper [10] M. Keane introduced a class of g-measures. These
are the measures whose conditional probabilities are given by a continuous and
strictly positive function g. For subshifts of finite type, the theory of g-measures
is extensive. For sofic subshifts, the problem of defining g-measures is much more
complicated. For the first results see the paper by W. Krieger [11] in this volume.
The thermodynamic formalism allows one to look at Gibbs measures from two different sides: firstly, locally, through the conditional probabilities; and secondly,
globally, through the variational principles.
Contrary to the class of g-measures, the class of Gibbs measures for a sofic
subshift is well defined. The natural question is whether a “jittered” measure $Q = (\mathbb{P} \times J_\varepsilon) \circ \varphi^{-1}$ is Gibbs. If the measure is Gibbs and the potential is identified, then,
using the variational principle, we obtain another method of computing the entropy
of $Q$.

Fig 3. Difference between upper and lower bounds on entropy as a function of |W |, the number
of cylinders, in partitions built by the greedy and uniform refinement strategies. [Plot: horizontal axis, number of cylinders (logarithmic, 10 to 100000); vertical axis, (upper - lower) bound in bit/run (logarithmic, 1e-08 to 0.1); curves: greedy and uniform.]
The subshift $\mathcal{O} \subset \mathcal{B}^{\mathbb{Z}}$ satisfies a specification property (as a factor of a subshift
of finite type $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ which has a specification property [6]). Hence the results
of [17] on existence of Gibbs measures for expansive dynamical systems with the
specification property are applicable. If $Q$ were a Gibbs measure for a potential
$f$ from the Bowen class $V(\mathcal{O})$, then there would exist positive constants $c, C$ such
that for any $n > 0$ and every $y \in \mathcal{O}$

$$c \le \frac{Q([y_0, \ldots, y_n])}{\exp\Bigl( \sum_{i=0}^{n} f(S^i(y)) - (n+1) P(f) \Bigr)} \le C, \tag{12}$$

where $S : \mathcal{O} \to \mathcal{O}$ is the left shift and $P(f)$ is the topological pressure of $f$. As a
corollary of (12) one easily concludes that

$$\log Q(y_0 \mid y_1, \ldots, y_n) = \log \frac{Q([y_0, \ldots, y_n])}{Q([y_1, \ldots, y_n])}$$

should be bounded, which is not the case, see [18]. Hence, $Q$ is not Gibbs for any
potential from the large class of potentials $V(\mathcal{O})$. Examples of measures $Q$ such that
estimates similar to (12) hold for some continuous $f$ and subexponential bounds $c_n$
and $C_n$ ($\lim_n n^{-1} \log c_n = \lim_n n^{-1} \log C_n = 0$) have been considered [14, 20], and
were shown to be weakly Gibbs. It is not known whether $Q$ for the bit-shift channel
is weakly Gibbs for some continuous potential $f$.
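Nevertheless, the thermodynamic formalism could be useful in estimating the
capacity of the bit-shift channel. We recall the notion of a compensation function
and some results summarized in [19]. First of all, we define the topological pressure
of real-valued continuous functions defined on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ and $\mathcal{O}$. If $f \in C(\mathcal{A}^{\mathbb{Z}} \times \Omega_J)$,
$g \in C(\mathcal{O})$, the topological pressures of $f$ and $g$ are defined as

$$P(f \mid \mathcal{A}^{\mathbb{Z}} \times \Omega_J) = \sup_{S} \Bigl[ h(S) + \int_{\mathcal{A}^{\mathbb{Z}} \times \Omega_J} f \, dS \Bigr], \qquad P(g \mid \mathcal{O}) = \sup_{Q} \Bigl[ h(Q) + \int_{\mathcal{O}} g \, dQ \Bigr],$$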
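where the suprema are taken over all translation invariant measures on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$
and $\mathcal{O}$, respectively. A measure $S$ on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ is called an equilibrium state for
$f \in C(\mathcal{A}^{\mathbb{Z}} \times \Omega_J)$ if

$$P(f \mid \mathcal{A}^{\mathbb{Z}} \times \Omega_J) = h(S) + \int f \, dS. \tag{13}$$

We define equilibrium states on $\mathcal{O}$ in a similar way. It is well known that every
measure is an equilibrium state: for every translation invariant measure $S$ on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ one can find a continuous function $f : \mathcal{A}^{\mathbb{Z}} \times \Omega_J \to \mathbb{R}$ such that (13) holds.
Moreover, for any $S = \mathbb{P} \times J_\varepsilon$, such an $f$ is of a special form

$$f(x, \omega) = \tilde{f}(x) + j_\varepsilon(\omega),$$

where $\tilde{f} : \mathcal{A}^{\mathbb{Z}} \to \mathbb{R}$ and $j_\varepsilon : \Omega_J \to \mathbb{R}$ are continuous functions. (In fact, $j_\varepsilon$ can be
found explicitly.)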
A continuous function $F : \mathcal{A}^{\mathbb{Z}} \times \Omega_J \to \mathbb{R}$ is a compensation function if

$$P(F + g \circ \varphi \mid \mathcal{A}^{\mathbb{Z}} \times \Omega_J) = P(g \mid \mathcal{O})$$

for all $g \in C(\mathcal{O})$. Compensation functions exist for factor maps defined on shifts of
finite type [19].
An important result is the so-called relative variational principle [12, 19], which
in our notation states that $F$ is a compensation function if and only if for any
invariant measure $Q$ on $\mathcal{O}$ one has

$$h(Q) = \sup_{S} \Bigl\{ h(S) + \int F \, dS \ : \ S \circ \varphi^{-1} = Q \Bigr\}.$$
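Suppose $F$ is a compensation function; then for $Q = (\mathbb{P} \times J_\varepsilon) \circ \varphi^{-1}$ we obtain

$$h(Q) \ge h(\mathbb{P} \times J_\varepsilon) + \int F \, d(\mathbb{P} \times J_\varepsilon) = h(\mathbb{P}) + h(J_\varepsilon) + \int F_\varepsilon \, d\mathbb{P}, \tag{14}$$

where $F_\varepsilon(x) = \int_{\Omega_J} F(x, \omega) \, J_\varepsilon(d\omega)$. For the capacity of the bit-shift channel we
obtain the following lower estimate

$$C_{\mathrm{bitshift}}(J_\varepsilon) = \sup_{Q = (\mathbb{P} \times J_\varepsilon) \circ \varphi^{-1}} \bigl[ h(Q) - h(J_\varepsilon) \bigr] \ge \sup_{\mathbb{P}} \Bigl[ h(\mathbb{P}) + \int F_\varepsilon \, d\mathbb{P} \Bigr] = P(F_\varepsilon \mid \mathcal{A}^{\mathbb{Z}}). \tag{15}$$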
An interesting question is whether the inequalities in (14) and (15) are strict. The
inequality (14) is most probably strict in the generic situation. Indeed, by Corollary
3.4 [19], if $Q$ is an equilibrium state for $g$ on $\mathcal{O}$, and $S$ is such that $S \circ \varphi^{-1} = Q$ and

$$h(Q) = h(S) + \int F \, dS,$$

then $S$ is an equilibrium state for $F + g \circ \varphi$, and conversely. On the other hand, if
$S = \mathbb{P} \times J_\varepsilon$, then $S$ is an equilibrium state for $f(x, \omega) = \tilde{f}(x) + j_\varepsilon(\omega)$. Therefore,
for the equality in (14), it is necessary that $F(x, \omega) + (g \circ \varphi)(x, \omega)$ and $\tilde{f}(x) + j_\varepsilon(\omega)$
are physically equivalent, i.e., have the same set of equilibrium states. In fact, it is
quite difficult to imagine how, for a given compensation function $F$ of the bit-shift
channel and a generic $g$, one could find $\tilde{f}$ to satisfy the requirement of physical
equivalence. On the other hand, it is not very difficult to see that in fact

$$C_{\mathrm{bitshift}}(J_\varepsilon) = \sup_{F} P(F_\varepsilon \mid \mathcal{A}^{\mathbb{Z}}), \tag{16}$$

where the supremum is taken over all compensation functions $F$. Indeed, suppose
$Q^* = (\mathbb{P}^* \times J_\varepsilon) \circ \varphi^{-1}$ is a ‘maximal’ ergodic measure, i.e.,

$$C_{\mathrm{bitshift}}(J_\varepsilon) = h(Q^*) - h(J_\varepsilon).$$

Then there exist continuous functions $g^* \in C(\mathcal{O})$ and $\tilde{f}^* \in C(\mathcal{A}^{\mathbb{Z}})$ such that $Q^*$
and $\mathbb{P}^*$ are equilibrium states for $g^*$ and $\tilde{f}^*$, respectively. But then

$$F(x, \omega) = \tilde{f}^*(x) + j_\varepsilon(\omega) - (g^* \circ \varphi)(x, \omega)$$

is the compensation function for which the maximum in (16) is attained. Thus
methods for dealing with factor systems developed in dynamical systems could be
applied to estimate channel capacities. The practicality of such estimates depends
strongly on whether one is able to understand the structure of the class of compensation functions for a given channel. Probably, in many concrete cases, a relatively
large family of compensation functions will suffice as well.
References
[1] Abramov, L. M. (1959). The entropy of a derived automorphism. Dokl.
Akad. Nauk SSSR 128, 647–650. MR0113984
[2] Baggen, S., and Balakirsky, V. (2003). An efficient algorithm for computing the entropy of output sequences for bitshift channels. Proc. 24th Int.
Symposium on Information Theory in Benelux , 157–164.
[3] Birch, J. J. (1962). Approximation for the entropy for functions of Markov
chains. Ann. Math. Statist. 33, 930–938. MR0141162
[4] Birch, J. J. (1963). On information rates for finite-state channels. Information
and Control 6, 372–380. MR0162651
[5] Cover, T. M., and Thomas, J. A. (1991). Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons Inc., New York.
MR1122806
[6] Denker, M., Grillenberger, C., and Sigmund, K. (1976). Ergodic Theory on Compact Spaces. Springer-Verlag, Berlin. Lecture Notes in Mathematics
527. MR0457675
[7] Egner, S., Balakirsky, V., Tolhuizen, L., Baggen, S., and Hollmann, H. (2004). On the entropy rate of a hidden Markov model. Proceedings
International Symposium Information Theory, ISIT 2004.
[8] Froyland, G., Junge, O., and Ochs, G. (2001). Rigorous computation
of topological entropy with respect to a finite partition. Phys. D 154, 1–2,
68–84. MR1840806
[9] Immink, K. (1999). Codes for Mass Data Storage Systems. Shannon Foundation EEE Publishers, The Netherlands.
[10] Keane, M. (1972). Strongly mixing g-measures. Invent. Math. 16, 309–324.
MR0310193
[11] Krieger, W. (2006). On g-functions for subshifts. In Dynamics and Stochastics, IMS Lecture Notes-Monograph Series, Vol. 48, 306–316.
[12] Ledrappier, F., and Walters, P. (1977). A relativised variational principle
for continuous transformations. J. London Math. Soc. (2) 16, 3, 568–576.
MR0476995
[13] Lind, D., and Marcus, B. (1995). An Introduction to Symbolic Dynamics
and Coding. Cambridge University Press. MR1369092
[14] Maes, C., Redig, F., Takens, F., van Moffaert, A., and Verbitski, E.
(2000). Intermittency and weak Gibbs states. Nonlinearity 13, 5, 1681–1698.
MR1781814
[15] Marcus, B., Petersen, K., and Williams, S. (1984). Transmission rates
and factors of Markov chains. In Conference in Modern Analysis and Probability (New Haven, Conn., 1982), vol. 26 of Contemp. Math. Amer. Math. Soc.,
Providence, RI, pp. 279–293. MR1369092
[16] Marcus, B. H., Roth, R. M., and Siegel, P. H. (2001). An Introduction
to Coding of Constrained Systems. Lecture Notes, fifth edition.
[17] Ruelle, D. (1992). Thermodynamic formalism for maps satisfying positive
expansiveness and specification. Nonlinearity 5, 6, 1223–1236. MR1192516
[18] van Enter, A. C. D., and Verbitskiy, E. A. (2004). On the variational
principle for generalized Gibbs measures. Markov Process. Related Fields 10,
3, 411–434. MR2097865
[19] Walters, P. (1986). Relative pressure, relative equilibrium states, compensation functions and many-to-one codes between subshifts. Trans. Amer. Math.
Soc. 296, 1, 1–31. MR837796
[20] Yuri, M. (1999). Thermodynamic formalism for certain nonhyperbolic maps.
Ergodic Theory Dynam. Systems 19, 5, 1365–1378. MR1721626