
Entropy of a bit-shift channel

2006, Institute of Mathematical Statistics Lecture Notes - Monograph Series

IMS Lecture Notes–Monograph Series, Dynamics & Stochastics, Vol. 48 (2006) 274–285
© Institute of Mathematical Statistics, 2006. DOI: 10.1214/074921706000000293

Stan Baggen¹, Vladimir Balakirsky², Dee Denteneer¹, Sebastian Egner¹, Henk Hollmann¹, Ludo Tolhuizen¹ and Evgeny Verbitskiy¹
Philips Research Laboratories Eindhoven and Eindhoven University of Technology
¹ Philips Research Laboratories, Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands; e-mail: stan.baggen@philips.com, dee.denteneer@philips.com, sebastian.egner@philips.com, henk.d.l.hollmann@philips.com, ludo.tolhuizen@philips.com, evgeny.verbitskiy@philips.com
² TU Eindhoven, 5600 MB Eindhoven, The Netherlands; e-mail: V.B.Balakirsky@tue.nl
AMS 2000 subject classifications: primary 94A17, 28D20; secondary 58F11.
Keywords and phrases: Markov and hidden Markov processes, entropy, channel capacity, Gibbs vs. non-Gibbs.

Abstract: We consider a simple transformation (coding) of an iid source, called a bit-shift channel. This simple transformation occurs naturally in magnetic and optical data storage. The resulting process is not Markov of any order. We discuss methods of computing the entropy of the transformed process, and study some of its properties.

The results presented in this paper originate from the discussions we had at the "Coding Club", the weekly seminar on coding theory at the Philips Research Laboratories in Eindhoven. Mike Keane, when his active travelling schedule permits, also attends this seminar. We would like to use this opportunity to thank Mike for his active participation, for pleasant and fruitful discussions, and for the inspiration he has shared with us.

1. Bit-shift channel

In this paper we consider a simplified model for errors occurring in the readout of digital information stored on an optical recording medium such as the Compact Disc (CD) or the Digital Versatile Disc (DVD). For more detailed information on optical storage see [9] or [16]. On an optical disk the information is stored in a reflectivity pattern. For technical reasons it is advantageous to use only two states, i.e. "low" and "high" reflectivity. Figure 1 shows the disk surfaces of two types of DVDs. While the presence of only two states greatly simplifies the detection of the state, it reduces the maximum spatial frequency, and hence the storage capacity.
In this situation it is better not to encode the information in the reflectivity state itself but rather in the location of the transitions: the reflectivity pattern consists of an alternating sequence of "high" and "low" marks of varying length (an integer multiple of some small length unit), where each mark exceeds a minimal length, say d + 1 units. Hence this "run-length limited" (RLL) encoding makes sure that no mark is too short for the disk, while the information density is limited only by the accuracy of determining the length of the marks, or equivalently the location of the transitions. For technical reasons (to recover the length unit from the signal itself) another constraint is imposed: no mark may exceed k + 1 units, k > d. (For the CD, (d, k) = (2, 10).)

Fig 1. Images of DVD disks. The left image shows a DVD-ROM; the track is formed by pressing the disk mechanically onto a master disk. On the right is an image of a rewritable disk; the resolution has been increased to demonstrate the irregularities in the track produced by the laser. These irregularities lead to higher probabilities of jitter errors.

It is customary to describe RLL sequences by their transitions: a (d, k)-RLL sequence has at least d and at most k '0's between '1's. So a "high" mark of 4 units, followed by a "low" mark of 3 units, followed by a "high" mark of 4 units, corresponds to the RLL-sequence 100010010001 written to the disk. When the RLL-sequence is read from the disk, the transitions (the '1's) may be detected at different positions due to noise, inter-symbol interference, clock jitter, and other distortions. In the simplest version of this "bit-shift channel" model, each '1' may be detected one unit early, on time, or one unit late with probabilities (ε, 1 − 2ε, ε), 0 ≤ ε ≤ 1/2, and the shifts are independent.

More formally, suppose X is the length of a continuous interval of low or high marks on the disk. Then, after reading, the detected length is

Y = X + ω_left − ω_right,    (1)

where ω_left and ω_right take values in {−1, 0, 1}, and ω = 1, 0, −1 means that the transition between the "low"–"high" or "high"–"low" runs was detected one time unit too early, correctly, or one time unit too late, respectively. Note that for two consecutive intervals, ω_right of the first interval is ω_left of the second. The simplest model for the distribution of the time shifts ω_left is to assume that they are independent for different intervals (runs), and

P(ω_left = −1) = P(ω_left = 1) = ε,  P(ω_left = 0) = 1 − 2ε,

for some ε ∈ [0, 1/2].
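To make the channel model concrete, here is a minimal simulation sketch of (1): it draws iid run lengths from {d, . . . , k}, one shift per transition with probabilities (ε, 1 − 2ε, ε), and forms the detected run lengths. The function name simulate_bitshift and the uniform run-length distribution are illustrative choices, not taken from the paper.

import numpy as np

def simulate_bitshift(n_runs, d=2, k=10, eps=0.05, seed=None):
    """Simulate the bit-shift channel of equation (1).

    Run lengths X are iid uniform on {d, ..., k} (any distribution could be
    plugged in).  Each transition is shifted by -1, 0 or +1 time units with
    probabilities (eps, 1 - 2*eps, eps), independently of everything else;
    a shift of +1 means the transition is detected one unit too early.
    The detected run length is Y = X + w_left - w_right, where w_right of a
    run equals w_left of the next run (they share a transition).
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(d, k + 1, size=n_runs)                     # input run lengths
    # one shift per transition; run n is delimited by transitions n and n+1
    w = rng.choice([-1, 0, 1], size=n_runs + 1, p=[eps, 1 - 2 * eps, eps])
    y = x + w[:-1] - w[1:]                                      # equation (1)
    return x, y

if __name__ == "__main__":
    x, y = simulate_bitshift(10, eps=0.1, seed=0)
    print("input :", x)
    print("output:", y)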
An important question then is: given (d, k), ε, and some distribution of the input sequences (e.g. run-lengths uniformly distributed in {d, . . . , k}), what is the mutual information between the input and output sequences? In other words, how much can be learned about the input from observing the output, on average? The problem of computing the mutual information is equivalent to the computation of the entropy of the output sequence, see [2]. The supremum of this mutual information over all possible measures on the space of input sequences is called the "channel capacity".

1.1. Model

Let us describe the bit-shift channel as a continuous transformation (factor) of a certain subshift of finite type. Let A = {d, . . . , k}, where d, k ∈ N, d < k and d ≥ 2. The input space is then

A^Z = {x = (x_i) : x_i ∈ A}.

Consider also a finite alphabet Ω with 9 symbols,

Ω = {(−1, −1), (−1, 0), (−1, 1), (0, −1), (0, 0), (0, 1), (1, −1), (1, 0), (1, 1)}.

Finally, consider the subshift of finite type Ω_J ⊂ Ω^Z defined as

Ω_J = { (ω_n) ∈ Ω^Z : ω_{n,2} = ω_{n+1,1} for all n ∈ Z },

where ω_n = (ω_{n,1}, ω_{n,2}). The factor map φ is defined on A^Z × Ω_J as follows: y = φ(x, ω) with

y_n = x_n + ω_{n,1} − ω_{n,2} for all n.    (2)

Note that the output space O = φ(A^Z × Ω_J) is a subshift of B^Z, with B = {d − 2, . . . , k + 2}. Clearly, O ≠ B^Z. For example, (d − 2, d − 2) cannot occur in any output sequence. Indeed, if y_n = d − 2, then x_n = d, ω_{n,1} = −1 and ω_{n,2} = 1. But then

y_{n+1} = x_{n+1} + ω_{n+1,1} − ω_{n+1,2} = x_{n+1} + ω_{n,2} − ω_{n+1,2} ≥ d + 1 − 1 = d.

With a similar argument one concludes that for any L ≥ 1 the words

[d − 2, d, . . . , d, d − 2] and [d − 2, d, . . . , d, d − 1]  (with d repeated L times)

do not occur in any output sequence y ∈ O. Therefore there is an infinite number of minimal forbidden words, i.e., forbidden words all of whose proper subwords are allowed. Hence O is not a subshift of finite type. However, since A^Z × Ω_J is a subshift of finite type, O is sofic [13].

1.2. Capacity of a bit-shift channel

Suppose J is a measure on Ω_J. In this paper we will mainly be interested in the Markov measures J_ε on Ω_J, obtained in a natural way from Bernoulli measures on {−1, 0, 1} with probabilities ε, 1 − 2ε and ε, respectively. If P is a translation invariant measure on A^Z, then we obtain a measure Q on O which is the push-forward of P × J. We use the standard notation Q = (P × J) ∘ φ^{−1}. From the information-theoretic point of view, an important quantity is the capacity of the channel. The capacity of the bit-shift channel specified by J is defined as

C_bitshift(J) = sup_{P ∈ P(A^Z)} [ h((P × J) ∘ φ^{−1}) − h(J) ],    (3)

where the supremum is taken over P(A^Z), the set of all translation invariant probability measures on A^Z, and h(·) denotes the entropy. Even for the 'Bernoulli' measures J_ε, the capacity C_bitshift(J_ε) is not known. It is relatively easy to see that the supremum in (3) is achieved. However, the properties of the maximizing measures are not known; it is expected that maximizing measures are not Markov of any order. Finally, if one is interested in the topological entropy of O,

h_top(O) = sup_{Q ∈ P(O)} h(Q),

then h_top(O) is easily computable using standard methods [13] or the efficient numerical approach of [8].

2. Entropy of a bit-shift channel

Suppose that {X_n} are independent identically distributed random variables taking values in A = {d, . . . , k}, and let P be the corresponding distribution. What is the entropy of Q = (P × J_ε) ∘ φ^{−1}? Note that (X_n, ω_n) is a Markov chain, and hence Y_n, given by (2), is a function of a Markov chain.
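Since (X_n, ω_n) is a Markov chain and Y_n is a function of it, the output process admits an explicit hidden Markov (function-of-Markov-chain) representation. The following sketch builds that representation for later numerical experiments; the function name build_bitshift_hmm and the state encoding (x, ω_1, ω_2) are our own illustrative choices under the assumptions of Section 1.1.

import numpy as np

def build_bitshift_hmm(d, k, p, eps):
    """Markov chain (X_n, omega_n) of Section 1.1 together with the output map (2).

    p   : dict mapping each run length l in {d, ..., k} to its probability p_l
    eps : shift probability; each transition is shifted by -1, 0, +1 with
          probabilities (eps, 1 - 2*eps, eps)
    Returns (states, T, pi, out): the list of states (x, w1, w2), the transition
    matrix T, the stationary distribution pi, and out[i] = x + w1 - w2, the
    output symbol emitted in state i.
    """
    shifts = (-1, 0, 1)
    q = {-1: eps, 0: 1.0 - 2.0 * eps, 1: eps}
    states = [(x, w1, w2) for x in range(d, k + 1) for w1 in shifts for w2 in shifts]
    m = len(states)
    T = np.zeros((m, m))
    for i, (x, w1, w2) in enumerate(states):
        for j, (x2, v1, v2) in enumerate(states):
            if v1 == w2:                       # constraint omega_{n,2} = omega_{n+1,1}
                T[i, j] = p[x2] * q[v2]        # new run length and new shift are independent
    pi = np.array([p[x] * q[w1] * q[w2] for (x, w1, w2) in states])   # stationary law
    out = np.array([x + w1 - w2 for (x, w1, w2) in states])           # output map (2)
    return states, T, pi, out

if __name__ == "__main__":
    d, k, eps = 2, 10, 0.05
    p = {l: 1.0 / (k - d + 1) for l in range(d, k + 1)}               # uniform run lengths
    states, T, pi, out = build_bitshift_hmm(d, k, p, eps)
    assert np.allclose(T.sum(axis=1), 1.0) and np.allclose(pi @ T, pi)
    h_J = -2 * eps * np.log2(eps) - (1 - 2 * eps) * np.log2(1 - 2 * eps)
    print(len(states), "hidden states;  h(J_eps) =", round(h_J, 4), "bits per run")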
Let us start by recalling some methods of computing the entropy of processes which are functions of Markov chains. Suppose X = {X_n}, n ∈ Z, is a stationary ergodic Markov chain taking values in a finite alphabet A. Let φ : A → B be some map, and consider the process Y = {Y_n} defined by Y_n = φ(X_n) for all n ∈ Z. The following result [3, 4], see also [5, Theorem 4.4.1], provides sharp estimates on the entropy of Y.

Theorem 2.1. If X is a Markov chain and Y = φ(X), then for every n ≥ 1 one has

H(Y_0 | Y_1, . . . , Y_{n−1}, X_n) ≤ h(Y) ≤ H(Y_0 | Y_1, . . . , Y_{n−1}, Y_n).

Moreover, as n ↗ ∞,

H(Y_0 | Y_1, . . . , Y_{n−1}, X_n) ↗ h(Y),  H(Y_0 | Y_1, . . . , Y_{n−1}, Y_n) ↘ h(Y).

Birch [3, 4] has shown that under some additional conditions the convergence is in fact exponential:

|h(Y) − H(Y_0 | Y_1, . . . , Y_{n−1}, X_n)| ≤ Cρ^n,  |h(Y) − H(Y_0 | Y_1, . . . , Y_{n−1}, Y_n)| ≤ Cρ^n,

where ρ ∈ (0, 1) is independent of the factor map φ.

Let us give a proof of Theorem 2.1, since it is very short and provides some useful intuition.

Proof of Theorem 2.1. The upper estimate of h(Y) in terms of H(Y_0 | Y_1, . . . , Y_n) and the monotone convergence of H(Y_0 | Y_1, . . . , Y_n) to h(Y) are standard facts. For the lower estimate we proceed as follows: for any m ∈ N one has

H(Y_0 | Y_1, . . . , Y_{n−1}, X_n) = H(Y_0 | Y_1, . . . , Y_{n−1}, X_n, . . . , X_{n+m})    (4)
 = H(Y_0 | Y_1, . . . , Y_{n−1}, Y_n, . . . , Y_{n+m}, X_n, . . . , X_{n+m})    (5)
 ≤ H(Y_0 | Y_1, . . . , Y_{n−1}, Y_n, . . . , Y_{n+m}),    (6)

where in (4) we used the Markov property of X, and (5), (6) follow from the standard properties of conditional entropies. Since

h(Y) = lim_{m→∞} H(Y_0 | Y_1, . . . , Y_{n−1}, Y_n, . . . , Y_{n+m}),

we obtain the lower estimate of h(Y). Moreover, using standard properties of conditional entropies, we immediately conclude that H(Y_0 | Y_1, . . . , Y_{n−1}, X_n) is monotonically increasing in n.

To prove that the lower bound actually converges to h(Y), we proceed as follows. Note that

H(X_{n+1}) ≥ H(X_{n+1} | Y_n) − H(X_{n+1} | Y_n, . . . , Y_0)
 = Σ_{i=0}^{n−1} [ H(X_{n+1} | Y_n, . . . , Y_{i+1}) − H(X_{n+1} | Y_n, . . . , Y_i) ]
 = Σ_{i=0}^{n−1} [ H(X_{n−i+1} | Y_{n−i}, . . . , Y_1) − H(X_{n−i+1} | Y_{n−i}, . . . , Y_0) ]
 = Σ_{j=1}^{n} [ H(X_{j+1} | Y_j, . . . , Y_1) − H(X_{j+1} | Y_j, . . . , Y_0) ] = Σ_{j=1}^{n} c_j,

where c_j = H(X_{j+1} | Y_j, . . . , Y_1) − H(X_{j+1} | Y_j, . . . , Y_0), j = 1, . . . , n. Since c_j ≥ 0 and Σ_{j=1}^{n} c_j < H(X_1) < ∞ for all n, we conclude that c_n → 0 as n → ∞. Moreover,

c_n = H(X_{n+1} | Y_n, . . . , Y_1) − H(X_{n+1} | Y_n, . . . , Y_0)
 = H(Y_1, . . . , Y_n, X_{n+1}) − H(Y_1, . . . , Y_n) − H(Y_0, Y_1, . . . , Y_n, X_{n+1}) + H(Y_0, Y_1, . . . , Y_n)
 = H(Y_0 | Y_1, . . . , Y_n) − H(Y_0 | Y_1, . . . , Y_n, X_{n+1}).

Finally, since H(Y_0 | Y_1, . . . , Y_n) converges to h(Y), so does H(Y_0 | Y_1, . . . , Y_n, X_{n+1}).

Let us conclude this section with one general remark. Suppose Y is a factor of X, i.e. Y = φ(X), where X is some ergodic process. For n, m ∈ N, let

d_{n,m} = H(Y_0 | Y_1, . . . , Y_n, X_{n+1}, . . . , X_{n+m}).

Note that d_{n,m} ≥ d_{n,m+1} ≥ 0, and hence lim_{m→∞} d_{n,m} =: D_n exists. Since for any n, m ∈ N

d_{n,m} = H(Y_0 | Y_1, . . . , Y_n, X_{n+1}, . . . , X_{n+m}) ≤ H(Y_0 | Y_1, . . . , Y_n, Y_{n+1}, . . . , Y_{n+m}),

we conclude that D_n ≤ h(Y). Note also that since d_{n,m} ≤ d_{n+1,m−1}, one has D_{n+1} ≥ D_n. The natural question is under which conditions D_n converges to h(Y) as n → ∞. For this we need a certain regularity of the conditional probabilities of the X-process. For example, if the conditional probabilities are continuous, i.e., if

r_n = sup_{X_0,...,X_n} sup_{X′,X′′} | P(X_0 | X_1, . . . , X_n, X′_{n+1}, X′_{n+2}, . . .) − P(X_0 | X_1, . . . , X_n, X′′_{n+1}, X′′_{n+2}, . . .) | → 0 as n → ∞,

then D_n → h(Y). Gibbs measures and g-measures (see Section 4) have continuous conditional probabilities.
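As a small numerical illustration of Theorem 2.1, the sketch below computes both bounds exactly for the bit-shift channel, reusing build_bitshift_hmm from the sketch in Section 1.1. For a stationary function of a Markov chain, the upper bound is a difference of block entropies, H(Y_0 | Y_1, . . . , Y_n) = H(Y_0, . . . , Y_n) − H(Y_1, . . . , Y_n), and the lower bound is the analogous difference with X appended, so both can be obtained from a forward recursion over the hidden states. The function names and the small parameters (d, k) = (2, 4) and n are ours, kept small only so that the enumeration of output words stays cheap.

import numpy as np

def forward_tree(T, pi, out, m):
    """Map each positive-probability word (y_0, ..., y_{m-1}) to the vector
    alpha with alpha[s] = P(Y_0 = y_0, ..., Y_{m-1} = y_{m-1}, S_m = s)."""
    level = {(): pi.copy()}
    for _ in range(m):
        nxt = {}
        for w, alpha in level.items():
            for y in np.unique(out):
                beta = np.where(out == y, alpha, 0.0) @ T    # emit y, then advance one step
                if beta.sum() > 0.0:
                    nxt[w + (int(y),)] = beta
        level = nxt
    return level

def neg_xlogx(p):
    p = np.asarray(p, dtype=float)
    return -np.where(p > 0, p * np.log2(p), 0.0).sum()

def theorem21_bounds(T, pi, out, xs, n):
    """Lower and upper bounds of Theorem 2.1 for a given n."""
    H = {m: neg_xlogx([a.sum() for a in forward_tree(T, pi, out, m).values()])
         for m in (n, n + 1)}                                 # block entropies H(Y_0..Y_{m-1})
    G = {}
    for m in (n - 1, n):                                      # entropies H(Y_0..Y_{m-1}, X_m)
        tree = forward_tree(T, pi, out, m)
        G[m] = neg_xlogx([a[xs == x].sum() for a in tree.values() for x in np.unique(xs)])
    return G[n] - G[n - 1], H[n + 1] - H[n]

if __name__ == "__main__":
    d, k, eps = 2, 4, 0.05
    p = {l: 1.0 / (k - d + 1) for l in range(d, k + 1)}
    states, T, pi, out = build_bitshift_hmm(d, k, p, eps)
    xs = np.array([x for (x, w1, w2) in states])
    for n in (1, 2, 3):
        lo, hi = theorem21_bounds(T, pi, out, xs, n)
        print(f"n = {n}:  {lo:.4f} <= h(Y) <= {hi:.4f}  bits per run")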
2.1. Entropy via a prefix code

In this section we recall the approach to the efficient computation of entropies of factor processes Y = φ(X), where X is Markov, which was originally proposed in [2, 7]. The inequalities of Theorem 2.1 can be rewritten as follows:

Σ_{y_1^n ∈ B^n} P(y_1^n) H_{P(·|y_1^n)}(Y_0 | X_{n+1}) ≤ h(Y) ≤ Σ_{y_1^n ∈ B^n} P(y_1^n) H_{P(·|y_1^n)}(Y_0),    (7)

where we use the following notation:

y_1^n = (y_1, y_2, . . . , y_n) ∈ B^n,
P(y_1^n) = P(Y_1 = y_1, . . . , Y_n = y_n),
P(·|y_1^n) = P(· | Y_1 = y_1, . . . , Y_n = y_n).

The subindex P(·|y_1^n) in (7) stresses that the entropy of Y_0 and the conditional entropy of Y_0 given X_{n+1} are computed using P(·|y_1^n). Note that the sums in (7) are taken over the elements of a partition of B^Z into cylinders of length n:

U_n = { [y_1^n] : y_1^n ∈ B^n },  [y_1^n] = {ỹ ∈ B^Z : ỹ_1 = y_1, . . . , ỹ_n = y_n}.

In fact, an estimate similar to (7) holds for any partition of B^Z into cylinder sets, see [7, Theorem 1].

Theorem 2.2. Let W = { [w_i] : w_i = (w_{i,1}, . . . , w_{i,l_i}), i = 1, . . . , M } be a finite partition of B^Z into cylinder sets. Then

Σ_{w ∈ W} h_1(w) ≤ h(Y) ≤ Σ_{w ∈ W} h(w),    (8)

where

h_1(w) = P(Y_1 . . . Y_{|w|} = w) H(Y_0 | Y_1 . . . Y_{|w|} = w, X_{|w|+1}),
h(w) = P(Y_1 . . . Y_{|w|} = w) H(Y_0 | Y_1 . . . Y_{|w|} = w).

Theorem 2.2 leads to the following algorithm. Suppose W is some partition into cylinders. We can refine the partition W by removing a certain word w from W and adding all words of the form wb, where b ∈ B, i.e.,

W′ = W \ {w} ∪ {wb : b ∈ B}.    (9)

Suppose {W_k}_{k≥1} is a sequence of partitions such that for each k, W_{k+1} is a refinement of W_k as in (9), and at each step a word w ∈ W_k is selected such that

h(w) − h_1(w) = max_{u ∈ W_k} ( h(u) − h_1(u) ).    (10)

The greedy strategy (10), as well as some other strategies (e.g. the uniform strategy, |w| = min_{u ∈ W} |u|), guarantees the convergence of the upper and lower estimates in (8), i.e.,

lim_{k→∞} Σ_{w ∈ W_k} ( h(w) − h_1(w) ) = 0.
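The refinement scheme (9)–(10) can be implemented quite directly. The sketch below does so, reusing build_bitshift_hmm from the sketch in Section 1.1; the function names cylinder_terms and greedy_entropy_bounds are ours, and the code recomputes the forward probabilities of each cylinder from scratch, so it is far less efficient than the implementation of [2, 7], which is discussed further in Section 3.

import heapq
import numpy as np

def neg_xlogx(p):
    p = np.asarray(p, dtype=float)
    return -np.where(p > 0, p * np.log2(p), 0.0).sum()

def cylinder_terms(w, T, pi, out, xs, B):
    """h(w) and h_1(w) of Theorem 2.2 for the cylinder [w], w = (w_1, ..., w_m)."""
    # joint[b] = P(Y_0 = b, Y_1..Y_m = w, S_{m+1} = .), by a forward recursion
    joint = []
    for b in B:
        alpha = np.where(out == b, pi, 0.0) @ T
        for y in w:
            alpha = np.where(out == y, alpha, 0.0) @ T
        joint.append(alpha)
    p_bw = np.array([a.sum() for a in joint])                   # P(Y_0 = b, w)
    p_w = p_bw.sum()                                            # P(w)
    if p_w == 0.0:
        return 0.0, 0.0, 0.0
    h_w = neg_xlogx(p_bw) - neg_xlogx([p_w])                    # P(w) H(Y_0 | w)
    # project the hidden state after w onto its X-component for the lower bound
    p_bwx = np.array([[a[xs == x].sum() for x in np.unique(xs)] for a in joint])
    h1_w = neg_xlogx(p_bwx) - neg_xlogx(p_bwx.sum(axis=0))      # P(w) H(Y_0 | w, X_{m+1})
    return p_w, h_w, h1_w

def greedy_entropy_bounds(T, pi, out, xs, max_cylinders=500):
    """Greedy refinement (9)-(10); returns lower and upper bounds on h(Y)."""
    B = sorted(set(int(b) for b in out))
    heap, lower, upper = [], 0.0, 0.0
    for b in B:                                     # initial partition: cylinders of length 1
        p_w, h_w, h1_w = cylinder_terms((b,), T, pi, out, xs, B)
        lower, upper = lower + h1_w, upper + h_w
        heapq.heappush(heap, (h1_w - h_w, (b,), h_w, h1_w))     # most negative gap pops first
    n_cyl = len(B)
    while heap and n_cyl < max_cylinders:
        gap, w, h_w, h1_w = heapq.heappop(heap)
        if gap > -1e-12:                            # remaining gaps are (numerically) zero
            break
        lower, upper = lower - h1_w, upper - h_w
        for b in B:                                 # refine [w] into {[wb] : b in B}
            p2, h2, h12 = cylinder_terms(w + (b,), T, pi, out, xs, B)
            if p2 == 0.0:
                continue
            lower, upper = lower + h12, upper + h2
            heapq.heappush(heap, (h12 - h2, w + (b,), h2, h12))
            n_cyl += 1
        n_cyl -= 1
    return lower, upper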
2.2. Entropy via renewal times

As before, suppose that {X_n} are independent and identically distributed in {d, . . . , k} with P(X_i = ℓ) = p_ℓ, ℓ = d, . . . , k. Assume also that p_d > 0. Another method for estimating the entropy is based on the following observation. Suppose Y_n = d − 2 for some n. This implies that X_n = d and ω_n = (−1, 1). Since the sequence {ω_k} forms a Markov chain, (. . . , ω_1, . . . , ω_{n−1}) and (ω_{n+1}, ω_{n+2}, . . .) are independent given ω_n. Therefore, since Y_n = d − 2 implies ω_n = (−1, 1), we conclude that (. . . , ω_1, . . . , ω_{n−1}) and (ω_{n+1}, ω_{n+2}, . . .) are independent given Y_n = d − 2. Moreover, since the X_n form an iid sequence, (. . . , Y_{n−2}, Y_{n−1}) and (Y_{n+1}, Y_{n+2}, . . .) are also independent given Y_n = d − 2.

Consider our subshift O and the set C = [d − 2] = {y ∈ O : y_0 = d − 2}. Let S : O → O be the left shift, and consider the induced map S_C on C:

S_C(y) = S^{R_C(y)}(y),  where R_C(y) = min{k ≥ 1 : y_k = d − 2}.

On C, the induced map S_C has a natural Bernoulli partition

{ [d − 2, y_1, . . . , y_r, d − 2] : y_j ∈ B, y_j ≠ d − 2 for j = 1, . . . , r, r ∈ N }.

Finally, by the Abramov formula [1],

h(Q) = − Σ_{r=1}^{∞} Σ_{y_1,...,y_r ≠ d−2} Q([d − 2, y_1, . . . , y_r, d − 2]) log Q([d − 2, y_1, . . . , y_r, d − 2]) + Q([d − 2]) log Q([d − 2]).    (11)

Computation of the entropy of images of Markov measures using renewal times and induced maps has been used in the past, see e.g. [15]. However, in the case of the bit-shift channel, the method based on (11) is extremely inefficient.

3. Numerics

For illustration we present a numerical computation of the entropy using the prefix code method described in Section 2.1. The algorithm constructs a sequence of refined partitions W_k as described above. A particularly useful strategy is given by (10). This "greedy" heuristic selects the cylinder most responsible for the difference between the upper and lower bounds, in the hope that refining this cylinder will tighten the bounds quickly. This strategy is not optimal (as can be shown by example), but it has three advantages. Firstly, the bounds converge (eventually). Secondly, if the last symbol of a word w ∈ W is the "renewal" symbol d − 2 (similarly k + 2), this word will never be refined again. Thirdly, the next cylinder to expand can be found quickly by representing W as a "priority queue" data structure.

For illustration, we run the algorithm for the model of the jitter channel described in Section 1.1. The parameters are inspired by the Compact Disc: the error-correction and modulation system of the CD essentially produces an RLL-sequence with parameters (d, k) = (2, 10). We model the run-lengths as independent identically distributed random variables with probabilities

p_ℓ = p_2 γ^{ℓ−2},  ℓ ∈ {2, . . . , 10},

where γ = 0.658 and p_2 is chosen such that Σ_ℓ p_ℓ = 1. This truncated geometric model with γ = 0.658 is a very good approximation of the (marginal) run-length distribution observed on the CD. Figure 2 shows the mutual information

I(Y; X) = h(Y) − h(J_ε) = h(Y) + 2ε log ε + (1 − 2ε) log(1 − 2ε)

as a function of ε. The horizontal line represents the rate designed for the last stage of the encoding used in the CD (the so-called EFM code). If the jitter is so strong that the mutual information drops below this rate, reliable decoding is impossible. In practice, similar plots are used to evaluate the performance of particular encoding schemes with respect to various distortions introduced by the physical channel.

Fig 2. Mutual information I(Y; X) = h(Y) − h(J_ε) as a function of ε, for (d, k) = (2, 10) and the truncated geometric distribution for X. (Axes: ε from 0.001 to 0.1 on a logarithmic scale versus bits/run from 1.4 to 2.6; the horizontal line marks the EFM/CD rate.)

Figure 3 compares the greedy and uniform heuristics. The standard estimate H(Y_0 | Y_1, . . . , Y_n) in fact corresponds to the uniform refinement. Observe the superior rate of convergence of the greedy refinement strategy.

Fig 3. Difference between the upper and lower bounds on the entropy as a function of |W|, the number of cylinders, in partitions built by the greedy and uniform refinement strategies. (Both axes logarithmic: 10 to 100000 cylinders versus a gap of 0.1 down to 1e-08 bit/run.)
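Putting the earlier sketches together, the following fragment roughly reproduces the quantity plotted in Figure 2 for a single value of ε. It assumes build_bitshift_hmm (Section 1.1) and greedy_entropy_bounds (Section 2.1) from the sketches above are in scope; the truncated geometric distribution and γ = 0.658 are taken from the text, while the budget of 2000 cylinders is an arbitrary choice, so the resulting bounds are looser than those reported in the paper.

import numpy as np

d, k, gamma, eps = 2, 10, 0.658, 0.01
weights = {l: gamma ** (l - 2) for l in range(d, k + 1)}
Z = sum(weights.values())
p = {l: w / Z for l, w in weights.items()}            # p_l = p_2 * gamma**(l-2), summing to 1

states, T, pi, out = build_bitshift_hmm(d, k, p, eps)
xs = np.array([x for (x, w1, w2) in states])

lower, upper = greedy_entropy_bounds(T, pi, out, xs, max_cylinders=2000)
h_J = -2 * eps * np.log2(eps) - (1 - 2 * eps) * np.log2(1 - 2 * eps)   # h(J_eps) in bits
print(f"h(Y)   in [{lower:.4f}, {upper:.4f}] bits per run")
print(f"I(Y;X) in [{lower - h_J:.4f}, {upper - h_J:.4f}] bits per run")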
4. Thermodynamics of jittered measures

Bernoulli and Markov measures belong to a wider class of the so-called Gibbs measures. Bernoulli and Markov measures are also examples of g-measures. In the seminal paper [10], M. Keane introduced the class of g-measures: these are the measures whose conditional probabilities are given by a continuous and strictly positive function g. For subshifts of finite type, the theory of g-measures is extensive. For sofic subshifts, the problem of defining g-measures is much more complicated; for the first results see the paper by W. Krieger [11] in this volume.

The thermodynamic formalism allows one to look at Gibbs measures from two different sides: first, locally, through the conditional probabilities; and second, globally, through variational principles. Contrary to the class of g-measures, the class of Gibbs measures for a sofic subshift is well defined. The natural question is whether a "jittered" measure Q = (P × J_ε) ∘ φ^{−1} is Gibbs. If the measure is Gibbs and the potential is identified, then, using the variational principle, we obtain another method of computing the entropy of Q.

The subshift O ⊂ B^Z satisfies a specification property (as a factor of the subshift of finite type A^Z × Ω_J, which has a specification property [6]). Hence the results of [17] on the existence of Gibbs measures for expansive dynamical systems with the specification property are applicable. If Q were a Gibbs measure for a potential f from the Bowen class V(O), then there would exist positive constants c, C such that for every n > 0 and every y ∈ O

c ≤ Q([y_0, . . . , y_n]) / exp( Σ_{i=0}^{n} f(S^i(y)) − (n + 1) P(f) ) ≤ C,    (12)

where S : O → O is the left shift and P(f) is the topological pressure of f. As a corollary of (12) one easily concludes that

log Q(y_0 | y_1, . . . , y_n) = log [ Q([y_0, . . . , y_n]) / Q([y_1, . . . , y_n]) ]

should be bounded, which is not the case, see [18]. Hence Q is not Gibbs for any potential from the large class of potentials V(O). Examples of measures Q such that estimates similar to (12) hold for some continuous f and subexponential bounds c_n and C_n (with lim_n n^{−1} log c_n = lim_n n^{−1} log C_n = 0) have been considered in [14, 20], and were shown to be weakly Gibbs. It is not known whether Q for the bit-shift channel is weakly Gibbs for some continuous potential f.

Nevertheless, the thermodynamic formalism could be useful in estimating the capacity of the bit-shift channel. We recall the notion of a compensation function and some results summarized in [19]. First of all, we define the topological pressure of real-valued continuous functions defined on A^Z × Ω_J and on O. If f ∈ C(A^Z × Ω_J) and g ∈ C(O), the topological pressures of f and g are defined as

P(f | A^Z × Ω_J) = sup_S ( h(S) + ∫_{A^Z × Ω_J} f dS ),  P(g | O) = sup_Q ( h(Q) + ∫_O g dQ ),

where the suprema are taken over all translation invariant measures on A^Z × Ω_J and on O, respectively. A measure S on A^Z × Ω_J is called an equilibrium state for f ∈ C(A^Z × Ω_J) if

P(f | A^Z × Ω_J) = h(S) + ∫ f dS.    (13)

Equilibrium states on O are defined in a similar way. It is well known that every measure is an equilibrium state: for every translation invariant measure S on A^Z × Ω_J one can find a continuous function f : A^Z × Ω_J → R such that (13) holds. Moreover, for any S = P × J_ε, such an f is of the special form f(x, ω) = f̃(x) + j_ε(ω), where f̃ : A^Z → R and j_ε : Ω_J → R are continuous functions. (In fact, j_ε can be found explicitly.)

A continuous function F : A^Z × Ω_J → R is a compensation function if

P(F + g ∘ φ | A^Z × Ω_J) = P(g | O) for all g ∈ C(O).

Compensation functions exist for factor maps defined on shifts of finite type [19]. An important result is the so-called relative variational principle [12, 19], which in our notation states that F is a compensation function if and only if for any invariant measure Q on O one has

h(Q) = sup { h(S) + ∫ F dS : S ∘ φ^{−1} = Q }.

Suppose F is a compensation function. Then for Q = (P × J_ε) ∘ φ^{−1} we obtain

h(Q) ≥ h(P × J_ε) + ∫ F d(P × J_ε) = h(P) + h(J_ε) + ∫ F_ε dP,    (14)

where F_ε(x) = ∫_{Ω_J} F(x, ω) J_ε(dω). For the capacity of the bit-shift channel we obtain the following lower estimate:

C_bitshift(J_ε) = sup_{Q = (P × J_ε) ∘ φ^{−1}} ( h(Q) − h(J_ε) ) ≥ sup_P ( h(P) + ∫ F_ε dP ) = P(F_ε | A^Z).    (15)

An interesting question is whether the inequalities in (14) and (15) are strict. The inequality (14) is most probably strict in the generic situation.
Indeed, by Corollary 3.4 of [19], if Q is an equilibrium state for g on O, and S is such that S ∘ φ^{−1} = Q and

h(Q) = h(S) + ∫ F dS,

then S is an equilibrium state for F + g ∘ φ, and conversely. On the other hand, if S = P × J_ε, then S is an equilibrium state for f(x, ω) = f̃(x) + j_ε(ω). Therefore, for equality in (14) it is necessary that F(x, ω) + (g ∘ φ)(x, ω) and f̃(x) + j_ε(ω) are physically equivalent, i.e., have the same set of equilibrium states. In fact, it is quite difficult to imagine how, for a given compensation function F of the bit-shift channel and a generic g, one could find f̃ satisfying this requirement of physical equivalence. On the other hand, it is not very difficult to see that in fact

C_bitshift(J_ε) = sup_F P(F_ε | A^Z),    (16)

where the supremum is taken over all compensation functions F. Indeed, suppose Q* = (P* × J_ε) ∘ φ^{−1} is a 'maximal' ergodic measure, i.e., C_bitshift(J_ε) = h(Q*) − h(J_ε). Then there exist continuous functions g* ∈ C(O) and f̃* ∈ C(A^Z) such that Q* and P* are equilibrium states for g* and f̃*, respectively. But then

F(x, ω) = f̃*(x) + j_ε(ω) − (g* ∘ φ)(x, ω)

is a compensation function for which the supremum in (16) is attained. Thus, methods for dealing with factor systems developed in dynamical systems could be applied to estimate channel capacities. The practicality of such estimates depends strongly on whether one is able to understand the structure of the class of compensation functions for a given channel. Probably, in many concrete cases, a relatively large family of compensation functions will suffice as well.

References

[1] Abramov, L. M. (1959). The entropy of a derived automorphism. Dokl. Akad. Nauk SSSR 128, 647–650. MR0113984
[2] Baggen, S. and Balakirsky, V. (2003). An efficient algorithm for computing the entropy of output sequences for bitshift channels. Proc. 24th Int. Symposium on Information Theory in the Benelux, 157–164.
[3] Birch, J. J. (1962). Approximation for the entropy for functions of Markov chains. Ann. Math. Statist. 33, 930–938. MR0141162
[4] Birch, J. J. (1963). On information rates for finite-state channels. Information and Control 6, 372–380. MR0162651
[5] Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons, New York. MR1122806
[6] Denker, M., Grillenberger, C. and Sigmund, K. (1976). Ergodic Theory on Compact Spaces. Lecture Notes in Mathematics 527. Springer-Verlag, Berlin. MR0457675
[7] Egner, S., Balakirsky, V., Tolhuizen, L., Baggen, S. and Hollmann, H. (2004). On the entropy rate of a hidden Markov model. In Proceedings of the International Symposium on Information Theory, ISIT 2004.
[8] Froyland, G., Junge, O. and Ochs, G. (2001). Rigorous computation of topological entropy with respect to a finite partition. Phys. D 154, 1–2, 68–84. MR1840806
[9] Immink, K. (1999). Codes for Mass Data Storage Systems. Shannon Foundation Publishers, The Netherlands.
[10] Keane, M. (1972). Strongly mixing g-measures. Invent. Math. 16, 309–324. MR0310193
[11] Krieger, W. (2006). On g-functions for subshifts. In Dynamics & Stochastics, IMS Lecture Notes–Monograph Series, Vol. 48, 306–316.
[12] Ledrappier, F. and Walters, P. (1977). A relativised variational principle for continuous transformations. J. London Math. Soc. (2) 16, 3, 568–576. MR0476995
[13] Lind, D. and Marcus, B. (1995). An Introduction to Symbolic Dynamics and Coding. Cambridge University Press. MR1369092
[14] Maes, C., Redig, F., Takens, F., van Moffaert, A. and Verbitski, E. (2000). Intermittency and weak Gibbs states. Nonlinearity 13, 5, 1681–1698. MR1781814
[15] Marcus, B., Petersen, K. and Williams, S. (1984). Transmission rates and factors of Markov chains. In Conference in Modern Analysis and Probability (New Haven, Conn., 1982), Contemp. Math. 26, Amer. Math. Soc., Providence, RI, 279–293.
[16] Marcus, B. H., Roth, R. M. and Siegel, P. H. (2001). An Introduction to Coding of Constrained Systems. Lecture Notes, fifth edition.
[17] Ruelle, D. (1992). Thermodynamic formalism for maps satisfying positive expansiveness and specification. Nonlinearity 5, 6, 1223–1236. MR1192516
[18] van Enter, A. C. D. and Verbitskiy, E. A. (2004). On the variational principle for generalized Gibbs measures. Markov Process. Related Fields 10, 3, 411–434. MR2097865
[19] Walters, P. (1986). Relative pressure, relative equilibrium states, compensation functions and many-to-one codes between subshifts. Trans. Amer. Math. Soc. 296, 1, 1–31. MR0837796
[20] Yuri, M. (1999). Thermodynamic formalism for certain nonhyperbolic maps. Ergodic Theory Dynam. Systems 19, 5, 1365–1378. MR1721626