DPEB 09/10, July 2010

Nash Equilibrium and Information Transmission Coding and Decoding Rules*

Penélope Hernández†, Amparo Urbano and José E. Vila‡
Universitat de València and ERI-CES, Spain

June 28, 2010

Abstract

The design of equilibrium protocols in sender-receiver games where communication is noisy occupies an important place in the economic literature. This paper shows that the common way of constructing a noisy channel communication protocol in Information Theory does not necessarily lead to a Nash equilibrium. Given the decoding scheme, it may happen that, given some state, it is better for the sender to transmit a message that is different from the one prescribed by the codebook. Similarly, when the sender uses the codebook as prescribed, the receiver may sometimes prefer to deviate from the decoding scheme when receiving a message.

Keywords: Noisy channel, Shannon's Theorem, sender-receiver games, Nash equilibrium.
JEL: C72, C02

* The authors wish to thank participants of the Third World Congress of the Game Theory Society, Chicago 2008, and the Center of Rationality and Interactive Decision Theory, Jerusalem 2009. Special thanks go to Bernhard von Stengel. The authors thank both the Spanish Ministry of Science and Technology and the European FEDER Funds for financial support under project SEJ2007-66581.
† Corresponding author: Department of Economic Analysis and ERI-CES, University of Valencia, Campus dels Tarongers, 46022 Valencia (Spain). Tel: +34 96 3828783. Fax: +34 96 3828249. E-mail: penelope.hernandez@uv.es.
‡ Department of Economic Analysis and ERI-CES, University of Valencia, Campus dels Tarongers, 46022 Valencia (Spain). E-mails: amparo.urbano@uv.es; jose.e.vila@uv.es.

1 Introduction

A central result of Information Theory is Shannon's noisy channel coding theorem. The purpose of this note is to point out that this theorem is not robust to a game-theoretical analysis and thus cannot be applied directly to strategic situations. To make this point we study the same framework as Shannon: the possibility of noisy channel communication between a privately informed sender and a receiver who must take an action. Our contribution is to show that the methodology developed for optimal information transmission does not necessarily define equilibria of sender-receiver games.

The issue of information transmission is not new in Economics; indeed, there is a vast literature starting with the seminal work of Crawford and Sobel [3]. Several papers have additionally addressed situations where communication may be distorted by assuming that messages may not arrive (Myerson [12], Rubinstein [13], among others). This branch of the literature points out that players' strategic behavior under "almost common knowledge" is not enough to guarantee coordination. Less research has been undertaken when the noisy communication is of a particular type: messages are always received by the receiver, but they may differ from those sent by the sender (Blume et al. [1], Koessler [10], Hernández et al. [9], Mitusch and Strausz [11]). Another branch of the literature deals with entropy-based communication protocols (see Gossner et al. [4], Gossner and Tomala [5], [6], [7], Hernández and Urbano [8]).
Traditional Information Theory, pioneered by Shannon [14], has approached noisy information transmission by considering that agents communicate through a discrete noisy channel. Although Shannon does not describe this situation as a game, we consider it as a standard sender-receiver game with two players: a sender and a receiver. The sender has to communicate through a noisy channel some private information from a message source to the receiver, who must take some action from an action space, with both players receiving 1 if the information is correctly transmitted and 0 otherwise.

More precisely, suppose the sender wishes to transmit an input sequence of signals (a message) through a channel that makes errors. One way to compensate for these errors is to send through the channel not the sequence itself but a modified version of the sequence that contains redundant information. The chosen process of modification is called the encoding of the message. The receiver receives an output message and has to decode it, removing the errors and recovering the original message. He does this by applying a decoding function.

The situation that we consider, in line with the setup of Information Theory, is as follows. We have a set Ω of M states. The sender wants to transmit through the channel the chosen state, so there are M possible messages. The communication protocol is given by a codebook of M possible messages, each of which is represented by a codeword of length n over the communication alphabet. The sender picks the codeword corresponding to the state. This codeword is transmitted and altered by the noisy channel. The receiver decodes the received message (a string of n symbols from the alphabet) according to some decoding scheme. The protocol is common knowledge to the players.

Both sender and receiver are supposed to follow the rules of the protocol. The natural question from the viewpoint of Game Theory is whether following the rules constitutes a Nash equilibrium. The protocol may not define the best possible code in terms of reliability, but in that case one may hope that it constitutes at least a not-so-good Nash equilibrium. This paper shows that the common way of constructing a communication protocol does not necessarily lead to a Nash equilibrium: given the decoding scheme, it may happen that, given some state, it is better for the sender to transmit a message that is different from the one prescribed by the codebook. Similarly, when the sender uses the codebook as prescribed, the receiver may sometimes prefer to deviate from the decoding scheme when receiving a message.

This common way of choosing a communication protocol is as follows:

1. The channel with its errors is defined as a discrete Markov process where a symbol from the alphabet is transformed into some other symbol according to some error probability.

2. From these characteristics of the channel one can compute the capacity of the channel, which determines the maximal rate of transmitting information reliably. For example, a rate of 0.2 means that (if the alphabet is binary) for every bit of information on the input side, one needs to transmit 5 bits across the channel.

3. A main insight of Shannon is that, as long as the rate is below channel capacity, the probability of error in information transmission can be made arbitrarily small when the length n of the codewords is allowed to be sufficiently long.

The way Shannon achieves the above is the following: the sender selects M codewords of length n at random.
That is, for every input message, the encoding is chosen entirely at random from the set of all possible encoding functions. Furthermore, for every message, this choice is independent of the encoding of every other message. With high probability this random choice leads to a "nearly optimal" encoding function from the point of view of rate and reliability. The decoding rule is based on a simple idea: a channel outcome will be decoded as a specific input message if that input sequence is "statistically close" to the output sequence. This statistical proximity is measured in terms of the entropy of the joint distribution of both sequences, which establishes when two sequences are probabilistically related. The associated decoding function is known as jointly typical decoding.

Our methodological note is organized as follows. The sender-receiver game and the noisy channel are set up in Section 2. Section 3 offers a rigorous presentation of Shannon's communication protocol, specifying players' strategies from a theoretical viewpoint; the reader familiar with Information Theory can skip it. Section 4 presents three simple examples of a sender-receiver game with specific code realizations. The first two examples offer the following code realizations: 1) the "natural" one, where the decoding rule translates into the majority rule and where the equilibrium conditions are satisfied; and 2) a worse code realization, where a deviation by the receiver takes place. The last example exhibits a sender's deviation. Concluding remarks close the paper.

2 The basic sender-receiver setup

Consider the possibilities of communication between two players, called the sender (S) and the receiver (R), in an incomplete information game Γ: there is a finite set of feasible states of nature Ω = {ω_0, ..., ω_{M−1}}. Nature first chooses ω_j ∈ Ω randomly with probability q_j; the sender is then informed of this state ω_j, the receiver must take some action in a finite action space A, and payoffs are realized. The agents' payoffs depend on the sender's information or type ω and the receiver's action a. Let u : A × Ω → R be the players' (common) payoff function, i.e., u(a_t, ω_j), j = 0, 1, ..., M − 1. Assume that for each realization of ω there exists a unique receiver's action with positive payoffs: for each state ω_j ∈ Ω there exists a unique action â_j ∈ A such that

    u(a_t, ω_j) = 1 if a_t = â_j, and u(a_t, ω_j) = 0 otherwise.

The timing of the game is as follows: the sender observes the value of ω and then sends a message, which is a string of signals from some message space. It is assumed that signals belong to some finite space and may be distorted in the communication process. This distortion or interference is known as noise. The noise can be modeled by assuming that the signals of each message can randomly be mapped to the whole set of possible signals. A unifying approach to this noisy information transmission is to consider that agents communicate through a discrete noisy channel.

Definition 1 A discrete channel (X; p(y|x); Y) is a system consisting of an input alphabet X, an output alphabet Y, and a probability transition matrix p(y|x) that expresses the probability of observing the output symbol y given that the symbol x was sent. A channel is memoryless if the probability distribution of the output depends only on the input at that time and is conditionally independent of previous channel inputs or outputs. In addition, a channel is used without feedback if the input symbols do not depend on the past output symbols.
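As a concrete illustration of Definition 1, the following short Python sketch (ours, not part of the original text; the dictionary representation of p(y|x) and the function names are illustrative choices) simulates a discrete memoryless channel symbol by symbol.

import random

def channel_output(x, p, rng=random):
    """Draw one output symbol y with probability p[x][y]."""
    u, acc = rng.random(), 0.0
    for y, prob in p[x].items():
        acc += prob
        if u <= acc:
            return y
    return y  # guard against floating-point rounding

def transmit(xs, p, rng=random):
    """Memoryless use of the channel: each symbol is distorted independently."""
    return [channel_output(x, p, rng) for x in xs]

# A binary channel in which 0 is flipped with probability 0.1 and 1 with probability 0.2
# (these error probabilities anticipate the channel used in Section 4).
p = {'0': {'0': 0.9, '1': 0.1}, '1': {'1': 0.8, '0': 0.2}}
random.seed(0)
print(transmit(list('00011'), p))

Sending the n symbols of a codeword one after the other in this way corresponds to the nth extension of the channel introduced next.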
The nth extension of a discrete memoryless channel is the channel (X = X^n; p(y|x); Y = Y^n), where

    p(y|x) = p(y^n|x^n) = ∏_{i=1}^{n} p(y_i|x_i).

Consider the binary channel ν(ε_0, ε_1) = (X = {0,1}; p(y|x); Y = {0,1}), where p(1|0) = ε_0 and p(0|1) = ε_1 (i.e., ε_l is the probability of a mistransmission of input symbol l), and let ν^n(ε_0, ε_1) be its nth extension. While binary channels may seem rather oversimplified, they capture the essence of most mathematical challenges that arise when trying to make communication reliable. Furthermore, many of the solutions found to make communication reliable in this setting have been generalized to other scenarios.

Let Γ^n_ν denote the extended communication game. It is a one-stage game where the sender sends a message x ∈ X of length n using the noisy channel, and the receiver observes a realization y ∈ Y of that message and takes an action in Γ.

A strategy of S in the extended communication game Γ^n_ν is a decision rule suggesting the message to be sent at each ω_j: an M-tuple {σ_j^S}_j, where σ_j^S ∈ X is the message sent by S given that the true state of nature is ω_j. A strategy of R is a 2^n-tuple {σ_y^R}_y, specifying an action choice in Γ as a response to the realized output sequence y ∈ Y. Expected payoffs are defined in the usual way. Let the tuple of the sender's payoffs be denoted by {π_j^S}_j = {π_j^S(σ_j^S, {σ_y^R}_y)}_j, where for each ω_j,

    π_j^S = π_j^S(σ_j^S, {σ_y^R}_y) = Σ_{y∈Y} p(y|σ_j^S) u(σ_y^R, ω_j),

and where p(y|σ_j^S) is the sender's probability of the realization of the output sequence y ∈ Y conditional on having sent message σ_j^S in state ω_j. Let the tuple of the receiver's payoffs be denoted by {π_y^R}_y = {π_y^R({σ_j^S}_j, σ_y^R)}_y, where for each output sequence y ∈ Y,

    π_y^R = π_y^R({σ_j^S}_j, σ_y^R) = Σ_{j=0}^{M−1} p(σ_j^S|y) u(σ_y^R, ω_j),

and where p(σ_j^S|y) is the receiver's probability of input message σ_j^S in state ω_j conditional on having received the output message y.

A pure strategy Nash equilibrium of the communication game is a pair of tuples ({σ̂_j^S}_j, {σ̂_y^R}_y) such that for each ω_j and for any other sender's strategy σ̃_j^S,

    π̂_j^S = π_j^S(σ̂_j^S, {σ̂_y^R}_y) ≥ π_j^S(σ̃_j^S, {σ̂_y^R}_y),

and for each y ∈ Y and for any other receiver's strategy σ̃_y^R,

    π̂_y^R = π_y^R({σ̂_j^S}_j, σ̂_y^R) ≥ π_y^R({σ̂_j^S}_j, σ̃_y^R).

Notice that the set of probabilities {p(σ_j^S|y)}_j for the receiver (where by Bayes' rule p(σ_j^S|y) = p(y|σ_j^S) p(σ_j^S) / p(y)) is always well defined, since p(y) > 0 for all y. Therefore, the Nash equilibrium is also a perfect Bayesian equilibrium.

Fix the sender's strategy {σ_j^S}_{j=0,...,M−1}, where σ_j^S ∈ X is the message sent by S given that the true state of nature is ω_j. After receiving an output sequence y, the receiver has to take an action a_l in Γ such that

    a_l = arg max_{a_l ∈ A} Σ_{j=0}^{M−1} p(σ_j^S|y) u(a_l, ω_j) = arg max_{a_l ∈ A} p(σ_l^S|y).

Equivalently, given the linearity of the receiver's payoff function in the probabilities {p(σ_l^S|y)}_l, 0 ≤ l ≤ M − 1, and since by Bayes' rule

    p(σ_l^S|y) / p(σ_k^S|y) = [p(y|σ_l^S) p(σ_l^S) / p(y)] / [p(y|σ_k^S) p(σ_k^S) / p(y)] = q_l p(y|σ_l^S) / (q_k p(y|σ_k^S)),

the receiver will choose, for each y, action a_l whenever p(σ_l^S|y) ≥ p(σ_k^S|y) (i.e., whenever p(σ_l^S|y)/p(σ_k^S|y) ≥ 1) for all k ≠ l, k = 0, ..., M − 1, and will choose a_k otherwise.
This condition translates into the receiver choosing action a_l whenever q_l p(y|σ_l^S) ≥ q_k p(y|σ_k^S), and choosing a_k otherwise, with p(y|σ_j^S) given by the channel's error probabilities and by the sender's coding. To simplify, assume that the states of nature are uniformly distributed, q_l = 1/M for l ∈ {0, ..., M − 1}. Then

    σ_y^R = a_l whenever p(y|σ_l^S) ≥ p(y|σ_k^S) for all k = 0, ..., M − 1.    (1)

Consider now the sender's best response to the receiver's strategy σ_y^R. The sender's problem is to choose an input sequence σ_l^S for each state ω_l, l = 0, ..., M − 1, such that

    σ_l^S = arg max_x Σ_{y∈Y} p(y|x) u(σ_y^R, ω_l) = arg max_x Σ_{y : σ_y^R = a_l} p(y|x).

Given the receiver's decoding, the above problem amounts to choosing an input sequence σ_l^S in state ω_l such that

    Σ_{y : σ_y^R = a_l} p(y|σ_l^S) ≥ Σ_{y : σ_y^R = a_l} p(y|x^S)    (2)

for any other input sequence x^S over {0,1}^n.

3 Shannon's communication protocol

For completeness we first present some basic results from Information Theory, largely following Cover and Thomas [2].

Let X be a random variable with probability distribution p. The entropy H(X) of X is defined by

    H(X) = −Σ_{θ∈Θ} p(θ) log p(θ) = −E_X[log p(X)],

where 0 log 0 = 0 by convention. Consider independent, identically distributed (i.i.d.) random variables X_1, ..., X_n. Then, by the definition of entropy,

    H(X_1, ..., X_n) = −Σ_{θ_1∈Θ_1} ... Σ_{θ_n∈Θ_n} p(θ_1, ..., θ_n) log p(θ_1, ..., θ_n),

where p(θ_1, ..., θ_n) = p(X_1 = θ_1, ..., X_n = θ_n). Let x be a sequence of length n over a finite alphabet Θ of size |Θ|. Denote by θ_i(x) the frequency of θ_i in x (its number of occurrences over n). We define the empirical entropy of x, denoted by H(θ_1(x), ..., θ_{|Θ|}(x)), as the entropy of the empirical distribution of x.

An (M, n) code for the channel (X, p(y|x), Y) consists of 1) an index set {0, 1, ..., M − 1}; 2) an encoding function e : {0, 1, ..., M − 1} → X^n, yielding codewords e(0), e(1), ..., e(M − 1), where the set of codewords is called the codebook; and 3) a decoding function d : Y^n → {0, 1, ..., M − 1}.

Consider a noisy channel and a communication length n. Let

    λ_i = Pr(d(Y^n) ≠ i | X^n = x^n(i)) = Σ_{y^n} p(y^n|x^n(i)) I(d(y^n) ≠ i)

be the conditional probability of error given that index i was sent, where I(·) is the indicator function. The maximal probability of error λ^(n) for an (M, n) code is defined as λ^(n) = max_{i∈{0,1,...,M−1}} λ_i, and the average probability of error P_e^(n) for an (M, n) code is P_e^(n) = (1/M) Σ_{i=0}^{M−1} λ_i. Note that P_e^(n) ≤ λ^(n).

The rate and the mutual information are two useful concepts from Information Theory characterizing when information can be reliably transmitted over a communications channel. The rate r of an (M, n) code is r = (log M)/n (logarithms taken in base 2), and a rate r is said to be achievable if there exists a sequence of (2^{nr}, n) codes such that the maximal probability of error λ^(n) tends to 0 as n goes to ∞. The capacity of a discrete memoryless channel is the supremum of all achievable rates.

The mutual information I(X; Y) measures the information that the random variables X and Y share. It can be equivalently expressed as I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X), where H(Y|X) is the conditional entropy of Y (taking values θ_2 ∈ Θ_2) given X (taking values θ_1 ∈ Θ_1), defined by

    H(Y|X) = −Σ_{θ_1∈Θ_1} p(θ_1) Σ_{θ_2∈Θ_2} p(θ_2|θ_1) log p(θ_2|θ_1).

Then the capacity C of a channel can be expressed as the maximum of the mutual information between the input and the output of the channel,

    C = sup_{p_X} I(X; Y),

where the maximization is with respect to the input distribution.
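Because the binary channel ν(ε_0, ε_1) has a one-parameter family of input distributions, its capacity can be approximated numerically. The Python sketch below is our own illustration (a plain grid search, not a method used in the paper); for ε_0 = 0.1 and ε_1 = 0.2 it returns a maximizing θ close to 0.52, the value used in Section 4.

from math import log2

def H(p):
    """Entropy of a probability vector (0 log 0 = 0 by convention)."""
    return -sum(q * log2(q) for q in p if q > 0)

def mutual_information(theta, eps0, eps1):
    """I(X_theta; Y_theta) for the binary channel nu(eps0, eps1)."""
    y0 = (1 - eps0) * theta + eps1 * (1 - theta)          # Pr(Y = 0)
    h_y = H([y0, 1 - y0])
    h_y_given_x = theta * H([eps0, 1 - eps0]) + (1 - theta) * H([eps1, 1 - eps1])
    return h_y - h_y_given_x

def capacity(eps0, eps1, grid=10001):
    """Approximate C = max_theta I(X_theta; Y_theta) by grid search over theta."""
    return max((mutual_information(k / (grid - 1), eps0, eps1), k / (grid - 1))
               for k in range(grid))

C, theta_star = capacity(0.1, 0.2)
print(round(C, 3), round(theta_star, 2))   # approximately 0.398 and 0.52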
Therefore the channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a communications channel.

Theorem 1 (Shannon): All rates below capacity C are achievable. Specifically, for every rate r < C there exists a sequence of (2^{nr}, n) codes with maximum probability of error λ^(n) → 0. Conversely, any sequence of (2^{nr}, n) codes with λ^(n) → 0 must have r ≤ C.

3.1 Shannon's strategies

Fix a channel and a communication length n. From the channel we can compute its capacity C, and from n the information transmission rate r. Shannon's theorem states that, given a noisy channel with capacity C and information transmission rate r, if r < C then there exist both an encoding rule and a decoding rule which allow the receiver to make the average probability of an information transmission error arbitrarily small. These two parameters, rate and capacity, are the key to the existence of such a coding.¹

¹ Notice that for a fixed C it is always possible to find a length n, large enough, to guarantee Shannon's Theorem. Alternatively, given a fixed r, we can always find a noisy structure, a channel, achieving this transmission rate.

The sender's strategy: random coding

Let us show how to construct a random choice of codewords to generate an (M, n) code for our sender-receiver game. Consider the binary channel ν(ε_0, ε_1) and its nth extension ν^n(ε_0, ε_1). Following Shannon's construction, random codes are generated, for each state of nature, according to the probability distribution θ that maximizes the mutual information I(X; Y). In other words, let X_θ be a binary random variable that takes value 0 with probability θ and value 1 with probability 1 − θ. Then let Y_θ be the random variable defined by the probabilistic transformation of the input variable X_θ through the channel, with probability distribution

    Y_θ = {(1 − ε_0)θ + ε_1(1 − θ), ε_0θ + (1 − ε_1)(1 − θ)}.

Therefore the mutual information between X_θ and Y_θ is equal to

    I(X_θ; Y_θ) = H(Y_θ) − H(Y_θ|X_θ) = H({(1 − ε_0)θ + ε_1(1 − θ), ε_0θ + (1 − ε_1)(1 − θ)}) − [θH(ε_0) + (1 − θ)H(ε_1)],

where θ is obtained as the solution of the optimization problem

    θ = arg max_θ I(X_θ; Y_θ).

Denoting by p(x) the distribution of X_θ according to θ, generate M = 2^{nr} codewords, i.e., an (M, n) code, at random according to p(x) = ∏_{i=1}^{n} p(x_i). The M codewords can be displayed as the rows of a matrix

    ζ = ( x_1(0)      x_2(0)      ...  x_n(0)
          ...         ...         ...  ...
          x_1(M − 1)  x_2(M − 1)  ...  x_n(M − 1) ),

and therefore the probability of such a code is

    p(ζ) = ∏_{ω=0}^{M−1} ∏_{i=1}^{n} p(x_i(ω)).

The receiver's strategy: jointly typical decoding

The receiver's strategy is based on a statistical property derived from the weak law of large numbers. This property tells us when two sequences are probabilistically related.

Definition 2 The set A_η^n of jointly typical sequences (x, y) with respect to the distribution p(x, y) is the set of n-sequences with empirical entropy η-close to the true entropy, i.e.,

    A_η^n = {(x, y) ∈ X × Y : |−(1/n) log p(x) − H(X)| < η, |−(1/n) log p(y) − H(Y)| < η, and |−(1/n) log p(x, y) − H(X, Y)| < η}.

A channel outcome y ∈ Y will be decoded as the ith index if the codeword x_i ∈ X is "jointly typical" with the received sequence y: two sequences x and y are jointly η-typical if the pair (x, y) is η-typical with respect to the joint distribution p(x, y) and both x and y are η-typical with respect to their marginal distributions p(x) and p(y). In words, a typical set with tolerance η, A_η^n, is the set of sequences whose empirical entropy differs by no more than η from their true entropy.
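The two building blocks just described, random coding and jointly typical decoding, can be sketched in a few lines of Python. The code below is our own illustration with assumed function names; for brevity it checks only the third, joint-entropy condition of Definition 2, which is the binding one in the examples of Section 4.

import random
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p if q > 0)

def random_codebook(M, n, theta, rng):
    """Draw M codewords of length n i.i.d. with Pr(symbol = 0) = theta."""
    return [''.join('0' if rng.random() < theta else '1' for _ in range(n))
            for _ in range(M)]

def p_joint(x, y, theta, eps0, eps1):
    """p(x, y) = p(x) p(y|x) for the binary channel nu(eps0, eps1)."""
    prob = 1.0
    for xi, yi in zip(x, y):
        p_x = theta if xi == '0' else 1 - theta
        flip = eps0 if xi == '0' else eps1
        prob *= p_x * (flip if xi != yi else 1 - flip)
    return prob

def typical_decode(y, codebook, theta, eps0, eps1, eta, H_XY):
    """Return the unique index i such that (x_i, y) passes the joint-entropy
    test with tolerance eta; return None (an 'error') otherwise."""
    n = len(y)
    hits = [i for i, x in enumerate(codebook)
            if abs(-log2(p_joint(x, y, theta, eps0, eps1)) / n - H_XY) < eta]
    return hits[0] if len(hits) == 1 else None

# Example with the channel used later in Section 4: eps0 = 0.1, eps1 = 0.2, theta = 0.52.
theta, eps0, eps1 = 0.52, 0.1, 0.2
H_XY = H([theta, 1 - theta]) + theta * H([eps0, 1 - eps0]) + (1 - theta) * H([eps1, 1 - eps1])
rng = random.Random(0)
codebook = random_codebook(2, 3, theta, rng)
print(codebook, typical_decode('000', codebook, theta, eps0, eps1, 0.64, H_XY))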
Shannon's communication protocol

Let us apply the above concepts to the extended communication game Γ^n_ν. The sender communicates her private information, through the nth extension of the noisy channel ν(ε_0, ε_1), by generating M codewords of length n from the probability θ which maximizes the capacity of the channel. The communication protocol has the following sequence of events:

1. The realization of the code is revealed to both the sender and the receiver.

2. The sender is informed about the true state of nature and sends the message x_i associated with i ∈ Ω.

3. The receiver observes a sequence y, according to p(y|x) = ∏_{i=1}^{n} p(y_i|x_i).

4. The receiver updates the possible state of nature and decides that index l ∈ Ω was sent if the following conditions are satisfied:
   • (x_l, y) are jointly typical;
   • there is no other index k ∈ Ω such that (x_k, y) are jointly typical;
   • if no such l ∈ Ω exists, then an error is declared.

5. Finally, the receiver chooses an action in Γ according to his decoding rule:
   • if y is jointly typical only with x_l, he takes action a_l;
   • otherwise, no action is taken.

Shannon was the first to show that good codes exist. Given the above strategies and Shannon's Theorem, we can construct a good code for information transmission purposes in the following way:

1. Choose first the θ that maximizes the mutual information I(X; Y) and generate a realization of the random code. Then, for all η there exists an n* such that for all n ≥ n* the empirical entropy of each realized code is at distance η/12 from H(X).

2. By the jointly typical decoding rule, any output message y is decoded either as a unique input codeword x, or an error is declared. When no error is declared, the decoding rule translates into the condition that the distance between the empirical entropy of the pair (x, y) and the true entropy H(X, Y) is smaller than η/12.

3. By the proof of Shannon's Theorem (Cover and Thomas, pages 200–202), the average probability of error P_e^(n), averaged over all codebooks, is smaller than η/2. Therefore, for a fixed n ∈ [n*, ∞), there exists a realization of a codebook such that at least half of its codewords have conditional probability of error less than η. In particular, its maximal probability of error λ^(n) is less than η.

Notice that, in order to apply this protocol to a standard sender-receiver game, one needs to define an assignment rule for the cases in which an error is declared in Shannon's protocol. This rule assigns an action to the decoding errors and allows us to completely specify the receiver's strategy.

Remark: Shannon's Theorem is an asymptotic result and establishes that for every η-approximation there exists a large enough n guaranteeing a small average error related to such an η. By the proof of the Theorem (Cover and Thomas, pages 200–202), the average error has two terms. The first one comes from the jointly typical set defined by the threshold η: again, for large enough n, the probability that a realized output sequence is not jointly typical with the right codeword is very low. The second term comes from the declared errors in Shannon's protocol, which occur with probability 2^{−n(I(X;Y)−3η)} and which is very small when n is large enough.
Therefore, both probabilities are bigger or smaller depending both on n and on how many outcomes are correctly declared, and they are important for partitioning the output sequence space. When we focus on finite-time communication protocols, i.e., when n and η are both fixed and asymptotic arguments are not available, we cannot guarantee that the above probabilities are small enough with respect to n. Actually, the η-approximation and the corresponding associated errors can generate different partitions of the output space. Therefore, careful attention must be paid to generating a partition in such situations.

3.2 Nash Equilibrium Codes

We have defined good information transmission codes. They come from asymptotic behavior. Now we look for finite communication-time codes such that no player has an incentive to deviate. Let Y_l be the set of y's in Y such that the receiver decodes all of them as index l ∈ {0, 1, ..., M − 1}. From the equilibrium conditions (1) and (2) in Section 2:

Proposition 1 A code (M, n) is a Nash equilibrium code if and only if
i) p(y|x(i)) ≥ p(y|x(j)) for all i ≠ j in {0, ..., M − 1} and all y with d(y) = i; and
ii) Σ_{y∈Y_i} p(y|x(i)) ≥ Σ_{y∈Y_i} p(y|x) for all x ∈ {0,1}^n.

The question that arises is whether Shannon's strategies are Nash equilibrium strategies of the extended communication game Γ^n_ν. In particular, we rewrite condition i) above in terms of the entropy condition of the jointly typical sequences. For any two indexes l and k, let x_l = x(l) and x_k = x(k); then d(y) = l whenever

    p(y|x_l) ≥ p(y|x_k) for all k.

Alternatively, there exists η > 0 such that

    −(1/n) log p(x_l, y) − H(X, Y) < η and −(1/n) log p(x_k, y) − H(X, Y) > η.

By Definition 2, the set A_η^n is the set of jointly typical sequences. Consider y ∈ Y such that (x_0, y) ∈ A_η^n and (x_1, y) ∉ A_η^n. Formally,

    |−(1/n) log p(x_0, y) − H(X, Y)| < η and |−(1/n) log p(x_1, y) − H(X, Y)| ≥ η.

Therefore, if y were decoded as l, we could assert that y is jointly typical with x_l and not jointly typical with any other x_k. It is straightforward to check that the opposite is not true: even if the empirical entropy of p(x_l, y) were closer than that of p(x_k, y) to the true entropy, the conditional probability of x_l given y would not need to be bigger than the conditional probability of x_k given y. In fact, there are four possible cases:

1. −(1/n) log p(x_0, y) − H(X, Y) < η and −(1/n) log p(x_1, y) − H(X, Y) > η. In this case we obtain

    p(x_0|y) > 2^{−n(H(X,Y)+η)}/p(y) > p(x_1|y),

and therefore, if (x_0, y) is more statistically related than (x_1, y), the conditional probability of x_0 given y will be greater than the conditional probability of x_1 given y.

2. (1/n) log p(x_0, y) + H(X, Y) < η and (1/n) log p(x_1, y) + H(X, Y) > η. In this case we obtain the opposite conclusion, namely

    p(x_0|y) < 2^{−n(H(X,Y)−η)}/p(y) < p(x_1|y),

and now the above condition shows that even if the empirical entropy of p(x_0, y) were closer than that of p(x_1, y) to the true entropy, the conditional probability of x_1 given y could be bigger than or equal to the conditional probability of x_0 given y.

3. −(1/n) log p(x_0, y) − H(X, Y) < η and (1/n) log p(x_1, y) + H(X, Y) > η. Here

    p(x_0|y) > 2^{−n(H(X,Y)+η)}/p(y) and p(x_1|y) > 2^{−n(H(X,Y)−η)}/p(y),

and no relationship between p(x_0|y) and p(x_1|y) can be established.

4. Finally, (1/n) log p(x_0, y) + H(X, Y) < η and −(1/n) log p(x_1, y) − H(X, Y) > η. As in the third case, we cannot establish any order between p(x_0|y) and p(x_1|y). Indeed, we get

    p(x_0|y) < 2^{−n(H(X,Y)−η)}/p(y) and 2^{−n(H(X,Y)+η)}/p(y) > p(x_1|y).
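Before returning to the entropy conditions, note that Proposition 1 can be verified mechanically for any small binary code by brute force over all input and output sequences. The sketch below is our own illustrative checker (the function names are assumptions); applied to the 3-bit code and majority-rule decoding of Section 4.1 it returns True.

from itertools import product

def p_y_given_x(y, x, eps0, eps1):
    """Transition probability of the binary channel nu(eps0, eps1)."""
    p = 1.0
    for xi, yi in zip(x, y):
        flip = eps0 if xi == '0' else eps1
        p *= flip if xi != yi else 1 - flip
    return p

def is_nash_code(codebook, decode, eps0, eps1):
    """Check conditions i) and ii) of Proposition 1 for a decoding map y -> index."""
    n = len(codebook[0])
    ys = [''.join(b) for b in product('01', repeat=n)]
    # i) whenever d(y) = i, codeword x(i) must be (weakly) the most likely input.
    for y in ys:
        i = decode[y]
        if any(p_y_given_x(y, codebook[j], eps0, eps1) >
               p_y_given_x(y, codebook[i], eps0, eps1) for j in range(len(codebook))):
            return False
    # ii) no input sequence in {0,1}^n puts more mass on Y_i than x(i) itself.
    for i, xi in enumerate(codebook):
        Yi = [y for y in ys if decode[y] == i]
        mass_i = sum(p_y_given_x(y, xi, eps0, eps1) for y in Yi)
        for x in (''.join(b) for b in product('01', repeat=n)):
            if sum(p_y_given_x(y, x, eps0, eps1) for y in Yi) > mass_i:
                return False
    return True

# The "natural" code of Section 4.1 with majority-rule decoding.
codebook = ['000', '111']
decode = {''.join(b): 0 if ''.join(b).count('0') >= 2 else 1
          for b in product('01', repeat=3)}
print(is_nash_code(codebook, decode, 0.1, 0.2))   # True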
Condition i) above establishes an order on the conditional probabilities of each output sequence y for all input sequences. We have seen that when the entropy condition of the jointly typical set is satisfied without the absolute value, it properly orders these conditional probabilities. Otherwise it may fail to do so.

Consider now condition ii). Let Y_l be the set of y ∈ Y such that p(y|x_l) ≥ p(y|x_k) for all k. Summing over all y in Y_l we get

    Σ_{y∈Y_l} p(y|x_l) ≥ Σ_{y∈Y_l} p(y|x_k) for all k.

The second condition says that the aggregated probability of the partition element Y_l when σ_l^S was sent is higher than the corresponding probability² when any other code, even a sequence never taken into account in the realized codebook, is sent.

² Recalling that the error λ_l of decoding the codeword x_l is λ_l = Pr(y ∈ ∪_{k≠l} Y_k | x_l) = Σ_{y∉Y_l} p(y|x_l), and that the right-hand side Σ_{y∈Y_l} p(y|x_k) is part of the λ_k error, the sender's condition could be written as 1 − λ_l ≥ Σ_{y∈Y_l} p(y|x_k) for all x_k, which means that the aggregated probability of the partition element Y_l when σ_l^S was sent is higher than the corresponding part of the k-error of any code, even for sequences never taken into account in the realized codebook.

4 Examples: Shannon versus Game Theory

We wish to investigate whether random coding and jointly typical decoding are robust to a game-theoretical analysis, i.e., whether they are ex-ante equilibrium strategies. Since the ex-ante equilibrium is equivalent to playing a Nash equilibrium for every code realization, if for some code realization the players' strategies are not a Nash equilibrium, then no ex-ante equilibrium exists. In the sequel we analyze three examples. The first two examples correspond to two realizations of the random coding. The former consists of the "natural" coding, in the sense that the signal strings do not share a common digit, either 0 or 1, so that the decoding rule translates into the "majority" rule; the latter is a worse codebook realization. For each code realization we show how to generate a partition of the output space, the receiver's strategy and the players' equilibrium conditions. In particular, we prove that the receiver's equilibrium condition is not fulfilled for the second code realization. The last example exhibits a sender's deviation.

Fix a sender-receiver "common interest" game Γ where nature chooses ω_i, i = 0, 1, according to the law q = (q_0, q_1) = (0.5, 0.5). The receiver's set of actions is A = {a_0, a_1}, and the payoff matrix for the two states of nature is:

                    R
                a_0      a_1
    S   ω_0    (1, 1)   (0, 0)
        ω_1    (0, 0)   (1, 1)

Consider the noisy channel ν(ε_0, ε_1), where the probability transition matrix p(y|x), expressing the probability of observing the output symbol y given that the symbol x was sent, is given by p(1|0) = ε_0 = 0.1 and p(0|1) = ε_1 = 0.2. Define the binary random variable X_θ, which takes value 0 with probability θ and value 1 with probability 1 − θ. Let Y_θ be the random variable defined by the channel's probabilistic transformation of the input random variable X_θ, with probability distribution

    Y_θ = {(1 − ε_0)θ + ε_1(1 − θ), ε_0θ + (1 − ε_1)(1 − θ)}.

Therefore the mutual information between X_θ and Y_θ is equal to

    I(X_θ; Y_θ) = H(Y_θ) − H(Y_θ|X_θ) = H({(1 − ε_0)θ + ε_1(1 − θ), ε_0θ + (1 − ε_1)(1 − θ)}) − [θH(ε_0) + (1 − θ)H(ε_1)].

Let θ̂ = arg max_θ I(X_θ; Y_θ). Then, for the channel ν(ε_0, ε_1) = ν(0.1, 0.2), this probability is θ̂ = 0.52.
Random codes are generated, for each state of nature, according to the probability distribution θ̂ = 0.52. The codeword corresponding to index 0, i.e., state ω_0, say x_0, is generated by n independent realizations of θ̂. Similarly, x_1 is the codeword corresponding to index 1, i.e., state ω_1. The codeword corresponding to the realized state is then sent through the noisy channel (by sending its n bits one after the other).

4.1 A code fulfilling the Nash equilibrium conditions

We first present the realization of the "natural code" in full detail because it is quite familiar and will help the reader to follow a more complicated example later. To keep the analysis very simple, consider that communication lasts 3 periods and let Γ^3_ν be the noisy communication extended game. Suppose that a specific and commonly known realization of the random code is

    ( x_1(0)  x_2(0)  x_3(0) )   =   ( x_0 = 0, 0, 0 )
    ( x_1(1)  x_2(1)  x_3(1) )       ( x_1 = 1, 1, 1 )

Nature informs the sender about the true state of nature; therefore the sender's strategy σ_j^S, j = 0, 1, is to send

    σ_0^S = x_0 = 000, if ω = ω_0,
    σ_1^S = x_1 = 111, if ω = ω_1.

The receiver observes a transformed sequence y, with transition probability p(y|x) = ∏_{i=1}^{3} p(y_i|x_i), and tries to guess which message has been sent. He will consider that index j was sent if (x_j, y) are jointly typical and there is no other index k such that (x_k, y) are jointly typical. If no such index j exists, then an error is declared.

Let us proceed to construct the receiver's strategy by generating a partition of the set of output sequences Y = {0,1}^3. To apply the jointly typical decoding rule, we need to calculate the functions³

    ∆_x0(y) = |−(1/3) log p(x_0, y) − H(X, Y)|,
    ∆_x1(y) = |−(1/3) log p(x_1, y) − H(X, Y)|,

which measure the difference between the empirical entropy of each sequence in Y and the true entropy H(X, Y) ≈ 1.6.

³ Notice that only the third condition in the definition of jointly typical sequences is the binding condition to be checked.

For example, for y = 000, for our specific channel ν(0.1, 0.2) and since θ̂ = 0.52, we have p(y = 000 | x_0 = 000) = (p(0|0))^3 = (1 − ε_0)^3 = 0.9^3 = 0.729; p(y = 000 | x_1 = 111) = (p(0|1))^3 = ε_1^3 = 0.2^3 = 0.008; p(x_0, y) = p(y|x_0) p(x_0) = 0.729 × (0.52)^3; and p(x_1, y) = p(y|x_1) p(x_1) = 0.008 × (0.48)^3. Then

    ∆_x0(y = 000) = 0.485 and ∆_x1(y = 000) = 1.801.

Now we have to choose an η-approximation in order to partition the output message space. Fix η = 0.64; the reason for this choice will become clear at the end of the example. Recall that this value is the upper bound on the distance between the empirical entropy and the true entropy that defines jointly typical sequences. Then the jointly typical decoding rule states that a given y ∈ Y is jointly typical with x_0 = 000, and with x_1 = 111, respectively, whenever

    ∆_x0(y) < η = 0.64 and ∆_x1(y) < η = 0.64, respectively.

The jointly typical decoding rule allows the receiver to define the following subsets of Y:

    P00  = {y ∈ Y : ∆_x0(y) < η}      P0¬0 = {y ∈ Y : ∆_x0(y) ≥ η}
    P11  = {y ∈ Y : ∆_x1(y) < η}      P1¬1 = {y ∈ Y : ∆_x1(y) ≥ η}

The first set, P00, contains all the sequences in Y that are probabilistically related to the input sequence x_0 = 000. Conversely, the set P0¬0 contains all the sequences of Y that are not probabilistically related to x_0. Similarly, P11 is the set of sequences in Y that are probabilistically related to the input sequence x_1 = 111, while P1¬1 is the set of sequences in Y that cannot be related to x_1.
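The sets P00, P0¬0, P11 and P1¬1 listed next can be reproduced numerically. The following sketch (ours) computes ∆_x0(y) and ∆_x1(y) for every y ∈ {0,1}^3, using the exact value H(X, Y) ≈ 1.589 (of which the 1.6 above is a rounding), and classifies each output sequence with η = 0.64.

from itertools import product
from math import log2

theta, eps0, eps1, eta = 0.52, 0.1, 0.2, 0.64

def H(p):
    return -sum(q * log2(q) for q in p if q > 0)

def p_joint(x, y):
    """p(x, y) = p(x) p(y|x) under nu(0.1, 0.2) with Pr(symbol = 0) = theta."""
    prob = 1.0
    for xi, yi in zip(x, y):
        p_x = theta if xi == '0' else 1 - theta
        flip = eps0 if xi == '0' else eps1
        prob *= p_x * (flip if xi != yi else 1 - flip)
    return prob

H_XY = H([theta, 1 - theta]) + theta * H([eps0, 1 - eps0]) + (1 - theta) * H([eps1, 1 - eps1])

def delta(x, y):
    return abs(-log2(p_joint(x, y)) / len(y) - H_XY)

x0, x1 = '000', '111'
ys = [''.join(b) for b in product('01', repeat=3)]
print('P00 =', [y for y in ys if delta(x0, y) < eta])   # {000, 001, 010, 100}
print('P11 =', [y for y in ys if delta(x1, y) < eta])   # {011, 101, 110, 111}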
These sets are:

    P00  = {000, 001, 010, 100}      P0¬0 = {111, 110, 101, 011}
    P1¬1 = {000, 001, 010, 100}      P11  = {111, 110, 101, 011}

Denote by

    P0 = P00 ∩ P1¬1 = {y ∈ Y : ∆_x0(y) < η and ∆_x1(y) ≥ η},
    P1 = P0¬0 ∩ P11 = {y ∈ Y : ∆_x1(y) < η and ∆_x0(y) ≥ η},

the sets of all sequences of Y which are uniquely related in probability to x_0 and to x_1, respectively. Since P00 = P1¬1, no matter whether x_0 or x_1 has been sent, the receiver univocally assigns x_0 to all sequences in P00 = P1¬1. Similarly, P0¬0 = P11 implies that the receiver decodes all the sequences in either of these sets as corresponding to x_1. Moreover, since P0 ∩ P1 = ∅ and P0 ∪ P1 = Y, the typical decoding rule generates a true partition.

In fact, the jointly typical decoding rule is in this case equivalent to majority-rule decoding. To see this, let y^k be an output sequence with k zeros. Then

    p(x_0 | y^k) = p(y^k | x_0) p(x_0) / p(y^k) = (1 − ε_0)^k ε_0^{3−k} / [(1 − ε_0)^k ε_0^{3−k} + ε_1^k (1 − ε_1)^{3−k}] ≥ 1/2

if and only if k ≥ 2.

The jointly typical decoding rule gives rise to the receiver's strategy, for each y ∈ Y:

    σ_y^R = a_i whenever y ∈ P_i.

To show that the above strategies are a Nash equilibrium in pure strategies, let us check that the sender's and the receiver's strategies are best responses to each other.

1) The receiver's Nash equilibrium condition translates into choosing action a_0 whenever p(y|σ_0^S) ≥ p(y|σ_1^S), and action a_1 otherwise. In Table 1 below it can be checked that the output sequences y satisfying p(y|σ_0^S) ≥ p(y|σ_1^S) with strict inequality are exactly those belonging to the set P0, and those for which p(y|σ_1^S) ≥ p(y|σ_0^S) with strict inequality are the ones in P1. Therefore the receiver's jointly typical decoding rule is a best response to the sender's coding strategy.

    y      p(y|x_0)   p(y|x_1)
    000    0.729      0.008
    001    0.081      0.032
    010    0.081      0.032
    011    0.009      0.128
    100    0.081      0.032
    101    0.009      0.128
    110    0.009      0.128
    111    0.001      0.512

    Table 1

2) The sender's Nash equilibrium condition, given the receiver's jointly typical decoding, amounts to choosing input sequences σ_0^S and σ_1^S, in states ω_0 and ω_1 respectively, such that

    Σ_{y∈Y} p(y|σ_0^S) u(σ_y^R, ω_0) = Σ_{y∈P0} p(y|σ_0^S) ≥ Σ_{y∈P0} p(y|σ_0'^S) = Σ_{y∈Y} p(y|σ_0'^S) u(σ_y^R, ω_0),
    Σ_{y∈Y} p(y|σ_1^S) u(σ_y^R, ω_1) = Σ_{y∈P1} p(y|σ_1^S) ≥ Σ_{y∈P1} p(y|σ_1'^S) = Σ_{y∈Y} p(y|σ_1'^S) u(σ_y^R, ω_1),

for any other input sequences σ_0'^S and σ_1'^S, respectively. Let Σ_{y∈P0} p(y|x) and Σ_{y∈P1} p(y|x) denote the aggregated probabilities of the sequences in P0 and in P1 when the input sequence x is sent. Given the symmetry of the sequences, it suffices to check the candidates shown in Table 2 below:

    x      Σ_{y∈P0} p(y|x)   Σ_{y∈P1} p(y|x)
    000    0.972             0.028
    001    0.846             0.154
    011    0.328             0.672
    111    0.104             0.896

    Table 2

Clearly, if the state is ω_0, then obeying the communication protocol and sending σ_0^S = 000 is a best reply to the receiver's strategy, since sending any other input sequence instead would only decrease the sender's payoff, as shown in the middle column of the table. Similarly, if the state is ω_1, sending σ_1^S = 111 maximizes the sender's payoff against the receiver's strategy, as shown in the right-hand column of the table.
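Tables 1 and 2 can be reproduced directly from the channel probabilities (a sketch of ours; the set P0 is hard-coded from the partition just derived).

from itertools import product

eps0, eps1 = 0.1, 0.2

def p_y_given_x(y, x):
    p = 1.0
    for xi, yi in zip(x, y):
        flip = eps0 if xi == '0' else eps1
        p *= flip if xi != yi else 1 - flip
    return p

ys = [''.join(b) for b in product('01', repeat=3)]
P0 = {'000', '001', '010', '100'}     # decoded as a_0
P1 = set(ys) - P0                     # decoded as a_1

# Table 1: likelihood of each output sequence under the two codewords.
for y in ys:
    print(y, round(p_y_given_x(y, '000'), 3), round(p_y_given_x(y, '111'), 3))

# Table 2: aggregated probabilities over P0 and P1 for candidate input sequences.
for x in ['000', '001', '011', '111']:
    print(x, round(sum(p_y_given_x(y, x) for y in P0), 3),
             round(sum(p_y_given_x(y, x) for y in P1), 3))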
To conclude this example, we display in Figure 1 the relationship between the η-approximation and the existence of an output-set partition. The horizontal axis represents the output sequences and the vertical axis the functions ∆_x0(y) (dotted line) and ∆_x1(y) (continuous line) for the natural coding x_0 = 000 and x_1 = 111. Different values of η are plotted in the same figure. We obtain the following remarks:

• For η = 0.9 and y ∈ Y, if the value of ∆_x0(y) lies above the constant function η = 0.9, then that of ∆_x1(y) lies below η, and vice versa. By the jointly typical condition, every y is uniquely related in probability to either x_0 or x_1. Therefore, for η = 0.9 a partition of the set Y is easily generated.

• The same reasoning applies to any η in (0.6, 1.08). This is why we have chosen η = 0.64.

• For η ≥ 1.08 or η ≤ 0.6, there are output sequences belonging both to the output set associated with x_0 and to that associated with x_1. Hence, there is a need to uniquely reassign those sequences to one of them.

In sum, under the natural coding x_0 = 000 and x_1 = 111 it is possible to find a range of η which enables the construction of a partition of the output set and therefore supports the strategies of the communication protocol as a Nash equilibrium of the extended communication game.

[Figure 1: Partition of the output message space around x_0 = 000 and x_1 = 111. The figure plots ∆_x0(y) and ∆_x1(y) for the eight output sequences 000, ..., 111, together with the constant thresholds η = 0.64 and η = 1.7.]

However, other realizations of the random code might not guarantee the existence of such an η, as the following code realization shows.

4.2 A receiver's deviation

Suppose that a new realization of the code is

    ( x_1(0)  x_2(0)  x_3(0) )   =   ( x_0 = 0, 1, 0 )
    ( x_1(1)  x_2(1)  x_3(1) )       ( x_1 = 0, 1, 1 )

where, as above, the channel is ν(ε_0, ε_1) = ν(0.1, 0.2) and Γ^3_ν is the noisy communication extended game. Fix now η = 0.37.

Suppose that the receiver observes the output sequence y = 010. Let us calculate p(y = 010 | x_0 = 010) = 0.648 and p(y = 010 | x_1 = 011) = 0.144, and the functions

    ∆_x0(y) = |−(1/3) log p(x_0, y) − H(X, Y)| = 0.40,
    ∆_x1(y) = |−(1/3) log p(x_1, y) − H(X, Y)| = 0.36.

For η = 0.37, Shannon's protocol dictates that the receiver decodes y as x_1 and plays action a_1. This situation corresponds to case 3 in Subsection 3.2 (with the roles of x_0 and x_1 interchanged), where the protocol may not order the conditional probabilities. In fact, the Nash equilibrium condition for the receiver when y = 010 requires choosing action a_0, since, as shown above, the conditional probability of y given x_0 = 010 (0.648) is bigger than the conditional probability of y given x_1 = 011 (0.144).
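The numbers behind this deviation can be checked directly; the sketch below (ours, using the exact entropy rather than a rounded value) reproduces the conflict between the typicality test and the likelihood comparison for y = 010.

from math import log2

theta, eps0, eps1, eta = 0.52, 0.1, 0.2, 0.37

def H(p):
    return -sum(q * log2(q) for q in p if q > 0)

def p_y_given_x(y, x):
    p = 1.0
    for xi, yi in zip(x, y):
        flip = eps0 if xi == '0' else eps1
        p *= flip if xi != yi else 1 - flip
    return p

def p_x(x):
    p = 1.0
    for xi in x:
        p *= theta if xi == '0' else 1 - theta
    return p

H_XY = H([theta, 1 - theta]) + theta * H([eps0, 1 - eps0]) + (1 - theta) * H([eps1, 1 - eps1])

x0, x1, y = '010', '011', '010'
d0 = abs(-log2(p_x(x0) * p_y_given_x(y, x0)) / 3 - H_XY)
d1 = abs(-log2(p_x(x1) * p_y_given_x(y, x1)) / 3 - H_XY)
print(round(d0, 2), round(d1, 2))                                    # 0.4 and 0.36: only x1 is eta-typical
print(round(p_y_given_x(y, x0), 3), round(p_y_given_x(y, x1), 3))    # 0.648 > 0.144: a0 is the best response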
4.3 A sender's deviation

Fix now⁴ n = 5 and suppose that the specific and commonly known realization of the random code is

    ( x_1(0)  x_2(0)  ...  x_5(0) )   =   ( x_0 = 0, 0, 0, 0, 0 )
    ( x_1(1)  x_2(1)  ...  x_5(1) )       ( x_1 = 0, 0, 0, 1, 1 )

where the two signal strings share the first three digits and therefore only the last two digits differ. Then σ_j^S, j = 0, 1, is

    σ_0^S = x_0 = 00000, if ω = ω_0,
    σ_1^S = x_1 = 00011, if ω = ω_1.

⁴ We ran a systematic search for a sender's deviation when n < 5 and concluded that there was none.

To construct the receiver's strategy, we repeat the above computations of the sets P00, P0¬0, P1¬1, P11, P0 and P1 of Y. Notice that P00 ≠ P1¬1 implies that the receiver cannot univocally assign some y in Y to x_0, no matter whether x_0 or x_1 has been sent. Similarly, P0¬0 ≠ P11, with the same meaning for x_1. Therefore, P0 ∪ P1 ⊊ Y. Let us define the set P2 = Y − (P0 ∪ P1):

    P2 = {y ∈ Y : ∆_x0(y) < η and ∆_x1(y) < η} ∪ {y ∈ Y : ∆_x0(y) ≥ η and ∆_x1(y) ≥ η}
       = {00100, 00111, 01000, 01011, 01100, 01111, 10000, 10011, 10100, 10111, 11000, 11011}.

This set contains all the sequences in Y which the receiver is not able to decode, i.e., any y ∈ P2 cannot be univocally assigned either to x_0 or to x_1: these are the errors in Shannon's approach. Therefore the jointly typical decoding does not generate a partition of Y, and the receiver does not know how to take an action in Γ. The sequences in P2 must then be assigned to either P0 or P1. Consider the specific rule that assigns each sequence y ∈ P2 to the input sequence which is probabilistically closer to it,⁵ namely y ∈ P0 if ∆_x0(y) < ∆_x1(y), and y ∈ P1 otherwise. Then:

    P0 = {00100, 01000, 01100, 10000, 10100, 11000, 11100}
    P1 = {00000, 00001, 00010, 00011, 00101, 00110, 00111, 01001, 01010, 01011, 01101, 01110, 01111,
          10001, 10010, 10011, 10101, 10110, 10111, 11001, 11010, 11011, 11101, 11110, 11111}

⁵ This rule is in the spirit of the maximum likelihood criterion.

Therefore P0 ∩ P1 = ∅ and P0 ∪ P1 = Y, and the partition gives rise to the receiver's strategy σ_y^R = a_i whenever y ∈ P_i, for each y ∈ Y. Recalling that p(P0) = Σ_{y∈P0} p(y|σ_0^S) and p(P1) = Σ_{y∈P1} p(y|σ_1^S), it is easy to calculate that p(P0) = 0.21951 and p(P1) = 0.98916.

Consider the sender's deviation

    σ_0^{dS} = x_0^d = 11100, if ω = ω_0, instead of σ_0^S = x_0 = 00000,
    σ_1^S = x_1 = 00011, if ω = ω_1.

This deviation does not change the partition, but it does change the probability that the deviating input places on the sets P0 and P1. In particular, Σ_{y∈P0} p(y|x_0 = 00000) = 0.21951, whereas Σ_{y∈P0} p(y|x_0^d = 11100) = 0.80352. Suppose that ω = ω_0 and let σ_0^S and σ_y^R, for each y ∈ Y, be the strategies of following the protocol faithfully in Γ^5_ν. Then the sender's expected payoffs are

    π_0^S = π_0^S(σ_0^S, {σ_y^R}_y) = Σ_{y∈P0} p(y|σ_0^S) · 1 = 0.21951,
    π_0^{dS} = π_0^S(σ_0^{dS}, {σ_y^R}_y) = Σ_{y∈P0} p(y|σ_0^{dS}) · 1 = 0.80352,

and the sender will therefore deviate.
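The payoff comparison behind this deviation can be reproduced as follows (our sketch; the partition is built by assigning every output sequence to the codeword with the smaller ∆-value, which is equivalent to the typicality-plus-reassignment procedure described above and yields exactly the sets P0 and P1 listed there).

from itertools import product
from math import log2

theta, eps0, eps1 = 0.52, 0.1, 0.2
x0, x1, x0_dev = '00000', '00011', '11100'

def H(p):
    return -sum(q * log2(q) for q in p if q > 0)

def p_y_given_x(y, x):
    p = 1.0
    for xi, yi in zip(x, y):
        flip = eps0 if xi == '0' else eps1
        p *= flip if xi != yi else 1 - flip
    return p

def p_x(x):
    p = 1.0
    for xi in x:
        p *= theta if xi == '0' else 1 - theta
    return p

H_XY = H([theta, 1 - theta]) + theta * H([eps0, 1 - eps0]) + (1 - theta) * H([eps1, 1 - eps1])

def delta(x, y):
    return abs(-log2(p_x(x) * p_y_given_x(y, x)) / len(y) - H_XY)

ys = [''.join(b) for b in product('01', repeat=5)]
P0 = [y for y in ys if delta(x0, y) < delta(x1, y)]        # sequences decoded as a_0
print(P0)                                                  # the seven sequences listed above
print(round(sum(p_y_given_x(y, x0) for y in P0), 5))       # 0.21951: payoff from obeying the protocol
print(round(sum(p_y_given_x(y, x0_dev) for y in P0), 5))   # 0.80352: payoff from deviating to 11100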
5 Concluding remarks

Information Theory tells us that, whatever the probability of error in information transmission, it is possible to construct error-correcting codes in which the likelihood of failure is arbitrarily low. In this framework, error detection is the ability to detect the presence of errors caused by noise, while error correction is the additional ability to reconstruct the original, error-free data. Detection is much simpler than correction, and the basic idea is to add one or more "check" digits to the transmitted information (e.g., such digits are commonly embedded in credit card numbers in order to detect mistakes). As is common in Information Theory protocols, both the sender and the receiver are committed to using specific rules in order to construct error-correcting/detecting codes. Shannon's theorem is an important theorem in error correction, describing the maximum attainable efficiency of an error-correcting scheme for expected levels of noise interference. Namely, Shannon's Theorem is an asymptotic result: it establishes that for any small tolerance it is possible to construct error-correcting codes in which the likelihood of failure is arbitrarily low, thus providing necessary and sufficient conditions to achieve good information transmission.

Nevertheless, the asymptotic nature of such a protocol masks the difficulties of applying Information Theory protocols to finite communication schemes in strategic sender-receiver games. In this paper we consider a game-theoretical model where a sender and a receiver are players trying to coordinate their actions through a finite-time communication protocol à la Shannon.

Firstly, given a common knowledge coding rule and an output message, we offer the Nash equilibrium conditions for the extended communication game. Specifically, the receiver's equilibrium conditions are summarized by choosing the action corresponding to the state of nature for which the conditional probability of the received message is highest. This implies an ordering of the probabilities of receiving a message conditional on any possible input message. On the other hand, given the realized state of nature and the receiver's partition of the output message space generated by the coding and decoding rules, the sender's equilibrium conditions are specified by choosing the input message that maximizes the sum of the above conditional probabilities over all output messages belonging to the partition element corresponding to that state of nature.

Secondly, we relate the Nash equilibrium strategies to those of Shannon's coding and decoding scheme. In particular, we rewrite the receiver's Nash constraint in terms of the entropy condition of the jointly typical set, pointing out that this entropy condition may not be enough to guarantee the partition of the output space. Finally, we provide two counterexamples to illustrate our findings.

Consequently, coding and decoding rules from Information Theory satisfy a set of information transmission constraints, but they may fail to be Nash equilibrium strategies.

References

[1] Blume, A., O. J. Board and K. Kawamura (2007): "Noisy Talk", Theoretical Economics, Vol. 2, 395–440.

[2] Cover, T. M. and J. A. Thomas (1991): Elements of Information Theory. Wiley Series in Telecommunications. Wiley.

[3] Crawford, V. and J. Sobel (1982): "Strategic Information Transmission", Econometrica, Vol. 50, 1431–1451.

[4] Gossner, O., P. Hernández and A. Neyman (2006): "Optimal Use of Communication Resources", Econometrica, Vol. 74, 1603–1636.

[5] Gossner, O. and T. Tomala (2006): "Empirical Distributions of Beliefs under Imperfect Monitoring", Mathematics of Operations Research, Vol. 31, 13–31.

[6] Gossner, O. and T. Tomala (2007): "Secret Correlation in Repeated Games with Imperfect Monitoring", Mathematics of Operations Research, Vol. 32, 413–424.

[7] Gossner, O. and T. Tomala (2008): "Entropy Bounds on Bayesian Learning", Journal of Mathematical Economics, Vol. 44, 24–32.

[8] Hernández, P. and A. Urbano (2008): "Codification Schemes and Finite Automata", Mathematical Social Sciences, Vol. 56, 3, 395–409.

[9] Hernández, P., A. Urbano and J. Vila (2010): "Grammar and Language: An Equilibrium Approach", Working Paper ERI-CES 01/2010.

[10] Koessler, F. (2001): "Common Knowledge and Consensus with Noisy Communication", Mathematical Social Sciences, 42(2), 139–159.

[11] Mitusch, K. and R. Strausz (2005): "Mediation in Situations of Conflict and Limited Commitment", Journal of Law, Economics and Organization, Vol. 21(2), 467–500.

[12] Myerson, R. (1991): Game Theory: Analysis of Conflict. Harvard University Press, Cambridge, Massachusetts; London, England.

[13] Rubinstein, A. (1989): "The Electronic Mail Game: A Game with Almost Common Knowledge", American Economic Review, 79, 385–391.

[14] Shannon, C. E. (1948): "A Mathematical Theory of Communication", Bell System Technical Journal, 27, 379–423; 623–656.