Lecture Notes on Game Theory
William H. Sandholm
October 21, 2014
Contents
Many thanks to Katsuhiko Aiba, Emin Dokumac, Danqing Hu, Rui Li, Allen Long, Ignacio Monzon,
Michael Rapp, and Ryoji Sawa for creating the initial draft of this document from my handwritten notes
and various other primitive sources.
Department of Economics, University of Wisconsin, 1180 Observatory Drive, Madison, WI 53706, USA.
e-mail: whs@ssc.wisc.edu; website: http://www.ssc.wisc.edu/whs.
3 Bayesian Games
   3.1 Definition
   3.2 Interpretation
   3.3 Examples

4 Repeated Games
   4.1 The Repeated Prisoner's Dilemma
   4.2 Basic Concepts
   4.3 The Folk Theorem
   4.4 Computing the Set of Subgame Perfect Equilibrium Payoffs
       4.4.1 Dynamic programming
In the first part of the theorem, the "only if" direction follows immediately from the fact that the real numbers are ordered. For the "if" direction, assign the elements of Z utility values sequentially; the weak order axioms ensure that this can be done without contradiction.
Ordinal refers to the fact that only the order of the values of the utility function has meaning. Neither the values themselves nor differences between them convey information about intensity of preferences. This is captured by the second part of the theorem, which says that utility functions are only unique up to increasing transformations.
If Z is (uncountably) infinite, weak order is not enough to ensure that there is an ordinal utility representation:
Example 0.2. Lexicographic preferences. Let Z = R², and suppose that a ≿ b ⇔ a₁ > b₁ or [a₁ = b₁ and a₂ ≥ b₂]. In other words, the agent's first priority is the first component of the prize; he only uses the second component to break ties. While ≿ satisfies the weak order axioms, it can be shown that there is no ordinal utility function that represents ≿. In essence, there are too many levels of preference to fit them all into the real line.
There are various additional assumptions that rule out such examples. One is
Continuity: Z ⊆ Rⁿ, and for every a ∈ Z, the sets {b : b ≿ a} and {b : a ≿ b} are closed.
Notice that Example 0.2 violates this axiom.
Theorem 0.3. Let Z ⊆ Rⁿ and let ≿ be a preference relation on Z. Then there is a continuous ordinal utility function u : Z → R that represents ≿ if and only if ≿ is complete, transitive, and continuous.
In the next section we consider preferences over lotteries: probability distributions over a finite set of prizes. Theorem 0.3 ensures that if preferences satisfy the weak order and continuity axioms, then they can be represented by a continuous ordinal utility function. By introducing an additional axiom, one can obtain a more discriminating representation.
lottery 1: $1M with probability 1.
lottery 2: $2M with probability 1/2, $0 with probability 1/2.
One tempting possibility is to look at expected values: the weighted averages of the possible values, with weights given by probabilities.
lottery 1: $1M × 1 = $1M
lottery 2: $2M × 1/2 + $0M × 1/2 = $1M
[Figure: three lotteries over the prizes $0, $10, and $100: p = (.2, .8, 0), q = (.9, 0, .1), and r = (0, 0, 1).]
Example 0.6. c = .7p + .3q = .7(.2, .8, 0) + .3(.9, 0, .1) = (.41, .56, .03).
[Figure: the compound lottery that plays p with probability .7 and q with probability .3 reduces to the simple lottery (.41, .56, .03) over the prizes $0, $10, and $100.]
Preference axioms:
(NM1) Weak order: ≿ is complete and transitive.
(NM2) Continuity: For all p, q, and r such that p ≻ q ≻ r, there exist α, β ∈ (0, 1) such that
(1 − α)p + αr ≻ q ≻ (1 − β)r + βp.
Example 0.7. p = win Nobel prize, q = nothing, r = get hit by a bus. (Since p, q, and r are supposed to be lotteries, we should really write p = win Nobel prize with probability 1, etc.)
(NM3) Independence: For all p, q, and r and all α ∈ (0, 1),
p ≿ q ⇔ αp + (1 − α)r ≿ αq + (1 − α)r.
Example 0.8. p = (.2, .8, 0), q = (.9, 0, .1), r = (0, 0, 1).
[Figure: the compound lotteries (1/2)p + (1/2)r and (1/2)q + (1/2)r, drawn as two-stage lotteries over the prizes $0, $10, and $100.]
We say that u : Z → R provides an expected utility representation for the preference relation ≿ on Z if
(1)  p ≿ q  ⇔  Σ_{z∈Z} u(z) p(z) ≥ Σ_{z∈Z} u(z) q(z).
The function u is then called a von Neumann-Morgenstern (or NM) utility function.
The theorem tells us that as long as (NM1)-(NM3) hold, there is some way of assigning numbers to the alternatives such that taking expected values of these numbers is the right way to evaluate lotteries over alternatives.
(ii) The values of an NM utility function are sometimes called cardinal utilities (as opposed to ordinal utilities). What more-than-ordinal information do cardinal utilities provide?
The nature of this information can be deduced from the fact that an NM utility function is unique up to positive affine transformations.
Example 0.10. Let a, b, c ∈ Z, and suppose that u_a > u_c > u_b. Let λ = (u_c − u_b)/(u_a − u_b). This quantity is not affected by positive affine transformations. Indeed, if v = αu + β with α > 0, then
(v_c − v_b)/(v_a − v_b) = α(u_c − u_b)/(α(u_a − u_b)) = (u_c − u_b)/(u_a − u_b) = λ.
What if the probabilities are not given? We call an agent Bayesian rational (or say that he has subjective expected utility preferences) if (i) he forms beliefs about the relevant probabilities, and (ii) he maximizes expected utility given these beliefs.

        2
        a       b
1  A  3, 1    0, 0
   B  0, 0    1, 3
Suppose that 1 plays A with probability 3/4, and 2 plays a with probability 1/4. Then
σ = (σ1, σ2) = ((σ1(A), σ1(B)), (σ2(a), σ2(b))) = ((3/4, 1/4), (1/4, 3/4)).
The pure strategy profile (A, a) is played with probability σ1(A) · σ2(a) = 3/4 · 1/4 = 3/16. The complete product distribution is presented in the matrix below.

              2
           a (1/4)   b (3/4)
1  A (3/4)   3/16      9/16
   B (1/4)   1/16      3/16
When player i has two strategies, his set of mixed strategies ΔSi is the simplex in R², which is an interval.
When player i has three strategies, his set of mixed strategies ΔSi is the simplex in R³, which is a triangle.
When player i has four strategies, his set of mixed strategies ΔSi is the simplex in R⁴, which is a pyramid.
[Figures: the simplices for two, three, and four strategies, with vertices labeled by the pure strategies.]
Correlated strategies
In some circumstances we need to consider the possibility that players all have access to
the same randomizing device, and so are able to correlate their behavior. This is not as
strange as it may seem, since any uncertain event that is commonly observed can serve to
correlate behavior.
Example 1.3. Battle of the Sexes revisited.
Suppose that the players observe a toss of a fair coin. If the outcome is Heads, they play
(A, a); if it is Tails, they play (B, b).
A formal description of their behavior specifies the probability of each pure strategy profile: ρ = (ρ(A, a), ρ(A, b), ρ(B, a), ρ(B, b)) = (1/2, 0, 0, 1/2).
        2
        a       b
1  A   1/2      0
   B    0      1/2
This behavior cannot be achieved using a mixed strategy profile, since it requires correlation: any mixed strategy profile putting weight on (A, a) and (B, b) would also put weight on (A, b) and (B, a):

                    2
                a (y > 0)    b (1−y > 0)
1  A (x > 0)       xy          x(1−y)
   B (1−x > 0)   (1−x)y      (1−x)(1−y)

all marginal probabilities > 0 ⇒ all joint probabilities > 0
We call ρ ∈ Δ(∏_{i∈P} Si) = ΔS a correlated strategy. It is an arbitrary joint distribution on ∏_{i∈P} Si.
Example 1.4. Suppose that P = {1, 2, 3} and Si = {1, . . . , ki}. Then a mixed strategy profile σ = (σ1, σ2, σ3) ∈ ∏_{i∈P} ΔSi consists of three probability vectors of lengths k1, k2, and k3, while a correlated strategy ρ ∈ Δ(∏_{i∈P} Si) is a single probability vector of length k1k2k3.
∏_{i∈P} ΔSi (mixed strategy profiles)  "⊆"  Δ(∏_{i∈P} Si) (correlated strategies).
We write "⊆" in quotes because the items on each side are not the same kinds of mathematical objects (i.e., they live in different spaces): each mixed strategy profile corresponds to the product distribution it induces.
Example 1.5. If S1 = {A, B} and S2 = {a, b}, then the set of mixed strategy profiles Δ{A, B} × Δ{a, b} is the product of two intervals, and hence a square. The set of correlated strategies Δ({A, B} × {a, b}) is a pyramid.
[Figure: the simplex Δ({A, B} × {a, b}), a pyramid with vertices Aa, Ab, Ba, and Bb.]
The set of correlated strategies that correspond to mixed strategy profiles, in other words the product distributions on {A, B} × {a, b}, form a surface in the pyramid.
Beliefs
One can divide traditional game-theoretic analyses into two classes: equilibrium and
non-equilibrium. In equilibrium analyses (e.g., using Nash equilibrium), one assumes
that players correctly anticipate how opponents will act. In this case, Bayesian rational
players will maximize their expected utility with respect to correct predictions about how
opponents will act. In nonequilibrium analyses (e.g., dominance arguments) this is not
assumed. Instead, Bayesian rationality requires players to form beliefs about how their
opponents will act, and to maximize their expected payoffs given their beliefs. In some
cases knowledge of opponents' rationality leads to restrictions on plausible beliefs, and
hence on our predictions of play.
Let us consider beliefs in a two-player game. Suppose first that player i expects his opponent to play a pure strategy, but that i may not be certain of which strategy j will play. Then player i should form beliefs μi ∈ ΔSj about what his opponent will do. Note that in this case, player i's beliefs about player j are the same sort of object as a mixed strategy of player j.
Remarks:
(i)
If player i thinks that player j might randomize, then i's beliefs νj would need to be a probability measure on ΔSj (so that, loosely speaking, νj ∈ Δ(ΔSj)). Such beliefs can be reduced to a probability measure on Sj by taking expectations. Specifically, let μi = E_{νj} σj be the mean of a random variable σj that takes values in ΔSj and whose distribution is νj. Then μi ∈ ΔSj, and μi(sj) represents the probability that i assigns to the realization of j's mixed strategy being the pure strategy sj. In the end, these probabilities are all that matter for player i's expected utility calculations. Thus in nonequilibrium analyses, there is no loss in restricting attention to beliefs that only put weight on opponents' pure strategies. We do just this in Sections 1.2 and 1.3.
On the other hand, if player j plays mixed strategy σj, then player i's beliefs are only correct if νj({σj}) = 1. But when we consider solution concepts that require correct beliefs (especially Nash equilibrium; see Section 1.4), there will be no need to refer to beliefs explicitly in the formal definitions, since the definitions will implicitly assume that beliefs are correct.
(ii) When player i's beliefs μi assign probability 1 to player j choosing a pure strategy but put weight on multiple pure strategies, these beliefs are formally identical to a mixed strategy σj of player j. Therefore, the optimization problem player i faces when he is uncertain and holds beliefs μi is equivalent to the optimization problem he faces when he knows player j will play the mixed strategy σj (see below). The implications of this point will be explored in Section 1.3.
Now we consider beliefs in games with many players. Suppose again that each player expects his opponents to play pure strategies, although he is not sure which pure strategies they will choose. In this case, player i's beliefs μi are an element of Δ(∏_{j≠i} Sj), and so are equivalent to a correlated strategy among player i's opponents. Remark (i) above applies here as well: in nonequilibrium analyses, defining beliefs as just described is without loss of generality.
(It may be preferable in some applications to restrict a player's beliefs about different opponents' choices to be independent, in which case beliefs are described by elements of ∏_{j≠i} ΔSj, the set of opponents' mixed strategy profiles. We do not do so here, but we discuss this point further in Section 1.3.)
In all cases, we assume that if a player chooses a mixed strategy, learning which of his
pure strategies is realized does not alter his beliefs about his opponents.
Expected utility
To compute a numerical assessment of a correlated strategy ρ or mixed strategy profile σ, a player takes the weighted average of the utility of each pure strategy profile, with the weights given by the probabilities that each pure strategy profile occurs. This is called the expected utility associated with ρ or σ. See Section 0.2.
Example 1.6. Battle of the Sexes once more.

payoffs:
        a       b
A     3, 1    0, 0
B     0, 0    1, 3

probabilities:
           a (1/4)   b (3/4)
A (3/4)     3/16      9/16
B (1/4)     1/16      3/16

Suppose σ = (σ1, σ2) = ((σ1(A), σ1(B)), (σ2(a), σ2(b))) = ((3/4, 1/4), (1/4, 3/4)) is played. Then
u1(σ) = 3 · 3/16 + 0 · 9/16 + 0 · 1/16 + 1 · 3/16 = 3/4,
u2(σ) = 1 · 3/16 + 0 · 9/16 + 0 · 1/16 + 3 · 3/16 = 3/4.

In general, expected utilities are computed as follows:
(2)  ui(ρ) = Σ_{s∈S} ui(s) ρ(s).
(3)  ui(σ) = Σ_{s∈S} ui(s) ∏_{j∈P} σj(sj).
(4)  ui(σi, μi) = Σ_{s∈S} ui(s) σi(si) μi(s−i).
There is a standard abuse of notation here. In (2) ui acts on correlated strategies (so that ui : Δ(∏_{j∈P} Sj) → R), in (3) ui acts on mixed strategy profiles (so that ui : ∏_{j∈P} ΔSj → R), and in (4) ui acts on mixed strategy/belief pairs (so that ui : ΔSi × Δ(∏_{j≠i} Sj) → R). Sometimes we even combine mixed strategies with pure strategies, as in ui(si, μi). In the end we are always taking the expectation of ui(s) over the relevant distribution on pure strategy profiles s, so there is really no room for confusion.
equilibrium assumptions.
We always assume that the structure and payoffs of the game are common knowledge: that
everyone knows these things, that everyone knows that everyone knows them, and so on.
Notation:
G = {P, {Si}_{i∈P}, {ui}_{i∈P}}    a normal form game
s−i ∈ S−i = ∏_{j≠i} Sj             a profile of pure strategies for i's opponents
μi ∈ Δ(∏_{j≠i} Sj)                 i's beliefs about his opponents' strategies
                                   (formally equivalent to a correlated strategy for i's opponents)
Remember that (i) in a two-player game, player i's beliefs μi are the same kind of object as player j's mixed strategy σj, and (ii) in a game with more than two players, player i's beliefs μi can emulate any mixed strategy profile σ−i of i's opponents, but in addition can allow for correlation.
1.2.1 Strictly dominant strategies
Dominance concerns strategies whose performance is good (or bad) regardless of how
opponents behave.
Pure strategy si ∈ Si is strictly dominant if
(5)  ui(si, s−i) > ui(s′i, s−i) for all s′i ≠ si and all s−i ∈ S−i.
In words: player i prefers si to any alternative s′i regardless of the pure strategy profile played by his opponents.
Example 1.7. Prisoner's Dilemma revisited.
        2
        c       d
1  C  2, 2    0, 3
   D  3, 0    1, 1
Joint payoffs are maximized if both players cooperate. But regardless of what player 2
does, player 1 is better off defecting. The same is true for player 2. In other words, D and
d are strictly dominant strategies.
The entries in the payoff bimatrix are the players' NM utilities. If the game is supposed to represent the banker story from Example 1.1, then having these entries correspond to the dollar amounts in the story is tantamount to assuming that (i) each player is risk neutral, and (ii) each player cares only about his own dollar payoffs. If other considerations are important (for instance, if the two bankers are friends and care about each other's fates), then the payoff matrix would need to be changed to reflect this, and the analysis would differ correspondingly. Put differently, the analysis above tells us only that if each banker is rational and cares only about his dollar payoffs, then we should expect to see (D, d).
The next observation shows that a Bayesian rational player must play a strictly dominant
strategy whenever one is available.
Observation 1.8. Strategy si is strictly dominant if and only if
(6)  ui(si, μi) > ui(s′i, μi) for all s′i ≠ si and all μi ∈ Δ(S−i).
Thus, if strategy si is strictly dominant, then it earns the highest expected utility regardless of player i's beliefs.
While condition (6) directly addresses Bayesian rationality, condition (5) is easier to check. Why are the conditions equivalent? That (6) implies (5) is immediate, since point mass beliefs are beliefs. That (5) implies (6) follows from the fact that the inequality in (6) is a weighted average of those in (5).
Considering player i's mixed strategies would not allow anything new here: First, a pure strategy that strictly dominates all other pure strategies also dominates all other mixed strategies. Second, a mixed strategy that puts positive probability on more than one pure strategy cannot be strictly dominant (since it cannot be the unique best response to any s−i; see Observation 1.14).
1.2.2 Strictly dominated strategies
Most games do not have strictly dominant strategies. How can we get more mileage out of the notion of dominance?
A strategy σi ∈ ΔSi is strictly dominated if there exists a σ′i ∈ ΔSi such that
ui(σ′i, s−i) > ui(σi, s−i) for all s−i ∈ S−i.
Remarks on strictly dominated strategies:
(iv) But even if a group of pure strategies is not dominated, mixed strategies that combine them may be:

        L       R
T     3, ·    0, ·
M     0, ·    3, ·
B     2, ·    2, ·

T, M, and B are all best responses to some μ2 ∈ ΔS2, and so are not strictly dominated. But 1/2 T + 1/2 M (which guarantees 3/2) is strictly dominated by B (which guarantees 2). In fact, any mixed strategy with both T and M in its support is strictly dominated.
1.2.3 Iterated strict dominance
Some games without a strictly dominant strategy can still be solved using the idea of
dominance.
Example 1.9. In the game below, 2 does not have a dominated pure strategy.

        2
        L       C       R
1  T  2, 2    6, 1    1, 1
   M  1, 3    5, 5    9, 2
   B  0, 0    4, 2    8, 8
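The elimination logic can be automated. Below is a minimal, illustrative Python sketch of iterated strict dominance that checks only dominance by pure strategies (which happens to suffice for this game; in general one must also consider dominance by mixed strategies):

```python
def iterated_strict_dominance(u1, u2, S1, S2):
    """Iteratively delete pure strategies strictly dominated by other pure strategies."""
    S1, S2 = list(S1), list(S2)
    changed = True
    while changed:
        changed = False
        # For player 2 the payoff table below is indexed u2[s2][s1].
        for (S, opp, u) in ((S1, S2, u1), (S2, S1, u2)):
            for s in S[:]:
                if any(all(u[t][o] > u[s][o] for o in opp) for t in S if t != s):
                    S.remove(s)
                    changed = True
    return S1, S2

# Example 1.9 (payoffs read from the bimatrix above).
u1 = {'T': {'L': 2, 'C': 6, 'R': 1}, 'M': {'L': 1, 'C': 5, 'R': 9},
      'B': {'L': 0, 'C': 4, 'R': 8}}
u2 = {'L': {'T': 2, 'M': 3, 'B': 0}, 'C': {'T': 1, 'M': 5, 'B': 2},
      'R': {'T': 1, 'M': 2, 'B': 8}}
print(iterated_strict_dominance(u1, u2, ['T', 'M', 'B'], ['L', 'C', 'R']))
# (['T'], ['L']): after B (dominated by M) and then R, M, and C are removed,
# iterated strict dominance yields the unique profile (T, L).
```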
        L       R
T     1, ·    0, ·
B     0, ·    0, ·

In this game, T weakly dominates B: T does at least as well against both L and R, and strictly better against L.
While the use of weakly dominated strategies is not ruled out by Bayesian rationality alone, the avoidance of such strategies is often taken as a first principle. In decision theory, this principle is referred to as admissibility; see Kohlberg and Mertens (1986) for discussion and historical comments. In game theory, admissibility is sometimes deduced from the principle of cautiousness, which requires that players not view any opponent behavior as impossible; see Asheim (2006) for discussion.
It is natural to contemplate iteratively removing weakly dominated strategies. However, iterated removal and cautiousness conflict with one another: removing a strategy means viewing it as impossible, which contradicts cautiousness. See Samuelson (1992) for discussion and analysis. One consequence is that the order of removal of weakly dominated strategies can matter; see Example 1.12 below. (For results on when order of removal does not matter, see Marx and Swinkels (1997) and Østerdal (2005).) But versions of iterated weak dominance can be placed on a secure epistemic footing (see Brandenburger et al. (2008)), and moreover, iterated weak dominance is a powerful tool for analyzing extensive form games (see Section 2.6.1).
Example 1.12. Order of removal matters under IWD.
        2
        L       R
1  U  5, 1    4, 0
   M  6, 0    3, 1
   D  6, 4    4, 4
In the game above, removing the weakly dominated strategy U first leads to the prediction (D, R), while removing the weakly dominated strategy M first leads to the prediction (D, L).
An intermediate solution concept between ISD and IWD is introduced by Dekel and Fudenberg (1990), who suggest one round of elimination of all weakly dominated strategies,
followed by iterated elimination of strictly dominated strategies. Since weak dominance
is not applied iteratively, the tensions described above do not arise. Strategies that survive
this Dekel-Fudenberg procedure are sometimes called permissible. See Section 2.5 for further
discussion.
1.3 Rationalizability
1.3.1 Definition and examples
Q: What is the tightest prediction that we can make assuming only common knowledge
of rationality?
A: Bayesian rational players not only avoid dominated strategies; they also avoid strategies that are never a best response. If we apply this idea iteratively, we obtain the
sets of rationalizable strategies.
Strategy σi is a best response to beliefs μi (denoted σi ∈ Bi(μi)) if
ui(σi, μi) ≥ ui(σ′i, μi) for all σ′i ∈ ΔSi.
Each player simultaneously chooses an integer in {0, 1, . . . , 100}. The target integer is the integer closest to 3/4 of the average of all players' choices.
All players choosing the target integer split a prize worth V > 0 (or, alternatively, each is
given the prize with equal probability). If no one chooses the target integer, the prize is
not awarded.
Which pure strategies are rationalizable in this game?
To start, we claim that for any pure strategy profile s−i of his opponents, player i has a response ri ∈ Si such that the target integer generated by (ri, s−i) is ri. (You are asked to prove this on the problem set.) Thus for any beliefs μi about his opponents, player i can obtain a positive expected payoff (for instance, by playing a best response to some s−i in the support of μi).
So: Since Si = {0, 1, . . . , 100}, the target integer can never exceed 3/4 × 100 = 75, so no strategy above 75 is ever a best response, and rational players do not choose strategies above 75.
Thus if players are rational and know that others are rational, no player chooses a strategy
above 56.
Proceeding through the rounds of eliminating strategies that cannot be best responses, we
find that no player will choose a strategy higher than
75 . . . 56 . . . 42 . . . 31 . . . 23 . . . 17 . . . 12 . . . 9 . . . 6 . . . 4 . . . 3 . . . 2 . . . 1 . . . 0.
Thus, after 14 rounds of iteratively removing strategies that cannot be best responses, we conclude that each player's unique rationalizable strategy is 0.
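Assuming the bound on surviving strategies shrinks by the factor 3/4 each round, with integer strategies (so each new bound is the integer part of 3/4 of the old one), a few lines of Python reproduce the chain:

```python
# Upper bound on surviving strategies after each round of removing
# strategies that cannot be best responses.
bound, rounds = 100, 0
while bound > 0:
    bound = int(bound * 3 / 4)   # 75, 56, 42, 31, 23, 17, 12, 9, 6, 4, 3, 2, 1, 0
    rounds += 1
    print(rounds, bound)
# 14 rounds: only the strategy 0 survives, so 0 is the unique rationalizable strategy.
```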
When applying rationalizability, we may reach a point in our analysis at which a player has multiple pure strategies, none of which can be removed (meaning that for each such strategy, there are beliefs against which that strategy is optimal). In this case, we should still ask which of the player's mixed strategies are best responses to allowable beliefs, as the next example illustrates.
To find B1 : ΔS2 → S1, write l, c, and r for the probabilities that player 1's beliefs μ1 assign to L, C, and R. Then
u1(T, μ1) ≥ u1(M, μ1) ⇔ 3l ≥ 3c ⇔ l ≥ c;
u1(T, μ1) ≥ u1(B, μ1) ⇔ 3l ≥ 2 ⇔ l ≥ 2/3.

        2
        L       C       R
1  T  3, 3    0, 0    0, 2
   M  0, 0    3, 3    0, 2
   B  2, 2    2, 2    2, 0

To find B2 : ΔS1 → S2, write t, m, and b for the probabilities that player 2's beliefs μ2 assign to T, M, and B. Then
u2(μ2, L) ≥ u2(μ2, C) ⇔ 3t + 2b ≥ 3m + 2b ⇔ t ≥ m;
u2(μ2, L) ≥ u2(μ2, R) ⇔ 3t + 2b ≥ 2t + 2m ⇔ t + 2b ≥ 2m.
[Figures: the best response correspondences B1 : ΔS2 → S1 and B2 : ΔS1 → S2, drawn as regions on the simplices ΔS2 and ΔS1; each region is labeled with the best responses (T, M, B, or ties such as M + T, C + L) to the beliefs in that region.]
Q: No mixtures of T and M are a best response for 1. Since 2 knows this, can it be a best response for her to play R?
A: R is not a best response to any point on the dark lines TB and BM, which represent mixtures between strategies T and B and between B and M.
Since player 2 is uncertain about which best response player 1 will play, Bayesian rationality requires her to form beliefs about this. These beliefs μ2 are a probability measure on the set of player 1's best responses.
If 2's beliefs about 1's behavior are μ2(T) = μ2(M) = 1/2, then it is as if 2 knows that 1 will play 1/2 T + 1/2 M, and R is a best response to these beliefs.
In fact, if μ2(T) = μ2(M) = 2/5 and μ2(B) = 1/5, then it is as if 2 knows that 1 will play 2/5 T + 2/5 M + 1/5 B, so all of 2's mixed strategies are possible best responses.
Thus, player 1's set of rationalizable strategies is R1 = {σ1 ∈ ΔS1 : σ1(T) = 0 or σ1(M) = 0}, and player 2's set of rationalizable strategies is simply R2 = ΔS2.
When we compute the rationalizable strategies, we must account for each player's uncertainty about his opponents' strategies. Thus, during each iteration we must leave in all of his best responses to any mixture of the opponents' surviving pure strategies, even mixtures that are never a best response. Put differently, strategic uncertainty leads us to include the convex hull of the surviving mixed strategies at each intermediate stage of the elimination process.
Iterative definition of (and procedure to compute) rationalizable strategies:
(i) Iteratively remove pure strategies that are never a best response (to any allowable
beliefs).
(ii) When no further pure strategies can be removed, remove mixed strategies that are
never a best response.
The mixed strategies that remain are the rationalizable strategies.
There are refinements of rationalizability based on assumptions beyond CKR that generate
tighter predictions in some games, while still avoiding the use of equilibrium knowledge
assumptionssee Section 2.5.
Rationalizability and iterated strict dominance in two-player games
It is obvious that
Observation 1.16. If σi is strictly dominated, then σi is never a best response.
In two-player games, the converse statement is not obvious, but is nevertheless true:
Proposition 1.17. In a two-player game, any strategy that is never a BR is strictly dominated.
The proof is based on the separating hyperplane theorem: see Section 1.3.2.
So never a best response and strictly dominated are equivalent in two-player games.
Iterating yields
Theorem 1.18. In a two-player game, a strategy is rationalizable if and only if it satisfies iterated
strict dominance.
Rationalizability and iterated strict dominance in games with three or more players
For games with three or more players, there are two definitions of rationalizability in use. The original one (sometimes called independent rationalizability) computes best responses under the assumption that a player's beliefs about different opponents' choices are independent, so that these beliefs are formally equivalent to an opponents' mixed strategy profile. The alternative (sometimes called correlated rationalizability) allows correlation in a player's beliefs about different opponents' choices. This agrees with the way we defined beliefs in Section 1.1.2. In either case, [σi strictly dominated] ⇒ [σi never a best response], so all rationalizable strategies survive iterated strict dominance. But the analogues of Proposition 1.17 and Theorem 1.18 are only true under correlated rationalizability.
While opinion is not completely uniform, most game theorists would choose correlated
rationalizability as the more basic of the two concepts. See Hillas and Kohlberg (2002) for
a compelling defense of this point of view. We take rationalizability to mean correlated
rationalizability unless otherwise noted.
Example 1.19. Consider the following three-player game in which only player 3's payoffs are shown.

3: A
        L         R
T    ·, ·, 5   ·, ·, 2
B    ·, ·, 2   ·, ·, 1

3: B
        L         R
T    ·, ·, 4   ·, ·, 0
B    ·, ·, 0   ·, ·, 4

3: C
        L         R
T    ·, ·, 1   ·, ·, 2
B    ·, ·, 2   ·, ·, 5
Strategy B is not strictly dominated, since a dominating mixture of A and C would need to put at least probability 3/4 on A (in case 1 and 2 play (T, L)) and at least probability 3/4 on C (in case 1 and 2 play (B, R)). If player 3's beliefs about player 1's choices and player 2's choices are independent, B is not a best response: Independence implies that for some t, l ∈ [0, 1], we can write μ3(T, L) = tl, μ3(T, R) = t(1 − l), μ3(B, L) = (1 − t)l, and μ3(B, R) = (1 − t)(1 − l). Then
u3(C, μ3) > u3(B, μ3)
⇔ tl + 2t(1 − l) + 2(1 − t)l + 5(1 − t)(1 − l) > 4tl + 4(1 − t)(1 − l)
⇔ 1 + t + l > 6tl,
which is true whenever t + l ≤ 1 (why?); symmetrically, u3(A, μ3) > u3(B, μ3) whenever t + l ≥ 1. But B is a best response to the correlated beliefs μ3(T, L) = μ3(B, R) = 1/2.
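The following Python sketch verifies both claims numerically: it scans a grid of independent beliefs and finds none against which B is strictly best, then checks the correlated beliefs. (The grid search is an illustration, not a proof; the proof is the algebra above.)

```python
import itertools

# Player 3's payoffs from Example 1.19, indexed by (s1, s2).
u3 = {'A': {('T','L'): 5, ('T','R'): 2, ('B','L'): 2, ('B','R'): 1},
      'B': {('T','L'): 4, ('T','R'): 0, ('B','L'): 0, ('B','R'): 4},
      'C': {('T','L'): 1, ('T','R'): 2, ('B','L'): 2, ('B','R'): 5}}

def payoffs(mu3):
    return {a: sum(p * u3[a][s] for s, p in mu3.items()) for a in 'ABC'}

# Under independent beliefs mu3(s1, s2) = t(s1) * l(s2), B is never strictly best:
grid = [i / 50 for i in range(51)]
best = set()
for t, l in itertools.product(grid, grid):
    mu = {('T','L'): t*l, ('T','R'): t*(1-l),
          ('B','L'): (1-t)*l, ('B','R'): (1-t)*(1-l)}
    v = payoffs(mu)
    if v['B'] > max(v['A'], v['C']):
        best.add((t, l))
print(best)   # empty set: B is never a strict best response to independent beliefs

# But under the correlated beliefs mu3(T,L) = mu3(B,R) = 1/2, B is a best response:
mu = {('T','L'): 0.5, ('T','R'): 0.0, ('B','L'): 0.0, ('B','R'): 0.5}
print(payoffs(mu))   # A: 3.0, B: 4.0, C: 3.0
```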
1.3.2 The separating hyperplane theorem
A hyperplane is a set of points in Rⁿ that satisfy a scalar linear equality. More specifically, the hyperplane H_{p,c} = {x ∈ Rⁿ : p · x = c} is identified by some normal vector p ∈ Rⁿ − {0} and intercept c ∈ R. Since the hyperplane is an (n − 1)-dimensional affine subset of Rⁿ, its normal vector is unique up to a nonzero multiplicative constant.
A half space is a set {x ∈ Rⁿ : p · x ≥ c}.
Example 1.20. In R², a hyperplane is a line: x2 = ax1 + b ⇔ (−a, 1) · x = b, so we may take p = (−a, 1). The figure below displays cases in which a = −1/2, so that p = (1/2, 1).
[Figure: the parallel hyperplanes p · x = 0, p · x = 2, and p · x = 4 in R², with normal vector p.]
[Figure: a hyperplane with normal vector p separating a convex set A from a point z, with the half space p · x > c on one side.]
In cases where B consists of a single point on the boundary of A, the hyperplane whose existence is guaranteed by the theorem is often called a supporting hyperplane.
For proofs, discussion, examples, etc., see Hiriart-Urruty and Lemaréchal (2001).
Application: Best responses and dominance in two-player games
Observation 1.16. If σi is strictly dominated, then σi is never a best response.
Proposition 1.17. Let G be a two-player game. Then σi ∈ ΔSi is strictly dominated if and only if σi is not a best response to any beliefs μi ∈ ΔS−i.
Theorem 1.18. In a two-player game, a strategy is rationalizable if and only if it satisfies iterated
strict dominance.
We illustrate the idea of the proof of Proposition 1.17 with an example.
Example 1.22. Our goal is to show that in the two-player game below, [σ1 ∈ ΔS1 is not strictly dominated] implies that [σ1 is a best response to some μ1 ∈ ΔS2].

        2
        L       R
1  A  2, ·    5, ·
   B  6, ·    3, ·
   C  7, ·    1, ·
   D  3, ·    2, ·

Let v1(σ1) = (u1(σ1, L), u1(σ1, R)) be the vector payoff induced by σ1. Note that u1(σ1, μ1) = μ1 · v1(σ1).
Let V1 = {v1(σ1) : σ1 ∈ ΔS1} be the set of such vector payoffs. Equivalently, V1 is the convex hull of the vector payoffs to player 1's pure strategies. It is closed and convex.
Now σ̂1 ∈ ΔS1 is not strictly dominated if and only if v1(σ̂1) lies on the northeast boundary of V1. For example, σ̂1 = 1/2 A + 1/2 B is not strictly dominated, with v1(σ̂1) = (4, 4). We want to show that σ̂1 is a best response to some μ̂1 ∈ ΔS2.
[Figure: the set V1 ⊂ R², the convex hull of v1(A) = (2, 5), v1(B) = (6, 3), v1(C) = (7, 1), and v1(D) = (3, 2); the point v1(σ̂1) = v1(1/2 A + 1/2 B) = (4, 4); and the hyperplane μ̂1 · w1 = 4 with μ̂1 = (1/3, 2/3). The set V1 lies in the half space μ̂1 · w1 ≤ 4.]
A general principle: when you are given a point on the boundary of a convex set, the normal vector at that point often reveals something interesting.
The point v1(σ̂1) lies on the hyperplane μ̂1 · w1 = 4, where μ̂1 = (1/3, 2/3). This hyperplane separates the point v1(σ̂1) from the set V1, on which μ̂1 · w1 ≤ 4. Put differently,
μ̂1 · w1 ≤ μ̂1 · v1(σ̂1) for all w1 ∈ V1
⇔ μ̂1 · v1(σ1) ≤ μ̂1 · v1(σ̂1) for all σ1 ∈ ΔS1
⇔ u1(σ1, μ̂1) ≤ u1(σ̂1, μ̂1) for all σ1 ∈ ΔS1.
Thus σ̂1 is a best response to the beliefs μ̂1.
[Figure: a second view of the construction: the payoffs u1(A, μ1), u1(B, μ1), u1(C, μ1), and u1(D, μ1) plotted over ΔS2, with μ̂1 = (1/3, 2/3) and the points 2/3 L + 1/3 R and 1/3 L + 2/3 R marked.]
[Diagram: the cycle underlying rationalizability: beliefs are placed on best responses, which are justified by beliefs, which are placed on best responses, and so on.]
The precise version of this fixed point idea is provided by part (i) of Theorem 1.23, which later will allow us to relate rationalizability to Nash equilibrium. Part (ii) of the theorem provides the new characterization of rationalizability. We state the characterization for pure strategies. (To obtain the version for mixed strategies, take R̃i ⊆ ΔSi as the candidate set and let Ri = ∪_{σi∈R̃i} support(σi).)
Theorem 1.23. Let Ri ⊆ Si for all i ∈ P, and let R−i = ∏_{j≠i} Rj.
        a       b       c       d
A     7, 0    2, 5    0, 7    0, 1
B     5, 2    3, 3    5, 2    0, 1
C     0, 7    2, 5    7, 0    0, 1
D     0, 0    0, 2    0, 0    9, 1

Then:
B is optimal for 1 when μ1 = b ∈ R2, and
b is optimal for 2 when μ2 = B ∈ R1.
Also:
A is optimal for 1 when μ1 = a ∈ R2, and
a is optimal for 2 when μ2 = C ∈ R1.
        g       b
G     2, 2    0, 0
B     0, 0    1, 1

Everything is rationalizable.
The Nash equilibria are: (G, g), (B, b), and (1/3 G + 2/3 B, 1/3 g + 2/3 b).
Checking the mixed equilibrium:
u2(1/3 G + 2/3 B, g) = 1/3 · 2 + 2/3 · 0 = 2/3,
u2(1/3 G + 2/3 B, b) = 1/3 · 0 + 2/3 · 1 = 2/3.
        s       r
S     8, 8    8, 0
R     0, 8    9, 9

The Nash equilibria here are (S, s), (R, r), and (1/9 S + 8/9 R, 1/9 s + 8/9 r). Although (R, r) yields both players the highest payoff, each player might be tempted by the sure payoff of 8 that the safe investment guarantees.
1.4.2 Computing Nash equilibria
The next proposition provides links between Nash equilibrium and rationalizability.
Proposition 1.27. (i) Any pure strategy used with positive probability in a Nash equilibrium
is rationalizable.
(ii) If each player has a unique rationalizable strategy, the profile of these strategies is a Nash
equilibrium.
Proof. Theorem 1.23 provided conditions under which strategies in the sets Ri ⊆ Si are rationalizable: for each i ∈ P and each si ∈ Ri, there is a μi ∈ ΔS−i, concentrated on R−i, such that si is a best response to μi.
The game from Section 1.3.1 and its best responses, reproduced:

        2
        L       C       R
1  T  3, 3    0, 0    0, 2
   M  0, 0    3, 3    0, 2
   B  2, 2    2, 2    2, 0

[Figures: the best response correspondences B1 : ΔS2 → S1 and B2 : ΔS1 → S2, with the rationalizable sets R1 and R2 indicated.]
The key point here is that in Nash equilibrium, player 2's beliefs are correct (i.e., place probability 1 on player 1's actual strategy).
Thus, we need not consider any support for σ2 that includes R. Three possible supports for σ2 remain:
(i) {L}: 1's best response is T, and 2's best response to T is L, so (T, L) is Nash.
(ii) {C}: 1's best response is M, and 2's best response to M is C, so (M, C) is Nash.
(iii) {L, C}: here we need u2(σ1, L) = u2(σ1, C) ≥ u2(σ1, R). The equality requires
3t + 2b = 3m + 2b, i.e., t = m.
Looking at B1 (or R1), we see that this is only possible if player 1 plays B for sure. Player 1 is willing to do this if
u1(B, σ2) ≥ u1(T, σ2) ⇔ l ≤ 2/3, and
u1(B, σ2) ≥ u1(M, σ2) ⇔ c ≤ 2/3.
Since we know that R is not used in any Nash equilibrium, we conclude that (B, λL + (1 − λ)C) is a Nash equilibrium for λ ∈ [1/3, 2/3].
Since we have checked all possible supports for σ2, we are done.
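As a check, the following Python sketch (using exact rational arithmetic) tests candidate values of λ:

```python
from fractions import Fraction as F

# Player 1's payoffs in the game above.
u1 = {'T': {'L': 3, 'C': 0, 'R': 0}, 'M': {'L': 0, 'C': 3, 'R': 0},
      'B': {'L': 2, 'C': 2, 'R': 2}}

def is_equilibrium(lam):
    """Check whether (B, lam*L + (1-lam)*C) is a Nash equilibrium."""
    sigma2 = {'L': lam, 'C': 1 - lam, 'R': F(0)}
    u1B = sum(p * u1['B'][s2] for s2, p in sigma2.items())   # always 2
    # Against B, player 2 gets 2 from both L and C but 0 from R, so any
    # mixture of L and C is a best response for her; only 1's incentives bind.
    return all(u1B >= sum(p * u1[s1][s2] for s2, p in sigma2.items())
               for s1 in u1)

print([str(l) for l in (F(1,5), F(1,3), F(1,2), F(2,3), F(4,5))
       if is_equilibrium(l)])
# ['1/3', '1/2', '2/3']: Nash exactly when lam lies in [1/3, 2/3]
```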
Example 1.29. Zeeman's (1980) game.

         2
         A        B        C
1  A   0, 0    6, −3    −4, −1
   B  −3, 6    0, 0     5, 3
   C  −1, −4   3, 5     0, 0
Since the game is symmetric, both players have the same incentives as a function of the opponent's behavior. Writing (a, b, c) for the opponent's mixed strategy:
A ⪰ B ⇔ 6b − 4c ≥ −3a + 5c ⇔ a + 2b ≥ 3c;
A ⪰ C ⇔ 6b − 4c ≥ −a + 3b ⇔ a + 3b ≥ 4c;
B ⪰ C ⇔ −3a + 5c ≥ −a + 3b ⇔ 5c ≥ 2a + 3b.
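The three displayed equivalences can be verified symbolically; the Python sketch below instead checks them with exact rational arithmetic at randomly sampled points of the simplex:

```python
from fractions import Fraction as F
import random

# Player 1's payoff matrix in Zeeman's game (rows A, B, C vs. columns A, B, C).
U = {'A': (0, 6, -4), 'B': (-3, 0, 5), 'C': (-1, 3, 0)}

def u(row, a, b, c):
    return U[row][0]*a + U[row][1]*b + U[row][2]*c

random.seed(0)
for _ in range(1000):
    i, j, k = (random.randint(0, 100) for _ in range(3))
    s = (i + j + k) or 1
    a, b, c = F(i, s), F(j, s), F(k, s)       # an exact point of the simplex
    assert (u('A', a, b, c) >= u('B', a, b, c)) == (a + 2*b >= 3*c)
    assert (u('A', a, b, c) >= u('C', a, b, c)) == (a + 3*b >= 4*c)
    assert (u('B', a, b, c) >= u('C', a, b, c)) == (5*c >= 2*a + 3*b)
print("all three equivalences hold at the sampled points")
```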
[Figure: the best response regions on the simplex of opponent strategies, with regions labeled A, B, and C and boundaries labeled by the indifferent pairs (A + C, B + C, etc.).]
Consider the three-player game below: player 1 chooses A or D, player 2 chooses a or d, and player 3 chooses L or R.

3: L
         a          d
A    2, 2, 2    0, 0, 1
D    0, 0, 3    0, 0, 3

3: R
         a          d
A    2, 2, 2    0, 3, 3
D    1, 0, 2    1, 0, 2

We search for Nash equilibria by considering the possible supports of players 1's and 2's strategies:
(D, d)     Implies that 3 plays L. Since 1 and 2 are also playing best responses, this is a Nash equilibrium.
(D, a)     Implies that 3 plays L, which implies that 1 prefers to deviate to A.
(D, mix)   Implies that 3 plays L, which with 2 mixing implies that 1 prefers to deviate to A.
(A, d)     Implies that 3 plays R, which implies that 1 prefers to deviate to D.
(A, a)     1 and 2 are willing to play this if σ3(L) ≥ 1/3. Since 3 cannot affect his payoffs given the behavior of 1 and 2, these are Nash equilibria.
(A, mix)   2 only mixes if σ3(L) = 1/3; but if 1 plays A and 2 mixes, 3 strictly prefers R, a contradiction.
(mix, a)   Implies that 3 plays L, which implies that 1 strictly prefers A.
(mix, d)   If 2 plays d, then for 1 to be willing to mix, 3 must play L; this leads 2 to deviate to a.
(mix, mix) Notice that 2 can only affect her own payoffs when 1 plays A. Hence, for 2 to be indifferent, σ3(L) = 1/3. Given this, 1 is willing to mix if σ2(d) = 2/3. Then for 3 to be indifferent, σ1(D) = 4/7. This is a Nash equilibrium.
There are three components of Nash equilibria: (D, d, L); (A, a, σ3) with σ3(L) ≥ 1/3; and (3/7 A + 4/7 D, 1/3 a + 2/3 d, 1/3 L + 2/3 R).
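Verifying the mixed equilibrium by hand requires several indifference computations; the following Python sketch does them with exact arithmetic (the payoff dictionary encodes the two matrices above):

```python
from fractions import Fraction as F
from itertools import product

# Payoffs (u1, u2, u3) indexed by (s1, s2, s3), read from the matrices above.
U = {('A','a','L'): (2,2,2), ('A','d','L'): (0,0,1),
     ('D','a','L'): (0,0,3), ('D','d','L'): (0,0,3),
     ('A','a','R'): (2,2,2), ('A','d','R'): (0,3,3),
     ('D','a','R'): (1,0,2), ('D','d','R'): (1,0,2)}

sigma1 = {'A': F(3,7), 'D': F(4,7)}
sigma2 = {'a': F(1,3), 'd': F(2,3)}
sigma3 = {'L': F(1,3), 'R': F(2,3)}

def u_pure(i, si):
    """Player i's expected payoff from pure strategy si, others mixing."""
    mixes = [sigma1, sigma2, sigma3]
    mixes[i] = {si: F(1)}
    return sum(p1 * p2 * p3 * U[(s1, s2, s3)][i]
               for (s1, p1), (s2, p2), (s3, p3)
               in product(mixes[0].items(), mixes[1].items(), mixes[2].items()))

for i, S in enumerate(('AD', 'ad', 'LR')):
    print(i + 1, {s: str(u_pure(i, s)) for s in S})
# Each player is indifferent between his two pure strategies (2/3, 6/7, 16/7),
# confirming the equilibrium (3/7 A + 4/7 D, 1/3 a + 2/3 d, 1/3 L + 2/3 R).
```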
        g       b
G     2, 2    0, 0
B     0, 0    1, 1

NE: (G, g), (B, b), and (1/3 G + 2/3 B, 1/3 g + 2/3 b).
Focal points can also be determined abstractly, using (a)symmetry to single out certain distinct strategies: see Alós-Ferrer and Kuzmics (2013).
(iv) Learning/Evolution: If players repeatedly face the same game, they may find their
way from arbitrary initial behavior to Nash equilibrium.
Heuristic learning: Small groups of players, typically employing rules that
condition on the empirical distribution of past play (Young (2004))
Evolutionary game theory: Large populations of agents using myopic updating rules (Sandholm (2010))
In some classes of games (that include the two examples above), many learning
and evolutionary processes do converge to Nash equilibrium.
But there is no general guarantee of convergence:
Many games lead to cycling or chaotic behavior, and in some games any reasonable dynamic process fails to converge to equilibrium (Shapley (1964), Hofbauer
and Swinkels (1996), Hart and Mas-Colell (2003)).
Some games introduced in applications are known to have poor convergence properties (Hopkins and Seymour (2002), Lahkar (2011)).
In fact, evolutionary game theory models do not even support the elimination of
strictly dominated strategies in all games (Hofbauer and Sandholm (2011)).
Interpretation of mixed strategy Nash equilibrium: why mix in precisely the way that makes your opponents indifferent?
In the unique equilibrium of Matching Pennies, player 1 is indifferent among all of his mixed strategies. He chooses (1/2, 1/2) because this makes player 2 indifferent. Why should we expect player 1 to behave in this way?
(i) Deliberate randomization
(ii) Mixed strategies as opponents' beliefs
We can interpret σi as describing the beliefs that player i's opponents have about player i's behavior. The fact that σi is a mixed strategy then reflects the opponents' uncertainty about how i will behave, even if i is not actually planning to randomize.
But as Rubinstein (1991) observes, this interpretation
. . . implies that an equilibrium does not lead to a prediction (statistical or otherwise) of the players' behavior. Any player i's action which is a best response given his expectation about the other players' behavior (the other n − 1 strategies) is consistent as a prediction for i's action (this might include actions which are outside the support of the mixed strategy). This renders meaningless any comparative statics or welfare analysis of the mixed strategy equilibrium and brings into question the enormous economic literature which utilizes mixed strategy equilibrium.
(iii) Mixed equilibria as time averages of play: fictitious play (Brown (1951))
Suppose that the game is played repeatedly, and that in each period, each player
chooses a best response to the time average of past play.
Then in certain classes of games, the time average of each players behavior converges to his part in some Nash equilibrium strategy profile.
(iv) Mixed equilibria as population equilibria (Nash (1950))
Suppose that there is one population for the player 1 role and another for the player
2 role, and that players are randomly matched to play the game.
If half of the players in each population play Heads, no one has a reason to deviate.
Hence, the mixed equilibrium describes stationary distributions of pure strategies in
each population.
(v) Purification: mixed equilibria as pure equilibria of games with payoff uncertainty
(Harsanyi (1973))
Example 1.33. Purification in Matching Pennies. Suppose that while the Matching Pennies payoff bimatrix gives players' approximate payoffs, players' actual payoffs also contain small terms εH, εh representing a bias toward playing heads, and that each player only knows his own bias. (The formal framework for modeling this situation is called a Bayesian game; see Section 3.)

                   2
            h                  t
1  H   1 + εH, −1 + εh    −1 + εH, 1
   T   −1, 1 + εh          1, −1

Specifically, suppose that εH and εh are independent random variables with P(εH > 0) = P(εH < 0) = 1/2 and P(εh > 0) = P(εh < 0) = 1/2. Then it is a strict Nash equilibrium for each player to follow his bias. From the ex ante point of view, the distribution over actions that this equilibrium generates in the original normal form game is (1/2 H + 1/2 T, 1/2 h + 1/2 t).
Harsanyi (1973) shows that any mixed equilibrium can be purified in this way. This
includes not only reasonable mixed equilibria like that in Matching Pennies, but also
unreasonable ones like those in coordination games.
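A small simulation illustrates the ex ante distribution. (The bias distribution below, uniform on [−0.1, 0.1], is one concrete choice consistent with the assumptions of the example.)

```python
import random

random.seed(0)
# Each player privately observes his bias and follows it.
draws = [(random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1))
         for _ in range(100_000)]
plays = [('H' if eH > 0 else 'T', 'h' if eh > 0 else 't') for eH, eh in draws]

# Given that 2 plays h with probability 1/2, player 1's expected payoff is
# eH from H and 0 from T, so playing H exactly when eH > 0 is strictly optimal
# (and symmetrically for player 2): following the bias is a strict equilibrium.
freq_H = sum(a == 'H' for a, _ in plays) / len(plays)
freq_h = sum(b == 'h' for _, b in plays) / len(plays)
print(freq_H, freq_h)   # both near 1/2: the mixed equilibrium distribution
```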
1.4.4 Existence of Nash equilibrium and structure of the equilibrium set
Existence and structure theorems for finite normal form games
When does Nash equilibrium provide us with at least one prediction of play? Always, at
least in the context of finite normal form games.
Let G = {P, {Si}_{i∈P}, {ui}_{i∈P}} be a finite normal form game.
Theorem 1.34 (Nash (1950)).
G has at least one (possibly mixed) Nash equilibrium.
One can also say more about the structure of the set of Nash equilibria.
Theorem 1.35 (Kohlberg and Mertens (1986)).
The set of Nash equilibria of G consists of finitely many connected components.
Below, a statement is true for generic choices of payoffs if the set of payoff assignments for
which the statement is false has measure zero.
Theorem 1.36 (Wilson (1971), Gul et al. (1993)).
For generic choices of payoffs in G,
(i) the number of Nash equilibria of G is finite and odd;
(ii) if G has 2k + 1 Nash equilibria, at least k of them are (nondegenerate) mixed equilibria.
Equilibrium existence results are usually proved by means of fixed point theorems.
Theorem 1.37 (Brouwer (1912)). Let X ⊆ Rⁿ be nonempty, compact, and convex. Let the function f : X → X be continuous. Then there exists an x* ∈ X such that x* = f(x*).
Theorem 1.38 (Kakutani (1941)). Let X ⊆ Rⁿ be nonempty, compact, and convex. Let the correspondence f : X ⇉ X be nonempty-valued, upper hemicontinuous, and convex-valued. Then there exists an x* ∈ X such that x* ∈ f(x*).
             Betty
          L       R
Al   T  5, 5    2, 6
     B  6, 2    1, 1

[Figure: the convex hull of the payoff vectors, with the points (2, 6), (6, 2), (4, 4), and (3, 3) marked.]
Q: Can a mediator use a randomizing device to generate total expected payoffs greater than 8 in equilibrium?
A: Yes, by only telling each player what he is supposed to play, not what the opponent is supposed to play.
Suppose the device specifies TL, TR, and BL each with probability 1/3, so that ρ = 1/3 TL + 1/3 TR + 1/3 BL:

             Betty
           L       R
Al   T    1/3     1/3
     B    1/3      0
However, Al is only told whether his component is T or B, and Betty is only told whether her component is L or R.
The correlated strategy ρ generates payoffs of (4 1/3, 4 1/3).
Moreover, we claim that both players obeying the device constitutes an equilibrium:
Suppose that Betty plays as prescribed and consider Al's incentives.
If Al sees B, he knows that Betty will play L, so his best response is B (since 6 > 5).
If Al sees T, he believes that Betty is equally likely to play L or R, and so Al is willing to play T (since 3 1/2 = 3 1/2).
By symmetry, this is an equilibrium.
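The obedience constraints can be checked mechanically. A minimal Python sketch (the encoding of the example is ours):

```python
from fractions import Fraction as F

# Payoffs (Al, Betty) and the mediator's distribution rho from the example.
u = {('T','L'): (5,5), ('T','R'): (2,6), ('B','L'): (6,2), ('B','R'): (1,1)}
rho = {('T','L'): F(1,3), ('T','R'): F(1,3), ('B','L'): F(1,3), ('B','R'): F(0)}

def obeys(player, told, alt):
    """Expected payoff from obeying vs. deviating to alt, given signal 'told'."""
    if player == 0:   # Al is told his row
        rel = [(s, p) for s, p in rho.items() if s[0] == told]
        obey = sum(p * u[s][0] for s, p in rel)
        dev = sum(p * u[(alt, s[1])][0] for s, p in rel)
    else:             # Betty is told her column
        rel = [(s, p) for s, p in rho.items() if s[1] == told]
        obey = sum(p * u[s][1] for s, p in rel)
        dev = sum(p * u[(s[0], alt)][1] for s, p in rel)
    return obey >= dev

print([obeys(0,'T','B'), obeys(0,'B','T'), obeys(1,'L','R'), obeys(1,'R','L')])
# [True, True, True, True]: obeying the mediator is an equilibrium
```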
Let G = {P, {Si}_{i∈P}, {ui}_{i∈P}} be a normal form game.
For any correlated strategy ρ ∈ Δ(∏_{j∈P} Sj) with ρ(si) = Σ_{s−i∈S−i} ρ(si, s−i) > 0, let ρ(s−i | si) = ρ(si, s−i)/ρ(si).
The correlated strategy ρ is a correlated equilibrium if for each i ∈ P, each si ∈ Si with ρ(si) > 0, and each s′i ∈ Si,
Σ_{s−i∈S−i} ρ(s−i | si) ui(si, s−i) ≥ Σ_{s−i∈S−i} ρ(s−i | si) ui(s′i, s−i),
or, equivalently, if for each i ∈ P and each si, s′i ∈ Si,
Σ_{s−i∈S−i} ρ(si, s−i) ui(si, s−i) ≥ Σ_{s−i∈S−i} ρ(si, s−i) ui(s′i, s−i).
In words, the first condition says that if i receives signal si and opponents obey their signals, i cannot benefit from disobeying his signal. The second condition is mathematically simpler; see below.
Observation 1.42. Correlated strategy ρ is equivalent to a Nash equilibrium if and only if ρ is a correlated equilibrium and ρ is a product measure (i.e., the players' signals are independent).
Example 1.43. (Describing the set of correlated equilibria)

         2
         g        b
1  G   3, 3    0, 5
   B   5, 0   −4, −4

The Nash equilibria are (G, b), (B, g), and (2/3 G + 1/3 B, 2/3 g + 1/3 b).
The set of correlated equilibria of any game is an intersection of a finite number of half
spaces. It is therefore a polytope: that is, the convex hull of a finite number of points.
The constraint ensuring that player 1 is willing to play G when told to is 3ρGg + 0ρGb ≥ 5ρGg − 4ρGb, or, equivalently, 2ρGb ≥ ρGg. We compute the constraints for strategies B, g, and b similarly, and list them along with the nonnegativity constraints:
(1) 2ρGb ≥ ρGg;    (5) ρGg ≥ 0;
(2) ρBg ≥ 2ρBb;    (6) ρGb ≥ 0;
(3) 2ρBg ≥ ρGg;    (7) ρBg ≥ 0;
(4) ρGb ≥ 2ρBb;    (8) ρBb ≥ 0.
The set of correlated equilibria is drawn below. Notice that two of its vertices are the pure Nash equilibria, and that another vertex is the mixed Nash equilibrium, at which all four incentive constraints bind.
[Figure: the polytope of correlated equilibria, drawn inside the simplex of distributions on {Gg, Gb, Bg, Bb}.]
1.5.2 Interpretation
Some game theorists feel that correlated equilibrium is the fundamental solution concept
for normal form games:
(i) Mathematical simplicity
The set of correlated equilibria is defined by a finite number of linear inequalities. It is therefore a polytope (the convex hull of a finite number of points).
Existence of correlated equilibrium can be proved using results from the theory of linear inequalities (Hart and Schmeidler (1989)).
(The basic results from the theory of linear inequalities are much easier to prove than fixed point theorems, and each can be derived from the others without too much difficulty. These results include the Minmax Theorem (Section 1.6), the Separating Hyperplane Theorem for polytopes (Section 1.3.2), Linear Programming Duality, Farkas's Lemma, . . . )
(ii) Learnability
There are procedures which enable players to learn to play correlated equilibria
regardless of the game they are playing, at least in the sense of time-averaged play
(Foster and Vohra (1997), Hart and Mas-Colell (2000), Young (2004)).
Example 1.45. Consider the two-player zero-sum game below.

        2
        L        C        R
1  T  4, −4    2, −2    1, −1
   B  0, 0     1, −1    3, −3

Suppose that the players are only allowed to play pure strategies. In this case
max_{s1∈S1} min_{s2∈S2} u1(s1, s2) = max{1, 0} = 1;
min_{s2∈S2} max_{s1∈S1} u1(s1, s2) = min{4, 2, 3} = 2.
It is easy to check that whether players are restricted to pure strategies or are allowed to play mixed strategies, max min ≤ min max: going last (and getting to react to what your opponent did) is at least as good as going first. (The proof is straightforward: see the proof of the Minmax Theorem below.) The previous example shows that when players are restricted to pure strategies, we can have max min < min max. Can this happen if mixed strategies are allowed?
[Figure: the payoffs u1(σ1, L), u1(σ1, C), and u1(σ1, R) as functions of σ1 = αT + (1 − α)B, together with their lower envelope; the maximum of the lower envelope is v1^maxmin = 5/3.]
Notice that we can divide ΔS1 into three punishment regions: writing σ1 = αT + (1 − α)B, the regions are α ∈ [0, 1/3] (where the punishment is L), α ∈ [1/3, 2/3] (where it is C), and α ∈ [2/3, 1] (where it is R). Because the lower envelope is a minimum of linear functions, player 1's maxmin strategy must occur at a vertex of one of the punishment regions: that is, at T, 2/3 T + 1/3 B, 1/3 T + 2/3 B, or B. An analogous statement is true in the case of player 2's minmax strategy, as we will see next.
We can find player 2's minmax strategy in a similar fashion. In this case, we are looking for the strategy of player 2 that minimizes an upper envelope function. This calculation uses player 1's best response correspondence. The upper envelope of the payoff functions pictured below is u1(B1(μ2), μ2). Because this upper envelope is the maximum of linear functions, it is minimized at a vertex of one of the best response regions shown at bottom. By computing u1(B1(μ2), μ2) at each vertex, we find that μ2* = 2/3 C + 1/3 R, where u1(B1(2/3 C + 1/3 R), 2/3 C + 1/3 R) = 5/3 = v1^minmax.
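Both values can be approximated by brute-force grid search, which makes the equality v1^maxmin = v1^minmax = 5/3 visible without the figures (a rough numerical sketch, not an exact method):

```python
# Player 1's payoffs in the zero-sum game above: rows T, B vs. columns L, C, R.
A = {'T': {'L': 4, 'C': 2, 'R': 1}, 'B': {'L': 0, 'C': 1, 'R': 3}}

def u1(alpha, mu):   # alpha = P(T); mu = (P(L), P(C), P(R))
    return sum(m * (alpha * A['T'][c] + (1 - alpha) * A['B'][c])
               for c, m in zip('LCR', mu))

grid = [i / 300 for i in range(301)]
# maxmin: maximize over alpha the worst (pure) column payoff.
maxmin = max((min(u1(a, m) for m in [(1,0,0), (0,1,0), (0,0,1)]), a)
             for a in grid)
# minmax: minimize over mu the best (pure) row payoff.
minmax = min((max(u1(1, (l, c, 1-l-c)), u1(0, (l, c, 1-l-c))), l, c)
             for l in grid for c in grid if l + c <= 1)
print(maxmin)   # value 1.666... at alpha = 2/3: sigma1* = (2/3)T + (1/3)B
print(minmax)   # value 1.666... at (l, c) = (0, 2/3): mu2* = (2/3)C + (1/3)R
```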
Notice that v1^maxmin = v1^minmax: the payoff that player 1 is able to guarantee himself is equal to the payoff that player 2 can hold him to. This is a consequence of the Minmax Theorem, which we state next.
[Figure: player 1's best response regions on ΔS2 and the upper envelope of u1(T, μ2) and u1(B, μ2); player 1's best response payoffs at the vertices of the regions are shown in brackets: L [4], C [2], R [3], 2/3 C + 1/3 R [5/3], 1/3 L + 2/3 R [2].]
We can also use a version of the first picture to compute μ2* and v1^minmax. We do so by finding the convex combination of the graphs of u1(σ1, L), u1(σ1, C), and u1(σ1, R) whose highest point is as low as possible. This is the horizontal line shown in the diagram below at right. (It is clear that no line can have a lower highest point, because no lower payoff for player 1 is feasible when σ1 = 2/3 T + 1/3 B = σ1*.) Since this horizontal line is the graph of 2/3 u1(σ1, C) + 1/3 u1(σ1, R) = u1(σ1, 2/3 C + 1/3 R) (check the endpoints), we conclude that μ2* = 2/3 C + 1/3 R. Player 1's minmax payoff is the constant value of u1(σ1, 2/3 C + 1/3 R), which is v1^minmax = 5/3.
In similar fashion, one can determine σ1* and v1^maxmin using the second picture by finding the convex combination of the two planes whose lowest point is as high as possible. This convex combination corresponds to the mixed strategy σ1* = 2/3 T + 1/3 B. (Sketch this plane in to see for yourself.)
[Figure: the horizontal line u1(σ1, 2/3 C + 1/3 R) = 5/3 = v1^minmax.]
(ii) (σ1, σ2) is a Nash equilibrium of G if and only if σ1 is a maxmin strategy for player 1 and σ2 is a minmax strategy for player 2. (This implies that a Nash equilibrium of G exists.)
(iii) Every Nash equilibrium of G yields payoff v1^maxmin = v1^minmax for player 1.
The common value of v1^maxmin = v1^minmax is known as the value of the game, and is denoted v(G).
The Minmax Theorem tells us that in a zero-sum game G, player 1 can guarantee that he gets at least v(G), and player 2 can guarantee that player 1 gets no more than v(G); moreover, in such a game, worst-case analysis and Bayesian-rational equilibrium analysis generate the same predictions of play.
For more on the structure of the equilibrium set in zero-sum games, see Shapley and Snow (1950); see González-Díaz et al. (2010) for a textbook treatment. For minmax theorems for games with infinite strategy sets, see Sion (1958).
Example 1.47. In the zero-sum game G with player 1's payoffs defined in Example 1.45, the unique Nash equilibrium is (2/3 T + 1/3 B, 2/3 C + 1/3 R), and the game's value is v(G) = 5/3.
Example 1.48. What are the Nash equilibria of this normal form game?
Example 1.48. What are the Nash equilibria of this normal form game?
        2
        X        Y        Z
1  A  10, 0    −1, 11   −1, 11
   B   9, 1     1, 9     1, 9
   C   2, 8     8, 2     1, 9
   D   2, 8     2, 8     7, 3
   E   4, 6     5, 5     6, 4
This game is a constant-sum game, and so creates the same incentives as a zero-sum game (for instance, the game in which each player's payoffs are always 5 units lower). Therefore, by the Minmax Theorem, player 2's Nash equilibrium strategies are her minmax strategies. To find these, we draw player 1's best response correspondence (see below). The numbers in brackets are player 1's best response payoffs against the given strategies of player 2.
Evidently, player 1's minmax payoff is v1 = 51/11 = 4 7/11, which player 2 can enforce using her unique minmax strategy, μ2* = 5/11 X + 5/11 Y + 1/11 Z.
Now let σ1* be an equilibrium strategy of player 1. Since player 2 uses her minmax strategy μ2* = 5/11 X + 5/11 Y + 1/11 Z in any equilibrium, it must be a best response to σ1*; in particular, X, Y, and Z must yield her the same payoff against σ1*. But σ1*, being a best response to μ2*, can only put weight on pure strategies B, C, and E. The only mixture of these strategies that makes player 2 indifferent among X, Y, and Z is σ1* = 13/77 B + 8/77 C + 56/77 E. We therefore conclude that (13/77 B + 8/77 C + 56/77 E, 5/11 X + 5/11 Y + 1/11 Z) is the unique Nash equilibrium of G.
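The indifference calculations behind this conclusion can be checked with exact rational arithmetic; in the sketch below, the tuples are player 1's payoffs, and player 2's payoffs are recovered from the constant sum of 10:

```python
from fractions import Fraction as F

# Player 1's payoffs against columns (X, Y, Z); the game is constant-sum with total 10.
A = {'A': (10, -1, -1), 'B': (9, 1, 1), 'C': (2, 8, 1),
     'D': (2, 2, 7), 'E': (4, 5, 6)}

mu2 = (F(5,11), F(5,11), F(1,11))                      # player 2's minmax strategy
vals = {r: sum(p * x for p, x in zip(mu2, A[r])) for r in A}
print({r: str(v) for r, v in vals.items()})
# B, C, E all earn 51/11; A and D earn less, so player 1 mixes only over B, C, E.

sigma1 = {'B': F(13,77), 'C': F(8,77), 'E': F(56,77)}  # player 1's equilibrium mix
u2 = {r: tuple(10 - x for x in A[r]) for r in A}       # constant sum: u2 = 10 - u1
cols = [sum(p * u2[r][j] for r, p in sigma1.items()) for j in range(3)]
print([str(c) for c in cols])   # X, Y, Z all yield 59/11: player 2 is indifferent
```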
[Figure: player 1's best response correspondence over ΔS2, with best response payoffs in brackets: X [10], 2/3 X + 1/3 Y [19/3], 1/2 X + 1/2 Y [5], 5/11 X + 5/11 Y + 1/11 Z [51/11], Y [8], 5/8 Y + 3/8 Z [43/8], 1/4 Y + 3/4 Z [23/4], Z [7], 1/3 X + 2/3 Z [16/3], 1/2 X + 1/2 Z [5], 2/3 X + 1/3 Z [19/3]; the best response regions are labeled A, B, C, D, E.]
Writing σ̄i for a maxmin strategy of player i, μ̄j for a minmax strategy of player j, βj(σi) for j's harshest punishment of σi, and βi(μj) for i's best response to μj, the easy inequality is
v_i^maxmin = ui(σ̄i, βj(σ̄i)) ≤ ui(σ̄i, μ̄j) ≤ ui(βi(μ̄j), μ̄j) = v_i^minmax.
Example 1.49. Consider the two-player normal form game G below, in which player 1 is the row player and player 2 is the column player. (Only player 1's payoffs are shown.)

        a       b       c       d
T     5, ·    3, ·    1, ·    0, ·
B     1, ·    0, ·    4, ·    7, ·
(i) Explain in game theoretic terms what it means for a vector to be an element of the set K.
(ii) Let c* be the smallest number such that the point (c*, c*) is contained in K. What is the value of c*? Relate this number to player 1's minmax payoff in G, explaining the reason for the relationship you describe.
(iii) Specify the normal vector p* ∈ R² and the intercept d* of the hyperplane H = {v ∈ R² : p* · v = d*} that separates the set L(c*) from the set K, choosing the vector p* to have components that are nonnegative and sum to one.
(iv) Interpret the fact that p* · v ≥ d* for all v ∈ K in game theoretic terms. What conclusions can we draw about player 1's maxmin payoff in G?
(v) Let Gn be a two-player normal form game in which player 1 has n ≥ 2 strategies. Sketch a proof of the fact that player 1's minmax payoff in Gn and his maxmin payoff in Gn are equal. (When Gn is zero-sum, this fact is the Minmax Theorem.)
Solution:
(i) v ∈ K if and only if v = (u1(T, μ2), u1(B, μ2)) for some μ2 ∈ ΔS2.
(ii) c* = 2. This is player 1's minmax value: by playing 1/2 b + 1/2 c, the strategy that generates (2, 2), player 2 ensures that player 1 cannot obtain a payoff higher than 2. If you draw a picture of K, you will see that player 2 cannot restrict 1 to a lower payoff.
(iii) p* = (2/3, 1/3) and d* = 2.
(iv) Let σ1* = p*. Then (p* · v ≥ 2 for all v ∈ K) is equivalent to (u1(σ1*, μ2) ≥ 2 for all μ2 ∈ ΔS2). Thus, player 1's maxmin payoff is at least 2; by part (ii), it must be exactly 2.
(v) Define n-strategy analogues of the sets J, K, and L(c). Let c* be the largest value of c such that int(L(c)) and int(K) do not intersect. (Notice that if, for example, player 1 has a dominated strategy, L(c*) ∩ K may not include (c*, . . ., c*); this is why we need the set L(c*).) If player 2 chooses a μ2 that generates a point in L(c*) ∩ K, then player 1 cannot obtain a payoff higher than c*. Hence, player 1's minmax value is less than or equal to c*. (In fact, the way we chose c* tells us that it is exactly equal to c*.)
Let p* and d* define a hyperplane that separates L(c*) and K; we know that such a hyperplane exists by the separating hyperplane theorem. Given the form of L(c*), we can choose p* to be nonnegative; since the hyperplane passes through the point (c*, . . ., c*), the d* corresponding to any such p* whose components sum to one is in fact c*. Thus, as in (iv) above, player 1's maxmin value is at least c*. Since player 1's maxmin value cannot exceed his minmax value, we conclude that both of these values equal c*.
games of imperfect information. The key distinction here is that in the former class of games, a player's choices are always immediately observed by his opponents.
Simultaneous moves are modeled by incorporating unobserved moves (see Example 2.2), and so lead to imperfect information games (but see Section 2.3.4).
Extensive form games may also include chance events, modeled as moves by Nature. For our categorization, it is most convenient to understand "game of perfect information" to refer only to games without moves by Nature.
The definition of extensive form games here follows Selten (1975). Osborne and Rubinstein (1994) define extensive form games without explicitly introducing game trees. Instead, what we call nodes are identified with the sequences of actions that lead to them. This approach, which is equivalent to Selten's, requires somewhat less notation, but at the cost of being somewhat more abstract.
Example 2.1. Sequential Battle of the Sexes.
[Game tree: player 1 moves at node x, choosing F or B. After F, player 2 moves at node y; after B, player 2 moves at node z; in each case she chooses f or b. Payoffs: (F, f) yields (2, 1); (F, b) yields (0, 0); (B, f) yields (0, 0); (B, b) yields (1, 2).]
Example 2.2. (Simultaneous) Battle of the Sexes. Here player 2 does not observe player 1's choice before she moves herself. We represent the fact that player 2 cannot tell which of her decision nodes has been reached by enclosing them in an oval. The decision nodes are said to be in the same information set.
[Game tree: as in Example 2.1, but with player 2's two decision nodes joined in a single information set; payoffs (2, 1), (0, 0), (0, 0), (1, 2).]
Example 2.3. A simple card game. Players 1 and 2 each bet $1. Player 1 is given a card which is high or low; each is equally likely. Player 1 sees the card, player 2 doesn't. Player 1 can raise the bet to $2 or fold. If player 1 raises, player 2 can call or fold. If player 2 calls, then player 1 wins if and only if his card is high.
The random assignment of a card is represented as a move by Nature, marked in the game tree as player 0. Since player 2 cannot tell whether player 1 has raised with a high card or a low card, her two decision nodes are in a single information set.
[Game tree: Nature moves at node v, choosing H or L with probability 1/2 each. Player 1 moves at node w after H (actions R, F) and at node x after L (actions r, f); folding yields (−1, 1). After a raise, player 2 moves at node y or z (a single information set), choosing C or F. Calling yields (2, −2) after H and (−2, 2) after L; folding yields (1, −1).]
A                     set of actions
λ                     assigns each edge an action, assigning different actions to distinct edges leaving the same node (if e = (x, y) and ê = (x, ŷ) are distinct, then λ(e) ≠ λ(ê))
P = {1, . . . , n}    set of players
Di ⊆ D                set of player i's decision nodes: each x ∈ D is in exactly one Di (in other words, D1, . . . , Dn form a partition of D)
ui : Z → R            player i's von Neumann-Morgenstern utility function
[Game tree from an accompanying example: one player chooses L or R, another chooses a or b; the payoff triples shown are (0, 0, 0), (1, 1, 1), (0, 4, 2), (3, 1, 3), and (0, 0, 1).]
This completes the definition of an extensive form game of perfect information. Extensive form games, perfect information or not, are typically denoted by Γ.
Example 2.4. Sequential Battle of the Sexes: notation. In Example 2.1, we have:
Assignments of decision nodes: D1 = {x}, D2 = {y, z}.
To be more precise, the set of decision nodes D is now partitioned into D0, D1, . . . , Dn, with D0 being the set of nodes assigned to Nature. Let x be such a node, so that Ax is the set of actions available at x. Then the definition of the game includes a probability vector px ∈ ΔAx, which for each action a ∈ Ax specifies the probability px(a) that Nature chooses a.
Although we sometimes refer to Nature as "player 0," Nature is really just a device for representing chance events, and should not be regarded as a player. That is, the set of players is still P = {1, . . . , n}.
Games of imperfect information
In games of imperfect information, certain action choices of one or more players or of
Nature may not be observed before other players make their own action choices.
To represent unobserved moves, we form a partition Ii of each player i's set of decision nodes Di into information sets I ⊆ Di. When play reaches player i's information set I, i knows that some node in I has been reached, but he cannot tell which.
For an information set I to make sense, its owner must have the same set of actions AI available at each node in I. Were this not true, the owner could figure out where he is in I by seeing what actions he has available. (Formally, we require that if x, y ∈ I, then Ax = Ay, and we define AI to be this common action set.)
When an information set is a singleton (like {x}), we sometimes abuse notation and refer to the information set itself as x.
Observe that Γ is a game of perfect information if every information set is a singleton and there are no moves by Nature.
Example 2.5. A simple card game: notation. In Example 2.3, we have:
Assignments of decision nodes: D0 = {v}, D1 = {w, x}, D2 = {y, z}.
Action sets: Av = {H, L}, Aw = {R, F}, Ax = {r, f}, Ay = Az = {C, F}.
The probability assignments at Nature's node are pv(H) = pv(L) = 1/2.
Only player 2 has a nontrivial information set: I2 = {I}, where I = {y, z}. Notice that Ay = Az as required; thus AI = Ay = Az = {C, F}.
Perfect recall
One nearly always restricts attention to games satisfying perfect recall: each player remembers everything he once knew, including his own past moves.
To express this requirement formally, write e ≺ y when e precedes y in the game tree (i.e., when the path from the root node to y includes e). Then perfect recall is defined as follows (with x = x′ permitted):
If x ∈ I ∈ Ii, e ∈ Ex, y, y′ ∈ I′ ∈ Ii, and e ≺ y, then there exist x′ ∈ I and e′ ∈ Ex′ carrying the same action as e such that e′ ≺ y′.
[Figure: a game in which perfect recall fails.]
A pure strategy for player i specifies an action at each of his information sets: si ∈ Si = ∏_{I∈Ii} AI.
Example 2.7. Sequential Battle of the Sexes: strategies. In Example 2.1, S1 = Ax = {F, B} and S2 = Ay × Az = {ff, fb, bf, bb}. Note that even if player 1 chooses F, we still must specify what player 2 would have done at z, so that 1 can evaluate his choices at x.
A pure strategy for an extensive form game must provide a complete description of how a
player intends to play the game, regardless of what the other players do. In other words,
one role of a strategy is to specify a plan of action for playing the game.
However, in games where a player may be called upon to move more than once during the
course of play (that is, in games that do not have the single-move property), a pure strategy
contains information that a plan of action does not: it specifies how the player would act at
information sets that are unreachable given his strategy, meaning that the actions specified
by his strategy earlier in the game ensure that the information set is not reached.
[Figure: Example 2.8, a game in which player 1 moves at nodes x and z and player 2 moves at node y; the terminal payoffs are (1, 1), (1, 4), (0, 3), and (2, 2).]

One way to specify random behavior is to have player i choose a mixed strategy: a probability distribution σ_i ∈ ∆S_i over his pure strategies, so that σ_i(s_i) ≥ 0 for each s_i and ∑_{s_i∈S_i} σ_i(s_i) = 1.

[Figure: Example 2.9, a game in which player 1 chooses between U and D, and player 2 moves at node x (choosing L or R) after one of these choices and at node y (choosing l or r) after the other; the terminal payoffs are (3, 3), (2, 1), (1, 2), and (4, 4). Player 2's pure strategies are Ll, Lr, Rl, and Rr.]
Another way to specify random behavior is to suppose that each time a player reaches an information set I, he randomizes over the actions in A_I. (These randomizations are assumed to be independent of each other and of all other randomizations in the game.) This way of specifying behavior is called a behavior strategy: β_i ∈ ∏_{I∈I_i} ∆A_I.
Example 2.10. For the game from Example 2.9, a behavior strategy for player 2 specifies one randomization β_2^x over {L, R} at node x and an independent randomization β_2^y over {l, r} at node y.
Note that mixed strategies are joint distributions, while behavior strategies are collections
of marginal distributions from which draws are assumed to be independent. (Their
relationship is thus the same as that between correlated strategies and mixed strategy
profiles in normal form games.)
It follows immediately that every distribution over pure strategies generated by a behavior
strategy can also be generated by a mixed strategy. However, the converse statement is
false in general, since behavior strategies cannot generate correlation in choices at different
information sets.
Example 2.11. In the game from Example 2.9, consider the behavior strategy β_2 = ((β_2^x(L), β_2^x(R)), (β_2^y(l), β_2^y(r))) = ((1/2, 1/2), (1/3, 2/3)). Since the randomizations at the two nodes are independent, β_2 generates the same distribution over pure strategies as the mixed strategy

σ_2 = (σ_2(Ll), σ_2(Lr), σ_2(Rl), σ_2(Rr)) = (1/2 · 1/3, 1/2 · 2/3, 1/2 · 1/3, 1/2 · 2/3) = (1/6, 1/3, 1/6, 1/3).
Example 2.12. The mixed strategy σ̂_2 = (1/3, 1/6, 0, 1/2) entails correlation between the randomizations at player 2's two decision nodes. In behavior strategies, such correlation is forbidden. But in this case the correlation is strategically irrelevant, since during a given play of the game, only one of player 2's decision nodes will be reached. In fact, σ̂_2 is also strategically equivalent to β_2. We make this idea precise next.
Correlation in a player's choices at different information sets is strategically irrelevant in any game with perfect recall. Thus in such games, mixed strategies and behavior strategies provide exactly the same strategic possibilities.
To formalize this point, we say that player i's behavior strategy β_i and mixed strategy σ_i are outcome equivalent if for any mixed strategy profile σ_{−i} of player i's opponents, (β_i, σ_{−i}) and (σ_i, σ_{−i}) generate the same distribution over terminal nodes.
It is immediate from the foregoing that every behavior strategy is outcome equivalent to
a mixed strategy. For the converse, we have
Theorem 2.13 (Kuhn's (1953) Theorem). Suppose that Γ has perfect recall. Then every mixed strategy σ_i is outcome equivalent to some behavior strategy β_i.

For a proof, see González-Díaz et al. (2010).
Suppose one is given a mixed strategy σ_i. How can one find an equivalent behavior strategy β_i? Specifically, how do we define β_i^I(a), the probability placed on action a at player i's information set I?

Intuitively, β_i^I(a) should be the probability that a is chosen at I, conditional on σ_i and his opponents acting in such a way that I is reached.

Formally, β_i^I(a) can be defined as follows:

(i) Let σ_i(I) be the probability that σ_i places on pure strategies that do not preclude I's being reached.

(ii.a) If σ_i(I) > 0, let σ_i(I, a) be the probability that σ_i places on pure strategies that do not preclude I's being reached and that specify action a at I. Then let β_i^I(a) = σ_i(I, a)/σ_i(I).
(ii.b) If σ_i(I) = 0, then information set I is unreachable given σ_i, and β_i^I may be specified arbitrarily.

Example 2.14. Let σ_2 = (σ_2(Ll), σ_2(Lr), σ_2(Rl), σ_2(Rr)) = (1/3, 1/3, 1/12, 1/4) in the game from Example 2.9. Since each of player 2's nodes is reached regardless of her own choices,

β_2^x(L) = σ_2(Ll) + σ_2(Lr) = 1/3 + 1/3 = 2/3  and  β_2^y(l) = σ_2(Ll) + σ_2(Rl) = 1/3 + 1/12 = 5/12.
Example 2.15. Again let σ_2 = (1/3, 1/3, 1/12, 1/4) as above, but consider the game below:

[Figure: player 2 moves first at x, choosing L or R; R yields payoffs (2, 2). After L, player 1 chooses U, yielding (3, 1), or D; after D, player 2 moves at y, choosing l, yielding (3, 0), or r, yielding (4, 1).]
Then β_2^x(L) = σ_2(Ll) + σ_2(Lr) = 2/3 as before. This game does not have the single move property, and player 2's second node can only be reached if L is played. Thus

β_2^y(l) = σ_2(Ll)/(σ_2(Ll) + σ_2(Lr)) = (1/3)/(1/3 + 1/3) = 1/2.
Example 2.16. In the game from Example 2.15, if σ_2 = (0, 0, 0, 1), then β_2^x(L) = 0, but β_2^y(l) is unrestricted, since if 2 plays σ_2 her second node is not reached.
When studying extensive form games directly, it is generally easier to work with behavior strategies than with mixed strategies. Except when the two are being compared, the notation σ_i is used to denote both.
Example 2.17.

[Figure: the game Γ_1. Player 1 chooses U or D; after U, player 2 moves at x, choosing L (payoffs (3, 3)) or R (payoffs (2, 1)); after D, player 2 moves at y, choosing l (payoffs (1, 2)) or r (payoffs (4, 4)).]

G(Γ_1)      Ll      Lr      Rl      Rr
U          3, 3    3, 3    2, 1    2, 1
D          1, 2    4, 4    1, 2    4, 4
Example 2.18.

[Figure: the game Γ_2, identical to Γ_1 except that player 2's two nodes form a single information set, so that she chooses between L and R without observing player 1's move.]

G(Γ_2)      L       R
U          3, 3    2, 1
D          1, 2    4, 4
Example 2.19.
2x
L
1
U
4,1
Ll
U 4, 1
1
D 3, 0
2y
l
3,1
3,0
r
2,2
66
Lr
2, 2
3, 0
Rl Rr
3, 1 3, 1
3, 1 3, 1
We are not done: the (purely) reduced normal form consolidates redundant pure strategies:
G(Γ_3)      Ll      Lr      R
U          4, 1    2, 2    3, 1
D          3, 0    3, 0    3, 1
In the previous example, the two pure strategies that were consolidated correspond to the
same plan of action (see Section 2.1.2). This is true in generic extensive form games.
Many extensive form games can have the same reduced normal form. For example,
switching the order of moves in Example 2.18 does not change the reduced normal form.
Thompson (1952) and Elmes and Reny (1994) show that two extensive form games with perfect recall have the same reduced normal form if and only if it is possible to convert one game into the other by applying three basic transformations: interchange of simultaneous moves, addition of superfluous moves, and coalescing of information sets.
Example 2.20. Entry Deterrence.

[Figure: player 1 (the entrant) chooses O, ending the game with payoffs (0, 3), or E; player 2 (the incumbent) then chooses F, yielding (-1, -1), or A, yielding (1, 1).]

G(Γ)        F         A
O          0, 3      0, 3
E         -1, -1     1, 1
Given a node x of Γ, let Γ_x denote the subgame of Γ that begins at x. If σ is a strategy profile in Γ, then σ|_x denotes the strategy profile that σ induces in subgame Γ_x.

Definition (i). Strategy profile σ is a subgame perfect equilibrium of Γ (Selten (1965)) if it induces a Nash equilibrium in each subgame Γ_x of Γ.

Definition (ii). Strategy σ_i is sequentially rational given σ_{−i} if for each decision node x of player i, (σ|_x)_i is a best response to (σ|_x)_{−i} in Γ_x. If this is true for every player i, we say that strategy profile σ itself is sequentially rational.
The third definition concerns the outcomes of an explicit procedure for selecting strategy profiles. This backward induction procedure goes as follows: Find a decision node x that is followed only by terminal nodes. Specify an action at that node that leads to the terminal node z following x that yields the highest payoff for the owner of node x. Then replace decision node x with terminal node z and repeat the procedure.
If indifferences occur, the output of the procedure branches into multiple strategy profiles,
with each specifying a different optimal choice at the point of indifference. Different
strategy profiles that survive may have different outcomes, since while the player making
the decision is indifferent between his actions, other players generally are not (see Example
2.30).
Definition (iii). Any strategy profile that can be obtained by following the process described
above is said to survive the backward induction procedure.
Theorem 2.21. Let Γ be a finite perfect information game. Then the following are equivalent:
(i) Strategy profile σ is a subgame perfect equilibrium.
(ii) Strategy profile σ is sequentially rational.
(iii) Strategy profile σ survives the backward induction procedure.
Remarks:

(i) We typically use the term subgame perfect equilibrium to refer to the equivalent solution concepts from Theorem 2.21.
(ii) Since the backward induction procedure always generates at least one strategy
profile, existence of subgame perfect equilibrium is guaranteed. In fact, since it
is always possible to specify a pure action at every point of indifference, a pure
strategy subgame perfect equilibrium always exists.
(iii) If the backward induction procedure never leads to an indifference, it generates
a unique subgame perfect equilibrium, sometimes called the backward induction
solution. This is the case in generic finite perfect information games (specifically,
finite perfect information games in which no player is indifferent between any pair
of terminal nodes).
In this case "subgame perfect equilibrium" is something of a misnomer. This term suggests that equilibrium knowledge assumptions are needed to justify the prediction. In Section 2.3.2 we explain why this is not true, but also why the assumptions that are needed are still rather strong in many cases.
(iv) When an indifference occurs during the backward induction procedure in a game
with finite action sets, each choice of action at the point of indifference leads to
a distinct subgame perfect equilibrium. In games with infinite action sets this
need not be true: some ways of breaking indifferences may be inconsistent with
equilibrium play (see Example 2.32).
Example 2.22. Entry Deterrence: solution. In the Entry Deterrence game (Example 2.20), the
backward induction procedure selects A for player 2, and hence E for player 1. Thus the
unique subgame perfect equilibrium is (E, A).
Example 2.23. Multiple entrants. There are two entrants and an incumbent. The entrants
decide sequentially whether to stay out of the market or enter the market. Entrants who
stay out get 0. If both entrants stay out, the incumbent gets 5. If there is entry, the
incumbent can fight or accommodate. If the incumbent accommodates, per firm profits are 2 for duopolists and -1 for triopolists. On top of this, fighting costs the incumbent 1 and the entrants who enter 3.
[Figure: entrant 1 chooses O or E; entrant 2 observes this and chooses o or e (primes distinguish her choices after O and after E); the incumbent responds to any entry by fighting or accommodating. The payoff vectors (entrant 1, entrant 2, incumbent) are (0, 0, 5) if both stay out; (0, -1, 1) and (0, 2, 2) if only entrant 2 enters and the incumbent fights or accommodates; (-1, 0, 1) and (2, 0, 2) if only entrant 1 enters; and (-4, -4, -2) and (-1, -1, -1) if both enter.]
The unique subgame perfect equilibrium of this game is (E, (e, o′), (a, a′, a″)), generating outcome (2, 0, 2). This is an instance of first mover advantage.
Note that Nash equilibrium does not restrict possible predictions very much in this game: of the 64 pure strategy profiles, 20 are Nash equilibria of the reduced normal form, and (0, 0, 5), (0, 2, 2), and (2, 0, 2) are all Nash equilibrium outcomes. Thus, requiring credibility of commitments refines the set of predictions substantially.
1: O         oo′          oe′          eo′          ee′
ff′f″      0, 0, 5      0, 0, 5     0, -1, 1     0, -1, 1
ff′a″      0, 0, 5      0, 0, 5     0, -1, 1     0, -1, 1
fa′f″      0, 0, 5      0, 0, 5     0, -1, 1     0, -1, 1
fa′a″      0, 0, 5      0, 0, 5     0, -1, 1     0, -1, 1
af′f″      0, 0, 5      0, 0, 5     0, 2, 2      0, 2, 2
af′a″      0, 0, 5      0, 0, 5     0, 2, 2      0, 2, 2
aa′f″      0, 0, 5      0, 0, 5     0, 2, 2      0, 2, 2
aa′a″      0, 0, 5      0, 0, 5     0, 2, 2      0, 2, 2

1: E         oo′          oe′          eo′          ee′
ff′f″     -1, 0, 1    -4, -4, -2    -1, 0, 1    -4, -4, -2
ff′a″     -1, 0, 1    -1, -1, -1    -1, 0, 1    -1, -1, -1
fa′f″      2, 0, 2    -4, -4, -2     2, 0, 2    -4, -4, -2
fa′a″      2, 0, 2    -1, -1, -1     2, 0, 2    -1, -1, -1
af′f″     -1, 0, 1    -4, -4, -2    -1, 0, 1    -4, -4, -2
af′a″     -1, 0, 1    -1, -1, -1    -1, 0, 1    -1, -1, -1
aa′f″      2, 0, 2    -4, -4, -2     2, 0, 2    -4, -4, -2
aa′a″      2, 0, 2    -1, -1, -1     2, 0, 2    -1, -1, -1
Example 2.24. A sequential decision problem.

[Figure: a single player chooses among actions (labeled A through N) at a sequence of seven decision nodes; the terminal payoffs (1) through (8) are shown in parentheses.]
Let σ and σ′ be pure strategies of the lone player in sequential decision problem Γ. We say that σ′ is a profitable deviation from σ in subgame Γ_x if σ′|_x generates a higher payoff than σ|_x in subgame Γ_x. In this definition, it is essential that payoffs be evaluated from the vantage of node x, not from the vantage of the initial node of Γ (see Example 2.27).

By definition, σ is sequentially rational in Γ if it does not admit a profitable deviation in any subgame. Put differently, σ is sequentially rational if for any decision node x, σ|_x yields the lone player the highest payoff available in Γ_x.
Example 2.25. Sequential decision problem revisited. In the sequential decision problem from
Example 2.24, the sequentially rational strategy is (B, C, F, G, I, K, N).
We call strategy σ′ a one-shot deviation from strategy σ if it only differs from σ at a single decision node, say x. This one-shot deviation is profitable if it generates a higher payoff than σ in subgame Γ_x.
The backward induction procedure can be viewed as a systematic method of constructing a strategy from which there is no profitable one-shot deviation: it first ensures this at nodes followed only by terminal nodes, then at the nodes before these, and so on.
Theorem 2.26 (The one-shot deviation theorem). Let Γ be a finite sequential decision problem. Then the following are equivalent:

(i) Strategy σ does not admit a profitable deviation in any subgame. (That is, σ is sequentially rational.)
(ii) Strategy σ does not admit a profitable one-shot deviation at any decision node. (That is, σ survives the backward induction procedure.)
Thus to construct a strategy that does not admit profitable deviations of any kind, even
ones requiring changes in action at multiple nodes at once, it is enough to apply the
backward induction procedure, which never considers such deviations explicitly.
In the context of finite sequential decision problems, the one-shot deviation theorem is simple, but it is not trivial: it requires a proof. (One way to see that the theorem has some content is to note that it does not extend to infinite sequential decision problems unless additional assumptions are imposed; see Example 4.8 and Theorem 4.7.)
Proof of Theorem 2.26. It is immediate that (i) implies (ii). To prove the converse, suppose that σ does not admit a profitable one-shot deviation, but that it does admit a profitable deviation requiring changes in action over T periods of play. By hypothesis, the last stage of this deviation is not profitable when it is undertaken. Therefore, following only the first T − 1 periods of the deviation must yield at least as good a payoff from the vantage of the initial stage of the deviation. By the same logic, it is at least as good to follow only the first T − 2 periods of the deviation, and hence the first T − 3 periods of the deviation . . . and hence the first period of the deviation. But since a one-shot deviation cannot be profitable, we have reached a contradiction.
In applying the one-shot deviation theorem, it is essential that the profitability of a deviation be evaluated from the vantage of the node where the deviation occurs.
Example 2.27. Illustration of the one-shot deviation theorem.
[Figure: a decision problem with two decision nodes. At x the player chooses O, yielding payoff 0, or I; at y he chooses L, yielding -1, or R, yielding 1.]
If deviations are always evaluated from the point of view of the initial node x, then there
is no way to improve upon strategy (O, L) by only changing the action played at a single
node. But (O, L) clearly is not sequentially rational. Does this contradict the one-shot
deviation theorem?
No. When we consider changing the action at node y from L to R, we should view the
effect of this deviation from the vantage of node y, where it is indeed profitable. The
choice of R at y in turn creates a profitable deviation at x from O to I.
We are now in a position to complete the proof of Theorem 2.21.

Proof of Theorem 2.21, (iii) ⇒ (ii). Suppose that strategy profile σ survives the backward induction procedure in the finite perfect information game Γ. We wish to show that σ is sequentially rational in Γ.

Suppose that we view the strategy profile σ_{−i} of player i's opponents as exogenous. Then player i is left with a collection of sequential decision problems. In fact, player i's strategy σ_i was obtained by applying the backward induction procedure in these decision problems, so Theorem 2.26 implies that σ_i is sequentially rational in these decision problems. As this is true for all players, σ is sequentially rational in Γ.
When we stated Theorem 2.21, it may have seemed that the rationale for the backward
induction procedure was entirely game-theoretic, in that it provided a way of enforcing
credibility of commitments. But having examined its use in one-player sequential decision
problems, we now see that it has a second, decision-theoretic rationale.
Example 2.28. Mini Centipede.

[Figure: player 1 chooses A or B at his first node; after A, player 2 chooses C or D; after C, player 1 chooses E or F at his second node. The terminal payoffs are (1, 1), (1, 4), (0, 3), and (2, 2).]
The unique subgame perfect equilibrium is ((B, F), D). For player 1 to prefer to play B, he must expect player 2 to play D if her node is reached. Is it possible for a rational player 2 to do this?

To answer this question, we have to consider what player 2 might think if her decision node is reached: specifically, what she would conclude about player 1's rationality, and about what player 1 will do at the final decision node. One possibility is that player 2 thinks that a rational player 1 would play B, and that a player 1 who plays A is irrational, and in particular would play E at the final node if given the opportunity. If this is what player 2 would think if her node were reached, then she would play C there.

If player 1 is rational and anticipates such a reaction by player 2, he will play A, resulting in the play of ((A, F), C). Indeed, this prediction is consistent with the assumption of common certainty of rationality at the start of play; see Ben-Porath (1997). Therefore, stronger assumptions of the sort described before this example are needed to ensure the backward induction solution.
Example 2.29. Centipede (Rosenthal (1981)).
Two players, alternating moves. Each player begins with $1 in his pile. When moving, a player can stop the game or continue. If a player continues, his pile is reduced by $1, while his opponent's pile is increased by $2. The game ends when a player stops or both piles have $100; players take home the money in their piles.
[Figure: the Centipede game. At each node the mover chooses S (stop) or C (continue). Stopping at successive nodes yields payoffs (1, 1), (0, 3), (2, 2), (1, 4), . . . , (98, 98), (97, 100), (99, 99), (98, 101); if no one ever stops, the payoffs are (100, 100).]
In the backward induction solution, everyone always stops, yielding payoffs of (1, 1).
Remarks on the Centipede Game:

(i) This example raises a general critique of backward induction logic: Suppose you are player 2, and you get to go. Since 1 deviated once, why not expect him to deviate again?

(ii) The outcome (1, 1) is not only the backward induction solution; it is also the unique Nash equilibrium outcome.

(iii) There are augmented versions of the game in which some continuing occurs in equilibrium: see Kreps et al. (1982) and Example 2.43 below.

(iv) In experiments, most people do better (McKelvey and Palfrey (1992)). At the same time, in experiments with high-level chess players, subgame perfection predicts play quite well (see Palacios-Huerta and Volij (2009), but also see Levitt et al. (2011)).
Backward induction with indifferences
When indifferences arise during the backward induction procedure, the procedure branches,
with each branch leading to a distinct subgame perfect equilibrium. To justify subgame
perfect equilibrium predictions in such cases, one must resort to equilibrium knowledge
assumptions to coordinate beliefs about what indifferent players will do.
Example 2.30. Discipline by an indifferent parent. Player 1 is a child and player 2 the parent.
The child chooses to Behave or to Misbehave. If the child Misbehaves, the parent chooses
to Punish or Not to punish; she is indifferent between these options, but the child is not.
[Figure: the child chooses B, yielding payoffs (0, 1), or M; the parent then chooses P, yielding (-1, 0), or N, yielding (1, 0).]

Since the parent is indifferent between P and N, the backward induction procedure branches at her node: if she punishes with probability at least 1/2, the child prefers to behave; otherwise he misbehaves.
2.3.3 Applications
Example 2.31. Chess and checkers.
In chess, moves and payoffs are well-defined. The game is zero-sum, and termination rules ensure that it is finite (at least in some variations; see Ewerhart (2002)).
The minmax theorem thus implies that there is a unique Nash equilibrium outcome,
and that it can be ensured by maxmin play. Moreover, since a pure subgame perfect
equilibrium (and hence a pure Nash equilibrium) exists, the equilibrium outcome must
be nonrandom.
It follows that one of these three statements must be true: (i) Black can guarantee a win,
(ii) White can guarantee a win, (iii) both players can guarantee a draw.
In fact, it can be shown that after two rounds of removing weakly dominated strategies,
all of the strategy profiles that remain yield the equilibrium outcome (Ewerhart (2000)).
But even writing down the game tree for chess is impossible! In big enough games,
backward induction is not computationally feasible.
But Checkers (a.k.a. Draughts), which has roughly 5 × 10^20 positions, has been solved: perfect play leads to a draw! (Schaeffer et al. (2007))
Example 2.32. Alternating Offer Bargaining (Ståhl (1972), Rubinstein (1982)).

Two players bargain over $1 for T + 1 periods, starting with period 0. In even periods: (i) player 1 offers (o_t, 1 − o_t) as the split, where o_t ∈ [0, 1]; (ii) if 2 accepts, the dollar is split; (iii) if not, play proceeds to the next round. In odd periods, the roles are reversed. If no offer is ever accepted, both players get nothing. Discounting: dollars received in period t are discounted by δ^t, where δ ∈ (0, 1).
This game has perfect information, but also infinite strategy sets and many tied outcomes.
Proposition 2.33.
(i) There is a unique subgame perfect equilibrium. In it, the initial offer is accepted.
(ii) As T → ∞, equilibrium payoffs converge to (1/(1 + δ), δ/(1 + δ)).

Part (ii) tells us that in a long game, there is a first mover advantage.
Proof: Suppose T is even (i.e., the number of periods is odd).

In period T, player 2 should accept any positive offer. If she receives an offer of 0, she is indifferent, so we must consider the possibility that she accepts such an offer, rejects such an offer, or randomizes after such an offer.

But it is convenient to consider player 1's period T offer first. It cannot be a best response for player 1 to offer a positive amount to player 2: if he were going to offer her x > 0, he could improve his payoff by offering her x/2 instead, since that offer will also be accepted. Thus in any subgame perfect equilibrium, player 1 must offer 0 to player 2.

Now suppose that player 2 rejects an offer of 0 with probability y > 0. Then player 1's expected dollar return from this offer is 1 − y, and so he could do better by offering y/2, which ensures a dollar return of 1 − y/2.

Thus in any subgame perfect equilibrium, player 2 must accept an offer of 0. If player 2 does so, then offering 0 is optimal for player 1. This is the unique subgame perfect equilibrium of the period T subgame.

We repeatedly (but implicitly) use this argument in the remainder of the analysis. (See the remark below for further discussion.)
By backward induction:

In period T (even): 2 accepts any offer, so 1 offers (1, 0).
In period T − 1: 1 accepts any offer of at least δ, so 2 offers (δ, 1 − δ).
In period T − 2: 2 accepts any offer of at least δ(1 − δ), so 1 offers (1 − δ + δ², δ − δ²).

Continuing backward to period 0, player 1's initial offer is

(*)  ((1 + δ^{T+1})/(1 + δ), (δ − δ^{T+1})/(1 + δ)) → (1/(1 + δ), δ/(1 + δ)) as T → ∞.

If instead the horizon T′ = T + 1 is odd, the period 0 offer is ((1 − δ^{T′+1})/(1 + δ), (δ + δ^{T′+1})/(1 + δ)), which also converges to (1/(1 + δ), δ/(1 + δ)) as T′ → ∞.

(Why? We saw that if T is even, 1 offers last, and his period 0 offer is (*). If T′ = T + 1 is odd, then 2 offers last; since the subgame beginning with her period 1 offer is one in which she makes the last offer, that offer is obtained by reversing (*): ((δ − δ^{T+1})/(1 + δ), (1 + δ^{T+1})/(1 + δ)). At period 0, player 1 must therefore offer 2 the discounted value of her period 1 payoff, δ(1 + δ^{T+1})/(1 + δ) = (δ + δ^{T′+1})/(1 + δ), keeping (1 − δ^{T′+1})/(1 + δ) for himself.)
Remark: The argument above shows that the unique subgame perfect equilibrium has
each player making offers that cause the opponent to be indifferent, and has the opponent
accepting such offers. The fact that the opponent always accepts when indifferent may
seem strange, but with a continuum of possible offers, it is necessary for equilibrium to
exist. It is comforting to know that this equilibrium can be viewed as the limit of equilibria
79
of games with finer and finer discrete strategy sets, equilibria in which the opponent rejects
offers that leave her indifferent.
The infinite horizon version of this model is known as the Rubinstein (1982) bargaining model. The analysis is trickier since there is no last period. It turns out that the model has a unique subgame perfect equilibrium, and that payoffs in this equilibrium are exactly the limiting payoffs computed above: (1/(1 + δ), δ/(1 + δ)).
In both of these settings, all players have the same information at every point in play.
But there are many economic environments in which the key forces at work are asymmetries in information among the participants.
There are two main sorts of informational asymmetries:

(i) unobserved actions: during the course of play, some players' actions may not be observed by others;
(ii) private information about primitives: players may have information about their preferences, the environment, others' information, or others' beliefs that is not known to others.

Corresponding to these are two basic models that capture each sort of asymmetry in isolation.

(i) Unobserved actions are modeled using extensive form games with imperfect information.
(ii) Bayesian games provide the basic framework for modeling settings with private information about primitives at the start of play and simultaneous action choices.
More complicated asymmetries can be captured using extensive form games with imperfect
information and moves by Nature.
Examples of environments with asymmetric information:
poker
colluding firms
risk-taking by the insured (moral hazard)
markets for insurance contracts (adverse selection) (Rothschild and Stiglitz)
markets for used cars (Akerlof)
labor contracting with unobservable skill levels and/or effort choices (Spence)
firm/union negotiations and strikes
public good provision
auctions
The incorporation of asymmetric information into economic modeling is perhaps the most important innovation in microeconomic modeling in recent decades. In the late 1970s/early 1980s, game theory replaced general equilibrium theory as the leading tool for microeconomic modeling. In large part, this happened because informational asymmetries are not easily handled in general equilibrium models, but are simple to include in game-theoretic models.
Example 2.35.

[Figure: the entrant chooses O, yielding payoffs (0, 2), M, or B; the incumbent, without observing which sort of entry occurred, chooses F or A. After M, F yields (-1, -1) and A yields (2, 1); after B, F yields (-3, -1) and A yields (4, 0).]

        F          A
O      0, 2       0, 2
M     -1, -1      2, 1
B     -3, -1      4, 0

The entrant can choose to stay out (O), enter meekly (M), or enter boldly (B). The incumbent cannot observe which sort of entry has occurred, but is better off accommodating either way.

This game has two components of Nash equilibria: (B, A), and (O, σ_2(F) ≥ 2/3).
Since the whole game is the only subgame, all of these Nash equilibria are also subgame
perfect equilibria.
To heed the principle of sequential rationality in this example, we need to ensure that
player 2 behaves rationally if her information set is reached.
To accomplish this, we will require players to form beliefs about where they are in an
information set, regardless of whether the information set is reached in equilibrium. We
then will require optimal behavior at each information set given these beliefs.
By introducing appropriate restrictions on allowable beliefs, we ultimately will define
the notion of sequential equilibrium, the fundamental equilibrium concept for general
extensive form games.
Remark on information sets and evaluation of deviations:
In Example 2.35, in any Nash equilibrium in which player 2's information set is reached with positive probability, equilibrium knowledge ensures that player 2 has correct beliefs about where she is in the information set. (This sort of equilibrium knowledge is precisely the sort occurring in normal form games.)

Nevertheless, the information set plays a role when we evaluate the consequences of a deviation by player 1.

For instance, consider evaluating the strategy profile (B, A). When we consider a deviation by player 1 from B to M, player 2's strategy remains fixed at A.

If instead each of player 2's nodes were in its own information set, player 2's equilibrium strategy could specify different behavior at each. In this case, if we considered a deviation by player 1 from B to M, player 2's behavior on the path of play would change if her choices at her two nodes differed.
2.4.2 Beliefs and sequential rationality
Beliefs
Fix an extensive form game Γ.
Player i's beliefs are a map μ_i : D_i → [0, 1] satisfying ∑_{x∈I} μ_i(x) = 1 for each I ∈ I_i.

If x ∈ I, μ_i(x) represents the probability that player i assigns to being at node x given that his information set I has been reached.

Note: the player subscript on μ_i is often omitted when no confusion will result.

Let μ = (μ_1, . . . , μ_n) denote the profile of all players' beliefs.
Given a strategy profile σ, we can compute the probability P_σ(x) that each node x ∈ X is reached. Let P_σ(I) = ∑_{x∈I} P_σ(x).

The beliefs μ are Bayesian given profile σ if μ_i(x) = P_σ(x)/P_σ(I) whenever P_σ(I) > 0.
[Figure: an information set {x, y} whose nodes are reached with probabilities in ratio 1 : 3, so that Bayesian beliefs are μ(x) = 1/4 and μ(y) = 3/4.]
At information sets I on the path of play, Bayesian beliefs describe the conditional probabilities of nodes being reached.
But beliefs are most important at unreached information sets. In this case, they represent
conditional probabilities after a probability zero event has occurred.
In Section 2.4.3 we impose a key restriction on allowable beliefs in just this circumstance.
Sequential rationality

Given a node x, let u_i(σ|x) denote player i's expected utility under strategy profile σ conditional on node x being reached.

We call strategy σ_i rational starting from information set I ∈ I_i given μ_i and σ_{−i} if for every alternative strategy σ′_i,

(7)  ∑_{x∈I} μ_i(x) u_i(σ_i, σ_{−i} | x) ≥ ∑_{x∈I} μ_i(x) u_i(σ′_i, σ_{−i} | x).

(Remark: This definition only depends on choices under σ_i and σ_{−i} from information set I onward, and on beliefs μ_i at information set I.)

If (7) holds for every information set I ∈ I_i, we call strategy σ_i sequentially rational given μ_i and σ_{−i}. If for a given σ and μ this is true for all players, we call strategy profile σ sequentially rational given μ.
A pair (σ, μ) consisting of a strategy profile and beliefs is called an assessment.

The assessment (σ, μ) is a weak sequential equilibrium if (i) σ is sequentially rational given μ, and (ii) μ is Bayesian given σ.
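Condition (7) is just a comparison of belief-weighted conditional payoffs, as the following sketch illustrates. The dictionary representation of beliefs and conditional payoffs is an assumption of the example, and the numbers anticipate the entry game analyzed below, where the incumbent's belief threshold is μ(x) = 2/3.

```python
from fractions import Fraction

def seq_rational_at(mu, u_current, u_deviations):
    """mu: dict node -> belief probability over information set I.
    u_current: dict node -> u_i(sigma_i, sigma_-i | x).
    u_deviations: list of dicts node -> u_i(sigma_i', sigma_-i | x)."""
    value = sum(mu[x] * u_current[x] for x in mu)
    return all(value >= sum(mu[x] * u_dev[x] for x in mu)
               for u_dev in u_deviations)

# At the belief mu(x) = 2/3, fighting (payoffs 3 at x, -1 at y) and
# accommodating (payoffs 2 at x, 1 at y) give 2's node exactly equal value.
mu = {'x': Fraction(2, 3), 'y': Fraction(1, 3)}
print(seq_rational_at(mu, {'x': 3, 'y': -1}, [{'x': 2, 'y': 1}]))  # True
```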
[Figure: the entrant chooses O, yielding payoffs (0, 2), or E; after E, he chooses F or C at a second node. The incumbent, observing only whether entry occurred, then chooses f or a. After F, f yields (-5, 3) and a yields (-5, 2); after C, f yields (-2, -1) and a yields (2, 1).]
In this game the entrant moves in two stages. First, he chooses between staying out (O)
and entering (E); if he enters, he then chooses between entering foolishly (F) and entering
cleverly (C). Entering foolishly (i.e., choosing (E, F)) is strictly dominated.
If the entrant enters, the incumbent cannot observe which kind of entry has occurred. Fighting (f) is optimal against foolish entry, but accommodating (a) is optimal against clever entry.

The unique subgame perfect equilibrium is ((E, C), a): player 1 plays the dominant strategy C in the subgame, so 2 plays a, and so 1 plays E. This corresponds to a weak sequential equilibrium (with beliefs μ_2(y) = 1 determined by taking conditional probabilities).
In addition, ((O, ·), σ_2(f) ≥ 1/2) are Nash equilibria.

These Nash equilibria correspond to a component of weak sequential equilibria (that are not subgame perfect!):

((O, C), f) with μ_2(x) ≥ 2/3;
((O, C), σ_2(f) ≥ 1/2) with μ_2(x) = 2/3.

Why are these weak sequential equilibria? If 1 plays O, condition (ii) places no restriction on beliefs at 2's information set. Therefore, player 2 can put all weight on x, despite the fact that F is a dominated strategy.
Thus, to obtain an appealing solution concept for games of imperfect information, we
need a stronger restriction than Bayesian beliefs.
2.4.3 Definition of sequential equilibrium

Definition

An assessment (σ, μ) is a sequential equilibrium (Kreps and Wilson (1982)) if

(i) σ is sequentially rational given μ, and
(ii) μ is consistent given σ: there is a sequence of completely mixed behavior strategy profiles σ^k → σ whose Bayesian beliefs μ^k converge to μ.

Consistency pins down beliefs in the game above. Consider trembles

σ^k_1(E) = 1 − ε^k_O,  σ^k_1(F) = ε^k_F,  σ^k_1(C) = 1 − ε^k_F
⟹ μ^k_2(y) = P^k(y)/(P^k(y) + P^k(x)) = (1 − ε^k_O)(1 − ε^k_F)/(1 − ε^k_O) = 1 − ε^k_F → 1.

Thus in the sequential equilibrium ((E, C), a), consistency yields μ_2(y) = 1.

The assessments (((O, C), f), μ_2(x) ≥ 2/3) are weak sequential equilibria (but not subgame perfect equilibria). They are not sequential equilibria: for any trembles

σ^k_1(O) = 1 − ε^k_E,  σ^k_1(E) = ε^k_E,  σ^k_1(F) = ε^k_F,  σ^k_1(C) = 1 − ε^k_F
⟹ μ^k_2(y) = ε^k_E(1 − ε^k_F)/ε^k_E = 1 − ε^k_F → 1,

so consistency again forces μ_2(y) = 1, ruling out the beliefs μ_2(x) ≥ 2/3.
Example 2.41. In the game tree below, if player 1 chooses A, consistency requires that μ_2(y) = μ_3(ỹ).

[Figure: after player 1's initial choice, players 2 and 3 each face a two-node information set ({y, z} for player 2 and {ỹ, z̃} for player 3) that is reached only if player 1 deviates from A.]
3z
One can regard cross-player consistency as an extension of the usual equilibrium assumption: In a Nash equilibrium, players agree about opponents behavior off the equilibrium
path. Cross-player consistency requires them to agree about the relative likelihoods of
opponents deviations from equilibrium play.
A one-shot deviation theorem for games of imperfect information
When determining whether a strategy profile is sequentially rational given beliefs , do
we need to explicitly consider multiperiod deviations? The following theorem says that
we need not, so long as beliefs are preconsistent.
Theorem 2.42 (Hendon et al. (1996)). Let Γ be a finite extensive form game with perfect recall, let σ be a strategy profile for Γ, and suppose that beliefs μ are preconsistent given σ. Then σ is sequentially rational given μ if and only if no player i has a profitable one-shot deviation from σ_i given μ_i and σ_{−i}.
2.4.4 Computing sequential equilibria
To compute all sequential equilibria of a game:

(i) First take care of the easy parts of σ and μ.
(ii) Then try all combinations of the remaining pieces, working backward through each player's information sets.
One carries out (ii) by dividing the analysis into cases, using the implications of sequential
rationality to eliminate assessments. By the one-shot deviation theorem (Theorem 2.42),
we may focus on behavior at one information set at a time.
This approach is simply a convenient way of evaluating every assessment.
Remark: Although sequential equilibrium does not allow the play of strictly dominated strategies, the set of sequential equilibria of a game Γ may differ from that of a game Γ′ that is obtained from Γ by removing a strictly dominated strategy: see Section 2.6, especially Examples 2.59 and 2.61.
Example 2.43. Centipede with a possibly generous player (Myerson (1991), based on Kreps et al. (1982)).

Start with a Centipede game (Example 2.29) in which continuing costs you 1 but benefits your opponent 5. (If this were the whole story, always stopping would be the unique subgame perfect equilibrium.)

But assume that player 2 has a 1/20 chance of being a generous type who always continues. Player 2's type is not observed by player 1. (This is an example of a Bayesian extensive form game (Section 3).)
[Figure: Nature first chooses player 2's type: normal with probability 19/20 (placing player 1 at node w) or generous with probability 1/20 (node x). Player 1 chooses S_1 or C_1 at his first information set {w, x}; a normal player 2 then chooses s_1 or c_1, while a generous player 2 always continues. Player 1's second information set {y, z} offers S_2 or C_2, after which a normal player 2 chooses s_2 or c_2. The payoffs are (0, 0) after S_1, (-1, 5) after s_1, (4, 4) after S_2, (3, 9) after s_2, and (8, 8) if play always continues.]
At player 1's second information set, Bayesian beliefs satisfy

(*)  μ_1(y) = P_σ(y)/(P_σ(y) + P_σ(z)) = (19/20)σ_1(C_1)σ_2(c_1) / ((19/20)σ_1(C_1)σ_2(c_1) + (1/20)σ_1(C_1)) = 19σ_2(c_1)/(19σ_2(c_1) + 1).

And if σ_1(C_1) = 0, preconsistency implies that (*) still describes the only consistent beliefs.
At his second information set, 1 can choose S2 , C2 , or a mixed strategy.
Suppose first that 1 plays S_2. For this to be optimal for player 1, his payoff to S_2 must be at least as big as his payoff to C_2: 4 ≥ 3μ_1(y) + 8(1 − μ_1(y)), or, equivalently, μ_1(y) ≥ 4/5. But if 1 plays S_2, it is optimal for player 2 to play s_1, which by equation (*) implies that μ_1(y) = 0, a contradiction.

Now suppose that 1 plays C_2. For this to be optimal for player 1, it must be that μ_1(y) ≤ 4/5. But if 1 plays C_2, it is optimal for player 2 to play c_1, so equation (*) implies that μ_1(y) = 19/20, another contradiction.

So player 1 mixes at his second information set. For this to be optimal, it must be that μ_1(y) = 4/5.
Thus equation (*) implies that 19σ_2(c_1)/(19σ_2(c_1) + 1) = 4/5, and so σ_2(c_1) = 4/19.

For the normal player 2 to mix between s_1 and c_1, she must be indifferent between them: 5 = 4σ_1(S_2) + 9(1 − σ_1(S_2)), and so σ_1(S_2) = 4/5.

Finally, player 1's expected payoff from C_1 is

19/20 (15/19 · (−1) + 4/19 · (4/5 · 4 + 1/5 · 3)) + 1/20 (4/5 · 4 + 1/5 · 8) = 1/4,

which exceeds his payoff of 0 from S_1. Therefore, player 1 continues, and the unique sequential equilibrium is

((C_1, 4/5 S_2 + 1/5 C_2), (15/19 s_1 + 4/19 c_1, s_2)), with beliefs μ_1(w) = 19/20 and μ_1(y) = 4/5.
The point of the example: By making 1 uncertain about 2's type, we make him willing to continue. But this also makes 2 willing to continue, since by doing so she can keep player 1 uncertain and thereby prolong the game.

Kreps et al. (1982) introduced this idea to show that introducing a small amount of uncertainty about one player's preferences can lead to long initial runs of cooperation in finitely repeated Prisoner's Dilemmas (Example 4.1).
Example 2.44. Ace-King-Queen Poker is a two-player card game that is played using a
deck consisting of three cards: an Ace (the high card), a King (the middle card), and a
Queen (the low card). Play proceeds as follows:
Each player puts $1 in a pot in the center of the table.
The deck is shuffled, and each player is dealt one card. Each player only sees the
card he is dealt.
Player 1 chooses to Raise (R) or Fold (F). A choice of R means that player 1 puts an
additional $1 in the pot. Choosing F means that player 1 ends the game, allowing
player 2 to have the money already in the pot.
If player 1 raises, then player 2 chooses to Call (c) or Fold ( f ). A choice of f means
that player 2 ends the game, allowing player 1 to have the money already in the pot.
A choice of c means that player 2 also puts an additional $1 in the pot; in this case,
the players reveal their cards and the player with the higher card wins the money in
the pot.
(i) Draw the extensive form of this game.
(ii) Find all sequential equilibria of this game.
(iii) If you could choose whether to be player 1 or player 2 in this game, which player
would you choose to be?
(iv) Suppose we modify the game as follows: Instead of choosing between Raise and
Fold, player 1 chooses between Raise and Laydown (L). A choice of L means that
the game ends, the players show their cards, and the player with the higher card
wins the pot. Answer parts (ii) and (iii) for this modified game.
(i) See below. (The probabilities of Nature's moves, which are 1/6 each, are shown in parentheses.)

(ii) A raise (R) is dominant for player 1's type A; calling (c) is dominant for player 2's type a, and folding (f) is dominant for her type q. Given these choices of player 2, raising is also
the unique best response for type K. (In detail: Suppose 1 has a King and raises. If 2 has an Ace, she will call the raise, so 1's payoff is −2; if 2 has a Queen, she will fold to the raise, so 1's payoff is 1. When 1 has a King he finds these events equally likely, so his expected payoff from a raise is 1/2 (−2) + 1/2 (1) = −1/2. Since his payoff from folding is −1, it is optimal for player 1 to raise with a King.)
All that remains is to determine σ_1(R|Q) and σ_2(c|k).

Suppose σ_2(c|k) = 1. Then σ_1(F|Q) = 1, which implies that μ_2(A|k) = 1, in which case 2 should choose f given k, a contradiction.

Suppose σ_2(c|k) = 0. Then σ_1(F|Q) = 0, which implies that μ_2(A|k) = 1/2, in which case 2 should choose c given k, a contradiction.

It follows that player 2 must be mixing when she is of type k. For this to be true, it must be that when she is of type k, her expected payoffs to calling and folding are equal; this is true when μ_2(A|k)(−2) + (1 − μ_2(A|k)) · 2 = −1, and hence when μ_2(A|k) = 3/4. For this to be her belief, it must be that σ_1(R|Q) = 1/3.
For player 1 to mix when he is of type Q, his expected payoffs to R and to F must be equal. This is true when

−1 = 1/2 (−2) + 1/2 (σ_2(c|k)(−2) + (1 − σ_2(c|k)) · 1),

and hence when σ_2(c|k) = 1/3. The unique sequential equilibrium is therefore

σ_1(R|A) = 1, σ_1(R|K) = 1, σ_1(R|Q) = 1/3;  σ_2(c|a) = 1, σ_2(c|k) = 1/3, σ_2(c|q) = 0,

with the corresponding Bayesian beliefs.

(iii) Summing over the six equally likely deals, player 1's expected equilibrium payoff is

1/6 ((1/3 · 2 + 2/3 · 1) + 1 + (−2) + 1 + (1/3 (−2) + 2/3 (−1)) + (1/3 (1/3 (−2) + 2/3 · 1) + 2/3 (−1))) = −1/9.

Therefore, since this is a zero-sum game, player 2's expected payoff is 1/9, and so it is better to be player 2.
(iv) The unique sequential equilibrium is now

σ_1(R|A) = 1, σ_1(R|K) = 0, σ_1(R|Q) = 1/3;  σ_2(c|a) = 1, σ_2(c|k) = 1/3, σ_2(c|q) = 0,

with the corresponding Bayesian beliefs. That is, the only change from part (ii) (apart from replacing F with L) is that player 1 now chooses L when he receives a King.

To see why, note that compared to the original game, the payoffs in this game are only different when 1 chooses L after Ak, Aq, and Kq: in these cases he now gets 1 instead of −1. As before, c is dominant for type a, and f is dominant for type q. Given these choices of player 2, the unique best response for type K is now L. (The reason is that player 1's expected payoff from choosing L when he has a King is 1/2 (−1) + 1/2 (1) = 0, which exceeds his expected payoff from R of 1/2 (−2) + 1/2 (1) = −1/2.)

Unlike before, R is now only weakly dominant for A: since σ_2(f|q) = 1, L is a best response for type A when σ_2(f|k) = 1. But if σ_2(f|k) = 1, then σ_1(R|Q) = 1, which implies that μ_2(A|k) = 1/2, in which case 2 prefers c given k, a contradiction.
[Figure: the extensive form of Ace-King-Queen Poker. Nature deals one of the six ordered hands (Ak, Aq, Ka, Kq, Qa, Qk), each with probability 1/6; player 1, observing only his own card, chooses R or F; after R, player 2, observing only her own card, chooses c or f. Folding payoffs are ±1 and calling payoffs are ±2.]
(i) Draw an extensive form representation of this game, and find all of its sequential equilibria.

(ii) Now suppose that union 2 cannot observe union 1's decision. Draw an appropriate extensive form for this new game, and compute all of its sequential equilibria.

(iii) The games in parts (i) and (ii) each have a unique sequential equilibrium outcome, but the choices made on these equilibrium paths are quite different. Explain in words why the choices made on the equilibrium path in (i) cannot be made on the equilibrium path in (ii), and vice versa. Evidently, the differences here must hinge on whether or not player 2 can observe a deviation by player 1.
(i)

[Figure: the extensive form for part (i). Player 1 chooses C or D; player 2 observes this and chooses c or d; player 3 then moves, choosing between A and B at node x and between a and b at nodes y and z. The payoff vectors (player 1, player 2, player 3) are (2, 2, 4), (-2, -2, 2), (2, 4, 2), (-2, 0, 1), (4, 2, 2), (0, -2, 1), (4, 4, 0), and (0, 0, 1).]
μ_3(z) = σ_1(D) / ((1 − σ_1(D))σ_2(d) + σ_1(D)) = 1/2  ⟺  σ_2(d) = σ_1(D)/(1 − σ_1(D)).
(ii)

[Figure: the extensive form for part (ii): the same game, except that player 2's two nodes v and w now form a single information set.]

In any sequential equilibrium, 3 chooses A. We again split the analysis into cases according to 3's behavior at her right information set I; as before, b is a best response for 3 if and only if μ_3(z) ≥ 1/2.

If 3 plays a, then 2 plays d, so 1 plays D. But then 3 prefers b. Contradiction.

Suppose that 3 plays b. We split the analysis into subcases:

Suppose that 2 plays c. Then 1 plays C. Then parsimony implies that μ_3(z) = 0, and so 3 prefers a. Contradiction.

Suppose that 2 plays d. Then 1 plays D, and 2's and 3's choices are optimal. Thus (D, d, (A, b)) (with μ_2(w) = 1 and μ_3(z) = 1) is a sequential equilibrium.

Suppose that 2 mixes. Then she must be indifferent, implying that 2σ_1(C) − 2(1 − σ_1(C)) = 0, or equivalently that σ_1(C) = 1/2. Thus 1 must be indifferent, and a similar calculation shows that this implies that σ_2(c) = 1/2. But these choices of 1 and 2 imply that μ_3(z) = 1/3, and so 3 prefers a. Contradiction.

Now suppose that 3 mixes. Then μ_3(z) = 1/2. We again split the analysis into subcases.

First suppose that I is off the equilibrium path, which is the case if 1 plays C and 2 plays c. Then parsimony implies that μ_3(z) = 0, and so 3 prefers a. Contradiction.

Now suppose that I is on the equilibrium path. Then since beliefs are Bayesian, we have
that

(*)  μ_3(z) = σ_1(D)σ_2(d) / ((1 − σ_1(D))σ_2(d) + σ_1(D)) = 1/2.

This equation defines a hyperbola in the plane. One point of intersection with the unit square (where legitimate mixed strategies live) is the point (σ_1(D), σ_2(d)) = (0, 0), but these choices prevent I from being reached, a contradiction. The remaining points of intersection have positive components, allowing us to rewrite (*) as

(**)  3 = 1/σ_1(D) + 1/σ_2(d).
Along this curve σ_2(d) increases as σ_1(D) decreases, and the curve includes the points (σ_1(D), σ_2(d)) = (1, 1/2), (2/3, 2/3), and (1/2, 1). A calculation shows that for player 1 to prefer D and player 2 to prefer d, we must have
(a)
(b)
respectively. Now, the position of curve (**) implies that at least one of players 1 and 2 must be indifferent. Suppose without loss of generality that player 1 is indifferent. Then (a) must bind, so this and (b) imply that σ_2(d) ≥ σ_1(D). Then equation (**) implies in turn that σ_2(d) ≥ 2/3, and hence that player 2 is also indifferent. Thus (b) also binds, from which it follows that σ_2(d) = σ_1(D), and hence, from (**), that σ_2(d) = σ_1(D) = 2/3. But then the equality in (a) implies that σ_3(a) = 2/3, a contradiction.
Thus, the unique sequential equilibrium is (D, d, (A, b)) (with μ_2(w) = 1 and μ_3(z) = 1), which generates payoffs (0, 0, 1). Evidently, if player 2 cannot observe player 1's choice, all three players are worse off.
(iii) Why can't (D, d, b) be chosen on the equilibrium path in part (i)? If player 3 plays (A, b), then player 2 will play c at her left node, while d is always a best response at her right node. If player 1 is planning to play D, he knows that if he switches to C, player 2 will observe this and play c rather than d, which makes this deviation profitable.

Why can't (C, c, A) be chosen on the equilibrium path in part (ii)? If 1 and 2 are playing (C, c), player 3's information set I is unreached. If a deviation causes I to be reached, then since 2 cannot observe 1's choice, it follows from parsimony that 3 may not believe that this is the result of a double deviation leading to z. Thus 3 must play a at I. Since 2 anticipates that 3 will play a at I, 2 is better off deviating to d. (The same logic shows that 1 is also better off deviating to D.)
2.4.5 Existence of sequential equilibrium and structure of the equilibrium set
Let Γ be a finite extensive form game with perfect recall. Kreps and Wilson (1982) prove

Theorem 2.46. Γ has at least one sequential equilibrium.

Theorem 2.47. Every sequential equilibrium strategy profile of Γ is a subgame perfect equilibrium.
Relationships among refinements:

sequential equilibrium ⊂ subgame perfect equilibrium ⊂ Nash equilibrium
Theorem 2.48 (Kreps and Wilson (1982), Kohlberg and Mertens (1986)).
(i) The set of sequential equilibria of Γ consists of a finite number of connected components.
(ii) For generic choices of payoffs in Γ, payoffs are constant on each connected component of sequential equilibria.

Remark: Theorem 2.48(ii) is also true if we replace sequential equilibrium with Nash equilibrium. The restriction to generic choices of payoffs in Γ rather than in G is important: reduced normal forms of most extensive form games have payoff ties, and thus are nongeneric in the space of normal form games.
Example 2.49. Consider the games Γ and Γ′ below.

[Figure: in Γ, player 1 chooses O (payoffs (0, 2)) or E, and after E chooses F or C; in Γ′, player 1 chooses among O, F, and C at a single decision node. In both games, player 2 observes only whether entry occurred before choosing f or a. After F, f yields (-5, 3) and a yields (-5, 2); after C, f yields (-2, -1) and a yields (2, 1).]

G(Γ) = G(Γ′)      f          a
O                0, 2       0, 2
F               -5, 3      -5, 2
C               -2, -1      2, 1
We saw earlier that in game Γ, ((E, C), a) is the unique subgame perfect equilibrium and the unique sequential equilibrium (with μ_2(y) = 1). There are additional weak sequential equilibria in which player 1 plays O: namely, ((O, C), f) with μ_2(x) ≥ 2/3, and ((O, C), σ_2(f) ≥ 1/2) with μ_2(x) = 2/3.
Notice that Γ and Γ′ only differ in how player 1's choices are presented. In particular, both of these games have the same reduced normal form: G(Γ) = G(Γ′).

In Γ′, consistency places no restrictions on beliefs. Therefore, all of the weak sequential equilibria of Γ above correspond to sequential equilibria in Γ′!

What is going on? When F and C are by themselves at a decision node, consistency forces player 2 to discriminate between them. But in Γ′, F and C appear as choices at the same decision node as O, so when player 1 chooses O, consistency does not discriminate between F and C.
One way to respect invariance is to perform analyses directly on reduced normal forms, so
that invariance holds by default. This also has the advantage of mathematical simplicity,
since normal form games are simpler objects than extensive form games.
Shifting the analysis to the reduced normal form may seem illegitimate. First, normal
form and extensive form games differ in a fundamental way, since only in the latter is it
possible to learn something about ones opponent during the course of play (see Example
2.28). Second, working directly with the normal form appears to conflict with the use of
backward induction, whose logic seems tied to the extensive form.
Both of these criticisms can be addressed. First, the differences between extensive and
normal form games are much smaller if we only consider equilibrium play: when players
adhere to equilibrium strategies, nothing important is learned during the course of play.
Second, the fact that a strategy for an extensive form game specifies a player's complete plan for playing the game suggests that the temporal structure provided by the extensive form may not be as essential as it might seem. In fact, since an extensive form game creates a telltale pattern of ties in its reduced normal form, one can reverse engineer a reduced normal form to determine the canonical extensive form that generates it; see Mailath et al. (1993).
To implement the normal form approach, we require robustness of equilibrium to low
probability mistakes, sometimes called trembles. Trembles ensure that all information sets
of the corresponding extensive form game are reached.
Perfect equilibrium
Throughout this section, we let G be a finite normal form game and Γ a finite extensive form game (with perfect recall).
Strategy profile σ^ε is an ε-perfect equilibrium of G if it is completely mixed and if s_i ∉ B_i(σ^ε_{−i}) implies that σ^ε_i(s_i) ≤ ε.

Strategy profile σ is a perfect equilibrium of G if and only if it is the limit of a sequence of ε-perfect equilibria σ^ε with ε → 0.
Remarks:

(i) The definition only requires that some sequence of ε-perfect equilibria converge to σ, not that every such sequence do so. The same point arose in the definition of consistency for sequential equilibrium (Section 2.4.3). The analogous point holds for proper equilibrium, but not for KM stable sets; see below.

(ii) The formulation of perfect equilibrium above is due to Myerson (1978). The original definition, due to Selten (1975), is stated in terms of Nash equilibria of perturbed games G^p, p : ∪_{i∈P} S_i → (0, 1), in which player i's mixed strategy must put at least probability p_{s_i} on strategy s_i.
Example 2.50. Entry deterrence revisited.

[Figure: player 1 chooses O, ending the game with payoffs (0, 3), or E; player 2 then chooses F, yielding (-1, -1), or A, yielding (1, 1).]

G(Γ)        F         A
O          0, 3      0, 3
E         -1, -1     1, 1
Nash equilibria: (E, A) and (O, σ_2(F) ≥ 1/2). Only (E, A) is subgame perfect.

What are the perfect equilibria of the normal form G(Γ)? Since F is weakly dominated, 2's best response to any completely mixed strategy of 1 is A, so in any ε-perfect equilibrium, σ^ε_2(F) ≤ ε. It follows that if ε is small, 1's best response is E, so in any ε-perfect equilibrium, σ^ε_1(O) ≤ ε. Therefore, any sequence of ε-perfect equilibria with ε → 0 converges to (E, A), which is thus the unique perfect equilibrium of G(Γ).
Selten (1975) establishes the following properties of perfect equilibrium:
Theorem 2.51. G has at least one perfect equilibrium.

Theorem 2.52. Every perfect equilibrium of G is a Nash equilibrium that does not use weakly dominated strategies. In two-player games, the converse statement is also true.
These results imply
Corollary 2.53. G has at least one Nash equilibrium in which no player uses a weakly dominated
strategy.
Let Γ be a generic extensive form game of perfect information, so that Γ has a unique subgame perfect equilibrium. Will applying perfection to G(Γ) rule out the Nash equilibria of Γ that are not subgame perfect?

If Γ has the single move property (i.e., if no player has more than one decision node on any play path), then the perfect equilibrium of G(Γ) is unique, and it is outcome equivalent to the unique subgame perfect equilibrium of Γ.

But beyond games with the single move property, perfect equilibrium is not adequate to capture backward induction.
Example 2.54. In the game Γ below, ((B, D), R) is the unique subgame perfect equilibrium, while ((A, ·), L) are Nash equilibria. (Actually, A together with any σ_2(L) ≥ 1/2 is Nash too.)

[Figure: player 1 chooses A, ending the game with payoffs (2, 4), or B; player 2 then chooses L, yielding (1, 1), or R; player 1 finally chooses C, yielding (0, 0), or D, yielding (3, 3).]

G(Γ)        L        R
A         2, 4     2, 4
BC        1, 1     0, 0
BD        1, 1     3, 3

Perfect equilibrium does not rule out the Nash equilibrium (A, L): in ε-perfect equilibria in which player 1 trembles to BC much more often than to BD, L is optimal for player 2. Myerson's (1978) proper equilibrium addresses this by requiring more costly mistakes to be much less likely: σ^ε is an ε-proper equilibrium of G if it is completely mixed and u_i(s_i, σ^ε_{−i}) < u_i(s′_i, σ^ε_{−i}) implies that σ^ε_i(s_i) ≤ ε σ^ε_i(s′_i); σ is a proper equilibrium if it is the limit of a sequence of ε-proper equilibria with ε → 0. In this game, L is not played in any proper equilibrium.
Why? If σ_2(R) is small but positive, then u_1(A) > u_1(BD) > u_1(BC), so in any ε-proper equilibrium we have σ_1(BD) ≤ ε σ_1(A) and σ_1(BC) ≤ ε σ_1(BD).

Therefore, player 2 puts most of her weight on R in any ε-proper equilibrium, and so L is not played in any proper equilibrium.
Properties of proper equilibrium.
Theorem 2.55 (Myerson (1978)).
(i) G has at least one proper equilibrium.
(ii) Every proper equilibrium of G is a perfect equilibrium of G.
It can be shown that if Γ is a game of perfect information, then every proper equilibrium of G(Γ) is outcome equivalent (i.e., induces the same distribution over terminal nodes) to some subgame perfect equilibrium of Γ.
Remarkably, proper equilibrium also captures sequential rationality in games of imperfect
information:
Theorem 2.56 (van Damme (1984), Kohlberg and Mertens (1986)).

(i) Suppose that σ is a proper equilibrium of G(Γ). Then there is an outcome equivalent behavior strategy profile β of Γ that is a sequential equilibrium strategy profile of Γ.

(ii) Let {σ^ε} be a sequence of ε-proper equilibria of G(Γ) that converge to proper equilibrium σ. Let behavior strategy profile β^ε be outcome equivalent to σ^ε, and let behavior strategy profile β be a limit point of the sequence {β^ε}. Then β is a sequential equilibrium strategy profile of Γ.
What is the difference between parts (i) and (ii) of the theorem? In part (i), σ is outcome equivalent to some sequential equilibrium strategy profile β. But outcome equivalent strategy profiles may specify different behavior off the equilibrium path; moreover, the strategy σ_i for the reduced normal form does not specify how player i would behave at unreachable information sets (i.e., at information sets that i itself prevents from being reached). In part (ii), the ε-proper equilibria are used to explicitly construct the behavior strategy profile β. Thus, part (ii) shows that the construction of proper equilibrium does not only lead to outcomes that agree with sequential equilibrium; by identifying choices off the equilibrium path, it captures the full force of the principle of sequential rationality.
Theorem 2.56 shows that proper equilibrium achieves our goals of respecting the principle
of sequential rationality while ensuring invariance of predictions across games with the
same purely reduced normal form.
Nevertheless, we argue in the next sections that even proper equilibrium is subject to
criticism.
Extensive form perfect equilibrium and quasi-perfect equilibrium
The agent normal form A(Γ) of an extensive form game Γ is the reduced normal form we obtain if we assume that each information set is controlled by a distinct player. In particular, whenever the original game has players with more than one information set, we create new players to inhabit the information sets.

Profile σ is an extensive form perfect equilibrium of Γ (Selten (1975)) if it corresponds to a (normal form) perfect equilibrium of A(Γ).
Example 2.54 revisited: direct computation of extensive form perfect equilibrium.

The agent normal form of the game above is (with agent 3 controlling player 1's second node):

A(Γ), 3: C       L           R
A            2, 4, 2     2, 4, 2
B            1, 1, 1     0, 0, 0

A(Γ), 3: D       L           R
A            2, 4, 2     2, 4, 2
B            1, 1, 1     3, 3, 3
Only (B, R, D) is perfect in A(Γ), and so only ((B, D), R) is extensive form perfect in Γ.

Why isn't (A, L, ·) perfect in A(Γ)? C is weakly dominated for agent 3, so he plays D in any perfect equilibrium. Facing εC + (1 − ε)D, 2 prefers R. Therefore, L is not played in any perfect equilibrium.
Extensive form perfect equilibrium is the original equilibrium refinement used to capture
backward induction in extensive form games with imperfect information, but it is not an
easy concept to use. Kreps and Wilson (1982) introduced sequential equilibrium to retain
most of the force of extensive form perfect equilibrium, but in a simpler and more intuitive
way, using beliefs and sequential rationality.
The following result shows that extensive form perfect equilibrium and sequential equilibrium are nearly equivalent, with the former being just a slightly stronger refinement.
Theorem 2.57 (Kreps and Wilson (1982), Blume and Zame (1994), Hendon et al. (1996)).

(i) Every extensive form perfect equilibrium of Γ is a sequential equilibrium strategy profile of Γ.

(ii) In generic extensive form games, every sequential equilibrium strategy profile is an extensive form perfect equilibrium.
In rough terms, the distinction between the concepts is as follows: Extensive form perfect
equilibrium and sequential equilibrium require reasonable behavior at all information
sets. But extensive form perfect equilibrium requires best responses to the perturbed
strategies themselves, while sequential equilibrium only requires best responses in the
limit.
To make further connections, Kreps and Wilson (1982) define weak extensive form perfect
equilibrium, which generalizes Seltens (1975) definition by allowing slight perturbations
to the games payoffs. They show that this concept is equivalent to sequential equilibrium.
Notice that extensive form perfect equilibrium retains the problem of making different predictions in games with the same reduced normal form, since such games can have different agent normal forms (cf. Example 2.49).
Surprisingly, extensive form perfect equilibria can use weakly dominated strategies:
Example 2.58. In the game below, A is weakly dominated by BD.

[Figure: player 1 (agent 1a) chooses A or B; after A, player 2 chooses L, yielding (0, 0), or R, yielding (1, 1); after B, player 1 (agent 1b) chooses C, yielding (0, 0), or D, yielding (1, 1).]

G(Γ)        L        R
A         0, 0     1, 1
BC        0, 0     0, 0
BD        1, 1     1, 1
But ((A, D), R) is extensive form perfect: there are ε-perfect equilibria of the agent normal form in which agent 1b is more likely to tremble to C than player 2 is to tremble to L, leading agent 1a to play A.
In fact, Mertens (1995) (see also Hillas and Kohlberg (2002)) provides an example of a game
in which all extensive form perfect equilibria use weakly dominated strategies! Again, the
difficulty is that some player believes that he is more likely to tremble than his opponents.
In extensive form game Γ, one defines quasi-perfect equilibrium (van Damme (1984)) in essentially the same way as extensive form perfect equilibrium, except that when considering player i's best response at a given information set against perturbed strategy profiles, one only perturbs the strategies of i's opponents; one does not perturb player i's own choices at his other information sets. Put differently, we do not have player i consider the possibility that he himself may tremble later in the game.

Neither extensive form perfection nor quasi-perfection implies the other. But unlike extensive form perfect equilibria, quasi-perfect equilibria never employ weakly dominated strategies.
van Damme (1984) proves Theorem 2.56 by showing that proper equilibria must correspond to quasi-perfect equilibria, which in turn correspond to sequential equilibria. Mailath et al. (1997) show that proper equilibrium in a given normal form game G is equivalent to what one might call uniform quasi-perfection across all extensive forms with reduced normal form G.
Sequential rationality without equilibrium in imperfect information games
In this section and the last, our analyses of extensive form games and their reduced normal
forms have used equilibrium concepts designed to capture the principle of sequential
rationality. But one can also aim to capture sequential rationality through non-equilibrium
solution concepts. This requires combining the logic of rationalizability with the undying common belief in future rational play used to justify the backward induction solution in generic perfect information games (see Section 2.3.2). The resulting solution concepts yield the backward induction solution in generic perfect information games, but can be applied to imperfect information games as well.
The basic solution concept obtained in this manner, sequential rationalizability, was suggested by Bernheim (1984) (under the name subgame rationalizability), defined formally by Dekel et al. (1999, 2002), and provided with epistemic foundations in the two-player case by Asheim and Perea (2005). By adding a requirement of common certainty of cautiousness, the last paper also defines and provides epistemic foundations for quasi-perfect rationalizability for two-player games, which differs from sequential rationalizability only in nongeneric extensive form games.

Similar motivations lead to the notion of proper rationalizability for normal form games (Schuhmacher (1999), Asheim (2001), Perea (2011)). The epistemic foundations for this solution concept require common knowledge of (i) cautiousness and (ii) opponents being infinitely more likely to play strategies with higher payoffs. (We note that cautiousness allows the ruling out of weakly dominated strategies, but not of iteratively weakly dominated strategies, because the weakly dominated strategies are never viewed as completely impossible; see Asheim (2006, Sec. 5.3).)
As discussed in Section 2.3.2, the assumption of undying common belief in future rational play may be viewed as too strong in games without the single-move property. One
weakening of this assumption requires that a player need only expect an opponent to
choose rationally at reachable information sets, meaning those that are not precluded by the
opponents own choice of strategy. (For instance, in the Mini Centipede game (Example
2.28), player 1s second decision node is not reachable if he chooses B at his first decision
node.) Foundations for the resulting rationalizability concept, sometimes called weakly
sequential rationalizability, are provided by Ben-Porath (1997); for the equilibrium analogue
of this concept, sometimes called weakly sequential equilibrium, see Reny (1992). Adding
common certainty of cautiousness yields the permissible strategies, which are the strategies
that survive the Dekel-Fudenberg procedure; see Dekel and Fudenberg (1990), Brandenburger (1992), and Borgers
(1994), as well as Section 1.2.4. The equilibrium analogue
of permissibility is normal form perfect equilibrium. See Asheim (2006) for a complete
106
Example 2.59. Battle of the Sexes with an outside option. Consider the following game and
its reduced normal form G():
1
2, 2
G()
1
T
3, 1
0, 0
0, 0
1, 3
L
O 2, 2
T 3, 1
B 0, 0
R
2, 2
0, 0
1, 3
107
Nevertheless, only one of the three equilibria seems reasonable: If player 1 enters the
subgame, he is giving up a certain payoff of 2. Realizing this, player 2 should expect him
to play T, and then play L herself. We therefore should expect ((I, T), L) to be played.
Kohlberg and Mertens (1986) use this example to introduce the idea of forward induction:
Essentially what is involved here is an argument of forward induction: a subgame
should not be treated as a separate game, because it was preceded by a very specific form
of preplay communicationthe play leading to the subgame. In the above example, it is
common knowledge that, when player 2 has to play in the subgame, preplay communication (for the subgame) has effectively ended with the following message from player 1
to player 2: Look, I had the opportunity to get 2 for sure, and nevertheless I decided to
play in this subgame, and my move is already made. And we both know that you can no
longer talk to me, because we are in the game, and my move is made. So think now well,
and make your decision.
Speeches of this sort are often used to motivate forward induction arguments.
In the example above, forward induction can be captured by requiring that an equilibrium
persist after a strictly dominated strategy is removed. Notice that strategy (I, B) is strictly
dominated for player 1. If we remove this strategy, the unique subgame perfect equilibrium (and hence sequential and proper equilibrium) is ((I, T), L). This example shows that
none of these solution concepts is robust to the removal of strictly dominated strategies,
and hence to a weak form of forward induction.
In general, capturing forward induction requires more than persistence after the removal
of dominated strategies.
A somewhat stronger form of forward induction is captured by equilibrium dominance: an
equilibrium should persist after a strategy that is suboptimal at the equilibrium outcome
is removed (see Sections 2.7.2 and 2.6.2).
A general definition of forward induction for all extensive form games has been provided
by Govindan and Wilson (2009).
For intuition, GW say:
Forward induction should ensure that a players belief assigns positive probability only
to a restricted set of strategies of other players. In each case, the restricted set comprises
strategies that satisfy minimal criteria for rational play.
GWs formal definition is along these lines:
A players pure strategy is called relevant for an outcome of a game in extensive form
with perfect recall if there exists a weakly sequential equilibrium with that outcome for
which the strategy is an optimal reply at every information set it does not exclude. The
108
These games have many applications (to labor, IO, bargaining problems, etc.)
In signaling games, sequential equilibrium fails to adequately restrict predictions of play.
We therefore introduce new refinements that capture forward induction, and that take the
form of additional restrictions on out-of-equilibrium beliefs.
109
r1
r2
r1
m1
ta
m2
r2
r3
0
r1
r2
m1
tb
r1
m2
r2
r3
P = {1, 2}
T
A1 = M = {. . . , m, . . .}
S1 = {s1 : T M}
1a (m)
u1a (m, r)
A2 = R = {. . . , r, . . .}
Rm R
S2 = {s2 : M R | s2 (m) Rm }
m
(r)
2
m
(T)
2
u2 (t, m, r)
u1a (m, r) m
2 (r).
rRm
If message m is sent, the receivers expected utility from response r given beliefs m is
X
u2 (t, m, r) m
2 (t).
tT
110
ta
0
U
0
0
tb
-1
1
-1
0
-1
0
1
1
Suppose I is played. This message is strictly dominated for ta , so the receiver really ought
to believe she is facing tb (i.e., 2 (b) = 1), breaking the bad equilibria.
Story: Suppose you are tb . If you deviate, you tell the receiver: ta would never want to
deviate. If you see a deviation it must be me, so you should play D.
If we introduce the requirement that equilibria be robust to the removal of dominated
strategies, the bad equilibria are eliminated: If we eliminate action I for type ta , then
player 2 must play D, and so type tb plays I.
Computation of equilibria:
We can treat each type of player 1 as a separate player.
Strategy O is strictly dominant for type ta , so he plays this in any sequential equilibrium.
Now consider type tb . If 1b (I) > 0, then 2 (tb ) = 1, so player 2 must play D, implying that
type tb plays I. Equilibrium.
If type tb plays O, then player 2s information set is unreached, so her beliefs are unrestricted. Also, for O to be type tb s best response, it must be that 0 2 (U) + (1 2 (U)),
or equivalently that 2 (U) 12 . 2 (U) = 1 is justified for player 2 whenever 2 (a) 21 ,
while 2 (U) [ 12 , 1) is justified whenever 2 (a) = 12 . These combinations are sequential
equilibria.
Stronger forms of forward induction are based on equilibrium dominance: they rule out
components of equilibria that vanish after a strategy that is not a best response at any
equilibrium in the component is removed.
Example 2.62. The Beer-Quiche Game (Cho and Kreps (1987)).
0
1
F
W
.1
2
0
1
-1
tw
F
W
3
0
0
.9
F
W
ts
3
0
1
1
0
-1
W
2
0
Senders can be wimpy or surly. Surly types like beer for breakfast; wimpy types like
quiche. Getting ones preferred breakfast is worth 1. Receivers like to fight with wimpy
players but walk away from surly players. Avoiding fights is worth 2 to all senders.
112
(1)
1w (B)
B2 (W)
Q
(F)
2
Q
2 (tw )
= 1 = 1s (B)
= 1
= 1 ( 12 )
12 (= 21 )
(2)
1w (Q)
Q
(W)
2
B
2 (F)
B2 (tw )
= 1 = 1s (Q)
= 1
= 1 ( 12 )
12 (= 21 )
Are the equilibria in component (2) reasonable? The wimpy type is getting his highest
possible payoff by choosing quichehe can only be hurt by switching to beer. Therefore,
if the receiver sees beer, he should expect a surly type, and so walk away. Expecting this,
surly players should deviate to beer.
In Example 2.61, certain beliefs were deemed unreasonable because they were based on
expecting a particular sender type to play a dominated strategy. This is not the case
here: B is not dominated by Q for tw . Instead, we fixed the component of equilibria
under consideration, and then concluded that certain beliefs are unreasonable given the
anticipation of equilibrium payoffs: the possible payoffs to B for tw (which are 0 and 2) are
all smaller than this types equilibrium payoff to Q (which is 3), so a receiver who sees B
should not think he is facing tw .
Computation of equilibria:
We divide the analysis into cases according to the choices of types tw and ts .
(Q or mix, B). This implies that B2 (s) .9 and hence that sB2 = W, and that Q
(w) = 1 and
2
Q
hence that s2 = F. Thus type tw obtains 1 for playing Q and 2 for playing W, implying that
he plays W. l
(B, B). In this case B2 (ts ) = .9 and hence sB2 = W. Q
is unrestricted. Playing B gives ts his
2
Q
best payoff. Type tw weakly prefers B iff 2 2 (F) + 3(1 Q
(F), and hence iff Q
(F) 21 .
2
2
Q
(F) = 1 is justified if Q
(s) 12 , while Q
(F) [ 21 , 1) is justified if Q
(s) = 21 . These
2
2
2
2
combinations form a component of sequential equilibria.
(B or mix, Q). This implies that Q
(s) .9 and hence that sQ
= W, and that B2 (w) = 1 and
2
2
hence that sB2 = F. Thus type tw obtains 0 for playing B and 3 for playing Q, implying that
he plays Q. l
(Q, Q). In this case Q
(t ) = .9 and hence sQ
= W. B2 is unrestricted. Playing Q gives tw
2 s
2
his best payoff. Type ts weakly prefers Q if and only if 2 B2 (F) + 3(1 B2 (F)), and hence
if and only if B2 (F) 12 . B2 (F) = 1 is justified if B2 (s) 21 , while B2 (F) [ 12 , 1) is justified if
B2 (s) = 12 . These combinations form another component of sequential equilibria.
(B, mix). In this case Q
(t ) = 1 and hence sQ
= W, implying that tw prefers Q. l
2 s
2
(Q, mix). In this case B2 (ts ) = 1 and hence sB2 = W, implying that ts prefers B. l
113
(W), and
(mix, mix). For ts to be indifferent, it must be that (1 B2 (W)) + 3B2 (W) = 2Q
2
Q
hence that B2 (W) = 2 (W) 12 . But for tw to be indifferent, it must be that 2B2 (W) =
(W) + 12 . l
(1 Q
(W)) + 3Q
(W), and hence that B2 (W) = Q
2
2
2
We now consider refinements that formalize the notion of equilibrium dominance in
signaling games. Cho and Kreps (1987), using results of Kohlberg and Mertens (1986),
prove that at least one equilibrium outcome survives after any one of these refinements is
applied.
(I) Rm be the set of responses to message m that are optimal
For set of types I T, let BRm
2
for the receiver under some beliefs that put probability 1 on the senders type being in I.
Formally:
m
BRm
2 (2 )
BRm
2 (I)
= argmax
rRm
u2 (t, m, r) m
2 (t),
tT
m
BRm
2 (2 ).
m
: m
(I)=1
2
2
Fix a component of sequential equilibria of signaling game , and let u1a be the payoff
received by type ta on this component. (Recall that we are restricting attention to games
in which payoffs are constant on every equilibrium component.)
(I) For each unused message m, let
o
n
u
(m,
r)
Dm = ta T : u1a > max
1a
m
rBR2 (T)
Dm is the set of types for whom message m is dominated by the equilibrium, given
that the receiver behaves reasonably.
(II) If for some unused message m with Dm , T and some type tb , we have
u1b <
min
rBRm
(TDm )
2
u1b (m, r)
then component of equilibria fails the Cho-Kreps criterion (a.k.a. the intuitive criterion).
Type tb would exceed his equilibrium payoffs by playing message m if the receiver
played a best response to some beliefs that exclude types in Dm .
Example 2.62 revisited. Applying the Cho-Kreps criterion in the Beer-Quiche game.
Component (2): B is unused.
114
Further refinements
We can eliminate more equilibria by replacing (II) with a weaker requirement.
For set of types I T, let MBRm
(I) Rm be the set of responses to message m that are
2
optimal for the receiver under some beliefs that put probability 1 on the senders type
being in I. Formally:
[
m
m m
m
{ m
MBRm
(I)
=
2 R : support(
2
2 ) BR2 (2 )}.
(I)=1
: m
m
2
2
m
that u1b < rRm u1b (m, r) 2 (r), then (, ) fails the strong Cho-Kreps criterion (a.k.a.
the equilibrium domination test).
Whats the difference between (II) and (II)?
(i)
Under (II), there is a single type who wants to deviate regardless of the BR the
receiver chooses.
(ii) Under (II), the type can vary with the BR the receiver chooses. This sometimes
allows us to rule out more equilibria.
Iterated versions of these concepts can be obtained by applying (I) repeatedly...
n
o
(Dm )0 = ta T : u1a > max
u
(m,
r)
1a
m
m
rB2 (TD )
There are additional refinements that are stronger than equilibrium dominance. Banks
and Sobel (1987) introduce refinements that are based on the following idea (called D2
in Cho and Kreps (1987)): instead of following step (II) above, we exclude type ta from
for which ta
having deviated to unused message m if for any mixed best response m
2
weakly prefers m to getting the equilibrium payoff, there is another type tb that strictly
prefers m. Iterated versions of this sort of requirement are called (universal) divinity by
Banks and Sobel (1987). Under the never a weak best response criterion, we exclude type ta
for which ta
from having deviated to unused message m if for any mixed best response m
2
is indifferent between playing m and getting the equilibrium payoff, there is another type
tb that strictly prefers m. All of these refinements are implied by KM stability (see the
next section); guaranteeing that components satisfying the refinements exist. For further
discussion, see Cho and Kreps (1987) and Banks and Sobel (1987).
It may not be surprising that this profusion of different solution concepts has led to
substantial criticism of the signaling game refinements literature.
2, 2
G()
1
T
3, 1
0, 0
0, 0
1, 3
116
L
O 2, 2
T 3, 1
B 0, 0
2
R
2, 2
0, 0
1, 3
2, 2
G(0 )
1
M
3
4
2, 2
L
D
1
1
4
R L
1
R
O
M
T
B
2
L
2, 2
2 14 , 1 34
3, 1
0, 0
R
2, 2
1 21 , 1 12
0, 0
1, 3
3, 1 0, 0 3, 1 0, 0 0, 0 1, 3
117
(iii) If we find solutions for games by applying proper equilibrium to their purely
reduced normal forms, then we may obtain different solutions to games with the
same fully reduced normal form.
Hillas (1998) suggests a different interpretation of Example 2.63: by requiring solutions to
respect backward induction (sequential equilibrium) and full invariance, one can obtain
forward induction for free!
Example 2.63 rerevisited. As we have seen, has three subgame perfect (and sequential)
equilibria. But 0 has the same fully reduced normal form as , but its only subgame
perfect equilibrium is ((I, D, T), L). Therefore, our unique prediction of play in should
be the corresponding subgame perfect equilibrium ((I, T), L). As we saw earlier, this is the
only equilibrium of that respects forward induction.
Building on this insight, Govindan and Wilson (2009) argue that together, backward
induction and full invariance imply forward induction, at least in generic two-player
games.
2.7.2 KM stability and set-valued solution concepts
With the foregoing examples as motivation, Kohlberg and Mertens (1986) list desirable
properties (or desiderata) for refinements of Nash equilibrium.
(D1) Full invariance: Solutions to games with the same fully reduced normal form are
identical.
(D2) Backward induction: The solution contains a sequential equilibrium.
(D3) Iterated dominance: The solution to G contains a solution to G0 , where G0 is obtained
from G by removing a weakly dominated strategy.
(D4) Admissibility: Solutions do not include weakly dominated strategies.
Iterated dominance (D3) embodies a limited form of forward inductionsee Section 2.6.1.
KM argue that admissibility (D4) is a basic decision-theoretic postulate that should be
respected, and appeal to various authorities (Wald, Arrow, . . . ) in support of this point of
view.
In addition, KM require existence: a solution concept should offer at least one solution for
every game.
For a solution concept to satisfy invariance (D1), backward induction (D2), and existence
in all games, the solutions must be set-valued: see Example 2.65 below.
118
Similarly, set-valued solutions are required for the solution concept to satisfy (D1), (D3),
and existence (see KM, Section 2.7.B), or and existence (see Example 1.12).
Set-valued solutions are natural: Extensive form games possess connected components
of Nash equilibria, elements of which differ only in terms of behavior at unreached
information sets. Each such component should be considered as a unit.
Once one moves to set-valued solutions, one must consider restrictions on the structure
of solution sets. KM argue that solution sets should be connected sets.
As build-up, KM introduce two set-valued solution concepts that satisfy (D1)(D3) and
existence, but that fail admissibility (D4) and connectedness.
They then introduce their preferred solution concept: A closed set E of Nash equilibria
(of game G = G ()) is KM stable if it is minimal with respect to the following property:
for any > 0 there exists some 0 > 0 such that for any completely mixed strategy vector
(1 , . . . , n ) and for any 1 , . . . , n (0 < i < 0 ), the perturbed game where every strategy s
of player i is replaced by (1 i )s + i i has an equilibrium -close to E.
Remark: If in the above one replaces for any (1 , . . . , n ) and 1 , . . . , n with for some
(1 , . . . , n ) and 1 , . . . , n , the resulting requirement is equivalent to perfect equilibrium.
Thus, a key novelty in the definition of KM stability is the requirement that equilibria be
robust to all sequences of perturbations.
KM stability satisfies (D1), (D3), (D4), and existence. In fact, it even satisfies a stronger
forward induction requirement than (D3) called equilibrium dominance: A KM stable set E
contains a KM stable set of any game obtained by deletion of a strategy that is not a best
response to any equilibrium in E (see Section 2.6.2 for further discussion).
However, KM stability fails connectedness and backward induction (D2): KM provide
examples in which a KM stable set (of G ()) contains no strategy profile corresponding
to a sequential equilibrium (of ).
A variety of other definitions of strategic stability have been proposed since Kohlberg
and Mertens (1986). Mertens (1989, 1991) proposes a definition of strategic stability that
satisfies (D1)(D4), existence, connectedness, and much besides, but that is couched in
terms of ideas from algebraic topology. Govindan and Wilson (2006) obtain (D1)(D4) and
existence (but not connectedness) using a relatively basic definition of strategic stability.
Example 2.65. Why backward induction and full invariance require a set-valued solution
concept.
119
(p)
G
1,-1
1,-1
1 T
2,-2
-2, 2
-2, 2
2,-2
1,-1
1
M
1 p
1,-1
2x
L
2y
R
2z
R L
In particular, player 1 must place positive probability on B and on at least one of M and T.
For player 1 to place positive probability on T, he would have to be indifferent between B
and T, implying that 2 plays 12 L + 21 R. But in this case player 1 would be strictly better off
playing M in the subgame, a contradiction. Thus 1 (T) = 0, and so (*) implies that player
1p
1
1 plays 2p
M + 2p B in the subgame.
For player 1 to be willing to randomize between M and B, it must be that
p + (1 p)(22 (L) 2(1 2 (L))) = 22 (L) + 2(1 2 (L)),
implying that 2 (L) =
43p
84p
4p
.
84p
Finally, with these strategies chosen in the subgame, player 1s expected payoff from
choosing I at his initial node is
22 (L) + 22 (R) =
4p
84p
p
.
2p
Since p < 1, this payoff is less than 1, and so player 1 strictly prefers O at his initial node.
120
Each choice of p (0, 1) leads to a unique and distinct sequential equilibrium of (p).
These equilibria correspond to distinct Nash equilibria of G, which itself is the reduced
normal form of each (p). Therefore, if we accept backward induction and invariance, no
one Nash equilibrium of G constitutes an acceptable prediction of play. Thus, requiring
invariance and backward induction leads us to set-valued solution concepts.
(If in (p) we had made the strategy M a randomization between O and B, the weight on
2 (L) would have gone from 43 to 12 , giving us the other half of the component of equilibria.
This does not give us 2 (L) = 21 , but this is the unique subgame perfect equilibrium of the
game where 1 does not have strategy M.)
3. Bayesian Games
How can we model settings in which different players have different information at
the onset of play? We now introduce Bayesian games, which provide a simple way of
modeling settings in which agents choose strategies simultaneously after obtaining their
private information.
(One can also model information differences at the start of play using extensive form games
that begin with moves by Nature, as we have in Example 2.3 (simple card game), Example
2.43 (Centipede with a possibly generous player), Example 2.44 (Ace-King-Queen Poker),
and Section 2.6.2 (signaling games). Indeed, any Bayesian game satisfying the common
prior assumption can be represented in this way (see the Remark below). The extensive
form also allows for sequential moves and further information asymmetries during the
course of play, but at the cost of having a considerably more complicated model.)
3.1 Definition
A Bayesian game (Harsanyi (19671968)) is a collection BG = {P , {Ai }iP , {Ti }iP , {pi }iP , {ui }iP }.
P = {1, . . . , n}
ai Ai
ti Ti
pi : Ti Ti
ui : A T R
That is, the pi are conditional probabilities generated from the common prior p.
When the CPA holds, the Bayesian game BG is equivalent to the following extensive form
game BG :
Stage 0:
Stage 1:
Bayesian strategies
si : Ti Ai
i : Ti Ai
122
Nash equilibrium
If player i of type ti chooses action ai , and each opponent j , i plays some Bayesian strategy
s j , then type ti s expected payoff is
(8)
Ui (ai , si |ti ) =
ti Ti
In words: player i of type ti chooses an action that maximizes his expected utility given
his beliefs about opponents types and given the opponents Bayesian strategies.
Example 3.1. Consider a card game in which (i) each player is dealt a hand of cards from
a single shuffled deck, and observes only his own hand, and then (ii) each player simultaneously chooses an action. We can identify a players type with his hand. According to
definition (8), when a player with hand ti is deciding what to do, he should account for all
of the possible profiles of hands ti of the other players, weighting these profiles by how
likely he thinks they are. His beliefs pi ( |ti ), which describe the probabilities he assigns to
his opponents having various profiles of hands, depend on his own handfor instance,
if he has all the aces, then he knows that no one else has any.
Now suppose that player i correctly anticipates his opponents strategies, and hence the
action s j (t j ) that opponent j would play if she had hand t j . Definition (8) indicates that
player i0 s payoffs are affected by his opponents hands in two distinct ways: there is a
direct effect (in some circumstances, he will win because he has better cards), as well as an
indirect effect, since an opponents cards determine the action she chooses (for instance,
an opponent with bad cards may choose a passive action).
Strategy profile = (1 , . . . , n ) is a (mixed) Nash equilibrium of BG if
i (ai | ti ) > 0 ai argmax
ai Ai
X
ti Ti
X Y
pi (ti | ti )
j (a j | t j ) ui ((ai , ai ),(ti , ti ))
ai Ai j,i
Sometimes Nash equilibria of Bayesian games are called Bayesian equilibria, or Bayes-Nash
equilibria, or other similar sounding things.
Remark: A Bayesian game BG as defined above is equivalent to a normal form game G
P
with players (i, ti ) (and hence iP #Ti players in total) and payoffs Ui ( |ti ).
Example 3.2. In the two-player Bayesian game BG, player is type ti , representing his level
of productivity, takes values in the finite set Ti {1, 2, . . .}. Types are drawn according to
the prior distribution p on T = T1 T2 . After types are drawn, each player chooses to be
In or Out of a certain project. If player i chooses Out, his payoff is 0. If player i chooses In
and player j chooses Out, player is payoff is c, where c > 0 is the cost of participating
in the project. Finally, if both players choose In, then player is payoff is ti t j c. Thus,
a player who chooses In must pay a cost, but the project only succeeds if both players
choose In; in the latter case, the per-player benefit of the project is the product of the
players productivity levels.
Now suppose that the type sets are T1 = {3, 4, 5}
distribution p is given by the table below.
t2
4 5
3 .2 .1
t1 4 .1 .1
5 0 .1
6
0
.1
.3
2
3
This is shown as follows: The highest benefit that type t1 = 3 could obtain from playing
In is 32 12 + 31 5 = 13. Since this is less than c = 15, this type stays out in any equilibrium.
Proceeding sequentially, we argue that types t2 = 4, t1 = 4, and t2 = 5 play Out. The
highest benefit type t2 = 4 could obtain from playing In is 31 16 = 5 13 < 15, so this
type plays Out; thus, the highest benefit type t1 = 4 could obtain from playing In is
124
20 + 13 24 = 44
= 14 23 < 15, so this type plays Out; and thus, finally, the highest benefit
3
type t2 = 5 could obtain from playing In is 31 25 = 8 13 < 15, so this type plays Out.
1
3
Conditional on this behavior for the low and middle productivity types, the remaining
types, t1 = 5 and t2 = 6, are playing a 2 2 coordination game, namely
t1 = 5
t2 = 6
In
Out
1
1
In 7 2 , 7 2 15, 0
Out 0, 15
0, 0
where 7.5 = 43 30 15. It is an equilibrium for both to be In, and also for both to be Out.
For type t1 = 5 to be willing to randomize, his expected benefit to being in must equal
c = 15; thus 34 62 30 = 15, implying that 62 = 23 . Virtually the same calculation shows that
51 = 23 as well.
3.2 Interpretation
Interpreting types and equilibria
1. In environments where agents receive informative private signals (e.g., card games;
exploratory drilling before a mineral rights auction) one can take the drawing of
types described by the extensive form game BG literally.
2. In other settings (e.g., auctions with independent private values), it may be
unreasonable to suppose that players suddenly learn their preferences as play
begins. In such cases, a players nonrealized types are there in order to describe
opponents uncertainty about this players preferences.
3. When different players types are independent (e.g., auctions with independent
private values; games in which only one player has private information), one can
imagine that each player i is drawn from a population in which the distribution of
types is described by the marginal distribution of p T on Ti .
4. If player 1 knows his type is t1 = 4, and player 2 knows her type is t2 = 6, why do
we need to determine the equilibrium behaviors of the other types?
We do this to close the model. For example, since player 1 doesnt know player 2s
type, he must form conjectures about the behavior of t2 = 4, t2 = 5, and t2 = 6.
We could specify 2 ( | t2 = 4) and 2 ( | t2 = 5) exogenously, but this wouldnt make
much sense if we didnt select best responses: that would amount to requiring
player 1 to believe that t2 = 6 behaves rationally but that types t2 = 4 and t2 = 5 do
not.
125
A model of rational behavior must require all types to behave optimally, even if
the modeler knows which types were realized.
Understanding types when multiple players have multiple types
1. A players type is all private information he has at the onset of play.
There are two aspects to a players type:
(i) Basic uncertainty: information that affects preferences (own or others),
or information about the environment.
This is reflected in the fact that each ui can depend on ti and t j .
(ii) Beliefs about opponents types.
This is reflected in the fact that pi conditions on ti .
In most applications, the name we assign to a type reflects (i), but (ii) is still a part
of the type.
2. Beliefs refers not only to first-order beliefs (about the opponents types) but also
higher-order beliefs (i.e., beliefs about beliefs). For instance,
p(player 2 assigns probability
1
3
to t1 = 3 | t1 = 3) = p(t2 = 5 | t1 = 3) = 13 .
2. More generally, the rationale for the CPA is less obvious. This is especially so when
the drawing of types is not to be taken literally.
The CPA is generally imposed for modeling discipline. Under the CPA, it is as if
the game began at an ex ante stage at which all information was commonly known.
In this way, the CPA ensures that all differences in players beliefs have explicitly
modeled sources.
3. While the CPA is a good baseline assumption, it may not be appropriate in settings
where irreconcilable differences in agents beliefs are importantfor instance, in
models of speculative trade. See Morris (1994, 1995).
3.3 Examples
The next two examples illustrate the idea of contagion, which refers to iterated dominance
arguments that arise in certain Bayesian games with chains of correlation in the common
prior distribution. These examples illustrate the important role that higher-order beliefs
can play in equilibrium analysis. Less stylized versions of these examples have been used
to model bank runs and currency crisessee Morris and Shin (2003).
Example 3.3. The Electronic Mail Game (Rubinstein (1989)).
In this game, there is probability 23 that the payoff matrix is GL , and probability 13 that the
payoff matrix is GR . In GL , A is a strictly dominant strategy. GR is a coordination game in
which the players want to coordinate on action B. If they coordinate on the wrong action,
both get 0; if they miscoordinate, the player who chose B gets punished.
GL :
GR :
2
A
B
A
2, 2
3, 0
B
0, 3
1, 1
2
A
B
A
0, 0
3, 0
B
0, 3
2, 2
Then the probability that no messages are sent is 0 = 23 , and the probability that m > 0
messages are sent is m = 13 (1 )m1 .
The type sets are T1 = { {0} , {1, 2} , {3, 4} , . . .} and T2 = { {0, 1} , {2, 3} , {4, 5} , . . .}.
|{z} |{z} |{z}
|{z} |{z} |{z}
sends 0 sends 1 sends 2
{0}
sends 1 {1, 2}
sends 2 {3, 4}
sends 3 {5, 6}
sends 0
t1
sends 0
sends 1
sends 2
sends 3
{0, 1}
{2, 3}
0
1
(1 )
3
1
(1 )2
3
0
..
.
{4, 5}
0
0
1
(1 )3
3
1
(1 )4
3
..
.
{6, 7}
0
0
0
1
(1 )5
3
..
.
2
3
1
0
0
..
.
...
...
...
...
..
.
Proposition 3.4. If > 0, the unique Nash equilibrium has both players play A regardless of type.
If > 0 is small, and the payoff matrix turns out to be GR , it is very likely that both players
know that the payoff matrix is GR . But the players still play A, even though (B, B) is a strict
equilibrium in GR .
Proof. If the payoff matrix is GL , player 1 knows this, and plays A, which is dominant for
him in this matrix.
Suppose that player 2s type is t2 = {0, 1}.
Then her posterior probability that no messages were sent is
0
=
0 + 1
2
3
2
3
+ 13
2
2
> .
2+ 3
Since player 1 plays A when no messages are sent, player 2s expected payoff to playing A is
more than 23 2 = 43 , while her expected payoff to choosing B is less than 23 (3)+ 31 2 = 43 .
Therefore, when her type is t2 = {0, 1}, player 2 should choose A.
Now suppose that player 1s type is t1 = {1, 2}.
In this case, player 1 knows that the payoff matrix is GR , and so that his payoff from
playing A is 0.
128
1
1
1
3
= 1
=
> .
1
1
1 + 2
+ 3 (1 3 ) 1 + (1 ) 2
3
1
2
(3) + 12 2 = 12 .
I
N
I
N
r, r
r 1, 0
0, r 1
0, 0
In this game, strategy I represents investing, and strategy N represents not investing.
Investing yields a payoff of r or r 1 according to whether the players opponent invests
or not. Not investing yields a certain payoff of 0. If r < 0, then the unique NE is
(N, N); if r = 0, the NE are (N, N) and (I, I); if r (0, 1), the NE are (N, N), (I, I), and
((1 r)I + rN, (1 r)I + rN); if r = 1, the NE are (N, N) and (I, I); and if r > 1, then the unique
NE is (I, I).
Now consider a Bayesian game BG in which payoffs are given by the above payoff matrix,
but in which the value of r is the realization of a random variable that is uniformly
distributed on [2, 3]. In addition, each player i only observes a noisy signal ti about the
value of r. Specifically, ti is defined by ti = r + i , where i is uniformly distributed on
1 1
, 10 ], and r, 1 , and 2 are independent of one another. This construction is known as
[ 10
a global game; see Carlsson and van Damme (1993) and Morris and Shin (2003).
Pure strategies for player i are of the form si : [ 21
, 31 ] {I, N}. To define a pure Nash
10 10
equilibrium, let ui (a, r) be player is payoff under action profile a under payoff parameter
r (which we can read from the payoff matrix). Then define the payoff to a player i of type
ti for playing action ai against an opponent playing strategy s j as
Z
Ui (ai , s j , ti ) =
ui (ai , s j (t j )), r di (t j , r | ti ),
tj, r
129
where i (|ti ) represents is beliefs about (t j , r) when he is of type ti . Then for s to be a pure
Nash equilibrium, it must be that
si (ti ) argmax U(ai , s j , ti )
ai
1
20
1 (s2 (t2 ) = N | t1 =
1
)
20
1
20
9
32
< 0.
1
,
20
In a similar fashion, we can show show that in any Nash equilibrium, if the value of player
1
is signal is less than 10
, then player i strictly prefers not to invest. Again we focus on
1
the most demanding case, in which t = 10
. In this case player 1s beliefs about player 2s
1
3
type are centered at 10 , and so have support [ 101 , 10
]. But we know from the previous
130
paragraph that player 2 will play N whenever t2 < 201 , so essentially the same calculation
as before shows that player 1 again must assign a probability of at least 329 to player 1
1
playing N. Thus, since U1 (I, s2 , 10
) 201 329 < 0, player 1s strict best response when t1 101
is to play N.
Rather than iterate this argument further, let be the supremum of the set of types that
can be shown by such an iterative argument to play N in any equilibrium. If were less
than 12 , then since a player whose type is less than will play N, a player whose type is
obtains an expected payoff of 21 < 0 from playing N. By continuity, this is also true of a
player whose type is slightly larger than , contradicting the definition of . We therefore
conclude that is at least 21 . This establishes that player i strictly prefers not to invest
when his signal is less than 12 . A symmetric argument shows that player i strictly prefers
to invest when his signal is above 21 .
In these examples, the introduction of a small amount of higher-order uncertainty to
a baseline game with multiple equilibria leads to the selection of a unique equilibrium.
This approach can be used as the basis for selecting among multiple equilibria. But it is
important to be aware that which equilibrium is selected is sensitive to the exact form
that the higher-order uncertainty takes. Indeed, Weinstein and Yildiz (2007) show that
every rationalizable strategy profile in a baseline game is the unique rationalizable strategy
profile in some nearby game with higher-order uncertainty.
4. Repeated Games
In many applications, players face the same interaction repeatedly. How does this affect
our predictions of play?
Repeated games provide a general framework for studying long run relationships.
While we will focus on the basic theory, this subject becomes even more interesting when
informational asymmetries are added: hidden information reputation models (Kreps
et al. (1982)); hidden actions imperfect monitoring models (Abreu et al. (1990)). An
excellent general reference for this material is Mailath and Samuelson (2006).
131
2
C
C 1, 1
1
D 2, 1
D
1, 2
0, 0
H = {h }
S
H = Tt=0 Ht
si : H Ai
i : H Ai
HT+1
(0, 1]
i : HT+1 R
i (hT+1 ) =
T
P
t=0
t ui (at )
Proposition 4.2. In the unique subgame perfect equilibrium of GT , both players always defect.
Proof. By backward induction:
Once the final period T is reached, D is dominant for both players, and so is played
regardless of the previous history.
Therefore, choices in period T 1 cannot influence payoffs in period T.
Hence, backward induction implies that both players defect in period T 1.
Repeat through period 0.
Proposition 4.3. In any Nash equilibrium of GT , players always defect on the equilibrium path.
Proof. Fix a Nash equilibrium .
Clearly, both players play D in period T at any node which is reached with positive
probability.
132
Therefore, since it cannot affect behavior in period T, the unique best response at any
positive probability period T 1 history is to play D.
Thus, since is a Nash equilibrium, players must also play D in period T 1.
Repeat through period 0.
Example 4.4. The infinitely repeated Prisoners Dilemma G .
S
t
H = n
t=0 H
o
H = (a0 , a1 , . . .) : at A
finite histories
infinite histories (note: H H = )
si : H Ai
i : H Ai
(0, 1)
i : H R
pure strategies
behavior strategies
discount rate
payoff function (defined on infinite histories)
i (h ) = (1 )
P
t=0
t ui (at )
The discount rate can be interpreted as the probability that the game ends in any given
period. But we dont need to be too literal about their being the possibility of infinite
repetition: what is important is that the players view the interaction as one with no clear
endsee Rubinstein (1991) for a discussion.
P
P t
1
1
t
Why the (1 )? Recall that
t=0 c = (1 ) 1 c = c.
t=0 = 1 (1 )
In G , (c, c, c, . . .) is worth c;
(0, c, c, . . .) is worth c;
(c, 0, 0, 0, . . .) is worth (1 )c.
Proposition 4.5. (i) Always defect is a subgame perfect equilibrium of G for all (0, 1).
(ii) If 1/2, the following defines a subgame perfect equilibrium of G : i = Cooperate so
long as no one has ever defected; otherwise defect, (the grim trigger strategy).
(There are many other equilibriasee Section 4.3.)
Q: How do we determine whether is a subgame perfect equilibrium of the infinitely
repeated game? There are a few obvious difficulties: there are no last subgames; after
each history there are an infinite number of possible deviations to consider; and there are
an infinite number of histories to consider.
The one-shot deviation theorem for infinite-horizon sequential choice problems with discounting (Theorem 4.7) states that a player has no profitable deviation in any subgame
133
t=0
t=1
t=2
t=0
t=1
t=2
t=1
t=1
t
Deviation: (C, D), (D, D), (D, D), . . . (1 ) 1 +
0 = 1 < 0
t=0
t=1
t=2
t=1
C
2
C
1
C
2
C
C
D C
1
D C
C
C
1
D C
1
DC
C
D C
C
D C
C
D C
C
D
C
2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2
C D C D C D C D C D C D C D C D C D C D C D C D C D C D C D C DC D C D C D C D C D C D C D C D C D C D C D C D C D C D C D C D
The (partial) game tree above is that of the repeated Prisoners Dilemma. The bold
edges represent the grim trigger strategy profile. If this strategy profile is played,
then play proceeds down the leftmost branch of the tree, and both players cooperate
in every period. If we modify one players strategy so that after a single cooperative
history he plays D rather than C, then play enters a subgame in which both players
defect in every period. Thus a deviation from the strategys prescription after a
single history can alter what actions are played in all subsequent periods.
This figure also shows the partition of histories into the two cases above. The
decision nodes of player 1 are the initial nodes of subgames. The leftmost subgames
follow histories in which no one has defected (including the null history). All other
subgames follow histories in which someone has defected.
3. The difference between the equilibrium outcomes of the finitely repeated and infinitely repeated Prisoners Dilemmas is quite stark. With many other stage games,
this difference is not so stark. If the stage game G has multiple Nash equilibrium
outcomes, one can often sustain the play of non-Nash action profiles of G in early
periodsfor instance, by rewarding cooperative play in early periods with the
play of good Nash outcomes of G in later periods, and by punishing deviations in
early periods with bad Nash outcomes in later periods. For general analyses of the
finitely repeated games, see Benot and Krishna (1985) and Friedman (1985).
135
a pure action
a mixed action
a mixed action profile
S
H = Ht
H = {(a0 , a1 , . . .) : at A}
Si = {si : H Ai }
i = {i : H Ai }
(0, 1)
i : H R
P
i (h ) = (1 ) t ui (at )
t=0
t=0
We call i a one-shot deviation from i if it only differs from i in the period immediately
following a single history, say ht . The one-shot deviation i is profitable if it generates a
higher payoff than i in the subgame starting with history ht .
Theorem 4.6. Let G be a repeated game. Then the following are equivalent:
(i) Strategy profile is a subgame perfect equilibrium.
(ii) Strategy profile is sequentially rational.
(iii) No player has a profitable one-shot deviation from .
Given the structure of repeated games, the equivalence of statements (i) and (ii) is immediate. The equivalence of (ii) and (iii) follows from the one-shot deviation theorem for
infinite-horizon sequential decision problems with discounting, which we present next.
The one-shot deviation theorem
As in the case of simple sequential decision problems (Theorem 2.26) we have
Theorem 4.7 (The one-shot deviation theorem). In the repeated game G , i is sequentially
rational given i if and only if there is no profitable one-shot deviation from i .
Idea of the proof : We know from Theorem 2.26 that the absence of profitable finite-period
deviations implies the absence of profitable one-shot deviations. Moreover, in a repeated
game with discounting, the total payoff impact of choices from period T onward is at most
P
t
T
(1 )
t=T K = K , which approaches 0 as T grows large. Thus, if there is a profitable
infinite period deviation, then some of the benefit must have been obtained in some
finite number of periods, which in turn implies the existence of a profitable one-period
deviation.
Example 4.8. Without discounting, or more generally, without some notion of continuity
of payoffs at infinity (see Definition 4.1 of Fudenberg and Tirole (1991)), the one-shot deviation theorem does not hold. For example, suppose that an agent must choose an infinite
sequence of Ls and Rs; his payoff is 1 if he always chooses R and is 0 otherwise. Consider
the strategy always choose L. While there is no profitable finite-period deviation from
this strategy, there is obviously a profitable infinite-period deviation.
Example 4.9. Suppose that two players repeatedly play the following normal form game:
T
M
B
T
3, 3
1, 2
4, 2
2
M
2, 1
1, 1
3, 1
137
B
2, 4
1, 3
0, 0
min
max
Q
i j,i A j i Ai
ui (i , i ).
Thus vi is the payoff obtained when his opponents minmax him and he, anticipating what
they will do, plays a best response (see Section 1.6). This leads to a lower bound on what
138
139
Clearly, generates payoff vector v in G () for any (0, 1). To verify that is a subgame
perfect equilibrium of for large enough, we check that no player has a profitable one-shot
deviation.
There are 1 + n cases to consider. On the equilibrium path (i.e., if there has never been a
unilateral deviation), i does not benefit from deviating if
vi (1 )vi + ui (i ).
Since vi > ui (i ), this is true if is large enough.
After a history in which player i was the first to unilaterally deviate, the continuation
strategy profile is always play the Nash equilibrium i . Clearly, no player has a profitable
one-shot deviation here.
Example 4.13. Stick and carrot strategies (Fudenberg and Maskin (1986)).
We now prove the folk theorem under the assumptions that
can be obtained from some pure action profile a A.
(1) The payoff vector v = u(a)
(2) There are exactly two players.
(3) For each player i, vi is greater than player is pure strategy minmax value,
p
Let am
be player is minmaxing pure action, and consider the following stick and carrot
i
strategy:
i : (I) Play ai initially, or if a was played last period.
(II) If there is a deviation from (I), play am
L times and then restart (I).
i
(III) If there is a deviation from (II), begin (II) again.
The value of the punishment length L 1 will be determined below; often L = 1 is enough.
Again, generates payoff vector v in G () for any (0, 1). To verify that is a subgame
perfect equilibrium of for large enough, we check that no player has a profitable one-shot
deviation.
Let vi = maxa ui (a), and let vm
= ui (am ), where am = (am
, am
) is the joint minmaxing pure
2
1
i
action profile. Then
(9)
i .
vm
i vi < vi v
140
We can therefore choose a positive integer L such that for i {1, 2},
i vi ,
L (vi vm
i ) > v
or equivalently,
(L + 1) vi > vi + Lvm
i .
(10)
(In words: if player i were perfectly patient, he would prefer getting vi for L + 1 periods
to getting his maximum payoff vi once followed by his joint minmax payoff vm
L times.)
i
There is no profitable one-shot deviation from the equilibrium phase (I) if
vi = (1 )
X
t=0
L
X
X
X
t vm
+
v
t vi (1 ) vi +
i
i
t=1
t vi vi +
t=0
L
X
t=L+1
t vm
i .
t=1
Equation (10) implies that this inequality holds when is close enough to 1.
In the punishment phase (II), deviating is most tempting in the initial period, when all L
rounds of punishment still remain. Deviating in this period is not profitable if
L1
X
X
X
p X
(1 )
t vm
t vi (1 ) vi +
t vm
t vi
i +
i +
t=0
t=L
t=1
+ vi
vm
i
L vi + (1 L )vm
i
p
vi +
p
vi .
t=L+1
L vm
i
Equation (9) implies that this inequality holds when is close enough to 1.
Why is i called a stick and carrot strategy? The punishment phase (II) is the stick (i.e.,
the threat) that keeps players from deviating from the equilibrium path phase (I). The
equilibrium path phase (I) is the carrot (i.e., the reward) offered to players for carrying out
the punishment phase (II).
Discussion
1.
Why should one care about being able to support low equilibrium payoffs?
In a subgame perfect equilibrium, behavior in every subgame, including subgames corresponding to punishments, must constitute an equilibrium. Therefore, in order to support
high equilibrium payoffs at a fixed discount rate (0, 1) (a question that the folk theorem does not address, but that we consider in Sections 4.4 and 4.5), we need to have
punishments that are as severe as possible. The next example illustrates this point.
141
Example 4.14. Using stick and carrot strategies. Consider this stage game G:
2
a
b
c
A 1, 2 5, 1 1, 0
1 B 2, 1 4, 4 0, 0
C 0, 1 0, 0 0, 0
The unique Nash equilibrium of G is ( 34 A + 41 B, 12 a + 12 b), which yields payoffs of (3, 47 ).
For what discount rates can we support a play path of (B, B), (B, B), . . . in a subgame
perfect equilibrium of G ()?
With Nash reversion: There is no profitable deviation from the off-equilibrium path. On
the equilibrium path, player 2 gets optimal payoffs at (B, b) and will not deviate. For
player 1, we need
4 (1 )5 + 3
4 5 2
12 .
What about with stick and carrot strategies? Consider
: (I) Play (B, b) so long as no one has deviated.
(II) If someone deviates, play (C, c), (B, b), (B, b), . . .
(III) If someone deviates from (II), begin (II) again.
When is this a subgame perfect equilibrium? Check using the one-shot deviation theorem.
Case (I):
carrot strategy, the punishment profile (stage (II)) yields payoffs of (4, 4). Thus when
= 41 , the punishment profile of the stick and carrot strategy is considerably worse for
both players than the punishment profile under Nash reversion. This low payoff on the
punishment path allows us to support high subgame perfect equilibrium payoffs at a low
discount rate.
2.
How can one support payoff vectors that do not correspond to pure action profiles?
If we do not modify the repeated game, payoff vectors that do not correspond to pure
strategy profiles can only be achieved if players alternate among pure action profiles
over time. Sorin (1986) shows that any feasible payoff vector can be achieved through
alternation if players are sufficiently patient. However, the need to alternate complicates
the construction of equilibrium, since we must ensure that no player has a profitable
deviation at any point during the alternation. Fudenberg and Maskin (1991) show that
this can be accomplished so long as players are sufficiently patient.
To avoid alternation and the complications it brings, it is common to augment the repeated
game by introducing public randomization: at the beginning of each period, all players view
the realization of a uniform(0, 1) random variable, enabling them to play a correlated action
in every period. If on the equilibrium path players always play a correlated action whose
expected payoff is v, then each players continuation payoff is always exactly v. Since in
addition the benefit obtained in a the current period from a one shot deviation is bounded
(by maxa ui (a) mina ui (a)), the equilibrium constructions and analyses from Examples
4.12 and 4.13 go through with very minor changes.
It is natural to ask whether public randomization introduces equilibrium outcomes that
would otherwise be impossible. Since the folk theorem holds without public randomization, we know that for each payoff vector v F , there is a (v) such that v can be achieved
in a subgame perfect equilibrium of G () whenever > (v). Furthermore, Fudenberg
et al. (1994) show that any given convex, compact set in the interior of F contains only
subgame perfect equilibrium payoff vectors of G () once is large enough. However,
Yamamoto (2010) constructs an example in which the set of subgame perfect equilibrium
payoff vectors of G () is not convex (and in particular excludes certain points just inside
the Pareto frontier) for any < 1; thus, allowing public randomization is not entirely
without loss of generality even for discount factors arbitrarily close to 1.
Having discussed how one obtains payoff vectors that do not correspond to pure strategy
profiles, we now present an example that highlights the differences between the sets of
payoff vectors that the various equilibrium constructions can sustain.
Example 4.15. Consider the symmetric normal form game G:
A
B
C
a
0, 0
2, 4
0, 1
2
b
4, 2
0, 0
0, 0
143
c
1, 0
0, 0
0, 0
The Nash equilibria are (A, b), (B, a), and ( 32 A + 13 B, 23 a + 13 b), and yield payoffs of (4, 2),
(2, 4), and ( 43 , 43 ). The strategies C and c are strictly dominated, but are the pure minmax
p p
strategies; they generate the pure minmax payoffs (v1 , v2 ) = (1, 1). The mixed minmax
strategies are 31 A + 23 C and 31 a + 32 b; they generate the mixed minmax payoffs (v1 , v2 ) = ( 32 , 32 ).
(The latter two calculations are illustrated below at left.)
u2
4
u2(a)
B
u2(b)
2
A
B [4]
(2/3)A+(1/3)B [4/3]
A [2]
[2/3]
(1/3)A+(2/3)C
[1]
u1
Here the stick and carrot strategies using pure minmax punishments from Example 4.13
support more equilibrium payoff vectors than the Nash reversion strategies from Example
4.12, but do not support all of the payoff vectors guaranteed by the folk theorem.
3.
How can one obtain payoffs close to the players mixed minmax values?
The the stick and carrot equilibrium from Example 4.13 relied on pure minmax actions
as punishments. As Example 4.15 illustrates, such punishments are not always strong
enough to sustain all vectors in F as subgame perfect equilibrium payoffs.
The difficulty with using mixed action punishments is that when a player chooses a mixed
action, his opponents cannot observe his randomization probabilities, but only his realized
pure action. Suppose we modified the stick and carrot strategy profile from Example 4.13
by specifying that player i play his mixed minmax action i in the punishment phase.
Then during this punishment phase, unless player i expects to get the same stage game
payoff from each action in the support of i , he will have a profitable and undetectable
deviation from i to his favorite action in the support of i .
To address this problem, we need to modify the repeated game strategies so that when
player i plays one of his less preferred actions in the support of i , he is rewarded with a
144
higher continuation payoff. More precisely, by carefully balancing player is stage game
payoffs and continuation payoffs, one can make player i indifferent among playing any
of the actions in the support of i , and therefore willing to randomize among them in the
way his mixed minmax action specifies. See Mailath and Samuelson (2006, Sec. 3.8) for a
textbook treatment.
4.
What new issues arise when there are three or more players?
A1
B1
A2
1, 1, 1
0, 0, 0
B2
0, 0, 0
0, 0, 0
A1
B1
2
A2
B2
0, 0, 0 0, 0, 0
0, 0, 0 1, 1, 1
3: A3
3: B3
To obtain a pure minmax folk theorem for games with three or more players satisfying the
NEU condition using an analogue of the stick and carrot strategy profile from Example
4.13, one first replaces the single punishment stage with distinct punishments for each
player (as in Example 4.12). To provide incentives to player is opponents to carry out a
punishment of player i, one must reward the punishers once the punishment is over. See
Mailath and Samuelson (2006, Sec. 3.4.2) for a textbook treatment.
state space
feasibility correspondence
(x) = states feasible tomorrow if todays state is x
feasibility correspondence for sequences
(x) = {{xt }
t=0 : x0 = x, xt+1 (xt ) t 0}
payoff function. F(x, y) = agents payoff today if
todays state is x and tomorrows state is y
discount rate.
Example 4.17. If xt = capital at time t and ct = f (xt ) xt+1 = consumption at time t, then the
payoff at time t is u(ct ) = u( f (xt ) xt+1 ) = F(xt , xt+1 ).
Assumptions
(i) X is convex or finite.
(ii) is non-empty, compact valued, and continuous.
146
v(x) = max (1 )
{xt }(x)
t F(xt , xt+1 ).
t=0
If w: X R describes the continuation values tomorrow, then Tw(x) is the (optimal) value
of being at x today. Notice that Tw B(X).
Observation 4.19. v is a fixed point of T (v(x) = Tv(x) for all x) if and only if v solves (F).
Theorem 4.20 (The Method of Successive Approximations).
(i) T(C(X)) C(X).
k w w k.
(ii) kTw Twk
(iii) T admits a unique fixed point v, and lim Tk w = v for all w C(X).
k
Part (iii) of the theorem provides an iterative method of solving the functional equation
(F). Once parts (i) and (ii) are established, part (iii) follows from the Banach (Contraction
Mapping) Fixed Point Theorem.
4.4.2 Dynamic programs vs. repeated games
Repeated games differ from dynamic programs in two basic respects:
(i)
There is no state variable (available choices and their consequences do not depend
on past choices).
(ii) There are multiple agents.
The analogue of the value function is a subgame perfect equilibrium payoff vector.
But unlike the value function for (D), the subgame perfect equilibrium payoff vector for
G () is typically not unique.
Nevertheless, Abreu et al. (1990) show that there is an analogous iterative method of
determining the set of subgame perfect equilibrium payoff vectors of G ():
Introduce an operator B on {sets of payoff vectors}= {W Rn }
(compare to the Bellman operator T, which acts on {value functions}={w : X R}.)
(ii) Observe that the set of subgame perfect equilibrium payoffs V is a fixed point of B.
(iii) Establish properties of B (self-generation, monotonicity, compactness) which imply
that iteration of B from a well-chosen initial condition W0 leads to V.
(i)
148
We want to factor elements of V using pairs (a, c), where a A is the initial action profile
and c : A Rn is a continuation value function.
The value of pair (a, c) is the vector v(a, c) = (1 )u(a) + c(a) Rn .
Which pairs which satisfy period 0 incentive constraints?
We say that action a is enforceable by continuation value function c if
vi (a, c) vi ((a0i , ai ), c)
Theorem 4.23.
T
k=0
In this theorem, Wk is the set of equilibrium payoff vectors of games of the following
form: k rounds of the stage game G, followed by (history-dependent) continuation values
drawn from W0 (mnemonic: Wk = Bk (W0 ) comes k periods before W0 ). As k grows larger,
the time before the continuation payoffs from W0 appear is put off further and further into
the future.
Remarks:
1. All of these results can be generalized to repeated games with imperfect public
monitoringsee Abreu et al. (1990).
2. Using these results, one can prove that the set V of (normalized) subgame perfect
equilibrium payoffs is monotone in the discount factor (0, 1). In other words,
increasing players patience increases the set of equilibrium outcomes.
Example 4.24. Consider an infinite repetition of the Prisoners Dilemma below. What is
the set of subgame perfect equilibrium payoffs when = 43 ?
2
C
C 1, 1
1
D 2, 1
D
1, 2
0, 0
HC,CL
W0
HD,DL
HD,CL
To begin, we compute the set W1 = B(W0 ). For each action profile a, we determine the set
W1a of payoff vectors that can be enforceably factored by some pair (a, c) into W0 (meaning
that c : A W0 ). Then W1 is the union of these sets.
150
First consider a = (D, D). For this action profile to be enforceable, neither player can prefer
to deviate to C. Player 1 does not prefer to deviate if
(1 )u1 (D, D) + c1 (D, D) (1 )u1 (C, D) + c1 (C, D)
14 0 + 34 c1 (D, D) 14 (1) + 34 c1 (C, D)
c1 (D, D) 13 + c1 (C, D).
Similarly, player 2 prefers not to deviate if
c2 (D, D) 13 + c2 (D, C).
These inequalities show that if c(D, D) = c(C, D) = c(D, C), the pair (a, c) will be enforceable.
(This makes sense: one does not need to promise future rewards to make players choose
a dominant action.) Thus, for any w W0 , we can enforce action profile (D, D) using a
continuation value function c with c(D, D) = w.
The value for the pair ((D, D), c) with c(D, D) = w is
(1 - δ)u(D, D) + δc(D, D) = (1/4)(0, 0) + (3/4)w.
In the figure below, the full shaded area is W0; the smaller shaded area is W1^DD = (1/4)(0, 0) + (3/4)W0, the set of payoff vectors that can be enforceably factored by some pair ((D, D), c) into W0.
[Figure: W0 with the subset W1^DD = (1/4)(0, 0) + (3/4)W0 shaded.]
Now consider a = (C, C). In this case, the enforceability constraints are
(1 - δ)u1(C, C) + δc1(C, C) ≥ (1 - δ)u1(D, C) + δc1(D, C)
⇔ (1/4)·1 + (3/4)c1(C, C) ≥ (1/4)·2 + (3/4)c1(D, C)
⇔ c1(C, C) ≥ 1/3 + c1(D, C), and, symmetrically,
c2(C, C) ≥ 1/3 + c2(C, D).
These calculations show that for c(C, C) ∈ W0 to be part of an enforceable factoring of ((C, C), c) into W0, there must be a point in W0 that is 1/3 units to the left of c(C, C) (to punish player 1 if he deviates), as well as a point in W0 that is 1/3 units below c(C, C) (to punish player 2 if she deviates). Since (0, 0) is both the leftmost and the lowest point in W0, it follows that for any w ∈ {w ∈ W0 : w1, w2 ≥ 1/3}, we can enforce (C, C) using a c with c(C, C) = w. The value for the pair ((C, C), c) with c(C, C) = w is
(1 - δ)u(C, C) + δc(C, C) = (1/4)(1, 1) + (3/4)w.
In the figure below, the full shaded area is {w ∈ W0 : w1, w2 ≥ 1/3}, the set of allowable values for c(C, C); the smaller shaded area is the set W1^CC = (1/4)(1, 1) + (3/4){w ∈ W0 : w1, w2 ≥ 1/3}.
[Figure: the set {w ∈ W0 : w1, w2 ≥ 1/3} with the subset W1^CC shaded.]
Next consider a = (C, D). In this case the only enforceability constraint is
c1(C, D) ≥ 1/3 + c1(D, D).
That is, we only need to provide incentives for player 1. Reasoning as above, we find that for any w ∈ {w ∈ W0 : w1 ≥ 1/3}, we can enforce (C, D) using a c with c(C, D) = w. The value for the pair ((C, D), c) with c(C, D) = w is
(1 - δ)u(C, D) + δc(C, D) = (1/4)(-1, 2) + (3/4)w.
In the figure below at left, the larger shaded area is {w ∈ W0 : w1 ≥ 1/3}; the smaller shaded area is the set W1^CD = (1/4)(-1, 2) + (3/4){w ∈ W0 : w1 ≥ 1/3}. The figure below at right shows the construction of W1^DC, which is entirely symmetric.
[Figures: at left, the construction of W1^CD; at right, the symmetric construction of W1^DC.]
Taking the union of the four sets above yields W1 = W0.
[Figure: W1 = W1^CC ∪ W1^CD ∪ W1^DC ∪ W1^DD = W0.]
Repeating the argument above shows that Wk+1 = Wk = … = W0 for all k, implying that V = W0. In other words, all feasible, weakly individually rational payoffs are achievable in subgame perfect equilibrium when δ = 3/4.
This example was especially simple. In general, each iteration W0 ↦ W1 ↦ … ↦ Wk ↦ … eliminates additional payoff vectors, and V is only obtained in the limit.
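These computations are easy to automate. The Python sketch below iterates a discretized version of B on a grid approximation of W0 for this stage game; the grid, the rounding of values back onto the grid, and the use of the coordinatewise-worst points of W as punishments are simplifications, so the output only approximates the sets Wk. With delta = 0.75 the grid set is (approximately) unchanged by B, as in this example; with delta = 0.5 it shrinks, as in the next one.

    from itertools import product

    # Stage game: actions C = 0, D = 1; u[a] = (u1, u2).
    u = {(0, 0): (1.0, 1.0), (0, 1): (-1.0, 2.0),
         (1, 0): (2.0, -1.0), (1, 1): (0.0, 0.0)}

    def best_deviation(a, i):
        # Player i's best one-shot stage payoff at profile a.
        return max(u[(ai, a[1]) if i == 0 else (a[0], ai)][i] for ai in (0, 1))

    def B(W, delta, step):
        # Approximate APS operator: enforce a with continuation w = c(a) in W,
        # punishing a deviation by player i with the worst point of W in coordinate i.
        worst = (min(w[0] for w in W), min(w[1] for w in W))
        out = set()
        for a in product((0, 1), repeat=2):
            for w in W:
                if all((1 - delta) * u[a][i] + delta * w[i]
                       >= (1 - delta) * best_deviation(a, i) + delta * worst[i]
                       for i in (0, 1)):
                    out.add(tuple(round(((1 - delta) * u[a][i] + delta * w[i]) / step) * step
                                  for i in (0, 1)))
        return out

    step = 0.05
    pts = [k * step for k in range(int(2 / step) + 1)]
    W = {(x, y) for x in pts for y in pts if x + 2 * y <= 3 and 2 * x + y <= 3}
    for k in range(5):
        W = B(W, delta=0.75, step=step)
        print(k + 1, len(W))   # roughly constant at delta = 0.75; try delta = 0.5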
Example 4.25. Consider the same Prisoner's Dilemma stage game as in the previous example, but suppose that δ = 1/2. The sets W0, …, W5 are shown below.
[Figures: the sets W0, W1, W2, W3, W4, W5, each obtained from its predecessor by applying B; the sets shrink toward the limit set.]
Taking the limit, we find that the set of subgame perfect equilibrium payoffs is
V = ⋃_{k=0}^∞ {(1/2^k, 1/2^k), (1/2^k, 0), (0, 1/2^k)} ∪ {(0, 0)}.
[Figure: the limit set W* = V.]
The payoff (1, 1) is obtained in a subgame perfect equilibrium from the grim trigger strategy profile with equilibrium path (C, C), (C, C), …. The payoff (1, 0) is obtained from the one with equilibrium path (D, C), (C, D), (D, C), (C, D), …; payoff (0, 1) is obtained by reversing roles. Payoff (1/2^k, 1/2^k) is obtained by playing (D, D) for k periods before beginning the cooperative phase; similarly for (1/2^k, 0) and (0, 1/2^k). Finally, the always-defect strategy profile yields payoff (0, 0).
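These payoffs are easy to verify numerically. The sketch below computes normalized discounted payoffs (1 - δ) Σ_t δ^t u(a^t) for long but finite truncations of the paths just described; the horizon T = 60 is an arbitrary choice that makes the geometric tails negligible.

    delta = 0.5
    u = {('C', 'C'): (1, 1), ('C', 'D'): (-1, 2),
         ('D', 'C'): (2, -1), ('D', 'D'): (0, 0)}

    def normalized_payoff(path):
        # (1 - delta) * sum_t delta^t u(a_t) for a finite path (a_0, ..., a_{T-1}).
        return tuple((1 - delta) * sum(delta ** t * u[a][i]
                                       for t, a in enumerate(path))
                     for i in (0, 1))

    T = 60
    print(normalized_payoff([('C', 'C')] * T))                      # ~ (1, 1)
    print(normalized_payoff([('D', 'C'), ('C', 'D')] * (T // 2)))   # ~ (1, 0)
    k = 3
    print(normalized_payoff([('D', 'D')] * k + [('C', 'C')] * T))   # ~ (1/8, 1/8)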
Proofs of the theorems
Once one understands why Theorems 4.21 and 4.22 are true, the proofs are basically
bookkeeping. The proof of Theorem 4.23, which calls on the previous two theorems,
requires more work, but it can be explained quickly if the technicalities are omitted. This
is what we do below.
Proof of Theorem 4.21: W bounded and W ⊆ B(W) imply W ⊆ V.
Example 4.26. That W is bounded ensures that we can't put off actually receiving our payoffs forever. To see this, suppose that W = R+ and δ = 1/2. Then we can decompose 1 ∈ W as 1 = (1/2)·0 + (1/2)·2. And we can decompose 2 ∈ W as 2 = (1/2)·0 + (1/2)·4. And we can decompose 4 ∈ W as 4 = (1/2)·0 + (1/2)·8 … And the payoff 1 is never obtained.
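A three-line illustration of this divergence (the ten-period horizon is arbitrary): with stage payoff 0 each period, supporting the payoff w today requires promising w/δ tomorrow, so the required continuation value doubles without bound.

    w, delta = 1.0, 0.5
    for t in range(10):
        w = w / delta      # w = (1 - delta) * 0 + delta * w'  =>  w' = w / delta
        print(t + 1, w)    # 2, 4, 8, ...: the promised payoff is never delivered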
The proof: Let w0 ∈ W. We want to construct a SPE with payoffs w0.
Since w0 ∈ W ⊆ B(W), it is enforceably factored by some (a0, c0) into W: a0 is the period 0 action profile in our SPE, and for each a ∈ A, c0(a) ∈ W is the payoff vector to be obtained from period 1 on after h1 = {a}.
Now consider the period 1 history h1 = {a}. Let w1 = c0(a). Then w1 ∈ W ⊆ B(W). That is, w1 is enforceably factored by some (a1^a, c1^a) into W: a1^a is the period 1 action profile occurring after h1 = {a} in our SPE, and for each â ∈ A, c1^a(â) ∈ W is the payoff vector to be obtained from period 2 on after h2 = {a, â}.
…
Continuing in this way defines a complete strategy profile with payoff vector w0; since W is bounded and each factoring is enforceable, the one-shot deviation principle implies that this profile is a SPE.
Proof of Theorem 4.22 (the direction V ⊆ B(V)): Let w ∈ V, let s be a SPE with payoff vector π(s) = w, and let a^0 be the period 0 action profile that s prescribes.
For each a ∈ A, let c(a) = π(s|a), where s|a is the strategy profile starting in period 1 if action profile a is played in period 0. Then:
(i) v_i(a^0, c) ≥ v_i((a_i, a^0_{-i}), c) for all a_i ∈ A_i and i ∈ P, and so a^0 is enforceable by c.
(ii) v(a^0, c) = (1 - δ)u(a^0) + δc(a^0) = π(s) = w.
(iii) Since s is a SPE, so is s|a for every action profile a, and thus c(a) ∈ V for all a ∈ A. (In words: continuation values are in V.)
In conclusion, w is enforceably factored by (a^0, c) into V.
(II): ⋂_{k=0}^∞ Wk = V.
Let H* = (H∞)^n denote the set of vectors of punishment paths. For each pair (h0, h*) ∈ H∞ × H*, consisting of an initial path h0 and punishment paths h* = (h1, …, hn), define the simple strategy profile s(h0, h*) ∈ S as follows:
(i) Play h0 until a unilateral deviation occurs;
(ii) Play hi if player i unilaterally deviated from the most recently specified history.
Example 4.27. Representing simple strategy profiles as automata.
[Figure: an automaton with states h0, h1, and h2; from each state, an arrow labeled i leads to state hi.]
Here, h0 is the initial path and h1 and h2 are punishment paths. Labels on the arrows
represent unilateral deviators.
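A hedged Python sketch of this automaton representation follows. States are the paths h0, h1, …, hn; a unilateral deviation by player i moves the automaton to the start of path hi, while conforming play (or a simultaneous deviation) advances along the current path. The encoding of paths as finite lists whose last profile repeats forever is an illustrative assumption.

    class SimpleStrategyProfile:
        """Automaton for the simple strategy profile s(h0, (h1, ..., hn))."""

        def __init__(self, paths):
            self.paths = paths          # paths[0] = h0; paths[i] = hi for i >= 1
            self.state, self.t = 0, 0   # current path and position along it

        def prescribed(self):
            path = self.paths[self.state]
            return path[min(self.t, len(path) - 1)]   # final profile repeats forever

        def observe(self, played):
            target = self.prescribed()
            deviators = [i for i, ai in enumerate(played) if ai != target[i]]
            if len(deviators) == 1:                        # unilateral deviation by i...
                self.state, self.t = deviators[0] + 1, 0   # ...restart punishment path hi
            else:                                          # conform or simultaneous deviation
                self.t += 1

    # Grim-trigger-like example for the Prisoner's Dilemma (hypothetical paths):
    s = SimpleStrategyProfile([[('C', 'C')], [('D', 'D')], [('D', 'D')]])
    s.observe(('C', 'D'))      # player 2 (index 1) unilaterally deviates...
    print(s.prescribed())      # ...so play moves to punishment path h2: ('D', 'D')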
A vector s* = (s*1, …, s*n) ∈ S^n of n strategy profiles is called a penal code.
A penal code s* = s*(h*) is simple if there is a vector of histories h* = (h1, …, hn) ∈ H* such that s*i = s(hi, h*) for all i ∈ P. (In words: each strategy profile s*i from s* is generated by initial path hi and punishment paths from h*.)
[Figures: automata for s*1 and s*2. The automaton for s*i has initial state hi; from each state, an arrow labeled j leads to state hj.]
A penal code s* is optimal if each s*i is a subgame perfect equilibrium of G∞(δ) and π_i(s*i) = min {π_i(s) : s is a SPE} for all i ∈ P. (In words: each strategy profile s*i from s* is a subgame perfect equilibrium that generates i's worst subgame perfect equilibrium payoff.)
Theorem 4.28.
(i) There exists a vector of histories h* ∈ H* such that the simple penal code s*(h*) is optimal.
(ii) Let s ∈ S be a subgame perfect equilibrium of G∞(δ) with equilibrium path h ∈ H∞. Then the simple strategy profile s(h, h*) is a subgame perfect equilibrium with the same equilibrium path as s.
The theorem tells us that
(i) There is a simple penal code that simultaneously defines a worst-case subgame perfect equilibrium for every player.
(ii) Replacing off-path behavior in any subgame perfect equilibrium with the simple optimal penal code from (i) yields a new subgame perfect equilibrium with the same path of play.
Proof of the theorem:
For any history h = (a^0, a^1, …) ∈ H∞, let h^t = (a^t, a^{t+1}, …) ∈ H∞ denote the continuation history starting at time t.
Let s ∈ S be a subgame perfect equilibrium with play path h = (a^0, a^1, …) ∈ H∞. Fix j ∈ P, a_j ∈ A_j, and t ≥ 0, and let ĥ be the continuation path that would start in period t + 1 if player j were to deviate to a_j in period t.
One can show that
(11) π_j^w := min {π_j(s) : s is a SPE} is attained at some SPE s^{w_j}.
This is a key lemma, since it ensures that there is a worst SPE punishment for player j.
Since s is a SPE and by the definition of π_j^w,
(12) (1 - δ)u_j(a^t) + δπ_j(h^{t+1}) ≥ (1 - δ)u_j(a_j, a^t_{-j}) + δπ_j(ĥ) ≥ (1 - δ)u_j(a_j, a^t_{-j}) + δπ_j^w.
Here the first expression is j's equilibrium payoff and the second is his deviation payoff; the last inequality follows from replacing the continuation payoff π_j(ĥ) with the worst SPE payoff π_j^w.
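For the Prisoner's Dilemma of Example 4.24 with δ = 3/4, inequality (12) is easy to check numerically along the grim trigger path. The numbers below are an illustration under the assumption that the worst SPE payoff is π_j^w = 0 (the always-defect SPE payoff).

    delta = 0.75
    u = {('C', 'C'): (1, 1), ('C', 'D'): (-1, 2),
         ('D', 'C'): (2, -1), ('D', 'D'): (0, 0)}
    # Along h = (C,C), (C,C), ...: player 1's equilibrium payoff vs. deviation payoff,
    # with the post-deviation continuation replaced by the worst SPE payoff 0.
    eq_payoff  = (1 - delta) * u[('C', 'C')][0] + delta * 1.0   # continuation payoff = 1
    dev_payoff = (1 - delta) * u[('D', 'C')][0] + delta * 0.0   # worst SPE punishment
    print(eq_payoff >= dev_payoff)   # True: (12) holds along the grim trigger path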
(i) Define h* = (h1, …, hn) by letting hi be the play path of s^{w_i}, and write hi = (a_i^0, a_i^1, …) ∈ H∞. (Recall that hi is the path of a worst SPE for player i.)
We want to show that the simple penal code s* = s*(h*) is optimal.
By (11), s*i gives i his worst SPE payoff. So it is enough to show that each s*i = s(hi, h*) is a SPE.
Were this not the case, a profitable one-shot deviation would exist from some hi at some time t for some player j:
(1 - δ)u_j(a_i^t) + δπ_j(h_i^{t+1}) < (1 - δ)u_j(a_j, (a_i^t)_{-j}) + δπ_j^w.
But this contradicts (12), since hi = (a_i^0, a_i^1, …) is the play path of a SPE.
(ii) Let s ∈ S be a SPE with play path h, and consider the strategy profile s(h, h*). Since s is a SPE, (12) holds along the path h, so no one-shot deviations from s(h, h*) on the equilibrium path h are profitable. (Remember that s differs from s(h, h*) in that the latter's behavior off the path h is given by the optimal simple penal code from part (i); since each s*j is itself a SPE, deviations off the path are unprofitable as well.)
References
Abreu, D. (1988). On the theory of infinitely repeated games with discounting. Econometrica, 56:383–396.
Abreu, D., Dutta, P. K., and Smith, L. (1994). The folk theorem for repeated games: A NEU condition. Econometrica, 62:939–948.
Abreu, D., Pearce, D., and Stacchetti, E. (1990). Toward a theory of discounted repeated games with imperfect monitoring. Econometrica, 58:1041–1063.
Alós-Ferrer, C. and Kuzmics, C. (2013). Hidden symmetries and focal points. Journal of Economic Theory, 148:226–258.
Hendon, E., Jacobsen, H. J., and Sloth, B. (1996). The one-shot-deviation principle for sequential rationality. Games and Economic Behavior, 12:274–282.
Hillas, J. (1998). How much of forward induction is implied by backward induction and
ordinality? Unpublished manuscript, University of Auckland.
Hillas, J. and Kohlberg, E. (2002). Conceptual foundations of strategic equilibrium. In Aumann, R. J. and Hart, S., editors, Handbook of Game Theory with Economic Applications, volume 3, chapter 42, pages 1597–1663. Elsevier, New York.
Hiriart-Urruty, J.-B. and Lemaréchal, C. (2001). Fundamentals of Convex Analysis. Springer, Berlin.
Hofbauer, J. and Sandholm, W. H. (2011). Survival of dominated strategies under evolutionary dynamics. Theoretical Economics, 6:341–377.
Hofbauer, J. and Swinkels, J. M. (1996). A universal Shapley example. Unpublished
manuscript, University of Vienna and Northwestern University.
Hopkins, E. and Seymour, R. M. (2002). The stability of price dispersion under seller and consumer learning. International Economic Review, 43:1157–1190.
Howard, R. A. (1960). Dynamic Programming and Markov Processes. MIT Press, Cambridge.
Jackson, M. O., Simon, L. K., Swinkels, J. M., and Zame, W. R. (2002). Communication and equilibrium in discontinuous games of incomplete information. Econometrica, 70:1711–1740.
Kakutani, S. (1941). A generalization of Brouwer's fixed point theorem. Duke Mathematical Journal, 8:457–459.
Kohlberg, E. and Mertens, J.-F. (1986). On the strategic stability of equilibria. Econometrica, 54:1003–1037.
Kohlberg, E. and Reny, P. J. (1997). Independence on relative probability spaces and consistent assessments in game trees. Journal of Economic Theory, 75:280–313.
Kreps, D. M., Milgrom, P., Roberts, J., and Wilson, R. (1982). Rational cooperation in the finitely repeated Prisoner's Dilemma. Journal of Economic Theory, 27:245–252.
Kreps, D. M. and Wilson, R. (1982). Sequential equilibria. Econometrica, 50:863–894.
Kuhn, H. W. (1953). Extensive games and the problem of information. In Kuhn, H. W. and Tucker, A. W., editors, Contributions to the Theory of Games II, volume 28 of Annals of Mathematics Studies, pages 193–216. Princeton University Press, Princeton.
Lahkar, R. (2011). The dynamic instability of dispersed price equilibria. Journal of Economic Theory, 146:1796–1827.
Levitt, S. D., List, J. A., and Sadoff, S. E. (2011). Checkmate: Exploring backward induction among chess players. American Economic Review, 101:975–990.
Mailath, G. J. and Samuelson, L. (2006). Repeated Games and Reputations: Long-Run Relationships. Oxford University Press, Oxford.
Mailath, G. J., Samuelson, L., and Swinkels, J. M. (1993). Extensive form reasoning in normal form games. Econometrica, 61:273–302.
Mailath, G. J., Samuelson, L., and Swinkels, J. M. (1997). How proper is sequential equilibrium? Games and Economic Behavior, 18:193–218.
Marx, L. M. and Swinkels, J. M. (1997). Order independence for iterated weak dominance. Games and Economic Behavior, 18:219–245. Corrigendum, 31 (2000), 324–329.
McKelvey, R. D. and Palfrey, T. R. (1992). An experimental study of the centipede game. Econometrica, 60:803–836.
McLennan, A., Monteiro, P. K., and Tourky, R. (2011). Games with discontinuous payoffs: A strengthening of Reny's existence theorem. Econometrica, 79:1643–1664.
Mertens, J.-F. (1989). Stable equilibria - a reformulation. I. Definition and basic properties. Mathematics of Operations Research, 14:575–625.
Mertens, J.-F. (1991). Stable equilibria - a reformulation. II. Discussion of the definition, and further results. Mathematics of Operations Research, 16:694–753.
Mertens, J.-F. (1995). Two examples of strategic equilibrium. Games and Economic Behavior, 8:378–388.
Mertens, J.-F. and Zamir, S. (1985). Formulation of Bayesian analysis for games with incomplete information. International Journal of Game Theory, 14:1–29.
Milnor, J. (1954). Games against Nature. In Thrall, R. M., Coombs, C. H., and Davis, R. L., editors, Decision Processes, pages 49–59. Wiley, New York.
Morris, S. (1994). Trade with heterogeneous prior beliefs and asymmetric information. Econometrica, 62:1327–1347.
Morris, S. (1995). The common prior assumption in economic theory. Economics and Philosophy, 11:227–253.
Morris, S. and Shin, H. S. (2003). Global games: Theory and applications. In Dewatripont, M., Hansen, L. P., and Turnovsky, S. J., editors, Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress, volume 1, pages 56–114. Cambridge University Press, Cambridge.
Myerson, R. B. (1978). Refinements of the Nash equilibrium concept. International Journal of Game Theory, 7:73–80.
Stoye, J. (2011). Statistical decisions under ambiguity. Theory and Decision, 70:129–148.
Thompson, F. (1952). Equivalence of games in extensive form. RM 759, The Rand Corporation. Reprinted in Classics in Game Theory, H. W. Kuhn, Ed., pages 36–45, Princeton University Press, Princeton (1997).
Tranæs, T. (1998). Tie-breaking in games of perfect information. Games and Economic Behavior, 22:148–161.
van Damme, E. (1984). A relation between perfect equilibria in extensive form games and proper equilibria in normal form games. International Journal of Game Theory, 13:1–13.
von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton.
Weinstein, J. and Yildiz, M. (2007). A structure theorem for rationalizability with application to robust predictions of refinements. Econometrica, 75:365–400.
Wilson, R. (1971). Computing equilibria of n-person games. SIAM Journal on Applied Mathematics, 21:80–87.
Yamamoto, Y. (2010). The use of public randomization in discounted repeated games. International Journal of Game Theory, 39:431–443.
Young, H. P. (2004). Strategic Learning and Its Limits. Oxford University Press, Oxford.
Zeeman, E. C. (1980). Population dynamics from game theory. In Nitecki, Z. and Robinson, C., editors, Global Theory of Dynamical Systems (Evanston, 1979), number 819 in Lecture Notes in Mathematics, pages 472–497, Berlin. Springer.