
Information Compression in Dynamic Games

Dengwang Tang [1] (dwtang@umich.edu), Vijay Subramanian (vgsubram@umich.edu), Demosthenis Teneketzis (teneket@umich.edu)

[1] Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089-2560, USA

[2] Electrical and Computer Engineering Division, Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI 48109, USA

Abstract

One of the reasons why stochastic dynamic games with an underlying dynamic system are challenging is that strategic players have access to an enormous amount of information, which leads to the use of extremely complex strategies at equilibrium. One approach to resolve this challenge is to simplify players’ strategies by identifying appropriate compression of information maps so that the players can make decisions solely based on the compressed version of information, called the information state. Such maps allow players to implement their strategies efficiently. For finite dynamic games with asymmetric information, inspired by the notion of information state for single-agent control problems, we propose two notions of information states, namely mutually sufficient information (MSI) and unilaterally sufficient information (USI). Both these information states are obtained by applying information compression maps that are independent of the strategy profile. We show that Bayes-Nash Equilibria (BNE) and Sequential Equilibria (SE) exist when all players use MSI-based strategies. We prove that when all players employ USI-based strategies the resulting sets of BNE and SE payoff profiles are the same as the sets of BNE and SE payoff profiles resulting when all players use full information-based strategies. We prove that when all players use USI-based strategies the resulting set of weak Perfect Bayesian Equilibrium (wPBE) payoff profiles can be a proper subset of all wPBE payoff profiles. We identify MSI and USI in specific models of dynamic games in the literature. We end by presenting an open problem: Do there exist strategy-dependent information compression maps that guarantee the existence of at least one equilibrium or maintain all equilibria that exist under perfect recall? We show, by a counterexample, that a well-known strategy-dependent information compression map used in the literature does not possess any of the properties of the strategy-independent compression maps that result in MSI or USI.

Keywords: Non-cooperative Games, Dynamic Games, Information State, Sequential Equilibrium, Markov Decision Process

JEL Classification: C72, C73, D80

MSC Classification: 90C40, 91A10, 91A15, 91A25, 91A50

Acknowledgements: The authors would like to thank Yi Ouyang, Hamidreza Tavafoghi, Ashutosh Nayyar, Tilman Börgers, and David Miller for helpful discussions.

1 Introduction

The model of stochastic dynamic games has found application in many engineering and socioeconomic settings, such as transportation networks, power grids, spectrum markets, and online shopping platforms. In these settings, multiple agents/players make decisions over time on top of an ever-changing environment with players having different goals and asymmetric information. For example, in transportation networks, individual drivers make routing decisions based on information from online map services in order to reach their respective destinations as fast as possible. Their actions then collectively affect traffic conditions in the future. Another example involves online shopping platforms, where buyers leave reviews to inform potential future buyers, while sellers update prices and make listing decisions based on the feedback from buyers. In these systems, players’ decisions are generally not only interdependent, but also affect the underlying environment as well as future decisions and payoffs of all players in complex ways.

Determining the set of equilibria, or even solving for one equilibrium, in a given stochastic dynamic game can be a challenging task. The main challenges include: (a) the presence of an underlying environment/system that can change over time based on the actions of all players; (b) incomplete and asymmetric information; (c) a large number of players, states, and actions; and (d) a growing amount of information over time, which results in a massive strategy space. As a result of the advances in technology, stochastic dynamic games today are often played by players (e.g. big corporations) that have access to substantial computational resources along with a large amount of data for decision making. Nevertheless, even these players are computationally constrained, and they must make decisions in real time; hence complicated strategies may not be feasible for them. Therefore, it is important to determine computationally efficient strategies for players to play at equilibria. Compressing players’ information and then using strategies based on the compressed information is a well-established methodology that results in computationally efficient strategies. In this paper we address some of the above-mentioned challenges. We concentrate on the challenges associated with information compression, namely the existence of equilibria under information compression, and the preservation of all equilibrium payoff profiles under information compression. We leave as a topic of future investigation the discovery of efficient algorithms for the computation of equilibria based on strategies that use compressed information.

Specifically, our goal is to identify appropriate strategy-independent information compression maps (see Footnote 1) in dynamic games so that the resulting compressed information has properties/features sufficient to satisfy the following requirements: (R1) existence of equilibria when all players use strategies based on the compressed information; (R2) equality of the set of all equilibrium payoff profiles that are achieved when all players use full-information-based strategies with the set of all equilibrium payoff profiles that are achieved when all players use strategies based on the compressed information.

Footnote 1: Strategy-independent information compression maps are maps that are not parameterized by a strategy profile. Examples of strategy-independent information compression maps include those that use a fixed subset of the game’s history (e.g. the most recent observation) or some statistics based on the game’s history (e.g. the number of times player $i$ takes a certain action). Strategy-dependent maps are parameterized by a strategy profile (see Section 6).
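As a rough illustration of the two kinds of strategy-independent maps mentioned in the text (a fixed subset of the history, and a statistic of the history), the following Python sketch may help. The representation of a history as a list of (observation, action) pairs and the function names are assumptions made purely for illustration; they are not notation from this paper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

# Assumption for illustration: a history is a sequence of
# (observation, action) pairs, oldest first.
History = Sequence[Tuple[Hashable, Hashable]]


def most_recent_observation(history: History) -> Optional[Hashable]:
    """Compress a history to its latest observation (None if the history is empty)."""
    return history[-1][0] if history else None


def action_count(history: History, action: Hashable) -> int:
    """Compress a history to the number of times `action` was taken."""
    return Counter(a for _, a in history)[action]


# Neither map takes a strategy profile as an argument, which is what makes
# them strategy-independent in the sense described above.
history = [("low", "wait"), ("high", "buy"), ("high", "wait")]
latest = most_recent_observation(history)
waits = action_count(history, "wait")
print(latest, waits)
```

Both functions depend only on the realized history, so the compressed value can be computed without knowing how any player chooses actions.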

Inspired by the literature on single-agent decision/control problems, particularly the notion of information state, we develop notions of information state (compressed information) that satisfy requirements (R1) and (R2). Specifically, we introduce the notions of Mutually Sufficient Information (MSI) and Unilaterally Sufficient Information (USI). We show that MSI has properties/features sufficient to satisfy (R1), whereas USI has properties sufficient to satisfy (R2) under several different equilibrium concepts.

The remainder of the paper is organized as follows: In Section 1.1 we briefly review related literature in stochastic control and game theory. In Section 1.2 we list our contributions. In Section 1.3 we introduce our notation. In Section 2 we formulate our game model. In Section 3.1 and Section 3.2 we introduce the notion of mutually sufficient information and unilaterally sufficient information respectively. We present our main results in Section 4. We discuss these results in Section 5. We discuss an open problem, primarily associated with strategy-dependent information compression, in Section 6. We provide supporting results in Appendix A. We present alternative characterizations of sequential equilibria in Appendix B. We provide proofs of the results of Sections 3 and 4 in Appendix C. We present the details of the discussions in Section 5 and Section 6 in Appendix D.

1.1 Related Literature

We first present a brief literature survey on information compression in single-agent decision problems because it has inspired several of the key ideas presented in this paper.

Single-agent decision/control problems are problems where one agent chooses actions over time on top of an ever-changing system to maximize their total reward. These problems have been extensively studied in the control theory [23], operations research [39], computer science [41], and mathematics [8] literature. Models like Markov Decision Process (MDP) and Partially Observable Markov Decision Process (POMDP) have been analyzed and applied widely in real-world systems. It is well known that in an MDP, the agent can use a Markov strategy—making decisions based on the current state—without loss of optimality. A Markov strategy can be seen as a strategy based on compressed information: the full information—state and action history—is compressed into only the current state. Furthermore, in finite horizon problems, such optimal Markov strategies can be found through a sequential decomposition procedure. It is also well known that any POMDP can be transformed into an MDP with an appropriate belief acting as the underlying state [23, Chapter 6]. As a result, the agent can use a belief-based strategy without loss of optimality. A belief-based strategy compresses the full information into the conditional belief of the current state. Critically, this information compression is strategy-independent [2, 44, 45, 23]. For general single-agent control problems, sufficient conditions that guarantee optimality of compression-based strategies have been proposed under the names of sufficient statistic [43, 46, 58, 18, 47] and information state [23, 24, 48]. In these works, the authors transform single-agent control problems with partial observations into equivalent problems with complete observations with the sufficient statistic/information state acting as the underlying state.
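The belief-based compression described above can be sketched as a single Bayes-rule step. The following is a minimal illustration, assuming a finite POMDP whose transition and observation kernels are stored as nested dictionaries; the names `belief_update`, `trans`, and `obs`, and the toy numbers, are hypothetical and not taken from the cited works.

```python
from typing import Dict, Hashable, Tuple

Belief = Dict[Hashable, float]
Kernel = Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]


def belief_update(belief: Belief, action: Hashable, observation: Hashable,
                  trans: Kernel, obs: Kernel) -> Belief:
    """One Bayes-rule step: belief on X_t -> belief on X_{t+1}.

    trans[(x, u)][x_next] is P(x_next | x, u);
    obs[(x_next, u)][y] is P(y | x_next, u).
    """
    unnormalized: Belief = {}
    for x, p in belief.items():
        for x_next, p_trans in trans[(x, action)].items():
            p_obs = obs[(x_next, action)].get(observation, 0.0)
            unnormalized[x_next] = unnormalized.get(x_next, 0.0) + p * p_trans * p_obs
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {x: w / total for x, w in unnormalized.items()}


# Hypothetical two-state example.
trans = {("good", "a"): {"good": 0.9, "bad": 0.1},
         ("bad", "a"): {"good": 0.2, "bad": 0.8}}
obs = {("good", "a"): {"hi": 0.8, "lo": 0.2},
       ("bad", "a"): {"hi": 0.3, "lo": 0.7}}
prior = {"good": 0.5, "bad": 0.5}
posterior = belief_update(prior, "a", "hi", trans, obs)
print(posterior)
```

The update takes the realized action and observation together with the model kernels, but not the policy that generated the action, which is one way to see why this compression is strategy-independent in the single-agent setting.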

Multi-agent dynamic decision problems are either teams where all agents have the same objective, or games where agents have different objectives and are strategic. Information compression in dynamic teams has been investigated in [55, 32, 34, 24, 54, 48, 19], and many other works (see [54] and [48] for a list of references). Dynamic games can be divided into two categories: those with a static underlying environment (e.g. repeated games), and those with an underlying dynamic system. Over the years, economics researchers have studied repeated games extensively (e.g. see [30, Chapter 7]). As our focus is on dynamic games with an underlying dynamic system, we will not discuss the literature on repeated games. Among models for dynamic games with an underlying dynamic system, the model of zero-sum games, as a particular class which possesses special properties, has been analyzed in [42, 28, 40] and many others (see [37] for a list of references). Non-zero-sum games with an underlying dynamic system and symmetric information have also been studied extensively [5, 9]. For such dynamic games with perfect information, the authors of [26] introduce the concept of Markov Perfect Equilibrium (MPE), where each player compresses their information into a Markov state. Dynamic games with asymmetric information have been analyzed in [29, 27, 31, 33, 13, 14, 35, 36, 53, 52, 56, 51, 37]. In [33], the authors introduce the concept of Common Information Based Markov Perfect Equilibrium (CIB-MPE), which is an extension of MPE in partially observable systems. In a CIB-MPE, all players choose their actions at each time based on the Common-Information-Based (CIB) belief (a compression of the common information) and private information instead of full information. The authors establish the existence of CIB-MPE under the assumption that the CIB belief is strategy-independent. Furthermore, the authors develop a sequential decomposition procedure to solve for such equilibria. 
In [36], the authors extend the result of [33] to a particular model where the CIB beliefs are strategy-dependent. They introduce the concept of Common Information Based Perfect Bayesian Equilibrium (CIB-PBE). In a CIB-PBE all players choose their actions based on the CIB belief and their private information. They show that such equilibria can be found through a sequential decomposition whenever the decomposition has a solution. The authors conjecture the existence of such equilibria. The authors of [51] extend the model of [36] to games among teams. They consider two compression maps and their associated equilibrium concepts. For the first compression map, which is strategy-independent, they establish preservation of equilibrium payoffs. For the second information compression map, which is strategy-dependent, they propose a sequential decomposition of the game. If the decomposition admits a solution, then there exists a CIB-BNE based on the compressed information. Furthermore, they provide an example where CIB-BNE based on this specific compressed information do not exist. The example also proves that the conjecture about the existence of CIB-PBEs, made in [36], is false.

In addition to the methods on information compression that appear in [26, 33, 36, 51], there are two lines of work on games where the players’ decisions are based on limited information. In the first line of work, players face exogenous hard constraints on the information that can be used to choose actions [38, 7, 12, 15, 3]. In the second line of work, players can utilize any finite automaton with any number of states to choose actions, however more complex automata are assumed to be more costly [1, 4]. In our work, we also deal with finite automaton based strategies. However, there is a critical difference between our work and both of the above-mentioned lines of work: Our primary interest is to study conditions under which a compression based strategy profile can form an equilibrium under standard equilibrium concepts when unbounded rationality and perfect recall are allowed. Under these equilibrium concepts, we do not restrict the strategy of any player, nor do we impose any penalty on complicated strategies. In other words, a compression based strategy needs to be a best response compared to all possible strategies with full recall in terms of the payoff alone. The methodology for information compression presented in this paper is similar in spirit to that of [26, 33, 36, 56, 37]. However, this paper is significantly different from those works as it deals with the discovery of information compression maps that lead not only to the existence (in general) of various types of compressed information based equilibria but also to the preservation of all equilibrium payoff profiles (a topic not investigated in [26, 33, 36, 56, 37]). This paper builds on [51]; it identifies embodiments of the two information compression maps studied in [51] for a much more general class of games than that of [51], and a broader set of equilibrium concepts.

1.2 Contributions

Our main contributions are the following:

  1. We propose two notions of information states/compressed information for dynamic games with asymmetric information that result from strategy-independent compression maps: Mutually Sufficient Information (MSI) and Unilaterally Sufficient Information (USI) (Definitions 4 and 1, respectively). We present an example that highlights the differences between MSI and USI.

  2. We show that in finite dynamic games with asymmetric information, Bayes–Nash Equilibria (BNE) and Sequential Equilibria (SE) exist when all players use MSI-based strategies (Theorems 2 and 4, respectively).

  3. We prove that when all players employ USI-based strategies the resulting sets of BNE and SE payoff profiles are the same as the sets of BNE and SE payoff profiles resulting when all players use full information based strategies (Theorems 3 and 5, respectively).

  4. We prove that when all players use USI-based strategies the resulting set of weak Perfect Bayesian Equilibrium (wPBE) payoff profiles can be a proper subset of the set of all wPBE payoff profiles (Proposition 6). A result similar to that of Proposition 6 is also true under Watson’s PBE [57].

    Figure 1 depicts the results stated in Contributions 3 and 4 above.

    [Figure 1: A Venn diagram showing the relationship of the sets of payoff profiles for different equilibrium concepts using either unilaterally sufficient information (USI) based strategy profiles or general strategies. Region labels: USI-based BNE = All BNE; All wPBE; USI-based wPBE; USI-based SE = All SE.]

  5. We present several examples (Examples 5.3 through 5.6) of finite dynamic games with asymmetric information where we identify MSI and USI.

Additional contributions of this work are:

  1. A set of alternative definitions of SE (Appendix B). These definitions are equivalent to the original definition of SE given in [21] and help simplify some of the proofs of the main results in this paper.

  2. A new methodology for establishing existence of equilibria. The methodology is based on a best response function defined through a dynamic program for a single-agent control problem.

  3. A counterexample showing that a well-known strategy-dependent compression map, resulting in sufficient private information along with common information-based beliefs, does not guarantee existence of equilibria based on the above-stated compressed information.

1.3 Notation

We follow the notational convention of stochastic control literature (i.e. using random variables to define the system, representing information as random variables, etc.) instead of the convention of game theory literature (i.e. game trees, nodes, information sets, etc.) unless otherwise specified. This allows us to apply techniques from stochastic control, which we rely heavily upon, in a more natural way. We use capital letters to represent random variables, bold capital letters to denote random vectors, and lower case letters to represent realizations. We use superscripts to indicate players, and subscripts to indicate time. We use $i$ to represent a typical player and $-i$ to represent all players other than $i$. We use $t_1:t_2$ to indicate the collection of timestamps $(t_1, t_1+1, \cdots, t_2)$. For example, $X_{1:4}^{i}$ stands for the random vector $(X_1^i, X_2^i, X_3^i, X_4^i)$. For random variables or random vectors represented by Latin letters, we use the corresponding script capital letters to denote the space of values these random vectors can take. For example, $\mathcal{H}_t^i$ denotes the space of values the random vector $H_t^i$ can take. Products of sets refer to Cartesian products. We use $\mathbb{P}(\cdot)$ and $\mathbb{E}[\cdot]$ to denote probabilities and expectations, respectively. We use $\Delta(\varOmega)$ to denote the set of probability distributions on a finite set $\varOmega$. For a distribution $\nu\in\Delta(\varOmega)$, we use $\mathrm{supp}(\nu)$ to denote the support of $\nu$. When writing probabilities, we will omit the random variables when the lower case letters that represent the realizations clearly indicate the random variables they represent. For example, we will use $\mathbb{P}(y_t^i|x_t,u_t)$ as a shorthand for $\mathbb{P}(Y_t^i=y_t^i|X_t=x_t,U_t=u_t)$. When $\lambda$ is a function from $\varOmega_1$ to $\Delta(\varOmega_2)$, with some abuse of notation we write $\lambda(\omega_2|\omega_1):=(\lambda(\omega_1))(\omega_2)$ as if $\lambda$ is a conditional distribution. We use $\bm{1}_A$ to denote the indicator random variable of an event $A$.

In general, probability distributions of random variables in a dynamic system are only well defined after a complete strategy profile is specified. We specify the strategy profile that defines the distribution in superscripts, e.g. $\mathbb{P}^{g}(x_t^i|h_t^0)$. When the conditional probability is independent of a certain part of the strategy $(g_t^i)_{(i,t)\in\varOmega}$, we may omit this part of the strategy in the notation, e.g. $\mathbb{P}^{g_{1:t-1}}(x_t|y_{1:t-1},u_{1:t-1})$, $\mathbb{P}^{g^i}(u_t^i|h_t^i)$ or $\mathbb{P}(x_{t+1}|x_t,u_t)$. We say that a realization of some random vector (for example $h_t^i$) is admissible under a partially specified strategy profile (for example $g^{-i}$) if the realization has strictly positive probability under some completion of the partially specified strategy profile (in this example, that means $\mathbb{P}^{g^i,g^{-i}}(h_t^i)>0$ for some $g^i$). Whenever we write a conditional probability or conditional expectation, we implicitly assume that the condition has non-zero probability under the specified strategy profile. When only part of the strategy profile is specified in the superscript, we implicitly assume that the condition is admissible under the specified partial strategy profile. In this paper, we make heavy use of value functions and reward-to-go functions. Such functions will be clearly defined within their context with the following convention: $Q$ stands for state-action value functions; $V$ stands for state value functions; and $J$ stands for reward-to-go functions for a given strategy profile (as opposed to $Q$ or $V$, both of which are typically defined via a maximum over all strategies).

2 Game Model and Objectives

2.1 Game Model

In this section we formulate a general model for a finite horizon dynamic game with finitely many players.

Denote the set of players by $\mathcal{I}$. Denote the set of timestamps by $\mathcal{T}=\{1,2,\cdots,T\}$. At time $t$, player $i\in\mathcal{I}$ takes action $U_t^i$, obtains instantaneous reward $R_t^i$, and then learns new information $Z_t^i$. Player $i$ may not necessarily observe the instantaneous rewards $R_t^i$ directly. The reward is observable only if it is part of $Z_t^i$. Define $Z_t=(Z_t^i)_{i\in\mathcal{I}}$, $U_t=(U_t^i)_{i\in\mathcal{I}}$, and $R_t=(R_t^i)_{i\in\mathcal{I}}$. We assume that there is an underlying state variable $X_t$ and

$$(X_{t+1},Z_t,R_t)=f_t(X_t,U_t,W_t),\qquad t\in\mathcal{T},\tag{1}$$

where $(f_t)_{t\in\mathcal{T}}$ are fixed functions. The primitive random variable $X_1$ represents the initial move of nature. The primitive random vector $H_1=(H_1^i)_{i\in\mathcal{I}}$ represents the initial information of the players. The initial state and information $X_1$ and $H_1$ are, in general, correlated. The random variables $(W_t)_{t=1}^{T}$ are mutually independent primitive random variables representing nature’s move. The vector $(X_1,H_1)$ and the random variables $W_1,W_2,\cdots,W_T$ are assumed to be mutually independent. The distributions of the primitive random variables are common knowledge to all players.
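The dynamics in equation (1) can be sketched as a simple simulation loop. The following Python sketch is only an illustration under stated assumptions: the function `simulate`, the policy signature, and the two-player toy instance are hypothetical and not from this paper; each player's information is modeled as the initial information followed by the signals received so far, and policies are taken to be pure (deterministic) for simplicity.

```python
import random
from typing import Any, Callable, Dict, List, Tuple


def simulate(T: int, players: List[int], sample_initial: Callable,
             sample_noise: Callable, f: Callable,
             policies: Dict[int, Callable], rng: random.Random
             ) -> Tuple[Dict[int, float], Dict[int, List[Any]]]:
    """Roll the system forward for t = 1, ..., T following equation (1)."""
    x, h1 = sample_initial(rng)            # primitive pair (X_1, H_1)
    info = {i: [h1[i]] for i in players}   # player i's information, grown by Z_t^i
    totals = {i: 0.0 for i in players}     # running sum of R_t^i
    for t in range(1, T + 1):
        u = {i: policies[i](t, tuple(info[i]), rng) for i in players}  # U_t
        w = sample_noise(t, rng)           # nature's move W_t
        x, z, r = f(t, x, u, w)            # (X_{t+1}, Z_t, R_t) = f_t(X_t, U_t, W_t)
        for i in players:
            info[i].append(z[i])
            totals[i] += r[i]
    return totals, info


# --- Hypothetical two-player toy instance (not from the paper) ---
def sample_initial(rng):
    x1 = 0
    return x1, {1: x1, 2: None}            # player 1 knows X_1, player 2 does not


def sample_noise(t, rng):
    return 0                               # degenerate noise keeps the run deterministic


def f(t, x, u, w):
    x_next = (x + u[1] + u[2] + w) % 2
    z = {1: (u[1], x_next), 2: (u[2],)}    # each Z_t^i contains the player's own action
    r1 = 1.0 if u[1] == x else -1.0        # rewards stay within [-1, 1]
    return x_next, z, {1: r1, 2: -r1}


def policy_1(t, history, rng):
    # Play the most recently learned state.
    return history[0] if len(history) == 1 else history[-1][1]


def policy_2(t, history, rng):
    return 1


totals, info = simulate(3, [1, 2], sample_initial, sample_noise, f,
                        {1: policy_1, 2: policy_2}, random.Random(0))
print(totals)
```

In the toy instance, player 1 observes the next state in each signal while player 2 sees only their own action, which gives a small example of asymmetric information within the same state dynamics.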

Define 𝒳t,𝒰t,𝒵t,𝒲t,1subscript𝒳𝑡subscript𝒰𝑡subscript𝒵𝑡subscript𝒲𝑡subscript1\mathcal{X}_{t},\mathcal{U}_{t},\mathcal{Z}_{t},\mathcal{W}_{t},\mathcal{H}_{1}caligraphic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_Z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_W start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT to be the sets of possible values of Xt,Ut,Zt,Wt,H1subscript𝑋𝑡subscript𝑈𝑡subscript𝑍𝑡subscript𝑊𝑡subscript𝐻1X_{t},U_{t},Z_{t},W_{t},H_{1}italic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_W start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT respectively. The sets 𝒳t,𝒰t,𝒵t,𝒲t,1subscript𝒳𝑡subscript𝒰𝑡subscript𝒵𝑡subscript𝒲𝑡subscript1\mathcal{X}_{t},\mathcal{U}_{t},\mathcal{Z}_{t},\mathcal{W}_{t},\mathcal{H}_{1}caligraphic_X start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_Z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_W start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT are assumed to be common knowledge among all players. In this work, in order to focus on conceptual difficulties instead of technical issues, we make the following assumption.

Assumption 1.

$\mathcal{X}_t,\mathcal{U}_t,\mathcal{Z}_t,\mathcal{W}_t,\mathcal{H}_1$ are finite sets, and $R_t^i$ is supported on $[-1,1]$.

We assume perfect recall, i.e. the information player $i$ has at time $t$ is $H_t^i=(H_1^i,Z_{1:t-1}^i)$, and player $i$'s action $U_t^i$ is contained in the new information $Z_t^i$. A behavioral strategy $g^i=(g_t^i)_{t\in\mathcal{T}}$ of player $i$ is a collection of functions $g_t^i\colon\mathcal{H}_t^i\mapsto\Delta(\mathcal{U}_t^i)$, where $\mathcal{H}_t^i$ is the space where $H_t^i$ takes values. Under a behavioral strategy profile $g=(g^i)_{i\in\mathcal{I}}$, the total reward/payoff of player $i$ in this game is given by

$$J^i(g):=\mathbb{E}^g\left[\sum_{t=1}^{T}R_t^i\right]. \tag{2}$$

Remark 1.

This is not a restrictive model: By choosing an appropriate state representation $X_t$ and instantaneous reward vector $R_t$, it can be used to model any finite-node extensive form sequential game with perfect recall.

We initially consider two solution concepts for dynamic games with asymmetric information: Bayes–Nash Equilibrium (BNE) and Sequential Equilibrium (SE). We define BNE and SE below.

Definition 1 (Bayes-Nash Equilibrium).

A behavioral strategy profile $g$ is said to form a Bayes-Nash equilibrium (BNE) if for any player $i$ and any behavioral strategy $\tilde{g}^i$ of player $i$, we have $J^i(g)\geq J^i(\tilde{g}^i,g^{-i})$.
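As a minimal sketch of the inequality in Definition 1, consider the degenerate special case of a one-shot ($T=1$) two-player game with no private information, so that a behavioral strategy is just a distribution over actions. The function names `expected_payoff` and `is_bne` and the matching-pennies payoffs are assumptions made for illustration; they do not come from the paper. Since $J^i$ is linear in player $i$'s own mixed action in this special case, it suffices to test pure deviations.

```python
from itertools import product

def expected_payoff(payoff, strategies):
    """Expected payoff J^i for one player; payoff[a0][a1] is that player's reward."""
    p0, p1 = strategies
    return sum(p0[a0] * p1[a1] * payoff[a0][a1]
               for a0, a1 in product(range(len(p0)), range(len(p1))))

def is_bne(payoffs, strategies, tol=1e-9):
    """Check J^i(g) >= J^i(g~^i, g^{-i}) for every pure deviation g~^i of each player."""
    for i in (0, 1):
        base = expected_payoff(payoffs[i], strategies)
        n_actions = len(strategies[i])
        for a in range(n_actions):
            # Pure deviation: player i plays action a with probability one.
            pure_dev = [1.0 if b == a else 0.0 for b in range(n_actions)]
            deviated = list(strategies)
            deviated[i] = pure_dev
            if expected_payoff(payoffs[i], deviated) > base + tol:
                return False
    return True

# Hypothetical example: matching pennies, with rewards in [-1, 1] as in Assumption 1.
payoffs = ([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])
uniform = ([0.5, 0.5], [0.5, 0.5])
pure = ([1.0, 0.0], [1.0, 0.0])

print(is_bne(payoffs, uniform))  # uniform mixing admits no profitable deviation
print(is_bne(payoffs, pure))     # player 1 gains by switching actions
```

In the general dynamic game the same inequality must hold against all full-history behavioral strategies, which this sketch does not enumerate.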

Definition 2 (Sequential Equilibrium).

Let $g=(g^i)_{i\in\mathcal{I}}$ be a behavioral strategy profile. Let $Q=(Q_t^i)_{i\in\mathcal{I},t\in\mathcal{T}}$ be a collection of history-action value functions, i.e. $Q_t^i\colon\mathcal{H}_t^i\times\mathcal{U}_t^i\mapsto\mathbb{R}$. The strategy profile $g$ is said to be sequentially rational under $Q$ if for each $i\in\mathcal{I},t\in\mathcal{T}$ and each $h_t^i\in\mathcal{H}_t^i$,

$$\mathrm{supp}(g_t^i(h_t^i))\subseteq\underset{u_t^i}{\arg\max}~Q_t^i(h_t^i,u_t^i). \tag{3}$$

$Q$ is said to be fully consistent with $g$ if there exists a sequence of pairs of strategies and history-action value functions $(g^{(n)},Q^{(n)})_{n=1}^{\infty}$ such that

  1. $g^{(n)}$ is fully mixed, i.e. every action is chosen with positive probability at every information set.

  2. $Q^{(n)}$ is consistent with $g^{(n)}$, i.e.,

    $$Q_\tau^{(n),i}(h_\tau^i,u_\tau^i)=\mathbb{E}^{g^{(n)}}\left[\sum_{t=\tau}^{T}R_t^i\,\Big|\,h_\tau^i,u_\tau^i\right], \tag{4}$$

    for each $i\in\mathcal{I},\tau\in\mathcal{T},h_\tau^i\in\mathcal{H}_\tau^i,u_\tau^i\in\mathcal{U}_\tau^i$.

  3. $(g^{(n)},Q^{(n)})\rightarrow(g,Q)$ as $n\rightarrow\infty$.

A tuple $(g,Q)$ is said to be a sequential equilibrium if $g$ is sequentially rational under $Q$ and $Q$ is fully consistent with $g$.
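The sequential rationality condition (3) can be sketched for a single player and a single time step as follows. This is an illustrative check under assumed data structures (dictionaries keyed by hypothetical history labels `"h0"`, `"h1"` and action labels `"a"`, `"b"`), not an implementation from the paper, and it does not test full consistency of $Q$ with $g$.

```python
def is_sequentially_rational(strategy, q_values, tol=1e-9):
    """Check supp(g(h)) is contained in argmax_u Q(h, u) for every history h.

    strategy[h] maps each action to its probability under g at history h;
    q_values[h] maps each action to its history-action value Q(h, u).
    """
    for h, dist in strategy.items():
        best = max(q_values[h].values())
        for action, prob in dist.items():
            # Any action played with positive probability must be a maximizer.
            if prob > 0 and q_values[h][action] < best - tol:
                return False
    return True

# Hypothetical values: both actions are optimal at "h0", only "b" is optimal at "h1".
q = {"h0": {"a": 1.0, "b": 1.0}, "h1": {"a": 0.2, "b": 0.5}}
g_ok = {"h0": {"a": 0.3, "b": 0.7}, "h1": {"a": 0.0, "b": 1.0}}
g_bad = {"h0": {"a": 0.3, "b": 0.7}, "h1": {"a": 0.1, "b": 0.9}}

print(is_sequentially_rational(g_ok, q))
print(is_sequentially_rational(g_bad, q))
```

Mixing is allowed at `"h0"` because both actions attain the maximum there; `g_bad` fails because it puts positive probability on a suboptimal action at `"h1"`.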

Although Definition 2 of SE differs from that of [21], we show in Appendix B that it is equivalent to the concept in [21]. We use Definition 2 as it is more suitable for the development of our results.

In this paper, we are interested in analyzing the performance of strategy profiles that are based on some form of compressed information. Let $K_t^i$ be a function of $H_t^i$ that can be sequentially updated, i.e. there exist functions $(\iota_t^i)_{t\in\mathcal{T}}$ such that

$$K_1^i=\iota_1^i(H_1^i), \tag{5}$$
$$K_t^i=\iota_t^i(K_{t-1}^i,Z_{t-1}^i),\qquad t\in\mathcal{T}\backslash\{1\}. \tag{6}$$

Write $K^i=(K_t^i)_{t\in\mathcal{T}}$ and $K=(K^i)_{i\in\mathcal{I}}$. We will refer to $K^i$ as the compression of player $i$'s information under $\iota^i=(\iota_t^i)_{t\in\mathcal{T}}$. A $K^i$-based (behavioral) strategy $\rho^i=(\rho_t^i)_{t\in\mathcal{T}}$ is a collection of functions $\rho_t^i\colon\mathcal{K}_t^i\mapsto\Delta(\mathcal{U}_t^i)$. A strategy profile where each player $i$ uses a $K^i$-based strategy is called a $K$-based strategy profile. If a $K$-based strategy profile forms a Bayes-Nash (resp. sequential) equilibrium, then it is called a $K$-based Bayes-Nash (resp. sequential) equilibrium. Note that unlike [38, 7, 12, 15, 3], we require the $K$-based BNE and $K$-based SE to admit no profitable deviation among all full-history-based strategies.
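The recursive compression (5)-(6) and a $K^i$-based strategy can be sketched as a fold over the observation history. Everything specific below is an assumption chosen for illustration: the compression map that keeps the initial information and only the most recent observation, the time-invariant update `iota_t` (the paper allows it to depend on $t$), the observation labels, and the toy strategy `rho`.

```python
from functools import reduce

def compress(iota_1, iota_t, h_1, z_history):
    """Compute K_t via K_1 = iota_1(H_1) and K_s = iota_t(K_{s-1}, Z_{s-1})."""
    k_1 = iota_1(h_1)
    return reduce(iota_t, z_history, k_1)

def iota_1(h_1):
    # Hypothetical initial compression: keep H_1, no observation seen yet.
    return (h_1, None)

def iota_t(k_prev, z_prev):
    # Hypothetical update: keep H_1 and overwrite the stored observation.
    return (k_prev[0], z_prev)

def rho(k):
    """A toy K-based behavioral strategy: a distribution over actions given k."""
    _, last_z = k
    if last_z == "z":
        return {"stay": 1.0}
    return {"stay": 0.5, "switch": 0.5}

def induced_strategy(h_1, z_history):
    """The full-history strategy g(h) = rho(K(h)) induced by rho."""
    return rho(compress(iota_1, iota_t, h_1, z_history))

# Two different full histories that compress to the same k.
k_a = compress(iota_1, iota_t, "L", ["x", "z"])
k_b = compress(iota_1, iota_t, "L", ["y", "z"])
print(k_a, k_b)
print(induced_strategy("L", ["x", "z"]))
```

Because `k_a` and `k_b` coincide, any strategy built from `rho` must act identically after both histories, and the earlier observation cannot be recovered from the compressed value.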

2.2 Objectives

Our goal is to discover properties/features of the compressed information $K$ sufficient to guarantee that (i) there exist $K$-based BNE and SE; (ii) the set of $K$-based BNE (resp. SE) payoff profiles is equal to the set of (general strategy based) BNE (resp. SE) payoff profiles under perfect recall.

To achieve the above-stated objectives we proceed as follows: First, we introduce two notions of information state, namely MSI and USI (Section 3). Then, we investigate the existence of MSI-based and USI-based BNE and SE, as well as the preservation of the set of all BNE and SE payoff profiles when USI-based strategies are employed by all players (Section 4).

Remark 2.

A key challenge in achieving the above-stated goal is the following: Unlike the case of perfect recall, one may not be able to recover $K_{t-1}^i$ from $K_t^i$. Therefore, $K^i$-based (behavioral) strategies are not equivalent to mixed strategies supported on the set of $K^i$-based pure strategies. This fact creates difficulty for analyzing $K^i$-based strategies since the standard technique of using Kuhn's Theorem [22] to transform mixed strategies to behavioral strategies does not apply. To resolve this challenge, we developed stochastic control theory-based techniques that allow us to work with $K^i$-based behavioral strategies directly rather than transforming from a mixed strategy.

Remark 3.

In the following sections, when referring to the compressed information $K_t^i$, we will consider the compression mappings $\iota^i$ to be fixed and given, so that $K_t^i$ is fixed given $H_t^i$. The space of compressed information $\mathcal{K}_t^i$ is a fixed, finite set given $\iota^i$. When we use $k_t^i$ to represent a realization of $K_t^i$, we assume that it corresponds to the compression of $H_t^i=h_t^i$ under the fixed $\iota^i$.

3 Two Definitions of Information State

Before we define notions of information state in dynamic games we introduce the notion of information state for one player when other players’ strategies are fixed. The following definition is an extension of the definition of information state in [48].

Definition 3.

Let $g^{-i}$ be a behavioral strategy profile of players other than $i$. We say that $K^i$ is an information state under $g^{-i}$ if there exist functions $(P_t^{i,g^{-i}})_{t\in\mathcal{T}},(r_t^{i,g^{-i}})_{t\in\mathcal{T}}$, where $P_t^{i,g^{-i}}\colon\mathcal{K}_t^i\times\mathcal{U}_t^i\mapsto\Delta(\mathcal{K}_{t+1}^i)$ and $r_t^{i,g^{-i}}\colon\mathcal{K}_t^i\times\mathcal{U}_t^i\mapsto[-1,1]$, such that

  1. $\mathbb{P}^{g^i,g^{-i}}(k_{t+1}^i|h_t^i,u_t^i)=P_t^{i,g^{-i}}(k_{t+1}^i|k_t^i,u_t^i)$ for all $t\in\mathcal{T}\backslash\{T\}$;

  2. $\mathbb{E}^{g^i,g^{-i}}[R_t^i|h_t^i,u_t^i]=r_t^{i,g^{-i}}(k_t^i,u_t^i)$ for all $t\in\mathcal{T}$,

for all $g^i$, and all $(h_t^i,u_t^i)$ admissible under $(g^i,g^{-i})$. (Both $P_t^{i,g^{-i}}$ and $r_t^{i,g^{-i}}$ may depend on $g^{-i}$, but they do not depend on $g^i$.)
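Condition (2) of Definition 3 says that the conditional expected reward depends on the history only through its compression. A minimal sketch of that check is given below, assuming the conditional expectations have already been tabulated for a fixed $g^{-i}$. The helper name `factors_through_compression`, the toy histories, and the "last observation" compression are all hypothetical. The sketch covers only the reward condition for one tabulated strategy; it does not verify the transition condition (1) or the requirement that the functions be the same for all $g^i$.

```python
def factors_through_compression(table, compress, tol=1e-9):
    """Return the reduced table on (k, u) if table[(h, u)] depends on h only via k.

    table maps (history, action) pairs to a conditional expected reward.
    Returns None if two histories with the same compression disagree.
    """
    reduced = {}
    for (h, u), value in table.items():
        key = (compress(h), u)
        if key in reduced and abs(reduced[key] - value) > tol:
            return None  # same (k, u) but different rewards: not an information state
        reduced.setdefault(key, value)
    return reduced

def last_observation(h):
    # Hypothetical compression: keep only the most recent entry of the history.
    return h[-1]

# Hypothetical tabulated values of E[R | h, u] with rewards in [-1, 1].
good_table = {
    (("x", "z"), "a"): 0.5,
    (("y", "z"), "a"): 0.5,
    (("x", "y"), "a"): -1.0,
}
bad_table = {
    (("x", "z"), "a"): 0.5,
    (("y", "z"), "a"): 0.0,
}

print(factors_through_compression(good_table, last_observation))
print(factors_through_compression(bad_table, last_observation))
```

When the check succeeds, the returned dictionary plays the role of the function $r_t^{i,g^{-i}}$ on the compressed space.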

In the absence of other players, the above definition is exactly the same as the definition of information state for player $i$'s control problem. When other players are present, the parameters of player $i$'s control problem, in general, depend on the strategy of other players. As a consequence, an information state under one strategy profile $g^{-i}$ may not be an information state under a different strategy profile $\tilde{g}^{-i}$.

3.1 Mutually Sufficient Information

Definition 4 (Mutually Sufficient Information).

We say that $K=(K^i)_{i\in\mathcal{I}}$ is mutually sufficient information (MSI) if for all players $i\in\mathcal{I}$ and all $K^{-i}$-based strategy profiles $\rho^{-i}$, $K^i$ is an information state under $\rho^{-i}$.

In words, MSI represents mutually consistent compression of information in a dynamic game: Player $i$ could compress their information to $K^i$ without loss of performance when other players are compressing their information to $K^{-i}$. Note that MSI imposes interdependent conditions on the compression maps of all players: It requires the compression maps of all players to be consistent with each other.

The following lemma provides a sufficient condition for compression maps to yield mutually sufficient information.

Lemma 1.

If for all $i\in\mathcal{I}$ and all $K^{-i}$-based strategy profiles $\rho^{-i}$, there exist functions $(\Phi_t^{i,\rho^{-i}})_{t\in\mathcal{T}}$ where $\Phi_t^{i,\rho^{-i}}\colon\mathcal{K}_t^i\mapsto\Delta(\mathcal{X}_t\times\mathcal{K}_t^{-i})$ such that

$$\mathbb{P}^{g^i,\rho^{-i}}(x_t,k_t^{-i}|h_t^i)=\Phi_t^{i,\rho^{-i}}(x_t,k_t^{-i}|k_t^i), \tag{7}$$

for all behavioral strategies $g^i$, all $t\in\mathcal{T}$, and all $h_t^i$ admissible under $(g^i,\rho^{-i})$, then $K=(K^i)_{i\in\mathcal{I}}$ is mutually sufficient information.

Proof 3.1.

See Appendix C.1.

In words, the condition of Lemma 1 means that $K_t^i$ has the same predictive power as $H_t^i$ in terms of forming a belief on the current state and other players' compressed information whenever other players are using compression-based strategies. This belief is sufficient for player $i$ to predict other players' actions and future state evolution. Since other players are using compression-based strategies, player $i$ does not have to form a belief on other players' full information in order to predict other players' actions.
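As a hedged illustration of the first part of this interpretation (a sketch only, not the argument of Appendix C.1), assume that each other player $j\neq i$ randomizes independently given their own compressed information, and write $\rho_t^j(u_t^j \mid k_t^j)$ for the probability that player $j$ selects $u_t^j$ (a notation introduced here for illustration). Under these assumptions, condition (7) would give, for every admissible $h_t^i$,

```latex
\mathbb{P}^{g^i,\rho^{-i}}(u_t^{-i} \mid h_t^i)
  = \sum_{x_t,\,k_t^{-i}} \Phi_t^{i,\rho^{-i}}(x_t,k_t^{-i} \mid k_t^i)
    \prod_{j\neq i} \rho_t^j(u_t^j \mid k_t^j),
```

so player $i$'s prediction of the other players' current actions depends on $h_t^i$ only through $k_t^i$.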

3.2 Unilaterally Sufficient Information

Definition 1 (Unilaterally Sufficient Information).

We say that $K^{i}$ is unilaterally sufficient information (USI) for player $i\in\mathcal{I}$ if there exist functions $(F_{t}^{i,g^{i}})_{t\in\mathcal{T}}$ and $(\Phi_{t}^{i,g^{-i}})_{t\in\mathcal{T}}$, where $F_{t}^{i,g^{i}}\colon\mathcal{K}_{t}^{i}\mapsto\Delta(\mathcal{H}_{t}^{i})$ and $\Phi_{t}^{i,g^{-i}}\colon\mathcal{K}_{t}^{i}\mapsto\Delta(\mathcal{X}_{t}\times\mathcal{H}_{t}^{-i})$, such that

\[
\mathbb{P}^{g}(x_{t},h_{t}|k_{t}^{i})=F_{t}^{i,g^{i}}(h_{t}^{i}|k_{t}^{i})\,\Phi_{t}^{i,g^{-i}}(x_{t},h_{t}^{-i}|k_{t}^{i}), \tag{8}
\]

for all behavioral strategy profiles $g$, all $t\in\mathcal{T}$, and all $k_{t}^{i}$ admissible under $g$.\footnote{In the case where random vectors $X_{t}$, $H_{t}^{i}$ and $H_{t}^{-i}$ share some common components, (8) should be interpreted in the following way: $x_{t}$, $h_{t}^{i}$ and $h_{t}^{-i}$ are three separate realizations that are not necessarily congruent with each other (i.e. they can disagree on their common parts). In the case of incongruency, the left-hand side equals 0. The equation needs to be true for all combinations of $x_{t}\in\mathcal{X}_{t}$, $h_{t}^{i}\in\mathcal{H}_{t}^{i}$ and $h_{t}^{-i}\in\mathcal{H}_{t}^{-i}$.}

The definition of USI can be separated into two parts: The first part states that the conditional distribution of $H_{t}^{i}$, player $i$’s full information, given $K_{t}^{i}$, the compressed information, does not depend on other players’ strategies. This is similar to the idea of sufficient statistics in the statistics literature [20]: If player $i$ would like to use their “data” $H_{t}^{i}$ to estimate the “parameter” $g^{-i}$, then $K_{t}^{i}$ is a sufficient statistic for this parameter estimation problem. The second part states that $K_{t}^{i}$ has the same predictive power as $H_{t}^{i}$ in terms of forming a belief on the current state and other players’ full information. In contrast to the definition of mutually sufficient information, if $K^{i}$ is unilaterally sufficient information, then $K^{i}$ is sufficient for player $i$’s decision making regardless of whether other players are using any information compression map.
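The two parts can be read off from (8) by marginalization. The following worked step is a sketch that uses only (8), the fact that $F_{t}^{i,g^{i}}$ and $\Phi_{t}^{i,g^{-i}}$ take values in probability simplices, and the assumption that $K_{t}^{i}$ is a function (compression) of $H_{t}^{i}$; it is not taken from the appendix.

\begin{align*}
\sum_{x_{t},h_{t}^{-i}}\mathbb{P}^{g}(x_{t},h_{t}|k_{t}^{i}) &= \mathbb{P}^{g}(h_{t}^{i}|k_{t}^{i}) = F_{t}^{i,g^{i}}(h_{t}^{i}|k_{t}^{i}),\\
\sum_{h_{t}^{i}}\mathbb{P}^{g}(x_{t},h_{t}|k_{t}^{i}) &= \mathbb{P}^{g}(x_{t},h_{t}^{-i}|k_{t}^{i}) = \Phi_{t}^{i,g^{-i}}(x_{t},h_{t}^{-i}|k_{t}^{i}),\\
\mathbb{P}^{g}(x_{t},h_{t}^{-i}|h_{t}^{i}) &= \frac{\mathbb{P}^{g}(x_{t},h_{t}|k_{t}^{i})}{\mathbb{P}^{g}(h_{t}^{i}|k_{t}^{i})} = \Phi_{t}^{i,g^{-i}}(x_{t},h_{t}^{-i}|k_{t}^{i}) \quad\text{whenever } \mathbb{P}^{g}(h_{t}^{i}|k_{t}^{i})>0.
\end{align*}

The first line is the sufficient-statistic part (no dependence on $g^{-i}$), and the third line is the predictive-power part: conditioning on the full information $h_{t}^{i}$ yields the same belief as conditioning on its compression $k_{t}^{i}$. Together, the first two lines also show that (8) makes $H_{t}^{i}$ and $(X_{t},H_{t}^{-i})$ conditionally independent given $K_{t}^{i}$.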

3.3 Comparison

Using Lemma 1, it can be shown that if $K^{i}$ is USI for each $i\in\mathcal{I}$, then $K=(K^{i})_{i\in\mathcal{I}}$ is MSI. The converse is not true. The following example illustrates the difference between MSI and USI.

Example 3.2.

Consider a two-stage stateless (i.e. $X_{t}=\varnothing$) game of two players: Alice (A) moves first and Bob (B) moves afterwards. There is no initial information (i.e. $H_{1}^{A}=H_{1}^{B}=\varnothing$).

At time $t=1$, Alice chooses $U_{1}^{A}\in\{0,1\}$. The instantaneous rewards of both players are given by

\[
R_{1}^{A}=U_{1}^{A},\qquad R_{1}^{B}=-U_{1}^{A}. \tag{9}
\]

The new information of both Alice and Bob at time $1$ is $Z_{1}^{A}=Z_{1}^{B}=U_{1}^{A}$, i.e. Alice’s action is observed.

At time $t=2$, Bob chooses $U_{2}^{B}\in\{-1,1\}$. The instantaneous rewards of both players are given by

\[
R_{2}^{A}=U_{2}^{B},\qquad R_{2}^{B}=0. \tag{10}
\]

Set $K_{t}^{A}=H_{t}^{A}$ and $K_{t}^{B}=\varnothing$ for both $t\in\{1,2\}$. It can be shown that $K$ is mutually sufficient information. However, $K^{B}$ is not unilaterally sufficient information: We have $\mathbb{P}^{g}(h_{2}^{B}|k_{2}^{B})=\mathbb{P}^{g}(u_{1}^{A})=g_{1}^{A}(u_{1}^{A}|\varnothing)$, while the definition of USI requires that $\mathbb{P}^{g}(h_{2}^{B}|k_{2}^{B})=F_{t}^{B,g^{B}}(h_{2}^{B}|k_{2}^{B})$ for some function $F_{t}^{B,g^{B}}$ that does not depend on $g^{A}$.

4 Information-State Based Equilibrium

In this section, we formulate our results on MSI-based and USI-based equilibria for two equilibrium concepts: Bayes–Nash equilibria and sequential equilibria.

4.1 Information-State Based Bayes–Nash Equilibrium

Theorem 2.

If $K$ is mutually sufficient information, then there exists at least one $K$-based BNE.

Proof 4.1.

See Appendix C.2.

The main idea for the proof of Theorem 2 is the definition of a best-response correspondence through the dynamic program for an underlying single-agent control problem.

Theorem 3.

If $K=(K^{i})_{i\in\mathcal{I}}$ where $K^{i}$ is unilaterally sufficient information for player $i$, then the set of $K$-based BNE payoffs is the same as that of all BNE.

Proof 4.2.

See Appendix C.3.

The intuition behind Theorem 3 is that one can think of player $i$’s information that is not included in the unilaterally sufficient information $K_{t}^{i}$ as a private randomization device for player $i$: When player $i$ is using a strategy that depends on their information outside of $K_{t}^{i}$, it is as if they are using a randomized $K^{i}$-based strategy. The main idea for the proof of Theorem 3 is to show that for every BNE strategy profile $g$, player $i$ can switch to an “equivalent” randomized $K^{i}$-based strategy $\rho^{i}$ while maintaining the equilibrium and payoffs.\footnote{Besides the connection of USI to sufficient statistics, the idea behind the construction of the equivalent $K^{i}$-based strategy is also closely related to the idea of the Rao–Blackwell estimator [20], where a new estimator is obtained by taking the conditional expectation of the old estimator given the sufficient statistics.} The theorem then follows from iteratively switching the strategy of each player.
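To make the “private randomization device” reading concrete, one natural candidate for the equivalent strategy, in the spirit of the Rao–Blackwell analogy, averages $g^{i}$ over the conditional law of the full information given the compressed information. The display below is a sketch under that assumption; the subscripted $g_{t}^{i}$, $\rho_{t}^{i}$ and the dummy variable $\tilde{h}_{t}^{i}$ are notation introduced here, and the exact construction used in Appendix C.3 may differ in detail.

\[
\rho_{t}^{i}(u_{t}^{i}|k_{t}^{i})=\sum_{\tilde{h}_{t}^{i}\in\mathcal{H}_{t}^{i}} g_{t}^{i}(u_{t}^{i}|\tilde{h}_{t}^{i})\,F_{t}^{i,g^{i}}(\tilde{h}_{t}^{i}|k_{t}^{i}).
\]

Because $F_{t}^{i,g^{i}}$ does not depend on $g^{-i}$ when $K^{i}$ is USI, a strategy defined this way can be computed by player $i$ without knowing the other players’ strategies.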

Example 3.2 can also be used to illustrate that when $K$ is MSI but not USI, $K$-based BNE exist but $K$-based strategies do not attain all equilibrium payoffs.

Example 3.2 (Continued).

In this example, $K_{t}^{A}=H_{t}^{A}$, $K_{t}^{B}=\varnothing$ for $t=1,2$ is MSI. Furthermore, it can be shown that the following strategy profiles are BNE of the game: (E1) Alice plays $U_{1}^{A}=1$ at time 1 and Bob plays $U_{2}^{B}=1$ irrespective of Alice’s action at time 1; and (E2) Alice plays $U_{1}^{A}=0$ at time 1; Bob plays $U_{2}^{B}=1$ if $U_{1}^{A}=0$ and $U_{2}^{B}=-1$ if $U_{1}^{A}=1$. Equilibrium (E1) is a $K$-based equilibrium. However, (E2) cannot be attained by any $K$-based strategy profile for the following reason: In any $K$-based equilibrium, Bob plays the same mixed strategy irrespective of Alice’s action, so Alice’s best response is $U_{1}^{A}=1$ and Bob’s expected payoff at the end of the game is $-1$. At (E2), Bob’s expected payoff at the end of the game is $0$. Therefore, the payoff at (E2) cannot be attained by any $K$-based strategy profile.
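The payoff comparison in Example 3.2 can be checked by brute force. The following Python sketch enumerates pure-strategy profiles only (mixed strategies are not covered, so it illustrates rather than proves the claim); the function and variable names are illustrative and not part of the paper.

```python
from itertools import product

ALICE_ACTIONS = (0, 1)   # values of U_1^A
BOB_ACTIONS = (-1, 1)    # values of U_2^B
# A pure strategy of Bob is a pair (b0, b1): his action after U_1^A = 0 and 1.
BOB_STRATEGIES = list(product(BOB_ACTIONS, repeat=2))


def payoffs(a, bob):
    """Total payoffs (Alice, Bob) obtained by summing (9) and (10)."""
    b = bob[a]
    return a + b, -a


def is_bne(a, bob):
    """True if neither player has a profitable unilateral pure deviation."""
    alice_ok = all(payoffs(a, bob)[0] >= payoffs(d, bob)[0] for d in ALICE_ACTIONS)
    bob_ok = all(payoffs(a, bob)[1] >= payoffs(a, d)[1] for d in BOB_STRATEGIES)
    return alice_ok and bob_ok


pure_bne = [(a, bob) for a in ALICE_ACTIONS for bob in BOB_STRATEGIES
            if is_bne(a, bob)]
# K^B is empty, so a K-based strategy of Bob cannot react to Alice's action.
k_based_bne = [(a, bob) for a, bob in pure_bne if bob[0] == bob[1]]

all_payoffs = {payoffs(a, bob) for a, bob in pure_bne}
k_based_payoffs = {payoffs(a, bob) for a, bob in k_based_bne}
print("pure BNE payoffs:        ", sorted(all_payoffs))
print("K-based pure BNE payoffs:", sorted(k_based_payoffs))
```

In this enumeration, (E2) corresponds to the profile `(0, (1, -1))`, and its payoff pair is absent from `k_based_payoffs`, in line with the argument above.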

4.2 Information-State Based Sequential Equilibrium

Theorem 4.

If $K$ is mutually sufficient information, then there exists at least one $K$-based sequential equilibrium.

Proof 4.3.

See Appendix C.4.

The proof of Theorem 4 follows steps similar to those of Theorem 2. The difference is that we explicitly construct a sequence of conjectured history-action value functions $Q^{(n)}$ (as defined in Definition 2) using the dynamic program of player $i$’s decision problem. Then we argue that the strategies and the conjectures satisfy Definition 2.

Theorem 5.

If $K=(K^{i})_{i\in\mathcal{I}}$ where $K^{i}$ is unilaterally sufficient information for player $i$, then the set of $K$-based sequential equilibrium payoffs is the same as that of all sequential equilibria.

Proof 4.4.

See Appendix C.5.

The proof of Theorem 5 mostly follows the same ideas as that of Theorem 3: for each sequential equilibrium strategy profile $g$, we construct an “equivalent” $K^{i}$-based strategy $\rho^{i}$ for player $i$ with a construction similar to the one in Theorem 3. The critical part is to show that $\rho^{i}$ is still sequentially rational under the concept of sequential equilibrium.

5 Discussion

In this section, we first investigate whether USI can preserve the set of equilibrium payoffs achievable under perfect recall when refinements of BNE other than SE, namely, various versions of Perfect Bayesian Equilibrium (PBE), are considered. Then, we identify MSI and USI in specific models that have appeared in the literature.

5.1 Other Equilibrium Concepts

We first present Example 5.1 to show that the result of Theorem 5 is not true when we replace SE with the concept of weak Perfect Bayesian Equilibrium (wPBE) [25], which is a refinement of BNE that is weaker than SE. Then, we discuss how the result of Proposition 6, which is part of Example 5.1 and appears below, applies or does not apply to other versions of PBE, namely, those defined in [57] and [6].

The concept of wPBE is defined as follows: Let $(g,\mu)$ be an assessment, where $g$ is a behavioral strategy profile as specified in Section 2 and $\mu$ is a system of functions representing players’ beliefs in the extensive-form game representation. Then, $(g,\mu)$ is said to be a weak perfect Bayesian equilibrium [25] if $g$ is sequentially rational with respect to $\mu$ and $\mu$ satisfies Bayes’ rule with respect to $g$ on the equilibrium path. The concept of wPBE does not impose any restriction on beliefs off the equilibrium path.

Example 5.1.

Consider a two-stage game with two players: Bob (B) moves at stage 1; Alice (A) and Bob move simultaneously at stage 2. Let $X_{1}^{A},X_{1}^{B}$ be independent uniform random variables on $\{-1,+1\}$ representing the types of the players. The state satisfies $X_{1}=(X_{1}^{A},X_{1}^{B})$ and $X_{2}=X_{1}^{B}$. The action sets are $\mathcal{U}_{1}^{B}=\{-1,+1\}$ and $\mathcal{U}_{2}^{A}=\mathcal{U}_{2}^{B}=\{-1,0,+1\}$. The information structure is given by

\begin{align}
H_{1}^{A}&=X_{1}^{A},\quad H_{1}^{B}=X_{1}^{B}; \tag{11}\\
H_{2}^{A}&=(X_{1}^{A},U_{1}^{B}),\quad H_{2}^{B}=(X_{1}^{B},U_{1}^{B}), \tag{12}
\end{align}

i.e. types are private and actions are observable.

The instantaneous payoffs of Alice are given by

\[
R_{1}^{A}=\begin{cases}-1,&\text{if }U_{1}^{B}=-1;\\ 0,&\text{otherwise},\end{cases}\qquad R_{2}^{A}=\begin{cases}1,&\text{if }U_{2}^{A}=X_{2}\text{ or }U_{2}^{A}=0;\\ 0,&\text{otherwise}.\end{cases}
\]

The instantaneous payoffs of Bob are given by

\[
R_{1}^{B}=\begin{cases}0.2,&\text{if }U_{1}^{B}=-1;\\ 0,&\text{otherwise},\end{cases}\qquad R_{2}^{B}=\begin{cases}-1,&\text{if }U_{2}^{A}=U_{2}^{B};\\ 0,&\text{otherwise}.\end{cases}
\]

Define $K_{1}^{A}=X_{1}^{A}$ and $K_{2}^{A}=U_{1}^{B}$. It can be shown that $K^{A}$ is unilaterally sufficient information for Alice.\footnote{In fact, this example can be seen as an instance of the model described in Example 5.6, which we introduce later.} Set $K_{t}^{B}=H_{t}^{B}$, i.e. no compression for Bob’s information. Then, $K^{B}$ is trivially unilaterally sufficient information for Bob.
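The claim that $K^{A}$ is USI can be spot-checked numerically at stage 2. At that stage $h_{2}^{A}=(x_{1}^{A},u_{1}^{B})$, $h_{2}^{B}=(x_{1}^{B},u_{1}^{B})$ and $x_{2}=x_{1}^{B}$, so conditional on $k_{2}^{A}=u_{1}^{B}$ the joint law in (8) is determined by the pair $(x_{1}^{A},x_{1}^{B})$. The Python sketch below checks the product form of (8) for two arbitrarily chosen stage-1 strategies of Bob; the strategy values and all function names are hypothetical, and a check on sample strategies is not a proof.

```python
from itertools import product

TYPES = (-1, +1)  # possible values of X_1^A, X_1^B and U_1^B


def joint_stage2(bob_prob_plus):
    """Joint law of (X_1^A, X_1^B, U_1^B).

    bob_prob_plus[xB] is the probability that Bob plays U_1^B = +1 when
    his type is xB; the two types are independent and uniform.
    """
    joint = {}
    for xA, xB, u in product(TYPES, repeat=3):
        p_u = bob_prob_plus[xB] if u == +1 else 1.0 - bob_prob_plus[xB]
        joint[(xA, xB, u)] = 0.25 * p_u
    return joint


def condition_on_u(joint, u):
    """P(X_1^A, X_1^B | K_2^A = u), assuming u has positive probability."""
    mass = sum(p for (_, _, v), p in joint.items() if v == u)
    return {(xA, xB): joint[(xA, xB, u)] / mass
            for xA, xB in product(TYPES, repeat=2)}


def split(cond):
    """Return the marginals F (on X_1^A), Phi (on X_1^B) and a product check."""
    F = {xA: sum(cond[(xA, xB)] for xB in TYPES) for xA in TYPES}
    Phi = {xB: sum(cond[(xA, xB)] for xA in TYPES) for xB in TYPES}
    is_product = all(abs(cond[(xA, xB)] - F[xA] * Phi[xB]) < 1e-12
                     for xA, xB in product(TYPES, repeat=2))
    return F, Phi, is_product


# Two arbitrary stage-1 strategies for Bob (hypothetical numbers).
for bob in ({-1: 0.3, +1: 0.9}, {-1: 0.8, +1: 0.1}):
    for u in TYPES:
        F, Phi, ok = split(condition_on_u(joint_stage2(bob), u))
        print(bob, u, ok, F, Phi)
```

In these runs the factor on $X_{1}^{A}$ stays uniform whichever strategy Bob uses, while the factor on $X_{1}^{B}$ changes with Bob’s strategy, which matches the roles of $F_{t}^{i,g^{i}}$ and $\Phi_{t}^{i,g^{-i}}$ in (8). Alice does not act at stage 1, so her strategy cannot affect this distribution; at stage 1, $K_{1}^{A}=H_{1}^{A}$ and there is nothing to check.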

Proposition 6.

In the game defined in Example 5.1, the set of $K$-based wPBE payoffs is a proper subset of that of all wPBE payoffs.

Proof 5.2.

See Appendix D.1.

Note that since any wPBE is first and foremost a BNE, by Theorem 3, any wPBE payoff profile attained with general strategies can also be attained by a $K$-based BNE. However, Proposition 6 implies that there exists a wPBE payoff profile such that none of the $K$-based BNE attaining this payoff profile is a wPBE.

Intuitively, the reason that some wPBE payoff profiles cannot be attained by $K$-based wPBE in this example can be explained as follows. The state $X_{1}^{A}$ in this game can be thought of as a payoff-irrelevant private randomization device of Alice (i.e. a private coin flip) that should not play a role in the outcome of the game. However, under the concept of wPBE, the presence of $X_{1}^{A}$ allows Alice to implement off-equilibrium strategies that are otherwise not sequentially rational. This holds due to the following: For a fixed realization of $U_{1}^{B}$, the two realizations of $X_{1}^{A}$ give rise to two different information sets. Under the concept of wPBE, if the two information sets are both off the equilibrium path, Alice is allowed to form different beliefs and hence justify the use of different mixed actions under different realizations of $X_{1}^{A}$. Therefore, the presence of $X_{1}^{A}$ can expand Alice’s set of “justifiable” mixed actions off-equilibrium. When Alice is restricted to $K^{A}$-based strategies, i.e. her mixed action cannot depend on $X_{1}^{A}$, she loses the ability to use some mixed actions off-equilibrium in a “justifiable” manner, and hence loses her power to sustain certain equilibrium outcomes. This phenomenon, however, does not happen under the concept of sequential equilibrium, since SE (quite reasonably) would require Alice to use the same belief on two information sets if they only differ in the realization of $X_{1}^{A}$.

With similar approaches, one can establish the analogue of Proposition 6 for the perfect Bayesian equilibrium concept defined in [57] (which we refer to as “Watson’s PBE”). Simply put, this is because Watson’s PBE imposes conditions on the belief update separately for each pair of successive information states; there are no restrictions across different pairs of successive information states. As a result, for a fixed realization of $U_1^B$, Alice is allowed to form different beliefs under the two realizations of $X_1^A$, just as under wPBE, as long as both beliefs are reasonable on their own. In fact, in the proof of Proposition 6, the two off-equilibrium belief updates both satisfy Watson’s condition of plain consistency [57].

Approaches similar to those in the proof of Proposition 6, however, do not apply to the PBE concept defined with the independence property of conditional probability systems specified in [6] (which we refer to as “Battigalli’s PBE”). In fact, Battigalli’s PBE is equivalent to sequential equilibrium if the dynamic game consists of only two strategic players [6]. We conjecture that in general games with three or more players, if $K$ is USI, then the set of all $K$-based Battigalli’s PBE payoffs is the same as that of all Battigalli’s PBE payoffs. However, establishing this result can be difficult due to the complexity of Battigalli’s conditions.

5.2 Information States in Specific Models

In this section, we identify MSI and USI in specific game models studied in the literature. Whereas we recover some existing results using our framework, we also develop some new results.

Example 5.3.

Consider stateless dynamic games with observable actions, i.e. $X_t=\varnothing$, $H_1^i=\varnothing$, $Z_t^i=U_t$ for all $i\in\mathcal{I}$. One instance of such games is the class of repeated games [10]. In this game, $H_t^i=U_{1:t-1}$ for all $i\in\mathcal{I}$. Let $(\iota_t^0)_{t\in\mathcal{T}}$ be an arbitrary, common update function and let $K^i=K^0$ be generated from $(\iota_t^0)_{t\in\mathcal{T}}$. Then $K$ is mutually sufficient information since Lemma 1 is trivially satisfied. As a result, Theorem 2 holds for $K$, i.e. there exists at least one $K$-based BNE.

However, in general, $K$ is not unilaterally sufficient information. To see that, one can consider the case when player $j\neq i$ is using a strategy that chooses different mixed actions for different realizations of $U_{1:t-1}$. In this case $\mathbb{P}^{g^i,g^{-i}}(\tilde{k}_{t+1}^i|h_t^i,u_t^i)$ would potentially depend on $U_{1:t-1}$ as a whole. This means that $K^i$ is not an information state for player $i$ under $g^{-i}$, which violates Lemma C.4.

Furthermore, for $K$, the result of Theorem 3 does not necessarily hold, i.e. the set of $K$-based BNE payoffs may not be the same as that of all BNE. Example 3.2 can be used to show this.
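As a concrete illustration of how a common compressed state $K_t^0$ can be generated from the action history $U_{1:t-1}$, the following Python sketch folds an update function over the history. The two update rules shown (remember only the latest action profile, or keep a running count of one action) and all function names are hypothetical choices made for illustration; Example 5.3 allows an arbitrary common update function.

```python
from typing import Callable, Hashable, Sequence, Tuple

ActionProfile = Tuple[str, ...]  # one action per player, e.g. ("C", "D")


def compress(history: Sequence[ActionProfile],
             update: Callable[[Hashable, ActionProfile], Hashable],
             initial: Hashable = None) -> Hashable:
    """Fold a common update function over U_{1:t-1} to obtain K_t^0."""
    k = initial
    for u in history:
        k = update(k, u)
    return k


def last_profile(k, u):
    """Hypothetical update rule: keep only the most recent action profile."""
    return u


def count_defections(k, u):
    """Hypothetical update rule: running count of 'D' actions so far."""
    return (k or 0) + sum(a == "D" for a in u)
```

Two different histories can map to the same compressed state (for instance, any two histories ending in the same action profile under `last_profile`), in which case a $K$-based strategy must prescribe the same mixed action after both.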

Example 5.4.

The model of [26] is a special case of our dynamic game model where $Z_t^i=(X_{t+1},U_t)$, i.e. the (past and current) states and past actions are observable. In this case, $K=(K_t^i)_{t\in\mathcal{T},i\in\mathcal{I}}$ with $K_t^i=X_t$ is mutually sufficient information; note that $H_t^i=(X_{1:t},U_{1:t-1})$. Consider a $K^{-i}$-based strategy profile $\rho^{-i}$, i.e. $\rho_t^j\colon\mathcal{X}_t\mapsto\Delta(\mathcal{U}_t^j)$ for $t\in\mathcal{T}$, $j\in\mathcal{I}\backslash\{i\}$. We have

\begin{align}
\mathbb{P}^{g^i,\rho^{-i}}(\tilde{x}_t,\tilde{k}_t^{-i}|h_t^i) &= \mathbb{P}^{g^i,\rho^{-i}}(\tilde{x}_t,\tilde{k}_t^{-i}|x_{1:t},u_{1:t-1}) \tag{13}\\
&= \bm{1}_{\{\tilde{x}_t=x_t\}}\prod_{j\neq i}\bm{1}_{\{\tilde{k}_t^j=x_t\}} \tag{14}\\
&=: \Phi_t^{i,\rho^{-i}}(\tilde{x}_t,\tilde{k}_t^{-i}|x_t). \tag{15}
\end{align}

Hence $K$ is mutually sufficient information by Lemma 1. As a result, there exists at least one $K$-based BNE.

Similar to Example 5.3, in general, $K$ is not unilaterally sufficient information, and the set of $K$-based BNE payoffs may not be the same as that of all BNE. The argument for both claims can be carried out in an analogous way to Example 5.3.
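The degenerate belief in (14) can be spelled out in a few lines of Python. This is only a sketch: the representation of a history as a pair of sequences and the function names are assumptions made here, not part of the model of [26].

```python
from typing import Sequence, Tuple

History = Tuple[Sequence[str], Sequence[str]]  # (x_{1:t}, u_{1:t-1})


def compress(history: History) -> str:
    """K_t^i = X_t: keep only the current state."""
    states, _actions = history
    return states[-1]


def belief(x_tilde: str, k_others: Sequence[str], history: History) -> int:
    """Indicator in (14): 1 iff every guess equals the observed current state."""
    x_t = compress(history)
    return int(x_tilde == x_t and all(k == x_t for k in k_others))
```

Because `belief` reads the history only through `compress`, any two histories with the same current state yield the same belief, which is the dependence on $x_t$ alone recorded in (15).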

Example 5.5.

The model of [33] is a special case of our dynamic model satisfying the following conditions.

  (1) The information of each player $i$ can be separated into the common information $H_t^0$ and private information $L_t^i$, i.e. there exists a strategy-independent bijection between $H_t^i$ and $(H_t^0,L_t^i)$ for all $i\in\mathcal{I}$.

  (2) The common information $H_t^0$ can be sequentially updated, i.e.

    \begin{equation}
    H_{t+1}^0 = (H_t^0,Z_t^0), \tag{16}
    \end{equation}

    where $Z_t^0=\bigcap_{i\in\mathcal{I}}Z_t^i$ is the common part of the new information of all players at time $t$.

  (3) The private information $L_t^i$ can be sequentially updated, i.e. there exist functions $(\zeta_t^i)_{t=0}^{T-1}$ such that

    \begin{equation}
    L_{t+1}^i = \zeta_t^i(L_t^i,Z_t^i). \tag{17}
    \end{equation}

In [33], the authors impose the following assumption.

Assumption 2 (Strategy independence of beliefs).

There exists a function $P_t^0$ such that

\begin{equation}
\mathbb{P}^{g}(x_t,l_t|h_t^0) = P_t^0(x_t,l_t|h_t^0), \tag{18}
\end{equation}

for all behavioral strategy profiles $g$ whenever $\mathbb{P}^{g}(h_t^0)>0$, where $l_t=(l_t^i)_{i\in\mathcal{I}}$.

In this model, if we set $K_t^i=(\Pi_t,L_t^i)$ where $\Pi_t\in\Delta(\mathcal{X}_t\times\mathcal{S}_t)$ is a function of $H_t^0$ defined through

\begin{equation}
\Pi_t(x_t,l_t) := P_t^0(x_t,l_t|H_t^0), \tag{19}
\end{equation}

then $K=(K^i)_{i\in\mathcal{I}}$ is mutually sufficient information. First note that $K_t^i$ can be sequentially updated, since $\Pi_t$ can be sequentially updated using Bayes rule. Then

\begin{align}
\mathbb{P}^{g^i,\rho^{-i}}(\tilde{x}_t,\tilde{l}_t^{-i}|h_t^i) &= \mathbb{P}^{g^i,\rho^{-i}}(\tilde{x}_t,\tilde{l}_t^{-i}|h_t^0,l_t^i) \tag{20}\\
&= \dfrac{\mathbb{P}^{g^i,\rho^{-i}}(\tilde{x}_t,l_t^i,\tilde{l}_t^{-i}|h_t^0)}{\mathbb{P}^{g^i,\rho^{-i}}(l_t^i|h_t^0)} \tag{21}\\
&= \dfrac{P_t^0(\tilde{x}_t,(l_t^i,\tilde{l}_t^{-i})|h_t^0)}{\sum_{\hat{x}_t,\hat{l}_t^{-i}}P_t^0(\hat{x}_t,(l_t^i,\hat{l}_t^{-i})|h_t^0)} \tag{22}\\
&= \dfrac{\pi_t(\tilde{x}_t,(l_t^i,\tilde{l}_t^{-i}))}{\sum_{\hat{x}_t,\hat{l}_t^{-i}}\pi_t(\hat{x}_t,(l_t^i,\hat{l}_t^{-i}))} \tag{23}\\
&=: \tilde{\Phi}_t^{i,\rho^{-i}}(\tilde{x}_t,\tilde{l}_t^{-i}|k_t^i), \tag{24}
\end{align}

for some function $\tilde{\Phi}_t^{i,\rho^{-i}}$, where $\pi_t$ is the realization of $\Pi_t$ corresponding to $H_t^0=h_t^0$. In steps (21) and (22) we apply Bayes rule on the conditional probabilities given $h_t^0$, and we use Assumption 2 to express the belief with the strategy-independent function $P_t^0$.

Note that $K_t^{-i}$ is contained in the vector $(K_t^i,L_t^{-i})$, hence we conclude that

\begin{equation}
\mathbb{P}^{g^i,\rho^{-i}}(\tilde{x}_t,\tilde{k}_t^{-i}|h_t^i) =: \Phi_t^{i,\rho^{-i}}(\tilde{x}_t,\tilde{k}_t^{-i}|k_t^i), \tag{25}
\end{equation}

for some function $\Phi_t^{i,\rho^{-i}}$. By Lemma 1 we conclude that $K$ is mutually sufficient information. Therefore there exists at least one $K$-based BNE.

Similar to Examples 5.3 and 5.4, in general, $K$ is not unilaterally sufficient information, and the set of $K$-based BNE payoffs may not be the same as that of all BNE. The argument for both claims can be carried out in an analogous way to Examples 5.3 and 5.4.
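The conditioning step in (23) can be illustrated numerically. The sketch below assumes a hypothetical two-player instance in which the common belief $\pi_t$ is stored as a dictionary keyed by $(x_t,l_t^1,l_t^2)$; the data layout and the function name are illustrative assumptions, not part of the model of [33].

```python
from fractions import Fraction
from typing import Dict, Tuple

# Common belief pi_t over (x_t, l_t^1, l_t^2) for a hypothetical two-player game.
Belief = Dict[Tuple[str, str, str], Fraction]


def private_belief(pi: Belief, i: int, own_l: str) -> Dict[Tuple[str, str], Fraction]:
    """Condition pi_t on player i's private information l_t^i, as in (23).

    Returns a distribution over (x_t, l_t^{-i}). Players are indexed 1 and 2.
    """
    own_pos = i        # position of l_t^i in the key (position 0 holds x_t)
    other_pos = 3 - i  # position of l_t^{-i} in the key
    normalizer = sum(p for key, p in pi.items() if key[own_pos] == own_l)
    if normalizer == 0:
        raise ValueError("l_t^i has zero probability under pi_t")
    return {
        (key[0], key[other_pos]): p / normalizer
        for key, p in pi.items()
        if key[own_pos] == own_l
    }
```

The function takes only $(\pi_t, l_t^i)$ as input, i.e. the compressed information $k_t^i$, which mirrors the fact that the right-hand side of (24) depends on $h_t^i$ only through $k_t^i$.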

Example 5.6.

The following model is a variant of [36] and [56].

  • Each player $i$ is associated with a local state $X_t^i$, and $X_t=(X_t^i)_{i\in\mathcal{I}}$.

  • Each player $i$ is associated with a local noise process $W_t^i$, and $W_t=(W_t^i)_{i\in\mathcal{I}}$.

  • There is no initial information, i.e. $H_1^i=\varnothing$ for all $i\in\mathcal{I}$.

  • There is a public noisy observation $Y_t^i$ of the local state. The state transitions, observation processes, and reward generation processes satisfy the following:

    \begin{align}
    (X_{t+1}^i,Y_t^i) &= f_t^i(X_t^i,U_t,W_t^i),\quad\forall i\in\mathcal{I}, \tag{26}\\
    R_t^i &= r_t^i(X_t,U_t),\quad\forall i\in\mathcal{I}. \tag{27}
    \end{align}

  • The information player $i$ has at time $t$ is $H_t^i=(Y_{1:t-1},U_{1:t-1},X_{1:t}^i)$ for $i\in\mathcal{I}$, where $Y_t=(Y_t^i)_{i\in\mathcal{I}}$.

  • All the primitive random variables, i.e. the random variables in the collection $(X_1^i)_{i\in\mathcal{I}}\cup(W_t^i)_{i\in\mathcal{I},t\in\mathcal{T}}$, are mutually independent.

Proposition 7.

In the model of Example 5.6, $K_t^i = (Y_{1:t-1}, U_{1:t-1}, X_t^i)$ is unilaterally sufficient information.⁵

⁵ $K^i$-based strategies in this setting are closely related to the “strategies of type $s$” defined in [56]. In [56], the authors showed that strategy profiles of type $s$ can attain all equilibrium payoffs attainable by general strategy profiles. However, the authors did not show that strategy profiles of type $s$ can do so while being an equilibrium.

Proof 5.7.

See Appendix D.2.
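The compression map behind Proposition 7 keeps the public history and discards every local state except the most recent one. The following Python lines are a minimal sketch of that map; the tuple layout of the history and the name `compress_history` are illustrative assumptions, not notation from the paper.

```python
from typing import Sequence, Tuple

# Assumed container layout: H_t^i = (Y_{1:t-1}, U_{1:t-1}, X_{1:t}^i).
History = Tuple[Sequence, Sequence, Sequence]


def compress_history(h: History) -> Tuple[tuple, tuple, object]:
    """Map H_t^i to K_t^i = (Y_{1:t-1}, U_{1:t-1}, X_t^i)."""
    ys, us, xs = h
    if len(xs) != len(ys) + 1 or len(us) != len(ys):
        raise ValueError("expected t-1 observations/actions and t local states")
    # Keep the public part untouched; keep only the current local state.
    return tuple(ys), tuple(us), xs[-1]


# Example with t = 3: two past public observations, two past joint actions,
# three local states. Only the last local state survives the compression.
h3 = ([(0, 1), (1, 1)], [("a", "b"), ("b", "b")], [5, 7, 2])
k3 = compress_history(h3)
```

The map does not depend on any strategy, which is the defining feature of the compressions that lead to MSI and USI.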

Finally, we note that the concept of USI is useful in the context of games among teams as well. We omit the details of the following example due to its complicated nature.

Example 5.8.

In the model of games among teams with delayed intra-team information sharing analyzed in [51], the authors defined the notion of sufficient private information (SPI). It can be shown (through the arguments in [51, Section 4.3] and [50, Chapters 4.6.1 and 4.6.2]) that $K_t^i = (H_t^0, S_t^i)$, which consists of the common information $H_t^0$ and the SPI $S_t^i$, is unilaterally sufficient information.

6 An Open Problem

Identifying strategy-dependent compression maps that guarantee existence of at least one equilibrium (BNE or SE) or maintain all equilibria that exist under perfect recall is an open problem.

One known strategy-dependent compression map first compresses each agent’s private information separately (resulting in “sufficient private information”) and then compresses the agents’ common information (resulting in “common information based (CIB) beliefs” on the system state and the agents’ sufficient private information) [35, 36, 56, 51, 37]. Such a compression does not possess any of the properties of the strategy-independent compression maps that result in MSI or USI. The following example presents a game where belief-based equilibria, i.e. equilibrium strategy profiles based on the above-described compression, do not exist.

Example 6.1.

Consider the following two-stage zero-sum game. The players are Alice (A) and Bob (B). Alice acts at stage $t=1$ and Bob at stage $t=2$. The game’s initial state $X_1$ is distributed uniformly at random on $\{-1,+1\}$. Let $H_t^A, H_t^B$ denote Alice’s and Bob’s information at stage $t$, and $U_t^A, U_t^B$ denote Alice’s and Bob’s actions at stage $t$, $t=1,2$. We assume that $H_1^A = X_1, H_1^B = \varnothing$, i.e. Alice knows $X_1$ and Bob does not.
At stage $t=1$, Alice chooses $U_1^A \in \{-1,1\}$, and the state transition is given by $X_2 = X_1 \cdot U_1^A$. At stage $t=2$, we assume that $H_2^A = (X_{1:2}, U_1^A)$ and $H_2^B = U_1^A$, i.e. Bob observes Alice’s action but not the state before or after Alice’s action. At time $t=2$, Bob picks an action $U_2^B \in \{\mathrm{U}, \mathrm{D}\}$. Alice’s instantaneous rewards are given by

$$R_1^A = \begin{cases} c & \text{if } U_1^A = +1; \\ 0 & \text{if } U_1^A = -1, \end{cases} \qquad \text{and} \qquad R_2^A = \begin{cases} 2 & \text{if } X_2 = +1, U_2^B = \mathrm{U}; \\ 1 & \text{if } X_2 = -1, U_2^B = \mathrm{D}; \\ 0 & \text{otherwise}, \end{cases} \tag{28}$$

where $c \in (0, 1/3)$. The stage reward for Bob is $R_t^B = -R_t^A$ for $t = 1, 2$.

The above game is a signaling game which can be represented in extensive form as in Figure 2.

[Figure 2 (game tree): Nature (N) draws $X_1 \in \{+1, -1\}$ with probability 0.5 each; Alice then chooses $U_1^A \in \{-1, +1\}$; Bob, who observes only $U_1^A$, chooses U or D. Each leaf carries the payoff pair (Alice, Bob) obtained by summing the stage rewards in (28).]
Figure 2: Extensive form of the game in Example 6.1.
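The leaf payoffs of the extensive form can be recomputed directly from the transition rule $X_2 = X_1 \cdot U_1^A$ and the rewards in (28). The sketch below enumerates all eight play paths; the function name `alice_total_reward` and the choice $c = 1/4$ are illustrative assumptions (any $c \in (0, 1/3)$ would do).

```python
from fractions import Fraction
from itertools import product


def alice_total_reward(x1, u1, u2, c):
    """Alice's total reward R_1^A + R_2^A from (28); Bob receives the negative."""
    x2 = x1 * u1                    # state transition X_2 = X_1 * U_1^A
    r1 = c if u1 == +1 else 0       # stage-1 reward depends only on Alice's action
    if x2 == +1 and u2 == "U":
        r2 = 2
    elif x2 == -1 and u2 == "D":
        r2 = 1
    else:
        r2 = 0
    return r1 + r2


c = Fraction(1, 4)  # assumed value, chosen only for illustration
payoffs = {
    (x1, u1, u2): alice_total_reward(x1, u1, u2, c)
    for x1, u1, u2 in product((+1, -1), (+1, -1), ("U", "D"))
}
for path, value in payoffs.items():
    print(path, (value, -value))    # (Alice, Bob) payoff pair at each leaf
```

Each key is a triple $(X_1, U_1^A, U_2^B)$, so the eight entries correspond to the eight leaves of the game tree.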

In order to define the concept of belief-based equilibrium for this game, we specify the common information $H_t^0$, along with Alice’s and Bob’s private information, denoted by $L_t^A, L_t^B$, respectively, for $t = 1, 2$ as follows:

$$H_1^0 = \varnothing, \quad L_1^A = X_1, \quad L_1^B = \varnothing, \tag{29}$$
$$H_2^0 = U_1^A, \quad L_2^A = X_2, \quad L_2^B = \varnothing. \tag{30}$$

We prove the following result.

Proposition 8.

In the game of Example 6.1 belief-based equilibria do not exist.

Proof 6.2.

See Appendix D.3.
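To see why the compression in this example is strategy-dependent, it helps to compute a stage-2 belief explicitly. The sketch below assumes that the CIB belief at stage 2 is the conditional distribution of $X_2$ given the common information $U_1^A$ under Alice's stage-1 strategy; the function name and the two strategies are hypothetical illustrations, and the code is not a substitute for the proof in Appendix D.3.

```python
from fractions import Fraction


def cib_belief_stage2(g1, u1):
    """Posterior of X_2 given U_1^A = u1 when Alice plays g1[x1][u] at stage 1.

    Returns None when u1 has probability zero, where Bayes' rule is silent.
    """
    prior = {+1: Fraction(1, 2), -1: Fraction(1, 2)}   # X_1 uniform on {-1, +1}
    joint = {+1: Fraction(0), -1: Fraction(0)}         # P(X_2 = x2, U_1^A = u1)
    for x1, p in prior.items():
        joint[x1 * u1] += p * g1[x1][u1]               # X_2 = X_1 * U_1^A
    total = sum(joint.values())
    if total == 0:
        return None
    return {x2: q / total for x2, q in joint.items()}


# Two hypothetical stage-1 strategies for Alice.
always_minus = {x1: {+1: Fraction(0), -1: Fraction(1)} for x1 in (+1, -1)}
match_state = {+1: {+1: Fraction(1), -1: Fraction(0)},
               -1: {+1: Fraction(0), -1: Fraction(1)}}

b_pool = cib_belief_stage2(always_minus, -1)   # action reveals nothing about X_2
b_sep = cib_belief_stage2(match_state, -1)     # action reveals X_1 = -1, so X_2 = +1
b_off = cib_belief_stage2(always_minus, +1)    # action never played under this strategy
```

The same observed action leads to different beliefs under different strategies, and to no well-defined belief off the equilibrium path, which is the kind of dependence the strategy-independent maps behind MSI and USI avoid.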

7 Conclusion

In this paper, we investigated sufficient conditions for strategy-independent compression maps to be viable in dynamic games. Motivated by the literature on information states for control problems [23, 24, 48], we provided two notions of information state for dynamic games, both resulting from strategy-independent information compression maps, namely mutually sufficient information (MSI) and unilaterally sufficient information (USI). While MSI guarantees the existence of compression-based equilibria, USI guarantees that compression-based equilibria can attain all equilibrium payoff profiles that are achieved when all agents have perfect recall. We established the results under both the concepts of Bayes-Nash equilibrium and sequential equilibrium. We discussed how USI does not guarantee the preservation of payoff profiles under certain other equilibrium refinements. We considered a strategy-dependent compression map that results in sufficient private information for each agent, along with a CIB belief. We showed, by an example, that this information compression map does not possess any of the properties of the strategy-independent compression maps that result in MSI or USI.

The discovery of strategy-dependent information compression maps that lead to results similar to those of Theorems 2 and 4 or to those of Theorems 3 and 5 is a challenging open problem of paramount importance. Another important open problem is the discovery of information compression maps under which certain subsets of equilibrium payoff profiles are attained when strategies based on the resulting compressed information are used. The results of this paper have been derived for finite-horizon finite games. Extending the results to infinite-horizon games and to games with continuous action and state spaces poses further interesting technical problems.

Author Contributions

This work is a collaborative intellectual effort of the three authors, with Dengwang Tang being the leader. Due to the interconnected nature of the results, it is impossible to separate the contributions of each author.

Funding

This work is supported by the National Science Foundation (NSF) under Grants No. ECCS 1750041, ECCS 2025732, ECCS 2038416, ECCS 1608361, CCF 2008130, and CMMI 2240981, by the Army Research Office (ARO) under Award No. W911NF-17-1-0232, and by Michigan Institute for Data Science (MIDAS) Sponsorship Funds by General Dynamics.

Data Availability

Not applicable since all results in this paper are theoretical.

Declarations

Conflict of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical Approval

Not applicable since no experiments are involved in this work.

Appendix A Information State of Single-Agent Control Problems

In this section we consider single-agent Markov Decision Processes (MDPs) and develop auxiliary results. This section is a recap of [24] with more detailed results and proofs.

Let $X_t$ be a controlled Markov Chain controlled by action $U_t$ with initial distribution $\nu_1 \in \Delta(\mathcal{X}_1)$ and transition kernels $P = (P_t)_{t \in \mathcal{T}}, P_t\colon \mathcal{X}_t \times \mathcal{U}_t \mapsto \Delta(\mathcal{X}_{t+1})$. Let $r = (r_t)_{t \in \mathcal{T}}, r_t\colon \mathcal{X}_t \times \mathcal{U}_t \mapsto \mathbb{R}$ be a collection of instantaneous reward functions. An MDP is denoted by a tuple $(\nu_1, P, r)$.

For a Markov strategy $g = (g_t)_{t \in \mathcal{T}}, g_t\colon \mathcal{X}_t \mapsto \Delta(\mathcal{U}_t)$, we use $\mathbb{P}^{g,\nu_1,P}$ and $\mathbb{E}^{g,\nu_1,P}$ to denote the probabilities of events and expectations of random variables under the distribution specified by controlled Markov Chain $(\nu_1, P)$ and strategy $g$. When $(\nu_1, P)$ is fixed and clear from the context, we use $\mathbb{P}^g$ and $\mathbb{E}^g$ respectively.

Define the total expected reward in the MDP $(\nu_1, P, r)$ under strategy $g$ by

$$J(g; \nu_1, P, r) := \mathbb{E}^{g,\nu_1,P}\left[\sum_{t=1}^{T} r_t(X_t, U_t)\right]. \tag{31}$$

Define the value function and state-action quality function by

$$V_\tau(x_\tau; P, r) := \max_{g_{\tau:T}} \mathbb{E}^{g_{\tau:T},P}\left[\sum_{t=\tau}^{T} r_t(X_t, U_t) \,\Big|\, x_\tau\right], \qquad \forall \tau \in [T+1], \tag{32}$$
$$Q_\tau(x_\tau, u_\tau; P, r) := r_\tau(x_\tau, u_\tau) + \sum_{\tilde{x}_{\tau+1}} V_{\tau+1}(\tilde{x}_{\tau+1}) P_\tau(\tilde{x}_{\tau+1} | x_\tau, u_\tau), \qquad \forall \tau \in [T]. \tag{33}$$

Note that $V_{T+1}(\cdot\,; P, r) \equiv 0$.
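For a finite MDP, the functions in (32) and (33) can be computed by backward induction, using the standard dynamic programming identity $V_\tau(x_\tau) = \max_{u_\tau} Q_\tau(x_\tau, u_\tau)$ together with $V_{T+1} \equiv 0$. The sketch below assumes, for brevity, state and action sets that do not change with time; the dictionary layout and all names are illustrative choices rather than the paper's.

```python
def backward_induction(states, actions, P, r, T):
    """Return (V, Q) for t = 1..T, with V[T + 1] identically zero.

    P[t][(x, u)] is a dict next_state -> probability; r[t][(x, u)] is a number.
    """
    V = {T + 1: {x: 0.0 for x in states}}
    Q = {}
    for t in range(T, 0, -1):
        # Equation (33): immediate reward plus expected continuation value.
        Q[t] = {
            (x, u): r[t][(x, u)]
            + sum(p * V[t + 1][y] for y, p in P[t][(x, u)].items())
            for x in states for u in actions
        }
        # Bellman optimality: maximize the quality function over actions.
        V[t] = {x: max(Q[t][(x, u)] for u in actions) for x in states}
    return V, Q


# Toy instance: a bit that can be kept or flipped, reward equal to the bit.
states, actions, T = (0, 1), ("stay", "flip"), 2
step = {t: {(x, u): {(x if u == "stay" else 1 - x): 1.0}
            for x in states for u in actions} for t in range(1, T + 1)}
reward = {t: {(x, u): float(x) for x in states for u in actions}
          for t in range(1, T + 1)}
V, Q = backward_induction(states, actions, step, reward, T)
```

In the toy instance, starting from bit 0 it is optimal to flip at the first stage and then stay.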

Definition 9.

[24] Let $K_t = \Psi_t(X_t)$ for some function $\Psi_t$. Then, $K_t$ is called an information state for $(P, r)$ if there exist functions $P_t^K\colon \mathcal{K}_t \times \mathcal{U}_t \mapsto \Delta(\mathcal{K}_{t+1}), r_t^K\colon \mathcal{K}_t \times \mathcal{U}_t \mapsto \mathbb{R}$ such that

  (1) $P_t(k_{t+1} | x_t, u_t) = P_t^K(k_{t+1} | \Psi_t(x_t), u_t)$; and

  (2) $r_t(x_t, u_t) = r_t^K(\Psi_t(x_t), u_t)$.

If $K_t$ is an information state, then $K_t$ is also a controlled Markov Chain with initial distribution $\nu_1^K \in \Delta(\mathcal{K}_1)$ and transition kernel $P^K = (P_t^K)_{t \in \mathcal{T}}$, where

$$\nu_1^K(k_1) = \sum_{x_1} \bm{1}_{\{k_1 = \Psi_1(x_1)\}} \nu_1(x_1).$$

The tuple $(\nu_1^K, P^K, r^K)$ defines a new MDP. For a $K$-based strategy $\rho = (\rho_t)_{t \in \mathcal{T}}, \rho_t\colon \mathcal{K}_t \mapsto \Delta(\mathcal{U}_t)$, the $J, V, Q$ functions can be defined as above for the new MDP.
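The two conditions of Definition 9 and the construction of the compressed MDP can be checked mechanically for a finite model. The following sketch assumes a time-invariant kernel, reward, and compression map to keep the code short; the name `compress_mdp` and the toy model are hypothetical.

```python
def compress_mdp(states, actions, P, r, psi, nu1, tol=1e-12):
    """Build (nu1_K, P_K, r_K), or raise if psi violates Definition 9.

    P[(x, u)] maps next state -> probability, r[(x, u)] is a number,
    psi maps a state to its compressed value, nu1 maps state -> probability.
    """
    P_K, r_K = {}, {}
    for x in states:
        for u in actions:
            # Distribution of the next compressed state given (x, u).
            pushed = {}
            for y, p in P[(x, u)].items():
                pushed[psi(y)] = pushed.get(psi(y), 0.0) + p
            key = (psi(x), u)
            if key not in P_K:
                P_K[key], r_K[key] = pushed, r[(x, u)]
                continue
            # Conditions (1) and (2): states sharing a psi-value must agree.
            ks = set(pushed) | set(P_K[key])
            same_kernel = all(
                abs(pushed.get(k, 0.0) - P_K[key].get(k, 0.0)) <= tol for k in ks
            )
            if not same_kernel or abs(r[(x, u)] - r_K[key]) > tol:
                raise ValueError(f"psi is not an information state at {key}")
    # Initial distribution of K_1 as the pushforward of nu1 through psi.
    nu1_K = {}
    for x, p in nu1.items():
        nu1_K[psi(x)] = nu1_K.get(psi(x), 0.0) + p
    return nu1_K, P_K, r_K


# Toy model: state (a, b), where b is noise resampled uniformly at each step.
actions = ("stay", "flip")
states = [(a, b) for a in (0, 1) for b in (0, 1)]
P, r = {}, {}
for (a, b) in states:
    for u in actions:
        a_next = a if u == "stay" else 1 - a
        P[((a, b), u)] = {(a_next, 0): 0.5, (a_next, 1): 0.5}
        r[((a, b), u)] = float(a)
nu1 = {x: 0.25 for x in states}
nu1_K, P_K, r_K = compress_mdp(states, actions, P, r, lambda x: x[0], nu1)
```

Keeping the coordinate $a$ passes the check, whereas keeping only the noise coordinate $b$ fails condition (2) because the reward depends on $a$.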

We state the following standard result (see, for example, Section 2 of [48]).

Lemma A.1.

Let $K_t = \Psi_t(X_t)$ be an information state for $(P, r)$. Then

  (1) $V_t(x_t; P, r) = V_t(\Psi_t(x_t); P^K, r^K)$ for all $x_t$;

  (2) $Q_t(x_t, u_t; P, r) = Q_t(\Psi_t(x_t), u_t; P^K, r^K)$ for all $x_t, u_t$.
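Lemma A.1 can be sanity-checked numerically on a small instance by solving the original and the compressed MDP separately and comparing the results. The sketch below is one such check under assumed, time-invariant toy dynamics (a controllable bit $a$ plus an irrelevant noise bit $b$, with a flip cost of 0.25); none of these modelling choices come from the paper.

```python
def solve(states, actions, P, r, T):
    """Finite-horizon backward induction with time-invariant P and r."""
    V = {T + 1: {x: 0.0 for x in states}}
    Q = {}
    for t in range(T, 0, -1):
        Q[t] = {(x, u): r[(x, u)]
                + sum(p * V[t + 1][y] for y, p in P[(x, u)].items())
                for x in states for u in actions}
        V[t] = {x: max(Q[t][(x, u)] for u in actions) for x in states}
    return V, Q


def nxt(a, u):
    """Next value of the controllable bit."""
    return a if u == "stay" else 1 - a


def rew(a, u):
    """Reward: the bit itself, minus a cost for flipping."""
    return float(a) - (0.25 if u == "flip" else 0.0)


actions, T = ("stay", "flip"), 3

# Original MDP on X = (a, b); b is uniformly resampled noise.
X = [(a, b) for a in (0, 1) for b in (0, 1)]
P = {((a, b), u): {(nxt(a, u), 0): 0.5, (nxt(a, u), 1): 0.5}
     for (a, b) in X for u in actions}
r = {((a, b), u): rew(a, u) for (a, b) in X for u in actions}

# Compressed MDP on K = a, i.e. Psi(a, b) = a.
K = [0, 1]
P_K = {(a, u): {nxt(a, u): 1.0} for a in K for u in actions}
r_K = {(a, u): rew(a, u) for a in K for u in actions}

V, Q = solve(X, actions, P, r, T)
V_K, Q_K = solve(K, actions, P_K, r_K, T)

values_match = all(abs(V[t][x] - V_K[t][x[0]]) < 1e-12
                   for t in range(1, T + 2) for x in X)
q_match = all(abs(Q[t][(x, u)] - Q_K[t][(x[0], u)]) < 1e-12
              for t in range(1, T + 1) for x in X for u in actions)
```

Both flags come out true for this instance, in line with parts (1) and (2) of the lemma.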

Definition 10.

Let $g$ be a Markov strategy. A $K$-based strategy $\rho$ is said to be associated with $g$ if

$$\rho_t(k_t) = \mathbb{E}^{g,\nu_1,P}[g_t(X_t) | k_t], \tag{34}$$

whenever $\mathbb{P}^{g,\nu_1,P}(k_t) > 0$.
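For finite models, the associated strategy in (34) can be computed by propagating the marginal distribution of $X_t$ under $g$ forward in time and averaging $g_t$ within each level set of $\Psi_t$. The sketch below assumes a time-invariant kernel and compression map; the function name, data layout, and toy model are illustrative assumptions.

```python
def associated_strategy(actions, P, nu1, g, psi, T):
    """Return rho with rho[t][k][u] = E^g[g_t(u | X_t) | K_t = k] where defined.

    g[t][x][u] is the probability of action u in state x at time t;
    P[(x, u)] maps next state -> probability (time-invariant for brevity).
    Values k with zero probability under g are left out of rho[t].
    """
    mu = dict(nu1)  # marginal distribution of X_t under g
    rho = {}
    for t in range(1, T + 1):
        mass, mix = {}, {}
        for x, px in mu.items():
            k = psi(x)
            mass[k] = mass.get(k, 0.0) + px
            row = mix.setdefault(k, {u: 0.0 for u in actions})
            for u in actions:
                row[u] += px * g[t][x][u]
        rho[t] = {k: {u: mix[k][u] / mass[k] for u in actions}
                  for k in mass if mass[k] > 0}
        # Propagate the state marginal one step forward under g.
        nxt = {}
        for x, px in mu.items():
            for u in actions:
                for y, p in P[(x, u)].items():
                    nxt[y] = nxt.get(y, 0.0) + px * g[t][x][u] * p
        mu = nxt
    return rho


# Toy model: state (a, b) with uniformly resampled noise b; g flips iff b == 0.
actions, T = ("stay", "flip"), 2
states = [(a, b) for a in (0, 1) for b in (0, 1)]
P = {((a, b), u): {((a if u == "stay" else 1 - a), 0): 0.5,
                   ((a if u == "stay" else 1 - a), 1): 0.5}
     for (a, b) in states for u in actions}
nu1 = {(0, 0): 0.375, (0, 1): 0.375, (1, 0): 0.125, (1, 1): 0.125}
g = {t: {(a, b): ({"flip": 1.0, "stay": 0.0} if b == 0
                  else {"flip": 0.0, "stay": 1.0})
         for (a, b) in states} for t in range(1, T + 1)}
rho = associated_strategy(actions, P, nu1, g, lambda x: x[0], T)
```

In this toy model $g$ is deterministic in the full state, yet the associated $K$-based strategy randomizes evenly, because the discarded coordinate $b$ is uniform given $a$.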

The following lemma will be used in the proofs in Appendix C.

Lemma A.2 (Policy Equivalence Lemma).

Let $(\nu_1, P, r)$ be an MDP. Let $K_t$ be an information state for $(P, r)$. If a $K$-based strategy $\rho$ is associated with a Markov strategy $g$, then

  (1) $\mathbb{P}^{g,\nu_1,P}(k_t) = \mathbb{P}^{\rho,\nu_1,P}(k_t)$ for all $k_t \in \mathcal{K}_t$ and $t \in \mathcal{T}$;

  (2) $J(g; \nu_1, P, r) = J(\rho; \nu_1, P, r)$.

Proof A.3.

In this proof, all probabilities and expectations are assumed to be defined with $(\nu_1, P)$. Given a Markov strategy $g$, let $\rho$ be an information state-based strategy that satisfies (34).

First, we have

$$\mathbb{P}^{g}(u_t \mid k_t) = \mathbb{E}^{g}\left[g_t(u_t \mid X_t) \mid k_t\right] = \rho_t(u_t \mid k_t), \tag{35}$$

for all $k_t$ such that $\mathbb{P}^{g}(k_t) > 0$.

  (1) Proof by induction:

    Induction Base: We have $\mathbb{P}^{g}(k_1) = \mathbb{P}^{\rho}(k_1)$ since the distribution of $K_1 = \Psi_1(X_1)$ is strategy-independent.

    Induction Step: Suppose that

    $$\mathbb{P}^{g}(k_t) = \mathbb{P}^{\rho}(k_t), \tag{36}$$

    for all $k_t \in \mathcal{K}_t$. We prove the result for time $t+1$. Combining (35) and (36), and incorporating the information state transition kernel $P_t^K$ defined in Definition 9, we have

    \begin{align}
    \mathbb{P}^{g}(k_{t+1}) &= \sum_{\tilde{k}_t, \tilde{u}_t} \mathbb{P}^{g}(k_{t+1} \mid \tilde{k}_t, \tilde{u}_t)\, \mathbb{P}^{g}(\tilde{u}_t \mid \tilde{k}_t)\, \mathbb{P}^{g}(\tilde{k}_t) \tag{37} \\
    &= \sum_{\tilde{k}_t, \tilde{u}_t} P_t^K(k_{t+1} \mid \tilde{k}_t, \tilde{u}_t)\, \rho_t(\tilde{u}_t \mid \tilde{k}_t)\, \mathbb{P}^{\rho}(\tilde{k}_t) \tag{38} \\
    &= \mathbb{P}^{\rho}(k_{t+1}). \tag{39}
    \end{align}

    Therefore we have established the induction step.

  (2) Using (35) and (36) along with the result of part (1), we obtain

    \begin{align}
    \mathbb{E}^{g}[r_t(X_t, U_t)] &= \mathbb{E}^{g}[r_t^K(K_t, U_t)] \tag{40} \\
    &= \sum_{\tilde{k}_t, \tilde{u}_t} r_t^K(\tilde{k}_t, \tilde{u}_t)\, \mathbb{P}^{g}(\tilde{u}_t \mid \tilde{k}_t)\, \mathbb{P}^{g}(\tilde{k}_t) \tag{41} \\
    &= \sum_{\tilde{k}_t, \tilde{u}_t} r_t^K(\tilde{k}_t, \tilde{u}_t)\, \rho_t(\tilde{u}_t \mid \tilde{k}_t)\, \mathbb{P}^{\rho}(\tilde{k}_t) \tag{42} \\
    &= \mathbb{E}^{\rho}[r_t(X_t, U_t)], \tag{43}
    \end{align}

    for each $t \in \mathcal{T}$. The result then follows from linearity of expectation.

This concludes the proof.
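As a numerical sanity check of Lemma A.2, the following minimal sketch builds a small random MDP and compares the two strategies. Everything in it is an illustrative assumption rather than part of the model above: the state is taken to be $X_t = (K_t, N_t)$ with a nuisance component $N_t$, the compression map is $\Psi(k, n) = k$, the kernel $P^K$ and reward $r^K$ depend on $(k, u)$ only, and all sizes and names (`PK`, `PN`, `rK`, `associated_rho`, `evaluate`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, nK, nN, nU = 4, 2, 3, 2  # horizon and set sizes (hypothetical)

def random_dist(*shape):
    """Random strictly positive distributions along the last axis."""
    p = rng.random(shape) + 0.1
    return p / p.sum(axis=-1, keepdims=True)

# State X_t = (K_t, N_t); the assumed compression map is Psi(k, n) = k.
nu1 = random_dist(nK * nN).reshape(nK, nN)  # initial distribution of X_1
PK = random_dist(nK, nU, nK)                # P^K(k' | k, u)
PN = random_dist(nK, nN, nU, nN)            # nuisance kernel P(n' | k, n, u)
rK = rng.uniform(-1.0, 1.0, (T, nK, nU))    # r_t(x, u) = r^K_t(k, u)
g = random_dist(T, nK, nN, nU)              # Markov strategy g_t(u | k, n)

def step(mu, policy_t):
    """One forward step: joint P(k, n, u) and the next-state distribution."""
    joint = mu[:, :, None] * policy_t
    nxt = np.einsum("knu,kuj,knum->jm", joint, PK, PN)
    return joint, nxt

def associated_rho(g):
    """rho_t(u | k) = E^g[g_t(u | X_t) | k], as in (34)."""
    mu, rho = nu1, np.zeros((T, nK, nU))
    for t in range(T):
        joint, nxt = step(mu, g[t])
        # All distributions are strictly positive, so P^g(k_t) > 0 here.
        rho[t] = joint.sum(axis=1) / mu.sum(axis=1)[:, None]
        mu = nxt
    return rho

def evaluate(policy):
    """Marginals P(k_t) for every t and the total expected reward J."""
    mu, marginals, J = nu1, [], 0.0
    for t in range(T):
        joint, nxt = step(mu, policy[t])
        marginals.append(mu.sum(axis=1))
        J += np.einsum("knu,ku->", joint, rK[t])
        mu = nxt
    return np.array(marginals), float(J)

rho = associated_rho(g)
# Lift rho to a policy on the full state that ignores the nuisance component.
rho_full = np.broadcast_to(rho[:, :, None, :], (T, nK, nN, nU))

marg_g, J_g = evaluate(g)
marg_rho, J_rho = evaluate(rho_full)
# Both differences should vanish up to floating-point error.
print(np.max(np.abs(marg_g - marg_rho)), abs(J_g - J_rho))
```

Under these assumptions the marginals of $K_t$ and the total rewards agree, matching parts (1) and (2) of the lemma, even though the distribution of the nuisance component $N_t$ generally differs between the two strategies.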

Appendix B Alternative Characterizations of Sequential Equilibria

This section deals with the game model introduced in Section 2. We provide three alternative definitions of sequential equilibria that are equivalent to the original one given by [21]. These definitions help simplify some of the proofs in Appendix C.

We would like to note that several alternative definitions of sequential equilibria are also given in [21, 16]. The definition of weak perfect equilibrium in Proposition 6 of [21] is close to our definitions in spirit in terms of using sequences of payoff functions instead of beliefs as a vehicle to define sequential rationality.

Notice that, fixing the behavioral strategies $g^{-i}$ of players other than player $i$, player $i$'s best response problem (at every information set) can be considered as a Markov Decision Process with state $H_t^i$ and action $U_t^i$, where the transition kernels and instantaneous reward functions depend on $g^{-i}$. Inspired by this observation, we introduce an alternative definition of sequential equilibrium for our model, where we form conjectures of transition kernels and reward functions instead of forming beliefs on nodes. This allows for a more compact representation of the appraisals and beliefs of players. We will later show that this alternative definition is equivalent to the classical definition of sequential equilibrium in [21].

For player $i \in \mathcal{I}$, let $P^i = (P_t^i)_{t \in \mathcal{T} \backslash \{T\}}$, $P_t^i \colon \mathcal{H}_t^i \times \mathcal{U}_t^i \mapsto \Delta(\mathcal{Z}_t^i)$ and $r^i = (r_t^i)_{t \in \mathcal{T}}$, $r_t^i \colon \mathcal{H}_t^i \times \mathcal{U}_t^i \mapsto [-1, 1]$ be collections of functions that represent conjectures of transition kernels and instantaneous reward functions. For a behavioral strategy $g^i$, define the reward-to-go function $J_t^i$ recursively through

\begin{align}
& J_T^i(g_T^i; h_T^i, P^i, r^i) := \sum_{\tilde{u}_T^i} r_T^i(h_T^i, \tilde{u}_T^i)\, g_T^i(\tilde{u}_T^i \mid h_T^i); \tag{44a} \\
& J_t^i(g_{t:T}^i; h_t^i, P^i, r^i) \tag{44b} \\
& \quad := \sum_{\tilde{u}_t^i} \left[ r_t^i(h_t^i, \tilde{u}_t^i) + \sum_{\tilde{z}_t^i} J_{t+1}^i(g_{t+1:T}^i; (h_t^i, \tilde{z}_t^i), P^i, r^i)\, P_t^i(\tilde{z}_t^i \mid h_t^i, \tilde{u}_t^i) \right] g_t^i(\tilde{u}_t^i \mid h_t^i). \tag{44c}
\end{align}

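The recursion (44a)-(44c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the horizon `T = 2`, the action and increment sets, and the toy choices for the strategy `g`, the conjectured kernel `P`, and the conjectured reward `r` are all hypothetical, and histories are represented as tuples that grow by one increment $z$ per step.

```python
T = 2            # horizon (hypothetical)
ACTIONS = (0, 1)  # action set (hypothetical)

def g(t, h):
    """Behavioral strategy g_t^i(. | h): uniform over actions (toy choice)."""
    return {u: 1.0 / len(ACTIONS) for u in ACTIONS}

def P(t, h, u):
    """Conjectured kernel P_t^i(. | h, u) over the next increment z (toy choice)."""
    return {0: 0.7, 1: 0.3} if u == 0 else {0: 0.2, 1: 0.8}

def r(t, h, u):
    """Conjectured instantaneous reward r_t^i(h, u): 1 for action 1, else 0."""
    return float(u)

def reward_to_go(t, h):
    """J_t^i(g_{t:T}^i; h, P^i, r^i), computed via (44a)-(44c)."""
    total = 0.0
    for u, pu in g(t, h).items():
        value = r(t, h, u)
        if t < T:
            # (44b)-(44c): add the expected continuation value; at t = T
            # only the instantaneous term of (44a) remains.
            value += sum(pz * reward_to_go(t + 1, h + (z,))
                         for z, pz in P(t, h, u).items())
        total += pu * value
    return total

J1 = reward_to_go(1, ())
print(J1)
```

In this toy instance the terminal value is $J_2 = 1/2$ at every history, so $J_1 = \tfrac{1}{2}(0 + \tfrac{1}{2}) + \tfrac{1}{2}(1 + \tfrac{1}{2}) = 1$, which the sketch reproduces up to floating-point error.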
Definition 11 (“Model-based” Sequential Equilibrium).

Let $g = (g^i)_{i \in \mathcal{I}}$ be a behavioral strategy profile. Let $(P, r) = (P^i, r^i)_{i \in \mathcal{I}}$ be a conjectured profile. Then, $g$ is said to be sequentially rational under $(P, r)$ if for each $i \in \mathcal{I}$, $t \in \mathcal{T}$ and each $h_t^i \in \mathcal{H}_t^i$,

$$J_t^i(g_{t:T}^i; h_t^i, P^i, r^i) \geq J_t^i(\tilde{g}_{t:T}^i; h_t^i, P^i, r^i), \tag{45}$$

for all behavioral strategies $\tilde{g}_{t:T}^i$. Conjectured profile $(P, r)$ is said to be fully consistent with $g$ if there exists a sequence of behavioral strategy and conjecture profiles $(g^{(n)}, P^{(n)}, r^{(n)})_{n=1}^{\infty}$ such that

  (1) $g^{(n)}$ is fully mixed, i.e. every action is chosen with positive probability at every information set.

  (2) For each $i \in \mathcal{I}$, $(P^{(n),i}, r^{(n),i})$ is consistent with $g^{(n),-i}$, i.e. for each $i \in \mathcal{I}$, $t \in \mathcal{T}$, $h_t^i \in \mathcal{H}_t^i$, $u_t^i \in \mathcal{U}_t^i$,

    \begin{align}
    P_t^{(n),i}(z_t^i \mid h_t^i, u_t^i) &= \mathbb{P}^{g^{(n),-i}}(z_t^i \mid h_t^i, u_t^i), \tag{46} \\
    r_t^{(n),i}(h_t^i, u_t^i) &= \mathbb{E}^{g^{(n),-i}}[R_t^i \mid h_t^i, u_t^i]. \tag{47}
    \end{align}

  (3) $(g^{(n)}, P^{(n)}, r^{(n)}) \rightarrow (g, P, r)$ as $n \rightarrow \infty$.

A triple $(g, P, r)$ is said to be a “model-based” sequential equilibrium$^{6}$ if $g$ is sequentially rational under $(P, r)$ and $(P, r)$ is fully consistent with $g$.

$^{6}$ Here we borrow the terms “model-based” (resp. “model-free”) from the reinforcement learning literature: “Model-based” means that an algorithm constructs the underlying model $(P, r)$, while “model-free” usually means that the algorithm directly constructs state-action value functions $Q$.

One can also form conjectures directly on the optimal reward-to-go given a state-action pair $(h_t^i, u_t^i)$.

Definition 12 (“Model-free” Sequential Equilibrium, Definition 2 revisited).

Let $g = (g^i)_{i \in \mathcal{I}}$ be a behavioral strategy profile. Let $Q = (Q_t^i)_{i \in \mathcal{I}, t \in \mathcal{T}}$ be a collection of functions where $Q_t^i \colon \mathcal{H}_t^i \times \mathcal{U}_t^i \mapsto [-T, T]$. The strategy profile $g$ is said to be sequentially rational under $Q$ if for each $i \in \mathcal{I}$, $t \in \mathcal{T}$ and each $h_t^i \in \mathcal{H}_t^i$,

$$\mathrm{supp}(g_t^i(h_t^i)) \subseteq \underset{u_t^i}{\arg\max}~ Q_t^i(h_t^i, u_t^i). \tag{48}$$
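Condition (48) is a pointwise check at each history, which the following minimal sketch illustrates for a single $h_t^i$. The dictionary representation, the function name `is_sequentially_rational_at`, and the numerical tolerance `tol` are assumptions made for illustration and are not part of the definition.

```python
def is_sequentially_rational_at(g_h, q_h, tol=1e-9):
    """Check supp(g_h) is contained in argmax_u q_h[u], as in (48).

    g_h: dict mapping action -> probability at one history (hypothetical format)
    q_h: dict mapping action -> conjectured value Q at that history
    tol: numerical slack for comparing values (an assumption, not in (48))
    """
    best = max(q_h.values())
    return all(q_h[u] >= best - tol for u, p in g_h.items() if p > 0)

q = {"a": 1.0, "b": 1.0, "c": 0.2}
# Mixing over the two maximizers satisfies (48).
print(is_sequentially_rational_at({"a": 0.5, "b": 0.5, "c": 0.0}, q))  # True
# Placing positive mass on the suboptimal action "c" violates (48).
print(is_sequentially_rational_at({"a": 0.9, "b": 0.0, "c": 0.1}, q))  # False
```

The check only constrains actions played with positive probability, which is why any mixture over maximizers of $Q_t^i(h_t^i, \cdot)$ passes.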

The collection of functions $Q$ is said to be fully consistent with $g$ if there exists a sequence of behavioral strategy and conjectured profiles $(g^{(n)}, Q^{(n)})_{n=1}^{\infty}$ such that

  (1) $g^{(n)}$ is fully mixed, i.e., every action is chosen with positive probability at every information set.

  (2) $Q^{(n)}$ is consistent with $g^{(n)}$, i.e.,

    $$Q_{\tau}^{(n),i}(h_{\tau}^{i},u_{\tau}^{i})=\mathbb{E}^{g^{(n)}}\left[\sum_{t=\tau}^{T}R_{t}^{i}\Big|h_{\tau}^{i},u_{\tau}^{i}\right], \tag{49}$$

    for each $i\in\mathcal{I},\tau\in\mathcal{T},h_{\tau}^{i}\in\mathcal{H}_{\tau}^{i},u_{\tau}^{i}\in\mathcal{U}_{\tau}^{i}$.

  (3) $(g^{(n)},Q^{(n)})\rightarrow(g,Q)$ as $n\rightarrow\infty$.

A tuple $(g,Q)$ is said to be a “model-free” sequential equilibrium if $g$ is sequentially rational under $Q$ and $Q$ is fully consistent with $g$.
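The sequential rationality condition (48) can be checked mechanically at a single history. The following is a minimal sketch, assuming the mixed action $g_t^i(h_t^i)$ and the values $Q_t^i(h_t^i,\cdot)$ are stored as dictionaries keyed by action; the function name, the numerical tolerance, and the example values are illustrative assumptions and are not part of the paper.

```python
from typing import Dict, Hashable


def is_sequentially_rational(
    g_h: Dict[Hashable, float],
    q_h: Dict[Hashable, float],
    tol: float = 1e-9,
) -> bool:
    """Check supp(g(h)) is a subset of argmax_u Q(h, u) at one history h.

    g_h maps each action to its probability under the mixed action g(h).
    q_h maps each action to its conjectured value Q(h, u).
    tol is an assumed numerical tolerance for ties in the argmax.
    """
    best = max(q_h.values())
    argmax = {u for u, q in q_h.items() if q >= best - tol}
    support = {u for u, p in g_h.items() if p > 0.0}
    return support <= argmax


# Hypothetical Q-values at one history: "left" and "right" are both optimal.
q = {"left": 1.0, "right": 1.0, "stay": 0.2}
mixes_over_optimal = {"left": 0.5, "right": 0.5, "stay": 0.0}
plays_suboptimal = {"left": 0.9, "right": 0.0, "stay": 0.1}

print(is_sequentially_rational(mixes_over_optimal, q))  # True
print(is_sequentially_rational(plays_suboptimal, q))    # False
```

A profile $g$ is sequentially rational under $Q$ when this check passes for every player, time, and history; the condition says nothing about how $Q$ was obtained, which is the role of full consistency.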

A slightly different definition is also equivalent:

Definition 13 (“Model-free” Sequential Equilibrium, Version 2).

A tuple $(g,Q)$ is said to be a “model-free” sequential equilibrium (version 2) if it satisfies Definition 12 with condition (2) for full consistency replaced by the following condition:

  • (2’) For each $i$, $Q^{(n),i}$ is consistent with $g^{(n),-i}$, i.e.,

    $$Q_{\tau}^{(n),i}(h_{\tau}^{i},u_{\tau}^{i})=\mathbb{E}^{g^{(n),-i}}[R_{\tau}^{i}|h_{\tau}^{i},u_{\tau}^{i}]+\underset{\tilde{g}_{\tau+1:T}^{i}}{\max}~\mathbb{E}^{\tilde{g}_{\tau+1:T}^{i},g^{(n),-i}}\left[\sum_{t=\tau+1}^{T}R_{t}^{i}\Big|h_{\tau}^{i},u_{\tau}^{i}\right],$$

    for each $\tau\in\mathcal{T},h_{\tau}^{i}\in\mathcal{H}_{\tau}^{i},u_{\tau}^{i}\in\mathcal{U}_{\tau}^{i}$.

To introduce the last definition of SE, which corresponds to the original definition proposed in [21], we first describe the game in Section 2 as an extensive-form game tree. To convert the game from a simultaneous-move game to a sequential game, we set $\mathcal{I}=\{1,2,\cdots,I\}$, where the index indicates the order of movement. For convenience, for $i\in\mathcal{I}$, we use the superscript $<i$ (resp. $>i$) to represent the set of players $\{1,\cdots,i-1\}$ (resp. $\{i+1,\cdots,I\}$) that move before (resp. after) player $i$ in any given round. At time $t=0$, nature takes action $w_{0}=(x_{1},h_{1})$ and the game enters $t=1$. For each time $t\in\mathcal{T}$, player $1$ takes action $u_{t}^{1}$ first, followed by player $2$ taking action $u_{t}^{2}$, and so on, while nature takes action $w_{t}$ after player $I$ takes action $u_{t}^{I}$.
In this extensive-form game, there are three types of nodes: (1) a node where some player $i\in\mathcal{I}$ takes action (at some time $t\in\mathcal{T}$), (2) a node where nature takes action (at some time $t\in\{0\}\cup\mathcal{T}$), and (3) a terminal node, where the game has terminated. We denote the set of the first type of nodes corresponding to player $i$ and time $t$ as $\mathcal{O}_{t}^{i}$. A node $o_{t}^{i}\in\mathcal{O}_{t}^{i}$ can also be represented as a vector $o_{t}^{i}=(x_{1},h_{1},w_{1:t-1},u_{1:t-1},u_{t}^{<i})$ which contains all the moves (by all players and nature) before it.
As a result, $o_{t}^{i}$ also uniquely determines the states $x_{1:t}$ and information increment vectors $z_{1:t-1}$. We denote the set of the terminal nodes as $\mathcal{O}_{T+1}$. A terminal node $o_{T+1}\in\mathcal{O}_{T+1}$ also has a vector representation $o_{T+1}=(x_{1},h_{1},w_{1:T},u_{1:T})$.

Given a terminal node $o_{T+1}$, all the actions of players and nature throughout the game are uniquely determined, hence the realizations of $(R_{t})_{t\in\mathcal{T}}$ defined in Section 2 are also uniquely determined. Let $\Lambda=(\Lambda^{i})_{i\in\mathcal{I}},\Lambda^{i}\colon\mathcal{O}_{T+1}\mapsto\mathbb{R}$ be the mappings from terminal nodes to total payoffs, i.e. $\Lambda^{i}(o_{T+1})=\sum_{t=1}^{T}r_{t}^{i}$, where $r_{t}^{i}$ is the realization of $R_{t}^{i}$ corresponding to $o_{T+1}$.
Also define $\Lambda_{\tau}^{i}(o_{T+1})=\sum_{t=\tau}^{T}r_{t}^{i}$ for each $\tau\in\mathcal{T}$.
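As a small illustration of the payoff maps $\Lambda^{i}$ and $\Lambda_{\tau}^{i}$, the sketch below computes both from a list of realized rewards. It assumes the rewards $r_{1}^{i},\ldots,r_{T}^{i}$ at a terminal node have already been extracted into a Python sequence; the function name and the sample numbers are hypothetical.

```python
from typing import Sequence


def reward_to_go(rewards: Sequence[float], tau: int) -> float:
    """Return Lambda_tau = sum_{t=tau}^{T} r_t, with times indexed from 1.

    rewards[t - 1] holds the realized reward r_t of one player at one
    terminal node, so len(rewards) plays the role of the horizon T.
    """
    if not 1 <= tau <= len(rewards):
        raise ValueError("tau must lie in {1, ..., T}")
    return sum(rewards[tau - 1:])


# Hypothetical realized rewards of one player along one terminal node (T = 3).
r = [1.0, -0.5, 2.0]

total_payoff = reward_to_go(r, 1)   # Lambda coincides with Lambda_1
from_time_two = reward_to_go(r, 2)  # Lambda_2 drops the reward at t = 1

print(total_payoff, from_time_two)  # 2.5 1.5
```

The difference `total_payoff - from_time_two` equals the reward collected before time 2, which is already fixed once the history up to that time is fixed.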

Now that we have constructed the extensive-form game, it is helpful to view the nodes in the game tree as a stochastic process. Define $O_{t}^{i}$ to be a random variable with support on $\mathcal{O}_{t}^{i}$ that represents the node player $i$ is at before taking action at time $t$. Let $O_{T+1}$ be a random variable with support on $\mathcal{O}_{T+1}$ that represents the terminal node the game ends at. If we view $(\mathcal{T}\times\mathcal{I})\cup\{T+1\}$ as a set of time indices with lexicographic ordering, the random process $(O_{t}^{i})_{(t,i)\in\mathcal{T}\times\mathcal{I}}\cup(O_{T+1})$ is a controlled Markov chain controlled by action $U_{t}^{i}$ at time $(t,i)$.
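The lexicographically ordered index set $(\mathcal{T}\times\mathcal{I})\cup\{T+1\}$ can be enumerated directly. The sketch below is only an illustration: it assumes $\mathcal{T}=\{1,\ldots,T\}$ and $\mathcal{I}=\{1,\ldots,I\}$, and it encodes the terminal index $T+1$ as the one-element tuple `(T + 1,)`, which is a representation chosen here for convenience.

```python
from itertools import product
from typing import List, Tuple


def time_indices(T: int, I: int) -> List[Tuple[int, ...]]:
    """Enumerate (t, i) for t = 1..T, i = 1..I, followed by the terminal index.

    itertools.product yields pairs in lexicographic order: all players move
    at time t (in the order 1, ..., I) before anyone moves at time t + 1.
    """
    indices: List[Tuple[int, ...]] = list(product(range(1, T + 1), range(1, I + 1)))
    indices.append((T + 1,))  # terminal node index, assumed encoding
    return indices


idx = time_indices(T=2, I=2)
print(idx)  # [(1, 1), (1, 2), (2, 1), (2, 2), (3,)]

# Python compares tuples lexicographically, so the list is already sorted.
print(idx == sorted(idx))  # True
```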

Definition 14 (Classical Sequential Equilibrium [21]).

An assessment is a pair $(g,\mu)$, where $g$ is a behavioral strategy profile of players (excluding nature) as described in Section 2, and $\mu=(\mu_{t}^{i})_{t\in\mathcal{T},i\in\mathcal{I}},\mu_{t}^{i}\colon\mathcal{H}_{t}^{i}\mapsto\Delta(\mathcal{O}_{t}^{i})$ is a belief system. Then, $g$ is said to be sequentially rational given $\mu$ if

$$\sum_{o_{t}^{i}}\mathbb{E}^{g_{t:T}^{i},g_{t}^{>i},g_{t:T}^{-i}}[\Lambda^{i}(O_{T+1})|o_{t}^{i}]\mu_{t}^{i}(o_{t}^{i}|h_{t}^{i})\geq\sum_{o_{t}^{i}}\mathbb{E}^{\tilde{g}_{t:T}^{i},g_{t}^{>i},g_{t:T}^{-i}}[\Lambda^{i}(O_{T+1})|o_{t}^{i}]\mu_{t}^{i}(o_{t}^{i}|h_{t}^{i}), \tag{50}$$

for all $i\in\mathcal{I},t\in\mathcal{T},h_{t}^{i}\in\mathcal{H}_{t}^{i}$, and all behavioral strategies $\tilde{g}_{t:T}^{i}$. The belief system $\mu$ is said to be fully consistent with $g$ if there exists a sequence of assessments $(g^{(n)},\mu^{(n)})_{n=1}^{\infty}$ such that

  (1) $g^{(n)}$ is fully mixed.

  (2) $\mu^{(n)}$ is consistent with $g^{(n)}$, i.e. $\mu_{t}^{(n),i}(o_{t}^{i}|h_{t}^{i})=\mathbb{P}^{g^{(n)}}(o_{t}^{i}|h_{t}^{i})$ for all $t\in\mathcal{T},i\in\mathcal{I},h_{t}^{i}\in\mathcal{H}_{t}^{i}$, and $o_{t}^{i}\in\mathcal{O}_{t}^{i}$.

  (3) $(g^{(n)},\mu^{(n)})\rightarrow(g,\mu)$ as $n\rightarrow\infty$.

An assessment $(g,\mu)$ is said to be a (classical) sequential equilibrium if $g$ is sequentially rational given $\mu$ and $\mu$ is fully consistent with $g$.
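Full consistency can be illustrated on a toy information set that is off the equilibrium path. In the sketch below, a preceding mover plays action `c` with probability one under $g$, while the information set of interest contains two nodes reached only after `a` or `b`. The perturbation $g^{(n)}=(1-\epsilon_n)g+\epsilon_n\cdot\text{noise}$ with $\epsilon_n=1/n$, the particular noise distribution, and all names are assumptions made for illustration; the definition itself allows any fully mixed sequence converging to $g$.

```python
from typing import Dict


def perturb(g_h: Dict[str, float], noise: Dict[str, float], eps: float) -> Dict[str, float]:
    """Return the fully mixed action (1 - eps) * g_h + eps * noise (noise > 0)."""
    return {u: (1.0 - eps) * g_h[u] + eps * noise[u] for u in g_h}


def bayes_belief(reach: Dict[str, float]) -> Dict[str, float]:
    """Normalize reach probabilities of the nodes in one information set."""
    total = sum(reach.values())
    if total <= 0.0:
        raise ValueError("information set has probability zero; Bayes rule undefined")
    return {o: p / total for o, p in reach.items()}


# Hypothetical limit strategy of the preceding mover: always play "c".
g_prev = {"a": 0.0, "b": 0.0, "c": 1.0}
# Assumed tremble distribution; a different choice gives a different limit belief.
noise = {"a": 0.5, "b": 0.25, "c": 0.25}


def belief_at(n: int) -> Dict[str, float]:
    """Belief mu^(n) over the nodes {o_a, o_b} induced by g^(n), eps_n = 1/n."""
    g_n = perturb(g_prev, noise, 1.0 / n)
    reach = {"o_a": g_n["a"], "o_b": g_n["b"]}
    return bayes_belief(reach)


for n in (10, 100, 1000):
    print(n, belief_at(n))
```

Under $g$ itself the information set has probability zero and `bayes_belief` raises an error, whereas every $g^{(n)}$ yields a well-defined belief; here the beliefs put weight close to $2/3$ on `o_a` for every $n$, so that is their limit. Choosing a different `noise` changes the limit, which reflects the fact that the fully consistent belief system associated with a given $g$ need not be unique.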

Remark B.1.

Since the instantaneous rewards $R_{1:t-1}^{i}$ have already been realized at time $t$, replacing the total reward $\Lambda$ with the reward-to-go $\Lambda_{t}$ in (50) would result in an equivalent definition.

Theorem 15.

Definitions 11, 12, 13, and 14 are equivalent for strategy profiles.

Proof B.2.

We complete the proof in four steps. In each step, we show that if $g$ is a strategy profile satisfying one definition of SE, then it satisfies one of the other definitions of SE as well. We follow the chain of implications: Definition 14 $\Rightarrow$ Definition 11 $\Rightarrow$ Definition 12 $\Rightarrow$ Definition 13 $\Rightarrow$ Definition 14.

Step 1: Classical SE (Definition 14) $\Rightarrow$ “Model-based” SE (Definition 11)

Let $(g,\mu)$ satisfy Definition 14. Let $(g^{(n)},\mu^{(n)})$ be a sequence of assessments that satisfies conditions (1)-(3) of full consistency in Definition 14.

Set $P_{t}^{(n),i}(z_{t}^{i}|h_{t}^{i},u_{t}^{i})=\mathbb{P}^{g^{(n)}}(z_{t}^{i}|h_{t}^{i},u_{t}^{i})$ and $r_{t}^{(n),i}(h_{t}^{i},u_{t}^{i})=\mathbb{E}^{g^{(n)}}[R_{t}^{i}|h_{t}^{i},u_{t}^{i}]$ for all $h_{t}^{i}\in\mathcal{H}_{t}^{i},u_{t}^{i}\in\mathcal{U}_{t}^{i}$.

Recall that we can write $O_{t}^{i}=(X_{1},H_{1},W_{1:t-1},U_{1:t-1},U_{t}^{<i})$, and $(X_{1:t},Z_{1:t-1})$ can be expressed as a function of $O_{t}^{i}$.
Therefore there exist fixed functions $f_{t}^{i,Z},f_{t}^{i,R}$ such that $Z_{t}^{i}=f_{t}^{i,Z}(O_{t}^{i},U_{t}^{i},U_{t}^{>i},W_{t})$, $R_{t}^{i}=f_{t}^{i,R}(O_{t}^{i},U_{t}^{i},U_{t}^{>i},W_{t})$. Furthermore, for all $j>i$, there also exist functions $f_{t}^{j,i,H}$ such that $H_{t}^{j}=f_{t}^{j,i,H}(O_{t}^{i})$ (since $H_{t}^{j}=(H_{1}^{j},Z_{1:t-1}^{j})$).
Since $\mu_{t}^{(n),i}(o_{t}^{i}|h_{t}^{i})=\mathbb{P}^{g^{(n)}}(o_{t}^{i}|h_{t}^{i})$, we have

\begin{align}
&P_{t}^{(n),i}(z_{t}^{i}|h_{t}^{i},u_{t}^{i}) \notag\\
&= \sum_{o_{t}^{i},\tilde{u}_{t}^{>i},\tilde{w}_{t}} \bm{1}_{\{z_{t}^{i}=f_{t}^{i,Z}(o_{t}^{i},u_{t}^{i},\tilde{u}_{t}^{>i},\tilde{w}_{t})\}}\, \mathbb{P}(\tilde{w}_{t}) \left(\prod_{j=i+1}^{I} g_{t}^{(n),j}(\tilde{u}_{t}^{j}|f_{t}^{j,i,H}(o_{t}^{i}))\right) \mu_{t}^{(n)}(o_{t}^{i}|h_{t}^{i}), \tag{51}\\
&r_{t}^{(n),i}(h_{t}^{i},u_{t}^{i}) \notag\\
&= \sum_{o_{t}^{i},\tilde{u}_{t}^{>i},\tilde{w}_{t}} f_{t}^{i,R}(o_{t}^{i},u_{t}^{i},\tilde{u}_{t}^{>i},\tilde{w}_{t})\, \mathbb{P}(\tilde{w}_{t}) \left(\prod_{j=i+1}^{I} g_{t}^{(n),j}(\tilde{u}_{t}^{j}|f_{t}^{j,i,H}(o_{t}^{i}))\right) \mu_{t}^{(n)}(o_{t}^{i}|h_{t}^{i}). \tag{52}
\end{align}
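The sums in (51) and (52) are finite marginalizations, so they can be evaluated directly on a small instance. The following is a minimal sketch, not part of the original proof: the function name `induced_model`, the dictionary layout, and the toy two-player instance are all illustrative assumptions. It computes $P_t^{(n),i}(\cdot|h_t^i,u_t^i)$ and $r_t^{(n),i}(h_t^i,u_t^i)$ for a single pair $(h_t^i,u_t^i)$.

```python
from itertools import product


def induced_model(h, u, mu, g_later, f_H, f_Z, f_R, p_w, later_actions):
    """Evaluate the sums in (51)-(52) for one (h, u) pair.

    All argument names and layouts are hypothetical:
    mu[h]         : dict o -> belief mu_t^i(o | h)
    g_later[j]    : function (u_j, h_j) -> g_t^j(u_j | h_j) for each j > i
    f_H[j]        : function o -> h_j (history of player j recovered from o)
    f_Z, f_R      : functions (o, u, u_later, w) -> signal z / reward
    p_w           : dict w -> P(w)
    later_actions : dict j -> list of actions of player j
    """
    players = sorted(later_actions)
    P, r = {}, 0.0
    for o, mu_o in mu[h].items():
        for w, pw in p_w.items():
            for combo in product(*(later_actions[j] for j in players)):
                u_later = dict(zip(players, combo))
                # P(w) * prod_j g^j(u_j | f^{j,i,H}(o)) * mu(o | h)
                weight = pw * mu_o
                for j in players:
                    weight *= g_later[j](u_later[j], f_H[j](o))
                z = f_Z(o, u, u_later, w)
                P[z] = P.get(z, 0.0) + weight            # indicator term in (51)
                r += f_R(o, u, u_later, w) * weight      # reward term in (52)
    return P, r


# Toy instance (assumed): binary o, w, and one later-acting player j = 2.
mu = {"h": {0: 0.5, 1: 0.5}}
p_w = {0: 0.5, 1: 0.5}
later_actions = {2: [0, 1]}
f_H = {2: lambda o: o}
g_later = {2: lambda u2, h2: 0.75 if u2 == h2 else 0.25}
f_Z = lambda o, u, ul, w: (o + u + ul[2] + w) % 2
f_R = lambda o, u, ul, w: 1.0 if ul[2] == o else 0.0

P_hu, r_hu = induced_model("h", 0, mu, g_later, f_H, f_Z, f_R, p_w, later_actions)
```

Because every term is a product of entries of $\mu^{(n)}$ and $g^{(n)}$ with fixed coefficients, the output depends continuously on those inputs, which is the property used in the next step of the proof.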

Therefore, as $\mu^{(n)}\rightarrow\mu$ and $g^{(n)}\rightarrow g$, we have $(P^{(n)},r^{(n)})\rightarrow(P,r)$ for some $(P,r)$.

Let $\tau\in\mathcal{T}$ and let $\tilde{g}_{\tau:T}^{i}$ be an arbitrary strategy. First, observe that one can represent the conditional reward-to-go $\mathbb{E}^{g^{(n)}}[\sum_{t=\tau}^{T}R_{t}^{i}|h_{\tau}^{i}]$ using either $\mu^{(n)}$ or $(P^{(n)},r^{(n)})$. Hence we have

\begin{equation}
\sum_{o_{\tau}^{i}} \mathbb{E}^{\tilde{g}_{\tau:T}^{i},\, g_{\tau}^{(n),>i},\, g_{\tau+1:T}^{(n),-i}}[\Lambda_{\tau}^{i}(O_{T+1})|o_{\tau}^{i}]\, \mu_{\tau}^{(n),i}(o_{\tau}^{i}|h_{\tau}^{i}) = J_{\tau}^{i}(\tilde{g}_{\tau:T}^{i};h_{\tau}^{i},P^{(n),i},r^{(n),i}),
\tag{53}
\end{equation}

where $J_{t}^{i}$ is the reward-to-go function defined in (44), evaluated here at $t=\tau$.

Observe that the left-hand side of (53) is continuous in $(g_{\tau}^{(n),>i},g_{\tau+1:T}^{(n),-i},\mu_{\tau}^{(n),i})$ since it is a sum of products of components of $(g_{\tau}^{(n),>i},g_{\tau+1:T}^{(n),-i},\mu_{\tau}^{(n),i})$. Also observe that the right-hand side of (53) is continuous in $(P^{(n),i},r^{(n),i})$ since, by the definition in (44), it is a sum of products of components of $(P^{(n),i},r^{(n),i})$. Therefore, by taking the limit as $n\rightarrow\infty$, we conclude that

\begin{equation}
\sum_{o_{\tau}^{i}} \mathbb{E}^{\tilde{g}_{\tau:T}^{i},\, g^{-i}}[\Lambda_{\tau}^{i}(O_{T+1})|o_{\tau}^{i}]\, \mu_{\tau}^{i}(o_{\tau}^{i}|h_{\tau}^{i}) = J_{\tau}^{i}(\tilde{g}_{\tau:T}^{i};h_{\tau}^{i},P^{i},r^{i}),
\tag{54}
\end{equation}

for all strategies $\tilde{g}_{\tau:T}^{i}$. Using sequential rationality of $g$ with respect to $\mu$ and (54), we conclude that

\begin{equation}
J_{\tau}^{i}(g_{\tau:T}^{i};h_{\tau}^{i},P^{i},r^{i}) \geq J_{\tau}^{i}(\tilde{g}_{\tau:T}^{i};h_{\tau}^{i},P^{i},r^{i}),
\tag{55}
\end{equation}

for all $\tau\in\mathcal{T}$, $i\in\mathcal{I}$, $h_{\tau}^{i}\in\mathcal{H}_{\tau}^{i}$, i.e., $g$ is also sequentially rational given $(P,r)$.
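The passage from (54) to (55) can be written out as a three-line chain. This is a sketch under the assumption that sequential rationality of $g$ with respect to $\mu$ is stated as the middle inequality below, i.e., as a comparison of the belief-weighted reward-to-go under $g_{\tau:T}^{i}$ and under an arbitrary deviation $\tilde{g}_{\tau:T}^{i}$; the two equalities are (54) applied to $g_{\tau:T}^{i}$ and to $\tilde{g}_{\tau:T}^{i}$ respectively.

\begin{align*}
J_{\tau}^{i}(g_{\tau:T}^{i};h_{\tau}^{i},P^{i},r^{i})
&= \sum_{o_{\tau}^{i}} \mathbb{E}^{g_{\tau:T}^{i},\, g^{-i}}[\Lambda_{\tau}^{i}(O_{T+1})|o_{\tau}^{i}]\, \mu_{\tau}^{i}(o_{\tau}^{i}|h_{\tau}^{i}) \\
&\geq \sum_{o_{\tau}^{i}} \mathbb{E}^{\tilde{g}_{\tau:T}^{i},\, g^{-i}}[\Lambda_{\tau}^{i}(O_{T+1})|o_{\tau}^{i}]\, \mu_{\tau}^{i}(o_{\tau}^{i}|h_{\tau}^{i}) \\
&= J_{\tau}^{i}(\tilde{g}_{\tau:T}^{i};h_{\tau}^{i},P^{i},r^{i}).
\end{align*}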

Step 2: “Model-based” SE (Definition 11) $\Rightarrow$ “Model-free” SE version 1 (Definition 12)

Let $(g,P,r)$ be a sequential equilibrium under Definition 11, and let $(g^{(n)},P^{(n)},r^{(n)})$ satisfy conditions (1)-(3) of full consistency in Definition 11. Set

\begin{equation}
Q_{\tau}^{(n),i}(h_{\tau}^{i},u_{\tau}^{i}) = \mathbb{E}^{g^{(n)}}\left[\sum_{t=\tau}^{T}R_{t}^{i}\,\Big|\,h_{\tau}^{i},u_{\tau}^{i}\right],
\tag{56}
\end{equation}

for all $\tau\in\mathcal{T}$, $i\in\mathcal{I}$, $h_{\tau}^{i}\in\mathcal{H}_{\tau}^{i}$, $u_{\tau}^{i}\in\mathcal{U}_{\tau}^{i}$. Then $Q^{(n),i}$ satisfies the recurrence relation

\begin{align}
Q_{T}^{(n),i}(h_{T}^{i},u_{T}^{i}) &= r_{T}^{(n),i}(h_{T}^{i},u_{T}^{i}), \tag{57a}\\
V_{t}^{(n),i}(h_{t}^{i}) &:= \sum_{\tilde{u}_{t}^{i}} Q_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})\, g_{t}^{(n),i}(\tilde{u}_{t}^{i}|h_{t}^{i}), \quad \forall t\in\mathcal{T}, \tag{57b}\\
Q_{t}^{(n),i}(h_{t}^{i},u_{t}^{i}) &= r_{t}^{(n),i}(h_{t}^{i},u_{t}^{i}) \tag{57c}\\
&\quad + \sum_{\tilde{z}_{t}^{i}} V_{t+1}^{(n),i}((h_{t}^{i},\tilde{z}_{t}^{i}))\, P_{t}^{(n),i}(\tilde{z}_{t}^{i}|h_{t}^{i},u_{t}^{i}), \quad \forall t\in\mathcal{T}\backslash\{T\}. \tag{57d}
\end{align}

Since $(g^{(n)},P^{(n)},r^{(n)})\rightarrow(g,P,r)$ as $n\rightarrow\infty$, we have $Q^{(n)}\rightarrow Q$, where $Q=(Q_{t}^{i})_{t\in\mathcal{T},i\in\mathcal{I}}$ satisfies

\begin{align}
Q_{T}^{i}(h_{T}^{i},u_{T}^{i}) &= r_{T}^{i}(h_{T}^{i},u_{T}^{i}), \tag{58a}\\
V_{t}^{i}(h_{t}^{i}) &:= \sum_{\tilde{u}_{t}^{i}} Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\, g_{t}^{i}(\tilde{u}_{t}^{i}|h_{t}^{i}), \quad \forall t\in\mathcal{T}, \tag{58b}\\
Q_{t}^{i}(h_{t}^{i},u_{t}^{i}) &= r_{t}^{i}(h_{t}^{i},u_{t}^{i}) \notag\\
&\quad + \sum_{\tilde{z}_{t}^{i}} V_{t+1}^{i}((h_{t}^{i},\tilde{z}_{t}^{i}))\, P_{t}^{i}(\tilde{z}_{t}^{i}|h_{t}^{i},u_{t}^{i}), \quad \forall t\in\mathcal{T}\backslash\{T\}. \tag{58c}
\end{align}
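The recurrence (58) is a standard backward induction over the horizon and can be evaluated mechanically once $(g^{i},P^{i},r^{i})$ are given. The following is a minimal sketch for a single player on a toy finite model; the function name `backward_q`, the encoding of a history $h_t^i$ as a tuple of past signals, and the toy numbers are illustrative assumptions and do not come from the paper.

```python
from itertools import product


def backward_q(T, actions, signals, r, P, g):
    """Evaluate (58a)-(58c) by backward induction for one player.

    Assumed (hypothetical) interfaces:
    r(t, h, u)    -> instantaneous reward r_t(h, u)
    P(t, h, u, z) -> transition kernel P_t(z | h, u)
    g(t, h, u)    -> behavioral strategy g_t(u | h)
    A history h at time t is a tuple of the t-1 past signals, so the
    successor history (h, z) is h + (z,).
    """
    Q = {t: {} for t in range(1, T + 1)}
    V = {t: {} for t in range(1, T + 1)}
    for t in range(T, 0, -1):
        for h in product(signals, repeat=t - 1):
            for u in actions:
                q = r(t, h, u)  # (58a) at t = T, first term of (58c) otherwise
                if t < T:
                    # continuation term of (58c)
                    q += sum(V[t + 1][h + (z,)] * P(t, h, u, z) for z in signals)
                Q[t][(h, u)] = q
            # (58b): average Q over the player's own mixed action
            V[t][h] = sum(Q[t][(h, u)] * g(t, h, u) for u in actions)
    return Q, V


# Toy model (assumed): two stages, binary actions and signals,
# reward 1 for action 1, uniform transitions, uniform strategy.
Q, V = backward_q(
    T=2,
    actions=(0, 1),
    signals=(0, 1),
    r=lambda t, h, u: float(u),
    P=lambda t, h, u, z: 0.5,
    g=lambda t, h, u: 0.5,
)
```

Each $Q_t$ and $V_t$ is built from sums and products of entries of $(g,P,r)$, which is why the convergence $(g^{(n)},P^{(n)},r^{(n)})\rightarrow(g,P,r)$ carries over to $Q^{(n)}\rightarrow Q$.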

Comparing (58) with the reward-to-go function $J_{t}^{i}$ defined in (44), we observe that

\[
V_{t}^{i}(h_{t}^{i}) = J_{t}^{i}(g_{t:T}^{i};h_{t}^{i},P^{i},r^{i}), \tag{59}
\]

for all $t\in\mathcal{T}$, $i\in\mathcal{I}$, $h_{t}^{i}\in\mathcal{H}_{t}^{i}$.

Let $\tilde{g}_{t}^{i}$ be a strategy such that $\tilde{g}_{t}^{i}(h_{t}^{i})=\eta\in\Delta(\mathcal{U}_{t}^{i})$. Then

\begin{align}
&J_{t}^{i}((\tilde{g}_{t}^{i},g_{t+1:T}^{i});h_{t}^{i},P^{i},r^{i}) \tag{60}\\
&=\sum_{\tilde{u}_{t}^{i}}\left(r_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})+\sum_{\tilde{z}_{t}^{i}}J_{t+1}^{i}(g_{t+1:T}^{i};(h_{t}^{i},\tilde{z}_{t}^{i}),P^{i},r^{i})\,P_{t}^{i}(\tilde{z}_{t}^{i}|h_{t}^{i},\tilde{u}_{t}^{i})\right)\eta(\tilde{u}_{t}^{i}) \tag{61}\\
&=\sum_{\tilde{u}_{t}^{i}}\left(r_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})+\sum_{\tilde{z}_{t}^{i}}V_{t+1}^{i}((h_{t}^{i},\tilde{z}_{t}^{i}))\,P_{t}^{i}(\tilde{z}_{t}^{i}|h_{t}^{i},\tilde{u}_{t}^{i})\right)\eta(\tilde{u}_{t}^{i}) \tag{62}\\
&=\sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\eta(\tilde{u}_{t}^{i}), \tag{63}
\end{align}

where we substitute (44) in (61), (59) in (62), and (58c) in (63).

By sequential rationality of $g$ with respect to $(P,r)$, we have

\[
J_{t}^{i}(g_{t:T}^{i};h_{t}^{i},P^{i},r^{i})\geq J_{t}^{i}((\tilde{g}_{t}^{i},g_{t+1:T}^{i});h_{t}^{i},P^{i},r^{i}),
\]

which means that

\[
\sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,g_{t}^{i}(\tilde{u}_{t}^{i}|h_{t}^{i})\geq\sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\eta(\tilde{u}_{t}^{i}), \tag{64}
\]

for all $\eta\in\Delta(\mathcal{U}_{t}^{i})$ and all $t\in\mathcal{T}$, $i\in\mathcal{I}$, $h_{t}^{i}\in\mathcal{H}_{t}^{i}$. Hence $g$ is sequentially rational given $Q$. Therefore $(g,Q)$ is a sequential equilibrium under Definition 12.
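One way to read condition (64): its right-hand side is linear in $\eta$, so its maximum over the simplex is attained at a point mass. The following worked step is a sketch of that standard observation, assuming a finite action set $\mathcal{U}_{t}^{i}$ as in the finite-game setting; it is an equivalent reading of (64), not necessarily the form in which Definition 12 states sequential rationality.

```latex
\begin{align*}
\max_{\eta\in\Delta(\mathcal{U}_{t}^{i})}\sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\eta(\tilde{u}_{t}^{i})
  &=\max_{\tilde{u}_{t}^{i}\in\mathcal{U}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i}),\\
\text{(64) holds for all }\eta
  &\iff \sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,g_{t}^{i}(\tilde{u}_{t}^{i}|h_{t}^{i})
        =\max_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\\
  &\iff \operatorname{supp}\,g_{t}^{i}(\cdot|h_{t}^{i})\subseteq\arg\max_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i}).
\end{align*}
```

In words, (64) says that $g_{t}^{i}(\cdot|h_{t}^{i})$ places probability only on actions that maximize $Q_{t}^{i}(h_{t}^{i},\cdot)$.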

Step 3: “Model-free” SE version 1 (Definition 12) $\Rightarrow$ “Model-free” SE version 2 (Definition 13)

Let $(g,Q)$ be a sequential equilibrium under Definition 12 and let $(g^{(n)},Q^{(n)})$ satisfy conditions (1)-(3) of full consistency in Definition 12. Then $Q^{(n),i}$ satisfies

\begin{align}
Q_{T}^{(n),i}(h_{T}^{i},u_{T}^{i})&=\mathbb{E}^{g^{(n),-i}}[R_{T}^{i}|h_{T}^{i},u_{T}^{i}], \tag{65a}\\
V_{t}^{(n),i}(h_{t}^{i})&:=\sum_{\tilde{u}_{t}^{i}}Q_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})\,g_{t}^{(n),i}(\tilde{u}_{t}^{i}|h_{t}^{i}),\quad\forall t\in\mathcal{T}, \tag{65b}\\
Q_{t}^{(n),i}(h_{t}^{i},u_{t}^{i})&=\mathbb{E}^{g^{(n),-i}}[R_{t}^{i}|h_{t}^{i},u_{t}^{i}]
+\sum_{\tilde{z}_{t}^{i}}V_{t+1}^{(n),i}((h_{t}^{i},\tilde{z}_{t}^{i}))\,\mathbb{P}^{g^{(n),-i}}(\tilde{z}_{t}^{i}|h_{t}^{i},u_{t}^{i}),\quad\forall t\in\mathcal{T}\backslash\{T\}, \tag{65c}
\end{align}

and $Q^{(n)}\rightarrow Q$ as $n\rightarrow\infty$. Set

\[
\hat{Q}_{\tau}^{(n),i}(h_{\tau}^{i},u_{\tau}^{i})=\mathbb{E}^{g^{(n),-i}}[R_{\tau}^{i}|h_{\tau}^{i},u_{\tau}^{i}]+\max_{\tilde{g}_{\tau+1:T}^{i}}\mathbb{E}^{\tilde{g}_{\tau+1:T}^{i},g^{(n),-i}}\left[\sum_{t=\tau+1}^{T}R_{t}^{i}\,\Big|\,h_{\tau}^{i},u_{\tau}^{i}\right], \tag{66}
\]

for each $\tau\in\mathcal{T}$, $h_{\tau}^{i}\in\mathcal{H}_{\tau}^{i}$, $u_{\tau}^{i}\in\mathcal{U}_{\tau}^{i}$. Then $\hat{Q}^{(n),i}$ satisfies the recurrence relation

\begin{align}
\hat{Q}_{T}^{(n),i}(h_{T}^{i},u_{T}^{i})&=\mathbb{E}^{g^{(n),-i}}[R_{T}^{i}|h_{T}^{i},u_{T}^{i}], \tag{67a}\\
\hat{V}_{t}^{(n),i}(h_{t}^{i})&:=\max_{\tilde{u}_{t}^{i}}\hat{Q}_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i}),\quad\forall t\in\mathcal{T}, \tag{67b}\\
\hat{Q}_{t}^{(n),i}(h_{t}^{i},u_{t}^{i})&=\mathbb{E}^{g^{(n),-i}}[R_{t}^{i}|h_{t}^{i},u_{t}^{i}]
+\sum_{\tilde{z}_{t}^{i}}\hat{V}_{t+1}^{(n),i}((h_{t}^{i},\tilde{z}_{t}^{i}))\,\mathbb{P}^{g^{(n),-i}}(\tilde{z}_{t}^{i}|h_{t}^{i},u_{t}^{i}),\quad\forall t\in\mathcal{T}\backslash\{T\}. \tag{67c}
\end{align}

Claim: $\hat{Q}_{t}^{(n),i}\rightarrow Q_{t}^{i}$ as $n\rightarrow\infty$.

Given the claim, $(g^{(n)},\hat{Q}^{(n)})$ satisfies conditions (1), (2’), and (3) of full consistency in Definition 13. Therefore $(g,Q)$ is also a sequential equilibrium under Definition 13, which completes this part of the proof.
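The difference between recursions (65) and (67) is that (65b) averages the Q-values under the strategy, while (67b) takes their maximum. The following is a minimal numerical sketch of that difference, not the paper's construction: it uses a hypothetical single-agent toy model in which a fixed reward `reward` and kernel `trans` stand in for $\mathbb{E}^{g^{(n),-i}}[R_{t}^{i}|\cdot]$ and $\mathbb{P}^{g^{(n),-i}}(\cdot|\cdot)$, and every name, size, and number in it is an assumption made for illustration.

```python
from itertools import product

T = 3              # hypothetical horizon: t = 0, 1, 2
ACTIONS = (0, 1)   # hypothetical action set
OBS = (0, 1)       # hypothetical set of information increments z

def reward(t, h, u):
    # Hypothetical stand-in for E[R_t | h, u]: pay 1 if u matches the last increment.
    last = h[-1] if h else 0
    return 1.0 if u == last else 0.0

def trans(z, h, u):
    # Hypothetical stand-in for P(z | h, u): action 1 makes z = 1 more likely.
    p1 = 0.8 if u == 1 else 0.3
    return p1 if z == 1 else 1.0 - p1

def q_policy(t, h, u, policy):
    """Recursion in the style of (65): continuation value averages Q under `policy`."""
    q = reward(t, h, u)
    if t < T - 1:
        for z in OBS:
            h2 = h + (z,)
            probs = policy(t + 1, h2)
            v = sum(probs[a] * q_policy(t + 1, h2, a, policy) for a in ACTIONS)
            q += trans(z, h, u) * v
    return q

def q_opt(t, h, u):
    """Recursion in the style of (67): continuation value maximizes Q-hat."""
    q = reward(t, h, u)
    if t < T - 1:
        for z in OBS:
            h2 = h + (z,)
            q += trans(z, h, u) * max(q_opt(t + 1, h2, a) for a in ACTIONS)
    return q

def uniform(t, h):
    # A fully mixed strategy (every action has positive probability).
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}

def greedy(t, h):
    # A strategy supported on a maximizer of Q-hat, i.e. sequentially rational here.
    best = max(ACTIONS, key=lambda a: q_opt(t, h, a))
    return {a: 1.0 if a == best else 0.0 for a in ACTIONS}

def histories(t):
    # All length-t histories of increments.
    return list(product(OBS, repeat=t))

def max_gap(policy):
    # Largest value of Q-hat minus Q over all (t, h, u).
    return max(
        q_opt(t, h, u) - q_policy(t, h, u, policy)
        for t in range(T) for h in histories(t) for u in ACTIONS
    )

print("gap under uniform strategy:", max_gap(uniform))
print("gap under greedy strategy: ", max_gap(greedy))
```

In this toy model the two recursions agree when the strategy puts all its mass on maximizers of the Q-function and differ when it does not. This is the intuition behind the Claim: the fully mixed strategies $g^{(n)}$ converge to a strategy $g$ that is sequentially rational given $Q$, so the gap between $\hat{Q}^{(n)}$ and $Q^{(n)}$ should vanish in the limit.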

Proof of Claim: By induction on time $t\in\mathcal{T}$.

Induction Base: Observe that $\hat{Q}_{T}^{(n)}=Q_{T}^{(n)}$ by construction. Since $Q_{T}^{(n)}\rightarrow Q_{T}$, we also have $\hat{Q}_{T}^{(n)}\rightarrow Q_{T}$.

Induction Step: Suppose that the result is true for time $t$. We prove it for time $t-1$.

By the induction hypothesis and $g^{(n)}\rightarrow g$, we have

\[
\hat{V}_{t}^{(n),i}(h_{t}^{i})=\max_{\tilde{u}_{t}^{i}}\hat{Q}_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})\xrightarrow{n\rightarrow\infty}\max_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i}). \tag{68}
\]
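The passage to the limit in (68) can be justified by the elementary fact that the maximum over a finite set is 1-Lipschitz in the sup norm. A short sketch, assuming $\mathcal{U}_{t}^{i}$ is finite as in the finite-game setting:

```latex
\[
\Big|\max_{\tilde{u}_{t}^{i}}\hat{Q}_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})-\max_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\Big|
\le\max_{\tilde{u}_{t}^{i}}\Big|\hat{Q}_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})-Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\Big|
\xrightarrow{n\rightarrow\infty}0,
\]
```

where the right-hand side vanishes by the induction hypothesis applied to each of the finitely many actions.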

Since $Q^{(n)}\rightarrow Q$ and $g^{(n)}\rightarrow g$, we have

\begin{align}
V_{t}^{(n),i}(h_{t}^{i}) &= \sum_{\tilde{u}_{t}^{i}}Q_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})\,g_{t}^{(n),i}(\tilde{u}_{t}^{i}|h_{t}^{i}) \tag{69}\\
&\xrightarrow{n\rightarrow\infty} \sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,g_{t}^{i}(\tilde{u}_{t}^{i}|h_{t}^{i})=:V_{t}^{i}(h_{t}^{i}). \tag{70}
\end{align}

Since $g$ is sequentially rational given $Q$, we have

\begin{equation}
\sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,g_{t}^{i}(\tilde{u}_{t}^{i}|h_{t}^{i})=\max_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i}). \tag{71}
\end{equation}

Combining (68), (70), and (71), we have $\hat{V}_{t}^{(n),i}(h_{t}^{i})\rightarrow V_{t}^{i}(h_{t}^{i})$ for all $h_{t}^{i}\in\mathcal{H}_{t}^{i}$. Since $V_{t}^{(n),i}(h_{t}^{i})\rightarrow V_{t}^{i}(h_{t}^{i})$ as well by (69) and (70), and $\mathcal{H}_{t}^{i}$ is a finite set, we have

\begin{equation}
\max_{\tilde{h}_{t}^{i}}\left|\hat{V}_{t}^{(n),i}(\tilde{h}_{t}^{i})-V_{t}^{(n),i}(\tilde{h}_{t}^{i})\right|\xrightarrow{n\rightarrow\infty}0. \tag{72}
\end{equation}
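The limit in (68) and the uniform bound in (72) both rest on the elementary fact that, over a finite action set, $|\max_u a(u)-\max_u b(u)|\leq\max_u|a(u)-b(u)|$. The following is a minimal numerical sketch of that fact, not part of the proof. The action-value lists `q_limit` and `q_n` and the perturbation size $1/n$ are hypothetical stand-ins for $Q_t^i(h_t^i,\cdot)$ and $\hat{Q}_t^{(n),i}(h_t^i,\cdot)$.

```python
import random


def max_gap(q_hat, q):
    """Return (|max q_hat - max q|, max_u |q_hat[u] - q[u]|)."""
    value_gap = abs(max(q_hat) - max(q))
    sup_gap = max(abs(a - b) for a, b in zip(q_hat, q))
    return value_gap, sup_gap


random.seed(0)
# Hypothetical limit action values Q(h, .) over five actions.
q_limit = [random.uniform(-1.0, 1.0) for _ in range(5)]

gaps = []
for n in (1, 10, 100, 1000):
    # Hypothetical approximating sequence: each entry is perturbed by at most 1/n.
    q_n = [q + random.uniform(-1.0, 1.0) / n for q in q_limit]
    value_gap, sup_gap = max_gap(q_n, q_limit)
    gaps.append((n, value_gap, sup_gap))

for n, value_gap, sup_gap in gaps:
    print(f"n={n:5d}  gap of maxima={value_gap:.6f}  sup gap={sup_gap:.6f}")
```

In every row the gap between the maxima is no larger than the entrywise gap, so entrywise convergence on a finite set forces convergence of the maximum.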

We then have

\begin{align}
&\left|\hat{Q}_{t-1}^{(n),i}(h_{t-1}^{i},u_{t-1}^{i})-Q_{t-1}^{(n),i}(h_{t-1}^{i},u_{t-1}^{i})\right| \tag{73}\\
&=\left|\sum_{\tilde{z}_{t-1}^{i}}\left[\hat{V}_{t}^{(n),i}((h_{t-1}^{i},\tilde{z}_{t-1}^{i}))-V_{t}^{(n),i}((h_{t-1}^{i},\tilde{z}_{t-1}^{i}))\right]\mathbb{P}^{g_{t-1}^{(n),-i}}(\tilde{z}_{t-1}^{i}|h_{t-1}^{i},u_{t-1}^{i})\right| \tag{74}\\
&\leq\max_{\tilde{z}_{t-1}^{i}}\left|\hat{V}_{t}^{(n),i}((h_{t-1}^{i},\tilde{z}_{t-1}^{i}))-V_{t}^{(n),i}((h_{t-1}^{i},\tilde{z}_{t-1}^{i}))\right|\xrightarrow{n\rightarrow\infty}0, \tag{75}
\end{align}

where we substitute (65c) and (67c) in (74). Since $Q_{t-1}^{(n),i}(h_{t-1}^{i},u_{t-1}^{i})\rightarrow Q_{t-1}^{i}(h_{t-1}^{i},u_{t-1}^{i})$, we conclude that $\hat{Q}_{t-1}^{(n),i}(h_{t-1}^{i},u_{t-1}^{i})\rightarrow Q_{t-1}^{i}(h_{t-1}^{i},u_{t-1}^{i})$, establishing the induction step.
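The step from (74) to (75) uses only that a probability-weighted average of differences cannot exceed the largest difference in absolute value. The following is a minimal sketch of that bound. The vector `probs` is a hypothetical stand-in for the conditional distribution of $\tilde{z}_{t-1}^{i}$, and `diffs` is a hypothetical stand-in for the differences $\hat{V}_t^{(n),i}-V_t^{(n),i}$.

```python
import random


def averaged_gap(diffs, probs):
    """Return |sum_z diffs[z] * probs[z]| for a probability vector probs."""
    return abs(sum(d * p for d, p in zip(diffs, probs)))


random.seed(1)
num_obs = 6

# Hypothetical conditional distribution over next observations (normalized weights).
weights = [random.random() for _ in range(num_obs)]
total = sum(weights)
probs = [w / total for w in weights]

# Hypothetical value-function differences, one per next observation.
diffs = [random.uniform(-0.5, 0.5) for _ in range(num_obs)]

lhs = averaged_gap(diffs, probs)    # analogue of the right-hand side of (74)
rhs = max(abs(d) for d in diffs)    # analogue of the bound in (75)
print(f"averaged gap = {lhs:.6f} <= max gap = {rhs:.6f}")
```

Because the bound does not depend on the weights, it holds uniformly in the strategies $g^{(n),-i}$ that generate them.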

Step 4: “Model-free” SE version 2 (Definition 13) $\Rightarrow$ Classical SE (Definition 14)

Let $(g,Q)$ be a sequential equilibrium under Definition 13 and let $(g^{(n)},\hat{Q}^{(n)})$ satisfy conditions (1), (2’), and (3) of full consistency in Definition 13.

Define the beliefs $\mu^{(n)}$ on the nodes of the extensive-form game through $\mu^{(n)}(o_{t}^{i}|h_{t}^{i})=\mathbb{P}^{g^{(n)}}(o_{t}^{i}|h_{t}^{i})$. By taking subsequences, without loss of generality, assume that $\mu^{(n)}\rightarrow\mu$.

Let $\hat{g}_{t}^{i}$ be an arbitrary strategy. Then, by condition (2’) of Definition 13, we can write

\begin{equation}
\begin{split}
&\sum_{\tilde{u}_{t}^{i}}\hat{Q}_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\hat{g}_{t}^{i}(\tilde{u}_{t}^{i}|h_{t}^{i})\\
&=\max_{\tilde{g}_{t+1:T}^{i}}\sum_{o_{t}^{i}}\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{(n),>i},g_{t+1:T}^{(n),-i}}[\Lambda_{t}^{i}(O_{T+1})|o_{t}^{i}]\ \mu_{t}^{(n),i}(o_{t}^{i}|h_{t}^{i}).
\end{split}
\tag{76}
\end{equation}

For each $o_{t}^{i}$, $\mathbb{E}^{\tilde{g}_{t}^{\geq i},\tilde{g}_{t+1:T}}[\Lambda_{t}^{i}(O_{T+1})|o_{t}^{i}]$ is continuous in $(\tilde{g}_{t}^{\geq i},\tilde{g}_{t+1:T})$ since it is a sum of products of components of $(\tilde{g}_{t}^{\geq i},\tilde{g}_{t+1:T})$. Therefore,

\begin{equation}
\begin{aligned}
&\sum_{o_{t}^{i}}\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{(n),>i},g_{t+1:T}^{(n),-i}}[\Lambda_{t}^{i}(O_{T+1})|o_{t}^{i}]\ \mu_{t}^{(n),i}(o_{t}^{i}|h_{t}^{i})\\
&\xrightarrow{n\rightarrow\infty}\sum_{o_{t}^{i}}\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{>i},g_{t+1:T}^{-i}}[\Lambda_{t}^{i}(O_{T+1})|o_{t}^{i}]\ \mu_{t}^{i}(o_{t}^{i}|h_{t}^{i}),
\end{aligned}
\tag{77}
\end{equation}

for each behavioral strategy $\tilde{g}_{t+1:T}^{i}$. Applying Berge’s Maximum Theorem [49], and taking the limit on both sides of (76), we obtain

\begin{equation}
\sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\hat{g}_{t}^{i}(\tilde{u}_{t}^{i}|h_{t}^{i})=\max_{\tilde{g}_{t+1:T}^{i}}\sum_{o_{t}^{i}}\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{>i},g_{t+1:T}^{-i}}[\Lambda_{t}^{i}(O_{T+1})|o_{t}^{i}]\ \mu_{t}^{i}(o_{t}^{i}|h_{t}^{i}), \tag{78}
\end{equation}

for all $t\in\mathcal{T}$, $i\in\mathcal{I}$, $h_{t}^{i}\in\mathcal{H}_{t}^{i}$, and all behavioral strategies $\hat{g}_{t}^{i}$.
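The passage from (76) to (78) exchanges a limit with a maximum, which is what the Maximum Theorem licenses. The following is a simplified numerical sketch of that exchange under strong assumptions that are not in the text: the maximization over continuation strategies is replaced by a finite list of hypothetical pure continuation plans, `payoff[x][o]` is a hypothetical expected reward-to-go for plan `x` at node `o`, and the beliefs `belief_n(n)` converge to `mu` at rate $1/n$.

```python
def best_value(payoff, belief):
    """Max over plans x of sum_o payoff[x][o] * belief[o]."""
    return max(sum(a * b for a, b in zip(row, belief)) for row in payoff)


# Hypothetical expected reward-to-go: rows are plans, columns are nodes.
payoff = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.0],
    [0.0, 1.0, 1.0],
]
mu = [0.2, 0.5, 0.3]        # hypothetical limit belief over three nodes
shift = [0.3, -0.2, -0.1]   # sums to zero, so each belief_n(n) still sums to one


def belief_n(n):
    """Hypothetical belief sequence converging to mu as n grows."""
    return [m + s / n for m, s in zip(mu, shift)]


limit_value = best_value(payoff, mu)
steps = (1, 10, 100, 1000)
errors = [abs(best_value(payoff, belief_n(n)) - limit_value) for n in steps]

for n, err in zip(steps, errors):
    print(f"n={n:5d}  |max under mu_n - max under mu| = {err:.6f}")
```

In this toy setting the error is bounded by the largest payoff magnitude times the $\ell_1$ distance between the beliefs, here $2\cdot 0.6/n$, so the maximized value converges along with the beliefs.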

Sequential rationality of $g$ with respect to $Q$ means that

\begin{equation}
\begin{split}
g_{t}^{i}&\in\underset{\hat{g}_{t}^{i}}{\arg\max}~\sum_{\tilde{u}_{t}^{i}}Q_{t}^{i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\hat{g}_{t}^{i}(\tilde{u}_{t}^{i}|h_{t}^{i})\\
&=\underset{\hat{g}_{t}^{i}}{\arg\max}~\max_{\tilde{g}_{t+1:T}^{i}}\sum_{o_{t}^{i}}\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{>i},g_{t+1:T}^{-i}}[\Lambda_{t}^{i}(O_{T+1})|o_{t}^{i}]\ \mu_{t}^{i}(o_{t}^{i}|h_{t}^{i}),
\end{split}
\tag{79}
\end{equation}

for all $t\in\mathcal{T}$, $i\in\mathcal{I}$, and all $h_{t}^{i}\in\mathcal{H}_{t}^{i}$.

Recall that the node $O_{t}^{i}$ uniquely determines $(X_{1},W_{1:t-1},U_{1:t-1})$. Therefore, the instantaneous rewards $R_{\tau}^{i}$ for $\tau\leq t-1$ are uniquely determined by $O_{t}^{i}$ as well. For $\tau\leq t-1$, let $r_{\tau}^{i}$ be the realization of $R_{\tau}^{i}$ under $O_{t}^{i}=o_{t}^{i}$. Recall that $\Lambda^{i}$ is the total reward function and $\Lambda_{t}^{i}$ is the reward-to-go function starting with (and including) time $t$. We have $\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{>i},g_{t+1:T}^{-i}}[\Lambda^{i}(O_{T+1})-\Lambda_{t}^{i}(O_{T+1})|o_{t}^{i}]=\sum_{\tau=1}^{t-1}r_{\tau}^{i}$, which is independent of the strategy profile. Therefore we have

\[
g_{t}^{i}\in\underset{\hat{g}_{t}^{i}}{\arg\max}~\max_{\tilde{g}_{t+1:T}^{i}}\sum_{o_{t}^{i}}\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{>i},g_{t+1:T}^{-i}}[\Lambda^{i}(O_{T+1})|o_{t}^{i}]~\mu_{t}^{i}(o_{t}^{i}|h_{t}^{i}).
\tag{80}
\]
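The step from (79) to (80) can be spelled out as a short worked equation. As a sketch, write $c(o_{t}^{i}):=\sum_{\tau=1}^{t-1}r_{\tau}^{i}$ for the accumulated reward determined by $o_{t}^{i}$; the symbol $c$ is introduced here only for illustration and is not part of the original notation. Then

```latex
\[
\begin{split}
&\sum_{o_{t}^{i}}\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{>i},g_{t+1:T}^{-i}}[\Lambda^{i}(O_{T+1})|o_{t}^{i}]~\mu_{t}^{i}(o_{t}^{i}|h_{t}^{i})\\
&\quad=\sum_{o_{t}^{i}}\mathbb{E}^{\hat{g}_{t}^{i},\tilde{g}_{t+1:T}^{i},g_{t}^{>i},g_{t+1:T}^{-i}}[\Lambda_{t}^{i}(O_{T+1})|o_{t}^{i}]~\mu_{t}^{i}(o_{t}^{i}|h_{t}^{i})
+\sum_{o_{t}^{i}}c(o_{t}^{i})~\mu_{t}^{i}(o_{t}^{i}|h_{t}^{i}).
\end{split}
\]
```

The last sum depends only on $h_{t}^{i}$ and the fixed belief $\mu_{t}^{i}$, not on $\hat{g}_{t}^{i}$ or $\tilde{g}_{t+1:T}^{i}$, so adding it changes neither the inner maximum over $\tilde{g}_{t+1:T}^{i}$ nor the set of maximizers over $\hat{g}_{t}^{i}$.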

Fixing $h_{\tau}^{i}$, the problem of optimizing

\[
J_{\tau}^{i}(\tilde{g}_{\tau:T}^{i};h_{\tau}^{i},\mu_{\tau}^{i}):=\sum_{o_{\tau}^{i}}\mathbb{E}^{\tilde{g}_{\tau:T}^{i},g_{\tau}^{>i},g_{\tau+1:T}^{-i}}[\Lambda^{i}(O_{T+1})|o_{\tau}^{i}]~\mu_{\tau}^{i}(o_{\tau}^{i}|h_{\tau}^{i}),
\tag{81}
\]

over all $\tilde{g}_{\tau:T}^{i}$ is a POMDP problem with

  • Timestamps $\tilde{T}=\{\tau,\tau+1,\cdots,T,T+1\}$;

  • State process $(O_{t}^{i})_{t=\tau}^{T}\cup(O_{T+1})$;

  • Control actions $(U_{t}^{i})_{t=\tau}^{T}$;

  • Initial state distribution $\mu_{\tau}^{i}(h_{\tau}^{i})\in\Delta(\mathcal{O}_{\tau}^{i})$;

  • State transition kernel $\mathbb{P}^{g_{t}^{>i},g_{t+1}^{<i}}(o_{t+1}^{i}|o_{t}^{i},u_{t}^{i})$ for $t<T$ and $\mathbb{P}^{g_{T}^{>i}}(o_{T+1}|o_{T}^{i},u_{T}^{i})$ for $t=T$;

  • Observation history: $(H_{t}^{i})_{t=\tau}^{T}$;

  • Instantaneous rewards are 0. Terminal reward is $\Lambda^{i}(O_{T+1})$.

The belief $\mu$ is fully consistent with $g$ by construction. From standard results in game theory, we know that $\mu_{t+1}^{i}(h_{t+1}^{i})$ can be updated with Bayes rule from $\mu_{t}^{i}(h_{t}^{i})$ and $g$ whenever applicable. Hence $(\mu_{t})_{t=\tau}^{T}$ represent the true beliefs of the state given observations in the above POMDP problem. By standard control theory [23, Section 6.7], (80) is then a sufficient condition for $g_{t:T}^{i}$ to be optimal for the above POMDP problem, which means that $g$ is sequentially rational given $\mu$.

Therefore we conclude that $(g,\mu)$ is a sequential equilibrium under Definition 14.
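The Bayes-rule belief update invoked in the argument above can be illustrated with a minimal Python sketch for a generic finite POMDP. This is an illustration under stated assumptions, not the construction used in the proof: the function `bayes_update`, the dictionary layout of `kernel`, and the deterministic observation map `obs_of` are all hypothetical names chosen here. The update returns `None` when the observation has zero probability, which mirrors the "whenever applicable" qualifier.

```python
from fractions import Fraction


def bayes_update(belief, kernel, action, observation, obs_of):
    """One Bayes-rule belief update in a finite POMDP (illustrative only).

    belief: dict mapping state -> probability (current belief)
    kernel: dict mapping (state, action) -> dict of next_state -> probability
    obs_of: function mapping a next state to its observation (assumed
            deterministic for this sketch)
    Returns the posterior over next states, or None if the observation has
    probability zero, in which case Bayes rule is not applicable.
    """
    unnormalized = {}
    for state, p in belief.items():
        for nxt, q in kernel[(state, action)].items():
            # Keep only next states compatible with what was observed.
            if obs_of(nxt) == observation:
                unnormalized[nxt] = unnormalized.get(nxt, 0) + p * q
    total = sum(unnormalized.values())
    if total == 0:
        return None
    return {s: w / total for s, w in unnormalized.items()}


# Hypothetical toy instance with two current states and one action.
half = Fraction(1, 2)
belief = {"a": half, "b": half}
kernel = {
    ("a", 0): {"a1": Fraction(1, 4), "a2": Fraction(3, 4)},
    ("b", 0): {"b1": Fraction(1)},
}
obs_of = lambda s: "left" if s in ("a1", "b1") else "right"

posterior = bayes_update(belief, kernel, 0, "left", obs_of)
print(posterior)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the posterior can be compared against hand calculation without floating-point noise.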

Appendix C Proofs for Sections 3 and 4

C.1 Proof of Lemma 1

Lemma C.1 (Lemma 1, restated).

If for all $i\in\mathcal{I}$ and all $K^{-i}$-based strategy profiles $\rho^{-i}$, there exist functions $(\Phi_{t}^{i,\rho^{-i}})_{t\in\mathcal{T}}$ where $\Phi_{t}^{i,\rho^{-i}}\colon\mathcal{K}_{t}^{i}\mapsto\Delta(\mathcal{X}_{t}\times\mathcal{K}_{t}^{-i})$ such that

\[
\mathbb{P}^{g^{i},\rho^{-i}}(x_{t},k_{t}^{-i}|h_{t}^{i})=\Phi_{t}^{i,\rho^{-i}}(x_{t},k_{t}^{-i}|k_{t}^{i}),
\tag{82}
\]

for all behavioral strategies $g^{i}$, all $t\in\mathcal{T}$, and all $h_{t}^{i}$ admissible under $(g^{i},\rho^{-i})$, then $K=(K^{i})_{i\in\mathcal{I}}$ is mutually sufficient information.

Proof C.2.

Let $g^{i}$ be an arbitrary behavioral strategy for player $i$ and $\rho^{-i}$ be any $K^{-i}$-based strategy profile. Let $h_{t}^{i}$ be admissible under $(g^{i},\rho^{-i})$. We have

\begin{align}
\mathbb{P}^{g^{i},\rho^{-i}}(\tilde{x}_{t},\tilde{u}_{t}^{-i}|h_{t}^{i})
&=\sum_{\tilde{h}_{t}^{-i}}\mathbb{P}^{g^{i},\rho^{-i}}(\tilde{u}_{t}^{-i}|\tilde{x}_{t},\tilde{h}_{t}^{-i},h_{t}^{i},u_{t}^{i})\mathbb{P}^{g^{i},\rho^{-i}}(\tilde{x}_{t},\tilde{h}_{t}^{-i}|h_{t}^{i},u_{t}^{i}) \tag{83}\\
&=\sum_{\tilde{h}_{t}^{-i}}\left(\prod_{j\neq i}\rho_{t}^{j}(\tilde{u}_{t}^{j}|\tilde{k}_{t}^{j})\right)\mathbb{P}^{g^{i},\rho^{-i}}(\tilde{x}_{t},\tilde{h}_{t}^{-i}|h_{t}^{i}) \tag{84}\\
&=\sum_{\tilde{k}_{t}^{-i}}\left(\prod_{j\neq i}\rho_{t}^{j}(\tilde{u}_{t}^{j}|\tilde{k}_{t}^{j})\right)\mathbb{P}^{g^{i},\rho^{-i}}(\tilde{x}_{t},\tilde{k}_{t}^{-i}|h_{t}^{i}) \tag{85}\\
&=\sum_{\tilde{k}_{t}^{-i}}\left(\prod_{j\neq i}\rho_{t}^{j}(\tilde{u}_{t}^{j}|\tilde{k}_{t}^{j})\right)\Phi_{t}^{i,\rho^{-i}}(\tilde{x}_{t},\tilde{k}_{t}^{-i}|k_{t}^{i}), \tag{86}
\end{align}

where in (83) we applied the Law of Total Probability. In (84) we used the fact that the players other than $i$ use the $K^{-i}$-based strategies $\rho^{-i}$. In (85) we combined the realizations of $\tilde{h}_{t}^{-i}$ corresponding to the same compressed information $\tilde{k}_{t}^{-i}$. In the final equation, we used the condition of Lemma 1.
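The regrouping used in (85) can be checked numerically on a toy instance. The following Python sketch is only an illustration: the histories `h1`, `h2`, `h3`, the compression map `compress`, and all probabilities are hypothetical, and the state $\tilde{x}_{t}$ is suppressed for brevity. It shows that when a weight depends on a history only through its compressed value, summing over histories gives the same result as first pushing the distribution forward through the compression map and then summing over compressed values.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical conditional distribution over an opponent's full history h.
prob_h = {"h1": Fraction(1, 6), "h2": Fraction(1, 3), "h3": Fraction(1, 2)}

# Hypothetical compression map h -> k (h1 and h2 share the same k).
compress = {"h1": "kA", "h2": "kA", "h3": "kB"}

# A K-based strategy: the probability of a fixed action depends on h
# only through the compressed information k.
rho = {"kA": Fraction(1, 4), "kB": Fraction(2, 3)}

# Sum over full histories, in the spirit of (84).
sum_over_h = sum(rho[compress[h]] * p for h, p in prob_h.items())

# Aggregate histories with the same compressed information, then sum
# over compressed values, in the spirit of (85).
prob_k = defaultdict(Fraction)
for h, p in prob_h.items():
    prob_k[compress[h]] += p
sum_over_k = sum(rho[k] * p for k, p in prob_k.items())

print(sum_over_h, sum_over_k)
```

Both sums agree because the weight `rho[compress[h]]` is constant on each group of histories that share the same compressed value.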

By the definition of the model, $Z_{t}^{i}=f_{t}^{i,Z}(X_{t},U_{t},W_{t})$ for some fixed function $f_{t}^{i,Z}$ independent of the strategy profile. Since the compressed information can be sequentially updated as $K_{t+1}^{i}=\iota_{t+1}^{i}(K_{t}^{i},Z_{t}^{i})$, this means that we can write $K_{t+1}^{i}=\xi_{t}^{i}(K_{t}^{i},X_{t},U_{t},W_{t})$ for some fixed function $\xi_{t}^{i}$. Since $W_{t}$ is a primitive random variable, we conclude that $\mathbb{P}(k_{t+1}^{i}|k_{t}^{i},x_{t},u_{t})$ is independent of any strategy profile. Therefore,

\begin{align}
&\quad~\mathbb{P}^{g^{i},\rho^{-i}}(k_{t+1}^{i}|h_{t}^{i},u_{t}^{i}) \tag{87}\\
&=\sum_{\tilde{x}_{t},\tilde{u}_{t}^{-i}}\mathbb{P}(k_{t+1}^{i}|k_{t}^{i},\tilde{x}_{t},(\tilde{u}_{t}^{-i},u_{t}^{i}))\mathbb{P}^{g^{i},\rho^{-i}}(\tilde{x}_{t},\tilde{u}_{t}^{-i}|h_{t}^{i}) \tag{88}\\
&=\sum_{\tilde{x}_{t},\tilde{u}_{t}^{-i}}\left[\mathbb{P}(k_{t+1}^{i}|k_{t}^{i},\tilde{x}_{t},(\tilde{u}_{t}^{-i},u_{t}^{i}))\sum_{\tilde{k}_{t}^{-i}}\left(\prod_{j\neq i}\rho_{t}^{j}(\tilde{u}_{t}^{j}|\tilde{k}_{t}^{j})\right)\Phi_{t}^{i,\rho^{-i}}(\tilde{x}_{t},\tilde{k}_{t}^{-i}|k_{t}^{i})\right] \tag{89}
\end{align}
=:Pti,ρi(kt+1i|kti,uti),\displaystyle=:P_{t}^{i,\rho^{-i}}(k_{t+1}^{i}|k_{t}^{i},u_{t}^{i}),= : italic_P start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_ρ start_POSTSUPERSCRIPT - italic_i end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ( italic_k start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT | italic_k start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) , (90)

for some function $P_{t}^{i,\rho^{-i}}$, where in (88) we used the Law of Total Probability, and in (89) we substituted (86).
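The sum in (89) can be evaluated mechanically once its three ingredients are tabulated. The following is a minimal Python sketch for the special case of a single opponent $j$; the function name `compressed_kernel` and the dictionary layouts for the kernel, the opponent's strategy, and the belief $\Phi_t^{i,\rho^{-i}}$ are illustrative assumptions, not notation from the paper.

```python
def compressed_kernel(k_i, u_i, next_kernel, rho_j, belief):
    """Evaluate P_t^{i,rho^{-i}}(. | k_i, u_i) as in (89), for one opponent j.

    Hypothetical table layouts (assumptions for this sketch):
      next_kernel[(k_i, x, u_i, u_j)] -> {k_next: prob}  strategy-independent kernel
      rho_j[k_j]                      -> {u_j: prob}      opponent's K^j-based strategy
      belief[k_i]                     -> {(x, k_j): prob} the belief Phi_t^{i,rho^{-i}}
    """
    out = {}
    for (x, k_j), b in belief[k_i].items():          # sum over x_t and k_t^{-i}
        for u_j, p_u in rho_j[k_j].items():          # sum over u_t^{-i}
            for k_next, p_k in next_kernel[(k_i, x, u_i, u_j)].items():
                out[k_next] = out.get(k_next, 0.0) + p_k * p_u * b
    return out
```

The result depends on player $i$'s history only through `k_i`, which is the point of (90): the full history $h_t^i$ never enters the computation.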

Since $R_{t}^{i}=f_{t}^{i,R}(X_{t},U_{t},W_{t})$ for some fixed function $f_{t}^{i,R}$ and $W_{t}$ is a primitive random variable, $\mathbb{E}[R_{t}^{i}|X_{t},U_{t}]$ is independent of the strategy profile $g$. By an argument similar to the one that leads from (87) to (90), we obtain

\begin{align}
&\mathbb{E}^{g^{i},\rho^{-i}}[R_{t}^{i}|h_{t}^{i},u_{t}^{i}] \tag{91}\\
&=\sum_{\tilde{x}_{t},\tilde{u}_{t}^{-i}}\left[\mathbb{E}[R_{t}^{i}|\tilde{x}_{t},(u_{t}^{i},\tilde{u}_{t}^{-i})]\sum_{\tilde{k}_{t}^{-i}}\left(\prod_{j\neq i}\rho_{t}^{j}(\tilde{u}_{t}^{j}|\tilde{k}_{t}^{j})\right)\Phi_{t}^{i,\rho^{-i}}(\tilde{x}_{t},\tilde{k}_{t}^{-i}|k_{t}^{i})\right] \tag{92}\\
&=:r_{t}^{i,\rho^{-i}}(k_{t}^{i},u_{t}^{i}), \tag{93}
\end{align}

for some function $r_{t}^{i,\rho^{-i}}$. With (90) and (93), we have shown that $K$ satisfies Definition 4 and hence $K$ is MSI.

C.2 Proof of Theorem 2

Theorem 16 (Theorem 2, restated).

If $K$ is mutually sufficient information, then there exists at least one $K$-based BNE.

Proof C.3.

The proof proceeds as follows. We first construct a best-response correspondence using stochastic control theory, and then we establish the existence of equilibria by applying Kakutani’s fixed-point theorem to this correspondence. For technical reasons, we first consider only behavioral strategies where each action has probability at least $\epsilon>0$ of being played at each information set. We then take $\epsilon$ to zero.

Fixing a $K^{-i}$-based strategy profile $\rho^{-i}$, we first argue that $K_{t}^{i}$ is a controlled Markov process controlled by player $i$’s action $U_{t}^{i}$.

From the definition of an information state (Definition 3) we know that

\begin{align}
\mathbb{P}^{\tilde{g}^{i},\rho^{-i}}(k_{t+1}^{i}|h_{t}^{i},u_{t}^{i})=P_{t}^{i,\rho^{-i}}(k_{t+1}^{i}|k_{t}^{i},u_{t}^{i}). \tag{94}
\end{align}

Since $(K_{1:t}^{i},U_{1:t}^{i})$ is a function of $(H_{t}^{i},U_{t}^{i})$, by the smoothing property of conditional probability we have

\begin{align}
\mathbb{P}^{\tilde{g}^{i},\rho^{-i}}(k_{t+1}^{i}|k_{1:t}^{i},u_{1:t}^{i})=P_{t}^{i,\rho^{-i}}(k_{t+1}^{i}|k_{t}^{i},u_{t}^{i}). \tag{95}
\end{align}

Therefore, we have shown that $K_{t}^{i}$ is a controlled Markov process controlled by player $i$’s action $U_{t}^{i}$.

From the definition of information state (Definition 3) we know that

\begin{align}
\mathbb{E}^{\tilde{g}^{i},\rho^{-i}}\left[R_{t}^{i}|k_{t}^{i},u_{t}^{i}\right]=r_{t}^{i,\rho^{-i}}(k_{t}^{i},u_{t}^{i}), \tag{96}
\end{align}

for all $(k_{t}^{i},u_{t}^{i})$ admissible under $(\tilde{g}^{i},\rho^{-i})$.

Therefore, using the Law of Total Expectation we have

\begin{align}
J^{i}(\tilde{g}^{i},\rho^{-i})&=\mathbb{E}^{\tilde{g}^{i},\rho^{-i}}\left[\sum_{t=1}^{T}R_{t}^{i}\right]=\mathbb{E}^{\tilde{g}^{i},\rho^{-i}}\left[\sum_{t=1}^{T}\mathbb{E}^{\tilde{g}^{i},\rho^{-i}}\left[R_{t}^{i}|K_{t}^{i},U_{t}^{i}\right]\right] \tag{97}\\
&=\mathbb{E}^{\tilde{g}^{i},\rho^{-i}}\left[\sum_{t=1}^{T}r_{t}^{i,\rho^{-i}}(K_{t}^{i},U_{t}^{i})\right]. \tag{98}
\end{align}

By standard MDP theory, there exist $K^{i}$-based strategies $\rho^{i}$ that maximize $J^{i}(\tilde{g}^{i},\rho^{-i})$ over all behavioral strategies $\tilde{g}^{i}$. Furthermore, optimal $K^{i}$-based strategies can be found through dynamic programming.

Let $\epsilon>0$, and let $\mathcal{P}^{\epsilon,i}$ denote the set of $K^{i}$-based strategies for player $i$ where each action $u_{t}^{i}\in\mathcal{U}_{t}^{i}$ is chosen with probability at least $\epsilon$ at any information set. To endow $\mathcal{P}^{\epsilon,i}$ with a topology, we consider it as a product of sets of distributions, i.e.,

\begin{align}
\mathcal{P}^{\epsilon,i}=\prod_{t\in\mathcal{T}}\prod_{k_{t}^{i}\in\mathcal{K}_{t}^{i}}\Delta^{\epsilon}(\mathcal{U}_{t}^{i}), \tag{99}
\end{align}

where

\begin{align}
\Delta^{\epsilon}(\mathcal{U}_{t}^{i})=\{\eta\in\Delta(\mathcal{U}_{t}^{i}):\eta(u_{t}^{i})\geq\epsilon~\forall u_{t}^{i}\in\mathcal{U}_{t}^{i}\}. \tag{100}
\end{align}

Define $\mathcal{P}^{\epsilon}=\prod_{i\in\mathcal{I}}\mathcal{P}^{\epsilon,i}$. Denote the set of all $K$-based strategy profiles by $\mathcal{P}^{0}$.

For the rest of the proof, assume that $\epsilon$ is small enough such that $\Delta^{\epsilon}(\mathcal{U}_{t}^{i})$ is non-empty for all $t\in\mathcal{T}$ and $i\in\mathcal{I}$.
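As a concrete reading of this assumption: by (100), a distribution in $\Delta^{\epsilon}(\mathcal{U}_{t}^{i})$ puts mass at least $\epsilon$ on each of the $|\mathcal{U}_{t}^{i}|$ actions, so the set is non-empty exactly when $\epsilon\,|\mathcal{U}_{t}^{i}|\leq 1$. The short Python sketch below checks this condition and membership in $\Delta^{\epsilon}$; the helper names and the numerical tolerance are illustrative assumptions.

```python
def eps_simplex_nonempty(num_actions, eps):
    """Delta^eps over num_actions actions is non-empty iff eps * num_actions <= 1."""
    return eps * num_actions <= 1


def in_eps_simplex(eta, eps, tol=1e-12):
    """Check eta (dict action -> prob) against (100): sums to one, each entry >= eps.

    The tolerance `tol` is an assumption of this sketch, added for floating point.
    """
    sums_to_one = abs(sum(eta.values()) - 1.0) <= tol
    bounded_below = all(p >= eps - tol for p in eta.values())
    return sums_to_one and bounded_below
```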

For each $t\in\mathcal{T}$, $i\in\mathcal{I}$ and $k_{t}^{i}\in\mathcal{K}_{t}^{i}$, define the correspondence $\mathrm{BR}_{t}^{\epsilon,i}[k_{t}^{i}]:\mathcal{P}^{\epsilon,-i}\mapsto\Delta^{\epsilon}(\mathcal{U}_{t}^{i})$ sequentially through

\begin{align}
Q_{T}^{\epsilon,i}(k_{T}^{i},u_{T}^{i};\rho^{-i})&:=r_{T}^{i,\rho^{-i}}(k_{T}^{i},u_{T}^{i}), \tag{101a}\\
\mathrm{BR}_{t}^{\epsilon,i}[k_{t}^{i}](\rho^{-i})&:=\underset{\eta\in\Delta^{\epsilon}(\mathcal{U}_{t}^{i})}{\arg\max}~\sum_{\tilde{u}_{t}^{i}}Q_{t}^{\epsilon,i}(k_{t}^{i},\tilde{u}_{t}^{i};\rho^{-i})\eta(\tilde{u}_{t}^{i}), \tag{101b}\\
V_{t}^{\epsilon,i}(k_{t}^{i};\rho^{-i})&:=\max_{\eta\in\Delta^{\epsilon}(\mathcal{U}_{t}^{i})}\sum_{\tilde{u}_{t}^{i}}Q_{t}^{\epsilon,i}(k_{t}^{i},\tilde{u}_{t}^{i};\rho^{-i})\eta(\tilde{u}_{t}^{i}), \tag{101c}\\
Q_{t-1}^{\epsilon,i}(k_{t-1}^{i},u_{t-1}^{i};\rho^{-i})&:=r_{t-1}^{i,\rho^{-i}}(k_{t-1}^{i},u_{t-1}^{i}) \tag{101d}\\
&\quad+\sum_{k_{t}^{i}\in\mathcal{K}_{t}^{i}}V_{t}^{\epsilon,i}(k_{t}^{i};\rho^{-i})P_{t-1}^{i,\rho^{-i}}(k_{t}^{i}|k_{t-1}^{i},u_{t-1}^{i}). \tag{101e}
\end{align}
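The recursion (101a)-(101e) is a finite-horizon backward induction in which the maximization is restricted to $\Delta^{\epsilon}(\mathcal{U}_{t}^{i})$. Because the objective in (101b) is linear in $\eta$, one maximizer places mass $\epsilon$ on every action and the remaining mass $1-\epsilon|\mathcal{U}_{t}^{i}|$ on an action with the largest $Q$-value. The Python sketch below illustrates this for one player with $\rho^{-i}$ fixed. It assumes the functions $r_{t}^{i,\rho^{-i}}$ and $P_{t}^{i,\rho^{-i}}$ have already been tabulated as dictionaries; all names and table layouts are hypothetical. It returns a single element of the best-response set, whereas $\mathrm{BR}_{t}^{\epsilon,i}[k_{t}^{i}](\rho^{-i})$ itself may contain more than one maximizer when there are ties.

```python
def eps_best_response(q, eps):
    """Return one maximizer of sum_u q[u] * eta[u] over Delta^eps, and its value.

    q: dict action -> Q-value. Assumes eps * len(q) <= 1.
    """
    best = max(q, key=q.get)                 # ties broken arbitrarily
    eta = {u: eps for u in q}                # minimum mass on every action
    eta[best] += 1.0 - eps * len(q)          # leftover mass on a best action
    value = sum(q[u] * eta[u] for u in q)
    return eta, value


def backward_induction(T, states, actions, r, P, eps):
    """Sketch of (101a)-(101e) for times t = T, T-1, ..., 1.

    Hypothetical table layouts:
      states[t], actions[t] -> lists of compressed states / actions at time t
      r[t][(k, u)]          -> float, the reward r_t^{i,rho^{-i}}(k, u)
      P[t][(k, u)]          -> {k_next: prob}, the kernel P_t^{i,rho^{-i}}(. | k, u)
    Returns (policy, V) with policy[t][k] a distribution in Delta^eps.
    """
    policy, V = {}, {}
    for t in range(T, 0, -1):
        policy[t], V[t] = {}, {}
        for k in states[t]:
            q = {}
            for u in actions[t]:
                q[u] = r[t][(k, u)]                          # (101a) / (101d)
                if t < T:                                    # continuation term (101e)
                    q[u] += sum(p * V[t + 1][k2] for k2, p in P[t][(k, u)].items())
            policy[t][k], V[t][k] = eps_best_response(q, eps)  # (101b), (101c)
    return policy, V
```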

Define $\mathrm{BR}^{\epsilon}:\mathcal{P}^{\epsilon}\mapsto\mathcal{P}^{\epsilon}$ by

\begin{equation}
\mathrm{BR}^{\epsilon}(\rho)=\prod_{i\in\mathcal{I}}\prod_{t\in\mathcal{T}}\prod_{k_{t}^{i}\in\mathcal{K}_{t}^{i}}\mathrm{BR}_{t}^{\epsilon,i}[k_{t}^{i}](\rho^{-i}). \tag{102}
\end{equation}

Claim:

  (a) $P_{t}^{i,\rho^{-i}}(k_{t+1}^{i}|k_{t}^{i},u_{t}^{i})$ is continuous in $\rho^{-i}$ on $\mathcal{P}^{\epsilon,-i}$ for all $t\in\mathcal{T}$ and all $k_{t+1}^{i}\in\mathcal{K}_{t+1}^{i}$, $k_{t}^{i}\in\mathcal{K}_{t}^{i}$, $u_{t}^{i}\in\mathcal{U}_{t}^{i}$.

  (b) $r_{t}^{i,\rho^{-i}}(k_{t}^{i},u_{t}^{i})$ is continuous in $\rho^{-i}$ on $\mathcal{P}^{\epsilon,-i}$ for all $t\in\mathcal{T}$ and all $k_{t}^{i}\in\mathcal{K}_{t}^{i}$, $u_{t}^{i}\in\mathcal{U}_{t}^{i}$.

Given the claims, we prove by induction that $Q_{t}^{\epsilon,i}(k_{t}^{i},u_{t}^{i};\rho^{-i})$ is continuous in $\rho^{-i}$ on $\mathcal{P}^{\epsilon,-i}$ for each $k_{t}^{i}\in\mathcal{K}_{t}^{i}$, $u_{t}^{i}\in\mathcal{U}_{t}^{i}$.

Induction Base: $Q_{T}^{\epsilon,i}(k_{T}^{i},u_{T}^{i};\rho^{-i})=r_{T}^{i,\rho^{-i}}(k_{T}^{i},u_{T}^{i})$ is continuous in $\rho^{-i}$ on $\mathcal{P}^{\epsilon,-i}$ due to part (b) of the claims.

Induction Step: Suppose that the induction hypothesis is true for $t$. Then $V_{t}^{\epsilon,i}(k_{t}^{i};\rho^{-i})$ is continuous in $\rho^{-i}$ on $\mathcal{P}^{\epsilon,-i}$ due to Berge's Maximum Theorem [49]. Then, $Q_{t-1}^{\epsilon,i}(k_{t-1}^{i},u_{t-1}^{i};\rho^{-i})$ is continuous in $\rho^{-i}$ on $\mathcal{P}^{\epsilon,-i}$ due to the claims.

Applying Berge's Maximum Theorem [49] once again, we conclude that $\mathrm{BR}_{t}^{\epsilon,i}[k_{t}^{i}]$ is upper hemicontinuous on $\mathcal{P}^{\epsilon,-i}$. For each $\rho^{-i}\in\mathcal{P}^{\epsilon,-i}$, $\mathrm{BR}_{t}^{\epsilon,i}[k_{t}^{i}](\rho^{-i})$ is non-empty and convex since it is the solution set of a linear program.
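The linear program mentioned above can be sketched in Python under one explicit assumption: that the $\epsilon$-constrained mixed actions at $k_t^i$ are the distributions on $\mathcal{U}_t^i$ assigning probability at least $\epsilon$ to every action. Under that assumption, one optimal vertex places $\epsilon$ on every action and the remaining mass on a maximizer of $Q$. The function name `eps_best_response` and its arguments are hypothetical.

```python
def eps_best_response(q_values, eps):
    """Return one optimal eps-constrained mixed action and its value.

    Solves: maximize sum_u eta[u] * q_values[u]
            subject to eta[u] >= eps for all u, sum_u eta[u] == 1.
    (Assumed form of the constraint set; not taken verbatim from the paper.)
    """
    n = len(q_values)
    if n == 0 or eps < 0 or n * eps > 1:
        raise ValueError("need a non-empty action set and 0 <= eps <= 1/n")
    # Index of one action with the largest Q-value (ties broken by first index).
    best = max(range(n), key=lambda u: q_values[u])
    eta = [eps] * n
    eta[best] += 1.0 - n * eps  # all slack mass goes to the maximizer
    value = sum(p * q for p, q in zip(eta, q_values))
    return eta, value


eta, value = eps_best_response([1.0, 3.0], 0.25)
```

When several actions tie for the largest $Q$-value, the full solution set is the convex hull of the vertices obtained by choosing each tied action in turn, which is consistent with the convexity used in the proof; the sketch returns only one such vertex.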

As a product of compact-valued upper hemicontinuous correspondences, $\mathrm{BR}^{\epsilon}$ is upper hemicontinuous. For each $\rho\in\mathcal{P}^{\epsilon}$, $\mathrm{BR}^{\epsilon}(\rho)$ is non-empty and convex. By Kakutani's fixed point theorem, $\mathrm{BR}^{\epsilon}$ has a fixed point.

The above construction provides an approximate $K$-based BNE for small $\epsilon$. Next, we show that we can take $\epsilon$ to zero to obtain an exact BNE: Let $(\epsilon_{n})_{n=1}^{\infty}$ be a sequence such that $\epsilon_{n}>0$, $\epsilon_{n}\rightarrow 0$. Let $\rho^{(n)}$ be a fixed point of $\mathrm{BR}^{\epsilon_{n}}$. Then for each $i\in\mathcal{I}$ we have

\begin{equation}
\rho^{(n),i}\in\underset{\tilde{\rho}^{i}\in\mathcal{P}^{\epsilon_{n},i}}{\arg\max}~J^{i}(\tilde{\rho}^{i},\rho^{(n),-i}). \tag{103}
\end{equation}

Let $\rho^{(\infty)}\in\mathcal{P}^{0}$ be the limit of a convergent sub-sequence of $(\rho^{(n)})_{n=1}^{\infty}$; such a sub-sequence exists since $\mathcal{P}^{0}$ is compact. Since $J^{i}(\rho)$ is continuous in $\rho$ on $\mathcal{P}^{0}$, and $\epsilon\mapsto\mathcal{P}^{\epsilon,i}$ is a continuous correspondence with compact, non-empty values, applying Berge's Maximum Theorem [49] one last time, we conclude that for each $i\in\mathcal{I}$

\begin{equation}
\rho^{(\infty),i}\in\underset{\tilde{\rho}^{i}\in\mathcal{P}^{0,i}}{\arg\max}~J^{i}(\tilde{\rho}^{i},\rho^{(\infty),-i}), \tag{104}
\end{equation}

i.e. $\rho^{(\infty),i}$ is optimal among $K^{i}$-based strategies in response to $\rho^{(\infty),-i}$. Recall that we have shown that there exists a $K^{i}$-based strategy $\rho^{i}$ that maximizes $J^{i}(\tilde{g}^{i},\rho^{-i})$ over all behavioral strategies $\tilde{g}^{i}$. Therefore, we conclude that $\rho^{(\infty)}$ forms a BNE, proving the existence of $K$-based BNE.

Proof of Claim: We establish the continuity of the two functions by showing that they can be expressed with basic functions (i.e. summation, multiplication, division).

Let $\hat{g}^{i}$ be a behavioral strategy where player $i$ chooses actions uniformly at random at every information set. For $\rho^{-i}\in\mathcal{P}^{\epsilon,-i}$, we have $\mathbb{P}^{\hat{g}^{i},\rho^{-i}}(k_{t}^{i})>0$ for all $k_{t}^{i}\in\mathcal{K}_{t}^{i}$ since $(\hat{g}^{i},\rho^{-i})$ is a strategy profile that always plays strictly mixed actions. Therefore we have

\begin{align}
P_{t}^{i,\rho^{-i}}(k_{t+1}^{i}|k_{t}^{i},u_{t}^{i}) &=\mathbb{P}^{\hat{g}^{i},\rho^{-i}}(k_{t+1}^{i}|k_{t}^{i},u_{t}^{i})=\dfrac{\mathbb{P}^{\hat{g}^{i},\rho^{-i}}(k_{t+1}^{i},k_{t}^{i},u_{t}^{i})}{\mathbb{P}^{\hat{g}^{i},\rho^{-i}}(k_{t}^{i},u_{t}^{i})}, \tag{105}\\
r_{t}^{i,\rho^{-i}}(k_{t}^{i},u_{t}^{i}) &=\mathbb{E}^{\hat{g}^{i},\rho^{-i}}[R_{t}^{i}|k_{t}^{i},u_{t}^{i}] \tag{106}\\
&=\sum_{x_{t}\in\mathcal{X}_{t},\,u_{t}^{-i}\in\mathcal{U}_{t}^{-i}}\mathbb{E}[R_{t}^{i}|x_{t},u_{t}]\,\mathbb{P}^{\hat{g}^{i},\rho^{-i}}(x_{t},u_{t}^{-i}|k_{t}^{i},u_{t}^{i}), \tag{107}
\end{align}

where $\mathbb{E}[R_{t}^{i}|x_{t},u_{t}]$ is independent of the strategy profile.

We know that both $\mathbb{P}^{\hat{g}^{i},\rho^{-i}}(k_{t+1}^{i},k_{t}^{i},u_{t}^{i})$ and $\mathbb{P}^{\hat{g}^{i},\rho^{-i}}(k_{t}^{i},u_{t}^{i})$ are sums of products of components of $\rho^{-i}$ and $\hat{g}^{i}$, hence both are continuous in $\rho^{-i}$. Since the denominator in (105) is strictly positive on $\mathcal{P}^{\epsilon,-i}$, the ratio $P_{t}^{i,\rho^{-i}}(k_{t+1}^{i}|k_{t}^{i},u_{t}^{i})$ is continuous in $\rho^{-i}$ on $\mathcal{P}^{\epsilon,-i}$. The continuity of $r_{t}^{i,\rho^{-i}}(k_{t}^{i},u_{t}^{i})$ in $\rho^{-i}$ on $\mathcal{P}^{\epsilon,-i}$ can be shown with an analogous argument.
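To see why the restriction to $\mathcal{P}^{\epsilon,-i}$ matters in this argument, consider a hypothetical toy instance that is not taken from the paper: an opponent has a binary type with a given prior and plays action `L` with a type-dependent probability, and player $i$ forms a posterior over the type after observing `L`. The posterior is a ratio of polynomials in the opponent's strategy, exactly as in (105), and it is well defined whenever the opponent plays `L` with positive probability. The name `posterior_type1` and its parameters are illustrative assumptions.

```python
def posterior_type1(rho_L_given_0, rho_L_given_1, prior1=0.5):
    """Posterior probability of type 1 after observing action L (Bayes' rule).

    rho_L_given_0, rho_L_given_1: opponent's probability of playing L per type.
    The ratio is defined only when L has positive total probability, which is
    guaranteed when both arguments are bounded below by some eps > 0.
    """
    joint1 = prior1 * rho_L_given_1          # P(type = 1, action = L)
    joint0 = (1.0 - prior1) * rho_L_given_0  # P(type = 0, action = L)
    denom = joint0 + joint1                  # P(action = L)
    if denom <= 0.0:
        raise ZeroDivisionError("action L has zero probability; posterior undefined")
    return joint1 / denom


p_mixed = posterior_type1(0.25, 0.75)
p_nearby = posterior_type1(0.1 + 1e-9, 0.1)
```

On the boundary, where both types play `L` with probability zero, the ratio is undefined, which is the situation the $\epsilon$-constraint rules out.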

C.3 Proof of Theorem 3

Theorem 17 (Theorem 3, restated).

If $K=(K^{i})_{i\in\mathcal{I}}$ where $K^{i}$ is unilaterally sufficient information for player $i$, then the set of $K$-based BNE payoffs is the same as that of all BNE.

To establish Theorem 3, we first introduce Definition 18, an extension of Definition 3, for the convenience of the proof. Then, we establish Lemmas C.4, C.6, C.8. Finally, we conclude the proof of Theorem 3 from the three lemmas.

In the following definition, we extend the notion of an information state so that it accounts for the payoffs of players other than player $i$ as well. This definition allows us to characterize compression maps that preserve payoff profiles, as required in the statement of Theorem 3.

Definition 18.

Let $g^{-i}$ be a behavioral strategy profile of players other than $i$ and $\mathcal{J}\subseteq\mathcal{I}$ be a subset of players. We say that $K^{i}$ is an information state under $g^{-i}$ for the payoffs of $\mathcal{J}$ if there exist functions $(P_{t}^{i,g^{-i}})_{t\in\mathcal{T}}$, $(r_{t}^{j,g^{-i}})_{j\in\mathcal{J},t\in\mathcal{T}}$, where $P_{t}^{i,g^{-i}}:\mathcal{K}_{t}^{i}\times\mathcal{U}_{t}^{i}\mapsto\Delta(\mathcal{K}_{t+1}^{i})$ and $r_{t}^{j,g^{-i}}:\mathcal{K}_{t}^{i}\times\mathcal{U}_{t}^{i}\mapsto[-1,1]$, such that

  1. (1)

    gi,gi(kt+1i|hti,uti)=Pti,gi(kt+1i|kti,uti)superscriptsuperscript𝑔𝑖superscript𝑔𝑖conditionalsuperscriptsubscript𝑘𝑡1𝑖superscriptsubscript𝑡𝑖superscriptsubscript𝑢𝑡𝑖superscriptsubscript𝑃𝑡𝑖superscript𝑔𝑖conditionalsuperscriptsubscript𝑘𝑡1𝑖superscriptsubscript𝑘𝑡𝑖superscriptsubscript𝑢𝑡𝑖\mathbb{P}^{g^{i},g^{-i}}(k_{t+1}^{i}|h_{t}^{i},u_{t}^{i})=P_{t}^{i,g^{-i}}(k_% {t+1}^{i}|k_{t}^{i},u_{t}^{i})blackboard_P start_POSTSUPERSCRIPT italic_g start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_g start_POSTSUPERSCRIPT - italic_i end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ( italic_k start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT | italic_h start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) = italic_P start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i , italic_g start_POSTSUPERSCRIPT - italic_i end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ( italic_k start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT | italic_k start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ) for all t𝒯\{T}𝑡\𝒯𝑇t\in\mathcal{T}\backslash\{T\}italic_t ∈ caligraphic_T \ { italic_T }; and

  2. (2)

    $\mathbb{E}^{g^i,g^{-i}}[R_t^j\mid h_t^i,u_t^i]=r_t^{j,g^{-i}}(k_t^i,u_t^i)$ for all $j\in\mathcal{J}$ and all $t\in\mathcal{T}$,

for all $g^i$, and all $(h_t^i,u_t^i)$ admissible under $(g^i,g^{-i})$.

Notice that condition (2) of Definition 18 means that the information state $K^i$ is sufficient for evaluating other agents' payoffs as well. This property is essential in establishing the preservation of payoff profiles of other agents when player $i$ switches to a compression-based strategy.

Lemma C.4.

If $K^i$ is unilaterally sufficient information, then for every behavioral strategy profile $g^{-i}$, $K^i$ is an information state under $g^{-i}$ for the payoffs of $\mathcal{I}$.

Proof C.5 (Proof of Lemma C.4).

Let $\Phi_t^{i,g^{-i}}$ be as in the definition of USI (Definition 1). Then

\begin{equation}
\mathbb{P}^{g}(x_t,h_t^{-i}\mid h_t^i)=\Phi_t^{i,g^{-i}}(x_t,h_t^{-i}\mid k_t^i). \tag{108}
\end{equation}

Applying the Law of Total Probability,

\begin{align}
\mathbb{P}^{g}(\tilde{x}_t,\tilde{u}_t^{-i}\mid h_t^i) &= \sum_{\tilde{h}_t^{-i}} \mathbb{P}^{g}(\tilde{u}_t^{-i}\mid\tilde{x}_t,\tilde{h}_t^{-i},h_t^i)\,\mathbb{P}^{g}(\tilde{x}_t,\tilde{h}_t^{-i}\mid h_t^i) \tag{109}\\
&= \sum_{\tilde{h}_t^{-i}} \left(\prod_{j\neq i} g_t^j(\tilde{u}_t^j\mid\tilde{h}_t^j)\right)\Phi_t^{i,g^{-i}}(\tilde{x}_t,\tilde{h}_t^{-i}\mid k_t^i) \tag{110}\\
&=: \tilde{P}_t^{i,g^{-i}}(\tilde{x}_t,\tilde{u}_t^{-i}\mid k_t^i). \tag{111}
\end{align}

We know that $K_{t+1}^i=\iota_{t+1}^i(K_t^i,Z_t^i)=\xi_t^i(K_t^i,X_t,U_t,W_t)$ for some fixed function $\xi_t^i$ independent of the strategy profile $g$. Since $W_t$ is a primitive random variable, $\mathbb{P}(k_{t+1}^i\mid k_t^i,x_t,u_t)$ is independent of the strategy profile $g$. Therefore,

\begin{align}
\mathbb{P}^{g}(k_{t+1}^i\mid h_t^i,u_t^i) &= \sum_{\tilde{x}_t,\tilde{u}_t^{-i}} \mathbb{P}(k_{t+1}^i\mid k_t^i,\tilde{x}_t,(\tilde{u}_t^{-i},u_t^i))\,\tilde{P}_t^{i,g^{-i}}(\tilde{x}_t,\tilde{u}_t^{-i}\mid k_t^i) \tag{112}\\
&=: P_t^{i,g^{-i}}(k_{t+1}^i\mid k_t^i,u_t^i), \tag{113}
\end{align}

establishing part (1) of Definition 18.

Consider any $j\in\mathcal{I}$. Since $R_t^j$ is a strategy-independent function of $(X_t,U_t,W_t)$, $\mathbb{E}[R_t^j\mid x_t,u_t]$ is independent of $g$. Therefore

\begin{align}
\mathbb{E}^{g}[R_t^j\mid h_t^i,u_t^i] &= \sum_{\tilde{x}_t,\tilde{u}_t^{-i}} \mathbb{E}[R_t^j\mid\tilde{x}_t,(u_t^i,\tilde{u}_t^{-i})]\,\tilde{P}_t^{i,g^{-i}}(\tilde{x}_t,\tilde{u}_t^{-i}\mid k_t^i) \tag{114}\\
&=: r_t^{j,g^{-i}}(k_t^i,u_t^i), \tag{115}
\end{align}

establishing part (2) of Definition 18.
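The two marginalizations in (110)-(113) can be checked numerically on a toy instance. The following is a minimal sketch, not part of the original proof: it assumes finite sets, a single opponent (so the product over $j\neq i$ in (110) has one factor), and hypothetical table names `phi`, `g_other`, and `kernel` standing in for $\Phi_t^{i,g^{-i}}$, $g_t^{-i}$, and $\mathbb{P}(k_{t+1}^i\mid k_t^i,x_t,u_t)$. All numbers are made up for illustration.

```python
from itertools import product


def compressed_transition(phi, g_other, kernel, xs, hs, vs, us, ks):
    """Toy version of (110)-(113) with one opponent.

    phi[k][(x, h)]        ~ Phi(x, h^{-i} | k^i)            (hypothetical table)
    g_other[h][v]         ~ opponent's strategy g^{-i}(v | h^{-i})
    kernel[(k, x, v, u)]  ~ dict k_next -> P(k_next | k, x, (v, u))
    """
    # (110)-(111): sum out the opponent's private history h.
    p_tilde = {
        k: {
            (x, v): sum(g_other[h][v] * phi[k][(x, h)] for h in hs)
            for x, v in product(xs, vs)
        }
        for k in ks
    }
    # (112)-(113): sum out the state x and the opponent's action v.
    p_next = {
        (k, u): {
            k_next: sum(
                kernel[(k, x, v, u)][k_next] * p_tilde[k][(x, v)]
                for x, v in product(xs, vs)
            )
            for k_next in ks
        }
        for k, u in product(ks, us)
    }
    return p_tilde, p_next


# Illustrative data (assumed, not from the paper).
xs, hs, vs, us, ks = [0, 1], ["a", "b"], [0, 1], [0, 1], [0, 1]
phi = {
    0: {(x, h): 0.25 for x, h in product(xs, hs)},
    1: {(0, "a"): 0.5, (0, "b"): 0.0, (1, "a"): 0.25, (1, "b"): 0.25},
}
g_other = {"a": {0: 1.0, 1: 0.0}, "b": {0: 0.5, 1: 0.5}}
# Deterministic update k_next = (x + v + u) mod 2, chosen only for the example.
kernel = {
    (k, x, v, u): {k_next: float(k_next == (x + v + u) % 2) for k_next in ks}
    for k, x, v, u in product(ks, xs, vs, us)
}
p_tilde, p_next = compressed_transition(phi, g_other, kernel, xs, hs, vs, us, ks)
```

Player $i$'s full history $h_t^i$ never enters the computation: the resulting kernel is indexed by $(k_t^i,u_t^i)$ only, which is the content of part (1) of Definition 18.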

In Lemma C.6, we show that any behavioral strategy of player $i$ can be replaced by an equivalent randomized USI-based strategy while preserving payoffs of all players.

Lemma C.6.

Let $K^i$ be unilaterally sufficient information. Then for every behavioral strategy $g^i$ of player $i$, if the $K^i$-based strategy $\rho^i$ is given by

\begin{equation}
\rho_t^i(u_t^i\mid k_t^i)=\sum_{\tilde{h}_t^i\in\mathcal{H}_t^i} g_t^i(u_t^i\mid\tilde{h}_t^i)\,F_t^{i,g^i}(\tilde{h}_t^i\mid k_t^i), \tag{116}
\end{equation}

where $F_t^{i,g^i}(\tilde{h}_t^i\mid k_t^i)$ is defined in Definition 1, then

\begin{equation*}
J^j(g^i,g^{-i})=J^j(\rho^i,g^{-i}),
\end{equation*}

for all $j\in\mathcal{I}$ and all behavioral strategy profiles $g^{-i}$ of players other than $i$.

Proof C.7 (Proof of Lemma C.6).

Let $j\in\mathcal{I}$. Consider an MDP with state $H_t^i$, action $U_t^i$ and instantaneous reward $\tilde{r}_t^{i,j}(h_t^i,u_t^i):=\mathbb{E}^{g^{-i}}[R_t^j\mid h_t^i,u_t^i]$. By Lemma C.4, $K^i$ is an information state (as defined in Definition 9) for this MDP. Hence $J^j(g^i,g^{-i})=J^j(\rho^i,g^{-i})$ follows from the Policy Equivalence Lemma (Lemma A.2).
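The construction in (116) is a mixture of the history-based action distributions, weighted by the conditional distribution of the history given the compressed information. A minimal sketch for one time step follows, assuming finite sets and that `F[k]` is a probability distribution over the histories compatible with `k`. The function name `usi_based_strategy` and all numbers are hypothetical and chosen only for illustration.

```python
def usi_based_strategy(g_i, F, actions):
    """Build rho[k][u] = sum_h g_i[h][u] * F[k][h], as in (116).

    g_i[h][u] ~ g_t^i(u | h)       (behavioral strategy, one time step)
    F[k][h]   ~ F_t^{i,g^i}(h | k) (assumed given; see Definition 1)
    """
    return {
        k: {u: sum(g_i[h][u] * prob for h, prob in posterior.items())
            for u in actions}
        for k, posterior in F.items()
    }


# Illustrative data (assumed): three histories compressed into two values.
g_i = {
    "h1": {"L": 1.0, "R": 0.0},
    "h2": {"L": 0.0, "R": 1.0},
    "h3": {"L": 0.5, "R": 0.5},
}
F = {"k1": {"h1": 0.25, "h2": 0.75}, "k2": {"h3": 1.0}}
rho = usi_based_strategy(g_i, F, ["L", "R"])
```

In this example `g_i` is deterministic on `h1` and `h2`, yet `rho` randomizes on `k1`. This illustrates why the replacement strategy in Lemma C.6 is a randomized one in general.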

In Lemma C.8, we proceed to show that a behavioral strategy can be replaced with a USI-based strategy while preserving not only the payoffs of all players, but also the equilibrium.

Lemma C.8.

If $K^i$ is unilaterally sufficient information for player $i$, then for any BNE strategy profile $g=(g^i)_{i\in\mathcal{I}}$ there exists a $K^i$-based strategy $\rho^i$ such that $(\rho^i,g^{-i})$ forms a BNE with the same expected payoff profile as $g$.

Proof C.9 (Proof of Lemma C.8).

Let $\rho^i$ be associated with $g^i$ as specified in Lemma C.6. Set $\bar{g}=(\rho^i,g^{-i})$. Since $J^i(\rho^i,g^{-i})=J^i(g^i,g^{-i})$ and $g^i$ is a best response to $g^{-i}$, $\rho^i$ is also a best response to $g^{-i}$.

Consider $j\neq i$. Let $\tilde{g}^j$ be an arbitrary behavioral strategy of player $j$. By using Lemma C.6 twice we have

\begin{align}
J^j(\bar{g}^j,\bar{g}^{-j}) &= J^j(\rho^i,g^{-i}) = J^j(g) \geq J^j(\tilde{g}^j,g^{-j}) \tag{117}\\
&= J^j(\tilde{g}^j,(\rho^i,g^{-\{i,j\}})) = J^j(\tilde{g}^j,\bar{g}^{-j}). \tag{118}
\end{align}

Therefore $\bar{g}^j$ is a best response to $(\rho^i,g^{-\{i,j\}})$. We conclude that $\bar{g}=(\rho^i,g^{-i})$ is also a BNE.

Proof C.10 (Proof of Theorem 3).

Given any BNE strategy profile $g$, applying Lemma C.8 iteratively for each $i\in\mathcal{I}$, we obtain a $K$-based BNE strategy profile $\rho$ with the same expected payoff profile as $g$. Therefore the set of $K$-based BNE payoffs is the same as that of all BNE.
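The iteration can be spelled out as follows. This is a sketch: the labeling $\mathcal{I}=\{1,\dots,N\}$ and the intermediate profiles $g^{(i)}$ are notation introduced here for illustration only. Set $g^{(0)}:=g$ and, for $i=1,\dots,N$, let $g^{(i)}:=(\rho^i,g^{(i-1),-i})$, where $\rho^i$ is the $K^i$-based strategy that Lemma C.8 provides for the BNE $g^{(i-1)}$. Each step leaves the strategies of players other than $i$ unchanged and preserves both the equilibrium property and the payoff profile, so

\begin{align*}
&g^{(0)}=(g^1,g^2,\dots,g^N)\;\longrightarrow\;g^{(1)}=(\rho^1,g^2,\dots,g^N)\;\longrightarrow\;\cdots\;\longrightarrow\;g^{(N)}=(\rho^1,\rho^2,\dots,\rho^N)=\rho,\\
&J^j(g^{(0)})=J^j(g^{(1)})=\cdots=J^j(g^{(N)})=J^j(\rho)\quad\text{for all } j\in\mathcal{I}.
\end{align*}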

C.4 Proof of Theorem 4

Theorem 19 (Theorem 4, restated).

If $K$ is mutually sufficient information, then there exists at least one $K$-based sequential equilibrium.

Proof C.11.

The proof of Theorem 4 follows similar steps to that of Theorem 2, where we construct a sequence of strictly mixed strategy profiles via the fixed points of dynamic program based best response mappings. In addition, we show the sequential rationality of the strategy profile constructed.

Let $(\rho^{(n)})_{n=1}^{\infty}$ be a sequence of $K$-based strategy profiles that always assigns strictly mixed actions as constructed in the proof of Theorem 2. By taking a sub-sequence, without loss of generality, assume that $\rho^{(n)}\rightarrow\rho^{(\infty)}$ for some $K$-based strategy profile $\rho^{(\infty)}$.

Let $Q^{(n)}$ be conjectures of reward-to-go functions consistent (in the sense of Definition 12) with $\rho^{(n)}$, i.e.

\begin{equation}
Q_\tau^{(n),i}(h_\tau^i,u_\tau^i):=\mathbb{E}^{\rho^{(n)}}\left[\sum_{t=\tau}^{T}R_t^i\,\Big|\,h_\tau^i,u_\tau^i\right]. \tag{119}
\end{equation}

Let $Q^{(\infty)}$ be the limit of a sub-sequence of $(Q^{(n)})_{n=1}^{\infty}$ (such a limit exists since the range of each $Q_\tau^{(n),i}$ is a compact set). We proceed to show that $(\rho^{(\infty)},Q^{(\infty)})$ forms a sequential equilibrium (as defined in Definition 12). Note that by construction, $Q^{(\infty)}$ is fully consistent with $\rho^{(\infty)}$. We only need to show sequential rationality.

Claim: Let $Q_{t}^{\epsilon,i}$ be as defined in (101) in the proof of Theorem 2. Then

\[
Q_{t}^{(n),i}(h_{t}^{i},u_{t}^{i}) = Q_{t}^{\epsilon_{n},i}(k_{t}^{i},u_{t}^{i};\rho^{(n),-i}), \tag{120}
\]

for all $i\in\mathcal{I}$, $t\in\mathcal{T}$, $h_{t}^{i}\in\mathcal{H}_{t}^{i}$, and $u_{t}^{i}\in\mathcal{U}_{t}^{i}$.

By construction in the proof of Theorem 2, $\rho_{t}^{(n),i}(k_{t}^{i})\in\mathrm{BR}_{t}^{\epsilon_{n},i}[k_{t}^{i}](\rho^{(n),-i})$. Given the claim, this means that

\[
\rho_{t}^{(n),i}(k_{t}^{i}) \in \underset{\eta\in\Delta^{\epsilon_{n}}(\mathcal{U}_{t}^{i})}{\arg\max}~\sum_{\tilde{u}_{t}^{i}} Q_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\eta(\tilde{u}_{t}^{i}), \tag{121}
\]

for all $i\in\mathcal{I}$, $t\in\mathcal{T}$ and $h_{t}^{i}\in\mathcal{H}_{t}^{i}$.

Applying Berge’s Maximum Theorem [49] in a similar manner to the proof of Theorem 2 we obtain

\[
\rho_{t}^{(\infty),i}(k_{t}^{i}) \in \underset{\eta\in\Delta(\mathcal{U}_{t}^{i})}{\arg\max}~\sum_{\tilde{u}_{t}^{i}} Q_{t}^{(\infty),i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\eta(\tilde{u}_{t}^{i}), \tag{122}
\]

for all $i\in\mathcal{I}$, $t\in\mathcal{T}$ and $h_{t}^{i}\in\mathcal{H}_{t}^{i}$.

Therefore, we have shown that $\rho^{(\infty)}$ is sequentially rational under $Q^{(\infty)}$ and we have completed the proof.
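The passage from the $\epsilon_n$-constrained problem (121) to the unconstrained problem (122) can be illustrated numerically. The sketch below is only an illustration, under the assumption that $\Delta^{\epsilon}(\mathcal{U}_t^i)$ denotes the set of distributions that place probability at least $\epsilon$ on every action; the function name `eps_best_response` and the Q-values are hypothetical and not taken from the paper. Because the objective is linear in $\eta$, one maximizer puts the minimum mass $\epsilon$ on every action and all remaining mass on an action with the largest Q-value.

```python
def eps_best_response(q, eps):
    """Return one maximizer of sum_u q[u] * eta[u] over distributions eta
    with eta[u] >= eps for every action u (assumed meaning of Delta^eps)."""
    n = len(q)
    if eps < 0 or n * eps > 1:
        raise ValueError("eps must satisfy 0 <= eps <= 1 / number of actions")
    # Index of an action with the largest Q-value (ties go to the lowest index).
    best = max(range(n), key=lambda u: q[u])
    # Every action gets the floor eps; the leftover mass goes to the best action.
    eta = [eps] * n
    eta[best] += 1 - n * eps
    return eta


# Hypothetical Q-values for three actions; eps shrinks toward zero.
q_values = [1.0, 3.0, 2.0]
for eps in (0.1, 0.01, 0.0):
    print(eps, eps_best_response(q_values, eps))
```

As `eps` decreases to zero, the constrained maximizer approaches a point mass on the best action, which is a maximizer over the full simplex. This is the finite, fixed-Q special case of the limiting argument; the proof additionally lets the Q-values vary with $n$.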

Proof of Claim: For clarity of exposition we drop the superscript $(n)$ of $\rho^{(n)}$. We know that $Q_{t}^{(n),i}$ satisfies the following equations:

\begin{align}
Q_{T}^{(n),i}(h_{T}^{i},u_{T}^{i}) &= \mathbb{E}^{\rho}[R_{T}^{i}|h_{T}^{i},u_{T}^{i}], \tag{123a}\\
V_{t}^{(n),i}(h_{t}^{i}) &:= \sum_{\tilde{u}_{t}^{i}} Q_{t}^{(n),i}(h_{t}^{i},\tilde{u}_{t}^{i})\,\rho_{t}^{i}(\tilde{u}_{t}^{i}|k_{t}^{i}), \tag{123b}\\
Q_{t-1}^{(n),i}(h_{t-1}^{i},u_{t-1}^{i}) &:= \mathbb{E}^{\rho}[R_{t-1}^{i}|h_{t-1}^{i},u_{t-1}^{i}] + \sum_{\tilde{h}_{t}^{i}\in\mathcal{H}_{t}^{i}} V_{t}^{(n),i}(\tilde{h}_{t}^{i})\,\mathbb{P}^{\rho}(\tilde{h}_{t}^{i}|h_{t-1}^{i},u_{t-1}^{i}). \tag{123c}
\end{align}

Since $K$ is mutually sufficient information, we have

\begin{align}
\mathbb{P}^{\rho}(k_{t+1}^{i}|h_{t}^{i},u_{t}^{i}) &:= P_{t}^{i,\rho^{-i}}(k_{t+1}^{i}|k_{t}^{i},u_{t}^{i}), \tag{124}\\
\mathbb{E}^{\rho}[R_{t}^{i}|h_{t}^{i},u_{t}^{i}] &:= r_{t}^{i,\rho^{-i}}(k_{t}^{i},u_{t}^{i}), \tag{125}
\end{align}

where $P_{t}^{i,\rho^{-i}}$ and $r_{t}^{i,\rho^{-i}}$ are as specified in Definition 3.

Therefore, through an inductive argument, one can show that $Q_{t}^{(n),i}(h_{t}^{i},u_{t}^{i})$ depends on $h_{t}^{i}$ only through $k_{t}^{i}$, and

\begin{align}
Q_{T}^{(n),i}(k_{T}^{i},u_{T}^{i}) &= r_{T}^{i,\rho^{-i}}(k_{T}^{i},u_{T}^{i}), \tag{126a}\\
V_{t}^{(n),i}(k_{t}^{i}) &:= \sum_{\tilde{u}_{t}^{i}} Q_{t}^{i}(k_{t}^{i},\tilde{u}_{t}^{i};\rho^{-i})\,\rho_{t}^{i}(\tilde{u}_{t}^{i}|k_{t}^{i}), \tag{126b}\\
Q_{t-1}^{(n),i}(k_{t-1}^{i},u_{t-1}^{i}) &:= r_{t-1}^{i,\rho^{-i}}(k_{t-1}^{i},u_{t-1}^{i}) + \sum_{\tilde{k}_{t}^{i}\in\mathcal{K}_{t}^{i}} V_{t}^{(n),i}(\tilde{k}_{t}^{i})\,P_{t-1}^{i,\rho^{-i}}(\tilde{k}_{t}^{i}|k_{t-1}^{i},u_{t-1}^{i}). \tag{126c}
\end{align}

The claim is then established by comparing (126) with (101) and combining with the fact that $\rho_{t}^{i}(k_{t}^{i})\in\mathrm{BR}_{t}^{\epsilon,i}[k_{t}^{i}](\rho^{-i})$.
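The backward recursion on compressed information used in the proof of the claim can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `backward_q`, the nested-list layout, and the toy numbers are all hypothetical. Time is indexed from 0 to T-1 here, `r[t][k][u]` plays the role of $r_t^{i,\rho^{-i}}(k,u)$, `P[t][k][u][k2]` plays the role of the kernel $P_t^{i,\rho^{-i}}(k_2|k,u)$ from stage t to stage t+1, and `rho[t][k][u]` is player $i$'s own strategy $\rho_t^i(u|k)$.

```python
def backward_q(r, P, rho):
    """Compute Q[t][k][u] and V[t][k] by backward induction on compressed
    information: terminal Q equals the reward, V averages Q under rho,
    and earlier Q adds the expected continuation value under P."""
    T = len(r)
    Q = [None] * T
    V = [None] * T
    for t in range(T - 1, -1, -1):
        n_k, n_u = len(r[t]), len(r[t][0])
        Q[t] = [[0.0] * n_u for _ in range(n_k)]
        for k in range(n_k):
            for u in range(n_u):
                cont = 0.0
                if t < T - 1:
                    # Expected continuation value over the next compressed state.
                    cont = sum(V[t + 1][k2] * P[t][k][u][k2]
                               for k2 in range(len(V[t + 1])))
                Q[t][k][u] = r[t][k][u] + cont
        # Value of following rho at each compressed state.
        V[t] = [sum(Q[t][k][u] * rho[t][k][u] for u in range(n_u))
                for k in range(n_k)]
    return Q, V


# Hypothetical two-stage example with two compressed states and two actions.
r = [[[1, 0], [0, 1]], [[0, 2], [1, 0]]]
P = [[[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 1.0]]]]
rho = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.0, 1.0]]]
Q, V = backward_q(r, P, rho)
print(Q[0], V[0])
```

The point of the sketch is that the tables are indexed by the compressed state alone, so their size depends on the number of compressed states and not on the number of histories, provided the reward and kernel can be written in the form (124) and (125).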

C.5 Proof of Theorem 5

Theorem 20 (Theorem 5, restated).

If $K=(K^{i})_{i\in\mathcal{I}}$ where $K^{i}$ is unilaterally sufficient information for player $i$, then the set of $K$-based sequential equilibrium payoffs is the same as that of all sequential equilibria.

To prove the assertion of Theorem 5 we establish a series of technical results that appear in Lemmas C.12 to C.20 below. The two key results needed for the proof of the theorem are provided by Lemmas C.16 and C.20. Lemma C.16 asserts that a player can switch to a USI-based strategy without changing the dynamic decision problems faced by the other players. This result allows us to establish the analogue of the payoff equivalence result of Lemma A.2 under the concept of sequential equilibrium. Lemma C.20 asserts that any one player can switch to a USI-based strategy without affecting the sequential equilibrium (under perfect recall) and its payoffs. The proof of Lemma C.16 is based on two technical results provided by Lemmas C.12 and C.14. The proof of Lemma C.20 is based on Lemmas C.16 and C.18; the latter states that the history-action value function of a player $i\in\mathcal{I}$ can be expressed in terms of their USI.

Lemma C.12.

Suppose that $K^{i}$ is unilaterally sufficient information. Then

\[
\mathbb{P}^{g}(h_{t}^{i}|h_{t}^{j}) = \mathbb{P}^{g}(h_{t}^{i}|k_{t}^{i})\,\mathbb{P}^{g}(k_{t}^{i}|h_{t}^{j}), \tag{127}
\]

whenever $\mathbb{P}^{g}(k_{t}^{i})>0$, $\mathbb{P}^{g}(h_{t}^{j})>0$.

Proof C.13.

From the definition of unilaterally sufficient information (Definition 1) we have

\[
\mathbb{P}^{g}(\tilde{h}_{t}^{i},\tilde{h}_{t}^{j}|k_{t}^{i}) = F_{t}^{i,g^{i}}(\tilde{h}_{t}^{i}|k_{t}^{i})\,F_{t}^{i,j,g^{-i}}(\tilde{h}_{t}^{j}|k_{t}^{i}), \tag{128}
\]

where

\[
F_{t}^{i,j,g^{-i}}(h_{t}^{j}|k_{t}^{i}) := \sum_{\tilde{x}_{t},\tilde{h}_{t}^{-\{i,j\}}} \Phi_{t}^{i,g^{-i}}(\tilde{x}_{t},(h_{t}^{j},\tilde{h}_{t}^{-\{i,j\}})|k_{t}^{i}). \tag{129}
\]

Therefore, we conclude that $H_{t}^{i}$ and $H_{t}^{j}$ are conditionally independent given $K_{t}^{i}$. Since $K_{t}^{i}$ is a function of $H_{t}^{i}$, we have

\[
\mathbb{P}^{g}(h_{t}^{i}|h_{t}^{j}) = \mathbb{P}^{g}(h_{t}^{i},k_{t}^{i}|h_{t}^{j}) = \mathbb{P}^{g}(h_{t}^{i}|k_{t}^{i})\,\mathbb{P}^{g}(k_{t}^{i}|h_{t}^{j}). \tag{130}
\]
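The factorization in Lemma C.12 can be checked exactly on a small joint distribution. The sketch below is only an illustration with hypothetical names and numbers: `compress` stands in for the map from player $i$'s history to $K_t^i$, and `joint` is a made-up distribution over pairs of histories. The first example is built so that the two histories are conditionally independent given the compressed information; the second is a counterexample in which the compression discards information that is correlated with the other player's history.

```python
from fractions import Fraction as F
from itertools import product


def factorization_holds(joint, compress):
    """Check P(hi | hj) == P(hi | ki) * P(ki | hj) for all pairs with
    P(ki) > 0 and P(hj) > 0, where ki = compress[hi]."""
    his = sorted({hi for hi, _ in joint})
    hjs = sorted({hj for _, hj in joint})
    p_hj = {hj: sum(joint[(hi, hj)] for hi in his) for hj in hjs}
    p_ki = {}
    for (hi, _), p in joint.items():
        p_ki[compress[hi]] = p_ki.get(compress[hi], 0) + p
    for hi, hj in product(his, hjs):
        ki = compress[hi]
        if p_hj[hj] == 0 or p_ki[ki] == 0:
            continue
        lhs = joint[(hi, hj)] / p_hj[hj]
        p_hi_given_ki = sum(joint[(hi, y)] for y in hjs) / p_ki[ki]
        p_ki_given_hj = sum(joint[(x, hj)] for x in his
                            if compress[x] == ki) / p_hj[hj]
        if lhs != p_hi_given_ki * p_ki_given_hj:
            return False
    return True


# Hypothetical example where hi and hj are independent given ki.
compress = {"a1": "k1", "a2": "k1", "b1": "k2"}
p_k = {"k1": F(1, 2), "k2": F(1, 2)}
p_hi = {"a1": F(1, 3), "a2": F(2, 3), "b1": F(1)}       # P(hi | ki)
p_hj = {("x", "k1"): F(1, 4), ("y", "k1"): F(3, 4),
        ("x", "k2"): F(1, 2), ("y", "k2"): F(1, 2)}     # P(hj | ki)
joint = {(hi, hj): p_k[k] * p_hi[hi] * p_hj[(hj, k)]
         for hi, k in compress.items() for hj in ("x", "y")}
print(factorization_holds(joint, compress))

# Counterexample: a trivial compression of perfectly correlated histories.
bad_joint = {("a", "x"): F(1, 2), ("a", "y"): F(0),
             ("b", "x"): F(0), ("b", "y"): F(1, 2)}
print(factorization_holds(bad_joint, {"a": "k", "b": "k"}))
```

Exact rational arithmetic is used so that the equality test is not affected by floating-point rounding.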
Lemma C.14.

Suppose that $K^i$ is unilaterally sufficient information for player $i \in \mathcal{I}$. Then there exist functions $(\Pi_t^{j,i,g^{-\{i,j\}}})_{j \in \mathcal{I}\backslash\{i\},\, t \in \mathcal{T}}$ and $(r_t^{i,j,g^{-\{i,j\}}})_{j \in \mathcal{I}\backslash\{i\},\, t \in \mathcal{T}}$, where $\Pi_t^{j,i,g^{-\{i,j\}}}: \mathcal{K}_t^i \times \mathcal{H}_t^j \times \mathcal{U}_t^i \times \mathcal{U}_t^j \mapsto \Delta(\mathcal{H}_{t+1}^j)$ and $r_t^{i,j,g^{-\{i,j\}}}: \mathcal{K}_t^i \times \mathcal{H}_t^j \times \mathcal{U}_t^i \times \mathcal{U}_t^j \mapsto [-1,1]$, such that

  (1) $\mathbb{P}^g(\tilde{h}_{t+1}^j | h_t^i, h_t^j, u_t^i, u_t^j) = \Pi_t^{j,i,g^{-\{i,j\}}}(\tilde{h}_{t+1}^j | k_t^i, h_t^j, u_t^i, u_t^j)$ for all $t \in \mathcal{T}\backslash\{T\}$; and

  (2) $\mathbb{E}^g[R_t^j | h_t^i, h_t^j, u_t^i, u_t^j] = r_t^{i,j,g^{-\{i,j\}}}(k_t^i, h_t^j, u_t^i, u_t^j)$ for all $t \in \mathcal{T}$,

for all $j \in \mathcal{I}\backslash\{i\}$ and all behavioral strategy profiles $g$ whenever the left-hand side expressions are well-defined.

Proof C.15 (Proof of Lemma C.14).

Let $\hat{g}^l$ be some fixed, fully mixed behavioral strategy for player $l \in \mathcal{I}$.

Fix $j \neq i$. First,

\begin{align}
\mathbb{P}^g(x_t, h_t^{-\{i,j\}} | h_t^i, h_t^j) &= \mathbb{P}^{\hat{g}^{\{i,j\}}, g^{-\{i,j\}}}(x_t, h_t^{-\{i,j\}} | h_t^i, h_t^j) \tag{131} \\
&= \dfrac{\Phi_t^{i,(\hat{g}^j, g^{-\{i,j\}})}(x_t, h_t^{-i} | k_t^i)}{\sum_{\tilde{x}_t, \tilde{h}_t^{-\{i,j\}}} \Phi_t^{i,(\hat{g}^j, g^{-\{i,j\}})}(\tilde{x}_t, (\tilde{h}_t^{-\{i,j\}}, h_t^j) | k_t^i)} \tag{132} \\
&=: \Phi_t^{i,j,g^{-\{i,j\}}}(x_t, h_t^{-\{i,j\}} | k_t^i, h_t^j), \tag{133}
\end{align}

for any behavioral strategy profile $g$, where in (131) we used the fact that, since $(h_t^i, h_t^j)$ are included in the conditioning, the conditional probability is independent of the strategies of players $i$ and $j$ [23, Section 6.5]. In (132) we used Bayes' rule and the definition of USI (Definition 1).

Therefore, using the Law of Total Probability,

\begin{align}
\mathbb{P}^g(\tilde{x}_t, \tilde{u}_t^{-\{i,j\}} | h_t^i, h_t^j) &= \sum_{\tilde{h}_t^{-\{i,j\}}} \mathbb{P}^g(\tilde{u}_t^{-\{i,j\}} | \tilde{x}_t, \tilde{h}_t^{-\{i,j\}}, h_t^i, h_t^j)\, \mathbb{P}^g(\tilde{x}_t, \tilde{h}_t^{-\{i,j\}} | h_t^i, h_t^j) \tag{134} \\
&= \sum_{\tilde{h}_t^{-\{i,j\}}} \left( \prod_{l \in \mathcal{I}\backslash\{i,j\}} g_t^l(\tilde{u}_t^l | \tilde{h}_t^l) \right) \Phi_t^{i,j,g^{-\{i,j\}}}(\tilde{x}_t, \tilde{h}_t^{-\{i,j\}} | k_t^i, h_t^j) \tag{135} \\
&=: \tilde{P}_t^{i,j,g^{-\{i,j\}}}(\tilde{x}_t, \tilde{u}_t^{-\{i,j\}} | k_t^i, h_t^j), \tag{136}
\end{align}

for any behavioral strategy profile $g$.

We know that $H_{t+1}^j = \xi_t^j(X_t, U_t, H_t^j)$ for some function $\xi_t^j$ independent of the strategy profile $g$; hence, using the Law of Total Probability, we have

\begin{align}
&\mathbb{P}^g(\tilde{h}_{t+1}^j | h_t^i, h_t^j, u_t^i, u_t^j) \tag{137} \\
&= \sum_{\tilde{x}_t, \tilde{u}_t^{-\{i,j\}}} \bm{1}_{\{\tilde{h}_{t+1}^j = \xi_t^j(\tilde{x}_t, (u_t^{\{i,j\}}, \tilde{u}_t^{-\{i,j\}}), h_t^j)\}}\, \tilde{P}_t^{i,j,g^{-\{i,j\}}}(\tilde{x}_t, \tilde{u}_t^{-\{i,j\}} | k_t^i, h_t^j) \tag{138} \\
&=: \Pi_t^{j,i,g^{-\{i,j\}}}(\tilde{h}_{t+1}^j | k_t^i, h_t^j, u_t^i, u_t^j), \tag{139}
\end{align}

establishing part (1) of Lemma C.14.
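The sum over indicator functions that defines the transition kernel of player $j$'s next history is a pushforward: each pair of state and other players' actions is mapped through the history update function, and the probabilities of pairs landing on the same next history are added. A minimal sketch follows, assuming the distribution over the state and the other players' actions is given as a dictionary; the names `next_history_kernel`, `xi_example`, and `p_tilde_example`, as well as the toy update rule, are hypothetical and only illustrate the computation.

```python
def next_history_kernel(p_tilde, xi, u_i, u_j, h_j):
    """Push a distribution over (state, other players' actions) forward
    through the history update xi to get a distribution over the next history.

    p_tilde: dict mapping (x, u_rest) -> probability, given (k_i, h_j).
    xi:      function (x, (u_i, u_j, u_rest), h_j) -> next history of player j.
    """
    kernel = {}
    for (x, u_rest), prob in p_tilde.items():
        h_next = xi(x, (u_i, u_j, u_rest), h_j)
        # Summing over all (x, u_rest) with xi(...) == h_next is the
        # indicator-weighted sum.
        kernel[h_next] = kernel.get(h_next, 0.0) + prob
    return kernel


def xi_example(x, u, h_j):
    """Hypothetical update: player j records its own action and a binary
    observation telling whether the state matches the others' action."""
    _u_i, u_j, u_rest = u
    return h_j + (u_j, x == u_rest)


# Hypothetical distribution over (state, others' action).
p_tilde_example = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
kernel_example = next_history_kernel(p_tilde_example, xi_example, "a", "b", ())
```

In this toy example the next history depends on player $i$'s history only through the distribution `p_tilde`, which is the point of part (1): once that distribution is a function of $(k_t^i, h_t^j)$, so is the kernel.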

Since $\mathbb{E}[R_t^j | x_t, u_t]$ is strategy-independent, for $j \in \mathcal{I}\backslash\{i\}$, using the Law of Total Expectation we have

\begin{align}
\mathbb{E}^g[R_t^j | h_t^i, h_t^j, u_t^i, u_t^j] &= \sum_{\tilde{x}_t, \tilde{u}_t^{-\{i,j\}}} \mathbb{E}[R_t^j | \tilde{x}_t, (u_t^{\{i,j\}}, \tilde{u}_t^{-\{i,j\}})]\, \tilde{P}_t^{i,j,g^{-\{i,j\}}}(\tilde{x}_t, \tilde{u}_t^{-\{i,j\}} | k_t^i, h_t^j) \tag{140} \\
&=: r_t^{i,j,g^{-\{i,j\}}}(k_t^i, h_t^j, u_t^i, u_t^j), \tag{141}
\end{align}

establishing part (2) of Lemma C.14.

Lemma C.16.

Suppose that $K^i$ is unilaterally sufficient information. Let $g = (g^j)_{j \in \mathcal{I}}$ be a fully mixed behavioral strategy profile. Let a $K^i$-based strategy $\rho^i$ be such that

\begin{equation}
\rho_t^i(u_t^i | k_t^i) = \sum_{\tilde{h}_t^i} g_t^i(u_t^i | \tilde{h}_t^i)\, F_t^{i,g^i}(\tilde{h}_t^i | k_t^i). \tag{142}
\end{equation}

Then

  1. (1)

    g(h~t+1j|htj,utj)=ρi,gi(h~t+1j|htj,utj)superscript𝑔conditionalsuperscriptsubscript~𝑡1𝑗superscriptsubscript𝑡𝑗superscriptsubscript𝑢𝑡𝑗superscriptsuperscript𝜌𝑖superscript𝑔𝑖conditionalsuperscriptsubscript~𝑡1𝑗superscriptsubscript𝑡𝑗superscriptsubscript𝑢𝑡𝑗\mathbb{P}^{g}(\tilde{h}_{t+1}^{j}|h_{t}^{j},u_{t}^{j})=\mathbb{P}^{\rho^{i},g% ^{-i}}(\tilde{h}_{t+1}^{j}|h_{t}^{j},u_{t}^{j})blackboard_P start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT ( over~ start_ARG italic_h end_ARG start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT | italic_h start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) = blackboard_P start_POSTSUPERSCRIPT italic_ρ start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_g start_POSTSUPERSCRIPT - italic_i end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ( over~ start_ARG italic_h end_ARG start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT | italic_h start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) for all t𝒯\{T}𝑡\𝒯𝑇t\in\mathcal{T}\backslash\{T\}italic_t ∈ caligraphic_T \ { italic_T }; and

  2. (2)

    𝔼g[Rtj|htj,utj]=𝔼ρi,gi[Rtj|htj,utj]superscript𝔼𝑔delimited-[]conditionalsuperscriptsubscript𝑅𝑡𝑗superscriptsubscript𝑡𝑗superscriptsubscript𝑢𝑡𝑗superscript𝔼superscript𝜌𝑖superscript𝑔𝑖delimited-[]conditionalsuperscriptsubscript𝑅𝑡𝑗superscriptsubscript𝑡𝑗superscriptsubscript𝑢𝑡𝑗\mathbb{E}^{g}[R_{t}^{j}|h_{t}^{j},u_{t}^{j}]=\mathbb{E}^{\rho^{i},g^{-i}}[R_{% t}^{j}|h_{t}^{j},u_{t}^{j}]blackboard_E start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT [ italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT | italic_h start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ] = blackboard_E start_POSTSUPERSCRIPT italic_ρ start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT , italic_g start_POSTSUPERSCRIPT - italic_i end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT [ italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT | italic_h start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ] for all t𝒯𝑡𝒯t\in\mathcal{T}italic_t ∈ caligraphic_T,

for all $j\in\mathcal{I}\backslash\{i\}$ and all $h_{t}^{j}\in\mathcal{H}_{t}^{j},u_{t}^{j}\in\mathcal{U}_{t}^{j}$.

Proof C.17.

Fixing $g^{-i}$, $H_{t}^{i}$ is a controlled Markov Chain controlled by $U_{t}^{i}$ and player $i$ faces a Markov Decision Problem. By Lemma C.4, $K_{t}^{i}$ is an information state (as defined in 9) of this MDP. Therefore, by the Policy Equivalence Lemma (Lemma A.2) we have

\begin{equation}
\mathbb{P}^{g^{i},g^{-i}}(k_{t}^{i})=\mathbb{P}^{\rho^{i},g^{-i}}(k_{t}^{i}). \tag{143}
\end{equation}

Furthermore, from the definition of USI we have

\begin{align}
\mathbb{P}^{g^{i},g^{-i}}(h_{t}^{j}|k_{t}^{i}) &= \sum_{\tilde{x}_{t},\tilde{h}_{t}^{-\{i,j\}}}\Phi_{t}^{i,g^{-i}}(\tilde{x}_{t},(h_{t}^{j},\tilde{h}_{t}^{-\{i,j\}})|k_{t}^{i}) \tag{144}\\
&=: F_{t}^{i,j,g^{-i}}(h_{t}^{j}|k_{t}^{i}). \tag{145}
\end{align}

Using Bayes' rule, we then have

\begin{align}
\mathbb{P}^{g^{i},g^{-i}}(k_{t}^{i}|h_{t}^{j}) &= \dfrac{\mathbb{P}^{g^{i},g^{-i}}(h_{t}^{j}|k_{t}^{i})\,\mathbb{P}^{g^{i},g^{-i}}(k_{t}^{i})}{\sum_{\tilde{k}_{t}^{i}}\mathbb{P}^{g^{i},g^{-i}}(h_{t}^{j}|\tilde{k}_{t}^{i})\,\mathbb{P}^{g^{i},g^{-i}}(\tilde{k}_{t}^{i})} \tag{146}\\
&= \dfrac{F_{t}^{i,j,g^{-i}}(h_{t}^{j}|k_{t}^{i})\,\mathbb{P}^{g^{i},g^{-i}}(k_{t}^{i})}{\sum_{\tilde{k}_{t}^{i}}F_{t}^{i,j,g^{-i}}(h_{t}^{j}|\tilde{k}_{t}^{i})\,\mathbb{P}^{g^{i},g^{-i}}(\tilde{k}_{t}^{i})}. \tag{147}
\end{align}

Note that (147) applies for all strategies $g^{i}$. Replacing $g^{i}$ with $\rho^{i}$ we have

\begin{align}
\mathbb{P}^{\rho^{i},g^{-i}}(k_{t}^{i}|h_{t}^{j}) &= \dfrac{F_{t}^{i,j,g^{-i}}(h_{t}^{j}|k_{t}^{i})\,\mathbb{P}^{\rho^{i},g^{-i}}(k_{t}^{i})}{\sum_{\tilde{k}_{t}^{i}}F_{t}^{i,j,g^{-i}}(h_{t}^{j}|\tilde{k}_{t}^{i})\,\mathbb{P}^{\rho^{i},g^{-i}}(\tilde{k}_{t}^{i})}. \tag{148}
\end{align}

Combining (143), (147), and (148) we conclude that

\begin{align}
\mathbb{P}^{g^{i},g^{-i}}(k_{t}^{i}|h_{t}^{j})=\mathbb{P}^{\rho^{i},g^{-i}}(k_{t}^{i}|h_{t}^{j}). \tag{149}
\end{align}
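
As a numerical illustration of the Bayes-rule step leading to (149), the following Python sketch computes the posterior over compressed states from a likelihood table and a marginal. The names `posterior_over_k`, `F`, and `p_k`, as well as the toy numbers, are assumptions made for illustration and do not appear in the paper: `F[k][h]` stands in for $F_{t}^{i,j,g^{-i}}(h_{t}^{j}|k_{t}^{i})$ and `p_k[k]` for the marginal probability of $k_{t}^{i}$. Since the posterior is computed from these two inputs alone, any two strategies of player $i$ that induce the same marginal, as in (143), must yield the same posterior.

```python
def posterior_over_k(F, p_k, h):
    """Posterior P(k | h) via Bayes' rule, in the form of (147).

    F[k][h] : likelihood of the other player's history h given compressed state k
              (hypothetical stand-in for F_t^{i,j,g^{-i}})
    p_k[k]  : marginal probability of compressed state k
    Returns None when h has probability zero, since the conditional
    distribution is then undefined.
    """
    weights = [F[k][h] * p_k[k] for k in range(len(p_k))]
    total = sum(weights)
    if total == 0:
        return None
    return [w / total for w in weights]


# Toy instance (assumed numbers): two compressed states, three histories.
F = [[0.5, 0.5, 0.0],
     [0.1, 0.3, 0.6]]
p_k = [0.4, 0.6]

post_h0 = posterior_over_k(F, p_k, 0)  # both compressed states are possible
post_h2 = posterior_over_k(F, p_k, 2)  # only the second state is consistent
```

The function never inspects how player $i$ maps full histories to actions, which mirrors why replacing $g^{i}$ with $\rho^{i}$ leaves the right-hand side of (147) unchanged once the marginals agree.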

Using (142), Lemma C.12, and Lemma C.14 we have

\begin{align}
&\quad~\mathbb{P}^{g}(\tilde{h}_{t+1}^{j}|h_{t}^{j},u_{t}^{j}) \tag{150}\\
&=\sum_{\tilde{h}_{t}^{i}:\mathbb{P}^{g}(\tilde{h}_{t}^{i},h_{t}^{j})>0}\sum_{\tilde{u}_{t}^{i}}\mathbb{P}^{g}(\tilde{h}_{t+1}^{j}|\tilde{h}_{t}^{i},h_{t}^{j},\tilde{u}_{t}^{i},u_{t}^{j})\,\mathbb{P}^{g}(\tilde{u}_{t}^{i}|\tilde{h}_{t}^{i},h_{t}^{j},u_{t}^{j})\,\mathbb{P}^{g}(\tilde{h}_{t}^{i}|h_{t}^{j},u_{t}^{j}) \tag{151}\\
&=\sum_{\tilde{h}_{t}^{i},\tilde{u}_{t}^{i}}\Pi_{t}^{j,i,g^{-\{i,j\}}}(\tilde{h}_{t+1}^{j}|\tilde{k}_{t}^{i},h_{t}^{j},\tilde{u}_{t}^{i},u_{t}^{j})\,g_{t}^{i}(\tilde{u}_{t}^{i}|\tilde{h}_{t}^{i})\,\mathbb{P}^{g}(\tilde{h}_{t}^{i}|h_{t}^{j}) \tag{152}\\
&=\sum_{\tilde{h}_{t}^{i},\tilde{u}_{t}^{i}}\Pi_{t}^{j,i,g^{-\{i,j\}}}(\tilde{h}_{t+1}^{j}|\tilde{k}_{t}^{i},h_{t}^{j},\tilde{u}_{t}^{i},u_{t}^{j})\,g_{t}^{i}(\tilde{u}_{t}^{i}|\tilde{h}_{t}^{i})\,\mathbb{P}^{g}(\tilde{h}_{t}^{i}|\tilde{k}_{t}^{i})\,\mathbb{P}^{g}(\tilde{k}_{t}^{i}|h_{t}^{j}) \tag{153}\\
&=\sum_{\tilde{k}_{t}^{i},\tilde{u}_{t}^{i}}\Pi_{t}^{j,i,g^{-\{i,j\}}}(\tilde{h}_{t+1}^{j}|\tilde{k}_{t}^{i},h_{t}^{j},\tilde{u}_{t}^{i},u_{t}^{j})\left(\sum_{\hat{h}_{t}^{i}}g_{t}^{i}(\tilde{u}_{t}^{i}|\hat{h}_{t}^{i})\,\mathbb{P}^{g}(\hat{h}_{t}^{i}|\tilde{k}_{t}^{i})\right)\mathbb{P}^{g}(\tilde{k}_{t}^{i}|h_{t}^{j}) \tag{154}\\
&=\sum_{\tilde{k}_{t}^{i},\tilde{u}_{t}^{i}}\Pi_{t}^{j,i,g^{-\{i,j\}}}(\tilde{h}_{t+1}^{j}|\tilde{k}_{t}^{i},h_{t}^{j},\tilde{u}_{t}^{i},u_{t}^{j})\,\rho_{t}^{i}(\tilde{u}_{t}^{i}|\tilde{k}_{t}^{i})\,\mathbb{P}^{g}(\tilde{k}_{t}^{i}|h_{t}^{j}), \tag{155}
\end{align}

where in (152) we utilized Lemma C.14 and the function $\Pi_{t}^{j,i,g^{-\{i,j\}}}$ defined in it. In (153) we applied Lemma C.12. In the last equation we used (142) and the definition of USI.
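
The passage from (154) to (155) replaces the inner sum over full histories by a strategy that depends only on the compressed state. A minimal Python sketch of that averaging step is given below, assuming (as the comparison of (154) and (155) suggests) that $\rho_{t}^{i}(u|k)=\sum_{h}g_{t}^{i}(u|h)\,\mathbb{P}^{g}(h|k)$. The function names, the compression map `k_of_h`, the uniform fallback for unreachable compressed states, and the random toy data are all hypothetical choices made for illustration.

```python
import random


def compress_strategy(g, p_h, k_of_h, n_k):
    """Average a full-history strategy g into a compressed-state strategy rho.

    g[h][u]   : probability of action u given full history h
    p_h[h]    : probability of full history h under the current profile
    k_of_h[h] : index of the compressed state that history h maps to
    Returns (rho, p_k) with rho[k][u] = sum_h g[h][u] * P(h | k).
    """
    n_u = len(g[0])
    joint_ku = [[0.0] * n_u for _ in range(n_k)]
    p_k = [0.0] * n_k
    for h, k in enumerate(k_of_h):
        p_k[k] += p_h[h]
        for u in range(n_u):
            joint_ku[k][u] += p_h[h] * g[h][u]
    rho = []
    for k in range(n_k):
        if p_k[k] > 0:
            rho.append([joint_ku[k][u] / p_k[k] for u in range(n_u)])
        else:
            # Assumption: any distribution works at unreachable states.
            rho.append([1.0 / n_u] * n_u)
    return rho, p_k


def random_distribution(n, rng):
    """Draw a strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


# Toy instance (assumed): six full histories compressed into two states.
rng = random.Random(0)
k_of_h = [0, 0, 0, 1, 1, 1]
n_u, n_k = 3, 2
g = [random_distribution(n_u, rng) for _ in k_of_h]
p_h = random_distribution(len(k_of_h), rng)
rho, p_k = compress_strategy(g, p_h, k_of_h, n_k)
```

By construction, the joint distribution of (compressed state, action) is the same whether actions are drawn from `g` on full histories or from `rho` on compressed states, which is the property the last step of the derivation relies on.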

Following a similar argument, we can show that

\begin{align}
&\quad~\mathbb{P}^{\rho^{i},g^{-i}}(\tilde{h}_{t+1}^{j}|h_{t}^{j},u_{t}^{j}) \tag{156}\\
&=\sum_{\tilde{k}_{t}^{i},\tilde{u}_{t}^{i}}\Pi_{t}^{j,i,g^{-\{i,j\}}}(\tilde{h}_{t+1}^{j}|\tilde{k}_{t}^{i},h_{t}^{j},\tilde{u}_{t}^{i},u_{t}^{j})\,\rho_{t}^{i}(\tilde{u}_{t}^{i}|\tilde{k}_{t}^{i})\,\mathbb{P}^{\rho^{i},g^{-i}}(\tilde{k}_{t}^{i}|h_{t}^{j}). \tag{157}
\end{align}

Using (149) and comparing (155) with (157), we conclude that

\begin{equation}
\mathbb{P}^{g}(\tilde{h}_{t+1}^{j}|h_{t}^{j},u_{t}^{j})=\mathbb{P}^{\rho^{i},g^{-i}}(\tilde{h}_{t+1}^{j}|h_{t}^{j},u_{t}^{j}), \tag{158}
\end{equation}

proving statement (1) of the Lemma.

Following an analogous argument, we can show that

\begin{align}
\mathbb{E}^{g}[R_{t}^{j}|h_{t}^{j},u_{t}^{j}] &=\sum_{\tilde{k}_{t}^{i},\tilde{u}_{t}^{i}}r_{t}^{i,j,g^{-\{i,j\}}}(\tilde{k}_{t}^{i},h_{t}^{j},\tilde{u}_{t}^{i},u_{t}^{j})\,\rho_{t}^{i}(\tilde{u}_{t}^{i}|\tilde{k}_{t}^{i})\,\mathbb{P}^{g}(\tilde{k}_{t}^{i}|h_{t}^{j}) \tag{159}\\
\mathbb{E}^{\rho^{i},g^{-i}}[R_{t}^{j}|h_{t}^{j},u_{t}^{j}] &=\sum_{\tilde{k}_{t}^{i},\tilde{u}_{t}^{i}}r_{t}^{i,j,g^{-\{i,j\}}}(\tilde{k}_{t}^{i},h_{t}^{j},\tilde{u}_{t}^{i},u_{t}^{j})\,\rho_{t}^{i}(\tilde{u}_{t}^{i}|\tilde{k}_{t}^{i})\,\mathbb{P}^{\rho^{i},g^{-i}}(\tilde{k}_{t}^{i}|h_{t}^{j}), \tag{160}
\end{align}

where $r_{t}^{i,j,g^{-\{i,j\}}}$ is defined in Lemma C.14. We similarly conclude that

\begin{equation}
\mathbb{E}^{g}[R_{t}^{j}|h_{t}^{j},u_{t}^{j}]=\mathbb{E}^{\rho^{i},g^{-i}}[R_{t}^{j}|h_{t}^{j},u_{t}^{j}], \tag{161}
\end{equation}

proving statement (2) of the Lemma.

Lemma C.18.

Suppose that $K^{i}$ is unilaterally sufficient information for player $i$. Let $g^{-i}$ be a fully mixed behavioral strategy profile for players other than $i$. Define $Q_{\tau}^{i}$ through

\begin{equation}
Q_{\tau}^{i}(h_{\tau}^{i},u_{\tau}^{i})=\mathbb{E}^{g^{-i}}[R_{\tau}^{i}|h_{\tau}^{i},u_{\tau}^{i}]+\underset{\tilde{g}_{\tau+1:T}^{i}}{\max}~\mathbb{E}^{\tilde{g}_{\tau+1:T}^{i},g^{-i}}\left[\sum_{t=\tau+1}^{T}R_{t}^{i}\Big|h_{\tau}^{i},u_{\tau}^{i}\right]. \tag{162}
\end{equation}

Then there exists a function $\hat{Q}_{\tau}^{i}:\mathcal{K}_{\tau}^{i}\times\mathcal{U}_{\tau}^{i}\mapsto[-T,T]$ such that

\begin{equation}
Q_{\tau}^{i}(h_{\tau}^{i},u_{\tau}^{i})=\hat{Q}_{\tau}^{i}(k_{\tau}^{i},u_{\tau}^{i}). \tag{163}
\end{equation}
Proof C.19.

By Lemma C.4, $K^{i}$ is an information state for the payoff of player $i$ under $g^{-i}$. Fixing $g^{-i}$, $H_{t}^{i}$ is a controlled Markov Chain controlled by $U_{t}^{i}$. Through Definition 9, $K_{t}^{i}$ is an information state of this controlled Markov Chain. The Lemma then follows from a direct application of Lemma A.1.

Lemma C.20.

Suppose that $K^{i}$ is unilaterally sufficient information for player $i$. Let $g$ be (the strategy part of) a sequential equilibrium. Then there exists a $K^{i}$-based strategy $\rho^{i}$ such that $(\rho^{i},g^{-i})$ is (the strategy part of) a sequential equilibrium with the same expected payoff profile as $g$.

Proof C.21 (Proof of Lemma C.20).

Recall that in Theorem 15 we established the equivalence of a variety of definitions of Sequential Equilibrium for strategy profiles. Let $(g,Q)$ be a sequential equilibrium under Definition 13. Let $(g^{(n)},Q^{(n)})$ be a sequence of strategy and conjecture profiles that satisfies conditions (1), (2'), and (3) of Definition 13.

Set $\rho^{(n),i}$ through

\begin{equation}
\rho_{t}^{(n),i}(u_{t}^{i}|k_{t}^{i})=\sum_{\tilde{h}_{t}^{i}}g_{t}^{(n),i}(u_{t}^{i}|\tilde{h}_{t}^{i})\,F_{t}^{i,g^{(n),i}}(\tilde{h}_{t}^{i}|k_{t}^{i}), \tag{164}
\end{equation}

where $F_{t}^{i,g^{(n),i}}$ is defined in Definition 1. By replacing the sequence with one of its subsequences, we may assume without loss of generality that $\rho^{(n),i}\rightarrow\rho^{i}$ for some $\rho^{i}$.
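The construction in (164) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes finite sets of histories and actions stored in plain dictionaries, and the names `compress_strategy`, `g`, and `F` as well as the toy numbers are hypothetical. It averages a history-based strategy over a belief on histories given the compressed information, which is what (164) prescribes for each fixed $n$.

```python
from collections import defaultdict


def compress_strategy(g, F):
    """Average a history-based strategy over a belief on histories.

    g: dict mapping history -> {action: probability}
    F: dict mapping compressed information k -> {history: probability}
       (a hypothetical stand-in for F_t^{i,g^{(n),i}}(. | k) in (164))
    Returns rho: dict mapping k -> {action: probability}.
    """
    rho = {}
    for k, belief in F.items():
        mixed = defaultdict(float)
        for h, weight in belief.items():
            for u, prob in g[h].items():
                # Accumulate g(u | h) * F(h | k) over histories h.
                mixed[u] += weight * prob
        rho[k] = dict(mixed)
    return rho


# Toy data (assumed, not from the paper): two histories share the same k.
g = {"h1": {"a": 1.0}, "h2": {"a": 0.5, "b": 0.5}}
F = {"k": {"h1": 0.25, "h2": 0.75}}
rho = compress_strategy(g, F)
```

Because each $\rho_t^{(n),i}(\cdot|k_t^i)$ is a convex combination of the distributions $g_t^{(n),i}(\cdot|\tilde{h}_t^i)$, the result is again a probability distribution over actions.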

For ease of notation, denote $\bar{g}^{(n)}=(\rho^{(n),i},g^{(n),-i})$ and $\bar{g}=(\rho^{i},g^{-i})$. We have $\bar{g}^{(n)}\rightarrow\bar{g}$. In the rest of the proof, we will show that $(\bar{g},Q)$ is a sequential equilibrium.

We only need to show that $\bar{g}$ is sequentially rational with respect to $Q$ and that $(\bar{g}^{(n)},Q^{(n)})$ satisfies condition (2') of Definition 13, since conditions (1) and (3) of Definition 13 hold by construction. Since $\bar{g}^{-i}=g^{-i}$, it follows automatically that $\bar{g}^{j}$ is sequentially rational given $Q^{j}$ for all $j\in\mathcal{I}\backslash\{i\}$, and that $Q^{(n),i}$ is consistent with $\bar{g}^{(n),-i}$ for each $n$. It therefore suffices to establish

  1. (i) $\rho^{i}$ is sequentially rational with respect to $Q^{i}$; and

  2. (ii) $Q^{(n),j}$ is consistent with $\bar{g}^{(n),-j}$ for each $j\in\mathcal{I}\backslash\{i\}$.

To establish (i), we will use Lemma C.18 to show that $Q_{t}^{i}(h_{t}^{i},u_{t}^{i})$ is a function of $(k_{t}^{i},u_{t}^{i})$, and hence one can use a $k_{t}^{i}$-based strategy to optimize $Q_{t}^{i}$.

Proof of (i): By construction,

\begin{equation}
\rho_{t}^{(n),i}(k_{t}^{i})=\sum_{\tilde{h}_{t}^{i}:\tilde{k}_{t}^{i}=k_{t}^{i}}g_{t}^{(n),i}(\tilde{h}_{t}^{i})\cdot\eta_{t}^{(n)}(\tilde{h}_{t}^{i}|k_{t}^{i}), \tag{165}
\end{equation}

for some distribution $\eta_{t}^{(n)}(k_{t}^{i})\in\Delta(\mathcal{H}_{t}^{i})$. Let $\eta_{t}(k_{t}^{i})$ be an accumulation point of the sequence $[\eta_{t}^{(n)}(k_{t}^{i})]_{n=1}^{\infty}$. Taking the limit along a subsequence on which $\eta_{t}^{(n)}(k_{t}^{i})$ converges to $\eta_{t}(k_{t}^{i})$, and using $g^{(n),i}\rightarrow g^{i}$ and $\rho^{(n),i}\rightarrow\rho^{i}$, we have

\begin{equation}
\rho_{t}^{i}(k_{t}^{i})=\sum_{\tilde{h}_{t}^{i}:\tilde{k}_{t}^{i}=k_{t}^{i}}g_{t}^{i}(\tilde{h}_{t}^{i})\cdot\eta_{t}(\tilde{h}_{t}^{i}|k_{t}^{i}). \tag{166}
\end{equation}

As a result, we have

\begin{equation}
\mathrm{supp}(\rho_{t}^{i}(k_{t}^{i}))\subseteq\bigcup_{\tilde{h}_{t}^{i}:\tilde{k}_{t}^{i}=k_{t}^{i}}\mathrm{supp}(g_{t}^{i}(\tilde{h}_{t}^{i})). \tag{167}
\end{equation}

By Lemma C.18 we have $Q_{t}^{(n),i}(h_{t}^{i},u_{t}^{i})=\hat{Q}_{t}^{(n),i}(k_{t}^{i},u_{t}^{i})$ for some function $\hat{Q}_{t}^{(n),i}$. Since $Q^{(n),i}\rightarrow Q^{i}$, we have $Q_{t}^{i}(h_{t}^{i},u_{t}^{i})=\hat{Q}_{t}^{i}(k_{t}^{i},u_{t}^{i})$ for some function $\hat{Q}^{i}$. By sequential rationality we have

\begin{equation}
\mathrm{supp}(g_{t}^{i}(\tilde{h}_{t}^{i}))\subseteq\underset{u_{t}^{i}}{\arg\max}~\hat{Q}_{t}^{i}(k_{t}^{i},u_{t}^{i}), \tag{168}
\end{equation}

for all $\tilde{h}_t^i$ whose corresponding compression $\tilde{k}_t^i$ satisfies $\tilde{k}_t^i = k_t^i$. Therefore, by (167) and (168) we conclude that

\[
\mathrm{supp}(\rho_t^i(k_t^i)) \subseteq \underset{u_t^i}{\arg\max}~\hat{Q}_t^i(k_t^i, u_t^i), \tag{169}
\]

establishing sequential rationality of $\rho^i$ with respect to $Q^i$.

To establish (ii), we use Lemmas C.6 and C.16 to show that when player $i$ switches their strategy from $g^{(n),i}$ to $\rho^{(n),i}$, the other players face the same control problem at every information set. As a result, their $Q^{(n),j}$ functions stay the same.

Proof of (ii): Consider player $j \neq i$. Through standard control theory, we know that a collection of functions $\tilde{Q}^j$ is consistent (in the sense of condition (2’) of Definition 13) with a fully mixed strategy profile $\tilde{g}^{-j}$ if and only if it satisfies the following equations:

\begin{align}
\tilde{Q}_T^j(h_T^j, u_T^j) &= \mathbb{E}^{\tilde{g}^{-j}}[R_T^j \mid h_T^j, u_T^j], \tag{170a}\\
\tilde{V}_t^j(h_t^j) &= \max_{\tilde{u}_t^j} \tilde{Q}_t^j(h_t^j, \tilde{u}_t^j), \qquad \forall t \in \mathcal{T}, \tag{170b}\\
\tilde{Q}_t^j(h_t^j, u_t^j) &= \mathbb{E}^{\tilde{g}^{-j}}[R_t^j \mid h_t^j, u_t^j] + \sum_{\tilde{h}_{t+1}^j} \tilde{V}_{t+1}^j(\tilde{h}_{t+1}^j)\, \mathbb{P}^{\tilde{g}^{-j}}(\tilde{h}_{t+1}^j \mid h_t^j, u_t^j), \qquad \forall t \in \mathcal{T} \backslash \{T\}. \tag{170c}
\end{align}
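The recursion (170a)-(170c) is ordinary backward induction for a finite-horizon control problem. The following Python sketch illustrates it on a hypothetical toy instance. It is not taken from the paper: the names `backward_induction`, `reward`, and `trans` are illustrative, the information $h_t^j$ is replaced by a small fixed set of labels rather than a growing history, and the fixed strategies of the other players are assumed to be already folded into the reward and transition functions.

```python
def backward_induction(horizon, infos, actions, reward, trans):
    """Compute Q and V by the recursion (170a)-(170c).

    reward(t, h, u) stands in for E[R_t | h, u] under the others' strategies.
    trans(t, h, u) returns a dict {h_next: probability}, standing in for
    P(h_{t+1} | h_t, u_t) under the others' strategies.
    """
    Q, V = {}, {}
    for t in range(horizon, 0, -1):          # t = T, T-1, ..., 1
        for h in infos:
            for u in actions:
                q = reward(t, h, u)          # terminal case (170a)
                if t < horizon:              # continuation term of (170c)
                    q += sum(p * V[(t + 1, h2)]
                             for h2, p in trans(t, h, u).items())
                Q[(t, h, u)] = q
            V[(t, h)] = max(Q[(t, h, u)] for u in actions)   # (170b)
    return Q, V


# Hypothetical toy instance: reward 1 for matching the current label,
# and the next label equals the chosen action.
def reward(t, h, u):
    return 1.0 if u == h else 0.0


def trans(t, h, u):
    return {u: 1.0}


Q, V = backward_induction(2, [0, 1], [0, 1], reward, trans)
```

Because the recursion depends on the other players only through the conditional reward and the transition kernel, two opponent profiles that induce the same pair of objects produce the same $Q$ functions, which is the property the proof of (ii) relies on.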

By Lemma C.16, we have

\begin{align}
\mathbb{P}^{g^{(n),-j}}(\tilde{h}_{t+1}^j \mid h_t^j, u_t^j) &= \mathbb{P}^{\rho^{(n),i}, g^{(n),-\{i,j\}}}(\tilde{h}_{t+1}^j \mid h_t^j, u_t^j), \tag{171}\\
\mathbb{E}^{g^{(n),-j}}[R_t^j \mid h_t^j, u_t^j] &= \mathbb{E}^{\rho^{(n),i}, g^{(n),-\{i,j\}}}[R_t^j \mid h_t^j, u_t^j], \tag{172}
\end{align}

and hence we conclude that $Q^{(n),j}$ is also consistent with $\bar{g}^{(n),-j} = (\rho^{(n),i}, g^{(n),-\{i,j\}})$.

Now we have shown that $(\bar{g}, Q)$ forms a sequential equilibrium. The second half of the Lemma, which states that $\bar{g}$ yields the same expected payoff as $g$, can be shown with the following argument: By Lemma C.6, $\bar{g}^{(n)}$ yields the same expected payoff profile as $g^{(n)}$. Since the expected payoff of each player is a continuous function of the behavioral strategy profile, we conclude that $\bar{g}$ yields the same expected payoff as $g$.

Finally, we conclude Theorem 5 from Lemma C.20.

Proof C.22 (Proof of Theorem 5).

Given any SE strategy profile $g$, applying Lemma C.20 iteratively for each $i \in \mathcal{I}$, we obtain a $K$-based SE strategy profile $\rho$ with the same expected payoff profile as $g$. Therefore the set of $K$-based SE payoffs is the same as that of all SE.

Appendix D Proofs for Section 5 and Section 6

D.1 Proof of Proposition 6

Proposition 21 (Proposition 6, restated).

In the game defined in Example 5.1, the set of $K$-based wPBE payoffs is a proper subset of that of all wPBE payoffs.

Proof D.1.

Set $g_1^B$ to be the strategy of Bob where he always chooses $U_1^B = +1$, and $g_2^A: \mathcal{X}_1^A \times \mathcal{U}_1^B \mapsto \Delta(\mathcal{U}_2^A)$ is given by

\[
g_2^A(x_1^A, u_1^B) = \begin{cases} 0 \text{ w.p. } 1, & \text{if } u_1^B = +1; \\ x_1^A \text{ w.p. } \frac{2}{3},~ 0 \text{ w.p. } \frac{1}{3}, & \text{otherwise}, \end{cases}
\]

and $g_2^B: \mathcal{X}_1^B \times \mathcal{U}_1^B \mapsto \Delta(\mathcal{U}_2^B)$ is the strategy of Bob where he always chooses $U_2^B = -1$ irrespective of $U_1^B$.

The beliefs $\mu_1^B: \mathcal{X}_1^B \mapsto \Delta(\mathcal{X}_1^A)$, $\mu_2^A: \mathcal{X}_1^A \times \mathcal{U}_1^B \mapsto \Delta(\mathcal{X}_1^B)$, and $\mu_2^B: \mathcal{X}_1^B \times \mathcal{U}_1^B \mapsto \Delta(\mathcal{X}_1^A)$ are given by

\begin{align*}
\mu_1^B(x_1^B) &= \text{the prior of } X_1^A, \\
\mu_2^A(x_1^A, u_1^B) &= \begin{cases} -1 \text{ w.p. } \frac{1}{2},\; +1 \text{ w.p. } \frac{1}{2}, & \text{if } u_1^B = +1; \\ x_1^A \text{ w.p. } 1, & \text{otherwise}, \end{cases} \\
\mu_2^B(x_1^B, u_1^B) &= \text{the prior of } X_1^A.
\end{align*}

One can verify that $g$ is sequentially rational given $\mu$, and that $\mu$ is “preconsistent” [17] with $g$, i.e., the beliefs can be updated with Bayes’ rule for consecutive information sets both on and off the equilibrium path. In particular, $(g, \mu)$ is a wPBE. (It can also be shown that $(g, \mu)$ satisfies Watson’s PBE definition [57]. However, $(g, \mu)$ is not a PBE in the sense of Fudenberg and Tirole [11], since $\mu$ violates their “no-signaling-what-you-don’t-know” condition.)
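The on-path part of this consistency claim can be checked with a few lines of Python. The sketch below is illustrative only: the helper `posterior` is a hypothetical name, and the prior on Bob's type is assumed to be uniform on $\{-1, +1\}$ and independent of Alice's type. Example 5.1 is not reproduced here; the uniform prior is inferred from the stated belief $\mu_2^A(x_1^A, +1)$, which must coincide with the prior because Bob pools on $U_1^B = +1$.

```python
from fractions import Fraction


def posterior(prior, strategy, action):
    """Bayes update of a belief over types after observing `action`.

    prior:    dict type -> probability
    strategy: dict type -> dict action -> probability
    Returns None if `action` has probability zero, i.e. Bayes' rule
    from the prior does not pin down the belief.
    """
    joint = {x: prior[x] * strategy[x].get(action, Fraction(0)) for x in prior}
    total = sum(joint.values())
    if total == 0:
        return None
    return {x: p / total for x, p in joint.items()}


# Assumption: uniform prior on Bob's type (inferred, not quoted from the paper).
prior = {-1: Fraction(1, 2), +1: Fraction(1, 2)}
# g_1^B: Bob plays +1 with probability 1 regardless of his type.
g1B = {-1: {+1: Fraction(1)}, +1: {+1: Fraction(1)}}

on_path = posterior(prior, g1B, +1)    # belief after the equilibrium action
off_path = posterior(prior, g1B, -1)   # belief after the deviation
```

Under these assumptions the update after $u_1^B = +1$ returns the prior, matching the first case of $\mu_2^A$. After $u_1^B = -1$ the update from the prior is undefined, so it places no restriction on the second case of $\mu_2^A$, where Alice believes that Bob's type equals her own.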

We proceed to show that no $K$-based wPBE can attain the payoff profile of $g$.

Suppose that $\rho = (\rho^A, \rho^B)$ is a $K$-based weak PBE strategy profile. First, observe that at $t = 2$, Alice can only choose her actions based on $U_1^B$ according to the definition of $K^A$-based strategies. Let $\alpha, \beta \in \Delta(\{-1, 0, 1\})$ be Alice’s mixed actions at time $t = 2$ under $U_1^B = -1$ and $U_1^B = +1$, respectively, under strategy $\rho^A$. With some abuse of notation, denote $\rho^A = (\alpha, \beta)$. There exists no belief system under which Alice is indifferent between all three of her actions at time $t = 2$. Therefore, no strictly mixed action at $t = 2$ would be sequentially rational. Hence, sequential rationality of $\rho^A$ (with respect to some belief) implies that $\min\{\alpha(-1), \alpha(0), \alpha(+1)\} = \min\{\beta(-1), \beta(0), \beta(+1)\} = 0$.

To respond to $\rho^A = (\alpha, \beta)$, Bob can always maximize his stage-2 instantaneous reward, attaining $0$, by using a suitable response strategy. If Bob plays $-1$ at $t = 1$, his best total payoff is $0.2$; if Bob plays $+1$ at $t = 1$, his best total payoff is $0$. Hence Bob strictly prefers $-1$ to $+1$. Therefore, in any best response (in terms of total expected payoff) to Alice’s strategy $\rho^A$, Bob plays $U_1^B = -1$ irrespective of his private type. Consequently, Alice has an instantaneous payoff of $-1$ at $t = 1$ and a total payoff $\leq 0$ under $\rho$, proving that the payoff profile of $\rho$ is different from that of $g$.

D.2 Proof of Proposition 7

Proposition 22 (Proposition 7, restated).

In the model of Example 5.6, $K_t^i = (Y_{1:t-1}, U_{1:t-1}, X_t^i)$ is unilaterally sufficient information.

We first prove Lemma D.2, which establishes the conditional independence of the state processes given the common information.

Lemma D.2.

In the model of Example 5.6, there exist functions $(\xi_t^{g^i})_{g^i \in \mathcal{G}^i, i \in \mathcal{I}}$, $\xi_t^{g^i}: \mathcal{Y}_{1:t-1} \times \mathcal{U}_{1:t-1} \mapsto \Delta(\mathcal{X}_{1:t}^i)$, such that

\[
\mathbb{P}^g(x_{1:t} \mid y_{1:t-1}, u_{1:t-1}) = \prod_{i \in \mathcal{I}} \xi_t^{g^i}(x_{1:t}^i \mid y_{1:t-1}, u_{1:t-1}), \tag{173}
\]

for all strategy profiles $g$ and all $(y_{1:t-1}, u_{1:t-1})$ admissible under $g$.

Proof D.3 (Proof of Lemma D.2).

Denote $H_t^0 = (\mathbf{Y}_{1:t-1}, \mathbf{U}_{1:t-1})$. We prove the result by induction on time $t$.

Induction Base: The result is true for $t = 1$ since $H_1^0 = \varnothing$ and the random variables $(X_1^i)_{i \in \mathcal{I}}$ are assumed to be mutually independent.

Induction Step: Suppose that we have proved Lemma D.2 for time $t$. We then prove the result for time $t + 1$.

We have

\begin{align}
\mathbb{P}^g(x_{1:t+1}, y_t, u_t \mid h_t^0) &= \mathbb{P}^g(x_{t+1}, y_t \mid x_{1:t}, u_t, h_t^0)\, \mathbb{P}^g(u_t \mid x_{1:t}, h_t^0)\, \mathbb{P}^g(x_{1:t} \mid h_t^0) \tag{174}\\
&= \prod_{i \in \mathcal{I}} \left( \mathbb{P}(x_{t+1}^i, y_t^i \mid x_t^i, u_t)\, g_t^i(u_t^i \mid x_{1:t}^i, h_t^0)\, \xi_t^{g^i}(x_{1:t}^i \mid h_t^0) \right) \tag{175}\\
&=: \prod_{i \in \mathcal{I}} \nu_t^{g^i}(x_{1:t+1}^i, y_t, u_t, h_t^0) = \prod_{i \in \mathcal{I}} \nu_t^{g^i}(x_{1:t+1}^i, h_{t+1}^0), \tag{176}
\end{align}

where the induction hypothesis is utilized in (175).

Therefore, using Bayes rule,

\begin{align}
\mathbb{P}^{g}(x_{1:t+1}|h_{t+1}^{0}) &= \dfrac{\mathbb{P}^{g}(x_{1:t+1},y_{t},u_{t}|h_{t}^{0})}{\sum_{\tilde{x}_{1:t+1}}\mathbb{P}^{g}(\tilde{x}_{1:t+1},y_{t},u_{t}|h_{t}^{0})} \tag{177}\\
&= \dfrac{\prod_{i\in\mathcal{I}}\nu_{t}^{g^{i}}(x_{1:t+1}^{i},h_{t+1}^{0})}{\sum_{\tilde{x}_{1:t+1}}\prod_{i\in\mathcal{I}}\nu_{t}^{g^{i}}(\tilde{x}_{1:t+1}^{i},h_{t+1}^{0})} \tag{178}\\
&= \dfrac{\prod_{i\in\mathcal{I}}\nu_{t}^{g^{i}}(x_{1:t+1}^{i},h_{t+1}^{0})}{\prod_{i\in\mathcal{I}}\sum_{\tilde{x}_{1:t+1}^{i}}\nu_{t}^{g^{i}}(\tilde{x}_{1:t+1}^{i},h_{t+1}^{0})} \tag{179}\\
&=: \prod_{i\in\mathcal{I}}\xi_{t+1}^{g^{i}}(x_{1:t+1}^{i}|h_{t+1}^{0}), \tag{180}
\end{align}

where

\begin{equation}
\xi_{t+1}^{g^{i}}(x_{1:t+1}^{i}|h_{t+1}^{0}) := \dfrac{\nu_{t}^{g^{i}}(x_{1:t+1}^{i},h_{t+1}^{0})}{\sum_{\tilde{x}_{1:t+1}^{i}}\nu_{t}^{g^{i}}(\tilde{x}_{1:t+1}^{i},h_{t+1}^{0})}, \tag{181}
\end{equation}

establishing the induction step.
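
As a numerical sanity check of the normalization step (178)-(181), the following minimal Python sketch verifies on a toy instance that normalizing a product of per-player weights over all joint trajectories gives the product of the separately normalized factors. The two-player weights `nu`, the trajectory labels, and the function names are hypothetical illustrations and are not objects defined in the paper.

```python
from fractions import Fraction
from itertools import product


def normalize_joint(nu):
    """Normalize the product of per-player weights over all joint trajectories, as in (178)."""
    supports = [list(weights.keys()) for weights in nu]
    joint = {}
    for trajectories in product(*supports):
        w = Fraction(1)
        for weights, x in zip(nu, trajectories):
            w *= weights[x]
        joint[trajectories] = w
    total = sum(joint.values())
    return {k: w / total for k, w in joint.items()}


def normalize_per_player(nu):
    """Normalize each player's weights separately, as in (181)."""
    result = []
    for weights in nu:
        s = sum(weights.values())
        result.append({x: w / s for x, w in weights.items()})
    return result


# Hypothetical unnormalized weights nu_t^{g^i} for two players (toy values).
nu = [
    {"a": Fraction(1, 2), "b": Fraction(1, 6)},
    {"c": Fraction(1, 3), "d": Fraction(2, 3), "e": Fraction(1, 4)},
]

joint = normalize_joint(nu)
xi = normalize_per_player(nu)

# The joint posterior should equal the product of the per-player posteriors, as in (180).
factorizes = all(joint[(x1, x2)] == xi[0][x1] * xi[1][x2] for (x1, x2) in joint)
print(factorizes)
```

Exact rational arithmetic is used so that the comparison is an equality check and not a floating-point tolerance test.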

Proof D.4 (Proof of Proposition 7).

Denote $H_{t}^{0}=(\mathbf{Y}_{1:t-1},\mathbf{U}_{1:t-1})$. Then $K_{t}^{i}=(H_{t}^{0},X_{t}^{i})$. Given Lemma D.2, we have

\begin{align}
\mathbb{P}^{g}(x_{1:t-1}^{i}|k_{t}^{i}) &= \dfrac{\mathbb{P}^{g}(x_{1:t}^{i}|h_{t}^{0})}{\mathbb{P}^{g}(x_{t}^{i}|h_{t}^{0})}=\dfrac{\xi_{t}^{g^{i}}(x_{1:t}^{i}|h_{t}^{0})}{\sum_{\tilde{x}_{1:t-1}^{i}}\xi_{t}^{g^{i}}((\tilde{x}_{1:t-1}^{i},x_{t}^{i})|h_{t}^{0})} \tag{182}\\
&=: \tilde{F}_{t}^{i,g^{i}}(x_{1:t-1}^{i}|k_{t}^{i}). \tag{183}
\end{align}

Since $H_{t}^{i}=(K_{t}^{i},X_{1:t-1}^{i})$, we conclude that

\begin{equation}
\mathbb{P}^{g}(\tilde{h}_{t}^{i}|k_{t}^{i})=F_{t}^{i,g^{i}}(\tilde{h}_{t}^{i}|k_{t}^{i}), \tag{184}
\end{equation}

for some function $F_{t}^{i,g^{i}}$.
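
The conditioning in (182) can be sketched in a few lines of Python: given a distribution over one player's private trajectory $x_{1:t}^{i}$ (playing the role of $\xi_{t}^{g^{i}}(\cdot|h_{t}^{0})$ for a fixed common history), the distribution of the past $x_{1:t-1}^{i}$ given the current state $x_{t}^{i}$ is obtained by restricting to trajectories ending in $x_{t}^{i}$ and renormalizing. The toy distribution `xi` and the function name `past_given_current` are assumptions made for illustration only.

```python
from fractions import Fraction


def past_given_current(xi, x_t):
    """Return P(x_{1:t-1} | x_t) from a distribution xi over trajectories (x_1, ..., x_t).

    xi maps trajectory tuples to probabilities; the common history is held fixed.
    """
    # Keep only the trajectories whose last entry equals the observed current state.
    matching = {traj[:-1]: p for traj, p in xi.items() if traj[-1] == x_t}
    denom = sum(matching.values())
    if denom == 0:
        raise ValueError("current state has zero probability under xi")
    return {past: p / denom for past, p in matching.items()}


# Hypothetical distribution over two-step trajectories of one player.
xi = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}

posterior = past_given_current(xi, 1)
print(posterior)
```

Only player $i$'s own factor enters the computation, which mirrors the observation that the resulting function depends on $g$ only through $g^{i}$.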

Given Lemma D.2, we have

\begin{equation}
\mathbb{P}^{g}(\tilde{x}_{1:t}^{-i}|h_{t}^{i}) = \dfrac{\mathbb{P}^{g}(\tilde{x}_{1:t}^{-i},x_{1:t}^{i}|h_{t}^{0})}{\mathbb{P}^{g}(x_{1:t}^{i}|h_{t}^{0})}=\prod_{j\neq i}\xi_{t}^{g^{j}}(\tilde{x}_{1:t}^{j}|h_{t}^{0}). \tag{185}
\end{equation}

As a result, we have

\begin{align}
\mathbb{P}^{g}(\tilde{x}_{1:t}^{-i},\tilde{k}_{t}^{i}|h_{t}^{i}) &= \bm{1}_{\{\tilde{k}_{t}^{i}=k_{t}^{i}\}}\prod_{j\neq i}\xi_{t}^{g^{j}}(\tilde{x}_{1:t}^{j}|h_{t}^{0}) \tag{186}\\
&=: \tilde{\Phi}_{t}^{i,g^{-i}}(\tilde{x}_{1:t}^{-i}|k_{t}^{i}). \tag{187}
\end{align}

Since $(\mathbf{X}_{t},H_{t}^{-i})$ is a fixed function of $(\mathbf{X}_{1:t}^{-i},K_{t}^{i})$, we conclude that

\begin{equation}
\mathbb{P}^{g}(\tilde{x}_{t},\tilde{h}_{t}^{-i}|h_{t}^{i})=\Phi_{t}^{i,g^{-i}}(\tilde{x}_{t},\tilde{h}_{t}^{-i}|k_{t}^{i}), \tag{188}
\end{equation}

for some function $\Phi_{t}^{i,g^{-i}}$.

Combining (184) and (188) while using the fact that $K_{t}^{i}$ is a function of $H_{t}^{i}$, we obtain

\begin{equation}
\mathbb{P}^{g}(\tilde{x}_{t},\tilde{h}_{t}|k_{t}^{i}) = F_{t}^{i,g^{i}}(\tilde{h}_{t}^{i}|k_{t}^{i})\,\Phi_{t}^{i,g^{-i}}(\tilde{x}_{t},\tilde{h}_{t}^{-i}|k_{t}^{i}). \tag{189}
\end{equation}

We conclude that $K^{i}$ is unilaterally sufficient information.

D.3 Proof of Proposition 8

Proposition 23 (Proposition 8, restated).

In the game of Example 6.1 belief-based equilibria do not exist.

Proof D.5.

We first characterize all the Bayes-Nash equilibria of Example 6.1 in behavioral strategy profiles. Then we will show that none of the BNE corresponds to a belief-based equilibrium.

Let $\alpha=(\alpha_{1},\alpha_{2})\in[0,1]^{2}$ describe Alice’s behavioral strategy: $\alpha_{1}$ is the probability that Alice plays $U_{1}^{A}=-1$ given $X_{1}^{A}=-1$; $\alpha_{2}$ is the probability that Alice plays $U_{1}^{A}=+1$ given $X_{1}^{A}=+1$.
Let $\beta=(\beta_{1},\beta_{2})\in[0,1]^{2}$ denote Bob’s behavioral strategy: $\beta_{1}$ is the probability that Bob plays $U_{2}^{B}=\mathrm{U}$ when observing $U_{1}^{A}=-1$; $\beta_{2}$ is the probability that Bob plays $U_{2}^{B}=\mathrm{U}$ when observing $U_{1}^{A}=+1$.

Claim:

\begin{equation}
\alpha^{*}=\left(\frac{1}{3},\frac{1}{3}\right),\quad\beta^{*}=\left(\frac{1}{3}+c,\frac{1}{3}-c\right), \tag{190}
\end{equation}

is the unique BNE of Example 6.1.

Given the claim, one can conclude that a belief-based equilibrium does not exist in this game: Bob’s true belief $b_{2}$ on $X_{2}$ at the beginning of stage 2, given his information $H_{2}^{B}=U_{1}^{A}$, would satisfy

\begin{align}
b_{2}^{-}(+1) &= \dfrac{\alpha_{1}}{\alpha_{1}+1-\alpha_{2}},\quad\text{ if }\alpha\neq(0,1); \tag{191}\\
b_{2}^{+}(+1) &= \dfrac{\alpha_{2}}{\alpha_{2}+1-\alpha_{1}},\quad\text{ if }\alpha\neq(1,0), \tag{192}
\end{align}

where $b_{2}^{-}$ represents the belief under $U_{1}^{A}=-1$ and $b_{2}^{+}$ represents the belief under $U_{1}^{A}=+1$. If Alice plays $\alpha^{*}=\left(\frac{1}{3},\frac{1}{3}\right)$, then $b_{2}^{-}=b_{2}^{+}$. Under a belief-based equilibrium concept (e.g. [36, 56]), Bob’s stage behavioral strategy $\beta$ should yield the same action distribution under the same belief, which means that $\beta_{1}=\beta_{2}$. However, we have $\beta^{*}=\left(\frac{1}{3}+c,\frac{1}{3}-c\right)$ with $\beta_{1}^{*}\neq\beta_{2}^{*}$. Therefore, $(\alpha^{*},\beta^{*})$, the unique BNE of the game, is not a belief-based equilibrium.
We conclude that a belief-based equilibrium does not exist in Example 6.1.
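
The comparison of beliefs used in this argument can be reproduced with a short Python sketch that evaluates (191) and (192) at $\alpha^{*}$ in exact arithmetic. The function name `bob_beliefs` is hypothetical, and the value `c = 1/10` is an arbitrary nonzero choice made only for illustration; Example 6.1 fixes its own constant $c$.

```python
from fractions import Fraction


def bob_beliefs(alpha1, alpha2):
    """Evaluate b_2^-(+1) and b_2^+(+1) from (191)-(192).

    Returns None for a belief whose conditioning event has zero probability.
    """
    den_minus = alpha1 + 1 - alpha2  # proportional to P(U_1^A = -1)
    den_plus = alpha2 + 1 - alpha1   # proportional to P(U_1^A = +1)
    b_minus = alpha1 / den_minus if den_minus != 0 else None
    b_plus = alpha2 / den_plus if den_plus != 0 else None
    return b_minus, b_plus


third = Fraction(1, 3)
b_minus, b_plus = bob_beliefs(third, third)

# Illustrative nonzero constant (assumption; not the value used in Example 6.1).
c = Fraction(1, 10)
beta_star = (third + c, third - c)

same_belief = b_minus == b_plus
same_action_distribution = beta_star[0] == beta_star[1]
print(same_belief, same_action_distribution)
```

Under these assumptions the two beliefs coincide while the two components of $\beta^{*}$ differ, which is exactly the conflict with a belief-based strategy described above.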

Proof of Claim: Denote Alice’s total expected payoff by $J(\alpha,\beta)$. Then

\begin{align*}
J(\alpha,\beta) &= \frac{1}{2}c(1-\alpha_{1}+\alpha_{2})+\frac{1}{2}\alpha_{1}\cdot 2\beta_{1}+\frac{1}{2}(1-\alpha_{1})(1-\beta_{2})+\frac{1}{2}(1-\alpha_{2})(1-\beta_{1})+\frac{1}{2}\alpha_{2}\cdot 2\beta_{2}\\
&= \frac{1}{2}c(1-\alpha_{1}+\alpha_{2})+\frac{1}{2}(2-\alpha_{1}-\alpha_{2})+\frac{1}{2}(2\alpha_{1}+\alpha_{2}-1)\beta_{1}+\frac{1}{2}(2\alpha_{2}+\alpha_{1}-1)\beta_{2}.
\end{align*}

Define $J^*(\alpha)=\min_{\beta}J(\alpha,\beta)$. Since the game is zero-sum, Alice plays $\alpha$ at some equilibrium if and only if $\alpha$ maximizes $J^*(\alpha)$. We compute

\begin{align*}
J^*(\alpha) &= \frac{1}{2}c(1-\alpha_1+\alpha_2) + \frac{1}{2}(2-\alpha_1-\alpha_2) \\
&\quad + \frac{1}{2}\min\{2\alpha_1+\alpha_2-1,\,0\} + \frac{1}{2}\min\{\alpha_1+2\alpha_2-1,\,0\}.
\end{align*}
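
The closed form for $J^*(\alpha)$ rests on a step the proof leaves implicit. The following short derivation is a sketch under the assumption that each $\beta_i$ is a probability ranging over $[0,1]$; the shorthand $k_1,k_2$ is introduced here only for illustration and does not appear in the original text. Only the $\beta$-dependent part of $J(\alpha,\beta)$ is minimized, since the remaining terms do not depend on $\beta$.

```latex
\begin{align*}
k_1 &:= 2\alpha_1+\alpha_2-1, \qquad k_2 := \alpha_1+2\alpha_2-1,\\
\min_{\beta\in[0,1]^2}\left(\tfrac{1}{2}k_1\beta_1+\tfrac{1}{2}k_2\beta_2\right)
 &= \tfrac{1}{2}\min_{\beta_1\in[0,1]}k_1\beta_1+\tfrac{1}{2}\min_{\beta_2\in[0,1]}k_2\beta_2\\
 &= \tfrac{1}{2}\min\{k_1,0\}+\tfrac{1}{2}\min\{k_2,0\}.
\end{align*}
```

In words, Bob sets $\beta_i=1$ when $k_i<0$ and $\beta_i=0$ when $k_i>0$, and he is indifferent when $k_i=0$.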

Since $J^*(\alpha)$ is a continuous piecewise linear function, the set of maximizers can be found by comparing the values at the extreme points of the pieces. We have

\begin{align*}
J^*(0,0) &= \frac{1}{2}c + 1 - \frac{1}{2} - \frac{1}{2} = \frac{1}{2}c; \\
J^*\left(\frac{1}{2},0\right) &= \frac{1}{2}c\cdot\frac{1}{2} + \frac{1}{2}\cdot\frac{3}{2} + \frac{1}{2}\cdot 0 - \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}c + \frac{1}{2}; \\
J^*\left(0,\frac{1}{2}\right) &= \frac{1}{2}c\cdot\frac{3}{2} + \frac{1}{2}\cdot\frac{3}{2} - \frac{1}{2}\cdot\frac{1}{2} - \frac{1}{2}\cdot 0 = \frac{3}{4}c + \frac{1}{2}; \\
J^*(1,0) &= \frac{1}{2}c\cdot 0 + \frac{1}{2}\cdot 1 + \frac{1}{2}\cdot 0 + \frac{1}{2}\cdot 0 = \frac{1}{2}; \\
J^*(0,1) &= \frac{1}{2}c\cdot 2 + \frac{1}{2}\cdot 1 + \frac{1}{2}\cdot 0 + \frac{1}{2}\cdot 0 = c + \frac{1}{2}; \\
J^*\left(\frac{1}{3},\frac{1}{3}\right) &= \frac{1}{2}c + \frac{1}{2}\cdot\frac{4}{3} + \frac{1}{2}\cdot 0 + \frac{1}{2}\cdot 0 = \frac{1}{2}c + \frac{2}{3}; \\
J^*(1,1) &= \frac{1}{2}c + \frac{1}{2}\cdot 0 + \frac{1}{2}\cdot 0 + \frac{1}{2}\cdot 0 = \frac{1}{2}c.
\end{align*}
Figure 3: The pieces (polygons) on which $J^*(\alpha)$ is linear, drawn in the $(\alpha_1,\alpha_2)$ plane. The extreme points of the pieces are labeled: $(0,0)$, $(\frac{1}{2},0)$, $(1,0)$, $(0,\frac{1}{2})$, $(\frac{1}{3},\frac{1}{3})$, $(0,1)$, $(1,1)$.

Since $c<\frac{1}{3}$, the point $(\frac{1}{3},\frac{1}{3})$ is the unique maximizer among the extreme points: its value $\frac{1}{2}c+\frac{2}{3}$ exceeds the next-largest value $c+\frac{1}{2}$ exactly when $c<\frac{1}{3}$. Hence $\arg\max_{\alpha}J^*(\alpha)=\{(\frac{1}{3},\frac{1}{3})\}$, i.e. Alice always plays $\alpha^*=(\frac{1}{3},\frac{1}{3})$ in any BNE of the game.
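
The comparison of extreme points can be checked mechanically. The sketch below evaluates the closed form of $J^*$ in exact rational arithmetic; the value `c = 1/4` is an illustrative choice satisfying $c<\frac{1}{3}$, and the names `J_star`, `EXTREME_POINTS`, and `best_extreme_points` are hypothetical helpers that do not come from the paper.

```python
from fractions import Fraction as F

def J_star(a1, a2, c):
    """Alice's guaranteed payoff min_beta J(alpha, beta), from the closed form above."""
    return (F(1, 2) * c * (1 - a1 + a2)
            + F(1, 2) * (2 - a1 - a2)
            + F(1, 2) * min(2 * a1 + a2 - 1, 0)
            + F(1, 2) * min(a1 + 2 * a2 - 1, 0))

# Extreme points of the linear pieces, as labeled in Figure 3.
EXTREME_POINTS = [
    (F(0), F(0)), (F(1, 2), F(0)), (F(0), F(1, 2)), (F(1), F(0)),
    (F(0), F(1)), (F(1, 3), F(1, 3)), (F(1), F(1)),
]

def best_extreme_points(c):
    """Return the extreme points attaining the largest value of J*."""
    values = {p: J_star(p[0], p[1], c) for p in EXTREME_POINTS}
    top = max(values.values())
    return [p for p, v in values.items() if v == top]

c = F(1, 4)  # illustrative assumption: any c below 1/3 would do
print(best_extreme_points(c))
```

At the boundary value $c=\frac{1}{3}$ the same function reports a tie between $(0,1)$ and $(\frac{1}{3},\frac{1}{3})$, which illustrates why the strict inequality is needed for uniqueness.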

Now, consider Bob’s equilibrium strategy. $\beta^*$ is an equilibrium strategy of Bob only if $\alpha^*\in\arg\max_{\alpha}J(\alpha,\beta^*)$.

For each $\beta$, $J(\alpha,\beta)$ is a linear function of $\alpha$ and

\begin{align*}
\nabla_{\alpha}J(\alpha,\beta)=\left(-\frac{1}{2}c-\frac{1}{2}+\beta_1+\frac{1}{2}\beta_2,\ \frac{1}{2}c-\frac{1}{2}+\frac{1}{2}\beta_1+\beta_2\right),\quad\forall\alpha\in(0,1)^2.
\end{align*}

Since $\alpha^*$ lies in the interior of $[0,1]^2$, it can maximize the linear function $J(\cdot,\beta^*)$ only if $\nabla_{\alpha}J(\alpha,\beta^*)\Big|_{\alpha=\alpha^*}=(0,0)$. Hence

\begin{align*}
-\frac{1}{2}c-\frac{1}{2}+\beta_1^*+\frac{1}{2}\beta_2^* &= 0; \\
\frac{1}{2}c-\frac{1}{2}+\frac{1}{2}\beta_1^*+\beta_2^* &= 0,
\end{align*}

which implies that $\beta^*=(\frac{1}{3}+c,\frac{1}{3}-c)$, proving the claim.
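
As a sanity check on the last step, the two first-order conditions can be solved exactly and the resulting strategy verified against the payoff function. The sketch below is illustrative only: `J` transcribes the expanded payoff from the proof, `solve_beta` is a hypothetical helper applying Cramer's rule, and `c = 1/4` is an assumed value with $c<\frac{1}{3}$.

```python
from fractions import Fraction as F

def J(a, b, c):
    """Alice's expected payoff J(alpha, beta), using the expanded form in the proof."""
    a1, a2 = a
    b1, b2 = b
    return (F(1, 2) * c * (1 - a1 + a2) + F(1, 2) * (2 - a1 - a2)
            + F(1, 2) * (2 * a1 + a2 - 1) * b1
            + F(1, 2) * (2 * a2 + a1 - 1) * b2)

def solve_beta(c):
    """Solve b1 + b2/2 = (1 + c)/2 and b1/2 + b2 = (1 - c)/2 by Cramer's rule."""
    r1, r2 = (1 + c) / 2, (1 - c) / 2
    det = F(1) * F(1) - F(1, 2) * F(1, 2)  # determinant of [[1, 1/2], [1/2, 1]] = 3/4
    b1 = (r1 * 1 - F(1, 2) * r2) / det
    b2 = (1 * r2 - F(1, 2) * r1) / det
    return b1, b2

c = F(1, 4)  # illustrative assumption
beta_star = solve_beta(c)
alpha_star = (F(1, 3), F(1, 3))

# Against beta_star, Alice's payoff should not depend on alpha (zero gradient).
corners = [(F(0), F(0)), (F(1), F(0)), (F(0), F(1)), (F(1), F(1))]
payoffs = {J(a, beta_star, c) for a in corners + [alpha_star]}
print(beta_star, payoffs)
```

Because $J(\cdot,\beta^*)$ is constant in $\alpha$, Alice is indifferent among all her strategies, and at $\alpha^*$ both coefficients of $\beta_1,\beta_2$ vanish, so Bob is indifferent as well; this is consistent with $(\alpha^*,\beta^*)$ being an equilibrium.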

References

  • Abreu and Rubinstein [1988] Abreu D, Rubinstein A (1988) The structure of Nash equilibrium in repeated games with finite automata. Econometrica: Journal of the Econometric Society pp 1259–1281. Available at https://doi.org/10.2307/1913097
  • Åström [1965] Åström KJ (1965) Optimal control of Markov processes with incomplete state information. Journal of mathematical analysis and applications 10(1):174–205. Available at https://doi.org/10.1016/0022-247x(65)90154-x
  • Aumann et al [1997] Aumann RJ, Hart S, Perry M (1997) The absent-minded driver. Games and Economic Behavior 20(1):102–116. Available at https://doi.org/10.1006/game.1997.0577
  • Banks and Sundaram [1990] Banks JS, Sundaram RK (1990) Repeated games, finite automata, and complexity. Games and Economic Behavior 2(2):97–117. Available at https://doi.org/10.1016/0899-8256(90)90024-o
  • Başar and Olsder [1999] Başar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23. SIAM
  • Battigalli [1996] Battigalli P (1996) Strategic independence and perfect Bayesian equilibria. Journal of Economic Theory 70(1):201–234. Available at https://doi.org/10.1006/jeth.1996.0082
  • Battigalli [1997] Battigalli P (1997) Dynamic consistency and imperfect recall. Games and Economic Behavior 20(1):31–50. Available at https://doi.org/10.1006/game.1997.0535
  • Bellman [1966] Bellman R (1966) Dynamic programming. Science 153(3731):34–37
  • Filar and Vrieze [2012] Filar J, Vrieze K (2012) Competitive Markov decision processes. Springer Science & Business Media
  • Fudenberg and Tirole [1991a] Fudenberg D, Tirole J (1991a) Game theory. MIT press
  • Fudenberg and Tirole [1991b] Fudenberg D, Tirole J (1991b) Perfect Bayesian equilibrium and sequential equilibrium. Journal of Economic Theory 53(2):236–260. Available at https://doi.org/10.1016/0022-0531(91)90155-w
  • Grove and Halpern [1997] Grove AJ, Halpern JY (1997) On the expected value of games with absentmindedness. Games and Economic Behavior 20(1):51–65. Available at https://doi.org/10.1006/game.1997.0558
  • Gupta et al [2014] Gupta A, Nayyar A, Langbort C, et al (2014) Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information. SIAM Journal on Control and Optimization 52(5):3228–3260. Available at https://doi.org/10.1137/140953514
  • Gupta et al [2016] Gupta A, Langbort C, Başar T (2016) Dynamic games with asymmetric information and resource constrained players with applications to security of cyberphysical systems. IEEE Transactions on Control of Network Systems 4(1):71–81. Available at https://doi.org/10.1109/tcns.2016.2584183
  • Halpern [1997] Halpern JY (1997) On ambiguities in the interpretation of game trees. Games and Economic Behavior 20(1):66–96. Available at https://doi.org/10.1006/game.1997.0557
  • Halpern [2009] Halpern JY (2009) A nonstandard characterization of sequential equilibrium, perfect equilibrium, and proper equilibrium. International Journal of Game Theory 38(1):37–49. Available at https://doi.org/10.1007/s00182-008-0139-0
  • Hendon et al [1996] Hendon E, Jacobsen HJ, Sloth B (1996) The one-shot-deviation principle for sequential rationality. Games and Economic Behavior 12(2):274–282. Available at https://doi.org/10.1006/game.1996.0018
  • Hinderer [1970] Hinderer K (1970) Sufficient statistics, Markovian and stationary models, vol 33, Springer Berlin Heidelberg, Berlin, Heidelberg, chap 18, pp 118–126
  • Kao and Subramanian [2022] Kao H, Subramanian V (2022) Common information based approximate state representations in multi-agent reinforcement learning. In: International Conference on Artificial Intelligence and Statistics, PMLR, pp 6947–6967
  • Kay [1993] Kay SM (1993) Fundamentals of statistical signal processing: Estimation theory. Prentice-Hall, Inc.
  • Kreps and Wilson [1982] Kreps DM, Wilson R (1982) Sequential equilibria. Econometrica: Journal of the Econometric Society pp 863–894. Available at https://doi.org/10.2307/1912767
  • Kuhn [1953] Kuhn H (1953) Extensive games and the problem of information. In: Contributions to the Theory of Games (AM-28), Volume II. Princeton University Press, p 193–216
  • Kumar and Varaiya [2015] Kumar PR, Varaiya P (2015) Stochastic systems: Estimation, identification and adaptive control. SIAM
  • Mahajan and Mannan [2016] Mahajan A, Mannan M (2016) Decentralized stochastic control. Annals of Operations Research 241(1):109–126. Available at https://doi.org/10.1007/s10479-014-1652-0
  • Mas-Colell et al [1995] Mas-Colell A, Whinston MD, Green JR (1995) Microeconomic theory, vol 1. Oxford university press New York
  • Maskin and Tirole [2001] Maskin E, Tirole J (2001) Markov perfect equilibrium: I. Observable actions. Journal of Economic Theory 100(2):191–219. Available at https://doi.org/10.1006/jeth.2000.2785
  • Maskin and Tirole [2013] Maskin E, Tirole J (2013) Markov equilibrium, J. F. Mertens Memorial Conference. Available at https://youtu.be/UNtLnKJzrhs
  • Mertens and Neyman [1981] Mertens JF, Neyman A (1981) Stochastic games. International Journal of Game Theory 10(2):53–66. Available at https://doi.org/10.1007/bf01769259
  • Mertens and Parthasarathy [2003] Mertens JF, Parthasarathy T (2003) Equilibria for discounted stochastic games. In: Stochastic games and applications. Springer, p 131–172
  • Myerson [2013] Myerson RB (2013) Game theory. Harvard university press
  • Nayyar and Başar [2012] Nayyar A, Başar T (2012) Dynamic stochastic games with asymmetric information. In: 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), IEEE, pp 7145–7150. Available at https://doi.org/10.1109/cdc.2012.6426857
  • Nayyar et al [2011] Nayyar A, Mahajan A, Teneketzis D (2011) Optimal control strategies in delayed sharing information structures. IEEE Transactions on Automatic Control 56(7):1606–1620. Available at https://doi.org/10.1109/tac.2010.2089381
  • Nayyar et al [2013a] Nayyar A, Gupta A, Langbort C, et al (2013a) Common information based Markov perfect equilibria for stochastic games with asymmetric information: Finite games. IEEE Transactions on Automatic Control 59(3):555–570. Available at https://doi.org/10.1109/tac.2013.2283743
  • Nayyar et al [2013b] Nayyar A, Mahajan A, Teneketzis D (2013b) Decentralized stochastic control with partial history sharing: A common information approach. IEEE Transactions on Automatic Control 58(7):1644–1658. Available at https://doi.org/10.1109/tac.2013.2239000
  • Ouyang et al [2015] Ouyang Y, Tavafoghi H, Teneketzis D (2015) Dynamic oligopoly games with private Markovian dynamics. In: 2015 54th IEEE Conference on Decision and Control (CDC), IEEE, pp 5851–5858. Available at https://doi.org/10.1109/cdc.2015.7403139
  • Ouyang et al [2016] Ouyang Y, Tavafoghi H, Teneketzis D (2016) Dynamic games with asymmetric information: Common information based perfect Bayesian equilibria and sequential decomposition. IEEE Transactions on Automatic Control 62(1):222–237. Available at https://doi.org/10.1109/tac.2016.2544936
  • Ouyang et al [2024] Ouyang Y, Tavafoghi H, Teneketzis D (2024) An approach to stochastic dynamic games with asymmetric information and hidden actions. Dynamic Games and Applications pp 1–34. Available at https://doi.org/10.1007/s13235-024-00558-7
  • Piccione and Rubinstein [1997] Piccione M, Rubinstein A (1997) On the interpretation of decision problems with imperfect recall. Games and Economic Behavior 20(1):3–24. Available at https://doi.org/10.1016/0165-4896(96)81573-3
  • Powell [2007] Powell WB (2007) Approximate Dynamic Programming: Solving the curses of dimensionality, vol 703. John Wiley & Sons
  • Rosenberg [1998] Rosenberg D (1998) Duality and Markovian strategies. International Journal of Game Theory 27(4). Available at https://doi.org/10.1007/s001820050091
  • Russell and Norvig [2002] Russell S, Norvig P (2002) Artificial intelligence: A modern approach. Prentice Hall
  • Shapley [1953] Shapley LS (1953) Stochastic games. Proceedings of the National Academy of Sciences 39(10):1095–1100. Available at https://doi.org/10.1073/pnas.39.10.1095
  • Shiryaev [1964] Shiryaev AN (1964) On Markov sufficient statistics in non-additive Bayes problems of sequential analysis. Theory of Probability & Its Applications 9(4):604–618. Available at https://doi.org/10.1137/1109082
  • Smallwood and Sondik [1973] Smallwood RD, Sondik EJ (1973) The optimal control of partially observable Markov processes over a finite horizon. Operations research 21(5):1071–1088. Available at https://doi.org/10.1287/opre.21.5.1071
  • Sondik [1978] Sondik EJ (1978) The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations research 26(2):282–304. Available at https://doi.org/10.1287/opre.26.2.282
  • Striebel [1965] Striebel C (1965) Sufficient statistics in the optimum control of stochastic systems. Journal of Mathematical Analysis and Applications 12(3):576–592. Available at https://doi.org/10.1016/0022-247X(65)90027-2
  • Striebel [1975] Striebel C (1975) Statistics Sufficient for Control, Springer Berlin Heidelberg, Berlin, Heidelberg, chap 3, pp 38–58
  • Subramanian et al [2022] Subramanian J, Sinha A, Seraj R, et al (2022) Approximate information state for approximate planning and reinforcement learning in partially observed systems. Journal of Machine Learning Research 23:12–1
  • Sundaram [1996] Sundaram RK (1996) A first course in optimization theory. Cambridge university press
  • Tang [2021] Tang D (2021) Games in multi-agent dynamic systems: Decision-making with compressed information. PhD thesis, University of Michigan
  • Tang et al [2023] Tang D, Tavafoghi H, Subramanian V, et al (2023) Dynamic games among teams with delayed intra-team information sharing. Dynamic Games and Applications 13:353–411. Available at https://doi.org/10.1007/s13235-022-00424-4
  • Tavafoghi [2017] Tavafoghi H (2017) On design and analysis of cyber-physical systems with strategic agents. PhD thesis, University of Michigan, Ann Arbor
  • Tavafoghi et al [2016] Tavafoghi H, Ouyang Y, Teneketzis D (2016) On stochastic dynamic games with delayed sharing information structure. In: 2016 IEEE 55th Conference on Decision and Control (CDC), IEEE, pp 7002–7009. Available at https://doi.org/10.1109/cdc.2016.7799348
  • Tavafoghi et al [2022] Tavafoghi H, Ouyang Y, Teneketzis D (2022) A unified approach to dynamic decision problems with asymmetric information: Nonstrategic agents. IEEE Transactions on Automatic Control 67(3):1105–1119. Available at https://doi.org/10.1109/tac.2021.3060835
  • Varaiya and Walrand [1978] Varaiya P, Walrand J (1978) On delayed sharing patterns. IEEE Transactions on Automatic Control 23(3):443–445. Available at https://doi.org/10.1109/TAC.1978.1101739
  • Vasal et al [2019] Vasal D, Sinha A, Anastasopoulos A (2019) A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information. IEEE Transactions on Automatic Control 64(1):81–96. Available at https://doi.org/10.1109/tac.2018.2809863
  • Watson [2017] Watson J (2017) A general, practicable definition of perfect Bayesian equilibrium. Unpublished draft. Available at https://econweb.ucsd.edu/~jwatson/PAPERS/WatsonPBE.pdf
  • Whittle [1969] Whittle P (1969) Sequential decision processes with essential unobservables. Advances in Applied Probability 1(2):271–287. Available at https://doi.org/10.2307/1426220