Article

Transferable Utility Cooperative Differential Games with Continuous Updating Using Pontryagin Maximum Principle

1 School of Mathematics and Statistics, Qingdao University, Qingdao 266071, China
2 St. Petersburg State University, 7/9, Universitetskaya nab., St. Petersburg 199034, Russia
3 School of Automation, Qingdao University, Qingdao 266071, China
4 Faculty of Applied Mathematics and Control Processes, St. Petersburg State University, Universitetskiy Prospekt, 35, Petergof, St. Petersburg 198504, Russia
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(2), 163; https://doi.org/10.3390/math9020163
Submission received: 28 November 2020 / Revised: 1 January 2021 / Accepted: 8 January 2021 / Published: 14 January 2021
(This article belongs to the Section Mathematics and Computer Science)

Abstract

We consider a class of cooperative differential games with continuous updating, making use of the Pontryagin maximum principle. It is assumed that at each moment players have or use information about the game structure defined on a closed time interval of fixed duration. Over time, information about the game structure is updated. The subject of this paper is the construction of players’ cooperative strategies, their cooperative trajectory, the characteristic function, and the cooperative solution for this class of differential games with continuous updating, using Pontryagin’s maximum principle as the optimality conditions. To demonstrate the novelty of the method, we compare cooperative strategies, trajectories, characteristic functions, and the corresponding Shapley values for a classic (initial) differential game and a differential game with continuous updating. Our approach provides a means of more profound modeling of conflict-controlled processes. In a particular example, we demonstrate that players’ behavior is braver at the beginning of the game with continuous updating because they lack information about the whole game and are “intrinsically time-inconsistent”. In contrast, in the initial model the players are more cautious, which means they do not dare to emit too much pollution at first.

1. Introduction

Dynamic or differential games are an important branch of game theory that investigates interactive decision-making over time. In a differential game, the decision process evolves in continuous time and is generally described by a differential equation. Differential games provide an effective tool for studying a wide range of dynamic processes, for example, pollution control problems, where it is important to analyze the interaction between participants’ strategic behavior and the dynamic evolution of the levels of pollutants that polluters release. In Carlson and Leitmann [1], a direct method for finding open-loop Nash equilibria for a class of differential n-player games is presented.
Cooperative optimization points to the possibility of socially optimal and individually rational solutions to decision-making problems involving strategic action over time. A static cooperative differential game is typically solved by a two-step procedure. First, one determines the collective optimal solution; then the payoffs are transferred and distributed by using one of the many available cooperative game solutions, such as the core, the Shapley value, or the nucleolus. In a dynamic cooperative game, it must be ensured that all players comply with the agreement over time. This will occur if each player’s profits in the cooperative situation dominate their non-cooperative profits at any intermediate moment. This property is known as time consistency and was introduced by Petrosjan (originally 1977) [2].
To derive equilibrium solutions, existing differential games often depend on the assumption of time-invariant game structures. However, the future is full of essentially unknown events, so it is necessary to assume that the information available to players about the future is limited. For dynamic updating, the Looking Forward Approach is used in game theory, and in differential games especially. The Looking Forward Approach solves the problem of modeling players’ behavior when the process information is dynamically updated. This means that the Looking Forward Approach does not use a target trajectory, but determines how the trajectory is to be chosen by players and how the cooperative payoff is to be allocated along that trajectory. The Looking Forward Approach was first presented in [3]. Afterward, the works in [4,5,6,7,8,9] were published.
In [10,11,12,13,14,15], a class of non-cooperative differential games with continuous updating is considered, and it is assumed that the updating process evolves continuously in time. In the paper [10], the Hamilton–Jacobi–Bellman equations of the Nash equilibrium in the game with continuous updating are derived. The work in [11] is devoted to the class of cooperative differential games with transferable utility using Hamilton–Jacobi–Bellman equations, the construction of the characteristic function with continuous updating, and several related theorems. Another result related to Hamilton–Jacobi–Bellman equations with continuous updating is devoted to the class of cooperative differential games with nontransferable utility [15]. The works in [13,14] are devoted to the class of linear-quadratic differential games with continuous updating; there, the cooperative and non-cooperative cases are considered and the corresponding solutions are obtained. In the paper [12], the explicit form of the Nash equilibrium for the differential game with continuous updating is derived by using the Pontryagin maximum principle. In this paper, the class of cooperative game models is examined, and results for the cooperative setting are presented: the notion of cooperative strategies, the characteristic function, and the cooperative solution for a class of games with continuous updating using the Pontryagin maximum principle. Theoretical results for three players are illustrated on a classic differential game model of pollution control presented in [16]. Another potentially important application of the continuous updating approach is related to the class of inverse optimal control problems with continuous updating [17]. The approach can be used for human behavior modeling in engineering systems.
The class of differential games with dynamic and continuous updating has some similarities with Model Predictive Control (MPC) theory, which is developed within the framework of numerical optimal control in the books [18,19,20,21]. In the MPC method, the current control action is obtained by solving a finite-horizon open-loop optimal control problem at each sampling moment. For linear systems, there is an explicit solution [22,23]. In general, however, the MPC approach needs to solve several optimization problems. Another series of related papers, corresponding to the stable control category, is [24,25,26,27], in which similar methods are considered for the linear-quadratic optimal control problem category. However, the goal of the current paper and of the papers on continuous updating is different: to model players’ behavior when the information about the game process is continuously updated over time. In [28,29], a similar issue is considered, and the authors investigate repeated games with sliding planning horizons.
The paper is structured as follows. Section 2 presents the initial differential game model and the cooperative differential game model. Section 3 describes the differential game with continuous updating and the cooperative differential game with continuous updating, applies the Pontryagin maximum principle, and defines the characteristic function with continuous updating; it also presents the theoretical results. Section 4 gives an example of pollution control based on continuous updating. The conclusion is drawn in Section 5.

2. Initial Differential Game Model

2.1. Preliminary Knowledge

Consider the differential game starting from the initial position $x_0$ and evolving on the time interval $[t_0, T]$. The equations of the system’s dynamics have the form
$$\dot{x}(t) = f(t, x, u), \quad x(t_0) = x_0, \tag{1}$$
where $x \in \mathbb{R}^l$ is the set of variables that characterizes the state of the dynamical system at any instant of time during the play of the game, and $u = (u_1, \ldots, u_n)$, $u_i \in U_i \subset \operatorname{comp} \mathbb{R}^k$, where $u_i$ is the control of player $i$. We shall use the notation $U = U_1 \times U_2 \times \cdots \times U_n$.
The existence, uniqueness, and continuability of the solution $x(t)$ for any admissible measurable controls $u_1(\cdot), \ldots, u_n(\cdot)$ were dealt with by Tolwinski, Haurie, and Leitmann [30] under the following conditions:
  • $f(\cdot): \mathbb{R} \times \mathbb{R}^n \times U \to \mathbb{R}^n$ is continuous.
  • There exists a positive constant $k$ such that, for all $t \in [t_0, T]$ and $u \in U$,
    $\|f(t, x, u)\| \le k (1 + \|x\|)$.
  • For every $R > 0$ there exists $K_R > 0$ such that, for all $t \in [t_0, T]$ and $u \in U$,
    $\|f(t, x, u) - f(t, x', u)\| \le K_R \|x - x'\|$
    for all $x$ and $x'$ such that $\|x\| \le R$ and $\|x'\| \le R$.
  • For any $t \in [t_0, T]$ and $x \in X$, the set
    $G(x, t) = \{ f(t, x, u) \mid u \in U \}$
    is a convex compact subset of $\mathbb{R}^l$.
The payoff of player $i$ is then defined as
$$K_i(x_0, t_0, T; u) = \int_{t_0}^{T} g_i[t, x(t), u]\, dt, \quad i \in N,$$
where $g_i[t, x, u]$ and $f(t, x, u)$ are integrable functions, and $x(t)$ is the solution of the Cauchy problem (1) with fixed open-loop controls $u(t) = (u_1(t), \ldots, u_n(t))$. The strategy profile $u(t) = (u_1(t), \ldots, u_n(t))$ is called admissible if problem (1) has a unique and continuable solution.
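The payoff integral and the Cauchy problem (1) can be approximated numerically. The following is a minimal sketch, assuming a scalar state, a forward-Euler step, and a left Riemann sum; the function name `simulate_payoffs` and the toy model in the usage lines are hypothetical and serve only to illustrate the definitions above.

```python
def simulate_payoffs(f, g, u, x0, t0, T, steps=1000):
    """Approximate x(T) and the payoffs K_i = integral of g_i over [t0, T].

    f(t, x, u) -> float        right-hand side of the dynamics (1)
    g          -> list of running-payoff functions g_i(t, x, u)
    u(t)       -> tuple of open-loop controls (u_1(t), ..., u_n(t))
    Forward Euler for the state and a left Riemann sum for the payoffs
    are discretization choices made for this sketch.
    """
    dt = (T - t0) / steps
    x = x0
    payoffs = [0.0] * len(g)
    for k in range(steps):
        t = t0 + k * dt
        ut = u(t)
        for i, g_i in enumerate(g):
            payoffs[i] += g_i(t, x, ut) * dt   # accumulate K_i
        x = x + f(t, x, ut) * dt               # Euler step of (1)
    return x, payoffs


# Toy two-player model (illustrative only): the state decays and is
# increased by the sum of the controls; each player is paid its own control.
decay = lambda t, x, u: sum(u) - 0.5 * x
running = [lambda t, x, u: u[0], lambda t, x, u: u[1]]
controls = lambda t: (1.0, 2.0)
x_T, K = simulate_payoffs(decay, running, controls, x0=0.0, t0=0.0, T=1.0)
```

A higher-order integrator could replace the Euler step without changing the structure of the computation.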
Let us agree that in a differential game, a subgame is a “truncated” version of the whole game. A subgame is a game in its own right, and it starts at a time instant $t \in [t_0, T]$, after a particular history of actions $u(t)$. Denote such a subgame by $\Gamma(x, t, T)$. (A remark on this notation is in order. Let the state of the game be defined by the pair $(x, t)$ and denote by $\Gamma(x, t, T)$ the subgame starting at date $t$ with the state variable $x$; here, the model considered has a finite horizon, with terminal time $T$. For an infinite horizon, of course, one expects all corresponding value functions to depend on the state and not on time.)
For each $(x, t, T) \in X \times [t_0, T] \times \mathbb{R}$, we define a subgame $\Gamma(x, t, T)$ by replacing the objective functional for player $i$ and the system dynamics by
$$K_i(x, t, T; u(s)) = \int_{t}^{T} g_i[s, x(s), u(s)]\, ds, \quad i \in N,$$
and
$$\dot{x}(s) = f(s, x(s), u(s)), \quad x(t) = x,$$
respectively. Therefore, $\Gamma(x, t, T)$ is a differential game defined on the time interval $[t, T]$ with initial condition $x(t) = x$.

2.2. Cooperative Differential Game Model

We adopt a cooperative game methodology to solve the differential game model with transferable utility. The steps are as follows.
  • Define the cooperative behavior or strategies and corresponding cooperative trajectory.
  • Compute the characteristic function values.
  • Allocate the total cooperative payoff among the players so that the allocation belongs to a chosen cooperative solution, such as the kernel, the bargaining set, the stable set, the core, the Shapley value, or the nucleolus (see, e.g., Osborne and Rubinstein [31] for an introduction to these concepts).
First of all, we introduce the notions of cooperative strategies for players $u^* = (u_1^*, \ldots, u_n^*)$ and the corresponding trajectory $x^*(t)$. Strategies $u^*(t)$ are called optimal strategies, i.e., a set of controls that maximizes the joint payoff of players:
$$u^* = (u_1^*, \ldots, u_n^*) = \arg\max_{u_1, \ldots, u_n} \sum_{i \in N} K_i(x_0, t_0, T; u). \tag{2}$$
Suppose that the maximum in (2) is achieved on the set of admissible strategies. Substituting $u^*(t)$ into Equation (1), we obtain the cooperative trajectory $x^*(t)$.
To determine how to distribute the maximum total payoff among players, it is fundamental to define the characteristic function of a coalition $S \subseteq N$. The characteristic function shows the strength of the coalition; in particular, it allows us to take into account the players’ contributions to each coalition.
To define a cooperative game, a characteristic function must be introduced. We call $V(S; x_0, t_0, T)$, $S \subseteq N$, a characteristic function for the initial differential game $\Gamma(x_0, t_0, T)$. By a characteristic function we mean a map from the set of all possible coalitions,
$$V(\cdot): 2^N \to \mathbb{R}, \quad V(\emptyset) = 0,$$
which assigns to each coalition $S$ the total payoff that the players from $S$ can guarantee when acting independently. An important property is the superadditivity of a characteristic function:
$$V(S_1 \cup S_2; x_0, t_0, T) \ge V(S_1; x_0, t_0, T) + V(S_2; x_0, t_0, T), \quad \forall S_1, S_2 \subseteq N, \ S_1 \cap S_2 = \emptyset.$$
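For a game with few players, superadditivity can be verified by brute force over all pairs of disjoint coalitions. The sketch below assumes the characteristic function is stored as a dictionary keyed by `frozenset` coalitions; the helper names and the three-player numbers are hypothetical.

```python
from itertools import combinations


def all_coalitions(players):
    """Every subset of the player set, including the empty coalition."""
    return [frozenset(c)
            for r in range(len(players) + 1)
            for c in combinations(players, r)]


def is_superadditive(players, v, tol=1e-12):
    """Check V(S1 | S2) >= V(S1) + V(S2) for all disjoint S1, S2."""
    coalitions = all_coalitions(players)
    for s1 in coalitions:
        for s2 in coalitions:
            if s1 & s2:          # skip overlapping coalitions
                continue
            if v[s1 | s2] < v[s1] + v[s2] - tol:
                return False
    return True


# Hypothetical three-player characteristic function (illustrative numbers).
v_example = {
    frozenset(): 0.0,
    frozenset({1}): 1.0, frozenset({2}): 2.0, frozenset({3}): 3.0,
    frozenset({1, 2}): 4.0, frozenset({1, 3}): 5.0, frozenset({2, 3}): 6.0,
    frozenset({1, 2, 3}): 10.0,
}
superadditive = is_superadditive([1, 2, 3], v_example)
```

The check is exponential in the number of players, which is acceptable for the small coalitional games typically studied in this setting.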
The question of constructing a characteristic function is one of the main questions in cooperative game theory. Originally, the value of the characteristic function $V(S)$ was interpreted by von Neumann and Morgenstern (1944) as the maximum guaranteed payoff that coalition $S$ can gain acting independently of other players [32]. Presently, it is known that there are various means of constructing characteristic functions in cooperative games, such as the $\alpha$-c.f. [33], $\beta$-c.f. [34], $\zeta$-c.f. [35], and $\gamma$-c.f. [36].
Similar to the above, for each $(x^*(t), t, T) \in X \times [t_0, T] \times \mathbb{R}$, we define a cooperative subgame $\Gamma^c(x^*(t), t, T)$ (the superscript “c” means “cooperative”) along the cooperative trajectory $x^*(t)$ by replacing the objective functional for player $i$ and the system dynamics by
$$\sum_{i \in N} K_i(x^*(t), t, T; u) = \sum_{i \in N} \int_{t}^{T} g_i[s, x(s), u(s)]\, ds$$
and
$$\dot{x}(s) = f(s, x(s), u(s)), \quad x(t) = x^*(t),$$
respectively. Therefore, $\Gamma^c(x^*(t), t, T)$ is a cooperative differential game defined on the time interval $[t, T]$ with initial condition $x(t) = x^*(t)$.
In this paper, we adopt the constructive approach proposed by Petrosjan L. and Zaccour G. [37] for building a $\delta$-characteristic function. $V(S; x^*(t), t, T)$ denotes the strength of a coalition $S$ in the subgame $\Gamma(x^*(t), t, T)$. It is calculated in two stages: first, we compute the Nash equilibrium strategies $\{u_i^{NE}\}$ for all players $i \in N$; second, we freeze the Nash equilibrium strategies of the players from $N \setminus S$, and the players of the coalition $S$ maximize their joint payoff $\sum_{i \in S} K_i$ over $u_S = \{u_i\}_{i \in S}$. Thus, the characteristic function is defined as
$$V(S; x^*(t), t, T) = \begin{cases} \max\limits_{u_1, \ldots, u_n} \sum\limits_{i \in N} K_i(x^*(t), t, T; u), & S = N, \\ \max\limits_{u_i,\, i \in S} \sum\limits_{i \in S} K_i(x^*(t), t, T; u_S, u^{NE}_{N \setminus S}), & S \subset N, \\ 0, & S = \emptyset. \end{cases}$$
Denote by $L(x^*(t), t, T)$ the set of imputations in the game $\Gamma(x^*(t), t, T)$:
$$L(x^*(t), t, T) = \Bigl\{ \xi(x^*(t), t, T) = (\xi_1(x^*(t), t, T), \ldots, \xi_n(x^*(t), t, T)) : \sum_{i=1}^{n} \xi_i(x^*(t), t, T) = V(N; x^*(t), t, T), \ \xi_i(x^*(t), t, T) \ge V(\{i\}; x^*(t), t, T), \ i \in N \Bigr\},$$
where $V(\{i\}; x^*(t), t, T)$ is the value of the characteristic function $V(S; x^*(t), t, T)$ for the coalition $S = \{i\}$.
Denote by $M(x^*(t), t, T)$ any cooperative solution, that is, a subset of the imputation set $L(x^*(t), t, T)$:
$$M(x^*(t), t, T) \subseteq L(x^*(t), t, T).$$
In fact, the two most widely used cooperative solutions are the Shapley value and the core. In the following, we consider a specific cooperative solution, the Shapley value [38]. The Shapley value selects a single imputation, an $n$-vector denoted $sh(\cdot) = (sh_1(\cdot), sh_2(\cdot), \ldots, sh_n(\cdot))$, satisfying three axioms: fairness, which means similar players are treated equally; efficiency ($\sum_{i=1}^{n} sh_i(\cdot) = V(N; \cdot)$); and linearity (a relatively technical axiom required to obtain uniqueness). The Shapley value is defined in a unique way and is particularly suitable for a range of applications.
The Shapley value $sh(x^*(t), t, T) = (sh_1(x^*(t), t, T), \ldots, sh_n(x^*(t), t, T)) \in M(x^*(t), t, T)$ in the game $\Gamma^c(x^*(t), t, T)$ is a vector such that
$$sh_i(x^*(t), t, T) = \sum_{K \subseteq N,\ i \in K} \frac{(k - 1)!\,(n - k)!}{n!} \bigl[ V(K; x^*(t), t, T) - V(K \setminus i; x^*(t), t, T) \bigr].$$
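The Shapley formula can be evaluated directly once the characteristic function is tabulated, with $k = |K|$ the number of players in coalition $K$. The sketch below assumes the values $V(K)$ are stored in a dictionary keyed by `frozenset`; the function name `shapley_values` and the three-player numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Return {player: Shapley value} for a tabulated characteristic function.

    v maps frozenset coalitions (including the empty one) to their worth.
    """
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):                      # how many others join i
            for rest in combinations(others, size):
                coalition = frozenset(rest) | {i}  # K with i in K
                k = len(coalition)
                weight = factorial(k - 1) * factorial(n - k) / factorial(n)
                # marginal contribution V(K) - V(K \ i)
                total += weight * (v[coalition] - v[coalition - {i}])
        result[i] = total
    return result


# Hypothetical three-player characteristic function (illustrative numbers).
v = {
    frozenset(): 0.0,
    frozenset({1}): 1.0, frozenset({2}): 2.0, frozenset({3}): 3.0,
    frozenset({1, 2}): 4.0, frozenset({1, 3}): 5.0, frozenset({2, 3}): 6.0,
    frozenset({1, 2, 3}): 10.0,
}
sh = shapley_values([1, 2, 3], v)
```

In the dynamic setting the same routine would be called at each $(x^*(t), t)$ with the corresponding values $V(K; x^*(t), t, T)$.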

3. Differential Game with Continuous Updating

3.1. Preliminary Knowledge

To construct the corresponding differential game with continuous updating, we consider the classic differential game with a prescribed duration $\overline{T}$. Consider the family of games $\Gamma(x, t, t + \overline{T})$ starting from the state $x$ at an arbitrary time $t > t_0$. Furthermore, assume that the evolution of the state of the game $\Gamma(x, t, t + \overline{T})$ is described by the ordinary differential equation
$$\dot{x}_t(s) = f(s, x_t(s), u(t, s)), \quad x_t(t) = x, \tag{4}$$
where $\dot{x}_t(s)$ is the derivative with respect to $s$, $x_t \in \mathbb{R}^l$ are the state variables of the game that starts at time $t$, and $u(t, s) = (u_1(t, s), \ldots, u_n(t, s))$, $u_i(t, s) \in U_i \subset \operatorname{comp} \mathbb{R}^k$, $s \in [t, t + \overline{T}]$, is the control profile of the game that starts at time $t$, evaluated at the instant $s$.
For the game $\Gamma(x, t, t + \overline{T})$, the payoff function of player $i$ has the form
$$K_i^t(x, t, t + \overline{T}; u(t, s)) = \int_{t}^{t + \overline{T}} g_i[s, x_t(s), u(t, s)]\, ds, \quad i \in N, \tag{5}$$
where $x_t(s)$ and $u(t, s)$ are the trajectory and the strategy profile in the game $\Gamma(x, t, t + \overline{T})$.
The differential game with continuous updating is constructed according to the following rules.
The current time $t \in [t_0, +\infty)$ evolves continuously and, accordingly, players continuously obtain new information about the equations of motion and the payoff functions in the game $\Gamma(x, t, t + \overline{T})$.
The strategy vector $u(t)$ in the differential game with continuous updating has the form
$$u(t) = u(t, s)|_{s = t}, \quad t \in [t_0, +\infty), \tag{6}$$
where $u(t, s)$, $s \in [t, t + \overline{T}]$, are strategies in the game $\Gamma(x, t, t + \overline{T})$.
The trajectory $x(t)$ in the differential game with continuous updating is determined by
$$\dot{x}(t) = f(t, x, u), \quad x(t_0) = x_0, \quad x \in \mathbb{R}^l, \tag{7}$$
where $u = u(t)$ are the strategies (6) in the differential game with continuous updating, and $\dot{x}(t)$ is the derivative with respect to $t$. We assume that the strategy with continuous updating obtained from (6) is admissible, that is, that problem (7) has a unique and continuable solution. The existence, uniqueness, and continuity conditions of the open-loop Nash equilibrium for the continuously updated differential game have been mentioned previously.
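Rules (6) and (7) can be mimicked on a time grid: at each grid point $t$ the open-loop strategy of the game on $[t, t + \overline{T}]$ is obtained, only its value at $s = t$ is applied, and the state is advanced. The sketch below is a discretized illustration under stated assumptions: `open_loop_strategy` is a hypothetical placeholder for whatever procedure returns $s \mapsto u(t, s)$, and the forward-Euler step is a choice made for this example.

```python
def continuous_updating_trajectory(f, open_loop_strategy, x0, t0, T, steps=1000):
    """Discretized version of rules (6)-(7).

    f(t, x, u)               right-hand side of the dynamics
    open_loop_strategy(t, x) returns a callable s -> u(t, s), the open-loop
                             profile of the game started at (x, t)
                             (hypothetical interface for this sketch)
    Returns the time grid, the states x(t), and the applied controls u(t).
    """
    dt = (T - t0) / steps
    x = x0
    times, states, controls = [t0], [x0], []
    for k in range(steps):
        t = t0 + k * dt
        u_t = open_loop_strategy(t, x)   # strategies of the game on [t, t + Tbar]
        u = u_t(t)                       # rule (6): keep only the value at s = t
        controls.append(u)
        x = x + f(t, x, u) * dt          # rule (7), forward Euler step
        times.append(t + dt)
        states.append(x)
    return times, states, controls


# Toy usage: a one-player profile u(t, s) = 1 + (s - t), so that u(t, t) = 1.
toy_strategy = lambda t, x: (lambda s: (1.0 + (s - t),))
toy_dynamics = lambda t, x, u: u[0]
ts, xs, us = continuous_updating_trajectory(toy_dynamics, toy_strategy,
                                            x0=0.0, t0=0.0, T=1.0)
```

The part of each open-loop profile beyond $s = t$ is never applied, which is what distinguishes the updated control from the open-loop control of the initial game.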
There is an essential difference between a differential game with continuous updating and a classic differential game with prescribed duration $\Gamma(x_0, t_0, T)$. In the classic game, the players are guided by the payoffs that they will eventually gain over the time interval $[t_0, T]$; in the game with continuous updating, at each time instant $t \in [t_0, T]$ they orient themselves toward the expected payoffs (5), which are computed from the information defined on the interval $[t, t + \overline{T}]$, that is, the information they possess at the instant $t$. A subgame of the initial game has the form $\Gamma(x, t, T)$. In the same way, for each $t$ we can define a family of subgames of the differential game with continuous updating, $\Gamma(x_{t,s}, s, t + \overline{T})$, where $x_{t,s}$ is the state at the instant $s \in [t, t + \overline{T}]$. We define it next.
First, we introduce the dynamics of the state:
$$\dot{x}_t(\tau) = f(\tau, x_t(\tau), u(t, \tau)), \quad x_t(s) = x_{t,s}. \tag{8}$$
The payoff function of player $i$ in a subgame with continuous updating $\Gamma(x_{t,s}, s, t + \overline{T})$ then has the form
$$K_i^t(x_{t,s}, s, t + \overline{T}; u) = \int_{s}^{t + \overline{T}} g_i[\tau, x_t(\tau), u(t, \tau)]\, d\tau, \quad i \in N,$$
where $x_t(\tau)$ satisfies (8) and $u(t, \tau)$, $\tau \in [s, t + \overline{T}]$, are strategies in the subgame $\Gamma(x_{t,s}, s, t + \overline{T})$.

3.2. Cooperative Differential Game with Continuous Updating

In a cooperative setting, before starting the game, all players agree to behave jointly in an optimal way (cooperate).

3.2.1. The Approach to Define the Characteristic Function on the Interval $[s, t + \overline{T}]$

We introduce the notion of the characteristic function $\tilde{V}^t(S; x, s, t + \overline{T})$, $S \subseteq N$, defined for each subgame $\Gamma(x_{t,s}, s, t + \overline{T})$, where $s \in [t, t + \overline{T}]$, $t \in [t_0, +\infty)$. Before introducing the characteristic function for the subgame $\Gamma(x_{t,s}, s, t + \overline{T})$, note that, by Equation (4), $x_{t,s}$ depends on the initial point $x$. Therefore, we can replace $x_{t,s}$ by $x$ in the notation and write $\Gamma(x, s, t + \overline{T})$ and $K_i^t(x, s, t + \overline{T}; u)$ for the subgame and for the payoff function of player $i$ in the subgame, respectively. The characteristic function is then defined as
$$\tilde{V}^t(S; x, s, t + \overline{T}) = \begin{cases} \max\limits_{u_1, \ldots, u_n} \sum\limits_{i=1}^{n} K_i^t(x, s, t + \overline{T}; u_1, \ldots, u_n), & S = N, \\ \max\limits_{u_i,\, i \in S} \sum\limits_{i \in S} K_i^t(x, s, t + \overline{T}; u_S, u^{NE}_{N \setminus S}), & S \subset N, \\ 0, & S = \emptyset, \end{cases}$$
where $x = x_t(t)$, $t \in [t_0, +\infty)$, $s \in [t, t + \overline{T}]$, as described in (4). Moreover, $u_S = \{u_i\}_{i \in S}$ is the strategy profile of the players in the coalition $S$.
We assume that the superadditivity conditions for the characteristic function $\tilde{V}^t(S; x, s, t + \overline{T})$ are satisfied:
$$\tilde{V}^t(S_1 \cup S_2; x, s, t + \overline{T}) \ge \tilde{V}^t(S_1; x, s, t + \overline{T}) + \tilde{V}^t(S_2; x, s, t + \overline{T}), \quad \forall S_1, S_2 \subseteq N, \ S_1 \cap S_2 = \emptyset.$$

3.2.2. An Algorithm to Calculate Characteristic Function with Continuous Updating and the Shapley Value

The first three steps are to compute the necessary elements in order to define the characteristic function. In the next step, the Shapley value is computed.
Step 1: Optimizing the total payment of the grand coalition with continuous updating.
We denote the cooperative differential game described above by $\Gamma^c(x, t, t + \overline{T})$; the duration of the game is $\overline{T}$. We assume that there are no inherent obstacles to cooperation between players and that their payoffs are transferable. More specifically, we assume that before the game actually starts, the players agree to cooperate in the game.
Definition 1.
The strategy profile $\tilde{u}^*(t, s) = (\tilde{u}_1^*(t, s), \ldots, \tilde{u}_n^*(t, s))$ is called a generalized open-loop cooperative strategy profile in a game with continuous updating if, for any fixed $t \in [t_0, +\infty)$, the strategy profile $\tilde{u}^*(t, s)$ is an open-loop cooperative strategy profile in the game $\Gamma^c(x, t, t + \overline{T})$.
Using the generalized open-loop cooperative strategies, it is possible to define the solution concept for a game model with continuous updating.
Definition 2.
The strategy profile $u^*(t) = (u_1^*(t), \ldots, u_n^*(t))$ is called an open-loop cooperative strategy profile in a game with continuous updating if it is defined in the following way:
$$u^*(t) = \tilde{u}^*(t, s)|_{s = t}, \quad t \in [t_0, +\infty),$$
where $\tilde{u}^*(t, s)$ is defined in Definition 1.
We interpret the statement that the players are “intrinsically time-inconsistent” as follows:
  • $u^*(t)$ at the moment $t$ coincides with the cooperative strategies in the game defined on the interval $[t, t + \overline{T}]$,
  • $u^*(t + \epsilon)$ at the instant $t + \epsilon$ has to coincide with the cooperative strategies in the game defined on the interval $[t + \epsilon, t + \epsilon + \overline{T}]$.
The trajectory $x^*(t)$ that corresponds to the open-loop cooperative strategies with continuous updating $u^*(t)$ can be obtained from the system
$$\dot{x}(t) = f(t, x, u^*), \quad x(t_0) = x_0, \quad x \in \mathbb{R}^l.$$
Here, $x^*(t)$ denotes the cooperative trajectory with continuous updating.
Let there exist a set of controls
$$\tilde{u}^*(t, s) = (\tilde{u}_1^*(t, s), \ldots, \tilde{u}_n^*(t, s)), \quad s \in [t, t + \overline{T}], \ t \in [t_0, +\infty),$$
that attains the maximum in the problem
$$\max_{u_1, \ldots, u_n} \sum_{i=1}^{n} K_i^t(x^*(t), t, t + \overline{T}; u_1, \ldots, u_n) = \max_{u_1, \ldots, u_n} \sum_{i=1}^{n} \int_{t}^{t + \overline{T}} g_i[s, x_t(s), u(t, s)]\, ds \quad \text{s.t. } x_t(s) \text{ satisfies } \dot{x}_t(s) = f(s, x_t(s), u(t, s)), \ x_t(t) = x^*(t). \tag{11}$$
The solution $x_t^*(s)$ of the system in (11) corresponding to $\tilde{u}^*(t, s)$ is called the corresponding generalized cooperative trajectory.
Theorem 1.
Let (i) $f(s, \cdot, u(t, s))$ be continuously differentiable on $\mathbb{R}^l$, $s \in [t, t + \overline{T}]$,
(ii) $g_i(\cdot, \cdot, u(t, s))$ be continuously differentiable on $\mathbb{R} \times \mathbb{R}^l$, $s \in [t, t + \overline{T}]$.
A set of strategies $\{\tilde{u}_i^*(t, s)\}_{i \in N}$ provides generalized open-loop cooperative strategies in a differential game with continuous updating for the problem (11) if, for any fixed $t \in [t_0, T]$, there exists a costate variable $\psi^t(s)$, $s \in [t, t + \overline{T}]$, such that the following relations are satisfied:
(1) $\dot{x}_t^*(s) = f(s, x_t^*, \tilde{u}^*)$, $x_t^*(t) = x^*(t)$, for $s \in [t, t + \overline{T}]$,
(2) $\tilde{u}_i^*(t, s) = \arg\max_{u_i \in U_i,\, i \in N} H^t(s, x_t^*(s), u(t, s), \psi^t(s))$, where $s \in [t, t + \overline{T}]$, $i \in N$,
(3) $\dot{\psi}^t(s) = -\dfrac{\partial}{\partial x_t} H^t(s, x_t(s), \tilde{u}^*(t, s), \psi^t(s))$, where $s \in [t, t + \overline{T}]$,
$\psi^t(t + \overline{T}) = 0$.
Remark 1.
Fix $t \ge t_0$ and consider the game $\Gamma^c(x, t, t + \overline{T})$.
The motion equation has the form
$$\dot{x}_t(s) = f(s, x_t(s), u(t, s)), \quad x_t(t) = x^*(t), \quad s \in [t, t + \overline{T}]. \tag{12}$$
The payoff function of the grand coalition has the form
$$\sum_{i=1}^{n} K_i^t(x^*(t), t, t + \overline{T}; u_1(t, s), \ldots, u_n(t, s)) = \sum_{i=1}^{n} \int_{t}^{t + \overline{T}} g_i[s, x_t(s), u(t, s)]\, ds. \tag{13}$$
For the optimization problem (12) and (13), the Hamiltonian has the form
$$H^t(s, x_t(s), u(t, s), \psi^t(s)) = \sum_{i=1}^{n} g_i[s, x_t(s), u(t, s)] + \psi^t(s) f(s, x_t(s), u(t, s)), \quad s \in [t, t + \overline{T}].$$
If $\tilde{u}_i^*(t, s)$, $i \in N$, are the generalized open-loop cooperative strategies in the differential game with continuous updating, then, as stated in Definition 1, for every fixed $t \ge t_0$, $\tilde{u}_i^*(t, s)$, $i \in N$, is an open-loop cooperative strategy in the game $\Gamma^c(x^*(t), t, t + \overline{T})$. Therefore, for any fixed $t \ge t_0$, conditions (1)–(3) of the theorem are satisfied as necessary conditions for open-loop cooperative strategies (see [39]).
On the other hand, if for every $t \ge t_0$ the Hamiltonian $H^t$ is concave in $(x_t, u(t, s))$, then the conditions of the theorem are sufficient for a cooperative open-loop solution [40].
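To illustrate how conditions (1)–(3) of Theorem 1 are used, the following worked example applies them to a scalar linear-quadratic model of pollution-control type. The model is an assumption made for illustration and is not taken from the text above: the state $x_t$ is a pollution stock with absorption rate $\delta > 0$, player $i$ chooses an emission $u_i$, and $b_i$, $d_i$ are hypothetical positive constants. Control constraints are ignored, that is, an interior maximum is assumed.

```latex
% Assumed model (illustrative only):
%   dynamics        \dot{x}_t(s) = \sum_{i=1}^{n} u_i(t,s) - \delta x_t(s)
%   running payoff  g_i = b_i u_i - u_i^2/2 - d_i x_t
\begin{align*}
H^t &= \sum_{i=1}^{n}\Bigl(b_i u_i - \tfrac{1}{2}u_i^2 - d_i x_t\Bigr)
       + \psi^t(s)\Bigl(\sum_{i=1}^{n} u_i - \delta x_t\Bigr), \\
\frac{\partial H^t}{\partial u_i} &= b_i - u_i + \psi^t(s) = 0
   \;\Longrightarrow\; \tilde{u}_i^*(t,s) = b_i + \psi^t(s), \\
\dot{\psi}^t(s) &= -\frac{\partial H^t}{\partial x_t} = D + \delta\,\psi^t(s),
   \qquad D = \sum_{i=1}^{n} d_i, \qquad \psi^t(t+\overline{T}) = 0, \\
\psi^t(s) &= -\frac{D}{\delta}\Bigl(1 - e^{-\delta\,(t+\overline{T}-s)}\Bigr), \\
u_i^*(t) &= \tilde{u}_i^*(t,s)\big|_{s=t}
          = b_i - \frac{D}{\delta}\Bigl(1 - e^{-\delta\,\overline{T}}\Bigr).
\end{align*}
```

In this sketch the Hamiltonian is concave in $(x_t, u)$, so the conditions are also sufficient. The resulting control with continuous updating does not depend on $t$, because at every instant the players look ahead over a horizon of the same length $\overline{T}$.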
Then, for any fixed $t \in [t_0, +\infty)$, we obtain the generalized open-loop cooperative strategy $\tilde{u}^*(t, s) = (\tilde{u}_1^*(t, s), \ldots, \tilde{u}_n^*(t, s))$ and its corresponding state trajectory $x_t^*(s)$, $s \in [t, t + \overline{T}]$. Using Definitions 1 and 2, we obtain the cooperative strategies with continuous updating $\{u_i^*(t)\}_{i \in N}$ and the corresponding cooperative trajectory with continuous updating $x^*(t)$, $t \in [t_0, +\infty)$.
To obtain the characteristic function of the grand coalition in the subgame $\Gamma(x^*(t), s, t + \overline{T})$, we substitute $\tilde{u}^*(t, \tau)$ and $x_t^*(\tau)$ into the corresponding payoff function and denote the result by $\tilde{V}^t(N; x^*(t), s, t + \overline{T})$. The current-value maximized cooperative payoff $\tilde{V}^t(N; x^*(t), s, t + \overline{T})$ can be expressed as
$$\tilde{V}^t(N; x^*(t), s, t + \overline{T}) = \sum_{i=1}^{n} \int_{s}^{t + \overline{T}} g_i[\tau, x_t^*(\tau), \tilde{u}^*(t, \tau)]\, d\tau.$$
Step 2: Computation of the generalized open-loop Nash equilibrium with continuous updating.
The problem of a non-cooperative subgame along the cooperative trajectory with continuous updating, $\Gamma(x^*(t), s, t + \overline{T})$, can be stated as follows:
$$\max_{u_i \in U_i} K_i^t(x^*(t), s, t + \overline{T}; u_i(t, s), \tilde{u}_{-i}^{NE}(t, s)) = \max_{u_i \in U_i} \int_{s}^{t + \overline{T}} g_i[\tau, x_t(\tau), u_i(t, \tau), \tilde{u}_{-i}^{NE}(t, \tau)]\, d\tau \quad \text{s.t. } x_t(\tau) \text{ satisfies (8)},$$
where $\tilde{u}_{-i}^{NE}(t, \tau) = (\tilde{u}_1^{NE}(t, \tau), \ldots, \tilde{u}_{i-1}^{NE}(t, \tau), \tilde{u}_{i+1}^{NE}(t, \tau), \ldots, \tilde{u}_n^{NE}(t, \tau))$.
In this setting, the current-value Hamiltonian function can be written as
$$H_i^t(\tau, x_t(\tau), u(t, \tau), \psi_i^t(\tau)) = g_i(\tau, x_t, u) + \psi_i^t(\tau) f(\tau, x_t, u), \quad \tau \in [s, t + \overline{T}], \ i \in N,$$
where $\tau \in [s, t + \overline{T}]$, $t \in [t_0, +\infty)$. By using the Pontryagin maximum principle with continuous updating [12], we obtain the open-loop Nash equilibrium $\{\tilde{u}_i^{NE}(t, \tau)\}_{i \in N}$ and the corresponding trajectory $x_t^{NE}(\tau)$, $\tau \in [s, t + \overline{T}]$, $t \in [t_0, +\infty)$. It is then easy to derive the characteristic function of a single-player coalition, for each $i = 1, 2, \ldots, n$:
$$\tilde{V}^t(\{i\}; x^*(t), s, t + \overline{T}) = \int_{s}^{t + \overline{T}} g_i[\tau, x_t^{NE}(\tau), \tilde{u}^{NE}(t, \tau)]\, d\tau, \quad s \in [t, t + \overline{T}], \ t \in [t_0, +\infty).$$
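As an illustration of Step 2, the following worked example derives the generalized open-loop Nash equilibrium for a scalar linear-quadratic model of pollution-control type. The model is an assumption made for this example only: dynamics $\dot{x}_t(\tau) = \sum_{j} u_j - \delta x_t$ with $\delta > 0$, running payoff $g_i = b_i u_i - u_i^2/2 - d_i x_t$ with hypothetical positive constants $b_i$, $d_i$, no control constraints, and the terminal condition $\psi_i^t(t + \overline{T}) = 0$ used in Theorems 1 and 2.

```latex
% Assumed model (illustrative only); each player i maximizes its own H_i^t.
\begin{align*}
H_i^t &= b_i u_i - \tfrac{1}{2}u_i^2 - d_i x_t
        + \psi_i^t(\tau)\Bigl(\sum_{j=1}^{n} u_j - \delta x_t\Bigr), \\
\frac{\partial H_i^t}{\partial u_i} &= b_i - u_i + \psi_i^t(\tau) = 0
   \;\Longrightarrow\; \tilde{u}_i^{NE}(t,\tau) = b_i + \psi_i^t(\tau), \\
\dot{\psi}_i^t(\tau) &= -\frac{\partial H_i^t}{\partial x_t}
   = d_i + \delta\,\psi_i^t(\tau), \qquad \psi_i^t(t+\overline{T}) = 0, \\
\psi_i^t(\tau) &= -\frac{d_i}{\delta}\Bigl(1 - e^{-\delta\,(t+\overline{T}-\tau)}\Bigr), \\
\tilde{u}_i^{NE}(t,\tau) &= b_i - \frac{d_i}{\delta}\Bigl(1 - e^{-\delta\,(t+\overline{T}-\tau)}\Bigr).
\end{align*}
```

In this sketch each player accounts only for its own damage coefficient $d_i$, whereas a jointly optimizing coalition would account for the sum of the damage coefficients of its members.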
Step 3: Compute the characteristic function for all remaining possible coalitions with continuous updating.
Here, we need to consider only the coalitions that contain more than one player, excluding the grand coalition. There are $2^n - n - 2$ such subsets. We apply the $\delta$-characteristic function, so that the players of $S$ maximize their total payoff $\sum_{i \in S} K_i^t(x^*(t), s, t + \overline{T}; u_S, \tilde{u}^{NE}_{N \setminus S})$ along the cooperative trajectory with continuous updating $x^*(t)$, while the other players, those from $N \setminus S$, use the generalized open-loop Nash equilibrium strategies
$$\tilde{u}^{NE}_{N \setminus S} = \{\tilde{u}_j^{NE}\}_{j \in N \setminus S}.$$
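The coalitions treated in this step can be enumerated mechanically, which also confirms the count $2^n - n - 2$ (all subsets minus the empty set, the $n$ singletons, and the grand coalition). The helper name `intermediate_coalitions` is hypothetical.

```python
from itertools import combinations


def intermediate_coalitions(players):
    """Coalitions with at least two players, excluding the grand coalition."""
    n = len(players)
    return [frozenset(c)
            for size in range(2, n)            # sizes 2, ..., n - 1
            for c in combinations(players, size)]


# For three players these are the three two-player coalitions.
three_player = intermediate_coalitions([1, 2, 3])
```

For each returned coalition $S$, the players outside $S$ keep their Nash equilibrium strategies while $S$ solves its own optimal control problem.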
Thus, we have a two-stage construction procedure for the characteristic function: (1) find the generalized open-loop Nash equilibrium strategies $\tilde{u}_i^{NE}(t, \tau)$ for all players $i \in N$, which was done in Step 2; (2) “freeze” the Nash equilibrium strategies $\tilde{u}_j^{NE}(t, \tau)$ of the players from $N \setminus S$ and let the players from the coalition $S$ maximize their total payoff over $u_S = \{u_i\}_{i \in S}$. To compute the value function of the subgame $\Gamma(x^*(t), s, t + \overline{T})$, $t \in [t_0, +\infty)$, $s \in [t, t + \overline{T}]$, we present the following concept.
Definition 3.
A set of strategies $\tilde{u}_S^* = \{\tilde{u}_i^*(t, \tau)\}_{i \in S}$, $\tau \in [s, t + \overline{T}]$, provides a generalized open-loop optimal strategy for the coalition $S \subset N$ in a subgame with continuous updating $\Gamma(x^*(t), s, t + \overline{T})$ when it is the solution, obtained by using the Pontryagin maximum principle, of the following problem:
$$\max_{u_S \in U_S} \sum_{i \in S} K_i^t(x^*(t), s, t + \overline{T}; u_S, \tilde{u}^{NE}_{N \setminus S}) = \max_{u_S \in U_S} \sum_{i \in S} \int_{s}^{t + \overline{T}} g_i[\tau, x_t^S(\tau), u_S(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau)]\, d\tau \quad \text{s.t. } \dot{x}_t^S(\tau) = f(\tau, x_t^S(\tau), u_S(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau)), \ x_t^S(s) = x_{t,s}. \tag{14}$$
The Hamiltonian function of problem (14) has the form, for $S \subset N$ (the uppercase letter “S” in the paper always denotes the coalition $S$, e.g., in $x_t^S$, $\psi_S^t$, and $u_S$):
$$H_S^t(\tau, x_t^S(\tau), u_S(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau), \psi_S^t) = \sum_{i \in S} g_i(\tau, x_t^S, u_S, \tilde{u}^{NE}_{N \setminus S}) + \psi_S^t(\tau) f(\tau, x_t^S, u_S, \tilde{u}^{NE}_{N \setminus S}).$$
Theorem 2.
Let
(i) $f(\tau, \cdot, u(t, \tau))$ be continuously differentiable on $\mathbb{R}^l$, $\tau \in [s, t + \overline{T}]$,
(ii) $g_i(\cdot, \cdot, u(t, \tau))$ be continuously differentiable on $\mathbb{R} \times \mathbb{R}^l$.
A set of strategies $\tilde{u}_S^* = \{\tilde{u}_i^*(t, \tau)\}_{i \in S}$ provides generalized open-loop optimal strategies of the coalition $S$ in the subgame with continuous updating $\Gamma(x^*(t), s, t + \overline{T})$ for the problem (14) if there exist $2^n - n - 2$ costate functions $\psi_S^t(\tau)$, where $\tau \in [s, t + \overline{T}]$, $S \subset N$, such that, for $s \in [t, t + \overline{T}]$, $t \in [t_0, +\infty)$, the following relations are satisfied:
(1) $\dot{x}_t^S(\tau) = f(\tau, x_t^S, \tilde{u}_S^*(t, \tau), \tilde{u}^{NE}_{N \setminus S})$, $x_t^S(s) = x_{t,s}$, for $\tau \in [s, t + \overline{T}]$,
(2) $\tilde{u}_S^*(t, \tau) = \arg\max_{u_S \in U_S} H_S^t(\tau, x_t^S(\tau), u_S(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau), \psi_S^t(\tau))$, for $\tau \in [s, t + \overline{T}]$, where $\tilde{u}_S^*(t, \tau) = \{\tilde{u}_i^S(t, \tau)\}_{i \in S}$,
(3) $\dot{\psi}_S^t(\tau) = -\dfrac{\partial}{\partial x_t} H_S^t(\tau, x_t(\tau), \tilde{u}_S^*(t, \tau), \tilde{u}^{NE}_{N \setminus S}(t, \tau), \psi_S^t(\tau))$, where $\tau \in [s, t + \overline{T}]$, $S \subset N$, and $\psi_S^t(t + \overline{T}) = 0$, $S \subset N$.
Proof. 
Follow the proof of Theorem 1. □
Therefore, the agents in coalition $S$ adopt the generalized open-loop optimal control $\tilde{u}_S^*(t,\tau)$ characterized in Theorem 2. Note that these controls are functions of the fixed time $t \in [t_0,+\infty)$ and the current time $\tau \in [s, t+\overline{T}]$.
The characteristic function for a coalition $S \subseteq N$ is then given by
$$\tilde{V}^t\big(S; x^*(t), s, t+\overline{T}\big) = \sum_{i \in S} \int_s^{t+\overline{T}} g_i\big[\tau, x_t^S(\tau), \tilde{u}_S^*(t,\tau), \tilde{u}^{NE}_{N \setminus S}(t,\tau)\big]\, d\tau, \quad s \in [t, t+\overline{T}], \ t \in [t_0,+\infty),$$
where $x_t^S(\tau)$ is the trajectory at time instant $\tau \in [s, t+\overline{T}]$ when the players in coalition $S$ use the generalized open-loop optimal strategies $\tilde{u}_S^*(t,\tau)$, while the players in $N \setminus S$ use the generalized open-loop Nash equilibrium strategies $\tilde{u}^{NE}_{N \setminus S}(t,\tau)$ already derived in Step 2.
For the characteristic function in the game model with continuous updating, suppose that the function $\tilde{V}^t(S; x^*(t), s, t+\overline{T})$, $S \subseteq N$, is continuously differentiable with respect to $s \in [t, t+\overline{T}]$ and integrable with respect to $t \in [t_0,+\infty)$. The characteristic function in the game model with continuous updating, $V(S; x^*(t), t, T)$, is then defined as follows.
Definition 4.
A function $V(S; x^*(t), t, T)$, $t \in [t_0, T]$, $S \subseteq N$, is a characteristic function of the differential game with continuous updating $\Gamma(x^*(t), t, T)$ if it is defined as the following integral:
$$V\big(S; x^*(t), t, T\big) = -\int_t^T \frac{d}{ds} \tilde{V}^\tau\big(S; x^*(\tau), s, \tau+\overline{T}\big)\Big|_{s=\tau}\, d\tau, \quad t \in [t_0, T], \ S \subseteq N, \tag{15}$$
where $\tilde{V}^\tau(S; x^*(\tau), s, \tau+\overline{T})$, $s \in [\tau, \tau+\overline{T}]$, $\tau \in [t, T]$, $S \subseteq N$, defined on the interval $[s, \tau+\overline{T}]$, is a characteristic function in the game $\Gamma(x^*(\tau), s, \tau+\overline{T})$.
In (15), we assume that the integral is taken over a finite time interval, because only in this case can we claim that the values of the characteristic function with continuous updating are finite. Later on, in the example model, we calculate the characteristic function and the Shapley value on a finite interval. We assume that the superadditivity conditions are satisfied:
$$V\big(S_1 \cup S_2; x^*(t), t, T\big) \ge V\big(S_1; x^*(t), t, T\big) + V\big(S_2; x^*(t), t, T\big), \quad \forall S_1, S_2 \subseteq N, \ S_1 \cap S_2 = \emptyset. \tag{16}$$
Step 4: Compute the Shapley value based on the characteristic function with continuous updating.
Consider again the cooperative game model $\Gamma_c(x^*(t), t, T)$ with continuous updating. Suppose the players are allowed to form coalitions $K \subseteq N$, where the subset $K$ contains $k$ players. The imputation set of the cooperative game $\Gamma_c(x^*(t), t, T)$ is the set
$$L\big(x^*(t), t, T\big) = \Big\{ \xi(x^*(t), t, T) = \big(\xi_1(x^*(t), t, T), \dots, \xi_n(x^*(t), t, T)\big) : \ \xi_i(x^*(t), t, T) \ge V\big(\{i\}; x^*(t), t, T\big), \ i \in N; \ \sum_{i \in N} \xi_i(x^*(t), t, T) = V\big(N; x^*(t), t, T\big) \Big\}.$$
A cooperative solution, or optimality principle, is a non-empty subset of the imputation set $L(x^*(t), t, T)$. In particular, the Shapley value $sh(x^*(t), t, T) = (sh_1(x^*(t), t, T), \dots, sh_n(x^*(t), t, T))$ is an imputation whose components are defined as
$$sh_i\big(x^*(t), t, T\big) = \sum_{K \subseteq N,\ i \in K} \frac{(k-1)!\,(n-k)!}{n!} \Big[ V\big(K; x^*(t), t, T\big) - V\big(K \setminus i; x^*(t), t, T\big) \Big], \tag{17}$$
where $K \setminus i$ is the relative complement of $i$ in $K$, the quantity $V(K; x^*(t), t, T)$ is defined by Definition 4 and is the profit of coalition $K$, and $[V(K; x^*(t), t, T) - V(K \setminus i; x^*(t), t, T)]$ is the marginal contribution of player $i$ to coalition $K$.
There are many other cooperative optimality principles, for example, the von Neumann–Morgenstern solution, the core, and the nucleolus. In all cases they involve some subsets of the game imputation set.
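The Shapley formula above can be evaluated mechanically once a characteristic function is available. The following is a minimal sketch, not the authors' implementation: the function name `shapley_value`, the toy game `toy_v`, and its numbers are hypothetical and serve only to illustrate the weights $(k-1)!(n-k)!/n!$ and the marginal contributions.

```python
from itertools import combinations
from math import factorial


def shapley_value(players, v):
    """Shapley value of a TU game; v maps a frozenset coalition to its worth."""
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            for rest in combinations(others, size):
                K = frozenset(rest) | {i}      # coalition K containing player i
                k = len(K)
                weight = factorial(k - 1) * factorial(n - k) / factorial(n)
                total += weight * (v(K) - v(K - {i}))  # marginal contribution
        result[i] = total
    return result


def toy_v(K):
    """Hypothetical symmetric game: the worth depends only on |K|."""
    worth = {0: 0.0, 1: 1.0, 2: 4.0, 3: 9.0}
    return worth[len(K)]


sh = shapley_value([1, 2, 3], toy_v)
print(sh)
```

In this symmetric toy game every player receives the same share, and the shares add up to the worth of the grand coalition, which is the efficiency property required of an imputation.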

4. A Cooperative Differential Game for Pollution Control

Let us consider the following game proposed by Long [41]. Countries are indexed by $i \in N$, and we denote $n = |N|$. It is assumed that each player has an industrial production site and that production is proportional to the pollutant emission $u_i$. Therefore, the player's strategy is to decide the amount of pollutants emitted into the atmosphere.

4.1. Initial Game Model

Pollution accumulates over time. We denote by $x(t)$ the stock of pollution at time $t$ and assume that the countries "contribute" to the same stock of pollution. For simplicity, the evolution of the stock $x(t)$ is represented by the following linear equation:
$$\dot{x}(t) = \sum_{i=1}^n u_i(t) - \delta x(t), \quad x(t_0) = x_0,$$
where $\delta$ is a constant rate of decay, in other words, the absorption rate of pollution by nature.
In the following, we assume that the absorption coefficient $\delta$ is equal to zero:
$$\dot{x}(t) = \sum_{i=1}^n u_i(t), \quad x(t_0) = x_0. \tag{18}$$
Pollution is a "public bad" because it exerts adverse effects on health, quality of life, and productivity. We assume that these adverse effects can be represented by having $x$ as an argument of the instantaneous social welfare function $F_i$, with negative derivative:
$$F_i = F_i(x, t, u_i), \quad \frac{\partial F_i}{\partial x} < 0.$$
In each country, aggregate social welfare is taken to be the integral of the instantaneous social welfare. Thus, the payoff of player $i$ can be formulated as follows:
$$K_i(x_0, t_0, T; u) = \int_{t_0}^T F_i(x, t, u_i)\, dt.$$
For tractability, the function $F_i$ is often assumed to take the separable form
$$F_i\big(x, t, u_i(t)\big) = R_i\big(u_i(t)\big) - D_i(x),$$
where $R_i(u_i)$ may be thought of as the utility of the benefit, and $D_i(x)$ as the "disutility" caused by pollution. Following standard practice, we take it that $R_i(u_i)$ is strictly concave and increasing in $u_i$, and that $D_i(x)$ is convex and increasing in $x$. The possibility that $D_i$ is linear is not ruled out.
We assume that the environmental damage cost of player $i$ caused by the pollution stock is linear, $D_i(x) = d_i x$, which is the limiting (weakly convex) case admitted above. In the environmental economics literature, the typical assumption is that the production income function of player $i$ can be expressed as a function of emissions, namely, $R_i(u_i(t)) = b_i u_i - \frac{1}{2} u_i^2$, satisfying $R_i(0) = 0$, where $b_i$ and $d_i$ are positive parameters. For this benefit function to be concave and increasing in emissions, we impose the restriction $u_i(t) \in (0, b_i)$.
Suppose that the game is played in a cooperative scenario in which players have the opportunity to cooperate in order to achieve the maximum total payoff:
$$\max_{u_1, u_2, \dots, u_n} \sum_{i=1}^n K_i(x_0, t_0, T; u) = \sum_{i=1}^n \int_{t_0}^T \Big( \big(b_i - \tfrac{1}{2} \tilde{u}_i\big) \tilde{u}_i - d_i x \Big)\, dt. \tag{19}$$
To solve the optimization problem (19) subject to (18), we invoke the Pontryagin maximum principle. This is a linear state game: the system dynamics and the utility functions are polynomials of degree 1 with respect to the state variable, and the control variables and the state variable are decoupled. The linearity in the state variable together with this decoupled structure implies that the open-loop equilibrium is Markov perfect and that the value functions are linear in the state variable.
It is straightforward to show that the optimal emissions control of player $i$ for the initial differential game model is given by
$$\tilde{u}_i(t) = b_i - \sum_{j=1}^n d_j\, (T - t), \quad i \in N. \tag{20}$$
To obtain the cooperative state trajectory for the initial differential game, it suffices to insert $\tilde{u}_i(t)$ from (20) into the dynamics and to solve the differential equation to get
$$\tilde{x}^*(t) = x_0 + \Big( \sum_{i=1}^n b_i - n \sum_{i=1}^n d_i\, T \Big)(t - t_0) + n \sum_{i=1}^n d_i\, \frac{t^2 - t_0^2}{2}.$$
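The closed-form trajectory of the initial game can be cross-checked by integrating the dynamics $\dot{x} = \sum_i u_i$ numerically under the cooperative controls. The sketch below is illustrative only: the parameter values $b$, $d$, $x_0$, $t_0$, $T$ and all function names are assumptions, not values taken from the paper.

```python
def coop_control_initial(b_i, d_N, T, t):
    """Cooperative emission of one player in the initial game: b_i - d_N (T - t)."""
    return b_i - d_N * (T - t)


def coop_trajectory_initial(x0, t0, T, b, d, t):
    """Closed-form cooperative pollution stock of the initial game."""
    n, b_N, d_N = len(b), sum(b), sum(d)
    return x0 + (b_N - n * d_N * T) * (t - t0) + n * d_N * (t**2 - t0**2) / 2


def integrate_dynamics(x0, t0, T, b, d, t_end, steps=2000):
    """Midpoint-rule integration of x' = sum_i u_i(t) from t0 to t_end."""
    d_N = sum(d)
    h = (t_end - t0) / steps
    x = x0
    for m in range(steps):
        t_mid = t0 + (m + 0.5) * h
        x += h * sum(coop_control_initial(b_i, d_N, T, t_mid) for b_i in b)
    return x


# Hypothetical parameters, chosen only so that emissions stay positive.
b = [10.0, 12.0, 15.0]
d = [0.1, 0.2, 0.3]
x0, t0, T = 5.0, 0.0, 10.0

closed = coop_trajectory_initial(x0, t0, T, b, d, 7.0)
numeric = integrate_dynamics(x0, t0, T, b, d, 7.0)
print(closed, numeric)
```

Because the total emission rate is linear in $t$, the midpoint rule reproduces the closed form up to rounding, so any disagreement would point to a sign error in the formula rather than to discretization.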

4.2. A Pollution Control Game Model with Continuous Updating

In the game $\Gamma(x, t, t+\overline{T})$, the dynamics of the total amount of pollution $x_t(s)$ is described by
$$\dot{x}_t(s) = \sum_{i=1}^n u_i(t, s), \quad x_t(t) = x, \tag{22}$$
in which we assume that the absorption coefficient corresponding to the natural purification of the atmosphere is equal to zero.
The instantaneous payoff of the $i$-th player is defined as
$$R_i\big(u_i(t, s)\big) = b_i u_i(t, s) - \tfrac{1}{2} u_i^2(t, s), \quad i \in N.$$
Each player must also bear a decontamination cost. Therefore, the instantaneous utility of the $i$-th player is equal to $R_i(u_i(t, s)) - d_i x_t(s)$, where $d_i > 0$.
Thus the payoff of player $i$ is defined as
$$K_i^t\big(x, t, t+\overline{T}; u\big) = \int_t^{t+\overline{T}} \Big( \big(b_i - \tfrac{1}{2} u_i\big) u_i - d_i x_t \Big)\, ds,$$
where $u_i = u_i(t, s)$ is the control of player $i$ at the time instant $s \in [t, t+\overline{T}]$, and $x_t = x_t(s)$ is the pollution accumulation at the same time $s$.
Therefore, the payoff function of player $i$ in the subgame with continuous updating $\Gamma(x, s, t+\overline{T})$ is given by
$$K_i^t\big(x, s, t+\overline{T}; u\big) = \int_s^{t+\overline{T}} \Big( \big(b_i - \tfrac{1}{2} u_i(t, \tau)\big) u_i(t, \tau) - d_i x_t(\tau) \Big)\, d\tau, \quad i \in N,$$
where $x_t(\tau)$ and $u(t, \tau)$, $\tau \in [s, t+\overline{T}]$, are the trajectory and the strategies in the game $\Gamma(x, s, t+\overline{T})$. The dynamics of the state is given by
$$\dot{x}_t(\tau) = \sum_{i=1}^n u_i(t, \tau), \quad x_t(s) = x_{t,s}. \tag{23}$$
Step 1: Optimizing the total payment of the grand coalition with continuous updating.
Consider the game in a cooperative form. This means that all players work together to maximize their total payoff. We seek the optimal profile of strategies $\tilde{u}^*(t, s) = (\tilde{u}_1^*(t, s), \dots, \tilde{u}_n^*(t, s))$ such that $\sum_{i=1}^n K_i^t \to \max_{u_1, u_2, \dots, u_n}$.
The optimization problem is as follows:
$$\sum_{i=1}^n K_i^t\big(x, t, t+\overline{T}; u\big) = \sum_{i=1}^n \int_t^{t+\overline{T}} \Big( \big(b_i - \tfrac{1}{2} u_i(t, s)\big) u_i(t, s) - d_i x_t(s) \Big)\, ds \to \max_{u_1, u_2, \dots, u_n} \quad \text{s.t. } x_t(s) \text{ satisfies (22)}. \tag{24}$$
In order to deal with problem (24), we use the classical Pontryagin maximum principle. The corresponding Hamiltonian is
$$H^t\big(s, x_t(s), u(t, s), \psi^t(s)\big) = \sum_{i=1}^n \big(b_i - \tfrac{1}{2} u_i\big) u_i - \sum_{i=1}^n d_i x_t + \psi^t(s)\,(u_1 + u_2 + \dots + u_n).$$
The first-order conditions w.r.t. the $u_i$'s are
$$\frac{\partial H^t}{\partial u_i}\big(s, x_t, u, \psi^t\big) = b_i - u_i + \psi^t = 0,$$
and the Hessian matrix $\dfrac{\partial^2 H^t}{\partial u^2}(s, x_t, u, \psi^t)$ is negative definite, so the Hamiltonian $H^t$ is concave w.r.t. $u_i$. Hence, we obtain the cooperative strategies:
$$\tilde{u}_i^*(t, s) = b_i + \psi^t(s).$$
By the Pontryagin maximum principle, the costate variable satisfies
$$\dot{\psi}^t(s) = \sum_{i=1}^n d_i, \quad \psi^t(t+\overline{T}) = 0.$$
Setting $\sum_{i=1}^n d_i = d_N$, we get $\psi^t(s) = -d_N (t + \overline{T} - s)$. Finally, the cooperative strategies take the form
$$\tilde{u}_i^*(t, s) = b_i - d_N (t + \overline{T} - s), \tag{25}$$
and from (22) we get the optimal (cooperative) trajectory:
$$x_t^*(s) = x + b_N (s - t) - n d_N (t + \overline{T})(s - t) + n d_N \frac{s^2 - t^2}{2}, \tag{26}$$
where $x = x_t(t)$, $d_N = \sum_{i=1}^n d_i$, $b_N = \sum_{i=1}^n b_i$.
According to the procedure (10), we construct open-loop optimal cooperative strategies with continuous updating:
$$u_i^*(t) = \tilde{u}_i^*(t, s)\big|_{s=t} = b_i - d_N \overline{T}.$$
After substituting $u_i^*(t)$ into the differential Equation (18), we arrive at the optimal cooperative trajectory $x^*(t)$ with continuous updating:
$$x^*(t) = x_0 + b_N (t - t_0) - n d_N \overline{T} (t - t_0).$$
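A small numerical sketch makes the difference between the two cooperative controls concrete. Comparing $b_i - d_N(T - t)$ from the initial game with the constant $b_i - d_N \overline{T}$ shows that they coincide exactly when $T - t = \overline{T}$. The parameter values below (in particular $T = 10$ and $\overline{T} = 4$, with $\overline{T} < T$) and the function names are hypothetical, chosen for illustration only.

```python
def control_initial(b_i, d_N, T, t):
    """Cooperative control in the initial game: b_i - d_N (T - t)."""
    return b_i - d_N * (T - t)


def control_updating(b_i, d_N, T_bar):
    """Cooperative control with continuous updating: b_i - d_N * T_bar."""
    return b_i - d_N * T_bar


def trajectory_updating(x0, t0, b, d, T_bar, t):
    """Cooperative pollution stock with continuous updating (linear in t)."""
    n = len(b)
    return x0 + (sum(b) - n * sum(d) * T_bar) * (t - t0)


# Hypothetical parameters (not taken from the paper's figures).
b = [10.0, 12.0, 15.0]
d = [0.1, 0.2, 0.3]
x0, t0, T, T_bar = 5.0, 0.0, 10.0, 4.0
d_N = sum(d)

# The two controls are equal when the remaining horizon T - t equals T_bar.
t_cross = T - T_bar

for t in (0.0, t_cross, T):
    print(t, control_initial(b[0], d_N, T, t), control_updating(b[0], d_N, T_bar))
```

Before `t_cross` the remaining horizon of the initial game exceeds $\overline{T}$, so the initial-game emission is the lower of the two; after `t_cross` the ordering reverses.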
The results of the comparison of the cooperative strategies, corresponding trajectories between initial differential game model and the differential game with continuous updating obtained are graphically shown in Figure 1 and Figure 2.
From Figure 1 we can see that the optimal control with continuous updating is more stable than the optimal control in the initial game model. We can also see that, from the time $t = 4$, the optimal control in the initial game is greater than that with continuous updating, which means that players increase their pollution emissions into the atmosphere in the initial differential game model, a harmful result. This occurs because, in the initial game model, players have the whole information about the game on the interval $[t_0, T]$; they are more cautious and dare not emit too much pollution at first. However, in real life, it is impossible to have the information for the whole time interval. Therefore, we consider the game with continuous updating, in which at each time instant $t$ players have the information only on $[t, t+\overline{T}]$. In the case of continuous updating, the players are brave enough to emit more pollution because they lack the information for the whole game.
We can see from Figure 2 that, from $t = 0$ to $t = 8$, the pollution accumulation in the initial game model is less than in the model with continuous updating. In the initial game model, players are more knowledgeable: they know the information for the whole time interval, which leads to lower pollution accumulation because the players are cautious. Starting from time $t = 8$, pollution with continuous updating is less than pollution in the initial game, because the knowledge of the players in the model with continuous updating approaches that of the initial game model as time goes on. Using the continuous updating method helps to make the modeling more consistent with the actual situation.
Next, for a given subgame $\Gamma(x^*(t), s, t+\overline{T})$ of a differential game with continuous updating $\Gamma(x^*(t), t, t+\overline{T})$, the characteristic function for the grand coalition $N$ is given by $\tilde{V}^t(N; x^*(t), s, t+\overline{T})$, which can be represented as
$$\tilde{V}^t\big(N; x^*(t), s, t+\overline{T}\big) = \sum_{i=1}^n \int_s^{t+\overline{T}} \Big( \big(b_i - \tfrac{1}{2} \tilde{u}_i^*(t, \tau)\big) \tilde{u}_i^* - d_i x_t^*(\tau) \Big)\, d\tau,$$
where $x_t^*(\tau)$ satisfies (26) with $x = x^*(t)$, and $\tilde{u}_i^*(t, \tau)$ satisfies (25). Therefore, the value function of the grand coalition $N$ is
$$\tilde{V}^t\big(N; x^*(t), s, t+\overline{T}\big) = (t + \overline{T} - s) \Big[ \tfrac{1}{2} \tilde{b}_N - d_N x^*(t) + \frac{d_N b_N}{2} (t - \overline{T} - s) - \tfrac{1}{3} n d_N^2 \Big( (t + \overline{T} - s)^2 - \tfrac{3}{2} \overline{T}^2 \Big) \Big], \tag{29}$$
where $d_N = \sum_{i=1}^n d_i$ and $b_N = \sum_{i=1}^n b_i$ have been defined above and $\tilde{b}_N = \sum_{i=1}^n b_i^2$. Note that in (29), $x^*(t)$ represents the cooperative pollution with continuous updating at the time $t$.
For our problem, we can also use the dynamic programming method based on the Hamilton–Jacobi–Bellman equation. It is straightforward to verify a Bellman function of the form $\tilde{V}^t = A(t, s)\, x_t(s) + B(t, s)$, and we get the same result as with the Pontryagin maximum principle.
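The closed form of the grand-coalition value can be checked against a direct numerical quadrature of its defining integral, using the cooperative controls $b_i - d_N(t + \overline{T} - \tau)$ and the cooperative trajectory started from $x^*(t)$. The following is a hedged sketch: the helper `simpson`, the function names, and the parameter values are assumptions made for illustration.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def value_grand_closed(b, d, x_star, t, s, T_bar):
    """Closed-form value of the grand coalition in the subgame on [s, t + T_bar]."""
    n, b_N, d_N = len(b), sum(b), sum(d)
    b_tilde = sum(b_i**2 for b_i in b)
    sigma = t + T_bar - s  # remaining length of the information horizon
    return sigma * (0.5 * b_tilde - d_N * x_star
                    + 0.5 * d_N * b_N * (t - T_bar - s)
                    - n * d_N**2 * (sigma**2 - 1.5 * T_bar**2) / 3)


def value_grand_numeric(b, d, x_star, t, s, T_bar):
    """Quadrature of the joint payoff along the cooperative trajectory."""
    n, b_N, d_N = len(b), sum(b), sum(d)

    def x_coop(tau):  # cooperative trajectory with initial state x_star at time t
        return (x_star + b_N * (tau - t) - n * d_N * (t + T_bar) * (tau - t)
                + n * d_N * (tau**2 - t**2) / 2)

    def integrand(tau):
        u = [b_i - d_N * (t + T_bar - tau) for b_i in b]  # cooperative controls
        benefit = sum((b_i - 0.5 * u_i) * u_i for b_i, u_i in zip(b, u))
        return benefit - d_N * x_coop(tau)

    return simpson(integrand, s, t + T_bar)


# Hypothetical parameters for illustration.
b = [10.0, 12.0, 15.0]
d = [0.1, 0.2, 0.3]
closed = value_grand_closed(b, d, 20.0, 2.0, 3.0, 4.0)
numeric = value_grand_numeric(b, d, 20.0, 2.0, 3.0, 4.0)
print(closed, numeric)
```

The integrand is a quadratic polynomial in $\tau$, for which Simpson's rule is exact, so the two numbers should agree up to rounding.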
Step 2: The computation of the generalized open-loop Nash equilibrium with continuous updating.
The Hamiltonian for each player $i = 1, 2, \dots, n$ is
$$H_i^t\big(\tau, x_t(\tau), u(t, \tau), \psi_i^t(\tau)\big) = \big(b_i - \tfrac{1}{2} u_i\big) u_i - d_i x_t + \psi_i^t(\tau)\,(u_1 + u_2 + \dots + u_n),$$
and its first-order conditions w.r.t. the $u_i$'s are
$$\frac{\partial H_i^t}{\partial u_i}\big(\tau, x_t(\tau), u(t, \tau), \psi_i^t(\tau)\big) = b_i - u_i + \psi_i^t = 0,$$
and the second derivative $\dfrac{\partial^2 H_i^t}{\partial u_i^2}(\tau, x_t, u, \psi_i^t)$ is negative, whence we conclude that the Hamiltonian $H_i^t$ is concave w.r.t. $u_i$. With $\dot{\psi}_i^t(\tau) = d_i$ and $\psi_i^t(t+\overline{T}) = 0$, we obtain the optimal controls
$$\tilde{u}_i^{NE}(t, \tau) = b_i - d_i (t + \overline{T} - \tau), \quad i = 1, 2, \dots, n.$$
For the subgame starting at time instant $s \in [t, t+\overline{T}]$, we can easily derive the corresponding trajectory (for the Nash equilibrium case) of the subgame $\Gamma(x^*(t), s, t+\overline{T})$ along the cooperative trajectory, in other words, with $x_{t,s} = x_t^*(s)$:
$$x_t^{NE}(\tau) = x_t^*(s) + b_N (\tau - s) + d_N \Big( \frac{(t + \overline{T} - \tau)^2}{2} - \frac{(t + \overline{T} - s)^2}{2} \Big) = x^*(t) + b_N (\tau - t) - n d_N (t + \overline{T})(s - t) + n d_N \frac{s^2 - t^2}{2} + \frac{d_N}{2} \Big( (t + \overline{T} - \tau)^2 - (t + \overline{T} - s)^2 \Big).$$
The maximum of the payoff for each player $i = 1, 2, \dots, n$ in the subgame starting from the time instant $s$ and the state $x^*(t)$ has the form
$$\tilde{V}^t\big(\{i\}; x^*(t), s, t+\overline{T}\big) = \int_s^{t+\overline{T}} \Big( \big(b_i - \tfrac{1}{2} \tilde{u}_i^{NE}(t, \tau)\big) \tilde{u}_i^{NE} - d_i x_t^{NE}(\tau) \Big)\, d\tau = (t + \overline{T} - s) \Big[ \frac{b_i^2}{2} - d_i x^*(t) + \frac{n d_i d_N}{2} (s - t)(t - s + 2\overline{T}) - \tfrac{1}{6} d_i^2 (t + \overline{T} - s)^2 + \tfrac{1}{3} d_i d_N (t + \overline{T} - s)^2 - \tfrac{1}{2} d_i b_N (s + \overline{T} - t) \Big].$$
Step 3: The computation of the characteristic function for all remaining possible coalitions in differential games with continuous updating.
It is possible to calculate the controls and the corresponding value functions for different coalitions. Nonetheless, the form will depend on how we define their respective optimal control problems.
Let us build the characteristic function based on the $\delta$-characteristic function ($\delta$-c.f.) approach. The characteristic function of coalition $S$ is calculated in two stages: the first stage is already done (we have found the Nash equilibrium strategies for each player in Step 2); at the second stage, it is assumed that the remaining players $j \in N \setminus S$ carry out their Nash optimal strategies $\tilde{u}_j^{NE}(t, \tau)$, while the players from coalition $S$ seek to maximize their joint payoff $\sum_{i \in S} K_i$. Consider the case of a coalition $S$ with $k = |S|$ players; it is instructive to perform the calculations in detail. The respective Hamiltonian for the coalition $S$ is
$$H_S^t\big(\tau, x_t(\tau), u(t, \tau), \psi_S^t(\tau)\big) = \sum_{i \in S} \Big( \big(b_i - \tfrac{1}{2} u_i\big) u_i \Big) - d_S x_t + \psi_S^t \Big( \sum_{i \in S} u_i + \sum_{j \in N \setminus S} \tilde{u}_j^{NE} \Big),$$
where $d_S = \sum_{i \in S} d_i$. Note that we substituted $u_j$, $j \in N \setminus S$, by $\tilde{u}_j^{NE}$, which was found earlier.
The optimal strategies of the players in coalition $S$ are $\tilde{u}_S^*(t, \tau)$, which satisfy
$$\tilde{u}_i^*(t, \tau) = b_i + \psi_S^t(\tau), \quad i \in S.$$
The differential equation for $\psi_S^t(\tau)$ is $\dot{\psi}_S^t(\tau) = d_S$, which is solved subject to $\psi_S^t(t+\overline{T}) = 0$. Thus, we get $\psi_S^t(\tau) = -d_S (t + \overline{T} - \tau)$. We substitute the obtained expression for $\psi_S^t(\tau)$ into $\tilde{u}_i^*$ and get
$$\tilde{u}_i^*(t, \tau) = b_i - d_S (t + \overline{T} - \tau), \quad i \in S,$$
where $d_S = \sum_{i \in S} d_i$. We see that the players in coalition $S$ implement their optimal strategies $\tilde{u}_S^*$, while the players outside the coalition adhere to their Nash equilibrium strategies.
In the next step, we integrate (23) starting from the point $x_t^*(s)$ to get $x_t^S(\tau)$:
$$x_t^S(\tau) = x_t^*(s) + b_N (\tau - s) + \frac{(t + \overline{T} - \tau)^2}{2} \big( (k-1) d_S + d_N \big) - \frac{(t + \overline{T} - s)^2}{2} \big( (k-1) d_S + d_N \big).$$
Here $x_t^S(\tau)$ is the trajectory in the subgame $\Gamma(x^*(t), s, t+\overline{T})$ of the game $\Gamma(x^*(t), t, t+\overline{T})$ starting at time instant $s$ at $x_t^*(s)$, when the players from coalition $S$ use the strategies $\tilde{u}_S^*(t, \tau)$ and the players from $N \setminus S$ use $\tilde{u}^{NE}(t, \tau)$. If we consider the case taken along the cooperative trajectory, then we can substitute the $x_t^*(s)$ obtained in (26) to get the state variable $x_t^S(\tau)$ that depends on $x = x^*(t)$:
$$x_t^S(\tau) = x^*(t) + b_N (\tau - t) - n d_N (t + \overline{T})(s - t) + n d_N \frac{s^2 - t^2}{2} + \frac{(t + \overline{T} - \tau)^2}{2} \big( (k-1) d_S + d_N \big) - \frac{(t + \overline{T} - s)^2}{2} \big( (k-1) d_S + d_N \big).$$
The respective value of the characteristic function $\tilde{V}^t(S; x^*(t), s, t+\overline{T})$ is
$$\tilde{V}^t\big(S; x^*(t), s, t+\overline{T}\big) = (t + \overline{T} - s) \Big[ \tfrac{1}{2} \tilde{b}_S - d_S x^*(t) + \frac{d_S d_N n}{2} (t + 2\overline{T} - s)(s - t) + (t + \overline{T} - s)^2 \Big( \frac{k-2}{6} d_S^2 + \frac{d_S d_N}{3} \Big) - \frac{d_S b_N}{2} (s + \overline{T} - t) \Big].$$
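The coalition value can be cross-checked in the same spirit as the grand-coalition value: integrate the joint payoff of $S$ numerically along $x_t^S(\tau)$, with the members of $S$ playing $b_i - d_S(t + \overline{T} - \tau)$, and compare with the closed form. This is a sketch under stated assumptions: players are indexed from 0, $\tilde{b}_S$ is read as $\sum_{i \in S} b_i^2$ by analogy with $\tilde{b}_N$, and all names and parameter values are hypothetical.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def value_coalition_closed(S, b, d, x_star, t, s, T_bar):
    """Closed-form subgame value of coalition S (a list of player indices)."""
    n, k = len(b), len(S)
    b_N, d_N = sum(b), sum(d)
    d_S = sum(d[i] for i in S)
    b_tilde_S = sum(b[i]**2 for i in S)
    sigma = t + T_bar - s
    return sigma * (0.5 * b_tilde_S - d_S * x_star
                    + 0.5 * n * d_S * d_N * (t + 2 * T_bar - s) * (s - t)
                    + sigma**2 * ((k - 2) / 6 * d_S**2 + d_S * d_N / 3)
                    - 0.5 * d_S * b_N * (s + T_bar - t))


def value_coalition_numeric(S, b, d, x_star, t, s, T_bar):
    """Quadrature of the joint payoff of S along the trajectory x_t^S."""
    n, k = len(b), len(S)
    b_N, d_N = sum(b), sum(d)
    d_S = sum(d[i] for i in S)
    c = (k - 1) * d_S + d_N
    # Cooperative state at the start s of the subgame.
    x_s = (x_star + b_N * (s - t) - n * d_N * (t + T_bar) * (s - t)
           + n * d_N * (s**2 - t**2) / 2)

    def x_S(tau):
        return (x_s + b_N * (tau - s)
                + 0.5 * c * ((t + T_bar - tau)**2 - (t + T_bar - s)**2))

    def integrand(tau):
        benefit = 0.0
        for i in S:
            u_i = b[i] - d_S * (t + T_bar - tau)  # coalition member's control
            benefit += (b[i] - 0.5 * u_i) * u_i
        return benefit - d_S * x_S(tau)

    return simpson(integrand, s, t + T_bar)


# Hypothetical parameters; coalitions of size 1, 2 and 3 are compared.
b = [10.0, 12.0, 15.0]
d = [0.1, 0.2, 0.3]
for S in ([0], [0, 2], [0, 1, 2]):
    print(S, value_coalition_closed(S, b, d, 20.0, 2.0, 3.0, 4.0),
          value_coalition_numeric(S, b, d, 20.0, 2.0, 3.0, 4.0))
```

Taking `S` as a single player or as all players recovers the singleton and grand-coalition cases, so the same check covers both ends of the construction.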
According to Definition 4, the characteristic function of a differential game with continuous updating has the following form:
$$V\big(S; x^*(t), t, T\big) = -\int_t^T \frac{d \tilde{V}^\tau\big(S; x^*(\tau), s, \tau+\overline{T}\big)}{ds}\Big|_{s=\tau}\, d\tau = (T - t) \Big[ \tfrac{1}{2} \tilde{b}_S - d_S x_0 + \overline{T}^2 \Big( \frac{k-2}{2} d_S^2 + (1 - n) d_S d_N \Big) - \frac{d_S (b_N - n d_N \overline{T})}{2} (T + t - 2 t_0) \Big].$$
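The passage from the subgame value $\tilde{V}^\tau$ to $V$ can be reproduced numerically: differentiate the closed-form subgame value in $s$ at $s = \tau$ by a central difference, evaluate it along $x^*(\tau) = x_0 + (b_N - n d_N \overline{T})(\tau - t_0)$, change the sign, and integrate over $[t, T]$. The sketch below assumes 0-based player indices; the step size, function names, and parameter values are illustrative choices, not part of the paper.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def v_tilde(S, b, d, x_star, t, s, T_bar):
    """Closed-form subgame value of coalition S on [s, t + T_bar]."""
    n, k = len(b), len(S)
    b_N, d_N = sum(b), sum(d)
    d_S = sum(d[i] for i in S)
    b_tilde_S = sum(b[i]**2 for i in S)
    sigma = t + T_bar - s
    return sigma * (0.5 * b_tilde_S - d_S * x_star
                    + 0.5 * n * d_S * d_N * (t + 2 * T_bar - s) * (s - t)
                    + sigma**2 * ((k - 2) / 6 * d_S**2 + d_S * d_N / 3)
                    - 0.5 * d_S * b_N * (s + T_bar - t))


def V_closed(S, b, d, x0, t0, t, T, T_bar):
    """Closed-form characteristic function with continuous updating."""
    n, k = len(b), len(S)
    b_N, d_N = sum(b), sum(d)
    d_S = sum(d[i] for i in S)
    b_tilde_S = sum(b[i]**2 for i in S)
    return (T - t) * (0.5 * b_tilde_S - d_S * x0
                      + T_bar**2 * ((k - 2) / 2 * d_S**2 + (1 - n) * d_S * d_N)
                      - 0.5 * d_S * (b_N - n * d_N * T_bar) * (T + t - 2 * t0))


def V_numeric(S, b, d, x0, t0, t, T, T_bar, h=1e-4):
    """Minus the integral over [t, T] of d/ds v_tilde at s = tau."""
    n, b_N, d_N = len(b), sum(b), sum(d)

    def rate(tau):
        x_star = x0 + (b_N - n * d_N * T_bar) * (tau - t0)  # x*(tau)
        up = v_tilde(S, b, d, x_star, tau, tau + h, T_bar)
        down = v_tilde(S, b, d, x_star, tau, tau - h, T_bar)
        return -(up - down) / (2 * h)  # central difference in s

    return simpson(rate, t, T)


# Hypothetical parameters for illustration.
b = [10.0, 12.0, 15.0]
d = [0.1, 0.2, 0.3]
args = (5.0, 0.0, 2.0, 10.0, 4.0)  # x0, t0, t, T, T_bar
for S in ([0], [0, 2], [0, 1, 2]):
    print(S, V_closed(S, b, d, *args), V_numeric(S, b, d, *args))
```

The subgame value is a cubic polynomial in $s$, so the central difference carries only a small $O(h^2)$ error and the two columns should agree to several digits.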
We now check the superadditivity condition (16) for the constructed characteristic function $V(S; x^*(t), t, T)$. For any $S, P \subseteq N$ with $S \cap P = \emptyset$, $|S| = k \ge 1$, $|P| = m \ge 1$, the following holds:
$$V\big(S \cup P; x^*(t), t, T\big) - V\big(S; x^*(t), t, T\big) - V\big(P; x^*(t), t, T\big) = (T - t)\, \overline{T}^2 \Big[ (k + m - 2)\, d_S d_P + \frac{k}{2} d_P^2 + \frac{m}{2} d_S^2 \Big] \ge 0.$$
Thus, the $\delta$-characteristic function $V(S; x^*(t), t, T)$ is a superadditive function without any additional conditions applied to the parameters of the model.
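The superadditivity gap can be checked by brute force over all pairs of disjoint coalitions. In the sketch below the gap formula is written with the common factor $(T - t)$ of the closed-form characteristic function kept explicit; dividing by that factor gives the bracketed expression multiplied by $\overline{T}^2$. The randomly drawn parameters, the seed, and the function names are assumptions for illustration.

```python
import random
from itertools import combinations


def V_closed(S, b, d, x0, t0, t, T, T_bar):
    """Closed-form characteristic function with continuous updating."""
    n, k = len(b), len(S)
    b_N, d_N = sum(b), sum(d)
    d_S = sum(d[i] for i in S)
    b_tilde_S = sum(b[i]**2 for i in S)
    return (T - t) * (0.5 * b_tilde_S - d_S * x0
                      + T_bar**2 * ((k - 2) / 2 * d_S**2 + (1 - n) * d_S * d_N)
                      - 0.5 * d_S * (b_N - n * d_N * T_bar) * (T + t - 2 * t0))


def gap_formula(S, P, d, t, T, T_bar):
    """Gap V(S u P) - V(S) - V(P), including the common factor (T - t)."""
    k, m = len(S), len(P)
    d_S = sum(d[i] for i in S)
    d_P = sum(d[i] for i in P)
    return (T - t) * T_bar**2 * ((k + m - 2) * d_S * d_P
                                 + k / 2 * d_P**2 + m / 2 * d_S**2)


def disjoint_pairs(n):
    """All ordered pairs of non-empty disjoint coalitions of {0, ..., n-1}."""
    players = range(n)
    subsets = [list(c) for r in range(1, n) for c in combinations(players, r)]
    return [(S, P) for S in subsets for P in subsets if not set(S) & set(P)]


random.seed(0)  # hypothetical random parameters, fixed for reproducibility
n = 4
b = [random.uniform(8.0, 15.0) for _ in range(n)]
d = [random.uniform(0.05, 0.3) for _ in range(n)]
x0, t0, t, T, T_bar = 5.0, 0.0, 2.0, 10.0, 3.0

max_error = 0.0
min_gap = float("inf")
for S, P in disjoint_pairs(n):
    gap = (V_closed(S + P, b, d, x0, t0, t, T, T_bar)
           - V_closed(S, b, d, x0, t0, t, T, T_bar)
           - V_closed(P, b, d, x0, t0, t, T, T_bar))
    max_error = max(max_error, abs(gap - gap_formula(S, P, d, t, T, T_bar)))
    min_gap = min(min_gap, gap)
print(max_error, min_gap)
```

All terms of the characteristic function that are linear in $d_S$ or $\tilde{b}_S$ cancel in the gap, which is why only the quadratic $\frac{k-2}{2} d_S^2$ term contributes and the result does not depend on $b$ or $x_0$.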
In the following figure, we will compare the characteristic function for the grand coalition N between the initial game model and a differential game with continuous updating.
Figure 3 shows that the value of the characteristic function in the initial model is greater than that with continuous updating; the reason is that the complexity of the information in the continuous updating setting can reduce the effectiveness of the coalition. It should be noted that the continuous updating case is more realistic. We can conclude that the payoff of the coalition decreases because, as time goes on, pollution accumulates in the air. The players' payoff depends on the level of pollution, and the payoff decreases as pollution increases. It should also be noted that the coalition's effectiveness decreases in the initial game model at a faster rate than it does with continuous updating.
Step 4: Compute the Shapley value based on the characteristic function with continuous updating.
Any of the known optimality principles can be applied to find a cooperative solution. In what follows, the product $d_j d_l$ (note that $d_j d_l = d_l d_j$) represents the interaction of costs among players. Now, consider the cooperative solution of a differential game with continuous updating. According to procedure (17), we construct the Shapley value for any $i \in N$ with continuous updating using the characteristic function with continuous updating of the auxiliary subgame and get
$$sh_i\big(x^*(t), t, T\big) = \sum_{K \subseteq N,\ i \in K} \frac{(k-1)!\,(n-k)!}{n!} \Big[ V\big(K; x^*(t), t, T\big) - V\big(K \setminus i; x^*(t), t, T\big) \Big] = (T - t) \Big[ -d_i x_0 + \tfrac{1}{2} b_i^2 + \frac{n d_i d_N \overline{T} - d_i b_N}{2} (T + t - 2 t_0) + \overline{T}^2 \Big( \frac{1 - 2n}{3} d_i d_N - \frac{4 + n}{12} d_i^2 + \frac{1}{3} \Big( \sum_{\substack{j, l \ne i \\ j \ne l \in N}} d_j d_l \Big) + \frac{1}{4} \tilde{d}_N \Big) \Big],$$
where, by analogy with $\tilde{b}_N$, $\tilde{d}_N = \sum_{j=1}^n d_j^2$, and each unordered pair $\{j, l\}$ is counted once in the sum.
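The closed-form Shapley value can be compared with a brute-force evaluation of the Shapley formula applied to the closed-form characteristic function. The following sketch relies on two reading assumptions, both marked in the code: $\tilde{d}_N$ is taken to be $\sum_j d_j^2$, and the sum over $j, l \ne i$ counts each unordered pair once. Parameter values and function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def V_closed(S, b, d, x0, t0, t, T, T_bar):
    """Closed-form characteristic function with continuous updating."""
    n, k = len(b), len(S)
    b_N, d_N = sum(b), sum(d)
    d_S = sum(d[i] for i in S)
    b_tilde_S = sum(b[i]**2 for i in S)
    return (T - t) * (0.5 * b_tilde_S - d_S * x0
                      + T_bar**2 * ((k - 2) / 2 * d_S**2 + (1 - n) * d_S * d_N)
                      - 0.5 * d_S * (b_N - n * d_N * T_bar) * (T + t - 2 * t0))


def shapley_generic(i, b, d, x0, t0, t, T, T_bar):
    """Shapley value of player i by enumerating all coalitions containing i."""
    n = len(b)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        for rest in combinations(others, size):
            K = list(rest) + [i]
            k = len(K)
            weight = factorial(k - 1) * factorial(n - k) / factorial(n)
            total += weight * (V_closed(K, b, d, x0, t0, t, T, T_bar)
                               - V_closed(list(rest), b, d, x0, t0, t, T, T_bar))
    return total


def shapley_closed(i, b, d, x0, t0, t, T, T_bar):
    """Closed-form Shapley value of player i."""
    n = len(b)
    b_N, d_N = sum(b), sum(d)
    d_tilde_N = sum(d_j**2 for d_j in d)  # assumption: sum of squared d_j
    others = [j for j in range(n) if j != i]
    # Assumption: each unordered pair {j, l} without player i is counted once.
    pair_sum = sum(d[j] * d[l] for j, l in combinations(others, 2))
    return (T - t) * (-d[i] * x0 + 0.5 * b[i]**2
                      + 0.5 * (n * d[i] * d_N * T_bar - d[i] * b_N) * (T + t - 2 * t0)
                      + T_bar**2 * ((1 - 2 * n) / 3 * d[i] * d_N
                                    - (4 + n) / 12 * d[i]**2
                                    + pair_sum / 3 + d_tilde_N / 4))


# Hypothetical four-player example.
b = [10.0, 12.0, 15.0, 9.0]
d = [0.1, 0.2, 0.3, 0.15]
args = (5.0, 0.0, 2.0, 10.0, 3.0)  # x0, t0, t, T, T_bar

generic = [shapley_generic(i, b, d, *args) for i in range(len(b))]
closed = [shapley_closed(i, b, d, *args) for i in range(len(b))]
print(generic)
print(closed)
```

Besides the player-by-player agreement, the components should add up to the value of the grand coalition, which gives a second, independent consistency check on the signs.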
The graphic representation of the Shapley value for subgames with continuous updating and the initial game model along the optimal cooperative trajectory x * ( t ) is demonstrated in Figure 4.
Figure 4 shows that if we consider the problem in the more realistic case (continuous updating), a player receives a smaller allocation from the coalition than in the initial game model. This is based on the fact that, at an early stage, the pollution emitted into the atmosphere is greater than in the initial game model, whereas at the later stage the pollution with continuous updating is less than in the initial game model. Thus, starting from $t = 0$, players get more in the initial game model, but in both models they all get 0 in the end. This shows that, as pollution intensifies, the benefits countries receive from the attendant production gradually decrease.

5. Conclusions

In this paper, we presented a detailed consideration of a cooperative differential game model with continuous updating based on the Pontryagin maximum principle, where the decision-maker updates his/her behavior based on the new information available, which arises from a shifting time horizon. The characteristic function with continuous updating is constructed for the cooperative case by using the Pontryagin maximum principle. The results show that the $\delta$-characteristic function computed for the game is superadditive without any additional restrictions on the model's parameters. The concept of the Shapley value as a cooperative solution with continuous updating is demonstrated in analytic form for pollution control problems. Finally, for the example of $n$-player pollution control, the optimal strategies, the corresponding trajectory, the characteristic function, and the Shapley value with continuous updating are derived and graphically compared with those of the initial game model. The simulation results demonstrate the applicability of the approach.
The practical significance of the work is determined by the fact that real-life conflict-controlled processes evolve continuously in time and the players usually do not or cannot use full information about them. Therefore, it is important to introduce differential games with information updating to the field of game theory. Another important practical contribution of the continuous updating approach is the creation of a class of inverse optimal control problems with continuous updating [17], which can be used to analyze the profile of a human in human-machine engineering systems. The results there are illustrated on the model of a driver assistance system and are applied to real driving data from the simulator located in the Institute of Control Systems, Karlsruhe Institute of Technology. Our method can provide more in-depth modeling of human engineering systems.

Author Contributions

Conceptualization, J.Z.; Data curation, A.T.; Formal analysis, A.T.; Funding acquisition, O.P.; Investigation, O.P.; Methodology, J.Z.; Project administration, H.G.; Resources, H.G.; Software, J.Z.; Supervision, O.P. and H.G.; Validation, A.T.; Visualization, H.G.; Writing—original draft, J.Z.; Writing—review and editing, O.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by Postdoctoral International Exchange Program of China and funded by the Russian Foundation for Basic Research (RFBR) according to the Grant No. 18-00-00727 (18-00-00725), and the National Natural Science Foundation of China (Grant No. 71571108).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The corresponding author would like to acknowledge the support from the China-Russia Operations Research and Management Cooperation Research Center, that is an association between Qingdao University and St. Petersburg State University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Carlson, D.A.; Leitmann, G. An Extension of the Coordinate Transformation Method for Open-Loop Nash Equilibria. J. Optim. Theory Appl. 2004, 123, 27–47. [Google Scholar] [CrossRef]
  2. Petrosjan, L. Agreeable Solutions in Differential Games. Int. J. Math. Game Theory Algebra 1997, 3, 165–177. [Google Scholar]
  3. Petrosian, O.L. Looking Forward Approach in Cooperative Differential Games. Int. Game Theory Rev. 2016, 18, 1–20. [Google Scholar]
  4. Petrosian, O.L. Looking Forward Approach in Cooperative Differential Games with infinite-horizon. Vestnik St.-Peterbg. Univ. Ser. 2016, 4, 18–30. [Google Scholar] [CrossRef]
  5. Petrosian, O.L.; Barabanov, A.E. Looking Forward Approach in Cooperative Differential Games with Uncertain-Stochastic Dynamics. J. Optim. Theory Appl. 2017, 172, 328–347. [Google Scholar] [CrossRef]
  6. Gromova, E.; Petrosian, O.L. Control of information horizon for cooperative differential game of pollution control. In Proceedings of the International Conference Stability and Oscillations of Nonlinear Control Systems, Moscow, Russia, 1–3 June 2016. [Google Scholar]
  7. Petrosian, O.L. About the Looking Forward Approach in Cooperative Differential Games with Transferable Utility. In Static and Dynamic Game Theory: Foundations and Applications; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
  8. Petrosian, O.L.; Nastych, M.; Volf, D. Non-cooperative Differential Game Model of Oil Market with Looking Forward Approach. In Frontiers of Dynamic Games; Birkhäuser: Cham, Switzerland, 2018; pp. 189–202. [Google Scholar]
  9. Petrosian, O.L.; Shi, L.; Li, Y.; Gao, H. Moving Information Horizon Approach for Dynamic Game Models. Mathematics 2019, 7, 1239. [Google Scholar] [CrossRef] [Green Version]
  10. Petrosian, O.L.; Tur, A. Hamilton-Jacobi-Bellman Equations for Non-cooperative Differential Games with Continuous Updating. Commun. Comput. Inform. Sci. 2019, 1090, 178–191. [Google Scholar]
  11. Petrosian, O.L.; Tur, A.; Wang, Z. Cooperative differential games with continuous updating using Hamilton–Jacobi–Bellman equation. Optim. Methods Softw. 2020, 1275, 256–270. [Google Scholar] [CrossRef]
  12. Petrosian, O.L.; Tur, A.; Zhou, J. Pontryagin’s Maximum Principle for Non-cooperative Differential Games with Continuous Updating. Commu. Comput. Inform. Sci. 2020, 1275, 256–270. [Google Scholar]
  13. Kuchkarov, I.; Petrosian, O.L. On class of linear quadratic non-cooperative differential games with continuous updating. Lect. Notes Comput. Sci. 2019, 11548, 635–650. [Google Scholar]
  14. Kuchkarov, I.; Petrosian, O.L. Open-Loop Based Strategies for Autonomous Linear Quadratic Game Models with Continuous Updating. Lect. Notes Comput. Sci. 2020, 12095, 212–230. [Google Scholar]
  15. Wang, Z.; Petrosian, O.L. On class of non-transferable utility cooperative differential games with continuous updating. J. Dyn. Games 2020, 7, 291–2302. [Google Scholar] [CrossRef]
  16. Gromova, E. The Shapley Value as a Sustainable Cooperative Solution in Differential Games of Three Players. In Recent Advances in Game Theory and Applications; Petrosyan, L., Mazalov, V., Eds.; Springer: Petrozavodsk, Russia, 2015; pp. 67–89. [Google Scholar]
  17. Petrosian, O.; Inga, J.; Kuchkarov, I.; Flad, M.; Hohmann, S. Optimal Control and Inverse Optimal Control with Continuous Updating for Human Behavior Modeling (to be published). IFAC-PapersOnLine 2020, 7, 291–2302. [Google Scholar]
  18. Goodwin, G.; Seron, M.; Dona, J. Constrained Control and Estimation: An Optimisation Approach; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  19. Kwon, W.; Han, S. Receding Horizon Control: Model Predictive Control for State Models; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  20. Rawlings, J.; Mayne, D. Model Predictive Control: Theory and Design; Nob Hill Publishing, LLC.: Madison, WI, USA, 2009. [Google Scholar]
  21. Wang, L. Model Predictive Control System Design and Implementation Using MATLAB; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  22. Bemporad, A.; Morari, M.; Dua, V.; Pistikopoulos, E. The explicit linear quadratic regulator for constrained systems. Automatica 2002, 38, 3–20. [Google Scholar] [CrossRef]
  23. Hempel, A.; Goulart, P.; Lygeross, J. Inverse Parametric Optimization With an Application to Hybrid System Control. IEEE Trans. Automat. Control 2015, 60, 1064–1069. [Google Scholar] [CrossRef]
  24. Kwon, W.; Bruckstein, A.; Kailath, T. Stabilizing state-feedback design via the moving horizon method. In Proceedings of the 21st IEEE Conference on Decision and Control, Orlando, FL, USA, 8–10 December 1982; Volume 21, pp. 234–239.
  25. Kwon, W.; Pearson, A. A modified quadratic cost problem and feedback stabilization of a linear system. IEEE Trans. Automat. Control 1977, 22, 838–842.
  26. Mayne, D.; Michalska, H. Receding horizon control of nonlinear systems. IEEE Trans. Automat. Control 1990, 35, 814–824.
  27. Shaw, L. Nonlinear control of linear multivariable systems via state-dependent feedback gains. IEEE Trans. Automat. Control 1979, 24, 108–112.
  28. Vasin, A.A.; Divtsova, A.G. Game-theoretic model of agreement on limitation of transboundary atmospheric pollution. Matematicheskaya Teoriya Igr Prilozheniya 2017, 9, 27–44.
  29. Vasin, A.A.; Divtsova, A.G. The repeated game modelling an agreement on protection of the environment. In Proceedings of the VIII Moscow International Conference on Operations Research (ORM2018), Omsk, Russia, 8–14 July 2018; Volume 1, pp. 261–263.
  30. Tolwinski, B.; Haurie, A.; Leitmann, G. Cooperative equilibria in differential games. J. Math. Anal. Appl. 1986, 119, 182–202.
  31. Muthoo, A.; Osborne, M.J.; Rubinstein, A. A Course in Game Theory. Economica 1996, 63, 164–165.
  32. Von Neumann, J.; Morgenstern, O. Theory of Games and Economic Behavior, 1st ed.; Princeton University Press: Princeton, NJ, USA, 1944.
  33. Von Neumann, J.; Morgenstern, O. The characteristic function. In Theory of Games and Economic Behavior, 2nd ed.; Princeton University Press: Princeton, NJ, USA, 1947; pp. 238–242.
  34. Reddy, P.V.; Zaccour, G. A friendly computable characteristic function. Math. Soc. Sci. 2016, 82, 18–25.
  35. Gromova, E.; Petrosjan, L. On an approach to constructing a characteristic function in cooperative differential games. Automat. Remote Control 2017, 78, 1680–1692.
  36. Chander, P.; Tulkens, H. The core of an economy with multilateral environmental externalities. Int. J. Game Theory 1997, 26, 379–401.
  37. Petrosjan, L.; Zaccour, G. Time-consistent Shapley value allocation of pollution cost reduction. J. Econom. Dyn. Control 2003, 27, 381–398.
  38. Shapley, L.S. A value for n-person games. Ann. Math. Stud. 1953, 28, 307–318.
  39. Başar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory, 2nd ed.; SIAM: Philadelphia, PA, USA, 1999; Volume 23.
  40. Leitmann, G.; Schmitendorf, W. Some Sufficiency Conditions for Pareto-Optimal Control. J. Dyn. Syst. Meas. Control 1973, 95, 356–361.
  41. Long, N.V. Pollution control: A differential game approach. Ann. Operat. Res. 1992, 37, 283–296.
Figure 1. Cooperative strategies in the initial game $\tilde{u}(t)$ (red line) and cooperative strategies with continuous updating $u^*(t)$ (blue line).
Figure 2. Cooperative trajectory in the initial game $\tilde{x}(t)$ (red line) and cooperative trajectory with continuous updating $x^*(t)$ (blue line).
Figure 3. The value of the characteristic function of a coalition N in the initial game (red line) and the value of the characteristic function of a coalition N with continuous updating $V^*(N, t, T)$ (blue line).
Figure 4. The Shapley value of player i in the initial game (red line) and the Shapley value with continuous updating $sh_i^*(t)$ (blue line).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhou, J.; Tur, A.; Petrosian, O.; Gao, H. Transferable Utility Cooperative Differential Games with Continuous Updating Using Pontryagin Maximum Principle. Mathematics 2021, 9, 163. https://doi.org/10.3390/math9020163
