Strategic negotiations for extensive-form games

Published in: Autonomous Agents and Multi-Agent Systems

Abstract

When studying extensive-form games it is commonly assumed that players make their decisions individually. One usually does not allow the possibility for the players to negotiate their respective strategies and formally commit themselves to future moves. As a consequence, many non-zero-sum games have been shown to have equilibrium outcomes that are suboptimal and arguably counter-intuitive. For this reason we feel there is a need to explore a new line of research in which game-playing agents are allowed to negotiate binding agreements before they make their moves. We analyze what happens under such assumptions and define a new equilibrium solution concept to capture this. We show that this new solution concept indeed yields solutions that are more efficient and, in a sense, closer to what one would expect in the real world. Furthermore, we demonstrate that our ideas are not only theoretical in nature, but can also be implemented on boundedly rational agents; we support this claim with a number of experiments conducted with a new algorithm that combines techniques from Automated Negotiations, (Algorithmic) Game Theory, and General Game Playing. Our algorithm, which we call Monte Carlo Negotiation Search, is an adaptation of Monte Carlo Tree Search that equips the agent with the ability to negotiate. It is completely domain-independent in the sense that it is not tailored to any specific game. It can be applied to any non-zero-sum game, provided that its rules are described in Game Description Language. We show with several experiments that it strongly outperforms non-negotiating players, and that it closely approximates the theoretically optimal outcomes, as defined by our new solution concept.

Notes

  1. http://sanchoggp.blogspot.co.uk/2014/05/what-is-sancho.html.

  2. https://bitbucket.org/rxe/galvanise_v2.

  3. The update function does not always need to be defined completely, because it is only relevant on those triples \((w,a_0, a_1)\) for which \(a_0\) and \(a_1\) are legal in w anyway.

  4. It is easier and more customary to define the Prisoner’s Dilemma as a normal-form game, but the point is that we want to give a simple example of an extensive-form game.

  5. http://games.ggp.org.

  6. In case the negotiator is boundedly rational, we should say that it would never accept any proposal it expects to yield utility less than its reservation value.

  7. Strictly speaking \( SPE _i\) and \( NV _i\) are not defined for the Iterated Prisoner’s Dilemma, because it is not a turn-taking game, but we show in “Appendix A” how this can be resolved with a minor adaptation.

  8. These experiments were previously published in [11].

References

  1. Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211(4489), 1390–1396.

  2. Baarslag, T., Hindriks, K., Jonker, C. M., Kraus, S., & Lin, R. (2010). The first automated negotiating agents competition (ANAC 2010). In T. Ito, M. Zhang, V. Robu, S. Fatima, & T. Matsuo (Eds.), New trends in agent-based complex automated negotiations, Studies in Computational Intelligence. Berlin: Springer.

  3. Cazenave, T., & Saffidine, A. (2010). Score bounded Monte-Carlo tree search. In H. J. Van Den Herik, H. Iida, & A. Plaat (Eds.), Computers and games—7th international conference, CG 2010, Kanazawa, Japan, September 24–26, 2010, revised selected papers (Vol. 6515, pp. 93–104), Lecture Notes in Computer Science. Berlin: Springer.

  4. Chevaleyre, Y., Dunne, P. E., Endriss, U., Lang, J., Lemaître, M., Maudet, N., et al. (2006). Issues in multiagent resource allocation. Informatica (Slovenia), 30(1), 3–31.

  5. de Jonge, D., Baarslag, T., Aydoğan, R., Jonker, C., Fujita, K., & Ito, T. (2019). The challenge of negotiation in the game of diplomacy. In M. Lujak (Ed.), Agreement technologies 2018, revised selected papers (pp. 100–114). Cham: Springer.

  6. de Jonge, D. (2015). Negotiations over Large Agreement Spaces. PhD thesis, Universitat Autònoma de Barcelona.

  7. de Jonge, D., & Sierra, C. (2015). NB3: a multilateral negotiation algorithm for large, non-linear agreement spaces with limited time. Autonomous Agents and Multi-Agent Systems, 29(5), 896–942.

  8. de Jonge, D., & Sierra, C. (2016). GANGSTER: an automated negotiator applying genetic algorithms. In N. Fukuta, T. Ito, M. Zhang, K. Fujita, & V. Robu (Eds.), Recent advances in agent-based complex automated negotiation, studies in computational intelligence (pp. 225–234). Berlin: Springer.

  9. de Jonge, D., & Sierra, C. (2017). D-brane: a diplomacy playing agent for automated negotiations research. Applied Intelligence, 47(1), 1–20.

  10. de Jonge, D., & Zhang, D. (2016). Using GDL to represent domain knowledge for automated negotiations. In N. Osman & C. Sierra (Eds.), Autonomous agents and multiagent systems: AAMAS 2016 workshops, visionary papers, Singapore, May 9–10, 2016, revised selected papers (pp. 134–153). Cham: Springer.

  11. de Jonge, D., & Zhang, D. (2017). Automated negotiations for general game playing. In K. Larson, M. Winikoff, S. Das, & E. Durfee (Eds.), Proceedings of the 16th conference on autonomous agents and multiagent systems, AAMAS 2017, São Paulo, Brazil, May 8–12, 2017 (pp. 371–379). ACM.

  12. Ephrati, E., Kraus, S., & Lehman, D. (1989). An automated diplomacy player. In D. Levy & D. Beal (Eds.), Heuristic programming in artificial intelligence: the 1st computer olympiad (pp. 134–153). Hemel Hempstead: Ellis Horwood Limited.

  13. Fabregues, A. (2012). Facing the challenge of automated negotiations with humans. PhD thesis, Universitat Autònoma de Barcelona.

  14. Fabregues, A., & Sierra, C. (2011). Dipgame: a challenging negotiation testbed. Engineering Applications of Artificial Intelligence, 24(7), 1137–1146.

  15. Faratin, P., Sierra, C., & Jennings, N. R. (1998). Negotiation decision functions for autonomous agents. Robotics and Autonomous Systems, 24(3–4), 159–182.

  16. Faratin, P., Sierra, C., & Jennings, N. R. (2000). Using similarity criteria to make negotiation trade-offs. In International conference on multi-agent systems, ICMAS’00, pp. 119–126.

  17. Fatima, S., Wooldridge, M., & Jennings, N. R. (2009). An analysis of feasible solutions for multi-issue negotiation involving nonlinear utility functions. In Proceedings of the 8th international conference on autonomous agents and multiagent systems—volume 2, AAMAS ’09 (pp. 1041–1048). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

  18. Ferreira, A., Cardoso, H. L., & Reis, L. P. (2015). Dipblue: a diplomacy agent with strategic and trust reasoning. In 7th international conference on agents and artificial intelligence (ICAART 2015), pp. 398–405.

  19. Finnsson, H. (2012). Simulation-based general game playing. PhD thesis, School of Computer Science, Reykjavik University.

  20. Gal, Y., Grosz, B., Kraus, S., Pfeffer, A., & Shieber, S. (2010). Agent decision-making in open-mixed networks. Artificial Intelligence, 174(18), 1460–1480.

  21. Genesereth, M., Love, N., & Pell, B. (2005). General game playing: overview of the AAAI competition. AI Magazine, 26(2), 62–72.

  22. Ito, T., Klein, M., & Hattori, H. (2008). A multi-issue negotiation protocol among agents with nonlinear utility functions. Multiagent and Grid Systems, 4, 67–83.

  23. Iyer, K., & Huhns, M. N. (2009). Negotiation criteria for multiagent resource allocation. The Knowledge Engineering Review, 24(2), 111–135.

  24. Knuth, D. E., & Moore, R. W. (1975). An analysis of alpha-beta pruning. Artificial Intelligence, 6(4), 293–326.

  25. Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In Proceedings of the 17th European conference on machine learning, ECML’06 (pp. 282–293). Berlin: Springer.

  26. Kraus, S., & Lehmann, D. (1995). Designing and building a negotiating automated agent. Computational Intelligence, 11, 132–171.

  27. Lin, R., Kraus, S., Baarslag, T., Tykhonov, D., Hindriks, K., & Jonker, C. M. (2014). Genius: an integrated environment for supporting the design of generic automated negotiators. Computational Intelligence, 30(1), 48–70.

  28. Love, N., Genesereth, M., & Hinrichs, T. (2006). General game playing: game description language specification. Technical report LG-2006-01, Stanford University, Stanford, CA. http://logic.stanford.edu/reports/LG-2006-01.pdf.

  29. Marsa-Maestre, I., Lopez-Carmona, M. A., Velasco, J. R., & de la Hoz, E. (2009). Effective bidding and deal identification for negotiations in highly nonlinear scenarios. In Proceedings of the 8th international conference on autonomous agents and multiagent systems—volume 2, AAMAS ’09 (pp. 1057–1064). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.

  30. Marsa-Maestre, I., Lopez-Carmona, M. A., Velasco, J. R., Ito, T., Klein, M., & Fujita, K. (2009). Balancing utility and deal probability for auction-based negotiations in highly nonlinear utility spaces. In Proceedings of the 21st international joint conference on artificial intelligence, IJCAI’09 (pp. 214–219). San Francisco, CA: Morgan Kaufmann Publishers Inc.

  31. Nash, J. F. (1950). The bargaining problem. Econometrica, 18, 155–162.

  32. Osborne, M. J., & Rubinstein, A. (1990). Bargaining and markets. San Diego: Academic Press.

  33. Osborne, M. J., & Rubinstein, A. (1994). A course in game theory. Cambridge: MIT Press.

  34. Pan, L., Luo, X., Meng, X., Miao, C., He, M., & Guo, X. (2013). A two-stage win–win multiattribute negotiation model: optimization and then concession. Computational Intelligence, 29(4), 577–626.

  35. Rosenschein, J. S., & Zlotkin, G. (1994). Rules of encounter. Cambridge: The MIT Press.

  36. Rosenthal, R. W. (1981). Games of perfect information, predatory pricing and the chain-store paradox. Journal of Economic Theory, 25(1), 92–100.

  37. Schiffel, S., & Thielscher, M. (2007). Fluxplayer: A successful general game player. In Proceedings of the AAAI national conference on artificial intelligence (pp. 1191–1196). AAAI Press.

  38. Serrano, R. (2008). Bargaining. In S. N. Durlauf & L. Blume (Eds.), The new palgrave dictionary of economics. Basingstoke: Palgrave Macmillan.

  39. Shubik, M. (1971). The dollar auction game: a paradox in noncooperative behavior and escalation. The Journal of Conflict Resolution, 15(1), 109–111.

  40. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.

  41. Thielscher, M. (2010). A general game description language for incomplete information games. In Proceedings of the twenty-fourth AAAI conference on artificial intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11–15, 2010.

  42. von Neumann, J. (1959). On the theory of games of strategy. In A. W. Tucker & R. D. Luce (Eds.), Contributions to the theory of games (pp. 13–42). Princeton: Princeton University Press.

  43. Zhang, D., & Thielscher, M. (2015). A logic for reasoning about game strategies. In Proceedings of the twenty-ninth AAAI conference on artificial intelligence (AAAI-15), pp. 1671–1677.

Author information

Correspondence to Dave de Jonge.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

In this section we describe the other two games that we used for our experiments. Just as with the Centipede Game, their precise definitions may vary per textbook, so we follow the definitions of their GDL descriptions as implemented by Sam Schreiber, which can be found in the GDL database at http://games.ggp.org.

1.1 The dollar auction

The Dollar Auction (DA) [39] is another classic game with a somewhat counter-intuitive outcome. The idea is that an auctioneer puts up a 100-dollar bill for auction. The players of the game may make increasing bids. The player with the highest bid will pay his bid and receive the 100-dollar bill in return. The loser, however, must also pay his bid, but will receive nothing in return.

Again, the counter-intuitive result is that the best strategy is to stop the game immediately and never bid more than 0 dollars. The problem is that if \(\alpha _0\) bids 98 dollars, and \(\alpha _1\) bids 99 dollars, then if \(\alpha _0\) gives up he will lose 98 dollars. Therefore, he prefers to bid 100 dollars. However, this means \(\alpha _1\) will lose 99 dollars if he gives up, so he will bid 101 dollars, but then \(\alpha _0\) will lose 100 dollars, so he will bid 102 dollars. Clearly, both players are going to lose money, so they would have been better off simply bidding 0 dollars and no more.
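To see the escalation numerically, consider the following minimal sketch (our own illustration of the classic auction described above, in which the winner pays his bid for the 100-dollar bill and the loser pays his bid for nothing):

```python
# Classic Dollar Auction bookkeeping (illustration only): the highest bidder
# pays his bid and receives 100 dollars; the loser also pays his bid.
def net_payoffs(winning_bid, losing_bid):
    return 100 - winning_bid, -losing_bid

# Once trapped, each player keeps outbidding the other by 1 dollar:
for bid in (99, 100, 101, 102):
    winner, loser = net_payoffs(bid, bid - 1)
    print(f"winning bid {bid}: winner nets {winner}, loser nets {loser}")
# Beyond a bid of 100 both players are guaranteed to lose money, yet at each
# step raising the bid is locally better than giving up.
```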

The implementation of the DA in GDL is a bit different from the description above, but the idea is the same. In this case both players start with 80 dollars in their pockets, and they bid for a prize of 25 dollars. The game is a turn-taking game, and in each round the active player can either choose to ‘lay a claim to the prize’, which costs him 5 dollars, or to finish the game, in which case the other player will receive the prize (if that other player has laid at least 1 claim).

We can formalize it as follows:

$$\begin{aligned} T= & {} \{t_k \mid k \in \{1,2 \dots 33\}\}\\ W= & {} \{w_k \mid k \in \{0,1 \dots 32\}\} \cup T\\ \mathcal {A}_0= & {} \mathcal {A}_1 = \{noop, lay\_claim, finish \}\\ L_i(w_k)= & {} {\left\{ \begin{array}{ll} \{noop\} &{} \ \text { if } \ i \ne k \pmod 2 \\ \{lay\_claim, finish \} &{} \ \text { if } \ i = k \pmod 2 \ \text { and }\ k \ne 32\\ \{ finish \} &{} \ \text { if } \ i = k \pmod 2 \ \text { and }\ k=32 \end{array}\right. } \end{aligned}$$

After playing lay_claim the next state will be another non-terminal state:

$$\begin{aligned} u(w_k, lay\_claim) = w_{k+1} \end{aligned}$$

After playing finish the next state will be a terminal state:

$$\begin{aligned} u(w_k, finish ) = t_{k+1} \end{aligned}$$

If we let \(cl_{i,k}\) denote the number of claims that have been made by player \(\alpha _i\) when the game ends in \(t_k\), then the utility functions are given by:

$$\begin{aligned} \vec {U}(t_k) = {\left\{ \begin{array}{ll} (80 , 80) &{} \text {if }\ k = 1\\ (80 - 5\cdot cl_{0,k} + 25 \ , \ 80 - 5\cdot cl_{1,k} \quad \quad \ ) &{} \text {if }\ k \text { is even } \\ (80 - 5\cdot cl_{0,k} \quad \quad \ \ , \ 80 - 5\cdot cl_{1,k} + 25)&{} \text {if }\ k \text { is odd and }\, k > 1 \end{array}\right. } \end{aligned}$$

which is equivalent to:

$$\begin{aligned} \vec {U}(t_k) = {\left\{ \begin{array}{ll} (80 , 80) &{} \text {if }\ k = 1\\ (105 - 5\cdot \frac{k}{2} \quad , \quad 85 - 5\cdot \frac{k}{2}) &{} \text {if }\ k \ \text { is even} \\ (80 - 5\cdot \frac{k-1}{2} \quad , \quad 105-5\cdot \frac{k-1}{2}) &{} \text {if }\ k \text { is odd and } k > 1 \end{array}\right. } \end{aligned}$$

Specifically:

$$\begin{aligned} \vec {U}(t_1)&= (80,80)\\ \vec {U}(t_2)&=(100,80)\\ \vec {U}(t_3)&= (75,100)\\ \vec {U}(t_4)&= (95,75) \end{aligned}$$
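These values can be checked mechanically. The following sketch (our own encoding of the formalization above, with the active player in round r being \(\alpha _{r \bmod 2}\) and the prize going to the player who did not finish) reproduces exactly these utility vectors:

```python
# Quick numeric check of U(t_k): the game ends in t_k after 'finish' is
# played in w_{k-1}; every earlier round was a lay_claim costing 5 dollars.
def utility(k):
    cl0 = sum(1 for r in range(k - 1) if r % 2 == 0)  # claims by alpha_0
    cl1 = sum(1 for r in range(k - 1) if r % 2 == 1)  # claims by alpha_1
    u0, u1 = 80 - 5 * cl0, 80 - 5 * cl1
    if k > 1:                  # the prize of 25 goes to the non-finisher
        if k % 2 == 0:
            u0 += 25           # alpha_1 finished in w_{k-1} (k-1 is odd)
        else:
            u1 += 25           # alpha_0 finished in w_{k-1} (k-1 is even)
    return (u0, u1)

assert [utility(k) for k in (1, 2, 3, 4)] == [(80, 80), (100, 80), (75, 100), (95, 75)]
```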

Note that this implementation of the DA is slightly unconventional, because if the game stops after both players have made the same number of bids then \(\alpha _1\) wins the prize, even though both players have bid the same amount of money.

In order to calculate the \(NV_i\) of the DA we need the following two lemmas (note that although the DA is a well-known game in the literature, these lemmas apply strictly to the slightly unconventional version of the DA as defined above, so these lemmas are not taken from the literature).

Lemma 3

In the Subgame Perfect Equilibrium of the Dollar Auction, \(\alpha _0\) always plays ‘finish’, while \(\alpha _1\) always plays ‘lay_claim’.

Proof

This can easily be seen using the technique of backward induction (see also Fig. 3). \(\square \)

By combining this lemma with the utility functions of the DA as defined above we immediately obtain that \((SPE_0, SPE_1) = (80,80)\).

Lemma 4

Let G be the dollar auction then we have:

$$\begin{aligned} spe_{w_k,\vec {\tau },i} = {\left\{ \begin{array}{ll} U_i(t_{k+1}) &{} \text {if }\ k \ \text { is even} \\ U_i(t_{k+2}) &{} \text {if }\ k \ \text { is odd} \end{array}\right. } \end{aligned}$$

Proof

Suppose the game is in state \(w_k\) and assume both players follow the subgame perfect equilibrium. If k is even then \(\alpha _0\) is the active player, which according to Lemma 3 will play \( finish \), and hence the game will end in the terminal state \(t_{k+1}\). If k is odd then \(\alpha _1\) is the active player, which according to Lemma 3 will play \( lay\_claim \), so the game will advance to \(w_{k+1}\). Since \(k+1\) is even, we know that the game will end in state \(t_{k+2}\). \(\square \)

We are now ready to show that for the Dollar Auction we have \((NV_0, NV_1) = (100, 80)\).

We need to calculate \(nv_{w_0, \vec {\tau },i}\). By Lemma 4 we know that for \(w_1\) we have \(\vec {spe}_{w_1,\vec {\tau }} = \vec {U}(t_3) = (75,100)\). There is no other terminal state for which the utility vector dominates this result. Therefore, \(nv_{w_1,\vec {\tau },i} = rv_{w_1,\vec {\tau },i} = U_i(t_3)\).

For \(w_0\), the active player is \(\alpha _0\), so we have:

$$\begin{aligned} w^* = \mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{w' \in \{w_1,t_1\}} nv_{w',\vec {\tau },0} \end{aligned}$$

with \(nv_{w_1,\vec {\tau },0} = U_0(t_3) = 75\) and \(nv_{t_1,\vec {\tau },0} = U_0(t_1) = 80\). So we see that \(w^*= t_1\), and thus \(rv_{w_0,\vec {\tau },i} = nv_{t_1,\vec {\tau },i} = U_i(t_1)\), which means the reservation values are given by \(\vec {U}(t_1) = (80,80)\). The only terminal state for which the utility vector dominates (80, 80) is \(t_2\), which has utility vector (100, 80). Hence \(nv_{w_0,\vec {\tau },i} = U_i(t_2)\), and we indeed obtain \((NV_0, NV_1) = (100, 80)\).
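The subgame-perfect values can also be obtained computationally. The sketch below (reusing the hypothetical utility() function from the previous snippet) performs the backward induction of Lemma 3 and confirms \((SPE_0, SPE_1) = (80, 80)\):

```python
# Backward induction over the Dollar Auction (sketch). Returns the
# subgame-perfect utility vector of the subgame rooted at w_k.
def spe(k):
    if k == 32:                  # in w_32 only 'finish' is legal
        return utility(33)
    i = k % 2                    # the active player in w_k
    finish = utility(k + 1)      # 'finish' ends the game in t_{k+1}
    claim = spe(k + 1)           # 'lay_claim' continues in w_{k+1}
    return finish if finish[i] >= claim[i] else claim

assert spe(0) == (80, 80)        # (SPE_0, SPE_1)
```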

Fig. 3

State space of the Dollar Auction. Next to each terminal state \(t_k\), on the right-hand side, we have indicated its utility vector. Next to each non-terminal state \(w_k\), on the left-hand side, we have indicated its Subgame Perfect Equilibrium values \(spe_{w,\vec {\tau },i}\), assuming no agreements have been made. Also, on the left-hand side next to the arrows we have indicated the active player. The thick arrows indicate the optimal moves.

1.2 The iterated prisoner’s dilemma

The Iterated Prisoner’s Dilemma (IPD) [1] is simply the Prisoner’s Dilemma repeated n times. The number of repetitions n is known to the agents, and the utility functions of the game are simply defined as the sum of the payoffs of the individual iterations.

The IPD has a very large search space. Each player has 2 legal actions in each round, so if the game lasts for n rounds, there are \(4^{n}\) possible sequences from the initial state to any terminal state. Here, following the GDL description, we have \(n=20\).

We can model each state as a triple \((d_0, d_1, m)\) where \(d_i\) is the number of points obtained so far by player \(\alpha _i\) and m is the total number of rounds that have been played. We then have the following formalization.

$$\begin{aligned} \mathcal {A}_i&= \{cooperate, defect \}\quad \forall i\in \{0,1\}\\ W&= \{(d_0, d_1, m) \mid d_0, d_1 \in \{0,1,\dots 100\}, m \in \{0, 1\dots 20\}\}\\ T&= \{(d_0, d_1, 20) \mid d_0, d_1 \in \{0,1\dots 100\}\}\\ w_0&= (0,0,0)\\ L_i(w)&= \{cooperate, defect \} \quad \forall i\in \{0, 1\}, \forall w\in W\setminus T\\ O&= T\\ out&= Id \\ U_i(d_0, d_1, 20)&= d_i \quad \forall i\in \{0,1\} \end{aligned}$$
$$\begin{aligned} u((d_0, d_1, m), a_0, a_1) = {\left\{ \begin{array}{ll} (d_0 + 3, d_1 + 3, m+1) &{} \text {if }\ a_0 = cooperate, a_1 = cooperate \\ (d_0 + 0, d_1 + 5, m+1) &{} \text {if }\ a_0 = cooperate, a_1 = defect \\ (d_0 + 5, d_1 + 0, m+1) &{} \text {if }\ a_0 = defect , a_1 = cooperate \\ (d_0 + 1, d_1 + 1, m+1) &{} \text {if }\ a_0 = defect , a_1 = defect \end{array}\right. } \end{aligned}$$
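For concreteness, here is a direct transcription of this formalization (a sketch of our own, with actions encoded as strings):

```python
# States are triples (d0, d1, m); the update function adds the Prisoner's
# Dilemma payoffs of one round.
PAYOFF = {                      # (a0, a1) -> (points for alpha_0, alpha_1)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def update(state, a0, a1):
    d0, d1, m = state
    p0, p1 = PAYOFF[(a0, a1)]
    return (d0 + p0, d1 + p1, m + 1)

# Two perfectly rational non-negotiating players defect every round
# (Lemma 5 below), which yields (SPE_0, SPE_1) = (20, 20):
state = (0, 0, 0)
for _ in range(20):
    state = update(state, "defect", "defect")
assert state == (20, 20, 20)
```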

In order to calculate the \(SPE_i\) and \(NV_i\) values of the IPD we need the following lemmas. Of course, we do not claim that any of these lemmas are novel results. We only state them because we need them for our calculations.

Lemma 5

In the Subgame Perfect Equilibrium of the Iterated Prisoner’s Dilemma, every player always plays ‘defect’.

Proof

This follows directly from backward induction. \(\square \)

By combining this lemma with the update function of the IPD as defined above we obtain that the terminal state reached by two perfectly rational players is the state (20, 20, 20) (starting from \(w_0 = (0,0,0)\) apply the update function with \(a_0 = defect , a_1 = defect \) twenty times). Therefore, we conclude that we have \((SPE_0, SPE_1) = (20,20)\).

The rest of this section is dedicated to showing that \((NV_0, NV_1) = (60,60)\).

Lemma 6

Let \(s = (\vec {a}_0 \dots \vec {a}_{19})\) be a legal sequence of joint moves of the IPD starting at the initial state, for which there is at least one \(\vec {a}_j\) in which \(\alpha _0\) plays defect, and at least one \(\vec {a}_m\) in which \(\alpha _1\) plays defect. Then the resulting terminal state is not Pareto-optimal.

Proof

We will just prove this for the case that \(\vec {a}_j = ( defect , cooperate)\) and \(\vec {a}_m = (cooperate, defect )\). Let \(t_s\) denote the resulting terminal state of s. Furthermore, let \(s'\) denote the sequence that is identical to s except that both \(\vec {a}_j\) and \(\vec {a}_m\) are replaced with \((cooperate, cooperate)\), and let \(t_{s'}\) denote the resulting terminal state of \(s'\). A simple calculation yields:

$$\begin{aligned} U_0(t_{s'}) = U_0(t_{s}) + 1 \quad U_1(t_{s'}) = U_1(t_{s}) + 1 \end{aligned}$$

We see that \(t_{s'}\) dominates \(t_s\). We leave the other cases for the reader to verify. \(\square \)

Let \(t_{i,k}\) denote the terminal state that results after a game in which \(\alpha _i\) has played ‘defect’ k times, and its opponent has always played ‘cooperate’. Then one can easily verify from the above definition of the IPD that the players’ payoffs are given by:

$$\begin{aligned} U_i(t_{i,k})&= k\cdot 5 + (20-k)\cdot 3 = 60 + 2k \end{aligned}$$
(11)
$$\begin{aligned} U_{-i}(t_{i,k})&= (20-k)\cdot 3 = 60 - 3k \end{aligned}$$
(12)

where \(U_{-i}\) is the utility function of the opponent of \(\alpha _i\).
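Both formulas are easy to confirm with a rollout. The following sketch (reusing the hypothetical update() function above, and taking \(i=0\) without loss of generality) checks them for every k:

```python
def rollout(k):
    # alpha_0 defects in the first k rounds and cooperates afterwards,
    # alpha_1 always cooperates; returns the terminal state t_{0,k}.
    state = (0, 0, 0)
    for r in range(20):
        a0 = "defect" if r < k else "cooperate"
        state = update(state, a0, "cooperate")
    return state

for k in range(21):
    d0, d1, _ = rollout(k)
    assert (d0, d1) == (60 + 2 * k, 60 - 3 * k)   # Eqs. (11) and (12)
```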

Lemma 7

The set of Pareto-optimal terminal states of the IPD consists of exactly those terminal states that result from a legal action sequence in which at least one of the two players always plays ‘cooperate’.

Proof

We already know from Lemma 6 that if a sequence of joint moves s results in a Pareto-optimal terminal state t, then at least one of the two players only plays ‘cooperate’ in s. So we just need to prove the converse: any terminal state resulting from such a sequence is Pareto-optimal. We use the same notation as above and denote such a terminal state by \(t_{i,k}\). Without loss of generality we can assume that \(i=0\), because the proof for \(i=1\) goes analogously. If \(t_{0,k}\) is not Pareto-optimal then it must be dominated by some Pareto-optimal solution. Therefore, from Lemma 6 we know that \(t_{0,k}\) must be dominated either by a state of the form \(t_{0,l}\), or by a state of the form \(t_{1,l}\). First we prove that \(t_{0,k}\) cannot be dominated by \(t_{0,l}\). This follows because if \(l<k\) then:

$$\begin{aligned} U_0(t_{0,l}) \quad = \quad 60 + 2l \quad < \quad 60 + 2k \quad = \quad U_0(t_{0,k}), \end{aligned}$$

while if \(k<l\) we have:

$$\begin{aligned} U_{1}(t_{0,l}) \quad = \quad 60 - 3l \quad < \quad 60 - 3k \quad = \quad U_{1}(t_{0,k}). \end{aligned}$$

Next, we prove that \(t_{0,k}\) cannot be dominated by \(t_{1,l}\). This follows because for all \(k, l\):

$$\begin{aligned} U_0(t_{1,l}) \quad = \quad 60-3l \quad < \quad 60 +2k \quad = \quad U_0(t_{0,k}). \end{aligned}$$

\(\square \)

Lemma 8

Suppose the game is in a state \((d,d,k)\), then the only terminal state that is reachable from \((d,d,k)\) which is Pareto-optimal and for which both players obtain a score higher than \(d+58-3k\) is the state that results from both players always playing cooperate.

Proof

If the game is in state \((d,d,k)\) then there are \(20-k\) rounds to go. If both players continue by only playing cooperate in every remaining round, then each player will gain 3 points in each round, so each will obtain a final score of \(d + 3\cdot (20-k) = d + 60 - 3k\). Indeed, this is greater than \(d+58-3k\).

Now, suppose \(\alpha _i\) always plays cooperate, and \(\alpha _j\) plays defect \(n\) times, with \(1\le n \le 20-k\). It is easy to calculate that \(\alpha _i\) will end with \(d+ 3\cdot ((20-k)-n)\) points, and since \(n \ge 1\) we have:

$$\begin{aligned} d+3\cdot ((20-k)-n) \quad = \quad d + 60 -3k - 3n \quad < \quad d + 58-3k. \end{aligned}$$

Finally, if both players play defect at least once, then we can use similar reasoning as in Lemma 6 to show that this is not Pareto-optimal. \(\square \)

We will now calculate the values of \(NV_i\). However, before doing this we should remark that strictly speaking the quantities \( NV _i\) and \( SPE _i\) are not defined for the IPD, because the IPD is not a turn-taking game. However, with a small adaptation specific to the IPD these concepts can still be made meaningful.

Let G be the IPD and let \(N_{w,\vec {\sigma }}\) be the negotiation domain for any state w and any standing agreement \(\vec {\sigma }\). Then we define the utility value \(U_{w,\vec {\sigma }, i}(\vec {\sigma }')\) of any agreement \(\vec {\sigma }'\) by assuming both players will always play defect, except when that is in contradiction with the agreement \(\vec {\sigma }'\). Note that this definition makes sense, because indeed, we can assume that without any agreements a rational player would always play defect. From this, it follows that we have \(rv_{w,\vec {\tau }, i} = nv_{w^*,\vec {\tau }, i}\), where \(w^* := u(w, defect , defect )\).

Let \(w_{d,k}\) denote the state \((d,d,k)\). We will show that for all \(i\in \{0,1\}\) and all integers d we have:

$$\begin{aligned} nv_{w_{d,k},\vec {\tau },i} = d+60-3k \end{aligned}$$

We will prove this by ‘reversed induction’. That is, we first prove it for \(k=19\), and then we prove that if it holds for some k, then it must also hold for \(k-1\). Note that we have \(w_{d,k}^* = (d+1,d+1,k+1)\).

First, let k be 19, so \(w_{d,19}^* = (d+1,d+1,20)\in T\). Then, because this is a terminal state, for any integer d we have

$$\begin{aligned} rv_{w_{d,19},\vec {\tau }, i} = nv_{w_{d,19}^*,\vec {\tau }, i} = U_i(w_{d,19}^*) = U_i(d+1,d+1,20) = d+1. \end{aligned}$$
(13)

Now, if the game is in state \(w_{d,19}\) there is only one more round to go, so it is easy to verify that the only terminal state with a utility vector that dominates this one would be the state resulting from both players playing cooperate. Thus, the agreement to do that is the only rational agreement, and its utility vector is \((d+3,d+3)\), so we have

$$\begin{aligned} nv_{w_{d,19},\vec {\tau }, i} = d+3 \end{aligned}$$

which is indeed equal to \(d+60-3k\), with \(k=19\). This proves the base case.

Now suppose we have proved the claim for \(k+1\). We now want to prove that it also holds for k. We have:

$$\begin{aligned} rv_{w_{d,k},\vec {\tau },i} = nv_{w_{d,k}^*,\vec {\tau }, i} = nv_{w_{d+1,k+1},\vec {\tau }, i} = (d+1)+60-3\cdot (k+1) \end{aligned}$$

which can be rewritten as:

$$\begin{aligned} rv_{w_{d,k},\vec {\tau },i} = d + 58 - 3k \end{aligned}$$

Now, using Lemma 8 we see that the only possible agreement that is rational and Pareto-optimal, is the deal in which both players agree to always play cooperate. In this case both players will receive \(d + 3\cdot (20-k)\) points, from which it follows that:

$$\begin{aligned} nv_{w_{d,k},\vec {\tau },i} = d + 3\cdot (20-k) = d + 60-3k \end{aligned}$$

If we now set \(d=k=0\) then we obtain our result that \((NV_0, NV_1) = (60, 60)\).
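The reverse induction can also be checked numerically. The sketch below (our own encoding of the recursion above) uses the facts that the reservation value at \((d,d,k)\) equals the negotiation value after one joint defect, and that the only rational Pareto-optimal deal is ‘always cooperate’ (Lemma 8):

```python
def nv(d, k):
    if k == 20:                    # terminal state (d, d, 20): value is d
        return d
    rv = nv(d + 1, k + 1)          # reservation value at (d, d, k)
    coop = d + 3 * (20 - k)        # value of agreeing to always cooperate
    assert coop > rv               # the deal dominates the reservation value
    return coop

assert nv(0, 0) == 60              # hence (NV_0, NV_1) = (60, 60)
assert all(nv(d, k) == d + 60 - 3 * k for d in range(5) for k in range(21))
```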

Appendix B

In this section we prove that concession games in general have multiple Nash Equilibria.

A normal-form game for two players is called a concession game if both players have the same ordered set of actions \(\mathcal {A} = \{a_0, a_1, \dots a_n\}\), and whenever \(k<m\) the payoff functions satisfy the following (in)equalities:

$$\begin{aligned} 0<\mathcal {U}_0(a_m, a_m)< & {} \mathcal {U}_0(a_k, a_k) \end{aligned}$$
(14)
$$\begin{aligned} 0<\mathcal {U}_1(a_k, a_k)< & {} \mathcal {U}_1(a_m, a_m) \end{aligned}$$
(15)
$$\begin{aligned} \forall i\in \{0,1\} \ \mathcal {U}_i(a_k, a_m)= & {} 0 \end{aligned}$$
(16)
$$\begin{aligned} \forall i\in \{0,1\} \ \mathcal {U}_i(a_m, a_k)= & {} \frac{1}{2} (\mathcal {U}_i(a_k, a_k) + \mathcal {U}_i(a_m, a_m))>0 \end{aligned}$$
(17)
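For intuition, the following sketch constructs a small concession game satisfying (14)–(17); the diagonal payoffs are our own arbitrary choice, not taken from the paper:

```python
# A 3-action concession game (illustrative payoffs only).
DIAG0 = [3.0, 2.0, 1.0]   # U_0(a_k, a_k): positive and strictly decreasing
DIAG1 = [1.0, 2.0, 3.0]   # U_1(a_k, a_k): positive and strictly increasing

def payoff(i, k, m):
    """Payoff of player i when alpha_0 plays a_k and alpha_1 plays a_m."""
    diag = DIAG0 if i == 0 else DIAG1
    if k == m:
        return diag[k]                   # diagonal payoffs, (14) and (15)
    if k < m:
        return 0.0                       # (16)
    return 0.5 * (diag[m] + diag[k])     # (17), here m < k
```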

We now aim to characterize the Nash Equilibria for such a game.

Lemma 9

If \(k<l\le m\) then we have \(\mathcal {U}_1(a_m, a_k) < \mathcal {U}_1(a_m, a_l)\).

Proof

By (15) we have:

$$\begin{aligned} \mathcal {U}_1(a_k, a_k) < \mathcal {U}_1(a_l, a_l) \end{aligned}$$

from this it follows that:

$$\begin{aligned} \frac{1}{2} (\mathcal {U}_1(a_k, a_k) + \mathcal {U}_1(a_m, a_m)) < \frac{1}{2} (\mathcal {U}_1(a_l, a_l) + \mathcal {U}_1(a_m, a_m)) \end{aligned}$$

then, by (17), the left-hand side is equal to \(\mathcal {U}_1(a_m, a_k)\) while the right-hand side is equal to \(\mathcal {U}_1(a_m, a_l)\), so we have:

$$\begin{aligned} \mathcal {U}_1(a_m, a_k) < \mathcal {U}_1(a_m, a_l) \end{aligned}$$

\(\square \)

Theorem 4

Let S be some subset of \(\mathcal {A}\) and let \(a_k\) be any action that is not in S, i.e. \(a_k \in \mathcal {A} \setminus S\). If one player plays a mixed strategy with support S, then playing \(a_k\) is not a best response for the other player.

Proof

Suppose agent \(\alpha _0\) is playing a mixed strategy in which each action \(a_i\) has a probability \(P_i\) of being played. If its support is S this means that \(P_i = 0\) iff \(a_i \not \in S\). The expected utility of \(\alpha _1\), when playing \(a_k\) is then given by:

$$\begin{aligned} E(\mathcal {U}_1(a_k)) = \sum _{i=0}^n P_i \cdot \mathcal {U}_1(a_i,a_k) \end{aligned}$$
(18)

We need to prove that there is some other action \(a_l\) for which \(E(\mathcal {U}_1(a_l)) > E(\mathcal {U}_1(a_k))\). In order to prove this we need to consider two different cases, namely the case that \(i<k\) for all \(a_i\in S\), and the case that there is at least one integer i such that \(k<i\) and \(a_i\in S\).

Case 1 Suppose that for all \(a_i\in S\) we have \(i<k\). Note that if \(i<k\) then by (16) we have \(\mathcal {U}_1(a_i, a_k) = 0\), while if \(i\ge k\) we have \(P_i = 0\) because \(a_i \not \in S\). Therefore, every term in the summation in Eq. (18) equals 0, so we have:

$$\begin{aligned} E(\mathcal {U}_1(a_k)) = 0 \end{aligned}$$

Now let l be the largest integer such that \(a_l \in S\). Then we have:

$$\begin{aligned} E(\mathcal {U}_1(a_l)) = \sum _{i=0}^n P_i \cdot \mathcal {U}_1(a_i,a_l) = P_l \cdot \mathcal {U}_1(a_l,a_l) > 0 \end{aligned}$$

Here, the second equality holds because for all \(i<l\) we have \(\mathcal {U}_1(a_i,a_l) = 0\), and for all \(i>l\) we have \(P_i = 0\), so the only term that does not vanish is the term with \(i=l\). We have now shown that \(E(\mathcal {U}_1(a_l)) > E(\mathcal {U}_1(a_k))\) so we have proven the proposition for this case.

Case 2 Now suppose there is at least one integer j such that \(a_j\in S\) and \(k<j\). Let l be the smallest such integer. Since \(\mathcal {U}_1(a_i, a_k) = 0\) for all \(i<k\), and \(P_i=0\) for all i with \(k\le i < l\), we can rewrite Eq. (18) as:

$$\begin{aligned} E(\mathcal {U}_1(a_k)) = \sum _{i=l}^n P_i \cdot \mathcal {U}_1(a_i,a_k) \end{aligned}$$

Note that for each term in this summation we have that \(k < l \le i\), so we can apply Lemma 9 and conclude that

$$\begin{aligned} E(\mathcal {U}_1(a_k)) = \sum _{i=l}^n P_i \cdot \mathcal {U}_1(a_i,a_k) < \sum _{i=l}^n P_i \cdot \mathcal {U}_1(a_i,a_l) = E(\mathcal {U}_1(a_l)) \end{aligned}$$

We have proven that the proposition also holds for this case, so it holds in general. \(\square \)

Corollary 1

In any Mixed Strategy Nash Equilibrium of a Concession Game, \(\alpha _0\) and \(\alpha _1\) must choose exactly the same support.

Proof

Let \(S_0\) denote the support of \(\alpha _0\) and \(S_1\) the support of \(\alpha _1\). In a mixed strategy Nash Equilibrium, every action in the support of \(\alpha _0\) must be a best response to \(\alpha _1\), and vice versa. Therefore, if \(a \in S_1\) then it must be a best response to \(\alpha _0\) and by Theorem 4 we then must have that \(a \in S_0\). Similarly, if \(a \in S_0\) then it must be a best response to \(\alpha _1\) and therefore it must be in \(S_1\). \(\square \)

We now know that if the players play a Nash Equilibrium with supports \(S_0\) and \(S_1\) respectively, then \(S_0 = S_1\). However, that does not mean that any subset \(S \subseteq \mathcal {A}\) can be the support of some Nash Equilibrium. The following three propositions show that at least if \(0<|S| \le 3\) then there is a Nash Equilibrium with support S.

Proposition 2

For any integer i with \(0 \le i \le n\) the pure strategy profile defined by both players choosing action \(a_i\) is a pure Nash Equilibrium.

Proof

This follows directly from Theorem 4 by setting \(S = \{a_i\}\). \(\square \)

Proposition 3

For any subset \(S \subseteq \mathcal {A}\) of size 2 there exists a Mixed Strategy Nash Equilibrium with supports \(S_0 = S_1 = S\).

Proof

Assume \(\alpha _0\) plays a strategy with support \(S = \{a_i, a_j\}\). Let \(P_1\) denote the probability of playing \(a_i\) and \(P_2\) the probability of playing \(a_j\). Furthermore, let us define \(A = \mathcal {U}_1(a_i,a_i)\) and \(B = \mathcal {U}_1(a_j,a_j)\), with \(0<A<B\). The question now is whether there exists a solution that makes the following two expressions equal:

$$\begin{aligned} E(\mathcal {U}_1(a_i))= & {} P_1\cdot A + P_2\cdot \frac{1}{2}(A+B)\\ E(\mathcal {U}_1(a_j))= & {} P_1\cdot 0 + P_2\cdot B \end{aligned}$$

with \(P_1 > 0\), \(P_2 > 0\), and \(P_1 + P_2=1\). It is easy to verify that the following solution solves the equation:

$$\begin{aligned} P_1 = \frac{B-A}{A+B} \quad \quad P_2 = \frac{2A}{A+B} \end{aligned}$$

We should still prove that every action in S is also a best response to \(\alpha _1\) playing a strategy with support S, but this goes analogously. \(\square \)
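A quick numeric check of this solution (a sketch, with arbitrary assumed values \(0 < A < B\)):

```python
A, B = 1.0, 3.0
P1, P2 = (B - A) / (A + B), (2 * A) / (A + B)
assert abs(P1 + P2 - 1) < 1e-12 and P1 > 0 and P2 > 0
E_ai = P1 * A + P2 * 0.5 * (A + B)   # expected utility of playing a_i
E_aj = P2 * B                        # expected utility of playing a_j
assert abs(E_ai - E_aj) < 1e-12      # alpha_1 is indifferent, as required
```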

Proposition 4

For any subset \(S \subseteq \mathcal {A}\) of size 3 there exists a Mixed Strategy Nash Equilibrium with supports \(S_0 = S_1 = S\).

Proof

Assume \(\alpha _0\) plays a strategy with support \(S = \{a_i, a_j, a_k\}\). Let \(P_1, P_2\) and \(P_3\) denote their respective probabilities. Furthermore, let us define \(A = \mathcal {U}_1(a_i,a_i)\), \(B = \mathcal {U}_1(a_j,a_j)\) and \(C = \mathcal {U}_1(a_k,a_k)\), with \(0<A<B<C\). We need to equate these three expressions:

$$\begin{aligned} E(\mathcal {U}_1(a_i))= & {} P_1\cdot A + P_2\cdot \frac{1}{2}(A+B) + P_3\cdot \frac{1}{2}(A+C)\\ E(\mathcal {U}_1(a_j))= & {} P_1\cdot 0 + P_2\cdot B + P_3\cdot \frac{1}{2}(B+C)\\ E(\mathcal {U}_1(a_k))= & {} P_1\cdot 0 + P_2\cdot 0 + P_3\cdot C \end{aligned}$$

with all \(P_i\) positive and \(P_1 + P_2 + P_3 = 1\). One can verify that the following is a solution:

$$\begin{aligned} P_1= & {} \frac{CB+B^2 - AC - AB}{AB + AC + B^2 + BC}\\ P_2= & {} \frac{2AC-2AB}{AB + AC + B^2 + BC}\\ P_3= & {} \frac{4AB}{AB + AC + B^2 + BC}\\ \end{aligned}$$

Again, the case that \(\alpha _1\) plays with support S goes analogously. \(\square \)
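The analogous numeric check for the size-3 case (again a sketch, with arbitrary assumed values \(0 < A < B < C\)):

```python
A, B, C = 1.0, 2.0, 4.0
D = A * B + A * C + B * B + B * C            # the common denominator
P1 = (C * B + B * B - A * C - A * B) / D
P2 = (2 * A * C - 2 * A * B) / D
P3 = (4 * A * B) / D
assert abs(P1 + P2 + P3 - 1) < 1e-12 and min(P1, P2, P3) > 0
E_ai = P1 * A + P2 * 0.5 * (A + B) + P3 * 0.5 * (A + C)
E_aj = P2 * B + P3 * 0.5 * (B + C)
E_ak = P3 * C
assert abs(E_ai - E_aj) < 1e-12 and abs(E_aj - E_ak) < 1e-12
```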

We suspect that Propositions 2, 3 and 4 can be generalized to subsets of any size, but we leave this as an open conjecture.

Conjecture 1

Let S be any non-empty subset of \(\mathcal {A}\). Then there exists a Mixed Strategy Nash Equilibrium with supports \(S_0 = S_1 = S\).

Appendix C

In this Appendix we give a complete formalization of the concept of an Extensive-Form Game with Negotiations.

Definition 14

Let \(\vec {\mathcal {A}} = (\mathcal {A}_0, \mathcal {A}_1)\) be some pair of sets of actions. A simple protocol for \(\vec {\mathcal {A}}\), denoted \(Pr^{\vec {\mathcal {A}}}\), is a first-order protocol \(Pr = \langle \vec {\alpha }, \vec {\mathcal {A}}, W, w_0, T, \vec {L}, u, O, out \rangle \) such that:

  • \(W = \{w_0\} \cup T\)

  • \(T = \{t_{a_0,a_1} \mid (a_0,a_1) \in \mathcal {A}_0 \times \mathcal {A}_1\}\)

  • \(L_i(w_0) = \mathcal {A}_i\) for all \(i\in \{0, 1\}\)

  • \(u(w_0, a_0,a_1) = t_{a_0,a_1}\) for all \((a_0,a_1)\in \mathcal {A}_0 \times \mathcal {A}_1\).

  • \(O = \mathcal {A}_0 \times \mathcal {A}_1\)

  • \(out(t_{a_0,a_1}) = (a_0,a_1)\) for all \((a_0,a_1)\in \mathcal {A}_0 \times \mathcal {A}_1\).

A simple protocol is indeed in a certain sense the simplest possible protocol, since it only consists of each player picking one action, each terminal state directly corresponds to the chosen pair of actions, and is also labeled with that same pair of actions.

We will now define the concept of a ‘higher order protocol’, which is essentially a nested protocol in which each state corresponds to a lower-order protocol.

Definition 15

Let n be an integer with \(n>1\). A protocol of order \({\mathbf {n}}\) is a tuple \(Pr = \langle W, w_0, T, \mathcal {P}, O, u, out \rangle \), where:

  • W is a finite set of states.

  • \(w_0 \in W\) is the initial state.

  • \(T \subset W\) is the set of terminal states.

  • \(\mathcal {P}\) is the protocol map that assigns to each non-terminal state \(w \in W\setminus T\) a protocol \(\mathcal {P}(w)\) of order \(n-1\).

  • u is the update function that maps each pair (wo) consisting of a non-terminal state w and an outcome o of the corresponding protocol \(\mathcal {P}(w)\) to a new state \(w' = u(w,o) \in W\).

  • O is the outcome set.

  • out is the outcome function \(out: T \rightarrow O\) that maps each terminal state to an outcome.

Informally, this definition means that if we have a protocol of order 2, then in each state w the agents need to choose a sequence of joint actions according to some first-order protocol \(\mathcal {P}(w)\). The outcome of this first-order protocol will determine the next state of the second-order protocol.

In the following, the notations \(W^G\) and \(u^G\) represent the set of states and the update function of the game G, respectively, and similarly for \(T^G\), \(O^G\), \(out^G\) and \(\vec {U}^G\).

Definition 16

Let G be an extensive-form game. Then we define an Extensive-Form Game with Negotiations \( NG \) over G as a protocol of order 2 together with a pair of utility functions, with the following properties:

  1. The set of non-terminal states of \( NG \) is partitioned into a set of negotiation states and a set of action states: \(W\setminus T = W_{nego} \cup W_{action}\), s.t. \(W_{nego} \cap W_{action} = \emptyset \). For each pair \((w,\vec {\sigma })\) there is one action state, denoted \(act_{w,\vec {\sigma }}\), and one negotiation state, denoted \(neg_{w,\vec {\sigma }}\):

     • \(W_{nego} = \{neg_{w,\vec {\sigma }} \mid w \in W^G \setminus T^G, \quad \vec {\sigma } \in \mathcal {S}^G\}\)

     • \(W_{action} = \{act_{w,\vec {\sigma }} \mid w \in W^G \setminus T^G, \quad \vec {\sigma } \in \mathcal {S}^G\}\)

  2. The initial state of \( NG \) is the negotiation state \(neg_{w_0, \vec {\tau }}\) where \(w_0\) is the initial state of G and \(\vec {\tau }\) is the trivial joint strategy of G.

  3. \(T = T^G\).

  4. \(\mathcal {P}\) assigns to each negotiation state \(neg_{w,\vec {\sigma }}\) a negotiation protocol:

     $$\begin{aligned} \mathcal {P}(neg_{w,\vec {\sigma }}) = N_{w,\vec {\sigma }} \end{aligned}$$

     for which \(Agr = \mathcal {S}^G\).

  5. \(\mathcal {P}\) assigns to each action state \(act_{w,\vec {\sigma }}\) the simple protocol (Definition 14) defined by the actions \(\mathcal {A}_i = \sigma _i(w)\):

     $$\begin{aligned} \mathcal {P}(act_{w,\vec {\sigma }}) = Pr^{\vec {\sigma }(w)} \end{aligned}$$

  6. The update function is defined as follows:

     (a) \(u(neg_{w,\vec {\sigma }}, \vec {\sigma }') = act_{w,\vec {\sigma }'}\)

     (b) \(u(neg_{w,\vec {\sigma }}, \eta ) = act_{w,\vec {\sigma }}\)

     (c) \(u(act_{w,\vec {\sigma }}, a_0,a_1) = neg_{w',\vec {\sigma }}\), where \(w' = u^G(w,a_0,a_1)\), if \(w' \not \in T\)

     (d) \(u(act_{w,\vec {\sigma }}, a_0,a_1) = w'\), where \(w' = u^G(w,a_0,a_1)\), if \(w' \in T\)

  7. \(O = O^G\)

  8. \(out = out^G\)

  9. \(\vec {U} = \vec {U}^G\)

For each action state \(act_{w, \vec {\sigma }}\) or negotiation state \(neg_{w, \vec {\sigma }}\) we call \(\vec {\sigma }\) the standing agreement. Let us now discuss a number of properties of \( NG \):

  • This second-order protocol has three types of states: action states, negotiation states and terminal states (item 1).

  • Every negotiation state is followed by an action state (items 6a and 6b).

  • Every action state is followed by either a negotiation state or a terminal state (items 6c and 6d).

  • In each action state \(act_{w,\vec {\sigma }}\) each agent selects an action from the state w of the game G (item 5).

  • In each action state \(act_{w,\vec {\sigma }}\) the agents must obey the standing agreement, that is: each \(\alpha _i\) must choose its actions from \(\sigma _i(w)\) (item 5).

  • In each negotiation state \(neg_{w,\vec {\sigma }}\) the agents negotiate a joint strategy for G (item 4).

  • When the negotiators agree on some joint strategy \(\vec {\sigma }'\) then that will become the new standing agreement (item 6a).

  • When the negotiators do not come to an agreement, the currently standing agreement remains in force (item 6b).

Intuitively, this means that NG alternates between action states and negotiation states, where in each action state \(act_{w, \vec {\sigma }}\) the players choose some action from the original game G, but under the restriction that they have to obey the earlier agreement \(\vec {\sigma }\), and where in each negotiation state \(neg_{w,\vec {\sigma }}\) the players have the chance to re-negotiate a new agreement.

The initial state of NG is \(neg_{w_0, \vec {\tau }}\), which means that initially, before any agreement has been made, the trivial joint strategy \(\vec {\tau }\) is considered the standing agreement. This is not a restriction, because the trivial joint strategy, by definition, does not pose any restrictions on the agents’ actions at all. In other words, it is equivalent to saying that there is no standing agreement.
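To summarize the control flow of \( NG \), the following pseudocode-style sketch (our own illustration, with an assumed interface for G and for the negotiation protocol, not the authors' implementation) shows the alternation between negotiation states and action states:

```python
def play_NG(game, negotiate, choose_actions):
    """game: the underlying extensive-form game G (assumed interface);
    negotiate(w, sigma) returns a new joint strategy, or None for the
    no-agreement outcome eta; choose_actions(w, sigma) returns a pair of
    legal actions respecting the standing agreement."""
    w = game.initial_state
    sigma = game.trivial_joint_strategy    # standing agreement, initially tau
    while not game.is_terminal(w):
        proposal = negotiate(w, sigma)     # negotiation state neg_{w, sigma}
        if proposal is not None:           # agreement reached: item 6(a)
            sigma = proposal               # otherwise item 6(b): sigma stays
        a0, a1 = choose_actions(w, sigma)  # action state act_{w, sigma},
                                           # with a_i in sigma_i(w) (item 5)
        w = game.update(w, a0, a1)         # items 6(c)/6(d)
    return game.outcome(w)
```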

About this article

Cite this article

de Jonge, D., Zhang, D. Strategic negotiations for extensive-form games. Autonomous Agents and Multi-Agent Systems, 34, 2 (2020). https://doi.org/10.1007/s10458-019-09424-y