1 Introduction

As the use of intelligent agents is becoming increasingly widespread, and those agents are becoming increasingly autonomous, it is to be expected that in the near future it will be common for multiple such agents to be deployed to work together, while representing different stakeholders with different objectives. This would require negotiation between such agents and we therefore expect that automated negotiation will become a more and more prevalent technology in our daily lives.

The field of automated negotiation deals with systems of autonomous agents that each have different objectives, but that need to cooperate to ensure beneficial outcomes. When presented with a problem, such agents propose potential solutions to one another, in the hope of finding one that is acceptable to everyone. This means that, even though the agents are purely self-interested, they each still need to make sure that their proposals are also sufficiently beneficial to the other agents, because otherwise none of those proposals would ever be accepted [2]. A simple example is the scenario of a buyer and a seller that are negotiating the price of a car. While the seller prefers to sell the car for the highest possible price, she has to keep in mind that the price must still be low enough for the buyer to accept the offer.

The main question in automated negotiation is how to make this trade-off, which is commonly known as the bargaining problem. In the case that the agents’ utility functions are common knowledge, this problem has been solved theoretically [3,4,5]. However, the question of how to solve it when the agents do not know each other’s utility functions is a long-standing open problem [6]. While we do not claim to have solved it in general, we do argue in this paper that our recently introduced algorithm called MiCRO [1] solves it under a number of specific conditions.

In the literature, many different negotiation strategies have been proposed that aim to tackle the bargaining problem heuristically. Most of them start by making very selfish proposals and, as time passes, concede by making proposals that yield more and more utility to the opponent. While the agents are not given any knowledge about their opponents’ utility functions, they typically try to infer this information at run-time, from the proposals they receive from their opponents. This is known as opponent modeling. To test such strategies, the annual Automated Negotiating Agent Competition (ANAC) has been organized since 2010. Most agents that were successful in this competition used various machine learning techniques for opponent modeling. However, we have recently shown that many of the ANAC negotiation scenarios can in fact be tackled just as well without opponent modeling [1]. To demonstrate this, we presented a new negotiation strategy, called MiCRO, which does not employ any kind of opponent modeling or machine learning at all, and yet outperformed many of the strongest participants of ANAC. Furthermore, we claimed that, under certain conditions, it is game-theoretically optimal.

The main goal of this paper is to formally prove our claim that MiCRO is game-theoretically optimal on many of the ANAC domains. Specifically, we prove this in four steps:

  1. We define the notion of a consistent negotiation strategy, which is a negotiation strategy that satisfies a number of rationality conditions, and prove that MiCRO is consistent (Sect. 5).

  2. We define the notion of the balance set, which is the set of possible agreements that two MiCRO agents could make with each other (Sect. 6).

  3. We then formally prove that (under a number of mild assumptions that typically hold in ANAC) MiCRO is a best response against itself among all possible consistent negotiation strategies (Sect. 7).

  4. We define the notion of a balanced negotiation domain as a domain where the balance set is a subset of the set of optimal agreements (meaning that two MiCRO agents would make optimal agreements in such a domain) and we show that most of the domains used in ANAC are indeed (approximately) balanced (Sect. 8.2).

In addition, we also make the following contributions:

  • We investigate why so many ANAC domains are balanced, and show that this is related to a certain kind of symmetry in those domains (Sect. 8.3).

  • We propose a number of improvements to MiCRO that help defend it against inconsistent opponents (Sect. 9).

  • We identify a number of research questions about opponent modeling that may be answered with the help of MiCRO (Sect. 10).

We feel, however, that we should stress the following.

Remark 1

MiCRO is not intended to be used as a real-world negotiation algorithm. It is purely intended as a theoretical device that allows us to analyze the complexity of negotiation domains, and as a benchmark algorithm that allows us to assess the strength of other negotiation algorithms.

We argue that our work is important for the following reasons:

  1. The nice theoretical properties of MiCRO in combination with its simplicity make it an ideal benchmark strategy. After all, it does not make much sense to use a highly sophisticated machine learning algorithm if it does not even outperform a much simpler strategy like MiCRO.

  2. The fact that we have identified a set of conditions under which MiCRO is theoretically optimal is useful because it allows us to test for any other negotiation strategy to what extent it is able to achieve similar results under those conditions. In other words, it provides us with an objective ‘optimality test’ for future negotiation algorithms. In Sect. 10 we will present a number of concrete research questions that one may ask for any new negotiation algorithm, and that may be answered thanks to our work.

  3. Knowing under which conditions MiCRO is optimal will be useful for researchers to design more challenging negotiation test cases for which there is no known optimal algorithm.

Finally, we should remark that our original paper [1] contained a number of small errors, so some of the claims we made in that paper were not formulated correctly (see Footnote 1). In this paper we fix those errors. Specifically, we have made a small change in the definition of an ‘inconsistent proposal’, and we have made a small change to the definition of MiCRO itself.

The source code of MiCRO, as well as our code to calculate the balance set and the balance values of a given domain, has been made publicly available at: https://www.iiia.csic.es/~davedejonge/downloads.

2 Related work

The first edition of ANAC was held in 2010, and has since then served as the reference for most research on automated negotiation. The negotiation domains that are used in most editions of this competition still form the most commonly used benchmark data set in the field of automated negotiation.

The first three editions of ANAC focused on basic bilateral negotiations with linear utility functions [7], and in 2013 the option was added for agents to learn from previous negotiation sessions [8]. In 2014 the focus shifted to very large domains with non-linear utility functions [9]. However, from 2015 onward the competition went back to smaller domains and linear utility, but focused on multilateral negotiations [10]. In 2019 and 2020 the focus shifted back to bilateral negotiations, but this time the agents only had partial knowledge about their own utility functions [11]. Since 2021 the agents’ preferences have again been represented by ordinary linear utility functions, and, like in 2013, the option to learn from previous negotiation sessions was re-introduced. Furthermore, since 2017 ANAC has been extended with a number of separate leagues focused on more specific challenges, such as the game of Diplomacy [12], supply chain environments [11], the game of Werewolves [11], and negotiations between agents and humans [13].

Despite the increased attention to more complex scenarios, most work on automated negotiation still seems to be focused on small bilateral scenarios with linear utility functions. For example, the main leagues of ANAC 2019–2023 were all exclusively based on such scenarios. Furthermore, the domains of ANAC 2012–2013 are still widely considered to be the default benchmark for automated negotiation, and have been used in many recent high-level publications. For example, Sengupta et al. [14] used the ANAC 2013 domains to test their work, while Mirzayi et al. [15] used a small selection of domains from both ANAC 2012 and ANAC 2013, and Bakker et al. [16] used similar linear ANAC-style domains.

Even for the basic settings of ANAC 2010–2013, a plethora of different opponent modeling techniques have been applied by the participants. For example, to predict the opponent’s concession strategy, Agent K [17], the winner of ANAC 2010, used an extrapolation algorithm based on the average and standard deviation of the utility the opponent has offered so far. Other participants applied more sophisticated machine learning techniques, such as non-linear regression (IAMhaggler [18]), Gaussian processes (IAMHaggler2011 [19]) or wavelet decomposition and cubic smoothing splines (OMAC [20]). To learn the opponent’s utility function, a commonly used technique is Bayesian learning, which was first proposed for automated negotiation in [21] and later applied by many participants of ANAC, such as FSEGA [22], IAMhaggler [18], and Nice Tit-For-Tat [23]. Several other agents used reinforcement learning for this goal, such as HardHeaded [24] and ValueModelAgent [25]. A large survey of opponent modeling techniques for negotiation can be found in [26].

While the above-mentioned agents all rely on heuristic methods, there is also a large body of work on theoretically optimal solutions for automated negotiation. However, these solutions typically cannot be applied to the ANAC competitions, because they are based on certain assumptions that do not hold in ANAC. In particular, the assumption that both agents know each other’s utility functions has been studied extensively. The most well-known example is the Nash bargaining solution [3]. Nash proved that if both negotiators have full knowledge of each other’s utility functions, and the agreement space forms a convex set, then (under a number of additional assumptions) the outcome of the negotiation would be the agreement that maximizes the product of the agents’ utilities. Rubinstein later showed that if the negotiation takes place over discrete rounds and the utility functions are time-discounted then the optimal strategy indeed leads to Nash’s solution [4]. Much more recently, however, we showed that if the agreement space is discrete, rather than convex, and we assume that the negotiating agent will adopt each ‘side’ of the negotiation (e.g. ‘buyer’ and ‘seller’) equally often, then the optimal agreement is in fact the one that maximizes the sum of the utility functions, rather than the product [5].

Much less is known about optimal negotiation strategies for the case that the opponent’s utility is unknown, although several optimal strategies for this scenario have been proposed as well [27, 28]. However, they do need to assume their agent has an accurate model of the probability that the opponent will accept any given proposal, and it remains unclear to what extent such models can realistically be obtained, even in a controlled environment such as ANAC. Furthermore, they only prove their algorithm is a best response against a specific type of opponent, but it remains unclear whether it is also a best response against itself, and therefore whether or not it can form a Nash equilibrium. Later, some improvements to these strategies were proposed in [29], but they do not resolve the above-mentioned limitations. An optimal strategy for the acceptance of proposals was presented in [30], but their proof of optimality depends on the assumption that the agent cannot make any proposals itself.

To the best of our knowledge, MiCRO is the only known strategy that has provably optimal properties under the conditions of ANAC.

3 Definitions

In this section we introduce the formal definitions and notation that are required to understand the rest of this paper. Please note that we are not claiming that anything presented in this section is novel.

3.1 Negotiation domains

In a classical scenario for automated negotiation, two agents \(\alpha _1\) and \(\alpha _2\) are bargaining to agree on the details of a contract. The agents have a fixed amount of time to make proposals to one another, according to some given negotiation protocol. That is, each agent may propose an offer \(\omega\), from some given set of possible offers \(\Omega\), to the other agent which may then either accept the proposal or reject it and make a counter proposal \(\omega ' \in \Omega\). The agents continue making proposals to each other until either the deadline has passed, or one of the agents accepts a proposal made by the other. Each agent \(\alpha _i\) has a private utility function \(u_i\) that assigns to each offer \(\omega \in \Omega\) a utility value \(u_i(\omega ) \in {\mathbb {R}}\), but which is not known to the other agent. When an offer \(\omega\) gets accepted the agents receive their respective utility values \(u_1(\omega )\) and \(u_2(\omega )\) corresponding to this offer. On the other hand, if the negotiations fail because no proposal is accepted before the deadline, then each agent \(\alpha _i\) receives a fixed utility value \(rv_i \in {\mathbb {R}}\), which is known as its reservation value.

Definition 1

A bilateral negotiation domain \(D\) is a tuple \(\langle \Omega , u_1, u_2, rv_1, rv_2\rangle\) where:

  • \(\Omega\) is a finite set known as the offer space, representing the possible offers the agents can propose to one another.

  • \(u_1\) and \(u_2\) are two utility functions \(u_i: \Omega \rightarrow {\mathbb {R}}\) (one for each agent) which assign to each offer \(\omega \in \Omega\) a utility value \(u_i(\omega ) \in {\mathbb {R}}\).

  • \(rv_1\), \(rv_2 \in {\mathbb {R}}\) are the reservation values of the respective agents.

In the example of a buyer and a seller that bargain over the price of a car, the set of possible offers \(\Omega\) would be the set of prices that one can reasonably expect them to offer or ask.
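To make Definition 1 concrete, the following minimal sketch encodes a domain as a small Python data structure and instantiates it for the car example. The class name `Domain`, the price grid, and the two linear utility functions are assumptions chosen purely for illustration; they do not come from the ANAC domains or from the MiCRO source code.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Domain:
    """Bilateral negotiation domain <Omega, u1, u2, rv1, rv2> (Definition 1)."""
    offers: FrozenSet[int]          # Omega: a finite offer space (here: prices)
    u1: Callable[[int], float]      # utility function of agent 1
    u2: Callable[[int], float]      # utility function of agent 2
    rv1: float                      # reservation value of agent 1
    rv2: float                      # reservation value of agent 2


# Hypothetical car example: prices from 8000 to 12000 in steps of 500.
# Agent 1 is the seller (prefers high prices), agent 2 the buyer.
car = Domain(
    offers=frozenset(range(8000, 12001, 500)),
    u1=lambda price: (price - 8000) / 4000,
    u2=lambda price: (12000 - price) / 4000,
    rv1=0.0,
    rv2=0.0,
)
```

In this illustrative domain the two utilities are exactly opposed, so every price is a trade-off between the agents. Nothing in Definition 1 requires that.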

In the literature many different kinds of negotiation domains have been studied. For example, one can distinguish between so-called ‘single-issue’ domains, ‘multi-issue’ domains, or ‘strategic’ domains [31]. Furthermore, one can distinguish between linear utility functions and non-linear utility functions. However, in this work these distinctions are not relevant. Our work applies to any of these types of domains, as long as the offer space is not too large and for any given offer \(\omega\) the agent can calculate its own utility value \(u_i(\omega )\) efficiently.

Remark 2

In this paper we do not make any assumptions about the type of utility functions (linear or non-linear), and we do not make any assumptions about the structure of the offer space (single-issue or multi-issue), other than that it is a finite set.

Furthermore, it is sometimes assumed in the literature that the utility obtained by the agents also depends on time. That is, when the agents agree on some offer \(\omega\), the utility received by the agents is given by \(\delta ^{t}\cdot u_i(\omega )\) where \(\delta\) is a real number between 0 and 1, called the discount factor and where t is the time at which they make the deal. This means that the later the deal is made, the less utility they receive. In this paper we do not take into account such discount factors. However, we will briefly argue in Sect. 8.4 that this does not matter much because the presence of discount factors only yields even more benefit to MiCRO.

3.2 Negotiation as an extensive-form game

In this section we will formally define automated negotiation as an extensive-form game. To this end we will define the actions of this game (Definition 2), the states of this game (Definition 3), for each state, the set of actions that are legal in that state (Definition 4), and the state-transition function that determines how the actions of the players change the state of the game (Definition 6). One should keep in mind, however, that in this game the players do not know each other’s utility functions.

We will always silently assume there is a fixed set of offers \(\Omega\).

Definition 2

We define a negotiation action to be a tuple

$$\begin{aligned}(i, \eta , \omega , t) \ \ \in \ \ \{1,2\}\times \{{\mathfrak {p}}, {\mathfrak {a}}\} \times \Omega \times {\mathbb {R}}\end{aligned}$$

where i represents the index of the agent performing the action, and \(\eta\) represents the type of the action, which can be either ‘proposal’, represented by the symbol \({\mathfrak {p}}\), or ‘acceptance’, represented by the symbol \({\mathfrak {a}}\). Furthermore, \(\omega\) is the offer that is being proposed or accepted, and t is the time at which the agent proposes or accepts the offer.

So, the notation \((1,{\mathfrak {p}}, \omega , t)\) means that agent \(\alpha _1\) proposes offer \(\omega\) at time t, and the notation \((2,{\mathfrak {a}}, \omega , t)\) means that agent \(\alpha _2\) accepts offer \(\omega\) at time t.

We use \({\mathcal {A}}\) to denote the set of all possible negotiation actions. Furthermore, we use \({\mathcal {A}}_1\) to denote the set of all negotiation actions with \(i=1\), and \({\mathcal {A}}_2\) for the set of all negotiation actions with \(i=2\). That is:

$$\begin{aligned} {\mathcal {A}}_1:= & {} \{1\}\times \{{\mathfrak {p}}, {\mathfrak {a}}\} \times \Omega \times {\mathbb {R}}\\ {\mathcal {A}}_2:= & {} \{2\}\times \{{\mathfrak {p}}, {\mathfrak {a}}\} \times \Omega \times {\mathbb {R}}\\ {\mathcal {A}}:= & {} {\mathcal {A}}_1 \cup {\mathcal {A}}_2 \end{aligned}$$

If \(a= (i, \eta , \omega , t)\) and \(a' = (i', \eta ', \omega ', t')\) are two negotiation actions with \(t<t'\), then we say that \(a\) comes before \(a'\), or that \(a'\) comes after \(a\). We may denote this as \(a \triangleleft a'\).

$$\begin{aligned} (i, \eta , \omega , t) \triangleleft (i', \eta ', \omega ', t') \quad {\mathop {\longleftrightarrow }\limits ^{\text {def}}}\quad t < t' \end{aligned}$$

Definition 3

We define a negotiation state \(s\) to be a sequence of negotiation actions:

$$\begin{aligned}s= (a_1, a_2, \dots , a_k)\end{aligned}$$

sorted in chronological order (i.e. \(a_j \triangleleft a_{j+1}\) for all \(j \in \{1, 2, \dots , k-1\}\)). The integer k can be any arbitrary non-negative integer, including zero. The set of all possible negotiation states is denoted as \({\mathcal {S}}\).

We may write (with slight abuse of notation) \(a\in s\) to indicate that \(a\) is an element of the sequence \(s\), and \(a\not \in s\) to indicate it is not in the sequence. Furthermore, whenever the letter i is used as the index of an agent \(\alpha _i\), we may use the notation \(3-i\) to refer to the index of the opponent. After all, if \(i=1\) then \(3-i = 2\), and if \(i=2\) then \(3-i = 1\). So, \(\alpha _{3-i}\) is indeed the opponent of \(\alpha _i\).
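As an informal illustration of Definitions 2 and 3, actions and states can be sketched in Python as follows. The names `Action`, `comes_before`, `is_state` and `opponent` are hypothetical, offers are represented as plain strings, and the strings `"p"` and `"a"` stand in for the symbols \({\mathfrak {p}}\) and \({\mathfrak {a}}\).

```python
from typing import NamedTuple, Sequence

PROPOSE, ACCEPT = "p", "a"  # stand-ins for the 'proposal' and 'acceptance' symbols


class Action(NamedTuple):
    agent: int    # i in {1, 2}
    kind: str     # eta: PROPOSE or ACCEPT
    offer: str    # omega (offers are plain strings in this sketch)
    time: float   # t


def comes_before(a: Action, b: Action) -> bool:
    """a comes before b iff a takes place strictly earlier than b."""
    return a.time < b.time


def is_state(actions: Sequence[Action]) -> bool:
    """A state is a (possibly empty) sequence in strict chronological order."""
    return all(comes_before(a, b) for a, b in zip(actions, actions[1:]))


def opponent(i: int) -> int:
    """Index of the opponent of agent i, i.e. 3 - i."""
    return 3 - i
```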

A negotiation protocol \(\Pi\) specifies, for any state s whether the negotiations have finished or not and which agreements have been made, and for any state s and agent \(\alpha _i\) which offers that agent may propose or accept. That is, it is a pair of maps \(\Pi = (\Pi _{stat}, \Pi _{ac})\) where \(\Pi _{stat}\) represents the status of the negotiations and \(\Pi _{ac}\) represents the agents’ legal actions.

Specifically, \(\Pi _{stat} : {\mathcal {S}}\rightarrow \Omega \cup \{{\mathfrak {o}},{\mathfrak {f}}\}\), where \({\mathfrak {o}}\) and \({\mathfrak {f}}\) are two abstract symbols, with the following meaning:

  • \(\Pi _{stat}(s) = {\mathfrak {o}}\) means that in state \(s\) the negotiations are ongoing.

  • \(\Pi _{stat}(s) = {\mathfrak {f}}\) means that in state \(s\) the negotiations have failed, i.e. finished without agreement.

  • \(\Pi _{stat}(s) = \omega \in \Omega\), means that in state \(s\) the negotiations have finished with agreement \(\omega\).

Furthermore, \(\Pi _{ac} : {\mathcal {S}}\times \{1,2\} \rightarrow 2^{\mathcal {A}}\) determines the legal actions for the agents. That is, for any state \(s\) and any agent \(\alpha _i\), the legal actions for that agent in that state are given by \(\Pi _{ac}(s, i)\).

Arguably the most commonly used negotiation protocol in the literature is the alternating offers protocol (AOP) [32]. In the AOP, the agents take turns alternately. In each turn, the agent whose turn it is may propose any offer \(\omega\) from the offer space \(\Omega\), or may accept the previous offer proposed by the opponent. Negotiations finish either when the deadline passes, or as soon as one of the agents accepts a proposal. This protocol has three parameters: the offer space \(\Omega\), the deadline \(T\in {\mathbb {R}}\), and the index \(i\in \{1,2\}\) of the agent that is the first to make a proposal.

Definition 4

Let \(\Omega\) be some set of offers, let \(s= (a_1 , a_2, \dots , a_k)\) be any negotiation state of length k, and, in case \(k\ne 0\), let \(a_k = (i_k, \eta _k, \omega _k,t_k)\). Then, the alternating offers protocol with deadline \(T\in {\mathbb {R}}\) and initial agent \(i\in \{1,2\}\), is defined as follows:

$$\begin{aligned} &\text {If }\ k=0, \ \text { then:} &&\Pi _{stat}(s) = {\mathfrak {o}}\\ & &&\Pi _{ac}(s, i) = \{i\}\times \{{\mathfrak {p}}\}\times \Omega \times (0,\infty ) \\ & &&\Pi _{ac}(s, 3-i) = \emptyset \\ &\text {else, if }\ t_k > T, \ \text { then:} &&\Pi _{stat}(s) = {\mathfrak {f}}\\ & &&\Pi _{ac}(s,1) = \Pi _{ac}(s,2) = \emptyset \\ &\text {else, if }\ \eta _k = {\mathfrak {a}}, \ \text { then:} &&\Pi _{stat}(s) = \omega _k \\ & &&\Pi _{ac}(s,1) = \Pi _{ac}(s,2) = \emptyset \\ &\text {else: } &&\Pi _{stat}(s) = {\mathfrak {o}}\\ & &&\Pi _{ac}(s, {\hat{k}}) = \big ( \ \{{\hat{k}}\}\times \{{\mathfrak {p}}\}\times \Omega \times (t_k, \infty ) \ \big ) \ \cup \ \big ( \ \{{\hat{k}}\}\times \{{\mathfrak {a}}\}\times \{\omega _k\} \times (t_k, \infty ) \ \big )\\ & &&\Pi _{ac}(s, 3 - {\hat{k}}) = \emptyset \end{aligned}$$

where:

$$\begin{aligned} {\hat{k}}:= {\left\{ \begin{array}{ll} 1 \ \text { if }\ k+i \ \text { is odd}\\ 2 \ \text { if }\ k+i \ \text { is even}\\ \end{array}\right. } \end{aligned}$$

The first line says that in the initial state, agent \(\alpha _i\) can propose any offer from \(\Omega\). The second line says that once any action has been made after the deadline, the negotiations finish with failure. The third line says that if an agent accepts a proposal before the deadline, then the negotiations finish, with the accepted proposal as the outcome. The fourth line says that, in all other cases, the next agent \(\alpha _{{\hat{k}}}\) may propose any offer from the set \(\Omega\), or may accept the previous proposal.

Strictly speaking, this definition says that, unless a proposal is accepted before the deadline, the negotiations will only have finished once either of the agents has taken an action after the deadline. However, since any action taken after the deadline does not have any influence on the outcome anyway (since we will always have \(\Pi _{stat}(s) = {\mathfrak {f}}\)), we can in practice consider the negotiations to be finished immediately after the deadline has passed. The only reason we formally require some agent to take an action after the deadline, is to simplify the formalization.

In the literature, it is typically also assumed that agents are allowed to end the negotiations without agreement before the deadline. In the above formalization this is also possible, by choosing an action \(({\hat{k}},\eta ,\omega ,t)\) with \(t>T\), which means the agent chooses to wait until the deadline has passed before taking its next action.
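The case distinction of Definition 4 can be sketched in Python as shown below. Because the sets \(\Pi _{ac}(s,i)\) are infinite (time is continuous), this sketch tests membership with a predicate instead of enumerating them. All names (`Status`, `Action`, `next_agent`, `status`, `is_legal`) are hypothetical, and offers are assumed to be plain strings.

```python
from enum import Enum
from typing import Collection, NamedTuple, Sequence, Union

PROPOSE, ACCEPT = "p", "a"


class Status(Enum):
    ONGOING = "o"
    FAILED = "f"


class Action(NamedTuple):
    agent: int
    kind: str
    offer: str
    time: float


def next_agent(k: int, first: int) -> int:
    """k-hat: agent 1 if k + first is odd, agent 2 if it is even."""
    return 1 if (k + first) % 2 == 1 else 2


def status(state: Sequence[Action], deadline: float) -> Union[Status, str]:
    """Pi_stat: ONGOING, FAILED, or the agreed offer."""
    if not state:
        return Status.ONGOING
    last = state[-1]
    if last.time > deadline:      # any action after the deadline means failure
        return Status.FAILED
    if last.kind == ACCEPT:       # acceptance before the deadline ends the game
        return last.offer
    return Status.ONGOING


def is_legal(action: Action, state: Sequence[Action],
             offers: Collection[str], deadline: float, first: int) -> bool:
    """Membership test for Pi_ac(state, action.agent)."""
    if status(state, deadline) is not Status.ONGOING:
        return False                                  # terminal: no legal actions
    if action.agent != next_agent(len(state), first):
        return False                                  # not this agent's turn
    earliest = state[-1].time if state else 0.0
    if action.time <= earliest:
        return False                                  # must lie in (t_k, infinity)
    if action.kind == PROPOSE:
        return action.offer in offers
    # Only the opponent's most recent proposal can be accepted.
    return action.kind == ACCEPT and bool(state) and action.offer == state[-1].offer
```

Note that `is_legal` permits actions with a time later than the deadline, which matches the remark above that an agent can walk away by waiting until the deadline has passed.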

Definition 5

Let \(\Pi\) be the alternating offers protocol. Then, a negotiation strategy \(\sigma\) for agent \(\alpha _i\) is a partial function \(\sigma : {\mathcal {S}}\rightarrow {\mathcal {A}}_i\), such that:

  • if \(\Pi _{ac}(s, i) \ne \emptyset\) then \(\sigma (s) \in \Pi _{ac}(s, i)\)

  • if \(\Pi _{ac}(s, i) = \emptyset\) then \(\sigma (s)\) is undefined.

In other words, the map \(\sigma\) is only defined for those states \(s\) in which it is \(\alpha _i\)’s ‘turn’, according to the AOP.

If \(\sigma (s) = (i, {\mathfrak {p}},\omega , t)\) it means that if the negotiation is in state \(s\) then the agent applying strategy \(\sigma\) waits until time t and then proposes offer \(\omega\). Similarly, if \(\sigma (s) = (i, {\mathfrak {a}},\omega , t)\) it means the agent waits until time t and then accepts offer \(\omega\). The condition \(\sigma (s) \in \Pi _{ac}(s,i)\) simply means that the strategy only selects actions that the protocol allows it to select.

For any extensive-form game, its transition function \(\tau\) defines, for a given state \(s\) and a given action \(a\) the next state \(\tau (s, a)\). We will now define the transition function for our model of negotiation.

Definition 6

The transition function \(\tau\) is a map \({\mathcal {S}}\times {\mathcal {A}}\rightarrow {\mathcal {S}}\), defined as follows. Let \(s= (a_1 , a_2, \dots , a_k)\) be a negotiation state, and let \(a= (i, \eta ,\omega ,t)\) be a negotiation action, then the next state \(s' = \tau (s, a) \in {\mathcal {S}}\) is defined as:

$$\begin{aligned} \tau (s, a):= (a_1, a_2, \dots , a_k, a_{k+1}) \end{aligned}$$

where \(a_{k+1} = (i,\eta ,\omega , t + \epsilon _{k+1})\) and where \(\epsilon _{k+1}\) is a strictly positive real number, drawn randomly from some probability distribution.

In other words, whenever an agent performs an action \(a\), it will cause the original state s to be concatenated with a new action \(a_{k+1}\). However, it is important to note that this action is not exactly the action \(a\) performed by the agent. Instead, if the agent performs action \((i, \eta ,\omega ,t)\), then the state is updated with action \((i,\eta ,\omega ,t + \epsilon _{k+1})\). This represents the fact that when the agent sends a propose or accept message at time t, it will take a small amount of time \(\epsilon _{k+1}\) for that message to arrive, due to network latency. Since in general we do not know exactly how much time this takes, we model \(\epsilon _{k+1}\) as a random variable. The probability distribution function of this random variable is not relevant for this paper. Note that this makes negotiation a non-deterministic extensive-form game. To be precise, we may assume that both agents are connected to a ‘negotiation manager’ over a network. Each time an agent sends a proposal or acceptance, this message has to go to the negotiation manager, and the state of the negotiation is determined by the time at which the negotiation manager receives the message.
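The transition function of Definition 6 can be sketched as follows. Since the distribution of the latency is left unspecified, this sketch simply assumes a uniform delay in the interval \((0, \text{max\_latency}]\); both that choice and the names `transition` and `max_latency` are illustrative assumptions.

```python
import random
from typing import NamedTuple, Tuple


class Action(NamedTuple):
    agent: int
    kind: str
    offer: str
    time: float


def transition(state: Tuple[Action, ...], action: Action,
               rng: random.Random, max_latency: float = 0.01) -> Tuple[Action, ...]:
    """tau(s, a): append the action, delayed by a strictly positive latency."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]:
    # the delay epsilon is therefore strictly positive, as Definition 6 requires.
    epsilon = max_latency * (1.0 - rng.random())
    delayed = action._replace(time=action.time + epsilon)
    return state + (delayed,)
```

Passing the random number generator explicitly keeps the sketch reproducible in tests, even though the game itself is non-deterministic.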

Definition 7

Let \(\Pi\) be the alternating offers protocol, and \(D\) be a negotiation domain, then the extensive-form game \(\Gamma _D\) is defined as follows:

  • The players are the two negotiating agents \(\alpha _1\) and \(\alpha _2\).

  • The players’ sets of possible actions are given by \({\mathcal {A}}_1\) and \({\mathcal {A}}_2\) as in Definition 2.

  • The set of states of the game is given by \({\mathcal {S}}\) as in Definition 3.

  • The initial state of the game is the empty sequence \(() \in {\mathcal {S}}\).

  • For any given state \(s\) and agent \(\alpha _i\), the set of legal actions is given by \(\Pi _{ac}(s, i)\) as in Definition 4.

  • The state transition function is defined as \(\tau\) in Definition 6.

  • The set of terminal states is defined as the set of all states \(s\) for which \(\Pi _{stat}(s) \ne {\mathfrak {o}}\).

  • The players’ utility functions \(U_i\) over the terminal states are given by:

    $$\begin{aligned} U_i(s) = {\left\{ \begin{array}{ll} u_i(\omega ) &{} \text {if }\ \Pi _{stat}(s) = \omega \in \Omega \\ rv_i &{} \text {if }\ \Pi _{stat}(s) = {\mathfrak {f}}\\ \end{array}\right. } \end{aligned}$$

    where \(u_i\) and \(rv_i\) are the utility function and reservation value of the negotiation domain for each agent \(\alpha _i\).
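As a small illustration of the utility functions \(U_i\) over terminal states, the sketch below returns both agents' payoffs for a given outcome. It assumes that a failed negotiation is represented by `None` and an agreement by the agreed offer; the function name and the example utilities are hypothetical.

```python
from typing import Callable, Optional, Tuple


def terminal_utilities(agreement: Optional[str],
                       u1: Callable[[str], float],
                       u2: Callable[[str], float],
                       rv1: float, rv2: float) -> Tuple[float, float]:
    """(U_1, U_2) for a terminal state: utilities of the agreed offer,
    or the reservation values if the negotiation failed (agreement is None)."""
    if agreement is None:
        return (rv1, rv2)
    return (u1(agreement), u2(agreement))


# Hypothetical two-offer domain used only to exercise the function.
u1 = {"x": 0.9, "y": 0.4}.__getitem__
u2 = {"x": 0.2, "y": 0.7}.__getitem__
```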

Problem 1

(The Bargaining Problem) Let D be any negotiation domain and \(\Gamma _D\) the corresponding extensive-form game as defined in Definition 7. What is the optimal strategy for the game \(\Gamma _D\)?

3.3 More notation

We will now introduce some more notation that we will use to state and prove our lemmas and theorems.

Recall that, if \(a= (i,\eta ,\omega ,t)\) and \(a' = (i',\eta ',\omega ',t')\), then \(a\triangleleft a'\) means that \(t<t'\). Similarly, we may use the notation \(a\triangleleft t'\) to denote that the time t at which action \(a\) takes place is before \(t'\), or \(t \triangleleft a'\) to denote that action \(a'\) takes place after time t.

For any state \(s= (a_1, a_2, \dots , a_k)\) we use \(s_{<t}\) to refer to the subsequence of s consisting of all actions \(a\in s\) that take place before t and we use \(s_{>t}\) to denote the subsequence consisting of all actions \(a\in s\) that take place after t. For example, if we have:

$$\begin{aligned} s = ((1,{\mathfrak {p}},\omega , 0.1), (2,{\mathfrak {p}},\omega ', 0.3), (1,{\mathfrak {p}},\omega '', 0.5), (2,{\mathfrak {p}},\omega ''', 0.7), (1,{\mathfrak {a}},\omega ''', 0.9) ) \end{aligned}$$

then:

$$\begin{aligned} s_{<0.6}= & {} ((1,{\mathfrak {p}},\omega , 0.1), (2,{\mathfrak {p}},\omega ', 0.3), (1,{\mathfrak {p}},\omega '', 0.5))\\ s_{>0.6}= & {} ((2,{\mathfrak {p}},\omega ''', 0.7), (1,{\mathfrak {a}},\omega ''', 0.9)). \end{aligned}$$

Formally, if \(s= (a_1, a_2, \dots , a_k)\), then:

$$\begin{aligned} s_{<t}:= & {} {\left\{ \begin{array}{ll} () &{} \text {if }\ \ t \triangleleft a_1 \\ (a_1, a_2, \dots , a_j) &{} \text {if }\ \ a_j \triangleleft t \ \ \wedge \ \ t \triangleleft a_{j+1} \\ s &{} \text {if }\ \ a_k \triangleleft t \end{array}\right. }\\ s_{>t}:= & {} {\left\{ \begin{array}{ll} () &{} \text {if }\ \ a_k \triangleleft t \\ (a_{j+1}, a_{j+2}, \dots , a_k) &{} \text {if } \ \ a_j \triangleleft t \ \ \wedge \ \ t \triangleleft a_{j+1} \\ s &{} \text {if }\ \ t \triangleleft a_1 \end{array}\right. } \end{aligned}$$
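The two subsequences can be computed with a simple filter, as in the following sketch, which reproduces the example above. The names `Action`, `before` and `after` are hypothetical, and the offers \(\omega , \omega ', \omega '', \omega '''\) are written as the strings `"w0"` to `"w3"`.

```python
from typing import NamedTuple, Sequence, Tuple


class Action(NamedTuple):
    agent: int
    kind: str
    offer: str
    time: float


def before(state: Sequence[Action], t: float) -> Tuple[Action, ...]:
    """s_{<t}: all actions of the state that take place before time t."""
    return tuple(a for a in state if a.time < t)


def after(state: Sequence[Action], t: float) -> Tuple[Action, ...]:
    """s_{>t}: all actions of the state that take place after time t."""
    return tuple(a for a in state if a.time > t)


s = (
    Action(1, "p", "w0", 0.1),
    Action(2, "p", "w1", 0.3),
    Action(1, "p", "w2", 0.5),
    Action(2, "p", "w3", 0.7),
    Action(1, "a", "w3", 0.9),
)
```

Because a state is sorted chronologically, filtering preserves the order, so both results are again valid states.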

Definition 8

Let \(s= (a_1, a_2, \dots , a_k)\) be any state. Then we say that an action \(a\) is a reply to another action \(a'\), denoted \(a= reply_s(a')\), if \(a\) follows directly after \(a'\). That is, for any \(j \in \{1,2,\dots , k-1\}\):

$$\begin{aligned}reply_s(a_j):= a_{j+1}\end{aligned}$$

Definition 9

Let \(s\) be any given state. Assuming negotiations follow the AOP, we say that \(\alpha _i\) rejects an offer \(\omega\) at time t, denoted \(rej_s(i,\omega ,t)\), if its opponent \(\alpha _{3-i}\) has proposed \(\omega\) and \(\alpha _i\) replies to it with a new proposal, rather than accepting it.

That is, the predicate \(rej_s(i,\omega ,t)\) holds iff there exist \(\omega ' \in \Omega\) and \(t'\in {\mathbb {R}}\) such that the following two conditions are both satisfied:

  • \((3-i,{\mathfrak {p}},\omega ,t') \in s_{<t}\)

  • \(reply_s((3-i,{\mathfrak {p}},\omega ,t')) = (i,{\mathfrak {p}},\omega ',t)\)
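The predicate \(rej_s(i,\omega ,t)\) can be checked by scanning consecutive pairs of actions, since a reply is by definition the action that directly follows another. The sketch below is one possible reading of Definition 9; the names `Action` and `rejects` are hypothetical and offers are plain strings.

```python
from typing import NamedTuple, Sequence

PROPOSE, ACCEPT = "p", "a"


class Action(NamedTuple):
    agent: int
    kind: str
    offer: str
    time: float


def rejects(state: Sequence[Action], i: int, offer: str, t: float) -> bool:
    """rej_s(i, offer, t): the opponent of agent i proposed `offer` before t,
    and agent i replied to it at time t with a proposal instead of accepting."""
    for previous, reply in zip(state, state[1:]):
        proposed_by_opponent = (previous.agent == 3 - i
                                and previous.kind == PROPOSE
                                and previous.offer == offer
                                and previous.time < t)
        answered_with_proposal = (reply.agent == i
                                  and reply.kind == PROPOSE
                                  and reply.time == t)
        if proposed_by_opponent and answered_with_proposal:
            return True
    return False


s = (
    Action(1, PROPOSE, "w0", 0.1),
    Action(2, PROPOSE, "w1", 0.3),
    Action(1, PROPOSE, "w2", 0.5),
    Action(2, PROPOSE, "w3", 0.7),
    Action(1, ACCEPT, "w3", 0.9),
)
```

In this example state, agent 2 rejects `"w0"` at time 0.3, whereas agent 1 does not reject `"w3"` at time 0.9, because its reply is an acceptance.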

3.4 The MiCRO strategy

We now recall the MiCRO negotiation strategy, which we introduced in [1]. See Algorithm 1 for its implementation in pseudo-code.

Simply stated, MiCRO works as follows: whenever the opponent proposes a new offer, MiCRO also replies with a new offer. This offer is always the one with the highest utility for the agent itself among the offers it has not yet proposed. On the other hand, when the opponent repeats an offer it has already proposed before, then MiCRO also replies with an offer it has already proposed before.

More formally, let \(\alpha _1\) denote an agent that applies the MiCRO strategy, and \(\alpha _2\) its opponent (which may be applying any arbitrary strategy), and let \(K:= |\Omega |\) denote the size of the domain. Before the negotiations begin, our agent \(\alpha _1\) creates a list \((\omega _1, \omega _2, \dots , \omega _K)\) containing all offers in the domain, sorted in order of decreasing utility for itself. That is, \(u_1(\omega _1) \ge u_1(\omega _2) \ge \dots \ge u_1(\omega _K)\). Then, whenever it is \(\alpha _1\)’s turn to make a proposal, it counts how many different offers it has so far received from the opponent (we denote this number by n), and how many different offers it has so far proposed to the opponent (we denote this number by m) (see Footnote 2). If \(m \le n\) then MiCRO will propose \(\omega _{m+1}\). On the other hand, if \(m>n\) then it picks a random integer r such that \(1\le r \le m\) and proposes \(\omega _r\). Of course, it should never propose any offer that is below its reservation value, so in case \(u_1(\omega _{m+1}) < rv_1\), it also just repeats a random previous proposal, even if \(m \le n\) (see Algorithm 1, lines 9–12 and 23).
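The basic proposal rule described in the paragraph above can be sketched as follows. This is only an illustrative sketch and not a transcription of Algorithm 1. The function name `micro_proposal` and its parameters are hypothetical, the list `offers` is assumed to be already sorted by decreasing utility and is 0-based (so `offers[m]` plays the role of \(\omega _{m+1}\)), and the two edge cases that the text does not cover (an exhausted list, and no previous proposal to repeat) are handled by our own assumptions.

```python
import random
from typing import Callable, Optional, Sequence


def micro_proposal(
    offers: Sequence[str],         # sorted by decreasing utility for the agent
    u1: Callable[[str], float],    # the agent's own utility function
    m: int,                        # number of distinct offers proposed so far
    n: int,                        # number of distinct offers received so far
    rv1: float,                    # the agent's reservation value
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the next offer to propose, or None if there is nothing to propose."""
    if rng is None:
        rng = random.Random()
    # Assumption: once the list is exhausted (m == K) no new offer exists.
    has_next = m < len(offers)
    if m <= n and has_next and u1(offers[m]) >= rv1:
        return offers[m]           # concede: propose omega_{m+1}
    if m == 0:
        return None                # assumption: nothing proposed yet, nothing to repeat
    r = rng.randint(1, m)          # random integer r with 1 <= r <= m
    return offers[r - 1]           # repeat omega_r
```

With three offers of utility 1.0, 0.6 and 0.2, a call with \(m = n = 1\) returns the second offer, while a call with \(m = 2\), \(n = 1\) returns one of the first two offers at random.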

The intuition behind MiCRO is that it is a kind of Tit-for-Tat (TFT) strategy [23] that does not use any knowledge about the opponent’s utility function. That is, MiCRO tries to make the same number of concessions as the opponent, but it does not care how large the opponent’s concessions are. After all, since the opponent’s utility is unknown, the size of the opponent’s concession as perceived by MiCRO says nothing about the size of the concession the opponent intended to make. The opponent might make a large concession in terms of its own utility, but this may result in a very small concession measured in our agent’s utility. For the same reason MiCRO never makes large concessions to its opponent. In fact, it always makes exactly the smallest possible concession: it just proposes the next offer from its list. Another difference between MiCRO and classic TFT is that MiCRO uses a different definition of ‘concession’. That is, even if the opponent’s new proposal offers less utility to MiCRO than the opponent’s previous proposal, MiCRO still considers this a concession, as long as it is different from any of the opponent’s previous offers. After all, if the opponent makes offers in order of decreasing utility for itself, then every proposal is indeed a concession from its own point of view.

For the rest of this paper we will assume a small adaptation to the original definition of MiCRO. That is, suppose, as above, that \(m\le n\). Then, instead of directly proposing \(\omega _{m+1}\), it will first do the following:

  1. If \(\omega _{m+1}\) was already proposed earlier by the opponent, then propose (or accept) \(\omega _{m+1}\).

  2. Otherwise, check if there exists any other offer \(\omega\) such that \(u_1(\omega ) = u_1(\omega _{m+1})\) and such that \(\omega\) has already been proposed by the opponent, but not yet by our agent. If that is the case, then swap \(\omega\) and \(\omega _{m+1}\) on the list. Then, propose (or accept) \(\omega\).

This is also displayed in Algorithm 1, lines 13–22. Note that, since the two offers that are swapped have the same utility, the list still remains sorted in order of decreasing utility.
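The tie-breaking adaptation can be illustrated with the following sketch. It is not a copy of Algorithm 1: the function name `tie_break_candidate` and the sets `received` and `proposed` are hypothetical, and the list is 0-based, so position `m` holds \(\omega _{m+1}\). The sketch relies on the list being sorted, so that all offers with the same utility as \(\omega _{m+1}\) that have not yet been proposed are located directly behind it.

```python
from typing import Callable, List, Set


def tie_break_candidate(
    offers: List[str],             # sorted by decreasing utility; may be reordered in place
    u1: Callable[[str], float],    # the agent's own utility function
    m: int,                        # number of distinct offers proposed so far
    received: Set[str],            # offers already proposed by the opponent
    proposed: Set[str],            # offers already proposed by the agent itself
) -> str:
    """Return the offer to propose (or accept) instead of blindly taking offers[m]."""
    candidate = offers[m]
    if candidate in received:
        return candidate           # step 1: the opponent already proposed it
    for idx in range(m + 1, len(offers)):
        if u1(offers[idx]) != u1(candidate):
            break                  # sorted list: no further offers of equal utility
        if offers[idx] in received and offers[idx] not in proposed:
            # step 2: swap two offers of equal utility; the list stays sorted
            offers[m], offers[idx] = offers[idx], offers[m]
            return offers[m]
    return candidate
```

If, say, the second and third offers on the list have equal utility and the opponent has already proposed the third one, the two are swapped and the agent proposes (or accepts) the offer that the opponent has already put on the table.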

While MiCRO can be combined with various acceptance conditions, in this paper we will always assume it accepts a received offer if and only if it is better than or equal to the lowest offer it is, at that time, willing to propose. More precisely, if agent \(\alpha _1\) applies MiCRO and we define:

$$\begin{aligned} \omega _{low} := {\left\{ \begin{array}{ll} \omega _{m+1} &{} \text {if }\ m \le n \\ \omega _m &{} \text {if }\ m > n \end{array}\right. } \end{aligned}$$
(1)

(with m and n defined as before) then a received offer \(\omega\) is accepted by \(\alpha _1\) iff \(u_1(\omega ) \ge \max \{ u_1(\omega _{low}), rv_1\}\) . See also Algorithm 1, lines 1–8.
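The acceptance condition based on Eq. (1) can be sketched in a few lines. Again this is an illustration under our own assumptions: the name `micro_accepts` is hypothetical, the list is 0-based, and when the list is exhausted (\(m = K\)) we clamp \(\omega _{low}\) to the last offer, a case the text does not discuss.

```python
from typing import Callable, Sequence


def micro_accepts(
    offers: Sequence[str],         # sorted by decreasing utility for the agent
    u1: Callable[[str], float],    # the agent's own utility function
    m: int,                        # number of distinct offers proposed so far
    n: int,                        # number of distinct offers received so far
    rv1: float,                    # the agent's reservation value
    received: str,                 # the offer that was just received
) -> bool:
    """Accept iff u1(received) >= max(u1(omega_low), rv1), with omega_low as in Eq. (1)."""
    if m <= n:
        low_index = min(m, len(offers) - 1)   # omega_{m+1}, clamped (assumption)
    else:
        low_index = m - 1                     # omega_m
    threshold = max(u1(offers[low_index]), rv1)
    return u1(received) >= threshold
```

Note that the reservation value acts as a floor on the threshold: even an offer that is as good as \(\omega _{low}\) is refused if its utility lies below \(rv_1\).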

One might think that MiCRO is very slow, because it makes concessions of minimal size. However, in practice it turns out that the opposite is true: it is very fast because it does not have to update any opponent models. Furthermore, the time it takes to sort all the offers in the offer space turned out to be negligible in our experiments in [1].

4 Experimental results

The focus of this paper is purely on the theoretical properties of MiCRO, because the experimental evidence of its strength has already been presented in other work. Nevertheless, we think it is useful to briefly summarize those experimental results here.

In our paper that introduced MiCRO [1], we presented several experiments in which MiCRO competed against top agents from the ANAC competitions of 2012, 2013, 2018, and 2019. It was shown that MiCRO consistently outperformed each of those agents, both in terms of a tournament evaluation and in terms of an empirical game-theoretical evaluation.

Algorithm 1 Describes how MiCRO decides which offer to propose or accept (note that here offers[m] corresponds to \(\omega _{m+1}\) in the text).

Furthermore, MiCRO was submitted to the ANAC 2022 competition. While it only ended in 9th place out of 19 participants, it was later shown [33] that it was in fact the best participant from a game-theoretical point of view. That is, it was shown to form the best empirical Nash equilibrium among all strategies submitted. The reason that MiCRO still ended in a relatively low position was that it did not perform well against lower-ranked agents. However, MiCRO clearly outperformed the top agents when looking only at direct confrontations with those top agents. This is consistent with the notion of game-theoretical optimality, which assumes that opponents are rational and therefore do not choose a weaker strategy.

Finally, MiCRO was submitted again to ANAC in 2023 and ended in second place, out of 15 participants.

5 Consistent negotiation strategies

In this section we present the formal definition of a consistent negotiation strategy.

5.1 Motivation

Ideally, we would like to show that MiCRO is a best response against itself, among all possible negotiation strategies, but unfortunately this is not true. As we will see in Sect. 9, there are strategies that do form a better response against MiCRO, but, as far as we can tell, such strategies always seem to require detailed knowledge of the opponent’s utility function. Since the field of automated negotiation typically assumes that such knowledge is not available we could dismiss such strategies, and instead only try to show that MiCRO is a best response against itself among all strategies that do not require such knowledge. However, it turns out that this is very difficult to formalize. Instead, we will do something else. We will prove that MiCRO is a best response against itself, among all possible consistent strategies, which we define below.

Our justification for focusing only on consistent strategies is that any rational agent would normally be consistent. The only reason to follow an inconsistent strategy would be to behave irrationally on purpose, in order to mislead and exploit an opponent that assumes that you are rational. We will show an example of such an inconsistent strategy in Sect. 9. Furthermore, we argue that such inconsistent strategies only work if they have precise knowledge of the opponent’s utility function and strategy. Moreover, we will show at the end of Sect. 7 that the assumption that the opponents are consistent can be weakened, and in Sect. 9 we will show that, with some small adaptations to MiCRO, it can be weakened even further.

Remark 3

We argue informally that, without detailed knowledge of the opponent’s utility function and strategy, a rational agent would not have any reason to follow an inconsistent strategy (except perhaps, when negotiations are very close to the deadline).

We will not attempt to prove or even formalize this claim. We simply leave it up to the readers themselves to judge whether this is a reasonable belief or not. Of course, the formal claims we make in this paper do not depend on this belief.

The main advantage of focusing only on consistent strategies is that we do not have to make any assumptions about the knowledge that agents have about each other’s utility functions.

Remark 4

All theorems and lemmas in this paper hold regardless of whether or not agents have knowledge about each other’s utility functions.

5.2 Definition of consistency

In the following, we will first define the notions of an inconsistent proposal (Definition 11), an inconsistent acceptance (Definition 12), and an inconsistent rejection (Definition 13), and then we define a consistent strategy (Definition 14) as one that never takes any such inconsistent actions. Since these definitions are somewhat involved, we strongly recommend that the reader look carefully at the examples we give before each of them, so as to become convinced that it would indeed be irrational to follow an inconsistent strategy.

Example 1

Suppose a buyer and a seller are negotiating the price of some item. Initially, the seller asks a price of $100, while the buyer offers only $75. Then, at some later time t, the seller decides to drop his price and ask $50. Clearly, this would be silly, since the buyer has already indicated she is willing to pay $75. Obviously, any rational seller would accept the offer of $75, rather than conceding to $50. Of course, it may happen that at that point the buyer’s offer of $75 is no longer valid (e.g. because the protocol does not allow accepting offers from earlier rounds), but even then the seller should at least try to re-propose the offer of $75 first, before dropping to $50.

We say that at time t, the proposal of $75 has ‘strict priority’ for the seller over the proposal of $50, and since the seller proposes the offer of $50 anyway, even though there exists another offer that has strict priority over it, we say he is making an ‘inconsistent proposal’.

For a more detailed example, see Table 1.

Table 1 Example of an inconsistent proposal

These concepts are formalized in the following two definitions.

Definition 10

Let \(s\) be a negotiation state, let \(t \in {\mathbb {R}}^+\) be any time, and let \(\omega\) and \(\omega '\) be any two different offers. Then we say that \(\omega '\) has strict priority over \(\omega\), for agent \(\alpha _i\) at time t, if the following three conditions all hold:

  1. For agent \(\alpha _i\), the offer \(\omega '\) is better than \(\omega\):    \(u_i(\omega ') > u_i(\omega )\)

  2. Its opponent already proposed \(\omega '\) at some earlier time \(t'\):

    \(\exists t': (3-i, {\mathfrak {p}}, \omega ', t') \in s_{<t}\)

  3. \(\alpha _i\) itself has not yet re-proposed \(\omega '\) between \(t'\) and t:

    \(\not \exists t'' \in [t',t]: (i, {\mathfrak {p}}, \omega ', t'') \in s_{<t}\)

Or, if the following four conditions all hold:

  4. Agent \(\alpha _i\) is indifferent between \(\omega\) and \(\omega '\):    \(u_i(\omega ') = u_i(\omega )\)

  5. Its opponent already proposed \(\omega '\) at some earlier time \(t'\):

    \(\exists t': (3-i, {\mathfrak {p}}, \omega ', t') \in s_{<t}\)

  6. \(\alpha _i\) itself has not yet re-proposed \(\omega '\) between \(t'\) and t:

    \(\not \exists t'' \in [t',t]: (i, {\mathfrak {p}}, \omega ', t'') \in s_{<t}\)

  7. Its opponent has not yet proposed \(\omega\) before time t:

    \(\not \exists t': (3-i, {\mathfrak {p}}, \omega , t') \in s_{<t}\)

(note that conditions 2 and 3 are identical to conditions 5 and 6).

In other words, for any reasonable agent \(\alpha _i\), if the offer \(\omega '\) is better than \(\omega\), and \(\omega '\) has already been proposed by the opponent, then \(\alpha _i\) would first try to re-propose \(\omega '\), before proposing \(\omega\). Furthermore, if \(\alpha _i\) is indifferent between the two offers, and \(\omega '\) has already been proposed by the opponent, while \(\omega\) was not, then \(\alpha _i\) would also first try to re-propose \(\omega '\).

We say that an agent \(\alpha _i\) makes an inconsistent proposal, if it proposes some offer \(\omega\), while at that moment some other offer \(\omega '\) has strict priority over \(\omega\) for \(\alpha _i\).

Definition 11

Let \(s\) be a negotiation state. Then, we say that \(\alpha _i\) makes an inconsistent proposal iff there exist \(\omega\), \(\omega ' \in \Omega\) and \(t\in {\mathbb {R}}\) such that the following two conditions both hold:

  • \((i, {\mathfrak {p}}, \omega , t) \in s\)

  • At time t offer \(\omega '\) has strict priority over \(\omega\) for agent \(\alpha _i\) (see Definition 10).

Note that in [1] we used a slightly different definition of ‘inconsistent proposal’. Specifically, we did not include Condition 7 of Definition 10. This was an error, since in that case it could happen that \(\omega\) has strict priority over \(\omega '\), while at the same time \(\omega '\) also has strict priority over \(\omega\), which would make it impossible to be consistent. Therefore, the definition given here should be considered the correct one.
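Definitions 10 and 11 can be checked mechanically on a recorded negotiation. The sketch below is an illustration under our own assumptions: actions are encoded as tuples (agent, kind, offer, time) with kind "p" for proposals, \(s_{<t}\) is read as "strictly earlier than t", and the names `Action`, `has_strict_priority` and `makes_inconsistent_proposal` are hypothetical.

```python
from typing import Callable, Hashable, List, NamedTuple, Sequence


class Action(NamedTuple):
    agent: int       # 1 or 2
    kind: str        # "p" = proposal, "a" = acceptance (assumed encoding)
    offer: Hashable
    time: float


def has_strict_priority(state: List[Action], u_i: Callable, i: int,
                        preferred, other, t: float) -> bool:
    """True iff `preferred` (omega') has strict priority over `other` (omega)
    for agent i at time t, following Conditions 1-7 of Definition 10."""
    if preferred == other:
        return False
    opponent = 3 - i
    past = [a for a in state if a.time < t]
    opp_times = [a.time for a in past
                 if a.agent == opponent and a.kind == "p" and a.offer == preferred]
    own_times = [a.time for a in past
                 if a.agent == i and a.kind == "p" and a.offer == preferred]
    # Conditions 2-3 (= 5-6): some opponent proposal of omega' at time t'
    # has not been followed by a re-proposal of omega' by agent i.
    pending = any(all(tpp < tp for tpp in own_times) for tp in opp_times)
    if not pending:
        return False
    if u_i(preferred) > u_i(other):          # Condition 1
        return True
    if u_i(preferred) == u_i(other):         # Condition 4
        # Condition 7: the opponent has not yet proposed omega itself
        return not any(a.agent == opponent and a.kind == "p" and a.offer == other
                       for a in past)
    return False


def makes_inconsistent_proposal(state: List[Action], u_i: Callable, i: int,
                                offers: Sequence) -> bool:
    """Definition 11: agent i proposes some omega while another offer has strict priority."""
    return any(a.agent == i and a.kind == "p"
               and any(has_strict_priority(state, u_i, i, w, a.offer, a.time)
                       for w in offers)
               for a in state)
```

Applied to Example 1, with the seller as agent 1 and the price as his utility, the sequence "seller asks 100, buyer offers 75, seller asks 50" is flagged as containing an inconsistent proposal, because 75 has strict priority over 50 at the time of the last proposal.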

Example 2

Suppose again that the seller initially rejects an offer of $75 from the buyer, but this time it is the buyer that later drops her price and offers $50. If the seller accepts this offer it would be inconsistent, because if he is willing to accept $50, then he should have certainly accepted the offer of $75. Therefore, we say this acceptance was an ‘inconsistent acceptance’, unless the original offer of $75 is no longer available and the seller has already tried to re-propose that offer himself, without success.

For a more detailed example, see Table 2.

Table 2 Example of an inconsistent acceptance

In general, an agent \(\alpha _i\) makes an inconsistent acceptance if it accepts an offer \(\omega\) after rejecting a strictly better offer \(\omega '\), unless \(\alpha _i\) itself has already re-proposed \(\omega '\).

Definition 12

For some given state \(s\), an agent \(\alpha _i\) makes an inconsistent acceptance iff there exist \(\omega\), \(\omega ' \in \Omega\) and \(t, t'\in {\mathbb {R}}\) such that all of the following conditions hold:

  • Agent \(\alpha _i\) accepts offer \(\omega\), at time t:    \((i, {\mathfrak {a}}, \omega , t) \in s\)

  • Agent \(\alpha _i\) prefers \(\omega '\) over \(\omega\):    \(u_i(\omega ') > u_i(\omega )\)

  • The opponent has already proposed \(\omega '\), at time \(t'\) before time t:

    $$\begin{aligned}(3-i, {\mathfrak {p}}, \omega ', t') \in s_{<t}\end{aligned}$$
  • Between the time \(t'\) that the opponent proposed \(\omega '\) and the time t that \(\alpha _i\) accepted \(\omega\), agent \(\alpha _i\) did not try to re-propose \(\omega '\):

    $$\begin{aligned}\not \exists t'': \quad t'< t'' < t \quad \wedge \quad (i, {\mathfrak {p}}, \omega ', t'') \in s\end{aligned}$$
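The four conditions of Definition 12 translate directly into a check over a recorded state. As before, this is only a sketch under assumed conventions: actions are tuples (agent, kind, offer, time) with kind "p" for proposals and "a" for acceptances, and the names `Action` and `makes_inconsistent_acceptance` are hypothetical.

```python
from typing import Callable, Hashable, List, NamedTuple


class Action(NamedTuple):
    agent: int       # 1 or 2
    kind: str        # "p" = proposal, "a" = acceptance (assumed encoding)
    offer: Hashable
    time: float


def makes_inconsistent_acceptance(state: List[Action], u_i: Callable, i: int) -> bool:
    """Definition 12: agent i accepts omega although the opponent earlier proposed
    a strictly better omega' that agent i never re-proposed in between."""
    opponent = 3 - i
    for acc in state:
        if acc.agent != i or acc.kind != "a":
            continue
        for prev in state:
            better_earlier_offer = (
                prev.agent == opponent and prev.kind == "p"
                and prev.time < acc.time
                and u_i(prev.offer) > u_i(acc.offer)
            )
            if not better_earlier_offer:
                continue
            reproposed = any(
                a.agent == i and a.kind == "p" and a.offer == prev.offer
                and prev.time < a.time < acc.time
                for a in state
            )
            if not reproposed:
                return True
    return False
```

In Example 2 (seller as agent 1, price as utility), accepting 50 after having let an offer of 75 pass is flagged, unless the seller re-proposed 75 himself before accepting 50.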

Example 3

Suppose that at some point, the seller proposes a price of $80, but then later he rejects a better offer of $120 from the buyer. We call this an ‘inconsistent rejection’.

For a more detailed example, see Table 3.

Table 3 Example of an inconsistent rejection

In general, we say an agent \(\alpha _i\) makes an inconsistent rejection whenever it rejects an offer \(\omega\) that is better than or equal to some offer \(\omega '\) that \(\alpha _i\) itself has already proposed earlier.

Definition 13

We say that, for some given state \(s\), \(\alpha _i\) makes an inconsistent rejection iff there exist \(\omega ,\omega ' \in \Omega\) (possibly \(\omega = \omega '\)), and \(t\in {\mathbb {R}}\) such that all of the following conditions hold:

  • Agent \(\alpha _i\) rejects \(\omega\) at time t (see Definition 9):    \(rej_s(i,\omega ,t)\)

  • Agent \(\alpha _i\) proposed offer \(\omega '\) at some earlier time:    \(\exists t': (i, {\mathfrak {p}}, \omega ', t') \in s_{<t}\)

  • For \(\alpha _i\), offer \(\omega\) is better than or equal to \(\omega '\):    \(u_i(\omega ) \ge u_i(\omega ')\)
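Definition 13 can likewise be sketched as a check over a recorded state. The encoding is again our own assumption: actions are tuples (agent, kind, offer, time), the state is a chronologically ordered list, and a rejection in the sense of Definition 9 is detected as an opponent proposal that is directly followed by a proposal of agent i. The names `Action` and `makes_inconsistent_rejection` are hypothetical.

```python
from typing import Callable, Hashable, List, NamedTuple


class Action(NamedTuple):
    agent: int       # 1 or 2
    kind: str        # "p" = proposal, "a" = acceptance (assumed encoding)
    offer: Hashable
    time: float


def makes_inconsistent_rejection(state: List[Action], u_i: Callable, i: int) -> bool:
    """Definition 13: agent i rejects omega although it earlier proposed some
    omega' with u_i(omega) >= u_i(omega')."""
    opponent = 3 - i
    for received, answer in zip(state, state[1:]):
        is_rejection = (received.agent == opponent and received.kind == "p"
                        and answer.agent == i and answer.kind == "p")
        if not is_rejection:
            continue
        t = answer.time   # the time at which the rejection takes place
        for a in state:
            if (a.agent == i and a.kind == "p" and a.time < t
                    and u_i(received.offer) >= u_i(a.offer)):
                return True
    return False
```

For Example 3 (seller as agent 1, price as utility), the sequence "seller asks 80, buyer offers 120, seller asks 150" is flagged, because the rejected offer of 120 is at least as good for the seller as his own earlier proposal of 80.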

Definition 14

We say an agent or a negotiation strategy is consistent if it never makes any inconsistent proposals, inconsistent acceptances, or inconsistent rejections.

While we think it should be obvious that inconsistent proposals and acceptances do not make sense for a rational agent, it may be less obvious for inconsistent rejections. After all, it is possible that an agent \(\alpha _1\) proposes some offer \(\omega\), but later, thanks to some opponent modeling algorithm, learns that its opponent \(\alpha _2\) may be willing to accept certain offers that \(\alpha _1\) prefers over \(\omega\). Therefore, \(\alpha _1\) could change its mind and refuse to accept \(\omega\). However, this would mean that \(\alpha _1\) plays a rather weak strategy, because it means that when \(\alpha _1\) proposed \(\omega\), it made a concession that was too large, too early.

The following theorem is proved in “Appendix A”.

Theorem 1

MiCRO is consistent.

5.3 Useful Lemmas

We will now prove two lemmas that will be important in the rest of the paper.

The following lemma plays a key role in the proofs of Theorems 2 and 3.

Lemma 1

Suppose that \(\alpha _1\) applies MiCRO and that \(\alpha _2\) is consistent. Furthermore, let \(s\) denote the final state of a negotiation (under the AOP) between the two agents, and suppose there are three offers \(\omega _i\), \(\omega _j\), and \(\omega _k\) (with \(\omega _j \ne \omega _k\)) such that the following conditions all hold:

  1. \(u_1(\omega _i) > u_1(\omega _k)\)

  2. \(u_2(\omega _i) \ge u_2(\omega _j)\)

  3. At some point during the negotiations \(\alpha _2\) has proposed \(\omega _j\), i.e.: \(\exists t: (2, {\mathfrak {p}}, \omega _j, t)\in s\).

Then the negotiations did not end with \(\omega _k\) as the accepted offer (i.e.: \(\Pi _{stat}(s) \ne \omega _k\)).

Proof

Suppose the contrary, i.e. that \(\omega _k\) is the accepted offer. Let us define \(t_{acc}\) to be the time at which \(\omega _k\) was accepted. We know that before that, \(\alpha _1\) must have already proposed \(\omega _i\) (by Condition 1 and the definition of MiCRO), and that \(\alpha _2\) must have already proposed \(\omega _j\) (by Condition 3). In other words, we know that there are two numbers, t and \(t'\), such that:

  • \((1,{\mathfrak {p}}, \omega _i, t')\in s_{<t_{acc}}\)

  • \((2, {\mathfrak {p}}, \omega _j, t)\in s_{<t_{acc}}\)

This means we can consider two separate cases, namely the case that \(t<t'\) and the case that \(t'<t\) (note that \(t=t'\) is impossible, because the AOP does not allow two agents to make a proposal at the same time). We can alternatively denote these two cases as \((1, {\mathfrak {p}}, \omega _i, t')\in s_{>t}\) and \((1, {\mathfrak {p}}, \omega _i, t') \in s_{<t}\). Now, the second case can be further split up into two subcases 2a and 2b, depending on whether or not \(\alpha _2\) has proposed \(\omega _i\) before t. Then, case 2b can be split up again into two subcases 2b1 and 2b2, depending on whether \(u_2(\omega _i)\) is greater than \(u_2(\omega _j)\), or equal. And finally, case 2b2 can again be split up into two smaller cases 2b2a and 2b2b, depending on whether or not \(\alpha _1\) has proposed \(\omega _j\) before t. So, we now have five separate cases, and we will show for each of them that there is a contradiction.

Case 1: \((1, {\mathfrak {p}}, \omega _i, t')\in s_{>t}\)

In this case, if \(\alpha _2\) replies to \((1, {\mathfrak {p}}, \omega _i, t')\) by accepting \(\omega _i\) we would have a contradiction because the assumption was that the negotiations end with \(\omega _k\) as the accepted offer. On the other hand, if \(\alpha _2\) does not accept \(\omega _i\), then this would be an inconsistent rejection (since \(u_2(\omega _i) \ge u_2(\omega _j)\) and \(\alpha _2\) has already proposed \(\omega _j\) at time t), so again we have a contradiction because \(\alpha _2\) was supposed to be consistent. A third possibility would be that \(\alpha _2\) does not reply to \((1, {\mathfrak {p}}, \omega _i, t')\) at all, because the deadline passes, but that again would be in contradiction with the assumption that the agents agree on \(\omega _k\).

Case 2a: \((1, {\mathfrak {p}}, \omega _i, t') \in s_{<t} \quad \wedge \quad \exists t'' : (2, {\mathfrak {p}}, \omega _i, t'')\in s_{<t}\)

In this case, note that at time t both agents have previously proposed \(\omega _i\). Clearly, if either of them accepted the offer, then we would have a contradiction with the assumption that \(\omega _k\) was going to be the accepted offer. However, if neither of them accepted it, then it means that one of them is rejecting an offer he already proposed himself before, which is an inconsistent rejection (this is a special case of Definition 13 with \(\omega _i = \omega = \omega '\)).

Case 2b1: \((1, {\mathfrak {p}}, \omega _i,t') \in s_{<t} \quad \wedge \quad \not \exists t'' : (2, {\mathfrak {p}}, \omega _i, t'')\in s_{<t} \quad \wedge \quad u_2(\omega _i) > u_2(\omega _j)\)

In this case, we have that \(\omega _i\) has strict priority over \(\omega _j\) for \(\alpha _2\) at time t, so \((2, {\mathfrak {p}}, \omega _j,t)\) was an inconsistent proposal.

Case 2b2a: \((1, {\mathfrak {p}}, \omega _i,t') \in s_{<t} \quad \wedge \quad \not \exists t'' : (2, {\mathfrak {p}}, \omega _i, t'') \in s_{<t} \quad \wedge \quad u_2(\omega _i) = u_2(\omega _j) \quad \wedge \quad \not \exists t'' : (1, {\mathfrak {p}}, \omega _j, t'')\in s_{<t}\)

In this case, again, we have that \(\omega _i\) has strict priority over \(\omega _j\) for \(\alpha _2\) at time t, so \((2, {\mathfrak {p}}, \omega _j,t)\) was an inconsistent proposal.

Case 2b2b: \((1, {\mathfrak {p}}, \omega _i,t') \in s_{<t} \quad \wedge \quad \not \exists t'' : (2, {\mathfrak {p}}, \omega _i, t'')\in s_{<t} \quad \wedge \quad u_2(\omega _i) = u_2(\omega _j) \quad \wedge \quad \quad \exists t'' : (1, {\mathfrak {p}}, \omega _j, t'') \in s_{<t}\)

In this case, \(\alpha _1\) itself has already proposed \(\omega _j\) before \(\alpha _2\) proposed it at time t. This means that \(\alpha _1\) would then reply to \(\alpha _2\)’s proposal by accepting \(\omega _j\) (by definition of MiCRO). However, since we have assumed that the negotiations end with agreement \(\omega _k\), and that \(\omega _j \ne \omega _k\), we again have a contradiction. \(\square\)

The next lemma is a small variation of Lemma 1, and will be useful in our proof of Proposition 1. Compared to Lemma 1 we drop the assumption that \(\omega _j \ne \omega _k\), but at the expense of the stronger condition \(u_2(\omega _i) > u_2(\omega _j)\).

Lemma 2

Suppose that \(\alpha _1\) applies MiCRO and that \(\alpha _2\) is consistent. Furthermore, let \(s\) denote the final state of a negotiation (under the AOP) between the two agents, and suppose there are three offers \(\omega _i\), \(\omega _j\), and \(\omega _k\) (possibly with \(\omega _j = \omega _k)\) such that the following conditions all hold:

  1. \(u_1(\omega _i) > u_1(\omega _k)\)

  2. \(u_2(\omega _i) > u_2(\omega _j)\)

  3. At some point during the negotiations \(\alpha _2\) has proposed \(\omega _j\), i.e.:

    \(\exists t: (2, {\mathfrak {p}}, \omega _j, t)\in s\).

Then the negotiations did not end with \(\omega _k\) as the accepted offer (i.e.: \(\Pi _{stat}(s) \ne \omega _k\)).

Proof

To prove this we simply refer to the proof of Lemma 1. The main difference is that we now have to consider that \(\omega _j\) and \(\omega _k\) could be equal. However, in the proof of Lemma 1 the fact that they were different only played a role in Case 2b2b, and since we are now assuming that \(u_2(\omega _i) > u_2(\omega _j)\), this case no longer applies. \(\square\)

6 The balance set

In this section we present the notion of the balance set. For any negotiation domain, the balance set is a subset of its offer space \(\Omega\). It is important because it is the set of possible outcomes that two agents may agree upon if they both apply the MiCRO strategy (which we prove at the end of this section).

We will later show (in Sect. 8) that in many of the ANAC domains the balance set happens to coincide with the set of optimal agreements, which means that in such domains two MiCRO agents would always negotiate an optimal deal.

6.1 Definition of the balance set

In the following, for any real number x we define \(\Omega _i^x\) to be the set of all offers for which \(u_i(\omega )\ge x\).

$$\begin{aligned}\Omega _i^x:= \{\omega \in \Omega \mid u_i(\omega )\ge x\}\end{aligned}$$

Note that the set of offers that MiCRO is willing to propose or accept is always of the form \(\Omega _i^x\), with x decreasing every time the opponent makes a new proposal.

Also, in the rest of this paper we will always assume that \((\omega _1, \omega _2, \dots , \omega _K)\) denotes a list containing all offers in \(\Omega\), sorted in order of decreasing utility for agent \(\alpha _1\) (where \(K = |\Omega |\) is the size of the domain), and that \(\pi\) is the permutation of the integers 1 to K, such that \((\omega _{\pi (1)}, \omega _{\pi (2)}, \dots , \omega _{\pi (K)})\) is a list of all offers sorted in order of decreasing utility for agent \(\alpha _2\). Furthermore, for any integer i between 1 and K we define:

$$\begin{aligned}x_i:= u_1(\omega _i) \quad \text {and} \quad y_i:= u_2(\omega _{\pi (i)}).\end{aligned}$$

Definition 15

We define the balance index b to be the smallest integer for which \(\Omega _1^{x_b} \cap \Omega _2^{y_b}\ne \emptyset\), and we define the balance set to be the set \(\Omega _1^{x_b} \cap \Omega _2^{y_b}\). Furthermore, we define

$$\begin{aligned}x^\beta:= & {} \min \{u_1(\omega ) \mid \omega \in \Omega _1^{x_b} \cap \Omega _2^{y_b}\}\\ y^\beta:= & {} \min \{u_2(\omega ) \mid \omega \in \Omega _1^{x_b} \cap \Omega _2^{y_b}\}\end{aligned}$$

The values \(x^\beta\) and \(y^\beta\) are called the balance values of \(\alpha _1\) and \(\alpha _2\) respectively.
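Definition 15 is straightforward to compute for a finite domain. The following sketch is an illustration only; the function name `balance_set` and the toy domain used to exercise it are our own assumptions and do not correspond to any ANAC domain or to the figures in this paper.

```python
from typing import Callable, Hashable, Sequence, Set, Tuple


def balance_set(
    offers: Sequence[Hashable],
    u1: Callable[[Hashable], float],
    u2: Callable[[Hashable], float],
) -> Tuple[int, Set[Hashable], float, float]:
    """Return (b, balance set, x^beta, y^beta) as in Definition 15."""
    xs = sorted((u1(w) for w in offers), reverse=True)   # x_1 >= x_2 >= ... >= x_K
    ys = sorted((u2(w) for w in offers), reverse=True)   # y_1 >= y_2 >= ... >= y_K
    for b in range(1, len(offers) + 1):
        x_b, y_b = xs[b - 1], ys[b - 1]
        # intersection of Omega_1^{x_b} and Omega_2^{y_b}
        common = {w for w in offers if u1(w) >= x_b and u2(w) >= y_b}
        if common:
            x_beta = min(u1(w) for w in common)
            y_beta = min(u2(w) for w in common)
            return b, common, x_beta, y_beta
    raise ValueError("the offer space must not be empty")


# Hypothetical toy domain: offer -> (u1, u2)
domain = {"A": (1.0, 0.0), "B": (0.7, 0.6), "C": (0.4, 0.3),
          "D": (0.0, 1.0), "E": (0.2, 0.8)}
b, bal, x_beta, y_beta = balance_set(
    list(domain), lambda w: domain[w][0], lambda w: domain[w][1])
```

In this toy domain the loop stops at \(b = 3\), with \(x_3 = 0.4\) and \(y_3 = 0.6\). The balance set contains only offer B, so \(x^\beta = 0.7\) and \(y^\beta = 0.6\), which shows that the balance value \(x^\beta\) can be strictly larger than \(x_b\).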

These concepts are also illustrated in Figs. 1 and 2. The intuitive idea is that if \(\alpha _1\) and \(\alpha _2\) both apply the MiCRO strategy, then \(x_i\) and \(y_i\) are the minimum utility values they are respectively willing to accept after they have both made \(i-1\) proposals, and \(\Omega _1^{x_i}\) and \(\Omega _2^{y_i}\) represent the sets of offers they are then respectively willing to accept. Initially, the two agents will only be willing to accept the offers in \(\Omega _1^{x_1} = \{\omega _1\}\) and \(\Omega _2^{y_1} = \{\omega _{\pi (1)}\}\) respectively, and these two sets will, in general, be disjoint. But, as the negotiation progresses, the number of unique proposals they have made increases, which means that i increases, which means that \(x_i\) and \(y_i\) decrease (as a function of i), and so \(\Omega _1^{x_i}\) and \(\Omega _2^{y_i}\) become larger. Then, after they have both made \(b-1\) unique offers, their intersection becomes nonempty, so at that point there are some offers that both agents are willing to accept. The balance set is by definition the set of these offers, and so the utility values the agents receive from the accepted offer must be greater than or equal to their balance values.

The balance index defines how many negotiation rounds are necessary in order for two MiCRO agents to come to an agreement. It turns out that for many domains that were used in ANAC, the balance index is much smaller than the total number of offers, meaning that MiCRO can still negotiate successfully even though the domain might seem too large. For example, the Smartphone domain from ANAC 2013 has a size of \(|\Omega | = 12000\), but its balance index is only 139, so two MiCRO agents only need to exchange 139 proposals each, to come to an agreement.

Fig. 1 Diagram of the Ultimatum domain, which was used in ANAC 2013, containing nine offers. The horizontal axis represents the utility \(u_1\) of agent \(\alpha _1\), while the vertical axis represents the utility \(u_2\) of agent \(\alpha _2\). Each dot represents one offer. The red dot represents an offer that maximizes both the sum and the product of the utility of the two agents

Fig. 2 The Ultimatum domain. Left: the blue lines are drawn through the values \(x_1\) and \(y_1\). Therefore, the area to the right of the vertical blue line represents \(\Omega _1^{x_1}\) and the area above the horizontal blue line represents \(\Omega _2^{y_1}\). We see that \(\Omega _1^{x_1} \cap \Omega _2^{y_1} = \emptyset\). Center: the blue lines are drawn through the values \(x_2\) and \(y_2\). Therefore, the area to the right of the vertical blue line represents \(\Omega _1^{x_2}\) and the area above the horizontal blue line represents \(\Omega _2^{y_2}\). We see that \(\Omega _1^{x_2} \cap \Omega _2^{y_2} = \emptyset\). Right: the blue lines are drawn through the values \(x_3\) and \(y_3\). We now see that \(\Omega _1^{x_3} \cap \Omega _2^{y_3}\) contains exactly one offer. This means that the balance index equals 3, and that \(\Omega _1^{x_3} \cap \Omega _2^{y_3}\) is the balance set. Furthermore, we notice that the only element of the balance set happens to be exactly the offer that maximizes both the product and the sum of the agents’ utilities, so we say this is a balanced domain

From the definition it is immediately clear that \(x^\beta \ge x_b\) and \(y^\beta \ge y_b\) must hold. However, it may not be clear that these inequalities can sometimes be strict. Therefore, we show in Fig. 3 an example where indeed \(x^\beta > x_b\).

Fig. 3 An example of a domain where \(x^\beta > x_b\). This domain has balance index \(b=3\). Note that the offer indicated in red is the only element of the balance set, and has utility values \(u_1(\omega ) = 0.7\) and \(u_2(\omega ) = 0.6\). So, these are also the balance values: \(x^\beta = 0.7\), \(y^\beta = 0.6\). However, we see that \(x_b = x_3 = 0.4\)

The following lemma is necessary for our proof of Lemma 4. Geometrically, it says that if we draw a diagram of the domain, such as in Figs. 1 and 2, we can always draw a horizontal line and a vertical line such that each element of the balance set is on one of those two lines.

Lemma 3

Let \(x^\beta\) and \(y^\beta\) denote the balance values. Then, for any offer \(\omega\) in the balance set we either have \(u_1(\omega ) = x^\beta\) or \(u_2(\omega ) = y^\beta\).

Proof

Let \(\omega _i \in \Omega _1^{x_{b}} \cap \Omega _2^{y_{b}}\) be some element from the balance set. We need to prove that either \(u_1(\omega _i) = x^\beta\) or \(u_2(\omega _i) = y^\beta\). In fact, we will prove the slightly stronger conclusion that either \(u_1(\omega _i) = x_b\) or \(u_2(\omega _i) = y_b\).

Recall that, by definition of the balance set, \(\Omega _1^{x_{b-1}} \cap \Omega _2^{y_{b-1}}\) must be empty, so we have \(\omega _{i} \not \in \Omega _1^{x_{b-1}} \cap \Omega _2^{y_{b-1}}\), which means that either \(x_{b-1} > u_1(\omega _{i})\) or \(y_{b-1}>u_2(\omega _{i})\). Furthermore, since \(\omega _{i} \in \Omega _1^{x_b} \cap \Omega _2^{y_b}\) we have \(u_1(\omega _{i}) \ge x_b\) and \(u_2(\omega _{i}) \ge y_b\). Combining these facts we conclude that we must have:

$$\begin{aligned} x_{b-1}> u_1(\omega _{i}) \ge x_b \quad \text {or} \quad y_{b-1} > u_2(\omega _{i}) \ge y_b. \end{aligned}$$
(2)

Now, recall that \(x_{b-1}\) and \(x_b\) are the utility values for \(\alpha _1\) of \(\omega _{b-1}\) and \(\omega _b\), and these two offers are consecutive in \(\alpha _1\)’s sorted list, so there is no offer \(\omega\) for which \(x_{b-1}> u_1(\omega ) > x_b\). And for the same reason there is no offer for which \(y_{b-1}> u_2(\omega ) > y_b\). Combining this with (2), we conclude that we must have either \(u_1(\omega _{i}) = x_b\) or \(u_2(\omega _{i}) = y_b\), but \(x_b\) and \(y_b\) are the lowest utility values that any offer in \(\Omega _1^{x_b} \cap \Omega _2^{y_b}\) could possibly have, so if \(u_1(\omega _{i}) = x_b\) then \(u_1(\omega _{i}) = x^\beta\) and similarly, if \(u_2(\omega _{i}) = y_b\) then \(u_2(\omega _{i}) = y^\beta\). \(\square\)

6.2 Pareto optimality

The concepts defined in the following definition are common in the literature, but we repeat them here for the sake of self-containment.

Definition 16

We say an offer \(\omega _i\) weakly dominates another offer \(\omega _j\) if \(u_1(\omega _i) \ge u_1(\omega _j)\) and \(u_2(\omega _i) \ge u_2(\omega _j)\), and at least one of these inequalities is strict. We say that \(\omega _i\) strongly dominates \(\omega _j\) if both inequalities are strict. An offer \(\omega _i\) is weakly Pareto-optimal if it is not strongly dominated by any other offer, and we say \(\omega _i\) is strongly Pareto-optimal if it is not weakly dominated by any other offer.
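The four notions of Definition 16 translate directly into predicates on utility pairs. The sketch below is only illustrative; the function names and the example offers are assumptions made for this example.

```python
def weakly_dominates(a, b):
    """a, b are (u_1, u_2) pairs; both >= and at least one strict."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def strongly_dominates(a, b):
    """Both inequalities are strict."""
    return a[0] > b[0] and a[1] > b[1]


def is_weakly_pareto_optimal(w, offers):
    """Not strongly dominated by any offer."""
    return not any(strongly_dominates(o, w) for o in offers)


def is_strongly_pareto_optimal(w, offers):
    """Not weakly dominated by any offer."""
    return not any(weakly_dominates(o, w) for o in offers)


# Hypothetical offers, given as (u_1, u_2) pairs.
OFFERS = [(1.0, 0.0), (0.7, 0.6), (0.6, 0.7), (0.6, 0.65), (0.0, 1.0)]
```

Here the offer `(0.6, 0.65)` is weakly dominated by `(0.6, 0.7)` but not strongly dominated by any offer, so it is weakly but not strongly Pareto-optimal.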

Lemma 4

All elements of a balance set are weakly Pareto optimal.

Proof

Suppose that \(\omega _i\) is an element of the balance set \(\Omega _1^{x_b} \cap \Omega _2^{y_b}\) which is not weakly Pareto optimal. Then there exists another offer \(\omega _j\) with \(u_1(\omega _j) > u_1(\omega _i)\) and \(u_2(\omega _j) > u_2(\omega _i)\). Clearly, this means that \(\omega _j \in \Omega _1^{x_b} \cap \Omega _2^{y_b}\), so \(\omega _j\) is also in the balance set. Now, note that by definition of the balance values we have \(u_1(\omega _i) \ge x^\beta\) and \(u_2(\omega _i) \ge y^\beta\), and by Lemma 3, we must have either \(u_1(\omega _j) = x^\beta\) or \(u_2(\omega _j) = y^\beta\). Combined, this means that either \(u_1(\omega _i) \ge u_1(\omega _j)\) or \(u_2(\omega _i) \ge u_2(\omega _j)\), which contradicts our assumption about \(\omega _j\). \(\square\)

The following proposition will be useful later on to prove Theorem 2, but we think it is also interesting by itself.

Proposition 1

If MiCRO makes an agreement with a consistent opponent (under the AOP), then this agreement will be weakly Pareto-optimal.

Proof

We prove this by contradiction. Assume that \(\alpha _1\) applies MiCRO, and that the two agents make an agreement \(\omega _k\) that is not weakly Pareto-optimal, i.e. there exists some other offer \(\omega _i\) with \(u_1(\omega _i) > u_1(\omega _k)\) and \(u_2(\omega _i) > u_2(\omega _k)\).

Let \(s\) denote the final state of the negotiations, and t the time at which \(\omega _k\) was accepted. We consider two cases: 1) the case that \(\alpha _1\) accepted \(\omega _k\) and 2) the case that \(\alpha _2\) accepted \(\omega _k\).

Case 1: \(\mathbf {(1,{\mathfrak {a}},\omega _k,t)\in s}\). If \(\alpha _1\) accepted \(\omega _k\) then it must have been proposed by \(\alpha _2\). This means all conditions of Lemma 2 are satisfied, with \(\omega _j = \omega _k\), so indeed we have a contradiction.

Case 2: \(\mathbf {(2,{\mathfrak {a}},\omega _k,t)\in s}\). If \(\alpha _2\) accepted \(\omega _k\), then \(\alpha _1\) must have proposed it. Since MiCRO proposes all possible offers one by one in order of decreasing utility, this means that, at some earlier time \(t'\), it must also have proposed \(\omega _i\), i.e. \((1,{\mathfrak {p}},\omega _i,t')\in s_{<t}\). Note that the first three conditions of Definition 12 are now satisfied. So, if the fourth one is true as well then \(\alpha _2\) accepting \(\omega _k\) would be an inconsistent acceptance, which is a contradiction. On the other hand, if the fourth condition does not hold, we have \(\exists t'': t'< t'' < t \wedge (2, {\mathfrak {p}}, \omega _i, t'') \in s\) (i.e. at \(t''\) agent \(\alpha _2\) has re-proposed \(\omega _i\) after \(\alpha _1\) proposed it earlier at time \(t'\)). But in that case, by definition of MiCRO, \(\alpha _1\) would have accepted \(\omega _i\) directly after \(t''\), so the final agreement would have been \(\omega _i\) instead of \(\omega _k\). This is again a contradiction. \(\square\)

6.3 Importance of the balance values

In this subsection we will show that, as long as its opponents are consistent, MiCRO never makes agreements below its balance values (see Footnote 3).

Theorem 2

If an agent \(\alpha _i\) applies MiCRO and its opponent is consistent, then (under the AOP) they will never make any agreement \(\omega\) for which \(u_i(\omega )\) is below \(\alpha _i\)’s balance value.

Proof

We will assume (w.l.o.g.) that it is \(\alpha _1\) that applies MiCRO.

Let \(\omega _i\) be an offer in \(\Omega _1^{x_b} \cap \Omega _2^{y_b}\) such that \(x_i = x^\beta\) (i.e. \(\omega _i\) is an element in the balance set, for which \(\alpha _1\)’s utility is minimal) and let \(\omega _k\) be any offer for which \(x_i > x_k\). We need to prove that \(\omega _k\) will never be accepted. We distinguish two separate cases, namely 1) the case that \(x_b > x_k\) and 2) the case that \(x_k\ge x_b\).

Case 1: Suppose \(x_b > x_k\). We will show that there is an offer \(\omega _j\) such that the conditions of Lemma 1 are satisfied. First, note that the first condition of Lemma 1 is indeed satisfied because, as we already noted, \(x_i > x_k\). Furthermore, by definition of MiCRO, \(\alpha _1\) would only propose or accept \(\omega _k\) if \(\alpha _2\) has made at least \(k-1\) different proposals, and since \(x_b > x_k\), which implies \(b < k\), this means that \(\alpha _2\) must have made at least b different proposals. Note that by definition of the number \(y_b\) there are at most \(b-1\) proposals \(\omega\) for which \(u_2(\omega ) > u_2(\omega _{\pi (b)}) = y_b\), so \(\alpha _2\) must have proposed at least one proposal \(\omega _j\) for which \(y_b \ge u_2(\omega _j)\). Finally, note that since \(\omega _i\) is in the balance set we have \(u_2(\omega _i) \ge y_b\), and thus \(u_2(\omega _i) \ge u_2(\omega _j)\). This means the second and third condition of Lemma 1 are also satisfied, which proves Case 1.

Case 2: Suppose \(x_k \ge x_b\). We first claim that in this case we have \(u_2(\omega _i) > u_2(\omega _k)\). Suppose the opposite, i.e. that \(u_2(\omega _k) \ge u_2(\omega _i)\). Note that from the assumption that \(\omega _i\) is in the balance set we have \(u_2(\omega _i) \ge y_b\), so we then must have \(u_2(\omega _k) \ge y_b\). But note that if both \(x_k\ge x_b\) and \(u_2(\omega _k)\ge y_b\) hold, then \(\omega _k \in \Omega _1^{x_b} \cap \Omega _2^{y_b}\), and therefore, by the definition of \(x^\beta\), we have \(x_k \ge x^\beta = x_i\), but that is in contradiction to our assumption that \(x_i > x_k\). So we conclude that \(u_2(\omega _i) > u_2(\omega _k)\) indeed holds. If we now combine this with the fact that \(x_i > x_k\) we conclude that \(\omega _i\) strongly dominates \(\omega _k\), so \(\omega _k\) is not weakly Pareto-optimal. Therefore, by Proposition 1, \(\omega _k\) could never be the accepted offer, which proves Case 2. \(\square\)

Corollary 1

If two agents that both apply the MiCRO strategy make an agreement (under the AOP), then that agreement must be an element of the balance set.

Proof

Suppose they make an agreement \(\omega\). Then, by combining Theorems 1 and 2, it follows that we must have \(u_1(\omega )\ge x^\beta\) and \(u_2(\omega )\ge y^\beta\). This means that, by definition of the balance values, we must also have \(u_1(\omega )\ge x_b\) and \(u_2(\omega )\ge y_b\), which in turn means that \(\omega\) is in the balance set. \(\square\)

When the balance set contains more than one offer, the question of which of those offers two MiCRO agents would agree upon depends on who starts the negotiations. For more details about this we refer to “Appendix B”.
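Corollary 1 and the dependence on the starting agent can be illustrated with a small simulation. The sketch below is not the reference implementation of MiCRO; it is a simplified reconstruction based on the properties used in the proofs above, under the following assumptions: an agent proposes its offers in order of decreasing utility, it makes a new proposal only if it has not yet made more distinct proposals than it has received, it accepts any received offer that is at least as good as the lowest offer it is currently willing to propose, it repeats its last proposal when it is not allowed to concede (a deterministic simplification), reservation values are ignored, and the deadline is replaced by a cap on the number of turns. The domain and all names are hypothetical.

```python
# Hypothetical toy domain: offer name -> (u_1, u_2). Its balance set is {B, C}.
DOMAIN = {"A": (1.0, 0.0), "B": (0.7, 0.6), "C": (0.6, 0.7), "D": (0.0, 1.0)}


class MicroAgent:
    """Simplified MiCRO-like agent (assumed rules, see the text above)."""

    def __init__(self, idx, offers):
        self.u = {name: utils[idx] for name, utils in offers.items()}
        # Own offers sorted by decreasing own utility.
        self.ranked = sorted(self.u, key=self.u.get, reverse=True)
        self.proposed = 0      # number of distinct offers proposed so far
        self.received = set()  # distinct offers received so far

    def act(self, incoming):
        if incoming is not None:
            self.received.add(incoming)
        can_concede = (self.proposed <= len(self.received)
                       and self.proposed < len(self.ranked))
        # Lowest offer the agent is currently willing to propose.
        idx = self.proposed if can_concede else self.proposed - 1
        lowest = self.ranked[idx]
        if incoming is not None and self.u[incoming] >= self.u[lowest]:
            return ("accept", incoming)
        if can_concede:
            self.proposed += 1
        return ("propose", lowest)


def negotiate(offers, first=0, max_turns=100):
    """Alternating offers; returns the agreed offer, or None if the cap is hit."""
    agents = [MicroAgent(0, offers), MicroAgent(1, offers)]
    turn, incoming = first, None
    for _ in range(max_turns):
        kind, offer = agents[turn].act(incoming)
        if kind == "accept":
            return offer
        incoming, turn = offer, 1 - turn
    return None


deal_if_agent1_starts = negotiate(DOMAIN, first=0)
deal_if_agent2_starts = negotiate(DOMAIN, first=1)
```

Under these assumptions, both runs end in an element of the balance set, but in a different one depending on which agent makes the first proposal.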

7 Equilibrium properties of MiCRO

In this section we will prove that MiCRO is a best response against itself. While our goal is to prove this under the assumption that the agents do not have knowledge about each other’s utility functions, we actually prove something much stronger, because we never actually use this assumption anywhere. That is, all our lemmas and theorems hold regardless of whether the agents have such knowledge or not.

Specifically, if agent \(\alpha _1\) applies MiCRO, then it does not matter how much information it has about the utility function of \(\alpha _2\), because the MiCRO strategy simply does not use such information. Furthermore, when we say that the best response for \(\alpha _2\) is to also use MiCRO, we mean that there is no better strategy for \(\alpha _2\) even if \(\alpha _2\) has full knowledge of \(\alpha _1\)’s utility function, and therefore it certainly does not have any better strategy if it does not have such knowledge (see Footnote 4).

7.1 MiCRO is an equilibrium strategy

Before we can prove our main theorem we need the following lemma.

Lemma 5

Suppose we have a balance set with two different offers \(\omega _i\) and \(\omega _k\), such that \(u_1(\omega _i) > u_1(\omega _k)\) and \(u_2(\omega _k) > u_2(\omega _i)\) (i.e. neither of the two weakly dominates the other). Then, there are exactly \(b-1\) offers \(\omega\) with \(u_1(\omega ) > u_1(\omega _k)\) and exactly \(b-1\) offers \(\omega\) with \(u_2(\omega ) > u_2(\omega _i)\), where b is the balance index.

Proof

By definition of the balance set we have \(u_1(\omega _k) \ge x_b\) and \(u_2(\omega _i) \ge y_b\), so we must have:

$$\begin{aligned}u_1(\omega _i)> x_b \quad \text {and} \quad u_2(\omega _k) > y_b\end{aligned}$$

We will only give the proof that there are exactly \(b-1\) offers \(\omega\) with \(u_1(\omega ) > u_1(\omega _k)\). The proof that there are also exactly \(b-1\) offers \(\omega\) with \(u_2(\omega ) > u_2(\omega _i)\) goes analogously.

First, suppose the number of offers \(\omega\) with \(u_1(\omega ) > u_1(\omega _k)\) is less than \(b-1\). Then, for the offer \(\omega _{b-1}\) that appears on position \(b-1\) in \(\alpha _1\)’s list, we would have \(u_1(\omega _k) \ge u_1(\omega _{b-1}) = x_{b-1}\). Furthermore, since \(u_2(\omega _k) > y_b\), we have that \(u_2(\omega _k) \ge y_{b-1}\), and thus that \(\omega _k \in \Omega _1^{x_{b-1}} \cap \Omega _2^{y_{b-1}}\). But that is not possible, because, by definition of the balance index, that set should be empty. On the other hand, if there were more than \(b-1\) offers \(\omega\) for which \(u_1(\omega ) > u_1(\omega _k)\), then we would have \(u_1(\omega _b) > u_1(\omega _k)\). But then \(\omega _k \not \in \Omega _1^{x_{b}}\) and thus \(\omega _k \not \in \Omega _1^{x_{b}} \cap \Omega _2^{y_{b}}\), which is in contradiction to the assumption that \(\omega _k\) is in the balance set. \(\square\)
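The counting statement of Lemma 5 can be checked by hand on a small example. In the sketch below, the domain is hypothetical, and its balance index \(b = 3\) and balance set \(\{B, C\}\) are stated as assumptions that were worked out manually for this particular domain.

```python
# Hypothetical toy domain: offer name -> (u_1, u_2).
DOMAIN = {"A": (1.0, 0.0), "B": (0.7, 0.6), "C": (0.6, 0.7), "D": (0.0, 1.0)}

b = 3                  # balance index of this domain (worked out by hand)
omega_i, omega_k = "B", "C"  # balance set elements with u_1(B) > u_1(C), u_2(C) > u_2(B)

# Offers that agent 1 strictly prefers over omega_k.
better_for_1 = [n for n, (u1, _) in DOMAIN.items() if u1 > DOMAIN[omega_k][0]]
# Offers that agent 2 strictly prefers over omega_i.
better_for_2 = [n for n, (_, u2) in DOMAIN.items() if u2 > DOMAIN[omega_i][1]]
```

Both lists contain exactly \(b-1 = 2\) offers, as the lemma states.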

We will next state our main theorem. Although it requires a fair number of conditions, it happens that for all ANAC domains that we analyzed (those of ANAC 2012, 2013, and 2021), the first two conditions are indeed satisfied. Moreover, note that the second condition is only a very minor one, since we recall from Lemma 4 that the elements of a balance set are always at least weakly Pareto-optimal. Regarding the third condition, experiments in [1] have shown that the typical deadline of 180 s used in the ANAC competitions is indeed sufficiently long for two MiCRO agents to come to an agreement, so this last condition typically also holds (see Footnote 5).

Remark 5

All three conditions of Theorem 3 typically hold in (the main leagues of) the ANAC competitions.

Theorem 3

Consider the extensive-form game defined by a negotiation domain D and the alternating offers protocol with deadline T (as defined in Sect. 3.2), and assume that the following conditions hold:

  • The balance values of D lie above the agents’ respective reservation values.

  • All elements of the balance set of D are strongly Pareto-optimal.

  • The deadline T is sufficiently long for two MiCRO agents to come to an agreement.

Then, among all consistent negotiation strategies, MiCRO is a best response against itself.

Proof

Suppose \(\alpha _1\) plays MiCRO, and suppose that \(\alpha _2\) has two options to select as its strategy: MiCRO and some other consistent strategy \(\sigma\). We will show that if \(\alpha _2\) selects \(\sigma\), it will not be able to score higher than when it selects MiCRO.

First note that if both agents apply MiCRO, and the deadline is long enough, then the agents will certainly come to an individually rational agreement. This follows directly from the definition of MiCRO and the assumption that the balance values lie above the reservation values. Let us call this agreement \(\omega _i\).

Now, let us first suppose that if \(\alpha _2\) selects \(\sigma\), then they do not come to an agreement. In that case, the agents receive their respective reservation values, so \(\alpha _2\) would have been better off selecting MiCRO. So, for this case the theorem indeed holds.

Next, let us suppose that if \(\alpha _2\) selects \(\sigma\), then they do come to an agreement, which we will call \(\omega _k\). To prove the theorem we need to show that \(\alpha _2\) does not prefer \(\omega _k\) over \(\omega _i\). That is, we will show that the following assumption leads to a contradiction:

$$\begin{aligned} u_2(\omega _k) > u_2(\omega _i) \end{aligned}$$
(3)

The rest of this proof consists of showing that the conditions of Lemma 1 are satisfied, so indeed we have a contradiction. We will do this in three steps: Firstly, we will show that \(u_1(\omega _i) > u_1(\omega _k)\) holds (which is the first condition of Lemma 1). Secondly, we will show that \(\omega _k\) must be in the balance set. Thirdly, we will use this fact to show that \(\alpha _2\) must have proposed at least one offer \(\omega _j\) for which \(u_2(\omega _i) \ge u_2(\omega _j)\) before \(\omega _k\) was accepted (the other two conditions of Lemma 1).

Step 1 We know from Corollary 1 that \(\omega _i\) must be in the balance set. Therefore, by Eq. (3) and the assumption that all elements of the balance set are strongly Pareto-optimal, we have: \(u_1(\omega _i) > u_1(\omega _k)\).

Step 2 Since \(\omega _i\) is in the balance set, we have \(u_2(\omega _i) \ge y_b\). Combined with Eq. (3), this leads to:

$$\begin{aligned} u_2(\omega _k) > y_b \end{aligned}$$
(4)

Furthermore, by Theorem 2 we must have \(u_1(\omega _k) \ge x^\beta\). And since by definition of the balance values we have \(x^\beta \ge x_b\), we have:

$$\begin{aligned} u_1(\omega _k) \ge x_b \end{aligned}$$
(5)

Combining Eqs. (4) and (5) we conclude that \(\omega _k\) is also in the balance set.

Step 3 We will now show that \(\alpha _2\) must have proposed at least one offer \(\omega _j\) with \(u_2(\omega _i) \ge u_2(\omega _j)\) before \(\omega _k\) was accepted. Since we know that \(\omega _i\) and \(\omega _k\) are both in the balance set, and since we have assumed that the elements of the balance set are strongly Pareto-optimal, we can apply Lemma 5. Thus, we know there are exactly \(b-1\) offers \(\omega\) with \(u_2(\omega ) > u_2(\omega _i)\). This means that, at any time during the negotiations, we can distinguish between the following four situations:

  • Situation 1: \(\alpha _2\) has made fewer than \(b-1\) different proposals.

  • Situation 2: \(\alpha _2\) has made more than \(b-1\) different proposals.

  • Situation 3: \(\alpha _2\) has made exactly \(b-1\) different proposals, and for at least one of them it holds that \(u_2(\omega _i) \ge u_2(\omega )\).

  • Situation 4: \(\alpha _2\) has made exactly \(b-1\) different proposals, and they are exactly those offers for which \(u_2(\omega ) > u_2(\omega _i)\).

In the first situation, \(\alpha _1\) would not yet be willing to propose or accept \(\omega _k\). This is because, by Lemma 5, we know there are exactly \(b-1\) offers that are strictly better for \(\alpha _1\) than \(\omega _k\), which means that \(\omega _k\) must be at position b or later on MiCRO’s list, and thus \(\alpha _1\) would only be willing to propose or accept it when \(\alpha _2\) has made at least \(b-1\) different proposals. So, \(\alpha _1\) will propose some other offer, and \(\alpha _2\) will not accept that offer, because that would contradict the assumption that negotiations end with \(\omega _k\) being accepted. Therefore, negotiations will continue until at least one of the three other situations has occurred. If Situation 2 or 3 occurs, then indeed we have shown what we wanted to show. Therefore, we are only left to consider Situation 4. In this situation, by Eq. (3), \(\alpha _2\) must have already proposed \(\omega _k\). The question is now how \(\alpha _1\) responds to this situation. We can consider the following options:

  • Response 1: \(\alpha _1\) accepts \(\omega _k\).

  • Response 2: \(\alpha _1\) proposes \(\omega _k\).

  • Response 3: \(\alpha _1\) accepts some other offer.

  • Response 4: \(\alpha _1\) proposes some other offer.

Now, it is important to understand that if \(\alpha _1\) and \(\alpha _2\) both play MiCRO, then at some point Situation 4 would also occur, and that \(\alpha _1\) should then reply to Situation 4 in exactly the same way as when \(\alpha _2\) plays \(\sigma\). This is because, by definition of MiCRO, \(\alpha _1\)’s response only depends on which unique offers have been proposed by the opponent, and not on the order in which they were proposed (see Footnote 6).

We therefore conclude that Response 1 cannot occur, because if \(\alpha _1\) accepts \(\omega _k\), then it would do the same if \(\alpha _2\) plays MiCRO, but that would be in contradiction with the assumption that MiCRO vs. MiCRO results in \(\omega _i\) being accepted.

For the same reason, Response 2 also cannot occur, because if \(\alpha _1\) proposes \(\omega _k\) when \(\alpha _2\) plays MiCRO, then \(\alpha _2\) would accept it (since \(\alpha _2\) itself had already proposed \(\omega _k\) earlier), but that would again be in contradiction to the assumption that MiCRO vs. MiCRO results in \(\omega _i\) being accepted.

Furthermore, the fact that Response 3 cannot occur is trivial, because we have assumed that the agents agree on \(\omega _k\).

So, we conclude that Response 4 must occur, which means that \(\alpha _1\) will propose some offer that is not \(\omega _k\), and therefore \(\alpha _2\) will not accept that offer (because we assumed only \(\omega _k\) would be accepted). This means that negotiations must continue until either Situation 2 or Situation 3 has occurred, which means that at some point \(\alpha _2\) must propose some offer \(\omega _j\) with \(u_2(\omega _i) \ge u_2(\omega _j)\), which is what we wanted to show.

Combining this result with the result of Step 1, we see that all conditions of Lemma 1 are satisfied, so we have a contradiction. \(\square\)
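The first two conditions of Theorem 3 depend only on the domain, so they can be checked mechanically. The following is a minimal sketch under stated assumptions: the two toy domains, the reservation values and all function names are hypothetical, and "lie above" is interpreted as a strict inequality. The third condition concerns the deadline and is not covered here.

```python
def balance_set(offers):
    """Balance set of a domain given as name -> (u_1, u_2)."""
    xs = sorted((u1 for u1, _ in offers.values()), reverse=True)
    ys = sorted((u2 for _, u2 in offers.values()), reverse=True)
    for x_b, y_b in zip(xs, ys):
        common = {n for n, (u1, u2) in offers.items() if u1 >= x_b and u2 >= y_b}
        if common:
            return common
    return set()


def is_weakly_dominated(name, offers):
    u1, u2 = offers[name]
    return any(v1 >= u1 and v2 >= u2 and (v1 > u1 or v2 > u2)
               for v1, v2 in offers.values())


def check_conditions(offers, rv1, rv2):
    """Return (condition 1, condition 2) of Theorem 3 for a non-empty domain."""
    bal = balance_set(offers)
    x_beta = min(offers[n][0] for n in bal)
    y_beta = min(offers[n][1] for n in bal)
    cond1 = x_beta > rv1 and y_beta > rv2  # balance values above reservation values
    cond2 = not any(is_weakly_dominated(n, offers) for n in bal)  # strongly Pareto-optimal
    return cond1, cond2


# Hypothetical domains. In BAD, the extra offer F is in the balance set
# but is weakly dominated by C.
GOOD = {"A": (1.0, 0.0), "B": (0.7, 0.6), "C": (0.6, 0.7), "D": (0.0, 1.0)}
BAD = dict(GOOD, F=(0.6, 0.65))
```

For example, `check_conditions(GOOD, 0.0, 0.0)` reports that both conditions hold, while for `BAD` the second condition fails.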

7.2 Semi-consistent strategies

We will now present a stronger version of Theorem 3, which allows the opponents to make inconsistent acceptances.

Definition 17

We say a strategy is semi-consistent if it never makes any inconsistent proposals or inconsistent rejections (but it may make inconsistent acceptances).

Theorem 4

Suppose the same conditions hold as in Theorem 3. Then, among all semi-consistent negotiation strategies, MiCRO is a best response against itself.

Proof

Suppose again that \(\alpha _1\) plays MiCRO, and that \(\alpha _2\) plays some alternative strategy \(\sigma\). If \(\alpha _2\) does not make any inconsistent acceptance, then it behaves like a consistent strategy, so by Theorem 3, \(\sigma\) cannot be a better response against MiCRO. On the other hand, suppose that \(\alpha _2\) does accept some offer \(\omega\) and that this is an inconsistent acceptance. This means that at some earlier time \(\alpha _1\) proposed a better offer \(\omega '\). Then, \(\alpha _2\) would have been better off accepting that offer, and thus \(\sigma\) was not a best response. \(\square\)

7.3 Is MiCRO versus MiCRO a subgame perfect equilibrium?

We have shown that MiCRO versus MiCRO forms a Nash equilibrium. However, since we defined negotiations as an extensive-form game, an important question is whether it is also a subgame perfect equilibrium (SPE). We will here quickly discuss this question, informally. Recall that a pair of strategies is an SPE iff, for every game state \(s\), it is a Nash equilibrium on the subgame starting at \(s\).

First, note that the common way to prove subgame perfection is to use backward induction. However, we cannot use this technique for negotiations, because, as we recall from Sect. 3.2, the state transition is determined by a random variable which can have an infinite number of possible values. Therefore, each state has an infinite number of possible successor states, which makes backward induction difficult to apply. Instead, we use a different approach.

Suppose we have a state \(s= (a_1, a_2, \dots , a_k)\) and let \(t_k\) be the time of \(a_k\). Then, as long as \(t_k\) is small enough compared to T, the proof of Theorem 3 continues to hold when the negotiations reach state \(s\). In other words, for any state \(s\) that is far enough from the deadline, MiCRO vs. MiCRO is indeed a Nash equilibrium for the subgame starting at \(s\).

However, if the negotiations reach a state that is too close to the deadline, it may happen that there is not enough time for two MiCRO agents to make an agreement, and so the agents would be better off if they deviated to a strategy that concedes faster, in order to secure a deal. Therefore, MiCRO vs. MiCRO is technically not an SPE.

This means that an agent could try to stall the negotiations on purpose, so as to reach a state where the agents are forced to deviate to a different strategy. However, under the conditions of Theorem 3, for at least one of the two agents this would result in a worse outcome (since any deal between MiCRO and MiCRO would be Pareto optimal). So, an agent that tries to apply this approach would have to make sure it concedes slower than its opponent. But that would mean that if two agents both tried to apply this approach, then they would fail to make an agreement. Therefore, trying to stall the negotiations in order to reach a subgame where MiCRO vs. MiCRO is not a Nash equilibrium will not yield any benefit.

We therefore argue, informally, that, even though MiCRO vs. MiCRO is technically not an SPE, the players cannot benefit from deviating to any other consistent strategy. That is, it forms a Nash equilibrium on all relevant subgames, and therefore still captures the essence of an SPE.

8 Optimality of MiCRO in the ANAC domains

Theorem 3 shows that, under the given conditions, MiCRO vs. MiCRO forms a Nash equilibrium. By itself this does not necessarily mean that MiCRO is an optimal strategy, because there could be other strategies that also form Nash equilibria but with better outcomes. However, in this section we will show that on many of the ANAC domains the agreement made between two MiCRO agents is, in fact, an optimal one.

8.1 Optimality

We first need to formally define the notion of an optimal agreement. This is a complex topic and many different definitions have been proposed in the literature. Here, we will focus on two such definitions: the Nash Bargaining Solution, and the Maximum Social Welfare Solution.

Nash proved that, under certain assumptions, an optimal agreement between two negotiators is one that maximizes the product of the agents’ utilities [3]. This is known as the Nash bargaining solution (NBS).

Definition 18

The Nash bargaining solution is a set of offers \({\mathcal {N}} \subseteq \Omega\) defined as:

$$\begin{aligned} {\mathcal {N}} := {{\,\mathrm{arg\,max}\,}}_{\omega \in \Omega } \{(u_1(\omega ) - rv_1)\cdot (u_2(\omega ) - rv_2) \} \end{aligned}$$
(6)

However, one of Nash’s assumptions is that the offer space is a convex set. This is not the case for the ANAC domains, which have discrete offer spaces, so the NBS may not be the right solution concept for them. Another consequence of the non-convexity of the ANAC domains is that the NBS is not always a single offer, but rather a set of offers.

It was therefore shown in [5] that in a tournament setting with finite domains, two optimal negotiation strategies would in fact come to the agreement that maximizes the sum of the agents’ utilities, rather than the product. We will here refer to this as the maximum social welfare solution (MSWS). Note that, just like the NBS, the MSWS may return a set of offers, rather than a single offer.

Definition 19

The maximum social welfare solution is a set of offers \(\mathcal{S}\mathcal{W} \subseteq \Omega\) defined as:

$$\begin{aligned} \mathcal{S}\mathcal{W} := {{\,\mathrm{arg\,max}\,}}_{\omega \in \Omega } \{u_1(\omega ) + u_2(\omega ) \} \end{aligned}$$
(7)

Since we do not want to get into the discussion here which of these solution concepts is better, we will consider both of them. Furthermore, we will see that in most of the ANAC domains the two concepts coincide anyway.
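On a finite domain both solution concepts can be computed by simple enumeration. The sketch below is only an illustration of Eqs. (6) and (7): the domain, the function names, the default reservation values of 0 and the small tolerance `TOL` used to compare floating-point values are all assumptions made for this example.

```python
# Hypothetical toy domain: offer name -> (u_1, u_2).
DOMAIN = {"A": (1.0, 0.35), "B": (0.7, 0.6), "C": (0.6, 0.7), "D": (0.0, 1.0)}
TOL = 1e-12  # tolerance for comparing floating-point scores (an assumption)


def argmax_set(offers, score):
    """All offers whose score is maximal (up to TOL)."""
    best = max(score(u1, u2) for u1, u2 in offers.values())
    return {n for n, (u1, u2) in offers.items() if best - score(u1, u2) <= TOL}


def nash_bargaining_solution(offers, rv1=0.0, rv2=0.0):
    """Eq. (6): maximize the product of the utility gains over the reservation values."""
    return argmax_set(offers, lambda u1, u2: (u1 - rv1) * (u2 - rv2))


def max_social_welfare_solution(offers):
    """Eq. (7): maximize the sum of the utilities."""
    return argmax_set(offers, lambda u1, u2: u1 + u2)


nbs = nash_bargaining_solution(DOMAIN)
msws = max_social_welfare_solution(DOMAIN)
```

This particular domain was chosen so that the two concepts differ: the NBS is a set of two offers, whereas the MSWS is a single, different offer.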

8.2 Balanced negotiation domains

We now define the notion of a balanced negotiation domain. On such domains, any agreement made by two MiCRO agents is always an optimal one. So, two MiCRO agents are, by definition, optimal on such domains. Surprisingly, it turns out that many of the ANAC domains indeed happen to be balanced, or close to balanced, even though we do not see any obvious reason why this would have to be true.

Definition 20

Given some negotiation domain \(D\), let \(\mathcal{S}\mathcal{C}\subseteq \Omega\) be some set of Pareto-optimal offers that are considered ‘optimal’ (according to some solution concept) and let \({\mathcal {B}}\) denote the balance set of \(D\). Then we say the domain is balanced with respect to \(\mathcal{S}\mathcal{C}\), if and only if \({\mathcal {B}} \subseteq \mathcal{S}\mathcal{C}\).

In other words, a domain is balanced if any agreement made between two MiCRO agents is in the set of optimal agreements \(\mathcal{S}\mathcal{C}\).

The Ultimatum domain, displayed in Figs. 1 and 2, is a nice example of a domain that is balanced w.r.t. the NBS, as well as w.r.t. the MSWS (in fact, the NBS and the MSWS coincide in this domain).

For those domains that are not balanced, we still want to know how close they are to being balanced, so we want to define a quantity \(bs_{\mathcal{S}\mathcal{C}}\), which we call the balance score, that measures this. To explain how we calculate it, first suppose there is exactly one optimal solution \(\omega ^*\) and the balance set contains exactly one element \(\omega\). Then, note that for any agent \(\alpha _i\), the quantity \(u_i(\omega ^*) - u_i(\omega )\) represents how much that agent loses from making agreement \(\omega\) instead of the optimal agreement \(\omega ^*\). Furthermore, since \(\omega ^*\) must be Pareto optimal, and \(\omega\) is weakly Pareto optimal (by Lemma 4), this value can only be positive for one of the two agents. So, we calculate \(bs_{\mathcal{S}\mathcal{C}}\) as the loss for that agent. That is, \(bs_{\mathcal{S}\mathcal{C}} = \max _{i\in \{1,2\}} \{u_i(\omega ^*) - u_i(\omega )\}\). If \(bs_{\mathcal{S}\mathcal{C}} = 0\) it means \(\omega\) is itself optimal, so the domain is balanced, while a higher value of \(bs_{\mathcal{S}\mathcal{C}}\) means that \(\omega\) is farther away from the optimal solution, so the domain is less balanced.

In case there are multiple optimal offers \(\omega ^* \in \mathcal{S}\mathcal{C}\) or multiple offers \(\omega \in {\mathcal {B}}\) in the balance set, then we can generalize this as follows:

$$\begin{aligned} bs_{\mathcal{S}\mathcal{C}} := \max _{\omega \in {\mathcal {B}}} \min _{\omega ^* \in \mathcal{S}\mathcal{C}} \max _{i\in \{1,2\}} \{ u_i(\omega ^*) - u_i(\omega ) \} \end{aligned}$$
(8)

Here, the \(\max\) and \(\min\) operators are chosen such that we have \(bs_{\mathcal{S}\mathcal{C}}=0\) if and only if the domain is balanced.
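Equation (8) is a nested max-min-max and can be evaluated directly. In the following sketch the domain is hypothetical, and its balance set, NBS and MSWS are passed in as explicit sets that were worked out by hand for this domain.

```python
# Hypothetical toy domain: offer name -> (u_1, u_2).
DOMAIN = {"A": (1.0, 0.35), "B": (0.7, 0.6), "C": (0.6, 0.7), "D": (0.0, 1.0)}

# Worked out by hand for this domain (assumptions of the example).
BALANCE_SET = {"B", "C"}
NBS = {"B", "C"}
MSWS = {"A"}


def balance_score(offers, balance_set, solution_set):
    """Eq. (8): worst case over the balance set of the loss w.r.t. the nearest optimum."""
    return max(
        min(
            max(offers[opt][i] - offers[w][i] for i in (0, 1))
            for opt in solution_set
        )
        for w in balance_set
    )


bs_nbs = balance_score(DOMAIN, BALANCE_SET, NBS)
bs_msws = balance_score(DOMAIN, BALANCE_SET, MSWS)
```

In this example the domain is balanced with respect to the NBS (score 0), but not with respect to the MSWS, where the score is approximately 0.4.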

We have calculated the balance scores of the 35 domains (see Footnote 7) that were used in ANAC 2012–2013, with respect to the NBS (\(bs_{{\mathcal {N}}}\)) and to the MSWS (\(bs_{\mathcal{S}\mathcal{W}}\)). They are displayed in Table 4. These domains are normalized so that for each agent the worst possible offer always has utility value 0.0 and the best possible offer always has utility value 1.0. We see that for 29 of these domains we have \(bs_{{\mathcal {N}}} \le 0.1\), while for 20 of them we have \(bs_{{\mathcal {N}}} \le 0.05\), and for 14 of them we even have \(bs_{{\mathcal {N}}} = 0.0\), so indeed in these domains the elements of the balance set often lie very close to the NBS. Similarly, if we use the MSWS as the reference optimal solution, then we find that for 25 of these domains we have \(bs_{\mathcal{S}\mathcal{W}} \le 0.1\), while for 18 of them we have \(bs_{\mathcal{S}\mathcal{W}} \le 0.05\), and for 12 of them we even have \(bs_{\mathcal{S}\mathcal{W}} = 0.0\). So, we conclude that the elements of the balance set also often lie very close to the MSWS.

Table 4 The balance scores of the ANAC domains
Table 5 This table displays, for each type of domain, in how many cases the balance score \(bs_{{\mathcal {N}}}\) was smaller than or equal to 0.10, 0.05 or 0.0, respectively
Table 6 This table displays, for each type of domain, in how many cases the balance score \(bs_{\mathcal{S}\mathcal{W}}\) was smaller than or equal to 0.10, 0.05 or 0.0, respectively

8.3 Why are so many ANAC domains balanced?

In this section we will try to answer the question why so many of the ANAC domains are (close to) balanced. The following proposition may give some intuition about this. Specifically, it tells us that it is related to a kind of symmetry.

Proposition 2

Let \(\omega ^*\) be some offer, and let \(x^* := u_1(\omega ^*)\) and \(y^* := u_2(\omega ^*)\). Then, if \(\omega ^*\) is Pareto-optimal and \(|\Omega _1^{x^*}| = |\Omega _2^{y^*}|\), we have that \(x^*\) and \(y^*\) are exactly the balance values.

Proof

Let \((\omega _1, \omega _2, \dots \omega _K)\) be a list containing all offers in the domain, such that \(u_1(\omega _1) \ge u_1(\omega _2) \ge \dots \ge u_1(\omega _K)\), and such that among all offers \(\omega\) for which \(u_1(\omega ) = u_1(\omega ^*)\), \(\omega ^*\) appears last in the list. We let \(x_i\) be shorthand for \(u_1(\omega _i)\), so we have:

$$\begin{aligned} x_1 \ge x_2 \ge \dots \ge x_K \end{aligned}$$
(9)

Then, if j is the position of \(\omega ^*\) in this list, we have \(x_j = x^*\) and \(|\Omega _1^{x^*}| = |\Omega _1^{x_j}| = j\). Similarly, we can sort the offers according to \(\alpha _2\)’s utility. If \(j'\) is the position of \(\omega ^*\) in that list, then we get \(y_{j'} = y^*\) and \(|\Omega _2^{y^*}| = |\Omega _2^{y_{j'}}| = j'\), but we have by assumption that \(|\Omega _1^{x^*}| = |\Omega _2^{y^*}|\), so \(j=j'\). We therefore have that \(\omega ^* \in \Omega _1^{x_j} \cap \Omega _2^{y_j}\).

Now, let k be the smallest integer for which \(\omega ^* \in \Omega _1^{x_k} \cap \Omega _2^{y_k}\). Clearly, \(k \le j\), which, by (9), implies \(x_k \ge x_j\). Furthermore, since \(\omega ^* \in \Omega _1^{x_k}\) we have \(x^* \ge x_k\), and since \(x^* = x_j\) we conclude that \(x_k = x_j = x^*\). Similarly we can show that \(y_k = y_j = y^*\).

We now want to show that \(\Omega _1^{x_k} \cap \Omega _2^{y_k}\) is the balance set. To do this, we have to prove that \(\Omega _1^{x_{k-1}} \cap \Omega _2^{y_{k-1}}\) is empty. Suppose the contrary. Then there is some offer \(\omega \in \Omega _1^{x_{k-1}} \cap \Omega _2^{y_{k-1}}\), which means that \(u_1(\omega ) \ge x_{k-1} \ge x_k = x^* = u_1(\omega ^*)\) and \(u_2(\omega ) \ge y_{k-1} \ge y_k = y^* = u_2(\omega ^*)\). Now, if one of these inequalities is strict, then we have a contradiction, because it would mean that \(\omega\) dominates \(\omega ^*\), while \(\omega ^*\) was assumed to be Pareto-optimal. On the other hand, if all these inequalities are actually equalities, then we also have a contradiction, because then we would have \(x_{k-1} = x^*\) and \(y_{k-1} = y^*\) which would imply \(\omega ^* \in \Omega _1^{x_{k-1}} \cap \Omega _2^{y_{k-1}}\), which contradicts our definition of k as the smallest integer for which \(\omega ^* \in \Omega _1^{x_{k}} \cap \Omega _2^{y_{k}}\).

Finally, since \(x^* = x_k\) and \(y^* = y_k\), and since \(x_k\) and \(y_k\) are by definition the lowest possible utility values any offer in \(\Omega _1^{x_k} \cap \Omega _2^{y_k}\) can have, and since we have shown that \(\Omega _1^{x_k} \cap \Omega _2^{y_k}\) is the balance set, we conclude that \(x^*\) and \(y^*\) are indeed the balance values. \(\square\)

Note that if \(\omega ^*\) is in the NBS or MSWS (or any other solution that could be considered ‘optimal’) then it is indeed Pareto-optimal. So, if the domain happens to satisfy the symmetry condition \(|\Omega _1^{x^*}| = |\Omega _2^{y^*}|\), then the utility values of this optimal solution coincide exactly with the balance values.

We can see in the right-hand image of Fig. 2 that this indeed holds for the Ultimatum domain. If we draw a vertical line through the NBS, then there are exactly 3 offers on or to the right of this line, so we have \(|\Omega _1^{x^*}|= 3\). Similarly, if we draw a horizontal line through the NBS, then there are exactly 3 offers on or above this line, so we have \(|\Omega _2^{y^*}|= 3\). This means we have \(|\Omega _1^{x^*}| = |\Omega _2^{y^*}|\), so, according to Proposition 2, the balance values should coincide with the NBS. Indeed, we see in the center image of Fig. 2 that \(\Omega _1^{x_2} \cap \Omega _2^{y_2} = \emptyset\), while we can see in the right-hand image that \(\Omega _1^{x_3} \cap \Omega _2^{y_3} \ne \emptyset\). Therefore, \(\Omega _1^{x_3} \cap \Omega _2^{y_3}\) is the balance set and it contains exactly one element, which happens to be the NBS. So, indeed, the balance values coincide with the NBS.
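The same check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we take \(x_k\) and \(y_k\) to be the k-th highest utility values of \(\alpha _1\) and \(\alpha _2\) respectively, we assume that all offers have distinct utilities for each agent, and the five-offer domain (`U1`, `U2`) is hypothetical and is not the Ultimatum domain of Fig. 2. The function names `upper_set`, `balance` and `is_symmetric_at` are our own.

```python
def upper_set(utility, threshold):
    """Offers whose utility is at least `threshold` (the set Omega_i^x)."""
    return {offer for offer, u in utility.items() if u >= threshold}


def balance(u1, u2):
    """Return (k, balance set, x_k, y_k) for the smallest k with a non-empty intersection.

    Assumes u1 and u2 are dicts over the same offers, with distinct values per agent.
    """
    xs = sorted(u1.values(), reverse=True)  # x_1 > x_2 > ... for agent 1
    ys = sorted(u2.values(), reverse=True)  # y_1 > y_2 > ... for agent 2
    for k, (x, y) in enumerate(zip(xs, ys), start=1):
        common = upper_set(u1, x) & upper_set(u2, y)
        if common:
            return k, common, x, y
    raise ValueError("no common offer found; do u1 and u2 share the same offers?")


def is_symmetric_at(u1, u2, offer):
    """Symmetry condition of Proposition 2 for a given offer."""
    return len(upper_set(u1, u1[offer])) == len(upper_set(u2, u2[offer]))


# Hypothetical domain with five offers (illustrative values only).
U1 = {"a": 1.0, "b": 0.8, "c": 0.6, "d": 0.3, "e": 0.1}
U2 = {"a": 0.1, "b": 0.3, "c": 0.6, "d": 0.8, "e": 1.0}

k, balance_set, x_k, y_k = balance(U1, U2)
print(k, balance_set, x_k, y_k)
print(is_symmetric_at(U1, U2, "c"))
```

In this toy domain the offer `c` is Pareto-optimal and satisfies the symmetry condition (three offers are at least as good as `c` for each agent), so its utility values coincide with the balance values, as Proposition 2 predicts.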

Of course, this still raises the question of why exactly these domains are so symmetrical. One thing to note is that these domains were handcrafted by people [8, 34], so one hypothesis is that the symmetry is caused by the fact that the people who created them had a (possibly subconscious) preference for symmetry. Another hypothesis is that this symmetry may emerge naturally when linear domains are created at random (similar to how a sum of multiple random variables naturally converges towards a Gaussian distribution).

To investigate this, we also analyzed two other types of domains: a set of randomly generated negotiation domains, and a set of negotiation domains obtained from a real-world industrial problem. For the randomly generated domains we use the 50 domains from the GeniusWeb framework [35] that were used in ANAC 2021, while for the real-world domains we use the Nestlé/Pladis domains from [36]. These represent the negotiation between two logistics companies about the exchange of truck loads, and were generated from real-world data (see Footnote 8). The results of this analysis are displayed in Tables 5 and 6.

We see in Table 5 that for the handcrafted domains in 83% of the cases (i.e. in 29 out of the 35 domains) the balance score w.r.t. the NBS was smaller than or equal to 0.10, while for the randomly generated domains this was only 64%, and for the real-world problems this was only 60%. Similarly, if we use \(s_{{\mathcal {N}}}\le 0.05\) or \(s_{{\mathcal {N}}}=0.0\) as the criterion, then we also see that the handcrafted domains are more balanced than the random domains, while the random domains are in turn more balanced than the real-world domains. Furthermore, we can make the same observations from Table 6, which uses \(s_{\mathcal{S}\mathcal{W}}\) instead of \(s_{{\mathcal {N}}}\).

These numbers suggest that both hypothesized effects may actually be playing a role. Furthermore, it is striking to see that even the real-world domains still seem to display a relatively large degree of balancedness (at least when using \(s_{{\mathcal {N}}}\) as its measure), which again suggests that there may be some natural mathematical reason for this type of symmetry.

8.4 Negotiations with discount factors

As mentioned earlier, we did not take the presence of discount factors into account in our analysis. In theory, if there are discount factors, there could be strategies that are better than MiCRO. For example, two agents that immediately propose and accept an optimal solution would be better. However, we do not see how such a strategy could be implemented without knowledge of the opponent’s utility function.

Furthermore, while we do not have any theoretical guarantees about the optimality of MiCRO in negotiations with discount factors, we know from experiments that two MiCRO agents typically come to a deal much faster than any other pair of agents [1]. This is because MiCRO does not have to do any calculations to update any opponent models. Therefore, in practice, the presence of discount factors only seems to increase the advantage of MiCRO over other strategies.

9 Inconsistent strategies

In the previous sections we have always assumed that the opponents are (semi-)consistent. In general, this seems like a reasonable assumption. However, it is possible that an opponent could behave inconsistently on purpose because it would allow them to exploit MiCRO. We will here give an example of such a strategy, which we call Anti-MiCRO. However, it should be noted that in order to implement this strategy deliberately, the opponent would need to know MiCRO’s utility function. Furthermore, we will show that we only need to make a minor adaptation to MiCRO to defend it against Anti-MiCRO.

9.1 The Anti-MiCRO strategy

Suppose that \(\alpha _1\) applies MiCRO, and that the domain has K offers, which all have a different utility value for agent \(\alpha _1\). That is, we have:

$$\begin{aligned}u_1(\omega _1)> u_1(\omega _2)> \dots > u_1(\omega _K)\end{aligned}$$

Then, there exists an inconsistent strategy for \(\alpha _2\), which we call Anti-MiCRO, that, for any offer \(\omega \in \Omega {\setminus } \{\omega _K\}\), can force MiCRO to propose \(\omega\). It works as follows. If \(\alpha _2\) has to make the first proposal of the negotiation, then it will propose \(\omega _2\). Otherwise, whenever \(\alpha _1\) proposes some offer \(\omega _j\), then \(\alpha _2\) will reply by proposing \(\omega _{j+2}\), unless \(\alpha _1\) proposes the desired offer \(\omega\), in which case \(\alpha _2\) will accept it.

An example of a negotiation between MiCRO and Anti-MiCRO is displayed in Table 7, in which Anti-MiCRO wants to force MiCRO to propose \(\omega _5\).

Table 7 MiCRO versus Anti-MiCRO

We can see that this strategy works, because when Anti-MiCRO plays against MiCRO, it will never repeat any proposals, and therefore MiCRO will also keep making new proposals. Furthermore, MiCRO will never accept any of \(\alpha _2\)’s proposals, because when \(\alpha _2\) proposes \(\omega _{j}\), agent \(\alpha _1\) will have made in total \(j-2\) different proposals, and therefore MiCRO will only be willing to propose or accept \(\omega _{j-1}\), at best. This means that MiCRO keeps conceding, and therefore will eventually propose the offer desired by \(\alpha _2\).

It is easy to see that Anti-MiCRO is inconsistent, because for any offer \(\omega _j\), if \(\alpha _2\) proposes it, then MiCRO will re-propose it three turns later, but then \(\alpha _2\) will reject it (unless it is the desired offer), which is an inconsistent rejection. In fact, it is a detectable inconsistent rejection, which we will define below.
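The dynamics described above can be traced with a small simulation. The sketch below is a simplified reading of MiCRO rather than a faithful copy of Algorithm 1: offers are represented only by their index in \(\alpha _1\)’s preference ordering, reservation values and deadlines are ignored, MiCRO concedes only when the number of its own distinct proposals does not exceed the number of distinct proposals it has received, and when it cannot concede it deterministically repeats its lowest proposal (whereas a repeated proposal may in general be chosen at random). The function name `simulate_anti_micro` and its parameters are our own.

```python
def simulate_anti_micro(K, target, micro_starts=True, max_turns=1000):
    """Simulate MiCRO (alpha_1) against Anti-MiCRO (alpha_2).

    Offers are the indices 1..K, where 1 is alpha_1's most preferred offer.
    Returns (index of the agreed offer or None, transcript of all actions).
    """
    if not 1 <= target < K:
        raise ValueError("Anti-MiCRO can only force omega_1 .. omega_{K-1}")
    transcript = []
    made = 0              # m: MiCRO has proposed omega_1 .. omega_made so far
    received = set()      # distinct offers proposed by Anti-MiCRO; n = len(received)
    last_received = None
    if not micro_starts:  # Anti-MiCRO opens the negotiation with omega_2
        received.add(2)
        last_received = 2
        transcript.append(("anti", "propose", 2))
    for _ in range(max_turns):
        # MiCRO's turn: it may concede only if m <= n and a new offer is left.
        can_concede = made <= len(received) and made < K
        lowest_acceptable = made + 1 if can_concede else made
        if last_received is not None and last_received <= lowest_acceptable:
            transcript.append(("micro", "accept", last_received))
            return last_received, transcript
        if can_concede:
            made += 1
        proposal = made   # either a new offer or a repetition of the lowest one
        transcript.append(("micro", "propose", proposal))
        # Anti-MiCRO's turn: accept the target, otherwise reply with omega_{j+2}.
        if proposal == target:
            transcript.append(("anti", "accept", proposal))
            return proposal, transcript
        reply = proposal + 2
        received.add(reply)
        last_received = reply
        transcript.append(("anti", "propose", reply))
    return None, transcript


agreed, log = simulate_anti_micro(K=8, target=5)
for agent, action, index in log:
    print(f"{agent:5s} {action:7s} omega_{index}")
```

With K = 8 and target \(\omega _5\), the transcript shows MiCRO proposing \(\omega _1, \dots , \omega _5\) in order while Anti-MiCRO answers with \(\omega _3, \dots , \omega _6\). It also shows Anti-MiCRO proposing \(\omega _3\) and later rejecting it when MiCRO re-proposes it.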

However, there are three problems with Anti-MiCRO that make it unsuitable in practice:

  1. It requires \(\alpha _2\) to know the exact preference ordering of \(\alpha _1\).

  2. It is an extremely risky strategy, because some of the offers proposed by \(\alpha _2\) may actually be very bad for \(\alpha _2\). So, if \(\alpha _1\) is actually not applying MiCRO, but instead some other strategy, then it could happen that \(\alpha _1\) unexpectedly accepts one of those bad proposals.

  3. With just a small adaptation, MiCRO can easily defend itself against Anti-MiCRO, as we will see next.

9.2 Defending against Anti-MiCRO

We now propose three possible ways to adapt MiCRO that allow it to defend itself against strategies like Anti-MiCRO. The idea behind each of them is to retaliate as soon as the agent detects that the opponent has made an inconsistent rejection.

Definition 21

We say an agent makes a detectable inconsistent rejection, if it first proposes some offer \(\omega\), and then later rejects that same offer.

Note that this definition is indeed a special case of an inconsistent rejection (see Definition 13) with \(\omega = \omega '\). It is called detectable because in this case our agent can see that the opponent’s rejection is inconsistent, even without knowing the opponent’s utility function. On the other hand, it is impossible for an agent to know that its opponent is making any other kind of inconsistent action without knowing its utility function. For this reason, our adaptations to MiCRO only involve detectable inconsistent rejections. Luckily, this is enough to defend against Anti-MiCRO.

It should be noted that all the previously stated lemmas and theorems about MiCRO that assume the opponent is (semi-)consistent, continue to hold for the adaptations of MiCRO that we propose here. This is because all of these adaptations behave exactly the same as the original MiCRO when the opponent is (semi-)consistent.

9.2.1 MiCRO-Retire

The simplest adaptation is the following: whenever the opponent makes a detectable inconsistent rejection, immediately break off the negotiations. We call this MiCRO-Retire (RETreat after Inconsistent REjection). While this approach is rather harsh, it allows us to easily prove an even stronger version of Theorems 3 and 4.

Theorem 5

Suppose the same conditions as Theorem 3. Then, among all negotiation strategies that never make inconsistent proposals and never make undetectable inconsistent rejections, MiCRO-Retire is a best response against itself.

Proof

If \(\alpha _1\) and \(\alpha _2\) both apply MiCRO-Retire then the outcome would be the same as when they both applied MiCRO. Now, suppose \(\alpha _2\) deviates to some other strategy \(\sigma\) that never makes inconsistent proposals or undetectable inconsistent rejections. If \(\sigma\) also does not make any detectable inconsistent rejections, then \(\sigma\) is in fact semi-consistent and MiCRO-Retire will behave exactly like MiCRO. Therefore, we know by Theorem 4 that \(\sigma\) cannot be a better response. On the other hand, if, at some point, \(\sigma\) does make a detectable inconsistent rejection, then \(\alpha _1\) will withdraw and the negotiations will fail, so \(\alpha _2\) will be worse off than if it had chosen MiCRO-Retire. So, again, \(\sigma\) would not be a better response. \(\square\)

9.2.2 MiCRO-Repair and MiCRO-Inspire

Withdrawing from the negotiations immediately after an inconsistent rejection may be an overly harsh reaction, since some opponents might make inconsistent rejections by accident (e.g. because of bugs in the code, or merely by suboptimal implementation), or on purpose, but for reasons that have nothing to do with exploiting MiCRO. We therefore also propose two more forgiving adaptations.

The first option is called MiCRO-Repair (REpeat old Proposals After Inconsistent Rejection). Whenever it detects an inconsistent rejection by the opponent, it does continue negotiating, but it will not propose any new offers anymore. It will just keep repeating (random) proposals it already made before, until the end of the negotiations.

The second option is even more forgiving. We call it MiCRO-Inspire (Ignore iNSincere Proposals that were Inconsistently REjected). Define the numbers m and n as in Sect. 3.4. Furthermore, define l as the number of unique offers that were proposed by the opponent, but then later rejected by that opponent. Then, we obtain MiCRO-Inspire by replacing every occurrence of n in Algorithm 1 by \(n-l\). That is, in any turn, MiCRO-Inspire will propose a new offer only if \(m \le n-l\). Otherwise it will just repeat an earlier proposal. The idea behind this is that if the opponent first proposes a given offer, but later rejects it, then it was not a genuine proposal, and therefore it should not be counted.

We suspect that Theorem 5 still holds if we use MiCRO-Repair or MiCRO-Inspire instead of MiCRO-Retire. We will not attempt to prove this, however.
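The difference between the three adaptations can be summarized in a short sketch. It covers only the decision whether to propose a new offer, repeat an old one, or withdraw; acceptance logic and reservation values are left out. The class `OpponentTracker`, the function `next_move` and the variant labels are hypothetical names introduced here for illustration, and the condition \(m \le n\) for the original MiCRO is our reading of the rule that MiCRO-Inspire modifies.

```python
from dataclasses import dataclass, field


@dataclass
class OpponentTracker:
    """Tracks the opponent's proposals and its detectable inconsistent rejections."""
    proposed: set = field(default_factory=set)   # unique offers proposed by the opponent
    retracted: set = field(default_factory=set)  # proposed by the opponent, later rejected by it

    def observe_counter_proposal(self, our_offer, their_offer):
        """The opponent rejected `our_offer` (None if it opens) by proposing `their_offer`."""
        if our_offer in self.proposed:
            # The opponent proposed this offer earlier and now rejects it.
            self.retracted.add(our_offer)
        self.proposed.add(their_offer)

    @property
    def n(self):
        return len(self.proposed)

    @property
    def l(self):
        return len(self.retracted)


def next_move(variant, m, tracker):
    """Return 'new', 'repeat' or 'withdraw', given m unique proposals made by us."""
    if variant == "retire" and tracker.l > 0:
        return "withdraw"                  # break off after a detected rejection
    if variant == "repair" and tracker.l > 0:
        return "repeat"                    # never propose a new offer again
    threshold = tracker.n - tracker.l if variant == "inspire" else tracker.n
    return "new" if m <= threshold else "repeat"


# Anti-MiCRO answers omega_1, omega_2, omega_3 with omega_3, omega_4, omega_5.
tracker = OpponentTracker()
for ours, theirs in [(1, 3), (2, 4), (3, 5)]:
    tracker.observe_counter_proposal(ours, theirs)

for variant in ("micro", "retire", "repair", "inspire"):
    print(variant, next_move(variant, m=3, tracker=tracker))
```

In this trace the opponent's rejection of \(\omega _3\) is detected, so \(l = 1\) while \(m = n = 3\). The original MiCRO would still concede, whereas each of the three adaptations refuses to make a new proposal.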

10 Conclusions and future work

We have defined the notion of a consistent negotiation strategy, and have shown that the MiCRO negotiation strategy forms a best response against itself among consistent negotiation strategies. Furthermore, we have defined the notion of a balanced negotiation domain, and we have shown that the MiCRO strategy is a game-theoretically optimal strategy on such domains. By itself, this is not surprising, because the notion of a balanced domain was defined exactly with that purpose. However, what is surprising, is that in practice many of the ANAC domains turn out to be balanced.

Although it is not entirely clear why so many of these domains are balanced, we have provided evidence that suggests this may be caused by two different effects. Firstly, it may be partially due to the fact that they were designed by people, who apparently have a (possibly subconscious) preference for domains that exhibit some degree of symmetry. And secondly, it may also partially be the natural consequence of some yet unknown mathematical mechanism. In any case, we think that understanding the concept of a balanced negotiation domain will help researchers to design more challenging negotiation test cases in the future.

In summary, we draw the following main conclusions:

  1. Under the conditions of ANAC, MiCRO is a best response against itself among all consistent strategies.

  2. Many domains used in ANAC are approximately balanced, meaning that, among all consistent strategies, MiCRO is game-theoretically optimal on such domains.

  3. While there are inconsistent strategies that are able to exploit MiCRO, as far as we can tell such strategies always need very precise knowledge of their opponent’s utility function, and need to be absolutely sure that their opponent is applying MiCRO.

One important open question is to what extent realistic negotiation strategies are indeed consistent. While we argued that without knowledge of opponent utility, they should indeed be consistent, we know that in reality agents may use opponent modeling algorithms to obtain such knowledge. If such algorithms are good enough they could be used to implement agents that are inconsistent on purpose, so as to exploit MiCRO. We therefore think that MiCRO can be very useful as a tool to assess the quality of such opponent modeling algorithms. Specifically, it can be used to answer the following questions:

  • Are opponent modeling algorithms strong enough to successfully implement Anti-MiCRO (or some other inconsistent strategy that exploits MiCRO)?

  • If yes, can MiCRO itself also make use of such opponent modeling algorithms to defend itself against such opponents (e.g. by detecting inconsistent proposals or rejections from the opponent and then retaliating)?

  • If a learning algorithm negotiates against MiCRO, on a balanced domain, will it learn to behave like MiCRO?

  • If two learning algorithms negotiate against each other on a balanced domain, will they learn to behave like MiCRO?

Furthermore, we are planning to develop generalizations of MiCRO for multilateral negotiations, for negotiations over larger domains in which there is not enough time to propose all offers one by one (e.g. domains with more than a million offers), and for negotiations over extremely large domains in which there is not even enough time to sort the offers in a list (e.g. domains with \(10^{20}\) offers).