Abstract
Making a rapid unpredictable decision from
1 Introduction
In scenarios that involve adversarial behaviour, it often pays to act in a manner that is not entirely predictable. Given a zero-sum competition with two adversaries and known payoffs (or expected payoffs) for every combination of finite choices, minimax probabilities can be found via linear programming [1].
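To make this baseline concrete, the following sketch (a minimal Python illustration using scipy.optimize.linprog; the helper name solve_matrix_game is ours and it is not part of the companion code) sets up that linear program for the row player of an arbitrary payoff matrix.

import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    # Minimax mixed strategy for the row (maximising) player of payoff matrix A.
    # Solves  max v  s.t.  (A^T p)_j >= v for every column j,  sum(p) = 1,  p >= 0.
    A = np.asarray(A, dtype=float)
    n_rows, n_cols = A.shape
    # Decision variables x = (p_1, ..., p_n, v); linprog minimises, so minimise -v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0
    # Inequalities  v - (A^T p)_j <= 0, one per pure strategy of the opponent.
    A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    # Probabilities sum to unity.
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1], res.x[:-1]          # value of the game, row probabilities

The cost of such a general-purpose solve grows super-linearly in the number of choices, which motivates the specialised solutions developed in the remainder of this article.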
For
Real-time control applications frequently demand better scaling. A common problem in artificial intelligence in video games (Game AI) is for simulated agents in an adversarial scenario to pick their next destination tactically [4,5].
One model that satisfies this requirement is the zero-sum hide-search game, following the approach of Sakaguchi [6][1]. Here the choice is that of a Hider who is rewarded for choosing site
is the payoff for the Hider. This imposes sufficient structure on the off-diagonal terms (the predictions are exact) that minimax strategies can be computed in
For general extensions this scaling is sacrificed: for example, in security games the requirement of exact cover (prediction) is dropped (e.g. [11]), or multiple non-interchangeable search resources are introduced (e.g. [12]), both of which require the more general linear program to be solved. Allowing additional stages where sequential choices are restricted to neighbours becomes a search-evasion game [13,14], and whilst these have received considerable attention, even some of the most trivial search games remain unsolved [15].
The contribution of this article is to extend (1) to two games with multiple parallel searches where optimal strategies, and their sampling, can be computed with the same algorithmic scaling. In the first, the Searcher is allowed to coordinate
The structure of this article is as follows. In Section 2, we provide some basic definitions, along with a proof of the single prediction case, and illustrate with a simple example. In Section 3, we extend this to multiple searches, where in Section 3.1, we treat the case of multiple coordinated predictions, and in Section 3.2 multiple predictions drawn independently from an identical distribution. We summarise in Section 4.
2 Basic definitions and the single search case
In this section, we describe the solution to the case with a single search, i.e. the problem in equation (1). We then use this to solve a game in Example 2.10 to illustrate the behaviour of the solution.
Let us follow the convention that the receiver of payoff
and linearity w.r.t.
Theorem 2.1
(Single prediction) The minimax strategy for this game has expected payoff
The minimax strategy for the Searcher is unique and has probability
which is, in general, mixed.
The minimax strategy for the Hider is unique unless
with the normalisation
This is a slight generalisation of results described in [8, 8.1 “Scud Hunt”] and [9, 1.7.7], which describe this solution with a choice of zero for the
A useful result for solving such problems, which will be extensively used here, is the Gibbs lemma. This is a necessary first-order condition to find a maximum over
for
Lemma 2.2
The set of supported sites of the Searcher is contained in that of the Hider, i.e.
Proof
From the Gibbs lemma w.r.t.
i.e.
i.e.
Lemma 2.3
The supported sites for the Searcher are contiguous over the largest
i.e. the Searcher will only visit those sites with rewards above some
Proof
Assume this was not the case, i.e. there is some
From Lemma 2.2 we have
However, since
For convenience let us define
and a measure of these
Corollary 2.4
The minimax strategy for y must correspond to some
where
Proof
From Lemma 2.3 we know that
to re-write the index set for non-zero
Using the upper case of (10),
Since the
Lemma 2.5
For
and
Proof
From (10)
Now for
Now decompose
where we have defined
and substitute (15) into (17) to give
Corollary 2.6
Since at minimax we have
Lemma 2.7
Proof
Using the same decomposition as (17) we have
Since all the probabilities
and so we have
Now
Corollary 2.8
From Lemmas 2.5 and 2.7, minimax gives
and (4) follows immediately from Corollary 2.6,
Lemma 2.9
The strategy for the Hider is at minimax iff it is of the form in (5),
with the normalisation K fixed by summation to unity.
The expected payoff of any of these strategies is
Proof
For (5) to describe all the minimax strategies for the Hider, we must show that it is both necessary and sufficient.
First let us show that it is necessary: given the Searcher's optimal strategy, we must show that any Hider strategy not of this form can be improved upon.
If we fix the Searcher's probabilities to be its optimal strategy, the payoff for the Hider can be written as:
and by inspection we see that, since the probabilities must sum to unity, the Hider can maximise its payoff by transferring any probability from the sites
We now need to check that these are sufficient, i.e. if we fix the Hider’s strategy to (5), we must check that the Searcher cannot improve its payoff by deviating from its strategy. To this end, let us consider deviations to the minimax strategy of
with the condition that the
The payoff for the Searcher of such a deviation can thus be written as:
It thus follows that increases in the payoff correspond to positive
2.1 Remarks and example
The reason this game is soluble by hand is that the dependence of the payoff on the site choice of the Searcher is restricted entirely to whether it chooses the same location as the Hider. In algorithmic terms, we see from Theorem 2.1 that the solution is described by a maximum over cumulative sums. These can be performed in
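To illustrate this cumulative-sum structure in the simplest setting, here is a sketch (Python; hypothetical function name; it assumes the special case of a zero payoff on the diagonal, i.e. the “Scud Hunt” game of [8], rather than the full generality of Theorem 2.1). The rewards are sorted once, and the value is the maximum over prefixes of (k − 1)/Σ_{i≤k} 1/r_i, so the cost of the sketch is dominated by the sort.

import numpy as np

def scud_hunt_value(rewards):
    # Value and Hider strategy for the single-search game with strictly positive
    # rewards r_i and zero payoff when the Searcher predicts correctly.
    # The Hider equalises p_i * r_i over the k largest rewards, giving the
    # candidate value (k - 1) / sum_{i <= k} 1/r_i, maximised over k.
    r = np.sort(np.asarray(rewards, dtype=float))[::-1]   # descending order
    inv_cumsum = np.cumsum(1.0 / r)                        # prefix sums of 1/r_i
    k = np.arange(1, len(r) + 1)
    candidates = (k - 1) / inv_cumsum
    k_star = int(np.argmax(candidates)) + 1                # optimal support size
    p = np.zeros(len(r))
    p[:k_star] = (1.0 / r[:k_star]) / inv_cumsum[k_star - 1]
    return candidates[k_star - 1], p                       # p is in sorted order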
A special case that may be of interest is when the Hider suffers a fixed (still strictly positive) penalty for choosing the same site as the Searcher, i.e.
Example 2.10
(Alice and Bob) Alice and Bob play a game where they each pick a number between 1 and 10. If they choose different numbers, then Bob gives Alice the value of her number in dollars; however, if they choose the same, Alice must give Bob 10 dollars. This corresponds to
and Bob to pick
which is plotted in Figure 1.
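A minimal way to verify such an example numerically is to assemble the 10 × 10 payoff matrix for Alice and hand it to a general minimax solver, for instance the hypothetical solve_matrix_game helper sketched in the introduction (the snippet below reuses that helper and is illustrative only).

import numpy as np

# Rows are Alice's choice i = 1..10, columns are Bob's choice j = 1..10:
# Alice wins i dollars when i != j, and loses 10 dollars when i == j.
i = np.arange(1, 11)
A = np.tile(i[:, None].astype(float), (1, 10))
np.fill_diagonal(A, -10.0)

value, p_alice = solve_matrix_game(A)   # helper from the introduction sketch
print("value to Alice:", round(value, 3))
print("Alice's minimax probabilities:", np.round(p_alice, 3))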
3 Extensions to multiple searches
Let us now consider the case where the Searcher can pick multiple sites, applying a penalty if any of the predictions are exactly correct, and the penalty is only applied once (the case where the penalty is linear in the number of correct predictions can be treated as a re-scaling of the problem in Section 2).
Let us assume there are
and we additionally define
a lower bound on the value of the game if the Hider were always caught.
Two things are immediately apparent: the optimal choices for the Searcher are disjoint, and the expectation depends only on the marginal probabilities of a search of site
In Section 3.2, we study a second problem, in which each search
Algorithms to solve these cases are composed of steps commonly described elsewhere (sampling a marginal distribution without replacement, solving piecewise-differentiable monotonic functions, etc.); however, testing an implementation for computational efficiency and for errors in special cases is not always trivial, so we provide one at the following link: https://github.com/pec27/rams.
3.1 Y coordinated searches
For
with the requirement that
Theorem 3.1
The value of this game to the Hider is
where we have defined
The optimal strategies for the Searcher are those, and only those, with inclusion probabilities for the searches that satisfy
with the lower bound replaced by equality when
The probabilities for the Hider can be classified into two cases depending on which of
requiring the
3.1.1 Proof
The procedure of this proof is first to solve the problem with one constraint removed and then to test whether this solution also satisfies the removed constraint. In the case that it does not, we guess the solution set and prove stability.
Let us first consider the problem where we remove the constraint that
with
We now check whether this satisfies the constraint
and this is true for all
with
The corresponding probability for the Hider is
with
Let us now consider the solution when this is violated.
3.1.2 Solution for ν_low > ν_Y
In this case, the Hider can guarantee a return of
Formally, let us begin with an ansatz for the inclusion probability distribution for the searches. We consider
chosen s.t.
What is the best response
Note
and the expected value of this strategy is
We now ask if there is a better strategy for the Searcher, and we can trivially see this is false, since it already catches the Hider every time (and there is no direct payoff dependency on the sites of the searches). Having shown that these strategies satisfy minimax, let us proceed to verify that no other strategies do.
We check that for any other strategy for the Searcher, the Hider has a better response, and correspondingly for any other strategy for the Hider, the Searcher has a better response. Beginning with the Searcher, suppose
i.e. some site where
Correspondingly for the Hider, we try
Finally, we note that in this case the expected payoff is
3.1.3 Algorithmic solution and remarks
Given the explicit formulae for the value and probabilities in Theorem 3.1, the only operation that requires a non-trivial algorithm is that of choosing
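For the sampling step referred to above, drawing a set of distinct sites with prescribed inclusion probabilities, one simple scheme that realises the required marginals is systematic (Madow) sampling, sketched below in Python. This is an illustrative alternative and not necessarily the method used in the companion code, which could, for example, use a splitting scheme in the spirit of [16].

import numpy as np

def sample_with_inclusion_probabilities(pi, rng=None):
    # Draw a set of distinct sites such that site i is included with probability
    # pi[i], where 0 <= pi[i] <= 1 and sum(pi) equals the integer number of
    # searches Y.  Systematic (Madow) sampling: lay the pi out end to end and
    # select every site whose interval contains one of the unit-spaced points
    # u, u + 1, ..., u + Y - 1 for a single uniform u in [0, 1).  Each interval
    # has length <= 1, so the selected sites are distinct, and the chance that
    # an interval of length pi[i] contains a point is exactly pi[i].
    rng = np.random.default_rng() if rng is None else rng
    pi = np.asarray(pi, dtype=float)
    Y = int(round(pi.sum()))
    cum = np.cumsum(pi)
    u = rng.uniform()
    return np.searchsorted(cum, u + np.arange(Y), side="right")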
3.2 Non-coordinated searches
In this section, we apply the additional restriction to the game in (28) that the Searcher chooses
with constraints that the
Minimax still applies to this problem since it is a sum of concave–convex functions (for
Theorem 3.2
The optimal distribution for the searches is given by
to choose site
The optimal strategy (or strategies) for the Hider depends on whether there is a pure solution for the Searcher, i.e. whether
with (in the lower case)
3.2.1 Proof
The Gibbs lemma w.r.t.
This gives us the following corollaries.
Corollary 3.3
Proof
Combining cases of (47) we have
Corollary 3.4
For
Lemma 3.5
Proof
First let us show
Now let us show
Corollary 3.6
Proof
By summation of the
Corollary 3.7
Proof
When
The Gibbs lemma w.r.t.
where the constant on the RHS has been noted as
Corollary 3.8
Proof
Suppose otherwise, i.e.
Corollary 3.9
Proof
Suppose otherwise, i.e.
Lemma 3.10
defined over
Proof
The lower case follows immediately from Corollary 3.9.
For the upper case, we have
Lemma 3.11
Taking
Proof
By inspection they are monotonically decreasing.
strictly negative for
Lemma 3.12
The root of
Proof
By substitution of
By substitution of
and since the sum is a monotonically decreasing function,
Corollary 3.13
Proof
By minimax we know some solution exists in Lemma 3.12. The arguments of the sum are monotonically decreasing, and to match the RHS there must be at least some
Lemma 3.14
For the case
where
Proof
The upper two cases follow immediately from (47). The case
For sufficiency, we can substitute the explicit expression for
3.2.2 Algorithmic solution and remarks
In the left panel of Figure 2 we illustrate root-finding for the monotonic piecewise-analytic function in (45) that is applied in the companion code. For the left-most point
In terms of algorithmic complexity, solving for the root of a convex strictly monotonically decreasing function
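As a generic illustration of this root-finding step (not the exact routine used in the companion code), the sketch below brackets the root of a strictly monotonically decreasing scalar function and bisects until a tolerance is met; for a piecewise-analytic function such as (45), the same monotonicity lets one locate the relevant piece and then solve it directly.

def bisect_decreasing(f, lo, hi, tol=1e-12, max_iter=200):
    # Root of a strictly monotonically decreasing function f on [lo, hi],
    # assuming f(lo) >= 0 >= f(hi).  Each iteration halves the bracket.
    assert f(lo) >= 0.0 >= f(hi), "root must be bracketed"
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid     # f still positive: the root lies to the right
        else:
            hi = mid     # f non-positive: the root lies at or to the left
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Example: f(x) = 1 - x**2 is strictly decreasing on [0, 2] with root at x = 1.
print(bisect_decreasing(lambda x: 1.0 - x * x, 0.0, 2.0))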
In the right panel of Figure 2, we illustrate the behaviour of the Hider probabilities as a function of the reward
A novel feature of this solution is that it has an analytic continuation to
4 Summary
In this work, we addressed the decision problem of making a rapid unpredictable choice from
Acknowledgments
The author would like to thank Thomas S. Ferguson and Annika Lang for reading drafts of this article and their comments and support, and to thank Ali Khan and Graeme Leese for early discussions of the problem in Example 2.10. PEC is employed at Mercuna Developments, an AI middleware company registered in Scotland, number SC545088.
- Conflict of interest: Author states no conflict of interest.
- Data availability statement: Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
References
[1] J. von Neumann, O. Morgenstern, and A. Rubinstein, Theory of Games and Economic Behavior (60th Anniversary Commemorative Edition), Princeton, NJ, Princeton University Press, 1944. ISBN 9780691130613.
[2] P. M. Vaidya, “Speeding-up linear programming using fast matrix multiplication,” in: Proceedings of the 30th Annual Symposium on Foundations of Computer Science, SFCS ’89, IEEE Computer Society, USA, 1989, pp. 332–337. ISBN 0818619821, 10.1109/SFCS.1989.63499.
[3] S. Jiang, Z. Song, O. Weinstein, and H. Zhang, Faster Dynamic Matrix Inverse for Faster LPs, arXiv e-prints, arXiv:2004.07470, April 2020.
[4] M. Jack, “Tactical position selection,” in: Game AI Pro, S. Rabin, Ed., Chapter 26, Boca Raton, CRC Press, 2013, pp. 337–359. 10.1201/9780429054969-1.
[5] E. Johnson, “Guide to effective auto-generated spatial queries,” in: Game AI Pro 3, S. Rabin, Ed., Chapter 26, Boca Raton, CRC Press, 2017, pp. 309–325. 10.4324/9781315151700-26.
[6] M. Sakaguchi, “Two-sided search games,” J. Operat. Res. Soc. Japan, vol. 16, no. 4, pp. 207–225, Dec 1973.
[7] M. Dresher, Games of Strategy: Theory and Applications, Englewood Cliffs, NJ, Prentice-Hall, 1961.
[8] A. Washburn, Two-Person Zero-Sum Games, 4th edition, New York, Springer, 2014. 10.1007/978-1-4614-9050-0.
[9] L. A. Petrosyan and N. A. Zenkevich, Game Theory, Hackensack, NJ, World Scientific, 2016. ISBN 9789814725385, 10.1142/9824.
[10] T. S. Ferguson, Game Theory, Second edition, Hackensack, NJ, World Scientific, 2014.
[11] C. Kiekintveld, M. Jain, J. Tsai, J. Pita, F. Ordóñez, and M. Tambe, “Computing optimal randomized resource allocations for massive security games,” in: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), vol. 1, pp. 689–696, 2009. ISBN 9780981738161, 10.5555/1558013.1558108.
[12] J. Letchford and V. Conitzer, “Solving security games on graphs via marginal probabilities,” Proc. AAAI Conference Artif. Intell., vol. 27, no. 1, pp. 591–597, June 2013. 10.1609/aaai.v27i1.8688.
[13] K. T. Lee, “A firing game with time lag,” J. Optim. Theory Appl., vol. 41, no. 4, pp. 547–558, December 1983. ISSN 0022-3239, 10.1007/BF00934642.
[14] T. Nakai, “A sequential evasion-search game with a goal,” J. Operat. Res. Soc. Japan, vol. 29, no. 2, pp. 113–122, 1986. 10.15807/jorsj.29.113.
[15] S. Alpern, R. Fokkink, R. Lindelauf, and G.-J. Olsder, “The ‘Princess and Monster’ game on an interval,” SIAM J. Control Optimization, vol. 47, no. 3, pp. 1178–1190, 2008. 10.1137/060672054.
[16] J. C. Deville and Y. Tillé, “Unequal probability sampling without replacement through a splitting method,” Biometrika, vol. 85, pp. 89–101, March 1998. 10.1093/biomet/85.1.89.
[17] J. Croucher, “Application of the fundamental theorem of games to an example concerning antiballistic missile defense,” Naval Res. Logistics Quarter., vol. 22, pp. 197–203, March 1975. 10.1002/NAV.3800220117.
[18] V. J. Baston and A. Y. Garnaev, “A search game with a protector,” Naval Res. Logistics, vol. 47, no. 2, pp. 85–96, 2000. https://eprints.soton.ac.uk/29734/. 10.1002/(SICI)1520-6750(200003)47:2<85::AID-NAV1>3.0.CO;2-C.
[19] W. H. Ruckle, Geometric Games and their Applications, Pitman, 1983.
© 2022 Peter E. Creasey, published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.