Abstract
Monte Carlo tree search (MCTS) is a popular choice for solving sequential anytime problems. However, it depends on a numeric feedback signal, which can be difficult to define. Real-time MCTS is a variant which may only rarely encounter states with an explicit, extrinsic reward. To deal with such cases, the experimenter has to supply an additional numeric feedback signal in the form of a heuristic, which intrinsically guides the agent. Recent work has shown evidence that in different areas the underlying structure is ordinal and not numerical. Hence erroneous and biased heuristics are inevitable, especially in such domains. In this paper, we propose an MCTS variant which only depends on qualitative feedback, and therefore opens up new applications for MCTS. We also find indications that translating absolute into ordinal feedback may be beneficial. Using a puzzle domain, we show that our preference-based MCTS variant, which only receives qualitative feedback, is able to reach a performance level comparable to a regular MCTS baseline, which obtains quantitative feedback.
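To make the idea of translating absolute into ordinal feedback concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes each candidate action has already been evaluated with some numeric rollout score, and it discards the magnitudes, keeping only pairwise preferences. The function names (`to_preference`, `pairwise_preferences`, `preference_wins`) and the example actions are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from typing import Dict, Tuple


def to_preference(score_a: float, score_b: float) -> float:
    """Reduce two numeric scores to an ordinal signal.

    Returns 1.0 if a is preferred, 0.0 if b is preferred, 0.5 on a tie.
    The size of the difference is deliberately thrown away.
    """
    if score_a > score_b:
        return 1.0
    if score_a < score_b:
        return 0.0
    return 0.5


def pairwise_preferences(scores: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Compare every unordered pair of actions (hypothetical helper)."""
    return {
        (a, b): to_preference(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


def preference_wins(scores: Dict[str, float]) -> Dict[str, float]:
    """Sum the preference outcomes of each action against all others.

    This is a simple Borda-style count, used here only as an
    illustrative way to rank actions from ordinal feedback.
    """
    actions = sorted(scores)
    return {
        a: sum(to_preference(scores[a], scores[b]) for b in actions if b != a)
        for a in actions
    }


if __name__ == "__main__":
    rollout_scores = {"left": 3.0, "right": 10.0, "up": 3.0}
    print(pairwise_preferences(rollout_scores))
    print(preference_wins(rollout_scores))
```

One property worth noting in this sketch: any strictly increasing rescaling of the numeric scores (for example squaring non-negative scores) leaves the preferences unchanged, so a ranking built this way depends only on the order of the outcomes and not on how the heuristic was scaled.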
Notes
1. Please note that this is a fair comparison between PB-MCTS and H-MCTS: the first uses more samples per iteration, the latter uses more iterations.
Acknowledgments
This work was supported by the German Research Foundation (DFG project number FU 580/10). We gratefully acknowledge the use of the Lichtenberg high performance computer of the TU Darmstadt for our experiments.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Joppen, T., Wirth, C., Fürnkranz, J. (2018). Preference-Based Monte Carlo Tree Search. In: Trollmann, F., Turhan, AY. (eds) KI 2018: Advances in Artificial Intelligence. KI 2018. Lecture Notes in Computer Science, vol 11117. Springer, Cham. https://doi.org/10.1007/978-3-030-00111-7_28
DOI: https://doi.org/10.1007/978-3-030-00111-7_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00110-0
Online ISBN: 978-3-030-00111-7
eBook Packages: Computer Science (R0)