Search | arXiv e-print repository

doi 10.1109/TG.2022.3206733

AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time

Authors: Johannes Scheiermann, Wolfgang Konen

Abstract: Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at super human level - are truly impressive, these architectures have the drawback that they require high computational resources. Many researchers are looking for methods that are simila… ▽ More Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at super human level - are truly impressive, these architectures have the drawback that they require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with temporal difference (TD) learning agents. We wrap MCTS for the first time around TD n-tuple networks and we use this wrapping only at test time to create versatile agents that keep at the same time the computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results that this agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other learning-from-scratch algorithms could only defeat Edax up to level 2). △ Less

Submitted 24 September, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: 11 pages, 10 figures

arXiv:2111.14375 [pdf, other]

Final Adaptation Reinforcement Learning for N-Player Games

Authors: Wolfgang Konen, Samineh Bagheri

Abstract: This paper covers n-tuple-based reinforcement learning (RL) algorithms for games. We present new algorithms for TD-, SARSA- and Q-learning which work seamlessly on various games with arbitrary number of players. This is achieved by taking a player-centered view where each player propagates his/her rewards back to previous rounds. We add a new element called Final Adaptation RL (FARL) to all these… ▽ More This paper covers n-tuple-based reinforcement learning (RL) algorithms for games. We present new algorithms for TD-, SARSA- and Q-learning which work seamlessly on various games with arbitrary number of players. This is achieved by taking a player-centered view where each player propagates his/her rewards back to previous rounds. We add a new element called Final Adaptation RL (FARL) to all these algorithms. Our main contribution is that FARL is a vitally important ingredient to achieve success with the player-centered view in various games. We report results on seven board games with 1, 2 and 3 players, including Othello, ConnectFour and Hex. In most cases it is found that FARL is important to learn a near-perfect playing strategy. All algorithms are available in the GBG framework on GitHub. △ Less

Submitted 29 November, 2021; originally announced November 2021.

Comments: 23 pages

arXiv:1907.06508 [pdf, other]

General Board Game Playing for Education and Research in Generic AI Game Learning

Authors: Wolfgang Konen

Abstract: We present a new general board game (GBG) playing and learning framework. GBG defines the common interfaces for board games, game states and their AI agents. It allows one to run competitions of different agents on different games. It standardizes those parts of board game playing and learning that otherwise would be tedious and repetitive parts in coding. GBG is suitable for arbitrary 1-, 2-, ...… ▽ More We present a new general board game (GBG) playing and learning framework. GBG defines the common interfaces for board games, game states and their AI agents. It allows one to run competitions of different agents on different games. It standardizes those parts of board game playing and learning that otherwise would be tedious and repetitive parts in coding. GBG is suitable for arbitrary 1-, 2-, ..., N-player board games. It makes a generic TD($λ$)-n-tuple agent for the first time available to arbitrary games. On various games, TD($λ$)-n-tuple is found to be superior to other generic agents like MCTS. GBG aims at the educational perspective, where it helps students to start faster in the area of game learning. GBG aims as well at the research perspective by collecting a growing set of games and AI agents to assess their strengths and generalization capabilities in meaningful competitions. Initial successful educational and research results are reported. △ Less

Submitted 11 July, 2019; originally announced July 2019.

Comments: 8 pages, for: Conference on Games (CoG), London, 2019. Index Terms: game learning, general game playing, AI, temporal difference learning, board games, n-tuple systems

arXiv:1904.08397 [pdf, other]

SACOBRA with Online Whitening for Solving Optimization Problems with High Conditioning

Authors: Samineh Bagheri, Wolfgang Konen, Thomas Bäck

Abstract: Real-world optimization problems often have expensive objective functions in terms of cost and time. It is desirable to find near-optimal solutions with very few function evaluations. Surrogate-assisted optimizers tend to reduce the required number of function evaluations by replacing the real function with an efficient mathematical model built on few evaluated points. Problems with a high conditi… ▽ More Real-world optimization problems often have expensive objective functions in terms of cost and time. It is desirable to find near-optimal solutions with very few function evaluations. Surrogate-assisted optimizers tend to reduce the required number of function evaluations by replacing the real function with an efficient mathematical model built on few evaluated points. Problems with a high condition number are a challenge for many surrogate-assisted optimizers including SACOBRA. To address such problems we propose a new online whitening operating in the black-box optimization paradigm. We show on a set of high-conditioning functions that online whitening tackles SACOBRA's early stagnation issue and reduces the optimization error by a factor between 10 to 1e12 as compared to the plain SACOBRA, though it imposes many extra function evaluations. Covariance matrix adaptation evolution strategy (CMA-ES) has for very high numbers of function evaluations even lower errors, whereas SACOBRA performs better in the expensive setting (less than 1e03 function evaluations). If we count all parallelizable function evaluations (population evaluation in CMA-ES, online whitening in our approach) as one iteration, then both algorithms have comparable strength even on the long run. This holds for problems with dimension D <= 20. △ Less

Submitted 17 April, 2019; originally announced April 2019.

Comments: 20 pages, 10 figures

arXiv:1512.09251 [pdf, other]

Solving the G-problems in less than 500 iterations: Improved efficient constrained optimization by surrogate modeling and adaptive parameter control

Authors: Samineh Bagheri, Wolfgang Konen, Michael Emmerich, Thomas Bäck

Abstract: Constrained optimization of high-dimensional numerical problems plays an important role in many scientific and industrial applications. Function evaluations in many industrial applications are severely limited and no analytical information about objective function and constraint functions is available. For such expensive black-box optimization tasks, the constraint optimization algorithm COBRA was… ▽ More Constrained optimization of high-dimensional numerical problems plays an important role in many scientific and industrial applications. Function evaluations in many industrial applications are severely limited and no analytical information about objective function and constraint functions is available. For such expensive black-box optimization tasks, the constraint optimization algorithm COBRA was proposed, making use of RBF surrogate modeling for both the objective and the constraint functions. COBRA has shown remarkable success in solving reliably complex benchmark problems in less than 500 function evaluations. Unfortunately, COBRA requires careful adjustment of parameters in order to do so. In this work we present a new self-adjusting algorithm SACOBRA, which is based on COBRA and capable to achieve high-quality results with very few function evaluations and no parameter tuning. It is shown with the help of performance profiles on a set of benchmark problems (G-problems, MOPTA08) that SACOBRA consistently outperforms any COBRA algorithm with fixed parameter setting. We analyze the importance of the several new elements in SACOBRA and find that each element of SACOBRA plays a role to boost up the overall optimization performance. We discuss the reasons behind and get in this way a better understanding of high-quality RBF surrogate modeling. △ Less

Submitted 31 December, 2015; originally announced December 2015.

arXiv:1105.1951 [pdf, other]

Self-configuration from a Machine-Learning Perspective

Authors: Wolfgang Konen

Abstract: The goal of machine learning is to provide solutions which are trained by data or by experience coming from the environment. Many training algorithms exist and some brilliant successes were achieved. But even in structured environments for machine learning (e.g. data mining or board games), most applications beyond the level of toy problems need careful hand-tuning or human ingenuity (i.e. detecti… ▽ More The goal of machine learning is to provide solutions which are trained by data or by experience coming from the environment. Many training algorithms exist and some brilliant successes were achieved. But even in structured environments for machine learning (e.g. data mining or board games), most applications beyond the level of toy problems need careful hand-tuning or human ingenuity (i.e. detection of interesting patterns) or both. We discuss several aspects how self-configuration can help to alleviate these problems. One aspect is the self-configuration by tuning of algorithms, where recent advances have been made in the area of SPO (Sequen- tial Parameter Optimization). Another aspect is the self-configuration by pattern detection or feature construction. Forming multiple features (e.g. random boolean functions) and using algorithms (e.g. random forests) which easily digest many fea- tures can largely increase learning speed. However, a full-fledged theory of feature construction is not yet available and forms a current barrier in machine learning. We discuss several ideas for systematic inclusion of feature construction. This may lead to partly self-configuring machine learning solutions which show robustness, flexibility, and fast learning in potentially changing environments. △ Less

Submitted 5 September, 2011; v1 submitted 10 May, 2011; originally announced May 2011.

Comments: 12 pages, 5 figures, Dagstuhl seminar 11181 "Organic Computing - Design of Self-Organizing Systems", May 2011

Report number: DPA-11181

arXiv:0912.1064 [pdf, other]

On the numeric stability of the SFA implementation sfa-tk

Authors: Wolfgang Konen

Abstract: Slow feature analysis (SFA) is a method for extracting slowly varying features from a quickly varying multidimensional signal. An open source Matlab-implementation sfa-tk makes SFA easily useable. We show here that under certain circumstances, namely when the covariance matrix of the nonlinearly expanded data does not have full rank, this implementation runs into numerical instabilities. We prop… ▽ More Slow feature analysis (SFA) is a method for extracting slowly varying features from a quickly varying multidimensional signal. An open source Matlab-implementation sfa-tk makes SFA easily useable. We show here that under certain circumstances, namely when the covariance matrix of the nonlinearly expanded data does not have full rank, this implementation runs into numerical instabilities. We propse a modified algorithm based on singular value decomposition (SVD) which is free of those instabilities even in the case where the rank of the matrix is only less than 10% of its size. Furthermore we show that an alternative way of handling the numerical problems is to inject a small amount of noise into the multidimensional input signal which can restore a rank-deficient covariance matrix to full rank, however at the price of modifying the original data and the need for noise parameter tuning. △ Less

Submitted 5 December, 2009; originally announced December 2009.

Comments: 12 pages

arXiv:0911.4397 [pdf, other]

How slow is slow? SFA detects signals that are slower than the driving force

Authors: Wolfgang Konen, Patrick Koch

Abstract: Slow feature analysis (SFA) is a method for extracting slowly varying driving forces from quickly varying nonstationary time series. We show here that it is possible for SFA to detect a component which is even slower than the driving force itself (e.g. the envelope of a modulated sine wave). It is shown that it depends on circumstances like the embedding dimension, the time series predictability… ▽ More Slow feature analysis (SFA) is a method for extracting slowly varying driving forces from quickly varying nonstationary time series. We show here that it is possible for SFA to detect a component which is even slower than the driving force itself (e.g. the envelope of a modulated sine wave). It is shown that it depends on circumstances like the embedding dimension, the time series predictability, or the base frequency, whether the driving force itself or a slower subcomponent is detected. We observe a phase transition from one regime to the other and it is the purpose of this work to quantify the influence of various parameters on this phase transition. We conclude that what is percieved as slow by SFA varies and that a more or less fast switching from one regime to the other occurs, perhaps showing some similarity to human perception. △ Less

Submitted 23 November, 2009; originally announced November 2009.

Showing 1–8 of 8 results for author: Konen, W