Neural-network-based parameter tuning for multi-agent simulation using deep reinforcement learning

Published: 03 August 2023

Abstract

This study proposes a new, efficient parameter tuning method for multi-agent simulation (MAS) based on deep reinforcement learning. MAS is a useful tool for the social sciences, but realistic simulations are hard to achieve because parameter tuning is computationally expensive. To address this issue, and to improve compatibility with the tuning task, our proposed method employs actor-critic deep reinforcement learning algorithms, namely deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). In addition to versions of DDPG and SAC customized for our task, we propose three additional components that stabilize learning: an action converter (DDPG only), a redundant full-neural-network actor, and a seed fixer. For experimental verification, we employ a parameter tuning task in an artificial financial market simulation, comparing our proposed model, its ablations, and a Bayesian-estimation-based baseline. The results demonstrate that our model outperforms the baseline in tuning performance and that the additional components of the proposed method are essential. Moreover, the critic of our model works effectively as a surrogate model, that is, an approximate function of the simulation, which allows the actor to tune the parameters appropriately. We have also found that the SAC-based method exhibits the best and fastest convergence, which we attribute to the high exploration capability of SAC.
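As a rough illustration of the approach the abstract describes (not the authors' implementation), the sketch below shows a minimal DDPG-style tuning loop in which the actor proposes a vector of simulation parameters and the critic learns a surrogate of the simulation's reward surface. Here `run_simulation` is a hypothetical toy stand-in for one MAS run, `PARAM_DIM` and the network sizes are arbitrary assumptions, episodes are a single step, and the paper's action converter, redundant actor, and seed fixer are all omitted.

```python
# Minimal sketch of DRL-based parameter tuning (assumptions noted above).
import torch
import torch.nn as nn

PARAM_DIM = 2  # number of simulation parameters to tune (assumption)

def run_simulation(params: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for one MAS run: reward is higher when the
    # simulated statistics match the target. Here: a quadratic bowl.
    target = torch.tensor([0.3, -0.5])
    return -((params - target) ** 2).sum(dim=-1, keepdim=True)

actor = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                      nn.Linear(64, PARAM_DIM), nn.Tanh())  # params in [-1, 1]
critic = nn.Sequential(nn.Linear(PARAM_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))                    # reward surrogate
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

state = torch.ones(1, 1)  # dummy constant state: each episode is one step
for step in range(2000):
    # Explore: perturb the actor's proposal, run the simulation once.
    with torch.no_grad():
        params = (actor(state) + 0.1 * torch.randn(1, PARAM_DIM)).clamp(-1, 1)
    reward = run_simulation(params)

    # Critic regression: fit Q(params) to the observed reward
    # (one-step episodes, so there is no bootstrapped next-state term).
    critic_loss = ((critic(params) - reward) ** 2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor ascent through the critic, i.e. through the surrogate model.
    actor_loss = -critic(actor(state)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

print("tuned parameters:", actor(state).detach().numpy())
```

After training, the critic approximates the simulation's response to the parameters, which is the surrogate-model role the abstract highlights; the actor then only needs to climb that learned surface rather than query the expensive simulation at every step.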

Published In

World Wide Web, Volume 26, Issue 5
Sep 2023
1444 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 03 August 2023
Accepted: 12 July 2023
Revision received: 27 June 2023
Received: 10 March 2023

Author Tags

  1. Multi-agent simulation
  2. Parameter tuning
  3. Deep reinforcement learning
  4. Artificial financial markets

Qualifiers

  • Research-article

Funding Sources

  • The University of Tokyo
