
Dynamic analysis of multiagent Q-learning with ε-greedy exploration

Published: 14 June 2009
DOI: 10.1145/1553374.1553422

Abstract

The development of mechanisms to understand and model the expected behaviour of multiagent learners is becoming increasingly important as the area rapidly finds application in a variety of domains. In this paper we present a framework to model the behaviour of Q-learning agents using the ε-greedy exploration mechanism. To this end, we analyse a continuous-time version of the Q-learning update rule and study how the presence of other agents and the ε-greedy mechanism affect it. We then model the problem as a system of difference equations, which we use to analyse theoretically the expected behaviour of the agents. The applicability of the framework is tested through experiments on typical games selected from the literature.
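
As a concrete point of reference for the setting the abstract describes, below is a minimal simulation sketch of two stateless Q-learners with ε-greedy exploration playing a repeated 2×2 matrix game. The Prisoner's Dilemma payoffs, the learning rate α, and the exploration rate ε are illustrative assumptions rather than values taken from the paper, and the update is the stateless form Q(a) ← Q(a) + α(r − Q(a)) commonly used for Q-learning in repeated normal-form games.

```python
import numpy as np

# Sketch (assumptions labelled): two stateless Q-learning agents with
# epsilon-greedy exploration in a repeated 2x2 game. Payoffs, alpha and
# epsilon are illustrative, not taken from the paper.

rng = np.random.default_rng(0)

# Row player's payoffs (Prisoner's Dilemma, actions: 0=cooperate, 1=defect);
# the column player's payoffs are the transpose in this symmetric game.
payoff_row = np.array([[3.0, 0.0],
                       [5.0, 1.0]])
payoff_col = payoff_row.T

alpha = 0.1      # learning rate (assumed)
epsilon = 0.1    # exploration rate (assumed)
steps = 10_000

q_row = np.zeros(2)  # one Q-value per action, per agent
q_col = np.zeros(2)

def eps_greedy(q):
    # With probability epsilon play a uniformly random action,
    # otherwise play the current greedy action.
    if rng.random() < epsilon:
        return int(rng.integers(2))
    return int(np.argmax(q))

for _ in range(steps):
    a_r, a_c = eps_greedy(q_row), eps_greedy(q_col)
    r_r, r_c = payoff_row[a_r, a_c], payoff_col[a_r, a_c]
    # Stateless Q-learning update: Q(a) <- Q(a) + alpha * (r - Q(a)).
    q_row[a_r] += alpha * (r_r - q_row[a_r])
    q_col[a_c] += alpha * (r_c - q_col[a_c])

print("row player's Q-values:", q_row)
print("column player's Q-values:", q_col)
```

Note that the paper's framework does not simulate such runs step by step; it derives a system of difference equations predicting the expected trajectory of the Q-values, against which experiments of this kind can then be compared.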


Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)
