DOI: 10.5555/2615731.2615761
Research article

Potential-based difference rewards for multiagent reinforcement learning

Published: 05 May 2014

Abstract

Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent's individual contribution to the system's performance. Potential-based reward shaping has been proven not to alter the Nash equilibria of the system, but requires domain-specific knowledge. This paper introduces two novel reward functions that combine these methods to leverage the benefits of both.
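For context, the standard definitions behind these two ingredients (well established in the literature, though not restated in this abstract) are: for a joint action z with global reward G(z), agent i's difference reward is D_i(z) = G(z) - G(z_{-i}), where z_{-i} is the counterfactual system with agent i removed or replaced by a default action; potential-based reward shaping instead adds F(s, s') = γΦ(s') - Φ(s) to the environment reward, for a potential function Φ over states and discount factor γ.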
Using the difference reward's Counterfactual as Potential (CaP) allows the application of potential-based reward shaping to a wide range of multiagent systems without the need for domain-specific knowledge, whilst still maintaining the theoretical guarantee of consistent Nash equilibria.
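The abstract does not spell out CaP's exact construction; a plausible reading (an assumption here, not a quotation from the paper) is that the counterfactual term of the difference reward is used directly as the potential, e.g. Φ_i(s) = G(z_{-i}), so that the shaping term γΦ_i(s') - Φ_i(s) is derived from the system's own reward signal rather than from hand-designed heuristics. That would account for both claims: no domain-specific knowledge is needed, and since the result is still a potential-based shaping, the consistent-Nash-equilibria guarantee carries over.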
Alternatively, Difference Rewards incorporating Potential-Based Reward Shaping (DRiP) uses potential-based reward shaping to further shape difference rewards. By exploiting prior knowledge of a problem domain, this paper demonstrates that agents using this approach can converge up to 23.8 times faster than, or reach joint policies up to 196% better than, agents using difference rewards alone.
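A minimal Python sketch of how these shaped rewards might be composed, assuming callable access to the global reward G and its counterfactual; all names below (difference_reward, shaping_term, drip_reward, GAMMA) are illustrative, not taken from the paper:

    GAMMA = 0.99  # illustrative discount factor

    def difference_reward(G, z, z_minus_i):
        # D_i(z) = G(z) - G(z_{-i}): agent i's contribution to system performance.
        return G(z) - G(z_minus_i)

    def shaping_term(phi, s, s_next, gamma=GAMMA):
        # Potential-based shaping F(s, s') = gamma * phi(s') - phi(s).
        return gamma * phi(s_next) - phi(s)

    def drip_reward(G, phi, z, z_minus_i, s, s_next):
        # DRiP, as the abstract describes it: a difference reward further
        # shaped by potential-based reward shaping with a domain potential phi.
        return difference_reward(G, z, z_minus_i) + shaping_term(phi, s, s_next)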




Published In

AAMAS '14: Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems
May 2014, 1774 pages
ISBN: 9781450327381
Sponsor: IFAAMAS
Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

1. multiagent reinforcement learning
2. reward shaping


Acceptance Rates

AAMAS '14 Paper Acceptance Rate: 169 of 709 submissions, 24%
Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%
