DOI: 10.5555/3635637.3663303

A Summary of Online Markov Decision Processes with Non-oblivious Strategic Adversary

Published: 06 May 2024

Abstract

We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, still applies and achieves a policy regret bound of $O(\sqrt{T \log(L)} + \tau^2 \sqrt{T \log(|A|)})$, where $L$ is the size of the adversary's pure strategy set and $|A|$ denotes the size of the agent's action space. Considering real-world games where the support size of a Nash equilibrium (NE) is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of $O(\sqrt{T \log(L)} + \tau^2 \sqrt{T k \log(k)})$, where $k$ depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and can thus solve games with prohibitively large action spaces. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of a no-external regret adversary in OMDPs, we introduce an algorithm that achieves last-round convergence to a NE. To the best of our knowledge, this is the first work to obtain a last-iterate result in OMDPs.
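
As a rough illustration of what "no-external regret" means for a player with L pure strategies, the sketch below runs Hedge (multiplicative weights), one standard no-external-regret method, on a sequence of loss vectors in [0, 1]. The abstract does not say which algorithm the adversary uses, so the choice of Hedge, the function names `hedge_regret` and `hedge_regret_bound`, and the random losses are all assumptions made for illustration only. With a known horizon T and the tuning shown, the textbook guarantee is a regret of at most sqrt(T ln(L) / 2), which has the same sqrt(T log(L)) shape as the first term of the bounds above.

```python
import math
import random


def hedge_regret(loss_rounds, n_actions):
    """Run Hedge on loss vectors in [0, 1]; return external regret.

    Regret = learner's cumulative expected loss minus the cumulative
    loss of the best fixed action in hindsight.
    """
    horizon = len(loss_rounds)
    # Standard learning rate when the horizon is known in advance.
    eta = math.sqrt(8.0 * math.log(n_actions) / horizon)
    cumulative = [0.0] * n_actions  # cumulative loss of each action
    expected_loss = 0.0
    for losses in loss_rounds:
        # Weights are exp(-eta * cumulative loss); subtracting the
        # minimum keeps the exponentials numerically well behaved.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
    return expected_loss - min(cumulative)


def hedge_regret_bound(horizon, n_actions):
    """Textbook upper bound sqrt(T ln(L) / 2) for the tuning above."""
    return math.sqrt(horizon * math.log(n_actions) / 2.0)


# Hypothetical usage: 5 pure strategies, 1000 rounds of random losses.
random.seed(0)
T, L = 1000, 5
rounds = [[random.random() for _ in range(L)] for _ in range(T)]
regret = hedge_regret(rounds, L)
print(regret <= hedge_regret_bound(T, L))
```

Because the bound grows like sqrt(T), the average regret per round shrinks toward zero as T grows, which is the sense in which such an algorithm has "no regret".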

    Published In

    AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
    May 2024
    2898 pages
    ISBN:9798400704864

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Author Tags

    1. non-oblivious adversary
    2. online Markov decision processes

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
