DOI: 10.5555/3635637.3663034

Relaxed Exploration Constrained Reinforcement Learning

Published: 06 May 2024

Abstract

This research introduces a novel setting for reinforcement learning with constraints, termed Relaxed Exploration Constrained Reinforcement Learning (RECRL). As in standard constrained reinforcement learning (CRL), the objective in RECRL is to discover a policy that maximizes the environmental return while adhering to a predefined set of constraints. However, in some real-world settings the agent can be trained without strict adherence to the constraints, as long as it adheres to them once deployed. To model such settings, we introduce RECRL, which explicitly incorporates an initial training phase in which the constraints are relaxed, enabling the agent to explore the environment more freely. Subsequently, during deployment, the agent is obligated to fully satisfy all constraints. To address RECRL problems, we introduce a curriculum-based approach called CLiC, designed to enhance the exploration of existing CRL algorithms during the training phase and to facilitate convergence to a policy that satisfies the full set of constraints by the end of training. Empirical evaluations demonstrate that CLiC yields policies with significantly higher returns during deployment than training solely under the strict set of constraints. The code is available at https://github.com/Shperb/RECRL.
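
The abstract does not spell out CLiC itself, but the core RECRL idea (train under relaxed constraints, then tighten them so the final policy satisfies the full constraint set at deployment) can be illustrated with a toy constraint-budget schedule. The sketch below is an assumption-laden illustration and not the authors' method: the function name relaxed_cost_limit, the linear annealing, and the example cost limits are all hypothetical.

    # Illustrative sketch only, not the paper's CLiC algorithm: a simple
    # linear curriculum over the constraint (cost) budget, showing the RECRL
    # idea of training under a relaxed limit that is annealed toward the
    # strict limit the deployed policy must satisfy.

    def relaxed_cost_limit(step: int, total_steps: int,
                           relaxed_limit: float, deployment_limit: float) -> float:
        """Cost limit to enforce at a given training step.

        Starts at `relaxed_limit` (loose constraints, freer exploration) and
        anneals linearly to `deployment_limit` (the strict constraints the
        deployed policy must satisfy) by the end of training.
        """
        frac = min(step / max(total_steps, 1), 1.0)
        return relaxed_limit + frac * (deployment_limit - relaxed_limit)

    if __name__ == "__main__":
        # Example: a 1M-step run whose cost budget shrinks from 25 to 1.
        for step in (0, 250_000, 500_000, 1_000_000):
            print(step, relaxed_cost_limit(step, 1_000_000, 25.0, 1.0))

In such a scheme the schedule would be plugged into an existing CRL learner (e.g., a Lagrangian or CPO-style method) as the cost limit enforced at each training step, so that by the end of training the limit coincides with the deployment constraint.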

Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024, 2898 pages
ISBN: 9798400704864
Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Author Tags

1. constrained reinforcement learning
2. curriculum learning

Qualifiers

• Research-article

Funding Sources

• NSF
• ARO
• ONR
• Israel Science Foundation (ISF)
• Israel's Ministry of Innovation Science and Technology (MOST)

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%
