DOI: 10.5555/3635637.3663034

Relaxed Exploration Constrained Reinforcement Learning

Published: 06 May 2024

Abstract

This research introduces a novel setting for reinforcement learning with constraints, termed Relaxed Exploration Constrained Reinforcement Learning (RECRL). As in standard constrained reinforcement learning (CRL), the objective in RECRL is to discover a policy that maximizes the environmental return while adhering to a predefined set of constraints. However, in some real-world settings the agent can be trained without strict adherence to the constraints, as long as it adheres to them once deployed. To model such settings, we introduce RECRL, which explicitly incorporates an initial training phase in which the constraints are relaxed, enabling the agent to explore the environment more freely. Subsequently, during deployment, the agent is obligated to fully satisfy all constraints. To address RECRL problems, we introduce a curriculum-based approach called CLiC, designed to enhance the exploration of existing CRL algorithms during the training phase and to facilitate convergence to a policy that satisfies the full set of constraints by the end of training. Empirical evaluations demonstrate that CLiC yields policies with significantly higher returns during deployment than training solely under the strict set of constraints. The code is available at https://github.com/Shperb/RECRL.
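
The abstract does not spell out CLiC itself, but the core RECRL idea (train under relaxed constraints, then tighten them so the final policy satisfies the full constraint set at deployment) can be illustrated with a toy constraint-budget schedule. The sketch below is an assumption-laden illustration and not the authors' method: the function name relaxed_cost_limit, the linear annealing, and the example cost limits are all hypothetical.

    # Illustrative sketch only, not the paper's CLiC algorithm: a simple
    # linear curriculum over the constraint (cost) budget, showing the RECRL
    # idea of training under a relaxed limit that is annealed toward the
    # strict limit the deployed policy must satisfy.

    def relaxed_cost_limit(step: int, total_steps: int,
                           relaxed_limit: float, deployment_limit: float) -> float:
        """Cost limit to enforce at a given training step.

        Starts at `relaxed_limit` (loose constraints, freer exploration) and
        anneals linearly to `deployment_limit` (the strict constraints the
        deployed policy must satisfy) by the end of training.
        """
        frac = min(step / max(total_steps, 1), 1.0)
        return relaxed_limit + frac * (deployment_limit - relaxed_limit)

    if __name__ == "__main__":
        # Example: a 1M-step run whose cost budget shrinks from 25 to 1.
        for step in (0, 250_000, 500_000, 1_000_000):
            print(step, relaxed_cost_limit(step, 1_000_000, 25.0, 1.0))

In such a scheme the schedule would be plugged into an existing CRL learner (e.g., a Lagrangian or CPO-style method) as the cost limit enforced at each training step, so that by the end of training the limit coincides with the deployment constraint.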

Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024, 2898 pages
ISBN: 9798400704864
Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Author Tags

1. constrained reinforcement learning
2. curriculum learning

Qualifiers

• Research-article

Funding Sources

• NSF
• ARO
• ONR
• Israel Science Foundation (ISF)
• Israel's Ministry of Innovation Science and Technology (MOST)

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%
