DOI: 10.1145/3508546.3508598
Research article · Open access

An Overview of the Action Space for Deep Reinforcement Learning

Published: 25 February 2022
Abstract

    In recent years, deep reinforcement learning has gradually been applied to real-world tasks. In the field of control in particular, such as robot control and autonomous driving, it has attracted unprecedented attention. Because different algorithms suit different problems, we survey and analyze state-of-the-art deep reinforcement learning algorithms from the perspective of the action space. We then analyze the differences and connections between discrete, continuous, and discrete-continuous hybrid action spaces, and describe the reinforcement learning algorithms suited to each. Applying reinforcement learning to real-world control problems still poses substantial challenges; we summarize these challenges and discuss how reinforcement learning can be applied appropriately to satellite attitude control tasks.
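    The three action-space types contrasted in the abstract can be illustrated with a minimal, framework-agnostic sketch (the function names below are illustrative, not from the paper): a discrete space yields one of n atomic actions, a continuous space yields a real-valued vector, and a hybrid (parameterised) space pairs a discrete choice with the continuous parameters attached to that choice.

    ```python
    import random

    def sample_discrete(n):
        """Discrete space: one of n atomic actions, e.g. Atari joystick moves."""
        return random.randrange(n)

    def sample_continuous(low, high, dim):
        """Continuous space: a real-valued vector, e.g. joint torques."""
        return [random.uniform(low, high) for _ in range(dim)]

    def sample_hybrid(n, low, high, dim):
        """Hybrid (parameterised) space: a discrete choice k plus the
        continuous parameters attached to that choice."""
        k = sample_discrete(n)
        params = sample_continuous(low, high, dim)
        return k, params

    # Draw one legal action from each kind of space.
    a = sample_discrete(4)
    assert 0 <= a < 4
    v = sample_continuous(-1.0, 1.0, 3)
    assert len(v) == 3 and all(-1.0 <= x <= 1.0 for x in v)
    k, p = sample_hybrid(2, -1.0, 1.0, 2)
    assert k in (0, 1) and len(p) == 2
    ```

    Algorithms such as DQN target the first case, DDPG/SAC the second, and parameterised-action methods (e.g. P-DQN) the third, which is one way to read the survey's organisation.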





    Published In

    ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
    December 2021
    699 pages
    ISBN:9781450385053
    DOI:10.1145/3508546

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. action space
    2. deep reinforcement learning
    3. satellite attitude control

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACAI'21

    Acceptance Rates

    Overall Acceptance Rate 173 of 395 submissions, 44%

    Article Metrics

    • Downloads (Last 12 months)8,923
    • Downloads (Last 6 weeks)857
    Reflects downloads up to 26 Jul 2024


    Cited By

    • (2024) AK-MADDPG-Based Antijamming Strategy Design Method for Frequency Agile Radar. Sensors 24(11), 3445. DOI: 10.3390/s24113445. Online publication date: 27-May-2024
    • (2024) Novel Architecture of Energy Management Systems Based on Deep Reinforcement Learning in Microgrid. IEEE Transactions on Smart Grid 15(2), 1646-1658. DOI: 10.1109/TSG.2023.3317096. Online publication date: Mar-2024
    • (2024) Neurosymbolic Reinforcement Learning and Planning: A Survey. IEEE Transactions on Artificial Intelligence 5(5), 1939-1953. DOI: 10.1109/TAI.2023.3311428. Online publication date: May-2024
    • (2024) Enhancing Indoor Localization With Semi-Crowdsourced Fingerprinting and GAN-Based Data Augmentation. IEEE Internet of Things Journal 11(7), 11945-11959. DOI: 10.1109/JIOT.2023.3331705. Online publication date: 1-Apr-2024
    • (2024) Deep Reinforcement Learning-based scheduling for optimizing system load and response time in edge and fog computing environments. Future Generation Computer Systems 152(C), 55-69. DOI: 10.1016/j.future.2023.10.012. Online publication date: 4-Mar-2024
    • (2024) Optimizing Secrecy Energy Efficiency in RIS-assisted MISO systems using Deep Reinforcement Learning. Computer Communications 217, 126-133. DOI: 10.1016/j.comcom.2024.01.020. Online publication date: Mar-2024
    • (2024) Enhancement of power quality in three-phase GC solar photovoltaics. Electrical Engineering. DOI: 10.1007/s00202-024-02304-z. Online publication date: 7-Mar-2024
    • (2023) Mobility-Aware Resource Allocation in IoRT Network for Post-Disaster Communications with Parameterized Reinforcement Learning. Sensors 23(14), 6448. DOI: 10.3390/s23146448. Online publication date: 17-Jul-2023
    • (2023) Reinforcement Learning based Sequential Multi-Robot Task Allocation Considering Weight of Objects and Payload of Robots. 2023 23rd International Conference on Control, Automation and Systems (ICCAS), 1858-1861. DOI: 10.23919/ICCAS59377.2023.10316925. Online publication date: 17-Oct-2023
    • (2023) A Deep Reinforcement Learning Approach for UAV Path Planning Incorporating Vehicle Dynamics with Acceleration Control. Unmanned Systems 12(03), 477-498. DOI: 10.1142/S2301385024420044. Online publication date: 16-Nov-2023
