DOI: 10.1145/3508546.3508598
Research article · Open access

An Overview of the Action Space for Deep Reinforcement Learning

Published: 25 February 2022
Abstract

    In recent years, deep reinforcement learning has gradually been applied to real-world tasks. In the field of control in particular, such as robot control and autonomous driving, it has attracted unprecedented attention. Because different algorithms suit different problems, we survey and analyze state-of-the-art deep reinforcement learning algorithms from the perspective of the action space. We then analyze the differences and connections between discrete, continuous, and discrete-continuous hybrid action spaces, and describe the reinforcement learning algorithms suited to each. Applying reinforcement learning to real-world control problems still poses substantial challenges; we summarize these challenges and discuss how reinforcement learning can be applied appropriately to satellite attitude control tasks.
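    The three action-space types contrasted in the abstract can be illustrated with a minimal, framework-agnostic sketch (the function names below are illustrative, not from the paper): a discrete space yields one of n atomic actions, a continuous space yields a real-valued vector, and a hybrid (parameterised) space pairs a discrete choice with the continuous parameters attached to that choice.

    ```python
    import random

    def sample_discrete(n):
        """Discrete space: one of n atomic actions, e.g. Atari joystick moves."""
        return random.randrange(n)

    def sample_continuous(low, high, dim):
        """Continuous space: a real-valued vector, e.g. joint torques."""
        return [random.uniform(low, high) for _ in range(dim)]

    def sample_hybrid(n, low, high, dim):
        """Hybrid (parameterised) space: a discrete choice k plus the
        continuous parameters attached to that choice."""
        k = sample_discrete(n)
        params = sample_continuous(low, high, dim)
        return k, params

    # Draw one legal action from each kind of space.
    a = sample_discrete(4)
    assert 0 <= a < 4
    v = sample_continuous(-1.0, 1.0, 3)
    assert len(v) == 3 and all(-1.0 <= x <= 1.0 for x in v)
    k, p = sample_hybrid(2, -1.0, 1.0, 2)
    assert k in (0, 1) and len(p) == 2
    ```

    Algorithms such as DQN target the first case, DDPG/SAC the second, and parameterised-action methods (e.g. P-DQN) the third, which is one way to read the survey's organisation.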





    Published In

    ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
    December 2021
    699 pages
    ISBN:9781450385053
    DOI:10.1145/3508546

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. action space
    2. deep reinforcement learning
    3. satellite attitude control

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACAI'21

    Acceptance Rates

    Overall Acceptance Rate 173 of 395 submissions, 44%

    Article Metrics

    • Downloads (Last 12 months)8,923
    • Downloads (Last 6 weeks)857
    Reflects downloads up to 26 Jul 2024


    Cited By

    • (2024) AK-MADDPG-Based Antijamming Strategy Design Method for Frequency Agile Radar. Sensors 24(11), 3445. DOI: 10.3390/s24113445. Online publication date: 27-May-2024
    • (2024) Novel Architecture of Energy Management Systems Based on Deep Reinforcement Learning in Microgrid. IEEE Transactions on Smart Grid 15(2), 1646-1658. DOI: 10.1109/TSG.2023.3317096. Online publication date: Mar-2024
    • (2024) Neurosymbolic Reinforcement Learning and Planning: A Survey. IEEE Transactions on Artificial Intelligence 5(5), 1939-1953. DOI: 10.1109/TAI.2023.3311428. Online publication date: May-2024
    • (2024) Enhancing Indoor Localization With Semi-Crowdsourced Fingerprinting and GAN-Based Data Augmentation. IEEE Internet of Things Journal 11(7), 11945-11959. DOI: 10.1109/JIOT.2023.3331705. Online publication date: 1-Apr-2024
    • (2024) Deep Reinforcement Learning-based scheduling for optimizing system load and response time in edge and fog computing environments. Future Generation Computer Systems 152(C), 55-69. DOI: 10.1016/j.future.2023.10.012. Online publication date: 4-Mar-2024
    • (2024) Optimizing Secrecy Energy Efficiency in RIS-assisted MISO systems using Deep Reinforcement Learning. Computer Communications 217, 126-133. DOI: 10.1016/j.comcom.2024.01.020. Online publication date: Mar-2024
    • (2024) Enhancement of power quality in three-phase GC solar photovoltaics. Electrical Engineering. DOI: 10.1007/s00202-024-02304-z. Online publication date: 7-Mar-2024
    • (2023) Mobility-Aware Resource Allocation in IoRT Network for Post-Disaster Communications with Parameterized Reinforcement Learning. Sensors 23(14), 6448. DOI: 10.3390/s23146448. Online publication date: 17-Jul-2023
    • (2023) Reinforcement Learning based Sequential Multi-Robot Task Allocation Considering Weight of Objects and Payload of Robots. 2023 23rd International Conference on Control, Automation and Systems (ICCAS), 1858-1861. DOI: 10.23919/ICCAS59377.2023.10316925. Online publication date: 17-Oct-2023
    • (2023) A Deep Reinforcement Learning Approach for UAV Path Planning Incorporating Vehicle Dynamics with Acceleration Control. Unmanned Systems 12(03), 477-498. DOI: 10.1142/S2301385024420044. Online publication date: 16-Nov-2023
