
Hierarchical Reinforcement Learning: A Comprehensive Survey

Published: 05 June 2021

Abstract

Hierarchical Reinforcement Learning (HRL) enables the autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. In recent years, the landscape of HRL research has grown considerably, resulting in a wide variety of approaches. A comprehensive overview of this landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is organized according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate future research in HRL. Furthermore, we outline a few suitable task domains for evaluating HRL approaches and a few interesting examples of practical applications of HRL in the Supplementary Material.

Supplementary Material

a109-pateria-suppl.pdf (pateria.zip)
Supplemental movie, appendix, image, and software files for "Hierarchical Reinforcement Learning: A Comprehensive Survey"

    Published In

ACM Computing Surveys, Volume 54, Issue 5
June 2022
719 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3467690

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 June 2021
    Accepted: 01 February 2021
    Revised: 01 February 2021
    Received: 01 July 2020
    Published in CSUR Volume 54, Issue 5


    Author Tags

    1. Hierarchical reinforcement learning
    2. hierarchical reinforcement learning survey
    3. hierarchical reinforcement learning taxonomy
    4. skill discovery
    5. subtask discovery

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier-1
    • National Research Foundation, Singapore, under its AI Singapore Programme (AISG)


    Article Metrics

    • Downloads (Last 12 months): 2,695
    • Downloads (Last 6 weeks): 231
    Reflects downloads up to 30 Aug 2024

    Cited By

    • (2025) Understanding world models through multi-step pruning policy via reinforcement learning. Information Sciences, 686, 121361. DOI: 10.1016/j.ins.2024.121361. Online publication date: Jan-2025.
    • (2024) HLG: Bridging Human Heuristic Knowledge and Deep Reinforcement Learning for Optimal Agent Performance. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2189-2191. DOI: 10.5555/3635637.3663103. Online publication date: 6-May-2024.
    • (2024) Mixed-Initiative Bayesian Sub-Goal Optimization in Hierarchical Reinforcement Learning. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 1328-1336. DOI: 10.5555/3635637.3662991. Online publication date: 6-May-2024.
    • (2024) How to Design Reinforcement Learning Methods for the Edge: An Integrated Approach toward Intelligent Decision Making. Electronics, 13(7), 1281. DOI: 10.3390/electronics13071281. Online publication date: 29-Mar-2024.
    • (2024) Transformer in reinforcement learning for decision-making: a survey. Frontiers of Information Technology & Electronic Engineering, 25(6), 763-790. DOI: 10.1631/FITEE.2300548. Online publication date: 5-Jul-2024.
    • (2024) The successor representation subserves hierarchical abstraction for goal-directed behavior. PLOS Computational Biology, 20(2), e1011312. DOI: 10.1371/journal.pcbi.1011312. Online publication date: 20-Feb-2024.
    • (2024) Personalised Multi-modal Interactive Recommendation with Hierarchical State Representations. ACM Transactions on Recommender Systems, 2(3), 1-25. DOI: 10.1145/3651169. Online publication date: 5-Jun-2024.
    • (2024) Bayesian Strategy Networks Based Soft Actor-Critic Learning. ACM Transactions on Intelligent Systems and Technology, 15(3), 1-24. DOI: 10.1145/3643862. Online publication date: 29-Mar-2024.
    • (2024) Faster MIL-based Subgoal Identification for Reinforcement Learning by Tuning Fewer Hyperparameters. ACM Transactions on Autonomous and Adaptive Systems, 19(2), 1-29. DOI: 10.1145/3643852. Online publication date: 20-Apr-2024.
    • (2024) Item-Difficulty-Aware Learning Path Recommendation: From a Real Walking Perspective. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4167-4178. DOI: 10.1145/3637528.3671947. Online publication date: 25-Aug-2024.
