DOI: 10.5555/3305381.3305399
Modular multitask reinforcement learning with policy sketches

Published: 06 August 2017
Abstract

    We describe a framework for multitask deep reinforcement learning guided by policy sketches. Sketches annotate tasks with sequences of named subtasks, providing information about high-level structural relationships among tasks but not how to implement them—specifically not providing the detailed guidance used by much previous work on learning policy abstractions for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic motivations). To learn from sketches, we present a model that associates every subtask with a modular subpolicy, and jointly maximizes reward over full task-specific policies by tying parameters across shared subpolicies. Optimization is accomplished via a decoupled actor-critic training objective that facilitates learning common behaviors from multiple dissimilar reward functions. We evaluate the effectiveness of our approach in three environments featuring both discrete and continuous control, and with sparse rewards that can be obtained only after completing a number of high-level sub-goals. Experiments show that using our approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.
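    A rough structural illustration (not the authors' code): in the sketch below, each named subtask maps to one subpolicy module whose parameters are shared by every task whose sketch mentions it, and a task-specific policy is executed by running the sketch's subpolicies in order. The observation size, action count, task and subtask names, and the convention that a designated STOP action ends a subtask are all assumptions made for this example; the paper trains such modules with a decoupled actor-critic objective rather than the untrained stand-in shown here.

        # Minimal sketch of policy-sketch-style modular policies (assumptions noted above).
        import numpy as np

        rng = np.random.default_rng(0)

        OBS_DIM, N_ACTIONS = 8, 5      # assumed sizes for the example
        STOP = N_ACTIONS - 1           # assume the last action means "subtask finished"

        def softmax(x):
            z = np.exp(x - x.max())
            return z / z.sum()

        class Subpolicy:
            """One module per named subtask; shared by every task that uses the name."""
            def __init__(self):
                self.W = 0.01 * rng.standard_normal((N_ACTIONS, OBS_DIM))

            def act(self, obs):
                return rng.choice(N_ACTIONS, p=softmax(self.W @ obs))

        # Sketches annotate tasks only with sequences of named subtasks (hypothetical names).
        sketches = {
            "make_planks": ["get_wood", "use_workbench"],
            "make_sticks": ["get_wood", "use_toolshed"],
        }

        # Parameter tying: one module per distinct subtask name, reused across tasks.
        modules = {name: Subpolicy() for sk in sketches.values() for name in sk}

        def run_task(task, env_step, obs, max_steps=100):
            """Run a task policy by executing its sketch's subpolicies in sequence."""
            total_reward = 0.0
            for name in sketches[task]:
                for _ in range(max_steps):
                    a = modules[name].act(obs)
                    if a == STOP:          # current subpolicy signals completion
                        break
                    obs, r = env_step(obs, a)
                    total_reward += r
            return total_reward

        # Toy stand-in environment: random observations, zero reward.
        dummy_step = lambda obs, a: (rng.standard_normal(OBS_DIM), 0.0)
        print(run_task("make_planks", dummy_step, rng.standard_normal(OBS_DIM)))

    Because "get_wood" is a single shared module in this sketch, experience from both tasks updates the same parameters, which is what allows the learned primitives to be recombined for new tasks.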



      Published In

      ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
      August 2017
      4208 pages

      Publisher

      JMLR.org

