DOI: 10.5555/3305381.3305399
Modular multitask reinforcement learning with policy sketches

Published: 06 August 2017
Abstract

    We describe a framework for multitask deep reinforcement learning guided by policy sketches. Sketches annotate tasks with sequences of named subtasks, providing information about high-level structural relationships among tasks but not how to implement them—specifically not providing the detailed guidance used by much previous work on learning policy abstractions for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic motivations). To learn from sketches, we present a model that associates every subtask with a modular subpolicy, and jointly maximizes reward over full task-specific policies by tying parameters across shared subpolicies. Optimization is accomplished via a decoupled actor-critic training objective that facilitates learning common behaviors from multiple dissimilar reward functions. We evaluate the effectiveness of our approach in three environments featuring both discrete and continuous control, and with sparse rewards that can be obtained only after completing a number of high-level sub-goals. Experiments show that using our approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.
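    A rough structural illustration (not the authors' code): in the sketch below, each named subtask maps to one subpolicy module whose parameters are shared by every task whose sketch mentions it, and a task-specific policy is executed by running the sketch's subpolicies in order. The observation size, action count, task and subtask names, and the convention that a designated STOP action ends a subtask are all assumptions made for this example; the paper trains such modules with a decoupled actor-critic objective rather than the untrained stand-in shown here.

        # Minimal sketch of policy-sketch-style modular policies (assumptions noted above).
        import numpy as np

        rng = np.random.default_rng(0)

        OBS_DIM, N_ACTIONS = 8, 5      # assumed sizes for the example
        STOP = N_ACTIONS - 1           # assume the last action means "subtask finished"

        def softmax(x):
            z = np.exp(x - x.max())
            return z / z.sum()

        class Subpolicy:
            """One module per named subtask; shared by every task that uses the name."""
            def __init__(self):
                self.W = 0.01 * rng.standard_normal((N_ACTIONS, OBS_DIM))

            def act(self, obs):
                return rng.choice(N_ACTIONS, p=softmax(self.W @ obs))

        # Sketches annotate tasks only with sequences of named subtasks (hypothetical names).
        sketches = {
            "make_planks": ["get_wood", "use_workbench"],
            "make_sticks": ["get_wood", "use_toolshed"],
        }

        # Parameter tying: one module per distinct subtask name, reused across tasks.
        modules = {name: Subpolicy() for sk in sketches.values() for name in sk}

        def run_task(task, env_step, obs, max_steps=100):
            """Run a task policy by executing its sketch's subpolicies in sequence."""
            total_reward = 0.0
            for name in sketches[task]:
                for _ in range(max_steps):
                    a = modules[name].act(obs)
                    if a == STOP:          # current subpolicy signals completion
                        break
                    obs, r = env_step(obs, a)
                    total_reward += r
            return total_reward

        # Toy stand-in environment: random observations, zero reward.
        dummy_step = lambda obs, a: (rng.standard_normal(OBS_DIM), 0.0)
        print(run_task("make_planks", dummy_step, rng.standard_normal(OBS_DIM)))

    Because "get_wood" is a single shared module in this sketch, experience from both tasks updates the same parameters, which is what allows the learned primitives to be recombined for new tasks.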



      Published In

      ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
      August 2017
      4208 pages

      Publisher

      JMLR.org

