DOI: 10.5555/3618408.3619620

RLang: a declarative language for describing partial world knowledge to reinforcement learning agents

Published: 23 July 2023

Abstract

We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to an RL agent. Unlike existing RL DSLs that ground to single elements of a decision-making formalism (e.g., the reward function or policy), RLang can specify information about every element of a Markov decision process. We define precise syntax and grounding semantics for RLang, and provide a parser that grounds RLang programs to an algorithm-agnostic partial world model and policy that can be exploited by an RL agent. We provide a series of example RLang programs demonstrating how different RL methods can exploit the resulting knowledge, encompassing model-free and model-based tabular algorithms, policy gradient and value-based methods, hierarchical approaches, and deep methods.
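
The abstract describes a pipeline in which declarative statements are grounded to an algorithm-agnostic partial world model and partial policy that a learning algorithm can then consume. As a purely illustrative sketch of that idea, and not RLang syntax or the released parser's API, the following Python snippet shows how a tabular Q-learning agent might exploit advice that covers only part of a gridworld; `policy_hint`, `reward_hint`, and the environment are hypothetical stand-ins for grounded knowledge.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
GOAL = (4, 4)

def step(state, action):
    """Deterministic 5x5 gridworld dynamics (hypothetical environment)."""
    dx, dy = MOVES[action]
    nxt = (min(4, max(0, state[0] + dx)), min(4, max(0, state[1] + dy)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# Stand-ins for grounded advice: a partial policy and a partial reward model,
# defined on only some state-action pairs and returning None elsewhere.
def policy_hint(state):
    x, _ = state
    return "right" if x < 4 else None  # advice covers only part of the state space

def reward_hint(state, action):
    dx, dy = MOVES[action]
    nxt = (min(4, max(0, state[0] + dx)), min(4, max(0, state[1] + dy)))
    return 1.0 if nxt == GOAL else None  # reward known only near the goal

# Tabular Q-learning that exploits both pieces of partial knowledge:
# reward hints seed the Q-table, policy hints bias action choice where defined.
Q = defaultdict(float)
for x in range(5):
    for y in range(5):
        for a in ACTIONS:
            r = reward_hint((x, y), a)
            if r is not None:
                Q[((x, y), a)] = r

alpha, gamma, epsilon = 0.5, 0.95, 0.1
for _ in range(200):
    state, done = (0, 0), False
    while not done:
        hinted = policy_hint(state)
        if random.random() < epsilon:
            action = random.choice(ACTIONS)          # explore
        elif hinted is not None:
            action = hinted                          # follow advice where defined
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])  # otherwise greedy
        nxt, reward, done = step(state, action)
        target = reward if done else reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt

print("Estimated start-state value:", max(Q[((0, 0), a)] for a in ACTIONS))
```

In this sketch the seeded rewards and the partial policy shape learning only where the advice applies; everywhere else the agent falls back on ordinary value estimation, mirroring the abstract's emphasis on partial, algorithm-agnostic knowledge.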


Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023, 43479 pages
Publisher: JMLR.org
