DOI: 10.5555/3618408.3619620

RLang: a declarative language for describing partial world knowledge to reinforcement learning agents

Published: 23 July 2023

Abstract

We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to an RL agent. Unlike existing RL DSLs that ground to single elements of a decision-making formalism (e.g., the reward function or policy), RLang can specify information about every element of a Markov decision process. We define precise syntax and grounding semantics for RLang, and provide a parser that grounds RLang programs to an algorithm-agnostic partial world model and policy that can be exploited by an RL agent. We provide a series of example RLang programs demonstrating how different RL methods can exploit the resulting knowledge, encompassing model-free and model-based tabular algorithms, policy gradient and value-based methods, hierarchical approaches, and deep methods.
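
The abstract describes a pipeline in which declarative statements are grounded to an algorithm-agnostic partial world model and partial policy that a learning algorithm can then consume. As a purely illustrative sketch of that idea, and not RLang syntax or the released parser's API, the following Python snippet shows how a tabular Q-learning agent might exploit advice that covers only part of a gridworld; `policy_hint`, `reward_hint`, and the environment are hypothetical stand-ins for grounded knowledge.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
GOAL = (4, 4)

def step(state, action):
    """Deterministic 5x5 gridworld dynamics (hypothetical environment)."""
    dx, dy = MOVES[action]
    nxt = (min(4, max(0, state[0] + dx)), min(4, max(0, state[1] + dy)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# Stand-ins for grounded advice: a partial policy and a partial reward model,
# defined on only some state-action pairs and returning None elsewhere.
def policy_hint(state):
    x, _ = state
    return "right" if x < 4 else None  # advice covers only part of the state space

def reward_hint(state, action):
    dx, dy = MOVES[action]
    nxt = (min(4, max(0, state[0] + dx)), min(4, max(0, state[1] + dy)))
    return 1.0 if nxt == GOAL else None  # reward known only near the goal

# Tabular Q-learning that exploits both pieces of partial knowledge:
# reward hints seed the Q-table, policy hints bias action choice where defined.
Q = defaultdict(float)
for x in range(5):
    for y in range(5):
        for a in ACTIONS:
            r = reward_hint((x, y), a)
            if r is not None:
                Q[((x, y), a)] = r

alpha, gamma, epsilon = 0.5, 0.95, 0.1
for _ in range(200):
    state, done = (0, 0), False
    while not done:
        hinted = policy_hint(state)
        if random.random() < epsilon:
            action = random.choice(ACTIONS)          # explore
        elif hinted is not None:
            action = hinted                          # follow advice where defined
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])  # otherwise greedy
        nxt, reward, done = step(state, action)
        target = reward if done else reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt

print("Estimated start-state value:", max(Q[((0, 0), a)] for a in ACTIONS))
```

In this sketch the seeded rewards and the partial policy shape learning only where the advice applies; everywhere else the agent falls back on ordinary value estimation, mirroring the abstract's emphasis on partial, algorithm-agnostic knowledge.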


Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023, 43479 pages
Publisher: JMLR.org
