DOI: 10.1145/3580305.3599393

Internal Logical Induction for Pixel-Symbolic Reinforcement Learning

Published: 04 August 2023

Abstract

Reinforcement Learning (RL) has advanced rapidly in recent years. Most widely studied RL algorithms focus on a single input form, such as pixel-based image input or symbolic vector input. These two forms have different characteristics and often appear together in practice, yet few RL algorithms address problems with mixed input types. When both pixel and symbolic inputs are available, symbolic input usually offers abstract features with specific semantics, which helps the agent focus on task-relevant information, while pixel input provides more comprehensive information that enables well-informed decisions. Tailoring the processing to the properties of each input type can therefore solve such problems more effectively. To this end, we propose an Internal Logical Induction (ILI) framework that integrates deep RL and rule learning into one system. ILI uses a deep RL algorithm to process the pixel input and a rule learning algorithm to induce propositional logic knowledge from the symbolic input. To combine these two mechanisms efficiently, we further adopt a reward shaping technique that treats valuable knowledge as intrinsic rewards for the RL procedure. Experimental results demonstrate that the ILI framework outperforms baseline approaches on RL problems with pixel-symbolic input, and that its induced knowledge transfers better when the semantics of the pixel input change.
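
To make the reward-shaping idea in the abstract concrete, the following is a minimal, hypothetical Python sketch: a rule learner watches the symbolic input and turns induced knowledge into an intrinsic bonus that is added to the environment reward optimized by the pixel-based deep RL agent. All names (ToyRuleLearner, shaped_reward, the bonus weight) are illustrative assumptions, not the paper's implementation, and the trivial "memorize rewarded symbolic states" induction stands in for the far more capable propositional rule learning used in ILI.

from typing import Sequence, Set, Tuple


class ToyRuleLearner:
    """Toy stand-in for rule induction: remembers symbolic states that co-occurred with reward."""

    def __init__(self) -> None:
        self.rules: Set[Tuple[int, ...]] = set()

    def induce(self, symbolic_obs: Sequence[int], extrinsic_reward: float) -> None:
        # Naive induction: a rewarded symbolic state becomes a rule antecedent.
        if extrinsic_reward > 0:
            self.rules.add(tuple(symbolic_obs))

    def fires(self, symbolic_obs: Sequence[int]) -> bool:
        # A "rule" fires when the current symbolic observation matches a stored antecedent.
        return tuple(symbolic_obs) in self.rules


def shaped_reward(extrinsic: float, symbolic_obs: Sequence[int],
                  learner: ToyRuleLearner, bonus: float = 0.1) -> float:
    """Reward shaping: extrinsic reward plus an intrinsic bonus whenever an induced rule fires."""
    return extrinsic + (bonus if learner.fires(symbolic_obs) else 0.0)


if __name__ == "__main__":
    learner = ToyRuleLearner()
    # Dummy trajectory of (symbolic observation, extrinsic reward) pairs.
    trajectory = [([1, 0], 0.0), ([0, 1], 1.0), ([0, 1], 0.0)]
    for obs, r in trajectory:
        learner.induce(obs, r)
        print(obs, "->", shaped_reward(r, obs, learner))

In this sketch the intrinsic bonus is additive and fixed; in the framework described by the authors, only knowledge judged valuable is converted into intrinsic reward, and the pixel input is consumed by a separate deep RL policy rather than the rule learner.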

Supplementary Material

MP4 File (rtfp1414-2min-promo.mp4)
A 2-minute promotional video.


    Published In

    KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2023
    5996 pages
    ISBN: 9798400701030
    DOI: 10.1145/3580305

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. reinforcement learning
    2. rule learning

    Qualifiers

    • Research-article

    Conference

    KDD '23

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
