DOI: 10.5555/3045390.3045591
Article

Graying the black box: understanding DQNs

Published: 19 June 2016

Abstract

In recent years there has been growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind manner. Using our tools, we reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, explaining their success. Moreover, we are able to understand and describe the policies learned by DQNs for three different Atari 2600 games and suggest ways to interpret, debug and optimize deep neural networks in reinforcement learning.
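
The abstract describes the analysis only at a high level. Below is a minimal, illustrative sketch of one way such a feature-level analysis could be set up: record a trained Q-network's penultimate-layer activations over visited game states and project them to two dimensions. The network class, its dimensions, and the choice of t-SNE via scikit-learn are assumptions made for illustration, not the authors' released tooling.

    # Illustrative sketch (assumed setup, not the paper's released code): record
    # the last hidden-layer activations of a trained Q-network over collected
    # game states, then embed them in 2-D with t-SNE to inspect how the learned
    # features organize the state space.
    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.manifold import TSNE

    class SmallQNet(nn.Module):
        # Stand-in Q-network; any trained DQN whose penultimate layer is
        # accessible would work the same way.
        def __init__(self, state_dim=128, hidden_dim=512, n_actions=18):
            super().__init__()
            self.features = nn.Sequential(
                nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            )
            self.q_head = nn.Linear(hidden_dim, n_actions)

        def forward(self, x):
            h = self.features(x)            # penultimate activations to visualize
            return self.q_head(h), h

    # Hypothetical batch of preprocessed states gathered while the agent plays.
    states = torch.randn(2000, 128)

    net = SmallQNet()
    with torch.no_grad():
        q_values, activations = net(states)

    # Label each state by its greedy action so clusters can be checked for
    # behavioral coherence (the predicted value works equally well as a label).
    greedy_actions = q_values.argmax(dim=1).numpy()

    embedding = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(
        activations.numpy())
    print(embedding.shape, np.bincount(greedy_actions))

If the learned features do aggregate the state space hierarchically, states governed by the same sub-policy should fall into tight, well-separated clusters in the resulting embedding, which can then be inspected alongside the corresponding game frames.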



Published In

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48, June 2016, 3077 pages.

Publisher: JMLR.org

