DOI: 10.5555/3091125.3091204

Asynchronous Data Aggregation for Training End to End Visual Control Networks

Published: 08 May 2017

Abstract

Robust training of deep neural networks requires a large amount of data. However, gathering and labeling this data can be expensive, and determining which distribution of features is needed for training is not a trivial problem. This is compounded when training neural networks for autonomous navigation in continuous, non-deterministic environments using only visual input. Increasing the quantity of demonstrated data does not solve this problem: demonstrated sequences of actions are not guaranteed to produce the same outcomes, and slight changes in orientation generate drastically different visual representations. The result is a training set with a different distribution from what the agent will typically encounter in application. Here, we develop a method that can grow a training set from the same distribution as the agent's experiences and capture useful features not found in demonstrated behavior. Additionally, we show that our approach scales to efficiently handle complex tasks that require a large amount of data (experiences) for training. Concretely, we propose the deep asynchronous DAgger framework, which combines the DAgger algorithm with an asynchronous actor-learner architecture for parallel dataset aggregation and network policy learning. We apply our method to the task of navigating 3D mazes in Minecraft with randomly changing block types and analyze our results.
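The core idea described above, running several actors in parallel, each rolling out the current policy while an expert relabels the visited states, and aggregating everything into one growing dataset, can be sketched in miniature. This is not the paper's implementation: the environment, the `expert_action` oracle, and the table-lookup "network" below are hypothetical toy stand-ins for the Minecraft mazes and the visual control network, kept deliberately small so the asynchronous DAgger loop itself is visible.

```python
import random
import threading
import queue

def expert_action(state):
    """Hypothetical expert oracle: maps a state to the correct action."""
    return state % 2

class TablePolicy:
    """Toy stand-in for the learner's network: a thread-safe lookup table."""
    def __init__(self):
        self.table = {}
        self.lock = threading.Lock()

    def act(self, state):
        with self.lock:
            # Unseen states get a random action, mimicking an untrained net.
            return self.table.get(state, random.randint(0, 1))

    def train(self, dataset):
        # "Training" here just memorizes the expert's labels.
        with self.lock:
            for s, a in dataset:
                self.table[s] = a

def actor(policy, data_queue, n_steps, seed):
    """One asynchronous actor: rolls out the *current* policy and has the
    expert relabel every visited state -- the DAgger step."""
    rng = random.Random(seed)
    state = rng.randint(0, 99)
    for _ in range(n_steps):
        policy.act(state)                              # act with current policy
        data_queue.put((state, expert_action(state)))  # expert provides label
        state = rng.randint(0, 99)                     # toy state transition

def async_dagger(n_actors=4, n_iters=5, steps_per_actor=50):
    policy = TablePolicy()
    dataset = []
    data_queue = queue.Queue()
    for _ in range(n_iters):
        threads = [threading.Thread(target=actor,
                                    args=(policy, data_queue, steps_per_actor, i))
                   for i in range(n_actors)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        while not data_queue.empty():   # aggregate the new experiences
            dataset.append(data_queue.get())
        policy.train(dataset)           # refit the policy on the union
    return policy, dataset
```

Because the actors roll out the learner's own policy, the aggregated dataset is drawn from the distribution of states the agent actually visits, which is the distribution-mismatch fix DAgger provides over plain behavioral cloning; the threading merely parallelizes the collection, as the asynchronous actor-learner architecture does at scale.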



Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
May 2017
1914 pages

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. active learning
  2. autonomous agent
  3. autonomous navigation
  4. deep learning
  5. neural network
  6. reinforcement learning
  7. visual navigation

Qualifiers

  • Research-article

Funding Sources

  • Toyota Research Institute / MIT CSAIL Joint Research Center
  • Microsoft Research Cambridge

Acceptance Rates

AAMAS '17 Paper Acceptance Rate 127 of 457 submissions, 28%;
Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

