DOI: 10.5555/3091125.3091204

Asynchronous Data Aggregation for Training End to End Visual Control Networks

Published: 08 May 2017

Abstract

Robust training of deep neural networks requires a large amount of data. However, gathering and labeling this data can be expensive, and determining which distribution of features is needed for training is not a trivial problem. This is compounded when training neural networks for autonomous navigation in continuous, non-deterministic environments using only visual input. Increasing the quantity of demonstrated data does not solve this problem: demonstrated sequences of actions are not guaranteed to produce the same outcomes, and slight changes in orientation generate drastically different visual representations. The result is a training set with a different distribution from what the agent will typically encounter in application. Here, we develop a method that can grow a training set from the same distribution as the agent's experiences and capture useful features not found in demonstrated behavior. Additionally, we show that our approach scales to efficiently handle complex tasks that require a large amount of data (experiences) for training. Concretely, we propose the deep asynchronous DAgger framework, which combines the DAgger algorithm with an asynchronous actor-learner architecture for parallel dataset aggregation and network policy learning. We apply our method to the task of navigating 3D mazes in Minecraft with randomly changing block types and analyze our results.
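The core idea described above, running several actors in parallel, each rolling out the current policy while an expert relabels the visited states, and aggregating everything into one growing dataset, can be sketched in miniature. This is not the paper's implementation: the environment, the `expert_action` oracle, and the table-lookup "network" below are hypothetical toy stand-ins for the Minecraft mazes and the visual control network, kept deliberately small so the asynchronous DAgger loop itself is visible.

```python
import random
import threading
import queue

def expert_action(state):
    """Hypothetical expert oracle: maps a state to the correct action."""
    return state % 2

class TablePolicy:
    """Toy stand-in for the learner's network: a thread-safe lookup table."""
    def __init__(self):
        self.table = {}
        self.lock = threading.Lock()

    def act(self, state):
        with self.lock:
            # Unseen states get a random action, mimicking an untrained net.
            return self.table.get(state, random.randint(0, 1))

    def train(self, dataset):
        # "Training" here just memorizes the expert's labels.
        with self.lock:
            for s, a in dataset:
                self.table[s] = a

def actor(policy, data_queue, n_steps, seed):
    """One asynchronous actor: rolls out the *current* policy and has the
    expert relabel every visited state -- the DAgger step."""
    rng = random.Random(seed)
    state = rng.randint(0, 99)
    for _ in range(n_steps):
        policy.act(state)                              # act with current policy
        data_queue.put((state, expert_action(state)))  # expert provides label
        state = rng.randint(0, 99)                     # toy state transition

def async_dagger(n_actors=4, n_iters=5, steps_per_actor=50):
    policy = TablePolicy()
    dataset = []
    data_queue = queue.Queue()
    for _ in range(n_iters):
        threads = [threading.Thread(target=actor,
                                    args=(policy, data_queue, steps_per_actor, i))
                   for i in range(n_actors)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        while not data_queue.empty():   # aggregate the new experiences
            dataset.append(data_queue.get())
        policy.train(dataset)           # refit the policy on the union
    return policy, dataset
```

Because the actors roll out the learner's own policy, the aggregated dataset is drawn from the distribution of states the agent actually visits, which is the distribution-mismatch fix DAgger provides over plain behavioral cloning; the threading merely parallelizes the collection, as the asynchronous actor-learner architecture does at scale.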



Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
May 2017
1914 pages

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. active learning
  2. autonomous agent
  3. autonomous navigation
  4. deep learning
  5. neural network
  6. reinforcement learning
  7. visual navigation

Qualifiers

  • Research-article

Funding Sources

  • Toyota Research Institute / MIT CSAIL Joint Research Center
  • Microsoft Research Cambridge

Acceptance Rates

AAMAS '17 Paper Acceptance Rate 127 of 457 submissions, 28%;
Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

