
Imitation Learning: A Survey of Learning Methods

Published: 06 April 2017

Abstract

Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field has recently been gaining attention due to advances in computing and sensing as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of those tasks. Generic imitation learning methods could reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or designing reward functions specific to the task. Modern sensors collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door to many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games, to name a few. However, specialized algorithms are needed to learn models effectively and robustly, as learning by imitation poses its own set of challenges. In this article, we survey imitation learning methods and present design options for the different steps of the learning process. We introduce the background and motivation for the field and highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games, as these domains are the most popular in the literature and provide a wide array of problems and methodologies. We extensively discuss combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.
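The abstract's framing of imitation learning as learning a mapping between observations and actions can be made concrete with a small behavioral-cloning example. The sketch below is illustrative only and not taken from the survey: it assumes a hypothetical expert whose demonstrated actions are a noisy linear function of the observed state, and it fits a linear policy to the demonstration pairs by least squares.

    # Minimal behavioral-cloning sketch (illustrative; hypothetical linear expert).
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical expert demonstrations: observations X paired with the
    # expert's actions Y. Here the "expert" is a noisy linear controller.
    true_W = np.array([[0.5, -1.2],
                       [2.0,  0.3]])                    # unknown expert mapping
    X = rng.normal(size=(500, 2))                       # observed states
    Y = X @ true_W + 0.01 * rng.normal(size=(500, 2))   # demonstrated actions

    # Behavioral cloning reduces imitation to supervised regression: choose
    # policy parameters that best reproduce the demonstrated actions.
    W_hat, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)

    def policy(obs):
        # The learned policy maps a new observation directly to an action.
        return obs @ W_hat

    print("action for a new observation:", policy(np.array([1.0, -0.5])))
    print("parameter recovery error:", np.linalg.norm(W_hat - true_W))

In practice the policy is usually a richer function approximator such as a neural network, and purely supervised imitation of this kind can compound errors when the learned policy drifts into states unseen in the demonstrations; handling such challenges is part of what specialized imitation learning algorithms address.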



Published In

ACM Computing Surveys  Volume 50, Issue 2
March 2018
567 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3071073
  • Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 April 2017
Accepted: 01 January 2017
Revised: 01 December 2016
Received: 01 April 2016
Published in CSUR Volume 50, Issue 2


Author Tags

  1. Imitation learning
  2. deep learning
  3. feature representations
  4. intelligent agents
  5. learning from demonstrations
  6. learning from experience
  7. reinforcement learning
  8. robotics
  9. self-improvement

Qualifiers

  • Survey
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 2,082
  • Downloads (Last 6 weeks): 286
Reflects downloads up to 25 Oct 2024

Cited By

  • (2024) A Study on Longitudinal Driver Model Based on Generative Adversarial Imitation Learning. Transactions of the Korean Society of Automotive Engineers 32:1, 137-148. DOI: 10.7467/KSAE.2024.32.1.137. Online publication date: 31-Jan-2024.
  • (2024) Generalizing Objective-Specification in Markov Decision Processes. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2767-2769. DOI: 10.5555/3635637.3663281. Online publication date: 6-May-2024.
  • (2024) Research on reinforcement learning based on PPO algorithm for human-machine intervention in autonomous driving. Electronic Research Archive 32:4, 2424-2446. DOI: 10.3934/era.2024111. Online publication date: 2024.
  • (2024) The Application of Residual Connection-Based State Normalization Method in GAIL. Mathematics 12:2, 214. DOI: 10.3390/math12020214. Online publication date: 9-Jan-2024.
  • (2024) UAV Control Method Combining Reptile Meta-Reinforcement Learning and Generative Adversarial Imitation Learning. Future Internet 16:3, 105. DOI: 10.3390/fi16030105. Online publication date: 20-Mar-2024.
  • (2024) Imagine and Imitate: Cost-Effective Bidding under Partially Observable Price Landscapes. Big Data and Cognitive Computing 8:5, 46. DOI: 10.3390/bdcc8050046. Online publication date: 28-Apr-2024.
  • (2024) A New AI Approach by Acquisition of Characteristics in Human Decision-Making Process. Applied Sciences 14:13, 5469. DOI: 10.3390/app14135469. Online publication date: 24-Jun-2024.
  • (2024) Data-Driven Policy Learning Methods from Biological Behavior: A Systematic Review. Applied Sciences 14:10, 4038. DOI: 10.3390/app14104038. Online publication date: 9-May-2024.
  • (2024) Leveraging imitation learning in agricultural robotics: a comprehensive survey and comparative analysis. Frontiers in Robotics and AI 11. DOI: 10.3389/frobt.2024.1441312. Online publication date: 17-Oct-2024.
  • (2024) Semantic learning from keyframe demonstration using object attribute constraints. Frontiers in Robotics and AI 11. DOI: 10.3389/frobt.2024.1340334. Online publication date: 18-Jul-2024.
