Abstract
For robot manipulation, reinforcement learning provides an effective end-to-end approach to controlling complicated dynamic systems. Model-free reinforcement learning methods ignore the model of the system dynamics and are limited to simple behavior control. By contrast, model-based methods can quickly reach optimal trajectory planning by building a model of the system dynamics. However, building an accurate and efficient system model with high generalization ability is not easy, especially for complex dynamic systems and varied manipulation tasks. Furthermore, when the rewards provided by the environment are sparse, the agent loses effective guidance and fails to optimize its policy efficiently, which considerably decreases sample efficiency. In this paper, a model-based deep reinforcement learning algorithm, in which a deep neural network model is used to simulate the system dynamics, is designed for robot manipulation. The proposed deep neural network model is robust enough to handle complex control tasks and possesses good generalization ability. Moreover, a curiosity-based experience replay method is incorporated to address the sparse reward problem and improve sample efficiency in reinforcement learning. The agent manipulating a robotic hand is encouraged to explore optimal trajectories according to its failure experience. Simulation results show the effectiveness of the proposed method: various manipulation tasks are achieved in a complex dynamic system, and sample efficiency improves even in a sparse-reward environment, as learning time is considerably reduced.
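The core idea described above (learn a dynamics model from experience, and use the model's own prediction error as a curiosity signal that tags replayed transitions when extrinsic rewards are sparse) can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a hypothetical 1-D toy system and a linear stand-in for the deep network dynamics model, purely to show how a curiosity bonus derived from model prediction error behaves before and after model training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true dynamics of a toy 1-D system: s' = a*s + b*u
TRUE_A, TRUE_B = 0.9, 0.5

def env_step(s, u):
    return TRUE_A * s + TRUE_B * u

class DynamicsModel:
    """Linear stand-in for a learned (deep network) dynamics model."""
    def __init__(self):
        self.w = np.zeros(2)  # [a_hat, b_hat]

    def predict(self, s, u):
        return self.w[0] * s + self.w[1] * u

    def fit(self, transitions, lr=0.05, epochs=200):
        # Plain per-sample gradient descent on squared prediction error.
        for _ in range(epochs):
            for s, u, s_next in transitions:
                err = self.predict(s, u) - s_next
                self.w -= lr * err * np.array([s, u])

model = DynamicsModel()
replay = []  # (state, action, next_state, curiosity_bonus)

# Collect random-exploration transitions; each is tagged with a curiosity
# bonus equal to the current model's prediction error, which substitutes
# for the sparse extrinsic reward in this illustration.
transitions = []
for _ in range(50):
    s, u = rng.normal(), rng.normal()
    s_next = env_step(s, u)
    bonus = abs(model.predict(s, u) - s_next)  # large while model is untrained
    replay.append((s, u, s_next, bonus))
    transitions.append((s, u, s_next))

model.fit(transitions)

# After fitting, prediction error on familiar transitions collapses, so
# the curiosity signal would steer exploration toward novel regions.
pre_bonus = np.mean([b for *_, b in replay])
post_err = np.mean([abs(model.predict(s, u) - sn) for s, u, sn, _ in replay])
print(f"mean curiosity bonus before training: {pre_bonus:.3f}")
print(f"mean prediction error after training: {post_err:.5f}")
```

The design point the sketch captures is that curiosity is cheap to compute (one extra forward pass through the dynamics model per stored transition) and decays automatically as the model improves, so it guides exploration without hand-tuned reward shaping.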
Cite this article
Zhang, C., Ma, L. & Schmitz, A. A sample efficient model-based deep reinforcement learning algorithm with experience replay for robot manipulation. Int J Intell Robot Appl 4, 217–228 (2020). https://doi.org/10.1007/s41315-020-00135-2