
A sample efficient model-based deep reinforcement learning algorithm with experience replay for robot manipulation

  • Regular Paper
  • Published:
International Journal of Intelligent Robotics and Applications

Abstract

For robot manipulation, reinforcement learning has provided an effective end-to-end approach to controlling complicated dynamic systems. Model-free reinforcement learning methods ignore the model of the system dynamics and are limited to simple behavior control. By contrast, model-based methods can quickly reach optimal trajectory planning by building a model of the system dynamics. However, it is not easy to build an accurate and efficient system model with strong generalization ability, especially for complex dynamic systems and varied manipulation tasks. Furthermore, when the rewards provided by the environment are sparse, the agent loses effective guidance and cannot optimize its policy efficiently, which considerably decreases sample efficiency. In this paper, a model-based deep reinforcement learning algorithm, in which a deep neural network is used to simulate the system dynamics, is designed for robot manipulation. The proposed deep neural network model is robust enough to handle complex control tasks and possesses generalization ability. Moreover, a curiosity-based experience replay method is incorporated to address the sparse-reward problem and improve sample efficiency in reinforcement learning. The agent that manipulates a robotic hand is encouraged to explore optimal trajectories according to its failure experience. Simulation results show the effectiveness of the proposed method: various manipulation tasks are achieved successfully in a complex dynamic system, and sample efficiency improves even in a sparse-reward environment, as learning time is reduced considerably.
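The abstract's core idea, a replay buffer that augments sparse extrinsic rewards with a curiosity bonus derived from a learned dynamics model's prediction error, can be sketched as follows. This is a minimal illustrative sketch, not the authors' algorithm: the one-dimensional linear "dynamics model", the bonus scale `beta`, and the buffer capacity are all simplifying assumptions chosen for brevity.

```python
import random

class DynamicsModel:
    """Toy 1-D dynamics model: predicts s' ~ w_s*s + w_a*a, trained by SGD."""
    def __init__(self):
        self.w_s, self.w_a = 0.0, 0.0

    def predict(self, s, a):
        return self.w_s * s + self.w_a * a

    def update(self, s, a, s_next, lr=0.1):
        # One SGD step on the squared prediction error; the error itself
        # doubles as the curiosity signal (high where the model is wrong).
        err = self.predict(s, a) - s_next
        self.w_s -= lr * err * s
        self.w_a -= lr * err * a
        return err ** 2

class CuriosityReplay:
    """Replay buffer that adds a curiosity bonus to sparse extrinsic rewards."""
    def __init__(self, model, beta=0.5, capacity=1000):
        self.model, self.beta = model, beta
        self.buffer, self.capacity = [], capacity

    def add(self, s, a, r_ext, s_next):
        # Failed transitions (r_ext == 0) still carry a learning signal:
        # the bonus is largest where the dynamics model predicts poorly,
        # steering exploration toward unfamiliar states.
        bonus = self.beta * self.model.update(s, a, s_next)
        self.buffer.append((s, a, r_ext + bonus, s_next))
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))
```

As the model improves, the bonus decays toward zero and the stored rewards revert to the sparse extrinsic signal, which is the usual behavior of prediction-error curiosity.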




Author information


Corresponding author

Correspondence to Liang Ma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, C., Ma, L. & Schmitz, A. A sample efficient model-based deep reinforcement learning algorithm with experience replay for robot manipulation. Int J Intell Robot Appl 4, 217–228 (2020). https://doi.org/10.1007/s41315-020-00135-2
