Abstract
Most asymmetrically coordinated manipulation tasks performed by humanoid manipulators have multi-level goals. A bottle-cap screwing task, for example, is composed of several sub-objectives: reaching, grasping, aligning, and screwing. In addition, the flexible-interaction requirements of dual-arm robots challenge trajectory planning methods, because the planning problem is high-dimensional and strongly coupled. Traditional reinforcement learning algorithms cannot quickly learn and generate the required trajectories. Drawing on the idea of multi-agent control, this paper proposes a dual-agent deep deterministic policy gradient (DDPG) algorithm in which two agents simultaneously plan the coordinated trajectories of the left and right arms online, solving the online trajectory-planning problem for multi-objective tasks of humanoid manipulators. The design of observations and actions in the dual-agent structure reduces the dimensionality and partially decouples the trajectory-planning problem, thereby accelerating learning. Moreover, a reward function is constructed to realize coordinated control between the two agents and to drive them to generate continuous trajectories for multi-objective tasks. Finally, the effectiveness of the proposed algorithm is verified in a Gym-based Baxter multi-objective task simulation environment. The results show that the algorithm can quickly learn and plan coordinated trajectories of humanoid manipulators online for multi-objective tasks.
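The dual-agent structure described above lends itself to a compact illustration. The sketch below (PyTorch) is not the authors' released code: the per-arm observation and action dimensions, the network widths, and the exact form of the coordination penalty are assumptions chosen for illustration. It shows the essential pattern the abstract describes: one independent DDPG actor-critic pair per arm, each acting on its own reduced observation, with a shared reward term that couples the two agents.

```python
# Minimal sketch of a dual-agent DDPG structure, assuming per-arm
# observations and 7-DoF joint actions; dimensions and weights are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 17, 7  # assumed per-arm observation / action sizes


class Actor(nn.Module):
    """Deterministic policy mu(s) -> a for one arm."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM), nn.Tanh())  # actions in [-1, 1]

    def forward(self, obs):
        return self.net(obs)


class Critic(nn.Module):
    """Action-value estimate Q(s, a) for one arm."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


class DDPGAgent:
    """One DDPG agent per arm; each sees only its own reduced observation,
    which is what lowers the dimension of the joint planning problem."""
    def __init__(self):
        self.actor, self.critic = Actor(), Critic()

    def act(self, obs, noise_std=0.1):
        with torch.no_grad():
            a = self.actor(torch.as_tensor(obs, dtype=torch.float32))
        # Gaussian exploration noise, clipped back to the valid range.
        return (a + noise_std * torch.randn_like(a)).clamp(-1.0, 1.0)


def coordination_reward(r_left, r_right, rel_pose_err, w=0.5):
    """Couples the two agents: each arm's own task reward plus a shared
    term penalizing misalignment between the end-effectors (assumed form)."""
    shared = -w * rel_pose_err
    return r_left + shared, r_right + shared


# Each agent plans its arm's motion online from its own observation.
left_agent, right_agent = DDPGAgent(), DDPGAgent()
a_left = left_agent.act(torch.zeros(OBS_DIM))
a_right = right_agent.act(torch.zeros(OBS_DIM))
```

Splitting the planner into two agents with per-arm observations is what gives the claimed decoupling; the shared penalty term is one plausible way to make the two otherwise-independent policies converge toward coordinated behavior.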
Supported in part by the National Natural Science Foundation of China (U2013602, 52075115, 51521003, 61911530250), National Key R&D Program of China (2020YFB13134), Self-Planned Task (SKLRS202001B, SKLRS202110B) of State Key Laboratory of Robotics and System (HIT), Shenzhen Science and Technology Research and Development Foundation (JCYJ20190813171009236), and Basic Scientific Research of Technology (JCKY2020603C009).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liang, K., Zha, F., Sheng, W., Guo, W., Wang, P., Sun, L. (2023). Research on Target Trajectory Planning Method of Humanoid Manipulators Based on Reinforcement Learning. In: Yang, H., et al. Intelligent Robotics and Applications. ICIRA 2023. Lecture Notes in Computer Science, vol 14270. Springer, Singapore. https://doi.org/10.1007/978-981-99-6492-5_39
DOI: https://doi.org/10.1007/978-981-99-6492-5_39
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6491-8
Online ISBN: 978-981-99-6492-5