Enhanced method for reinforcement learning based dynamic obstacle avoidance by assessment of collision risk

Published: 14 March 2024
Abstract

    Naturally inspired designs of training environments for reinforcement learning (RL) often suffer from highly skewed encounter probabilities: a small subset of experiences is encountered frequently, while extreme experiences remain rare. Despite recent algorithmic advances, research has shown that such environments pose significant challenges for RL algorithms. In this study, we first demonstrate that traditional designs of training environments for RL-based dynamic obstacle avoidance exhibit extremely unbalanced obstacle-encounter probabilities, such that high-risk scenarios with multiple threatening obstacles are rare. To address this limitation, we propose a traffic-type-independent training environment that gives us control over the difficulty of obstacle encounters. This allows us to deliberately shift obstacle encounter probabilities towards high-risk experiences, which are assessed via two metrics: the number of obstacles involved and an existing collision risk metric.
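
    As a minimal illustration of the idea described above (not the authors' code), the following Python sketch biases episode generation towards crowded, higher-risk encounters by reweighting the sampled obstacle count; the function name, the weighting scheme, and all parameter values are illustrative assumptions.

        import numpy as np

        def sample_num_obstacles(rng, max_obstacles=5, bias=2.0):
            """Draw the obstacle count for the next training episode.

            Weights grow with the obstacle count (controlled by `bias`), so
            episodes with many simultaneous obstacles, which are rare under a
            naturalistic spawning scheme, are encountered more often during
            training.
            """
            counts = np.arange(1, max_obstacles + 1)
            weights = counts.astype(float) ** bias  # heavier weight on crowded scenes
            weights /= weights.sum()
            return rng.choice(counts, p=weights)

        rng = np.random.default_rng(0)
        print([int(sample_num_obstacles(rng)) for _ in range(10)])

    Setting bias to 0 recovers a uniform distribution over obstacle counts, so the same sampler can also reproduce an unbiased baseline for comparison.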
    Our findings reveal that shifting the training focus towards higher-risk experiences, from which the agent learns, significantly improves the agent’s final performance. To validate the generalizability of our approach, we designed and evaluated two realistic use cases: a mobile robot and a maritime ship, each facing the threat of approaching obstacles. In both applications we observed consistent results, underscoring the broad applicability of the proposed approach across application contexts and its independence from the agent’s dynamics. Furthermore, we added Gaussian noise to the sensor signals and incorporated different non-linear obstacle behaviors, which caused only marginal performance degradation, demonstrating the robustness of the trained agent in handling environmental uncertainties.
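
    For the robustness check mentioned above, a simple way to emulate sensor uncertainty is to perturb each observation with zero-mean Gaussian noise before it reaches the agent. The sketch below assumes a flat numeric observation vector; noise_std is an illustrative value, not one taken from the paper.

        import numpy as np

        def noisy_observation(obs, rng, noise_std=0.05):
            """Return a copy of the observation with additive Gaussian noise."""
            obs = np.asarray(obs, dtype=float)
            return obs + rng.normal(loc=0.0, scale=noise_std, size=obs.shape)

        rng = np.random.default_rng(1)
        clean = np.array([10.0, 0.3, -1.2])  # e.g. range, bearing, relative speed of an obstacle
        print(noisy_observation(clean, rng))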

            Published In

            Neurocomputing, Volume 568, Issue C
            February 2024
            249 pages

            Publisher

            Elsevier Science Publishers B.V.
            Netherlands

            Author Tags

            1. Dynamic obstacle avoidance
            2. Reinforcement learning
            3. Training environment
            4. Collision risk metric

            Qualifiers

            • Research-article
