Enhanced method for reinforcement learning based dynamic obstacle avoidance by assessment of collision risk

Published: 14 March 2024
Abstract

    Naturally inspired designs of training environments for reinforcement learning (RL) often suffer from highly skewed encounter probabilities: a small subset of experiences is encountered frequently, while extreme experiences remain rare. Despite recent algorithmic advances, research has shown that such environments pose significant challenges for RL algorithms. In this study, we first demonstrate that traditional designs of training environments for RL-based dynamic obstacle avoidance exhibit extremely unbalanced obstacle-encounter probabilities, such that high-risk scenarios with multiple threatening obstacles are rare. To address this limitation, we propose a traffic-type-independent training environment that gives us control over the difficulty of obstacle encounters. This allows us to deliberately shift obstacle encounter probabilities towards high-risk experiences, which are assessed via two metrics: the number of obstacles involved and an existing collision risk metric.
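
    As a minimal illustration of the idea described above (not the authors' code), the following Python sketch biases episode generation towards crowded, higher-risk encounters by reweighting the sampled obstacle count; the function name, the weighting scheme, and all parameter values are illustrative assumptions.

        import numpy as np

        def sample_num_obstacles(rng, max_obstacles=5, bias=2.0):
            """Draw the obstacle count for the next training episode.

            Weights grow with the obstacle count (controlled by `bias`), so
            episodes with many simultaneous obstacles, which are rare under a
            naturalistic spawning scheme, are encountered more often during
            training.
            """
            counts = np.arange(1, max_obstacles + 1)
            weights = counts.astype(float) ** bias  # heavier weight on crowded scenes
            weights /= weights.sum()
            return rng.choice(counts, p=weights)

        rng = np.random.default_rng(0)
        print([int(sample_num_obstacles(rng)) for _ in range(10)])

    Setting bias to 0 recovers a uniform distribution over obstacle counts, so the same sampler can also reproduce an unbiased baseline for comparison.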
    Our findings reveal that shifting the training focus towards higher-risk experiences, from which the agent learns, significantly improves the agent’s final performance. To validate the generalizability of our approach, we designed and evaluated two realistic use cases: a mobile robot and a maritime ship, each facing the threat of approaching obstacles. In both applications we observed consistent results, underscoring the broad applicability of the proposed approach across application contexts and its independence from the agent’s dynamics. Furthermore, we added Gaussian noise to the sensor signals and incorporated different non-linear obstacle behaviors, which caused only marginal performance degradation, demonstrating the robustness of the trained agent in handling environmental uncertainties.
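
    For the robustness check mentioned above, a simple way to emulate sensor uncertainty is to perturb each observation with zero-mean Gaussian noise before it reaches the agent. The sketch below assumes a flat numeric observation vector; noise_std is an illustrative value, not one taken from the paper.

        import numpy as np

        def noisy_observation(obs, rng, noise_std=0.05):
            """Return a copy of the observation with additive Gaussian noise."""
            obs = np.asarray(obs, dtype=float)
            return obs + rng.normal(loc=0.0, scale=noise_std, size=obs.shape)

        rng = np.random.default_rng(1)
        clean = np.array([10.0, 0.3, -1.2])  # e.g. range, bearing, relative speed of an obstacle
        print(noisy_observation(clean, rng))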

            Published In

            Neurocomputing, Volume 568, Issue C
            February 2024
            249 pages

            Publisher

            Elsevier Science Publishers B.V.
            Netherlands

            Author Tags

            1. Dynamic obstacle avoidance
            2. Reinforcement learning
            3. Training environment
            4. Collision risk metric

            Qualifiers

            • Research-article
