Using Deep Reinforcement Learning with Automatic Curriculum Learning for Mapless Navigation in Intralogistics
Abstract
1. Introduction
2. Related Work
2.1. Model-Free Deep Reinforcement Learning Algorithms
2.2. Deep Reinforcement Learning for Robot Navigation Tasks
2.3. Curriculum Learning for Reinforcement Learning Tasks
3. Materials and Methods
3.1. Maximum Entropy Reinforcement Learning–Soft Actor-Critic
3.2. Simulation Environment
3.3. Reinforcement Learning Problem Setup
3.4. Automatic Curriculum Learning: Extension of NavACL to NavACL-Q
Algorithm 1: GetDynamicTask-Q.
3.5. Pre-Training of the Feature Extractor
4. Results
4.1. Training Results
4.1.1. Pre-Trained Convolutional Encoders
4.1.2. Performance of NavACL-Q SAC with Pre-Trained Convolutional Encoders
4.2. Grid-Based Testing Scenarios
4.3. Ablation Studies
4.3.1. Ablation Studies: Effects of Automatic Curriculum Learning
4.3.2. Ablation Studies: Effects of Pre-Trained Convolutional Encoder
4.4. Comparison to a Map-Based Navigation Approach
5. Discussion
5.1. Learned Behavior of the Agent
5.2. Effects of Pre-Trained Feature Extractor
5.3. Potential Improvements on NavACL-Q
5.4. Effects of Problem Formulations on the Performance
6. Conclusions
7. Patents
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
AGV | Automated Guided Vehicle |
RL | Reinforcement Learning |
DRL | Deep Reinforcement Learning |
ACL | Automatic Curriculum Learning |
SLAM | Simultaneous Localization and Mapping |
SAC | Soft Actor-Critic |
NavACL-Q p.t. | NavACL-Q with pre-trained convolutional encoder using Soft Actor-Critic |
NavACL-Q e.t.e. | NavACL-Q with Soft Actor-Critic, end-to-end learning |
RND | Soft Actor-Critic with pre-trained convolutional encoder using random starts |
PER | Prioritized Experience Replay |
DQN | Deep Q-Network |
LSTM | Long Short-Term Memory |
Appendix A. Details for Training Via Soft Actor-Critic
Algorithm A1: Distributed Soft Actor-Critic—Worker Process.
Distributed Soft Actor-Critic Hyperparameters

| Parameter | Value |
|---|---|
| Discount factor | |
| Target smoothing coefficient | 1 (hard update) |
| Target network update interval | 1000 |
| Initial temperature coefficient | |
| Learning rates for the network optimizers | |
| Optimizer | Adam |
| Replay buffer capacity | (Binary Tree) |
| (PER) prioritization parameter c | |
| (PER) initial prioritization weight | |
| (PER) final prioritization weight | |
Algorithm A2: Distributed Soft Actor-Critic—Master Process.
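The table above stores the replay buffer as a binary tree, the usual sum-tree layout for prioritized experience replay (PER). As a point of reference only, the following is a minimal, self-contained sketch of such a sum-tree; the class and method names (`SumTree`, `add`, `sample`) are illustrative assumptions and do not reflect the authors' implementation.

```python
# Minimal sum-tree sketch for prioritized experience replay (PER).
# This is an illustrative assumption about the "(Binary Tree)" replay buffer
# noted in the table above, not the authors' implementation; the names
# SumTree, add, and sample are hypothetical.
import random

class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity              # max number of stored transitions
        self.tree = [0.0] * (2 * capacity)    # node i holds the priority sum of its subtree
        self.data = [None] * capacity         # transitions stored at the leaves
        self.write = 0                        # next leaf slot to overwrite
        self.size = 0

    def add(self, priority, transition):
        """Store a transition and set its priority at the corresponding leaf."""
        idx = self.write + self.capacity
        self.data[self.write] = transition
        self._update(idx, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def _update(self, idx, priority):
        """Set a leaf priority and propagate the change up to the root."""
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx > 1:
            idx //= 2
            self.tree[idx] += change

    def sample(self):
        """Sample a transition with probability proportional to its priority."""
        target = random.uniform(0.0, self.tree[1])   # tree[1] is the total priority
        idx = 1
        while idx < self.capacity:                   # descend until a leaf is reached
            left = 2 * idx
            if target <= self.tree[left]:
                idx = left
            else:
                target -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity]
```

In a full PER setup, the learner would refresh the leaf priorities with fresh TD errors after every gradient step and apply importance-sampling corrections governed by the initial and final prioritization weights listed in the table.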
Appendix B. Details for Training the NavACL-Q Algorithm
NavACL-Q Hyperparameters

| Parameter | Value |
|---|---|
| Batch size m | 16 |
| Upper-confidence coefficient for easy tasks | |
| Upper-confidence coefficient for frontier tasks | |
| Additional threshold for easy tasks | |
| Maximal number of trials to generate a task | 100 |
| Learning rate for | |
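The upper-confidence coefficients and the additional easy-task threshold listed above control how NavACL-style curricula label candidate start configurations as easy, frontier, or hard, based on a prediction of how likely the agent is to solve them (in NavACL-Q, that prediction is derived from the critic's initial Q-value). The snippet below is a hedged sketch of one such labelling rule; the function name, the use of running statistics, and the exact thresholds are assumptions for illustration and do not reproduce the published GetDynamicTask-Q procedure.

```python
# Hedged sketch of an adaptive easy/frontier task classification as used by
# NavACL-style curricula. The decision rule, thresholds, and names
# (classify_task, mu, sigma) are illustrative assumptions and not the
# published GetDynamicTask-Q pseudocode.
import statistics

def classify_task(predicted_success, recent_predictions,
                  c_easy=1.0, c_frontier=0.5, extra_easy_margin=0.1):
    """Label a candidate task from its predicted success probability.

    recent_predictions: predictions for recently generated tasks, used to
    form an adaptive reference point (running mean and spread).
    """
    mu = statistics.fmean(recent_predictions)
    sigma = statistics.pstdev(recent_predictions)
    if predicted_success > mu + c_easy * sigma + extra_easy_margin:
        return "easy"        # clearly above the agent's current ability
    if abs(predicted_success - mu) <= c_frontier * sigma:
        return "frontier"    # near the boundary of the agent's ability
    return "hard"            # everything else

# Example: with recent predictions clustered around 0.5, a task predicted to
# succeed with probability 0.95 would be labelled "easy".
print(classify_task(0.95, [0.4, 0.5, 0.55, 0.6]))
```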
Appendix C. Arena Randomization
| Description | Randomization | Induced Randomization with Respect to Geometric Property |
|---|---|---|
| Initial Robot Yaw-Rotation | Uniformly sampled from an interval | Relative Rotation |
| Initial Dolly Yaw-Rotation | Uniformly sampled from an interval | |
| Number of Obstacles | 1 to 4 | Agent Clearance / Goal Clearance |
| Position of Obstacles | Randomly placed left and right of the dolly, with a distance uniformly sampled from an interval | |
| Initial Robot Position | −0.5 m to 0.5 m on the y- and x-axes | Agent-Goal Distance: [1.5 m, 5 m] |
| Initial Dolly Position | Uniformly sampled from a circle segment with radius 5 m and a given central angle, where the center of the segment corresponds to the center of the robot, with a minimum distance of 1.5 m to the robot | |
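Read as a procedure, the table describes a small sampling routine executed at the start of every episode. The sketch below implements that reading; the yaw intervals, the central angle of the circle segment, and the obstacle offset range are assumed values, since the exact numbers are not reproduced above.

```python
# Hedged sketch of the per-episode arena randomization described in the table.
# The yaw intervals, the central angle of the circle segment, and the obstacle
# offset range are assumed values for illustration only.
import math
import random

def sample_arena():
    """Sample one randomized episode configuration."""
    robot_x = random.uniform(-0.5, 0.5)             # initial robot position jitter [m]
    robot_y = random.uniform(-0.5, 0.5)
    robot_yaw = random.uniform(-math.pi, math.pi)    # assumed yaw interval

    # Dolly placed on a circle segment around the robot: radius up to 5 m,
    # at least 1.5 m away; the central angle (here +/-45 degrees) is an assumption.
    distance = random.uniform(1.5, 5.0)
    bearing = random.uniform(-math.pi / 4, math.pi / 4)
    dolly_x = robot_x + distance * math.cos(bearing)
    dolly_y = robot_y + distance * math.sin(bearing)
    dolly_yaw = random.uniform(-math.pi, math.pi)    # assumed yaw interval

    # 1 to 4 obstacles placed left and right of the dolly with a random offset.
    obstacles = []
    for _ in range(random.randint(1, 4)):
        side = random.choice([-1.0, 1.0])
        offset = random.uniform(1.0, 2.0)            # assumed offset interval [m]
        obstacles.append((dolly_x, dolly_y + side * offset))

    return {
        "robot": (robot_x, robot_y, robot_yaw),
        "dolly": (dolly_x, dolly_y, dolly_yaw),
        "obstacles": obstacles,
    }

print(sample_arena())
```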
Appendix D. Mobile Robot and Target Dolly Specification
| Mobile Robot | |
|---|---|
| Length, Width, Height | 1273 mm × 630 mm × 300 mm |
| Maximum Speed | 1.2 m/s |
| LiDAR Sensor | 2× 128 beams, each FOV 225°, max. distance 6 m |
| Frontal RGB Camera | pixel, FOV |

| Dolly | |
|---|---|
| Length, Width | 1230 mm × 820 mm |
Observation Components

| Description | Dimensions |
|---|---|
| Sequence of the four most recent camera RGB images | |
| Current LiDAR sensor input (front and back sensor concatenated) | |
| History of the four previously taken actions | |
| History of the four previously received rewards | |
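The table suggests an observation assembled from a stack of recent camera frames, the concatenated front and back LiDAR scans, and short action and reward histories. The sketch below shows one way to maintain such an observation with fixed-length deques; all array shapes (image resolution, LiDAR beam count, action dimension) are placeholder assumptions, since the exact dimensions are not reproduced above.

```python
# Hedged sketch of assembling the observation listed in the table: four most
# recent RGB frames, concatenated front/back LiDAR scans, and the last four
# actions and rewards. All array shapes are placeholder assumptions.
from collections import deque
import numpy as np

HISTORY = 4

class ObservationBuffer:
    def __init__(self, frame_shape=(64, 64, 3), action_dim=2):
        zero_frame = np.zeros(frame_shape, dtype=np.float32)
        self.frames = deque([zero_frame] * HISTORY, maxlen=HISTORY)
        self.actions = deque([np.zeros(action_dim, np.float32)] * HISTORY, maxlen=HISTORY)
        self.rewards = deque([0.0] * HISTORY, maxlen=HISTORY)

    def step(self, frame, lidar_front, lidar_back, last_action, last_reward):
        """Update the histories and return the full observation dictionary."""
        self.frames.append(frame.astype(np.float32))
        self.actions.append(np.asarray(last_action, dtype=np.float32))
        self.rewards.append(float(last_reward))
        return {
            "images": np.stack(self.frames),                        # (4, H, W, 3)
            "lidar": np.concatenate([lidar_front, lidar_back]),     # front + back scans
            "action_history": np.stack(self.actions),               # (4, action_dim)
            "reward_history": np.asarray(self.rewards, np.float32), # (4,)
        }

# Usage example with dummy sensor data.
buf = ObservationBuffer()
obs = buf.step(np.zeros((64, 64, 3)), np.zeros(128), np.zeros(128), [0.0, 0.0], 0.0)
```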
| Geometric Property | Definition |
|---|---|
| Agent-Goal Distance | Euclidean distance from the agent's initial position to the goal position |
| Agent Clearance | Distance from the agent's initial position to the nearest obstacle |
| Goal Clearance | Distance from the goal position to the nearest obstacle |
| Relative Angle | Angle between the agent's starting orientation and the direction towards the goal |
| Initial Q-Value | The Q-value predicted by the SAC critic network for the initial state |
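These geometric properties are the task features consumed by NavACL-Q. As a worked illustration, the sketch below computes them for a single start configuration; the critic call is stubbed out, whereas in the paper the initial Q-value comes from the trained SAC critic network.

```python
# Hedged sketch computing the task properties listed above for one start
# configuration. The critic argument is a stub; in the paper the initial
# Q-value is predicted by the trained SAC critic network.
import numpy as np

def task_properties(agent_pos, agent_yaw, goal_pos, obstacles, critic=None):
    agent = np.asarray(agent_pos, dtype=float)
    goal = np.asarray(goal_pos, dtype=float)
    obstacles = np.asarray(obstacles, dtype=float).reshape(-1, 2)

    agent_goal_distance = np.linalg.norm(goal - agent)
    agent_clearance = np.min(np.linalg.norm(obstacles - agent, axis=1))
    goal_clearance = np.min(np.linalg.norm(obstacles - goal, axis=1))

    # Relative angle between the agent's starting orientation and the
    # direction towards the goal, wrapped to [-pi, pi].
    heading_to_goal = np.arctan2(goal[1] - agent[1], goal[0] - agent[0])
    relative_angle = (heading_to_goal - agent_yaw + np.pi) % (2 * np.pi) - np.pi

    # Initial Q-value predicted by the SAC critic (stubbed if no critic given).
    initial_q = critic(agent, agent_yaw, goal) if critic is not None else 0.0

    return {
        "agent_goal_distance": float(agent_goal_distance),
        "agent_clearance": float(agent_clearance),
        "goal_clearance": float(goal_clearance),
        "relative_angle": float(relative_angle),
        "initial_q_value": float(initial_q),
    }

print(task_properties((0, 0), 0.0, (3, 4), [(1, 2), (4, 4)]))
```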
| Relative Orientation of AGV to Target | Average Success Rate | | | |
|---|---|---|---|---|
| | NavACL p.t. | RND | NavACL e.t.e. | Baseline |
| | 86.6% | | | |
| Mean of (Intrapolated Tasks) | | | | |
| Mean of (Extrapolated Tasks) | | | | |
| Mean of All Orientations | | | | |