
Multi-start team orienteering problem for UAS mission re-planning with data-efficient deep reinforcement learning

Published: 27 March 2024

Abstract

In this paper, we study the Multi-Start Team Orienteering Problem (MSTOP), a mission re-planning problem in which vehicles start away from the depot with different amounts of fuel remaining. We assume the goal of the multiple vehicles is to travel so as to maximize the sum of collected profits under resource (e.g., time, fuel) consumption constraints. Such re-planning problems arise in a wide range of intelligent UAS applications, where changes in the mission environment force multiple vehicles to deviate from their original plan. To solve this problem with deep reinforcement learning (RL), we develop a policy network with self-attention on each partial tour and encoder-decoder attention between the partial tour and the remaining nodes. We propose a modified REINFORCE algorithm in which the greedy rollout baseline is replaced by a local mini-batch baseline computed from multiple, possibly non-duplicate, sample rollouts. By drawing multiple samples per training instance, we can learn faster and obtain a stable policy gradient estimator with significantly fewer instances. The proposed training algorithm outperforms the conventional greedy rollout baseline, even when combined with the maximum entropy objective. The efficiency of our method is further demonstrated on two classical problems: the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). The experimental results show that our method enables models to develop more effective heuristics and performs competitively with state-of-the-art deep reinforcement learning methods.
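The core of the proposed training scheme is easy to state: for each training instance, draw K sampled rollouts and use their mean reward as a local baseline, so that sample i receives advantage R_i - (1/K) sum_j R_j. The following is a minimal PyTorch-style sketch of one training step under this scheme, assuming a hypothetical `policy.rollout` that returns per-instance tour rewards and summed log-probabilities; it illustrates the idea rather than reproducing the authors' implementation.

import torch

# Minimal sketch (not the authors' code) of REINFORCE with a local
# mini-batch baseline: K sampled rollouts per instance, baseline = mean
# reward over those K samples. `policy.rollout` is a hypothetical API.
def train_step(policy, instances, k_samples, optimizer):
    rewards, log_probs = [], []
    for _ in range(k_samples):
        r, lp = policy.rollout(instances, sample=True)  # each: (batch,)
        rewards.append(r)
        log_probs.append(lp)
    rewards = torch.stack(rewards)       # (K, batch)
    log_probs = torch.stack(log_probs)   # (K, batch)

    # Local baseline: mean reward of the K samples of the same instance.
    baseline = rewards.mean(dim=0, keepdim=True)
    advantage = rewards - baseline       # zero-mean across the K samples

    # REINFORCE: maximize expected reward, i.e. minimize -(A * log pi).
    loss = -(advantage.detach() * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Because the baseline is computed from the samples themselves, no separate baseline network or greedy rollout is needed, and the K advantages per instance sum to zero, which keeps the gradient estimate stable even with few training instances.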


Cited By

  • Near-Field Multi-Beam Design for Extremely Large-Scale RIS Assisted by Multi-Start Adam Algorithm. In: Proceedings of the 2024 12th International Conference on Communications and Broadband Networking, pp. 70–75 (2024). https://doi.org/10.1145/3688636.3688649
  • Generation of Tourist Routes Considering Preferences and Public Transport Using Artificial Intelligence Planning Techniques. In: Computational Logistics, pp. 164–175 (2024). https://doi.org/10.1007/978-3-031-71993-6_11



Published In

Applied Intelligence, Volume 54, Issue 6 (March 2024), 730 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 27 March 2024
Accepted: 26 February 2024

Author Tags

  1. Deep reinforcement learning
  2. Data-efficient training
  3. Combinatorial optimization
  4. Mission re-planning
  5. Autonomous systems

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation of Korea
  • Korea Advanced Institute of Science and Technology
