
Multi-start team orienteering problem for UAS mission re-planning with data-efficient deep reinforcement learning

Published: 27 March 2024

Abstract

In this paper, we study the Multi-Start Team Orienteering Problem (MSTOP), a mission re-planning problem in which vehicles start away from the depot with different amounts of fuel remaining. We assume the goal of the multiple vehicles is to travel so as to maximize the sum of collected profits under resource (e.g., time, fuel) consumption constraints. Such re-planning problems arise in a wide range of intelligent UAS applications, where changes in the mission environment force multiple vehicles to deviate from their original plan. To solve this problem with deep reinforcement learning (RL), we develop a policy network with self-attention on each partial tour and encoder-decoder attention between the partial tour and the remaining nodes. We propose a modified REINFORCE algorithm in which the greedy rollout baseline is replaced by a local mini-batch baseline computed from multiple, possibly non-duplicate, sample rollouts. By drawing multiple samples per training instance, we can learn faster and obtain a stable policy gradient estimator with significantly fewer instances. The proposed training algorithm outperforms the conventional greedy rollout baseline, even when combined with the maximum entropy objective. The efficiency of our method is further demonstrated on two classical problems: the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). The experimental results show that our method enables models to develop more effective heuristics and performs competitively with state-of-the-art deep reinforcement learning methods.
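The core of the proposed training scheme is easy to state: for each training instance, draw K sampled rollouts and use their mean reward as a local baseline, so that sample i receives advantage R_i - (1/K) sum_j R_j. The following is a minimal PyTorch-style sketch of one training step under this scheme, assuming a hypothetical `policy.rollout` that returns per-instance tour rewards and summed log-probabilities; it illustrates the idea rather than reproducing the authors' implementation.

import torch

# Minimal sketch (not the authors' code) of REINFORCE with a local
# mini-batch baseline: K sampled rollouts per instance, baseline = mean
# reward over those K samples. `policy.rollout` is a hypothetical API.
def train_step(policy, instances, k_samples, optimizer):
    rewards, log_probs = [], []
    for _ in range(k_samples):
        r, lp = policy.rollout(instances, sample=True)  # each: (batch,)
        rewards.append(r)
        log_probs.append(lp)
    rewards = torch.stack(rewards)       # (K, batch)
    log_probs = torch.stack(log_probs)   # (K, batch)

    # Local baseline: mean reward of the K samples of the same instance.
    baseline = rewards.mean(dim=0, keepdim=True)
    advantage = rewards - baseline       # zero-mean across the K samples

    # REINFORCE: maximize expected reward, i.e. minimize -(A * log pi).
    loss = -(advantage.detach() * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Because the baseline is computed from the samples themselves, no separate baseline network or greedy rollout is needed, and the K advantages per instance sum to zero, which keeps the gradient estimate stable even with few training instances.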


Cited By

  • Near-Field Multi-Beam Design for Extremely Large-Scale RIS Assisted by Multi-Start Adam Algorithm. In: Proceedings of the 2024 12th International Conference on Communications and Broadband Networking, pp. 70–75 (2024). https://doi.org/10.1145/3688636.3688649
  • Generation of Tourist Routes Considering Preferences and Public Transport Using Artificial Intelligence Planning Techniques. In: Computational Logistics, pp. 164–175 (2024). https://doi.org/10.1007/978-3-031-71993-6_11



Published In

Applied Intelligence, Volume 54, Issue 6 (March 2024), 730 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 27 March 2024
Accepted: 26 February 2024

Author Tags

  1. Deep reinforcement learning
  2. Data-efficient training
  3. Combinatorial optimization
  4. Mission re-planning
  5. Autonomous systems

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation of Korea
  • Korea Advanced Institute of Science and Technology
