DOI: 10.1109/CASE49997.2022.9926520

Pareto Frontier Approximation Network (PA-Net) to Solve Bi-objective TSP

Published: 20 August 2022

Abstract

The travelling salesperson problem (TSP) is a classic combinatorial optimization problem used to find an optimal order for completing a set of tasks while minimizing (or maximizing) an associated objective function. It is widely used in robotics for applications such as planning and scheduling. In this work, we solve TSP for two objectives using reinforcement learning (RL). In multi-objective optimization problems, the objective functions are often conflicting, and optimality is then defined in terms of Pareto optimality. The set of Pareto optimal solutions in the objective space forms a Pareto front (or frontier), where each solution represents a different tradeoff between the objectives. We present the Pareto frontier approximation network (PA-Net), a network that generates good approximations of the Pareto front for the bi-objective travelling salesperson problem (BTSP). First, BTSP is converted into a constrained optimization problem. We then train our network to solve this constrained problem using Lagrangian relaxation and policy gradient. PA-Net improves on an existing deep RL-based method: the average improvement in the hypervolume metric, which measures the quality of the Pareto front, is 2.3%, while inference is 4.5× faster. Finally, we apply PA-Net to find the optimal visiting order in a robotic coverage-planning task. Our code is available on the project website¹.
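The conversion to a constrained problem is only named in the abstract, not spelled out. A minimal sketch of how one such ε-constrained subproblem and its Lagrangian relaxation typically look in this setting (the symbols f_1, f_2, ε, λ, and π_θ are illustrative placeholders, not the paper's notation) is:

```latex
% Illustrative sketch only, not the paper's exact formulation.
% f_1, f_2 are the two tour costs; tau ~ pi_theta is a tour sampled from the policy.
\begin{aligned}
  &\min_{\theta} \; \mathbb{E}_{\tau \sim \pi_\theta}\left[ f_1(\tau) \right]
  \quad \text{subject to} \quad
  \mathbb{E}_{\tau \sim \pi_\theta}\left[ f_2(\tau) \right] \le \epsilon, \\[4pt]
  &\mathcal{L}(\theta, \lambda)
   = \mathbb{E}_{\tau \sim \pi_\theta}\left[ f_1(\tau) \right]
   + \lambda \left( \mathbb{E}_{\tau \sim \pi_\theta}\left[ f_2(\tau) \right] - \epsilon \right),
  \qquad \lambda \ge 0.
\end{aligned}
```

Minimizing L over θ with a REINFORCE-style policy gradient while increasing λ whenever the constraint is violated drives the policy toward tours that trade f_1 against the threshold ε; sweeping ε over a range of values then traces out an approximation of the Pareto front.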
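The hypervolume comparison can also be made concrete. Below is a small, self-contained Python sketch (not the authors' code; the function names, sample points, and reference point are all illustrative) that filters a set of bi-objective tour costs down to its non-dominated set and computes the 2D hypervolume, i.e. the area the front dominates up to a reference point:

```python
import numpy as np


def pareto_front(points: np.ndarray) -> np.ndarray:
    """Return the non-dominated subset of (f1, f2) cost pairs (both minimized)."""
    keep = []
    for i, p in enumerate(points):
        # p is dominated if some other point is no worse in both objectives
        # and strictly better in at least one.
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            keep.append(p)
    return np.unique(np.array(keep), axis=0)


def hypervolume_2d(front: np.ndarray, ref: np.ndarray) -> float:
    """Area dominated by `front` and bounded above by `ref` (minimization)."""
    front = front[np.argsort(front[:, 0])]  # ascending f1 => descending f2
    hv, right_edge = 0.0, ref[0]
    for f1, f2 in front[::-1]:  # sweep from largest f1 to smallest
        hv += (right_edge - f1) * (ref[1] - f2)
        right_edge = f1
    return hv


if __name__ == "__main__":
    costs = np.array([[3.0, 9.0], [5.0, 5.0], [6.0, 6.0], [8.0, 2.0]])
    front = pareto_front(costs)  # drops (6, 6): dominated by (5, 5)
    print(hypervolume_2d(front, ref=np.array([10.0, 10.0])))  # 33.0
```

A larger hypervolume means the front sits closer to the ideal point; the 2.3% figure quoted above is an average improvement in exactly this kind of indicator (the paper's reference point and normalization may differ).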


Cited By

• (2024) "Collaborative Deep Reinforcement Learning for Solving Multi-Objective Vehicle Routing Problems," in Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1956–1965. DOI: 10.5555/3635637.3663059. Online publication date: 6 May 2024.

Published In

2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), August 2022, 1894 pages.

Publisher

IEEE Press
