DOI: 10.1109/CASE49997.2022.9926520

Pareto Frontier Approximation Network (PA-Net) to Solve Bi-objective TSP

Published: 20 August 2022

Abstract

The travelling salesperson problem (TSP) is a classic combinatorial optimization problem used to find an optimal order for completing a set of tasks while minimizing (or maximizing) an associated objective function. It is widely used in robotics for applications such as planning and scheduling. In this work, we solve TSP for two objectives using reinforcement learning (RL). In multi-objective optimization problems, the objective functions are often conflicting, and optimality is then defined in terms of Pareto optimality. The set of Pareto optimal solutions in the objective space forms a Pareto front (or frontier), where each solution represents a different tradeoff between the objectives. We present the Pareto frontier approximation network (PA-Net), a network that generates good approximations of the Pareto front for the bi-objective travelling salesperson problem (BTSP). First, BTSP is converted into a constrained optimization problem. We then train our network to solve this constrained problem using Lagrangian relaxation and policy gradient. PA-Net improves on an existing deep RL-based method: the average improvement in the hypervolume metric, which measures the quality of the Pareto front, is 2.3%, while inference is 4.5× faster. Finally, we apply PA-Net to find the optimal visiting order in a robotic coverage-planning task. Our code is available on the project website¹.
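The conversion to a constrained problem is only named in the abstract, not spelled out. A minimal sketch of how one such ε-constrained subproblem and its Lagrangian relaxation typically look in this setting (the symbols f_1, f_2, ε, λ, and π_θ are illustrative placeholders, not the paper's notation) is:

```latex
% Illustrative sketch only, not the paper's exact formulation.
% f_1, f_2 are the two tour costs; tau ~ pi_theta is a tour sampled from the policy.
\begin{aligned}
  &\min_{\theta} \; \mathbb{E}_{\tau \sim \pi_\theta}\left[ f_1(\tau) \right]
  \quad \text{subject to} \quad
  \mathbb{E}_{\tau \sim \pi_\theta}\left[ f_2(\tau) \right] \le \epsilon, \\[4pt]
  &\mathcal{L}(\theta, \lambda)
   = \mathbb{E}_{\tau \sim \pi_\theta}\left[ f_1(\tau) \right]
   + \lambda \left( \mathbb{E}_{\tau \sim \pi_\theta}\left[ f_2(\tau) \right] - \epsilon \right),
  \qquad \lambda \ge 0.
\end{aligned}
```

Minimizing L over θ with a REINFORCE-style policy gradient while increasing λ whenever the constraint is violated drives the policy toward tours that trade f_1 against the threshold ε; sweeping ε over a range of values then traces out an approximation of the Pareto front.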
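The hypervolume comparison can also be made concrete. Below is a small, self-contained Python sketch (not the authors' code; the function names, sample points, and reference point are all illustrative) that filters a set of bi-objective tour costs down to its non-dominated set and computes the 2D hypervolume, i.e. the area the front dominates up to a reference point:

```python
import numpy as np


def pareto_front(points: np.ndarray) -> np.ndarray:
    """Return the non-dominated subset of (f1, f2) cost pairs (both minimized)."""
    keep = []
    for i, p in enumerate(points):
        # p is dominated if some other point is no worse in both objectives
        # and strictly better in at least one.
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            keep.append(p)
    return np.unique(np.array(keep), axis=0)


def hypervolume_2d(front: np.ndarray, ref: np.ndarray) -> float:
    """Area dominated by `front` and bounded above by `ref` (minimization)."""
    front = front[np.argsort(front[:, 0])]  # ascending f1 => descending f2
    hv, right_edge = 0.0, ref[0]
    for f1, f2 in front[::-1]:  # sweep from largest f1 to smallest
        hv += (right_edge - f1) * (ref[1] - f2)
        right_edge = f1
    return hv


if __name__ == "__main__":
    costs = np.array([[3.0, 9.0], [5.0, 5.0], [6.0, 6.0], [8.0, 2.0]])
    front = pareto_front(costs)  # drops (6, 6): dominated by (5, 5)
    print(hypervolume_2d(front, ref=np.array([10.0, 10.0])))  # 33.0
```

A larger hypervolume means the front sits closer to the ideal point; the 2.3% figure quoted above is an average improvement in exactly this kind of indicator (the paper's reference point and normalization may differ).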


Cited By

• (2024) "Collaborative Deep Reinforcement Learning for Solving Multi-Objective Vehicle Routing Problems," in Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1956–1965. DOI: 10.5555/3635637.3663059. Online publication date: 6 May 2024.

Published In

2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), August 2022, 1894 pages.

Publisher

IEEE Press
