1 Introduction
Combinatorial optimization deals with problems in which an objective function has to be maximized over a set of combinatorial alternatives [Papadimitriou and Steiglitz
1998]. The most naive way of obtaining the optimal solution is to enumerate all the feasible solutions, evaluate them using the objective function, and select the best one. However, this brute-force approach becomes impractical when the size of the problem is too large, as the time and computational resources needed to solve it grow exponentially with the problem size.
Traditionally,
Combinatorial Optimization Problems (COPs) have been addressed using either exact or heuristic methods. Exact methods provide an optimal solution if given sufficient time, but the time required can grow exponentially with the size of the problem. In contrast, heuristic methods do not guarantee an optimal solution but often produce reasonably good solutions within a limited time frame [Pearl
1984]. In general, the effectiveness of a heuristic method depends on its ability to identify and exploit the relevant information of the problem at hand. In this line, as a generalization of heuristic algorithms,
metaheuristics (MH) introduce higher-level generic procedures that guide the search process, making them “general-purpose methods” [Blum and Roli
2003].
Even though plenty of work has been done in the development of MH algorithms, in the past decade the research in the area has reached a point of maturity, and the number of relevant algorithmic proposals has declined. At the same time, the expansion of
Deep Learning (DL) techniques to the optimization domain has brought novel research lines. The recent success of DL in fields such as machine translation [Cho et al.
2014], biology [Jumper et al.
2021], or board games [Schrittwieser et al.
2020] has not only attracted the attention of many machine learning practitioners into the optimization field, but has also captured the interest of the optimization community regarding the possible use cases of DL.
Although the first attempts to apply
Neural Networks (NN) to optimization date back to the ’80s [Hopfield and Tank
1985], this research line did not attract the interest of the community until recently. In fact, in recent years, NNs have become an interesting alternative, due to the increase in computational capacity and the development of complex NN models, which enable the design of competitive algorithms. Recent reviews [Bengio et al.
2021; Talbi
2021] present taxonomies that differentiate the ways and the stages in which DL can be applied to COPs: end-to-end methods, improvement methods, and hybrid methods. Nonetheless, end-to-end methods, also called
Neural Combinatorial Optimization (NCO) methods [Bello et al.
2016], present an innovation, as they bring a different way of optimizing COPs under a learning-inference scheme, an idea not observed until recently. In this two-step approach, a model is first trained (the learning phase), searching for the best parameter values. Once the model has been trained, it can be applied to any instance (inference phase), for which we aim to provide the best possible solution.
However, few works have compared their performance to the existing state-of-the-art metaheuristics for the addressed problem. In fact, the issues arising from the comparison of metaheuristic algorithms and NCO methods have not been addressed exhaustively. In this work, we present a critical analysis of the incorporation of NCO algorithms into the classical combinatorial optimization framework, focusing on four fundamental aspects from the optimization perspective: (1)
Performance. How good are the solutions provided by these models? Are they competitive with the state-of-the-art methods? (2)
Computational cost. Considering the computing resources and the time required to train these NN-based models, are they affordable for real-world problem sizes? (3)
Training data & Transferability. Which type and amount of data is required to train the model? Is it necessary to have hundreds or even thousands of instances of the real-world problem to train the model? Can the model transfer the knowledge learned from random instances to other benchmarks and/or other sizes? (4)
Model Reusability. Classical heuristics are carefully handcrafted by using the expert knowledge about the optimization problem. Conversely, NCO methods claim the ability to autonomously learn effective heuristics without any human interaction [Bello et al.
2016; Kwon et al.
2020]. But, how easily can these NCO architectures be applied to different problems without using problem-specific knowledge?
To illustrate the analysis, we guide the reader during the process of implementation and evaluation of an NCO model, exhaustively addressing the aforementioned aspects. To that end, and to enrich the discussion, we choose two practical cases: the
Linear Ordering Problem (LOP) [Ceberio et al.
2015] and the
Permutation Flowshop Scheduling Problem (PFSP) [Gupta and Stafford Jr
2006], two well-known NP-hard COPs for which NCO studies are scarce or non-existent. The conducted experiments show that, if both solution quality and computing time are considered, the metaheuristic and NCO proposals belong to the same Pareto front (neither dominates the other), the former providing better solutions and the latter requiring shorter execution times. Regarding NCO methods, they are able to beat classical constructive proposals with a fast inference time, which makes them potentially interesting as a substitute for heuristics in tasks such as online optimization or hybridization. Moreover, the NN model is capable of generalizing the learned knowledge to other types of instances and even to larger instances than those used for training. The experimentation illustrates that it is possible to reuse the proposed NCO model by applying some transformations to the optimization problem, albeit with a certain loss of performance. Beyond that, a number of promising research lines are described for future investigations on the application of end-to-end models to COPs.
The rest of the article is organized as follows: Section
2 introduces a review of meaningful works in the NCO framework. Section
3 presents four fundamental aspects of NCO algorithms and analyzes the current literature based on them. The analysis is illustrated with a case study of an NCO model specifically developed for this work and applied to the Linear Ordering Problem and the Permutation Flowshop Scheduling Problem, which are defined in Section
4. A broad set of experiments is conducted in Section
5, and the obtained results are discussed in Section
6, where we also suggest new directions for future work in the NCO area. Section
7 concludes the article.
2 Advances On Neural Combinatorial Optimization
In general terms, the optimization process of end-to-end models follows a training-inference pipeline. In the training phase, a set of instances is used to learn the parameters of the NN model. In this step, two main learning scenarios arise: Supervised Learning (SL) and Reinforcement Learning (RL).
In SL, the NN model learns to imitate an optimal (or good) policy [Gasse et al.
2019]. The NN model is fed with a collection of labeled instances (<instance, best solution>), and this requires each instance to be previously solved by means of an exact or approximate solver. An example of this approach can be seen in Vinyals et al. [
2015], where the authors propose an end-to-end algorithm to solve the Traveling Salesman Problem.
They used an architecture called Pointer Network, a sequence-to-sequence model that embeds the information of the instance and constructs a solution for the problem iteratively, adding one city to the solution at a time.
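The iterative, city-at-a-time construction that such sequence-to-sequence models perform can be sketched as follows. Here, `score_fn` is a hypothetical stand-in for the learned attention (pointer) scores, instantiated below with a simple nearest-neighbor preference; a trained Pointer Network would compute these scores from the instance embedding instead.

```python
import math

def greedy_construct(coords, score_fn):
    """Build a TSP tour one city at a time, always appending the city with
    the highest score among those not yet visited. `score_fn(last, candidate)`
    stands in for the learned pointer scores of the model."""
    n = len(coords)
    tour = [0]                      # start from an arbitrary city
    remaining = set(range(1, n))
    while remaining:
        last = tour[-1]
        nxt = max(remaining, key=lambda c: score_fn(coords[last], coords[c]))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

# A hypothetical untrained "policy": prefer the nearest city.
def nearest_score(a, b):
    return -math.dist(a, b)

cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tour = greedy_construct(cities, nearest_score)  # a permutation of all four cities
```

The same skeleton underlies most constructive end-to-end models; only the scoring function changes.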
Using SL presents some serious drawbacks, since obtaining a large set of labels is usually intractable, which affects the applicability of the method [Bengio et al.
2021]. Moreover, using SL may fail to abstract the problem knowledge when the policy used to obtain the labels is suboptimal or there are multiple optimal solutions. In contrast, RL has proven to be a more suitable procedure for solving COPs. In this scenario, an agent learns how to act, without supervision, based on the rewards it receives through its optimization process, i.e., by experience [Sutton and Barto
2018]. In that context, Bello et al. [
2016] introduced the NCO framework, which uses RL to train a NN model to approximate solutions for (a set of) combinatorial problems in an end-to-end manner. In that paper, the authors outperformed the model presented by Vinyals et al. [
2015], using a similar architecture but replacing SL with RL. Furthermore, Bello et al. [
2016] achieved better results than a classical heuristic [Christofides
1976] and OR-Tools’ local search algorithm [Google
2016] on TSP instances up to 100 cities. At this point, it is worth noting that although the work by Bello et al. [
2016] is definitely remarkable, exact methods are able to solve 85,900-city instances to optimality, while problems with 100 cities can be solved in a matter of seconds [Applegate et al.
2006].
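At its core, the RL scheme used in NCO is policy-gradient (REINFORCE) learning with a baseline: a solution is sampled from a parametric construction policy, the (negative) tour length acts as reward, and the parameters are updated along the advantage-weighted gradient of the log-probability. The following toy sketch replaces the NN with a hypothetical one-parameter softmax policy over distances, so that the gradient can be written by hand; it is an illustration of the training scheme, not of any specific published model.

```python
import math, random

def rollout(coords, theta, rng):
    """Sample a tour from a one-parameter softmax policy that scores
    candidate cities by negative distance, accumulating grad log-prob."""
    n = len(coords)
    tour, grad = [0], 0.0
    remaining = list(range(1, n))
    while remaining:
        last = coords[tour[-1]]
        d = [math.dist(last, coords[c]) for c in remaining]
        w = [math.exp(-theta * di) for di in d]
        z = sum(w)
        p = [wi / z for wi in w]
        i = rng.choices(range(len(remaining)), weights=p)[0]
        # d/dtheta log pi(chosen) = -d_chosen + E_p[d]
        grad += -d[i] + sum(pi * di for pi, di in zip(p, d))
        tour.append(remaining.pop(i))
    return tour, grad

def tour_length(coords, tour):
    return sum(math.dist(coords[tour[k - 1]], coords[tour[k]])
               for k in range(len(tour)))

rng = random.Random(0)
theta, baseline, lr = 0.0, 0.0, 0.05
for episode in range(200):
    coords = [(rng.random(), rng.random()) for _ in range(10)]
    tour, grad = rollout(coords, theta, rng)
    reward = -tour_length(coords, tour)        # shorter tour -> higher reward
    baseline = 0.9 * baseline + 0.1 * reward   # moving-average baseline
    theta += lr * (reward - baseline) * grad   # REINFORCE update
```

In actual NCO models, `theta` is replaced by millions of NN weights and the gradient is obtained by automatic differentiation, but the learning signal is the same.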
Motivated by the good results obtained by NCO in Bello et al. [
2016], most of the papers dealing with end-to-end models have followed a similar learning scheme, and the scientific advances have focused on changes in the NN architecture used. As an improvement over Pointer Networks [Vinyals et al.
2015; Bello et al.
2016],
Graph Neural Networks (GNN) [Cappart et al.
2021a] address the limitation of order-dependent inputs, i.e., GNNs are capable of representing the features of the problem without considering any specific order of the input sequence. Khalil et al. [
2017a] proposed a GNN architecture called
Structure2Vector (S2V), which automatically learns policies for several graph problems such as
Maximum Cut (Max-Cut), Minimum Vertex Cover (MVC), and TSP. They implement a greedy meta-algorithm design in which solutions are constructed by sequentially appending graph nodes (items) to the partial solution, based on the graph structure, so as to satisfy the constraints of the problem. Motivated by the challenge of scaling GNNs to very large graphs, Manchanda et al. [
2019] demonstrated the capability of a
Graph Convolutional Neural Network (GCN) to solve instances of
Maximum Coverage Problem (MCP), MVC, and
Influence Maximization (IM) with up to millions of nodes. However, those sizes are difficult to solve for problems represented by dense or fully connected graphs, such as the TSP. In that sense, a comparable case is the work by Ma et al. [
2019], which studied the use of a so-called
Graph Pointer Network (GPN) that combines Ptr-Nets [Vinyals et al.
2015] and GNNs [Cappart et al.
2021a] with a Hierarchical Reinforcement Learning framework to solve TSP instances with up to 1,000 cities.
With the introduction of the Transformer architecture [Vaswani et al.
2017] and due to its good performance in multiple fields, mainly in natural language processing, recent works have tried to apply its main component, the attention mechanism, to solve different COPs. Deudon et al. [
2018] and Kool et al. [
2018] trained an architecture based on attention mechanisms to solve routing problems, improving previously reported results [Vinyals et al.
2015; Bello et al.
2016; Khalil et al.
2017a]. More recently, Kwon et al. [
2020] achieved even better performance for the TSP, specifically, they obtained on average a gap of 0.14% with respect to the optimum value in TSP instances of 100 cities. They used the model from Kool et al. [
2018] and presented an approach, called POMO, that takes different initializations of the same instance and forms a batch to perform the inference. Finally, Kwon et al. [
2021] propose a double-GNN with attention mechanisms that operates on complete bipartite graphs, making it suitable for problems based on data matrices, such as scheduling problems and linear/quadratic assignment problems.
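The multiple-start idea behind POMO can be sketched independently of the concrete model: roll out the same deterministic policy from every possible first city and keep the best tour. Here a nearest-neighbor rule is a hypothetical stand-in for a trained model's greedy decoding; POMO performs these rollouts as a single batched forward pass on the GPU.

```python
import math

def tour_length(coords, tour):
    return sum(math.dist(coords[tour[k - 1]], coords[tour[k]])
               for k in range(len(tour)))

def nearest_neighbor_from(coords, start):
    """Deterministic construction policy; a stand-in for a trained
    model's greedy rollout starting at a given city."""
    remaining = set(range(len(coords))) - {start}
    tour = [start]
    while remaining:
        nxt = min(remaining, key=lambda c: math.dist(coords[tour[-1]], coords[c]))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

def multi_start_inference(coords):
    """POMO-style inference: run the same policy from every possible
    first city and keep the best tour found."""
    candidates = [nearest_neighbor_from(coords, s) for s in range(len(coords))]
    return min(candidates, key=lambda t: tour_length(coords, t))
```

The extra rollouts cost almost nothing when batched, which is what makes this simple trick so effective in practice.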
The race to propose more efficient algorithms is bringing significant progress to the NCO field. Nevertheless, some aspects, such as training cost, transferability, and reusability, are put aside in these works. In what follows, we present a broad analysis to address the concerns that arise from the introduction of NCO in the conventional optimization paradigm and propose good practice guidelines to follow.
3 Critical Analysis
As discussed previously, NCO models seem to be a valuable tool in the field of combinatorial optimization. But are they competitive on a standalone basis with state-of-the-art algorithms? From the viewpoint of a combinatorial optimization practitioner, carrying out a performance comparison between NCO and Conventional Optimization Algorithms (COAs), such as metaheuristics, presents a number of problems that need to be addressed. The main difference between the application of COAs and NCO models comes from the optimization pipeline.
The general pipeline followed in a conventional optimization process usually starts with an (a set of) instance(s) to be solved and a computational budget. Depending on the problem class and the budget, an algorithm (or many) is selected along with its hyper-parameters, whose values are usually set to those recommended by the author(s). Subsequently, the algorithm starts the optimization process and once the budget expires, a result (solution to the problem instance) is provided.
Conversely, DL, and consequently NCO, runs a different pipeline. As mentioned in the introduction, the NCO pipeline involves two phases. The first, the training phase, consists of minimizing a loss function by adjusting the weights of a neural network and requires a significant amount of time and resources. Once the weights are fixed, the model is ready to be used (inference phase), providing, in a short time, a solution for each instance that is given. The inference phase can be repeated for several problem instances without the need to train the model again. This makes NCO suitable for optimization problems that need to be solved frequently and in a short period of time. In contrast, current state-of-the-art metaheuristics become superior when given a larger time budget. Placing both approaches side by side, it is easy to see that NCO introduces an optimization pipeline that clashes with the conventional framework in a number of aspects.
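Schematically, the two pipelines can be contrasted as follows; `optimize`, `train_fn`, and `infer_fn` are hypothetical stand-ins for a COA run, the NCO training phase, and the NCO inference phase, respectively.

```python
def coa_pipeline(instances, optimize):
    """Conventional pipeline: each instance is optimized from scratch
    until its budget expires, with no knowledge carried over."""
    return [optimize(x) for x in instances]

def nco_pipeline(train_instances, test_instances, train_fn, infer_fn):
    """NCO pipeline: one expensive training phase fixes the model
    parameters; each new instance is then solved by fast inference."""
    params = train_fn(train_instances)                   # done once
    return [infer_fn(params, x) for x in test_instances]  # fast, repeatable
```

The amortization of `train_fn` over many calls to `infer_fn` is precisely what makes the fair accounting of computational cost non-trivial.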
In what follows, we address the unanswered questions focusing on the presented four interrelated aspects that need to be studied to fairly compare these models with COAs and contribute to the scientific progress.
Performance analysis. When developing NCO approaches for a given problem, comparing their performance to the current state-of-the-art proposals is a must, not with the purpose of invalidating the NCO algorithm, but to put it into perspective, as is done when evaluating COAs. In that sense, making a fair comparison between NCO and COAs is not trivial, as the experimental setups used in the two paradigms differ. Traditionally, when comparing COAs, two different stopping criteria are used: a limited computation time or a fixed number of objective-value evaluations, each having its supporters and detractors. In fact, most COAs are able to improve their results if a larger budget is available, which makes it difficult to establish a limit that is fair to all the algorithms included in a comparison. Therefore, reporting the real objective values obtained by each of the proposals, NCO and COA, even if they do not use the same budgets, seems to be the only rigorous way to conduct a pure performance analysis.
Computational cost. NCO models need to be trained for several epochs; once the training has finished, they are ready to infer (solve) a large number of instances in a short period of time. Conversely, COAs face each instance individually and start the optimization procedure from scratch, generally without any knowledge transfer between instances. In a real scenario in which the model will be applied continuously to new instances, the training time could be ignored. However, the training time should be considered in the computational cost evaluation of NCO works, as it can be significant depending on how often the model has to be updated (re-trained).
It should be noted that both COAs and NCO methods have specific hyper-parameters, such as population size or mutation rate for the former and the number of layers or the learning rate for the latter. As in both cases these hyper-parameters need to be tuned to optimize their performance, we decided not to include this calibration step in the computational cost analysis.
Another issue when comparing the computational efficiency of the available algorithms comes from the different programming languages used to code them and the hardware on which they run. While COAs are generally written in C/C++ and deployed on CPUs, NCO models are mostly implemented in Python and use libraries optimized to carry out parallelized training and inference on GPUs. To perform a fair comparison, we should implement them in the same programming language and try to run them on similar hardware, which is neither natural nor efficient. Thus, it seems reasonable to compare the algorithms as implemented and executed in the programming languages and hardware infrastructure that the final practitioner will have easy access to, without requiring large overheads.
It has been broadly reported that the use of exact methods is intractable with very large NP-hard instances, as the required computation time grows exponentially. Similarly, there is a fairly high memory/computation cost when training NCO models, which grows with both the training batch and the instance size. For this reason, an analysis on the time and memory consumption of these algorithms can give an intuition of their limitations.
Training & Transferability. In the optimization field, COAs are tested on different functions and/or instances of a given problem. To measure and compare their performance, common testbeds are required such as real-world benchmarks. Unfortunately, as large sets of real-world problems (instances) are difficult to obtain, NCO models are usually trained using randomly generated instances. However, designing random generators capable of drawing instances from the desired target distribution is not trivial. Instead, a common strategy is to use uniform-distribution random generators, which are simple to create, even though they may not be a faithful reflection of real problem instances.
Besides that, it is key to study the transferability of NCO models, i.e., the capacity to apply learned insights to diverse test instances. This becomes particularly important when (1) test instances differ in size, (2) have different characteristics that are not covered by the used generator, or (3) when models are trained on small real-world sets. These situations could result in model overfitting and limited transferability.
Model reusability. As stated in the introduction, Bello et al. [
2016] and Kwon et al. [
2020] claim that NCO methods do not need any problem-specific knowledge. In our opinion, this statement is partially true. On the one hand, no knowledge is explicitly required during the training and inference phases. On the other hand, even if the model is applicable to a particular problem, to design a good-performing NCO model, using problem-specific knowledge during the design of its architecture is essential. Thus, studying model reusability in terms of applicability to (and performance on) different problems would help to explore new NCO architectures.
3.1 Literature Analysis
Once we have introduced the key aspects that, in our opinion, should be considered in all the new NCO proposals, let us examine the existing literature in the field, focusing on the aforementioned four aspects. Exact details have been collected in Table
1. In what follows, a summary is provided:
Performance. As can be seen in Table
1, most of the NCO models have been applied to routing problems. Regarding performance, the best proposals are able to beat constructive or baseline approaches, but they are outperformed by the state-of-the-art algorithms, which can solve large TSP instances in only a few seconds [Applegate et al.
2006]. As an exception, Manchanda et al. [
2019] claim that their algorithm is marginally better than the state-of-the-art for the Influence Maximization problem.
Computational cost. In the current literature, almost no information is provided regarding the training phase, such as the GPU hours and the type of GPU used, which are crucial for estimating training costs. The context in which the NCO model will be implemented conditions the impact of the training time. For example, a user that needs to solve a COP once a day could consider using a metaheuristic or even an exact method. Conversely, in an environment where problem instances are solved in a matter of seconds, NCO approaches could have a place, the training time being negligible.
Training data & Transferability. Most of the works use randomly generated instances for training, following a common trend in combinatorial optimization. However, the influence of the instances used for training and the transferability of the models to other sets of instances has not been studied in depth. In addition, models are mostly applied to toy-size instances, which may be due to the limitations of the models and their lack of scalability, both in terms of computational resources and/or performance.
Model reusability. As described in Table
1, the vast majority of works tackle routing problems, which have become the main playground of NCO algorithms. As a result, models for dealing with routing problems, or those that have an underlying graph structure, have been intensively studied. However, these models have not been tested on problems of a different nature, such as assignment, cutting, or packing, and thus, model reusability remains an open line of research.
As can be observed, the four main aspects have not been properly studied in the NCO literature. Thus, to bridge the gaps among all those works and provide an answer to the questions that we have raised, we conduct an exhaustive evaluation and a critical analysis of a purpose-built NCO model, designed for solving two well-known permutation-based problems: the LOP and the PFSP.
5 Experimentation
In the following, we illustrate the experimental application of the end-to-end model described in the previous section and compare it with a number of algorithms on the LOP and the PFSP. As the goal is to address the emerging questions already discussed in Section
3, we will conduct a set of experiments to answer them.
5.1 General Setting
All the experiments share a common setting, which is described below.
Instances. As described in Section
3, we distinguish two main sources of instances needed to train and evaluate the models and COAs: instance generators and benchmarks.
In the case of the LOP, the most evident way of creating a generator is to randomly sample each entry of the matrix
B from a uniform distribution in
\((0, 1)\). Regarding benchmarks, the LOLIB [Reinelt
2002] is the most commonly used LOP library, which is composed of real-world instances (
IO (31 instances),
SGB (25), and
XLOLIB (39)) and randomly generated instances that try to mimic real-world data (
RandB (20),
RandA1 (25), and
RandA2 (25)). Both instance sources will be adopted for the experiments. See Appendix
E for more details about the nature of these instances.
For the PFSP, the random instances, i.e., the processing times of the jobs, are generated from a uniform distribution in
\((0, 100)\). Moreover, instances from the Taillard benchmark [Taillard
1993], composed of 10 instances for each size, are used to further evaluate the transferability of the algorithms.
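As a sketch, both uniform generators described above can be implemented in a few lines. The list-of-lists layout and the zeroed LOP diagonal are our assumptions for illustration; the paper does not fix a concrete representation.

```python
import random

def random_lop_instance(n, rng):
    """LOP instance: an n x n matrix B with entries drawn uniformly
    from (0, 1); the diagonal plays no role in the objective."""
    return [[rng.random() if i != j else 0.0 for j in range(n)]
            for i in range(n)]

def random_pfsp_instance(jobs, machines, rng):
    """PFSP instance: a jobs x machines matrix of processing times
    drawn uniformly from (0, 100)."""
    return [[rng.uniform(0, 100) for _ in range(machines)]
            for _ in range(jobs)]

rng = random.Random(42)
B = random_lop_instance(20, rng)
P = random_pfsp_instance(20, 5, rng)
```

Such uniform generators are simple to create but, as noted in Section 3, may not faithfully reflect real-world instance distributions.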
Hyper-parameters. We have experimented with differently sized models. The size mainly comes from the number of encoding layers
L, the embedding size
d of the embedding vectors
h and
e used throughout the model, and the hidden layer size of the MLP. We have found that increasing the number of layers and the embedding size beyond 3 and 128, respectively, does not significantly improve the GNN performance, while it increases the computational cost. Thus, those values have been used, forming a model with 500k learnable parameters. For further details, see Appendix
C.
Adhering to widely applied techniques, the model parameters are optimized with AdamW [Loshchilov and Hutter
2017] with a learning rate of 1e-4, beta values of 0.9 and 0.95, and a weight decay of 0.1. Finally, the batch sizes used are 512, 128, 64, and 32 for sizes 20, 30, 40, and 50 in the LOP and 20-5, 20-10, 20-20, and 50-10 in the PFSP, respectively.
Training. We train four different GNN models for each problem, using instances of sizes \(n=20, 30, 40,\) and 50 for the LOP and instances of sizes (num. jobs - num. machines) \(20-5\), \(20-10\), \(20-20\), and \(50-10\) for the PFSP. Each model is trained until convergence using an early-stopping criterion with a patience of 20 validation epochs; 1,000 batches are seen in each epoch.
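The early-stopping rule can be sketched generically as follows; `validate` is a hypothetical callback standing in for one training epoch (1,000 batches) followed by a validation pass that returns a score to maximize.

```python
def train_with_early_stopping(validate, max_epochs=10_000, patience=20):
    """Stop training once the validation score has not improved for
    `patience` consecutive validation epochs, and return the best
    score together with the epoch at which it was reached."""
    best, best_epoch = float("-inf"), 0
    for epoch in range(max_epochs):
        score = validate(epoch)
        if score > best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # patience exhausted: assume convergence
    return best, best_epoch
```

This is the standard convergence criterion; the specific patience value of 20 follows the setting reported above.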
Hardware and Software. Models are trained in two
Nvidia RTX 3090 GPUs with 24 GB of memory each. The NCO algorithms have been implemented from scratch in
Python 3.8 using the
PyTorch 1.10 package.
Conventional algorithms are written in
C++ and executed on a cluster of 55 nodes, each one equipped with two
Intel Xeon X5650 CPUs and 64 GB of memory.
Algorithms. Among the set of conventional algorithms, we distinguish three groups: exact methods, constructive heuristics, and metaheuristics. For each group, we selected the algorithms that compose the state-of-the-art; their stopping criteria are based on the limits reported in the original works, so that these algorithms can reach their reported performance.
Considering exact methods, a fast non-commercial exact solver called
SCIP [Achterberg
2009] will be used to solve the LOP, and the
Discrete Optimization Global Search (DOGS) framework [Libralesso
2020] for the PFSP. Exact algorithms will be run with a time limit of 12 h per instance and, in case the optimum is not found, the best solution found will be reported. Among the constructive heuristics, the algorithm by Becker [
1967] and the
Liu-Reeves heuristic [Liu and Reeves
2001]
(LRnm) are the best-performing options for the LOP and the PFSP with the TFT criterion, respectively. As they are deterministic constructive algorithms, they are run once, each producing a single solution. The constructive heuristics and exact solvers have been implemented following the respective handbooks. Also, we consider state-of-the-art metaheuristics for both problems: a
Memetic Algorithm (MA) [Lugo et al.
2021] for the LOP and a
Differential Evolution (DE) algorithm [Santucci et al.
2015] for the PFSP. Metaheuristics will be stopped once
\(\hbox{1,000}n^2\) objective function evaluations are computed, a common stopping criterion for comparing different metaheuristics [Lugo et al.
2021; Santucci et al.
2015]. Regarding the NCO model described in Section
4.3, we will include the results given by a single model (denoted GNN), and the best results given by an ensemble of five separately trained models (denoted GNN-Pop), the latter being an additional straightforward way to exploit NCO models.
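The ensemble strategy amounts to a best-of-k selection over independently trained models, which can be sketched as follows. The `models` below are hypothetical stand-ins for trained networks that each return a permutation; the LOP objective (the sum of matrix entries above the diagonal after reordering) is used to pick the winner.

```python
def lop_objective(B, perm):
    """LOP objective: sum of the matrix entries that end up above the
    diagonal once rows/columns are reordered by `perm` (maximized)."""
    n = len(perm)
    return sum(B[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n))

def ensemble_solve(B, models):
    """GNN-Pop idea: query several independently trained models on the
    same instance and report the best solution found among them."""
    solutions = [model(B) for model in models]
    return max(solutions, key=lambda p: lop_objective(B, p))
```

Since the models are trained independently, the ensemble never performs worse than its best member on a given instance, at the cost of k inference passes.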
5.2 Performance Analysis
This first experiment has been designed to measure the performance of the end-to-end model compared to the algorithms described in the previous section. For this first purpose, a test-set of 1,000 random instances will be created for each problem, with instances similar (in terms of size and generator) to those used for training.
Results of the algorithms for the LOP are depicted in Table
2. The exact algorithm is capable of finding the optimal solution for values of n up to 40. However, for larger instances, it is unable to identify the optimum within the given time limit; therefore, we report the gaps of the best solution found in these cases (marked with * in the table). The MA is the most competitive; in fact, it provides the best results for all sizes among the studied algorithms. The GNN model outputs good-quality solutions, with a gap between 0.2% and 0.5%, and systematically outperforms Becker’s constructive. Moreover, when an ensemble of models is used (GNN-Pop), its performance improves (gaps of 0.1%–0.3%), but at the cost of a larger computational effort.
Similarly, the performance results of the algorithms to solve the PFSP are depicted in Table
3. Again, the metaheuristic (DE) is the best performer, while the GNN is superior to the LRnm heuristic. While for the LOP the GNN was able to outperform the exact algorithm for
\(n = 50\), in the case of the PFSP, for 50 jobs and 10 machines, the exact method is superior to both the constructive and the GNN. Overall, the performance of the GNN is relatively worse for the PFSP than for the LOP. This is, in our opinion, due to the different nature of the PFSP: its fitness function, as opposed to that of the LOP, is harder to represent by pairwise interactions, and thus, it is not naturally learned by the GNN.
5.3 Computational Cost
Together with the performance, practitioners must take into account the execution time required by an algorithm to obtain the solution, which is crucial in optimization contexts in which time restrictions are present.
Following the setting from the previous section, Table
4 shows the execution times of the different algorithms to solve a single instance of the LOP. As the size increases, the exact algorithm quickly reaches unaffordable execution times, while the constructive (Becker) scales successfully. The time required by the metaheuristic grows quadratically with
n, which can be a bottleneck for larger sizes, but for the sizes tested in this work the execution time is still reasonable. Once trained, the GNN shows a very fast response time, just a few minutes for the largest size tested.
The experiments for the PFSP show a similar trend. As can be seen in Table
5, the constructive (LRnm) is the fastest algorithm. However, the GNN requires only a slightly larger budget than LRnm, and it is capable of performing inference on the largest studied size (500/20) in only a few minutes. Finally, the exact method is again the most time-costly, while the metaheuristic (DE) is the second. Compared to the GNN, DE is 30 times slower for instances of 500 jobs and 20 machines. For a complete overview of the Pareto analysis of the different algorithms considering both solution quality and computational cost, see Appendix
D. Apart from the inference time, one should also consider the training time of the GNNs, which is the most time-demanding step, taking several hours (up to 29 hours for LOP instances of size \(n=50\)) and becoming unaffordable for sizes larger than 50 with our hardware. In our opinion, training time is a relevant aspect in scenarios that require the model to be updated continuously, but it could be ignored otherwise.
To complete the experiment, an analysis of the memory consumption has been conducted. Constructive algorithms, as well as metaheuristics, show a linear growth, which makes their memory requirements affordable. However, the memory consumption of training GNNs grows polynomially with the model size and the batch size. Figure
3 shows, for the LOP, the memory usage curves for commonly used batch sizes. It can be seen that the training phase of GNN models is heavily memory-intensive, which quickly limits their applicability.
5.4 Transferability to Different Instance Sizes
Considering the huge computational resources required by NN-based models to solve large instances, size-invariant models are really valuable, that is, models that have been trained on small instances and are later applied to solve larger ones. However, they may lack the ability to generalize the learned knowledge to larger and more complex instances, so competitive performance is not guaranteed [Joshi et al.
2020].
That being said, we conduct experiments to investigate the generalization ability of GNN models to different instance sizes. For the LOP, we report results provided by models trained with instances of \(n = 20, 30, 40,\) and 50, and evaluated on instances of up to \(n = 1{,}000\). In view of the results (see Table 6), the GNN model shows a good generalization capability, as the difference with the best solution worsens only slightly (gaps from 0.5% to 0.6%). Regarding generalization, more complex models, such as GNN-40 or GNN-50, generally show higher performance than the simpler ones (GNN-20 and GNN-30), which is the behavior one would expect. In addition to the performance, it is important to remember again the quick response time of these models, an aspect that can be relevant in some scenarios.
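The gaps quoted throughout this section follow the usual relative-deviation definition for a maximization problem such as the LOP; a minimal helper (the function name is ours):

```python
def optimality_gap(best_value, model_value):
    """Relative gap (%) of a solution with respect to the best-known
    value of a maximization problem: 0.0 means the best solution was
    matched; larger values mean worse solutions."""
    return 100.0 * (best_value - model_value) / best_value
```

For example, a solution of value 995 against a best-known value of 1,000 yields a 0.5% gap.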
Additionally, we compare the performance of the GNNs with that of the rest of the proposals as a function of the instance size. Figure 4 shows how the performance gap between the best-performing algorithm (MA) and the GNN model increases from n = 20 to n = 200, remains constant from n = 200 to n = 400, and slightly decreases for larger sizes, which highlights the generalization capacity of the GNN models. Regarding the constructive algorithm (Becker), it consistently reduces its gap with all the other algorithms, even though the improvement decelerates for very large instances.
In the case of the PFSP, the experiment considers three models trained with instances of size (number of jobs/number of machines) 20/5, 20/20, and 50/10. As can be seen in Table 7, the model trained with smaller instances is the best performer for the 20/5 instances, while those trained with larger instances (20/20 and 50/10) generalize better, even though they struggle with smaller instances. However, in the PFSP one needs to consider not only the size of the permutation (the number of jobs) but also the number of machines. In fact, the latter has a considerable effect on the performance of the model; note that GNN20-20 outperforms GNN50-10 on instances of 200 jobs and 20 machines. We can conclude that: (1) the closer the number of jobs and machines of the training instances to those of the inference instances, the better the performance; and (2) for large instances, which are usually intractable for exact methods, the practitioner should consider training the model with the largest instances possible, always within the budget limits.
5.5 Training Data & Transferability to Target Benchmarks
To analyze the transferability of the learned model to other types of instances, and given that our purpose is to solve a certain target set of instances, we consider three different training setups:
(1)
The model is trained with random instances obtained from generators, and then target instances are solved using the model.
(2)
The model is trained in two steps: first with random instances obtained from generators, and then with an additional training phase using a small set of instances sampled from the target set. In this setup, 100 epochs are dedicated to both training steps.
(3)
The model is trained using a subset of instances sampled from the target set without any previous training.
For the first and the second cases, we will use uniformly distributed instances, and instances from the LOLIB and Taillard benchmarks will constitute the target set for the second and third cases. For the experiment, half of the benchmark instances will be used to form the training set and the other half will be used for testing.
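Setup (2) can be sketched as a generic two-phase training loop (the names are ours; `model_step` abstracts one training update, and we assume 100 epochs per step):

```python
import random

def pretrain_then_finetune(model_step, random_generator, target_instances,
                           epochs=100, seed=0):
    """Sketch of training setup (2): pretrain on randomly generated
    instances, then continue training on a small sample drawn from the
    target benchmark.  Returns the sequence of observed losses."""
    rng = random.Random(seed)
    losses = []
    # Step 1: train on instances drawn from a random generator.
    for _ in range(epochs):
        losses.append(model_step(random_generator()))
    # Step 2: additional training on target-benchmark instances.
    for _ in range(epochs):
        losses.append(model_step(rng.choice(target_instances)))
    return losses
```

Setups (1) and (3) correspond to running only the first or only the second phase, respectively.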
Table 8 gathers the results obtained for the different setups in the LOP. Even though LOLIB instances are heterogeneous regarding their origin and the procedure used to create them, training the model with random instances (setups 1 and 2) is notably better than training the model directly on the set of instances we want to solve. However, it can be observed that GNNs trained with random instance generators generally perform better on random instances (RandA1, RandA2, RandB, and MB) than on real-world instances (IO, SGB, and XLOLIB). So, the use of generators is advisable, but these generators should be able to produce instances as close as possible to the ones we want to solve.
Finally, as seen previously, including an additional training phase with instances from the target distribution is really helpful (setup 2, GNN-Int), providing the best-performing models (gaps between 0.5% and 3.72% with respect to the best solutions found).
This experiment has also been designed to give an intuition about the transferability between different LOLIB benchmarks. That is, considering the computational effort required to train a model, can this model be applied successfully to other types of instances? In this regard, interesting results have been found. Remarkably, training on SGB is transferable to IO, training on XLOLIB is transferable to SGB, and training on random instances is transferable to the rest of the random benchmarks.
Regarding the PFSP, Table 9 shows that the intensification phase does not achieve such good results. Instead, models trained with a larger set of randomly generated instances yield better performance. Overall, the gap between the LRnm heuristic and the GNNs is smaller on Taillard instances, even though, in general, the GNN is still better.
5.6 Model Reusability
In addition to the aspects analyzed throughout this section, model reusability is also a desired characteristic for optimization algorithms. That is, a model that is general enough to be used in many different problems without much need for tuning. The GNN model designed in this article fulfills this property, as it can be applied “as is” to any graph-based optimization problem. Of course, the practitioner must feed the network with the appropriate node and edge feature values, and even then the performance may not be good enough. In fact, even though both problems can be approached using the proposed model, the LOP seems to fit the model’s characteristics better, which is possibly the reason why the model performs better on the LOP. These results confirm that model reusability is an aspect that deserves more attention from the NCO community.
6 Discussion and Future Work
Through the experimentation section, we have tested different aspects and properties of the end-to-end model to analyze its behavior and competitiveness.
First, our observations indicate that NCO models have the potential to be general-purpose algorithms. Proof of this is the successful application of the same GNN architecture to both the LOP and the PFSP in this article. As the NCO field is still in early development, its effectiveness is likely to improve with anticipated advancements in the DL-optimization sector. This will also be enhanced by a deeper understanding of how to incorporate problem-specific knowledge into NCO models.
We made an effort to propose a good end-to-end model for optimizing the LOP and the PFSP, trying to find the most competitive training strategies. Although the conducted experiments showed that the NCO model obtains a remarkable performance when compared to the constructive heuristics, it is still not able to beat state-of-the-art methods (such as MA or DE). Furthermore, NN-based models have a serious drawback regarding training time and memory requirements as the instances become larger.
In light of the experimental results, one might wonder about the comparative advantages of using NCO models over metaheuristics. While specialized metaheuristics have produced better-quality solutions across different problems, they require several hours to properly converge for the largest studied sizes. Conversely, NCO methods are able to provide a solution in a few minutes, and those solutions are better than the ones provided by the classical constructive methods included in the study. Thus, in environments where a fast response is required (assuming some performance loss), these models are an interesting option, for example, in online decision-making optimization problems (e.g., logistics). However, it must also be noted that the training is computationally very expensive, so, to be efficient, the model should not need to be re-trained frequently, as it would lose its competitiveness (fast response). In this regard, it is also worth noting the valuable advantage of input-size-invariant models, such as the one designed in this work. They can be trained on n-sized instances and later be applied to \(m \gg n\)-sized instances, maintaining a constant gap, or even reducing it, with respect to the state-of-the-art metaheuristics, which usually suffer to a greater extent as the instance size increases.
Regarding the training process, there is another aspect that must be considered: do we always need a large set of instances of the problem at hand to train a good-performing model? As observed in the experiments, the effectiveness of the model is greatly influenced by the quantity and the diversity of the instances used for training. While thousands of instances are required for training, real benchmarks generally do not contain enough instances. Here, random instance generators come into play. However, implementing random instance generators that produce samples with characteristics similar to the target scenarios is usually challenging. Nevertheless, we have shown that, when ad hoc generators are not available, a successful alternative is to employ uniform random generators to train a baseline model and, if possible, incorporate an additional training phase with a small subset of real instances.
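A uniform random generator for LOP-like instances is simple to implement; the sketch below only needs to fill a weight matrix (the matrix form and value range are illustrative assumptions, and the generators actually used in the experiments may differ):

```python
import random

def uniform_lop_instance(n, low=0, high=100, seed=None):
    """Sketch of a uniform random generator for LOP-like instances:
    an n x n integer weight matrix with a zero diagonal.  The value
    range [low, high] is an illustrative assumption."""
    rng = random.Random(seed)
    return [[0 if i == j else rng.randint(low, high) for j in range(n)]
            for i in range(n)]
```

Matching the characteristics of a real target benchmark (e.g., the weight distributions of LOLIB instances) is the hard part that such a uniform generator does not address.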
It is important to acknowledge the limitations of the experiments conducted in this study, which are constrained by factors such as the scope of the problems evaluated and the specific configurations of the NCO models tested. These limitations must be considered when interpreting the results. With this in mind, the experiments conducted in this work suggest that NCO methods have certain capabilities that make them an interesting case study. Although NCO is not able to outperform state-of-the-art metaheuristics as a standalone method, its fast inference time and generalization ability can be helpful in combination with other methods. With this purpose in mind, recent reviews [Bengio et al. 2021; Talbi 2021] identify different works that employ Machine Learning models to improve Branch and Bound algorithms, for example, deciding in which node to apply a given heuristic [Khalil et al. 2017b] or generating a high-quality joint variable assignment [Nair et al. 2020].
In our opinion, there exists another relevant research topic that is in line with the NCO model proposed in this article. The output of the NCO model developed for this work is a probability vector that is used to guide the construction of a solution, i.e., deciding the item to place in the next empty position of the permutation. However, the NCO model could also be used to guide more advanced algorithms, such as trajectory- or population-based metaheuristics. A relevant research line consists of developing models capable of encoding the whole population of solutions as an internal state in a population-based metaheuristic, or models that encode the current solution in trajectory-based algorithms, such as local search (LS). In fact, LS schemes usually employ quadratic-size neighborhood structures that are computationally intensive to process. In such cases, given a problem instance and a solution, it would be very valuable to have a model that is able to propose the most promising neighborhood operation (or operations) to choose. Note that this is challenging since, in addition to the instance, the models must encode the solution (received as input) at which the LS algorithm is at each step. Although few, there are some works that can inspire research in this direction [Chen and Tian 2019; Wu et al. 2021; Garmendia et al. 2022]. Another interesting research line involves a detailed investigation of the composition of the training dataset necessary for effectively training the model. This includes designing generators that consider the problem structure and the characteristics of the target instances. In this context, the application of generative models, such as Generative Adversarial Networks [Goodfellow et al. 2020] and Diffusion models [Ho et al. 2020], appears particularly promising for automatically generating instances close to the target distribution.
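Returning to the model-guided local search idea, such a model would replace the exhaustive evaluation of a quadratic swap neighborhood with a single scoring pass. A minimal sketch with a stand-in scoring function (`score_move` would be the learned model in practice):

```python
from itertools import combinations

def best_swap(permutation, score_move):
    """Sketch of one model-guided local search step: enumerate the
    quadratic neighborhood of pairwise swaps and return the move that
    the scoring model deems most promising, without evaluating the
    objective function for every neighbor."""
    moves = combinations(range(len(permutation)), 2)
    return max(moves, key=lambda move: score_move(permutation, *move))
```

The enumeration is still quadratic here; the gain of a learned `score_move` would come from avoiding a full objective evaluation per neighbor, or from predicting a small set of candidate moves directly.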
Finally, it is worth noting that almost all the proposals in the literature focus on solving scheduling and routing-like problems. However, in the optimization area, we can find problems with constraints, multiple objectives, a dynamic nature, and so on, and the design of NCO models to solve this kind of problem is still at an early stage. For example, when it comes to dealing with hard-constrained problems, three main approaches are discussed in the current NCO literature: (1) manually forcing the solution to fit the given constraints, restricting unfeasible actions [Cappart et al. 2021b]; (2) adding a penalty value to the reward function for unfeasible outputs, as mentioned by Bello et al. [2016]; and (3) allowing the model to initially produce unfeasible solutions that are subsequently adjusted using a repair operator to make them feasible [Zhang and Dietterich 2000]. However, instead of relying on external mechanisms such as the above, exploring models able to internally deal with hard constraints (but also multiple objectives, dynamism, etc.) is a key research line.
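Approach (1), restricting unfeasible actions, is commonly implemented by masking the model's output logits before the softmax, so that unfeasible actions receive zero probability. A generic sketch, not tied to any specific model:

```python
import math

def masked_softmax(logits, feasible):
    """Sketch of constraint handling by action masking: the logits of
    unfeasible actions (e.g., items already placed in the permutation)
    are set to -inf, so the softmax assigns them zero probability."""
    masked = [x if ok else float("-inf") for x, ok in zip(logits, feasible)]
    highest = max(masked)                     # for numerical stability
    exps = [math.exp(x - highest) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

In a constructive permutation model, `feasible` would mark the items not yet placed, which is exactly how the sampling step guarantees a valid permutation.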