
11 Performance and Limitations of Metaheuristics

11.1 Empirical Analysis of the Performance of a Metaheuristic


In contrast with exact algorithms whose worst-case time complexity is known (see
Chapter 1), metaheuristics do not provide that kind of bound. They can be very ef-
fective on a given instance of a problem and, at the same time, show long running
times on another without finding a satisfactory solution. On the other hand, for ex-
ample, the selection sort algorithm could spend different amounts of time on an al-
ready sorted list and on a list sorted in the opposite order, but we know that, on any
list permutation, its time complexity function T(n) will be bounded by a second-
degree polynomial and the list will be sorted correctly. However, for hard problems
of increasing size such guarantees are useless in practice since the problems become
intractable. This was exactly why we looked at metaheuristics as a generally efficient
way of tackling hard problems. As we remarked in Chapter 1, more rigorous meth-
ods giving performance guarantees do exist for difficult problems but they are not as
general and easy to apply.
In the case of metaheuristics, the hope is clearly to get polynomial-time-boun-
ded computations for difficult problems, but we cannot be sure that this will be the
case; moreover, we have no guarantee that the solution found will be globally op-
timal, or even of high quality. The computing time can vary depending on
the problem, the particular instance at hand, the chosen metaheuristic and its parame-
ters. In addition, almost all the metaheuristics illustrated in this book employ random
choices, which means that the computing time, as well as the solution quality, are ac-
tually random variables. The stochastic nature of most metaheuristics makes their
rigorous analysis difficult. We have seen that this kind of analysis is possible in cer-
tain cases such as evolutionary algorithms or simulated annealing [8], but the results,
being in general asymptotic, are of little help.
Given these considerations, researchers have been led to adopt em-
pirical methods to measure the performance of a metaheuristic and to compare meta-
heuristics with each other. The approach is essentially based on standard statistical
methods and the goal is to be able to ensure that the results are statistically significant.
In what follows we assume that the parameters that characterize a given metaheuristic have been chosen and that they do not change during the measurement. As we
have seen in the previous chapters, correctly setting these parameters is very impor-
tant for the efficiency of the method, for example the initial temperature in simulated
annealing (see Chapter 4), or the mutation rate in an EA (Chapter 8). Usually, these
parameters are either set by using standard values that worked on other problems
or the algorithm is run a few times and suitable values are found. To simplify and
unify the treatment, here we shall assume that this choice has already been made. In
a similar vein, we will ignore more sophisticated metaheuristics in which parameters
are dynamically changed online as a result of learning during the search.
To test a given metaheuristic on one or more classes of problems, or to compare
two metaheuristics with each other, the computational experimental methodology is
much the same and goes through the following phases:

• Choice of problems and of their instances;
• Choice of performance measures and statistical analysis of the results;
• Graphical or tabular presentation of results and their discussion.

11.1.1 Problem Choice

There are fundamentally two broad classes of problems to choose from: real-world
instances that come from operations research, engineering, and the sciences, and
“synthetic” problems, which are those that are artificially constructed with the goal
of testing different aspects of search. The approach is essentially similar in both
cases; however, given the didactic orientation of our book, we shall make reference
only to constructive and benchmark problems in the rest of the chapter.
Problem-based benchmark suites are a good choice because they allow one to
conceive of problems with different features. Benchmark functions for continuous
optimization are very widely used because of the practical importance of the problem
in engineering, economics, and the sciences. These benchmarks must contain diverse
functions so as to test for different characteristics such as multimodality, separability,
nonlinearity, increasing dimensions, and several others. A recent informative review
on test functions for mathematical optimization can be found in [44]. For reasons of
space, in the rest of the chapter we will limit ourselves to combinatorial optimiza-
tion test problems. In this case, the important features that are offered in standard
repositories of problem instances such as SATLIB or TSPLIB are a range of instance
sizes, the amount of constrainedness, and the way in which the problem variables
are chosen. On the last point, and roughly speaking, there are two types of instances
in benchmark suites for discrete problems: artificially built instances and randomly
generated instances. Artificial instances can be very useful: they can incorporate cer-
tain characteristics of real-world instances, and they can also help expose particular
aspects that are difficult to find in real-life problems. Randomly generated instances
are very frequently used too, for example in SAT problems or TSP problems. They
have the advantage that many instances can be generated in an unbiased way, which
is good for statistical analysis; on the other hand, deviations from randomness are
very common in combinatorial problems and thus these instances might have little to
do with naturally occurring problems. In any case, as we discussed above for global
mathematical optimization, we should use sufficiently varied sets of test functions,
including at least both random and structured instances, once the parameters of the
metaheuristics have been set.

11.1.2 Performance Measures

Among the most interesting data to measure about the search behavior of a meta-
heuristic we mention the computational effort, that is, the computing time, and the
solution quality that can be obtained with a given computational effort. Computing
time can be measured in two ways: either as the physical time elapsed, which is
easily recorded through calls to the operating system, or as a relative quantity such
as the number of operations executed. In optimization, an even simpler, often-used
quantity is the number of objective function evaluations. Using the clock time is use-
ful for practitioners who are only interested in the behavior of a solver on a given
computer system for a restricted class of problems. However, there are several draw-
backs. Clock times depend on the processor used and also on other hardware and
software details such as memory, cache, operating system, languages, and compilers.
This makes comparisons across different systems difficult, if not impossible. On the
other hand, the number of function evaluations is system-independent, which makes
it useful for comparisons. A possible drawback of this measure is that it becomes
inadequate if the problem has a time-varying fitness function, or when the objective
function evaluation only accounts for a small fraction of the total computing time,
giving results that cannot reliably be generalized. In spite of some limitations, fitness
evaluation counts are widely used in performance evaluation measures because of
their simplicity and independence from the computing system.
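To make the evaluation count concrete, here is a minimal sketch (not taken from the book) that wraps the objective function in a counter; the quadratic objective and the candidate list are only placeholders standing in for a real problem and search procedure.

    # Sketch: count objective-function evaluations, a system-independent effort measure.
    class CountingObjective:
        def __init__(self, f):
            self.f = f
            self.evaluations = 0

        def __call__(self, x):
            self.evaluations += 1
            return self.f(x)

    # Placeholder objective; any metaheuristic would call 'objective' in its inner loop.
    objective = CountingObjective(lambda x: sum(xi * xi for xi in x))
    candidates = [[0.0] * 5, [1.0] * 5, [0.5] * 5]
    best = min(candidates, key=objective)
    print(best, objective.evaluations)   # evaluations used so far: 3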
We remind the reader at this point that the most important metaheuristics belong
to the class of Las Vegas algorithms, which are guaranteed to only return correct so-
lutions but whose running time may vary across different runs for the same input.
Such an algorithm may run arbitrarily long without finding a global solution. Con-
sequently, the running times, as well as the solution quality, are random variables.
To measure them in a statistically meaningful way, the algorithm must be run many
times on each given instance under the same conditions in order to compute reliable
average values. In other words, as in any statistical application, the sample size must
be large enough. In the performance evaluation domain it is considered that at least
100 executions are needed.
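As a rough illustration of such repeated measurements, the following sketch runs a solver many times and reports sample statistics; run_solver is a hypothetical stand-in that should be replaced by an actual metaheuristic returning the solution quality and the effort spent in a run.

    # Sketch: repeat a stochastic solver many times to estimate run statistics.
    import random
    import statistics

    def run_solver(seed):
        rng = random.Random(seed)
        quality = rng.uniform(0.0, 0.1)        # stand-in for the relative error of a run
        effort = rng.randint(1_000, 50_000)    # stand-in for the number of evaluations
        return quality, effort

    RUNS = 100                                 # at least 100 executions, as suggested above
    results = [run_solver(seed) for seed in range(RUNS)]
    qualities = [q for q, _ in results]
    efforts = [e for _, e in results]
    print(statistics.mean(qualities), statistics.stdev(qualities))
    print(statistics.mean(efforts), statistics.stdev(efforts))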
Several metrics have been suggested to characterize the performance of a meta-
heuristic and an established common methodology is still missing. However, there
is a general agreement on a number of fundamental measures. Here we shall present
two very common ones: the empirical distribution of the probability of solving a
given problem instance as a function of the computational effort, and the empiri-
cal distribution of the obtained solutions. The success rate, which is the number of
times that the algorithm has found the globally optimal solution divided by the total
number of runs, is a simple metric that can be derived from the previous two and is
often used. Clearly, it is defined only for problems for which the globally optimal
solution is known, which is the case for most benchmark problems.
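A minimal sketch of how these quantities can be computed from a list of runs is given below; the run data, the tolerance, and the helper names are illustrative assumptions, not the book's code.

    # Sketch: success rate and empirical run-time distribution from repeated runs.
    # 'runs' is a list of (best_value, evaluations) pairs and 'optimum' is the known
    # global optimum, as is the case for most benchmark problems.
    def success_rate(runs, optimum, tol=1e-9):
        hits = sum(1 for value, _ in runs if abs(value - optimum) <= tol)
        return hits / len(runs)

    def empirical_cdf(efforts):
        # fraction of runs that finished within each observed effort level
        xs = sorted(efforts)
        return [(x, (i + 1) / len(xs)) for i, x in enumerate(xs)]

    runs = [(0.0, 1200), (0.0, 3400), (0.02, 8000)]   # toy data
    print(success_rate(runs, optimum=0.0))
    print(empirical_cdf([e for _, e in runs]))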

11.1.3 Examples

We have already mentioned some results that are in the spirit of the present chapter in
Chapter 5 on ant colony methods, and in Chapter 9 on GP multipopulations. In order
to further illustrate the above concepts, in this section we present a couple of simple
case studies. We focus our discussion on two particular metaheuristics: simulated
annealing (see Chapter 4) as applied to the TSP problem, and a genetic algorithm
on NK problems. The latter class of problems has already been used a few times in
the book and is defined in Section 2.2.3.
From the perspective of performance evaluation, it is interesting to find the mean
computing time that a metaheuristic needs to solve problem instances of a given
class when the size N of the instance increases. Indeed, the main motivation behind
the adoption of metaheuristics is the hope that they will solve the problem in polyno-
mial time in N , whereas deterministic algorithms, essentially complete enumeration,
require exponential time on such hard problems. It would certainly be reassuring to
verify that we may actually obtain satisfactory solutions to the problem in reasonable
time, otherwise one might question the usefulness of metaheuristics.
We recall here that the RWSAT metaheuristic presented in Chapter 10 has O(N)
complexity for “easy” problems, that is, those problems for which the ratio α of the
number of clauses to the number of variables is small. However, the complexity sud-
denly becomes exponential, O(exp(N)), when α > α_c, where α_c is the critical point
at which the computational phase transition occurs. It is thus natural to investigate
the complexity behavior of simulated annealing and of genetic algorithms. But, as
we have already pointed out above, owing to their stochastic character, we can only
define the time complexity of metaheuristics in a statistical way. Another distinctive
point is that metaheuristics, being unable to guarantee convergence to the optimal
solution in bounded time, must have a built-in stopping criterion. For example, we
might allow a maximum of m iterations during which there are no improvements
to the best fitness found and then stop. We might therefore measure the mean time
to termination as a function of the instance size N . In this case we will not be able
to obtain any precise information about the quality of the found solution, only the
expected computational effort to obtain an answer.
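A possible sketch of such a stagnation-based stopping rule is the following; propose and fitness are hypothetical problem-specific callables, and the returned iteration count is the measured time to termination.

    # Sketch: stop after m consecutive iterations without improvement of the best fitness.
    def search_with_stagnation_stop(initial, propose, fitness, m=50, maximize=True):
        best, best_f = initial, fitness(initial)
        stagnation, iterations = 0, 0
        while stagnation < m:
            candidate = propose(best)
            candidate_f = fitness(candidate)
            improved = candidate_f > best_f if maximize else candidate_f < best_f
            if improved:
                best, best_f, stagnation = candidate, candidate_f, 0
            else:
                stagnation += 1
            iterations += 1
        return best, best_f, iterations   # iterations = time to termination for this run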
Figure 11.1 shows the results obtained with simulated annealing on TSP in-
stances with a number of cities N between 20 and 50,000. For each N value, N
cities are first randomly placed on a square of given size and then an SA run is started
using the parameters proposed in Section 4.6. The movements we consider here are
of type 2-Opt. The search stops when, during the last three temperature levels, there
was no improvement of the current best solution. A priori, we have no indication of
the quality of the solution. However, we remember from Section 4.3 that the solution
found was within 5% of the exact optimum obtained with the Concorde algorithm.

In Figure 11.1 we can see that the time to obtain a solution for the TSP problem
with random city placement grows almost linearly in the range N = 20 to N =
2,000. The curve can be fitted by the following power law:

$$T(N) = 7{,}100 \times N^{1.14} \qquad (11.1)$$


Here T(N) is measured as the number of iterations of the SA run until conver-
gence. It corresponds to the total number of accepted and rejected configurations,
that is, the number of fitness evaluations [1]. For N in the range 5,000-50,000, the com-
putational effort T(N) is larger [2]. We obtain the following relation:

$$T(N) = 35.5 \times N^{1.78} \qquad (11.2)$$

Thus, there is a change of complexity regime between small and large problems.
However, for these values of N, the complexity is still less than quadratic.

[1] The actual computational time varies from 0.03 to 4.3 seconds on a standard laptop.
[2] The CPU time varies from 11 to 1,000 seconds on a laptop.
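Exponents such as 1.14 and 1.78 above come from fits of this kind; as a rough sketch (with placeholder data, not the measurements of Figure 11.1), the exponent b in T(N) = a × N^b can be estimated by linear regression in log-log coordinates.

    # Sketch: fit T(N) = a * N^b by a linear fit in log-log coordinates.
    import numpy as np

    N = np.array([20, 50, 100, 500, 1000, 2000], dtype=float)
    T = np.array([2.2e4, 6.5e4, 1.4e5, 8.8e5, 1.9e6, 4.3e6])   # placeholder iteration counts

    b, log_a = np.polyfit(np.log(N), np.log(T), 1)   # slope = exponent b, intercept = log(a)
    a = np.exp(log_a)
    print(f"T(N) ~ {a:.1f} * N^{b:.2f}")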

[Figure 11.1 appears here. Left panel: time complexity of SA for random TSP problems,
plotting computational effort (in millions of iterations) against problem size N, with fitted
slopes 1.14 and 1.78. Right panel: performance of SA on a 50-town TSP, plotting average
error and probability of success (accuracy for success ε = 0.05) against the number of
fitness evaluations; statistics on 100 problems.]

Fig. 11.1. Left image: average time complexity for simulated annealing in solving a TSP
problem with N cities randomly placed in a square of size 2 × 2. Right image: simulated
annealing performance on a TSP problem with 50 cities of which the optimal solution is
known. The SA computational effort is varied by changing the temperature schedule.

Now we consider the SA performance from the point of view presented in Sec-
tion 11.1.2. The goal here is to determine the quality of the solution found with a
given computational effort. First of all, we should explain how to vary the effort of a
metaheuristic. This is easy to do by simply changing the termination condition. For
simulated annealing, it is also possible to change some parameter of the algorithm,
for example the temperature schedule, that is the rate at which the temperature T
is decreased. We remember that practical experience suggests T_{k+1} = 0.9 T_k.
However, we might replace 0.9 by 0.85 for faster convergence and less precision,
or take 0.95 or 0.99 for slower convergence and better solution quality. This is the
approach adopted here.
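A minimal sketch of such a geometric cooling schedule, in which the parameter alpha controls the computational effort, might look as follows; the initial and final temperatures are illustrative values, not those of the experiment.

    # Sketch: geometric temperature schedule T_{k+1} = alpha * T_k. A smaller alpha gives
    # faster convergence (less effort, lower precision); a larger alpha the opposite.
    def temperature_levels(t_initial=100.0, alpha=0.9, t_final=1e-3):
        t = t_initial
        while t > t_final:
            yield t
            t *= alpha

    for alpha in (0.85, 0.90, 0.95, 0.99):
        n_levels = sum(1 for _ in temperature_levels(alpha=alpha))
        print(alpha, n_levels)   # more temperature levels -> more fitness evaluations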
The numerical experiment is performed on a benchmark problem for which the
globally optimal solution is known, which allows us to compare the error of the
tour returned by SA at a given computational effort with the length of the
optimal tour. The problem has 50 cities distributed on a circle of radius one.
The optimal tour is a polygon with 50 sides and length L = 6.27905. The results are
averages over 100 repetitions of the search that differ in their initial conditions and in the
sequence of random numbers generated.
Figure 11.1 (right) shows two indicators of performance as a function of the com-
putational effort measured as the number of function evaluations. The black curve
plots the mean error of the solution found with respect to the optimal tour of length
L = 6.27905 over 100 SA runs. It is easy to see that the precision of the answer
improves if we devote more computational resources to simulated annealing.
The points on the blue curve are estimates of the success rate, or of the probability
P of success on this problem if we require a precision of ε = 0.05. The P value is
obtained as follows for each computational effort:

$$P = \frac{\text{number of solutions with an error less than } \varepsilon}{\text{number of repetitions}} \qquad (11.3)$$
One can see here that the probability of a correct answer at the ε = 5% level
tends to 1 for a computational effort exceeding 2 × 106 iterations. This means that
almost all the 100 runs have found a solution having this precision. To appreciate the
magnitude of the computational effort, we may recall here that the search space size
for this problem is 50! ≈ 1.7 × 1063 .
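In code, the estimate of Equation (11.3) for several effort levels could be computed as in the following sketch; the error values are placeholders, not the data of Figure 11.1.

    # Sketch: estimate the success probability P of Equation (11.3) per effort level.
    # 'errors_by_effort' maps an effort level to the relative errors of the repeated runs.
    def success_probability(errors, epsilon=0.05):
        return sum(1 for e in errors if e < epsilon) / len(errors)

    errors_by_effort = {
        500_000:   [0.09, 0.06, 0.04, 0.08],
        2_000_000: [0.01, 0.02, 0.00, 0.03],
    }
    for effort, errors in sorted(errors_by_effort.items()):
        print(effort, success_probability(errors))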
Clearly, if we increased the required precision by taking a smaller ε, the suc-
cess rate would correspondingly decrease. Also, while the performance measured is
specific to this particular problem, that is, all the cities lie on a circle, the observed be-
havior can be considered qualitatively general. We should thus expect that, without
adding computational resources, finding the optimum becomes increasingly difficult
as we become more demanding about the quality of the solution we want to achieve.
This is true in general whatever the metaheuristic examined. To see this, we shall
now consider the performance of a genetic algorithm in solving problems in the
NK class. As in the TSP case above, we describe first the empirical average time
complexity of a GA for an NK problem with varying N and constant K = 5.
For each N value, 500 NK problem instances are randomly generated. Periodic
boundary conditions are used in the bit strings representing configurations of the
system. The fitness of a bit string x = x_0 x_1 ... x_{N-1} of length N is given by

$$f(x) = \sum_{i=0}^{N-1} h(x_{i-2}, x_{i-1}, x_i, x_{i+1}, x_{i+2}) \qquad (11.4)$$

where h is a table with 32 entries (2^K in the general case), randomly chosen among
the integers 0 to 10. This allows us to generate landscapes that are sufficiently dif-
ficult but not too hard. For each of the 500 generated instances the optimal solution
is found by exhaustive enumeration, that is, by evaluating all the 2^N possible solu-
tions. The goal here is clearly to have an absolute reference for the evaluation of the
performance of the GA on this problem.
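For concreteness, a possible sketch of the fitness of Equation (11.4) with K = 5, periodic boundaries, and the exhaustive baseline is shown below; the random seed and the value of N are illustrative choices.

    # Sketch of the NK fitness of Equation (11.4) with K = 5 and periodic boundaries,
    # and of the exhaustive baseline over the 2^N bit strings (feasible only for small N).
    import itertools
    import random

    K = 5
    rng = random.Random(0)
    h = [rng.randint(0, 10) for _ in range(2 ** K)]          # random table with 32 entries

    def fitness(x):                                           # x is a tuple of N bits
        n = len(x)
        total = 0
        for i in range(n):
            window = [x[(i + d) % n] for d in (-2, -1, 0, 1, 2)]   # periodic neighborhood
            index = int("".join(map(str, window)), 2)
            total += h[index]
        return total

    N = 12
    best = max(itertools.product((0, 1), repeat=N), key=fitness)   # exhaustive search
    print(best, fitness(best))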
The chosen GA has a population size of 100 individuals, one-point crossover with
crossover probability 0.6, and a mutation probability of 0.01 for each of the N bits
of x. The best individual of each generation goes unchanged to the next generation,
where it replaces the worst one. We allow a maximum number of 80 generations and
we compute the computational effort as the number of function evaluations.
After each generation ℓ a check is made to see whether the known exact solution
has been found. If this is the case, the computational effort for the instance at hand
is recorded as 100 × ℓ. If the best solution has not been found, the GA iteration
continues. The maximum computational effort is thus 100 × 80 = 8,000. If the
solution has not been found after 80 generations, we shall say that the GA has failed,
which will allow the computation of an empirical failure probability at the end. If the
empirical failure probability is denoted by pf , the corresponding success probability
is 1 − pf . The motivation for introducing a failure probability is the observation that,
if the exact solution is not found in a reasonable number of generations, it will be
unlikely to be found later. In fact, if we increase the number of allowed generations
beyond 80, the probability of failure doesn’t change significantly. For this reason, it
is necessary to separate solvable instances from those that are not in order not to bias
the computational effort; otherwise the latter would be influenced by the choice of
the maximum number of allowed generations, which can be arbitrarily large.
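The bookkeeping just described might be sketched as follows; run_results is a hypothetical list giving, for each generated instance, the generation at which the optimum was found, or None in case of failure.

    # Sketch: effort = population_size * generation at which the optimum was found;
    # instances not solved within the generation limit count as failures.
    POPULATION = 100
    MAX_GENERATIONS = 80

    def record(run_results):
        efforts, failures = [], 0
        for found_at in run_results:               # one entry per generated instance
            if found_at is None:
                failures += 1
            else:
                efforts.append(POPULATION * found_at)
        p_fail = failures / len(run_results)
        mean_effort = sum(efforts) / len(efforts) if efforts else None
        return mean_effort, p_fail                  # success probability is 1 - p_fail

    print(record([12, 40, None, 7, 80]))            # toy data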
Figure 11.2 depicts the results that have been obtained. Here the number of fit-
ness evaluations is the average value over all the problem instances that have been
solved optimally within a computing time corresponding to at most 80 generations.
We see that this time is indeed small when compared with the upper limit of 8,000
evaluations.
We also remark that the empirical average computational complexity grows es-
sentially linearly, which is encouraging, given that NK landscapes have an expo-
nentially increasing complexity. On the other hand, it is also seen in the figure that
the failure probability increases with N , and there are more and more problems that
the GA cannot solve exactly.
We now characterize the GA performance in a slightly different way on the same
class of problems. Thus, we keep the same GA parameters and the same problem in-
stance generation as above but we change the termination condition into a stagnation
criterion instead of a hard limit on the number of generations. The condition now
becomes the following: if during m consecutive generations the best fitness has not
been improved, the GA stops. We then save the best solution found up to this point
and the number of generations elapsed. As before, the baseline for comparing the
results is the exhaustive search for the optimal solutions for each N .
After the 500 repetitions for each value of N , we can compute and plot the av-
erage computational effort, given that we know how many generations were needed
in each run. We can also compute the mean relative error with respect to the known
optimum, and the number of times the best solution found was within a precision
interval ε around the exact solution. Finally, we can vary the computational effort by
varying the value of m.

Fig. 11.2. Black curve: average computational complexity of a genetic algorithm in terms
of function evaluations for solving NK problems with N between 8 and 20 and K = 5.
Each point is the average of 500 randomly generated instances of the corresponding problem.
Blue curve: fraction of problems for which the global optimum has not been found within
the allowed time limit corresponding to 8,000 evaluations. Red curve: number of function
evaluations required by exhaustive search of the 2^N possible solutions.

The results presented in Figure 11.3 are for m between 3 and 15, N = 18, two values
of K (K = 5 and K = 7), and two values of the precision, ε = 0.1 and ε = 0.02.
As expected, the average error decreases with increasing computational effort;
the solution quality is improved by using more generations, and the problems with
K = 5 are easier than those with K = 7.
The previous results make it clear that there is a compromise to be found be-
tween the computational effort expended and the quality of the solutions we would
like to obtain. High-quality solutions require more computational effort, as seen in
the figure. It should be said that in the present case the solution quality can be com-
pared with the ideal baseline optimal solution, which is known. Now, very often the
globally optimal solution is unknown, typically for real-life problems or for very
large benchmark and constructive problems, for instance NK landscapes with, say,
N = 100 and K = 80. In this case, the best we can do is to compare the obtained
solutions to the best solution known, even if we don’t know whether the latter is glob-
ally optimal or not. For some problems, one can get a reliable approximate value for
theoretical lower bounds on solution quality by using Lagrangian relaxation or inte-
ger programming relaxation (see Chapter 1). In these cases, the solutions found by
the metaheuristic can be compared with those bounds.

Fig. 11.3. Performance curves for the GA described in the text for solving NK problems as
a function of the computational effort (statistics over 500 problems, N = 18). The left panel
corresponds to K = 5 and the right panel to K = 7. On both panels the probability of success
is shown in blue for two values of the precision, ε = 0.1 and ε = 0.02; the black curves give
the average relative error.

Quite often, performance measures similar to the ones just described are obtained
in the framework of comparative studies between two or more different metaheuris-
tics with the goal of establishing the superiority of one of them over the others. This
kind of approach can be useful when it comes to a particular problem or a well-
defined class of problems that are of special interest for the user. However, as we
shall see in the next section, it is in principle impossible to establish the definitive
superiority of a stochastic metaheuristic with respect to others. This doesn’t pre-
vent researchers from trying to apply robust statistical methods when comparing
metaheuristics with each other. The approach is analogous to what we have just
seen applied to a single metaheuristic. However, when comparing algorithms, one
must be able to establish the statistical significance of the observed differences in
performance. In general, since samples are usually not normally distributed, non-
parametric statistical tests are used such as the Wilcoxon, Mann-Whitney, and the
Kolmogorov-Smirnov or Chi-squared tests for significant differences in the empiri-
cal distributions [69].
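As an illustration (with toy numbers, not real measurements), such tests are readily available in scipy.stats; the Wilcoxon signed-rank test assumes paired runs, for instance the same instances and seeds for both metaheuristics.

    # Sketch: non-parametric comparison of two metaheuristics from per-run solution qualities.
    from scipy import stats

    quality_a = [0.021, 0.034, 0.019, 0.040, 0.025, 0.031, 0.022, 0.028]
    quality_b = [0.030, 0.045, 0.038, 0.052, 0.041, 0.036, 0.049, 0.044]

    print(stats.mannwhitneyu(quality_a, quality_b, alternative="two-sided"))
    print(stats.ks_2samp(quality_a, quality_b))
    # Wilcoxon signed-rank test: applicable when the runs are paired (same instances/seeds).
    print(stats.wilcoxon(quality_a, quality_b))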

To recapitulate, performance evaluation in the metaheuristic field is a necessary
and useful step. The examples illustrated here were chosen for their simplicity and
familiarity to the reader rather than their importance, in order to bring the main mes-
sage home without unnecessary complication. The message is twofold: on the one
hand, we discussed some common metrics that are useful for characterizing the per-
formance of a metaheuristic and, on the other hand, using those measures, we showed
that, on a couple of difficult but not extremely hard versions of the problems, two
metaheuristics require far fewer computational resources than exhaustive enumeration
to obtain very good solutions. Of course, these conclusions cannot immediately be
generalized to other problems and other metaheuristics without studying their perfor-
mance behavior, but at least the examples suggest that the metaheuristics approach
to hard problem solving is a reasonable one.

The field of performance measures and their statistics is varied and complex; here
we have offered an introduction to this important subject but, to avoid making the text
more cumbersome, several topics have been left out. Among these, we might cite
the robustness of a metaheuristic and the parallel and distributed implementations
of metaheuristics and their associated performance measures. The robustness of a
method refers to its ability to perform well on a wide variety of input instances of
a problem class and/or on different problems. Concerning parallel and distributed
metaheuristics, it is too vast a subject to be tackled here. The reader wishing to pur-
sue the study of the issues presented in this chapter is referred to the specialized
literature, e.g., [41, 13] for details and extensions of performance evaluation and sta-
tistical analysis, and [78, 32] for parallel and distributed implementations and their
performance.

11.2 The “No Free Lunch” Theorems and Their Consequences

We hope that the reader is convinced at this point that metaheuristics, without be-
ing a silver bullet, do nevertheless provide in practice a flexible, general, and rel-
atively easy approach to hard optimization problems. Metaheuristics work through
some kind of “intelligent” sampling of the solution space, both for methods in which
the search follows a trajectory in space, such as simulated annealing, as well as for
population-based methods such as evolutionary algorithms or particle swarms, for
example.
In the previous sections of this chapter we discussed a number of approaches for
evaluating the performance of a metaheuristic on a given problem or class of prob-
lems, and for comparing the effectiveness of different algorithms on a problem or
a set of problems. Especially when comparisons between metaheuristics are called
for, a number of questions arise naturally. Can we really compare the performance
of different metaheuristics on a problem class? What if we include different problem
types or problem classes in the comparison? Is there a principled way of choosing
the best-adapted metaheuristic for a given problem? All these questions are legit-
imate because metaheuristics are general problem-solving methods that can be ap-
plied to many different problems, not specialized exact algorithms such as those used
for sorting or searching. Thus, it has very frequently been the case in the literature
that different metaheuristics or differently parameterized versions of the same meta-
heuristic are compared using a given suitable set of test functions. As we explained
in Section 11.1.1, there are several well-known sets of test functions that typically
contain a few tens of functions chosen according to various important criteria.
On the basis of numerical experiments, often using only a handful of test func-
tions, one can obtain performance measures of two or more metaheuristics by fol-
lowing the methodology explained in the previous sections. Often the authors of
such studies quickly extrapolate the results to unseen cases and sometimes affirm the
superiority of one search method over others. Implicitly, their conclusion is that if
method A(f_i) applied to instance f_i of a given test function has a better performance
than method B(f_i) on the same instance, and the result is the same for all or the ma-
jority of n test functions {f_1, f_2, ..., f_n}, with n typically between 5 and 10, then
the result is probably generalizable to many other cases. But experience shows that
one can reach different conclusions according to the particular metaheuristics used,
their parameterization, and the details of the test functions. In this way, rather ster-
ile discussions on the superiority of this or that method have often appeared in the
literature. However, in 1997 Wolpert and Macready’s work [85] on “no free lunch
theorems” (NFL) showed that, under certain general conditions, it is impossible to
design a “best” general optimization method.
In Wolpert and Macready’s article the colloquial expression “no free lunch,”
which means that nothing can be acquired without a corresponding effort or cost,
is employed to express the fact that no metaheuristic can perform better than another
on all possible problems. More precisely, here is how the ideas contained in the NFL
theorems might be enunciated in a nutshell:

For all performance measures, no algorithm is better than another when they are
compared on all possible discrete functions.

Or, equivalently:

The average behavior of any two search methods on all possible discrete functions
is identical.

The latter formulation implies that if method A is better than method B on a
set of problems, then there must exist another set of problems on which B outper-
forms A. In particular, and perhaps surprisingly, on the finite set of discrete prob-
lems, random search has the same average performance as any more sophisticated
metaheuristic. For example, let’s consider a deterministic local search such as best
improvement (see Chapter 2). For reasons that will become clear below, let’s assume
that the search can restart from an arbitrary configuration when it reaches a local
optimum. Such a metaheuristic, though simple, should provide at least reasonably
good results on many conceivable functions. Let’s consider now a local search that
always chooses a random neighbor along its trajectory in the search space. Perhaps
contrary to intuition, although on many functions best improvement would perform
better, there must be other functions on which the random walk search outperforms
hill climbing since both must have the same performance on average. It goes without
saying that many of the functions that would favor random walk search are essen-
tially random functions, or functions of a particular nature which are not important
in problems that present themselves in real applications. However, those functions
do exist and they influence the results from a statistical point of view.
We now summarize the theoretical framework under which the NFL theorems
have been established without going into too much mathematical detail. The inter-
ested reader will find the full discussion in the original work [85].

• The theory considers search spaces S of size |S|, which can be very large but
always finite. This restricts the context to combinatorial optimization problems
(see Chapters 1 and 2). On these spaces the objective functions f : S → Y are
defined, with Y being a finite set. Then the space F = Y^S contains all possible
functions and has size |Y|^|S|, which is in general very large but still finite.
• The point of view adopted is that of black box optimization, which means that
the algorithm has no knowledge of the problem apart from the fact that, given
any candidate solution, it can obtain the objective function value of that solu-
tion. Wolpert and Macready use the number of function evaluations as the ba-
sic measure of the performance of a given search method. In addition, to avoid
unbounded growth of this number, only a finite number m of distinct function
evaluations is taken into account. That is to say the search space points are never
resampled. The preceding scenario can easily be applied to all common meta-
heuristics provided we ignore the possibly resampled points.

Under the previous rather mild and general conditions, Wolpert and Macready estab-
lish the following result by using probability and information theory techniques:
$$\sum_{f} P(d_m^y \mid f, m, A_1) = \sum_{f} P(d_m^y \mid f, m, A_2)$$

In the previous expression P(d_m^y | f, m, A) is the conditional probability of obtaining
a given sample d_m^y of size m from function f, corresponding to the sampled search
space points d_m^x, when algorithm A is iterated m times. Summations are performed
over all functions f ∈ F, and A_1 and A_2 are two particular search algorithms. In
other words, the expression means that the probability of generating a particular
sequence of function values is the same for all algorithms when it is averaged over
all functions, or, equivalently, that P(d_m^y | f, m, A) is independent of algorithm A
when the probability is averaged over all objective functions f.
Now, if Φ(d_m^y) is any sampling-based performance measure, the average of
Φ(d_m^y) is independent of A as well:

$$\sum_{f} \Phi(d_m^y \mid f, m, A_1) = \sum_{f} \Phi(d_m^y \mid f, m, A_2)$$

meaning that no algorithm can outperform any other algorithm when their perfor-
mance is averaged over all possible functions. Wolpert and Macready show that the
results are also valid for all sampling-based performance measures Φ(d_m^y), and that
they also apply to stochastic algorithms and to time-varying objective functions.
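The statement can be checked numerically on a toy example. The sketch below (our own illustration, not Wolpert and Macready's construction) enumerates all |Y|^|S| functions on a small search space and compares a non-adaptive sampler with an adaptive one, both evaluating m distinct points; their average best sampled values coincide.

    # Toy check of the NFL statement: over ALL functions f: S -> Y on a small finite S,
    # any two samplers that never resample have the same average best-found value.
    import itertools

    S = list(range(6))          # search space of size 6
    Y = (0, 1, 2)               # possible objective values
    M = 3                       # number of distinct evaluations allowed

    def fixed_scan(f):
        # non-adaptive: always evaluate points 0, 1, 2
        return [f[p] for p in S[:M]]

    def adaptive_scan(f):
        # adaptive but never resampling: the next point depends on the last value seen
        unseen, values = list(S), []
        values.append(f[unseen.pop(0)])
        while len(values) < M:
            point = unseen.pop(0) if values[-1] == 0 else unseen.pop(-1)
            values.append(f[point])
        return values

    def mean_best(algorithm):
        total, count = 0, 0
        for f in itertools.product(Y, repeat=len(S)):   # all |Y|^|S| functions
            total += min(algorithm(f))                   # minimization: best sampled value
            count += 1
        return total / count

    print(mean_best(fixed_scan), mean_best(adaptive_scan))   # identical averages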
What are the lessons to be learned from the NFL theorems? The most important
positive contribution is that the theorems imply that it cannot be said any longer that
algorithm A is better than algorithm B without also specifying the class of problems
for which this is true. This means that no search method that is based on sampling and
without specific knowledge of the search space can claim to be superior to any other
in general. It is also apparent that performance results claimed for a given benchmark
suite or for specific problems do not necessarily translate into similar performance
on other problems. As we saw above, this behavior is the result of the existence
of a majority of random functions in the set of all possible functions. However, the
interesting functions in practice are not random, which means that the common meta-
heuristics will in general be more effective on the problems researchers are normally
confronted with.
Moreover, the NFL theorems hold in the black box scenario only. If the user
possesses problem knowledge beyond the black box assumption, this knowledge can, and should, be used
in the search algorithm in order to make it more efficient. This is what often happens
in real applications such as scheduling or assignment in which problem knowledge is
put to good use in the algorithms to solve them. Thus, the conclusions reached in the
NFL theorems are not likely to stop the search for better metaheuristics, but at least
we now know that some discipline and self-restraint must be observed in analyzing
and transferring performance results based on a limited number of problems.
The above results trigger a few considerations that are mainly of interest for the
practitioner. Since it is impossible to prove the superiority of a particular metaheuris-
tic in the absence of specific problem knowledge, why not use simpler and easier-to-
implement metaheuristics first when tackling a new problem? This approach will
save time and does not prevent one from switching to a more sophisticated method
if the need arises.
In the same vein, we now briefly describe a useful and relatively new approach
to problem solving that somehow exploits the fact that different algorithms perform
better on different groups of functions, and also assumes a context similar to the
black box scenario. In this case, since we do not know how to choose a suitable
algorithm A_i in a small set {A_1, A_2, ..., A_k}, the idea is to use all of them. This
leads to the idea of an algorithm portfolio, in which several algorithms are combined
into a portfolio and executed sequentially or in parallel to solve a given difficult
problem. In certain cases, the portfolio approach may be more advantageous than
the traditional method. The idea comes from the field of randomized algorithms but
it is useful in general [42]. Another way of implementing the approach is to select
and activate the algorithms in the portfolio dynamically during the search, perhaps as
a consequence of some statistical measures of the search space that are generated on
the fly during the search. Clearly, the portfolio composition as well as the decision
of which algorithm to use at which time are themselves difficult problems but some
ideas have been proposed to make these choices automatic or semi-automatic. A
deeper description of this interesting approach would lead us beyond our scope in
this book and we refer the reader to the specialized literature for further details.
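A minimal sketch of a sequential portfolio is given below; the solver functions and their signature are assumptions made for illustration, not a standard interface.

    # Sketch: a sequential algorithm portfolio. Each solver receives a share of the budget
    # and the best solution found overall is returned.
    def run_portfolio(problem, portfolio, total_budget, minimize=True):
        best_solution, best_value = None, None
        share = total_budget // len(portfolio)
        for solver in portfolio:                      # solver(problem, budget) -> (solution, value)
            solution, value = solver(problem, share)
            better = best_value is None or (value < best_value if minimize else value > best_value)
            if better:
                best_solution, best_value = solution, value
        return best_solution, best_value

    # Usage (with hypothetical solvers):
    # run_portfolio(problem, [simulated_annealing, tabu_search, genetic_algorithm],
    #               total_budget=300_000)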
