Algorithm Engineering
Camil Demetrescu    Irene Finocchi    Giuseppe F. Italiano

Abstract
Algorithm Engineering is concerned with the design, analysis, implementation, tuning, debugging and experimental evaluation of computer programs for solving algorithmic problems. It provides methodologies and tools for developing and engineering efficient algorithmic codes and aims at integrating and reinforcing traditional theoretical approaches for the design and analysis of algorithms and data structures.
1 Introduction
In recent years, many areas of theoretical computer science have shown growing interest in solving problems arising in real-world applications, experiencing a remarkable shift to more application-motivated research. However, for many decades, researchers have been mostly using mathematical methods for analyzing and predicting the behavior of algorithms: asymptotic analysis in the Random Access Model has been the main tool in the design of efficient algorithms, yielding substantial benefits in comparing and characterizing their behavior and leading to major algorithmic advances. The new demand for algorithms that are of practical utility has now raised the need to refine and reinforce the traditional theoretical approach with experimental studies, fitting the general models and techniques used by theoreticians to actual existing machines, and bringing algorithmic research back to its roots. This implies considering often overlooked, yet practically important issues such as hidden constant factors, effects of the memory hierarchy, implications of communication complexity, numerical precision, and use of heuristics.

The whole process of designing, analyzing, implementing, tuning, debugging and experimentally evaluating algorithms is usually referred to as Algorithm Engineering. Algorithm Engineering views algorithmics as an engineering discipline rather than a purely mathematical one. Implementing algorithms and engineering algorithmic codes is a key step for the transfer of algorithmic technology, which often requires a high level of expertise, to different and broader communities, and for its effective deployment in industry and real applications. Moreover, experiments often raise new conjectures and theoretical questions,
Work partially supported by the IST Programmes of the EU under contract numbers IST-1999-14186 (ALCOM-FT) and IST-2001-33555 (COSIN), and by the Italian Ministry of University and Scientific Research (Project "ALINWEB: Algorithmics for Internet and the Web").

Camil Demetrescu: Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy. Email: demetres@dis.uniroma1.it, URL: http://www.dis.uniroma1.it/~demetres/.

Irene Finocchi: Dipartimento di Informatica, Sistemi e Produzione, Università di Roma "Tor Vergata", Via di Tor Vergata 110, 00133 Roma, Italy. Email: finocchi@disp.uniroma2.it, URL: http://www.dsi.uniroma1.it/~finocchi/.

Giuseppe F. Italiano: Dipartimento di Informatica, Sistemi e Produzione, Università di Roma "Tor Vergata", Via di Tor Vergata 110, 00133 Roma, Italy. Email: italiano@disp.uniroma2.it, URL: http://www.info.uniroma2.it/~italiano/.
opening unexplored research directions that may lead to further theoretical improvements and eventually to more practical algorithms. Theoretical breakthroughs that answer fundamental algorithmic questions and solve long-standing open problems (such as deterministic primality testing [3]) are a crucial step towards the solution of a problem. However, they often lead to algorithms that are far from being amenable to efficient implementations, raising the question of whether more practical solutions exist. We take as an example the case of linear programming, i.e., the problem of optimizing a linear function subject to a collection of linear inequalities. The worst-case exponential running time of the simplex algorithm proposed by Dantzig [25] in 1947, together with the experimental observation that it usually runs in low-order polynomial time on real-world instances, stimulated much theoretical research aimed at discovering whether linear programming was polynomially solvable. The first answer was the ellipsoid method of Khachiyan [37], which, however, was impractical. The quest for a practical algorithm finally led to the interior-point approach of Karmarkar [36]. Introducing a competitor of the simplex method, in turn, inspired a great flourishing of theoretical and experimental work, leading to major improvements in performance for both approaches, which have also been implemented in a widely used library. We believe that Algorithm Engineering should take into account the whole process, from the early design stage to the realization of efficient implementations.
A major goal of Algorithm Engineering is to define standard methodologies and realistic computational models for the analysis of algorithms. For instance, there is more and more interest in defining models for the memory hierarchy and for Web algorithmics. Studying data distribution schemes to achieve locality of reference at each level of the memory hierarchy is indeed fundamental in order to minimize I/O accesses and cache misses and to get efficient codes. This topic has recently been the subject of much research: we review some of the most promising advances in Section 2. Issues related to Web algorithmics are also very interesting, as the Internet is nowadays a primary motivation for several problems: security infrastructure, Web caching, Internet searching and information retrieval are just a few of the hot topics. Devising realistic models for the Internet and Web graphs is thus essential for testing the algorithmic solutions proposed in these settings.
Another aspect of Algorithm Engineering, usually referred to as Experimental Algorithmics, is related to performing empirical studies for comparing the actual relative performance of algorithms so as to study their amenability for use in specific applications. This may lead to the discovery of algorithm separators, i.e., families of problem instances for which the performances of solving algorithms are clearly different, and to identifying and collecting problem instances from the real world. Other important results of empirical investigations include assessing heuristics for hard problems, characterizing the asymptotic behavior of complex algorithms, discovering the speed-up achieved by parallel algorithms, and studying the effects of the memory hierarchy and of communication on real machines, thus helping in predicting performance and finding bottlenecks in real systems. The main issues and common pitfalls in Experimental Algorithmics are reviewed in Section 3.
The surge of investigations in Algorithm Engineering is also producing, as a side effect, several tools whose target is to offer a general-purpose workbench for the experimental validation and fine-tuning of algorithms and data structures. In particular, software libraries of efficient algorithms and data structures, collections and generators of test sets, program checkers, and software systems for supporting the implementation and debugging process are relevant examples of such an effort. We will discuss some relevant aspects concerning tools for coding and debugging in Section 4 and the benefits of robust, efficient, and well-documented software libraries in Section 5.
Last, but not least, Algorithm Engineering encourages fruitful cooperation not only between theoreticians and practitioners, but also, and more importantly, between computer scientists and researchers from other fields. Indeed, experiments have so far played a crucial role in many scientific disciplines: in physics, biology, and other natural sciences, for instance, researchers have been extensively running experiments to learn certain aspects of nature and to discover unpredictable features of its internal organization. Approaches and results from different fields may therefore be very useful for algorithm design and optimization.
perturbations of arbitrary inputs. It seems that the smoothed approach can explain the behavior of some algorithms where other analyses fail.

An interesting example in this setting is provided by the simplex algorithm for solving linear programs. In the late 1970's, the simplex algorithm was shown to converge in expected polynomial time on various distributions of random inputs with quite unrealistic characteristics. The asymptotic worst-case time bound was known to be exponential, yet experiments showed that the running time was bounded by a low-degree polynomial also on many real-world instances. This raised the quest for a more precise analysis. It has actually been proved in [53] that the simplex algorithm has polynomial smoothed complexity.
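As a hedged restatement (the notation here is ours, not taken verbatim from [53]), the smoothed complexity of an algorithm A with cost measure T_A is the worst expected cost over small Gaussian perturbations of any input:

    C_A(n, σ) = max over x in R^n of E_{g ~ N(0, σ² I)} [ T_A(x + g) ]

Polynomial smoothed complexity then means that C_A(n, σ) is bounded by a polynomial in n and 1/σ, so the measure interpolates between worst-case analysis (σ → 0) and average-case analysis (large σ).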
Of a similar flavor is research on parameterized complexity, proposed by Rod Downey and Mike Fellows [30]: this is a novel approach to complexity theory which offers a means of analysing algorithms in terms of their tractability and which is very important in dealing with NP-hard problems. The key idea of the theory is to isolate some aspect of the input, the parameter, and to confine the combinatorial explosion of computational difficulty to an additional function of the parameter, with other costs remaining polynomial. Fixed-parameter tractable problems have algorithms running in O(f(k) · n^d) time, where n is the instance size, k is the parameter, f is some exponential (or worse) function of k, and d is a constant independent of k. Algorithms of this kind are of practical importance for small values of the parameter k and are often at the basis of very effective heuristics.
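As an illustration of this paradigm (our own sketch, not code from [30]), the textbook bounded search tree for Vertex Cover runs in O(2^k · m) time on a graph with m edges: any cover must contain an endpoint of every edge, so branching on the two endpoints of an uncovered edge confines the exponential blow-up to the parameter k.

#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Bounded search tree for Vertex Cover: does the graph have a cover of size <= k?
// Each uncovered edge forces a branch on its two endpoints, so the search tree
// has at most 2^k leaves and the total work is O(2^k * m) for m edges.
bool hasVertexCover(const std::vector<Edge>& edges, int k) {
    if (edges.empty()) return true;        // nothing left to cover
    if (k == 0) return false;              // an edge remains but no budget left
    auto [u, v] = edges.front();
    for (int chosen : {u, v}) {            // one endpoint must be in the cover
        std::vector<Edge> rest;
        for (const Edge& e : edges)
            if (e.first != chosen && e.second != chosen) rest.push_back(e);
        if (hasVertexCover(rest, k - 1)) return true;
    }
    return false;
}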
Computational models. Providing simplified computational models which preserve the essential features of the real technology is essential in Algorithm Engineering. To exemplify this idea, we discuss issues related to memory performance. Modern computers are indeed characterized by an increasingly steep memory hierarchy, consisting of several levels of internal memories (registers, caches, main memory) and of external devices (e.g., disks or CD-ROMs). Due to physical principles and economic reasons, faster memories are small and expensive, while slower memories can be larger and inexpensive. Designing algorithms in the classical RAM model, which does not take into account memory latency issues, benefits of cache hits and overheads due to cache misses, is therefore becoming progressively more dangerous. Efficient algorithmic codes are forced to cope with bounded (and not unlimited) memory resources and should attempt to take advantage of the hierarchy as much as possible. This poses the challenging problem of designing algorithms that maintain locality of reference in data access patterns, i.e., that are able to cluster memory accesses both in time and in space.
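A toy example of what locality of reference means in practice (a generic illustration, not an experiment from this paper): the two loops below compute the same sum over an n × n row-major matrix, but the first scans memory sequentially, touching each cache block once, while the second jumps by n elements at every step and may incur a miss on almost every access.

#include <cstddef>
#include <vector>

// Row-major n x n matrix: element (i, j) is stored at position i*n + j.
double sumRowOrder(const std::vector<double>& a, std::size_t n) {
    double s = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += a[i * n + j];       // consecutive addresses: good spatial locality
    return s;
}

double sumColumnOrder(const std::vector<double>& a, std::size_t n) {
    double s = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += a[i * n + j];       // stride-n accesses: poor spatial locality
    return s;
}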
Many extensions of the RAM model have been proposed to this aim. Since I/O accesses represent the major bottleneck when dealing with massive data sets, early works focused on issues related to external memories. Many efficient algorithms and data structures have been designed in the classical I/O model [2, 8, 57], which abstracts a two-level hierarchy consisting of a main memory and a disk from/to which data are transferred in blocks of B contiguous items. The model was soon extended to deal with parallel systems (multiple processors, parallel memories) [58, 59] and with multi-level hierarchies [1, 59]. Unfortunately, analyzing algorithms in multi-level models is not easy, due to the many parameters involved in the analysis: among others, the block size B and the memory size M at each level of the hierarchy must be considered. This encouraged research for simpler alternatives.
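For reference, two bounds established in the two-level I/O model [2] convey how the parameters B and M enter the analysis: scanning N contiguous items costs Θ(N/B) I/Os, and sorting costs

    Sort(N) = Θ( (N/B) · log_{M/B} (N/B) )  I/Os,

so the same problem is charged very differently from the Θ(N log N) comparisons of the RAM model.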
The cache-oblivious model recently introduced by Frigo et al. [33] appears to be very appealing in this setting, as it allows one to reason about a two-level hierarchy, yet to prove results for an unknown number of levels. The key idea is not to use the parameters B and M in the description of the algorithm: the analysis is still carried out in a two-level model, but since it holds for arbitrary values of B and M, an efficient cache-oblivious algorithm is simultaneously efficient at every level of a multi-level hierarchy.
[Figure: recursive partition of a tree into a top part Tr and bottom subtrees T1, ..., Tk; in the corresponding memory layout, Tr is stored first, followed by T1, ..., Tk.]
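The recursive layout sketched in the figure is one instance of the general cache-oblivious recipe: divide the problem until a subproblem fits in an unknown cache, without ever naming B or M in the code. A minimal sketch in this style (our illustration, not code from [33]) is the divide-and-conquer matrix transposition below, whose recursion adapts automatically to any block and cache size.

#include <cstddef>
#include <vector>

// Out-of-place transpose of the rows x cols submatrix of the row-major n x n
// matrix A starting at (r0, c0), writing into B. The recursion splits the longer
// dimension until the subproblem is small; once it fits in cache, all further
// work causes no additional misses, for whatever B and M the machine has.
void transpose(const std::vector<double>& A, std::vector<double>& B, std::size_t n,
               std::size_t r0, std::size_t c0, std::size_t rows, std::size_t cols) {
    const std::size_t CUTOFF = 16;           // small base case, tuned empirically
    if (rows <= CUTOFF && cols <= CUTOFF) {
        for (std::size_t i = r0; i < r0 + rows; ++i)
            for (std::size_t j = c0; j < c0 + cols; ++j)
                B[j * n + i] = A[i * n + j];
    } else if (rows >= cols) {               // split the longer dimension
        transpose(A, B, n, r0, c0, rows / 2, cols);
        transpose(A, B, n, r0 + rows / 2, c0, rows - rows / 2, cols);
    } else {
        transpose(A, B, n, r0, c0, rows, cols / 2);
        transpose(A, B, n, r0, c0 + cols / 2, rows, cols - cols / 2);
    }
}
// Usage: transpose(A, B, n, 0, 0, n, n) transposes the whole matrix.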
3 Experimental Algorithmics
Algorithms must be implemented and tested in order to have a practical impact. Experiments can help measure practical indicators, such as implementation constant factors, real-life bottlenecks, locality of references, cache effects and communication complexity, that may be extremely difficult to predict theoretically. They may also help discover easy and hard instances for a problem. For example, it has been observed that some hard computational problems, such as data clustering, seem to be "hard only when they are not interesting": when the solution is robust, as in many real-world instances, it is often not too difficult to discover it. A careful tuning of the code, as well as the addition of ad-hoc heuristics and local hacks, may dramatically improve the performance of some algorithms, although the theoretical asymptotic behavior may not be affected. Many clear examples of this fact are addressed in the literature: among them, the implementation issues of the push-relabel algorithm for the maximum flow problem by Goldberg and Tarjan [22] stand out.

Unfortunately, as in any empirical science, it may sometimes be difficult to draw general conclusions about algorithms from experiments. Common pitfalls, often experienced by researchers in their studies, seem to be the following:
- Dependence of empirical results upon the experimental setup:

  - Architecture of the running machine: memory hierarchy, CPU instruction pipelining, CISC vs RISC architectures, CPU and data bus speed are technical issues that may substantially affect the execution performance.

  - Operating system: CPU scheduling, communication management, I/O buffering and memory management are also important factors.

  - Encoding language: features such as built-in types, data and control flow syntactic structures and language paradigm should be taken into account when choosing the encoding language. Among others, C++, C and Fortran are most commonly used in this context. However, we point out that powerful C++ features such as method invocations, overloading of functions and operators, overriding of virtual functions, dynamic casting and templates may introduce high hidden computation costs in the generated machine code even when using professional compilers.

  - Compiler's optimization level: memory alignment, register allocation, instruction scheduling, and elimination of repeated common subexpressions are the most common optimization issues.

  - Measuring of performance indicators: time measurement may be a critical point in many situations, including profiling of fast routines. Important factors are the granularity of the time measuring function (typically 1 μsec to 10 msec, depending upon the platform), and whether we are measuring the real elapsed time, the time used by the user's process, or the time spent by the operating system to do I/O, communication or memory management (see the sketch after this list).

  - Programming skills of developers: the same algorithm implemented by different programmers may lead to different conclusions on its practical performance. Moreover, even different successive refined implementations coded by the same programmer may greatly differ from each other.

  - Problem instances used in the experiments: the range of parameters defining the test sets used in the experiments and the structure of the problem instances themselves may lead to formulating specific conclusions on the performance of algorithms without ensuring generality. Another typical pitfall in this context consists of testing codes on data sets representing classes that are not broad enough. This may lead to inaccurate performance prediction. An extreme example is given by the Netgen problem instances for the minimum cost flow problem [38] that were used to select the best code for a multicommodity flow application [41]. That code was later shown, by the same authors of [41], to run much slower than several other codes on real-life instances. In general, it has been observed that some algorithms behave quite differently when applied to real-life instances and to randomly generated test sets. Linear programming provides a well known example.

- Difficulty in separating the behavior of algorithms: it is sometimes hard to identify problem instances on which the performance of two codes is clearly distinguishable. In general, good algorithm separators are problem families on which differences grow with the problem size [34].
- Unreproducibility of experimental results: possibly wrong, inaccurate or misleading conclusions presented in experimental studies may be extremely difficult to detect if the results are not exactly and independently reproducible by other researchers.

- Modularity and reusability of the code: modularity and reusability of the code seem to conflict with size and speed optimization. Usually, special implementations are difficult to reuse and to modify because of hidden or implicit interconnections between different parts of the code, often due to the sophisticated programming techniques, tricks and hacks on which they are based, but which yield the best performance in practice. In general, using the C++ language seems to be a good choice if the goal is to come up with modular and reusable code, because it allows defining clean, elegant interfaces towards (and between) algorithms and data structures, while C is especially well suited for fast, compact and highly optimized code.

- Limits of the implementations: many implementations have strict requirements on the size of the data they deal with, e.g., they work only with small numbers or with problem instances up to a certain maximum size. It is important to notice that ignoring size limits may lead to substantially wrong empirical conclusions, especially when the implementations used, for performance reasons, do not explicitly perform accurate data size checking.

- Numerical robustness: implementations of computational geometry algorithms may typically suffer from numerical errors due to the finite-precision arithmetic of real computing machines.
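To make the measurement pitfalls above concrete, the sketch below (with a hypothetical stand-in for the routine under test) records both wall-clock time and CPU time consumed by the process and repeats the routine enough times to rise above the granularity of the timers; which of the two quantities is the relevant performance indicator depends on the experiment.

#include <chrono>
#include <cstdio>
#include <ctime>

// Hypothetical stand-in for the code under measurement.
void runAlgorithm() {
    volatile double x = 0;
    for (int i = 0; i < 100000; ++i) x += i;
}

int main() {
    const int REPS = 1000;                          // amortize timer granularity
    auto wall0 = std::chrono::steady_clock::now();  // real elapsed time
    std::clock_t cpu0 = std::clock();               // CPU time used by the process
    for (int r = 0; r < REPS; ++r) runAlgorithm();
    std::clock_t cpu1 = std::clock();
    auto wall1 = std::chrono::steady_clock::now();

    double wall = std::chrono::duration<double>(wall1 - wall0).count() / REPS;
    double cpu  = double(cpu1 - cpu0) / CLOCKS_PER_SEC / REPS;
    std::printf("per run: wall-clock %.6f s, CPU %.6f s\n", wall, cpu);
    return 0;
}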
Although it seems there is no sound and generally accepted solution to these issues, some researchers have proposed accurate and comprehensive guidelines on different aspects of the empirical evaluation of algorithms, matured from their own experience in the field (see, for example, [7, 34, 35, 47]). The interested reader may find in [43] an annotated bibliography of experimental algorithmics sources addressing methodology, tools and techniques.
3.1 Test Sets
The outcome of an experimental study may strongly depend on instance type and size: using wrong or non-general test sets may lead to wrong conclusions. Performing experiments on a good collection of test problems may also help in establishing the correctness of a code: in particular, collecting problem instances on which a code has exhibited buggy behavior may be useful for testing further implementations for the same problem. In the case of optimization algorithms, there usually exist classes of examples for which an algorithm does well or poorly: examining such "pathological" examples may help characterize why the algorithm does poorly and improve its performance on a larger class.

Standardizing common benchmarks for algorithm evaluation is a fundamental task in Experimental Algorithmics. Much effort has been put into collecting, designing and generating good problem instances both for specific problems and for general purpose applications. A first important question is whether to use real or synthetic test sets. It is often the case that algorithms behave quite differently when applied to real-life instances and to randomly generated test sets: both types of instances should therefore be considered. It may also be useful to test the algorithm on synthetic instances whose solution structure is known in advance:
this is especially true in the case of hard problems, where exact optima are difficult, or even impossible, to find efficiently. A common technique for generating instances with a given solution structure consists of starting from the desired solution and adding data to hide it [34]. We wish to remark that finding real data may not be easy, as they may be owned by private companies. The Stanford GraphBase, the NIST collection and the DIMACS test sets are successful examples of such an effort.
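A concrete instance of the planted-solution technique mentioned above (a generic sketch of ours, not one of the generators cited in this section): fix a random truth assignment first and then emit only 3-SAT clauses satisfied by it, so every generated formula is satisfiable by construction and the experimenter knows a hidden solution. The output uses the DIMACS CNF format.

#include <cstdio>
#include <cstdlib>
#include <random>
#include <vector>

int main() {
    const int vars = 100, clauses = 400;             // example sizes
    std::mt19937 gen(42);
    std::uniform_int_distribution<int> pickVar(1, vars);
    std::bernoulli_distribution coin(0.5);

    std::vector<bool> planted(vars + 1);             // the hidden solution
    for (int v = 1; v <= vars; ++v) planted[v] = coin(gen);

    auto satisfied = [&](int lit) { return (lit > 0) == planted[std::abs(lit)]; };

    std::printf("p cnf %d %d\n", vars, clauses);     // DIMACS CNF header
    for (int c = 0; c < clauses; ++c) {
        int l[3];
        do {                                         // resample until the clause is
            for (int& x : l)                         // satisfied by the planted
                x = coin(gen) ? pickVar(gen) : -pickVar(gen);   // assignment
        } while (!(satisfied(l[0]) || satisfied(l[1]) || satisfied(l[2])));
        std::printf("%d %d %d 0\n", l[0], l[1], l[2]);
    }
    return 0;
}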
The Stanford GraphBase [39] is a collection of datasets and computer programs that generate and examine a wide variety of graphs and networks. It consists of small building blocks of code and data and is less than 1.2 megabytes altogether. Data files include, for instance, numerical data representing the input/output structure of the US economy, highway distances between North American cities, digitized versions of famous paintings, and "digested" versions of classic works of literature. Several instance generators included in the package are designed to convert these data files into a large variety of interesting test sets that can be used to explore combinatorial algorithms. Other generators produce graphs with a regular structure or random instances.
The US National Institute of Standards and Technology (NIST) collects heterogeneous types of real data sets. In particular, Matrix Market [17] is a visual repository of test data for use in comparative studies of algorithms, with special emphasis on numerical linear algebra. It features nearly 500 sparse matrices from a variety of application domains, such as air traffic control, astrophysics, biochemistry, circuit physics, computer system simulation, finite elements, demography, economics, fluid flow, nuclear reactor design, oceanography, and petroleum engineering. Each matrix in the collection features a web page which provides statistics on the matrix properties and different visualizations of its structure (e.g., 2-dimensional structure plots or 3-dimensional city plots).

The Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) encourages experimentation with algorithms by organizing, since 1993, challenges in which the best codes for specific problems are compared. Previous challenges include, among others, network flow and matching, cliques, coloring and satisfiability, the traveling salesman problem, priority queues, dictionaries, and multi-dimensional point sets, and parallel algorithms for combinatorial problems [29]. Many test sets and instance generators have been collected for the problems considered so far. All of them conform to specific formats and are available over the Web from the DIMACS website [29]. DIMACS test sets are commonly used as standard benchmarks for comparing the behavior of different algorithms for the same problem.
4.1 Algorithm Visualization Systems
Many software systems in the algorithmic area have been designed with the goal of providing specialized environments for algorithm visualization. Such environments exploit interactive graphics to enhance the development, presentation, and understanding of computer programs [55]. Thanks to their capability of conveying a large amount of information in a compact form that is easily perceivable by a human observer, visualization systems can help developers gain insight about algorithms, test implementation weaknesses, and tune suitable heuristics for improving the practical performance of algorithmic codes. Some examples of this kind of usage are described in [26].

Systems for algorithm animation have matured significantly since the rise of modern computer graphic interfaces, and dozens of algorithm animation systems have been developed in the last two decades [19, 20, 54, 51, 27, 24, 56, 42, 21, 10]. For a comprehensive survey we refer the interested reader to [55, 28] and to the references therein. In the following we limit ourselves to discussing the features of algorithm visualization systems that appear to be most appealing for their deployment in algorithm engineering.
- From the viewpoint of the algorithm developer, it is desirable to rely on systems that offer visualizations at a high level of abstraction. Namely, one would be more interested in visualizing the behavior of a complex data structure, such as a graph, than in obtaining a particular value of a given pointer.

- Fast prototyping of visualizations is another fundamental issue: algorithm designers should be allowed to create visualizations from the source code at hand with little effort and without heavy modifications. To this aim, reusability of visualization code could be of substantial help in speeding up the time required to produce a running animation.

- One of the most important aspects of algorithm engineering is the development of libraries. It is thus quite natural to try to interface visualization tools to algorithmic software libraries: libraries should offer default visualizations of algorithms and data structures that can be refined and customized by developers for specific purposes.

- Software visualization tools should be able to animate not just "toy programs", but significantly complex algorithmic codes, and to test their behavior on large data sets. Unfortunately, even those systems well suited for large information spaces often lack advanced navigation techniques and methods to alleviate the screen bottleneck. Finding a solution to this kind of limitation is nowadays a challenge.

- Advanced debuggers take little advantage of sophisticated graphical displays, even in commercial software development environments. Nevertheless, software visualization tools may be very beneficial in addressing problems such as finding memory leaks, understanding anomalous program behavior, and studying performance. In particular, environments that provide interpreted execution may more easily integrate advanced facilities in support of debugging and performance monitoring, and many recent systems attempt to explore this research direction.

- Debugging concurrent programs is more complicated than understanding the behavior of sequential codes: not only may concurrent computations produce vast quantities of data, but the presence of multiple threads that communicate, compete for resources, and periodically synchronize may also result in unexpected interactions and non-deterministic executions. Tools for the visualization of concurrent computations should therefore support a non-invasive approach to visualization, since the execution may be non-deterministic and invasive visualization code may change the outcome of a computation. Declarative specification appears well suited to this aim.

- There is a general consensus that algorithm visualization systems can strongly benefit from the potential offered by the World Wide Web. Indeed, the use of the Web for easy communication, education, and distance learning can naturally be considered a valid support for improving the cooperation between students and instructors, and between algorithm engineers.
4.2 Program Checkers
Showing the correctness of an algorithm and of its implementation is fundamental to gain total confidence in its output. While theoretical proofs are the standard approach to the former task, devising methodologies for assuring software reliability still remains a difficult problem, for which general powerful solutions are unlikely to be found.

A typical debugging approach relies on testing suites: a variety of input instances for which the correct solution is known is selected, the program is run on the test set and its output is compared to the expected one, possibly discovering anomalies and bugs. A careful selection of the test suite (e.g., degenerate or worst-case instances) may lead to discovering interesting facts about the code, but there is no consensus on what a good test set is and how meaningful test sets can be generated. More importantly, testing allows one to prove the correctness of a code only on specific instances.

A different approach, named program verification, attempts to solve the debugging problem by formally proving that the program is correct. However, proving mathematical claims about the behavior of even simple programs appears to be very difficult.

An alternative method is offered by program checking, which is easier to do than verification, but more rigorous than testing. A program checker is an algorithm for checking the output of a computation: given a program and an instance on which the program is run, the checker certifies whether the output of the program on that instance is correct. For instance, a checker for a code that implements a planarity testing algorithm should exhibit a plane embedding if the input graph is declared to be planar, or a Kuratowski subgraph if the input graph is declared to be non-planar [44].
Designing good checkers may not be easy. Randomization and techniques derived from the theory of error-detecting codes have proved to be valuable tools to this aim [16, 60]. We take as an example the case of sorting. The input of the checker is a pair of arrays: the first, say x, represents the input of the sorting program, the second, say y, its output. The checker must verify not only that the elements in y are in increasing order, but also that they are in fact a permutation of the elements in x. This latter check can be easily accomplished if each element of y has attached a pointer to its original position in x. Even if y cannot be augmented with such pointers, the problem can still be solved by employing a randomized method [16, 61]: a hash function h is chosen at random from a suitably defined set of possibilities, and Σ_i h(x_i) is compared against Σ_i h(y_i). If y is a permutation of x, the two values must be equal; otherwise, it can be proved that they differ with probability at least 1/2. If the checker accepts only pairs x, y for which t such tests pass, the probability of error will be (1/2)^t, which can be made arbitrarily small.
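The following sketch implements a checker of this kind for sorting. The order test is exact; for the permutation test we use a random-evaluation (polynomial identity) check of multiset equality rather than the specific hash-function family of [16, 61]: a random point r is drawn and the products Π_i (r - x_i) and Π_i (r - y_i) are compared modulo a large prime, so each trial misses a non-permutation only with tiny probability, and independent trials drive the overall error probability down geometrically.

#include <cstdint>
#include <random>
#include <vector>

// (a * b) mod p using a 128-bit intermediate (GCC/Clang extension).
static std::uint64_t mulmod(std::uint64_t a, std::uint64_t b, std::uint64_t p) {
    return static_cast<std::uint64_t>((__uint128_t)a * b % p);
}

// (a - b) mod p without intermediate overflow, assuming a, b < p.
static std::uint64_t submod(std::uint64_t a, std::uint64_t b, std::uint64_t p) {
    return a >= b ? a - b : a + (p - b);
}

// Checks that y is sorted and (with high probability) a permutation of x.
bool checkSortingOutput(const std::vector<std::uint64_t>& x,
                        const std::vector<std::uint64_t>& y, int trials = 5) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 1; i < y.size(); ++i)        // exact order check
        if (y[i - 1] > y[i]) return false;
    const std::uint64_t p = 18446744073709551557ull;  // largest prime below 2^64
    std::mt19937_64 gen(std::random_device{}());
    for (int t = 0; t < trials; ++t) {                // randomized multiset check
        std::uint64_t r = gen() % p, px = 1, py = 1;
        for (auto v : x) px = mulmod(px, submod(r, v % p, p), p);
        for (auto v : y) py = mulmod(py, submod(r, v % p, p), p);
        if (px != py) return false;                   // y is not a permutation of x
    }
    return true;   // order verified; permutation holds except with tiny probability
}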
To conclude, we wish to remark that checking allows one to test on all inputs, and not only on inputs for which the correct output is known. Checkers are usually simpler and faster than the programs they check, and thus, presumably, less likely to have bugs. Moreover, checkers are typically reusable in the following sense: since they depend only on the specification of a computational task, the introduction of a new unreliable algorithm for solving an old task may not require any change to the associated checker.
The repository features implementations coded in different programming languages, including C, C++, Fortran, Lisp, Mathematica and Pascal. Also available are some input data files, including airplane routes and schedules, a list of over 3000 names from several nations, and a subgraph of the Erdős-number author connectivity graph. According to a study of WWW hits to the Stony Brook Algorithm Repository site recorded over a period of ten weeks [52], the most popular problems were shortest paths, the traveling salesman problem, and minimum spanning trees, as well as triangulations and graph data structures. At the opposite end, the least popular problems, among others, were determinants, satisfiability and planar graph drawing.
Libraries for specialized domains. Many other libraries for specialized domains have also been implemented. TPIE [9] is a templated C++ library that abstracts the Parallel Disk Model and implements basic I/O primitives such as scanning, sorting, and permuting, and I/O-efficient data structures such as B-trees, R-trees, and K-D-B-trees. CNOP [46] is a package for resource constrained network optimization problems that implements, among others, state-of-the-art algorithms for the constrained shortest path and minimum spanning tree problems. The AGD library [6] offers a broad range of existing algorithms for two-dimensional graph drawing, including drawing planar graphs on the grid with straight lines, orthogonal layout algorithms, and hierarchical approaches. The CGAL library [31] supports numerically robust implementations of many computational geometry algorithms, providing primitives for manipulating basic and advanced geometric objects and data structures, including polygons, triangulations, polyhedra, planar maps, range and segment trees, and kd-trees. It also contains generators for geometric objects. CPLEX is a widely used software package for solving integer, linear and quadratic programming problems. It was originally developed by Bixby [15] and later commercialized. The software includes several versions of the simplex and barrier algorithms and has undergone several major improvements in more than ten years. This fact, coupled with the improvements in computer hardware, has resulted in a software package which is today more than 6 orders of magnitude faster than its first version: there are problems that can be solved in seconds today on desktop computers that would have taken more than 2 years to solve using the best linear programming codes and best hardware available in the late 1980s.
of the technical issues that may substantially affect the execution performance. Algorithmic software libraries, tools for algorithm visualization, program checkers, and generators of test sets for experimenting with algorithms may be of great help throughout this process.

Since it may sometimes be difficult to draw general conclusions about algorithms from experimental observations, developing a science of algorithm testing appears to be fundamental. Important questions that have only been partially answered include the following. What is an adequate set of test instances and how does performance depend on instance type and size? How can we provide evidence of the correctness of an implementation, i.e., guarantee that the algorithm implemented is the one intended and that the results it provides are correct? Which are proper performance indicators? In particular, in the case of bicriteria problems, how can one consistently explore tradeoffs between running time and quality of the solutions? And, finally, what are the proper questions that should be asked and the proper methodologies for experimental work?
References
[1] A. Aggarwal, A.K. Chandra, and M. Snir. Hierarchical memory with block transfer. In Proc. 28th Annual IEEE Symposium on Foundations of Computer Science (FOCS 87), pages 204-216, 1987.
[2] A. Aggarwal and J.S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116-1127, 1988.
[3] M. Agrawal, N. Kayal, and N. Saxena. PRIMES is in P. Manuscript, August 2002.
[4] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms. Addison Wesley, 1974.
[5] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin. Network Flows: Theory, Algorithms and Applications. Prentice Hall, Englewood Cliffs, NJ, 1993.
[6] D. Alberts, C. Gutwenger, P. Mutzel, and S. Naher. AGD-library: a library of algorithms for graph drawing. In Proc. 1st Int. Workshop on Algorithm Engineering (WAE 97), pages 112-123, 1997.
[7] R. Anderson. The role of experiment in the theory of algorithms. In Proceedings of the 5th DIMACS Challenge Workshop, 1996. Available over the Internet at the URL: http://www.cs.amherst.edu/~dsj/methday.html.
[8] L. Arge. External memory data structures. In Proc. European Symposium on Algorithms (ESA 01), LNCS 2161, pages 1-29, 2001.
[9] L. Arge, O. Procopiuc, and J.S. Vitter. Implementing I/O efficient data structures using TPIE. In Proc. 10th Annual European Symposium on Algorithms (ESA 02), LNCS 2461, pages 88-100, 2002.
[10] R.S. Baker, M. Boilen, M.T. Goodrich, R. Tamassia, and B. Stibel. Testers and visualizers for teaching data structures. SIGCSEB: SIGCSE Bulletin (ACM Special Interest Group on Computer Science Education), 31, 1999.
[11] R. Bayer and E.M. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, 1(3):173-189, 1972.
[12] M. Bender, R. Cole, and R. Raman. Exponential structures for efficient cache-oblivious algorithms. In Proc. 29th Int. Colloquium on Automata, Languages and Programming (ICALP 02), LNCS 2380, pages 195-207, 2002.
[13] M. Bender, E. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. In Proc. 41st Annual IEEE Symposium on Foundations of Computer Science (FOCS 00), pages 399-409, 2000.
[14] M. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cache-oblivious dynamic dictionary. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 02), pages 29-38, 2002.
[15] R.E. Bixby. Implementing the simplex method: The initial basis. ORSA Journal on Computing, 4:267-284, 1992.
[16] M. Blum and S. Kannan. Designing programs that check their work. In Proc. 21st Annual ACM Symp. on Theory of Computing (STOC 89), pages 86-97, 1989.
[17] R. Boisvert, R. Pozo, K. Remington, R. Barrett, and J. Dongarra. Matrix Market: a web resource for test matrix collections. In R. Boisvert, editor, The Quality of Numerical Software: Assessment and Enhancement, pages 125-137. Chapman and Hall, London, 1997. Matrix Market is available at the URL: http://math.nist.gov/MatrixMarket/.
[18] G.S. Brodal, R. Fagerberg, and R. Jacob. Cache-oblivious search trees via binary trees of small height. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 02), pages 39-48, 2002.
[19] M.H. Brown. Algorithm Animation. MIT Press, Cambridge, MA, 1988.
[20] M.H. Brown. Zeus: a System for Algorithm Animation and Multi-View Editing. In Proceedings of the 7th IEEE Workshop on Visual Languages, pages 4-9, 1991.
[21] G. Cattaneo, G.F. Italiano, and U. Ferraro-Petrillo. CATAI: Concurrent Algorithms and Data Types Animation over the Internet. Journal of Visual Languages and Computing, 13(4):391-419, 2002. System home page: http://isis.dia.unisa.it/catai/.
[22] B.V. Cherkassky and A.V. Goldberg. On implementing the push-relabel method for the maximum flow problem. Algorithmica, 19:390-410, 1997.
[23] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to Algorithms. McGraw-Hill, 2001.
[24] P. Crescenzi, C. Demetrescu, I. Finocchi, and R. Petreschi. Reversible Execution and Visualization of Programs with Leonardo. Journal of Visual Languages and Computing, 11(2), 2000. System home page: http://www.dis.uniroma1.it/~demetres/Leonardo/.
[25] G.B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey, 1963.
[26] C. Demetrescu, I. Finocchi, G.F. Italiano, and S. Naeher. Visualization in algorithm engineering: Tools and techniques. In Dagstuhl Seminar on Experimental Algorithmics 00371. Springer Verlag, 2001. To appear.
[27] C. Demetrescu, I. Finocchi, and G. Liotta. Visualizing Algorithms over the Web with the Publication-driven Approach. In Proc. of the 4th Workshop on Algorithm Engineering (WAE'00), LNCS 1982, pages 147-158, 2000.
[28] S. Diehl. Software Visualization. LNCS 2269. Springer Verlag, 2001.
[32] M.L. Fredman and R.E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34:596-615, 1987.
[33] M. Frigo, C.E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th IEEE Symp. on Foundations of Computer Science (FOCS 99), pages 285-297, 1999.
[34] A.V. Goldberg. Selecting problems for algorithm evaluation. In Proc. 3rd Workshop on Algorithm Engineering (WAE 99), LNCS 1668, pages 1-11, 1999.
[35] D. Johnson. A theoretician's guide to the experimental analysis of algorithms. In Proceedings of the 5th DIMACS Challenge Workshop, 1996. Available over the Internet at the URL: http://www.cs.amherst.edu/~dsj/methday.html.
[36] N.K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373-395, 1984.
[37] L.G. Khachiyan. A polynomial algorithm in linear programming. Doklady Akademiia Nauk SSSR, 244:1093-1096, 1979. Translated into English in Soviet Mathematics Doklady, 20:191-194.
[38] D. Klingman, A. Napier, and J. Stutz. Netgen: A program for generating large scale capacitated assignment, transportation, and minimum cost network flow problems. Management Science, 20:814-821, 1974.
[39] Donald E. Knuth. Stanford GraphBase: A platform for combinatorial algorithms. In Proc. 4th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 93), pages 41-43, 1993.
[40] R.E. Ladner, R. Fortna, and B.H. Nguyen. A comparison of cache aware and cache oblivious static search trees using program instrumentation. In R. Fleischer, B. Moret, and E. Meineche-Schmidt, editors, Experimental Algorithmics, LNCS 2547, pages 78-92. Springer Verlag, 2002.
[41] T. Leong, P. Shor, and C. Stein. Implementation of a combinatorial multicommodity flow algorithm. In D.S. Johnson and C.C. McGeoch, eds., Network Flows and Matching: First DIMACS Implementation Challenge, pages 387-406, 1993.
[42] A. Malony and D. Reed. Visualizing Parallel Computer System Performance. In M. Simmons, R. Koskela, and I. Bucher, editors, Instrumentation for Future Parallel Computing Systems, pages 59-90. ACM Press, 1999.
[43] C. McGeoch. A bibliography of algorithm experimentation. In Proceedings of the 5th DIMACS Challenge Workshop, 1996. Available over the Internet at the URL: http://www.cs.amherst.edu/~dsj/methday.html.
[44] K. Mehlhorn and S. Naher. From algorithms to working programs: On the use of program checking in LEDA. In Proc. Int. Conf. on Mathematical Foundations of Computer Science (MFCS 98), 1998.
[45] K. Mehlhorn and S. Naher. LEDA: A Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999.
[46] K. Mehlhorn and M. Ziegelmann. CNOP: a package for constrained network optimization. In Proc. 3rd Int. Workshop on Algorithm Engineering and Experiments (ALENEX 01), LNCS 2153, pages 17-30, 2001.
[47] B.M.E. Moret. Towards a discipline of experimental algorithmics. In Proceedings of the 5th DIMACS Challenge Workshop, 1996. Available over the Internet at the URL: http://www.cs.amherst.edu/~dsj/methday.html.
[48] B.M.E. Moret and H.D. Shapiro. An empirical assessment of algorithms for constructing a minimal spanning tree. Computational Support for Discrete Mathematics, N. Dean and G. Shannon eds., DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 15:99-117, 1994.
[49] H. Prokop. Cache-oblivious algorithms. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, 1999.
[50] N. Robertson and P. Seymour. Graph minors: a survey. In J. Anderson, editor, Surveys in Combinatorics, pages 153-171. Cambridge University Press, 1985.
[51] G.C. Roman, K.C. Cox, C.D. Wilcox, and J.Y. Plun. PAVANE: a System for Declarative Visualization of Concurrent Computations. Journal of Visual Languages and Computing, 3:161-193, 1992.
[52] S. Skiena. Who is interested in algorithms and why? Lessons from the Stony Brook algorithms repository. In Proc. 2nd Workshop on Algorithm Engineering (WAE 98), pages 204-212, 1998.
[53] D.A. Spielman and S.H. Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. In Proc. 33rd Annual ACM Symposium on Theory of Computing (STOC 01), pages 296-305, 2001.
[54] J.T. Stasko. Animating Algorithms with X-TANGO. SIGACT News, 23(2):67-71, 1992.
[55] J.T. Stasko, J. Domingue, M.H. Brown, and B.A. Price. Software Visualization: Programming as a Multimedia Experience. MIT Press, Cambridge, MA, 1997.
[56] A. Tal and D. Dobkin. Visualization of Geometric Algorithms. IEEE Transactions on Visualization and Computer Graphics, 1(2):194-204, 1995.
[57] J.S. Vitter. External memory algorithms and data structures. In J. Abello and J.S. Vitter, editors, External Memory Algorithms and Visualization, DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society Press, 1999.
[58] J.S. Vitter and E.A.M. Shriver. Algorithms for parallel memory I: Two-level memories. Algorithmica, 12(2/3):110-147, 1994.
[59] J.S. Vitter and E.A.M. Shriver. Algorithms for parallel memory II: Hierarchical multilevel memories. Algorithmica, 12(2/3):148-169, 1994.
[60] H. Wasserman and M. Blum. Software reliability via run-time result-checking. Journal of the ACM, 44(6):826-849, 1997.
[61] M. Wegman and J. Carter. New hash functions and their use in authentication and set equality. Journal of Computer and System Sciences, 22:265-279, 1981.