Towards a Definition of an Algorithm
Noson S. Yanofsky∗
November 26, 2024
Abstract
We define an algorithm to be the set of programs that implement or
express that algorithm. The set of all programs is partitioned into equivalence
classes: two programs are equivalent if they are essentially the
same program. The set of equivalence classes forms the category of algorithms.
Although the set of programs does not even form a category, the
set of algorithms forms a category with extra structure. The conditions we
give that describe when two programs are essentially the same turn out to
be coherence relations that enrich the category of algorithms with extra
structure. Universal properties of the category of algorithms are proved.
1 Introduction
In their excellent text Introduction to Algorithms, Second Edition [9], Cormen,
Leiserson, Rivest, and Stein begin Section 1.1 with a definition of an algorithm:
N.Y. 11210. And Computer Science Department, The Graduate Center, CUNY, New York,
N.Y. 10016. e-mail: noson@sci.brooklyn.cuny.edu
Knuth [13, 14] has been a little more precise in specifying the requirements
demanded for an algorithm. But he writes “Of course if I am pinned down and
asked to explain more precisely what I mean by these remarks, I am forced to
admit that I don’t know any way to define any particular algorithm except in a
programming language.” ([14], page 1.)
Although algorithms are hard to define, they are nevertheless real mathemat-
ical objects. We name and talk about algorithms with phrases like “Mergesort
runs in n lg n time”. We quantify over all algorithms, e.g., “There does not exist
an algorithm to solve the halting problem.” They are as “real” as the number
e or the set Z. See [10] for an excellent philosophical overview of the subject.
Many researchers have given definitions over the years. (Refer to [5] for a
historical survey of some of these definitions. One must also read the important
works of Yiannis Moschovakis, e.g., [20].) Many of the given definitions are of
the form “An algorithm is a program in this language/system/machine.” This
does not really conform to the current usage of the word “algorithm.” Rather,
this is more in tune with the modern usage of the word “program.” They all
have a feel of being a specific implementation of an algorithm on a specific
system. Imagine a professor teaching a certain algorithm to a class and then
assigning the class to go home and program the algorithm. In any class that
abhors cheating, the students will return many different programs
implementing the same algorithm. We would not call each of these different
programs an algorithm. Rather, the different programs are implementations of
a single algorithm. And yet some researchers do call each of those programs a
different algorithm, e.g., [6]. We would like to propose another definition.
Consider Figure 1.
At the bottom of the figure is the set of all functions. Two functions are
highlighted: the sort function and the function that outputs the maximum of
its inputs. On top of the figure is the set of all programs. For every function
there is a set of programs that implement that function. We have highlighted
four programs that implement the sort function: mergesorta and mergesortb
are two different programs that implement the algorithm mergesort. Similarly
quicksortx and quicksorty are two different implementations of the algorithm
quicksort. There are also many different programs that implement the max
function. mergesorta and mergesortb are grouped in one subset of all the
programs that implement the sort function. This subset will correspond to
the mergesort algorithm. Similarly, quicksortx and quicksorty are grouped
together and will correspond to the quicksort algorithm. There are similar
groupings for a binary search algorithm that finds the max of a list of elements.
There are also other algorithms that find the max. This intuition propels us to
define an algorithm as the set of all programs that implement the algorithm.
We define an algorithm analogously to the way that Gottlob Frege defined
a natural number. Basically Frege says that the number 42 is the equivalence
class of all sets of size 42. He looks at the conglomerate of all finite sets and
makes an equivalence relation. Two finite sets are equivalent if there is a one-
to-one onto function from one set to the other. The set of all equivalence classes
under this equivalence relation forms the set of natural numbers. For us, an
• One program might perform Process1 first and then perform an unrelated
Process2 after. The other program will perform the two unrelated
processes in the opposite order.
• One program might perform a certain process in a loop n times and the
other program will unwind the loop and perform it n − 1 times and then
perform the process again outside the loop.
• One program might perform two unrelated processes in one loop, and the
other program might perform each of these two processes in its own loops.
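The three transformations in the list above can be made concrete with a small Python sketch. The functions and the particular processes (a sum, a maximum, a product, the step x ↦ 2x + 1) are hypothetical stand-ins chosen only for illustration; each pair computes the same result while being a different program.

```python
# Hypothetical toy programs: each pair computes the same function.

def two_processes(xs):
    # Process1 (a sum) first, then an unrelated Process2 (a maximum)
    total = sum(xs)
    biggest = max(xs)
    return total, biggest

def two_processes_swapped(xs):
    # The same two unrelated processes in the opposite order
    biggest = max(xs)
    total = sum(xs)
    return total, biggest

def loop_n_times(x, n):
    # Apply the step x -> 2x + 1 inside a loop n times
    for _ in range(n):
        x = 2 * x + 1
    return x

def loop_unwound(x, n):
    # Run the loop n - 1 times, then apply the step once more outside it (n >= 1)
    for _ in range(n - 1):
        x = 2 * x + 1
    return 2 * x + 1

def fused_loop(xs):
    # Two unrelated processes sharing one loop
    s, p = 0, 1
    for v in xs:
        s += v
        p *= v
    return s, p

def fissioned_loops(xs):
    # Each process in its own loop
    s = 0
    for v in xs:
        s += v
    p = 1
    for v in xs:
        p *= v
    return s, p
```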
In all these examples, the two programs are definitely performing the same
function, and everyone would agree that both programs are implementations of
the same algorithm. We are taking that subset of programs to be the definition
of an algorithm.
Many relations that say when two programs are essentially the same will
be given. However, it is doubtful that we have the final word on this. Hence
the word “Towards” in the title. Whether or not two programs are essentially
the same, or whether or not a program is an implementation of a particular
algorithm is really a subjective decision. Different relations can be given for
different purposes. We give relations that most people would agree identify
two programs as essentially the same, but we are well aware that
others can come along and give more, fewer or different relations. The important
realization is that the relations that we feel are the most obvious turn out to
be relations that correspond to standard categorical coherence rules. When we
mod out by any set of relations, we get more structure. When we mod out by
these relations, our set of programs becomes a category with more structure.
Our goal is not to give the final word on the topic, but to point out that this is
a valid definition of an algorithm and that the set of equivalence classes (the algorithms)
has more structure than the set of programs.
We consider the set of all programs which we might call Programs. An
equivalence relation ≈ of “essentially the sameness” is then defined on this set.
The set of equivalence classes Programs/ ≈ shall then be called Algorithms.
There is a natural onto function φ : Programs −→ Algorithms that takes
every program P to the equivalence class φ(P ) = [P ]. One might think of any
function ψ : Algorithms → Programs such that φ◦ψ = IdAlgorithms as an
“implementer.” ψ takes an algorithm to an implementation of that algorithm.
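A minimal Python sketch of the quotient map φ and of an implementer ψ, under a loud assumption: here a hypothetical lookup table decides when two programs are essentially the same, whereas in this paper ≈ is generated by the relations on descriptions given in Section 3. The check φ(ψ(a)) = a is exactly the condition φ ◦ ψ = IdAlgorithms.

```python
# Toy model of Programs / ≈ = Algorithms. The program names and the
# "tag" that decides essential sameness are hypothetical stand-ins.
PROGRAMS = {
    "mergesort_a": "mergesort",
    "mergesort_b": "mergesort",
    "quicksort_x": "quicksort",
    "quicksort_y": "quicksort",
}

def same(p, q):
    """The relation ≈ on programs (here: equal tags)."""
    return PROGRAMS[p] == PROGRAMS[q]

def quotient(programs):
    """Partition the programs into ≈-classes; each class is an algorithm."""
    classes = []
    for p in programs:
        for cls in classes:
            if same(p, next(iter(cls))):
                cls.add(p)
                break
        else:
            classes.append({p})
    return [frozenset(c) for c in classes]

ALGORITHMS = quotient(sorted(PROGRAMS))

def phi(p):
    """phi : Programs -> Algorithms, sending P to its class [P]."""
    return next(c for c in ALGORITHMS if p in c)

def psi(alg):
    """An implementer: choose one program from the class (here the least name)."""
    return min(alg)
```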
To continue with this line of reasoning, there are many different algorithms
that perform the same function. For example, Kruskal’s algorithm and Prim’s
algorithm are two different ways of finding a minimum spanning tree of a
weighted graph. Quicksort and Mergesort are two different algorithms to sort
a list. There exists an equivalence relation on the set of all algorithms. Two
algorithms are equivalent (≈′) if they perform the same function. We obtain
Algorithms/≈′, which we might call Comp. Functions or computable functions.
It is an undecidable problem to determine when two programs perform the
same computable function. Hence we might not be able to effectively give the
relation ≈′; nevertheless it exists. Even if we were able to give the relation, that
would not mean that the word problem (i.e., telling when two different equivalence
classes of descriptions are equivalent) is solvable. Nevertheless, there is
an onto function φ′ : Algorithms −→ Comp. Functions.
We summarize our intentions with the following picture.
pure mathematicians.
With this picture in mind, we can explain other equivalence relations describ-
ing program “sameness”. One can give many different equivalence relations but
they must fall within the two extremes. One extreme says that no two programs
are really the same, i.e., every program is essentially an algorithm. In that case
Programs = Algorithms. This extreme case is taken up by [6]. In contrast,
another extreme is to say that two programs are the same if they perform the
same operation or are bisimilar. In that case Algorithms = Comp. Functions.
In this paper we choose a middle way. Others can have other equivalence
relations but they must fall in the middle. There are finer and coarser equivalence
relations than ours. There will also be unrelated equivalence relations. For
every equivalence relation, the set of algorithms will have a particular structure.
In our scheme, Programs will form a directed graph with a composition of
arrows and a distinguished loop on every vertex. However they will not have
the structure of a true category: the composition will not be associative and
the distinguished loops will not act like the identity. In contrast, Algorithms
will be a real category with extra structure: a Cartesian product structure
and a weak parameterized natural number object (a categorical way of saying
that the category is closed under recursion). This category will turn out to be
an initial category in the 2-category of all categories with products and weak
parameterized natural number objects.
Others have studied similar categories before. Joyal, in an unpublished
manuscript on “arithmetical universes” (see [17] for a history), as well as
[7], [22] and [21], have looked at the free category with products and a strong
natural number object. Marie-France Thibault [23] has looked at a Cartesian
closed category with a weak natural number object. They characterized what
type of functions can be represented in such categories. Although related cat-
egories have been studied, the connection with the notion of an algorithm has
never been seen. Nor has this category ever been constructed as a quotient of a
syntactical graph.
We are not trying to make any ontological statement about the existence of
algorithms. We are merely giving a mathematical way of describing how one
might think of an algorithm. Human beings dealt with rational numbers for
millennia before mathematicians decided that rational numbers are equivalence
classes of pairs of integers:
$\mathbb{Q} = \{(m, n) \in \mathbb{Z} \times \mathbb{Z} : n \neq 0\} / \approx,$
where
$(m, n) \approx (m', n') \iff mn' = nm'.$
Similarly, one can think of the existence of algorithms in any way that one
chooses. We are simply offering a mathematical way of presenting them.
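To make the analogy concrete, the following Python sketch (an illustration only; the sample pairs are arbitrary) tests the relation (m, n) ≈ (m′, n′) iff mn′ = nm′ and groups pairs into classes, using the standard-library Fraction type as a canonical name for each class, much as an algorithm can only be named by one of its programs.

```python
from fractions import Fraction

def equivalent(p, q):
    """(m, n) ≈ (m', n') iff m*n' == n*m' (second entries assumed nonzero)."""
    (m, n), (m2, n2) = p, q
    return m * n2 == n * m2

pairs = [(1, 2), (2, 4), (-3, -6), (2, 3)]

# Group the pairs by the rational number they represent;
# Fraction normalizes each pair to lowest terms.
classes = {}
for m, n in pairs:
    classes.setdefault(Fraction(m, n), []).append((m, n))
```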
There is an interesting analogy between thinking of a rational number as an
equivalence class of pairs of integers and our definition of an algorithm as an
equivalence class of programs. Just as a rational number can only be expressed
by an element of the equivalence class, so too, an algorithm can only be expressed
There is another way to view this entire endeavor. What we are creating here
is an operad. Operads are a universal algebraic/categorical way of describing
extra algebraic structure. Recently operads have become very popular with
algebraic topologists and people who study quantum field theories. We are
creating an operad that describes some of the extra structure that exists on
the set of total functions of a certain type. With such total functions one can
compose, do recursion, and take the product of those functions. We then can
look at the algebra of this operad generated by all total functions from powers
of N to N. One then can examine the subalgebra generated by basic or initial
functions (this essentially is our PRdesc). We go further and look at a quotient
of this subalgebra by using more relations (this essentially is our PRalg). We
show in section 4 of this paper that this quotient subalgebra is an initial object
in a certain 2-category. This operadic viewpoint is further elaborated and used
in [19] where we tackle the harder problem of all recursive functions.
• Just as our category of Algorithms is the free category with products and
a weak natural number object generated by the empty category, so too,
the category of Braids is the free braided monoidal category generated
by one object.
• Just as the main focus of computer scientists is algorithms and not programs,
so too, the main focus of topologists is braids and not braid diagrams.
There is obviously much more to explore with these analogies. There also should
be a closer relationship between these fields. After all, some of our relations are
very similar to Reidemeister moves.
Section 2 will review the basics of primitive recursive functions and show how
they may be described by special labeled binary trees. Section 3 will then give
many of the relations that tell when two descriptions of a primitive recursive
function are “essentially” the same. Section 4 will discuss the structure of
the set of all algorithms. We shall give a universal categorical description of
the category of algorithms. This is the only Section that uses category theory
in a non-trivial way. Section 5 will discuss complexity results and show how
complexity theory fits into our framework. We conclude this paper with a list
of possible ways this work can progress.
At this point it is appropriate to say what this paper is not.
• Nothing new will be said about category theory. Rather, we are making
a link between these categories and the concept of an algorithm.
• We will not say anything new about program semantics. Our equivalence
relations are between descriptions that correspond to the same function.
Yuri Manin has incorporated an earlier draft [24] of this paper into the second
edition of his A Course in Mathematical Logic [18]. Within Chapter IX of
that book he describes the constructions given in this paper using the language
of PROPs and operads that are of interest to mathematicians and theoretical
physicists. This earlier draft [24] was also discussed in [6].
h(n + 1) = g(h(n)).
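A more complicated form of recursion, and the one we shall employ, starts
from a given function $f : \mathbb{N}^k \to \mathbb{N}^m$ and a given function $g : \mathbb{N}^k \times \mathbb{N}^m \to \mathbb{N}^m$.
From these one constructs $h : \mathbb{N}^k \times \mathbb{N} \to \mathbb{N}^m$ as
$h(x, 0) = f(x)$
$h(x, n + 1) = g(x, h(x, n))$
where $x \in \mathbb{N}^k$ and $n \in \mathbb{N}$.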
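As a sketch of this recursion scheme, the following Python function builds h = f♯g from f and g. It assumes that elements of a power N^k are encoded as Python tuples of non-negative integers and unfolds the recursion with a loop; the name `sharp` and the two sample functions are illustrative, not taken from the text.

```python
from typing import Callable, Tuple

Vec = Tuple[int, ...]  # an element of a power N^k, encoded as a tuple

def sharp(f: Callable[[Vec], Vec],
          g: Callable[[Vec, Vec], Vec]) -> Callable[[Vec, int], Vec]:
    """Return h = f#g with h(x, 0) = f(x) and h(x, n + 1) = g(x, h(x, n))."""
    def h(x: Vec, n: int) -> Vec:
        acc = f(x)
        for _ in range(n):   # unfold the recursion iteratively
            acc = g(x, acc)
        return acc
    return h

# Addition N x N -> N: f(x) = x and g(x, y) = y + 1
add = sharp(lambda x: x, lambda x, y: (y[0] + 1,))
# Multiplication by repeated addition: f(x) = 0 and g(x, y) = y + x
mul = sharp(lambda x: (0,), lambda x, y: (y[0] + x[0],))
```

For example, `add((3,), 4)` evaluates to `(7,)` and `mul((3,), 4)` to `(12,)`.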
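The most general form of recursion, and the definition usually given for
primitive recursive functions, starts from a given function $f : \mathbb{N}^k \to \mathbb{N}^m$ and a given
function $g : \mathbb{N}^k \times \mathbb{N}^m \times \mathbb{N} \to \mathbb{N}^m$. From these, one constructs $h : \mathbb{N}^k \times \mathbb{N} \to \mathbb{N}^m$ as
$h(x, 0) = f(x)$
$h(x, n + 1) = g(x, h(x, n), n)$
where $x \in \mathbb{N}^k$ and $n \in \mathbb{N}$.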
We shall use the middle definition of recursion because the extra input vari-
able in g does not add anything [11]. It simply makes things unnecessarily
complicated. However, we are certain that any proposition that can be said
about the second type of recursion, can also be said for the third type. See [3]
Section 7.5, and [4] Section 5.5.
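The claim that the extra input of g adds nothing can be illustrated by the usual trick of carrying the counter along inside the value being recursed on. The sketch below is our own illustration (not necessarily the argument of [11]); for brevity it uses plain integers for x and for the values of h, and the sample f and g3 are hypothetical.

```python
def sharp(f, g):
    """Middle scheme: h(x, 0) = f(x), h(x, n + 1) = g(x, h(x, n))."""
    def h(x, n):
        acc = f(x)
        for _ in range(n):
            acc = g(x, acc)
        return acc
    return h

def general(f, g3):
    """Third scheme, defined directly: h(x, n + 1) = g3(x, h(x, n), n)."""
    def h(x, n):
        acc = f(x)
        for i in range(n):
            acc = g3(x, acc, i)
        return acc
    return h

def via_middle(f, g3):
    """Simulate the third scheme with the middle one by carrying the counter."""
    h2 = sharp(lambda x: (f(x), 0),                              # start at (f(x), 0)
               lambda x, yc: (g3(x, yc[0], yc[1]), yc[1] + 1))   # step value and counter
    return lambda x, n: h2(x, n)[0]                            # project the counter away

# Sample: h(x, n) = x * 1 * 2 * ... * n
f = lambda x: x
g3 = lambda x, y, n: y * (n + 1)
```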
Although primitive recursive functions are usually described as closed only
under composition and recursion, there is, in fact, another implicit operation
for which the functions are closed: bracket. Given primitive recursive functions
$f : \mathbb{N}^k \to \mathbb{N}$ and $g : \mathbb{N}^k \to \mathbb{N}$, there is a primitive recursive function
$h = \langle f, g\rangle : \mathbb{N}^k \to \mathbb{N} \times \mathbb{N}$, defined as
$h(x) = (f(x), g(x))$
for any $x \in \mathbb{N}^k$. We shall see that having this bracket operation is almost the
same as having a product operation.
In order to save the eyesight of our poor reader, rather than writing too
many exponents, we shall write a power of the set N for some fixed but arbitrary
number as A, B, C etc. With this notation, we may write the recursion operation
as follows: from functions f : A → B and g : A × B → B one constructs
h : A × N → B.
If f and g are functions with the appropriate source and targets, then we
shall write their composition as $h = f \circ g$. If they have the appropriate source
and target for the bracket operation, we shall write the bracket operation as
$h = \langle f, g\rangle$. We are in need of a similar notation for recursion. So if there are
$f : A \to B$ and $g : A \times B \to B$, we shall write the function that one obtains from
them through recursion as $h = f \sharp g : A \times \mathbb{N} \to B$.
We are going to form a directed graph that contains all the descriptions of
primitive recursive functions. We shall call this graph PRdesc. The vertices
of the graph shall be powers of the set of natural numbers, $\mathbb{N}^0 = *, \mathbb{N}, \mathbb{N}^2, \mathbb{N}^3, \ldots$. The
edges of the graph shall be descriptions of primitive recursive functions. One
should keep in mind the following intuitive picture.
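[Figure: the graph PRdesc, with vertices $*, \mathbb{N}, \mathbb{N}^2, \mathbb{N}^3, \mathbb{N}^4, \ldots, \mathbb{N}^k, \ldots$ and edges given by descriptions of primitive recursive functions between them.]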
2.1 Trees
Each edge in PRdesc shall be a labeled binary tree whose leaves are basic
functions and whose internal nodes are labeled by C, R or B for composi-
tion, recursion and bracket. Every internal node of the tree shall be derived
from its left child and its right child. We shall use the following notation:
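• C (composition): the node $g \circ f : A \to C$ has left child $f : A \to B$ and right child $g : B \to C$.
• R (recursion): the node $h = f \sharp g : A \times \mathbb{N} \to B$ has left child $f : A \to B$ and right child $g : A \times B \to B$.
• B (bracket): the node $\langle f, g\rangle : A \to B \times C$ has left child $f : A \to B$ and right child $g : A \to C$.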
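To make the tree picture concrete, here is a hedged Python sketch of descriptions as binary trees with C, R and B nodes, together with an evaluator that computes the function a tree describes. Leaves are modeled as arbitrary Python callables on tuples, and the R node records the length of A so that its input can be split; these encoding choices and all names (`Leaf`, `Comp`, `Rec`, `Brk`, `run`) are hypothetical. Different trees may evaluate to the same function, which is the gap between descriptions and functions discussed in the introduction.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

Vec = Tuple[int, ...]

@dataclass(frozen=True)
class Leaf:            # a basic function, e.g. a projection or the successor
    name: str
    fn: Callable[[Vec], Vec]

@dataclass(frozen=True)
class Comp:            # C node: g . f, left child f, right child g
    f: "Tree"
    g: "Tree"

@dataclass(frozen=True)
class Rec:             # R node: f # g : A x N -> B (last input is the counter)
    f: "Tree"
    g: "Tree"
    a_arity: int       # k, where A = N^k (used to split the input)

@dataclass(frozen=True)
class Brk:             # B node: <f, g> : A -> B x C
    f: "Tree"
    g: "Tree"

Tree = Union[Leaf, Comp, Rec, Brk]

def run(t: Tree, x: Vec) -> Vec:
    """Evaluate the primitive recursive function described by tree t."""
    if isinstance(t, Leaf):
        return t.fn(x)
    if isinstance(t, Comp):
        return run(t.g, run(t.f, x))
    if isinstance(t, Brk):
        return run(t.f, x) + run(t.g, x)    # tuple concatenation pairs outputs
    a, n = x[:t.a_arity], x[t.a_arity]      # split the input into A and N
    acc = run(t.f, a)
    for _ in range(n):
        acc = run(t.g, a + acc)             # g : A x B -> B
    return acc

ident = Leaf("pi", lambda x: x)
succ_last = Leaf("s . pi", lambda x: (x[-1] + 1,))  # (a, b) |-> b + 1
add = Rec(ident, succ_last, a_arity=1)             # add(a, n) = a + n
double = Comp(Brk(ident, ident), add)              # x |-> add(x, x)
```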
In other words, $\pi^A_B$ outputs the proper numbers in the order described by X.
Whenever possible, we shall be ambiguous with superscripts and subscripts.
Setting
$X = I = \langle 1, 2, 3, \ldots, n\rangle$
we have what looks like the identity functions. Setting
$X = \triangle = \langle 1, 2, 3, \ldots, n, 1, 2, 3, \ldots, n\rangle$
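or in terms of trees, the P-node
$f \times g : A \times C \to B \times D$, with children $f : A \to B$ and $g : C \to D$,
stands for the B-node
$f \times g = \langle f \circ \pi^{A\times C}_A,\ g \circ \pi^{A\times C}_C\rangle : A \times C \to B \times D,$
whose children are the C-nodes $f \circ \pi^{A\times C}_A : A \times C \to B$ (with children $\pi^{A\times C}_A : A \times C \to A$ and $f : A \to B$)
and $g \circ \pi^{A\times C}_C : A \times C \to D$ (with children $\pi^{A\times C}_C : A \times C \to C$ and $g : C \to D$).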
We took the bracket operation as fundamental and from the bracket opera-
tion we derived the product operation and the diagonal map. We could have just
as easily taken the product and the diagonal as fundamental and constructed
the bracket as
$\langle f, g\rangle = (f \times g) \circ \triangle : A \xrightarrow{\ \triangle\ } A \times A \xrightarrow{\ f \times g\ } B \times C.$
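The following Python sketch illustrates projections π^X indexed by a sequence X, the product defined from the bracket, and the bracket recovered as (f × g) ∘ △. It assumes that A = N^a and C = N^c are encoded as flat tuples and that the indices in X are 1-based, as in the text; the helper names are illustrative.

```python
from typing import Callable, Sequence, Tuple

Vec = Tuple[int, ...]
Fn = Callable[[Vec], Vec]

def proj(X: Sequence[int]) -> Fn:
    """pi^X: output the (1-based) input positions listed in X, in that order."""
    return lambda x: tuple(x[i - 1] for i in X)

def bracket(f: Fn, g: Fn) -> Fn:
    """<f, g>: run both maps on the same input and pair the outputs."""
    return lambda x: f(x) + g(x)

def compose(g: Fn, f: Fn) -> Fn:
    """g . f"""
    return lambda x: g(f(x))

def product(f: Fn, a: int, g: Fn, c: int) -> Fn:
    """f x g : A x C -> B x D as <f . pi_A, g . pi_C>, with A = N^a, C = N^c."""
    pi_A = proj(range(1, a + 1))
    pi_C = proj(range(a + 1, a + c + 1))
    return bracket(compose(f, pi_A), compose(g, pi_C))

def diagonal(n: int) -> Fn:
    """The projection with X = <1, ..., n, 1, ..., n>."""
    return proj(list(range(1, n + 1)) * 2)

def bracket_via_product(f: Fn, g: Fn, n: int) -> Fn:
    """<f, g> = (f x g) . diagonal, for f, g with source N^n."""
    return compose(product(f, n, g, n), diagonal(n))

f = lambda x: (x[0] + x[1],)   # sample maps N^2 -> N
g = lambda x: (x[0] * x[1],)
```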
Twist Map. We shall need to switch the order of inputs and outputs. The
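twist map shall be defined as
$tw_{A,B} = \pi^{A\times B}_B \times \pi^{A\times B}_A : A \times B \to B \times A.$
Or in terms of trees: the P-node $tw_{A,B} : A \times B \to B \times A$ with children $\pi^{A\times B}_B : A \times B \to B$ and $\pi^{A\times B}_A : A \times B \to A$.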
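$g_1\ g_2 : A \times B \times B \to B \times B$
on elements as follows:
$(g_1\ g_2)(x, y, y') = (g_1(x, y),\ g_2(x, y')).$
In terms of maps, it may be defined as the composition of the following maps:
$g_1\ g_2 = (g_1 \times g_2) \circ (\pi^A_A \times tw_{A,B} \times \pi^B_B) \circ (\triangle \times \pi^{B\times B}_{B\times B}) :$
$A \times B \times B \to A \times A \times B \times B \to A \times B \times A \times B \to B \times B.$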
Since the second variable product is related to the product, which is derived
from the bracket, we write it as the B′-node $g_1\ g_2 : A \times B \times B \to B \times B$
with children $g_1 : A \times B \to B$ and $g_2 : A \times B \to B$.
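$g_1 \mathbin{\ddot{\circ}} g_2 : A \times C \to B$
on elements as follows:
$(g_1 \mathbin{\ddot{\circ}} g_2)(x, z) = g_1(x, g_2(x, z)),$
that is, as the composite
$g_1 \circ (\pi^A_A \times g_2) \circ (\triangle \times \pi^C_C) : A \times C \to A \times A \times C \to A \times D \to B.$
We write second variable composition as the C′-node $g_1 \mathbin{\ddot{\circ}} g_2 : A \times C \to B$
with children $g_2 : A \times C \to D$ and $g_1 : A \times D \to B$.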
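On elements, the two second variable operations can be sketched in Python as follows. As an assumption made for brevity, A, B, C and D are all taken to be N and encoded as plain integers; the function names are hypothetical.

```python
def second_var_product(g1, g2):
    """(x, y, y') |-> (g1(x, y), g2(x, y')): x is shared, the B-inputs are separate."""
    return lambda x, y, y2: (g1(x, y), g2(x, y2))

def second_var_compose(g1, g2):
    """(g1 ¨∘ g2)(x, z) = g1(x, g2(x, z)): the parameter x is fed to both maps."""
    return lambda x, z: g1(x, g2(x, z))

g1 = lambda x, y: x + y   # sample g1 : A x B -> B
g2 = lambda x, y: x * y   # sample g2 : A x B -> B
```

For instance, `second_var_compose(g1, g2)(2, 3)` computes g1(2, g2(2, 3)) = g1(2, 6) = 8.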
3 Relations
Given the operations of composition, recursion and bracket, what does it mean
for us to say that two descriptions of a primitive recursive function are “es-
sentially” the same? We shall examine these operations, and give relations to
describe when two trees are essentially the same. If two trees are exactly alike
except for a subtree that is equivalent to another tree, then we may replace the
subtree with the equivalent tree.
3.1 Composition
Composition is Associative. That is, for any three composable maps f , g
and h, we have
h ◦ (g ◦ f ) ≈ (h ◦ g) ◦ f.
In terms of trees, we say that the following two trees are equivalent:
$h \circ (g \circ f) : A \to D \;\approx\; (h \circ g) \circ f : A \to D$ (both roots are C-nodes).
Composition and the Null Function. The null function always outputs a
0 no matter what the input is. So for any function f : A → N, if we are going
to compose f with the null function, then f might as well be substituted with
a projection, i.e.,
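$n \circ f \approx n \circ \pi^A_{\mathbb{N}}.$
In terms of trees, the C-node $n \circ f : A \to \mathbb{N}$ is equivalent to the C-node $n \circ \pi^A_{\mathbb{N}} : A \to \mathbb{N}$.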
Notice that the left side of the left tree is essentially “pruned.” Although
there is much information on the left side of the left tree, it is not important.
It can be substituted with another tree that does not have that information.
In terms of procedures, this says that doing g and then doing both f1 and f2 is
the same as doing both f1 ◦ g and f2 ◦ g, i.e., the following two flowcharts are
essentially the same.
[Flowcharts: g followed by a fork into f1 and f2, ≈, two parallel paths, g then f1 and g then f2.]
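In terms of trees, this amounts to saying that these trees are equivalent:
the C-node $\langle f_1, f_2\rangle \circ g : A \to C_1 \times C_2$, with children $g : A \to B$ and $\langle f_1, f_2\rangle : B \to C_1 \times C_2$
(a B-node with children $f_1 : B \to C_1$ and $f_2 : B \to C_2$), and
the B-node $\langle f_1 \circ g, f_2 \circ g\rangle : A \to C_1 \times C_2$, with children $f_1 \circ g : A \to C_1$
(a C-node with children $g : A \to B$ and $f_1 : B \to C_1$) and $f_2 \circ g : A \to C_2$
(a C-node with children $g : A \to B$ and $f_2 : B \to C_2$).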
It is important to realize that it does not make sense to require composition
to distribute over bracket on the left:
$g \circ \langle f_1, f_2\rangle \not\approx \langle g \circ f_1, g \circ f_2\rangle.$
[Flowcharts: f1 and f2 both feeding a single g, versus f1 followed by g and f2 followed by g.]
The left g requires two inputs. The right g’s only require one.
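The right distributive law can be checked on sample inputs with a short Python sketch (the sample functions are arbitrary). The left-hand side runs g once and the right-hand side runs it twice, yet both compute the same function; the left-distributive form g ∘ ⟨f1, f2⟩ would instead feed a pair to a one-input g.

```python
def bracket(f1, f2):
    """<f1, f2>: pair the outputs of two maps on the same input."""
    return lambda x: (f1(x), f2(x))

def compose(g, f):
    """g . f"""
    return lambda x: g(f(x))

g = lambda x: x + 1      # g : A -> B
f1 = lambda y: 2 * y     # f1 : B -> C1
f2 = lambda y: y * y     # f2 : B -> C2

lhs = compose(bracket(f1, f2), g)               # <f1, f2> . g   (g runs once)
rhs = bracket(compose(f1, g), compose(f2, g))   # <f1 . g, f2 . g> (g runs twice)
```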
3.3 Bracket
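Bracket is Associative. For any three maps f, g,
and h with the same domain, we have
$\langle f, \langle g, h\rangle\rangle \approx \langle\langle f, g\rangle, h\rangle.$
The bracket and the twist map are related by
$\langle f, g\rangle \approx tw \circ \langle g, f\rangle.$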
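In terms of trees, the B-node $\langle f, g\rangle : A \to B \times C$ (children $f : A \to B$ and $g : A \to C$) is equivalent to
the C-node $tw \circ \langle g, f\rangle$, whose children are $\langle g, f\rangle : A \to C \times B$
(a B-node with children $g : A \to C$ and $f : A \to B$) and $tw : C \times B \to B \times C$.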
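Twist is Idempotent. There are other relations that the twist map must
respect. Idempotent here means that twisting twice gives back the identity
(strictly speaking, the twist is an involution):
$tw_{B,A} \circ tw_{A,B} \approx \pi^{A\times B}_{A\times B} : A \times B \to A \times B.$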
Twist is Coherent. We would like the twist maps of three elements to be
compatible with one another:
$(tw_{B,C} \times \pi_A) \circ (\pi_B \times tw_{A,C}) \circ (tw_{A,B} \times \pi_C) \approx (\pi_C \times tw_{A,B}) \circ (tw_{A,C} \times \pi_B) \circ (\pi_A \times tw_{B,C}).$
This is called the hexagon law or the third Reidemeister move. Given the
idempotence and hexagon laws, it is a theorem that there is a unique twist map
made of smaller twist maps between any two products of elements ([16] Section
XI.4).
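A small Python sketch can check the idempotence (involution) and hexagon laws on sample data. It treats each factor A, B, C as an opaque block inside a triple, an encoding assumption under which every tw × π swaps the first two blocks and every π × tw swaps the last two; the hexagon law then becomes the familiar braid relation between these two swaps.

```python
def swap12(t):
    """tw x pi on a triple of blocks: (a, b, c) |-> (b, a, c)."""
    return (t[1], t[0], t[2])

def swap23(t):
    """pi x tw on a triple of blocks: (a, b, c) |-> (a, c, b)."""
    return (t[0], t[2], t[1])

def tw(pair):
    """tw_{A,B} : A x B -> B x A."""
    a, b = pair
    return (b, a)

def compose(*fs):
    """compose(f, g, h)(t) = f(g(h(t))): the rightmost map is applied first."""
    def run(t):
        for fn in reversed(fs):
            t = fn(t)
        return t
    return run

abc = ((1, 2), (3,), (4, 5, 6))         # blocks standing for A = N^2, B = N, C = N^3
lhs = compose(swap12, swap23, swap12)   # (tw x pi) . (pi x tw) . (tw x pi)
rhs = compose(swap23, swap12, swap23)   # (pi x tw) . (tw x pi) . (pi x tw)
```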
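In terms of trees, the C-node $\pi^{B\times C}_B \circ \langle f, g\rangle$, whose children are $\langle f, g\rangle : A \to B \times C$
(a B-node with children $f : A \to B$ and $g : A \to C$) and $\pi^{B\times C}_B : B \times C \to B$, is equivalent to $f$.
Similarly for a projection onto the second output: $g \approx \pi^{B\times C}_C \circ \langle f, g\rangle$.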
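The R-node $h = \langle f_1, f_2\rangle \sharp (g_1\ g_2) : A \times \mathbb{N} \to B \times B$, whose children are
$\langle f_1, f_2\rangle : A \to B \times B$ (a B-node with children $f_1 : A \to B$ and $f_2 : A \to B$) and
$g_1\ g_2 : A \times B \times B \to B \times B$ (a B′-node with children $g_1 : A \times B \to B$ and $g_2 : A \times B \to B$),
is equivalent to the B-node $\langle f_1 \sharp g_1, f_2 \sharp g_2\rangle : A \times \mathbb{N} \to B \times B$, whose children are the R-nodes
$f_1 \sharp g_1$ (children $f_1 : A \to B$ and $g_1 : A \times B \to B$) and $f_2 \sharp g_2$ (children $f_2 : A \to B$ and $g_2 : A \times B \to B$).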
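The relation between bracket and recursion can be checked numerically with the following Python sketch, which uses a loop-based `sharp` and encodes A and B as integers (assumptions for brevity; the helper names are hypothetical). Here f1♯g1 is addition and f2♯g2 is exponentiation; running them together through the second variable product gives the same pairs as running them separately.

```python
def sharp(f, g):
    """h = f # g: h(x, 0) = f(x), h(x, n + 1) = g(x, h(x, n))."""
    def h(x, n):
        acc = f(x)
        for _ in range(n):
            acc = g(x, acc)
        return acc
    return h

def bracket(f1, f2):
    """<f1, f2>, for maps of any (equal) number of arguments."""
    return lambda *args: (f1(*args), f2(*args))

def second_var_product(g1, g2):
    """(x, (y, y')) |-> (g1(x, y), g2(x, y'))."""
    return lambda x, yy: (g1(x, yy[0]), g2(x, yy[1]))

f1, g1 = (lambda x: x), (lambda x, y: y + 1)   # f1 # g1 = addition
f2, g2 = (lambda x: 1), (lambda x, y: y * x)   # f2 # g2 = exponentiation

together = sharp(bracket(f1, f2), second_var_product(g1, g2))
separate = bracket(sharp(f1, g1), sharp(f2, g2))
```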
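The C′-node $g_1 \mathbin{\ddot{\circ}} h$, whose children are $h = f \sharp (g_2 \mathbin{\ddot{\circ}} g_1) : A \times \mathbb{N} \to B$
(an R-node with children $f : A \to B$ and $g_2 \mathbin{\ddot{\circ}} g_1 : A \times B \to B$, the latter a C′-node with children $g_1$ and $g_2$)
and $g_1 : A \times B \to B$,
is equivalent (≈) to
the R-node $h' = (g_1 \mathbin{\ddot{\circ}} f) \sharp (g_1 \mathbin{\ddot{\circ}} g_2) : A \times \mathbb{N} \to B$, whose children are
$g_1 \mathbin{\ddot{\circ}} f : A \to B$ (a C′-node with children $f : A \to B$ and $g_1 : A \times B \to B$) and
$g_1 \mathbin{\ddot{\circ}} g_2 : A \times B \to B$ (a C′-node with children $g_2 : A \times B \to B$ and $g_1 : A \times B \to B$).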
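The relation between recursion and composition can likewise be tested on sample data. In this Python sketch (integers for A and B, arbitrary sample f, g1, g2, and a hypothetical helper `ddot` for second variable composition) the left tree applies g1 after the whole recursion, while the right tree pushes g1 into the starting value and into the step; by induction on n the two agree, and the assertions check a range of cases.

```python
def sharp(f, g):
    """h = f # g: h(x, 0) = f(x), h(x, n + 1) = g(x, h(x, n))."""
    def h(x, n):
        acc = f(x)
        for _ in range(n):
            acc = g(x, acc)
        return acc
    return h

def ddot(g1, g2):
    """Second variable composition: (g1 ¨∘ g2)(x, y) = g1(x, g2(x, y))."""
    return lambda x, y: g1(x, g2(x, y))

f = lambda x: x              # f : A -> B
g1 = lambda x, y: y + x      # g1 : A x B -> B
g2 = lambda x, y: 2 * y      # g2 : A x B -> B

h = sharp(f, ddot(g2, g1))                            # f # (g2 ¨∘ g1)
left = lambda x, n: g1(x, h(x, n))                    # g1 ¨∘ (f # (g2 ¨∘ g1))
right = sharp(lambda x: g1(x, f(x)), ddot(g1, g2))    # (g1 ¨∘ f) # (g1 ¨∘ g2)
```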
n:N→N h:A×N→B
R
f :A→B g :A×B→B
Notice that the g on the left tree is not on the right tree.
3.6 Products
The product is associative. That is, for any three maps $f : A \to A'$, $g : B \to B'$
and $h : C \to C'$, the two products are equivalent:
$f \times (g \times h) \approx (f \times g) \times h : A \times B \times C \to A' \times B' \times C'.$
$\pi^A_A \times \pi^B_B \approx \pi^{A\times B}_{A\times B}.$
This falls out of the fact that the bracket respects the identity.
Interchange Rule. We must show that the product and the composition
respect each other. In terms of maps, this corresponds to the following situation:
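[Diagram: the products $A_i \times B_i$ ($i = 1, 2, 3$) with their projections $\pi$ onto $A_i$ and $B_i$; vertically, $f_1, f_2$ on the $A$ side, $g_1, g_2$ on the $B$ side, and $f_1 \times g_1$, $f_2 \times g_2$ between the products, together with the composites $f_2 \circ f_1$ and $g_2 \circ g_1$.]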
(f2 × g2 ) ◦ (f1 × g1 ) and (f2 ◦ f1 ) × (g2 ◦ g1 ) are two ways of getting from
A1 × B1 to A3 × B3 . We shall declare these two methods equivalent:
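the C-node $(f_2 \times g_2) \circ (f_1 \times g_1) : A_1 \times B_1 \to A_3 \times B_3$, whose children are the P-nodes
$f_1 \times g_1 : A_1 \times B_1 \to A_2 \times B_2$ (children $f_1 : A_1 \to A_2$ and $g_1 : B_1 \to B_2$) and
$f_2 \times g_2 : A_2 \times B_2 \to A_3 \times B_3$ (children $f_2 : A_2 \to A_3$ and $g_2 : B_2 \to B_3$),
≈
the P-node $(f_2 \circ f_1) \times (g_2 \circ g_1) : A_1 \times B_1 \to A_3 \times B_3$, whose children are the C-nodes
$f_2 \circ f_1 : A_1 \to A_3$ (children $f_1 : A_1 \to A_2$ and $f_2 : A_2 \to A_3$) and
$g_2 \circ g_1 : B_1 \to B_3$ (children $g_1 : B_1 \to B_2$ and $g_2 : B_2 \to B_3$).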
One should realize that this equivalence is not a new addition to our
list of equivalences. It is actually a consequence of the definition of product and
the equivalences that we assume about bracket. In detail,
$(f_2 \times g_2) \circ (f_1 \times g_1) = \langle f_2\pi, g_2\pi\rangle \circ \langle f_1\pi, g_1\pi\rangle \approx \langle f_2\pi\langle f_1\pi, g_1\pi\rangle,\ g_2\pi\langle f_1\pi, g_1\pi\rangle\rangle$
$\approx \langle f_2 \circ f_1\pi,\ g_2 \circ g_1\pi\rangle = (f_2 \circ f_1) \times (g_2 \circ g_1).$
The first and the last equality are from the definition of product. The first
equivalence comes from the fact that composition distributes over bracket. The
second equivalence is a consequence of the relationship between the projection
maps and the bracket.
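The interchange rule, derived above from the bracket relations, can be spot-checked with a short Python sketch on pairs. The sample functions are arbitrary, and `product` implements f × g as ⟨f ∘ π1, g ∘ π2⟩ on 2-tuples (an encoding assumption).

```python
def compose(g, f):
    """g . f"""
    return lambda t: g(f(t))

def product(f, g):
    """f x g on pairs: (a, b) |-> (f(a), g(b)), i.e. <f . pi_1, g . pi_2>."""
    return lambda t: (f(t[0]), g(t[1]))

f1, f2 = (lambda a: a + 1), (lambda a: 3 * a)
g1, g2 = (lambda b: b * b), (lambda b: b - 2)

lhs = compose(product(f2, g2), product(f1, g1))   # (f2 x g2) . (f1 x g1)
rhs = product(compose(f2, f1), compose(g2, g1))   # (f2 . f1) x (g2 . g1)
```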
4 Algorithms
We have given relations telling when two programs/trees/descriptions are sim-
ilar. We would like to look at the equivalence classes that these relations gen-
erate. It will become apparent that by taking PRdesc and “modding out” by
these equivalence relations, we shall get more structure.
The relations split up into two disjoint sets: those for which there is a loss
of information and those for which there is no loss of information. Let us call
the former set of relations (I) and the latter set (II). The following relations
are in group (I).
After setting these trees equivalent, there exists the following quotient graph
and graph morphism.
$\mathrm{PRdesc} \twoheadrightarrow \mathrm{PRdesc}/(I)$
In detail, PRdesc/(I) has the same vertices as PRdesc, namely powers of the
set of natural numbers. The edges are equivalence classes of edges of PRdesc.
Descriptions of primitive recursive functions which are equivalent to “pruned”
descriptions by relations of type (I) we shall call “stupid descriptions”. They
are descriptions that are wasteful in the sense that part of their tree is ded-
icated to describing a certain function and that function is not needed. The
part of the tree that describes the unneeded function can be lopped off. One
might call PRdesc/(I) the graph of “intelligent descriptions” since within this
graph every “stupid description” is equivalent to another program without the
wastefulness.
We can further quotient PRdesc/(I) by relations of type (II):
1. Composition Is Associative: f ◦ (g ◦ h) ≈ (f ◦ g) ◦ h.
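7. Twist Is Idempotent: $tw \circ tw \approx \pi$.
8. Reidemeister III:
$(tw_{B,C} \times \pi_A) \circ (\pi_B \times tw_{A,C}) \circ (tw_{A,B} \times \pi_C) \approx (\pi_C \times tw_{A,B}) \circ (tw_{A,C} \times \pi_B) \circ (\pi_A \times tw_{B,C}).$
10. Recursion and Composition: $g_1 \mathbin{\ddot{\circ}} (f \sharp (g_2 \mathbin{\ddot{\circ}} g_1)) \approx (g_1 \mathbin{\ddot{\circ}} f) \sharp (g_1 \mathbin{\ddot{\circ}} g_2)$.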
PRalg, or primitive recursive algorithms, are the main object of interest in this
Section.
What does PRalg look like? Again the objects are the same as PRdesc,
namely powers of the set of natural numbers. The edges are equivalence classes
of edges of PRdesc.
What type of structure does it have? In PRalg, for any three composable
arrows, we have
f ◦ (g ◦ h) = (f ◦ g) ◦ h
and for any arrow f : A → B we have
$f \circ \pi^A_A = f = \pi^B_B \circ f.$
That means that composition is associative and that the π’s act as identities.
Whereas PRdesc was only a graph with a composition and identities that did
not act like identities, PRalg is a genuine category.
PRalg has more structure than only a category. For one, there is a strictly
associative product. On objects, the product structure is obvious:
$\mathbb{N}^m \times \mathbb{N}^n = \mathbb{N}^{m+n}.$
On morphisms, the product × was defined using the bracket above. The π are
the projections of the product. In PRalg the twist map is idempotent and
coherent. The fact that the product respects the composition is expressed with
the interchange rule.
The category PRalg is closed under recursion. In other words, for any
f : A → B and any g : A × B → B, there exists an h : A × N → B defined
by recursion. The categorical way of saying that a category is closed under
recursion, is to say that the category contains a weak parameterized natural
number object. The simplest definition of a weak natural number object in a
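category is a diagram
$* \xrightarrow{\;0\;} \mathbb{N} \xrightarrow{\;s\;} \mathbb{N}$
such that for any k ∈ N (that is, any arrow $k : * \to \mathbb{N}$) and g : N → N, there exists an h : N → N such that the
following diagram commutes, i.e.,
$h \circ 0 = k \qquad\text{and}\qquad h \circ s = g \circ h.$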
(See e.g. [3, 4, 16]). Following [15], we do not insist that the h is a unique
morphism that satisfies the condition. When there is uniqueness, we say that
the natural number object is strong. Saying that the above diagram commutes
is the same as saying that h is defined by the simplest recursion scheme. For
our more general version of recursion, we require a weak parameterized natural
number object, that is, for every f : A → B and g : A × B → B there exists a
h : A × N → B such that the following two squares commute.
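$h \circ (\pi^A_A \times 0) = f \circ o : A \times * \to B \qquad\text{and}\qquad h \circ (\pi^A_A \times s) = g \circ \langle \pi^{A\times\mathbb{N}}_A, h\rangle : A \times \mathbb{N} \to B,$
where $o : A \times * \to A$ is the canonical isomorphism. On elements, these say $h(x, 0) = f(x)$ and $h(x, n + 1) = g(x, h(x, n))$.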
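The two squares can be read off as checks on elements. The Python sketch below (integers for A and B, arbitrary sample f and g, a loop-based `sharp`; all assumptions for illustration) verifies both squares for h = f♯g on a range of inputs. It illustrates existence only: since the natural number object is weak, no uniqueness is being checked.

```python
def sharp(f, g):
    """h = f # g: h(x, 0) = f(x), h(x, n + 1) = g(x, h(x, n))."""
    def h(x, n):
        acc = f(x)
        for _ in range(n):
            acc = g(x, acc)
        return acc
    return h

f = lambda x: x * x          # f : A -> B
g = lambda x, y: y + x       # g : A x B -> B
h = sharp(f, g)

def left_square(x):
    """h . (pi x 0) versus f . o, evaluated at (x, *), identified with x."""
    return h(x, 0) == f(x)

def right_square(x, n):
    """h . (pi x s) versus g . <pi, h>, evaluated at (x, n)."""
    return h(x, n + 1) == g(x, h(x, n))
```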
We must show that in PRalg, the natural number object respects the
bracket operation. This fundamentally says that the central square in the following
diagram commutes.
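[Diagram: $A \times *$ maps by $\pi \times 0$ to $A \times \mathbb{N}$ and by $o$ to $A$; $h_1, h_2 : A \times \mathbb{N} \to B$ and $\langle h_1, h_2\rangle : A \times \mathbb{N} \to B \times B$; $f_1, f_2 : A \to B$ and $\langle f_1, f_2\rangle : A \to B \times B$; projections $\pi : B \times B \to B$.] The central square is
$\langle h_1, h_2\rangle \circ (\pi \times 0) = \langle f_1, f_2\rangle \circ o : A \times * \to B \times B.$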
The left hand triangles commute from the fact that ∗ is a terminal object. The
right hand triangles commute because the equivalence relation forced the projections
to respect the bracket. The inner and outer quadrilaterals are assumed
to commute. We conclude that the central square commutes.
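[Diagram: $A \times \mathbb{N}$ maps by $\pi \times s$ to $A \times \mathbb{N}$; $h_1, h_2$ and $\langle h_1, h_2\rangle : A \times \mathbb{N} \to B \times B$; $g_1, g_2 : A \times B \to B$ and $g_1\ g_2 : A \times B \times B \to B \times B$; projections $\pi$.] The central square is
$\langle h_1, h_2\rangle \circ (\pi \times s) = (g_1\ g_2) \circ \langle \pi^{A\times\mathbb{N}}_A, \langle h_1, h_2\rangle\rangle : A \times \mathbb{N} \to B \times B.$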
Similarly, the left and the right triangles commute because the projections act
as they are supposed to. The inner and outer quadrilaterals commute by
assumption. We conclude that the central square commutes.
We also must show that the natural number object respects the composition
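of morphisms. In ♯ notation this amounts to
$g_1 \mathbin{\ddot{\circ}} (f \sharp (g_2 \mathbin{\ddot{\circ}} g_1)) = (g_1 \mathbin{\ddot{\circ}} f) \sharp (g_1 \mathbin{\ddot{\circ}} g_2).$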
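[Diagram (unparameterized case): top row $* \xrightarrow{0} \mathbb{N} \xrightarrow{s} \mathbb{N}$, bottom row $B \xrightarrow{g_1} B \xrightarrow{g_2} B \xrightarrow{g_1} B$, with $k : * \to B$ and vertical maps $h, h' : \mathbb{N} \to B$.]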
Let us spend a few moments with some category theory. There is the
category Cat of all (small) categories and functors between them. Consider
also the category CatXN. The objects are triples, (C, ×, N ) where C is a
(small) category, × is a strict product on C and N is a weak parameter-
ized natural number object in C. The morphisms of CatXN are functors
$F : (C, \times, N) \to (C', \times', N')$ that respect the product and the natural number
object. For $F : C \to C'$ to respect the product, we mean that
[Diagram: $* \xrightarrow{0} N \xrightarrow{s} N$ in C and $*' \xrightarrow{0'} N' \xrightarrow{s'} N'$ in C′.]
$L : \mathbf{Cat} \rightleftarrows \mathbf{CatXN} : U, \qquad L \dashv U.$
This adjunction means that for all small categories C ∈ Cat and D ∈ CatXN
there is an isomorphism
$\mathrm{Hom}_{\mathbf{CatXN}}(L(C), D) \cong \mathrm{Hom}_{\mathbf{Cat}}(C, U(D)).$
Since ∅ is the initial object in Cat, the right set has only one element. In other
words, L(∅) is a free category with product and a weak parameterized natural
number object, and it is an initial object in the category CatXN.
We claim that L(∅) is none other than our category PRalg.
The point of this theorem is that PRalg is not simply a nice category where
all algorithms live. Rather it is a category with much structure. The structure
tells us how algorithms are built out of each other. PRalg by itself is not very
interesting. It is only its extra structure that demonstrates the importance of
this theorem. PRalg is not simply the category made of algorithms, rather, it
is the category that makes up algorithms.
PRfunc is the smallest category with a strict product and a strong param-
eterized natural number object.
Before we go on to other topics, it might be helpful to step away, literally,
from the trees and look at the entire forest. What did we do here? The graph
PRdesc has operations. Given edges of the appropriate arity, we can compose
them, bracket them or do recursion on them. But these operations do not
have much structure. PRdesc is not even a category. By placing equivalence
relations on PRdesc, which are basically coherence relations, we are giving the
quotient category better and more amenable structure. So coherence theory,
sometimes called higher-dimensional algebra, tells us when two programs are
essentially the same.
5 Complexity Results
An algorithm is not one arrow in the category PRalg. An algorithm is a scheme
of arrows, one for every input size. We need a way of choosing each of these
arrows.
There are many different species of algorithms. There are algorithms that
accept n numbers and output one number. A scheme for such an algorithm
might look like this:
[Diagram: arrows $c_k : \mathbb{N}^k \to \mathbb{N}$, one for each input size $k = 1, 2, 3, \ldots$]
N1
c1
/ N1 N2
c2
/ N2 ... Nk
ck
/ Nk ...
F NVNVVV
ppppp NNNVVVVV
V
pppp NNN VVVSch
NNSch VVVV
VVVV
p Sch
NN&
wppp VV*
Sch
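To make the notion of a scheme concrete, here is a rough Python sketch, assuming a scheme can be modeled as a function from an input size k to the k-th arrow, with tuples standing in for elements of N^k. The names `sum_scheme` and `reverse_scheme` are hypothetical examples of the two species (arrows N^k → N and arrows N^k → N^k) and are not taken from the paper; each arrow is built from the previous one, loosely mirroring how arrows of a scheme are assembled.

```python
from typing import Callable, Tuple

# an "arrow" N^k -> N^j, modeled as a function on tuples
Arrow = Callable[[Tuple[int, ...]], Tuple[int, ...]]

def sum_scheme(k: int) -> Arrow:
    """Hypothetical scheme c_k : N^k -> N (the output is a 1-tuple)."""
    if k == 1:
        return lambda xs: (xs[0],)  # c_1 is the identity on N
    prev = sum_scheme(k - 1)
    # c_k: apply c_{k-1} to the first k-1 inputs, then add x_k
    return lambda xs: (prev(xs[:-1])[0] + xs[-1],)

def reverse_scheme(k: int) -> Arrow:
    """Hypothetical scheme c_k : N^k -> N^k that reverses its input."""
    if k == 1:
        return lambda xs: xs
    prev = reverse_scheme(k - 1)
    # move x_k to the front, then reverse x_1, ..., x_{k-1}
    return lambda xs: (xs[-1],) + prev(xs[:-1])

print(sum_scheme(4)((1, 2, 3, 4)))   # (10,)
print(reverse_scheme(3)((1, 2, 3)))  # (3, 2, 1)
```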
Perhaps it is time to come down from the abstract highlands and give two examples. We shall present mergesort and insertion sort as primitive recursive algorithms. They are two different members of PRalg^F that perform the same function in PRfunc^F.
Example: Mergesort depends on an algorithm that merges two sorted lists into one sorted list. We define an algorithm Merge that accepts m numbers of the first list and n numbers of the second list. Merge inputs and outputs m + n numbers.
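$$\mathit{Merge}_{0,n}(x_1,\ldots,x_n) = \mathit{Merge}_{n,0}(x_1,\ldots,x_n) = (x_1,\ldots,x_n), \quad \text{e.g. } \mathit{Merge}_{0,1}(x_1) = \mathit{Merge}_{1,0}(x_1) = \pi^1_1(x_1) = x_1$$
$$\mathit{Merge}_{m,n}(x_1,\ldots,x_m,x_{m+1},\ldots,x_{m+n}) = \begin{cases} (\mathit{Merge}_{m,n-1}(x_1,\ldots,x_m,x_{m+1},\ldots,x_{m+n-1}),\ x_{m+n}) & : x_m \le x_{m+n} \\ (\mathit{Merge}_{m-1,n}(x_1,\ldots,x_{m-1},x_{m+1},\ldots,x_{m+n}),\ x_m) & : x_m > x_{m+n} \end{cases}$$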
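A minimal Python sketch of Merge_{m,n}, assuming tuples of integers stand in for elements of N^{m+n}. It follows the recursion above, comparing the last element of each list, and treats Merge_{m,0} and Merge_{0,n} as the identity. The function name `merge` is our own choice.

```python
from typing import Tuple

def merge(m: int, n: int, xs: Tuple[int, ...]) -> Tuple[int, ...]:
    """Merge_{m,n}: xs[:m] and xs[m:] are sorted lists of lengths m and n."""
    assert len(xs) == m + n
    if m == 0 or n == 0:
        # one list is empty, so the input is already sorted
        return xs
    x_m, x_mn = xs[m - 1], xs[m + n - 1]  # last element of each list
    if x_m <= x_mn:
        # x_{m+n} is the largest: merge the rest, then append x_{m+n}
        return merge(m, n - 1, xs[:-1]) + (x_mn,)
    # x_m is the largest: remove it, merge the rest, then append x_m
    return merge(m - 1, n, xs[:m - 1] + xs[m:]) + (x_m,)

print(merge(3, 2, (1, 4, 9, 2, 5)))  # (1, 2, 4, 5, 9)
```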
With Merge defined, we go on to define MergeSort. MergeSort recursively splits the list into two parts, sorts each part and then merges them.
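$$\mathit{MergeSort}_1(x_1) = x_1$$
$$\mathit{MergeSort}_k(x_1,\ldots,x_k) = \mathit{Merge}_{\lfloor k/2\rfloor,\lceil k/2\rceil}\big(\mathit{MergeSort}_{\lfloor k/2\rfloor}(x_1,\ldots,x_{\lfloor k/2\rfloor}),\ \mathit{MergeSort}_{\lceil k/2\rceil}(x_{\lfloor k/2\rfloor+1},\ldots,x_k)\big)$$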
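The definition of MergeSort can be sketched in Python as follows, again assuming tuples stand in for elements of N^k. The block carries its own compact copy of the Merge recursion so that it runs on its own; the split at ⌊k/2⌋ and ⌈k/2⌉ matches the subscripts above. Names are our own.

```python
from typing import Tuple

def merge(m: int, n: int, xs: Tuple[int, ...]) -> Tuple[int, ...]:
    # Merge_{m,n}: compare the last elements of the two sorted lists
    if m == 0 or n == 0:
        return xs
    if xs[m - 1] <= xs[-1]:
        return merge(m, n - 1, xs[:-1]) + (xs[-1],)
    return merge(m - 1, n, xs[:m - 1] + xs[m:]) + (xs[m - 1],)

def merge_sort(xs: Tuple[int, ...]) -> Tuple[int, ...]:
    """MergeSort_k with k = len(xs): split, sort both halves, merge."""
    k = len(xs)
    if k <= 1:
        return xs  # MergeSort_1 (and the empty case) is the identity
    lo, hi = k // 2, k - k // 2  # floor(k/2) and ceil(k/2)
    left = merge_sort(xs[:lo])
    right = merge_sort(xs[lo:])
    return merge(lo, hi, left + right)

print(merge_sort((5, 3, 8, 1, 9, 2)))  # (1, 2, 3, 5, 8, 9)
```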
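We might write this in short as
$$\mathit{MergeSort} = \mathit{Merge}\,(\mathit{MergeSort} \times \mathit{MergeSort})$$
Example: Insertion sort depends on an algorithm Insert that inserts a number x into a sorted list x_1, . . . , x_k and outputs k + 1 numbers.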
$$\mathit{Insert}_k(x_1,\ldots,x_k,x) = \begin{cases} (x_1,\ldots,x_k,x) & : x_k \le x \\ (\mathit{Insert}_{k-1}(x_1,\ldots,x_{k-1},x),\ x_k) & : x_k > x \end{cases}$$
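A minimal Python sketch of Insert_k, with tuples standing in for elements of N^k. The empty case (k = 0, where the result is just x) is an assumed base case that the definition above leaves implicit; the function name `insert` is our own.

```python
from typing import Tuple

def insert(xs: Tuple[int, ...], x: int) -> Tuple[int, ...]:
    """Insert_k(x_1, ..., x_k, x) with k = len(xs); xs is assumed sorted."""
    if not xs or xs[-1] <= x:
        # top case (and the assumed base case k = 0): x goes at the end
        return xs + (x,)
    # bottom case x_k > x: insert x into x_1..x_{k-1}, then put x_k last
    return insert(xs[:-1], x) + (xs[-1],)

print(insert((1, 3, 7), 4))  # (1, 3, 4, 7)
```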
The top case is the function $\pi^k_k \times \pi^1_1$ and the bottom case is the function $(\mathit{Insert}_{k-1} \times \pi) \circ (\pi^{k-1}_{k-1} \times tw_{\mathbf{N},\mathbf{N}})$. With Insert defined, we go on to define InsertionSort.
$$\mathit{InsertionSort}_1(x) = \pi^{\mathbf{N}}_{\mathbf{N}}(x) = x$$
$$\mathit{InsertionSort}_k(x_1,\ldots,x_k) = \mathit{Insert}_{k-1}(\mathit{InsertionSort}_{k-1}(x_1,\ldots,x_{k-1}),\ x_k)$$
We might write this in short as
$$\mathit{InsertionSort} = \mathit{Insert}\,(\mathit{InsertionSort} \times \pi)$$
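The InsertionSort recursion can be sketched in Python as follows, assuming tuples stand in for elements of N^k and including its own copy of the Insert recursion (with an assumed empty base case) so the block is self-contained. Names are our own.

```python
from typing import Tuple

def insert(xs: Tuple[int, ...], x: int) -> Tuple[int, ...]:
    # Insert_k; the empty case k = 0 is an assumed base case
    if not xs or xs[-1] <= x:
        return xs + (x,)
    return insert(xs[:-1], x) + (xs[-1],)

def insertion_sort(xs: Tuple[int, ...]) -> Tuple[int, ...]:
    """InsertionSort_k = Insert_{k-1}(InsertionSort_{k-1}(x_1..x_{k-1}), x_k)."""
    if len(xs) <= 1:
        return xs  # InsertionSort_1 is the identity
    return insert(insertion_sort(xs[:-1]), xs[-1])

print(insertion_sort((5, 3, 8, 1, 9, 2)))  # (1, 2, 3, 5, 8, 9)
```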
The point of these examples is to show that although these two algorithms perform the same function, they are clearly very different algorithms. Therefore one cannot say that they are “essentially” the same.
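One way to see the difference concretely is to run both recursions on the same input and count comparisons. Counting comparisons is our own illustrative measure here (the paper's measure, Rdepth, is defined below), and the function names are hypothetical. On the reversed list 64, 63, ..., 1 both return the same sorted tuple, but Merge is invoked on far fewer comparisons than Insert.

```python
from typing import List, Tuple

def merge_sort(xs: Tuple[int, ...], count: List[int]) -> Tuple[int, ...]:
    """MergeSort via the Merge recursion; count[0] tallies comparisons."""
    def merge(m: int, n: int, ys: Tuple[int, ...]) -> Tuple[int, ...]:
        if m == 0 or n == 0:
            return ys
        count[0] += 1  # compare x_m with x_{m+n}
        if ys[m - 1] <= ys[-1]:
            return merge(m, n - 1, ys[:-1]) + (ys[-1],)
        return merge(m - 1, n, ys[:m - 1] + ys[m:]) + (ys[m - 1],)

    k = len(xs)
    if k <= 1:
        return xs
    lo = k // 2
    halves = merge_sort(xs[:lo], count) + merge_sort(xs[lo:], count)
    return merge(lo, k - lo, halves)

def insertion_sort(xs: Tuple[int, ...], count: List[int]) -> Tuple[int, ...]:
    """InsertionSort via the Insert recursion; count[0] tallies comparisons."""
    def insert(ys: Tuple[int, ...], x: int) -> Tuple[int, ...]:
        if not ys:
            return (x,)
        count[0] += 1  # compare x_k with x
        if ys[-1] <= x:
            return ys + (x,)
        return insert(ys[:-1], x) + (ys[-1],)

    if len(xs) <= 1:
        return xs
    return insert(insertion_sort(xs[:-1], count), xs[-1])

data = tuple(range(64, 0, -1))  # 64, 63, ..., 1
mc, ic = [0], [0]
assert merge_sort(data, mc) == insertion_sort(data, ic) == tuple(range(1, 65))
print(mc[0], ic[0])  # 192 2016
```

Same function, very different behavior: on this input the mergesort recursion makes 192 comparisons and the insertion sort recursion makes 2016.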
Now that we have placed the objects of study in order, let us classify them via complexity theory. The only operations in our trees that have any complexity are the recursions. Furthermore, the recursions are only interesting if they are nested within each other. So for a given tree that represents a description of a primitive recursive function, we might ask what is the largest number of nested recursions in this tree. In other words, we are interested in the largest number of “R” labels on a path from the root to a leaf of the tree. Let us call this the Rdepth of the tree.
Formally, Rdepth is defined recursively on the set of our labeled binary trees. The Rdepth of a one-element tree is 0. The Rdepth of an arbitrary tree T is the maximum of the Rdepths of its immediate subtrees, plus 1 if the root of T is labeled “R”.
$$f_A(n) = \mathrm{Rdepth}(A(c_n))$$
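A small Python sketch of Rdepth on labeled trees and of f_A(n) = Rdepth(A(c_n)). Only the label "R" comes from the text; the other labels ("pi", "C"), the `Tree` class, and the example scheme `nested` (whose n-th description has n nested recursions) are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Tree:
    label: str  # "R" marks a recursion node; other labels are placeholders
    children: List["Tree"] = field(default_factory=list)

def rdepth(t: Tree) -> int:
    """Largest number of "R" labels on a path from the root to a leaf."""
    if not t.children:
        return 0  # a one-element tree has Rdepth 0
    deepest = max(rdepth(c) for c in t.children)
    return deepest + (1 if t.label == "R" else 0)

def complexity(scheme: Callable[[int], Tree], n: int) -> int:
    """f_A(n) = Rdepth(A(c_n)), where scheme(n) plays the role of A(c_n)."""
    return rdepth(scheme(n))

def nested(n: int) -> Tree:
    """Hypothetical scheme whose n-th description has n nested recursions."""
    t = Tree("pi")
    for _ in range(n):
        t = Tree("R", [t, Tree("pi")])
    return t

leaf = Tree("pi")
example = Tree("R", [Tree("C", [Tree("R", [leaf, leaf]), leaf]), leaf])
print(rdepth(example))                            # 2
print([complexity(nested, n) for n in range(5)])  # [0, 1, 2, 3, 4]
```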
$$\mathrm{Rdepth}_1 : (\mathbf{PRdesc}/(I))^F \to \{ f \mid f : \mathbb{N} \to \mathbb{R}^+ \}$$
where the minimization is over all descriptions A′ in the equivalence class [A]. (For the categorical cognoscenti, Rdepth_1 is a right Kan extension of Rdepth_0 along the projection PRdesc^F → (PRdesc/(I))^F.)
Rdepth_1 can easily be extended to
$$\mathrm{Rdepth}_2 : \mathbf{PRalg}^F \to \{ f \mid f : \mathbb{N} \to \mathbb{R}^+ \}.$$
The following theorem will show us that we do not have to take a minimum
over an entire equivalence class.
Proof. Examine all the trees that express these relations throughout this paper.
Notice that if two trees are equivalent, then their Rdepths are equal.
Rdepth_2 can be extended to
$$\mathrm{Rdepth}_3 : \mathbf{PRfunc}^F \to \{ f \mid f : \mathbb{N} \to \mathbb{R}^+ \}.$$
We do this again with a minimization over the entire equivalence class (i.e., a Kan extension).
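The minimization over an equivalence class can be sketched as a pointwise minimum of complexity functions, assuming "minimum" means pointwise in n (our reading, not spelled out in the text). For Rdepth_3 this matters because one function in PRfunc^F is implemented by different algorithms with different Rdepths. The class members below are hypothetical complexity functions.

```python
from typing import Callable, List

Complexity = Callable[[int], float]  # a function N -> R+

def min_over_class(members: List[Complexity]) -> Complexity:
    """Pointwise minimum of the complexity functions of all members of a class."""
    return lambda n: min(f(n) for f in members)

# hypothetical class: three descriptions of the same function
cls: List[Complexity] = [lambda n: n * n, lambda n: 2 * n, lambda n: n + 5]
best = min_over_class(cls)
print([best(n) for n in range(8)])  # [0, 1, 4, 6, 8, 10, 11, 12]
```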
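And so we have the following (not necessarily commutative) diagram.
[Diagram: PRdesc^F → (PRdesc/(I))^F → PRalg^F → PRfunc^F, mapped into {f | f : N → R+} by Rdepth_0, Rdepth_1, Rdepth_2 and Rdepth_3 respectively.]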
6 Future Directions
We are in no way finished with this work, and there are many directions in which it can be extended.
Extend to all Computable Functions. The most obvious project that we are pursuing is to extend this work from primitive recursive functions to all computable functions. In order to do this we must add the minimization operation.
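For a given g : A × N → N, consider the functions h′ : A → N that make the following diagram commute:
$$g \circ \langle \pi^A_A, h' \rangle = 1 \circ {!}, \qquad \text{where } ! : A \to * \text{ and } 1 : * \to \mathbf{N},$$
i.e.,
$$g(x, h'(x)) = 1.$$
Let h : A → N be the minimum such function.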
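A minimal Python sketch of this minimization, searching y = 0, 1, 2, ... for the least y with g(x, y) = 1. Genuine minimization is partial and the search may never end; the `limit` cutoff (returning None) is an assumption added so the sketch always terminates. The example g is hypothetical.

```python
from typing import Callable, Optional

def mu(g: Callable[[int, int], int], x: int, limit: int = 10_000) -> Optional[int]:
    """Least y with g(x, y) == 1, or None if none is found below `limit`."""
    for y in range(limit):
        if g(x, y) == 1:
            return y
    return None  # stands in for "undefined"

# hypothetical g: g(x, y) = 1 exactly when y * y >= x, so h(x) = ceil(sqrt(x))
def g(x: int, y: int) -> int:
    return 1 if y * y >= x else 0

print([mu(g, x) for x in range(10)])  # [0, 1, 2, 2, 2, 3, 3, 3, 3, 3]
```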
We might want to generalize this operation. Let f : A → B and g : A × N → B; then we define h : A → N to be the function
$$h(x) = \min \{\, y \in \mathbf{N} : g(x, y) = f(x) \,\}.$$
Categorically, this amounts to looking at all functions h′ that make the triangle commute:
$$g \circ \langle \pi^A_A, h' \rangle = f : A \to B,$$
i.e.,
$$g(x, h'(x)) = f(x).$$
Let h : A → N be the minimum such function.
Hence minimization is a fourth fundamental operation:
$$\frac{f : A \to B \qquad g : A \times \mathbf{N} \to B}{h : A \to \mathbf{N}}\; M$$
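The operation M can be sketched in Python as a higher-order function that takes f and g and returns h, with h(x) the least y such that g(x, y) = f(x). As with plain minimization, the real h may be undefined; the `limit` cutoff returning None is an assumption so that the sketch terminates. The name `minimize` and both examples are hypothetical.

```python
from typing import Callable, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def minimize(f: Callable[[A], B], g: Callable[[A, int], B],
             limit: int = 10_000) -> Callable[[A], Optional[int]]:
    """Operation M: from f : A -> B and g : A x N -> B build h : A -> N."""
    def h(x: A) -> Optional[int]:
        target = f(x)
        for y in range(limit):  # assumed cutoff; the real h may be undefined
            if g(x, y) == target:
                return y
        return None
    return h

# B = bool: h(x) = least y with 3 * (y + 1) > x, which is x // 3
div3 = minimize(lambda x: True, lambda x, y: 3 * (y + 1) > x)
print([div3(x) for x in range(7)])  # [0, 0, 0, 1, 1, 1, 2]

# taking f to be the constant 1 recovers the special case g(x, h(x)) = 1
sqrt_up = minimize(lambda x: 1, lambda x, y: 1 if y * y >= x else 0)
print(sqrt_up(10))  # 4
```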
There are several problems that are hard to deal with. First, we leave the
domain of total functions and go into the troublesome area of partial functions.
All the relational axioms have to be reevaluated from this point of view. Second,
what should we substitute for Rdepth as a complexity measure?
Progress is being made in this direction in a forthcoming paper by Yuri
Manin and the author [19].
Proof Theory. There are many similarities between our work and work in proof theory. One often sees two proofs that are essentially the same. In a sense, Lambek and Scott’s excellent book [15] contains the proof-theoretic version of this paper: they look at equivalence classes of proofs to get categories with extra structure. There is a belief that a program/algorithm implementing a function f is a proof of the fact that f(x) = y. Following this intuition, there should be a very strong relationship between our work and the work done in proof theory. It would be nice to formalize this relationship. The work of Maietti (e.g., [17]) is in this direction.
$$P \approx_1 P' \;\Rightarrow\; C(P) \approx_2 C(P').$$
We also demand that if there are two compilers from one language to another, then the two compiled versions of a program will be essentially the same. Now place the following equivalence relation ≡ on the set Programs of all programs. Two programs are equivalent if they are in the same programming language and they are essentially the same, and two programs are equivalent if they are in different programming languages but there exists a compiler that takes one to the other.
We have now placed an equivalence relation on the set of all programs that tells when two programs are essentially the same. The equivalence classes of Programs/≡ are algorithms. This definition does not depend on any preferred programming language. There is much work to do in order to formulate these ideas correctly. It would also be nice to list the properties of Algorithms = Programs/≡.
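As a rough illustration of forming Programs/≡, the sketch below computes equivalence classes with a union-find structure from a list of related pairs ("essentially the same" or "related by a compiler"). Taking the equivalence relation *generated* by these pairs is our assumption, since the compiler relation need not be transitive on its own. The program names and languages L1, L2 are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

def quotient(programs: Iterable[Hashable],
             related: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Classes of the equivalence relation generated by the pairs in `related`."""
    parent: Dict[Hashable, Hashable] = {p: p for p in programs}

    def find(p: Hashable) -> Hashable:
        # follow parent links to the class representative, halving the path
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in related:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two classes

    classes: Dict[Hashable, Set[Hashable]] = {}
    for p in parent:
        classes.setdefault(find(p), set()).add(p)
    return list(classes.values())

# hypothetical programs, written "language:name"
programs = ["L1:mergesort_a", "L1:mergesort_b", "L2:mergesort",
            "L1:insertionsort", "L2:insertionsort"]
related = [
    ("L1:mergesort_a", "L1:mergesort_b"),     # essentially the same, same language
    ("L1:mergesort_a", "L2:mergesort"),       # a compiler takes one to the other
    ("L1:insertionsort", "L2:insertionsort"),
]
algorithms = sorted(sorted(c) for c in quotient(programs, related))
print(algorithms)
```

Each inner list plays the role of one algorithm: here the three mergesort programs form one class and the two insertion sort programs another.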
References
[1] S. Abramsky. “Temperley-Lieb algebra: from knot theory to logic and computation via quantum mechanics”. In Mathematics of Quantum Computing and Technology, Goong Chen, Louis Kauffman and Sam Lomonaco, eds. Taylor and Francis, 515–558, 2007.
[2] J.C. Baez and M. Stay. “Physics, Topology, Logic and Computation: A
Rosetta Stone” http://math.ucr.edu/home/baez/rosetta.pdf. Downloaded
from the web on November 30, 2009.
[3] M. Barr and C. Wells. Toposes, triples and theories. Grundlehren der Mathematischen Wissenschaften, 278. Springer-Verlag, New York, (1985).
[4] M. Barr and C. Wells. Category Theory for Computing Science. Prentice
Hall (1990).
[5] A. Blass, Y. Gurevich. “Algorithms: A Quest for Absolute Definitions.” Available at http://research.microsoft.com/en-us/um/people/gurevich/opera/164.pdf. Downloaded February 5, 2009.
[6] A. Blass, N. Dershowitz, and Y. Gurevich. “When are two algorithms the same?”. Available at http://arxiv.org/PS_cache/arxiv/pdf/0811/0811.0811v1.pdf. Downloaded February 5, 2009.
[7] A. Burroni. Récursivité graphique. I. Catégorie des fonctions récursives primitives formelles. [Graph recursiveness. I. The category of formal primitive recursive functions] Cahiers Topologie Géom. Différentielle Catég., 27, 49–79, 1986.
[8] P. Clote. Computational Models and Function Algebras. Handbook of Computability Theory. Volume 140 (Studies in Logic and the Foundations of Mathematics) North Holland; 1 edition (October 15, 1999) ISBN-10: 0444898824
[9] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein; Introduction to Algorithms, Second Edition. McGraw-Hill (2002).
[10] W. Dean. What algorithms could not be. 2006 Thesis in Department of
Philosophy. Rutgers University.
[11] M. D. Gladstone. “A Reduction of the Recursion Scheme”. J. of Symbolic Logic, 32(4), 505–508 (1967).