Lambda-Calculus, Combinators and Functional Programming
Editorial Board:
S. Abramsky, Department of Computing Science, Imperial College of Science and
Technology, London
P. H. Aczel, Department of Computer Science, Manchester
J. W. de Bakker, Centrum voor Wiskunde en Informatica, Amsterdam
J. A. Goguen, Computing Laboratory, University of Oxford
J. V. Tucker, Department of Computer Science, University of Swansea
G. E. REVESZ
Cambridge
Melbourne Sydney
Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
40 West 20th Street, New York, NY 10011, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia
Revesz, Gyorgy E.
Lambda-calculus, combinators and functional programming.
(Cambridge tracts in theoretical computer science; v. 4).
1. Lambda calculus
I. Title
511.3 QA9.5
Preface
1. Introduction
1.1 Variables and functions in mathematics and in programming languages
1.2 Domains, types, and higher-order functions
1.3 Polymorphic functions and Currying
2. Type-free lambda-calculus
2.1 Syntactic and semantic considerations
2.2 Renaming, α-congruence, and substitution
2.3 Beta-reduction and equality
2.4 The Church-Rosser theorem
2.5 Beta-reduction revisited
Appendix A: A proof of the Church-Rosser theorem
Appendix B: Introduction to typed λ-calculus
References
INTRODUCTION
x² = 2x + 3
For what can we say about the possible values of x, if we do not require
that x represents the same value on each side of the equation?
This is in sharp contrast with the assignment statement in conventional
programming languages. In Fortran, for example, one can write
X = X + 1
∀x[(x + 1)(x - 1) = x² - 1]
where the domain of x must be clear from the context. (Otherwise the
domain should be explicitly stated in the formula itself.) Also, the
existential quantifier in a formula like
∃x(x² - 5x + 6 = 0),
expresses a property of the set of all possible values of x rather than a
property shared by each of those values. A common feature of these for-
mulas is the presence of some special symbol (integral, quantifier, etc.)
which makes the formula meaningless, if the corresponding variable is re-
placed by a constant. (The formula ∀5[(5 + 1)(5 - 1) = 5² - 1] does not
make much sense.) The variable in question is said to be bound in these
formulas by the special symbol.
Now, the problem is that the distinction between the free and the
bound usages of variables is not always obvious in every mathematical text.
Quantifiers, for example, may be used implicitly or may be stated only
verbally in the surrounding explanations. Also, a variable which is used as
a free variable in some part of the text may be considered bound in a larger
context.
The situation is even more confusing when we identify a function with
its formula, without giving it any other name. If we say, for instance, that
the function x³ - 3x² + 3x - 1 is monotonic, or continuous, or its first
derivative is 3x² - 6x + 3, then we consider the expression as a mapping
and not as the representation of some value. So, the variable x is treated
here as a bound variable without explicitly being bound by some special
symbol. As we shall see later, a major advantage of the lambda-notation is
that it forces us to make the distinction between the free and the bound
usages of variables always explicit. This is done by using a special binding
symbol 'λ' for each of the bound variables occurring in an expression.
Hence, we can specify a function by binding its argument(s), without giv-
ing it an extra name. The keyword 'lambda' is used for the same purpose
{[x,y] | x ∈ D, y ∈ R}
where the second component is uniquely determined by the first. Thus, for
every x in D there is at most one y in R such that [x,y] belongs to a given
function. If a function has at least one, hence, exactly one ordered pair [x,y]
for every x ∈ D then it is called a total function on D; otherwise it is called
a partial function on D.
A function with domain D and range R is also called a mapping from
D to R and its set of ordered pairs is called its graph. For nonempty D and
R, there are, of course, many different functions from D to R, and each of
them is said to be of type [D → R]. For finite D and R, the number of
[D → R] type total functions is obviously |R|^|D|, where the name of a set
between two vertical bars denotes the number of its elements. This formula
extends to infinite sets with the cardinality of the given sets replacing the
number of their elements. The set of all [D → R] type functions is also
called the function space R^D.
If the range R has only two elements, say, 0 and 1, then each [D → R]
type total function is the characteristic function of a subset of D. Hence,
the cardinality of the set of [D → R] type total functions with |R| = 2 is
the same as that of the powerset of D. Therefore, if R has at least two el-
ements, the cardinality of the function space R^D is larger than the
cardinality of D.
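To make the counting argument concrete, here is a small Haskell sketch (ours, not from the book) that enumerates all total functions between two finite sets, each function represented by its graph; the count it produces is |R|^|D|:

    -- Enumerate all total functions from a finite domain d to a finite
    -- range r, each represented by its graph: a list of (input, output)
    -- pairs with exactly one pair per domain element.
    allFunctions :: [a] -> [b] -> [[(a, b)]]
    allFunctions []       _ = [[]]        -- one function: the empty graph
    allFunctions (x : xs) r =
      [ (x, y) : rest | y <- r, rest <- allFunctions xs r ]

    main :: IO ()
    main = print (length (allFunctions [1, 2, 3 :: Int] "ab"))  -- 2^3 = 8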
So, for instance, the set of all number-theoretic, i.e. [N → N] type
functions, where N denotes the set of natural numbers, is clearly
nondenumerable. On the other hand, it is known from the theory of algo-
rithms that the set of computable functions is denumerable. Namely, each
computable function must have a computational procedure, i.e. a Turing
machine associated with it, and the set of all Turing machines can be enu-
merated in some sequence, say, according to the length of their de-
scriptions encoded in some fixed alphabet. (Descriptions with the same
length can be listed in lexicographic order.) This means that the over-
whelming majority of the [N → N] type functions is noncomputable.
It is an interesting question if there is some extensionally definable
property which would distinguish the graphs of computable functions from
those of noncomputable ones. The mere fact that computable functions
have Turing machines associated with them tells nothing directly about
their properties as mappings. It would be nice if we could, so to speak, look
at the graph of the function and tell if it is computable. Interestingly
enough, there is such a property but it is far from being obvious. It is
'continuity' in a somewhat unusual but interesting topology. Here, we do
not discuss this topology but the interested reader is referred to the book
by Joseph Stoy [Stoy77], or to the papers by Dana Scott [Scott73,
Scott80]. For our discussion it suffices to say that the graphs of comput-
able functions are indeed different from those of noncomputable ones,
because they must obey certain restrictions namely, they must be
'continuous' in some abstract sense. The reason why we mention this con-
tinuity property is that it has been instrumental for the construction of a
mathematical model for the type-free lambda-calculus.
The original purpose of type-free lambda-calculus was to provide a
universal framework for studying the formal properties of arbitrary func-
tions, which turned out to be a far too ambitious goal. Nevertheless, the
type-free lambda-calculus can be used as a formal theory of functions in a
wide variety of applications, but we should never try to apply this theory
to noncontinuous, i.e. noncomputable functions. This state of affairs
is comparable to the development of set theory. Naive set theory was
meant to cover arbitrarily large sets. The paradoxical nature of that goal
has led to the development of axiomatic set theory, where the notion of
classes is introduced in order to avoid the paradoxes of arbitrarily large
sets.
operations have their own (implicit) types. The test for zero predicate, for
example, is of type [N → B], where B = {true, false}.
New types can be defined by using certain type constructors. A record
type, for instance, corresponds to the cartesian product of the types of its
components. Thus, the domain of a record type variable is
TYPE-FREE LAMBDA-CALCULUS
λx.x    λx.λy.(y)x
λx.(f)x    (f)3
λf.(f)2    (λy.(x)y)λx.(u)x
whose meanings are left undefined for the time being. Nevertheless, we
will treat them as functional expressions which satisfy certain formal re-
quirements.
A remark on our nonconventional way of parenthesizing lambda-
expressions may be in order here. The traditional way of putting the
argument(s) of a function between parentheses may have been suggested
by the conventional way of evaluating a function. In the conventional or
applicative order of evaluation, the argument of a function is computed
before the application. That corresponds to the call by value mechanism
of parameter passing in programming languages. In this book we use the
notation (f)x instead of the traditional f(x), which reflects the so called
The computation of the value of a function for some argument requires the
substitution of the argument for the corresponding variable in the ex-
pression representing the function. This is a fundamental operation in
mathematics as well as in programming languages. In a programming lan-
guage like Pascal or Fortran, this corresponds to the substitution of the
actual parameters for the so called formal parameters. In lambda-calculus
both the function and the arguments are represented by λ-expressions, and
every function is written as a unary function which may return another
function as its value. So, for instance, the operation of addition will be re-
presented here as
λx.λy.((+)x)y.
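The same convention is native to Haskell, where every function is unary and multi-argument functions are curried; a quick illustration (ours, not from the book):

    -- Addition as a unary function returning another function,
    -- in the spirit of λx.λy.((+)x)y.
    add :: Int -> Int -> Int
    add = \x -> \y -> x + y

    addThree :: Int -> Int
    addThree = add 3            -- partial application yields a new function

    main :: IO ()
    main = print (addThree 4)   -- prints 7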
THE ALPHA-RULE
(α) λx.E →α λz.{z/x}E for any z which is neither free nor bound in E.
[λu.(y)u/x]λy.(x)y
would yield
λy.[λu.(y)u/x](x)y
and then
λy.([λu.(y)u/x]x)[λu.(y)u/x]y
and finally
λy.(λu.(y)u)y
This means that the free occurrence of y in λu.(y)u would become bound
in the result, which is clearly against our intuition about substitution. In-
deed, the free occurrence of y in the substitution prefix should not be
confused with the bound variable y in the target expression. In order to
avoid this confusion we introduce a new bound variable z which gives the
correct result, namely
λz.(λu.(y)u)z
λz.[Q/x]{z/y}E = λv.[Q/x]{v/y}E
if both z and v satisfy the condition of part (5) of Definition 2.4. (See
Exercise 2.3 at the end of Chapter 2.) This is quite satisfactory for our
purpose, because we are not so much interested in the identity of
λ-expressions as in their equality.
This freedom of choice with respect to the new bound variables while
performing a substitution has been achieved by our simplified renaming
operation which is defined directly without using substitution.
Our approach is slightly different from the conventional definition of
substitution included in most textbooks where every effort is made to de-
fine a unique result for the substitution operation. We feel, however, that
the conventional approach imposes a rather artificial and unnecessary re-
striction on the choice of the bound variable used for renaming during
substitution. Its only gain is that renaming becomes a special case of sub-
stitution and thus, it can be denoted by [z/x].
The complications with the substitution operation have inspired some
researchers to try to get rid of bound variables altogether. This has resulted
in the discovery of combinators which we shall study in Chapter 3. For the
time being, we can be satisfied that Definition 2.4 is a precise formal defi-
nition of the substitution operation in the type-free lambda-calculus.
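For readers who want to see such a definition executable, here is a minimal Haskell sketch of substitution with renaming in the spirit of Definition 2.4 (the data type and helper names are ours, not the book's):

    import Data.List (union, delete)

    data Term = Var String | App Term Term | Lam String Term

    freeVars :: Term -> [String]
    freeVars (Var x)   = [x]
    freeVars (App f a) = freeVars f `union` freeVars a
    freeVars (Lam x e) = delete x (freeVars e)

    -- subst q x e computes [q/x]e, renaming the bound variable when it
    -- would capture a free variable of q (cf. part (5) of Definition 2.4).
    subst :: Term -> String -> Term -> Term
    subst q x (Var y)
      | y == x    = q
      | otherwise = Var y
    subst q x (App f a) = App (subst q x f) (subst q x a)
    subst q x (Lam y e)
      | y == x                 = Lam y e     -- x is shadowed; nothing to do
      | y `notElem` freeVars q = Lam y (subst q x e)
      | otherwise              = Lam z (subst q x (rename z y e))
      where z = freshName (freeVars q `union` freeVars e)

    -- rename z y e performs the simple renaming {z/y}e on free occurrences.
    rename :: String -> String -> Term -> Term
    rename z y = subst (Var z) y

    freshName :: [String] -> String
    freshName used = head [v | n <- [0 :: Integer ..],
                               let v = "v" ++ show n,
                               v `notElem` used]

Note that, as in the text, the choice of the fresh variable is arbitrary; any z that is neither free nor bound in the affected expressions would do.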
THE BETA-RULE
(λx.(x)x)λx.(x)x
for which β-reduction never terminates. This is the main reason for the
undecidability of the equality of arbitrary λ-expressions.
It should be emphasized that for certain λ-expressions there are ter-
minating as well as nonterminating β-reductions. But, if there is at least one
terminating β-reduction then we say that the given λ-expression has a normal
form. For instance, the normal form of the λ-expression
(λy.(λz.w)y)(λx.(x)x)λx.(x)x
happens to be
w
in spite of the fact that it also has a nonterminating β-reduction as we have
seen before.
Fortunately, if a λ-expression has at least one terminating
β-reduction, then one can find such a β-reduction in a straightforward
manner without 'back-tracking'. This follows from the so called standardi-
zation theorem to be discussed later in Chapter 6. Moreover, if a
λ-expression has a normal form then every terminating β-reduction will
result in the same normal form (up to α-congruence).
In other words, the order in which the β-redexes are contracted is ir-
relevant as long as the reduction terminates. This is a corollary of the
Church-Rosser theorem, which is one of the most important results of the
theory of lambda-conversion. As we shall see below, this theorem also
implies that the equality problem of λ-expressions having normal forms is
decidable.
The proof of the theorem is omitted here, but it can be found in Appendix A at the end of this book. The essence of
the theorem can be illustrated by the following example.
Assume that we have a polynomial P with two variables, x and y. In
order to compute its value, say, for x = 3 and y = 4, we have to substitute
3 for x and 4 for y in P and perform the prescribed arithmetic operations.
If we substitute only one of these values for the corresponding variable in
P then we get a polynomial in the other variable. Now, either of these
'partially substituted' polynomials can be used to compute the value of the
original polynomial for the given arguments by substituting the value of the
remaining variable. In other words, the final result does not depend on the
order in which the substitutions are performed.
If substitution is defined correctly, then this must be true in general.
Indeed, for any λ-expressions, P, Q, and R, and variables, x and y, we have
⇒ (λz.[Q/x]{z/y}P)[Q/x]R ⇒ [[Q/x]R/z][Q/x]{z/y}P
Figure 2.1 (diamond diagram: the two reduction paths converge to a common expression)
The property reflected by this diagram is called the diamond property (or
confluence property). The Church-Rosser theorem says, in effect, that
β-reduction has the diamond property.
Figure 2.2 (diamond diagram)
Note that this new α-rule is not symmetric by definition and it would not
perform any renaming by itself. But the following β-rules would take care
of the renaming as well.
(β1) (λx.x)Q → Q
(β2) (λx.y)Q → y if x and y are different variables.
(β3) (λx.λx.E)Q → λx.E
(β4) (λx.λy.E)Q → λy.(λx.E)Q if x and y are different variables, and
at least one of these two conditions holds: x ∉ φ(E), y ∉ φ(Q).
(β5) (λx.(E1)E2)Q → ((λx.E1)Q)(λx.E2)Q
A λ-expression of the form λx.E is called an α-redex without any re-
striction. However, a term of the form (λx.E)Q is a β-redex if and only if
it has the form of the left-hand side of a β-rule and satisfies its conditions.
In particular, a λ-expression of the form (λx.λy.E)Q with different x and
y and with x ∈ φ(E) and y ∈ φ(Q) is not a β-redex. Such a λ-expression
can be reduced only after an appropriate renaming is carried out. This can
(λx.x)x ⇒ x by β1
(λx.y)x ⇒ y by β2
(λx.λx.E)x ⇒ λx.E by β3
(λx.λy.E)x ⇒ λy.(λx.E)x by β4, if x and y are different,
where (λx.E)x ⇒ E by the induction hypothesis.
Finally, (λx.(E1)E2)x ⇒ ((λx.E1)x)(λx.E2)x ⇒ (E1)E2 by
β5 and by the induction hypothesis.
Lemma 2.3 If x ∉ φ(P) then for every Q we have (λx.P)Q ⇒ P.
Proof: Again we use structural induction. If P is a variable y then
it must be different from x and thus, the assertion follows from
β2. If P has the form λy.E then either y is identical to x and thus,
(λx.λx.E)Q ⇒ λx.E by β3, or else y is different from x, and x ∉
φ(E), which imply that
Note that neither substitution nor α-congruence is needed in our new sys-
tem, since they are covered by reduction as defined in Definition 2.7. We
can summarize the result of this section in the following theorem.
Theorem 2.3 For any two λ-expressions, M and N, if M ⇒ N holds
with respect to Definition 2.5 then it also holds with respect to
Definition 2.7.
Proof: It is enough to show that M ⇒ N holds with respect to
Definition 2.7 whenever M ⇒ N or M → N is an instance of the
original β-rule. In the first case the result follows from the Corol-
lary of Lemma 2.4 while in the second case it follows from Lemma
2.5.
The converse of Theorem 2.3 is obviously false. For instance,
λx.E ⇒ λz.(λx.E)z
is not true with respect to Definition 2.5.
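A compact Haskell sketch (ours) of these renaming-free β-rules, as a single root-reduction step that returns Nothing when no rule applies:

    data Term = Var String | App Term Term | Lam String Term
      deriving (Show, Eq)

    free :: Term -> [String]
    free (Var x)   = [x]
    free (App f a) = free f ++ free a
    free (Lam x e) = filter (/= x) (free e)

    -- One step of the renaming-free beta rules at the root of a term.
    step :: Term -> Maybe Term
    step (App (Lam x (Var y)) q)
      | x == y    = Just q                                  -- (beta-1)
      | otherwise = Just (Var y)                            -- (beta-2)
    step (App (Lam x (Lam y e)) q)
      | x == y    = Just (Lam x e)                          -- (beta-3)
      | x `notElem` free e || y `notElem` free q
                  = Just (Lam y (App (Lam x e) q))          -- (beta-4)
      | otherwise = Nothing        -- not a beta-redex: renaming needed first
    step (App (Lam x (App e1 e2)) q)
                  = Just (App (App (Lam x e1) q)
                              (App (Lam x e2) q))           -- (beta-5)
    step _        = Nothing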
Exercises
2.1 Reduce the following λ-expressions to their respective normal forms
using the β-rule of Section 2.3.
(a) (((λf.λx.λy.(x)(f)y)p)q)r
Show that for every E, Q, and x the result of the conventional substitution,
[Q/x]E, is α-congruent with the result obtained from Definition 2.4.
2.6 Design and implement an algorithm to decide the α-congruence of two
λ-expressions being in normal form.
CHAPTER THREE
then we get
((λx.λy.λx.(y)x)P)F
which reduces to
λx.(F)x
I ≡ λx.x
((false)P)Q → Q
for every P and Q. The reason behind these definitions is the fact that the
truth values are normally used for selecting either one of two given alter-
natives. (Think of an IF-statement in a programming language.) If the
condition evaluates to true then the first alternative is computed, if it eval-
uates to false then the second. Hence, the conditional expression
if C then P else Q
will take the form
((C)P)Q
λc.λp.λq.((c)p)q
which takes three arguments. Observe the fact that this combinator can be
applied to arbitrary λ-expressions regardless of the 'type' of the first argu-
ment. In this respect type-free λ-calculus is similar to the machine code of
a conventional computer where arbitrary operations can be performed on
any data. We may get some unexpected results but that is not the comput-
er's fault.
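Church booleans are easy to experiment with in Haskell; a small sketch (ours) of true, false, and the conditional combinator above:

    -- A Church boolean selects one of two given alternatives.
    type ChBool a = a -> a -> a

    ctrue, cfalse :: ChBool a
    ctrue  x _ = x     -- ((true)P)Q  reduces to P
    cfalse _ y = y     -- ((false)P)Q reduces to Q

    -- The conditional combinator λc.λp.λq.((c)p)q
    cond :: ChBool a -> a -> a -> a
    cond c p q = c p q

    main :: IO ()
    main = do
      putStrLn (cond ctrue  "then-branch" "else-branch")  -- then-branch
      putStrLn (cond cfalse "then-branch" "else-branch")  -- else-branch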
Returning to the combinators we can say that there are quite a few
interesting and useful combinators. Some of them will be studied more
thoroughly in this chapter, but we cannot discuss them all. Historically,
combinators were invented first by Schoenfinkel in the late 20's. He used
them to eliminate all variables from mathematical logic. Curry, who inde-
pendently discovered them at about the same time, was mainly responsible
for the subsequent development of their theory. Combinators have re-
Exercises
3.1 Negation can be represented by the combinator not which is
α-congruent to
λx.((x)false)true
Find combinators to represent the boolean operations and and or.
3.2 Show that the operations represented by your and and or combinators
are both commutative and associative. Also, show that your and and or
combinators do have the usual distributive properties.
3.3 Find a combinator to represent the prefix apply operator characterized
by the reduction rule
((apply)A)B → (A)B
for arbitrary λ-expressions A and B. Using your representation show that
also the iteration
(((apply)apply)A)B
β-reduces to (A)B.
As we have seen in the previous section, both the truth values and the
standard boolean operations can be represented by certain combinators,
that is, by certain λ-expressions without free variables. It seems only na-
tural that no free variables occur in these representations, since they must
not depend on the context in which they are used. In other words, they
must behave the same in every context. In this respect, there is no differ-
ence between a constant value like true or false and some well-defined
function like and. As a matter of fact, every well-defined function can be
considered a constant entity regardless of the number of operands it takes.
Observe the fact that even the truth values are represented here as
functions. Each of them can take up to two arguments as can be seen from
their representations. But that is quite natural. If we do not use constant
symbols in our lambda-notation then we have only application, ab-
straction, and variables. So, the only way to represent context independent
behavior is by using combinators.
It is important to note that almost anything can be represented in this
manner. To support this claim, we show the combinator representation of
natural numbers developed by A. Church. These combinators are called
Church numerals, and they are defined as follows:
0 ≡ λf.λx.x
1 ≡ λf.λx.(f)x
2 ≡ λf.λx.(f)(f)x
3 ≡ λf.λx.(f)(f)(f)x
and, in general, the combinator representing the number n iterates its first
argument to its second argument n times.
The arithmetic operations on these numerals can be represented also
by appropriate combinators. For instance, the successor function, which
increments its argument by one, can be represented as
succ ≡ λn.λf.λx.(f)((n)f)x
Note that the names of the bound variables do not matter. Addition can
be represented as
+ ≡ λm.λn.λf.λx.((m)f)((n)f)x
and multiplication as
* ≡ λm.λn.λf.(m)(n)f
The reader is recommended to work out the details and find a represen-
tation for the exponentiation m^n in this framework.
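Church numerals and the arithmetic combinators above translate directly into Haskell; a sketch (ours), with a conversion to ordinary integers for testing:

    -- A Church numeral iterates its first argument over its second.
    type Church a = (a -> a) -> a -> a

    zeroC, oneC :: Church a
    zeroC _ x = x                  -- 0 ≡ λf.λx.x
    oneC  f x = f x                -- 1 ≡ λf.λx.(f)x

    succC :: Church a -> Church a
    succC n f x = f (n f x)        -- succ ≡ λn.λf.λx.(f)((n)f)x

    addC, mulC :: Church a -> Church a -> Church a
    addC m n f x = m f (n f x)     -- + ≡ λm.λn.λf.λx.((m)f)((n)f)x
    mulC m n f   = m (n f)         -- * ≡ λm.λn.λf.(m)(n)f

    toInt :: Church Int -> Int
    toInt n = n (+ 1) 0

    main :: IO ()
    main = print (toInt (mulC (succC (succC oneC)) (addC oneC oneC)))  -- 3*2 = 6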
A predicate to test for zero in this representation can be given as fol-
lows:
zero ≡ λn.((n)(true)false)true
and
(λz.((z)a)b)false ⇒ b
Now, in analogy with the successor function we can define a function next
to obtain [n + 1, n] from [n, n - 1]. The corresponding λ-expression will
be the following:
next ≡ λp.λz.((z)(succ)(p)true)(p)true
which makes use of only the first element of the ordered pair p representing
the argument. So, we can start with [0, 0] and iterate the next function n
times to get [n, n - 1]. But that is easy, since the Church numeral repres-
enting the number n involves precisely n iterations. Therefore, we can ap-
ply this Church numeral to next as its first argument and to the expression
λz.((z)0)0
as its second argument. This gives us the λ-expression
((n)λp.λz.((z)(succ)(p)true)(p)true)λz.((z)0)0
where n stands for the Church numeral representing the number n.
Finally, in order to obtain the value of the predecessor function, we
have to select the second element of the resulting ordered pair. Thus, the
predecessor function can be represented by the following λ-expression:
pred ≡ λn.(((n)λp.λz.((z)(succ)(p)true)(p)true)λz.((z)0)0)false
It may be interesting to note that Church himself could not find a repre-
sentation for the predecessor function. He had just about convinced him-
self that the predecessor function was not lambda-definable when Kleene
found a representation for it. (See page 57 in [Klee81].)
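The pair trick behind pred can be replayed in Haskell on ordinary pairs, which shows why the construction works; a sketch (ours) reusing the Church-numeral type from before:

    type Church a = (a -> a) -> a -> a

    -- next ≡ λp.λz.((z)(succ)(p)true)(p)true takes [n, n-1] to [n+1, n];
    -- here modelled on ordinary pairs.
    next :: (Int, Int) -> (Int, Int)
    next (a, _) = (a + 1, a)

    -- Iterate next n times from (0, 0) and take the second component.
    predC :: Church (Int, Int) -> Int
    predC n = snd (n next (0, 0))

    three :: Church a
    three f x = f (f (f x))

    main :: IO ()
    main = print (predC three)   -- prints 2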
Exercises
3.4 Define a combinator square to compute n² for a natural number n using
Church numerals.
3.5 Find a λ-expression to represent the predicate even which returns true
whenever the argument is an even number and returns false when it is an
odd number. Use Church numerals and do not worry about the value of
the predicate for non-numeric arguments.
3.6 Consider the following numerals:
0 ≡ I
1 ≡ λz.((z)false)I
2 ≡ λz.((z)false)λz.((z)false)I
and so on. Find λ-expressions to represent the successor and the predecessor
functions and the predicate to test for zero. The latter may be used in the
definition of the predecessor function.
(Y)E = (E)(Y)E
for every λ-expression E. This implies that
(Y)E = (E)(E) ... (E)(Y)E
for any number of iterations of E. The question is whether we can find a
closed λ-expression with this property. In Chapter 2 we have seen that the
λ-expression
(λx.(x)x)λx.(x)x
reduces to itself and thus, it gives rise to an infinite reduction. This feature
is very similar to what we are looking for, only we need to deposit a prefix
of the form (E) in each reduction step. But this can be achieved by the
following modification:
λy.(λx.(y)(x)x)λx.(y)(x)x
Indeed, if we apply this combinator to a λ-expression E then we get
(λx.(E)(x)x)λx.(E)(x)x ⇒ (E)(λx.(E)(x)x)λx.(E)(x)x
⇒ (E)(E)(λx.(E)(x)x)λx.(E)(x)x
and so forth ad infinitum. So, the Y combinator will be defined as
Y ≡ λy.(λx.(y)(x)x)λx.(y)(x)x
The usefulness of this combinator is due to the fact that every recursive
definition can be brought to the form of a fixed-point equation
f = (E)f
where f does not occur free inside E. The solution to this equation, namely
the fixed point of the higher order function E, can be obtained as (Y)E.
Indeed, the substitution of (Y)E for f in the above equation yields
(Y)E = (E)(Y)E
which is true for every E by the definition of Y. Therefore, the Y
combinator is a universal fixed-point finder; hence, it is called a fixed-point
combinator.
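Haskell's laziness lets one write a fixed-point combinator directly; a sketch (ours):

    -- A lazy fixed-point combinator: fix e = e (fix e), i.e. (Y)E = (E)(Y)E.
    -- (The self-application in λy.(λx.(y)(x)x)λx.(y)(x)x is not typeable in
    -- plain Haskell, but direct recursion gives the same applicative behavior.)
    fix :: (a -> a) -> a
    fix e = e (fix e)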
In order to see how it works in a simple case consider, for example, the
following definition of the factorial function.
(fact)n = if n = 0 then 1 else ((*)n)(fact)(pred)n
The solution of this implicit equation can be obtained in explicit form as
follows. The equation is written in our lambda-notation as
(fact)n = (((zero)n)1)((*)n)(fact)(pred)n
which is equivalent to
fact = λn.(((zero)n)1)((*)n)(fact)(pred)n
Now, if we 'abstract out' (in analogy with factoring out) the free occur-
rences of the function name 'fact' on the right-hand side, we get
fact = (λf.λn.(((zero)n)1)((*)n)(f)(pred)n)fact
which means that fact is a fixed point of the expression
λf.λn.(((zero)n)1)((*)n)(f)(pred)n
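With the fix combinator sketched above, the same 'abstracting out' step reads in Haskell (ours):

    fix :: (a -> a) -> a
    fix e = e (fix e)

    -- fact is the fixed point of λf.λn.(((zero)n)1)((*)n)(f)(pred)n
    fact :: Integer -> Integer
    fact = fix (\f n -> if n == 0 then 1 else n * f (n - 1))

    main :: IO ()
    main = print (fact 5)   -- prints 120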
x = x + 1
which has obviously no finite solution because the successor function has
no fixed points. Nevertheless, the fixed-point combinator would yield a
formal solution in the following closed form:
x = (Y)λx.((+)x)1
Here the right-hand side gives rise to an infinite reduction, which yields an
infinite number of iterations, because it has no normal form. Indeed, the
Q = true
for arbitrary Q, which is clearly a paradox.
One might suggest that the problem with the Y combinator is due to the
fact that it involves self-application. Clearly, the application of a function
to itself seems inconsistent with the fact that the cardinality of the function
space R^D is always greater than that of D. Therefore, it is impossible to find
a reasonable domain D and a range R such that R^D ⊂ D as implied by self-
application. But, if the function space is restricted to Scott-continuous
functions then one can construct such domains. This means that self-
application by itself is not inconsistent with set theory, but the usability of
the type-free lambda-calculus has certain limitations.
Self-application may be useful for certain purposes. It is quite possible
to apply a computer program to itself and get a meaningful result. A well
known example is a LISP interpreter written in LISP. Also, one can easily
write self-applicable programs in the machine code of a regular computer.
Indeed, for a truly type-free theory of functions self-application cannot be
ruled out.
Note that the universal applicability of the Y combinator does not of-
fer a practical solution to every fixed-point equation. If, for instance, we
were to solve the perfectly reasonable fixed-point equation
x = 3x - 10
that is
x = ((-)((*)3)x)10
then we would get
x = (λx.((-)((*)3)x)10)x
and the explicit form
x = (Y)λx.((-)((*)3)x)10
The right-hand side of this equation reduces to
(λx.((-)((*)3)x)10)(Y)λx.((-)((*)3)x)10
which further reduces to
((-)((*)3)(Y)λx.((-)((*)3)x)10)10
Y2 ≡ (Y1)G
and in general
Yn+1 ≡ (Yn)G
where
G ≡ λx.λy.(y)(x)y
The reader should verify that Y2 = T, where T is the fixed-point
combinator of Turing. For more details see Exercise 3.9.
Exercises
3.7 Give a recursive definition for the Fibonacci numbers and use the Y
combinator to compute the fifth Fibonacci number, i.e. the value of
(Fibonacci)5.
3.8 Give a recursive definition for the greatest common divisor of two in-
tegers and compute the value of ((gcd)10)14 using the Y combinator.
3.9 Consider the sequence of combinators Y1, Y2, ..., and the combinator
G as defined at the end of this section. Show that
(a) Each Yi is a fixed-point combinator;
(b) Each Yi is a fixed point of G, i.e. Yi = (G)Yi.
(λx.P)x ⇒ P
for every λ-expression P and variable x. But the order of these two oper-
ations is important here. If we use them in the reverse order, i.e. if we first
apply P to x and then abstract the result with respect to x, then we get
λx.(P)x
which is clearly not β-reducible to P. (The prefix λx. will not disappear
unless the entire expression is applied to some other expression.) Even so,
P may have some free occurrences of x which get captured by the prefix
λx. Therefore, the application
(λx.(P)x)Q
THE ETA-RULE
Next we shall prove that the two standard combinators, S and K, are suf-
ficient for eliminating all abstractions, i.e. all bound variables from every
λ-expression. These combinators are defined as follows.
S ≡ λx.λy.λz.((x)z)(y)z
K ≡ λx.λy.x
Note that the K combinator is actually the same as true. Furthermore, the
identity combinator
I ≡ λx.x
can be expressed in terms of S and K due to the following equality:
((S)K)K = I
which can be easily verified by the reader. Hence, we can use these three
combinators S, K, and I where I is just a shorthand for ((S)K)K.
A combinator expression in which the only combinators are the
standard ones is called a standard combinator expression.
In order to eliminate the bound variables from a λ-expression we use an
operation called bracket abstraction, which is the combinatory equivalent of
λ-abstraction. Namely, for every variable x and standard combinator ex-
pression M there exists a standard combinator expression [x]M such that
[x]M = λx.M. Note that the bracket prefix, [x], is only a meta-notation and
the expression [x]M stands for a true standard combinator expression
which is β-convertible to λx.M.
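A minimal Haskell sketch (ours) of the basic S-K bracket abstraction on a small combinator-expression type:

    data Expr = V String | S | K | App Expr Expr
      deriving Show

    -- abstractVar x m builds [x]m using the basic algorithm:
    -- [x]x = ((S)K)K, [x]c = (K)c when c is not x, and
    -- [x](p)q = ((S)[x]p)[x]q.
    abstractVar :: String -> Expr -> Expr
    abstractVar x (V y)
      | x == y    = App (App S K) K     -- I expressed as ((S)K)K
      | otherwise = App K (V y)
    abstractVar x (App p q) =
      App (App S (abstractVar x p)) (abstractVar x q)
    abstractVar _ c = App K c           -- the combinators S and K themselves

    -- Example: [x](x)x
    example :: Expr
    example = abstractVar "x" (App (V "x") (V "x"))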
which is equal to
((λx.((S)(K)x)((S)I)I)((S)(K)x)((S)I)I)(((S)[u]K)[u]u)w.
Next we get
((((S)[x](S)(K)x)[x]((S)I)I)((S)(K)x)((S)I)I)(((S)(K)K)I)w
and eventually
((((S)((S)(K)S)((S)(K)K)I)((S)((S)(K)S)(K)I)(K)I)((S)(K)x)((S)I)I)
(((S)(K)K)I)w
which is equal to our original λ-expression. By the way, the normal form of
this expression is
(x)(w)w
An immediate consequence of the above theorem is the following:
Corollary: Every closed λ-expression is equal to some standard
combinator expression with no variables in it.
As can be seen from the previous example, the size of the expression grows
larger and larger with each subsequent abstraction. Therefore, the above
algorithm is impractical when we have to abstract on a large number of
variables.
Curry improved on the basic algorithm by introducing two more
combinators, B and C, for the following special cases of S.
B ≡ λx.λy.λz.(x)(y)z
C ≡ λx.λy.λz.((x)z)y
Having these combinators we can simplify the expressions obtained from
the basic algorithm by applying the following rules:
(1) ((S)(K)P)(K)Q = (K)(P)Q
(2) ((S)(K)P)I = P
(3) ((S)(K)P)Q = ((B)P)Q
(4) ((S)P)(K)Q = ((C)P)Q
It is interesting to note that the second rule can only be derived in
lambda-calculus by using the η-rule as well. This means that the second
equation is an extensional one in lambda-calculus. The reader should verify
each of these equalities.
The improved version of the abstraction algorithm will follow the same
steps as the basic algorithm but whenever an expression of the form
( (S)P)Q is created it will be simplified using the above equations, if it is
possible to do so. For that purpose, the equations will be considered in the
priority of their sequence, that is, if more than one equation is applicable
at the same time then the one with the smallest serial number will be ap-
plied.
To see this algorithm at work consider a function with two variables,
((F)x)y. Abstracting on x gives us
[x]((F)x)y = ((S)[x](F)x)[x]y = ((S)((S)[x]F)[x]x)[x]y =
((S)((S)(K)F)I)(K)y = ((S)F)(K)y = ((C)F)y
Similarly, abstracting on y yields
[y]((F)x)y = ((S)[y](F)x)[y]y = ((S)((S)[y]F)[y]x)[y]y =
TURNER'S ALGORITHM
Use the algorithm of Curry, but whenever an expression beginning
with S, B, or C is formed use one of the following simplifications, if it is
possible to do so.
((S)((B)T)P)Q = (((S')T)P)Q
((B)T)((B)P)Q = (((B')T)P)Q
((C)((B)T)P)Q = (((C')T)P)Q
Turner's algorithm would increase the length of an expression as a linear
function of the number of abstractions. Namely,
and
((K)A)B → A
Y ≡ λy.(λx.(y)(x)x)λx.(y)(x)x
and
T ≡ (λx.λy.(y)((x)x)y)λx.λy.(y)((x)x)y
are noncongruent (and not even β-convertible), although their applicative
behavior is the same. This shows that the relationship between λ-calculus
and the theory of combinators is more subtle than one might think at first.
A deeper analysis of this relationship can be found in [Baren81] or in
[Hind86].
The intuitive appeal of the lambda-notation is certainly missing from
pure combinator expressions even if we use a great deal more combinators
than just the standard ones. The lack of λ-abstraction seems to be an ad-
vantage in functional programming, but a completely variable-free notation
is not always desirable. More about functional programming can be found
in Chapter 5.
Exercises
3.10 Show that the standard combinators, S and K, satisfy the following
equalities:
((S)((S)(K)S)(K)K)(K)K = (K)((S)K)K
((S)((S)(K)S)((S)(K)K)K)(K)((S)K)K = K
and
((S)((S)(K)S)((S)(K)(S)(K)S)((S)(K)(S)(K)K)S)(K)(K)((S)K)K = S
3.11 Prove the following extensional (i.e., βη) equality:
((S)((S)(K)S)K)(K)((S)K)K = ((S)K)K
3.12 Use bracket abstraction to prove that the Y combinator satisfies the
equality
Y = ((S)((S)A)B)((S)A)B
where
A = ((S)(K)S)((S)(K)K)I
B = ((S)((S)(K)S)(K)I)(K)I
can be represented by
λz.((z)E1) ... λz.((z)En)nil
where z is any variable which is not free in Ei (1 ≤ i ≤ n). For the given rep-
resentation the two basic list manipulating operations can be implemented
by the following combinators:
head ≡ λx.(x)true
tail ≡ λx.(x)false
Indeed, the application of head to the above representation of a list returns
its first member E1, while the application of tail returns the representation
of the remaining (n-1)-element list, as can be verified easily by the reader.
The so called list-constructor operator, which appends its first operand as
a new element to the front of its second operand regarded as a list, can be
represented by the combinator
cons ≡ λx.λy.λz.((z)x)y
The reader should verify that
((cons)A)[E1, ..., En] ⇒ [A, E1, ..., En]
in view of the given representation.
This means that list manipulation can be implemented in standard
lambda-calculus without using any extra notation. However, this imple-
mentation is based on a simulation of the elementary list operations via
β-reduction, where each of those operations takes several β-reduction steps
to execute.
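The pair-based encoding carries over to Haskell closures; a sketch (ours) of cons, head, and tail as Church pairs:

    {-# LANGUAGE RankNTypes #-}

    -- A cell λz.((z)x)y feeds its two components to a selector z.
    type Pair a b = forall r. (a -> b -> r) -> r

    pairC :: a -> b -> Pair a b
    pairC x y z = z x y              -- cons ≡ λx.λy.λz.((z)x)y

    headC :: Pair a b -> a
    headC p = p (\x _ -> x)          -- head ≡ λx.(x)true

    tailC :: Pair a b -> b
    tailC p = p (\_ y -> y)          -- tail ≡ λx.(x)false

A list is then a nest of such pairs ending in nil, exactly as in the representation above.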
A more concise representation can be obtained if the list manipulating
operators are treated as true combinators, i.e. atomic symbols supplied
with appropriate reduction rules. Then a list of the form
[E1, ..., En]
can be represented by
((cons)E1) ... ((cons)En)nil
and the reduction rules for the head, tail, and cons combinators can be
given accordingly.
Here we shall follow a similar approach, but we use a conventional
notation for lists with brackets and commas. Arbitrarily nested lists will be
considered as valid λ-expressions, and the elementary list operators will
be treated as constant symbols. The extended syntax is given below.
The empty list is denoted by []. The constant symbols Λ, ~, and &
represent the head, tail, and cons operators, respectively. We use these single
character symbols in our implementation simply because they are easier to type
on a terminal keyboard than the corresponding four letter words. The predi-
cate null represents the test for the empty list. So, we can form
λ-expressions like
(λq.[p,q,r])S
(λx.[(x)y,(y)x])M
(λx.(λy.[x,y])a)[b,c,d]
With this, of course, we expect that the first expression here will reduce to
[p,S,r], the second to [(M)y,(y)M], and the third to [[b,c,d],a]. To achieve
this we shall need some additional reduction rules which will be given in the
next section.
Head
(Λ)[] → []
(Λ)[E1, ..., En] → E1 for n ≥ 1
Tail
(~)[] → []
(~)[E1, ..., En] → [E2, ..., En] for n ≥ 1
Construction
((&)A)[] → [A]
((&)A)[E1, ..., En] → [A, E1, ..., En] for n ≥ 1
Selection
(1)[E1, ..., En] → E1 for n ≥ 1
(k)[E1, ..., En] → ((pred)k)[E2, ..., En] for k > 1, n ≥ 1
In contrast with LISP, both Λ and ~ are well-defined here for the
empty list. This turns out to be very useful for a recursive definition of
certain list-manipulating functions. The selection of the first member of a
list, however, is undefined for the empty list. Hence, (1)E is not always the
same as (Λ)E.
The apparently meaningless application of an integer k to some list L
is interpreted here as the selection of the k-th element of L. Both the
integer k and the list L may be given as arbitrary λ-expressions and thus,
they must be evaluated (to some extent) before we can tell whether they
fit together. If, for instance, A and B are arbitrary λ-expressions then the
λ-expression
(2)((&)A)(λx.(λy.((&)x)y)[])B
ALPHA-RULES
(α1) {z/x}x → z
(α2) {z/x}E → E if x does not occur free in E
BETA-RULES
(β1) (λx.x)Q → Q
(β2) (λx.E)Q → E if x does not occur free in E
(β3) (λx.λy.E)Q → λz.(λx.{z/y}E)Q if x ≠ y, and z is neither free
nor bound in (E)Q.
(β4) (λx.(E1)E2)Q → ((λx.E1)Q)(λx.E2)Q
(β5) (λx.[E1, ..., En])Q → [(λx.E1)Q, ..., (λx.En)Q] for n ≥ 0
It is interesting to note that our β5-rule has a certain similarity to the ap-
plicative property of construction in FP, which we mentioned before. This
property can be formulated in our system as the following reduction rule:
([f1, ..., fn])x → [(f1)x, ..., (fn)x]
Now, if we can distribute an abstraction prefix over a list then we can re-
place β5 by the above rule. That is precisely what we shall do by using the
following two gamma rules instead of β5.
GAMMA-RULES
(γ1) ([E1, ..., En])Q → [(E1)Q, ..., (En)Q] for n ≥ 0
(γ2) λx.[E1, ..., En] → [λx.E1, ..., λx.En] for n ≥ 0
By adding these new axioms to the α-rules and β1 through β4 we get a
complete system in which every λ-expression will be evaluated by reducing
it to its normal form if such exists.
The definitions of reduction (⇒) and equality (=) will remain the
same as given in Chapter 2 except that the relation → will be defined
by the new set of rules.
To see how this system works let us consider an example. Take, for in-
stance, the algebraic law of composition
[f, g] • h = [f • h, g • h],
which is treated as an axiom in FP. Here we can prove this equality as
follows:
[f, g] • h = ((λx.λy.λz.(x)(y)z)[f, g])h
by definition of composition as given in Section 3.1. The right-hand side
β-reduces (in several steps) to λz.([f, g])(h)z. Then by using the γ-rules
we get
λz.([f, g])(h)z → λz.[(f)(h)z, (g)(h)z] →
[λz.(f)(h)z, λz.(g)(h)z] = [f • h, g • h]
which completes the proof.
The first γ-rule can be used also for selecting simultaneously more than
one element of a list. Namely, for any k-tuple of integers, [i1, ..., ik], we
have
With the aid of the elementary list operators and predicates defined in the
previous sections we can define other list manipulating functions. For in-
stance, the append function which joins together two lists satisfies the fol-
lowing equation:
This definition will be used for the computation of the value of append for
some arguments, say, [a,b,c] and [d,e], in such a way as if we had written
(λappend.((append)[a,b,c])[d,e])(Y)λf.λx.λy.(((null)x)y)
((sum)0)x
prod = ((reduce)1)*
Both of these functions are well-defined for the empty list, which is quite
reasonable in standard mathematics.
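These are the familiar folds; in Haskell (our rendering, with reduce taking the unit first as in ((reduce)1)*):

    -- reduce unit op folds a list with op, returning unit for [].
    reduce :: b -> (a -> b -> b) -> [a] -> b
    reduce unit op = foldr op unit

    sum', prod' :: [Integer] -> Integer
    sum'  = reduce 0 (+)
    prod' = reduce 1 (*)

    main :: IO ()
    main = do
      print (sum' [1, 2, 3])   -- 6
      print (prod' [])         -- 1, the unit: well-defined for the empty list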
In order to get a flavor of list manipulation in our extended lambda-
calculus we show a few more examples. We shall use syntactic sugar while
omitting the fairly trivial translations to pure lambda-notation.
Examples
where
((pairs)x)y = if (null)x then []
else ((&)[(Λ)x,(Λ)y])((pairs)(~)x)(~)y
where
((insert)a)x = if (null)x then [a]
else if a ≤ (Λ)x then ((&)a)x
else ((&)(Λ)x)((insert)a)(~)x
A somewhat tricky example is finding the permutations of a list. We shall
use two auxiliary functions. The first will separately remove each element
in turn from a list. For instance,
(removeone)[a,b,c]
The next step would be to compute the permutations of those shorter lists
which are produced by removeone. This we can do by computing
which yields
((map)permute)[[b,c],[a,c],[a,b]]
hence,
[(permute)[b,c],(permute)[a,c],(permute)[a,b]]
that is,
[[[b,c],[c,b]], [[a,c],[c,a]], [[a,b],[b,a]]]
((putback)[a,b,c])[[[b,c],[c,b]],[[a,c],[c,a]],[[a,b],[b,a]]]
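In Haskell the removeone/permute idea comes out as follows (our sketch):

    -- removeone deletes each element in turn:
    -- removeone "abc" = ["bc","ac","ab"]
    removeone :: [a] -> [[a]]
    removeone []       = []
    removeone (x : xs) = xs : map (x :) (removeone xs)

    -- permute puts each element back in front of the permutations
    -- of the corresponding shorter list.
    permute :: [a] -> [[a]]
    permute [] = [[]]
    permute xs = concat (zipWith (\x ps -> map (x :) ps)
                                 xs
                                 (map permute (removeone xs)))

    main :: IO ()
    main = print (permute "abc")   -- ["abc","acb","bac","bca","cab","cba"]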
List structures are indeed very useful in many applications. They are con-
sidered fundamental in LISP and in other functional languages. It seems
natural that they should be treated as primitive objects in λ-calculus, as
well, but in order to do so it was necessary to add new reduction rules to
the system. By using the γ-rules and a few primitive list operators, list ma-
nipulation is quite simple in lambda-calculus.
The use of lists as primitive objects in λ-calculus has some other benefits,
too. For one thing, they allow for an effective vectorization of our calculus.
This, besides its mathematical elegance, also has some practical advantages
as can be seen from the following treatment of mutual recursion. A similar
approach has been used by Burge [Burg75] without the full list manipulat-
ing power of our extended calculus.
Nonrecursive function definitions can be treated in lambda-calculus in
a fairly simple manner. Assume namely, that we have a sequence of defi-
nitions of the form
fi (i = 1, ... ,n). Now, the value of E with respect to the given definitions can
be computed by evaluating the combined expression
Thus, the set of equations can be treated as mere syntactic sugar having a
trivial translation into pure lambda-notation. This simple-minded ap-
proach, however, does not always work. As can be seen from the given
translation, the form of the combined expression reflects the order of the
equations. Therefore, a free occurrence of fj in ei will be replaced by ej if
and only if j < i.
In other words, previously defined function names can be used on the
right-hand sides of the equations, but no forward reference can be made to a
function name defined only later in the sequence. This clearly excludes mu-
tual recursion, which always involves some forward reference. If, for in-
stance, f1 is defined in terms of f2 (i.e., f2 occurs in e1) and vice versa then
the forward reference cannot be eliminated simply by changing the order
of the equations.
It should be clear that in the absence of mutual recursion the equations
can be rearranged in such a way that no forward reference occurs. Imme-
diate recursion should not be a problem, because it can be resolved with
the aid of the Y combinator. Now, we will show that mutual recursion can
be resolved fairly easily in our extended lambda-calculus. In particular, the
list manipulating power of our calculus is very helpful in working with a list
of variables. Actually, we can use a single variable to represent a list just as
is done in vector algebra. Hence, the solution of a set of simultaneous
equations can be expressed in a compact form with a single occurrence of
the Y combinator. First, we illustrate this method through an example.
Consider the following mutual recursion defining two number-
theoretic (integer type) functions, g and h:
g = λn.(((zero)n)0)((+)(g)(pred)n)(h)(pred)n
We introduce a new variable F to represent the ordered pair [g,h], and re-
write our equations using (1)F instead of g, and (2)F instead of h.
(1)F = λn.(((zero)n)0)((+)((1)F)(pred)n)((2)F)(pred)n
In order to prove that this is a correct solution we apply our reduction rules.
The right-hand side of the last equation reduces to
(λf.[r1, ..., rn])(Y)λf.[r1, ..., rn]
by the definition of the Y combinator. This further reduces to
([λf.r1, ..., λf.rn])(Y)λf.[r1, ..., rn]
by γ2, and then to
[(λf.r1)(Y)λf.[r1, ..., rn], ..., (λf.rn)(Y)λf.[r1, ..., rn]]
by γ1. This is clearly an n-tuple which should be equal to F. Indeed, its i-th
component is
(λf.ri)(Y)λf.[r1, ..., rn]
which implies that
This formula is correct also for nonrecursive definitions but, of course, the
simple-minded approach described at the beginning of this section is more
efficient. Therefore, an optimizing compiler should treat recursive and non-
recursive definitions separately. For that purpose, one can compute the
dependency relation between the given definitions and check if it forms a
partial order on the set of functions fi (i = 1, ..., n). If so, then no mutual
recursion is present and the definitions can be arranged in a sequence
suitable for the simple-minded solution. Otherwise, one should try to iso-
late the minimal sets of mutually recursive definitions and solve them sep-
arately, before putting them back to their proper place in the sequence.
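The vector trick is easy to demonstrate in Haskell: take a single fixed point of a function on a pair. A sketch (ours) mirroring the g/h example below; since h's defining equation does not survive in this extract, the one used here is hypothetical:

    fix :: (a -> a) -> a
    fix f = f (fix f)

    -- F represents the pair [g, h]; both components are defined at once.
    gh :: (Integer -> Integer, Integer -> Integer)
    gh = fix (\p ->
           let g = fst p
               h = snd p
           in ( \n -> if n == 0 then 0 else g (n - 1) + h (n - 1)  -- g
              , \n -> if n == 0 then 1 else g (n - 1) ))  -- h (hypothetical)

    main :: IO ()
    main = print (fst gh 3)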
if A then B else C
zeros = ((&)0)zeros
which means that the infinite list of zeros remains the same when one more
zero is attached to it. But this is a fixed-point equation whose solution is
zeros = (Y)λz.((&)0)z
Indeed, the right-hand side reduces to
(λz.((&)0)z)(Y)λz.((&)0)z
and then to
((&)0)(Y)λz.((&)0)z
which clearly generates an infinite list of zeros just by using the given re-
duction rules. Unfortunately, the form in which this list appears is not
suitable for using it as an argument to our 'built-in' functions. For instance,
the special function Λ cannot work on this list, because it does not have the
appropriate form. But this is easy to fix. All we have to do is adjust the
reduction rules of the list operators to make them work with partially
evaluated lists. So, the rest of a list may be an arbitrary λ-expression which
has not been computed yet. This implies, of course, that the rest of the list,
which will be denoted by R in the following rules, may not reduce to a
'list-tail' or may not have a normal form at all. Nevertheless, we can define
our list operators as follows:
((&)E)R → [E, R
(null)[E, R → false
(Λ)[E, R → E
(~)[E, R → R
(1)[E, R → E
These rules represent the lazy extensions of the defining rules of the given
functions. With the aid of these rules we can easily compute, for example,
(5)zeros, which is the fifth member of the infinite list, that is, 0.
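Haskell lists are lazy in exactly this sense, so the example carries over directly (ours):

    -- The fixed-point equation zeros = ((&)0)zeros, written directly:
    zeros :: [Integer]
    zeros = 0 : zeros

    main :: IO ()
    main = print (zeros !! 4)   -- the fifth member, i.e. 0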
The infinite list of natural numbers is defined by the iteration of the
successor function. The iteration of a function f to some argument x
generates the infinite list
Note that in the definition of pairs we do not have to test for the empty list,
since neither fibonacci nor (~)fibonacci is empty. An interesting feature of
this definition is the application of (map )sum to an infinite list of pairs.
Clearly, this application should not wait for the computation of the whole
list. As soon as the first pair gets formed by the function pairs, its sum
should be computed and appended to the list [1, 1]. This yields an inter-
mediate result whose first three elements are 1, 1, 2, and thus, the compu-
tation of pairs may continue. This method of building the infinite list of
Fibonacci numbers works only with a demand-driven evaluation strategy.
Otherwise, the application of the functions would be delayed until the ar-
guments are totally computed, which would obviously kill the recursion in
this case. This is so, because the list fibonacci is used here repeatedly as a
partially evaluated argument while it is being created.
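In Haskell the same self-referential construction is the classic one-liner (ours):

    -- Sums of the list paired with its own tail, built on demand.
    fibonacci :: [Integer]
    fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci)

    main :: IO ()
    main = print (take 8 fibonacci)   -- [1,1,2,3,5,8,13,21]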
It is important to note that we do not have to take extra measures in
order to deal with infinite lists. The demand-driven evaluation technique
as described above will automatically give us this opportunity for free.
Later we shall see that a strictly demand-driven evaluation technique usu-
ally involves some loss in the efficiency of computations. Therefore, the
overall simplicity of handling infinite lists does have a price.
For another example with this flavor, consider the computation of the
prime numbers using the Sieve of Eratosthenes. First, an auxiliary function
is defined to filter out the multiples of a number from a list of numbers.
((filter)n)x = if (null)x then []
else if ((mod)(Λ)x)n = 0 then ((filter)n)(~)x
else ((&)(Λ)x)((filter)n)(~)x
primes = (sieve)((iterate)succ)2
where
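A Haskell rendering of the sieve idea (ours; the sieve is assumed to keep the head of its argument and recursively filter out its multiples):

    sieve :: [Integer] -> [Integer]
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []       = []

    primes :: [Integer]
    primes = sieve (iterate succ 2)

    main :: IO ()
    main = print (take 10 primes)   -- [2,3,5,7,11,13,17,19,23,29]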
Now, if we want to stop at the first such term that is smaller than a given
e then we can write
As can be seen from these examples infinite lists can be defined easily via
recursion. Infinite objects and, in particular, infinite lists occur naturally in
mathematics but they are poorly represented in conventional programming
languages, because their proper treatment requires lazy evaluation which
seems to be less efficient than the traditional approach. Clearly, there is a
trade-off between the expressive power of the language and the efficiency
of its implementation.
Another problem here, and also in most functional languages including
LISP, is the lack of distinction between fixed-sized arrays and dynamically
changeable lists. The fixed size of a finite array makes it easy for an im-
plementation to support a direct access to its elements. List structures are
more flexible, hence, a direct access to their elements is much more difficult
to obtain. A uniform treatment of arrays and lists must be prepared for the
Exercises
4.1 Define list manipulating functions to do the following:
(a) Sort a list of numbers using quicksort,
(b) Merge, i.e. shuffle two lists,
(c) Cut off the last element of a list,
(d) Rotate a list to the right or to the left,
(e) Compare the elements of two lists,
(f) Multiply together two square matrices.
4.2 A Curried function with precisely n arguments will be changed into the
corresponding list oriented function (which takes a list of length n for an
argument) by the following combinator
uncurry =
λn.λg.λv.(((zero)(pred)n)(g)(1)v)((((uncurry)(pred)n)g)v)(n)v
So, for instance, with n = 3 we get
(((uncurry)3)F)[a,b,c] = (((F)a)b)c
Conversely, a function whose argument is an array (list) of n elements will
be changed into the corresponding Curried function by the following
combinator:
curry = λn.λf.(((zero)(pred)n)λx.(f)[x])
<variable> ::= <identifier>
<constant> ::= <number> | <operator> | <combinator>
<abstraction> ::= λ<variable>.<λ-expression>
<application> ::= (<λ-expression>)<λ-expression>
<list> ::= [<λ-expression><list-tail> | []
<list-tail> ::= ,<λ-expression><list-tail> | ]
<operator> ::= <arithmetic operator> | <relational operator> |
<predicate> | <boolean operator> | <list operator>
<arithmetic operator> ::= + | - | * | / | succ | pred | mod
<relational operator> ::= < | ≤ | = | ≥ | > | ≠
<predicate> ::= zero | null
<boolean operator> ::= and | or | not
<list operator> ::= Λ | ~ | & | map | append
<combinator> ::= true | false | Y
This syntax does not specify what an identifier or a number looks like, but
any reasonable definition would do, and we are not interested in their de-
tails. What we are interested in right now is the meaning of λ-expressions,
which will be described with the aid of a set of reduction rules.
These reduction rules represent meaning preserving transformations on
λ-expressions and the corresponding equality (in the sense of Definition
2.6) divides A into equivalence classes each of which consists of
λ-expressions with the same meaning. If an equivalence class contains a
member from A0 then this member is unique up to α-congruence and it
represents the meaning of every λ-expression in that class. The following
is a summary of our reduction rules.
ALPHA RULES
(α1) {z/x}x → z
(α2) {z/x}E → E if x does not occur free in E
BETA RULES
(β1) (λx.x)Q → Q
(β2) (λx.E)Q → E if x does not occur free in E
(β3) (λx.λy.E)Q → λz.(λx.{z/y}E)Q if x ≠ y, and z is neither free
nor bound in (E)Q
(β4) (λx.(E1)E2)Q → ((λx.E1)Q)(λx.E2)Q
GAMMA RULES
(γ1) ([E1, ..., En])Q → [(E1)Q, ..., (En)Q] for n ≥ 0
(γ2) λx.[E1, ..., En] → [λx.E1, ..., λx.En] for n ≥ 0
PROJECTIONS
COMBINATORS
((true)A)B → A, ((false)A)B → B
(Y)E → (E)(Y)E
Similar reduction rules are used for the remaining relational operators and
this completes our list.
Note that most of the above rules are, in fact, 'rule-schemas' rather
than individual rules as they have an infinite number of instances. The
evaluation of a λ-expression will be performed by reducing it to its normal
form using the above rules. But this is the same as the execution of the al-
gorithm (or functional program) represented by the expression. So, the
execution of a program can be described as a sequence of reduction steps
where each step is a single application of some reduction rule. This means
that the execution of a program can be defined in terms of certain trans-
formations performed directly on its source form. Actually, the program and
its result are considered here as two equivalent representations of the same
object.
According to the Church-Rosser Theorem, the order in which the re-
duction steps are performed does not matter provided that the reduction
process terminates after a finite number of steps. From a practical point
of view, however, it would be desirable to minimize the number of steps
that are needed for the evaluation of a λ-expression. That is essentially the
same as minimizing the execution time of a functional program, which
cannot be done in general. Nevertheless, there are various techniques for
improving the run-time efficiency of a program. The efficiency of the
function evaluation process represents a major issue for the implementa-
tion techniques to be studied in Chapters 6 and 7.
This reduction-based approach to the semantics of λ-expressions is
closely related to the so called 'operational semantics' of programs. Indeed,
the reduction process is a well-defined procedure for every λ-expression
even if it does not terminate. It is nondeterministic, though, in the sense that
the redex to be contracted in each step may be chosen arbitrarily from
among those that are present in the given λ-expression at that point.
where E′ is the same as E except that each (if any) occurrence of Y
in E is replaced by Y′.
In order for the new system to work, all recursive definitions should be
written with the aid of the new Y′ combinator in place of the original Y.
The latter is disabled in the new system until it gets changed to Y′ in a
β2′-reduction step.
To see how this system works on a simple example consider the fol-
lowing recursive definition of the factorial function.
(fact)n = if n = 0 then 1 else ((*)n)(fact)(pred)n
which will be written in our λ-notation as
(fact)n = (((zero)n)1)((*)n)(fact)(pred)n
that is
fact = λn.(((zero)n)1)((*)n)(fact)(pred)n
zeros = ((&)0)zeros
hence,
zeros = (Y′)λz.((&)0)z
where the right-hand side reduces to
(λz.((&)0)z)(Y)λz.((&)0)z
which has the normal form
((&)0)(Y)λz.((&)0)z
Now, the problem is that a finite projection of this infinite list is not com-
putable in the new system.
First of all, we have to define all list operations in a lazy manner as
discussed in Section 4.5. So, for example, the k-th element of an infinite
list will be obtained by extending the reduction rules of projections to in-
finite lists as follows:
(1)[E1, ...] → E1
(k)[E1, ...] → ((pred)k)[E2, ...]
All the other list manipulating functions, which are Λ, ~, &, null, map,
and append, will also be defined lazily. Actually, the last two need not be
defined as primitives, since they can be defined recursively with the aid of
the others as shown in Section 4.3. Namely,
((map)f)x = (((null)x)[])((&)(f)(Λ)x)((map)f)(~)x
((append)x)y = (((null)x)y)((&)(Λ)x)((append)(~)x)y
Returning to our example, the computation of the k-th element of the in-
finite list of zeros begins with
(k)((&)0)(Y)λz.((&)0)z ⇒ ((pred)k)(Y)λz.((&)0)z
when k ≥ 2. Now, in order to continue this computation we have to change
the Y combinator back to Y′. But that requires further modifications of the
reduction rules, because the β2′-rule is not applicable in this case, since the
recursively defined infinite list involving the Y combinator is not the oper-
ator but the operand of the given projection. Therefore, we introduce the
following new rules:
(Λ)(Y)E → (Λ)(Y′)E
(~)(Y)E → (~)(Y′)E
(null)(Y)E → (null)(Y′)E
PRIMITIVE FUNCTIONS
The integers 1, 2, ... (representing selector functions)
k:x produces the k-th element of an object x if x is a se-
quence of at least k elements. Otherwise it returns w.
The tail function
it removes the first element of a sequence; produces w
when applied to a non-sequence object or to the empty
sequence.
id (identity)
id:x = x for all x in 0.
a, s
a adds 1 while s subtracts 1 from its argument.
eq (test for equality)
eq:x = T if x is a pair of identical objects; eq:x = F if x is
a pair of non-identical objects; otherwise it produces ω.
eq0 (test for zero)
eq0:0 = T.
gt (greater than), ge (greater or equal)
For instance, gt:<5,2> = T, gt:<2,5> = F, etc.
+, -, ×, mod (arithmetic operations)
For instance, +:<5,2> = 7, +:<5> = ω, etc.
iota (number sequence generator)
iota:n = <1, 2, ..., n> if n is an integer.
apndl (append left), apndr (append right)
apndl:<a, <x1, ..., xn>> = <a, x1, ..., xn>
apndr:<<x1, ..., xn>, b> = <x1, ..., xn, b>
distl (distribute from the left), distr (distribute from the right)
distl:<a, <x1, ..., xn>> = <<a, x1>, ..., <a, xn>>
distr:<<x1, ..., xn>, b> = <<x1, b>, ..., <xn, b>>
It is generally assumed that every primitive function returns ω when
applied to a wrong argument.
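As an illustration, here is a hedged Haskell sketch of a few of these
primitives over a single universal object type; the type Obj and all the
constructor names are our own, and W stands for ω.

    -- a minimal sketch, assuming one universal object type
    data Obj = Num Integer | Atom String | Seq [Obj] | W
      deriving (Eq, Show)

    sel :: Int -> Obj -> Obj                 -- the selector k
    sel k (Seq xs) | k >= 1 && k <= length xs = xs !! (k - 1)
    sel _ _ = W

    tl :: Obj -> Obj                         -- the tail function
    tl (Seq (_:xs)) = Seq xs
    tl _            = W

    add :: Obj -> Obj                        -- +:<m,n>
    add (Seq [Num m, Num n]) = Num (m + n)
    add _                    = W

    distl :: Obj -> Obj                      -- distl:<a,<x1,...,xn>>
    distl (Seq [a, Seq xs]) = Seq [Seq [a, x] | x <- xs]
    distl _                 = W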
COMBINING FORMS
Composition: f ∘ g
(f ∘ g):x = f:(g:x)
Construction: [f1, ..., fn]
[f1, ..., fn]:x = <f1:x, ..., fn:x>
Conditional: p → f; g
(p → f; g):x = if p:x = T then f:x else if p:x = F then g:x else ω.
Constant: x̄
x̄:y = if y ≠ ω then x else ω.
Apply to all: αf
αf:x = <f:x1, ..., f:xn> if x = <x1, ..., xn>; ω otherwise.
Insert: /f
(/f):<x1> = x1
(/f):<x1, ..., xn> = f:<x1, (/f):<x2, ..., xn>>.
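Since every FP function maps objects to objects, the combining forms are
ordinary higher-order functions. A sketch in Haskell, reusing the Obj type
from the previous example (the names comp, cons, cond, and insertR are
ours):

    type F = Obj -> Obj

    comp :: F -> F -> F                      -- composition f ∘ g
    comp f g x = f (g x)

    cons :: [F] -> F                         -- construction [f1, ..., fn]
    cons fs x = Seq [f x | f <- fs]

    cond :: F -> F -> F -> F                 -- conditional p → f; g
    cond p f g x = case p x of
      Atom "T" -> f x
      Atom "F" -> g x
      _        -> W

    insertR :: F -> F                        -- insert /f
    insertR _ (Seq [x])    = x
    insertR f (Seq (x:xs)) = f (Seq [x, insertR f (Seq xs)])
    insertR _ _            = W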
Now, THE SET OF FUNCTIONS, F, will be defined inductively as fol-
lows:
(1) Every primitive function is in F.
(2) If f1, ..., fn are in F and C is a combining form which takes n
arguments, then C applied to f1, ..., fn is also in F.
(3) If the expression Df represents a function in the 'extension' of
F by the symbol f, i.e. if Df is in F provided that the symbol f is
treated as a primitive function, then the function defined
(recursively) by the equation f = Df is also in F.
(4) Nothing else is in F.
Clause (3) has the same purpose as (recursive) function declarations in
conventional programming languages. The function symbol f represents
the name of a user defined function. The above formalism is somewhat
strange because the composition of combining forms with variable argu-
ments is not defined in FP. Combining forms can be applied only to first
order functions. Therefore, a variable function name should not be used
as an argument to a combining form, nor should it be used as a formal pa-
rameter in a combining form. (Note that while functions are always unary
in FP, most combining forms have multiple arguments.)
AXIOMS
(A1) h ∘ (p → f ; g) = p → h ∘ f ; h ∘ g
(A2) (p → f ; g) ∘ h = p ∘ h → f ∘ h ; g ∘ h
(A4) /f ∘ [g] = g
(A5) [f, g] ∘ h = [f ∘ h, g ∘ h]
with our Curried ≥ operator represents a correct translation of the pair-
oriented ge. Indeed, the application of this expression to an ordered pair
[A,B] reduces to ((≥)A)B as required.
The function iota can be translated to
(Y')λi.λn.((((=)n)1)[1])((append)(i)(pred)n)[n]
The translations of apndr and distr are left to the reader as an exercise.
Consider now the translation of the combining forms:
Composition is equivalent to
λf.λg.λx.(f)(g)x
Construction is exactly the same in either notation. The Conditional is
equivalent to
λp.λf.λg.λx.(((p)x)(f)x)(g)x
The Constant combinator x̄ is equivalent to
λz.x
Apply to all is equivalent to our map, while Insert is equivalent to
(Y')λi.λf.λx.(((null)(~)x)(^)x)(f)[(^)x,((i)f)(~)x]
Thus, we can design a relatively simple translation algorithm which
produces an equivalent λ-expression to any FP function. The translator it-
self represents a complete formal semantics for FP, since the meaning of
λ-expressions has already been defined by the reduction rules.
The use of the λ-notation as a meta-language for semantic definitions
is quite common in theoretical computer science. Its use as a practical tool
for implementing programming languages is relatively new, but it is
spreading rapidly. This has led to the consideration of nonstrict languages
and various forms of lazy evaluation techniques which are closely related
to the normal order evaluation strategy of λ-calculus, which will be dis-
cussed in Chapter 6.
Imperative programs can also be translated to λ-calculus, but that is in
general much more complicated. The major difficulty is caused by the
presence of side-effects. It may be interesting to note that structured pro-
gramming, which is a highly disciplined way of writing imperative pro-
grams, can substantially decrease the difficulty of the translation from an
imperative language to λ-notation. This can be illustrated by the following
example.
without any difficulty, provided that we add the function PRINT to our set
of primitive functions. Any sequence of assignment statements can be
treated in the same way, because it has a linear flow of control. The IF
statement breaks the linear flow of control though in a relatively simple
(well-structured) manner. Therefore, it can be easily translated to
λ-notation provided that its component parts have already been translated.
The same is true for the WHILE statement, whose general form is the fol-
lowing:
WHILE P DO F
Its translation to λ-notation is likewise straightforward, pro-
vided that its predicate P and its function F can also be translated to
λ-notation.
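As a sketch of why this translation is straightforward, assume the entire
state of the program is a single value; then WHILE P DO F is just a
tail-recursive function on that state (Haskell, all names ours):

    while :: (s -> Bool) -> (s -> s) -> s -> s
    while p f s = if p s then while p f (f s) else s

    -- e.g. while (< 100) (* 2) 1 evaluates to 128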
The translation of an unrestricted GO TO statement is obviously much
more difficult. Structured programming is, therefore, very helpful for
translating imperative programs to λ-notation. In fact, it represents a first
step towards functional programming without abolishing the assignment
statement. A limited use of the assignment statement in otherwise purely
functional languages has many advantages. The proper discipline of using
them may depend on the purpose of their usage.
If we want to represent a set by enumerating its ele-
ments in some sequence then we have to make sure that each element oc-
curs exactly once. We are not concerned with sets right now, so we do not
worry about possible repetitions.
The list of odd numbers can also be defined in this way:
[2*k+1 | k <- [0..]]
This means that the elements of a list may be represented by an expression
containing some variable whose value is drawn from another list. The ex-
pression may also contain several variables each being drawn from a dif-
ferent list. The general form of a list comprehension is
[E | q1; ... ; qn]
where each qualifier qi is either a generator of the form v <- L, drawing
the values of v from the list L, or a filter, i.e. a boolean expression re-
stricting those values. The simplest nontrivial case is a comprehension
which has one generator and one filter. As can be seen from these exam-
ples Miranda uses infix notation for the arithmetic operations, and it has a
nice shorthand for the list of integers from 1 to some limit like n div 2. Let
us consider now the most important features of Miranda, which are rele-
vant to our discussion.
Miranda is a purely functional language which has no side-effects or
any other imperative features. A program in Miranda is called a script,
which is a collection of equations defining various functions and data
structures. Here is a simple example of a Miranda script taken from
[Turn87]:
z = sq x / sq y
sq n = n * n
x = a + b
y = a - b
a = 10
b = 5
Scripts are used as environments in which to evaluate expressions. So, for
example, the expression z will evaluate to 9 in the environment represented
by the above script. Function application is denoted simply by juxtaposi-
tion, as in sq x. A function may be defined through a list of alternative
right-hand sides qualified by boolean guards, in the general form
f args = rhs1, test1
...
= rhsN, testN
One can also introduce local definitions on the right-hand side of a defi-
nition, by means of a where clause, as shown in this example:
quadr a b c = error "complex roots", delta<0
= [-b/(2*a)], delta=0
= [-b/(2*a)+radix/(2*a), -b/(2*a)-radix/(2*a)], delta>0
where
delta = b*b-4*a*c
radix = sqrt delta
The scope of the where clause is all the right-hand sides associated with a
given left-hand side.
As we mentioned before, Miranda is a higher-order language. Func-
tions of two or more arguments are considered Curried and function ap-
plication is left-associative. So, the application of a function to two
arguments is written simply as f x y, and it will be parsed as the
A-expression ( (f)x)y. If a function f has two or more arguments then a
partial application of the form f x is treated as a function of the remaining
arguments. This makes it possible to define higher-order functions such as
reduce, which was used in Section 4.3 for a uniform definition of the sum
and the product of a sequence. Here we can use pattern matching to define
this function as follows:
reduce a b [] = a
reduce a b (c:x) = reduce (b a c) b x
Hence, we get
sum = reduce 0 (+)
prod = reduce 1 (*)
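Incidentally, this definition of reduce type-checks essentially verbatim in
Haskell, where it is the standard left fold with the operator taking the
accumulator first, i.e. reduce a b = foldl b a (primed names are ours, to
avoid clashing with the Prelude):

    reduce :: a -> (a -> b -> a) -> [b] -> a
    reduce a _ []    = a
    reduce a b (c:x) = reduce (b a c) b x

    sum', prod' :: [Integer] -> Integer
    sum'  = reduce 0 (+)
    prod' = reduce 1 (*)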
The alert reader must have noticed the striking similarities between
Miranda and the λ-notation. It is, indeed, very easy to translate Miranda
programs into our extended λ-notation. A script is just a set of simultane-
ous equations which can be treated as described in Section 4.4. The right-
hand side of every equation will be translated first to a valid λ-expression.
Then, in order to minimize the number of forward references, the
equations will be rearranged on the basis of the dependency analysis of the
given definitions. So, for example, the script which was given at the be-
ginning of this section will be translated as follows:
a = 10
b = 5
x = ((+)a)b
y = ((-)a)b
sq = λn.((*)n)n
z = ((/)(sq)x)(sq)y
Let us consider now the translation of lists and tuples. Both will be re-
presented by lists in our type-free λ-notation. Explicitly enumerated lists
can be translated directly without any problem. Also, the translation of list
comprehensions follows a uniform scheme. A comprehension with a single
generator,
[E | v <- L]
will be translated to
((map)λv.E)L
provided that E and L are already in λ-notation. So, for example, the list
of odd numbers from 1 to 99 defined in Miranda as
[2*k-1 | k <- [1..50]]
will be translated to
((map)λk.((-)((*)2)k)1)(iota)50
If a comprehension has two generators, as in
[E | x <- L; y <- M]
then we write
M' = ((map)λy.E)M
from which we get the result in this form:
(flat)((map)λx.M')L
where
(flat)x = if x = [] then [] else ((append)(^)x)(flat)(~)x
[E | v <- L; P]
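The two-generator scheme can be sketched in Haskell as follows (the
names flat and twoGen are ours): the inner map builds M', the outer map
ranges over L, and flat concatenates the partial results.

    flat :: [[a]] -> [a]
    flat x = if null x then [] else head x ++ flat (tail x)

    -- [ e x y | x <- l; y <- m ] is translated to:
    twoGen :: (a -> b -> c) -> [a] -> [b] -> [c]
    twoGen e l m = flat (map (\x -> map (\y -> e x y) m) l)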
Exercises
5.1 Show that the FP axioms listed in Section 5.3 are derivable from the
reduction rules of Section 5.1.
5.2 The following is the definition of the Ackermann function in FP:
ack = eq0 ∘ 1 → a ∘ 2;
Translate this definition into λ-calculus and compare it with its Curried
version.
5.3 Define an FP function to compute the n-th Fibonacci number with the
aid of an ordered pair holding two consecutive Fibonacci numbers as an
intermediate result. Why is this better than the usual recursive definition?
Try to imitate this definition in λ-calculus without using lists.
5.4 Define the function hanoi in FP to solve the problem of the towers of
Hanoi. The function should generate the solution as a sequence of moves
denoted by ordered pairs of the form [A, B], which represent moving a disk
from tower A to tower B. The initial configuration, where all disks are on
the first tower, can be represented by [n, A, B, C], where n is the number
of disks, while A, B, and C are the names of the towers. Translate the
function hanoi to λ-notation. What would you do if you did not have lists
in the λ-notation?
5.5 Design a translation scheme for the nested where clauses of Miranda,
which have the general form
f = A
where
g = B
where
h = C
etc ...
What is the difference between the above scheme and the following?
f = A
where
g = B
h = C
etc ...
5.6 The function pyth returns a list of all Pythagorean triangles with sides
of total length less than or equal to n. It can be defined in Miranda as fol-
lows:
pyth n = [ [a,b,c] | a <- [1..n];
b <- [1..n-a];
c <- [1..n-a-b];
sq a + sq b = sq c ]
Observe the fact that a later qualifier may refer to a variable defined in an
earlier one, but not vice versa. Translate this definition to λ-notation.
CHAPTER SIX
let f = E1
let g = E2
eval E3
Renaming nodes do not occur initially in the graph, but they may be in-
troduced during the reduction process. The type of a node determines the
number of its children. So, for example, an application node has two chil-
dren, an abstraction node has one, and a variable or constant node has
none. The left-child of an application node is the top node of its operator
part while its right-child is the top node of its operand part. So, for exam-
ple, the graph shown in Figure 6.1 represents the Curried addition
((+)A)B.
Figure 6.1
The internal representation of a list [A1, A2, ..., An] will have the form
as shown in Figure 6.2, where the list terminator node is identical with an
empty list.
Figure 6.2
Figure 6.3
Figure 6.4
Figure 6.5 (rules α1–α5)
Figure 6.6 (rules β1–β4)
Figure 6.7 (rules γ1 and γ2)
Figure 6.8
Since every recursive definition can be resolved with the aid of the Y
combinator, every λ-expression can be represented and evaluated using
only directed acyclic graphs. So, the question is why should we be con-
cerned with cyclic graphs at all? The answer can be summarized in one
word: efficiency.
Our implementation gives us the opportunity to represent recursion
by either cyclic or acyclic graphs. We have run various experiments with
both. The size of the acyclic graph tends to grow more rapidly during the
evaluation than that of the corresponding cyclic version. Consequently, the
evaluation process is much faster with the cyclic version than with the
acyclic one. Of course, the difference depends on the given example, but
it is so overwhelming in most cases that there can be no doubt about its
significance. This fact has far-reaching consequences with respect to the
parallel implementation to be discussed in the next chapter.
Now, let us see a bit more closely how graph transformation is done
in our reduction machine. First of all, each node of the graph is stored as
a record with four fields, called CODE, OP1, OP2, and MARKER, which
hold the type of the node and its two operands, respec-
tively. The MARKER field is only one bit long and it is used exclusively
by the garbage collector.
The particular encoding used for various node types is not important
as it is quite arbitrary. In our implementation we use, for instance, 1 for the
CODE of an abstraction node, whose OP 1 contains the name of the bound
variable while its OP2 points to its only child. The CODE of an application
node is 2, and its OP1 and OP2 are pointers to its children.
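A Haskell sketch of this record layout (field names follow the text; the
concrete encodings are the ones just quoted):

    data Node = Node
      { code   :: Int   -- node type: 1 = abstraction, 2 = application, ...
      , op1    :: Int   -- bound-variable name, or pointer to the left child
      , op2    :: Int   -- pointer to the (right) child
      , marker :: Bool  -- the one-bit field used only by the garbage collector
      }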
The reduction process involves a traversal of the expression graph
while looking for a redex. This will be done in a depth-first manner begin-
ning with the root node of the entire graph. In order to locate a β-redex it
is necessary to find an application node first. The record of an application
node encountered will be stored in a register called N1. Then the record
of its left-child will be stored in N2. If that happens to be an abstraction
node then its left-child will be stored in N3, and the selection of the ap-
propriate β-rule begins with a search for a free occurrence of the bound
variable (OP1 of N2) in the subexpression whose root is in N3. If no such
occurrence is found then we have a β2 redex. Otherwise, the CODE of
N3 will decide whether we have a β1, β3, or β4 redex. In each of these
cases the graph will be changed accordingly and the search for the next
redex continues. However, if the node in N3 is neither a variable, nor an
abstraction, nor an application then we have no β-redex here, and we should
look for another redex.
For finding an a-redex only two nodes are to be checked. Each time a
renaming node is found during the traversal of the graph, it will be stored
in N1. Then its right-child will be stored in N2, while N3 remains idle.
The same is true for the two γ-rules.
A nice feature of all these rules is that we can recognize their patterns
by looking only at two or three nodes of the entire expression. The rest of
the expression will have no influence on the type of the redex in question.
The only exception is represented by the β2 rule, whose applicability de-
pends on whether or not the bound variable occurs free inside the
operator part of the redex. The search for a free occurrence of a variable
in a subexpression is clearly not an elementary operation. It may be
treated, however, as a preliminary test, because it does not change the
graph at all.
This brief description of the operation of the machine must be suffi-
cient for certain observations. First of all, it is easy to see that the instruc-
tion set of the machine is indeed isomorphic with a set of reduction rules.
Also, it must be clear that these instructions can be easily simulated on a
conventional computer. An unusual feature of these instructions is, per-
haps, their synthetic nature, since they are assembled from different nodes,
i.e. from different parts of the main storage. Aside from that, the graph can
be interpreted as a structured set (as opposed to a sequence) of in-
structions and thus, it represents indeed a program for computing the re-
sult. This program, however, will change significantly during its execution
and it eventually develops into its result. This makes reduction machines
entirely different from the more conventional fixed program machines.
The operation of the reduction machine is controlled by the contents
of three registers, N1, N2, and N3. The main purpose of these registers is
simply pattern matching with the left-hand sides of the rules. Fortunately,
the left-hand sides of our rules have very simple patterns which make them
relatively easy to match with the appropriate portion of the graph.
In the previous section we have seen the graph transformation rules asso-
ciated with the α-, β-, and γ-rules. The implementation of the other re-
duction rules follows the same approach. Their patterns have been designed
in such a way that they can be easily recognized by checking only a few
adjacent nodes in the graph.
Each of our primitive functions represents either a unary or a binary
operation. The binary ones are always Curried, except for the infix list-
constructor. If we did not restrict ourselves to a maximum of two arguments
then the patterns to be recognized by the machine would be more complex.
We think that decomposing multi-argument functions to simpler ones is
better than using a more complicated pattern matching procedure. For
instance, we can implement the S combinator in two steps as follows.
((S)A)B → (S')[A,B]
and
((S')[A,B])C → ((A)C)(B)C
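A sketch of this two-step scheme on a toy term type (the constructor
SPair, standing for the intermediate S', is ours):

    data Term = S | SPair Term Term | App Term Term | Var String

    step :: Term -> Maybe Term
    step (App (App S a) b)   = Just (SPair a b)                 -- ((S)A)B -> (S')[A,B]
    step (App (SPair a b) c) = Just (App (App a c) (App b c))   -- ((S')[A,B])C -> ((A)C)(B)C
    step _                   = Nothing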
This way we do not need more registers and the extra time spent on the
intermediate transformation will be compensated by the overall simplicity
of the pattern matching operation.
Before the application of a primitive function its operand(s) may have
to be evaluated or at least examined to some extent. As we have discussed
in the previous section, whenever an application node appears in register
N1, its left-child will be stored in N2. If that is a unary function then the
top node of its operand is obviously the right-child of N1, which will be
stored in a separate register called N4.
Consider now, for example, the implementation of the ^ operator as
shown in Figure 6.9. If N4 is an infix list-constructor then we can use its
OP1 (the pointer to its left-child) for making the necessary changes in the
graph; otherwise, the ^ operation cannot be performed at this point, but it
may become executable later if and when the operand gets reduced to an
expression which begins as a list. All the other unary primitive functions,
including the projections, are treated in a similar fashion, which means that
their arguments will be analyzed only to the necessary degree.
Figure 6.9
The patterns of the binary operations are similar to the Curried addi-
tion as shown in Figure 6.1. This means that both N1 and N2 must contain
application nodes when the binary operator appears in N3. Hence, the
right-child of N2 is the top (root) of the first operand while the right-child
of N1 is the top of the second operand. If the operator in N3 is either an
arithmetic or a relational operator and the operands are numbers then the
operation is performed and the result is stored in N1. In other words, the
given redex will collapse to a single node holding the numeric value of the
result. (The address of the node stored in N 1, i.e. the top node of the redex,
is kept, of course, in a separate register.) If the operands are not numbers
then the execution of the arithmetic operation must be postponed until the
operands are reduced to numbers.
The arithmetic and the relational operators cannot be computed lazily
(without fully evaluating their arguments), because they are strict. So are
the boolean operators, as well as the predicates, except for null, which is
semi-strict. Sometimes, the latter can produce an answer just by looking
at the beginning of its operand without evaluating it.
Most of our list manipulating operators are implemented as semi-strict
functions that can be applied to partially evaluated lists. In order for the
function map (apply to all) to have the same opportunity it has been im-
plemented lazily as shown in Figure 6.10. This means that the map opera-
tion will be decomposed into a sequence of its partial applications. The
same is true for our implementation of the append function.
Figure 6.10
Figure 6.11
that our β3 rule is more like an intermediate step (a preparation for con-
traction) rather than a contraction by itself. Consider the following
λ-expression
((λx.λy.E)P)Q
where E, P, and Q are arbitrary λ-expressions each containing free occur-
rences of x and y. Contracting the leftmost redex yields
(λz.(λx.{z/y}E)P)Q
by β3. Now, the contraction of the leftmost redex gives us
((λz.λx.{z/y}E)Q)(λz.P)Q
Hence, we get
(λv.(λz.{v/x}{z/y}E)Q)(λz.P)Q
and again
((λv.λz.{v/x}{z/y}E)(λz.P)Q)(λv.Q)(λz.P)Q
This shows that strictly normal order reduction cannot work here. Actu-
ally, it will run indefinitely without making any progress. Fortunately, it is
easy to fix this problem. All we have to do is to remember that after each
β3 reduction the next redex to work with must be its 'trace', that is, the one
that follows the newly created abstraction prefix λz. This slightly modified
version of normal order reduction will avoid the above trap as can be easily
verified by the reader. For the sake of simplicity we shall use the term
normal order to refer to this slightly modified version.
Normal order reduction can also be compared with the so called
demand-driven (or call by need) evaluation strategy which is usually defined
by the property that the argument(s) of a function are not computed until
their value becomes necessary for the computation of the given function.
This means that a 'function call' would not automatically trigger the eval-
uation of the argument(s). An argument is evaluated during the computa-
tion of the function (execution of the body) if and only if its value is
actually needed. Take, for instance, the following program:
let iterate = λf.λx.((&)x)((iterate)f)(f)x
let oddlist = ((iterate)(+)2)1
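In Haskell the same two definitions read as below; iterate is in fact a
standard function there, and laziness means oddlist is unfolded only as far
as it is demanded (primed names are ours).

    iterate' :: (a -> a) -> a -> [a]
    iterate' f x = x : iterate' f (f x)

    oddlist :: [Integer]
    oddlist = iterate' (+ 2) 1

    demo :: [Integer]
    demo = take 5 oddlist   -- [1,3,5,7,9]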
and then to
((+)20)(λx.(pred)x)((*)5)(succ)3
when the normal order is followed. This shows that in normal order re-
duction the subexpression
((*)5)(succ)3
will be copied in its original form and thus, it seems, it will be evaluated
twice. On the other hand, due to the Church-Rosser theorem, it can also
be evaluated before the copying occurs so that only its result will be copied.
That is why applicative order evaluation is considered more efficient. It is
similar to the call by value technique of passing parameters to subroutines,
while normal order reduction is comparable with the call by name tech-
nique.
Notice, however, that copying in graph representation is usually per-
formed by setting pointers to the same copy. At the same time, the first
evaluation of a shared subexpression makes its value available to all of its
occurrences. So, it seems that normal order graph reduction may represent
the combination of the best features of both worlds. It is safe, as far as
termination is concerned, and it can avoid some obviously redundant
computations.
There have been some studies about the relative efficiencies of various
implementation techniques for applicative languages, but there are no clear
winners. This should not be surprising at all, if we consider the generality
of the problem. We are dealing with the efficiency of the process of eval-
uating arbitrary partial recursive functions. Standard complexity theory is
clearly not applicable to such a broad class of computable functions. The
time-complexity of such a universal procedure cannot be bounded by any
computable function of the size of the input. Even if we restrict ourselves
to a certain subclass of general recursive, i.e. computable functions, say,
the class of deterministic polynomial time computable functions, the the-
oretical tools of complexity theory do not seem to help. Complexity theory
is concerned with the inherent difficulty of the problems (or classes of
problems) rather than the overall performance of some particular model
of a universal computing device.
A precise analytical comparison of different function evaluation tech-
niques is extremely difficult. A more practical approach is to apply some
statistical sampling techniques, as is usually done in the performance anal-
ysis of hardware systems.
Now, let us go back to the implementation of normal order graph re-
duction in our reduction machine. As we mentioned earlier, the search for
the leftmost redex corresponds to a depth first search in the expression
graph. Normal order reduction, however, cannot be done in a strictly left
to right manner, because the contraction of the leftmost redex may create
a new redex extending more to the left than the current one. A simple ex-
ample is the following:
makes the evaluation faster, but a strictly normal order is also feasible in
this case.
Strict and non-strict functions are treated alike. If an argument does
not have the proper form (type) then its evaluation gets started. But, after
each reduction step during the evaluation of the argument an attempt is
made at the application of the function to the partially evaluated argument.
Thus, the argument will be evaluated only to the extent that is absolutely
necessary for the application of the given function.
Observe the fact that this behavior of the reduction machine is a direct
result of the normal order reduction strategy, and no special tricks like
suspensions etc. are needed. This uniformly lazy evaluation strategy may
cause a significant loss in the efficiency when computing strict functions.
Improvements can be achieved by strictness analysis and some other tricks
which we do not discuss here. The whole issue of strictness vs. laziness
appears in a different light when the sequential model of computation is
replaced by parallel processing.
Exercises
6.1 Design an LL(1) parser for λ-expressions based on their syntax given
in Section 5.1. Supplement this parser by 'semantic actions' to produce the
graph-representation of the input expression.
6.2 Design an output routine to print a λ-expression in string format when
its graph is given as a directed acyclic graph.
6.3 Design graph-reduction rules for a direct implementation of each of the
following combinators (functions):
(a) append as defined in Section 4.3
(b) sum as defined in Section 4.3
(c) iterate as defined in Section 4.5
(d) curry as defined in Exercise 4.2
(e) Insert as defined in Section 5.3
(f) Y' and Y defined by the reduction rules
the expression. This is in sharp contrast with the explicit parallelism con-
trolled by the programmer via specific language constructs. Explicit
parallelism is based on the assumption that the programmer has a conscious
control over the events occurring simultaneously during the execution of
the program. This explicit control of parallelism may become extremely
difficult when the number of concurrent events gets very large. A con-
scious control of hundreds or even thousands of parallel processes could
place a tremendous burden on the programmer's shoulders. On the other
hand, it has been suggested by many experts that the implicit parallelism
of functional languages may offer a viable alternative to the programmer
controlled explicit parallelism used in imperative languages like Concurrent
Pascal or ADA.
The graph representation of λ-expressions described in the previous
chapter makes their structure more visible, which helps to determine the
interdependence of its subexpressions. Also, when searching for a redex,
we need to locate only a few of its nodes that are characteristic for the
redex in question. These characteristic nodes can be easily distinguished
from the rest of the graph and thus, even nested redexes can be contracted
simultaneously, provided that they have disjoint sets of characteristic
nodes.
The design of our parallel graph reduction strategy is based on a
multiprocessor model with the following assumptions:
(1) We assume that we have a shared memory multiprocessor sys-
tem where each processor can read and write in the shared mem-
ory.
(2) One of the processors will be designated as the master while
the others are called subordinate processors.
(3) Initially the graph representation of the input expression will
be placed in the shared memory. Then the master will start re-
ducing it in normal order.
(4) Whenever the master determines that a subexpression should
be reduced in parallel with the normal order then it will place that
subexpression in a work pool.
(5) The subordinate processors will send requests to the work
pool for subexpressions to be reduced. When a subordinate
processor is given a subexpression it will reduce it in normal order.
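A much-simplified sketch of this master/work-pool arrangement, as
described in the list above, can be written with ordinary threads and a
shared channel; here the 'subexpressions' are modelled as plain IO actions
(Haskell, all names ours, not the book's machine):

    import Control.Concurrent
    import Control.Monad (forM_, forever)

    main :: IO ()
    main = do
      pool <- newChan                      -- the shared work pool
      forM_ [1 .. 3 :: Int] $ \_ ->        -- three subordinate reducers
        forkIO $ forever $ do
          task <- readChan pool            -- request a piece of work
          task                             -- 'reduce' it (run the action)
      forM_ [1 .. 10 :: Int] $ \i ->       -- the master fills the pool
        writeChan pool (print i)
      threadDelay 100000                   -- crude: give the workers time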
In the case of a nonstrict function, some of the arguments are not al-
ways needed for the computation of the function value. But that may de-
pend on the value of the other arguments, which makes it impossible to tell
in advance which of the arguments should be evaluated and which should
not. (Take, for example, the multiplication as a nonstrict function in both
of its arguments, meaning that either argument may be undefined when the
other evaluates to zero.) Therefore, one can only speculate on the possible
need for evaluating those arguments before actually doing it.
The time spent on a speculative computation may turn out to be a
wasted effort only after the fact. In order to minimize the time and space
wasted on speculative computations, they have to be controlled very care-
fully. The point is that a strictly demand-driven evaluation strategy is in-
herently sequential and thus, it is very limited as far as parallel
computations are concerned. Speculative computations, on the other hand,
are risky, so they must be kept under control in order to avoid excessive
waste of time and/or space.
tation. Such a 'stop and go' technique is quite reasonable when we have
only one processor at hand.
The same technique can also be used with several processors. This
means that each processor would perform graph reduction concurrently
with the others until the free space is consumed. At that point they all
switch over to garbage collection and then the whole process is repeated.
The main advantage of this approach is that the graph will be frozen during
the marking phase. The only problem is that the processors must switch
simultaneously from the computing stage to the garbage collecting stage
and vice versa, and that may involve a great deal of synchronization over-
head.
Therefore, in a multiprocessor system it seems better to collect the
garbage 'on-the-fly', i.e. concurrently with reducing the graph. Some of
the processors can be dedicated to do garbage collection all the time while
others are reducing the graph. This approach will largely reduce the over-
head of task switching and global synchronization but marking an ever
changing graph is a much more difficult task than doing the same with a
static graph. (The graph behaves as a moving target for the marking
phase.) It is, in fact, impossible to mark precisely the graph when it keeps
changing all the time.
Fortunately, as already noted by Dijkstra et al. in [Dijk78], a precise
marking is not absolutely necessary for garbage collection. It is enough to
guarantee that all the active nodes get marked during the marking phase,
but it is not necessary that all garbage nodes be unmarked when the col-
lecting phase begins. In other words, it is sufficient to mark a 'cover' of the
graph in order to make sure that no active nodes are collected during the
collecting phase. Some of the garbage nodes may remain uncollected in
each collecting phase provided that they will be collected at some later
stage. To put it differently, every garbage node can have a finite 'latency'
period after being discarded and before getting collected.
This last observation was the key to the design of a new 'one-level
marking algorithm' due to Peter Revesz [RevP85]. His one-level garbage
collector works very well for directed acyclic graphs but it cannot collect
cyclic garbage. Unfortunately, as we mentioned before, cyclic graphs are
more efficient for representing recursive definitions than acyclic ones.
Therefore, we have decided to use directed cyclic graphs for representing
λ-expressions involving recursion and look for a more sophisticated gar-
bage collection technique for dealing with cyclic garbage. Cyclic garbage
is obviously much more difficult to find, because each node occurring in a
'cyclic garbage structure' has at least one parent (nonzero 'reference
count').
There are many on-the-fly garbage collection techniques available in
the literature that work for cyclic graphs. (See our bibliographical notes.)
It seems, however, that cyclic garbage structures do not occur very often
in our graph reducer, that is, the typical garbage structure tends to be
acyclic in our case. So, we have decided to combine the technique devel-
oped for acyclic graphs by Peter Revesz with a more elaborate technique
that can handle cyclic garbage. The algorithm developed by Dijkstra et al.
[Dijk78] appears to be the most convenient for our purpose.
Consider first the one-level garbage collector that works for directed
acyclic graphs. This algorithm requires only one scan of the graph memory
to find a 'cover' of the active graph.
Assume that the node space (graph memory) consists of an array of node
records which are indexed from 1 to N. Each node record has a one bit field,
called marker, that is used exclusively by the garbage collector. At any
point in time there are three kinds of nodes in this array: (1) reachable
nodes representing the active graph, (2) available nodes in the free list, and
(3) garbage nodes.
The free list is a linked list of node records that is treated as a double
ended queue. The 'root' node of the active graph, as well as the 'head' and
the 'last' of the free list must be known to the garbage collector. Initially
the marker of each node is set to zero. Marking a node means setting its
marker to one. Collecting a garbage node means appending it to the end
of the free list as its new 'last' element.
The marking phase of the garbage collector starts by marking the
root node of the graph and the head of the free list. Then it scans
the node space once from node[ 1] to node[N], meanwhile marking
the children of every node.
This means that every node having at least one parent will be marked, re-
gardless of the marking of its parent(s). Thus, all reachable nodes as well
as the free nodes will be marked, i.e. included in the cover. Garbage nodes
having at least one parent will also be marked. Note, however, that if there
is any acyclic garbage then it must have at least one node without a parent.
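The scan itself can be sketched in a few lines, assuming the node space is
an array in which each entry lists the indices of its children; a node ends
up marked exactly when it is the root, the head of the free list, or has at
least one parent anywhere in the array (Haskell, names ours).

    import Data.Array

    oneLevelMark :: Int -> Int -> Array Int [Int] -> Array Int Bool
    oneLevelMark root freeHead nodes =
      accumArray (||) False (bounds nodes) $
        (root, True) : (freeHead, True) :
        [ (c, True) | (_, cs) <- assocs nodes, c <- cs ]
    -- an unmarked node has no parent at all: first-level garbage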
The collecting phase scans the entire node space once, and collects
the unmarked nodes while resetting the marker of every node to
zero.
As we said before, a totally orphaned node (first level garbage) will be left
unmarked during the marking phase. Hence, it will be collected imme-
diately during the following collecting phase. It may, however, have many
descendants which are latent garbage at that point. When the collector
collects the orphans then their children become orphans (except for those
having other parent(s), as well), and this will be repeated until the entire
garbage structure is collected. This means that no latent garbage can be
lost forever but the length of the latency period depends on the depth of
the garbage structure itself.
It is also possible that more than one processor is dedicated to the task
of garbage collection in which case the node space will be subdivided into
equal intervals each of which is being scanned by one of those processors.
The two phases (marking and collecting) of these parallel garbage collec-
tors must be synchronized in this case.
Remember that the garbage collector works in parallel with the graph
reducers. This means that nodes may be discarded concurrently with the
execution of the marking phase as well as of the collecting phase. If a node
becomes an orphan during the marking phase after one of its previous
parents has already been scanned, then it obviously remains marked
through the end of the marking phase. Its marker will be reset to zero only
during the collecting phase that follows, but it will not be collected at that
time. During the next marking phase, however, it will obviously remain
unmarked, hence, it will be collected afterwards. In other words, all first
level garbage that exists at the beginning of a marking phase will be col-
lected during the immediately following collecting phase. This means that
all nodes of an acyclic garbage structure will become orphans sooner or
later and thus, they all will be collected eventually by this method.
The insertion of new nodes into the graph while the garbage collector
is working is another matter. The expression graph and the free list contain
all nodes that are reachable either from the root of the expression graph
or from the head of the free list. So, the free list can be treated as part of the
active graph. This means that the free nodes are also considered reachable,
hence, removing a node from the free list and attaching it to the expression
graph is the same as removing an edge from the graph and inserting another
edge between two reachable nodes. The reduction rules can also be de-
composed into such elementary steps that the whole process of graph re-
duction consists of a series of elementary steps each being either (i)
removing an edge or (ii) inserting a new edge between two reachable
nodes. The question is how these elementary transformations affect the
on-the-fly garbage collector.
First of all, the insertion of a new edge between two reachable nodes
does not change the reachability of any of the nodes. On the other hand,
the removal of an edge may create an orphan. If that node is being dis-
carded at that point, then we have no problem. But, if the same node is also
the target node of a new edge inserted in the process then strange things
can happen. The problem is dealt with in the paper [Dijk78], and it can be
illustrated by the example shown in Figure 7.1.
Figure 7.1
Here we assume that the edge from B to D is inserted and the edge from
C to D is removed by the reducer. Assume further that the marking algo-
rithm scans these nodes in alphabetic order. Scanning A results in marking
both B and C. But, if B is scanned before the edge from B to D is inserted,
and C is scanned after the edge from C to D is removed then D will never
be marked.
Note that the order in which the insertion and the removal of these
edges occur does not matter, because both can happen during the time
between scanning B and scanning C. Node D may thus be missed by the
marking algorithm even if it has never been disconnected from the graph.
In order to correct this situation we have to place the following demands
on the reducer(s):
(a) New edges can be inserted by the reducer only between already
reachable nodes.
(b) The target node of a new edge must be marked by the reducer as
part of the uninterruptable (atomic) operation of inserting the edge.
The first requirement prevents any reachable node from being 'temporarily
disconnected'. The second requirement prevents the target node of a new
edge from being collected during the first collecting phase that gets to this
node after the new edge has been inserted. (The target node must survive
until the beginning of the next marking phase in order to be safe.)
These requirements can be easily satisfied by the implementation of
the graph-reduction rules. (The order of inserting new edges while remov-
ing others must be chosen carefully.)
The only problem with this technique is that it cannot collect cyclic
garbage. For that purpose we could use some other technique as an emer-
gency procedure only when necessary. But, a closer analysis of the situ-
ation has shown us that the one-level garbage collector can be combined
with others in a more efficient manner. The one-level garbage collector
places the same demands on the reducer(s) as does the so-called DMLSS
technique described in [Dijk78]. Therefore, the latter seems to be a natural
choice for such a combination. Furthermore, the basic loop of the marking
phase in each technique consists of a single scan through the node space.
During this scan the children of the nodes are marked. So, the basic loops
of the two marking algorithms can be merged killing two birds with one
stone [RevP85].
To show how this works, let us summarize the DMLSS technique. Its
marking phase makes use of three different markings, say colors, white,
yellow, and red. Initially all nodes are white. The marking phase begins with
coloring the root of the graph and the head of the free list yellow. Then the
marking phase would try to pass the yellow color of a parent to its children
and, at the same time, change the parent's color to red. The purpose of this
is to color all reachable nodes red while using yellow only as an intermedi-
ate color representing the boundary between the red and the white
portions of the graph in the course of propagating the red color to the en-
tire graph. The marking process is finished when the yellow color disap-
pears. At that point all reachable nodes are red.
The converse of the last statement is obviously false, because there
may exist red nodes that became garbage during the marking phase after
they were colored red. This means that the marking is not precise, but that
is not a problem as long as it 'covers' the graph. For a detailed proof of the
correctness of the algorithm we refer to the original paper [Dijk78] or to
[BenA84].
The reducer(s) must color the target node of a new edge yellow as part
of the atomic operation of insertion. (See demand (b) above.) More pre-
cisely, it should be colored yellow if it is white and it should retain its color
if it is yellow or red. To simplify the description of this operation we use
the notion of shading a node, which means coloring it yellow if it is white,
and leaving its color unchanged if it is yellow or red. So, the basic loop of
the marking phase will have this form:
counter := 0;
FOR i := 1 TO N DO
  IF node[i] is yellow THEN
    BEGIN
      color node[i] red;
      counter := counter + 1;
      shade the children of node[i]
    END
This basic loop will be repeated as long as there are any yellow nodes in the
graph. When no more yellow nodes are left (counter = 0) then the col-
lecting phase is executed as a single sweep through the node space in which
the white nodes are collected and the color of each node is reset to white.
Normally, the marking phase of the DMLSS technique takes several
iterations. Nevertheless, its basic loop can be combined with that of the
one-level marking algorithm by using a three bit marker field for each node,
where the one bit marker for the one-level algorithm and the two bit
marker for the three colors of the DMLSS algorithm will be stored side by
side. Let us use the colors white and blue for marking with the one-level
algorithm. These can be combined with the three colors of the DMLSS al-
gorithm as follows: the extra blue bit is stored alongside the DMLSS
color, so that a blue-and-yellow node appears green while a blue-and-red
node appears purple. When the marking is complete, the collecting
phase of the DMLSS algorithm is executed, which collects all nodes except
the purple ones.
the purple ones.
The combination of these two techniques has some interesting prop-
erties. Consider the case when a yellow node gets discarded by the reducer
and most, or all of its descendants become garbage as a result. The
DMLSS algorithm would not notice this fact during its protracted marking
phase. Therefore, it will color this node red and keep propagating the color
to all of its descendants. The one-level collector, however, will recognize
it as first level garbage in the next iteration of the basic loop and then col-
lect it immediately thereafter.
Now, the combined algorithm will color this node red rather than pur-
ple during the next iteration of the basic loop. (Its 'blue bit' remains zero
because now it is an orphan.) At the same time, its children will be shaded
with green. Then the node itself will be collected during the next collecting
phase of the one-level collector, leaving its children orphans. But, these
children will retain their yellow or red colors after resetting their 'blue bits'.
Therefore, the propagation of the color continues through the entire gar-
bage structure one step ahead of the one-level collector. If the garbage
structure in question has no cycles then it will be collected by the one-level
collector by the time the DMLSS algorithm is finished with its marking
phase. The DMLSS algorithm would need another complete marking phase
in order to collect this garbage structure. Garbage pick-up is more evenly
distributed in time with the one-level collector.
The idea of combining his superficial one-level garbage collector with
a slow but thorough one is due to Peter Revesz [RevP85]. For more details
on this and other garbage collection techniques we refer to the literature.
To conclude this section we have to emphasize that the free list re-
presents an important interface between the garbage collector and the
graph reducer. It is implemented as a shared queue which can be updated
by both the reducer(s) and the collector(s). The reducers are the consumers
and the collectors are the producers of the free list. The contention among these
processes for the shared queue must be handled very carefully in order to
achieve maximum efficiency. For more details on shared queue manage-
ment techniques see [Gott83] or [Hwan84].
The answer to the first question is relatively simple. The control stack of
the master represents the main thread of the computation while the work
pool represents the pending tasks for possible parallel computations. Each
reducer will have a status bit to tell if it is busy or idle. When busy, it will
also store the address of the expression that is being reduced by it.
Whenever a subordinate processor starts working on a subexpression,
it will insert a special node into the graph in order to alert other processors
which bump into this subexpression while traversing the graph in normal
order. This scheme was devised by Moti Thadani [Thad85] who also de-
veloped the basic version of this control strategy for parallel graph re-
duction. The extra node is called a 'busy signal' node, which contains
information about the processor currently working on the subexpression.
Otherwise, this node is treated as an indirection node.
Now, the question is what happens when two processors are trying to
reduce the same subexpression. Two cases must be distinguished:
(a) The master bumps into a subexpression currently being reduced
by a subordinate processor.
(b) A subordinate processor bumps into a subexpression currently
being reduced by another processor.
In case (a) the master will stop the subordinate processor and take over the
reduction of the subexpression as it is. In case (b) the processor already
working on the subexpression will continue its work and the other
processor will stop, i.e. go back to find some other task from the work pool.
A subordinate processor must be halted also when the master discov-
ers that it performs useless computation. This can happen, for example,
when a β2-redex is contracted which throws away the operand. The busy
signal node is quite helpful in this case, because it holds the identifier of the
processor to be stopped. Note that a subordinate processor cannot initiate
other processes, so it has no offspring to worry about when killing it. Of
course, the busy signal node must be eliminated from the graph when the
corresponding processor stops.
We must observe that subexpressions may be shared and thus, the
subexpression discarded in a β2 step may still be needed later on. Never-
theless, it is better to stop evaluating it after the β2 step, because no effort
that may have already been spent on it will be wasted. The intermediate
result in the form of a partially reduced graph is always reusable.
APPENDIX A
A PROOF OF THE CHURCH-ROSSER THEOREM
The original proof of the Church-Rosser theorem was very long and
complicated [Ch-R36]. Many other proofs and generalizations have been
published in the last 50 years. The shortest proof, known so far, is due to
P. Martin-Löf and W. Tait. An exposition of their proof appears as Ap-
pendix 1 in [Hind72] and also in [Hind86].
We present below an adaptation of this proof to our definitions of re-
naming and substitution as given in Section 2.2. These definitions are
slightly different from the standard ones. By using these definitions we can
ignore the so called a-steps in the proof. In order to make sure that our
proof is correct, we have worked out most of the details which are usually
left to the reader as an exercise. Therefore, our proof appears to be longer
but it is not really so.
The main idea of the proof is to decompose every β-reduction into a
sequence of complete internal developments, and show that the latter have
the diamond property. Then the theorem can be shown by induction on the
length of this sequence. The definition of a complete internal development
is based on the notion of the residual of a β-redex.
Definition A.1 (Residual) Let R and S be two occurrences of
β-redexes in a λ-expression E such that S is not a proper part of
R. Let E change to E' when R is contracted. Then, the residual of
S with respect to R is defined as follows:
Lemma A.1 If S ↝ S' then {z/y}S ↝ {z/y}S' for any variables y
and z.
Proof We use induction on the number of occurrences of vari-
ables in S, where the occurrence of a bound variable next to its
binding λ will also be counted. (This is basically the same as in-
duction on the length of S.)
If S is a single variable then there is nothing to prove.
If S has form λx.P then there is some P' such that P ↝ P' and
S' ≡ λx.P'. Hence, the assertion follows immediately from the in-
duction hypothesis and Definition 2.2.
If S has form (P)Q then two subcases arise:
Case a: Every β-redex selected for the given complete internal devel-
opment is in P or Q. In this case there exist λ-expressions P' and
Q' such that P ↝ P', Q ↝ Q', and S' ≡ (P')Q'. But then the in-
duction hypothesis gives us the following complete internal de-
velopment:
{z/y}S ≡ {z/y}(P)Q ≡ ({z/y}P){z/y}Q ↝
({z/y}P'){z/y}Q' ≡ {z/y}(P')Q' ≡ {z/y}S'.
λu.[[N/x]Q/x]{u/v}P ≡ λu.[N/x][Q/x]{u/v}P ≡
[N/x]λu.[Q/x]{u/v}P ≡ [N/x][Q/x]λu.{u/v}P ≡
[N/x][Q/x]λv.P
which was to be shown.
Finally, if S has form (E)F then the assertion follows easily
from the induction hypothesis. Namely,
[[N/x]Q/x](E)F ≡ ([[N/x]Q/x]E)[[N/x]Q/x]F ≡
([N/x][Q/x]E)[N/x][Q/x]F ≡
[N/x]([Q/x]E)[Q/x]F ≡ [N/x][Q/x](E)F
[[N/x]Q/y][N/x]λv.P ≡ λu.[[N/x]Q/y][N/x]{u/v}P ≡
λu.[N/x][Q/y]{u/v}P ≡ [N/x][Q/y]λv.P
Finally, if S has form (E)F then the assertion follows easily from
the induction hypothesis.
Part (c): If S is a single variable then x ∈ φ(S) implies S ≡ x, for
which the assertion is trivial.
If S has form λv.P then v ≢ x must be the case. Then we can
choose some variable u such that u is neither free nor bound in P
and u ∉ φ(N) ∪ φ(Q) ∪ {x,y,z,v}. Now, the induction hypothesis
gives us
[[N/x]Q/z][N/x]{z/y}λv.P ≡
[[N/x]Q/z][N/x]{z/y}λu.{u/v}P ≡
λu.[[N/x]Q/z][N/x]{z/y}{u/v}P ≡
λu.[N/x][Q/y]{u/v}P ≡
[N/x][Q/y]λu.{u/v}P ≡ [N/x][Q/y]λv.P
Finally, if S has form (E)F then the assertion follows easily from
the induction hypothesis, and this completes the proof.
Lemma A.3 If M ↝ M' and N ↝ N' then for any variable x
[N/x]M ↝ [N'/x]M'.
Proof. We use induction on the construction of M.
If M is a variable then the assertion is trivial.
If M is of the form λy.S then M' ≡ λy.S' for some S' with S
↝ S'. Three subcases arise:
Case A: x = y. Then the following is a complete internal devel-
opment:
[[N'/x]Q'/y][N'/x]S' ≡ [N'/x][Q'/y]S' ≡
[N'/x]M'.
Subcase 2(c): x ≢ y, and x ∈ φ(S') and y ∈ φ(N'). Then,
by part (c) of Lemma A.2, the following is a complete
internal development:
[N/x](λy.S)Q ≡ ([N/x]λy.S)[N/x]Q ↝
([N'/x]λy.S')[N'/x]Q' ≡
(λz.[N'/x]{z/y}S')[N'/x]Q' →
Figure A.1
Figure A.2
APPENDIX B
INTRODUCTION TO TYPED λ-CALCULUS
is obviously real regardless of the type of x, unless it is in error. So, the in-
formation supplied by an explicit type assignment to the variables may turn
out to be redundant, which makes type checking interesting. It is, in a
sense, the confrontation of a 'specification' with the actual program.
Definition B.2 is actually a set of inference rules to determine the type
of typed λ-expressions. These inference rules can also be used for inferring
the type of certain variables occurring in a typed λ-expression. For in-
stance, the type of x in
λx.(succ)x
must be int, because the function succ is of type int→int. So, the type of a
λ-expression may be well-defined even if no explicit type assignment is
given to some of its variables. Unfortunately, it is very difficult to deter-
mine in general which of the variables need not be explicitly typed. Many
sophisticated type inference systems have been developed for typed pro-
gramming languages. (See, e.g. [MacQ82], [Mart85], and [Hanc87].)
while
[λx.E1, ..., λx.En]
is of type
Similar rules can be given for map and append, so the type of an application
involving these operators can be determined from the types of the oper-
ands.
It should be emphasized that the type of a λ-expression depends on the
type of its components. Therefore, the type of a λ-expression will be com-
puted 'inside-out' rather than in normal order. This means that every sub-
expression of a well-typed λ-expression must itself be well-typed. To put
it differently, 'meaningful' λ-expressions cannot have 'meaningless' sub-
expressions.
A completely formal calculus on types can be developed along these
lines:
SYNTAX OF TYPE-DESCRIPTORS
<type-descriptor> ::= <ground-type> | <abstraction-type> |
                      <application-type> | <list-type> |
                      <operator-type> | <union-type>
<ground-type> ::= int | real | boolean
<abstraction-type> ::= <type-descriptor> → <type-descriptor>
<application-type> ::= (<type-descriptor>) <type-descriptor>
<list-type> ::= [] | [<type-descriptor> <list-type-tail>
<list-type-tail> ::= ] | , <type-descriptor> <list-type-tail>
<operator-type> ::= + | - | * | / | < | ≤ | = | ≥ | > | ≠ | ^ | ~ | &
<union-type> ::= <type-descriptor> ∪ <type-descriptor>
This syntax corresponds to an extended version of Definition B.l. The
most significant extension is represented by the application-type formed
with two arbitrary type-descriptors. The purpose of type checking is now
to determine whether or not the types involved in an application actually
match. The operator-type is used only for the overloaded operators, whose
types depend on their context.
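The syntax translates directly into a data type; a Haskell sketch
(constructor names are ours):

    data Ty
      = TInt | TReal | TBool   -- ground types
      | Arrow Ty Ty            -- abstraction-type
      | TApp Ty Ty             -- application-type, still to be simplified
      | TList [Ty]             -- list-type
      | Op String              -- operator-type of an overloaded operator
      | Union Ty Ty            -- union-type
      deriving (Eq, Show)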
(∂)not = boolean→boolean
(∂)null = ℓ→boolean
(∂)x = τ if x is a variable of type τ
((+)int)int = int
((+)int)real = real
((+)real)int = real
((+)real)real = real etc ...
((boolean)τ)σ = τ∪σ
(^)[] = []
(~)τ = τ
((&)τ)[] = [τ]
(int→boolean)int
will be simplified as boolean. After performing all possible simplifications
we obtain the following type equation:
τ = int→((boolean)int)(int→int)(τ)int
where τ is the only variable. A possible solution to this equation is
τ = int→int
which clearly satisfies the equation. The existence of a solution to the type
equation does not necessarily imply the existence of a well-defined recur-
sive function satisfying the given definition. If, for example, in the above
definition of fact we replace the pred function by the succ function then
we get the same type equation, but the function in question is undefined
for n > 0. Therefore, type checking is not fool-proof.
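A few of the simplification rules can be sketched over the Ty type above;
the first clause is the one used in the example, simplifying an application
of an abstraction-type to a matching argument type.

    simp :: Ty -> Ty
    simp (TApp (Arrow a b) c) | a == c      = b   -- (int→boolean)int = boolean
    simp (TApp (TApp (Op "+") TInt)  TInt)  = TInt
    simp (TApp (TApp (Op "+") TInt)  TReal) = TReal
    simp (TApp (TApp (Op "+") TReal) _)     = TReal
    simp (TApp (TApp TBool t) s)            = Union t s
    simp t                                  = t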
Note that the combinators true and false can also be treated as over-
loaded operators. Other 'type-free' combinators can be treated in a similar
fashion. For instance, the identity combinator I may be defined for every
typed λ-expression E with the property
(I)E = E for every typed λ-expression E,
and with the simplification rule
(I)τ = τ for all τ ∈ Typ
The same technique can be used also for the Y combinator, which repres-
ents, perhaps, the simplest solution to the problem of recursive definitions
in typed A-calculus.
BIBLIOGRAPHICAL NOTES