
Lambda-Calculus, Combinators, and Functional Programming

Cambridge Tracts in Theoretical Computer Science

Managing Editor: Professor C. J. van Rijsbergen, Computing Science Department,


University of Glasgow, Glasgow, Scotland

Editorial Board:
S. Abramsky, Department of Computing Science, Imperial College of Science and
Technology, London
P. H. Aczel, Department of Computer Science, Manchester
J. W. de Bakker, Centrum voor Wiskunde en Informatica, Amsterdam
J. A. Goguen, Computing Laboratory, University of Oxford
J. V. Tucker, Department of Computer Science, University of Swansea

Titles in the Series


1. G. J. Chaitin Algorithmic Information Theory
2. L. C. Paulson Logic and Computation
3. J. M. Spivey Understanding Z
4. G. E. Revesz Lambda Calculus, Combinators and Functional Programming
5. S. Vickers Topology via Logic
6. A. M. Ramsay Formal Methods in Artificial Intelligence
7. J. Y. Girard Proofs and Types
LAMBDA-CALCULUS,
COMBINATORS, AND
FUNCTIONAL
PROGRAMMING

G. E. REVESZ

IBM Thomas J. Watson Research Center, Yorktown Heights, New York

The right of the
University of Cambridge
to print and sell
all manner of books
was granted by
Henry VIII in 1534.
The University has printed
and published continuously
since 1584.

CAMBRIDGE UNIVERSITY PRESS

Cambridge

New York Port Chester

Melbourne Sydney
Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
40 West 20th Street, New York, NY 10011, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Cambridge University Press 1988

First published 1988


Reprinted 1989

Printed in Great Britain by Billings & Sons Limited, Worcester

British Library cataloguing in publication data

Revesz, Gyorgy E.
Lambda-calculus, combinators
and functional programming.-(Cambridge tracts
in theoretical computer science; v. 4).
1. Lambda calculus
I. Title
511.3 QA9.5

ISBN 0 521 34589 8


CONTENTS

Preface vii

1. Introduction
1.1 Variables and functions in mathematics and in programming
languages
1.2 Domains, types, and higher-order functions 6
1.3 Polymorphic functions and Currying 10

2. Type-free lambda-calculus
2.1 Syntactic and semantic considerations 14
2.2 Renaming, α-congruence, and substitution 18
2.3 Beta-reduction and equality 23
2.4 The Church-Rosser theorem 25
2.5 Beta-reduction revisited 28

3. Combinators and constant symbols


3.1 Lambda-expressions without free variables 35
3.2 Arithmetic and other constants and combinators 39
3.3 Recursive definitions and the Y combinator 43
3.4 Elimination of bound variables: bracket abstraction 49

4. List manipulation in lambda-calculus

4.1 An extension of the syntax of λ-expressions 59
4.2 Additional axioms for list manipulation 63
4.3 List manipulating functions 66

4.4 Dealing with mutual recursion 72


4.5 Computing with infinite lists 76

5. Rule-based semantics of λ-expressions


5.1 Program transformation as a computation 84
5.2 Controlled reduction 89
5.3 The functional programming system FP 94
5.4 Translation of functional programs to λ-calculus 99
5.5 List comprehension in Miranda 103

6. Outlines of a reduction machine


6.1 Graph representation of λ-expressions 113
6.2 The instructions of a reduction machine 118
6.3 Implementation of primitive functions 124
6.4 Demand-driven evaluation 128

7. Towards a parallel graph-reduction


7.1 Harnessing the implicit parallelism 135
7.2 On-the-fly garbage collection 138
7.3 Control of parallelism 147

Appendix A
A proof of the Church-Rosser theorem 152

Appendix B
Introduction to typed λ-calculus 162

Bibliographical notes 171

References 176
PREFACE

There is a growing interest in lambda-calculus and combinators among
computer scientists. The appeal of these abstract mathematical theories is
due to their elementary yet quite universal nature. They seem to have suc-
cessfully captured the most general formal properties of the notion of a
mathematical function, which, in turn, is one of the most powerful con-
cepts of modern mathematics.
The relevance of lambda-calculus to computer science is quite obvious
in many cases. For instance, the design of the programming language LISP
has been largely influenced by lambda-calculus. Also, the call by name
mechanism of parameter correspondence used in ALGOL-like languages
is closely related to the operation of formal substitution as defined pre-
cisely in lambda-calculus. The same is true for the textual substitution
performed by macro generators. More recently John Backus and other
proponents of functional style programming have strongly emphasized the
importance of the function concept in computer science. Corresponding
research efforts to develop efficient implementation techniques for func-
tional languages have produced many interesting results in the theory of
combinators as well as in lambda-calculus.
An explicit and systematic use of lambda-calculus in computer science
was initiated by Peter Landin, Christopher Strachey, and a few others who
started the development of a formal theory of semantics for programming
languages based directly on lambda-calculus. Their approach is now called
denotational semantics and it is widely accepted as a relevant theory. At
first, however, denotational semantics was thought to be flawed by some
experts, because of its naive use of the type-free lambda-calculus, which
did not seem to have any consistent mathematical model at that time. The
first mathematical model for the type-free lambda-calculus was discovered
by Dana Scott only in 1969 when he was trying to prove the nonexistence
of such models. Since then, lambda-calculus and the theory of combinators
have experienced a vigorous revival, which must have surprised even their
originators, Alonzo Church and Haskell B. Curry, who started their devel-
opment in the early 30s.
Lambda-calculus may also play a more significant role in the theory
of parallel algorithms in the future. It represents an abstract mathematical
model of computation which is equivalent to the Turing machine. But,
while the Turing machine is basically a sequential device, lambda-calculus
can retain the implicit parallelism that is normally present in a mathematical
expression. In pure mathematics, there is no such thing as the 'side effect'
of a computation and thus, independent subexpressions may be computed
in parallel. We can take advantage of this type of parallelism when using
lambda-calculus.
This book approaches lambda-calculus and the theory of combinators
from a computer scientist's point of view. It presents only their funda-
mental facts and some of their applications to functional programming in
a self-contained manner. But, in contrast with the usual treatment of
lambda-calculus in computer science texts, we present formal proofs for
most of the results which are relevant to our discussion. At the same time,
lambda-calculus is treated here as a functional language implemented by
software on a conventional computer. It can also be implemented by hard-
ware in a nonconventional computer whose machine code language would
be lambda-calculus by itself. As a matter of fact, there already exist ex-
perimental machines whose instruction sets are fashioned after lambda-
calculus and/or combinators.
As functional programming is coming of age, several more new hard-
ware designs aimed at a direct execution of functional programs are likely
to appear in the future. Some variant of lambda-calculus will certainly be
used in most of them. Future generations of computers will probably have
the speed and the necessary sophistication built in the hardware that may
make them capable of running functional programs more efficiently, with-
out placing the burden of controlling the parallelism on the programmer's
shoulders.
As can be seen from the related literature, lambda-calculus represents
the theoretical backbone of functional languages. Therefore, a systematic
study of functional languages is hardly imaginable without a proper
understanding of lambda-calculus, which is, in fact, the archetype of all
functional or applicative languages. Many of the extra features of those
languages can be treated in lambda-calculus as merely 'syntactic sugar'.
A typical example is the composition of functions in Backus's FP system,
or the ZF expressions (also called list comprehensions) in Miranda. Those
features are, of course, very useful for the design of a practical functional
language. The same is true for the various type systems which help program
debugging at compile-time. The type-free lambda-calculus by itself is not
a user oriented source language. It is rather a high-level machine oriented
language, which is an ideal target language for the compilation of various
functional languages.
The first three chapters of this book cover the fundamentals of type-
free lambda-calculus. A special feature of the book is a new axiom system
for beta-reduction presented in Section 2.5. This axiom system forms the
basis of the graph-reduction technique discussed in Chapter 6, which is
used for evaluating lambda-expressions. Another unique feature of the
book is a direct extension of the lambda-notation to allow lists as primitive
objects. This extension has been inspired by the so called 'construction',
which is one of the combining forms in Backus's FP. An interesting fact
about this combining form is that its defining property cannot be derived
from the standard axioms of lambda-calculus. But, since it is independent
of those axioms, we can add this property as a new axiom to the system.
This way we obtain an extended lambda-calculus which is very convenient
for list processing in a type-free manner.
The list-oriented features of our extended lambda-calculus are de-
scribed in Chapter 4. These features allow for a very elegant treatment of
mutual recursion. The discussion of semantics in Chapter 5 is based on the
idea of program transformations using conversion rules. This leads to a
formal definition of the semantics of the FP system designed by John
Backus, since functional programs are easy to translate to the extended
lambda-notation. Imperative programs are more difficult to translate, in
general, unless they are 'well-structured'. This represents, by the way, an
interesting justification for structured programming.
Many of the implementations proposed in the literature for functional
languages make extensive use of lambda-calculus and/or combinators. In
Chapter 6 of our book we describe a graph-reduction based interpreter for
our extended lambda-calculus, which has been implemented on the IBM
PC and then ported to some other IBM machines. A parallel version of the
interpreter is currently being tested on a simulator of a shared memory
multiprocessor machine. All the examples and the exercises included in this
book have been checked out with the aid of the sequential interpreter,
which can be used, by the way, as a software tool for a hands-on course
on lambda-calculus. It can also be used for implementing other functional
languages by translating them into our extended lambda-notation.
We hope that this book will help many computer scientists get ac-
quainted with the basics of lambda-calculus and combinators, which, only
a short while ago, were not fully appreciated by many mathematicians.
The bibliographical notes at the end of the book are far from being com-
plete. Nevertheless, they may serve as a starting point for a more thorough
literature research in some selected areas by the reader.
The author gratefully acknowledges the support by the Thomas J.
Watson Research Center of IBM, which provided the time and the tools,
but most importantly the stimulating environment for writing this book.
CHAPTER ONE

INTRODUCTION

1.1 Variables and functions in mathematics and in programming languages

The use of variables in conventional programming languages is quite dif-
ferent from their use in mathematics. This has caused a great deal of diffi-
culty in the definition of the meaning of a variable in programming
languages like Fortran and Pascal. It is indeed very difficult to define ex-
actly the meaning of a variable that is subject to various assignments within
the scope of its declaration.
In a given mathematical context, every occurrence of a variable usually
refers to the same value. Otherwise it would be impossible to talk about the
solution of an equation like

x² = 2x + 3
For what can we say about the possible values of x, if we do not require
that x represents the same value on each side of the equation?
This is in sharp contrast with the assignment statement in conventional
programming languages. In Fortran, for example, one can write

X = X + 1

knowing that the two occurrences of the variable X represent different
values.
In mathematics we would use different variables or perhaps, different
subscripts to distinguish the two values. Indeed, the assignment statement
represents a new definition of the variable x, which must not be confused
with its previous definition even if it has the same name.
The difference between the effect of an assignment statement and that
of a new declaration of the same variable in a nested block (in a block-
structured language) is quite substantial. The block structure clearly delin-
eates the part of the program where the declaration is valid. This is called
static scoping, since the scope of a declaration can be determined by the
examination of the program text at compile time.
On the other hand, the scope, i.e. the lifetime of the definition of a
variable established by an assignment statement will be terminated either
by another assignment to the same variable or by leaving the block in
which the variable is declared. Therefore, a mathematical description of the
assignment statement is far from being trivial. One has to consider the dy-
namic order, i.e. the execution order of the statements in the program. But
that may vary from one execution to the other depending on the input data.
So, in fact, one has to consider all possible execution sequences to deter-
mine the scope of an assignment statement.
The assignment statement has no counterpart in standard mathemat-
ics. Still, the use of variables is not problem free. First of all, we have to
distinguish between two entirely different ways of using variables in
mathematical texts:
The first corresponds to the mental process of direct generalization.
For example, we may think of an arbitrary natural number and denote it
by the variable n, which is a direct generalization of the natural numbers,
1,2,3, ... , by abstracting away from their particular values and considering
only those properties which they have in common. With this, however,
we maintain that n represents a fixed but otherwise arbitrary number
throughout our discussion. It may, in fact, be implicitly defined by the
equations and/or other relations we specify in our discussion. A variable
used in this way is said to occur free in the given context.
Another way of using variables in mathematics is to let them run
through a certain set of values. This is the case with the variable of inte-
gration in a formula like

∫₀¹ f(x) dx

In this case the value of x is supposed to vary from 0 to 1 exhausting the
set of all real numbers from the closed interval [0, 1]. Another example is
the use of the variable x in the following formula of predicate calculus:

∀x[(x + 1)(x − 1) = x² − 1]

where the domain of x must be clear from the context. (Otherwise the
domain should be explicitly stated in the formula itself.) Also, the
existential quantifier in a formula like

∃x(x² − 5x + 6 = 0),
expresses a property of the set of all possible values of x rather than a
property shared by each of those values. A common feature of these for-
mulas is the presence of some special symbol (integral, quantifier, etc.)
which makes the formula meaningless, if the corresponding variable is re-
placed by a constant. (The formula VS[ (5 + 1) (5 - 1) = 5 2 - 1] does not
make much sense.) The variable in question is said to be bound in these
formulas by the special symbol.
Now, the problem is that the distinction between the free and the
bound usages of variables is not always obvious in every mathematical text.
Quantifiers, for example, may be used implicitly or may be stated only
verbally in the surrounding explanations. Also, a variable which is used as
a free variable in some part of the text may be considered bound in a larger
context.
The situation is even more confusing when we identify a function with
its formula, without giving it any other name. If we say, for instance, that
the function x³ − 3x² + 3x − 1 is monotonic, or continuous, or its first
derivative is 3x² − 6x + 3, then we consider the expression as a mapping
and not as the representation of some value. So, the variable x is treated
here as a bound variable without explicitly being bound by some special
symbol. As we shall see later, a major advantage of the lambda-notation is
that it forces us to make the distinction between the free and the bound
usages of variables always explicit. This is done by using a special binding
symbol 'λ' for each of the bound variables occurring in an expression.
Hence, we can specify a function by binding its argument(s), without giv-
ing it an extra name. The keyword 'lambda' is used for the same purpose
in LISP. However, functions in programming languages are usually defined
by function declarations where they must be given individual names.
Unfortunately, conventional programming languages treat function
declarations and variable declarations in two different ways: A function
declaration is more like a constant definition than a variable declaration, be-
cause it explicitly assigns a given procedure to a function name. Hence, we
cannot have function type variables denoting arbitrary functions chosen
from a given set. All we can do is to use arbitrary names for individually
specified functions. To put it differently, functions are treated as second
class citizens in most programming languages. They cannot occur on the
left hand side of an assignment statement and cannot be returned as the
result of a computation. Here we are talking about assignment statements
of the form f := g where both f and g would be function names. This would
amount to the assignment of the procedure itself used for the computation
of g to the function name f. (This is not the same as using the function
name to hold the value to be returned by the function for some given ar-
gument.) It is possible to pass a function name as a parameter to a proce-
dure or function in Pascal, but function names can never be used as output
variables or get their value in a read statement from a file. This clearly
shows that they are treated differently from other variables.
There are no such restrictions in mathematics. Variables denoting
functions are usually treated in the same manner as any other variables.
They may, of course, have different domains, which is the subject of the
next section. We must add, however, that the history of mathematics has
also seen a substantial evolution of the function concept.
The original concept of a function was based on a computational pro-
cedure that would be used for computing its value for any given argument.
This is the same as it is used today in most programming languages where
each function is uniquely defined by a function procedure.
The development of the set theory, however, has replaced this proce-
dural approach by a purely extensional view:
The set theoretical function definition postulates only the existence
of a function value for any given argument without saying anything
about the way of computing it.
This view emphasizes the extension of a function as if it were completely
given at once. This static, existential view of a function is in sharp contrast
with its dynamic, procedural view.
The procedural view presupposes the existence of a finite description
of the function in terms of a computing procedure.
Here the set of values is not assumed to be immediately available, but it can
be produced dynamically by computing more and more elements from it.
This view is also called intensional, as opposed to the extensional view of
functions.
The relationship between these two different views of functions is far
from being trivial. It is obvious that different procedures may compute the
same extensional function. But, in general, it is undecidable whether two
procedurally defined functions are extensionally equal.
Definition 1.1 Two functions are extensionally equal if and only if
they have the same value for every argument. In symbols, f = g
iff f(x) = g(x) for all x.
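To make the contrast concrete, here is a small illustration in a modern
functional language (Haskell); the sketch and its names (double1, double2)
are ours and are only meant to exhibit Definition 1.1.

-- Two different procedures for the same extensional function:
-- both map n to 2n for every integer n.
double1 :: Integer -> Integer
double1 n = n + n          -- computes 2n by addition

double2 :: Integer -> Integer
double2 n = 2 * n          -- computes 2n by multiplication

-- Extensionally double1 = double2, since double1 n == double2 n holds
-- for every n; intensionally, as procedures, they are different.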
Now, the problem of deciding the extensional equality of functions by ex-
amining their respective procedures is essentially the same as the equiv-
alence problem of Turing machines, which is known to be unsolvable.
Nevertheless, there are various techniques for proving the equivalence
of certain procedures. The study of program transformations in computer
science is dealing with such techniques. Extensional equality is usually
called semantic equality or semantic equivalence in this context. This notion
is also relevant to program optimization, for the optimized program should
be semantically equivalent to the original one.
Naive lambda-calculus can be thought of as an attempt at capturing
the most general formal properties of the extensional function concept us-
ing constructive, finitary methods. In this naive sense, type-free lambda-
calculus was bound to encounter the same difficulties as naive set theory.
The apparent paradoxes, after several unsuccessful attempts at avoiding
them, were thought to be inherent in the theory. The problem was put to
rest only in 1969, when Dana Scott showed that the intuitive interpretation
of the theory is reasonable if and only if it is restricted to the procedurally
definable (i.e., computable) functions.
As a purely constructive theory of procedurally well-defined functions,
type-free lambda-calculus is just as powerful as any other constructive ap-
proach. It is indeed equivalent to the theory of general recursive functions,
Turing machines, Markov algorithms, or any other constructive methods
for describing computational procedures in exact terms.
In this book type-free lambda-calculus is used in a purely constructive
manner. Moreover, in Chapter 6 we shall describe an interpreter program
for evaluating arbitrary lambda-expressions. This interpreter is imple-
mented on the IBM PC and some other IBM machines.

1.2 Domains, types, and higher-order functions

According to the usual set-theoretical definition, every function has two
fundamental sets associated with it, namely its domain and its range or
codomain. Given these two sets, say D and R, a function is defined by a set
of ordered pairs of the form

{ [x,y] | x ∈ D, y ∈ R }

where the second component is uniquely determined by the first. Thus, for
every x in D there is at most one y in R such that [x,y] belongs to a given
function. If a function has at least one, hence, exactly one ordered pair [x,y]
for every x ∈ D then it is called a total function on D; otherwise it is called
a partial function on D.
A function with domain D and range R is also called a mapping from
D to R and its set of ordered pairs is called its graph. For nonempty D and
R, there are, of course, many different functions from D to R, and each of
them is said to be of type [D → R]. For finite D and R, the number of
[D → R] type total functions is obviously |R|^|D|, where the name of a set
between two vertical bars denotes the number of its elements. This formula
extends to infinite sets with the cardinality of the given sets replacing the
number of their elements. The set of all [D → R] type functions is also
called the function space R^D.
If the range R has only two elements, say 0 and 1, then each [D → R]
type total function is the characteristic function of a subset of D. Hence,
the cardinality of the set of [D → R] type total functions with |R| = 2 is
the same as that of the powerset of D. Therefore, if R has at least two el-
ements, the cardinality of the function space R^D is larger than the
cardinality of D.
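For finite sets the formula |R|^|D| can be checked directly. The following
Haskell sketch (ours, for illustration only) enumerates every total function
from a three-element domain to a two-element range as a finite lookup
table and counts 2³ = 8 of them.

-- Every total function from d to r, represented as a lookup table.
allFunctions :: [a] -> [b] -> [[(a, b)]]
allFunctions d r = map (zip d) (sequence (replicate (length d) r))

main :: IO ()
main = print (length (allFunctions "abc" [0, 1 :: Int]))   -- prints 8 = 2^3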
So, for instance, the set of all number-theoretic, i.e. [N → N] type
functions where N denotes the set of natural numbers, is clearly
nondenumerable. On the other hand, it is known from the theory of algo-
rithms that the set of computable functions is denumerable. Namely, each
computable function must have a computational procedure, i.e. a Turing
machine associated with it, and the set of all Turing machines can be enu-
merated in some sequence, say, according to the length of their de-
scriptions encoded in some fixed alphabet. (Descriptions with the same
length can be listed in lexicographic order.) This means that the over-
whelming majority of the [N → N] type functions is noncomputable.
It is an interesting question if there is some extensionally definable
property which would distinguish the graphs of computable functions from
those of noncomputable ones. The mere fact that computable functions
have Turing machines associated with them tells nothing directly about
their properties as mappings. It would be nice if we could, so to speak, look
at the graph of the function and tell if it is computable. Interestingly
enough, there is such a property but it is far from being obvious. It is
'continuity' in a somewhat unusual but interesting topology. Here, we do
not discuss this topology but the interested reader is referred to the book
by Joseph Stoy [Stoy77], or to the papers by Dana Scott [Scott73,
Scott80]. For our discussion it suffices to say that the graphs of comput-
able functions are indeed different from those of noncomputable ones,
because they must obey certain restrictions namely, they must be
'continuous' in some abstract sense. The reason why we mention this con-
tinuity property is that it has been instrumental for the construction of a
mathematical model for the type-free lambda-calculus.
The original purpose of type-free lambda-calculus was to provide a
universal framework for studying the formal properties of arbitrary func-
tions, which turned out to be a far too ambitious goal. Nevertheless, the
type-free lambda-calculus can be used as a formal theory of functions in a
wide variety of applications, but we should never try to apply this theory
to noncontinuous, i.e. noncomputable functions. This state of affairs
is comparable to the development of set theory. Naive set theory was
meant to cover arbitrarily large sets. The paradoxical nature of that goal
has led to the development of axiomatic set theory, where the notion of
classes is introduced in order to avoid the paradoxes of arbitrarily large
sets.
Now, as we know, the cardinality of the set of number-theoretic, i.e.,
[N → N] type functions is nondenumerable while the set of lambda-
expressions is clearly denumerable, for it is a set of finite strings over a fi-
nite alphabet. Therefore, we cannot have a different lambda-expression
for each number-theoretic function. But, if we restrict ourselves to
'continuous' functions then we can show that they are precisely those
which can be described by lambda-expressions. This means that the set of
'continuous' functions coincides with the set of lambda-definable, hence,
computable functions. In this book we are not concerned with this model-
theoretic aspect of the theory but the interested reader is referred to the
literature.
Typed lambda-calculus considers the domains and the ranges of func-
tions as being relevant to their formal treatment. Function composition, for
instance, is considered well-defined only for compatible types, i.e. when
the range of the first function is the same as the domain of the second. In
this theory each function has a unique type and, of course, there is an in-
finite variety of types. This approach is, in fact, a refinement of the type-
free theory and it has a number of important applications. It serves as an
appropriate model for strongly typed programming languages like Pascal
or ADA.
In a strongly typed language every variable must have a type, which is
basically the set of its possible values. This set, however, is usually en-
dowed with some algebraic structure. So, for example, an integer type var-
iable has two important characteristics:
(a) It can have only integer values
(b) It can be used as an argument only in those operations that are
defined on integers.
The first requirement imposes a restriction on the effect of any assignment
to that variable while the second requirement restricts its possible occur-
rences in the expressions.
A third aspect of a type is its relationship to other types, which is im-
portant for the discussion of possible (implicit or explicit) conversions be-
tween types.
Strongly typed languages usually have a finite set of ground types such
as integer, real, boolean, or character, and any number of user defined or
constructed types such as arrays, records, or enumerated types. Each ground
type has a fixed set of primitive operations defined on it. These primitive
operations have their own (implicit) types. The test for zero predicate, for
example, is of type [N → B], where B = {true, false}.
New types can be defined by using certain type constructors. A record
type, for instance, corresponds to the cartesian product of the types of its
components. Thus, the domain of a record type variable is

D1 × D2 × ... × Dn

where Di is the domain of its i-th component.


Another type constructor is related to the so called 'variant records',
which belong to the same file but have different structure. This construct
corresponds to the set theoretical operation of discriminated union, de-
noted by

D1 ⊕ D2 ⊕ ... ⊕ Dn

which is essentially the same as

D1 ∪ D2 ∪ ... ∪ Dn

except that its elements will be discriminated (labelled) by their origins.


This means that common elements will have as many copies as the number
of the different Di's to which they belong. So, for example, in a discrimi-
nated union of the types integer and real the integers will have two differ-
ent (fixed-point vs. floating-point) representations, which must be
distinguished from one another.
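In a modern functional language the discriminated union is a primitive
type constructor, which makes the labelling explicit. A brief Haskell
sketch of the integer-and-real example (the sketch and its names are ours):

-- A discriminated union of integer and real: a common value such as 2
-- gets two distinct, labelled representations instead of being merged.
type IntOrReal = Either Integer Double

twoAsInteger, twoAsReal :: IntOrReal
twoAsInteger = Left 2      -- fixed-point representation, labelled Left
twoAsReal    = Right 2.0   -- floating-point representation, labelled Right

-- The labels keep the two copies apart: twoAsInteger /= twoAsReal.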
The repeated application of the type constructors may result in fairly
complex types, but this process is usually limited to the construction of data
types. The construction of function types, i.e. function spaces is not sup-
ported by conventional languages.
The function space R^D is also a set, so it can be used as the domain
and/or the range of other functions. A mapping from R^D to itself, which
takes a function as an argument and returns another function as its value,
is called a functional or higher-order function. The same construction can
be repeated, which yields an infinite hierarchy of function spaces with ever
increasing cardinalities. The corresponding functions can be classified ac-
cording to this hierarchy and thus, we can talk about first order, second
order, etc ... functions relative to the given ground types.
The usefulness of an infinite hierarchy may be arguable, but certain
functionals may be quite useful in many applications. Take for instance the
symbolic differentiation of functions. It is clearly a second order function
as it takes a function as an argument and returns another function as a re-
sult.
The implementation of function types seems more difficult than that
of the ground types or other constructed types. The main difficulty, how-
ever, is due to the machine code representation of functions on conven-
tional computers. It is namely much more difficult to manipulate the
machine code representation of a function than to manipulate the machine
representation of an integer or a real number. This has nothing to do with
the cardinality of the function space, because we do not have to manipulate
those infinite sets. All we have to do is to manipulate the finite represent-
ations of their elements just as we do with the integers or reals.
In the present state of the art, function manipulations are called sym-
bolic computations, and they are treated differently from numeric computa-
tions. In lambda-calculus, as we shall see, the distinction between symbolic
and numeric computations is irrelevant.

1.3 Polymorphic functions and Currying


Strong typing seems to be an uncomfortable straitjacket sometimes. A
typical example is the problem of writing a generalized sort routine which
should work the same way for integer, real, or character type keys. In a
strongly typed language we cannot have variables with flexible types, so
we actually need three versions of the sort routine, one for each type. Even
if we have only, say, integer type keys, we cannot sort arbitrary records.
By using variant records we can specify only a finite number of different
record types, but we cannot have an open ended record type.
Similar problems occur in the context of list manipulation. There are
functions, such as the length of a list, which are independent of the type
of the elements in the list. Also, the usual operations on a push-down store,
i.e. popping and pushing, are independent of the type of the elements
stored in the stack.
A reasonable compromise between strong typing and a completely
type free treatment of functions is represented by the use of polymorphic
functions. Polymorphism in a typed language means using variables with
flexible types. A function is called polymorphic if the type of (at least one of)
its arguments may vary from call to call. Take, for example, the function
ADD which is defined as the usual addition on integer type arguments
while it is defined as the logical OR operation on Boolean type arguments.
This is a polymorphic function, but, because of its rather arbitrary defi-
nition, it is an example of the so called ad hoc polymorphism.
A more interesting kind of polymorphism is one that is based on the
structural similarities of certain domains. For example, the ordering of the
integers under the usual < relation is similar to the alphabetic ordering of
character strings in an alphabet. We can take advantage of this similarity
by using a polymorphic function for comparison which would take either
integer or character type arguments. This way we can write 'generic' pro-
grams which implement the same algorithms for different types.
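This structural kind of polymorphism is exactly what type variables with
class constraints provide in a language like Haskell; a sketch (ours, for
illustration):

-- One generic comparison works for every ordered key type: the
-- constraint Ord a covers integers, characters, strings, and more.
smallerKey :: Ord a => a -> a -> a
smallerKey x y = if x <= y then x else y

-- The same 'generic program' used at two different types:
exampleInt :: Integer
exampleInt = smallerKey 3 7            -- ordering of integers

exampleStr :: String
exampleStr = smallerKey "abc" "abd"    -- alphabetic ordering of strings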
Typed λ-calculus can be extended to polymorphic types, which makes
it more attractive as a formal theory of types for modern programming
languages. An excellent discussion of the various kinds of polymorphism
can be found in [Ca-W85], which also contains a good overview of the
evolution of types in programming languages. A brief introduction to typed
λ-calculus will be given as Appendix B at the end of this book.
One of the main concerns of type theory is the question of type
equivalence, which is clearly related to the problem of type checking. The
whole point of type checking is to establish the equality (or at least the
compatibility) of the types of certain variables occurring in a program. This
is far from being a trivial task except for very simple cases. The ground
types and the type constructors represent, in fact, an algebra of types,
where certain types may be constructed in many different ways. If, for ex-
ample, we combine three records simply by concatenating them then the
resulting record type corresponds to D1 × D2 × D3 where Di is the domain
of the i-th record. Notice the fact that no parentheses are used here, be-
cause the Cartesian product is associative. In other words, we get the same
record type if we concatenate the first two records and then concatenate
the resulting record and the third one, or the other way around.
In general, however, it is very difficult to decide whether or not two
different type constructions result in the same domain. Type checking is,
therefore, a major problem for programming languages with nontrivial type
systems. Some of the difficulties may be avoided by treating the type
specifications as purely formal expressions rather than representations of
sets. Different type specifications may then be considered different even
if the corresponding domains are actually the same. For a practical system,
however, we have to allow at least some obvious equalities between dif-
ferent type specifications. (For instance, a file type with variant records
should not be sensitive to the order of enumerating the variants.) The
complexity of the type specifications may increase substantially, if we al-
low recursively defined types. The use of polymorphic types does not make
type checking much simpler.
The type of a function is clearly related to its 'arity', which is the
number of its arguments. Indeed, the domain of a binary function is the
Cartesian product of the respective domains of its two arguments. So, for
instance, the addition of integers is of type [N × N → N].
Addition and multiplication are usually extended to any finite number
of arguments. A function which can take an arbitrary number of arguments
is called polyadic. Thus, the domain of a polyadic function is the union of
the corresponding cartesian products. For example, the domain of the in-
teger Σ function is

N ∪ (N×N) ∪ (N×N×N) ∪ ...

Conventional programming languages do not allow polyadic functions.
The number of formal parameters specified in a function declaration will
determine the number of arguments that must be supplied in each call of
the given function.
To overcome this limitation one can use sequences (i.e., arrays or lists)
as arguments. Using lists as arguments is quite common in functional lan-
guages. It seems that a uniform treatment of functions can be achieved by
using only unary functions which take only lists as arguments. This way
we can have polyadic functions in disguise, but there is a price to pay.
Namely, many practical functions make sense only with a fixed number of
arguments. The implementation of such functions must check the length
of the argument list before using it.
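Using lists as arguments in this way is routine in a functional language;
the Haskell sketch below (ours) shows both a polyadic function in disguise
and the length check that a fixed-arity function must perform.

-- A polyadic function in disguise: one list argument of any length.
sumAll :: [Integer] -> Integer
sumAll = sum                -- accepts 1, 2, 3, ... 'arguments'

-- A binary operation taking a list must check the length itself:
addTwo :: [Integer] -> Integer
addTwo [x, y] = x + y
addTwo _      = error "addTwo: expected exactly two elements"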
This approach was taken by John Backus in the design of the func-
tional language FP, where the usual arithmetic operations are treated as
unary functions with ordered pairs as arguments [Back78]. The type of the
argument, however, is specified as a list, which may have any number of
elements. Therefore, it is the responsibility of the implementation to make
sure that the argument has precisely two elements.
Another method of representing all functions by unary ones is called
Currying, after Haskell B. Curry. This method was introduced by
Schönfinkel [Scho24] and extensively used by Curry. It consists of trans-
forming multi-argument functions into sequences of unary functions. Let
us take again addition for an example. The + function takes two argu-
ments, so it is of type
[N × N → N].
But, if we supply only one argument then we get a unary function which
would add a constant (the given operand) to its only argument (the missing
operand). The resulting functions can be denoted by
add1, add2, add3, ...
and they are clearly of type [N → N]. So, that is the range of the first
function, which is, therefore, of type
[N → [N → N]]
Hence, the process of Currying of a function can be described informally
as exchanging each × symbol in its domain for an → in its range. The
Curry-ed version of an n-argument function on integers will have the type
[N → ... [N → [N → N]] ... ]
where precisely n arrows occur. For that, of course, we need higher order,
or at least function valued functions which are not readily available in
conventional programming languages. But, in the type-free lambda-
calculus, where functions of any type are treated in the same way, Currying
represents a useful device for replacing multi-argument functions by re-
peated applications of unary ones. As we mentioned before, the imple-
mentation of higher order functions is not as difficult as it might appear at
first.
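Since Currying is built into modern functional languages, the preceding
discussion can be illustrated directly in Haskell; the sketch below and its
names (add, add1, addPair) are ours.

-- Addition viewed as a function of type [N -> [N -> N]]:
-- applying it to one argument yields a unary function.
add :: Integer -> Integer -> Integer    -- i.e. Integer -> (Integer -> Integer)
add x y = x + y

add1 :: Integer -> Integer
add1 = add 1            -- the unary function that adds the constant 1

-- The uncurried version of type [N x N -> N], and the conversions
-- between the two forms provided by curry and uncurry:
addPair :: (Integer, Integer) -> Integer
addPair (x, y) = x + y

main :: IO ()
main = print (add1 41, curry addPair 2 3, uncurry add (2, 3))   -- (42,5,5)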
CHAPTER TWO

TYPE-FREE LAMBDA-CALCULUS

2.1 Syntactic and semantic considerations

As we have mentioned in the introduction, the purpose of the lambda-
calculus is to study the most general properties of functions. This means
that we want to develop a unified theory of functions encompassing all
kinds of functions used in various parts of mathematics. One way of doing
this is to use set theory as a foundation and build the notion of a function
on the notion of a set. This is the usual approach taken by modern math-
ematics, which corresponds to the extensional view of functions. Accord-
ing to this notion, a function is just a (usually infinite) set of ordered pairs
where the first member of each pair is an argument value while the second
member is the corresponding value of the function. This approach has im-
portant theoretical benefits as manifested by the development of modern
mathematics, but it is clearly a departure from the original formula-based
function concept.
In lambda-calculus functions are represented by symbolic notations
called λ-expressions and not by sets of ordered pairs. These λ-expressions
can be manipulated directly and thus, we can build a theory of function
transformations using this symbolic representation of functions. The
question is what kinds of functions are represented by these
λ-expressions, and what kinds of operations can be performed on them by
manipulating their representations? Strangely enough, the first question is
much more difficult to answer than the second. In fact, we shall define the
syntax of λ-expressions without defining their semantics right away. Next,
we shall define certain operations on λ-expressions and investigate their
properties under these operations. This way we will study a purely formal
system hoping that these symbolic expressions are indeed representations
of existing functions. A justification for this naive approach will be given
later in Chapter 5 where the related semantic issues will be discussed.
For the time being, we shall assume that every λ-expression denotes
some function. Moreover, we shall consider arbitrary functions regardless
of their types. This means that we will not be concerned with the domain
and the range of a function, since we are interested only in those properties
which are common to all functions. Now, the question is whether such
properties exist, and, if so, whether they are interesting enough for devel-
oping a meaningful theory about them. The rest of this book should pro-
vide a positive answer to both of these questions. It is concerned namely,
with the development of the type-free lambda-calculus which, in fact, re-
presents a general framework also for its typed versions.
The type-free theory is obviously more general, but its relevance had
been strongly debated for quite some time until it gained universal accept-
ance by the end of the sixties. Today it is widely used in theoretical com-
puter science and even for such practical purposes as the hardware design
of non-conventional computer architectures. As we shall see later in this
book, lambda-calculus is one of the most important tools for studying the
mathematical properties of programming languages. Historically, it has
been developed for similar purposes regarding the language of mathemat-
ical logic, but it has been greatly rejuvenated by the latest developments in
computer science. Now, let us see the formal definition of λ-expressions.
First, we assume that there is an infinite sequence of variables and a
finite or infinite sequence of constants. Each variable will be represented
by an identifier (i.e., by a finite string of symbols chosen from some finite
alphabet) as is usual in programming languages like Pascal or LISP. Simi-
larly, each constant will be represented by a finite string of digits and/or
symbols chosen from a finite set of available symbols.
Variables and constants are called atoms since they are the simplest
λ-expressions. More complex λ-expressions can be built from them by us-
ing two expression forming operations, application and abstraction.
An application is simply the application of one λ-expression to another.
The first of these two λ-expressions is called the operator, the second is
called the operand in that application. This means that any λ-expression
can be used both as an operator and as an operand with no restriction at
all. (Note that we are not concerned with the meaning of such applications
at the present time.)
An abstraction is formed with the special symbol λ followed by a vari-
able, followed by a dot, followed by an arbitrary λ-expression. The pur-
pose of the operation of abstraction is to make a unary function from a
given λ-expression. The variable occurring next to the leading λ gives the
name of the argument. Functions with more than one argument are formed
by repeated abstractions. A formal definition of λ-expressions is given by
the following syntax:

THE SYNTAX OF LAMBDA-EXPRESSIONS

<λ-expression> ::= <variable> | <constant> | <application> | <abstraction>
<application> ::= ( <λ-expression> ) <λ-expression>
<abstraction> ::= λ<variable>. <λ-expression>

This syntax allows us to form λ-expressions like

λx.x          λx.λy.(y)x
λx.(f)x       (f)3
λf.(f)2       (λy.(x)y)λx.(u)x

whose meanings are left undefined for the time being. Nevertheless, we
will treat them as functional expressions which satisfy certain formal re-
quirements.
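The grammar above translates directly into an algebraic data type. The
following Haskell sketch (our rendering, with constructor names of our own
choosing) will serve as the representation assumed by the later sketches in
this chapter.

-- Abstract syntax of lambda-expressions, mirroring the grammar: a
-- variable, a constant, an application (F)X, or an abstraction λx.E.
data Expr
  = Var String          -- <variable>
  | Con String          -- <constant>
  | App Expr Expr       -- ( <expression> ) <expression>
  | Lam String Expr     -- λ<variable>. <expression>
  deriving (Eq, Show)

-- For example, λx.λy.(y)x is written:
example :: Expr
example = Lam "x" (Lam "y" (App (Var "y") (Var "x")))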
A remark on our nonconventional way of parenthesizing lambda-
expressions may be in order here. The traditional way of putting the
argument(s) of a function between parentheses may have been suggested
by the conventional way of evaluating a function. In the conventional or
applicative order of evaluation, the argument of a function is computed
before the application. That corresponds to the call by value mechanism
of parameter passing in programming languages. In this book we use the
notation (f)x instead of the traditional f(x), which reflects the so called
normal order or left-to-right evaluation strategy, where the function itself
is analyzed first before looking at the argument.
Our syntax represents an LL(1) grammar for lambda-expressions,
which is very useful for the development of an efficient predictive parser.
(Every application starts with a left parenthesis, and every abstraction
starts with a λ.) Every left parenthesis can thus be treated as a prefix 'apply
operator' with the corresponding right parenthesis being just a delimiter.
The main reason, however, for using this notation is its simplicity. The
traditional notation often requires the use of parentheses both in the
function and in the argument part of a lambda-expression. That may be-
come quite confusing when dealing with complicated lambda-expressions.
LISP has already departed from the traditional notation by using (f x) in-
stead. Here we go one step further by using (f)x, which seems more natural
and easier to follow when working with Curried functions.
The syntactic structure of a λ-expression is defined by its parse tree
with respect to the above context-free grammar. This grammar is unam-
biguous, hence, every λ-expression has a unique parse tree. The subtrees
of this parse tree correspond to the subexpressions of the given
λ-expression. The same subexpression may have, of course, several occur-
rences in a larger expression.
In this book we shall use small letters, x, y, z, etc., as generic names for
arbitrary variables. Arbitrary λ-expressions will usually be denoted by
capital letters, P, Q, R, etc.
Two λ-expressions, P and Q, are identical, in symbols P ≡ Q, if and
only if Q is an exact (symbol by symbol) copy of P.
As can be seen from our syntax, functional application associates to
the right. The λ-expression (P)(Q)R is namely the application of P to
(Q)R, while the λ-expression ((P)Q)R denotes the application of (P)Q to
R.
An occurrence of a variable x in a λ-expression E is said to be bound
if it is inside a subterm of the form λx.P; otherwise it is free.
Definition 2.1. The set of the free variables of a λ-expression E,
denoted by φ(E), is defined by induction on the construction of
E as follows:
(1) φ(c) = {} (i.e., the empty set) if c is a constant
(2) φ(x) = {x} for any variable x
(3) φ(λx.P) = φ(P) − {x}
(4) φ((P)Q) = φ(P) ∪ φ(Q)
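Definition 2.1 is directly executable. In Haskell, using the Expr type
sketched above, it reads as follows (the clause numbers in the comments
match the definition; the sketch is ours):

import Data.Set (Set)
import qualified Data.Set as Set

-- phi(E): the set of free variables of E.
freeVars :: Expr -> Set String
freeVars (Con _)   = Set.empty                          -- (1) phi(c) = {}
freeVars (Var x)   = Set.singleton x                    -- (2) phi(x) = {x}
freeVars (Lam x e) = Set.delete x (freeVars e)          -- (3) phi(λx.P) = phi(P) − {x}
freeVars (App p q) = freeVars p `Set.union` freeVars q  -- (4) phi((P)Q) = phi(P) ∪ phi(Q)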
Note that the same variable may occur both free and bound in a
λ-expression. For instance, the first occurrence of x in
(λx.(z)x)λy.(y)x
(apart from its occurrence in the λx prefix) is bound, while its second oc-
currence is free. Thus, the set of free variables in this example is {x, z}, but
not every occurrence of x is free.
The two occurrences of y in the following λ-expression
λy.(λx.(x)y)λy.(y)x
are bound by two separate λy prefixes. Each λy represents an abstraction
with respect to y, but it affects only those occurrences of y which are free
in the subexpression prefixed by the given λy. More details on the usage
of the abstraction can be found in the next section.
Constants may be treated as special variables which can never be used
as bound variables. They represent fairly trivial special cases in most defi-
nitions and theorems. For the sake of simplicity, constants will be omitted
from the definitions and the proofs that follow in this chapter and in Appendix
A.

2.2 Renaming, α-congruence, and substitution

The computation of the value of a function for some argument requires the
substitution of the argument for the corresponding variable in the ex-
pression representing the function. This is a fundamental operation in
mathematics as well as in programming languages. In a programming lan-
guage like Pascal or Fortran, this corresponds to the substitution of the
actual parameters for the so called formal parameters. In lambda-calculus
both the function and the arguments are represented by λ-expressions, and
every function is written as a unary function which may return another
function as its value. So, for instance, the operation of addition will be re-
presented here as

λx.λy.((+)x)y.

A fundamental aspect of the use of bound variables is expressed by the
following:
Two lambda-expressions are considered essentially the same if they
differ only in the names of their bound variables.
For instance, the λ-expressions λx.(y)x and λz.(y)z represent the same
function, because the choice of the name of the bound variable has no in-
fluence on the meaning of a function. The same is true for programming
languages where the formal parameters occurring in a function declaration
are only dummies, i.e. place holders for the actual arguments, so we can
choose arbitrary names for them. This freedom of choice is formalized in
lambda-calculus via a transformation called α-conversion. It is just a re-
naming of bound variables which would not change the meaning of
λ-expressions.
Definition 2.2. The renaming of a variable x to z in a λ-expression
P, in symbols {z/x}P, is defined by induction on the construction
of P as follows:
(1) {z/x}x ≡ z
(2) {z/x}y ≡ y if x ≠ y
(3) {z/x}λx.E ≡ λz.{z/x}E for every λ-expression E.
(4) {z/x}λy.E ≡ λy.{z/x}E for every λ-expression E, if x ≠ y
(5) {z/x}(E1)E2 ≡ ({z/x}E1){z/x}E2 for any two λ-ex-
pressions, E1 and E2.
Note that the renaming prefix {z/x} is not a proper part of the
λ-notation, as it is not included in the syntax of λ-expressions. The nota-
tion {z/x}P is just a shorthand for the λ-expression obtained from P by
performing the prescribed renaming. For example, the notation

{a/b}(λv.λb.(b)v)λu.((c)u)b

stands for the λ-expression

(λv.λa.(a)v)λu.((c)u)a

as can be easily verified by the reader.
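Definition 2.2 transcribes clause by clause onto the Expr type of
Section 2.1; the Haskell sketch below is ours and, like the α-rule, assumes
that z is neither free nor bound in the expression being renamed.

-- {z/x}P: rename the variable x to z (Definition 2.2).
rename :: String -> String -> Expr -> Expr
rename z x (Var y)
  | y == x           = Var z                     -- (1) {z/x}x = z
  | otherwise        = Var y                     -- (2) {z/x}y = y when x ≠ y
rename _ _ (Con c)   = Con c                     -- constants are unaffected
rename z x (Lam y e)
  | y == x           = Lam z (rename z x e)      -- (3) {z/x}λx.E = λz.{z/x}E
  | otherwise        = Lam y (rename z x e)      -- (4) {z/x}λy.E = λy.{z/x}E
rename z x (App p q) = App (rename z x p) (rename z x q)   -- (5)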


Definition 2.2 will never be applied with z = x or z = y, because re-
naming will be used only for α-conversion as described by the following
rule:

THE ALPHA-RULE

(α) λx.E →α λz.{z/x}E for any z which is neither free nor bound in E.

We say that the λ-expression on the left-hand side is α-convertible to the
one on the right-hand side. Since x is neither free nor bound in {z/x}E, the
reverse of a renaming is also a renaming. Hence, α-convertibility is a sym-
metric relation. Clearly, it is also transitive. Next we define α-congruence
or α-equality.
Definition 2.3. Two λ-expressions, M and N, are α-congruent (or
α-equal), in symbols M ≅ N, if either M ≡ N, or M →α N, or N is
obtained from M by replacing a subexpression S of M by a
λ-expression T such that S →α T, or there is some λ-expression R
such that M ≅ R and R ≅ N.
The notion of α-congruence is clearly an equivalence (reflexive, symmetric,
and transitive) relation. It will be generalized shortly to permit more inter-
esting 'meaning preserving' transformations on λ-expressions. We are now
prepared for giving a precise formal definition of the operation of substi-
tution.
Definition 2.4. The substitution of Q for all free occurrences of the
variable x in P, in symbols [Q/x]P, is defined inductively as fol-
lows:
(1) [Q/x]x ≅ Q
(2) [Q/x]y ≅ y if x ≠ y.
(3) [Q/x]λx.E ≅ λx.E for any λ-expression E.
(4) [Q/x]λy.E ≅ λy.[Q/x]E for any λ-expression E, if x ≠ y and
at least one of these two conditions holds: x ∉ φ(E), y ∉ φ(Q).
(5) [Q/x]λy.E ≅ λz.[Q/x]{z/y}E for any λ-expression E and for
any z with x ≠ z ≠ y which is neither free nor bound in (E)Q, if
x ≠ y and both x ∈ φ(E) and y ∈ φ(Q) hold.
(6) [Q/x](E1)E2 ≅ ([Q/x]E1)[Q/x]E2
Again, the substitution prefix [Q/x] is not part of the λ-notation, as it is
not included in our syntax. The notation [Q/x]P is just an abbreviation for
the λ-expression obtained from P by carrying out the prescribed substi-
tution.
Example: The substitution
[λz.λy.(y)z/x](λu.(x)u)(u)x
is defined as being α-congruent to
([λz.λy.(y)z/x]λu.(x)u)[λz.λy.(y)z/x](u)x
that is
(λu.[λz.λy.(y)z/x](x)u)([λz.λy.(y)z/x]u)[λz.λy.(y)z/x]x
that is
(λu.([λz.λy.(y)z/x]x)[λz.λy.(y)z/x]u)(u)λz.λy.(y)z
and finally
(λu.(λz.λy.(y)z)u)(u)λz.λy.(y)z
which is already a proper λ-expression.
This example shows that the substitution operation may get quite complex
when we are working with complicated λ-expressions. The most intriguing
case is covered by part (5) of Definition 2.4 where we have to introduce a
new bound variable z in order to avoid the capture of the free
occurrence(s) of y in Q. If we did not have this clause and wanted to use
(4) instead naively, i.e. without any restrictions, then we would get some
errors as can be seen from the following example:

[λu.(y)u/x]λy.(x)y

would yield

λy.[λu.(y)u/x](x)y

and then

λy.([λu.(y)u/x]x)[λu.(y)u/x]y

and finally

λy.(λu.(y)u)y
This means that the free occurrence of y in λu.(y)u would become bound
in the result, which is clearly against our intuition about substitution. In-
deed, the free occurrence of y in the substitution prefix should not be
confused with the bound variable y in the target expression. In order to
avoid this confusion we introduce a new bound variable z which gives the
correct result, namely

λz.(λu.(y)u)z

It should be noted in this regard that the result of a substitution is not a
unique λ-expression, since it is defined only up to α-congruence. But it is
easy to see that

λz.[Q/x]{z/y}E ≅ λv.[Q/x]{v/y}E

if both z and v satisfy the condition of part (5) of Definition 2.4. (See
Exercise 2.3 at the end of Chapter 2.) This is quite satisfactory for our
purpose, because we are not so much interested in the identity of
λ-expressions as in their equality.
This freedom of choice with respect to the new bound variables while
performing a substitution has been achieved by our simplified renaming
operation which is defined directly without using substitution.
Our approach is slightly different from the conventional definition of
substitution included in most textbooks where every effort is made to de-
fine a unique result for the substitution operation. We feel, however, that
the conventional approach imposes a rather artificial and unnecessary re-
striction on the choice of the bound variable used for renaming during
substitution. Its only gain is that renaming becomes a special case of sub-
stitution and thus, it can be denoted by [z/x].
The complications with the substitution operation have inspired some
researchers to try to get rid of bound variables altogether. This has resulted
in the discovery of combinators which we shall study in Chapter 3. For the
time being, we can be satisfied that Definition 2.4 is a precise formal defi-
nition of the substitution operation in the type-free lambda-calculus.
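
To make these clauses concrete, the following small sketch (ours, not from
the book) transcribes Definition 2.4 into Haskell together with the
simplified renaming operation. The type and function names are our own
choices; freeVars plays the role of φ, and allVars collects free and bound
variables for the side condition of part (5).

import Data.List (delete, nub, union)

data Exp = V String | Lam String Exp | App Exp Exp
  deriving (Eq, Show)

freeVars :: Exp -> [String]              -- the set φ(E)
freeVars (V x)     = [x]
freeVars (Lam x e) = delete x (freeVars e)
freeVars (App a b) = freeVars a `union` freeVars b

allVars :: Exp -> [String]               -- free and bound variables of E
allVars (V x)     = [x]
allVars (Lam x e) = nub (x : allVars e)
allVars (App a b) = allVars a `union` allVars b

rename :: String -> String -> Exp -> Exp -- {z/y}E: replace y by z throughout
rename z y (V v)     = V (if v == y then z else v)
rename z y (Lam v e) = Lam (if v == y then z else v) (rename z y e)
rename z y (App a b) = App (rename z y a) (rename z y b)

subst :: Exp -> String -> Exp -> Exp     -- subst q x p computes [Q/x]P
subst q x (V y)
  | x == y    = q                                      -- clause (1)
  | otherwise = V y                                    -- clause (2)
subst q x (Lam y e)
  | x == y    = Lam y e                                -- clause (3)
  | x `notElem` freeVars e || y `notElem` freeVars q
              = Lam y (subst q x e)                    -- clause (4)
  | otherwise = Lam z (subst q x (rename z y e))       -- clause (5)
  where
    z = head [v | v <- supply, v `notElem` (x : y : allVars (App e q))]
    supply = [c : show i | i <- [0 :: Int ..], c <- "uvw"]
subst q x (App a b) = App (subst q x a) (subst q x b)  -- clause (6)

Since part (5) leaves the fresh variable underdetermined, any supply of
names avoiding x, y, and the variables of (E)Q would do here; the result is
unique only up to α-congruence, exactly as observed in the text.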
2.3 Beta-reduction and equality

The substitution operation defined in the previous section is one of the
most important operations in classical λ-calculus. It allows the
simplification of certain λ-expressions without changing their meanings.
This is expressed by the following rule:

THE BETA-RULE

(β) (λx.P)Q → [Q/x]P

For arbitrary x, P, and Q, the λ-expression (λx.P)Q is called a β-redex
while the corresponding right-hand side is called its contractum. Remember
that the λ-expression on the right-hand side is defined only up to
α-congruence.
The contractum of a β-redex is usually simpler than the β-redex itself.
For instance, the contractum of (λx.x)Q is Q while the contractum of
(λx.y)Q is y. In both cases the resulting λ-expressions have fewer
applications and fewer abstractions than the original ones. But that is not
always true as can be seen from this example:

(λx.(x)x)λx.(x)x → [λx.(x)x/x]λx.(x)x ≅ (λx.(x)x)λx.(x)x

The contractum of this β-redex is α-congruent with itself, although
β-contraction in general is neither reflexive nor symmetric. It can be used,
however, to define an equivalence relation on λ-expressions which is called
β-equality or simply equality. First we define β-reduction.
Definition 2.5. The relation M ⇒ N (read M β-reduces to N) is
defined inductively as follows:
(1) M ⇒ N if M ≅ N.
(2) M ⇒ N if M → N is an instance of the β-rule.
(3) If M ⇒ N for some M and N, then for any λ-expression E,
both (M)E ⇒ (N)E and (E)M ⇒ (E)N also hold.
(4) If M ⇒ N for some M and N then for any variable x,
λx.M ⇒ λx.N also holds.
(5) If M ⇒ E and E ⇒ N then also M ⇒ N.
(6) M ⇒ N only in those cases as specified by (1) through (5).


This means that M β-reduces to N if N is obtained from M by replacing a
part of M of the form (λx.P)Q by its contractum [Q/x]P, or N is obtained
from M by a finite (perhaps empty) sequence of such replacements. The
relation ⇒ is clearly reflexive and transitive but not symmetric. It is
defined on the α-equivalence classes of λ-expressions rather than on the
λ-expressions themselves. In other words, M ⇒ N is invariant under
α-conversion. Now we can extend the notion of α-equality by forming the
symmetric and transitive closure of ⇒.
Definition 2.6. M is β-convertible (or simply equal) to N, in symbols
M = N, iff M ≅ N, or M ⇒ N, or N ⇒ M, or there is a
λ-expression E such that M = E and E = N.
This notion of equality is defined in a purely formal manner with no
reference to the 'meaning' at all. Nevertheless, it is relevant to the
meaning, because it says that the meaning of λ-expressions must be invariant
under β-conversion. This is, in a sense, a minimum requirement regarding all
possible definitions of the meaning.
Now the question arises, how to decide whether two λ-expressions are
equal or not. Unfortunately, this question in general is algorithmically
undecidable. Note that we are not talking about a more elaborate notion
of semantic equivalence. All we are talking about is β-convertibility. This
may be disappointing but we can look at it this way: As we have said,
β-convertibility represents a special case of the extensional equality. Since
it is already undecidable, then it might, in fact, be equivalent to it. As a
matter of fact, β-conversion turns out to be 'almost' equivalent to
extensional equality. It represents a very useful and general tool for
comparing various functions. (See also Lemma 3.1 and the η-rule in Section 3.4.)
The process of β-reduction is aimed at the simplification of
λ-expressions. It terminates when no more β-redexes remain in the given
λ-expression. This gives rise to the notion of the normal form of lambda-
expressions.
A λ-expression is said to be in normal form if no β-redex, i.e. no
subexpression of the form (λx.P)Q occurs in it.
Such a λ-expression cannot be reduced any further so, in this sense, it is
the simplest among all those λ-expressions which are equal to it. But, there
exist λ-expressions like
(λx.(x)x)λx.(x)x

for which β-reduction never terminates. This is the main reason for the
undecidability of the equality of arbitrary λ-expressions.
It should be emphasized that for certain λ-expressions there are
terminating as well as nonterminating β-reductions. But, if there is at
least one terminating β-reduction then we say that the given λ-expression
has a normal form. For instance, the normal form of the λ-expression

(λy.(λz.w)y)(λx.(x)x)λx.(x)x

happens to be

w

in spite of the fact that it also has a nonterminating β-reduction as we
have seen before.
Fortunately, if a λ-expression has at least one terminating
β-reduction, then one can find such a β-reduction in a straightforward
manner without 'back-tracking'. This follows from the so-called
standardization theorem to be discussed later in Chapter 6. Moreover, if a
λ-expression has a normal form then every terminating β-reduction would
result in the same normal form (up to α-congruence).
In other words, the order in which the β-redexes are contracted is
irrelevant as long as the reduction terminates. This is a corollary of the
Church-Rosser theorem, which is one of the most important results of the
theory of lambda-conversion. As we shall see below, this theorem also
implies that the equality problem of λ-expressions having normal forms is
decidable.

2.4 The Church-Rosser theorem

The validity of the Church-Rosser theorem seems quite natural in view of
our experience with elementary algebra. Its proof, however, is surprisingly
difficult. Fortunately, the theorem itself is easy to understand without
working through the proof. Therefore, we do not discuss the proof here,
but it can be found in Appendix A at the end of this book. The essence of
the theorem can be illustrated by the following example.
Assume that we have a polynomial P with two variables, x and y. In
order to compute its value, say, for x = 3 and y = 4, we have to substitute
3 for x and 4 for y in P and perform the prescribed arithmetic operations.
If we substitute only one of these values for the corresponding variable in
P then we get a polynomial in the other variable. Now, either of these
'partially substituted' polynomials can be used to compute the value of the
original polynomial for the given arguments by substituting the value of the
remaining variable. In other words, the final result does not depend on the
order in which the substitutions are performed.
If substitution is defined correctly, then this must be true in general.
Indeed, for any λ-expressions, P, Q, and R, and variables, x and y, we have

(λx.(λy.P)R)Q ⇒ [Q/x](λy.P)R ≅ ([Q/x]λy.P)[Q/x]R

≅ (λz.[Q/x]{z/y}P)[Q/x]R ⇒ [[Q/x]R/z][Q/x]{z/y}P

At the same time we have

(λx.(λy.P)R)Q ⇒ (λx.[R/y]P)Q ⇒ [Q/x][R/y]P

which, according to the Church-Rosser theorem, must be equal to the
above.
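
For a concrete instance (our example, not taken from the book), let P be
(y)x, let R be y, and let Q be λu.u. Contracting the inner redex first
gives

(λx.(λy.(y)x)y)λu.u ⇒ (λx.(y)x)λu.u ⇒ (y)λu.u

while contracting the outer redex first gives

(λx.(λy.(y)x)y)λu.u ⇒ (λy.(y)λu.u)y ⇒ (y)λu.u

so the two orders meet in the same λ-expression, as the theorem requires.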
Theorem 2.1 (Church-Rosser theorem I) If E ⇒ M and E ⇒ N
then there is some Z such that M ⇒ Z and N ⇒ Z.
This theorem is represented by the diagram in Figure 2.1.

[Figure 2.1: the diamond property. E reduces to both M and N, and M and N
both reduce to a common Z.]
The property reflected by this diagram is called the diamond property (or
confluence property). The Church-Rosser theorem says, in effect, that
β-reduction has the diamond property.

Corollary: If E ⇒ M and E ⇒ N, where both M and N are in
normal form then M ≅ N.

In other words, every λ-expression has a unique normal form (up to
α-congruence) provided that it has a normal form at all. This corollary
follows immediately from the theorem, because, in this case, the existence
of some Z with M ⇒ Z and N ⇒ Z implies that M ≅ Z ≅ N.
Now, for the question of β-equality the diamond property can be
generalized this way:
Theorem 2.2 (Church-Rosser theorem II) If M = N then there is
a Z such that M ⇒ Z and N ⇒ Z.
This second form of the Church-Rosser theorem follows from the diamond
property by induction on the number of reductions and reverse reductions
connecting M and N. Namely, by the definition of equality, M = N implies
that there is a finite sequence of λ-expressions E0, E1, ..., En such
that M ≅ E0, N ≅ En and for every i (0 ≤ i < n) either Ei ⇒ Ei+1 or
Ei+1 ⇒ Ei. The proof is represented by the diagram in Figure 2.2, which
shows the induction step using the diamond property.

[Figure 2.2: the induction step. The diamond property is applied repeatedly
along the chain E0, E1, ..., En to bring M and N down to a common Z.]
An immediate consequence of this second form of the Church-Rosser
theorem is the following:
Corollary A: If N is in normal form and M = N then M ⇒ N.
Proof: According to the second Church-Rosser theorem there
exists some λ-expression Z such that M ⇒ Z and N ⇒ Z. But, since
N is in normal form, N ≅ Z must be the case and thus, M ⇒ N.
Further, if both M and N are in normal form then M = N implies M ≅ N,
that is, two equal expressions in normal form must be α-congruent.
Here, we should add that for any two λ-expressions, M and N, it is
always decidable whether or not M ≅ N. All we have to do is to rename
systematically the bound variables in each λ-expression so that the name
of a bound variable will be determined uniquely by the order in which the
binding λ's occur. This way the decision of α-equality can be reduced to
an identity check, which is trivial. Finally, we state the following:
Corollary B: If M = N, then either M and N both have the same
normal form (up to α-congruence) or else neither of them has a
normal form.
Again, this follows immediately from the Church-Rosser theorem. So, the
question of the equality of λ-expressions can be reduced to the problem of
deciding whether or not they have normal forms. But, unfortunately, that
is in general undecidable. It is, of course, possible to show that certain
λ-expressions are equal (i.e. β-convertible) in spite of the fact that they
have no normal form. The problem is that in general we cannot determine
in a finite number of steps whether or not they are equal when they have
no normal form.

2.5 Beta-reduction revisited

As we mentioned before, the substitution operator is not as simple as may
be desirable for many applications. The substitution prefix is not a proper
part of the lambda-notation, and each time it appears during β-reduction,
it should be eliminated by using its recursive definition. The number of
iteration steps required for the elimination of the substitution prefix depends
on the construction of the λ-expression in question. This cannot be
considered an elementary operation, so it is reasonable to look for an
alternative definition of β-reduction which would be based on more elementary
operations. This can be achieved simply by decomposing the substitution
operation into more elementary steps.
This means that we can define β-reduction by making essential use of
the properties of the substitution operation without explicitly referring to
it. This way we get five different β-rules, instead of the original one, but
the new β-rules do not involve any recursion. Moreover, we can do away
completely with the substitution prefix and stay within the limits of the
pure lambda-notation. Even the renaming operation becomes superfluous,
if we adopt the following α-rule:

THE REVISED ALPHA-RULE

(α) λx.E → λz.(λx.E)z for any z ∉ φ(E)

Note that this new α-rule is not symmetric by definition and it would not
perform any renaming by itself. But the following β-rules would take care
of the renaming, as well.

THE REVISED BETA-RULES

(β1) (λx.x)Q → Q
(β2) (λx.y)Q → y if x and y are different variables.
(β3) (λx.λx.E)Q → λx.E
(β4) (λx.λy.E)Q → λy.(λx.E)Q if x and y are different variables, and
at least one of these two conditions holds: x ∉ φ(E), y ∉ φ(Q).
(β5) (λx.(E1)E2)Q → ((λx.E1)Q)(λx.E2)Q

A λ-expression of the form λx.E is called an α-redex without any
restriction. However, a term of the form (λx.E)Q is a β-redex if and only if
it has the form of the left-hand side of a β-rule and satisfies its conditions.
In particular, a λ-expression of the form (λx.λy.E)Q with different x and
y and with x ∈ φ(E) and y ∈ φ(Q) is not a β-redex. Such a λ-expression
can be reduced only after an appropriate renaming is carried out. This can
be initiated by an α-reduction of λy.E which yields (λx.λz.(λy.E)z)Q with
z ∉ φ(Q).
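
Because each rule is a simple pattern on the shape of the redex, the system
is easy to transcribe. The following Haskell sketch (ours, reusing the Exp
type and freeVars from the earlier substitution sketch) performs one α- or
β-step at the root of a term, returning Nothing when no rule applies:

-- One β-step at the root; Nothing means the term is not a β-redex.
betaStep :: Exp -> Maybe Exp
betaStep (App (Lam x (V v)) q)
  | v == x    = Just q                                  -- (β1)
  | otherwise = Just (V v)                              -- (β2)
betaStep (App (Lam x (Lam y e)) q)
  | x == y    = Just (Lam x e)                          -- (β3)
  | x `notElem` freeVars e || y `notElem` freeVars q
              = Just (Lam y (App (Lam x e) q))          -- (β4)
  | otherwise = Nothing        -- an α-step on λy.e is needed first
betaStep (App (Lam x (App a b)) q)
              = Just (App (App (Lam x a) q) (App (Lam x b) q))  -- (β5)
betaStep _    = Nothing

-- The revised α-rule, with the fresh variable supplied by the caller:
alphaStep :: String -> Exp -> Maybe Exp
alphaStep z (Lam x e)
  | z `notElem` freeVars e = Just (Lam z (App (Lam x e) (V z)))
alphaStep _ _ = Nothing

Note that, as stated below, every β-redex matches at most one rule, so the
contractum computed here is uniquely determined.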
It should be noted that the contractum of a β-redex is uniquely defined
in this system, since for every β-redex there is at most one β-rule that is
applicable to it. In order to see how this system works we prove a few basic
lemmas. But first of all, we have to redefine the notions of β-reduction and
equality.
Definition 2.7 The relation M ⇒ N (read M reduces to N) is
defined as follows:
(1) M ⇒ N if M ≡ N.
(2) M ⇒ N if M → N is an instance of the new α-rule or one
of the β-rules.
(3) If M ⇒ N for some λ-expressions, M and N, then both (M)E
⇒ (N)E and (E)M ⇒ (E)N for any λ-expression E.
(4) If M ⇒ N for some λ-expressions, M and N, then
λx.M ⇒ λx.N for any variable x.
(5) If there is a λ-expression E such that M ⇒ E and E ⇒ N then
M ⇒ N.
(6) M ⇒ N only in those cases as specified by (1) through (5).

According to this definition reduction is reflexive and transitive. Equality
can be defined again as the symmetric and transitive closure of ⇒.
Definition 2.8 M is convertible (or simply equal) to N, in symbols
M = N, iff M ≡ N or M ⇒ N or N ⇒ M or there is a
λ-expression E such that M = E and E = N.
Now, we can prove that this new definition of equality is equivalent to the
original one. First we prove some basic facts.
Lemma 2.1 If M ⇒ N and x ∈ φ(N) then x ∈ φ(M).
Proof: The proof follows immediately from the fact that free
variables may only disappear but can never be introduced by any
contraction.
Lemma 2.2 For any variable x and λ-expression P we have
(λx.P)x ⇒ P.
Proof: This can be shown by induction on the construction of P.
Namely,
(λx.x)x ⇒ x by β1
(λx.y)x ⇒ y by β2
(λx.λx.E)x ⇒ λx.E by β3
(λx.λy.E)x ⇒ λy.(λx.E)x by β4, if x and y are different,
where (λx.E)x ⇒ E by the induction hypothesis.
Finally, (λx.(E1)E2)x ⇒ ((λx.E1)x)(λx.E2)x ⇒ (E1)E2 by
β5 and by the induction hypothesis.
Lemma 2.3 If x ∉ φ(P) then for every Q we have (λx.P)Q ⇒ P.
Proof: Again we use structural induction. If P is a variable y then
it must be different from x and thus, the assertion follows from
β2. If P has the form λy.E then either y is identical to x and thus,
(λx.λx.E)Q ⇒ λx.E by β3, or else y is different from x, and x ∉
φ(E), which imply that

(λx.λy.E)Q ⇒ λy.(λx.E)Q ⇒ λy.E

by β4 and by the induction hypothesis.
For P ≡ (E1)E2 the assertion follows from β5 and the
induction hypothesis and this completes the proof.
Lemma 2.4 For every x, z and P such that z is neither free nor
bound in P we have (λx.P)z ⇒ {z/x}P.
Proof: Again we use structural induction. If P is a variable then the
assertion is trivial.
For P ≡ λx.E we get

(λx.λx.E)z ⇒ λx.E ⇒ λz.(λx.E)z ⇒ λz.{z/x}E ≡ {z/x}λx.E

by the induction hypothesis and by Definitions 2.7 and 2.2.
For P ≡ λy.E, where x and z are different from y, we have

(λx.λy.E)z ⇒ λy.(λx.E)z ⇒ λy.{z/x}E ≡ {z/x}λy.E.

For P ≡ (E1)E2 the result follows immediately from β5 and
from the induction hypothesis and this completes the proof.
Corollary: If M ≅ N then M ⇒ N is valid in the sense of Definition
2.7.
Proof: It suffices to consider the case when M →α N, that is when
M ≡ λx.E and N ≡ λz.{z/x}E for some x, z, and E. In this case
we have

M ≡ λx.E ⇒ λz.(λx.E)z ⇒ λz.{z/x}E ≡ N

by the new α-rule and by Lemma 2.4, which completes the proof.
This means that the renaming operation can be implemented by the new
reduction rules. Next, we show that the same is true for the substitution
operation.
Lemma 2.5 (λx.P)Q ⇒ [Q/x]P in the sense of Definition 2.7 for
every x, P, and Q.
Proof: Here we use induction on the number of occurrences of
variables in P.
Basis: If P has a single occurrence of a variable then the
assertion follows immediately from β1 or β2, and from Definition
2.4.
Induction step: Assume that the assertion is true for all
λ-expressions with at most n occurrences of variables and let P
have n + 1 of them. Then P has the form either (P1)P2 or λy.E. For
the former case the result follows immediately from the induction
hypothesis. For P ≡ λy.E we have three subcases:
Case 1: x is the same as y. In this case the assertion is trivial.
Case 2: x and y are different and y ∉ φ(Q). Here we have

(λx.λy.E)Q ⇒ λy.(λx.E)Q ⇒ λy.[Q/x]E ≅ [Q/x]λy.E

by β4, the induction hypothesis, and Definition 2.4.
Case 3: x and y are different and y ∈ φ(Q). In this case we get

(λx.λy.E)Q ⇒ (λx.λz.(λy.E)z)Q ⇒ λz.(λx.(λy.E)z)Q ⇒

λz.(λx.{z/y}E)Q ⇒ λz.[Q/x]{z/y}E ≅ [Q/x]λy.E

by the α-rule, β4, Lemma 2.4, the induction hypothesis, and Definition 2.4.
Here we have assumed that {z/y}E has at most as many occurrences
of variables as does E. This is easy to show by a separate
induction and this completes the proof.
Note that neither substitution nor α-congruence is needed in our new
system, since they are covered by reduction as defined in Definition 2.7. We
can summarize the result of this section in the following theorem.
Theorem 2.3 For any two λ-expressions, M and N, if M ⇒ N holds
with respect to Definition 2.5 then it also holds with respect to
Definition 2.7.
Proof: It is enough to show that M ⇒ N holds with respect to
Definition 2.7 whenever M ≅ N or M → N is an instance of the
original β-rule. In the first case the result follows from the Corollary
of Lemma 2.4 while in the second case it follows from Lemma
2.5.
The converse of Theorem 2.3 is obviously false. For instance,

λx.E ⇒ λz.(λx.E)z

is not true with respect to Definition 2.5. Similarly,

(λx.(P1)P2)Q ⇒ ((λx.P1)Q)(λx.P2)Q

follows immediately from Definition 2.7, but it is not so when Definition
2.5 is used.
Equality, however, is the same in both systems. In particular, both

(λx.(P1)P2)Q and ((λx.P1)Q)(λx.P2)Q

are reducible to ([Q/x]P1)[Q/x]P2 regardless of which definition of ⇒ is
used. Hence, they are equal in both senses. This leads to the following
corollary which is an immediate consequence of the above theorem.
Corollary: The Church-Rosser theorem is true also for the new
system.
The new reduction rules appear to be very convenient for computer
implementations. A variant of this system is used in the implementation
discussed in Chapter 6.

Exercises
2.1 Reduce the following λ-expressions to their respective normal forms
using the β-rule of Section 2.3.
(a) (((λf.λx.λy.(x)(f)y)p)q)r
(b) (((λx.λy.λz.(y)x)(x)y)(u)z)y
(c) (λx.(λy.(x)(y)y)λz.(x)(z)z)(λu.λv.u)w
(d) (((λx.λy.λz.((x)z)(y)z)(λu.λv.u)w)λs.s)t
(e) (((λx.(λy.(x)(y)y)λy.(x)(y)y)λz.λu.λv.(u)(z)v)(λr.λs.r)t)w
2.2 Show that [z/x]P ≅ {z/x}P for every x, z, and P such that z is neither
free nor bound in P. Hint: Examine the proof of Lemma 2.4.
2.3 Use induction on the construction of E to prove that

λz.[Q/x]{z/y}E ≅ λv.[Q/x]{v/y}E

for any λ-expressions E and Q, and variables x, y, z, and v provided that
both z and v satisfy the conditions for z in part (5) of Definition 2.4.
2.4 Which of the following two equations is correct?

((λu.λv.(v)u)λx.(x)x)λy.(y)(y)y = (λv.(λu.(u)v)λx.(x)x)λy.(y)(y)y
((λv.λu.(u)v)λy.(y)(y)y)λx.(x)x = ((λu.λv.(v)v)u)λy.(y)(y)y

2.5 Consider the conventional definition of substitution:

(1) [Q/x]x ≡ Q
(2) [Q/x]y ≡ y
(3) [Q/x]λx.E ≡ λx.E
(4) [Q/x]λy.E ≡ λy.[Q/x]E if x ≠ y; and y ∉ φ(Q) or
x ∉ φ(E).
(5) [Q/x]λy.E ≡ λz.[Q/x][z/y]E if x ≠ y; and y ∈ φ(Q) and
x ∈ φ(E), where z is the first variable (in the infinite
sequence of all variables) which is not free in (E)Q.
(6) [Q/x](E1)E2 ≡ ([Q/x]E1)[Q/x]E2

Show that for every E, Q, and x the result of the conventional substitution,
[Q/x]E, is α-congruent with the result obtained from Definition 2.4.
2.6 Design and implement an algorithm to decide the α-congruence of two
λ-expressions being in normal form.
CHAPTER THREE

COMBINATORS AND CONSTANT SYMBOLS

3.1 Lambda-expressions without free variables


The meaning (or the value) of a λ-expression may depend on the meaning
of its free variables. The latter are determined by the context in which the
λ-expression occurs. Formally a context is defined as a λ-expression with
one or more holes in it. Each hole must be a proper part of the context and
it must be replaceable by any λ-expression. This means that the hole is
syntactically a λ-expression and thus, it cannot occupy the place of a
missing λ or some other component which is not a legitimate λ-expression
by itself.
Using a λ-expression in some context means using it as a
subexpression of another expression. But, because of the possible bindings in the
context, free occurrences of variables may get captured in the process. If,
for example, we put the λ-expression

λx.(y)x

in the context

((λx.λy.hole)E)F

then we get

((λx.λy.λx.(y)x)E)F
which reduces to

λx.(F)x

The free occurrence of y in λx.(y)x becomes bound in the given context,
hence, it gets replaced by F.
Notice the difference between using an expression E in some context
C, and the substitution of E for hole regarded as a free variable in C. The
second corresponds to the application

(λhole.C)E → [E/hole]C

where free variables cannot be captured.

The free variables of a i\-expression correspond to the global variables
used in a nested block or in the body of a procedure declaration in a con-
ventional programming language. A bound variable, on the other hand,
corresponds to a formal parameter or a local variable. A well-known reason /
for avoiding the use of global variables in conventional languages is to
minimize the influence of the uncontrollable context on the behavior of a
procedure.
i\-expressions having no free variables are clearly independent of their
context. Indeed, they behave the same way in any context and thus, they
are similar to constant symbols. Take, for example, the i\-expression
i\x.x. If we apply this to some other expression we always get the other
expression as a result regardless of the context. Thus, the expression i\x.x
represents the identity transformation in any context.
Definition 3.1 λ-expressions without free variables are called
closed λ-expressions or combinators.
The term 'combinator' refers to their use as higher-order functions
which would form new functions by combining given ones. Many of the
combinators have special names. For example, the above mentioned
identity combinator is denoted by I which is considered a constant symbol (or
reserved word) and not a variable.
Note that the I combinator represents a whole class of α-congruent
λ-expressions, because the λ-expression λv.v is an identity mapping for
every variable v. This means that the I combinator is defined only up to
α-congruence, namely

I ≅ λx.x
For another example, consider the operation of function composition.
Observe the fact that lambda-calculus has only two fundamental
operations, application and abstraction, but no composition. Nevertheless, the
composition of two functions, f and g, can be expressed in lambda-notation
simply by

λx.(f)(g)x

where x ∉ φ(f) ∪ φ(g). Hence, the combinator

compose ≅ λf.λg.λx.(f)(g)x

represents the composition operator. Note that this is a Curried operator
rather than an infix one. With the aid of the conversion rules we can show
that it is associative, that is

((compose)((compose)f)g)h = ((compose)f)((compose)g)h

The reader is recommended to work out the details.
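
For instance, both sides can be brought to a common form (a sketch of the
calculation, which the reader can complete in detail):

((compose)((compose)f)g)h ⇒ λx.(((compose)f)g)(h)x ⇒ λx.(λy.(f)(g)y)(h)x ⇒ λx.(f)(g)(h)x

((compose)f)((compose)g)h ⇒ λx.(f)(((compose)g)h)x ⇒ λx.(f)(λy.(g)(h)y)x ⇒ λx.(f)(g)(h)x

Since both sides β-reduce to λx.(f)(g)(h)x, the two compositions are equal.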
Special symbols for combinators like I and compose can be treated in
lambda-calculus in two different ways:
(1) Either we can use them simply as shorthands for the
corresponding λ-expressions, in which case no extra reduction rules are
needed for them, because we can always revert to the pure
lambda-notation and use the α- and β-rules as necessary.
(2) Or else we can treat them as atoms, i.e. constant symbols,
which cannot be further analyzed. Then, we have to introduce
specific reduction rules to define their properties.
In this book we shall use both of these alternatives according to our needs.
The defining property of the identity combinator is, of course,

(I)E → E

for every λ-expression E. Similarly, the composition combinator can be
defined by the reduction rule

(((compose)F)G)E → (F)(G)E

for arbitrary λ-expressions, F, G and E.
Two other important combinators should be mentioned here, which
represent the truth values in lambda-notation. Namely, the combinators
true and false will be defined as follows:
true ≅ λx.λy.x

false ≅ λx.λy.y

Alternatively, they can be defined by the following reduction rules:

((true)P)Q → P

((false)P)Q → Q

for every P and Q. The reason behind these definitions is the fact that the
truth values are normally used for selecting either one of two given
alternatives. (Think of an IF-statement in a programming language.) If the
condition evaluates to true then the first alternative is computed, if it
evaluates to false then the second. Hence, the conditional expression

if C then P else Q

will take the form

((C)P)Q

in our lambda-notation. Here P and Q represent arbitrary λ-expressions,
while the λ-expression C is supposed to be reducible either to λx.λy.x or
to λx.λy.y. Thus, the conditional expression is represented by the
combinator

λc.λp.λq.((c)p)q

which takes three arguments. Observe the fact that this combinator can be
applied to arbitrary λ-expressions regardless of the 'type' of the first
argument. In this respect type-free λ-calculus is similar to the machine code of
a conventional computer where arbitrary operations can be performed on
any data. We may get some unexpected results but that is not the
computer's fault.
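
Since the truth values are ordinary functions, they can be tried out
directly in a typed functional language. The following Haskell lines (our
sketch, with our own names so as not to clash with the built-in Booleans)
transcribe true, false, and the conditional combinator:

churchTrue, churchFalse :: a -> a -> a
churchTrue  x _ = x            -- λx.λy.x: select the first alternative
churchFalse _ y = y            -- λx.λy.y: select the second alternative

-- the conditional combinator λc.λp.λq.((c)p)q
cond :: (a -> a -> a) -> a -> a -> a
cond c p q = c p q

-- ghci> cond churchTrue "then" "else"
-- "then"
-- ghci> cond churchFalse "then" "else"
-- "else"

Of course, the Haskell version is typed, so it cannot reproduce the
type-free behavior discussed above; it only illustrates the selection mechanism.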
Returning to the combinators we can say that there are quite a few
interesting and useful combinators. Some of them will be studied more
thoroughly in this chapter, but we cannot discuss them all. Historically,
combinators were invented first by Schönfinkel in the late 20's. He used
them to eliminate all variables from mathematical logic. Curry, who
independently discovered them at about the same time, was mainly responsible
for the subsequent development of their theory. Combinators have
received a great deal of attention lately in computer science, especially in the
context of functional programming.

Exercises
3.1 Negation can be represented by the combinator not which is
α-congruent to

λx.((x)false)true

Find combinators to represent the boolean operations and and or.
3.2 Show that the operations represented by your and and or combinators
are both commutative and associative. Also, show that your and and or
combinators do have the usual distributive properties.
3.3 Find a combinator to represent the prefix apply operator characterized
by the reduction rule

((apply)A)B → (A)B

for arbitrary λ-expressions A and B. Using your representation show that
also the iteration

(((apply)apply)A)B

β-reduces to (A)B.

3.2 Arithmetic and other constants and combinators

As we have seen in the previous section, both the truth values and the
standard boolean operations can be represented by certain combinators,
that is, by certain λ-expressions without free variables. It seems only na-
tural that no free variables occur in these representations, since they must
not depend on the context in which they are used. In other words, they
must behave the same in every context. In this respect, there is no differ-
ence between a constant value like true or false and some well-defined
function like and. As a matter of fact, every well-defined function can be
considered a constant entity regardless of the number of operands it takes.
Observe the fact that even the truth values are represented here as
functions. Each of them can take up to two arguments as can be seen from
their representations. But that is quite natural. If we do not use constant
symbols in our lambda-notation then we have only application,
abstraction, and variables. So, the only way to represent context independent
behavior is by using combinators.
It is important to note that almost anything can be represented in this
manner. To support this claim, we show the combinator representation of
natural numbers developed by A. Church. These combinators are called
Church numerals, and they are defined as follows:

0 ≅ λf.λx.x

1 ≅ λf.λx.(f)x

2 ≅ λf.λx.(f)(f)x

3 ≅ λf.λx.(f)(f)(f)x

and, in general, the combinator representing the number n iterates its first
argument to its second argument n times.
The arithmetic operations on these numerals can be represented also
by appropriate combinators. For instance, the successor function, which
increments its argument by one, can be represented as

succ ≅ λn.λf.λx.(f)((n)f)x

Note that the names of the bound variables do not matter. Addition can
be represented as

+ ≅ λm.λn.λf.λx.((m)f)((n)f)x

while multiplication will be the same as composition, namely

* ≅ λm.λn.λf.(m)(n)f

The reader is recommended to work out the details and find a
representation for the exponentiation mⁿ in this framework.
A predicate to test for zero in this representation can be given as
follows:

zero ≅ λn.((n)(true)false)true
The representation of the predecessor function, which gives n - 1 for
n > 0, and 0 for n = 0, is quite tricky. The idea is to represent ordered pairs
in lambda-notation which would correspond somehow to [n, n - 1]. A
possible representation of the ordered pair [a, b] is the expression

λz.((z)a)b

which has the following properties

(λz.((z)a)b)true ⇒ a

and

(λz.((z)a)b)false ⇒ b

Now, in analogy with the successor function we can define a function next
to obtain [n + 1, n] from [n, n - 1]. The corresponding λ-expression will
be the following:

next ≅ λp.λz.((z)(succ)(p)true)(p)true

which makes use of only the first element of the ordered pair p representing
the argument. So, we can start with [0, 0] and iterate the next function n
times to get [n, n - 1]. But, that is easy since the Church numeral
representing the number n involves precisely n iterations. Therefore, we can
apply this Church numeral to next as its first argument and to the expression

λz.((z)0)0

as its second argument. This gives us the λ-expression

((n)λp.λz.((z)(succ)(p)true)(p)true)λz.((z)0)0

where n stands for the Church numeral representing the number n.
Finally, in order to obtain the value of the predecessor function, we
have to select the second element of the resulting ordered pair. Thus, the
predecessor function can be represented by the following λ-expression:

pred ≅ λn.(((n)λp.λz.((z)(succ)(p)true)(p)true)λz.((z)0)0)false

It may be interesting to note that Church himself could not find a
representation for the predecessor function. He had just about convinced
himself that the predecessor function was not lambda-definable when Kleene
found a representation for it. (See page 57 in [Klee81].)
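
The numerals and their operations transcribe almost verbatim into Haskell.
In the sketch below (ours, not from the book) each definition is a
transliteration of the corresponding λ-expression; pred is demonstrated at the
type of integer pairs, since the fully general version would need more
advanced type-system features:

zeroC :: (a -> a) -> a -> a
zeroC _ x = x                          -- λf.λx.x

succC :: ((a -> a) -> a -> a) -> (a -> a) -> a -> a
succC n f x = f (n f x)                -- λn.λf.λx.(f)((n)f)x

addC, mulC :: ((a -> a) -> a -> a) -> ((a -> a) -> a -> a) -> (a -> a) -> a -> a
addC m n f x = m f (n f x)             -- λm.λn.λf.λx.((m)f)((n)f)x
mulC m n f x = m (n f) x               -- λm.λn.λf.(m)(n)f

toInt :: ((Int -> Int) -> Int -> Int) -> Int
toInt n = n (+ 1) 0                    -- count the iterations

-- pred by the pair trick: iterate next over [0, 0] and keep the
-- second component of the resulting pair.
predInt :: (((Int, Int) -> (Int, Int)) -> (Int, Int) -> (Int, Int)) -> Int
predInt n = snd (n next (0, 0))
  where next (a, _) = (a + 1, a)       -- [k + 1, k] from [k, k - 1]

-- ghci> let two f x = f (f x); three f x = f (f (f x))
-- ghci> (toInt (addC two three), toInt (mulC two three), predInt three)
-- (5,6,2)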
It can be shown, in general, that every recursive function of type
[N → N] is lambda-definable and integer arithmetic can be faithfully
embedded in pure lambda-calculus. This seems to be a useless 'tour de force',
because the decimal notation is clearly more convenient than these
encoded 'numerals'. There is, however, a good reason behind this exercise.
It has, indeed, some important theoretical implications. First of all, the
proof of the Church-Rosser theorem extends to the arithmetic of natural
numbers. Therefore, we can use the decimal notation for the natural
numbers and various constant symbols for their operations knowing that
the extended system still has the Church-Rosser property. Of course, we
cannot say the same about real arithmetic.
The natural numbers can be represented in lambda-calculus in many
different ways. The representation of their operations will have to be
changed accordingly. An alternative numeral system is given in Exercise
3.6.
It should be emphasized that the Church numerals as well as the
representations of their basic operations are all combinators. These
combinators will be given special names and will be treated as constant
symbols in the rest of this book. In fact, we will use integers and real numbers
in decimal notation as part of our lambda-notation and we will assume that
the standard arithmetic operations are implemented directly rather than via
combinators. A similar assumption is made regarding the truth values and
the boolean operators. So, the syntax of λ-expressions will be extended by
the following.

ADDITIONAL SYNTAX RULES FOR CONSTANTS

<constant> ::= <number> | <operator> | <combinator>
<number> ::= <integer> | <real number>
<operator> ::= <arithmetic operator> | <relational operator> |
<predicate> | <boolean operator>
<arithmetic operator> ::= + | - | * | / | succ | pred | mod
<relational operator> ::= < | ≤ | = | ≥ | > | ≠
<predicate> ::= zero
<boolean operator> ::= and | or | not
<combinator> ::= true | false


All these constants will have their usual meaning. The binary operations
and the relations are Curried, however, so that we write ((+)a)b instead
of a + b, and ((<)a)b instead of a < b, etc. The translation to and from
the usual infix notation is straightforward. The semantics of these con-
stants will be described in more detail in Chapter 5 with the aid of their
specific reduction rules.

Exercises
3.4 Define a combinator square to compute n² for a natural number n using
Church numerals.
3.5 Find a λ-expression to represent the predicate even which returns true
whenever the argument is an even number and returns false when it is an
odd number. Use Church numerals and do not worry about the value of
the predicate for non-numeric arguments.
3.6 Consider the following numerals:

0 ≅ I

1 ≅ λz.((z)false)I

2 ≅ λz.((z)false)λz.((z)false)I

and so on. Find λ-expressions to represent the successor and the predecessor
functions and the predicate to test for zero. The latter may be used in the
definition of the predecessor function.

3.3 Recursive definitions and the Y combinator


Recursive functions can be defined easily in high-level programming lan-
guages such as Pascal or LISP. Therefore, they should be definable in
lambda-calculus, as well. As a matter of fact, it is easy to find a formal
solution to every recursion equation in lambda-calculus. This can be done
with the aid of the Y combinator, which has the following property:
(Y)E = (E)(Y)E

for every λ-expression E. This implies that

(Y)E = (E)(E) ... (E)(Y)E

for any number of iterations of E. The question is whether we can find a
closed λ-expression with this property. In Chapter 2 we have seen that the
λ-expression

(λx.(x)x)λx.(x)x

reduces to itself and thus, it gives rise to an infinite reduction. This feature
is very similar to what we are looking for, only we need to deposit a prefix
of the form (E) in each reduction step. But this can be achieved by the
following modification:

λy.(λx.(y)(x)x)λx.(y)(x)x

Indeed, if we apply this combinator to a λ-expression E then we get

(λx.(E)(x)x)λx.(E)(x)x ⇒ (E)(λx.(E)(x)x)λx.(E)(x)x
⇒ (E)(E)(λx.(E)(x)x)λx.(E)(x)x

and so forth ad infinitum. So, the Y combinator will be defined as

Y ≅ λy.(λx.(y)(x)x)λx.(y)(x)x

The usefulness of this combinator is due to the fact that every recursive
definition can be brought to the form of a fixed-point equation

f = (E)f

where f does not occur free inside E. The solution to this equation, namely
the fixed point of the higher order function E, can be obtained as (Y)E.
Indeed, the substitution of (Y)E for f in the above equation yields

(Y)E = (E)(Y)E

which is true for every E by the definition of Y. Therefore, the Y
combinator is a universal fixed-point finder; hence, it is called a fixed-point
combinator.
In order to see how it works in a simple case consider, for example, the
following definition of the factorial function.

(fact)n = if n = 0 then 1 else ((*)n)(fact)(pred)n

The solution of this implicit equation can be obtained in explicit form as
follows. The equation is written in our lambda-notation as

(fact)n = (((zero)n)1)((*)n)(fact)(pred)n

which is equivalent to

fact = λn.(((zero)n)1)((*)n)(fact)(pred)n

Now, if we 'abstract out' (in analogy with factoring out) the free
occurrences of the function name 'fact' on the right-hand side, we get

fact = (λf.λn.(((zero)n)1)((*)n)(f)(pred)n)fact

which means that fact is a fixed point of the expression

λf.λn.(((zero)n)1)((*)n)(f)(pred)n

Hence, we get the explicit form

fact = (Y)λf.λn.(((zero)n)1)((*)n)(f)(pred)n

which is again a combinator.
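
In a lazy language the defining property of Y can be written down
literally. The following Haskell sketch (ours; the standard library provides the
same operator as Data.Function.fix) applies it to the factorial equation
above. Note that the self-applicative λ-expression for Y itself is not
typable in Haskell without an auxiliary recursive type:

fixpt :: (a -> a) -> a
fixpt e = e (fixpt e)          -- (Y)E = (E)(Y)E, taken as a definition

-- fact = (Y)λf.λn.(((zero)n)1)((*)n)(f)(pred)n, transcribed:
fact :: Integer -> Integer
fact = fixpt (\f n -> if n == 0 then 1 else n * f (n - 1))

-- ghci> fact 5
-- 120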
The same technique is applicable to every recursive definition
regardless of the form of the expression on the right-hand side. This means that
in lambda-calculus we have a universal recursion-removal technique that
always works. This broad generality, however, does have its problems as
it produces a formal solution even if the given fixed-point equation has no
realistic solution. Take, for example, the equation

x = x + 1

which has obviously no finite solution because the successor function has
no fixed points. Nevertheless, the fixed-point combinator would yield a
formal solution in the following closed form:

x = (Y)λx.((+)x)1

Here the right-hand side gives rise to an infinite reduction, which yields an
infinite number of iterations, because it has no normal form. Indeed, the
Y combinator converts every recursion into an iteration, but it cannot
terminate the iteration by itself.
The existence of a universal fixed-point combinator in type-free
lambda-calculus has serious consequences as manifested by the following:
Curry's paradox: The usual logical properties of implication are
inconsistent with β-equality.
Proof: The implication operator ⊃ must satisfy the following
axiom written in conventional notation:

(P ⊃ (P ⊃ Q)) ⊃ (P ⊃ Q)

Furthermore, it must satisfy the inference rule of modus ponens,
which says that

if both P and (P ⊃ Q) are true then so is Q

Let imp denote the Curried version of ⊃. Then the above axiom
will be written as

((imp)((imp)P)((imp)P)Q)((imp)P)Q

For arbitrary Q, let N be defined as

N = λx.((imp)x)((imp)x)Q

and let

P = (Y)N

be a fixed point of N. Hence, we get

((imp)P)((imp)P)Q = (N)P = (N)(Y)N = P

By substitution in the axiom we get

((imp)((imp)P)((imp)P)Q)((imp)P)Q =
((imp)P)((imp)P)Q = P

Now, since the axiom must be true for any P and Q,

P = (Y)N = true

must be the case for any Q (which is used in the definition of N).
But then from modus ponens and from the axiom we can conclude
that
Q = true

for arbitrary Q, which is clearly a paradox.
One might suggest that the problem with the Y combinator is due to the
fact that it involves self-application. Clearly, the application of a function
to itself seems inconsistent with the fact that the cardinality of the function
space R^D is always greater than that of D. Therefore, it is impossible to
find a reasonable domain D and a range R such that R^D ⊂ D as implied by
self-application. But, if the function space is restricted to Scott-continuous
functions then one can construct such domains. This means that
self-application by itself is not inconsistent with set theory, but the usability of
the type-free lambda-calculus has certain limitations.
Self-application may be useful for certain purposes. It is quite possible
to apply a computer program to itself and get a meaningful result. A well
known example is a LISP interpreter written in LISP. Also, one can easily
write self-applicable programs in the machine code of a regular computer.
Indeed, for a truly type-free theory of functions self-application cannot be
ruled out.
Note that the universal applicability of the Y combinator does not
offer a practical solution to every fixed-point equation. If, for instance, we
were to solve the perfectly reasonable fixed-point equation

x = 3x - 10

that is

x = ((-)((*)3)x)10

then we would get

x = (λx.((-)((*)3)x)10)x

and the explicit form

x = (Y)λx.((-)((*)3)x)10

The right-hand side of this equation reduces to

(λx.((-)((*)3)x)10)(Y)λx.((-)((*)3)x)10

which further reduces to

((-)((*)3)(Y)λx.((-)((*)3)x)10)10
and so on indefinitely. The right-hand side has no normal form so we
cannot get a finite result from this infinite development.
In case of a well-founded recursion, however, the iteration generated
by the Y combinator will terminate precisely at the right moment. For
instance, the application of the explicit form of the factorial function to the
argument 5 would yield the result in precisely 5 iterations. This is due to
the presence of a second abstraction, i.e. the prefix λn, and the predicate
zero.
Unfortunately, it is in general undecidable whether a recursively
defined function is computable in a finite number of steps (i.e. has a normal
form) when applied to some argument. The fact that there exist unsolvable
equations should not be a surprise. It is also reasonable that we cannot
have an algorithm to recognize them mechanically.
For the sake of completeness we have to mention that there are other
fixed-point combinators besides the best known Y combinator. The
following was given by Turing:

T ≅ (λx.λy.(y)((x)x)y)λx.λy.(y)((x)x)y

It is easy to see that

(T)E ⇒ (E)(T)E

for every λ-expression E. Actually, there is an infinite sequence of
fixed-point combinators Y1, Y2, ... where

Y1 ≅ Y

Y2 ≅ (Y1)G

and in general

Yn+1 ≅ (Yn)G

where

G ≅ λx.λy.(y)(x)y

The reader should verify that Y2 = T, where T is the fixed-point
combinator of Turing. For more details see Exercise 3.9.
Exercises
3.7 Give a recursive definition for the Fibonacci numbers and use the Y
combinator to compute the fifth Fibonacci number, i.e. the value of
(Fibonacci)5.
3.8 Give a recursive definition for the greatest common divisor of two
integers and compute the value of ((gcd)10)14 using the Y combinator.
3.9 Consider the sequence of combinators Y1, Y2, ..., and the combinator
G as defined at the end of this section. Show that
(a) Each Yi is a fixed-point combinator;
(b) Each Yi is a fixed point of G, i.e. Yi = (G)Yi,
which means that G has an infinite number of fixed points each of which
is a fixed-point combinator.

3.4 Elimination of bound variables: bracket abstraction

According to a fundamental theorem of Schönfinkel and Curry, the entire
lambda-calculus can be recast in the theory of combinators which has only
one basic operation: application. Abstraction is represented in this theory
with the aid of two distinguished combinators: S and K, which are called
standard combinators. The idea is similar to the representation of
composition in terms of application and abstraction, but here we have a much more
difficult task.
First, we should observe the fact that abstraction is actually a partial
inverse of the application. We have, namely

(λx.P)x ⇒ P

for every λ-expression P and variable x. But the order of these two
operations is important here. If we use them in the reverse order, i.e. if we first
apply P to x and then abstract the result with respect to x, then we get

λx.(P)x
which is clearly not β-reducible to P. (The prefix λx. will not disappear
unless the entire expression is applied to some other expression.) Even so,
P may have some free occurrences of x which get captured by the prefix
λx. Therefore, the application

(λx.(P)x)Q

reduces to ([Q/x]P)Q which is, in general, different from (P)Q. Of
course, we can fix this problem by using a fresh variable for the
abstraction. Namely, it is easy to prove the following:
Lemma 3.1 For any λ-expressions, P and Q, and a variable x such
that x ∉ φ(P) we have

(λx.(P)x)Q ⇒ (P)Q

This means, that for x ∉ φ(P) the expression λx.(P)x is extensionally equal
to P, although it has a different normal form. In other words, our α- and
β-rules are insufficient for proving the equality

λx.(P)x = P for x ∉ φ(P)

although these two expressions represent the same extensional function.


Remember that two functions are called extensionally equal if they
have the same value for all arguments. In symbols,

f =g iff (f)x = (g)x for all x.

Now, in order to make our lambda-calculus extensionally complete we have


to introduce a new reduction rule as follows:

THE ETA-RULE

(11) .\x.(P)x -+ P whenever x ¢ <t>(P)


By adding this rule to our axiom system we can extend the notion of
/3-equality to extensional equality, which is then formally defined as
/311-equality.
However, for the most part of the development of our theory, the
11-rule is not necessary. Therefore, we will continue to use only the a- and
the f3-rules as our standard tools for evaluating .\-expressions, and every
application of the 11-rule will be made explicit.
Returning to the question of eliminating abstraction by using
combinators instead, the number of permissible combinators is a critical
issue. If we allow an arbitrary number of different combinators then the
problem becomes much simpler. Every λ-expression without free variables
can be considered a combinator, so it can be used as such. (There is a
certain analogy with the elimination of division via multiplication by the
inverse of the divisor.) Free variables, on the other hand, can be
'abstracted out' from any λ-expression. If, for instance, u and v are the
only free variables in P then

λu.λv.P

is clearly a combinator with

((λu.λv.P)u)v = P

This way we can replace every λ-expression by an equivalent one having
only constants, combinators, and free variables applied to each other. This
gives rise to the notion of a combinator expression which is defined by the
following simplified syntax.

THE SYNTAX OF COMBINATOR EXPRESSIONS

<combinator expression> ::= <atom> | <application>
<atom> ::= <variable> | <constant> | <combinator>
<application> ::= (<combinator expression>)<combinator expression>

Next we shall prove that the two standard combinators, S and K, are
sufficient for eliminating all abstractions, i.e. all bound variables from every
λ-expression. These combinators are defined as follows.

S ≅ λx.λy.λz.((x)z)(y)z
K ≅ λx.λy.x

Note that the K combinator is actually the same as true. Furthermore, the
identity combinator

I ≅ λx.x

can be expressed in terms of S and K due to the following equality:

((S)K)K = I
which can be easily verified by the reader. Hence, we can use these three
combinators S, K, and I where I is just a shorthand for ((S)K)K.
A combinator expression in which the only combinators are the
standard ones is called a standard combinator expression.
In order to eliminate the bound variables from a λ-expression we use an
operation called bracket abstraction, which is the combinatory equivalent of
λ-abstraction. Namely, for every variable x and standard combinator
expression M there exists a standard combinator expression [x]M such that
[x]M = λx.M. Note that the bracket prefix, [x], is only a meta-notation and
the expression [x]M stands for a true standard combinator expression
which is β-convertible to λx.M.

ALGORITHM FOR BRACKET ABSTRACTION

Input: A variable x and a standard combinator expression M.
Output: A standard combinator expression [x]M with [x]M = λx.M.
The algorithm proceeds as follows:
If M ≡ c where c is a constant then let [x]M = (K)c.
If M ≡ C where C is one of the standard combinators then let
[x]M = (K)C.
If M ≡ v where v is a variable then let [x]M = I for v ≡ x, and
let [x]M = (K)v for v ≢ x.
Finally, if M ≡ (P)Q for some standard combinator
expressions, P and Q, then let [x]M = ((S)[x]P)[x]Q.
The last clause of the algorithm shows that it must be applied recursively
until we get down to the atomic components. It is easy to verify that the
expression [x]M obtained from this algorithm is indeed equal to λx.M. For
that purpose we use induction on the construction of M. The assertion is
trivial when M is an atom or a standard combinator. If M is an application
then we have

((S)[x]P)[x]Q = ((S)λx.P)λx.Q

from the induction hypothesis. Here the right-hand side β-reduces to

λz.((λx.P)z)(λx.Q)z

by the definition of S. This further reduces to

λz.({z/x}P){z/x}Q ≡ λz.{z/x}(P)Q ≅ λx.(P)Q

according to Lemma 2.4, Definition 2.2, and the α-rule, which completes
the proof.
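
The algorithm itself is a straightforward recursive function. Here is a
Haskell sketch of it (ours, over a small made-up term type):

data Term = TVar String | TCon String | S | K | I | TApp Term Term
  deriving (Eq, Show)

-- bracket x m computes [x]m, clause by clause as above.
bracket :: String -> Term -> Term
bracket x (TVar v)
  | v == x    = I                       -- [x]x = I
  | otherwise = TApp K (TVar v)         -- [x]v = (K)v for v distinct from x
bracket x (TApp p q) = TApp (TApp S (bracket x p)) (bracket x q)
bracket _ m = TApp K m                  -- constants and combinators: (K)m

-- ghci> bracket "x" (TApp (TVar "f") (TVar "x"))
-- TApp (TApp S (TApp K (TVar "f"))) I      -- that is, ((S)(K)f)I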
Theorem 3.1 To every λ-expression E one can find a standard
combinator expression F such that E = F.
Proof: Given an arbitrary λ-expression, E, we successively
eliminate its bound variables by performing bracket abstraction
inside-out. This means that we apply our algorithm to the innermost
abstractions (if any) occurring in E. Thereby we get some
E' such that E' = E and there are fewer abstractions in E' than in
E. Repeating this process with E' in place of E, eventually we get
a standard combinator expression F = E.
Example: Consider the following λ-expression:

((λx.λy.(x)(y)y)λz.(x)(z)z)(λu.λv.u)w

Applying the algorithm to the innermost λ's yields

((λx.((S)[y]x)[y](y)y)((S)[z]x)[z](z)z)(λu.(K)u)w

which is equal to

((λx.((S)(K)x)((S)I)I)((S)(K)x)((S)I)I)(((S)[u]K)[u]u)w.

Next we get

((((S)[x](S)(K)x)[x]((S)I)I)((S)(K)x)((S)I)I)(((S)(K)K)I)w

and eventually

((((S)((S)(K)S)((S)(K)K)I)((S)((S)(K)S)(K)I)(K)I)((S)(K)x)((S)I)I)(((S)(K)K)I)w

which is equal to our original λ-expression. By the way, the normal form of
this expression is

(x)(w)w

An immediate consequence of the above theorem is the following:
Corollary: Every closed λ-expression is equal to some standard
combinator expression with no variables in it.
As can be seen from the previous example, the size of the expression grows
larger and larger with each subsequent abstraction. Therefore, the above
algorithm is impractical when we have to abstract on a large number of
variables.
Curry improved on the basic algorithm by introducing two more
combinators, B and C, for the following special cases of S.

B ≅ λx.λy.λz.(x)(y)z
C ≅ λx.λy.λz.((x)z)y

Having these combinators we can simplify the expressions obtained from
the basic algorithm by applying the following rules:

(1) ((S)(K)P)(K)Q = (K)(P)Q
(2) ((S)(K)P)I = P
(3) ((S)(K)P)Q = ((B)P)Q
(4) ((S)P)(K)Q = ((C)P)Q

It is interesting to note that the second rule can only be derived in
lambda-calculus by using the η-rule, as well. This means that the second
equation is an extensional one in lambda-calculus. The reader should verify
each of these equalities.
The improved version of the abstraction algorithm will follow the same
steps as the basic algorithm but whenever an expression of the form
((S)P)Q is created it will be simplified using the above equations, if it is
possible to do so. For that purpose, the equations will be considered in the
priority of their sequence, that is, if more than one equation is applicable
at the same time then the one with the smallest serial number will be
applied.
To see this algorithm at work consider a function with two variables,
((F)x)y. Abstracting on x gives us

[x]((F)x)y = ((S)[x](F)x)[x]y = ((S)((S)[x]F)[x]x)[x]y =
((S)((S)(K)F)I)(K)y = ((S)F)(K)y = ((C)F)y

Similarly, abstracting on y yields

[y]((F)x)y = ((S)[y](F)x)[y]y = ((S)((S)[y]F)[y]x)[y]y =
((S)((S)(K)F)(K)x)I = ((S)(K)(F)x)I = (F)x

This is clearly better than what we can get from the basic algorithm. Yet
the size of the resulting expression may still grow too fast when repeated
abstractions are made. Assume that P and Q are λ-expressions with
several free variables, say x1, x2, x3, .... Then Curry's algorithm will give the
following results:

[x1](P)Q = ((S)[x1]P)[x1]Q
[x2][x1](P)Q = ((S)((B)S)[x2][x1]P)[x2][x1]Q
[x3][x2][x1](P)Q = ((S)((B)S)((B)(B)S)[x3][x2][x1]P)[x3][x2][x1]Q

and so on. The size of the expression will increase as a quadratic function
of the number of abstractions.
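
In an implementation, Curry's rules amount to a peephole simplification
applied whenever an S-node is built. The following Haskell sketch (ours)
extends the bracket function given earlier, assuming the Term type is
extended with constructors B and C; the order of the patterns realizes the
stated priority of the equations:

opt :: Term -> Term
opt (TApp (TApp S (TApp K p)) (TApp K q)) = TApp K (TApp p q)  -- rule (1)
opt (TApp (TApp S (TApp K p)) I)          = p                  -- rule (2)
opt (TApp (TApp S (TApp K p)) q)          = TApp (TApp B p) q  -- rule (3)
opt (TApp (TApp S p) (TApp K q))          = TApp (TApp C p) q  -- rule (4)
opt t                                     = t

bracketC :: String -> Term -> Term
bracketC x (TVar v)
  | v == x    = I
  | otherwise = TApp K (TVar v)
bracketC x (TApp p q) = opt (TApp (TApp S (bracketC x p)) (bracketC x q))
bracketC _ m = TApp K m

-- ghci> bracketC "x" (TApp (TApp (TVar "F") (TVar "x")) (TVar "y"))
-- TApp (TApp C (TVar "F")) (TVar "y")     -- that is, ((C)F)y as above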
Significant further improvements were made by David Turner who
introduced three more combinators, S', B', and C', which are related to S,
B, and C. Their definitions are as follows:

S' ≅ λt.λx.λy.λz.(t)((x)z)(y)z
B' ≅ λt.λx.λy.λz.(t)(x)(y)z
C' ≅ λt.λx.λy.λz.(t)((x)z)y

These new combinators behave very much like their counterparts except
that they 'reach across' an extra term at the front.

TURNER'S ALGORITHM
Use the algorithm of Curry, but whenever an expression beginning
with S, B, or C is formed use one of the following simplifications, if it is
possible to do so.

((S)((B)T)P)Q = (((S')T)P)Q
((B)T)((B)P)Q = (((B')T)P)Q
((C)((B)T)P)Q = (((C')T)P)Q

Turner's algorithm would increase the length of an expression as a linear
function of the number of abstractions. Namely,
[x2][x1](P)Q = (((S')S)[x2][x1]P)[x2][x1]Q

[x3][x2][x1](P)Q = (((S')(S')S)[x3][x2][x1]P)[x3][x2][x1]Q

and so on.
These combinators have been used in the design of the Normal Order
Reduction Machine - NORMA for short - developed by the Austin
Research Center of the Burroughs Corporation. Actually, the above
definition of B' is due to Mark Scheevel, and it is slightly different from
Turner's original definition, that is,

B' ≅ λt.λx.λy.λz.((t)x)(y)z

The modified version of B' is claimed to be more efficient than the original.
There are many other efforts to improve the efficiency of bracket
abstraction as can be seen from the literature. It is quite interesting to see that
these purely theoretical constructs have become important practical tools
for such mundane purposes as hardware design.
To conclude this section we have to emphasize that the theory of
combinators can be developed totally independently of lambda-calculus,
in which case the combinators are defined directly by their reduction rules,
i.e. by their applicative behavior. As we have seen, the standard
combinators, S and K, are sufficient for representing arbitrary closed
λ-expressions, hence, for an independent development of the theory of
combinators we need only these two combinators and their reduction rules:

(((S)A)B)C → ((A)C)(B)C

and

((K)A)B → A

for arbitrary combinator expressions, A, B, and C.
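
These two rules already suffice for a tiny normal-order evaluator. The
following Haskell sketch (ours, reusing the Term type from the
bracket-abstraction sketch) applies them leftmost-outermost; like any reducer for
this calculus, it loops on terms without a normal form:

reduce :: Term -> Term
reduce (TApp (TApp (TApp S a) b) c) = reduce (TApp (TApp a c) (TApp b c))  -- S-rule
reduce (TApp (TApp K a) _)          = reduce a                             -- K-rule
reduce (TApp p q) =
  let p' = reduce p
  in if p' == p then TApp p (reduce q) else reduce (TApp p' q)
reduce t = t

-- ghci> reduce (TApp (TApp (TApp S K) K) (TVar "a"))  -- ((S)K)K applied to a
-- TVar "a"                                            -- confirming ((S)K)K = I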


Bracket abstraction represents a translation mechanism from the
A-notation to combinator expressions, but it is not needed for an inde-
pendent development of the theory of combinators. However, if we define
combinators directly by their reduction rules then their representations by
closed A-expressions may not be unique up to a-congruence. A case in
point is the fixed-point combinator, which has many noncongruent rep-
resentations in lambda-notation. In particular, the A-expressions repres-
enting the combinators
COMBINATORS AND CONSTANT SYMBOLS 57

Y ~ Ay.(Ax.(y)(x)x)Ax.(y)(x)x
and
T ~ (Ax.Ay.(y)((x)x)y)Ax.Ay.(y)((x)x)y
are noncongruent (and not even {3-convertible), although their applicative
behavior is the same. This shows that the relationship between A-calculus
and the theory of combinators is more subtle than one might think at first.
A deeper analysis of this relationship can be found in [Baren81] or in
[Hind86].
The intuitive appeal of the lambda-notation is certainly missing from
pure combinator expressions even if we use a great deal more combinators
than just the standard ones. The lack of A-abstraction seems to be an ad-
vantage in functional programming, but a completely variable-free notation
is not always desirable. More about functional programming can be found
in Chapter 5.

Exercises
3.10 Show that the standard combinators, S and K, satisfy the following
equalities:
((S)((S)(K)S)(K)K)(K)K = (K)((S)K)K
((S)((S)(K)S)((S)(K)K)K)(K)((S)K)K = K
and
((S)((S)(K)S)((S)(K)(S)(K)S)((S)(K)(S)(K)K)S)(K)(K)((S)K)K = S
3.11 Prove the following extensional (i.e., βη) equality:
((S)((S)(K)S)K)(K)((S)K)K = ((S)K)K
3.12 Use bracket abstraction to prove that the Y combinator satisfies the
equality
Y = ((S)((S)A)B)((S)A)B
where
A = ((S)(K)S)((S)(K)K)I
B = ((S)((S)(K)S)(K)I)(K)I

and S, K, I are the standard combinators.


3.13 Find a more compact representation of the Y combinator using
Turner's combinators.
3.14 Find a standard combinator expression to represent Turing's fixed-
point combinator T.
CHAPTER FOUR

LIST MANIPULATION IN LAMBDA-CALCULUS

4.1 An extension of the syntax of λ-expressions

Lists are regarded only as data structures in most programming languages.


An exception is LISP where the programs themselves are structured as
nested lists. This has been an important step toward a uniform treatment
of programs and data. Modern functional languages, however, with the
exception of Backus's FP, seem to have thrown out the baby with the
bathwater by abandoning this interesting idea.
The construction of a list of functions is one of the fundamental pro-
gram forming operations called 'combining forms' in FP. The irony of the
situation is that this combining form would be more natural in a nonstrict
language such as Miranda, which favors lazy evaluation, than it is in FP,
which is strict. The implementation of Miranda is based on graph-reduction
where both the program and its data are represented internally as graphs,
and they are actually merged during the execution. So, it seems only natural
that they may have similar structures. The implementation of functional
languages using graph-reduction is the subject of an interesting book by
Simon L. Peyton Jones [Peyt87]. List construction, however, is not con-
sidered as a program forming operation in that book, because in Miranda
the elements of a list must be all of the same type.

Moreover, the implementation of Miranda involves a translation to the


type-free lambda-calculus where the distinction between functions and
arguments disappears. The same is true for ML and many other functional
languages.
In contrast with those, FP makes a sharp distinction between programs
and data. The latter are called 'objects' and a list of objects is syntactically
different from a list of functions. Function lists have an interesting appli-
cative property in FP which can facilitate parallel processing without
explicitly requiring it. More details on FP and Miranda can be found in
Chapter 5.
In this chapter we will show that a minor extension of the standard
lambda-calculus makes it possible to integrate function lists and data lists
in a uniform framework. Our system is fully consistent with standard
lambda-calculus, which is indeed a proper part of it.
As a matter of fact, lists and list-manipulating functions can be re-
presented in pure lambda-calculus in a fashion similar to the representation
of natural numbers by Church numerals as discussed in Section 3.2. The
idea of 'encoding' a list as a standard λ-expression is similar to the repre-
sentation of ordered pairs which has been used for the representation of
the predecessor function in Section 3.2. But, here we need an extra con-
stant symbol, nil, to represent the empty list. Then a list of arbitrary
λ-expressions,
E1, E2, ..., En
can be represented by
λz.((z)E1) ... λz.((z)En)nil
where z is any variable which is not free in Ei (1 ≤ i ≤ n). For the given rep-
resentation the two basic list manipulating operations can be implemented
by the following combinators:
head ≡ λx.(x)true
tail ≡ λx.(x)false
Indeed, the application of head to the above representation of a list returns
its first member E1, while the application of tail returns the representation
of the remaining (n-1)-element list, as can be verified easily by the reader.
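For instance, for the representation of a two-element list we have the following reduction (a worked example of ours, writing R for the subexpression λz.((z)E2)nil):

(head)λz.((z)E1)R ≥ (λz.((z)E1)R)true ≥ ((true)E1)R ≥ E1

and, similarly, (tail)λz.((z)E1)R ≥ ((false)E1)R ≥ R.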
The so-called list-constructor operator, which appends its first operand as
a new element to the front of its second operand regarded as a list, can be
represented by the combinator
cons ≡ λx.λy.λz.((z)x)y
The reader should verify that
((cons)A)[E1, ..., En] ≥ [A, E1, ..., En]
in view of the given representation.
This means that list manipulation can be implemented in standard
lambda-calculus without using any extra notation. However, this imple-
mentation is based on a simulation of the elementary list operations via
β-reduction where each of those operations takes several β-reduction steps
to execute.
A more concise representation can be obtained if the list manipulating
operators are treated as true combinators, i.e. atomic symbols supplied
with appropriate reduction rules. Then a list of the form
E1, E2, ..., En
can be represented by
((cons)E1) ... ((cons)En)nil
and the reduction rules for the head, tail, and cons combinators can be
given accordingly.
Here we shall follow a similar approach, but we use a conventional
notation for lists with brackets and commas. Arbitrarily nested lists will be
considered as valid λ-expressions, and the elementary list operators will
be treated as constant symbols. The extended syntax is given below.

THE EXTENDED SYNTAX OF λ-EXPRESSIONS

<λ-expression> ::= <variable> | <constant> | <abstraction> |
                   <application> | <list>
<variable> ::= <identifier>
<constant> ::= <number> | <operator> | <combinator>
<abstraction> ::= λ<variable>.<λ-expression>
<application> ::= (<λ-expression>)<λ-expression>
<list> ::= [<λ-expression><list-tail> | []
<list-tail> ::= ,<λ-expression><list-tail> | ]
<operator> ::= <arithmetic operator> | <relational operator> |
               <predicate> | <boolean operator> | <list operator>
<arithmetic operator> ::= + | - | * | / | succ | pred | mod
<relational operator> ::= < | ≤ | = | ≥ | > | ≠
<predicate> ::= zero | null
<boolean operator> ::= and | or | not
<list operator> ::= ∧ | ~ | &
<combinator> ::= true | false | Y

The empty list is denoted by []. The constant symbols ∧, ~, and &
represent the head, tail, and cons operators, respectively. We use these single
character symbols in our implementation simply because they are easier to type
on a terminal keyboard than the corresponding four-letter words. The predi-
cate null represents the test for the empty list. So, we can form
λ-expressions like

(λq.[p,q,r])S

(λx.[(x)y,(y)x])M

(λx.(λy.[x,y])a)[b,c,d]

With this, of course, we expect that the first expression here will reduce to
[p,S,r], the second to [(M)y,(y)M], and the third to [[b,c,d],a]. To achieve
this we shall need some additional reduction rules which will be given in the
next section.

4.2 Additional axioms for list manipulation


First, we give the reduction rules for the elementary list operators intro-
duced in the previous section.

Head

(∧)[] → []

(∧)[E1, ..., En] → E1 for n ≥ 1

Tail

(~)[] → []

(~)[E1, ..., En] → [E2, ..., En] for n ≥ 1

Construction

((&)A)[] → [A]

((&)A)[E1, ..., En] → [A, E1, ..., En] for n ≥ 1

Test for the empty list

(null)[] → true

(null)[E1, ..., En] → false for n ≥ 1

Selection

(1)[E1, ..., En] → E1 for n ≥ 1

(k)[E1, ..., En] → ((pred)k)[E2, ..., En] for k > 1, n ≥ 1

Note that A and Ei (1 ≤ i ≤ n) denote arbitrary λ-expressions in these rules,
and that for n = 1 the right-hand side of the second tail rule is the empty list [].

It should be emphasized that we have not defined these operations for all
possible arguments. For instance, (null)3 is undefined, but we do not force
its evaluation by replacing it by a so-called 'undefined symbol'. Such
λ-expressions will simply be left alone as being irreducible, i.e. already in
normal form. They may, of course, occur as subexpressions of more
meaningful λ-expressions, since they may get discarded during the evalu-
ation process.

In contrast with LISP, both ∧ and ~ are well-defined here for the
empty list. This turns out to be very useful for a recursive definition of
certain list-manipulating functions. The selection of the first member of a
list, however, is undefined for the empty list. Hence, (1)E is not always the
same as (∧)E.
The apparently meaningless application of an integer k to some list L
is interpreted here as the selection of the k-th element of L. Both the
integer k and the list L may be given as arbitrary λ-expressions and thus,
they must be evaluated (to some extent), before we can tell whether they
fit together. If, for instance, A and B are arbitrary λ-expressions then the
λ-expression
(2)((&)A)(λx.(λy.((&)x)y)[])B

reduces to (2)[A,B] and hence to B. On the other hand,


(3)[A,B]

reduces to (1)[], which is in normal form.


The selection function k should not be confused with the constant-
valued function λx.k, which always evaluates to k when applied to any
other λ-expression.
Now, in order to make our system work we need additional reduction
rules. The purpose of these new rules is to make our list structures acces-
sible to the α-rules and β-rules. First of all, we want to distribute the re-
naming prefix among the elements of a list. This will be done by the
α5-rule given below. The following is a complete definition of the renaming
operation in terms of rewriting rules.

ALPHA-RULES

(α1) {z/x}x → z
(α2) {z/x}E → E if x does not occur free in E
(α3) {z/x}λy.E → λy.{z/x}E for every λ-expression E, if x ≠ y ≠ z
(α4) {z/x}(E1)E2 → ({z/x}E1){z/x}E2
(α5) {z/x}[E1, ..., En] → [{z/x}E1, ..., {z/x}En] for n ≥ 0



Similarly, we can use an extra rule, β5, to implement 'pointwise' substi-
tution in a list.

BETA RULES

(β1) (λx.x)Q → Q
(β2) (λx.E)Q → E if x does not occur free in E
(β3) (λx.λy.E)Q → λz.(λx.{z/y}E)Q if x ≠ y, and z is neither free
     nor bound in (E)Q.
(β4) (λx.(E1)E2)Q → ((λx.E1)Q)(λx.E2)Q
(β5) (λx.[E1, ..., En])Q → [(λx.E1)Q, ..., (λx.En)Q] for n ≥ 0

It is interesting to note that our β5-rule has a certain similarity to the ap-
plicative property of construction in FP, which we mentioned before. This
property can be formulated in our system as the following reduction rule:
([E1, ..., En])Q → [(E1)Q, ..., (En)Q]
Now, if we can distribute an abstraction prefix over a list then we can re-
place β5 by the above rule. That is precisely what we shall do by using the
following two gamma rules instead of β5.

GAMMA-RULES

(γ1) ([E1, ..., En])Q → [(E1)Q, ..., (En)Q] for n ≥ 0

(γ2) λx.[E1, ..., En] → [λx.E1, ..., λx.En] for n ≥ 0

By adding these new axioms to the α-rules and β1 through β4 we get a
complete system in which every λ-expression will be evaluated by reducing
it to its normal form if such exists.
The definitions of reduction (≥) and equality (=) will remain the
same as given in Chapter 2 except that the relation → will be defined
by the new set of rules.
To see how this system works let us consider an example. Take, for in-
stance, the algebraic law of composition

[f, g] ∘ h = [f ∘ h, g ∘ h],
which is treated as an axiom in FP. Here we can prove this equality as
follows:
[f, g] ∘ h = ((λx.λy.λz.(x)(y)z)[f, g])h
by definition of composition as given in Section 3.1. The right-hand side
β-reduces (in several steps) to λz.([f, g])(h)z. Then by using the γ-rules
we get
λz.([f, g])(h)z → λz.[(f)(h)z, (g)(h)z] →
[λz.(f)(h)z, λz.(g)(h)z] = [f ∘ h, g ∘ h]
which completes the proof.
The first γ-rule can be used also for selecting simultaneously more than
one element of a list. Namely, for any k-tuple of integers, [i1, ..., ik], we
have
([i1, ..., ik])Q → [(i1)Q, ..., (ik)Q]
by γ1 for any Q. If Q is a list then this is obviously a k-fold simultaneous
projection of Q.
To conclude this section we have to emphasize that lists are given
certain applicative properties by the gamma-rules which do not exist in
other systems. Clearly, the γ-rules do not hold for the representation of
lists in pure lambda-calculus which has been discussed at the beginning
of this chapter. It is unlikely that there exists some other representation
which satisfies the γ-rules, but we do not know for sure. A common feature
of the γ-rules is that they are trying to push the brackets to the outside of
the λ-expressions. This will have an impact on the so-called head normal
form in our system.

4.3 List manipulating functions

With the aid of the elementary list operators and predicates defined in the
previous sections we can define other list manipulating functions. For in-
stance, the append function which joins together two lists satisfies the fol-
lowing equation:
((append)[A1, ..., Am])[E1, ..., En] = [A1, ..., Am, E1, ..., En]
Therefore, we can define it recursively this way:


((append)x)y = if (null)x then y else ((&)(∧)x)((append)(~)x)y
In pure λ-notation, i.e. without syntactic sugar, this takes the form
append = λx.λy.(((null)x)y)((&)(∧)x)((append)(~)x)y

This is a recursive equation so we will use the Y combinator to find its
solution. Thus, we get the definition
append = (Y)λf.λx.λy.(((null)x)y)((&)(∧)x)((f)(~)x)y

This definition will be used for the computation of the value of append for
some arguments, say, [a,b,c] and [d,e], in such a way as if we had written
(λappend.((append)[a,b,c])[d,e])(Y)λf.λx.λy.(((null)x)y)((&)(∧)x)((f)(~)x)y


which would reduce to
[a,b,c,d,e]
as required.
Another important function is map which distributes the application
of a function over a list. More precisely, it has this property:
((map)f)[E1, ..., En] = [(f)E1, ..., (f)En]
So, it can be defined recursively as
((map)f)x = if (null)x then [] else ((&)(f)(∧)x)((map)f)(~)x
Again, by using the Y combinator we get the explicit form
map = (Y)λm.λf.λx.(((null)x)[])((&)(f)(∧)x)((m)f)(~)x

which works exactly the way we wanted.
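
For comparison, here are the same two definitions transcribed into Haskell (our illustration, not part of the calculus); there the recursion equations can be written directly, the fixed-point construction being implicit:

  append' :: [a] -> [a] -> [a]
  append' x y = if null x then y else head x : append' (tail x) y

  map' :: (a -> b) -> [a] -> [b]
  map' f x = if null x then [] else f (head x) : map' f (tail x)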


The continued arithmetic operations can also be treated as list oriented
operations. Actually, these operations are the polyadic extensions of the
usual binary operations, because they take an arbitrary number of argu-
ments. However, in functional programming they are treated as unary
functions which take only a list as an argument. The length of the list is,
of course, arbitrary. Using the same trick we can define the sum of an ar-
bitrary sequence of integers or integer valued expressions this way:
(sum)x = if (null)x then 0 else ((+)(1)x)(sum)(~)x
This recursion appears to be similar to the one we have used for map, but
it is, in fact, quite different. In the case of map the function f can be applied
to the first member of the list without any delay. Such a recursion is called
tail recursion, and it can make some progress in the evaluation before un-
winding the entire recursion. In the above definition of sum the application
of the binary addition operator + must be deferred until the end of the list
is found. Therefore, it takes additional space to remember all the pending
additions while going deeper and deeper into the recursion. But, we can
make the definition of sum tail recursive by introducing an 'accumulator'
parameter this way:
((sum)a)x = if (null)x then a
else ((sum)((+)a)(∧)x)(~)x

The second argument of sum on the right-hand side of the equation is


(~)x which is one element shorter than x. The question is whether the first
argument of sum is evaluated before the recursion continues. If so, then the
first element of the list will be added to the accumulator before the
recursion continues and thus, no addition will be pending during the
recursion. Otherwise, the pending additions will have to be remembered
just as before. Nevertheless, this new definition makes tail recursion pos-
sible provided that the arguments are evaluated from left to right, which is
the normal order. For the computation of the sum of the elements in a list
the accumulator should be, of course, initialized to zero. That is, the sum
of the elements of x will be computed as the value of

((sum)0)x
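
In Haskell the two versions look as follows (our sketch); note that in a lazy implementation the accumulator should also be forced at each step (as Data.List's foldl' does), otherwise the pending additions merely turn into pending thunks:

  sumList :: [Integer] -> Integer
  sumList x = if null x then 0 else head x + sumList (tail x)

  sumAcc :: Integer -> [Integer] -> Integer
  sumAcc a x = if null x then a else sumAcc (a + head x) (tail x)

  -- sumAcc 0 [1,2,3] evaluates to 6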

The product of an arbitrary sequence of numbers can be defined in a sim-


ilar fashion. But, we can also define a generalized function reduce to cap-
ture the similarities in these definitions. Consider namely, the following
definition:

(((reduce)a)b)x = if (null)x then a
else (((reduce)((b)a)(∧)x)b)(~)x

where a is the accumulator parameter and b is a binary operator. This
will be written as
reduce = λa.λb.λx.(((null)x)a)(((reduce)((b)a)(∧)x)b)(~)x
and hence,
sum = ((reduce)0)+
prod = ((reduce)1)*
Both of these functions are well-defined for the empty list, which is quite
reasonable in standard mathematics.
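
In Haskell, reduce corresponds to a left fold (our sketch; foldl is the standard Prelude function with its arguments in a different order):

  reduce' :: a -> (a -> b -> a) -> [b] -> a
  reduce' a b x = if null x then a else reduce' (b a (head x)) b (tail x)

  sum'  :: [Integer] -> Integer
  sum'  = reduce' 0 (+)

  prod' :: [Integer] -> Integer
  prod' = reduce' 1 (*)

  -- reduce' a b = foldl b a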
In order to get a flavor of list manipulation in our extended lambda-
calculus we show a few more examples. We shall use syntactic sugar while
omitting the fairly trivial translations to pure lambda-notation.

Examples

The inner product of two vectors:


((inner)x)y = (sum)((map)prod)((pairs)x)y
where
((pairs)x)y = if (null)x then []
else ((&)[(∧)x,(∧)y])((pairs)(~)x)(~)y

The length of a list:


(length)x = if (null)x then 0 else (succ)(length)(~)x
or more concisely
(length)x = (sum)((map)λz.1)x
which corresponds to our intuitive way of counting the elements of a list.
The set of subsets (i.e., the powerset) of a set:
(powerset)x = if (null)x then [[]]
else ((append)(powerset)(~)x)((map)(&)(∧)x)(powerset)(~)x

Here the first argument of map is an application of the binary function &
to a single argument (∧)x. But this gives a unary function which is fine with
map. Actually, the function powerset works on lists rather than sets. In a list
repetitions are permitted and they cannot be filtered out, since the equality
problem of λ-expressions is unsolvable in general.
The last element of a list:
(last)x = if (null)(~)x then (∧)x else (last)(~)x
The reverse of a list:
(reverse)x = if (null)(~)x then x else ((append)(reverse)(~)x)[(∧)x]

A sequence of integers from 1 to n:


(iota)n = if n=1 then [1] else ((append)(iota)(pred)n)[n]
or equivalently
(iota)n = if n=1 then [1] else ((&)1)((map)succ)(iota)(pred)n
A sequence of integers from n down to 1:
(down)n = if n=1 then [1] else ((&)n)(down)(pred)n
Sorting of a list of numbers:
(sort)x = if (null)x then [] else ((insert)(∧)x)(sort)(~)x
where
((insert)a)x = if (null)x then [a]
else if a ≤ (∧)x then ((&)a)x
else ((&)(∧)x)((insert)a)(~)x
A somewhat tricky example is finding the permutations of a list. We shall
use two auxiliary functions. The first will separately remove each element
in turn from a list. For instance,
(removeone )[a,b,c]

will produce the nested list



[[b,c],[a,c],[a,b]]

This function can be defined as follows

(removeone)x = if (null)x then []
else ((&)(~)x)((map)(&)(∧)x)(removeone)(~)x

The next step would be to compute the permutations of those shorter lists
which are produced by removeone. This we can do by computing

((map)permute)(removeone)[a,b,c]

which yields
((map)permute)[[b,c],[a,c],[a,b]]

hence,
[(permute)[b,c],(permute)[a,c],(permute)[a,b]]

that is,
[[[b,c],[c,b]], [[a,c],[c,a]], [[a,b],[b,a]]]

provided that permute is working correctly. Now, we need another function


to put back here the missing elements. That is, we want

((putback)[a,b,c])[[[b,c],[c,b]],[[a,c],[c,a]],[[a,b],[b,a]]]

to produce the list


[[a,b,c],[a,c,b],[b,a,c],[b,c,a],[c,a,b],[c,b,a]]

Such a function can be defined this way:

((putback)x)y = if (null)x then []
else ((append)((map)(&)(∧)x)(∧)y)((putback)(~)x)(~)y

Hence, the permutations can be computed by the function

(permute)x = if (null)(~)x then [x]
else ((putback)x)((map)permute)(removeone)x
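
These three functions carry over to Haskell almost word for word (our transcription; unlike the definitions above, it assumes a nonempty argument, since Haskell's tail is undefined on []):

  removeone :: [a] -> [[a]]
  removeone x = if null x then []
                else tail x : map (head x :) (removeone (tail x))

  putback :: [a] -> [[[a]]] -> [[a]]
  putback x y = if null x then []
                else map (head x :) (head y) ++ putback (tail x) (tail y)

  permute :: [a] -> [[a]]
  permute x = if null (tail x) then [x]
              else putback x (map permute (removeone x))

  -- permute "abc" == ["abc","acb","bac","bca","cab","cba"]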



Nested lists can be used, of course, to represent matrices, i.e. n-dimensional


arrays. The transposition of a two-dimensional rectangular array in this
representation can be performed by the following function:
(transpose)x = if (null)(∧)x then []
else ((&)((map)∧)x)(transpose)((map)~)x
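
In Haskell the same recursion reads (our sketch, assuming a nonempty rectangular list of lists):

  transpose' :: [[a]] -> [[a]]
  transpose' x = if null (head x) then []
                 else map head x : transpose' (map tail x)

  -- transpose' [[1,2],[3,4],[5,6]] == [[1,3,5],[2,4,6]]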

List structures are indeed very useful in many applications. They are con-
sidered fundamental in LISP and in other functional languages. It seems
natural that they should be treated as primitive objects in λ-calculus, as
well, but in order to do so it was necessary to add new reduction rules to
the system. By using the y-rules and a few primitive list operators list ma-
nipulation is quite simple in lambda-calculus.

4.4 Dealing with mutual recursion

The use of lists as primitive objects in λ-calculus has some other benefits,
too. For one thing, they allow for an effective vectorization of our calculus.
This, besides its mathematical elegance, also has some practical advantages
as can be seen from the following treatment of mutual recursion. A similar
approach has been used by Burge [Burg75] without the full list manipulat-
ing power of our extended calculus.
Nonrecursive function definitions can be treated in lambda-calculus in
a fairly simple manner. Assume namely, that we have a sequence of defi-
nitions of the form
fi = ei (i = 1, ..., n)
where fi are variables (function names) and ei are λ-expressions. Assume
further, that E is a λ-expression containing free occurrences of the variables

fi (i = 1, ..., n). Now, the value of E with respect to the given definitions can
be computed by evaluating the combined expression
(λf1. ... (λfn.E)en ... )e1
Thus, the set of equations can be treated as mere syntactic sugar having a
trivial translation into pure lambda-notation. This simple-minded ap-
proach, however, does not always work. As can be seen from the given
translation, the form of the combined expression reflects the order of the
equations. Therefore, a free occurrence of fi in ej will be replaced by ei if
and only if i < j.
In other words, previously defined function names can be used on the
right-hand sides of the equations, but no forward reference can be made to a
function name defined only later in the sequence. This clearly excludes mu-
tual recursion, which always involves some forward reference. If, for in-
stance, f 1 is defined in terms of f 2 (i.e., f2 occurs in e 1) and vice versa then
the forward reference cannot be eliminated simply by changing the order
of the equations.
It should be clear that in the absence of mutual recursion the equations
can be rearranged in such a way that no forward reference occurs. Imme-
diate recursion should not be a problem, because it can be resolved with
the aid of the Y combinator. Now, we will show that mutual recursion can
be resolved fairly easily in our extended lambda-calculus. In particular, the
list manipulating power of our calculus is very helpful in working with a list
of variables. Actually, we can use a single variable to represent a list just as
is done in vector algebra. Hence, the solution of a set of simultaneous
equations can be expressed in a compact form with a single occurrence of
the Y combinator. First, we illustrate this method through an example.
Consider the following mutual recursion defining two number-
theoretic (integer type) functions, g and h:

(g)n = if (zero)n then 0 else ((+)(g)(pred)n)(h)(pred)n

(h)n = if (zero)n then 1 else ((*)(g)n)(h)(pred)n

In our 'sugar-free' λ-notation these are written as

g = λn.(((zero)n)0)((+)(g)(pred)n)(h)(pred)n

h = λn.(((zero)n)1)((*)(g)n)(h)(pred)n



We introduce a new variable F to represent the ordered pair [g,h], and re-
write our equations using (1)F instead of g, and (2)F instead of h.
(1)F = λn.(((zero)n)0)((+)((1)F)(pred)n)((2)F)(pred)n

(2)F = λn.(((zero)n)1)((*)((1)F)n)((2)F)(pred)n

By combining the two equations we get
F = [λn.(((zero)n)0)((+)((1)F)(pred)n)((2)F)(pred)n,
λn.(((zero)n)1)((*)((1)F)n)((2)F)(pred)n]
Abstracting with respect to F on the right-hand side yields the following
fixed-point equation:
F = (λf.[λn.(((zero)n)0)((+)((1)f)(pred)n)((2)f)(pred)n,
λn.(((zero)n)1)((*)((1)f)n)((2)f)(pred)n])F
whose solution is given by
F = (Y)λf.[λn.(((zero)n)0)((+)((1)f)(pred)n)((2)f)(pred)n,
λn.(((zero)n)1)((*)((1)f)n)((2)f)(pred)n]
This solution can be used to compute, for example, the value of (F)3,
which is the same as [(g)3, (h)3]. The computation can be performed by
using our reduction rules only.
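
The construction mirrors nicely into Haskell (our sketch): the pair of functions is the fixed point of a single function on pairs, just as F is the fixed point of λf.[...] above. The lazy pattern ~(g, h) is essential, for the same reason that the calculus relies on demand-driven evaluation:

  import Data.Function (fix)

  gh :: (Integer -> Integer, Integer -> Integer)
  gh = fix (\ ~(g, h) ->
         ( \n -> if n == 0 then 0 else g (n - 1) + h (n - 1)
         , \n -> if n == 0 then 1 else g n * h (n - 1) ))

  -- fst gh 3 and snd gh 3 correspond to (g)3 and (h)3 respectively.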
The same technique can be used in general for any set of simultaneous
recursion equations
f1 = e1
...
fn = en
First we rewrite each ek by substituting (i)F for the free occurrences of fi
(1 ≤ i ≤ n). The resulting expressions will be denoted by Rk which form
an ordered n-tuple
[R1, ..., Rn]
So, we get the defining equation
F = [R1, ..., Rn]


in place of the original ones. Abstracting with respect to F on the right-
hand side yields
F = (λf.[r1, ..., rn])F
where each ri is obtained from Ri (i = 1, ..., n) by substituting f for F. The
solution of this fixed-point equation is expressed by
F = (Y)λf.[r1, ..., rn]
In order to prove that this is a correct solution we apply our reduction rules.
The right-hand side of the last equation reduces to
(λf.[r1, ..., rn])(Y)λf.[r1, ..., rn]
by the definition of the Y combinator. This further reduces to
([λf.r1, ..., λf.rn])(Y)λf.[r1, ..., rn]
by γ2, and then to
[(λf.r1)(Y)λf.[r1, ..., rn], ..., (λf.rn)(Y)λf.[r1, ..., rn]]
by γ1. This is clearly an n-tuple which should be equal to F. Indeed, its i-th
component is
(λf.ri)(Y)λf.[r1, ..., rn]
which implies that
(i)F = Ri
provided that F satisfies the equation
F = [R1, ..., Rn]
and this completes the proof.
Now, the value of any λ-expression E with respect to the given recur-
sive definitions is computed as the value of the combined expression
(λF.G)(Y)λf.[r1, ..., rn]
where
G = (λf1. ... (λfn.E)(n)F ... )(1)F

This formula is correct also for nonrecursive definitions but, of course, the
simple-minded approach described at the beginning of this section is more
efficient. Therefore, an optimizing compiler should treat recursive and non-
recursive definitions separately. For that purpose, one can compute the
dependency relation between the given definitions and check if it forms a
partial order on the set of functions fi (i = 1, ..., n). If so, then no mutual
recursion is present and the definitions can be arranged in a sequence
suitable for the simple-minded solution. Otherwise, one should try to iso-
late the minimal sets of mutually recursive definitions and solve them sep-
arately, before putting them back to their proper place in the sequence.

4.5 Computing with infinite lists

Infinite lists occur in a natural way as the output of nonterminating com-
putations. Such computations are not necessarily wrong but, of course,
they would sooner or later run out of time and/or space. Nevertheless,
their finite initial portion may contain valuable information and in many
cases an a priori upper bound on their lengths seems rather artificial. For
instance, we may want to write a program to generate a sequence of natural
numbers, square numbers, or prime numbers, without limiting the length
of the list in advance. Such open ended lists are relatively easy to generate,
but it is far more difficult to use them as intermediate results.
The difficulty arises when we try to pass a potentially infinite list as
an argument to a function. The application of the function cannot be de-
layed until the entire list is computed, because it may take an infinite
amount of time. But how many elements are actually needed for the com-
putation of the function? The answer depends on the function itself and
there is no general rule. For instance, the special function null needs only
the first element to see that the list is not empty. Similarly, the ∧ and the
~ functions can be computed as soon as the first element appears. On the
other hand, the sum of an infinite sequence can never be totally computed,
but we may compute partial sums to approximate the result. So, we may
consider partial application of certain functions, which is related to the idea

of tail recursion. (See the discussion of the continued arithmetic operations


in Section 4.3.)
The best way to deal with potentially infinite lists is to follow a so-
called demand-driven evaluation strategy. This means that we compute
only as much of an argument as is indeed necessary for the computation
of the given function. This also implies that the order of the evaluation
should be top-down, or outside-in, so that the function is examined first,
before any of its arguments is evaluated. This clearly corresponds to the
normal order evaluation in standard lambda-calculus. But here we have
some new problems due to the presence of special functions and their
specific reduction rules.
The power of demand-driven evaluation may be characterized by the
fact that it allows for non-strict functions. A function is called non-strict
if its value is well-defined for some undefined arguments. For instance, the
conditional expression

if A then B else C

is considered non-strict in its second and third arguments, because B need


not be evaluated when A happens to be false, and C need not be evaluated
in the opposite case. Therefore, B or C may occasionally be undefined
without making the whole expression undefined.
For another example, the operation of multiplication may be defined
as a semi-strict function, i.e. non-strict in its second argument. All we have
to do is to compute the first argument first, and check if it is zero. If it is
not zero then we have to evaluate the second argument, as well. However,
if it is zero, we can return the value zero as a result regardless of the second
operand. Thus, the second argument may be undefined (involving a non-
terminating computation) while the function is still well-defined.
Even if an argument is actually used for the computation of the value
of a function, it may not have to be fully evaluated for that purpose.
Therefore, the evaluation of an argument should be suspended as soon as
possible, that is, when the partially evaluated argument is sufficient for the
computation of the function.
Now, let us see how to compute infinite lists in our extended lambda-
calculus. Take, for instance, an infinite list of zeros. We can define this list
recursively as follows:

zeros = ((&)0)zeros

which means that the infinite list of zeros remains the same when one more
zero is attached to it. But this is a fixed-point equation whose solution is

zeros = (Y)λz.((&)0)z
Indeed, the right-hand side reduces to

(λz.((&)0)z)(Y)λz.((&)0)z

and then to

((&)0)(Y)λz.((&)0)z

which clearly generates an infinite list of zeros just by using the given re-
duction rules. Unfortunately, the form in which this list appears is not
suitable for using it as an argument to our 'built-in' functions. For instance,
the special function ∧ cannot work on this list, because it does not have the
appropriate form. But, this is easy to fix. All we have to do is to adjust the
reduction rules of the list operators to make them work with partially
evaluated lists. So, the rest of a list may be an arbitrary A-expression which
has not been computed yet. This implies, of course, that the rest of the list,
which will be denoted by R in the following rules, may not reduce to a
'list-tail' or may not have a normal form at all. Nevertheless, we can define
our list operators as follows:

((&)E)R → [E, R

(null)[E, R → false

(∧)[E, R → E

(~)[E, R → R

(1)[E, R → E

(k)[E, R → ((pred)k)R for k > 1

These rules represent the lazy extensions of the defining rules of the given
functions. With the aid of these rules we can easily compute, for example,
(5)zeros, which is the fifth member of the infinite list, that is, 0.
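
In a lazy language such as Haskell these definitions require no special machinery at all (our illustration):

  zeros :: [Integer]
  zeros = 0 : zeros

  -- zeros !! 4 == 0 is the analogue of computing (5)zeros
  -- ((!!) is zero-based, so index 4 selects the fifth member).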
The infinite list of natural numbers is defined by the iteration of the
successor function. The iteration of a function f to some argument x
generates the infinite list

[x, (f)x, (f)(f)x, ... ]


which can be defined recursively as
((iterate)f)x = ((&)x)((iterate)f)(f)x
The explicit form of this function is given by
iterate = (Y)λi.λf.λx.((&)x)((i)f)(f)x
So, the infinite sequence of natural numbers is
numbers = ((iterate)succ)0
A more interesting example is the list of Fibonacci numbers
[1, 1, 2, 3, 5, 8, 13, ... ]
where each element is the sum of the previous two. An auxiliary function
can be used, which builds such a sequence from two arbitrary initial values.
((build)x)y = ((&)x)((build)y)((+)x)y
Hence,
build = (Y)λb.λx.λy.((&)x)((b)y)((+)x)y
and
fibonacci = ((build)1)1
Another way of defining the fibonacci list is based on the following obser-
vation. By adding together the corresponding elements of two Fibonacci
sequences with the first sequence being offset by one, we get the tail of the
same Fibonacci sequence as a result. That is,
  [1, 1, 2, 3, 5, 8, 13, ... ]
+ [1, 1, 2, 3, 5, 8, 13, 21, ... ]
= [1, 2, 3, 5, 8, 13, 21, 34, ... ]
This is used in the following definition:
fibonacci = ((append)[1,1])((map)sum)((pairs)fibonacci)(~)fibonacci
where

((pairs)x)y = ((&)[(∧)x, (∧)y])((pairs)(~)x)(~)y

Note that in the definition of pairs we do not have to test for the empty list,
since neither fibonacci nor (~)fibonacci is empty. An interesting feature of
this definition is the application of (map )sum to an infinite list of pairs.
Clearly, this application should not wait for the computation of the whole
list. As soon as the first pair gets formed by the function pairs, its sum
should be computed and appended to the list [ 1, 1]. This yields an inter-
mediate result whose first three elements are 1, 1, 2, and thus, the compu-
tation of pairs may continue. This method of building the infinite list of
Fibonacci numbers works only with a demand-driven evaluation strategy.
Otherwise, the application of the functions would be delayed until the ar-
guments are totally computed, which would obviously kill the recursion in
this case. This is so, because the list fibonacci is used here repeatedly as a
partially evaluated argument while it is being created.
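
This second, self-referential definition is the classical one; in Haskell it is usually written with zipWith, which combines the pairing and the summing into a single step (our transcription):

  fibs :: [Integer]
  fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

  -- take 7 fibs == [1,1,2,3,5,8,13]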
It is important to note that we do not have to take extra measures in
order to deal with infinite lists. The demand-driven evaluation technique
as described above will automatically give us this opportunity for free.
Later we shall see that a strictly demand-driven evaluation technique usu-
ally involves some loss in the efficiency of computations. Therefore, the
overall simplicity of handling infinite lists does have a price.
For another example with this flavor, consider the computation of the
prime numbers using the Sieve of Eratosthenes. First, an auxiliary function
is defined to filter out the multiples of a number from a list of numbers.
((filter)n)x = if (null)x then []
else if ((mod)(∧)x)n = 0 then ((filter)n)(~)x
else ((&)(∧)x)((filter)n)(~)x

The mod function is one of our primitive functions defined on integers. It


gives the remainder of the division of its first argument by its second ar-
gument. Now, sieve will be defined as
(sieve)x = ((&)(∧)x)(sieve)((filter)(∧)x)(~)x

and hence, the infinite list of prime numbers is computed by

primes = (sieve)((iterate)succ)2
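
The sieve reads almost identically in Haskell (our sketch); the pattern match on p:xs is always satisfied here because the input list is infinite:

  primes' :: [Integer]
  primes' = sieve (iterate succ 2)
    where
      sieve (p:xs) = p : sieve (filter (\x -> x `mod` p /= 0) xs)
      sieve []     = []

  -- take 5 primes' == [2,3,5,7,11]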

Observe again that sieve can be computed only in a demand-driven fashion.


Actually, the boundaries between the computation of a function and the
computation of its argument(s) are completely blurred by demand-driven
evaluation.
Infinite lists are also useful for approximations in numerical analysis.
Take, for example, the evaluation of the exponential function using its
Taylor series. The rest of the infinite series starting with the (n-1)-st term
is
[tₙ₋₁, tₙ, tₙ₊₁, ... ]
where
tₙ = ((*)tₙ₋₁)((/)x)n
Hence, it can be defined recursively as

rest = λt.λn.λx.((&)t)(((rest)((*)t)((/)x)n)(succ)n)x

Now, if we want to stop at the first such term that is smaller than a given
e then we can write

rest = λt.λe.λn.λx.((((<)t)e)[])((&)t)((((rest)((*)t)((/)x)n)e)(succ)n)x


Hence, the approximate value of the exponential function e^x for x > 0 can
be computed by
exp = λx.(sum)((((rest)1)e)1)x

As can be seen from these examples infinite lists can be defined easily via
recursion. Infinite objects and, in particular, infinite lists occur naturally in
mathematics but they are poorly represented in conventional programming
languages, because their proper treatment requires lazy evaluation which
seems to be less efficient than the traditional approach. Clearly, there is a
trade-off between the expressive power of the language and the efficiency
of its implementation.
Another problem here, and also in most functional languages including
LISP, is the lack of distinction between fixed-sized arrays and dynamically
changeable lists. The fixed size of a finite array makes it easy for an im-
plementation to support a direct access to its elements. List structures are
more flexible, hence, a direct access to their elements is much more difficult
to obtain. A uniform treatment of arrays and lists must be prepared for the
worst case which makes it less efficient. Therefore, it seems desirable to


introduce separate data structures for arrays and lists, and use arrays rather
than lists whenever possible.
Sequential input and output files can be treated naturally as infinite
lists. They are often called streams in this context and they are evaluated
lazily just like any other infinite objects. One must be careful, however,
with their implementation to avoid side effects.

Exercises
4.1 Define list manipulating functions to do the following:
(a) Sort a list of numbers using quicksort,
(b) Merge, i.e. shuffle two lists,
(c) Cut off the last element of a list,
(d) Rotate a list to the right or to the left,
(e) Compare the elements of two lists,
(f) Multiply together two square matrices.
4.2 A Curried function with precisely n arguments will be changed into the
corresponding list oriented function (which takes a list of length n for an
argument) by the following combinator
uncurry =
λn.λg.λv.(((zero)(pred)n)(g)(1)v)((((uncurry)(pred)n)g)v)(n)v
So, for instance, with n = 3 we get
(((uncurry)3)F)[a,b,c] = (((F)a)b)c
Conversely, a function whose argument is an array (list) of n elements will
be changed into the corresponding Curried function by the following
combinator:
curry = λn.λf.(((zero)(pred)n)λx.(f)[x])
((curry)(pred)n)λw.λx.(f)((append)w)[x]

Use these combinators to


(a) unCurry the function gcd of Exercise 3.8,

(b) Curry a list oriented function defined on ordered triples,


(c) prove that each of the above two combinators is an inverse of
the other.
4.3 Are these two function definitions equivalent?
diff = λx.(((null)x)0)((-)(1)x)(diff)(~)x
diff = λx.(((null)x)0)(((null)(~)x)(1)x)(diff)((&)((-)(1)x)(2)x)(~)(~)x
CHAPTER FIVE

RULE-BASED SEMANTICS OF λ-EXPRESSIONS

5.1 Program transformation as a computation

The semantics of a language L can be defined in general as a mapping μ
from L to M, where M is the set of 'meanings'. For the sake of simplicity,
the elements of L will be called syntactic objects while the elements of M
will be called semantic objects. Our syntactic objects are the
λ-expressions which form a context-free language to be denoted by Λ.
Thus, for every λ-expression E ∈ Λ its meaning will be defined as the value
of μ(E) ∈ M.
So far, we have defined only Λ but we have not specified the set M.
In order to have a formal definition we need some symbolic representation
for the elements of M, that is, we need a notational system (a language)
also for M. Moreover, we must have an intimate knowledge of the se-
mantic objects belonging in M, otherwise they would not help us under-
stand Λ.
The language we know best is, of course, our native tongue, but the
spoken languages do not have the mathematical precision we need. A
precise mathematical notation would be the ideal thing. We all agree, for
example, on the meaning of an integer in decimal notation. We can also
rely upon the decimal notation for representing rational numbers and, by
allowing infinite sequences, real numbers, as well. For a reasonable repre-

sentation of the meaning of more complex λ-expressions we need, of
course, more complex semantical objects. The question is what kinds of
mathematical objects are needed for the definition of the meaning of arbi-
trary λ-expressions, and how to denote those objects.
What if we choose a well-defined subset Λ0 ⊂ Λ for the language M?
This may sound strange (circular) at first, but it makes a lot of sense.
Numbers, for one thing, are represented here in decimal notation which is
no different from their representation in standard mathematics. The same
is true for other constants and variables. So, we do not have to worry about
those primitive objects and thus, we can concentrate on the representation
of functions and other more complex structures. The question is what other
specific constructs from Λ are to be used for a reasonable semantical de-
scription of the entire Λ. We want to define the meaning of λ-expressions
in terms of those special kinds of λ-expressions whose meaning is (assumed
to be) self-evident. The mapping μ would then assign a member of this
subset Λ0 to every expression in Λ.
For any definition of μ the meaning must be invariant under β-equality.
Therefore, the simplest solution in this regard is to choose the set of
λ-expressions being in normal form for M = Λ0 ⊂ Λ, and define the mapping
μ as reducing to normal form.
A summary of the reduction rules which form the basis of this defi-
nition will be given below. There is, however, a serious problem with this
definition of μ, since it is a partial rather than a total function on Λ and,
unfortunately, there are many intuitively meaningful λ-expressions without
normal form. The most important ones are the recursive definitions in-
volving the Y combinator giving rise to an infinite reduction sequence. In
spite of that, the simplicity of this approach makes it worth studying. As a
matter of fact, Church himself regarded λ-expressions without normal form
as meaningless.
For the sake of completeness we also summarize here the syntax of the
language Λ.

THE SYNTAX OF λ-EXPRESSIONS:

<λ-expression> ::= <variable> | <constant> | <abstraction> |
                   <application> | <list>
<variable> ::= <identifier>
<constant> ::= <number> | <operator> | <combinator>
<abstraction> ::= λ<variable>.<λ-expression>
<application> ::= (<λ-expression>)<λ-expression>
<list> ::= [<λ-expression><list-tail> | []
<list-tail> ::= ,<λ-expression><list-tail> | ]
<operator> ::= <arithmetic operator> | <relational operator> |
               <predicate> | <boolean operator> | <list operator>
<arithmetic operator> ::= + | - | * | / | succ | pred | mod
<relational operator> ::= < | ≤ | = | ≥ | > | ≠
<predicate> ::= zero | null
<boolean operator> ::= and | or | not
<list operator> ::= ∧ | ~ | & | map | append
<combinator> ::= true | false | Y

This syntax does not specify what an identifier or a number looks like, but
any reasonable definition would do, and we are not interested in their de-
tails. What we are interested in right now is the meaning of λ-expressions
which will be described with the aid of a set of reduction rules.
These reduction rules represent meaning preserving transformations on
λ-expressions and the corresponding equality (in the sense of Definition
2.6) divides Λ into equivalence classes each of which consists of
λ-expressions with the same meaning. If an equivalence class contains a
member from Λ0 then this member is unique up to α-congruence and it
represents the meaning of every λ-expression in that class. The following
is a summary of our reduction rules.

ALPHA RULES

(α1) {z/x}x → z
(α2) {z/x}E → E if x does not occur free in E
(α3) {z/x}λy.E → λy.{z/x}E for every λ-expression E, if x ≠ y ≠ z.
(α4) {z/x}(E1)E2 → ({z/x}E1){z/x}E2
(α5) {z/x}[E1, ..., En] → [{z/x}E1, ..., {z/x}En] for n ≥ 0

BETA RULES

(β1) (λx.x)Q → Q
(β2) (λx.E)Q → E if x does not occur free in E
(β3) (λx.λy.E)Q → λz.(λx.{z/y}E)Q if x ≠ y, and z is neither free
     nor bound in (E)Q
(β4) (λx.(E1)E2)Q → ((λx.E1)Q)(λx.E2)Q

GAMMA RULES

(γ1) ([E1, ..., En])Q → [(E1)Q, ..., (En)Q] for n ≥ 0

(γ2) λx.[E1, ..., En] → [λx.E1, ..., λx.En] for n ≥ 0

LIST MANIPULATING FUNCTIONS

(∧)[] → [], (∧)[E1, E2, ..., En] → E1
(~)[] → [], (~)[E1, E2, ..., En] → [E2, ..., En]
((&)A)[] → [A], ((&)A)[E1, ..., En] → [A, E1, ..., En]
(null)[] → true, (null)[E1, ..., En] → false for n ≥ 1
((map)F)[] → [], ((map)F)[E1, ..., En] → [(F)E1, ..., (F)En]
((append)[])[E1, ..., En] → [E1, ..., En]
((append)[A1, ..., Am])[E1, ..., En] → [A1, ..., Am, E1, ..., En]

PROJECTIONS
(1)[E1, ..., En] → E1 for n ≥ 1
(k)[E1, ..., En] → ((pred)k)[E2, ..., En] for k ≥ 2, n ≥ 1.

COMBINATORS

((true)A)B → A, ((false)A)B → B
(Y)E → (E)(Y)E

OPERATORS AND PREDICATES

((and)true)true → true   ((and)true)false → false
((and)false)true → false   ((and)false)false → false

((or)true)true → true   ((or)true)false → true
((or)false)true → true   ((or)false)false → false

(not)true → false   (not)false → true

(zero)0 → true   (zero)n → false for n ≠ 0

((+)m)n → k if m,n,k are numbers and k = m + n
((-)m)n → k if m,n,k are numbers and k = m - n
((*)m)n → k if m,n,k are numbers and k = m * n
((/)m)n → k if m,n,k are numbers and k = m / n
((mod)m)n → k if m,n,k are numbers and k = m mod n

(succ)m → n if m,n are integers and n = m + 1
(pred)m → n if m,n are integers, m > 0, and n = m - 1

((<)m)n → true if m,n are numbers and m < n
((<)m)n → false if m,n are numbers and m ≥ n

Similar reduction rules are used for the remaining relational operators and
this completes our list.

Note that most of the above rules are, in fact, 'rule-schemas' rather
than individual rules as they have an infinite number of instances. The
evaluation of a λ-expression will be performed by reducing it to its normal
form using the above rules. But this is the same as the execution of the al-
gorithm (or functional program) represented by the expression. So, the
execution of a program can be described as a sequence of reduction steps
where each step is a single application of some reduction rule. This means
that the execution of a program can be defined in terms of certain trans-
formations performed directly on its source form. Actually, the program and
its result are considered here as two equivalent representations of the same
object.
According to the Church-Rosser Theorem, the order in which the re-
duction steps are performed does not matter provided that the reduction
process terminates after a finite number of steps. From a practical point
of view, however, it would be desirable to minimize the number of steps
that are needed for the evaluation of a A-expression. That is essentially the
same as minimizing the execution time of a functional program, which
cannot be done in general. Nevertheless, there are various techniques for
improving the run-time efficiency of a program. The efficiency of the
function evaluation process represents a major issue for the implementa-
tion techniques to be studied in Chapters 6 and 7.
This reduction-based approach to the semantics of λ-expressions is
closely related to the so-called 'operational semantics' of programs. Indeed,
the reduction process is a well-defined procedure for every A-expression
even if it does not terminate. It is nondeterministic though in the sense that
the redex to be contracted in each step may be chosen arbitrarily from
among those that are present in the given λ-expression at that point.

5.2 Controlled reduction

In Section 5.1 we identified the meaning of a λ-expression with its normal


form. This definition of the meaning is clearly not satisfactory since two
important questions remain to be answered:
1. What to do with λ-expressions without normal forms?
90 CHAPTER 5

2. What to do with λ-expressions whose normal forms have no
self-evident meanings?
If every λ-expression without normal form is considered meaningless then
we are faced with the awkward situation of having meaningful λ-expressions
containing meaningless subexpressions. Indeed, a λ-expression involving
the Y combinator may represent a recursive function without having a
normal form. But, when it is applied to some other λ-expression(s) then the
resulting expression may have a normal form which represents the value
of the given function. Thus, the value of the function may be well-defined
for at least some argument(s) without the meaning of the function itself
being defined.
To avoid this conflict we propose the following compromise. We in-
troduce the notion of controlled reduction, which is defined by the following
adjustments to the reduction rules:
The rule for the Y combinator will be changed to
(Y')E → (E)(Y)E
where Y' is a newly introduced combinator. Furthermore, the
β2-rule will be modified as
(β2') (λx.E)Q → E' if x does not occur free in E,

where E' is the same as E except that each (if any) occurrence of Y
in E is replaced by Y'.
In order for the new system to work, all recursive definitions should be
written with the aid of the new Y' combinator in place of the original Y.
The latter is disabled in the new system until it gets changed to Y' in a
β2'-reduction step.
To see how this system works on a simple example consider the fol-
lowing recursive definition of the factorial function.
(fact)n = if n = 0 then 1 else ((*)n)(fact)(pred)n
which will be written in our λ-notation as
(fact)n = (((zero)n)1)((*)n)(fact)(pred)n
that is
fact = λn.(((zero)n)1)((*)n)(fact)(pred)n

The solution of this recursion equation will be expressed by


fact = (Y')λf.λn.(((zero)n)1)((*)n)(f)(pred)n
where the right-hand side reduces to the normal form
λn.(((zero)n)1)((*)n)((Y)λf.λn.(((zero)n)1)((*)n)(f)(pred)n)(pred)n
If, however, we supply an argument, say 5, to the function fact then
we get the correct result 120, as can be easily checked by the reader.
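
The reader can replay this computation in Haskell, where fix plays the role of Y (our sketch); the recursion terminates because the conditional consumes the argument before any further unfolding is demanded:

  import Data.Function (fix)

  fact :: Integer -> Integer
  fact = fix (\f n -> if n == 0 then 1 else n * f (n - 1))

  -- fact 5 == 120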
The trick of controlled reduction lies in the fact that the Y' combinator
can fire only once, but each time an argument Q is presented to an ex-
pression of the form
(F)(Y)F
an attempt at evaluating the expression
((F)(Y)F)Q
is made, which in turn involves an attempt at substituting the argument Q
for some variable in Y. But that would change Y to Y', since no variable
occurs free in Y.
It is necessary, however, that the expression F has at least one more
abstraction besides the abstraction on the function name that is being de-
fined by the recursion. This is the case in the above example where we have
An besides the Af. Indeed, any well-founded recursion must have a condi-
tion which would terminate the recursion after a finite number of steps.
That condition must depend on the argument(s), otherwise there could be
no change in its truth value throughout the recursion.
It is interesting to note that this modified system is not extensional
since we have
(λx.Y')Q = (λx.Y)Q for any Q,
which would imply λx.Y' = λx.Y in an extensionally complete system.
(Actually, we may assume that this is the case even though Y' ≢ Y.)
In any case, the meaning function μ can be defined as reducing to normal
form in the new system. If the normal form of an expression has a subex-
pression of the form (Y)F then this subexpression denotes the fixed-point
of F.
Note that the Y combinator can also be used for defining infinite lists.
For instance, an infinite list of zeros can be defined recursively as
92 CHAPTER 5

zeros = ((&)0)zeros
hence,
zeros = (Y')λz.((&)0)z
where the right-hand side reduces to
(λz.((&)0)z)(Y)λz.((&)0)z
which has the normal form
((&)0)(Y)λz.((&)0)z
Now, the problem is that a finite projection of this infinite list is not com-
putable in the new system.
First of all, we have to define all list operations in a lazy manner as
discussed in Section 4.5. So, for example, the k-th element of an infinite
list will be obtained by extending the reduction rules of projections to in-
finite lists as follows:
(1)[E1, ... ] → E1
(k)[E1, ... ] → ((pred)k)[E2, ... ]

All the other list manipulating functions, which are the ∧, ~, &, null, map,
and append, will also be defined lazily. Actually, the last two need not be
defined as primitives, since they can be defined recursively with the aid of
the others as shown in Section 4.3. Namely,
((map)f)x = (((null)x)[])((&)(f)(∧)x)((map)f)(~)x
((append)x)y = (((null)x)y)((&)(∧)x)((append)(~)x)y
Returning to our example, the computation of the k-th element of the in-
finite list of zeros begins with
(k)((&)0)(Y)λz.((&)0)z ≥ ((pred)k)(Y)λz.((&)0)z
when k ≥ 2. Now, in order to continue this computation we have to change
the Y combinator back to Y'. But that requires further modifications of the
reduction rules, because the β2'-rule is not applicable in this case, since the
recursively defined infinite list involving the Y combinator is not the oper-
ator but the operand of the given projection. Therefore, we introduce the
following new rules:

(k)(Y)E → (k)(Y')E for k ≥ 1

(∧)(Y)E → (∧)(Y')E

(~)(Y)E → (~)(Y')E

(null)(Y)E → (null)(Y')E

Recursively defined infinite lists can thus be used as arguments in these


operations and each time the operator bounces into the Y combinator it
will change it back to Y'.
This system appears to be very helpful for program debugging, as it
can catch many errors which would otherwise result in infinite computa-
tions. It is, of course, impossible to detect in general whether or not a pro-
gram would terminate for a given input, because that is equivalent to the
halting problem. In particular, it is essential for our technique that the Y
combinator is represented by a specific token. If it is replaced, for instance,
by the λ-expression

λy.(λx.(y)(x)x)λx.(y)(x)x

then our method would not work.


As regards the second question mentioned at the beginning of this
section, the answer is much more difficult. Every λ-expression represents,
in fact, a mapping of Λ into itself, since application is defined for all
λ-expressions. Therefore, every member of M should also represent a
mapping of M to itself. A naive interpretation of type-free lambda-calculus
would also require that every type [M → M] mapping be represented by
some λ-expression, i.e. by some member of M. (Note that the value of a
free variable may range over the entire domain M and thus, the set of all
possible meanings of λ-expressions depends not only on Λ but also on
M.) However, the cardinality of the set of all [M → M] type mappings is
always greater than that of M for any nontrivial M.
So, it should not be surprising that it was very difficult to find a satis-
factory set-theoretical interpretation of type-free lambda-calculus. The
first set-theoretical (or rather lattice-theoretical) model for type-free
lambda-calculus was constructed by Dana Scott while he was trying to
prove the non-existence of such models.
The main idea behind his construction is the restriction of the model
to Scott-continuous functions. Continuity in any topology imposes severe

restrictions on the behavior of the functions. Therefore, the set of contin-


uous functions is usually much smaller than that of arbitrary functions. In
fact, it is enough to define the value of a continuous function on a sufficiently
large (and dense) subset of its domain from which the remaining values will
follow. Hence, the cardinality of the set of continuous mappings of M to it-
self is usually much smaller than that of the set of arbitrary mappings with
the same type. Moreover, one can find certain domains which have the
same cardinality as the set of their continuous mappings to themselves.
Such models are widely used in Denotational Semantics where the
function μ is defined directly for all syntactic objects. Of course, the
meaning of λ-expressions must still be invariant under β-equality, and the
difficulty of the proof of this invariance largely depends on the definition
of μ. A systematic study of Denotational Semantics goes beyond the scope
of this book, but many fine books on this subject are available in the liter-
ature.

5.3 The functional programming system FP

In the preceding two sections we defined the semantics of λ-expressions in
terms of rewriting rules. The semantics of programming languages can now
be defined by translating them to Λ. This can be done, at least theore-
tically, for every programming language but it is fairly complicated when
dealing with conventional (imperative) programming languages. The
problem is caused mainly by the use of assignment statements. The value
of an expression depends on the values of its variables, which in turn de-
pend on the corresponding assignment statements. Hence, the value of an
expression depends on the order of the execution of those assignment
statements. In other words, the evaluation of the expressions of a con-
ventional programming language is 'history sensitive', which makes their
semantics more complicated.
In pure functional languages there are no assignment statements,
and the order of the execution of the operations occurring in some
expression is restricted only by their partial order defined implicitly
by the structure of the given expression.

This implicit partial order is due to the data-dependencies of the oper-
ations. For example, the addition obviously precedes the multiplication in
the expression (a+b)*c-d while they can be performed in any order in
(a+b)-(c*d). The subtraction is, of course, the last operation in either case.
The foundations of a mathematical theory of functional programming
were laid down by John Backus in his Turing Award paper [Back78]. He
developed a functional programming system called FP.
An FP program is simply an expression representing a function that
maps objects to objects.
This means that only first-order functions are used in FP, which is a signif-
icant difference between type-free lambda-calculus and FP. Another more
formal difference is the lack of object variables in FP. Each function in FP
has only one argument which is not mentioned explicitly in the notation.
There is, however, an obvious similarity between the theory of
combinators and the variable-free approach to function level reasoning ad-
vocated by Backus.
A fixed set of program forming operations, called functional forms or
combining forms are used in FP to form new functions from given ones.
(These are, in fact, second order functions which take functions as argu-
ments and return functions as results.) A formal description of the FP
system is given below.
Given a set of atoms, THE SET OF OBJECTS O is defined recursively
as follows.
(a) Every atom is in O.
(b) The undefined object called bottom, denoted by ω, is in O.
(c) If x1, ..., xn are in O and xi ≠ ω for i = 1, ..., n then the se-
quence <x1, ..., xn> is also in O.
(d) <x1, ..., xn> = ω if xi = ω for some i. (Sequence construction
is bottom preserving.)
(e) The empty sequence < > is in O.
The application of a function f to some object x is denoted by f:x. (The
symbol : represents the infix apply operator.) All functions in FP are bot-
tom preserving, i.e. f:ω = ω for all f. The set of functions in FP consists of
primitive functions and composite functions.

PRIMITIVE FUNCTIONS
The integers 1, 2, ... (representing selector functions)
    k:x produces the k-th element of an object x if x is a se-
    quence of at least k elements. Otherwise it returns ω.
tl (the tail function)
    tl removes the first element of a sequence; produces ω
    when applied to a non-sequence object or to the empty
    sequence.
id (identity)
    id:x = x for all x in O.
a, s
    a adds 1 while s subtracts 1 from its argument.
eq (test for equality)
    eq:x = T if x is a pair of identical objects; eq:x = F if x is
    a pair of non-identical objects; otherwise it produces ω.
eq0 (test for zero)
    eq0:0 = T.
gt (greater than), ge (greater or equal)
    For instance, gt:<5,2> = T, gt:<2,5> = F, etc.
+, -, ×, mod (arithmetic operations)
    For instance, +:<5,2> = 7, +:<5> = ω, etc.
iota (number sequence generator)
    iota:n = <1,2,...,n> if n is an integer.
apndl (append left), apndr (append right)
    apndl:<a, <x1, ..., xn>> = <a, x1, ..., xn>
    apndr:<<x1, ..., xn>, b> = <x1, ..., xn, b>
distl (distribute from the left), distr (distribute from the right)
    distl:<a, <x1, ..., xn>> = <<a, x1>, ..., <a, xn>>
    distr:<<x1, ..., xn>, b> = <<x1, b>, ..., <xn, b>>
It is generally assumed that every primitive function returns ω when
applied to a wrong argument.

COMBINING FORMS
Composition: f • g
    (f • g):x = f:(g:x)
Construction: [f1, ..., fn]
    [f1, ..., fn]:x = <f1:x, ..., fn:x>
Conditional: p → f; g
    (p → f; g):x = if p:x = T then f:x else if p:x = F then g:x else ω.
Constant: x̄
    x̄:y = if y ≠ ω then x else ω.
Apply to all: αf
    αf:x = <f:x1, ..., f:xn> if x = <x1, ..., xn>; ω otherwise.
Insert: /f
    (/f):<x1> = x1
    (/f):<x1, ..., xn> = f:<x1, (/f):<x2, ..., xn>>.
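To make these definitions concrete, here is a minimal sketch of a few of
the combining forms in Haskell; the data type Obj, the error value standing
in for the undefined object ω, and all of the names are our own illustrative
choices, not part of FP itself.

    data Obj = Atom String | Num Int | Seq [Obj]
      deriving (Show, Eq)

    type Fn = Obj -> Obj

    -- Composition: (f . g):x = f:(g:x)
    comp :: Fn -> Fn -> Fn
    comp f g x = f (g x)

    -- Construction: [f1,...,fn]:x = <f1:x,...,fn:x>
    constr :: [Fn] -> Fn
    constr fs x = Seq [f x | f <- fs]

    -- Conditional: (p -> f; g):x
    cond :: Fn -> Fn -> Fn -> Fn
    cond p f g x = case p x of
      Atom "T" -> f x
      Atom "F" -> g x
      _        -> error "bottom"   -- plays the role of ω

    -- Apply to all: (af):<x1,...,xn> = <f:x1,...,f:xn>
    applyAll :: Fn -> Fn
    applyAll f (Seq xs) = Seq (map f xs)
    applyAll _ _        = error "bottom"

    -- Insert: (/f):<x1> = x1, (/f):<x1,...,xn> = f:<x1,(/f):<x2,...,xn>>
    insertF :: Fn -> Fn
    insertF _ (Seq [x])    = x
    insertF f (Seq (x:xs)) = f (Seq [x, insertF f (Seq xs)])
    insertF _ _            = error "bottom"

In this sketch Haskell's own non-termination and run-time errors model ω,
so the bottom-preserving clauses are not spelled out explicitly.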
Now, THE SET OF FUNCTIONS, F, will be defined inductively as fol-
lows:
(1) Every primitive function is in F.
(2) If f1, ..., fn are in F and C is a combining form which takes n
arguments then C applied to f1, ..., fn is also in F.
(3) If the expression Df represents a function in the 'extension' of
F by the symbol f, i.e. if Df is in F provided that the symbol f is
treated as a primitive function, then the function defined
(recursively) by the equation f = Df is also in F.
(4) Nothing else is in F.
Clause (3) has the same purpose as (recursive) function declarations in
conventional programming languages. The function symbol f represents
the name of a user defined function. The above formalism is somewhat
strange because the composition of combining forms with variable argu-
ments is not defined in FP. Combining forms can be applied only to first
order functions. Therefore, a variable function name should not be used
as an argument to a combining form, nor should it be used as a formal pa-
rameter in a combining form. (Note that while functions are always unary
in FP, most combining forms have multiple arguments.)

As a matter of fact, the composition of combining forms can be de-
fined as a third order function so we can generate an infinite number of
combining forms, but then we have to define precisely the substitution of
arguments for formal parameters. If, for instance, f, g, and p are used as
formal parameters in the combining form p → f; g then its naive substi-
tution for g in f • g gives us f • (p → f; g) which results in a confusion of
possibly different parameters denoted by the same symbol f. A purely
combinatorial, i.e. variable free treatment of combining forms is, of course,
possible (cf. the discussion of FFP systems in [Back78]) but it may get
fairly complicated because of the large number of the individual reduction
rules that are necessary for their definitions.
The design of FP was meant to avoid the complications of substitution
by using only unary functions and a small set of combining forms to be
treated as combinators. The 'extension' of F by introducing f as the only
variable is relatively simple, since no confusion of variables may occur.
This formalism, however, becomes rather involved when dealing with mu-
tual recursion or other more complex combinators.
The most important feature of the FP system is its algebraic nature.
This means that FP programs are treated as mathematical objects with a
fixed set of 'program forming operations' defined on them. These program
forming operations, i.e. combining forms satisfy certain algebraic identities
that can be inferred from their definitions and from the definitions of the
primitive functions. There is, of course, an infinite number of algebraic
identities that can be derived in this fashion. The question is whether we
can select a few of them from which all the rest follows. In other words,
we would like to have a finite set of fundamental identities that can be
treated as axioms. The following is a list of axioms proposed by Backus.

AXIOMS

(A1) h • (p → f; g) = p → h • f; h • g

(A2) (p → f; g) • h = p • h → f • h; g • h

(A3) /f • [g1, ..., gn] = f • [g1, /f • [g2, ..., gn]]

(A4) /f • [g] = g

(A5) [f, g] • h = [f • h, g • h]

(A6) 1 • [f, g] = f, in the domain of definition of g

(A7) 2 • [f, g] = g, in the domain of definition of f

(A8) p → (p → q; r); s = p → q; s

(A9) αf • apndl • [g, h] = apndl • [f • g, αf • h]

(A10) /f • apndl • [g, h] = f • [g, /f • h]

(A11) x̄ • g = x̄, in the domain of definition of g
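For instance, with the illustrative Haskell sketch given earlier, an axiom
such as (A5) can be spot-checked on sample arguments (this supports, but
of course does not prove, the identity):

    -- checking (A5): [f, g] . h = [f . h, g . h] on a sample argument
    a5check :: Fn -> Fn -> Fn -> Obj -> Bool
    a5check f g h x =
      comp (constr [f, g]) h x == constr [comp f h, comp g h] x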


Clearly, there are many interesting identities that can be derived from these
axioms, but it is not clear whether every valid identity is derivable in this
system. In other words, the completeness of the axiom system would require
a rigorous proof. Also, it is necessary to show that the axiom system is
consistent and that the axioms are independent of each other.
Note that the notion of a 'valid identity' may be defined in various
ways. If an identity is considered valid if and only if it is derivable from a
given set of axioms then the question of completeness is meaningless. But,
if we consider extensional, i.e. semantical equality as the basis for the va-
lidity of an identity of functional expressions then the question of finding
the proper set of axioms becomes more interesting. A fully axiomatic
treatment of the FP system appears to be rather difficult. It is possible to
develop a denotational semantics for FP and use the model for the proof
of consistency, etc. This approach has been taken by several authors but
the proofs are still fairly complicated.

5.4 Translation of functional programs to λ-calculus


Here we shall define the semantics of FP programs by translating them
into our extended λ-notation. The algebraic identities discussed in the
previous section should then be inferred from our reduction rules. Indeed,
the axioms (A1)-(A11) are all derivable in λ-calculus, which gives us a
positive answer to the question of their consistency, but the question of
completeness remains open.

A very useful feature of functional languages is their lack of side ef-
fects that has been achieved by the elimination of the assignment state-
ment. This makes their translation to λ-calculus relatively simple. As a
matter of fact, many of the well known techniques for implementing func-
tional languages make use of a translation to some variant of the λ-notation
and/or combinators.
The direct representation of lists in our λ-notation makes the trans-
lation of functional programs even simpler. Our γ-rules are closely related
to and have actually been inspired by the properties of the combining form
of construction as defined by Backus.
There is, however, a major difference between the semantics of the
two languages: all functions in FP are strict, while most functions in
λ-calculus are considered non-strict. Therefore, we shall use here a non-
strict extension of FP.
We can assume that every atomic object of FP has a corresponding
atom in Λ. (The undefined object may also be represented by a specific
atom, i.e. constant symbol.) Sequences will be represented by lists.
Let us consider first the translation of primitive functions. The selector
functions and tail have obvious counterparts in Λ. The identity function
can be translated as the I combinator. The test for equality is non-trivial in
Λ since the equality of two λ-expressions is undecidable in general. Nev-
ertheless, a strict version of eq, which would completely evaluate its argu-
ments to see if they are identical objects, can be designed easily. (Equality
is decidable for λ-expressions with normal form.) The function eq0 corre-
sponds to zero while apndl corresponds to our & operator.
In order to translate the arithmetic and the relational operators, the
ordered pair representation of their operands must be changed to
Currying. But that is easy. Take, for instance, the ge function. The
λ-expression

λx.((≥)(1)x)(2)x

with our Curried ≥ operator represents a correct translation of the pair
oriented ge. Indeed, the application of this expression to an ordered pair
[A,B] reduces to ((≥)A)B as required.
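This change of representation is the familiar currying correspondence; in
Haskell terms (purely as an illustration, with names of our own choosing):

    -- pair-oriented ge versus its Curried counterpart
    gePair :: (Int, Int) -> Bool
    gePair (a, b) = a >= b

    geCurried :: Int -> Int -> Bool
    geCurried = curry gePair   -- geCurried a b == gePair (a, b)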
The function iota can be translated to

(Y′)λi.λn.((((=)n)1)[1])((append)(i)(pred)n)[n]

which is a recursive definition of the same function in Λ. Similarly, distl can
be translated to

(Y′)λd.λx.λy.(((null)y)[])((&)[x,(^)y])((d)x)(~)y

The translations of apndr and distr are left to the reader as an exercise.
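For readers who prefer a modern functional notation, the two recursive
definitions above can be rendered in Haskell roughly as follows (a sketch;
the head and tail operators become head and tail, and pairs stand in for
the two-element lists):

    -- iota n = if n = 1 then [1] else append (iota (pred n)) [n]
    iota :: Int -> [Int]
    iota n = if n == 1 then [1] else iota (n - 1) ++ [n]

    -- distl distributes an element from the left over a list
    distl :: a -> [b] -> [(a, b)]
    distl x ys = if null ys then []
                 else (x, head ys) : distl x (tail ys)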
Consider now the translation of the combining forms:
Composition is equivalent to

λf.λg.λx.(f)(g)x

Construction is exactly the same in either notation. The Conditional is
equivalent to

λp.λf.λg.λx.(((p)x)(f)x)(g)x

The Constant combinator x̄ is equivalent to

λz.x

Apply to all is equivalent to our map, while Insert is equivalent to

(Y′)λi.λf.λx.(((null)(~)x)(^)x)(f)[(^)x,((i)f)(~)x]
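The Insert form, in particular, is just the familiar fold over a non-empty
list; once f is Curried it coincides with Haskell's foldr1 (our rendering,
with a hypothetical name):

    -- Insert (/f) as a curried fold; equivalent to Haskell's foldr1
    insertC :: (a -> a -> a) -> [a] -> a
    insertC _ [x]    = x
    insertC f (x:xs) = f x (insertC f xs)
    insertC _ []     = error "bottom"  -- /f is undefined on the empty sequence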
Thus, we can design a relatively simple translation algorithm which
produces an equivalent λ-expression for any FP function. The translator it-
self represents a complete formal semantics for FP, since the meaning of
λ-expressions has already been defined by the reduction rules.
The use of the λ-notation as a meta-language for semantic definitions
is quite common in theoretical computer science. Its use as a practical tool
for implementing programming languages is relatively new, but it is
spreading rapidly. This has led to the consideration of non-strict languages
and various forms of lazy evaluation techniques which are closely related
to the normal order evaluation strategy of λ-calculus, which will be dis-
cussed in Chapter 6.
Imperative programs can also be translated to λ-calculus, but that is in
general much more complicated. The major difficulty is caused by the
presence of side-effects. It may be interesting to note that structured pro-
gramming, which is a highly disciplined way of writing imperative pro-
grams, can substantially decrease the difficulty of the translation from an
imperative language to λ-notation. This can be illustrated by the following
example.

Consider this program segment written in a Pascal-like language:

IF x = 0 THEN x := x + 1 ELSE x := x - 1;
y := x * x / 2;
z := x + y;
PRINT(z);

This can be translated to the λ-expression

(λx.(λy.(λz.(PRINT)z)((+)x)y)((*)x)((/)x)2)(((zero)x)(succ)x)(pred)x

without any difficulty, provided that we add the function PRINT to our set
of primitive functions. Any sequence of assignment statements can be
treated in the same way, because it has a linear flow of control. The IF
statement breaks the linear flow of control, though in a relatively simple
(well-structured) manner. Therefore, it can be easily translated to
λ-notation provided that its component parts have already been translated.
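The structure of this translation is easy to see in a language with
let-expressions; the following Haskell rendering (with an assumed initial
value x0 for x, since the segment does not fix one, and Haskell's print
standing in for PRINT) mirrors the nesting of the λ-abstractions:

    -- the program segment as one nested let-expression (illustration only)
    segment :: Double -> IO ()
    segment x0 =
      let x = if x0 == 0 then x0 + 1 else x0 - 1
          y = x * x / 2
          z = x + y
      in print z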
The same is true for the WHILE statement, whose general form is the fol-
lowing:

WHILE (P)X DO X := (F)X;


where P is a predicate, F is a function, and X is the list of all variables oc-
curring in the program. Here the function F represents the overall effect
of a single execution of the body of the WHILE statement. (Some of the
variables may get updated while others are left unchanged.) Now, the
translation of the WHILE construct can be based on its recursive defi-
nition, namely
(((while)p)f)x = if (p)x then (((while)p)f)(f)x else x

In pure λ-notation this is written as

while = λp.λf.λx.(((p)x)(((while)p)f)(f)x)x

whose explicit form is

while = (Y′)λw.λp.λf.λx.(((p)x)(((w)p)f)(f)x)x
So, the WHILE construct has been defined as a combinator in our
λ-notation, which can be used to translate every WHILE statement pro-

vided that its predicate P and its function F can also be translated to
λ-notation.
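Written directly in a functional language, the same combinator looks like
this (a minimal Haskell sketch of the recursive definition above):

    -- the while combinator: repeat f as long as p holds
    while :: (a -> Bool) -> (a -> a) -> a -> a
    while p f x = if p x then while p f (f x) else x

    -- e.g. while (< 100) (* 2) 1 evaluates to 128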
The translation of an unrestricted GO TO statement is obviously much
more difficult. Structured programming is, therefore, very helpful for
translating imperative programs to λ-notation. In fact, it represents a first
step towards functional programming without abolishing the assignment
statement. A limited use of the assignment statement in otherwise purely
functional languages has many advantages. The proper discipline for
using assignments may depend on the purpose they serve.

5.5 List comprehension in Miranda


Miranda is a non-strict functional language based on higher-order
recursion equations. It is a trademark of Research Software Limited, and
is implemented on a variety of computers. We give here only a brief over-
view of the language, concentrating on its list oriented features.
List comprehensions were first used by David Turner in KRC, where
they were called ZF expressions [Turn82]. They are analogous to set
comprehensions in Zermelo-Fraenkel set theory. Set comprehension is used
in mathematics to define sets via some property. So, for example, the set
of odd numbers can be defined as

{x | x ∈ Z and x mod 2 = 1}
where Z represents the set of all integers. It should be noted, however, that
this definition is based on the definition of another set, Z, which must be
known beforehand in order to find its elements with the given property.
The corresponding list comprehension in Miranda will have this form:

[x | x <- Z; x mod 2 = 1]

The only difference in the notation is that the curly braces are changed to
square brackets, the word 'and' is changed to a semicolon, and the symbol
∈ is changed to <-, which is pronounced 'drawn from'. A more important
difference is that the latter is a list, not a set. The same elements may occur
several times in a list, but if we want to represent a set by listing its ele-

ments in some sequence then we have to make sure that each element oc-
curs exactly once. We are not concerned with sets right now, so we do not
worry about possible repetitions.
The list of odd numbers can be defined also in this way:

[2*k+1 | k <- Z]
This means that the elements of a list may be represented by an expression
containing some variable whose value is drawn from another list. The ex-
pression may also contain several variables each being drawn from a dif-
ferent list. The general form of a list comprehension is the following:

[<expression> | <qualifier>; ... ; <qualifier>]

where each <qualifier> is either a generator (such as 'x <- Z') or a filter
(such as 'x mod 2 = 0'). For example, the list of all factors of a natural
number n can be given as

[d | d <- [1..n div 2]; n mod d = 0]

which has one generator and one filter. As can be seen from these exam-
ples Miranda uses infix notation for the arithmetic operations, and it has a
nice shorthand for the list of integers from 1 to some limit like n div 2. Let
us consider now the most important features of Miranda, which are rele-
vant to our discussion.
Miranda is a purely functional language which has no side-effects or
any other imperative features. A program in Miranda is called a script,
which is a collection of equations defining various functions and data
structures. Here is a simple example of a Miranda script taken from
[Turn87]:

z = sq x / sq y
sq n = n * n
x = a + b
y = a - b
a = 10
b = 5
Scripts are used as environments in which to evaluate expressions. So, for
example, the expression z will evaluate to 9 in the environment represented
by the above script. Function application is denoted simply by

juxtaposition, as in sq x. In the definition of the sq function, n is a formal
parameter. The scope of a formal parameter is limited to the equation in
which it occurs whereas the scope of the other names introduced in a script
includes the whole script.
There are three basic data types which are built into the language:
numbers, characters, and truth values. Integer and real types are considered
the same. Character type constants are written between quotation marks,
e.g. "John". Type declarations are optional, because the type of an ex-
pression can be determined automatically by the system from the types of
the constants occurring in it. Type checking is performed during compila-
tion with the aid of a set of type inference rules, which we do not discuss
here. But, in order to minimize the amount of type checking at run time,
programs will have to be translated to a type-free language. Indeed, the
target language of the compilation is a variant of the type-free lambda-
calculus.
There are two kinds of built-in data structures in Miranda, called lists
and tuples. Lists are written with square brackets and commas. The ele-
ments of a list must all be of the same type. The symbol : is used as an
infix cons operator, while + + represents the infix append operator. So, for
example, 0: [1,2,3] has the value [0,1,2,3], while [1,2,3,4] ++ [5,6,7] has
the value [1,2,3,4,5,6,7].
There is a shorthand notation using .. for lists whose elements form an
arithmetic series, e.g. [1..100] or [0,5..25]. This notation can also be used
for infinite lists, so the list of all natural numbers can be denoted by [0..],
and the list of all odd natural numbers by [1,3..]. The prefix # operator
is used to compute the length of a list while the infix ! operator is used for
subscripting, i.e. selection. So, for example, #[0,2..10] has the value 6, and
[0,2..10] ! 1 has the value 2. Note that the first member of a list L is
L ! 0, and the last is L ! (# L - 1).
Tuples are analogous to records in Pascal. They are written using
parentheses instead of square brackets, but they cannot be subscripted.
The elements of a tuple can be of mixed type. For example,

("Charles", "Brown", 35, true)
Accessing the elements of a tuple is done by pattern matching, which is a
favorite device of Miranda. It is often used for defining functions with

structured arguments. So, for example, the selection functions on 2-tuples
can be defined as

frst (a,b) = a
scnd (a,b) = b
The application of these functions involves a pattern matching of the
structure of the argument with the pattern (a,b). If this pattern matching
fails then the function is undefined for the given argument.
It is permitted to define a function by giving several alternative
equations, distinguished by different patterns in the formal parameter.
We can use pattern matching on natural numbers as well as on lists. Here
are some examples:
fac 0 = 1
fac (n + 1) = (n + 1) * fac n
sum [] = 0
sum (a:x) = a + sum x
reverse [] = []
reverse (a:x) = reverse x ++ [a]
sort [] = []
sort (a:x) = sort [ b | b <- x; b <= a ] ++ [a] ++ sort [ b | b <- x; b > a ]
There are, of course, many functions which cannot be defined by simple
pattern matching. Take, for example, the gcd function, which is defined in
Miranda via 'guarded equations' as follows:
gcd a b = gcd (a - b) b, a > b
        = gcd (b - a) a, a < b
        = a, a = b
According to the semantics of the language, the guards are tested in order,
from top to bottom. It is, therefore, recommended to use mutually exclu-
sive tests. The general form of a function definition with guarded equations
is this:

f args = rhs1, test1
       = rhs2, test2
       ...
       = rhsN, testN

One can also introduce local definitions on the right-hand side of a defi-
nition, by means of a where clause, as shown in this example:
quadr a b c = error "complex roots", delta < 0
            = [-b/(2*a)], delta = 0
            = [-b/(2*a)+radix/(2*a), -b/(2*a)-radix/(2*a)], delta > 0
              where
              delta = b*b - 4*a*c
              radix = sqrt delta

The scope of the where clause is all the right-hand sides associated with a
given left-hand side.
As we mentioned before, Miranda is a higher-order language. Func-
tions of two or more arguments are considered Curried and function ap-
plication is left-associative. So, the application of a function to two
arguments is written simply as f x y, and it will be parsed as the
λ-expression ((f)x)y. If a function f has two or more arguments then a
partial application of the form f x is treated as a function of the remaining
arguments. This makes it possible to define higher-order functions such as
reduce, which was used in Section 4.3 for a uniform definition of the sum
and the product of a sequence. Here we can use pattern matching to define
this function as follows:

reduce a b [] = a
reduce a b (c:x) = reduce (b a c) b x

Hence, we get

sum = reduce 0 (+)
prod = reduce 1 (*)

Note that in Miranda an operator can be passed as a parameter, by en-
closing it in parentheses.
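In Haskell terms, this reduce is just the standard foldl with its arguments
rearranged (our own observation, using only Prelude functions):

    -- reduce a b [] = a;  reduce a b (c:x) = reduce (b a c) b x
    reduce :: a -> (a -> b -> a) -> [b] -> a
    reduce a b = foldl b a

    sum' :: [Integer] -> Integer
    sum' = reduce 0 (+)

    prod' :: [Integer] -> Integer
    prod' = reduce 1 (*)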

The alert reader must have noticed the striking similarities between
Miranda and the λ-notation. It is, indeed, very easy to translate Miranda
programs into our extended λ-notation. A script is just a set of simultane-
ous equations which can be treated as described in Section 4.4. The right-
hand side of every equation will be translated first to a valid λ-expression.
Then, in order to minimize the number of forward references, the
equations will be rearranged on the basis of the dependency analysis of the
given definitions. So, for example, the script which was given at the be-
ginning of this section will be translated as follows:
a = 10
b = 5
x = ((+)a)b
y = ((-)a)b
sq = λn.((*)n)n
z = ((/)(sq)x)(sq)y

Then the whole script will be combined into a single λ-expression,

(λa.(λb.(λx.(λy.(λsq.(λz.E)((/)(sq)x)(sq)y)λn.((*)n)n)((-)a)b)((+)a)b)5)10

where E represents the expression to be evaluated in this environment.
The translation of function definitions with guarded equations to if-
then-else expressions is trivial. A case analysis via pattern matching is not
a problem either, as long as we use only simple patterns. Miranda allows
more complex patterns as well, which are somewhat more difficult to
handle, but they all can be translated to the λ-notation.
The general form of a where clause is

f = E1
where
g = E2

which can be translated simply as

f = (λg.E1)E2
Let us consider now the translation of lists and tuples. Both will be re-
presented by lists in our type-free λ-notation. Explicitly enumerated lists
can be translated directly without any problem. Also, the translation of list

comprehensions is relatively simple. Consider first a list comprehension
with a single generator. This has the general form:

[E | v <- L]

where E is an expression containing free occurrences of v, while L is a list.
It can be translated to

((map)λv.E)L

provided that E and L are already in λ-notation. So, for example, the list
of odd numbers from 1 to 99 defined in Miranda as

[2*k-1 | k <- [1..50]]

will be translated to

((map)λk.((-)((*)2)k)1)(iota)50

If we have a list comprehension with two generators like

[E | x <- L; y <- M]

then we write

M' = ((map)λy.E)M

from which we get the result in this form:

(flat)((map)λx.M')L

where

(flat)x = if x = [] then []
          else ((append)(^)x)(flat)(~)x

This construction can be repeated for any number of generators, which
takes care of their translation. (Note that an application of the gamma-
rules is hidden in this translation, since we map a list, λx.M', onto L.)
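In Haskell terms, where flat corresponds to the standard concat, this
two-generator translation reads as follows (the function name is ours):

    -- [E | x <- l; y <- m] becomes concat (map (\x -> map (\y -> ...) m) l)
    twoGen :: [a] -> [b] -> (a -> b -> c) -> [c]
    twoGen l m e = concat (map (\x -> map (e x) m) l)

    -- e.g. twoGen [1,2] "ab" (,) == [(1,'a'),(1,'b'),(2,'a'),(2,'b')]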
Consider now a list comprehension with a generator and a filter. It has
the general form

[E | v <- L; P]

where P is a predicate (true/false expression) containing free occurrences
of the variable v. First we can filter out the unwanted elements of L by
forming the list

L' = ((filter)λv.P)L

where

((filter)test)L = if L = [] then []
                  else if (test)(^)L then ((&)(^)L)((filter)test)(~)L
                  else ((filter)test)(~)L
Hence, we get the translation

((map)λv.E)L'
as before. For two or more filters the process of filtering out the unwanted
elements of the generator is repeated for each filter. On the other hand, if
we have two generators with a combined filter, e.g.

[E | x <- L; y <- M; P]

where P contains free occurrences of both x and y, then we compute first
the list of ordered pairs,

V = [ [x,y] | x <- L; y <- M ]

as

(flat)((map)λx.M')L

where

M' = ((map)λy.[x,y])M

Then we take

[ ((λx.λy.E)(1)v)(2)v | v <- V; ((λx.λy.P)(1)v)(2)v ]
which has only one generator and one filter. The same technique can be
extended to any number of generators, which means that list comprehen-
sion can be added also to our extended λ-notation simply as a shorthand
(syntactic sugar) without increasing its power. List comprehension is, of
course, a very useful tool for the programmer to write transparent and
concise programs. For more details on the various features of Miranda we
refer to the book [Peyt87].
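Putting the pieces together, the generator-plus-filter case corresponds to
the familiar map-after-filter composition; a Haskell rendering of this part
of the translation scheme (names are ours) is:

    -- [E | v <- l; P] becomes map (\v -> e v) (filter (\v -> p v) l)
    genFilter :: [a] -> (a -> Bool) -> (a -> b) -> [b]
    genFilter l p e = map e (filter p l)

    -- e.g. genFilter [1..10] even (* 2) == [4,8,12,16,20]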
In conclusion we can say that a translation of Miranda to the type-free
λ-calculus is fairly straightforward. This fact may be used for a formal de-
finition of its semantics. Its type system, however, needs special treatment,
which we do not discuss here. This type system is quite flexible
(polymorphic) but still has severe limitations. We feel that the requirement
that the elements of a list must all be of the same type is far too restrictive.
The calculus we have developed for lists can easily handle lists with ele-
ments of mixed type. Moreover, the applicative property of lists expressed
by our gamma-rules might be useful in any higher-order language.

Exercises
5.1 Show that the FP axioms listed in Section 5.3 are derivable from the
reduction rules of Section 5.1.
5.2 The following is the definition of the Ackermann function in FP:

ack = eq0 • 1 → a • 2;
      eq0 • 2 → ack • [s • 1, 1̄];
      ack • [s • 1, ack • [1, s • 2]]

Translate this definition into λ-calculus and compare it with its Curried
version.
5.3 Define an FP function to compute the n-th Fibonacci number with the
aid of an ordered pair holding two consecutive Fibonacci numbers as an
intermediate result. Why is this better than the usual recursive definition?
Try to imitate this definition in λ-calculus without using lists.
5.4 Define the function hanoi in FP to solve the problem of the towers of
Hanoi. The function should generate the solution as a sequence of moves
denoted by ordered pairs of the form [A, B], which represent moving a disk
from tower A to tower B. The initial configuration, where all disks are on
the first tower, can be represented by [n, A, B, C], where n is the number
of disks, while A, B, and C are the names of the towers. Translate the
function hanoi to λ-notation. What would you do if you did not have lists
in the λ-notation?

5.5 Design a translation scheme for the nested where clauses of Miranda,
which have the general form

f = A
where
g = B
where
h = C
etc ...

What is the difference between the above scheme and the following?

f = A
where
g = B
h = C
etc ...
5.6 The function pyth returns a list of all Pythagorean triangles with sides
of total length less than or equal to n. It can be defined in Miranda as fol-
lows:

pyth n = [ [a,b,c] | a <- [1..n];
                     b <- [1..n-a];
                     c <- [1..n-a-b];
                     sq a + sq b = sq c ]

Observe the fact that a later qualifier may refer to a variable defined in an
earlier one, but not vice versa. Translate this definition to λ-notation.
CHAPTER SIX

OUTLINES OF A REDUCTION MACHINE

6.1 Graph representation of λ-expressions


The extended lambda-notation Λ, as defined by its syntax in Section 5.1,
may be used as a programming language without any extras. This would
be a little cumbersome, because the whole program would have to be
written as a single expression. In order to facilitate modular programming
we introduce two program structuring commands.
The let command is used for function definitions, while the eval com-
mand is used for starting the 'body of the main program'. Actually, these
two commands represent only 'syntactic sugar', and they will be translated
to pure lambda-notation before execution. For example, a program with
the form

let f = E1
let g = E2
eval E3

will be translated to the λ-expression

(λf.(λg.E3)E2)E1
Note, however, that mutual recursion needs special treatment, as discussed
in Section 4.4.

Now, for the evaluation of λ-expressions we will design an abstract
(hypothetical) machine which would reduce the input expression to normal
form. The expression must be stored in the memory of our Reduction Ma-
chine before the reduction begins. An expression can be represented, in
general, either by a character string or by a directed graph. Thus, we can
have string-reduction or graph-reduction depending on the internal repre-
sentation we use. Normally, the choice between the two is based on effi-
ciency considerations.
The main advantage of string-reduction is due to the fact that strings
can be treated as character arrays, occupying consecutive storage lo-
cations, which makes their storage management relatively simple. Certain
primitive operations on character strings are also available on regular
computers. A major drawback, however, is the large amount of copying
that may become necessary during the reduction process.
Graph-reduction, on the other hand, can minimize the amount of
copying by using pointers instead. This means that the same copy of a
subexpression can be shared among several parts of a larger expression
simply by referring to it via its pointer. As a result, logically adjacent parts
of an expression will not necessarily occupy adjacent storage locations. So,
the benefit of subexpression sharing is counterbalanced by a more compli-
cated storage management system. Indeed, segments of storage used and
then released dynamically during the reduction process may become widely
scattered all over the storage space. This calls for garbage collection, which
is a technique for recycling the unused pieces of a fragmented storage. Such
techniques are regularly used in LISP interpreters and in many other ap-
plications. As a result, many efficient garbage collection techniques have
been developed over the years.
The design of our reduction machine is based on graph-reduction,
which was originally developed for the λ-calculus by Wadsworth [Wads71].
It has been adapted to combinator reduction by Turner and others, and it
seems to be the best technique for implementing lazy evaluation. The in-
struction set of our machine will consist of a set of elementary graph-
transformations which correspond to the reduction rules defining the
semantics of the language. This means that a direct relationship exists be-
tween the formal semantics of the language and its 'hardware
implementation'. The 'machine code' language of the reduction machine is,
in fact, a high-level programming language. The translation of a source

program (λ-expression) to its 'machine code' version is nothing but a
translation from its string form to its graph-representation. But, that is
obviously a one-to-one mapping, which can be easily inverted. Hence, the
original source program can be easily retrieved from its 'machine code'
translation, which is hardly the case with conventional machines.
The graph-representation of a λ-expression will be built by a simple
predictive parser during the input phase. The overall structure of this graph
is similar to the parse tree of the input expression. It may contain the fol-
lowing types of nodes:

abstraction (λx)
application (:)
infix list-constructor (,)
list terminator or empty list ([])
numeric value (#)
variable (x, y, z, etc ...)
combinator (Y, true, or false)
operator (^, ~, &, +, -, etc ...)
indirection (@)
renaming prefix ({z/x})
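One possible way to picture such a node store is the following Haskell data
type; the encoding is purely illustrative and is not the machine's actual
representation:

    -- a sketch of the node types; NodeRef is an index into the node store
    type NodeRef = Int

    data Node
      = Abs String NodeRef         -- abstraction: bound variable and body
      | App NodeRef NodeRef        -- application: operator and operand
      | Cons NodeRef NodeRef       -- infix list-constructor
      | Nil                        -- list terminator / empty list
      | NumVal Int                 -- numeric value
      | Var String                 -- variable
      | Comb String                -- combinator (Y, true, false)
      | Op String                  -- operator (^, ~, &, +, -, ...)
      | Ind NodeRef                -- indirection
      | Ren String String NodeRef  -- renaming prefix {z/x}
      deriving Show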

Renaming nodes do not occur initially in the graph, but they may be in-
troduced during the reduction process. The type of a node determines the
number of its children. So, for example, an application node has two chil-
dren, an abstraction node has one, and a variable or constant node has
none. The left-child of an application node is the top node of its operator
part while its right-child is the top node of its operand part. So, for exam-
ple, the graph shown in Figure 6.1 represents the Curried addition
((+)A)B.

Figure 6.1

The internal representation of a list [A1, A2, ..., An] will have the form
as shown in Figure 6.2, where the list terminator node is identical with an
empty list.

Figure 6.2

So, for example, the λ-expression ((λx.λy.(x)(y)x)2)[E,F] will be repres-
ented by the graph shown in Figure 6.3.

Figure 6.3

The only major difference between our graph-representation and a
regular parse tree is due to the cyclic representation of recursive defi-
nitions. For instance, the following definition of the factorial function:

let fact = λn.(((zero)n)1)((*)n)(fact)(pred)n

will be represented by a cyclic graph shown in Figure 6.4. The empty tri-
angle underneath the abstraction node λfact represents the scope of the
given definition. The scope may be an arbitrary λ-expression where, due
to this arrangement, any free occurrence of fact will be replaced by the
right-hand side of its definition during the reduction process.

Figure 6.4

Mutual recursion is treated in the same way after being transformed
to immediate recursion as described in Section 4.4. This means that the set
of simultaneous recursion equations

fi = Ei

(where each Ek may contain free occurrences of every fi) will be trans-
formed into a single equation

let F = [E1, ..., En]

with fi replaced everywhere by (i)F (1 ≤ i ≤ n). This equation will then
be represented by a cyclic graph where each occurrence of F on the right-
hand side of the equation is replaced by an indirection node pointing back
to the top node of the list corresponding to the right-hand side. Recursive
definitions of infinite lists like

zeros = ((&)0)zeros

are also represented by cyclic graphs obtained in a similar fashion.
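The effect of such a cyclic graph is exactly what lazy languages achieve
with a 'knot-tying' definition; in Haskell, for instance:

    -- a self-referential list: one cons cell whose tail points back to itself
    zeros :: [Int]
    zeros = 0 : zeros

    -- take 5 zeros yields [0,0,0,0,0] without ever building the whole list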
In order to avoid going in circles indefinitely when traversing a cyclic
graph one must take some preventive measures. This holds also for the
output routine which is used for printing λ-expressions from their graphs.
The output routine, which prints a λ-expression in string format by traversing
its graph in depth-first order, is in fact the inverse of the parser. As a matter
of fact, the same output routine is used for printing the result of a compu-
tation, which is the normal form of the input expression.

6.2 The instructions of a reduction machine

An elementary step of our reduction machine consists of a single applica-
tion of a reduction rule. Therefore, we can say that the given reduction
rules are implemented directly by the architecture of the machine. (The
time required for the execution of an elementary step may vary from rule
to rule, but that is not relevant to our discussion.) Every reduction rule
corresponds to some local changes in the expression graph which will be
performed by the machine.
These local transformations for the α-, β-, and γ-rules are shown in
Figures 6.5, 6.6, and 6.7, respectively. Renaming nodes are created only in
β3 steps and always use a fresh variable generated by the system. Note
that the γ-rules as well as α5 are implemented lazily so that they may be
applied to partially evaluated infinite lists.

Figure 6.5 (graph transformations for rules α1-α5)

Figure 6.6 (graph transformations for rules β1-β4)

Figure 6.7 (graph transformations for rules γ1 and γ2)

It should be emphasized that the contraction of a redex should not
change any of the nodes of a given redex except for its top node, which may
be changed in its own place. Therefore, in many cases, we have to insert new
nodes into the graph to perform the contraction in question. This is neces-
sary because subexpressions may be shared by various parts of the ex-
pression being evaluated. So, for example, if we were to change the
abstraction node λy of an α3 redex to a renaming node {z/x} when con-
tracting that redex then this change might affect some other part of the
expression which happens to contain a pointer (edge) to this node. But that
could change the meaning of the whole expression, which is clearly unde-
sirable. In short, the integrity (meaning) of the graph representation can
be preserved only if no side effects occur.
As can be seen from the graph representation of the β4 rule, the op-
erand part will be shared by the resulting two new applications. But, the
confluent edges resulting from the application of β4, γ1, or any other rules
will never create new cycles. Therefore, if the initial graph contains no cy-
cles then the reduction process will produce only directed acyclic graphs.
Note that the initial graph always has a tree form except possibly for
some cyclic subgraphs representing recursive definitions. Those cycles can

be avoided by using the Y combinator instead. Figure 6.8 shows the
acyclic implementation of the standard Y combinator. The implementation
of controlled reduction discussed in Section 5.2 is fairly similar.

Figure 6.8

Since every recursive definition can be resolved with the aid of the Y
combinator, every λ-expression can be represented and evaluated using
only directed acyclic graphs. So, the question is why should we be con-
cerned with cyclic graphs at all? The answer can be summarized in one
word: efficiency.
Our implementation gives us the opportunity to represent recursion
by either cyclic or acyclic graphs. We have run various experiments with
both. The size of the acyclic graph tends to grow more rapidly during the
evaluation than that of the corresponding cyclic version. Consequently, the
evaluation process is much faster with the cyclic version than with the
acyclic one. Of course, the difference depends on the given example, but
it is so overwhelming in most cases that there can be no doubt about its
significance. This fact has far-reaching consequences with respect to the
parallel implementation to be discussed in the next chapter.
Now, let us see a bit more closely how graph transformation is done
in our reduction machine. First of all, each node of the graph is stored as
a record with four fields:

| CODE | OP1 | OP2 | MARKER |

The CODE determines the type of the node while OP1 and OP2 are usu-
ally pointers (indices) referring to its left-child and right-child, respec-

tively. The MARKER field is only one bit long and it is used exclusively
by the garbage collector.
The particular encoding used for various node types is not important,
as it is quite arbitrary. In our implementation we use, for instance, 1 for the
CODE of an abstraction node, whose OP1 contains the name of the bound
variable while its OP2 points to its only child. The CODE of an application
node is 2, and its OP1 and OP2 are pointers to its children.
The reduction process involves a traversal of the expression graph
while looking for a redex. This will be done in a depth-first manner begin-
ning with the root node of the entire graph. In order to locate a β-redex it
is necessary to find an application node first. The record of an application
node encountered will be stored in a register called N1. Then the record
of its left-child will be stored in N2. If that happens to be an abstraction
node then its left-child will be stored in N3, and the selection of the ap-
propriate β-rule begins with a search for a free occurrence of the bound
variable (OP1 of N2) in the subexpression whose root is in N3. If no such
occurrence is found then we have a β2 redex. Otherwise, the CODE of
N3 will decide whether we have a β1, β3, or β4 redex. In each of these
cases the graph will be changed accordingly and the search for the next
redex continues. However, if the node in N3 is neither a variable, nor an
abstraction, nor an application then we have no β-redex here, and we should
look for another redex.
For finding an α-redex only two nodes are to be checked. Each time a
renaming node is found during the traversal of the graph, it will be stored
in N1. Then its right-child will be stored in N2, while N3 remains idle.
The same holds for the two γ-rules.
A nice feature of all these rules is that we can recognize their patterns
by looking only at two or three nodes of the entire expression. The rest of
the expression will have no influence on the type of the redex in question.
The only exception is represented by the β2 rule, whose applicability de-
pends on whether or not the bound variable occurs free inside the
operator part of the redex. The search for a free occurrence of a variable
in a subexpression is clearly not an elementary operation. It may be
treated, however, as a preliminary test, because it does not change the
graph at all.
This brief description of the operation of the machine must be suffi-
cient for certain observations. First of all, it is easy to see that the instruc-

tion set of the machine is indeed isomorphic with a set of reduction rules.
Also, it must be clear that these instructions can be easily simulated on a
conventional computer. An unusual feature of these instructions is, per-
haps, their synthetic nature, since they are assembled from different nodes,
i.e. from different parts of the main storage. Aside from that, the graph can
be interpreted as a structured set (as opposed to a sequence) of in-
structions and thus, it represents indeed a program for computing the re-
sult. This program, however, will change significantly during its execution
and it eventually develops into its result. This makes reduction machines
entirely different from the more conventional fixed program machines.
The operation of the reduction machine is controlled by the contents
of three registers, N1, N2, and N3. The main purpose of these registers is
simply pattern matching with the left-hand sides of the rules. Fortunately,
the left-hand sides of our rules have very simple patterns, which makes them
relatively easy to match with the appropriate portion of the graph.

6.3 Implementation of primitive functions

In the previous section we have seen the graph transformation rules asso-
ciated with the α-, β-, and γ-rules. The implementation of the other re-
duction rules follows the same approach. Their patterns have been designed
in such a way that they can be easily recognized by checking only a few
adjacent nodes in the graph.
Each of our primitive functions represents either a unary or a binary
operation. The binary ones are always Curried, except for the infix list-
constructor. If we did not restrict ourselves to a maximum of two arguments
then the patterns to be recognized by the machine would be more complex.
We think that decomposing multi-argument functions into simpler ones is
better than using a more complicated pattern matching procedure. For
instance, we can implement the S combinator in two steps as follows.

((S)A)B → (S′)[A,B]

and

((S′)[A,B])C → ((A)C)(B)C

This way we do not need more registers, and the extra time spent on the
intermediate transformation will be compensated by the overall simplicity
of the pattern matching operation.
Before the application of a primitive function its operand(s) may have
to be evaluated or at least examined to some extent. As we have discussed
in the previous section, whenever an application node appears in register
N1, its left-child will be stored in N2. If that is a unary function then the
top node of its operand is obviously the right-child of N1, which will be
stored in a separate register called N4.
Consider now, for example, the implementation of the ^ operator as
shown in Figure 6.9. If N4 is an infix list-constructor then we can use its
OP1 (the pointer to its left-child) for making the necessary changes in the
graph; otherwise, the ^ operation cannot be performed at this point, but it
may become executable later if and when the operand gets reduced to an
expression which begins as a list. All the other unary primitive functions,
including the projections, are treated in a similar fashion, which means that
their arguments will be analyzed only to the necessary degree.

Figure 6.9

The patterns of the binary operations are similar to the Curried addi-
tion as shown in Figure 6.1. This means that both N1 and N2 must contain
application nodes when the binary operator appears in N3. Hence, the
right-child of N2 is the top (root) of the first operand while the right-child
of N1 is the top of the second operand. If the operator in N3 is either an
arithmetic or a relational operator and the operands are numbers then the
operation is performed and the result is stored in N1. In other words, the

given redex will collapse to a single node holding the numeric value of the
result. (The address of the node stored in N1, i.e. the top node of the redex,
is kept, of course, in a separate register.) If the operands are not numbers
then the execution of the arithmetic operation must be postponed until the
operands are reduced to numbers.
The arithmetic and the relational operators cannot be computed lazily
(without fully evaluating their arguments), because they are strict. So are
the boolean operators, as well as the predicates, except for null, which is
semi-strict. Sometimes, the latter can produce an answer just by looking
at the beginning of its operand without evaluating it.
Most of our list manipulating operators are implemented as semi-strict
functions that can be applied to partially evaluated lists. In order for the
function map (apply to all) to have the same opportunity it has been im-
plemented lazily as shown in Figure 6.10. This means that the map opera-
tion will be decomposed into a sequence of its partial applications. The
same is true for our implementation of the append function.

Figure 6.10

The & operator, i.e. the Curried list-constructor, represents an inter-
esting special case. It is implemented as a completely non-strict function
(see Figure 6.11) which does not evaluate its arguments and does not even
care about the type of its second operand. Hence, its second operand may
or may not be a list. (If it is not, it still may reduce to one later.)

Figure 6.11

The combinators true, false, and Y are treated as absolutely non-strict
operations. They need not evaluate their arguments at all and they can be
applied to arbitrary λ-expressions, which makes their implementation quite
simple. Observe the fact that we do not make any distinction between the
truth values and the corresponding combinators. The same is true for the
integers and the selector functions. Fortunately, the Church-Rosser prop-
erty is preserved under this interpretation, hence no ambiguity may occur.
The presence of strict as well as non-strict functions has a significant
impact on the evaluation strategy used by the Reduction Machine. In a
sense, it has to deal with a worst case scenario, because it can never tell
when to stop the partial evaluation of the argument and go back to try
again to apply the function. The benefit of having only strict functions is
obvious. Strict functions can be evaluated in applicative order where the
argument is always computed before the application of the function. If the
computation of an argument does not terminate then the value of the
function is considered undefined.
Non-strict functions, on the other hand, may return meaningful results
even if some or all of their arguments are undefined. In our situation, we
can have arbitrarily nested applications of strict, semi-strict, and non-strict
functions. A partial computation of an argument that may satisfy one
function may not satisfy another. Moreover, the value of a strict function
occurring in the argument of a non-strict one may not be needed at all for
the result. Therefore, the strictness of a function by itself does not deter-
mine whether or not its arguments should be evaluated. These questions
will be discussed in more detail in the next section.

6.4 Demand-driven evaluation

According to the Church-Rosser theorem the redexes occurring in a
λ-expression can be contracted in any order as long as the reduction se-
quence terminates. The problem is that termination in general can be de-
termined only after the fact. Then, how can we possibly find a terminating
reduction sequence when some of them are terminating and others are not?
Is it just sheer luck when we hit upon one, or is there some better way of
finding out whether such a sequence exists at all? The answer to this
question is the following.
Definition 6.1 (Normal Order Reduction) A sequence of reduction
steps is called normal order reduction if each of its steps involves
the contraction of the leftmost redex of the expression being re-
duced. (The leftmost redex of an expression is the one whose first
symbol precedes the first symbol of any other redex occurring in
the expression. Overlapping redexes cannot share the same first
symbol.)
Normal order reduction is also called leftmost or outside-in reduction. The
latter term refers to the fact that in normal order reduction every redex is
contracted before any of those which are properly contained in it.
Theorem 6.1 (Standardization Theorem) If a ?\-expression has a
normal form, then its normal order reduction terminates.
We do not prove this theorem, because it is rather complicated (although
not as much as the proof of the Church-Rosser theorem). The interested
reader can find a proof in [Bare81]. All we need to know is that normal
order reduction is safe, as it produces the normal form whenever such exists.
So, we do not have to worry about all possible reduction sequences, which
is certainly a big relief. Unfortunately, we still cannot tell beforehand
whether or not the normal order reduction terminates. What we know for
sure is that if it does not terminate then no other reduction sequence can
possibly do so.
A word of caution is needed here. The proof of the standardization
theorem depends on the properties of the rules used for reduction. By
changing any of the rules or adding new rules to the system the validity of
the theorem may be destroyed. As a matter of fact, the standardization
theorem in its original form is not true in our system. This is due to the fact

that our β3 rule is more like an intermediate step (a preparation for con-
traction) rather than a contraction by itself. Consider the following
λ-expression

((λx.λy.E)P)Q

where E, P, and Q are arbitrary λ-expressions each containing free occur-
rences of x and y. Contracting the leftmost redex yields

(λz.(λx.{z/y}E)P)Q

by β3. Now, the contraction of the leftmost redex gives us

((λz.λx.{z/y}E)Q)(λz.P)Q

Hence, we get

(λv.(λz.{v/x}{z/y}E)Q)(λz.P)Q

and again

((λv.λz.{v/x}{z/y}E)(λz.P)Q)(λv.Q)(λz.P)Q
This shows that strictly normal order reduction cannot work here. Actu-
ally, it will run indefinitely without making any progress. Fortunately, it is
easy to fix this problem. All we have to do is to remember that after each
β3 reduction the next redex to work with must be its 'trace', that is, the one
that follows the newly created abstraction prefix λz. This slightly modified
version of normal order reduction will avoid the above trap, as can be easily
verified by the reader. For the sake of simplicity we shall use the term
normal order to refer to this slightly modified version.
Normal order reduction can also be compared with the so called
demand-driven (or call by need) evaluation strategy which is usually defined
by the property that the argument(s) of a function are not computed until
their value becomes necessary for the computation of the given function.
This means that a 'function call' would not automatically trigger the eval-
uation of the argument(s). An argument is evaluated during the computa-
tion of the function (execution of the body) if and only if its value is
actually needed. Take, for instance, the following program:
let iterate = λf.λx.((&)x)((iterate)f)(f)x

let oddlist = ((iterate)(+)2)1

eval (^)oddlist

This program should produce 1 as a result, which is the first member of
oddlist. This, however, cannot be computed in applicative order, because
the evaluation of oddlist never terminates. (It generates the infinite list of
odd numbers.) In normal order, however, the redex consisting of the ap-
plication of the ^ function to oddlist takes precedence over any other redex
properly contained in it, which keeps the evaluation of oddlist under con-
trol. Similarly, the selector function k 'demands' only the k-th element of
oddlist and does not need the rest.
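The same program can be written in Haskell, whose lazy evaluation plays
the role of normal order reduction here (iterate is in the standard Prelude):

    oddlist :: [Int]
    oddlist = iterate (+ 2) 1   -- the infinite list 1, 3, 5, ...

    firstOdd :: Int
    firstOdd = head oddlist     -- evaluates to 1; the tail is never demanded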
Lazy evaluation represents a refinement of the demand-driven ap-
proach in the sense that even those arguments which are actually used for
the computation of a function will not be fully evaluated when a partial
evaluation of their value is already sufficient for the computation at hand.
Now, it should be clear that normal order reduction in our system is
equivalent to a demand-driven evaluation technique. Moreover, by prop-
erly adjusting our reduction rules, we have made it equivalent to lazy
evaluation.
The next question is whether normal order reduction is suitable for a
practical implementation. Take, for example, the following expression:

(λx.((+)x)(pred)x)((*)5)(succ)3

The entire expression is a β4 redex and that is, of course, the leftmost redex
here. Its contraction gives us

((λx.(+)x)((*)5)(succ)3)(λx.(pred)x)((*)5)(succ)3

which further reduces to

((+)((*)5)(succ)3)(λx.(pred)x)((*)5)(succ)3

and then to

((+)20)(λx.(pred)x)((*)5)(succ)3

when the normal order is followed. This shows that in normal order re-
duction the subexpression

((*)5)(succ)3
will be copied in its original form and thus, it seems, it will be evaluated
twice. On the other hand, due to the Church-Rosser theorem, it can also
be evaluated before the copying occurs so that only its result will be copied.
That is why applicative order evaluation is considered more efficient. It is
similar to the call by value technique of passing parameters to subroutines,
while normal order reduction is comparable with the call by name tech-
nique.
Notice, however, that copying in graph representation is usually per-
formed by setting pointers to the same copy. At the same time, the first
evaluation of a shared subexpression makes its value available to all of its
occurrences. So, it seems that normal order graph reduction may represent
the combination of the best features of both worlds. It is safe, as far as
termination is concerned, and it can avoid some obviously redundant
computations.
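The difference is easy to demonstrate in a language with sharing. The
following hypothetical Haskell fragment (not from the reduction machine
itself) contrasts the two behaviors:

    -- call by name copies the argument text, so the operand is computed twice:
    --   double (5 * succ 3)  ~>  (5 * succ 3) + (5 * succ 3)
    double :: Integer -> Integer
    double x = x + x

    -- graph reduction makes both occurrences pointers to one shared node;
    -- the first evaluation updates the node and the second reuses the result
    shared :: Integer
    shared = let q = 5 * succ 3 in q + q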
There have been some studies about the relative efficiencies of various
implementation techniques for applicative languages, but there are no clear
winners. This should not be surprising at all, if we consider the generality
of the problem. We are dealing with the efficiency of the process of eval-
uating arbitrary partial recursive functions. Standard complexity theory is
clearly not applicable to such a broad class of computable functions. The
time-complexity of such a universal procedure cannot be bounded by any
computable function of the size of the input. Even if we restrict ourselves
to a certain subclass of general recursive, i.e. computable functions, say,
the class of deterministic polynomial time computable functions, the the-
oretical tools of complexity theory do not seem to help. Complexity theory
is concerned with the inherent difficulty of the problems (or classes of
problems) rather than the overall performance of some particular model
of a universal computing device.
A precise analytical comparison of different function evaluation tech-
niques is extremely difficult. A more practical approach is to apply some
statistical sampling techniques, as is usually done in the performance anal-
ysis of hardware systems.
Now, let us go back to the implementation of normal order graph re-
duction in our reduction machine. As we mentioned earlier, the search for
the leftmost redex corresponds to a depth first search in the expression
graph. Normal order reduction, however, cannot be done in a strictly left
to right manner, because the contraction of the leftmost redex may create
a new redex extending more to the left than the current one. A simple ex-
ample is the following:

((λx.x)λy.(λz.(z)y)f)a


The leftmost redex of this expression is
(λx.x)λy.(λz.(z)y)f
whose contraction yields
λy.(λz.(z)y)f
which contains a β4 redex. However, the leftmost redex is now the entire
expression
(λy.(λz.(z)y)f)a
This means that after each contraction the search for the new leftmost
redex must be extended to the left of the one that has just been contracted.
But it need not be started again from the root. Instead, it can be done in
a reverse scan going backwards from the position of the last contraction.
This back-tracking can be implemented by a pushdown stack which keeps
track of the pointers to the application nodes that have been left behind
while traversing the graph in search of the leftmost redex.
It should be emphasized that we do not have to remember the entire
path from the root to the current position, because not every node can be
the top node of a redex. Therefore, only the application nodes will be pushed
on the stack and we do not care about the other nodes occurring in between
along the path. After each contraction we pop the stack just once and need
not go back any further. Note that this is quite different from the usual
pointer reversal technique. The latter introduces a side effect on the internal
representation of the graph which may cause problems for parallel evalu-
ation techniques.
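The search itself can be pictured as follows. This is a schematic Haskell
sketch under our own simplifications (real node records carry more fields,
and the stack holds pointers rather than nodes):

    data Node = Ap Node Node     -- application node
              | Lam String Node  -- abstraction node
              | Var String

    -- descend the left spine, pushing only the application nodes passed on
    -- the way; the stack is exactly the back-tracking record described above
    unwind :: Node -> [Node] -> (Node, [Node])
    unwind n@(Ap f _) stack = unwind f (n : stack)
    unwind n          stack = (n, stack)
    -- after a contraction the search pops the stack once and resumes there, so
    -- a redex created just to the left is found without restarting at the root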
The only other nodes that may occasionally appear on the stack are the
renaming nodes resulting from β3 reduction steps. They will be processed
as soon as they appear unless they have to be distributed over two different
parts of a larger expression as specified by α4 and α5, in which case the
new renaming node on the left is processed first while the one on the right
is pushed onto the stack. As long as any renaming nodes remain on the
stack, only α rules can be applied, which means that every renaming will
be finished (no renaming nodes left) before another contraction begins.
Again, we have departed here slightly from a purely normal order reduction
strategy by assigning the α rules the highest priority. We feel that this
makes the evaluation faster, but a strictly normal order is also feasible in
this case.
Strict and non-strict functions are treated alike. If an argument does
not have the proper form (type) then its evaluation gets started. But, after
each reduction step during the evaluation of the argument an attempt is
made at the application of the function to the partially evaluated argument.
Thus, the argument will be evaluated only to the extent that is absolutely
necessary for the application of the given function.
Observe that this behavior of the reduction machine is a direct
result of the normal order reduction strategy; no special tricks like
suspensions are needed. This uniformly lazy evaluation strategy may
cause a significant loss in efficiency when computing strict functions.
Improvements can be achieved by strictness analysis and some other tricks
which we do not discuss here. The whole issue of strictness vs. laziness
appears in a different light when the sequential model of computation is
replaced by parallel processing.
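To see what strictness analysis looks at, consider a tiny hand example (a
hypothetical Haskell illustration, not taken from the book):

    f :: Integer -> Integer -> Integer
    f x y = if x > 0 then x else x + y
    -- f is strict in x: its value is always needed to choose the branch.
    -- f is strict in y only on the branch where x <= 0, so forcing y
    -- eagerly would be speculative, while forcing x eagerly is safe.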

Exercises
6.1 Design an LL(1) parser for λ-expressions based on their syntax given
in Section 5.1. Supplement this parser by 'semantic actions' to produce the
graph-representation of the input expression.
6.2 Design an output routine to print a λ-expression in string format when
its graph is given as a directed acyclic graph.
6.3 Design graph-reduction rules for a direct implementation of each of the
following combinators (functions):
(a) append as defined in Section 4.3
(b) sum as defined in Section 4.3
(c) iterate as defined in Section 4.5
(d) curry as defined in Exercise 4.2
(e) Insert as defined in Section 5.3
(f) Y' and Y defined by the reduction rules

(Y')E → (E)(Y)E for all E

(λx.Y)Q → Y' for all x and Q



Note that the β2'-rule must be restricted in this case to


(β2') (λx.A)Q → A
where A is an atom (i.e. variable or constant) with A ≢ x.
Implement also the reduction rules for computing with infinite lists in terms
of Y' and Y.
6.4 Design an algorithm for strictness analysis based on the strictness of
some of the primitive functions. (How can one decide whether or not a
user-defined function is strict?)
CHAPTER SEVEN

TOWARDS A PARALLEL GRAPH-REDUCTION

7.1 Harnessing the implicit parallelism

Normal order reduction as described in the previous chapter is basically a
sequential process. It proceeds by always contracting the leftmost redex
of a given expression until no more redexes are found. The question arises
whether the speed of the reduction process can be increased by contracting
more than one redex at the same time. It seems natural that
non-overlapping redexes can be contracted in parallel, rather than one by
one, without any difficulty. According to the Church-Rosser theorem, the end
result does not depend on the order of contractions. So, there is no ap-
parent reason why we should not be able to contract more than one redex
simultaneously, provided that these contractions do not interfere with one
another.
The Church-Rosser property is one of the most important features of
λ-calculus, and it turns out to be instrumental in the design of our parallel
evaluation strategy. It clearly implies that the operation of contracting a
redex cannot have side effects which would somehow influence the out-
come of subsequent contractions to obtain a different normal form.
The opportunity for a simultaneous contraction of several redexes de-
pends on the structure of the expression. This kind of parallelism is called
implicit parallelism as it is determined implicitly by the overall structure of
the expression. This is in sharp contrast with the explicit parallelism con-
trolled by the programmer via specific language constructs. Explicit
parallelism is based on the assumption that the programmer has a conscious
control over the events occurring simultaneously during the execution of
the program. This explicit control of parallelism may become extremely
difficult when the number of concurrent events gets very large. A con-
scious control of hundreds or even thousands of parallel processes could
place a tremendous burden on the programmer's shoulders. On the other
hand, it has been suggested by many experts that the implicit parallelism
of functional languages may offer a viable alternative to the programmer
controlled explicit parallelism used in imperative languages like Concurrent
Pascal or Ada.
The graph representation of λ-expressions described in the previous
chapter makes their structure more visible, which helps to determine the
interdependence of their subexpressions. Also, when searching for a redex,
we need to locate only a few of its nodes that are characteristic for the
redex in question. These characteristic nodes can be easily distinguished
from the rest of the graph and thus, even nested redexes can be contracted
simultaneously, provided that they have disjoint sets of characteristic
nodes.
The design of our parallel graph reduction strategy is based on a
multiprocessor model with the following assumptions:
(1) We assume that we have a shared memory multiprocessor sys-
tem where each processor can read and write in the shared mem-
ory.
(2) One of the processors will be designated as the master while
the others are called subordinate processors.
(3) Initially the graph representation of the input expression will
be placed in the shared memory. Then the master will start re-
ducing it in normal order.
(4) Whenever the master determines that a subexpression should
be reduced in parallel with the normal order then it will place that
subexpression in a work pool.
(5) The subordinate processors will send requests to the work
pool for subexpressions to be reduced. When a subordinate
processor is given a subexpression it will reduce it in normal order.
The most important feature of this model is the existence of a common
storage device that can be accessed by each processor. The graph of the
entire λ-expression is stored in this shared memory while it is being re-
duced concurrently by a number of processors. The control of parallelism
in this system can be done in many ways. The use of a master processor
and several subordinates is not necessarily optimal but this model is rela-
tively simple and it can solve a number of important problems. (This or-
ganization has been suggested by Friedman and Wise in [Frie78].)
First of all, the termination of a computation becomes simple in this
case. The master can easily determine whether the normal form is reached,
because it works in normal order. Moreover, since parallel processing can
be initiated only by the master, the amount of parallelism will be limited to
a reasonable size. This may sound strange at first, because one might think
that one should try to obtain as much parallelism as possible. However, the
fact is that there is usually much more implicit parallelism in a large ex-
pression than we can handle, and the problem is not how to find it but how
to control it. The contraction of a β4-redex, for example, always produces
two new redexes which can be processed in parallel. The question is
whether the amount of work to be done in parallel is large enough to justify
the overhead of initiating a new subtask.
Controlling the parallelism is a very difficult task in general. Normal
order reduction makes it possible for the master to delegate work to its
subordinate processors in a more or less demand-driven fashion. The se-
lection of subexpressions to be sent to the work pool can be done in two
different ways:
(a) Only when it is certain that the evaluation of the subexpression
is needed for computing the result. (Conservative Parallelism.)
(b) Whenever it seems possible that the evaluation of the subex-
pression is useful for computing the result. (Speculative
Parallelism.)
Conservative parallelism can be used efficiently in conjunction with
strictness analysis. If a function is known to be strict in some of its argu-
ments then those arguments can be computed in parallel without running
the risk of doing useless work. The operation of addition is, for instance,
strict in both of its arguments. Thus, the evaluation of an expression like
((+)P)Q can be done in such a way that the two expressions, P and Q, are
computed in parallel.
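For instance, the following minimal sketch uses the par and pseq
combinators of the Haskell 'parallel' package; the machine model is
different from ours, but the idea of sparking an argument that is known to
be needed is the same:

    import Control.Parallel (par, pseq)

    -- addition is strict in both arguments, so both may safely be
    -- evaluated in parallel: spark p, evaluate q, then combine
    parAdd :: Integer -> Integer -> Integer
    parAdd p q = p `par` (q `pseq` (p + q))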
In the case of a nonstrict function, some of the arguments are not al-
ways needed for the computation of the function value. But that may de-
pend on the value of the other arguments, which makes it impossible to tell
in advance which of the arguments should be evaluated and which should
not. (Take, for example, multiplication as a function that is nonstrict in both
of its arguments, meaning that either argument may be undefined when the
other evaluates to zero.) Therefore, one can only speculate on the possible
need for evaluating those arguments before actually doing it.
The time spent on a speculative computation may turn out, only after
the fact, to have been wasted. In order to minimize the time and space
wasted on speculative computations, they have to be controlled very care-
fully. The point is that a strictly demand-driven evaluation strategy is in-
herently sequential and thus, it is very limited as far as parallel
computations are concerned. Speculative computations, on the other hand,
are risky, so they must be kept under control in order to avoid excessive
waste of time and/or space.

7.2 On-the-fly garbage collection

Graphs representing λ-expressions are stored as linked data structures in
the shared memory. They are changed during the reduction process by
adding new nodes to the graph and discarding others. Discarded nodes will
be considered garbage, and the storage management technique for a dy-
namic recycling of the unused storage space is called garbage collection. A
garbage collector usually has two phases: (1) the 'marking phase' to iden-
tify the active nodes of the graph as opposed to the garbage nodes, (2) the
'collecting phase' to return the space occupied by the garbage nodes to the
free space.
In a uniprocessor environment the task of collecting the garbage is
usually delayed so that the computation may proceed uninterrupted until
the entire free space is consumed. Then the computation is suspended and
the garbage collector is executed. Thus, the garbage collector will be exe-
cuted periodically, but only when it is necessary for the remaining compu-
tation. Such a 'stop and go' technique is quite reasonable when we have
only one processor at hand.
The same technique can also be used with several processors. This
means that each processor would perform graph reduction concurrently
with the others until the free space is consumed. At that point they all
switch over to garbage collection and then the whole process is repeated.
The main advantage of this approach is that the graph will be frozen during
the marking phase. The only problem is that the processors must switch
simultaneously from the computing stage to the garbage collecting stage
and vice versa, and that may involve a great deal of synchronization over-
head.
Therefore, in a multiprocessor system it seems better to collect the
garbage 'on-the-fly', i.e. concurrently with reducing the graph. Some of
the processors can be dedicated to do garbage collection all the time while
others are reducing the graph. This approach will largely reduce the over-
head of task switching and global synchronization but marking an ever
changing graph is a much more difficult task than doing the same with a
static graph. (The graph behaves as a moving target for the marking
phase.) It is, in fact, impossible to mark precisely the graph when it keeps
changing all the time.
Fortunately, as already noted by Dijkstra et al. in [Dijk78], a precise
marking is not absolutely necessary for garbage collection. It is enough to
guarantee that all the active nodes get marked during the marking phase,
but it is not necessary that all garbage nodes be unmarked when the col-
lecting phase begins. In other words, it is sufficient to mark a 'cover' of the
graph in order to make sure that no active nodes are collected during the
collecting phase. Some of the garbage nodes may remain uncollected in
each collecting phase provided that they will be collected at some later
stage. To put it differently, every garbage node can have a finite 'latency'
period after being discarded and before getting collected.
This last observation was the key to the design of a new 'one-level
marking algorithm' due to Peter Revesz [RevP85]. His one-level garbage
collector works very well for directed acyclic graphs but it cannot collect
cyclic garbage. Unfortunately, as we mentioned before, cyclic graphs are
more efficient for representing recursive definitions than acyclic ones.
Therefore, we have decided to use directed cyclic graphs for representing
λ-expressions involving recursion and look for a more sophisticated gar-
bage collection technique for dealing with cyclic garbage. Cyclic garbage
is obviously much more difficult to find, because each node occurring in a
'cyclic garbage structure' has at least one parent (nonzero 'reference
count').
There are many on-the-fly garbage collection techniques available in
the literature that work for cyclic graphs. (See our bibliographical notes.)
It seems, however, that cyclic garbage structures do not occur very often
in our graph reducer, that is, the typical garbage structure tends to be
acyclic in our case. So, we have decided to combine the technique devel-
oped for acyclic graphs by Peter Revesz with a more elaborate technique
that can handle cyclic garbage. The algorithm developed by Dijkstra et al.
[Dijk78] appears to be the most convenient for our purpose.
Consider first the one-level garbage collector that works for directed
acyclic graphs. This algorithm requires only one scan of the graph memory
to find a 'cover' of the active graph.
Assume that the node space (graph memory) consists of an array of node
records which are indexed from 1 to N. Each node record has a one bit field,
called marker, that is used exclusively by the garbage collector. At any
point in time there are three kinds of nodes in this array: (1) reachable
nodes representing the active graph, (2) available nodes in the free list, and
(3) garbage nodes.
The free list is a linked list of node records that is treated as a double
ended queue. The 'root' node of the active graph, as well as the 'head' and
the 'last' of the free list, must be known to the garbage collector. Initially
the marker of each node is set to zero. Marking a node means setting its
marker to one. Collecting a garbage node means appending it to the end
of the free list as its new 'last' element.
The marking phase of the garbage collector starts by marking the
root node of the graph and the head of the free list. Then it scans
the node space once from node[1] to node[N], meanwhile marking
the children of every node.
This means that every node having at least one parent will be marked, re-
gardless of the marking of its parent(s). Thus, all reachable nodes as well
as the free nodes will be marked, i.e. included in the cover. Garbage nodes
having at least one parent will also be marked. Note, however, that if there
is any acyclic garbage then it must have at least one node without a parent.
The collecting phase scans the entire node space once, and collects
the unmarked nodes while resetting the marker of every node to
zero.
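In outline, the single marking scan might be coded as follows. This is a
Haskell sketch with an assumed node-record layout; the free-list plumbing
and the collecting sweep are omitted:

    import Data.IORef (IORef, readIORef, writeIORef)
    import Data.Array.IO (IOArray, getBounds, readArray)

    data NodeRec = NodeRec { children :: IORef [Int]  -- indices of child nodes
                           , marked   :: IORef Bool } -- the one-bit marker field

    -- mark the root and the head of the free list, then scan node[1..N] once,
    -- marking the children of every node: anything with a parent gets marked
    markingPhase :: IOArray Int NodeRec -> Int -> Int -> IO ()
    markingPhase nodes root freeHead = do
      (lo, hi) <- getBounds nodes
      setMark nodes root
      setMark nodes freeHead
      mapM_ (\i -> do n  <- readArray nodes i
                      cs <- readIORef (children n)
                      mapM_ (setMark nodes) cs)
            [lo .. hi]

    setMark :: IOArray Int NodeRec -> Int -> IO ()
    setMark nodes i = readArray nodes i >>= \n -> writeIORef (marked n) True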
As we said before, a totally orphaned node (first level garbage) will be left
unmarked during the marking phase. Hence, it will be collected imme-
diately during the following collecting phase. It may, however, have many
descendants which are latent garbage at that point. When the collector
collects the orphans then their children become orphans (except for those
having other parent(s), as well), and this will be repeated until the entire
garbage structure is collected. This means that no latent garbage can be
lost forever but the length of the latency period depends on the depth of
the garbage structure itself.
It is also possible that more than one processor is dedicated to the task
of garbage collection in which case the node space will be subdivided into
equal intervals each of which is being scanned by one of those processors.
The two phases (marking and collecting) of these parallel garbage collec-
tors must be synchronized in this case.
Remember that the garbage collector works in parallel with the graph
reducers. This means that nodes may be discarded concurrently with the
execution of the marking phase as well as of the collecting phase. If a node
becomes an orphan during the marking phase after one of its previous
parents has already been scanned, then it obviously remains marked
through the end of the marking phase. Its marker will be reset to zero only
during the collecting phase that follows, but it will not be collected at that
time. During the next marking phase, however, it will obviously remain
unmarked, hence, it will be collected afterwards. In other words, all first
level garbage that exists at the beginning of a marking phase will be col-
lected during the immediately following collecting phase. This means that
all nodes of an acyclic garbage structure will become orphans sooner or
later and thus, they all will be collected eventually by this method.
The insertion of new nodes into the graph while the garbage collector
is working is another matter. The expression graph and the free list contain
all nodes that are reachable either from the root of the expression graph
or from the head of the free list. So, the free list can be treated as part of the
active graph. This means that the free nodes are also considered reachable,
hence, removing a node from the free list and attaching it to the expression
graph is the same as removing an edge from the graph and inserting another
edge between two reachable nodes. The reduction rules can also be de-
composed into such elementary steps that the whole process of graph re-
duction consists of a series of elementary steps each being either (i)
removing an edge or (ii) inserting a new edge between two reachable
nodes. The question is how these elementary transformations affect the
on-the-fly garbage collector.
First of all, the insertion of a new edge between two reachable nodes
does not change the reachability of any of the nodes. On the other hand,
the removal of an edge may create an orphan. If that node is being dis-
carded at that point, then we have no problem. But, if the same node is also
the target node of a new edge inserted in the process then strange things
can happen. The problem is dealt with in the paper [Dijk78], and it can be
illustrated by the example shown in Figure 7.1.

Figure 7.1 (in the figure, node A points to nodes B and C, and node C points to node D)

Here we assume that the edge from B to D is inserted and the edge from
C to D is removed by the reducer. Assume further that the marking algo-
rithm scans these nodes in alphabetic order. Scanning A results in marking
both B and C. But, if B is scanned before the edge from B to D is inserted,
and C is scanned after the edge from C to D is removed then D will never
be marked.
Note that the order in which the insertion and the removal of these
edges occur does not matter, because both can happen during the time
between scanning B and scanning C. Node D may thus be missed by the
marking algorithm even if it has never been disconnected from the graph.
In order to correct this situation we have to place the following demands
on the reducer(s):
(a) New edges can be inserted by the reducer only between already
reachable nodes.
(b) The target node of a new edge must be marked by the reducer as
part of the uninterruptable (atomic) operation of inserting the edge.
The first requirement prevents any reachable node from being 'temporarily
disconnected'. The second requirement prevents the target node of a new
edge from being collected during the first collecting phase that gets to this
node after the new edge has been inserted. (The target node must survive
until the beginning of the next marking phase in order to be safe.)
These requirements can be easily satisfied by the implementation of
the graph-reduction rules. (The order of inserting new edges while remov-
ing others must be chosen carefully.)
The only problem with this technique is that it cannot collect cyclic
garbage. For that purpose we could use some other technique as an emer-
gency procedure only when necessary. But, a closer analysis of the situ-
ation has shown us that the one-level garbage collector can be combined
with others in a more efficient manner. The one-level garbage collector
places the same demands on the reducer(s) as does the so called DMLSS
technique described in [Dijk78]. Therefore, the latter seems to be a natural
choice for such a combination. Furthermore, the basic loop of the marking
phase in each technique consists of a single scan through the node space.
During this scan the children of the nodes are marked. So, the basic loops
of the two marking algorithms can be merged, killing two birds with one
stone [RevP85].
To show how this works, let us summarize the DMLSS technique. Its
marking phase makes use of three different markings, say colors, white,
yellow, and red. Initially all nodes are white. The marking phase begins with
coloring the root of the graph and the head of the free list yellow. Then the
marking phase would try to pass the yellow color of a parent to its children
and, at the same time, change the parent's color to red. The purpose of this
is to color all reachable nodes red while using yellow only as an intermedi-
ate color representing the boundary between the red and the white
portions of the graph in the course of propagating the red color to the en-
tire graph. The marking process is finished when the yellow color disap-
pears. At that point all reachable nodes are red.
The converse of the last statement is obviously false, because there
may exist red nodes that became garbage during the marking phase after
they were colored red. This means that the marking is not precise, but that
is not a problem as long as it 'covers' the graph. For a detailed proof of the
correctness of the algorithm we refer to the original paper [Dijk78] or to
[BenA84].
The reducer(s) must color the target node of a new edge yellow as part
of the atomic operation of insertion. (See demand (b) above.) More pre-
cisely, it should be colored yellow if it is white and it should retain its color
if it is yellow or red. To simplify the description of this operation we use
the notion of shading a node, which means coloring it yellow if it is white,
and leaving its color unchanged if it is yellow or red. So, the basic loop of
the marking phase will have this form:
counter := 0;
FOR i := 1 TO N DO
  IF node[i] is yellow THEN
    BEGIN
      color node[i] red;
      counter := counter + 1;
      shade the children of node[i];
    END
This basic loop will be repeated as long as there are any yellow nodes in the
graph. When no more yellow nodes are left (counter = 0) then the col-
lecting phase is executed as a single sweep through the node space in which
the white nodes are collected and the color of each node is reset to white.
Normally, the marking phase of the DMLSS technique takes several
iterations. Nevertheless, its basic loop can be combined with that of the
one-level marking algorithm by using a three bit marker field for each node,
where the one bit marker for the one-level algorithm and the two bit
marker for the three colors of the DMLSS algorithm will be stored side by
side. Let us use the colors white and blue for marking with the one-level
algorithm. These can be combined with the three colors of the DMLSS al-
gorithm as follows.

white  + white = white
white  + blue  = blue
yellow + white = yellow
yellow + blue  = green
red    + white = red
red    + blue  = purple


Each node can have one of the above six colors. The marking phase of the
combined algorithm begins with coloring the 'root' and the 'head' green.
The marking of the children of a node in the basic loop will be preceded
by choosing the appropriate shade for the operation of shading as defined
below.
(a) Shading a white, yellow, or red node with blue makes it blue,
green, or purple, respectively. A blue, green, or purple node retains
its color when shaded with blue.
(b) Shading a white, blue, yellow, or green node with green makes it
green. Shading a red or purple node with green makes it purple.
Thus, the basic loop of the marking phase of the combined algorithm will
have this form:
counter := 0;
FOR i := 1 TO N DO
  BEGIN
    IF node[i] is yellow THEN
      BEGIN
        color node[i] red;
        counter := counter + 1;
        shade := green;
      END
    ELSE IF node[i] is green THEN
      BEGIN
        color node[i] purple;
        counter := counter + 1;
        shade := green;
      END
    ELSE
      shade := blue;
    shade the children of node[i] with shade;
  END
After each execution of the basic loop the counter is examined to see if the
marking phase of the DMLSS algorithm is finished. If not, then the col-
lection phase of the one-level algorithm is performed. (This time only those
nodes are collected whose 'blue bit' is zero.) Otherwise, the collecting
phase of the DMLSS algorithm is executed, which collects all nodes except
the purple ones.
The combination of these two techniques has some interesting prop-
erties. Consider the case when a yellow node gets discarded by the reducer
and most, or all of its descendants become garbage as a result. The
DMLSS algorithm would not notice this fact during its protracted marking
phase. Therefore, it will color this node red and keep propagating the color
to all of its descendants. The one-level collector, however, will recognize
it as first level garbage in the next iteration of the basic loop and then col-
lect it immediately thereafter.
Now, the combined algorithm will color this node red rather than pur-
ple during the next iteration of the basic loop. (Its 'blue bit' remains zero
because now it is an orphan.) At the same time, its children will be shaded
with green. Then the node itself will be collected during the next collecting
phase of the one-level collector, leaving its children orphans. But, these
children will retain their yellow or red colors after resetting their 'blue bits'.
Therefore, the propagation of the color continues through the entire gar-
bage structure one step ahead of the one-level collector. If the garbage
structure in question has no cycles then it will be collected by the one-level
collector by the time the DMLSS algorithm is finished with its marking
phase. The DMLSS algorithm would need another complete marking phase
in order to collect this garbage structure. Garbage pick-up is more evenly
distributed in time with the one-level collector.
The idea of combining this superficial one-level garbage collector with
a slow but thorough one is due to Peter Revesz [RevP85]. For more details
on this and other garbage collection techniques we refer to the literature.
To conclude this section we have to emphasize that the free list re-
presents an important interface between the garbage collector and the
graph reducer. It is implemented as a shared queue which can be updated
by both the reducer(s) and the collector(s). The reducers are the consumers
and the collectors are the producers of the free list. The contention among these
processes for the shared queue must be handled very carefully in order to
achieve maximum efficiency. For more details on shared queue manage-
ment techniques see [Gott83] or [Hwan84].
7.3 Control of parallelism

As discussed earlier, functional programs present ample opportunity for
implicit parallelism. This implicit parallelism is made, in fact, explicit by the
graph representation. We will show here that the shared memory multi-
processor system discussed in Section 7.1 represents a reasonable model
for parallel graph reduction. So far, we have seen that shared memory
multiprocessor systems can be used efficiently for 'on-the-fly' garbage
collection. Now, we have to discuss the co-ordination of effort by a num-
ber of reducers working concurrently on the same graph.
One of the reducers is called the master, the others are called subordi-
nates. Each reducer works in normal order using its own control stack.
Parallelism can be initiated only by the master by submitting the address
of a subexpression to the work pool. The master is responsible also for the
termination of a computation when normal form is reached.
The control of parallelism is concerned with speeding up the compu-
tation by an efficient use of the available resources. This means that only
the time elapsed between starting and finishing a computation by the
master is critical. The time spent on the computation by any of the subor-
dinate processors must be within that range and does not matter.
The use of a shared memory makes it necessary for the reducers to
protect some of the nodes of the graph from outside interference while
they are working on them. An important feature of our graph reduction
rules is that the nodes of a redex are accessed in read only mode except for
the top node which is updated in place. Therefore, the protection of a redex
during its contraction is relatively easy. The processor working on the
contraction of the redex must have exclusive read/write access to the top
node but it can have simultaneous read only access to the other nodes with
any other processor. This protection mechanism may require a special
'lock' field in each node record, but it also needs hardware support, which
we do not discuss here.
In order to develop an efficient strategy for parallel graph reduction
we have to address three main questions:
(i) How to represent the 'current state' of a parallel computation?
(ii) How to select subexpressions for parallel reduction?
(iii) How to avoid useless computations?
The answer to the first question is relatively simple. The control stack of
the master represents the main thread of the computation while the work
pool represents the pending tasks for possible parallel computations. Each
reducer will have a status bit to tell if it is busy or idle. When busy, it will
also store the address of the expression that is being reduced by it.
Whenever a subordinate processor starts working on a subexpression,
it will insert a special node into the graph in order to alert other processors
which bump into this subexpression while traversing the graph in normal
order. This scheme was devised by Moti Thadani [Thad85] who also de-
veloped the basic version of this control strategy for parallel graph re-
duction. The extra node is called a 'busy signal' node, which contains
information about the processor currently working on the subexpression.
Otherwise, this node is treated as an indirection node.
Now, the question is what happens when two processors are trying to
reduce the same subexpression. Two cases must be distinguished:
(a) The master bumps into a subexpression currently being reduced
by a subordinate processor.
(b) A subordinate processor bumps into a subexpression currently
being reduced by another processor.
In case (a) the master will stop the subordinate processor and take over the
reduction of the subexpression as it is. In case (b) the processor already
working on the subexpression will continue its work and the other
processor will stop, i.e. go back to find some other task from the work pool.
A subordinate processor must be halted also when the master discov-
ers that it performs useless computation. This can happen, for example,
when a /32-redex is contracted which throws away the operand. The busy
signal node is quite helpful in this case, because it holds the identifier of the
processor to be stopped. Note that a subordinate processor cannot initiate
other processes, so it has no offspring to worry about when killing it. Of
course, the busy signal node must be eliminated from the graph when the
corresponding processor stops.
We must observe that subexpressions may be shared and thus, the
subexpression discarded in a β2 step may still be needed later on. Never-
theless, it is better to stop evaluating it after the β2 step, because none of
the effort that may have already been spent on it will be wasted. The
intermediate result in the form of a partially reduced graph is always reusable. On the
other hand, it may involve a very long or perhaps infinite computation,
which should not be continued until a new demand for it occurs.
Consider now the question of initiating parallel processing, that is, se-
lecting subexpressions to be placed in the work pool. The computation of
the value of a function for some argument usually requires the evaluation
of the argument. Therefore, it is reasonable to start the evaluation of the
argument in parallel with the evaluation of the function when the latter is
not simply a primitive function. The master will work on the operator part
while a subordinate processor may work on the operand. Another oppor-
tunity for parallelism arises when a primitive binary operator such as + is
encountered. The master will evaluate the first operand after sending the
second operand to the work pool.
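A toy sketch of this division of labour is given below, with an ordinary
Haskell thread standing in for a subordinate processor and an MVar
standing in for the work pool (the names and the drastic simplifications
are ours):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

    -- the master evaluates the first operand itself after handing the
    -- second operand over to a 'subordinate'
    parOp :: (Integer -> Integer -> Integer)
          -> IO Integer -> IO Integer -> IO Integer
    parOp op evalP evalQ = do
      pool <- newEmptyMVar
      _ <- forkIO (evalQ >>= putMVar pool) -- send second operand to the pool
      p <- evalP                           -- master works on the first operand
      q <- takeMVar pool                   -- collect the subordinate's result
      return (op p q)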
A similar situation occurs when a β4-redex is contracted. Again the
master will work on the leftmost redex while the other newly formed redex
will be sent to the work pool. Similar opportunities occur after the con-
traction of an α4, α5, γ1, or γ2-redex, which are implemented lazily as
shown in Figures 6.5 and 6.7. Also, the lazy implementation of map gives
rise to an opportunity for parallelism as shown in Figure 6.10. A repeated
application of this rule may result in a completely parallel application of the
function F to all members of L. The overhead of initiating these processes
is relatively small but, of course, we never know how much computation
is involved in the reduction of a subexpression sent to the work pool. In
other words, this control strategy is based on speculative parallelism which
may not be optimal. This leads us to our third question, which is concerned
with useless computations.
As we mentioned before, the usefulness of a parallel computation
cannot be determined completely in advance when using a nonstrict lan-
guage. In a strict language like Backus's FP the risk of useless parallelism
is much smaller, but it still occurs if, for example, we want to compute the
two arms of an if statement in parallel. A serious problem with the
strictness requirement is that the value of a function can be undefined in
two different ways. Firstly, it may be undefined because the argument has
the wrong type or the wrong value as is the case, for instance, with (null)2
or with ((/)2)0. Secondly, it may be undefined because its computation
does not terminate. In the first case the undefined value can be repres-
ented by a special symbol like ω which is, in fact, a well-defined value. In
the second case this cannot be done in general, because the non-
termination of a computation is undecidable. (A typical cause for non-
termination is an ill-defined recursion.)
The strictness requirement appears to be very practical, as it can help
to establish the equivalence of the 'call by value' and the 'call by name'
mechanisms. Its universal enforcement, however, excludes lazy evaluation
and may cause serious inefficiency in certain cases.
Take, for instance, the selector function 1, which selects the first ele-
ment of a list. If it is treated as a strict function then
(1)[E1, ... , En], where some Ei is undefined,
should be undefined. Now, in order to determine whether any of the ele-


ments is undefined we have to evaluate them all before we can return the
result. Similar problems occur with the ", ~, and & operators. So, the eval-
uation of strict functions may benefit more from parallelism, because they
demand more thorough evaluation to begin with.
The greatest risk of speculative computations with a nonstrict language
is due to the existence of possibly useless infinite computations. Normal
order lazy evaluation seems to be the only safe way to work with nonstrict
languages. Therefore, no matter how cautious we are when initiating
speculative computations, they may have to be halted even before the
master stops them so that the waste of time and space may be contained.
One possible solution is to set a fixed limit to the time and/or space that
can be used by any speculative computation.
A very specific infinite computation may occur in our graph reduction
technique because of the use of cyclic graphs. As mentioned earlier, each
processor has a control stack to keep track of its position while traversing
the graph. This means that they do not leave visible marks on the path they
travel.
When a large number of processors are traversing concurrently the
same (shared) graph then marking off the visited nodes is clearly imprac-
tical. (Each node record would have to be extended by as many extra bits
as we have parallel processors. Moreover, the termination of a process
would require the resetting of the corresponding bits throughout the
graph.) The same is true for the 'pointer reversal' technique which is often
used in sequential (uniprocessor) implementations. In short, we feel that
any kind of side effects (even hidden ones) that store information in the
shared graph are undesirable.
The use of control stacks makes it possible to traverse the graph in a
'read only' mode by as many processors as we like. The graph will be
traversed via a control stack in depth first order as if it were actually a tree.
(The expression graph is nothing but a compact form of the parse tree.)
Cyclic graphs, however, may cause some problems. It is, in fact, necessary
to prevent the processors from going in circles indefinitely when traversing
a cyclic graph. The control stack by itself is not sufficient for that purpose.
It will simply overflow without knowing that the processor is running the
same track over and over again.
To prevent this from happening we use the following trick. We place
a special indirection node in each cycle which must be remembered by each
processor when traversed. Each time a processor traverses this node it will
ask its local memory whether it has already seen it. If so, then it will back
up, otherwise it will keep going but it will remember this node in its local
memory. This way it can avoid traversing the cycle twice. This method re-
quires that every cycle has at least one of these special indirection nodes. Ini-
tially, they will be placed in the graph by the parser, but we have to make
sure that this property is preserved throughout the reduction process. (An
occurrence of this special node could be eliminated in any reduction step
that makes a short cut within a cycle.) Fortunately, a minor adjustment
of the reduction rules is sufficient to preserve this property. This has made
the traversal of the graph without any side effect possible, which can be
done concurrently by any number of processors.
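A minimal sketch of this trick follows (a Haskell rendering under our own
assumptions; the 'local memory' is a set of identifiers of the special nodes
already passed, carried down the traversal):

    import qualified Data.Set as Set

    -- every cycle is assumed to pass through at least one special
    -- indirection node; ordinary nodes are never recorded
    data GNode = GNode { ident :: Int, special :: Bool, kids :: [GNode] }

    visit :: Set.Set Int -> GNode -> [Int]
    visit seen n
      | special n && ident n `Set.member` seen = []  -- seen before: back up
      | otherwise = ident n : concatMap (visit seen') (kids n)
      where seen' | special n = Set.insert (ident n) seen
                  | otherwise = seen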
This concludes the discussion of our parallel graph reduction technique
for implementing a non-strict functional language. We have concentrated
on the hardware independent features and presented the main ideas leaving
out many of the technical details. Similar ideas are used in many other
techniques developed for the same purpose. Some of the other approaches
are mentioned in our bibliographical notes but we cannot offer even a
partial survey of this vast area of current research. We hope that future
progress in this area will eliminate the need for a sharp distinction between
symbolic and numeric computations.
APPENDIX A

A PROOF OF THE CHURCH-ROSSER THEOREM

The original proof of the Church-Rosser theorem was very long and
complicated [Ch-R36]. Many other proofs and generalizations have been
published in the last 50 years. The shortest proof known so far is due to
P. Martin-Löf and W. Tait. An exposition of their proof appears as Ap-
pendix 1 in [Hind72] and also in [Hind86].
We present below an adaptation of this proof to our definitions of re-
naming and substitution as given in Section 2.2. These definitions are
slightly different from the standard ones. By using these definitions we can
ignore the so called α-steps in the proof. In order to make sure that our
proof is correct, we have worked out most of the details which are usually
left to the reader as an exercise. Therefore, our proof appears to be longer
but it is not really so.
The main idea of the proof is to decompose every β-reduction into a
sequence of complete internal developments, and show that the latter have
the diamond property. Then the theorem can be shown by induction on the
length of this sequence. The definition of a complete internal development
is based on the notion of the residual of a β-redex.
Definition A.1 (Residual) Let R and S be two occurrences of
β-redexes in a λ-expression E such that S is not a proper part of
R. Let E change to E' when R is contracted. Then, the residual of
S with respect to R is defined as follows:
Case 1: R and S do not overlap in E. Then contracting R leaves S
unchanged. This unchanged S in E' is called the residual of S.
Case 2: R and S are the same. Then contracting R is the same as
contracting S. We say S has no residual in E'.
Case 3: R is a proper part of S (R ≢ S). Then S has the form
(λx.P)Q, and R is either in P or in Q. Contracting R changes P to
P' or Q to Q'. Hence, S changes to (λx.P')Q or (λx.P)Q' and that
is the residual of S.
Definition A.2 (Complete Internal Development) Let R1, ..., Rn
(n ≥ 0) be a (possibly empty) set of β-redexes occurring in a
λ-expression E. Any Ri is called internal with respect to the given
set iff no other Rj forms a proper part of it. Contracting an
internal redex leaves at most n−1 residuals. Contracting R1, for
example, leaves the residuals R'2, ..., R'n. Contracting any of the
latter which is internal with respect to R'2, ..., R'n leaves at most
n−2 residuals. Repeating this process until no more residuals are
left represents a complete internal development. The existence of
a complete internal development which changes E to E' will be
denoted by E ↠ E'. (Note that the symbols ↠ and → are differ-
ent.)
Any subsequence of a complete internal development forms a complete
internal development with respect to the corresponding subset of the given
redexes. Also, if E ↠ E' and F ↠ F' then (E)F ↠ (E')F'. However, the
relation ↠ is not transitive as can be seen from this example:
(λx.(x)y)λz.z ↠ (λz.z)y ↠ y
but there is no complete internal development from (λx.(x)y)λz.z to y.


The only hard part of the proof is to show that the relation .... has the
diamond property (Lemma A.4). This will be shown with the aid of some
elementary properties of renaming and substitution which will be estab-
lished first in three technical lemmas. Then, the theorem will be shown by
induction in two steps as illustrated in Figures A.l and A.2.
Note that according to Definition 2.5, we do not have to worry about
a-conversions, because they may occur freely in ~-reductions between any
two consecutive ~-steps and also before the first, as well as after the last
~-step. The same is true for complete internal developments.
Lemma A.1 If S ↠ S' then {z/y}S ↠ {z/y}S' for any variables y
and z.
Proof We use induction on the number of occurrences of vari-
ables in S, where the occurrence of a bound variable next to its
binding λ will also be counted. (This is basically the same as in-
duction on the length of S.)
If S is a single variable then there is nothing to prove.
If S has form λx.P then there is some P' such that P ↠ P' and
S' ≡ λx.P'. Hence, the assertion follows immediately from the in-
duction hypothesis and Definition 2.2.
If S has form (P)Q then two subcases arise:
Case a: Every β-redex selected for the given complete internal devel-
opment is in P or Q. In this case there exist λ-expressions P' and
Q' such that P ↠ P', Q ↠ Q', and S' ≡ (P')Q'. But then the in-
duction hypothesis gives us the following complete internal de-
velopment:
{z/y}S ≡ {z/y}(P)Q ≡ ({z/y}P){z/y}Q ↠
({z/y}P'){z/y}Q' ≡ {z/y}(P')Q' ≡ {z/y}S'.

Case b: The given complete internal development involves contracting
the residual of (P)Q. In this case P must have the form λx.R and
the contraction of the residual of (P)Q must be the last step in the
given complete internal development. Hence, there are some R'
and Q' such that R ↠ R', Q ↠ Q', and the given complete internal
development has this form:
S ≡ (λx.R)Q ↠ (λx.R')Q' → [Q'/x]R' ≡ S'.

Now, for any M, N, z, and y, if M → N (i.e., M reduces to N in
one step) then also {z/y}M → {z/y}N, which can be shown easily
by the reader using induction on the construction of M. This
combined with the induction hypothesis gives us the following
complete internal development:
{z/y}S ≡ ({z/y}λx.R){z/y}Q ↠ ({z/y}λx.R'){z/y}Q' ≡
{z/y}(λx.R')Q' → {z/y}[Q'/x]R' ≡ {z/y}S'

which completes the proof.


Next, we have to prove some basic properties of the substitution operation.
First of all we need the following facts whose proofs are left to the reader:
(1) If v ∉ φ(N) ∪ φ(Q) then v ∉ φ([N/x]Q) for any N and Q.
(2) If M ≡ N then [Q/x]M ≡ [Q/x]N and {z/y}M ≡ {z/y}N for any M,
N, Q, and x, y, z.
Lemma A.2 For any λ-expressions Q, N, S, and variables x, y, z
the following statements hold:
(a) [[N/x]Q/x]S ≡ [N/x][Q/x]S
(b) If x ≢ y, and x ∉ φ(S) or y ∉ φ(N), then
[[N/x]Q/y][N/x]S ≡ [N/x][Q/y]S
(c) If x ≢ y, and x ∈ φ(S) and y ∈ φ(N), then for any z with
x ≢ z ≢ y which is neither free nor bound in ((S)Q)N,
[[N/x]Q/z][N/x]{z/y}S ≡ [N/x][Q/y]S
Proof For each part we shall use again induction on the number
of occurrences of variables in S.
Part (a): If S is a single variable then the assertion is trivial. It is
also trivial if S ≡ λx.P for some P.
If S has form λv.P with v ≢ x then we can choose some vari-
able u such that u is neither free nor bound in P and u ∉ φ(N) ∪
φ(Q). Hence, by using the induction hypothesis we get
[[N/x]Q/x]λv.P ≡ [[N/x]Q/x]λu.{u/v}P ≡
λu.[[N/x]Q/x]{u/v}P ≡ λu.[N/x][Q/x]{u/v}P ≡
[N/x]λu.[Q/x]{u/v}P ≡ [N/x][Q/x]λu.{u/v}P ≡
[N/x][Q/x]λv.P
which was to be shown.
Finally, if S has form (E)F then the assertion follows easily
from the induction hypothesis. Namely,
[[N/x]Q/x](E)F ≡ ([[N/x]Q/x]E)[[N/x]Q/x]F ≡
([N/x][Q/x]E)[N/x][Q/x]F ≡
[N/x]([Q/x]E)[Q/x]F ≡ [N/x][Q/x](E)F
Part (b): If S is a single variable then the assertion is trivial. It is
also trivial if S ≡ λy.P for some P.
If S has form λv.P with v ≢ y then we can choose some vari-
able u such that u is neither free nor bound in P and u ∉ φ(N) ∪
φ(Q). Hence, by using the induction hypothesis we get
[[N/x]Q/y][N/x]λv.P ≡ λu.[[N/x]Q/y][N/x]{u/v}P ≡
λu.[N/x][Q/y]{u/v}P ≡ [N/x][Q/y]λv.P
Finally, if S has form (E)F then the assertion follows easily from
the induction hypothesis.
Part (c): If S is a single variable then x ∈ φ(S) implies S ≡ x for
which the assertion is trivial.
If S has form λv.P then v ≢ x must be the case. Then we can
choose some variable u such that u is neither free nor bound in P
and u ∉ φ(N) ∪ φ(Q) ∪ {x,y,z,v}. Now, the induction hypothesis
gives us
[[N/x]Q/z][N/x]{z/y}λv.P ≡
[[N/x]Q/z][N/x]{z/y}λu.{u/v}P ≡
λu.[[N/x]Q/z][N/x]{z/y}{u/v}P ≡
λu.[N/x][Q/y]{u/v}P ≡
[N/x][Q/y]λu.{u/v}P ≡ [N/x][Q/y]λv.P
Finally, if S has form (E)F then the assertion follows easily from
the induction hypothesis, and this completes the proof.
Lemma A.3 If M ↠ M' and N ↠ N' then for any variable x
[N/x]M ↠ [N'/x]M'.
Proof We use induction on the construction of M.
If M is a variable then the assertion is trivial.
If M is of the form λy.S then M' ≡ λy.S' for some S' with S
↠ S'. Three subcases arise:
Case A: x ≡ y. Then the following is a complete internal devel-
opment:
[N/x]λx.S ≡ λx.S ↠ λx.S' ≡ M' ≡ [N'/x]M'


Case B: x ≢ y, and x ∉ φ(S) or y ∉ φ(N). Then the induction hy-
pothesis gives us the following complete internal development:
[N/x]λy.S ≡ λy.[N/x]S ↠ λy.[N'/x]S' ≡ [N'/x]λy.S'
Case C: x ≢ y, and x ∈ φ(S) and y ∈ φ(N). Now, for any variable z
that is neither free nor bound in (S)N, the induction hypothesis
and Lemma A.1 give us the following complete internal develop-
ment:
[N/x]λy.S ≡ λz.[N/x]{z/y}S ↠
λz.[N'/x]{z/y}S' ≡ [N'/x]λy.S' ≡ [N'/x]M'
Finally, if M has form (P)Q then two subcases arise:
Case 1: Every β-redex selected for the complete internal develop-
ment M ↠ M' is in P or Q. Then M' ≡ (P')Q' for some P' and Q'
such that P ↠ P' and Q ↠ Q'. Hence, the induction hypothesis
gives us the following complete internal development:
[N/x](P)Q ≡ ([N/x]P)[N/x]Q ↠
([N'/x]P')[N'/x]Q' ≡ [N'/x](P')Q'
Case 2: The last step of the complete internal development M ↠
M' is contracting the residual of (P)Q. Then P has form λy.S, and
there are λ-expressions S' and Q' such that S ↠ S', Q ↠ Q', and
M' ≡ [Q'/y]S'. Now, again three subcases arise:
Subcase 2(a): x ≡ y. Then, by part (a) of Lemma A.2, the
following is a complete internal development:
[N/x](λx.S)Q ≡ ([N/x]λx.S)[N/x]Q ≡
(λx.S)[N/x]Q ↠ (λx.S')[N'/x]Q' →
[[N'/x]Q'/x]S' ≡ [N'/x][Q'/x]S'.


Subcase 2(b): x ≢ y, and x ∉ φ(S') or y ∉ φ(N'). Then, by
part (b) of Lemma A.2, the following is a complete
internal development:
[N/x](λy.S)Q ≡ ([N/x]λy.S)[N/x]Q ↠
([N'/x]λy.S')[N'/x]Q' ≡ (λy.[N'/x]S')[N'/x]Q' →
[[N'/x]Q'/y][N'/x]S' ≡ [N'/x][Q'/y]S' ≡
[N'/x]M'.
Subcase 2(c): x ≢ y, and x ∈ φ(S') and y ∈ φ(N'). Then,
by part (c) of Lemma A.2, the following is a complete
internal development:
[N/x](λy.S)Q ≡ ([N/x]λy.S)[N/x]Q ↠
([N'/x]λy.S')[N'/x]Q' ≡
(λz.[N'/x]{z/y}S')[N'/x]Q' →
[[N'/x]Q'/z][N'/x]{z/y}S' ≡ [N'/x][Q'/y]S' ≡
[N'/x]M',
and this completes the proof.


Lemma A.4 If E ↠ U and E ↠ V then there is a λ-expression Z
such that U ↠ Z and V ↠ Z.
Proof We use induction on the construction of E.
If E is a variable then U ≡ E ≡ V must be the case and thus
we can choose Z ≡ E.
If E is of the form λx.P then all the β-redexes selected for the
given complete internal developments must be in P. This means
that U ≡ λx.P' and V ≡ λx.P'' for some P' and P'' with P ↠ P'
and P ↠ P''. Hence, the induction hypothesis gives us some P+
such that P' ↠ P+ and P'' ↠ P+. So, we can choose Z ≡ λx.P+.
Finally, if E is of the form (M)N then three subcases arise:
Case 1: Every β-redex selected for the given complete internal
developments occurs in M or N. In this case U ≡ (M')N' and V
≡ (M'')N'' for some M', N', M'', and N'' such that M ↠ M', N
↠ N', M ↠ M'', and N ↠ N''. By the induction hypothesis we
get some M+ and N+ such that M' ↠ M+, M'' ↠ M+, N' ↠ N+,
and N'' ↠ N+. Thus, we can choose Z ≡ (M+)N+.
Case 2: E has form (λx.P)N, and just one of the given complete
internal developments, say, E ↠ U, involves contracting the resi-
dual of E. By Definition A.2 this must be the last step in that
complete internal development. Therefore, we have some
λ-expressions P', P'', N', and N'' such that
P ↠ P', P ↠ P''
N ↠ N', N ↠ N''
and E ↠ U has form
E ↠ (λx.P')N' → [N'/x]P' ≡ U
while E ↠ V has form
E ↠ (λx.P'')N'' ≡ V.
The induction hypothesis gives us P+ and N+ such that
P' ↠ P+, P'' ↠ P+
N' ↠ N+, N'' ↠ N+
Hence, by Lemma A.3 we can choose Z ≡ [N+/x]P+.
Case 3: Both E ↠ U and E ↠ V involve contracting the residual
of E. Then the given complete internal developments have form
E ↠ (λx.P')N' → [N'/x]P' ≡ U
E ↠ (λx.P'')N'' → [N''/x]P'' ≡ V
and thus, we can choose Z ≡ [N+/x]P+ just as in Case 2, and this
completes the proof.
Thus, we have shown that the relation ⤳ has the diamond property. What is left to do is to go from complete internal developments to arbitrary β-reductions (denoted here by ↠). This we shall do in two steps:
Step 1: If E ⤳ U and E ↠ V then there is a W such that U ↠ W and V ⤳ W.
Proof: Note that every β-step is a complete internal development in itself. Hence, there exist some V₁, ..., Vₘ such that
E ⤳ V₁ ⤳ ... ⤳ Vₘ ≡ V.
Now, a repeated application of Lemma A.4 gives us some Z₁, ..., Zₘ such that
U ⤳ Z₁, and V₁ ⤳ Z₁
and for 1 ≤ i < m
Zᵢ ⤳ Zᵢ₊₁, and Vᵢ₊₁ ⤳ Zᵢ₊₁.
So, we can choose Zₘ ≡ W, which completes the proof.

[Figure A.1: repeated application of Lemma A.4, ending in Zₘ ≡ W]

Step 2: If E ↠ U and E ↠ V then there is a T such that U ↠ T and V ↠ T.
Proof: There exist some U₁, ..., Uₙ such that
E ⤳ U₁ ⤳ ... ⤳ Uₙ ≡ U.
A repeated application of Step 1 gives us some W₁, ..., Wₙ such that
U₁ ↠ W₁, and V ⤳ W₁
and for 1 ≤ j < n
Uⱼ₊₁ ↠ Wⱼ₊₁, and Wⱼ ⤳ Wⱼ₊₁.
So, we can choose Wₙ ≡ T, which completes the proof.
[Figure A.2: repeated application of Step 1, ending in T ≡ Wₙ]

The result of Step 2 represents the Church-Rosser Theorem for arbitrary β-reductions. It should be noted, however, that the Church-Rosser property may or may not be preserved when new reduction rules are added to the system. If the new rules can be implemented via β-reduction (i.e. they are consequences of the β-rules) then the above proof remains valid. If, however, the new rules are independent of β-conversion then a new proof is needed. This is the case, for example, with the η-rule (see Section 3.4) which cannot be derived from the β-rules. Nevertheless, the above proof can be extended relatively easily to βη-conversion. Such extensions can be found in [Hind86] and [Bare81]. The proof can be extended also to our γ-rules without any difficulty.
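To see the theorem at work, the following small Haskell sketch (our illustration, not part of the original text; all names are ours) implements β-reduction over a toy term type and checks that two different redex-selection strategies reach the same normal form. Substitution is deliberately naive, so the sketch is only valid for terms, like the one below, in which no variable capture can occur.

-- A minimal illustration of the Church-Rosser property (hypothetical names).
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- subst x q p computes [q/x]p; naive, i.e. it ignores variable capture
subst :: String -> Term -> Term -> Term
subst x q (Var y)   = if x == y then q else Var y
subst x q (Lam y b) = if x == y then Lam y b else Lam y (subst x q b)
subst x q (App f a) = App (subst x q f) (subst x q a)

-- one leftmost-outermost beta-step, if any redex exists
outer :: Term -> Maybe Term
outer (App (Lam x b) q) = Just (subst x q b)
outer (App f a)         = case outer f of
                            Just f' -> Just (App f' a)
                            Nothing -> App f <$> outer a
outer (Lam x b)         = Lam x <$> outer b
outer (Var _)           = Nothing

-- one innermost beta-step: reduce inside the subterms before contracting
inner :: Term -> Maybe Term
inner (Var _)   = Nothing
inner (Lam x b) = Lam x <$> inner b
inner (App f a) = case inner f of
  Just f' -> Just (App f' a)
  Nothing -> case inner a of
    Just a' -> Just (App f a')
    Nothing -> case f of
      Lam x b -> Just (subst x a b)
      _       -> Nothing

normalize :: (Term -> Maybe Term) -> Term -> Term
normalize step t = maybe t (normalize step) (step t)

-- (λx.(x)x)((λz.z)w) reduces to (w)w under both strategies
main :: IO ()
main = do
  let t = App (Lam "x" (App (Var "x") (Var "x")))
              (App (Lam "z" (Var "z")) (Var "w"))
  print (normalize outer t == normalize inner t)   -- prints True

Of course, this only confirms the theorem on one example; it is the proof above that guarantees a common reduct for every pair of reductions from the same term.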
APPENDIX B

INTRODUCTION TO TYPED λ-CALCULUS

In typed λ-calculus every λ-expression must have a type. The assignment of types to λ-expressions is straightforward. First, we assume that we have a fixed set of ground types from which all types are built.
Definition B.1: The set of types, Typ, is inductively defined as follows:
(i) The ground types are all in Typ.
(ii) If τ and σ are in Typ then so is τ→σ.
This means that we have only one type constructor, namely the → symbol for constructing function types. But that is quite sufficient for our purposes, since we have only two expression forming operations, abstraction and application. Next, we assume that every variable has a type. More precisely, we assume that for every type τ there is an infinite sequence of variables of type τ. Then we assign a type to every λ-expression as follows:
Definition B.2: For every τ, the set of λ-expressions of type τ, denoted by Λτ, is defined inductively as follows:
(1) Every variable of type τ is in Λτ.
(2) If x is in Λτ and E is in Λσ then λx.E is in Λτ→σ.
(3) If P is in Λτ→σ and Q is in Λτ then (P)Q is in Λσ.
Hence, the set of typed λ-expressions is ∪{Λτ | τ ∈ Typ}.
The notions of free and bound variables, renaming and substitution are defined in the obvious way. Also the β-rule remains the same
(β) (λx.P)Q → [Q/x]P
except that Q and x must be of the same type. It can be shown by induction on the structure of P that the type of a β-redex (λx.P)Q is the same as that of its contractum. Hence, we can prove that the type of a λ-expression does not change when it is reduced to normal form. This also implies that typing is consistent with β-equality, i.e. equal λ-expressions are of the same type.
It is easy to see that β-reduction has the Church-Rosser property also in typed λ-calculus.
Note, however, that the syntax of typed λ-expressions is more complex than that of the type-free notation because of the extra requirement of type consistency. In particular, if x and Q are not of the same type then the λ-expression (λx.P)Q is not in Λτ for any τ ∈ Typ. Thus, the set of typed λ-expressions forms a proper subset of the set of type-free ones.
Constants can be introduced naturally to typed λ-calculus by assigning a type to each of them. For instance, the constant 3 is assigned the type int and the operator and is assigned the type
boolean → (boolean → boolean)
Thus, the operator and is applicable only to λ-expressions of type boolean and its application to other λ-expressions is in error.
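Definition B.2, together with such typed constants, already amounts to a simple type checker. The following Haskell sketch (our own illustration; the names are hypothetical) stores the permanent type in each variable and constant, and computes the type of a compound λ-expression exactly by clauses (1)-(3):

-- Types and typed lambda-expressions after Definitions B.1 and B.2.
data Ty = Ground String | Arrow Ty Ty
  deriving (Eq, Show)

-- Every variable (and constant) carries its permanent type.
data Expr = Var String Ty | Lam String Ty Expr | App Expr Expr

-- typeOf returns Nothing when the expression lies in no Lambda_tau at all.
typeOf :: Expr -> Maybe Ty
typeOf (Var _ t)   = Just t
typeOf (Lam _ t b) = Arrow t <$> typeOf b          -- clause (2)
typeOf (App p q)   = do                            -- clause (3)
  Arrow a b <- typeOf p
  a' <- typeOf q
  if a == a' then Just b else Nothing

boolean :: Ty
boolean = Ground "boolean"

-- the constant and : boolean -> (boolean -> boolean), as above
andC :: Expr
andC = Var "and" (Arrow boolean (Arrow boolean boolean))

-- typeOf example yields Just (Arrow boolean boolean),
-- i.e. λx.((and)x)x is of type boolean -> boolean.
example :: Maybe Ty
example = typeOf (Lam "x" boolean
                   (App (App andC (Var "x" boolean)) (Var "x" boolean)))

Applying andC to anything whose type is not boolean makes typeOf return Nothing, which is the formal counterpart of the application being 'in error'.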
The type assignment given by Definition B.2 can be used for a preliminary type checking of typed λ-expressions before they are evaluated. This is essentially the same as the static, or compile-time, type-checking of a typed programming language. In ALGOL-like languages the type of a variable is usually defined by a type declaration, which is slightly different from our permanent type assignment to the variables. But that is not important, since we can use a different variable in each declaration without any difficulty.
Alternatively, we can make our abstractions more similar to the type declarations of an ALGOL-like language by adding a type specification to each bound variable in the binding prefix. Then the general form of an abstraction will be the following:
λ<variable>:<type>.<λ-expression>
So, for example, the λ-expression
λx:int.((*)x)x
will be of type int→int, while the similar λ-expression
λx:real.((*)x)x
will be of type real→real.
Observe the fact that the arithmetic operators are defined on both types, so the type of the above expressions cannot be uniquely determined without an explicit declaration of the type of the bound variable x. This simple example already shows the complications occurring in most type systems. The type assignment described in Definition B.2 is not concerned with the implicit relationship among various types. The 'overloading' of the * operator makes its type ambiguous and thus, we need additional information for a proper type checking.
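The same situation arises in a modern typed functional language. In Haskell, for example, (*) is overloaded through the Num class, and the two λ-expressions above correspond to one and the same body distinguished only by its type annotation (a sketch of ours, not from the text):

-- Without an annotation, \x -> x * x has only the ambiguous scheme
-- Num a => a -> a; the annotation plays the role of λx:int or λx:real.
squareInt :: Int -> Int
squareInt = \x -> x * x

squareReal :: Double -> Double
squareReal = \x -> x * x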
If not every constant has a unique type then the type of a λ-expression may or may not be uniquely determined by the types of the constants occurring in it. If it is, then we say that the 'implicit' type of the expression is independent of the types of its variables. For example, the type of the expression
((*)3.14)x
is obviously real regardless of the type of x, unless it is in error. So, the information supplied by an explicit type assignment to the variables may turn out to be redundant, which makes type checking interesting. It is, in a sense, the confrontation of a 'specification' with the actual program.
Definition B.2 is actually a set of inference rules to determine the type of typed λ-expressions. These inference rules can also be used for inferring the type of certain variables occurring in a typed λ-expression. For instance, the type of x in
λx.(succ)x
must be int, because the function succ is of type int→int. So, the type of a λ-expression may be well-defined even if no explicit type assignment is given to some of its variables. Unfortunately, it is very difficult to determine in general which of the variables need not be explicitly typed. Many sophisticated type inference systems have been developed for typed programming languages. (See, e.g. [MacQ82], [Mart85], and [Hanc87].)
The type assignment given by Definition B.2 can be easily extended to our list structures. For that purpose we have to modify Definition B.1 by adding the following clause:
(iii) If σ₁, ..., σₙ are in Typ then so is [σ₁, ..., σₙ]
This means that list types are simply the lists of the types of their components. Definition B.2 will then be extended as follows:
(4) If Eᵢ is of type σᵢ for 1 ≤ i ≤ n then [E₁, ..., Eₙ] is of type [σ₁, ..., σₙ]
(5) If P is of type [τ→σ₁, ..., τ→σₙ] and Q is of type τ then (P)Q is of type [σ₁, ..., σₙ]
Also, the clauses (2) and (3) of Definition B.2 will be extended to the case when either or both of τ and σ are lists of types. Hence, if x is of type τ then the λ-expression
λx.[E₁, ..., Eₙ]
is of type
τ → [σ₁, ..., σₙ]
while
[λx.E₁, ..., λx.Eₙ]
is of type
[τ→σ₁, ..., τ→σₙ].
By applying either of them to some λ-expression of type τ we get a λ-expression of type
[σ₁, ..., σₙ]
Here we have introduced infinitely many different list types just as we have infinitely many different function types. This means that our list operators will be 'overloaded' unless we use a different operator for each different list type. We prefer the first approach, therefore, we will use the following inference rules for our overloaded list operators:
(∧)[σ₁, ..., σₙ] = σ₁
(∼)[σ₁, ..., σₙ] = [σ₂, ..., σₙ]
Similar rules can be given for map and append, so the type of an application involving these operators can be determined from the types of the operands.
It should be emphasized that the type of a λ-expression depends on the type of its components. Therefore, the type of a λ-expression will be computed 'inside-out' rather than in normal order. This means that every subexpression of a well-typed λ-expression must itself be well-typed. To put it differently, 'meaningful' λ-expressions cannot have 'meaningless' subexpressions.
SYNTAX OF TYPE-DESCRIPTORS
<type-descriptor> ::= <ground-type> | <abstraction-type> |
    <application-type> | <list-type> |
    <operator-type> | <union-type>
<ground-type> ::= int | real | boolean
<abstraction-type> ::= <type-descriptor> → <type-descriptor>
<application-type> ::= (<type-descriptor>) <type-descriptor>
<list-type> ::= [] | [<type-descriptor> <list-type-tail>
<list-type-tail> ::= ] | , <type-descriptor> <list-type-tail>
<operator-type> ::= + | - | * | / | < | ≤ | = | ≥ | > | ≠ | ∧ | ∼ | &
<union-type> ::= <type-descriptor> ∪ <type-descriptor>
This syntax corresponds to an extended version of Definition B.1. The most significant extension is represented by the application-type formed with two arbitrary type-descriptors. The purpose of type checking is now to determine whether or not the types involved in an application actually match. The operator-type is used only for the overloaded operators, whose types depend on their context.
Now, to every typed λ-expression we assign a type-descriptor on the basis of the types of its components and its structure.
Definition B.3: The type-descriptor assigned to a λ-expression E, denoted by (δ)E, is defined inductively as follows:
(δ)0 = int
(δ)1 = int, etc.
(δ)true = boolean
(δ)false = boolean
(δ)succ = int→int
(δ)pred = int→int
(δ)mod = int→(int→int)
(δ)zero = int→boolean
(δ)and = boolean→(boolean→boolean)
(δ)or = boolean→(boolean→boolean)
(δ)not = boolean→boolean
(δ)null = ℓ→boolean
(δ)x = τ if x is a variable of type τ
(δ)λx.P = (δ)x→(δ)P
(δ)(P)Q = ((δ)P)(δ)Q
(δ)[E₁, ..., Eₙ] = [(δ)E₁, ..., (δ)Eₙ]
(δ)τ = τ for every type-descriptor τ

The structure of a type-descriptor obtained in this fashion will mirror the structure of the given λ-expression. It may contain, however, the symbol ℓ, which represents a generic list-type in the type-descriptor assigned to the predicate null. According to the definition of δ, the type of an overloaded operator is itself due to the clause (δ)τ = τ. This clause implies also the assignment (δ)[] = [].
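Definition B.3 is straightforward to mechanize, because δ never fails: it merely builds a descriptor, postponing all checking to the simplification phase. A possible Haskell transcription (a sketch with names of our own choosing) is:

-- Type-descriptors and the assignment delta of Definition B.3.
data Desc = IntD | RealD | BoolD
          | GenList              -- the generic list type, written l above
          | Arr Desc Desc        -- abstraction-type
          | AppD Desc Desc       -- application-type, to be checked later
          | ListD [Desc]         -- list-type
  deriving (Eq, Show)

data Expr = Num Int | TrueC | FalseC | Succ | Null
          | Var String Desc | Lam String Desc Expr
          | App Expr Expr | List [Expr]

delta :: Expr -> Desc
delta (Num _)     = IntD                      -- (delta)0 = int, etc.
delta TrueC       = BoolD
delta FalseC      = BoolD
delta Succ        = Arr IntD IntD
delta Null        = Arr GenList BoolD
delta (Var _ t)   = t
delta (Lam _ t b) = Arr t (delta b)           -- (delta)λx.P = (delta)x -> (delta)P
delta (App p q)   = AppD (delta p) (delta q)  -- (delta)(P)Q = ((delta)P)(delta)Q
delta (List es)   = ListD (map delta es)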
Type-descriptors can be simplified with the aid of the following rules:
SIMPLIFICATION RULES FOR TYPE-DESCRIPTORS
(τ→σ)τ = σ for all type-descriptors τ and σ
([τ→σ₁, ..., τ→σₙ])τ = [σ₁, ..., σₙ] for all type-descriptors τ, σ₁, ..., σₙ
(ℓ→τ)[] = τ for all type-descriptors τ
(ℓ→τ)[σ₁, ..., σₙ] = τ for all type-descriptors τ, σ₁, ..., σₙ
((+)int)int = int
((+)int)real = real
((+)real)int = real
((+)real)real = real, etc.
((boolean)τ)σ = τ∪σ
(∧)[] = []
(∧)[σ₁, ..., σₙ] = σ₁
(∧)[σ₁, ..., σₙ]∪[τ₁, ..., τₙ] = σ₁∪τ₁
(∼)[] = []
(∼)ℓ = ℓ
((&)τ)[] = [τ]
((&)τ)[σ₁, ..., σₙ] = [τ, σ₁, ..., σₙ]
(int)[σ₁, ..., σₙ] = σ₁∪...∪σₙ
Union-types can be simplified according to the usual properties of the ∪ operator. Furthermore, any list-type or union of list-types can be simplified (actually 'unified') to ℓ, if necessary. Otherwise, these simplification rules are very similar to the reduction rules of type-free λ-calculus. But, the simplest form of a type-descriptor may not be uniquely determined by these rules, because they may not have the Church-Rosser property. It is, therefore, necessary to prove the consistency of this typing system either by an appropriate generalization of the Church-Rosser theorem or by some other method. (For generalized Church-Rosser theorems see, e.g. [Kn-B70] or [Book83].)
Observe the fact that the application of these simplification rules involves a great deal of pattern matching. For instance, the type expression τ in the simplification rule (τ→σ)τ = σ may be quite complex. Therefore, the consistency proof for this system is much harder than for standard β-reduction.
Nevertheless, we can design an automatic type checking system based
on these rules. After reducing a given type-descriptor to its simplest form
we have to check if there is any application-type left in it. An application-
type that cannot be simplified must be wrong.
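Continuing the sketch given after Definition B.3, a few of the simplification rules, together with the final check for leftover application-types, might be rendered in Haskell as follows (again our own fragment; union-types and the overloaded arithmetic rules are omitted):

-- Desc as in the previous sketch, extended with descriptors for the
-- overloaded head and tail operators.
data Desc = IntD | RealD | BoolD | GenList
          | Arr Desc Desc | AppD Desc Desc | ListD [Desc]
          | HeadOp | TailOp
  deriving (Eq, Show)

simplify :: Desc -> Desc
simplify (AppD f a) =
  case (simplify f, simplify a) of
    (Arr t s, a') | t == a'   -> s           -- (tau -> sigma)tau = sigma
    (Arr GenList t, ListD _)  -> t           -- (l -> tau)[...] = tau, incl. []
    (HeadOp, ListD [])        -> ListD []    -- (head)[] = []
    (HeadOp, ListD (d : _))   -> d           -- (head)[s1, ..., sn] = s1
    (TailOp, ListD [])        -> ListD []    -- (tail)[] = []
    (TailOp, ListD (_ : ds))  -> ListD ds    -- (tail)[s1, ..., sn] = [s2, ..., sn]
    (TailOp, GenList)         -> GenList     -- (tail)l = l
    (f', a')                  -> AppD f' a'  -- stuck: left for the check below
simplify (Arr t s)  = Arr (simplify t) (simplify s)
simplify (ListD ds) = ListD (map simplify ds)
simplify d          = d

-- a descriptor is accepted if no application-type survives simplification
wellTyped :: Desc -> Bool
wellTyped = ok . simplify
  where ok (AppD _ _) = False
        ok (Arr t s)  = ok t && ok s
        ok (ListD ds) = all ok ds
        ok _          = True

Note the pattern matching: exactly as observed above, the left-hand side (τ→σ)τ of the first rule requires comparing two arbitrarily complex descriptors for equality.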
Recursive definitions may cause a problem, however, because we do not have a type-free Y combinator in typed λ-calculus. It cannot be represented by any typed λ-expression, because self-application is not allowed in typed λ-calculus. (The type τ is considered different from τ→τ for all τ ∈ Typ.) One possible way around this problem is to introduce a different Yτ combinator for each τ ∈ Typ with the property
(Yτ)E → (E)(Yτ)E for all E of type τ→τ,
and with the type inference rule
(Yτ)(τ→τ) = τ for all τ ∈ Typ.
In the absence of fixed-point combinators, there are no infinite reductions in the typed λ-calculus, which has the following important consequence:
Theorem B.1: Every typed λ-expression has a normal form.
For a detailed proof of this theorem in standard λ-calculus see Appendix 2 on page 323 in [Hind86]. The idea behind the proof is the observation that the number of arrows in a type-descriptor will never increase during its simplification.
In order to determine the type of a recursively defined function we have to solve the type equation obtained from the given recursive definition. For example, a recursive definition of fact is the following:
fact = λn.(((zero)n)1)((*)n)(fact)(pred)n
The corresponding type equation can be obtained from this by Definition B.3. The type expression obtained in this fashion may be simplified by using the above rules. Hence, the type
(int→boolean)int
will be simplified as boolean. After performing all possible simplifications we obtain the following type equation:
τ = int → ((boolean)int)(int→int)(τ)int
where τ is the only variable. A possible solution to this equation is
τ = int→int
which clearly satisfies the equation.
which clearly satisfies the equation. The existence of a solution to the type
equation does not necessarily imply the existence of a well-defined recur-
sive function satisfying the given definition. If, for example, in the above·
definition of fact we replace the pred function by the succ function then
we get the same type equation, but the function in question is undefined
for n > 0. Therefore, type checking is not fool-proof.
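In Haskell the role of the whole family Yτ is played by a single polymorphic fixed-point function, and both the factorial definition above and its 'not fool-proof' variant can be written down directly (our illustration):

-- fix, instantiated here at tau = Int -> Int, plays the role of Y_tau.
fix :: (a -> a) -> a
fix f = f (fix f)

-- fact = λn.(((zero)n)1)((*)n)(fact)(pred)n
fact :: Int -> Int
fact = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))

-- Replacing pred by succ satisfies the same type equation, so this
-- definition also type-checks, but it diverges for every n > 0.
badFact :: Int -> Int
badFact = fix (\rec n -> if n == 0 then 1 else n * rec (n + 1))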
Note that the combinators true and false can also be treated as overloaded operators. Other 'type-free' combinators can be treated in a similar fashion. For instance, the identity combinator I may be defined for every typed λ-expression E with the property
(I)E = E for every typed λ-expression E,
and with the simplification rule
(I)τ = τ for all τ ∈ Typ.
The same technique can be used also for the Y combinator, which represents, perhaps, the simplest solution to the problem of recursive definitions in typed λ-calculus.
BIBLIOGRAPHICAL NOTES

In order to preserve the continuity of the presentation of the subject matter, references to the literature have been kept at a minimum within the text. The purpose of these notes is to provide additional information on the relevant literature, but the compilation of a comprehensive bibliography on lambda-calculus, combinators, and functional programming is far beyond our goal. Due to the sheer volume of the literature on each of these subjects, we have not been able to include every important paper, nor do we claim that we have listed the most important ones.
These notes serve two purposes, even though they are far from being complete in any way. On one hand, they try to identify the original sources of the material presented in this book. On the other hand, they provide information on further readings for readers interested in some specific area.
There are two excellent books on lambda-calculus, [Bare81] and
[Hind86], which contain extensive bibliography and cover a lot more ma-
terial than our book. But, in order to appreciate those books, the reader
must have a strong background in mathematics. They are concerned with
the mathematical development of the theory and do not discuss its appli-
cations in computer science. A nice introduction to lambda-calculus and its
applications in computer science can be found in [Burg75].

A classic on lambda-calculus is [Chur41]. The two volumes of Combinatory Logic, [Curr58] and [Curr72], represent the most comprehensive book on combinators and their use in mathematical logic.
The more recent developments in the theory of lambda-calculus and
combinators have been inspired mainly by their applications in computer
science. Typed lambda-calculus, which is not discussed thoroughly in our
book, has been applied successfully to the theory of (typed) programming
languages and also made significant progress during the last few years.
The rest of these notes follows the order in which the topics are pre-
sented chapter by chapter in our book.
Chapter 1. A systematic study of the fundamental concepts of (imperative) programming languages undertaken by Strachey [Stra67] and Landin [Land65] has shown that the λ-notation is a convenient tool for a precise mathematical description of the meaning of a program. This has led to the development of Denotational Semantics, which is the subject of the book [Stoy77]. The tutorial paper [Tenn76] and the book [Gord79] describe the application of the method to practical languages.
The first mathematical model for the type-free lambda-calculus was found by Scott in 1969 [Scott73]. (See also his Turing Award lecture, [Scott77], which tells the whole story.) Another interesting model was constructed later by Plotkin [Plot76].
Polymorphic functions have become quite popular in type theory lately due to the investigations started by Girard [Gira71] and Reynolds [Reyn74]. A good introduction to the type theory of programming languages is [Ca-W85]. For more information on type theory see also Chapters 15 and 16 in [Hind86].
Chapter 2. Our definitions of renaming and substitution are slightly modified versions of the standard definitions. The Church-Rosser theorem was published first in [Ch-R36]. The revised α- and β-rules discussed in Section 2.5 are from [RevG85]. There are many alternative definitions of β-reduction, most of which have been designed for being implemented on a computer rather than being used by people. The best known are [DeBr72], [Berk82], and [Stap79].
Chapter 3. The standard combinators were discovered independently by Schönfinkel [Scho24] and Curry [Curr30]. The Y combinator was called the paradoxical combinator by Curry, because it leads to a paradox when used in mathematical logic. This has prevented both the λ-calculus and the theory of combinators from being used as a foundation for mathematics, which was the original goal of their development [Ross82]. The close relationship between λ-calculus and the theory of recursive functions is well explained in [Klee81].
The most significant improvements on the algorithm for bracket abstraction, i.e. translation from the λ-notation to pure combinators, have been achieved so far by Abdali [Abda76] and Turner [Turn79a]. Turner's combinators have been used for implementing functional languages as described in [Turn79b] and [Sche86]. As can be seen from [Burt82] and [Nosh85], bracket abstraction still represents one of the most difficult problems for an efficient use of combinators. Hughes suggested using program dependent combinators, which he called super-combinators, rather than a fixed set of predefined combinators [Hugh82]. Hudak and Goldberg further refined the notion of super-combinators in [Huda85]. A hardware design based on standard combinators is presented in [Clar80]. The design of the G-machine is based on the super-combinator approach [Kieb85]. An interesting hardware design for implementing combinators is described in [Rams86].
Chapter 4. The extension of the λ-notation to include lists as primitive objects was given in [RevG84]. The α5-rule, as well as the γ-rules, are from the same paper. The latter are clearly independent of the β-rules and they are not valid for the 'encoded' representation of lists described at the beginning of Chapter 4. Our treatment of mutual recursion is from [RevG87].
The use of lazy evaluation for dealing with infinite lists has been suggested by many authors. We have been influenced mainly by [Turn82]. Sequential input or output files can also be treated as infinite lists. If applicative order is used as the standard evaluation technique then they need special treatment. In this case, they are usually considered as special objects, called streams, which must be processed with delayed evaluation, or suspension, etc. (See, for example, Section 3.4 in [Abel85].)
Chapter 5. One of the major advantages of functional languages is that their semantics can be described in terms of rewriting rules [Halp84]. The same is true, of course, for the lambda-calculus, which represents, in fact, a model of computation equivalent to the Turing machine. It should be noted, however, that many authors treat β-conversion as a purely syntactic matter. They consider the construction of an independent mathematical model based on set theory (or some other algebraic structure) as the only proper way of defining the semantics of lambda-calculus. There is some truth in this view, but it has never been applied to Turing machines, whose meaning was thought to be intuitively clear. This author feels that lambda-calculus as a purely formal system for performing reductions, i.e. mechanical computations, is self-sufficient. A naive interpretation of λ-expressions is, of course, another matter. It was indeed necessary to clarify what kinds of interpretations are reasonable and what are not [Scott73]. At the same time, the discovery of Scott-continuous functions has made it possible to find a purely extensional characterization of computable functions, which is explained in more detail in [Stoy77].
Controlled reduction is introduced in [RevG87]. It is related to the idea of suspension or delayed execution used by many authors following a suggestion by Friedman and Wise [Frie76]. The difference between our controlled reduction and those other approaches is that we achieve the effect of suspension by an appropriate modification of the reduction rules themselves rather than by some extraneous control mechanism.
The FP system was designed by Backus [Back78], who has had the most significant impact on the development of functional style programming. A formal semantics for functional programs has been developed by Williams [Will82]. A translator from FP to our extended λ-notation has been implemented by Cazes [Caze87].
A thorough discussion of the design and implementation of Miranda can be found in the book [Peyt87]. Our description of Miranda is based on [Turn87], which appears as the Appendix in the same book.
Chapter 6. Graph reduction techniques have been studied by several au-
thors. The effect of sharing is analyzed in [Arvi85], which contains ref-
erences to many other papers on graph reduction. Our graph reduction
rules discussed in this chapter are from [RevG84]. A cyclic represen-
tation of recursion equations has been suggested by Turner [Turn79b].
Many other graph reduction techniques have been developed for imple-
menting functional languages. (See, for example, [Thak86].) Most of
them are using combinators rather than lambda-expressions. One of the
most promising efforts was the development of NORMA (Normal Order Reduction Machine) [Sche86], but it has been dropped. Graph reduction is discussed at length in [Peyt87].
The survey paper [Vegd84] is a fairly complete account of various
implementation techniques that have been developed for functional lan-
guages. The so called data flow approach is somewhat complementary
to graph reduction [Ager82], because it proceeds from bottom up rather
than top down in the expression graph. But, it does not change the
graph, so it is a fixed program approach while graph reduction is not. An
interesting technique for parallel string reduction has been developed by
Mago [Mago79].
Chapter 7. A preliminary version of our parallel graph reduction strategy
is described in [Thad85]. More about on-the-fly garbage collection can
be found in [Cohe81]. The parallel graph reduction technique described
in [Clac86] is somewhat similar to ours, but it has some very important
differences. More about implementing functional languages can be
found in [Hend80] and [Peyt87].
REFERENCES

[Abda76] ABDALI, S. K., An abstraction algorithm for combinatory logic. The Journal of Symbolic Logic, Vol. 41, No. 1, (March 1976), pp. 222-224.
[Abel85] ABELSON, H., and SUSSMAN, G. J. with SUSSMAN, J., Structure
and Interpretation of Computer Programs, MIT Press, and McGraw-Hill
Book Company, 1985.
[Ager82] AGERWALA, T., and ARVIND, Data flow systems. Computer, Vol. 15, No. 2, (1982), pp. 10-13.
[Arvi85] ARVIND, KATHAIL, V., and PINGALI, K., Sharing computation in functional language implementations. Proc. Internat. Workshop on High-Level Computer Architecture, Los Angeles, May 21-25, 1985, pp. 5.1-5.12.
[Back78] BACKUS, J. W., Can programming be liberated from the von
Neumann style? A functional style and its algebra of programs. Com-
munications of the ACM, Vol. 21, No.8, (August 1978), pp. 613-641.
[Bare81] BARENDREGT, H. P., The Lambda Calculus: Its Syntax and Seman-
tics. 1st ed., North-Holland, 1981, 2nd ed., North-Holland, 1984.
[BenA84] BEN-ARI, M., Algorithms for on-the-fly garbage collection. ACM
Trans. Prog. Lang. Sys., Vol. 6, No.3, (July 1984), pp. 333-345.
[Berk82] BERKLING, K. J., and FEHR, E., A modification of the λ-calculus as a base for functional programming languages. Proc. 9th ICALP Conference, Lecture Notes in Computer Science, Vol. 140, Springer-Verlag, (1982), pp. 35-47.

[Book83] BOOK, R. V., Thue systems and the Church-Rosser property. Combinatorics on Words: Progress and Perspectives, (ed. Cummings, L. J.), Academic Press, pp. 1-38.
[Burg75] BURGE, W., Recursive Programming Techniques, Addison-Wesley,
1975.
[Burt82] BURTON, F. W., A linear space translation of functional programs to Turner combinators. Information Processing Letters, Vol. 14, No. 5, 1982, pp. 201-204.
[Ca-W85] CARDELLI, L., and WEGNER, P., On understanding types, data abstraction, and polymorphism. Computing Surveys, Vol. 17, No. 4, (December 1985), pp. 471-522.
[Caze87] CAZES, A., A translator from FP to the lambda calculus. Research
Report, No. RC 12844, IBM Thomas J. Watson Research Center,
Yorktown Heights, New York, 1987.
[Chur41] CHURCH, A., The Calculi of Lambda Conversion. Princeton University Press, Princeton, N.J., 1941.
[Ch-R36] CHURCH, A., and ROSSER, J. B., Some properties of conversion.
Trans. Amer. Math. Soc., Vol. 39. (1936), pp. 472-482.
[Clac86] CLACK, C., and PEYTON-JONES, S. L., The four stroke reduction
engine. Proc. of the 1986 ACM Conference on LISP and Functional
Programming, Cambridge, Mass. (Aug. 1986), pp. 220-232.
[Clar80] CLARKE, T. J. W., GLADSTONE, P. J. S., MACLEAN, C. D., and NORMAN, A. C., SKIM - the S, K, I reduction machine. LISP Conference Records, Stanford University, Stanford, CA, 1980, pp. 128-135.
[Cohe81] COHEN, J., Garbage collection of linked data structures. Computing
Surveys, Vol. 13, No.3, (Sept. 1981), pp. 341-367.
[Curr30] CURRY, H. B., Grundlagen der kombinatorischen Logik. American
J. Math., Vol. 52, (1930), pp. 509-536, 789-834.
[Curr58] CURRY, H. B., and FEYS, R., Combinatory Logic, Vol. I, North-
Holland, Amsterdam, 1958.
[Curr72] CURRY, H. B., HINDLEY, J. R., and SELDIN, J. P., Combinatory
Logic, Vol. II, North-Holland, Amsterdam, 1972.
[DeBr72] de BRUIJN, N. G., Lambda-calculus notation with nameless dummies, a tool for automatic formula manipulation with application to the Church-Rosser theorem. Indag. Math., Vol. 34, (1972), pp. 381-392.

[Dijk78] DIJKSTRA, E. W., LAMPORT, L., MARTIN, A. J., SCHOLTEN, C. S., and STEFFENS, E. F. M., On-the-fly garbage collection: An exercise in cooperation. Communications of the ACM, Vol. 21, No. 11, November 1978, pp. 966-975.
[Frie76] FRIEDMAN, D. P., and WISE, D. S., Cons should not evaluate its arguments. Automata, Languages, and Programming, (ed. MICHAELSON, S., and MILNER, R.), Edinburgh Univ. Press, Edinburgh (1976), pp. 257-284.
[Frie78] FRIEDMAN, D. P., and WISE, D. S., Aspects of applicative programming for parallel processing. IEEE Trans. Comput., C-27, (April 1978), pp. 289-296.
[Gira71] GIRARD, J.-Y., Une extension de l'interprétation de Gödel à l'analyse et son application à l'élimination des coupures dans l'analyse et la théorie des types. Proceedings of the Second Scandinavian Logic Symposium, (ed. FENSTAD, J. E.), North-Holland, 1971, pp. 63-92.
[Gott83] GOTTLIEB, A., LUBACHEVSKY, B. D., and RUDOLPH, L., Coordination of very large number of processors. ACM Transactions on Programming Languages and Systems, 5(2), April 1983, pp. 164-189.
[Gord79] GORDON, M. J. C., The Denotational Description of Programming
Languages, Springer-Verlag, 1979.
[Halp84] HALPERN, J. Y., WILLIAMS, J. H., WIMMERS, E. L., and WINKLER, T. C., Denotational semantics and rewrite rules for FP. Conf. Rec. 12th Annual ACM Symposium on Principles of Programming Languages, New Orleans, LA (1985), pp. 108-120.
[Hanc87] HANCOCK, P., Polymorphic type-checking. Chapter 9 in [Peyt87],
pp. 139-182.
[Hend80] HENDERSON, P., Functional Programming: Application and Imple-
mentation, Prentice-Hall, 1980.
[Hind72] HINDLEY, J. R., LERCHER, B., and SELDIN, J.P., Introduction to
Combinatory Logic, Cambridge University Press, London, 1972.
[Hind86] HINDLEY, J. R., and SELDIN, J. P., Introduction to Combinators and λ-Calculus. Cambridge University Press, 1986.
[Huda85] HUDAK, P., and GOLDBERG, B., Distributed execution of functional programs using serial combinators. IEEE Transactions on Computers, C-34(10), (October 1985), pp. 881-891.
[Hugh82] HUGHES, R. J. M., Super-combinators: A new implementation method for applicative languages. Conf. Rec. 1982 ACM Symposium on LISP and Functional Programming, Carnegie-Mellon Univ., Pittsburgh, PA, (Aug. 1982), pp. 1-10.
[Hwan84] HWANG, K., and BRIGGS, F. A., Computer Architecture and Parallel Processing. McGraw-Hill, 1984.
[Kieb85] KIEBURTZ, R. B., The G-machine: a fast, graph-reduction evaluator. Proc. of IFIP Conf. on Functional Prog. Lang. and Computer Arch., Nancy, 1985, pp. 400-413.
[Klee81] KLEENE, S. C., Origins of recursive function theory. Annals of the
History of Computing, Vol. 3, No.1, (January 1981), pp. 52-67.
[Kn-B70] KNUTH, D. E. and BENDIX, P., Simple word problems in universal
algebras. Computational Problems in Abstract Algebra, (ed. Leech, J.)
Pergamon Press, Oxford (1970), pp. 263-297.
[Land65] LANDIN, P. J., A correspondence between ALGOL 60 and Church's lambda-notation. Communications of the ACM, Vol. 8, (1965), pp. 89-101, 158-165.
[MacQ82] MACQUEEN, D., and SETHI, R., A semantic model of types for applicative languages. Conference Record of the 1982 ACM Symposium on LISP and Functional Programming, pp. 243-252.
[Mago79] MAGO, G. A., A network of microprocessors to execute reduction languages, Part 1 and Part 2. International Journal of Computer and Information Sciences, Vol. 8, No. 5, (1979), pp. 349-385, and Vol. 8, No. 6, (1979), pp. 435-471.
[Mart85] MARTIN-LÖF, P., Constructive mathematics and computer programming. Mathematical Logic and Programming Languages, (ed. Hoare, C. A. R., and Shepherdson, J. C.), Prentice-Hall, 1985, pp. 167-184.
[Nosh85] NOSHITA, K., and HIKITA, T., The BC-chain method for representing combinators in linear space. New Generation Computing, Vol. 3, 1985, pp. 131-144.
[Peyt87] PEYTON-JONES, S. L., The Implementation of Functional Languages,
Prentice-Hall, 1987.
[Plot76] PLOTKIN, G. D., A powerdomain construction. SIAM Journal on
Computing, Vol. 5, (1976), pp. 452-487.
[Rams86] RAMSDELL, J.D., The CURRY Chip. Proc. of the 1986 ACM Con-
ference on LISP and Functional Programming, Cambridge, Mass.
(Aug.1986), pp. 122-131.

[RevG84] REVESZ, G., An extension of lambda-calculus for functional programming. The Journal of Logic Programming, Vol. 1, No. 3, (1984), pp. 241-251.
[RevG85] REVESZ, G., Axioms for the theory of lambda-conversion. SIAM Journal on Computing, Vol. 14, No. 2, (1985), pp. 373-382.
[RevG87] REVESZ, G., Rule-based semantics for an extended λ-calculus. Research Report, No. RC 12570, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, 1987.
[RevP85] REVESZ, P. Z., A new parallel garbage collection algorithm. Honor's
Thesis, Tulane University, New Orleans, 1985.
[Reyn74] REYNOLDS, J. C., Towards a theory of type structure. Lecture Notes
in Computer Science, Vol. 19, Springer-Verlag, 1974, pp. 408-425.
[Ross82] ROSSER, J. B., Highlights of the history of the lambda-calculus. Conference Record of the 1982 ACM Conference on LISP and Functional Programming, Pittsburgh, Pennsylvania, (Aug. 1982), pp. 216-225.
[Sche86] SCHEEVEL, M., NORMA: A graph reduction processor. Proc. of the 1986 ACM Conference on LISP and Functional Programming, Cambridge, Mass. (Aug. 1986), pp. 212-219.
[Scho24] SCHÖNFINKEL, M., Über die Bausteine der mathematischen Logik. Math. Annalen, Vol. 92, (1924), pp. 305-316.
[Scott73] SCOTT, D. S., Models for Various Type-free Calculi. Logic, Methodology and Philosophy of Science IV, (ed. SUPPES et al.), North-Holland, 1973, pp. 157-187.
[Scott77] SCOTT, D. S., Logic and programming languages. Communications of
the ACM, Vol. 20, No.9, (September 1977), pp. 634-641.
[Scott80] SCOTT, D. S., Relating theories of the λ-calculus. To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, (ed. SELDIN, J. P., and HINDLEY, J. R.), Academic Press, 1980.
[Stap79] STAPLES, J., A lambda calculus with naive substitution. J. Austral. Math. Soc. Ser. A, 28 (1979), pp. 269-282.
[Stoy77] STOY, J. E., Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory, MIT Press, 1977.
[Stra67] STRACHEY, C., Fundamental concepts in programming languages.
International Summer School in Computer Programming, Copenhagen,
1967, (unpublished)

[Tenn76] TENNENT, R. D., The denotational semantics of programming languages. Communications of the ACM, Vol. 19, No. 8, (1976), pp. 437-453.
[Thad85] THADANI, M., Parallelism in reduction machines. Master's Thesis,
Tulane University, New Orleans, LA, 1985.
[Thak86] THAKKAR, S. S., and HOSTMANN, W. E., An instruction fetch unit
for a graph reduction machine. IEEE Computer Architecture Conference
1986, pp. 82-90.
[Turn79a] TURNER, D. A., Another algorithm for bracket abstraction. Journal
of Symbolic Logic, Vol.44, No.2, (June 1979), pp. 267-270.
[Turn79b] TURNER, D. A., A new implementation technique for applicative languages. Software - Practice and Experience, Vol. 9, (Sept. 1979), pp. 31-49.
[Turn82] TURNER, D. A., Recursion equations as a programming language.
Functional Programming and its Applications, (ed. DARLINGTON et
al.), Cambridge University Press, 1982, pp. 1-28.
[Turn87] TURNER, D. A., An introduction to Miranda. Appendix to [Peyt87],
pp. 431-438.
[Vegd84] VEGDAHL, S. R., A survey of proposed architectures for the execution of functional languages. IEEE Transactions on Computers, C-33(12), (December 1984), pp. 1050-1071.
[Wads71] WADSWORTH, C. P., Semantics and pragmatics of the lambda calculus. Ph.D. thesis, Oxford, 1971.
[Will82] WILLIAMS, J. H., Notes on the FP style of functional programming. Functional Programming and its Applications, (ed. DARLINGTON et al.), Cambridge University Press, 1982, pp. 73-101.
This book presents an introduction to lambda-calculus and combinators without getting lost in the details of the mathematical aspects of their theory. Lambda-calculus is treated here as a functional language and its relevance to computer science is clearly demonstrated. The main purpose of the book is to provide computer science students and researchers with a firm background in lambda-calculus and combinators and show the applicability of these theories to functional programming.
The book also contains the description of a software simulator of a reduction machine to show how lambda-calculus and/or combinators can be used for implementing functional languages. Reduction machines have attracted a great deal of attention in recent years, and the account given here will be very useful for the design of such machines. Worked examples and exercises will help the reader throughout the text and an extensive bibliography is included at the end.
The presentation of the material is self-contained. It can be used as a primary text for a course on functional programming. It can also be used as a supplementary text for courses on the structure and implementation of programming languages, theory of computing, or semantics of programming languages.
