Es: A Shell With Higher-Order Functions
ABSTRACT
In the fall of 1990, one of us (Rakitzis) re-implemented the Plan 9 command interpreter, rc,
for use as a UNIX shell. Experience with that shell led us to wonder whether a more general
approach to the design of shells was possible, and this paper describes the result of that
experimentation. We applied concepts from modern functional programming languages, such
as Scheme and ML, to shells, which typically are more concerned with UNIX features than
language design. Our shell is both simple and highly programmable. By exposing many of
the internals and adopting constructs from functional programming languages, we have
created a shell which supports new paradigms for programmers.
Although most users think of the shell as an interactive command interpreter, it is really
a programming language in which each statement runs a command. Because it must satisfy
both the interactive and programming aspects of command execution, it is a strange
language, shaped as much by history as by design.
-- Brian Kernighan & Rob Pike [1]

Introduction

A shell is both a programming language and the core of an interactive environment. The
ancestor of most current shells is the 7th Edition Bourne shell[2], which is characterized
by simple semantics, a minimal set of interactive features, and syntax that is all too
reminiscent of Algol. One recent shell, rc[3], substituted a cleaner syntax but kept most
of the Bourne shell’s attributes. However, most recent developments in shells (e.g., csh,
ksh, zsh) have focused on improving the interactive environment without changing the
structure of the underlying language – shells have proven to be resistant to innovation in
programming languages.

While rc was an experiment in adding modern syntax to Bourne shell semantics, es is an
exploration of new semantics combined with rc-influenced syntax: es has lexically scoped
variables, first-class functions, and an exception mechanism, which are concepts borrowed
from modern programming languages such as Scheme and ML[4, 5].

In es, almost all standard shell constructs (e.g., pipes and redirection) are translated
into a uniform representation: function calls. The primitive functions which implement
those constructs can be manipulated the same way as all other functions: invoked,
replaced, or passed as arguments to other functions. The ability to replace primitive
functions in es is key to its extensibility; for example, a user can override the
definition of pipes to cause remote execution, or the path-searching machinery to
implement a path look-up cache.

At a superficial level, es looks like most UNIX shells. The syntax for pipes, redirection,
background jobs, etc., is unchanged from the Bourne shell. Es’s programming constructs are
new, but reminiscent of rc and Tcl[6].

Es is freely redistributable, and is available by anonymous ftp from
ftp.white.toronto.edu.

Using es

Commands

For simple commands, es resembles other shells. For example, newline usually acts as a
command terminator. These are familiar commands which all work in es:

    cd /tmp
    rm Ex*
    ps aux | grep '^byron' |
        awk '{print $2}' | xargs kill -9

For simple uses, es bears a close resemblance to rc. For this reason, the reader is
referred to the paper on rc for a discussion of quoting rules, redirection, and so on.
(The examples shown here, however, will try to aim for a lowest common denominator of
shell syntax, so that an understanding of rc is not a prerequisite for understanding this
paper.)

Functions

Es can be programmed through the use of shell functions. Here is a simple function to
print the date in yy-mm-dd format:

    fn d {
        date +%y-%m-%d
    }

Functions can also be called with arguments. Es allows parameters to be specified to
functions by placing them between the function name and the open-brace. This function
takes a command cmd and arguments args and applies the command to each argument in turn:
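The es definition introduced by the sentence above is not part of this excerpt. As a rough illustration of the idea only, here is a Python model of such a higher-order function. The name apply_each and the use of Python callables as stand-ins for shell commands are assumptions made for this sketch; it is not es code.

```python
from typing import Callable, Iterable, List


def apply_each(cmd: Callable[[str], object], args: Iterable[str]) -> List[object]:
    """Call cmd once per argument, in order, and collect the results.

    cmd stands in for a shell command that takes one argument; in a shell
    the "result" would be an exit status or output, here it is whatever
    the callable returns.
    """
    return [cmd(arg) for arg in args]


if __name__ == "__main__":
    # Any one-argument callable can play the role of the command.
    print(apply_each(str.upper, ["a", "b", "c"]))  # prints ['A', 'B', 'C']
```

The point of the model is that the command is passed as an ordinary argument, so the same helper works for any command.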
    fn %interactive-loop {
        let (result = 0) {
            catch @ e msg {
                # handler: $e names the exception, $msg holds any remaining words
                if {~ $e eof} {
                    return $result
                } {~ $e error} {
                    echo >[1=2] $msg
                } {
                    echo >[1=2] uncaught exception: $e $msg
                }
                # re-run the body below after a handled exception
                throw retry
            } {
                while {} {
                    %prompt
                    let (cmd = <>{%parse $prompt}) {
                        result = <>{$cmd}
                    }
                }
            }
        }
    }
Figure 3: Default interactive loop
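The control flow of Figure 3 can be modelled outside es. The Python sketch below is an illustration under stated assumptions, not the es implementation: ShellException, interactive_loop, demo, and the parse, run, and report callbacks are hypothetical names. The outer while loop plays the role of throw retry, and the %prompt hook is omitted.

```python
from typing import Callable, Iterator, List, Tuple


class ShellException(Exception):
    """Model of an es exception: a name plus optional message words."""

    def __init__(self, name: str, *message: str) -> None:
        super().__init__(name, *message)
        self.name = name
        self.message = " ".join(message)


def interactive_loop(
    parse: Callable[[], str],
    run: Callable[[str], int],
    report: Callable[[str], None],
) -> int:
    """Read and run commands until parse() raises the eof exception."""
    result = 0
    while True:  # each new pass models re-entering the body after `throw retry`
        try:
            while True:
                cmd = parse()      # raises ShellException("eof") at end of input
                result = run(cmd)  # may raise any ShellException
        except ShellException as exc:
            if exc.name == "eof":
                return result
            if exc.name == "error":
                report(exc.message)
            else:
                report(f"uncaught exception: {exc.name} {exc.message}".rstrip())


def demo() -> Tuple[int, List[str]]:
    """Feed three made-up commands through the loop; the second one fails."""
    commands: Iterator[str] = iter(["true", "bad", "false"])
    messages: List[str] = []

    def parse() -> str:
        try:
            return next(commands)
        except StopIteration:
            raise ShellException("eof") from None

    def run(cmd: str) -> int:
        if cmd == "bad":
            raise ShellException("error", "bad:", "not", "found")
        return 0 if cmd == "true" else 1

    status = interactive_loop(parse, run, messages.append)
    return status, messages
```

In the demo, the failing command is reported and the loop carries on with the next command. The value returned at end of input is the status of the last command that ran successfully.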
%parse prints its first argument to standard error, reads a command (potentially more
than one line long) from the current source of command input, and throws the eof
exception when the input source is exhausted. The hook %prompt is provided for the user
to redefine, and by default does nothing.

Other spoofing functions which either have been suggested or are in active use include: a
version of cd which asks the user whether to create a directory if it does not already
exist; versions of redirection and program execution which try spelling correction if
files are not found; a %pipe to run pipeline elements on (different) remote machines to
obtain parallel execution; automatic loading of shell functions; and replacing the
function which is used for tilde expansion to support alternate definitions of home
directories. Moreover, for debugging purposes, one can use trace on hook functions.

Implementation

Es is implemented in about 8000 lines of C. Although we estimate that about 1000 lines
are devoted to portability issues between different versions of UNIX, there are also a
number of workarounds that es must use in order to blend with UNIX. The path variable is a
good example.

The es convention for path searching involves looking through the list elements of a
variable called path. This has the advantage that all the usual list operations can be
applied equally to path as any other variable. However, UNIX programs expect the path to
be a colon-separated list stored in PATH. Hence es must maintain a copy of each variable,
with a change in one reflected as a change in the other.

Initialization

Much of es’s initialization is actually done by an es script, called initial.es, which is
converted by a shell script to a C character string at compile time and stored
internally. The script illustrates how the default actions for es’s parser are set up, as
well as features such as the path/PATH aliasing mentioned above.

Much of the script consists of lines like:

    fn-%and = $&and
    fn-%append = $&append
    fn-%background = $&background

which bind the shell services such as short-circuit-and, backgrounding, etc., to the
%-prefixed hook variables.

There is also a set of assignments which bind the built-in shell functions to their hook
variables:

    fn-. = $&dot
    fn-break = $&break
    fn-catch = $&catch

The difference with these is that they are given names invoked directly by the user; “.”
is the Bourne-compatible command for “sourcing” a file.

Finally, some settor functions are defined to work around UNIX path searching (and other)
conventions. For example,

    set-path = @ {
        local (set-PATH = )
            PATH = <>{%flatten : $*}
        return $*
    }
    set-PATH = @ {
        local (set-path = )
            path = <>{%fsplit : $*}
        return $*
    }

A note on implementation: these functions temporarily assign their opposite-case settor
cousin to null before making the assignment to the opposite-case variable. This avoids
infinite recursion between the two settor functions.

The Environment

UNIX shells typically maintain a table of variable definitions which is passed on to
child processes when they are created. This table is loosely referred to as the
environment or the environment variables. Although traditionally the environment has been
used to pass values of variables only, the duality of functions and variables in es has
made it possible to pass down function definitions to subshells. (While rc also offered
this functionality, it was more of a kludge arising from the restriction that there was
not a separate space for “environment functions.”)

Having functions in the environment brings them into the same conceptual framework as
variables – they follow identical rules for creation, deletion, presence in the
environment, and so on. Additionally, functions in the environment are an optimization
for file I/O and parsing time. Since nearly all shell state can now be encoded in the
environment, it becomes superfluous for a new instance of es, such as one started by
xterm(1), to run a configuration file. Hence shell startup becomes very quick.

As a consequence of this support for the environment, a fair amount of es must be devoted
to “unparsing” function definitions so that they may be passed as environment strings.
This is complicated a bit more because the lexical environment of a function definition
must be preserved at unparsing. This is best illustrated by an example:

    es> let (a=b) fn foo {echo $a}

which lexically binds b to the variable a for the scope of this function definition.
Therefore, the external representation of this function must make this information
explicit. It is encoded as:
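The encoded string itself is not reproduced in this excerpt, and the sketch below does not attempt to reconstruct it. It only models the underlying idea in Python: a function body and its captured bindings are flattened into a single string, and a child process can rebuild a working function from that string alone. The JSON layout, the helper names unparse and reparse, and the use of a format template as a stand-in for shell code are all assumptions made for illustration. Only the fn- prefix follows the assignments shown earlier.

```python
import json
from typing import Callable, Dict, List


def unparse(params: List[str], body: str, bindings: Dict[str, str]) -> str:
    """Flatten a function and its captured bindings into one string.

    body is a format template such as "echo {a}"; it stands in for shell
    code. The JSON layout is hypothetical and is NOT the es encoding.
    """
    record = {"bindings": bindings, "params": params, "body": body}
    return json.dumps(record, sort_keys=True)


def reparse(text: str) -> Callable[..., str]:
    """Rebuild a callable from a string produced by unparse()."""
    record = json.loads(text)

    def function(*args: str) -> str:
        scope = dict(record["bindings"])           # captured lexical bindings
        scope.update(zip(record["params"], args))  # parameters shadow them
        return record["body"].format(**scope)

    return function


# Parent: export the function through an environment-like table of strings.
environment = {"fn-foo": unparse([], "echo {a}", {"a": "b"})}

# Child: only the string survives, yet the binding of a is still available.
foo = reparse(environment["fn-foo"])

if __name__ == "__main__":
    print(foo())  # prints echo b
```

Because the binding travels inside the string, the rebuilt function does not depend on any variable named a existing in the child.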
region is disabled (see footnote 3). Thus, any reference to a pointer in garbage collector
space which could be invalidated by a collection immediately causes a memory protection
fault. We strongly recommend this technique to anyone implementing a copying garbage
collector.

Footnote 3: This disabling depends on operating system support.

There are two performance implications of the garbage collector; the first is that,
occasionally, while the shell is running, all action must stop while the collector is
invoked. This takes roughly 4% of the running time of the shell. More serious is that at
the time of any potential allocation, either the collector must be disabled, or all
pointers to structures in garbage collector memory must be identified, effectively
requiring them to be in memory at known addresses, which defeats the registerization
optimizations required for good performance from modern architectures. It is hard to
quantify the performance consequences of this restriction.

The garbage collector consists of about 250 lines of code for the collector itself (plus
another 300 lines of debugging code), along with numerous declarations that identify
variables as being part of the rootset and small (typically 5-line) procedures to
allocate, copy, and scan all the structure types allocated from collector space.

Future Work

There are several places in es where one would expect to be able to redefine the built-in
behavior and no such hook exists. The most notable of these is the wildcard expansion,
which behaves identically to that in traditional shells. We hope to expose some of the
remaining pieces of es in future versions.

One of the least satisfying pieces of es is its parser. We have talked of the distinction
between the core language and the full language; in fact, the translation of syntactic
sugar (i.e., the convenient UNIX shell syntax presented to the user) to core language
features is done in the same yacc-generated parser as the recognition of the core
language. Unfortunately, this ties the full language in to the core very tightly, and
offers little room for a user to extend the syntax of the shell.

We can imagine a system where the parser only recognizes the core language, and a set of
exposed transformation rules would map the extended syntax which makes es feel like a
shell, down to the core language. The extend-syntax [9] system for Scheme provides a good
example of how to design such a mechanism, but it, like most other macro systems designed
for Lisp-like languages, does not mesh well with the free-form syntax that has evolved
for UNIX shells.

The current implementation of es has the undesirable property that all function calls
cause the C stack to nest. In particular, tail calls consume stack space, something they
could be optimized not to do. Therefore, properly tail recursive functions, such as
echo-nl above, which a Scheme or ML programmer would expect to be equivalent to looping,
have hidden costs. This is an implementation deficiency which we hope to remedy in the
near future.

Es, in addition to being a good language for shell programming, is a good candidate for
use as an embeddable “scripting” language, along the lines of Tcl. Es, in fact, borrows
much from Tcl – most notably the idea of passing around blocks of code as unparsed
strings – and, since the requirements on the two languages are similar, it is not
surprising that the syntaxes are so similar. Es has two advantages over most embedded
languages: (1) the same code can be used by the shell or other programs, and many
functions could be identical; and (2) it supports a wide variety of programming
constructs, such as closures and exceptions. We are currently working on a “library”
version of es which could be used stand-alone as a shell or linked in other programs,
with or without shell features such as wildcard expansion or pipes.

Conclusions

There are two central ideas behind es. The first is that a system can be made more
programmable by exposing its internals to manipulation by the user. By allowing spoofing
of heretofore unmodifiable shell features, es gives its users great flexibility in
tailoring their programming environment, in ways that earlier shells would have supported
only with modification of shell source itself.

Second, es was designed to support a model of programming where code fragments could be
treated as just one more form of data. This feature is often approximated in other shells
by passing commands around as strings, but this approach requires resorting to baroque
quoting rules, especially if the nesting of commands is several layers deep. In es, once
a construct is surrounded by braces, it can be stored or passed to a program with no fear
of mangling.

Es contains little that is completely new. It is a synthesis of the attributes we admire
most from two shells – the venerable Bourne shell and Tom Duff’s rc – and several
programming languages, notably Scheme and Tcl. Where possible we tried to retain the
simplicity of es’s predecessors, and in several cases, such as control flow constructs,
we believe that we have simplified and generalized what was found in earlier shells.

We do not believe that es is the ultimate shell. It has a cumbersome and non-extensible
syntax, the support for traditional shell notations forced some
unfortunate design decisions, and some of es’s features, such as exceptions and rich
return values, do not interact as well with UNIX as we would like them to. Nonetheless,
we think that es is successful as both a shell and a programming language, and would miss
its features and extensibility if we were forced to revert to other shells.

Acknowledgements

We’d like to thank the many people who helped both with the development of es and the
writing of this paper. Dave Hitz supplied essential advice on where to focus our efforts.
Chris Siebenmann maintained the es mailing list and ftp distribution of the source. Donn
Cave, Peter Ho, Noel Hunt, John Mackin, Bruce Perens, Steven Rezsutek, Rich Salz, Scott
Schwartz, Alan Watson, and all other contributors to the list provided many suggestions,
which along with a ferocious willingness to experiment with a not-ready-for-prime-time
shell, were vital to es’s development. Finally, Susan Karp and Beth Mitcham read many
drafts of this paper and put up with us while es was under development.

References

1. Brian W. Kernighan and Rob Pike, The UNIX Programming Environment, Prentice-Hall, 1984.
2. S. R. Bourne, “The UNIX Shell,” Bell Sys. Tech. J., vol. 57, no. 6, pp. 1971-1990, 1978.
3. Tom Duff, “Rc – A Shell for Plan 9 and Unix Systems,” in UKUUG Conference Proceedings,
   pp. 21-33, Summer 1990.
4. William Clinger and Jonathan Rees (editors), The Revised^4 Report on the Algorithmic
   Language Scheme, 1991.
5. Robin Milner, Mads Tofte, and Robert Harper, The Definition of Standard ML, MIT Press,
   1990.
6. John Ousterhout, “Tcl: An Embeddable Command Language,” in Usenix Conference
   Proceedings, pp. 133-146, Winter 1990.
7. Jon L. Bentley, More Programming Pearls, Addison-Wesley, 1988.
8. David R. Hanson, “Fast allocation and deallocation of memory based on object
   lifetimes,” Software: Practice and Experience, vol. 20, no. 1, pp. 5-12, January 1990.
9. R. Kent Dybvig, The Scheme Programming Language, Prentice-Hall, 1987.

Author Information

Paul Haahr is a computer scientist at Adobe Systems Incorporated where he works on font
rendering technology. His interests include programming languages, window systems, and
computer architecture. Paul received an A.B. in computer science from Princeton
University in 1990. He can be reached by electronic mail at haahr@adobe.com or by surface
mail at Adobe Systems Incorporated, 1585 Charleston Road, Mountain View, CA 94039.

Byron Rakitzis is a system programmer at Network Appliance Corporation, where he works on
the design and implementation of their network file server. In his spare time he works on
shells and window systems. His free-software contributions include a UNIX version of rc,
the Plan 9 shell, and pico, a version of Gerard Holzmann’s picture editor popi with code
generators for SPARC and MIPS. He received an A.B. in Physics from Princeton University
in 1990. He has two cats, Pooh-Bah and Goldilocks, who try to rule his home life. Byron
can be reached at byron@netapp.com or at Network Appliance Corporation, 2901 Tasman
Drive, Suite 208, Santa Clara, CA 95054.