Literate Programming in C: The CWEB System of Structured Software Documentation
3. The heart of the program is this simple loop. When we reach the end of one of the files, the files match if
and only if the other file has also reached its end. For this reason the test c1 == c2, which requires characters
to be read from both files, must precede the test for file end; when only one file ends, it is the former test
which breaks the loop.
⟨Search for first difference, leaving c1 != c2 if and only if a difference was found 3⟩ ≡
while ((c1 = getc(f1)) == (c2 = getc(f2)) && c1 != EOF)
  if (c1 == '\n') { ++line; col = left_margin; }
  else ++col;
This code is used in section 2.
4. When the first difference occurs at the end of one of the files, or at the end of a line, we give a message
indicating this fact.
⟨Report the outcome of the comparison 4⟩ ≡
if (c1 == c2) printf("Files match.\n");
else
{ printf("Files differ.\n");
  if (c1 == EOF || c2 == EOF)
  { the_file(c1 == EOF); printf("is contained in the other as initial segment.\n"); }
  else if (c1 == '\n' || c2 == '\n')
  { the_file(c1 == '\n'); printf("has a shorter line number %ld than the other.\n", line); }
  else printf("First difference at line %ld, column %d.\n", line, col);
}
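For readers who want to try the logic of §3 and §4 outside CWEB, here is a minimal sketch that combines the two fragments into one ordinary C function. It is not part of compare.w: the names compare_streams, struct result, enum outcome and LEFT_MARGIN are hypothetical, line is assumed to start at 1 and the left margin is assumed to be 1 (so that col is the 1-based column of the first difference), and the outcome is returned as a code instead of being printed.

```c
#include <stdio.h>

/* Assumed value: with 1, col ends up as the 1-based column of the first
   difference (the actual left_margin of compare.w is not shown here). */
#define LEFT_MARGIN 1

/* Hypothetical result codes, one per kind of message printed in section 4. */
enum outcome { FILES_MATCH, PREFIX, SHORTER_LINE, DIFFER_AT };

struct result {
    enum outcome kind;
    long line;        /* line on which the loop stopped */
    int col;          /* column on which the loop stopped */
    int first_ended;  /* for PREFIX and SHORTER_LINE: did f1 end first? */
};

struct result compare_streams(FILE *f1, FILE *f2)
{
    int c1, c2;
    long line = 1;
    int col = LEFT_MARGIN;
    struct result r;

    /* Section 3: the test c1 == c2 must come before the test for EOF. */
    while ((c1 = getc(f1)) == (c2 = getc(f2)) && c1 != EOF) {
        if (c1 == '\n') { ++line; col = LEFT_MARGIN; }
        else ++col;
    }

    /* Section 4: classify the outcome instead of printing it. */
    r.line = line;
    r.col = col;
    r.first_ended = 0;
    if (c1 == c2)
        r.kind = FILES_MATCH;              /* both files ended together */
    else if (c1 == EOF || c2 == EOF) {
        r.kind = PREFIX;                   /* one file is an initial segment */
        r.first_ended = (c1 == EOF);
    } else if (c1 == '\n' || c2 == '\n') {
        r.kind = SHORTER_LINE;             /* one line ended early */
        r.first_ended = (c1 == '\n');
    } else
        r.kind = DIFFER_AT;                /* ordinary character mismatch */
    return r;
}
```

Returning a code rather than printing keeps the sketch easy to test; the message texts of §4 map one-to-one onto the four outcome values.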
n;
++arg; /* ignore argument 0, which is the program name */
if (n == 0)
{ open_file(&f1, "First file to compare", NULL);
  open_file(&f2, "Second file to compare", NULL);
}
else if (n == 1)
{ f1 = stdin;
  if ((f2 = fopen(*arg, read_mode)) == NULL)
  { printf("Could not open file %s.\n", *arg); exit(1); }
}
else if (n == 2)
{ open_file(&f1, "Give another first file", *arg++);
  open_file(&f2, "Give another second file", *arg);
}
8. Index.
arg: 2, 6.
bool: 1, 5.
buf: 7.
col: 2, 3, 4.
c1: 2, 3, 4.
c2: 2, 3, 4.
EOF: 3, 4.
exit: 6.
f: 7.
fflush: 7.
fopen: 6, 7.
f1: 2, 3, 6.
f2: 2, 3, 6.
getc: 3.
is_first: 5.
left_margin: 2, 3.
line: 2, 3, 4.
main: 2.
n: 2.
name: 7.
open_file: 6, 7.
printf: 4, 5, 6, 7.
prompt: 7.
read_mode: 6, 7.
scanf: 7.
stdin: 6.
stdout: 7.
the_file: 4, 5.
8 WHAT A CWEB PROGRAM LOOKS LIKE CWEBx MANUAL
Some remarks about the example program Reading the program should not cause great problems to
anyone familiar with the C language, once one gets used to the representation of the symbols. We mention
a number of points that will have become clear in the course of the example.
The commentary text at the beginning of the sections is set in ordinary paragraphs, which contrasts
sufficiently with the appearance of the program text that the dividing line between the two can be easily
perceived, even though it is only marked by a bit of white space. In case the section defines (part of) a
named module, the module name heading the program fragment is set flush left, and is followed by ≡,
or in case this is not the first defining occurrence of that name, by +≡ (therefore, the occurrence of the
module name ⟨Functions 5⟩ in §2 is not a defining one, whereas the occurrences of that name in §5 and §7
are). The style in which the module names and comments contained in the program fragments are set is
similar to that of ordinary text; indeed if they are too long to fit on the line, they will be broken across lines
(with proper care taken to respect the indentation level). In CWEB embedded comments are always attached
to the right of a program element (usually a statement or declaration); in the example we can see there
is relatively little need for embedded comments, because of the other means provided for documentation.
An embedded comment that is split across lines will not look very good, and should only occur in cases of
emergency; in most cases it is better to use the commentary part at the beginning of the section for any
elaborate explanation. On the other hand, long module names (occupying up to about four lines) are not
uncommon when the task performed by the module calls for an extensive description.
As one can see in the example, it is common to refer to small pieces of C code (in most cases just variables
or simple expressions) from within the commentaries, module names and comments. The CWEB system makes
it easy to include such pieces, by providing a variant of the formatting routines used for the actual program
fragments (differing from them by the omission of any 2-dimensional layout features such as indentation).
In many cases the pieces of C code are so simple that they could easily be typeset directly (using TeX's
math mode), producing the same formatted output without using the facilities of CWEB. But even then it is
preferable to use CWEB instead, because it will then guarantee that all identifiers mentioned in such a way
in the documentation part of a section or in a comment will be included in the index at the end of the
program. Although in many cases a reference would have been generated anyway by the program fragment
in the same section (as happens in all cases for our example program), this mechanism ensures that even
remarks about the use of variables and functions made in sections that contain no program fragment at all
can be traced from the index. Incidentally, identifiers that are used only in a module name are not indexed,
which is why there is no reference to §2 in the index entry for exit.
When an index entry is recorded, whether from within a program fragment or a piece of C code embedded
in text, the occurrence may be flagged as defining, depending on the context; this happens for instance
in the case of parameters in an ANSI/ISO style function heading, of variable declarations and of labels. If
at least one occurrence of an identifier in some section is a defining one, then the corresponding section
number in the index entry for that identifier will be underlined. Single-letter identifiers, the special identifier
NULL (appearing as Λ), and keywords of the language are considered so ubiquitous that no index references
for them are generated, except those that are underlined; e.g., in the example there is no reference to
§6 for the variable n. For keywords this means that they will not appear in the index at all (unless the
programmer explicitly marks certain occurrences as defining); note however that identifiers defined in a
typedef declaration (like bool in the example) will be indexed, even though they are set in boldface just
like keywords are.
Further attributes of CWEB programs An aspect of CWEB programs that does not stand out very clearly
in our miniature example is that it allows sets of related sections to be grouped together into chapters.
Each chapter is identified by its title, which appears in boldface after the number of its first section; in
our example sections 1 and 8 start new chapters. The division into chapters has a few more effects on the
document, which were suppressed in our example, since they would interfere with the overall structure of
this manual: each chapter starts on a fresh page, its title appears in the running head of all its pages, and all
chapter titles are collected in the table of contents. (Style changes such as employed in this manual are easy
to obtain, since the style is not determined by the CWEB system, but rather by a separate format consisting
of TeX macros; a few small changes to the standard format can change the overall appearance of the document,
and it would be equally easy to change for instance the page size or the symbols used to represent operators.)
There is one important point left to explain about the example, which is the special position of the lines
starting with #define and #include. Although they look like ordinary preprocessor lines, which could
have been included in the program fragments, they are in fact separate items that are given between the
documentation part and the program part of a section (this can be seen best in §6), forming a third type of
constituent of sections (although in most sections they will be absent). Their place in CWEB is less distinctive
than that of their analogues in WEB systems for languages that have no preprocessor (like the original WEB
for Pascal, which provides a separate macro facility itself): indeed the directives are just passed on to the
C preprocessor. Yet there is some advantage in specifying them as special items to CWEB, and in most cases
using these facilities is preferable to embedding the directives in the C program fragments.
One reason is that one usually wants the effects of preprocessor directives to be visible throughout the
C file that is generated, while this would not always be the case if they were specified inside the program
fragments; for instance if the definition of read_mode in §6 had been included in the program fragment, it
could not have been validly used in §7, because that section will precede §6 in the C file produced. This
difficulty could be overcome by collecting all macro definitions in a module that is used at the start of the
program and defined in many sections throughout the CWEB document. In fact this is just about how CWEB
treats the separately specified preprocessor directives: they are collected in order of appearance, and placed
at the very beginning of the C file. (Some other place of insertion for the preprocessor directives can be
specified by means of a pseudo-module named ⟨Preprocessor directives⟩, but this is quite rare.) Since a
section can define only one module, the CWEB facility for preprocessor directives may help avoid having to
split up sections merely because they contain such a directive. Furthermore, an important reason to specify
#include directives to CWEB is that this allows it to inspect those header files for any typedef declarations,
so that programs can be formatted properly; without this, programs using typedef identifiers defined in
header files would seriously confuse the syntax analysis that CWEB performs, resulting in very poor quality
formatting of program fragments.
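Why typedef names matter so much to this syntax analysis can be seen in plain C: the same token pattern T * p is a declaration when T is a typedef name, and a multiplication when T is an ordinary variable. The following sketch (the names T and typedef_demo are hypothetical) shows both readings in one function; a formatter that does not know which identifiers are typedef names cannot tell them apart.

```c
/* In a real program this typedef would typically live in a header file,
   which CWEAVE would scan because the header was named after @h. */
typedef int T;

int typedef_demo(void)
{
    int v = 6;
    T * p = &v;            /* T is a type here: this declares p as a pointer to int */
    {
        int T = 7, q = 3;  /* this T hides the typedef: it is an ordinary variable */
        int r = T * q;     /* the same shape of tokens is now a multiplication */
        return *p + r;     /* 6 + 7 * 3 */
    }
}
```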
Preprocessor directives other than those mentioned above can only be incorporated in a program by
including them in an ordinary program module, but there is relatively little need for such directives. In
situations where one would use conditional compilation in ordinary C, one can usually use the change file
mechanism provided by CWEB instead (this will be discussed below), especially if it involves system dependent
modifications; this has the advantage that such modifications do not affect the main source files, and only
those modifications that are actually applied will be visible in the CWEB document. In the rare cases that
one does include a preprocessor directive in a program fragment, the fact that it is not being specified as a
separate item to CWEB is usually easy to recognise in the CWEB document, because the module name being
defined or some program text precedes it; however even if this should not be the case then such embedded
directives can still be distinguished by a slight difference in horizontal and vertical spacing.
Output to multiple files There is one important construction one may encounter in CWEB documents,
that we have not mentioned yet. There may be module names that consist of a file name in typewriter type,
like ⟨common.h 14⟩; usually such module names are nowhere referenced, but only have one or more defining
occurrences. CWEB documents containing such a module will produce a file of that name in addition to the
C program that is normally produced. The module bearing the name of the file will form the root module
of the C code written to that file, in the same way as the unnamed module forms the root module for the
ordinary output. This feature is particularly useful for the production of header files that can be included
by other compilation units (and even by the program produced as main output). It allows one for instance
to state function prototype declarations that go to the header file and the matching function definitions in
the C program in the immediate vicinity of one another within the CWEB document. The module with the
file name can refer to submodules, and so on to any depth, just like the modules contributing to the main
output. This possibility should be used with some restraint however, lest readers have difficulty finding
out to which file the program fragment defined by some module will be sent. The preprocessor lines that
are handled by CWEB will normally only become part of the main program output, not of any additional
output files; this provides one valid reason for sometimes bypassing the facilities of CWEB, and incorporating
#define and #include directives directly into program modules.
4 How to create a CWEB program
In the previous section we have explained how one should read CWEB documents; in this section we shall
discuss how they can be written. The CWEB document we have been discussing is the printed text that is
eventually produced from the source file written by the programmer, but that file does not look quite like
the printed version; on the other hand the difference in appearance is not so great that there is any difficulty
finding the place in the source file corresponding to some part of the printed text.
The general setup The programmer creates a plain text file using the format explained below, which
contains both program fragments and commentary, and has file name extension .w; e.g., the file from
which the example above was produced is compare.w (it is included in the CWEBx distribution). The CWEB
system consists of two utility programs CTANGLE and CWEAVE that can be applied to this source file. In
order to create an executable program, one issues the command ctangle compare, which will read the
file compare.w and write a file compare.c containing the corresponding C program. This file can then be
processed in the ordinary way by any C compiler to produce an executable program. To produce a printed
document on the other hand, one issues the command cweave compare, which will again read the file
compare.w, and this time write a file compare.tex. This file serves as input for the typesetting program TeX:
by giving the command tex compare it will be processed, and the result is a file compare.dvi. This file
can be either previewed or converted to hardcopy output by the system dependent programs for this purpose
that accompany TeX. Despite the somewhat elaborate processing trajectories, it will become apparent that
the programmer has good control over the final result produced in both cases.
A word of explanation about the names of CWEB and its constituent programs. The initial C's stand
for the programming language, of course; the rest of the names are the same as those chosen by Knuth
for the original WEB system (which existed long before the World Wide Web). The CWEB language allows
one to separately describe small parts of a C program and their interconnections, both formal (via module
references) and informal (by some semantic relationship); with some imagination this evokes the image of a web
of connected pieces. These parts are linearised quite differently in their presentation for human readability
than in the official form in which they are presented to the C compiler, and it is the program CTANGLE that
does the somewhat complicated reordering to obtain the latter from the former. This process is traditionally
called tangling the code, although one could also call it untangling if one prefers formal to human order.
The CWEAVE program intertwines the TeX and C parts of the source text and weaves them together like
warp and weft, resulting in a beautifully formatted document. Despite these pretty metaphors, you will be
forgiven if you sometimes get these names mixed up.
This general organisation of CWEB has some immediate consequences. First of all, one needs to have
an operational TeX system and (not surprisingly) a C compiler in order to use CWEB; the CWEB programs
form only a comparatively small part of the utilities needed. Second, the CWEB language must be such that
both valid C code and TeX input can be derived mechanically from it, which are rather different formats.
Nevertheless the CWEB language is quite simple: this is because for almost all of the CWEB source text the
required format is either that of TeX or that of C. The main function of the specific CWEB commands is
to structure the source file and determine which parts of the input will be processed further in what way.
Finally, a somewhat unfortunate consequence of CWEB's setup is that errors may be detected by any one of
CTANGLE, the C compiler, CWEAVE and TeX. The knowledge about C and TeX built into the CWEB programs
is far from sufficient to ensure that they will always produce error-free output code, although of course they
do their best not to introduce any errors themselves. A bright point in the case of C errors is that the
#line directives produced by CTANGLE enable the compiler to refer directly to lines in the CWEB source file
in its error messages, rather than to the intermediate C file (but TeX does not have a similar facility).
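The #line directive is standard C, and its effect is easy to observe. In the following minimal sketch the file name compare.w and the line number 42 are made up for illustration; it is not literal CTANGLE output, only a directive of the same kind. The names where_line, where_file and report_position are hypothetical.

```c
#include <stdio.h>

/* A directive of the kind CTANGLE writes before tangled code. From here on
   the compiler acts as if the next line were line 42 of compare.w. */
#line 42 "compare.w"
static const int where_line = __LINE__;
static const char *const where_file = __FILE__;

/* Diagnostics, as well as __LINE__ and __FILE__, now refer to compare.w. */
void report_position(void)
{
    printf("%s:%d\n", where_file, where_line);
}
```

A compiler error on the line after the directive would likewise be reported as being on line 42 of compare.w, which is exactly what lets the programmer go straight to the CWEB source.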
It follows from these facts that the CWEB programmer must be acquainted both with C and with TeX;
however, the depth of the knowledge required is not the same in both cases. Obviously, one cannot write a
computer program without a good understanding of the programming language used, but a very superficial
knowledge of TeX will suffice: in most cases no TeXpertise beyond the basic facts in chapters 2–6 of The
TeXbook is required (but please don't skip chapter 2, as only too many people have done). The reason
for this is that one rarely needs to instruct TeX to do sophisticated formatting. It is true that the proper
typesetting of computer programs is a subtle matter, but it is precisely this part that is taken care of by
CWEAVE (even for references to C constructs in the commentary), and the programmer can just concentrate
on writing syntactically correct C code. On the other hand the full power of TeX is available if one wishes
to use it, for instance to illuminate the program with things like complicated tables, or math formulae of a
different nature than those occurring in a computer program.
Since the CWEB commands deal only with the structure of the source file, not with its contents, they can
be very brief: they consist of @ followed by one other character, and are commonly referred to as control
codes. For instance, @ (i.e., @ followed by white space) indicates the start of a new section, and @c
marks the start of the C part of a section that contributes to the unnamed module. Control codes may be
placed at any position within the source lines, although it is customary to place the ones defining the coarse
structure of the source file at the beginning of a line for better visibility. In some cases a control code marks
the beginning of a piece of text that will be interpreted by CWEB in a special way, as for instance @< which
starts a module name; the end of these control texts is always marked by the special code @>. The character
@ was selected because it is quite uncommon both in C and in TeX source code, but in those cases where
one does need to pass on the character itself (e.g., in C strings and comments) it should be written as @@.
We now discuss the various control codes, grouped by their function. Here we shall treat only the most
important control codes, which are used regularly in ordinary programs. Treatment of a number of additional
control codes, that either serve for fine tuning in special cases, or are intended to allow emergency fixes in
unforeseen cases, is deferred to a later section, in order not to confuse novice CWEB users. For the codes that
are discussed, we do however provide full details of their use; most of these can be skipped on first reading.
A summary of all CWEB control codes can be found at the end of this manual.
Sectioning codes: @*, @ , @~ The most important control codes are those that specify the division
of the CWEB program into sections. There are three codes that indicate the start of a new section, and are
therefore called sectioning codes. Each of them has a slightly different effect, and each section must start
with one of them (i.e., a section is never implicitly started). The three sectioning codes are @*, @ , and
@~, of which the second one is the most commonly used. No section numbers should be given in the source
file: these will be automatically computed and inserted by CWEB. A tab or newline following @ is considered
equivalent to a space, and for any of these three control codes, (further) white space separating it from the
TeX text that follows is ignored, as long as there is no completely blank line (which TeX would interpret as
the end of the paragraph that started with the section number).
A section starting with @* will start a new chapter of the CWEB document; it should be followed by the
title of the chapter, which is terminated by the two-character sequence ‘. ’ (again the space might be any
white space character). The title is not recognised by CWEB itself, but rather by TeX, as a delimited macro
argument. This means that if one wants to have an occurrence of the sequence ‘. ’ in the title itself, this
can be achieved by enclosing the title (but not the ‘. ’ terminating it) in braces. If one wants to put other
things than plain text in a chapter title, one should be aware that it is converted to upper case in the running
heads of pages and also written to the table of contents file; only items that behave properly under these
operations should be used in a chapter title. Apart from issuing a title that will appear in several places, a
section starting a chapter will force a page break before it, and it will cause the section number to be printed
on the terminal during the execution of CTANGLE and CWEAVE, as a progress report.
As a feature for advanced users of CWEB, some extra information may be supplied with the @* control
code: if it is immediately followed by * or by a decimal number, then this is not included in the chapter
title, but rather interpreted as an indication of the level of the chapter. Here @** indicates the start
of a grouping of sections even coarser than a chapter, and the grouping started by @*n becomes finer as
n increases, with @*0 corresponding to unadorned @*. The effect of this level depends on the definition
of the TeX macros that format the chapter title and the lines in the table of contents, \N respectively
\contentsline, to which the level is passed as first argument (for @** the level is −1); it could affect for
instance the font used for the chapter title or the amount of indentation of that title in the table of contents.
In the default definitions of these macros the level is largely ignored, except that @*n will not force a page
break for n ≥ s, where s is the value of the \secpagedepth register, which is set initially to 2.
Because the title is read by TeX as a delimited macro argument, if no correctly specified title follows @*,
then CWEAVE will find nothing wrong, but TeX will complain about a “Runaway argument” of a macro that
the programmer did not explicitly write (namely \N); this is one of the scarier error messages that novice
users can come across, so please be warned.
In contrast to @*, a section starting with @~ instead of @ will tie itself to the previous section, in
the sense that a page break between these sections will be avoided. More precisely this is what happens:
normally CWEAVE will instruct TeX to break pages only between sections (except when one is too large to
fit on a single page) and put as many sections on each page as possible subject to this restriction; however,
a section starting with @~ will be considered to be a continuation of the previous section for the purpose of
page breaking. A situation where one would use @~ is the following: suppose we define a function, and
also want to state its prototype, which will belong to a different module, since it has to appear earlier in the
program or even in a separate (header) file. A natural place to give the prototype in the CWEB document is
directly before the function, so that it can easily be seen that the prototype matches the actual definition.
Now without special measures there is a substantial chance that a page break will occur between these two
sections, since the short section with the prototype might fit on an already partially filled page, whereas the
larger section with the definition might not. By starting the latter section with @~, it can be achieved that
in such cases the former section is moved together with the latter to the new page.
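In the tangled C file the situation looks roughly as follows. This is a hypothetical sketch (count_lines and total_lines are invented names): the prototype must come early because count_lines is called before its definition appears. In a CWEB document the prototype and the definition would belong to different modules, yet they could be given in adjacent sections, the second one started with @~.

```c
/* From a hypothetical prototypes module, tangled near the top of the file. */
static long count_lines(const char *s);

/* Code that calls count_lines before its definition appears in the file. */
long total_lines(const char *a, const char *b)
{
    return count_lines(a) + count_lines(b);
}

/* From a hypothetical functions module: the definition, which must agree
   with the prototype above. */
static long count_lines(const char *s)
{
    long n = 0;
    for (; *s != '\0'; ++s)
        if (*s == '\n') ++n;
    return n;
}
```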
Like any other section the very first section starts with a sectioning code (usually @*), and any text
that might precede it is not part of any section; this text is said to be in limbo. This material is ignored
by CTANGLE, and copied literally into the TeX file by CWEAVE (except for the replacement of @@ by @),
following the first line which always reads \input cwebxmac (in order to load the standard format). The
purpose of the text in limbo is to allow issuing TeX commands that apply to the whole document (such as
macro definitions, possibly modifications or additions to the standard format), or producing a title page or
an introduction preceding the sections of the CWEB document. No control codes are allowed in the limbo
text (well, almost; there are two exceptions, that will be mentioned below). The last section ends simply at
the end of the CWEB source file; there is no way to add material after it (or elsewhere outside the sections).
However, CWEAVE will append some material at the end itself (unless it is invoked with the x flag): an index
of identifier uses, a list of module names, and a table of contents. Because the index is seamlessly attached
to the last section, it is customary to give that section the title Index and not to include any program
fragment in it.
Subsectioning codes: @d, @h, @f, @c, @< . . . @>= Each section, as delimited by the sectioning
codes, contains a TeX part (although it may be empty), and in addition at most one C part, which always
comes at the end of the section, and zero or more intermediate parts, of which there are three kinds: those
that specify #define directives, those that specify #include directives, and format definitions (see below).
Intermediate parts can be given in an arbitrary order, as long as they come after the TeX part and before
the C part, if present. The beginning of any part other than the TeX part (which starts directly after the
sectioning code) is marked by an appropriate control code, which is called a subsectioning code; these codes
are optional in the sense that they need only be given if the corresponding part is present. The end of the
TeX part is determined by the first subsectioning code, or in absence of any of them by the next sectioning code.
The C part, if present, begins at the first occurrence of @< or @c; the former starts a defining occurrence
of a module name, and the latter is used when the C part belongs to the unnamed module. The code @c may
also be written as @C (in fact all alphabetic codes are equivalent to their upper case counterparts). Once the
C part of a section is started, any further module names are interpreted as module references rather than
as defining occurrences. A module name, whether defining or not, consists of TeX code between the @<
and the next occurrence of @>. As a measure against accidental misinterpretation of module names, due
for instance to a forgotten @ or @c, the closing @> of a defining occurrence must be followed (optionally
with some white space in between, but no newline) by one of =, ==, += and +==, while for a non-defining
occurrence this must not be the case. The possibilities += and +== are included for those who like their
source code for continuations of modules to resemble the printed output, but the distinction is ignored by
CWEB: it will simply print ≡ after the first defining occurrence of a module name and +≡ after any further
defining occurrences.
The subsectioning codes that mark the beginning of intermediate parts are @d, @h, and @f. Of these
the first two specify preprocessor directives for respectively a macro definition and the inclusion of a header
file, and the last specifies a so-called format definition. The codes @d and @h will be replaced by #define
respectively by #include in both the program and the printed document. We already mentioned how the
effect of using @d or @h differs from that of using #define or #include directly in the C part of the
section: the directive will be moved to the beginning of the C file, and in case of @h, the header file will
be scanned for typedef definitions. Here we mention a few more points that are relevant when writing the
source file.
Macro definitions following @d are not line-oriented like those in C: everything up to the next subsectioning
or sectioning code is considered to belong to the macro, and newlines need not be escaped, as CTANGLE
will take care of escaping any newlines while writing to the C file. There are some mild restrictions on the
replacement text of a @d macro definition: parentheses and braces should be balanced (this is a deliberate
requirement, made in order to allow detection of programming errors that would otherwise be very hard to
track; the same requirement also holds for each complete C part of a section), and no module names should
be referenced. It is not possible to use other preprocessor directives in macro definitions either, but that is
because this is already impossible in C. After a @h command, at least one newline should occur before the
next sectioning or subsectioning code.
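In the C file itself, every newline inside such a macro has to be escaped with a backslash. The sketch below shows a multi-line macro in the form the C compiler needs; in the CWEB source the same body could simply follow @d without any backslashes. The names swap_ints and sort_pair are hypothetical, and the exact line layout CTANGLE produces may differ from this. Note that parentheses and braces in the replacement text are balanced, as CWEB requires.

```c
/* Each line of the replacement text except the last ends in a backslash,
   which joins it to the next line; CTANGLE inserts these automatically. */
#define swap_ints(a, b)      \
    do {                     \
        int swap_tmp_ = (a); \
        (a) = (b);           \
        (b) = swap_tmp_;     \
    } while (0)

/* Put two integers in ascending order using the macro. */
void sort_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        swap_ints(*lo, *hi);
}
```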
Apart from this, @d and @h are followed by whatever would follow #define respectively #include,
with the same deviant lexical rules as in C. So whether a macro introduced by @d is defined with or without
arguments depends on whether the first character after the identifier following @d is a left parenthesis or
not, where spaces are significant. The file name after @h may be enclosed either in double quotes or in
angle brackets; the latter indicates that the header file is located in some system include file area. After the
file name a comment may be placed.
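The rule about the left parenthesis is the ordinary C one, so it can be illustrated with #define; in CWEB the same text would simply follow @d. In this sketch the macro names twice and plus_y and the function use_macros are hypothetical.

```c
/* Function-like macro: the '(' follows the name immediately. */
#define twice(x) (2 * (x))

/* Object-like macro: because of the space, "(y) + 1" is the whole
   replacement text, and (y) is not a parameter list. */
#define plus_y (y) + 1

int use_macros(int y)
{
    /* Expands to (2 * (5)) + (y) + 1, which uses the parameter y. */
    return twice(5) + plus_y;
}
```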
The header file specified after @h itself should of course contain ordinary C code rather than CWEB input;
after all, it will be read directly by the C compiler. As mentioned before, the file will be scanned by CWEAVE
as well, searching for any typedef definitions; moreover, if it contains any lines starting with #include, then
those files will be scanned recursively as well. In the case of system header files (specified with angle brackets),
CWEAVE will refrain from scanning the file unless the file is found on an explicitly specified search path (see
below); in fact it is better not to scan any of the ANSI/ISO standard header files, since CWEAVE already
knows about all typedef definitions that can occur in such header files. It is not uncommon that a header file
specified after @h (using quotes) is itself an auxiliary output file produced from a CWEB source file, possibly
even from the very source file containing the @h command. There is no circularity or other problematic
aspect of such a situation, but one should remember to run CTANGLE to produce the header file, before the
run of CWEAVE that needs it.
The way CWEAVE searches for the header file depends on how the name following @h is specified: if it is
enclosed in quotes then CWEAVE will look first in the current directory. One or more alternative places to
look for header files may also have been specified, in the form of strings that can be prefixed to the file name
(given on the command line or compiled into CWEAVE, or both). If so, these will be tried in order, regardless
of the delimiters used for the file name, until a match is found; CWEAVE will only insist on actually finding a
header file if the file name was enclosed in quotes.
There is one aspect of scanning header files that might cause a problem in some cases: when scanning a
header file, CWEAVE is unaware of other preprocessor directives that may disable certain nested #include
directives; CWEAVE will therefore obey such #include directives unconditionally. Such a problem is not very
likely, but it could be serious if the nested header file cannot be found (and is enclosed in quotes), or if there
are circular references between header files. Various solutions could be found for such a problem, depending
on the precise situation, varying from creating dummy files or avoiding conditional compilation by the use
of change files to (as a last resort) avoiding the scan of the header file altogether, by using #include in
a program fragment rather than @h; in the latter case relevant information could be extracted from the
header file manually, and converted into format definitions (@f) described below.
When preprocessor directives are incorporated in the C part of a section, the ordinary rules of C apply:
they should be spelled out in full, as #define or #include, and occur at the beginning of a line; the
directive ends at the next non-escaped newline. Although in C it is permissible to extend a preprocessor
directive into the following line by placing a multi-line comment that contains the newline, this should not
be done in CWEB, since the comment will be removed by CTANGLE but the newline will remain. If one needs
a very long comment after a preprocessor directive, one should start it on the line following the directive; in
the formatted document such a comment will be placed on the same line as the directive. The same holds
for comments placed after a @h command.
14 HOW TO CREATE A CWEB PROGRAM CWEBx MANUAL
Format definitions, indicated by the code @f, are entirely specific to CWEB and have no effect on the
C program that is defined. They are not needed very often, but when they are, a proper use of them
is essential for obtaining acceptably formatted output. To understand why they are sometimes needed,
one has to consider the way CWEAVE formats program fragments. The input is broken up into tokens (like
identifiers, constants, operator symbols), and a syntactic category is attached to each; the resulting sequence
of categories is then analysed according to a grammar and formatted accordingly. Certain identifier
tokens are recognised as reserved words and get the corresponding syntactic category, others are recognised
as typedef identifiers and get the same syntactic category as, for instance, size_t, and the remaining ones
are treated as ordinary identifiers. This scheme usually works fine, but occasionally there can be problems,
caused by the fact that CWEAVE is not aware of all the information that is available to the compiler. The
main reasons for this are macros (which may cause the code seen by the compiler to be quite different from
that seen by CWEAVE), typedef declarations that are hidden from CWEAVE's sight, and module names that
stand for a construct of a syntactic category other than statement (which is what CWEAVE expects them
to be by default). In all these cases CWEB provides mechanisms for the user to put CWEAVE on the right track,
and format definitions are one such mechanism (others will be discussed below).
Format definitions allow the programmer to state explicitly the syntactic category that CWEAVE should
attach to a given identifier. They have the form @f x y, which will become format x y in the typeset
output; here x and y can be arbitrary identifiers or keywords. This definition has the effect of associating with x
the same syntactic category that is associated with y. Such a change of category is required when an identifier is
defined as a macro to stand for a keyword: whenever you say @d ident keyword, say @f ident keyword
as well. For instance, the author of this manual thinks the keyword static is not very informative when
applied to functions, and therefore often creates an alias for it by saying #define local static; this directive
is then followed by format local static. Note that the first identifier after @d or @f is always typeset
in italics, even though in this example, as a consequence of the format definition, this
identifier will be typeset like the keyword static in all other places. Another reason to change a category could be that
an identifier is in fact a typedef identifier, but CWEAVE cannot deduce this fact (presumably because the declaration
occurs in some header file that is not scanned by CWEAVE); in such cases one can use a standard defined
type like FILE or size_t as the second argument to @f. Finally, it is possible that some C implementation
uses additional, non-standard keywords (or macros that behave as keywords); such an identifier should be
formatted like a standard keyword that has a similar syntactic function (if one exists). In
fact the identifier va_dcl, which is used in a convention for functions with variable argument lists that is
not part of ANSI/ISO C, is nevertheless built into CWEAVE, because there is no keyword that has the required
syntactic category (namely declaration), so that it would otherwise not be possible to introduce it; on
the other hand, one can easily undo the reservation by saying format va_dcl x.
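For the compiler, the local alias is nothing more than a macro, as this small C sketch shows. The function next_id is a hypothetical example; the CWEB lines that would produce the #define are shown only in a comment.

```c
/* In a CWEB source the alias would be written as
       @d local static
       @f local static
   The @d line makes CTANGLE emit the #define below; the @f line only
   tells CWEAVE to typeset 'local' like the keyword 'static'. */
#define local static

/* 'local' behaves exactly like 'static': the function has internal
   linkage, and 'counter' keeps its value between calls. */
local int next_id(void)
{
    local int counter = 0;
    return ++counter;
}
```

Two successive calls of next_id return 1 and then 2, just as they would with static spelled out.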
Format definitions can also be used for a reason that has nothing to do with syntax analysis. There
are two classes of identifiers that are parsed like ordinary identifiers, but are nevertheless treated specially;
these classes initially consist of the identifiers TeX and NULL, respectively. The main distinction of these
classes is that their identifiers are typeset differently, namely as TeX macros; the identifiers mentioned will
therefore be written to the TeX file as \TeX and \NULL, respectively, which causes them to be typeset as TeX
and Λ. This mechanism gives the user the ability to change the appearance of identifiers in any
desired way, simply by defining the macro appropriately. The class of TeX is intended for identifiers that
are still alphabetic in appearance (possibly with letters being accented or shifted), while the class of NULL
is intended for identifiers that are represented by mathematical symbols. Hence the TeX macro will be
processed in horizontal mode with the italic font selected in the first case, and in math mode in the second
case. Simply saying @f alpha NULL suffices to make alpha print as α; the format definition is typeset as
format alpha (α) to make the correspondence between the identifier and the typeset symbol evident.
Unlike C identifiers, TeX macro names cannot contain underscores or digits. When the macros are written to the
TeX file, underscores are replaced by x, so that they become part of the macro name. Digits, however, are
not changed, so identifiers containing digits should not be put into the class of TeX or NULL by a format
definition unless special care is taken: the macro name will consist only of the part up to the first digit. No index
entries for identifiers of the class of NULL are recorded (the same holds for keywords); on the other hand,
index entries for typedef identifiers are recorded, even though they are formatted as keywords.
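The effect of these two rules can be modelled with a short C sketch. The helper tex_macro_name is hypothetical: it computes the control word that TeX will end up reading, which is a model of the consequence described above rather than CWEAVE's own output routine.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch (not CWEAVE's code): compute the control word TeX
   actually reads for an identifier placed in the class of TeX or NULL.
   Underscores become 'x'; since a TeX control word consists of letters
   only, it ends at the first digit. */
void tex_macro_name(const char *ident, char *out, size_t size)
{
    size_t k = 0;
    if (size < 2) {              /* need room for backslash and NUL */
        if (size == 1)
            out[0] = '\0';
        return;
    }
    out[k++] = '\\';
    for (const char *p = ident; *p != '\0' && k + 1 < size; p++) {
        if (isdigit((unsigned char)*p))
            break;               /* TeX stops reading the name here */
        out[k++] = (*p == '_') ? 'x' : *p;
    }
    out[k] = '\0';
}
```

For example, max_len becomes \maxxlen, whereas x2y would be read by TeX as the macro \x followed by the ordinary characters 2y, which is why identifiers with digits need special care.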
Text within C program fragments: comments and module names Within the program part of a
section, the input should basically follow the rules of C syntax, but amidst the C tokens there may
also occur module names and comments. In both cases the C code is temporarily interrupted by a piece of
ordinary text that is processed directly by TeX, just like the TeX part of a section. In the case of module
names this text is delimited by @< and @>, in the case of comments by /* and */. So comments are
actually valid C comments, but the converse is not true: the contents of a comment are processed by TeX, so
not all C comments can be used without modification, a point to keep in mind when converting ordinary
C code to CWEB. Like C comments, the comments of CWEB cannot contain the two-character sequence */
(regardless of the TeX context, because comments are recognised before TeX even gets to see them). The
sequence /* is forbidden as well, which allows CTANGLE to warn the programmer about unclosed comments
that might otherwise lead to particularly elusive errors. In the TeX texts of comments and module names
no control codes are allowed (except in embedded pieces of C code, described below), but @@ can be used to
represent the character @ (this is true in all contexts); a module name is terminated by the first occurrence
of the code @>. During the processing of these TeX texts, line ends are replaced by spaces, which implies
that TeX comments (starting with %) cannot be used. (In the TeX part of a section, on the other hand, such
comments can safely be used: they are completely ignored by CWEAVE, and not even copied to the TeX file.)
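Forbidding the opening sequence inside comments makes an unclosed comment easy to detect. The following C sketch shows one way such a check could work; the function find_nested_comment_opener is hypothetical and deliberately simplified (it ignores string and character literals), so it should not be read as CTANGLE's actual tokeniser.

```c
/* Hypothetical sketch, not CTANGLE's code: scan C text and return the
   offset of the first comment opener (slash, star) that appears while a
   comment is still open, or -1 if there is none.  Since CWEB comments
   may not contain that sequence, finding one usually means that the
   previous comment was never closed.  String and character literals are
   not treated specially, to keep the sketch short. */
long find_nested_comment_opener(const char *text)
{
    int in_comment = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (p[0] == '/' && p[1] == '*') {
            if (in_comment)
                return (long)(p - text);  /* opener inside open comment */
            in_comment = 1;
            p++;                          /* skip the star of the opener */
        } else if (in_comment && p[0] == '*' && p[1] == '/') {
            in_comment = 0;
            p++;                          /* skip the slash of the closer */
        }
    }
    return -1;
}
```

A forgotten closer shows up at the start of the next comment, which is exactly where a warning is most useful.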
The text for module names serves a dual purpose: apart from determining the text representing the module
in the printed output, it also serves to match defining occurrences of a module name with references to
it. For the latter purpose it is irrelevant how the contents of a module name will be further processed;
there should basically be a character-by-character match. This rule is, however, relaxed in two ways to
make matching easier. First, any amount of consecutive white space is replaced by a single space, and white
space at either end of a module name is discarded. Second, an abbreviation mechanism for module names
may be used: a module name may be specified by a prefix of the full name, followed by three dots (...). A few
conditions must be satisfied for this mechanism to work. All specifications of the same module name
must be extensions of the one among them of minimal length, which must not be a prefix of any other (full)
module name. All specifications of the name that do not end with ... must be equal; there must be at
least one such specification, which defines the full module name of which all other specifications give a prefix.
Loosely speaking, the minimal specification is used for identification purposes, and the maximal specification
is used for typesetting all occurrences. With the help of these rules, and a text editor, there should be little
reason to choose module names any shorter than what is needed to express the function of a module clearly.
There is a limit on the length of a module name, but it is so generous that it should hardly ever be a problem:
1000 characters after replacement of consecutive white space characters by single spaces.
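The two relaxations can be sketched in C as a normalisation step followed by a comparison. Both functions below, normalize_name and spec_matches, are hypothetical illustrations of the rules stated above, not the lookup code of CTANGLE or CWEAVE; in particular, the uniqueness conditions on the minimal prefix are not checked here.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: collapse runs of white space to one space and
   drop white space at both ends.  'out' must have room for at least
   strlen(in) + 1 characters. */
void normalize_name(const char *in, char *out)
{
    char *o = out;
    int pending_space = 0;
    for (; *in != '\0'; in++) {
        if (isspace((unsigned char)*in)) {
            pending_space = (o != out);   /* ignore leading white space */
        } else {
            if (pending_space)
                *o++ = ' ';
            pending_space = 0;
            *o++ = *in;
        }
    }
    *o = '\0';                            /* trailing space never emitted */
}

/* Does a normalised specification refer to a normalised full name?
   A specification ending in "..." is an abbreviation whose text before
   the dots must be a prefix of the full name; otherwise the two must be
   equal. */
int spec_matches(const char *spec, const char *full)
{
    size_t n = strlen(spec);
    if (n >= 3 && strcmp(spec + n - 3, "...") == 0)
        return strncmp(spec, full, n - 3) == 0;
    return strcmp(spec, full) == 0;
}
```

Thus "  Report   the\toutcome " normalises to "Report the outcome", which is matched by the abbreviation "Report the ..." but not by the unterminated "Report the".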
The parser of CWEAVE normally assumes that references to modules stand for (compound) statements,
which is likely to be a valid assumption in the majority of the cases (or at least one that does not upset
parsing, for instance when the module is actually a statement sequence). Occasionally however, one of two
other syntactic categories applies instead, namely declaration or expression (the remaining categories are
extremely unlikely). When this is the case, the programmer should make it clear to CWEAVE, lest the parser
might choke on the input and produce badly formatted output. This can be done by placing the control
code @; once (for a declaration) respectively twice (for an expression) directly after the module name (in
the latter case this also conveniently provides a separation from any = or += that might follow).
At the end of the CWEB document, after the index, a list of all module names used will be placed. This
list is sorted lexicographically, based on the source strings for the full module names, collated
(unlike the identifier index) in the order of the internal (ASCII) character codes. For this reason it is a good
convention to ensure that all module names are already distinguished by a prefix consisting of alphabetic
characters and spaces only, of which the first word is capitalised; then the order of the list will be natural
and independent of any internal details that the reader is not aware of.
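Why capitalisation matters becomes clear from a plain byte-wise sort. This C sketch, with the hypothetical function sort_module_names, orders names by raw character codes using strcmp, which matches the collation described above only under the assumption of an ASCII character set.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two module names by their raw character codes; strcmp compares
   bytes as unsigned char, so in ASCII every upper-case letter sorts
   before '_', which sorts before every lower-case letter. */
static int by_code(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort an array of module names in place. */
void sort_module_names(const char *names[], size_t n)
{
    qsort(names, n, sizeof names[0], by_code);
}
```

Sorting "open file", "Report the outcome", "Declare variables" and "_helper" gives "Declare variables", "Report the outcome", "_helper", "open file": the one name that starts with a lower-case word drops below everything else.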
C code within text: |...| fragments In order to mention a piece of C code within TeX text, it can
simply be enclosed in vertical bar characters (|); then CWEAVE will format it in a way similar to the C code
of modules. This feature may be used in any kind of TeX text other than limbo, that is, in the ordinary TeX part
of a section, in comments, and in module names. The piece of C code itself should not contain any comment.
The lightweight construction with vertical bars resembles the math shift characters ($) for TeX's math
mode, and indeed in simple cases like |a[i+3]| the output would be identical if the | characters were
replaced by $. The two modes should not be confused, however: the C mode is implemented by CWEAVE,
which translates the C constructs before TeX ever gets to see them; it often uses math mode itself, and as
a consequence it should never be used when TeX is already in math mode. The syntax used by CWEAVE is
stricter than that of TeX's math mode, but it can still be used for some expressions that are not
quite proper C; in particular there is no objection to writing things like begin ≤ p < end, which humans
understand better than compilers. On the other hand, an incomplete formula like ≤ n (which can be used
in sentences, with the missing operand expressed in words) is better written as $\leq n$ than as |<=n|:
the latter is not understood by CWEAVE's parser, and therefore the <= and the n are translated separately
with an ordinary space in between; the result looks reasonable, but TeX may very well decide to break the
line at the space.
There is a lexical price to pay for using delimiters that are not control codes: it is impossible to use the
character | in any piece of TeX text where |...| constructions are allowed (even if one tries, for instance,
to set up a verbatim context, because CWEAVE acts before TeX does). This should not cause great problems,
however, since | is not a character in ordinary text fonts, and for | and \| in math mode, plain TeX
already has the substitutes \vert and \Vert; for exceptional text fonts (like typewriter type) that do
have |, the standard format for CWEB provides \v as a substitute (by means of \chardef) for |. Inside
|...| one has a similar problem of not being able to write the bitwise-or operator | in the usual way.
For this purpose CWEB provides the control code @v to represent that operator (which you may also use in
an actual program fragment, although there is no need to do so there). Note that the composite operators
|= and || can be used without problem; consequently no |...| should be immediately followed by =
or by another |...|.
Although C comments are forbidden inside |...|, it is possible to mention a module in TeX text by
enclosing the module name in vertical bars; this TeX text can either be the TeX part of a section or a
comment, but not another module name. Mentioning a module in this way does not imply any inclusion
of the module body, so it is not considered to be a use of the module; in the cross-references it is referred
to as a citation of the module. For the module name itself the same rules apply as for other occurrences
of module names; in particular the abbreviation mechanism can be used, and CWEAVE will automatically
insert the relevant section number in the module name. Citing a module can be an exception to the
rule that an occurrence of a module name before the C part of a section has started must be a
defining one. Since CTANGLE normally ignores the vertical bars of |...| constructions together with the
surrounding TeX text, it needs a simple rule to decide whether a module is being cited or defined. It does
this by inspecting the next token (where a newline counts as a token, but codes like @; that are ignored
by CTANGLE are skipped): if this is = (or += etc.), it assumes that the module is being defined, and
if it is |, that the module is being cited; in other cases it signals an error (this could, for instance, happen
if a @c code is missing). Therefore it is not really necessary that the module name be the only item in the
|...| construction, as long as it is the final item; this extra freedom is not likely to be of much practical
use, however.
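The decision rule just described amounts to a three-way classification of the token that follows the module name. In this C sketch the enum module_use and the function classify_module_use are hypothetical; it assumes that codes ignored by CTANGLE (such as @;) have already been skipped, and it models only = and +=, leaving the "etc." of the rule aside.

```c
#include <string.h>

/* Possible interpretations of a module name met before the C part. */
enum module_use { MODULE_DEFINED, MODULE_CITED, MODULE_ERROR };

/* Hypothetical sketch: 'next' is the next significant token after the
   module name (a newline counts as a token). */
enum module_use classify_module_use(const char *next)
{
    if (strcmp(next, "=") == 0 || strcmp(next, "+=") == 0)
        return MODULE_DEFINED;   /* start of a module definition */
    if (strcmp(next, "|") == 0)
        return MODULE_CITED;     /* closing bar of a |...| construction */
    return MODULE_ERROR;         /* e.g. a missing @c before the code */
}
```

A newline token after the name, as happens when @c has been forgotten, falls into the error case.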
Modules producing additional output files: @( ... @> As was mentioned before, there are special
module names that will cause the program produced by that module to be written to a separate output file.
Such a module name is specified by enclosing the file name in @( and @>; in fact it is sufficient to use @(
instead of @< in just one occurrence of the module name. The file name will be set in typewriter type by
CWEAVE, so that the difference from an ordinary module name is easily perceived. Although hardly relevant
in this case, the compression of white space and the abbreviation mechanism for module names also apply
to these special module names. The file name can contain any special characters, including | and @; the
latter must, as always, be doubled.
Control codes that help parsing in special situations: @;, @[, @] In the discussion of the
format command we already mentioned the way CWEAVE parses and formats program fragments, and the
fact that some programming constructions can confuse the parser, leading to badly formatted output. Like
@f, the control codes in this subsection provide ways to avoid such problems, but they do so on a local basis
in the code itself, rather than by global definitions. They are mainly used in connection with macros whose
replacement texts and/or arguments are not expressions. Since macro invocations look like identifiers
or function calls, and macro arguments appear to be function arguments, a piece of code containing a macro
invocation whose replacement text and arguments are not all expressions may seem syntactically incorrect
when not expanded. An example of such a scenario is a macro whose replacement text is a compound
statement; an invocation of such a macro needs no semicolon following it, and sometimes placing a semicolon
would actually cause an error (e.g., if the invocation is used as the first branch of an if-else statement, since
the semicolon would be taken to be an empty statement after the conditional statement, and the else
would be unmatched). Since the parser of CWEAVE does not expand macros, it will fail to recognise a macro
invocation without a following semicolon as a statement, and like many parsers it is not good at recovering
from such a failure. Although no error message is usually issued, formatting can be severely disrupted;
indeed, correct formatting will only be inserted locally for constructions that do not contain the error, so
one unrecognised construction can easily destroy the layout of the entire program fragment it occurs in.
CWEAVE provides some simple mechanisms for guiding the parser through such unusual code, and by
applying them in several ways nearly all problems that arise in practice can be solved. One of these is the
control code @;, which produces no C code (nor any printed output), but which can be used in places where
the CWEAVE parser would require a semicolon for a successful parse; another is the combination @[, . . . , @],
used as a pair of parentheses, which will cause whatever is enclosed to get the syntactic category expression,
regardless of its actual category.
The most obvious use of @; is in the case already mentioned of a macro invocation that expands to
a (compound) statement: placing @; after such a macro invocation will cause it to be recognised as a
statement by CWEAVE, keeping its parser happy while not affecting the actual C program. There are other
situations as well where one does not want to place a semicolon, yet wishes CWEAVE to act as if it were there.
If a macro stands for a statement that happens to end in a semicolon, then it is a good idea to suppress the
final semicolon in the definition: in that case all invocations can supply the semicolon, and one does not have
to remember to write @; instead of ; at the invocations of this macro. For instance, the macro replacement
text could be do ⟨statement⟩ while (⟨condition⟩), or if (⟨condition⟩) ⟨statement⟩ else ⟨expression⟩,
or even if (⟨condition⟩) ⟨statement⟩ else, where the final else was placed with the purpose of picking
up the following semicolon as an empty statement; in all these cases the macro invocation together with
the following semicolon is a complete statement that can be used without special precaution, even as the
first branch of an if-else statement. However, in these cases the macro definition itself needs a bit of extra
care: a @; should be placed at the end to represent the semicolon that will follow in invocations, so that
CWEAVE can properly format the replacement text of the macro. Finally, there can be purely aesthetic
reasons for wanting to suppress a semicolon at the end of a |...| construction, for instance when referring
to a declaration as char *p, which strictly speaking requires a final semicolon to become a declaration; to
let CWEAVE format this properly, one should write |char *p @;|. Constructions like return home and
goto sleep, which are fairly common to mention in module names, would also fall into this category, but
in this particular case no @; is necessary, since CWEAVE parses these as expressions, even though strictly
speaking they are not.
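The first replacement-text pattern above can be illustrated in plain C. In this minimal sketch the macro SWAP and the function order_pair are hypothetical; in a CWEB source the @d line would additionally end with @;, as just described, which produces no C code and therefore does not appear here.

```c
/* SWAP stands for a compound statement.  Written as do ... while (0),
   with the final semicolon left out of the replacement text, every
   invocation supplies its own semicolon and forms exactly one statement. */
#define SWAP(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Because "SWAP(*lo, *hi);" is a single statement, it can be the first
   branch of an if-else; with a bare { ... } replacement text the extra
   semicolon would end the if statement and leave the else unmatched.
   Returns 1 if the values were swapped, 0 if already in order. */
int order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP(*lo, *hi);
    else
        return 0;
    return 1;
}
```

Calling order_pair on 5 and 3 swaps them and returns 1; a second call on the now ordered pair returns 0.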
Since @; is invisible in the output, yet can be sensed by the parser, it can conveniently be used to
pass information to the parser, and there are a few instances of such use where it does not stand for a
semicolon. We already mentioned placing one or two copies of @; after a module name to indicate its
syntactic category. Another use is to place it before a typedef identifier to cause it to be treated as an
ordinary identifier; this is useful if the identifier is locally redeclared, or used as a field selector in a struct or
union specifier. When the identifier is used as a tag immediately after struct or union, or as a selector
after . or ->, it is not necessary to place @; before it.
Unlike @;, the control codes @[ and @] themselves do not participate in parsing. The material between
them is parsed normally, which may or may not succeed in recognising a single construct; then the pieces
recognised are concatenated (without separation), and the result is given the category expression for the
purpose of parsing further items outside. The most obvious use of this mechanism is to encapsulate any
arguments in a macro invocation that are not expressions (e.g., some storage allocation macros have a type
as argument), so that the invocation can be parsed as a function call. There need not be anything between
@[ and @], so @[ @] can be used as an invisible expression in the same way as @; can be used as an
invisible semicolon. An example where this is useful is a module standing for an initialiser list that is
moreover defined in multiple sections (see for instance the module ⟨Rules 157⟩ in the source document for
CWEAVE): it is natural to end each program fragment defining a part of such a module with a comma, but this
will not be parsed properly unless an expression follows, which can be achieved by adding @[ @]. Finally,
if for some tricky piece of code none of the mentioned methods suffices to get it parsed properly by CWEAVE,
one may use @[ and @] (followed by @; if necessary) to minimise the damage: by placing @[ and @]
around an appropriate part of the program containing the problem area, we can ignore the fact that the
parser failed to recognise it, and force it to continue as if it had recognised an expression; thus we can contain
the problem and prevent the effects from spreading any further.
5 Invocation of CTANGLE and CWEAVE
The simplest form of calling CTANGLE and CWEAVE is to supply one command line argument, which is the
name of the CWEB source file without the .w suffix. It is possible, however, to modify the behaviour of the
programs by selecting certain optional settings, and small patches to the master source file can be achieved
by supplying a change file. The general syntax for invoking CTANGLE is
ctangle [(+|-)⟨options⟩] ⟨CWEB file⟩[.w] [(⟨change file⟩[.ch]|+|-) [⟨output file⟩[.c]]]
where square brackets indicate optionality, vertical bars separate alternatives, and parentheses are used for
grouping. Here ⟨options⟩ is a string of one or more characters designating options, as described below; there
may be more than one such string of options, and they may be given between or after the file names instead
of before them, with no difference in meaning. For CWEAVE the situation is entirely similar, except that the
default extension for the output file is .tex instead of .c.
Command line options A command parameter that starts with + or - and has at least one more
character serves to control optional settings of the program being invoked. The characters after the initial
character + or - denote individual options that are turned on or off, respectively; option characters are
case-insensitive. The character i forms an exception, since it is used to supply a string argument rather than
to set a switch; the string is the remainder of the option string (following the i), and +i and -i are
equivalent. All option characters will be accepted, but only the ones listed below have any effect on the
operation of the program. We list the switches in the direction that alters the default setting.
switch  program  effect
-b      both     do not write a banner line to the terminal
-p      both     do not show a progress report on the terminal
-h      both     omit confirmation of successful completion
-l      CTANGLE  omit #line directives, make C file look nice
-x      CWEAVE   do not attach index and other information at the end of the document
+d      CWEAVE   report failure to completely parse pieces of C code
+t      CWEAVE   write three files, with separate ones for index and list of module names
+e      CWEAVE   even out number of pages before table of contents
+i      CWEAVE   add alternative search path for header files (takes argument)
+f      CWEAVE   force a line break after each statement
+a      CWEAVE   force all statements to be on a line by themselves
+u      CWEAVE   unaligned brace style: do not align { and } vertically
+w      CWEAVE   wide brace style: force line breaks before and after {
+m      CWEAVE   merged declarations style: do not force line breaks between local declarations
+c      both     run in compatibility mode with Levy/Knuth CWEB
+s      both     show memory usage statistics at completion
++      both     handle C++ language instead of C
The options +d and +s only operate if CWEAVE or CTANGLE was compiled with the preprocessor symbol
DEBUG or STAT, respectively, defined (with most C compilers this can be accomplished by including a command
line parameter -DDEBUG or -DSTAT, respectively, when compiling the CWEB system).
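The option syntax described above can be made concrete with a small C sketch. Everything here is hypothetical and illustrative: the struct options, its fields, and parse_option_arg are not CWEBx's data structures, only letter options are modelled (the + of the ++ option is ignored), and a bare + or - is treated as a non-option argument because the rule requires at least one more character.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option storage for this sketch only. */
struct options {
    bool flag[26];              /* one switch per letter, index c - 'a' */
    const char *include_path;   /* string given with +i or -i, or NULL */
};

/* Returns false if 'arg' is not an option string (e.g. a file name, or
   a lone + or -); otherwise applies its options to 'opt'. */
bool parse_option_arg(const char *arg, struct options *opt)
{
    if ((arg[0] != '+' && arg[0] != '-') || arg[1] == '\0')
        return false;
    bool on = (arg[0] == '+');
    for (const char *p = arg + 1; *p != '\0'; p++) {
        int c = tolower((unsigned char)*p);  /* case-insensitive */
        if (c == 'i') {
            opt->include_path = p + 1;       /* rest of the string; +i and -i alike */
            break;
        }
        if (c >= 'a' && c <= 'z')
            opt->flag[c - 'a'] = on;         /* '+' turns on, '-' turns off */
    }
    return true;
}
```

Under this sketch, +uft sets three switches at once, +B and -b toggle the same switch, and -i/usr/local/X11R5/include/ stores the path that follows the i.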
The options -b, -h, and -p can be used to control the amount of output that CWEB writes to the user's
terminal; the combination -bph will eliminate terminal output altogether when no errors are encountered.
The option -l of CTANGLE is intended either for use with broken compilers or debuggers that cannot
handle #line directives properly, or for cases where the C file is of more importance than just as an
intermediate file, for instance when the program is transferred to people who do not wish to practise literate
programming. Apart from omitting #line directives and comments that indicate the section number from
which code originates, an attempt is made to make the C file more readable to humans: the spacing and
(almost all) comments of the source file are preserved in the C output, and when modules are substituted
into others, indentation levels are accumulated, so as to produce indentation that looks natural. Doubtless
the result is not perfect (and lines may get quite long), but it is definitely more readable than the output
normally produced. Since layout and comments of the source file need to be preserved by CTANGLE, this
option consumes significantly more memory than its contrary.
The option +d causes CWEAVE to issue a warning when it could not properly parse some piece of C code;
this could happen either because a code fragment is incomplete in the sense that it does not represent a
single complete syntactic entity (as in the |<=n| example above, or when a module body ends with a label
without a following statement), or because the code is actually unsyntactic, or because CWEAVE has been
fooled by an unusual construction. In all cases, however, the result can be (very) badly formatted output, and
a correction should be made; users who care about the quality of the typeset output are advised to always
set this option (or at least when the document is being finalised). Setting the +d switch is equivalent to
placing a control code @1 at the beginning of the first section; the nature of the warning messages and
possible remedies will be discussed later in this manual.
The two output files that the option +t causes CWEAVE to create in addition to its main output
file are called ⟨name⟩.idx and ⟨name⟩.scn, where ⟨name⟩ is the name of the main output file without
its extension. These files will be read by \input commands in the main output file, so that the typeset
document will not be any different; on large projects, however, it can be helpful to have this information in
separate files, for instance for making a global index. The option +e is intended for use with two-sided
printers: it ensures that the table of contents comes out on a fresh sheet of paper, so that it can conveniently
be moved to the front.
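The naming rule for the two extra files can be expressed as a small C helper. The function extra_output_names is hypothetical, and it assumes that / is the path separator, so that only a dot in the last path component is taken to start the extension.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive the names of the two extra files written
   under +t, namely <name>.idx and <name>.scn, where <name> is the main
   output file name without its extension. */
void extra_output_names(const char *main_file,
                        char *idx, char *scn, size_t size)
{
    const char *dot = strrchr(main_file, '.');
    const char *slash = strrchr(main_file, '/');   /* assumes '/' separator */
    /* only a dot in the last path component starts an extension */
    size_t base = (dot != NULL && (slash == NULL || dot > slash))
                      ? (size_t)(dot - main_file)
                      : strlen(main_file);
    snprintf(idx, size, "%.*s.idx", (int)base, main_file);
    snprintf(scn, size, "%.*s.scn", (int)base, main_file);
}
```

For a main output file cweave.tex this yields cweave.idx and cweave.scn.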
The option +i (or equivalently -i) can be used to specify a directory for CWEAVE to search for header
files in @h commands. Although directory structures are system-dependent, CWEB assumes that a file can
be looked up in a specified directory by prefixing a string indicating that directory to the file name (this
works for many systems); the desired prefix string should then be supplied as the remainder of the option
string after the i character. For example, on the UNIX system the author uses, CWEAVE can be told about the
location of the Xlib header files by supplying an argument +i/usr/local/X11R5/include/ (one could
replace +i by -I to make it look more like the similar option passed to the C compiler); the important
thing to note is the final pathname separator /. Up to 8 additional prefixes can be specified by giving
several such arguments; they will be tried in order from left to right. It is also possible to fix one such prefix
at compile time, by defining the preprocessor symbol CWEBHEADERS to be the desired prefix string when
compiling the compilation unit common.c of CWEB; this will behave as if it were the first prefix specified by
a +i argument.
The five options +f, +a, +u, +w, and +m alter the layout style of program fragments. The option +f will result
in a more vertical style than the default, and +a will do so even more; the difference between them is that
+f will not force a simple statement to start on a new line if it follows a label or the condition of an if or
while statement, whereas +a will start a new line in such cases. The option +u selects a style in which
corresponding opening and closing braces are unaligned, because a line break is inserted after the opening
brace instead of before it. The option +w, on the other hand, selects a brace style that has more vertical symmetry than the
default one, since opening braces will appear on a line by themselves, like closing braces; the price is that
listings will consume more paper. The option +a overrides +f, and similarly +w overrides +u. Finally,
the option +m is for people (like the author) who are extremely keen on saving paper: it avoids forced line
breaks between the declarations in a compound statement, just as they are not placed by default between
the statements; the separation between declarations and statements within a compound statement is still
indicated by a line break, which even has some extra vertical space, because this separation is significant in
the C syntax (unlike the C++ syntax).
In compatibility mode, specified by +c, both CTANGLE and CWEAVE modify their behaviour in such a way
that they try to ensure that they can handle any file that can be correctly processed by Levy/Knuth CWEB,
and that the output is, respectively, an equivalent C program or a valid TeX file (this is the hard part) that
produces a comparable printed document. In the current version this claim can only be made for programs
written in C; a wholehearted attempt to do the same for C++ programs would cost a substantial amount of
extra work. There are so many differences in the details of formatting between CWEBx and Levy/Knuth CWEB
that one cannot expect formatted output that is identical to what would be produced under Levy/Knuth
CWEB, but to get the best approximation, one should specify the options +uft in addition to +c.
The option +s is included because the CWEB utilities use statically allocated memory areas, which may therefore run out; using this option one can see how close one is to the limits of CWEB. The most important limited resources that it provides information about are: (a) the name tables in which CTANGLE and CWEAVE store all distinct identifiers and index entries, respectively module names (the entries 'identifiers', 'module names', and 'bytes'); (b) CTANGLE's main memory, in which the complete C program file processed during a single run has to be stored, albeit in a compactified form ('replacement texts' and 'tokens'); (c) CWEAVE's cross-reference memory, in which all the data for the index and list of module names are stored ('cross-references'); (d) its parsing buffers, which must be able to hold any one program fragment or piece of C code ('scraps', 'texts', and 'tokens'). There should be no immediate need to increase the size of these memory areas, since even for the main program of CWEAVE, the largest of CWEB's own compilation units, the use of any of these resources is less than a third of the amount available. There is one resource of which a larger fraction is used, namely trie nodes, but its usage depends only on the set of grammar rules used, which is independent of the particular CWEB source file. When for some source file CWEB is approaching its limits, one can of course try to recompile CWEB with larger arrays, but alternatively one may restructure the source file: when one of (a), (b), or (c) runs out, one might consider breaking up the file into several separately processed pieces; when (d) runs out, a remedy could be splitting up some huge module body into smaller ones, by introducing submodules or multiple definitions of the module.
Switching to the C++ language has only a minor influence on the operation of CWEB: one-line comments starting with // will be recognised, the main output file produced by CTANGLE will have default extension .C instead of .c, and CWEAVE will recognise a few more reserved words and use a slightly different syntax. Since there is no general agreement about the proper extension for C++ files, an alternative default extension for C++ mode (instead of "C") may be built in by setting the preprocessor symbol CPPEXT to the desired string (which should not contain the leading period) when compiling common.c. Currently the CWEAVE grammar will handle only a basic subset of the C++ language, which does not include templates or exception handling.
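The way such a built-in C++ extension might be wired up is sketched below; the helper name default_output_extension is hypothetical, and only the symbol CPPEXT and its fallback value "C" are taken from the description above.

```c
/* CPPEXT may be given when compiling, e.g.  cc -DCPPEXT='"cpp"' -c common.c
   (without the leading period); otherwise C++ mode falls back to "C". */
#ifndef CPPEXT
#define CPPEXT "C"
#endif

/* Hypothetical helper: default extension, without the period, of the main
   output file written by CTANGLE in C mode (cpp_mode == 0) or C++ mode. */
static const char *default_output_extension(int cpp_mode)
{
    return cpp_mode ? CPPEXT : "c";
}
```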
File name arguments Any command line arguments that do not have the form of an option are taken to indicate file names; their number can vary from 1 to 3. The first one specifies the main source file, the second (if present) indicates the change file, and the third optionally defines a non-standard name for the main output file. The contents and function of the change file are discussed in the next section; here we just indicate how the actual file names used are derived from the given file name arguments. As far as CWEB is concerned a file name is composed of a base name and an extension. Loosely speaking, the extension of the main file defaults to w, that of the change file to ch, and that of the output files to c or tex for CTANGLE respectively CWEAVE (but see also the discussion of the ++ option above); the base names of the change file and the main output file default to that of the main file. If the argument - is specified in place of a change file name, no change file is used; also if only one file name argument was given, or if the change file name was specified as +, then the default change file name is tried, but if no such file exists, processing proceeds without a change file. (Specifying the change file as + is only useful if a third file name argument is given.) Therefore, assuming regular naming conventions, there is no need to specify more than the main file name without extension, whether or not a change file is being used.
The precise rules are as follows. On file systems where an extension is not a standard property of file names, like that of UNIX, it is assumed that a period is a valid character in file names; a full file name is then formed by concatenation of the base name, a period and the extension (note that this implies that on such systems CWEB cannot access files whose name contains no period at all). Conversely, a string designating a full file name is broken up into a base name and an extension at the last occurrence of a period; if no period is present, then the string is taken to specify a base name only, and is said to have no extension. If the first file name argument has an extension, it specifies both base name and extension of the main source file, otherwise it specifies the base name, and the extension is taken to be w (if no such file is found, the extension web is also tried, but this feature is obsolete). The base name of the main source file is also the default base name of the change file and the main output file; their default extensions are as described above. If a second and possibly third file name argument is present and is not + or -, it overrides the base name, and also the extension if it has one, of the change file respectively of the main output file. No change file will be used either if the second file name argument is -, or if no change file is found when the second file name argument is + or absent.
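A C sketch of the splitting rule and of the defaults derived from the first file name argument follows. The names split_name and derive_names are hypothetical; the fallback to the obsolete extension web and the treatment of the second and third arguments are left out.

```c
#include <stdio.h>
#include <string.h>

/* Split `name` at its last period into base name and extension.
   Returns 1 if a period was found (the name has an extension), else 0. */
static int split_name(const char *name, char *base, char *ext, size_t size)
{
    const char *dot = strrchr(name, '.');
    if (dot == NULL) {                      /* base name only, no extension */
        snprintf(base, size, "%s", name);
        ext[0] = '\0';
        return 0;
    }
    snprintf(base, size, "%.*s", (int)(dot - name), name);
    snprintf(ext, size, "%s", dot + 1);
    return 1;
}

/* Derive the main source file name and the default change file name
   from the first file name argument. */
static void derive_names(const char *arg, char *main_name,
                         char *change_name, size_t size)
{
    char base[FILENAME_MAX], ext[FILENAME_MAX];
    if (split_name(arg, base, ext, sizeof base))
        snprintf(main_name, size, "%s.%s", base, ext);  /* explicit extension kept */
    else
        snprintf(main_name, size, "%s.w", base);        /* default extension w */
    snprintf(change_name, size, "%s.ch", base);         /* same base, extension ch */
}
```

With the argument prog this yields prog.w and prog.ch, whereas an argument a.b.w is split at its last period, giving the base name a.b and hence the change file a.b.ch.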
6 Subsidiary input files and change files
As we have described it so far, the CWEB tools read a single source file, from which a main output file and possibly some auxiliary output files are produced. Since C programs can be built from several compilation units, it is not uncommon that several CWEB source files contribute independently to the same program, and there might be non-CWEB source files as well. However, even what is conceptually a single CWEB source, described by a single printed document, may in fact be composed from several input files. Two mechanisms are provided for combining information from several files, with different purposes. First, subsidiary files may be read in from the main source file in a way similar to the way #include files are handled by a C compiler. In the case of CWEB however, the main purpose is usually not to share information among several sources, but merely to allow breaking up large source files into more easily manageable parts. Second, there is the change file mechanism already mentioned above, which serves to install system dependent patches to a master source, allowing that master to remain free of system dependencies.
When a line of the form @i ⟨file name⟩ appears in a CWEB source file, CWEB will read in the indicated file at that point, and continue reading at the next line when it reaches the end of the subsidiary file. The ⟨file name⟩ may either be delimited by white space, or be enclosed in double-quote characters (but not in angle brackets). Source files may be nested in this way up to 10 levels deep. Nothing in the printed CWEB document will indicate the switch from one source file to another, nor will there be any effect on the C file(s) written by CTANGLE, except that #line directives will of course always point to the proper point of origin for each piece of code written to such files.
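How the file name might be picked out of such a line is sketched below in C. The function parse_include_line is hypothetical; it handles the two delimiting forms described above, ignores whatever follows the name, and insists that @i start the line, in keeping with the line-oriented rules for these codes.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: extract the file name from a line beginning "@i".
   The name is either delimited by white space or enclosed in double quotes;
   any text after the name is ignored.  Returns 1 on success, 0 otherwise. */
static int parse_include_line(const char *line, char *name, size_t size)
{
    const char *p, *end;
    if (strncmp(line, "@i", 2) != 0)
        return 0;                          /* @i must start the line */
    p = line + 2;
    while (isspace((unsigned char)*p))
        ++p;
    if (*p == '"') {                       /* quoted form */
        end = strchr(++p, '"');
        if (end == NULL)
            return 0;                      /* missing closing quote */
    } else {                               /* white-space delimited form */
        end = p;
        while (*end != '\0' && !isspace((unsigned char)*end))
            ++end;
    }
    if (end == p || (size_t)(end - p) >= size)
        return 0;                          /* empty name, or buffer too small */
    memcpy(name, p, (size_t)(end - p));
    name[end - p] = '\0';
    return 1;
}
```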
As for header files, there is a way to indicate that if a file included by @i is not found in the current directory, an alternative place can be tried; unlike header files however there is relatively little need to use this facility, unless one has files that are useful to include identically in more than one project. At most one alternative place to search can be given, and it is specified by a prefix to be applied to the file name, in the same way as for header files. This prefix may either be compiled into the CWEB programs by setting the preprocessor symbol CWEBINPUTS equal to that string when compiling common.c (analogously to CWEBHEADERS), or it can be specified at run time by setting the environment variable CWEBINPUTS; when both methods are used, the latter takes precedence.
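The precedence rule can be expressed as a short C function; include_prefix is a hypothetical name, and returning NULL when neither method supplies a prefix is an assumption made for the sketch.

```c
#define _POSIX_C_SOURCE 200809L   /* POSIX setenv/unsetenv, handy when exercising the sketch */
#include <stdlib.h>

/* Hypothetical helper: the alternative prefix to try for an @i file that is
   not found in the current directory, or NULL if there is none.  The
   environment variable CWEBINPUTS wins over a prefix compiled in with,
   for example,  cc -DCWEBINPUTS='"/some/dir/"' -c common.c */
static const char *include_prefix(void)
{
    const char *env = getenv("CWEBINPUTS");
    if (env != NULL)
        return env;             /* run-time setting takes precedence */
#ifdef CWEBINPUTS
    return CWEBINPUTS;          /* compile-time default */
#else
    return NULL;                /* no alternative place configured */
#endif
}
```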
The change file, if present, contains a sequence of changes, each of which specifies the replacement of one or more lines from the main input stream by another set of lines. Each change has the form @x ⟨original lines⟩ @y ⟨replacement lines⟩ @z, where each of the codes @x, @y, and @z occupies a line by itself. The ⟨original lines⟩ is a non-empty set of lines that should match exactly with some sequence of lines in the main input stream (except for the fact that trailing white space on any line is ignored). Furthermore, different changes should affect non-overlapping sets of lines, and their order in the change file should be the same as that of the parts of the main input stream that they replace. For each change in succession, a sequence of lines matching ⟨original lines⟩ is searched for, and replaced by the corresponding ⟨replacement lines⟩; as for @i file insertions, the resulting stream of lines will be processed in the usual way as if it constituted a single CWEB source file. The main input stream referred to here is the result of (recursively) inserting any auxiliary files indicated by @i lines into the main CWEB source file. It therefore makes no sense to specify @i in the ⟨original lines⟩, nor is @i allowed in the ⟨replacement lines⟩: it should simply not occur anywhere in the change file. On the other hand it is legitimate for the ⟨original lines⟩ to match a sequence of lines coming from more than one physical source file.
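Line comparison under this rule might look as follows in C. The names are hypothetical, and exactly which characters count as trailing white space (here blank, tab, carriage return and newline) is an assumption.

```c
#include <string.h>

/* Length of `s` once trailing white space (assumed: blank, tab, CR, LF)
   has been discarded. */
static size_t trimmed_length(const char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == ' ' || s[n - 1] == '\t' ||
                     s[n - 1] == '\r' || s[n - 1] == '\n'))
        --n;
    return n;
}

/* Nonzero if a line of the original lines of a change matches a line of
   the main input stream, ignoring trailing white space on both lines.
   Leading white space is significant. */
static int lines_match(const char *change_line, const char *input_line)
{
    size_t a = trimmed_length(change_line);
    size_t b = trimmed_length(input_line);
    return a == b && memcmp(change_line, input_line, a) == 0;
}
```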
The fact that input is temporarily switched to the change file is not entirely transparent to the CWEB document, as it was in the case of @i files: CWEAVE will mark all sections that were modified under control of the change file, by attaching an asterisk to their section number, and to all references to that number. (If some changes should add or remove entire sections in the middle of the CWEB source, which is allowed although not encouraged, then the section numbering will be altered, but sections for which this is the only change will not be flagged with an asterisk.) If one is only interested in sections that are modified, then it is even possible to restrict printing to only those sections, by including the TeX command \changesonly in the text in limbo, preferably by means of the change file.
In order to facilitate efficient implementation of the change file mechanism, an additional constraint is placed on the changes: once an exact match of a line in the main input stream with the first line of a change is found, the remaining lines of the change (up to the @y) should also match. Any empty lines immediately following @x are not used for matching (and are in fact completely ignored), so the first matching line is never an empty one; it is preferable to choose changes such that their first line matches a unique line of the main input. It is a good idea to start changes in the TeX part of sections (after all, if the program changes, so should its explanation); in this case uniqueness of the match of the first change line can always be ensured (even when the TeX part is empty) by placing a TeX comment in the main input, that serves merely as a target for replacement by the change file. All text in the change file that is not part of a change is ignored, except that there should be no lines starting with @i, @y, or @z; this text can be used for instance to explain the purpose of the change to the person installing the program on a new system, rather than to the ordinary reader of the program.
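The effect of this constraint on the search can be sketched in C: the search commits to the first line that matches the first line of the change, and a mismatch further on is an error rather than a reason to keep looking. The function find_change and its return conventions are hypothetical; for brevity it compares lines exactly, leaving out the trailing-white-space rule, and assumes that empty lines after @x have already been dropped. The start parameter reflects that changes are applied in order, so each search resumes where the previous change ended.

```c
#include <string.h>

/* Hypothetical sketch: locate the original lines `orig` (n_orig >= 1) of one
   change in the main input, searching from line `start`.
   Returns the index of the first matched line, -1 if orig[0] never matches,
   or -2 if orig[0] matches but a later line of the change does not. */
static long find_change(const char *const main_lines[], long n_main,
                        const char *const orig[], long n_orig, long start)
{
    for (long i = start; i < n_main; ++i) {
        if (strcmp(main_lines[i], orig[0]) != 0)
            continue;                       /* keep looking for the first line */
        for (long k = 1; k < n_orig; ++k)   /* committed: the rest must match */
            if (i + k >= n_main || strcmp(main_lines[i + k], orig[k]) != 0)
                return -2;
        return i;
    }
    return -1;
}
```

This is why a first change line that matches a unique line of the main input is preferable: an earlier accidental match of that line would make the whole change fail.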
As we have said earlier, the change file mechanism provides an alternative to system dependent conditional compilation, and it is usually a much more elegant way to incorporate system dependencies. The main reason for this is that one does not have to anticipate all possible systems that a program could be ported to, nor is the main source polluted by such considerations: it suffices to provide a separate change file each time the program is moved to a system with different system dependent requirements. Users of a particular system need to know about the change file for that system only, and the responsibility for maintaining the main source and the change file might lie with different persons; additional effort is only required when the main source changes in such a way that a change file fails to match.
One should not get carried away by the benefits of change files though: they provide only a rather crude mechanism (due to the inflexible matching rules), and if there are many changes, they will become difficult to maintain when the master file evolves. Portability is still best obtained by limiting system dependent features as much as possible, and if inevitable, confining them to some well defined part of the program. If one should wish to create variants of a program that involve significant changes, then writing extensive change files is probably not the best way to go. This method could lead to a form of rigor mortis for the original version of the program, caused by fear that any alterations could upset one of the change files, even trivial changes that only involve the commentary, or even just the layout of the source file. A better approach would be to collect routines of general utility as much as possible into separate compilation units used by all variants, and to complement these with completely independent compilation units to define the specific behaviour of each of the variants. It is certainly pointless to use a change file for such things as bug fixes or further development of a program; the whole idea is that such modifications can be made in the master file while the change files for various systems need little or no adjustment.
The codes @i, @x, @y, and @z of this section have the appearance of control codes, but they are not really part of the CWEB language, and obey different rules than control codes. For instance, they are line oriented (and rightly so, since their goal is to select which lines will be actually processed by CWEB): they should appear at the beginning of a line, and any further text on the line (in case of @i, after the file name) is ignored. Also they act quite independently of CWEB's current mode of operation: rules such as the one forbidding control codes in limbo do not apply to these codes.
7 Control codes for advanced or emergency use
In this section we discuss control codes that are not essential for everyday use of CWEB, but are provided to enable either refinements in the presentation of the CWEB document, or special manoeuvres to deal with certain unusual situations or requirements. Most of them serve to allow the programmer some form of direct control over the contents of either the CWEB document, the C file, or the source file, bypassing the automatic processing by which these are normally related to each other; there are also a few that serve as debugging aids, eliciting explicit information from the CWEAVE parser about its actions.
Control codes for cross-referencing: @!, @^, @., @?, @:, @# Some control codes are provided that allow the programmer to influence indexing and to perform explicit cross-referencing. The codes in this subsection are the only ones that are allowed to occur in the TeX part of sections, outside |...|; with the exception of @#, they can also be used in C text. Control codes such as these, that are intended only to affect the printed document, are ignored completely by CTANGLE. Incidentally, cross-referencing in CWEB always means referring to section numbers rather than to page numbers: CWEAVE cannot know about page numbers since these are determined only at the TeX processing stage. It would be possible to have TeX produce a table mapping section numbers to page numbers; in fact the table of contents provides a coarse approximation to such a map.
Whenever CWEAVE can determine from the context that an occurrence of an identifier is a defining one, it will make the corresponding section reference in the index underlined. If some case is missed by CWEAVE's normal rules, or if one wants to make a reference to a reserved word (which is only made if it is underlined), then one can place the code @! in front of the identifier to create an underlined reference. Cases where this may be required include arguments of functions with an old-style (pre-ANSI) heading for which no declaration is given before the function body (i.e., the default type int applies), and enumeration constants that appear out of context of the enum keyword (e.g., because the enumeration list is given as a separate module). In general, the occasions where one needs @! are quite rare.
A group of three codes serves to include additional entries in the index, amidst those generated automatically by CWEAVE for identifiers. It may be useful for instance to maintain references to concepts like system dependencies, or to all error messages that can be generated. The three codes are @^, @., and @?; they differ only in the way the index entry will be typeset. In each case the index entry is specified as a control text terminated by @>; control code and control text will be removed by CWEAVE, but the control text will appear in the index, followed by the section number(s) where the control code occurred. For @^, @., and @?, the index entry will be set respectively in roman type, in typewriter type, and as argument to the control sequence \9 (which is undefined in the standard format, but which the programmer may define in limbo). The first possibility is most suited for general concepts, the second for strings that occur in the program, and the third for any further special purpose one may think of. These control codes can be put either in the TeX part of a section or within C code; the effect will be the same, but this allows the programmer to put the control code in such a place that it is most likely to remain in the right place in case the section should be reorganised and possibly subdivided. As for references to identifiers, one can make an index reference underlined by prefixing the corresponding control code with @!.
Unlike the control text forming a module name, the control texts discussed here (as well as those that have not been introduced yet) should be contained in a single line of input; also, no spaces are contracted or removed. The control texts are passed unchanged to TeX (with only @@ being undoubled as usual), so that they can use TeX commands for special effects. Inside @. ... @> one can get the special characters # $ % ^ & { } ~ _ \ by prepending a backslash, \v gives a vertical bar |, and \␣ (a backslash followed by a space) gives a visible space ␣.
The control texts are also used as a sort key to determine the place in the index where the entry appears. Different occurrences of these control codes are combined in the index only if there is an exact match of both control code and control text, and no merging takes place with identifiers whose name happens to be equal to the control text (however, their relative order in the index is unpredictable). In sorting, a collating sequence is used that differs from the standard ASCII order: alphanumeric characters appear at the end of the sequence, with upper and lower case being considered equivalent, and the space character appears at the beginning of the sequence. In case there are entries that cannot be correctly positioned by ordinary means, the following trick has been suggested by Knuth: define \def\9#1{} and represent the tricky entries as @?⟨sort key⟩}{⟨TeX code⟩@>, where ⟨sort key⟩ contains sufficiently many characters to uniquely determine the position of the entry in the index, and ⟨TeX code⟩ produces the index entry itself; this works because CWEAVE will write the index entry \9{⟨sort key⟩}{⟨TeX code⟩}, which expands to {⟨TeX code⟩}.
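A comparison function implementing this collating sequence might look like the following C sketch. The names are hypothetical; the relative order of non-alphanumeric characters other than the space, and of digits versus letters, is not specified above, so ASCII order is assumed for both, and it is likewise assumed that a key sorts before any longer key it is a prefix of.

```c
#include <ctype.h>

/* Hypothetical rank of one character in the index collating sequence:
   end of string first, then the space, then the other characters, and
   the alphanumerics last, with upper and lower case treated alike. */
static int collate_rank(unsigned char c)
{
    if (c == '\0')
        return 0;                  /* assumed: a prefix sorts first */
    if (c == ' ')
        return 1;                  /* space: beginning of the sequence */
    if (isalnum(c))
        return 512 + tolower(c);   /* alphanumerics: end of the sequence */
    return 2 + c;                  /* everything else, in ASCII order (assumed) */
}

/* strcmp-like comparison of two index sort keys */
static int collate_compare(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p != '\0' && collate_rank(*p) == collate_rank(*q)) {
        ++p;
        ++q;
    }
    return collate_rank(*p) - collate_rank(*q);
}
```

Under this ordering an entry Zeta sorts after alpha, whereas plain ASCII order would put it first, because every capital letter precedes every lower-case letter.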
Besides references from the index, CWEAVE provides cross-references, in the form of the section numbers that link the (first) defining occurrence of a module name with the places where it is used and cited. There is also a mechanism for the user to explicitly state similar cross-references in the TeX part of a section, so that it is possible to make a reference to another section (where some related matters are treated) that will remain correct if sections are renumbered. The mechanism is simple: in the section referred to, one places the control code @:, followed by a control text serving as a label, and at the place of reference one uses @#, followed by the identical control text (both control texts are terminated by @>). The rules for placing @: are the same as for @^ and its relatives, except that @! has no effect here; the control text will not appear in the index, and there is no conflict when the same string is used as an identifier or index entry. For @# and its control text, CWEAVE basically substitutes the section number of the matching @: code, but because there might be multiple occurrences of @: with the same control text, the precise replacement rule is a bit more complicated. The replacing text is precisely what would follow 'See also section' in a cross-reference for a module name: one or more section numbers in increasing order, separated by commas and 'and' as appropriate, and preceded by a space and, in case there is more than one section number, by an 's' before that space. This is set up so that a reference of the form section@#label@> will generate a proper reference, whether or not there are multiple definitions of the label. One can also use \Sec@#label@> since in the standard format \Sec expands to '§' and \Secs to '§§' (in this case the space produced by @# is ignored after the TeX control sequence); by defining other TeX macros one could do anything one likes with the text provided by @#. Although @# cannot be used directly in comments and module names, it is possible to capture its text in a macro definition (within a TeX part) and use that macro instead.
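The replacement rule for @# can be illustrated with a small C function that builds the substituted text from the list of section numbers. The name format_section_refs is hypothetical, and the comma before "and" in lists of three or more numbers is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the text substituted for @#label@>, given the
   section numbers (in increasing order) of all @:label@> occurrences:
   one number gives " 7", two give "s 3 and 7", more give "s 3, 5, and 7". */
static void format_section_refs(const int nums[], int n, char *out, size_t size)
{
    size_t len = 0;
    if (size == 0)
        return;
    out[0] = '\0';
    for (int i = 0; i < n && len < size; ++i) {
        const char *sep;
        if (i == 0)
            sep = (n > 1) ? "s " : " ";         /* the 's' pluralises "section" */
        else if (i < n - 1)
            sep = ", ";
        else
            sep = (n > 2) ? ", and " : " and "; /* serial comma: an assumption */
        int written = snprintf(out + len, size - len, "%s%d", sep, nums[i]);
        if (written < 0)
            break;
        len += (size_t)written;                 /* loop stops once out is full */
    }
}
```

Placed after the word section, the three kinds of result read section 7, sections 3 and 7, and sections 3, 5, and 7.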
Control codes for layout in programs: @,, @|, @/, @), @\, @+, @; As we mentioned before, CWEAVE formats the program fragments and pieces of C code by inserting formatting controls in the output based on a syntactic analysis of the C tokens of the program fragments; in particular the layout of the code in the source file is completely ignored. Although this automatic formatting usually works well provided that CWEAVE succeeds in parsing the program fragment (possibly with help of some codes already discussed), there may still be occasions where one is not quite satisfied by the result. If one wishes certain constructions to be systematically treated in a different way, then a more pleasing style might be available by calling CWEAVE with certain options set; if not, then there is always the possibility of changing the grammar or layout rules of CWEAVE (that program was written in a way that tries to make this as easy as possible, but it still requires some careful study of the relevant chapters of the CWEAVE source document). However in some cases one simply wants to override the general rules in specific cases by adding or removing a few formatting controls. There are a number of control codes which can be used to do that. These codes are ignored by CTANGLE; since most of them deal with line breaks, their importance for |...| fragments is minimal.
The control code @, will insert a thinspace (a small amount of horizontal white space) where it is placed. Within a statement @| may be used to indicate a place where a line break may be optionally taken (with no associated penalty), when the statement is too long to fit on a single line. Note however that optional breaks are already allowed at most operator symbols, with a penalty that increases with the operator priority and the number of enclosing parentheses, so CWEB will almost always succeed in finding a very reasonable break point in long expressions. A line break can be forced by @/; this can be used for instance between statements (if line breaks are not already forced there), in order to group related statements on one line rather than simply putting as many on a line as will fit. The code @) will also force a line break, and in addition create a bit of vertical white space to give an even more visible separation. (CWEAVE will never issue more than one line break at the same place, so there is no problem if a line break was already present on that spot.) The code @\ is another variation: it forces a line break and backs up the next line by one indentation unit. It is useful before a module name that represents one or more cases in a switch statement: this will make the name line up with the case labels.
Finally, @+ cancels any (forced) line break that might be inserted by CWEAVE at the point where it is placed, and replaces it by a space with optional line break (the kind of space that is usually inserted between statements). Its main use is to force small conditional or loop statements onto a single line when CWEAVE would otherwise use a multiple-line layout. Because the line can still be broken at the inserted space, such one-liners do not make it impossible to retypeset the program in a narrower column. A warning is in order, however, for the case where, as a result of applying @+, a substantial stretch of C code is devoid of forced breaks and that code contains constructions that affect the indentation level. TeXnically speaking, the indentation at optional breaks is governed by the hanging indentation parameter of TeX, whose value is constant throughout a paragraph, which in this case is everything between two forced breaks; under the mentioned circumstances the amount of indentation at optional breaks can be unexpected and inappropriate.
For convenience, an alternative method is provided to fit compound statements on a single line, and similarly for struct and union specifiers. Instead of writing @+ at every place where CWEAVE would otherwise force a line break (which incidentally depends on the chosen layout style), it suffices to place @; immediately after the opening brace. This will activate a different set of layout rules than is normally used, which will not insert forced breaks between the declarations and statements of the compound statement, respectively between the fields of the struct or union specifier. In the case of a compound statement, any forced breaks caused by conditional or loop statements appearing directly inside the compound statement are also avoided (but nested statements are not affected, so they should be handled separately if present, possibly using another @;). Compound statements starting with {@; will be treated as if they were simple statements in further parsing, which may affect formatting; for instance, if the statement is the branch of a conditional it will be placed on the same line as the if or else controlling it. If this is too much of a good thing, a forced break may be explicitly inserted at the beginning and/or end of the compound statement; in fact the sequence @/{@; is a fairly common one.
There is another use of @+, which does not cause any breaks to be cancelled, but where on the contrary the purpose is to insert white space. It applies when a long string constant is needed, for which the string-break feature is used: a sequence of strings separated by white space only will be concatenated by the compiler into a single string. Although CTANGLE will correctly insert a space between any two consecutive strings, CWEAVE (guided by syntax rather than by lexical structure) will simply juxtapose them; by inserting @+ between the strings, one guarantees that in the printed document there will either be a horizontal separation or (if the constituent strings themselves are already long) a line break. Incidentally, if the problem of breaking a string is in the source file rather than in the printed output, one can use the traditional solution of an escaped newline within the string; CWEAVE will treat this as if the parts of the string were on the same source line. If one should create a string in this way that does not fit on a single line of output, a break will be introduced automatically at some point, which will be typeset as if a string-break was used. In very long strings however it is better to write string-breaks explicitly; for strings broken only by escaped newlines, the same length limit holds as for module names (1000 characters).
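Both ways of breaking a long string in the source can be seen in plain C. The two arrays below (their names are arbitrary) end up with identical contents: the first relies on the compiler concatenating adjacent literals, the second on an escaped newline inside a single literal.

```c
/* String-break: two literals separated only by white space are
   concatenated by the compiler into one string. */
static const char banner[] = "This is a long message that is "
                             "broken up in the source file.";

/* Escaped newline: the backslash-newline pair is removed before the
   literal is tokenised, so both source lines form one string.  Any
   leading blanks on the continuation line would become part of it. */
static const char spliced[] = "This is a long message that is \
broken up in the source file.";
```

In a CWEB source, writing @+ between the two literals of the first form ensures that the printed document separates them, since CWEAVE would otherwise juxtapose them.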
Codes for special items in C code: @p, @v, @t, @&, @=, @ Unlike TeX text, pieces of C code are broken up into tokens by both CTANGLE and CWEAVE, stored internally and output at some later time after having undergone some processing. This makes it potentially difficult to put something into C code that CWEB is not prepared to handle. Since C is a much more regular language than TeX, occasions where one would need to do such a thing should be quite rare, yet some escape mechanisms have been provided, which we treat in this subsection.
The code @p can be used to explicitly specify the place where the preprocessor directives generated by @d and @h commands will be placed in the C file. Multiple use of @p is allowed; as soon as it is used at least once, the default placement at the beginning of the C file is cancelled. This code provides the only way to write the directives generated by @d and @h to an auxiliary output file. In the formatted output this code is represented by the pseudo-module ⟨Preprocessor directives⟩, which (like preprocessor directives embedded in a program fragment) is set on a separate line and does not otherwise affect the formatting of the surrounding code.
Two other codes are intended mainly for use within |...|. As mentioned earlier, @v represents the bitwise-or operator. The code @t is followed by a control text, which can be used to insert any TeX symbols into a C expression; the result gets category expression but (if used in a program fragment) does not produce any actual C code. It is for instance possible to obtain phi < π/2 by writing | phi < @t$\pi$@> / 2 |, or, if one prefers the fraction in built-up form (π over 2), by writing | phi < @t$\pi\over2$@> |. The control text is put into an \hbox that will appear at the specified point in the formula. One might imagine using @t as a means to sneak in TeX commands that will modify the formatting produced by CWEAVE, but this is strongly discouraged unless one thoroughly understands that formatting and the way it is obtained.
The codes @& and @= are intended as a means to alter or bypass the processing of C tokens by CTANGLE;
they should only be used in very exceptional situations. The code @& forces CTANGLE to output the symbols
to the left and right of it directly adjacent to each other. Normally CTANGLE inserts space between two
symbols if it thinks this is necessary for lexical reasons, regardless of whether a space was present in the
input. Items with a lexical structure unknown to CTANGLE might confuse it, so that it would output a
spurious space; this space could then be eliminated by @&. For instance, an earlier version of CTANGLE
would not recognise 1000000UL as a constant, and consequently it output a space before the U, so that the
C compiler could not recognise it either; this problem could then be remedied by inserting @&. No similar
cases are known for the current version of CTANGLE.
The code @= can be used to place some text in the C file that CTANGLE will not produce by ordinary means: the control text following @=, up to the next @>, is copied verbatim to the C file (with @@ undoubled as usual). If some special compiler activity, or some action by another tool, is triggered by the occurrence of some special form of comment in the C code, then such a comment can be placed using @= (normally comments are removed by CTANGLE). Also, should CTANGLE unjustly decide that two symbols need no space in between them, then a space can be forced by writing @= @>.
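As an example of the special comments meant here, consider lint and GCC's -Wimplicit-fallthrough warning. Both accept a comment such as /* FALLTHROUGH */ just before a case label as a sign that falling into the next case is intentional. Because CTANGLE removes ordinary comments, such a marker would be written as @=/* FALLTHROUGH */@> in the CWEB source. The function score below is made up purely to show what the tangled C output would then contain.

```c
#include <assert.h>

/* Illustrative tangled output for a made-up function. The FALLTHROUGH
   marker is the kind of comment that tools look for; in the CWEB source it
   would have to be wrapped in @= and @>, since CTANGLE drops ordinary
   comments. */
int score(char grade)
{
    int points = 0;
    assert(grade >= 'A' && grade <= 'F');  /* grades A to F only */
    switch (grade) {
    case 'A':
        points += 1;
        /* FALLTHROUGH */
    case 'B':
        points += 2;
        break;
    default:
        break;
    }
    return points;
}
```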
operator  macro
==        \E
!=        \I
<=        \Z
>=        \G
&&        \W
||        \V
!         \R
&         \AND
|         \OR
^         \XOR
~         \CM
<<        \LL
>>        \GG
++        \PP
--        \MM
%         \MOD
->        \MG
##        \SS
When such a macro is redefined, it is best to consult the original definition first, since it often issues a penalty, which should be retained. Formatting of ordinary identifiers and keywords is performed by \\ and \&, each of which takes one argument, typeset in italic and boldface type respectively; similarly \. is used for items in typewriter type, such as strings and all-caps identifiers. In the argument of \. special characters can be used if escaped, as discussed for @.. For & in ordinary text \AM can be used (rather than \&). For names in all caps, like ASCII or UNIX, the macro \caps is provided, which makes them slightly less obtrusive by selecting a smaller font; for C and C++ the macros \Cee and \Cpp are provided. Typesetting of comments, C++ one-line comments, and numeric constants is controlled by the macros \C, \SHC, and \T, respectively; these can be redefined if a different style is desired.
The dimensions of the pages can be controlled by setting the parameters \pagewidth, \pageheight (the height of the text area), \fullpageheight (the height including the running head), and \pageshift (extra displacement of odd-numbered pages with respect to even-numbered ones), and then invoking the macro \setpage. A magnification can be applied to the entire document by saying \magnify{n}, where n is the magnification in thousandths of the ordinary scale (so \magnify{1200} enlarges by 20 percent); this should precede any changes of the page dimensions, but if no changes are made, the page dimensions will be set to their standard values, unmagnified. The unit of indentation can be set by \indentation{⟨size⟩}.
The title of the program is taken from the macro \title, whose default value is the basename of the program source file, converted to upper case. It is used in running heads and in the table of contents. Another part of the running heads is set to the chapter title by sections starting with @*. By defining the macro \gtitle in limbo, the corresponding text for the running heads on any pages before the first such section can be set (the default is "CWEB output"). By invoking \titletrue the running head can be suppressed for one page; this is useful if the text in limbo produces a title page. The date of processing (by TeX) can be included in the document before the first section by putting \datethis in limbo; it can be placed on the table of contents by saying \datecontentspage. At the end of the document one normally has an index, a list of module names and the table of contents, in that order. TeX can be made to stop short of any one of these by invoking \noinx, \nomods, or \nocon respectively. As already mentioned, stating \changesonly will limit the printed output to the sections affected by the change file. The appearance of the table of contents can be controlled by redefining \topofcontents and \botofcontents. These macros determine the material that comes above the table and below it, including its title and the glue needed to fill up the page height. The page number of the table of contents is assigned from \contentspagenumber (the default is 0), but it will not appear in print because the running head is suppressed on that page.
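The default value of \title can be mimicked in C as a rough sketch. We assume here that "basename" means the last path component with its extension (such as .w) removed. The function default_title is hypothetical and is not part of CWEAVE or cwebxmac.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: derive a default title from a source file name,
   assuming "basename" means the last path component without its extension.
   Writes at most size-1 characters plus a terminating NUL to out. */
void default_title(const char *path, char *out, size_t size)
{
    assert(path != NULL && out != NULL && size > 0);
    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;          /* drop directories */
    const char *dot = strrchr(base, '.');
    size_t len = (dot != NULL) ? (size_t)(dot - base) /* drop extension */
                               : strlen(base);
    if (len >= size)
        len = size - 1;                               /* truncate to fit */
    for (size_t i = 0; i < len; i++)
        out[i] = (char)toupper((unsigned char)base[i]);
    out[len] = '\0';
}
```

Under that assumption, a source file src/compare.w would get the default title COMPARE.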
It may be noted that CWEB documents contain some fixed phrases in the English language, such as the cross-references at the end of sections. These are not produced directly by CWEAVE, however, but by macros. One could therefore adapt CWEB to a different language by redefining the macros \A, \As, \Q, \Qs, \U, \Us, \ET, \ETs, \ch, \postATL, \ATP, \today, \now, and parts of \fin and \con.
9 Comparison with Levy/Knuth CWEB
As was mentioned in the introduction, CWEBx is derived from an earlier CWEB system (itself derived from Knuth's WEB) that was written and distributed by Silvio Levy and Donald E. Knuth. That CWEB system has independently evolved into a version currently distributed as CWEB 3.3. Both CWEBx and Levy/Knuth CWEB have undergone changes with respect to their common ancestor, although the spirit of the system has not fundamentally changed in either case. Since we considered it undesirable to have a great divergence between systems that both intend to be a WEB system for C, we made a conscious effort to reduce the differences between CWEBx and Levy/Knuth CWEB by incorporating the extensions of the latter into CWEBx as well.
There was one deliberate exception: we made no attempt to extend the grammar used by CWEAVE to handle the full C++ language. On the other hand, hoping to fully bridge the gap between Levy/Knuth CWEB and CWEBx for C programs, we added a compatibility mode to CWEBx in which it tries to mimic the behaviour of Levy/Knuth CWEB in all aspects that are relevant to the programmer (even in cases where that behaviour is undocumented). This comes at the price of losing some possibilities that CWEBx normally has.
The description of the differences between Levy/Knuth CWEB and CWEBx can be divided into two parts: the differences between Levy/Knuth CWEB and the compatibility mode of CWEBx, and the differences between CWEBx with and without compatibility mode. The former differences are minimal, but hard to enumerate precisely, as they are mainly a matter of difference in implementation. The latter differences are much more significant, but they can easily be listed. Precise details of those differences can be found by looking up all index references of the identifier compatibility_mode in the sources for the programs of CWEBx. For the differences that only involve processing by TeX, one can consult the file cwebcmac.tex, which modifies the cwebxmac format to emulate the environment provided by the cwebmac format of Levy/Knuth CWEB.
As was stated, the differences between Levy/Knuth CWEB and the compatibility mode of CWEBx should not be relevant to the programmer. Such differences can certainly be found, however, if one uses Levy/Knuth CWEB in a way that relies on knowledge of intimate details of its implementation (which are not described in the manual but can be learned from studying the sources). This applies particularly to using the TeX code produced by CWEAVE in unusual ways. Unfortunately there is no clear specification of which aspects of CWEB are well defined, so that the user can safely rely on them, and which aspects are implementation details. We have taken a pragmatic attitude by reproducing all aspects that are described in the manual, and moreover many undocumented aspects. This is enough to process the sources of Levy/Knuth CWEB itself and of the Stanford GraphBase without problems. To give an impression of the kind of differences that remain, we shall list some of the known ones.
The output files written by CWEBx are not identical to those written by Levy/Knuth CWEB. Processing them other than directly by a C compiler or by TeX, respectively, may therefore reveal some deviations. For instance, Levy/Knuth CWEB writes entries for the index and the list of module names using the control sequence \I. Since \I is also used for representing the operator !=, this causes problems for module names in which that operator is used, so CWEBx uses \@ instead. In CWEBx, CTANGLE reports and corrects unbalanced braces or parentheses in program fragments or macro replacement texts, as an aid in catching programming errors early. Levy/Knuth CWEB does not do this, but programs with such unbalanced symbols will still get CWEAVE into serious trouble. In compatibility mode the definition of \PB ensures that |...| can always be used from within math mode; in Levy/Knuth CWEB this is true only in simple cases.
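Here is a simplified stack-based sketch of the kind of check involved in reporting unbalanced braces and parentheses. The name check_brackets is hypothetical. How CTANGLE actually repairs an imbalance is not described here, so the sketch only detects and locates the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified sketch, not CTANGLE's actual algorithm: check that (), [] and {}
   are balanced and properly nested in a piece of C text, skipping string and
   character literals and comments. Returns true if they are; otherwise
   stores the offset of the offending character (or the text length, if
   something is left unclosed) in *err_pos and returns false. */
bool check_brackets(const char *text, size_t *err_pos)
{
    char expected[256];               /* stack of expected closing brackets */
    size_t depth = 0, i;
    assert(text != NULL && err_pos != NULL);
    for (i = 0; text[i] != '\0'; i++) {
        char c = text[i];
        if (c == '/' && text[i + 1] == '*') {          /* skip a comment */
            const char *end = strstr(text + i + 2, "*/");
            if (end == NULL) {
                i += strlen(text + i);                 /* runs to the end */
                break;
            }
            i = (size_t)(end - text) + 1;              /* at the closing '/' */
            continue;
        }
        if (c == '"' || c == '\'') {                   /* skip a literal */
            for (i++; text[i] != '\0' && text[i] != c; i++)
                if (text[i] == '\\' && text[i + 1] != '\0')
                    i++;                               /* skip escaped char */
            if (text[i] == '\0')
                break;                                 /* unterminated literal */
            continue;
        }
        if (c == '(' || c == '[' || c == '{') {
            if (depth == sizeof expected) {
                *err_pos = i;                          /* nested too deeply */
                return false;
            }
            expected[depth++] = c == '(' ? ')' : c == '[' ? ']' : '}';
        } else if (c == ')' || c == ']' || c == '}') {
            if (depth == 0 || expected[--depth] != c) {
                *err_pos = i;                          /* unmatched closer */
                return false;
            }
        }
    }
    if (depth != 0) {
        *err_pos = i;                                  /* something left open */
        return false;
    }
    return true;
}
```

For example, check_brackets("f(a{b)}", &pos) fails with pos set to 5, the position of the ) that meets an open {.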
Comments in the TeX parts of sections (following a non-escaped % character) are ignored and removed by CWEBx. In Levy/Knuth CWEB, by contrast, they are processed normally and copied to the output. This may cause spurious index entries and, in exceptional cases, may cause part of the comment to appear in print.
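The removal of TeX comments can be sketched as follows. This is a simplified, hypothetical model, and strip_tex_comment is not a CWEBx function. A backslash escapes the character after it, and a % inside |...| is left alone because there it is the C modulo operator. Other subtleties, such as | characters inside C strings, are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: cut a TeX comment off one line of the TeX part of a
   section, in place, and return the new length of the line. A backslash
   escapes exactly one following character, so in "\\%" the '%' does start
   a comment; inside |...| a '%' is C code and is kept. */
size_t strip_tex_comment(char *line)
{
    bool in_code = false;
    size_t i;
    assert(line != NULL);
    for (i = 0; line[i] != '\0'; i++) {
        if (line[i] == '\\' && line[i + 1] != '\0')
            i++;                      /* skip the escaped character */
        else if (line[i] == '|')
            in_code = !in_code;       /* enter or leave a |...| fragment */
        else if (line[i] == '%' && !in_code) {
            line[i] = '\0';           /* cut off the comment */
            break;
        }
    }
    return i;
}
```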
The grammars used by CWEAVE in the two systems are quite unrelated; for CWEBx, the only guideline in constructing the grammar has been the ANSI/ISO C syntax. When processed by CWEBx with the proper options selected, CWEB documents will look similar to the result produced by Levy/Knuth CWEB, but not identical. Unlike Levy/Knuth CWEB, CWEBx places optional breaks at operators, reflecting their priority and nesting inside parentheses. In Levy/Knuth CWEB, if @ is immediately followed by a subsectioning code, then the output from the subsectioning code (e.g., #define for @d) will be placed on the same line as the section number. If anything comes in between, even an extra space, or if the @ in the sectioning code was followed by a newline rather than by a space, then that output is moved to the beginning of a fresh line. In CWEBx, output from a subsectioning code never appears on the same line as the section number.

A note on the exception mentioned at the start of this section, namely that CWEAVE does not attempt to handle the full C++ language: C++ has a significantly more complicated syntax than C, which is already far from simple, and it has some forms of context dependence that make it doubtful whether CWEAVE could ever reliably handle C++ in full generality (and even then, C++ is a moving target). Most effort was spent on getting the grammar for C correct; support for C++ was restricted to some extensions of C that could be incorporated easily.
CWEBx has a number of control codes and a number of command line options that Levy/Knuth CWEB does not have. Moreover, Levy/Knuth CWEB has some control codes under different names, and those names are used for other purposes in CWEBx. In compatibility mode such control codes have the same interpretation as in Levy/Knuth CWEB. If a code has no interpretation there while it has one in CWEBx, the latter is taken. This means that in compatibility mode one can use the control codes @v, @\, and @~, which are ignored in Levy/Knuth CWEB, while @? and @) can be used as aliases for @: and @#, respectively. Furthermore, the command line options controlled by the characters l, d, t, e, a, u, w, m, and + are also extras with respect to Levy/Knuth CWEB. Finally, the control code @; retains all its CWEBx uses in compatibility mode, except that of modifying the category of module names (which is different anyway). In Levy/Knuth CWEB, by contrast, it can only be used as an invisible semicolon.
The most direct difference between CWEBx with and without compatibility mode is that in compatibility mode the control codes @h, @:, @#, and @p are translated into @p, @?, @), and @c, respectively. This implies that the meanings of @h, @:, and @# as described in this manual are not available in compatibility mode. There is also an important syntactic adjustment. In compatibility mode module names are always treated as expressions, which means they must almost always be followed by @; to make the combination behave as a statement. Alternatively they can be followed by ;, which will however become an empty statement in the C program. Then there are a few points where compatibility mode lifts certain restrictions, thereby reducing the diagnostic capabilities. All 8-bit characters will be accepted by CTANGLE, whether or not an explicit translation was specified using @l. The default translation used corresponds to @l NN XNN, where NN is the 2-digit hexadecimal code, in upper case, for the character. Module names used within |...| do not have to be the final item. Rather than performing @i inclusions before the change file is matched, the order is more or less reversed. If some @i line is not replaced by the change file, the included file will again be scanned for changes. Consequently, @i is allowed (and meaningful) in the change file.
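Two of the compatibility-mode adjustments above are simple mappings, sketched below with hypothetical helper names. The function compat_control_code applies the renaming of control codes exactly once, so @h becomes @p and not @c. Only the lowercase codes named in the text are handled. The function default_translation produces the replacement text XNN used when no @l translation was given; for example, the character with code 0xE9 is written as XE9.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the renaming of control codes in compatibility mode,
   as listed in the text (@h -> @p, @: -> @?, @# -> @), @p -> @c). The
   argument is the character after '@'; it is mapped once, not repeatedly. */
char compat_control_code(char c)
{
    switch (c) {
    case 'h': return 'p';
    case ':': return '?';
    case '#': return ')';
    case 'p': return 'c';
    default:  return c;               /* all other codes are unchanged */
    }
}

/* Hypothetical helper: default translation of an 8-bit character, namely
   'X' followed by its code as two uppercase hexadecimal digits. */
void default_translation(unsigned char ch, char out[4])
{
    assert(ch >= 0x80);               /* only 8-bit characters need this */
    snprintf(out, 4, "X%02X", (unsigned)ch);
}
```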
The remaining alterations brought about by compatibility mode are fairly minor. Trailing digits in identifiers will not be set as subscripts. The TeX control sequence corresponding to identifiers that are given the category of TeX by means of a format definition will be processed in math mode rather than horizontal mode. The code @t together with the following control text will not be treated as an expression in parsing. Instead, it is an inert item that sticks to the token to its right, unlike comments, which are attached to the token to their left. This allows @/@t\4@> to be used in place of @\. Compound assignment operators like += are treated as two separate tokens; among other things, this implies that the operator |= must be entered as @v= when used inside |...|. In index entries produced by @^, @., or @:, the underscore character will be automatically escaped by a backslash, unlike other special characters. This makes it harder to enter formulas with subscripts into the index. The output of |...| is made an argument to the control sequence \PB, whose default definition puts its argument into an \hbox; this makes it safe to use |...| inside math mode when using compatibility mode. Finally, there are a few other small changes to the format used. C++ one-line comments will be formatted as if they were ordinary C comments. The font and spacing of lines in the table of contents will depend on the depth specified for the corresponding chapter title.
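In normal mode, by contrast, CWEAVE sets trailing digits of identifiers as subscripts. A sketch of that split follows. It assumes that all trailing digits form the subscript, but never the first character. It renders the result in plain TeX math notation such as c_{1}, which is not the control sequence CWEAVE actually writes. Both function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, assuming all trailing digits of an identifier (but
   never its first character) form the subscript. Returns the length of the
   part that stays on the baseline. */
size_t subscript_start(const char *id)
{
    assert(id != NULL);
    size_t n = strlen(id);
    while (n > 1 && isdigit((unsigned char)id[n - 1]))
        n--;                          /* back up over trailing digits */
    return n;
}

/* Render id in TeX math notation, e.g. "c1" as "c_{1}"; the control
   sequences CWEAVE really emits are different and are not modelled here. */
void format_identifier(const char *id, char *out, size_t size)
{
    size_t base = subscript_start(id);
    if (id[base] == '\0')
        snprintf(out, size, "%s", id);                /* no trailing digits */
    else
        snprintf(out, size, "%.*s_{%s}", (int)base, id, id + base);
}
```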
10 Summary of CWEB codes
For reference, we give a table with all the codes used in CWEB and their main characteristics. The letters in the column "where" indicate which kinds of source text may immediately precede the code. L indicates text in limbo, T the TeX part of a section, M an intermediate part of a section (from @d, @h, @f, or @s), C the C part of a section, and c pieces of C code within |...|. The letter M is only used when the code terminates an intermediate part; inside the parts after @d, @h, and @f, the letter C applies. The column "frequency" indicates how commonly the code is used, with regular > incidental > rare > emergency. For the codes with frequency rare, no sensible use could be found within any source file for the CWEB system itself. The codes @0–@3, which are only of temporary use, are omitted from the table.
code meaning where frequency remarks
Sectioning codes
@* Start of chapter LTMC regular @**, @*n, or @* Title.
@ Start of section LTMC regular
@~ Start of section tied to previous one LTMC regular
Subsectioning codes
@c Start of unnamed program fragment TM regular also @C
@< Start of module name TMCc regular @< Module name @>
@d #define; start macro definition TM regular also @D
@h #include; specify included header file TM regular also @H
@f Format definition; change syntactic category TM incidental also @F
@( Start module name defining output file TMCc incidental @( file name @>
Parsing control codes
@; Invisible semicolon, or magic wand for syntax Cc incidental
@[ Start of item forced to expression Cc incidental
@] End of item forced to expression Cc incidental
Cross-referencing codes
@! Make index reference underlined TCc rare
@^ Index entry in roman type TCc regular @^index entry@>
@. Index entry in typewriter type TCc regular @.index entry@>
@? Index entry formatted by \9 TCc rare @?index entry@>
@: Define label for explicit cross-reference TCc incidental @:label@>
@# Explicit cross-reference to defined label T incidental @#label@>
Layout control codes
@, Thin space Cc rare
@| Optional line break Cc rare
@/ Forced line break Cc incidental
@) Forced line break with vertical white space Cc incidental
@\ Forced line break, next line backed up Cc incidental
@+ Cancel any line break, replace by space Cc incidental
C control codes
@p Insert output from @d and @h Cc rare also @P
@v Bitwise or operator | Cc incidental also @V
@t TeX code within expression Cc incidental @tTeX code@>; also @T
@& Glue together adjacent tokens Cc emergency
@= Insert verbatim C code Cc emergency @=verbatim C code@>
@' ASCII constant converted to number Cc rare @'c'
Silent control codes
@s Non-printing version of @f LTM rare
@q Ignored control text LTCc rare @qany text@>; also @Q
@l Specify translation of 8-bit character L rare @l xx string ; also @L
Miscellaneous codes (these are not control codes)
@@ Representation of @ LTCc incidental legal in control text too
@i Insert subsidiary source file any incidental also @I
@x Start of change; old lines follow any incidental also @X
@y Middle of change; replacement lines follow any incidental also @Y
@z End of change any incidental also @Z
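The where column can be read as a lookup table. The sketch below encodes a subset of the rows above; the struct and the function may_follow are hypothetical. It answers whether the table lists a code as allowed after a given kind of source text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A few rows of the summary table: the character after '@' and the letters
   of the "where" column (L limbo, T TeX part, M intermediate part, C C part,
   c code within |...|). Only a subset of the table is included. */
struct code_info {
    char code;
    const char *where;
};

static const struct code_info table[] = {
    { '*', "LTMC" }, { 'c', "TM" }, { '<', "TMCc" }, { 'd', "TM" },
    { 'h', "TM" },   { ';', "Cc" }, { '^', "TCc" },  { '#', "T" },
    { 's', "LTM" },  { 'l', "L" },
};

/* Returns true if the table lists code as allowed after a part of the given
   kind; codes outside this subset are reported as not allowed. */
bool may_follow(char code, char part)
{
    assert(part != '\0');             /* strchr would match the terminator */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].code == code)
            return strchr(table[i].where, part) != NULL;
    return false;
}
```

For instance, may_follow('d', 'C') is false, matching the rule that a macro definition cannot follow the C part of a section.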
CWEBx Manual
1 Overview
2 About literate programming
  Structured programming
  Limitations of traditional structured programming
  Requirements for literate programming
  WEB systems for literate programming
3 What a CWEB program looks like
  Some remarks about the example program
  Further attributes of CWEB programs
  Output to multiple files
4 How to create a CWEB program
  The general setup
  Sectioning codes: @*, @ , @~
  Subsectioning codes: @d, @h, @f, @c, @<...@>=
  Text within C program fragments: comments and module names
  C code within text: |...| fragments
  Modules producing additional output files: @(...@>
  Control codes that help parsing in special situations: @;, @[, @]
5 Invocation of CTANGLE and CWEAVE
  Command line options
  File name arguments
6 Subsidiary input files and change files
7 Control codes for advanced or emergency use
  Control codes for cross-referencing: @!, @^, @., @?, @:, @#
  Control codes for layout in programs: @,, @|, @/, @), @\, @+, @;
  Codes for special items in C code: @p, @v, @t, @&, @=, @'
  Control codes behind the scenes: @s, @q, @l
  Control codes for tracing CWEAVE: @0, @1, @2, @3
8 Some features of the standard format
9 Comparison with Levy/Knuth CWEB
10 Summary of CWEB codes