Algorithm Description
algorithms, then give a detailed description that the programmers can understand. You do not have to give a full
theoretical analysis of the algorithm, but a few sentences to explain why you chose the algorithm you did would be
helpful.
Your report will be read by two audiences: the managers of DDD and the programmers. The managers have no
formal training in computer science, but want to know that you’ve solved their problem. The programmers know a
little computer science, but are not familiar with sorting algorithms. The programmers may know as much as you
do, but because they can’t write nice reports, they get sweatshop wages while you earn consultant rates.
11.4 Figures and displays
A manager reading your report will look at the executive summary and at the figures, to see what the report is about
and whether it is worth passing on to the programmers who’ll read the complete report. It should be possible to get
a fair idea of the content of the report just from looking at the figures and reading the captions. Look at the articles
in Scientific American for an excellent example of how figure captions can be used to convey most of the substance
of a technical article.
You have several opportunities for figures in this report—you can use them for the pseudo-code for the algorithm,
for showing the change in the data structure as the sorting progresses, for displaying the relative speed of different
algorithms as a function of the number of items to sort, and so forth.
There are two ways to insert non-textual material into a report: figures and displays.
A figure can appear anywhere in the report, and usually appears on the same page as, or slightly after, the first
reference to it. Figures are numbered and have a caption explaining what the figure is illustrating. Each figure must
be referred to somewhere in the text, either as part of a sentence, like this: “Figure 37 illustrates the relationship
between . . . ”; or as a parenthetical remark, like this: “(See Figure 38.)”. Remember to capitalize the word “figure”
when it is used as part of the name for a particular figure.
A display differs from a figure in that it is inserted in a fixed place, and is not allowed to “float” to a different
place in the text. Displays are used for math formulas, very small program fragments, and quotations that are too
long just to put in quotes in the middle of a paragraph. Some empty space is usually inserted before and after each
display to set it off from the surrounding text, and displays are often centered on the page. Mathematical formulæ
are usually inserted as displays, and are grammatically part of the sentence before them. For example, I can define
the golden ratio as the positive solution to the quadratic equation
    x^2 = x + 1 ,
11.4.1 Graphics
Pictures are often the clearest way to explain data structures, particularly when pointers are required. Whenever a
picture is used, it should be a numbered, captioned figure. Figures should be (nearly) comprehensible without the
accompanying text. Make sure the caption explains what the figure means. Something like “Figure 3. Step 2 of the
algorithm.” is nearly useless. A better caption might be “Figure 3. Swapping out-of-order data elements pointed to
by i and j.”
The pictures you are likely to draw are block diagrams, symbolic representation of pointer structures, maps of
contiguous sections of memory, and graphs showing how something (time, cost, current, voltage, . . . ) varies when
changing various parameters. The rest of this section will talk only about numerical graphs, though you probably
will need to generate other sorts of graphics, even for this assignment.
Proper treatment of graphics is really beyond the scope of this class. A fairly good introduction is included
in Chapters 8 and 9 of Huckin and Olsen [HO91, 137–184]. If you want a comprehensive treatment of presenting
numerical (mainly statistical) data in a professional form, the best book is Tufte’s The Visual Display of Quantitative
Information [Tuf83] on reserve in the Science Library. Other books can be found under subject headings like
“Engineering Graphics,” “Graphic Methods,” and “Computer Graphics.” For examples of how not to present
numerical information, see almost any mass-market magazine or newspaper. USA Today is particularly good at
distorting tiny data sets into fancy pictures that hide the data. Another particularly good collection of bad examples
can be found in a book by Time Magazine illustrator Nigel Holmes [Hol84], although they are presented as if they
were the best way to illustrate data.
Tufte summarizes his theory of data graphics as follows [Tuf83, 105]:
Five principles in the theory of data graphics produce substantial changes in graphical design. The
principles apply to many graphics and yield a series of design options through cycles of graphical revision
and editing.
• Above all else, show the data.
• Maximize the data-ink ratio.
• Erase non-data ink.
• Erase redundant data ink.
• Revise and edit.
Tufte also has some strong comments about “chartjunk”, the cluttering of graphs with non-informative decorations
[Tuf83, 121]:
Chartjunk does not achieve the goals of its propagators. The overwhelming fact of data graphics is that
they stand or fall on their content, gracefully displayed. Graphics do not become attractive and interesting
through the addition of ornamental hatching and false perspective to a few bars. Chartjunk can turn
bores into disasters, but it can never rescue a thin data set. The best designs (for example, Minard on
Napoleon in Russia, Marey’s graphical train schedule, the cancer maps, the Times weather history of
New York, the chronicle of the annual adventures of the Japanese beetle, the new view of the galaxies)
are intriguing and curiosity-provoking, drawing the viewer into the wonder of the data, sometimes by
narrative power, sometimes by immense detail, and sometimes by elegant presentation of simple but
interesting data. But no information, no sense of discovery, no wonder, no substance is generated by
chartjunk.
One problem that has appeared repeatedly is that students have given us plots of n^2 and n lg n for n ranging from
1 to 10, to show the advantages of one of the O(n lg n) algorithms over bubble sort. There are two flaws: internal
sorting of 10 elements is so fast that no one cares which algorithm is used, and for such short lists some of the O(n^2)
algorithms are faster than the O(n lg n) ones. If the plots showed from 100 to 10,000 records being sorted, the graph
would be both more meaningful to the reader and more honest.
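To see why the range of n matters, it can help to tabulate the two growth terms before plotting anything. The following is a minimal sketch in Python; the function name growth_table and the chosen sizes are illustrative assumptions, and the numbers are the bare terms n^2 and n lg n, not measured running times or exact comparison counts.

```python
import math

def growth_table(sizes):
    """Return one (n, n^2, n lg n) row for each list size n."""
    return [(n, n * n, n * math.log2(n)) for n in sizes]

# Print the two growth terms and their ratio for a few list sizes.
print(f"{'n':>6} {'n^2':>12} {'n lg n':>12} {'ratio':>8}")
for n, quadratic, linearithmic in growth_table([10, 100, 1000, 10000]):
    ratio = quadratic / linearithmic
    print(f"{n:>6} {quadratic:>12} {linearithmic:>12.0f} {ratio:>8.1f}")
```

At n = 10 the two terms differ by only a small factor, while at n = 10,000 they differ by a factor of several hundred, which is the difference a graph over the larger range would make visible.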
Also, the graphs are often misleading, in that they are labeled with seconds or some other inappropriate unit. If
you had done some real speed tests, then seconds would be the right unit, but if you have just done a theoretical
analysis, then it is better to be more explicit about what you are plotting—in most analyses the number would be
the number of comparisons or the number of exchanges done. If you want to provide a good graph, you might look
up the formula for the expected number of comparisons actually done by each algorithm. The best place to find
accurate formulas for the standard sorting algorithms is Knuth’s Sorting and Searching [Knu73].
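If you would rather count comparisons directly than look up a formula, you can instrument the algorithms yourself. The sketch below, in Python, counts element comparisons for one simple variant of bubble sort (with an early exit when a pass makes no swaps) and one top-down merge sort. The function names and the test data are assumptions made for illustration, and the counts apply only to these particular variants.

```python
import random

def bubble_sort_comparisons(items):
    """Bubble-sort a copy of items; return (sorted list, comparisons)."""
    data = list(items)
    comparisons = 0
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:  # a pass with no swaps means the list is sorted
            break
    return data, comparisons

def merge_sort_comparisons(items):
    """Merge-sort a copy of items; return (sorted list, comparisons)."""
    data = list(items)
    if len(data) <= 1:
        return data, 0
    mid = len(data) // 2
    left, left_count = merge_sort_comparisons(data[:mid])
    right, right_count = merge_sort_comparisons(data[mid:])
    comparisons = left_count + right_count
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged, comparisons

random.seed(0)  # fixed seed so the experiment is repeatable
records = [random.randrange(100000) for _ in range(1000)]
_, bubble_count = bubble_sort_comparisons(records)
_, merge_count = merge_sort_comparisons(records)
print("bubble sort comparisons:", bubble_count)
print("merge sort comparisons: ", merge_count)
```

A graph built from counts like these should label its vertical axis "comparisons", not "seconds".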
Here are some quick hints for creating usable graphs:
• Make sure that each graph has a single, clear purpose, and use the caption to point out what the reader is
supposed to notice.
• Label both coordinate axes.
• Do not print a grid, but do put tick marks on the axes.
• Don’t clutter up the graph with cute pictures.
• Make sure the ranges of the axes are appropriate.
11.4.2 Pseudo-code
Most of you have seen flowcharts for describing techniques. Although touted as a cure-all to programming woes in the
1960s, they have essentially disappeared from professional programming. They encouraged unstructured “spaghetti”
code, and didn’t help much in understanding the resulting mess. The small boxes in flowcharts give far too low an
information density, using lots of paper to say very little.
Instead of flowcharts, most algorithms are now described using pseudo-code. The pseudo-code syntax can be
based on any structured language (Algol, Pascal, C, Ada, . . . ), but should be readable by someone familiar only with
a different language. Tests and statements are not written in full detail; instead, the intent of the statement or test
is given. For example, pseudo-code for finding the minimum of an array might look like
    min ← ∞
    for each element of array
        if element < min then min ← element
Pseudo-code should be properly indented and either displayed like the example above, or put into a figure with
a figure number and caption. Figures are generally easier for magazine and book typesetters to handle, and are
essential for big chunks of pseudo-code, but displays are easier to read for small pieces of code.
If you do put the pseudo-code in a figure, try to arrange the layout so that the figure comes after the explanation
has started. It is unfair to the reader to dump a big chunk of unexplained code in front of him or her. It can be very
frustrating to slog through the code only to find an explanation after you turn the page.
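To show how much detail pseudo-code is allowed to leave out, here is one way the minimum-finding pseudo-code in this section might be turned into running code. This is only a sketch in Python; the function name minimum and the choice to return infinity for an empty array are assumptions, not part of the pseudo-code.

```python
import math

def minimum(array):
    """Return the smallest element of array (infinity if it is empty)."""
    smallest = math.inf          # corresponds to "min <- infinity"
    for element in array:        # "for each element of array"
        if element < smallest:   # "if element < min then min <- element"
            smallest = element
    return smallest

print(minimum([42, 7, 19]))
```

The real code has to settle questions (what to return for an empty array, what to name the result) that the pseudo-code can leave to the reader.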
• First describe the outside view of the recursive function when viewed as a “black box” whose internal operations
are not visible. For example, when describing a recursive procedure that sorts an array, say something like
The procedure sort(arr,low,high) sorts the elements of the array arr[low] through arr[high]
into increasing order.
• Describe how the problem is subdivided and recombined in fairly general terms. For example,
Sorting is accomplished by dividing the sequence to be sorted into two smaller sequences, sorting
each sequence independently, then merging the two sorted subsequences.
Note that the above description could apply equally well to quicksort or merge sort. The differences between
the two algorithms come in how the partitioning and merging are done. Merge sort uses a trivial partition and
a more complex merging operation, while quicksort uses a fairly complex partitioning operation and a trivial
merge.
• Describe the boundary conditions that cause the recursion to stop.
A sequence with only a single element is already sorted, so no partitioning and merging are necessary.
Fast variants of divide-and-conquer sorting algorithms often use insertion sorting when the sequence
has fewer than five or six elements, as the overhead for recursion is quite high on conventional
machines.
• Describe the details of the algorithm after giving the overview. Only after you’ve described the
divide-and-conquer sorting strategy should you start giving the details that distinguish between merge sort and quicksort,
and only after that should you start distinguishing between the different variants of quicksort. Here is the place
to describe the importance of good pivot selection in quicksort, and to describe which of the many partitioning
strategies you recommend.
• Do not attempt to walk through an execution of the code without explaining the structure. It is tempting
to describe what happens in chronological order, but you’ll lose the readers very quickly if they have to keep
a six-deep recursion stack in their heads. Iterative algorithms are often best explained in chronological order,
but recursive algorithms are not. If you are explaining quicksort, you may find it helpful to illustrate the
partitioning step (which is iterative, not recursive) with pictures that show what operations happen, and in
what order.
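The order of explanation recommended in the list above can be mirrored in the code itself. The following is a minimal sketch in Python of a merge sort using the sort(arr,low,high) interface from the black-box description; the helper name merge is a hypothetical choice, and this is one possible variant, not a recommended solution to the assignment.

```python
def merge(arr, low, mid, high):
    """Merge the sorted runs arr[low..mid] and arr[mid+1..high] in place."""
    left = arr[low:mid + 1]
    right = arr[mid + 1:high + 1]
    i = j = 0
    k = low
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # One run is exhausted; copy whatever remains of the other.
    arr[k:high + 1] = left[i:] + right[j:]

def sort(arr, low, high):
    """Sort arr[low] through arr[high] into increasing order."""
    # Boundary condition: zero or one element is already sorted.
    if low >= high:
        return
    # Divide: merge sort uses a trivial partition at the midpoint.
    mid = (low + high) // 2
    sort(arr, low, mid)
    sort(arr, mid + 1, high)
    # Recombine: the real work of merge sort is in the merge.
    merge(arr, low, mid, high)

data = [5, 2, 9, 1, 5, 6]
sort(data, 0, len(data) - 1)
print(data)  # [1, 2, 5, 5, 6, 9]
```

Notice that the docstring of sort is the black-box description, the comments in its body give the subdivision and the boundary condition, and the details live in merge, where a reader can study them after understanding the overall structure.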
11.6 Titles, title pages, and executive summaries
When you prepare a formal report, it is worth taking a little extra care to make sure that the report will reach the
intended audience, and that they will decide to read it. There are several things you can do to make it more likely
that your report ends up in the right place:
• Use an informative title. If your report is titled Quicksort or A report on certain problems within a database
company, no one will have any idea who to send the report to, where to file it, or whether it is worth reading.
A title that a secretary or an executive can understand is much more valuable: Speeding up DDD’s zip-code
sorting by using quicksort.
• Use a title page. Put the title and your name on a separate page from the body of the report. Not only does
this look more professional than cramming everything into the body of the report, but it makes it easier for a
busy executive to sort the reports he or she has to read. You can put your address and phone number on the
title page also, so that anyone who wants to hire you again as a contractor can find you easily.
• Put an executive summary on a separate page from the body of the report. If the summary is short enough,
it can go on the title page. The summary should tell the executives all they need to know about the report:
what the problem is, what the solution is, and how much work will be needed to implement the solution, in
just a few paragraphs. Remember that the executive may be handling dozens of reports for different problems
in different products, and so a concise summary of the problem is as important as the solution.
• You should also check the verbal translations of the big-O order notation. For example, “grows exponentially”
has a specific technical meaning. The formula 2^n grows exponentially, but n^2 does not; n^2 grows quadratically.
• When you report the relative speeds of different algorithms, know what it is you are reporting! In most analyses
of sorting routines, the number of comparisons or the number of exchanges is counted. Given the description
of the problem, which is likely to be more important?
• When you cite further sources for an algorithm, make sure that the source you cite uses the same variant of the
algorithm that you do. This is particularly important for algorithms like quicksort, where every author picks a
slightly different variant. For quicksort, the main variations are in how the pivot element is chosen and in how
the partitioning is done. The average behavior is O(n log n) for all the variants, but some of the variants are
much easier to program or have less probable worst cases.
If you have trouble understanding one author’s version of quicksort, check another’s. But, please, don’t get
two different versions mixed together! Warning: many authors pick a pivot for quicksort that gives the worst
possible behavior with an already sorted list. Because zip code lists are likely to be re-sorted after a few new
entries are added, you want to be sure that the algorithm you recommend will be fast even if the list is already
sorted.
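The warning about pivot choice is easy to check for yourself. The sketch below, in Python, runs one simple quicksort variant on an already sorted list with two different pivot rules and counts the comparisons made during partitioning. The function names, the partitioning scheme, and the list size of 200 are all assumptions chosen for illustration; the versions in your sources will differ in detail, and comparisons made while choosing the pivot are not counted here.

```python
def quicksort_comparisons(items, choose_pivot):
    """Quicksort a copy of items; return (sorted list, partition comparisons)."""
    data = list(items)
    count = 0

    def sort(low, high):
        nonlocal count
        if low >= high:
            return
        p = choose_pivot(data, low, high)
        data[low], data[p] = data[p], data[low]  # move the pivot to the front
        pivot = data[low]
        boundary = low  # last index holding an element smaller than the pivot
        for i in range(low + 1, high + 1):
            count += 1
            if data[i] < pivot:
                boundary += 1
                data[boundary], data[i] = data[i], data[boundary]
        data[low], data[boundary] = data[boundary], data[low]
        sort(low, boundary - 1)
        sort(boundary + 1, high)

    sort(0, len(data) - 1)
    return data, count

def first_element(data, low, high):
    """Pivot rule that behaves worst on an already sorted list."""
    return low

def median_of_three(data, low, high):
    """Pivot rule: index of the median of the first, middle, and last items."""
    mid = (low + high) // 2
    candidates = sorted([(data[low], low), (data[mid], mid), (data[high], high)])
    return candidates[1][1]

already_sorted = list(range(200))
_, naive = quicksort_comparisons(already_sorted, first_element)
_, better = quicksort_comparisons(already_sorted, median_of_three)
print("first-element pivot:  ", naive)   # 19900, which is 200 * 199 / 2
print("median-of-three pivot:", better)
```

With the first-element pivot every partition of a sorted list is as lopsided as possible, so the count is quadratic in the list size; the median-of-three rule splits a sorted list near the middle each time.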
11.8 The final draft
Your report will be read by executives who understand the problem, but not programming. The format of the report
can aid them considerably in finding the information they need. A jargon-free executive summary at the beginning,
explaining the problem and its solution, may be all they need to read.
Use good section headings and captions for your figures. Read Sections 9.3 and 9.4 of Huckin and Olsen [HO91,
176–183] for suggestions about formatting a report for maximum readability.
Prepare your report as neatly as you can. A professional consulting job may produce a 10–20 page report from a
$2000 study. At more than $100 a page, a consultant can afford to do it right. You do not need to spend enormous
amounts on type-setting or professional illustrators. A typescript is still acceptable in industry, though laser-printing
is becoming the standard, rather than the exception. Figures can be prepared on separate pages, pasted in, and
photocopied. Photoreducing hand-drawn figures before pasting them in can hide the wiggly lines somewhat. The
software available on the Macintosh and Windows machines makes combining graphics and text easy.
Choose an appropriate font size for your document. Unless you are writing a document that you don’t want
anyone to read (a warranty, perhaps?), choose at least a 10-point font. Unless you are writing for young children or
people with bad eyesight, use at most a 12-point font. Standard elite and pica typewriters fall in the correct range.
Using too large a font usually strikes readers as childish, which is not the effect you want in a report that you are
charging hundreds of dollars for. If you want to impress someone with the bulk of a report, use 24-pound rag bond
paper, a moderately wide font (say Palatino or Bookman instead of Times), slightly wider margins, and slightly more
leading (space between lines), not 14-point fonts.
Check your spelling and punctuation carefully. Check even the spaces around your punctuation:
• Have two spaces after each colon, question mark, exclamation point, or sentence-ending period. (This is
typewriter advice—when typesetting with variable-width fonts, no extra space is used after these punctuation
marks.)
• Have spaces outside, but not inside, parentheses and quote marks.
• Have no spaces before any punctuation marks except “(” and open-quote marks.
• Have no spaces around dashes.
Because of the difficulty students have had in the past describing sorting algorithms, we have requested two drafts
with peer-editing on each draft before the final draft is due.