(In-)Formal Methods: The Lost Art

Morgan, Carroll

doi:10.1007/978-3-319-29628-9_1

Carroll Morgan

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9506)

Abstract

This article describes an experimental course in “(In-)Formal Methods”, taught for three years at the University of New South Wales to fourth-year undergraduate Computer-Science students (http://www.cse.unsw.edu.au/~cs6721/). An adapted version was then taught (disguised as “Software Engineering”) to second-year undergraduate students (http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2332).

Fourth-year CS students at UNSW are typically very-good-to-excellent programmers. Second-year students are on their way to the same standard: but many of them have not yet realised how hard it will be actually to get there.

Either way, whether good or on the way to good, few of these students have even heard of static reasoning, assertions, invariants, variants, let alone have learned how to use them$\ldots $ None of the simple, yet profoundly important intellectual programming tools first identified and brought to prominence (more than 40 years ago) has become part of their programming toolkit.

Why did this happen? How can it be changed?

What will happen if we do change it?

Below we address some of those questions, using as examples actual material from the two related courses mentioned above; they were given in the years 2010–4. As an appendix, we present feedback from some of the students who took one course or the other.

At the same time, some suggestions are made about whether, when and how courses like this one could possibly be taught elsewhere.

Notes

1.
In 2011: http://beta.csesoc.unsw.edu.au/2011/05/getting-max-right/.
2.
See this equivalently as whether you’d like to have Sean Bean or Brad Pitt on your programming team.
3.
Only this last suggestion is meant as a joke: for undergraduates, that phrase is more likely to mean “breaking up with your boy/girlfriend”.
4.
I say “second years” here because that is what I have actually been able to try. As should be clear from above, in my opinion this is better done in first year.
5.
Your aim in the end is, of course, that they should become one of you.
6.
A simpler program is better for more advanced students because they have developed, by then, an impatience with complexity introduced by anyone other than themselves. (First-years are still indiscriminatingly curious.) Furthermore, older students have begun to realise that their lecturers actually might have something to teach them. Remember Mark Twain:

When I was a boy of fourteen, my father was so ignorant I could hardly stand to have the old man around. But when I got to be twenty-one, I was astonished at how much the old man had learned in seven years.

And finally, older students have learned to suspect that if something looks really obvious then there’s probably a catch: so warned, they’ll stay awake. The “find the maximum” program used for fourth years was the topic of the quote in Sect. 1.1. For the second-years, I used part of an assignment they had been given in the first-year introduction-to-programming course.
7.
Have you forgotten too, as you read this? That’s precisely the idea.
8.
In fact I tested this beforehand on a small sample of such students, to find out whether they agreed that my hand-waving and picture-drawing could be regarded as a typical approach.
9.
Almost: see the “small problem” identified in Fig. 3.
10.
The handwritten version was done with a tablet app, and then converted to pdf. It’s important that it be done that way so that – even handwritten – it can easily be corrected and improved as it’s used and re-used in subsequent years.
11.
Annotating a pdf on a tablet is a remarkably efficient way of doing all this. Not only can you come back and alter your remarks later, but you can do the marking on the bus or train, a few each day, in time you wouldn’t otherwise be using. And it’s important to spread-out the marking as much as possible, so that your comments are fresh each time: as much as possible your comments should seem personal.
12.
The operator $(\mathbin {\max })$ is “maximum”.
13.
The “careful thought” here, which most students will enjoy, is to figure out what the longest Good subsegment can be that includes neither of the bad subsegments $\mathtt {A[}b_0,b_0+3\mathtt {)}$ and $\mathtt {A[}b_1,b_1{+}3\mathtt {)}$, where $b_0<b_1$ are the two adjacent bad positions with $ maxDiff = b_1{-}b_0$.
It must start at-or-after $b_0{+}1$, in order to leave out $\mathtt {A[}b_0,b_0+3\mathtt {)}$, and it must end at-or-before $b_1{+}2-1$ to leave out $\mathtt {A[}b_1,b_1{+}3\mathtt {)}$. So its greatest possible length is $(b_1{+}2{-}1)-(b_0{+}1)+1$, that is $b_1{-}b_0{+}1$. Since the largest such $b_1{-}b_0$ is maxDiff itself, the correct value for length l is ${maxDiff}{+}1$ for the largest maxDiff — and not ${maxDiff}{+}2$ as suggested above. This “guessing wrong” and then “calculating right” can be simulated in your presentation, and increases the students’ sense of participating in the process.
14.
The syntax is based on Dafny, for which see Sect. 6. Dafny does not however have loop, for, repeat or exit constructions.
15.
R.W. Floyd. Assigning Meanings to Programs. Proc Symp Appl Math., pp. 19–32. 1967.
16.
The reductio ad absurdum arguments, e.g. that students must translate their programs manually into assembler before being allowed to use a compiler, are escaped by recognising the role of abstraction.
A compiler provides a layer of abstraction that its user can pretend is reality. The programmer can believe, and reason as if, assignment statements really do assign (instead of loading into a register and then storing somewhere else), as if while loops really do “loop while” instead of comparing, setting condition bits and then (conditionally) branching back, whatever that latter might mean to a second-year. And such abstractions are usually good enough for a first course: more hostile, demanding applications can break them; but by that stage, students are ready to go to lower levels.
On the other hand, a typical IDE doesn’t abstract from anything: it’s a cookbook, not a chemical analysis of edible compounds and their reactions with each other. To make and run a program, you type its text into the left-hand window and keep pressing a button on the right, and fiddling with your text, until all the red highlights go away. Press another button at that point, and some outputs might appear somewhere else. It’s hard to explain what is going on without knowing the primitives over which these actions operate: source files, compilers, libraries, linkers, archives, object-files, debuggers. And indeed some teachers think that they are doing their students a favour by hiding those details from them. If, on the other hand, the primitives are explained and used first, on small examples, the IDE can be explained as a convenience rather than as an incantation.
17.
Just after introducing flowcharts is a good moment. (Recall Sect. 5.3.)
18.
Algorisms are the techniques for calculating with notations denoting numbers, viz. they are algorithms for arithmetic.
19.
Although the compelling rigour of logic, meta-language and object language, is what guided the creators of formal methods and is what makes sure that, in the end, it all comes together into a coherent whole, for most programmers it’s best not to present it that way initially.
Try explaining to a second year that “actually” there are at least four kinds of implication in Formal Logic: the ordinary “if/then” of natural language, the horizontal line in a sequent, the single turnstile, the implication arrow... and then, underneath it all, the double turnstile (which makes five).
Those things have to be asked for when a person reaches the point of being too confused to proceed without them. Only then will you be thanked for giving the answer.
20.
http://dafny.codeplex.com.
21.
http://rise4fun.com/dafny.
22.
As a teacher, however, be prepared for the later moment when they realise that Dafny can sometimes fail to prove correctness even when the program is correct. Of course it shares that problem with all verifiers: but the pedagogical issue is “How far can you get before reality bites?”.
23.
The comparison operators here operate over all elements of the structure. It’s a neat bit of notation, but at some point it must be mentioned that it is not transitive (when the intermediate structure is empty). Given that most of the students in the class won’t have heard of transitivity, now is probably not the time. Remember that the idea of operators’ having (algebraic) “properties” itself is a higher level of awareness than most will have at this stage.
Save this, thus, until at some later stage in the class the topic of algebraic reasoning comes up naturally. It will: how do you initialise a loop whose invariant is that some variable holds the product “so far”? What’s the product of an empty sequence? Why is there a “right” answer? (It’s so that product is a homomorphism.) At that moment, you can suddenly remember this operator, and discuss its abstract properties too. Having a store of deferred “Did you notice?” items, like this one, is useful for time-management during your interactive-style lectures. If you look like you’re going to run out of material, pull one of them out and connect it to earlier material. Spend a happy ten minutes discussing with them how to think about it properly.
Have also a few really intriguing “puzzles” that you can look at with the upper-layer students; from about one-third of the way through the course, the others will be happy to listen and they won’t be bored. A good one for “What’s the sum/product of...?” is “What’s the determinant of an empty matrix?” Only the upper layer will know what matrices and determinants are: but they will be pleased that you recognise their extra knowledge and expertise.
24.
An Australian styrofoam “stubby holder” gets a good reaction, especially a brightly coloured one whose beer-brand logo will be recognised even from the back of the classroom.
25.
The use of ?L? etc. just below is to be able to refer to them in this text. On the board, simply ??? is fine, since you can point to the one you mean.
26.
Depending on the grip you feel you have on their attention at this point, you could ask them whether they found it odd to be developing a program without a specification of what it should do. But do not force this: if they look confused enough already, do not add to it.
27.
Saying nothing (aside from, obviously, “Here are your answers from Lecture 1”) is important: it’s right now, for a few minutes only, that they will be most receptive and will draw conclusions of their own. You cannot draw their conclusions for them.
Any distraction (e.g. your voice) will dilute the effect. Silence!
28.
N. Wirth. Program development via stepwise refinement. Communications ACM 14,4, pp. 221–7. 1971.
29.
The Dafny documentation explains why this is a risk with “SMT solvers”, which is the kind of prover (Z3) that Dafny uses.
30.
For me, as an undergraduate, the computing courses that had the most impact, both at the time and lasting even until now, were the “de-mystifiers” — the course on compiler construction, that showed how that impossible task could be routinely done if only you looked at it the right way (and read the literature); the course on operating systems, that followed a single character from the moment you typed it in until its arrival in your program’s char-buffer; and the course we would have had, had it been 10 years later, of how to program a full-screen editor.
31.
Calling it “creat”, rather than sanitising it to something sensible, is an important part of this experience. The students should be able to find that exact command using “man 2 creat”, and they should see that the behaviour described in the manual page matches their abstraction.
32.
Indeed it’s often the unpopular lecturers and courses that turn out in the end to have been of the most value. Remember your time at school, what you thought then and what you realised twenty years later.
33.
This is a problem with any form of teaching, of course. But it’s especially an issue with Formal Methods because those who haven’t “got it” don’t want it and, furthermore, don’t realise that actually they need it. On the other hand, those who have got it are so amazed at their new perspective that they tend to run ahead of the evidence and so discredit the whole enterprise. Formal-Methods proselytisers must play by the same rules as anyone else if they are not to be branded as zealots — which is the usual prelude to being ignored.
34.
For these students, that was first-year introductory programming in C.
35.
The hand-writing of assignments is deliberate. (In the course as given, most of the material was hand-written; much of this article has been typeset from those hand-written notes just for this publication.)
First, hand-writing is a much faster way of getting material ready when it mixes text, program code, marginal notes, arrows etc....
Second, and more important, is that it sends the message that clever, glossy, beautiful typesetting is not the aim of the course: we are interested in clever, glossy, beautiful ways of thinking. Hand-written notes and on-the-board lectures reinforce that.
36.
An example is given in Fig. 11.
37.
Often the red highlight and the short explanation will be enough for you to see what is wrong. But not always. This scheme is chosen to make things efficient for the marker, to reduce fatigue and so allow more real thought while marking.
Thus the marker’s principles in choosing the explanation will be to keep it short, and to act as a reminder to the marker what the problem really was. That way the marker can check your work more thoroughly (less fatigue), but will also be able to remember what the problem was and explain it face-to-face if you ask, afterwards, for more help.
38.
Carroll Morgan. Programming from Specifications. Prentice Hall 1994. Ralph-Johan Back, Joachim von Wright. Refinement Calculus: A Systematic Introduction. Springer 1998.
39.
In fact it was not possible to prepare for the assignment a fully “circular” buffer in the 2014 version of this course: getting the Dafny proof to go through proved too difficult to have prepared beforehand. But it was completed after the course, and the buffer will be fully circular next time.
40.
Is “Information Technology” a euphemism to disguise the fact that writing programs actually requires disciplined thinking and rigorous practices, more than just running spreadsheets, databases and word-processors? If so, it’s good news for some: the people who can apply discipline and rigour, when it’s required, will stand out from the pack. They’ll be more valued, will have important projects and earn higher salaries. The rest of us will be depending on them.
41.
Since the questions are based on an actual assignment, references such as “you” etc. are to the students.
42.
That is, the code in Fig. 26 was written before the code of Fig. 25. Unfortunately, Fig. 26 is normally not made at all.
43.
Compare for example our read-stub to a traditional one in which nondeterminism is not available: then the stub would probably return one of three values for the number of characters read: the least, the greatest and one somewhere in between.
Our approach here in effect tests all values, not just three of them.
44.
Sequences are expensive because they support so many convenient operations: concatenation, subsequencing etc. Arrays are much faster, but have fewer native operations.
45.
The circular-buffer version of this is more sophisticated.
46.
If a loop $\mathbf{do }\, G_1\rightarrow S_1~[\!]~G_2\rightarrow S_2~\mathbf{od }$ were recoded as a conventional while-loop, it would become
$$ \mathbf{while }~G_1{\vee }G_2~\mathbf{do } \quad \mathbf{if }~G_1~\mathbf{then }~S_1~\mathbf{else }~S_2~\mathbf{fi } \quad \mathbf{od }, $$
which has the disadvantages that (1) it must repeat $G_1$ and (2) it is not obvious from the text what assertion holds at the beginning of $S_2$. (It is of course $(G_1{\vee }G_2)\wedge \lnot G_1$, that is $G_2$; but that might not be obvious if $G_1{\vee }\,G_2$ itself has been simplified into some other form.)
So what we have written in Fig. 31 corresponds instead to
which avoids both of those disadvantages. It still encodes a priority, however, favouring $G_1$ over $G_2$. To do the opposite, we would swap the first two interior if-branches.
47.
The 2>/dev/null merely hides the output of the fprintf’s.
48.
By “limited” we mean as above that output has priority only when it can write a full outBlock elements; otherwise (above) the choice between reading/writing remains nondeterministic.
49.
Thus in this question you are resolving the remaining nondeterminism in favour of reading, and you are doing it by choosing the way you transliterate the Dafny while-loop into C.
50.
This was a huge file, so large that the students could not tell just by looking what correct program output should be. Thus their confidence had to be based on the verification. They had to submit the fprintf output only: just the blocksizes read and written were checked.
51.
That is, all the comments are included, not just the favourable ones. That is to give a fair picture of good vs. bad: there’s no “cherry picking”.
52.
Three 1-hour slots per week is best; but time-tabling forced one 3-hour slot in 2014.
53.
The guest speaker was June Andronick from NICTA.
54.
The course was not given at all in 2013.
55.
This comment is of course the thesis of this whole article.
56.
The courses in 2010–2 were taught in three 1-hour slots per week. In 2014 time-tabling for second-years forced that to change to one 3-hour slot.
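
The two recodings discussed in Note 46 can be illustrated in C. The sketch below uses Euclid's subtractive gcd as a hypothetical two-guard loop, with $G_1$ as x > y and $G_2$ as y > x; the function names are invented for illustration. The shape of the second function is an assumption chosen to match the note's description (each guard tested once, $G_1$ given priority over $G_2$), not a transcription of Fig. 31.

```c
#include <assert.h>

/* First recoding: the loop guard is the disjunction G1 || G2,
 * so G1 (here x > y) is tested twice on every iteration. */
unsigned gcd_disjunctive_guard(unsigned x, unsigned y)
{
    assert(x > 0 && y > 0);
    while (x > y || y > x) {
        if (x > y) x -= y;   /* S1: G1 holds */
        else       y -= x;   /* S2: y > x holds here, but only implicitly */
    }
    return x;
}

/* Second recoding (assumed shape): each guard is tested once per
 * iteration, and the guard of each branch is visible in the text. */
unsigned gcd_explicit_guards(unsigned x, unsigned y)
{
    assert(x > 0 && y > 0);
    for (;;) {
        if (x > y)      x -= y;  /* S1: G1 holds */
        else if (y > x) y -= x;  /* S2: G2 holds, and G1 does not */
        else            break;   /* neither guard holds: the loop ends */
    }
    return x;
}
```

Swapping the first two branches of the second function would give $G_2$ priority instead; for this particular example the guards are mutually exclusive, so the result is the same either way.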

Acknowledgements

The ideas in this course description distill what I have learned from many years of teaching students and of interaction with my fellow lecturers, both in Australia and, earlier, at Oxford in the UK. Some of those ideas I thought of myself; but most I have copied from colleagues whose style I admire. The key is, of course, in having consistent principles of what to copy and what to leave aside. In spite of the difficulty Formal (or Informal) Methods has had in gaining traction against more conventional courses, I have personally never felt that I lacked the support of my fellow academics in trying this material out. In earlier teaching of rigour in programming, I took a very strict approach; here (obviously) it is not strict at all. I have been encouraged by others in both cases, and I appreciate it. It is not clear yet how to combine the informal and the formal: there is still more experimenting to do. Thanks therefore to all my students, friends, colleagues and even skeptics who have allowed this exploration the space to breathe, and who have given me fair and constructive criticism that has helped to make it better. Finally, I would like to thank Zhiming Liu, Jonathan Bowen and Zili Zhang for organising the Summer School on Engineering Trustworthy Software Systems at which lectures based on this “users manual” were given, and for the opportunity to publish it here. I am also grateful for the institutional support of the University of New South Wales and of NICTA, both during the running of these courses and during the preparation of this article.

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Carroll Morgan
Data 61 (formerly NICTA), Sydney, Australia
Carroll Morgan

Authors

Carroll Morgan

Corresponding author

Correspondence to Carroll Morgan.

Editor information

Editors and Affiliations

Southwest University, Chongqing, China
Zhiming Liu
Southwest University, Chongqing, China
Zili Zhang

Appendices

A Binary-Search Class Test

A.1 Teacher’s Notes

1.
Hand the exercise out as a single double-sided A4 sheet with the instructions on the front and the template on the back. The fact it’s just a single sheet reinforces the feeling of informality that we want to achieve: this exercise should not be a big deal.
2.
Use an audible, but gentle alarm to give them 10 min to complete it. (I used a countdown timer on a smart phone that played 2001: A Space Odyssey’s theme Also sprach Zarathustra. It begins softly, and so doesn’t startle anyone; and in the end it gets a laugh.) Use an alarm sitting on the table, rather than e.g. checking a clock or wrist-watch yourself, because that removes you from the “enforcement zone”: you become the good cop. It’s the automated alarm that’s the bad cop.

(This is the same strategy used by some libraries that put their photocopiers on a timer that switches them off automatically at ten minutes before closing. Can’t blame the nasty librarian, in that case.)
3.
Just collect the answers, and then move on immediately. You want the students as quickly as possible to forget that they have done this test, because they’ll regard Binary Search as trivial, as “old stuff” (in spite of the fact their answer is almost certainly wrong), and if you dwell on it they will start to wonder whether they have enrolled in a course that’s beneath them.
4.
Look at the answers only later, e.g. when you get home: you will probably be amazed. Out of a class of 42 beginning second-year Computer-Science students I found just one answer that was correct. A second one was nearly correct; and the remaining 40 ($=95\,\%$) were quite wrong. Figure 1 in Sect. 4.1 above shows a typical example.
5.
Scan them all to pdf’s and mark them (at your leisure — you won’t need them for a while) by annotating them. (See example marking also in Fig. 1.)
6.
Remember that the point of this exercise isn’t humiliation, of course. What you will do is choose your moment, somewhere further down the course, where their coding is clearly better than it was on the first day.

At that stage you return the marked pdf’s of Binary Search, and you let them see for themselves how much they have improved.

A.2 The Test Itself

The next two pages are the test itself. The second page is a typeset version of Fig. 1 before it was answered. Probably the handwritten version is more effective, since it reinforces the informality.

Your program is to read from three variables A, N and a:

A sorted array A of integers, of size N, indexed from 0 (inclusive) up to N (exclusive). Thus the array elements are
$$ \texttt {A[0]} \quad \le \quad \texttt {A[1]} \quad \le \quad \cdots \quad \le \quad \texttt {A[N{-}1]}. $$
An integer a to be sought in the array.

The program is to assign to a single variable n the least value such that $\texttt {A[n]}=\texttt {a}$, if there is an a in A at all. Otherwise n should be the least value such that $\texttt {A[n]}>\texttt {a}$; and if no value in A is greater than a, then n should be set to N.

You can use other, temporary variables.

Write your binary-search program by filling in the seven boxes on the back of this sheet.

Do not change anything else.

The original version of this was handwritten. (See Fig. 1.)

B Assignment 1: Good Subsegments

The assignment follows, adapted for this article. Footnotes in italics have been added in this text; footnotes in normal font were in the original.

B.1 Motivation, Presentation, Evaluation

In class we went in detail through the steps needed to program-up a solution to a problem inspired by one of the assignments you had last year:^{Footnote 34} The techniques we used for the two versions of that program were intended to be what is normal for a second-year student: indeed, they (or similar) are normal for more experienced programmers too.

Motivation. In this assignment we deal with the same programming problem, i.e. we do it for a third time. But now we will be using the techniques of this course rather than the introductory techniques of last year. The aim is that these techniques are what will become normal for you.

Presentation. Below (p. 43) there is a (hand-written) section “The Assignment — detail ” that explains the assignment further, and contains the eight questions you should answer.^{Footnote 35} (The blurry green portions are model answers, to be used for marking. Note their approximate size!)

Your submission must be a single pdf file named Ass1.pdf, and it must have “Informal Methods Session 1 2014 Assignment 1” at the top, then your name (with your family name in capitals), and then your student number. That must be followed by your answers, clearly labelled Answer 1, Answer 2 etc. Note that the number-of-sentence limits are mandatory. If you write more sentences than allowed, the extra might be ignored. An example of that format is given in Fig. 10.

An easy, efficient approach is to use a text editor, i.e. with an ASCII file, and then print-to-pdf and submit that pdf. Using Word, Pages or similar is not recommended. Although the example below fits on one page, you may take as many pages as you like; but you may not write more than the allowed number of sentences for each answer.

Remember: pdf file Ass1.pdf — not .txt, .doc, .pages or .ps etc. And “Informal Methods 2014 Assignment 1” at the top, then your name (with family name in capitals), and then your student number.

Evaluation. The markers will be trying to make sure that you understand the material that the questions are covering. That’s a two-part process: first figuring out, from what you’ve written, what you are actually thinking; and second, figuring out whether you are thinking the right thing. Make the first part easy for the markers by writing clear and concise answers.

You will get marks only for the second part (evidence that you are thinking the right thing); but you cannot get marks there unless the markers successfully interpret the first part. So neatness is important.

Your marked answers will be emailed to you as annotated pdf’s. The general marking conventions will be as follows:^{Footnote 36}

Phrases the marker wants to emphasise as good will be highlighted in green. Green means “You got it.”
Phrases the marker wants to emphasise as not understood, or dubious, will be highlighted in yellow. Yellow means “Are you sure?” There might be a short query nearby.
Phrases the marker wants to emphasise as bad (e.g. completely wrong) will be highlighted in red. Usually those will have a short explanation nearby.^{Footnote 37}
Next to each question in blue will be a fraction: that’s the number of marks gained (numerator) over the number of marks available (denominator) for that question.
At the top of the assignment will be an overall mark as a blue fraction in a box, obtained by summing the individual numerators and denominators for each question. The numerator is your mark, and the denominator is the total mark for the whole assignment. (That total mark will be scaled to a percentage, later, depending on the proportion of marks this assignment represents in the whole course.)

The “numerator/denominator” scheme makes it easy to check for marking errors: check the numerator-sum to make sure the marks given were added correctly; check the denominator-sum to make sure every question was marked and its mark included in the total.
Any annotation numbers that are not fractions are merely marker’s notes, and should be ignored.

C Dafny Versions of Introductory Assertion-Exercises

D Stepwise Development in Dafny

In this appendix we give explicitly the stages through which one develops “from the outside in” a verified implementation of Binary Search. The virtues of this were explained in Sect. 8.

In Fig. 18 we have the specification of Binary Search given in the requires/ensures clause(s) just after the method header.

Then the method body sets n to an arbitrary value nondeterministically, via n:= * and, immediately afterwards, with assume statements forces that arbitrary value to be one that satisfies the very same ensures clauses as are above. Thus this “implementation” simply achieves the postcondition by setting n to a value that... satisfies the postcondition.
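
The same first step can be mimicked in C by writing the postcondition as an executable check. This is only a sketch under stated assumptions: the function names are hypothetical, and the postcondition is taken to be "every element of A[0..n) is less than a, and every element of A[n..N) is at least a", which for a sorted array is equivalent to the statement of the class test in Appendix A. C has no nondeterministic assignment, so the stand-in for n:= * followed by assume simply tries each candidate in turn and returns the first one the postcondition accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Precondition: A[0..N) is sorted in non-decreasing order. */
bool is_sorted(const int A[], size_t N)
{
    for (size_t i = 1; i < N; i++)
        if (A[i - 1] > A[i]) return false;
    return true;
}

/* Postcondition (assumed reading of the specification):
 * 0 <= n <= N, all of A[0..n) < a, and a <= all of A[n..N). */
bool satisfies_post(const int A[], size_t N, int a, size_t n)
{
    if (n > N) return false;
    for (size_t i = 0; i < n; i++)
        if (!(A[i] < a)) return false;
    for (size_t i = n; i < N; i++)
        if (!(a <= A[i])) return false;
    return true;
}

/* Stand-in for "n := *; assume post": achieve the postcondition by
 * picking a value that satisfies the postcondition. */
size_t search0(const int A[], size_t N, int a)
{
    assert(is_sorted(A, N));               /* requires */
    for (size_t n = 0; n <= N; n++)
        if (satisfies_post(A, N, a, n)) return n;
    return N;  /* not reached when the precondition holds */
}
```

For example, with A holding 1, 3, 3, 5 the call search0(A, 4, 3) yields 1 and search0(A, 4, 4) yields 3. Unlike Dafny, the C version checks only the inputs it is actually run on; it proves nothing about the others.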

This extreme caution is brought about by experience: sometimes Dafny cannot prove that a universal quantification implies itself. In broad terms, that is because its general strategies for proving universal quantifications are sometimes confounded by simple instances. Here we are making sure at the very beginning that this won’t happen to us.

And what do we do if Dafny fails even this simple first step? In that case, we look for another way to specify what we want the program to do.

In Fig. 19 we make our first refinement step, preparing to replace the simple assignment by a loop that keeps most of the postcondition as an invariant, but splits one conjunct off to be established by the negation of the loop guard. We introduce the variables low and high, and anticipate a loop whose effect is to make them equal.

Note this does not mean that, when you do this yourself, you have to type in the whole program again. Copy the method BinarySearch0; paste the copy in and rename it to BinarySearch1; then alter its body. Then verify them both together. (For a larger program, you might use separate files to avoid constant verifying of the earlier steps; but for a small program like this one, it’s so fast it makes no difference.)

In Fig. 20 we insert a loop skeleton: its invariant and guard are as advertised in the previous step (Fig. 19). But at this stage, with decreases *, we indicate that we are not yet interested in proving that the loop terminates. (Experiment by commenting out the decreases clause.)

In Fig. 21 we add the variant function that will guarantee loop termination. In this case it is that the variables low and high must move strictly closer together. First their current values are captured, and then the “set such that” statement requires that the difference has decreased.

With this done, a decreases high-low will be accepted by Dafny. But in many cases (including this one), Dafny can guess the loop variant itself: provided your code actually decreases some variant, Dafny will often figure out what variant that is.

In Fig. 22 we implement the strategy “Choose some new variable mid to lie between low and high, and use it to change one or the other of those two variables.” We don’t know which, yet; but the decrease of the variant forces us even so to choose assignment right-hand sides that will have that strict-decrease effect. (Experiment by replacing the mid+1 with just mid.)

Note the nondeterministic if statement whose both-true guards allow either of its two branches to be executed. At the moment, the assume statements further below live up to their name: they “assume” that the nondeterminism in the if statement has been resolved correctly, i.e. in a way that preserves the invariant. What will force us to code that up into “real” tests is that the assume’s are not allowed to be in our final program: in the Refinement Calculus it would be said that they are “not code”.^{Footnote 38}

In Fig. 23 we have replaced the true guards with actual tests; and, having done that, we can remove the assume’s. Rather than Dafny’s assuming that they hold, it can now prove that they do.

Note however that we still have a nondeterministic statement choosing the value of mid. And yet the program is correct. What that means is that this program works however we choose mid strictly between low and high. That is, the “binary chop” step, which we are about to implement, is a matter of efficiency, not of correctness.

Finally, in Fig. 24 we choose mid to lie approximately midway between low and high, and so arrive at the traditional Binary Search.

Note though that by choosing mid differently (yet still in between), we end up with a linear search.
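The point that the choice of mid affects only efficiency can also be illustrated outside Dafny. The following C sketch is not the code of Figs. 19–24: it assumes a sorted integer array, a search for the least index whose element is at least target, and a hypothetical mid_fn parameter that stands in for the nondeterministic choice of mid (constrained here to low <= mid < high). The run-time assert statements merely mimic the invariant and variant checks that Dafny performs statically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "mid chooser": must return some m with low <= m < high. */
typedef size_t (*mid_fn)(size_t low, size_t high);

/* The usual "binary chop": roughly halfway. */
static size_t mid_halfway(size_t low, size_t high) { return low + (high - low) / 2; }

/* Always the lowest candidate: the same loop then behaves as a linear search. */
static size_t mid_lowest(size_t low, size_t high) { (void)high; return low; }

/* Returns the least index i in [0, n] with a[i] >= target (n if there is none).
   Assumes a[0..n) is sorted ascending. If steps is non-NULL it receives the
   number of loop iterations taken. */
static size_t search(const int *a, size_t n, int target, mid_fn choose, size_t *steps)
{
    size_t low = 0, high = n, count = 0;
    /* Invariant: every element of a[0..low) is < target,
       and every element of a[high..n) is >= target. */
    while (low < high) {
        size_t mid = choose(low, high);
        assert(low <= mid && mid < high);   /* the chooser kept its promise */
        size_t old_gap = high - low;
        if (a[mid] < target) low = mid + 1; /* plain "mid" here could stall */
        else high = mid;
        assert(high - low < old_gap);       /* variant high-low strictly decreases */
        count++;
    }
    if (steps) *steps = count;
    return low;
}
```

Calling search with mid_halfway gives the familiar logarithmic behaviour, while mid_lowest gives a linear search; under the stated assumptions both return the same index.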

E Assignment 4: Real-World Programming

The assignment follows, adapted for this article. Footnotes in italics have been added in this text; footnotes in normal font were in the original.^{Footnote 39}

E.1 Why “Real World” Programming?

With the programming techniques taught in this course, you will be able to develop code more quickly than before, and it will have far fewer errors than is normal in the IT industry.^{Footnote 40} And your code will be more easily maintained as well.

For that to happen, you must learn to apply our “perfectionist” techniques in an imperfect world, where systems have imprecise or incomplete specifications, and where most programs are too large and detailed to allow assertion-based reasoning by hand alone. We need tools to help.

The UNIX-style copy command (cp simplified) is our example, using (almost) the real UNIX system calls. We abstract the system calls’ behaviour as Dafny requires/ensures specifications; and we will transliterate our Dafny programs into actual C code, and run it.

The remaining vulnerabilities are mainly that we have no real assurance that we have specified the UNIX calls correctly; and we have no assurance either that our transliteration into C of our own code did not itself introduce errors. The more-than-compensating strengths are that the algorithm is verified, that it can easily be changed without introducing errors, and that its documentation is enforced (and, if necessary, updated) automatically — and all of these because Dafny won’t verify it otherwise.

E.2 UNIX-Style Copying with a Single Buffer

In this section we take our first steps towards developing real code that copies standard input to standard output: it will be a scaled-down version of the UNIX command cp. For the moment, however, we abstract from UNIX by modelling the standard input, standard output and in-memory buffer all three as (Dafny) sequences rather than as actual files (input and output), or as a buffer-array with a pointer into it. That simplifies our initial sketch of the copying algorithm, so that we can see its overall structure.

The input-file sequence is fixed in value, modelling that in the real-life situation it is not being changed (by something else) as we read it; what does change as it is read is an offset pointer into the file that indicates the position from which the next read will occur. (UNIX stores that pointer as part of a “file descriptor” structure.) That pointer is initialised to 0 because the file is to be read from its beginning. The output-file sequence begins empty, modelling that we create a new file (rather than appending to an existing one); it is gradually extended by the buffer-loads of data that are successively written to it.

The effect of all this abstraction can be seen in the different answers required for the two questions marked by stars below.^{Footnote 41} They refer to our Dafny code (Fig. 26) and its corresponding code in C (Fig. 25).

1. A simple C program for copying standard input to standard output is given in Fig. 25. It uses a single buffer of size $\mathtt {BUF\_SIZE}$ which size, in your case, you will set to digits 1–3 of your student number. (In the example, the student number is z7654321.) It reads at most $\mathtt {IN\_MAX}$ bytes of data at a time; you will set $\mathtt {IN\_MAX}$ to digits 2–3 of your student number. Take the code of Fig. 25 and edit $\mathtt {BUF\_SIZE}$ and $\mathtt {IN\_MAX}$ to reflect your own student number as above: make it into a file cpA.c. Compile it using the command cc cpA.c -o cpA. Run it by typing $\mathtt {./cpA <cpA.c}$, and check that it correctly copies its own text to the standard output. Now change $\mathtt {BUF\_SIZE}$ to 0, then re-compile and re-run your program. What error message do you get, and when?
2. In Fig. 26 appears the Dafny program from which the C program of Fig. 25 was transcribed.^{Footnote 42} Make it into a file cpA.dfy and edit its constants as above; verify it using the command dafny cpA.dfy. (It should get no errors.) Now change the bufSize parameter to 0, and re-verify it. What error message do you get, and when?

E.3 Unit Testing: Harnesses and Stubs

In the Dafny code of Fig. 26 there are “simulations” of the environment in which our copy method is intended to run. The read(...) system-call is simulated by

where the declaration and initialisation of count is nondeterministic — the symbols :| mean “...is given a value such that.” And so the read system-call guarantees to set count to a natural-number value satisfying

but, beyond that, it makes no guarantee at all about which value that will be.

Similarly, the write(...) system-call is simulated by

where $+$ is sequence concatenation.

In both cases these simulations can be compared with the informal descriptions given in the actual UNIX man-pages. (You can enter the UNIX commands man 2 read and man 2 write if you want to see them.) The code-fragments above are called stubs because they are not the real system calls. Similarly, the method-call cpA(65,765) is a simulation of what is using (rather than used by) our copy method: it is called a harness.

In both cases – in conventional program development – the simulations, the stubs and harnesses, are supposed to provide a great variety of behaviours typical of what the unit under test will encounter in practice, focussing particularly on the so-called “edge cases” where coding errors are likely to have occurred: when index-variables are smallest, or largest; when structures are empty, or full etc. Making an effective test environment requires lots of work.
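For comparison, a conventional stub-and-harness arrangement might look like the following C sketch. All of the names here (stub_input, stub_read, stub_write, copy, harness) and the chunk parameter are hypothetical and are not taken from Figs. 25–28; the stub delivers at most chunk bytes per call so that a harness can sample a few different “short read” behaviours.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* In-memory stand-in for an input file: fixed data plus an offset pointer. */
struct stub_input {
    const char *data;
    size_t len;
    size_t pos;    /* offset from which the next read will occur */
    size_t chunk;  /* hypothetical knob: most bytes delivered per call */
};

/* Stub for the read system call: delivers at most min(max, chunk) bytes,
   and returns 0 only when nothing could be delivered (e.g. at end of data). */
static size_t stub_read(struct stub_input *in, char *dst, size_t max)
{
    size_t left = in->len - in->pos;
    size_t n = max < in->chunk ? max : in->chunk;
    if (n > left) n = left;
    memcpy(dst, in->data + in->pos, n);
    in->pos += n;
    return n;
}

/* In-memory stand-in for an output file that starts empty and only grows. */
struct stub_output { char data[256]; size_t len; };

/* Stub for the write system call: appends all n bytes. */
static void stub_write(struct stub_output *out, const char *src, size_t n)
{
    assert(out->len + n <= sizeof out->data);
    memcpy(out->data + out->len, src, n);
    out->len += n;
}

/* Unit under test: copy through a single buffer of buf_size bytes. */
static void copy(struct stub_input *in, struct stub_output *out, size_t buf_size)
{
    char buf[64];
    assert(1 <= buf_size && buf_size <= sizeof buf);
    for (;;) {
        size_t n = stub_read(in, buf, buf_size);
        if (n == 0) break;              /* treated as end of file */
        stub_write(out, buf, n);
    }
}

/* Harness: sets up ONE sample environment and reports whether the copy was exact. */
static int harness(const char *text, size_t chunk, size_t buf_size)
{
    struct stub_input in = { text, strlen(text), 0, chunk };
    struct stub_output out = { {0}, 0 };
    copy(&in, &out, buf_size);
    return out.len == in.len && memcmp(out.data, text, in.len) == 0;
}
```

Each call such as harness("hello, world", 5, 3) exercises just one environment. Covering the edge cases this way means enumerating many such calls by hand, and even then only some of the possible behaviours are sampled.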

With a modern software development method (such as we are now using) this work is much reduced and yet is more effective, as we now show.^{Footnote 43}

First, we can remove the harness altogether: its function is taken over by the requires clause(s) of the copy method itself, which describes all of the things a harness for this program is allowed to do, including the edge-cases automatically. (A conventional harness can only implement some of those things, in general). In Fig. 27 the harness is no longer there; and your student number is no longer necessary for selecting “random” block-sizes.

Second, we can remove the stubs by replacing them by a specification of what they do; again, this describes all of their possible behaviours, not just some of them. In Fig. 28 there is no code for reading or writing.

Our read(...) specification replaces the traditional “read-stub”. Describe very briefly in words the intention of the following four postconditions of the specification of read in Fig. 28:

(a) ensures old(pos)!=|data| ==> justRead!=[];
(b) ensures justRead==data[old(pos)..pos];
(c)
(d)

Describe very briefly in words what (bad things) the read method could do to its calling copy method if each of the following three postconditions of read in Fig. 28 had separately been left out, i.e. in a case, for each one, where the read method violates it:

(a) ensures old(pos)!=|data| ==> justRead!=[];
(b) ensures justRead==data[old(pos)..pos];
(c)

Explain the purpose of the term + (if eof then 0 else 1) in the decreases clause of the copy method in Fig. 27.

Explain the different purposes of the Booleans input.eof within the class Input and eof within the main program cp.

E.4 The Buffer as Array; Reading/Writing in Blocks

The sequence abstraction for buf is very convenient for specification, but sequences are expensive to implement in real applications — and that is why it is not used in the actual UNIX cp program.^{Footnote 44} Instead, an array (in C) is allocated for the buffer; and so we will model that now with an array in Dafny.

Because an array (unlike a sequence) does not move around in memory once allocated, for efficiency reasons, our use of it will have to become more sophisticated: we will have start- and end pointers s,e into buf that indicate the part of it buf[s..e] that contains actual data. When the pointers get to the end of the buffer, we will reset them to the beginning.^{Footnote 45}

Having such pointers allows furthermore that input and output might have different preferred block-sizes, and that might be important depending on what the actual input- and output devices are. For example, if the input device prefers to deliver data in blocks of 100 elements but the output device prefers to receive data in blocks of 150 elements, again for efficiency reasons, then we should read twice into the buffer (200 elements) before we write once (leaving $200-150=50$ elements behind, which we should try not to write until we have read more).
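The 100/150 arithmetic above can be traced with a small in-memory C sketch. It is not the code of Fig. 29 or Fig. 31: the function copy_blocks, the sizes, and the trace string are hypothetical, and the policy is an assumption made only for illustration, namely that a write happens only when a full OUT_BLOCK is available (or the input has ended, or the buffer is full), and that s,e are reset to 0 whenever the buffer empties.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 300, IN_BLOCK = 100, OUT_BLOCK = 150 };  /* illustrative sizes */

/* Copies src[0..len) to dst through an array buffer, recording each action
   in trace ('R' for a read, 'W' for a write). Returns the bytes copied.
   dst must have room for len bytes; trace_cap must be at least 1. */
static size_t copy_blocks(const char *src, size_t len, char *dst,
                          char *trace, size_t trace_cap)
{
    char buf[BUF_SIZE];
    size_t s = 0, e = 0;            /* buf[s..e) holds data read but not yet written */
    size_t pos = 0, out = 0, t = 0; /* input offset, output length, trace length */
    int eof = 0;

    for (;;) {
        assert(s <= e && e <= BUF_SIZE);  /* the pointers stay inside the buffer */
        assert(out + (e - s) == pos);     /* nothing lost, nothing duplicated */
        size_t avail = e - s;
        int full = (e == BUF_SIZE);
        if (avail >= OUT_BLOCK || (avail > 0 && (eof || full))) {
            /* Write guard: a full block, or a final/forced partial block. */
            size_t n = avail < OUT_BLOCK ? avail : OUT_BLOCK;
            memcpy(dst + out, buf + s, n);
            out += n; s += n;
            if (s == e) s = e = 0;        /* buffer empty: reset the pointers */
            if (t + 1 < trace_cap) trace[t++] = 'W';
        } else if (!eof && !full) {
            /* Read guard: no EOF seen yet, and room left in the buffer. */
            size_t n = BUF_SIZE - e;
            if (n > IN_BLOCK) n = IN_BLOCK;
            if (n > len - pos) n = len - pos;
            if (n == 0) eof = 1;          /* a zero-length read signals EOF */
            memcpy(buf + e, src + pos, n);
            pos += n; e += n;
            if (t + 1 < trace_cap) trace[t++] = 'R';
        } else break;
    }
    trace[t] = '\0';
    return out;
}
```

For a 300-byte input the recorded trace begins with two reads before the first write, after which 200-150=50 bytes remain in the buffer until the next read tops them up.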

Our more sophisticated Dafny code is given in Fig. 29; notice that it is written in the multiple-guard while-loop style, which is much less error-prone than the usual form.^{Footnote 46} (The updated stub-specifications are given in Fig. 30.)

For cpC.dfy in Fig. 29, supply code for the missing portions according to the following hints.

(a) Put a Boolean test here that ensures there is some data to write.
(b) Put a “such that” assignment here to n that is as liberal as possible consistent with correctness of the program, but is not more than outBlock.
(c) Set s,e to the correct (new) values.
(d) Put a Boolean test here that ensures that an EOF indication has not already been received, and that there is room in the buffer for more data.
(e) Put a “such that” assignment here to n that is as liberal as possible consistent with correctness of the program, but is not more than inBlock.
(f) Set e to the correct (new) value.

Based on cpC.dfy, make a file MYcpC.dfy according to your answers above. Then verify it with Dafny.

The C code corresponding to Fig. 29 is given in Fig. 31, where the Dafny-style multiple-guard while has been transliterated into a C-style

$$ \texttt {while~(1)~\{if}\cdots \texttt { else if }\cdots \texttt { else break;\}}. $$

What’s especially interesting is that in doing that transliteration we have had to decide which if comes first, so to speak the “read if” or the “write if”. Our choice in Fig. 31 has taken the second option: it gives priority to writing in the sense that if both reading and writing are possible, then writing will be chosen. That is, the code of Fig. 31 reads only when it can’t write; even though the original Dafny code does not have that property. Technically that represents a “resolution of specification-time nondeterminism”.

But we could have put the if’s the other way, as in Fig. 32, in which case instead the code would write only when it couldn’t read. Both Figs. 31 and 32 are valid transliterations of Fig. 29, and we can choose whichever we want depending on implementation issues (like which of reading or writing should be given priority in our particular application).
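The effect of the two orderings can be seen on a toy loop that has nothing to do with copying. The following C sketch uses a hypothetical two-counter state and the multiple-guard loop “do a>0 → a:=a-1 [] b>0 → b:=b-1 od”, whose guards overlap whenever both counters are positive; the trace strings are an illustrative device only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy state for: do a>0 -> a:=a-1 [] b>0 -> b:=b-1 od */
struct state { unsigned a, b; };

/* Transliteration that gives priority to the FIRST alternative.
   trace must have room for a+b+1 characters. */
static void run_first_priority(struct state *st, char *trace)
{
    size_t t = 0;
    while (1) {
        unsigned variant = st->a + st->b;
        if (st->a > 0)      { st->a--; trace[t++] = '1'; }
        else if (st->b > 0) { st->b--; trace[t++] = '2'; }
        else break;
        assert(st->a + st->b < variant);  /* each alternative decreases the variant */
    }
    trace[t] = '\0';
}

/* The same loop with the two if's the other way round:
   priority now goes to the SECOND alternative. */
static void run_second_priority(struct state *st, char *trace)
{
    size_t t = 0;
    while (1) {
        unsigned variant = st->a + st->b;
        if (st->b > 0)      { st->b--; trace[t++] = '2'; }
        else if (st->a > 0) { st->a--; trace[t++] = '1'; }
        else break;
        assert(st->a + st->b < variant);
    }
    trace[t] = '\0';
}
```

Starting from a=2, b=1 the two functions take different routes (the first alternative runs first in one, the second in the other) but both finish with a=b=0, which is all that the original loop promises.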

Based on cpC.c, make MYcpC.c by filling-in the missing portions of Fig. 31 found in your verified MYcpC.dfy. Convert the Dafny such-that assignments to n to deterministic assignments in C that make n as big as possible consistent with the such-that’s. Fill in the constants according to your student number.

Compile MYcpC.c with cc MYcpC.c -o MYcpC and run it using the command on a test file of your choice.^{Footnote 47}

E.5 Refinement of Multiple-Choice Iterations

In Footnote 48 we saw the general form

$$\begin{aligned} \mathbf{do }~G_1\rightarrow S_1~[\!]~G_2\rightarrow S_2~\mathbf{od } \end{aligned} \tag{3}$$

of a multiple-guard iteration. It executes by first evaluating the guards $G_1,G_2$; if both are false, the loop terminates. If exactly one of $G_1,G_2$ is true, then the corresponding statement $S_1,S_2$ is executed. But if both $G_1,G_2$ are true, then either of $S_1,S_2$ can be executed. This is known as nondeterminism.

Nondeterminism might at first seem to make reasoning about programs harder. But – in this form at least – it actually makes it easier. What an alternative $G_i\rightarrow S_i$ says is that “if $S_i$ is executed in a state where $G_i$ holds, then it is guaranteed to maintain the invariant and to decrease the variant.” It’s that simple.

When the guards do overlap in this way, then it’s possible in a refinement to alter the guards slightly in order to take implementation concerns into account. For example if we wanted a refinement in which the same overall effect was reached but, during the execution, the first guard was executed in favour of the second whenever both were ready, then we could use the modified loop

$$\begin{aligned} \mathbf{do }~G_1\rightarrow S_1~[\!]~G_2{\wedge }\lnot G_1\rightarrow S_2~\mathbf{od }, \end{aligned}$$

in which the second guard $G_2$ has been strengthened to include “unless $G_1$”. It is a refinement of (3). And the complementary $ \mathbf{do }~G_1{\wedge }\lnot G_2\rightarrow S_1~[\!]~G_2\rightarrow S_2~\mathbf{od }$ is also a refinement of (3), but one where we have given the priority to $S_2$ instead of to $S_1$.

The general refinement rule is that

$$\begin{aligned} \mathbf{do }~G_1\rightarrow S_1~[\!]~G_2\rightarrow S_2~\mathbf{od } \quad \sqsubseteq \quad \mathbf{do }~G'_1\rightarrow S_1~[\!]~G'_2\rightarrow S_2~\mathbf{od } \end{aligned} \tag{4}$$

when $G_1{\vee }G_2 \equiv G'_1{\vee }G'_2$ and $G'_1\Rightarrow G_1$ and $G'_2\Rightarrow G_2$. In words, the conditions are that the two loops have the same overall guard, and that whenever $S_i$ is executed in the more-refined loop, it must have been permissible to have executed it in the less-refined loop.
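For guards over a small state space, the three side conditions of rule (4) can be checked by brute force. The C sketch below does this for hypothetical guards over two integer variables; the names guard_fn, refines and the sample guards are illustrative, and an exhaustive check over a bounded range is of course only a sanity test, not a proof.

```c
#include <assert.h>
#include <stdbool.h>

/* A guard is a predicate on the (hypothetical) program state a, b. */
typedef bool (*guard_fn)(int a, int b);

/* Sample guards for: do a>0 -> ... [] b>0 -> ... od */
static bool g1(int a, int b) { (void)b; return a > 0; }
static bool g2(int a, int b) { (void)a; return b > 0; }

/* G2 strengthened with "unless G1": gives priority to the first alternative. */
static bool g2_unless_g1(int a, int b) { return g2(a, b) && !g1(a, b); }

/* Strengthened too far: the overall guard is no longer the same. */
static bool g2_too_strong(int a, int b) { (void)a; return b > 1; }

/* Checks, for all states with 0 <= a,b < limit, that
     G1 \/ G2  ==  H1 \/ H2,   H1 ==> G1,   H2 ==> G2
   where G1,G2 are the original guards and H1,H2 the proposed refined ones. */
static bool refines(guard_fn G1, guard_fn G2, guard_fn H1, guard_fn H2, int limit)
{
    assert(limit > 0);
    for (int a = 0; a < limit; a++) {
        for (int b = 0; b < limit; b++) {
            bool same_overall  = (G1(a, b) || G2(a, b)) == (H1(a, b) || H2(a, b));
            bool h1_implies_g1 = !H1(a, b) || G1(a, b);
            bool h2_implies_g2 = !H2(a, b) || G2(a, b);
            if (!(same_overall && h1_implies_g1 && h2_implies_g2))
                return false;
        }
    }
    return true;
}
```

Here refines(g1, g2, g1, g2_unless_g1, 8) holds, whereas replacing the last guard by g2_too_strong fails because the state a=0, b=1 would terminate the refined loop although the original loop could still proceed.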

In our read/write loop of Fig. 29 in fact we have actual nondeterminism whenever it is both possible to read (because there’s some space left in the buffer) and to write (because there’s some data in the buffer). In the C code of Fig. 31 we resolved that nondeterminism in favour of writing; and in Fig. 32 we resolved it in favour of reading.

By examining the guards you added to Fig. 29, write down exactly, in terms of the program variables, the conditions in which nondeterminism is present, that is when both reading and writing are possible.

Alter your guards so that writing has priority over reading whenever a full outBlock elements can be written, but otherwise the priority is not determined.

Use the refinement rule in (4) to check that your new loop, with its limited^{Footnote 48} output priority, is a refinement of the original.

Code up your altered read/write method in the style of Fig. 31 (with the fprint’s included); call the file cpD.c. When you resolve any remaining nondeterminism (i.e. as you transliterate the multiple-guard loop into the form if–else if–else), give the priority to reading.^{Footnote 49}

Compile it, and run it on the input file Ass4In,^{Footnote 50} capturing its fprint output in a file Ass4Out using the commands

E.6 How This Assignment Will Be Marked

1. The written answers to “why this” and “why that” will be checked. They should be very short, and precise.
2. The Dafny codes will be checked, by running Dafny on them, to see whether they verify.
3. The C codes will be checked to see whether they appear correctly to transliterate the Dafny codes. They won’t be marked for style (otherwise), since conceptually the Dafny is our source code, and the transliterations are our assembly code. We don’t usually mark the assembly-code output of a compiler for style (unless we are evaluating the compiler itself).
4. The test-file Ass4In was specially constructed to allow errors easily to be seen, and it will be used to check for run-time errors. But what kind of errors will it find? If the Dafny verified, the program should be correct as far as functionality goes. Thus this check helps to uncover transliteration errors; but it also captures cases where the nondeterminism was not resolved in the way the question required.

Written answers will be marked in the usual way, with partial credit available for answers that are partly correct. However...

Full credit for Dafny code is given only if it is the same structure (essentially) as the (supplied) code from which it is supposed to be derived, and has no verification errors when checked with Dafny. If it does have verification errors, then only partial credit is given. However if the Dafny code does not verify completely (that is, if it gets even just a single error), then no credit is available for the other two (remaining) files MYcpC.dfy and Ass4Out. That is, if your Dafny doesn’t verify then your C code gets zero, even if it looks right. Even if it is right. Our C code cannot be guesswork.

If the Dafny code does verify completely, i.e. with zero errors, then the remaining answers are marked simply as either correct or incorrect (i.e. either full credit, or none). To get full credit, the C code should compile without error and must accurately copy the marker’s (not merely the student’s) test-data file. For the printf outputs, the output you get must be byte-for-byte what is expected (based on the student number and the test file Ass4In). If it differs in even a single byte, it will be marked zero.

F Student Feedback: At Least Not a Failure

The comments below are verbatim, collected anonymously via UNSW’s teaching-evaluation web-interface just after the course had ended. Any material about the lecturer personally, rather than about the course, has however been omitted. Otherwise they are complete.^{Footnote 51}

As remarked in Sect. 10, there is no sense in which student feedback in the short term can establish that a course has been successful: it indicates only what they thought of its style, delivery and content. In spite of that, for formal-methods related material especially, it’s encouraging that none of these students felt the course was pointless or irrelevant.

F.1 From Second-Years in 2014

Best Features:

The interactive and hands on approach of the teaching in the course, as well as the content itself.
The content is relevant, lectures are interesting, tutorial is interesting.
The assignments were an amazing learning experience, the lectures were helpful and so were the tutorials. The assignments eased you in and allowed me to learn a lot of the content while doing it.
[This] course [was] interesting, challenging and overall awesome. The amount of content I learnt this semester in this course was huge. The structure of the course allowed a smooth transition for all students and the mentor sessions along with extensive notes provided allowed students to practice many examples before tackling the assignments.
This course let us know how to design and plan[ning] to build a software which is really cool and interesting.
[The] class room style teaching.
Very easy to understand and interesting [...] Assignments were incredibly fascinating and were well thought out. Notes also aided in reinforcing knowledge.
Encouraged thinking outside the box.
Teaching methodology [...] and course content. The choice of tutors were mostly good. Structure of the course (except Project Management).
[The] content was extremely interesting and useful.
Clear [...] The relevance of content was made clear from the start to beginning. Very interesting course.
Examples. Organisation.
Relevance to past real world examples. Assignments were not testing as much but rather assessing through learning using ideas taught throughout the course. They were thought out and well constructed to make you think.
[...] Interesting content. Challenging.

Suggested Improvements:

Splitting the 3 h lecture slot into 2 time slots.^{Footnote 52}
Nothing, it’s already the best.
Removing Assignment 4 and giving project management component an additional 2 weeks. I really enjoyed Assignments 1–3 but Assignment 4 seems somewhat repetitive (a summary of the other assignments in some ways). Also the invited guest speaker towards end of semester was also very interesting, I would love to see more in the future.
No improvements needed.
Iterating why it is important. I was not aware of how important proving correctness was until the guest speaker from NICTA visited.^{Footnote 53}
More defined learning areas.
Revise what Project Management requires and the aim of it.
Better mentor sessions.
Conducting better mentor sessions.
Cover more content. We went a little slow at times.

F.2 From Fourth-Years in 2012

Best Features: ^{Footnote 54}

Encouraged a different way of thinking than I was used to, that makes much more sense. These concepts should be taught in first year.
Teaching thinking method that I never use before.
Subtlety, concurrency ..... No EXAM ..... Assignments.... Requirement to think in a different way.
Everything. The content is amazing, very well structured, has a lot of interesting material and is well explained.
Good approach, but perhaps more appropriate for introducing people to programming properly than as a course numbered 6xxx which implies it should be taken later in the degree.^{Footnote 55}
This course radically changed the way I view programming, and has certainly improved my programming skills immensely. This course should be compulsory for all Computer Science students, or have its content integrated into first year.
The interesting problems and concepts presented. No other course makes you think like this, or presents methods of solving problems as this course does. Well structured and interesting topics. Assignments really helped to solidify lecture material.
The subject matter, the way it was structured, and the way it was taught. One of the best courses offered at CSE.
Fascinating content. Assignments were pitched at a good progression of difficulties. In general, the course was run extremely well.
Made the abstract, theory side of computing very accessible, with clear practical applications. Not sure what determines the lecture times, but the one-hour-per-day split was very good.^{Footnote 56}

Suggested Improvements:

It honestly could not be.
Nothing actually. Maybe more students?
Some lecture notes could [have] been released a week before the lectures so that you could have a better understanding of the content beforehand.
Nothing.
Having it in a room where you can hear from further back than the first row!
Not much, maybe some of the concepts were a little too challenging, and that coupled with the new techniques of problem solving we were learning really sent your head into a spin. Although you set it out extremely logically, it can still become rather overwhelming.
Not changing it in the slightest.

F.3 From Fourth-Years in 2011

Best Features:

The subject matter [...] the interaction between the students and the lecturer, in short all of it.
It changed my approach to programming in a way that I was then able to do better in my other subjects as well as in teaching programming.
Very useful technique, impressive lecture.
Class participation in lectures, interesting material.

Suggested Improvements:

Maybe some preview notes?
More consistent marking. Although I now recognise and appreciate the difficulty in marking the assignments, disparity between marks for making the same mistakes seems odd.

F.4 From Fourth-Years in 2010

Best Features:

Giving a thorough grounding to good programming techniques in computing through static reasoning.
It was awesome, limited size group. I’ve benefited more from that course than from any other at UNSW. The best feature definitely is the informal style of the course, the high interaction between the lecturer and the students, the timetable (three times one hour instead of three hours in a row in most courses), and the fact that the teached material is actually quite a rare stuff.
Interesting.
Everything!!!
The course content was really well thought out and prepared.
The course content was extremely interesting.

Suggested Improvements:

Better course notes.
Hard to say really. I can’t think of a simple way to improve it. But that doesn’t mean there is no room for improvement!
More time to do more stuff.
A more concrete assessment schedule.

About this paper

Cite this paper

Morgan, C. (2016). (In-)Formal Methods: The Lost Art. In: Liu, Z., Zhang, Z. (eds) Engineering Trustworthy Software Systems. Lecture Notes in Computer Science(), vol 9506. Springer, Cham. https://doi.org/10.1007/978-3-319-29628-9_1

DOI: https://doi.org/10.1007/978-3-319-29628-9_1
Published: 01 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29627-2
Online ISBN: 978-3-319-29628-9
eBook Packages: Computer Science, Computer Science (R0)
