{28.Aug'03} This note describes changes to the programming and specification notation of the book "Programming from Specifications, 2nd ed." by Carroll Morgan. Most of the changes affect the programming notation which are the "guarded commands" that have been unchanged since thirty years. The problem of formal developments is that the only way to find errors are reviews. Although the clear documentation provided by a good formal derivation is the best basis for such a review, it has been shown that reviews will never (effectively) find all errors. Productive software development needs machine support. This support consists of syntax and type checking, and in the future - I hope - also of checking of applied laws and transformations. At present, however, the only automated means we have are automated assertion- checking and extensive repeatable testing of the programs. Consequently, we have to take our programs (not only the code!) to a real executable language that supports checkable program annotations. (Rationale: If we would translate only the code in the final step we could only check the code, not the reasoning, and even worse: we would have to track down every error to its source in the development. The author of the cited book has mentioned that his developments contained only marginal errors that were easily to correct, but correctly he didn't claim that this scales up to real-world programming of large systems as opposed to book-writing and university courses, where hundreds of students are reviewers free of charge. When a reviewer costs some tens of dollars an hour, one rather uses the computer.) To maximize the profit from automated checking and minimise the errors occuring during translation we need to minimise the distance between the derivation language and the execution language. Fortunately there already exists a practice-proven language which was designed with formally supported development in mind, and to some extent it is even simpler than the already puristic guareded commands. I think that the designer of Eiffel has correctly observed that the meaning of the "formal methods" for program development is to be used in daily practice and that programming language constructs which are not needed in the methodology are not needed in practical languages either. This is why the following changes to guarded commands are not a compromise between theory and practice, but in fact the victory of methodology. 1. Ubiquitous Procedures and Functions The principle methodological change is that all initial specifications are written as a function or procedure. This has several advantages: - Routines (with contracts) exist also in the programming language (while the general concept of a "program" does not). - It encourages the use and re-use of routines as program building-blocks in general. My methology suggests to keep routines very short. (For example it doesn't allow more than one loop per routine. This has the additional advantage that the pre- and postcondition of each loop get documented as the contract of the routine.) - Since value parameters are read-only (see below) we can get rid of "save variables" and preconditions like "x = X" which are often confusing to beginners. Furthmore new variables can only be declared at the beginning of routines, which unifies the concepts of "block" (units of scope) and routine. (As an aside, Eiffel routines unify not only blocks and contracts, but also error handling (improperly called exceptions) and even syncronisation of concurrent threads (in the SCOOP) model.) 2. Functions and expressions I allow routines that return (usually unnamed) values and don't affect (module-) global variables or have any other side-effects. Such routines are called (of course) "functions" and are clearly seperated from "procedures" which never return anything, but have a side-effect (on the module's state, or the environement consisting of other modules and the real world). The only parameter-passing mechanism is by-value (as in almost all OO languages, and even "C"). Functions can however return multiple values. The syntax is function_name( param, .. : type; .. ) : result_type IS - or - function_name( param, .. : type; .. ) : ( result, .. : type; .. ) IS (For simplicity I write all keywords in uppercase.) In the first case, there is a read-write variable "Result" in the routine, whose value is returned to the caller. In the second case all the result variables are available read-write and are returned as a tuple. This tuple can directly be used in a expression, or in a multiple assignment with syntax result, .. := function_name( param, .. ) This resembles closely the syntax found in the B language and it seems to be the most sensible way to handle result parameters: - It makes clear that the semantics is like assignment - We don't need the additional constraint that actual result parameters need to be different (since the same constraint for multiple assignment is already there). - There is no confusion about what the former values of the result parameters mean to the routine. And it is also evident that the result variables can be freely used on the right-hand side: after all, that's only a normal expression. I accept the new problem, that expressions may now not always be well-defined (since functions can have preconditions). But this problem occurs in all realistic programming languages (even purely functional ones) and without it, complex functions are just not possible - but functions and expressions form the basis of structured programming and shouldn't be restricted to built-in data types only. The separation of functions (queries) and procedures (commands) is already part of the Eiffel method and ensures that all expressions can be transformed, reordered etc. without affecting their semantics. Multiple function results are support by Eiffel tuples, although their syntax is not (yet) as simple. (Language proposal pending.) Incidentally the proposed language extension for Eiffel does also make the multiple assignment possible (lack of which was the cause for one of the translation errors reported in the cited book). 3. Loops Instead of the multiple-condition loops of Dijkstra, I simply use the loops of Eiffel. Rationale: - Invariant and variant are easier visible and you don't need to find a place for them in the accompanying documentation. (For standard patterns, like iteration, they can be left away, which is already the common practice in Eiffel.) - The initialisation is logically part of the loop anyway and making that part of the syntax means that the invariant doesn't need to be mentioned outside of the loop. - Multiple guards are rarely needed and in this cases it seems useful to treat the alternation concern seperately with a contained IF. - The new loop maps easily to the executable language (Eiffel). That's also why I chose the UNTIL form, instead of a WHILE. (Meyers design argument in this case was the simpler loop postcondition "C and I".) The refinement law is w : [ pre, post ] is refined by FROM w : [ pre, I and V >= 0 ] INVARIANT I VARIANT V UNTIL C LOOP w : [ not C, I, 0 <= V < V0 ] END If "C and I implies post". ("not, and, or, implies" are the Eiffel keywords for the conventional boolean operators. "=" is used for boolean equivalence.) 4. Conditionals Those need not be changed, because there is a trivial translation rule to "real code"! Given IF cond1 --> prog1 [] cond2 --> prog2 ... [] condN --> progN FI we choose an optimal order for readability and efficiency and translate to: IF cond1 THEN prog1 ELSEIF cond2 THEN prog2 ... ELSE CHECK condN END progN END In ordering we follow the convention to put a branch containing SKIP last. (If there are more branches containing SKIP they can first be conjoined.) This "ELSE SKIP" is then just left away. In most cases this translation is already optimal, but if there are more than two alternatives, we can further refine to: IF cond1 THEN prog1 ELSEIF cond2' THEN CHECK cond2 END prog2 ... ELSE CHECK condN END progN END Where "not cond1 implies (cond2' = cond2)", or generally for "i = 2..N" "not (cond1 or cond2 or ... cond(i-1)) implies (condI = condI')". As a special case of that and "cond1 or ... condN" we have "condN' = true" which justifies using ELSE instead of ELSEIF. We can view this transformation as a last refinement step, indeed the B language includes both a conventional IF and the indeterministic conditional called SELECT. In B only the former is code and the indeterministic one is for refinement only. If we follow the transformational approach of Bird and others who gives up that "all code must be efficient" and consequently (some or all) specifications are executable, we can allow both forms of IF in code: the indeterministic one is just translated the simple way by the compiler (which is an automated, although stupid refinement to deterministic code). If we allow to mix specifications and code in programs, there is no more big difference between "animating a specification" and "running a program". Anything that's free of "unsolved equations" should be executable, and any code should be allowed in specifications. (Note that the fact of having more abstract (and more inefficient) instructions in code which can be refined to more efficient ones, opens up the door of safe program optimisations which can be done either by the programmer or the compiler and thus leave the language open for advances in compiler technology. Furthermore, in the absence of mechanical checking of the refinements, abstract and efficient versioned can just be compared using extensive tests. This can be automated and will find a lot more errors than conventional testing (and, by experience, other errors than are found by reviews). The same approach is taking by DESSY (a yet unpublished work by myself).) 5. Modules Instead of the imports which repeat the specification of used features (exported routines and attributes), I use seperate module specifications (which are automatically generated in Eiffel). Module (class) invariants are coded in predicates (boolean functions) and mentioned in the pre- and postcondition of relevant routines. This avoids B's restriction that forbids module internal routine calls and makes Eiffel's contracts clearer. (In Eiffel the class invariant can be assumed and is maintained by all exported routines - but what is with partially exportet routines?) Conclusion It is astonishing that these simple changes already bring the "text book notation" and the "real programming language" so close together. It is hoped that the changed notation will inspire teachers and authors to teach practical methodology instead of "guarded commands for theory and Java(TM) for practice".