Programming Language CSC804 Lecture Note
Dr. S. C. Echezona
RECOMMENDED TEXTS
Course Outline
Comparative study of the organization and implementation of a variety of programming
languages and language features. Design principles are explored and applied in a historical
review of major languages. Procedural, functional, logic-based and object-oriented languages.
Research issues such as polymorphism, formal semantics and verification are explored in depth.
See if you can download and analyze the Donald Trump language (the newest programming
language).
3. A set of rules that specify which sequences of symbols constitute a program and what
computation the program describes.
There are at present thousands of programming languages, because different implementations
tend to explore different abstractions of the problems. Hence the programming language paradigms,
or ways of reasoning about problems (schools of thought).
Programming languages evolved from the earlier machine code and assembler to the higher-level
FORTRAN and COBOL, to the even higher-level Lisp, SNOBOL, APL and BASIC, and on to Algol, C,
Pascal, PL/I, Ada, Modula-2, C++, Smalltalk, Prolog, Icon, Perl, Visual Basic, Python, Java, C#,
Ruby, PHP, etc.
Technically, programming languages are similar because each can be used to express the
different classes of problems or abstractions discussed above, although some may pose
more challenges than others. A programming language that can be used to express every
possible kind of algorithm is said to be Turing complete. Languages that target one exclusive
kind of problem (for example spreadsheets and generative languages) may therefore not be
classified as Turing complete.
Knowledge of different programming languages enables a programmer to handle jobs with
different demands, and hence enhances productivity. A good knowledge of one language also
enhances the appreciation of others.
- Imperative Languages.
- Functional
- Data Oriented.
- Object Oriented.
- Non-imperative
Note that the developers of a programming language usually align it with one of these
paradigms, even though some languages combine or overlap more than one paradigm.
1. Imperative Languages
These languages give instructions to the computer mainly in the way the computer will
understand the problem, rather than in the way a human would understand and solve it.
They are often referred to as procedural, since this is the way the machine understands it.
This is the default paradigm on which the other schools of thought build, e.g. FORTRAN, COBOL,
PL/I, Algol and its descendants, C, C++, Ada.
2. Functional
This is when you give the computer instructions as pure mathematical equations or functions.
It is a special case of imperative programming languages, e.g. MATLAB, Mathematica and the
original FORTRAN IV.
3. Data Oriented
These are languages that make it possible to create elaborate data structures for the data and
the procedures to handle those data structures. This class of programming languages makes it
possible to develop large programs that may not be possible in others. Examples include:
COBOL, Lisp, APL, SNOBOL, Icon, SETL.
4. Object Oriented Languages
These are a class of languages that model a problem in real-world form. Extensive effort is
made to encapsulate the data requirements and the solutions (methods) in a class, and a class
does not depend on any other class to perform its function unless this is specified. This class of
languages encourages data hiding: effort is made so that other programs and methods do not
interfere with methods and data outside themselves. Examples: Java, C++, C#, Python, etc.
5. Non-Imperative
These are languages that do not bother with the usual assignment statements found
in other languages, but instead describe the computation in question and allow the
system to figure out how to perform it. Most often a lower-level program generates the
underlying solution to the problem described. For instance, some database applications allow one
to describe the structure of forms and tables offline and then generate programs to handle
them. Generative packages such as spreadsheet packages, graphic design packages and
simulation packages are some of the examples.
Standardization
Standardization is when a body or organization approves a standard definition for a language.
This can be done during or after language development. The advantage is that if a
programming language is standardized, programs written in it can be ported across different
hardware with minimal changes. Many languages have several dialects, and many interpreters
and compilers are designed to warn when non-standard code is used in a program.
If you are writing a software package that will run on a wide range of computers, you should
adhere strictly to the standard; otherwise your maintenance task will be extremely complicated,
because you must keep track of many machine-specific items.
Machine Code, Assembler
Instruction sets vary enormously in size, complexity and capabilities, and are difficult for
humans to work with. The basic unit of computation is the machine word, often used as a number.
Example
include "p16f84.inc"
extern SQR-Root, SQR, SQUARE
VECTORS code
TEXT code
Btfsc STATUS, C ;Check if produces carry
Global SUM
End
FORTRAN COBOL
High-level languages, imperative paradigm. Entire human-readable arithmetic expressions can
be written on a single line. Flowcharts were widely used to tame the chaos entailed by goto-based
program control flow.
Example.
PROGRAM wimp
Func(xx) =7-6*(xx**5)
Write (*,101)
Write (*,104)
104 format ('input left and right boundary')
read (*,*) x1, x2
N=nstant
Fa=0.0
If (x!) 4,3,4
3 con=func(x2)
Go to 5
4 con=func(x1) =func(x2)
5 continue
Sum1=0.0
Sum2=0.0
L=n/2-1
X=x1
Do 10 I=1 to l
X=x + dx
10 sum2=sum2 + func(x)
Write(*,111) n , f
12 n=2 * n
Fa= f
Go to 6
13 continue
Stop
End
Lisp, SNOBOL, APL, BASIC
Functional paradigm and alternatives: interpretive and more user-friendly, but slow. Entire
functions and other complex computations can be written in a line or two in some of these
languages. More important are advances such as automatic recycling of memory and the ability
to construct or modify code while the program is running. Example:
(program
(if
Program TPC32
Uses SysUtils
Windows
Parser
Scanner
10Utilities
CommonVariables
Begin
WriteIn (Greetingstring)
{$WARNINGS ON}
CommandlinePtr cmdline
{$WARNINGS ON}
Read configurationFile
LinkerOptions
ProcessCompilerParameters
CurrentFileName TempString
SaveRegisterAndSetErrorReturnAddress
LoadLiberary
Case LastError of
NoError
Ada, Modula-2, C++
Modular systems programming languages with data abstraction. Scalability improves, to go
along with the fact that you have to write zillions of lines to do anything.
Example
VAR
I, J: CARDINAL;
TempHi:
DOXOR: BOOLEAN;
BEGIN
FOR I := 0 TO 255 DO
TempHi := 0;
TempLo := 0;
FOR J := 1 TO 8 DO
DOXOR := TRUE;
ELSE
DOXOR := FALSE;
END;
END;
TempHi := TempHi >> 1;
IF DOXOR THEN
END;
END
END;
Runtime Systems
Programming language semantics are partly defined by the compiler or interpreter and partly
by the runtime system. A runtime system consists of libraries that implement language
semantics. They range from tiny to gigantic. They may be linked into generated code, linked
into an interpreter, or sometimes embedded directly in generated code. They include things
like implementations of language built-ins that are not supported directly by hardware,
memory managers and garbage collectors, thread schedulers, and input/output.
Aside from the benefit of structured programming in solving the goto control-flow problem,
memory management is a dominant aspect of modern computing. If it is not solved by the
language, it will dominate the effort required to develop most programs. For instance, memory
debugging in C and C++ may occupy 60% of the effort required to get a working solution.
Early languages laid out data statically as global variables. Then, when we
started using functions for everything, we discovered that most data was short-lived and its
space could be reused if we allocated it on a stack (i.e., local variables). Machine hardware
evolved to dedicate 1-2 registers to this. When objects came, we discovered that longer-lived
data tended to be associated with application-domain concepts, and that data with highly
variable lifetimes was best served by an (automatically managed) heap. OO systems typically
dedicate another register (“self” or “this”) for this.
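The three storage strategies traced above (static data, stack-allocated locals, heap-allocated objects) can be seen side by side in a short C++ sketch. The names call_count, sum_to and make_buffer are invented for illustration and do not come from these notes.

```cpp
#include <memory>
#include <vector>

// Static storage: laid out once and alive for the whole program run.
int call_count = 0;

// Stack storage: 'total' and 'i' exist only while sum_to is executing,
// and their space is reused by whatever function is called next.
int sum_to(int n) {
    ++call_count;
    int total = 0;
    for (int i = 1; i <= n; ++i) total += i;
    return total;
}

// Heap storage: the vector outlives the function that created it.
// std::unique_ptr frees it automatically when its owner goes away,
// which is one (library-level) form of automatic heap management.
std::unique_ptr<std::vector<int>> make_buffer(int n) {
    auto buf = std::make_unique<std::vector<int>>();
    for (int i = 0; i < n; ++i) buf->push_back(i * i);
    return buf;
}
```

A caller would write, for example, auto b = make_buffer(3); and the storage is released when b goes out of scope, with no explicit delete.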
I/O
Almost all programming languages treat I/O as an afterthought. Yet I/O is dominant in all
aspects of modern computing and in the effort required to develop most programs. The reasons
include the dominance of graphics, networking and storage in modern computing hardware
advances; the necessity of I/O in communicating results to humans; and the proliferation of
different computing devices with different I/O capabilities. The implication is that programming
language syntax and semantics should promote extensible I/O abstractions as central to the
language definition. Ubiquitous I/O hardware should be supported by language built-ins.
-Preprocessor
-JIT
-???
-???
-tokenizing
-tree
-VM
Executes in software, via an interpreter of a virtual machine instruction set.
PostScript is a language developed by Adobe that is used to describe the appearance of a
printed page.
Enscript
-enscript is a program that converts an ASCII text file into PostScript. It has some basic options
for readable formatting.
-efficiency of execution.
-Implementability (if not, who cares)
-consistency
-simplicity.
-expressiveness
A context-free grammar can be used to generate strings in the corresponding language as
follows.
Let X be the start symbol.
While there is some nonterminal Y in X, apply any one production rule using Y,
e.g. Y -> w.
When X consists only of terminal symbols, it is a string of the language denoted by the grammar.
Each iteration of the loop is a derivation step. If an iteration has several nonterminals to choose
from at some point, the rules of derivation allow any of them to be replaced. In practice,
parsing algorithms tend always to choose the leftmost nonterminal or the rightmost
nonterminal, resulting in what are called leftmost derivations or rightmost derivations.
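The derivation loop described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: nonterminals are single upper-case letters, every other character is a terminal, and the names Grammar and derive_leftmost are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed encoding: each nonterminal (an upper-case letter) maps to the
// list of right-hand sides of its production rules.
using Grammar = std::map<char, std::vector<std::string>>;

// Perform a leftmost derivation. choices[k] selects which production is
// applied to the leftmost nonterminal at step k. Returns every sentential
// form produced, beginning with the start symbol itself.
std::vector<std::string> derive_leftmost(const Grammar& g, char start,
                                         const std::vector<int>& choices) {
    std::string x(1, start);
    std::vector<std::string> steps{x};
    for (int choice : choices) {
        // Find the leftmost nonterminal Y in X.
        std::size_t pos = 0;
        while (pos < x.size() &&
               !std::isupper(static_cast<unsigned char>(x[pos]))) {
            ++pos;
        }
        if (pos == x.size()) break;              // only terminals remain
        const std::string& w = g.at(x[pos]).at(choice);
        x.replace(pos, 1, w);                    // apply Y -> w
        steps.push_back(x);
    }
    return steps;
}
```

With the toy grammar E -> E+T | T and T -> n, the choices 0, 1, 0, 0 give the leftmost derivation E, E+T, T+T, n+T, n+n.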
An LALR parser is a simplified version of a canonical LR parser, used to parse (separate and
analyse) a text according to a set of production rules specified by a formal grammar for a
computer language. LR = Left-to-right scan of the input, producing a Rightmost derivation in
reverse.
YACC
(Yet Another Compiler Compiler)
The parsing algorithm used by YACC and Bison (LALR) can only handle a subset of all legal
context free grammars.
-Full context-free parsers have existed since the 1970s, but they use so much time and space
that they are prohibitive.
-YACC runs in linear time, that is, time proportional to the input size (i.e. the number of
tokens). This is a very desirable property for a tool, such as a compiler, that must handle large
inputs all the time.
-YACC space requirements are worse than linear, but it uses tricks, such as identifying rows
in the table that are identical, to keep the parse table reasonable in size.
Repeat {
A=*ip
REDUCE: A –>B: {
Push A
LR Parsing
Consider the grammar:
E : E '+' T | E '-' T | T ;
T : T '*' G | T '/' G | G ;
G : F '^' G | F ;
F : Num | '(' E ')' ;
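To see what precedence and associativity this grammar encodes, the following C++ sketch evaluates expressions written in it. Note that it swaps in a different technique: it is a hand-written recursive-descent evaluator, not the table-driven LR algorithm that YACC generates. The left-recursive rules for E and T are rewritten as loops, and Num is assumed to be an unsigned integer literal. The class name Parser is hypothetical.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class Parser {
public:
    explicit Parser(std::string text) : s_(std::move(text)) {}

    // Parse the whole input as an E and return its value.
    double parse() {
        double v = expr();
        skip();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    void skip() {
        while (pos_ < s_.size() &&
               std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    // Consume character c if it is next; report whether it was consumed.
    bool eat(char c) {
        skip();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    // E : E '+' T | E '-' T | T   (left associative, hence a loop)
    double expr() {
        double v = term();
        for (;;) {
            if (eat('+')) v += term();
            else if (eat('-')) v -= term();
            else return v;
        }
    }
    // T : T '*' G | T '/' G | G   (left associative, hence a loop)
    double term() {
        double v = power();
        for (;;) {
            if (eat('*')) v *= power();
            else if (eat('/')) v /= power();
            else return v;
        }
    }
    // G : F '^' G | F             (right associative, hence recursion)
    double power() {
        double base = factor();
        if (eat('^')) return std::pow(base, power());
        return base;
    }
    // F : Num | '(' E ')'
    double factor() {
        if (eat('(')) {
            double v = expr();
            if (!eat(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        skip();
        std::size_t start = pos_;
        while (pos_ < s_.size() &&
               std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stod(s_.substr(start, pos_ - start));
    }
};
```

For example, Parser("2+3*4").parse() groups the multiplication first, and Parser("2^3^2").parse() groups as 2^(3^2) because G is right recursive.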
Note that lexical analysis is usually interleaved with parsing, such that yyparse() calls
yylex() once every time it shifts. This mixes I/O-bound and CPU-bound work,
although the I/O is most often buffered to achieve good performance.
The parse stack is empty and yyparse() calls yylex() to read the first token.
Shift or reduce? Shift. Note that you could reduce even in this empty-stack case if the
grammar had a production rule with some optional thing at the start.
Shift or reduce? We still need to have a T and don’t, so reduce again.
-Selection statements
-Looping statements
A precondition is an assertion before program execution that defines the expected state. It
defines requirements that must be true in order for a statement to do what it is supposed to do.
A postcondition is an assertion after a statement executes that describes what the statement
has caused to become true. An invariant is an assertion of things that do not change during the
execution of a statement. An invariant is particularly useful with loop statements.
while x >= y do
x := x - y
Suppose {x >= 0} and {y > 0} are true. Then we can further say {x >= y > 0} inside the loop.
After the assignment a different assertion holds:
while x >= y do
{x >= y > 0}
x := x - y;
{x >= 0 and y > 0}
While these kinds of assertions can allow you to prove certain things about program
behaviour, they only allow you to prove that program behaviour corresponds to requirements if
the requirements are defined in terms of formal logic. There is a certain difficulty in scaling
up this approach to handle real-world software system requirements, but there is certainly a
great need for every technique that helps programmers write correct programs.
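The assertions for the subtraction loop can also be checked at run time. The sketch below uses the standard C++ assert macro; the function name remainder_by_subtraction and the helper variable x0 are invented for illustration, and run-time checks of this kind test the assertions on particular inputs rather than prove them.

```cpp
#include <cassert>

// Computes the remainder of x divided by y by repeated subtraction,
// checking the precondition, loop assertions and postcondition.
int remainder_by_subtraction(int x, int y) {
    assert(x >= 0 && y > 0);          // precondition
    const int x0 = x;                 // remember the initial value of x
    while (x >= y) {
        assert(x >= y && y > 0);      // holds on entry to the loop body
        x = x - y;
        assert(x >= 0 && y > 0);      // holds after the assignment
        assert((x0 - x) % y == 0);    // invariant: x differs from x0 by a multiple of y
    }
    assert(0 <= x && x < y);          // postcondition
    return x;
}
```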
POLYMORPHISM
Polymorphism simply means “many forms”. It refers to languages that allow code to
behave differently in different situations (context sensitivity). Polymorphism can be static or
dynamic. In static polymorphism, the multiple forms are resolved at compile time and the
appropriate machine code is generated. Examples of static polymorphism are:
-Overloading: the same name is used for two or more different objects or subprograms
(including operators).
-Generics: a parameterized template of a subprogram is used to create several instances of a
subprogram.
-Variant and unconstrained records: one variable can have values of different types.
Type Conversion
Type conversion is the operation of taking a value of one type and converting it to a
value of another type. There are two variations: (1) translating a value of one type to a valid
value of the other, and (2) transferring the value as an uninterpreted bit string.
I: Integer := 5;
F: Float := Float(I);
C syntax is
int i = 5;
float f = (float) i;
In addition, C includes implicit type conversion between types (primarily numeric types):
int i;
float f = i;
Explicit type conversions are safe because they are simply functions. If the predefined
type conversion did not exist, you could always write your own. Implicit type conversions are
more problematic because the reader of the program does not know whether the conversion was
intended or is an unintended oversight. Using an integer in a complicated floating-point
expression should cause no problem, but other conversions should be written explicitly.
The second kind of type conversion simply allows the program to use the same bit
string in two different ways. C uses the same syntax for both forms of conversion. If it makes
sense to do a type conversion, such as between numerical types or pointer types, a true
conversion is done; otherwise the bit string is simply transferred as it is.
C++, while retaining C type conversion for compatibility, has defined a new set of
cast operators:
dynamic_cast
static_cast (an expression of type T1 can be static-cast to type T2 if T1 can be
implicitly converted to T2 or conversely; static_cast would be used for
type-safe conversions like float to int and conversely)
reinterpret_cast (unsafe type conversion)
const_cast (used to allow assignment to constant objects)
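A minimal sketch of the four C++ cast operators in use follows. The types Shape, Circle and Square and the function names are hypothetical, chosen only to give each cast something to act on.

```cpp
#include <cstdint>

struct Shape  { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape { double side = 2.0; };

// static_cast: a true conversion checked at compile time;
// double to int discards the fractional part.
int to_int(double d) { return static_cast<int>(d); }

// dynamic_cast: checked at run time against the object's actual class;
// the pointer form yields a null pointer when the object is not a Circle.
bool is_circle(Shape* s) { return dynamic_cast<Circle*>(s) != nullptr; }

// const_cast: removes const from a reference. Writing through the result
// is only valid when the object referred to was not itself declared const.
void set_through_const(const int& r, int v) { const_cast<int&>(r) = v; }

// reinterpret_cast: reuse the same bits as another type, here viewing
// the first byte of a 32-bit word as an unsigned char.
unsigned char first_byte(const std::uint32_t& w) {
    return *reinterpret_cast<const unsigned char*>(&w);
}
```

Which byte first_byte returns depends on the byte order of the machine, which is exactly why this kind of conversion is classed as unsafe.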
Overloading
Overloading is the use of the same name to denote different objects in a single scope. The use of
the same name for variables in two different procedures (scopes) is not overloading,
because the two variables do not exist simultaneously. The idea of overloading comes from the
need to use mathematical and input-output libraries on different types. In C, for instance,
a different name must be used for the absolute value function on each type:
int i = abs(25);
double d = fabs(1.57);
long l = labs(-256L);
In Ada and C++, the same name can be used for two or more different subprograms provided
that their parameter signatures are different. As long as the number and/or types (but not just
the names or modes) of the formal parameters are different, the compiler will be able to resolve
any subprogram call by checking the number and types of the actual parameters.
In Ada
F1 := sin(F2);
L1 := sin(L2);
An interesting difference between the two languages is that Ada takes the function return type
into account when resolving overloading, while C++ restricts itself to the formal
parameters.
Of particular interest is the possibility of overloading predefined operators like + and * in Ada.
Of course, you have to supply the function that implements the overloaded operator for the new
types. Note that the syntactic properties of the operator, in particular its precedence, are not
changed. C++ has a similar facility for overloading.
This is just like a function declaration except for the use of the reserved word operator. Operator
overloading should only be used for operations that are similar to the predefined meaning, so as
not to confuse maintainers of the program.
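As an illustration of operator overloading in C++, the sketch below overloads + and * for a small two-component vector type. The type Vec2 is hypothetical and not taken from these notes.

```cpp
// A hypothetical two-component vector type.
struct Vec2 {
    double x;
    double y;
};

// Overloaded '+': component-wise addition, close to the predefined meaning.
Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }

// Overloaded '*': scaling by a number. The precedence of '*' over '+'
// is fixed by the language and is not changed by overloading.
Vec2 operator*(double k, Vec2 v) { return {k * v.x, k * v.y}; }
```

In an expression such as Vec2{1, 2} + 2.0 * Vec2{3, 4} the scaling is performed first, just as it would be for numbers.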
If used carefully, overloading can reduce the size of the name space and ensure the portability
of a program. It can even enhance the clarity of a program, because artificial names like fabs are
no longer needed. On the other hand, indiscriminate overloading can easily destroy readability
by assigning too many meanings to the same name. Overloading should be restricted to
subprograms that do the same sort of computation, so that the reader of a program can
interpret the meaning just from the subprogram name.
#include <iostream.h>
void main(void)
//Function declaration
cout << “Enter any two floating point numbers “ << endl;
//swap on integers
swap_int(ix,iy);
cout <<endl;
// Swapping of characters
swap_char (cx,cy);
int temp;
temp = a;
a = b;
b = temp;
{
float temp;
temp = a;
a = b;
b = temp;
char temp;
temp = a;
a = b;
b = temp;
10 20
11.11 -22.22
a b
swapping of integers
ix = 10 iy = 20
after swapping
ix = 20 iy = 10
after swapping
fx =-22.219999 fy = 11.11
swapping of characters
cx = a cy = b
after swapping
cx = b cy = a
The following program demonstrates how function overloading is carried out for
swapping two variables of various data types, namely integers, floating point
numbers and characters.
//Overloading of functions
#include <iostream.h>
float fx, fy;
cout << “ enter any two floating point numbers “ << endl;
// swapping on integers
swap (ix,iy);
//swapping of characters
cout << endl;
int temp;
temp = a;
a = b;
b = temp;
float temp;
temp = a;
a =b;
b = temp;
{
char temp;
temp = a;
a= b;
b= temp;
100 200
-11.11 22.22
s t
Swapping of integers
ix =100 iy = 200
After swapping
ix = 200 iy = 100
fx = -11.11 fy = 22.219999
After swapping
fx = 22.219999 fy = -11.11
Swapping of characters
cx = s cy = t
After swapping
cx = t cy = s
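Since the listing above survives only in fragments, here is a compact sketch of the same idea: three functions share one name and the compiler selects among them from the argument types. The name swap_values is used instead of swap (an assumption made here to avoid any clash with std::swap), and reference parameters are assumed.

```cpp
// Three overloads of one name; the parameter types form the signature.
void swap_values(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}

void swap_values(float& a, float& b) {
    float temp = a;
    a = b;
    b = temp;
}

void swap_values(char& a, char& b) {
    char temp = a;
    a = b;
    b = temp;
}
```

A call such as swap_values(ix, iy) with two int variables resolves to the first overload at compile time, and a call with two char variables resolves to the third.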
Generics
Arrays, lists and trees are data structures that can store and retrieve data elements of arbitrary
type. If it is necessary to store several types simultaneously, some form of dynamic
polymorphism is needed. However, if we are working only with homogeneous data structures
such as an array of integers or a list of floating point numbers, it is sufficient to use static
polymorphism to create instances of a program template at compile time.
For instance, consider a sort program to sort an array. The type of the array element is used
only in two places: when comparing elements and when swapping elements. The complex
manipulation of indexes is the same whatever the array element type.
Begin
Min= I
End loop
A(Min) = Temp;
End loop;
End sort
In fact, even the index type is irrelevant to the programming of the procedure, as long as a
discrete type (such as characters or integers) is used.
To obtain a sort procedure for some other element type such as character, we could physically
copy the code and make the necessary modifications, but this would introduce the possibility of
errors. Furthermore, if we wished to modify the algorithm, we would have to do so in each copy
separately. Ada defines a facility called generics that allows the programmer to define a
template of a subprogram and then to create instances of the template for several types. While
C lacks a similar facility, its absence is less serious because void pointers, the sizeof operator and
pointers to functions may be used to program general, if unsafe, subprograms. C++ calls its
implementation templates. Note that the use of generics does not ensure that any of the object
code will be common to the instantiations; in fact, an implementation may choose to produce
independent object code for each instantiation.
Here is a declaration of a generic subprogram with two generic formal parameters in Ada.
generic
This generic declaration does not declare a procedure, only the template of a procedure. A
procedure body must be supplied: The body will be written in terms of the generic parameters:
Begin
……………….exactly as before
End sort
To get a (callable) procedure you must instantiate the generic declaration, that is, create an
instance by furnishing generic actual parameters:
These are actual procedure declarations; instead of a body following the “is” keyword, a new
copy of the generic template is requested.
The generic parameters are compile-time parameters and are used by the compiler to
generate the correct code for the instance. The parameters form a contract between the code
of the generic procedure and the instantiation. The first parameter Item is declared with the
notation (<>), which means that the instantiating program promises to supply a discrete type
such as Integer or Character, and the code promises to use only operations valid on such types.
Since every discrete type has the relational operators defined on its values, the procedure Sort
is assured that “<” is valid. The second generic parameter, Item_Array, is a clause in the
contract that says: whatever type was given for the first parameter, the second parameter
must be an integer-indexed array of that type.
The contract model works both ways. An attempt to do an arithmetic operation such as “+” on
values of type Item in the generic body is a compilation error, since there are discrete types
such as Boolean for which arithmetic is not defined. Conversely, the generic procedure could
not be instantiated with a record type, because the procedure needs “<”, which is not defined for
records.
The motivation for the contract model is to allow programmers to reuse generic units with
the certainty that they need not know how the generic body is implemented. Once the generic
body compiles correctly, an instantiation can fail only if the actual parameters do not fit the
contract. An instantiation will not cause compile errors in the body.
Templates in C++
In C++ generics are implemented with the template facility.
There is no need for explicit instantiation; a subprogram is created implicitly when the
subprogram is used.
I_Array a;
C_Array c;
A function template is defined to find the sum of a given array of elements such as int, float and
double, etc.
T temp = 0;
return(temp);
A function template is defined to swap two given items of different data like int, float, double or
character.
template<class T>
T temp;
temp = first;
first = second;
second = temp;
return (0);
A program to define a function template for summing an array of integers and an array of
floating point numbers.
#include <iostream.h>
T temp = 0;
return(temp);
void main()
{
int n= 3, sum1;
float sum2;
sum1 = sum(a,n);
sum2 = sum(b,n);
cout <<endl;
A program to define the function template for swapping two items of various data types such as
integer and floating point numbers.
#include <iostream.h>
T temp;
temp = first;
first = second;
second = temp;
return(0);
}
void main()
swap(ix, iy);
10 20
-11.22 33.33
After swapping integers
ix = 20 iy = 10
fx = 33.330002 fy = -11.22
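Because the template listings above are incomplete, the two function templates they describe can be sketched as follows. The parameter lists are assumptions: sum takes a pointer to the first element and an element count, and the swapping template is named swap_items here (rather than swap) and returns nothing.

```cpp
#include <cstddef>

// Sum the n elements of an array whose element type T supports
// initialization from 0 and the += operator.
template <class T>
T sum(const T* data, std::size_t n) {
    T temp = 0;
    for (std::size_t i = 0; i < n; ++i) temp += data[i];
    return temp;
}

// Exchange two items of the same type T.
template <class T>
void swap_items(T& first, T& second) {
    T temp = first;
    first = second;
    second = temp;
}
```

No explicit instantiation is written anywhere: a call such as sum(a, 3) with an int array makes the compiler generate sum<int>, and the same call with a float array generates sum<float>.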
In Ada and C++, we have two ways of constructing polymorphic data structures: generics in Ada
and templates in C++ for compile-time polymorphism, and class-wide types in Ada and pointers/
references to classes in C++ for runtime polymorphism. The advantage of generics/templates is
that the data structure is fixed when it is instantiated at compile time; this can improve both
the efficiency of code generation and the memory that needs to be allocated for the data
structure.
Java chooses to implement only runtime polymorphism. As in Smalltalk and Eiffel, every class in
Java is considered to be derived from a root class called Object. This means that a value of any
non-primitive type can be assigned to an object of type Object (of course this works because of
the reference semantics).
To create a linked list, a node class would first be defined as containing (a pointer to) an Object.
The list class would then contain methods to insert and retrieve values of type Object.
Class Node {
Object data;
Node next;
Class List {
/*Java Code*/
If L is an object of type List and a is an object of type Airplane_Data, then L.Put(a) is valid
because Airplane_Data is derived from Object. When a value is retrieved from the list, it must
be cast to the proper descendant of Object.
a = (Airplane_Data) L.Get();
Of course, if the value returned is not of type Airplane_Data (or descended from that type), an
exception will be raised.
The advantage of this paradigm is that it is very easy to write generalized data structures in
Java, but compared with generics/templates there are two disadvantages: (1) the additional
overhead of the reference semantics (even for a list of integers), and (2) the danger that an
object placed on the wrong list will cause a runtime error when retrieved.
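The same root-class design can be imitated in C++ to show both the convenience and the run-time risk. This is a sketch under assumptions: the classes Object, AirplaneData and BoxedInt are hypothetical, the list is backed by std::vector rather than linked nodes, and a failed downcast is reported with a null pointer where Java would raise an exception.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical root class playing the role of Java's Object.
struct Object { virtual ~Object() = default; };

struct AirplaneData : Object {
    std::string id;
    explicit AirplaneData(std::string s) : id(std::move(s)) {}
};

struct BoxedInt : Object {
    int value;
    explicit BoxedInt(int v) : value(v) {}
};

// A list that accepts any descendant of Object.
class List {
public:
    void put(std::unique_ptr<Object> item) { items_.push_back(std::move(item)); }

    // Remove and return the most recently added item.
    // Precondition: the list is not empty.
    std::unique_ptr<Object> get() {
        std::unique_ptr<Object> p = std::move(items_.back());
        items_.pop_back();
        return p;
    }

private:
    std::vector<std::unique_ptr<Object>> items_;
};

// Checked downcast: a null pointer signals that the retrieved object
// is not an AirplaneData (nor derived from it).
const AirplaneData* as_airplane(const Object* o) {
    return dynamic_cast<const AirplaneData*>(o);
}
```

Note that even a plain integer has to be wrapped in a heap-allocated BoxedInt before it can be stored, which is the reference-semantics overhead mentioned above.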
Variant Records
Variant records are used when it is necessary to interpret a value in several different ways
at runtime. Common examples are:
To solve these types of problems, programming languages introduce a new category of types
called variant records, which have alternative lists of fields. Thus a variable may initially
contain a value of one variant and later be assigned a value of a different variant with a
completely different set of fields. In addition to the alternative fields, there may be fields
which are common to all records of this type; such fields usually include a code which is used
by the program to determine which variant is actually being used. For example, suppose that we
wish to create a variant record whose fields may be either an array or a record.
typedef struct {
    float f1;
    int i1;
} Rec;
Now a union type in C can be used to create a variant record, which can itself be embedded into
a structure that includes the common tag field.
typedef struct {
    int code;
    union {
        Arr a;
        Rec r;
    } data;
} S_Type;
S_Type s;
From a syntactical point of view this is just ordinary nesting of records and arrays within other
records. The difference is in the implementation: the field data is allocated enough memory to
contain the larger of the array field a or the record field r (Fig. 10.1).
Fig. 10.1: memory layout of the two variants, (code, f1, i1) and (code, a).
Since enough memory is allocated to accommodate the largest possible field, variant records
can be extremely wasteful of memory if one alternative is very large and the others small:
union {
    int a[1000];
    float f;
    char c;
};
/* C Code */
At the cost of complicating the programming, the best solution in this case is to use a pointer to
the long fields.
The assumption underlying variant records is that exactly one of the fields of the union is
meaningful at any one time, unlike an ordinary record where all fields exist simultaneously.
if (s.code == Array_Code)
else
The main problem with variant records is that they can potentially cause serious bugs. It is
possible to treat a value of one type as if it were a value of another type (say, to access a
floating point number as an integer), since the union construction enables the program to access
the same bit string in different ways. In fact, Pascal programmers use variant records to do type
conversion, which is not directly supported in the language. In the above example, the situation
is even worse because it is possible to access memory locations which have no meaning at all:
the field s.data.r might be 8 bytes long to accommodate the two numbers, while the field
s.data.a might be 20 bytes long to accommodate ten integers. If a valid record is stored in
s.data.r, s.data.a[5] is meaningless.
Ada does not allow variant records to be used to break type checking. The “code” field that we
used in the example is now a required field called the discriminant, and access to the variant
fields is checked against the value of the discriminant. The discriminant appears as a
“parameter” to the type.
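The kind of check that Ada performs automatically can be written by hand around a C-style union. The sketch below is an illustration only: the names Code, Variant, make_array, make_record and array_element are hypothetical, and a C++ exception stands in for Ada's constraint check.

```cpp
#include <stdexcept>

enum class Code { Array, Record };

struct Rec {
    float f1;
    int i1;
};

struct Variant {
    Code code;        // the discriminant (tag) common to all variants
    union {
        int a[10];
        Rec r;
    } data;
};

// Build a value of the array variant with every element set to 'fill'.
Variant make_array(int fill) {
    Variant v{};
    v.code = Code::Array;
    for (int& element : v.data.a) element = fill;
    return v;
}

// Build a value of the record variant.
Variant make_record(float f, int i) {
    Variant v{};
    v.code = Code::Record;
    v.data.r = Rec{f, i};
    return v;
}

// Access to the array field is checked against the discriminant first,
// so s.data.a can never be read while a record is stored.
int array_element(const Variant& v, int i) {
    if (v.code != Code::Array) {
        throw std::logic_error("variant currently holds a record");
    }
    return v.data.a[i];
}
```

Nothing in C or C++ forces callers to go through array_element; the discipline is voluntary, which is the weakness the text describes.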
Val : Expression -> value
So, for example,
Val((2+3)*4) = 20
The domain of a function such as Val is a syntactic space: the set of expressions is a syntactic
entity.
The range of a function such as Val is a semantic space: the set of integers is a semantic entity.
In many programming languages, a program can be regarded as something that receives input
and produces output.
A semantic function for programs would be: Program -> (Input -> Output)
#include <stdio.h>
int main()
{ int x;
  scanf("%d", &x);
  printf("%d\n", x);
  return 0;
}
Here P denotes the identity function f from integers to integers: P(P) = f, where f(x) = x.
The function symbol -> is right associative, so we can simplify function types such as
P : Program -> (Input -> Output)
to
P : Program -> Input -> Output
This allows for a slightly simpler notation, particularly as semantic spaces are frequently
function spaces.
2. A definition of the semantic space consisting of the values of the semantic functions
And
Syntactic Spaces
In denotational semantics, we use a notation that is essentially the same as the abstract
syntax used in operational semantics.
The sets are listed first, using capital letters to denote set elements.
The grammar rules are then listed, which recursively define the elements of the set.
E.g. the syntactic spaces Number and Digit are specified as follows:
D : Digit
N : Number
N -> N D | D
D -> '0' | '1' | ... | '9'
A denotational definition views the spaces as sets of syntax trees specified by the
grammar rules.
Semantic functions are defined recursively on these sets, based on the structure of a
syntax tree node.
Semantic Spaces
These are the sets from which semantic functions take their values.
They are sets like syntactic spaces, but may have extra structure depending on their
use.
E.g. the integers have the arithmetic operations “+”, “-” and “*”.
Such extended spaces are called algebras.
Strictly, these need formal specification, but we can often get away with “well-known”
spaces.
E.g. the semantic space of the integers:
Operations:
We restrict ourselves to three operations because they are the only ones used in our
sample language.
The symbols v: indicate that the name v will be used to indicate an arbitrary member
of the set.
A semantic function is specified for each syntactic space.
Thus the value function from the syntactic space Digit to the semantic space Integer is
written: D : Digit -> Integer
The value of a semantic function is specified recursively on the trees of the syntactic
spaces.
This is done by giving a semantic equation for each grammar rule.
D[['0']] = 0, D[['1']] = 1, ..., D[['9']] = 9
where the [[...]] notation indicates that the argument is the syntax tree with the listed
argument(s) as child(ren).
Another Example
N : Number -> Integer
is based on the syntax
N -> N D | D
and is given by
N[[N D]] = 10 * N[[N]] + D[[D]]
N[[D]] = D[[D]]
Using the expression language of last week and the approach described here:
Syntactic Spaces
E : Expression
N : Number
D : Digit
N -> N D | D
Semantic Spaces
Operations:
Semantic Functions
E[[N]] = N[[N]]
N : Number -> Integer
N[[D]] = D[[D]]
D : Digit -> Integer
'(2+3)*4'
E[[('2' + '3') * '4']]
= (N[['2']] + N[['3']]) * N[['4']]
= (D[['2']] + D[['3']]) * D[['4']]
= (2 + 3) * 4
= 5 * 4
= 20
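The semantic functions D, N and E can be transcribed almost directly into C++ as recursive functions over syntax trees. This is a sketch under assumptions: a Number is represented by its digit string, an Expression by a small tree type Expr invented here, and the equations for multi-digit numbers and for the operators are taken to be the usual ones (N[[N D]] = 10 * N[[N]] + D[[D]], and E[[E1 op E2]] = E[[E1]] op E[[E2]]).

```cpp
#include <memory>
#include <string>
#include <utility>

// D : Digit -> Integer
int D(char digit) { return digit - '0'; }

// N : Number -> Integer, for a non-empty string of digits.
//   N[[N D]] = 10 * N[[N]] + D[[D]]
//   N[[D]]   = D[[D]]
int N(const std::string& number) {
    if (number.size() == 1) return D(number[0]);
    return 10 * N(number.substr(0, number.size() - 1)) + D(number.back());
}

// Hypothetical syntax tree for expressions.
struct Expr {
    char op;                           // '+', '-', '*', or 'n' for a number leaf
    std::string number;                // digits, used only when op == 'n'
    std::shared_ptr<Expr> left, right; // children, used only for operators
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr num(const std::string& digits) {
    return std::make_shared<Expr>(Expr{'n', digits, nullptr, nullptr});
}
ExprPtr bin(char op, ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{op, "", std::move(l), std::move(r)});
}

// E : Expression -> Integer, defined by one equation per grammar rule.
int E(const ExprPtr& e) {
    switch (e->op) {
        case '+': return E(e->left) + E(e->right);
        case '-': return E(e->left) - E(e->right);
        case '*': return E(e->left) * E(e->right);
        default:  return N(e->number);   // E[[N]] = N[[N]]
    }
}
```

The tree for (2+3)*4 is built as bin('*', bin('+', num("2"), num("3")), num("4")), and evaluating it follows the same steps as the derivation above.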
The value undef is given a special name, bottom, and is denoted by the symbol ⊥.
We write Integer ∪ {undef} as Integer⊥.
Environments
. We can now define Env0, the empty environment by
45
The evaluation of expressions within an environment must include the environment
as a parameter.
Thus the semantic value of an expression now becomes a function mapping
environments to integers:
E : Expression -> Environment -> Integer⊥
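A minimal sketch of evaluation within an environment follows. It assumes an environment is a Python dict from identifier names to integers, that None stands in for the undefined value, and that an undefined operand makes the whole expression undefined; all three are choices made for this illustration.

```python
BOTTOM = None  # stands in for the undefined value (bottom)

def E(tree, env):
    """E : Expression -> Environment -> Integer-or-bottom."""
    kind = tree[0]
    if kind == "num":
        return int(tree[1])                 # numbers ignore the environment
    if kind == "id":
        return env.get(tree[1], BOTTOM)     # E[[I]](Env) = Env(I)
    left, right = E(tree[1], env), E(tree[2], env)
    if left is BOTTOM or right is BOTTOM:   # undefined operands propagate
        return BOTTOM
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    if kind == "*":
        return left * right
    raise ValueError(f"unknown node: {kind!r}")

env0 = {}                                   # the empty environment
expr = ("+", ("id", "x"), ("num", "1"))     # the expression x + 1
print(E(expr, {"x": 4}))   # 5
print(E(expr, env0))       # None, i.e. undefined
```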
To extend the semantics to statements, we note that statements are functions mapping
environments to environments.
Execution of a statement simply adds its value to the environment.
We will use the same & notation we saw last week for adding to an environment.
A statement list is simply the composition of the equivalent functions:
(f o g)(x) = f(g(x))
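The idea that a statement list is a composition of environment-to-environment functions can be sketched as follows. The helper names (assign, compose, statement_list) are hypothetical, and right-hand sides are given as Python functions of the environment purely for brevity.

```python
from functools import reduce

def assign(name, expr):
    """Meaning of `name := expr`: a function Environment -> Environment."""
    def run(env):
        new_env = dict(env)          # environments are treated as immutable
        new_env[name] = expr(env)
        return new_env
    return run

def compose(f, g):
    """(f o g)(x) = f(g(x))"""
    return lambda x: f(g(x))

def statement_list(*stmts):
    # L1 ; L2 runs L1 first, so its meaning is meaning(L2) o meaning(L1).
    return reduce(lambda acc, s: compose(s, acc), stmts, lambda env: env)

program = statement_list(
    assign("a", lambda env: 2),
    assign("b", lambda env: env["a"] + 3),
)
print(program({}))  # {'a': 2, 'b': 5}
```

Note the order of composition: the function for the later statement is applied to the result of the earlier one.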
Syntactic Spaces
P : Program
L: Statements-List
S: Statement
E: Expression
N: Number
D: Digit
I: Identifier
A: Letter
Abstract Syntax
P -> L
L -> L1 ';' L2 | S
S -> I ':=' E
N -> N D | D
I -> I A | A
Semantics Spaces
Operations:
- : Integer x Integer -> Integer
Semantics Functions
P : Program -> Environment
L : Statement-List -> Environment -> Environment
S : Statement -> Environment -> Environment
E[['(' E ')']](Env) = E[[E]](Env)
E[[I]](Env) = Env(I)
E[[N]](Env) = N[[N]]
N : Number -> Integer
N[[D]] = D[[D]]
D : Digit -> Integer
S : Statement -> Environment -> Environment
Where, given:
F : Environment -> Integer
G : Environment -> Environment
H : Environment -> Environment
Then
To use this as a semantic function we must be sure that there exists a unique solution
for F in some sense.
How this is assured is beyond the scope of this subject.
I:= 1 :
S : Statement -> Environment⊥ -> Environment⊥ where
Environment⊥ = (Identifier -> Integer⊥)⊥
Axiomatic Semantics
Here we define the semantics of a program by describing the effect that its execution has on
assertions made about the program.
The assertions made before and after execution are called preconditions and post-conditions.
E.g. consider:
x := x + 1
We would expect that, whatever the value of x was before execution, its value after execution
is one greater.
We write this:
{x = A}
x := x + 1
{x = A + 1}
Similarly, for
x := 1/y
the precondition is that y be non-zero and the post-condition is that x = 1/y:
{y <> 0} x := 1/y {x = 1/y}
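A precondition/post-condition pair can be checked by brute force on a handful of states. The sketch below is only a test over sample values, not a proof; states are assumed to be Python dicts, and the function holds is a hypothetical helper.

```python
from fractions import Fraction

def holds(pre, command, post, states):
    """Test {pre} command {post} on sample states. A test, not a proof."""
    return all(post(command(s)) for s in states if pre(s))

def reciprocal(s):                 # the command x := 1/y
    return {**s, "x": Fraction(1, s["y"])}

def increment(s):                  # the command x := x + 1
    return {**s, "x": s["x"] + 1}

states = [{"x": x, "y": y} for x in range(-3, 4) for y in range(-3, 4)]

# {y <> 0} x := 1/y {x = 1/y}
triple_1 = holds(lambda s: s["y"] != 0, reciprocal,
                 lambda s: s["x"] == Fraction(1, s["y"]), states)

# {x = A} x := x + 1 {x = A + 1}, tried for several values of A
triple_2 = all(holds(lambda s: s["x"] == A, increment,
                     lambda s: s["x"] == A + 1, states)
               for A in range(-3, 4))

print(triple_1, triple_2)  # True True
```

States that fail the precondition are skipped, which is why the division by zero at y = 0 never happens.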
Sort-program
E.g.
Axiomatic Specification
{P}C{Q}
The semantics of this is that if P is true prior to execution of C then Q is true after execution.
In general, given the assertion Q there are many possible assertions P with the property
{P}C{Q}.
E.g. for:
C = x := 1/y
Q = {x = 1/y}
Then
Weakest Pre-condition
Where we have a range of possible pre-conditions it is possible to identify the most general or
weakest assertion which will result in Q being true.
wp(C,Q)
Now, we define the axiomatic semantics of the language construct C as the function wp(C, Q)
from assertions to assertions.
This is called a predicate transformer in that it takes a predicate (assertion) as argument and
returns a predicate(the weakest precondition) as result.
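A predicate transformer can be sketched by representing a predicate as a Python function from states to booleans. The version below handles only assignment and assumes the right-hand side is always defined and has no side effects; wp_assign is a hypothetical name.

```python
def wp_assign(name, expr, Q):
    """wp(name := expr, Q): Q evaluated in the state after the assignment."""
    return lambda state: Q({**state, name: expr(state)})

# Post-condition Q: x > 10
Q = lambda s: s["x"] > 10

# Weakest precondition of x := 2 * y with respect to Q, i.e. 2 * y > 10
P = wp_assign("x", lambda s: 2 * s["y"], Q)

print(P({"x": 0, "y": 6}))    # True:  2 * 6 > 10
print(P({"x": 100, "y": 5}))  # False: 2 * 5 is not > 10
```

Note that P takes a predicate in and gives a predicate out, and that the old value of x plays no part in the result.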
Examples of wp(C,Q)
x := E
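Properties of wp
Law of the Excluded Miracle:
wp(C, false) = false
Distributivity of Conjunction:
wp(C, P and Q) = wp(C, P) and wp(C, Q)
Distributivity of Disjunction:
wp(C, P) or wp(C, Q) -> wp(C, P or Q)
Law of Monotonicity:
if Q -> R then wp(C, Q) -> wp(C, R)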
Note, first, that the specification of the semantics of isolated expressions is not normally
undertaken in axiomatic semantics.
Essentially, the assertions of axiomatic semantics are statements about the side effects of
language constructs.
That is, they are statements about values of identifiers in the environment of the
construct.
P -> L
L -> L1 ';' L2 | S
Syntax rules such as P -> L and L -> S do not need separate specification, since these rules
state that the wp function for a program is the same as for its associated statement list and that,
if a statement list is a single statement, then wp for the list is the same as that of the statement.
Statement lists:
wp(L1 ; L2, Q) = wp(L1, wp(L2, Q))
Assignment statements:
wp(I := E, Q) = Q[E/I]
This introduces a new notation Q[E/I] which is defined to be the assertion Q in which all free
occurrences of I are replaced with E.
A free occurrence is one which is not subject to a "for all" or "there exists" quantifier.
Free Variables.
E.g. with
We can evaluate
But
Q[1/i] = Q
E.g. consider:
wp('x := x + 1', x > 0)
Here:
I = x
E = x + 1
Q = {x > 0}
So, from
wp(I := E, Q) = Q[E/I]
we get
wp(x := x + 1, x > 0) = {x + 1 > 0}
= {x > -1}
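The substitution Q[E/I] is purely syntactic, so it can be sketched on expression trees. The code below assumes assertions are quantifier-free (so every occurrence of an identifier is free) and encodes them as nested tuples; substitute and evaluate are hypothetical helpers.

```python
def substitute(Q, I, E):
    """Return Q[E/I]: Q with every occurrence of identifier I replaced by E."""
    if Q == I:
        return E
    if isinstance(Q, tuple):               # (operator, operand, operand)
        return (Q[0],) + tuple(substitute(part, I, E) for part in Q[1:])
    return Q                               # a constant or another identifier

def evaluate(tree, state):
    """Evaluate an expression or assertion tree in a state (dict)."""
    if isinstance(tree, str):
        return state[tree]
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left, state), evaluate(right, state)
    if op == "+":
        return a + b
    if op == ">":
        return a > b
    raise ValueError(f"unknown operator: {op!r}")

Q = (">", "x", 0)                          # x > 0
pre = substitute(Q, "x", ("+", "x", 1))    # wp(x := x + 1, x > 0)
print(pre)                                 # ('>', ('+', 'x', 1), 0)

# The result agrees with x > -1 on every tested integer.
print(all(evaluate(pre, {"x": x}) == (x > -1) for x in range(-5, 6)))  # True
```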
The weakest precondition for the if-statement in our sample language is defined as:
wp(if E then L1 else L2, Q) = (E > 0 -> wp(L1, Q)) and (E <= 0 -> wp(L2, Q))
An Example if-statement
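Consider:
wp(if x then x := 1 else x := -1, x = 1)
Here
Q = {x = 1}
E = x
L1 = x := 1
L2 = x := -1
Now:
wp(if x then x := 1 else x := -1, x = 1)
= (x > 0 -> wp(x := 1, x = 1)) and (x <= 0 -> wp(x := -1, x = 1))
So:
wp(x := 1, x = 1) = (1 = 1) = true
and
wp(x := -1, x = 1) = (-1 = 1) = false
Substituting back:
wp(if x then x := 1 else x := -1, x = 1)
= (x > 0 -> true) and (x <= 0 -> false)
= true and not (x <= 0)
= true and x > 0
= x > 0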
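The if-statement example can be checked mechanically. In this sketch predicates are Python functions on states, the branches are passed as predicate transformers, and the else branch is assumed to be taken for every non-positive value of E; the names wp_assign and wp_if are hypothetical.

```python
def wp_assign(name, expr, Q):
    """wp(name := expr, Q) for a side-effect-free, always-defined expr."""
    return lambda s: Q({**s, name: expr(s)})

def wp_if(cond, wp_then, wp_else, Q):
    """(E > 0 -> wp(L1, Q)) and (E <= 0 -> wp(L2, Q))"""
    def pre(s):
        then_ok = cond(s) <= 0 or wp_then(Q)(s)   # E > 0  implies wp(L1, Q)
        else_ok = cond(s) > 0 or wp_else(Q)(s)    # E <= 0 implies wp(L2, Q)
        return then_ok and else_ok
    return pre

Q = lambda s: s["x"] == 1
pre = wp_if(
    lambda s: s["x"],                                   # E  = x
    lambda q: wp_assign("x", lambda s: 1, q),           # L1 = x := 1
    lambda q: wp_assign("x", lambda s: -1, q),          # L2 = x := -1
    Q,
)

# The computed precondition agrees with x > 0 on every tested value.
print(all(pre({"x": x}) == (x > 0) for x in range(-5, 6)))  # True
```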
Once again, this recursive behaviour creates problems for formal semantics.
We will give an inductive definition based on the number of times the loop executes.
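Let Hi(while E do L od, Q) mean that the loop executes i times and terminates in a state
satisfying Q.
Clearly:
H0(while E do L od, Q) = E <= 0 and Q
and
H1(while E do L od, Q) = E > 0 and wp(L, Q and E <= 0)
= E > 0 and wp(L, H0(while E do L od, Q))
In general:
Hi+1(while E do L od, Q) = E > 0 and wp(L, Hi(while E do L od, Q))
and
wp(while E do L od, Q) = there exists i such that Hi(while E do L od, Q)
Note that the loop must terminate for this to make sense.
E.g.
wp(while 1 do L od, Q) = false for all L, Q.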
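The inductive definition can be sketched in code, assuming the usual reading in which H0 holds when the loop test fails and Q holds, and Hi+1 holds when the test succeeds and one execution of the body leads to a state satisfying Hi. The "there exists i" is approximated by searching up to a fixed bound, so this is an illustration rather than a true wp.

```python
def wp_assign(name, expr, Q):
    """wp(name := expr, Q) for a side-effect-free, always-defined expr."""
    return lambda s: Q({**s, name: expr(s)})

def H(i, cond, wp_body, Q):
    """H_i: the loop runs exactly i times and stops in a state satisfying Q."""
    if i == 0:
        return lambda s: cond(s) <= 0 and Q(s)
    inner = H(i - 1, cond, wp_body, Q)
    return lambda s: cond(s) > 0 and wp_body(inner)(s)

def wp_while(cond, wp_body, Q, bound=50):
    """Approximate 'there exists i with H_i' by searching i < bound."""
    return lambda s: any(H(i, cond, wp_body, Q)(s) for i in range(bound))

# while x do x := x - 1 od, with post-condition x = 0
countdown = wp_while(
    lambda s: s["x"],
    lambda q: wp_assign("x", lambda s: s["x"] - 1, q),
    lambda s: s["x"] == 0,
)
print([x for x in range(-3, 6) if countdown({"x": x})])  # [0, 1, 2, 3, 4, 5]

# while 1 do x := x od never terminates, so no H_i ever holds.
forever = wp_while(
    lambda s: 1,
    lambda q: wp_assign("x", lambda s: s["x"], q),
    lambda s: True,
)
print(forever({"x": 0}))  # False
```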
Axiomatic semantics were developed as a tool for proving the correctness of programs.
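E.g. to prove that the program swapxy:
t := x ;
x := y ;
y := t ;
is correct with respect to the specification
{x = X and y = Y} swapxy {x = Y and y = X}
To prove this we must compute wp(C, Q) and then show that P -> wp(C, Q).
wp(C, Q) = wp(t := x, wp(x := y, wp(y := t, x = Y)))
and wp(t := x, wp(x := y, wp(y := t, y = X)))
Now:
wp(t := x, wp(x := y, wp(y := t, x = Y)))
= wp(t := x, wp(x := y, x = Y))
= wp(t := x, y = Y)
= {y = Y}
and
wp(t := x, wp(x := y, wp(y := t, y = X)))
= wp(t := x, wp(x := y, t = X))
= wp(t := x, t = X)
= {x = X}
So wp(C, Q) = {y = Y and x = X}, which is exactly P; hence P -> wp(C, Q).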
As can be seen from even this simple example, formal proof of program correctness is
extremely complex.
The prospect of proving the correctness of a 100,000-line program, even a particularly carefully
written one, is terrifying.
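For a program as small as the swap, the check P -> wp(C, Q) can at least be tested exhaustively over a small range of values. The sketch below assumes the specification {x = X and y = Y} swapxy {x = Y and y = X}, represents predicates as Python functions on states, and uses the hypothetical helpers wp_assign and swap_is_correct. It is a test over finitely many states, not a proof.

```python
from itertools import product

def wp_assign(name, expr, Q):
    """wp(name := expr, Q) for a side-effect-free, always-defined expr."""
    return lambda s: Q({**s, name: expr(s)})

def swap_is_correct(X, Y, values):
    """Check P -> wp(t := x; x := y; y := t, Q) on all states over `values`."""
    P = lambda s: s["x"] == X and s["y"] == Y
    Q = lambda s: s["x"] == Y and s["y"] == X
    # wp(t := x, wp(x := y, wp(y := t, Q))), built from the inside out
    pre = wp_assign("t", lambda s: s["x"],
          wp_assign("x", lambda s: s["y"],
          wp_assign("y", lambda s: s["t"], Q)))
    states = ({"x": x, "y": y, "t": t}
              for x, y, t in product(values, repeat=3))
    return all(pre(s) for s in states if P(s))

values = range(-2, 3)
print(all(swap_is_correct(X, Y, values) for X in values for Y in values))  # True
```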