Formal Aspects of and Development Environments for Montages

Philipp Kutter

Philipp W. Kutter
Eidgenössische Technische Hochschule, CH-8092 Zürich, Switzerland
kutter@tik.ee.ethz.ch

Matthias Anlauff
International Computer Science Institute, Berkeley, CA 94704, USA
anlauff@acm.org

Alfonso Pierantonio
Università di L'Aquila, I-67100 L'Aquila, Italy
alfonso@univaq.it

August 31, 1997

Abstract

The specification of all aspects of a programming language requires adequate formal models and tool support. Montages specifications combine graphical and textual elements to yield language descriptions similar in structure, length, and complexity to those in common language manuals, but with a formal semantics. A broad range of people involved in programming language design and use may find it convenient to use Montages in combination with the tool Gem-Mex, which allows the automatic generation of high-quality documents, type-checkers, interpreters, and symbolic debuggers.

1 Introduction

Montages [19] are a specification formalism for describing all aspects of programming languages. Syntax, static analysis and semantics, and dynamic semantics are given in a unified and coherent way by means of semi-visual descriptions. The static aspects of Montages resemble control and data flow graphs, and the overall specifications are similar in structure, length, and complexity to those found in common language manuals. Thus, Montages are a formal instrument which can be equally well understood by language designers, compiler constructors, and programmers.

Based on Abstract State Machines (ASMs, formerly called Evolving Algebras) [12], Montages provide a theoretical basis for a number of activities from initial language design to prototyping. ASMs have been proposed by Y. Gurevich as a dynamic generalization of multi-sorted algebras, intended to provide a more versatile notion of Turing machine, "able to simulate arbitrary algorithms in a direct and essentially coding-free way" [12].
In short, ASMs are a state-based formalism in which a state is updated in discrete time steps. Unlike most state-based systems, the state is given by an algebra, that is, a collection of functions and universes. The state transitions are given by rules that update functions pointwise and extend universes with new elements. ASMs have already been used to model the dynamic semantics of programming languages such as Prolog [7], Occam [5], C [13], C++ [26], Oberon [17], and VHDL [6].

At the risk of oversimplifying somewhat, we can describe some of these models [13, 26, 17] as follows. Program execution is modeled by the evolution of two functions CT and S. The current task CT represents the part of the program text currently in execution and may be seen as an abstract program counter. S represents the current value of the store. Formally, one defines the initial state of the functions and specifies how they evolve by means of transition rules. In the described models [13, 26, 17], the initial state is assumed to include the results of a static analysis, which is only described informally. This analysis provides a representation of the program's control and data flow in the form of functions between parts of the program text. As usual, the control flow functions specify the order in which statements are executed, and the data flow functions specify how values flow via variables through operations. The corresponding transition rules update the program counter and program state using the control flow and data flow functions.

Montages suggest how to use ASMs to model not only the dynamic semantics of a programming language, but the static analysis and semantics as well. In particular, we show how to generate the control and data flow, which for us corresponds to the abstract syntax, starting from the concrete syntax. This mapping is given by means of graphs, which make the specification easier to understand. In this paper, we show some toy examples.
The specification method scales up to realistic languages; for example, a complete specification of the language Oberon can be found in [20]. Complex features such as encapsulation, modularity, inheritance, and pointers are covered in a surprisingly short and comprehensive manner. Montages have also been used in [2] and [10] for formalizing the object-oriented language Sather and SQL direct (ISO 9075), respectively.

The collection of Montages defining a language may be used for automatically generating a number of tools, such as type-checkers, interpreters, and symbolic debuggers. This is accomplished by means of the Gem-Mex language suite, which provides a convenient and comfortable environment. These tools can also generate high-quality documents suitable for presentations and reference manuals.

The paper is organized as follows. In section 2, we define some prerequisites. Montages are presented in section 3. Then, in section 4, we illustrate the tool Gem-Mex. Finally, in the last two sections we provide a brief comparison with related work and draw conclusions.

2 Prerequisites

In the following sections we give some preliminary notions. In section 2.1 we recall the notion of context-free grammars, look at the related derivation trees, and introduce compact derivation trees. The main point of the section is the observation that the nodes of compact derivation trees are characterized by a specific subset of the symbols in the grammar. This subset, the so-called characteristic symbols, is the basis for the syntax-driven modularity of Montages. We introduce the notion of the initial state of an ASM and specify how a compact derivation tree is mapped into such a state. In section 2.2 we introduce the notion of a transition rule of ASMs and show some generic transition rules which can be used to traverse a compact derivation tree in parallel and sequentially. Although all the aspects of ASMs we use are explained in these sections, we have to be rather brief.
For a more complete treatment and motivations we refer to [12, 4].

2.1 Initial State and Tree Representation

Given a context-free grammar of a language, the generation of a string of that language can be described by means of a derivation tree. The root of the tree is labeled with the start symbol $S$; we also say that the root represents the start symbol. Every replacement of a nonterminal $n$ by $s_1 s_2 \ldots s_m$ in the derivation of the string is represented in the tree by appending, from left to right, nodes representing $s_1$ to $s_m$ to the node representing $n$. The new nodes are labeled with the corresponding symbols $s_1$ to $s_m$.

Such trees can be made more compact by putting multiple labels in the case of synonym productions [22]. Synonym productions are rules of the form $n ::= s_1 \mid s_2 \mid \cdots \mid s_m$, which give rise to nodes with only one child. In such cases we simply do not append new nodes, but keep track of the synonym productions by adding a new label to the current node. The resulting trees are called compact derivation trees, and we distinguish a synonym production $n ::= s$ by writing $n = s$. To exemplify the situation, fig. 1 gives the normal and the compact derivation tree of a sum term, containing a parenthesized subexpression, of a typical expression language.

According to the above definitions, each node is labeled with at least one terminal or one nonterminal that is the left-hand side of a non-synonym production. Such symbols are called characteristic symbols, since it can be shown that each node is labeled with exactly one of them. If a node is labeled with a characteristic symbol $s$, we also say that the node is characterized by $s$. Such a characterization partitions the set of nodes.
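The construction of compact derivation trees can be illustrated with a minimal Python sketch. It is not part of the Montages tooling: the class names `Node` and `CompactNode`, the function `compact`, and the toy synonym table `SYNONYMS` are hypothetical and chosen only for this illustration. The sketch merges every chain produced by synonym productions into one multi-labeled node and treats the last label of a node as its characteristic symbol.

```python
from dataclasses import dataclass, field

# Hypothetical synonym productions of a toy expression grammar,
# e.g. Expr = Sum | Term. Everything not listed is a non-synonym production.
SYNONYMS = {
    "Expr": {"Sum", "Term"},
    "Term": {"Product", "Factor"},
    "Factor": {"Ident", "ExprInParenth"},
}


@dataclass
class Node:
    """Node of an ordinary derivation tree: exactly one label."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class CompactNode:
    """Node of a compact derivation tree: several labels per node."""
    labels: list
    children: list = field(default_factory=list)

    @property
    def characteristic(self):
        # The last label is a terminal or the left-hand side of a
        # non-synonym production, i.e. the characteristic symbol.
        return self.labels[-1]


def compact(node):
    """Merge chains created by synonym productions into one node."""
    labels = [node.label]
    while (len(node.children) == 1
           and node.children[0].label in SYNONYMS.get(node.label, ())):
        node = node.children[0]
        labels.append(node.label)
    return CompactNode(labels, [compact(child) for child in node.children])


# Ordinary derivation tree of "a + b": Expr -> Sum -> Expr "+" Term
tree = Node("Expr", [Node("Sum", [
    Node("Expr", [Node("Term", [Node("Factor", [Node("Ident")])])]),
    Node('"+"'),
    Node("Term", [Node("Factor", [Node("Ident")])]),
])])

root = compact(tree)
print(root.labels, "characterized by", root.characteristic)
for child in root.children:
    print(" ", child.labels, "characterized by", child.characteristic)
```

In this sketch the universe interpreting a symbol would simply be the set of compact nodes whose `labels` contain that symbol, while the `characteristic` property yields the partition of the nodes.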
[Figure 1: Normal and compact derivation trees]

Given a program, its compact derivation tree is represented in the associated initial state. A state of an ASM is given by a set called the superuniverse and a collection of functions. The superuniverse has the distinguished elements true, false, and undef. Unary functions from the superuniverse to {true, false} are used to represent sets, or universes. The universe consisting of true and false is called Bool. Our setting requires some specific universes. In particular, the nodes of the compact derivation tree constitute the universe Node. Moreover, each symbol $s$ in $V_t \cup V_n$ (the terminals and nonterminals of the grammar) is interpreted by a sub-universe $s$ of Node containing those nodes which are labeled by $s$.

Example: The compact tree in fig. 1 states that the universe Term contains six elements, namely all nodes with the label Term.

As mentioned above, the universes interpreting the characteristic symbols partition the universe Node, i.e. each node is in exactly one such universe. A node may nevertheless belong to several universes overall if it has multiple labels, which is possible if its derivation includes synonym productions.

Example: In fig. 1 the leftmost, bottommost node of the compact tree is a member of the universes Expr, Term, Factor, and Ident, where Ident is its unique characterization.

A number of so-called selector functions reflect the structure of compact derivation trees and allow us to retrieve the syntactical elements of the program text.
Since the descendants of a node are constructed by a production rule, we define the functions accordingly. Let $x$ be a node whose descendants have been constructed by a replacement due to a production rule $n ::= E$. If $E$ is of the form "$s_1 s_2 \ldots s_m$", we access the new nodes in the universes $s_1, s_2, \ldots, s_m$ by the unary functions

$(\text{S-}s_i : \mathit{Node} \to s_i)_{i \in \{1, \ldots, m\}}$

If the same symbol $s$ occurs more than once in "$s_1 s_2 \ldots s_m$", we enumerate the functions from left to right: S1-s maps to the first $s$-descendant, S2-s to the second, and so on. If $E$ contains a symbol $s$ in a { } part, then an element of a universe ListNode is created and serves as access point of the whole list. The details are given in section 3.2.

Apart from the selector functions, which reflect the structure of the tree, we also need an auxiliary function Up : Node → Node, which links the descendants to their parents. The automatic definition of the initial state is implemented in the Gem-Mex tool, which is described in section 4.

2.2 Transition Rules and Tree Traversal

Given an initial state, a transition rule governs the behavior of an ASM. A transition rule is a description of how the current state evolves, starting from the initial one. The transition rules are built from the following constructs: update, block, conditional, vary, and extend. The update operator is a pointwise modification of a function. Its synopsis is

$f(t_1, \ldots, t_n) := t_0$

Such a rule first evaluates the terms $t_1, \ldots, t_n$ and $t_0$ over the current state to the elements $e_1, \ldots, e_n$ and $e_0$, and then modifies $f$ at the point $(e_1, \ldots, e_n)$ to $e_0$. The update of $f$ does not affect the definition of the function over the rest of the domain. A set of transition rules constitutes a block, whose rules are executed in parallel. For convenience we henceforth speak of "the transition rule" when we mean the block of them. A conditional rule consists of a condition $C$ and another rule $R$:
if C then
  R
endif

The then branch is triggered if the guard expression $C$ evaluates to true. A conditional rule may also have an else branch; in that case the corresponding transition rule is triggered if the guard does not evaluate to true. The operators vary and extend are explained as they are encountered in the paper. The different states of an ASM are reached by iteratively triggering the transition rule until the ASM reaches a state which cannot evolve any further.

Montages model the static semantics and analysis with an ASM that traverses the tree. In the next section we define what exactly happens during this traversal. In this section we define a general pattern of transition rules which can be used to define a traversal. An instance of the pattern traverses a compact derivation tree of a given grammar and executes at each node an action which depends on the node's characterization. In addition, a general technique is introduced that allows the tree traversal to be sequentialized in an arbitrary way.

Imperative versus declarative style

In [18] we defined tree traversal in an imperative style. Here we use instead a definition in a declarative style. The declarative style presented here can be used for parallel traversal as well, which is needed for the derivation of parallel compilers [3]. The imperative version of [18] is a sequential refinement of the declarative version given here. The advantage of the imperative version is that it is easier to read for non-academic programmers. An advantage of the version given here is that it facilitates the extraction of an axiomatic description of the state reached after the graph traversal.

As noted, the aim of the traversal is to execute an action for each node in the tree. We start with a rule executing the action for all nodes in parallel. Then we show how to specify an action depending on the characterization of the node.
Finally we present a way to sequentialize the execution of the actions.

Parallel traversal

In order to execute an action R for each node, we use the vary construct of ASMs. The rule

vary Self over Node
  R
endvary                                          (1)

executes R for each element in Node simultaneously. The bound variable Self can be used in R to access the single elements. If, for instance, Node is a universe with two elements a and b, the above rule corresponds to a block containing R twice, once with Self substituted by a and once with Self substituted by b.

Case distinction by characterization

Using the fact that the nodes are partitioned by their characterization, we can execute a specialized rule R_n, the so-called action of n, for a node characterized by n. Technically, we do this by replacing R in the above vary rule by a block of conditionals, one for each characteristic symbol n:

if n(Self) then
  R_n
endif                                            (2)

Such a conditional triggers R_n only if n(Self) evaluates to true, i.e. if Self is in the universe n. For convenience we henceforth say action of a node when we mean the action of its characterization.

Up to now, the actions of all nodes are executed in parallel. The next task is to introduce the possibility of sequentializing the execution. Typically the actions should be executed for lower-level nodes first, and in some order among the children of a node. Executing the actions of lower-level nodes first already allows a direct representation of structural induction: each node (representing a parsed term) can use the results of the actions (definitions) performed for the descendants (representing sub-terms). In addition, we often need a certain sequentialization between descendants, e.g. actions for declaration parts in programs must typically be performed before actions for the statement parts. For the sequentialization task, we need a boolean dynamic field Visited : Node → Bool which is initialized with false for each node.
This field indicates whether a node has been visited, i.e. whether its action has been executed. A relation before : Node × Node → Bool relates nodes sequentially: (a before b) indicates that node a must be visited before node b. The relation before can be defined with a parallel tree traversal, as we will see in the next section.

Sequentialized traversal

Using the above definitions, a sequentialized traversal is defined by the following rule

vary Self over Node
  satisfying for all node in Node holds
    node before Self implies node.Visited
  R
  Self.Visited := true
endvary                                          (3)

where again R is refined to a case distinction by characterization (2). The final state, or termination, of a sequentialized tree traversal is reached when the root of the tree is visited.

3 Montages

Montages is a semi-visual formalism which defines for each characteristic symbol the related syntactic and semantic aspects of a language. We start by giving two examples of Montages for programming language constructs. The first example, given in fig. 2, is a typical While loop. The topmost part is the production rule defining the context-free syntax. Below it is a graphical representation of the control and data flow graph. The NT (NextTask) and TrueTask arrows, for instance, denote sequential control flow, while the Condition arrow denotes data flow. Control flow arrows are dotted and data flow arrows are solid. The control flow arrows I (initial) and T (terminal) are special arrows which serve to plug the local flow information into the global one. The boxes and circles are labeled with the selector functions accessing the corresponding elements.

[Figure 2: A Montage for a While statement]
  Production rule:
    WhileStatement ::= WHILE Expr DO StatementSequence END
  Graph:
    nodes S-Expr, S-DO, S-StatementSequence; arrows I, T, NT, TrueTask, Condition
  Static semantics:
    condition Self.S-Expr.StaticType = Boolean
  Dynamic semantics:
    if DO(CT) then
      if CT.Condition.Value = true then
        CT := CT.TrueTask
      else
        CT := CT.NT
      endif
    endif
A circle is used to indicate that the corresponding descendant is a token of the program text. The third part of the While Montage contains the static semantics, that is, the type of the While condition must be Boolean. The last part contains the dynamic semantics rule. This rule is executed if the abstract program counter CT points to a DO-task. In this case, it checks whether the value of the condition is true. If it is true, the abstract program counter is set to the statement sequence (using the TrueTask arrow), otherwise to the next task. The next task of the DO token is not defined directly by the graph; it is defined through the aforementioned plugging mechanism of the T arrow.

[Figure 3: A Montage for a Sum expression]
  Production rule:
    Sum ::= Expr "+" Expr
  Graph:
    nodes S1-Expr, S2-Expr, S-"+"; arrows I, T, NT, Left, Right
  Textual rule:
    Self.S-"+".StaticType := LeastCommonSupertype(
        Self.S1-Expr.Terminal.StaticType,
        Self.S2-Expr.Terminal.StaticType)
  Static semantics:
    condition IsNum(Self.S1-Expr.StaticType) and IsNum(Self.S2-Expr.StaticType)
  Dynamic semantics:
    if "+"(CT) then
      if Overflow(CT.Type, CT.Left.Value + CT.Right.Value) then
        CT := RunTimeError
      else
        CT.Value := CT.Left.Value + CT.Right.Value
        CT := CT.NT
      endif
    endif

As a second example, we show a Sum Montage (fig. 3). The static semantics of the sum expresses that both components must be of numeric type. In the definition we use a static function IsNum( ) which maps all numeric types to true. The graph specifying control and data flow again defines NT control flow arrows, and two data flow arrows Left and Right, which are used to reference the left and right argument of the "+" token, respectively. In contrast to the first example, the second part of the Sum Montage contains a textual rule. This rule uses the static function LeastCommonSupertype in order to determine the type of the Sum from the types of the left and right arguments. The dynamic semantics rule of the "+"-token raises a runtime error if the addition leads to an overflow with respect to the type of the expression.
Otherwise the value of the "+"-token is set to the result of the addition and control is passed to the next task.

As the examples above show, understanding a Montage does not require much expertise. The formal semantics given below is an unambiguous arbiter between different ways of understanding, and it makes clear how the interaction between the Montages works, e.g. how the I and T arrows plug the local flow information together into global control and data flow graphs. The resulting ASM semantics is as compact as the visual Montages, i.e. each element in a Montage corresponds to an update in the ASM semantics. In [19] we defined several notational shortcuts, which are not used in this text.

3.1 Basic Definitions

Formally speaking, the semantics of a Montages specification is an ASM M that, for a given program, checks the static semantics, initializes the control and data flow functions, and in a second phase executes the dynamic semantics. The transition rule of M thus consists of two rules, one modeling the first phase, called the statics rule, and one modeling the second phase, called the dynamics rule.

The statics rule is a sequentialized traversal as described in the last section. The actions executed by the statics rule check the static semantics for each node and define control and data flow between leaf descendants. In the Montages framework, the defined flow information is stored as functions between parts of the program text, i.e. leaves of the derivation tree. These functions correspond to the arrows in the examples. As usual, the final state of such a sequential traversal is reached when the root is visited. If the traversal is aborted, e.g. because a function Abort is set to true, the checked program is not valid. Otherwise we can proceed with the dynamics rule. The second phase uses the final state of the first as its initial state.
The dynamics rule executed in the second phase corresponds to the transition rule of a traditional ASM for dynamic semantics, as described in the introduction. But in contrast to the traditional use of ASMs, where control and data flow are assumed to be given as functions between leaves, here these functions are defined by the first phase.

As exemplified above, Montages are modules containing, for a characteristic symbol:

1. A production rule, if the symbol is a nonterminal.
2. A semi-visual specification of control and data flow functions. The graphical part consists of nodes representing the right-hand-side symbols of the production rule and arcs specifying flow functions. The textual part is a transition rule.
3. A first-order logic predicate that represents the static semantics constraint. This predicate is marked by the keyword condition.
4. Transition rules that model the dynamic semantics of terminals generated by the production rule.

These four parts are used to define the statics and dynamics rules. The dynamics rule is simply the block of the rules given in the fourth part of the Montages. The statics rule is a sequentialized traversal (3). The second and third parts of the Montage of a symbol s define the action of s in this traversal. The before-relation for the statics rule is defined by a parallel traversal (1) with case distinction by characterization (2). The actions of this traversal are defined such that lower nodes in the tree must be visited before higher nodes, and that siblings are visited in the order corresponding to their left-before-right and top-before-bottom order in the graph of the Montage.
We can thus define the s-action in the parallel traversal defining the before-relation of the statics rule as follows:

before(Self.S_1, Self.S_2) := true
before(Self.S_2, Self.S_3) := true
...
before(Self.S_{n-1}, Self.S_n) := true
before(Self.S_n, Self) := true

where $S_1, S_2, \ldots, S_n$ are the selector functions accessing the descendants of an s-node in the left-before-right and top-before-bottom order defined by the Montage of s. Please note that the transitive closure of the defined relation need not be computed, due to the way the sequentialization mechanism works. To illustrate, we give the corresponding actions for the While and Sum Montages.

Example: The action for the While Montage (fig. 2) is

before(Self.S-Expr, Self) := true
before(Self.S-DO, Self) := true
before(Self.S-StatementSequence, Self) := true
before(Self.S-Expr, Self.S-DO) := true
before(Self.S-DO, Self.S-StatementSequence) := true

and the action for the Sum Montage (fig. 3) is

before(Self.S1-Expr, Self) := true
before(Self.S2-Expr, Self) := true
before(Self.S-"+", Self) := true
before(Self.S1-Expr, Self.S2-Expr) := true
before(Self.S2-Expr, Self.S-"+") := true

The actions of the statics rule are explained step by step in the following. Let us assume for the discussion a fixed Montage for a symbol s. The action for this Montage is built up as a block of updates. Each arrow in the control and data flow graph defines one update in the action. This update does not directly link the graphically related nodes, but two of their leaf descendants. The definition of which leaf descendants are linked relies heavily on two functions

Initial : Node → Node
Terminal : Node → Node

which conceptually denote the first and the last leaf in the control flow between the leaf descendants of a node. These functions are initialized with the identity on nodes, so that they are well defined for leaves, which serve as their own initial and terminal leaf. The definition on inner nodes is built up inductively during the tree traversal.
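The interplay of the before-relation and the Visited field in the sequentialized traversal (3) can be sketched in Python. This is only an illustration under stated assumptions: the node names, the function `sequentialized_traversal`, and the explicit relation `BEFORE` are hypothetical, the relation assumes the sibling order S-Expr, S-DO, S-StatementSequence for a single While node, and already visited nodes are skipped instead of being fired again.

```python
# Hypothetical names for the nodes of one While statement.
NODES = ["While", "Expr", "DO", "StatementSequence"]
ROOT = "While"

# before-relation for this tree: a pair (a, b) means
# "a must be visited before b".
BEFORE = {
    ("Expr", "While"),
    ("DO", "While"),
    ("StatementSequence", "While"),
    ("Expr", "DO"),
    ("DO", "StatementSequence"),
}


def sequentialized_traversal(nodes, before, root, action):
    """Run steps until the root is visited; return the nodes fired per step."""
    visited = {n: False for n in nodes}
    steps = []
    while not visited[root]:
        # Evaluate all guards against the current state first, then apply
        # all updates together, mimicking one parallel ASM step.
        ready = [n for n in nodes
                 if not visited[n]
                 and all(visited[a] for (a, b) in before if b == n)]
        if not ready:
            raise ValueError("cyclic before-relation: no node can be visited")
        for n in ready:
            action(n)           # the action R of the node
        for n in ready:
            visited[n] = True   # Self.Visited := true
        steps.append(ready)
    return steps


order = []
steps = sequentialized_traversal(NODES, BEFORE, ROOT, order.append)
print(steps)
```

The guard only inspects direct predecessors, which is why no transitive closure is needed: a node cannot become visited before its own predecessors are. The sketch also shows why the relation must be acyclic, since a cycle leaves the nodes on it unvisitable.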
A dotted arrow labeled with I (respectively T) denotes the direct descendant whose definition of Initial (respectively Terminal) has to be copied. In the While Montage (fig. 2), for instance, the initial leaf of a WhileStatement-node is the initial leaf of the S-Expr-descendant, and the terminal leaf is the terminal leaf of the S-DO-descendant. The arrows in the graph define three different kinds of updates in the action: one for the Initial and Terminal functions described above, one for data flow arrows, and one for control flow arrows:

1. To define the functions Initial and Terminal, we specify which node in the graph contains the initial and terminal leaf, respectively. We call these nodes Inode and Tnode. Inode is defined as the target of a dotted arrow labeled with I, and Tnode is the source of a dotted arrow labeled with T. Formally these arrows define two selector functions I and T which link Self with Inode and Tnode. The corresponding fragment of the transition rule obtained by specifying Inode and Tnode graphically is the following block:

   Self.Initial := Self.I.Initial
   Self.Terminal := Self.T.Terminal

   We call this fragment-1.

2. Each solid edge labeled f from S-src to S-dst defines the update

   Self.S-src.Terminal.f := Self.S-dst.Terminal

   The terminal leaf is chosen as target for the data flow. We call the block of these updates fragment-2.

3. Each dotted edge labeled f from S-src to S-dst defines the update

   Self.S-src.Terminal.f := Self.S-dst.Initial

   The defined control functions link the terminal leaf with the initial leaf. We call the block of these updates fragment-3.

The action of the statics rule contains, in addition to the updates corresponding to the arrows, a rule which is given textually in the second part of the Montage. Consider the textual rule in the Sum Montage (fig. 3). Its structure resembles that of the updates generated by the second fragment, but it cannot be represented graphically.
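The three kinds of updates can be sketched in Python for the While Montage. This is a hedged illustration rather than the code produced by Gem-Mex: the class `Node`, its `flow` dictionary, and the function `apply_montage_graph` are hypothetical names, and the body of the loop is assumed to have been visited already, so that its Initial and Terminal leaves are known.

```python
class Node:
    """Tree node with the Initial/Terminal functions and outgoing flow arrows."""

    def __init__(self, name):
        self.name = name
        # Initial and Terminal start as the identity, so a leaf is its own
        # initial and terminal leaf.
        self.initial = self
        self.terminal = self
        self.flow = {}  # arrow label -> target leaf


def apply_montage_graph(self_node, sel, i_node, t_node, solid, dotted):
    """Apply fragment-1, fragment-2 and fragment-3 for one Montage graph.

    sel maps selector names to the descendants of self_node; solid and
    dotted are lists of (label, source selector, destination selector).
    """
    # fragment-1: copy Initial from the I node and Terminal from the T node.
    self_node.initial = sel[i_node].initial
    self_node.terminal = sel[t_node].terminal
    # fragment-2: data flow arrows end at the terminal leaf of the target.
    for label, src, dst in solid:
        sel[src].terminal.flow[label] = sel[dst].terminal
    # fragment-3: control flow arrows end at the initial leaf of the target.
    for label, src, dst in dotted:
        sel[src].terminal.flow[label] = sel[dst].initial


# Leaves: the condition expression, the DO token, two statements of the body.
cond, do, s1, s2 = Node("cond"), Node("DO"), Node("s1"), Node("s2")
body = Node("StatementSequence")
body.initial, body.terminal = s1, s2  # assumed result of visiting the body

while_node = Node("WhileStatement")
apply_montage_graph(
    while_node,
    {"S-Expr": cond, "S-DO": do, "S-StatementSequence": body},
    i_node="S-Expr",
    t_node="S-DO",
    solid=[("Condition", "S-DO", "S-Expr")],
    dotted=[("NT", "S-Expr", "S-DO"),
            ("TrueTask", "S-DO", "S-StatementSequence"),
            ("NT", "S-StatementSequence", "S-Expr")],
)
print({label: target.name for label, target in do.flow.items()})
```

The updates are applied one after another here, which is harmless in this sketch because fragment-1 writes only to the While node while the other fragments read only from its descendants; an ASM block would fire them simultaneously.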
Theoretically, all graphically defined updates can be given textually as well.

The only missing part is the check of the static semantics condition. This condition is checked before the updates of the action happen, and a nullary function Abort is set to true if the condition is false. In order to make the rule easier to read, we write the corresponding conditional rule before all updates. We have now discussed all parts of the action of a characteristic symbol and can define it as follows: the action of a characteristic symbol s in the sequentialized traversal that constitutes the statics rule of the Montages semantics is

if not Condition then
  Abort := true
endif
fragment-1
fragment-2
fragment-3
TransRule

where Condition is the static semantics constraint of Montage s, TransRule is the textual rule in the second part of Montage s, and fragment-1, fragment-2, and fragment-3 are the updates defined by the graph of Montage s.

Example: We give the corresponding actions for the While (fig. 2) and Sum (fig. 3) Montages.
The action for While is the following rule:

if not Self.S-Expr.StaticType = Boolean then
  Abort := true
endif
fragment-1:
  Self.Initial := Self.S-Expr.Initial
  Self.Terminal := Self.S-DO.Terminal
fragment-2:
  Self.S-DO.Terminal.Condition := Self.S-Expr.Terminal
fragment-3:
  Self.S-Expr.Terminal.NT := Self.S-DO.Initial
  Self.S-DO.Terminal.TrueTask := Self.S-StatementSequence.Initial
  Self.S-StatementSequence.Terminal.NT := Self.S-Expr.Initial

and the action for the Sum looks as follows:

if not (IsNum(Self.S1-Expr.StaticType) and IsNum(Self.S2-Expr.StaticType)) then
  Abort := true
endif
fragment-1:
  Self.Initial := Self.S1-Expr.Initial
  Self.Terminal := Self.S-"+".Terminal
fragment-2:
  Self.S-"+".Terminal.Left := Self.S1-Expr.Terminal
  Self.S-"+".Terminal.Right := Self.S2-Expr.Terminal
fragment-3:
  Self.S1-Expr.Terminal.NT := Self.S2-Expr.Initial
  Self.S2-Expr.Terminal.NT := Self.S-"+".Initial
textual rule:
  Self.StaticType := LeastCommonSupertype(
      Self.S1-Expr.StaticType,
      Self.S2-Expr.StaticType)

3.2 List Processing

In many approaches a major part of a language specification is concerned with the processing of lists. Therefore we decided to include in Montages a simple yet powerful list model, together with graphical and textual specification elements that can be used to avoid all explicit list processing. If the right-hand side of a production rule contains a symbol in a { } part, a list of descendants is generated. As already mentioned, we also generate an additional node, a so-called list node, that provides access to the elements and to all needed information about the list. An attribute ListLength of the list node is set to the length of the generated list, and a binary mixfix function [ ] : ListNode × Nat → Node can be used to retrieve the elements of the list. The initial and terminal leaves of a list node are defined to be the initial leaf of the first element and the terminal leaf of the last element in the list, respectively. If the list is empty, they point to the list node itself, which then serves as a dummy element.
The dynamic semantics of that dummy element corresponds to the skip command. For convenience we assume that a number of patterns in the right-hand side of production rules are recognized and treated as simple lists. These patterns are

{s}    s{s}    s {"t" s}    [s {"t" s}]

where "t" is an arbitrary terminal and s a nonterminal. For all these patterns just one list node is generated, which can be accessed by the selector function S-s. The [ ] function can then be used to access all generated s-descendants from left to right, regardless of which s in the pattern generated the descendant.

We have not yet defined the statics for list nodes. For simplicity we give just one of many possible solutions. We define the action in the before-definition such that the elements must be visited from left to right and that they are visited before the list node Self:

vary i over Nat
  satisfying i < Self.ListLength
  before(Self[i], Self[i + 1]) := true
endvary
before(Self[Self.ListLength], Self) := true

In addition, we choose the action in the statics rule of a list node such that it sequentially links the elements by means of NT control arrows and sets the initial and terminal leaf to the corresponding leaves of the first and last element:

fragment-1:
  Self.Initial := Self[1].Initial
  Self.Terminal := Self[Self.ListLength].Terminal
fragment-3:
  vary i over Nat
    satisfying i < Self.ListLength
    Self[i].Terminal.NT := Self[i + 1].Initial
  endvary

These two definitions are a simple solution which works fine for many examples. Other solutions can be defined with a generic Montage for lists. In the graphs of Montages, a list node is represented by a box which is marked in the top-right corner with the keyword List. A second box or circle within the List-box represents the single elements of the list. An arrow from a node within a List-box therefore corresponds to a family of arrows from all of the elements in that list, whereas arrows from or to the List-box itself have the same semantics as normal arrows.
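The statics of a list node described above can be sketched in a few lines of Python. The names `Node` and `list_statics` are hypothetical, and the element list is 0-indexed here, whereas Self[i] in the text counts from 1.

```python
class Node:
    """Tree node with Initial/Terminal functions and outgoing flow arrows."""

    def __init__(self, name):
        self.name = name
        self.initial = self   # identity by default
        self.terminal = self
        self.flow = {}        # arrow label -> target leaf


def list_statics(list_node, elements):
    """Link list elements with NT arrows and set Initial/Terminal of the list node."""
    if not elements:
        # Empty list: Initial and Terminal keep pointing to the list node
        # itself, which then acts as a dummy element.
        return
    list_node.initial = elements[0].initial
    list_node.terminal = elements[-1].terminal
    for left, right in zip(elements, elements[1:]):
        left.terminal.flow["NT"] = right.initial


# The variable objects of the declaration "a, b, c: Bool".
a, b, c = Node("a"), Node("b"), Node("c")
var_list = Node("VarObjectList")
list_statics(var_list, [a, b, c])

empty_list = Node("EmptyList")
list_statics(empty_list, [])

print(var_list.initial.name, var_list.terminal.name, a.flow["NT"].name)
```

Using `zip` over adjacent pairs plays the role of the vary rule over all i below ListLength; the last element receives no NT arrow from the list action, so it can later be linked by the enclosing Montage.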
[Figure 4: A variable declaration Montage]
  Production rule:
    VarDeclaration ::= VarObject {"," VarObject} ":" Type
  Graph:
    LIST box containing S-VarObject, node S-Type; arrows I, T, StaticType

[Figure 5: Control/data-flow graph of the variable declaration "a, b, c: Bool"]
  Nodes a, b, c, Bool; NT arrows linking a, b, and c in sequence; one StaticType arrow from each of a, b, c to Bool

As an example we take a Montage for a variable declaration, as pictured in fig. 4. The production rule of that Montage generates a list of VarObject-descendants and a node labeled with Type. The single StaticType arrow in the Montage specifies a family of data flow arrows, one from each variable object to the type node. Please note that the action of the list node links all variable objects sequentially with NT arrows. As an example we show the generated control/data-flow graph of the variable declaration "a, b, c: Bool" (fig. 5).

[Figure 6: A Montage for a parameterized procedure call]
  Production rule:
    ParamProcCall ::= Call "(" [Expr {"," Expr}] ")"
  Graph:
    LIST box containing S-Expr, node S-Call; arrows I, T, NT, ActualParameter( )

Experience has shown that binary functions which link a leaf with all elements of a list are often used. The second argument of such a function is the position in the list. We allow this type of binary function to be defined graphically by means of a data arrow going into a list box. A typical language construct where this is needed is the list of actual parameters of a procedure call (see fig. 6; we refer to [KP97b] for a complete version). The ActualParameter( ) arrow in the Montage defines a binary function ActualParameter which maps a call task c and a position n to the n-th actual parameter of c. As an example we show the generated control/data-flow graph of the procedure call "P(x, y, z)" (fig. 7).

[Figure 7: Control/data-flow graph of the procedure call "P(x, y, z)"]
  Nodes x, y, z, P; NT arrows linking x, y, z, and P in sequence; arrows ActualParameter(1), ActualParameter(2), ActualParameter(3)

4 Development Environment for Montages

The development environment for Montages is given by the Gem-Mex tool.
It consists of a graphical editor (Gem), which provides an easy means to edit the graphical and textual elements of a Montage, and an executable generator (Mex). Gem also contains functionality to generate documents suitable for presenting the Montages. Both paper and online presentation of the specified Montages are supported by Gem: LaTeX as well as HTML versions of the specified Montages can be generated. To increase the readability of those parts of the formalization that are represented by the textual elements of the Montages, a "literate specification" style is supported by means of a literate programming tool integrated into the system. "Literate specification" means that the Montages text fields may contain references to other parts of the formalization specified outside the Montages boxes. This makes a Montages specification look very much like an informal description while it remains a formal one.

Mex is a type checker and executable generator for Montages. As mentioned earlier, in a first step Mex uses standard tools (lex and yacc) to construct the abstract syntax tree according to the syntax rules given by the Montages specification. The next step is the generation of code in the form of class descriptions in the object-oriented programming language Sather. This code consists of the following parts:

- definition of a class hierarchy representing the grammar structure of the specification,
- code implementing the conditions and transition rules,
- code representing the ASM for the dynamic semantics given by the text in the bottom-most parts of the Montages, and
- code for debugging the generated executable.

    E = T | S
    S ::= E "+" T
    T = I | P
    P ::= T "*" I

  is translated to

    abstract class $E;
    class S < $E is
        attr S_E : $E;
        attr S_PLUS : PLUS;
        attr S_T : $T
    end;
    abstract class $T < $E;
    class P < $T is
        attr S_T : $T;
        attr S_AST : AST;
        attr S_I : I;
    end;

Figure 8: Generated class structure
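The class structure of Figure 8 can be mirrored in Python to make the mapping concrete. This is a rough sketch and not Mex output: it assumes that the synonym productions E and T become abstract classes and that the production rules S and P become concrete classes with one attribute per right-hand-side symbol. The method `show` and the field `text` are hypothetical additions for illustration, and the token classes PLUS and AST of Figure 8 are simplified to plain strings.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class E(ABC):
    """Synonym production E = T | S: abstract parent of its alternatives."""

    @abstractmethod
    def show(self) -> str:
        """Render the subtree; hypothetical helper, not in Figure 8."""


class T(E):
    """Synonym production T = I | P; abstract and itself an alternative of E."""


@dataclass
class I(T):
    """Alternative of T; its own rule is not shown in Figure 8."""
    text: str

    def show(self) -> str:
        return self.text


@dataclass
class S(E):
    """Production rule S ::= E "+" T: one attribute per right-hand-side symbol."""
    s_e: E
    s_plus: str
    s_t: T

    def show(self) -> str:
        return f"({self.s_e.show()} {self.s_plus} {self.s_t.show()})"


@dataclass
class P(T):
    """Production rule P ::= T "*" I."""
    s_t: T
    s_ast: str
    s_i: I

    def show(self) -> str:
        return f"({self.s_t.show()} {self.s_ast} {self.s_i.show()})"


# Abstract syntax tree for the input "a + b * c".
tree = S(I("a"), "+", P(I("b"), "*", I("c")))
```

Because `E` and `T` declare or inherit an abstract method, they cannot be instantiated directly, which matches the role of the abstract classes `$E` and `$T` in the Sather code.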
For each grammar rule Mex generates either a concrete or an abstract class, depending on whether it is a production rule or a synonym production. Production rules are represented by concrete classes; synonym productions trigger the generation of abstract classes, which are the parent classes of those (concrete or abstract) classes representing the alternatives of the synonym production. Nonterminals appearing on the right-hand side of a production rule are modeled as attributes of the class generated for the left-hand side. Figure 8 shows an example of the generation of classes corresponding to grammar rules.

The consistency check of the Montages is done during the construction of the abstract syntax tree. The data and control edges, the conditions, and the transition rules are evaluated during a left-to-right tree traversal that constructs the initial state for the ASM defining the dynamic semantics.

Mex generates source code that represents the ASM rules of the dynamic semantics. This is done by a simple data flow analysis of the updates, introducing auxiliary variables where necessary. The generated code also contains debugging functionality that allows the user to interactively trace the run of the ASM given by the dynamic semantics rules. The debugging functionality includes the animation of nodes within the corresponding Montages in the Gem editor; token nodes and edges are highlighted when they are reached by the control flow.

5 Related Works

Denotational semantics has been regarded as the most promising approach for the semantic description of programming languages. However, its problems with the pragmatic side of language design were already discovered in case studies on the scale of Pascal and C (see for instance [24]). Information hiding, object orientation, and complex name analysis are not covered because of the global visibility of the definition of semantic domains throughout a denotational description.
Moreover, domain definitions often need to be changed when extending the language with unforeseen constructs, for instance a change from the direct style to the continuation style when adding gotos [21]. To cite Abramsky: "once languages with features beyond the purely functional are considered, the appropriateness of modeling programs by functions is increasingly open to question. Neither concurrency nor 'advanced' imperative features have been captured denotationally in a fully convincing fashion." [1]

Other research has been carried out because of the above considerations about pragmatics. In particular, it is worth mentioning Action semantics [21], which is an initial-algebra semantics [11] based on Mosses' unified algebras. Action semantics retained some features of denotational semantics, i.e. context-free grammars for defining abstract syntax trees and the use of Horn clauses to give inductive definitions of compositional semantic functions. The main semantic entities are actions, which are specified by means of the action notation. To quote Mosses: "the current structural operational semantics of action notation is not easy to modify; alternative forms of operational semantics, such as evolving-algebra semantics, might be preferable in that respect." [21]

Another universal meta-language for all aspects of programming languages is ASF+SDF [25]. It is an initial-algebra approach and specifies the static and dynamic semantics by means of positive conditional equations. Like all initial-algebra based formalisms, it is restricted to the expressiveness of the logic of Horn clauses, i.e. positive conditional equations; otherwise the existence of the initial model is not guaranteed and the syntax cannot be mapped to the semantics in an unambiguous way, because the universal homomorphism does not exist. In this respect, Montages are much more expressive, since they make use of full first-order logic for the static semantics predicates.
An approach with the same ambitious goal is Kahn's Natural Semantics [14], which is directly based on Natural Deduction. For somebody who knows mathematical logic, Natural Semantics is quite intuitive, and we used it for the dynamic semantics of Oberon [16]. Although we succeeded thanks to the excellent tool support by Centaur [9], the result was much longer and more complex than the Montages counterpart given in [20], since in Natural Semantics one has to carry around all the state information.

Although attribute grammars [15] are not designed to specify all aspects of languages, it is worth noting that our solution for the static aspects has some similarities with attribute grammars. Although in certain cases they may be executed very efficiently, we preferred not to use them for the following reasons: using ASMs we have the same formalism for all parts of the specification; and, as shown in [22], attribute grammars tend to be very long when applied to real programming languages.

Using ASMs for dynamic semantics, the work in [23] defines a framework comparable to ours, although it has different aims, namely efficient execution. For the static part, it proposes occurrence algebras, which integrate term algebras and context-free grammars by providing terms for all nodes of all possible derivation trees. This allows such an approach to define all static aspects of the language in a functional algebraic system, which is supported by the MAX tool. In any case, the additional mathematical machinery must be hidden from the user, and Montages might be well suited for that task.

6 Conclusions

In this paper we presented a novel approach to cope with specifications of all aspects of programming languages. Expressive yet intelligible descriptions of language constructs and of complex features, together with ease of maintenance, were sought. In this respect, the well-known ASMs have already attracted attention.
The classical use of ASMs abstracts from the static semantics and assumes the result of a static analysis in order to define the dynamic semantics. The main criticism of such an approach is that the static analysis is not formalized. An exception is the work on Occam [5]. Unfortunately, the solution presented there does not allow for the definition of the static semantics. Montages solve the problem using control and data flow graphs and at the same time allow one to give a very compact definition of static semantics as full first-order predicates. At the same time Montages retain the advantages of ASMs.

Experience in scaling up both basic ASMs and Montages for large case studies such as the specification of SQL [10] and Oberon [17, 20] showed some important advantages of Montages with respect to basic ASMs:

- the readability and comprehension of specifications improved drastically, since the specification is arranged in capsules of behavior according to the rules of the context-free grammar,
- the maintenance of the whole specification is much easier, since the behavior can be easily localized and eventually modified according to requirement changes,
- starting from the static semantics and analysis allows one to have a better comprehension of the dynamic semantics,
- the specification process requires less time, since
  - the designer is driven by the structure of the Montages,
  - Montages represent a sort of factorization of behavior which can be reused in the definition of other languages in an off-the-shelf fashion or via refinement.

As pointed out in [4], ASMs deliberately do not impose any particular calculus. The ease of abstraction makes ASMs very suitable for different application domains, and for each of them it makes sense to have a different verification system with its own assumptions. Moreover, one is able to do complex mathematical proofs directly in the ASM framework, as shown in [4, 5, 8].
The translation of an ASM model into other formalisms is interesting if tools with powerful verification capabilities are available.

Montages are formal descriptions of programming languages with a higher intelligibility than usual semantics descriptions. This is mainly due to the use of visual elements and the elaborated structuring mechanism. Future case studies will show whether Montages are suited not only for the specification of programming languages, but for other application domains as well, such as databases, hardware languages, and communication protocols.

Acknowledgments

We gratefully acknowledge Yuri Gurevich, Egon Börger, Wolf Zimmermann, Jim Huggins, David Espinosa, and Daniel Schweizer for their constructive comments on early drafts of the paper. Thanks go to Richard Waldinger and Chuck Wallace, who helped us with the writing; Richard proposed the name Montages.

References

[1] S. Abramsky. Semantics of Interaction. In Trees in Algebra and Programming - CAAP'96, 21st Int. Coll., volume 1059 of LNCS, page 1. Springer Verlag, 1996.
[2] M. Anlauff. The Semantics of the Object-Oriented Programming Language Sather. Technical report, International Computer Science Institute, Berkeley, 1997. In preparation.
[3] M. Anlauff, P.W. Kutter, A. Pierantonio, D. Rosenzweig, L. Thiele, and W. Zimmermann. Compiler Construction with Montages. Submitted for publication, 1997.
[4] E. Börger. Why Use Evolving Algebras for Hardware and Software Engineering. In SOFSEM'95, 22nd Seminar on Current Trends in Theory and Practice of Informatics, volume 1012 of LNCS, pages 236-271. Springer Verlag, 1995.
[5] E. Börger and I. Durdanovic. Correctness of Compiling Occam to Transputer Code. Computer Journal, 39(1):52-92, 1996.
[6] E. Börger, U. Glässer, and W. Mueller. Formal Definition of an Abstract VHDL'93 Simulator by EA-Machines. In Semantics of VHDL, volume 307 of The Kluwer International Series in Engineering and Computer Science. Kluwer, 1995.
[7] E. Börger and D. Rosenzweig. A Mathematical Definition of Full Prolog. Science of Computer Programming, 1994.
[8] E. Börger and D. Rosenzweig. The WAM - Definition and Compiler Correctness, chapter 2, pages 20-90. Series in Computer Science and Artificial Intelligence. Elsevier Science B.V. North Holland, 1995.
[9] P. Borras, D. Clément, T. Despeyroux, J. Incerpi, G. Kahn, B. Lang, and V. Pascual. CENTAUR: The System. Technical Report 777, INRIA, Sophia Antipolis, 1987.
[10] B. DiFranco. Semantica Statica e Dinamica di SQL mediante i Montaggi. Master's thesis, Università di L'Aquila, 1997. In Italian.
[11] J.A. Goguen, J.W. Thatcher, E.G. Wagner, and J.B. Wright. Initial Algebra Semantics and Continuous Algebras. J. ACM, 24:68-95, 1977.
[12] Y. Gurevich. Evolving Algebras 1993: Lipari Guide. In E. Börger, editor, Specification and Validation Methods. Oxford University Press, 1995.
[13] Y. Gurevich and J.K. Huggins. The Semantics of the C Programming Language, volume 702 of LNCS, pages 274-308. Springer Verlag, 1993.
[14] G. Kahn. Natural Semantics. In Proceedings of the Symp. on Theoretical Aspects of Computer Science, Passau, Germany, 1987.
[15] D.E. Knuth. Semantics of Context-Free Languages. Math. Systems Theory, 2(2):127-146, 1968.
[16] P.W. Kutter. Executable Specification of Oberon Using Natural Semantics. Term Work, ETH Zurich, implementation on the Centaur System [9], 1996.
[17] P.W. Kutter. Dynamic Semantics of the Programming Language Oberon. TIK-Report 27, ETH Zurich, 1997.
[18] P.W. Kutter and A. Pierantonio. Montages: Unified Static and Dynamic Semantics of Programming Languages. Technical Report 118, Dip. Matematica Pura ed Applicata, Università di L'Aquila, July 1996.
[19] P.W. Kutter and A. Pierantonio. Montages Specifications of Realistic Programming Languages. Springer Journal of Universal Computer Science, 3(5):416-442, 1997.
[20] P.W. Kutter and A. Pierantonio. The Formal Specification of Oberon. Springer Journal of Universal Computer Science, 3(5):443-503, 1997.
[21] P. Mosses. Theory and Practice of Action Semantics. In MFCS'96, 21st International Symposium, volume 1113 of LNCS, pages 37-61. Springer Verlag, 1996.
[22] M. Odersky. A New Approach to Formal Language Definition and its Application to Oberon. PhD thesis, ETH Zurich, 1989.
[23] A. Poetzsch-Heffter. Developing Efficient Interpreters Based on Formal Language Specifications. In Compiler Construction, volume 786 of LNCS, pages 233-247. Springer Verlag, 1994.
[24] D.A. Schmidt. Denotational Semantics: A Methodology for Language Development. Allyn & Bacon, 1986.
[25] A. van Deursen, J. Heering, and P. Klint, editors. Language Prototyping - An Algebraic Approach, volume 5 of AMAST Series in Computing. World Scientific, 1996.
[26] C. Wallace. The Semantics of the C++ Programming Language. In E. Börger, editor, Specification and Validation Methods, pages 131-164. Oxford University Press, 1994.
