
Montages - Engineering of Computer Languages

In this thesis we elaborate a language description formalism called Montages. The Montages formalism can be used to engineer domain-specific languages (DSLs), which are computer languages specially tailored and typically restricted to solve problems of specific domains. We focus on DSLs which have some algorithmic flavor and are intended to be used in corporate environments where main-stream state-based programming and modeling formalisms prevail. For engineering such DSLs it is important that the designs of the existing, well-known general purpose languages (GPLs) can be described as well, and that these descriptions are easily reused as basic building blocks to design new DSLs. Using the Montages tool support Gem-Mex, such new designs can be composed in an integrated semantics environment, and from the descriptions an interpreter and a specialized visual debugger are generated for the new language. We restrict our research to sequential languages, and the technical part of the thesis aims to contribute to the improvement of the DSL design process by focusing on ease of specification and ease of reuse for programming language constructs from well-known GPL designs. For the sake of brevity we do not present detailed case studies for DSLs and refer the reader to the literature. Finally, we mainly look at exact reuse of specification modules, and we have not elaborated the means for incremental design by reusing specifications in the sense of object-oriented programming. Of course these means are needed as well, and we assume the existence of such reuse features without formalizing them. The technical part of the thesis provides the basic specification patterns for introducing all features of an object-oriented style of reuse; applying these patterns to Montages in order to make it an object-oriented specification formalism is left for future work.
The focus and contribution of this thesis is the design and elaboration of a language engineering discipline based on the widespread state-based intuition of algorithms and programming. This approach opens the possibility to apply DSL technology in typical corporate environments, where the beneficial properties of smaller, and therefore by nature more secure and more focused, computer languages are most leveraged. The thesis does not cover the equally important topic of how to formalize these beneficial properties by means of declarative formalisms and how to apply mechanized reasoning and formal software engineering to DSLs. The thesis is structured in three parts. In the first part the requirements for a language engineering approach are analyzed and the language definition formalism Montages is introduced. In the second part the formal semantics and system architecture of Montages are given. The third part consists of a number of small example languages, each of them designed to show the Montages solution for specifying a well-known feature of main-stream object-oriented programming languages such as Java. The single description modules of these example languages can be used to assemble a full object-oriented language, or a small subset of them can be combined with some high-level domain-specific features into a DSL.
Institut für Technische Informatik und Kommunikationsnetze Computer Engineering and Networks Laboratory TIK-SCHRIFTENREIHE NR. XXX Philipp W. Kutter Montages Engineering of Computer Languages Eidgenössische Technische Hochschule Zürich Swiss Federal Institute of Technology Zurich
A dissertation submitted to the Swiss Federal Institute of Technology Zurich for the degree of Doctor of Technical Sciences Diss. ETH No. 13XXX Prof. Dr. Lothar Thiele, examiner Prof. Dr. Martin Odersky, co-examiner Examination date: xxxxx xx, 2003
I would like to thank my father, Gerhard Rüdiger Kutter, whose answers to my questions about his work with fourth generation languages in banking software have been a seed from my childhood which has grown into this thesis. To my marvelous wife, Enza, to my children, Nora-Manon Alma Stella and Anaël Gerhard Victor-Maria, to my mother and my sister. To the Montages team, Alfonso Pierantonio and Matthias Anlauff, to Florian Haussman, who was there when everything started, and to my scientific advisors, Lothar Thiele, Yuri Gurevich, and Martin Odersky. For various comments on content and form of the thesis I am thankful mainly to Chuck Wallace, but as well to Marjan Mernik, Welf Löwe, Asuman Sünbül, Arnd Poetzsch-Heffter, and Dimidios Spinellis. For inspiring discussions on related areas, thanks to Craig Cleaveland, David Weiss, Grady Campbell, Erich Gamma, Peter Mosses, and Dusko Pavlovic. For helpful insight into the business aspects of Montages, I would like to thank Denis McQuade and Hans-Peter Dieterich.

Contents

1 Introduction

I Engineering of Computer Languages

2 Requirements for Language Engineering
2.1 Problem Statement
2.2 Typical Application Scenario
2.2.1 Situation
2.2.2 Problem
2.2.3 DSL Solution
2.2.4 Conclusions and Related Applications
2.3 Designing Domain Specific Languages
2.4 Reusing Existing Language Designs
2.5 Safety, Progress, and Security
2.6 Splitting Development Cycles
2.7 Requirements for a Language Description Formalism
2.8 Related Work

3 Montages
3.1 Introduction
3.2 From Syntax to Abstract Syntax Trees (ASTs)
3.2.1 EBNF rules
3.2.2 Abstract syntax trees
3.3 Dynamic Semantics with Tree Finite State Machines (TFSMs)
3.3.1 Example Language
3.3.2 Transition Specifications and Paths
3.3.3 Construction of the TFSM
3.3.4 Simplification of TFSM
3.3.5 Execution of TFSMs
3.4 Lists, Options, and non-local Transitions
3.4.1 List and Options
3.4.2 Extension of InstantiateTransitions
3.4.3 Global Paths
3.4.4 Algorithm InstantiateTransition
3.4.5 The Goto Language
3.5 Related Work and Results
3.5.1 Influence of Natural Semantics and Attribute Grammars
3.5.2 Relation to subsequent work on semantics using ASM
3.5.3 The Verifix Project
3.5.4 The mpC Project
3.5.5 Active Libraries, Components, and UML
3.5.6 Summary of Main Results

II Montages Semantics and System Architecture

4 eXtensible Abstract State Machines (XASM)
4.1 Introduction to ASM
4.1.1 Properties of ASMs
4.1.2 Programming Constructs of ASMs
4.2 Formal Semantics of ASMs
4.3 The XASM Specification Language
4.3.1 External Functions
4.3.2 Semantics of ASM run and Environment Functions
4.3.3 Realizing External Functions with ASMs
4.4 Constructors, Pattern Matching, and Derived Functions
4.4.1 Constructors
4.4.2 Pattern Matching
4.4.3 Derived Functions
4.4.4 Relation of Function Kinds
4.4.5 Formal Semantics of Constructors
4.5 EBNF and Constructor Mappings
4.5.1 Basic EBNF productions
4.5.2 Repetitions and Options in EBNF
4.5.3 Canonical Representation of Arbitrary Programs
4.6 Related Work and Results

5 Parameterized XASM
5.1 Motivation
5.2 The $, Apply, and Update Features
5.2.1 The $ Feature
5.2.2 The Apply and Update Features
5.3 Generating Abstract Syntax Trees from Canonical Representations
5.3.1 Constructing the AST
5.3.2 Navigation in the Parse Tree
5.3.3 Examples: Abrupt Control Flow and Variable Scoping
5.4 The PXasm Self-Interpreter
5.4.1 Grammar and Term-Representation of PXasm
5.4.2 Interpretation of symbols
5.4.3 Definition of INTERP( )
5.5 The PXasm Partial Evaluator
5.5.1 The Partial Evaluation Algorithm
5.5.2 The do-if-let transformation for sequentiality in ASMs
5.6 Related Work and Conclusions

6 TFSM: Formalization, Simplification, Compilation
6.1 TFSM Interpreter
6.1.1 Interpreter for Non-Deterministic TFSMs
6.1.2 Interpreter for Deterministic TFSMs
6.2 Simplification of TFSMs
6.3 Partial Evaluation of TFSM rules and transitions
6.4 Compilation of TFSMs
6.5 Conclusions and Related Work

7 Attributed XASM
7.1 Motivation and Introduction
7.1.1 Object-Oriented versus Procedural Programming
7.1.2 Functional Programming versus Attribute Grammars
7.1.3 Commonalities of Object Oriented Programming and Attribute Grammars
7.1.4 AXasm = XASM + dynamic binding
7.1.5 Example
7.2 Definition of AXasm
7.2.1 Derived Functions Semantics
7.2.2 Denotational Semantics
7.2.3 Self Interpreter Semantics
7.3 Related Work and Results
8 Semantics of Montages
8.1 Different Kinds of Meta-Formalism Semantics
8.2 Structure of the Montages Semantics
8.2.1 Informal Typing
8.2.2 Data Structure
8.2.3 Algorithm Structure
8.3 XASM definitions of Static Semantics
8.3.1 The Construction Phase
8.3.2 The Attributions and their Collection
8.3.3 The Static Semantics Condition
8.4 XASM definitions of Dynamic Semantics
8.4.1 The States
8.4.2 The Transitions
8.4.3 The Transition Instantiation
8.4.4 Implicit Transitions
8.4.5 The Decoration Phase
8.4.6 Execution
8.5 Conclusions and Related Work

III Programming Language Concepts

9 Models of Expressions
9.1 Features of ExpV1
9.1.1 The Atomar Expression Constructs
9.1.2 The Composed Expression Constructs
9.2 Reuse of ExpV1 Features

10 Models of Control Flow Statements
10.1 The Example Language ImpV1
10.2 Additional Control Statements
11 Models of Variable Use, Assignment, and Declaration
11.1 ImpV2: A Simple Name Based Variable Model
11.2 ImpV3: A Refined Tree Based Variable Model
11.3 ObjV1: Interpreting Variables as Fields of Objects

12 Classes, Instances, Instance Fields
12.1 ObjV2 Programs
12.2 Primitive and Reference Type
12.3 Classes and Subtyping
12.4 Object Creation and Dynamic Types
12.5 Instance Fields
12.6 Dynamic Binding
12.7 Type Casting

13 Procedures, Recursive-Calls, Parameters, Variables
13.1 ObjV3 Programs
13.2 Call Incarnations
13.3 Semantics of Call and Return
13.4 Actualizing Formal Parameters

14 Models of Abrupt Control
14.1 The Concept of Frames
14.2 FraV1: Models of Iteration Constructs
14.3 FraV2: Models of Exceptions
14.4 FraV3: Procedure Calls Revisited

IV Appendix

A Kaiser’s Action Equations
A.1 Introduction
A.2 Control Flow in Action Equations
A.3 Examples of Control Structures
A.4 Conclusions

B Mapping Automata
B.1 Introduction
B.2 Static structures
B.2.1 Abstract structure of the state
B.2.2 Locations and updates
B.3 Mapping automata
B.4 A rule language and its denotation
B.4.1 Terms
B.4.2 Basic rules constructs
B.4.3 First-order extensions
B.4.4 Nondeterministic rules
B.4.5 Creating new objects
B.5 Comparison to traditional ASMs
B.5.1 State and automata
B.5.2 Equivalence of MA and traditional ASM

C Stärk’s Model of the Imperative Java Core
C.1 Functions
C.2 Expressions
C.3 Statements

D Type System of Java
D.1 Reference Types
D.2 Subtyping
D.3 Members
D.4 Visibility and Reference of Members
D.5 Reference of Static Fields

Bibliography

1 Introduction

In this thesis we elaborate a language description formalism called Montages. The Montages formalism can be used to engineer domain-specific languages (DSLs), which are computer languages specially tailored and typically restricted to solve problems of specific domains.
We focus on DSLs which have some algorithmic flavor and are intended to be used in corporate environments where main-stream state-based programming and modeling formalisms (examples are state machines, as found in UML or State-Charts, flow-charts, and imperative as well as most object-oriented and scripting languages) prevail. For engineering such DSLs it is important that the designs of the existing, well-known general purpose languages (GPLs) can be described as well, and that these descriptions are easily reused as basic building blocks to design new DSLs. Using the Montages tool support Gem-Mex, such new designs can be composed in an integrated semantics environment, and from the descriptions an interpreter and a specialized visual debugger are generated for the new language. We restrict our research to sequential languages, and the technical part of the thesis aims to contribute to the improvement of the DSL design process by focusing on ease of specification and ease of reuse for programming language constructs from well-known GPL designs. For the sake of brevity we do not present detailed case studies for DSLs and refer the reader to the literature. Finally, we mainly look at exact reuse of specification modules, and we have not elaborated the means for incremental design by reusing specifications in the sense of object-oriented programming. Of course these means are needed as well, and we assume the existence of such reuse features without formalizing them. The technical part of the thesis provides the basic specification patterns for introducing all features of an object-oriented style of reuse; applying these patterns to Montages in order to make it an object-oriented specification formalism is left for future work.

The focus and contribution of this thesis is the design and elaboration of a language engineering discipline based on the widespread state-based intuition of algorithms and programming.
This approach opens the possibility to apply DSL technology in typical corporate environments, where the beneficial properties of smaller, and therefore by nature more secure and more focused, computer languages are most leveraged. The thesis does not cover the equally important topic of how to formalize these beneficial properties by means of declarative formalisms and how to apply mechanized reasoning and formal software engineering to DSLs. The thesis is structured in three parts. In the first part the requirements for a language engineering approach are analyzed and the language definition formalism Montages is introduced. In the second part the formal semantics and system architecture of Montages are given. The third part consists of a number of small example languages, each of them designed to show the Montages solution for specifying a well-known feature of main-stream object-oriented programming languages such as Java. The single description modules of these example languages can be used to assemble a full object-oriented language, or a small subset of them can be combined with some high-level domain-specific features into a DSL. In the following we summarize the content of each part and its chapters, and their relation to each other.

Part I: Engineering of Computer Languages

The first part of this thesis describes the problems we try to solve (Chapter 2), and gives a tutorial introduction to Montages (Chapter 3).

Chapter 2: Requirements for Language Engineering

In this chapter we analyze the problem in the area of language engineering in general, with a special focus on DSLs. A typical application scenario for a DSL with algorithmic flavor is described, and the issue of designing DSLs is discussed. We motivate why the possibility to reuse existing language designs is important even for simple language designs, and show how introducing DSLs and especially language description formalisms allows one to split the development cycle.
After these discussions the resulting requirements for a language description formalism are summarized, and finally related work in the areas of language design, domain engineering, and domain-specific languages is discussed.

Chapter 3: Montages

The purpose of this chapter is to introduce Montages, a language description formalism we proposed with Pierantonio in 1996 and which has since then been used for descriptions and implementations of GPLs and DSLs in academic and industrial contexts. While a complete formal definition is delegated to Chapter 8, we give here a tutorial introduction. Since for the static semantics we use the well-known technique of attribute grammars (AGs), we focus on our novel approach for describing dynamic semantics. In short, Montages define dynamic semantics by a mapping from programs to tree finite state machines (TFSMs), a simple tree-based state machine model we designed for streamlining the semantics of Montages. The states of such a machine are elements of the Cartesian product of syntax tree nodes and states of finite state machines (FSMs). The states of the FSMs are in turn associated with action rules. If the TFSM reaches some state, the corresponding action is executed in the environment given by the corresponding node in the syntax tree. The tree structure is defined by traditional EBNF grammars, producing a syntax tree, and the transitions from one node to another in the tree are specified by representing the structure of the tree as nested boxes within the FSMs. The nodes in the tree, whose number can be arbitrarily large, are associated with a finite number of different FSMs by defining one FSM per production rule in the EBNF, and then associating each node with the FSM corresponding to the production rule which generated the node. The TFSM definition is thus structured along the EBNF rules.
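As an informal illustration of the execution model just described (one FSM per EBNF production, control moving over pairs of a syntax tree node and an FSM state, each action rule executed in the environment of the current node), consider the following toy sketch in Python. All names are invented for illustration; this is not the Gem-Mex implementation, and the real construction derives transitions from paths drawn as nested boxes, which the sketch hard-codes.

```python
class Node:
    """AST node; its production selects the FSM it is paired with."""
    def __init__(self, prod, children=(), value=None):
        self.prod, self.children, self.value = prod, list(children), value

def plus_action(n):
    # action rule, executed in the environment of the current node
    n.value = n.children[0].value + n.children[1].value

# One FSM per EBNF production: state -> (action, transition).
# ("child", i, entry, cont) descends to child i, remembering the
# continuation state cont; ("up",) returns control to the parent.
FSM = {
    "Const": {"I": (None, ("up",))},                # value set at parse time
    "Plus":  {"I": (None, ("child", 0, "I", "L")),
              "L": (None, ("child", 1, "I", "R")),
              "R": (plus_action, ("up",))},
}

def run(root):
    node, state, stack = root, "I", []
    while True:
        action, trans = FSM[node.prod][state]
        if action:
            action(node)
        if trans[0] == "child":
            _, i, entry, cont = trans
            stack.append((node, cont))
            node, state = node.children[i], entry
        else:                                       # ("up",)
            if not stack:
                return root.value
            node, state = stack.pop()

# syntax tree for 1 + (2 + 3)
tree = Node("Plus", [Node("Const", value=1),
                     Node("Plus", [Node("Const", value=2),
                                   Node("Const", value=3)])])
```

Here `run(tree)` evaluates to 6: control descends into the left subtree, resumes the parent in state "L", descends into the right subtree, and fires the addition action in state "R".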
The algorithms for constructing a global FSM from the TFSM, for the simplification of TFSMs, and for the execution of TFSMs are given in an informal way; then special features for the processing of lists and for the specification of non-local transitions are described. Finally, previous results with Montages are summarized and related work in the areas of formal semantics, abstract state machines, and language description environments is discussed.

Part II: Montages Semantics and System Architecture

In the second part the formal semantics and system architecture of Montages are given. The XASM formalism, being used both for giving the formal semantics and for implementing the system architecture, is introduced (Chapter 4); then the extension of XASM with parameterizable signatures is motivated (Chapter 5), the details of attribute grammars in Montages are given (Chapter 7), and finally, using the previous definitions and examples, the formal semantics of Montages is presented in the form of a meta-interpreter, an XASM program which reads both the specification of a language and a program written in the specified language, and then executes the program according to the language's semantics (Chapter 8). The meta-interpreter can then be partially evaluated to specialized interpreters of the language, and even into compiled code, a process which is sketched in Chapter 5. In this context the parameterization of signatures is used to control the form of the resulting code in order to meet developers' requirements on simplicity and transparency of the generated code.

Chapter 4: eXtensible Abstract State Machines (XASM)

The content of this chapter is a motivation and definition of the imperative fast-prototyping formalism eXtensible Abstract State Machines (XASM). The XASM language has been devised by Anlauff as the implementation basis for Montages.
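The route from meta-interpreter to specialized interpreter mentioned above can be made concrete with a deliberately tiny sketch. The accumulator language, `interp`, and `specialize` are invented for illustration; a real partial evaluator would additionally unfold and simplify the residual program rather than merely fix the static argument.

```python
import operator

def interp(spec, program):
    # toy "meta-interpreter": spec is the (static) language definition,
    # program the (dynamic) input; a one-register accumulator machine
    acc = 0
    for op, arg in program:
        acc = spec[op](acc, arg)
    return acc

SPEC = {"add": operator.add, "mul": operator.mul}

def specialize(spec):
    # specialization with respect to the static spec argument:
    # the residual function interprets exactly one language
    def residual(program):
        return interp(spec, program)
    return residual

run_lang = specialize(SPEC)
```

With this, `run_lang([("add", 2), ("mul", 5)])` evaluates to 10; the language specification has been fixed once, and only the program varies.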
Since XASM has not been formally defined up to now, we contribute here a detailed denotational semantics of XASM. XASM is a generalization of Gurevich's ASMs, a state-based formalism for specifying algorithms. The basic semantic idea of both ASMs and XASM is that each step of a computation is given by a set of state changes. The state itself is given by an algebra. While ASMs propose a fixed update language, the XASM formalism generalizes the idea by allowing the introduction of extension functions whose semantics can be freely calculated by another ASM or by externally implemented functions. In addition, XASM offers a group of features building a purely functional sublanguage: constructor terms, pattern matching, and derived functions. If these features are used together with the imperative features, an interesting mix of the imperative and the functional paradigm is achieved. Another built-in feature of XASM are EBNF grammars. Such a grammar can be decorated with mappings into constructor terms, which are then processed with pattern matching. At the end of the chapter ASM-related work is discussed, and a possible challenge of the so-called "ASM thesis" is drafted.

Chapter 5: Parameterized XASM

The XASM extension Parameterized XASM (PXasm) is the topic of this chapter. We designed the novel concept of PXasm in order to allow for freely parameterizing the signature of XASM declarations and rules. We motivate the necessity of PXasm by showing that it is not possible to generate the kind of syntax trees defined in Chapter 3 with traditional ASMs, since the signature of the trees depends on the symbols in the EBNF. After introducing the new features, we show how the tree generation problem can be solved; we introduce first techniques to navigate in the syntax tree, including examples for specifying abrupt control flow and variable scoping of a simple programming language.
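The idea that each ASM step is a set of state changes, computed against the current state and then applied simultaneously, can be sketched as follows. The dictionary-as-algebra rendering and all names are our simplification for illustration, not XASM syntax.

```python
def step(state, rules):
    # Each rule reads the *current* state and proposes updates
    # (location, value); nothing is written until all rules have fired.
    updates = {}
    for rule in rules:
        for loc, val in rule(state):
            if loc in updates and updates[loc] != val:
                raise ValueError(f"inconsistent update at {loc!r}")
            updates[loc] = val
    new_state = dict(state)
    new_state.update(updates)          # simultaneous application
    return new_state

# Example: swapping x and y needs no temporary variable, since both
# updates are computed against the old state.
def swap(state):
    return [("x", state["y"]), ("y", state["x"])]

after = step({"x": 1, "y": 2}, [swap])
```

After one step, `after` is `{"x": 2, "y": 1}`; the inconsistency check mirrors the ASM requirement that an update set must not assign two different values to one location.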
Then PXasm is used to define a self-interpreter, and the use of this self-interpreter for formalizing the execution of the earlier introduced TFSMs is shown. Since the TFSM interpreter is the nucleus of the complete Montages semantics, this formalization is the basic building block of the formal semantics of Montages given later. Using the example of the TFSM formalization, we show how partial evaluation can be used to implement Montages by specializing its semantics. The given partial evaluator for ASMs is a further example for the use of PXasm.

Chapter 7: Attributed XASM

As mentioned, attribute grammars (AGs) are used in Montages for the specification of static semantics. In this chapter we propose a new variant of AGs, which combines features of object-oriented programming and traditional AGs into a new AG variant featuring reference values, attributes with parameters, and more liberal control flow, e.g. no classification into synthesized and inherited attributes. The new variant is based on a further extension of XASM, called Attributed XASM (AXasm). After motivating the design and giving initial examples for AXasm, we give formal semantics to them in three different ways. First we show that AXasm can be translated easily into derived functions of XASM, then we extend the denotational semantics of XASM to the new features, and finally we give a self-interpreter for AXasm. This self-interpreter will be used in the Montages semantics to evaluate terms and rules. Using AXasm, the complete specification of the object-oriented type system of the Java programming language is given in Appendix D. Although the example is relatively long, it shows that the approach scales to real-world languages. Finally we discuss related work in the field of AGs and specifications of Java.

Chapter 8: Semantics of Montages

Based on the previously proposed extensions of XASM, this chapter gives a formal semantics of Montages.
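The flavor of the AG variant introduced for Chapter 7 above (attributes with parameters, reference-valued results, and dynamic binding in place of a synthesized/inherited classification) can be conveyed by a sketch in Python. The classes and attribute names are invented for illustration and are not AXasm syntax.

```python
class Node:
    def __init__(self, *children):
        self.children = children
        for c in children:
            c.parent = self

class Block(Node):
    def __init__(self, decls, *body):
        super().__init__(*body)
        self.decls = decls
    def lookup(self, name):            # attribute with a parameter
        if name in self.decls:
            return self                # reference-valued attribute result
        return self.parent.lookup(name)

class Program(Block):
    def lookup(self, name):            # dynamic binding: overrides Block
        return self if name in self.decls else None

class Use(Node):
    def __init__(self, name):
        super().__init__()
        self.name = name
    def decl_site(self):
        return self.parent.lookup(self.name)

inner = Block({"y"}, Use("x"), Use("y"))
prog = Program({"x"}, inner)
use_x, use_y = inner.children
```

Here `use_y.decl_site()` is the inner block and `use_x.decl_site()` is the program node: attribute evaluation simply follows references up the tree, with no scheduling into synthesized and inherited passes.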
We briefly discuss the choices for defining the semantics of a meta-formalism like Montages. A parameterized, attributed XASM is then given, which processes, validates, and executes programs and Montages in five steps. First the abstract syntax tree is generated, then the static semantics conditions are checked for each node. If all conditions are fulfilled, the states and transitions of the Montages are used to construct a TFSM which gives the dynamic semantics. Finally the TFSM is simplified, and then executed.

Part III: Programming Language Concepts

In this part we use Montages to specify programming language concepts. We try to isolate each concept in a minimal example language. The executability of each of these languages has been tested carefully using the Gem-Mex tool, and we invite the reader to use the prepared examples and the tool to get familiar with the methodology. The standard Gem-Mex distribution contains the examples and is available at www.xasm.org. The language ExpV1 (Chapter 9, Models of Expressions) is a simple expression language. The remaining example languages are extensions of ExpV1. The first imperative language ImpV1 extends ExpV1 by introducing the concept of statements, blocks of sequential statements, and conditional control flow. The concept of global variables is introduced in example language ImpV2. The purpose of languages ImpV1 and ImpV2 is to introduce features of a simple imperative language. In a series of refinements, the primitive variable model of ImpV2 is then further developed into ImpV3, and finally ObjV1. Language ObjV2 is an extension of ObjV1 with classes and dynamically bound instance fields, and ObjV3 is an extension of ObjV1 with recursive procedure calls. The languages FraV1, FraV2, and FraV3 feature iterative constructs, exception handling, and a refined model of procedure calls, respectively. The presented example languages are an extract from a specification of sequential Java.
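The five processing steps listed above for the Montages meta-interpreter (AST construction, static-semantics check, TFSM construction, simplification, execution) can be summarized as a toy pipeline. The language of '+'-separated integers and every function name below are invented for illustration only.

```python
def build_ast(text):                      # step 1: abstract syntax tree
    return [("Int", tok.strip()) for tok in text.split("+")]

def check_static(ast):                    # step 2: static semantics condition per node
    for _kind, lexeme in ast:
        if not lexeme.isdigit():
            raise ValueError(f"static semantics violated at {lexeme!r}")

def construct_tfsm(ast):                  # step 3: states are (node, FSM-state) pairs
    return [(node, "eval") for node in ast]

def simplify(tfsm):                       # step 4: drop unreachable states (none here)
    return tfsm

def execute(tfsm):                        # step 5: fire the action of each state
    total = 0
    for (_kind, lexeme), state in tfsm:
        if state == "eval":
            total += int(lexeme)
    return total

def process(text):
    ast = build_ast(text)
    check_static(ast)
    return execute(simplify(construct_tfsm(ast)))
```

For example, `process("1 + 2 + 3")` yields 6, while `process("1 + x")` is rejected in step 2 before any dynamic semantics is constructed.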
The Java specification mainly differs from the languages presented here by a complex object-oriented type system, many exceptions and special cases, and a number of syntax problems. We have given the specification of the complete Java type system as an example in Appendix D. Unfortunately the scope of this thesis does not allow the inclusion of a full description of Java, and we refer the reader to the description given by Schulte, Börger, Schmidt, and Stärk. In Appendix C we show how their model can be directly mapped into Montages. Other complete descriptions of object-oriented programming languages which can be mapped into Montages without major modifications are the specification of Oberon by the author and Pierantonio, and the specification of Smalltalk by Mlotkowski.

Part I: Engineering of Computer Languages

2 Requirements for Language Engineering

Information hiding (175) is a root principle motivating most of the mechanisms and patterns in programming and design that provide flexibility and protection from variations (142). One of the most advanced tools for information hiding is a programming language. A general purpose language (GPL) hides the details of how machine code is generated from more abstract descriptions of general algorithms. More hiding can be achieved by a domain-specific language (DSL) (217), which allows one to use a domain's specialized terminology to describe domain problems, and which allows one to hide the general programming techniques used to implement these problems efficiently. The process of designing, implementing, and using a new, specialized computer language is often considered as part of the history of computer science. In contrast, the DSL approach aims at creating a repeatable software engineering process supporting information hiding by means of creating new languages. Most existing techniques supporting this process are too complex to be applied by people outside the software and hardware area.
An important part of the DSL approach is therefore computer language engineering, the discipline of designing and implementing computer languages as tools for the software, hardware, and, most importantly, business engineers. The purpose of this thesis is to propose a simple, integrated approach, especially suitable for business-related problem domains such as finance, commerce, and consulting. In this chapter we analyze the requirements for a language description formalism which can be used to reengineer the designs of existing, well-known GPLs, and to reuse those designs as a basis for engineering new DSLs. The main issues with the design and use of specialized computer languages are analyzed in Section 2.1. A typical application scenario for a DSL is presented in Section 2.2. The design process of DSLs is further analyzed in Section 2.3. In Section 2.4 we discuss the impact of reusing existing language designs, in Section 2.5 we sketch how part of the safety, progress, and security requirements of a system can be guaranteed on the language level, and in Section 2.6 it is shown how language description frameworks can be used to simplify language implementation. Finally, in Section 2.7 the requirements for a language description formalism are formulated.

¹ Data encapsulation, which is often used as a synonym of information hiding, is only one of many mechanisms to support information hiding; other well-known mechanisms are interfaces, polymorphism, indirection, and standards.

2.1 Problem Statement

The DSL approach exploits the design, implementation, and use of a new language which is tailored to the needs of the domain at hand. Restriction to fewer, specialized features is considered an advantage, since it allows one to hide more internal information.
DSLs increase productivity not only through information hiding, but also by providing better scope for software reuse, possibilities for automatic verification, and the ability to support programming by a broader range of domain experts. Often, however, all these advantages are overshadowed by the cost incurred in designing and implementing a new language. Additionally, DSLs involve relatively high maintenance costs, since knowledge about the underlying domain usually grows with experience, and changing requirements lead to frequent revisions of the language. Resolving these issues is critical to making the DSL approach feasible, since otherwise it amounts to shifting the entire complexity of program development into the implementation and maintenance of the DSL. The situation is further aggravated by the fact that many small DSLs have an extremely limited number of potential users, and sometimes a brief life-span, and therefore do not justify much effort from outside the group using the language. Designing a DSL is an important problem in itself and is a topic of research. However, it is not difficult to imagine a scenario where this problem is subsumed by the complexity of its implementation, and even more by its maintenance, which involves specialized skills (in compiler technology, for example) usually not available to the domain experts who use the language. From this situation we identify two main problems which hinder the wide use of the DSL approach, despite a long list of successful examples in the literature. The problems which face every new DSL can be formulated as follows:

1. Users of the DSL are not familiar with the design of the new language.
2. Designers of the new language are often not experienced with techniques for implementing a new language.

The first problem hinders the use of a new language, while the second prevents successful implementation of the new language.
A systematic approach to the solution of these two problems could be provided by a language engineering method which allows for

- a library of major existing language designs,
- the definition of new languages by reusing the design library, and
- the generation of language implementations directly from the language definitions.

Providing a library of existing language designs contributes to the solution of both problems: users can study the descriptions of language designs they already know and thus understand the language description style used, and designers can start with working descriptions and learn the description style by example. Reusing the design of existing languages not only simplifies the job of the designer but also helps the user to quickly understand the new language based on the reused existing one. Finally, if implementations can be generated from the descriptions, the problem of implementing the language can be reduced to the problem of defining the language in the given language definition style. In order to follow this approach, the form of language definitions is of utmost importance. Most existing language definition formalisms are based on declarative techniques and are best suited to define languages with a declarative flavor. Since we understand the use of a DSL as a mechanism for information hiding in complex imperative and object oriented software systems, we need a language description formalism that allows one to map DSL programs directly into imperative, state-based algorithms. Those domain experts who are currently solving domain problems successfully using main-stream imperative and object oriented languages should be able to transfer their programming knowledge and experience directly into the design of a DSL.
The DSL is thus a means to reuse their experience in a way where low-level details about the implementation are hidden from the user and where implementation knowledge is moved into the language definition. Our view on DSLs is in stark contrast to most of the existing DSL literature, which focuses on static, declarative DSLs. The most interesting paper comparing a declarative with an algorithmic DSL for the same application domain is the one by Ladd and Ramming (137). They show how in an industrial context the development of software for telecommunication switches has been moved from C to an algorithmic, imperative DSL, and then further to a declarative DSL. Their case study clearly shows the advantages of the later, declarative solution over the imperative one. One possible objection to their argumentation is that it may have been possible to define a more abstract imperative DSL which would have shared most of the properties of the declarative language. Furthermore, at several places they assume that imperative, algorithmic languages are automatically “general purpose” or “Turing-complete”, and they take for granted that an algorithmic, imperative DSL cannot be used as a starting point to generate different software artifacts or to do analysis. Typically, however, algorithmic, imperative DSLs have reduced expressivity with respect to general-purpose languages, they often have an elaborate declarative static semantics, and besides the intuitive execution there are typically other artifacts one can generate from them. Even for clearly declarative DSLs, an additional dynamic semantics can be useful. If the declarative DSL specifies some sort of computation, it may be useful to add a dynamic execution semantics which is only used as an intuitive example of a possible execution behavior. Such a dynamic semantics would be given just for the purpose of giving the DSL user a state-based intuition.
Another situation where adding imperative or object oriented features to a declarative DSL may make sense is scripting. If scripting is needed, it may be useful to extend a declarative core language with algorithmic features for scripting. Such integrated scripting will certainly lead to simpler semantics than combining the declarative language with some general purpose scripting language. Since there are not many algorithmic DSLs described in the literature, we sketch in the next section a typical application scenario.

2.2 Typical Application Scenario

The most beneficial applications for DSLs exist when two different groups of people must influence the behavior of a system. In such cases there is no clear separation between developers and users of a system. For instance, in the finance industry it is very common that both IT experts and domain experts code part of an application. More complex IT tasks are solved by a high-level team of computer scientists, providing for instance a sophisticated database architecture and methods for manipulating the database in a consistent way. The financial domain experts apply so-called “office tools” like “Excel” or “Access” to “program” their own small applications on top of that infrastructure. This process is called “end user programming” (105). The problem with using “Excel” or “Access” for end user programming is their unrestricted expressiveness. The user is, for instance, not prevented from making domain-specific errors like calculating the sum of revenue and earnings of a company in her/his spreadsheet calculations. A small DSL, allowing only for programming with domain-specific, restricted expressiveness, could make the process less error-prone. We are convinced that a large part of the knowledge built into complex financial application suites could be leveraged into the semantics of a financial DSL.
As a concrete example we look at trading strategies. In today’s financial markets it is more and more common to use systematic trading strategies rather than buying and selling financial products in a non-systematic, intuitive way. Because of their algorithmic nature, trading strategies are good candidates for automation. The presented case study is based on an actual need of brokerage departments in large Swiss banks to automate trading strategies. In Section 2.2.1 we describe why automating trading strategies is important in the brokerage department of a bank, and in Section 2.2.2 we analyze the problems of using traditional GPLs or office tools. In Section 2.2.3 we show why using a DSL for their automation is better than using a traditional programming language, and in Section 2.2.4 we conclude that DSLs are especially appropriate for the financial sector, since requirements change very fast in this industry (53; 215; 109).

2.2.1 Situation

In a large bank, almost all transactions are finally executed by the brokerage department. The traders try to optimize their actions using systematic trading strategies. Three examples are given here.

- The traders must execute large amounts of orders generated by various other departments. Certain techniques can be applied to predict the development of the price of a financial product for the next few minutes, and based on these predictions the brokerage department may optimize its role as a buffer between the orders flowing in from other departments and the real market.
- For certain financial instruments, the bank is a market-maker, constantly offering to buy at a certain price, the bid price, and to sell at a slightly higher price, the ask price. If there are more sellers than buyers, the market-maker lowers the prices until the market balance is reestablished. In the case of more buyers than sellers, the prices are increased. This process is called spread trading.
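The spread-trading rule just described can be sketched in a few lines of code; the tick size, the quote levels, and the function name are invented for illustration.

```python
# One balancing step of a market-maker: both quotes move down when
# sellers dominate and up when buyers dominate (hypothetical rule).
TICK = 0.05

def adjust_quotes(bid, ask, buy_orders, sell_orders):
    """Return new (bid, ask) after one balancing step."""
    if sell_orders > buy_orders:        # more sellers: lower prices
        bid, ask = bid - TICK, ask - TICK
    elif buy_orders > sell_orders:      # more buyers: raise prices
        bid, ask = bid + TICK, ask + TICK
    return round(bid, 2), round(ask, 2)

bid, ask = 99.90, 100.00
bid, ask = adjust_quotes(bid, ask, buy_orders=3, sell_orders=8)
print(bid, ask)   # sellers dominate, so both quotes drop one tick
```

A real market-maker would of course size the adjustment by the order imbalance and keep the spread within regulatory bounds; the point here is only the repetitive, rule-like nature that makes such strategies candidates for automation.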
- It may be possible that a large client wants to execute a systematic, repetitive pattern of trades. This may serve, for instance, to hedge the client’s risks resulting from other, non-liquid investments.

A number of systems support the traders in these activities, but because of the volatile requirements many tasks have to be executed by hand. The factors which constantly change the requirements are regulations coming from outside, internal management decisions, competition, and specific requirements from clients. If in the current situation some repetitive tasks are identified, the brokerage department may specify an application which helps them automate those tasks. The IT department subsequently tries to implement the software according to the specification. In a large bank, the production cycle from the specification to the working software typically takes about three months. In this time, both security and usefulness of the new application are tested, and possible technical problems are identified and solved. After the production cycle, the software can be used by the traders. This process may be too slow for the problem to be solved. Thus in many cases, the brokerage department will prefer to continue executing the tasks without automation. Since the costs for the brokerage work-force are very high, and since even highly trained experts tend to make more errors when doing repetitive tasks, the bank may lose money. Alternatively, the domain experts develop their own application using an office tool like “Excel”. Experience shows that such an ad-hoc solution often creates more problems than it solves (39; 23).

2.2.2 Problem

It is relatively easy to write a program implementing the described trading strategies. The problem is not the coding of the algorithm, but the fact that the production cycle of three months often makes the strategy obsolete by the time it is implemented.
If we analyze what happens during those three months to a piece of trading strategy software, we find a number of necessary activities which cannot be skipped.

- It must be tested whether the software correctly implements the strategy defined by the trader. An informal specification is always a source of misunderstandings. Often some information is lost between the know-how of the trader and the implementation done by IT specialists.
- The software must be checked to always behave in a friendly way, neither using too many system resources nor entering trades which would result in non-controllable situations.
- The risk-monitoring system of the bank must be used in a proper way. If a certain situation leads to an exposure which pulls the trigger of the risk measures, the software must stop executing the trading strategy and a rescue scenario must be triggered.
- The internal regulations determine which authorizations are needed for certain trades. In some situations, the software must thus interact with the traders to get digital signatures for the authorization.

If several trading strategies are implemented, many problems have to be solved repeatedly. Each resulting application has to pass the production cycle. If a general problem in the trading-strategy domain is detected, this problem can only be solved for the currently developed application. Older trading-strategy applications which may have the same error cannot be easily adapted, and often the faulty behavior will show up in several applications. Since the initial requirements are often ambiguous, and since problems with the application are most often fixed on the code level, the applications are often no longer consistent with their documentation at the end of the process. In a competitive environment, there will also be no time to document the application properly.
There is thus a danger that the resulting applications are not well specified and cannot be maintained over a longer time period.

2.2.3 DSL Solution

A possible solution to this problem is to design a DSL for the specification of trading strategies. We call this DSL TradeLan. The elements of TradeLan are actions to enter buy and sell orders into the system, to “hit” orders listed in the system, and to evaluate various indicators (including bid and ask price of the financial instrument to be traded, as well as responses from the risk monitoring tool) as a basis to decide when and how to execute certain actions. Using the DSL approach, it is possible to tailor TradeLan such that

- only well-behaving trading strategies can be specified,
- the risk-monitor system is automatically used in an intelligent way for any specified strategy; strategies which do not implement the risk regulations cannot be defined, and
- authorization checks are executed where necessary; there is no way to turn this feature on or off.

The specific problems of trading strategies are thus solved generically for all strategies written in TradeLan. The TradeLan programmer does not need to think about how to solve these problems; she/he may concentrate on what the trading strategy is intended to do. The implementation of TradeLan adds all other necessary actions. The implementation of this DSL will go through the three-month production cycle. It will probably even take somewhat longer, since a DSL application is more complex than a simple trading strategy application. After the implementation has gone through the production cycle, the traders are faced with a completely new IT situation.

- A trader can now specify her/his trading strategy using TradeLan. For a programmer, writing a TradeLan specification will not look much simpler, but for the trader, a TradeLan program looks like an informal description of his ideas using trader terminology.
- From such a specification the implementation is generated, and the trader can immediately see whether the application does what she/he wants. Most importantly, additional trading strategies no longer have to pass the production cycle. They can be implemented using TradeLan, and TradeLan specifications are just input to the TradeLan implementation, which has already passed the production cycle.
- Another advantage is that the people who defined the trading strategies can maintain them on their own. The TradeLan specifications look like informal specification documents, and they can be managed like other documents. Since they are understandable by the traders, they serve as documentation of the trading knowledge built up in the bank.

These advantages are offset by the typically high costs of designing, implementing, maintaining, and introducing a new DSL if a suitable approach for engineering such languages is not available.

2.2.4 Conclusions and Related Applications

Time to market is the most important factor in the financial industry (63). If a new business opportunity is found, a quick implementation of the corresponding IT solution decides the commercial success. However, the financial risks involved with each transaction imply that software must be deployed carefully (195). The solution described above shows that it is a good idea to generate the applications from explicit descriptions of the business rules, rather than implementing each repetitive problem by hand. The main reasons are the long production cycles and the problem that a lot of domain knowledge is lost at banks, since the traditional applications do not force the user to keep specifications consistent. Knowledge flows into application source code, from where it can only be retrieved with difficulty. For these reasons we expect that DSL techniques will establish themselves faster in the finance industry than in other, more static business domains.
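The central guarantee of the TradeLan scenario, that every strategy action is routed through the risk monitor and cannot bypass it, can be sketched as follows. All class and method names here are invented for illustration; this is not an actual TradeLan design.

```python
# Hypothetical runtime into which generated TradeLan code is woven:
# every action passes the guard, so no strategy can skip the risk check.
class StrategyRuntime:
    def __init__(self, risk_limit):
        self.risk_limit = risk_limit
        self.exposure = 0.0
        self.log = []

    def _guarded(self, action, amount):
        # the risk monitor is consulted before every action
        if self.exposure + amount > self.risk_limit:
            self.log.append(("rescue", action))
            return False              # rescue scenario triggered instead
        self.exposure += amount
        self.log.append((action, amount))
        return True

    def buy(self, amount):
        return self._guarded("buy", amount)

    def sell(self, amount):
        return self._guarded("sell", amount)

rt = StrategyRuntime(risk_limit=100)
rt.buy(60)
rt.buy(60)            # would exceed the limit: rescue path is taken
print(rt.log)
```

Because `buy` and `sell` are the only actions the generated code can invoke, the risk regulation is enforced by construction, which is exactly the kind of property a library or component framework cannot guarantee.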
An application area related to trading strategies is the specification of financial instruments or contracts. The problem of defining contracts is becoming increasingly acute as the number and complexity of instruments grows (118). Probably the first publicly known implementation of a financial product specification has been created by JP Morgan in the context of their Kapital system (179), which was the first environment where DeAddio’s and Kramer’s Bomium architecture (53) for specifying complex financial instruments has been applied. During his research Van Deursen has introduced Rislan (214; 215), a formal and exact language for specifying financial contracts. This language has subsequently been used by CapGemini in their Financial Product System software (218). Later the company LexiFi Technologies has introduced mlFi (109; 62), a similar language which has been initially formulated as a DSL. Such languages not only enable traders to be more precise in constructing deals; a contract definition can also provide the basis for valuing contracts, as well as for automating and managing their processing through the transaction life-cycle. The trading strategy application described in this chapter would become even more interesting if trading strategies could be defined not only over a fixed set of existing financial contracts, but over freely defined types of contracts, using a language such as Rislan or mlFi for the contract specification. The static information of a financial contract specification could then be used as a parameter for the dynamic semantics of a trading strategy language like TradeLan. Another promising area for applying DSLs in finance is the tailoring of research articles to specific market and client situations.
The company A4M (135) has used Montages to develop, for a small financial service provider, a technology where three specially tailored DSLs are used to generate research reports for complex structured financial instruments. The first DSL, called InstruLan, is used to describe the structure and semantics of the analyzed financial instruments; the second one, called IndiLan, is used to define the calculation and naming of financial indicators derived from the available data; and the third language, called FinTex, is used to give text fragments as well as the logic for composing them into full-blown, natural-language financial-analysis texts, which may be personalized for specific clients, interest groups, risk profiles, etc. In contrast to Rislan and especially mlFi, A4M’s InstruLan is not a fixed, full-blown language for specifying all kinds of contracts; instead, InstruLan is a minimal language adapted to the client’s existing set of products and terminology. Experience with using InstruLan for a large international bank in Zürich shows that in practice a family of minimal DSLs for specifying financial products, adapted to the needs of different clients, may serve them better than a one-size-fits-all solution. On the other hand, an industry-proven product specification approach such as Bomium is a perfect basis to explore new types of financial instruments.

2.3 Designing Domain Specific Languages

Early in the history of programming language design, the idea arose that small languages, tailored towards the specific needs of a particular domain, can significantly ease building software systems for that domain (24). If a domain is rich enough and programming tasks within the domain are common enough, a language supporting the primitive concepts of the domain is called for, and such a language may allow a description of a few lines to replace many thousands of lines of code in other languages (94).
A good starting point for designing a domain-specific language (DSL) is a program family (176). This idea is elaborated in the FAST (227)² process for designing and implementing DSLs. Central to FAST is the process of identifying a suitable family of problems and finding abstractions common to all family members. Traditional software development methods would use the knowledge about a family of problems and common abstractions as well, but in a more informal way. In FAST, as in other DSL processes, one tries to use these abstractions to produce implementations of family members in an orthogonal way. Rather than crafting an implementation for each problem at hand, one designs an implementation pattern for each abstraction, in such a way that implementations of single problems can be obtained by composing the patterns. Typically such implementation patterns are therefore developed with a GPL supporting generic programming in some way. Up to this point, FAST is very similar to most reuse methodologies. We visualize the situation as follows. In Figure 1 the problem family contains members m1, m2, and m3. The common abstractions a1, a2, and a3 are depicted as shapes which occur repeatedly in the family members. The implementation patterns i1, i2, and i3 are then developed for each abstraction. In Figure 2 the process to construct an implementation is represented by the triangle. The input to the process is a member of the problem family and the implementation patterns of the abstractions. The output is an implementation solving the problem. In the next step of the FAST process a language is designed for specifying family members. The syntax of the language is based on the terminology already used by the domain experts, and the semantics is developed in tight collaboration with them.
The goals are to bring the domain experts into the production loop, to respond rapidly to changes in the requirements, to separate the concerns of requirement determination from design and coding, and finally to rapidly generate deliverable code and documentation. The design process of the language consists of introducing syntax for denoting the abstractions identified in the first step and of defining the allowed constructions of complete sentences in the new language. This definition should capture the knowledge gained from the implementation patterns and exclude all incorrect combinations of the abstractions. The possibility to define exactly in which ways the syntax and the semantics of the language allow us to combine the basic abstractions is the big advantage over traditional ways of reusing abstractions, such as libraries or component frameworks.

² Family oriented Abstraction Specification and Translation

[Fig. 1: Identification of problem domain and abstractions]
[Fig. 2: Orthogonal process for implementing family members]

In neither of the latter two can the user be forced to use an abstraction in the right way: either the user is allowed to use the function or component, or not. By means of a language, the complete context of applying an abstraction is known, and the use of an abstraction can be allowed for certain contexts only. As a visualization, in Figure 3 we schematize a DSL definition and the relation of its syntactical productions feature 1 ... feature 3 to the corresponding abstractions. The bottom left corner contains a number of DSL programs specifying the problem family members in the bottom right corner.
The arrow from the problems to the abstractions and the one from the DSL definition to the DSL programs depict the engineering process as described up to now: deriving abstractions from the problem domain, defining a DSL for specifying over such abstractions, and using the DSL to specify the problems in the domain.

[Fig. 3: Design of DSL for family member specification]

We would like to note the difference between the GPL program resulting from the orthogonal process in Figure 2 and the DSL program in Figure 3. While both are related to the same problem, the GPL program is directly executable, while the compiler or interpreter of the newly designed DSL still has to be implemented. In fact the implementation costs for a new DSL can be very high if no specialized language implementation method is available. This leads to the last step in the FAST process, the implementation of the DSL. One possibility is to use a meta-formalism to formally define syntax and semantics of the introduced DSL, and to generate the implementation from this definition. Alternatively, traditional compiler or interpreter construction tools can be used. The DSL implementation process is shown in Figure 4. This figure corresponds directly to Figure 2, but the informal description of the family members has been replaced by the formal DSL descriptions, and the implementation patterns have been combined with the specification of the DSL (production rules feature 1, feature 2, feature 3 on the right), resulting in a full specification of the DSL.

[Fig. 4: Implementation of a DSL]
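The claimed advantage over libraries and component frameworks, that a language definition can allow an abstraction only in certain contexts, can be sketched with a toy production table. The grammar, the node encoding, and all names are invented for this illustration.

```python
# A language definition as a table: which abstractions (productions)
# may appear in which context. A library cannot enforce this; a
# grammar checker can reject any program that violates it.
GRAMMAR = {
    "strategy": ["quote", "hedge"],   # abstractions allowed at top level
    "quote":    ["adjust"],           # "adjust" only inside a quote block
    "hedge":    [],
}

def well_formed(node, context="strategy"):
    """node is (kind, children); check each use against its context."""
    kind, children = node
    if kind not in GRAMMAR.get(context, []):
        return False                  # abstraction used in wrong context
    return all(well_formed(c, kind) for c in children)

ok  = ("quote", [("adjust", [])])
bad = ("adjust", [])                  # "adjust" outside any quote block
print(well_formed(ok), well_formed(bad))
```

With a library, a user who may call `adjust` at all may call it anywhere; with the grammar above, `adjust` is rejected everywhere except inside a `quote` construct, which is the context-sensitivity the text describes.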
2.4 Reusing Existing Language Designs

It is often a concern that the broad use of technologies for the introduction of DSLs would lead to a confusing number of different languages. The worst situation would be the coexistence of languages where

- slightly different kinds of syntax and semantics are used for features that are functionally identical, and
- the same syntax is used for features that are completely unrelated.

The situation becomes most confusing when exactly the same task needs to be done in different languages, but the languages solve the task in different ways. For instance, in the Centaur tool-set (35) the two DSLs for specifying pretty-printing and dynamic semantics processing of parse trees provide different syntax for accessing the leaves of the tree, although both DSLs work on the same tree representation in the Centaur engine. Our experience with using the system (124) shows that such a situation has a negative impact on productivity. Instead of designing new languages from scratch, as done in many existing DSL methodologies, we propose reusing designs of existing languages. This approach allows us to engineer the set of languages being used, rather than considering them as unrelated, incompatible entities. Our approach is to start with a library of existing, well-known language designs and to create new languages by applying the following four language-design reuse patterns:

- restriction: Take an existing language and restrict its expressiveness. This can be done by removing features, or by fixing the possible choices for some features in a context dependent way.
- extension: Add a new feature to an existing language by combining existing features under a new name, or by adding a new kind of semantics.³
- composition: The synthesis of a larger language as a combination of small sublanguages.
This pattern allows the designer to describe, test, and teach small subsets of language features, and to combine them later into real-life languages.
• refinement: Change the semantics of an existing construct. This is the most dangerous pattern. Typically it is applied in such a way that the intuitive semantics remains the same for the user, but some details are adapted to a special situation.

If a language is designed based on existing, well-known languages, more users are already familiar with parts of the design, and a language description methodology which supports the synthesis of new languages through the actions of restricting, extending, composing, and refining existing descriptions simplifies the task of the language designer to implement the language. Further, some of the advantages of DSLs, as listed in Table 1, can be combined with the advantages of GPLs with respect to DSLs, as listed in Table 2.

³ Technically, the extension pattern can be considered a special case of the composition pattern. From the language user's point of view, they are very different, since the extension pattern involves only one existing language, while the composition pattern combines at least two different languages.

Tab. 1: Advantages of DSLs

Compactness: Features are focused on the problems to be solved. Fewer concepts have to be learned to master the language. A larger group of people can use the language.
Abstractness: Since the specific application domain is known in advance, abstractions can be found, and many details can be hidden in those abstractions.
Self Documentation: Systematic use of the established terminology of the problem domain results in good self documentation.
Safety: Absence of a feature in a DSL guarantees its absence in all programs written with that DSL.
Progress: Transactions consisting of a number of actions can be encapsulated in the semantics of specific constructs.
Security: Correct authorization of each action can be guaranteed by the language definition.

Tab. 2: Advantages of GPLs

Stability: The language design has proven its consistency and will not change too much over time.
Existing Solutions: Many problems have been solved with the language. Not everything has to be done from scratch, and many examples of how to use the language exist.
Education: Many programmers know how to use the language, and it is easy to find experienced developers.
Available Tools: Typically GPLs are supported by compilers, interpreters, debuggers, and other tools which are integrated in one versatile development environment.

2.5 Safety, Progress, and Security

The systematic introduction of new languages as extensions, restrictions, compositions, or refinements of existing languages can be used to guarantee some of the safety, progress, and security requirements of a system. Following Szyperski and Gough (206), these properties can be defined as follows:

Safety: Nothing bad happens.
Progress: The right things do happen.
Security: Things happen under proper authorization.

Using language design to guarantee some of these properties is a common technique (206). For GPLs, only general properties like strong typing can be embedded. In a relatively narrow domain, many more requirements are known. Restricting, extending, and refining existing languages can be used to guarantee safety, progress, and security on the language level, rather than on the code level. The pattern of using language restriction for safety has already been described (200), but the ideas of using language extension for progress and language refinement for security have not been discussed earlier. As a disclaimer for the following discussion, we would like to note that all of these problems can be, and are, solved with traditional programming means as well.
We try to highlight some advantages of solving these problems on the language level rather than on the implementation level.

The first idea is to achieve safety by reducing the expressivity of the programming language used for the critical components of a system. Reducing expressivity can be done by removing language features, or by fixing the possible choices for some features in a context-dependent way: for instance, one could remove features to interact with external computers from pieces of code that serve for internal calculations only. In this way it is possible to guarantee safety conditions on the language level, allowing source code developers to concentrate on the non-safety-critical details. We call this technique safety through reduced expressiveness. An example is a safer subset of C presented in (64). Although reducing the expressivity of languages is not a general solution to safety problems, a framework in which language features could be turned off individually would allow developers to solve some safety problems. For instance, computer viruses relying on certain language features could be stopped by allowing those features only in parts of the system which are completely write-protected from the network.

Security may be achieved by refining the semantics of an existing language feature such that correct authorization is guaranteed. As an example, consider a situation where a central security server has to be informed before each security-critical call to a given library. This problem can be solved with a standard application programming interface (API) for the library. The problem with the API approach is that changes in the library must be correctly reflected in the API, and each time a new function is added, there is the danger that someone forgets to implement all API rules, such as the above-mentioned rule that a security server has to be informed.
Our approach would be to change the programming language in use such that the central security server is informed automatically whenever the critical library is called. This guarantees that all authorization is done correctly, independent of how the application and the library are developed.

A typical example related to progress is the requirement that after opening a transaction, either all parts of the transaction are executed successfully, leading to a commit of the transaction, or a roll-back is triggered. Our idea is to guarantee this requirement by encapsulating the complete process in one new language construct. Of course such a construct has to be added to a language that has been restricted such that a transaction cannot be started otherwise. Performance problems could be another issue addressed by applying this approach.

Reuse of existing language designs, and the subsequent restriction, extension, composition, and refinement of their definitions, both syntactically and semantically, are basic building blocks for a realistic application scenario for the engineering of computer languages. An example of defining a DSL by first restricting a language to a subset and then extending it with domain-specific features can be found in (20), where a protocol construction language is defined as extensions on top of a subset of C. We illustrated that achieving safety, progress, and security on the language level may be the conceptual motivation for introducing a DSL.

2.6 Splitting Development Cycles

From a high-level viewpoint, every software development cycle can be presented as in Figure 5. A system is specified, a suitable architecture is designed, the software is implemented, tested against the specification, and finally brought into a form suitable for deployment. The platform for such a cycle is typically a GPL with its support tools, visualized in the figure as the innermost box labeled “platform”.
The result of going through the cycle is the creation of an application, which serves as the “platform” for the user to solve her or his daily problems. The user provides positive and negative feedback on the correctness, efficiency, and general usefulness of the application. This feedback, along with additional requirements, triggers a new development cycle, resulting in a new version of the application.

Using a process like FAST, the development of a system is split into two independent development cycles, as shown in Figure 6. In the first development cycle, a DSL is designed and implemented. The “application” resulting from this cycle is the DSL, which is used in the second cycle to specify and implement end-user applications. Users of the application provide feedback for the application developers, and the application developers, who are also DSL users, provide feedback to the DSL developers. This situation allows for an interesting split of maintenance tasks. Fine tuning and solution-space exploration of the problem are done in the application development cycle working with the DSL, while improving performance and porting to other software and hardware architectures are typically done by refining the DSL definition. Similarly, reuse of algorithms happens on the level of DSL programs, while reuse of interfaces to underlying hardware and software architectures happens on the DSL-definition level.

The crucial software development problem in such projects is often the implementation of the DSL. This stems from the fact that in many cases the identified problem family is intricately structured, but each single family member is quite a simple problem. The implementation of such a family member can thus be relatively simple, compared to the costs of implementing the DSL. For a successful application of a DSL, the additional implementation costs for the DSL must be offset by the reduced costs of repeatedly using the DSL to solve problems of the problem family.
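The cost argument above can be made concrete with a small back-of-the-envelope calculation. The function and all numbers below are purely illustrative assumptions of ours, not figures from this thesis:

```python
def break_even_applications(dsl_cost, gpl_cost_per_app, dsl_cost_per_app):
    """Smallest number of family members for which building the DSL pays off.

    dsl_cost: one-time cost of designing and implementing the DSL
    gpl_cost_per_app: cost of implementing one family member directly in a GPL
    dsl_cost_per_app: cost of implementing one family member using the DSL
    """
    saving_per_app = gpl_cost_per_app - dsl_cost_per_app
    if saving_per_app <= 0:
        raise ValueError("the DSL never pays off")
    # first integer n with n * saving_per_app >= dsl_cost
    return -(-dsl_cost // saving_per_app)  # ceiling division

# Illustrative: the DSL costs 120 person-days; each family member
# drops from 10 person-days (GPL) to 2 person-days (DSL).
print(break_even_applications(120, 10, 2))  # 15 applications
```

Methods that lower `dsl_cost`, which is exactly what the following paragraphs are about, directly lower this break-even point.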
Methods that minimize the costs of designing and implementing DSLs considerably increase the number of useful and feasible DSL applications. Recently, much research has dealt with the problem of how to minimize the costs of implementing a DSL (90; 21; 68; 163; 209; 200). The main idea behind most approaches is to define a language definition formalism which can be used to define the DSL, and to generate an implementation from such a definition. Having such a formalism and tool at hand, it is possible to split the development process into three development cycles, as shown in Figure 7. While in the two-cycle model described above a GPL is used in the development cycle of the DSL definition, in the three-cycle model a language definition formalism (LDF) is used for the DSL definition.

Fig. 5: Classic development cycle of applications
Fig. 6: Development cycles of DSL and application
Fig. 7: Development cycles of Language Definition Formalism (LDF), DSL, and applications
[The figures show nested specification, design, implementation, testing, and deployment cycles over the platforms GPL, DSL, and LDF, connected by user and developer feedback.]

The third development cycle, shown on the left side of the graphic, is concerned with the development of the language definition formalism. The “application” generated by this cycle is the language development tool. The second cycle now uses the LDF as the platform for the development of the DSL. Interfaces to existing hardware and software architectures, as well as program generators for parsers and other language technologies like attribute grammars, are provided by the language definition formalism, allowing the DSL designer to concentrate on efficiency, integration, and extensibility issues in the problem domain. We hope that in this way the costs of DSL implementations can be split over many domains. However, in the three-cycle model one has to consider the costs of learning the LDF as well as the costs of defining the DSL with the LDF. The sum of learning an LDF and implementing a DSL for the first domain may be larger than the cost of implementing a DSL from scratch. Once the LDF method is learned, its application to new domains can be done at little cost. Restriction of the LDF to well-known techniques such as EBNF, attribute grammars, and flow charts avoids creating a new problem of understanding language definitions.

2.7 Requirements for a Language Description Formalism

In order to solve the stated problems, a language description formalism and the corresponding language design method should fulfill the following requirements.

• The techniques used for defining languages should be well known. The typical background of a programmer should be sufficient to understand the descriptions. EBNF and flow charts are typically the “specification tools” of a programmer.
• Languages should be described in a “compact” form. This is important since many users deal with large software projects and do not have the additional resources to create and maintain huge language descriptions.
The size of a language specification should grow linearly with the number of production rules in the grammar.
• A language description should be built with small, independent building blocks. Reusing a feature of a language should involve a minimal interface with the other components of the language. A mechanism for the modularization of language specifications is therefore needed.
• A library of specifications of the major programming language concepts should be available. This library should cover both concepts for programming in the small, which can be reused to synthesize a DSL efficiently without reinventing details such as expressions, and concepts for programming in the large, which can be used to extend a DSL with state-of-the-art modularization concepts, such as object orientedness. Most importantly, the modules of the library should have a high level of decoupling.
• Tool support should provide a comfortable development environment for the specified languages. Not only an interpreter or compiler should be generated from the specifications, but also a number of support tools, such as debuggers, program animators, and source analysis tools.

2.8 Related Work

It may be correct to say that the concept of DSLs has not been invented but observed. One of the earliest references to DSLs is Landin (141). The large problem space to which software systems may be applied has caused a proliferation of such specialized languages. There has never been agreement on whether a multitude of different languages should be supported and managed by appropriate tools, or whether one should try to define languages like Ada or C++ which can be used to cover all problems. One solution that combines the advantages of specialized languages and general purpose languages is to provide programming languages which are extensible with domain-specific features.
Research on extensible programming languages, as summarized by Standish (202), has led to insights both into techniques that allow for extensibility and into the problems related to extensibility. Extensibility as a language feature has often created more maintenance problems than it has solved. Altering the semantics of existing languages has been identified as especially harmful. Examples of successful extensible programming languages are CLOS (119), an object-oriented Lisp language, and Galaxy (22), an efficient imperative language. In both cases, the extension features have been used to bootstrap the implementation of the languages.

The general problem of tailoring a programming language to the application domain forms part of language design research (230; 228; 96). With respect to the design of DSLs, the discussions about how to decide on feature inclusion are interesting. Knuth (121) argues that the inclusion and exclusion of features should be based upon observed usage in addition to theoretical principles. This idea has led to research on feature-set usage analysis; a good summary can be found in the text of Weicker (226). The large amount of available material has even led to statistical investigations (196). The use of different DSLs with comparable definitions may lead to new applications of such work.

An interesting paper looking at the use of DSLs for software engineering is the work of Spinellis and Guruprasad (201). The paper investigates typical software engineering problems which can be nicely solved by introducing a DSL, and shows a list of representative examples. The most interesting example deals with the use of about 10 DSLs for the development of a CAD system in civil engineering (199). A software engineering discipline for which DSLs are especially well suited is rapid application development.
Boehm notes that portions of certain application domains are sufficiently bounded and mature that you can simply use a specialized language to define the information processing capability you want (26). He further highlights that individual users with relatively little programming expertise can, in hours or days, generate an application that once took several months to produce.

Looking at DSLs from a broader perspective, they are most naturally considered part of domain engineering (165; 14; 166). The FAST process discussed earlier is an example of a domain engineering process focusing on DSL design. The method is based on previous work on program families (176). FAST has been used by Weiss's group at Lucent, and now at Avaya, for over twenty different projects in software production. Experience reports and a detailed description of the approach can be found in (16; 227; 15; 50). A related approach, developed before FAST, is the Reuse-driven Software Process (or Synthesis) approach by Campbell (37; 36). This approach has been adopted by many companies, such as Rockwell International, Boeing, Lockheed-Martin, and Thomson-CSF.

The programming language C++ has turned out to be a good platform for the development of sophisticated domain-specific frameworks. Very often these frameworks are of a generic nature. Recent work (48; 51) shows how DSLs can be used to make such frameworks accessible to domain experts, and how to combine DSL-based processes like FAST with generic frameworks. Another very promising approach is the Sprint method (210; 47). It follows the view that a DSL is a good parameterization of a domain-specific framework. Having efficient C++ frameworks at hand and using denotational semantics for the language definition, one achieves both efficient implementations and a nice formal semantics. Combining generic frameworks with DSLs is further pursued in the Jts approach (21).
This approach provides a set of tools which allow mainstream languages to be extended with domain-specific constructs. The implementations of existing language designs are directly reused, not generated from a language definition; the DSL technique is used only for the new constructs. This approach is very realistic, since describing existing languages and generating tools for these languages is very hard. Methods based on established compiler construction tools like Cocktail (77) and Eli (76) include full descriptions of existing languages and the generation of a state-of-the-art compiler. Since the construction of an efficient compiler is a complex task, some of this complexity cannot be fully hidden, and the use of such tools is not very easy. In (180) the complexity of Eli is managed by allowing typical language features to be turned on and off, but this approach hides those details which would be needed to access the definitions of the existing languages. In general, all approaches to DSL implementation show that one has to make a trade-off between ease of use and quality of the generated code.

Focusing on the support tools, rather than the actual language compiler or interpreter, the mid- and late eighties saw a proliferation of programming environment generators, some of the best known among them being the Synthesizer Generator (189), Centaur (35), Pan (19), Mentor (61), PSG (18), IPSEN (66), Pecan (188), Mjolner (147), Yggdrasil (38), GIPE (91), and ASDL (123). The current work on DSLs has renewed interest in these frameworks. For example, the ASF+SDF Meta-Environment (120; 213) has been used to successfully implement several DSLs used in industry (214; 216). Other work is concerned with the generation of tools from attribute grammar descriptions of languages (93). The flexibility associated with generating a language implementation from its specification results in a significant improvement in ease of maintenance,
which is important in the DSL context (216). In contrast to previous work on programming environment generators, where the main focus was on the generation of a language-based editing system, current interests are more related to issues like generating efficient compilers, interpreters, and debuggers, and above all, ease of specification. Some of these tools can be generated only if the runtime behavior of a program is contained in the language description. As a result, the specification of dynamic semantics has gained more importance than in the past. While most existing applications in industry focus on small, declarative languages without dynamic semantics (44; 45), the abstract specification of dynamic semantics is an important topic of formal programming language semantics, such as Denotational Semantics (192), Structural Operational Semantics (182), or Natural Semantics (110). Applying programming language semantics tools allows for high-level specification of languages. A discussion of existing approaches to language definition formalisms tailored towards DSLs is presented by Heering and Klint (92). The main problem with applying programming language semantics approaches to DSLs is that they take advantage of a number of mathematical techniques, like rewriting systems, algebraic specifications, or category theory, which are not known to a typical computer-science engineer, let alone to the different kinds of domain engineers. Schmidt calls for a “popular semantics” (191) combining the formality of existing approaches with ease of use. Unfortunately, many practical approaches cannot satisfy Schmidt's requirements for a “popular semantics” since they are not based on a calculus allowing directly for correctness proofs.
Among the classical programming language semantics approaches, Action Semantics (158) has been specially tailored to combine a traditional language semantics style with ease of use. The problem of modularity with respect to language descriptions has been investigated by Mosses and Doh (159; 160; 60). Besides the use of many mathematical concepts, another source of complexity in classical programming language semantics approaches is their common property of considering each parse tree as a syntactic entity. Two equivalent subtrees are represented as the same entity, it is not possible to decorate the parse tree with attributes or intermediate results, and control/data-flow graphs must be encoded with tables or continuations. In newer approaches like (70; 80; 167; 183), each parse tree is formalized as a tree of objects, which can be decorated with attribute values, intermediate results, and direct links to other objects representing the control/data-flow edges. Poetzsch-Heffter defines occurrence algebras (186), which allow the newer approaches to be combined with traditional techniques.

Since one of the main problems with DSLs is language implementation costs, different implementation patterns have been investigated by Spinellis (200). He discusses both the language extension and the language restriction, or specialization, pattern. The importance of language specialization for safety has been recognized clearly by him, but the relation of progress to language extension is not discussed, since the focus of the paper is on language implementation rather than language design. We also propose to add a language refinement pattern for security. The language composition pattern, which we use repeatedly, is not mentioned in (200), since language combination is not possible with most existing language implementation techniques.
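The tree-of-objects representation mentioned above, where parse-tree nodes are objects that carry attribute values and direct control/data-flow links to other nodes, might be sketched as follows. The class and field names are our own illustration, not taken from any of the cited systems:

```python
class Node:
    """A parse-tree node that can be decorated after parsing."""
    def __init__(self, symbol, children=()):
        self.symbol = symbol          # grammar symbol, e.g. "While" or "Expr"
        self.children = list(children)
        self.attributes = {}          # decorations, e.g. a static type
        self.flow_successors = []     # direct control-flow edges to other nodes

# Two textually equal subtrees are distinct objects and can be decorated
# differently, unlike in the classical view of trees as syntactic terms.
cond = Node("Expr")
body = Node("Stmt")
loop = Node("While", [cond, body])

cond.attributes["staticType"] = "BooleanType"
loop.flow_successors.append(cond)   # entering the loop evaluates the condition
cond.flow_successors.append(body)   # true branch: execute the body
body.flow_successors.append(cond)   # then re-evaluate the condition

print(cond.attributes["staticType"])    # BooleanType
print(body.flow_successors[0] is cond)  # True
```

Because flow edges are plain object references, no encoding via tables or continuations is needed; the decorated tree itself is the program representation.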
At this point it is important to note that our composition notion is only informal and based on empirical results from a certain class of applications. An example of a state-based framework providing formal compositionality are Especs (177).

In this text we are not focusing on the problem of how to describe the syntax of a language, but in practical applications of DSL design the definition of the syntax is the first, and thus a most critical, task. Many successful DSL applications show very simple, sometimes line-based syntax styles. Another approach to avoiding syntax problems is to use XML for the representation of programs. Cleaveland discusses different DSL scenarios with XML syntax and explains them carefully (45). An earlier, related approach are Lucent's Jargons (163; 107; 161) and their support tool InfoWiz. InfoWiz is the major language implementation tool used in the FAST approach. Jargons build a family of DSLs with similar syntax on top of a host language called FIT (162). The variable part of a jargon is declared with WizTalk, a meta-language similar to XML. For the reuse of existing GPL designs, including the original syntax, a full-scale parser generator such as Lex/Yacc (143; 104) is needed. Already in 1988 the parser generator TXL (49) was proposed for the definition of dialects of existing languages. In general, the syntax problem is much harder if existing languages are to be reused. According to Jones, at least 500 programming languages and dialects are available in commercial form or in the public domain (106). Lämmel and Verhoef propose a sophisticated methodology to efficiently derive parsers by reusing existing grammars (138; 139). The syntax problem is very hard, and at the same time very well investigated. We therefore refer to the literature and concentrate mostly on semantics.
Our treatment of characteristic and synonym productions allows for the automatic generation of an abstract syntax tree (AST) from the concrete EBNF syntax, as defined by Odersky (167). On the one hand, this choice restricts the application of the current implementation to real-life programming languages with simplified syntax only; on the other hand, it simplified both the implementation of the tool and the specification work with the tool. Had we chosen a full-fledged solution with completely independent treatment of concrete and abstract syntax, as featured by most of the mentioned attribute grammar and formal semantics systems, we would not have been able to design, implement, test, and validate a new programming language prototyping environment from scratch.

One of the most successful language specification techniques, Attribute Grammars (122), is not discussed in detail here, but later in the related work Sections 3.5 and 7.3. At this point we would like to mention only the work of Mernik et al. on reusable and extendable language specifications (153; 154). The authors discuss how to use object-oriented programming features to allow for incremental programming language development. Adding such features to a specification environment is a very useful step, and the usability of many approaches, including the later introduced Montages approach, would benefit from such features.

3 Montages

In Chapter 2 we analyzed specific requirements for a language description formalism. These requirements have been used as design principles for Montages, a meta-formalism for the specification of syntax, static analysis, static semantics, and dynamic semantics of programming languages.

• An introduction to Montages is given in Section 3.1.
• A short description of syntax-related aspects follows in Section 3.2.
• In Section 3.3 it is shown how Montages define dynamic semantics by making the syntax trees directly executable. To formalize executable trees, we introduce the concept of Tree Finite State Machines (TFSMs).
• The details of Montages related to lists and non-local control flow are explained in Section 3.4.
• Finally, in Section 3.5 related approaches are discussed and the results of Montages-related work are reviewed.

3.1 Introduction

New languages are defined passing through a number of stages, from initial design to routine use by programmers, forming the so-called programming language life cycle. During this process, designers need to keep track of decisions already taken, and the design intentions must be conveyed to the implementors, and in turn to the users. Therefore, as for other software artifacts, accurate, consistent, and intellectually manageable descriptions are needed. So far, the most comprehensive description of a programming language is likely its reference manual, which is mainly informal and open to misinterpretation. Formal approaches are therefore sought. Montages is a new proposal for such a formal approach, which can be seen as a combination of EBNF, Attribute Grammars, Finite State Machines, and a simple imperative prototyping language called XASM. All of these techniques except XASM are in some form part of the typical university curriculum of a programmer, and we hope that the resulting descriptions are thus easy to understand for language designers, compiler constructors, and programmers, as well as domain engineers.

One of the main achievements of Montages is a new way to modularize the design of languages. Our library of existing language designs contains small specification modules, each of them capturing a language feature, such as scoping, sub-typing, or recursive method calls.
In its current state, the library contains all features needed to assemble a modern object-oriented language such as Java. Most interestingly, we managed to achieve a high level of decoupling among the modules. For instance, we can treat exception handling independently from method calls or break/continue semantics. The library of language features is shown in Part II of this thesis.

Figure 8 illustrates the relationships between a language specification and language instances, e.g. programs. On the left-hand side the syntax- and semantics-related components of a language specification are shown, and on the right-hand side the corresponding process on language instances is shown.

Fig. 8: Relationship between language specification and instances. (Phase 1: EBNF, program to abstract syntax tree (AST); phase 2: attribution rules, AST to attributed AST; phase 3: static semantics condition, attributed AST to validated AST; phase 4: MVL descriptions (local state machines), validated AST to TFSM with states, transitions, action rules, and conditions; phase 5: XASM transition rules.)

Syntax. The syntax of a programming language is specified by means of EBNF productions. The EBNF productions define a context-free grammar (42) and can be used to generate a parser. In Section 3.2 we specify the exact kind of syntax rules, as well as a canonical construction of compact abstract syntax trees (ASTs). The corresponding phase 1 of Figure 8 refers to the transformation of programs into ASTs.

Static Semantics. The static semantics of programming languages is described by means of attribute grammars (122) and predicate logic. All static information, such as static typing, constant propagation, or scope resolution, can be specified with attribution rules. The resulting attribute values of the AST are used both during dynamic semantics and for the evaluation of the static semantics condition of each construct. In phase 2 the attribution rules are evaluated, transforming the AST into
an attributed AST. The static semantics is given by means of predicates associated with the EBNF productions, the so-called static semantics conditions. Only if the static semantics condition of each node in the AST evaluates to true is the program considered valid; otherwise it is rejected and not considered a valid program of the specified language. In phase 3 the static semantics conditions are checked in order to validate the AST. Since attribute grammars and predicate logic are well-known formalisms, we do not explain them further in this chapter. The exact type of attribute grammars used by Montages is described formally in Section 7, and the formal description of static semantics definitions is deferred to Section 8.3.

Dynamic Semantics. Dynamic semantics defines the execution behavior of a program. Montages gives dynamic semantics by mapping each program of a described language into a finite state machine whose states are decorated with actions, which are fired each time a state is visited. In other words, during execution control flows along transitions whose firing conditions evaluate to true, and at every state visited, the corresponding action rule is executed. Instead of giving a transformation from programs into state machines, we introduce a novel kind of state machine, called Tree Finite State Machines (TFSMs) (phase 4 of Figure 8). TFSMs are derived from an XML-based DSL formalism developed by the author (126). By means of TFSMs we can directly execute an AST, without transforming it into another structure. The execution behavior of the program is then given by executing the TFSM (phase 5 of Figure 8). In short, the TFSM semantics of an AST is defined by giving a local state machine for each EBNF production rule. The local state machines and their embedding into the TFSM are given by means of the Montages Visual Language (MVL).
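As a rough illustration of this execution model, a state machine whose states carry action rules and whose transitions carry firing conditions can be interpreted as sketched below. This is our own minimal sketch, not the actual Gem-Mex machinery, and all names in it are hypothetical:

```python
def run(machine, initial, env, max_steps=1000):
    """Interpret a state machine whose states carry action rules and whose
    outgoing transitions carry firing conditions.

    machine maps state -> (action, [(condition, target), ...]);
    actions and conditions are functions of the mutable environment env.
    Returns the state in which execution stopped.
    """
    state = initial
    for _ in range(max_steps):          # guard: many programs never terminate
        action, transitions = machine[state]
        action(env)                     # visiting a state fires its action rule
        # control flows along a transition whose firing condition holds
        for condition, target in transitions:
            if condition(env):
                state = target
                break
        else:
            return state                # no condition true: execution stops
    return state

# A while-like loop that counts down x and profiles its iterations.
machine = {
    "I":    (lambda e: None, [(lambda e: True, "cond")]),
    "cond": (lambda e: None, [(lambda e: e["x"] > 0, "body"),
                              (lambda e: e["x"] <= 0, "T")]),
    "body": (lambda e: e.update(x=e["x"] - 1, profile=e["profile"] + 1),
             [(lambda e: True, "cond")]),
    "T":    (lambda e: None, []),
}
env = {"x": 3, "profile": 0}
print(run(machine, "I", env))   # T
print(env["profile"])           # 3
```

In a TFSM the states are not a flat dictionary as here, but live on the nodes of the AST itself, so the tree is executed directly.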
MVL allows one to define control flow both inside a local state machine and between machines associated with different productions, both those of the symbols denoting siblings in the AST¹ and those of arbitrary symbols². Entry and exit points of an MVL machine are marked by the special states ”I” (initial) and ”T” (terminal). Execution of a program starts by visiting the ”I”-state of the AST’s root, and stops either by reaching the ”T”-state of the AST’s root or by being terminated by the action rules. Many interesting programs do not terminate at all. The introduction to TFSMs and their specification by means of MVL is given in Section 3.3.

Vertical Structuring

Unlike most other language description formalisms, in Montages the phases are not used to structure the specification horizontally in modules. Instead, for each production rule of the grammar a specification module, called a “Montage”³, is given, containing the EBNF definition, the attributions, the static semantics conditions, and the MVL machine. Each Montage thus describes the semantics of a production rule, and can be considered in some sense a “BNF extension to semantics” (192; 191). A language definition consists of a set of Montages.

¹ This corresponds to so-called “structural” control flow into the sub-components of a language construct.
² This corresponds to more liberal ways of control flow such as goto-constructs.
³ Montage: the process or technique of producing a composite whole by combining several different pictures, pieces of music, or other elements, so that they blend with or into one another.

[Fig. 9: An abstract Montages example with all five parts: an EBNF rule A ::= ... B ... C ..., attribution rules (attr a(p1, ..., pn) == T1), a static semantics condition, an MVL description (a local finite state machine with states s1, s2, s3 and transitions labeled with the firing conditions C1, C2, C3), and an XASM transition rule @s3: R.]

Examples

As an abstract example of a Montage containing all five parts, take Figure 9.
The first part contains an EBNF rule defining the context-free syntax; here a syntactic construct A contains, among other components, B and C. The second part contains the attribution rules; here an attribute a with parameters p1, ..., pn is defined by the term T1. The third part is the static semantics condition, a predicate over the attributed AST. In the fourth part we see a first example of MVL. It is an abstract example, containing references to the B and C components, state s1 of the B-component, state s2 of the C-component, and state s3 of the A-Montage itself, as well as transitions with firing conditions C1, C2, and C3. It is missing the specification of the entry point ”I” and the exit point ”T”. The fifth part is the action rule associated with state s3.

A more intuitive example of a Montage containing ”I” and ”T” states is given in Figure 10. A while statement is specified, differing from a typical while by having a special action rule profile which is used to count how often a program loops. In fact, it is a global counter that counts the iterations of all loops. The example is chosen because the state and action for profile make the example more interesting, but also to show how a well-known language construct can be slightly altered, for instance in order to support program profiling. The syntax of the while-construct is well known from typical imperative programming languages such as Algol (164) or Pascal (231). The syntactic components are an expression and a list of statements. The attribute staticType is used to guarantee that the expression component is of type BooleanType. The well-known intention of the while-construct is to evaluate the expression, and then, if and only if it evaluates to true, to execute the statement list. After the execution of the statement list, the whole process is repeated.
[Fig. 10: The While Montage. EBNF: While ::= ”while” Expr ”do” Stm ”end”. Attribution rules: attr staticType == S-Expr.staticType. Static semantics condition: staticType = BooleanType. MVL description (local state machine): I leads into S-Expr; a transition labeled S-Expr.value leads from S-Expr to profile; a default transition leads from S-Expr to T; profile leads to the LIST box S-Stm, which leads back to S-Expr. XASM transition rule: @profile: LoopCounter := LoopCounter + 1.]

In our special version of the while-statement, a counter LoopCounter is increased each time before the statement list is executed. The local finite state machine specifies exactly this behavior. Control enters the machine at the special, initial ”I”-state, which leads immediately into the expression. We assume that the visit of the expression results in its evaluation, and that the result of the evaluation can be accessed as the attribute value of the expression. After the evaluation of the expression, there are two possibilities: either the expression evaluated to true, and the transition with the firing condition S-Expr.value to the profile-state is chosen, or otherwise the transition to the special state ”T” is chosen. This second special state marks the terminal or final state of the local machine. Transitions like the one going to ”T”, which have no firing condition, are considered to fire in the default case. The default case is defined to happen if no other transition exists whose firing condition evaluates to true. The Montages state machines first try to choose a transition whose firing condition evaluates to true, and only otherwise choose a default transition. If there are several candidate transitions, one is chosen nondeterministically. In our example, there are two transitions leaving the expression, one with a firing condition, going to the profile state, and one with the default condition, going to the ”T”-state. If the transition to profile is chosen, the profile state is visited next. The corresponding action rule increases the value of LoopCounter by one. Afterwards the statement list is visited. List elements are by default visited sequentially.
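The control behavior specified by the While Montage can be sketched as follows, assuming expression evaluation and statement-list execution are supplied as Python callables; apart from LoopCounter, which mirrors the @profile action rule, all names are illustrative.

```python
# Sketch of the While Montage's local state machine of Figure 10:
# I -> Expr; Expr --S-Expr.value--> profile -> Stm list -> Expr;
# default transition from Expr -> T.

LoopCounter = 0  # global profiling counter from the @profile action rule

def run_while(eval_expr, run_stm_list):
    global LoopCounter
    while True:
        if eval_expr():          # visit S-Expr and read its value attribute
            LoopCounter += 1     # @profile: LoopCounter := LoopCounter + 1
            run_stm_list()       # visit the statement list, then loop back
        else:
            return               # default transition to the "T" state

# Usage: a loop over x that is profiled by LoopCounter.
store = {"x": 0}
run_while(lambda: store["x"] < 3,
          lambda: store.update(x=store["x"] + 1))
```

After the call, store["x"] is 3 and LoopCounter records the three iterations.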
After the execution of the last statement in the list, the transition from the list back to the expression is chosen, and the expression is reevaluated.

In a program a language construct is typically used several times. For instance, in the program shown in Figure 11 we see four instances of while, which are numbered. Instances two and four are part of the statement list of the first instance, and instance three is part of the statement list of the second instance. This nesting is depicted as nested boxes.

[Fig. 11: Example program with four numbered, nested while-instances:
  x = 0;
  while_1 x < 100 do
    ...
    y = 0;
    while_2 y < x do
      ...
      plot(x,y);
      z = 0;
      while_3 z < x*y do
        ...
        draw(x,y);
        z = z+1
      end
      y = y+1;
    end
    y = x;
    while_4 y > 0 do
      ...
      plotR(x,y);
      y = y-1
    end
    x = x+1;
  end
  fin()]

An alternative, more traditional representation of the program’s structure is the syntax tree shown in Figure 12. In order to keep the representation compact, we represent lists as dotted boxes, and show only the parent-child relations from while-instances to their expression and statement siblings. The selectors S-Expr and S-Stm are used to label these relations.

[Fig. 12: Parse tree of the program of Figure 11; each while-instance is linked to its expression and statement-list siblings via S-Expr and S-Stm.]

While the transitions in the While-Montage form an intuitive circle representing loop behavior, it is less trivial to understand how this loop is applied to a complete program. Therefore we show how each transition in the Montage is instantiated in the syntax tree. The first transition in the While-Montage goes from the ”I”-state to the expression. In the program it connects the last statement before a while-loop with the expression component of that while-loop.
In Figure 13 the corresponding transitions are shown for all four instances of while, numbered accordingly. Correspondingly, the transitions from the expression components to the ”T”-states connect the expression of each while-statement with the first statement following the loop, as depicted in Figure 14. The ”I” and ”T” states are thus used to plug the state machine of each while-loop into the state machine of the program. Inside a while-statement, a transition with firing condition src.value goes from the expression to the profile state, and a default transition links the profile-state to the statement list. For each instance of while, the profile-state and the connecting transitions are drawn in Figure 15. Finally, in Figure 16 the transition from the statement list back to the expression is visualized. The complete transition graph is shown in Figure 17.

The presented state machine is executed starting with the first statement in the topmost list, following lists sequentially if there are no explicit transitions, and otherwise following the given transitions. In this way the program has been transformed into a state machine structure over the parse tree which is directly executable. Starting with the first statement, the variable x is set to 0. Then the transition leads us to the evaluation of x < 100. From this program fragment, two possible transitions can be chosen: one, assuming that the value of the expression evaluates to true, leads to the first profile-state; the second leads back to the topmost list of statements. Since 0 < 100, the first transition to profile is chosen, and the counter LoopCounter is increased by one.

[Fig. 13: Parse tree with I-arrows: for each while-instance, the transition from the preceding statement into its expression component.]
Then the list of statements within the first while-instance is visited. After the update of y to 0, a transition leads us to the expression component of the second while-instance. In this way, the complete program can be executed.

The main part of this chapter contains a more detailed overview of how Montages specify the execution behavior of programs by making the parse tree an executable state machine. In Section 3.3 we give an intuitive definition of the execution-behavior-related aspects of Montages. It is shown how the MVL descriptions given for each language construct and the nodes of the AST together define the state space and transitions of a special kind of state machine, called Tree Finite State Machines (TFSMs). In these machines, the states are pairs of MVL-states and AST-nodes. Each MVL-transition specifies TFSM-transitions for each AST-node associated with the Montage it is contained in. The definition of dynamic semantics by means of TFSMs is given in Section 3.3. In Section 3.4 the TFSM model is used to give the definitions of list processing and to explain how non-local transitions are defined in Montages. In order to make these descriptions more precise than the previous while-example, we start with a closer look at syntax definitions and the construction of the AST.

[Fig. 14: Parse tree with T-arrows: for each while-instance, the transition from its expression component to the first statement following the loop.]
[Fig. 15: Parse tree with profile states: for each while-instance, the src.val transition from the expression to the profile state, and the default transition from profile to the statement list.]
[Fig. 16: Parse tree with the back arrows from each statement list to the corresponding expression.]
[Fig. 17: Parse tree with all arrows: the complete transition graph of the program.]

3.2 From Syntax to Abstract Syntax Trees (ASTs)

In this section, the transformation of a program into an AST is described. This also forms the basis for classifying the nodes with characteristic and synonym universes and for navigating through the AST using selector functions.

3.2.1 EBNF rules

The syntax of the specified language is given by the collection of all the EBNF rules defined in the different Montages. Following the approach of Uhl (212), we assume that the rules are given in one of the two following forms:

  A ::= B1 B2 ... Bn
  A = C1 | C2 | ... | Cn

The first form declares that A contains the components B1, B2, ..., Bn in that order, whereas the second form defines that A has exactly one of the alternative components C1, C2, ..., Cn. Rules of the first form are called characteristic productions⁴ and rules of the second form are called synonym productions. It is then possible to guarantee that each non-terminal symbol appears as the left-hand side of exactly one rule. Non-terminal symbols appearing on the left of the first form of rules are called characteristic symbols, and those appearing on the left of synonym productions are called synonym symbols.
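The distinction between the two rule forms can be sketched mechanically; the following assumes rules are given as plain strings, and the function name is illustrative.

```python
# Classify an EBNF rule as characteristic ("::=") or synonym ("=").
def classify(rule):
    lhs, sep, rhs = rule.partition("::=")
    if sep:  # characteristic production: ordered right-hand-side components
        return ("characteristic", lhs.strip(), rhs.split())
    lhs, sep, rhs = rule.partition("=")
    if sep:  # synonym production: a list of alternatives
        return ("synonym", lhs.strip(), [a.strip() for a in rhs.split("|")])
    raise ValueError("not an EBNF rule: " + rule)
```

Note that "::=" must be tested first, since every characteristic rule also contains "=".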
EBNF also features lists and options, which may be used in right-hand sides of productions and are introduced in Section 3.4.

3.2.2 Abstract syntax trees

The treatment of characteristic and synonym productions described above allows an automatic generation of an abstract syntax tree (AST) from the concrete EBNF syntax, as defined by Odersky (167). The resulting ASTs are relatively compact. The idea for making the tree compact is to create nodes only for parsed characteristic symbols, and to represent synonym symbols by adding additional labels. Each node is thus labeled by exactly one characteristic symbol and zero or more synonym symbols. Labeling of nodes is done by declaring a set, or universe, for each symbol. Adding a label U to a node n is done by putting n into the universe U. As a consequence, the characteristic universes partition the universe of AST nodes. For each characteristic universe U a Montage is given, specifying syntax and semantics of U’s elements. Given a node, the associated Montage is referred to as ”its Montage”, and given a Montage, the elements of the corresponding characteristic universe are called the ”instances of the Montage”.

⁴ In the original publications (212; 167) the name of ”characteristic production” is ”generator production”, since only these productions generate a new node in the AST. We have chosen the name characteristic production because they can be used to characterize the nodes as described above.

[Fig. 18: Two instances a0 and a1 of universe A in an AST, and the definitions of the selectors S-B, S-C, S1-D, and S2-D.]

The so-called selector functions can be used to navigate through the AST. Selector functions are defined as follows.
Each node n in the AST has been generated by some characteristic rule

  A ::= Z1 Z2 ... Zk

For each symbol Z appearing only once on the right-hand side of the rule, the selector function

  S-Z: A → Z

maps n to its unique Z-sibling. For each symbol Z appearing more than once, the selector functions

  S1-Z, S2-Z, ..., Sm-Z: A → Z

map n to its first, second, ..., m-th Z-sibling. Given, for instance, the rule A ::= B C D D, Figure 18 visualizes the situation for two instances a0 and a1. In order to allow traversing a tree in arbitrary ways, we define in addition the function Parent, which links each node with its parent node in the tree.

Example

As a running example we give a small language of arithmetic expressions. For the moment, we can abstract from the meaning of programs and consider them as examples for the construction of ASTs. The start symbol of the grammar is Expr, and the production rules are

Gram. 1:
  Expr     = Sum | Factor
  Sum      ::= Factor ”+” Expr
  Factor   = Variable | Constant
  Variable ::= Ident
  Constant ::= Digits

The following term is a program of this language:

  2 + x + 1

As a result of the generation of the AST we obtain the structure represented in Figure 19. The labels indicate to which universes a node belongs, and the definitions of the selector functions are visualized as edges. The leaf nodes contain the definition of the attribute Name, which in turn contains the microsyntax of the parsed Digits- and Ident-values. The function Parent is visualized by the edges going from the leaves towards the root of the tree.

[Fig. 19: The abstract syntax tree for 2 + x + 1. Node 1 {Expr, Sum} has S-Factor node 2 {Factor, Constant} (with S-Digits node 4, Name = 2) and S-Expr node 3 {Expr, Sum}; node 3 has S-Factor node 5 {Factor, Variable} (with S-Ident node 7, Name = ”x”) and S-Expr node 6 {Factor, Constant} (with S-Digits node 8, Name = 1). Parent edges point from the leaves towards the root.]
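The AST of Figure 19 can be built by hand to make universes, selectors, and Parent concrete. The sketch below uses dictionaries as nodes: "labels" holds the universes a node belongs to, selector functions are fields named S-..., and Parent links each node to its parent; the helper `node` and the node numbering (following Figure 19) are illustrative.

```python
def node(labels, **fields):
    return {"labels": set(labels), "Parent": None, **fields}

n4 = node({"Digits"}, Name="2")
n2 = node({"Factor", "Constant"}, **{"S-Digits": n4})
n7 = node({"Ident"}, Name="x")
n5 = node({"Factor", "Variable"}, **{"S-Ident": n7})
n8 = node({"Digits"}, Name="1")
n6 = node({"Factor", "Constant"}, **{"S-Digits": n8})
n3 = node({"Expr", "Sum"}, **{"S-Factor": n5, "S-Expr": n6})
n1 = node({"Expr", "Sum"}, **{"S-Factor": n2, "S-Expr": n3})

# Derive the Parent function from the selector edges.
for parent in (n1, n2, n3, n5, n6):
    for key, child in parent.items():
        if key.startswith("S-"):
            child["Parent"] = parent
```

Navigating with selectors then reads, for example, n1["S-Expr"]["S-Factor"]["S-Ident"]["Name"], which yields "x".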
3.3 Dynamic Semantics with Tree Finite State Machines (TFSMs)

In Montages, dynamic semantics is given by Tree Finite State Machines (TFSMs), a special kind of state machine which we devised to allow ASTs to be executed without transforming them. The states of a TFSM are tuples consisting of an AST-node and a state of the local state machine given for each node by means of its Montage. The execution of programs can be understood and visualized by highlighting the current node CNode in the AST and the current state CState in the corresponding Montage. If the state (CNode, CState) is visited, the action rule associated with CState is executed, using attributes and fields of CNode to store and retrieve intermediate results.

Notational Conventions

As mentioned, a language definition consists of a set of Montages, which defines a mapping from EBNF productions to local state machines, and indirectly from AST nodes to local state machines. Given these mappings, the states of a TFSM are tuples consisting of an AST-node and a state of its associated local state machine. Throughout this text we say that a TFSM is “in state S of node N”, rather than the more precise formulation “in the state being the tuple formed by state S and node N”. Further, we use the notion “state of a node’s Montage”, rather than the more precise but lengthy formulation “state of the local state machine associated with a node via the Montage associated with the EBNF production which created the node”. The local state machines and their embedding into the TFSM are given by means of the Montages Visual Language (MVL). In the descriptions we will use the terms “local (finite) state machine” and “MVL-machine” to denote the machines associated with AST nodes, and we will use the terms “(finite) state machine” and “TFSM” for the global machine representing the dynamic semantics of an AST.
TFSM transitions

Transitions in TFSMs change both the current node CNode and the current state CState. A TFSM-transition t is defined to have five components: the source node sn, the source state ss, the condition c, the target node tn, and the target state ts:

  t = (sn, ss, c, tn, ts)

In the condition expression c, the source node sn can be referred to as the bound variable src, and the target node tn as the bound variable trg. Typically, conditions depend on attributes of the source and/or target node. The source state and target state cannot be referred to in the condition. A transition can be activated if its source node sn is equal to the current node CNode, its source state ss is equal to the current state CState, and its condition evaluates to true. If a transition is activated, in the next state the current node CNode equals the target node tn and the current state CState equals the target state ts.

Montages Visual Language (MVL)

The state machine of a Montage is given in the Montages Visual Language (MVL). Transitions in MVL are specifications for one or many TFSM-transitions. MVL defines how the MVL-transitions of the Montages are instantiated with TFSM-transitions. In Section 3.3.2 we give the corresponding definitions in the form of the algorithm InstantiateTransition. Later, in Section 3.3.3, this algorithm is used to construct a TFSM; in Section 3.3.4 the simplification of TFSMs is discussed; and finally in Section 3.3.5 their execution is described. More advanced features, allowing the specification of families of transitions by means of references to lists and sets of nodes, are introduced later in Section 3.4.

Isomorphism between “flat” view and TFSM view

In the following examples, as already in the while-example (Figures 11, 12, 13, 14, 15, 16, and 17), the MVL-machines are drawn repeatedly for each AST-node, and therefore the states in these figures correspond directly to TFSM-states.
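The five-tuple form of a TFSM-transition and its activation condition can be sketched directly; the condition c is modeled here as a function of the bound variables src and trg, and all names are illustrative.

```python
from collections import namedtuple

# A TFSM-transition t = (sn, ss, c, tn, ts).
Transition = namedtuple("Transition", ["sn", "ss", "cond", "tn", "ts"])

def activated(t, CNode, CState):
    # Activated iff sn = CNode, ss = CState, and the condition holds
    # (with src bound to sn and trg bound to tn).
    return t.sn == CNode and t.ss == CState and t.cond(t.sn, t.tn)

def step(t, CNode, CState):
    """If t is activated, the next (CNode, CState) is (t.tn, t.ts)."""
    if activated(t, CNode, CState):
        return t.tn, t.ts
    return CNode, CState
```

For instance, a transition from ("n1", "I") into the lookup state of "n2" fires only when the machine is currently in state "I" of node "n1".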
This visualization is called the “flat” view on TFSMs, and it is mathematically isomorphic to the TFSM model. In Figure 20 the isomorphism between the “flat” view and the TFSM view is illustrated. On both sides of the figure, the same AST with three nodes is shown: a parent node and two siblings. We assume that both siblings are produced by the same EBNF rule, and consequently they are associated with the same MVL-machine. In the given example, this machine consists of exactly one MVL-state, labeled a, and a transition sourcing in a. The target of the transition is not specified in the current context. On the left-hand side the “flat” intuition is shown, where the MVL-machine is instantiated for each corresponding AST node. As a consequence, there are two instances of the same state a, and the transitions sourcing in a depart from these instances. On the right-hand side, the corresponding TFSM view is shown. The MVL-machine exists only once and is not instantiated. The states of the TFSM are not the states of the MVL-machine, but tuples consisting of an AST node and an MVL-state of the corresponding machine. In our figure there are two such tuples, visualized as dotted double-headed arrows. The MVL-transition sourcing in the MVL-state a now corresponds to the two TFSM-transitions sourcing in the TFSM tuple-states.

[Fig. 20: Isomorphism between the “flat” view and the TFSM view.]

3.3.1 Example Language

Throughout this section we use the previously introduced examples A and While, and the Montages presented here for the example language whose grammar has been introduced in Section 3.2. We now show informally how the MVL state machines of the Montages, together with the AST, can be used to execute a program by interpreting it as a TFSM. The same example will be used in the following sections to illustrate the formal TFSM definitions.
The programs of the example language are arithmetic expressions which may have side effects and are specified to be evaluated from left to right. The atomic factors are constants and variables of type integer. The Montage for Sum is shown in Figure 21. The topmost part of this Montage is the production rule defining the context-free syntax, consisting of a Factor and an Expr right-hand-side symbol. The second part defines the states and transitions of this construct by means of an MVL description. All transitions are labeled with the empty firing condition. Control enters the state machine at the ”I”-state, visits the state machine corresponding to the Factor-sibling, then the state machine corresponding to the Expr-sibling, and finally the ”add”-state is visited, resulting in the execution of its action rule. The XASM action rule, which is given in the third part, accesses the value-attributes of the siblings of a Sum-instance and assigns their sum to the value-attribute of the Sum-instance. Finally, the ”T”-state is visited, being the final state of the Sum state machine. The Montages Variable and Constant are shown in Figure 22. Both of them contain exactly one state: the Variable-Montage’s state triggers a rule reading the value of the referenced variable from the CurrentStore, and the Constant-Montage’s state triggers a rule reading the constant value. Both actions set the value-attribute to the corresponding result. In Figure 23 we represent the MVL sections of these Montages as they are associated with the corresponding nodes of the AST we already showed in Figure 19. When visiting a state s in Figure 23, the current state CState is state s in the corresponding Montage, and the current node CNode is the node associated by the dotted line.
Based on this “flat” representation, the boxes in the state machines can be replaced with the state machine corresponding to the sibling referenced by the box label. The S-Expr box of the state machine associated with node 1 in Figure 23 is, for instance, replaced by the state machine associated with node 3, being the S-Expr sibling of node 1. In Figure 24 the resulting hierarchical state machine is represented. The AST-nodes associated with the states here directly surround the states. In Figure 24 the hierarchy of the AST is visualized as nested boxes, labeled by the selector functions. This visualization corresponds to an MVL description of the complete program.

[Fig. 21: The Montage for Sum. EBNF: Sum ::= Factor ”+” Expr. MVL description (local state machine): I leads to the box S-Factor, then to the box S-Expr, then to the state add, then to T. XASM transition rule: @add: value := S-Factor.value + S-Expr.value.]

[Fig. 22: The Variable and Constant Montages. Variable ::= Ident, with MVL machine I, lookup, T and rule @lookup: value := CurrentStore(S-Ident.Name). Constant ::= Digits, with MVL machine I, setValue, T and rule @setValue: value := S-Digits.Name.]

[Fig. 23: The local finite state machines belonging to the nodes of the AST of Figure 19.]

[Fig. 24: The constructed hierarchical finite state machine.]

We can even go one step further, transforming the hierarchical state machine into a flat one. Since we know that the execution entry and exit points for each language construct are marked by the special states ”I” and ”T”, we replace each transition whose target is a box representing an AST node n by a transition whose target is (n, ”I”), and correspondingly we replace each transition whose source is a box representing an AST node n by a transition whose source is (n, ”T”).
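This flattening step can be sketched as a small rewrite on transitions. Boxes are encoded here as ("box", n) and TFSM states as (n, state); the encoding and function name are illustrative.

```python
# Flatten one transition: a transition targeting a box that represents
# AST node n becomes one targeting (n, "I"); a transition sourcing in
# such a box becomes one sourcing in (n, "T").
def flatten(transition):
    src, cond, tgt = transition
    if src[0] == "box":
        src = (src[1], "T")   # leave the referenced node via its exit point
    if tgt[0] == "box":
        tgt = (tgt[1], "I")   # enter the referenced node at its entry point
    return (src, cond, tgt)
```

Applying flatten to every transition of the hierarchical machine yields the flat machine over (node, state) pairs.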
The resulting visualization is given in Figure 25. Each oval, I, and T directly represents a state in the TFSM, whose node component is given by the dotted arrow into the AST and whose state component is given by the label. Since the ”I” and ”T” states are not associated with action rules, and since all transitions are labeled with the empty condition, the state machine of Figure 25 can be simplified into the one shown in Figure 26.

[Fig. 25: The flat finite state machine and its relation to the AST.]
[Fig. 26: The simplified finite state machine and its relation to the AST.]

At this point, we can understand the dynamic semantics of the program by executing the state machine. First, the initial state of the root node is visited. Then the following steps are repeated.

1. The action rule associated with the visited state is executed.
2. A control arrow whose firing condition evaluates to true is chosen, and the state it points to is visited next. If there is more than one possible next state, one of them is chosen nondeterministically. If there is no arrow with a predicate evaluating to true, an arrow with the default condition is chosen. If there is no arrow with the default condition either, the same state is visited again.
3. Go to step 1.

Coming back to our example, and assuming that CurrentStore maps x to 4, the execution of the state machine in Figure 26 sets the value of node two to the constant 2, sets the value of node five to 4, sets the value of node six to 1, sets the value of node three to the sum of 4 and 1, and finally sets the value of node one to the sum of 2 and 5.
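This execution can be traced with a hand-coded sketch of the simplified machine of Figure 26 for "2 + x + 1", with CurrentStore mapping x to 4. Each state carries its action rule directly, and since all remaining transitions are default transitions, step 2 just follows the single outgoing arrow; the state names and encoding are illustrative.

```python
CurrentStore = {"x": 4}
value = {}  # the value attribute, per node number of Figure 19

machine = [  # (state, action rule, next state)
    ("2.setValue", lambda: value.update(n2=2),                         "5.lookup"),
    ("5.lookup",   lambda: value.update(n5=CurrentStore["x"]),         "6.setValue"),
    ("6.setValue", lambda: value.update(n6=1),                         "3.add"),
    ("3.add",      lambda: value.update(n3=value["n5"] + value["n6"]), "1.add"),
    ("1.add",      lambda: value.update(n1=value["n2"] + value["n3"]), None),
]

def execute(machine, start):
    table = {state: (action, nxt) for state, action, nxt in machine}
    state = start
    while state is not None:
        action, nxt = table[state]
        action()      # step 1: execute the action rule of the visited state
        state = nxt   # step 2: follow the (default) control arrow
    return value

execute(machine, "2.setValue")
```

Running the machine reproduces the evaluation described above: node three receives 4 + 1 = 5, and node one receives 2 + 5 = 7.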
3.3.2 Transition Specifications and Paths

Montages define a TFSM for each program of the specified language by giving the context-free grammar and a local state machine for each characteristic symbol in the grammar. The local state machine, given by means of MVL, consists of a set of states associated with action rules, and a set of MVL-transitions. As mentioned, the states of the TFSM range over the Cartesian product of AST-nodes and MVL-states, and transitions have five components: the source, consisting of a source AST-node and a source MVL-state; the condition; and the target, consisting of a target node and a target state. The MVL-transitions are considered to be transition specifications which are instantiated as TFSM-transitions. In this refined view, an MVL-transition specification has three components: the source path, the condition, and the target path. The MVL visualization of a transition specification is an arrow from the visualization of the source path to the visualization of the target path. The condition of the transition specification is used as the label of the arrow. The MVL elements for visualizing paths are boxes and ovals. A state of the MVL-machine is a special case of a path. With respect to an instance n of the Montage containing the MVL-elements, their semantics can be described as follows:

- The oval nodes are the states. Each state is labeled with an attribute, which serves to identify the state, for example if it is the target of a state transition or if it is associated with an action rule. If a state is visited, the associated action rule is executed, such that intermediate results are saved and retrieved as attributes of n and its siblings.

- There are two special kinds of states denoting the entry and exit points of the MVL state machine. The initial state, represented by the letter ”I”, denotes the first state visited if the machine is entered. The terminal state ”T” denotes the last state visited.
- The rectangular nodes, or boxes, represent siblings of n. They are labeled with the corresponding selector function. Boxes may contain other boxes and ovals. Boxes contained in other boxes represent siblings of siblings. Ovals in boxes represent the corresponding state of the node represented by the surrounding box. Later, in Section 3.4, we will introduce special boxes referencing all elements in a list of siblings, as well as boxes referencing all elements of characteristic and synonym universes.

A path can be represented visually by means of nested boxes and ovals, as described above, or textually. The textual representation of a path is a term which is recursively built up by the following operators siblingPath and statePath.

- siblingPath(Ident, Int, Path)
  The arguments of a siblingPath are Ident, the symbol of the sibling, Int, its occurrence, and Path, the relative path from the denoted sibling to the target of the full path. The relative path is never empty, since the target of a full path needs to denote a state. The occurrence undef is used for symbols that are unique in the right-hand side of a grammar rule. The paths

    siblingPath(”A”, undef, N)
    siblingPath(”B”, 2, N)
    siblingPath(”C”, undef, siblingPath(”D”, undef, N))

  are visualized as a box S-A containing N, a box S2-B containing N, and a box S-C containing a box S-D which in turn contains N, respectively. The box N stands for an arbitrary relative path.

- statePath(Ident)
  The argument of a statePath is the name Ident of the state. The paths

    statePath(”e”), statePath(”I”), statePath(”T”),
    siblingPath(”A”, undef, statePath(”f”)),
    siblingPath(”B”, 2, statePath(”g”)),
    siblingPath(”C”, undef, siblingPath(”D”, undef, statePath(”h”)))

  are visualized as follows.
A special short-hand notation is allowed in the visual notation. If the source of a transition is not a state, but a box referencing a node, the transition is assumed to source in the "T"-state of the corresponding node. Correspondingly, if the target of a transition is a box, the transition is assumed to target the "I"-state of the referenced node. The short-hand notation is allowed, since the "I"-state is considered as a collector of all transitions incoming to a node, and the "T"-state is considered as a starting point of all transitions leaving a node. According to the given definitions, we can now represent the MVL-transitions in the abstract A-Montage (Figure 9) as the following triples.

Term 1:
(siblingPath("B", undef, statePath("s1")), C1, siblingPath("C", undef, statePath("s2")))
(siblingPath("C", undef, statePath("T")), C2, statePath("s3"))
(statePath("s3"), C3, siblingPath("B", undef, statePath("I")))

The source of the C2 transition, being a box, has been completed in the textual representation with state "T", whereas the target of the C3 transition has been completed with state "I". Another example is given by the following textual representations of the transitions in the While Montage (Figure 10).

Term 2:
(statePath("I"), default, siblingPath("Expr", undef, statePath("I")))
(siblingPath("Expr", undef, statePath("T")), src.value, statePath("profile"))
(siblingPath("Expr", undef, statePath("T")), default, statePath("T"))
(statePath("profile"), default, siblingPath("Stm", undef, statePath("LIST")))
(siblingPath("Stm", undef, statePath("LIST")), default, siblingPath("Expr", undef, statePath("I")))

Please note that the special treatment of lists, together with the state "LIST", will be discussed later in Section 3.4.

3.3.3 Construction of the TFSM

The construction of a TFSM for a given AST is done by instantiating, for each instance of a Montage, all transition specifications given in its MVL state machine. The instantiation of the MVL-transition specifications with TFSM transitions is done by the algorithm InstantiateTransition. Given a node n of the AST, and a transition specification t = (SourcePath, Condition, TargetPath) of the corresponding Montage, t is instantiated as a TFSM transition t' which is constructed as follows. The four global variables SourceNode0, SourcePath0, TargetNode0, and TargetPath0 are initialized such that SourceNode0 and TargetNode0 equal node n, SourcePath0 is initialized with the SourcePath parameter of t, and TargetPath0 is initialized with the TargetPath parameter of t.

    SourceNode0 <- n
    SourcePath0 <- SourcePath
    TargetNode0 <- n
    TargetPath0 <- TargetPath

At each step, InstantiateTransition checks whether SourcePath0 (or TargetPath0) matches a term like siblingPath(Symbol, Occ, Path0). If so, the corresponding selector function for Symbol is applied to SourceNode0 (respectively TargetNode0), resulting in a node n'; the corresponding global variable SourceNode0 (respectively TargetNode0) is updated with the new node n' and the global variable SourcePath0 (respectively TargetPath0) is updated with Path0. In the following pseudo-code "=~" is used to denote "matches a term like", corresponding to pattern matching in functional languages. The pattern variables are marked with a &-sign.

if SourcePath0 =~ siblingPath(&Symbol, &Occ, &Path0) then
    let n' = (selector function (&Symbol, &Occ) applied to SourceNode0) in
        SourceNode0 := n'
        SourcePath0 := &Path0
if TargetPath0 =~ siblingPath(&Symbol, &Occ, &Path0) then
    let n' = (selector function (&Symbol, &Occ) applied to TargetNode0) in
        TargetNode0 := n'
        TargetPath0 := &Path0

After a number of steps, SourcePath0 matches a term like statePath(&srcS) and TargetPath0 matches a term like statePath(&trgS). At this point InstantiateTransition generates the TFSM transition t' defined as follows.

    t' = (SourceNode0, &srcS, Condition, TargetNode0, &trgS)

Coming back to our running example, the transition specifications of the Montage Sum can be textually represented as follows.

Term 3: Montage Sum:
(statePath("I"), true, siblingPath("Factor", undef, statePath("I")))
(siblingPath("Factor", undef, statePath("T")), true, siblingPath("Expr", undef, statePath("I")))
(siblingPath("Expr", undef, statePath("T")), true, statePath("add"))
(statePath("add"), true, statePath("T"))

Transitions to and from boxes are directly represented as arrows to or from the corresponding I or T state. The corresponding textual representation of the transition specifications in the Montages Variable and Constant is given below.

Term 4: Montage Variable:
(statePath("I"), true, statePath("lookup"))
(statePath("lookup"), true, statePath("T"))

Montage Constant:
(statePath("I"), true, statePath("setValue"))
(statePath("setValue"), true, statePath("T"))

The instantiation of the transition specifications for all nodes n1, ..., n6 in the AST of the program example 2 + x + 1 results in the following list of TFSM transitions.
(n1, "I", true, n2, "I")
(n2, "I", true, n2, "setValue")
(n2, "setValue", true, n2, "T")
(n2, "T", true, n3, "I")
(n3, "I", true, n5, "I")
(n5, "I", true, n5, "lookup")
(n5, "lookup", true, n5, "T")
(n5, "T", true, n6, "I")
(n6, "I", true, n6, "setValue")
(n6, "setValue", true, n6, "T")
(n6, "T", true, n3, "add")
(n3, "add", true, n3, "T")
(n3, "T", true, n1, "add")
(n1, "add", true, n1, "T")

In fact, these transitions correspond exactly to the transitions in Figure 25, taking as source and target of a transition the combination of the states together with the nodes referenced by the dotted arrows.

3.3.4 Simplification of TFSM

The simplification resulting in Figure 26 can now be described as follows. If there exist two transitions t1 = (n1, s1, c1, n2, s2) and t2 = (n2, s2, c2, n3, s3) such that s2 equals "I" or "T", then t1 and t2 can be replaced by the transition t3 = (n1, s1, c1 and c2, n3, s3). This simplification algorithm only works if there is exactly one "I" and one "T" arrow in a Montage and if "I" and "T" states are not associated with actions. Otherwise a more general simplification algorithm removes all states not having an action associated and combines incoming and outgoing transitions. In the upper part of Figure 27 we see a state/node pair (s, n) of a TFSM which is a candidate for removal from the TFSM transition graph. If the state s in the MVL-graph of the Montage associated with node n is not associated with an action rule, the (s, n) pair can be removed, and the incoming and outgoing transitions can be combined as visualized in the lower part of Figure 27.

Fig. 27: A TFSM fragment before and after simplification
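The simple pairwise rule above can be sketched in Python. This is a sketch under assumptions: transitions are plain 5-tuples, and conditions are strings whose conjunction is rendered textually; the more general simplification over arbitrary actionless states (Figure 27) is not implemented here.

```python
# One application of the simple simplification rule: two transitions
# meeting in an "I" or "T" state are fused into a single transition
# whose condition is the conjunction of both conditions.
def simplify_once(transitions):
    for t1 in transitions:
        for t2 in transitions:
            n1, s1, c1, n2, s2 = t1
            m2, r2, c2, n3, s3 = t2
            # t2 must start exactly where t1 ends, in an "I" or "T" state.
            if t1 is not t2 and (n2, s2) == (m2, r2) and s2 in ("I", "T"):
                fused = (n1, s1, f"({c1}) and ({c2})", n3, s3)
                rest = [t for t in transitions if t is not t1 and t is not t2]
                return rest + [fused]
    return transitions  # no applicable pair: already simplified
```

Repeated application until a fixpoint is reached yields the simplified TFSM, provided the preconditions stated above hold.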
3.3.5 Execution of TFSMs

Execution of the program is now done by an algorithm Execute, which has two global variables, CNode, the current node, and CState, the current state. At the beginning, CNode is the root of the AST, and CState is "I".

    CNode <- root of AST
    CState <- "I"

The core of Execute has two steps, which are repeated until the machine terminates. Termination criteria depend on the environment of the machine, e.g. whether the environment can change part of the machine's state.
1. In the first step, the action rule of the state CState in the MVL state machine corresponding to CNode is executed.
2. In the second step, a TFSM transition (CNode, CState, c, tn, ts) is chosen, whose source node equals CNode, whose source state equals CState, and whose condition c evaluates to true. If such a transition exists, CNode is set to tn and CState is set to ts.
3. Then repeat the process, starting at step 1.

This general execution algorithm corresponds to the process described at the end of the example given in Section 3.3.1. This "core" algorithm is going to be formalized later in Section 6.1; in Section 6.4 it will serve as example for the new Montages tool architecture, and finally in Section 8.4.6 it is used as part of the formal semantics of the Montages formalism itself.

3.4 Lists, Options, and non-local Transitions

We have omitted up to now the treatment of lists and options in the EBNF rules, as well as non-local transition specifications in MVL. Both lists and non-local transition specifications can be used to specify a transition which corresponds to a set of TFSM transition instances, rather than a single instance. In the presence of lists and non-local transitions, the algorithm InstantiateTransition generates from one transition specification in MVL a set of transitions in a TFSM.
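The Execute algorithm of Section 3.3.5 can be sketched as follows. The encoding is an assumption for illustration: actions and conditions are Python callables, transitions are 5-tuples, and a fixed step budget stands in for the environment-dependent termination criteria.

```python
# Minimal sketch of Execute: fire the current state's action (step 1),
# then follow an enabled transition (step 2), and repeat.
def execute(root, transitions, actions, max_steps=100):
    cnode, cstate = root, "I"          # CNode <- root of AST, CState <- "I"
    trace = []
    for _ in range(max_steps):
        action = actions.get((cnode, cstate))
        if action is not None:
            action()                   # step 1: execute the action rule
        trace.append((cnode, cstate))
        # step 2: choose a transition whose source matches and whose
        # condition evaluates to true; stop if none exists.
        for sn, ss, cond, tn, ts in transitions:
            if sn == cnode and ss == cstate and cond():
                cnode, cstate = tn, ts
                break
        else:
            break
    return trace
```

The trace of visited (node, state) pairs makes the control flow through the TFSM observable, which is also the basis of the generated visual debugger.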
In Section 3.4.1 we show the EBNF features to specify lists and options, how the AST is constructed for such grammars, and how MVL-transitions from and to lists are instantiated in a TFSM with a family of transitions. The visual and textual representation of non-local transitions by means of so-called global paths, as well as the instantiation of transition specifications involving such paths, is given in Section 3.4.3. In Section 3.4.2 we give the full specification of the algorithm instantiating the transitions by combining the definitions from Section 3.4.3 and Section 3.3.3. Finally, in Section 3.4.5 we use a goto-language as an example of how a family of TFSM transitions is generated for each transition specification in MVL.

3.4.1 Lists and Options

In characteristic rules, the right-hand-side symbols can be in curly repetition brackets, denoting a list of zero to many instances, or in square option brackets, denoting an optional instance. An optional B instance can be specified as follows:

    A ::= ... [B] ...

A possibly empty list of B instances has the following form:

    A ::= ... {B} ...

A comma separated list of B instances with at least one member can be specified as follows:

    A ::= ... B {"," B} ...

The same kind of list with zero or more members can be given using a combination of curly and square brackets:

    A ::= ... [B {"," B}] ...

The mapping into ASTs is defined such that each of the above right-hand sides is mapped into a list of B instances. Further, the EBNF list { C D } parses sequences of C followed by D, but represents them as a list of C's and a list of D's, which are accessible with the corresponding selector functions. For instance a production L ::= { C D } parsing "c1 d1 c2 d2 c3 d3" results in two lists, [C1, C2, C3] and [D1, D2, D3], which are accessible via selectors S-C and S-D (5).

Fig. 28: Examples for MVL-Transitions connecting lists.
The construction of the AST for lists and options works as follows. From the list or option operators the production creates an ordered sequence of zero, one, or more instances of the respective symbol enclosed in the operator. This sequence is then transformed into an AST representation as follows. If it is
- of length 0, it is represented in the AST by a specially created node, which is an instance of universe NoNode. Consequently, in the AST it cannot be seen whether an instance of NoNode has been generated by an option operator or by a list operator.
- of length 1, it is represented in the AST as the node representing the unique member. In the AST we can therefore not see any difference between a list of length one, an instance produced from an optional symbol, or an instance produced from a normal symbol.
- of length 2 or longer, it is not transformed and represents itself in the AST.

There are two ways to refer to a list with a path. The first possibility is to refer to the elements of the list. In this case, a transition specification from or to a path denoting a list of nodes is instantiated with a family of TFSM transitions, one for each element in the list. Besides referring to elements of a list, it is possible to refer to the list itself, by using the LIST-box as source or target of a transition. In the textual representation the reference to a list is represented by a special state LIST. As an example we show in Figure 28 MVL-transitions between lists. The visual representation of a path denoting a list is the visual representation of the denoted element, surrounded by a special box labeled with LIST which visualizes the list itself. Such list boxes can only contain a single symbol, and represent a list of instances of that symbol, as described above. The visualizations of the involved paths relate to two lists, one of E-instances and one of F-instances.
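The three length cases above can be sketched directly. The NoNode sentinel class and the function name are assumptions for illustration; the case analysis itself follows the text.

```python
# Sketch of the mapping of a parsed sequence into its AST representation.
class NoNode:
    """Stands in for a specially created instance of universe NoNode."""

def to_ast(seq):
    if len(seq) == 0:
        return NoNode()   # empty option or empty list
    if len(seq) == 1:
        return seq[0]     # indistinguishable from a plain symbol instance
    return list(seq)      # length >= 2: the list represents itself
```

This is why the instantiation algorithm later only needs a special case for actual Python-style lists: sequences of length zero and one have already been collapsed to single nodes.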
Footnote 5: A more flexible treatment of lists and options in Montages has been elaborated by Denzler (55).

As mentioned above, they can occur on the right-hand-side of the characteristic production in any of the following forms, not changing anything in their visualization in MVL or representation in the AST. The list of possibilities is not complete.

    ... E ... F ...
    ... F ... E ...
    ... E F ...
    ... E "," E ... F "," F ...
    ... [E "," E] ... [F "," F] ...
    ... E F "," E F ...

The c1-transition in the figure connects the LIST-boxes. It specifies one TFSM transition, from the "T"-state of the last element in the E-list to the "I"-state of the first element in the F-list. The c2-transition connects the actual elements of the lists. It specifies a family of transitions, connecting the "T"-state of each E-list element with the "I"-state of each F-list element. Finally, the c3-transition specification connects the "a"-state of each E-list element with the "b"-state of each F-list element. In the textual representation the references to lists are represented by the special state LIST, resulting in the following textual representation of the three MVL-transitions.

Term 5:
(siblingPath("E", undef, statePath("LIST")), c1, siblingPath("F", undef, statePath("LIST")))
(siblingPath("E", undef, statePath("T")), c2, siblingPath("F", undef, statePath("I")))
(siblingPath("E", undef, statePath("a")), c3, siblingPath("F", undef, statePath("b")))

3.4.2 Extension of InstantiateTransition

The instantiation of transitions involving lists and options can be done by refining the algorithm InstantiateTransition of Section 3.3.3 with two cases, one for source nodes being lists and one for target nodes being lists. In both cases the algorithm InstantiateTransition is called recursively for each element in the list.
In order to make the definition clearer, we assume that the initial values of the global variables are given as four parameters SourceNode, SourcePath, TargetNode, and TargetPath. The header of the algorithm is thus

algorithm InstantiateTransition(SourceNode, SourcePath,
                                TargetNode, TargetPath)
variables
    SourceNode0 <- SourceNode
    SourcePath0 <- SourcePath
    TargetNode0 <- TargetNode
    TargetPath0 <- TargetPath
loop
    ...

and in the loop part, the source and target paths are simplified as described at the end of Section 3.3. The new cases for list processing are given as follows.

if SourceNode0 = list L with 2 or more elements then
    for all elements l in list L call
        InstantiateTransition(l, SourcePath0, TargetNode0, TargetPath0)
if TargetNode0 = list L with 2 or more elements then
    for all elements l in list L call
        InstantiateTransition(SourceNode0, SourcePath0, l, TargetPath0)

The processing of the special LIST-states by the algorithm InstantiateTransition has to handle the special cases of NoNode-instances and normal nodes, since, as we discussed, only lists with minimal length two are represented in the AST as actual lists. If an MVL-transition targets the LIST-state of some path, there are thus two possibilities for the instantiation with a TFSM transition:
- If the target node is a list of nodes, the transition is instantiated with a transition going to the "I" state of the first element in the list.
- Otherwise the transition is instantiated with a transition going to the "I" state of the target node itself.

The instantiation of an MVL-transition whose source path is a LIST-state is treated correspondingly:
- If the source node is a list of nodes, the transition is instantiated with a transition starting at the "T" state of the last element in the list.
- Otherwise the transition is instantiated with a transition starting at the "T" state of the source node itself.
The algorithm InstantiateTransition is now refined with two cases which are checked before the resulting TFSM-transition is generated.

if SourcePath0 =~ statePath("LIST") then
    SourcePath0 := statePath("T")
    if SourceNode0 = list L with 2 or more elements then
        SourceNode0 := last element of L
if TargetPath0 =~ statePath("LIST") then
    TargetPath0 := statePath("I")
    if TargetNode0 = list L with 2 or more elements then
        TargetNode0 := first element of L

Fig. 29: Examples for MVL-Transitions involving global-paths.

Implicit Transitions

A last important aspect of lists and options are implicit transitions in the TFSM. Implicit transitions are TFSM-transitions with the default-condition which are added in order to provide for sequential data-flow in lists, and in order to guarantee that control flows through the NoNode-instances. For each element in a list, except the last one, an implicit transition with default-condition is added from the "T"-state of the element to the "I"-state of the next element in the list. For each NoNode-instance, an implicit transition from its "I"- to its "T"-state is added.

3.4.3 Global Paths

For certain programming constructs like procedure calls, goto's, and exceptions we need a way to specify a transition from or to nodes which are not siblings, but ancestors of the Montage. The nesting of boxes with selector functions allows us to access direct and indirect siblings. In order to allow for transitions from or to arbitrary nodes in the AST, we introduce the global path. The global path is visualized by a box labeled with a characteristic or synonym symbol. This box represents all instances of said symbol. Besides the already introduced path operators siblingPath and statePath we thus introduce a third one called globalPath.
The parameters of a global path are the name of a characteristic or synonym symbol and a path. Control arrows to or from a global path denote a family of arrows to or from all corresponding instances. As in the case of boxes labeled with selector functions, incoming arrows are connected with the "I"-state and outgoing arrows are connected with the "T"-state. As an example consider again the AST from Fig. 19. A global path Factor would refer to nodes 2, 4, and 6, whereas a global path Sum would refer to nodes 1 and 3. In this constellation an MVL-transition into a global path Factor would denote 3 control arrows ending in the initial states of nodes 2, 4, and 6; an MVL-transition departing from the same global path would denote 3 control arrows departing from the terminal states of nodes 2, 4, and 6. The situation is depicted in Figure 29: a transition targeting and a transition sourcing in a global path Factor is shown, together with the instantiation as TFSM transitions. In order to process global paths, the algorithm InstantiateTransition has to be refined again, this time with two cases calling InstantiateTransition for each instance of a universe. The new cases look as follows:

if SourcePath0 =~ globalPath(&Universe, &Path0) then
    for all elements n in universe &Universe call
        InstantiateTransition(n, &Path0, TargetNode0, TargetPath0)
if TargetPath0 =~ globalPath(&Universe, &Path0) then
    for all elements n in universe &Universe call
        InstantiateTransition(SourceNode0, SourcePath0, n, &Path0)

3.4.4 Algorithm InstantiateTransition

We have now covered all aspects of InstantiateTransition and can combine the initial definition and the refinements into the following final version. Since we have not introduced a formal algorithmic notation yet, the code is given in an informal way, referring to well known concepts like calling procedures, updating variables, or ranging over lists.
Later in Section 8.4.3, the fully formalized algorithm is given as ASM 57. Interestingly, the fully formalized algorithm is neither longer nor more complex.

algorithm InstantiateTransition(SourceNode, SourcePath,
                                TargetNode, TargetPath)
variables
    SourceNode0 <- SourceNode
    SourcePath0 <- SourcePath
    TargetNode0 <- TargetNode
    TargetPath0 <- TargetPath
loop
    if SourceNode0 = list L with 2 or more elements then
        for all elements l in list L call
            InstantiateTransition(l, SourcePath0, TargetNode0, TargetPath0)
        exit
    if TargetNode0 = list L with 2 or more elements then
        for all elements l in list L call
            InstantiateTransition(SourceNode0, SourcePath0, l, TargetPath0)
        exit
    if SourcePath0 =~ siblingPath(&Symbol, &Occ, &Path0) then
        let n' = (selector function (&Symbol, &Occ) applied to SourceNode0) in
            SourceNode0 := n'
            SourcePath0 := &Path0
    if TargetPath0 =~ siblingPath(&Symbol, &Occ, &Path0) then
        let n' = (selector function (&Symbol, &Occ) applied to TargetNode0) in
            TargetNode0 := n'
            TargetPath0 := &Path0
    if SourcePath0 =~ statePath("LIST") then
        SourcePath0 := statePath("T")
        if SourceNode0 = list L with 2 or more elements then
            SourceNode0 := last element of L
    if TargetPath0 =~ statePath("LIST") then
        TargetPath0 := statePath("I")
        if TargetNode0 = list L with 2 or more elements then
            TargetNode0 := first element of L
    if SourcePath0 =~ globalPath(&Universe, &Path0) then
        for all elements n in universe &Universe call
            InstantiateTransition(n, &Path0, TargetNode0, TargetPath0)
        exit
    if TargetPath0 =~ globalPath(&Universe, &Path0) then
        for all elements n in universe &Universe call
            InstantiateTransition(SourceNode0, SourcePath0, n, &Path0)
        exit
    if SourcePath0 =~ statePath(&srcS) and TargetPath0 =~ statePath(&trgS) then
        create TFSM transition
            (SourceNode0, &srcS, Condition, TargetNode0, &trgS)
        exit
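The combined algorithm can be sketched in Python under assumed representations: paths as nested tuples ("state", s), ("sibling", symbol, occurrence, rest), and ("global", universe, rest); nodes as dicts mapping selector symbols to children; sibling lists as Python lists; and universes as a dict from a symbol to all its instances. Occurrences and NoNode handling are omitted. This is a sketch, not the thesis's formalization as ASM 57.

```python
def instantiate(src_n, src_p, cond, trg_n, trg_p, universes, out):
    while True:
        # Fan out when a side refers to the elements of a list (only
        # lists of length >= 2 appear as actual lists in the AST).
        if isinstance(src_n, list):
            for l in src_n:
                instantiate(l, src_p, cond, trg_n, trg_p, universes, out)
            return
        if isinstance(trg_n, list):
            for l in trg_n:
                instantiate(src_n, src_p, cond, l, trg_p, universes, out)
            return
        # Resolve one sibling step on each side.
        if src_p[0] == "sibling":
            src_n, src_p = src_n[src_p[1]], src_p[3]
        if trg_p[0] == "sibling":
            trg_n, trg_p = trg_n[trg_p[1]], trg_p[3]
        # LIST-boxes: leave from the last element's "T" state and
        # enter at the first element's "I" state.
        if src_p == ("state", "LIST"):
            src_p = ("state", "T")
            if isinstance(src_n, list):
                src_n = src_n[-1]
        if trg_p == ("state", "LIST"):
            trg_p = ("state", "I")
            if isinstance(trg_n, list):
                trg_n = trg_n[0]
        # Global paths: fan out over all instances of the universe.
        if src_p[0] == "global":
            for n in universes[src_p[1]]:
                instantiate(n, src_p[2], cond, trg_n, trg_p, universes, out)
            return
        if trg_p[0] == "global":
            for n in universes[trg_p[1]]:
                instantiate(src_n, src_p, cond, n, trg_p[2], universes, out)
            return
        # Both sides fully resolved: emit the TFSM transition.
        if (not isinstance(src_n, list) and not isinstance(trg_n, list)
                and src_p[0] == "state" and trg_p[0] == "state"):
            out.append((src_n, src_p[1], cond, trg_n, trg_p[1]))
            return
```

Applied to the two-list example of Figure 28, a LIST-to-LIST specification yields a single transition from the last E-element to the first F-element, while an element-to-element specification fans out into the full family of transitions.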
3.4.5 The Goto Language

As an example language for transitions involving lists and global paths, we give a simple extension of the expression language we introduced in the previous sections. In addition to expressions, the extended language features print, goto, and labeled statements. The new EBNF rules are given as follows.

Gram. 2:
Prog      ::= { Statement ";" }
Statement =   Print | Goto | Labeled
Print     ::= "print" Expr
Goto      ::= "goto" Ident
Labeled   ::= Label ":" Statement
Label     =   Ident

In Figure 30 we show two alternative, but equivalent, Montages for the Prog-construct, with productions

    Prog  ::= Statement [";" Prog]
    Prog2 ::= Statement {";" Statement}

The first solution introduces a list of statements by using a recursive EBNF rule and the square brackets denoting an option. Alternatively, the second solution uses the curly list brackets to express a list directly.

Fig. 30: The Montages Prog and Prog2. In the first case the sequential control has to be given explicitly, in the second case we use a special box for lists. Such LIST-boxes define sequential control flow as default.

The print statement (Figure 31) fires an action using the XASM syntax for printing to the standard output:

    @print: stdout := S-Expr.value

Its use is to test the behavior of the other statements. The Labeled statement (Figure 31) is composed of a label and a statement. It sends control directly to the statement-part, and has no further behavior attached. Label is a simple synonym for an identifier.

Fig. 31: The Montages Print and Labeled.

The interesting Montage is the Goto Montage, which is shown in Figure 32. The box labeled with "Labeled" is a global-path referencing all instances of the EBNF-symbol Labeled.
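The family of guarded transitions generated for a Goto instance by this global path can be sketched as follows. The dict-based node encoding and the function name are assumptions for illustration; the guard corresponds to the firing condition trg.S-Label.Name = src.S-Ident.Name of Figure 32.

```python
# Sketch: one transition from the "go" state of a Goto instance to the
# "I" state of every Labeled instance, each guarded by a label match.
def goto_transitions(goto_node, labeled_nodes):
    return [(goto_node, "go",
             # default arguments bind src/trg per transition
             lambda src=goto_node, trg=lab: trg["S-Label"] == src["S-Ident"],
             lab, "I")
            for lab in labeled_nodes]
```

If every label occurs only once, exactly one guard per Goto instance can be true, so the conditions are mutually exclusive.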
The MVL-transition from the go-state into the Labeled reference denotes a family of TFSM transitions from the "go" state going to the "I"-state of each Labeled-statement. The firing-condition

    trg.S-Label.Name = src.S-Ident.Name

of these transitions depends on the source node src and the target node trg. The condition guarantees that the label of the target matches the identifier-component of the goto statement. If each label is used only once, this guarantees that the conditions are mutually exclusive for each Goto-instance. An example program in our language is

    A: print 1;
    goto B;
    C: goto A;
    B: print 2;
    goto C;

the corresponding states and nodes of the TFSM are given in Figure 33. The result of executing the TFSM is the sequential printing of 1, 2, 1, 2, 1, 2, ...

Fig. 32: The Goto Montage.
Fig. 33: The nodes and states of the TFSM.

3.5 Related Work and Results

The work on Montages was originally motivated by the formal specification of the C language (85) (6), which showed how the state-based Abstract State Machine formalism (ASMs) (80; 81; 97) is well-suited for the formal description of the dynamic behavior of a full-fledged main-stream programming language. At the risk of oversimplifying somewhat, we can describe some of these models (85; 224; 130) as follows. Program execution is modeled by the evolution of two variables (7), CT and S. CT points to the part of the program text currently in execution and may be seen as an abstract program counter. S represents the current value of the store. Formally one defines the initial state of the functions and specifies how they evolve by means of transition rules.
Some of the ASM models of programming languages assume that the representation of the program's control and data flow, in the form of (static) functions between parts of the program text, is given. Others, like the Occam model described in (27), use ASMs for the construction of the control and data flow graph. All of them use informal pictures to explain the flow graph. These pictures have been refined and formalized as the Montages Visual Language.

3.5.1 Influence of Natural Semantics and Attribute Grammars

Another important experience before the definition of Montages was the use of Kahn's Natural Semantics (110) for the dynamic semantics of the programming language Oberon (124). Although we succeeded, thanks to the tool support by Centaur (34), the result was less compact and more complex than the ASM counterpart given by Haussmann and the author in (130); one reason is that one has to carry around all the state information in the case of Natural Semantics. An important empirical result of this experiment was the fact that the treatment of lists produced a relatively large number of repetitive rules. Therefore the definition of Montages included from the beginning a special treatment of lists, being part of the Montages Visual Language. The input from the Verifix project (73; 88) has helped to see the necessity of using attribute grammars (AGs) (122) for the definition of static semantics. Montages use AGs for the specification of static properties. Among the several mechanisms proposed for defining programming languages, AG systems have been one of the most successful ones. The main reason for this lies in the fact that they can be written in a declarative style and are highly modular. However, by themselves they are unsuitable for the specification of dynamic semantics. The work of Kaiser on action equations (111; 112) addresses this problem by augmenting AGs with mechanisms taken from the action routines proposed by Medina-Mora in (151) for use in language based environments.
In Appendix A we give a detailed comparison of Montages with action equations. Later, Poetzsch-Heffter designed the MAX system (184; 185; 186), the first system taking advantage of combining ASMs with AGs. Further references to MAX will be given in Section 7.3. Action Equations and MAX can be considered as direct predecessors of Montages. In contrast to them, Montages is a graphical formalism.

Footnote 6: Historically the C case-study was preceded and paralleled by work on Pascal (80), Modula-2 (157), Prolog (30), and Occam (28).
Footnote 7: These variables are called dynamic functions in ASM terminology.

3.5.2 Relation to subsequent work on semantics using ASM

While the Montages approach can be considered as a systematization of the existing ASM descriptions of programming languages (80; 157; 85; 224; 130; 156), a newer thread of ASM specifications was started by Schulte and Börger (33), breaking, among other things, with the tradition of using visual descriptions for control flow. This new thread uses a style similar to structural description methods such as Natural Semantics (110) and SOS (182), but the resulting ASM models are isomorphic to the kind of models defined by earlier ASM formulations of programming languages or by a Montages description. The combination of a declarative specification style and a formal model based on abstract syntax trees and control flow graphs can be unintuitive for experts in structural semantics formalisms, who expect models where programs are formalized as terms, rather than trees, and where control flow is given over the term structure. At the same time the chosen mixture of two different styles makes the resulting descriptions unfriendly for programmers, who typically have no background in structural description methods.
A more promising approach in this direction is the MAX approach of Poetzsch-Heffter, where parse-trees are formalized as occurrence algebras (186), which allows one to combine ASMs directly with a structural description method. The work of Poetzsch-Heffter also contains a precise definition of upwards pattern-matching, which allows one to access nodes further up in the tree. A similar technique is used by Schulte and Börger in the form of patterns with program points which are "visualized" by tiny, prefixed Greek letters. Nevertheless, the new style of language descriptions by Schulte and Börger, which has been further elaborated by Stärk for teaching in a theoretical computer science lecture at ETH Zürich (203), has led to an interesting correctness proof of the translation from Java to the Java Virtual Machine (204). As an experiment we have reengineered with Montages a reproduction of the model of the imperative core of Java as given by Stärk. In our reproduction the textual rules are shortened from the original 85 lines to 29 lines, and the complete control flow is specified graphically. The given reproduction can be directly executed using the Gem-Mex tool and has been presented to the students of the ETH classes. Our reproduction of Stärk's model is given in Appendix C. In Chapter 14 we show a corresponding state-of-the-art Montages description of the same features and explain why our version is better with respect to compositionality.

3.5.3 The Verifix Project

A further systematization of the traditional thread of ASM and Montages descriptions of programming languages has been developed by Heberle et al. (88; 87) in the context of the Verifix project (73; 88), which aims at a systematic approach for provably correct compilers. The Verifix approach uses a variant of Montages for the specification of source languages, and allows the use of state-of-the-art compiler technology.
The Verifix variant of Montages is a combination of Montages' style for dynamic semantics with traditional, well-proven variants of attribute grammars, while our definition of Montages uses a more experimental version of attribute grammars which is described in Chapter 7. Heberle describes a method for correct transformations in compiler-construction and uses the Verifix variant of Montages as formal semantics for the source languages. In order to make the resulting proofs modular and repeatable, he defines the domain-specific language AL for giving action rules. AL is a specialized version of ASM, resulting from his analysis of existing ASM and Montages specifications of imperative and object-oriented languages. As a result, two independently developed specifications for the same programming language will typically be equivalent if Heberle's approach is followed, whereas Montages and traditional ASMs allow for many different specifications of the same set of constructs. On the other hand, if domain-specific languages are developed, the approach of Heberle can be more complex than the approach presented here. The proposal of Heberle can also be generalized to a new way of structuring language descriptions based on Montages. Instead of using a fixed language such as XASM for defining action rules, one could allow an arbitrary language to be plugged in. A DSL could then be developed by first defining an action DSL, such as AL, which is used to define action rules in the specification of the final DSL. The interface needed to use one language to define the action rules of the specification of another language is relatively lean, in essence providing means to navigate the AST, and to read and write the attributes of the AST. A special case of this language specification structuring mechanism arises if some action rule recursively executes code of the specified language.
This case has been implemented in the Gem-Mex tool and used by the author in some of the later referenced industrial case studies.

3.5.4 The mpC Project

Another compiler project using Montages is the mpC parallel programming environment (69). Montages is used in this project in two different ways: first, the most sophisticated part of the language, the sublanguage of expressions for parallelization, is modeled using the Gem-Mex environment; second, the obtained formal specification is used for test suite generation (115; 114; 113). Modeling mpC expressions in the Montages framework helped to find several inconsistencies in the mpC language semantics and gave a lot of useful ideas for the code generation part of the compiler. The Montages specification of mpC expressions is used for three different purposes:

- Test case generation. The static semantics part of the specification (syntax productions, constraints) is used to generate both a set of statically correct and a set of statically incorrect programs, which constitute a positive and a negative test suite, respectively.
- Test oracle generation. The dynamic semantics part of the specification, i.e. the execution behavior, is used for generating trustable output for a test program. The test oracle compares the actual and trustable outputs for a particular test case. If the results are not identical, the verdict is failure.
- Providing test coverage criteria. The specification coverage analysis demonstrates whether all parts of the specification are exercised by the test suite. If the coverage criteria are satisfied then no more test cases are needed; otherwise additional test programs should be added to the test suite. Several Montages-oriented coverage criteria were developed.

With the help of the generated test suites the mpC team found more than 30 errors in the current compiler implementation, and as a result the quality of the compiler was significantly improved (187).
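The split into a positive and a negative test suite can be illustrated with a small sketch. The toy grammar (assignments over two variables) and the single static constraint (a variable must be assigned before it is read) below are purely illustrative assumptions, not part of the actual mpC specification, but the mechanism is the same: enumerate programs from the syntax productions and sort them by whether they satisfy the static semantics.

```python
import itertools

# Toy language: a program is a sequence of assignment statements over
# the variables a and b. The static constraint used here ("assigned
# before read") is an illustrative stand-in for the constraints of a
# real Montages specification.

VARS = ["a", "b"]

def statements():
    """Enumerate all single statements of the toy language."""
    for lhs in VARS:
        yield (lhs, "1")            # constant assignment, e.g. a := 1
        for rhs in VARS:
            yield (lhs, rhs)        # copy assignment, e.g. a := b

def programs(max_len):
    """Enumerate all programs with up to max_len statements."""
    for n in range(1, max_len + 1):
        for prog in itertools.product(list(statements()), repeat=n):
            yield list(prog)

def statically_correct(prog):
    """Check the assigned-before-read constraint (static semantics)."""
    assigned = set()
    for lhs, rhs in prog:
        if rhs in VARS and rhs not in assigned:
            return False
        assigned.add(lhs)
    return True

def render(prog):
    return "; ".join(f"{lhs} := {rhs}" for lhs, rhs in prog)

# Sort the enumerated programs into a positive and a negative suite.
positive = [render(p) for p in programs(2) if statically_correct(p)]
negative = [render(p) for p in programs(2) if not statically_correct(p)]

print(len(positive), "positive,", len(negative), "negative")
```

The positive suite must be accepted (and run correctly) by the compiler under test, while every program of the negative suite must be rejected by its static checks.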
This case study demonstrated that a Montages specification is a powerful tool for developing language test suites, which is an important part of the compiler development process.

3.5.5 Active Libraries, Components, and UML

Montages together with the support environment Gem-Mex (9) can also be seen as an active library as defined by Czarnecki and Eisenecker (51). According to the given definition, active libraries extend traditional programming environments with means to customize code for program visualization, debugging, error diagnosis and reporting, optimization, code generation, versioning, and so on. Gem-Mex provides such a meta-environment based on Montages, covering program visualization, debugging, code generation, and versioning. Another example of an active library is the intentional programming system (197; 198). While fixed programming languages (both GPLs and DSLs) force us to use a certain fixed set of language abstractions, active libraries, such as Montages or intentional programming, allow us to use a set of abstractions optimally configured for the problem at hand. They enable us to provide truly multi-paradigm and domain-specific programming support. Unfortunately Microsoft decided to keep details of the intentional programming system confidential until they release it for commercial use. A direct comparison of Montages and intentional programming must thus be delayed until the official launch of intentional programming. From the existing publications we understand that intentional programming relies on pure transformation approaches for giving dynamic semantics, while Montages makes the parse trees directly executable. The practical experience with Gem-Mex opened early the discussion on the need for a component-based implementation of Montages. X ASM features a component system, which is used for this purpose.
In Denzler's dissertation (55) the use of component technology for Montages is explored in detail, and led to an alternative implementation based on Java Beans. The disadvantage of Denzler's approach is that the low abstraction level of Java with respect to X ASM may permit less reuse, and makes it more difficult to realize efficient implementations by means of formal transformations such as partial evaluation. Nevertheless we believe that future industrial applications will follow the approach of using a main-stream host language and implementing Montages as a pattern for language engineering on top of this language. Actions would be formulated directly in the host language, and the whole abstract syntax tree and tree finite-state machine would be provided as a framework for using the Montages pattern. At the moment we think the emerging executable Action Language for UML state machines (2; 229) is the best candidate, especially since it has many similarities with X ASM, and since a harmonization of Montages with UML terminology for state machines and actions would allow us to reposition Montages as a tool for Model Driven Architectures (25; 170), the OMG group's variant of domain engineering and DSL technology (43; 148).

3.5.6 Summary of Main Results

The following list summarizes the main results of Montages-related applications and research.

- The language definition formalism Montages has been defined and elaborated over the last six years. The first version, published by Pierantonio and Kutter in 1996 (131; 133), has been stepwise refined and simplified since then. Shortly after these publications Anlauff joined the Montages core team. The original formulation of Montages was strongly influenced by a case study where the Oberon programming language was specified (130; 132).
The earliest case studies outside the Montages team were a specification of SQL by di Franco (58) and a specification of Java by Wallace (225). Other more recent case studies include the use of Montages as a front-end for correct compiler construction in the Verifix project (73; 88), applications of Montages to component composition (13), and its use in the design and prototyping of a domain-specific language (134). These have led to several improvements in the formalism, which have been reported in (12). The final version of Montages presented here and its semantics have been influenced by a pure XML-based semantics description formalism (126), which has been developed by the author for the company A4M applied formal methods AG (135).

- Three general purpose programming languages, Oberon (132), Java (225) and C (98), have been specified using Montages. These case studies have led to constant improvements of the tool and methodology, such that all three languages can now be described easily, with the exception of certain syntax problems. For example, we cannot solve the dangling-if problem. Another example of such syntax problems is that we need to introduce more explicit naming conventions for classes and variable names in Java. It is fair to say that Montages can be and has been used to specify real-world programming languages, if the syntax (not the semantics) is simplified. The syntax problem can be solved by basing Montages on abstract syntax, as shown in the examples of Appendix A, or by using XML syntax (126).

- As a final case study for this thesis, the Java language has been described again. The work of Wallace (225) has shown several deficiencies of Montages if a language with the complexity of Java is described. Among other improvements, Wallace proposed to replace the original use of data-flow arrows with a much more general mechanism.
Nevertheless we decided to replace data-flow arrows completely with AGs, which also solves the problems found by Wallace. The very detailed work of Wallace has been partly adopted by Denzler, and later completed to a full Java description by the author. The most complex part of Java proved to be the specification of subtyping, name resolution, and dynamic binding. This part of the specification is shown in Appendix D as an example. It must be noted that the limited parsing capability of the current Montages implementation has forced us to introduce explicit syntax for resolving whether an identifier is a class, a method, or an attribute. Therefore one can argue that our specification does not completely cover name resolution. Although the length of the resulting Java specification has led to its exclusion from the text, it showed that such a description is feasible. All sequential features of Java have been specified such that they can be used in isolation and reused in small sub-languages. The complete specification of Java has been split up into a total of fourteen sub-languages. Typically one language extends its predecessors. The extensions are very small, typically two to three new specification modules and half a dozen new definitions, and can often be reused in later stages without adaptation.

- A library of reusable language concept descriptions has been elaborated from the new Java case study. This library is presented in Part III of this thesis. The semantic features of major object-oriented GPLs are covered in principle by these components, and a full object-oriented language can be described by combining and adapting them. In fact, the library is structured again as a number of small languages, reusing each other's specification modules.
It will be difficult to model the exact syntax and semantics of other existing object-oriented GPLs such as C++ without further adapting the library, but for our purpose of having building blocks of GPL concepts reusable for DSL designs the library is very useful.

- Several DSLs have been developed with and applied by different industrial partners. The executed case studies are
  – The design and implementation of the data model mapping language CML for the bank UBS (134). This work has been done jointly with Lothar Thiele and Daniel Schweizer.
  – The specification and implementation of the hybrid system description language HYSDEL for the Automatics Institute at ETH Zurich (6). This work has been done jointly with Samarjit Chakraborty.
  – The design and implementation of three DSLs for a financial analysis generation software system of a small financial service provider. These languages have been briefly described at the end of Section 2.2 and are currently in productive use at one of Switzerland's largest banks.
  – The specification and implementation of the SMS application language Eclipse for the company Distefora Mobile.
  The last two case studies have been executed by the author and Matthias Anlauff for A4M AG.

- Besides GPLs and DSLs, the basic notation of another language description formalism called Action Semantics (158) has been described (7). This work has been done jointly with Lothar Thiele and Samarjit Chakraborty.

- The imperative prototyping language X ASM (5) has been designed, implemented, and tested by Anlauff and the author for the company A4M applied formal methods AG, which is supporting and further developing the language under an open source license (8). X ASM is a generalization of the mathematical Abstract State Machine (ASM) formalism. X ASM is used not only for the definition of semantic actions but for the formalization and implementation of the complete Montages approach.
The initial, non-formal definition of X ASM by Anlauff has now been formalized by the author, and a number of additional features and reusable techniques have been developed. The formalization and the newly designed features are presented in Chapter 4. Further, a purely object-oriented version of X ASM has been developed and specified by the author, and an executable Montages description of this new language can be downloaded (128).

- X ASM has been used by Anlauff as the DSL for the implementation of the Montages tool support Gem-Mex.8 Gem-Mex allows the language designer to generate for each specified language an interpreter, a graphical debugger, and language documentation (10). The design of these tools has been driven by the case studies. The use of X ASM for the implementation allowed a quick adaptation of the environment to changes. Further, the author has been able to influence the development of Gem-Mex on the X ASM level, without knowing the details of the underlying C code. By using the DSL X ASM to implement the language description formalism Montages (respectively its tool set Gem-Mex), the development process of our team is a refined version of the three-cycle process (Section 2.6, Figure 7). In fact our process is a four-cycle process, resulting as a combination of the three-cycle process with the two-cycle process (Figure 6), which are both embedded in our actual development process.
  – The two-cycle process is built by the GPL C, which we use to develop the DSL X ASM, which in turn is used to develop the application Gem-Mex.
  – The three-cycle process overlaps these cycles: X ASM is considered the GPL used to develop the language description formalism Montages, which is then used to develop an arbitrary DSL, which is used to develop applications.

8 The current Gem-Mex implementation has been preceded by work of Sèmi (193) on using Centaur for the tool support of Montages, and by a first Montages implementation based on Sather.
In other words, in the two-cycle aspect of our development process, Gem-Mex is considered the resulting application, and in the three-cycle aspect, the very same software, also known as Montages, is considered the language description formalism, being the central building block of the three-cycle process. Our four-cycle process is visualized in Figure 34.

- Both the Montages meta-formalism and the X ASM formalism have been specified and tested using Gem-Mex. The Gem-Mex meta-formalism description of Montages has been partly derived from a description of an XML-based meta-formalism developed by the author for A4M Applied Formal Methods AG (126). The Montages and X ASM implementations generated by Gem-Mex from their Montages descriptions are fully functional, but cannot yet compete with hand-written implementations. Their main purpose at the moment is the documentation of the design process of Montages and X ASM. In this thesis, an alternative X ASM definition of Montages is given in Chapter 8. This new semantics is specially designed to allow for a relatively efficient implementation by means of partial evaluation. In parallel we work on using Montages for bootstrapping X ASM in the context of the X ASM open-source project. The bootstrapping process for X ASM is visualized in Figure 35.

Fig. 35: The bootstrapping of X ASM

Fig.
34: The four development cycles of the Montages team

Part II
Montages Semantics and System Architecture

In the first part we discussed requirements for language definition formalisms, and introduced our language definition formalism Montages, which tries to fulfill the formulated requirements. The requirements discussed in Part I are all related to the needs of DSL designers, implementors, and users. As a consequence we have been able to report positive results about the usability and expressivity of our approach. On the other hand, discussions with software developers and system engineers in the financial industry and in networking companies showed that our approach needs to fulfill various requirements related to the form, transparency, and quality of the resulting code if it is ever to have a chance of serious industrial application, let alone of entering main-stream technologies. In other words, it is not enough to deliver a DSL with a very simple design. The developers who are responsible for supporting the DSL for the domain experts expect that not only the DSL is easy to understand and maintain, but also the generated code. It is difficult to explicitly formulate these kinds of requirements, since they will largely depend on the environment in which the code is going to be used.
In order to meet as many as possible of the requirements that will show up in concrete situations, our approach should allow us

- to influence the structure of the generated code,
- to influence the naming of identifiers in the generated code, and
- to clean the code from those parts which are only needed to make the approach general, but are not relevant or used in a concrete situation.

As an example, assume a DSL which features global variables and updates, and where x := x + 1 is an admissible program. The developers require that the system generates the code they are expecting:

x := x + 1

At least for simple examples they need this kind of "validation", indicating whether the system is doing what they expect. As an indirect requirement, simple language descriptions and simple programs, such as the above x := x + 1, should result in simple generated code. The current implementation of Montages (10) generates for each specified DSL an interpreter; the complexity of the generated code is therefore independent of the complexity of the DSL programs. In order to improve the current implementation, we are going to develop in this part a formal, executable semantics of Montages which serves directly as a building block for a new system architecture. For the formalization of the semantics, as well as for the other parts of the system architecture, we use the ASM language X ASM, which is described in Chapter 4. For the sake of simplicity we abstract from the problem of implementing X ASM and present everything on the level of X ASM, assuming that a transparent and relatively efficient implementation of X ASM exists.9

9 The X ASM Open Source project www.xasm.org is working on X ASM implementations.

Fig.
36: Current Architecture of Montages System

The system architecture presented here replaces the current implementation, in which a program generator creates an L-interpreter from the specification of a language L, written in Montages. In Figure 36 these components are visualized; the generated interpreter is represented by a dashed box, and the user-supplied language specification and program are solid-line boxes. The interpreter works as usual, taking as input an L-program P which is then executed. The program P does not influence the complexity of the generated interpreter. As stated above, the resulting problem is that we cannot expect simple code for simple programs. The current program generator was further designed as a proof of concept for the feasibility of complex Montages descriptions, such as the description of general purpose languages. The implementation has not been tuned towards simplification of the generated code, and the generated interpreters are relatively complex, independent of the complexity of the described language. For our new architecture we developed with X ASM a meta-interpreter of Montages, reading both a specification of a language L (syntax and semantics) and a program P written in the described language L, parsing the program according to the given syntax description, and executing the program according to the given semantics description. By assuming that the language specification is fixed, we can partially evaluate (46) the meta-interpreter to a specialized interpreter of the specified language. Assuming in addition that the program is fixed, we can further specialize the interpreter into code implementing the program. In Figure 37 the specification of L, written in Montages, and the program P, written in L, are shown as boxes on the left side. Both the L specification and P are input to the meta-interpreter, which is written in X ASM and visualized on the right side.
The box below the meta-interpreter is an L-interpreter, obtained by partially evaluating the meta-interpreter under the assumption that the specification of L does not change. The L-interpreter box is dashed, showing that it has been generated by the system rather than provided by the user. As usual, the interpreter takes as input the program P and executes it. Finally, from the interpreter a specialized P-implementation is obtained by partially evaluating the interpreter, assuming that the program P is not changing. Again the box is drawn with dashed lines, since it does not have to be provided by the user.

Fig. 37: New Architecture of Montages System

The detailed definition of the meta-interpreter is given in Chapter 8. A more detailed sketch of the partial evaluation process is given in Chapter 5. Using only partial evaluation would create the problem that the generated code inherits the more abstract signature of the language specification level. As an example consider again a DSL with global variables and destructive updates. The syntax of an assignment may be given as:

Assignment ::= Ident ":=" Expression

and the semantics of the construct is given by an action in the X ASM language. We refer to the micro-syntax of the global variable as S-Ident.Name and to the value of the previously evaluated expression as S-Expression.value. Assuming a hash table Global( ) which holds the values of global variables, the following X ASM rule gives the semantics of the Assignment feature:

Global(S-Ident.Name) := S-Expression.value

Obviously, even if the variable and the expression partially evaluate to the values of the initial example, "x" and "x + 1", the generated code will never be simpler than

Global("x") := Global("x") + 1

In order to achieve the desired outcome, we need to parameterize the signature of the semantics rule.
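A minimal sketch of how a meta-interpreter executes this Assignment rule may help to see why the generated code cannot get simpler than Global("x") := Global("x") + 1: every variable access goes through the generic Global hash table, regardless of the concrete program. The AST encoding and attribute names below are illustrative assumptions, not the actual Gem-Mex data structures.

```python
# Sketch of the meta-interpreter view of the Assignment rule
#   Global(S-Ident.Name) := S-Expression.value
# AST nodes are plain dicts; Global is the hash table holding the
# values of global variables.

Global = {}

def eval_expr(node):
    """Evaluate an expression node to its value attribute."""
    if node["kind"] == "Const":
        return node["value"]
    if node["kind"] == "Var":
        return Global[node["name"]]        # generic hash-table access
    if node["kind"] == "Plus":
        return eval_expr(node["left"]) + eval_expr(node["right"])
    raise ValueError(node["kind"])

def exec_assignment(node):
    """The action of the Assignment construct: S-Ident and
    S-Expression play the role of the selector attributes."""
    Global[node["S-Ident"]["name"]] = eval_expr(node["S-Expression"])

# AST for the program  x := x + 1  (with x initialized to 41):
Global["x"] = 41
ast = {"kind": "Assignment",
       "S-Ident": {"kind": "Ident", "name": "x"},
       "S-Expression": {"kind": "Plus",
                        "left": {"kind": "Var", "name": "x"},
                        "right": {"kind": "Const", "value": 1}}}
exec_assignment(ast)
print(Global["x"])   # -> 42
```

Specializing this interpreter for the fixed AST removes the dispatch on node kinds, but the indirection through Global remains, which is exactly the residual code shown above.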
We extended our formalism such that the signature of variables and functions can be given by a string value in $-signs. This variant of X ASM is called parameterized X ASM (PXasm) and is introduced in Chapter 5. In our example, we can now use a global variable with parameterized signature, rather than the hash table. The new semantics of the Assignment feature is now:

$S-Ident.Name$ := S-Expression.value

On the left-hand side the $-signs are used to refer to a global variable whose signature is given by the expression S-Ident.Name. Once the value of S-Ident.Name is fixed to "x", the left-hand side can simply be specialized to the global variable "x", and the code generated for our initial example is now the desired

x := x + 1

In Section 11.1 the detailed Montages semantics of an example language ImpV2 having this semantics is presented, and we invite the reader to consult this section for further details on why not only partial evaluation but also parameterized signatures are needed for our new architecture. Combining partial evaluation and parameterization of signatures results in a technique which works similarly to the template languages used for program generation (44; 45). In our case the actual "generation" of the program happens only if the partial evaluation results in a complete evaluation of the signature parameters, whereas in traditional template languages the content of the templates can always be evaluated. Further, our parameterization of signatures is integrated with our development language X ASM in such a way that programs can be executed even if partial evaluation did not completely evaluate the parameterized signature. In contrast, unevaluated templates are typically not valid programs. Another advantage of the new architecture is that the fixed meta-interpreter is much easier to test and maintain than the original interpreter generator.
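The specialization step itself can be sketched in a few lines: a rule text containing $-sign parameters is rewritten by evaluating every parameter whose value the partial evaluator has already fixed, while unknown parameters are left residual so the rule stays executable by the meta-interpreter. The textual representation of rules and attributes used here is an illustrative assumption about the PXasm notation.

```python
import re

def specialize(rule, static_attrs):
    """Replace every $expr$ signature parameter whose value is
    statically known; leave the others residual."""
    def subst(match):
        expr = match.group(1)
        if expr in static_attrs:        # statically evaluable parameter
            return static_attrs[expr]
        return match.group(0)           # residual, still interpretable
    return re.sub(r"\$([^$]+)\$", subst, rule)

rule = "$S-Ident.Name$ := S-Expression.value"

# Partial evaluation has fixed the program, so S-Ident.Name is known:
print(specialize(rule, {"S-Ident.Name": "x"}))
# -> x := S-Expression.value

# If the attribute is not yet known, the parameter stays residual and
# the rule remains a valid (interpretable) PXasm rule:
print(specialize(rule, {}))
# -> $S-Ident.Name$ := S-Expression.value
```

This is the property distinguishing the approach from ordinary templates: a partially specialized rule is still a program, not an invalid template fragment.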
In the software development process of Gem-Mex, as visualized in Figure 34, the maintenance of the generator proved to be the most difficult part, since it was difficult to test whether the generator really implements the semantics of Montages. In contrast, the meta-interpreter written in X ASM is very compact and serves both as semantics and as implementation; there is thus no problem of a mismatch between semantics and implementation. Although executing the meta-interpreter is far too slow for real applications, it can still be used for testing. Once a problem is solved successfully with the meta-interpreter, one has confidence in the functionality of the system. The result of the partial evaluation can then be tested against the existing reference implementation given by the meta-interpreter. Further, we found that the partial evaluator gives us a lot of freedom to identify the variable and static aspects of a system at a late stage, or even dynamically. We can choose freely which parts of the system should be interpreted, allowing them to be changed dynamically, and which parts are partially evaluated, resulting in specialized code. In Section 9.1.2 we show for instance how Montages can be specialized and transformed using partial evaluation. The traditional choices of DSL interpreter or DSL compiler are only special cases of the possible choices: they assume that the language specification is fixed. In some cases it is beneficial to leave part of the language specification interpreted, or to assume part of the program input to be fixed. Often the partial evaluator must be called at run-time, for instance after a number of configuration files have been read. The following chapters build up the tools which are needed to define the new system architecture in a formal way.
In Chapter 4 we introduce the specification language X ASM; in Chapter 5 X ASM is extended with features allowing for parameterized signatures and partial evaluation; in Chapter 6 we apply the introduced techniques to simplify and compile TFSMs; in Chapter 7 the kind of attribute grammars used by Montages is formalized; and finally in Chapter 8 we give the Montages meta-interpreter, serving in the new architecture both as semantics and as implementation of Montages.

4 eXtensible Abstract State Machines (X ASM)

eXtensible Abstract State Machines (X ASM) (4; 11; 5) has been designed and implemented by Anlauff as a formal development tool for the Montages project. Recently X ASM has been put in the open source domain (8). Unfortunately a formal semantics of X ASM has not been given up to now. We streamline Anlauff's original design and present a denotational semantics, complementing the existing informal description. In fact we found that X ASM implements a semantic generalization of Gurevich's Abstract State Machines (ASMs) (79; 80; 81; 82). The initial idea for this generalization came from May's work (150), which is the first paper formalizing sequential composition, iteration, and hierarchical structuring of ASMs. May notes that his approach complements

"... the method of refining Evolving Algebras1 by different abstraction levels (31). There, the behavior of rules performing complex changes on data structures in abstract terms is specified on a lower level in less abstract rules, and the finer specification is proven to be equivalent. For execution, the coarser rule system is replaced by the finer one. In contrast, in the hierarchical concept presented here, rules specifying a behavior on a lower abstraction level are encapsulated as a system which is then called by the rules on the above level." ((150), Section 6, page 14, 29ff)

X ASM embeds this idea in the form of the "X ASM call" into a realistic programming language design.
The X ASM call allows modeling recursion in a very natural way, corresponding directly to recursive procedure calls in imperative programming languages. Arguments can be passed "by value", part of the state can be passed "by reference", the "result" of the call is returned as a value allowing for functional composition, and finally the "effects" of the called machine are returned at once, maintaining the referential transparency property of non-hierarchical ASMs. Börger and Schmidt give a formal definition of a special case of the X ASM call (32), where sequentiality, iteration, and parameterized, recursive ASM calls are supported. In their framework a so-called "submachine" is not executed repeatedly until it terminates, but only once. The X ASM behaviour of repeated execution can be simulated by explicit sequentiality, but unfortunately they exclude the essential feature of both Anlauff's and May's original call: that a call may return not only update sets, but also a value. This restriction hinders the use of their call for the modeling of recursive algorithms. Of course one could argue that returning a result from their "submachine" call can be simulated by encoding the return value in some global function, but the essence of ASM formulations is to give a "direct, essentially coding free model" (81). The full X ASM call leads to a design where every construct (including the expressions and rules of Gurevich's ASMs) is denoted by both a value and an update set. This is a generalization of Gurevich's definition of ASMs, where the meaning of an expression is denoted by a value and the meaning of a rule is denoted by an update set (82).

1 Evolving Algebras is the previous name of ASMs.
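This generalized denotation can be sketched as an evaluator in which every construct yields a pair of a value and an update set: expressions contribute a value and an empty update set, rules contribute updates, and a call can contribute both, which is what lets a called machine return a result while the effects are still applied atomically. The term encoding below is an illustrative assumption, not the X ASM syntax.

```python
# Sketch of the "value and update set" denotation. Updates are pairs
# (location, new_value); denote() never modifies the state itself.

def denote(node, state):
    """Return (value, updates) for a construct in a given state."""
    kind = node[0]
    if kind == "const":                     # ("const", v)
        return node[1], set()
    if kind == "loc":                       # ("loc", name): read a location
        return state.get(node[1]), set()
    if kind == "update":                    # ("update", name, expr)
        val, ups = denote(node[2], state)
        return None, ups | {(node[1], val)}
    if kind == "block":                     # ("block", r1, r2, ...)
        ups = set()
        for rule in node[1:]:
            _, u = denote(rule, state)
            ups |= u                        # parallel composition of rules
        return None, ups
    if kind == "call":                      # ("call", body, result_expr)
        _, ups = denote(node[1], state)
        new_state = dict(state)
        new_state.update(dict(ups))         # effects visible to the result
        val, _ = denote(node[2], new_state)
        return val, ups                     # a value AND an update set
    raise ValueError(kind)

state = {"x": 1}
machine = ("call",
           ("block", ("update", "x", ("const", 5))),
           ("loc", "x"))
value, updates = denote(machine, state)
print(value, updates)    # the call returns 5 and the update set {('x', 5)}
```

In Gurevich's original setting only the first two and the middle two cases exist (expressions denote values, rules denote update sets); the call case is the point where the two denotations are combined.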
In the context of this thesis, X ASM is used for defining the actions and firing conditions of the Montages formalism, and the X ASM extensions defined in later chapters will be used to give a formal semantics to Montages. In Section 4.1 ASMs are introduced from a programmer's point of view, looking at them as an imperative language which can be used to specify algorithms on various abstraction levels. The denotational semantics of ASMs, as defined by Gurevich (82), is given in Section 4.2.2 Based on a unification and generalization of this semantics, the X ASM extension of ASMs is motivated and formalized in Section 4.3. The complete X ASM language is a full-featured, component-based programming language. The features of a pure functional sublanguage of X ASM, including constructor terms, pattern matching, and derived functions, are given in Section 4.4, and the support for parsing in X ASM is described in Section 4.5. Finally, in Section 4.6 we discuss related work.

2 The formalizations of choose and extend chosen by Gurevich (82) are not standard denotational semantics and it may be argued that they are ambiguous. An inductive definition can solve this problem, but we wanted to build our definitions on Gurevich's original formulation.

4.1 Introduction to ASM

ASMs are an imperative programming language. An imperative program is built up from statements changing the state of the system, which is given by the current values of its data structures. A data structure is an abstract view of a number of storage locations. Typical examples of data structures are variables, arrays, records, or objects. Execution of statements results in a number of read and write accesses to visible and hidden storage locations. The higher the abstraction level of an imperative programming language, the more happens behind the scenes for each statement.
Ousterhout analyzes the increase of work done per statement for imperative languages of different abstraction levels, starting from machine languages, over system programming languages, up to scripting languages (173). On average, each line of code in a system programming language such as C or Java translates to about five machine instructions, which directly perform read and write accesses on the physical machine. Scripting languages, such as Perl (223; 222), Python (219; 146), Rexx (71; 169), Tcl (172; 171), Visual Basic (which was "created" as a combination of Denman's MacBasic and Atkinson's HyperCard (67)), and the Unix shells (145) feature statements which execute hundreds or thousands of machine instructions.

4.1.1 Properties of ASMs

Unlike the statements of the mentioned programming languages, ASM statements are not executed sequentially, but in parallel. It is therefore difficult to compare ASMs with these formalisms, or to fit them into Ousterhout's taxonomy. Rather than triggering a number of sequential steps of a given physical machine, the parallel rules define a new, tailored abstract machine. ASMs are therefore very well suited to describe the semantics of programming systems on various abstraction levels. The parallel execution of ASM statements allows an arbitrary amount of functionality to be bundled into one state transition of a system. In traditional imperative languages, regardless of whether they are machine, system, or scripting languages, the amount of work done in one step is fixed by the functionality of the statements featured by the language. In ASM it is therefore relatively easy to tailor a parallel block of statements whose repeated execution results in a run of states corresponding exactly to the states of the algorithm to be modeled. Another important difference of ASMs with respect to the mentioned imperative formalisms is the absence of specialized value-types, data-structures, and control-statements.
In ASMs there exist no integer, real, or boolean value-types; the usual variable, array, or record data-structures are missing; neither while, repeat, nor loop statements are available. Instead the following solutions are chosen in the ASM formalism:

value-types ASMs feature only one type of value, the elements. A typical implementation of ASMs provides a number of predefined elements, like numbers, strings, and booleans, as well as elements which can be created at runtime using the extend construct of ASMs; examples of such dynamically created elements include objects, as well as abstract storage locations. All of them are considered to be elements.

data-structures ASMs feature a unique, universal, n-dimensional data structure that corresponds to an n-dimensional hash table. This data structure is called an n-ary dynamic function. A dynamic function f can be evaluated like a normal function, f(a1, ..., an), where a1, ..., an are ASM elements. However, it can also be updated, f(a1, ..., an) := a0, where a0 denotes the new value of f at the point (a1, ..., an). The resulting definitions of the dynamic functions represent the state of an ASM, similar to the way the values of variables, arrays, and records represent the state of an imperative program. The single locations, consisting of a function name and an argument tuple, can also be considered to be the storage locations of an underlying abstract machine. 0-ary functions are used to model variables; unary functions are used to model arrays and records. A set or universe is modeled by its characteristic function, mapping all members of the universe to true, and all other elements to false. Functions mapping all arguments either to true or false are called relations. Universe is a synonym for unary relation.

control-statements Instead of explicit loop or iteration constructs, an ASM program is automatically repeated until it terminates. The termination condition is a fixpoint of state changes, i.e. if a rule generates no more updates, it terminates. To control the repeated execution of an ASM rule modeling an algorithm, ASMs feature an if-then-else statement, which allows statements to be executed conditionally, and a number of statement quantifiers, which allow sets of statements to be constructed depending on the current state.

While these features look exotic to most programmers, they have proven useful in our context. Programming an algorithm in ASM allows one to concentrate on the conceptual structure of the state, and on the evolution of that state at a granularity which is completely controllable. Gurevich proves that every sequential algorithm can be modeled by an ASM which makes exactly the same steps as the modeled algorithm is intended to make (84). The last property has been formulated in the ASM thesis (79), and a large number of case studies have been elaborated to give evidence for the thesis, not only with respect to sequential, but also with respect to distributed algorithms. A summary of all case studies has been published (29) and further discussion of related work is found in Section 4.6.

4.1.2 Programming Constructs of ASMs

ASM statements are built by six different rule constructors.

Update Rule

The basic update rule is used to redefine an n-ary function at one point. Given the rule

f(t1, ..., tn) := t0

first the terms t0, ..., tn are evaluated to elements a0, ..., an, and then the function f is redefined such that in the next state f(a1, ..., an) = a0 holds. Please note that the equation f(t1, ..., tn) = t0 may never hold, since in parallel to the given redefinition of f, the functions used to build the terms t0, ..., tn may be redefined as well, such that in the next state they evaluate to different elements. For instance the rule

x := x + 1

will never result in a situation where x = x + 1 holds. But if before the execution of the rule x = x0 holds, then after the execution x = x0 + 1 holds.

Parallel Composition

ASM rules are composed in parallel.
There are thus no intermediate states if a block of ASM code is executed, and the order of ASM statements in the block does not influence the behavior. Further, the same expression has the same value, independent of where in the block it appears. This property is known as referential transparency (RT) from functional programming. If a language has RT, then the value of an expression depends only on the values of its sub-expressions (and not, for instance, on how many times or in which order the sub-expressions are evaluated). These properties considerably influence the style of the resulting descriptions. The standard example showing the effect of parallel composition of rules is the following swap of two variables3 x and y.

x := y
y := x

If this rule is executed, the values of x and y are exchanged. In contrast to sequential programming languages, there is no need to use a help variable, as done in the following minimal sequential version:

tmp := x; x := y; y := tmp;

Unlike the sequential version, the above parallel rule will never terminate, since it updates x and y in each step, and a state fixpoint is thus never reached. The following example shows a terminating parallel rule.

Consider the situation where we have three variables, x1, x2, and x3. All of them are initially set to the value undef. In each step of the algorithm, x1 takes the value 1, x2 takes x1's value of the previous step, and x3 takes x2's value of the previous step. It will thus take three steps until the value 1 is propagated to x3. The ASM program AP corresponding to our algorithm is

ASM 1:
asm AP is
  functions x1, x2, x3
  x1 := 1
  x2 := x1
  x3 := x2
endasm

The variables are declared as dynamic functions with arity 0. By default, at the beginning all dynamic functions evaluate to undef. The requirements for how values are propagated are directly expressed as the three parallel updates.

3 Variables are 0-ary functions in ASM terminology.
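The two-phase nature of a parallel block — read everything in the old state, then apply all updates at once — can be mimicked with a small helper. This is an illustrative Python sketch, not ASM syntax; the helper name step is our own:

```python
def step(state, rule):
    """Fire one parallel ASM step: the rule reads only the old state
    and yields a set of (location, value) updates, applied at once."""
    new_state = dict(state)
    for location, value in rule(state):
        new_state[location] = value
    return new_state

# the parallel swap x := y, y := x needs no helper variable
swap = lambda s: {("x", s["y"]), ("y", s["x"])}
print(step({"x": 1, "y": 2}, swap))   # {'x': 2, 'y': 1}

# the program AP: x1 := 1, x2 := x1, x3 := x2, all in parallel
ap = lambda s: {("x1", 1), ("x2", s["x1"]), ("x3", s["x2"])}
s = {"x1": None, "x2": None, "x3": None}   # None plays the role of undef
for _ in range(3):
    s = step(s, ap)
print(s)   # {'x1': 1, 'x2': 1, 'x3': 1}
```

Because each rule sees only the frozen old state, the order in which the updates are written is irrelevant, exactly as described above.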
After the first step of AP, x1 equals 1, but the remaining functions still equal undef. After the second step, both x1 and x2 equal 1, but x3 is still undef. After the third and all following steps, all three functions evaluate to 1. The system terminates after the fourth step, since the state of the system no longer changes, i.e. a fixpoint has been reached.

Consistency

At this point we would like to raise the issue of inconsistent rules. If the same variable is updated to different values in parallel, for instance by the rule

x := 1
x := 2

then an inconsistent state is reached and the calculation is aborted. Throughout the thesis we assume consistent rules, although it has to be noted that in general consistency of a rule cannot be guaranteed statically.

Conditional Rules

The conditional rule allows execution to be guarded with predicates. One special application of the conditional rule is to model sequential execution with ASMs. Typically a 0-ary function mode is used to model an abstract program counter. For instance the following sequential algorithm

var x = 1, length = 10
array a, f
1  x := x + 1;
2  a(x) := f(x);
3  if x < length goto 1
4  end

can be modeled as the following ASM.

ASM 2:
asm ModeTest is
  functions mode <- 1, x, length, a(_), f(_)
  if mode = 1 then
    x := x + 1
    mode := 2
  elseif mode = 2 then
    a(x) := f(x)
    mode := 3
  elseif mode = 3 then
    if x < length then
      mode := 1
    else
      mode := 4
    endif
  endif
endasm

In fact most ASMs given in the literature follow more or less this pattern to model sequentiality. The advantage of ASMs is that they allow us to abstract from low-level intermediate steps. In typical ASM applications the number of sequential steps is relatively small and therefore the presented solution is acceptable.

Do-for-all Rules

The do-for-all rule allows an ASM rule to be triggered for a number of elements contained in a universe and fulfilling a certain predicate.
Given a universe U containing three elements e1, e2, e3 and a predicate Q over the dynamic functions and the bound variable u, the rule

do forall u in U: Q(u)
  f(u) := 3
enddo

corresponds exactly to

if Q(e1) then f(e1) := 3 endif
if Q(e2) then f(e2) := 3 endif
if Q(e3) then f(e3) := 3 endif

where Q(e) is Q(u) with the bound variable u replaced by the element e.

As a further do-forall example, consider a generalization of algorithm AP (ASM 1) to n variables instead of three. We number the variables and use a unary dynamic function x(_) mapping the number of a variable to its value. This corresponds to an array of variables. To trigger the updates, we use a rule quantifier, triggering the update x(i - 1) := x(i) for each i ranging from 2 to n. The argument n is passed as parameter to the ASM, which looks as follows.

ASM 3:
asm AP'(n) is
  function x(_)
  do forall i in Integer: i >= 2 and i <= n
    x(i-1) := x(i)
  enddo
endasm

This algorithm will terminate after n steps.

Choose Rules

The choose rule works similarly to the do-forall rule, but the rule is instantiated only once, for one element of the universe fulfilling the predicate. The ifnone clause of the choose rule allows an alternative rule to be given if there is no such element. Given again a universe U containing three elements e1, e2, e3 and a predicate Q over the dynamic functions and the bound variable u, the rule

choose u in U: Q(u)
  f(u) := 3
endchoose

corresponds to the empty rule if neither Q(e1), Q(e2), nor Q(e3) holds, and otherwise to the rule f(e) := 3, where e is nondeterministically chosen from those elements of U for which Q holds, i.e. from {e in U | Q(e)}.

As an example consider a situation where messages have been collected in a universe MessageCollector. A predicate ReadyToProcess(_) decides which of these messages can be processed. Processed messages are removed from the universe MessageCollector.
Please remember that universes are modeled by their characteristic functions. An element e is therefore removed from the declared universe by the rule MessageCollector(e) := false. If there is no message remaining to be processed, the function mode is set from its initial value undef to "ready". For simplicity we give no details on Process(_) and the predicate ReadyToProcess(_).

ASM 4:
asm ProcessMessages is
  universe MessageCollector
  function mode
  ...
  choose m in MessageCollector: ReadyToProcess(m)
    Process(m)
    MessageCollector(m) := false
  ifnone
    mode := "ready"
  endchoose
endasm

Extend Rules

Extend rules allow us to introduce new elements. The rule

extend C with o
  x := o
endextend

extends a universe C with a new element. This element is accessible within the extend rule as the bound variable o. The element is implicitly added to C by triggering C(o) := true. Further, in the example, the new element is assigned to the variable x. Intuitively this corresponds to an x := new C statement known from object-oriented languages.

These examples only give a rough overview of the existing programming constructs in ASM. The detailed definition and formal semantics are given in the next section.

4.2 Formal Semantics of ASMs

The mathematical model behind ASMs is that a state is represented by an algebra or Tarski structure (207), i.e. a collection of functions and a universe of elements, and state transitions occur by updating functions pointwise and creating new elements. Of course not all functions can be updated. The basic arithmetic operations (like add, which takes two operands) are typically not redefinable. The updatable or dynamic functions correspond to the data-structures of imperative programming languages, while the static functions correspond to traditional mathematical functions whose definition does not depend on the current state. All functions are defined over the set U of elements.
In ASM parlance U is called the superuniverse. This set always contains the distinct elements true, false, and undef. Apart from these, U can contain numbers, strings, and possibly anything, depending on what is being modeled. Subsets of the superuniverse U, called universes, are modeled by unary functions from U to {true, false}. Such a function returns true for all elements belonging to the universe, and false otherwise. The universe Boolean consists of true and false. A function f from a universe V to a universe W is a unary operation on the superuniverse such that f(a) is in W for all a in V, and f(a) = undef otherwise. Functions from Cartesian products of the superuniverse to Boolean are called relations. By declaring a function as a relation, it is initialized for all arguments with false. A universe corresponds to a unary relation. Both universes and relations are special cases of functions. The dynamic functions not being relations are initially equal to undef for all arguments.

Formally, the state S of an ASM is a mapping from a signature Σ (which is a collection of function symbols) to actual functions. We use f_S to denote the function which corresponds to the symbol f in the state S. As mentioned above, the basic ASM transition rule is the update. An update rule is of the form

f(t1, ..., tn) := t0

where t1, ..., tn and t0 are closed terms (i.e. terms containing no free variables) in the signature Σ. The semantics of such an update rule is this: evaluate all the terms in the given state, and redefine the function corresponding to f at the value of the tuple resulting from evaluating (t1, ..., tn) to the value obtained by evaluating t0. Such a pointwise redefinition of a function is called an update. Rules are composed in a parallel fashion, so the corresponding updates are all executed at once. Apart from the basic transition rule shown above, there also exist conditional rules, do-for-all rules, choose rules and lastly extend rules. Transition rules are recursively built up from these rules.
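The notions of state, location, and pointwise update can be made concrete with a short sketch. This is hypothetical illustrative Python, not part of any ASM implementation:

```python
UNDEF = None  # plays the role of the ASM element undef

class DynFunction:
    """An n-ary dynamic function: a mapping from argument tuples to
    elements, equal to a default (undef, or false for relations)
    everywhere it has not been redefined pointwise."""
    def __init__(self, default=UNDEF):
        self.table = {}
        self.default = default

    def __call__(self, *args):        # evaluation f(a1, ..., an)
        return self.table.get(args, self.default)

    def fire(self, update):           # apply the update f(args) := value
        args, value = update
        self.table[args] = value

# a state is a mapping from function symbols to actual functions;
# a universe such as Node is just a unary relation defaulting to false
state = {"f": DynFunction(), "Node": DynFunction(default=False)}
state["f"].fire((("a",), 7))          # the update f(a) := 7
state["Node"].fire((("n1",), True))   # adds n1 to the universe Node
```

Every location not touched by an update keeps its default, which mirrors the initialization conventions stated above: undef for ordinary dynamic functions, false for relations.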
The semantics of a rule is given by the set of updates resulting from composing the updates of its rule components. This so-called update denotation of rules is formalized in the following.

Def. 1: Update denotation. The formal semantics of a rule R in a state S is given by its update denotation Upd(R, S), which is a set of updates. The resulting state transition changes the functions corresponding to the symbols in Σ in a pointwise manner, by applying all updates in the set.

The formal definition of an update is given as follows.

Def. 2: Update. An update is a triple (f, (a1, ..., an), a0) where f is an n-ary function symbol in Σ and a0, a1, ..., an are elements of the superuniverse. Intuitively, firing this update in a state S changes the function associated with the symbol f in S at the point (a1, ..., an) to the value a0, leaving the rest of the function (i.e. its values at all other points) unchanged.

Firing a rule is done by firing all updates in its update denotation.

Def. 3: Successor state. Firing the updates in Upd(R, S_i) in the state S_i results in the successor state S_{i+1}. For any function symbol f from Σ, the relation between f_{S_i} and f_{S_{i+1}} is given by

  f_{S_{i+1}}(a1, ..., an) = a0                     if (f, (a1, ..., an), a0) in Upd(R, S_i)
  f_{S_{i+1}}(a1, ..., an) = f_{S_i}(a1, ..., an)   otherwise

There are two remarks concerning this definition. First, if there are two updates which redefine the same function at the same point to different values, the resulting equations are inconsistent, and the next state S_{i+1} cannot be calculated. Consistency of rules cannot be guaranteed in general, and an inconsistent rule results in a system abort. The second remark is about the completeness of the successor-state relation. The above complete definition of the next state (Definition 3) could be relaxed to a partial definition as follows:

Def. 4: Partial successor state. Firing the updates in Upd(R, S_i) in the state S_i results in a successor state S_{i+1}. For any function symbol f from Σ, the relation between f_{S_i} and f_{S_{i+1}} must be a model for the following equations:

  f_{S_{i+1}}(a1, ..., an) = a0   if (f, (a1, ..., an), a0) in Upd(R, S_i)

The advantage of the partial definition is that the evolution of the part of the state which does not change is not specified at all, and it is therefore easier to combine such definitions. This advantage becomes visible in approaches where ASM rules are modeled as equation systems, for instance if ASMs are modeled with Algebraic Specifications (125; 136; 177; 178). The complete definition results in an exploding number of equations (125; 136) while the partial definition allows this problem to be solved elegantly (178). Further, the partial definition (Definition 4) allows the equations of the subrules to be composed, whereas the complete definition does not allow for such a composition.

The different forms of rules are given below. We use Val(t, S) to denote the usual evaluation of a term t in the state S. In all definitions, t0, ..., tn are terms over Σ.

Def. 5: Update denotations of ASM rules.

Basic Update
if R = f(t1, ..., tn) := t0
then Upd(R, S) = { (f, (Val(t1, S), ..., Val(tn, S)), Val(t0, S)) }

Parallel Composition
if R = R1 ... Rk
then Upd(R, S) = Upd(R1, S) ∪ ... ∪ Upd(Rk, S)

Conditional Rules
if R = if t then R1 else R2 endif
then Upd(R, S) = Upd(R1, S)   if Val(t, S) = true
     Upd(R, S) = Upd(R2, S)   otherwise

Do-for-all
if R = do forall x in U : Q(x) R' enddo
then Upd(R, S) = Upd(R', S[x:=e1]) ∪ ... ∪ Upd(R', S[x:=ek])
where e1, ..., ek are the U elements fulfilling Q, and S[x:=e] is the state S with x interpreted as e.

Choose
if R = choose x in U : Q(x) R' ifnone R'' endchoose
then Upd(R, S) = Upd(R', S[x:=ORACLE])   if some e in U_S fulfills Q(e)4
     Upd(R, S) = Upd(R'', S)             otherwise
where ORACLE is a nondeterministically chosen element e in U_S fulfilling Q(e).

Extend
if R = extend U with x R' endextend
then Upd(R, S) = Upd(R', S[x:=e]) ∪ { (U, (e), true) }
where e does not belong to the domain or the co-domain of any of the functions in state S, i.e. it is a new, unused element.

4 As mentioned, U_S is the definition of U in state S.
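The update denotations translate almost line by line into a recursive interpreter. The following Python sketch is a hypothetical encoding (rules as tagged tuples, terms as callables over the state) covering updates, parallel composition, conditionals, and do-for-all, with the successor state built by firing the whole update set at once:

```python
def upd(rule, S):
    """Compute the update set Upd(R, S) for rules encoded as tagged
    tuples; the read-only state S maps function names to
    {argument-tuple: value} tables."""
    tag = rule[0]
    if tag == "update":                  # f(t1, ..., tn) := t0
        _, f, ts, t0 = rule
        return {(f, tuple(t(S) for t in ts), t0(S))}
    if tag == "par":                     # parallel composition
        return set().union(*[upd(r, S) for r in rule[1]])
    if tag == "if":                      # if t then R1 else R2 endif
        _, t, r1, r2 = rule
        return upd(r1, S) if t(S) else upd(r2, S)
    if tag == "forall":                  # do forall x in U : Q(x) R'
        _, universe, Q, mk_rule = rule
        return set().union(set(), *[upd(mk_rule(e), S)
                                    for e in universe(S) if Q(e, S)])
    raise ValueError(tag)

def fire(S, updates):
    """Build the successor state by applying all updates at once,
    assuming the update set is consistent."""
    successor = {f: dict(table) for f, table in S.items()}
    for f, args, value in updates:
        successor[f][args] = value
    return successor

# x := x + 1 fired once in a state where x = 1
S = {"x": {(): 1}}
inc = ("update", "x", (), lambda S: S["x"][()] + 1)
S = fire(S, upd(inc, S))
print(S["x"][()])   # 2
```

Note how upd only reads S, so every term is evaluated in the old state; the state changes only in fire, which mirrors the separation between update denotation and successor state in the definitions above.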
4.3 The X ASM Specification Language

Due to the fact that the ASM approach defines a notion of executing specifications, it provides a perfect basis for a language which can be used as a specification language as well as a high-level programming language. However, in order to become a realistic programming language, such a language must, besides other features, add a modularization concept to the core ASM constructs, in order to make it possible to structure large-scale ASM formalizations and to flexibly define reusable specification units. X ASM realizes a component-based modularization concept based on a unification and generalization of ASM's rule and expression semantics. The unification of rules and expressions is achieved by considering each ASM construct, whether rule or expression, to have both an update set denotation and a result it evaluates to, the so-called value denotation. In addition to the existing ASM constructs, we introduce a new feature, so-called external functions.5 External functions can be evaluated like normal functions, but as a result, both a value and an update set are returned. For each external function, we need to specify its update denotation and its value denotation. Both denotations can be freely defined. The formal definition of external functions, their denotations, and the propagation of these denotations through the existing ASM term and rule constructors is given in Section 4.3.1. While external functions make the calculation of update sets, and thus the semantics of X ASM rules, extensible, we introduce a second new construct called environment functions in order to make X ASM open to outside computations. Environment functions are special dynamic functions whose initial definition is given as a parameter to an ASM. After an ASM terminates, the aggregated updates of the environment functions are returned as the update denotation of the complete ASM run.
The formalization of ASM runs, environment functions, the update denotation of an ASM run in terms of a state delta, and the value denotation of an ASM run are given in Section 4.3.2. For intuition, it is a good idea to think of environment functions as dynamic functions passed to an ASM as reference parameters, and of external functions as locally declared procedures. Having both concepts, we can plug the two mechanisms together by defining the update and value denotations of an external function by means of an ASM run. Thus the evaluation of such an external function corresponds to running, or calling, another ASM. The environment functions of the called ASM are given as functions of the calling ASM. The details of how an external function can be realized as an ASM are given in Section 4.3.3. The formalization is given by using the definitions of the update and value denotations of an ASM run, as defined in Section 4.3.2, as the definitions of the update and value denotations of the realized external function.

5 In the context of ASMs the term "external function" has been used in a different way. For the sake of simplicity we are using the term "external function" only in connection with X ASM, and not with ASMs, and we are always referring to the X ASM definition of "external function".

4.3.1 External Functions

In Section 4.2 the denotation of each ASM rule construct has been given as a set of updates. The denotation of terms has been formalized by means of the usual Val term evaluation. The denotation of each existing ASM construct is thus either a set of updates or an element, the result of its evaluation. The ASM constructs denoted by updates are the rules, and the ASM constructs denoted by values are the terms. The idea of eXtensible ASMs (X ASM) is to unify rules and terms by considering each construct to have both an update and a value denotation.
In pure ASMs, rules would have the value denotation undef and expressions would have the empty set as update denotation. In X ASM, external functions are introduced as a new construct having both denotations. In order to avoid confusion with the standard Val function, we introduce a new function which gives the value denotation.

Def. 6: Value denotation. The value denotation of each rule or expression R in a state S is defined to be an element of the superuniverse, given by Eval(R, S).

The external functions are declared using the keyword external function. Syntactically, external functions are used like normal functions. Function composition which involves external functions may thus result in updates, and we therefore need to redefine the update denotations of all rule constructions involving expressions, by refining Definition 5. In order to simplify the presentation of the semantics, we denote the external function symbols with underlined symbols, for instance f. These symbols are grouped in the set Σ_ext of external symbols.

Def. 7: Extended signature. The signature Σ is extended with the symbols Σ_ext of external functions to the signature Σ' = Σ ∪ Σ_ext.

Since the external functions are not part of an ASM's state, the definition of the state S is not affected; it is still a mapping from the signature Σ of dynamic functions to the actual definitions of these functions. However, terms can be built over the extended signature Σ'.

Def. 8: Denotations of external functions. For each external function f, its update and value denotations in state S are given by

  ExtUpd(f(a1, ..., an), S)  and  ExtEval(f(a1, ..., an), S)

X ASM features interfaces which allow these definitions to be given in arbitrary external languages, which leads to a non-formal system, or in X ASM itself, which leads to a formal system, as described in Section 4.3.3.
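The pair-style denotations can be sketched as an evaluator that returns both Eval and Upd for every construct. This is illustrative Python under our own encoding assumptions (tagged tuples, an external function modeled as a callable returning a value and an update set); it is not Xasm's actual implementation:

```python
def eval_both(expr, S):
    """Return the pair (Eval(R, S), Upd(R, S)) for a tiny term
    language in which external functions may contribute updates."""
    tag = expr[0]
    if tag == "const":
        return expr[1], set()
    if tag == "fun":                 # dynamic function application
        _, f, args = expr
        vals, upds = zip(*(eval_both(a, S) for a in args)) if args else ((), ())
        return S[f].get(tuple(vals)), set().union(set(), *upds)
    if tag == "ext":                 # external function: value and updates
        _, ext, args = expr
        vals, upds = zip(*(eval_both(a, S) for a in args)) if args else ((), ())
        value, extra = ext(vals, S)  # plays the role of ExtEval and ExtUpd
        return value, set().union(set(), *upds) | extra
    raise ValueError(tag)

# a hypothetical external "counter": returns the next number and
# records the corresponding update to the 0-ary function count
def tick(vals, S):
    n = S["count"].get((), 0) + 1
    return n, {("count", (), n)}

S = {"count": {}}
value, updates = eval_both(("ext", tick, []), S)
# value == 1  and  updates == {("count", (), 1)}
```

The key point mirrored here is that evaluating a term never mutates the state directly: updates contributed by external functions are merely collected and propagated upward, to be fired together with the rest of the rule's update set.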
In the following we give the definitions of Upd and Eval for function composition of dynamic functions, function composition of external functions, and all six rule constructors.

Def. 9: Update and value denotations of X ASM constructs. Assume in all following definitions that

  t0, ..., tn are terms over Σ',
  e0 = Eval(t0, S), ..., en = Eval(tn, S) are the elements these terms evaluate to,
  f in Σ is the symbol of a dynamic function, and
  g in Σ_ext is the symbol of an external function.

Function Composition
if R = f(t1, ..., tn)
then Upd(R, S) = Upd(t1, S) ∪ ... ∪ Upd(tn, S)
     Eval(R, S) = f_S(e1, ..., en)

External Function Composition
if R = g(t1, ..., tn)
then Upd(R, S) = Upd(t1, S) ∪ ... ∪ Upd(tn, S) ∪ ExtUpd(g(e1, ..., en), S)
     Eval(R, S) = ExtEval(g(e1, ..., en), S)

Basic Update
if R = f(t1, ..., tn) := t0
then Upd(R, S) = { (f, (e1, ..., en), e0) } ∪ Upd(t0, S) ∪ ... ∪ Upd(tn, S)
     Eval(R, S) = undef

Conditional Rules
if R = if t then R1 else R2 endif
then Upd(R, S) = Upd(t, S) ∪ Upd(R1, S)   if Eval(t, S) = true
     Upd(R, S) = Upd(t, S) ∪ Upd(R2, S)   otherwise
     Eval(R, S) = Eval(R1, S)             if Eval(t, S) = true
     Eval(R, S) = Eval(R2, S)             otherwise

Parallel Composition
if R = R1 ... Rk
then Upd(R, S) = Upd(R1, S) ∪ ... ∪ Upd(Rk, S)
     Eval(R, S) = undef

Do-for-all
if R = do forall x in U : Q(x) R' enddo
then Upd(R, S) = Upd(R', S[x:=e1]) ∪ ... ∪ Upd(R', S[x:=ek])
     Eval(R, S) = undef
where e1, ..., ek are the U elements fulfilling Q, and S[x:=e] is the state S with x interpreted as e.

Choose
if R = choose x in U : Q(x) R' ifnone R'' endchoose
then Upd(R, S) = Upd(R', S[x:=ORACLE])    if some e in U_S fulfills Q(e) in S
     Upd(R, S) = Upd(R'', S)              otherwise
     Eval(R, S) = Eval(R', S[x:=ORACLE])  if some e in U_S fulfills Q(e) in S
     Eval(R, S) = Eval(R'', S)            otherwise
where ORACLE is a nondeterministically chosen element e in U_S fulfilling Q(e) in S.

Extend
if R = extend U with x R' endextend
then Upd(R, S) = Upd(R', S[x:=e]) ∪ { (U, (e), true) }
     Eval(R, S) = undef
where e does not belong to the domain or the co-domain of any of the functions in state S.

4.3.2 Semantics of ASM Runs and Environment Functions

We have given the semantics of ASM rules and expressions in terms of defining the relation of one state to the next. In this section we formalize how the state of an ASM is initialized, by means of parameters and so-called environment functions, and what an ASM run is.
We give both the value and update denotations of ASM runs. We mentioned earlier that dynamic functions are initialized everywhere with undef, except for relations, which are initialized everywhere with false. Parameters and environment functions allow functions to be initialized with different values. As an example we take the following ASM.

ASM 5:
asm InitializationExample(p1, p2)
  updates function f(_,_)
  accesses function g(_,_)
is
  function h(_,_)
  R
endasm

The example shows two parameters, p1 and p2, two environment functions, f and g, and one normal dynamic function h. If the ASM is started, or called, actual values for the parameters have to be given, as well as definitions for the environment functions. Parameters result in normal, 0-ary dynamic functions, which are initialized with the actual values. Environment functions are used to initialize functions of arity higher than zero. As we can see, there are two ways to declare environment functions, one for read-only access, as "accesses", and the other for read-write access, as "updates". In addition to such declared functions there is the special 0-ary function result, which is used to return values from an ASM run. Intuitively, environment functions correspond to reference parameters passed to an ASM call. The aggregated updates to these functions constitute the update denotation of an ASM run. In contrast, parameters can be considered call-by-value arguments. Updates to such arguments are possible in X ASM, but they have only local effects. The signature of the state of an ASM thus consists of the normal dynamic functions, the 0-ary dynamic functions initialized by actual parameters, the environment functions, and the special function result.

Def. 10: Local and environment functions. The signature Σ of dynamic functions is built from a set of locally defined functions Σ_loc, the set of parameter functions Σ_par, the set of environment functions Σ_env, and the special function result.
All of them must be pairwise disjoint:

  Σ = Σ_loc ∪ Σ_par ∪ Σ_env ∪ { result }

An ASM can now be called by providing it with actual parameters and an initial state for the environment functions.

Def. 11: ASM call. An ASM with rule R, parameters p1, ..., pn, and environment functions Σ_env is called by the following triple:

  ((a1, ..., an), S_env)

where a1, ..., an are actual values for the parameters of the ASM, and S_env is a mapping from the function symbols of Σ_env to actual definitions of these functions.

Given an ASM call, we can define the initial state of the called ASM as follows.

Def. 12: Initial state. Given an ASM call ((a1, ..., an), S_env) of an ASM with parameters p1, ..., pn, the initial state S_0 of the called ASM is defined by

  (p_i)_{S_0} = a_i   for 1 <= i <= n
  f_{S_0} = S_env(f)  for all f in Σ_env

and all other dynamic functions have their default initialization.

Given the definition of the initial state and of the next-state relation, we can define the fixpoint semantics of an ASM run as follows.

Def. 13: Fixpoint semantics. Given an ASM call ((a1, ..., an), S_env), the definition of the initial state S_0 of such a call according to Definition 12, and the relation of state S_i to S_{i+1} according to Definition 3, we define the fixpoint semantics Fix as a mapping from ASM calls to final states, or to ⊥ if there is no fixpoint:

  Fix((a1, ..., an), S_env) = S_k   if k is the smallest index such that S_k = S_{k+1}
  Fix((a1, ..., an), S_env) = ⊥     if no such k exists

where ⊥ denotes a non-terminating call.

Given the fixpoint semantics of an ASM call, we can define the update and value denotations of such a call. The value denotation is simply the value of the function result in the final state of the call.

Def. 14: Value denotation of ASM call. Given an ASM call ((a1, ..., an), S_env) and the fixpoint semantics according to Definition 13, the value denotation CallEval is the value of result in the final state of the call:

  CallEval((a1, ..., an), S_env) = result_{Fix((a1, ..., an), S_env)}

The update denotation CallUpd of a call is given by the aggregated updates to the environment functions. The aggregated updates are calculated by comparing the initial state and the terminal state of these functions. The comparison of states is done by state subtraction.

Def. 15: State subtraction. Given two states S1 and S2 over the same signature Σ, the formal definition of state subtraction is

  S1 - S2 = { (f, (a1, ..., an), a0) | f in Σ; a0, a1, ..., an elements;
              f_{S1}(a1, ..., an) = a0 and f_{S2}(a1, ..., an) ≠ a0 }

Using this definition, the update denotation of an ASM call is defined as follows.

Def. 16: Update denotation of ASM call. Given an ASM call ((a1, ..., an), S_env), the signature Σ_env of environment functions, the fixpoint semantics according to Definition 13, and the definition of state subtraction according to Definition 15, the update denotation CallUpd is the environment part of the final state minus the initial state S_env of the environment functions:

  CallUpd((a1, ..., an), S_env) = Fix((a1, ..., an), S_env)|Σ_env - S_env

4.3.3 Realizing External Functions with ASMs

Having specified both external functions, for which we need to give the value and update denotations ExtEval and ExtUpd, and ASM calls, for which we defined the value and update denotations CallEval and CallUpd, the next natural thing to do is to use the denotations of an ASM call as the definitions of the denotations of an external function. In other words, we realize an external function with an ASM. The environment functions of the called ASM are naturally taken from the dynamic functions of the calling ASM, and the resulting updates to these functions thus fit naturally into the update set of the calling ASM. The definitions of the update set and value denotations of an external function realized by an ASM can now be given by using CallUpd and CallEval as the definitions of ExtUpd and ExtEval.

Def. 17: Denotations of ASM call. Assume the external function f to be implemented by the following ASM:

asm _f(p1, ..., pn)
  updates functions SIGMA_ENV
is
  functions SIGMA_LOC
  R
endasm

where SIGMA_ENV is the signature Σ_env of environment functions of the called ASM, and SIGMA_LOC is the signature Σ_loc of locally declared dynamic functions of the called ASM. Given a state S of the ASM calling f, the denotations ExtUpd and ExtEval are defined as follows.
    ExtUpd(f, (a1, ..., an), S) = CallUpd(a1, ..., an, S restricted to Sigma_env)
    ExtEval(f, (a1, ..., an), S) = CallEval(a1, ..., an, S restricted to Sigma_env)

Examples

Consider our previous example, the ASM AP. An ASM AQ can refer to AP by declaring it as an external "ASM" function, or external function for short.

ASM 6:

    asm AQ is
      function i <- 0
      external function AP
      if i < 10 then
        i := i + 1
        AP
      endif
    endasm

In AQ there is a local 0-ary function i, and the external function AP, which is realized as an ASM. The if-clause in the rule of AQ guarantees that AP is called 10 times. Each time AP is called, it runs until its termination, the final state of AP is interpreted as an update set, and the value of the function result in AP is used as return value. Since all of the functions updated by a run of AP are local to AP, the generated update set has no effect on the state of AQ. Further, in this simple case, the value of result is undef, since there is no update to result in AP. Thus the value denotation of calling AP is undef.

As a second example consider two ASMs A and B. We abstract from concrete rules and consider A to execute the parallel composition of a rule Ra and a call to B, while B is considered to execute a rule Rb. A has locally defined functions a1, ..., an and B has locally defined functions b1, ..., bm.

ASM 7:

    asm A is
      functions a1, ..., an
      external function B
      Ra
      B
    endasm

ASM 8:

    asm B
      updates functions a1, ..., an
    is
      functions b1, ..., bm
      Rb
    endasm

Fig. 38: ASM A calls ASM B

The interface of B determines that an ASM calling B must provide dynamic functions a1, ..., an which are allowed to be updated by B. The situation of A calling B is visualized in Fig. 38. In each step of A, the rule Ra as well as ASM B are executed. If B is called, the current state of A's functions is passed to B as the initial state of the environment functions.
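As an aside, the call mechanism of Definitions 13, 15, and 16 can be mimicked in a short sketch. The following Python fragment is our own illustration, not part of the X ASM machinery: a state is modeled as a dictionary from locations (function name, argument tuple) to values, the callee runs to a fixpoint, and the returned update set is the pointwise difference restricted to the environment signature.

```python
# Illustrative sketch only (our encoding, not X ASM itself): a state is a
# dict from locations (function name, argument tuple) to values.

def state_subtract(s1, s2):
    """Def. 15: the updates present in s1 that differ from s2."""
    return {loc: v for loc, v in s1.items() if s2.get(loc) != v}

def run_to_fixpoint(step, state):
    """Def. 13: iterate the next-state function until the state repeats."""
    while True:
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt

def call_upd(step, initial, env_signature):
    """Def. 16: environment part of (final state - initial state)."""
    final = run_to_fixpoint(step, initial)
    return {loc: v for loc, v in state_subtract(final, initial).items()
            if loc[0] in env_signature}

# A toy callee: increment the environment function x until it reaches 3,
# using a local helper tmp whose updates must not leak to the caller.
def step(s):
    if s[("x", ())] < 3:
        return {**s, ("x", ()): s[("x", ())] + 1, ("tmp", ()): 99}
    return s

updates = call_upd(step, {("x", ()): 0, ("tmp", ()): None}, {"x"})
assert updates == {("x", ()): 3}   # tmp is local and filtered out
```

The filtering step mirrors how the caller sees only the environment updates of a call, never the callee's local state.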
From this state, B runs until its termination, updating the state of its local functions as well as the state of the environment functions. After termination, the state of the local functions of B is discarded, and the state of the environment functions is compared with their initial state, as passed by the environment. The changes with respect to the initial state are returned as the update denotation of the B-call. The update denotation of the B-call is combined with the update denotation of the Ra-rule, and applied to the current state of A. Only now are A's locally defined functions really updated. The internal steps of B are not visible to A. From A's perspective, calling B is considered an atomic action. The X ASM call thus provides an abstraction from sequentiality.

Returning values

We have mentioned several times the special role of the function result, but we have not yet shown its use in examples. Based on the above definitions, result must be declared as a local function and updated like any other function. The termination of an ASM does not a priori depend on the state of result. A typical "factorial" program would look as follows.

ASM 9:

    asm factorial(n) is
      function result
      if result = undef then
        if n = 0 then
          result := 1
        else
          result := n * factorial(n-1)
        endif
      endif
    endasm

For convenience, a shorthand notation allows the user to skip the explicit declaration of the function result, as well as the outer "if result = undef" clause, and it introduces the more intuitive syntax "return x" instead of "result := x". Applied to the previous example, the shorthand notation results in the following formulation.

ASM 10:

    asm factorial(n) is
      if n = 0 then
        return 1
      else
        return n * factorial(n-1)
      endif
    endasm

As a last example of this section, we show a formulation of "factorial" which avoids call recursion.
ASM 11:

    asm factorial(n) is
      function n0 <- n, r <- 1
      if n0 > 0 then
        r := n0 * r
        n0 := n0 - 1
      else
        return r
      endif
    endasm

Every tail-recursive algorithm can be reformulated in this iterative style. We will use this style throughout the thesis, since it shows more clearly how ASMs work. In the following variant of factorial we use the fact that the parameters of an ASM can be used as normal 0-ary dynamic functions.

ASM 12:

    asm factorial(n) is
      function r <- 1
      if n > 0 then
        r := n * r
        n := n - 1
      else
        return r
      endif
    endasm

4.4 Constructors, Pattern Matching, and Derived Functions

Most theoretical case studies using ASMs start with a mathematical model of some static system, formalized as a fixed set of statically defined functions and elements, and add a number of dynamic functions on top of this algebra. With the features discussed so far, the static models must either be provided by an external implementation, or be simulated with dynamic functions as well.

4.4.1 Constructors

While experimenting with early versions of X ASM, we identified one mathematical concept which is, on one hand, often used and, on the other hand, very awkwardly simulated with dynamic functions. The identified concept is free generated terms. Unlike terms over dynamic functions, which initially all evaluate to the same element undef, free generated terms, or constructors, are expected to map to the same element if and only if all their arguments are equal. This concept corresponds to free data types in functional programming languages like Standard ML (155; 40) or term algebras in algebraic specifications (65). X ASM features an untyped variant of classical constructor terms, as well as pattern matching and derived functions. These three features form a pure functional subset of X ASM. In this section we give the details of these features. In functional languages, typically each argument of a constructor is typed with some free data type.
In contrast, X ASM constructors take arbitrary arguments, even dynamically allocated elements, and construct a unique element from each unique sequence of arguments. The definition of the two constructors

    constructors zero, successor(_)

thus not only creates the elements zero, successor(zero), successor(successor(zero)), ..., but also unexpected elements like successor(true) or successor(e'), where e' is an element created by an extend-rule. Since such dynamically created elements do not correspond to any symbol for built-in constants, X ASM allows the user to define constructor terms having no syntactical representation.

4.4.2 Pattern Matching

In combination with constructors, it is very useful to have pattern matching and derived functions. As an example for pattern matching, consider an abstract data type stack, specified by the following equations:

    top(push(s, v)) = v
    pop(push(s, v)) = s

Two constructors empty and push(_, _) are used to build stacks in the usual way. top(_) and pop(_) are declared as external functions and realized as ASMs. Within these ASMs, pattern matching is used.

ASM 13:

    constructors empty, push(_,_)
    external functions top(_), pop(_)

    asm top(s)
      accesses function push(_,_)
    is
      if s =~ push(&, &v) then
        return &v
      else
        return undef
      endif
    endasm

    asm pop(s)
      accesses functions empty, push(_,_)
    is
      if s =~ push(&s, &) then
        return &s
      else
        return empty
      endif
    endasm

We see the pattern matching symbol "=~" and the pattern variables, which all start with the symbol &. The plain symbol & is a placeholder for pattern variables whose value is not used. The matching expression is given as the condition of an if-then-else rule. If a match happens, the pattern variables can be used; otherwise they cannot. Thus pattern variables can only be used in the then-clause of an if-then-else rule.
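For readers who want to experiment, the stack example can be approximated in a few lines of Python. This is our own tuple encoding of constructor terms, not part of X ASM; the "=~" pattern match becomes a simple shape test.

```python
# Our own Python approximation of ASM 13: constructor terms are encoded
# as nested tuples, and the =~ pattern match becomes a shape test.

EMPTY = ("empty",)

def push(s, v):
    return ("push", s, v)

def top(s):
    # if s =~ push(&, &v) then return &v else return undef
    if s[0] == "push":
        return s[2]
    return None                # None plays the role of undef

def pop(s):
    # if s =~ push(&s, &) then return &s else return empty
    if s[0] == "push":
        return s[1]
    return EMPTY

s = push(push(EMPTY, 1), 2)
assert top(s) == 2
assert pop(pop(s)) == EMPTY
assert top(EMPTY) is None
```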
4.4.3 Derived Functions

A third construct which is useful in combination with constructors and pattern matching is the derived function. The value of a derived function is defined by an expression. The derived function

    derived function f(p1, ..., pn) == t

where t is a term built over the signature and the parameters p1, ..., pn, is semantically equivalent to an external function defined as follows.

    external function f(p1, ..., pn)

    asm f(p1, ..., pn)
      accesses ...
    is
      return t
    endasm

Tab. 3: Properties of X ASM function types

    Function type       updatable?   initial value    generate updates?
    dynamic function    yes          undef            no
    constructor         no           free-generated   no
    derived function    no           calculated       yes
    external function   yes          calculated       yes
    asm                 yes          calculated       yes

Using derived functions, the above example ASM 13 can be reformulated as follows:

ASM 14:

    constructors empty, push(_,_)
    derived function top(s) ==
      (if s =~ push(&, &v) then &v else undef)
    derived function pop(s) ==
      (if s =~ push(&s, &) then &s else empty)

4.4.4 Relation of Function Kinds

Using only constructors, pattern matching, and derived functions, X ASM can be used as a pure functional language. An arbitrary part of an X ASM specification can thus be written in the functional paradigm. However, if derived functions are defined over dynamic functions, their value depends on the state, and if derived functions are used in combination with external functions, they may even produce updates. Table 3 lists the different types of functions in X ASM, as well as the information

- whether they can be updated,
- what their initial value is, and
- whether they generate new updates if they are evaluated^6.

We marked both external functions and locally defined ASMs as updatable. This feature is useful for refining models by replacing dynamic functions with external functions, for instance databases.
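The free-generation property of constructors from Section 4.4.1 (equal constructor name and argument sequence denote the identical element, and nothing else collapses) can be sketched by interning. The following Python fragment is our own illustration; the interning table and function names are invented for the sketch.

```python
# Sketch (ours) of free generation: cons returns the *same* element for
# the same constructor name and argument sequence, and a distinct element
# otherwise -- even for arbitrary, untyped arguments.

_interned = {}

def cons(name, *args):
    key = (name, args)
    if key not in _interned:
        _interned[key] = object()   # a fresh, otherwise featureless element
    return _interned[key]

zero = cons("zero")
one = cons("successor", zero)
assert cons("successor", zero) is one             # injective and stable
assert cons("successor", one) is not one          # distinct arguments, distinct element
assert cons("successor", True) is cons("successor", True)  # untyped arguments allowed
```

The last assertion mirrors the "unexpected" terms like successor(true) discussed above: an untyped constructor accepts any argument.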
The X ASM implementation is organized such that first all read accesses to external functions are done, and then all updates.

4.4.5 Formal Semantics of Constructors

The concept of terms built up by constructors can be mapped to the ASM approach as follows: each of the function names may be marked as constructive, expressing that constructor functions are one-to-one and total.

^6 New updates are those resulting from the function evaluation itself, and not from the evaluation of the function's arguments.

Let C be the set of all constructive function symbols. If f in C is of arity n, g in C is of arity m, and t1, ..., tn, s1, ..., sm are terms over the signature, then the following conditions hold for all states S of the ASM:

(i) [[f(t1, ..., tn)]]_S = [[g(s1, ..., sm)]]_S iff f = g, n = m, and [[ti]]_S = [[si]]_S for all i,

where [[t]]_S stands for the evaluation of the term t in state S of the ASM. Informally speaking, this means that each constructive function is total with respect to the universe and injective.

(ii) For all i >= 0: [[f(t1, ..., tn)]]_{Si} = [[f(t1, ..., tn)]]_{Si+1}

This means that constructive functions do not change their values with time; whenever a new element is created, the domain of all constructive functions is automatically extended to the new element, and from that moment on, all elements constructed from the newly created element do not change in time either.

If f in C, then f is called a constructor, and terms f(t1, ..., tn) built only over C are called constructor terms. In the following, we use the constructor term t as a synonym for its unique value [[t]]_S.

4.5 EBNF and Constructor Mappings

X ASM features specialized programming constructs to define EBNF grammars, to parse strings according to these grammars, and to build during parsing a constructor term representing the AST.
In this section we introduce these programming-language-related features, which have been integrated into the X ASM language as a means to support the implementation of various meta-programming algorithms, such as the self-interpreter presented later (Section 5.4), type checkers, the attribute grammar engine (Section 7), and partial evaluators (Section 5.5), as well as the specification and implementation of Montages in Section 8. The existing X ASM implementation features a relatively direct integration with the Lex/Yacc tool-set, supporting only BNF rules instead of EBNF, and forcing the user to program the construction of constructor terms or other structures during parsing. We introduce here a refined version where full EBNF rules can be specified, and where the construction of the terms representing the AST is done with a declarative mapping from EBNF productions into constructor terms. The purpose of our refined definitions is to allow for a complete specification of the parsing and AST construction process of Montages, without having to code the detailed construction, and especially without having to simulate EBNF with BNF rules. We abstract here from the problems of integrating our refined features with a specific parser generator.

4.5.1 Basic EBNF Productions

As mentioned in Section 3, the EBNF production rules are used for the context-free syntax of the specified language L, and allow the generation of a parser for programs of L. Given an L program, a parser reconstructs the (recursive) applications of the EBNF productions such that the generated string corresponds to the program. The result of parsing is a syntax tree, formalized in our framework as a constructor term built up during parsing. The mapping from programs into constructor terms can be given by denoting for each EBNF production a constructor, and defining how the constructor representations of the parsed symbols on the right-hand side are embedded into the constructor term.
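To make the idea of building constructor terms during parsing concrete, here is a small hand-written sketch. It is our own illustration, independent of the X ASM/Lex/Yacc integration, for a tiny expression grammar; the tuple encoding of terms and all function names are invented for the example.

```python
import re

# Sketch (ours): a recursive-descent parser for a tiny grammar
#   Expr = Factor | Factor "+" Expr
# that builds a constructor term (nested tuples) while parsing,
# instead of a separate tree-building pass.

def tokenize(src):
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|\+", src)

def parse_expr(toks, i=0):
    left, i = parse_factor(toks, i)
    if i < len(toks) and toks[i] == "+":
        right, i = parse_expr(toks, i + 1)
        return ("sum", left, right), i
    return left, i

def parse_factor(toks, i):
    t = toks[i]
    if t.isdigit():
        return ("constant", int(t)), i + 1
    return ("variable", t), i + 1

term, _ = parse_expr(tokenize("2 + x + 1"))
assert term == ("sum", ("constant", 2),
                ("sum", ("variable", "x"), ("constant", 1)))
```

The declarative mappings introduced below replace exactly this kind of hand-coded term construction.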
Basic EBNF productions and the difference between characteristic and synonym productions have been given in Section 3.2.1.

Characteristic productions

References to the right-hand-side symbols in characteristic productions are made via their names, possibly marked by their number of occurrence. Assume a(_, _, _, _) is a 4-ary X ASM constructor. A characteristic production

    A ::= B C D D

extended with the mapping

    => a(B, C, D.1, D.2)

returns a constructor term a(rB, rC, rD.1, rD.2), whose arguments rX are the constructor terms returned by the parsed right-hand-side symbols.

Micro syntax

In the case of variable terminals, the term Name returns the micro-syntax. For brevity we do not give here the details of how to define variable terminals, but of course we use the standard technique of regular expressions. For instance, the definition of a typical Ident symbol returning its micro-syntax could be given as follows.

    Ident = [A-Za-z][A-Za-z0-9]* => Name

In all other cases, Name returns a string representation of the left-hand-side symbol. For instance, the following mapping of the above characteristic rule

    A ::= B C D D => characteristic(Name, B, C, D.1, D.2)

results in a constructor term characteristic("A", rB, rC, rD.1, rD.2), where again the arguments rX are the constructor terms returned by the parsed right-hand-side symbols.

Synonym productions

For synonym productions, the chosen right-hand side is accessible as the term rhs. A synonym production

    E = F | G | H => e(rhs)

returns the term e(x), where x is the representation of the chosen right-hand side. As an alternative one can return only the right-hand side, e.g. the production

    E = F | G | H => rhs

returns directly the chosen right-hand side. Returning a constructor from a synonym rule keeps the information which synonym rules have been triggered, while returning rhs directly compactifies the resulting terms.
A third alternative is to map the result of the synonym production into a special constructor synonym and to use the Name term to store which synonym rule has been used. The production

    E = F | G | H => synonym(Name, rhs)

returns a constructor term synonym("E", r), where r is the constructor term returned from parsing one of the right-hand-side symbols.

Example

As an example we extend the syntax rules of the language (Gram. 1 in Section 3.2.2) with a mapping from parsed programs into constructor terms. Later, in Section 4.5.3, the same grammar is used with an alternative mapping, using the above solutions with the "characteristic" and "synonym" constructors. The interested reader is invited to consult these examples already now.

Gram. 3:

    Expr     = Sum | Factor          => expr(rhs)
    Sum      ::= Factor "+" Expr     => sum(Factor, Expr)
    Factor   = Variable | Constant   => factor(rhs)
    Variable ::= Ident               => variable(Ident)
    Constant ::= Digits              => constant(Digits)
    Ident    = [A-Za-z][A-Za-z0-9]*  => ident(Name)
    Digits   = [0-9]+                => digits(Name.strToInt)

If a "Sum" is parsed, the constructor sum(_, _) is returned, having as first argument the constructor returned for the parsed "Factor", and as second argument the constructor returned for the parsed "Expr". If one of the synonyms is parsed, the chosen right-hand side is returned as the unique argument of the constructor corresponding to the synonym. For instance, an instance e' of symbol Expr is returned as the term expr(e'). If a "Variable" is parsed, the constructor variable(_) with the representation of the Ident as argument is returned, and if a "Constant" is parsed, the constructor constant(_) with the representation of the Digits is returned. Finally, Ident and Digits return constructors with their micro-syntax as arguments. Considering again the example program "2 + x + 1" of Section 3.2.2, the textual version of the constructor term resulting from applying the above mapping is given as follows.
Term 6:

    expr(sum(factor(constant(digits(2))),
         expr(sum(factor(variable(ident("x"))),
              expr(factor(constant(digits(1))))))))

A visualization of this term can be seen on the left-hand side of Figure 40.

4.5.2 Repetitions and Options in EBNF

On the right-hand side of characteristic productions, not only non-terminal symbols but also repetitions and options are allowed. Repetitions and options are treated similarly to the way they are treated in Montages, as described in Section 3.4. Symbols within curly repetition brackets return a list of representations of the corresponding symbol. The EBNF list

    { A B }

parses sequences of A B, but returns as A a sequence of As, and as B a sequence of Bs. For instance, a production

    L ::= { A B } => l(A, B)

parsing "a1 b1 a2 b2 a3 b3" results in the constructor term

    l([A1, A2, A3], [B1, B2, B3])

Further, a single symbol, followed or preceded by a list containing the same symbol and possibly some terminals, is collected into one list. The EBNF clauses

    A {";" A}
    {A ";"} A

both parse sequences like A;A or A;A;A, and return as A one list of A instances. Symbols within square option brackets return an empty list if the optional symbol is not present, and the representation of the symbol otherwise. This is especially practical in combination with the above feature, since an EBNF clause

    ["(" A {";" A} ")"]

returns as A an empty list if nothing is present (as defined by the rule for square brackets), a single A if one A is present, and a list of As if two or more As are present (as defined by the rule for curly brackets).

4.5.3 Canonical Representation of Arbitrary Programs

In addition to the possibility of defining custom mappings, we define a default, canonical mapping into a generic term representation using the above-mentioned constructors characteristic and synonym. This canonical mapping is later used as a starting point to construct ASTs like those introduced in Section 3.2.
- Given a characteristic EBNF rule

      A ::= B C D D

  the generic mapping is

      => characteristic(Name, [B, C, D.1, D.2])

- Given a synonym rule

      E = F | G | H

  the generic mapping is

      => synonym(Name, rhs)

  where rhs is an operator allowing access to what comes back from the right-hand side.

- Given a terminal "x", the generic mapping omits the terminal.

- Given a right-hand-side symbol within a list, the mapping is that symbol. For instance, the rules

      K ::= { L }
      K ::= L {"," L}
      K ::= ["(" L {";" L} ")"]

  all result in the mapping

      => characteristic(Name, L)

- Correspondingly, if a symbol is in option brackets, the mapping is the symbol.

Following these rules, it is possible to write a generator taking as input a term representation of EBNF rules, and outputting a term representation of the same EBNF decorated with constructor mappings according to the above description of the canonical mapping. This generator is called GenerateEBNFmapping(_). For the sake of brevity, we do not give the full definition of this generator.

Example

Given again the grammar (Grammar 1 in Section 3.2.2), the result of applying GenerateEBNFmapping(_) is the following grammar.

Gram. 4:

    Expr     = Sum | Factor          => synonym(Name, rhs)
    Sum      ::= Factor "+" Expr     => characteristic(Name, [Factor, Expr])
    Factor   = Variable | Constant   => synonym(Name, rhs)
    Variable ::= Ident               => characteristic(Name, [Ident])
    Constant ::= Digits              => characteristic(Name, [Digits])
    Ident    = [A-Za-z][A-Za-z0-9]*  => terminal("Ident", Name)
    Digits   = [0-9]+                => terminal("Digits", Name.strToInt)
Fig. 39: The canonical constructor term and the abstract syntax tree for 2 + x + 1

As we can see, in contrast to the customized mapping of Grammar 3 in Section 4.5, the canonical mapping uses only the generic constructors synonym and characteristic. Considering once again the example program "2 + x + 1" of Section 3.2.2, the textual version of the resulting constructor term is given as follows.

Term 7:

    synonym("Expr",
      characteristic("Sum",
        [synonym("Factor",
           characteristic("Constant", [terminal("Digits", 2)])),
         synonym("Expr",
           characteristic("Sum",
             [synonym("Factor",
                characteristic("Variable", [terminal("Ident", "x")])),
              synonym("Expr",
                synonym("Factor",
                  characteristic("Constant", [terminal("Digits", 1)])))]))]))

Compared to the customized version, Term 6, the above term is longer, but it is easier to process this kind of generic terms, using only a fixed set of constructors, in a generic way. In Figure 39 we show on the left-hand side a visualization of the constructor term resulting from the application of the new, canonical mapping. The mentioned customized mapping (Grammar 3 in Section 4.5) is visualized in Figure 40. The right-hand sides of both figures show the parse tree which needs to be created for the Montages models. The advantage of the canonical mapping is that a generic X ASM formalization of the parse-tree creation can be given more easily.
4.6 Related Work and Results

ASMs are a combination of parallel execution, treatment of data structures as variable functions, and implicit looping. Parallel execution is well known from hardware description languages like VHDL (144). The treatment of data structures as variable functions is known from early work on axiomatic program verification (95; 59) and has been stated explicitly in (190). While existing work aimed at modeling concrete memory or data structures in hardware or software, Gurevich's ASMs are defined as dynamic versions of Tarski structures (207). Since Tarski structures are the most common tool of mathematicians for describing static systems, they are the logical candidate to represent, in a most general way, a single state of a dynamic computation. Another field using structures to describe static systems is algebraic specifications (72). In that field as well, it has been observed that the absence of state makes many interesting applications infeasible. This led to work proposing extensions of algebraic specifications with state (52; 17; 181). Unlike these approaches, ASMs allow defining the evolution of the state in the most direct form: by explicit enumeration of the pointwise difference from one state to the next. All other approaches try to reduce the allowable state updates to a minimum, in order to guarantee the preservation of certain properties from one state to the next. In contrast to this, ASMs allow arbitrarily many changes from one state to the next. Still, Gurevich's initial program for ASMs is purely mathematical: a mathematically defined dynamic system which allows modeling arbitrary algorithms. His thesis (79) is that, unlike Turing machines (211), his machines allow modeling algorithms without encoding data structures and splitting execution steps.
He observed that every conceivable data structure can be modeled as a Tarski structure, and every possible state change of the algorithm can be modeled by a set of explicit, pointwise changes to the structure. A proof of the thesis for sequential algorithms is given in (83; 84). This purely mathematical program has been implicitly transformed into a computer science project by defining a concrete rule language for constructing the update sets. While in earlier publications (79; 80) Gurevich investigates the concept of dynamically changeable Tarski structures, later he defines a set of fixed, minimal languages for defining rules (81). ASMs are then defined to correspond to this rule programming language, and under this interpretation the thesis has subsequently provoked a lot of polarization among computer scientists. The lack of modularization and reuse features in the proposed languages is, for computer scientists, not compatible with the claim that arbitrary algorithms can be modeled on their natural abstraction level. While the initial mathematical meaning of this sentence makes a lot of sense, it contradicts computer scientists' experience if "algorithm" is interpreted as software or hardware, and "modeled" is interpreted as "prototyped" or even "implemented" in a feasible and maintainable way. However, the debate on ASMs in computer science has led to an impressive collection of case studies, each of them using ASMs to model a system which is considered to be complex. Examples are referenced in the annotated bibliography (29). While most models try to restrict the rule languages used to the predefined ones, in many cases additional machinery has been used in order to manage the complexity. Such machinery typically reuses common concepts from programming languages. The functional programming paradigm has been considered as the best candidate for extending the minimal rule languages.
The reason is that many theoretical ASM case studies use a considerable amount of higher mathematics to describe the static part of algorithms. Functional programming is ideal for modeling higher mathematics, and it uses modularization concepts based on mathematical concepts. This approach has led to a number of ASM implementations based on functional languages (220; 54). Odersky (168) proposes the opposite way, i.e. to use variable functions as an additional construct in functional programming languages. In both cases a functional type system is proposed. The introduction of such a type system is helpful for cases where the described algorithm fits well into the type system. On the other hand, Gurevich's original untyped definition of ASMs still provides the highest level of flexibility. We do not know of an ASM implementation based on functional languages which provides an implementation of the original, untyped definition of ASMs. Today's software systems have reached a level of complexity leading to the use of multiple paradigms (48). Our experience shows that untyped ASMs make it possible to use different paradigms in parallel. The idea behind X ASM is to start with Gurevich's untyped definition of ASMs (80) and to make it extensible. The exact mechanisms have been discussed before. Unlike other extensions of ASMs, the X ASM approach does not alter the semantic idea of Tarski structures and update sets. The only difference between X ASM and Gurevich's ASMs is that we allow extensible rule languages. Since the means for extension are again ASMs, the X ASM call can also be seen as a way to structure ASMs. An algebraic view of a similar structuring concept has been given by May in (150). The X ASM call is a special case of notions defined in (150). While May applies the state of the art in algebraic specification technologies to ASMs, the idea of X ASM is to generalize the original idea of Gurevich, resulting in a more practical specification and implementation tool.
Unlike many other proposals for extending ASMs, the X ASM approach tries to follow Gurevich's style of introducing as few concepts as possible. In fact, the X ASM call, which is a simple generalization of Gurevich's denotational semantics of ASMs (82), is the only new concept and can be used to define all other extensions. Some newer work on modeling transition systems with algebraic specifications (125; 136; 177; 178) led to the Especs formalism, which allows mapping full ASMs into their framework, combining their power with the structuring and refinement techniques of algebraic specifications. Based on our experience we would like to challenge the ASM thesis as follows. Agreeing on the choice of Tarski structures and update sets for modeling algorithms, we claim that the current choice of ASM constructs is not able to fulfill the ASM thesis. There are two problems with the current rule language.

- Although theoretically every update set can be denoted by an appropriate ASM rule, the abstraction level at which the update set is calculated is fixed.
- Although theoretically an arbitrary signature can be chosen, the abstraction level for defining this signature is fixed.
We propose to remedy these problems by extending ASMs such that both the update sets and the definitions of signatures can be calculated by means of another ASM. The X ASM call is a way to calculate update sets with other ASMs, and Mapping Automata (101) (Appendix B) or parameterized X ASM (Chapter 5) are proposals for how to use ASMs to calculate the signature. It would go beyond the scope of this thesis to discuss whether this is a real challenge to the ASM thesis or whether it is only an indication that the choice of a fixed rule language should be reconsidered. The X ASM language is fully implemented and available as open source (8). The system is used as the basis for Montages/Gem-Mex, where generated X ASM code is translated into an interpreter for the language specified using Montages. Other case studies are an application to microprocessor simulation (208) and the application of X ASM as gluing code in legacy systems (13). Additional theoretical applications outside the ASM area are possible, since ASMs can be considered as an instance of so-called transition system models (194), which also form the basis for other popular formalisms such as UNITY (41), TLA (140), SPL (149) and the SAL intermediate language (194). Using Montages, both syntax and semantics of new or alternative X ASM constructs can be developed in the integrated development environment Gem-Mex. Such an extensible system architecture allows tailoring X ASM as a tool for one of the above-mentioned formalisms based on transition systems.

5 Parameterized X ASM

The purpose of this chapter is to extend X ASM with features for parameterization of its signature. Parameterization allows us to "program" the signature of an algorithm. This possibility is especially useful if abstract algorithms are defined which are intended to operate on concrete data structures.
As an example, imagine an X ASM algorithm INTERP which interprets the textual representation of X ASM rules in such a way that the interpretation of a rule has exactly the same effects as its direct execution. The algorithm INTERP thus needs to access and update the functions given by the signature of the interpreted X ASM rule. This is only possible if we can parameterize the signature of INTERP with the signature of the interpreted rule. Another example is partial evaluation of interpreters, where it is often desirable that the resulting specialized program has a signature similar to the signature of the interpreted program. Otherwise the author of the program cannot validate the specialized code with respect to her/his original formulation. In our context, we aim at using parameterization to give an X ASM semantics of Montages which can be specialized to a simple X ASM for each program in the described language. In Section 5.1 we motivate parameterized X ASM (PXasm) by showing that they are needed for a generic algorithm constructing the abstract syntax trees (ASTs) used in Montages. The new programming features of PXasm are introduced in Section 5.2. The design principle of these new features is that if an ASM B is called by an ASM A, the information dynamically calculated by A before the call can be used to define the signature of B. From B's point of view, the signature is still static, but it is instantiated differently each time B is called. Therefore our design of parameterized X ASM can be seen as another conservative extension of standard ASMs. In the run of a parameterized ASM, the state is still a Tarski structure, and the transition rule can be easily specialized to a traditional ASM rule. In order to avoid confusion we use the term ASM to refer to an abstract machine given by the X ASM construct asm ... is ... endasm, and we say traditional ASM if we mean Abstract State Machines as defined by Gurevich (82).
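The requirement above, that an interpreter must read and write exactly the functions named in the interpreted rule's signature, can be illustrated outside X ASM. The following Python sketch is ours, not part of the thesis: it models the dynamic state as a caller-supplied table of functions, so the "signature" of the interpreter is whatever the interpreted rule mentions.

```python
# Hypothetical sketch: an interpreter whose set of updatable functions is
# supplied by the interpreted program rather than fixed in advance.

def interp(rule, state):
    """Execute a tiny update rule against a caller-supplied function table.

    `state` maps function names to {argument-tuple: value} tables; it plays
    the role of the interpreted rule's signature, so interpretation has
    exactly the same effect on it as direct execution would.
    """
    name, args, value = rule            # e.g. ("x", (), 5) for x := 5
    state.setdefault(name, {})[args] = value

# Direct execution of x := 5 ...
direct = {"x": {(): 5}}
# ... and interpretation of its term representation agree on the final state.
interpreted = {}
interp(("x", (), 5), interpreted)
assert interpreted == direct
```

The point of the sketch is only the parameterization: `interp` itself declares no function names, yet it produces the same state as direct execution of the rule.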
Parameterized X ASMs are referred to as PXasm. The construction of ASTs for Montages, which serves as a motivating example for PXasm, is formalized with the new features in Section 5.3. Another example for the use of the new features is the definition of an X ASM self-interpreter, which executes rules and evaluates terms (Section 5.4). In Section 6.1 of the next chapter this self-interpreter will be applied to give a tree finite state machine (TFSM) interpreter, which later serves as the core of our Montages semantics. Finally, in Section 5.5 we come to partial evaluation, the main application of PXasm. We define a partial evaluator for the PXasm formalism, written in PXasm. In the next chapter we will show in detail how the previously given TFSM interpreter can be specialized into compiled code by assuming that a given TFSM is static. This process of specializing the TFSM interpreter corresponds directly to the process of specializing the Montages meta-interpreter into compiled code. Since the details of the full process are not given, this section serves as a more detailed description of the Montages system architecture described in Figure 37. Throughout the chapter we define and explain in detail a number of longer and more complex X ASM programs for constructing canonic trees (ASM 18), finding enclosing instances of tree nodes (ASM 20), and doing self-interpretation of X ASM rules (ASM 25). We include the full definitions because they are integral parts of the formal Montages semantics given in Chapter 8.

Fig. 40: The constructor term and the abstract syntax tree for 2 + x + 1

5.1 Motivation

In Section 3 we have given an example and an informal model of a language specification in the Montages style. Since the signature of rules and actions of such a model depends on the specific EBNF of the described language, it is not possible to give a standard X ASM modeling Montages of different languages with one fixed signature. Since defining a different X ASM for each described language is too much overhead, we need additional features which allow us to parameterize the signature of an X ASM model. As an example consider the X ASM model for the ASTs of the example language presented in Section 3. The model features a special universe for each symbol in the EBNF of the language and selector functions with names derived from the symbols in the EBNF. The rule Sum ::= Factor "+" Expr, for instance, introduces universes Sum, Factor, and Expr, as well as unary selector functions S-Factor and S-Expr. A formal semantics of the parse-tree construction can now be given based on the representation of programs as constructor terms. A possible mapping of the language to constructors has been defined in Section 4.5.1. In Figure 40 we show on the left-hand side a visualization of the constructor term Term 6, resulting from the application of the mapping, and on the right-hand side the corresponding parse tree as shown already in Figure 19 of Section 3.2.2. The ASM ConstructTree, which will be given below, implements the construction of parse trees from constructor terms. While it is easy to write such an ASM for each possible EBNF, we cannot easily give a conventional ASM taking a constructor term generated for an arbitrary EBNF and constructing a corresponding AST along the guidelines of Section 3.2.2. Even if the mapping into constructor terms is the same for all EBNF productions, for instance using the canonical mapping as described in Section 4.5.3, we would still have to solve the problem of how to parameterize the signature of universes and selector functions with the symbols existing in a specific EBNF grammar.

ASM 15:
asm ConstructTree(t)
  accesses constructors sum(_,_), expr(_), factor(_), variable(_), constant(_)
  updates universes Expr, Sum, Factor, Constant, Variable
  updates functions S-Factor(_), S-Expr(_), S-Digits(_), S-Ident(_)
is
  if t =~ sum(&l, &r) then
    extend Sum with n
      n.S-Factor := ConstructTree(&l)
      n.S-Expr := ConstructTree(&r)
      return n
    endextend
  elseif t =~ expr(&a) then
    let n = ConstructTree(&a) in
      Expr(n) := true
      return n
    endlet
  elseif t =~ factor(&a) then
    let n = ConstructTree(&a) in
      Factor(n) := true
      return n
    endlet
  elseif t =~ variable(&a) then
    extend Variable with n
      n.S-Ident := ConstructTree(&a)
      return n
    endextend
  elseif t =~ constant(&a) then
    extend Constant with n
      n.S-Digits := ConstructTree(&a)
      return n
    endextend
  elseif t =~ ident(&a) then
    extend Ident with n
      n.Name := &a
      return n
    endextend
  elseif t =~ digits(&a) then
    extend Digits with n
      n.Name := &a
      return n
    endextend
  endif
endasm

5.2 The $, Apply, and Update Features

For situations where the needed signature is not known in advance, we allow functions to be declared and used by referencing them with a string value, using the $, Apply, and Update features of PXasm[1]. The design principle of the new features is that if an ASM B is called by an ASM A, the information dynamically calculated by A can be used to define the signature of B. From B's point of view, the signature is still static, but it is instantiated differently at each time of B's call. Because of the design principle, the string references to functions are resolved at different times for the declaration part and the rule part of an ASM. The occurrences in the declaration part are resolved at the time when the ASM is called, and the occurrences in the rule are resolved at execution time.
The rules have a dynamic signature, depending on the evaluation of the terms referring to functions. Nevertheless, the signature of such an ASM is not dynamic, but determined at call time. During rule evaluation the terms are checked each time to be consistent with the signature determined at call time. If a term evaluates to an undeclared signature, an inconsistent state is reached. With this mechanism, the user of X ASM is forced to put redefinitions of signatures at the beginning of an ASM call. During the execution of one ASM, the signature is static, as in traditional ASMs[2].

5.2.1 The $ Feature

The $-feature is explained best by means of an example. Using the $-feature, instead of the declaration and rule

  function f(_)
  f(3) := 5

we can write equivalently

  function $"f"$(_)
  $"f"$(3) := 5

As a more complex example, we show ASM Partition (ASM 16), an algorithm partitioning a set of nodes into different universes. Consider as read-only environment functions a universe N of nodes and a unary function Name(_) denoting the kind of each node[3]. Kinds are simply given as strings. Now ASM Partition declares a universe for each kind and partitions the set of nodes into these universes. The derived universe function K(_) calculates the set K of all kinds. Then for each string in K a universe with that name is declared, using the $-feature.

[1] In fact the system would also work with arbitrary values, resulting in a system similar to Mapping Automata (101), see Appendix B. For our purposes it is general enough to allow only string values.
[2] In contrast to parameterized X ASM, mapping automata allow the user to calculate and change the signature completely dynamically. In fact, mapping automata are defined such that every element is both a value and a function symbol.
[3] Please remember that a "universe" is the same as a unary relation, and a relation is the same as a function ranging over Boolean, initially defined to produce false for each argument.
The actual partition is done by the "do forall" rule. Please note that for this example, absence of runtime errors due to dynamic signature mismatch can be proved, while in the general case this cannot be done.

ASM 16:
asm Partition
  accesses universe N
  accesses function Name(_)
is
  derived universe K(k) == (exists n in N: n.Name = k)
  (forall k in K
    universe $k$ )
  do forall n in N
    $Name(n)$(n) := true
  enddo
endasm

5.2.2 The Apply and Update Features

Another problem which has to be solved for parameterized X ASM is how to feed an unknown number of arguments to a function. For this purpose we introduce the Apply construct, which takes a function symbol and arguments represented as a tuple or list. For instance the function application

  f(t1, t2, t3)

can be equivalently written as

  Apply("f", [t1, t2, t3])

or as

  Apply("f", (t1, t2, t3))

The reason we allow both kinds of syntax is that we want a flexible way of passing arguments, available in the form of lists or tuples, to functions whose signature is given using the $-feature. The rule

  Apply("f", [t1, t2, t3]) := t

is equivalent to

  f(t1, t2, t3) := t

To increase readability we also allow the following alternative syntax.

  Update("f", [t1, t2, t3], t)

For convenience, Apply can also be used in combination with all built-in functions, as well as unary and binary operators, for instance "+", "-", etc. The term 1 + 2 can thus be written as Apply("+", [1, 2]).

5.3 Generating Abstract Syntax Trees from Canonical Representations

In Section 5.1 we motivated the need for parameterized X ASM by showing that they are needed for an algorithm constructing abstract syntax trees (ASTs) as described in Section 3.2. In this section we give such an algorithm based on the canonical mapping described in Section 4.5.3. The presented AST construction algorithm will be used directly as part of the formal semantics of Montages in Section 8.
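Before the formal definition, the core idea can be previewed in a Python sketch of our own, which is not part of the thesis: node universes and selector names are derived from strings found in the input term, not fixed in the program text. The tuple encodings ("synonym", s, rhs) and ("characteristic", c, children) are our assumptions standing in for the constructor terms.

```python
# Hypothetical sketch of generic AST construction from constructor terms.
# `universes` maps universe names to node sets; `selectors` maps
# (node, "S-<symbol>") pairs to child nodes.

def construct(term, universes, selectors):
    kind = term[0]
    if kind == "synonym":                       # synonym(s, rhs)
        _, s, rhs = term
        n = construct(rhs, universes, selectors)
        universes.setdefault(s, set()).add(n)   # like $s$(n) := true
        return n
    if kind == "characteristic":                # characteristic(c, children)
        _, c, children = term
        n = object()                            # a fresh AST node
        universes.setdefault(c, set()).add(n)
        for child_term in children:
            child = construct(child_term, universes, selectors)
            # selector name derived from the child's symbol string
            selectors[(n, "S-" + child_term[1])] = child
        return n
    return term                                 # leaf value

universes, selectors = {}, {}
root = construct(
    ("characteristic", "Sum",
     [("characteristic", "Factor", []),
      ("synonym", "Expr", ("characteristic", "Constant", []))]),
    universes, selectors)
assert root in universes["Sum"]
assert selectors[(root, "S-Expr")] in universes["Constant"]
```

The sketch omits Parent attributes and list handling; its only purpose is to show universes and selectors created from strings at run time, which is exactly what the $-feature makes declarable in PXasm.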
5.3.1 Constructing the AST

We assume that a given EBNF has been decorated with canonical mappings as defined in Section 4.5.3 and that the EBNF has been analyzed to define the universe CharacteristicSymbols, containing all strings corresponding to characteristic symbols in the EBNF, and the universe SynonymSymbols, containing all strings corresponding to synonym symbols in the EBNF. We define the following generic ASM ConstructCanonicTree, which constructs the corresponding universes, nodes, and selector functions for all possible EBNF definitions. For the sake of simplicity we ignore the "S1-" and "S2-" selectors and treat only the "S-" selectors.

Interface of ConstructCanonicTree

The constructors characteristic and synonym are used to decompose the argument t, being a canonic representation of the program. The mentioned symbol universes, the selector functions, and the Parent function must be "update" accessed in order to create the AST. These accesses are declared in the following interface of ConstructCanonicTree.

ASM 17:
asm ConstructCanonicTree(t)
  accesses constructors characteristic(_,_), synonym(_,_)
  accesses universes CharacteristicSymbols, SynonymSymbols
  (for all c in CharacteristicSymbols:
    updates universe $c$
    updates function $"S-"+c$(_) )
  (for all s in SynonymSymbols:
    updates universe $s$
    updates function $"S-"+s$(_) )
is
  ...
endasm

Processing of Synonyms

If the argument t matches the constructor synonym, the algorithm constructs a tree for the right-hand side &rhs of the synonym, adds the resulting root node n to the corresponding synonym universe, and returns n as the result of the construction.

  ...
  if t =~ synonym(&s, &rhs) then
    let n = ConstructCanonicTree(&rhs) in
      $&s$(n) := true
      return n
    endlet
  ...
Processing of Characteristics

If t matches the constructor characteristic, the corresponding characteristic universe is extended with a new node n, a node child is constructed for each element t' in the list of right-hand sides &l, the attribute Parent of each child is set to the node n, and the selector functions of n are defined according to the information in the right-hand-side terms t'.

  ...
  elseif t =~ characteristic(&c, &l) then
    extend $&c$ with n
      n.Name := &c
      do forall t' in list &l
        let child = ConstructCanonicTree(t') in
          child.Parent := n
          if t' =~ characteristic(&c, &l) then
            n.$"S-" + &c$ := child
          elseif t' =~ synonym(&s, &rhs) then
            n.$"S-" + &s$ := child
          endif
        endlet
      enddo
      return n
    endextend
  ...

Lists and Options

In Section 4.5.3 we explained that symbols in square option brackets or in curly list brackets return a (possibly empty) list of instances. In Section 3.4 we defined that a list of length 0 is represented in the AST by a specially created node, which is an instance of the universe NoNode[4], lists of length 1 are represented in the AST by the node representing the unique member, and lists of length 2 or longer are represented in the AST as lists. A list of length one is thus treated exactly like its member. The parts needed to process lists and options are given as follows. In order to simplify later processing of the tree, a universe ListNode, containing all lists being part of the AST, and the attribute Parent are defined as well. The interface of ConstructCanonicTree is extended with update accesses to the universes NoNode and ListNode. The interface of ASM 17 is refined to the following definition.

ASM 18:
asm ConstructCanonicTree(t)
  ...
  updates universes NoNode, ListNode
is
  ...
endasm

The processing of synonyms and characteristics as described before remains unchanged. The processing of an empty list creates an element of NoNode and returns it as result.

  ...
  elseif t =~ [] then
    extend NoNode with n
      return n
    endextend
  ...

Otherwise a derived function ProcessList is used to construct a tree for each element in the list of constructor terms, and the resulting list of root nodes is added to the universe ListNode and returned as result. The Parent attribute of each list element is set to the list itself.

  ...
  elseif t =~ [& | &] then
    let res = ProcessList(t) in
      ListNode(res) := true
      do forall e in list res
        e.Parent := res
      enddo
      return res
    endlet
  endif

The ASM ProcessList is given as follows. It constructs a canonic tree for each element of the list and appends the root of that tree to the local variable r. When the complete list is processed, the list of root nodes r is returned.

ASM 19:
asm ProcessList(l: [NODE])
  accesses function ConstructCanonicTree(_)
is
  function r <- []
  if l =~ [&hd | &tl] then
    r := r + [ConstructCanonicTree(&hd)]
    l := &tl
  else
    return r
  endif
endasm

[4] This subtle detail results from the fact that we use constructor terms to represent lists in the AST. As long as we have at least one node inside the list, this works perfectly, but an empty list does not have its own identity and would destroy the structure of the AST immediately.

5.3.2 Navigation in the Parse Tree

A very important feature for modeling various structural programming concepts is the possibility to access the least enclosing instance of a certain kind of programming language construct. The following ASM enclosing takes as arguments a node of an AST and a set of strings, being names of node universes, and returns the least enclosing node which is an instance of a universe corresponding to one of the node-universe names.
ASM 20:
asm enclosing(node, setOfUniverseNames)
  (forall s in set setOfUniverseNames
    accesses universe $s$ )
  accesses function Parent
is
  if node.Parent = undef then
    undef
  else
    if (exists s in setOfUniverseNames: $s$(node.Parent)) then
      node.Parent
    else
      node.Parent.enclosing(setOfUniverseNames)
    endif
  endif
endasm

The function enclosing is a very powerful tool for static semantics definition, since it allows direct access to enclosing statements. The enclosing function is used for name resolution, break/continue statements, and exception handling, as well as for many aspects of an object-oriented type system, such as our Java typing specification in Section D. Typically, information such as declaration tables or visibility predicates is defined as attributes of the corresponding node, and all enclosed statements for which the information is valid can access it directly via enclosing. Interestingly, the same function enclosing is already used by Poetzsch-Heffter in the MAX system (184; 186). In the MAX case studies this feature is very important for specifying all kinds of scoping and name resolution aspects of a language. Both in MAX and in our system, the enclosing function simplifies the specification of such features by making it possible to point directly to the least enclosing instance of a certain feature, or to the least enclosing instance of one of a set of features. In Part III we will use the enclosing function together with sets of universe names for scopes of variable visibility (Chapter 11) and for frames representing jump targets of all kinds of abrupt control flow features, such as continues and exceptions, but also returns from procedure calls (Chapter 14). Simplified versions of such applications are given in the next section.

5.3.3 Examples: Abrupt Control Flow and Variable Scoping

Our first example for navigation in the AST is abrupt control flow. Abrupt control flow is a term used for all kinds of control flow which are not sequential, but leave a statement abruptly.
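The Parent-chasing search of ASM 20 can be sketched in a few lines of Python. The sketch is ours, under assumed encodings: `parent` is the Parent function as a dict, `universes` maps universe names to node sets, and the node names are purely illustrative.

```python
# Hypothetical sketch of `enclosing`: climb Parent pointers until an
# ancestor belonging to one of the named universes is found.

def enclosing(node, universe_names, parent, universes):
    """Return the least enclosing ancestor of `node` that is a member of
    one of `universe_names`, or None at the root (undef in X ASM)."""
    p = parent.get(node)
    if p is None:
        return None
    if any(p in universes.get(u, set()) for u in universe_names):
        return p
    return enclosing(p, universe_names, parent, universes)

# A break statement inside a block inside a while loop:
parent = {"break1": "block1", "block1": "while1", "while1": None}
universes = {"whileStm": {"while1"}, "block": {"block1"}}
assert enclosing("break1", {"whileStm", "doStm"}, parent, universes) == "while1"
```

Note how the search skips the enclosing block: only membership in one of the requested universes stops the climb, which is what lets a single call express "the least enclosing loop" regardless of intermediate statements.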
Examples of abrupt control flow are break and continue jumps out of loops, exceptions, but also certain aspects of the return from a procedure call. Leaving the statement means climbing up the syntax tree towards the root, resuming the sequential flow in some enclosing statement. For instance, the break statement leaves a loop in order to terminate it and continue after the loop, the continue statement leaves a loop in order to start again at the beginning of the loop, and exception statements try to find a matching catch clause.

Variable Scoping

The first example is scoping. Different constructs like procedure declarations and blocks define a new scope. A scope typically opens a new name space, and references to functions and variables are resolved first in the least enclosing scope, then in the next outer one, and so on. By defining a derived function Scope as the set of strings naming the scoping constructs of the described language, the function enclosing(n, Scope) can be used to access the least enclosing scope. Typically a binary function declTable(Node, Ident) is defined for each scope, mapping the names in the scope's name space to the corresponding entities. The following ASM lookUp(Node, Ident) follows this pattern to look up definitions through the scopes. The first parameter is the reference, and the second is the identifier to be looked up.

ASM 21:
asm lookUp(node, ident)
  accesses functions Scope, enclosing(_,_), declTable(_,_)
is
  let scopeNode = node.enclosing(Scope) in
    if scopeNode = undef then
      return undef
    else
      let decl = scopeNode.declTable(ident) in
        if decl = undef then
          node := scopeNode
        else
          return decl
        endif
      endlet
    endif
  endlet
endasm

Break and Continue

In the case of break and continue, the enclosing function can be used to find the least enclosing loop statement having a matching label. Consider the following grammar of Java loops, coming from Chapter 14:

Gram. 5:
  stm          = ... | continueStm | breakStm | iterationStm | labeledStm
  iterationStm = whileStm | doStm
  continueStm  ::= "continue" [ labelId ] ";"
  breakStm     ::= "break" [ labelId ] ";"
  labelId      = id
  whileStm     ::= "while" exp body
  doStm        ::= "do" body "while" exp ";"
  labeledStm   ::= labelId ":" iterationStm

If a break or continue statement is executed, the following function getLoop(Node) takes the break or continue statement as parameter and returns the least enclosing while or do statement whose label matches the label of that statement. If the label of the break or continue statement is not defined, the least enclosing loop is returned.

ASM 22:
asm getLoop(node)
  accesses functions enclosing(_,_), Name(_), S-labelId(_), S-iterationStm(_)
is
  function label <- node.S-labelId.Name
  if label = undef then
    return node.enclosing({"whileStm", "doStm"})
  else
    let e = node.enclosing({"labeledStm"}) in
      if e = undef then
        return undef
      else
        if e.S-labelId.Name = label then
          return e.S-iterationStm
        else
          node := e
        endif
      endif
    endlet
  endif
endasm

In Montages such a solution is typically combined with non-local transitions, like the ones shown in the goto example of Section 3.4.5. In Chapter 14 the control flow of break and continue statements of the imperative core of Java is specified by combining enclosing with non-local transitions. This solution leads to a high level of decoupling. Additional iteration statements can be added without changing the specifications of break, continue, and labeled statements. Other types of abrupt control flow, such as exception handling and procedure calls, can be added without changing the specifications. Most interestingly, statements which do not know the concept of abrupt control flow need not be adapted. The detailed specifications providing these empirical findings are given in Chapters 14.3 and 14.4.
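The lookUp pattern, searching the least enclosing scope first and then each outer one, can be sketched in Python. This is our illustration, not the thesis code: `parent` plays the role of the Parent function, `scopes` the derived function Scope, and `decl_table` the declTable function; all names in the example tree are hypothetical.

```python
# Hypothetical sketch of the lookUp pattern: resolve an identifier in the
# least enclosing scope, then in the next outer one, and so on.

def look_up(node, ident, parent, scopes, decl_table):
    scope = node
    while True:
        # find the least enclosing scope (cf. enclosing(n, Scope))
        scope = parent.get(scope)
        while scope is not None and scope not in scopes:
            scope = parent.get(scope)
        if scope is None:
            return None                       # undeclared identifier (undef)
        decl = decl_table.get(scope, {}).get(ident)
        if decl is not None:
            return decl                       # found in this scope

# `x` is used in an inner scope but declared one scope further out:
parent = {"use_x": "inner", "inner": "outer", "outer": None}
scopes = {"inner", "outer"}
decl_table = {"outer": {"x": "decl_of_x"}, "inner": {}}
assert look_up("use_x", "x", parent, scopes, decl_table) == "decl_of_x"
assert look_up("use_x", "y", parent, scopes, decl_table) is None
```

As in ASM 21, the loop restarts the search from the scope that failed, so shadowing works for free: an inner declaration of `x` would be found before the outer one.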
5.4 The PXasm Self-Interpreter

In this section we present a PXasm interpreter INTERP, written in PXasm. The special property of this interpreter is that while interpreting a rule R it accesses and updates the same functions as the direct execution of R does. Given an X ASM rule R, the rules R and INTERP(R) are equivalent in the sense that given a longer rule R', of which R is a part, the result of replacing R by INTERP(R) does not affect the outcome of executing R'. This program equivalence property is known as full abstraction (78). We use the introduced techniques to represent PXasm rules as constructor terms, and use the signature of the represented rule to parameterize the interpreter's signature. The interpreter function INTERP(t) executes X ASM rules according to their semantics. The definition of the constructor term representation of PXasm rules and expressions is given in Section 5.4.1. Using this representation, the self-interpreter definition is given in Section 5.4.3. As an example for the use of the self-interpreter we refer to Section 6.1, where the definition of a TFSM interpreter is given.

5.4.1 Grammar and Term Representation of PXasm

To transform PXasm rules into constructor terms, we give the EBNF of PXasm together with a mapping into constructor terms. For the sake of simplicity we completely neglect parsing problems and operator precedence.
Gram. 6:
  Rule        ::= { BasicRule }                                 / [ BasicRule ]
  BasicRule   = DoUpdate | Conditional | Let | DoForAll | Choose | Extend | Application  / rhs
  DoUpdate    ::= Symbol [ Arguments ] ":=" Expr                / update(Symbol, Arguments, Expr)
  Arguments   ::= "(" Expr { "," Expr } ")"                     / [ Expr ]
  Symbol      = Meta | Ident                                    / rhs
  Meta        ::= "$" Expr "$"                                  / meta(Expr)
  Ident       = [A-Za-z][A-Za-z0-9]*                            / Name
  Conditional ::= "if" Expr "then" Rule [ "else" Rule ] "endif" / conditional(Expr, Rule.1, Rule.2)
  DoForAll    ::= "do" "forall" Symbol "in" Symbol [ ":" Expr ] Rule "enddo"
                    / doForall(Symbol.1, Symbol.2, (if Expr = [] then constant(true) else Expr), Rule)
  Choose      ::= "choose" Symbol "in" Symbol [ ":" Expr ] Rule "ifnone" Rule "endchoose"
                    / choose(Symbol.1, Symbol.2, (if Expr = [] then constant(true) else Expr), Rule.1, Rule.2)
  Extend      ::= "extend" Symbol "with" Symbol Rule "endextend" / extendRule(Symbol.2, Symbol.1, Rule)
  Expr        = Unary | Binary | CondExpr | Application | Constant | Let  / rhs
  Constant    = "true" | "false" | String | Number              / constant(...corresponding ASM constant...)
  Unary       ::= Op Expr                                       / apply(Op, [Expr])
  Binary      ::= Expr Op Expr                                  / apply(Op, [Expr.1, Expr.2])
  Application ::= Symbol [ Arguments ]                          / apply(Symbol, Arguments)
  Let         ::= "let" { LetDef } "in" Both "endlet"           / letClause(LetDef, Both)
  LetDef      ::= Symbol "=" Expr                               / letDef(Symbol, Expr)
  Both        = Rule | Expr                                     / rhs

Examples

The rule of the first example, ASM 1, is represented as follows.

Term 8:
[update("x1", [], constant(1)),
 update("x2", [], apply("x1", [])),
 update("x3", [], apply("x2", []))]

Accordingly, the rule of example ASM 3 can be rewritten in the following form:

ASM 23:
doForall("i", "Integer",
  apply("and", [apply(">=", [apply("i",[]), constant(2)]),
                apply("<",  [apply("i",[]), apply("n",[])])]),
  [update("x",
     [apply("-", [apply("i",[]), constant(1)])],
     apply("x", [apply("i",[])]))
  ])

Finally consider the above ASM 16. Its rule represented with constructors looks as follows.
Term 9:
doForall("n", "N", constant(true),
  update(meta(apply("Name", [apply("n", [])])),
         [apply("n",[])],
         constant(true)))

5.4.2 Interpretation of Symbols

A symbol in the EBNF grammar is either an identifier or a meta-constructor, which represents the application of the $-feature. Since symbols are not X ASM rules or expressions, we define a special X ASM SymbolINTERP which deals only with the Symbol case.

ASM 24:
asm SymbolINTERP(t)
  accesses function INTERP(_)
  accesses constructor meta(_)
is
  if t =~ meta(&s) then
    return INTERP(&s)
  else
    return t
  endif
endasm

5.4.3 Definition of INTERP

The interface of INTERP is calculated from the parameter t using the functions
- MaxArity(t), calculating the maximal arity of the functions accessed or updated in t,
- UpdFct(n, t), providing a comma-separated string listing all n-ary functions updated by t, and finally
- AccFct(n, t), providing a comma-separated string listing all n-ary functions accessed by t.

Given this information, the interface to the 3-ary updated functions can be given as

  updates functions with arity 3 $UpdFct(3, t)$

Interface of INTERP

The interface of INTERP consists of its parameter t, being the rule or expression to be interpreted, its accesses to the functions contained in the AccFct lists and to the constructors used to represent X ASM rules, as well as its updates of the functions in the UpdFct lists. The ASM SymbolINTERP is an external function.

ASM 25:
asm INTERP(t: Rule | Expr)
  accesses functions UpdFct(_, _), AccFct(_, _), MaxArity(_)
  (forall n in {0 .. MaxArity(t)}:
    updates functions with arity n $UpdFct(n, t)$
    accesses functions with arity n $AccFct(n, t)$ )
  accesses constructors update(Symbol, [Expr], Expr),
    conditional(Expr, Rule, Rule),
    doForall(Symbol, Symbol, Expr, Rule),
    choose(Symbol, Symbol, Expr, Rule),
    extendRule(Symbol, Symbol, Rule),
    constant(Value), apply(Symbol, [Expr]),
    letClause([LetDef], Rule), letDef(Symbol, Expr)
is
  external function SymbolINTERP(_)
  ...
Interpretation of Rules

The interpretation of the X ASM rules is relatively straightforward. The components of a rule are evaluated by using the interpreter INTERP recursively. Then, depending on the result, the main construct is executed using the corresponding X ASM construct. The parallel rule blocks and the conditional rule are interpreted as follows.

  ...
  if t =~ [&hd | &tl] then
    return [INTERP(&hd) | INTERP(&tl)]
  elseif t =~ conditional(&e, &r1, &r2) then
    if INTERP(&e) then
      INTERP(&r1)
    else
      INTERP(&r2)
    endif
    return true
  ...

For the update rule the Update operator is used, and as result the constant true is returned.

  ...
  elseif t =~ update(&s, &a, &e) then
    Update(SymbolINTERP(&s), INTERP(&a), INTERP(&e))
    return true
  ...

In the case of doForall, choose, and extend, the name of the bound variable and the universe are evaluated using SymbolINTERP, and then the $-operator is used to transform the names into the corresponding symbols.

  ...
  elseif t =~ doForall(&i, &s, &e, &r) then
    do forall $SymbolINTERP(&i)$ in $SymbolINTERP(&s)$ : INTERP(&e)
      INTERP(&r)
    enddo
    return true
  elseif t =~ choose(&i, &s, &e, &r1, &r2) then
    choose $SymbolINTERP(&i)$ in $SymbolINTERP(&s)$ : INTERP(&e)
      INTERP(&r1)
    ifnone
      INTERP(&r2)
    endchoose
    return true
  elseif t =~ extendRule(&i, &s, &r) then
    extend $SymbolINTERP(&s)$ with $SymbolINTERP(&i)$
      INTERP(&r)
    endextend
    return true
  ...

Interpretation of Expressions

The interpretation of constants is done by removing the constant constructor. Please note that the constant constructor is needed, since a constructor term representing an X ASM rule is itself a constant, and it is thus necessary to encapsulate real constants with the constant constructor.

  ...
  elseif t =~ constant(&c) then
    return &c
  ...

The interpretation of an application is done with the built-in Apply operator.

  ...
  elseif t =~ apply(&o, &a) then
    return Apply(SymbolINTERP(&o), INTERP(&a))
  ...
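The dispatch over constructor terms can be mimicked in Python with rules encoded as nested tuples in the style of Term 8. The sketch below is our own simplification, not the X ASM runtime: it executes updates sequentially against a plain dict and supports only a few constructors, ignoring the parallel update-set semantics and the $-feature.

```python
# Hypothetical sketch of the INTERP dispatch over tuple-encoded terms.
# state maps (function-name, argument-tuple) pairs to values.

def interp(t, state):
    """Evaluate a tuple-encoded rule/expression against `state`."""
    tag = t[0]
    if tag == "update":                       # update(Symbol, [Expr], Expr)
        _, sym, args, e = t
        state[(sym, tuple(interp(a, state) for a in args))] = interp(e, state)
        return True
    if tag == "conditional":                  # conditional(Expr, Rule, Rule)
        _, e, r1, r2 = t
        return interp(r1 if interp(e, state) else r2, state)
    if tag == "constant":                     # constant(Value)
        return t[1]
    if tag == "apply":                        # apply(Symbol, [Expr])
        _, sym, args = t
        vals = [interp(a, state) for a in args]
        if sym == "+":                        # one sample built-in operator
            return vals[0] + vals[1]
        return state.get((sym, tuple(vals)))  # dynamic function lookup

state = {}
# A fragment of Term 8, executed sequentially: x1 := 1, then x2 := x1
interp(("update", "x1", [], ("constant", 1)), state)
interp(("update", "x2", [], ("apply", "x1", [])), state)
assert state[("x2", ())] == 1
```

Even in this toy form, the key property of the self-interpreter is visible: the interpreted update lands in the same state table that a directly written assignment would use.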
Interpretation of Let-Clauses

Finally, the parallel let-clause is interpreted by first interpreting the terms in all let definitions, and then building up recursively a structure of nested lets. Since we first evaluate all terms, our constructed recursive let-structure correctly interprets the parallel one.

  ...
  elseif t =~ letClause(&defList, &r) then
    if &defList =~ [letDef(&p, &t)|&tl] then
      return INTERP(letClause(INTERP(&defList), &r))
    elseif &defList =~ [(&p, &o) | &tl] then
      let $&p$ = &o in
        return INTERP(letClause(&tl, &r))
      endlet
    else
      return INTERP(&r)
    endif
  elseif t =~ letDef(&p, &t) then
    return (SymbolINTERP(&p), INTERP(&t))
  else
    return "Not matched"
  endif
endasm

We claim that every X ASM rule or expression X is equivalent to the X ASM rule INTERP(X'), where X' is the term representation of X. The rule (expression) X and INTERP(X') are equivalent in the sense that given a longer rule (expression) Y, of which X is a part, the result of replacing X by INTERP(X') does not affect the outcome of executing (evaluating) Y. This program equivalence property is known as full abstraction (78). The proof of this property would involve a structural induction over rule constructors and their interpreted versions, calculating their rule and value denotations, and showing that they are the same for both the rule and its interpreted version.

5.5 The PXasm Partial Evaluator

Partial evaluation (108; 46) allows us to specialize PXasm descriptions if some of the accessed functions in the interface are known to be static. For instance, an interpreter together with a fixed program can be specialized to compiled code. The same technique can be applied to implement Montages. An abstract meta-algorithm is given as semantics of Montages.
Applying partial evaluation to this algorithm results in specialized interpreters for the specified languages and, subsequently, in compiled, transparent X ASM code for programs written in these languages. This process has already been visualized in Figure 37 and discussed in the introduction of Part II. Parameterization of the signature can be used to obtain compiled code whose signature corresponds to the terminology introduced by either the language semantics or the program code, allowing us to tailor the readability of the generated code. In this section we give some details on how to define partial evaluators using parameterized X ASM (Section 5.5.1), and later on, in Section 6.4, we show how to apply them to TFSM interpretation.

5.5.1 The Partial Evaluation Algorithm

We give a partial evaluator PE, whose arguments are an ASM rule t to be partially evaluated, and a set sf of those function symbols which are considered static. For simplicity we assume that sf always contains the built-in functions and all used constructors, which are static by nature. The decision whether an external function is static can be made by the user, under the condition that external functions marked as static always produce an empty update denotation. If an external function is marked as static, it will be pre-evaluated by our PE-algorithm, independently of whether it is really independent from dynamic functions or not. We do not discuss here how external functions could be analyzed and marked as static by the PE-algorithm itself; such an analysis would be possible and interesting in the case of external functions realized as X ASM. In order to simplify the algorithm, we define PE such that partial evaluation of a rule always returns a list of rules, whereas partial evaluation of an expression returns an expression. In the extreme case, the partial evaluation algorithm reduces a rule to an empty list of rules, and an expression to a constant.
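The claim that an interpreter together with a fixed program can be specialized to compiled code can be illustrated on a toy scale. The following Python sketch is ours, not the thesis': a trivial interpreter over a "program" of additive constants is specialized by doing all static work at specialization time, leaving only the residual computation on the dynamic input.

```python
def interpret(program, x):
    """Toy interpreter: the program is a list of constants to add to x."""
    for c in program:
        x = x + c
    return x

def specialize(program):
    """Partially evaluate 'interpret' w.r.t. a static program: the loop over
    the program disappears, leaving residual 'compiled' code over x only."""
    total = sum(program)            # all static work done here, once
    return lambda x: x + total      # the residual program
```

The residual closure behaves exactly like the interpreter run on the same program, but no longer inspects the program at run time.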
Typically the outcome is an ASM where the parameterization features are not used anymore, and where do-forall and choose rules are replaced with a finite set of simpler rules.

Partial Evaluation of Symbols

We give a special ASM SymbolPE covering the partial evaluation of symbols. A symbol is either a string, or the meta-constructor representing the $-operator. Partial evaluation of a symbol tries to partially evaluate the argument of the meta-constructor, and if the result is a constant-constructor containing a string, this string is returned.

ASM 26:
asm SymbolPE(s: Symbol, sf: set of String)
accesses function PE(_)
accesses constructors meta(_), constant(_)
is
  if s =˜ meta(&t) then
    let tPE = PE(&t, sf) in
      if tPE =˜ constant(&symb) then
        return &symb
      else
        return meta(tPE)
      endif
    endlet
  endif
endasm

Interface of PE

The interface of PE consists of its accesses to the constructors used to represent X ASM rules. External functions are the above-mentioned ASM SymbolPE, and the later introduced ASMs ArgumentPE, RemoveConstant, and InstantiateRules.

ASM 27:
asm PE(t: Rule, sf: set of String)
accesses constructors
  update(Symbol, [Expr], Expr),
  conditional(Expr, Rule, Rule),
  doForall(Symbol, Symbol, Expr, Rule),
  choose(Symbol, Symbol, Expr, Rule, Rule),
  extendRule(Symbol, Symbol, Rule),
  constant(Value),
  apply(Symbol, [Expr]),
  letClause([LetDef], Rule),
  letDef(Symbol, Expr)
is
external functions SymbolPE(_,_), ArgumentPE(_,_), RemoveConstant(_), InstantiateRules(_,_,_,_,_)
...

Partial Evaluation of Constants

This first case is the simplest one of all. It returns the constant as it is. Thus the following fragment is added to ASM 27:

...
if t =˜ constant(&c) then
  return t
...

Partial Evaluation of Function Application

As a second case, function applications are processed.
The idea behind partial evaluation of a function application is to partially evaluate the symbol and the arguments (using ASM ArgumentPE), and then to check whether

- the partially evaluated symbol is a string,
- this string is in the set sf of static functions, and
- all arguments partially evaluated to constants.

If all these conditions hold, the RemoveConstant function is used to transform the argument list of constant-constructors into a list of values, and the Apply function is used to calculate the result of applying the corresponding function. This result is then wrapped into a constant-constructor and returned as result of the partial evaluation.

...
if t =˜ apply(&op, &a) then
  let opPE = SymbolPE(&op, sf),
      aPE = ArgumentPE(&a, sf) in
    if opPE isin sf andthen
       (forall a in list aPE: a =˜ constant(&)) then
      let argList = RemoveConstant(aPE) in
        return constant(Apply(opPE, argList))
      endlet
    else
...

In all other cases an apply-constructor with partially evaluated arguments is returned.

...
    else
      return apply(opPE, aPE)
    endif
  endlet
...

The above rule uses the ASM ArgumentPE to partially evaluate argument lists, and the ASM RemoveConstant to remove the constant-constructor from lists of constant arguments. Their definitions are given now.

ASM 28:
asm ArgumentPE(l: [Expression], sf: set of String)
accesses function PE(_)
is
function r <- []
  if l =˜ [&hd | &tl] then
    r := r + [PE(&hd, sf)]
    l := &tl
  else
    return r
  endif
endasm

ASM 29:
asm RemoveConstant(l: [Constant])
is
function r <- []
  if l =˜ [constant(&hd) | &tl] then
    r := r + [&hd]
    l := &tl
  else
    return r
  endif
endasm

Partial Evaluation of Rules

The partial evaluation of updates, rule lists, conditional rules, and extend rules is straightforward. In order to allow for homogeneous processing, our algorithm always returns a list of rules. The following fragment is added to ASM 27.

...
elseif t =˜ update(&s, &a, &e) then
  let sPE = SymbolPE(&s, sf),
      aPE = ArgumentPE(&a, sf),
      ePE = PE(&e, sf) in
    return [update(sPE, aPE, ePE)]
  endlet
elseif t =˜ [&hd | &tl] then
  return PE(&hd, sf) + PE(&tl, sf)
elseif t =˜ conditional(&e, &r1, &r2) then
  let ePE = PE(&e, sf),
      r1PE = PE(&r1, sf),
      r2PE = PE(&r2, sf) in
    if ePE = constant(true) then
      return r1PE
    elseif ePE = constant(false) then
      return r2PE
    else
      return [conditional(ePE, r1PE, r2PE)]
    endif
  endlet
elseif t =˜ extendRule(&i, &s, &r) then
  let iPE = SymbolPE(&i, sf),
      sPE = SymbolPE(&s, sf),
      rPE = PE(&r, sf) in
    if rPE = [] then
      return []
    else
      return [extendRule(iPE, sPE, rPE)]
    endif
  endlet
...

Partial Evaluation of Choose

The partial evaluation of choose can only simplify the rule if the bound variable and the universe are not meta, if the universe is static, and if for each element of the universe the guarding predicate partially evaluates to either constant(true) or constant(false). If there are exactly zero or one elements for which the guard partially evaluates to constant(true), the rule can be simplified. Otherwise, a static set of the elements for which the guard evaluated to true could be constructed; this last simplification is not given here.

...
elseif t =˜ choose(&i, &s, &e, &r1, &r2) then
  let iPE = SymbolPE(&i, sf),
      sPE = SymbolPE(&s, sf) in
    if sPE isin sf and not iPE =˜ meta(&) and
       (forall $iPE$ in $sPE$:
          (let ePE = PE(&e, sf + {iPE}) in
             ePE = constant(true) or ePE = constant(false))) then
      if not (exists $iPE$ in $sPE$:
                PE(&e, sf + {iPE}) = constant(true)) then
        return PE(&r2, sf)
      elseif (exists unique $iPE$ in $sPE$:
                PE(&e, sf + {iPE}) = constant(true)) then
        let i0 = (choose $iPE$ in $sPE$:
                    PE(&e, sf + {iPE}) = constant(true)) in
          return (let $iPE$ = i0 in PE(&r1, sf + {iPE}))
        endlet
      else
        return [choose(iPE, sPE, PE(&e,sf), PE(&r1,sf), PE(&r2,sf))]
      endif
    else
      return [choose(iPE, sPE, PE(&e,sf), PE(&r1,sf), PE(&r2,sf))]
    endif
  endlet
...
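The constant-folding of applications and the branch selection for conditionals can be sketched together in Python. This is our own miniature, not the thesis' X ASM: terms are nested tuples, sf is a set of static function names, and the rule-level PE is passed in as a parameter (here stubbed in the usage below).

```python
def pe_expr(t, sf, funcs):
    """Partially evaluate an expression term: fold apply-nodes whose operator
    is static and whose arguments all fold to constants."""
    if t[0] == "apply":
        op, args = t[1], [pe_expr(a, sf, funcs) for a in t[2]]
        if op in sf and all(a[0] == "constant" for a in args):
            vals = [a[1] for a in args]            # RemoveConstant, in effect
            return ("constant", funcs[op](*vals))  # Apply, then re-wrap
        return ("apply", op, args)                 # residual application
    return t                                       # constants etc. unchanged

def pe_conditional(e, r1, r2, sf, funcs, pe_rule):
    """PE of conditional(&e, &r1, &r2); rules partially evaluate to lists."""
    e_pe = pe_expr(e, sf, funcs)
    r1_pe, r2_pe = pe_rule(r1, sf), pe_rule(r2, sf)
    if e_pe == ("constant", True):
        return r1_pe                               # dead else-branch removed
    if e_pe == ("constant", False):
        return r2_pe
    return [("conditional", e_pe, r1_pe, r2_pe)]   # residual rule, as a list
```

Note how pe_conditional preserves the invariant that rule-level PE returns a list of rules, collapsing to one branch's list when the guard folds to a constant.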
Partial Evaluation of Parallel Let Definitions

The partial evaluation of parallel let definitions tries to find a let definition whose let-symbol partially evaluates to a string, and whose defining term partially evaluates to a constant. If such a let definition is found, consisting of a symbol s, a defining constant c, and a rule r, the rule can be partially evaluated with the set sf of static function symbols extended by s:

let $s$ = c in
  PE(r, sf + {s})
endlet

Subsequently, the let definition for s can be removed. This is the core of the partial evaluation of let. The remaining parts are concerned with processing the list of let definitions, and with reassembling those lets which cannot be removed. The first if checks whether the list of letDef constructors is empty; if it is, the partially evaluated rule is returned. Otherwise, in the "then" part of the first if-construct, the symbol and the term of the first let are partially evaluated to pPE and tPE, respectively. If, as result of the partial evaluation, the symbol is no longer meta, and the term did evaluate to a constant, the let clause is removed by partially evaluating the rule, extending the set of static functions sf with the symbol pPE, and setting the value of pPE to the constant tPE by a simple let-construct:

...
if (not pPE =˜ meta(&)) and tPE =˜ constant(&tConst) then
  return PE(letClause(&tl,
                      (let $pPE$ = &tConst in
                         PE(&r1, sf + {pPE}))), sf)
...

Otherwise, the non-constant pPE and tPE are remembered, the rule is partially evaluated with the remaining lets, and at the end the let-definition with pPE and tPE is added to the rule again. Adding the let definition is done by appending it to the list of parallel lets if the rule returned from partial evaluation is a let-construct; otherwise a new let-clause with the single let-definition (pPE, tPE) is created:

...
let rPE = PE(letClause(&tl, &r1), sf) in
  if rPE =˜ letClause(&defList2, &r2) then
    return letClause([letDef(pPE, tPE) | &defList2], &r2)
  else
    return letClause([letDef(pPE, tPE)], rPE)
  endif
endlet
...

The full PE-definition for parallel lets is given as follows.

...
elseif t =˜ letClause(&defList1, &r1) then
  if &defList1 =˜ [letDef(&p, &t)|&tl] then
    let pPE = SymbolPE(&p, sf),
        tPE = PE(&t, sf) in
      if (not pPE =˜ meta(&)) and tPE =˜ constant(&tConst) then
        return PE(letClause(&tl,
                            (let $pPE$ = &tConst in
                               PE(&r1, sf + {pPE}))), sf)
      else
        let rPE = PE(letClause(&tl, &r1), sf) in
          if rPE =˜ letClause(&defList2, &r2) then
            return letClause([letDef(pPE, tPE) | &defList2], &r2)
          else
            return letClause([letDef(pPE, tPE)], rPE)
          endif
        endlet
      endif
    endlet
  else
    return PE(&r1, sf)
  endif
endif

Partial Evaluation of Forall Rules

The partial evaluation of a forall rule performs a kind of parallel loop unrolling if the universe of elements is static.

...
elseif t =˜ doForall(&i, &s, &e, &r) then
  let iPE = SymbolPE(&i, sf),
      sPE = SymbolPE(&s, sf),
      ePE = PE(&e, sf),
      rPE = PE(&r, sf) in
    if ePE = constant(false) then
      return []
    else
      if sPE isin sf and not iPE =˜ meta(&) then
        return InstantiateRules(iPE, sPE, ePE, &r, sf)
      else
        return [doForall(iPE, sPE, ePE, rPE)]
      endif
    endif
  endlet
...

The ASM InstantiateRules has five arguments: the bound variable i, the universe s, the guard expression e, the rule r, and the set of static functions sf. A local universe SetCollector is used to collect an ASM rule for each element in universe s, and a variable ListCollector is then used to construct a parallel rule-block from these rules. A variable trigger is used to sequentialize the phases of collecting the rules and then building the list representing the rule-block. The interface of the ASM is given as follows.
ASM 30:
asm InstantiateRules(i: String, s: String, e: Expr, r: Rule, sf: set of Strings)
accesses function PE(_,_)
is
relation trigger
universe SetCollector
function ListCollector <- []
  if not trigger then
...

The collection of rules is done by a "do forall"-rule, which ranges i over universe s, and partially evaluates rule r in an environment where i is bound to an element of s and the set of static functions is extended with i.

...
    do forall $i$ in $s$
      let ePE = PE(e, sf+{i}),
          rPE = PE(r, sf+{i}) in
...

Depending on whether the guard condition e partially evaluates to a constant or not, the partially evaluated rule is either collected, skipped, or embedded into a conditional-constructor. Having processed each $i$ in $s$, the trigger is set to true, and the next mode is entered in the else-branch of the outermost if-construct.

...
        if ePE = constant(false) then
        elseif ePE = constant(true) then
          SetCollector(rPE) := true
        else
          SetCollector(conditional(ePE, rPE, [])) := true
        endif
      endlet
    enddo
    trigger := true
  else
...

Once relation trigger is set to true, a choose rule is fired, which selects an element of universe SetCollector, appends it to list ListCollector, and removes it from SetCollector. This choose-rule is repeated until SetCollector is empty; then ListCollector is returned as result.

...
    choose r0 in SetCollector
      SetCollector(r0) := false
      ListCollector := [r0|ListCollector]
    ifnone
      return ListCollector
    endchoose
  endif
endasm

Our algorithm does not check whether the set of static symbols makes sense. A more sophisticated version of the algorithm would try to deduce itself which functions could be static, by analyzing which functions are updated and which are not. Such an analysis, together with the partial evaluation of X ASM calls, would result in a more powerful partial evaluator.
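The two remaining rule cases, let-removal and forall-unrolling, can be sketched together in Python. Again this is our own miniature, not the thesis' X ASM; the toy partial evaluator subst only folds variables that are known in the static environment.

```python
def subst(t, sf, env):
    """Toy partial evaluator: fold variables bound in the static env."""
    if t[0] == "var" and t[1] in env:
        return ("constant", env[t[1]])
    return t

def pe_let(defs, body, sf, env, pe):
    """Remove let-definitions whose term folds to a constant: the symbol
    joins the static set; the others remain as a residual parallel let."""
    residual = []
    for name, term in defs:
        t_pe = pe(term, sf, env)
        if t_pe[0] == "constant":
            sf = sf | {name}
            env = {**env, name: t_pe[1]}
        else:
            residual.append((name, t_pe))
    body_pe = pe(body, sf, env)
    return ("letClause", residual, body_pe) if residual else body_pe

def unroll_forall(universe, guard_pe, rule_pe):
    """Unroll a do-forall over a static universe into a flat rule block;
    each instance is kept, dropped, or guarded, as in InstantiateRules."""
    rules = []
    for elem in universe:
        g, r = guard_pe(elem), rule_pe(elem)
        if g == ("constant", False):
            continue                              # statically skipped
        rules.append(r if g == ("constant", True)
                     else ("conditional", g, r, []))
    return rules
```

With a guard that folds to a constant for every element, the forall disappears entirely into a finite list of instantiated rules, which is exactly the effect described above.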
5.5.2 The do-if-let transformation for sequentiality in ASMs

In Section 4.1.2 we have briefly discussed how sequentiality is typically modeled in ASMs by means of a variable holding the "program counter". We call such a variable a sequentialization variable. Besides the initial example, we have seen many ASMs using such variables. In simple cases such functions could be replaced with a simple sequentiality operator. More interesting are cases where several such variables exist, and the sequential steps are not within a one-dimensional space, but within a space having as many dimensions as there are sequentialization variables. An example of such a more complex case is TFSM interpretation, where the variables holding the current node and the current state span a two-dimensional space.

We present here a transformation of X ASM rules which takes advantage of information about sequentialization variables, and reformulates an X ASM rule in such a way that partial evaluation of the resulting rule will result in a high portion of pre-evaluation, and in remarkably simplified rules.

Def. 18: do-if-let transformation of ASM rules. Given the sequentialization variables v1, ..., vk ranging over universes U1, ..., Uk and an ASM rule R(v1, ..., vk), the do-if-let transformation is defined to be

do forall v1' in U1, ..., vk' in Uk
  if (v1, ..., vk) = (v1', ..., vk') then
    let v1 = v1', ..., vk = vk' in
      R(v1, ..., vk)
    endlet
  endif
enddo

The idea behind this transformation is to enumerate all possible states of the sequentialization variables in an outermost do-forall. If this do-forall is partially evaluated, the rule is instantiated for each such state. By introducing the guard of the if, it is guaranteed that always only one of the instantiated rules is executed. Thus a flat structure of rules, which are guaranteed to be visited in some sequential order, has been created. This rule can then easily be transformed into sequential fast code.
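On term-represented rules, the transformation just defined can be sketched in Python. All names are ours; the guard and let are built as constructor terms, and the "tuple" and "=" operators stand in for the equality test of Def. 18.

```python
from itertools import product

def do_if_let(rule, seq_vars, universes):
    """Enumerate all value combinations of the sequentialization variables;
    wrap the rule in an equality guard and a rebinding let per instance."""
    instances = []
    for values in product(*universes):
        guard = ("apply", "=",
                 [("apply", "tuple", [("apply", v, []) for v in seq_vars]),
                  ("constant", values)])
        lets = [(v, ("constant", c)) for v, c in zip(seq_vars, values)]
        instances.append(("conditional", guard,
                          ("letClause", lets, rule), []))
    return instances
```

Each instance carries a let that shadows the dynamic sequentialization variables with constants, which is what lets the partial evaluator treat them as static inside that instance.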
As the last step of the transformation, a let is introduced which overrides the definition of the sequentialization variables by introducing bound let-variables with the same names. The values of these variables are set to the bound variables of the do-forall loop. The PE algorithm can now extend the set of static function symbols sf with all bound variables v1', ..., vk'; by means of the let-clause they are renamed into v1, ..., vk, and finally the rule R(v1, ..., vk) can be partially evaluated at each instance with static definitions of the sequentialization variables. In Section 6.4 we will show how the do-if-let transformation is applied to compilation of TFSMs.

5.6 Related Work and Conclusions

We have motivated and introduced PXasm by showing that they are needed in situations where a family of related problems exists, but the most natural models for the family members do not share one unique signature. Introducing a unique signature may lead to a natural model of the problem family, but if we are interested in models of the family members, a unique signature is often inappropriate. PXasm are a means for constructing the signature of each family member as soon as the exact member is determined. PXasm can therefore be seen as another approach to domain engineering, which we discussed in Section 2.8. In contrast to the domain-specific languages (DSL) approach, PXasm does not allow us to introduce new language features having a specialized syntax and static semantics. PXasm allows us to mirror with the signature the terminology of the problem domain. We use this technique in this thesis to describe the meta-formalism Montages for DSLs, where each problem is a specific DSL using the terminology of the corresponding domain. For a meta-formalism like Montages there are four implementation patterns.
The four choices result from the fact that for both the language description and the program written in the language, we have to decide whether a compilation approach or an interpretation approach is chosen. Even more complexity has to be handled if additional configuration information exists; again, the configuration information can be interpreted at runtime, or compiled into specialized code. If we continue categorizing the full problems, we end up with eight different implementation patterns. Using partial evaluation, all of these patterns can be implemented: those parts which should be compiled are marked as static, and those which should be interpreted are marked as dynamic. The detailed discussion of partial evaluation and its use to generate interpreters and compilers from Montages descriptions would go beyond the scope of this thesis, and we refer to the literature (46; 108). Nevertheless we would like to refer to the work of Bigonha, Di Iorio, and Maia (57; 56), who investigated the general problem of partial evaluation for language interpreters written with ASMs. Combining their advanced partial evaluation techniques with our relatively simple problem of partially evaluating TFSMs may result in very good code.

Since the aim of PXasm is to parameterize the signature of traditional ASMs, we restrict the possible values for the signature parameters to strings. Partial evaluation can then be used to reduce them back to traditional ASMs. Mapping Automata (MA) (101) allow one to use arbitrary elements as signature. While traditional ASMs and PXasm view each dynamic function as a set of mappings from locations to values, MA view dynamic functions as objects associated with mappings from attributes to values. Therefore in MA the signature is equivalent to the superuniverse U. The extend rule can be used to create a new element, and at the same time a new dynamic function is created. The details of MA are given in Appendix B. In contrast to MA, in PXasm the signature is still a static collection of function symbols, but the collection may be calculated while initializing the PXasm. A PXasm is thus an MA where the signature is restricted to a collection of symbols (string values) which is calculated at initialization and remains static during execution.

As presented, X ASM rules must be transformed into constructor terms before they can be interpreted or partially evaluated. A further improvement could be achieved by allowing one to use X ASM rules directly as values. Instead of writing the rather unreadable ASM 23 we could then write:

ASM 31:
asm P'
is
function x(_)
accesses universe Integer
  INTERP( """"
    do forall i in Integer: i >= 2 and i < n
      x(i - 1) := x(i)
    endo
  """" )

where the quadruple quotes """" are used to indicate that a rule value is used. Since these rule values correspond to the constructor terms representing the rules, it makes sense to allow pattern matching on such rules. For instance, the rather clumsy formulation of partial evaluation of the conditional rule in Section 5.5.1 could be given as follows:

...
elseif t =˜ """" if &e then &r1 else &r2 endif """" then
  let ePE = PE(&e, sf),
      r1PE = PE(&r1, sf),
      r2PE = PE(&r2, sf) in
    if ePE = """" true """" then
      return r1PE
    elseif ePE = """" false """" then
      return r2PE
    else
      return """" if #ePE# then #r1PE# else #r2PE# endif """"
    endif
  endlet
...

where the #-operator is used within quadruple quotes to evaluate rule-values, similar to the way the $-operator evaluates strings to symbols. The term within the #-operator must evaluate to a rule which has previously been created with the quadruple quotes, and it is checked that the result is a correct PXasm rule. The quadruple quotes together with the #-feature build a so-called template language, as described by Cleaveland (44; 45). Cleaveland discusses in detail the advantages of a full-featured template language.
The implementation and design of the above sketched X ASM template language, possibly integrating Cleaveland's XML template language, remain for future work. The combination of partial evaluation and parameterized signature can likewise be considered to work like a template language (127). The actual "generation" of the program happens only if the partial evaluation results in a complete evaluation of the signature parameters, whereas in traditional template languages, or in the case of the above discussed """"/# features, the content of the templates can always be evaluated. Further, our parameterization of signature is integrated with our development language X ASM in such a way that programs can be executed even if partial evaluation did not completely evaluate the parameterized signature; in contrast, unevaluated templates are typically not valid programs. Therefore the combination of parameterized signature with partial evaluation could be described as a template language which allows for incremental and partial instantiation of templates, and which allows one to execute fully instantiated, partially instantiated, and non-instantiated templates. The combination of """"/# works more like a conventional template language.

X ASM has shown to be well suited to our approach to code generation via partial evaluation and signature parameterization, since it has a very simple denotational semantics, and everything is evaluated dynamically. As discussed, in X ASM the semantics of the available programming constructs is composed by combining the update-sets and values of sub-constructs; this system is fully referentially transparent, and does not suffer from the side-effect problems of normal imperative languages. Based on such a model, it is easier to use partial evaluation and to add parameterization of signatures than to implement them on top of an existing language such as C or Java.
6 TFSM: Formalization, Simplification, Compilation

In this chapter we show in detail the TFSM interpreter (corresponding to the algorithm Execute we have given in Section 3.3.5) and how it can be specialized into compiled code by assuming that a given TFSM is static. The partial evaluation of a full Montages meta-interpreter works in a similar way, but the details of the full problem are left for future work. Nevertheless this chapter serves as well as a more detailed description of the Montages system architecture described in Figure 37. In Section 6.1 the TFSM interpreter is given in two versions, one for deterministic and one for non-deterministic TFSMs. The following two sections show how to simplify TFSMs, by eliminating transitions without action rules (Section 6.2), and by partially evaluating action rules and transitions once a TFSM is built (Section 6.3). Finally, in Section 6.4, compilation of TFSMs is discussed, and in the last section of the chapter some conclusions are drawn.

6.1 TFSM Interpreter

In Section 3.3 we have given the construction of TFSMs, and in Section 3.3.5 we sketched how they are executed. Given the formalization of the AST, we can now give an ASM Execute which executes a TFSM. Later, in Section 6.4, it will serve as example for the new Montages tool architecture, and finally, in Section 8.4.6, it is used as part of the formal semantics of the Montages formalism itself. We repeat the major definitions from previous sections. The state of TFSM execution is given by two 0-ary dynamic functions, the current node CNode and the current state CState. If the state (n0, s0) is visited, or in other words if CNode = n0 and CState = s0, then the action rule associated with CState is executed, using fields of CNode to store and retrieve intermediate results. Fields are modeled as unary dynamic functions.
The function

function getAction(Node, State) -> Action

is defined such that, for each node n and state s, the term n.getAction(s) returns the corresponding X ASM action, represented as a constructor term. Transitions in TFSMs change both the current node and the current state. A TFSM-transition t is defined to be a tuple with five components: the source node n, the source state s, the condition c, the target node n', and the target state s':

t = (n, s, c, n', s')

In the condition expression c, the source node n can be referred to via the bound variable src, and the target node n' via the bound variable trg. All TFSM transitions are contained in the universe Transitions. In the following two sections we give two variants of a TFSM interpreter: one which can execute non-deterministic TFSMs, i.e. TFSMs where several transitions may be triggered at once and therefore one has to be chosen non-deterministically, and one interpreter which is specialized for deterministic TFSMs.

6.1.1 Interpreter for Non-Deterministic TFSMs

The interface of ASM Execute(n, s) consists of

- the parameters n and s, used to initialize the variables CNode and CState, respectively,
- the access to universes CharacteristicSymbols and SynonymSymbols, and subsequently the accesses to the node-universes and selector functions defined by these universes, and finally
- the access to universe Transitions containing all transitions of the TFSM, and the access to function getAction(_, _) associating all TFSM-states with the corresponding action rule.

The declaration part defines a boolean variable (or 0-ary relation, in ASM terminology) fired, which is switched between true and false, indicating whether we are in step 1 or 2 of the algorithm given in Section 3.3.5. The interpreter INTERP is defined as external function, and the two variables CNode and CState are declared.
ASM 32:
asm Execute(n, s)
accesses universes CharacteristicSymbols, SynonymSymbols,
(forall c in CharacteristicSymbols:
  accesses universe $c$
  accesses function $"S-"+c$(_))
(forall s in SynonymSymbols:
  accesses universe $s$
  accesses function $"S-"+s$(_))
accesses universe Transitions
accesses function getAction(_, _)
is
relation fired
functions CNode <- n, CState <- s
external function INTERP(_)
...

The rule of ASM Execute has two parts which are executed in alternation. If fired equals false, the first part is executed, interpreting the action rule of the current state using the INTERP function, and providing the correct binding of the self variable using a let-construct. The first part redefines fired to true, such that in the next step the second part is executed.

...
if not fired then
  let self = CNode in
    INTERP(getAction(CNode, CState))
  endlet
  fired := true
else
...

The second part tries to choose a transition whose source node and state match the current state (CNode, CState), and whose condition evaluates to true when the src and trg variables are bound to the current node CNode and to the target node of the transition, respectively.

...
else
  choose t in Transitions:
      t =˜ (CNode, CState, &c, &tn, &ts) and
      (let src = CNode in (let trg = &tn in INTERP(&c)))
    CNode := &tn
    CState := &ts
  ifnone
...

If no transition with a valid condition is found, a transition with a default condition is chosen and activated. Subsequently, the relation fired is set to false.

...
  ifnone
    choose t in Transitions:
        t =˜ (CNode, CState, default, &tn', &ts')
      CNode := &tn'
      CState := &ts'
    endchoose
  endchoose
  fired := false
endif
endasm

6.1.2 Interpreter for Deterministic TFSMs

For the later sections reusing the TFSM interpretation algorithm, it is advantageous to transform the non-deterministic form using the choose-construct into a deterministic form using the do-forall-construct.
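The alternating two-phase loop of ASM 32 can be sketched in Python. This is our own model, not the thesis' code: conditions are Python predicates over (src, trg), a condition of None stands for the default condition, and run_tfsm is bounded by a step count instead of running forever.

```python
def run_tfsm(node, state, actions, transitions, steps):
    """Execute a TFSM for a fixed number of steps. 'actions' maps
    (node, state) to a callable taking self; 'transitions' is a list of
    (src_node, src_state, condition_or_None, trg_node, trg_state)."""
    for _ in range(steps):
        actions[(node, state)](node)              # phase 1: fire action, self=node
        enabled = [t for t in transitions         # phase 2: enabled transitions
                   if (t[0], t[1]) == (node, state)
                   and t[2] is not None and t[2](node, t[3])]
        if not enabled:                            # fall back to the default one
            enabled = [t for t in transitions
                       if (t[0], t[1]) == (node, state) and t[2] is None]
        if enabled:                                # non-determinism collapsed:
            node, state = enabled[0][3], enabled[0][4]   # just take the first
    return node, state
```

Taking the first enabled transition is where a real implementation would choose non-deterministically; for deterministic TFSMs the list has at most one element anyway.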
Such a transformation is possible if the provided TFSM is deterministic, i.e. if

- the conditions on transitions sourcing in the same node/state pair are mutually exclusive, and
- there is exactly one transition with default condition sourcing in each node/state pair.

Given such a deterministic TFSM, we can replace each default condition with the negated conjunction of the conditions of all other transitions sourcing in the same node/state pair. The ASM TransformTransitions replaces each transition with default condition by a transition whose condition is calculated by the ASM NegateConjunction(_, _).

ASM 33:
asm TransformTransitions
updates universe Transitions
is
external function NegateConjunction(_,_)
  forall t1 in Transitions:
      t1 =˜ (&sn, &ss, default, &tn, &ts)
    Transitions(t1) := false
    let c' = NegateConjunction(&sn, &ss) in
      Transitions((&sn, &ss, c', &tn, &ts)) := true
    endlet
  endforall
  return true
endasm

The ASM NegateConjunction(_, _) takes as arguments a node sn and a state ss. The ASM has two modes: in the first, where relation trigger is equal to false, a universe SetCollector is filled with the conditions of all transitions whose source node and state are (sn, ss) and whose condition is not default. Once the universe is built up, the algorithm switches into the second mode by setting trigger to true.

ASM 34:
asm NegateConjunction(sn, ss)
accesses universe Transitions
is
relation trigger
universe SetCollector
function ListCollector <- []
  if not trigger then
    do forall t in Transitions:
        t =˜ (sn, ss, &c, &, &) andthen &c != default
      SetCollector(&c) := true
    enddo
    trigger := true
  else
...

In the second mode, the conditions in the SetCollector are transformed into a list, and then the constructor corresponding to the negated conjunction of this list is returned as the result of NegateConjunction.

...
  else
    choose r0 in SetCollector
      SetCollector(r0) := false
      ListCollector := [r0|ListCollector]
    ifnone
      return apply("not", [apply("and", ListCollector)])
    endchoose
  endif
endasm

Given the preconditions, and after applying the above transformation, we have eliminated all default transitions and we know that for every TFSM state at most one transition can be triggered. Under these circumstances the following deterministic ASM can be used instead of the non-deterministic ASM 32. The interface is not changed and is directly reused from ASM 32.

ASM 35:
asm Execute(n, s)
...
if not fired then
  let self = CNode in
    INTERP(getAction(CNode, CState))
  endlet
  fired := true
else
  do forall t in Transitions:
      t =˜ (CNode, CState, &c, &tn, &ts)
    if (let src = CNode in (let trg = &tn in INTERP(&c))) then
      CNode := &tn
      CState := &ts
      fired := false
    endif
  enddo
endif
endasm

6.2 Simplification of TFSMs

The simplification phase applies the TFSM simplification algorithm of Section 3.3.4. The following ASM SimplifyTFSM removes all states with empty action rules, as visualized in Figure 27. The algorithm tries to find two transitions t1 and t2, such that t1 goes from (n, s) to (n', s'), t2 goes from (n', s') to (n'', s''), and the intermediate state (n', s') is not associated with an action rule. In this case the conditions of t1 and t2 can be combined, and the two transitions can be replaced with a single transition from (n, s) to (n'', s''). The condition of the new transition is the conjunction of the conditions of t1 and t2. Since these transitions have different src and trg nodes, the right src and trg definitions are fed to them via let-clauses.
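The two TFSM transformations discussed so far, default elimination (ASM 33/34) and the merging of transitions through action-less states, can both be sketched in Python. Conditions are modeled as Python predicates over (src, trg) with None as the default condition; all names are ours.

```python
def eliminate_defaults(transitions):
    """Replace each default transition (condition None) by the negated
    conjunction of all other conditions sourcing in the same (node, state)."""
    out = []
    for sn, ss, cond, tn, ts in transitions:
        if cond is not None:
            out.append((sn, ss, cond, tn, ts))
            continue
        others = [c for n, s, c, _, _ in transitions
                  if (n, s) == (sn, ss) and c is not None]
        neg = (lambda cs: lambda src, trg:
               not any(c(src, trg) for c in cs))(others)
        out.append((sn, ss, neg, tn, ts))
    return out

def simplify_tfsm(transitions, has_action):
    """Merge t1: (n,s)->(n',s') with t2: (n',s')->(n'',s'') whenever the
    intermediate state (n',s') has no action rule; conditions are conjoined,
    feeding the intermediate node in as trg of t1 and src of t2."""
    changed = True
    while changed:
        changed = False
        for t1 in list(transitions):
            for t2 in list(transitions):
                n, s, c1, n1, s1 = t1
                m, r, c2, n2, s2 = t2
                if t1 is not t2 and (n1, s1) == (m, r) and not has_action(n1, s1):
                    transitions.discard(t1)
                    transitions.discard(t2)
                    conj = (lambda a, b, mid: lambda src, trg:
                            a(src, mid) and b(mid, trg))(c1, c2, n1)
                    transitions.add((n, s, conj, n2, s2))
                    changed = True
                    break
            if changed:
                break
    return transitions
```

Each merge reduces the number of transitions by one, so the loop terminates; as noted below, this only makes sense once default conditions have been eliminated.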
ASM 36:
  asm SimplifyTFSM
    updates universe Transitions
    accesses function getAction(_, _)
  is
    choose t1, t2 in Transitions:
        t1 =~ (&n, &s, &cond1, &n', &s')
        andthen t2 =~ (&n', &s', &cond2, &n'', &s'')
        andthen &n'.getAction(&s') = []
      Transitions(t1) := false
      Transitions(t2) := false
      Transitions((&n, &s,
        apply("and", [letClause([letDef("src", constant(&n)),
                                 letDef("trg", constant(&n'))],
                                &cond1),
                      letClause([letDef("src", constant(&n')),
                                 letDef("trg", constant(&n''))],
                                &cond2)]),
        &n'', &s'')) := true
    endchoose
  endasm

The above algorithm works only if there are no default conditions¹, e.g. for deterministic TFSMs to which the above ASM 33 has been applied.

¹ A second problem arises if there are states where the control may remain forever, or cycles among nodes without transition rules. Such cycles may again raise the problem that the control resides there forever. Since such a cycle has never occurred in our examples, and since we never experimented with examples where it is important that the "remains forever in the same state" behavior is maintained, we do not treat these cases further.

6.3 Partial Evaluation of TFSM Rules and Transitions

In this section we show how to apply PE to rules and transitions, taking advantage of the fact that self is static for the rules, and that src and trg are static for the transitions. Further we assume that the selector functions and universes are static. In order to simplify the algorithms we skip the parts which define the access interfaces to selector functions and node universes, and which add these functions to the sets of static functions provided to the PE algorithm. The first ASM, PartialEvaluateTFSMrules(sf), replaces each action rule with its partially evaluated version, taking as set of static functions those given as argument plus self. The argument sf will typically contain the selector functions and the node universes, as well as some static functions defined by the environment.
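One application of the choose-rule of ASM 36 can be sketched in Python as follows. This is a simplified illustration: it merges one transition pair running through a state with an empty action rule, but it omits the let-clauses that ASM 36 wraps around the two conditions to fix src and trg.

```python
def simplify_step(transitions, get_action):
    """One SimplifyTFSM step (sketch of ASM 36): merge a pair of transitions
    t1: (n,s) -> (n1,s1) and t2: (n1,s1) -> (n2,s2) whose intermediate state
    carries no action rule. Returns the new set, or None if no pair matches."""
    for t1 in transitions:
        (n, s, c1, n1, s1) = t1
        for t2 in transitions:
            (m, ms, c2, n2, s2) = t2
            if t1 != t2 and (m, ms) == (n1, s1) and get_action(n1, s1) == []:
                merged = (n, s, ("and", c1, c2), n2, s2)  # conjoined condition
                return (transitions - {t1, t2}) | {merged}
    return None
```

Repeatedly calling simplify_step until it returns None corresponds to exhaustively applying the choose-rule, which is how the "initial" and "go" states of the goto example disappear in Section 6.4.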
The decision which functions are static, and when to call the partial evaluation, is again left to the user.

ASM 37:
  asm PartialEvaluateTFSMrules(sf)
    updates function getAction(node)
    (forall f in set sf: accesses function $f$)
  is
    external function PE(rule, staticSet)
    do forall n in NODE
      n.getAction := let self = n in
                       PE(n.getAction, sf + {"self"})
                     endlet
    enddo
  endasm

The second ASM, PartialEvaluateTFSMtransitions, replaces each transition with a variant whose condition has been partially evaluated, assuming that the term src statically evaluates to the source node of the transition, and that the term trg statically evaluates to the target node of the transition.

ASM 38:
  asm PartialEvaluateTFSMtransitions(sf)
    updates universe Transitions
    (forall f in set sf: accesses function $f$)
  is
    external function PE(rule, staticSet)
    do forall t in Transitions: t =~ (&sn, &ss, &c, &tn, &ts)
      Transitions(t) := false
      let cPE = let src = &sn, trg = &tn in
                  PE(&c, sf + {"src", "trg"})
                endlet
      in
        Transitions((&sn, &ss, cPE, &tn, &ts)) := true
      endlet
    enddo
  endasm

6.4 Compilation of TFSMs

In this section we show the compilation of TFSMs into specialized ASM code. We apply partial evaluation to the transition rule of Execute, given in Section 6.1.2, ASM 35. As a first step of the compilation we reformulate the original formulation of Execute (ASM 35) using the do-if-let transformation (Definition 18, Section 5.5.2), taking CNode and CState as sequentialization variables.

ASM 39:
  asm Execute(n, s)
    ...
  is
    relation fired
    functions CNode <- n, CState <- s
    external function INTERP(_)
    do forall n0 in NODE, s0 in STATE
      if (CNode, CState) = (n0, s0) then
        let CNode = n0, CState = s0 in
          if not fired then
            let self = CNode in
              INTERP(getAction(CNode, CState))
            endlet
            fired := true
          else
            do forall t in Transitions: t =~ (CNode, CState, &c, &tn, &ts)
              if (let src = CNode in (let trg = &tn in INTERP(&c))) then
                CNode := &tn
                CState := &ts
                fired := false
              endif
            enddo
          endif
        endlet
      endif
    enddo
  endasm

We take the TFSM defined in Section 3.4.5, Figure 33, representing the example program of the goto language given by Grammar 2 and the Montages in Figures 31, 30, and 32. We assume that the TFSM of the example program as well as the rules associated with the states are static. Further, we can see that if the simplification algorithm of Section 3.3.4 is applied throughout, the TFSM of Figure 33 can be reduced such that all "initial" and "go" states disappear. As a consequence the transition relation of our example in Figure 33 is simplified. Introducing the names Program, Const1, Print1, Const2, and Print2 for the remaining AST nodes, a visual representation of the TFSM is given in Figure 41, and the textual representation of the relation Transition is given as the following set, containing five quintuples.

Term 10:
  {(Program, "I",        true, Const1, "setValue"),
   (Const1,  "setValue", true, Print1, "print"),
   (Print1,  "print",    true, Const2, "setValue"),
   (Const2,  "setValue", true, Print2, "print"),
   (Print2,  "print",    true, Const1, "setValue")}

Fig. 41: The simplified version of Figure 33

According to our assumptions, all functions in the interface of ASM Execute are static. Now we apply PE to the rule of ASM 39. As a result the outermost do-forall is unrolled, the first case being given as follows.
  if (CNode, CState) = (Const1, "setValue") then
    let CNode = Const1, CState = "setValue" in
      if not fired then
        let self = CNode in
          INTERP(getAction(CNode, CState))
        endlet
        fired := true
      else
        do forall t in Transitions: t =~ (CNode, CState, &c, &tn, &ts)
          if (let src = CNode in (let trg = &tn in INTERP(&c))) then
            CNode := &tn
            CState := &ts
            fired := false

Based on the fact that Const1 and "setValue" are constants, the PE algorithm now pushes these constants into the static let-variables CNode and CState, which override the dynamic functions CNode and CState. As a result the above case is partially evaluated to

  if (CNode, CState) = (Const1, "setValue") then
    if not fired then
      let self = Const1 in
        INTERP(getAction(Const1, "setValue"))
      endlet
      fired := true
    else
      do forall t in Transitions: t =~ (Const1, "setValue", &c, &tn, &ts)
        if (let src = Const1 in (let trg = &tn in INTERP(&c))) then
          CNode := &tn
          CState := &ts
          fired := false

As a simplification, we assume that the actions returned by getAction match the signature, and that the partial evaluation of INTERP(a) for each involved action a results in the corresponding specialized rule. The final result of partial evaluation of the discussed case is

  if (CNode, CState) = (Const1, "setValue") then
    if not fired then
      value(self) := "1"
      fired := true
    else
      CNode := Print1
      CState := "print"
      fired := false
    endif

and the complete result is the following version of ASM Execute, ASM 40.

ASM 40:
  asm Execute(n, s)
    ...
  is
    relation fired
    functions CNode <- n, CState <- s
    external function INTERP(_)
    if (CNode, CState) = (Const1, "setValue") then
      if not fired then
        value(self) := "1"
        fired := true
      else
        CNode := Print1
        CState := "print"
        fired := false
      endif
    elseif (CNode, CState) = (Print1, "print") then
      if not fired then
        stdout := Const1.value
        fired := true
      else
        CNode := Const2
        CState := "setValue"
        fired := false
      endif
    elseif (CNode, CState) = (Const2, "setValue") then
      if not fired then
        value(self) := "2"
        fired := true
      else
        CNode := Print2
        CState := "print"
        fired := false
      endif
    elseif (CNode, CState) = (Print2, "print") then
      if not fired then
        stdout := Const2.value
        fired := true
      else
        CNode := Const1
        CState := "setValue"
        fired := false
      endif
    endif
  endasm

6.5 Conclusions and Related Work

While our intention is to use PXasm for the semantics of Montages, we have shown in this chapter its usefulness for a TFSM interpreter and for the compilation of TFSMs. The presented TFSM interpreter is the nucleus of the Montages semantics presented later, and the described compilation of TFSMs by means of partial evaluation shows the principles behind the new implementation of Montages. Using the same approach, the Montages semantics presented later can be reduced to a specialized interpreter, and a program can be compiled to specialized XASM code. The presented simplification and compilation allow for an efficient implementation of Montages based on our novel concept of TFSM. Furthermore, other meta-formalisms can use TFSMs as their virtual machine. In fact, the basic ideas for TFSMs were developed by the author while designing a different, XML based meta-specification formalism for the company A4M (126). A very interesting field of development related to TFSMs are model driven architectures, proposed by the OMG group as the successor of UML (25; 170). These architectures, which are driven by a model of the problem to be solved, are closely related to domain engineering. DSLs are considered an important part of such architectures, and many UML based ways for defining such DSLs are discussed (43; 148).
Montages, which combines ASTs of DSLs and state machines whose states are decorated with actions, may be a good candidate for such definitions: UML already uses such state machines for defining methods of classes, and using the same notation for defining the semantics of DSL constructs may be natural. In order to examine this possibility we will redefine Montages based on UML's variant of state machines and action languages. The precise definition of such UML action languages allows for executable variants of UML (152; 205), and integrating these technologies with Montages will help to move Montages into the domain of practicable software engineering tools. Interestingly, the proposed action languages (2; 229) have many similarities with XASM.

7 Attributed XASM

The description of main-stream programming languages with Montages (225; 98) has shown the need for a feature corresponding to attribute grammars (AGs) (122). In fact, the experiments showed that the complexity of the static semantics of a language like Java or C cannot be handled with a methodology less powerful than AGs. The simplicity of Montages' initially proposed one-pass technique (133), earlier combinations of AGs with ASMs (184; 186), and a proposal for extending AGs with reference values (89) have inspired us to design a new kind of AGs using XASM. The definition of this attribute grammar variant is based on a very simple mechanism called Attributed XASM, or short AXasm. The motivation for and introduction of AXasm is given in Section 7.1. In Section 7.2 the formal semantics of AXasm is given in three ways:

- by translating attributions into derived functions (Section 7.2.1),
- by extending the denotational semantics of XASM with attribution features (Section 7.2.2), and finally
- by extending the self-interpreter to full AXasm (Section 7.2.3).

The self-interpreter of AXasm is later used in Chapter 8 as part of the Montages semantics.
Finally, in Section 7.3 we briefly compare AXasm with traditional attribute grammars and refer to related work. As an example, in Appendix D we combine attributions with abstract syntax trees, specifying an attribute grammar for the type system of the Java programming language.

7.1 Motivation and Introduction

If we compare object-oriented (OO) programming with procedural programming, and attribute grammars (AGs) with functional programming, we find some interesting commonalities between the two relations. Both OO programming and AGs feature some sort of dynamic binding, which allows to associate code with data and to use this association to dynamically choose the right code for each kind of data. In OO programming the code comes in the form of procedures changing the state; in AGs the code comes in the form of function definitions calculating a result from the arguments. In both cases, the code is not directly associated with the data elements, but with types of data. In OO programming the types are called classes, and the procedures associated with classes are called methods. In the case of AGs, the types are the labels of the abstract syntax tree (AST) nodes, and the functions are called attributions. This section contains a motivation of the AXasm design based on a comparison of the mentioned paradigms: object-orientation, functional programming, and attribute grammars. The only purpose of our discussion is the motivation of AXasm; for a more in-depth discussion of the topic we refer to the existing literature (174). In Section 7.1.1 we compare OO programming to procedural programming, and in Section 7.1.2 AGs and functional programming are related to each other. The commonalities of OO programming and attribute grammars are analyzed in Section 7.1.3, and in Section 7.1.4 we introduce AXasm, which achieves some of the same advantages as the other two approaches by adding dynamic binding to the derived functions of XASM.
Some features make AXasm look more like OO programs than AGs: attributes may have several parameters, and the values of attributes can be other elements having attributes. Further, using the extend construct it is in principle possible to create new instances dynamically. Nevertheless, in the context of Montages we will mainly use AXasm to simulate the behavior of traditional AGs. To simplify the presentation we define only dynamically bound derived functions, and do not introduce dynamically bound functions of other kinds. Therefore, in our definition of AXasm the elements have no local state. We do not forbid that attributes are evaluated at runtime, but we concentrate on the case where attributes are evaluated before runtime in order to check static semantics. Partial evaluation of Montages specifications is more effective in the case of attributions evaluated before runtime, and typical optimizations of programming language implementations, such as static typing, rely on pre-runtime evaluation of attributes.

7.1.1 Object-Oriented versus Procedural Programming

The transition from procedural programming to OO programming has led to increased productivity in software development. One of the reasons for this is that OO programming directly supports the modeling of a system as a number of object classes whose instances share behavior and state structure. The behavior is given by methods, which may be implemented differently for different classes. If a method is applied to some value, the type of this value determines dynamically which method implementation is bound to the call. This feature is called dynamic binding. In more detail, the objects in a class are called its instances. The class of which an object is an instance is called its type. Each class has a number of variables associated with it, as well as a number of procedures. The variables of a class are called its fields, and the procedures of a class are called its methods.
Two classes may share the same field and method names, but each of them may define them differently. Given a method m and classes A and B, the m-definition of A typically fits A-instances, and the m-definition of B fits B-instances. If m is applied to some variable which may hold A or B objects, the type of the actual object determines which definition is applied. The following OO pseudo code shows a call of method m on variable o. Depending on the type of the value of o, either the A or the B definition of m is executed.

  class A
    method m
      begin m-definition of A end
  endclass

  class B
    method m
      begin m-definition of B end
  endclass

  call o.m

The same result can of course be achieved using a procedural programming language. The following procedure m executes the A-definition of m if the parameter self of m is an A-instance, and the B-definition if the parameter is a B-instance. The call m(o) will thus result in the execution of either the A or the B definition of m, depending on whether o evaluates to an A or to a B instance.

  procedure m(self: OBJECT)
  begin
    if self is A-instance then
      m-definition of A
    elseif self is B-instance then
      m-definition of B
    endif
  end

  call m(o)

The power of OO programming comes into play if a third class C is added. In the procedural implementation, the definition of the unique procedure m has to be extended with the cases covering C instances. Thus the full source code has to be changed. In the OO style, simply a third class C is added to the system, and classes A and B are not touched. This is a small advantage if we look at toy examples, but it is crucial if realistic software systems are developed. Typically, in a realistic software system it is very hard to change existing code, since many other system components may rely on it. Before we show how to add dynamic binding to XASM, we analyze functional programming and attribute grammars. It will be shown that attribute grammars can be considered a dynamically bound version of functional programming.
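The contrast between the two pseudo-code fragments above can be made concrete in Python, where method dispatch gives the OO behavior directly and isinstance-chains reproduce the procedural style. The return strings merely stand in for the m-definitions of the respective classes.

```python
class A:
    def m(self):
        return "m-definition of A"

class B:
    def m(self):
        return "m-definition of B"

def m_procedural(obj):
    # procedural counterpart: the single procedure enumerates every class
    if isinstance(obj, A):
        return "m-definition of A"
    elif isinstance(obj, B):
        return "m-definition of B"

# Adding a class C requires no change to A or B in the OO version,
# while m_procedural above would have to be edited in place.
class C:
    def m(self):
        return "m-definition of C"
```

Note that m_procedural silently returns None for a C instance until its if-chain is extended, which is exactly the maintenance problem the OO style avoids.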
7.1.2 Functional Programming versus Attribute Grammars

Programs represented in the form of ASTs can be conveniently analyzed by decorating their nodes with properties of the corresponding programming construct. Many such node properties, such as static type, arguments, or constant value, can be expressed as expressions over properties of other nodes in the AST. If the grammar is stable, and if the existing rules are known, a solution using functional programming, where each property is modeled as a function, can be given as follows. Consider for instance a grammar with symbols A, B, and C and corresponding expressions defining the property staticType. The following functional definition of staticType can then be applied to calculate the static type of a node n:

  staticType(self: NODE) ==
    (if self is A node then staticType-definition of A
     else (if self is B node then staticType-definition of B
           else (if self is C node then staticType-definition of C
                 else undef)))

  staticType(n)

Depending on whether n is an A, B, or C node, the corresponding definition of staticType is evaluated. Unfortunately this solution is only feasible if the grammar and rules are known, and if the grammar is not changing over time. This assumption is not realistic for real-world languages, or for the design process of new domain-specific languages. Therefore a notation is needed which allows to add new definitions without changing the existing ones. A solution to our problem is provided by AGs, which allow to give the property definitions separately for each grammar symbol. Similar to OO programming, dynamic binding is used to evaluate the attributes. A formulation of the above property or attribute staticType in AG style is given as follows.

  rule A ....
    attribute staticType == staticType-definition of A
  rule B ...
    attribute staticType == staticType-definition of B
  rule C ...
    attribute staticType == staticType-definition of C

The attribute of a node n is then evaluated as n.staticType. If a new kind of node is added to the definition, the AG style allows us to simply add the corresponding rule, while the functional style forces us to change the definition of the unique function staticType.

7.1.3 Commonalities of Object-Oriented Programming and Attribute Grammars

If we try to analyze the commonalities of OO programming and AGs, we can identify the following points:

- The involved elements (objects or nodes) are typed by universes of elements (object classes or nodes with the same label). The type of an element is determined at its creation (instance creation or production rule application) and is never changed.
- Expressions are dynamically typed by the element they evaluate to.
- The same function (method or attribute) can be defined differently for each of the type universes; definitions are associated with these universes.
- The dynamic type of the first argument of a function call (method call or attribute evaluation) is used to dynamically bind the call to the corresponding function definition (method definition, attribution).
- The first argument of a function which is used for dynamic binding is written before the function, using the dot notation, and within the function definition this argument can be uniformly accessed with the symbol self¹.

In the next section these common features of OO programming and attribute grammars are added to the semantics of derived functions in XASM, resulting in AXasm.

7.1.4 AXasm = XASM + dynamic binding

Dynamic binding allows to give specialized implementations of the same method for different classes, or of the same attribute for different node types. If a method is called, or an attribute evaluated, the type of the first argument, the so-called self or context object, determines which implementation is chosen.
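The AG-style formulation above can be sketched in Python with a per-label dispatch table: each grammar symbol registers its own staticType definition, and adding a new node kind never touches the existing code. The labels and the returned type strings are illustrative stand-ins for the staticType-definitions of A, B, and C.

```python
staticType_defs = {}   # label -> attribution (the AG-style dispatch table)

def attribution(label):
    """Register an attribute definition for one grammar symbol."""
    def register(fn):
        staticType_defs[label] = fn
        return fn
    return register

@attribution("A")
def staticType_A(node):
    return "staticType-definition of A"

@attribution("B")
def staticType_B(node):
    return "staticType-definition of B"

def staticType(node):
    # dynamic binding: the node's label selects the definition
    defn = staticType_defs.get(node["label"])
    return defn(node) if defn is not None else None   # undef -> None

# Adding a new node kind C: only a new registration, staticType is untouched.
@attribution("C")
def staticType_C(node):
    return "staticType-definition of C"
```

Compare this with the functional formulation: there, the if-chain inside the single staticType function would have to be edited for every new symbol.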
In order to make the syntax more explicit, this first argument is typically written in front of a dot. Given an attribute or method m with parameters p1, ..., pn, the call or evaluation of m with context given by expression t0 is written as follows:

  t0.m(p1, ..., pn)

¹ This is a simplification, since in each formalism this element is accessed with a different syntax, for instance this instead of self, and in many formalisms it is even considered as an implicit argument.

The type of t0 determines dynamically which implementation is chosen for m, and within the code of m the term self can be used to refer to the value of t0. A subtle detail is that the arguments p1, ..., pn are not expected to be evaluated with respect to the new context, but with respect to the outermost context, in which t0 has also been evaluated. Therefore not only the current context object (cc), but also the outermost context object (oc) must be known to evaluate such terms. As motivation consider the following term.

  t0.a1(t11, ..., t1k).a2(t21, ..., t2l)

The two attributes a1 and a2 are naturally evaluated with respect to the context object defined by the terms before the "dot". On the other hand, it seems more natural that the arguments t11, ..., t1k, t21, ..., t2l should be evaluated in the same context as the initial term t0. Therefore the "outermost" context object must be passed through the calculations and used whenever parameters are evaluated. In order to introduce dynamic binding in XASM, we need a typing of elements. The idea of AXasm is to use an arbitrary set of disjoint universes to "type" elements. Given such a partition, the type of an element is given by its membership in one of the universes. The type of an expression is dynamically determined by evaluating the expression. For each of these universes we allow the definition of attributes, a special kind of derived functions. One possibility to guarantee the disjointness of universes is to use only the extend function to populate them.
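The (cc, oc) discipline described above can be sketched as a tiny term evaluator in Python. This is our reading of the text, not the thesis's XASM semantics: attribute definitions are modeled as plain Python callables keyed by (class, name), and argument terms are deliberately evaluated with the outermost context.

```python
def eval_term(term, cc, oc, attrs, class_of):
    """Evaluate a term w.r.t. current context cc and outermost context oc.
    Terms: ("self",), ("const", v), ("dot", t0, t1), ("apply", f, [args])."""
    kind = term[0]
    if kind == "self":
        return cc
    if kind == "const":
        return term[1]
    if kind == "dot":                              # t0.t1: new cc, same oc
        _, t0, t1 = term
        e0 = eval_term(t0, cc, oc, attrs, class_of)
        return eval_term(t1, e0, oc, attrs, class_of)
    if kind == "apply":                            # f(t1, ..., tn)
        _, f, args = term
        # parameters are evaluated w.r.t. the outermost context, not cc
        vals = [eval_term(a, oc, oc, attrs, class_of) for a in args]
        # dynamic binding: the class of cc selects the attribute definition
        return attrs[(class_of(cc), f)](cc, *vals)
```

In the usage below, x() is a global (main-class) attribute yielding a U-instance "u", and the argument k() of a is still evaluated in the global context even though a itself is bound in the U context.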
For instance, the ASM ConstructCanonicTree (ASM 17, Section 5.3.1) uses only extend-rules to populate the characteristic universes. Therefore these universes are disjoint and build a partition called the characteristic partition. This partition is used to combine AXasm with ASTs, resulting in the AG system of Montages. An example of attribute definitions is the following declaration of universe U_0, given in concrete syntax.

ASM 41:
  universe U_0
    attr a_1(p_1_1, p_1_2, ..., p_1_n1) == t_1_1
    attr a_2(p_2_1, p_2_2, ..., p_2_n2) == t_1_2
    ...
    attr a_m(p_m_1, p_m_2, ..., p_m_nm) == t_1_m

As mentioned, the interpretation of each rule or expression of an Attributed XASM depends on a current context object (cc) and an outermost context object (oc). A function class(_) maps context objects to the corresponding context universe definitions. The context object itself is always accessible as the function self. Evaluating a function application f(t1, ..., tn) with respect to (cc, oc), first the parameters are evaluated with respect to (oc, oc), resulting in elements e1, ..., en; then the attribute f is searched in the definitions of the context class(cc). If such an attribute definition is present, and the number of formal parameters matches, the definition is evaluated with actual parameters e1, ..., en, where during the evaluation of f the symbol self refers to cc, and terms are evaluated with respect to (cc, cc). Otherwise a function in the global context is searched, where all global dynamic functions, ASMs, constructors, and derived functions reside. The dot notation can be used to interpret an expression in the context given by another expression. Evaluating t0.t1 with respect to (cc, oc), the term t0 is evaluated with respect to the same objects, evaluating to an element e0, and then t1 is evaluated with respect to the new context object e0 and the old outermost context object oc. The result of this second evaluation is the result of the complete dot-term.
7.1.5 Example

As an example, consider the following definitions, introducing global functions a and x and a universe U having attributes a and b, together with a rule extending U.

  function a <- 1, x
  universe U
    attr a == 3
    attr b == x.a

  extend U with u
    x := u
    a := a + u.b
  endextend

First step. The rule within the extend clause updates the global function x to the new element u. In the next update the global function a is updated to its value 1 plus the value of b in the context of u. Since u is created as an element of U, the context of u is U, and therefore b is identified as an attribute of U. The definition of attribute b is x.a. Since x is initially undef, the a of x.a is initially evaluated in the global context. In the global context a is initially 1; thus the result of x.a, and therefore of attribute b of u, is 1. Thus the global a is updated to 2.

Second step. After this first step, the value of x is the newly created U-instance and the value of a is 2. In the second step, again a new instance of U is created. The global function a is incremented by the value of attribute b of the new element. In contrast to the first step, this time the attribute b evaluates to 3, since x is no longer undef but evaluates to an element that is a member of universe U. The evaluation of the term x.a thus results in the evaluation of a in the U context, where a is an attribute with constant value 3. After the second step the global a is set to 5, and x is set to the second newly created element. In all following steps, new U instances are created and a is incremented by 3.

7.2 Definition of AXasm

In this section the formal semantics of AXasm is given in three ways. Section 7.2.1 explains AXasm by translating the dynamically bound derived functions into standard derived functions of XASM, following the pattern of Section 7.1.2, where the functional counterpart of an attribute grammar has been shown.
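The two steps of the example in Section 7.1.5 can be replayed in Python. This is a hand-translated simulation, not AXasm: membership in U plays the role of the characteristic universe, attr_a and attr_b model the two attributes of U, and the simultaneous ASM updates are imitated by evaluating the whole right-hand side before assigning.

```python
# Global state: function a <- 1, x (initially undef); universe U.
U = set()
a, x = 1, None

def attr_a(cc):
    # in a U-context, a is the attribute with constant value 3;
    # in the global context, the global function a is meant
    return 3 if cc in U else a

def attr_b(cc):
    return attr_a(x)   # b == x.a: evaluate a in the context of x

trace = []
for _ in range(2):              # two firings of the extend rule
    u = object()                # extend U with u
    U.add(u)
    x, a = u, a + attr_b(u)     # x := u and a := a + u.b (uses the old x)
    trace.append(a)
# trace == [2, 5], matching the first and second step of the example
```

In the first iteration x is still undef (None), so x.a falls back to the global a = 1 and a becomes 2; in the second, x is a U-instance, x.a yields the constant 3, and a becomes 5, exactly as described in the text.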
A semantics without the help of a syntactical transformation is given in Section 7.2.2 where the denotational semantics of X ASM, presented in Definition 9, Section 4.3 is extended to AXasm. Finally in Section 7.2.3 we extend the X ASM self-interpreter of Section 5.4 to a self-interpreter of AXasm. Such a self interpreter will be used in situations where the attributions are not known in advance, for instance the definition of the Montages meta-interpreter needs an AXasm self-interpreter. 7.2.1 Derived Functions Semantics We look at a more general example of attributions and explain their meaning by expressing them as an equivalent derived function. In the following attributed X ASM, symbols ½       are used for universes, symbols ½       are used for attributes, the terms  are defining the attributes, and finally is the transition rule. ASM 42: universe U_1 attr a_1 == t_1_1 attr a_2 == t_1_2 ... attr a_m == t_1_m universe U_2 attr a_1 == t_2_1 attr a_2 == t_2_2 ... attr a_m == t_2_m ... universe U_n attr a_1 == t_n_1 attr a_2 == t_n_2 ... attr a_m == t_n_m R The given definitions of attributes  ½       can be transformed into an equivalent non-attributed X ASM with , derived functions    %        ,. The definition of this function applies the attribute definitions, depending on the value of the context-object self. Instead of the dot-notation  ½ ¾ , an explicit Dot( , ) function must be used. ½  ¾  evaluates first ½ and then makes the result of this evaluation available as context object self in the evaluation of  ¾ . The result of this ¾ evaluation is the result of ½  ¾ . 180 Chapter 7. Attributed X ASM derived function Dot(t1, t2) == (let self = t1 in t2) Following this approach standard X ASM declarations being equivalent to the above attributed X ASM can be given as follows. universes U_1, U_2, ..., U_n function self derived function a_1 == (if U_1(self) then t_1_1 else (if U_2(self) then t_2_1 ... 
else (if U_n(self) then t_n_1 else undef ) ...)) derived function a_2 == (if U_1(self) then t_1_2 else (if U_2(self) then t_2_2 ... else (if U_n(self) then t_n_2 else undef ) ...)) derived function a_m == (if U_1(self) then t_1_m else (if U_2(self) then t_2_m ... else (if U_n(self) then t_n_m else undef ) ...)) R where all dot-applications in and the terms   are rewritten using the derived function Dot, e.g. ½¾ is replaced by Dot(t1, t2) 7.2.2 Denotational Semantics From a denotational point of view, an attributed X ASM (AXasm) describes an X ASM where elements have local signatures, and the dot-notation can be used to evaluate a term in an other elements signature. Binding of function evaluation is thus done dynamically. The purpose of this section is to extend the denotational semantics of X ASM, as given in Definition 9, Section 4.3. At this moment we would like to note that we have chosen the definitions such, that they are suited as well for object based ASM (102), or even fully object oriented ASMs (128). Here we will restrict us to Attributed X ASM, where objects have local signatures, but no local states. On the other hand, since we formally introduced derived functions as special kinds of X ASM calls (see Section 4.4.3), we will cover full ASM call functionality for the attributes. These correspond semantically to methods of OO systems, but to avoid confusions we call them attributions. A real OO system would be obtained by introducing local states, as in ObASM (102) and by introducing inheritance. Although these features would help to structure further the case studies, we decided to abstract from them in order to shorten the material. 7.2. Definition of AXasm 181 In AXasm the elements of the superuniverse are typed by the so called classes, represented as universes.  has an associated universe 2   from a set  of disjoint universes. The element is member of 2   and not member of any other universe in . 
We call the universe 2   the class of , and is said to be an instance of 2  . The elements undef, true and false, as well as other built in constants, and constructor terms are the members of the so called global or main class main . Def. 19: Element Partition and Class Association. Each element Since the elements have no local state, the signature of dynamic functions is not split into local signatures, the state is still a mapping from to actual definitions of the functions. The class of an element determines a local extended signature . The global extension signature is considered to be the local extension signature of the global class main. Def. 20: Local Extended Signature and Classes. Associated with each class is a set ext   of attributions defined within the definition of . The global extension signature ext corresponds to ext ,%. The extended signature with respect to an object  is   ext 2  Terms in AXasm can be built over the union of all extended signatures, but in the context of an object , only terms over  can be defined, all others are undefined. The attributions are derived functions defined locally for each class . Formally, a derived function can be represented as a tuple of an expression, and the formal parameters. Def. 21: Attributions. The attributions are given by a family  of mappings. For each class ,   maps the n-ary symbol of ext  to a (n+1) tuple          where  is an expression, and        are the formal parameters. In concrete syntax, the definition of in would be given as follows. universe c ... derived function f(p1, ..., pn) == E ... universe... In summary, an AXasm is given by a transition rule , signature of dynamic functions, a set of disjoint universes   whose interpretation in each state builds a partition of the elements in , a family of attributions (local external functions) ext   for each class and a family of mappings  giving the definitions of the attributions. 182 Chapter 7. Attributed X ASM Def. 22: AXasm. 
An AXasm is given by a quintuple (R, Σ, C, Σ_ext, D):

- the transition rule R,
- the signature Σ of dynamic functions,
- a set C of disjoint universes, whose interpretation in each state builds a partition of the elements,
- a family Σ_ext of attributions (local external functions), one set Σ_ext(c) for each class c, and
- a family D of mappings giving the definitions of the attributions.

Given an AXasm, the mapping class from elements to classes can be calculated in each state S by

  class(a) = c    where c ∈ C and a is in the interpretation of c in S

Current and outermost context. In AXasm rules and terms are evaluated with respect to a context given by two elements. The first element is the current context, referred to as cc, and the second one is the outermost context, referred to as oc. In the initial state of an AXasm, both cc and oc are equal to undef, and since undef is a member of the main class, rules and terms are evaluated with respect to the main, or global, context. Global external functions are considered the attributes of this global context, and in the global context the behavior of an AXasm is the same as the behavior of a normal X ASM.

Update and value denotations of AXasm constructs. In order to extend X ASM's denotational semantics to full AXasm, the signatures of the value denotation (Definition 6) and the update denotation (Definition 1), as well as the external update and value denotations (Definition 8), must be extended with two additional arguments, the current context cc and the outermost context oc.

Def. 23: Denotations with Context. With respect to the current context cc and the outermost context oc, the update and value denotations of a rule R in a state S are given by

  Upd[[R]](S, cc, oc)    and    Eval[[R]](S, cc, oc)

and the denotations of an external function f with actual parameters a_1, ..., a_n are given by

  ExtUpd_f(a_1, ..., a_n)(S, cc, oc)    and    ExtEval_f(a_1, ..., a_n)(S, cc, oc)

Semantics of self. The update denotation of the term self is the empty set, and the value denotation of the term self is the current context object cc.

Def. 24: AXasm self evaluation.
if R = self then

  Upd[[R]](S, cc, oc)  = {}
  Eval[[R]](S, cc, oc) = cc

Semantics of attributions. If an external function f ∈ Σ_ext(class(cc)) is realized with an ASM (Definitions 14 and 16), the context object is added to the state as the value of self, and it is used as the initial current and outermost context object of that ASM. The denotations ExtUpd and ExtEval are given as follows.

Def. 25: Update and Value Denotations of Attributions. Given the current context cc, an n-ary attribute f ∈ Σ_ext(class(cc)), and the definition D_class(cc)(f) = (E, p_1, ..., p_n), the ExtUpd and ExtEval functions are given as follows:

  ExtUpd_f(a_1, ..., a_n)(S, cc, oc)  = Upd[[E]](S[self := cc, p_1 := a_1, ..., p_n := a_n], cc, cc)
  ExtEval_f(a_1, ..., a_n)(S, cc, oc) = Eval[[E]](S[self := cc, p_1 := a_1, ..., p_n := a_n], cc, cc)

Semantics of function application. If a function application f(t_1, ..., t_n) is evaluated with respect to (cc, oc), the arguments of the application are evaluated with respect to (oc, oc), and the function itself is evaluated with respect to the single context element cc. The class class(cc) determines the local signature of external functions (attributes) Σ_ext(class(cc)). If f ∈ Σ_ext(class(cc)), the definition of this external function (attribute) is applied, as defined above; otherwise the dynamic function is evaluated. Within the evaluation of f, the current context can be referred to with the term self. Since for our purposes we use AXasm only with attributions being derived functions, we do not give the details of the refined definitions for the general ASM call.

Def. 26: AXasm Function Evaluation.

if R = f(t_1, ..., t_n) where t_1, ..., t_n are terms, a_i = Eval[[t_i]](S, oc, oc), and c = class(cc)
then if f ∈ Σ_ext(c) then

  Upd[[R]](S, cc, oc)  = ExtUpd_f(a_1, ..., a_n)(S, cc, oc) ∪ ⋃_i Upd[[t_i]](S, oc, oc)
  Eval[[R]](S, cc, oc) = ExtEval_f(a_1, ..., a_n)(S, cc, oc)

else

  Upd[[R]](S, cc, oc)  = ⋃_i Upd[[t_i]](S, oc, oc)
  Eval[[R]](S, cc, oc) = S(f)(a_1, ..., a_n)

Semantics of dot application. The dot notation is used to change the current context and allows external functions of other classes to be evaluated. For instance, the term t_1.t_2, evaluated with respect to (cc, oc), first evaluates t_1 with respect to (cc, oc) to an element a_1, and then evaluates t_2 with respect to (a_1, oc).

Def. 27: AXasm Dot Term Evaluation.
if R = t_1.t_2 where t_1, t_2 are terms and a_1 = Eval[[t_1]](S, cc, oc)
then

  Upd[[R]](S, cc, oc)  = Upd[[t_1]](S, cc, oc) ∪ Upd[[t_2]](S, a_1, oc)
  Eval[[R]](S, cc, oc) = Eval[[t_2]](S, a_1, oc)

In all other cases the definitions of Definition 9 remain valid, except that the additional arguments cc and oc are passed as well.

7.2.3 Self Interpreter Semantics

In this section the formal semantics of AXasm is given by extending the definition of the PXasm self-interpreter INTERP such that it takes the context object as an additional argument and evaluates both normal functions and references to attributes. We assume that the current list of attributions is available as a constructor term, assigned to the global 0-ary function AttrDefs. In Section 7.2.3.1 the representation of attribute definitions as constructor terms is defined. We then first explain an interpreter for attributions without parameters (Section 7.2.3.2) and afterwards extend the definitions to a self-interpreter for attributions with parameters (Section 7.2.3.3).

7.2.3.1 Constructor Term Representation of Attributes

The attributions are provided in the form of constructor terms, built up from the constructor

  attrDefs(Ident, [Attribute])

whose first argument is the name of the universe, and whose second argument is a list of attributions valid for that universe. Each attribution is a ternary constructor

  attribute(Ident, [Ident], Expr)

whose arguments are the name of the attribute, a list of parameters, and a term-representation of the X ASM expression defining the attribute. The initial example of an attribution, ASM 41, is represented using the introduced constructors as follows.

Term 11:

  attrDefs("U_0",
    [attribute("a_1", [p_1_1, p_1_2, ..., p_1_n1], MOD(t_1_1)),
     attribute("a_2", [p_2_1, p_2_2, ..., p_2_n2], MOD(t_1_2)),
     ...
     attribute("a_m", [p_m_1, p_m_2, ..., p_m_nm], MOD(t_1_m))])

where MOD( ) denotes a function transforming an X ASM expression or rule into its constructor term representation.
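The constructor-term encoding above can be mirrored in ordinary data structures. The following Python sketch is ours, not part of X ASM: it models the attrDefs and attribute constructors as named tuples (expressions are kept as opaque strings in place of MOD-terms) and shows the kind of lookup the self-interpreter performs on such terms.

```python
from collections import namedtuple

# Hypothetical mirror of attrDefs(Ident, [Attribute]) and
# attribute(Ident, [Ident], Expr); the MOD(...) terms are opaque strings.
AttrDefs = namedtuple("AttrDefs", ["universe", "attributes"])
Attribute = namedtuple("Attribute", ["name", "params", "expr"])

u0 = AttrDefs("U_0", [
    Attribute("a_1", ["p_1_1", "p_1_2"], "MOD(t_1_1)"),
    Attribute("a_2", ["p_2_1"], "MOD(t_1_2)"),
])

def find_attr(defs, name):
    """Select an attribution by name, mimicking the pattern
    match ATTR =~ attribute(a, ..., &e) of the interpreter."""
    for a in defs.attributes:
        if a.name == name:
            return a
    return None   # plays the role of a failed choose

print(find_attr(u0, "a_2").expr)   # -> MOD(t_1_2)
```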
Correspondingly, the representation of the previously defined attributes of ASM 42 is given in Term 12.

Term 12:

  [attrDefs("U_1",
     [attribute("a_1", [], MOD(t_1_1)),
      attribute("a_2", [], MOD(t_1_2)),
      ...
      attribute("a_m", [], MOD(t_1_m))]),
   attrDefs("U_2",
     [attribute("a_1", [], MOD(t_2_1)),
      attribute("a_2", [], MOD(t_2_2)),
      ...
      attribute("a_m", [], MOD(t_2_m))]),
   ...
   attrDefs("U_n",
     [attribute("a_1", [], MOD(t_n_1)),
      attribute("a_2", [], MOD(t_n_2)),
      ...
      attribute("a_m", [], MOD(t_n_m))])
  ]

The above representation is generated by extending the EBNF of PXasm (Grammar 6, Section 5.4.1) with the following productions and constructor mappings.

Gram. 7:

  UniverseDef ::= "universe" Symbol { AttrDef }
                / attrDefs(Symbol, AttrDef)
  AttrDef     ::= "attr" Symbol [ Arguments ] "==" Expr
                / attribute(Symbol, Arguments, Expr)

The constructor representations of the X ASM dot and self expressions are given by the following definitions.

Gram. 8:

  Expr = ... | Dot | Self
  Dot  ::= Expr "." Expr
         / dot(Expr.1, Expr.2)
  Self ::= "self"
         / selfSymb

7.2.3.2 Extending the Self-Interpreter for Attributions without Parameters

The definition of the self-interpreter of such an attribution system is relatively complex. We try to simplify understanding by first concentrating on non-parametric attributions, which are nearer to classical attribute grammars. Later, in Section 7.2.3.3, we come back to attributions with parameters. This allows us to abstract in this section from the outermost context object, which is only used for evaluating parameters. The signature of the self-interpreter given in this section is A_INTERP(_, _), the first argument being a term-representation of the ASM rule to be executed, the second being the current context object. The following function EvalAttribute(cc, a) is used to evaluate an attribute a with respect to a context object cc.
The attribute definitions are available as a list of their constructor term representations, which is assigned to AttrDefs. If the attribute is not defined for the given context object, the constant notDeclared is returned (constants are modeled as 0-ary constructors); otherwise the result of evaluating the attribute definition by means of A_INTERP is returned. The derived universe

  derived function UniverseSet(u) ==
    (exists a in list AttrDefs: a =~ attrDefs(u, &))

denotes the set of all universes for which attributes are defined. The following ASM chooses a universe u in the set of universes UniverseSet such that the argument cc is in u. Then it chooses u's attribute definitions ATTR_DEFS in the list AttrDefs, and in the list ATTR_DEFS the attribution ATTR corresponding to the ident a is chosen. The defining expression of ATTR is then interpreted with respect to the context object cc. If one of the three choose operators does not succeed, the element notDeclared is returned.

ASM 43:

  asm EvalAttribute(cc: Object, a: Ident)
    accesses function A_INTERP(_,_)
    accesses function AttrDefs
    accesses constructor notDeclared, meta(_)
    accesses universe UniverseSet
    (forall u in UniverseSet accesses universe $u$)
  is
    choose u in UniverseSet: $u$(cc)
      choose ATTR_DEFS in list AttrDefs: ATTR_DEFS =~ attrDefs(u, &DefList)
        choose ATTR in list &DefList: ATTR =~ attribute(a, [], &e)
          A_INTERP(&e, cc)
        ifnone return notDeclared
        endchoose
      ifnone return notDeclared
      endchoose
    ifnone return notDeclared
    endchoose
  endasm

Interpretation of Symbols. For non-parametric attributions, the arguments of the AXasm self-interpreter A_INTERP are an X ASM rule t and the context object cc. The same arguments are needed for the interpretation of symbols, which is similar to the interpretation of symbols without attributes (ASM 24), except that the context object must be passed as well.
ASM 44:

  asm SymbolA_INTERP(t, cc)
    accesses function A_INTERP(_,_)
  is
    if t =~ meta(&s) then
      return A_INTERP(&s, cc)
    else
      return t
    endif
  endasm

Interpretation of Rules: Structure. The interpreter for attributed X ASM needs to update all functions mentioned in the term t, and accesses the mentioned functions EvalAttribute and AttrDefs, as well as the constructor notDeclared and the universe UniverseSet. The signature of the ASM is almost identical to the corresponding ASM 25 of the PXasm self-interpreter.

ASM 45:

  asm A_INTERP(t: Rule | Expr, cc)
    (forall n in {0 .. MaxArity(t)}:
      updates functions with arity n $UpdFct(n, t)$
      accesses functions with arity n $AccFct(n, t)$ )
    accesses constructors update(Symbol, [Expr], Expr),
      conditional(Expr, Rule, Rule),
      doForall(Symbol, Symbol, Rule),
      extendRule(Symbol, Symbol, Rule),
      constant(Value),
      apply(Symbol, [Expr]),
      letClause([LetDef], Rule),
      letDef(Symbol, Expr)
    accesses function AttrDefs
    accesses constructor notDeclared
    accesses universe UniverseSet
  is
    external function EvalAttribute(obj, par)
    external function SymbolA_INTERP(_,_)
    ...

Interpretations which do not change considerably. There are a number of rules whose interpretation does not change its main functionality with respect to the PXasm self-interpreter. These rules are update, lists, conditionals, doForall, extendRule, and constant. The only difference in the interpretation of these rules is that the additional context object cc is passed as an argument to each interpretation of their components.

Interpretation of Apply. The interpretation of apply now has to take the context object cc into consideration. The interpreter calls the ASM EvalAttribute to see whether an attribute matching the symbol to be interpreted is defined in the context of cc. First, it is tested whether the context object cc is undef, i.e. whether we are in the global context.
If so, a global dynamic function is evaluated; otherwise an attribute is assumed or, if no such attribute is defined, cc is added as first argument to a global function evaluation. Since we assume attributes to have no parameters, the parameters are simply skipped in the call of EvalAttribute. If no attribute is found, the function EvalAttribute returns notDeclared and the function is evaluated as a global function using the built-in Apply operator.

  ...
  elseif t =~ apply(&op, &a) then
    let opINT = SymbolA_INTERP(&op, cc),
        aINT  = A_INTERP(&a, cc) in
      if cc = undef then
        return Apply(opINT, aINT)
      else
        let r = EvalAttribute(cc, opINT) in
          if r = notDeclared then
            return Apply(opINT, [cc | aINT])
          else
            return r
          endif
        endlet
      endif
    endlet
  ...

Interpretation of dot. The dot operator is used to change the context object cc. In the case of attributed X ASM without parameters this is easily done by replacing the argument cc with the newly computed object.

  ...
  elseif t =~ dot(&t1, &t2) then
    let lhs = A_INTERP(&t1, cc) in
      return A_INTERP(&t2, lhs)
    endlet
  ...

Interpretation of self. The term self refers to the context object, thus to the object cc.

  ...
  elseif t =~ selfSymb then
    return cc
  ...

Interpretation of let clauses. Finally we have to treat the let clauses. For this purpose we first calculate the value of each let definition and then build the let rules recursively. Since we first evaluate all let definitions, the recursive let construction used here is equivalent to the intended parallel one of the interpreted term.

  ...
  elseif t =~ letClause(&defList, &r) then
    if &defList =~ [letDef(&p, &t)|&tl] then
      return A_INTERP(letClause(A_INTERP(&defList, cc), &r), cc)
    elseif &defList =~ [(&p, &o) | &tl] then
      let $&p$ = &o in
        return A_INTERP(letClause(&tl, &r), cc)
      endlet
    else
      return A_INTERP(&r, cc)
    endif
  elseif t =~ letDef(&p, &t) then
    return (SymbolA_INTERP(&p, cc), A_INTERP(&t, cc))
  ...
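Ignoring X ASM syntax, the behaviour of the three interesting cases of the non-parametric interpreter — apply with attribute fallback, dot, and self — can be sketched in Python. All names below are illustrative, not part of X ASM: attribute definitions are a dictionary from (class, name) to a function of the context object, global functions live in a separate table, and None plays the role of undef.

```python
NOT_DECLARED = object()   # plays the role of the constant notDeclared

class Interp:
    def __init__(self, class_of, attr_defs, global_funs):
        self.class_of = class_of        # element -> class name
        self.attr_defs = attr_defs      # (class, attr name) -> fn(interp, cc)
        self.global_funs = global_funs  # name -> ordinary function

    def eval_attribute(self, cc, name):
        # mimics EvalAttribute: look the symbol up in the class of cc
        fn = self.attr_defs.get((self.class_of(cc), name))
        return NOT_DECLARED if fn is None else fn(self, cc)

    def interp(self, t, cc):
        tag = t[0]
        if tag == "const":
            return t[1]
        if tag == "self":               # self evaluates to the context object
            return cc
        if tag == "dot":                # t1.t2: evaluate t2 in t1's context
            return self.interp(t[2], self.interp(t[1], cc))
        if tag == "apply":              # try an attribute first, else a global fn
            args = [self.interp(a, cc) for a in t[2]]
            if cc is not None:
                r = self.eval_attribute(cc, t[1])
                if r is not NOT_DECLARED:
                    return r
                # no attribute declared: prepend cc, as in [cc | aINT]
                return self.global_funs[t[1]](cc, *args)
            return self.global_funs[t[1]](*args)
        raise ValueError(tag)

# toy setup: every element is of class "Node" and has an attribute "double"
ip = Interp(lambda e: "Node",
            {("Node", "double"): lambda interp, cc: cc * 2},
            {"inc": lambda x: x + 1})
```

For example, interpreting the term dot(const(21), apply("double", [])) first evaluates 21 in the outer context and then evaluates the attribute double with 21 as current context, yielding 42.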
7.2.3.3 Attributions with Parameters

Parametric attributions extend attributes with parameters. If such attributions are evaluated, the parameters are evaluated in the outermost context, and only the context of the attribute evaluation is changed by the "dot"-notation. Therefore two context objects (cc, oc) must be passed as arguments of the evaluation function, one for the parameters and one for the attribute. As a consequence, all of the above rules have to be extended with a second context-object parameter. As an exception, the ASM EvalAttribute still only needs access to one context object, but we need to pass the already evaluated arguments to the attributes. Further, the arity of the accessed A_INTERP has changed with respect to the old definition, ASM 43. The ASM CreatePairs is needed to transform the lists of actual and formal parameters into a list of pairs, each consisting of a formal name and an actual value, which can then be interpreted as a list of let-clauses, thereby using the self-interpretation of let-clauses to give the self-interpretation of attributes with parameters.

ASM 46:

  asm EvalAttribute(cc: Object, a: Ident, actual: [Object])
    accesses function A_INTERP(_,_,_)
    accesses function AttrDefs
    accesses constructor notDeclared
    accesses universe UniverseSet
    (forall u in UniverseSet accesses universe $u$)
  is
    choose u in UniverseSet: $u$(cc)
      choose ATTR_DEFS in list AttrDefs: ATTR_DEFS =~ attrDefs(u, &DefList)
        choose ATTR in list &DefList: ATTR =~ attribute(a, &formal, &e)
          A_INTERP(letClause(CreatePairs(&formal, actual), &e), cc, cc)
        ifnone return notDeclared
        endchoose
      ifnone return notDeclared
      endchoose
    ifnone return notDeclared
    endchoose
  endasm

  asm CreatePairs(l1, l2) is
    if l1 =~ [&hd1 | &tl1] then
      if l2 =~ [&hd2 | &tl2] then
        return [(&hd1, &hd2) | CreatePairs(&tl1, &tl2)]
      endif
    else
      return []
    endif
  endasm

With respect to the formulation without parameters, ASM 45, the ASM A_INTERP has a third argument, the outermost object, and the referenced function EvalAttribute is now 3-ary.

ASM 47:

  asm A_INTERP(t: Rule, cc: Object, obj_outermost: Object)
    ...
  is
    external function EvalAttribute(_,_,_)
    ...

For almost all rules, the third parameter is simply passed on to the interpretation of the components. The only place where the outermost context is used is in the fragment dealing with applications. There the parameters are evaluated in the context of the outermost object, while the attribute is evaluated with respect to the context object. If there is no attribute defined, the Apply operator is used to evaluate the function. The self is set to the context object.

  ...
  elseif t =~ apply(&op, &a) then
    let opINT = SymbolA_INTERP(&op, obj_outermost, obj_outermost),
        aINT  = A_INTERP(&a, obj_outermost, obj_outermost) in
      let r = EvalAttribute(cc, opINT, aINT) in
        if r != notDeclared then
          return r
        else
          let self = cc in
            return Apply(opINT, aINT)
          endlet
        endif
      endlet
    endlet
  ...

The complete A_INTERP ASM looks as follows.
ASM 48:

  asm SymbolA_INTERP(t, cc, obj_outermost) is
    if t =~ meta(&s) then
      return A_INTERP(&s, cc, obj_outermost)
    else
      return t
    endif
  endasm

  asm A_INTERP(t, cc, obj_outermost)
    updates *
    accesses function EvalAttribute(obj, att, par)
    accesses function AttrDefs
    accesses constructor notDeclared
    accesses universe UniverseSet
  is
    if t =~ update(&s, &a, &e) then
      Update(SymbolA_INTERP(&s, cc, obj_outermost),
             A_INTERP(&a, cc, obj_outermost),
             A_INTERP(&e, cc, obj_outermost))
      return true
    elseif t =~ [&hd | &tl] then
      return [A_INTERP(&hd, cc, obj_outermost) |
              A_INTERP(&tl, cc, obj_outermost)]
    elseif t =~ conditional(&e, &r1, &r2) then
      if A_INTERP(&e, cc, obj_outermost) then
        A_INTERP(&r1, cc, obj_outermost)
      else
        A_INTERP(&r2, cc, obj_outermost)
      endif
      return true
    elseif t =~ doForall(&i, &s, &e, &r) then
      do forall $SymbolA_INTERP(&i, cc, obj_outermost)$
          in $SymbolA_INTERP(&s, cc, obj_outermost)$:
          A_INTERP(&e, cc, obj_outermost)
        A_INTERP(&r, cc, obj_outermost)
      endo
      return true
    elseif t =~ choose(&i, &s, &e, &r1, &r2) then
      choose $SymbolA_INTERP(&i, cc, obj_outermost)$
          in $SymbolA_INTERP(&s, cc, obj_outermost)$:
          A_INTERP(&e, cc, obj_outermost)
        A_INTERP(&r1, cc, obj_outermost)
      ifnone
        A_INTERP(&r2, cc, obj_outermost)
      endchoose
      return true
    elseif t =~ extendRule(&i, &s, &r) then
      extend $SymbolA_INTERP(&s, cc, obj_outermost)$
          with $SymbolA_INTERP(&i, cc, obj_outermost)$
        A_INTERP(&r, cc, obj_outermost)
      endextend
      return true
    elseif t =~ constant(&c) then
      return &c
    elseif t =~ apply(&op, &a) then
      let opINT = SymbolA_INTERP(&op, obj_outermost, obj_outermost),
          aINT  = A_INTERP(&a, obj_outermost, obj_outermost) in
        let r = EvalAttribute(cc, opINT, aINT) in
          if r != notDeclared then
            return r
          else
            let self = cc in
              return Apply(opINT, aINT)
            endlet
          endif
        endlet
      endlet
    elseif t =~ dot(&t1, &t2) then
      let lhs = A_INTERP(&t1, cc, obj_outermost) in
        return A_INTERP(&t2, lhs, obj_outermost)
      endlet
    elseif t =~ selfSymb then
      return cc
    elseif t =~ letClause(&defList, &r) then
      if &defList =~ [letDef(&p, &t)|&tl] then
        return A_INTERP(letClause(A_INTERP(&defList, cc, obj_outermost), &r),
                        cc, obj_outermost)
      elseif &defList =~ [(&p, &o) | &tl] then
        let $&p$ = &o in
          return A_INTERP(letClause(&tl, &r), cc, obj_outermost)
        endlet
      else
        return A_INTERP(&r, cc, obj_outermost)
      endif
    elseif t =~ letDef(&p, &t) then
      return (SymbolA_INTERP(&p, cc, obj_outermost),
              A_INTERP(&t, cc, obj_outermost))
    else
      return "Not matched"
    endif
  endasm

7.3 Related Work and Results

We have discussed the relation of Montages to Attribute Grammar based formalisms for dynamic semantics in Section 3.5; we concentrate now on the comparison of AXasm and traditional Attribute Grammars (AGs) for the specification of the static semantics of programming languages. The application of AGs to specifying static semantics of programming languages has produced a large number of approaches. A good survey of the obtained results is given by Waite and Goos (221). The actual algorithms for the semantic analysis are simple but will fail on certain input programs if the underlying AG is not well-defined. Testing whether a grammar is well-defined, however, requires exponential time (103). A sufficient condition for being well-defined can be checked in polynomial time. This test defines the set of ordered AGs as a subset of the well-defined grammars (117). However, there is no constructive method to design such grammars. These problems have led to a number of alternative approaches based on predicate calculus (212; 167; 183) which avoid these problems, but do not allow for the generation of efficient semantic analyzers that can be used in practical compilers. Since AXasm allow both the use of arbitrarily complex AGs and predicate calculus, they do not solve the traditional problems of AG research.
The only purpose of AXasm is to simplify the specification of static semantics; they do not provide any solution for the problem of generating efficient semantic analysis tools. In other words, AXasm are not an alternative to AGs, since AXasm only reuse the ease-of-specification features of AGs without preserving their efficiency features. In contrast to AXasm, traditional Attribute Grammars make the connection to the grammar explicit and declare not only the signature of attributes, but also their typing and the direction of the information flow.

- Synthesized Attributes. Attributes whose value is calculated from the attributes of their children are called synthesized attributes. Information for the calculation of these attributes thus flows from the leaves of the tree towards the root.

- Inherited Attributes. Attributes whose value is calculated from the value of their parent's attributes are called inherited attributes. Information for these calculations flows from the root towards the leaves of the tree.

In AXasm only synthesized attributes are defined in the traditional way; inherited attributes are simulated using a special attribute Parent which links nodes in the parse tree to their parent node. The attribute Parent has been introduced in Section 3.2.2 and formalized in Section 5.3.1. We see clear limitations in not having inherited attributes, but on the other hand this allows us to considerably simplify the syntax of attribute definitions and to have the definitions look and feel like method declarations in object-oriented programming. On the other hand, the existence of the Parent attribute and the enclosing function (Section 5.3.2), together with the fact that values of AXasm attributes can be references to other nodes in the tree, allows in certain situations for a much more compact specification style.
Instead of locally moving information from parent to child using inherited attributes, the information can be accessed directly by using the enclosing function. For instance name resolution, a feature typically specified with inherited attributes, is covered in AXasm by directly accessing the declaration table of the least enclosing scope. Interestingly, the same function enclosing is already used by Poetzsch-Heffter in the MAX system (184; 186). Both in MAX and in our system, the enclosing function simplifies the specification of such features by making it possible to point directly to the least enclosing instance of a certain feature, or the least enclosing instance of a set of features. In summary, the main differences of AXasm with respect to attribute grammars are the following.

- Arbitrary Structure. AXasm can be defined over a number of object sets which do not build a parse tree. In fact, AXasm do not start with a grammar, but with an arbitrary partition of the involved objects, independently of whether they are nodes of a parse tree or not. For simplicity we still use the notion node for those objects which have attributes.

- Untyped. The terms defining attributions of AXasm are not typed.

- Global References. While in traditional attribute grammars the definition of an attribute depends only on the attributes of its siblings or its parent, in AXasm attributes can be calculated by referring to any other object. Both the MAX system (186) and Hedin's reference attribute grammars (89) provide a similar feature.

- Reference Values. In traditional attribute grammars, the values of attributes are restricted to constants, such as strings and numbers, or mappings. In AXasm, the value of one attribute can be another node of the AST. Again, the MAX system and reference AGs provide a similar feature.

- Parameterized Attributes. In AXasm, an attribute can have additional parameters. In this way it is not necessary to return higher-order data structures like mappings.
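The name-resolution style described above — following Parent links to the least enclosing scope instead of threading inherited attributes through the tree — can be sketched as follows. This is a toy model of ours; only the notions of Parent and enclosing are taken from the text, and the node kinds and declaration tables are illustrative.

```python
class Node:
    def __init__(self, kind, parent=None, decls=None):
        self.kind = kind
        self.parent = parent       # the Parent attribute linking to the parent node
        self.decls = decls or {}   # declaration table of a scope node

def enclosing(node, kind):
    """Least enclosing ancestor of the given kind, or None."""
    n = node.parent
    while n is not None and n.kind != kind:
        n = n.parent
    return n

def resolve(node, name):
    """Look name up in the declaration tables of successively
    enclosing scopes, starting at the least enclosing one."""
    scope = enclosing(node, "Scope")
    while scope is not None:
        if name in scope.decls:
            return scope.decls[name]
        scope = enclosing(scope, "Scope")
    return None

outer = Node("Scope", decls={"x": "int"})
inner = Node("Scope", parent=outer, decls={"y": "bool"})
use = Node("Ident", parent=inner)
print(resolve(use, "x"))   # found in the outer scope -> int
```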
By further generalizing the idea and extending it with a mechanism for inheritance, an OO version of X ASM would be obtained, but the definition of a full OO version of X ASM is beyond the scope of this thesis and we refer the reader to the executable specification of OO X ASM (128).

8 Semantics of Montages

In this chapter we give a formal semantics of Montages using parameterized, attributed X ASM as introduced in Chapters 5 and 7. For simplicity we refer to them as X ASM. The presented algorithms are based on code which has been implemented and carefully tested with the Gem-Mex tool. The running code has then been rewritten for the thesis using the novel X ASM features introduced in the last chapters. Testing the final version of the algorithms has not been possible, since the new features are not yet implemented. In Section 8.1 we reevaluate the meta-interpreter semantics of Montages by discussing different alternatives for giving semantics to a meta-formalism. As mentioned at the beginning of Part II, the advantages of the given formalization are that it is executable, serves directly as an implementation of Montages, and is easy to maintain, since it is based on one fixed X ASM specification. Based on TFSMs, we have shown in Chapter 6 how the meta-interpretation specification allows us to use partial evaluation to transform language descriptions into specialized interpreters and to compile programs of the described language into specialized X ASM code. The resulting specialized code is in both cases not only more efficient, but also easier to understand and validate. In this chapter we abstract from partial evaluation and other efficiency- and code-transparency-related issues and give algorithms building non-optimized and non-simplified TFSMs from Montages. The techniques of Chapter 6 can then be applied to obtain a maintainable and efficient implementation of Montages from the meta-interpreter presented here.
In Section 8.2 the Montages meta-interpreter is structured, and then the details of processing the Montages aspects relating to static semantics (Section 8.3) and to dynamic semantics (Section 8.4) are given. Finally, in Section 8.5 we conclude that the given meta-interpreter can be used to meta-bootstrap the X ASM language, given a Montages description of X ASM, and point to ongoing work on bootstrapping the complete Montages system.

8.1 Different Kinds of Meta-Formalism Semantics

A complete specification of a language is given by defining its syntax, its static semantics, and its dynamic semantics. A meta-formalism like Montages is used to give such language definitions. Typically a language definition is given by means of a mathematical mechanism which takes as input a program in the given syntax, checks static semantics, and simulates dynamic semantics. If the language to be defined is a meta-formalism, i.e. a formalism to define other formalisms, the situation is more complex. Of course, a meta-formalism, too, is given by defining its syntax and semantics. But each "program" written in the meta-formalism defines another formalism. The "programs" written in a meta-formalism are thus called language-definitions, and we use the term program for the programs written in the formalism specified by a language-definition. The specification of a meta-formalism thus defines syntax and semantics of language-definitions, and defines syntax and semantics for each language defined. There are two different choices for formulating the specification of a meta-formalism. Either one gives a mathematical mechanism which takes both the program and the language-definition as input, or one gives a mathematical mechanism which transforms a language-definition into a mathematical mechanism that is a definition of the described language. The first choice, which takes as input both the program and the semantics definition, is called meta-interpretation.
In our context, a meta-interpreter is an ASM which reads the Montages description of a language L plus an L-program P, and interprets P according to the L-semantics. In Figure 37 we show a meta-interpreter, its input, and how it can be specialized to interpreters and compiled code for the specified language. Alternatively, one can define a program generator, taking the Montages description of L as input and generating a specialized X ASM model. This choice, which corresponds to the current architecture of the Gem-Mex tool, is visualized in Figure 36. The advantage of this approach is the simplicity of the resulting X ASM model. The signature and structure of the model can be specialized for the given Montages. A simple language described by a few simple Montages results in a simple, specialized X ASM model of the language. The disadvantage of this approach is that it is not trivial to formalize the generator. Furthermore, our experience with implementing this approach showed that the software generator can be a considerable maintenance problem. Because of this maintenance problem, and because we can achieve the advantages of the generator approach with partial evaluation of meta-interpreters, we decided to follow the meta-interpreter approach. Following the meta-interpretation approach, we have the problem that the signature of the terms used in Montages is specialized to the EBNF of the described language. One possibility to solve this problem is to transform the Montages descriptions into descriptions using a more generic signature, and to give a meta-interpreter processing such generic Montages definitions. In this way we can give a single, fixed ASM as the semantics of Montages. The disadvantage of this solution is that the existing Montages modules must be transformed in a complex and context-dependent way.
Another disadvantage is that the complex, generic signature has to be understood even for simple Montages examples. The author has experimented with this solution, described it for an XML based meta-formalism (126), and subsequently implemented it for Montages with the Gem-Mex tool. Although this results in a very small, highly abstract model, the outcome tends to be hard to understand. The reasons are that the complexity of the model is independent of the language described, and that the terminology of the described language is not used for its description. The semantics given by such an abstract model can thus not be easily understood by domain experts. Instead of transforming the Montages, we thus propose to use parameterized X ASM to "program" the specialized signature of the Montages. In the introduction to Part II we have already shown a simple example of this process. A meta-interpreter using this approach is as complex as one over a fixed signature. But using partial evaluation, the given parameterized meta-interpreter can be specialized into an interpreter or even a compiled program, using a signature corresponding to the terminology introduced by the EBNF rules of the described language. A meta-interpreter approach using parameterized X ASM thus allows us to take advantage of end-user terminology, and fits perfectly into a framework for domain-specific languages. The resulting specialized X ASM descriptions correspond both in signature and structure to the given Montages. In the following sections one fixed parameterized ASM MontagesSemantics is given as the semantics of the Montages meta-formalism. Given a language description, the signature-parameters of MontagesSemantics can be instantiated and the parameterized ASM is easily reduced to a simple specialized ASM, whose size and complexity is directly related to the complexity of the described language.
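The specialization effect discussed in this section can be illustrated with a tiny example of ours (plain Python, not Gem-Mex machinery): closing an interpreter over a fixed term yields a residual function whose structure mirrors the term and which no longer inspects term tags at run time — the effect that partial evaluation achieves for the parameterized meta-interpreter.

```python
def interp(term, env):
    """Generic interpreter: decides the term's shape on every run."""
    op = term[0]
    if op == "var":
        return env[term[1]]
    if op == "add":
        return interp(term[1], env) + interp(term[2], env)
    raise ValueError(op)

def specialize(term):
    """'Compile' the term once; the result no longer dispatches on tags."""
    op = term[0]
    if op == "var":
        name = term[1]
        return lambda env: env[name]
    if op == "add":
        left, right = specialize(term[1]), specialize(term[2])
        return lambda env: left(env) + right(env)
    raise ValueError(op)

t = ("add", ("var", "a"), ("var", "b"))
run = specialize(t)
# the residual function agrees with the generic interpreter on every input
assert interp(t, {"a": 2, "b": 3}) == run({"a": 2, "b": 3}) == 5
```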
8.2 Structure of the Montages Semantics

To define the semantics of Montages, we give the meta-interpreter ASM MontagesSemantics, which receives as parameters mtg, the list of Montages, and prg, the program to be analyzed and executed. MontagesSemantics generates from these parameters an AST, collects the attribution rules from the Montages, checks the static semantics condition for each node, decorates the AST with states and transitions, and finally executes the resulting TFSM.

8.2.1 Informal Typing

Until now we have given no typing information, since X ASM has no static type system. To make the descriptions of constructor-term representations more readable, we use an informal notation for typing. The declaration

  constructor c3(T1, T2, T3) -> T4

denotes that constructor c3 takes arguments of types T1, T2, T3 and produces constructor terms of type T4. As a convention we assume that constructor symbols are given in lower case letters, and that types start with a capital letter. The notation [T] denotes a list-type of T-instances; {T} denotes a corresponding set-type. The synonym notation known from the EBNF rules can be used to denote union types. For instance, the rule

Gram. 9:

  Expr = Unary | Binary | CondExpr | Application | Constant | Let

from the grammar of X ASM rules induces an informal typing definition of the union type Expr, built from the types on the right-hand side. In general we will treat upper-case EBNF symbols from the X ASM and attribution grammars as types of the corresponding constructor-terms.

8.2.2 Data Structure

Both mtg and prg are passed to MontagesSemantics as constructor terms. The program prg to be executed is passed as a constructor term built up by the constructor characteristic, representing applications of characteristic production rules, and the constructor synonym, representing applications of synonym productions. Section 4.5.3 gives the details of this canonical representation.
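A program term of this shape can again be mirrored with plain tuples. The sketch below is a hypothetical mirror of ours, not the encoding of Section 4.5.3: applications of characteristic productions become ("characteristic", symbol, children) and applications of synonym productions become ("synonym", symbol, child), and a small walker recovers the production symbols used in a program.

```python
def node_symbols(term):
    """Collect the production symbols occurring in a program term."""
    tag = term[0]
    if tag == "characteristic":
        syms = {term[1]}
        for child in term[2]:
            syms |= node_symbols(child)
        return syms
    if tag == "synonym":
        return {term[1]} | node_symbols(term[2])
    return set()   # leaves such as lexemes carry no production symbol

prg = ("characteristic", "While",
       [("synonym", "Expr", ("characteristic", "Constant", [])),
        ("characteristic", "Stm", [])])
print(sorted(node_symbols(prg)))   # ['Constant', 'Expr', 'Stm', 'While']
```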
The elements of a Montage, represented as a constructor term, are its name, being an Ident, a list of Attributes, an X ASM expression being the static semantics condition, a list of States, and a list of MVL transitions.

  constructor montage(Symbol, [Attribute], Expr, [State], [Transition])

Examples of Montages containing all these parts are the A-Montage in Figure 9 and the While-Montage in Figure 10. The transitions of these Montages have already been given as Terms 1 and 2 in Section 3.3.2. The representation of the A-Montage as a constructor term, modulo the free variables T, C, R, C1, C2, and C3, looks as follows.

Term 13:
  montage("A",
    [... , attribute("a", ["p1", ..., "pn"], T), ...],
    C,
    [ ..., state("s3", R), ...],
    [transition(siblingPath("B", undef, statePath("s1")), C1,
                siblingPath("B", undef, statePath("s2"))),
     transition(siblingPath("B", undef, statePath("T")), C2,
                statePath("s3")),
     transition(statePath("s3"), C3,
                siblingPath("B", undef, statePath("I")))] )

The corresponding constructor term for the while is:

Term 14:
  montage("While",
    [attribute("staticType", [],
       dot(apply("S-Expr",[]), apply("staticType",[])))],
    apply("=", [apply("staticType",[]), apply("BooleanType",[])]),
    [state("profile",
       update("LoopCounter", [],
         apply("+", [apply("LoopCounter",[]), constant(1)])))],
    [transition(statePath("I"), default,
                siblingPath("Expr", undef, statePath("I"))),
     transition(siblingPath("Expr", undef, statePath("T")), src.value,
                statePath("profile")),
     transition(siblingPath("Expr", undef, statePath("T")), default,
                statePath("T")),
     transition(statePath("profile"), default,
                siblingPath("Stm", undef, statePath("LIST"))),
     transition(siblingPath("Stm", undef, statePath("LIST")), default,
                siblingPath("Expr", undef, statePath("I")))] )

8.2.3 Algorithm Structure

The ASM MontagesSemantics processes program and semantics in different phases.
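Before the phases are presented in detail, their sequencing can be previewed as a small driver loop. The following Python sketch is purely illustrative (the phase bodies are stubbed out; only the mode progression and the validity gate are shown).

```python
# Illustrative sketch of the phase structure of MontagesSemantics:
# construct -> collect -> validate -> decorate -> execute, with
# validation able to abort. Phase bodies are intentionally stubs.

def montages_semantics(prg, mtg, check=lambda: True):
    mode, log = "construct", []
    while True:
        log.append(mode)
        if mode == "construct":
            mode = "collect"          # ... construct tree ...
        elif mode == "collect":
            mode = "validate"         # ... collect attributions ...
        elif mode == "validate":
            if not check():           # ... check static semantics ...
                return "Program is not valid.", log
            mode = "decorate"
        elif mode == "decorate":
            mode = "execute"          # ... decorate tree ...
        elif mode == "execute":
            return "done", log        # ... execute ...
```

The `log` records the visited phases; an invalid program stops the process after validation, as in the text.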
Starting with the construction of the parse tree, the next step is the collection of attribution rules, followed by the check of the static semantics conditions. After this phase, a program is said to be valid. If the program is not valid, the string "Program is not valid." is returned and the process is stopped. Parse trees of valid programs are then decorated with control-flow information and executed. The current phase of this process is given by a dynamic function mode, which changes its value from construct to collect, validate, and then, if the program is valid, to decorate and finally to execute. In Figure 8 of Section 3 these phases have already been mentioned. Phase 1 of that figure is concerned with initialization and construction of the AST. Phase 2 relates to collection and phase 3 to validation. Finally, phase 4 of the referenced figure relates to decoration, and phase 5 to execution. The overall structure of the ASM MontagesSemantics looks as follows.

ASM 49:
  asm MontagesSemantics(prg, mtg)
    ...
  is
    constructors construct, collect, validate, notValid, decorate, execute
    function mode <- construct

    if mode = construct then
      ... construct tree ...
      mode := collect
    elseif mode = collect then
      ... collect attributions ...
      mode := validate
    elseif mode = validate then
      if ... check static semantics ... = true then
        mode := decorate
      else
        return "Program is not valid."
      endif
    elseif mode = decorate then
      ... decorate tree ...
      mode := execute
    elseif mode = execute then
      ... execute ...
    endif
  endasm

In Section 8.3 we give all details of the Montages semantics concerned with the static semantics of described programming languages, and in Section 8.4 the formalization of the dynamic semantics aspects is given.

8.3 X ASM Definitions of Static Semantics

After the construction of the AST, which is described in Section 8.3.1, the attributions of each Montage are collected and assembled into an attributed X ASM.
This collection phase is described in Section 8.3.2. As the last phase of static semantics processing, the static semantics conditions are checked for all nodes of the abstract syntax tree. This process is described in Section 8.3.3.

8.3.1 The Construction Phase

In the construction phase, the abstract syntax tree is constructed from the given term representation of the program. The definition of ASM ConstructCanonicTree has been given as ASM 17 in Section 5.3. The universes and selector functions updated by ConstructCanonicTree are declared here, such that they are available in later phases. Furthermore, a dynamic function root is declared, and the root of the constructed AST is assigned to it. The corresponding fragment of MontagesSemantics is given as follows, refining ASM 49.

ASM 50:
  asm MontagesSemantics(prg, mtg)
    accesses constructors synonym(_,_), characteristic(_,_), ...
    accesses universes CharacteristicSymbols, SynonymSymbols, ...
  is
    external function ConstructCanonicTree(Term), ...
    universe NoNode, ListNode
    (for all c in CharacteristicSymbols:
      universe $c$
      function $"S-"+c$(_) )
    (for all s in SynonymSymbols:
      universe $s$
      function $"S-"+s$(_) )
    function mode <- construct
    function root, ...
    constructors construct, collect ...
    ...
    if mode = construct then
      root := ConstructCanonicTree(prg)
      mode := collect
    ...
  endasm

8.3.2 The Attributions and their Collection

The list of attributes is a list of attribute constructors, as introduced in Section 7.2.3.1. The typing of the attribute constructor is

  constructor attribute(Ident, [Ident], Expr) -> Attribute

where the first Ident is the name of the attribute, the list of Idents denotes the arguments of the attribute, and the expression Expr is an X ASM expression whose evaluation determines the value of the attribute. The list of attributions is collected by the following ASM. The parameter mtgList is the list of montage-terms representing a language specification.
The algorithm extracts from each montage-constructor the first and the second argument, and builds up a corresponding list of attributions, using the attrDefs constructor.

ASM 51:
  asm CollectAttributions(mtgList)
    accesses constructors montage(_,_,_,_,_), attrDefs(_,_)
  is
    function a <- []
    if mtgList =~ [montage(&Symbol, &Attrs, &, &, &) | &tl] then
      a := a + [attrDefs(&Symbol, &Attrs)]
      mtgList := &tl
    else
      return a
    endif
  endasm

In the collect phase of the Montages semantics, the attributions are collected and assigned to the function AttrDefs. The details of MontagesSemantics with respect to the collect phase are given as the following refinement of ASM 50.

ASM 52:
  asm MontagesSemantics(prg, mtg)
    ...
  is
    ...
    external function CollectAttributions(Mtgs)
    function AttrDefs
    ...
    constructors ..., collect, validate, ...
    ...
    elseif mode = collect then
      AttrDefs := CollectAttributions(mtg)
      mode := validate
    ...
  endasm

8.3.3 The Static Semantics Condition

The third element of a Montage is the static semantics condition. It is a normal X ASM expression, which will be checked in the context of each instance of the Montage. For the evaluation of the conditions, ASM 48, A_INTERP from Section 7.2.3, is used. The derived function getMontage(Ident) returns the Montage constructor having the name given as argument, and the derived function getCondition(Montage) returns the static semantics condition from a Montage constructor.

  derived function getMontage(id) ==
    (choose m in list mtg: m =~ montage(id, &,&,&,&))

  derived function getCondition(m) ==
    (if m =~ montage(&, &, &Cond, &,&) then &Cond else undef)

The following ASM CheckSemantics evaluates, for all instances n of a characteristic symbol s, the corresponding static semantics condition getCondition(getMontage(s)). The ASM accesses the AXasm interpreter A_INTERP, the functions getMontage and getCondition, as well as the universe of characteristic symbols.
The body calculates the conjunction of all static semantics conditions of all nodes in the AST. The nodes are enumerated by ranging over all node universes, given by the characteristic symbols.

ASM 53:
  asm CheckSemantics
    accesses function A_INTERP(Term, Obj, Obj)
    accesses function getMontage(Symbol)
    accesses function getCondition(Montage)
    accesses universe CharacteristicSymbols
  is
    return
      (forall s in CharacteristicSymbols:
        (let mtg0 = s.getMontage in
          (let cond0 = mtg0.getCondition in
            (forall n in $s$: A_INTERP(cond0, n, n)))))
  endasm

The corresponding fragment of MontagesSemantics is given below. Together with ASM 49 and the refinement ASMs 50 and 52 it covers the static aspects of a Montages specification.

ASM 54:
  asm MontagesSemantics(prg, mtg)
    ...
  is
    ...
    derived function getMontage(Symbol) ==
      (choose m in list mtg: m =~ montage(Symbol, &,&,&,&))
    derived function getCondition(Mtg) ==
      (if Mtg =~ montage(&, &, &Cond, &,&) then &Cond else undef)
    external function CheckSemantics
    external function A_INTERP(Term, Obj, Obj)
    ...
    constructors ..., validate, decorate, ...
    ...
    elseif mode = validate then
      if CheckSemantics then
        mode := decorate
      else
        return "Program is not valid."
      endif
    ...
  endasm

8.4 X ASM Definitions of Dynamic Semantics

Once the static semantics conditions are checked, the lists of states and transitions are used to build a tree finite state machine as described in Section 3.3.

- In Section 8.4.1 the association of states and actions is pre-calculated,
- in Section 8.4.2 the form of transitions is recapitulated,
- in Section 8.4.3 the instantiation of explicit transitions, and
- in Section 8.4.4 the creation of implicit transitions are described.

In Section 8.4.6 the semantics of execution is given. Most material in this section is a refinement of algorithms introduced in Sections 3.3 and 6.1.

8.4.1 The States

A state has two elements: its name, being an identifier, and an X ASM rule, its action.
  constructor state(Ident, Rule)

The structure of rules has been given in Section 5.4, Grammar 6. A dynamic function

  function getAction(Node, State) -> Action

is defined such that, for each node n and state s, the term n.getAction(s) returns the corresponding action rule. The same function has been used in the TFSM interpreter of Section 6.1. Now we can give an ASM DecorateWithStates which defines the function getAction for all nodes. The derived function getStates extracts the state component from the montage-constructor

  derived function getStates(Mtg) ==
    (if Mtg =~ montage(&, &, &, &States,&) then &States else undef)

and the derived function getMontage returns the right Montage constructor.

ASM 55:
  asm DecorateWithStates
    updates function getAction(_,_)
    accesses constructor state(_,_)
    accesses function getMontage(_), getStates(_)
    accesses universe CharacteristicSymbols
  is
    do forall c in CharacteristicSymbols:
      let mtg0 = c.getMontage in
        let states0 = mtg0.getStates in
          do forall n in $c$:
            do forall s in list states0:
              if s =~ state(&name, &action) then
                n.getAction(&name) := &action
              endif
            enddo
          enddo
        endlet
      endlet
    enddo
  endasm

8.4.2 The Transitions

An MVL transition consists of three parts: the source of the transition, its firing condition, and the target of the transition. The ASM InstantiateTransition instantiates MVL transitions with TFSM transitions. In Section 3.3.2 we have defined basic paths, in Section 3.4.1 we introduced paths from and to lists, and in Section 3.4.3 paths to non-local targets have been explained. Throughout these sections the algorithm InstantiateTransitions has been explained, and finally in Section 3.4.4 the complete definition was given. Later, in Section 6.1, we have formalized the TFSM transitions as five-tuples in X ASM which are added to a universe Transitions. The differences between the informal version in Section 3.4.4 and the X ASM counterpart are relatively small.
We use dynamic functions instead of variables, and the outer explicit loop can be skipped, since X ASM loops implicitly. For the application of the selector functions, the AXasm interpreter A_INTERP is used, and TFSM transitions are created by adding them to the relation Transitions. The previous X ASM definitions are refined such that for each instantiated transition the node triggering its creation is remembered. In the condition of the transition, this create-node or context-node can be accessed as self. The reason for this refinement is that this way all terms in a Montage, both the action rules and the conditions on transitions, refer to the same self-object when they are evaluated. Thereby a higher level of decoupling among different Montages is achieved. As a consequence, in the refined version TFSM transitions are six-tuples, rather than five-tuples. Adding a transition

- from node/state pair (sn, ss),
- created under condition c,
- targeting node/state pair (tn, ts), and
- created by node cn

is done by the following update.

  Transitions((sn, ss, cn, c, tn, ts)) := true

8.4.3 The Transition Instantiation

The algorithm InstantiateTransition instantiates each MVL transition with a number of TFSM transitions. In the so-called decoration phase of MontagesSemantics the MVL transitions of each Montage are instantiated for all instances of that Montage. The ASM DecorateWithTransitions instantiates, for all nodes n, all transitions t being part of its Montage mtg0. Formally, this is done by a number of let and do-forall constructs as follows.

  do forall c in CharacteristicSymbols:
    let mtg0 = c.getMontage in
      let trans0 = mtg0.getTransitions in
        do forall n in $c$:
          do forall t in list trans0:
            ...

The actual instantiation of the MVL transition t is done by an external function InstantiateTransition. The arguments passed are the start node n (passed twice, as source and as target node) and the source and target paths.

  ...
          if t =~ transition(&sp, &c, &tp) then
            let cond = &c in
              InstantiateTransition(n, &sp, n, &tp)
            endlet
          endif
        enddo
      enddo
    endlet
    endlet
  enddo

The ASM InstantiateTransition processes the start nodes and paths and creates the corresponding TFSM transitions. The derived function getTransitions is defined as follows:

  derived function getTransitions(Mtg) ==
    (if Mtg =~ montage(&, &, &, &, &Trans) then &Trans else undef)

The complete ASM DecorateWithTransitions is given as follows.

ASM 56:
  asm DecorateWithTransitions
    updates universe Transitions
    accesses functions CharacteristicSymbols, getMontage(_), getTransitions(_)
    accesses constructors transition(_,_,_), siblingPath(_,_,_),
                          globalPath(_,_), statePath(_)
  is
    external function InstantiateTransition(_,_,_,_)
    do forall c in CharacteristicSymbols:
      let mtg0 = c.getMontage in
        let trans0 = mtg0.getTransitions in
          do forall n in $c$:
            do forall t in list trans0:
              if t =~ transition(&sp, &c, &tp) then
                let cond = &c in
                  InstantiateTransition(n, &sp, n, &tp)
                endlet
              endif
            enddo
          enddo
        endlet
      endlet
    enddo
  endasm

The ASM InstantiateTransition has four arguments, updates the universe Transitions, and accesses the functions cond and n from the ASM DecorateWithTransitions.

ASM 57:
  asm InstantiateTransition(srcNode, srcPath, trgNode, trgPath)
    accesses function n, cond
    updates universe Transitions
    accesses constructors transition(_,_,_), siblingPath(_,_,_),
                          globalPath(_,_), statePath(_)
  is
    ...
  endasm

Sibling Paths

The cases where source or target paths are sibling paths have already been discussed in Section 3.3.3. If the source path srcPath (respectively target path trgPath) is a sibling path, the corresponding sibling of the source node srcNode (respectively target node trgNode) is calculated and assigned to srcNode (respectively trgNode), and the remaining path component is assigned to srcPath (respectively trgPath).
In contrast to the informal version given in Section 3.3.3, we use the $-feature to construct the syntax of the selector function. As in earlier sections, we abstract from S1- and S2-type selector functions.

  ...
  elseif srcPath =~ siblingPath(&name, 1, &path) then
    srcNode := srcNode.$"S-"+&name$
    srcPath := &path
  elseif trgPath =~ siblingPath(&name, 1, &path) then
    trgNode := trgNode.$"S-"+&name$
    trgPath := &path
  ...

With these rules, whenever either the source or the target path is a sibling path, the corresponding sibling is calculated and the path simplified.

Global Paths

The processing of global paths has also been discussed before, in Section 3.4.3. If srcPath (respectively trgPath) is a global path, InstantiateTransition is called recursively for each instance of the universe denoted by the global path. Again the $-feature is used for the new formulation.

  ...
  elseif srcPath =~ globalPath(&name, &path) then
    do forall n0 in $&name$
      InstantiateTransition(n0, &path, trgNode, trgPath)
    enddo
  elseif trgPath =~ globalPath(&name, &path) then
    do forall n0 in $&name$
      InstantiateTransition(srcNode, srcPath, n0, &path)
    enddo
  ...

List Processing

The processing of lists has already been discussed in Section 3.4.1. If, due to the processing of a sibling or global path, either the source or the target node is a list, InstantiateTransition is recursively called for each element of the list.

  ...
  elseif srcNode =~ [&hd | &tl] then
    ...
    do forall n0 in list srcNode
      InstantiateTransition(n0, srcPath, trgNode, trgPath)
    enddo
    ...
  elseif trgNode =~ [&hd | &tl] then
    ...
    do forall n0 in list trgNode
      InstantiateTransition(srcNode, srcPath, n0, trgPath)
    enddo
    ...

There are two exceptions to these processing rules, reflecting transitions starting and ending at the special "LIST" boxes representing the whole list rather than its instances. For a detailed discussion we refer again to Section 3.4.1.
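The plain list case, before the "LIST"-box exceptions are taken into account, is a simple structural recursion; the following Python analogue is purely illustrative (names are our own).

```python
# Illustrative analogue of the plain list rule: when a source or target
# node is a list, recurse on every element; otherwise record the call.
# The "LIST"-box exceptions of the text are deliberately not modeled.

calls = []

def on_lists(src, sp, trg, tp):
    if isinstance(src, list):
        for n0 in src:
            on_lists(n0, sp, trg, tp)
    elif isinstance(trg, list):
        for n0 in trg:
            on_lists(src, sp, n0, tp)
    else:
        calls.append((src, sp, trg, tp))

# a two-element source list fans out into one call per element
on_lists(["a", "b"], "I", "w", "T")
```

After the call, `calls` contains one instantiation per list element, which is exactly the fan-out behavior of the recursion described above.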
The first exception, concerning transitions departing from such boxes, is as follows. If the source node srcNode is a list and, at the same time, the source path srcPath is equal to the special path statePath("LIST"), then the source of the transition is the "T"-state of the last element in the list. The second exception covers transitions ending at the list box. If the target node trgNode is a list and, at the same time, the target path trgPath is equal to the special path statePath("LIST"), then the target of the transition is the "I"-state of the first element in the list. These exceptions are reflected by the following refinement of the above rule fragment processing source and target node lists.

  ...
  if srcNode =~ [&hd | &tl] then
    if srcPath = statePath("LIST") then
      if &tl = [] then
        InstantiateTransition(&hd, statePath("T"), trgNode, trgPath)
      else
        InstantiateTransition(&tl, statePath("LIST"), trgNode, trgPath)
      endif
    else
      do forall n0 in list srcNode
        InstantiateTransition(n0, srcPath, trgNode, trgPath)
      enddo
    endif
  elseif trgNode =~ [&hd | &tl] then
    if trgPath = statePath("LIST") then
      InstantiateTransition(srcNode, srcPath, &hd, statePath("I"))
    else
      do forall n0 in list trgNode
        InstantiateTransition(srcNode, srcPath, n0, trgPath)
      enddo
    endif
  ...

In order to guarantee correct processing, the list rules must be applied first, before the sibling and global rules. If none of the described rules matches anymore, it is guaranteed that both the source and the target node are normal nodes of the syntax tree, and that both the source and the target path are state paths. The components of the state paths are dispatched, and the corresponding entry in the transition relation is created. The node n and the condition cond are functions accessed from the DecorateWithTransitions ASM.

  ...
  elseif srcPath =~ statePath(&srcState) then
    if trgPath =~ statePath(&trgState) then
      Transitions((srcNode, &srcState, n, cond, trgNode, &trgState)) := true
    endif

The single rules for InstantiateTransition explained above together give the following complete X ASM definition, corresponding to the informal algorithm in Section 3.4.4. It is interesting to see that the formal version is neither larger nor more complex than the informal one.

ASM 58:
  asm InstantiateTransition(srcNode, srcPath, trgNode, trgPath)
    accesses function n, cond
    updates universe Transitions
    accesses constructors transition(_,_,_), siblingPath(_,_,_),
                          globalPath(_,_), statePath(_)
  is
    if srcNode =~ [&hd | &tl] then
      if srcPath = statePath("LIST") then
        if &tl = [] then
          InstantiateTransition(&hd, statePath("T"), trgNode, trgPath)
        else
          InstantiateTransition(&tl, statePath("LIST"), trgNode, trgPath)
        endif
      else
        do forall n0 in list srcNode
          InstantiateTransition(n0, srcPath, trgNode, trgPath)
        enddo
      endif
    elseif trgNode =~ [&hd | &tl] then
      if trgPath = statePath("LIST") then
        InstantiateTransition(srcNode, srcPath, &hd, statePath("I"))
      else
        do forall n0 in list trgNode
          InstantiateTransition(srcNode, srcPath, n0, trgPath)
        enddo
      endif
    elseif srcPath =~ siblingPath(&name, 1, &path) then
      srcNode := srcNode.$"S-"+&name$
      srcPath := &path
    elseif trgPath =~ siblingPath(&name, 1, &path) then
      trgNode := trgNode.$"S-"+&name$
      trgPath := &path
    elseif srcPath =~ globalPath(&name, &path) then
      do forall n0 in $&name$
        InstantiateTransition(n0, &path, trgNode, trgPath)
      enddo
    elseif trgPath =~ globalPath(&name, &path) then
      do forall n0 in $&name$
        InstantiateTransition(srcNode, srcPath, n0, &path)
      enddo
    elseif srcPath =~ statePath(&srcState) then
      if trgPath =~ statePath(&trgState) then
        Transitions((srcNode, &srcState, n, cond, trgNode, &trgState)) := true
      endif
    endif
  endasm
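The recursion of InstantiateTransition can also be traced in a small executable sketch. The following Python version is purely illustrative: nodes are strings, a sibling selector table plays the role of the S-selector functions, paths are tagged tuples, and global paths are omitted for brevity.

```python
# Illustrative sketch of InstantiateTransition: simplify sibling paths,
# fan out over lists (with the "LIST"-box exceptions), and finally emit
# a six-tuple TFSM transition. All names are hypothetical.

sel = {("while1", "S-Expr"): "expr1",
       ("while1", "S-Stm"): ["stm1", "stm2"]}   # toy selector table
transitions = set()

def instantiate(src, sp, trg, tp, ctx, cond):
    if isinstance(src, list):
        if sp == ("state", "LIST"):
            # LIST box as source: the "T"-state of the last element
            instantiate(src[-1], ("state", "T"), trg, tp, ctx, cond)
        else:
            for n0 in src:
                instantiate(n0, sp, trg, tp, ctx, cond)
    elif isinstance(trg, list):
        if tp == ("state", "LIST"):
            # LIST box as target: the "I"-state of the first element
            instantiate(src, sp, trg[0], ("state", "I"), ctx, cond)
        else:
            for n0 in trg:
                instantiate(src, sp, n0, tp, ctx, cond)
    elif sp[0] == "sibling":
        instantiate(sel[(src, "S-" + sp[1])], sp[2], trg, tp, ctx, cond)
    elif tp[0] == "sibling":
        instantiate(src, sp, sel[(trg, "S-" + tp[1])], tp[2], ctx, cond)
    else:
        # both paths are state paths: emit the six-tuple transition
        transitions.add((src, sp[1], ctx, cond, trg, tp[1]))

# the While transition "profile -> Stm.LIST" resolves to stm1's "I" state
instantiate("while1", ("state", "profile"), "while1",
            ("sibling", "Stm", ("state", "LIST")), "while1", "default")
```

Note how the list rules are tested before the sibling rules, matching the ordering requirement stated in the text.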
8.4.4 Implicit Transitions

In addition to the explicit transitions, there are implicit default transitions, linking list elements sequentially, and connecting the "I" and "T" states of each NoNode instance. The implicit transitions have already been discussed at the end of Section 3.4. The ASM DecorateWithImplicitTransitions generates these implicit transitions.

ASM 59:
  asm DecorateWithImplicitTransitions
    accesses universe ListNode, NoNode
    updates universe Transitions
  is
    external function InstantiateListTransitions(_)
    do forall l in ListNode:
      InstantiateListTransitions(l)
    enddo
    do forall n in NoNode:
      Transitions((n, "I", n, default, n, "T")) := true
    enddo
  endasm

The ASM InstantiateListTransitions creates the implicit transitions for lists.

ASM 60:
  asm InstantiateListTransitions(l)
    updates universe Transitions
  is
    if l =~ [&hd0 | [&hd1 | &tl]] then
      Transitions((&hd0, "T", &hd0, default, &hd1, "I")) := true
      InstantiateListTransitions([&hd1 | &tl])
    endif
    return true
  endasm

8.4.5 The Decoration Phase

The X ASM MontagesSemantics has been given up to the point where the static semantics condition is checked. The next step is to decorate the parse tree with the states and transitions, resulting in a TFSM. The following fragment of MontagesSemantics refines ASM 54.

ASM 61:
  asm MontagesSemantics(prg, mtg)
    ...
  is
    ...
    universe Transitions
    external functions DecorateWithStates, DecorateWithTransitions,
                       DecorateWithImplicitTransitions
    ...
    constructors ..., decorate, execute, ...
    ...
    elseif mode = decorate then
      DecorateWithStates
      DecorateWithTransitions
      DecorateWithImplicitTransitions
      mode := execute
    ...
  endasm

8.4.6 Execution

The execution of the program is done in the execution phase. The following definition of ASM Execute(Node, State) refines the earlier nondeterministic version of Execute, ASM 32 of Section 6.1.[1]
The state of the execution is held by two functions: CNode, denoting the current node of the syntax tree where the control of the execution is, and CState, the current state of this node being visited. The firing of the current action is done by the following rule.

  A_INTERP(CNode.getAction(CState), CNode, CNode)

The condition of a transition t is evaluated by providing to the self-interpreter not only the values for src and trg, but also by feeding the create-object in as the context of the evaluation.

  t =~ (CNode, CState, &cn, &c, &tn, &ts) andthen
    (let src = CNode in
      (let trg = &tn in
        A_INTERP(&c, &cn, &cn)))

Further, we declare the self-interpreter A_INTERP as an accessed function, rather than an external function. With this choice, the characteristic/synonym symbols, universes, and selector functions need not be included in the interface of Execute.

[1] Other earlier sections of this thesis relating to the Execute algorithm are Section 3.3.1, introducing the algorithm with an example, and Section 6.4, showing how to apply partial evaluation to this algorithm.

ASM 62:
  asm Execute(n,s)
    accesses functions getAction(_, _), A_INTERP(_,_,_)
    accesses universe Transitions
  is
    relation fired
    functions CNode <- n, CState <- s

    if not fired then
      A_INTERP(CNode.getAction(CState), CNode, CNode)
    else
      choose t in Transitions:
        t =~ (CNode, CState, &cn, &c, &tn, &ts) and
          (let src = CNode in
            (let trg = &tn in A_INTERP(&c, &cn, &cn)))
        CNode := &tn
        CState := &ts
      ifnone
        choose t in Transitions:
          t =~ (CNode, CState, &, default, &tn', &ts')
          CNode := &tn'
          CState := &ts'
        endchoose
      endchoose
    endif
  endasm

As a last refinement of the ASM MontagesSemantics, we can now give the fragment refining ASM 61 with the fragment for execution.

ASM 63:
  asm MontagesSemantics(prg, mtg)
    ...
  is
    ...
    external function Execute(_,_)
    ...
    elseif mode = execute then
      Execute(root, "I")
    endif
  endasm

We have now given the complete definition of the semantics of Montages.
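The execution phase just defined can be illustrated with a small deterministic sketch. The following Python version is purely illustrative (our own simplification): actions are strings that are merely logged, conditions are modeled as the boolean True for an enabled transition or the marker "default" for a fallback, and transitions are the six-tuples of the text.

```python
# Illustrative sketch of the Execute phase: fire the current node/state's
# action, then follow an enabled transition, falling back to a default
# transition if no condition holds. Transitions are six-tuples
# (sn, ss, cn, c, tn, ts); conditions are True or "default".

def execute(start, actions, transitions, max_steps=10):
    node, state, trace = start, "I", []
    for _ in range(max_steps):
        if (node, state) in actions:
            trace.append(actions[(node, state)])   # fire the action
        enabled = [(tn, ts) for (sn, ss, cn, c, tn, ts) in transitions
                   if (sn, ss) == (node, state) and c is True]
        if not enabled:                            # ifnone: try defaults
            enabled = [(tn, ts) for (sn, ss, cn, c, tn, ts) in transitions
                       if (sn, ss) == (node, state) and c == "default"]
        if not enabled:
            break                                  # no outgoing transition
        node, state = enabled[0]
    return trace

trans = [("n1", "I", "n1", True, "n1", "s3"),
         ("n1", "s3", "n1", "default", "n2", "I")]
actions = {("n1", "s3"): "R"}
```

Running `execute("n1", actions, trans)` visits ("n1","I"), fires the action R in state s3, follows the default transition to ("n2","I"), and stops, so the trace is `["R"]`.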
In total the definition has a size of about 377 lines of X ASM code, counting every line in the way we presented the algorithms, including lines with "end" constructs and lines with closing brackets. An efficient implementation needs in addition the algorithms for partial evaluation and simplification of TFSMs, which amount to about 268 lines of code, following the same conventions.

Fig. 42: Meta-Bootstrapping of the Montages System

8.5 Conclusions and Related Work

The given Montages meta-interpreter together with an X ASM semantics makes it possible to meta-bootstrap both the existing X ASM language definition and meta-interpreter, as well as future versions of the X ASM definition and meta-interpreter. Since the meta-interpreter corresponds to the definition of Montages, we are therefore able to meta-bootstrap future versions of both Montages and X ASM with the presented process. In Figure 42 we show what we understand by meta-bootstrapping (129), by applying the architecture presented in Figure 37 to the semantics of X ASM and the Montages meta-interpreter. The inputs to the system are a Montages specification of X ASM, and the meta-interpreter M1. Please note that the same X ASM program M1 is used both as meta-interpreter, and as the program serving as input to the partial-evaluation process. M1 is first specialized to an X ASM interpreter, and then to an implementation of M1, which we call M2. Meta-bootstrapping is done by tuning the specification, the partial evaluator, and the meta-interpreter such that M1 equals M2 modulo pretty-printing. Like normal bootstrapping, this procedure cannot guarantee correctness, but it makes the system more robust. In Figure 35 the meta-bootstrapping has been visualized from a different perspective.
The two cycles on the right are again shown in Figure 43, adapting them to the terminology of Figure 42. The meta-interpreter, being Montages' implementation and semantics, is developed in the left cycle on the development platform X ASM. In the right cycle, Montages is used as the development platform to further develop the specification of X ASM. If a new X ASM specification is released from the right cycle, the process of Figure 42 is used to bootstrap the existing meta-interpreter to the new version of X ASM.

Fig. 43: The bootstrapping of X ASM and Montages

Open problems and current areas of investigation are how to map object-oriented X ASM effectively into main-stream languages like C++ and Java, and how to port not only the interpreter/compiler from the old to the new architecture, but also the graphical debugger and animation tool, which is currently generated for each described language (10).

Part III: Programming Language Concepts

In this part we use Montages to specify programming language concepts. We try to isolate each concept in a minimal example language. Each of these languages has been tested carefully using Gem-Mex, and we invite the reader to use the prepared examples and the tool to get familiar with the methodology. The standard Gem-Mex distribution contains the sources. The material is structured along two dimensions. The first is the already mentioned dimension of programming language concepts. We start with simple expressions, then cover control statements like if and while, and introduce the notions of variables and updates.
Finally, we show more advanced programming constructs like procedure calls, exceptions, and classes. The second dimension is the dimension of applied specification patterns. Besides the Montages built-in pattern of tree finite state machines, we use four identifiable patterns:

- Declarator-Reification: A pattern common to most presented example languages is to reuse tree nodes being declarations as objects representing the type, variable, class, field, or method they are declaring. Attribution of the nodes is used to specify further properties, and dynamic fields are used to store the current value or state, e.g. the value of a variable or field, or the state of a class being initialized. Advantages of this pattern are the compactness of the resulting model, since the existing nodes are reused; ease of animation of the specification, since the nodes correspond to areas in the program text which can be highlighted; and ease of specification for features like scoping, overriding, and reloading of classes and modules, since different declaration nodes with the same name can coexist. We call this pattern Declarator-Reification since the parse tree nodes, being only declarations of objects like variables, procedures, classes, or modules, are reified into the very same objects.

- Tree-Structural-Approach: A second major pattern is the use of the tree structure, by means of the universes, the selector functions, the parent function, and the ASM enclosed, which have been defined in Section 5.3. As discussed in Section 5.3.3, the tree structure is used both for static scope resolution and dynamic binding, to associate type, variable, and procedure (respectively class, field, and method) uses with the right declaration, and to guide abrupt control flow through the program structure.[2] The advantages of this pattern are ease of animation, since the structure of the program text is used, as well as the simplicity of understanding the idea of moving up the tree until a matching value is found.
We call this pattern Tree-Structural-Approach since, instead of traditional structural approaches, where constructors are used, in this pattern we use the structure of the tree. In contrast to the traditional structural approach, this allows us to move not only down the tree, but also up, until the root is reached. Some technical aspects of this pattern, namely the ASMs enclosing and lookUp, have been discussed already in Section 5.3.3.

[2] The abrupt control flow features use this pattern in combination with the later discussed frame-result-controlflow pattern.

- Field-Of-Object-Mapping: The third specification pattern is the use of one binary dynamic function fieldOf to model the mapping of an object's field to its value. Given object o and field f, the value of the field is given by the term fieldOf(o, f). Different language features are unified under this view; for instance,
  - global variables are considered to be fields of a constant Global,
  - local variables are considered to be fields of an object being the current call incarnation,
  - static fields are considered to be fields of the class containing the field, and
  - instance fields are of course fields of the object instance.
  We decided to name this pattern Field-Of-Object-Mapping since it uses one mapping to unify several related features under the view of an object/field model.

- Frame-Result-Controlflow: The fourth and last specification pattern is a special case of the Tree-Structural-Approach pattern, combined with a global variable RESULT which is used to return various results from non-sequential control flow. Examples of such results are
  - a return value produced by a function/method call,
  - a target label produced by a break or continue statement, or
  - an exception object produced by a throw statement or error condition.
  All of these constructs have in common that their "results" are passed up the structure tree, and that there is only one such result at a time.
Therefore a global variable RESULT can be used to model the current value of the result. The pattern works such that, as soon as a result is generated, control is passed up the tree, rather than along the control-flow arrows. If the type of RESULT matches the frame-node, thus if
– a return-value reaches a call-statement,
– a target-label matches a labeled-statement, or
– an exception-object triggers a catch-statement,
then the frameHandler processes the result and resets RESULT to undef; otherwise control is passed further up the tree to the next least enclosing frame-node. Each frame-node only needs to check whether the type of RESULT matches its own kind, and otherwise it passes control further up the tree. Therefore, such a specification will not change depending on which other cases of non-sequential control flow are present. This allows us to give completely independent models and to compose them easily into a full-fledged language. A more technical description of the frame-result-controlflow pattern is given in Section 14.1.

[Figure 44 depicts the example languages as a tree; an arrow from language L1 to language L2 means that L2 extends L1: ExpV1 (Chapter 9); the variable models ImpV1 (Chapter 10), ImpV2 (Section 11.1), ImpV3 (Section 11.2), and ObjV1 (Section 11.3) forming Chapter 11; the object-field models ObjV2 (Chapter 12) and ObjV3 (Chapter 13); and the structural-flow models FraV1 (Section 14.2), FraV2 (Section 14.3), and FraV3 (Section 14.4) of Chapter 14.]
Fig. 44: The example languages of Part III

In Figure 44 the presented languages are depicted. The variable models of the first group introduce stepwise the use of the first two patterns for reusable specifications of different kinds of variables. The second group of object-field models shows how to specify object orientation and recursive function-calls. In the third group, different kinds of abrupt control flow are modeled with the frame-result-controlflow pattern. Each language and group is labeled by the chapter in which it is discussed.
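As an illustration of the frame-result-controlflow pattern described above, the following Python sketch (not part of the thesis; the names Node, propagate, and frame_kind are hypothetical) shows how a single result value is passed up the tree until a frame-node of the matching kind handles it:

```python
# Hypothetical sketch of the frame-result-controlflow pattern: a single
# RESULT is passed up the tree, and each frame-node only checks its own kind.

class Node:
    def __init__(self, parent=None, frame_kind=None):
        self.parent = parent          # tree structure: link to enclosing node
        self.frame_kind = frame_kind  # e.g. "return", "label", "exception", or None

def propagate(node, result_kind, result_value):
    """Walk up the tree; return (handling frame-node, None) on a match,
    or (None, pending result) if no enclosing frame-node matches."""
    while node is not None:
        if node.frame_kind == result_kind:
            # the frameHandler processes the result and resets RESULT to undef
            return node, None
        node = node.parent            # pass control further up the tree
    return None, (result_kind, result_value)

root = Node(frame_kind="return")                     # e.g. a call-statement
try_node = Node(parent=root, frame_kind="exception") # e.g. a catch-statement
stmt = Node(parent=try_node)                         # some inner statement

handler, pending = propagate(stmt, "exception", "ArithmeticException")
assert handler is try_node and pending is None
```

Note how the sketch never mentions which other result kinds exist, mirroring the claim that the specification of one frame-node is independent of the other cases of non-sequential control flow.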
The material is ordered such that each language can be formulated as an extension or refinement of its predecessor. An arrow from language L1 to language L2 denotes that the definition of L2 extends or refines the definition of L1. The leaf languages of the resulting tree are specified such that they can easily be combined into one big language with all introduced features. This is an indication that Montages allows common language technology to be specified in a modular and composable way. The language ExpV1 is a simple expression language similar to the language introduced in Section 3. In contrast to its predecessor, this language features a rich choice of operators, as known from realistic programming languages. The remaining example languages are extensions of ExpV1, as denoted by the arrows in the figure. The first imperative language ImpV1 extends ExpV1 by introducing the concept of statements, blocks of sequential statements, and conditional control flow. At this point we take the opportunity to give simple specifications of while and repeat loops, as well as a more advanced specification of the switch-statement. The concept of global variables is then introduced in example language ImpV2. The purpose of languages ImpV1 and ImpV2 is to introduce the features of a simple imperative language. In a series of refinements, the primitive, name based variable model of ImpV2 is then further developed into the more sophisticated versions ImpV3, and finally ObjV1. Language ObjV2 is an extension of ObjV1 with classes and dynamically bound instance fields, and ObjV3 is an extension of ObjV1 with recursive procedure calls. The languages FraV1, FraV2, and FraV3 feature iterative constructs, exception handling, and a refined model of procedure calls, respectively. The presented example languages are an extract from a specification of sequential Java.
The Java specification mainly differs from the languages presented here by a complex OO-type system, many exceptions and special cases, and a number of syntax problems. We have given the specification of the complete Java OO-type system as an example in Appendix D.

9 Models of Expressions

In this chapter, we show an expression language ExpV1, where the intermediate results are computed during the execution of the program. The language works exactly like the example language of Section 3.2 but features more operators and more different kinds of expressions. In addition, ExpV1 has a simple type system, features lazy evaluation of disjunction and conjunction, and detects runtime errors such as division by zero. The grammar is given as follows, leaving away details on available unary and binary operators:

Gram. 10:
Program ::= exp
exp     =   lit | uExp | bExp | cExp
lit     =   Number | Boolean
uExp    ::= "(" uOp exp ")"
bExp    ::= "(" exp bOp exp ")"
cExp    ::= "(" exp "?" exp ":" exp ")"

In ExpV1 only constant expressions such as the following can be formulated:

Ex. 1: (((((3 - 2) * 7) > 2) and true) or false)

The result of executing this program is that "true" is printed to the standard output.

9.1 Features of ExpV1

The start symbol of the language is Program, and each program consists of an expression, whose value is printed after the execution. The expressions are evaluated by storing the value of each subexpression in an attribute val, which is modeled as a dynamic unary function.

Declarations The signature of the global declarations consists of the single dynamic function val( ), together with the derived function defined(n) == (n != undef) and the declaration of ASM handleError( , ), which is used to handle run-time errors such as division by zero.

Decl. 1:
function val(_)
derived function defined(n) == (n != undef)
external function handleError(_,_)

The Montage Program in Figure 45 specifies the semantics of the start symbol of the ExpV1-language.
The execution of such a program first visits the exp-component, and then the PrintIt-state is visited. The PrintIt-action outputs the attribute val of the exp-component to the standard output stdout.

Program ::= exp
I -> S-exp -> PrintIt -> T
@PrintIt: stdout := S-exp.val
Fig. 45: Montage Program of language ExpV1

9.1.1 The Atomic Expression Constructs

The atomic expression constructs Number (Figure 46) and Boolean (Figure 47) both use a derived attribute constantVal to calculate their constant values. The definition of constantVal in the Number-Montage uses the built-in Name-attribute to get the parsed string-value of the Digits-token, and then applies the built-in strToInt-function to transform the string-value into an integer. The corresponding definition for the Boolean-Montage transforms the strings "true" or "false" into the corresponding elements true and false. The dynamic semantics of both constructs consists of the unique state setVal whose action updates the val-attribute to constantVal.

Number = Digits
I -> setVal -> T
attr constantVal == Name.strToInt
attr staticType == "int"
@setVal: val := constantVal
Fig. 46: Montage Number of language ExpV1

Boolean = "true" | "false"
I -> setVal -> T
attr constantVal == (if Name = "true" then true else false)
attr staticType == "boolean"
@setVal: val := constantVal
Fig. 47: Montage Boolean of language ExpV1

9.1.2 The Composed Expression Constructs

The unary expression uExp is specified in Figure 48. The components of a unary expression are a unary operator uop and an expression. The local definitions of uExp contain the derived function Apply( , )

Decl. 2:
derived function Apply(op, arg) ==
  (if (arg = undef) then undef
   else (if op = "+" then arg
   else (if op = "-" then 0 - arg
   else (if op = "!" then not arg
   else undef))))

which is used in the action setVal to calculate the result of the expression and to set the val-attribute to said result.
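Decl. 2 can be mirrored directly in an executable sketch; here Python's None stands in for XASM's undef (an assumption of this rendering, which is illustrative and not the thesis's notation):

```python
def apply_unary(op, arg):
    """Python rendering of the derived function Apply(op, arg) of Decl. 2."""
    if arg is None:          # an undefined argument propagates undef
        return None
    if op == "+":
        return arg           # unary plus leaves the value unchanged
    if op == "-":
        return 0 - arg       # unary minus as subtraction from zero
    if op == "!":
        return not arg       # boolean negation
    return None              # unknown operator yields undef
```

For instance, apply_unary("-", 7) yields -7, while apply_unary("+", None) stays undefined, matching the outermost test in Decl. 2.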
uExp ::= "(" uop exp ")"
uop  =   "+" | "-" | "!"
I -> S-exp -> setVal -> T
attr staticType == S-exp.staticType
@setVal: val := Apply(S-uop.Name, S-exp.val)
Fig. 48: Montage uExp of language ExpV1

Binary Expression The binary expression Montage is shown in Figure 49. For standard operations, control flows through the two expressions, and then the setVal-action sets the val-attribute to the calculated value of the binary expression. The arguments to calculate the value are in the val-attributes of the left and right expression, respectively. This standard case is visualized in Figure 50 and corresponds exactly to the Sum Montage in Section 3.3.1, Figure 21. Before we explain the other cases of control flow, we give the definition of the Apply function.

Decl. 3:
derived function Apply(op, arg1, arg2) ==
  (if op = "and" then (if arg1 = false then false else arg2)
   else (if op = "or" then (if arg1 = true then true else arg2)
   else (if (arg1 = undef) or (arg2 = undef) then undef
   else (if op = "+" then arg1 + arg2
   else (if op = "-" then arg1 - arg2
   else (if op = "*" then arg1 * arg2
   else (if op = "/" then arg1 / arg2
   else (if op = "%" then arg1 % arg2

(Decl. 3 is continued after Figure 50.)

bExp    ::= "(" exp bop exp ")"
bop     =   arithOp | relOp | "and" | "or"
relOp   =   "<" | "<=" | ">" | ">=" | "==" | "!="
arithOp =   divOp | "*" | "+" | "-"
divOp   =   "/" | "%"

[Figure 49 shows the Montage bExp with control flow I -> S1-exp -> S2-exp -> setVal -> T, a shortcut arrow from S1-exp to setVal labeled (op = "and") and (S1-exp.val = false), a second shortcut arrow labeled (op = "or") and (S1-exp.val = true), and an arrow from S2-exp to the divisionBy0-action labeled (S-bop.divOp) and (S2-exp.val = 0).]
attr op == S-bop.Name
attr staticType == CalculateType(op, S1-exp.staticType, S2-exp.staticType)
condition staticType.defined
@setVal: val := Apply(S-bop.Name, S1-exp.val, S2-exp.val)
@divisionBy0: handleError("ArithmeticException")
Fig. 49: Montage bExp of language ExpV1

sumExp ::= "(" exp "+" exp ")"
I -> S1-exp -> S2-exp -> setVal -> T
attr staticType == CalculateType("+", S1-exp.staticType, S2-exp.staticType)
condition staticType.defined
@setVal: val := S1-exp.val + S2-exp.val
Fig.
50: Montage sumExp of language ExpV1

Decl. 3 (continued):
        else (if op = "==" then arg1 = arg2
   else (if op = "!=" then arg1 != arg2
   else (if op = "<"  then arg1 < arg2
   else (if op = ">"  then arg1 > arg2
   else (if op = "<=" then arg1 <= arg2
   else (if op = ">=" then arg1 >= arg2
   else undef ))))))))))))))

Lazy evaluation of conjunction In the flow specification of Figure 49 we see several control arrows in addition to the described standard way. The first of them, departing from the S1-exp component directly to the setVal-action, is labeled with the condition

(op = "and") and (S1-exp.val = false)

This arrow guarantees that for the and operation the second argument is only evaluated if the first argument evaluates to true. This behavior is called "lazy evaluation" of conjunction, and is important if the evaluation of the second argument has side-effects.

Lazy evaluation of disjunction Similarly, the flow arrow labeled with

(op = "or") and (S1-exp.val = true)

specifies lazy evaluation of disjunction.

Division by zero The arrow departing from the second expression to the divisionBy0-action catches the case where the operator is a division and the second expression evaluates to zero. The action divisionBy0 calls the ASM handleError. In this language the definition of handleError simply prints error messages to the standard output. If, in a later stage, the same Montage is reused in connection with exception handling, the definition of ASM handleError can be refined to a rule triggering a "division by 0" exception.

Coming back to the concept of partial evaluation, as discussed in Section 5.5, it is interesting and instructive to look at the specialized Montages resulting from considering the binary operators to be static and partially evaluating all expressions with this information. Examples of Montages resulting from such a specialization of Montage bExp are Montage sumExp (Figure 50), Montage orExp (Figure 51), and Montage divExp (Figure 52).
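The ordering in Decl. 3 matters: "and" and "or" are tested before the undef check, so a decided first argument short-circuits the result even when the second argument is undefined. A Python sketch of this behavior (None standing in for undef, and integer division assumed for "/" since ExpV1's types are int and boolean — both assumptions of this illustrative rendering):

```python
def apply_binary(op, a, b):
    """Sketch of Decl. 3. Note that "and"/"or" are handled before the
    undef test, matching the lazy-evaluation arrows of Figure 49."""
    if op == "and":
        return False if a is False else b
    if op == "or":
        return True if a is True else b
    if a is None or b is None:   # undef propagates for strict operators
        return None
    ops = {
        "+": lambda: a + b, "-": lambda: a - b, "*": lambda: a * b,
        "/": lambda: a // b, "%": lambda: a % b,
        "==": lambda: a == b, "!=": lambda: a != b,
        "<": lambda: a < b, ">": lambda: a > b,
        "<=": lambda: a <= b, ">=": lambda: a >= b,
    }
    return ops[op]() if op in ops else None   # unknown operator: undef
```

Thus apply_binary("and", False, None) is False even though the second argument is undefined, while apply_binary("+", None, 1) is None.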
orExp ::= "(" exp "or" exp ")"
I -> S1-exp -> S2-exp -> setVal -> T, with a shortcut arrow from S1-exp to setVal labeled (S1-exp.val = true)
attr staticType == CalculateType("or", S1-exp.staticType, S2-exp.staticType)
condition staticType.defined
@setVal: if S1-exp.val then val := true else val := S2-exp.val endif
Fig. 51: Montage orExp of language ExpV1

Conditional Expression The conditional expression cExp is specified in Figure 53. Control initially enters the first expression; if it evaluates to true, control flows along the upper arrow to the second expression, otherwise control flows along the lower arrow to the third expression. From those expressions control flows into the setVal-action. This action updates the attribute val.

divExp ::= "(" exp "/" exp ")"
I -> S1-exp -> S2-exp -> setVal -> T, with an arrow from S2-exp to the divisionBy0-action labeled (S2-exp.val = 0)
attr staticType == CalculateType("/", S1-exp.staticType, S2-exp.staticType)
condition staticType.defined
@setVal: val := S1-exp.val / S2-exp.val
@divisionBy0: handleError("ArithmeticException")
Fig. 52: Montage divExp of language ExpV1

cExp ::= "(" exp "?" exp ":" exp ")"
I -> S1-exp; S1-exp -> S2-exp if S1-exp.val = true; S1-exp -> S3-exp if S1-exp.val = false; S2-exp, S3-exp -> setVal -> T
attr staticType == lcst(S2-exp.staticType, S3-exp.staticType)
condition staticType.defined AND S1-exp.staticType = "boolean"
@setVal: val := (if S1-exp.val then S2-exp.val else S3-exp.val)
Fig. 53: Montage cExp of language ExpV1

[Figure 54 is a feature-roster table: for each ExpV1 concept (Program, the expression synonym exp, the literal synonym lit, the literals Number and Boolean, and the expressions uExp, bExp, cExp) it marks, per language from ExpV1 to FraV3, whether the feature is introduced (i), refined (r), or used (u).]
Fig.
54: Roster of ExpV1 features and their introduction (i), refinement (r), and use (u) in the different languages

9.2 Reuse of ExpV1 Features

Figure 54 displays the so-called "feature roster" of ExpV1, showing which languages are reusing and refining the ExpV1-features. In the first column, the symbols of the features are listed. After a short description, there is one column per language, and for each feature it is marked whether it is (i) introduced, (r) refined, or (u) used by a language. The column of ExpV1 shows of course an (i) for each feature, since ExpV1 is the first language we define. The remaining columns show that all features, with the exception of exp and Program, are used without refinement by all other languages. The symbol exp is a synonym for expressions, and is of course refined each time a new expression is introduced. The symbol Program is only used to make each example language a testable, complete language. It is therefore different for each language. The feature roster is shown for each example language in order to visualize the high level of exact reuse and modularity of our specifications.

10 Models of Control Flow Statements

In this chapter we introduce the concept of statements and their sequential execution. We start with an example language ImpV1 featuring a simple print statement and the if-then-else statement. The basic concept of a statement is a program construct that can be executed, and through its execution it has effects or changes the state; in contrast, an expression is a program construct that can be evaluated, and through its evaluation delivers a result. Thus programming languages without state, e.g. pure functional languages, do not feature statements. On the other hand, in most imperative and object oriented languages the evaluation of expressions may change the state as well; thus the evaluation of expressions in such languages delivers a result and changes the state.
Montages is especially well suited for languages with such "impure" expression concepts.

[Figure 55 is the feature roster of ImpV1: the ExpV1 features (Program, exp, lit, Number, Boolean, uExp, bExp, cExp) are reused or refined, and the new concepts stm (synonym for statements), block (sequential block of statements), printStm (the print statement), and ifStm (the if statement) are introduced (i) in ImpV1 and refined (r) or used (u) by the later languages.]
Fig. 55: Roster of ImpV1 features and their introduction (i), refinement (r), and use (u) in the different languages

In Section 10.1 we show the grammar, an example program, the Montages, and the feature roster of language ImpV1. A number of additional control statements are shown in Section 10.2: switch, while, repeat, and for statements. Later, in Chapter 14, the while and repeat statements will be refined with versions allowing for break and continue.

10.1 The Example Language ImpV1

We use again a small example language to explain the new constructs. The language is called ImpV1 and its grammar is

Gram. 11:
Program  =   block
block    ::= "{" { stm } "}"
stm      =   ";" | printStm | ifStm | block
printStm ::= "print" exp ";"
ifStm    ::= "if" exp block [ "else" block ]
exp      =   (see Gram. 10)

where the definitions for the expressions are inherited from ExpV1 in Chapter 9, Grammar 10 and the Montages in Figures 46 through 53.

The Block Statement The Montage for the block statement of ImpV1 is given in Figure 56. The list of stm-components is represented graphically by the special list-box. By default the components of the list are connected sequentially by flow arrows. Thus the control flow enters the list at the first element and traverses it sequentially.

block ::= "{" { stm } "}"
I -> LIST of S-stm -> T
Fig. 56: Montage block of language ImpV1

The Print Statement The print statement is specified by the printStm-Montage (Figure 57).
The dynamic semantics of the print statement first evaluates the exp-component, and then in the printIt state the val-attribute of the exp-component is sent to the standard output.

printStm ::= "print" exp ";"
I -> S-exp -> printIt -> T
@printIt: stdout := "Printing: " + S-exp.val + " "
Fig. 57: Montage printStm of language ImpV1

The If-Then-Else Statement The if-statement of our example language is specified in Figure 58. In order to avoid the usual "dangling-if" problem, the if-syntax of ImpV1 forces the user always to give a block enclosed in curly brackets. The else-part is made optional. The control-flow specification is similar to that found in the cExp-Montage, Figure 53. The control enters the if construct at the exp-component, and then, depending on whether the expression evaluates to true or false, control flows to the first or the second block.

ifStm ::= "if" exp block ["else" block]
I -> S-exp; S-exp -> S1-block if S-exp.val = true; S-exp -> S2-block if S-exp.val = false; S1-block, S2-block -> T
condition S-exp.staticType = "boolean"
Fig. 58: Montage ifStm of language ImpV1

10.2 Additional Control Statements

The following statements are sketched in order to demonstrate the compactness and ease of readability of Montages of different control flow statements.

The While and Repeat Statements In Figure 59 we present a simplified version of the while-statement of Section 3.1, Figure 10, where the domain specific action is left away and ExpV1-style typing is added. This Montage is closely related to the "repeat..until"- or "do..while"-statement shown in Figure 60. Comparing the two Montages we see how a subtle difference in the semantics of while and repeat is visually documented.

whileStm ::= "while" exp block
I -> S-exp; S-exp -> S-block if S-exp.val = true, otherwise S-exp -> T; S-block -> S-exp
condition S-exp.staticType = "boolean"
Fig.
59: Montage whileStm of language ImpV1

doStm ::= "do" block "until" exp ";"
I -> S-block -> S-exp; S-exp -> T if S-exp.val = true, otherwise S-exp -> S-block
condition S-exp.staticType = "boolean"
Fig. 60: Montage doStm of language ImpV1

A Simple For Statement In Figure 61 a very simple for-statement is shown. Two integer expressions are given, and then the block is repeated x times, where x is the difference between the two expressions. The val-attribute is used to remember how many times the loop has already been executed. This example is given to show how easily a new iteration construct can be specified, and how near the specification techniques for the semantics are to common programming techniques. The way the val-attribute is used to count the repetitions is very similar to the way a programmer would solve the same problem.

forStm ::= "for" exp "to" exp block
I -> initVal; S-block is entered as long as val > 0, with the decVal-action executed after each repetition; otherwise -> T
condition (S1-exp.staticType = "int") andthen (S2-exp.staticType = "int")
@initVal: if S1-exp.val > S2-exp.val
          then val := S1-exp.val - S2-exp.val
          else val := S2-exp.val - S1-exp.val endif
@decVal: val := val - 1
Fig. 61: Montage forStm of language ImpV1

The Switch Statement The switch statement is a kind of more powerful conditional statement. Depending on the value of an expression, the statement "switches" to one of different statements marked by labels. The statements following the selected statement are executed as well, a behavior called "fall through". The following EBNF productions extend Grammar 11 with a switch-statement.

Gram. 12: (refines Grammar 11)
stm              =   ... | switchStm
switchStm        ::= "switch" exp "{" { switchLabelOrStm } "}"
switchLabelOrStm =   ";" | stm | defaultLabel | caseLabel
defaultLabel     ::= "default" ":"
caseLabel        ::= "case" Number ":"

In Figures 62, 63, and 64 the Montages for switchStm, defaultLabel, and caseLabel are given.
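The subtle semantic difference documented by the while and do..until Montages above — while tests the condition before the block, do..until visits the block once before the first test — can be made concrete with a small Python sketch (illustrative only; the names run_while and run_do_until are hypothetical and not part of the thesis):

```python
def run_while(test, body, state):
    """while-loop: the exp-component is visited before the block."""
    while test(state):
        state = body(state)
    return state

def run_do_until(test, body, state):
    """do..until-loop: the block is visited once before the first test;
    the loop terminates as soon as the condition becomes true."""
    while True:
        state = body(state)
        if test(state):
            return state

# With an initially satisfied exit condition, while skips the body
# entirely, whereas do..until still executes it once.
dec = lambda s: s - 1
assert run_while(lambda s: s > 0, dec, 0) == 0     # body never entered
assert run_do_until(lambda s: s <= 0, dec, 0) == -1  # body entered once
```

The two functions differ only in where the test sits relative to the body, just as the two Montages differ only in which component the initial arrow enters first.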
The components of the switchStm-Montage are an expression, the exp-component, and a list of components being statements or labels. Some control arrows in this Montage use the src and trg functions, which in arrow-labels denote the origin and target nodes of the arrow. Furthermore, two arrows go not to the list-node, but to the node inside the list-node. These arrows denote a family of arrows, one to each component of the list. The control flows first through the exp-component. From there, a family of flow-arrows labeled trg.hasLabel(src.val) leads to the components in the list. Such a label evaluates to true only if the target of the corresponding arrow is a caseLabel and if that label has a constant value equivalent to the just evaluated exp-component. If the control cannot flow along any of these arrows, it flows to the default-action. From there sources another family of arrows leading to the components of the list. The flow condition trg.isDefault on these arrows leads control directly to the default label in the list. If there is no such label, control flows to the T-action. If any of the discussed arrows led control into the list, all remaining components of the list are executed sequentially. This property is called "fall-through", and typically it is expected that in most cases an explicit "jump"[1] is used to break out of the switch without falling through all the remaining cases. In our little language, jumping out is not possible.

[1] Break or continue.

switchStm ::= "switch" exp "{" { switchLabelOrStm } "}"
switchLabelOrStm = ";" | stm | defaultLabel | caseLabel
I -> S-exp; a family of arrows from S-exp into the LIST of S-switchLabelOrStm components, labeled trg.hasLabel(src.val); otherwise S-exp -> default; a family of arrows from default into the list, labeled trg.isDefault; otherwise default -> T; the list -> T
Fig. 62: Montage switchStm of language ImpV1

defaultLabel ::= "default" ":"
I -> T
attr constantVal == "default"
attr isDefault == true
Fig. 63: Montage defaultLabel of language ImpV1
caseLabel ::= "case" Number ":"
I -> T
attr constantVal == S-Number.constantVal
attr staticType == S-Number.staticType
attr hasLabel(l) == constantVal = l
condition constantVal.defined
Fig. 64: Montage caseLabel of language ImpV1

11 Models of Variable Use, Assignment, and Declaration

Unlike a mathematical variable, which serves as a placeholder for values, a variable in imperative and object oriented programming languages is a kind of box which is used to store a value. The value stored in the box is called the value of the variable. The action of exchanging the content of the box is called variable update or variable assignment. After a variable v has been updated with value x, the content, or value, of v remains x until the next update of v. In expressions a variable can be used like a constant.

Modeling variables in XASM can be done in a number of different ways. The simplest, but most inflexible, choice is to model each variable as a 0-ary dynamic function. This solution has already been explained in the introduction of Part II, where we also discussed the advantages of using this model in combination with partial evaluation. In Section 11.1 we present a full example language with global variables, ImpV2, based on this solution. The disadvantage of this first solution is that two incarnations of a variable named "x" cannot coexist, since the name of the variable is used as its identity. A pattern to solve this problem is the Declarator-Reification pattern, which uses the declaration of a variable as its identity. Combining this pattern with the Tree-Structural-Approach pattern then allows one to easily introduce several nested scopes. This solution to variable use and declaration is presented in form of the example language ImpV3 in Section 11.2. This language features nested blocks of statements with nested scopes of variable names. The advantages of this second model are ease of animation and ease of specification.
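The tree based lookup that makes such nested scopes work (described abstractly as the ASMs enclosing and lookUp in Section 5.3.3) amounts to walking up the parent chain until an enclosing scope declares the name; a hypothetical Python rendering, not the thesis's XASM notation:

```python
class TreeNode:
    """Minimal parse-tree node; 'decls' is only non-empty for scope nodes
    such as block (cf. the declTable attribute of the block-Montage)."""
    def __init__(self, parent=None, decls=None):
        self.parent = parent
        self.decls = decls or {}   # name -> declaration node (the var-node)

def look_up(node, name):
    """Move up the tree until a matching declaration is found; an
    undeclared name yields None (standing in for undef)."""
    while node is not None:
        if name in node.decls:
            return node.decls[name]
        node = node.parent         # cf. ASM enclosing: next enclosing scope
    return None

outer = TreeNode(decls={"x": "outer-x"})
inner = TreeNode(parent=outer, decls={"y": "inner-y"})
use_node = TreeNode(parent=inner)
assert look_up(use_node, "y") == "inner-y"
assert look_up(use_node, "x") == "outer-x"  # found in the enclosing block
assert look_up(use_node, "z") is None       # undeclared name
```

Because the declaration node itself is returned, two declarations named "x" in different blocks remain distinct objects, which is exactly the point of the Declarator-Reification pattern.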
Further, it may be an advantage that parameterized XASMs are not needed for this kind of model. In general, PXasm are used in the rules and declarations if a special kind of production code has to result, and they are not needed for an abstract model serving as prototype and documentation of the language.

Finally, in Section 11.3 the variable model is refined using the Field-Of-Object-Mapping pattern. The declarations of the variables are interpreted to be fields of a constant element Global. In addition we extend the specification of the assignment construct such that it can model both assignments to simple variables and assignments to variables calculated by expressions. Such assignable expressions are called use, and if they are evaluated they evaluate not only their value, or right value, but also the variable, or left value. The same pattern is then used in the next two chapters to model an object oriented example language and recursive procedure calls.

11.1 ImpV2: A Simple Name Based Variable Model

In this section we define the example language ImpV2 by extending ImpV1 with simple, name based models for variable update and use. Using the symbol asgnStm for the variable-update, and the symbol use for the variable-use in expressions, we extend the grammar rules stm and exp as follows, reusing the other definitions of Grammar 11. The "..." notation in synonym productions denotes that all choices of the predecessor language are reused and extended with some additional synonyms.[1]

Gram. 13: (refines Grammar 11)
stm     =   ... | asgnStm
exp     =   ... | use
asgnStm ::= id "=" exp ";"
use     =   id

An overview of the features and their reuse/refinement is given in ImpV2's feature roster, Figure 65. The two new constructs use and asgnStm are going to be refined twice, in ImpV3 and in ObjV1.
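The name based model of ImpV2, where each variable is a 0-ary dynamic function initialized to 0, behaves like one global table keyed by the variable's name. The following Python sketch (hypothetical names; not the thesis's PXasm notation) also makes the model's limitation visible:

```python
# One 0-ary dynamic function per variable name, all initialized to 0.
variables = {}

def init_vars(names):
    for s in names:
        variables[s] = 0     # corresponds to: function $s$ <- 0

def do_asgn(signature, value):
    """Update the function named after the string value of signature."""
    variables[signature] = value

def read_var(signature):
    """Read the value of the 0-ary function named by signature."""
    return variables[signature]

init_vars(["x", "y"])
do_asgn("x", 7)
assert read_var("x") == 7
assert read_var("y") == 0
# Limitation of the name based model: two incarnations of "x" cannot
# coexist, since the name itself serves as the variable's identity.
```

The single flat dictionary is precisely why nested scopes are impossible here, motivating the tree based refinement of ImpV3.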
Declarations Variables in ImpV2 need not be declared, and each used or assigned variable is directly modeled as a 0-ary dynamic XASM function which is initialized to 0. The PXasm declaration of those functions for all used variables is given as follows.

Decl. 4:
(for all s in String:
   (exists a in asgnStm: a.S-id.Name = s) or (exists u in use: u.Name = s)
 function $s$ <- 0
)

[1] As we mentioned in the introduction, we would need to extend Montages with inheritance mechanisms to formalize the notions of "reused" and "extended", but have not done this.

[Figure 65 is the feature roster of ImpV2: use and asgnStm are introduced (i) here and later refined (r) in ImpV3 and ObjV1, while the features inherited from ExpV1 and ImpV1 are reused.]
Fig. 65: Roster of ImpV2 features and their introduction (i), refinement (r), and use (u) in the different languages

Assignment Statement The specification of syntax and semantics of asgnStm is shown in Figure 66. The attribute signature is introduced for readability, and denotes the identifier representing the variable to be updated. The control flows through the asgnStm by first evaluating the expression and then triggering the doAsgn-action, which performs the update of the function named after the string value of signature. The $ operator is used to refer to the 0-ary function corresponding to the value of signature.

asgnStm ::= id "=" exp ";"
I -> S-exp -> doAsgn -> T
attr signature == S-id.Name
@doAsgn: $signature$ := S-exp.val
Fig.
66: Montage asgnStm of language ImpV2

Use Expression The use-Montage in Figure 67 consists mainly of the readVar-action, which sets the val-attribute of the use-expression to the value of the 0-ary function whose signature corresponds to the value of the signature-attribute.

use = id
I -> readVar -> T
attr signature == Name
@readVar: val := $signature$
Fig. 67: Montage use of language ImpV2

11.2 ImpV3: A Refined Tree Based Variable Model

In this section we define the language ImpV3 featuring a refined tree based model for variable declaration, use, and update. Variables must be declared prior to their use. The feature roster in Figure 68 shows that in addition block and the block-statements bstm are introduced and reused without further refinement by all following languages. The grammar of ImpV3 is given by extending and refining the definitions of Grammar 13.

Gram. 14: (refines Grammar 13)
stm   =   ... | block
block ::= "{" { bstm } "}"
bstm  =   var | stm
var   ::= type id ";"
type  =   "int" | "boolean"

A declaration consists of the type and the name of the variable. Variables are represented by the node being their declaration in the program. Blocks can contain variable declarations and can be nested, e.g. a block can contain another block. The nesting of blocks defines so-called scopes or name spaces.

The var-Montage (Figure 69) and the type-Montage (Figure 70) contain only attribute definitions. In the var-Montage, for instance, the signature-attribute returns the name of the variable, and the staticType-attribute returns the static type of the type-node. These attributes are used for basic type checks in ImpV3-programs. The dynamic semantics of var does nothing, a situation which is here explicitly specified with a state "skip" having no action associated. This "skip" behavior is the default behavior of a Montage if no states and arrows are given.
[Figure 68 is the feature roster of ImpV3: block (list of block statements), bstm (statement or variable declaration), var (variable declaration), and type (the types of the language) are introduced (i); use and asgnStm are refined (r); the remaining features are used (u) unchanged by the following languages.]
Fig. 68: Roster of ImpV3 features and their introduction (i), refinement (r), and use (u) in the different languages

var ::= type id ";"
I -> skip -> T
attr signature == S-id.Name
attr staticType == S-type.staticType
Fig. 69: Montage var of language ImpV3

type = "int" | "boolean"
attr staticType == Name
Fig. 70: Montage type of language ImpV3

As mentioned, the block-statement may contain not only statements, but also variable declarations. The block-Montage in Figure 71 links the execution of the mixed statement and variable list sequentially. The unary attribute declTable( )

Attr. 1:
attr declTable(n, id) ==
  (choose v in sequence n.S-bstm:
     (v.var) AND (v.signature = id))

returns a var-component of the bstm-list whose signature equals the argument of declTable( ); if no such component exists, it returns undef.

block ::= "{" { bstm } "}"
bstm  =   stm | var
I -> LIST of S-bstm -> T
attr declTable(n) ==
  (choose v in sequence S-bstm:
     (v.var) AND (v.signature = n))
Fig. 71: Montage block of language ImpV3

The attribute declTable is used by the function lookUp( , ), which has been introduced in Section 5.3.3 as ASM 21, and which uses the ASM 20, enclosing( , ). The ASM enclosing in turn relies on an appropriate definition of Scope, a set of Montages-names serving as scopes.
For our Grammar 14 the correct definition of Scope is

Decl. 5: derived function Scope == {"block"}

the set consisting of the single string element "block". The new versions of the use and asgnStm Montages in Figures 72 and 73 both contain the attribute definition

Attr. 2: attr decl == lookUp(signature)

for accessing the identity of the variable. Read and write accesses to the variable are then done by updating and reading the unary dynamic function val( ). In other words, expressions and variable declarations in the abstract syntax tree are interpreted as objects whose value is given by the attribute val. The difference is that expression values are updated only implicitly during their evaluation, whereas variable values are updated explicitly using an assignment statement.

use = id
  I readVar T
  attr signature == Name
  attr decl == lookUp(signature)
  attr staticType == decl.staticType
  condition lookUp(Name).defined
  @readVar: val := decl.val

Fig. 72: Montage use of language ImpV3

asgnStm ::= id "=" exp ";"
  I S-exp doAsgn T
  attr signature == S-id.Name
  attr decl == lookUp(signature)
  condition (S-exp.staticType) = (decl.staticType)
  @doAsgn: decl.val := S-exp.val

Fig. 73: Montage asgnStm of language ImpV3

11.3 ObjV1: Interpreting Variables as Fields of Objects

A further refinement of the model of the last section is given by language ObjV1, which directly uses the Field-Of-Object-Mapping pattern, modeling the global variables, i.e. the reification of their declarations, as fields of a constant Global. The grammar remains unchanged. The new declarations for val, fieldOf, and Global are given as follows.

Decl. 6:
  function fieldOf(_,_)
  constructor Global
  derived function val(n) == n.fieldOf(Global)

Through the redefinition of val as a field of the constant Global we can reuse the existing use and asgnStm Montages (Figures 72 and 73) without any change.
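The interplay of Scope, enclosing, and lookUp described above can be sketched as follows (a hypothetical Python rendering, not part of Gem-Mex; it assumes nodes carry a parent link and scope nodes carry a declaration table):

```python
SCOPE = {"block"}  # derived function Scope == {"block"}

def enclosing(node, scope_names):
    # climb parent links until a node whose symbol is in Scope is found,
    # or None if there is no enclosing scope
    n = node["parent"]
    while n is not None and n["symbol"] not in scope_names:
        n = n["parent"]
    return n

def look_up(node, name):
    # consult the declTable of each enclosing scope, innermost first
    scope = enclosing(node, SCOPE)
    while scope is not None:
        decl = scope["declTable"].get(name)
        if decl is not None:
            return decl
        scope = enclosing(scope, SCOPE)
    return None  # undef: no declaration in any enclosing scope
```

A use-node nested in two blocks thus finds a declaration in the outer block when the inner block declares nothing of that name.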
During the whole specification process we found that there are many instances of exact reuse in Montages, and therefore we have neglected more advanced reuse features such as inheritance. To enable exact reuse in later languages we now introduce two equivalent, refined definitions of use and asgnStm. The new definition of the use-Montage (Figure 74) is semantically equivalent to the old one, but explicitly defines two attributes lObject and lField. These attributes serve as the interface for accessing left values of expressions. The right value is given by the already given definition of the val-attribute. The refined specification of the assignment Montage is given in Figure 75. This version of the assignment works with arbitrarily complex use-expressions on the left, as long as the evaluation of this expression results in defining its lObject and lField attributes. The action

ASM 64:
  let o = S-use.lObject
      f = S-use.lField
  in
    f.fieldOf(o) := S-exp.val
  endlet

then works generically for assignments to global variables, local variables, and instance variables. In the feature roster of Figure 68 we see that the refined versions of use and asgnStm are reused as they are in all remaining languages, with the exception of ObjV2, which is a successor of ObjV1 but not a predecessor of the other languages. In the next two chapters we show other applications of the Field-Of-Object-Mapping pattern, one for modeling classes, instances, and instance fields, and one for modeling procedures, recursive-calls, parameters, and variables.

use = id
  I readVar T
  attr decl == lookUp(Name)
  attr staticType == decl.staticType
  attr lObject == Global
  attr lField == decl
  condition decl.defined
  @readVar: val := lField.fieldOf(lObject)

Fig.
74: Montage use of language ObjV1

asgnStm ::= use "=" exp ";"
  I S-use S-exp doAsgn T
  condition (S-exp.staticType) = (S-use.staticType)
  @doAsgn:
    let o = S-use.lObject
        f = S-use.lField
    in
      f.fieldOf(o) := S-exp.val
    endlet

Fig. 75: Montage asgnStm of language ObjV1

12 Classes, Instances, Instance Fields

In this chapter we present the language ObjV2, a simple "object oriented" language featuring classes, inheritance, instance fields, and their dynamic binding. Many main-stream languages like Java feature only dynamic binding of methods, while instance fields are statically bound; our choice to present a language without methods but with dynamically bound instance fields allows us to present key features of object oriented languages in a minimal setting. To specify ObjV2 we extend ObjV1 by refining two out of 17 existing Montages and adding six new Montages. Four Montages are introduced to build the syntax for class and field declaration, two to define the new kinds of types. The use- and asgnStm-Montages are refined in order to take into consideration the differences between variable and field accesses and updates. Finally we define two new expressions, the newExp for creating objects, and a cast for casting the dynamic type of an object, in order to allow access to the overridden fields of its super-classes. The grammar of ObjV2 is given as follows.

Gram. 15: (refines Grammar 14)
  Program          ::= { classDeclaration } body
  classDeclaration ::= "class" id [ "extends" superId ] "{" { fieldDeclaration } "}"
  superId          = id
  fieldDeclaration ::= type id ";"
  type             = primitiveType | typeRef
  primitiveType    = "int" | "boolean"
  typeRef          = id
  exp              = ... | newExp | cast
  newExp           ::= "new" typeRef
  cast             ::= "cast" "(" typeRef "," use ")"

Fig. 76: Roster of ObjV2 features and their introduction (i), refinement (r), and use (u) in the different languages. [The matrix of i/r/u marks per concept and language is not reproduced here; the new concepts are classDeclaration (declaration of an OO class), fieldDeclaration (declaration of object fields), primitiveType (synonym of primitive types), typeRef (references to class types), newExp (expression for creation of new objects), and cast (casting of the dynamic object type).]

12.1 ObjV2 Programs

The start symbol Program of ObjV2 is given in Figure 77. The attribute declTable( ) of this Montage maps identifiers to the corresponding class-declaration nodes, which model the classes. The control enters directly the block of statements; the list of class declarations need not be visited.

Program ::= { classDeclaration } block
  I S-block T
  attr staticType == Name

Fig. 77: Montage Program of language ObjV2

The possible scopes used by lookUp and enclosing now include Program in addition to block. Therefore Declaration 5 is refined as follows.

Decl. 7: derived function Scope == {"block", "Program"}

12.2 Primitive and Reference Type

In language ImpV3 we introduced built-in types, namely integers and booleans. We defined the attribute staticType for expressions and introduced simple type checks. In object-oriented languages the definition of classes, or reference types, allows the user to introduce new types.
The existing built-in types are called primitive types, since the values of these types have no internal structure. In ObjV2 there exist the primitive, built-in types integer and boolean, and the user-defined classes. The existence of different kinds of types raises the question of how they can be treated in a uniform way, in order to make type checking and variable declarations simple. In ObjV2 we model all types as elements: the primitive types are represented by the string values corresponding to their names, and the reference types are represented by their declaration-nodes in the syntax tree. The type-production has two synonyms, primitiveType as specified in Figure 78, and typeRef as specified in Figure 79. The staticType attribute of the first points to the name of the primitive type, and the staticType definition of the second points to the corresponding class-declaration, which is retrieved using the lookUp function.

primitiveType = "int" | "boolean"
  attr staticType == Name

Fig. 78: Montage primitiveType of language ObjV2

typeRef = id
  attr signature == Name
  attr staticType == lookUp(signature)
  condition staticType.defined AND (staticType.classDeclaration)

Fig. 79: Montage typeRef of language ObjV2

Type references are specified in Figure 79. Their static semantics guarantees that their staticType attribute refers to a class declaration. The definition of staticType of type references uses the lookUp function introduced earlier.

classDeclaration ::= "class" id [ "extends" superId ] "{" { fieldDeclaration } "}"
superId = typeRef
  attr signature == S-id.Name
  attr superType == S-superId.staticType
  attr declTable(n) ==
    (choose f in sequence S-fieldDeclaration: f.signature = n)
  attr fieldTable(n) ==
    (if declTable(n).defined then declTable(n)
     else (if superType = undef then undef
           else superType.fieldTable(n)))

Fig. 80: Montage classDeclaration of language ObjV2

12.3 Classes and Subtyping

Classes are specified in Figure 80.
The first component of a class is an identifier, denoting the name of the class. This name is accessible as the attribute signature. The second component is an optional type reference to the super type of the class. The attribute superType of a class refers directly to the static type of the type reference to the super type, i.e. to the class declaration of the super type. Based on the definition of the attribute superType, we can now define the sub-typing relation subtypeOf( , ). The term subtypeOf(a, b) or a.subtypeOf(b) evaluates to true if either a and b are equal, or if a.superType is defined and this super type is a subtype of the second argument.

Decl. 8: derived function subtypeOf(t1, t2) ==
  (t1 = t2) OR
  (t1.superType.defined AND t1.superType.subtypeOf(t2))

Finally, the last component of a class is a list of field declarations. Each field, as specified in Figure 81, has two attributes, the signature attribute referring to the field's name, and the staticType attribute referring to the field's type. Coming back to the class declaration, there are two attributes to refer to the fields, both taking the field-name as argument. The first, declTable( ), returns a field-declaration node from the class's list of field-declarations, if one of these declarations matches a given field-name; otherwise it returns undef. The attribute fieldTable( ) collects field declarations from the class and its super-classes. It tries to find a field-declaration using the previously defined declTable. If no field is found in the declTable of the class itself, the field table of the super-type is evaluated, if a super-type exists. Otherwise undef is returned.

fieldDeclaration ::= type id ";"
  attr signature == S-id.Name
  attr staticType == S-type.staticType

Fig. 81: Montage fieldDeclaration of language ObjV2

12.4 Object Creation and Dynamic Types

As mentioned earlier, we model objects as ASM-elements.
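Decl. 8 and the two table attributes of classDeclaration can be sketched as follows (a hypothetical Python rendering; class declarations are modeled as plain objects, and field-declaration nodes as tuples):

```python
class ClassDecl:
    def __init__(self, name, super_type=None, field_names=()):
        self.signature = name
        self.super_type = super_type              # attr superType
        # declTable: the class's own field-declaration nodes by field name
        self.decl_table = {f: (name, f) for f in field_names}

def subtype_of(t1, t2):
    # derived function subtypeOf(t1, t2): equal, or the super type
    # is defined and is itself a subtype of t2
    if t1 is t2:
        return True
    return t1.super_type is not None and subtype_of(t1.super_type, t2)

def field_table(cls, name):
    # attr fieldTable(n): own declTable first, then the super-class chain,
    # otherwise None (undef)
    if name in cls.decl_table:
        return cls.decl_table[name]
    if cls.super_type is not None:
        return field_table(cls.super_type, name)
    return None
```

For a class B extending A, field_table finds inherited fields of A through the recursive call on the super type.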
A universe ObjectID( ) of all elements being objects in the specified language is introduced, and a dynamic function dynamicType( ) is used to keep track of the types of the created objects.

Decl. 9:
  universe ObjectID
  function dynamicType(_)

In the newExp-Montage (Figure 82) the specification of the object creation construct is given. The "createObject"-action creates a new member o of the ObjectID universe, sets the dynamic type of o to the static type of the new-clause, and sets the val-attribute of the new-clause to o.

newExp ::= "new" typeRef
  I createObject T
  attr staticType == S-typeRef.staticType
  @createObject:
    extend ObjectID with o
      o.dynamicType := staticType
      val := o
    endextend

Fig. 82: Montage newExp of language ObjV2

12.5 Instance Fields

The instance fields of objects in ObjV2 are modeled as the field-declarator nodes, which are linked to the dynamic type of the object via the fieldTable attribute. The values of such fields are modeled using the dynamic function fieldOf( , ). Once the field-declarator node lField of an object lObject is known, the value of that field is read as the expression

  lField.fieldOf(lObject)

and it is set to a new value v by the update

  lField.fieldOf(lObject) := v

12.6 Dynamic Binding

Which field of an object is read or written is determined dynamically, depending on the dynamic type of an object o, given by the expression o.dynamicType. Given a field-name f and an object o, the field is determined by

  o.dynamicType.fieldTable(f)

In the following, each Montage of an assignable expression, i.e. an expression that can be on the left-hand side of an assignment, has attribute definitions lObject, denoting the so-called left object, and lField, the left field.
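Putting Decl. 9, the createObject-action, and the fieldOf-accesses together, a minimal Python sketch of dynamically bound field access might look like this (names are hypothetical; classes are reduced to dicts with a super link and a field table):

```python
classes = {
    "A": {"super": None, "fields": {"x": "A.x"}},
    "B": {"super": "A", "fields": {"x": "B.x"}},  # B hides field x of A
}

def field_table(cname, f):
    # attr fieldTable: own fields first, then the super-class chain
    c = classes[cname]
    if f in c["fields"]:
        return c["fields"][f]
    return field_table(c["super"], f) if c["super"] else None

object_id = []      # universe ObjectID
dynamic_type = {}   # dynamic function dynamicType(_)
field_of = {}       # dynamic function fieldOf(_,_)

def create_object(static_type):
    # @createObject: extend ObjectID with o; o.dynamicType := staticType
    o = object()
    object_id.append(o)
    dynamic_type[o] = static_type
    return o

def write_field(o, f, v):
    # lField.fieldOf(lObject) := v, with lField bound via the dynamic type
    field_of[(field_table(dynamic_type[o], f), o)] = v

def read_field(o, f):
    return field_of.get((field_table(dynamic_type[o], f), o))
```

Because lField is looked up in the field table of the object's dynamic type, an object created as a B reads and writes B.x rather than the hidden A.x.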
Assigning a value v to such an assignable expression e is done by

  e.lField.fieldOf(e.lObject) := v

The use construct

The specification of the use-construct, which serves for variable uses and field accesses and as the left part of assignments, is complicated since it covers both simple variable accesses and the above sketched accesses to object fields. The complete specification is given in Figure 83. To simplify the explanations, we deduce by partial evaluation two specialized versions of the use-Montage, one for simple variable accesses and one for instance-field accesses.

In the case of a simple variable use, the useOrCast-component and the "." are not present and the attribute notNested evaluates to true. The specialized Montage for this case is called useVar and is given in Figure 84. Control flows directly to the setValAndType action. This action sets the val-attribute to the value of the referenced variable. The value of variables is stored as a field of left-object Global, and the left-field is looked up by lookUp(signature). The action uses the term lField.fieldOf(lObject) to read the value of the variable.

The case of field access is visualized by the Montage useField in Figure 85, which is again obtained by specializing the use-Montage. The attribute nestedUse points directly to the useOrCast-component, which is always present in this case. Control flows first into the useOrCast-component, being either again a use, or alternatively a cast. If after the evaluation of this component either its dynamic type is undefined, or no value results, control flows into the undefinedFieldAccess-action, triggering an "Access of undefined field"-error. We do not further specify how such an error is handled.

use ::= [useOrCast "."] id
  I S-useOrCast setValAndType T
  (src.dynamicType = undef) OR (src.val = undef) leads to undefinedFieldAccess
  attr signature == S-id.Name
  attr notNested == S-useOrCast.NoNode
  attr nestedUse == S-useOrCast
  attr lObject == (if notNested then Global else nestedUse.val)
  attr lField == (if notNested then lookUp(signature)
                  else lType.fieldTable(signature))
  attr lType == nestedUse.dynamicType
  condition (if notNested
             then lookUp(signature).defined AND lookUp(signature).var
             else true)
  @setValAndType:
    let v = lField.fieldOf(lObject) in
      val := v
      if not notNested then dynamicType := v.dynamicType endif
    endlet
  @undefinedFieldAccess: handleError("Access of undefined field.")

Fig. 83: Montage use of language ObjV2

useVar ::= id
  I setValAndType T
  attr signature == S-id.Name
  attr notNested == true
  attr nestedUse == undef
  attr lObject == Global
  attr lField == lookUp(signature)
  attr lType == undef
  condition lookUp(signature).defined AND lookUp(signature).var
  @setValAndType:
    let v = lField.fieldOf(lObject) in
      val := v
      dynamicType := v.dynamicType
    endlet

Fig. 84: Montage useVar of language ObjV2

useField ::= useOrCast "." id
  I S-useOrCast setValAndType T
  (src.dynamicType = undef) OR (src.val = undef) leads to undefinedFieldAccess
  attr signature == S-id.Name
  attr notNested == false
  attr nestedUse == S-useOrCast
  attr lObject == nestedUse.val
  attr lField == lType.fieldTable(signature)
  attr lType == nestedUse.dynamicType
  condition true
  @setValAndType:
    let v = lField.fieldOf(lObject) in
      val := v
      dynamicType := v.dynamicType
    endlet
  @undefinedFieldAccess: handleError("Access of undefined field.")

Fig. 85: Montage useField of language ObjV2
If the error does not occur, control flows into the setValAndType-action. lObject is the object, and lField the field to be accessed. As mentioned at the beginning of this section, lField is looked up in the field table of the dynamic type of the accessed object. To increase readability, the attribute lType is introduced, denoting the above used dynamic type of the field access.

The assignment statement

The asgnStm-Montage is given in Figure 86. As we can see, it is not needed to differentiate between variable and field use in this Montage. Further, it is possible to assign values both to the above described use-Montage, respectively its special cases useVar and useField, and to the cast-Montage described later. This property is achieved by using the definitions of the lObject and lField attributes as the interface for left values, as discussed earlier in Section 11.3. First control flows through the exp-component, resulting in the evaluation of its val-attribute, and then into the use or cast component, resulting in the evaluation of their lObject and lField attributes. Then the assignment is done in action doAsgn, or, if the types of the left and right side are not assignable, control flows to action wrongAssignment.

The exact definition of assignability in ObjV2 is that the dynamic type of the expression is assignable to the static type of the field or variable we are assigning to. In detail, the field or variable to which we assign is lUse.lField, thus the type of the left side, lType, is defined as lUse.lField.staticType. The attribute rType denotes the dynamic type of the exp-component, if defined, otherwise the static type. The condition for a correct assignment is that all instances of the rType are instances of the lType; in other words, the rType must be a subtype of the lType. This condition is given as the label of the control-arrow from the "S-exp"-box to the "doAsgn"-oval.
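The assignability condition just described can be sketched like this (Python; the super-link table and all names are hypothetical illustrations, not part of the Montages formalism):

```python
supers = {"Sub": "Base", "Base": None}  # hypothetical class hierarchy

def subtype_of(t1, t2):
    # iterative form of subtypeOf: climb the super chain from t1
    while t1 is not None:
        if t1 == t2:
            return True
        t1 = supers.get(t1)
    return False

def assignable(l_static_type, r_dynamic_type, r_static_type):
    # rType is the dynamic type of the exp-component if defined,
    # otherwise its static type; the assignment is legal iff
    # rType.subtypeOf(lType)
    r_type = r_dynamic_type if r_dynamic_type is not None else r_static_type
    return subtype_of(r_type, l_static_type)
```

Assigning a Sub-valued expression to a Base-typed field is thus accepted, while the converse is rejected at run time.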
In the case of correct dynamic types, the same action as in the ObjV1 version of asgnStm (Figure 75) is triggered, assigning to the lField of the lObject of the left-hand side the value of the right-hand-side expression.

asgnStm ::= useOrCast "=" exp ";"
  I S-useOrCast S-exp doAsgn T
  rType.subtypeOf(lType) leads to doAsgn, otherwise wrongAsignment
  attr lUse == S-useOrCast
  attr lType == lUse.lField.staticType
  attr rType == (if S-exp.dynamicType.defined
                 then S-exp.dynamicType
                 else S-exp.staticType)
  @doAsgn:
    let o = lUse.lObject
        f = lUse.lField
    in
      f.fieldOf(o) := S-exp.val
    endlet
  @wrongAsignment:
    handleError("This asignment is not valid, due to " +
                "dynamic type missmatch.")

Fig. 86: Montage asgnStm of language ObjV2

12.7 Type Casting

With the type-casting expression it is possible to change the dynamic type of an object to one of its super-types. This is needed, for instance, if a field of a subtype hides the definition of a field of a super-type. Hiding in this sense happens if the names of these fields are equal. Using the cast expression, the hidden field of the super-type can be read or written. The specification of the cast-expression is given in Figure 87. The dynamic type check in this Montage ensures that no field accesses happen on null objects, and that assignments are type correct with respect to the static type of the variable or field to which one is assigning. The values of the attributes lObject and lField are copied from the corresponding attributes of the use-component.

cast ::= "cast" "(" typeRef "," use ")"
  I S-use setValAndType T
  S-use.dynamicType.subtypeOf(staticType) leads to setValAndType, otherwise castError
  attr staticType == S-typeRef.staticType
  attr lObject == S-use.lObject
  attr lField == S-use.lField
  @setValAndType:
    val := S-use.val
    dynamicType := staticType
  @castError: handleError("CastError")

Fig. 87: Montage cast of language ObjV2

13 Procedures, Recursive-Calls, Parameters, Variables

In this chapter we introduce the example language ObjV3, featuring function calls, recursion, and call-by-value parameters, as well as local variables. The language is defined by extending and refining the definitions of ObjV1 (Section 11.3); the grammar is given as follows.

Gram. 16: (refines Grammar 14)
  Program      ::= { functionDecl } block
  exp          = ... | call
  stm          = ... | returnStm
  functionDecl ::= "function" id "(" { var } ")" ":" type body
  call         ::= id "(" [ actualParam { "," actualParam } ] ")"
  actualParam  = exp
  returnStm    ::= "return" exp ";"

13.1 ObjV3 Programs

The start-symbol of the grammar, Program, produces a list of function declarations and a block. The execution of an ObjV3 program is done by executing the block. This behavior is given in Figure 89. The same specification also defines the declaration table for accessing the functions, allowing to access a function from any point n in the program as n.enclosing("Program").declTable( ).

Fig. 88: Roster of ObjV3 features and their introduction (i), refinement (r), and use (u) in the different languages. [The matrix of i/r/u marks per concept and language is not reproduced here; the new concepts are functionDecl (procedure declaration), call (procedure call), actualParam (actual parameter of a call), and returnStm (return statement).]

Program ::= { functionDecl } block
  I S-block T
  attr declTable(n) ==
    (choose c in sequence S-functionDecl: c.signature = n)

Fig.
89: Montage Program of language ObjV3

13.2 Call Incarnations

The semantics of function calls is based on modeling function-call incarnations as elements of the universe INCARNATION. After creation, the current call incarnation is assigned to the dynamic function Incarnation. The new current incarnation is linked to the previous one by the dynamic function lastInc.

Decl. 10:
  universe INCARNATION
  function Incarnation
  function lastInc(_)

The simplest semantics based on this model calls a function by executing

  extend INCARNATION with i
    i.lastInc := Incarnation
    Incarnation := i
  endextend

and returns from the call by restoring the old value of Incarnation.

  Incarnation := Incarnation.lastInc

From these actions we omitted the details of how the call-statement is found once the called function terminates, how the parameters are passed, and how the result is returned. Before we come to these details we continue to investigate the properties of languages with recursive calls. In contrast to languages without recursion, expressions may have different values in different function-call incarnations, and therefore the definition of the attribute val is refined to the derived function

Decl. 11: derived function val(n) == n.fieldOf(Incarnation)

which stores and retrieves the value of a program-expression e as the value of field n of the object Incarnation, where n is the AST-node representing e, and Incarnation is the previously introduced current incarnation. In this way expressions have distinct values in distinct function-call incarnations, and at the same time the old val syntax can be used to calculate expressions within the current incarnation. On the other hand, the val-attribute cannot be used to pass information from one function-call incarnation to the next one, e.g. for passing formal parameters and returning call results. This will be done by using a simple variable RESULT, which is just a 0-ary dynamic ASM function.
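The incarnation model can be sketched as follows (a hypothetical Python rendering in which incarnations are objects linked by a last_inc attribute, and expression values are stored per incarnation):

```python
class Inc:
    """A member of the universe INCARNATION."""
    pass

incarnation = Inc()   # 0-ary dynamic function Incarnation
field_of = {}         # dynamic function fieldOf(_,_)

def enter_call():
    # extend INCARNATION with i; i.lastInc := Incarnation; Incarnation := i
    global incarnation
    i = Inc()
    i.last_inc = incarnation
    incarnation = i

def return_from_call():
    # Incarnation := Incarnation.lastInc
    global incarnation
    incarnation = incarnation.last_inc

def set_val(node, v):
    field_of[(node, incarnation)] = v

def val(node):
    # derived function val(n) == n.fieldOf(Incarnation)
    return field_of.get((node, incarnation))
```

Because val is keyed on the current incarnation, a recursive call sees fresh storage for the same AST node, and the outer value reappears after the return.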
13.3 Semantics of Call and Return

As mentioned, there are two points where information must be passed across incarnations: once when a call is triggered and the formal parameters of the function declaration must be actualized, and once when the result of the terminating call is returned. For passing information from one incarnation to another we use a simple 0-ary dynamic function called RESULT. The RESULT function is used in the current example language and in the following languages whenever information is passed along the control flow.

In the Montage for the call construct (Figure 90) we can see the action prepareCall, which executes the above outlined rule for creating a new call incarnation and sets RESULT to the actual parameters. Finally, the current call-node self is assigned to the field ReturnPoint of the newly created function-call incarnation. Then the call-Montage sends control to a function declaration: control flows to the function declaration denoted by the decl-attribute of the call-Montage. Once control has entered the function-declaration Montage (Figure 91), the actual parameters are passed to the formal ones, and the RESULT-function is reset to undef. If in the body of the function declaration a return statement (Montage in Figure 92) is reached, the RESULT-function is set to the value of the returned expression, and control is sent to the finishCall-action of the call-instance stored in the field ReturnPoint of the current incarnation. The XASM declarations for the described processes are given as follows.

Decl. 12:
  function RESULT
  constructor ReturnPoint
  external function PassParameters(_,_)
  derived function Scope == {"block", "functionDecl"}
call ::= id "(" [ actualParam { "," actualParam } ] ")"
actualParam = exp
  I prepareCall (LIST S-actualParam) T
  trg = decl (functionDecl); finishCall; setVal
  attr signature == S-id.Name
  attr decl == enclosing("Program").declTable(signature)
  attr staticType == decl.staticType
  @prepareCall:
    RESULT := S-actualParam.combineActualParams
    extend INCARNATION with i
      ReturnPoint.fieldOf(i) := self
      i.lastInc := Incarnation
      Incarnation := i
    endextend
  @finishCall: Incarnation := Incarnation.lastInc
  @setVal:
    val := RESULT
    RESULT := undef

Fig. 90: Montage call of language ObjV3

functionDecl ::= "function" id "(" { var } ")" ":" type body
  I passActualToFormal S-body noReturnError
  attr staticType == S-type.staticType
  attr signature == S-id.Name
  attr declTable(pStr) ==
    (choose p in sequence S-var: p.signature = pStr)
  @passActualToFormal:
    val := PassParameters(RESULT, S-var)
    RESULT := undef
  @noReturnError: handleError("Exiting without return error")

Fig. 91: Montage functionDecl of language ObjV3

returnStm ::= "return" exp ";"
  I S-exp setRESULT
  trg = ReturnPoint.val (finishCall of call)
  @setRESULT: RESULT := S-exp.val

Fig. 92: Montage returnStm of language ObjV3

13.4 Actualizing Formal Parameters

Before we can pass the actual parameters via the RESULT function, we need to transform the list of expressions of the call-syntax into the list of the actual values of these expressions. This is done by the following derived function.

Decl. 13: derived function combineActualParams(al) ==
  (if al =~ [&hd | &tl]
   then [&hd.val | &tl.combineActualParams]
   else al)

In the prepareCall-action of the call-Montage the resulting list is assigned to the RESULT function, and the new incarnation is created.
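Decl. 13, together with the parameter-passing step described in the next section (ASM PassParameters), amounts to the following Python sketch (hypothetical; val is modeled as a dict from AST nodes to values):

```python
def combine_actual_params(params, val):
    # derived function combineActualParams: replace each actual-parameter
    # node in the list by its current value
    return [val[p] for p in params]

def pass_parameters(actuals, formals, val):
    # traverse the value list and the formal-parameter list in parallel,
    # setting the val attribute of each formal to the matching actual;
    # report a length mismatch like the XASM version does
    if len(actuals) != len(formals):
        return "length mismatch of actual and formal parameters"
    for fhd, ahd in zip(formals, actuals):
        val[fhd] = ahd
    return True
```

The two steps correspond to the prepareCall- and passActualToFormal-actions: the caller flattens the parameter list into values, the callee distributes them over the formal declarations.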
Then control flows into the corresponding functionDecl-node, where the list is retrieved from RESULT and passed, together with the list S-var of formal parameter declarations, to the ASM PassParameters, which is given in the following. The algorithm traverses the list of values and the list of parameter declarations in parallel and sets the val attribute of each parameter in the second list to the corresponding value in the first list.

ASM 65: PassParameters.xasm
  asm PassParameters(a, f)
    -- a is a sequence of values, f a sequence of parameter instances
    updates function val(_)
  is
    function a0 <- a, f0 <- f
    if a0 =~ [&ahd | &atl] then
      if f0 =~ [&fhd | &ftl] then
        &fhd.val := &ahd
        a0 := &atl
        f0 := &ftl
      else
        return "length mismatch of actual and formal parameters"
      endif
    else
      return true
    endif
  endasm

14 Models of Abrupt Control

In this chapter we introduce the example languages FraV1 (Section 14.2), FraV2 (Section 14.3), and FraV3 (Section 14.4), featuring iteration constructs, exception handling, and a revised version of recursive function calls. All of these languages use the concept of frames, which is explained in the next section. A main result of this thesis is the fact that the specifications presented here are compositional and provide the same degree of modularity for abrupt control flow features as the normal Montages transitions provide for sequential, regular control flow.

14.1 The Concept of Frames

For the specification of FraV1 and its relatives FraV2 (exception handling) and FraV3 (procedure calls) we use the frame-result-controlflow or, short, frame pattern introduced in the introduction to Part III for modeling abrupt control flow. Abrupt control flow is a term for all kinds of non-sequential control flow, such as breaking out of a loop, throwing an exception, or calling a procedure.
A frame is a node in the syntax tree which is relevant for abrupt control. By defining the set of universe names Frame to contain all symbols relevant to abrupt flow, we can jump to the least enclosing frame using the earlier introduced enclosing function. The information relevant for controlling abrupt control flow is passed via the RESULT function, and each frame has an action frameHandler which handles the information if it is relevant for the frame, and otherwise passes the information further up to the next enclosing frame.

In Figure 93 an abstract Montage framePattern visualizes the principle of how abrupt control flow is specified with frames. The normal, sequential control flow enters the Montage at the I-edge, triggers normal processing of the components of the Montage, such as the abstract body component, and then leaves the Montage via the T-edge. Within the body, control follows the sequential transitions until a statement initiating abrupt control is reached. As an abstract example we show the abruptPattern-Montage in Figure 94. The setRESULT-action of this Montage updates RESULT with the information needed to control the abrupt control flow, and then sends control to the frameHandler-action of the least enclosing frame, leading us back to Figure 93. Two transitions depart from the reached frameHandler-state. The first is followed if the RESULT is relevant and can be processed by this Montage.1 In this case, the abrupt processing is done, RESULT is reset to undef, and control is led back into the regular sequential flow. If the RESULT is not relevant for the Montage, control is sent further up to the frameHandler-action of the next enclosing frame. Since this pattern works for all kinds of abrupt control flow and a certain frame can pass arbitrary information to the next enclosing frame, such definitions are compositional and allow the same degree of modularity for abrupt control flow as the normal transitions do for sequential control flow.
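The dispatch loop implicit in the frame pattern can be sketched like this (Python; frames are hypothetical dicts carrying a relevance predicate and are listed innermost first, mirroring the chain of enclosing frames):

```python
def raise_abrupt(enclosing_frames, result):
    # The abrupt pattern sets RESULT and sends control to the frameHandler
    # of the least enclosing frame; each handler either processes a
    # relevant RESULT (abrupt processing, then RESULT := undef) or passes
    # it on to the frameHandler of the next enclosing frame.
    for frame in enclosing_frames:
        if frame["relevant"](result):
            return frame["name"]
    raise RuntimeError("no enclosing frame handles %r" % (result,))
```

A loop frame would declare unlabeled break/continue results relevant, while an exception-handling frame would declare exception results relevant; any frame ignores results meant for another kind of frame and passes them on.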
In Appendix C a non-compositional model of abrupt control flow is shown. In the following, frames are applied to iteration constructs, where the instances of the abrupt pattern are the continue and break statements, and where the instances of the frame pattern are the different kinds of loops and the labeled statement. In the next chapter we show exception handling, where the abrupt pattern is used for the throw statement, and the frame pattern is used for the try, catch, and finally clauses. As a third example, in Chapter 14.4 we reformulate recursive calls, using the abrupt pattern for the return statement and the frame pattern for the function call and declaration.

¹ As an example, an exception would be a relevant result to the frame handler of an exception construct, but a continue would not be a relevant result for the same construct.

Fig. 93: Montage framePattern of language FraV1. [Diagram: framePattern ::= ... body ...; control enters at I, passes through the normal processing of S-body, and leaves at T. The frameHandler-state has a transition guarded by "RESULT is relevant" leading via @unsetRESULT (RESULT := undef) back into the sequential flow, and an unguarded transition with trg = enclosing(Frame) to the frameHandler of the next enclosing frame.]

Fig. 94: Montage abruptPattern of language FraV1. [Diagram: abruptPattern ::= ... exp ...; after S-exp, the setRESULT-action (@setRESULT: RESULT := ... process(S-Exp.val) ...) performs the abrupt processing and sends control, with trg = enclosing(Frame), to the frameHandler of the least enclosing frame.]

14.2 FraV1: Models of Iteration Constructs

FraV1 features while, repeat, continue, break, and labeled statements. FraV1 extends the earlier while example of Section 3.1 and the control statements of Section 10.2 with continue and break mechanisms. A first model, reaching the targets of break and continue statements directly, has already been shown in Section 5.3.3. In contrast, the model presented here uses the frame pattern and is compositional with other kinds of abrupt control flow. The grammar of FraV1 is defined as an extension and refinement of the ObjV1 grammar.

Gram. 17: (refines Grammar 14)
  stm          =   ... | continueStm | breakStm | iterationStm | labeledStm
  iterationStm =   whileStm | doStm
  continueStm  ::= "continue" [ labelId ] ";"
  breakStm     ::= "break" [ labelId ] ";"
  labelId      =   id
  whileStm     ::= "while" exp body
  doStm        ::= "do" body "while" exp ";"
  labeledStm   ::= labelId ":" iterationStm

The exact definition of the Frame constant together with the declaration of the break and continue constructors is given as follows.

Decl. 14:
  derived function Frame == {"whileStm", "doStm", "labeledStm"}
  constructors break(_), continue(_)

The Montages of FraV1 are mostly direct instantiations of the abrupt and frame patterns explained above. The labeled break and continue (Figures 95 and 96) follow the abrupt pattern and set RESULT to the corresponding constructor term. If this term has the label undef, it is caught by the while and do statements (Figures 97 and 98), which both follow the frame pattern. In both Montages we see how the frame handler sends continue-results back inside the loop, and break-results to a program point after the loop. If the RESULT term has a label, it is caught by the least enclosing instance of Montage labeledStm (Figure 99), another instance of the frame pattern. This Montage analyzes at the frameHandler-action whether the label in the RESULT matches its own label. If there is a match, the labeled break/continue constructor terms are replaced by their unlabeled versions, and control is sent to the frameHandler-action of the statement after the label. The static semantics of labeledStm guarantees that this statement is a frame and therefore has a frameHandler-action. The unlabeled break and continue are then caught by a while or do, as mentioned above.
Fig. 95: Montage continueStm of language FraV1. [continueStm ::= "continue" [ labelId ] ";". Control: I leads to setRESULT, which sends control with trg = enclosing(Frame) to the frameHandler.
  attr signature == S-labelId.Name
  condition (if not noLabel then enclosing("labeledStm") ≠ undef else true)
  @setRESULT: if noLabel then RESULT := continue(undef) else RESULT := continue(signature) endif]

Fig. 96: Montage breakStm of language FraV1. [breakStm ::= "break" [ labelId ] ";". Control: I leads to setRESULT, which sends control with trg = enclosing(Frame) to the frameHandler.
  attr noLabel == S-labelId.NoNode
  attr signature == S-labelId.Name
  condition (if not noLabel then enclosing("labeledStm") ≠ undef else true)
  @setRESULT: if noLabel then RESULT := break(undef) else RESULT := break(signature) endif]

Fig. 97: Montage whileStm of language FraV1. [whileStm ::= "while" exp body. Control: I leads to S-exp; if src.val = true, control continues to S-body and back to S-exp, otherwise to T. The frameHandler sends control back into the loop if RESULT = continue(undef), to T if RESULT = break(undef), and otherwise with trg = enclosing(Frame) to the next frameHandler.
  @I: RESULT := undef
  @T: RESULT := undef]

Fig. 98: Montage doStm of language FraV1. [doStm ::= "do" body "while" exp ";". Control: I leads to S-body and then to S-exp; if src.val = true, control returns to S-body, otherwise to T. The frameHandler behaves as in whileStm.
  @I: RESULT := undef
  @T: RESULT := undef]

Fig. 99: Montage labeledStm of language FraV1. [labeledStm ::= labelId ":" stm. Control: I leads to S-stm and then to T. The frameHandler sends control to the frameHandler of S-stm if (RESULT = break(undef)) or (RESULT = continue(undef)), and otherwise with trg = enclosing(Frame) to the next frameHandler.
  attr signature == S-labelId.Name
  condition S-stm.Name isin Frame
  @frameHandler: if RESULT = break(signature) then RESULT := break(undef)
                 elseif RESULT = continue(signature) then RESULT := continue(undef) endif]
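The relabeling scheme just described can be illustrated with a small Python sketch (hypothetical modeling, not the Montages notation): results are ("break", label) / ("continue", label) pairs, loops catch only unlabeled results, and a labeled statement whose label matches strips the label and lets the loop catch the result on the next step.

```python
# Sketch of FraV1's labeled break/continue handling: a labeled result is
# passed up by the loop, unlabeled by the matching labeledStm, and then
# caught by the loop. All names are illustrative.

def labeled_stm_handler(result, own_label):
    """Frame handler of labeledStm (cf. Fig. 99): on a label match,
    break(signature)/continue(signature) become their undef versions."""
    kind, label = result
    if label == own_label:
        return (kind, None)          # e.g. break(signature) -> break(undef)
    return result                    # no match: pass up unchanged

def loop_handler(result):
    """Frame handler of while/do (cf. Figs. 97 and 98)."""
    kind, label = result
    if label is None and kind == "continue":
        return "re-enter-loop"       # back inside the loop
    if label is None and kind == "break":
        return "exit-loop"           # program point after the loop
    return "pass-up"                 # not relevant for this frame

# A `break outer;` inside a loop that is labeled "outer":
r = ("break", "outer")
print(loop_handler(r))               # labeled: the loop passes it up
r = labeled_stm_handler(r, "outer")  # labeledStm strips the matching label
print(loop_handler(r))               # unlabeled: the loop now exits
```

Note the two-step shape: the loop never needs to know about labels, which keeps the loop and labeled-statement Montages independent of each other.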
14.3 FraV2: Models of Exceptions

Fig. 100: Roster of FraV2 features and their introduction (i), refinement (r), and use (u) in the different languages ExpV1, ImpV1–ImpV3, ObjV1–ObjV3, and FraV1–FraV3. [The matrix cells are not recoverable from the extraction. The listed concepts are: Program (start symbol of each language); exp (synonym for expressions); use (the use expression); lit, Number, Boolean, uExp, bExp, cExp; stm (synonym for statements); printStm, ifStm, asgnStm, body, block, bstm, var; type (types of the language); throwStm (exception throwing); tryCatchFinally (finally part of exception catch); tryCatchClause (try part of exception catch); catch (single exception catch clause).]

Example language FraV2 features exception throws and try-catch-finally constructs. It is directly formulated as an extension and refinement of ObjV1.

Gram. 18: (refines Grammar 14)
  stm                =   ... | throwStm | tryCatchFinallyStm
  throwStm           ::= "throw" exp ";"
  tryCatchFinallyStm ::= tryCatchClause [ "finally" block ]
  tryCatchClause     ::= "try" block { catch }
  catch              ::= "catch" "(" exp ")" block

The semantics of FraV2 is basically given using the frame pattern of Section 14.1, and therefore the given Montages can be freely combined with other languages based on the frame pattern. The exact definition of Frame and the declaration of the constructor exception are given as follows.

Decl. 15:
  derived function Frame == {"tryCatchClause", "tryCatchFinallyStm", "catch"}
  constructor exception(_)

Exceptions in FraV2 are triggered using the throwStm construct (Figure 101), an instance of the abrupt pattern. In our simplified setting the information within the exception( ) constructor is an arbitrary value, and exception catching (Figure 102) is based on equality of the exception information and the value in the catch clause.
In object oriented languages, exceptions are typically instances of a special exception class, and catching is done by checking types rather than values. The presented Montages have been applied to this situation as well; in fact they are taken directly from the specification of exception handling in Java. The following three Montages catch, tryCatchFinallyStm, and tryCatchClause (Figures 102, 103, and 104) refine the frame pattern by introducing a second action execFinally, which is used to guarantee that control executes the block after the "finally" keyword in tryCatchFinallyStm even if an exception or other abrupt control has been triggered. Assume normal control enters the tryCatchFinallyStm (Figure 103), which leads directly into the tryCatchClause (Figure 104), and then into the block. If no abrupt control is triggered in the block, the tryCatchClause is left, control flows back into the tryCatchFinallyStm-Montage, and the block after the "finally" keyword is entered. If again no abrupt control is triggered, tryCatchFinallyStm terminates normally. There are two places where abrupt control can be triggered: in the block of the tryCatchClause, and in the block after the "finally". We call the first the try-block and the second the finally-block, and we assume that the triggered abrupt control is an exception throw. If an exception is thrown in the try-block, control is sent to the frame handler of the tryCatchClause and the list of catches is entered. Each catch clause (Figure 102) checks after its o-state whether the value of the exception matches its catch-value.
If not, control is sent to the next catch clause in the list, and if none of the clauses catches the exception, control leaves the list of catch clauses, exits the tryCatchClause, executes the finally-block, and, since RESULT is still set to the unmatched exception, control is passed up to the frame handler of the least enclosing frame. If the catch clause matches the exception, control is sent to the resetRESULT-state, RESULT is set to undef, the block of the catch is executed, and control is sent out of the list to the finally-block. For this purpose the action execFinally is introduced, which sends control straight up to the finally-block. Thus after the block of the catch is executed, control is sent to the execFinally-action of the least enclosing frame. Besides this main scenario, there are three more subtle cases, which result from abrupt control triggered in the finally-block or in the expression or block of a catch clause. We discuss here the case of exceptions triggered in these places.

- If an exception is triggered in the finally-block, the frame handler of the tryCatchFinallyStm sends control to the least enclosing frame.

- If an exception is triggered in the expression or block of a catch clause, the newly triggered exception must not be caught by the catch-list of the enclosing tryCatchClause-frame; instead, control must be sent to the finally-block directly. Therefore the frame handler of the catch sends control to the execFinally-action of the enclosing frame.

Fig. 101: Montage throwStm of language FraV2. [throwStm ::= "throw" exp ";". Control: I leads to S-exp and then to setRESULT, which sends control with trg = enclosing(Frame) to the frameHandler.
  @setRESULT: RESULT := exception(S-exp.val)]

Fig. 102: Montage catch of language FraV2. [catch ::= "catch" "(" exp ")" block. Control: I leads via S-exp to the o-state; if RESULT = exception(S-exp.val), control continues to resetRESULT and S-block, and then via execFinally with trg = enclosing(Frame) up to the enclosing frame; otherwise control leaves at T to the next catch clause. The frameHandler sends control with trg = enclosing(Frame) to the execFinally-action of the enclosing frame.
  @resetRESULT: RESULT := undef]
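The control paths described above can be sketched in plain Python (hypothetical modeling, not the FraV2 Montages): a thrown value is wrapped as ("exception", v), catch clauses match by value equality, the finally block always runs, and an unmatched exception is handed back to the enclosing frame.

```python
# Sketch of the FraV2 mechanics. A block "throws" by returning a wrapped
# value; None plays the role of undef. All names are illustrative.

def run_try_catch_finally(try_block, catches, finally_block):
    result = None                      # RESULT, initially undef
    try_result = try_block()           # abrupt exit is a wrapped value
    if try_result is not None:
        kind, val = try_result
        for catch_val, catch_block in catches:
            if kind == "exception" and catch_val == val:   # value equality
                catch_block()          # RESULT was reset before the block
                break
        else:
            result = try_result        # no clause matched: RESULT stays set
    finally_block()                    # executed in every case
    return result                      # non-None -> pass to enclosing frame

log = []
out = run_try_catch_finally(
    try_block=lambda: ("exception", 42),
    catches=[(41, lambda: log.append("catch 41")),
             (42, lambda: log.append("catch 42"))],
    finally_block=lambda: log.append("finally"))
print(log, out)    # matching clause ran, finally ran, RESULT is undef again
```

An unmatched exception simply comes back as the return value, mirroring how control is passed up to the frame handler of the least enclosing frame after the finally-block.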
Fig. 103: Montage tryCatchFinallyStm of language FraV2. [tryCatchFinallyStm ::= tryCatchClause [ "finally" block ]. Control: I leads to S-tryCatchClause, then to S-block, then to T. If RESULT.defined after the finally-block, control is sent with trg = enclosing(Frame) to the frameHandler of the enclosing frame. The Montage has both an execFinally-action and a frameHandler-action.]

Fig. 104: Montage tryCatchClause of language FraV2. [tryCatchClause ::= "try" block { catch }. Control: I leads to S-block and then to T; if RESULT = exception(&), the frameHandler enters the LIST of S-catch clauses. The execFinally-action sends control with trg = enclosing(Frame) to the execFinally-action of the enclosing frame.]

Fig. 105: Roster of FraV3 features and their introduction (i), refinement (r), and use (u) in the different languages ExpV1, ImpV1–ImpV3, ObjV1–ObjV3, and FraV1–FraV3. [The matrix cells are not recoverable from the extraction. The listed concepts are: Program (start symbol of each language); exp (synonym for expressions); use (the use expression); lit, Number, Boolean, uExp, bExp, cExp; stm (synonym for statements); printStm, ifStm, asgnStm, body, block, bstm, var; type (types of the language); functionDecl (procedure declaration); call (procedure call); actualParam (actual parameter of call); returnStm (return statement).]

14.4 FraV3: Procedure Calls Revisited

The example language FraV3 is a revised, frame-pattern version of ObjV3 which can be composed with the definitions of FraV1 and FraV2. The declaration of the Frame universe consists only of "functionDecl", and the constructor callResult( ) is needed to wrap the call results, similar to how the exception values and break/continue labels have been wrapped in the last two chapters.

Decl. 16:
  function RESULT
  derived function Frame == {"functionDecl"}
  constructor callResult(_), ReturnPoint

The given Montages work like the ones of ObjV3 in Chapter 13, with the following differences.

- In the returnStm-Montage (Figure 106) the result v is not directly assigned to RESULT, but wrapped as the constructor term callResult(v).
Further, control is not sent directly to the caller, but to the frameHandler of the least enclosing frame.

- In the call-Montage (Figure 107) the frameHandler-action is introduced; it sends control to the setVal-action only if the returned result is a callResult-term. Otherwise it sends control to the least enclosing frame. The finishCall-action has been removed; its work is taken over by the frame handler in the function declaration. In addition, the setVal-action must unwrap the result from the callResult-term.

- Finally, in the functionDecl-Montage (Figure 108) a frameHandler-action is added, which resets the incarnation to the last one and sends control to the caller, which is stored as the value of the ReturnPoint-constant.

A subtle change to the previous specification in ObjV3 is that the call-node to which one has to return is no longer stored as the ReturnPoint-field of the new incarnation, but as the ReturnPoint-field of the old incarnation. This change may seem unnecessary, but it turned out to be the only choice, due to the following situation. Since we want any kind of abrupt control flow to exit a call correctly, we need to reset the incarnation in the frame handler of the function declaration. All other choices are incorrect:

- if the incarnation is reset in the frame handler of the call, wrong behavior results from abrupt control triggered in the actual parameters of the call;²

- if the incarnation is reset in a special finishCall-action located between the frame handler and the setVal-action of the call-Montage, we obtain the opposite error: abrupt control returning from the call but not being a call result does not trigger the reset of the incarnation and therefore leads to wrong behavior.
Since we therefore need to reset the incarnation in the frame handler of the function declaration, it is not possible to access the ReturnPoint-value on the new incarnation, which is lost forever once the current incarnation has been reset to the old one. Therefore it is mandatory in this new situation to store the call-node to which we have to return in the old incarnation.

² In fact, if we assume a very generalized language design, where return statements can be used as expressions, then we would need to further refine the semantics in order to avoid the error that a call result issued by an actual parameter would be interpreted as the result of the not-yet-called function. Our solution works perfectly if the only abrupt control we expect from the actual parameters are exceptions. Since this is the case in all main-stream languages we know, we do not refine the specification further at this point.

Fig. 106: Montage returnStm of language FraV3. [returnStm ::= "return" exp ";". Control: I leads to S-exp and then to setRESULT, which sends control with trg = enclosing(Frame) to the frameHandler.
  @setRESULT: RESULT := callResult(S-exp.val)]

Fig. 107: Montage call of language FraV3. [call ::= id "(" [ actualParam { "," actualParam } ] ")", actualParam = exp. Control: I leads through the LIST of S-actualParam to prepareCall, which transfers control to the declaration (!decl); the frameHandler sends control to setVal if RESULT = callResult(&), and otherwise with trg = enclosing(Frame) to the next frameHandler; setVal leads to T.
  attr signature == S-id.Name
  attr decl == enclosing("Program").declTable(signature)
  attr staticType == decl.staticType
  @prepareCall: RESULT := S-actualParam.combineActualParams
    extend INCARNATION with i
      ReturnPoint.val := self
      i.lastInc := Incarnation
      Incarnation := i
    endextend
  @setVal: if RESULT = callResult(&r) then val := &r, RESULT := undef endif]
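The argument above can be illustrated with a small Python sketch (hypothetical modeling, not Gem-Mex output): because the function's frame handler resets the incarnation before jumping back, the return point must live on the OLD incarnation, which is the one that is current after the reset.

```python
# Sketch of the FraV3 choice: incarnations are activation records, and
# the return point (the call-node) is stored on the OLD incarnation.
# Names (Incarnation, call, function_frame_handler) are illustrative.

class Incarnation:
    def __init__(self, last=None):
        self.last = last               # lastInc
        self.return_point = None       # ReturnPoint, set on the OLD one

def call(state, caller_id):
    old = state["inc"]
    old.return_point = caller_id       # stored on the old incarnation
    state["inc"] = Incarnation(last=old)  # extend INCARNATION, enter new one

def function_frame_handler(state, result):
    # Reset to the last incarnation, whatever kind of abrupt control
    # arrives (call result, exception, ...):
    state["inc"] = state["inc"].last
    # The return point is still readable, because it sits on the
    # (now current) old incarnation:
    return state["inc"].return_point, result

state = {"inc": Incarnation()}
call(state, caller_id="call-node-7")
target, res = function_frame_handler(state, ("callResult", 99))
print(target, res)                     # control goes back to the caller
```

Had the return point been stored on the new incarnation, the reset in the first line of the handler would have discarded it before it could be read, which is exactly the problem discussed above.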
Fig. 108: Montage functionDecl of language FraV3. [functionDecl ::= "function" id "(" var ")" ":" type body. Control: I leads to passActualToFormal and then into S-body; falling through the body reaches noReturnError; the frameHandler resets the incarnation and sends control with trg = ReturnPoint.val back to the caller.
  attr staticType == S-type.staticType
  attr signature == S-id.Name
  attr declTable(pStr) == (choose p in sequence S-var : p.signature = pStr)
  @passActualToFormal: val := PassParameters(RESULT, S-var), RESULT := undef
  @frameHandler: Incarnation := Incarnation.lastInc
  @noReturnError: handleError("Exiting without return error")]

Part IV: Appendix

A Kaiser's Action Equations

Unpublished joint work with Samarjit Chakraborty.

Among the several mechanisms proposed for specifying programming environments, attribute grammar systems have been among the most successful. The main reason for this lies in the fact that they can be written in a declarative style and are highly modular. By themselves, however, they are unsuitable for the specification of dynamic semantics. The work of Gail Kaiser on action equations (AE) (111; 112) addresses this problem by augmenting attribute grammars with mechanisms taken from the action routines proposed by Medina-Mora in (151) for use in language-based environments. In this appendix, action equations are described and compared with Montages.

A.1 Introduction

Action routines are based on the semantic routines used in compiler generation systems such as Yacc, in which the semantic processing is written as a set of routines in either a conventional programming language or a special language devised for this purpose (3). Each node in the abstract syntax tree (AST) is associated with such actions, and the execution of a construct is triggered by calling the corresponding action routine. In contrast, actions in AE are given by a set of rules similar in form to the semantic equations of attribute grammars. Such equations are embedded into an event-driven architecture.
Events occurring at any node of the AST activate the attached equations, in the same sense in which, in the action routines paradigm, commands trigger the associated action routines. Equations which are not attached to any event correspond exactly to the semantic equations of attribute grammars. Equations in this framework can be of five types: assignments, constraints, conditionals, delays, and propagates. Assignments and constraints are identical in form; the difference is that constraints are not attached to events and hence are active at all times. The propagate equations propagate an event from one node of the AST to another after evaluating the equations in that node. Thus the control flow is modeled by the propagation of events from one node to another. This appendix reevaluates the problem of specifying dynamic semantics in an attribute grammar framework for language definitions in an environment generator, by comparing AEs with Montages. Montages can be seen as a combination of attribute grammars and action routines. For giving the actions, Montages use Abstract State Machine (ASM) rules. There exist a number of case studies applying ASMs to the specification of programming languages. In the case of imperative and object oriented languages, these applications work in the same way as action routine specifications, but they have a formal semantics. Montages adapt and integrate the ASM framework for specifying dynamic semantics with attribute grammars, and a visual notation for specifying control flow as state transitions in a hierarchical finite state machine (FSM). In short, the differences between AE and Montages can be summarized as follows. In AE, the semantic processing at each node of the abstract syntax tree (AST) is given by sets of equations which are attached to particular events. The triggering of an event at a node leads to a reevaluation of these equations.
Montages, on the other hand, use ASM rules to specify such semantic processing, which is strictly different from the concept of using equations. As a second difference, control flow in AEs is specified by propagating an event from a source to a destination node, thereby activating the equations associated with this event in the destination node. In contrast, control flow in Montages is specified by state transitions in a finite state machine, which is described using a graphical notation. Section A.2 describes how control flow is specified using action equations. Section A.3 contains a description of a number of different control structures, found in any imperative or object-oriented language, specified using Montages. These are compared to the corresponding specifications written using AE. In order to simplify the comparison, we base the Montages directly on the abstract syntax definitions. In one example (Example 6) we show a programming construct whose ASM-action cannot be given as an AE equation, and in another example (Example 3) we show that our visual notation makes it substantially easier to understand a specification. In the process of describing with Montages the control structures corresponding to AE examples in the literature, an error was discovered in Example 3 of Kaiser's article in ACM Transactions on Programming Languages and Systems (112). The same error would have been hard to overlook in the graphical Montages description.

A.2 Control Flow in Action Equations

As described above, the AE paradigm is based on the concept of attaching a set of equations to the non-terminals of the grammar, and thereby to the instances of the non-terminals as nodes of the AST. The occurrence of an event at a node of the AST leads to an evaluation of the equations attached to that particular event in that node. Events, like attributes in attribute grammars, can be either synthesized or inherited.
The events associated with the left-hand non-terminal of a production, as shown below, are synthesized.

Example 1

  production
    event_1
      equation_11 ... equation_1k
    ...
    event_n
      equation_n1 ... equation_nk

Here equation_11 through equation_1k are attached to event_1, and similarly for the other events. Inherited events with their attached equations are associated with the right-hand non-terminals of a production. In (112) the left-hand non-terminal is referred to as the goal symbol, the non-terminals on the right as the components of the goal symbol, and the context-free grammar notation is the same as that introduced in Example 1. Using this notation, the inherited events are given as

Example 2

  goal symbol ::= component_1 : type ... component_n : type
    event On component_1
      equations
    ...
    event On component_n
      equations

The On keyword denotes that the inherited event is associated with the named component. It was also mentioned that the propagate equation is used to propagate an event from a source to a destination node of the AST. This has the effect of activating the equations at the destination node attached to the named event. Formally the equation is stated as

  Propagate event To destination

Using these equations, at each step of the computation a set of equations is dynamically determined and activated. The reevaluation of these equations results in the redefinition of a number of attributes. This redefinition of attributes is used for side effects. The next section shows the AE specifications for common control constructs and compares them with Montages specifications for the same constructs. Throughout the section, sequential control flow is modeled with two kinds of events, Execute and Continue.

A.3 Examples of Control Structures

Example 3

As a first example of how to model dynamic semantics with AE, we take the if statement as it is described in (112).
The ifStm has two children, the condition part (condpart) being an expression, and the then part (thenpart) being a statement.

  ifStm ::= condpart: EXPRESSION thenpart: STATEMENT

When the Execute event occurs at an instance of ifStm, the Execute is propagated to the condpart.

  Execute ->
    Propagate Execute To condpart

After any semantic processing involving the condpart is completed (including, for example, the setting of its value attribute), the condpart propagates the Continue event to itself. A Continue on the condpart activates the following pair.

  Continue On condpart ->
    If condpart.value
    Then Propagate Execute To thenpart
    Else Propagate Continue To self

If the value attribute evaluates to true, Execute is propagated to the thenpart. If not, the if statement has completed execution, and Continue is propagated to itself. After the thenpart terminates, the Continue is correspondingly propagated to the ifStm.

  Continue On thenpart ->
    Propagate Continue To self

Fig. 109: The ifStm Montage. [ifStm ::= condpart: EXPRESSION thenpart: STATEMENT. Control: I leads to condpart; an edge labeled condpart.value leads to thenpart, the unlabeled edge to T; thenpart leads to T.]

In Figure 109 we see how the same mechanism is given in terms of an FSM. If the ifStm is executed, the first visited state is the condpart. The semantic processing involving the condpart is given by the related FSM, whose actions set, for instance, its value attribute. The condpart then has two outgoing control edges along which the processing of the ifStm continues. One of the edges is labeled by condpart.value and the other has no label. In such cases, the non-labeled edge is assumed to represent the else-case, i.e. the case when no label of another edge evaluates to true. Consequently, if condpart.value is true, control continues to the thenpart, otherwise control leaves the ifStm through the terminal T. When the semantic processing of the thenpart terminates, control leaves the ifStm along the unique outgoing arrow.
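The Execute/Continue propagation described above can be mimicked in a few lines of Python. This is an illustrative sketch only (node names and the dispatch structure are ours, not Kaiser's implementation); it traces which events reach which nodes for the ifStm equations.

```python
# Event-propagation sketch of the AE ifStm equations (Example 3):
# Execute on ifStm goes to condpart; Continue on condpart either
# Executes thenpart or Continues the ifStm itself; Continue on
# thenpart Continues the ifStm.

def run_if(cond_value, trace):
    def propagate(event, node):
        trace.append((event, node))
        if (event, node) == ("Execute", "ifStm"):
            propagate("Execute", "condpart")
        elif (event, node) == ("Execute", "condpart"):
            # semantic processing sets condpart.value, then Continue on self
            propagate("Continue", "condpart")
        elif (event, node) == ("Continue", "condpart"):
            if cond_value:
                propagate("Execute", "thenpart")
            else:
                propagate("Continue", "ifStm")
        elif (event, node) == ("Execute", "thenpart"):
            propagate("Continue", "thenpart")
        elif (event, node) == ("Continue", "thenpart"):
            propagate("Continue", "ifStm")
        # ("Continue", "ifStm") has no attached equations here: done.
    propagate("Execute", "ifStm")
    return trace

print(run_if(True, []))    # true branch: thenpart is executed
print(run_if(False, []))   # false branch: ifStm continues directly
```

The trace makes the point of the comparison concrete: the same control flow that the Montage shows as two labeled edges is spread over three separate event rules in AE.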
The advantage of having an explicit visual representation of the control flow is that it is much easier to understand and validate the semantics of a construct like the ifStm. This is even indicated by the fact that while entering the above example, we found that the "Continue On thenpart" rule is missing in (112). This rule corresponds to the unique outgoing arrow from the thenpart, and if the user forgot this arrow, it would be immediately clear that something is missing.

Example 4

The following AE description gives the semantics of a lazily evaluated boolean and, as available for instance in Pascal. The second operand must not be evaluated if the first operand evaluates to false. This is important for the semantics, since expressions may have side effects. After the evaluation of the operands, the value is equal to the value of operand2 if the value of operand1 is true, otherwise it is equal to false.

  boolAnd ::= operand1: EXPRESSION operand2: EXPRESSION

  Execute ->
    Propagate Execute To operand1
  Continue On operand1 ->
    If operand1.value
    Then Propagate Execute To operand2
    Else Propagate Continue To self
  Continue On operand2 ->
    Propagate Continue To self
  Continue ->
    If operand1.value
    Then value := operand2.value
    Else value := false

Fig. 110: The boolAnd Montage. [boolAnd ::= operand1: EXPRESSION operand2: EXPRESSION. Control: I leads to operand1; an edge labeled operand1.value = false leads directly to T, the other edge to operand2 and then to the set-action and T.
  @set: if operand1.value then value := operand2.value else value := false endif]

In Figure 110 we see the equivalent Montage. While the form of the value calculation remains the same, the visualization of the control flow shortens the textual elements considerably.

Fig. 111: The loop Montage. [loop ::= initialization: STATEMENT condition: EXPRESSION body: STATEMENT reinitialization: STATEMENT. Control: I leads to initialization and then to condition; an edge labeled condition.value leads to body and then to reinitialization, which leads back to condition; the unlabeled edge leads to T.]
Example 5

Another example is the following loop construct. After initialization, the control loops until the condition evaluates to false. In each cycle, the reinitialization is executed. While in Figure 111 the cyclic control structure is explicitly visible, in the following AE description it is encoded using the events.

  loop ::= initialization: STATEMENT
           condition: EXPRESSION
           body: STATEMENT
           reinitialization: STATEMENT

  Execute ->
    Propagate Execute To initialization
  Continue On initialization, reinitialization ->
    Propagate Execute To condition
  Continue On condition ->
    If condition.value
    Then Propagate Execute To body
    Else Propagate Continue To self
  Continue On body ->
    Propagate Execute To reinitialization

Example 6

In a last example we consider a simple construct that repeats a statement n times, where n is a constant, positive integer.

  constRepeat ::= constant: DIGITS body: STATEMENT

In a Montages specification we would introduce an attribute i, initialize it with constant, and each time we have executed the body we decrease the value of i by one. If after this i is still larger than 0, the body is reevaluated, else constRepeat terminates. In Figure 112 the complete Montage is given, using the names init i and dec i for the two states doing the initialization and the decreasing.

Fig. 112: The constRepeat Montage. [Control: I leads to init i, then to body, then to dec i; an edge labeled i > 0 leads back to body, the unlabeled edge to T.
  @init i: i := constant
  @dec i: i := i - 1]

Naively, one would model this in a similar way with AEs:

  constRepeat ::= constant: DIGITS body: STATEMENT

  Execute ->
    i := constant
    Propagate Execute To body
  Continue On body ->
    if (i - 1) > 0 then
      Propagate Execute To body
      i := i - 1
    else
      Propagate Continue To self

But using the AE framework, the formalization of i := i - 1 is not possible with one equation. There is an intrinsic circular dependency in such an equation, and trying to evaluate it would not lead to a solution.
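The difference can be made concrete outside AE as well. In an ASM-style update-set semantics, every right-hand side is evaluated in the old state and all updates fire simultaneously, so i := i - 1 is unproblematic; an equational reading of i = i - 1 has no solution. The following Python sketch (illustrative, not either formalism) shows both the direct update and the staged workaround through a helper attribute.

```python
# Sketch of update-set semantics vs. staged equational evaluation.
# apply_update_set evaluates every right-hand side in the OLD state,
# then commits all updates at once (as ASM update sets do).

def apply_update_set(state, updates):
    """updates: list of (location, rhs) where rhs reads the old state."""
    new_values = {loc: rhs(state) for loc, rhs in updates}  # old state only
    state.update(new_values)                                # commit at once
    return state

# One step of constRepeat's decrement, directly:
state = {"i": 3}
apply_update_set(state, [("i", lambda s: s["i"] - 1)])
print(state["i"])                      # 2

# A framework that cannot update a location in terms of itself must
# stage the update through a helper attribute in two activations:
state = {"i": 3, "h": None}
apply_update_set(state, [("h", lambda s: s["i"] - 1)])   # step 1: h := i - 1
apply_update_set(state, [("i", lambda s: s["h"])])       # step 2: i := h
print(state["i"])                      # 2, but via an intermediate step
```

Both runs end with the same value; the point is the extra activation (and extra location) the second one needs, which is exactly the overhead discussed next.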
The only possible solution is to introduce a help-attribute h, to activate in a first step the equation h := i - 1, and then in a next step to activate the equation i := h. In order to introduce the intermediate step, one needs to introduce a new event helpEvent. Using this, the complete AE solution is:

  constRepeat ::= constant: DIGITS body: STATEMENT

  Execute ->
    i := constant
    Propagate Execute To body
  Continue On body ->
    if (i - 1) > 0 then
      h := i - 1
      Propagate helpEvent To self
    else
      Propagate Continue To self
  helpEvent ->
    i := h
    Propagate Execute To body

This solution introduces an additional complexity which makes the designer's task more tedious and specifications more verbose. In this respect, Montages being based on ASM, which is a Dynamic Abstract Data Type framework, has the advantage that one can express directly the update

  i := i - 1

requesting that the original value of the 0-ary function i can be discarded and replaced by a new one without an intermediate step, i.e. by means of a non-homomorphic transformation of the algebra modeling the state before the modification.

A.4 Conclusions

This appendix compared two different paradigms which extend the attribute grammar framework in different ways for the specification of dynamic semantics in a programming environment generator. Most of the previous work on environment generators was more concerned with the generation of a language-based editing system. The design of the AE paradigm followed this line, the main focus being incremental semantic processing during editing. In contrast, the Montages framework is concerned with the rapid prototyping of a language and focuses on issues like ease of specification. It is understandable that the event-oriented view is helpful and probably even necessary for the specification of a system which has to do some interactive processing. Apart from the Execute and Continue events of AE described in this
paper, which model the control flow, other events arising from the functionality required in an editor include events like Create, Delete, Clip, etc. Although an editor is currently not generated by the Gem-Mex tool suite for Montages, we do not foresee any difficulties in doing so. The event-based framework of AE can result in triggering sets of rules from different nodes of the AST. As a result, equations in different nodes can be active at the same time. Such a system is highly distributed and well suited for situations other than the dynamic semantics of sequential languages. In this paper we consider only the application of the event mechanism to situations with a single sequential thread of control. For these situations we are able to present the sequential control flow in terms of FSMs. For distributed situations, FSMs would have to be replaced with Petri Nets or StateCharts.

B Mapping Automata

Joint work with Jörn Janneck, published as technical report (101).

In this appendix we describe Mapping Automata (MA), a variant of Gurevich's Abstract State Machines (GASM). The motivation for this work is threefold. First, we want to make the MA view explicit in a formal way. Second, the MA and the mapping from GASM to MA serve as implementation base for a GASM interpreter written in Java (100). And finally, the definition of MA simplifies the syntactic aspect as well as the structure of a state by removing the concept of 'signature'. Removing the signature and the induced structure from the specification language and the state, respectively, makes state and specification completely orthogonal, connected only by an interpretation of the basic syntactic constants. These constants play the role of syntax (vocabulary), which is independent from the structure of the semantics (objects, and the interpretation of µ).
In effect, any specification may be interpreted in any state (that has certain basic properties, such as being 'big' enough to allow sufficiently many objects to be allocated), which in turn means that different specifications may be interpreted on the same state. We believe that this will allow us to compose specifications much more easily than was possible in GASM; an interesting aspect of this improved compositionality is possibly the easy integration of object-based constructs into the concept, with a view of making it a practical specification and prototyping method in such environments (99).

B.1 Introduction

The motivation for MA starts with Gurevich's claim that in dynamic situations, it is convenient to view a state as a kind of memory that maps locations to values (82). A location is a pair of an n-ary function name and an n-tuple of elements. Such a memory is partitioned into different areas, each consisting of the locations belonging to one function. We believe that it is often more appropriate to view a state as a collection of objects, each associated with a mapping from attributes to values. In this view the notions of attribute, value, and object are unified. This allows one to model a large number of commonly used data structures, e.g. records with pointer attributes, arrays with dynamic length, stacks, or hash-tables. For the moment we restrict our interest to completely untyped object systems. Such systems can be modeled with a Tarski structure having only one binary function, encoding the objects and their associated mappings. We fix the name of this function to σ. Mapping Automaton (MA) is a name for the combination of the above explained object view on state with a GASM whose vocabulary contains only the binary σ and a set of static constants. We define and investigate MA as a mathematical object, by adapting the definition of GASM over mapping-structures to the MA view, i.e.
the σ function is made part of the formal definition of MA states. Finally, we give a formal mapping from GASM to MA. In the next section the static structures used are described, then MA are defined formally. In Section B.4 the definition of transition rules is adapted to MA. In the last section of this chapter the mapping from GASM to MA is formalized.

B.2 Static structures

Before we present MA as describing the dynamic transition from one state to the next, we first make precise our notion of state. For MA, this notion is completely independent of any syntactical concepts, and indeed of the existence of any MA defined for it.

B.2.1 Abstract structure of the state

Our intuitive concept of state is that of a structure between objects of a set. This set, the set of all admissible objects that may ever occur in the computation to be modeled, we will subsequently call our universe U. We will not make any assumptions about its nature, except that it be big enough (cf. Section B.4.5 for details on this) and contain a special element ⊥. We will refer to the elements of U as objects. Given such a universe we can now define our concept of state as follows. Intuitively, we may think of a state as a mapping σ that assigns to each element of U a unary function over U. Many common data structures can be directly conceptualized in this way: records (mapping field names to field values), arrays (indices to values), hash-tables (keys to values), etc. Of course, higher arities may be modeled by successive application of unary functions or with tuples. Alternatively, and equivalently, a state may be regarded as a mapping of pairs of objects to objects, i.e. as a two-dimensional square table with objects as entries. Formally, Def. 28: State space.
Given a universe U, we define the state space of U to be Σ_U = U × U → U. Note that the equation

    (U × U → U)  ≅  (U → (U → U))

supports the alternative views of the state as either a square table populated by objects or a mapping of objects to mappings. Since these are two equivalent manners of speaking, we will freely alternate between these two conceptions of a state, talking about the mapping associated with an object, or equivalently referring to an object as an index to a row in the state table (assuming here and in the following that a row corresponds to a mapping).

B.2.2 Locations and updates

The structure of such a state is changed in one atomic action by a set of pointwise updates, which specify a location to be set to a new value. However, MA locations are somewhat simpler than those in GASM, since they specify a position in the two-dimensional state table, i.e. they are a pair of objects.

Def. 29: Location and update. Given a universe U, a location is a pair in U × U; the set of all locations is Loc_U = U × U. An update is a pair consisting of a location and an element of U; the set of all updates is thus Upd_U = Loc_U × U.

Applying a set of such updates results in a new state, with the entries in the square table changed to the values given in the update set:

Def. 30: Application of update set. Given a state σ ∈ Σ_U and an update set Δ, applying Δ to σ yields the successor state σ', symbolically σ' = σ + Δ, defined as follows:

    (σ + Δ)(l) = v       if (l, v) ∈ Δ
    (σ + Δ)(l) = σ(l)    otherwise

See also the discussion in Section B.5.2 for more details. Clearly, the above definition only yields a well-defined function if the update set contains at most one new value for a given location. This condition is called consistency.

Def. 31: Consistency. An update set Δ is called consistent iff for all (l₁, v₁), (l₂, v₂) ∈ Δ: l₁ = l₂ implies v₁ = v₂.

In the following, we assume an update set to be consistent.
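The definitions of locations, updates, and update application can be rendered as a small Python sketch; the function names are ours, and the state is kept as a table keyed by location pairs:

```python
# A small sketch of Defs. 29-31, with a state kept as a table keyed
# by location pairs; the names are ours, not part of the formalism.

def consistent(updates):
    """An update set is consistent if it holds at most one new
    value per location (Def. 31)."""
    seen = {}
    for loc, val in updates:
        if loc in seen and seen[loc] != val:
            return False
        seen[loc] = val
    return True

def apply_updates(state, updates):
    """Pointwise application of a consistent update set (Def. 30)."""
    assert consistent(updates)
    successor = dict(state)
    for loc, val in updates:
        successor[loc] = val
    return successor

sigma = {("o", "a"): 1}
delta = {(("o", "a"), 2), (("o", "b"), 3)}
sigma2 = apply_updates(sigma, delta)
assert sigma2[("o", "a")] == 2 and sigma2[("o", "b")] == 3
assert sigma[("o", "a")] == 1            # the old state is untouched
assert not consistent({(("o", "a"), 1), (("o", "a"), 2)})
```

Note that all updates are computed against the old state and applied atomically, which is what makes the action a single step.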
Since there are several possible ways of defining the effects of the application of inconsistent update sets, each with its respective merits and drawbacks, we will not commit ourselves to one particular version and choose to leave this point open for further discussion.

B.3 Mapping automata

Mapping Automata (MA) describe the evolution of a state as defined above. Although their structure differs slightly from GASM, where a state is an algebra of a given signature, the evolution is still described by a rule that computes an update set for a given state; the application of this update set to the state it was computed for results in the successor state. Formally, we define MA as follows:

Def. 32: Mapping automaton. A mapping automaton is a pair A = (C, R), with C a set of constant symbols and R a rule.

The constant symbols C are similar in function to the signature in GASM in that they serve as anchor points for interpretation and also term evaluation, as will be seen below. (In fact, as will become clear in Section B.4, these symbols not only serve as constants, but also as the namespace for quantified and other variables. However, since the interpretation ι is never updated during the execution of an MA, and since even when some variable binding shadows a constant in the scope of a rule, the interpretation at least is not destructively modified in its scope, we will stick to this name.) Such an MA is related to some state universe by an interpretation as follows:

Def. 33: Interpretation. Given a universe U and a mapping automaton A = (C, R), we call a function ι: C → U an interpretation of C.

Without going into the details of how such a rule may be described (this will be the task of Section B.4), this is what it does: given an interpretation, it computes an update set from some state. Formally:

Def. 34: Rule. Given an MA and an interpretation ι of its constant symbols, its rule maps states to update sets: R_ι: Σ_U → P(Upd_U).
Now we can make precise the 'dynamics' of an MA, by defining a run starting from some state σ₀:

Def. 35: Run. A run of an MA A starting from some initial state σ₀ ∈ Σ_U is a sequence (σᵢ) such that σᵢ₊₁ = σᵢ + R_ι(σᵢ). Of course, a run terminates iff there exists an n such that σᵢ = σₙ for all i ≥ n.

B.4 A rule language and its denotation

In the following we suggest a notation for MA rules, which parallels the one suggested for GASM in (82). Following (82), we will give the denotation of each construction in our notation in terms of the update set that it represents, given an interpretation and a state, according to Definition 34. First, however, we will develop the notion of terms, which are the basic constituents of most rule constructs.

B.4.1 Terms

Terms are a kind of syntactic structure that we use to refer to objects of the universe. Some objects of the universe we can refer to directly, using constant symbols and an interpretation of them. For others we form compound terms and use the state. Therefore, we will define the evaluation of a term in a given state σ ∈ Σ_U and under some interpretation ι. MA terms are very simple structures (however, see Section B.4.3.2 for an extension that complicates things somewhat): they are either constant symbols, or pairs of terms. The latter can be intuitively thought of as signifying the application of the mapping that is bound to the value of the first term to the value of the second, which is the intuition responsible for the name of mapping automata. Making application left-associative, one can write the term (f, a) in the more familiar form f(a). Since we also need a basic predicate testing for the equality (i.e. identity) of two objects, equality tests are terms as well.

Def. 36: Terms. Let C be a set of constant symbols. Then the set T_C of all terms over C is defined to be the smallest set such that

    C ⊆ T_C
    t₁, t₂ ∈ T_C   implies   (t₁, t₂) ∈ T_C
    t₁, t₂ ∈ T_C   implies   (t₁ = t₂) ∈ T_C
Terms are assigned a value in a given state in the most straightforward way: constants are mapped to their interpretation, while pairs are evaluated by applying the mapping associated with the first element to the value of the second, or, equivalently, simply applying the state σ to the pair of values of the two terms. The identity test yields ⊥ if the two terms do not evaluate to the same object. If they do, the test must produce some other element, which we will call ⊤ here, but which has no special significance other than being different from ⊥.

Def. 37: Term evaluation. Given a set of constant symbols C, we define the value ⟦t⟧ of a term t in a state σ under interpretation ι recursively as follows:

    ⟦c⟧          = ι(c)                for c ∈ C
    ⟦(t₁, t₂)⟧   = σ(⟦t₁⟧, ⟦t₂⟧)
    ⟦(t₁ = t₂)⟧  = ⊤                   if ⟦t₁⟧ = ⟦t₂⟧
    ⟦(t₁ = t₂)⟧  = ⊥                   otherwise

B.4.2 Basic rule constructs

Now we outline a few basic rule constructs and give their meaning by the update sets they denote. The skip construct has no effect on the state. Its denotation is accordingly the empty set for any state:

    ⟦skip⟧(σ) = ∅

The most fundamental non-empty rule construct is the single atomic update, which we denote as

    (t₁, t₂) := t₃

Given a state σ, it denotes an update set consisting of one update:

    ⟦(t₁, t₂) := t₃⟧(σ) = { ((⟦t₁⟧, ⟦t₂⟧), ⟦t₃⟧) }

The conditional rule construct decides which of two rules to fire according to the value of a term:

    if t then R₁ else R₂ endif

Its denotation is therefore:

    ⟦if t then R₁ else R₂ endif⟧(σ) = ⟦R₁⟧(σ)    if ⟦t⟧ ≠ ⊥
    ⟦if t then R₁ else R₂ endif⟧(σ) = ⟦R₂⟧(σ)    otherwise

We also define the parallel composition of two rule descriptions, written as R₁ R₂. (Since at this point we have no notion of blocks as in (82), we need no do in-parallel syntax; except for inconsistencies, this rule notation is otherwise equivalent to it.) Its denotation is simply the union of the update sets:

    ⟦R₁ R₂⟧(σ) = ⟦R₁⟧(σ) ∪ ⟦R₂⟧(σ)

B.4.3 First-order extensions

As shown by Gurevich (82), one can add first-order constructs to describe both rules and terms. We will start with rule constructs and then turn to first-order terms.
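Term evaluation can be sketched directly from the definition; the tagged-tuple encoding of terms is our own implementation choice, not part of the formalism:

```python
# A sketch of term evaluation (Def. 37) over a state table and an
# interpretation; the tagged-tuple term encoding is our own choice.

BOTTOM, TOP = "bottom", "top"

def eval_term(term, sigma, iota):
    kind = term[0]
    if kind == "const":                      # c  ->  iota(c)
        return iota[term[1]]
    if kind == "apply":                      # (t1, t2) -> sigma(v1, v2)
        v1 = eval_term(term[1], sigma, iota)
        v2 = eval_term(term[2], sigma, iota)
        return sigma.get((v1, v2), BOTTOM)
    if kind == "eq":                         # (t1 = t2) -> TOP or BOTTOM
        v1 = eval_term(term[1], sigma, iota)
        v2 = eval_term(term[2], sigma, iota)
        return TOP if v1 == v2 else BOTTOM
    raise ValueError(kind)

iota = {"x": "obj1", "f": "obj2"}
sigma = {("obj2", "obj1"): "obj3"}
assert eval_term(("const", "x"), sigma, iota) == "obj1"
assert eval_term(("apply", ("const", "f"), ("const", "x")), sigma, iota) == "obj3"
assert eval_term(("eq", ("const", "x"), ("const", "x")), sigma, iota) == TOP
```

The `apply` case is the whole mechanism: compound terms only ever consult the single binary state function.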
B.4.3.1 Do-forall rule

The do-forall rule construct computes the update set of a rule description with some constant symbol bound to each element of some set. Its syntax is as follows:

    do forall c in S R enddo

where c is a constant symbol, R a rule description, and S specifies the set of elements which will be bound to c in R. Clearly, we must somehow restrict the sets that may thus be iterated upon, and not only for practical reasons. (From a theoretical point of view, allowing a rule to iterate on, say, U would potentially make the entire universe accessible, and thus the reserve empty; see Section B.4.5 for details.) We choose to restrict S to constructions of the form dom(t) or ran(t), where t is any term. These denote the domain and range, respectively, of the mapping associated with the value of t. (Further constructions might be useful here and harmless in the sense just discussed, such as a range of integers, if these are available. However, without making any assumptions about the structure of U, the above seem most natural.)

Def. 38: Domain and range of mappings. Given an a ∈ U, we define its domain and range (equivalently, the domain and range of the mapping associated with it) as

    dom(a) = { b ∈ U | σ(a, b) ≠ ⊥ }
    ran(a) = { σ(a, b) | b ∈ dom(a) }

With this, the denotation of the above set constructions becomes

    ⟦dom(t)⟧ = dom(⟦t⟧)
    ⟦ran(t)⟧ = ran(⟦t⟧)

Now we can define the denotation of the do-forall rule construct as the union of all updates resulting from the body for each individual element of the specified set bound to the constant symbol:

    ⟦do forall c in S R enddo⟧_ι(σ) = ⋃ { ⟦R⟧_ι[c↦b](σ) | b ∈ ⟦S⟧ }
B.4.3.2 First-order terms

First-order terms extend the definition of the set T_C of terms over a set C of constant symbols (see Definition 36) by the following clauses, where S ranges over the set-expressions dom(t) and ran(t):

    c ∈ C, t ∈ T_C   implies   (forall c in S : t) ∈ T_C
    c ∈ C, t ∈ T_C   implies   (exists c in S : t) ∈ T_C

The forall-term evaluates to ⊤ iff t evaluates to something other than ⊥ for all elements of the set denoted by S bound to the symbol c, and to ⊥ otherwise. The exists-term is ⊥ if t is ⊥ for all elements of that set, and ⊤ otherwise. Binding an object b to a constant symbol c is tantamount to changing the interpretation at point c to this new value, which we write as ι[c↦b].

    ⟦forall c in S : t⟧_ι = ⊤    if ⟦t⟧_ι[c↦b] ≠ ⊥ for all b ∈ ⟦S⟧, and ⊥ otherwise
    ⟦exists c in S : t⟧_ι = ⊤    if ⟦t⟧_ι[c↦b] ≠ ⊥ for some b ∈ ⟦S⟧, and ⊥ otherwise

B.4.4 Nondeterministic rules

The basic nondeterministic construct is

    choose c in S R endchoose

Intuitively, this nondeterministically selects one of the values in the set denoted by S, binds it to c and evaluates R. In order to capture this intuition we must introduce a nondeterministic denotation ⟨⟨R⟩⟩_ι(σ) of a rule description R, which is a set of alternative update sets. For the choose-construct above, the (nondeterministic) denotation is as follows:

    ⟨⟨choose c in S R endchoose⟩⟩_ι(σ) = { Δ | Δ ∈ ⟨⟨R⟩⟩_ι[c↦b](σ), b ∈ ⟦S⟧ }

Of course, we now have to give nondeterministic denotations for the other rule constructs as well, which can be done as follows:

    ⟨⟨skip⟩⟩_ι(σ) = { ∅ }
    ⟨⟨(t₁, t₂) := t₃⟩⟩_ι(σ) = { ⟦(t₁, t₂) := t₃⟧_ι(σ) }
    ⟨⟨if t then R₁ else R₂ endif⟩⟩_ι(σ) = ⟨⟨R₁⟩⟩_ι(σ) if ⟦t⟧ ≠ ⊥, and ⟨⟨R₂⟩⟩_ι(σ) otherwise
    ⟨⟨R₁ R₂⟩⟩_ι(σ) = { Δ₁ ∪ Δ₂ | Δ₁ ∈ ⟨⟨R₁⟩⟩_ι(σ), Δ₂ ∈ ⟨⟨R₂⟩⟩_ι(σ) }
    ⟨⟨do forall c in S R enddo⟩⟩_ι(σ) = { ⋃_{b ∈ ⟦S⟧} Δ_b | Δ_b ∈ ⟨⟨R⟩⟩_ι[c↦b](σ) }

Except for the do-forall case (and the parallel composition case, which can be considered a special case of the former), the nondeterministic denotation is very similar to the deterministic one, except that we talk about a set of update sets.
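The contrast between the deterministic do-forall and the nondeterministic choose can be sketched as follows; the former unions the update sets over all bindings, the latter yields one candidate update set per binding. All names here are our own:

```python
# Sketch of do-forall versus choose over the same rule body.

def do_forall(symbol, elements, body, sigma, iota):
    """Union of the body's update sets over all bindings of `symbol`."""
    updates = set()
    for e in elements:
        updates |= body(sigma, {**iota, symbol: e})
    return updates

def choose(symbol, elements, body, sigma, iota):
    """One alternative (frozen) update set per possible binding."""
    return {frozenset(body(sigma, {**iota, symbol: e})) for e in elements}

body = lambda s, i: {(("seen", i["c"]), "top")}

assert do_forall("c", ["a", "b"], body, {}, {}) == {
    (("seen", "a"), "top"), (("seen", "b"), "top")}
assert choose("c", ["a", "b"], body, {}, {}) == {
    frozenset({(("seen", "a"), "top")}),
    frozenset({(("seen", "b"), "top")})}
```

The result types differ accordingly: do-forall produces a single update set, choose a set of alternative update sets.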
For the do-forall construct, one has to consider all combinations of nondeterministic choices at each instance of the rule and build the union over these. The notion of a run is of course also affected by nondeterministic constructs. If a rule yields a set of update sets instead of just one, a nondeterministic run is defined as follows:

Def. 39: Nondeterministic run. A nondeterministic run of an MA A starting from some initial state σ₀ is a sequence (σᵢ) such that σᵢ₊₁ = σᵢ + Δᵢ for some Δᵢ ∈ ⟨⟨R⟩⟩_ι(σᵢ).

B.4.5 Creating new objects

Even though the universe is a static collection of objects, in specifications we often wish to refer to hitherto unused or fresh objects. Therefore, instead of creating new objects and extending the universe itself, we make objects that have so far been inaccessible to the MA accessible by picking them from a part of the universe that we could not refer to. This part, which we will make more precise below, is called our reserve.

B.4.5.1 Accessibility and allocation

We will define the set Acc_σ(ι) (or just Acc if the interpretation is understood) of all objects that a rule can refer to and depend on in a given state σ under an interpretation ι. The definition inductively includes all elements that can be reached by the constructions of the language, starting from the elements which are the interpretations of the constant symbols:

Def. 40: Accessibility. Given constant symbols C, we define the set Acc_σ(ι) of all accessible elements of U in state σ under interpretation ι to be the smallest set such that:

    ι(c) ∈ Acc                     for all c ∈ C
    a, b ∈ Acc   implies   σ(a, b) ∈ Acc
    a ∈ Acc      implies   dom(a) ⊆ Acc and ran(a) ⊆ Acc

Clearly, the result of any rule cannot depend on any object, or its surrounding structure, that is not in Acc_σ(ι). In this sense, the accessibility criterion is similar to the rules that govern garbage collection in programming language implementations. So in any state σ and interpretation ι, we can only talk about the accessible objects in Acc_σ(ι).
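The accessibility closure can be computed by a simple fixpoint iteration; the sketch below simplifies the definition to closure under state application only (dom/ran closure would be analogous), and the names are invented for illustration:

```python
# Sketch of the accessibility closure (Def. 40), simplified to
# closure under state application; dom/ran closure is analogous.

BOTTOM = "bottom"

def accessible(sigma, iota):
    acc = set(iota.values()) | {BOTTOM}
    changed = True
    while changed:
        changed = False
        for a in list(acc):
            for b in list(acc):
                v = sigma.get((a, b), BOTTOM)
                if v not in acc:
                    acc.add(v)
                    changed = True
    return acc

sigma = {("root", "next"): "n1", ("n1", "next"): "n2"}
iota = {"root": "root", "next": "next"}
acc = accessible(sigma, iota)
assert {"n1", "n2"} <= acc
assert "orphan" not in acc   # unreachable objects form the reserve
```

Everything outside the computed set is, by the definition below, part of the reserve.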
If we allow arbitrary 'construction' of new objects (as we do in the rule language of Section B.4), we have to provide a sufficiently large universe, so that we can guarantee that we can recruit new objects from the hitherto 'unused' (i.e. irrelevant) portion of the universe, which we will call our reserve:

Def. 41: Reserve. The set Res_σ = U \ Acc_σ(ι) is called the reserve (of state σ).

The requirement for a meaningful execution of an MA is therefore that its reserve be non-empty in any reachable state. Clearly, this rules out constructions that allow iteration and updates on the entire universe, such as a hypothetical

    do forall b in U (c, b) := b enddo

If c is a constant symbol interpreted as any non-⊥ value, applying the denotation of such a rule to any state leads to a state where the entire universe becomes accessible. Of course, the notion of accessibility is strongly connected to the constructs of the rule notation. (However, the above definition of global accessibility is far too loose for many practical applications to be used as a basis for storage allocation. Consider for example a situation where C is the set of all integer numerals, all strings, and all identifiers. A useful interpretation will supposedly map all these infinitely many symbols to infinitely many different objects, which thus become globally accessible, while any sensible implementation will only create number objects as they are needed during the computation process. It might make sense, therefore, to restrict the globally accessible objects for a given MA to those which can be reached by terms formulated only in constant symbols actually occurring in the MA rules. We will not further elaborate this point here.) If some constructs do not occur in a given MA, we may adapt the accessibility definition accordingly.
This is of particular importance when we restrict the language by imposing some kind of static structuring on the rules; then the set of visible elements in this kind of automaton may be quite different from the one we must assume for general MA. See Section B.5.2 for an example and an application of this principle.

B.4.5.2 The import-rule

Constructing the reserve in the above way allows us to give meaning to the notion of importing new or fresh elements into our visible part of the universe. The basic rule to pick an object from the reserve looks like this:

    import c R endimport

This rule actually does three things: it first picks an element from the reserve, binds it to the symbol c, and then executes the rule body R in the new context, i.e. in an interpretation that is identical to ι except at point c, which is mapped to the new object instead. If we call the new object chosen from the reserve u, we can write the new interpretation as ι[c↦u], and the deterministic and nondeterministic denotations, respectively, then become

    ⟦import c R endimport⟧_ι(σ) = ⟦R⟧_ι[c↦u](σ)
    ⟨⟨import c R endimport⟩⟩_ι(σ) = ⟨⟨R⟩⟩_ι[c↦u](σ)

As in (82) we assume that different imports choose different reserve elements. Furthermore, we assume that for any new element u, σ(u, a) = ⊥ for all a. Note also that the new object does not automatically become accessible in the successor state: although it is accessible under ι[c↦u] inside the rule body, the body has to manipulate the state so that the object can be accessed outside the rule in the next state.

B.5 Comparison to traditional ASMs

In this section we will first shed some light on what we perceive as one of the basic differences between MA and GASM, and then proceed to show their fundamental equivalence (as far as computational expressibility and level of abstraction are concerned). This will serve to document our claim that MA are basically a slightly different way of doing very similar things.
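The import-rule can be sketched as follows; the generator-based reserve and all names are our own implementation choices for the illustration:

```python
# Sketch of the import-rule: pick a fresh object from the reserve,
# extend the interpretation, and run the body in the new context.

import itertools

def import_rule(symbol, body, sigma, iota, reserve):
    fresh = next(reserve)          # different imports pick
    extended = dict(iota)          # different reserve elements
    extended[symbol] = fresh
    return body(sigma, extended)

# an endless supply of distinct, hitherto unused objects
reserve = (f"new{i}" for i in itertools.count())

# import c: (root, head) := c  -- the body must link the fresh object
# into the state, otherwise it stays inaccessible in the next state.
body = lambda s, i: {(("root", "head"), i["c"])}

assert import_rule("c", body, {}, {}, reserve) == {(("root", "head"), "new0")}
assert import_rule("c", body, {}, {}, reserve) == {(("root", "head"), "new1")}
```

The second assertion illustrates the convention that successive imports never reuse a reserve element.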
B.5.1 State and automata

A key difference between traditional ASMs and MA is the relation between a state (and the set of all states) and the automaton. A GASM state is always a state of a vocabulary, i.e. a signature containing function names of various arities that impose a certain structure on the state. Also, an ASM operating meaningfully on this state must in a sense 'know' about this structure, i.e. share its vocabulary. In MA, the situation is somewhat simpler. First, a state can be meaningfully defined without any recourse to syntactical elements such as function names, or their MA counterparts, constant symbols. A state is a simple structure imposed on the elements of some universe; indeed, there need not even be an MA, constant symbols, or any other syntactical conventions to be able to talk about a state. However, when we want to refer to particular parts of such a structure, say, individual objects, we must have a way of identifying them, so we can investigate the structure 'around' them. It was felt that the most straightforward way of doing this was to simply give them names, i.e. to provide a set of names and a mapping between these names and their denotations. These names and their interpretation, however, do not in any way introduce a structure into the system, unlike function names of various fixed arities. (Of course, the names themselves become structured by the way they relate to the different or identical elements of the universe.) They are basically a flat collection of distinguishable identifications of elements in the universe. The structure, therefore, is completely separated from the naming. This separation of concerns, leaving structure to the state and naming to the automaton (and its interpretation) that describes the evolution of such a structure, can be leveraged in various ways. For instance, there is no problem in applying several automata (each with its own interpretation and even different sets of constant symbols) to the same state: concurrently, independently, alternatively.
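The point that several automata with different constant symbols and interpretations can operate on one and the same state can be made concrete in a short hypothetical sketch (the rules, names, and the step helper are all invented for the example):

```python
# Two rule/interpretation pairs applied alternately to one state.

state = {("cell", "value"): 0}

iota1 = {"counter": "cell", "val": "value"}   # automaton 1's naming
iota2 = {"slot": "cell", "val": "value"}      # automaton 2's naming

def rule1(sigma, iota):
    loc = (iota["counter"], iota["val"])
    return {(loc, sigma[loc] + 1)}            # counter.val := counter.val + 1

def rule2(sigma, iota):
    loc = (iota["slot"], iota["val"])
    return {(loc, sigma[loc] * 2)}            # slot.val := slot.val * 2

def step(sigma, rule, iota):
    nxt = dict(sigma)
    for loc, val in rule(sigma, iota):
        nxt[loc] = val
    return nxt

state = step(state, rule1, iota1)
state = step(state, rule2, iota2)
assert state[("cell", "value")] == 2          # (0 + 1) * 2
```

Neither automaton knows the other's names; they only agree, via their interpretations, on the underlying objects.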
This can be used to promote a much higher degree of compositionality of automata. When composing a specification out of a set of automata, it might make sense to require them to share the same set of constant symbols. For GASM, sharing the same signature over a large number of automata would seem like a somewhat unnatural requirement, and would possibly even involve a good deal of renaming, prefixing, etc. to actually make it work; for MA this might be a sensible choice for the standard case. For instance, a conceivable set of constant symbols could consist of all identifiers plus all representations of some primitive data types, such as numbers and strings.

B.5.2 Equivalence of MA and traditional ASM

In this section we show how to map a GASM into an MA and vice versa. The translation from MA to GASM is already given by the fact that MA are defined as a GASM with a special kind of structure. The translation from GASM into MA allows one to use the MA tool for GASM tool support, since the translation does not change the abstraction level. In fact the translation deals only with some semantical details, e.g. the adaptation of the different views on booleans and relations, and the modeling of n-ary functions with tuples. Before we start describing the translation from GASM to MA, we recall the different ways booleans and partial functions are treated. In GASM, booleans are modeled by two distinct elements true and false, and partial functions are modeled by mapping to a third element undef. The carrier set of each GASM thus needs at least three distinct elements: true, false, and undef. Differently, in MA there exist only two distinguished elements, called bottom (⊥) and top (⊤). ⊥ is used for partial functions and as interpretation of false; true is represented by ⊤ or any other element in the carrier set.
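The two conventions for truth values can be summarized as a pair of conversion helpers; the helper names are invented, and this is only a sketch of the idea behind the wrappings used in the translation:

```python
# GASM uses three elements (true, false, undef); MA uses bottom for
# both undef and false, and any non-bottom value (canonically top)
# for true.

BOTTOM, TOP = "bottom", "top"

def gasm_bool_to_ma(b):
    """Map a GASM boolean into the MA convention."""
    return BOTTOM if b == "false" else TOP

def ma_to_gasm_bool(x):
    """Read an MA value back as a GASM boolean."""
    return "false" if x == BOTTOM else "true"

assert gasm_bool_to_ma("false") == BOTTOM
assert gasm_bool_to_ma("true") == TOP
assert ma_to_gasm_bool(BOTTOM) == "false"
assert ma_to_gasm_bool("anything-else") == "true"
```

The asymmetry (any non-⊥ value counts as true) is exactly what makes the MA conditional test ⟦t⟧ ≠ ⊥ work.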
Both GASM and MA are non-strict.

Mapping a GASM state into an MA state. In general the universe U of objects of an MA consists of at least two elements, one denoted by ⊥ and the other by ⊤. Since the GASM super-universe contains at least three elements (true, false, and undef), we need to start with a U containing a third element. The set of constant symbols of an MA modeling a GASM contains at least the three constants true, false, and undef, and each interpretation ι maps undef to the element ⊥, true to the element ⊤, and false to the third, default element of U. For convenience, we will no longer distinguish between the symbols undef, true, false and the three objects representing them. Tuples are modeled in MA by freely generated elements with a static mapping as follows:

- the associated mapping of the 0-ary tuple () is given by: a ↦ (a), where (a) is the freely generated one-tuple;
- the associated mapping of a one-tuple (a) is given by: b ↦ (a, b), where (a, b) is the freely generated two-tuple;
- for each n > 1, the mapping of an n-tuple (a₁, ..., aₙ) is given by: b ↦ (a₁, ..., aₙ, b).

When mapping a concrete GASM into an MA, all elements of the GASM super-universe are included in U, and all symbols of the GASM vocabulary are included in the constant symbols of the MA; for each of them a new element serving as its interpretation is included in U. In other words, U consists of the disjoint union of the super-universe, the elements interpreting the GASM function symbols, and the above introduced tuples. We need to make a case distinction between functions and relations in GASM. The interpretation of each n-ary function f in the GASM structure is reflected in the interpretation of σ:

    f(a₁, ..., aₙ) = b    ⟺    σ(f, (a₁, ..., aₙ)) = b

An n-ary relation r in a GASM returns either true or false. To make everything fit together, we reflect the interpretation of each r as follows:

    r(a₁, ..., aₙ) = false    ⟺    σ(r, (a₁, ..., aₙ)) = ⊥
    r(a₁, ..., aₙ) = true     ⟺    σ(r, (a₁, ..., aₙ)) = ⊤

Now we need two different wrappings.
One is needed to get back the original true/false results of a relational term; the second is needed to map such results back into the ⊥/⊤ model of MA. Let us thus assume two constants w₁ and w₂ such that:

    w₁(x) = true     where x ≠ ⊥           w₁(⊥) = false
    w₂(x) = ⊤        where x ≠ false       w₂(false) = ⊥

For equality the usual MA equality can be used; the logical operations of GASM are mapped into MA like normal binary relations.

Remark on reachability. Of course, the mappings associated with the tuples and the wrappings w₁ and w₂ must be excluded from the definition of reachability.

Mapping a GASM rule into an MA rule. We now define a transformation T from GASM rules to MA rules. For notational convenience we leave away σ and ι whenever the situation is clear.

Terms. For all function symbols f, the subterms must be transformed:

    T[f(t₁, ..., tₙ)] = f(T[t₁], ..., T[tₙ])

For all relation symbols r, in addition the term is wrapped with w₁:

    T[r(t₁, ..., tₙ)] = w₁(r(T[t₁], ..., T[tₙ]))

Updates. For all function symbols f, the subterms must be transformed:

    T[f(t₁, ..., tₙ) := t₀] = (f(T[t₁], ..., T[tₙ]) := T[t₀])

For all relation symbols r, in addition the right-hand side is wrapped with w₂:

    T[r(t₁, ..., tₙ) := t₀] = (r(T[t₁], ..., T[tₙ]) := w₂(T[t₀]))

Conditional.

    T[if t then R₁ else R₂ endif] = if w₂(T[t]) then T[R₁] else T[R₂] endif

Do forall.

    T[do forall x in A R enddo] = do forall x in dom(T[A]) T[R] enddo

Choose.

    T[choose x in A R endchoose] = choose x in dom(T[A]) T[R] endchoose

C Stärk's Model of the Imperative Java Core

In this appendix we reproduce with Montages the specification of the imperative core of Java as given by Stärk (203), which is based on Schulte and Börger's Java model (33). Our reproduction shows that their style of describing languages with ASM can be directly used with Montages. Using our framework, the resulting specification is shorter and more visual than the original ASM model.
In the Montages solution the textual rules are shortened from 85 lines to 29 lines, and the complete control flow is specified graphically. The given reproduction can be directly executed using the Gem-Mex tool. In the following we provide only a minimal description, in order to allow for a comparison with the alternative, more compositional specification we give in Chapter 14. The descriptions are an extract from a hand-out given to students.

C.1 Functions

The universe Abr contains the unary constructors break( ) and continue( ), denoting the set of reasons for abrupt completion.

    universe Abr = {break(_), continue(_)}

The universe Nrm is the set of normal values, including booleans, integers, ..., and the constant normal. In (203) a dynamic, 0-ary function pos and a universe Pos are used to keep track of the control. These functions are not needed in our reproduction, since we use FSMs: pos corresponds to the current state in the FSM, and Pos corresponds to the set of states in the FSM.

    function loc(_)

The dynamic, unary function loc assigns values to variables. It is updated in an assignment statement. It is also updated as a side effect during the evaluation of assignment expressions. We will refer to loc as the local environment.

    attr val

The dynamic attribute val is used to store intermediate values of expressions and results of the execution of statements. It assigns normal or abrupt values to the nodes of the AST.

C.2 Expressions

    exp = lit | id | uExp | bExp | cExp | asgn

The reproduced specification contains literals, identifiers, and unary, binary, conditional, and assignment expressions. The dynamic semantics of these constructs is given by rules that evaluate the expression and assign the result to the attribute val.

    lit = Boolean | Number

For simplicity only number and boolean literals are considered. Their val attribute is statically initialized with their constant value.
Their FSM consists of one state without action. The semantics of a unary expression is given by the Montage in Figure 113. First the exp-component is visited, resulting in its evaluation. The result is accessed as S-exp.val and used to calculate the value of the unary expression. According to (203), the JLS-function contains the Java Language Specification (74) definitions for the operators.

    uExp ::= ”(” uop exp ”)”
    uop  =   ”+” | ”-” | ”!” | cast

    @eval: val := JLS(S-uop.Name, S-exp.val)

Fig. 113: The uExp Montage (control flow: I -> S-exp -> eval -> T).

Binary expressions are evaluated in a similar way, see Figure 114. In the case of a division by zero, the firing condition guides the FSM into the exit state; otherwise the eval state is reached. In the exit state, execution is stopped abruptly.

    bExp ::= ”(” exp bop exp ”)”
    bop  =   ”*” | ”/” | ”+” | ”-” | ...

    exit-condition: S-bop.Name = ”/” and S2-exp.val = 0

    @eval: val := JLS(S-bop.Name, S1-exp.val, S2-exp.val)
    @exit: RAISE EXPRESSION

Fig. 114: The bExp Montage (control flow: I -> S1-exp -> S2-exp -> eval -> T, with an exit state reached on division by zero).

The conditional expression cExp is given in Figure 115. After the evaluation of the first expression, control is passed either to the second or to the third expression, depending on its value. The three expressions are referenced as S1-exp, S2-exp, and S3-exp, respectively. The condition whether to choose the second or the third expression is formalized as src.val. The term src denotes the source of a control arrow; thus in Figure 115 the firing condition src.val is equivalent to S1-exp.val. As a very convenient feature, the term src can be used within transition rules as well. In that case, src denotes the source of the control arrow that has been used to reach the current state. This fact is used in the copy transition rule val := src.val, where the value of the evaluated expression is copied as value of the conditional expression.
cExp ::= "(" exp "?" exp ":" exp ")"
@copy: val := src.val

Fig. 115: The cExp Montage.

The Montage of an assignment is given in Figure 116. The do-action updates the value of the variable S-id.Name in the local environment to the value of S-exp. Further, the value of the assignment is set to the value of S-exp.

asgn ::= "(" id "=" exp ")"
@do: loc(S-id.Name) := S-exp.val
     val := S-exp.val

Fig. 116: The asgn Montage.

C.3 Statements

A total of eight different statements is given:

stm = skipIt | asgnStm | ifStm | whileStm | labeledStm | breakStm | continueStm | block

The Montages for the skip (Figure 117), the if- (Figure 118), and the assignment statement (Figure 119) are self-explanatory. The edges in the while statement (Figure 120) repeat the execution of the statement-component as long as the value of the expression-component evaluates to true. The loop is also exited if the value of the statement-component evaluates to an abrupt constructor. If the loop is left, the copy-action sets the value of the while-statement to the value of the last executed construct. In the norm-state, non-abrupt values are reset to normal.

skipIt ::= ";"
@norm: val := normal

Fig. 117: The skipIt Montage.

ifStm ::= "if" "(" exp ")" stm "else" stm
@copy: val := src.val

Fig. 118: The ifStm Montage.

asgnStm ::= id "=" exp ";"
@do: loc(S-id.Name) := S-exp.val
     val := S-exp.val

Fig. 119: The asgnStm Montage.

whileStm ::= "while" "(" exp ")" stm
@copy: val := src.val
@norm: if not Abr(val) then val := normal endif

Fig. 120: The whileStm Montage.

The Montages for the break and the continue statements correspond to the literal expressions. Their value is statically initialized with the corresponding constructor terms.
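The whileStm FSM of Figure 120 can be sketched in Python, assuming the expression and statement components are modeled as callables returning their val attribute. NORMAL and the tuple-encoded Abr constructors are stand-ins for the universes Nrm and Abr; none of this is Gem-Mex code.

```python
NORMAL = "normal"

def is_abr(val):
    # Abr(v) holds for the constructor terms break(_) and continue(_)
    return isinstance(val, tuple) and val[0] in ("break", "continue")

def exec_while(eval_exp, exec_stm):
    val = NORMAL
    while eval_exp():            # default arrow back to S-exp
        val = exec_stm()
        if is_abr(val):          # arrow guarded by Abr(S-stm.val)
            break
    if not is_abr(val):          # @norm: reset non-abrupt values to normal
        val = NORMAL
    return val

# Example: a loop that counts down normally, and one that breaks abruptly.
def make_countdown(k):
    state = {"n": k}
    def cond():
        return state["n"] > 0
    def body():
        state["n"] -= 1
        return 42
    return cond, body

cond, body = make_countdown(3)
countdown_result = exec_while(cond, body)
abrupt_result = exec_while(lambda: True, lambda: ("break", "L"))
```

As in the Montage, an abrupt value leaves the loop unchanged, so an enclosing labeled statement can still inspect it.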
The EBNF rules are

breakStm ::= "break" id ";"
continueStm ::= "continue" id ";"

The value of the break-statement is initialized to break(S-id.Name) and the value of the continue-statement is initialized to continue(S-id.Name).

In Figure 121 a block-statement for a fixed block length of 3 is shown. In case of an abrupt completion, for instance break( ) or continue( ), the default flow is overruled by the control arrows with the condition Abr(src.val). In the copy-state the value of the last executed statement is passed as value of the block.

blockOf3 ::= "{" stm stm stm "}"
@copy: val := src.val

Fig. 121: The blockOf3 Montage.

In Figure 122 the Montage for the block-statement with variable length is given, using the List box. The previously shown fixed-length block is an example of how such a List box works: the members of the list are linked sequentially by default-arrows. An arrow leaving from the element inside a list corresponds to a family of arrows, one for each member.

block ::= "{" { bstm } "}"
bstm = stm | var
@copy: val := src.val

Fig. 122: The block Montage.

The labeled statement (Figure 123) is used to catch the abrupt completions of its statement-component. In case of a continue-completion matching the label, and the statement-component being a while loop, control is passed again to the statement-component. This case is covered by the arrow leaving and entering the S-stm box. Otherwise the usual copy-state recovers the value of the statement-component. In the norm-state, the value is reset to normal, if the statement-value was a break with a matching label.

labeledStm ::= id ":" stm
firing condition of the loop arrow: whileStm(S-stm) and S-stm.val = continue(S-id.Name)
@copy: val := src.val
@norm: if S-stm.val = break(S-id.Name) then val := normal endif

Fig.
123: The labeledStm Montage.

D Type System of Java

As an example of the use of static semantics technology we show the type system of the Java programming language. For examples we refer to the Java language specifications, editions 1 (74) and 2 (75). The following descriptions are minimal extracts from an executable version running on the Gem-Mex system. A complete treatment would include a detailed discussion of Java typing, a topic which goes beyond the scope of this thesis.

D.1 Reference Types

In Java there are primitive types and reference types. Reference types are classes, interfaces, and arrays. Here we introduce classes and interfaces. Our Java model identifies class and interface types with the syntax-tree nodes being their declarations. The same technique has been used in a number of ASM models of object-oriented languages (130) and will be used in Section 12. This approach has several advantages, among others the ease of animating typing annotations, and the possibility to "reload" new versions of a class without stopping the program; in that case one simply has two copies of the same class: one AST being the old version, used as type of all existing instances of the class, and a second AST being the new version, which will be used as type for new instances to be created. Further, it is the ideal basis to model advanced features like inner classes.

Gram.
19:
program ::= { unit } body
unit ::= { classModifier } classOrInterface
classOrInterface = classDeclaration | interfaceDeclaration
classModifier = "public" | "abstract" | "final"
classDeclaration ::= "class" typeId ["extends" superId]
    ["implements" interfaceId { "," interfaceId }]
    "{" { memberDeclaration } "}"
superId = typeRef
interfaceId = typeRef
typeRef = Ident
interfaceDeclaration ::= "interface" typeId
    ["extends" interfaceId { "," interfaceId }]
    "{" { interfaceMemberDeclaration } "}"

The start symbol program produces a list of units and a body. A unit is a class or interface declaration together with a list of modifiers. The attribute signature is used to unify access to the names of units.

Attr. 3:
unit: attr signature == S-classOrInterface.signature
classDeclaration: attr signature == S-typeId.Name
interfaceDeclaration: attr signature == S-typeId.Name

Within a class or interface declaration, the function enclosing( , ) (ASM 20, Section 5.3.2) together with the derived set TypeDecl can be used to refer to the enclosing type.

Decl. 17:
derived function TypeDecl == {"classDeclaration","interfaceDeclaration"}

The term n.enclosing(TypeDecl) denotes the least enclosing reference type.

Static Typing

The attribute staticType is defined for types, where its definition is the identity; type references are used in different declarations, statements, and expressions of Java. Further, each Java expression has a static type, which is used as basis for type checking and for evaluating dynamic typing.

Attr. 4:
classDeclaration: attr staticType == self
interfaceDeclaration: attr staticType == self

Instances of program have the attribute declTable( ) for looking up the class and interface declarations, given their name.

Attr. 5:
program: attr declTable(uRef) ==
  (choose u in sequence S-unit: u.signature = uRef).S-classOrInterface

Type references can determine their static type by looking up the declTable of the least enclosing program or package instance.
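The declTable lookup of Attr. 5 amounts to selecting the unit whose signature matches the requested name. A hedged Python sketch, where dicts stand in for AST nodes and all field names are assumptions for illustration only:

```python
def decl_table(units, u_ref):
    # choose u in sequence S-unit: u.signature = uRef,
    # then project to its classOrInterface component
    for u in units:
        if u["signature"] == u_ref:
            return u["classOrInterface"]
    return None

units = [{"signature": "A", "classOrInterface": "decl-of-A"},
         {"signature": "B", "classOrInterface": "decl-of-B"}]
```

The choose-construct picks some matching unit; the linear scan here returns the first one, which coincides when signatures are unique.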
Here we abstract from packages.

Attr. 6:
typeRef: attr staticType == enclosing({"program"}).declTable(signature)
         attr signature == Name

Modifiers

Instances of unit, classDeclaration, memberDeclaration, fieldRest, methodRest, interfaceDeclaration and interfaceMemberDeclaration can have modifiers. Possible modifiers for classes and interfaces are public, final, and abstract. Methods and fields may as well be protected or private, and finally fields may have the modifier static. The attribute hasModifier( ) is used to test for a modifier. Its definition contains some parts related to the implicit abstract modifier.

Attr. 7: hasModifier( )
unit: attr hasModifier(mStr) ==
  (exists M in sequence S-classModifier: M.Name = mStr)
classDeclaration: attr hasModifier(mStr) ==
  Parent.hasModifier(mStr) OR ((mStr = "abstract") AND isAbstract)
interfaceDeclaration: attr hasModifier(mStr) ==
  mStr = "abstract" OR Parent.hasModifier(mStr)

A special case is the modifier abstract. Class declarations are implicitly abstract, if they have at least one abstract member, or if there is a visible abstract method which is not implemented by another visible method overriding the first one.

Attr. 8: isAbstract
attr isAbstract ==
  ( (exists mDec in sequence S-memberDeclaration:
       (mDec.methodDeclaration) AND (mDec.hasModifier("abstract")))
    OR
    (exists mDec in NODE:
       mDec.methodDeclaration AND mDec.hasModifier("abstract")
       AND visible(mDec)
       AND (not (exists m2Dec in NODE:
              m2Dec.methodDeclaration
              AND m2Dec.signature = mDec.signature
              AND (not (m2Dec.hasModifier("abstract")))
              AND m2Dec.enclosing(Scope).subtypeOf(mDec.enclosing(Scope))
              AND visible(m2Dec)))))

Accessibility

A type is accessible from another type, if it either has the modifier "public", or both types are defined in the same program. The attribute accessibleFrom( ) is defined as follows.

Attr.
9: accessibleFrom( )
unit: attr accessibleFrom(tDec) ==
  (enclosing({"program"})) = (tDec.enclosing({"program"}))
  OR hasModifier("public")
classDeclaration: attr accessibleFrom(tDec) == Parent.accessibleFrom(tDec)
interfaceDeclaration: attr accessibleFrom(tDec) == Parent.accessibleFrom(tDec)

D.2 Subtyping

The subtyping relation is based on the direct super classes and direct interfaces. The direct super class is denoted by the "extends"-clause and the direct interfaces are denoted by the "implements"-clause. A class without extends clause has the direct super class Object.

Decl. 18: constructor Object

The definitions for direct super class and direct interfaces are given as follows.

Attr. 10:
classDeclaration:
  attr directSuperClass == --JLSv1, 8.1.3; line 1-2
    (if S-superId.NoNode then Object else S-superId.staticType)
  attr directInterface(iDec) ==
    (exists iRef in sequence S-interfaceId: iDec = iRef.staticType)
interfaceDeclaration:
  attr directInterface(iDec) ==
    (exists iRef in sequence S-interfaceId.Children: iDec = (iRef.staticType))

Subtyping is basically the reflexive, transitive closure over the relations directSuperClass and directInterface.

Attr. 11: subtypeOf( )
classDeclaration: attr subtypeOf(tDec) ==
  (self = tDec)
  OR ((directSuperClass != Object) AND directSuperClass.subtypeOf(tDec))
  OR (exists iDec in interfaceDeclaration:
        (directInterface(iDec) AND iDec.subtypeOf(tDec)))
interfaceDeclaration: attr subtypeOf(tDec) == --SPECIALIZATION FROM classDeclaration
  (self = tDec)
  OR (exists iDec in interfaceDeclaration:
        directInterface(iDec) AND iDec.subtypeOf(tDec))

D.3 Members

Classes and interfaces are characterized by a number of members. Members can be fields or methods. Here we use a dummy definition for methods to shorten the definitions.

Gram. 20:
memberDeclaration ::= { modifier } returnType idOrMethId fieldOrMethodRest
interfaceMemberDeclaration ::= returnType id fieldOrMethodRest
modifier = "public" | "protected" | "private" | "final" | "static" | "abstract"
returnType = voidType | type
idOrMethId = Ident | methId
fieldOrMethodRest = fieldRest | methodRest
fieldRest ::= ["=" exp] { "," additionalFieldDeclaration } ";"
additionalFieldDeclaration ::= Ident ["=" exp]
methodRest ::= "(" ")" body

The attributes fieldDeclaration and methodDeclaration are used to check whether a member is a field or a method.

Attr. 12:
memberDeclaration, interfaceMemberDeclaration:
  attr fieldDeclaration == S-fieldOrMethodRest.fieldRest
  attr methodDeclaration == S-fieldOrMethodRest.methodRest

Static Typing

staticType denotes the type of the member, envType the enclosing class or interface declaration.

Attr. 13:
memberDeclaration:
  attr staticType == S-returnType.staticType
  attr envType == enclosing(TypeDecl)
interfaceMemberDeclaration:
  attr staticType == S-returnType.staticType
  attr envType == enclosing(TypeDecl)

Modifiers

As in the case of types, modifiers of members denote special properties of them. Some of them are given explicitly, by the modifier-sequence, and others, like "abstract", may be derived.

Attr. 14: hasModifier( )
memberDeclaration: attr hasModifier(mStr) ==
  (exists m2Str in sequence S-modifier: m2Str.Name = mStr)
  OR S-fieldOrMethodRest.hasModifier(mStr)
interfaceMemberDeclaration: attr hasModifier(mStr) ==
  (mStr isin {"public","final"})
  OR (mStr = "abstract" AND S-fieldOrMethodRest.methodRest)
  OR (mStr = "static" AND S-fieldOrMethodRest.fieldRest)
fieldRest: attr hasModifier(mStr) == false
methodRest: attr hasModifier(mStr) == (mStr = "abstract") AND (S-body.empty)

Accessibility

Accessibility determines whether a member is accessible from a type.
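A member-accessibility check of this shape can be sketched in Python. This is an illustrative approximation of the rule, not the thesis's definition: the dict layout and the is_subtype predicate are assumptions made for the sketch.

```python
def accessible_from(member, from_type, is_subtype):
    mods = member["modifiers"]
    if "public" in mods:
        return True                                   # public: always accessible
    if "private" in mods:
        return from_type["name"] == member["env_type"]  # only the declaring type
    if from_type["package"] == member["package"]:
        return True                                   # non-private, same package
    return "protected" in mods and is_subtype(from_type["name"],
                                              member["env_type"])

# Tiny fixture: Sub is a subtype of Base; the field is protected in Base.
subtypes = {("Sub", "Base"): True}
is_sub = lambda t, s: t == s or subtypes.get((t, s), False)
field = {"modifiers": {"protected"}, "env_type": "Base", "package": "p1"}
```

The order of the guards mirrors the disjunction in the attribute definition: public, then private, then package access, then the protected/subtype case.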
Formally this fact is written using the attribute accessibleFrom( ). Accessibility of members is a precondition for visibility, which is in turn a condition for a member being present in the declaration table declTable of a Java type. A member is accessible, if it is public; or if it is private and the type from which it is accessed is the same as the type in which it is declared; or if it is not private, and the type it is accessed from and the type it is declared in are in the same package; or if it is protected, and the type it is accessed from is a subtype of the type it is declared in.

Attr. 15: accessibleFrom( )
memberDeclaration, interfaceMemberDeclaration:
  attr accessibleFrom(tDec) ==
    hasModifier("public")
    OR (hasModifier("private") and (envType = tDec))
    OR ((not hasModifier("private"))
        and (tDec.enclosing({"package"}) = enclosing({"package"})))
    OR (hasModifier("protected") AND (tDec != Object)
        AND (tDec.subtypeOf(envType)))

D.4 Visibility and Reference of Members

A member is visible in a type, formally visible( ), if it is a direct member or the following three conditions hold. First, the member is accessible from the type; second, there exists no other member with the same name being a direct member of the type; and third, either the member is visible in the direct super-class of the type, or there exists a direct interface of the type where the member is visible. (1)

(1) The third condition in the formula Attr. 16.

Attr.
16: visible( )
classDeclaration: attr visible(mDec) ==
  directMember(mDec)
  OR ( mDec.accessibleFrom(self)
       AND ( (directSuperClass != Object AND directSuperClass.visible(mDec))
             OR (exists iDec in interfaceDeclaration:
                   directInterface(iDec) AND iDec.visible(mDec)))
       AND ( not (exists m2Dec in NODE:
               directMember(m2Dec) AND m2Dec.signature = mDec.signature)))
interfaceDeclaration: attr visible(mDec) ==
  directMember(mDec)
  OR ( mDec.accessibleFrom(self)
       AND (exists iDec in interfaceDeclaration:
              directInterface(iDec) AND iDec.visible(mDec))
       AND ( not (exists m2Dec in NODE:
               directMember(m2Dec) AND m2Dec.signature = mDec.signature)))

D.5 Reference of Static Fields

For the reference to static fields the above function visible is now used. A static field is in the declTable of a type if there exists a unique member with that name, among all members of all types, which is visible in the type. For the reference to methods, the definition of visible is enough.

Attr. 17: declTable( )
classDeclaration, interfaceDeclaration:
  attr declTable(mRef) == -- only needed for fields; for methods, visible is enough
    (choose unique mDec in NODE:
       (mDec.memberDeclSet) AND (mDec.signature = mRef) AND visible(mDec))

Bibliography

[1] Proc. First USENIX Conference on Domain Specific Languages, Santa Barbara, California, October 1997.
[2] Alcatel, I-Logix, Kennedy-Carter, Kabira, Project Technology, Rational, and Telelogic AB. Action semantics for the UML, OMG ad/2001-08-04, response to OMG RFP ad/98-11-01, August 2001.
[3] V. Ambriola, G. E. Kaiser, and R. J. Ellison. An action routine model for ALOE. Technical Report CMU-CS-84-156, Department of Computer Science, Carnegie Mellon University, August 1984.
[4] M. Anlauff. Aslan - programming in abstract state machines. A small stand-alone ASM interpreter written in C, ftp://ftp.first.gmd.de/pub/gemmex/Aslan.
[5] M. Anlauff. XASM - An Extensible, Component-Based Abstract State Machines Language. In Gurevich et al. (86), pages 69-90.
[6] M. Anlauff, A. Bemporad, S.
Chakraborty, P. W. Kutter, D. Mignone, M. Morari, A. Pierantonio, and L. Thiele. From ease in programming to easy maintenance: Extending DSL usability with Montages. Technical Report 83, ETH Zurich, Institute TIK, 1999.
[7] M. Anlauff, S. Chakraborty, P. W. Kutter, A. Pierantonio, and L. Thiele. Generating an Action Notation environment from Montages descriptions. Software Tools and Technology Transfer, Springer, (3):431-455, 2001.
[8] M. Anlauff and P. W. Kutter. The XASM open source project. http://www.xasm.org, 2002.
[9] M. Anlauff, P. W. Kutter, and A. Pierantonio. The Gem-Mex tool homepage. URL: http://www.gem-mex.com.
[10] M. Anlauff, P. W. Kutter, and A. Pierantonio. Formal aspects of and development environments for Montages. In M. Sellink, editor, Proc. 2nd International Workshop on the Theory and Practice of Algebraic Specifications, Workshops in Computing, Amsterdam, 1997. Springer Verlag.
[11] M. Anlauff, P. W. Kutter, and A. Pierantonio. Aslan: Programming with ASMs. Presentation at the Second Cannes ASM Workshop 1998, June 1998.
[12] M. Anlauff, P. W. Kutter, and A. Pierantonio. Enhanced control flow graphs in Montages. In D. Bjorner, M. Broy, and A. V. Zamulin, editors, Perspectives of System Informatics, volume 1755 of Lecture Notes in Computer Science, pages 40-53. Springer Verlag, 1999.
[13] M. Anlauff, P. W. Kutter, A. Pierantonio, and A. Sünbül. Using domain-specific languages for the realization of component composition. In T. Maibaum, editor, Fundamental Approaches to Software Engineering (FASE 2000), volume 1783 of Lecture Notes in Computer Science, pages 112-126, 2000.
[14] G. Arango. Domain analysis: From art form to engineering discipline. ACM SIGSOFT Engineering Notes, 14(3):152-159, May 1989. 5th Int. Workshop on Software Specification and Design.
[15] M. A. Ardis, N. Daley, D. Hoffman, H. Siy, and D. M. Weiss. Software product lines: a case study. Software Practice and Experience, 30(7):825-847, June 2000.
[16] M. A.
Ardis and J. A. Green. Successful introduction of domain engineering into software development. Bell Labs Technical Journal, pages 10-20, July-September 1998.
[17] E. Astesiano and E. Zucca. D-oids: a model for dynamic data-types. Mathematical Structures in Computer Science, 5(2):257-282, June 1995.
[18] R. Bahlke and G. Snelting. Design and structure of a semantics-based programming environment. International Journal of Man-Machine Studies, 37(4):467-479, October 1992.
[19] R. A. Ballance, S. L. Graham, and M. L. Van De Vanter. The Pan language-based editing system. ACM Transactions on Software Engineering and Methodology, 1(1):95-127, January 1992.
[20] A. Basu, M. Hayden, G. Morrisett, and T. von Eicken. A language-based approach to protocol construction. In Proc. DSL'97, ACM SIGPLAN Workshop on Domain-Specific Languages, Univ. of Ill. Comp. Sci. Report, pages 1-15, 1997.
[21] D. Batory, B. Lofaso, and Y. Smaragdakis. JTS: tools for implementing domain-specific languages. In Proc. of 5th Int. Conf. on Software Reuse, pages 143-153. IEEE Computer Society Press, June 1998.
[22] A. Beetem and J. Beetem. Introduction to the Galaxy language. IEEE Software, May 1989.
[23] D. Bell and M. Parr. Spreadsheets: a research agenda. SIGPLAN Notices, 28(9):26-28, September 1993.
[24] J. L. Bentley. Programming pearls: Little languages. Communications of the ACM, 29(8):711-721, 1986.
[25] OMG Architecture Board. Model-driven architecture: A technical perspective. ftp://ftp.omg.org/pub/docs/ab/01-02-01.pdf, 2001.
[26] B. Boehm. Making RAD work for your project. IEEE Computer, pages 113-119, March 1999.
[27] E. Börger and I. Durdanović. Correctness of Compiling Occam to Transputer Code. Computer Journal, 39(1):52-92, 1996.
[28] E. Börger, I. Durdanović, and D. Rosenzweig. Occam: Specification and Compiler Correctness. Part I: The Primary Model. In IFIP 13th World Computer Congress, Volume I: Technology/Foundations, pages 489-508.
Elsevier, Amsterdam, 1994.
[29] E. Börger and J. Huggins. Abstract state machines 1988-1998: Commented ASM bibliography. In H. Ehrig, editor, EATCS Bulletin, Formal Specification Column, number 64, pages 105-127. EATCS, February 1998.
[30] E. Börger and D. Rosenzweig. A mathematical definition of full Prolog. In Science of Computer Programming, volume 24, pages 249-286. North-Holland, 1994.
[31] E. Börger and D. Rosenzweig. The WAM - Definition and Compiler Correctness, chapter 2, pages 20-90. Series in Computer Science and Artificial Intelligence. Elsevier Science B.V., North Holland, 1995.
[32] E. Börger and J. Schmid. Composition and submachine concepts for sequential ASMs. In P. Clote and H. Schwichtenberg, editors, Gurevich Festschrift CSL 2000, LNCS. Springer-Verlag, 2000. To appear.
[33] E. Börger and W. Schulte. A programmer friendly modular definition of the semantics of Java. In J. Alves-Foss, editor, Formal Syntax and Semantics of Java, volume 1523 of Lecture Notes in Computer Science. Springer Verlag, 1998.
[34] P. Borras, D. Clément, T. Despeyroux, J. Incerpi, G. Kahn, B. Lang, and V. Pascual. CENTAUR: The System. Technical Report 777, INRIA, Sophia Antipolis, 1987.
[35] P. Borras, D. Clément, T. Despeyroux, J. Incerpi, G. Kahn, B. Lang, and V. Pascual. Centaur: The system. In Proc. SIGSOFT 88: 3rd Annual Symposium on Software Development Environments, Boston, November 1988. ACM, New York.
[36] G. H. Campbell. Domain-specific engineering. In Proceedings of the Embedded Systems Conference, San Jose, September 1997. Miller Freeman, Inc., San Francisco, www.mfi.com.
[37] G. H. Campbell, S. Faulk, and D. M. Weiss. Introduction to synthesis. Technical Report INTRO-SYTNTHESIS-PROCESS-90019-N, Software Productivity Consortium Services Corporation, 2214 Rock Hill Road, Herndon, Virginia 22070, 1990.
[38] M. Caplinger. A Single Intermediate Language for Programming Environments.
PhD thesis, Department of Computer Science, Rice University, Houston, Texas, 1985. Available as COMP TR85-28.
[39] R. J. Casimir. Real programmers don't use spreadsheets. SIGPLAN Notices, 27(6):10-16, June 1992.
[40] S. C. Cater and J. K. Huggins. An ASM dynamic semantics for standard ML. In Gurevich et al. (86), pages 203-223.
[41] M. Chandy and J. Misra. Parallel Program Design: A Foundation. Addison-Wesley, Reading, MA, 1988.
[42] N. Chomsky. Three Models for the Description of Language. IRE Trans. on Information Theory, IT-2(3):113-124, 1956.
[43] T. Clark, A. Evans, and S. Kent. Engineering modelling languages: A precise meta-modelling approach. In R.-D. Kutsche and H. Weber, editors, Fundamental Approaches to Software Engineering, 5th International Conference, FASE 2002, held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2002, Grenoble, France, April 8-12, 2002, Proceedings, volume 2306 of Lecture Notes in Computer Science, pages 159-173. Springer, 2002.
[44] J. G. Cleaveland. Building application generators. IEEE Software, pages 25-33, July 1988. Also reprinted in Domain Analysis and Software System Modeling by Prieto-Diaz and Arango, 1991.
[45] J. G. Cleaveland. Program Generators with XML and Java. The Charles F. Goldfarb Series on Open Information Management. Prentice Hall PTR, NJ, 2001.
[46] C. Consel and O. Danvy. Tutorial notes on partial evaluation. In 20th ACM Symposium on Principles of Programming Languages, pages 493-501, Charleston, South Carolina, 1993. ACM Press.
[47] C. Consel and R. Marlet. Architecturing software using a methodology for language development. In C. Palamidessi, H. Glaser, and K. Meinke, editors, PLILP/ALP'98, Proc. of the 10th Int. Symposium on Programming Languages, Implementations, Logics, and Programs, volume 1490, pages 170-194. Springer, Heidelberg, September 1998.
[48] J. O. Coplien. Multi-Paradigm Design for C++. Addison-Wesley, Reading, MA, 1999.
[49] J. R. Cordy and C. D.
Halpern. TXL: a rapid prototyping system for programming language dialects. In Proc. IEEE 1988 Int. Conf. on Computer Languages, pages 280-285, 1988.
[50] D. Cuka and D. M. Weiss. Specifying executable commands: An example of fast domain engineering. Comm. of the ACM, 2001.
[51] K. Czarnecki and U. Eisenecker. Generative Programming: Methods, Tools, and Applications. Addison-Wesley, Reading, MA, 2000.
[52] P. Dauchy and M. C. Gaudel. Algebraic Specification with Implicit States. Technical report, Univ. Paris-Sud, 1994.
[53] M. DeAddio and A. Kramer. The Handbook of Fixed Income Technology, chapter An Object Oriented Model for Financial Instruments, pages 269-301. The Summit Group Press, 1999.
[54] G. Del Castillo. Towards comprehensive tool support for abstract state machines: The ASM Workbench tool environment and architecture. In D. Hutter, W. Stephan, P. Traverso, and M. Ullman, editors, Applied Formal Methods - FM-Trends 98, number 1641 in LNCS, pages 311-325. Springer, 1999.
[55] Ch. Denzler. Modular Language Specification and Composition. PhD thesis, Swiss Federal Institute of Technology (ETH), Zürich, 2000.
[56] V. O. Di Iorio. Avaliação Parcial em Máquinas de Estado Abstratas. PhD thesis, Departamento de Ciência da Computação da Universidade Federal de Minas Gerais, March 2001. In Portuguese.
[57] V. O. Di Iorio, R. S. Bigonha, and M. A. Maia. A Self-Applicable Partial Evaluator for ASM. In Proceedings of the ASM 2000 Workshop, pages 115-130, Monte Verità, Switzerland, March 2000.
[58] B. DiFranco. Specification of ISO SQL using Montages. Master's thesis, Università di l'Aquila, 1997. In Italian.
[59] E. W. Dijkstra. A Discipline of Programming. Prentice-Hall, NJ, 1976.
[60] K.-G. Doh and P. D. Mosses. Composing programming languages by combining action-semantics modules. In Mark van den Brand and Didier Parigot, editors, Electronic Notes in Theoretical Computer Science, volume 44. Elsevier Science Publishers, 2001.
[61] V.
Donzeau-Gouge, G. Huet, G. Kahn, and B. Lang. Programming environments based on structured editors: The MENTOR experience. In D. R. Barstow, H. E. Shrobe, and E. Sandewall, editors, Interactive Programming Environments, chapter 7, pages 128-140. McGraw-Hill, New York, 1984.
[62] J.-M. Eber and Risk Awards Editorial Board. Software product of the year. Risk Magazine, 2001.
[63] W. Edwardes. Key Financial Instruments, understanding and innovating in the world of derivatives. Financial Times, Prentice Hall, Pearson Education, 2000.
[64] P. D. Edwards and R. S. Rivett. Towards an automotive 'safer subset' of C. In P. Daniel, editor, SAFECOMP'97, 16th Int. Conf. on Comp. Safety, Reliability, and Security. Springer, 1997.
[65] H. Ehrig and B. Mahr. Fundamentals of Algebraic Specification 1: Equations and Initial Semantics, volume 6 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag, Berlin, 1985.
[66] G. Engels, C. Lewerentz, M. Nagl, W. Schafer, and A. Schurr. Building integrated software development environments Part I: Tool specification. ACM Transactions on Software Engineering and Methodology, 1(2):135-167, April 1992.
[67] D. K. Every. What is the history of VB? www.mackido.com/History/History VB.html, 1999.
[68] R. E. Faith, L. S. Nyland, and J. F. Prins. KHEPERA: A system for rapid implementation of domain specific languages. In Proc. First USENIX Conference on Domain Specific Languages (1).
[69] Russian Institute for System Programming. The mpC website. www.ispras.ru/$ mpc, 2003.
[70] H. Ganzinger. Modular first-order specifications of operational semantics. In H. Ganzinger and N. D. Jones, editors, Programs as Data Objects, volume 217 of Lecture Notes in Computer Science. Springer Verlag, 1985.
[71] K. Godel. The REXX Language. Prentice-Hall, Englewood Cliffs, NJ, 1985.
[72] J. W. Goguen, J. W. Thatcher, E. G. Wagner, and J. B. Wright. Initial algebra semantics and continuous algebras. Journal of the ACM, 24:68-95, 1977.
[73] G. Goos and W. Zimmermann. ASMs and Verifying Compilers. In Gurevich et al. (86), pages 177-202.
[74] Gosling. The Java Language Specification. Sun Java Press, 1st edition.
[75] Gosling. The Java Language Specification. Sun Java Press, 2nd edition.
[76] R. W. Gray, V. P. Heuring, S. P. Levi, A. M. Sloane, and W. M. Waite. Eli: A complete, flexible compiler construction system. Communications of the ACM, 35(2):121-131, February 1992.
[77] J. Grosch and H. Emmelmann. A tool box for compiler construction. In D. Hammer, editor, Proceedings of CC'90, number 477 in LNCS, pages 106-116. Springer Verlag, 1990.
[78] C. A. Gunter. Semantics of Programming Languages. Foundations of Computing. The MIT Press, 1992.
[79] Y. Gurevich. A new thesis. Abstracts, American Mathematical Society, page 317, August 1985.
[80] Y. Gurevich. Logic and the Challenge of Computer Science. In E. Börger, editor, Theory and Practice of Software Engineering, pages 1-57. CS Press, 1988.
[81] Y. Gurevich. Evolving Algebras 1993: Lipari Guide. In E. Börger, editor, Specification and Validation Methods. Oxford University Press, 1995.
[82] Y. Gurevich. May 1997 draft of the ASM guide. Technical Report CSE-TR-336-97, University of Michigan EECS Department, 1997.
[83] Y. Gurevich. Sequential ASM Thesis. Bulletin of the European Association for Theoretical Computer Science, (67):93-124, February 1999. Also Microsoft Research Technical Report No. MSR-TR-99-09.
[84] Y. Gurevich. Sequential abstract state machines capture sequential algorithms. ACM Transactions on Computational Logic, 1(1):77-111, July 2000.
[85] Y. Gurevich and J. K. Huggins. The semantics of the C programming language. In E. Börger, G. Jäger, H. Kleine Büning, S. Martini, and M. M. Richter, editors, Computer Science Logic, volume 702 of Lecture Notes in Computer Science, pages 274-308. Springer Verlag, 1993.
[86] Y. Gurevich, P. W. Kutter, M. Odersky, and L. Thiele, editors.
Abstract State Machines: Theory and Applications, volume 1912 of Lecture Notes in Computer Science. Springer Verlag, 2000.
[87] A. Heberle. Korrekte Transformationsphase - der Kern korrekter Übersetzer. PhD thesis, Universität Karlsruhe, 2000.
[88] A. Heberle, W. Löwe, and M. Trapp. Safe reuse of source to intermediate language compilations. Fast Abstract, 9th International Symposium on Software Reliability Engineering, September 1998. http://chillarege.com/issre/fastabstracts/98417.html.
[89] G. Hedin. Reference Attribute Grammars. Informatica, 24(3):301-318, September 2000.
[90] J. Heering. Application software, domain-specific languages, and language design assistants. In Proc. SSGRR'00 Inter. Conf. on Adv. in Infrastructure for Electronic Business, Science and Education on the Internet, 2000.
[91] J. Heering, G. Kahn, P. Klint, and B. Lang. Generation of interactive programming environments. In ESPRIT'85: Status Report of Continuing Work, Part I, pages 467-477. North-Holland, 1986.
[92] J. Heering and P. Klint. Semantics of programming languages: A tool-oriented approach. ACM SIGPLAN Notices, 35(3), March 2000.
[93] P. Henriques, M. V. Pereira, M. Mernik, M. Lenič, E. Avdičaušević, and V. Žumer. Automatic generation of language-based tools. In Mark van den Brand and Ralf Laemmel, editors, Electronic Notes in Theoretical Computer Science, volume 65. Elsevier Science Publishers, 2002.
[94] R. M. Herndon and V. A. Berzins. The realizable benefits of a language prototyping language. IEEE Transactions on Software Engineering, 14:803-809, 1988.
[95] C. A. R. Hoare. Proof of a program: Find. Comm. of the ACM, 14(1):39-45, 1971.
[96] C. A. R. Hoare. Hints on programming language design, chapter 0, pages 31-40. Computer Science Press, 1983. Reprinted from Sigact/Sigplan Symposium on Principles of Programming Languages, Oct. 1973.
[97] J. Huggins. The Abstract State Machine Homepage at Michigan. URL: http://www.eecs.umich.edu/gasm/.
[98] J. K.
Huggins and W. Shen. The static and dynamic semantics of c: Preliminary version. Technical Report CPSC-1999-1, Computer Science Program, Kettering University, February 1999. [99] J. W. Janneck. Object-based mapping automata - reference manual. Technical report, Institute TIK, ETH Zürich. [100] J. W. Janneck. Object-based mapping automata home page. http://www.tik.ee.ethz.ch/ janneck/OMA. [101] J. W. Janneck and P. W. Kutter. Mapping automata. Technical Report TIK Report 89, Institute TIK, ETH Zürich, Institute TIK, ETH Zurich, June 1998. [102] J. W. Janneck and P. W. Kutter. Object-based abstract state machines. Technical Report TIK Report 47, Institute TIK, ETH Zürich, Institute TIK, ETH Zurich, 1998. [103] M. Jazayeri. A simpler construction showing the intrinsically exponential complexity of the circularity problem of attribute grammars. Journal of the ACM, 28(4):715–720, 1981. [104] S. C. Johnson and R. Sethi. yacc: A parser generator. In Unix Research System Papers. Tenth Edition. Murray Hill, NJ: AT&T Bell Laboratories, 1990. [105] C. Jones. End-user programming. IEEE Computer, pages 68–70, September 1995. [106] C. Jones. Estimating Software Costs. McGraw-Hill, 1998. [107] M. A. Jones and L. H. Nakatani. Method to produce application oriented languages. Patent WO9815894, April 1999. [108] N. Jones, C. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall, 1993. [109] S. P. Jones, J.-M. Eber, and J. Seward. Composing contracts: an adventure in financial engineering. In International Conference on Functional Programming. BIBLIOGRAPHY 345 [110] G. Kahn. Natural Semantics. In Proceedings of the Symp. on Theoretical Aspects of Computer Science, Passau, Germany, 1987. [111] G. E. Kaiser. Semantics for Structure Editing Environments. PhD thesis, Department of Computer Science, Carnegie Mellon University, Pittsburg, Pennsylvania, May 1985. [112] G. E. Kaiser. 
Incremental dynamic semantics for language-based programming environments. ACM Transactions on Programming Languages and Systems, 11(2):169 – 193, April 1989. [113] A. Kalinov, A. Kossatchev, A. Petrenko, M. Posypkin, and V. Shishkov. Coverage-driven automated compiler test suite generation. accepted at LDTA 2003, 2002. [114] A. Kalinov, A. Kossatchev, A. Petrenko, M. Posypkin, and V. Shishkov. Using ASM Specifications for Compiler Testing. In Abstract State Machines - Advances in Theory and Applications 10th International Workshop, ASM 2003, volume 2589 of LNCS, 2003. [115] A. Kalinov, A. Kossatchev, M. Posypkin, and V. Shishkov. Using ASM specification for automatic test suite generation for mpC parallel programming language compiler. In Proceedings of Fourth International Workshop on Action Semantics and Related Frameworks, AS’2002 NS-00-8 Department of Computer Science, University of Aarhus, Technical Report, pages 96–106, 2002. [116] S. Kamin, editor. Proc. First ACM SIGPLAN Workshop on Domain Specific Languages, Paris, January 1997. Published as University of Illinois at Urbana Champaign Computer Science Report URL: www-sal.cs.uiuc.edu/˜kamin/dsl. [117] U. Kastens. Ordered attribute grammars. Acta Informatica, 13(3):229– 256, 1980. [118] H. M. Kat. Structured Equity Derivatives, the definitive guide to exotic options and structured notes. Wiley Finance, 2001. [119] J. Kiczales, G. des Riviéres and D. Bobrow. The Art of the Metaobject Protocol. MIT Press, Cambridge, MA, 1991. [120] P. Klint. A meta-environment for generating programming environments. ACM Transactions on Software Engineering and Methodology, 2(2):176–201, 1993. [121] D. Knuth. An empirical study of FORTRAN programs. Software – Practice and Experience, 1:105–133, 1971. [122] D. E. Knuth. Semantics of Context–Free Languages. Math. Systems Theory, 2(2):127 – 146, 1968. [123] B. Kramer and H-W. Schmidt. Developing integrated environments with ASDL. IEEE Software, pages 98 – 107, January 1989. 
[124] P. W. Kutter. Executable Specification of Oberon Using Natural Semantics. Term work, ETH Zürich, implementation on the Centaur system (35), 1996.
[125] P. W. Kutter. Integration of the Statecharts in Specware and Aspects of Correct Oberon Code Generation. Master’s thesis, ETH Zürich, 1996.
[126] P. W. Kutter. Methods and Systems for Direct Execution of XML Documents. Patent applications PCT/IB 00/01087, US 09921298, August 2000.
[127] P. W. Kutter. The formal definition of Anlauff’s extensible abstract state machines. Technical Report 136, Institute TIK, ETH Zürich, Switzerland, June 2002. ftp://ftp.tik.ee.ethz.ch/pub/publications/TIKReport136.pdf.
[128] P. W. Kutter. OO XASMs executable semantics, xasmmontages-v0.2.tar. http://www.xasm.org, 2002.
[129] P. W. Kutter. Replacing Generation of Interpreters with a Combination of Partial Evaluation and Parameterized Signatures, leading to a Concept for Meta-Bootstrapping. Submitted for publication, April 2002.
[130] P. W. Kutter and F. Haussmann. Dynamic Semantics of the Programming Language Oberon. Term work, ETH Zürich, July 1995. A revised version appeared as Technical Report 27, Institute TIK, ETH Zürich, 1997.
[131] P. W. Kutter and A. Pierantonio. Montages: Unified static and dynamic semantics of programming languages. Technical Report 118, Università di L’Aquila, July 1996. Also appeared as a technical report of the Kestrel Institute.
[132] P. W. Kutter and A. Pierantonio. The formal specification of Oberon. Journal of Universal Computer Science, 3(5):443–503, 1997.
[133] P. W. Kutter and A. Pierantonio. Montages: Specification of realistic programming languages. Journal of Universal Computer Science, 3(5):416–442, 1997.
[134] P. W. Kutter, D. Schweizer, and L. Thiele. Integrating formal Domain-Specific Language design in the software life cycle. In D. Hutter, W. Stephan, P. Traverso, and M. Ullmann, editors, Current Trends in Applied Formal Methods, volume 1641 of Lecture Notes in Computer Science, pages 196–212. Springer Verlag, October 1998.
[135] P. W. Kutter. The A4M homepage. URL: http://www.a4m.biz.
[136] P. W. Kutter. State transitions modeled as refinements. Technical report, Kestrel Institute, 1996.
[137] D. A. Ladd and J. C. Ramming. Two application languages in software production. In USENIX Symposium on Very High Level Languages, pages 169–177, New Mexico, October 1994.
[138] R. Lämmel and C. Verhoef. Cracking the 500-languages problem. IEEE Software, 18(6):78–88, November/December 2001.
[139] R. Lämmel and C. Verhoef. Semi-automatic grammar recovery. Software: Practice and Experience, 31(15):1395–1438, December 2001.
[140] L. Lamport. The temporal logic of actions. ACM TOPLAS, 16(3):872–923, 1994.
[141] P. J. Landin. The next 700 programming languages. Comm. of the ACM, 9(3):157–166, May 1966.
[142] C. Larman. Protected variation: The importance of being closed. IEEE Software, 18(3):89–91, 2001.
[143] M. E. Lesk and E. Schmidt. Lex – a lexical analyzer generator. In Unix Research System Papers, Tenth Edition. Murray Hill, NJ: AT&T Bell Laboratories, 1990.
[144] R. Lipsett, E. Marschner, and M. Shahdad. VHDL – The Language. IEEE Design & Test of Computers, 3(2):28–41, 1986.
[145] J. A. Lowell. Unix Shell Programming. John Wiley & Sons, 2nd edition, September 1990.
[146] M. Lutz. Programming Python. ISBN 1-56592-197-6. O’Reilly, 1996.
[147] B. Magnusson, M. Bengtsson, L-O. Dahlin, G. Fries, A. Gustavsson, G. Hedin, S. Minor, D. Oscarsson, and M. Taube. An overview of the Mjolner/ORM environment: Incremental language and software development. In Proc. Second International Conference TOOLS (Technology of Object-Oriented Languages and Systems), pages 635–646, Paris, June 1990.
[148] J. Malenfant. Modélisation de la sémantique formelle des langages de programmation en UML et OCL. Rapport de recherche 4499, INRIA, Rennes, July 2002. In French.
[149] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems, Volume 1: Specification. Springer-Verlag, New York, NY, 1992.
[150] W. May. Specifying complex and structured systems with evolving algebras. In M. Bidoit and M. Dauchet, editors, Proc. of TAPSOFT’97: Theory and Practice of Software Development, 7th International Joint Conference CAAP/FASE, number 1214 in LNCS, pages 535–549. Springer, 1997.
[151] R. Medina-Mora. Syntax-directed Editing: Towards Integrated Programming Environments. PhD thesis, Carnegie Mellon University, March 1982. Tech. Rep. CMU-CS-82-113.
[152] S. J. Mellor and M. J. Balcer, editors. Executable UML: A Foundation for Model Driven Architecture. Addison Wesley Professional, May 2002.
[153] M. Mernik, M. Lenič, E. Avdičaušević, and V. Žumer. A reusable object-oriented approach to formal specifications of programming languages. L’Objet, 4(3):273–306, 1998.
[154] M. Mernik, M. Lenič, E. Avdičaušević, and V. Žumer. Multiple Attribute Grammar Inheritance. Informatica, 24(3):319–328, September 2000.
[155] R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. MIT Press, Cambridge, Massachusetts, 1990.
[156] M. Mlotkowski. Specification and Optimization of Smalltalk Programs. PhD thesis, Institute of Computer Science, University of Wroclaw, 2001.
[157] J. Morris. Algebraic Operational Semantics and Modula-2. PhD thesis, University of Michigan, 1988.
[158] P. D. Mosses. Action Semantics. Number 26 in Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 1992.
[159] P. D. Mosses. Modularity in natural semantics (extended abstract). Available at http://www.brics.dk/~pdm, 1998.
[160] P. D. Mosses. A modular SOS for Action Notation. Research Series BRICS-RS-99-56, BRICS, Department of Computer Science, University of Aarhus, 1999.
[161] L. H. Nakatani, M. A. Ardis, R. O. Olsen, and P. M. Pontrelli. Jargons for domain engineering. In DSL ’99, Domain-Specific Languages, pages 15–24, 1999.
[162] L. H. Nakatani and L. W. Ruedisueli. Fit programming language primer. Technical Report Memorandum 1264-920301-03TMS, AT&T Bell Laboratories, March 1992.
[163] L. H. Nakatani and M. A. Jones. Jargons and infocentrism. In Kamin (116), pages 59–74.
[164] P. Naur. Revised report on the algorithmic language Algol 60. Numerical Mathematics, (4):420–453, 1963.
[165] J. Neighbors. Software Construction Using Components. PhD thesis, University of California, Irvine, 1980. Also Tech. Report UCI-ICSTR160.
[166] J. Neighbors. The evolution from software components to domain analysis. Int. Journal of Knowledge Engineering and Software Engineering, 1992.
[167] M. Odersky. A New Approach to Formal Language Definition and its Application to Oberon. PhD thesis, ETH Zürich, 1989.
[168] M. Odersky. Programming with variable functions. In International Conference on Functional Programming, Baltimore, 1998. ACM.
[169] R. O’Hara and D. Gomberg. Modern Programming Using REXX. ISBN 0-13-597329-5. Prentice Hall, 1988.
[170] OMG. Model-driven architecture home page. URL: http://www.omg.org/omg/index.htm.
[171] J. Ousterhout. Tcl and the Tk Toolkit. ISBN 0-201-63337-X. Addison-Wesley, 1994.
[172] J. K. Ousterhout. Tcl: An embeddable command language. In Winter USENIX Conference Proceedings, 1990.
[173] J. K. Ousterhout. Scripting: Higher level programming for the 21st century. IEEE Computer, 31(3):23–30, March 1998.
[174] D. Parigot. Attribute Grammars Home Page. URL: http://www-sop.inria.fr/oasis/Didier.Parigot/www/fnc2/attri
[175] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Comm. ACM, 12(2), 1972.
[176] D. L. Parnas. On the design and development of program families. IEEE Transactions on Software Engineering, pages 1–9, March 1976.
[177] D. Pavlovic and R. Smith. Composition and refinement of behavioral specifications. In Proceedings of the 16th Automated Software Engineering Conference, pages 157–165. IEEE Press, November 2001.
[178] D. Pavlovic and R. Smith. Guarded transitions in evolving specifications. In Proceedings of AMAST’02, 2002.
[179] R. Pawson. Expressive Systems, a manifesto for radical business software, chapter An expressive system to improve risk management in options trading, pages 36–43. CSC Research Services, 2000.
[180] P. Pfahler and U. Kastens. Language design and implementation by selection. In Kamin (116).
[181] A. Pierantonio. Making statics dynamic: Towards an axiomatization for dynamic ADTs. In G. Hommel, editor, Proc. Int. Workshop on Communication Based Systems, pages 19–34. Kluwer Academic Publishers, 1995.
[182] G. D. Plotkin. A structural approach to operational semantics. Lecture Notes DAIMI FN-19, Department of Computer Science, University of Aarhus, 1981.
[183] A. Poetzsch-Heffter. Formale Spezifikation der kontextabhängigen Syntax von Programmiersprachen. PhD thesis, Technische Universität München, 1991. In German.
[184] A. Poetzsch-Heffter. Programming Language Specification and Prototyping using the MAX System. In M. Bruynooghe and J. Penjam, editors, Programming Language Implementation and Logic Programming, volume 714 of Lecture Notes in Computer Science, pages 137–150. Springer-Verlag, 1993.
[185] A. Poetzsch-Heffter. Developing Efficient Interpreters Based on Formal Language Specifications. In P. Fritzson, editor, Compiler Construction, volume 786 of Lecture Notes in Computer Science, pages 233–247. Springer-Verlag, 1994.
[186] A. Poetzsch-Heffter. Prototyping realistic programming languages based on formal specifications. Acta Informatica, 34:737–772, 1997.
[187] M. Posypkin. Personal communication. Email, January 2003.
[188] S. P. Reiss. PECAN: Program development systems that support multiple views. IEEE Transactions on Software Engineering, SE-11(3):276–285, March 1985.
[189] T. Reps and T. Teitelbaum. The Synthesizer Generator: A System for Constructing Language-Based Editors. Texts and Monographs in Computer Science. Springer Verlag, New York, 1989.
[190] J. C. Reynolds. Reasoning about arrays. Communications of the ACM, 22:290–299, 1979.
[191] D. A. Schmidt. On the need for a popular formal semantics. ACM SIGPLAN Notices, 32(1):115–116, 1997.
[192] D. S. Scott and C. Strachey. Toward a mathematical semantics for computer languages. Computers and Automata, (21):14–46, 1971. Microwave Research Institute Symposia.
[193] M. G. Sémi. Génération de spécifications Centaur à partir de spécifications Montages. Master’s thesis, Université de Nice–Sophia Antipolis, June 1997. In French.
[194] N. Shankar. Symbolic Analysis of Transition Systems. In Gurevich et al. (86).
[195] E. Sheedy and S. McCracken, editors. Derivatives, the risks that remain. Macquarie Series in Applied Finance. Allen & Unwin, 1997.
[196] S. Siewert. A common core language design for layered language extension. Master’s thesis, Univ. of Colorado, 1993. http://wwwsgc.colorado.edu/people/siewerts/msthesis/thesisw6.htm.
[197] C. Simonyi. The death of computer languages, the birth of intentional programming. Technical Report MSR-TR-95-52, Microsoft Research, 1995.
[198] C. Simonyi. The future is intentional. IEEE Computer, pages 56–57, May 1999.
[199] D. Spinellis. Reliable software implementation using domain-specific languages. In G. I. Schuëller and P. Kafka, editors, Proc. ESREL’99 – 10th European Conf. on Safety and Reliability, pages 627–631, September 1999.
[200] D. Spinellis. Notable design patterns for domain specific languages. Journal of Systems and Software, 56(1):91–99, February 2001.
[201] D. Spinellis and V. Guruprasad. Lightweight languages as software engineering tools. In USENIX Conference on Domain-Specific Languages (1), pages 67–76.
[202] T. Standish. Extensibility in programming languages design. SIGPLAN Notices, July 1975.
[203] R. Stärk. Abstract state machines for Java. Lecture Notes for Computer Science Students, Theoretische Informatik 37402, Departement Informatik, ETH Zürich, 1999. Available at http://www.inf.ethz.ch/~staerk/teaching.html.
[204] R. Stärk, J. Schmid, and E. Börger. Java and the Java Virtual Machine: Definition, Verification, Validation. Springer Verlag, 2001.
[205] L. Starr. Executable UML: A Case Study. Model Integration, LLC, February 2001.
[206] C. Szyperski and J. Gough. The role of programming languages in the life-cycle of safe systems. In STQ’95, 2nd Int. Conf. on Safety Through Quality, Kennedy Space Center, Cape Canaveral, Florida, USA, October 1995.
[207] A. Tarski. Der Wahrheitsbegriff in den formalisierten Sprachen. Studia Philosophica, (1):261–405, 1936. English translation in A. Tarski, Logic, Semantics, Metamathematics. Oxford University Press.
[208] J. Teich, P. W. Kutter, and R. Weper. Description and Simulation of Microprocessor Instruction Sets Using ASMs. In Gurevich et al. (86), pages 266–286.
[209] S. Thibault. Domain-Specific Languages: Conception, Implementation and Application. PhD thesis, l’Université Rennes 1, Institut de Formation Supérieure en Informatique et Communication, October 1998.
[210] S. Thibault and C. Consel. A framework of application generator design. In M. Harandi, editor, Proceedings of the ACM SIGSOFT Symposium on Software Reusability (SSR ’97), volume 22 of Software Engineering Notes, pages 131–135, Boston, USA, May 1997.
[211] A. M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc., 2(42):230–265, 1936. (Corrections in volume 2(43):544–546.)
[212] J. Uhl. Spezifikation von Programmiersprachen und Übersetzern. Berichte 161, Gesellschaft für Mathematik und Datenverarbeitung, 1986. In German.
[213] M. G. J. van den Brand, J. Heering, P. Klint, and P. A. Olivier. Compiling language definitions: The ASF+SDF compiler. ACM Transactions on Programming Languages and Systems, 24(4):334–368, 2002.
[214] M. G. J. van den Brand, A. van Deursen, P. Klint, S. Klusener, and E. A. van der Meulen. Industrial applications of ASF+SDF. In Proc. AMAST’96, 5th International Conference on Algebraic Methodology and Software Technology, Munich, Germany, July 1996. Springer-Verlag, Lecture Notes in Computer Science 1101.
[215] A. van Deursen. Using a domain-specific language for financial engineering. ERCIM News, (38), July 1999.
[216] A. van Deursen and P. Klint. Little languages: Little maintenance? Journal of Software Maintenance, 10:75–92, 1998.
[217] A. van Deursen, P. Klint, and J. Visser. Domain-specific languages – an annotated bibliography. ACM SIGPLAN Notices, 35(6), June 2000.
[218] T. van Rijn. Financial product solution. Cap Gemini Ernst & Young, internal documentation.
[219] G. van Rossum and J. de Boer. Interactively testing remote servers using the Python programming language. CWI Quarterly, Amsterdam, 4(4):283–303, 1991.
[220] J. Visser. Evolving algebras. Master’s thesis, Delft University of Technology, 1996.
[221] W. M. Waite and G. Goos. Compiler Construction. Springer Verlag, 1984.
[222] L. Wall, T. Christiansen, and R. Schwartz. Programming Perl. ISBN 1-56592-149-6. O’Reilly and Associates, second edition, 1996.
[223] L. Wall and R. L. Schwartz. Programming Perl. O’Reilly & Associates, Inc., 1991.
[224] C. Wallace. The Semantics of the C++ Programming Language. In E. Börger, editor, Specification and Validation Methods, pages 131–164. Oxford University Press, 1994.
[225] C. Wallace. The Semantics of the Java Programming Language: Preliminary Version. Technical Report CSE-TR-355-97, University of Michigan EECS Department, 1997.
[226] R. Weicker. Dhrystone: A synthetic systems programming benchmark. Comm. of the ACM, 27(10):1013–1030, October 1984.
[227] D. M. Weiss and C. T. R. Lai. Software Product Line Engineering: A Family-Based Software Development Process. Addison Wesley, Reading, MA, 1999.
[228] R. L. Wexelblat. Maxims for malfeasant designers, or how to design languages to make programming as difficult as possible. In Proc. of the 2nd Int. Conf. on Software Engineering, pages 331–336. IEEE Computer Society Press, 1976.
[229] I. Wilkie, A. King, M. Clarke, C. Weaver, and C. Raistrick. UML Action Specification Language (ASL), reference guide. Kennedy Carter Limited, KC/CTN/06, www.kc.com, February 2001.
[230] N. Wirth. On the design of programming languages. In J. L. Rosenfeld, editor, Information Processing 74, Proc. of IFIP Congress 74, pages 386–393. North-Holland Publishing Company, 1974.
[231] N. Wirth. The programming language PASCAL. Acta Informatica, 1(1):35–63, 1971.