Model-Driven Data Migration
Mohammed A Aboulsamh
St Catherine’s College
University of Oxford
A thesis submitted for the degree of
Doctor of Philosophy
Hilary 2012
Abstract
Information systems often hold data of considerable value. Their continuing development or maintenance will often necessitate evolution of the
system and migration of the data from one version to the next: a process
that may be expensive, time-consuming, and prone to error. That such
a process remains a source of challenges is recognised by both academia
and industry. In current practice, data migration is often considered only
in the later stages of development, leaving critical data to be transformed
and loaded by hand-written scripts, long after the design process has been
completed.
The advent of model-driven engineering offers an opportunity to consider
the question of information system evolution and data migration earlier in
the development process. A precise account of the proposed changes to an
existing system model can be used to predict the consequences for existing
data, and to generate the necessary data migration implementation.
This dissertation shows how automatic data migration can be achieved by
extending the definition of a data modeling language to include model-level operations, each of which corresponds to the addition, modification,
or deletion of a model component. Using the Unified Modeling Language
(UML) notation as an example, we show how the specification of these
operations may be translated into an abstract program in the Abstract
Machine Notation (AMN), employed in the B-method, and then formally
checked for consistency and applicability prior to translation into a concrete programming notation, such as Structured Query Language (SQL).
Contents

1 Introduction
  1.1 Problem and motivation
  1.2 Trends and challenges
  1.3 This dissertation

2 Background
  2.1 Data modeling approaches
    2.1.1 Entity-Relationship (ER) modeling
    2.1.2 Fact-Oriented modeling
    2.1.3 Object-Oriented modeling
    2.1.4 Choosing an appropriate data modeling approach
  2.2 MDE of information systems
    2.2.1 Metamodel hierarchy
    2.2.2 Class-based and object-based representation of metamodels and models
    2.2.3 Model-Driven Architecture (MDA)
    2.2.4 Model transformation
    2.2.5 Information systems evolution
  2.3 Formal foundations for model-driven engineering
    2.3.1 Integrating modeling approaches and formal methods
    2.3.2 Choosing an appropriate formal method
  2.4 Formal modeling with B

3 Towards a Data Model Evolution Language
  3.1 Introduction
  3.2 Towards a Data Model Evolution Language
  3.3 State of the Art
    3.3.1 Database Schema Evolution
    3.3.2 Model-Driven Engineering
  3.4 Our approach
    3.4.1 Synthesizing design requirements
    3.4.2 Main elements of our approach

4 Modeling Data Model Evolution
  4.1 Modeling Approach
  4.2 Data Modeling
    4.2.1 UML concepts for static structure modeling
    4.2.2 Characterization of a UML Data Metamodel
    4.2.3 Consistency of UML data model
  4.3 Modeling evolution
    4.3.1 Definition of evolution metamodel
    4.3.2 Primitive Model Edits
    4.3.3 Expressing evolution patterns
  4.4 Induced data migration

5 Specification and Verification of Data Model Evolution
  5.1 Semantics of UML data model
    5.1.1 Type structure
    5.1.2 Data model classes
    5.1.3 Data model properties
    5.1.4 Data model associations
    5.1.5 Data model instances
  5.2 Consistency of the data model
    5.2.1 Syntactic consistency
    5.2.2 Semantic Consistency
  5.3 Semantics of model evolution operations
    5.3.1 Specifying evolution primitives
    5.3.2 From evolution primitives to evolution patterns
  5.4 Verification of data model evolution
    5.4.1 Verifying data model consistency
    5.4.2 Verification of data model refactoring
    5.4.3 Checking applicability of data migration

6 Generating Platform-Specific Data Migration Programs
  6.1 From an object data model to a relational data model
    6.1.1 Refining data model properties
    6.1.2 Flattening inheritance
    6.1.3 Introduction of keys
    6.1.4 Refining data model associations
    6.1.5 Refinement of abstract machine operations
    6.1.6 Example
  6.2 Generating data migration programs
    6.2.1 Formalizing SQL metamodel in AMN
    6.2.2 Linking data model state to SQL state
    6.2.3 Example
    6.2.4 Implementation of evolution operations in SQL
  6.3 Generating SQL data migration programs

7 Discussion
  7.1 Research contributions
    7.1.1 Modeling evolution
    7.1.2 Precise modeling of data model evolution
    7.1.3 Predicting consequences of data model evolutionary changes
    7.1.4 Generation of correct data migration programs
  7.2 Genericity of the approach
  7.3 Comparison with related work
  7.4 Limitations and future work
    7.4.1 Feedback generation
    7.4.2 From data migration to model migration
    7.4.3 Predicting consequences of evolution on behavior properties
  7.5 Conclusion

Bibliography

A Implementation and case study
B B-method notation ASCII symbols
C B Specifications
D Proof activities
Chapter 1
Introduction
1.1
Problem and motivation
Information systems are becoming more pervasive, and the data that they collect
is becoming more detailed, more complex, and more essential to our daily lives. It
should come as no surprise to find that this data will typically be more valuable than
the components of the system itself. These components are easily replaced, easily
upgraded: the cost of hardware and software continues to fall. The data, however,
may be irreplaceable: the context in which it was collected no longer exists.
Measures can be taken to protect against the consequences of catastrophic data
loss: should a hard disk fail, another may hold a copy of the same data; should
a server fail, or a smartphone be lost, the data might be restored from a recent
backup. More insidious, more damaging, and more likely is a loss not of data, per se,
but of integrity. If this happens, any further data collected might be improperly or
inconsistently stored; any further work done using the system may be wasted. If an
integrity issue is not detected quickly, then considerable value may be lost.
The integrity of data—the semantic integrity—is expressed partly in terms of the
values of different data items, and partly in terms of the relationships between them.
For example: if a data item represents a person’s age, then we would expect it to
be in the range 0 to 120, perhaps—certainly not −53, or “yellow”; if one data item
represents the number of children a person has, and another represents the number of
male children, we would not only expect both items to be numeric, we would expect
one to be numerically greater than the other.
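Such rules can be stated directly over a stored representation. The following is a minimal SQL sketch of the two constraints just described; the table and column names are hypothetical, chosen only for illustration:

    CREATE TABLE person (
      id           INTEGER PRIMARY KEY,
      -- the domain of the value itself: an integer in a plausible range,
      -- never -53 and never "yellow" (the latter excluded by the type)
      age          INTEGER NOT NULL CHECK (age BETWEEN 0 AND 120),
      num_children INTEGER NOT NULL CHECK (num_children >= 0),
      num_sons     INTEGER NOT NULL CHECK (num_sons >= 0),
      -- the relationship between two data items: sons are among the children
      CHECK (num_sons <= num_children)
    );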
The question of integrity becomes particularly important when the system itself
is changed. If information is to be recorded in a different way, then both data values
and relationships may need to be updated to match: existing data may need to be
transformed, and new values generated, in order that the system might continue to
behave in a consistent fashion. This process of data migration can be costly, error-prone, and time-consuming. Several “dry runs” may be required, and even then there
may be significant interruptions in service while data is transferred from the old to the
new system, and checks are performed to confirm—as best as can be determined—
that semantic integrity has been maintained.
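To fix ideas, such a hand-written migration might look as follows in SQL. The tables person_v1 and person_v2 and the transformation (a new system that records year of birth instead of age) are hypothetical, chosen only to show the shape of the scripts involved:

    -- hypothetical version-2 schema: year_of_birth replaces the stored age
    CREATE TABLE person_v2 (
      id            INTEGER PRIMARY KEY,
      year_of_birth INTEGER CHECK (year_of_birth BETWEEN 1890 AND 2012),
      num_children  INTEGER NOT NULL,
      num_sons      INTEGER NOT NULL,
      CHECK (num_sons <= num_children)
    );

    -- "dry run": count source rows that would violate the new constraints
    SELECT COUNT(*) FROM person_v1
    WHERE age NOT BETWEEN 0 AND 122 OR num_sons > num_children;

    -- migrate: transform existing values and load them into the new table
    INSERT INTO person_v2 (id, year_of_birth, num_children, num_sons)
    SELECT id, 2012 - age, num_children, num_sons FROM person_v1;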
The model-driven approach to systems development offers significant promise of
improvement in this situation. If the design of the system, and the data representation
in particular, is characterized by an abstract model, then we may be able to determine
the data migration requirements by comparing two models: one of the existing version
of the system, another of the new version that is to be installed. More than this, we
may be able to adopt a model-driven approach to the data migration process: by
constructing an evolution model that describes how the new model is obtained from
the old, we might obtain a basis for the automatic generation of an appropriate data
migration program.
This is the thesis of the present dissertation: that such a model can be constructed,
written in a language that can itself be automatically derived from the language of
the system model; and that an appropriate program can be generated. The program
is appropriate not only in that it accurately reflects the proposed changes to the
system, but also that we can guarantee, in advance, that the transformed data will
meet the integrity constraints of the new system. This proposition is explored in the
context of the modeling framework of MOF (Meta Object Facility), which underpins
the widely-used UML (Unified Modeling Language) notation: a realistic context in
which to examine the challenges of data migration.
The approach to the generation of an appropriate program—in particular, the
determination of a sufficient condition for the program to guarantee the integrity
of the transformed data—is based upon the formal, mathematical foundation of the
AMN (Abstract Machine Notation), and the supporting framework of the B-method.
The applicability of the approach is demonstrated by means of a transformation from
the evolution modeling language into the widely-used Structured Query Language
(SQL), perhaps the most common implementation platform for data migration.
1.2
Trends and challenges
In data-intensive systems such as information systems, the complexity of queries
and updates to be specified may be low, but the data they act upon are typically
very large, long-lived, sharable by many programs, and subject to numerous rules of
consistency [43]. While many years of research on database schema evolution provide
solid theoretical foundations and interesting methodological approaches, information
systems evolution activities are still a source of challenges [77]. As observed by
[136], one important drawback of most schema evolution approaches is a lack of abstraction, a problem which remains largely unsolved. As a result, data conversion tasks in information systems evolution projects are typically left to programmers [41, 23]. Basic questions, such as how to reflect at the existential level (i.e. the data level) the changes that have occurred in the conceptual schema of a database in such a way that the consistency and validity of data are guaranteed, still represent a common problem in most information systems evolution settings [77].
Most often, designers are essentially “left alone” in the process of evolving an information system. Based only on their expertise, those designers must figure out how to express system changes and the corresponding data migration: not a trivial matter even for simple evolution steps. Given the available tools, the process is not incremental, and there is no system support to check and guarantee consistency preservation, nor is support provided to predict or test the consequences of evolution on existing data.
The model-driven engineering paradigm can be an important means of reducing the complexity of information systems evolution: by employing abstraction and considering the various system artifacts as models, our ability to deal with the complexity of information systems evolution activities can be significantly improved. The promise of abstraction offered by model-driven engineering has recently been realized in concrete results in the use of models as central elements in various software development and maintenance activities. For example, models can be used to capture design requirements and properties [18]; to map one software artifact to another, as in model transformation [45]; to trace changes to software artifacts, as in model weaving [57]; to reorganize and improve the quality of software, as in model refactoring [166]; or to generate platform-specific code and implementations [112]. These and other approaches in model-driven engineering may help in characterizing the information systems evolution problem; however, they remain largely general-purpose and offer no specific support for evolution-specific tasks such as data migration.
If models are to be treated as programs and compiled, or used as the basis for automatic generation of any kind, as advocated by the model-driven engineering paradigm, then the abstract models must have a precise, formal semantics [46]. Assigning precise meaning to software models can be done by giving a formal description of the abstract concepts and relationships that they present. This formalization helps in analyzing
and reasoning about the domain that the model represents. This way, formality can
contribute to software development productivity and efficiency by bringing analysis
and verification activities into the design phase rather than relying on implementation
testing.
1.3
This dissertation
In this dissertation, we show how information systems evolution challenges can be addressed through a formal, model-driven approach. We start by showing how a model evolution language (in the form of a metamodel) may be derived for any Meta-Object Facility (MOF)-based data modeling language, such as UML. We show that the evolution modeling language is adequate for the description of changes to models conformant to the selected data modeling language. We also show that the language can be given a formal semantics using the Abstract Machine Notation (AMN) of the B-method, sufficient for the expression of the consistency conditions of the language and the data integrity constraints of the data model.
Our proposed language of model evolution, properly mapped to a formal semantics in the B-method, can be used for the analysis of proposed evolutionary changes to data models. Using the supporting framework of the B-method, we show how the applicability of a sequence of model evolution steps may be determined in advance, and used to check that a proposed evolution will preserve model consistency and data integrity. This applicability information can be fed back to system designers during the modeling activity, allowing them to see in advance how the changes they are proposing would affect the data in the existing system.
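The idea behind this check can be stated schematically. Suppose each evolution step is formalized, in the B style, as an operation with precondition $P_i$ and substitution $S_i$, and write $[S]Q$ for the weakest precondition under which $S$ is guaranteed to establish $Q$. Under this formulation (the notation here is ours; the precise treatment is given in Chapter 5), a sequence of steps is applicable in a data state $\sigma$ exactly when

\[
\sigma \;\models\; P_1 \,\wedge\, [S_1]\bigl(P_2 \,\wedge\, [S_2](\,\cdots\, \wedge [S_{n-1}]\,P_n)\bigr),
\]

a condition that can be evaluated against the existing data before any migration is attempted.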
Using the B-method refinement mechanism, we show how, by applying successive transformations, our abstract evolution specifications can be translated into an executable language such as the Structured Query Language (SQL), the implementation language we have chosen for describing data migration implementations. As a product of formal refinement, the generated code is guaranteed to preserve the various consistency and integrity notions represented by the data model.
Each of our models or representations—the sequence of proposed changes, the AMN representation and the SQL implementation—conforms to a specific metamodel, each of which conforms in turn to the MOF meta-metamodeling standard. Where a data modeling language within the MOF framework is itself subject to evolutionary changes, we show how our approach may be generalized to account for data modeling language evolution and predict the consequences of such evolution on existing data models and the corresponding data instances.
This dissertation is structured as follows. In Chapter 2, we provide a brief background on the main definitions, concepts and terminology we use in the rest of the
dissertation.
Chapter 3 discusses requirements and motivations for a data model evolution
language. The main idea of this chapter is to establish the need for a model evolution
language applicable to data models written in a standard modeling notation and
investigate the main features which such a language should present. Following a
critical review of relevant literature, the chapter concludes with the identification of
key requirements for a data model evolution language.
Chapter 4 presents a characterization of a subset of UML adequate for data modeling. This characterized subset is then extended with evolution operations so that
we can specify evolution of data models written in UML. In particular, we show how,
given a standard MOF-based data modeling notation (e.g. UML), the basic elements
of an evolution language can be automatically generated. We then elaborate on the
main components of the generated evolution language and discuss how it can be used
as a basis for generating data migration implementations.
In Chapter 5, we use the B-method to extend our proposed data model evolution language with an appropriate relational semantics, mapping well-formed UML data models to structured constraints upon object properties and their values, and mapping model evolution operations to appropriate substitutions upon object property values. We demonstrate that the proposed formal notation is sufficient for specifying data model consistency, an important prerequisite for determining the domain of applicability of model evolution operations. Using a proposed characterization of consistency constraints, we also show how conformance can be checked at two levels of the MOF abstraction hierarchy: at the model level (between the data model and its modeling language) and at the data level (between data instances and the data model used for their collection and persistence).
Chapter 6 shows how the refinement of abstract data model evolution specifications into a formal description of the Structured Query Language (SQL) supports the generation of data transformations between successive versions of a system. In particular, we present a formal characterization of an SQL metamodel and demonstrate that the generated data migration programs (in SQL) are consistency-preserving.
This dissertation ends with a discussion chapter summarizing research contributions, comparing these to related work and highlighting limitations and opportunities
for future work.
In the appendices of this dissertation, we describe the main components of a prototype implementation of the model-driven approach to data migration proposed by
this thesis. We also include a complete description of the B-method abstract, refinement and implementation machines we have developed to support the formalization of the presented approach. Finally, the main proof activities are summarized at the end of this
document.
Chapter 2
Background
Model-Driven Engineering (MDE) aims to raise the level of abstraction by which
software systems are developed. This is done by emphasizing the need for thorough
modeling. MDE relies on the use of models to describe software development activities
and artifacts [18]. In MDE, models are abstractions of the artifacts developed during
the software engineering process and can be used by domain experts to represent a
variety of problem domains (e.g. data models, business processes, etc.). The basic
assumption in MDE is that models are considered first-class entities: models are
software artifacts that can be updated, analyzed or processed for different purposes
[18].
A software model description can be formalized by giving precise interpretations to the classifications, associations, and constraints that the model presents. These precise interpretations enable formal analysis and verification in MDE. For example, using formal metamodeling we can ensure that a model defined with the intention of being conformant to a specific metamodel is, indeed, conformant; check that the various elements of a model are consistent; or guarantee the correctness of a generated implementation.
Information systems change frequently throughout their lifetime. Such change may occur for various reasons, such as changes in end-user requirements or the need to improve the quality and organization of the system. These changes can be accommodated by changing the behavior of the system (the way its methods are implemented) or by modifying its structure (the data model that is used to collect and persist the data).
The central theme of this thesis revolves around the problem of information systems evolution: how can we specify changes of an evolving information system and
use these specifications to reason about evolution and data stored against an old sys-
tem model so that all structural and integrity constraints imposed by a new model
remain valid?
We start this chapter by considering alternative data modeling notations. Using a running example, we explore the strengths and weaknesses of each alternative before motivating our selection of UML. We then discuss some fundamental MDE concepts on which we will build various elements of our proposed approach; these concepts include the metamodeling hierarchy and model transformation. In the second part of this chapter, we discuss the importance of integrating data modeling languages (UML in our context) with a precise formal notation. We discuss the applicability of a number of model-oriented notations, before motivating our selection of the B-method.
2.1
Data modeling approaches
When designing an information system for a particular application, we may create a
design model of the application area. Technically, the application area being modeled
is called the ‘application domain’ and typically is part of the real world [75].
In the field of information systems, we make the fundamental assumption that
an application domain consists of a number of objects and the relationships between
them, which are classified into concepts. The state of a particular domain, at a given
time, therefore consists of a set of objects, a set of relationships, and a set of concepts
into which these objects and relationships are classified [126]. For example, in the
domain of a company, we may have the concepts of a customer, a product and a
sale. At a given moment, we have objects classified as customers, objects classified as
products, and relationships between customers and products classified as sales.
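This fundamental assumption can be made precise in elementary set-theoretic terms, anticipating the relational semantics developed later in this dissertation. For the company domain just described, one schematic reading (the naming is ours) is

\[
\mathit{customers} \subseteq \mathit{Object}, \qquad
\mathit{products} \subseteq \mathit{Object}, \qquad
\mathit{sales} \subseteq \mathit{customers} \times \mathit{products},
\]

so that the state of the domain at a given time is the triple $(\mathit{customers}, \mathit{products}, \mathit{sales})$.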
Building a good model for the application domain requires a good understanding
of the world we are modeling, and hence is a task ideally suited to people rather than
machines. Accordingly, the model should first be expressed at the conceptual level, in
concepts that people find easy to work with. Implementation concerns are, of course,
important, but should be ignored in the early stages of modeling. Once an initial
conceptual design has been constructed, it can be mapped down to a logical design
in any modeling language we like. This added flexibility also gives us a separation
of concerns and makes it easier to implement and maintain the same application on
more than one kind of implementation platform.
Although most information systems we conceptually model involve processes as well as data, given the context of our research we will focus on the information (and hence on the data). This focus on data will, in turn, require us to concentrate on data models, defined as collections of concepts that can be used to describe the structure of databases, i.e. the data types, relationships and constraints that should hold for the data [117].
There is a great diversity in conceptual data models, and they may be more or
less useful in particular situations or for particular purposes. However, all of them
are based on the fundamental assumption that we have mentioned above, which we
shall attempt to clarify in the remainder of this section.
Within the context of this research, based on the work of [77, 74, 153, 19, 20], we
have considered three main conceptual modeling approaches: Entity-Relationship (ER)
modeling, Fact-Oriented modeling and Object-Oriented modeling. In our discussion
of these modeling approaches, we do not intend to describe them in their entirety.
We will focus our discussion on the main principles of each approach to confirm the
fundamental assumption we stated at the beginning of this section and present the
modeling approach which we will adopt throughout this thesis.
Running example. In this chapter and throughout the remaining chapters of this
thesis, we will use a simplified Employee Information System (EIS) data model as a
running example to illustrate our ideas and explain alternative approaches. In our
EIS example, a company workforce is represented by employees. The company stores
each employee’s name and age. The company is organized into departments and each
employee works for a department which is managed by an employee of the company.
In addition, employees’ personal data maintained by the company is organized in
personal files which are linked to corresponding employees. In the remaining parts of
this section, we will use the above example to illustrate the concepts of alternative
modeling approaches which we have considered.
2.1.1
Entity-Relationship (ER) modeling
Entity-Relationship (ER) modeling was introduced by Peter Chen in 1976 [36] and is one of the most widely used approaches for data modeling. It pictures the world in terms
of entities that have attributes and participate in relationships. Over time, many
different versions of ER emerged, and today, there is no standard ER notation [153].
The basic object that the ER model represents is an entity. An entity may be
an object with a physical or a conceptual existence. Each entity has attributes—the
particular properties that describe it. A particular entity will have a value for each
of its attributes. Each entity in the database is described by its name and attributes.
[Figure 2.1: An ER Schema diagram for a simplified Employee Information System]
Example. Figure 2.1 depicts our EIS running example using Chen’s ER notation.
As the diagram shows, an entity is represented in ER notation, as a rectangular box
enclosing the entity name. Attribute names are enclosed in ovals and are attached
to their entity by straight lines. An important constraint on an entity is the key or
uniqueness constraint on attributes. Such an attribute is called a key attribute, and its
values can be used to identify each entity. Whenever an attribute of one entity refers
to another entity, some relationship exists. In the ER model, these references should
not be represented as attributes but as relationships. In ER diagrams, relationships
are displayed as diamond-shaped boxes, which are connected by straight lines to the
rectangular boxes representing the participating entities. The relationship name is
displayed in the diamond-shaped box. Relationships can also have attributes, similar
to those of entities. The cardinality ratio for a relationship specifies the maximum
number of relationship instances that an entity can participate in. For example,
the cardinality ratio of ‘n and 1’ between the Employee and Department entities is
interpreted as ‘many to one’ (each employee works for at most one department, but
many employees may work for the same department).
To its credit, this ER diagram portrays the application domain in a way that is
independent of the target software platform. For example, classifying a relationship
end as mandatory is a conceptual issue. There is no attempt to specify here how this
constraint is implemented (e.g. using mandatory columns, foreign key references, or
object references). Conversely, the ER diagram is less than ideal for validating the
model with the domain expert, and the conceptual step from the data output to
the model can, sometimes, be ambiguous and may only be guessed by the model’s
creator. For example, in ER diagrams the direction of relationships is undecided (i.e. the diagram does not explicitly state whether the employee works for the department, or the department works for the employee). Outside academia, Chen’s notation seems to be rarely used nowadays. One important reason may be that there are so many ER notation versions, with no single standard.

[Figure 2.2: An ORM model diagram for a simplified Employee Information System]
2.1.2
Fact-Oriented modeling
Object Role Modeling (ORM) is one of the most commonly cited Fact-Oriented modeling methods [27]. It began in the early 1970s as a semantic modeling approach that
views the world simply in terms of objects playing roles (taking parts in relationships).
ORM has appeared in a variety of forms such as the natural-language information
analysis method (NIAM) [74]. ORM includes various procedures to assist in the
creation and transformation of data models. A key step in its design procedure is
the verbalization of information examples relevant to the application, such as sample
reports expected from the system. ORM sentence types (and constraints) may be
specified either textually or graphically.
Domain concepts are shown in ORM as named ellipses and must have a reference
scheme, as an abbreviation of the relevant association (e.g., Employee has a name).
In ORM, a role is a part played in a relationship or association. A relationship is
shown as a named sequence of one or more role boxes, each connected to the object
type that plays it.
Apart from object types, the only data structure in ORM is the relationship
type. In particular, attributes are not used at all in base ORM. This is one of the
fundamental differences between ORM and ER. Wherever an attribute is used
in ER, ORM uses a relationship instead. An ORM model is essentially a connected
network of object types and relationship types.
Example. To understand how ORM works, we use the same running example of the EIS depicted earlier in ER. In an ORM diagram, as shown in Figure 2.2, roles appear as boxes, connected by a line to their object type. Roles are simply the line ends, but may optionally be given names. ORM allows associations to be objectified as first-class object types. A role that every instance of an object type must play is a mandatory role; in ORM this is captured as a mandatory role constraint, represented graphically by a black dot. The lack of a mandatory role constraint on the left role indicates that it is optional.
2.1.3
Object-Oriented modeling
Although many object-oriented approaches exist, the most influential is the Unified
Modeling Language (UML), which has been adopted by the Object Management
Group (OMG) [124].
UML is a general-purpose modeling language. It offers elements of a graphical concrete syntax to create visual models of software-intensive systems. UML synthesized the notations of the Booch method [22], the Object-Modeling Technique (OMT) [140] and Object-Oriented Software Engineering (OOSE) [86] by fusing them into a single, common and widely usable modeling language; several other methods have also influenced the UML, for instance Entity-Relationship modeling.
UML rapidly gained popularity in industry because, on the one hand, it facilitated communication between diverse stakeholders at different phases of software development and, on the other, it provided several points of view onto the system, including its context.
UML takes up central ideas from the ER model and puts them into a broad software development context by proposing various graphical sublanguages and diagrams for specialized software development tasks. UML concepts relevant to data modeling include those typically used for modeling structural aspects, such as objects, classes, properties, generalization and association. More precision may be added to UML diagrams by using textual constraints written in the Object Constraint Language [121].
Example.
Figure 2.3 shows how our EIS data model can be represented in a
UML class diagram. Each concept in the model is represented by a separate class.
As such, we have a class for Employee, Department and PersonalFile.

[Figure 2.3: A UML model for a simplified Employee Information System]

Each class
includes a number of attributes to help define properties of objects of each respective
class. For example, for each employee object we are interested in capturing the name and age attributes. In addition, the diagram shows a number of associations describing relationships between different classes. For example, head is a relationship between the Employee and Department classes. Based on this relationship, we expect each department object to be associated with an employee object acting as the head of that department.
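Although the mapping from conceptual models to implementation platforms is the subject of Chapter 6, it may help to see one plausible relational rendering of this model. The following SQL sketch is illustrative only; the key and column names are our own, not part of the UML model:

    CREATE TABLE department (
      dept_id  INTEGER PRIMARY KEY,
      name     VARCHAR(50) NOT NULL,
      location VARCHAR(50),
      head     INTEGER NOT NULL UNIQUE   -- each department has exactly one head
    );

    CREATE TABLE employee (
      emp_id    INTEGER PRIMARY KEY,
      name      VARCHAR(50) NOT NULL,
      age       INTEGER,
      -- each employee works for exactly one department (multiplicity 1)
      works_for INTEGER NOT NULL REFERENCES department(dept_id)
    );

    CREATE TABLE personal_file (
      file_no INTEGER PRIMARY KEY,
      -- one personal file per employee (multiplicity 1-1)
      emp_id  INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
    );

    -- the cyclic reference (a head is an employee) is added once both tables exist
    ALTER TABLE department
      ADD CONSTRAINT fk_head FOREIGN KEY (head) REFERENCES employee(emp_id);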
2.1.4
Choosing an appropriate data modeling approach
It is important to note that, when describing data models, although the terms used by each modeling approach vary, the meanings of the concepts remain very close. For example, the concept of an entity, proposed by Chen [36] to describe an object with a physical or a conceptual existence, is similar to the concept of an object type used by ORM and the concept of a Class used by UML. Similarly, the concept of a relationship was proposed by Chen to make the connection between entities, together with the cardinality constraints that may accompany such a description; this concept can be intuitively mapped to the concept of an association in UML.
It is equally important to note that these modeling methods differ in representing some of the application domain concepts. As we highlighted above, unlike UML, ER models do not explicitly show the direction of entity relationships. In addition, unlike ER and UML, ORM makes no use of attributes in its models: all facts are represented in terms of objects (entities or values) playing roles. As such, compared to ER and UML models, ORM models lose an important advantage, the ability to produce compact models, which can complicate the task of the modeler.
Considering the relative strengths and weaknesses of different approaches outlined
above, in this thesis we have selected to use UML for data modeling. Our decision is
based on three main considerations:
1. Extendability. As we will show in Chapter 4, a main component of our model-driven approach is an evolution model, which represents an abstract description of data model changes. To capture such evolutionary changes, we need to extend the modeling language in which the data model itself is written. This requires the modeling notation to be extendable. Extension of UML may be achieved in two different ways [87]: as a lightweight or as a heavyweight extension. The first of these involves the definition of a UML ‘profile’, using extension mechanisms included within the standard UML profiles package ([124], p. 179). This is the approach we followed in [2], but we soon discovered that it is limited in its applicability. Instead, here we present a heavyweight extension, similar to the approach we followed in [3]. Unlike UML, other modeling approaches such as ER and ORM do not offer similar extendability mechanisms, limiting our ability to model the data model and its abstract evolutionary changes in the same modeling notation. This topic is discussed further in Chapter 4.
2. Standardization. Unlike UML, none of the other modeling approaches we have considered has been adopted as a standard modeling notation. The use of a standard modeling framework such as UML for data and evolution modeling makes our approach relevant to a wide audience in the software engineering community, in both academia and industry.
3. Tool support. One aim of this thesis is to realize a solution for data model evolution that can be used by information system designers in performing data model updates, analyzing some of the consequences of their updates and subsequently migrating the data. As we will explain in Chapter 5, the realization of such a solution requires integrating a number of metamodels in a model-driven engineering chain. Although tool support exists for ER and, to a wider extent, for ORM, tool support in the model-engineering community for UML far exceeds that for any other modeling approach. This abundance of tool support increases our opportunity to integrate with a large number of existing supporting tools, making our approach more usable.
2.2
MDE of information systems
Model-Driven Engineering (MDE) is a unification of initiatives that aims to improve
software development by employing high-level domain models in the design, implementation, testing and maintenance of software systems. The basic assumption in
MDE is to consider models as first-class entities. The main implication of this assumption is that models are software artifacts that can be modified, updated, or processed
for different purposes. Different operations can be applied on models. This differs
from the traditional view of software development where models are used essentially
for documentation.
MDE of information systems applies the principles of MDE in the development of
information systems: an information system is specified as a set of models that are
repeatedly transformed and refined until a working implementation is obtained.
Within the domain of information systems, the notion of MDE has been a subject
of increasing attention. Models have been widely used for the design and development
of databases [47], schema and data integration [96], [132] and for data transformation
[67]. In this work we use the MDE paradigm to cater for the evolution and maintenance
of information systems.
In MDE, models are placed at the center of software development. A model represents a problem domain such as software, business processes or user requirements.
The different entities of the problem domain are captured by model elements. The
model elements have a set of properties, and may have relations between themselves.
The nature of the model elements, i.e. their type, set of properties, and possible relations, are defined in a metamodel to which the model needs to conform. The primary
responsibility of the metamodel layer is to define a language for specifying models.
As such, a metamodel may be considered as the type of a given model, because it
defines a set of constraints for creating the model. UML [120] and the OMG Common
Warehouse Metamodel (CWM) [154] are examples of metamodels. A metamodel, in turn, conforms to a meta-metamodel. The primary responsibility of this layer is to
define the language for specifying a metamodel. The meta-metamodeling layer forms
the foundation of the metamodeling hierarchy, established by OMG [124]. This layer
is often referred to as M3, and MOF is an example of a meta-metamodel. The metamodel, the model and user data object layers are referred to as M2, M1 and M0
respectively.
As we described in the previous section, although different modeling notations can be used in MDE of information systems, such as Entity-Relationship (ER) diagrams [153] and Object Role Modeling (ORM) [75], we focus on the Unified Modeling Language (UML) [120], a representative notation of the object modeling paradigm. In information systems design, UML models are normally augmented with OCL constraints [121] to express data integrity properties and the pre- and postconditions of update and query operations [51].
From an MDE perspective, and considering the four-layer metamodeling architecture outlined in Section 2.2.1, the UML and OCL elements used in an information system data model correspond to layer M1 and must conform to the metamodels of UML and OCL at layer M2. Data models, in turn, can be instantiated into object diagrams representing data instances of real-life systems (M0).
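For example, a typical OCL data integrity property on the running EIS example, such as ‘every employee’s age lies between 0 and 120’, corresponds in a set-theoretic reading (developed in Chapter 5) to a structured constraint upon object properties and their values:

\[
\mathit{age} \in \mathit{employees} \rightarrow 0..120, \qquad
\mathit{worksFor} \in \mathit{employees} \rightarrow \mathit{departments},
\]

where $\rightarrow$ denotes a total function, capturing both the range restriction on age and the multiplicity of worksFor.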
2.2.1
Metamodel hierarchy
An illustration of how these meta-layers relate to each other is shown in Figure 2.4, which is adapted from [124]. A metamodel (at level M2) is an instance of a meta-metamodel (at level M3), meaning that every element of the metamodel is an instance of an element in the meta-metamodel. Similarly, a model (at level M1) is an instance of a metamodel, and a user run-time object (at level M0) is an instance of a model element defined in the model at the upper abstraction layer.
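Concretely, for the running example the instantiation chain across the four layers reads:

\[
\underbrace{\mathit{anEmployee}}_{M0}
\xrightarrow{\;\mathrm{instanceOf}\;}
\underbrace{\mathsf{Employee}}_{M1}
\xrightarrow{\;\mathrm{instanceOf}\;}
\underbrace{\mathsf{Class}_{\mathrm{UML}}}_{M2}
\xrightarrow{\;\mathrm{instanceOf}\;}
\underbrace{\mathsf{Class}_{\mathrm{MOF}}}_{M3}
\]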
The relationship between the meta-metamodel, metamodels and models can be confusing. A major source of that confusion is the fact that the UML meta-metamodel (MOF) is a subset of UML itself (corresponding roughly to class diagrams). To help clarify this confusion, we may use an object-based representation technique, as outlined below.
2.2.2
Class-based and object-based representation of metamodels and models
As already stated, a model that is instantiated from a metamodel can, in turn, be used as a metamodel of another model in a recursive manner. As such, a MOF metamodel (at level M2) can be represented either in a class-based representation, by providing type descriptions via a set of model elements, as shown in Figure 2.4, or in an object-based representation, by a set of MOF objects that instantiate the MOF model elements, as shown in Figure 2.5. This dual representation is often left implicit when using the MOF, with users generally employing the class-based representation [133].
[Figure 2.4: Illustration of OMG metamodel hierarchy]

To illustrate this concept, in Figure 2.5 we show an object-based representation of the same metamodel hierarchy shown in Figure 2.4. The metamodel in level M2
is defined as a collection of MOF class objects. Objects c1, c2 and c3, at level M2, are instances of the MOF Class concept and are used to denote the Class, Property and Operation concepts of the UML metamodel. Objects p1, p2 and p3 are instances of MOF Property and are used to denote the name, isAbstract and ownedOperations properties of the UML Class concept. Similarly, at level M1, we show part of the user data model in an object-based representation: objects c1 and c2 are used to denote the Employee and Department classes, and objects p1 and p2 are used to denote the two associations in the model, worksFor and employees. The bottom layer of the diagram shows an instance of the Employee class.
[Figure 2.5: Object-based representation of metamodel hierarchy, shown in Figure 2.4]
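One way to make the contrast concrete is to consider how each representation might be stored. A class-based representation gives each metamodel concept a dedicated table, whereas an object-based representation stores every element, at every level, uniformly as an object with slots. The following SQL sketch of the object-based style is illustrative; the table layout is assumed rather than prescribed by MOF:

    -- every element, at any level, points at the element it instantiates
    CREATE TABLE element (
      elem_id INTEGER PRIMARY KEY,
      meta_id INTEGER REFERENCES element(elem_id)
    );

    -- property values ("slots") attached to elements
    CREATE TABLE slot (
      elem_id INTEGER REFERENCES element(elem_id),
      name    VARCHAR(50),
      value   VARCHAR(100)
    );

    -- M2: the UML Class concept (its own meta-link to MOF elided as NULL)
    INSERT INTO element VALUES (1, NULL);
    INSERT INTO slot VALUES (1, 'name', 'Class');

    -- M1: the Employee class, an object instantiating UML Class
    INSERT INTO element VALUES (10, 1);
    INSERT INTO slot VALUES (10, 'name', 'Employee');
    INSERT INTO slot VALUES (10, 'isAbstract', 'false');

    -- M0: one run-time instance of Employee
    INSERT INTO element VALUES (20, 10);
    INSERT INTO slot VALUES (20, 'name', 'Ed');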
2.2.3
Model-Driven Architecture (MDA)
An attempt to implement a development process based on the concepts of the metamodel hierarchy outlined above has been proposed and standardized by the OMG under the name of Model-Driven Architecture (MDA) [125]. The approach is based on three fundamental abstraction layers, each of which is related to specific artifacts. In particular, the highest abstraction levels deal with the description of business logic through a Computation-Independent Model (CIM) and a Platform-Independent Model (PIM); at these levels it is not necessary to know further information about the implementation platform. Such description is delegated to the lower level of abstraction, the Platform-Specific Model (PSM).
2.2.4
Model transformation
Model transformation is at the heart of MDE [142]. Model transformation techniques
can enable a wide range of different automated activities as domain models can be
transformed into other models at the same level of abstraction (horizontal model
transformation) or at a lower level of abstraction (vertical model transformation)
[45]. Examples of horizontal model transformation include model refactoring and
model weaving, while model refinement and code generation are considered examples
of vertical model transformation.
Model transformations define a mapping between a source and a target model
through a set of rules. Each rule links one or more model elements of the input with
the corresponding target. Obviously, this description has to be given in a generic
way; therefore, it cannot be based on a particular model instance but on the abstract
specification of such instances, that is, on the metamodels.
Model transformations can vary from simple element correspondences (translations) to intricate mappings which require a deep knowledge of the source and target domains and complex navigations and analysis of the given abstractions. As a consequence, the increasing development of model transformations demands a growing precision of model specifications. As we will explain in Section 2.3, this precision may be achieved by augmenting MDE techniques with formal semantics.
A considerable amount of research has been produced in the area of model transformation. Some of this research has developed into integrated approaches addressing different aspects of model transformation, accompanied by tool support; examples include ATL [90], GReAT [7], C-SAW [166] and VIATRA [44]. Our early investigation of information systems evolution used the ATL [90] tool suite to build a prototypical solution; see [1].
2.2.5
Information systems evolution
Information systems evolution can be driven by changes in various dimensions: end-user requirements (e.g. integrating new concepts and new business rules often translates into the introduction of new data structures and behavior) and technological advancements (e.g. the enhanced capabilities of DBMSs, together with increasing requirements for real-time answers to unplanned queries, may induce modifications to an information system's existing data structures and functions) [73]. Another dimension of change involves fixing error conditions (e.g. due to misunderstanding of initial requirements).
For the purpose of this research, the information systems evolution scenario we are
interested in here may be characterized by the continuous introduction of small and
incremental changes and refinements to the underlying object models employed in
the development of information systems. These changes are normally induced by new
user requirements and specified after initial system deployment. As a result, we may
expect to see a series of object model versions representing a natural progression:
or at least, one in which the differences between successive model versions can be
captured and explained.
We may describe the information system evolution problem as a variant of the coupled transformation problem outlined by [97], which occurs when multiple instances of software artifacts need to adapt to conform to an updated version of a shared specification so that they remain consistent with each other. This problem can be traced in several areas of computer science, such as database schema evolution, grammar evolution and format evolution.
Obviously, among these manifestations of the coupled transformation problem,
database schema evolution is the most closely related to the work we present here. A
detailed review of relevant database schema evolution approaches is included in the
next chapter.
2.3
Formal foundations for model-driven engineering
As highlighted in Section 2.2, the use of the MDE paradigm in the design and development of information systems has attracted increasing attention in both academia and industry. Using abstract models to describe various system concerns, to facilitate communication among stakeholders and to generate working information system implementations promises not only to increase system quality but also to reduce system development time and cost. However, the lack of precise semantics of the various modeling notations employed by MDE continues to cause difficulties. An informal description unavoidably involves ambiguities and lacks rigor, and thus precludes reasoning about, early simulation of, and automated analysis of a system design expressed by a model. For example, it would be impossible to prove the consistency of a non-trivial system described purely in a data model. Equally, within the context of the MDE model transformation process (outlined in Section 2.2.4), it would be extremely difficult to ensure that a target (generated) software model is valid: we need to define validation properties to ensure that the syntactic and semantic correctness of the generated model is in accordance with the specified metamodel.
These problems have been widely recognized [64, 17, 24, 38], and have led to
the development of a number of approaches to improving the precision of modeling
notations. The most common approach to the problem has been to make modeling
notations more precise and amenable to rigorous analysis by integrating them with a
suitable formal specification notation [24]. A number of integrated UML and formal
notations have been proposed, see [29] for a recent survey.
Formal specification methods can provide a precise supplement to modeling language description and can be rigorously validated and verified, leading to early detection of error conditions. A method is formal if it has a sound mathematical basis,
typically given by a formal specification language. This basis provides the means
of precisely defining notions like consistency, completeness and correctness. It provides the means of proving that a specification is realizable, proving that a system
can be implemented correctly, and proving properties of a system without necessarily
running it to determine its behavior.
2.3.1
Integrating modeling approaches and formal methods
An important aspect of this thesis is to investigate the formalization of UML data
models within the context of information systems evolution and data migration. We
aim to show how the assignment of precise meaning to UML data models can be
achieved by mapping a core part of UML modeling language constructs to appropriate
mathematical constructs in an appropriate formal notation. This mapping is a formal
equivalent of the informal semantics of the UML modeling language itself: an explanation
of what each construct in our selected modeling language means. Once our UML
models have formal semantics, we are able to exploit formal reasoning techniques
associated with the formal method that we have employed. Since UML is a large
modeling language, we restrict ourselves to considering formalization techniques for a core part of its static model.

[Figure 2.6: A simplified company example]
Example. Figure 2.6 shows an example inspired by [17]. A UML data model can provide a visually expressive and intuitive description of an information system. However, it is less effective when it comes to answering important questions about the system it represents. In particular, it is not possible to reason (in a precise manner) with the model or to deduce properties about it. For example, what is the relationship between Company and Person? Furthermore, does a company employ any of its Freelance staff? Based on informal arguments, these questions might be answered like this: ‘some persons must be employed by the company, as some employees are persons’, or ‘clearly, there is no relationship between the company and freelance staff - they are not connected’, or ‘surely, some freelance staff can be employed - the data model does not forbid this’. By developing a precise description of what a UML data model means, we can develop sound rules for reasoning with UML models.
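To preview what such a description looks like, the generalizations and the association of Figure 2.6 can be read as set-theoretic constraints; under one plausible reading (the naming is ours):

\[
\mathit{Employee} \subseteq \mathit{Person}, \qquad
\mathit{Freelance} \subseteq \mathit{Person}, \qquad
\mathit{employs} \in \mathit{Company} \leftrightarrow \mathit{Employee}
\]

In this reading, a freelance member of staff can be employed exactly when $\mathit{Employee} \cap \mathit{Freelance}$ is non-empty, and nothing in the diagram asserts that the two subsets are disjoint: the informal answers above become provable (or refutable) statements rather than matters of opinion.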
Accordingly, exploring the semantic base of UML with formal techniques can be
beneficial for a number of reasons [31]:
1. Formalization allows one to explore consequences of a particular design. Such
exploration can uncover problems related to incomplete, inconsistent or ambiguous specifications.
2. Variants of the semantics can be obtained by relaxing and/or tightening constraints on semantic models. This paves the way for performing various analysis
24
and simulation techniques in different contexts and can yield significant insights.
3. System verification is the process of showing that a system satisfies its specification. Formal verification is impossible without a formal specification. Although
we may never completely verify an entire system, we can certainly verify smaller,
critical pieces.
Naturally, given the above benefits, we should now ask: which formal method should we integrate with our core part of UML data modeling? Given the diversity of formal method notations, there is no immediate answer to this question. Each formal method provides different concepts and is well-suited to a particular context of use. In the following section we take a closer look at our formalization requirements and map them onto the characteristics of some well-known formal methods.
2.3.2
Choosing an appropriate formal method
To be able to identify an appropriate formal method to use, we need to establish the
main characteristics of information systems data models that we need to capture and
formalize. The main purpose of a data model is to represent structural concerns which
involve identifying appropriate static concepts, their properties and relationships,
as opposed to concepts representing other concerns such as real-time or distributed
systems.
Accordingly, an appropriate formal method that we can choose needs to provide
the capabilities to represent various concepts of static structures such as classes, properties and associations with a high level of abstraction. Such capabilities can typically
exist when the method provides mechanisms well-suited to capturing modular specifications. In addition, given our focus on refining abstract specifications into executable code, the formal method we choose should have a well-established
refinement mechanism. Ideally, such a mechanism should be an integrated part of a
method capable of representing the overall life-cycle of software system development.
Formal methods differ because their underlying specification languages have different syntactic (specific notations with which a specification is represented) and/or
semantic domains (a ‘universe of objects’ that is used to describe the system). An
important consideration prior to applying formal methods is that some languages may
be more suitable to one type of specification than to others [163]. For example, a formal method might be applicable for describing sequential programs but not parallel
ones, or for describing message-passing distributed systems but not transaction-based
distributed databases. Without knowing the proper domain of applicability, a user may apply a formal method to a domain for which it is ill-suited.
From a high-level perspective, we can distinguish three broad classes of formal methods: model-oriented, property-oriented and process-oriented [163]. Using a model-oriented method, a user defines a system’s behavior directly by constructing a model
of the system in terms of mathematical structures such as tuples, relations, functions,
sets, and sequences. Using a property-oriented method, a user defines the system’s
behavior indirectly by stating a set of properties, usually in the form of a set of axioms, that the system must satisfy. Examples of property-oriented languages include
Larch [65] and OBJ [68]. Concurrent systems are described using process-oriented formal specification languages, which are typically based on an implicit model of concurrency. In these languages, processes are denoted and built up from elementary expressions describing the operations of simple processes, which are combined to yield new, potentially more complex, processes. An example of this category of languages is Communicating Sequential Processes (CSP) [28].
The main criteria we outlined above allow us to eliminate property-oriented and
process-oriented methods. The emphasis of these methods on specifying system properties and behavioral concerns would prevent us from building an explicit model of system structure. This leaves us with the remaining category
of model-oriented languages. In the following section, we concentrate on presenting
some well-known or commonly-used model-oriented formal methods.
Model-oriented languages aim to capture software system structure and state
abstractly and succinctly. They are based on the idea of data abstractions and most of
them favor a style of specification based on abstract data types (ADTs). State-based
languages that favor a model-oriented style of specification include the languages
of VDM, Z, Object-Z and B. They all follow a similar approach to model software
systems, based on the concepts of set theory and predicate logic.
VDM (Vienna Development Method) [89] started in the early 1970s. Initially, the
language was intended to be used to model the semantics of programming languages,
but it was later adapted to model software systems in general. The latest version of
the language, VDM-SL, is documented in ISO standards [82]. VDM favors a style
of specification based on abstract data types (ADTs), providing special syntax to
support this style of specification (record types).
Z is based on typed set theory and first-order predicate calculus, enriched with a structuring mechanism based on schemas. Z is flexible and adaptable to different styles of specification, but the standard style is based on ADTs specified using
the schema calculus. The latest version of Z is in ISO standards [83]; the version
documented in [147] is still popular.
Object-Z is an extension of Z that facilitates the structuring of a model in an Object-Oriented style. The reference language definition is [143]. It provides the class schema
as a structuring mechanism (a collection of an internal state schema and operation
schemas) and support for Object-Oriented notions such as object, inheritance, and
polymorphism. Object-Z is the most successful OO extension to Z, among the ones
that emerged in the early 1990s (see [149] for a survey).
B [6] is a state-based model-oriented language and method designed by Jean-Raymond Abrial, one of the early contributors to Z. It includes a notation based on abstract machines, and a method to refine abstract models into code in a stepwise manner. Formal verification of proof obligations ensures that a specification is consistent throughout its refinements. B, like its predecessor Z [147], is based on set theory
and first-order predicate logic. For refinement, B requires a ‘refinement relation’ as
part of its invariant predicate, which is analogous to an ‘abstraction relation’ schema
in Z. The reference version of the language is documented in [6].
Comparison — VDM, Z, Object-Z and B are state-based model-oriented languages,
which usually model a system by representing its state as a collection of state variables, their values and some operations that can change system state. All are based on
set theory and mathematical logic. We can eliminate VDM and Object-Z: according to [89] and [143], these two methods are better suited to describing programs written in an object-oriented style, while our main focus is on describing database data models. The Z notation is a strongly typed specification language. It is not an executable notation; it cannot be interpreted or compiled into a running program [85]. Z and B converge in their approach to modeling software systems, which reflects Abrial’s influence on both of them, but they diverge in their level of abstraction: Z is more abstract, focusing on modeling software systems, whereas B is more like an abstract programming language with a very strong emphasis on refining models into code. The two languages also differ in their semantics: Z has a denotational semantics, while B has a semantics based on weakest preconditions.
At an early stage of this research, our initial conclusion was that either language would be applicable. However, upon further investigation, and considering our intended development context, we favored B for three main reasons. First, B’s modular Abstract Machine Notation (AMN) and Generalized Substitution Language (GSL) mapped appropriately onto our intended representations of the data model and the evolution
model, respectively. The lower level of abstraction provided by these two notations ensured that representing the semantics of our data and evolution modeling concepts would not be cumbersome. Second, B’s refinement mechanism, which covers both data and operation refinement, can be used to generate executable code from a specification.
Finally, compared to Z, B is better equipped with tool support: it is supported by tools for both proof and refinement. B proof tools such as Atelier B [13] and the B-Toolkit [102] are well-documented and can be used at various development steps to ensure the consistency of both the abstract specification and its refinements.

In this thesis, we therefore propose to use the B-method to formalize UML data models. In the following section, we describe the main elements of formal modeling with B.
2.4 Formal modeling with B
A B model is constructed from one or more abstract machines. Each abstract machine has a name and a set of clauses that define the structure and the operations of the machine. Figure 2.7(a) shows the main clauses of B AMN: the SETS clause contains definitions of sets; VARIABLES defines the state of the system, which must conform to the properties stated in the INVARIANT clause. The INITIALIZATION of the variables, and the variable manipulations in the OPERATIONS clause, must also preserve the invariant properties. OPERATIONS are based on the GSL, whose semantics is defined by means of predicate transformers [71] and the weakest precondition [52].

MACHINE M
SETS S
VARIABLES v
INVARIANT I
INITIALIZATION T
OPERATIONS
  out <- op =
    ...
    /* GSL Operators */
    ...
END

(a) Abstract Machine clauses

GSL Operator      Substitution
SKIP              Immediate termination
S1 [] S2          CHOICE S1 OR S2 END
P | S             PRE P THEN S END
P ==> S           SELECT P THEN S END
S1 ; S2           do S1 then S2
S1 || S2          do S1 and S2

(b) Partial list of GSL operators

Figure 2.7: Main elements of B-method AMN and GSL notations
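To illustrate the idea of a weakest precondition (a worked instance of our own, not an example from [6]): the substitution x := E establishes a postcondition P exactly in those initial states satisfying P with x replaced by E, written [x := E]P = P[E/x]. For example, over the integers,

[x := x + 1](x > 0)  ==  (x + 1 > 0)  ==  (x >= 0)

so x := x + 1 is guaranteed to establish x > 0 precisely when started in a state where x >= 0.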
A generalized substitution is an abstract mathematical programming construct, built up from basic substitutions. For example, the Assignment operator takes the form
x := E, corresponding to the assignment of expression E to state variable x. The Preconditioning operator P | S executes as S if the precondition P is true; otherwise, its behavior is non-deterministic and not even guaranteed to terminate. The statement @x.(P ==> S) represents the unbounded choice operator, which chooses an arbitrary x that satisfies predicate P and then executes S with that value of x. Other GSL operators include SKIP, bounded choice, guarding, and sequential and parallel composition, as can be seen in Figure 2.7. [6] provides more details on GSL and AMN.
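As a small illustration of these clauses and operators (a minimal sketch of our own; the machine and its identifiers are hypothetical), consider an abstract machine recording a set of enrolled students, with a single preconditioned operation written in the machine-readable ASCII syntax used throughout this dissertation:

MACHINE Registration
SETS STUDENT
VARIABLES enrolled
INVARIANT enrolled <: STUDENT
INITIALIZATION enrolled := {}
OPERATIONS
  enrol(ss) =
    PRE ss : STUDENT & ss /: enrolled
    THEN enrolled := enrolled \/ {ss}
    END
END

Here the invariant states that enrolled is a subset of STUDENT, and the precondition of enrol corresponds to the GSL preconditioning operator P | S: the operation is only guaranteed to behave as specified when called with a student who is not already enrolled.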
B is a proof-based development method which integrates formal proof techniques
in the development of software systems. At each step of the development process, B
gives rise to a number of so-called proof obligations which guarantee consistency of
specifications and correctness of subsequent implementation. Such proof obligations
can be discharged by a B proof tool using automatic or interactive proof procedures
supported by a proof engine, for example [102].
The B-method supports the notion of data refinement. Design decisions that are
more concrete (closer to an executable code) are stated in refinement machines as
opposed to abstract machines that include abstract design decisions. A refinement
machine must include a linking invariant which relates abstract state to a refinement
state. In addition, a refinement machine will have exactly the same interfaces as the
machine it refines. This means that it will have the same operations as the abstract
machine with exactly the same input and output parameters. Furthermore, operations in the refinement machine are required to work only within the preconditions
given in the abstract machine, so those preconditions are assumed to hold for the
refined operations [141].
Assume that the refinement machine M1, shown in Figure 2.8, is a refinement of the abstract machine M:

MACHINE M
VARIABLES v
INVARIANT I
INITIALIZATION T
OPERATIONS
  out <- op = PRE P THEN S END
  ...
END

REFINEMENT M1
REFINES M
VARIABLES v1
INVARIANT J
INITIALIZATION T1
OPERATIONS
  out1 <- op1 = PRE P1 THEN S1 END
  ...
END

Figure 2.8: Example of Data Refinement in B
Machine M1 might then contain new variables, and might replace the abstract data structures of machine M with concrete ones. The invariant J of M1 defines not only the invariant properties of the refinement machine, but also the linking invariant connecting the state spaces of M and M1.
For a refinement step to be valid, every possible execution of the refined machine must correspond (via J) to some execution of the abstract machine. To demonstrate this, we should prove that the initialization T1 is a valid refinement of T, and that each operation of M1 is a valid refinement of its counterpart in M. In other words, we must discharge the following proof obligations to ensure that the transformation preserves the properties of the abstract level:
1. [T1] ¬ [T] ¬ J
This proof obligation states that every initial state established by T1 in the refinement machine must have a corresponding initial state established by T in the abstract machine, via the linking invariant J.
2. I ∧ J ∧ P ⇒ [S1] ¬ [S] ¬ J
This proof obligation states that every possible execution [S1] of the refinement machine must correspond (via the linking invariant J) to some execution [S] of the abstract machine. We require this to hold in any state that both the abstract machine and the refinement machine can jointly be in (as represented by the invariants I ∧ J), and only when the operation is called within its precondition P.
3. I ∧ J ∧ P ⇒ [S1[out1/out]] ¬ [S] ¬ (J ∧ out1 = out)
This proof obligation has exactly the same reading as proof obligation 2 above, with the added condition that the output out1 of the refinement operation must also be matched by the output out of the abstract machine operation.
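As a small, self-contained illustration of these obligations (a sketch of our own, with hypothetical machine names), the Registration machine sketched above could be refined by replacing the abstract set with an injective sequence, the linking invariant J relating the two representations:

REFINEMENT RegistrationR
REFINES Registration
VARIABLES enrolledSeq
INVARIANT enrolledSeq : iseq(STUDENT) & ran(enrolledSeq) = enrolled
INITIALIZATION enrolledSeq := []
OPERATIONS
  enrol(ss) = BEGIN enrolledSeq := enrolledSeq <- ss END
END

Discharging the second proof obligation here amounts to showing that appending ss to enrolledSeq re-establishes ran(enrolledSeq) = enrolled after the abstract operation has performed enrolled := enrolled \/ {ss}, which holds because the abstract precondition guarantees ss /: enrolled.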
To carry out these proofs, the B-toolkit [102] includes two complementary provers.
The first one is automatic, implementing a decision procedure that uses a set of
deduction and rewriting rules. The second prover allows the users to enter into a
dialogue with the automatic prover by defining their own deduction and/or rewriting
rules that guide the prover to find the right way to discharge the proofs.
IMPLEMENTATION M2i
REFINES M2
IMPORTS Machine1, Machine2
SETS SET1; SET2
CONSTANTS CONST1, CONST2
PROPERTIES CONST1 > CONST2
INVARIANT I1 & I2
OPERATIONS OP1; OP2
END

Figure 2.9: Main clauses of B Implementation Machine
An implementation is a particular type of B machine. When a B-developed system is to be implemented in executable code, it is typically refined to an implementation, which IMPORTS an abstract specification of an existing (coded) development to implement the operations of a refined component.
Figure 2.9 shows the main clauses of an implementation machine. The REFINES clause names the abstract machine or the refinement being implemented; IMPORTS names an abstract machine whose operations are used in the implementation; SETS defines local sets, which must be fully enumerated; CONSTANTS defines single scalar values, which are fully defined in the PROPERTIES clause; the INVARIANT clause links the states of the imported and refined machines. OPERATIONS lists the operations from the refined machine which are implemented in terms of operations from the imported machine.
The proof obligations for an implementation are the same as those for a refinement. In addition, given that an implementation machine is closer to a concrete implementation than the machine it refines, it allows only a restricted number of substitution forms compared to those permitted in an abstract machine or a refinement. Those forms include simple variable assignment, IF THEN ELSE END, VAR IN END and sequential composition. One form of substitution that exists only in an implementation is the WHILE loop. This is an important form of implementation substitution, with its own proof obligations. A WHILE loop takes the form
[WHILE P DO S INVARIANT I VARIANT V END] R
The operation of the loop is controlled by a while-test, denoted by P. The body
of the loop, denoted by S, is executed repeatedly as long as the predicate P is true.
As soon as the predicate becomes false, the loop terminates. A loop is normally
preceded by a sequence of instructions called loop initialization that prepares values
in variables that will be used in the loop. The loop variant is an expression that denotes a natural number and provides an upper bound on the number of remaining iterations of the loop body. The loop invariant is a predicate that makes a statement about the values
of variables in the loop body. The invariant must be true before the loop starts,
while the loop progresses and after the loop terminates, as elaborated further in loop
correctness proof obligations below.
S0 ;
WHILE P DO
  S
INVARIANT I
VARIANT V
END

(PO1) [S0] I
(PO2) ∀X . (I ∧ P ⇒ [S] I)
(PO3) ∀X . (I ⇒ V ∈ ℕ)
(PO4) ∀X . (I ∧ P ⇒ [n := V ; S] (V < n))
(PO5) ∀X . (I ∧ ¬P ⇒ R)

Figure 2.10: Proof obligations of B implementation loop
Figure 2.10 states the rules of loop correctness in the form of proof obligations.
The first Proof Obligation (PO1) states that the loop invariant I holds on entry to
the loop; (PO2) states that the body of the loop S preserves the invariant; (PO3)
states that the loop variant V is a natural number; (PO4) states that the variant
strictly decreases on each loop iteration; finally, (PO5) states that the desired result
R holds on exit from the loop.
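To make these obligations concrete, consider a minimal sketch of our own (the variables are hypothetical, and aa and nn are assumed to be natural-number constants): a loop computing res = aa * nn by repeated addition.

res := 0 ; kk := 0 ;
WHILE kk < nn DO
  res := res + aa ;
  kk := kk + 1
INVARIANT
  kk : 0..nn & res = aa * kk
VARIANT
  nn - kk
END

The invariant holds on entry (res = aa * 0), is preserved by the body (res + aa = aa * (kk + 1)), and the variant nn - kk is a natural number that strictly decreases on each iteration; on exit, the invariant together with the negated test forces kk = nn, and hence the desired result res = aa * nn.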
Note: throughout this dissertation, we describe our B-method formalization using
machine-readable ASCII symbols. A brief mapping of the commonly used symbols in
this dissertation to corresponding representation of mathematical operators can be
found in Appendix B.
Chapter 3
Towards a Data Model Evolution Language
3.1 Introduction
In this chapter, we take the first steps towards designing a data model evolution language. The main purpose of this language is to allow information system designers to abstractly specify data model changes, and subsequently to use this abstract specification as a basis for generating platform-specific data migration programs.
One of our aims in this thesis is to look into the information systems evolution problem from a model-driven perspective. This aim implies a basic assumption: the information system under consideration has been developed in a model-driven fashion, i.e. it has been specified as a model which is subsequently transformed and refined until a working implementation is obtained.
With this aim in mind, at the beginning of this chapter, we depict the basic idea
of our approach in Section 3.2. Using MOF abstraction layers as a starting point, we
investigate main considerations for designing a data model evolution language and
categorize these considerations into two key design requirements, revolving around
precise specification of changes and managing change consequences on consistency
conditions and data instances.
We then look into relevant state of the art literature. We have identified database
schema evolution and model-driven engineering as the main areas of related research.
Following [55], a database schema is a description of the information content of a database. As such, we consider Entity-Relationship (ER) diagrams, relational models and object models all to be data models, regardless of the underlying database implementation technology. Accordingly, we review schema evolution approaches in object
and relational databases to confirm our identified key requirements and to develop a
list of key features that a data model evolution language should present.
The model-driven engineering paradigm treats abstract models as the source code
for a range of testing, implementation, and configuration processes. Although little
has been done in terms of the generation of data migration implementations, there
is clearly related, general-purpose work in the areas of model refactoring, model comparison, model weaving and metamodel evolution, as we discuss later in this chapter.
The outcome of the above investigation will be utilized in two ways. First, we develop a list of features that a data model evolution language should provide. Second, we identify the kinds of changes typically required when evolving data models, including the compound evolution steps that modelers typically need to perform and that are commonly presented in the literature. Additionally, we identify techniques used to manage the effect of data model evolution, in particular for mapping changes to instances in the underlying database.
3.2 Towards a Data Model Evolution Language
Figure 3.1 below illustrates the main ideas of the approach that we investigate in this chapter. Following the abstraction hierarchy established in [120], the figure depicts three abstraction layers. A data model, in the M1 layer, specifies the structure (data types, relationships and constraints) and operations of an information system [117]. A data model is defined using a specific data modeling language (also referred to as a metamodel), shown in the M2 layer. This modeling language provides the modeling constructs and Well-Formedness Rules (WFRs) that a data model must preserve. Layer M0 represents user data objects, or instances, that have been collected and persisted.
As data models evolve from one version to another, we would like to be able
to describe changes to an existing data model. A data model evolution language
thus needs to provide support for describing data model changes. Such changes typically involve primitive operations: adding, deleting or modifying model elements (e.g.
classes and attributes of a data model). However, in many cases a data modeler needs
to change a collection of related model elements. Hence, a data model evolution language would need to provide support for compound operations: referring to multiple
model elements (e.g. extracting a superclass or merging two classes together).
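To give a flavor of such compound operations, a purely illustrative decomposition (the operator names here are hypothetical; our actual operators are defined in Chapter 4) might build an ‘extract superclass’ step, over the classes of Figure 2.6, from a sequence of primitives:

extractSuperclass(Employee, Freelance, Worker) ==
  addClass(Worker) ;
  addGeneralization(Employee, Worker) ;
  addGeneralization(Freelance, Worker) ;
  moveAttribute(Employee, name, Worker)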
Figure 3.1: Overview of the approach. At the M2 (metamodel) layer sits the data modeling language; at the M1 (model) layer, data model version vn evolves into version vn+1 (model evolution), each conforming to the data modeling language; at the M0 (system) layer, data conforming to each data model version is migrated accordingly (data migration).
Data models are used for collecting and persisting user data objects. When a data model evolves, a natural consequence is to propagate the data model changes to the data instances. A key requirement for a data model evolution language is thus to propagate the data model changes to the database, i.e. to derive and execute data migration correctly by implementing the changes specified in the data model evolution specification.
In many model evolution and data migration scenarios, values of newly introduced or modified attributes or associations can be expressed in relation to values existing in a source data model. For example, in our evolution specification of a student
management system, we may wish to express that a data value of ‘A’ in a student
grade data element in the evolved model is equivalent to a data value of ‘outstanding’
in the existing model. It would, therefore, be beneficial if the language can provide
support for annotating attributes and associations with expressions representing their
new intended value in relation to existing ones.
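On a relational platform, such an annotation could then induce a concrete migration step. For illustration only (the table and column names are hypothetical), the grade mapping above might translate into SQL such as:

UPDATE Student
SET    grade = 'A'
WHERE  grade = 'outstanding';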
It is important to note here that the conformsTo relationship in Figure 3.1 requires satisfying two kinds of constraints, at two different levels of abstraction. The
conformsTo relationship between data models and the modeling language (M1 to
M2) requires data models to satisfy constraints defined at the modeling language
level. Such constraints normally specify how a valid data model can be formed. In
addition, the conformsTo relationship between data instances and data model (M0 to
M1) requires data instances to satisfy constraints defined at data model level. Such
constraints represent integrity properties that may restrict values that a data element
may have or the way a data element may relate to another data element. As a re36
sult, while evolving data models, a data model evolution language needs to provide
support for preserving both kinds of constraints.
Finally, following an evolution, we may want to analyze some of the functionality
or properties of the evolved model. For example, we may wish to determine whether
the evolved model still conforms to the language in which it was initially written or
whether the evolved model still preserves the properties of the source model. To be
able to address such requirements, a data model evolution approach needs to provide precise semantics for the data model and for its related changes. Such precise semantics can be exploited in an appropriate analysis framework to provide answers to the kinds of questions raised above.
In summary, from the above explanation we can synthesize a number of key requirements that need to be addressed by a data model evolution language. These requirements fall into two main categories: specifying evolution and managing the effect of evolution. The first key requirement deals with the ability to precisely specify different kinds of changes to a data model. The second deals with the ability of the language to maintain model consistency, so that an evolved model still conforms to the modeling language. In addition, as the change specifications need to be mapped to executable data migration programs, the language needs to provide mechanisms for confirming that the migrated data instances preserve the integrity rules of the evolved data model.
3.3 State of the Art
Considering the two key requirements which we have identified above, here we examine relevant approaches in two main research areas: database schema evolution and
model-driven engineering.
As Figure 3.2 shows, each of the two main research areas includes a number of
subareas where different aspects of our research have been addressed. In our investigation of these research efforts, we focus on the two key dimensions we have identified:
specifying evolution and managing the effect of evolution.
Within database schema evolution, we review object and relational database schema evolution approaches. However, we place more emphasis on object databases, mainly because of the similarity between the data models proposed in object database approaches and the underlying data model we consider in this thesis. In addition, by closely reviewing object database schema evolution approaches, we
aim to achieve another, more specific, goal: given that primitive evolution operations are largely the same regardless of the object model employed, we aim to identify the composition requirements and the kinds of compound evolution operations that our data model evolution language needs to support.

Figure 3.2: Overview of relevant research areas with a focus on two key dimensions, specifying evolution and managing the effect of evolution. Database schema evolution (covering OODB and relational database schema evolution) and model-driven engineering (covering model refactoring, model comparison and metamodel evolution) intersect in the model-driven evolution of information systems.
3.3.1 Database Schema Evolution
Once a database system is developed, various reasons can require changes to be
made to the database system (e.g. due to a change in user requirements, a fix to
an erroneous condition, or a need to support new applications). These changes can
imply modifications to the conceptual schema, the logical schema or the database state. In the literature, this process is termed schema evolution. It is defined as the process of
applying changes to a schema in a consistent way and propagating these changes to
data instances while the database is in operation [50]. We start our review of schema
evolution literature by looking into approaches that investigate the evolution of object
database schemas.
Object Database Schema Evolution Approaches. Several object data model representations have been proposed in the literature, with corresponding schema evolution approaches, for example [14, 60, 100, 131, 39, 40, 134].
The underlying object data models in most object-oriented database approaches are, to a large extent, similar. The data models of earlier approaches like [14, 60] consisted of a collection of classes organized in a lattice (a rooted and connected directed acyclic graph). Each class has at least one superclass (except the root object class). Both the instance variables and the methods of a class can be inherited and overridden. The object data models of more recent approaches such as [39, 40, 134] were based on the ODMG standard [165].
Most object-oriented database approaches establish a taxonomy of pre-defined schema changes, such as the addition or deletion of classes and attributes. While some of these approaches, like [14], did not define any compound evolution operations in their taxonomies, other approaches [26, 100, 134] introduced a set of high-level operators. Table 3.1 outlines the taxonomies proposed by selected object database schema evolution approaches.

ORION - Banerjee et al (1987) [14]
  Primitive evolution operations:
    • Changes to a node (add, drop, rename)
    • Changes to an edge (making a class a superclass, removing a class from the superclass list, changing the order of superclasses)
    • Changes to an instance variable (add, drop, rename, change of domain, change inheritance, change default value)
  Compound evolution operations: none

OTGen - Lerner and Habermann (1990) [101]
  Primitive evolution operations:
    • Class (add, delete, rename, add superclass, delete superclass)
    • Instance variable (add, delete, rename, change type)
  Compound evolution operations:
    • Extract superclass
    • Extract subclass
    • Extract class
    • Merge classes

O2 - Ferrandina and Zicari (1995) [60]
  Primitive evolution operations:
    • Creation, modification, renaming and deletion of a class
    • Creation, modification, deletion and renaming of an attribute
    • Creation and deletion of an inheritance link between two classes

SERF [39]
  Primitive evolution operations:
    • Add-class
    • Destroy-leaf-class
    • Add and delete-ISA-edge
    • Add and delete-attribute
  Compound evolution operations:
    • Merge-classes-union
    • Merge-classes-difference
    • Inline-class
    • Encapsulate-class
    • Generalize and specialize-classes

ROVER [40]
  Primitive evolution operations:
    • Add and delete-class
    • Add and delete-ISA-edge
    • Add and delete-reference-attribute
    • Form and drop-relationship
  Compound evolution operations:
    • Change-cardinality-1m
    • Change-cardinality-m1
    • Change-relationship-name
    • Change-type

Schema modification by catalog [134]
  Primitive evolution operations:
    • Add and delete class
    • Add and delete attribute
    • Add and delete inheritance link
  Compound evolution operations:
    • Specialize super type
    • Extract class, super- and subclass
    • Inline class, super- and subclass
    • Move feature over reference
    • Merge classes
    • ...

Table 3.1: An outline of selected database schema evolution approaches
In our work, similar to [100] and [134], we assume that compound data model evolution operations can be decomposed into simpler primitive evolution operations. A prerequisite step here is the provision of composition operators that allow compound operations to be built on top of the primitive ones. [26], in particular, showed how to implement compound operators using primitive operators on the O2 object model. [134] based the definition of composites on the notion of links between classes. Another kind of compound evolution operation, focusing on evolving relationship constructs, was introduced by Claypool et al [40].
Maintenance of data model integrity constraints was an active area of research within the object database schema evolution community. In earlier approaches like [14], integrity maintenance was achieved by using a set of rules for selecting the most meaningful way to preserve the invariant properties under each of the schema changes. A similar approach was adopted by O2 [49]. In more recent approaches like [39], schema invariants are maintained through the use of contracts (pre- and postconditions) associated with each evolution operation.
Relational Database Schema Evolution Approaches. The relational data model organizes data in the form of relations (tables). These tables consist of tuples
(rows) of information defined over a set of attributes (columns). The attributes, in
turn, are defined over a set of atomic domains of values. Many relational database
schema evolution approaches such as [42] rely on the Data Definition Language (DDL)
statements from SQL (CREATE, DROP, ALTER) to perform schema evolution.
Schema evolution support in the SQL language is primitive in nature: unless there is an extension to the language, each statement describes a simple change to a schema (e.g. adding or dropping individual tables or columns). One such extension is
the Schema Modification Operators (SMO) proposed by the PRISM approach [113].
In SMO each statement represents a common database restructuring action that
requires data migration. In addition, in PRISM, each SMO statement and its inverse
can be represented as a logical formula in predicate calculus as well as SQL statements
that describe the alteration of schema and movement of data. Another SQL extension
to support relational schema evolution was proposed by HECATAEUS [129]. Here, a central construct is an evolution policy, which is a syntactic extension to SQL. DB-MAIN is a conceptual modeling platform that offers services connecting models and databases. Based on the DB-MAIN platform, a number of proposals such as [41] and [42] were made to ensure that changes to a model propagate to the database in a way that evolves the database and maintains the data in its instance, rather than dropping the database and regenerating a fresh instance.
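To make this contrast concrete, consider a minimal sketch (the table and column names are hypothetical, the conversion factor is invented, and DROP COLUMN support varies across SQL dialects) of evolving a schema in place while preserving the existing rows:

-- add the new column required by the evolved model
ALTER TABLE Employee ADD COLUMN salary_eur DECIMAL(10,2);

-- migrate existing data into the new column
UPDATE Employee SET salary_eur = salary_gbp * 1.15;

-- remove the column that no longer appears in the evolved model
ALTER TABLE Employee DROP COLUMN salary_gbp;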
In a relational model, integrity constraints can take various forms. The most popular types are entity integrity and referential integrity constraints [55]. The first guarantees that no two tuples belonging to the same relation refer to the same real-world entity; in other words, it guarantees the uniqueness of keys. The second makes sure that whenever a column in one table derives its values from a key of another table, those values are consistent. A great deal of research has been carried out on the enforcement of integrity constraints in relational databases. In the pioneering work of [35], a general framework is described for transforming constraints into active rules for constraint maintenance. Türker and Gertz [158] proposed a set of system-independent rules to implement triggers based on constraint specifications, using Tuple Relational Calculus.
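For example (hypothetical tables), both kinds of constraint are typically declared with the schema, after which the DBMS enforces them on every insertion, update and deletion:

CREATE TABLE Department (
  dept_id INTEGER PRIMARY KEY          -- entity integrity: unique key
);

CREATE TABLE Employee (
  emp_id  INTEGER PRIMARY KEY,
  dept_id INTEGER REFERENCES Department(dept_id)  -- referential integrity
);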
This concludes our review of related approaches in database schema evolution
area. We now take a closer look into main relevant approaches in Model-Driven
Engineering.
3.3.2 Model-Driven Engineering
In Model-Driven Engineering, model refactoring, model weaving, model comparison and model co-evolution are potentially relevant areas of research for the main focus of our thesis.
Model Refactoring. The work of Opdyke [127] and Roberts [139] focused on refactoring object-oriented programs in such a way that the refactoring does not alter the external behavior of the code, yet improves its internal structure. Fowler et al [62] promoted the notion of refactoring as a means of revising design models and hence code, and presented a catalogue of refactoring rules. Although Fowler did not provide a precise definition of his refactoring rules (e.g. using a formal notation), we find his work relevant to the work we carry out in this thesis: it can be an appropriate source of candidate model evolution patterns.
Moving features between objects:
  • Move Field
  • Extract Class
  • Inline Class

Organizing data:
  • Replace Data Value with Object
  • Change Value to Reference
  • Change Reference to Value
  • Change Unidirectional Association to Bidirectional
  • Change Bidirectional Association to Unidirectional
  • Replace Type Code with Class
  • Replace Type Code with Subclasses
  • Replace Type Code with State/Strategy
  • Replace Subclass with Fields

Dealing with generalization:
  • Pull Up Field
  • Push Down Field
  • Extract Subclass
  • Extract Superclass
  • Collapse Hierarchy
  • Replace Inheritance with Delegation
  • Replace Delegation with Inheritance

Table 3.2: A selected list from Fowler’s refactoring catalog
In his book, Fowler presented an extensive catalogue of refactoring, primarily
based on industrial experience. Sixty-eight refactorings were presented and organized
into six categories according to the kind of refactoring. Since Fowler’s focus in his
book was on refactoring design models of object-oriented code which is different from
our focus on evolving data models, only a subset of Fowler’s refactorings would be
relevant for the work we present here. Therefore, when reviewing Fowler’s refactoring
catalogue, we limited ourselves to those refactoring rules that we find suitable for data modeling and that may have a direct impact on persistent data. For example, we can exclude categories such as composition of methods, which is related to the re-organization and re-structuring of methods. Fowler’s refactoring rules that we find relevant were
categorized under three main categories: moving features between objects, organizing
data and dealing with generalization. Inside these three categories, we had to go one
step further and exclude those refactoring rules that deal with the way the code is
organized (e.g. pull up constructor body) or those which deal with the dynamic behavior of an object-oriented program (e.g. introduce foreign method). As a result of our
review, we extracted a list of Fowler’s refactoring rules which is presented in Table
3.2.
Since our focus is on data model evolution, we are particularly interested in investigating the application of the refactoring notion to models. Most of the work on
model refactoring was based on Fowler’s refactoring catalogue. Approaches such as
[69, 21, 108, 151] proposed different ways for refactoring UML Class diagrams and
statechart models. However, since these approaches do not contribute directly to our two identified key requirements, we take note of them here without reviewing their details further.
The refactoring of a model might result in the evolution of persistent data. Thus,
persistent data needs to co-evolve. This aspect of refactoring is relevant to our second
key requirement which deals with managing the effect of evolution. We may note
that many of the changes made to schemas in Object Oriented Databases (OODBs)
as discussed in Section 3.3.1 are similar to refactoring. In spite of this close relation
between OODB schema evolution and model refactoring, little has been done by the model-driven engineering community to support data migration as a result of data model refactoring [156].
Although the notion of refactoring has been widely investigated, a precise definition of behavior preservation is rarely provided [111]. Fowler, for example, did
not offer any characterization to prove this essential refactoring property. Rather, he
presented these refactoring steps in an informal way, providing step-by-step guidelines for their application. A similar approach was adopted by Ambler [10] in his work on database refactoring. In the model refactoring literature, some efforts were made to characterize the notion of behavior preservation. For example, in [151], for
class diagram refactoring, it was argued that the refactoring steps carried out on the
model do preserve the behavior (e.g. creating a generalization between two classes
does not introduce new behavior).
As the essence of refactoring is behavior preservation, pure refactoring is insufficient to assist in data model evolution, since model evolution could introduce new behavior. Therefore, we continue to investigate the literature on other model-driven engineering approaches that involve other kinds of structural changes to models.
Model Comparison. Model comparison can be used as a first step towards model evolution. Instead of editing a source model in order to obtain a target (evolved) model, we may wish to build the target model in isolation from the source model and use a model comparison technique to identify the difference between the two model
versions. This difference can then be used to derive the migration of the underlying
data.
Model comparison approaches aim to discover how a model has evolved and to generate a difference between two model versions. The Epsilon Comparison Language (ECL) was presented in [94] as a task-specific language of the generic model management language, the Epsilon Object Language (EOL) [95]. ECL features a rule-based, metamodel-agnostic syntax. The main focus of ECL is on matching element(s) of a left model to element(s) of a right model. ECL proposes no specific representation of the result of model comparison (e.g. a difference model), nor any integration with transformation techniques. In [70], the capability to find mappings and differences between models was termed model differentiation. The authors presented a metamodel-independent algorithm and a tool (DSMDiff) for detecting mappings and differences between domain-specific models. The algorithm presented determines only whether two models are syntactically equivalent. Structural differences in DSMDiff are shown in a tree browser with coloring and iconic notations, which lack the ability to integrate into other MDE processes (e.g. model transformation).
EMF Compare [80] is another model comparison tool, part of the larger Eclipse Modeling Framework Technology (EMFT) project. The objective of this component is to provide a generic and scalable implementation of model comparison. Although EMF Compare represents a model difference as a model, which makes the tool appealing for the purpose of our approach, it offers limited support for our data migration scenario. The generic capabilities of the tool imply that it operates at too high a level of abstraction, making it inflexible to the domain-specific customization required for a model-driven data migration approach.
In general, while model comparison and difference representation approaches address a similar problem, the structural evolution of models, there are three important differences from the work we are investigating. First, in these approaches, a primary assumption is that a target model already exists, and the focus is on calculating and characterizing the difference between a source model and a target model. Second, model comparison and difference representation approaches seem to limit their scope to the structural elements of a model, with no consideration of integrity constraints or well-formedness rules. Third, these approaches tend to be generic and do not focus on requirements specific to information system evolution, such as data migration.
Metamodel Evolution and Model Co-Evolution. A number of approaches have been proposed to address the problem of metamodel evolution and model co-evolution, i.e. adapting (migrating) models that conform to an older version of a metamodel to a newer version of the metamodel.
[78] investigated the potential for automating model migration in response to metamodel changes by analyzing the evolution history of two industrial metamodels. A classification of metamodel changes, based on their potential for automation, was presented as a basis for specifying the requirements for effective tool support. Another classification of metamodel changes in EMF/Ecore was presented by [32], which introduced a process model defining the necessary steps to migrate model instances upon an evolving metamodel. Wachsmuth [160] proposed an operator-based approach to metamodel evolution and classified a set of operators according to their preservation of metamodel expressiveness and existing models. Cicchetti et al. [37] proposed to represent metamodel changes as difference models conforming to a difference metamodel, in order to identify semi-automated countermeasures to co-evolve the corresponding models. Here, the co-evolution of models induced by metamodel changes is reduced to three cases: ∆MM[resolvable], containing resolvable changes (e.g. extracting an abstract superclass); ∆MM[unresolvable], containing unresolvable changes (e.g. adding an obligatory meta-property); and non-breaking changes, which do not need to be propagated and are ignored (e.g. adding a non-obligatory meta-property).
In [78] and [37] metamodel evolution is specified by a sequence of operator applications. To support the co-evolution of models, each operator application can be
coupled to a model migration separately. Operator-based approaches generally provide a set of reusable operators which work at the metamodel level as well as at the
model level. At the metamodel level, a coupled operator defines a metamodel transformation capturing a common evolution. At the model level, a coupled operator
defines a model transformation capturing the corresponding migration.
Our work aims to deal with data model changes (M1) which have consequential effects on data instances (M0) and require explicit treatment of data model constraints to ensure that the semantic integrity of the data is preserved. Moreover, with the exception of [37], the classified changes are not represented as models; hence they lack the ability to integrate with other MDE processes (e.g. model transformation).
3.4 Our approach
In this section, we synthesize the key requirements that need to be addressed by a data model evolution language and highlight, at a high level, the main components
Specifying evolution
• Ability to express primitive evolutionary changes
• Support for compositionality
• Support for semantic linkage of data
Managing the effect of evolution
• Maintenance of model consistency
• Maintenance of data integrity
• Support for data migration
Figure 3.3: Main requirements of data model evolution language
of our approach. These components will be elaborated in great detail in subsequent
chapters.
3.4.1 Synthesizing design requirements
As we have seen throughout the previous section, there is a considerable body of related and applicable work addressing different aspects of information systems evolution and data migration. Although the database schema evolution approaches that we reviewed in Section 3.3.1 provide solid theoretical foundations and interesting methodological efforts, their lack of abstraction was observed in [137] and remains largely unresolved after many years. Conversely, the model-driven engineering approaches which we reviewed in Section 3.3.2 promote the idea of abstracting from implementation details by focusing on models as first-class entities. Approaches such as model weaving, model refactoring, model comparison and metamodel evolution may help in characterizing information systems evolution; however, for our purpose, they remain largely general-purpose and offer no specific support for information systems evolution tasks such as data migration.
From the review of the above literature, we can synthesize requirements that need
to be addressed by a data model evolution language. We use the two main categories
we identified earlier in section 3.2 to classify a number of more specific and desirable
features for a data model evolution language. This list of requirements is presented in Figure 3.3 and briefly explained below.
Under the evolution specification category, a data modeler must be able to precisely
specify two kinds of changes on a data model: primitive and compound. Given
the constructs of a particular data modeling language, a model evolution language
should enumerate possible changes and derive a basic notation for the evolution of
data models in that language. This can be achieved by considering the ways in
which each kind of model element may be added, removed, or modified. Additionally,
a data model evolution language should define combinators. Using combinators, a
modeler can compose compound evolution operations to describe changes to a number
of different model elements. In addition, the language needs to include a facility
for annotating attributes and associations with expressions representing their new
intended value in relation to values existing in a source data model.
From the perspective of managing the effect of evolution, it is important to note
that a data model rarely exists in isolation. On one hand, a data model must be a
valid instance of a data modeling language, and on other hand, valid data models
are used to collect and persist user data objects. A data model evolution language thus needs to provide mechanisms for maintaining both model consistency and data integrity. Within the domain of information systems evolution, the value of any data model evolution language is greatly diminished if it does not provide ways of supporting data migration. Data model changes can have a direct impact on existing
data instances. As a result, upon data model evolution, data migration needs to be
specified. This can be achieved by describing corresponding data transformations by
which we can adapt data collected and persisted based on a source data model so
that it conforms to a target data model.
3.4.2 Main elements of our approach
The key requirements we have synthesized above may be met by a model-driven approach to data model evolution and data migration. As its primary techniques, our proposed approach uses metamodeling and model transformations to develop data migration implementations. Figure 3.4 shows an overview of our proposed approach. The figure presents a package diagram representing the metamodels that define the main components of our approach. All these metamodels are modeled using the Meta-Object Facility (MOF) [122] as a metamodeling language. Our metamodeling approach consists of a number of metamodels; each defines a component of the approach and is represented in a separate package. The combination of these packages defines our model-driven approach to information system data migration.
Figure 3.4: Overview of the metamodeling approach. Under the Meta-Object Facility (MOF): the Evolution Metamodel extends the UML Data Metamodel; a Semantic Mapping links the Evolution Metamodel to the B Metamodel; and Code Generation is applied to the SQL Metamodel.
In our approach, each development step is represented by a model, which is an
instance of the corresponding metamodel. We move from one development step to the
other using model transformation. By expressing our solution at a metamodel level,
in essence, we are proposing a general reusable solution for describing the evolution of
all data models conforming to our selected data metamodeling language (e.g. UML).
Because these metamodels and model transformation artifacts are the essential elements of our approach, we elaborate on their contents and use below. As shown in Figure 3.4, there are four main metamodel packages and two model transformations:
• UML Data Metamodel: a subset of the UML core package that we have defined to model the domain state of an information system. Although, as we outlined in Section 2.2, different modeling notations can be used in Model-Driven Engineering of information systems, we focus on using the Unified Modeling Language (UML) [124] for describing model elements and the Object Constraint Language (OCL) [121] for describing constraints. This metamodel is discussed in greater detail in Section 4.2.
• Evolution Metamodel: an extension of the UML Data Metamodel, defining the abstract syntax of our proposed evolution modeling language. This metamodel is presented in more detail in Section 4.3.
• B-method Metamodel: building on the introduction to the main concepts of the B-method language in Section 2.3, this MOF-based metamodel encapsulates the abstract syntax of B-method language constructs. It is used to give precise semantics to both our data modeling and evolution modeling languages.
We explain the way we use the B-method to assign formal semantics to our data and evolution metamodels in Chapter 5. In Chapter 6, we explain how we transform an abstract representation of B-method abstract machines into refinement and implementation constructs. In Appendix A, we elaborate on the use of the B-method metamodel within a model-driven engineering context.
• SQL Metamodel: to generate an appropriate implementation of our model evolution and data migrations, we require a mapping from our abstract model operations to operations upon a specific, concrete platform. Given the prevalence of relational database implementations, we assume that the data we want to evolve is persisted in a relational database, and we use a metamodel for the Structured Query Language (SQL) to generate executable data migration code.
In Chapter 6, we explain how the main elements of the SQL metamodel may be given formal semantics in the B-method framework. This step is necessary in order to establish a formal refinement relation between the proposed abstract evolution specifications and the corresponding SQL code representation. In Appendix A, we elaborate on the use of the SQL metamodel within a model-driven engineering context, by describing how the required SQL migration code is generated.
• Semantic Mapping: we use model transformation to link the abstract syntax of the Evolution Metamodel to the abstract syntax of the B Metamodel. This mapping gives the evolution language its meaning in terms of B-method constructs. As we will explain in greater detail in Chapter 5, we distinguish two types of semantics: static and dynamic. Static semantics refers to the precise description of data model constructs and the consistency conditions of constructs in the data metamodeling language, and is specified as invariant conditions that must hold for any data model created using the evolution language. Dynamic semantics refers to the interpretation of a given set of model evolution operations in the context of data models.
• Code Generation: a model-to-text transformation that transforms an SQL model, a valid instance of the SQL Metamodel, into a corresponding collection of SQL statements. The generated statements include both Data Definition Language (DDL) statements, to update table and column structure, and Data Manipulation Language (DML) statements, to migrate data records.
Although Figure 3.4 shows only two mappings defined by model transformation
algorithms, other mappings exist in the approach and are not shown in the diagram
for clarity of presentation. For example, as we will show in the implementation
section in Appendix A, we map the abstract syntax of the Evolution Metamodel
into a context-free EBNF grammar to generate an editor that can be used to edit
data models, conformant to our UML Data Metamodel. Another example is the
refinement transformation we perform in Chapter 6. In this transformation activity,
we map an instance of B Metamodel constructs corresponding to (an object-oriented)
evolution model to another instance of B Metamodel corresponding to (a relational)
SQL model.
Chapter 4
Modeling Data Model Evolution
In this chapter, we define two modeling components of our model-driven approach: the data model and the evolution model. In our context, data models include information about the structural aspects of an information system design. Evolution models, on the other hand, include an abstract description of data model changes. As UML is our selected modeling language, these two modeling components naturally need to be defined in UML.
Our proposed UML data model will use those UML concepts which are essential for modeling structural concepts and their relationships. Since semantic integrity is an important concern in data modeling, these structural concepts will need to be augmented with a constraint language, precisely defining how data elements may be related or restricting the domains of certain data values.
In the context of model-driven engineering, an evolutionary step in the design of
an information system corresponds to a change in the system model: removing or
adding features, changing properties or associations. To capture such evolutionary
changes, we must extend the language in which the data model itself is written; in
this case UML.
Since our ultimate goal is to migrate data collected against an old data model
into a form suitable for storage against a new data model, it may not be sufficient to
record information about the intent of the evolution at the data model level. If we
are able to define the ways in which the values of attributes in the new model are
related to the values of attributes in the old, we can generate an appropriate data
transformation, applicable at the instance level, which can be transformed into an appropriate data migration implementation.
This chapter is structured as follows. Section 4.1 briefly describes our modeling
approach in general. Section 4.2 discusses our data modeling approach. Since we use
UML as the modeling language, we first identify UML concepts relevant for structural modeling. We then explain the method we followed to characterize a UML
metamodel subset appropriate for data modeling, and we discuss the conditions required
for data models to be consistent. Section 4.3 proceeds with a definition of an abstract
syntax of evolution models, in the form of an Evolution Metamodel. In this section
we first discuss how a UML-based metamodel (similar to our Data Metamodel) may
be extended. We then present an algorithm for generating an evolution model, suitable for capturing changes to data models, written in our proposed data modeling
language. Section 4.4 presents ‘induced data migration’ as an important realization
of our modeling approach. The chapter closes with a discussion that establishes the
need for extending our modeling approach with appropriate formal semantics.
4.1 Modeling Approach
Our approach to model data model evolution [1, 3, 2] can be summarized in the
following two steps (see Figure 4.1):
Step 1: Data model structural concepts are described using a UML Data Metamodel. This metamodel includes concepts typically found in a UML class diagram,
such as classes, properties and associations. We have followed a systematic method
to identify these elements within the UML packages relevant for describing static structures. This metamodel (at level M2 of the MOF hierarchy) can be instantiated into data
models (at level M1) describing user data applications from various domains (e.g.
employee information, student registration, part ordering, etc.). Once data models are
in place, they can be used to collect and persist data objects (at level M0). For a
data model to be consistent, elements at the various abstraction levels need to satisfy
a set of consistency constraints. As we will elaborate in Section 4.2.3, two kinds of
consistency constraints need to be preserved: constraints defined by the modeling
language (i.e. the UML Data Metamodel) need to be respected by data models, while
constraints defined by the data model need to be respected by data objects at the
instance level.
Step 2: Evolutionary steps of UML data models are described using an Evolution Metamodel. This evolution metamodel (at level M2 of the MOF hierarchy) is an
extension of the Data Metamodel defined in Step 1, and can be instantiated into an evolution model (at level M1), abstractly describing data model changes. Using model
transformation, the abstract evolution model can be transformed into a program in
an implementation language (e.g. SQL) to migrate the data (at level M0) from one
data store to the other.
Our modeling approach has several benefits. The most important among them
are the following:
1. Decoupling from the underlying implementation platform. Both modeling components which we describe in this chapter are abstractly defined in UML. As such,
they are not tied to any underlying implementation technology. Using appropriate model transformation techniques (see Section 2.2.4), information system
designers can translate these modeling artifacts into executable programs in
a chosen platform-specific implementation. This separation between modeling
and implementation concerns enables induced data migration, as we will
explain in Section 4.4.
2. Reusability. Defining the approach at the metamodel level of design implies that
the resulting metamodel artifacts are reusable: they can be used generically as
templates to describe various data models and different data model evolution
scenarios.
3. Relevance. The use of a standard modeling framework such as UML for data
and evolution modeling makes our approach relevant for potential usage by a
wide audience in the software engineering community, and for integration with
a large number of existing supporting tools.
4. Extendability. While the set of primitive model edits that make up an evolution
model can be enumerated (each data model element may be added, modified or
removed), the space of compound evolution scenarios can be fairly large. Using our
evolution metamodel, new evolution patterns can be defined by composing primitive
model edits or existing evolution patterns to describe new and emerging evolution scenarios.
4.2 Data Modeling
Although UML is mainly known for designing object-oriented program code, it is
becoming a popular tool in the data modeling community [138], [115], [116]. Yet
there is no generally accepted standard for data modeling based on UML. In the
absence of a standard UML data modeling extension, we characterize an adequate
metamodel for modeling information systems data using a subset of UML constructs,
augmented with language-level constraints relevant for data modeling. In this section,
we elaborate on the steps we followed to define a UML Data Metamodel.
4.2.1 UML concepts for static structure modeling
The first step of our UML data modeling is to identify the concepts relevant for modeling
the structural aspects of an information system design. We have identified objects, classes,
properties, generalization, and association as the fundamental concepts needed for
developing data models using UML. As these concepts are the essential building
blocks of our Data Metamodel and have a direct influence on the way we define
our data model evolution, it is important to describe what they mean within the
context of UML structural modeling.
Object is the most fundamental UML concept for describing the structural aspects
of a system. Objects not only represent real-world entities (such as employees,
departments, etc.), but also describe abstract concepts like employment or assignment. The structural properties of an object are described as a set of attributes, which
can be instantiated into slots holding data values. Objects are described by classes.
A class describes the common properties of a set of objects of a domain. When a
class is abstract, it cannot have any objects. Objects do not stand alone; they are
always tied to their classes. The terms instance and object are largely synonymous
and, from now on, will be used interchangeably.
Common properties of a set of classes can be placed in a parent class, which passes
these properties on to its child classes. This forms a generalization relationship between
the parent class (superclass) and the child classes (subclasses). Closely related to
generalization is the notion of inheritance, mainly known from object-oriented programming languages. The child inherits structure, behavior, and constraints from
all its parents. In the simple case, a class has a single parent. In a more complicated
situation, a child may have more than one parent. This is called multiple inheritance
(see [152] for a comprehensive overview). In our defined Data Metamodel, we only
allow single inheritance.
Associations are the primary UML constructs for expressing relationships in a
model. An association has a name and two or more association ends defining the roles
played by the classes participating in the relationship. An association end can be paired
with an opposite association end to form a bidirectional association. In such a case,
both ends need to be owned by the same association. In addition, each association
end has a minimum and maximum multiplicity bound. The minimum multiplicity
bound indicates that every object of the owning class must be related to at least the
minimum number of objects of the associated class. The maximum multiplicity bound requires
that an object of the owning class cannot be related to more than the maximum number of
objects of the associated class. At the instance level, an association is instantiated
into a link which holds references to objects of the classes participating in the association.
4.2.2 Characterization of a UML Data Metamodel
Now that we have identified the basic UML concepts needed to describe
data models, we need to present them in a data modeling language, in the form of
a metamodel. Defining a UML-based metamodel supporting the structural modeling
concepts we have identified requires mapping these basic concepts onto appropriate modeling elements in the UML metamodel. This mapping provides us with
a set of elements together with their associated UML definitions and characteristics.
The UML metamodel is organized into a vast number of packages from which various
diagrams may be constructed to model different perspectives and views of an application. Since the focus of our work is on data modeling, as discussed above, we only
consider aspects related to modeling static structures. The UML elements corresponding
to our identified structure modeling concepts are defined in the Core package of the
infrastructure library ([124], pp. 27), which defines a meta-language core that can be
reused to define a variety of metamodels. The Core package is subdivided into a
number of sub-packages: Basic, PrimitiveTypes, Abstractions, Constructs and Generalization. As a convention, we use ‘::’ to denote navigation from one package to its
sub-package. Core::Basic is of particular interest to our work, as it provides a minimal
class-based modeling language on top of which more complex languages can be built.
[Table 4.1 lists the metaclasses of the UML Core::Basic package, namely Element, NamedElement, Type, TypedElement, Class, MultiplicityElement, Operation, Parameter, Property, DataType, Enumeration, EnumerationLiteral, PrimitiveType, Comment and Package, and marks for each whether it is Essential, whether it is Persistable, and whether it is included in our Data Metamodel.]
Table 4.1: Elements of UML Core::Basic package
Table 4.1 lists all elements (metaclasses) of the Basic package. For each element,
the table shows whether the element is included in our UML subset (Data Metamodel
column). The selection is based on two criteria. First, whether the element is required
for modeling structural aspects (Essential column), i.e. does it map directly to one
of our identified structural modeling concepts? This is a compulsory criterion that
determines whether an element is included in or excluded from our subset. Second,
whether the element can persist data items (Persistable column). This is an important
but not a compulsory criterion. It is important because we depend on it to identify
elements that can be instantiated and that, subsequently, present a corresponding
instance layer. It is not compulsory since, although abstract metaclasses cannot have
direct instances, they can play an important role in structuring and organizing the
data model (e.g. the MultiplicityElement metaclass).
[Figure 4.2 shows the selected metamodel subset as a UML class diagram. The model layer comprises NamedElement (name : String), Type, TypedElement, MultiplicityElement, Class (isAbstract : Boolean, with a 0..1 superclass), Property (isComposite : Boolean, with ownedAttributes owned by a class and a 0..1 opposite), Association (with 2..* association ends), DataType, PrimitiveType and Enumeration. The instance layer comprises Instance, Object, Slot, DataValue, Link and LinkEnd, connected to their model-level counterparts.]
Figure 4.2: A subset of UML metamodel
The table shows the result of applying both criteria to the elements of the Core::Basic
package, and which elements were accordingly included in or excluded from our metamodel subset.
From a UML data modeling perspective, it is important to note here that the
elements we extracted from the Core::Basic package are closely related
to concepts modeled in other Core sub-packages. For example, the Property element
can be typed by one of the types pre-defined in the Core::PrimitiveType package,
and the Class element (as a Classifier) can be organized in an inheritance hierarchy based
on concepts defined in the Core::Abstraction::Generalization package.
Table 4.1 describes the elements we selected from UML to include in our
data modeling language (i.e. the Data Metamodel). However, the table does not explain
how these elements may be used or how they are related to each other. Figure 4.2
shows the resulting subset as a UML class diagram illustrating how these elements
are modeled and the relationships between them. A brief summary of the elements
included in this figure follows. Model elements with names are represented by the
abstract NamedElement class. A Type is a named element that is used as the type
for a TypedElement and constrains the set of values that a typed element may refer
to. A Class is a type that has objects as its instances. A Class may participate in an
inheritance hierarchy via the superclass property. A MultiplicityElement is an abstract
metaclass that defines an inclusive interval of non-negative integers, beginning with a
lower bound and ending with an upper bound. A Property is owned by a class and is
a typed element: it may be typed by a data type and shown as an ownedAttribute
of the class, or typed by another class and shown as an associationEnd of an Association.
A property may be paired with an opposite property to represent a bidirectional
association. A DataType is a class that acts as a common superclass for different
kinds of data types. A DataType is specialized into PrimitiveType, which includes Integer,
Boolean, String and UnlimitedNatural. An Enumeration is a set of literals that can
be used as values.
In UML, the data held in a system can be characterized as a model instance,
represented by the elements in the lower part of Figure 4.2. More specifically, using concepts from the UML Instance package ([124], pp. 53), the lower part of Figure 4.2 shows
how system data can be presented as instances of a data model. An Instance is a
model element that represents modeled system instances. The kind of instance depends on its corresponding data model classifier element. An Object is an instance of
a Class; a Slot is an instance of a primitive-typed Property; and a Link is an instance of
an Association. A Slot is owned by an instance and specifies the values of its defining
Property. A Link contains two LinkEnds, each of which contains a reference to an
object participating in the association. Objects, slots, values and links must obey any
constraints on the classes, attributes or associations of which they are instances.
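To make this two-layer structure concrete, the following small sketch (ours, not part of the UML standard or of the formal development in later chapters; all names are invented for illustration) renders the model layer and instance layer of Figure 4.2 as plain Python data structures. Later sketches in this chapter reuse these definitions.

    from dataclasses import dataclass, field
    from typing import Optional

    # --- Model layer (M1): simplified counterparts of Figure 4.2 ---
    # eq=False gives identity-based equality: model elements are unique.

    @dataclass(eq=False)
    class Property_:
        name: str
        type_name: str            # e.g. "String", "Integer"
        lower: int = 0            # multiplicity bounds
        upper: int = 1

    @dataclass(eq=False)
    class Class_:
        name: str
        is_abstract: bool = False
        superclass: Optional["Class_"] = None
        attributes: list = field(default_factory=list)   # owned Property_ elements

    # --- Instance layer (M0): objects, slots and links ---

    @dataclass(eq=False)
    class Object_:
        of_class: Class_
        slots: dict = field(default_factory=dict)         # property name -> value

    @dataclass(eq=False)
    class Link_:
        association: str
        ends: tuple = ()                                  # participating Object_ pair

    # A tiny model and a conforming instance
    person = Class_("Person", attributes=[Property_("name", "String", 1, 1)])
    p1 = Object_(person, slots={"name": "emp1"})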
4.2.3 Consistency of UML data model
For a model to be consistent, it has to conform to its metamodel definition. A model
conforms to its metamodel definition when the metamodel specifies every concept used
in the model definition and the model uses the metamodel concepts according to the
Well-Formedness Rules (WFRs) specified by the metamodel [128]. Consistency can be
described by a set of constraints between a model and its corresponding metamodel.
When all consistency constraints are satisfied, a data model is said to conform to its
metamodel.
The conceptual data metamodel presented in Section 4.2, based on the UML metamodel
and its structural properties, thus provides a unifying means to differentiate consistent
from inconsistent data models. More specifically,
Syntactic consistency
-- Name Uniqueness
context Class inv :
    self.allInstances()->forAll(c1, c2 |
        c1.name = c2.name implies c1 = c2)
-- Absence of circular inheritance
context Class inv :
    self.allInstances()->forAll(c |
        not (c.AllParents()->includes(c)))
-- Existential dependency of composition
context Property inv :
    self.allInstances()->forAll(p |
        Class.allInstances()->exists(c |
            c.ownedAttributes->includes(p)))
-- Association bidirectionality
context Association inv :
    self.associationEnds->forAll(ae1, ae2 | ae1 <> ae2
        and ae1.association = ae2.association
        implies ae1.type = ae2.class and
                ae2.type = ae1.class)
Figure 4.3: OCL characterization of data model syntactic consistency
as our presented metamodel characterizes two layers of abstraction: data model and
instance model, we can distinguish between two sets of consistency constraints. First,
consistency constraints that are defined by the modeling language and need to be
satisfied by the data model. Such constraints can be used to check data model
syntactic conformance. Second, consistency constraints that are defined by the data
model and need to be satisfied by the instance model. Such constraints can be used
to check data model semantic conformance.
Following the discussion of the structural properties of our selected UML subset
above, we can synthesize a number of data model consistency constraints in terms
of syntax and semantics. The set of constraints listed below is by no means exhaustive,
but it lays a foundation for demonstrating how our
proposed approach may be used to maintain the consistency of evolving data models.
Definition 1 (Syntactic Consistency). A data model is syntactically conformant
when it fulfills the WFRs defined by the modeling language (UML). A metamodel
for UML contains a considerable number of structural properties. However, to explain
how we may maintain syntactic conformance, we consider the consistency
constraints stated in Figure 4.3 (using OCL) and explained below; a programmatic
sketch of the first two checks follows the list:
Semantic consistency
-- Instance conformance
context Object inv :
    self.class.ownedAttributes->forAll(a |
        self.ownedSlots->exists(s | s.property = a))
-- Link conformance
context Object inv :
    self.class.oppositeAssociationEnds()->forAll(ae |
        self.selectedLinkEnds(ae)->size() >= ae.lower and
        self.selectedLinkEnds(ae)->size() <= ae.upper)
-- Value conformance
context Slot inv :
    self.value.oclIsTypeOf(DataValue).dataType.checkDataType()
Figure 4.4: OCL characterization of data model semantic consistency
• Name uniqueness: this constraint involves data model classes, associations as
well as properties (attributes and associationEnds). The scope of name distinction of a property is the union of native and inherited properties of its owning
class.
• Absence of circular inheritance: a class cannot generalize itself directly or indirectly, i.e., no cycles should exist within class inheritance hierarchy.
• Existential dependency of composition: a composite determines the life span of
its components. When a composite instance is deleted, all of its components are
recursively deleted.
• Association bidirectionality: this involves two association ends paired together
to form a bidirectional association. In such a case, both ends need to be owned
by the same association, and each association end becomes the opposite of the
other end.
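As a complement to the OCL statement of these rules, the first two constraints have a direct programmatic reading; the following sketch (ours) checks them over the illustrative dataclass representation introduced in Section 4.2.2.

    def all_parents(cls):
        """Collect the transitive parents of a class by walking superclass links."""
        seen = []
        current = cls.superclass
        while current is not None and current not in seen:
            seen.append(current)
            current = current.superclass
        return seen

    def names_unique(classes):
        """Name uniqueness: no two classes may share a name."""
        names = [c.name for c in classes]
        return len(names) == len(set(names))

    def no_circular_inheritance(classes):
        """Absence of circular inheritance: a class never appears
        among its own transitive parents."""
        return all(c not in all_parents(c) for c in classes)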
Definition 2 (Semantic Consistency). An instance model consisting of objects, slots and links must satisfy all consistency conditions defined by its corresponding data model. This denotes the semantic consistency of an instance model with respect to its data
model. Semantic consistency can be characterized by a number of consistency
constraints that link instance-level elements to corresponding model-level elements,
thereby establishing a set of conditions that must be satisfied by any valid
instance model. If an instance satisfies all these constraints, we say that the instance
semantically conforms to its data model. Based on the UML structural properties
described above, we can synthesize the following semantic consistency conditions,
sketched programmatically after the list:
• Instance conformance: an instance conforms to its type if each of its slots
conforms to a property of the instance’s class, defined by the class or its superclasses. The instance should also fulfill all further WFRs defined in the
context of the instance’s class or its superclasses.
• Slot conformance: a slot conforms to its property if the value of the slot is
consistent with the data type of the property.
• Link conformance: a link conforms to its association if the objects it references
are consistent with the association type and multiplicity, and the objects it
references satisfy the bidirectional association defined at the data model level.
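Continuing the sketch above (ours; it reuses the dataclasses and the all_parents helper), the instance- and link-conformance conditions can be read as follows.

    def instance_conforms(obj):
        """Instance conformance: every attribute of the object's class,
        including inherited ones, must have a slot on the object."""
        chain = [obj.of_class] + all_parents(obj.of_class)
        required = [p.name for c in chain for p in c.attributes]
        return all(name in obj.slots for name in required)

    def links_conform(obj, links, assoc_name, lower, upper):
        """Link conformance: the number of links of the given association in
        which obj participates must respect the multiplicity bounds."""
        count = sum(1 for lk in links
                    if lk.association == assoc_name and obj in lk.ends)
        return lower <= count <= upper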
Finally, we should note that there is no reason to separate an instance model from
its data model by disregarding semantic consistency constraints. Such an instance
model, in most cases, would not exhibit integrity or contain enough information to
be understood.
Example.
Keeping with the running example introduced at the beginning of Chapter 2, Figure 4.5 represents the initial version of our simplified EIS data
model. The workforce of the company represented in this data model consists of employees, represented by the class Employee, and contractors, represented by the class Freelance.
Both kinds of workers are considered persons, represented by the Person class. A person is assigned to a department, represented by Department. In addition, the
personal data maintained by the company about each employee is represented by the PersonalFile class.
Each of the above classes contains a number of attributes. In addition, the model
has a number of associations representing the relationships between persons and the department they work for, an optional manager relationship, and the personal
information held about an employee.
The current version of the model is restricted by a number of constraints, written
in OCL, shown below the model. This initial version of the model was instantiated
into a valid model instance, shown in Figure 4.6.
[Figure 4.5 shows the initial data model: Person (name : String, age : Integer) with subclasses Employee (hireDate : Date, salary : NAT) and Freelance (rate : NAT); Department (startDate : Date, location : String, minSalary : Integer), related to Person by the worksFor association (employees 0..*, dept 1) and to Employee by the optional manages association (head 0..1); and PersonalFile (maritalStatus : String, location : String), related to Employee by the info association (file 1).]
/* An employee’s salary must be greater than the department minSalary */
context Department inv C1 : self.employees->forAll(e | e.salary >= self.minSalary)
/* employees and department are bi-directional associations */
context Employee inv C2 : self.department.employees->includes(self)
/* An employee’s hire date must be after the department start date */
context Employee inv C3 : self.department.startDate < self.hireDate
Figure 4.5: Data model of a simplified Employee Information System
4.3 Modeling evolution
In the previous section, we characterized a subset of the UML metamodel as a data
modeling language that can be used to describe data models. In this section, we
describe the Evolution Metamodel: the second part of our modeling approach, presented
in Figure 4.1. In particular, we present the modeling language in which we
can write evolution models describing how data models conforming to our Data
Metamodel can be evolved.
To be able to evolve data models written in our UML subset, we extend our Data
Metamodel with a set of model-level edits in the form of model operations. Such
operations are drawn from the model elements of the UML Data Metamodel itself:
each element in the Data Metamodel may be added, modified or removed. Manual
specification of such operations can be an error-prone and time-consuming activity; hence
we show how this set of model operations can be generated automatically. Before
we do that, however, we first give an account of how a UML-based metamodel such
as our Data Metamodel may be extended.
[Figure 4.6 shows a valid instance model: Employee objects e100 (name = emp1, salary = 2000), e200 (name = emp2, salary = 2100) and e300 (name = emp3, salary = 4000); Freelance objects f100 (name = freelance1, rate = 20), f200 (name = freelance2, rate = 21) and f300 (name = freelance3, rate = 40); Department objects d1000 (location = loc1) and d2000 (location = loc1); and PersonalFile objects p10 (status = ‘single’), p20 (status = ‘married’) and p30 (status = ‘single’); with worksFor, info and head links connecting them.]
Figure 4.6: Instance model of the simplified Employee Information System
4.3.1 Definition of evolution metamodel
UML is frequently used as a general-purpose modeling language, but it can also be
extended to meet the requirements of specific domains. In the context of our work,
we extend the UML metamodel, part of which is shown in Figure 4.2, to cater for the
domain of information systems evolution.
An extension to the UML metamodel may be achieved in two different ways [87]:
as a lightweight or a heavyweight extension. The first of these involves the definition of
a UML ‘profile’, using the extension mechanisms included within the standard UML profiles package ([124], pp. 179). This approach has been used successfully for notations
like CORBA [119], but is limited in its application: any new concept or construct
needs to be encoded in terms of the existing concepts within UML, creating problems with interpretation, expressiveness, and flexibility. Instead, here we present a
heavyweight extension, similar in nature to those defined for the Common
Warehouse Metamodel [154] and the Software Process Engineering Metamodel [155].
Definition 3 (Metamodel Extension).
In line with the OMG metamodel hierarchy concepts introduced in Section 2.2.1,
to extend a metamodel we need to consider its meta-metamodel (at level M3 of
the MOF hierarchy). Considering the metamodeling language (in this case, MOF) allows
us to distinguish the different concepts, properties and relationships that the modeling language includes. Accordingly, we can declare extend as a mapping function
that takes metamodeling language concepts (at level M3) together with their metamodel instances (at level M2) as input, and returns an extended (evolution) metamodel
consisting of the original concepts of the input metamodel together with their corresponding
evolution operations. More specifically, this function has the following type:

extend : (MMM × MM) ⇸ MM

where MMM denotes the metamodeling language (M3) and MM a modeling language (M2). extend is a partial function (denoted by ⇸), since it only operates on a
subset of the specified metamodel concepts (e.g. abstract classes are not associated
with evolution operations).
[Figure 4.7 illustrates the extension across the MOF hierarchy: at level M3, MOF; at level M2, the UML Data Metamodel extended by the Evolution Metamodel; at level M1, a Data Model evolved by an Evolution Model, each level conforming to the one above.]
Figure 4.7: Illustration of UML metamodel extension
Using MOF, we can extend the UML metamodel by introducing evolution operations.
As such, each UML metamodel concept will be associated with primitive evolution
operations to add, delete or modify instances of that concept at the model level. The
primitive evolution operations are combined into an evolution metamodel that can be
instantiated into an evolution model, as illustrated in Figure 4.7.
Algorithm 1 shows part of the algorithm used for generating the evolution metamodel. This algorithm takes as input the UML Data Metamodel, its corresponding
meta-metamodel (MOF) and the OCL Metamodel, and returns as output an evolution
metamodel with the operations required to evolve elements of the Data Metamodel.
The algorithm generates the basic model edits as instances of the MOF Operation class.
It also generates operation parameters (as instances of the MOF Parameter class) and
appropriately assigns them to the respective model edit operation.
Alg. 1 An algorithm for generating evolution metamodel
Input: DataMM, MOF, OclMetamodel
Output: evolutionMM
1   // create empty evolutionMM
2   evolutionMM = new MM( )
3   // create Evolution Class
4   MOFClass Evolution = new MOFClass
5   MOFClass ModelEdit = new MOFClass
6   ...
7   for each element in DataMM do;
8       element = findMOFClass(DataMM, MOF);
9       // generate model edits for metamodel class
10      if (element is instanceOf MOFClass);
11          if (element.isAbstract = false);
12              // generate addClass edit
13              MOFOperation addClass = new MOFOperation;
14              MOFClassProperties = findMOFClassProperties(MOFClass);
15              for each property in MOFClassProperties do
16                  MOFParameter = new MOFParameter();
17                  MOFParameter = MOFClassProperty;
18                  add MOFParameter to addClass Operation
19                  add addClass to ModelEdit Class
20              endFor
21              // generate modifyClass edit
22              MOFOperation modifyClass = new MOFOperation;
23              ...
24              // generate deleteClass edit
25              MOFOperation deleteClass = new MOFOperation;
26              ...
27          endIf
28      endIf
29      // generate evolution primitives for metamodel properties
30      if (element is instanceOf MOFProperty);
31          ...
32      endIf
33  endFor
The basic idea is that, for every evolving (non-abstract) element of the Data Metamodel, we identify its MOF class (line 8). Every property of the identified MOF class, as well as
the properties of its superclasses, appears as a parameter of the model edit, respecting the
parameter data type. For example, the addClass edit (lines 9-20) has parameters
corresponding to the properties of its MOF class (also called Class) and
its superclasses (e.g. NamedElement), and takes the form: addClass(name:String,
isAbstract:Boolean, superclass:Class). Applying the algorithm, we obtain a MOF-compliant evolution metamodel: an extension of the UML Data Metamodel, defined at level M2 of the four-level MOF architecture and applied to data models at level M1.
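A minimal executable rendering of this generation step (ours; metaclass descriptions are reduced to dictionaries, and only the class case of Algorithm 1 is covered) might look as follows.

    def generate_edits(metamodel):
        """For every non-abstract metaclass, emit add/modify/delete edit
        signatures whose parameters are the properties of the metaclass
        and of its superclasses, as in Algorithm 1."""
        edits = {}
        for mc in metamodel:
            if mc["isAbstract"]:
                continue
            params = list(mc["properties"])
            sup = mc["superclass"]
            while sup is not None:                  # inherit parameters upwards
                params = list(sup["properties"]) + params
                sup = sup["superclass"]
            edits["add" + mc["name"]] = params
            edits["modify" + mc["name"]] = params
            edits["delete" + mc["name"]] = params[:1]   # only the identifying name
        return edits

    named_element = {"name": "NamedElement", "isAbstract": True,
                     "properties": ["name:String"], "superclass": None}
    clazz = {"name": "Class", "isAbstract": False,
             "properties": ["isAbstract:Boolean", "superclass:Class"],
             "superclass": named_element}

    print(generate_edits([named_element, clazz]))
    # {'addClass': ['name:String', 'isAbstract:Boolean', 'superclass:Class'], ...}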
[Figure 4.8 shows the Evolution Metamodel: an Evolution (a subclass of the MOF Package and Class metaclasses) owns a set of ModelEdits and refers to OCLExpressions (from the OCL Metamodel) for its invariants; ModelEdit is specialized into PrimitiveEdit, whose evolType is drawn from the EvolType enumeration (addClass, deleteClass, addAttribute, deleteAttribute, addAssociation, ...), and CompoundEdit, itself specialized into SequentialEdit and IteratorEdit; each Evolution references its source DataModel.]
Figure 4.8: Evolution Metamodel
Figure 4.8 shows the main concepts of the generated evolution metamodel. Classes
drawn from MOF and OCL are annotated accordingly. The remaining classes represent
the main concepts of our proposed evolution metamodel. An Evolution defines how a
source data model can be evolved into a target data model. Evolution is a subclass of
both Package and Class: as a package it provides a namespace for its ModelEdits, and
as a class it can define properties and operations. An Evolution can refer to an OclExpression to specify invariant properties and WFRs of its instantiating models (i.e.
evolution models). Each Evolution contains a set of ModelEdits. A ModelEdit may be
subclassed to address different types of changes. These changes may be primitive,
primarily affecting one element of a data model and represented by primitive evolution operations, for example addClass() and deleteAssociation(); or compound,
affecting multiple elements of a data model and represented by compound evolution operations, for example extractClass(). Here we have assumed only that a suitable
enumeration can be provided. Each ModelEdit is associated with a set of typed parameters and specifies a number of evolution operations. Attribute and association
addition and modification edits take an additional parameter, an OCL expression, to
describe the initial value of the attribute or association.
With the introduction of the Evolution Metamodel, it is now possible to evolve
and modify UML data models using a set of primitive model edits, which can be used
independently or in combination to describe an intended data model evolution
scenario.
Evolution of Data Model Classes
addClass (className, isAbstract, superclass)
modifyClass (className, isAbstract, superclass)
deleteClass (className)
Evolution of Data Model Attributes
addAttribute (className, attributeName, expression)
modifyAttribute (className, attributeName, expression)
deleteAttribute (className, attributeName)
Evolution of Data Model Associations
addAssociation (assoName, srcClass, tgtClass,
    srcProperty, tgtProperty, multiplicity, expression)
modifyAssociation (assoName, srcClass, tgtClass,
    srcProperty, tgtProperty, multiplicity, expression)
deleteAssociation (assoName)
Figure 4.9: Primitive Model Edits
4.3.2 Primitive Model Edits
The presentation of model edits in the Evolution Metamodel primarily specifies their
abstract syntax. The meaning of each edit is yet to be defined: in the subsequent
chapter, we give a precise semantics to each model edit. Here, however, it is important
to briefly introduce the intended use of each edit and, particularly, given our main focus
on data migration, its impact on persistent system data.
Algorithm 1 generates three primitive model edits, corresponding to the addition, modification and deletion of each non-abstract metaclass in our UML Data
Metamodel, depicted in Figure 4.2. Considering all generated primitive edits would
complicate the presentation unnecessarily. In the context of this dissertation, we are
mainly concerned with data model changes that affect the form or validity of the
corresponding data instances in a system. Therefore, we focus our presentation on
the edit operations that evolve the main structural elements of a model, that
is, classes, properties and associations. We disregard changes to less
important structural elements, such as changes to data types and enumerations. Since
data model constraints may need to change as a result of data model evolution, at the
end of the next section we discuss the impact of constraint evolution on existing data,
although we do not generate operations corresponding to the evolution of constraints.
Figure 4.9 shows the primitive model edits that we shall focus on.
Evolution Primitives for Data Model Classes. At the data model level (M1),
adding a class creates a new class, specifying its name, whether it is an abstract
class, and whether it is a subclass of any other class in the data model. At the instance
level (M0), existing data are not affected by the class addition, and will continue to
conform to the target (evolved) model.
Modifying a class so that it becomes abstract requires the designer to provide a subclass
to which the class objects are to be migrated. Consequently, mandatory attributes that
are not available in the subclass need to be created and initialized to default
values. Modifying a class so that it becomes concrete (i.e. can have instances) does not imply
changes to data instances. Modifying a class so that it is no longer a superclass implies
removing the slots of properties inherited from that class. Additionally, association ends
pointing to the superclass but referring to objects of one of its subclasses need to be
removed.
Deleting data model classes requires deleting the corresponding data instances. However, deletion of instance elements poses the risk of migrating to an inconsistent state.
For example, deletion of objects may leave association links pointing to non-existent
objects. Therefore, class deletion is bound by metamodel-level constraints: classes
may only be deleted when they are outside inheritance hierarchies and are not targeted by associations. Such considerations are discussed further when describing the
formal semantics of the class edits in Section 5.3.1.
Evolution Primitives for Data Model Attributes. Attribute evolution primitives are parameterized by the owning class and the name and type of the new attribute.
Furthermore, the addition and modification primitives may include an OCL expression
describing an initial value that the new or modified attribute is to hold. Modifying
an attribute’s type requires a type conversion that maps the original set of values onto a
new set of values conforming to the new attribute type. This conversion can be done
directly or through an intermediate type. As type conversion is more of an implementation issue, it is not discussed in this thesis. Deleting an attribute removes the
named attribute from its owning class and the corresponding value from the class objects.
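To illustrate the instance-level effect of these primitives, the following sketch (ours) applies them over the illustrative dataclass representation from Section 4.2.2, with a plain Python function standing in for the OCL initialization expression; for brevity it ignores subclass instances.

    def add_attribute(cls, objects, prop, init):
        """addAttribute: extend the class and initialize the new slot on
        every existing object of the class, using `init` in place of an
        OCL default-value expression."""
        cls.attributes.append(prop)
        for obj in objects:
            if obj.of_class is cls:        # subclass instances ignored for brevity
                obj.slots[prop.name] = init(obj)

    def delete_attribute(cls, objects, prop_name):
        """deleteAttribute: remove the named attribute from its owning class
        and the corresponding slot value from all class objects."""
        cls.attributes = [p for p in cls.attributes if p.name != prop_name]
        for obj in objects:
            if obj.of_class is cls:
                obj.slots.pop(prop_name, None)

    # e.g. the seniority evolution used later in Section 4.3.3:
    # add_attribute(employee, objects, Property_("seniority", "String", 1, 1),
    #               lambda o: "senior" if o.slots.get("age", 0) > 50 else "junior")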
Evolution Primitives for Data Model Associations. Adding an association
requires an association name and two association ends, each owned by one of the classes participating in the association relationship. The addition of an association may require
initialization of the corresponding links using default values or default value expressions.
This can be achieved using an OCL expression, given as a parameter of the primitive.
An association may be modified so that it points to a class which is a subclass
or a superclass of the original class. The former case requires deletion of links not
conforming to the new association type; the latter case requires no data migration.
Modifying an association so that it becomes an opposite requires a migration to make the link
set satisfy this added constraint. Modifying an association so that it no longer has
an opposite does not require instance-level migration.
Deleting a data model association requires the migration to delete the corresponding
instances, that is, its links. However, deletion of links poses the risk of migrating to
an inconsistent model: for example, deletion of a link may break object containment.
Therefore, deleting an association is bound by metamodel-level constraints: associations may only be deleted when they are neither composite nor have an opposite.
Evolution of Data Model Constraints. Although many evolutions will involve
restructuring and extension of models, many more will involve the introduction of
additional constraints, or the modification of constraints already specified within
the model, particularly where the evolution is taking place within a model-driven
architecture and the constraints may be used directly to determine the detailed
behavior of a system. The simplest, and most common, constraint evolution involves a
change to the multiplicity of some property: for example, we might decide that a one-to-many association needs instead to be many-to-many, or vice versa. In UML, this
corresponds to a change to the upper value associated with one of the properties.
More complex constraints may be specified as class or model invariants, describing
arbitrary constraints upon the relationships between values of properties across the
model.
If the conjunction of constraints after the evolutionary step is logically weaker
than before, and the model is otherwise unchanged, then the data migration can be
considered feasible. Whatever data the system currently holds, if it is consistent with
the original model, then it should also be consistent with the new model. However,
where the conjunction of constraints afterwards is stronger than before, or the evolutionary step involves other changes to structures and values, then data may not fit:
that is, the data migration corresponding to the proposed evolution might produce a
collection of values that does not properly conform to the new model.
4.3.3 Expressing evolution patterns
While a compound data model evolution may be described by sequencing a set of basic
model edits, pure sequencing is not always powerful enough to express many desired
evolutions. For example, in the generalize class scenario [100], we may wish to create
a new superclass AB based on two input classes A and B. The superclass AB is constructed
by collecting the common attributes that exist in both classes A and B. There may be
no pure sequencing of evolution primitives that characterizes the logic of this scenario;
thus, we recognize the need for an iterator combinator. Hence, in the context of our work,
edits can be combined using one of two combinators:
• Sequential composition: describes the effect of two updates being performed
one after the other, as part of the same compound evolution.
• Iterator composition: iterates over a sequence of objects to apply an edit.
With the help of these two combinators, we may compose primitive edits to describe evolution patterns. For example, we can use them to define operations such as inlineClass [26] in terms of their component edits on model elements:
inlineClass(Source, Target, property) =
    ( forall p : Source.properties .
        addAttribute(Target, p, Source.p.type,
            Source.p.upper, Source.p.lower, Source.p) ) ;
    deleteClass(Source)
Here, we have assumed a language in which we may select the values of meta-properties (properties in the metamodel) with the usual ‘.’ notation, as in an OCL expression, and have assumed the existence of an appropriate iterator forall. We have then used a
sequential composition operator, denoted by ‘;’, to combine the two primitive updates
addAttribute and deleteClass into the inlineClass compound operation.
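If each edit is represented as a function from model to model, the two combinators admit a very small encoding; the sketch below (ours; models are treated as opaque values threaded through the edits) illustrates the idea.

    def seq(*edits):
        """Sequential composition: apply the given edits one after the
        other, as part of the same compound evolution."""
        def combined(model):
            for edit in edits:
                model = edit(model)
            return model
        return combined

    def foreach(select, make_edit):
        """Iterator composition: apply an edit for every element selected
        from the model (e.g. every property of a source class)."""
        def combined(model):
            for item in select(model):
                model = make_edit(item)(model)
            return model
        return combined

Under this reading, inlineClass above is a foreach over the source properties, sequenced with a deleteClass edit.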
Other commonly-used patterns include merging classes, introducing an association
class and the movement of attributes up and down inheritance hierarchies. A wide
range of patterns have been identified in the literature: see for example, [14], [60],
and [39].
Example.
Within the context of our example outlined at the end of Section 4.2.3
and depicted in Figure 4.5, assume that in a subsequent version, due to a change in
requirements, it was decided that having a separate PersonalFile class is unnecessary, and that the
multiplicity of the manager association between the Department and Employee classes now
needs to be considered mandatory rather than optional. In addition, a new attribute
named seniority is added to the Employee class and initialized to ‘senior’ if a person’s
age is greater than 50, and to ‘junior’ otherwise. Suppose also that, to
enhance employee assignment tracking, a new Project class is extracted
from the Department class.
In order to capture these changes at the design level, we may write an evolution
model representing the proposed changes as evolutionary steps described in terms of
our proposed evolution metamodel.
inlineClass(Employee, employee, PersonalFile);
addAttribute(Employee, seniority : String,
    [if self.age > 50 then ’senior’ else ’junior’]) ;
modifyAssociation(Department, manager, Employee, 1, 1) ;
extractClass(Department, Project, projects)
Note that each of inlineClass and extractClass is a compound edit which can be
written in terms of the primitive edits. For example, the extractClass consists of
the following operations, in part:
(addClass(Project);
 addAssociation(Department, projects : Project, 0, *);
 moveAttribute(
     addAttribute(Project, location,
         [Project.location = Department.location]);
     deleteAttribute(Department, location));
 moveOperation(
     addOperation(Project, setProjectAssigmt);
     deleteOperation(Department, setProjectAssigmt)))
Figure 4.10 shows the evolved version of the data model after the above evolution
operations have been performed.
[Figure 4.10 shows the evolved model: Person (name : String, age : Integer) with subclasses Employee (hireDate : Date, salary : NAT, seniority : String, maritalStatus : String, location : String) and Freelance (rate : NAT); Department (startDate : Date, minSalary : Integer), with the manages association to Employee now mandatory (head 1); and a new Project class (location : String, closingDate : Date), linked to Department by the projects association (0..*).]
/* An employee’s salary must be greater than the department minSalary */
context Department inv C1 : self.employees->forAll(e | e.salary >= self.minSalary)
/* employees and department are bi-directional associations */
context Employee inv C2 : self.department.employees->includes(self)
/* An employee’s hire date must be after the department start date */
context Employee inv C3 : self.department.startDate < self.hireDate
Figure 4.10: Employee Information System model (evolved)
4.4 Induced data migration
One important benefit of our proposed modeling approach is that data migration
at the instance level (M0) can be derived from data model changes at the model level
(M1). Modeling each concern at a separate abstraction level allows us to decouple
the conceptual representation of the data from its physical representation on the
implementation platform.
Figure 4.11 depicts the concept of induced migration, where each data model
evolution has a corresponding data migration. Induced data migration consists of
three components: (1) a definition of the data model evolution; (2) a mapping
from data model evolution to data migration; and (3) data migration rules, which depend
on numerous factors such as how the data is stored, what platform is available to
execute the data transformation, and the quantity of the data. This last component is
therefore driven by the context. The question that remains, though, is: how do we induce
data migration from data model evolution? Below, we characterize this question
more precisely; throughout the coming chapters, we demonstrate how this concept
can be realized using our model-driven approach.
[Figure 4.11 depicts induced data migration: (1) a data model evolution relates a source data model to a target data model, each conforming to the metamodel; (2) the evolution induces a data migration; (3) the migration maps a source data store, conforming to the source model, to a target data store conforming to the target model.]
Figure 4.11: Induced Data Migration
We specify data model evolution as a collection of model edits. Each model edit
specifies a relation between a source model and a target model in the same modeling
language: modelEdit ∈ Model ⇸ Model, where Model represents the set of all data
models.
Data migration, on the other hand, can be specified as a collection of data edits.
Each data edit specifies a relation between a source data store and a target data store:
dataEdit ∈ Data → Data, where Data represents the set of all data stores.
We may use modelEdit as a basis for inducing dataEdit functions that, when
applied to a data store d1 ∈ Data of the existing system model, will produce a target
data store d2 ∈ Data. To map data model evolution at the model level (M1 of the MOF
hierarchy) to data migration at the data instance level (M0 of the MOF hierarchy), we
define the function induce:

induce : (Model ⇸ Model) → (Data → Data)

so that d2 = induce(modelEdit)(d1).
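Read as a higher-order function, induce can be sketched in a few lines (ours; models and data stores are opaque values, and the platform-specific interpretation of a model edit, component (3) above, is supplied as a lookup).

    def induce(model_edit, data_edit_for):
        """Map a model-level edit to the instance-level migration it
        induces; `data_edit_for` supplies the platform-specific data
        edit corresponding to the given model edit."""
        def migrate(data_store):
            data_edit = data_edit_for(model_edit)
            return data_edit(data_store)
        return migrate

    # d2 = induce(model_edit, data_edit_for)(d1)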
As stated above, instance-level actions depend on the implementation platform.
For example, if the data we are interested in migrating is persisted in a relational
database, we may expect the instance actions needed to migrate this data to
consist of a collection of SQL statements. However, to demonstrate the
concept of induced data migration abstractly, in this section we assume that our instance
data can be manipulated by a set of UML instance actions, defined in the UML Action
package ([120], pp. 217).
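For instance, under a one-table-per-class storage scheme (an assumption on our part, as are the table and column naming conventions, with inherited attributes flattened into the subclass table), an addAttribute edit might be rendered as SQL along the following lines.

    def add_attribute_sql(class_name, attr_name, sql_type, default_expr):
        """Render an addAttribute model edit as two SQL statements:
        extend the class table, then initialize the existing rows."""
        return [
            f"ALTER TABLE {class_name} ADD COLUMN {attr_name} {sql_type};",
            f"UPDATE {class_name} SET {attr_name} = {default_expr};",
        ]

    for stmt in add_attribute_sql("Employee", "seniority", "VARCHAR(16)",
            "CASE WHEN age > 50 THEN 'senior' ELSE 'junior' END"):
        print(stmt)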
Table 4.2 shows the main actions defined in this package. As the table shows, objects,
attribute value slots and links can be manipulated by actions: objects can be created and
destroyed, attribute value slots can be set, and links can be created and destroyed. For
example, CreateObject(c:Class):Instance creates a new object and
DestroyObject(o:Instance) destroys an object.
CreateObject(c:Class):Instance
    Creates a new object that conforms to the specified class. The object is returned as an output parameter.
DestroyObject(o:Instance)
    Removes object o from its respective class. We assume that links in which o participates are not automatically destroyed.
AddAttributeValue(o:Instance, att:Property, v:ValueSpecification)
    Sets the value v as the new value for the attribute att of the specified object o.
RemoveAttributeValue(o:Instance, att:Property, v:ValueSpecification)
    Removes the value v of att in the specified object o.
CreateLink(as:Association, p1:Property, o1:Instance, p2:Property, o2:Instance)
    Creates a new link in the specified association as between objects o1 and o2, playing the roles p1 and p2, respectively.
DestroyLink(as:Association, p1:Property, o1:Instance, p2:Property, o2:Instance)
    Removes the link between objects o1 and o2 from as.
Table 4.2: Main Actions of UML Action Package
Each model edit at the model level may be interpreted as a series of instance
actions. For example, the inlineClass compound edit, shown in Section 4.3.3,
could be interpreted as:
forall object : Target .
    addAttributeValue (object, sourceProperty, value)
forall s2t : sourceTotarget .
    destroyLink (sourceTotarget, sourceProperty, targetObject,
        targetProperty, sourceObject)
forall sourceObject : Source .
    destroyObject (sourceObject)
with the addAttributeValue(), destroyLink() and destroyObject() instance actions
implementing addAttribute(), deleteAssociation() and deleteClass() respectively.
Our proposed Evolution Metamodel may be used incorrectly by designers writing
evolution models. In particular, a designer may describe evolution operations that
violate data model syntactic consistency, semantic consistency or both. Accordingly,
our specified model edits and data edits need to handle consistency constraints at
two abstraction levels, that is, the model level and the instance level.
We may also note that some evolution operations have a bearing upon the
data integrity semantics. This leads to a non-trivial guard upon the overall data migration function. For example, the modifyAssociation() operation in the example
at the end of Section 4.3.3 changes the multiplicity of the manager association from
optional to mandatory. Accordingly, we would expect a non-trivial guard of the form:
forall dd : Department .
    dd.manager->size() >= 1
to arise upon the migration of existing data. This guard requires that a manager
link exists for every object of class Department for the migration of the data to
succeed. Although this guard is satisfied by Department instance d2000 in Figure 4.6,
which has a manager link to Employee instance e300, Department instance d1000 does
not have such a link, and its data may not be migrated to the new model until such a link
is established. This can be achieved by providing additional input: a set of links
or references that are to be created for the association in question, or an expression
explaining how the link is to be created, in order that the constraints of the new
model should be satisfied.
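Such a guard can be evaluated before the migration is attempted; a minimal sketch (ours, over the illustrative dataclass representation used earlier) rejects the migration when any Department object lacks a manages link.

    def guard_manager_mandatory(departments, links):
        """Pre-migration guard: every Department object must already
        participate in at least one `manages` link before the
        multiplicity can be tightened from optional to mandatory."""
        return all(
            any(lk.association == "manages" and d in lk.ends for lk in links)
            for d in departments
        )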
Chapter 5
Specification and Verification of
Data Model Evolution
When a data model evolves, one important consideration is maintaining data model
consistency. As we discussed in Section 4.2.3, data model consistency can be characterized by a number of consistency constraints. Such constraints define conditions
that must be satisfied by a data model and its corresponding data instances.
To be able to maintain the consistency of the data model in the context of
evolution, two aspects of semantics need to be elaborated. First, we need to formally
capture data model concepts at both levels of abstraction, model level and instance level, and to precisely define how the two abstraction layers are related. Second, we
need to specify the effect of evolution on the data model, showing the impact of
evolution operations on model-level elements and on the corresponding data instances,
and ensuring that data model consistency is preserved.
To satisfy the above requirements, we need a formal representation of the
data model. This formal representation provides us with a framework for characterizing data model concepts and the relationships that exist among them. It also
helps us reason about data model evolution and the effect of such evolution on the
consistency of the data model. Defining formal semantics for our data model evolution
requires the specification of a semantic domain in a formal language [76]. We selected
the B-method [6] as the formal framework for the specification of our semantic domain.
Model-based formal methods such as B or Z [164] are well suited to the expression
of models with important static constraints, and to the verification of these constraints
across state changes. In our work, we have favored B. As outlined in Section 2.4, the B-method is a precise state-based formal modeling language with well-defined semantics
and two main notations: AMN and GSL.
In this chapter we exploit B formal specification techniques to express both the
data model and the evolution model. On the one hand, the semantics of the UML data model
(Section 5.1) describes the meaning of a UML data model in terms of the structure
of a characterized B abstract machine: sets of elements and their relationships which
are consistent with the Well-Formedness Rules (WFRs) defined by the UML abstract
syntax. This part of the semantics is given in the AMN notation of the B-method. On
the other hand, the semantics of the model evolution operations (Section 5.3) describes the
evolution of the state of the modeled elements and captures the applicability and
effect of evolution primitives. This part of the semantics is given in the GSL notation of
the B-method.
5.1 Semantics of UML data model
An appropriate B semantic domain for UML data models should be generic enough to
express the structural aspects and properties of UML data models, which we outlined
in Section 4.2.1. Since previous work such as [98, 144, 105, 107] has already investigated the mapping from UML to B, we do not investigate this problem afresh. Rather, we
draw on previous work translating UML and OCL into B to define an appropriate
B semantic domain, rich enough for our UML data models (instances of the
specified UML subset). Below we show parts of the abstract machine specification.
The full specification of the abstract machine can be found in Appendix C.
5.1.1 Type structure
The UML Boolean type is mapped to BOOL (defined as the set {TRUE, FALSE} in the
Bool_TYPE machine) in B. The UML String type is mapped to STRING in B, defined in the
String_TYPE machine. The UML Integer type is translated to B’s NAT, defined in the
Int_TYPE machine. These machines are system type machines that can be accessed
using the SEES clause in the main machine (lines 2-3 in Table 5.1).
The types of other classes in the data model are represented by given sets in
the SETS clause (lines 4-6): for example, CLASS is a given set used to hold the names
of possible data model classes, and OBJECTID is a given set used to define all object
identifiers. A B given set may have enumerated values or, as we use it here, may
simply be an (implicitly) non-empty, finite set [[6], Sects. 2.2 and 4.17].
In general, TYPE is a given set that represents all types in the data model. It
is categorized into PrimitiveType and classType, represented by two CONSTANTS
functions. Finally, typeOf is an injective function mapping each data model value to
its corresponding type.
1   ...
2   SEES
3       Bool_TYPE, String_TYPE
4   SETS
5       CLASS; OBJECTID; PROPERTY; ASSOCIATION;
6       VALUE; TYPE; PrimitiveTYPE
7   CONSTANTS
8       typeOf, primitiveType, classType
9   PROPERTIES
10      primitiveType : PrimitiveTYPE --> TYPE        &
11      classType     : CLASS --> TYPE                &
12      ran(primitiveType) /\ ran(classType) = {}     &
13      typeOf        : VALUE >-> TYPE
14  ...
Table 5.1: Formalization of data model type structure
...
className       <: CLASS                &
isAbstract      :  CLASS +-> BOOL       &
superclass      :  CLASS +-> CLASS      &
ownedProperties :  CLASS <-> PROPERTY   &
...
Table 5.2: Formalization of data model classes
5.1.2 Data model classes
Table 5.2 shows the formal semantics given to data model classes. A class name is
represented by the className set, defined as a subset (denoted by <:) of CLASS, a given
set that holds all possible class names. isAbstract is a function from CLASS to a
boolean value. Class generalization is represented by the superclass function, which
relates a class to its immediate superclass. A data model class can own a number of
properties (attributes and association ends); this is represented by the ownedProperties
relation, which maps each data model class to its owned properties.
5.1.3 Data model properties
A UML property may represent an attribute or an association end [124]. Table
5.3 shows the formalization of data model properties. Each property has a name,
represented by propertyName, defined as a subset of PROPERTY, a given set that holds
all possible property names.
...
propertyName        <: PROPERTY               &
isComposite         :  PROPERTY +-> BOOL      &
propertyLowerMultip :  PROPERTY +-> NAT       &
propertyUpperMultip :  PROPERTY +-> NAT       &
owningClass         :  PROPERTY +-> CLASS     &
propertyType        :  PROPERTY +-> TYPE      &
opposite            :  PROPERTY +-> PROPERTY  &
...
Table 5.3: Formalization of data model properties
The owningClass function maps a property to its owning class, and the propertyType
function maps a property to a data type. When a property is an association end, it can
represent a composition (whole-part) relationship between the property’s owning class
and a related class; this is captured by the isComposite function from PROPERTY to a
boolean value. The lower and upper multiplicities of properties are represented by two
functions, propertyLowerMultip and propertyUpperMultip respectively, each mapping a
property to a natural number. The opposite function maps a PROPERTY to a PROPERTY
to represent the case where two association ends are paired together to form a
bidirectional association.
5.1.4 Data model associations

...
associationName <: ASSOCIATION                              &
association     :  ASSOCIATION +-> (CLASS +-> CLASS)        &
memberEnds      :  ASSOCIATION +-> (PROPERTY +-> PROPERTY)  &
...
Table 5.4: Formalization of data model associations
An association has a name, represented by associationName, defined as a
subset of ASSOCIATION, a given set that holds all possible association names. An association relates two classes, as represented by the association function. This function is
indexed by an association name and yields a functional relation between two classes.
Each association must have two association ends; this is represented by the memberEnds
function, relating ASSOCIATION to a partial function between two properties. The other
characteristics of a member end (as a PROPERTY) are shown in Table 5.3 above. For example, the multiplicity of each member end is represented by propertyLowerMultip
and propertyUpperMultip.
5.1.5 Data model instances

...
extent : CLASS       +-> POW(OBJECTID)                 &
value  : PROPERTY    <-> (OBJECTID +-> VALUE)          &
link   : ASSOCIATION <-> (OBJECTID +-> POW(OBJECTID))  &
...
Table 5.5: Formalization of data model instance layer
In our data model semantics definition, we differentiate between two abstraction
layers, the model layer and the instance layer. Table 5.5 shows the main semantic
elements of the instance layer. The function extent maps a class to a set
of object identifiers. The relation value returns the current value of an attribute
property for each object of the property owning class. The relation link, indexed
by an association name, yields a function between object identifiers, represented as
a function from an object identifier to a set of object identifiers. Note that here we
use relations (denoted by <->) rather than partial functions to characterize values
and links. This is mainly to capture data model inheritance at the instance level:
a property can be owned by multiple classes, namely a superclass and its subclasses in an
inheritance hierarchy.
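As a hypothetical illustration of why value is a relation rather than a function: if a property name is owned by a superclass Person and visible in its subclass Employee (class and property names assumed for the example), value may hold one maplet per class whose objects carry the property,

value = { name |-> (p100 |-> v1), name |-> (e100 |-> v2), ... }

where p100 : extent(Person) and e100 : extent(Employee), so the property name is related to more than one value function.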
5.2 Consistency of the data model
Our AMN formalization presented above captures the semantics of the subset of the UML data metamodel which we use for data modeling, and thus provides the context needed
for expressing consistency constraints for data models; hence it allows us to analyze UML data models for consistency. In the previous chapter, Section 4.2.3, we
introduced two kinds of data model consistency constraints, dealing with data model
consistency from syntactic and semantic views. To support model consistency checking, we formalize here these consistency rules based on the abstract
machine formalization presented above. Data model consistency rules are defined
as invariants in the abstract machine.
5.2.1 Syntactic consistency

INVARIANT
...
!(c1,c2).(c1:className & c2:className &
  c1 /= c2 & (c1|->c2) : closure1(superclass) =>
  ownedProperties[{c1}] /\ ownedProperties[{c2}] = {}) &

Table 5.6: Name uniqueness for data model properties
Name uniqueness: for a data model to be valid, all data model constructs such as classes,
properties, and associations need to have unique names. This condition is captured by
the fact that the names of these constructs are defined as sets. However, in the presence of
data model inheritance, it is necessary to capture the condition that, within an inheritance hierarchy, a property name should uniquely refer to one property only. The
invariant shown in Table 5.6, where ! denotes universal quantification and closure1
denotes the transitive closure operator, expresses this requirement.
closure1(superclass) /\ id(CLASS) = {}

Table 5.7: Absence of circular inheritance
Absence of circular inheritance: this consistency condition requires that no cycles
exist within a class inheritance hierarchy. This is precisely expressed by the invariant
presented in Table 5.7: the transitive closure (denoted by closure1) of the superclass
function, intersected with the identity relation on data model classes (denoted by id),
should result in an empty set.
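As a small worked instance: if superclass contained both C1 |-> C2 and C2 |-> C1 (hypothetical class names), then closure1(superclass) would contain C1 |-> C1, its intersection with id(CLASS) would be non-empty, and the invariant would be violated.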
1  !(aa).(aa:dom(memberEnds) =>
2    !(ae).(ae:memberEnds(aa) & isComposite(ae) = TRUE &
3      association(aa)(owningClass(ae)) = {} =>
4      ownedProperties~[{ae}] = {} ))

Table 5.8: Existential dependency of composition
Existential dependency of composition: in a composite association, the class that
represents the part can only exist if the class that represents the whole exists. This is
captured by the invariant in Table 5.8. After universally quantifying over the data model
associations in the domain of the memberEnds function (line 1), we require
that, if the target class of the composite association is deleted, the owning class of
the association end is also deleted.
!(aa1,p1,p2).(aa1:dom(memberEnds) &
  p1:dom(memberEnds(aa1)) &
  p2:ran(memberEnds(aa1)) &
  (p1|->p2) : opposite =>
  #(aa2).(aa2:dom(memberEnds) &
    memberEnds(aa2) = {(p2|->p1)}))

Table 5.9: Association bidirectionality
Association bidirectionality: two member end properties can be paired so that each
end can be an opposite of the other. In such a case, both ends need to be owned
by the same association. We capture this consistency condition by quantifying over
an association and two opposite properties. The invariant requires that, for such an
association, there exists another association name with the same member ends, going
in the opposite direction.
5.2.2 Semantic consistency

!(cc,oo,pp).(cc:className &
  oo:extent(cc) &
  pp:PROPERTY &
  oo : dom(value(pp)) &
  owningClass(pp) = cc =>
  pp : ownedProperties[(closure1(superclass)\/id(className))[{cc}]])

Table 5.10: Instance conformance
Instance conformance: in UML data models, an instance conforms to its class if
the values of the instance slots are consistent with the properties of the instance's class and
superclasses. This is captured by the invariant in Table 5.10, which requires
that the properties defining instance values be owned by the instance's class or its superclasses.
Value conformance: in UML data models, a value conforms to its property if
the value is consistent with the data type of the property. This is captured by the
invariant in Table 5.11, which requires the type of any attribute value
assigned to an object of a class to match the type of the property as defined in the
object's class (line 4).

1  !(cc,aa,oo).(cc:className &
2    aa:ownedProperties(cc) &
3    oo:extent(cc) =>
4    typeOf(value(aa)[{oo}]) = propertyType(aa) )

Table 5.11: Value conformance
!(aa,pp).(aa:dom(memberEnds) & pp : dom(memberEnds(aa)) =>
  (!(cc,oo).(cc:dom(association(aa)) & oo:extent(cc) =>
    card(link(aa)(oo)) >= propertyLowerMultip(pp) &
    card(link(aa)(oo)) <= propertyUpperMultip(pp) )))

Table 5.12: Link conformance
Link conformance: a link conforms to its association if the objects it references are
consistent with the association type and multiplicity. The referenced objects must
satisfy the bidirectional association defined at the data model level. The invariant
in Table 5.12 requires that the number of objects participating in an association link at the instance level be permitted by the multiplicity of the corresponding
member end at the model level.
Example. To show how our proposed abstract machine may be used, we may
instantiate its variables from our Employee Information model and its conformant
object model, given in Figure 4.5 and Figure 4.6 respectively. This yields the following
relations, where a |-> b denotes the pair (a, b):
className       = {Employee, Department, PersonalFile, ... }
ownedProperties = {Employee |-> hireDate,
                   Employee |-> salary,
                   Department |-> location,
                   PersonalFile |-> maritalStatus, ... }
propertyType    = {hireDate |-> Date,
                   salary |-> NAT,
                   location |-> String, ... }
association     = {assignment |-> (Person |-> Department),
                   manages |-> (Department |-> Employee), ... }
extent          = {Employee |-> {e100, e200, ...},
                   Department |-> {d1000, d2000}, ... }
value           = {name |-> (e100 |-> emp1),
                   rate |-> (f100 |-> 20), ... }
link            = {assignment |-> (e100 |-> {d1000}),
                   manages |-> (e100 |-> {d2000}), ... }
Data model-specific constraints (for example, association multiplicity or user-defined constraints) that require a particular representation of data instances will also
be mapped into invariant properties constraining the extent, value, and link variables
in our abstract machine semantic domain above.
For example, in the Employee Information model, the OCL constraint

context Employee inv C1 :
  self.department.employees->includes(self)

can be mapped to the AMN machine INVARIANT:

! ee : extent(Employee) .
  ! dd : link(assignment)(ee) =>
    ee : link(assignment)(dd)

where ! denotes universal quantification, : denotes set membership, and () denotes
function application.
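Similarly, a hypothetical multiplicity constraint requiring that every Employee be assigned to at most one Department could be mapped to an invariant constraining link (a sketch in the same notation; the constraint itself is assumed for the purpose of the example):

! ee : extent(Employee) .
  card(link(assignment)(ee)) <= 1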
5.3 Semantics of model evolution operations
We may give semantics to our evolution operations by mapping each to appropriate
substitutions on machine variables, described in the B-method Generalized Substitution
Language (GSL). Each substitution will act upon the variables of the machine state,
giving precise meaning to changes in data model elements and corresponding instances. This characterization will later help us reason about data model evolution.
By reasoning about the substitution, we may determine whether the proposed
change corresponds automatically to a successful data model evolution and corresponding data migration, or whether manual intervention will be required.
5.3.1 Specifying evolution primitives
Each of the model evolution operations presented in Section 4.3.2 of Chapter
4 may be given semantics as an operation upon the machine defined in
Section 5.1, explaining the applicability and intended effect of an evolution step upon
the data model elements and data instances contained in the system.

In general, the semantics of our evolution primitives can be specified using a combination of the preconditioned substitution PRE..THEN..END and the unbounded choice substitution ANY..WHERE..THEN..END, taking the form:
evolutionPrimitive(parameters) =
  PRE
    parameter typing & other B predicates
  THEN
    ANY local variables
    WHERE
      B predicates on local variables &
      B predicates on other variables
    THEN
      local variable  := B substitution ||
      other variables := B substitution
    END
  END
The PRE operator in a preconditioned substitution is used to assign types to operation parameters and to describe the applicability conditions under which the substitution statements can be executed correctly. Thus, the statement after the PRE
operator can be seen as expressing the expectations or requirements of the specifier
on the conditions under which the substitution will be executed.

The local variable following ANY is a variable disjoint from the state space of the
operation, created only to execute the statement and discarded after the execution.
In general, an ANY statement can make use of a list of local variables. The predicate
after the WHERE clause is a predicate on the local variable(s) that provides a type to
the variable(s) and may, in addition, provide some other constraining information.
It may also refer to other state variables and relate them to the local variable(s).

The body of the substitution comes after the THEN clause. Here, we describe the effect
of the operation. The substitution statements describing the effect of the operation
may refer to the local variable(s) specified after ANY or to other machine state variables
relevant to the operation. Typically, we use B parallel substitution (denoted by ||)
to indicate an undetermined order of execution.
1   addClass(cName,isAbst,super) =
2     PRE
3       cName : CLASS - className &
4       cName /: dom(extent)      &
5       isAbst : BOOL             &
6       super  : className
7     THEN
8       className  := className  \/ {cName}           ||
9       isAbstract := isAbstract \/ {cName |-> isAbst} ||
10      superclass := superclass \/ {cName |-> super}  ||
11      extent     := extent     \/ {cName |-> {}}
12    END

Table 5.13: Part of the semantics of class addition
The general form of substitution for add operations is set union (denoted by
\/). This is used to add a new element to the set of known elements and to update
the functional relations in which the type of the element being added takes part. The
general form of substitution for delete operations is either set subtraction (denoted
by -), domain subtraction (denoted by <<|), or range subtraction (denoted by |>>).
These operators are used to remove an element from an existing set and to update
the functional relations in which the type of the element being removed takes part. The
general form of substitution for modify operations is function overriding (denoted by <+):
the current value of an element is replaced with a new value supplied by the operation
parameters.
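As an illustration of the modify form, a hypothetical modifyClassAbstractness operation (not one of the primitives defined in Appendix C; a minimal sketch over the machine variables of Section 5.1) might override the isAbstract function for an existing class:

modifyClassAbstractness(cName, newIsAbst) =
  PRE
    cName : className &
    newIsAbst : BOOL
  THEN
    isAbstract := isAbstract <+ {cName |-> newIsAbst}
  END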
The evolution primitive semantics may refer to either model-level or instance-level
concepts. This feature allows us to specify how an evolution of a model-level construct
may impact other model-level constructs and/or instance-level constructs.
Below we show parts of the semantics of the evolution primitives. The complete
description of the semantics of each evolution primitive can be found in Appendix C.
Class addition and deletion
The update addClass corresponds to the B-method operation shown in Table
5.13. The PRE operator (lines 2-6) provides typing information for the operation parameters and ensures that cName is a fresh class name, not already in className (line 3).
The variables of the class construct are then updated in a series of parallel substitutions
(denoted by ||) (lines 8-11): class variables such as className, isAbstract, and
superclass are updated from the parameters of the operation using set union. Since the
addition of a class does not affect existing data instances, at the instance level we simply
map the new class name to an empty set in extent.
1   deleteClass(cName) =
2     PRE
3       cName : className &
4       cName /: ran(superclass) &
5       ownedProperties[{cName}] /\ ran(opposite) = {}
6     THEN
7       ANY cObjects, cProperties, cAssociations
8       WHERE
9         cObjects = extent(cName) &
10        cProperties = ownedProperties[{cName}] &
11        cAssociations = {asso | asso : ASSOCIATION &
12          (dom(association(asso)) \/ ran(association(asso))) = {cName}}
13      THEN
14        className    := className - {cName}               ||
15        superclass   := {cName} <<| superclass |>> {cName} ||
16        propertyName := propertyName - cProperties         ||
17        association  := cAssociations <<| association      ||
18        extent       := {cName} <<| extent                 ||
19        value        := cProperties <<| value              ||
20        link         := cAssociations <<| link
21      END
22    END;

Table 5.14: Part of the semantics of class deletion
The deleteClass operation takes the class name cName as a parameter. The semantics
of this operation is shown, in part, in Table 5.14. The precondition predicate (lines
2-5) ensures that the class name provided as a parameter of the operation already
exists among the data model classes. It also ensures that the deleted class is not one
of the superclasses and that none of the deleted class's properties is an opposite in a
bidirectional association.

The body of the operation first declares three local variables for class objects,
class properties, and class associations, and binds these local variables to their corresponding sets. These local variables are then used to update class variables in the
machine such as className, superclass, and propertyName. This is done
using set and domain subtraction. The remaining substitution statements remove
owned class properties from the relevant functions, and any objects corresponding to the
class name are deleted from the extent, value, and link functions.
1   addAttribute(cName,attrName,type,exp) =
2     PRE
3       cName    : CLASS &
4       attrName : PROPERTY - propertyName &
5       attrName /: ownedProperties[{cName}] &
6       type     : ran(primitiveType) &
7       exp      : VALUE &
8       typeOf(exp) = type
9     THEN
10      ANY objectId
11      WHERE objectId = extent(cName)
12      THEN
13        propertyName    := propertyName    \/ {attrName}           ||
14        ownedProperties := ownedProperties \/ {cName |-> attrName} ||
15        propertyType    := propertyType    \/ {attrName |-> type}  ||
16        value           := value \/ {attrName |-> (objectId * {exp})}
17      END
18    END;

Table 5.15: Part of the semantics of attribute addition
1   deleteAttribute(cName,attrName) =
2     PRE
3       cName : CLASS &
4       attrName : PROPERTY
5     THEN
6       propertyName := propertyName - {attrName}  ||
7       owningClass  := {attrName} <<| owningClass ||
8       value        := {attrName} <<| value
9     END;

Table 5.16: Part of the semantics of attribute deletion
Attribute addition and deletion
The addAttribute operation has the semantics shown in Table 5.15. The
precondition of the operation defines the relevant type of each operation parameter
(lines 3-8). The body of the operation consists of a series of parallel substitutions,
creating a new attribute and setting up its propertyName, propertyType, and default
value based on the expression provided by the operation parameters (lines 13-16).
addAssociation(assoName,
               srcClass,srcProp,tgtClass,tgtProp,isComp,exp) =
  PRE
    assoName : ASSOCIATION - associationName &
    srcClass : CLASS & srcClass /: dom(association(assoName)) &
    tgtClass : CLASS & tgtClass /: ran(association(assoName)) &
    srcProp  : PROPERTY - propertyName &
    srcProp  /: ownedProperties[{srcClass}] &
    exp      : OBJECTID
  THEN
    ANY srcOID
    WHERE srcOID = extent(owningClass(srcProp))
    THEN
      associationName := associationName \/ {assoName}     ||
      propertyName    := propertyName \/ {srcProp,tgtProp} ||
      association     := association \/
                           {assoName |-> {srcClass |-> tgtClass}} ||
      memberEnds      := memberEnds \/
                           {assoName |-> {srcProp |-> tgtProp}}   ||
      ownedProperties := ownedProperties \/
                           {srcClass |-> srcProp, tgtClass |-> tgtProp} ||
      link            := link \/ {assoName |-> (srcOID * {{exp}})}
    END
  END;

Table 5.17: Semantics of association addition
The semantics of the deleteAttribute operation is shown in Table 5.16. This operation takes two parameters: attrName, the name of the attribute to be deleted,
and cName, the name of the attribute's owning class. Both parameters are bound
to the corresponding attribute and class names. A parallel composition of substitution
statements is then used to remove the attribute from the relevant sets and relations
using the set and domain subtraction operators (lines 6-8).
Association addition and deletion
The addAssociation operation has the description presented in Table 5.17. In
addition to an association name, the operation takes source and target class
names and source and target property names as the two association ends forming the
association. The parameters of the operation also specify whether either of the two ends
is a composite end of the association. Similar to attribute addition, the parameters of
the addAssociation operation may include an expression giving a default value to the
association being added.
deleteAssociation(assoName) =
  PRE
    assoName : ASSOCIATION
  THEN
    ANY srcClass, tgtClass, srcProp, tgtProp
    WHERE
      srcClass : dom(association(assoName)) &
      tgtClass : ran(association(assoName)) &
      srcProp  : dom(memberEnds(assoName))  &
      tgtProp  : ran(memberEnds(assoName))
    THEN
      associationName := associationName - {assoName}     ||
      propertyName    := propertyName - {srcProp,tgtProp} ||
      association     := {assoName} <<| association       ||
      memberEnds      := {assoName} <<| memberEnds        ||
      ownedProperties := ownedProperties -
                           {srcClass |-> srcProp, tgtClass |-> tgtProp} ||
      link            := {assoName} <<| link
    END
  END

Table 5.18: Part of the semantics of association deletion
The body of the operation binds a local variable of source object identifiers to the
corresponding identifiers. This variable is used to initialize the link created at the
instance level when an expression is provided in the operation parameters. The substitution statements update the relevant functional relations such as associationName,
association, and memberEnds using set union. Note that we assume here that the
source and target properties provided as parameters did not exist in the model before;
hence, we use a number of substitution statements to introduce them into the data model.
The deleteAssociation operation, shown in Table 5.18, takes the association
name as a single parameter. The PRE operator ensures that the association name
is a member of the ASSOCIATION set. The body of the operation first introduces a
number of local variables to identify the source and target classes and the association ends
participating in the association to be deleted. These variables are then used to update
association-related functions such as association and memberEnds using a combination of substitution statements.
5.3.2 From evolution primitives to evolution patterns
We may also wish to include model edits corresponding to more complex model
evolution operations, such as inlining an existing class, introducing an association
class, extracting a class, or specializing a reference type. These operations involve
changes to a collection of related model elements.

Fowler's refactoring catalogue [62] and advanced schema evolution operations,
similar to those outlined in [100, 39, 26], or more recently in [19], can be a starting
point for candidate patterns relevant within the context of model-driven information
systems evolution. Using our AMN representation, these data model evolution patterns may be given a formal semantics to precisely define their applicability and effect
on an information system data model.
For example, we may introduce the inlineClass compound operation. This operation [39] is used when a referenced class becomes redundant and we wish to retain
some or all of its attributes as members of another class. For the arguments srcClass,
refClass, and refProperty, this operation can have the semantics shown in Table
5.19.
inlineClass(srcClass, refClass, refProperty) =
  PRE
    srcClass    : className &
    refClass    : className &
    refProperty : propertyName &
    refProperty : ownedProperties[{srcClass}] &
    propertyType(refProperty) = classType(refClass)
  THEN
    ! attrib : ownedProperties[{refClass}] .
      { addAttribute(srcClass, attrib, type, exp);
        deleteAttribute(refClass, attrib) };
    VAR assoName IN
      assoName := dom(association~[{(srcClass|->refClass)}]);
      deleteAssociation(assoName);
      deleteClass(refClass)
    END
  END

Table 5.19: Semantics of class inlining
As another example, we take the generalizeClass compound operation [100]. This
operation finds common attributes between two given classes and builds a new superclass. The semantics of this operation is defined in Table 5.20.
The precondition of this operation ensures that both subclasses exist in the data
model. In the substitution of the operation, we first create an abstract superclass.
We then define three local variables to collect the attributes of each subclass and
the common attributes that exist in both classes. We finally assign each common
attribute to the new superclass and modify each subclass to point to the new class as
its superclass.
generalize(subClass1, subClass2, superClass) =
  PRE
    subClass1 : className &
    subClass2 : className &
    superClass : CLASS
  THEN
    addClass(superClass, TRUE, ...);
    VAR sc1Attribs, sc2Attribs, commonAttribs IN
      sc1Attribs := ownedProperties[{subClass1}];
      sc2Attribs := ownedProperties[{subClass2}];
      commonAttribs := sc1Attribs /\ sc2Attribs;
      ! attrib . (attrib : commonAttribs =>
        ! oClass . (oClass = owningClass(attrib) =>
          ownedProperties := (ownedProperties - {oClass |-> attrib})
                               \/ {superClass |-> attrib} ||
          owningClass(attrib) := superClass));
      modifyClass(subClass1, superClass);
      modifyClass(subClass2, superClass)
    END
  END

Table 5.20: Semantics of class generalization
In the following section, we use the formal semantics proposed here to show
how a calculation of consistency constraints can be used to determine the domain
of applicability of a proposed sequence of model changes: the set of data states
from which a successful data migration, from the old version to the new version of the
system, will be possible. As the system model is edited and annotated to reflect
proposed changes, the designer can be given an indication as to whether the data
held in the existing system, transformed according to the specified evolution, would
fit the updated model.
5.4 Verification of data model evolution
As we outlined in Chapter 2, within the context of this dissertation, the problem of
model-driven data migration can be decomposed into two sub-problems: (i) specifying
data model evolution and (ii) managing the effect of evolution.
The first sub-problem deals with the meaning of each evolutionary step for the
overall data model and the underlying data instances. For example, the deletion of a
property in a class affects the owning class and the subclasses inheriting that property;
it also implies deleting the corresponding data values from objects of the
affected classes. In the previous section, we assigned our evolutionary steps a
precise B-method semantics that deals with this sub-problem.
The second sub-problem requires techniques for maintaining both data and model
consistency. As information systems evolve, it is important to be able to determine
whether a proposed evolution preserves data model consistency and whether the
corresponding data migration is going to succeed. In other words, we need to ensure
that each model-level change and the corresponding data-level values satisfy the invariants
of the target model, so that we can determine the success of the data migration before
it is executed.
Building on the AMN formalization presented in the previous section, we present here
three important verification activities. First, using B-method
internal consistency proofs, we show how our model edits preserve data model
consistency. In our context, this is a prerequisite for evolving data models: we
need to ensure that by applying our model edits to a consistent source data model, we
obtain a consistent target data model. Second, using
B-method refinement proofs, we show how it is possible to check whether an evolved
data model is a refactoring of a source data model. Such a check allows modelers
to perform a wide range of edits while ensuring that data migration
can be performed successfully, given that the resulting model is a refinement of the
source model. Finally, when data model evolution involves adding or removing data
model features such that refinement cannot be established, we show how the applicability of
data model evolution may be determined, and we use this as an indication of whether
data migration from the old version to the new version of the system will succeed.
5.4.1 Verifying data model consistency
The AMN formalization presented so far can play a substantial role in verifying
data model consistency, both syntactic and semantic. Hence, we are able to verify
the notion of conformance, a crucial principle in model-driven development. While
most of the related work in this area, such as [128, 54, 25], focuses on one aspect of
conformance, namely whether a model conforms to a metamodel (M1 to M2 levels of
the MOF hierarchy), we are, in addition, interested in another aspect
of conformance: verifying that data instances conform to the data model
(M0 to M1). In our work, checking both aspects of conformance resolves to discharging an internal consistency proof obligation in the B-method. Below we explain the
main principles based on which data model consistency proof obligations are raised.
In Appendix D, we provide a detailed explanation of how these proof obligations are
discharged using the B-method proof tool.
Based on our explanation in Section 4.2.3, we assume that all data model consistency constraints are properly mapped into abstract machine invariants, here denoted
by INV. The invariant INV specifies what is required to be true of all possible
states that can be reached during any model edit. We also assume that all data
model edits are properly mapped into abstract machine operations, here denoted by
EDIT. If invoked correctly (i.e. within its precondition), each model edit EDIT
must maintain the machine invariant and reach a final state in which INV holds. If we
further assume that an edit can only be called from a state in which INV is
true, the proof obligation on model edits to preserve the invariant is as follows:
INV ∧ P ⇒ [EDIT] INV        (5.1)
This states that if the machine is in a state in which INV and the precondition P are true,
and the operation is supplied with inputs in accordance with P, then its behavior
(described by EDIT) must be guaranteed to re-establish INV. If all of our model
edits meet this requirement, and the machine starts in a state in which INV is true,
then the invariant INV must indeed be true in every reachable state.
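As a small instance of (5.1), consider the addClass operation of Table 5.13 and the circular-inheritance invariant of Table 5.7. Restricting attention to the one substitution that touches superclass, the obligation becomes (a sketch, with the other conjuncts of INV and P elided):

INV ∧ P ⇒ [superclass := superclass \/ {cName |-> super}]
            (closure1(superclass) /\ id(CLASS) = {})

which, by the substitution rule, amounts to proving that
closure1(superclass \/ {cName |-> super}) /\ id(CLASS) = {}
holds under a precondition guaranteeing that cName is a fresh class name.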
To establish that the machine invariant holds when model edits are executed, we need
to prove that

[EDIT](INV)        (5.2)
As the abstract machine invariant is given as a conjunction of a number of invariants, according to the B-method proof generation technique we need to establish that

[EDIT](INV1 ∧ · · · ∧ INVn)        (5.3)

This is equivalent to establishing each of the invariants separately, which allows us
to discharge several smaller proof obligations rather than one large one.
[Table 5.21 is a matrix whose columns are the data model consistency constraints (syntactic consistency: name uniqueness, inheritance hierarchy, existential dependency, bidirectionality; semantic consistency: instance conformance, value conformance, link conformance) and whose rows are the model edits (class addition, class deletion, attribute addition, attribute deletion, association addition, association deletion); a cross marks each constraint that an edit can potentially violate.]

Table 5.21: Potential violations of machine invariants by model edits
In fact, only the invariants that mention any of the variables that EDIT acts on as a free variable
will raise proof obligations: if i is a free variable in INV1 and is updated
by EDIT, we will have to prove [EDIT](INV1). For the rest of the invariants
INV (excluding INV1), we have [EDIT](INV) = INV, since there is no free
occurrence of i for EDIT to substitute.
Table 5.21 shows a high-level overview of the proof obligations raised by the B-method
prover for our model edits. Columns correspond to data model consistency constraints, both syntactic and semantic; rows to our generated model edits. A cross
in a cell indicates a potential conflict between the corresponding consistency invariant and the specific edit, meaning that the consistency constraint can
potentially be violated when the edit executes.
An operation whose proof obligation is not true highlights a possible conflict between the machine invariant and the operation, which will need to be resolved during
the design process. There are a number of ways that the conflict can be resolved,
depending on how it arose. It may be that the machine allows the operation to be invoked when it should not be. This is controlled by the precondition of the operation,
and it can be corrected by strengthening the precondition to exclude the states and
inputs where the operation should not be invoked. Consider an operation whose body is PRE P
THEN EDIT END in a machine whose invariant is INV. The proof obligation for the
operation is:
INV ∧ P ⇒ [EDIT]INV ≡
P ∧ INV ⇒ [EDIT]INV ≡
P ⇒ (INV ⇒ [EDIT]INV)

INV ⇒ [EDIT]INV represents the weakest value of P that will ensure that EDIT
preserves the invariant INV. This gives us a justification for using proof obligations to
complete preconditions.
For example, since the typing invariants of the variables representing model elements
are given as partial functions, in order to ensure that such typing invariants are
maintained after the execution of the relevant model edits, we identify a precondition
which states that if a new model element is added by the operation, this element
must not already be in the domain of the partial function. For instance, owningClass,
which is a partial function from PROPERTY to CLASS, is updated by the addAttribute
operation. In addAttribute, we therefore identify attrName /: dom(owningClass)
as a precondition to be added to the operation.
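A sketch of the operation with the strengthened precondition (only the conjunct relevant to owningClass is shown; the elided parts are as in Table 5.15):

addAttribute(cName,attrName,type,exp) =
  PRE
    ...
    attrName /: dom(owningClass) &
    ...
  THEN
    ...
  END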
The fact that our model edits are proved to preserve data model consistency
constraints gives us a guarantee that when we use these edits to evolve a consistent
source data model, we will end up with a consistent target data model. With this
guarantee, we proceed to another kind of check involving source and target data
models, that is to check whether the target model is a refinement of the source model.
5.4.2 Verification of data model refactoring
Our focus is on refactoring UML data models used within the context of information
systems development. We have shown how a refactoring step such as extractClass
can be applied to an information system data model.
The essence of refactoring is to improve the design of a software artifact without
changing its externally observable behavior [111]. Although the notion of refactoring has been widely investigated, for example [62], a precise definition of behavior
preservation is rarely provided [111].
A refinement proof in B establishes that all invariant properties of a current
data model, and all pre-post properties of its model operations, remain valid in
the target (evolved) model. Thus, it gives us a means to verify the data model
refinement property and, at the same time, to guarantee that the data migration will
succeed: that is, the data values after the changes will satisfy the target model
constraints. This enables designers to perform a wide range of data model changes
(e.g. those characterized as refactorings) while ensuring that data persisted under the
current model can be safely migrated.
In Section 3.3.2, we mentioned that a refactoring should not alter the behavior
of software. As we are interested in applying the refactoring notion to data models
as software artifacts, we need to capture the behavioral aspect of data models and
prove that the source model behavior (before refactoring) is semantically equivalent
to the target model behavior (after refactoring). This can be achieved by translating
the source model to an AMN abstract machine and the target model to an AMN
refinement machine. We use a linking invariant in the refinement machine to relate the state spaces of the two machines. By discharging the B-method refinement proof
obligations we outlined in Section 2.4, we can prove that the target model is a refinement of the source model and hence establish that the behavior of the two models is
equivalent.
The mapping from UML to B described in the previous chapter can be used for
such verification. We map the current data model to an AMN abstract machine and
the target model to an AMN refinement machine. In the refinement machine, we need
to define a linking invariant: an invariant property that relates data in both machines
and acts as a data transformation specification. This invariant can be mapped from
the initial value expressions provided by the designer in the evolution specifications (e.g.,
those provided with the addAttribute() and addAssociation() primitives).
By discharging the B refinement proof obligations we can prove that a target model,
after evolution, is a data refinement of the current model and, hence, establish that
the behavior of the two models is equivalent and that the data migration can be
applied. Below, we state the proof obligations in general and then apply them to an
example.
Example. Figure 5.1(a) shows a partial mapping of the initial version of the UML
data model presented in Figure 4.6(a) to an AMN abstract machine. Figure 5.1(b)
shows a similar mapping of the same data model, after applying the extractClass()
refactoring step shown in Figure 4.10, to an AMN refinement machine. Note that
the last invariant conjunct in the refinement machine (line 9 in Figure 5.1(b)) describes a linking invariant in the form of a data transformation of the team association:
while in the source model this association was a relation between the Department and
Employee classes, in the target model it is a relational composition through the Project
class. Applying the refinement proof obligations to this example, we get the following
outcome:
1. Initialization. This condition holds since the empty sets in the INITIALISATION
clauses can be related to each other.

2. Operations. Considering the setProjAssgmt() operation (line 15 in Figure
5.1(b)), this condition holds since the operation in the refinement machine has
no explicit precondition (it works under the assumption of the precondition of the
corresponding setProjAssgmt() operation in the abstract machine), and every
execution of this operation in the refinement machine [S1] updates the teamr relation, which is (according to the linking invariant J) equivalent to the team relation
updated by the setProjAssgmt() operation in the abstract machine (line
13 in Figure 5.1(a)).
(a)

1   MACHINE EmployeeTracking
2   SETS EMPLOYEE; DEPARTMENT
3   VARIABLES employees, team
4   INVARIANT
5     ...
6     employees : DEPARTMENT <-> EMPLOYEE &
7     team      : DEPARTMENT <-> EMPLOYEE
8     ...
9   INITIALISATION
10    employees := {} || team := {}
11  OPERATIONS
12    ...
13    setProjAssgmt(emp,dep) =
14      PRE
15        emp : EMPLOYEE & dep : DEPARTMENT
16      THEN
17        team := team \/ {dep |-> emp}
18      END;
19    response <-- getProjAssgmt(emp) =
20      PRE
21        emp : ran(team)
22      THEN
23        response := team~[{emp}]
24      END
25  END

(b)

1   REFINEMENT EmployeeTrackingR
2   REFINES EmployeeTracking
3   SETS PROJECT
4   VARIABLES employeesr, projectsr, teamr
5   INVARIANT
6     ...
7     teamr     : PROJECT <-> EMPLOYEE &
8     projectsr : DEPARTMENT <-> PROJECT &
9     team = (projectsr ; teamr)
10    ...
11  INITIALISATION
12    employeesr := {} || projectsr := {} || teamr := {}
13  OPERATIONS
14    ...
15    setProjAssgmt(emp,dep) =
16      BEGIN
17        ANY pp WHERE pp : projectsr[{dep}]
18        THEN teamr := teamr \/ {pp |-> emp} END
19      END;
20    response <-- getProjAssgmt(emp) =
21      BEGIN
22        VAR ppr IN
23          ppr := teamr~[{emp}];
24          response := projectsr~[ppr]
25        END
26      END
27  END

Figure 5.1: Mapping of data models in Figure 4.6(a) and Figure 4.10 to AMN (partial)
3. Operations with outputs. Applying this proof obligation to the getProjAssgmt()
operation (line 20 in Figure 5.1(b)), we find that it holds. Every execution of this
operation in the refinement machine generates a response output assigned to an
input employee. This is matched by the execution of the corresponding operation in the abstract machine via the linking invariant team = (projectsr ; teamr).
We conclude that a refinement relation exists between the two machines.
We have shown how an information system data model may be evolved by introducing refactoring steps. The applicability and effect of such refactoring steps were
precisely specified using our formal semantics. This precise definition allowed us to
apply refinement checks to establish whether a target data model is a refinement of
a source data model.
5.4.3 Checking applicability of data migration
Where the evolution represents a change in requirements, we should not expect the
new model to preserve the behavior or establish the invariant properties of the current
model; to the contrary, we should expect to find that different properties are now
constrained in different ways.
In this section, we present a technique for computing the pre- and postconditions
of a collection of evolution operations. This technique is important for two main
reasons. First, it allows us to specify an evolution pattern applied to an existing data model
design as a composition of evolution primitives, and then to check the legality of the
composition and calculate its overall precondition. Second, since we aim to check
data model evolution statically, this technique helps us decide, in advance, whether it is
meaningful or safe to perform a sequence of evolution operations, and thereby avoid
invoking a roll-back mechanism to restore the system to a previous state when
one evolution operation cannot be executed. We start from the basic assumption that
it is easier to check whether a sequence of model evolution steps and the corresponding data migration
can be successfully applied to a given data model before execution, and that such a check
is simpler if the composed evolution can be represented by a single pre-post pair.
As we stated previously, in our work, we have two main ways in which we can
compose evolution:
1. Grouping or combining (using sequential compositions).
2. Set iteration.
Chaining using a sequential composition (denoted by ;) is where a sequence of evolution primitives is applied one primitive after the other. For example, the following
chain adds attributes attr1 and attr2 to the class E.
addAttribute(E,attr1);
addAttribute(E,attr2)
Set iteration is where an evolution or an evolution chain is performed on a set of
model elements. For example, the following set iteration copies all the attributes of
the class E to the class D.

forAll attr : Attribute & owningClass(attr) = E . {
  addAttribute(D,attr) }
A chain of evolution primitives may be of any length, but we can simplify the
computation of its pre- and postconditions by observing that we need only solve
the problem for a chain of two primitives. This procedure can then be
applied iteratively to the remaining chain until the full pre- and postconditions have
been computed.
Below we present how the overall pre- and postcondition of a chain of evolution may
be calculated in three steps. First, we need to establish that the chain of evolution
primitives is legal, i.e. there is no logical contradiction between one evolution primitive
and the next. Second, since an evolution chain may contain a set iteration, we
explain how the pre- and postconditions of a set iteration can be computed. Finally,
we perform the computation for the overall chain of evolution.
Step 1: determining the ‘legality’ of a compound evolution

To provide a more precise description of evolution composition, we introduce the following notation:

state_i - the set of all variables in a particular system state i.
pre_primv(state_i) - the precondition of an evolution primitive evaluated on state i.
post_primv(state_i) - the postcondition of an evolution primitive evaluated on state i.
Initially, one might think that two evolution primitives can be composed simply
by combining their pre- and postconditions using the AND operator. In some cases
this is correct. In other cases, combining evolution primitives may introduce
logical contradictions that may not seem obvious. For example:
addClass(F);
addAttribute(F,attr)
Considering the semantics of the addClass and addAttribute primitives discussed in
Section 5.3, ANDing the preconditions of these two primitives produces, among other
precondition clauses:

F : CLASS - className & ...
F : className & ...
even though this chain seems perfectly correct. The source of this contradiction lies in
the fact that the two preconditions should be valid at different points in data model
evolution.
In other compositions, the chain may simply be illegal, even though the overall precondition may not indicate a contradiction, e.g.:
deleteClass(C);
addAttribute(C,attr)
ANDing the preconditions here gives C : className, even though this chain is illegal. Although the precondition for addAttribute is valid at the start of the chain,
deleteClass breaks it, so this composition of evolution primitives cannot be correct.
Therefore, for our chain of evolution to be legal, we require that every precondition
be valid at the point where it is applied.
If eval is a function of type eval : PREDICATE → BOOLEAN, we may write:

eval[pre_primv_i(state_i)] = true
If this condition is not satisfied then we have an inapplicable evolution step, and we
can conclude that the whole chain is inapplicable.

The second condition that must be satisfied is that if the postcondition of a primitive evaluates to true, then the precondition of the subsequent evolution
primitive must also evaluate to true, i.e.:

eval[post_primv_i(state_i, state_(i+1)) ⇒ pre_primv_(i+1)(state_(i+1))] = true
Therefore, before we continue with composing evolution primitives, assuming a chain
of two primitives, we need to analyze the consistency between the postcondition of the
first and the precondition of the second evolution primitive. With this analysis we detect
cases in which, after completion of the first evolution primitive, we cannot continue
with the second due to an unsatisfied precondition.
Step 2: computing pre- and postconditions for set iteration

As stated earlier, a set iteration has the following form:

forAll x : Element & Pred(x,...) . {
  primitive(x,...) }

where Element is some kind of model element and Pred is some predicate with
arguments drawn from the model elements or the evolution. If the set of x of type Element
satisfying Pred(x,...) is given as {x1,x2,...,xn}, and writing primv_k as a
shorthand for primitive(xk,...), then this iteration may be viewed as the following
chain: primv_1; primv_2; ...; primv_n.
To be able to calculate the overall pre- and postcondition, we must first establish
whether such an iteration is legal, i.e. that no contradiction exists between the postcondition of any evolution primitive and the precondition of a subsequent evolution
primitive within the set iteration. This can be done using the same criteria presented in Step 1 above.
Once the legality of the set iteration is established, we can compute its precondition
by logically ANDing the preconditions of its primitives.

Now that we are able to determine the legality of a chain of evolution and to express
the pre- and postcondition of a set iteration, we are in a position to compute the
overall pre- and postcondition of the chain.
Step 3: computing the overall pre- and postcondition

Assuming the chain is legal, its precondition is obtained by logically ANDing
pre_primv_1 with whatever parts of pre_primv_2 are not contradicted by post_primv_1.
If a contradiction arises in this evaluation, the chain is illegal and the calculation is
aborted. Once we establish that no logical contradiction exists, the overall precondition
of the chain is obtained by evaluating:

pre_primv_1 ∧ evaluate(post_primv_1, pre_primv_2) ∧ evaluate(post_primv_2, pre_primv_3) ∧ ...
In our approach, a postcondition is described by substitution statements. An expression E can be substituted for a free variable x (i.e. one that is not in the scope
of a quantification) in a predicate P by replacing all free occurrences of x with the
expression E, written P[E/x]. As well as single substitutions that replace a single variable in a predicate with an expression, we also use multiple substitutions, which
simultaneously replace a collection of variables with a corresponding collection of expressions, i.e. P[E,...,F / x,...,y]. Any machine variable not mentioned in the
postcondition of an evolution primitive is implicitly unaffected by the evolution.
The postcondition of a compound evolution is obtained by considering the overall
variable substitution described in the postconditions.
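As a small worked instance of this rewriting: under the substitution className := className - {C} (part of the postcondition of deleteClass), the predicate C : className is rewritten to C : (className - {C}), which is false; this is precisely how a chain such as the deleteClass(C); addAttribute(C,attr) example above is detected as illegal.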
An example of the application of this technique is given next.

Example

In this example we take a typical compound evolution that involves both chaining
and set iteration, and compute its pre- and postconditions. We use the inlineClass
operation, which was part of the data model evolution in our example at the end of
Section 4.3.3. Figure 5.2 depicts the overall effect of this operation. The semantics
of this operation is defined, in part, as:
inlineClass(srcClass, refClass, refProperty) =
  forAll attrib : ownedProperties[{refClass}] . {
    addAttribute(srcClass, attrib, ...);
    deleteAttribute(refClass, attrib) };
  deleteAssociation(assoName);
  deleteClass(refClass)
  ...

[Figure 5.2 shows the class inlining step: on the left, a class A with attribute a1 linked by an association ab to a class B with attributes b1 and b2; on the right, the resulting single class A with attributes a1, b1, and b2.]

Figure 5.2: Class inlining
Computing the pre- and postconditions of this compound evolution proceeds in several steps:

1. Computing the pre- and postcondition of the set iteration

This involves first rewriting the precondition of the deleteAttribute operation with
the postcondition of addAttribute:

evaluate([post addAttribute], [pre deleteAttribute]) =
  attrib : ownedProperties[{refClass}] & ...

and then ANDing this with the precondition for addAttribute(), which is:

attrib /: ownedProperties[{srcClass}]

so the final precondition for this chain is, in part:

attrib : ownedProperties[{refClass}] & ...
attrib /: ownedProperties[{srcClass}]

This causes no contradiction, so the chain within the set iteration is legal. Therefore, on every iteration, the precondition must be true, i.e. it must be valid for every
attribute owned by the class refClass. The postcondition of the set iteration body can
be computed by concatenating the substitution statements of addAttribute and
deleteAttribute, in part:
ownedProperties := ownedProperties \/ { srcClass |-> attrib } ||
ownedProperties := ownedProperties -  { refClass |-> attrib }

The iteration creates a new attrib in class srcClass each time and deletes the same
attrib from class refClass.
2. Computing the pre- and postconditions for the deleteAssociation() primitive

Here, we need to rewrite the precondition of deleteAssociation() with
the postcondition of the set iteration in step 1. However, since the postcondition of
the set iteration updates the machine variables relevant to the attribute attrib, while
the precondition of deleteAssociation() checks, among other things, whether the
deleted association exists in the data model, there is no relation between the
two and the rewriting required by this step is avoided.
3. Computing the pre- and postconditions for the deleteClass() primitive

The precondition of deleteClass() is specified as:

refClass : className

The postcondition of deleteClass() is specified, in part, as:

className       := className - {refClass}                  ||
superclass      := {refClass} <<| superclass |>> {refClass} ||
ownedProperties := {refClass} <<| ownedProperties           ||
...
4. Computing the pre- and postconditions for the overall chain

The precondition of deleteClass must be rewritten with the postcondition of
deleteAssociation(), and if there are remaining conjuncts, these must become part of the
precondition of the whole compound evolution. The precondition of deleteAssociation
constrains the association, memberEnds, and other association-related
functions and, within the context of this example, has no direct relation to
the variables updated by deleteClass. Therefore, the precondition remains the same
and does not need to be rewritten. The overall postcondition of the chain is
obtained by combining the postconditions of the set iteration, deleteAssociation,
and deleteClass as elaborated above. Accordingly, we conclude that the proposed
inlineClass evolution is applicable.
In other cases, the evolution proposed by class inlining may not be applicable.
For example, if the refClass (i.e. the class to be inlined) owned
a property that is an end of a bi-directional association, we would not expect the
following precondition of the deleteClass operation to be satisfied:

ownedProperties[{refClass}] /\ ran(opposite) = {}

That is, to delete refClass, we need to ensure that none of its owned properties
participates in an opposite relation.
Chapter 6

Generating Platform-Specific Data Migration Programs
In previous chapters, we have presented our approach, which combines UML with the B-method to facilitate the task of data model evolution. The result is a precise abstract
description of information system changes. This abstract description can be used
to reason about data model evolution and to induce the corresponding data migration,
moving data instances from one information system version to the next.

To generate an appropriate implementation of our data model evolution specifications, we require a mapping from our abstract model operations to operations
upon a specific, concrete platform: some of these operations will update metadata,
or features of the representation; others will be data operations implementing the
semantics outlined above in AMN. In practice, the kind of data that we might wish
to preserve across successive versions of an information system is likely to be stored
in a relational database.
The goal of this chapter is to show how the abstract specifications of data model
structural properties and data model changes can be transformed into a relational
model and then into an executable implementation. We have defined a set of formal
refinement rules that deal with concepts such as inheritance and associations;
these rules are illustrated in the first part of the chapter. In
the second part, we cover a subsequent refinement to an SQL
implementation. Both refinements comprise two sets of rules. The
first set deals with the refinement of data structure: introducing more concrete
variables or changing the type of existing abstract variables. The second set deals
with substitution refinement: rewriting the description of evolution operations in
terms of the more concrete variables introduced by the data structure refinement.
[Figure 6.1 depicts the chain of B specifications: the Evolution Metamodel abstract machine (built on the data metamodel) is related by a first refinement step (data structure and substitution refinement) to the Object-to-Relational B refinement machine, which REFINES it; a second refinement step (again data structure and substitution refinement) leads to the Data Migration Implementation, a B implementation machine that REFINES the Object-to-Relational machine and IMPORTS an SQL abstract machine.]

Figure 6.1: From abstract specifications to relational database implementation
Figure 6.1 shows the main elements of our B specifications. The abstract machine
for the Evolution Metamodel was presented in Chapter 5. This abstract machine
is refined into a more concrete specification in the form of the Object-to-Relational
refinement machine, which is presented in the first part of this chapter. In
the second part, we introduce the last component, the Data Migration
Implementation, which imports an abstract machine representing the metamodel of
SQL, the language we use to implement our data migration programs.

Each of the above refinement steps gives rise to proof obligations. While specification proofs, as outlined in Chapter 5, ensure that operations preserve
the invariant, refinement proofs ensure the correctness of a refined component with
respect to its abstract specification.
6.1 From an object data model to a relational data model
The increasing popularity of object models in data modeling [138] and the
prevalence of the relational model for data persistence created the need for object-to-relational mapping techniques. Using techniques similar to those reported
in [10] and [20], we add two contributions to previous work in this area. First, our
mapping rules are formally characterized in the B-method, a language with theorem-proving tool support, which enables us to verify the correctness of the refinement process.
Second, unlike [106] and [110], which have used AMN to describe refinement within
the context of information systems development, our main focus here is on describing
such rules within the context of information systems evolution and data migration.
The main difference in our refinement rules originates from the need to refine model
evolution operations into a corresponding and faithful relational representation and then
into executable SQL programs.
Refinement variables: ownedAttributes, ownedAssoEnds
Related abstract variable: ownedProperties
Linking invariant: the ownedProperties abstract variable is refined into two refinement variables, ownedAttributes and ownedAssoEnds, defined as disjoint subsets of ownedProperties.

1  ownedAttributes <: ownedProperties &
2  ownedAssoEnds   <: ownedProperties &
3  ownedAttributes \/ ownedAssoEnds = ownedProperties &
4  ran(ownedAttributes) /\ ran(ownedAssoEnds) = {} &
5  propertyType[union(ran(ownedAttributes))] = ran(primitiveType) &
6  propertyType[union(ran(ownedAssoEnds))]   = ran(classType)

Figure 6.2: Refining data model properties
In this section, we present the Object-to-Relational refinement machine; the complete machine can be found in Appendix C. Here, we describe the main refinement rules
according to which this machine was developed.

Our Object-to-Relational refinement follows the rules defined by [20], which derive a
relational database schema from an object model whose semantics is similar to ours.
The main idea of these rules is to reorganize an object model into a set of independent classes, so that object-oriented concepts such as inheritance and association are
transformed into corresponding concepts in the relational database domain.
6.1.1 Refining data model properties
In our object-relational refinement context, we make a distinction between attributes
and association ends. This distinction is necessary to facilitate the subsequent refinement of data model properties into SQL implementation features, where attributes
are mapped into columns typed by SQL basic data types (e.g. varchar and integer)
and association ends are mapped into foreign keys pointing to the primary keys of their
class types.
In the abstract machine, both attributes and association ends were represented
by ownedProperties: a single variable typed as a relation from CLASS, the set
that contains all class names, to PROPERTY, the set that contains all property names
(see Section 5.1.3). In refinement, the ownedProperties relation is refined into two
variables: ownedAttributes and ownedAssoEnds. Figure 6.2 shows how each of
these two variables is typed: as a disjoint subset of ownedProperties. The linking
invariant conjuncts in lines 5-6 ensure the typing of each variable by relating it
to primitiveType and classType, two typing functions defined in the abstract
machine (see Section 5.1.1).
Refinement variables: inheritedAttributes, inheritedAssoEnds, propertyClass
Related abstract variables: ownedProperties, owningClass
Linking invariant: the ownedProperties of each superclass in the data model is refined into inheritedAttributes and inheritedAssoEnds in all subclasses in the inheritance hierarchy; a property may be owned by multiple classes along the inheritance hierarchy.

1   inheritedAttributes : CLASS <-> PROPERTY &
2   inheritedAssoEnds   : CLASS <-> PROPERTY &
3   propertyClass       : PROPERTY <-> CLASS &
4   dom(inheritedAttributes) <: className &
5   ! cc.(cc : className &
6          cc /: dom(superclass) =>
7          inheritedAttributes[{cc}] = {}) &
8   ! cc.(cc : className &
9          cc : dom(superclass) =>
10         inheritedAttributes[{cc}] = ownedAttributes
11           [closure1(superclass)[{cc}]]) &
12  ...
13  propertyClass = (ownedAttributes \/ inheritedAttributes \/
14                   ownedAssoEnds   \/ inheritedAssoEnds) ~

Figure 6.3: Flattening data model inheritance
6.1.2 Flattening inheritance
Relational databases do not support inheritance [9]. Several solutions exist (see,
for example, [9, 20]). The approach we follow in refinement is that each subclass in
an inheritance hierarchy includes both its own attributes and association ends and
the attributes and association ends inherited from its superclasses. Accordingly, in our refinement we introduce two new variables: inheritedAttributes
and inheritedAssoEnds. The typing and linking invariants of these two variables
are shown in Figure 6.3.
Each variable is defined as a relation between CLASS and PROPERTY. We relate the refined variable inheritedAttributes to the abstract variables using three linking invariant predicates (Figure 6.3, lines 4-11). First, any class that has inherited attributes must be one of the data model classes (a subset of className). Second, top-level classes (those that do not have superclasses) are not expected to have any inherited attributes. Finally, the inherited attributes of a class in the data model are defined in terms of the owned attributes of any of that class's superclasses, using the closure1 transitive closure operator. The inheritedAssoEnds relation is similarly defined: as the set of all class-typed properties owned by any superclass in the inheritance hierarchy.
In addition, the variable propertyClass was introduced as a refinement of the abstract variable owningClass. In the abstract machine, owningClass was typed as a partial function mapping each named property to its owning class. As a result of flattening inheritance, this variable has been refined into propertyClass, which is a relation from PROPERTY to CLASS. Now, a property (either an attribute or an association end) may be owned by more than one class in the inheritance hierarchy.
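In relational terms, flattening means that the columns of a superclass are repeated in the table of every subclass. A minimal sketch for the company example used later in this chapter (column names and types are illustrative):

-- Person is abstract and receives no table of its own; its name
-- attribute reappears as a column in each subclass table
CREATE TABLE Employee (
    employee_id INT NOT NULL PRIMARY KEY,
    name VARCHAR(32),   -- inherited from Person
    salary INT          -- owned by Employee
);
CREATE TABLE Freelance (
    freelance_id INT NOT NULL PRIMARY KEY,
    name VARCHAR(32),   -- inherited from Person
    rate INT            -- owned by Freelance
);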
6.1.3 Introduction of keys
Refinement Variable: classKey
Related Abstract Variables: className, isAbstract
Description of Linking Invariant: a unique classKey is defined for each non-abstract class in the data model.

1  classKey : CLASS >+> NAT &
2  dom(classKey) <: className &
3  dom(classKey) = isAbstract~[{FALSE}] &
Figure 6.4: Introduction of class keys
In our AMN formalization, a UML class is represented by a name in CLASS: the set that contains all possible class names. Existing class names that are used in the data model are maintained in className, which is a subset of CLASS. The relational model is value-based, meaning that each tuple is identified by a key [55]; accordingly, each class must have a key. In our refinement machine, a key is represented by the variable classKey: a partial injective function from CLASS to NAT, the set of natural numbers, capturing the fact that each class key is unique to one class. The linking invariant (Figure 6.4, lines 2-3) states that each non-abstract class in the data model must have an identifying key.
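In the SQL refinement that follows (see Section 6.2), this classKey surfaces as the id column of the corresponding table, with a primary key enforcing the uniqueness that the injective typing captures here. A minimal, hypothetical sketch:

CREATE TABLE Employee (
    employee_id INT NOT NULL,    -- holds the identifying key values
    PRIMARY KEY (employee_id)    -- enforces uniqueness of the key
);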
6.1.4 Refining data model associations
In the abstract machine, an association is represented as a partial function from ASSOCIATION, the set of possible association names, to another partial function from CLASS to CLASS (see Section 5.1.4). In refinement, we map every data model association to a table. As such, we introduce associationTable as a refinement variable, whose typing and linking invariants are shown in Figure 6.5.
The associationTable variable is typed as a partial function from the ASSOCIATION set to another partial function on PROPERTY, representing the two ends of the association table. The linking invariant defines each associationTable in terms of the abstract association: first by relating the names of both variables (line 2), and then by relating the two ends to the two classes pointed to by the abstract association variable, using the propertyClass function (lines 3-7).
Refinement Variable: associationTable
Related Abstract Variable: association
Description of Linking Invariant: Every data model association is mapped to an association table. The association table consists of two ends owned by the classes participating in the association relationship.

1  associationTable : ASSOCIATION +-> (PROPERTY +-> PROPERTY) &
2  dom(association) = dom(associationTable) &
3  !(aa,ae1,ae2).(aa : dom(associationTable) &
4    ae1 : dom(associationTable(aa)) &
5    ae2 : ran(associationTable(aa)) =>
6    propertyClass[{ae1}] = dom(association(aa)) &
7    propertyClass[{ae2}] = ran(association(aa)))
Figure 6.5: Refining data model association
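Anticipating the SQL refinement of Section 6.2, an association table for, say, the assignment association between Employee and Department might eventually take a form such as the following (a sketch with illustrative names; the precise derivation is given later in this chapter):

CREATE TABLE assignment (
    assignment_id INT NOT NULL PRIMARY KEY,
    -- the two association ends, each referencing the table of the
    -- class pointed to by the abstract association
    emp INT,
    dept INT,
    FOREIGN KEY (emp) REFERENCES Employee (employee_id),
    FOREIGN KEY (dept) REFERENCES Department (department_id)
);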
This concludes our object model data structure refinement. With the introduction of the above refinement variables, our data model structure is aligned with the relational data model paradigm. This alignment takes us one step closer to our goal of generating data migration programs in a relational database. However, before discussing how such an implementation may be generated, we need to show how the description of our abstract model edits may be refined to update the new refinement variables.
6.1.5 Refinement of abstract machine operations
The introduction of refinement variables, as outlined in the previous section, has a direct impact on the description of our abstract model edits: each model edit needs to be refined. The refinement of each edit is realized by describing its behavior in terms of the more concrete variables introduced in refinement. At the same time, we need to ensure that the refined behavior of each edit is consistent with its behavior in the abstract machine. This consistency is ensured by the stated linking invariants that relate refinement variables to their counterparts in the abstract machine. In fact, the main purpose of the refinement proof activity is to demonstrate how this consistency is verified. In the following, we show parts of the refined descriptions of our model edits. The complete description of the refinement proof activity can be found in Appendix D.
Class addition and deletion
In refinement, the addClass operation is described in terms of its effect on the following refinement variables: classKey, inheritedAttributes, and inheritedAssoEnds.
1  addClass ( cName , isAbst , super ) =
2    IF isAbst = FALSE
3    THEN
4      ANY cKey
5      WHERE cKey : NAT - ran(classKey)
6      THEN
7        classKey := classKey \/ {cName |-> cKey}
8      END ||
9      inheritedAttributes := inheritedAttributes \/
10       ({cName} * (ownedAttributes[closure1(superclass)[{cName}]]))
11     ...
12   END ;
Table 6.1: Part of addClass operation refinement
Table 6.1 shows part of the refined description of the addClass operation. If the added class is not abstract, we assign it a fresh class key (lines 2-8) to enable it to persist data; this will be an important feature when refining classes to database tables in the subsequent refinement step.
As a result of flattening inheritance in this refinement step, we need to allow a newly introduced subclass to inherit the attributes and association ends of its superclasses along the inheritance hierarchy. Lines 9-10 show how this is done for inherited attributes: by mapping the new class to the set of ownedAttributes of all its superclasses, obtained by applying the transitive closure operator to the superclass relation for the new class. A similar technique is used for inheritedAssoEnds.
The refined description of the deleteClass operation follows a similar pattern to addClass, but with the opposite effect. One difference to note, though, relates to the update of the associationTable variable. Here, this variable is updated by collecting all associations in which the class named for deletion (or any of its subclasses) participates, and removing those associations from the associationTable variable, as shown in Table 6.2 (lines 4-11).
1  deleteClass ( cName ) =
2  ANY classAssos , subclassAssos
3  WHERE
4    classAssos = {asso | asso : ASSOCIATION &
5      ((dom(associationTable(asso)) \/
6        ran(associationTable(asso))) /\
7        ownedAssoEnds[{cName}]) /= {}} &
8    subclassAssos = ...
9  THEN
10   associationTable := (classAssos \/ subclassAssos) <<|
11     associationTable
12 END
Table 6.2: Part of deleteClass operation refinement
Attribute addition and deletion
The refined behavior of addAttribute and deleteAttribute is described in terms of their effect on the following refined variables: ownedAttributes, inheritedAttributes, and propertyClass. While the effect on ownedAttributes is straightforward (adding or subtracting the named attribute to/from its owning class), the effect on inheritedAttributes and propertyClass may require further explanation.
As shown in Table 6.3, to update inheritedAttributes, if the owning class of the new attribute, represented by the cName parameter, is a superclass, we identify all subclasses of this superclass using the transitive closure operator applied to the inverse of the superclass relation for the owning class. The resulting set of subclasses is mapped to the new attribute using the cartesian product operator.
The update of propertyClass specifies the owning classes of the new attribute after refinement. As a result of flattening inheritance, the added attribute is owned not only by its immediate owning class but also by all subclasses of the owning class. The deleteAttribute operation follows a similar pattern, with the reverse effect, and can be seen in Appendix C.
1  addAttribute ( cName , attrName , type , exp ) =
2  BEGIN
3    ownedAttributes := ownedAttributes \/ {cName |-> attrName} ||
4    inheritedAttributes := inheritedAttributes \/
5      closure1(superclass~)[{cName}] * {attrName} ||
6    propertyClass := propertyClass \/
7      {attrName} * closure1(superclass~)[{cName}] \/
8      {attrName |-> cName}
9  END ;
Table 6.3: Part of addAttribute operation refinement
Association addition and deletion
The refined behavior of addAssociation and deleteAssociation is described in terms of their effect on the following refined variables: associationTable, ownedAssoEnds,
inheritedAssoEnds, and propertyClass.
Table 6.4 shows part of the refined description of addAssociation. In refinement, creating an association updates the associationTable variable by adding the source and target association ends, represented by srcProp and tgtProp respectively, as the two ends of the table. These two ends are also mapped to their respective owning classes in the set of ownedAssoEnds. If either the source class or the target class is a superclass, we require adding the corresponding association end to the subclasses of that class. This is done by applying the transitive closure to the inverse of the superclass relation for the respective class, similar to the refined substitution of the addAttribute operation above. The refined description of deleteAssociation is similarly defined and can be found in Appendix C.
1  addAssociation (assoName , srcClass , srcProp ,
2      tgtClass , tgtProp , isComp , exp ) =
3  BEGIN
4    associationTable := associationTable \/
5      {assoName |-> {srcProp |-> tgtProp}} ||
6    ownedAssoEnds := ownedAssoEnds \/
7      {(srcClass |-> srcProp),
8       (tgtClass |-> tgtProp)} ||
9    inheritedAssoEnds := inheritedAssoEnds \/
10     closure1(superclass~)[{srcClass}] * {srcProp} \/
11     closure1(superclass~)[{tgtClass}] * {tgtProp} ||
12   ...
13 END ;
Table 6.4: Part of addAssociation operation refinement
[Class diagram omitted: the refined company data model. Classes Employee, Freelance, Department, and PersonalFile each carry a classKey : NAT attribute; the name attribute and worksFor association end of the abstract class Person are repeated in Employee and Freelance; the associations manages, assignment, worksFor, and info connect the classes.]
Figure 6.6: Refined Data Model
6.1.6 Example
To demonstrate our Object-to-Relational refinement rules, we use the company data model introduced in Chapter 4, Figure 4.5, with its corresponding instance model in Figure 4.6. Applying our refinement rules, we obtain the refined model shown in Figure 6.6. As the figure shows, all classes other than the Person class have acquired a classKey attribute to uniquely identify instances of each class, in preparation for data persistence in the relational model, as we will show in the subsequent refinement step. In addition, the Person class attributes (e.g. name) and association ends (e.g. worksFor) have been inherited by the Employee and Freelance classes. The instance model of the above refined data model, shown in Figure 6.7, is, to a great extent, similar to the instance model before refinement, with the exception of the classKey attribute, which is now persisted at the instance level and updated with the object identifier of the objects of each class. Using our refined data structure, this object-relational model can be represented, in part, as follows:
...
propertyClass = {(name |-> Person), (name |-> Employee),
    (name |-> Freelance), (worksFor |-> Freelance),
    (rate |-> Freelance), (salary |-> Employee),
    (worksFor |-> Person), (worksFor |-> Employee), ...}
[Object diagram omitted: instances 100:Employee (name = emp1, salary = 2000), 200:Employee (name = emp2, salary = 2100), 300:Employee (name = emp3, salary = 4000), 110:Freelance (name = flance1, rate = 20), 120:Freelance (name = flance2, rate = 210), 130:Freelance (name = flance3, rate = 400), 1000:Department (location = loc1), 2000:Department (location = loc2), and 10/20/30:PersonalFile (status = s/m/s); each instance stores its object identifier in classKey, and the instances are connected by worksFor, emps, info, and head links.]
Figure 6.7: Refined Instance Model
ownedAttributes = {(Person |-> name),
    (Department |-> location),
    (Freelance |-> rate),
    (PersonalFile |-> status), ...}
ownedAssociationEnds = {(Person |-> worksFor),
    (Department |-> employees),
    (PersonalFile |-> employee),
    (Employee |-> info), ...}
inheritedAttributes = {(Freelance |-> name), (Employee |-> name)}
inheritedAssoEnds = {(Freelance |-> worksFor), (Employee |-> worksFor)}
...
To show how the refined behavior of our model edits can be used to update the refinement variables as a result of updates in the abstract machine, assume that our original model has been updated with a new class named BankDetails, to hold the bank account information of each person working for the company. In addition, a new association named paymentInfo is added, with two association ends: account of type BankDetails and payee of type Person. Further, assume that a new attribute named lastPaymentDate is introduced in the Person class, to record the last date on which an employee or freelancer received a payment from the company.
These changes to the data model can be described using our model edits: addClass to add the new class, followed by addAssociation to add the new association together with its two association ends, and finally addAttribute to add the lastPaymentDate feature. The semantics of this update at the abstract machine level has already been described in Section 5.3. Here, we are interested in demonstrating the updates to the variables introduced as a result of refinement. This can be seen by enumerating the refinement variables that addClass, addAttribute, and addAssociation update, as shown below:
addClass(BankDetails) =
  ...
  classKey := classKey \/ {BankDetails |-> cKey}

addAttribute('Person', 'lastPaymentDate', ...) =
  ownedAttributes := ownedAttributes \/
    {Person |-> lastPaymentDate}
  propertyClass := propertyClass \/
    {(lastPaymentDate |-> Employee),
     (lastPaymentDate |-> Freelance),
     (lastPaymentDate |-> Person)}
  inheritedAttributes := inheritedAttributes \/
    {(lastPaymentDate |-> Employee),
     (lastPaymentDate |-> Freelance)}

addAssociation('paymentInfo', 'Person', 'account',
               'BankDetails', 'payee', ...) =
  associationTable := associationTable \/
    {paymentInfo |-> {account |-> payee}}
  ownedAssoEnds := ownedAssoEnds \/
    {(Person |-> account),
     (BankDetails |-> payee)}
  inheritedAssoEnds := inheritedAssoEnds \/
    {(Employee |-> account), (Freelance |-> account)}
6.2 Generating data migration programs
In this section we describe the second refinement step that we follow to generate SQL
implementation programs, corresponding to our data model evolution operations.
[Diagram omitted: the B Abstract Machine for the Evolution Metamodel is refined by the Object-to-Relational B Refinement Machine which, together with the B Abstract Machine for the SQL Metamodel (imported), is refined by the Data Migration B Implementation Machine.]
Figure 6.8: Main components of B specifications
Here, we show how the Object-to-Relational refinement, obtained in the previous refinement step, can be refined further into a data migration program in SQL. To achieve this goal, we perform two transformations. First, we transform our data model structural properties into equivalent structural properties in SQL. Second, we transform our model edits into corresponding statements in SQL.
Figure 6.8 shows an overview of the B specification elements involved in the refinement step performed in this section. For the sake of this refinement step, we require a description of the SQL language in the B-method. This description can be obtained by translating an SQL metamodel into a corresponding abstract machine in AMN, similar to the way we formalized UML in B (refer to Section 5.1). The main aspects of the AMN formalization of the SQL metamodel are explained in the following section. For a complete description of the generated SQL machine, please refer to Appendix C.
Given that the Object-to-Relational refinement was a refinement of the abstract evolution machine, if we can demonstrate that the generated SQL implementation is a refinement of the Object-to-Relational refinement, then we can conclude that the SQL implementation is, indeed, a refinement of the evolution abstract machine, based on the transitive nature of refinement [6].
6.2.1 Formalizing SQL metamodel in AMN
Structured Query Language (SQL) is currently supported in most relational database
management systems and is the focus of an active standardization process. In our
formalization, we use a subset of SQL-Foundations part of SQL-2003 syntax [84, 34].
1  tableName    <: TABLE &
2  tableColumns :  TABLE <-> COLUMN &
3  columnName   <: COLUMN &
4  columnType   :  COLUMN +-> Sql_TYPE &
5  parentTable  :  COLUMN <-> TABLE &
6  canBeNull    :  COLUMN +-> BOOL &
7  isID         :  COLUMN +-> BOOL &
8  isUnique     :  COLUMN +-> BOOL &
9  primaryKey   :  TABLE <-> COLUMN &
10 foreignKey   :  TABLE <-> (COLUMN --> TABLE) &
11 tuple        :  TABLE <-> (COLUMN +-> Sql_VALUE)
Table 6.5: Part of the formalization of SQL data structure in AMN
This part of the SQL standard defines the data structures and basic operations on SQL data. Below, we focus on conceptualizing the main features of that standard that are relevant to our selected UML data metamodel subset and to the implementation of our evolution primitives.
Table 6.5 shows the formalization of the SQL metamodel data structure in AMN. Tables in an SQL model are named elements. They are represented by tableName, typed as a subset of TABLE: the set of all possible table names. Table columns are represented by a relation from TABLE to COLUMN: the set of all possible column names. Each column in an SQL model has a name, represented as an element of the columnName set, and a type, represented by the columnType partial function. We use the given set Sql_TYPE to represent all SQL basic data types (e.g. varchar, integer, decimal). A column is related to its owning table using the parentTable relation. Other SQL column characteristics we define include isID, which specifies whether the named column acts as the primary key of its parent table; canBeNull, which specifies whether a column must be valued; and isUnique, which specifies whether a column value is unique in the table extension (described below).
A key in SQL can either be a primaryKey, defined by one or more table columns to uniquely identify table rows, or a foreignKey, defined by one or more table columns, each referring to a table in the SQL model.
SQL state denotes the extension part (set of instances) of an SQL model. We represent SQL state as a set of named tables, each of which is mapped to pairs of columns and values. We use Sql_VALUE to denote the set of possible SQL values of columns: a union over Sql_TYPE.
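For orientation, the following hedged sketch indicates how these column characteristics and key variables typically surface in concrete DDL (illustrative names; exact syntax varies by dialect):

CREATE TABLE Employee (
    employee_id INT NOT NULL,        -- canBeNull = FALSE, isID = TRUE
    name VARCHAR(32) UNIQUE,         -- isUnique = TRUE (illustrative)
    dept INT,
    PRIMARY KEY (employee_id),       -- an entry in primaryKey
    FOREIGN KEY (dept)               -- an entry in foreignKey
        REFERENCES Department (department_id)
);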
1  /***Basic table operations***/
2  addTable (tName) = ...
3  alterTable (tName, colName) =
4    PRE tName : tableName & colName : COLUMN
5    THEN
6      IF (tName |-> colName) /: tableColumns
7      THEN
8        tableColumns := tableColumns \/ {tName |-> colName}
9      ELSE
10       tableColumns := tableColumns - {tName |-> colName}
11     END
12   END ;
13 updateTable (tName) = ...
14 removeTable (tName) = ...

15 /***Basic column operations***/
16 add_id_Column (colName , tName , type) = ...
17 addColumn (colName , tName , type) = ...
18 removeColumn (tName , colName) = ...

19 /***Basic key operations***/
20 add_pk_Key (colName , tName) = ...
21 remove_pk_Key (tName) = ...
22 add_fk_Key (colName , tName1 , tName2) = ...
23 remove_fk_Key (tName) = ...

24 /***Basic value operations***/
25 setValue (tName , colName , initialValue) = ...
26 removeValue (tName) = ...
Table 6.6: Part of the formalization of SQL basic operations
The other aspect of the SQL metamodel that needs to be formalized in B is the basic behavioral properties, which are used to interpret our proposed abstract evolution operations. Table 6.6 shows the SQL basic operations formalized in B. These abstract operations consist of substitution statements acting on the variables representing the SQL data structure. For example, basic operations on tables include addTable, which adds a name to the existing set of table names; alterTable (lines 3-12), which adds a column to, or removes one from, a named table; and removeTable, which performs the reverse substitution of addTable. Similar operations are defined to manipulate columns, primary and foreign keys, and to set or remove table values. Other auxiliary operations are defined to facilitate the implementation of data model evolution operations in SQL. For example, the getColumns(tName) operation returns a sequence of column names specified for a particular table, and the getForgnKeyTables(tName) operation (Table 6.7, lines 3-15) returns a sequence of table names owning foreign keys that refer to a specified table.
1  /***SQL metamodel auxiliary operations***/
2  allTablecolumns <-- getColumns (tName) = ...
3  fkOwngTables <-- getForgnKeyTables (tName) =
4    PRE tName : tableName
5    THEN
6      ANY tables , result WHERE
7        tables <: TABLE &
8        tables = { ta | ta : TABLE & ta : dom(foreignKey) &
9          #co.(co : COLUMN & co |-> tName : foreignKey(ta))} &
10       result : iseq(TABLE) &
11       ran(result) = tables
12     THEN
13       fkOwngTables := result
14     END
15   END ;
Table 6.7: Part of the formalization of SQL auxiliary operations
6.2.2 Linking data model state to SQL state
To generate an SQL implementation that faithfully represents our abstract evolution operations, we need to map our data metamodel concepts into corresponding SQL concepts. Since our main focus is on the interpretation of evolution operations in SQL, we will assume a simple mapping of the data structures in the two domains. In our framework, the mapping between the two domains is performed using linking invariants. The main purpose of these invariants is to describe precisely how a concept in the object model domain (e.g. inheritedAttributes) is defined in a relational model. Figure 6.9 provides an overview of how Object-to-Relational model concepts are mapped to corresponding SQL model concepts. The table provides an informal description of the mapping rules, which are formally characterized, in part, in Table 6.8. To facilitate the specification of linking invariants, we introduce a number of mapping functions (e.g. propertyToColumn), each taking an object domain concept and returning a corresponding relational concept. Below, we present examples of linking invariants. A complete characterization of these invariants can be found in the Data Migration implementation machine in Appendix C.
In our SQL implementation, we require that each class or association in the data model is represented as a separate table in the SQL model.
- className, associationName -> Table: each non-abstract class and association is refined into a relational table.
- ownedAttributes, inheritedAttributes -> Column: attributes owned by a class or inherited from its superclasses are refined into columns in the table corresponding to the class.
- ownedAssociationEnd, inheritedAssociationEnd -> Column and foreign key: same as owned and inherited attributes, with the addition of a foreign key definition pointing to the table corresponding to the association end type.
- classKey -> Id column and primary key: the classKey of each non-abstract class is mapped to the id column of the corresponding table, on which the primary key is defined.
- extent -> tuple: the extent of a class is mapped to table tuples; extent object ids are mapped to values representing the table id columns.
- value -> tuple: each data model value is mapped to a tuple SQL value using two mapping functions: one mapping properties to corresponding columns and one mapping actual values to corresponding values.
- link -> tuple: each data model link is mapped to a tuple whose table is mapped to the association name and whose column values represent the table foreign keys.
Figure 6.9: Overview of linking invariants relating Object-to-Relational to SQL
The linking invariant (Table 6.8, lines 2-5) characterizes this fact. This linking invariant uses classToTable: a mapping function that takes a class and returns its corresponding table.
In the Object-to-Relational model, inheritedAttributes is defined as a relation between the CLASS set and the PROPERTY set. In the SQL model, these attributes are mapped to columns in all tables corresponding to subclasses of the attribute's owning class. In characterizing this linking invariant we use three mapping functions: classToTable, propertyToColumn, and tableHierarchy, which returns a sequence of tables corresponding to subclasses of a particular superclass.
Object-to-Relational association ends are mapped to SQL foreign keys (lines 12-22). This is characterized by stating that each association name is mapped to an association table (using the assoToTable mapping function), while each association end is mapped to a foreign key owned by a table corresponding to the association end's owning class.
Finally, the last linking invariant in Table 6.8 relates data model values to corresponding SQL tuple values. This is done by requiring that such a value exists in the tuples of all tables corresponding to the class owning the property that defines the value.
1  /* mapping classes to tables */
2  tableName <: classToTable[className] &
3  ! cl.(cl : className & cl : dom(classKey) => cl : dom(classToTable)) &
4  dom(tuple) =
5    classToTable[dom(extent)] \/ assoToTable[dom(link)] &
6
7  /* mapping inherited attributes to columns */
8  inheritedAttributes = { cc, att | cc : CLASS & att : PROPERTY &
9    classToTable(cc) : ran(tableHierarchy(cc)) &
10   propertyToColumn(att) : tableColumns[{classToTable(cc)}] } &
11
12 /* mapping association ends to foreign keys */
13 !(assoName , me1 , me2).(assoName : dom(memberEnds) &
14   me1 : dom(memberEnds(assoName)) &
15   me2 : ran(memberEnds(assoName)) =>
16   assoToTable(assoName) : dom(foreignKey) &
17   # ff.(ff : foreignKey[{assoToTable(assoName)}] &
18     propertyToColumn(me1) |-> classToTable(owningClass(me1)) : ff) &
19   # ff.(ff : foreignKey[{assoToTable(assoName)}] &
20     propertyToColumn(me2) |-> classToTable(owningClass(me2)) : ff) &
21   propertyToColumn(me1) |-> classToTable(owningClass(me1)) :
22     union(foreignKey[{assoToTable(assoName)}])) &
23
24 /* mapping property values to column values */
25 !(pp , oid).(pp : dom(value) &
26   oid : dom(value(pp)) =>
27   mapSqlValue(value(pp)(oid)) =
28     union(tuple[classToTable[propertyClass[{pp}]]])(propertyToColumn(pp)))
Table 6.8: Part of linking invariants relating Object-to-Relational to SQL
6.2.3 Example
The data model of our running example, which has been refined into the model shown in Figure 6.6, can be refined further into the SQL model shown in Figure 6.10. In particular, each (non-abstract) class or association in the refined data model is refined into a relational table. In our example, the abstract class Person does not have a corresponding table; other classes, such as Employee and Department, have corresponding tables, as shown in Figure 6.10. In addition, data model associations, such as info relating the Employee and PersonalFile classes, have been refined into separate tables, regardless of the multiplicities of their association ends.
[Diagram omitted: the refined data model of Figure 6.6 side by side with its SQL tables. Class tables Department (did, location), Employee (eid, name, salary), Freelance (fid, name, rate), and PersonalFile (pid, status) hold the class extents, while association tables worksFor, assignment, manages, and info hold the links as pairs of foreign key values (e.g. info pairs emp values e1..e3 with file values p1..p3).]
Figure 6.10: Mapping to SQL
Each of these tables has an id column on which a primary key is defined. For example, table Department has departmentID as its id column and primary key, and table PersonalFile has personalFileID as its id column and primary key. All data model attributes (owned or inherited) are refined into columns in the tables corresponding to their respective owning classes. For example, in table Department, the location column corresponds to the similarly named attribute of class Department in the data model; in table Employee, the name column corresponds to the similarly named attribute inherited from the Person class.
Data model association ends (owned or inherited) are also refined into columns in the tables corresponding to their respective owning classes. The difference here is that on such columns we define foreign keys pointing to the primary key of the tables corresponding to the association end's type class. For example, in table assignment, the worksFor column points to the primary key of the Department table; similarly, in table info, the column file points to the primary key of the PersonalFile table.
6.2.4 Implementation of evolution operations in SQL
As described in Chapter 2, an implementation is a particular type of B machine, typically used to refine abstract specifications or other refinement machines into more concrete representations, close to a specific language of executable code. As one of our main goals in this chapter is to generate executable data migration programs that correspond to our abstract evolution specifications (and their refined specifications in the Object-to-Relational model), using the B-method implementation mechanism is a natural step. Indeed, the formalization of SQL as an abstract machine and the characterization of linking invariants between the data model and SQL model states make us ready to start on this implementation step.
In this implementation, we perform the second refinement step outlined in Figure 6.1. To perform the data structure refinement, we use the linking invariants described in Section 6.2.2; these invariants are essential for verifying the correctness of the implementation. To perform the substitution refinement, we describe how our abstract evolution operations may be implemented using specific SQL operations. The correctness of this implementation is determined by discharging implementation proof obligations, which are similar to those generated for refinement. Appendix D reports on our proof activity.
Class addition and deletion
The operation addClass can be implemented in the SQL model as shown in Table 6.9. We add a table with a name corresponding to the class name. We then add an id column and designate it as the primary key of the new table. If the added class is a subclass, we need to collect the columns of all tables corresponding to its superclasses. This is done by an implementation loop whose condition is defined based on the size of the inh_Columns variable. This variable is populated by columnHierarchy: a mapping function that takes a class and returns a sequence of columns. Inside the loop, for each column, identified by the curr_column local variable, we perform a sequence of two steps: we first alter the new table and then add the identified column. To verify loop correctness, the predicate in the loop INVARIANT clause (lines 23-26) asserts that the id column on which the primary key of the new table was defined, together with the set of columns added so far by the loop, must always be a subset of the columns of the new table.
The implementation of deleteClass requires defining two implementation loops. The first loop de-links foreign keys pointing to the table corresponding to the class being deleted. We use the getForgnKeyTables query operation to obtain a sequence of tables owning these foreign keys; we then loop through the returned sequence of tables to remove the columns on which the foreign keys were defined. The second loop of the deleteClass operation (part of which is shown in Table 6.10) iterates through the columns of the table corresponding to the class to be deleted. Here, too, we use an SQL query operation, getColumns, that takes a table name and returns a sequence of columns.
1  addClass (cName , isAbst , super) =
2  BEGIN
3    VAR
4      tName, colName, colType, counter, curr_column
5    IN
6      tName := classToTable(cName);
7      colName := tName;
8      colType := mapClassKey(cName);
9      inh_Columns := columnHierarchy (super);
10     counter := 1 ;
11     curr_column := inh_Columns (counter);
12
13     addTable (tName);
14     add_id_Column (colName, tName, colType);
15     add_pk_Key (colName, tName);
16
17     WHILE
18       counter <= size (inh_Columns)
19     DO
20       alterTable (tName, curr_column);
21       addColumn (colName, tName, colType);
22       counter := counter + 1
23     INVARIANT
24       primaryKey [{tName}] \/
25       propertyToColumn [inh_Columns [1..counter]] <:
26       tableColumns [{tName}]
27     VARIANT
28       card (inh_Columns) - counter
29     END
30   END
31 END;
Table 6.9: Implementation of class addition
We iterate through the returned columns and, in every pass of the loop, alter the table by removing the column; the loop invariant is self-explanatory. After the second loop is over, we remove the primary key of the table corresponding to the deleted class and delete the entire table. The complete description of the implementation of this operation can be seen in Appendix C.
Attribute addition and deletion
The operation addAttribute can be implemented, as shown in Table 6.11, by altering the table that corresponds to the owning class of the new attribute. We then add a column with a name and a type corresponding to the name and type of the added attribute.
1  ...
2  VAR
3    allTableColumns, count_column
4  IN
5    allTableColumns := getColumns(classToTable(cName));
6    count_column := 1;
7    WHILE
8      count_column < size(allTableColumns)
9    DO
10     VAR curr_column
11     IN
12       curr_column := allTableColumns(count_column);
13       alterTable (tName);
14       removeColumn (tName, curr_column);
15       count_column := count_column + 1
16     END
17   INVARIANT
18     !(ta, col) . (ta : dom(tableColumns) &
19       col : tableColumns[{ta}] =>
20       col |-> ta : parentTable)
21   VARIANT
22     card (allTableColumns) - count_column
23   END
24 END
Table 6.10: Part of implementation of class deletion
In this implementation, the type mapping is done using the sqlType function, defined as an injection from TYPE (the given set of all types in the data model) to Sql_TYPE (the given set of all types in the SQL model).
The addAttribute operation may take an expression (denoted by exp in the addAttribute operation parameters), used to describe an initial value for the added attribute. To be able to instantiate the added attribute with the value of such an expression, we need to translate it into an equivalent SQL value expression. This is done using the translate function (line 9). The translated value is used to initialize the initialValue local variable; we then use the setValue operation to instantiate table tuples accordingly (line 13).
If the attribute being added belongs to a superclass, we need to map the added attribute to all tables in the SQL model which correspond to subclasses of the attribute's owning class in the data model. This can be achieved using a WHILE loop construct, which captures the tables in the SQL model corresponding to subclasses of a particular superclass in the data model.
1  addAttribute (cName , attrName , type , exp) =
2  BEGIN
3    VAR
4      tName , colName , colType , initialValue , ...
5    IN
6      tName := classToTable (cName) ;
7      colName := propertyToColumn (attrName) ;
8      colType := sqlType (type) ;
9      initialValue := translate (exp) ;
10     alterTable (tName , colName) ;
11     addColumn (colName , tName , colType) ;
12     updateTable (tName) ;
13     setValue (colName , initialValue)
14     ...
15   END
16 END ;
Table 6.11: Part of implementation of attribute addition
The loop terminates when the new attribute is mapped to a column in every table corresponding to a subclass of cName in the data model. This loop implementation can be seen in Appendix C, and its correctness is demonstrated in Appendix D as part of the proof activity.
The implementation of deleteAttribute follows the inverse of the implementation pattern described above for addAttribute, and can be seen in Appendix C.
Association addition and deletion
The addAssociation operation creates a table to store the links between objects of the source and target classes participating in an association relation. An important consideration in our implementation strategy is to store relationship information, such as associations, separately from the objects that participate in the relation. Accordingly, in the implementation of this operation, we create a table (with an id column and a primary key) and a pair of columns corresponding to the two association ends (denoted by srcProp and tgtProp in the addAssociation operation parameters). As with addAttribute, this implementation must be followed by updateTable and setValue operations that insert the values of the expression exp, passed as a parameter of the operation. If one of the association ends introduced in the addAssociation operation belongs to a superclass, we use a WHILE loop construct to update the tables corresponding to subclasses of that superclass.
1  addAssociation ( assoName , srcClass , srcProp ,
2      tgtClass , tgtProp , isComp , exp ) =
3  ...
4  inh_Tables_src := tableHierarchy (srcClass)
5  ...
6  VAR counter IN
7    counter := 1 ;
8    WHILE
9      counter <= size (inh_Tables_src)
10   DO
11     VAR current_table IN
12       current_table := inh_Tables_src(counter) ;
13
14       alterTable  (current_table , colName) ;
15       addColumn   (firstColumnName , current_table , firstColumnType);
16       add_fk_Key  (firstColumnName , current_table , tgtTable);
17       updateTable (current_table) ;
18       setValue    (current_table , firstColumnName , initialValue)
19       counter := counter + 1
20     END
21   INVARIANT
22     inh_Tables_src = tableHierarchy(srcClass) &
23     ...
24   VARIANT size (inh_Tables_src) - counter
25   END
26 END
27 ...
Table 6.12: Part of implementation of association addition
Table 6.12 shows part of the WHILE loop used to update the inherited tables corresponding to subclasses of the association source class, denoted by inh_Tables_src.
Applying the tableHierarchy mapping function to the srcClass parameter, we get a sequence of tables corresponding to subclasses of the source class (line 4). We use the WHILE loop to iterate through this sequence of tables. In every loop pass (lines 12-20), we alter each table in the sequence by adding a column corresponding to the source property, define a foreign key on the added column, and set an initial value if a default value expression is defined. Loop correctness can be verified based on the invariant predicate, part of which is stated in line 22. For example, the loop should not add or remove any of the inherited tables (i.e. those tables should remain the same before, during, and after the loop executes).
A similar WHILE loop is defined to update the tables corresponding to subclasses of the association target class. The complete description of the addAssociation operation and the implementation of the deleteAssociation operation can be found in Appendix C.
6.3 Generating SQL data migration programs
Having instantiated the SQL metamodel with particular values, based on the interpretation realized by our B implementation mechanism, we may define corresponding textual representations that are close to the instantiated SQL metamodel. These textual representations may take the form of templates that can be populated from values in the SQL metamodel. Below, we show how such SQL textual representation templates can be defined for each implemented evolution operation.
The basic idea is to apply a set of pre-defined textual templates over the SQL metamodel instantiated in the previous refinement step. Essentially, in this step we follow a template-based generation of the SQL statements required to build a data migration program. Templates are grouped depending on the kind of model evolution operation (addClass, deleteAttribute, etc.). The variable parts of templates are written between angle brackets; these variable parts are instantiated with concrete information from the given SQL metamodel to obtain the final statements. These statements constitute a data migration program.¹
Given an SQL metamodel properly instantiated with the interpretation of the addClass evolution operation outlined in the B implementation above, the following template can be used to generate the corresponding SQL statements:
CREATE TABLE <tName> (
  <tName_id> INT NOT NULL,
  PRIMARY KEY (<tName_id>),
  <columnName 1><columnType 1><columnConstraint 1>,
  ...,
  <columnName N><columnType N><columnConstraint N>
);
¹As SQL has different dialects implemented by different RDBMSs, the above templates have been instantiated and tested on Apache Derby [11], which is part of the Eclipse Data Tools Platform [79].
where <columnName 1>,...,<columnName N> are the column names obtained from the WHILE loop in the B implementation of addClass that collected the columns defined by the superclasses of the class; <columnType 1>,...,<columnType N> refer to the SQL types of the corresponding columns, properly mapped during B implementation; and <columnConstraint 1>,...,<columnConstraint N> refer to SQL constraints (e.g. UNIQUE, NOT NULL) obtained from the instantiated SQL abstract machine variables of the respective columns.
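By way of illustration, instantiating this template for the BankDetails class introduced in Section 6.1.6 (a top-level class, so the inherited-columns part is empty) might yield:

CREATE TABLE BankDetails (
  bankdetails_id INT NOT NULL,    -- id column added by add_id_Column
  PRIMARY KEY (bankdetails_id)    -- key added by add_pk_Key
);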
Similar to the way we defined the SQL textual template for the addClass operation, the SQL template corresponding to the deleteClass operation is defined below. The structure of this template corresponds to the B implementation steps used for the deleteClass operation; as such, we are able to use the values of SQL variables updated during the implementation to instantiate the variable elements of the proposed template:
ALTER TABLE <FK_OWNING_TABLE 1>;
DROP CONSTRAINT <FK 1>; DROP COLUMN <FK_COLUMN 1>;
...;
ALTER TABLE <FK_OWNING_TABLE N>;
DROP CONSTRAINT <FK N>; DROP COLUMN <FK_COLUMN N>;
ALTER TABLE <tName>;
DROP COLUMN <COLUMN 1>;
...;
ALTER TABLE <tName>;
DROP COLUMN <COLUMN N>;
ALTER TABLE <tName>;
DROP CONSTRAINT <PK>; DROP COLUMN <id_COLUMN>;
DROP TABLE <tName>;
In the above template, <FK_OWNING_TABLE 1>,...,<FK_OWNING_TABLE N> refer to the tables owning foreign keys pointing to the table to be deleted. Here, we use the values collected by the WHILE loop in the B implementation of the deleteClass operation and instantiate this template for all these tables and for the foreign keys that need to be removed, represented by the parameters <FK 1>,...,<FK N>. Although SQL syntax allows the columns on which these foreign keys are defined to remain in the table after the foreign keys are deleted, we take the position that such columns are subsequently deleted.
Since our ultimate goal here is to delete the table to which these foreign keys point, the values stored in these columns would lose their semantic validity.
In the subsequent statements of the above template, ALTER TABLE <tName> refers to the table corresponding to the class to be deleted. Before this table can be deleted, we first need to delete all of its defined columns, represented by the <COLUMN 1>,...,<COLUMN N> parameters. In addition to dropping all defined columns, we also need to drop the primary key of the table and the id column on which the primary key was defined, before we are, finally, able to drop the table itself.
The SQL template of the addAttribute operation can be specified as a pair of updates: first to the schema, and then to the rows of the table in question. In addition, we must consider altering not only the table where the corresponding column needs to be defined but also all tables corresponding to subclasses of the attribute's owning class. The identification of these tables has been done during B implementation; they now instantiate the proper elements of the SQL metamodel and can be used to instantiate the variable parameters of the template below:
ALTER TABLE <tName>;
ADD COLUMN <columnName><columnType>;
UPDATE <tName> SET [tuple = exp];
ALTER TABLE <tName 1>;
ADD COLUMN <columnName 1><columnType 1>;
UPDATE <tName 1> SET [tuple = exp];
...;
ALTER TABLE <tName N>;
ADD COLUMN <columnName N><columnType N>;
UPDATE <tName N> SET [tuple = exp];
In the first part of the above template, we modify the table corresponding to the added attribute's owning class by adding a column whose name and type correspond to the added attribute's name and type. We then use an update statement to update the relevant table rows using a translated default value expression. In the second part of the template, we add the new column and update the table values, following a similar pattern, for the tables identified as corresponding to subclasses of the new attribute's owning class, as outlined above.
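For instance, instantiating this template for the lastPaymentDate attribute of Section 6.1.6 could produce statements along the following lines; since Person is abstract and has no table of its own, only the subclass tables Employee and Freelance are altered (the DATE type and the commented updates are illustrative):

ALTER TABLE Employee ADD COLUMN lastPaymentDate DATE;
-- UPDATE Employee SET lastPaymentDate = <translated exp>;  (if a default expression was supplied)
ALTER TABLE Freelance ADD COLUMN lastPaymentDate DATE;
-- UPDATE Freelance SET lastPaymentDate = <translated exp>;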
The SQL template of the deleteAttribute operation is similarly defined. It represents a modification to the table corresponding to the owning class of the attribute to be deleted and a modification to all tables corresponding to subclasses of that class. Here, since the entire column is deleted, we do not need to apply any update to the values stored in the deleted column. The template for deleteAttribute is shown below:
ALTER TABLE <tName>;
DROP COLUMN <COLUMN>;
ALTER TABLE <tName 1>; DROP COLUMN <COLUMN 1>;
...;
ALTER TABLE <tName N>; DROP COLUMN <COLUMN N>;
As explained in the B implementation of addAssociation, an important consideration in our implementation strategy is that we create a table corresponding to every named association in the data model, regardless of the multiplicities of its two association ends. Similar to the tables generated for non-abstract classes, this table will have an id column and a primary key. However, given that we only allow binary associations, such a table will always have two further columns, each representing one association end. In addition, a foreign key constraint will be defined on each of these two columns, pointing to the primary key of the table corresponding to the type of the original association end. Accordingly, the SQL textual template representing this operation takes the form specified below:
CREATE TABLE <tName> (
  <tName_id> INT NOT NULL,
  PRIMARY KEY (<tName_id>)
);
ALTER TABLE <tName>;
ADD COLUMN <firstColumnName><firstColumnType>,
FOREIGN KEY <firstColumnName> REFERENCES <tgtTable>.(<tgtTable_id>);
UPDATE <tName> SET [tuple = exp];
ALTER TABLE <tName>;
ADD COLUMN <secondColumnName><secondColumnType>,
FOREIGN KEY <secondColumnName> REFERENCES <srcTable>.(<srcTable_id>);
UPDATE <tName> SET [tuple = exp];
ALTER TABLE <tName 1>;
ADD COLUMN <columnName 1><columnType 1>,
FOREIGN KEY <firstColumnName> REFERENCES <tgtTable>.(<tgtTable_id>);
UPDATE <tName 1> SET [tuple = exp];
...;
ALTER TABLE <tName N>;
ADD COLUMN <columnName 1><columnType 1>,
FOREIGN KEY <firstColumnName> REFERENCES <tgtTable>.(<tgtTable_id>);
UPDATE <tName N> SET [tuple = exp];
ALTER TABLE <tName 1>;
ADD COLUMN <columnName 2><columnType 2>,
FOREIGN KEY <secondColumnName> REFERENCES <srcTable>.(<srcTable_id>);
UPDATE <tName 1> SET [tuple = exp];
...;
ALTER TABLE <tName N>;
ADD COLUMN <columnName 2><columnType 2>,
FOREIGN KEY <secondColumnName> REFERENCES <srcTable>.(<srcTable_id>);
UPDATE <tName N> SET [tuple = exp]
The first part of the template creates an association table together with an id column and a primary key. Subsequently, this table is altered to hold the two columns corresponding to the two association ends, each with a foreign key definition. The UPDATE...SET statement is used to update the table extension with default values corresponding to the translated expression input parameter. In the subsequent parts of the template, we alter the tables belonging to the table hierarchy of the source or target table (i.e. those tables corresponding to classes related to the source or target class of the association through a subclass relationship), following a pattern similar to the one used in altering the association table.
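As a sketch, the opening part of this template instantiated for a hypothetical worksFor association between Employee and Department (illustrative names; the inherited-table parts follow the same pattern) might read:

CREATE TABLE worksFor (
    worksFor_id INT NOT NULL,
    CONSTRAINT worksFor_pk PRIMARY KEY (worksFor_id)
);
ALTER TABLE worksFor ADD COLUMN dept INT;
ALTER TABLE worksFor ADD CONSTRAINT worksFor_dept_fk
    FOREIGN KEY (dept) REFERENCES Department (department_id);
ALTER TABLE worksFor ADD COLUMN emp INT;
ALTER TABLE worksFor ADD CONSTRAINT worksFor_emp_fk
    FOREIGN KEY (emp) REFERENCES Employee (employee_id);
UPDATE worksFor SET dept = ..., emp = ...;  -- translated expression, if supplied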
The SQL template for the deleteAssociation operation takes the form specified below:

ALTER TABLE <tName>;
DROP CONSTRAINT <FK 1>; DROP COLUMN <firstColumnName>;
DROP CONSTRAINT <FK 2>; DROP COLUMN <secondColumnName>;
DROP CONSTRAINT <PK>; DROP COLUMN <id_COLUMN>;
DROP TABLE <tName>
In the above template, we start by altering the table corresponding to the named association. To be able to drop the two columns corresponding to the association ends,
we first drop the foreign key constraints defined on these columns. We then drop the
primary key constraint defined on the id column, before we drop the association table
itself.
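Instantiated for the hypothetical worksFor table sketched above, the template would unfold roughly as follows (exact constraint-dropping syntax varies by dialect; the constraint names assume they were recorded when the keys were created):

ALTER TABLE worksFor DROP CONSTRAINT worksFor_dept_fk;
ALTER TABLE worksFor DROP COLUMN dept;
ALTER TABLE worksFor DROP CONSTRAINT worksFor_emp_fk;
ALTER TABLE worksFor DROP COLUMN emp;
ALTER TABLE worksFor DROP CONSTRAINT worksFor_pk;
ALTER TABLE worksFor DROP COLUMN worksFor_id;
DROP TABLE worksFor;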
Compound evolution operations such as inlineClass can be implemented as a sequence of addAttribute operations, one for each of the attributes of the target class, followed by an update to insert the appropriate values, and then a deletion of the target class. Since the source class and target class tables are not directly linked, we need to perform an inner join with the table corresponding to the association relationship between the two classes, in order to select the appropriate records from the target class table columns before that table is dropped:
ALTER TABLE <srcClass>;
ADD COLUMN <srcClass_refClassAttribute>;
UPDATE <srcClass>
SET <srcClass>.<refClassAttribute> =
  (SELECT <refClass>.<refClassAttribute>
   FROM <refClass>
   INNER JOIN <assoName>
     ON <assoName.srcProp> = <refClass>.<refClass_idColumn>
   INNER JOIN <srcClass>
     ON <assoName.tgtProp> = <srcClass>.<srcClass_idColumn>);
<deleteClass <refClass> template>
Example
Within the context of our running example and following the SQL template mapping rules outlined above, we can generate SQL executable code to migrate data persisted under data model version 1 and its conformant instance model, represented by Figure 4.5 and Figure 4.6 respectively, so that the migrated data can be persisted under model version 2, represented by Figure 4.10. The corresponding SQL data migration code is shown below:
1  --SQL code generated for inlineClass(Employee, employee, PersonalFile);
2
3  alter table Employee add column maritalStatus char (32);
4  alter table Employee add column location char (32);
5
6  update Employee
7    set maritalStatus =
8      ( SELECT p.maritalStatus
9        FROM PersonalFile p
10       INNER JOIN info i
11         on i.info = p.personalfile_id
12       INNER JOIN Employee e
13         on i.employee = e.employee_id);
14
15 --similarly for updating location column
16 ...
17 alter table PersonalFile drop column maritalStatus;
18 alter table PersonalFile drop column location;
19 drop table PersonalFile;
20
21 --SQL code generated for addAttribute(Employee, seniority: String
22 --  [if self.age > 50 then 'senior' else 'junior']);
23
24 alter table Employee add column seniority char (32);
25 update Employee
26   set seniority = CASE
27     WHEN age > 50 then 'senior' else 'junior' end;
28
29
30 --SQL code generated for modifyAssociation(Department, manager, Employee, 1,1);
31 -- user input is required: provide values for null entries
32
33 alter table manages
34   alter column head not null;
35
36 --SQL code generated for extractClass(Department, Project, projects)
37
38 create table PROJECT (
39   project_id int not null GENERATED ALWAYS AS IDENTITY,
40   primary key (project_id),
41   location varchar(32)
42 );
43 insert into PROJECT (LOCATION)
44   select LOCATION
45   from DEPARTMENT d
46   where d.PROJECT is not null
1  create table projects (
2    projects_id int not null GENERATED ALWAYS AS IDENTITY,
3    primary key (projects_id),
4    department int,
5    foreign key (department) references department(department_id),
6    project int,
7    foreign key (project) references project(project_id)
8  );
9
10 -- updating the association table
11 insert into PROJECTS (department, project)
12   select department_id, project_id
13   from DEPARTMENT d, COMGTSYSTEM_V1.project p
14   where p.location = d.location
The generated code above is an instantiation of SQL templates which, in turn, instantiate our B Data Migration implementation. As a direct result of following the B refinement (and implementation) mechanism, we can guarantee that the code above, representing the SQL interpretation of an evolution model, preserves the consistency constraints at the object model level. Hence, it ensures that the corresponding data will be migrated in a way that preserves the invariants of the new data model.
We may observe that the SQL implementation of modifyAssociation (lines 30-34) would generate an error: the column named head in table manages cannot be constrained to be non-nullable unless all entries in the table meet the new constraint. This requires user input to assign appropriate values to existing null-valued records. Such an error should not come as a surprise to the designer, as it was highlighted early, while providing the specification of the evolution at the data model level (see the end of Section 4.4).
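One hedged way for the designer to discharge this precondition manually, before re-applying the generated statement, is to supply values for the offending rows first (the default value here is a placeholder):

-- supply a designer-chosen value for the null entries
UPDATE manages SET head = <default_head> WHERE head IS NULL;
-- only then can the column be constrained
ALTER TABLE manages ALTER COLUMN head NOT NULL;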
In practice, non-trivial preconditions for the migration are more likely to arise out of changes to multiplicities, to value types or ranges, or to constraints associated with model elements; in such cases it is entirely possible that the precondition for the evolution sequence would not hold for the existing data. An important value of our approach is that such errors can be highlighted at an early design stage, prior to committing implementation resources.
Chapter 7
Discussion
The potential impact of Model-Driven Engineering (MDE) is considerable: the ability
to generate an implementation directly from an abstract model can greatly contribute
to system development efficiency. Within the domain of Information Systems (IS),
the value of this approach is greatly increased if data held in an existing version of a
system can be migrated automatically to a subsequent version.
However, existing work on model-driven approaches has been focused upon the
model: addressing the design of domain-specific modeling languages, the development
of model transformation techniques, and the generation of implementation code. Little work has been done formalizing and automating the process of data migration. To
be able to produce a new version of the system, quickly and easily, simply by changing
the model, has significant benefits for developers and customers alike. These benefits
are sharply reduced if changes to the model may have unpredictable consequences for
the existing data.
In this dissertation we have outlined a possible solution: capturing the changes to data models using a language of model operations, mapping each operation to a formal specification of the corresponding data transformation, checking operations for consistency with respect to the model semantics and the existing data, and, for a specific platform, automating the process of implementation. By decoupling data instances and data models, our approach reduces the complexity of the data migration activity whilst also providing an integrated, standard, metamodel-based development method, aligned with the MDA paradigm.
7.1 Research contributions
The contributions of this dissertation can be summarized as follows:
7.1.1 Modeling evolution
To define a data model evolution language, we need to identify a modeling language and the model elements that can be evolved. Accordingly, our first step towards defining our proposed evolution language was to characterize a data metamodel as a subset of the UML metamodel. Inspired by the UML classification, we differentiated between two abstraction levels in our data metamodel: one level is used to describe model-level concepts, and the other to describe instance-level concepts. This differentiation allowed us to define rules of consistency between the two abstraction levels, from both syntactic and semantic perspectives. We then showed how a model evolution language (in the form of a metamodel) may be derived for our characterized UML subset. Our main focus next was on the precise definition of the evolution language we had derived.
7.1.2 Precise modeling of data model evolution
In this dissertation, we showed how information systems evolution challenges can be
addressed through a formal, model-driven approach. Since our main motivation is to
develop a model-driven approach, one challenge we initially faced when we started
using B was our inability to integrate B into a model-driven development process.
The main reason of this difficulty was the lack of a B metamodel, which can be used
to generate B models. We consider the B metamodel we described in Chapter 3 an
important contribution that enables the integration of B into a model-driven chain
of development.
As reported in [4] and explained in Chapter 5 of this dissertation, we used the B-method Abstract Machine Notation (AMN) to assign semantics to data model elements. An essential aspect of this formal semantics was the characterization of consistency conditions at the syntactic and the semantic levels of abstraction. This characterization was important because it gave us a notion of correctness for our model evolution operations. With this formalization, we were able to verify data model consistency mechanically, using the theorem provers of the B-method. To give precise semantics to our model evolution operations, we used the B-method Generalized Substitution Language (GSL). This formalization gave us the ability to precisely specify both our primitive and compound model edits.
7.1.3
Predicting consequences of data model evolutionary
changes
Our proposed language of model evolution, properly mapped to a formal semantics
in the B-method, can be used for the analysis of proposed evolutionary changes to data
models.
One of the most important consequences to consider is consistency preservation:
whether the evolution would violate data model consistency. In this work, we presented consistency-preservation arguments in terms of a number of metamodel consistency constraints that may be violated during a data model evolution. Using the B-method proof tool, we were able to identify which constraints could be violated by each individual model edit. Subsequently, we modified the preconditions or the substitutions of these edits to avoid any consistency violation. As a guarantee of consistency preservation, all of our model edits have been proved to preserve the stated metamodel consistency constraints.
In addition, as we reported in [5], using our formalization and the B proof tool,
we showed that an evolved data model can be a refactoring of a source data model, provided that one essential criterion is met: behavior preservation. As explained in Chapter 5, this criterion means that changes to the source model affect only the data structure, without changing any behavioral properties of the model. Since such changes account for many kinds of data model evolutionary changes that are often claimed to be refactorings without a formal argument [10], we showed how the refinement proof of the B-method, applied to two versions of a data model, can be used to provide a formal argument for this criterion.
Moreover, using the supporting framework of the B-method, we showed how the applicability of a sequence of model evolution steps may be determined in advance, and used to check that a proposed evolution will indeed preserve model consistency and data
integrity. This applicability information can be fed back to system designers during
the modeling activity, allowing them to see in advance how the changes they are
proposing would affect the data in the existing system.
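As a hypothetical illustration (the table and attribute names are ours, not output of our tool chain), suppose a proposed evolution step narrows the range of an attribute age to 0..150. On a relational platform, the applicability check for this step reduces to a query whose results can be reported back to the designer:

    -- Rows returned here are exactly the existing instances for
    -- which the proposed evolution step would not be applicable.
    SELECT id, age
    FROM   Person
    WHERE  age IS NULL OR age < 0 OR age > 150;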
7.1.4
Generation of correct data migration programs
Using the B-method refinement mechanism, we showed how, by applying two successive transformations, our abstract evolution specifications can be translated into an application in an executable language such as Structured Query Language (SQL),
which is the implementation language we have chosen for describing data migration
implementations. As a product of formal refinement, the generated code is guaranteed
to preserve various consistency and integrity notions represented by the data model.
Our refinement approach ensured that the generated SQL implementation preserves two kinds of constraints. First, constraints imposed by the data modeling language (e.g. inheritance constraints). Second, constraints defined by the data model
and used to impose business rules. In our framework, both kinds of constraints are
defined as invariants in the abstract data model. As the abstract model is formally
refined into a relational model and then to an SQL implementation, we are certain
that these invariants are preserved.
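To give a flavour of the result, the following is a hypothetical sketch of the kind of migration step that may be generated for an evolution that adds a mandatory, constrained attribute (the table and column names are illustrative, and the ALTER TABLE syntax varies between SQL dialects):

    -- Add a mandatory attribute 'status' to class Account, using the
    -- default value supplied by the evolution operation; the CHECK
    -- constraint realizes the corresponding model invariant.
    BEGIN;
    ALTER TABLE Account ADD COLUMN status VARCHAR(16);
    UPDATE Account SET status = 'ACTIVE';
    ALTER TABLE Account ALTER COLUMN status SET NOT NULL;
    ALTER TABLE Account
      ADD CONSTRAINT account_status_ck
      CHECK (status IN ('ACTIVE', 'CLOSED'));
    COMMIT;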
7.2
Genericity of the approach
In this section, we show that our approach can be generic in the sense that it can be
used to treat other classical data management problems such as data integration and
data warehousing, independent of the modeling environment or the database context.
This is possible because, as we explained in Section 2.1, similar modeling concepts
are used in most data modeling approaches, such as UML [124], ER [36], and ORM [75].
Thus, an implementation that uses a representation of abstract models that includes
most of those concepts should be applicable to all such environments. We also show how our approach is aligned with some recent advances in the database field that depend on conceptual representations, such as Ontology-Based Databases (OBDB) and ontology evolution.
The basic idea underlying the genericity of our approach is to use abstract models to describe the main artifacts involved in the problem domain, and to manipulate these abstract models towards a solution. This manipulation usually requires an explicit representation of mappings, which describe how two models are related to each other. This mapping representation can then be used to create a model from another model, modify an existing model, or generate code, in the same way that we defined an abstract evolution model between a source and a target data model to guide data migration.
For the purpose of this section, the exact choice of model representation is not
important. However, there are several technical requirements on the representation
of abstract models, on which the genericity of our approach depends:
1. We expect the main artifacts of the problem domain to be represented in well-formed conceptual models. If this is not the case, a reverse engineering step
needs to be performed as a pre-requisite, using an approach like [30]. We also
require the expressiveness of the representation of models to be comparable
to that of UML or ER models. That is, objects can have attributes (i.e., properties), and can be related by associations (i.e., relationships with no special
semantics).
2. We expect objects, properties and relationships to have types. Thus, there
are (at least) three meta-levels in the picture. Using conventional metamodel
terminology, we have: models; metamodels, which consist of the type definitions
for the objects of models; and the meta-metamodel, which is the representation
language in which models and metamodels are expressed.
3. A model must contain a set of objects, each of which has an identity. By
requiring that objects have identity, we can define transformations between
models in terms of mappings between objects or combinations of objects.
Applications
Data integration.
Data integration is a fundamental task to enable the inter-
operability of information systems. Data integration involves combining data from
different sources and providing users with a unified view of this data [15]. Data integration is a complex and time-consuming process, because it involves various distributed data sources and numerous stakeholders (e.g. data proponents). In particular, data sources that were generally designed for different purposes make the development of a unified data repository a challenging task. Several approaches have been proposed to overcome the challenges in the design of data
integration, see [135] for a survey.
To perform data integration, it has to be determined how to select, integrate and
transform the information stored in local data sources into a global data store. During
the formulation of the integration mapping, possible integration conflicts have to be
recognized and eliminated, such as schema-level conflicts (e.g. different attribute ranges).
Our approach can be generalized to address schema-level conflicts concerning semantic and structural heterogeneity. If the schema of each data store participating
in the integration can be regarded as a model and described in a well-formed abstract representation, we can express the integration mapping between these abstract
models on the basis of our evolution metamodel, independent of the modeling language or modeling constructs involved. This abstract integration mapping can then
be used as a basis for generating a working implementation of data integration using
model-to-text transformation.
Figure 7.1: Overview of the data integration problem from the perspective of our approach (local data stores 1..N are related by mappings to an integrated data store G, and an integration transformation is generated from these mappings)
An overview of how our approach may be applied to the data integration problem can be seen in Figure 7.1. Assume that some local data schemata (Schema S1 and Schema SN) are to be integrated into a common global schema (Schema SG). First, both
local and global schemata need to be represented in a common modeling language.
Afterwards, for each representation of a local schema, a mapping onto the global
schema is defined. Based on these mappings, the necessary data access and data integration procedures can be generated using transformation techniques from the model-driven software domain, following the main principles of the approach we proposed
in this thesis.
With our approach, various target platforms can be supported. In contrast to
approaches tied to a specific data model, such as [96] and [104], any format of data transformation statements, such as SQL or Java, can be generated by our method. Also,
various kinds of data sources (e.g. relational databases, semi-structured information
sources or flat files) can be integrated (assuming that these artifacts are represented
in abstract models and that our approach is interfaced to external schema matching
tools that support schema mapping).
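As a hypothetical sketch of such a generated implementation (all table and column names are illustrative), an integration mapping from two local customer stores into a global store might be realized as:

    -- Populate the integrated store from two local stores, resolving
    -- a simple schema-level conflict (different column names) and
    -- prefixing keys so that identities remain disjoint across sources.
    INSERT INTO global_customer (id, name, country)
    SELECT 's1-' || CAST(id AS VARCHAR(20)), name, country
    FROM   store1_customer
    UNION ALL
    SELECT 's2-' || CAST(cust_no AS VARCHAR(20)), cust_name, country_code
    FROM   store2_customer;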
Data warehousing.
Within a data warehousing scenario, ETL (Extraction-Transformation-Loading) processes are responsible for the extraction of data from
heterogeneous operational data sources, their transformation (conversion, cleaning,
normalization, etc.) and their loading into data warehouses [81]. As such, ETL
processes are a key component of data warehousing because incorrect or misleading
data will produce wrong business decisions, and therefore, a correct design of these
processes at the early stages of a data warehousing project is absolutely necessary. Despite the importance of designing the mapping of the data sources to the data warehousing repositories, along with any necessary constraints and transformations, there are few models that can be used by designers to this end [145, 159].
Figure 7.2: Typical data warehouse architecture, based on [91] (data sources feed an ETL process that loads a data warehouse repository, which in turn supports data mining, reports, and ad-hoc analysis)
Figure 7.2 shows a typical data warehouse architecture based on [91]. Data from
the operational data stores may be specified in different schemas and have to be
extracted, transformed and loaded into a common data warehouse repository. In an
ETL process, the data extracted from source systems passes through a sequence of transformations before it is loaded into a data warehouse. The set of source
systems that contribute data to a data warehouse is likely to vary from standalone
spreadsheets to mainframe-based systems.
The model-driven approach we proposed in this thesis can be generalized to model
different aspects of a data warehouse architecture such as operational data sources,
the target data warehouse schema and ETL processes in an integrated manner by
using an abstract modeling notation. In particular, the static structure concepts of our proposed data modeling language can be used to represent various aspects of source data stores and the target data warehouse repository at the conceptual level.
In addition, our abstract representation of evolutionary changes from a source
data model to a target data model may be used to model the relationship between a
schema representing a source operational data store and another schema representing
a data warehouse. More specifically, the most common ETL processes, such as those
mentioned in [103], can be mapped into evolutionary steps in our abstract evolution
representation, irrespective of the modeling notation employed. This can provide
the necessary mechanisms for an easy and quick specification of common operations
required in ETL processes.
During the integration process from data sources into the data warehouse, source
data may undergo a series of transformations, which may vary from simple algebraic
operations or aggregations to complex procedures. In our approach, the designer can decompose a long and complex transformation process into small, simple parts, represented by means of an evolution model that is a materialization of evolutionary steps.
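For illustration only (a hypothetical sketch; the table names and the cleaning rule are assumptions), such a decomposition might materialize a composite ETL transformation as a short sequence of evolutionary steps:

    -- Extract: copy the relevant rows into a staging table.
    INSERT INTO stg_sales (sale_date, product_id, amount)
    SELECT sale_date, product_id, amount FROM src_sales;

    -- Clean: discard rows that would violate warehouse invariants.
    DELETE FROM stg_sales WHERE amount IS NULL OR amount < 0;

    -- Load: aggregate the cleaned data into the warehouse fact table.
    INSERT INTO fact_daily_sales (sale_date, product_id, total_amount)
    SELECT sale_date, product_id, SUM(amount)
    FROM   stg_sales
    GROUP  BY sale_date, product_id;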
Therefore, our approach helps reduce the development time of a data warehouse, facilitates the management and administration of data warehouse repositories, and allows the designer to perform dependency analysis (i.e. to estimate the impact of a change in
the data sources on the global data warehouse schema).
Ontology evolution.
Gruber [72] characterizes an ontology as the explicit specification of a conceptualization of a domain. While there are different kinds of ontologies,
they typically provide a shared/controlled vocabulary that is used to model a domain of interest using concepts with properties and relationships. In the recent past,
such ontologies have been increasingly used in different domains. In the database
area, ontologies have been used to facilitate a number of applications such as data
exchange and integration [53]. Furthermore, an Ontology-Based Database (OBDB) is a database in which an ontology and its instances are stored together [16]. Several OBDBs have been developed that take different approaches to storing and managing the ontologies and their instances, for example [48].
Ontologies are not static but are frequently evolved to incorporate the newest
knowledge of a domain or to adapt to changing application requirements. Ontology
providers usually do not know which applications or users utilize their ontology. Supporting different ontology versions is therefore an important way to provide stability for
ontology applications.
In the Model-Driven Semantic Web (MDSW) [130], ontologies are represented as models derived from the Ontology Definition Metamodel (ODM) [123]. The ODM is an Object Management Group (OMG) specification that makes
the concepts of Model-Driven Architecture (MDA) applicable to the engineering of
ontologies. This allows many problems of ontology engineering to be addressed by
model-driven engineering technologies.
In this section, we focus on the problem of ontology evolution and show how elements of the framework we proposed in this thesis can be used to address aspects of this problem. We analyze the ontology evolution process based on our model-driven approach, and present a transformation-based conceptual framework for ontology evolution. The framework uses the basic ingredients of our approach: models and transformations.
Figure 7.3: Overview of ontology evolution, from the perspective of our approach (within the Model-Driven Architecture (MDA) and the Ontology Definition Metamodel (ODM), a source ontology model and its instances evolve into a target ontology model and instances through six phases: capturing, representation, semantics of change, implementation, propagation, and validation)
Figure 7.3 shows an overview of ontology evolution from the perspective of our
approach. Based on [150], the ontology evolution process comprises six phases: (1) change capturing, (2) change representation, (3) semantics of change, (4) change implementation, (5) change propagation, and (6) change validation.
The model-driven approach we proposed in this thesis can be generalized to support the main aspects of the above ontology evolution processes. Capturing changes
can typically be achieved using one of two approaches. In the first approach, ontology changes are explicitly specified and a new ontology version is provided. This is,
in essence, the model evolution approach we outlined in this thesis. In the second
approach, an ontology difference algorithm is used to determine an evolution mapping between two input ontology versions [118]. The result of this comparison can be
mapped to a set of ontology changes. Change representation mainly refers to the granularity of changes. Our approach can support a set of simple and complex ontology
changes. For example, many of the ontology changes classified in [61] can be mapped to
our primitive and composite evolution operations. An important consideration in the
semantics of ontology change is to identify potential problems (inconsistencies) that
the intended changes can introduce within the ontology. For example, the deletion
of a concept C impacts its children and instances. Such a concern can be addressed
on the basis of our consistency preservation mechanism that we outlined in Chapter
4. When the ontology is modified, the propagation phase requires ontology instances
to be changed to preserve consistency with the ontology. In line with our proposed
induced migration approach, this can be achieved by using the change of the ontology
to guide the modification of ontology instances.
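For instance (a hypothetical relational encoding of an OBDB, and one of several possible propagation policies), deleting a concept and re-classifying its instances to the parent concept could be induced as:

    -- Hypothetical tables: concept(id, parent_id) and
    -- instance(id, concept_id). When concept 42 is deleted, its
    -- instances are re-classified to its parent, keeping the
    -- instances consistent with the evolved ontology.
    UPDATE instance
    SET    concept_id = (SELECT parent_id FROM concept WHERE id = 42)
    WHERE  concept_id = 42;
    DELETE FROM concept WHERE id = 42;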
Within the framework of [150], our approach does not support the change implementation or change validation phases. In the former phase, the user is presented with a number of change implications and asked to choose one. The latter enables analysis of performed changes and undoing them at the user's request. The semantics of our
evolution operations are determined a priori and our approach does not include an
‘undo’ capability.
7.3
Comparison with related work
In the model-driven engineering literature there are many related and relevant approaches.
Model weaving [57, 56] can be used to establish semantic links between models. These
semantic links can be collected in a weaving model that is defined based on an extensible weaving metamodel and can be used by a model transformation tool as an
input to a model transformation process to translate source models into target models based upon weaving links. Model weaving is of immediate relevance to our work. A basic language of model operations can be formulated as an extension to the weaving metamodel. At an early stage of our work, we investigated this idea in more detail, as reported in [1]. However, following deeper investigation, the approach proved insufficient for our purposes, as it does not support conditional mapping
between model elements; nor does it support constraint specifications.
While we might suppose that a data model evolution can be represented as a
sequence of model operations obtained automatically from a modeling tool, techniques for model comparison and difference representation, such as [94], [70] and [80], can also be relevant in the context of our work. In general, although model comparison and difference representation approaches address a similar problem (structural evolution of models), they tend to limit their scope to the structural elements of a model, with no consideration of integrity constraints or well-formedness rules. In addition, these
approaches are generic and do not focus on requirements specific to information systems evolution, such as data migration.
Metamodel evolution and model co-evolution approaches such as [78], [32], [160] and [37] address the problem of adapting (migrating) models that conform to an older version of a metamodel to a newer version of that metamodel. In close relation to our approach, [37] proposed to represent metamodel changes as difference models, conforming to a difference metamodel, in order to identify semi-automated countermeasures for co-evolving the corresponding models. While metamodel evolution approaches share the same aim as our work (reducing manual migration activities), they operate at a different abstraction level (migration of M1 model elements upon M2 model changes) and focus mainly on the structural evolution of generic metamodels, where changes related to the integrity constraints of data models are neither classified nor explicitly considered.
The generation of B formal specifications from UML diagrams has been investigated in [98, 157, 99, 33]. In particular, the approach of [33] is supported by the U2B tool, where only the insert operation is generated for associations; operations on attributes are not considered. Moreover, because the domain considered is reactive systems, the semantics attached to UML diagrams is rather different from ours. The approach of [99] is supported by the ArgoUML+B tool. Neither of these two approaches includes the generation of code. [98] developed an approach, supported by the UML-RSDS tool, to generate B, Java and SMV specifications from a subset of UML diagrams comprising class diagrams involving inheritance, state diagrams, and constraints expressed in OCL. The rules used to translate class diagrams are similar to ours. The generation of SMV specifications permits the detection of some intra- and inter-diagram inconsistencies; however, no details are given about the formalism or the correctness of the generated code. Although these and other approaches provided solid foundations for showing how a formal method can be used to assign semantics to a graphical notation or modeling language, contrary to our approach, their main focus was on initial system development, rather than on system maintenance or evolution and the essential requirement of moving software artifacts (such as data) from one system version to the next.
7.4
Limitations and future work
7.4.1
Feedback generation
Following a proof-based approach during software design has been debated in the literature. While some authors following this approach report positive experience in the domain of database design [146], model-driven engineering [109] and other domains [8], others find such techniques neither feasible nor justifiable [66], [161], [63], considering the effort and time involved. In our experience, using a proof-based
approach as part of the design process proved useful, for two main reasons. First, the proof-based technique helped us complete the specifications of our model edits. As a result of this activity, we can guarantee that a conformant source data model can be evolved, using our model edits, into a conformant target data model. Second, given that our approach is specified at the metamodel level of design, the time and effort involved can be justified, since we can generically use these specifications to address the evolution of any model at a lower abstraction level that conforms to our data metamodel; i.e. proofs are performed once and utilized many times.
However, the approach we presented in this dissertation is a one-way approach: our model transformation rules translate from our data and evolution metamodels to B, not the other way around. When the type checker or the prover of Atelier B finds an error in the specifications, the designer must be able to understand the B specifications and then has to search the evolution model to find the error. Here, we assume that designers are familiar with such formal languages and are able to interpret messages generated by a proof tool. This assumption may not be valid in all cases, as designers familiar with mathematical domains may be a minority. For the proof tool to be of value to a designer, as future work we plan to complete the feedback loop and complement our approach with another model transformation step to interpret messages returned by the proof tool in terms of the UML. A similar idea was reported in [93], but was realized in a different context.
7.4.2
From data migration to model migration
In this dissertation, we have looked into the data migration problem from a data model evolution perspective: how can we capture evolutionary changes to data models, and how can we map those changes into corresponding migration rules to transform data instances
according to data model changes. However, not only may models evolve, but so also
may the languages in which the models are expressed. Like data models, modeling
languages are subject to evolution due to changing requirements, fixing errors and
keeping up with technological progress [59].
An important application of model-driven development techniques where metamodels play a central role is Domain-Specific Modeling (DSM) [92]. DSM requires
metamodels that allow a domain expert to capture the key concepts of a domain of interest. However, these metamodels rarely define the domain completely, and must
often be evolved. Metamodel evolution can affect model integrity. For example, when
a metamodel concept is removed, any models that use the removed concept no longer
conform to the metamodel. Due to metamodel evolution, existing models may no
longer conform to the evolved metamodel and thus need to be migrated to reestablish
conformance to the evolved metamodel.
An important future work item would be to show how our data model evolution approach, set out in this dissertation, can be extended to cover updates to the metamodels and domain-specific languages used for data modeling, as well as updates to the data model itself (language evolution versus model evolution).
7.4.3
Predicting consequences of evolution on behavioral properties
In this dissertation, we are concerned with changes that affect the form or validity
of the data in the system: we did not consider changes to behavioral properties.
Following an evolution, we may be interested in the question of whether particular operations or workflows that existed in the source model can still be triggered in the target model and deliver the same goal. To be able to address this question, we need to be able to compare the source model and the target model with respect to their behavior. Information about the availability and effect of operations and workflows can be drawn from UML state diagrams. An operation may be considered available when a suitably labeled transition from the current state is enabled, i.e. the guard of the operation is satisfied.
Our approach may be extended to address not only the consistency of data, but
also the applicability or availability of particular operations or workflows. If an operation or workflow is associated with a particular precondition, then we may map this
precondition to an additional constraint in AMN, and check to see whether its truth
or falsity would be affected by the proposed data model evolution.
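As a hypothetical sketch (the operation and table names are illustrative): if an operation approveOrder is guarded by the precondition that an order has at least one line item, the effect of a proposed evolution on the availability of the operation can be examined by querying the data as it would stand after migration:

    -- Orders for which the guard of approveOrder would be false;
    -- a non-empty result means that the proposed evolution would
    -- disable the operation for some existing instances.
    SELECT o.id
    FROM   orders o
    WHERE  NOT EXISTS (SELECT 1
                       FROM   order_line l
                       WHERE  l.order_id = o.id);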
7.5
Conclusion
As the development of most information and data-intensive systems is not a one-off activity, these systems will need to be updated, repeatedly, in response to changes in
context and requirements. At each update, the data held within the old system must
be transferred to the new implementation. If changes have been made to the data
model, then careful consideration must be given to the transformation and representation of existing data—which may be of considerable value and importance to the
organization.
In this dissertation, we showed how to address the challenges of systems evolution
through a formal, model-driven approach. Using the Unified Modeling Language
(UML) as an example, we showed how a sequence of proposed changes to a system
can themselves be represented as a model. We showed how such a model may be used as the basis for the automatic generation of a corresponding data migration function, in the standard Structured Query Language (SQL). We also showed how a formal representation of the model in the Abstract Machine Notation (AMN) of the B-Method allows us to check that this function is applicable: that the migrated data would fit within the constraints of the new system.
Our approach offers an opportunity to consider the question of information system
evolution and data migration, in detail, at the design stage. We have demonstrated how this may be achieved through the definition of a formal semantics for an evolution modeling language, allowing us to verify that a proposed change is consistent with representational and semantic constraints, in advance of implementation.
Bibliography
[1] Mohammed Aboulsamh and Jim Davies. Towards a model-driven approach to
information systems evolution. In William Song, Shenghua Xu, and Changxuan
Wan, editors, Information Systems Development, pages 269–280. Springer New
York, 2011.
[2] Mohammed A. Aboulsamh, Edward Crichton, Jim Davies, and James Welch.
Model-driven data migration. In Juan Trujillo, Gillian Dobbie, Hannu Kangassalo, Sven Hartmann, Markus Kirchberg, Matti Rossi, Iris Reinhartz-Berger,
Esteban Zimányi, and Flavius Frasincar, editors, ER Workshops, volume 6413
of Lecture Notes in Computer Science, pages 285–294. Springer, 2010.
[3] Mohammed A. Aboulsamh and Jim Davies. A metamodel-based approach to information systems evolution and data migration. In Jon Hall, Hermann Kaindl,
Luigi Lavazza, Georg Buchgeher, and Osamu Takaki, editors, ICSEA, pages
155–161. IEEE Computer Society, 2010.
[4] Mohammed A. Aboulsamh and Jim Davies. A formal modeling approach to
information systems evolution and data migration. In Terry A. Halpin, Selmin
Nurcan, John Krogstie, Pnina Soffer, Erik Proper, Rainer Schmidt, and Ilia
Bider, editors, BMMDS/EMMSAD, volume 81 of Lecture Notes in Business
Information Processing, pages 383–397. Springer, 2011.
[5] Mohammed A. Aboulsamh and Jim Davies. Specification and verification of
model-driven data migration. In Ladjel Bellatreche and Filipe Mota Pinto,
editors, MEDI, volume 6918 of Lecture Notes in Computer Science, pages 214–
225. Springer, 2011.
[6] Jean-Raymond Abrial. The B-book: assigning programs to meanings. Cambridge University Press, 1996.
[7] Aditya Agrawal. Graph rewriting and transformation (GReAT): A solution
for the model integrated computing (MIC) bottleneck. In Automated Software
Engineering (ASE), pages 364–368. IEEE Computer Society, 2003.
[8] Idir Aït-Sadoune and Yamine Aït Ameur. A proof based approach for modelling and verifying web services compositions. In International Conference on Engineering of Complex Computer Systems (ICECCS), pages 1–10. IEEE Computer Society, 2009.
[9] Scott Ambler. Agile Database Techniques. John Wiley and Sons, October 2003.
[10] Scott W. Ambler and Pramodkumar J. Sadalage. Refactoring Databases: Evolutionary Database Design (Addison Wesley Signature Series). Addison-Wesley
Professional, March 2006.
[11] Apache Derby. http://db.apache.org/derby/, Accessed February 2012.
[12] Paolo Arcaini, Angelo Gargantini, Elvinia Riccobene, and Patrizia Scandurra.
A model-driven process for engineering a toolset for a formal method. Software: Practice and Experience, 41(2):155–166, 2011.
[13] Atelier B. Atelier proof tool, 2012.
[14] Jay Banerjee, Won Kim, Hyoung-Joo Kim, and Henry F. Korth. Semantics and
implementation of schema evolution in object-oriented databases. In Umeshwar Dayal and Irving L. Traiger, editors, SIGMOD Conference, pages 311–322.
ACM Press, 1987.
[15] C. Batini, M. Lenzerini, and S. B. Navathe. A comparative analysis of methodologies for database schema integration. ACM Comput. Surv., 18(4):323–364,
December 1986.
[16] Ladjel Bellatreche, Yamine Aı̈t Ameur, and Chedlia Chakroun. A design
methodology of ontology based database applications. Logic Journal of the
IGPL, 19(5):648–665, 2011.
[17] Daniela Berardi, Diego Calvanese, and Giuseppe De Giacomo. Reasoning on
UML class diagrams. Artif. Intell., 168(1):70–118, October 2005.
[18] Jean Bézivin. On the unification power of models. Software and System Modeling, 4(2):171–188, 2005.
[19] Michael Blaha. Patterns of Data Modeling. CRC Press, Inc., Boca Raton, FL,
USA, 1st edition, 2010.
[20] Michael Blaha and William Premerlani. Object-oriented modeling and design
for database applications. Prentice-Hall, 1997.
[21] Marko Boger, Thorsten Sturm, and Per Fragemann. Refactoring browser for
UML. In Revised Papers from the International Conference NetObjectDays on
Objects, Components, Architectures, Services, and Applications for a Networked
World, pages 366–377, London, UK, UK, 2003. Springer-Verlag.
[22] Grady Booch, James Rumbaugh, and Ivar Jacobson. The Unified Modeling
Language User Guide(2nd Edition). Addison-Wesley Professional, 2005.
[23] Behzad Bordbar, Dirk Draheim, Matthias Horn, Ina Schulz, and Gerald Weber.
Integrated model-based software development, data access, and data migration.
In Lionel C. Briand and Clay Williams, editors, MoDELS, volume 3713 of
Lecture Notes in Computer Science, pages 382–396. Springer, 2005.
[24] Rafael Magalhães Borges and Alexandre Cabral Mota. Integrating UML and
formal methods. Electron. Notes Theor. Comput. Sci., 184:97–112, July 2007.
[25] Artur Boronat and José Meseguer. An algebraic semantics for MOF. In Proceedings of the Theory and practice of software, 11th international conference on
Fundamental approaches to software engineering, FASE’08/ETAPS’08, pages
377–391, Berlin, Heidelberg, 2008. Springer-Verlag.
[26] Philippe Brèche. Advanced principles for changing schemas of object databases.
In Proceedings of the 8th International Conference on Advances Information
System Engineering, pages 476–495. Springer-Verlag, 1996.
[27] G. H. W. M. Bronts, S. J. Brouwer, C. L. J. Martens, and Henderik Alex Proper.
A unifying object role modeling theory. Inf. Syst., 20(3):213–235, 1995.
[28] Stephen D. Brookes, C. A. R. Hoare, and A. W. Roscoe. A theory of communicating sequential processes. J. ACM, 31(3):560–599, 1984.
[29] Manfred Broy and Marı́a Victoria Cengarle. UML formal semantics: lessons
learned. Software and System Modeling, 10(4):441–446, 2011.
[30] Hugo Bruneliere, Jordi Cabot, Frédéric Jouault, and Frédéric Madiot. MoDisco:
a generic and extensible framework for model driven reverse engineering. In
Proceedings of the IEEE/ACM international conference on Automated software
engineering, ASE ’10, pages 173–174, New York, NY, USA, 2010. ACM.
[31] Barrett R. Bryant, Jeff Gray, Marjan Mernik, Peter J. Clarke, Robert B. France,
and Gabor Karsai. Challenges and directions in formalizing the semantics of
modeling languages. Comput. Sci. Inf. Syst., 8(2):225–253, 2011.
[32] Erik Burger and Boris Gruschko. A change metamodel for the evolution of MOF-based metamodels. In Gregor Engels, Dimitris Karagiannis, and Heinrich C.
Mayr, editors, Modellierung, volume 161 of LNI, pages 285–300. GI, 2010.
[33] Michael Butler and Colin Snook. UML-B: Formal modeling and design aided by UML. ACM Trans. Softw. Eng. Methodol., 15(1):92–122, January 2006.
[34] Coral Calero, Francisco Ruiz, Aline Lúcia Baroni, Fernando Brito Abreu, and Mario Piattini. An ontological approach to describe the SQL:2003 object-relational features. Computer Standards & Interfaces, 28(6):695–713, 2006.
[35] Stefano Ceri, Piero Fraternali, Stefano Paraboschi, and Letizia Tanca. Automatic generation of production rules for integrity maintenance. ACM Transactions on Database Systems, 19:367–422, September 1994.
[36] Peter P. Chen. The entity-relationship model - toward a unified view of data.
ACM Transactions on Database Systems, 1(1):9–36, 1976.
[37] Antonio Cicchetti, Davide Di Ruscio, Romina Eramo, and Alfonso Pierantonio.
Automating co-evolution in model-driven engineering. In Enterprise Distributed
Object Computing Conference (EDOC), pages 222–231. IEEE Computer Society, 2008.
[38] Tony Clark and Andy Evans. Foundations of the unified modeling language.
In Proceedings of the 2nd BCS-FACS conference on Northern Formal Methods,
(2FACS’97), pages 6–6, Swinton,UK, 1997. British Computer Society.
[39] Kajal T. Claypool, Jing Jin, and Elke A. Rundensteiner. SERF: Schema evolution
through an extensible, re-usable and flexible framework. In Georges Gardarin,
James C. French, Niki Pissinou, Kia Makki, and Luc Bouganim, editors, International Conference on Information and Knowledge Management(CIKM),
pages 314–321. ACM, 1998.
[40] Kajal T. Claypool, Elke A. Rundensteiner, and George T. Heineman. Rover:
flexible yet consistent evolution of relationships. Data Knowl. Eng., 39:27–50,
October 2001.
[41] Anthony Cleve, Anne-France Brogneaux, and Jean-Luc Hainaut. A conceptual
approach to database applications evolution. In Proceedings of the 29th international conference on Conceptual modeling, ER’10, pages 132–145. SpringerVerlag, 2010.
[42] Anthony Cleve, Jean Henrard, Didier Roland, and Jean-Luc Hainaut. Wrapperbased system evolution application to codasyl to relational migration. In Proceedings of the 2008 12th European Conference on Software Maintenance and
Reengineering, pages 13–22, Washington, DC, USA, 2008. IEEE Computer Society.
[43] Anthony Cleve, Tom Mens, and Jean-Luc Hainaut. Data-intensive system evolution. IEEE Computer, 43(8):110–112, 2010.
[44] György Csertán, Gábor Huszerl, István Majzik, Zsigmond Pap, András Pataricza, and Dániel Varró. Viatra - visual automated transformations for formal
verification and validation of UML models. In Automated Software Engineering
(ASE), pages 267–270. IEEE Computer Society, 2002.
[45] Krzysztof Czarnecki and Simon Helsen. Feature-based survey of model transformation approaches. IBM Systems Journal, 45(3):621–646, 2006.
[46] Jim Davies, Charles Crichton, Edward Crichton, David Neilson, and Ib Holm
Sørensen. Formality, evolution, and model-driven software engineering. Electr.
Notes Theor. Comput. Sci., 130:39–55, 2005.
[47] Jim Davies, James Welch, Alessandra Cavarra, and Edward Crichton. On the
generation of object databases using booster. In International Conference on
Engineering of Complex Computer Systems (ICECCS), pages 249–258. IEEE
Computer Society, 2006.
[48] Hondjack Dehainsala, Guy Pierra, and Ladjel Bellatreche. OntoDB: an
ontology-based database for data intensive applications. In Proceedings of the
12th international conference on Database systems for advanced applications,
DASFAA’07, pages 497–508, Berlin, Heidelberg, 2007. Springer-Verlag.
[49] Christine Delcourt and Roberto Zicari. The design of an integrity consistency
checker (ICC) for an object-oriented database system. In Proceedings of the
European Conference on Object-Oriented Programming, ECOOP ’91, pages 97–
117, London, UK, 1991. Springer-Verlag.
[50] Cecilia Delgado, José Samos, and Manuel Torres. Primitive operations for
schema evolution in odmg databases. In Dimitri Konstantas, Michel Léonard,
Yves Pigneur, and Shushma Patel, editors, Object-Oriented Information Systems (OOIS), volume 2817 of Lecture Notes in Computer Science, pages 226–
237. Springer, 2003.
[51] Birgit Demuth and Heinrich Hußmann. Using UML/OCL constraints for relational database design. In Robert B. France and Bernhard Rumpe, editors,
UML, volume 1723 of Lecture Notes in Computer Science, pages 598–613.
Springer, 1999.
[52] Edsger W. Dijkstra. A Discipline of Programming. Prentice Hall, Inc., 1976.
[53] Li Dong and Huang Linpeng. A framework for ontology-based data integration.
In Proceedings of the 2008 International Conference on Internet Computing in
Science and Engineering, ICICSE ’08, pages 207–214, Washington, DC, USA,
2008. IEEE Computer Society.
[54] Marina Egea and Vlad Rusu. Formal executable semantics for conformance in
the mde framework. Innovations in Systems and Software Engineering (ISSE),
6(1-2):73–81, 2010.
[55] R. Elmasri and S.B. Navathe. Fundamentals of Database Systems (4th Edition).
Benjamin Cummings, Redwood City, Calif., USA, 2003.
[56] Marcos Didonet Del Fabro and Patrick Valduriez. Semi-automatic model integration using matching transformations and weaving models. In Yookun Cho,
Roger L. Wainwright, Hisham Haddad, Sung Y. Shin, and Yong Wan Koo, editors, ACM Symposium on Applied Computing (SAC), pages 963–970. ACM,
2007.
[57] Marcos Didonet Del Fabro and Patrick Valduriez. Towards the efficient development of model transformations using model weaving and matching transformations. Software and System Modeling, 8(3):305–324, 2009.
[58] David Faitelson, James Welch, and Jim Davies. From predicates to programs:
The semantics of a method language. Electron. Notes Theor. Comput. Sci.,
184:171–187, 2007.
[59] Jean-Marie Favre. Languages evolve too! changing the software time scale.
In Proceedings of the Eighth International Workshop on Principles of Software
Evolution, pages 33–44, Washington, DC, USA, 2005. IEEE Computer Society.
[60] Fabrizio Ferrandina, Thorsten Meyer, Roberto Zicari, Guy Ferran, and Joëlle
Madec. Schema and database evolution in the o2 object database system. In
Umeshwar Dayal, Peter M. D. Gray, and Shojiro Nishio, editors, VLDB, pages
170–181. Morgan Kaufmann, 1995.
[61] Giorgos Flouris, Dimitris Manakanatas, Haridimos Kondylakis, Dimitris Plexousakis, and Grigoris Antoniou. Ontology change: Classification and survey.
Knowl. Eng. Rev., 23(2):117–152, June 2008.
[62] Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts.
Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional, 1999.
[63] Robert France and Bernhard Rumpe. Model-driven development of complex
software: A research roadmap. In 2007 Future of Software Engineering, FOSE
’07, pages 37–54, Washington, DC, USA, 2007. IEEE Computer Society.
[64] Robert B. France, Andy Evans, Kevin Lano, and Bernhard Rumpe. The UML
as a formal modeling notation. Computer Standards & Interfaces, 19(7):325–
334, 1998.
[65] Stephen J. Garland, John V. Guttag, and James J. Horning. An overview
of larch. In Peter E. Lauer, editor, Functional Programming, Concurrency,
Simulation and Automated Reasoning, volume 693 of Lecture Notes in Computer
Science, pages 329–348. Springer, 1993.
[66] Martin Gogolla, Jörn Bohling, and Mark Richters. Validating UML and OCL
models in USE by automatic snapshot generation. Software and System Modeling, 4(4):386–398, 2005.
[67] Martin Gogolla and Arne Lindow. Transforming Data Models with UML. In
Knowledge Transformation for the Semantic Web, pages 18–33. 2003.
[68] Joseph A. Goguen, Claude Kirchner, Hélène Kirchner, Aristide Mégrelis, José
Meseguer, and Timothy C. Winkler. An introduction to OBJ3. In Stéphane Kaplan and Jean-Pierre Jouannaud, editors, CTRS, volume 308 of Lecture Notes
in Computer Science, pages 258–263. Springer, 1987.
[69] Pieter Van Gorp, Hans Stenten, Tom Mens, and Serge Demeyer. Towards automating source-consistent UML refactorings. In Perdita Stevens, Jon Whittle,
and Grady Booch, editors, The Unified Modeling Language (UML) Modeling
Languages and Applications, volume 2863 of Lecture Notes in Computer Science, pages 144–158. Springer, 2003.
[70] Jeff Gray, Yuehua Lin, and Jing Zhang. Automating change evolution in modeldriven engineering. IEEE Computer, 39(2):51–58, 2006.
[71] D. Gries. The Science of Programming. Springer Verlag, New York, USA, 1981.
[72] Thomas R. Gruber. A translation approach to portable ontology specifications.
Knowl. Acquis., 5(2):199–220, June 1993.
[73] Jean-Luc Hainaut, Anthony Cleve, Jean Henrard, and Jean-Marc Hick. Migration of Legacy Information Systems. 2008.
[74] Terry Halpin. Object-role modeling (ORM/NIAM). Handbook on Architectures
of Information Systems, Springer Berlin Heidelberg, pages 81–103, 2006.
[75] Terry Halpin and Tony Morgan.
Information Modeling and Relational
Databases. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008.
[76] David Harel and Bernhard Rumpe. Meaningful modeling: What’s the semantics
of “semantics”? IEEE Computer, 37(10):64–72, 2004.
[77] Michael Hartung, James F. Terwilliger, and Erhard Rahm. Recent advances in
schema and ontology evolution. In Zohra Bellahsene, Angela Bonifati, and Erhard Rahm, editors, Schema Matching and Mapping, pages 149–190. Springer,
2011.
[78] Markus Herrmannsdoerfer, Sebastian Benz, and Elmar Jürgens. Automatability
of coupled evolution of metamodels and models in practice. In Krzysztof Czarnecki, Ileana Ober, Jean-Michel Bruel, Axel Uhl, and Markus Völter, editors,
MoDELS, volume 5301 of Lecture Notes in Computer Science, pages 645–659.
Springer, 2008.
[79] http://www.eclipse.org/datatools/. Eclipse Data Tools Project, 2012.
[80] http://www.eclipse.org/modeling/emft/?project=compare. Eclipse Compare
Project, 2009.
[81] W. H. Inmon. Building the data warehouse. QED Information Sciences, Inc.,
Wellesley, MA, USA, 1992.
[82] International Standards Organization (ISO).
Vienna development method
(VDM) — specification language. ISO/Information Technology/IEC 13817-1:1996, 1996.
[83] International Standards Organization (ISO). Z formal specification notation — syntax, type system and semantics. ISO/Information Technology/IEC 13568:2002, 2002.
[84] International Standards Organization (ISO). Database languages SQL: Parts
1 to 4 and 9 to 14. ISO/Information Technology/IEC 9075-1:2003 to 9075-14:2003, 2003.
[85] Jonathan Jacky. The Way of Z. Cambridge University Press, Cambridge, UK,
1997.
[86] Ivar Jacobson, Magnus Christerson, Patrik Jonsson, and Gunnar Övergaard.
Object-oriented software engineering - a use case driven approach. Addison-Wesley, 1992.
[87] Yanbing Jiang, Weizhong Shao, Lu Zhang, Zhiyi Ma, Xiangwen Meng, and
Haohai Ma. On the classification of UML’s metamodel extension mechanism.
In Thomas Baar, Alfred Strohmeier, Ana M. D. Moreira, and Stephen J. Mellor,
editors, UML, volume 3273 of Lecture Notes in Computer Science, pages 54–68.
Springer, 2004.
[88] Jim Davies, James Welch, and Edward Crichton. Model-driven engineering of information systems: 10 years and 1000 versions. Science of Computer Programming,
Special Issue on Success Stories in Model Driven Engineering, (Submitted for
publication).
[89] Cliff Jones. Systematic Software Development Using VDM. Prentice Hall, 2nd
edition, 1991.
[90] Frédéric Jouault, Freddy Allilaire, Jean Bézivin, and Ivan Kurtev. ATL: A
model transformation tool. Sci. Comput. Program., 72(1-2):31–39, 2008.
[91] Ralph Kimball and Margy Ross. The Data Warehouse Toolkit: The Complete
Guide to Dimensional Modeling. John Wiley & Sons, Inc., New York, NY, USA,
2nd edition, 2002.
[92] Anneke Kleppe. Software Language Engineering: Creating Domain-Specific
Languages Using Metamodels. Addison-Wesley Professional, 1st edition, 2008.
[93] Leonid Kof and Birgit Penzenstadler. From requirements to models: Feedback
generation as a result of formalization. In Haralambos Mouratidis and Colette
Rolland, editors, Conference on Advanced Information Systems Engineering
(CAiSE), volume 6741 of Lecture Notes in Computer Science, pages 93–107.
Springer, 2011.
[94] Dimitrios S. Kolovos. Establishing correspondences between models with the
epsilon comparison language. In Richard F. Paige, Alan Hartman, and Arend
Rensink, editors, ECMDA-FA, volume 5562 of Lecture Notes in Computer Science, pages 146–157. Springer, 2009.
[95] Dimitrios S. Kolovos, Richard F. Paige, and Fiona Polack. The epsilon object
language (EOL). In Arend Rensink and Jos Warmer, editors, ECMDA-FA,
volume 4066 of Lecture Notes in Computer Science, pages 128–142. Springer,
2006.
[96] Stefan Kurz, Michael Guppenberger, and Burkhard Freitag. A UML profile for
modeling schema mappings. In John F. Roddick, V. Richard Benjamins, Samira
Si-Said Cherfi, Roger H. L. Chiang, Christophe Claramunt, Ramez Elmasri,
Fabio Grandi, Hyoil Han, Martin Hepp, Miltiadis D. Lytras, Vojislav B. Misic,
Geert Poels, Il-Yeol Song, Juan Trujillo, and Christelle Vangenot, editors, ER
(Workshops), volume 4231 of Lecture Notes in Computer Science, pages 53–62.
Springer, 2006.
[97] Ralf Lämmel. Coupled Software Transformations (Extended Abstract). In
First International Workshop on Software Evolution Transformations, November 2004.
[98] Kevin Lano, David Clark, and Kelly Androutsopoulos. UML to B: Formal verification of object-oriented models. In Eerke A. Boiten, John Derrick, and Graeme
Smith, editors, Integrated Formal Methods (IFM), volume 2999 of Lecture Notes
in Computer Science, pages 187–206. Springer, 2004.
[99] Hung Ledang and Jeanine Souquières. Modeling class operations in B: Application to UML behavioral diagrams. In Proceedings of the 16th IEEE international conference on Automated software engineering, ASE ’01, pages 289–.
IEEE Computer Society, 2001.
[100] Barbara Staudt Lerner. A model for compound type changes encountered in
schema evolution. ACM Trans. Database Syst., 25(1):83–127, 2000.
[101] Barbara Staudt Lerner and A. Nico Habermann. Beyond schema evolution to
database reorganization. In OOPSLA/ECOOP ’90: Proceedings of the European conference on object-oriented programming on Object-oriented programming systems, languages, and applications, pages 67–76, New York, NY, USA,
1990. ACM.
[102] B-Core UK Limited. The B-Toolkit, 2009.
[103] Sergio Luján-Mora, Panos Vassiliadis, and Juan Trujillo. Data mapping diagrams for data warehouse design with UML. In Paolo Atzeni, Wesley W. Chu,
Hongjun Lu, Shuigeng Zhou, and Tok Wang Ling, editors, ER, volume 3288 of
Lecture Notes in Computer Science, pages 191–204. Springer, 2004.
[104] Sanjay Madria, Kalpdrum Passi, and Sourav Bhowmick. An XML schema
integration and query mechanism system. Data Knowl. Eng., 65(2):266–303,
May 2008.
[105] Amel Mammar and Régine Laleau. A formal approach based on UML and B
for the specification and development of database applications. Autom. Softw.
Eng., 13(4):497–528, 2006.
[106] Amel Mammar and Régine Laleau. From a B formal specification to an executable code: application to the relational database domain. Information &
Software Technology, 48(4), 2006.
[107] Rafael Marcano-Kamenoff and Nicole Lévy. Transformation rules of OCL constraints into B formal expressions. In CSDUML’2002, Workshop on critical
systems development with UML. 5th International Conference on the Unified
Modeling Language, Dresden, Germany, September 2002.
[108] Slavisa Markovic and Thomas Baar. Refactoring OCL annotated UML class
diagrams. Software and Systems Modeling, 7(1), 2008.
[109] Tiago Massoni, Rohit Gheyi, and Paulo Borba. A framework for establishing formal conformance between object models and object-oriented programs.
Electr. Notes Theor. Comput. Sci., 195:189–209, 2008.
[110] Brian Matthews and Elvira Locuratolo. Formal development of databases in
asso and b. In Proceedings of the Wold Congress on Formal Methods in the
Development of Computing Systems-Volume I - Volume I, FM ’99, pages 388–
410, London, UK, UK, 1999. Springer-Verlag.
[111] Tom Mens and Tom Tourwe. A survey of software refactoring. IEEE Trans.
Softw. Eng., 30(2):126–139, 2004.
[112] Thomas O. Meservy and Kurt D. Fenstermacher. Transforming software development: An MDA road map. IEEE Computer, 38(9):52–58, September 2005.
[113] Hyun Jin Moon, Carlo Curino, MyungWon Ham, and Carlo Zaniolo. Prima:
archiving and querying historical data with evolving schemas. In Ugur
Çetintemel, Stanley B. Zdonik, Donald Kossmann, and Nesime Tatbul, editors,
SIGMOD Conference, pages 1019–1022. ACM, 2009.
[114] Carroll Morgan. Programming from specifications. Prentice-Hall, Inc., Upper
Saddle River, NJ, USA, 1990.
[115] Robert J. Muller. Database Design for Smarties: Using UML for Data Modeling. Morgan Kaufmann; 1st edition, 1999.
[116] Eric J. Naiburg and Robert A. Maksimchuk.
UML for Database Design.
Addison-Wesley Professional, 2001.
[117] Shamkant B. Navathe. Evolution of data modeling for databases. Commun.
ACM, 35:112–123, September 1992.
[118] Natalya F. Noy, Abhita Chugh, William Liu, and Mark A. Musen. A framework
for ontology evolution in collaborative environments. In Proceedings of the
5th international conference on The Semantic Web, ISWC’06, pages 544–558,
Berlin, Heidelberg, 2006. Springer-Verlag.
[119] Object Management Group. UML profile for CORBA, 2001.
[120] Object Management Group. UML 2.0 superstructure specification, 2005.
[121] Object Management Group. OCL specifications, version 2, 2006.
[122] Object Management Group. OMG’s MetaObject Facility (MOF) version 2.0,
retrieved february 09, 2011, 2006. http://www.omg.org/spec/MOF/2.0.
[123] Object Management Group. Ontology definition metamodel (ODM), 2009.
[124] Object Management Group. Unified Modeling Language (UML), Infrastructure,
Version 2.2, 2009. http://www.omg.org/docs/formal/09-02-04.pdf.
[125] Object Management Group. MDA guide version 1.0.1, June 2003.
[126] Antoni Olive. Conceptual Modeling of Information Systems. Springer-Verlag
Berlin Heidelberg, 2007.
[127] William F. Opdyke. Refactoring: A program restructuring aid in designing
object-oriented application frameworks. PhD thesis, 1992.
[128] Richard F. Paige, Phillip J. Brooke, and Jonathan S. Ostroff. Metamodel-based
model conformance and multiview consistency checking. ACM Transactions on
Software Engineering and Methodologies, 16(3):11, 2007.
[129] George Papastefanatos, Fotini Anagnostou, Yannis Vassiliou, and Panos Vassiliadis. Hecataeus: A what-if analysis tool for database schema evolution. In
Proceedings of the 2008 12th European Conference on Software Maintenance
and Reengineering, pages 326–328, Washington, DC, USA, 2008. IEEE Computer Society.
[130] Fernando Silva Parreiras, Steffen Staab, and Andreas Winter. On marrying
ontological and metamodeling technical spaces. In Ivica Crnkovic and Antonia Bertolino, editors, 6th joint meeting of the European Software Engineering
Conference and the ACM SIGSOFT International Symposium on Foundations
of Software Engineering, ESEC/SIGSOFT FSE, pages 439–448. ACM, 2007.
[131] D. Jason Penney and Jacob Stein. Class modification in the GemStone objectoriented DBMS. In OOPSLA ’87: Conference proceedings on Object-oriented
programming systems, languages and applications, pages 111–117, New York,
NY, USA, 1987. ACM.
[132] Stephan Philippi. Model driven generation and testing of object-relational mappings. Journal of Systems and Software, 77(2):193 – 207, 2005.
[133] Iman Poernomo. A type theoretic framework for formal metamodeling. In
Architecting Systems with Trustworthy Components, pages 262–298, 2004.
[134] Anne Pons and Rudolf K. Keller. Schema evolution in object databases by
catalogs. In Proceedings of the 1997 international conference on International
database engineering and applications symposium, IDEAS’97, pages 368–376.
IEEE Computer Society, 1997.
[135] Erhard Rahm and Philip A. Bernstein. A survey of approaches to automatic
schema matching. The VLDB Journal, 10(4):334–350, December 2001.
[136] Awais Rashid and Peter Sawyer. A database evolution taxonomy for object-oriented databases: Research articles. Journal of Software Maintenance and
Evolution, 17(2):93–141, March 2005.
[137] Awais Rashid, Peter Sawyer, and Elke Pulvermüller. A flexible approach for
instance adaptation during class versioning. In Klaus R. Dittrich, Giovanna
Guerrini, Isabella Merlo, Marta Oliva, and Elena Rodrı́guez, editors, Objects
and Databases, volume 1944 of Lecture Notes in Computer Science, pages 101–
113. Springer, 2000.
[138] Roberto V. Zicari. How good is UML for Database Design? Interview with Michael Blaha, retrieved August 01, 2011. http://www.odbms.org/blog/2011/07.
[139] Donald Bradley Roberts. Practical Analysis for Refactoring, PhD thesis. University of Illinois at Urbana-Champaign, 1999.
[140] James E. Rumbaugh, Michael R. Blaha, William J. Premerlani, Frederick Eddy,
and William E. Lorensen. Object-Oriented Modeling and Design. Prentice-Hall,
1991.
[141] Steve Schneider. The B-Method: An Introduction. Palgrave Macmillan, 2001.
[142] Shane Sendall and Wojtek Kozaczynski. Model transformation: The heart and
soul of model-driven software development. IEEE Software, 20(5):42–45, 2003.
[143] Graeme Paul Smith. The Object-Z Specification Language. Advances in Formal
Methods. Kluwer Academic Publishers, 2000.
[144] Colin Snook and Michael Butler. UML-B: Formal modeling and design aided
by UML. ACM Trans. Softw. Eng. Methodol., 15:92–122, January 2006.
[145] Xudong Song, Xiaolan Yan, and Liguo Yang. Design ETL metamodel based
on UML profile. In Proceedings of the 2009 Second International Symposium
on Knowledge Acquisition and Modeling - Volume 03, KAM ’09, pages 69–72,
Washington, DC, USA, 2009. IEEE Computer Society.
[146] David Spelt and Susan Even. A theorem prover-based analysis tool for objectoriented databases. In Rance Cleaveland, editor, Tools and Algorithms for
Construction and Analysis of Systems (TACAS), volume 1579 of Lecture Notes
in Computer Science, pages 375–389. Springer, 1999.
[147] J. M. Spivey. The Z Notation: A Reference Manual. Prentice Hall, 2nd edition,
1992.
[148] Dave Steinberg, Frank Budinsky, Marcelo Paternostro, and Ed Merks. EMF:
Eclipse Modeling Framework, 2nd Edition. Addison-Wesley Professional, 2008.
[149] Susan Stepney, Rosalind Barden, and David Cooper. Object orientation in Z,
Workshops in Computing. Springer, 1992.
[150] Ljiljana Stojanovic, Alexander Maedche, Boris Motik, and Nenad Stojanovic.
User-driven ontology evolution management. In Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management.
Ontologies and the Semantic Web, EKAW ’02, pages 285–300, London, UK, 2002. Springer-Verlag.
[151] Gerson Sunye, Damien Pollet, Yves Le Traon, and Jean-Marc Jézéquel. Refactoring UML models. In Proceedings of the 4th International Conference on The
Unified Modeling Language, Modeling Languages, Concepts, and Tools, pages
134–148, London, UK, 2001. Springer-Verlag.
[152] Antero Taivalsaari. On the notion of inheritance. ACM Comput. Surv., 28:438–
479, September 1996.
[153] B. Thalheim. Fundamentals of Entity-relationship Modeling. Springer-Verlag
Berlin and Heidelberg GmbH Co. KG, 1999.
[154] The Object Management Group. Common Warehouse Metamodel (CWM)
Specification, 2003.
[155] The Object Management Group. Software Process Engineering Metamodel
Specification (SPEM), 2005.
[156] Lance Tokuda and Don Batory. Evolving object-oriented designs with refactorings. Automated Software Engg., 8(1):89–120, 2001.
[157] Helen Treharne. Supplementing a UML Development Process with B. In Lars-Henrik Eriksson and Peter A. Lindsay, editors, International Symposium of
Formal Methods Europe FME, volume 2391 of Lecture Notes in Computer Science, pages 568–586. Springer, 2002.
[158] Can Türker and Michael Gertz. Semantic integrity support in SQL:1999 and
commercial (object-)relational database management systems. The VLDB
Journal, 10:241–269, December 2001.
[159] Panos Vassiliadis, Alkis Simitsis, and Spiros Skiadopoulos. Conceptual modeling for ETL processes. In Il-Yeol Song and Dimitri Theodoratos, editors,
DOLAP 2002, ACM Fifth International Workshop on Data Warehousing and
OLAP, November 8, 2002, McLean, VA, Proceedings, pages 14–21. ACM, 2002.
[160] Guido Wachsmuth. Metamodel adaptation and model co-adaptation. In
ECOOP 2007 - Object-Oriented Programming, 21st European Conference,
Berlin, Germany, July 30 - August 3, 2007, Proceedings, pages 600–624, 2007.
[161] Chris Wallace. Using Alloy in process modeling. Information & Software Technology, 45(15):1031–1043, 2003.
[162] James Welch, David Faitelson, and Jim Davies. Automatic maintenance of
association invariants. Software and System Modeling, 7(3):287–301, 2008.
[163] Jeannette M. Wing. A specifier’s introduction to formal methods. IEEE Computer, 23(9):8–24, 1990.
[164] J. Woodcock and J. Davies. Using Z—Specification, Refinement, and Proof.
Series in Computer Science. Prentice Hall International, 1996.
[165] Alexandre V. Zamulin. An object algebra for the ODMG standard. In Proceedings of the 6th East European Conference on Advances in Databases and Information Systems, ADBIS ’02, pages 291–304, London, UK, 2002. Springer-Verlag.
[166] Jing Zhang, Yuehua Lin, and Jeff Gray. Generic and domain-specific model
refactoring using a model transformation engine. In Volume II of Research and
Practice in Software Engineering, pages 199–218. Springer, 2005.
Appendix A
Implementation and case study
Our focus in this appendix is to demonstrate how, using the open-source Eclipse development platform¹, we can realize a solution for data model evolution. Such a solution can be used by information system designers to perform data model updates, analyze some of the consequences of their updates, and subsequently migrate the data.
Overview
Our Eclipse-based solution is developed according to the principles of the approach we presented in previous chapters. Figure A.1 shows an overview of our proposed model-driven approach, detailed in subsequent sections. In our approach, changes to a UML data model are specified using a language of model operations defined in the form of an evolution metamodel. Using model transformation, these model operations are given semantics in the B-method, to specify precisely their applicability and their intended effect upon data instances. The B-method specifications of model operations represent an abstract program that can be formally checked prior to transformation into a concrete programming notation such as Structured Query Language (SQL).
Our approach consists essentially of a series of model transformations [142] across a number of MOF-based metamodels. Each metamodel defines the structure and well-formedness rules that must be satisfied (conformed to) by models at the lower level of abstraction. Using Eclipse as a development environment, we have defined these metamodels using the Eclipse Modeling Framework (EMF)² [148]. Below we explain the main components of our prototypical implementation.
1 http://www.eclipse.org
2 http://www.eclipse.org/modeling/emf
Figure A.1: Overview of the implementation approach
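To give a flavour of what such EMF metamodel definitions involve, the following is a minimal sketch of building a metamodel fragment programmatically with the Ecore API. The package, class, and attribute names are illustrative assumptions; the actual metamodels of this implementation were defined as .ecore files.

import org.eclipse.emf.ecore.EAttribute;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.EcoreFactory;
import org.eclipse.emf.ecore.EcorePackage;

public class MetamodelSketch {
    public static EPackage buildDataMetamodel() {
        EcoreFactory f = EcoreFactory.eINSTANCE;

        // Root package of an illustrative data metamodel (names assumed).
        EPackage p = f.createEPackage();
        p.setName("dataModel");
        p.setNsPrefix("dmodel");
        p.setNsURI("http://www.example.org/dataModel");

        // A metaclass 'Class' with a 'name' attribute.
        EClass clazz = f.createEClass();
        clazz.setName("Class");
        EAttribute name = f.createEAttribute();
        name.setName("name");
        name.setEType(EcorePackage.Literals.ESTRING);
        clazz.getEStructuralFeatures().add(name);
        p.getEClassifiers().add(clazz);

        return p;
    }
}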
Evolving data models
Data modeling with UML and OCL
Following the main principles outlined in Section 4.2, in our current implementation using EMF we defined two metamodels: one representing the subset of UML and one representing the Object Constraint Language (OCL), as can be seen in Figure A.2 below.
These metamodels can be instantiated into data models similar to the Employee Information System data model. Figure A.3 below shows a fragment of how these metamodels can be instantiated. The instance model consists of classes, attributes, associations, and constraints. Of particular interest in this figure is the way an OCL constraint instantiates the OCL metamodel.
When expressing a constraint as an instance of the OCL metamodel, the body of the constraint can be regarded as a binary tree in which each node represents an instance of a metaclass of the OCL metamodel.
(a) UML metamodel, equivalent to Figure 4.2    (b) OCL metamodel
Figure A.2: A fragment of the UML and OCL metamodels defined in the Eclipse Modeling Framework (EMF)

We show in Figure A.3 the constraint C2 as an instance of the OCL metamodel. In this example, the forAll iterator (represented as an instance of the metaclass IteratorExp) is the root of the tree. The left child of the root is the source of the iterator; this left child is in turn represented as an instance of the PropertyCallExpression metaclass, corresponding to the navigation through the association end employees, and its own source is the access to the self variable. The right child of the root is the body of the iterator expression. The root of this right subtree is the operation ≥, represented as an instance of the OperationCallExp metaclass. This node has two children. The first is its source, an access to the attribute salary (with a last child representing the access to the e variable). The second has the same structure: an access to the minSalary attribute followed by an access to the self variable.
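As a rough illustration of this tree structure, the following sketch encodes the expression tree of C2 using simplified stand-in classes. These are not the EMF-generated classes of the actual OCL metamodel, and the constraint text, self.employees->forAll(e | e.salary >= self.minSalary), is inferred from the description above.

// Minimal, illustrative encoding of the expression tree for C2.
abstract class OclNode { }

class VariableExp extends OclNode {       // e.g. 'self' or 'e'
    final String varName;
    VariableExp(String varName) { this.varName = varName; }
}

class PropertyCallExp extends OclNode {   // attribute/association-end access
    final OclNode source; final String property;
    PropertyCallExp(OclNode source, String property) {
        this.source = source; this.property = property;
    }
}

class OperationCallExp extends OclNode {  // e.g. '>='
    final String op; final OclNode source; final OclNode argument;
    OperationCallExp(String op, OclNode source, OclNode argument) {
        this.op = op; this.source = source; this.argument = argument;
    }
}

class IteratorExp extends OclNode {       // e.g. 'forAll'
    final String iterator; final OclNode source; final OclNode body;
    IteratorExp(String iterator, OclNode source, OclNode body) {
        this.iterator = iterator; this.source = source; this.body = body;
    }
}

class ConstraintC2 {
    static OclNode build() {
        // Left child: navigation through the association end 'employees'.
        OclNode left = new PropertyCallExp(new VariableExp("self"), "employees");
        // Right child: e.salary >= self.minSalary.
        OclNode salary    = new PropertyCallExp(new VariableExp("e"), "salary");
        OclNode minSalary = new PropertyCallExp(new VariableExp("self"), "minSalary");
        OclNode right = new OperationCallExp(">=", salary, minSalary);
        // Root: the forAll iterator.
        return new IteratorExp("forAll", left, right);
    }
}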
Editing data models
In our approach, according to the evolution metamodel presented and explained in Section 4.3, each change to a data model is an operation on a model instance (adding, modifying, or deleting a model element), and thus our evolutionary steps correspond to operations at the metamodel level. Accordingly, we have extended the UML subset that we use for data modeling with classes of model operations, introducing add, modify, and delete operations for each class of model element and explicitly defining how a target data model is produced from a source data model.
To be able to include operations corresponding to more complex model evolution steps, such as the inlining of an existing class or the introduction of an association class, we have introduced combinators for our evolution metamodel, allowing us to compose changes both sequentially (;) and in iteration. In addition, we use an OclExpression to describe the initial values of a new or modified attribute or association.
Figure A.3: Data model instantiated, equivalent to Figure 4.5
This expression may refer to other attributes or associations in the current model.
For more details on the above metamodel, please refer to Section 4.3.
import 'platform:/resource/org.example.dataModel/model/DataModel.ecore' as dmodel

Model       ::= EvolutionModel ID editing SourceModel Operation ";"
SourceModel ::= model UriID ";"
Operation   ::= AddClass | DeleteClass | AddPropertyWithValue | … | InlineClass | … ";"
AddClass    ::= addClass "(" ID ")" ";"
DeleteClass ::= deleteClass "(" dmodel::Class ")" ";"
AddPropertyWithValue ::= addPropertyWithValue "(" dmodel::Class "," ID ":" Type
                         "," Multiplicity "," "<" OCLExpression ">" ")" ";"
InlineClass ::= inlineClass "(" dmodel::Class "," dmodel::Class ","
                dmodel::Property ")" ";"
…
Type ::= String | Integer | Boolean | dmodel::Class
ID   ::= ("a".."z"|"A".."Z"|"_") ("a".."z"|"A".."Z"|"_"|"0".."9")*

Figure A.4: Data Model Evolution Grammar (excerpt)
Figure A.5: Data Model Editor
Concrete syntax
To derive a concrete syntax from our evolution metamodel, we have followed the rules defined by [12] on how to derive a context-free EBNF grammar from a metamodel. Figure A.4 shows an excerpt of the EBNF version of the grammar, which we have written using Xtext by openArchitectureWare³. Xtext is an open-source language development framework that enables the generation of various language artifacts from a concrete syntax. In the figure, [dmodel::Class] represents a cross-reference to a specific element of the source data model being edited (e.g. a Class). From the grammar rules above, using Xtext, we have generated the editor shown in Figure A.5.
Referencing a specific data model, written in our UML subset, an information system designer can use this editor to update data model elements, adding, deleting, or modifying model components. The editor provides standard features such as keyword highlighting, auto-completion, and content assistance. Models written in the editor can be parsed as instances of our evolution metamodel, as can be seen in Figure A.6 below.
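A minimal sketch of how such a parsed script can be loaded as a model, assuming the Xtext-generated infrastructure for the evolution language is on the classpath; the file name is invented, and in a standalone (non-Eclipse) context the generated <Language>StandaloneSetup class must be run first.

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.xtext.resource.XtextResourceSet;

public class LoadEvolutionModel {
    public static void main(String[] args) {
        // Assumes the Xtext language setup has already been initialised.
        XtextResourceSet rs = new XtextResourceSet();
        Resource r = rs.getResource(
            URI.createFileURI("evolution/employee.evo"), true);

        // The root object is an instance of the evolution metamodel; its
        // concrete type depends on the generated model code.
        EObject root = r.getContents().get(0);
        System.out.println("Parsed root: " + root.eClass().getName());
    }
}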
Using model transformation, this evolution model can be transformed into an SQL model, suitable for migrating data instances to the target (evolved) model. However, to ensure that such data migration is successful, we first transform the evolution model into a model in the B-method.
3 http://www.eclipse.org/Xtext
Figure A.6: Part of parsed evolution model
Assigning B formal semantics
Introducing a MOF-based metamodel of the B-method enables us to bring the B-method into a model-driven engineering setting and to define B abstract machines, refinements, and implementations as models of the B-method metamodel.
Figure A.7 shows parts of the B-method metamodel, which we defined in Eclipse using the Eclipse Modeling Framework (EMF). This metamodel is the EMF implementation of the B-method metamodel outlined in Chapter 3. As with our UML metamodel subset, defining the B-method metamodel in EMF enables us to use it as an input to a model transformation process that assigns formal semantics to our data metamodel and evolution metamodel.
More specifically, assigning formal semantics to data model evolution with the B-method consists of the specification of an Abstract Machine (defined in the B Abstract Machine Notation), able to capture the semantics of data models defined in UML, and a collection of B abstract operations (defined in the B Generalized Substitution Language), able to capture the semantics of our model evolution operations. In particular, our B semantic domain consists of the following two parts:
Figure A.7: A fragment of the B metamodel (panels (a) and (b))
1. Generic Abstract Machine. This abstract machine describes the constructs of the UML modeling language subset that are necessary for modeling data models.
Figure A.8: Part of the abstract machine used to assign semantics to the UML data metamodel
Figure A.9: Part of the preconditioned substitution template used to assign semantics to evolution operations
Figure A.8 shows the main clauses of the B abstract machine, which is built using concepts of the B-method metamodel shown in Figure A.7. This generic abstract machine is used to assign formal semantics to the various concepts of the data metamodel. For example, the main elements of the data metamodel are represented as abstract sets, and the main characteristics of the data metamodel's class concept are represented as machine variables of appropriate type. Machine invariants, such as the absence of cyclic inheritance, are also represented by instantiating the relevant elements of the B-method metamodel.
2. B Abstract Operations. The meaning of an evolution model defined with the evolution metamodel is specified by means of B abstract operations, expressed in the Generalized Substitution Language (GSL). Starting from the given instance of the abstract machine, they evolve both the model elements and the data instances.
Figure A.9 shows how we instantiate appropriate elements of the B-method metamodel to assign formal semantics to the evolution operations. For example, the operation addClass has three input variables. In a preconditioned substitution, these input variables are typed in the precondition clause and assigned to appropriate state variables (className in the above example).
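As an indicative sketch only, the following shows the flavour of this mapping for addClass, rendering the target substitution as text. The actual transformation instantiates objects of the B-method metamodel rather than producing strings, and the precondition shown is abbreviated (the full version appears in Appendix C).

public class AddClassToB {
    // Render a preconditioned substitution for an 'addClass' evolution
    // operation. This is a simplification: the real rule populates
    // B-metamodel instances, not strings.
    public static String toPreconditionedSubstitution(String cName) {
        return String.join("\n",
            "addClass(cName, isAbst, super) =",
            "PRE",
            "    cName : CLASS - className & isAbst : BOOL & super : className",
            "THEN",
            "    className := className \\/ {" + cName + "}",
            "END");
    }
}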
(a) SQL metamodel   (b) SQL model
Figure A.10: A fragment of the SQL metamodel and a model defined in the Eclipse Modeling Framework (EMF)
The above two components of our B-method semantic domain are obtained automatically, by means of model transformation, from a data model and its corresponding evolution model.
Generating SQL data migration programs
Given the prevalence of relational databases, we assume that the data we want to evolve will have a relational implementation. In our current implementation, we use Apache Derby⁴, as implemented in the Eclipse Data Tools Platform project⁵, as the relational database management system (DBMS).
Just as we defined the UML and OCL metamodels using the EMF framework, we first define an SQL metamodel as a .ecore file. Using our object-to-relational transformation, the data model above may be translated into an initial SQL model conforming to the SQL metamodel, as can be seen in Figure A.12.
Using a model-to-text transformation technique, we can transform the SQL model into a corresponding collection of SQL statements. In the prototypical implementation of the approach, we have chosen to do this using the Xpand notation⁶. This notation is intended for transforming model descriptions into text files, but it performs equally well as a tool to navigate the structure of a model and generate appropriate text.
4 http://db.apache.org/derby
5 http://www.eclipse.org/datatools
6 http://www.eclipse.org/modeling/m2t/?project=xpand

Figure A.11: SQL script generation (excerpt)
(a) SQL statements   (b) SQL table
Figure A.12: A fragment of the SQL statements and a table
Figure A.11 shows part of an Xpand template used to generate SQL code. Typically, the first statement in such a template is an IMPORT statement that imports the required metamodels: once a metamodel has been imported, we can refer to its elements in the programming context in which code fragments are to be produced. The DEFINE clause represents a fragment that can be expanded within the context of template execution; the EXPAND clause directs execution to another DEFINE template. The output SQL file and (a fragment of) the corresponding physical database table are shown in Figure A.12.
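For completeness, a generated script of this kind can be executed against embedded Derby through plain JDBC. The following minimal sketch, with invented table and column names, assumes the Derby driver is on the classpath; it uses an in-memory database so it can be run without any prior setup.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RunMigration {
    public static void main(String[] args) throws Exception {
        // In-memory Derby database, created on first connection.
        try (Connection c = DriverManager.getConnection(
                 "jdbc:derby:memory:migration;create=true");
             Statement s = c.createStatement()) {
            // Statements of this shape would be produced by the Xpand
            // template; the names below are illustrative only.
            s.executeUpdate("CREATE TABLE EMPLOYEE ("
                + "ID INT PRIMARY KEY, NAME VARCHAR(64))");
            s.executeUpdate("ALTER TABLE EMPLOYEE "
                + "ADD COLUMN SALARY INT DEFAULT 0");
        }
    }
}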
Case study⁷
In this case study we report on a practical realization of our data model evolution approach. In particular, we introduce and explain our experience of applying the approach outlined in the previous chapters to a real-life case study. The explanation focuses on one information system, developed using Booster, a domain-specific data modeling framework created at Oxford. We start by giving a short overview of Booster and of our motivation for applying our approach in a Booster context. We then give a brief description of True Colours, the information system on which our evolution approach was demonstrated. We also explain how data model changes were captured, analyzed, and used to generate data migration code. We end our case study presentation by outlining some lessons learned.
A brief overview of Booster
Booster is a data modeling language, described in [46, 47] and [162], based on elements of the Abstract Machine Notation (AMN) of the B-method and the Refinement Calculus [114]. The approach aims at automatically generating complete, working object database implementations, sequentially accessed through queries and transactional updates.
A Booster model is a collection of classes, each defining the structure and behavior of a particular kind of object. A class has a list of attributes, associations, and methods. Associations in Booster are normally bidirectional [162]: an explicit association in one direction creates an implicit association in the other. Methods are defined in terms of preconditions, postconditions, and change-lists.
A compiler for the Booster language maps completed method predicates into statements in a programming notation [58]. Development with the Booster language involves the use of the Booster toolset, an extended version of the B-Toolkit that automates the translation from abstract specification to executable code. This translation takes place as a series of steps [162, 58].
Only limited data migration support is currently provided by the existing Booster compiler: any significant evolution must be decomposed into a series of smaller changes. There is no notion of ‘evolution operations’, nor is there any formal ‘language of changes’. Although Booster generates complete information systems, there is as yet no facility for specifying evolution within the model, and hence generated systems often require a certain amount of manual adaptation following a model revision.
7 Work on this case study was done within the context of [88]. I would like to acknowledge the support provided by James Welch during the analysis of Booster data model changes while working on this case study.
It is appealing to apply our approach to a Booster information system for two main reasons. First, the verification of our proposed generic approach is based mainly on the AMN of the B-method, and the development of Booster grew out of experience in using the B-Toolkit. Second, a translation from the Booster object model to the B-method allows us to reuse the proof theories and tools that were developed for the B-method.
Research on Booster has been driven by practical application, through the development and evolution of a range of systems in use within and outside the University of Oxford. One of these systems, which has been available for inspection at various stages of its development, is the True Colours system.
Overview of True Colours
True Colours is a patient- or mood-monitoring system designed to support clinical care, health research, and self-management for patients with long-term mental health conditions. The system is used to monitor patients with bipolar disorder (a psychiatric diagnosis) in two counties in the south of England. In addition, the system is used to computerize many of the reporting and management activities of clinical users and to support a range of research projects and clinical studies.
The primary function of the system is to support patient monitoring and self-management of bipolar disorder. Participants are assigned monitoring schedules in which they are prompted for a response at specified times: these could be at particular times of the day, on particular days of the week, at certain times of the month, or at random times within specified intervals. Schedules may be prescribed for days, weeks, months, or indefinite periods. Prompts are typically sent as text messages, inviting or reminding the participant to complete a questionnaire. The questionnaires are carefully designed and validated, and become familiar to the participants, who are given a credit-card aide-memoire of the questions. Responses may be provided by logging in to a website, by sending an email, or by texting a reply. Each of the questionnaires has a corresponding summary calculation, yielding a single score that serves as a quick, positive or negative, indication of health. The scores are presented in a graph, providing a useful visualization of how health has improved, or not, over time, alongside relevant clinical information: in particular, details of medication.
All of this information is automatically exported to the main patient records system used in the relevant part of the UK National Health Service (in this case, Oxford Health NHS Foundation Trust), and is also sent to clinicians in encrypted PDF format. Negotiations are underway regarding wider adoption: for the monitoring of other conditions in Oxfordshire, and for the monitoring of bipolar patients across Scotland.
Figure A.13: True colours versions over time (vertical axis: versions, 0–90; horizontal axis: days, 0–400)
Categorization of data model changes
The first version of the model was created on 4th December 2009, with an initial collection of 7 classes, 46 attributes, and 50 methods, including all the basic functionality needed for user management. Initially, the only users of the system were the clinicians and nurses involved in the monitoring and research programs.
Figure A.13 shows the rate of progress in terms of major versions developed, from December 2009 to June 2011. As the rate varied over the first six months, we will use the version number as the horizontal axis in a subsequent graph. As might be expected, the rate of change in the model remained fairly constant after the first ten or fifteen versions, until the point where the system was about to be exposed to a new user group: that of the patients.
At this point, clinical and nursing staff became more focused upon the potential patient experience, and produced more requests for minor modifications to the design: for the most part, these would entail the addition of attributes to the model and the modification of existing methods; the classes and enumerations would remain unchanged.
Booster primitive edit            Total number of changes
Add Class                                              42
Delete Class                                            5
Add Attribute                                         536
Change Attribute Type                                   5
Change Attribute Multiplicity                          50
Rename Attribute                                       22
Delete Attribute                                       59
Add Method                                            955
Update Method                                         427
Delete Method                                         235
Add Enumeration                                        29
Modify Enumeration                                     20
Change Class Invariant                                 42

Table A.1: Recurrence of Booster model edits
Using Booster version files as input to the Eclipse Compare tool, we were able to identify Booster data model changes from one version to the next. Table A.1 shows the frequency with which each primitive model edit occurred. We may observe that the two highest counts are the number of times a method was added and the number of times an attribute was added. This reflects a pattern of evolution in which the functionality of the application was being extended: designers were not primarily modifying existing elements but were mainly adding new ones (e.g. attributes, methods, or classes). This differs from other evolution patterns, in which data models are updated to capture some structure or process at a greater level of detail.
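The comparison itself was performed with the Eclipse Compare tool; purely as an illustration of the bookkeeping behind Table A.1, a simplified version of the counting might look as follows, where the attribute sets of two versions are assumed inputs rather than parsed from actual Booster files.

import java.util.HashSet;
import java.util.Set;

public class EditCounter {
    // Given the attribute names of two model versions, count the
    // 'Add Attribute' and 'Delete Attribute' primitive edits between them.
    public static void count(Set<String> v1Attrs, Set<String> v2Attrs) {
        Set<String> added = new HashSet<>(v2Attrs);
        added.removeAll(v1Attrs);              // present in v2 but not v1
        Set<String> deleted = new HashSet<>(v1Attrs);
        deleted.removeAll(v2Attrs);            // present in v1 but not v2
        System.out.println("added=" + added.size()
            + " deleted=" + deleted.size());
    }
}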
Figure A.14 shows the number of changes made, for each kind of model component, against version number. Figure A.15 shows an example of True Colours data model changes from one version to the next.
Instantiation and analysis
These version changes can be mapped directly onto primitive edits against our evolution metamodel, and subsequently given semantics in AMN, enabling reasoning about and analysis of evolution within our proposed B-method framework. However, since our approach is defined at the metamodel level of design, and our consistency constraints are stated in terms of sets and relations representing metamodel concepts, we need to introduce an instantiation mechanism by which we can systematically instantiate these metamodel concepts with values from Booster data models.
Figure A.14: True colours changes (vertical axis: number of changes, 0–250; horizontal axis: version number, 0–80; series: classes, attributes, methods, enumerations)
Booster data model changes, version 9 to version 10:

Enumeration addition
  v9:  CLASS ResponseStatus [20] CONTROL
       { NewResponse, ManuallyDecoded, AutomaticallyDecoded }
  v10: CLASS ResponseStatus [20] CONTROL
       { NewResponse, ManuallyDecoded, AutomaticallyDecoded,
         FromWeb, FromEmail, AutoFromText, ManualFromText }

Attribute renaming
  v9:  objectHistoryForResponse : [Response . responseHistory];
  v10: objectHistoryForScheduleEvent : [ScheduleEvent . scheduleEventHistory];

Attribute addition
  v9:  CLASS Response[1000]
       ATTRIBUTES
         responseTemp (ID) : NAT
       END;
  v10: CLASS Response
         responseDate (ID) : DATE;
         responseTime (ID) : TIMEOFDAY;
         responseStatus (ID) : ResponseStatus;
         ...

Attribute deletion / attribute addition
  v9:  scheduleResponses :
         SET ( Response . responseForSchedule )
  v10: scheduleScheduleEvents :
         SET ( ScheduleEvent . scheduleEventForSchedule )

Class addition
  v10: CLASS ScheduleEvent[10000]
       ATTRIBUTES
         scheduleEventDate (ID) : DATE;
         scheduleEventTime (ID) : TIMEOFDAY;
         ...

Figure A.15: An example of version-level changes
In other words, to be able to verify the conformance of a Booster data model and its corresponding data instances with respect to the formal semantics, we need to allow the variables of our data metamodel to take values, and to encode data-model-specific constraints, so that we are able to check the conformance of the model and the data with respect to these constraints. The instantiation mechanism that we propose next helps us perform these tasks.
The main idea of our instantiation mechanism is based on the observation that each element in a data model is an instance (object) of a class in the metamodel (for example, all the associations in a UML data model are instances of the Association class in the UML metamodel). Hence, our instantiation mechanism proceeds in the following four steps:
Step 1: each instantiable concept in the Booster data metamodel (corresponding to our UML metamodel subset) is transformed into a B abstract machine. For example, we have created a separate abstract machine for each Booster Class, Booster Property, and Booster Association concept. The properties of each concept are transformed into variables of the corresponding abstract machine. We have then used an abstract machine operation to instantiate the variables of the machine with the corresponding values from the Booster data model.
In this way, if we have a Booster class called Questionnaire, we would create an abstract machine with the same name and populate its metamodel variables, such as className, isAbstract, and ownedProperties, from the Questionnaire class. Similarly, if we have a property called QuestionnairePrompt, we would create an abstract machine and, in this case, populate our property variables, such as propertyName, propertyType, and owningClass, from the values of this property.
Step 2: the data representing each metamodel concept (e.g. Class, Property), collected in individual abstract machines in the previous step, is merged, using set union, into the variables of a single abstract machine representing an aggregated data metamodel. As a result, in this data metamodel machine we have variables whose values are sets of data covering all objects of the Booster data model elements.
Step 3: with this instantiation mechanism, we can now express Booster data-model-specific constraints as invariants of the data metamodel machine. As a result, proof obligations generated in the support tool can then inspect all objects to verify the consistency of the data instances.
Step 4: all the individual machines of the data model see (via the SEES clause) a context machine that defines all the sets of the data model, such as CLASS, PROPERTY, and ASSOCIATION, together with their possible values.
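A minimal sketch of Step 1, under the assumption that the per-class machines are generated as text in the style of Appendix C; the generation strategy and method names here are simplifications, not the actual tooling.

import java.util.List;

public class InstantiationStep1 {
    // Emit a B abstract machine (as text) that instantiates the
    // metamodel variables for one Booster class. The clause layout
    // follows the machines in Appendix C.
    public static String machineFor(String cls, boolean isAbstract,
                                    List<String> properties) {
        StringBuilder b = new StringBuilder();
        b.append("MACHINE ").append(cls).append("\n");
        b.append("OPERATIONS\n");
        b.append("  instantiate =\n  BEGIN\n");
        b.append("    className := {").append(cls).append("} ||\n");
        b.append("    isAbstract := {").append(cls)
         .append(" |-> ").append(isAbstract ? "TRUE" : "FALSE").append("} ||\n");
        b.append("    ownedProperties := {").append(cls).append("} * {")
         .append(String.join(", ", properties)).append("}\n");
        b.append("  END\nEND\n");
        return b.toString();
    }
}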
Using the above instantiation mechanism, we were able to instantiate our abstract data model and evolution machines with Booster data model values. We performed this instantiation to analyze three successive versions, chosen arbitrarily (from version 9 to version 12). Once version 9 has instantiated the abstract machine corresponding to our metamodel, we use our evolution operations to evolve the model to the subsequent version (guided by the changes identified in the change files).
Although none of these three versions was a refactoring of its predecessor, the evolution from one version to the next was applicable, as the preconditions of the operations involved were all satisfiable.
Appendix B
B-method notation ASCII symbols
The table below provides a mapping of the main B-method ASCII symbols used in our B specifications to their mathematical counterparts.

ASCII   Operator                          ASCII   Operator
:       ∈   (membership)                  /:      ∉   (non-membership)
<:      ⊆   (subset)                      =>      ⇒   (implication)
;       ⨾   (relational composition)      \/      ∪   (union)
/\      ∩   (intersection)                &       ∧   (conjunction)
or      ∨   (disjunction)                 !       ∀   (universal quantifier)
#       ∃   (existential quantifier)      -->     →   (total function)
+->     ⇸   (partial function)            >-->    ↣   (total injection)
<-->    ↔   (relation)                    |->     ↦   (maplet)
|>      ▷   (range restriction)           <|      ◁   (domain restriction)
|>>     ⩥   (range subtraction)           <<|     ⩤   (domain subtraction)
+->>    ⤀   (partial surjection)          >+->    ⤔   (partial injection)
-->>    ↠   (total surjection)            ||      ∥   (parallel substitution)
==      ==  (definition)                  <=      ≤   (less than or equal)
/=      ≠   (inequality)                  ~       ∼   (relational inverse)
{}      ∅   (empty set)                   <=>     ⇔   (equivalence)
*       ×   (Cartesian product)           NAT     ℕ   (natural numbers)

Table B.1: Mapping B ASCII characters to mathematical operators
Appendix C
B Specifications
Semantics of UML data metamodel
MACHINE
    DataMetamodel

SETS
    CLASS ; OBJECTID ; PROPERTY ; ASSOCIATION ;
    VALUE ; TYPE ; PrimitiveTYPE

CONSTANTS
    typeOf , primitiveType , classType

PROPERTIES
    primitiveType : PrimitiveTYPE --> TYPE &
    classType     : CLASS --> TYPE &
    ran(primitiveType) /\ ran(classType) = {} &
    typeOf        : VALUE >-> TYPE

VARIABLES
    /* Class variables */
    className , isAbstract , superclass , ownedProperties ,

    /* Property variables */
    propertyName , propertyType , owningClass ,
    propertyLowerMultip , propertyUpperMultip ,
    isComposite , opposite ,

    /* Association variables */
    association , associationName , memberEnds ,

    /* Instance variables */
    extent , value , link

INVARIANT
    /* CLASS formalization */
    className       <: CLASS &
    isAbstract       : CLASS +-> BOOL &
    superclass       : CLASS +-> CLASS &
    ownedProperties  : CLASS <-> PROPERTY &

    /* PROPERTY formalization */
    propertyName        <: PROPERTY &
    isComposite          : PROPERTY +-> BOOL &
    propertyLowerMultip  : PROPERTY +-> NAT &
    propertyUpperMultip  : PROPERTY +-> NAT1 &
    owningClass          : PROPERTY +-> CLASS &
    propertyType         : PROPERTY +-> TYPE &

    /* ASSOCIATION formalization */
    associationName <: ASSOCIATION &
    association      : ASSOCIATION +-> (CLASS +-> CLASS) &
    memberEnds       : ASSOCIATION +-> (PROPERTY +-> PROPERTY) &
    opposite         : PROPERTY +-> PROPERTY &

    /* Instance-level formalization */
    extent : CLASS       +-> POW(OBJECTID) &
    value  : PROPERTY    <-> (OBJECTID +-> VALUE) &
    link   : ASSOCIATION <-> (OBJECTID +-> POW(OBJECTID)) &

    /* Name uniqueness */
    !(c1,c2).(c1 : className & c2 : className &
        c1 /= c2 & (c1|->c2) : closure(superclass) =>
        ownedProperties[{c1}] /\ ownedProperties[{c2}] = {}) &

    /* Association bidirectionality */
    /* A member end property of an association must be owned
       by a class participating in the same association */
    !(aa,p1,p2).(aa : dom(association) &
        p1 : dom(memberEnds(aa)) &
        p2 = memberEnds(aa)(p1) =>
        owningClass(p1) : dom(association(aa)) &
        owningClass(p2) : ran(association(aa))) &

    /* If a property is the opposite of another property in an association,
       the other property should be the opposite of this property */
    !(aa1,p1,p2).(aa1 : dom(memberEnds) &
        p1 : dom(memberEnds(aa1)) &
        p2 : ran(memberEnds(aa1)) &
        (p1|->p2) : opposite =>
        #(aa2).(aa2 : dom(memberEnds) &
            memberEnds(aa2) = {(p2|->p1)})) &

    /* Absence of circular inheritance */
    /* In an inheritance hierarchy, no class can generalize itself */
    closure1(superclass) /\ id(className) = {} &

    /* Instance conformance */
    /* An object can only take values for the attributes of its class
       (including its superclasses) */
    !(cc,oo,pp).(cc : className &
        oo : extent(cc) &
        pp : PROPERTY &
        oo : dom(value(pp)) &
        owningClass(pp) = cc =>
        pp : ownedProperties[(closure1(superclass) \/ id(className))[{cc}]]) &

    /* Value conformance */
    /* Each attribute is instantiated into a value and assigned to an object.
       This value needs to be consistent with the type of the attribute */
    !(cc,aa,oo).(cc : dom(extent) &
        aa : ownedProperties[{cc}] &
        oo : extent(cc) =>
        typeOf(value(aa)(oo)) = propertyType(aa)) &

    /* Link conformance */
    /* The number of objects participating in an association link must
       be permitted by the multiplicity of the corresponding member ends */
    !(aa,pp).(aa : dom(memberEnds) & pp : dom(memberEnds(aa)) =>
        (!(cc,oo).(cc : dom(association(aa)) & oo : extent(cc) =>
            card(link(aa)(oo)) >= propertyLowerMultip(pp) &
            card(link(aa)(oo)) <= propertyUpperMultip(pp)))) &

    /* Other consistency conditions */
    /* Invariant properties of owningClass */
    ran(owningClass) <: className &
    dom(owningClass) <: propertyName

INITIALISATION
    ...

OPERATIONS
    ...
END
Semantics of model evolution primitives
OPERATIONS

/* class addition */
addClass(cName, isAbst, super) =
PRE
    cName : CLASS - className &
    cName /: dom(extent) &
    cName /: dom(superclass) &
    cName /: ran(superclass) &
    cName /: dom(ownedProperties) &
    cName /: dom(isAbstract) &
    isAbst : BOOL &
    super : className
THEN
    className  := className  \/ {cName} ||
    isAbstract := isAbstract \/ {cName |-> isAbst} ||
    superclass := superclass \/ {cName |-> super} ||
    extent     := extent     \/ {cName |-> {}}
END;

/* class deletion */
deleteClass(cName) =
PRE
    cName : className &
    cName /: ran(superclass) &
    ownedProperties[{cName}] /\ ran(opposite) = {}
THEN
    ANY cObjects, cProperties, cAssociations
    WHERE
        cObjects = extent(cName) &
        cProperties = ownedProperties[{cName}] &
        cAssociations = {asso | asso : ASSOCIATION &
            (dom(association(asso)) \/ ran(association(asso))) = {cName}}
    THEN
        className       := className - {cName} ||
        isAbstract      := {cName} <<| isAbstract ||
        superclass      := {cName} <<| superclass |>> {cName} ||
        ownedProperties := {cName} <<| ownedProperties ||
        propertyName    := propertyName - cProperties ||
        propertyType    := cProperties <<| propertyType ||
        owningClass     := owningClass |>> {cName} ||
        association     := cAssociations <<| association ||
        memberEnds      := cAssociations <<| memberEnds ||
        extent          := {cName} <<| extent ||
        value           := cProperties <<| value ||
        link            := cAssociations <<| link
    END
END;

/* attribute addition */
addAttribute(cName, attrName, type, exp) =
PRE
    cName : CLASS &
    attrName : PROPERTY - propertyName &
    cName /= owningClass(attrName) &
    attrName /: ownedProperties[{cName}] &
    attrName /: dom(propertyType) &
    attrName /: dom(value) &
    type : ran(primitiveType) &
    exp : VALUE &
    typeOf(exp) = type &
    cName /: ran(superclass)
THEN
    ANY objectId
    WHERE objectId = extent(cName)
    THEN
        propertyName          := propertyName \/ {attrName} ||
        ownedProperties       := ownedProperties \/ {cName |-> attrName} ||
        owningClass(attrName) := cName ||
        propertyType          := propertyType \/ {attrName |-> type} ||
        value                 := value \/ {attrName |-> (objectId * {exp})}
    END
END;

/* attribute deletion */
deleteAttribute(cName, attrName) =
PRE
    cName : CLASS &
    attrName : PROPERTY
THEN
    propertyName    := propertyName - {attrName} ||
    ownedProperties := ownedProperties - {cName |-> attrName} ||
    owningClass     := {attrName} <<| owningClass ||
    propertyType    := {attrName} <<| propertyType ||
    value           := {attrName} <<| value
END;

/* association addition */
addAssociation(assoName, srcClass, srcProp, tgtClass, tgtProp, isComp, exp) =
PRE
    assoName : ASSOCIATION - associationName &
    srcClass : CLASS & srcClass /: dom(association(assoName)) &
    tgtClass : CLASS & tgtClass /: ran(association(assoName)) &
    srcProp : PROPERTY - propertyName &
    srcProp /: ownedProperties[{srcClass}] &
    srcClass /= owningClass(srcProp) &
    srcProp /: dom(isComposite) &
    srcProp /: dom(propertyType) &
    srcProp /: dom(opposite) &
    tgtProp : PROPERTY - propertyName &
    tgtProp /: ownedProperties[{tgtClass}] &
    tgtClass /= owningClass(tgtProp) &
    tgtProp /: dom(isComposite) &
    tgtProp /: dom(propertyType) &
    tgtProp /: dom(opposite) &
    isComp : BOOL &
    exp : OBJECTID
THEN
    ANY srcOID
    WHERE srcOID = extent(owningClass(srcProp))
    THEN
        associationName := associationName \/ {assoName} ||
        propertyName    := propertyName \/ {srcProp, tgtProp} ||
        association     := association \/
                               {assoName |-> {srcClass |-> tgtClass}} ||
        memberEnds      := memberEnds \/
                               {assoName |-> {srcProp |-> tgtProp}} ||
        ownedProperties := ownedProperties \/
                               {srcClass |-> srcProp, tgtClass |-> tgtProp} ||
        owningClass     := owningClass \/
                               {(srcProp |-> srcClass), (tgtProp |-> tgtClass)} ||
        propertyType    := propertyType \/
                               {(srcProp |-> classType(tgtClass)),
                                (tgtProp |-> classType(srcClass))} ||
        isComposite     := isComposite \/ {srcProp |-> isComp} ||
        link            := link \/ {assoName |-> (srcOID * {{exp}})}
    END
END;

/* association deletion */
deleteAssociation(assoName) =
PRE
    assoName : ASSOCIATION
THEN
    ANY srcClass, tgtClass, srcProp, tgtProp
    WHERE
        srcClass : dom(association(assoName)) &
        tgtClass : ran(association(assoName)) &
        srcProp  : dom(memberEnds(assoName)) &
        tgtProp  : ran(memberEnds(assoName))
    THEN
        associationName := associationName - {assoName} ||
        propertyName    := propertyName - {srcProp, tgtProp} ||
        association     := {assoName} <<| association ||
        memberEnds      := {assoName} <<| memberEnds ||
        propertyType    := {srcProp, tgtProp} <<| propertyType ||
        ownedProperties := ownedProperties -
                               {srcClass |-> srcProp, tgtClass |-> tgtProp} ||
        owningClass     := owningClass -
                               {srcProp |-> srcClass} - {tgtProp |-> tgtClass} ||
        link            := {assoName} <<| link
    END
END

END
Object to Relational Refinement
REFINEMENT
    Object_to_Relational
REFINES
    DataMetamodel3

VARIABLES
    classKey , ownedAttributes , ownedAssoEnds ,
    inheritedAttributes , inheritedAssoEnds ,
    propertyClass , associationTable

INVARIANT
    classKey            : CLASS >+> NAT &
    propertyClass       : PROPERTY <-> CLASS &
    ownedAttributes     <: ownedProperties &
    ownedAssoEnds       <: ownedProperties &
    inheritedAttributes : CLASS <-> PROPERTY &
    inheritedAssoEnds   : CLASS <-> PROPERTY &
    associationTable    : ASSOCIATION +-> (PROPERTY +-> PROPERTY) &

    /* linking invariants */
    dom(classKey) <: className &
    dom(classKey) = isAbstract~[{FALSE}] &

    propertyClass =
        (ownedAttributes \/ inheritedAttributes \/
         ownedAssoEnds   \/ inheritedAssoEnds)~ &

    owningClass = (ownedAttributes \/ ownedAssoEnds)~ &

    dom(association) = dom(associationTable) &

    !(aa,ae1,ae2).(aa : dom(associationTable) &
        ae1 : dom(associationTable(aa)) &
        ae2 : ran(associationTable(aa)) =>
        propertyClass[{ae1}] = dom(association(aa)) &
        propertyClass[{ae2}] = ran(association(aa))) &

    ownedAttributes \/ ownedAssoEnds = ownedProperties &
    ran(ownedAttributes) /\ ran(ownedAssoEnds) = {} &

    propertyType[ran(ownedAttributes)] = ran(primitiveType) &
    propertyType[ran(ownedAssoEnds)]   = ran(classType) &

    ownedAttributes =
        ownedProperties |> propertyType~[ran(primitiveType)] &
    ownedAssoEnds =
        ownedProperties |> propertyType~[ran(classType)] &

    dom(inheritedAttributes) <: className &
    !cc.(cc : className &
        cc /: dom(superclass)    /* top-level class */
        => inheritedAttributes[{cc}] = {}) &
    !cc.(cc : className &
        cc : dom(superclass)     /* inheriting class */
        => inheritedAttributes[{cc}] =
               ownedAttributes[closure1(superclass)[{cc}]]) &

    dom(inheritedAssoEnds) <: className &
    !cc.(cc : className &
        cc /: dom(superclass)
        => inheritedAssoEnds[{cc}] = {}) &
    !cc.(cc : className &
        cc : dom(superclass)
        => inheritedAssoEnds[{cc}] =
               ownedAssoEnds[closure1(superclass)[{cc}]])

INITIALISATION
    classKey            := {} ||
    inheritedAttributes := {} ||
    inheritedAssoEnds   := {} ||
    ownedAttributes     := {} ||
    ownedAssoEnds       := {} ||
    associationTable    := {} ||
    propertyClass       := {} ||
    ...

OPERATIONS

/* Refinement of class addition */
addClass(cName, isAbst, super) =
IF isAbst = FALSE
THEN
    ANY cKey
    WHERE cKey : NAT - ran(classKey)
    THEN
        classKey := classKey \/ {cName |-> cKey}
    END ||
    inheritedAttributes := inheritedAttributes \/
        ({cName} * (ownedAttributes
            [closure1(superclass \/ {cName |-> super})[{cName}]])) ||
    inheritedAssoEnds := inheritedAssoEnds \/
        ({cName} * (ownedAssoEnds
            [closure1(superclass \/ {cName |-> super})[{cName}]])) ||
    propertyClass := propertyClass \/
        ((inheritedAttributes[{super}]) * {cName}) \/
        (ownedAttributes[{super}] * {cName})
END;

/* Refinement of class deletion */
deleteClass(cName) =
BEGIN
    ownedAttributes     := {cName} <<| ownedAttributes ||
    ownedAssoEnds       := {cName} <<| ownedAssoEnds ||
    propertyClass       := propertyClass |>> {cName} ||
    inheritedAttributes := {cName} <<| inheritedAttributes ||
    inheritedAssoEnds   := {cName} <<| inheritedAssoEnds ||
    classKey            := {cName} <<| classKey ||
    ANY classAssos, subclassAssos
    WHERE
        classAssos = {asso | asso : ASSOCIATION &
            ((dom(associationTable(asso)) \/
              ran(associationTable(asso))) /\
             ownedAssoEnds[{cName}]) /= {}} &
        subclassAssos = {asso | asso : ASSOCIATION &
            ((dom(associationTable(asso)) \/
              ran(associationTable(asso))) /\
             inheritedAssoEnds[{cName}]) /= {}}
    THEN
        associationTable := (classAssos \/ subclassAssos) <<| associationTable
    END
END;

/* Refinement of attribute addition */
addAttribute(cName, attrName, type, exp) =
BEGIN
    ownedAttributes := ownedAttributes \/ {cName |-> attrName} ||
    inheritedAttributes := inheritedAttributes \/
        closure1(superclass~)[{cName}] * {attrName} ||
    propertyClass := propertyClass \/
        {attrName} * closure1(superclass~)[{cName}] \/
        {attrName |-> cName}
END;

/* Refinement of attribute deletion */
deleteAttribute(cName, attrName) =
BEGIN
    ownedAttributes := ownedAttributes - {cName |-> attrName} ||
    ownedAssoEnds   := ownedAssoEnds   - {cName |-> attrName} ||
    inheritedAttributes := inheritedAttributes -
        closure1(superclass~)[{cName}] * {attrName} ||
    propertyClass := propertyClass -
        {attrName} * closure1(superclass~)[{cName}]
        - {attrName |-> cName}
END;

/* Refinement of association addition */
addAssociation(assoName, srcClass, srcProp, tgtClass, tgtProp, isComp, exp) =
BEGIN
    associationTable := associationTable \/
        {assoName |-> {srcProp |-> tgtProp}} ||
    ownedAssoEnds := ownedAssoEnds \/
        {(srcClass |-> srcProp), (tgtClass |-> tgtProp)} ||
    inheritedAssoEnds := inheritedAssoEnds \/
        closure1(superclass~)[{srcClass}] * {srcProp} \/
        closure1(superclass~)[{tgtClass}] * {tgtProp} ||
    propertyClass := propertyClass \/
        {srcProp} * closure1(superclass~)[{srcClass}] \/
        {srcProp |-> srcClass} \/
        {tgtProp} * closure1(superclass~)[{tgtClass}] \/
        {tgtProp |-> tgtClass}
END;

/* Refinement of association deletion */
deleteAssociation(assoName) =
ANY srcProp, tgtProp
WHERE
    srcProp = {srcP | srcP : PROPERTY &
        srcP : dom(associationTable(assoName))} &
    tgtProp = {tgtP | tgtP : PROPERTY &
        tgtP : ran(associationTable(assoName))}
THEN
    associationTable  := {assoName} <<| associationTable ||
    ownedAssoEnds     := ownedAssoEnds |>> (srcProp \/ tgtProp) ||
    inheritedAssoEnds := inheritedAssoEnds |>> (srcProp \/ tgtProp) ||
    propertyClass     := (srcProp \/ tgtProp) <<| propertyClass
END
END
SQL Abstract Machine
MACHINE
    SQL

SETS
    TABLE ; COLUMN ; TUPLE ; Sql_TYPE ; Sql_VALUE

CONSTANTS
    idColType ,
    Sql_Default

PROPERTIES
    idColType   : NAT >-> Sql_TYPE &
    Sql_Default : Sql_VALUE

VARIABLES
    tableName , tableColumns ,
    columnName , columnType , parentTable , canBeNull , isID , isUnique ,
    primaryKey , foreignKey ,
    tuple

INVARIANT
    tableName    <: TABLE &
    tableColumns  : TABLE  <-> COLUMN &
    columnName   <: COLUMN &
    columnType    : COLUMN +-> Sql_TYPE &
    parentTable   : COLUMN +-> TABLE &
    canBeNull     : COLUMN +-> BOOL &
    isID          : COLUMN +-> BOOL &
    isUnique      : COLUMN +-> BOOL &
    primaryKey    : TABLE  <-> COLUMN &
    foreignKey    : TABLE  <-> (COLUMN --> TABLE) &
    tuple         : TABLE  <-> (COLUMN +-> Sql_VALUE)

INITIALISATION
    tableName    := {} ||
    tableColumns := {} ||
    columnName   := {} ||
    columnType   := {} ||
    parentTable  := {} ||
    canBeNull    := {} ||
    isID         := {} ||
    isUnique     := {} ||
    primaryKey   := {} ||
    foreignKey   := {} ||
    tuple        := {}

OPERATIONS

/* Basic table operations */
addTable(nn) =
PRE
    nn : TABLE - tableName
THEN
    tableName := tableName \/ {nn}
END;

alterTable(nn, col) =
PRE
    nn : tableName & col : COLUMN
THEN
    IF (nn |-> col) /: tableColumns THEN
        tableColumns := tableColumns \/ {nn |-> col}
    ELSE
        tableColumns := tableColumns - {nn |-> col}
    END
END;

updateTable(nn) =
PRE
    nn : dom(tuple)
THEN
    skip
END;

removeTable(nn) =
PRE
    nn : tableName
THEN
    tableName := tableName - {nn}
END;

/* Basic column operations */
add_id_Column(nn, tn, type) =
PRE
    nn : COLUMN - columnName &
    tn : tableName &
    type : NAT
THEN
    columnName  := columnName  \/ {nn} ||
    columnType  := columnType  \/ {nn |-> idColType(type)} ||
    parentTable := parentTable \/ {nn |-> tn} ||
    canBeNull   := canBeNull   \/ {nn |-> FALSE} ||
    isID        := isID        \/ {nn |-> TRUE} ||
    isUnique    := isUnique    \/ {nn |-> TRUE}
END;

addColumn(nn, tn, type) =
PRE
    nn : COLUMN - columnName &
    tn : tableName &
    type : Sql_TYPE
THEN
    columnName  := columnName  \/ {nn} ||
    columnType  := columnType  \/ {nn |-> type} ||
    parentTable := parentTable \/ {nn |-> tn} ||
    canBeNull   := canBeNull   \/ {nn |-> TRUE} ||
    isID        := isID        \/ {nn |-> FALSE} ||
    isUnique    := isUnique    \/ {nn |-> FALSE}
END;

removeColumn(colName, tName) =
PRE
    colName : columnName &
    tName : tableName &
    colName : tableColumns[{tName}]
THEN
    columnName  := columnName - {colName} ||
    columnType  := {colName} <<| columnType ||
    parentTable := {colName} <<| parentTable ||
    canBeNull   := {colName} <<| canBeNull ||
    isID        := {colName} <<| isID ||
    isUnique    := {colName} <<| isUnique
END;

allTableColumns <-- getColumns(tName) =
PRE
    tName : tableName
THEN
    allTableColumns := [ tableColumns[{tName}] ]
END;

/* Basic key operations */
add_pk_Key(cn, tName) =
PRE
    cn : columnName &
    tName : tableName &
    tName : parentTable[{cn}]
THEN
    primaryKey := primaryKey \/ {tName |-> cn}
END;

remove_pk_Key(tKey) =
PRE
    tKey : dom(primaryKey)
THEN
    primaryKey := {tKey} <<| primaryKey
END;

add_fk_Key(cn, t1, t2) =
PRE
    cn : columnName &
    t1 : tableName &
    t2 : tableName &
    t1 /= t2 & t1 : parentTable[{cn}]
THEN
    foreignKey := foreignKey \/ {t1 |-> {cn |-> t2}}
END;

fkOwngTables <-- getForgnKeyTables(tName) =
PRE
    tName : tableName
THEN
    ANY tables, result
    WHERE
        tables <: TABLE &
        tables = {ta | ta : TABLE & ta : dom(foreignKey) &
            #co.(co : COLUMN & co |-> tName : foreignKey(ta))} &
        result : iseq(TABLE) &
        ran(result) = tables
    THEN
        fkOwngTables := result
    END
END;

remove_fk_Key(tKey) =
PRE
    tKey : dom(foreignKey)
THEN
    foreignKey := {tKey} <<| foreignKey
END;

/* Basic value operations */
setValue(tName, colName, initialValue) =
PRE
    tName : dom(tuple) &
    colName : tableColumns[{tName}] &
    initialValue : Sql_VALUE
THEN
    tuple(tName) := {colName |-> initialValue}
END;

removeValue(tName) =
PRE
    tName : dom(tuple)
THEN
    tuple := {tName} <<| tuple
END

END
Data Migration Implementation
1
2
IMPLEMENTATION
DataMigration
3
4
5
REFINES
Object_to_Relational
6
7
8
IMPORTS
SQL
9
10
11
12
13
CONSTANTS
sqlType , propertyToColumn , tableHierarchy , mapObjectID , mapSqlValue ,
classToTable , mapClassKey ,
sqlTypeOf , assoToTable , columnHierarchy
14
15
16
17
18
19
20
21
22
23
24
PROPERTIES
mapSqlType
mapSqlValue
mapObjectID
propertyToColumn
classToTable
classToTable~
assoToTable
tableHierarchy
columnHierarchy
:
:
:
:
:
:
:
:
:
TYPE
>->
VALUE
>->
OBJECTID
-->
PROPERTY
-->
CLASS
-->
TABLE
+->
ASSOCIATION -->
CLASS +-> seq (
CLASS +-> seq (
Sql_TYPE
Sql_VALUE
Sql_VALUE
COLUMN
TABLE
CLASS
TABLE
TABLE )
COLUMN )
&
&
&
&
&
&
&
&
25
26
INVARIANT
27
28
29
30
31
/*mapping data model subclasses into corresponding SQL Tables */
!cl.(cl : className
=>
ran(tableHierarchy(cl)) = classToTable[closure1(superclass~)[{cl}]]) &
32
33
34
35
36
37
38
/*mapping data owned attributes into corresponding SQL columns */
ownedAttributes = { cc, att |
cc : CLASS &
att : PROPERTY &
cc |-> att : ownedAttributes &
classToTable(cc) |-> propertyToColumn(att) : tableColumns } &
39
40
41
42
43
/*mapping data inherited attributes into corresponding SQL columns */
inheritedAttributes = { cc, att | cc : CLASS &
att : PROPERTY &
classToTable(cc) : ran(tableHierarchy(cc)) &
209
44
propertyToColumn(att) : tableColumns[{classToTable(cc)}] } &
45
46
47
48
49
propertyClass = { pp, cc | pp : PROPERTY &
cc : CLASS &
pp |-> cc : propertyClass &
propertyToColumn(pp) |-> classToTable(cc) : parentTable } &
50
51
52
53
54
55
/*Each non-abstract class in the data model should be
represented as a table in the SQL model*/
tableName <: classToTable [ className ] &
! cl . (cl:className & cl : dom (classKey) => cl : dom(classToTable ) ) &
56
57
58
59
60
61
62
63
/*column names, execluding id column names, are a subset of property names
in the abstract data model*/
columnName <: propertyToColumn [ propertyName ] &
! colName . ( colName : columnName &
isID ( colName ) = FALSE
=>
colName : propertyToColumn [ propertyName ] ) &
64
65
66
67
68
69
70
71
72
73
/* value of the id column in SQL must match the value of the class key
in the object model*/
! (cc , col) . (cc : dom ( classKey) &
col : tableColumns [ classToTable [{cc}]] &
isID (col) = TRUE
=>
sqlTypeOf ( idColType (classKey ( cc))) =
tuple (classToTable (cc)) (col)) &
74
75
76
77
78
79
80
81
82
83
84
85
86
87
/* association member ends in the abstract data model become
foreign keys in an association table in the SQL model*/
! ( assoName , me1 , me2 ) . ( assoName : dom ( memberEnds ) &
me1 : dom (memberEnds (assoName)) &
me2 : ran (memberEnds (assoName))
=>
assoToTable (assoName ) : dom ( foreignKey) &
# ff . ( ff : foreignKey [ { assoToTable ( assoName)}] &
propertyToColumn ( me1 ) |-> classToTable ( owningClass ( me1)) : ff) &
# ff . ( ff : foreignKey [ { assoToTable ( assoName ) } ] &
propertyToColumn ( me2) |-> classToTable ( owningClass ( me2)) : ff) &
propertyToColumn ( me1 ) |-> classToTable ( owningClass ( me1)) :
union ( foreignKey [ { assoToTable ( assoName)}])) &
88
89
90
/*all table names in an SQL model are mapped either from
data model classes, represented by (dom(extent)) or
210
91
92
93
from data model associations*/
dom ( tuple ) =
classToTable [ dom ( extent ) ] \/ assoToTable [ dom ( link ) ] &
94
95
96
97
98
99
100
/*a value in the abstract data model is mapped to a corresponding
tuple value in SQL model*/
! ( pp , oid ) . ( pp : dom ( value) &
oid : dom ( value ( pp ) ) =>
mapSqlValue ((value (pp) (oid))) =
union (tuple[classToTable [propertyClass[{pp}]]])(propertyToColumn(pp )))
101
102
103
104
/* a link in the abstract data model is mapped to a tuple in an association table
where object ids that make up the link in the data model are mapped
to corresponding foreign keys */
105
106
107
108
109
110
111
112
113
114
!(nn,o1,o2).(nn : dom(link) &
o1 : dom(link(nn)) &
o2 : ran(link(nn))
=>
assoToTable(nn) : dom(tuple) &
mapObjectID(o1)) =
tuple( assoToTable(nn) )[ foreignKey(assoToTable(nn)) ] &
mapObjectID[o2])) =
tuple(assoToTable(nn)[foreignKey[assoToTable(nn)]]) )
115
116
117
OPERATIONS
118
119
/*****************Implementation of class addition***********/
120
121
122
123
124
125
126
127
128
129
130
131
addClass (cName , isAbst , super) =
BEGIN
VAR
tName, colName, colType, counter, curr_column
IN
tName := classToTable(cName);
colName := tName;
colType := mapClassKey(cName);
inh_Columns := columnHierarchy (super);
counter := 1 ;
curr_column := inh_Columns (counter);
132
133
134
135
addTable (tName);
add_id_Column (colName,tName,colType);
add_pk_Key (colName,tName);
136
137
WHILE
211
counter <= size (inh_Columns)
138
DO
139
alterTable (tName, curr_column);
addColumn (colName,tName, colType);
counter := counter + 1
INVARIANT
primaryKey [{tName}] \/
propertyToColumn [inh_Columns [1..counter]] <:
tableColumns[{tName}]
VARIANT
card (inh_Columns) - counter
END
140
141
142
143
144
145
146
147
148
149
END
150
151
END;
152
153
154
/*****************Implementation of class deletion***********/
deleteClass ( cName ) =
155
156
157
BEGIN
VAR
tName,fk_owng_Tables, fk_column, count_fkTable
158
159
160
161
162
163
IN
fk_owng_Tables
tName
fk_column
count_fkTable
:=
:=
:=
:=
getFKTables (tName);
classToTable(cName);
foreignKey(curr_fkTable)
1;
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
/* loop to de-link foreign keys */
WHILE
count_fkTable < = size (fk_owng_Tables)
DO
VAR
curr_fkTable
IN
curr_fkTable := fk_owng_Tables (count_fkTable) ;
remove_fk_Key (curr_fkTable );
alterTable (curr_fkTable, fk_column) ;
removeColumn (fk_column , curr_fkTable);
count_fkTable := count_fkTable + 1
END
INVARIANT
!(ta1,ta2,col) . (ta1:dom(foreignKey) &
(col|->ta2):foreignKey[{ta1}] =>
(ta1|->col) : tableColumns
VARIANT size (fk_owng_Tables) - counter
END
END;
212
185
186
/* loop to remove table columns */
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
VAR
allTableColumns, count_column
IN
allTableColumns := getTableColumns(classToTable(cName));
count_column
:= 1;
WHILE
count_column < size(allTableColumns)
DO
VAR curr_column
IN
curr_column := allTableColumns (count_column);
alterTable (tName);
removeColumn (tName, curr_column);
count_column := count_column +1
END
INVARIANT
!(ta, col) . (ta : dom(tableColumns) &
col : tableColumns[{ta}] =>
col |-> ta : parentTable)
VARIANT
card (allTableColumns) - count_column
END
END
211
212
213
214
remove_pk_Key (tName);
remove_id_Column (tName, colName);
removeTable(tName)
215
216
END;
217
218
/*****************Implementation of attribute addition***********/
219
220
addAttribute (cName, attrName, type, exp) =
BEGIN
  VAR
    inh_Tables, tName, colName, colType, initialValue
  IN
    inh_Tables := tableHierarchy(cName);
    tName := classToTable(cName);
    colName := propertyToColumn(attrName);
    colType := sqlType(type);
    initialValue := translate(exp);

    alterTable (tName, colName);
    addColumn (colName, tName, colType);
    updateTable (tName);
    setValue (colName, initialValue);

    VAR
      counter
    IN
      counter := 1;
      WHILE
        counter <= size(inh_Tables)
      DO
        VAR
          current_table
        IN
          current_table := inh_Tables(counter);
          alterTable (current_table, colName);
          updateTable (current_table);
          setValue (colName, initialValue);
          counter := counter + 1
        END
      INVARIANT
        counter : 1..size(inh_Tables)+1 &
        inh_Tables = tableHierarchy(cName) &
        ran(inh_Tables) =
          classToTable[closure1(superclass~)[{cName}]] &
        {cc,att | cc: CLASS & att: PROPERTY &
          att: ownedAttributes[closure1(superclass)[{cc}]] &
          propertyToColumn(att): (tableColumns \/
            {classToTable(cName) |-> propertyToColumn(attrName)})
            [{classToTable(cc)}]}
        =
        inheritedAttributes \/ (inh_Tables;classToTable~)[1..counter-1]*{attrName}
      VARIANT
        size(inh_Tables) - counter
      END
    END
  END
END;
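/* Editorial illustration (hypothetical names, type and initial value): adding
   an attribute salary with initial value 0 to a class mapped to table Employee
   corresponds to SQL of roughly the form:

       ALTER TABLE Employee ADD COLUMN salary INTEGER;
       UPDATE Employee SET salary = 0;

   followed by the same ALTER/UPDATE pair for every table in inh_Tables, i.e.
   the tables of the subclasses, matching the loop above. */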
/*****************Implementation of attribute deletion***********/
deleteAttribute (cName, attrName) =
BEGIN
  VAR
    inh_Tables, tName, colName, counter, current_table
  IN
    inh_Tables := tableHierarchy(cName);
    tName := classToTable(cName);
    colName := propertyToColumn(attrName);
    counter := 1;

    alterTable (tName, colName);
    removeColumn (colName, tName);

    WHILE
      counter <= size(inh_Tables)
    DO
      current_table := inh_Tables(counter);
      alterTable (current_table, colName);
      removeColumn (colName, current_table);
      updateTable (current_table);
      counter := counter + 1
    INVARIANT
      counter : 1..size(inh_Tables)+1 &
      inh_Tables = tableHierarchy(cName) &
      ran(inh_Tables) =
        classToTable[closure1(superclass~)[{cName}]] &
      {cc,att | cc: CLASS & att: PROPERTY &
        att: ownedAttributes[closure1(superclass)[{cc}]] &
        propertyToColumn(att): (tableColumns -
          {classToTable(cName) |-> propertyToColumn(attrName)})
          [{classToTable(cc)}]}
      =
      inheritedAttributes -
      (inh_Tables;classToTable~)[1..counter-1]*{attrName}
VARIANT
card ( inh_Tables ) - counter
END
END
END ;
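/* Editorial illustration (hypothetical names): deleting the attribute salary
   from a class mapped to table Employee corresponds to

       ALTER TABLE Employee DROP COLUMN salary;

   repeated for the owning class's table and for every subclass table in
   inh_Tables. */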
Appendix D
Proof activities
Abstract machine proofs
The machine corresponding to the evolution metamodel has been proved with AtelierB [13]. As we explained in Chapter 5, Section 5.4.1, from a theoretical point of view, one proof obligation is generated for each substitution statement of the abstract machine operations. This proof obligation verifies that the execution of the operation maintains the abstract machine invariant. To facilitate the establishment of this proof obligation, the prover applies simplification rules that decompose it into a number of smaller proofs, each of which is easier to discharge than one large proof per operation.
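For an operation with precondition P and substitution S, defined in a machine with invariant I, this obligation takes the standard B form:

I ∧ P ⇒ [S] I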
To carry out these proofs, AtelierB [13] includes two complementary proof tools. The first is automatic, implementing a decision procedure that uses a set of deduction and rewriting rules. The second allows users to enter into a dialogue with the automatic prover by defining their own deduction and/or rewriting rules, which guide the prover towards discharging the proofs. To ensure the correctness of the interactive proofs, each added rule must itself be proved by the AtelierB prover before it can be used.
To ensure the correctness of the abstract specifications, we have established the correctness of each abstract evolution operation using the AtelierB prover. About 78 proof obligations were raised. They concern the consistency constraints (formalized as abstract machine invariants) and the model edits (formalized as abstract machine operations), as described in Sections 5.2 and 5.3 respectively. With AtelierB (version 4.0), about 76% of these proofs were discharged automatically. Proof obligations involving the transitive closure operator (which we use to represent the inheritance hierarchy in the data model) were not discharged automatically, because the prover lacks inference rules for this operator.
Below, we present a representative sample of the proof obligations that we had to prove manually. In each example, we first present the proof obligation generated by the prover, followed by a brief explanation of the basis on which it was discharged. At the end of this appendix, we present a detailed example showing, step by step, how a particular proof obligation was discharged using the AtelierB prover.
addClass. Proof Obligation No.5
Local hypotheses &
c1: className \/ {cName} &
c2: className \/ {cName} &
not(c1 = c2) &
c1|->c2: closure1(superclass \/ {cName|->super}) &
"Check that the invariant
(!(c1,c2).(c1: className & c2: className &
c1/=c2 & c1|->c2: closure(superclass) =>
ownedProperties(c1) /\ ownedProperties(c2) = {}))
is preserved by addClass operation"
=>
(ownedProperties \/ {cName|->{}})(c1) /\ (ownedProperties \/ {cName|->{}})(c2) = {}
Table D.1: PO of addClass edit vs. property name uniqueness
The goal of this proof obligation is to show that the property name uniqueness invariant, which enforces the uniqueness of property names within a class's inheritance hierarchy in the data model, is preserved by the addClass model edit. Figure D.1 clarifies the goal of this proof obligation. Figure D.1(a) shows an invalid state space: property ax is already defined in class C3, a superclass of class C1, so ax is inherited by C1 and must not be defined in C1 again. Figure D.1(b), on the other hand, shows a valid state space: although property ax is defined in classes C4 and C5, these two classes are not in the inheritance hierarchy of C1, so property name uniqueness is not violated.
[Class diagrams omitted: (a) ax declared in both C1 and its superclass C3 (invalid); (b) ax declared in C4 and C5, outside C1's hierarchy (valid)]
Figure D.1: Illustration of property name uniqueness
Proof. We discharge this proof obligation using case analysis. If c1 = cName, then c1 is a fresh class name being added by the addClass operation and, as such, c1 did not have any properties before the operation executed; the function application therefore yields the empty set. If c1 /= cName, then c1 existed in the data model before the operation executed. Since the invariant was preserved before the operation executed, we can safely assume that cName /: dom(ownedProperties). The same case analysis applies to variable c2. In summary, if neither c1 nor c2 is cName, then their old values continue to hold; if one of them is cName, then the function application yields the empty set, which shows that the invariant holds.
addClass. Proof Obligation No.6
Check that the invariant
(closure1(superclass)/\id(className) = {})
is preserved by addClass operation
=>
closure1(superclass \/ {cName |->super}) /\
id(className \/ {cName})
= {}
Table D.2: addClass edit vs. absence of circular inheritance invariant
The goal of this proof obligation is to demonstrate that the absence of circular inheritance invariant is not violated by the addClass edit. Note that in the abstract machine, we use the variable superclass, defined as a partial function from CLASS to CLASS, to identify the immediate superclass of a data model class. We use the transitive closure, denoted by closure1, applied to the superclass function to identify the other superclasses in the inheritance hierarchy. Assuming that the absence of circular inheritance invariant and the precondition of the addClass operation hold, we would like to prove that the invariant still holds after the operation is executed.
Proof. We first apply deduction. The new goal becomes:
closure1(superclass \/ {cName|->super}) /\
id(className \/ {cName}) = {}
Since the AtelierB prover does not have enough inference rules to support discharging proof obligations involving transitive closure, a new rule is added to the prover to assert that closure1 of superclass is a tree that has no cycles, and that the newly added class is a new leaf (i.e. not a superclass itself). The added rule is stated as follows:
1   closure1(f) /\ id(E) = {} &
2   not(x: dom(f)) &
3   not(x = y) =>
4   closure1(f \/ {x|->y}) /\ id(E \/ {x}) = {}
This states that if the goal to be proved has the form specified in line 4 of the listing above, then the goal can be proved on the basis of the hypotheses stated in lines 2-3. With the help of the interactive prover, the hypothesis in line 2 can be mapped to the precondition cName /: dom(superclass), and the hypothesis in line 3 to cName : CLASS - className & super: className. Accordingly, the goal is proved.
addClass. Proof Obligation No.8
Local hypotheses &
cc : dom(extent \/ {cName|->{}}) &
aa : (ownedProperties \/ {cName|->{}})(cc) &
oo : (extent \/ {cName|->{}})(cc) &
"Check that the invariant
(!(cc,aa,oo).(cc : dom(extent) & aa : ownedProperties(cc) &
oo : extent(cc) => typeOf(value(aa)(oo)) = propertyType(aa)))
is preserved by addClass operation" =>
typeOf(value(aa)(oo)) = propertyType(aa)
Table D.3: addClass edit vs. value conformance
The purpose of this proof obligation is to show that the type of each value assigned to an object in the data model matches the type given to the attribute which the value instantiates. For example, we could have an attribute age : NAT in class Person. When the class is instantiated into objects (for example, Person is instantiated into object p1), these objects can be assigned values corresponding to their class attributes (e.g., p1 can have the value 27 for the age attribute).
Proof. By case analysis. If cc = cName, we obtain a false hypothesis for variable aa since, in this case, the function application would type aa by the empty set; the same can be said about variable oo. Based on these two false hypotheses, we discharge this case. If cc /= cName, this reflects the situation prior to the execution of the operation, and, hence, we can instantiate all the universally quantified variables of the invariant (i.e. cc, aa, oo) using hypotheses already existing in the machine.
deleteAttribute. Proof Obligation No.5
This proof obligation ensures that the typing invariant of the machine variable value is not violated by attribute deletion.
"Check that the invariant
(value: PROPERTY +-> (OBJECTID +-> VALUE))
is preserved by deleteAttribute operation =>
{attrName}<<|value: PROPERTY +-> (OBJECTID +-> VALUE)
Table D.4: deleteAttribute edit vs. value conformance
Proof. After deduction, the goal becomes:
{attrName} <<| value: PROPERTY <-> (OBJECTID +-> VALUE)
Since dom(a): POW(s) & ran(a): POW(t) => a : s <-> t, this goal can be rewritten into two sub-goals:
Subgoal-1 : dom({attrName} <<| value) <: PROPERTY and
Subgoal-2 : ran({attrName} <<| value) <: OBJECTID +-> VALUE
Subgoal-1 can be simplified into dom(value) - {attrName} <: PROPERTY, which can be further simplified to dom(value) <: PROPERTY \/ {attrName}, since a: POW(c\/b) => a-b: POW(c). This subgoal can now be rewritten as dom(value) <: PROPERTY and discharged, since it exists in the machine hypotheses. Subgoal-2 can be simplified into:
ran({attrName} <<| value) <: OBJECTID +-> dom(typeOf),
using the equality VALUE = dom(typeOf) from the machine. This subgoal can be discharged since ran(r): POW(b) => ran(a <<| r): POW(b).
addAssociation. Proof Obligation No.16
Local hypotheses &
srcOID = extent(owningClass(srcProp)) &
aa: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
pp: dom((memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa)) &
cc$0: dom((association\/{assoName|->{srcClass|->tgtClass}})(aa)) &
oo$0: extent(cc$0) &
Check that the invariant
(!(aa,pp).(aa: dom(memberEnds) &
pp: dom(memberEnds(aa)) =>
!(cc,oo).(cc: dom(association(aa)) &
oo: extent(cc) => card(link(aa)(oo))>=propertyLowerMultip(pp) &
card(link(aa)(oo))<=propertyUpperMultip(pp))))
is preserved by addAssociation operation
=>
card((link\/{assoName|->srcOID*{{exp}}})(aa)(oo$0))<=propertyUpperMultip(pp)
Table D.5: addAssociation edit vs. link conformance (cardinality)
The aim of this invariant is to ensure association link consistency. In the data model, each association is instantiated into a link. For example, the association named worksFor between class Employee and class Department is instantiated into a link, which we may call wf. This link will link objects of the Employee class (e.g. e1, e2, e3) to objects of the Department class (e.g. d1, d2, d3). The cardinality of wf is determined by the multiplicity of the member ends of the association of which wf is an instance (in this case, worksFor). In this example, worksFor, which is an association name, has two member ends: the employees property of the Employee class and the department property of the Department class. Each member end has a lower and an upper multiplicity. Assume that the lower and upper multiplicity of the department end are 1 and 3 respectively; this means that an employee object can be linked to a minimum of 1 and a maximum of 3 department objects. In this proof obligation, we need to verify that the addAssociation operation does not cause the number of linked objects to exceed the upper multiplicity of the association ends.
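As an editorial sketch (table and column names are hypothetical), the worksFor association of this example would be represented in the SQL model by an association table of the form:

    CREATE TABLE WorksFor (
      employee_id   INTEGER NOT NULL REFERENCES Employee(id),
      department_id INTEGER NOT NULL REFERENCES Department(id),
      PRIMARY KEY (employee_id, department_id)
    );

Note that an upper multiplicity such as "at most 3 departments per employee" has no direct counterpart in this table definition, which is one reason the property is verified at the model level by proof obligations such as this one.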
Proof. We discharge this proof using case analysis. We first assume that aa is a new association introduced by the addAssociation operation, i.e. aa = assoName. In this case, the operation does not affect (increase or decrease) the number of objects of the owning class of the association end pp: the number of these objects is determined outside the operation and read into the operation using srcOID = extent(owningClass(srcProp)). In addition, under the same assumption, the operation does not affect the upper multiplicity of the association end pp. Since the operation affects neither the cardinality of linked objects nor the upper multiplicity of the property, we discharge the proof under this case. If aa /= assoName, then aa corresponds to an association name already existing in the data model. Since we assume that the invariant holds prior to the execution of the operation, we can discharge the proof under this case too.
Refinement proofs
As outlined in Section 2.4 in Chapter 2 (and demonstrated in Chapter 5, Section 5.4.2), the invariants of the refinement machines do not only define the new variables introduced in refinement but also relate those variables to the variables defined in the abstract machine (i.e. they glue the two state spaces together). Accordingly, for a refinement step to be valid, we need to ensure that every possible execution of the refinement machine corresponds to some execution of the abstract machine. In the following proof activity, we demonstrate that each of our refined evolution operations is a valid refinement of its corresponding operation at the abstract machine level. In other words, our main task is to discharge the following proof obligation:
I ∧ J ∧ P ⇒ [S1] ¬[S] ¬J
where I denotes the invariant of the abstract machine; J denotes the invariant of the refinement machine; P denotes the precondition of the abstract operation; and [S1] and [S] denote the execution of the refinement machine and of the abstract machine respectively.
Since we assume that the abstract machine substitution [S] is called within its precondition, and given that ¬[S]¬J means that there exists at least one execution of S that establishes J, our main focus is to prove that each refinement operation satisfies the relevant linking invariants in the refinement machine, which define the properties of the new variables in relation to the abstract variables. Once such a proof is established, we can conclude that the refined substitution is developed in accordance with the abstract specification.
To ensure the correctness of the refinement process we performed, we have established the correctness of each refinement rule using the AtelierB prover [13]. About 120 refinement proof obligations were raised. With AtelierB (version 4.0), about 70% of these proofs were discharged automatically, but this concerns only the easier proofs. The remaining proofs are rather hard and time-consuming. Fortunately, the generic nature of our refinement rules made it possible to define proof tactics that enable us to automate the refinement proofs: once the proof of a generic refinement rule has been obtained, it can be reused in other instantiations of the rule. Below, we describe some of the proofs we had to perform manually using the AtelierB prover.
addAssociation. Proof Obligation No.2
This proof obligation is raised to verify whether the addAssociation operation preserves the following linking invariant:
owningClass = (ownedAttributes \/ ownedAssoEnds)~
which relates the owningClass variable, defined in the abstract machine, to the inverse of the union of two relations: ownedAttributes and ownedAssoEnds. In other words, we need to ensure that any owned attribute or association end in the refined data model is still owned by a data model class, and that this remains the case after performing the refined substitution of the addAssociation operation. The goal we need to prove is:
"‘addAssociation preconditions in this component’" &
assoName: ASSOCIATION &
not(assoName: associationName) &
srcClass: CLASS &
not(srcClass: dom(association(assoName))) &
tgtClass: CLASS &
not(tgtClass: ran(association(assoName))) &
srcProp: PROPERTY &
"‘Check that the invariant
(owningClass = (ownedAttributes \/ ownedAssoEnds)~)
is preserved by addAssociation operation’"
=>
(ownedAttributes \/ (ownedAssoEnds \/
{srcClass|->srcProp,tgtClass|->tgtProp}))~ =
owningClass \/ {srcProp|->srcClass,tgtProp|->tgtClass}
Table D.6: PO No.2 of addAssociation refined operation
(ownedAttributes \/ (ownedAssoEnds \/
{srcClass|->srcProp,tgtClass|->tgtProp}))~ =
owningClass \/ {srcProp|->srcClass,tgtProp|->tgtClass}
⇔
ownedAttributes~ \/ (ownedAssoEnds~ \/
{srcProp|->srcClass,tgtProp|->tgtClass}) =
owningClass \/ {srcProp|->srcClass,tgtProp|->tgtClass}
By replacing (ownedAttributes \/ ownedAssoEnds)~ by ownedAttributes~ \/ ownedAssoEnds~ in the LHS of the equality (inverse of union is union of inverses)
⇔
ownedAttributes~ \/ (ownedAssoEnds~ \/
{srcProp|->srcClass,tgtProp|->tgtClass}) =
(ownedAttributes \/ ownedAssoEnds)~ \/
{srcProp|->srcClass,tgtProp|->tgtClass}
By replacing owningClass by its definition from the linking invariant
⇔
ownedAttributes~ \/ (ownedAssoEnds~ \/
{srcProp|->srcClass,tgtProp|->tgtClass}) =
ownedAttributes~ \/ ownedAssoEnds~ \/
{srcProp|->srcClass,tgtProp|->tgtClass}
By replacing (ownedAttributes \/ ownedAssoEnds)~ by ownedAttributes~ \/ ownedAssoEnds~ in the RHS of the equality (inverse of union is union of inverses), the two sides become identical up to the associativity of union, and the goal is discharged.

"‘deleteClass preconditions in this component’" &
cName: className &
not(cName: ran(superclass)) &
ownedProperties[{cName}] /\ ran(opposite) = {} &
"‘Local hypotheses’" &
classAssos = {asso | asso: ASSOCIATION &
not((dom(associationTable(asso)) \/
ran(associationTable(asso))) /\
ownedAssoEnds[{cName}] = {})} &
subclassAssos = {asso | asso: ASSOCIATION &
not((dom(associationTable(asso)) \/
ran(associationTable(asso))) /\
inheritedAssoEnds[{cName}] = {})} &
"‘Check that the invariant
(propertyClass = (ownedAttributes \/ inheritedAttributes \/
ownedAssoEnds \/ inheritedAssoEnds)~)
is preserved by deleteClass operation’"
=>
propertyClass |>> {cName} =
({cName} <<| ownedAttributes \/
({cName} <<| inheritedAttributes) \/
({cName} <<| ownedAssoEnds) \/
({cName} <<| inheritedAssoEnds))~
Table D.7: PO No.13 of deleteClass refined operation
deleteClass. Proof Obligation No.13
This proof obligation is raised to verify whether the deleteClass refinement operation preserves the following linking invariant on the propertyClass variable:
propertyClass = (ownedAttributes \/ inheritedAttributes \/
ownedAssoEnds \/ inheritedAssoEnds)~
where propertyClass is a refinement variable defined as a relation from PROPERTY to CLASS and used to refine the abstract variable owningClass, which was defined as a partial function from PROPERTY to CLASS. The reason for this refinement step is the flattening of inheritance: in the Object-to-Relational refinement, the same property can be owned by multiple classes in the data model (see Section 6.1.2). The goal we have is:
propertyClass |>> {cName} =
({cName} <<| ownedAttributes \/
({cName} <<| inheritedAttributes) \/
({cName} <<| ownedAssoEnds) \/
({cName} <<| inheritedAssoEnds))~
⇔
(ownedAttributes \/ inheritedAttributes \/
ownedAssoEnds \/ inheritedAssoEnds)~ |>> {cName} =
(ownedAttributes~ |>> {cName}) \/
(inheritedAttributes~ |>> {cName}) \/
(ownedAssoEnds~ |>> {cName}) \/
(inheritedAssoEnds~ |>> {cName})
By using the definition of propertyClass in the linking invariant, together with the identity ({cName} <<| r)~ = r~ |>> {cName}
⇔
(ownedAttributes~ \/ inheritedAttributes~ \/
ownedAssoEnds~ \/ inheritedAssoEnds~) |>> {cName} =
(ownedAttributes~ |>> {cName}) \/ (inheritedAttributes~ |>> {cName}) \/
(ownedAssoEnds~ |>> {cName}) \/ (inheritedAssoEnds~ |>> {cName})
By replacing (a \/ b)~ by (a~ \/ b~)
⇔
(ownedAttributes~ |>> {cName}) \/ (inheritedAttributes~ |>> {cName}) \/
(ownedAssoEnds~ |>> {cName}) \/ (inheritedAssoEnds~ |>> {cName}) =
(ownedAttributes~ |>> {cName}) \/ (inheritedAttributes~ |>> {cName}) \/
(ownedAssoEnds~ |>> {cName}) \/ (inheritedAssoEnds~ |>> {cName})
By replacing (a \/ b) |>> c by (a |>> c) \/ (b |>> c), which makes both sides identical and discharges the goal.
addAttribute. Proof Obligation No.2
"‘addAttribute preconditions in this component’" &
cName: CLASS &
cName: className &
attrName: PROPERTY &
not(attrName: propertyName) &
not(cName = owningClass(attrName)) &
not(attrName: ownedProperties[{cName}]) &
not(attrName: ownedProperties[closure1(superclass)[{cName}]]) &
not(attrName: dom(propertyType)) &
not(attrName: dom(value)) &
type: ran(primitiveType) &
exp: VALUE &
typeOf(exp) = type &
not(cName: ran(superclass )) &
"‘Check that the invariant
(owningClass = (ownedAttributes \/ ownedAssoEnds)~)
is preserved by addAttribute " &
=>
(ownedAttributes \/ {cName|->attrName} \/
ownedAssoEnds )~ =
owningClass <+ {attrName|->cName}
Table D.8: PO No.2 of addAttribute refined operation
This proof obligation is raised to verify whether the addAttribute refinement operation preserves the linking invariant on owningClass, as stated in Table D.8. The initial goal is:
(ownedAttributes \/ {cName|->attrName} \/ ownedAssoEnds)~ =
owningClass <+ {attrName|->cName}
⇔
(ownedAttributes \/ {cName|->attrName} \/ ownedAssoEnds)~ =
(ownedAttributes \/ ownedAssoEnds)~ <+ {attrName|->cName}
By using the equality defining owningClass.
This equality is true because attrName is not an existing attribute in the data model (otherwise the union operator \/ and the overriding operator <+ would not be equivalent). This can be proved using the hypotheses:
dom(owningClass) <: propertyName    (from the invariant)
not(attrName: propertyName)         (from the preconditions)
addClass. Proof Obligation No.7
"‘addClass preconditions in this component’" &
cName: CLASS &
not(cName: className) &
not(cName: dom(extent)) &
not(cName: dom(superclass)) &
...
"‘Local hypotheses’" &
isAbst = FALSE &
cKey: INTEGER &
not(cKey: ran(classKey)) &
"‘Check that the invariant
(inheritedAssoEnds: CLASS <-> PROPERTY)
is preserved by addClass operation" &
=>inheritedAssoEnds \/
{cName}*ownedAssoEnds [{closure1(superclass) (cName)}]:
CLASS <-> PROPERTY
Table D.9: PO 7 of addClass refinement operation
The main purpose of this proof obligation is to ensure that the addClass refinement operation does not violate the typing invariant of the inheritedAssoEnds variable after performing the substitution. After deduction, the goal becomes:
inheritedAssoEnds \/
{cName} * ownedAssoEnds[{closure1(superclass)(cName)}] :
CLASS <-> PROPERTY
⇔
inheritedAssoEnds : CLASS <-> PROPERTY &
{cName} * ownedAssoEnds[{closure1(superclass)(cName)}] :
CLASS <-> PROPERTY
By applying a: s <-> t & b: s <-> t => a\/b: s <-> t
The first part of the goal, inheritedAssoEnds: CLASS <-> PROPERTY, is discharged because it exists in the hypotheses. The second part of the goal can be rewritten as:
not({cName} = {}) &
not(ownedAssoEnds[{closure1(superclass)(cName)}] = {}) =>
{cName} <: CLASS &
ownedAssoEnds[{closure1(superclass)(cName)}] <: PROPERTY
By applying the rule (not(a = {}) & not(b = {}) &
a: POW(s) & b: POW(t)) => a*b: s <-> t
After deduction, the goal becomes:
{cName} <: CLASS &
ownedAssoEnds[{closure1(superclass)(cName)}] <: PROPERTY
The first part of this goal, {cName} <: CLASS, can be simplified to cName: CLASS and discharged because it exists in the hypotheses. The second part of the goal,
ownedAssoEnds[{closure1(superclass)(cName)}] <: PROPERTY
can be rewritten as:
(ownedProperties |> propertyType~[ran(classType)])
[{closure1(superclass)(cName)}] <: PROPERTY
by using the definition ownedAssoEnds = ownedProperties |> propertyType~[ran(classType)] from the linking invariants of the refinement machine.
⇔
ownedProperties[{closure1(superclass)(cName)}] /\
propertyType~[ran(classType)] <: PROPERTY
By applying binhyp(a: POW(c)) => a /\ b: POW(c), this goal can be discharged because both parts of the set intersection exist in the hypotheses.
addClass. Proof Obligation No.9
"‘Check that the invariant
(dom(classKey) = isAbstract~[{FALSE}])
is preserved by addClass operation " &
=>
dom(classKey\/{cName|->cKey}) =
(isAbstract\/{cName|->isAbst})~[{FALSE}]
Table D.10: PO 9 of addClass refinement operation
The main purpose of this proof obligation is to ensure that the addClass refinement operation does not violate the linking invariant on the classKey variable. After deduction, the goal is:
dom(classKey \/ {cName|->cKey}) =
(isAbstract \/ {cName|->isAbst})~[{FALSE}]
⇔
dom(classKey) \/ {cName} =
(isAbstract~ \/ {isAbst|->cName})[{FALSE}]
By distributing the dom operator and the inverse operator over the set unions
⇔
isAbstract~[{FALSE}] \/ {cName} =
(isAbstract~ \/ {FALSE|->cName})[{FALSE}]
By using dom(classKey) = isAbstract~[{FALSE}] from its definition in the linking invariants of the refinement machine, and isAbst = FALSE from the local hypotheses
⇔
This goal can be rewritten as:
FALSE: {FALSE} =>
isAbstract~[{FALSE}] \/ {cName} =
isAbstract~[{FALSE}] \/ {cName}
&
not(FALSE: {FALSE}) =>
isAbstract~[{FALSE}] \/ {cName} =
isAbstract~[{FALSE}]
The first part of the goal can obviously be discharged. Applying deduction, the second goal becomes:
isAbstract~[{FALSE}] \/ {cName} = isAbstract~[{FALSE}]
which can also be discharged because its antecedent not(FALSE: {FALSE}) is a contradictory hypothesis.
deleteAssociation. Proof Obligation No.13
This proof obligation is raised to verify that the deleteAssociation refinement operation does not violate the linking invariant on the inheritedAssoEnds refinement variable. In particular, we would like to ensure that, for a class with no superclass, inheritedAssoEnds remains empty after deleting an association in which the class participates (either as the source class or the target class of the association). After deduction, the goal can be stated as:
"deleteAssociation preconditions in this component" &
assoName: ASSOCIATION &
"‘Local hypotheses’" &
srcProp = {srcP | srcP: PROPERTY &
srcP: dom(associationTable (assoName))} &
tgtProp = {tgtP | tgtP: PROPERTY &
tgtP: ran (associationTable(assoName))} &
cc: className &
not(cc: dom(superclass)) &
"‘Check that the invariant
(!cc.(cc: className & cc/:dom(superclass) => inheritedAssoEnds[{cc}] = {}))
is preserved by deleteAssociation operation "
=>
(inheritedAssoEnds|>>(srcProp\/tgtProp))[{cc}] = {}
Table D.11: PO 13 of deleteAssociation refinement operation
(inheritedAssoEnds |>> (srcProp \/ tgtProp))[{cc}] = {}
This goal can be simplified to:
inheritedAssoEnds[{cc}] - srcProp <: tgtProp
where tgtProp is a set defined as:
tgtProp = {tgtP | tgtP: PROPERTY & tgtP: ran(associationTable(assoName))}
⇔
inheritedAssoEnds[{cc}] <: tgtProp \/ srcProp
By applying a: POW(c\/b) => a-b: POW(c).
⇔
inheritedAssoEnds[{cc}] <: {tgtP | tgtP: PROPERTY &
tgtP: ran(associationTable(assoName))} \/ srcProp
By applying the equality of tgtProp.
⇔
inheritedAssoEnds[{cc}] <:
{tgtP | tgtP: PROPERTY &
tgtP: ran(associationTable(assoName))} \/
{srcP | srcP: PROPERTY &
srcP: dom(associationTable(assoName))}
By applying the equality of srcProp.
⇔
{} <: {tgtP | tgtP: PROPERTY &
tgtP: ran(associationTable(assoName))} \/
{srcP | srcP: PROPERTY &
srcP: dom(associationTable(assoName))}
By applying the hypothesis inheritedAssoEnds[{cc}] = {}, which follows from the invariant holding before the operation.
Accordingly, this goal can be discharged.
Implementation proofs
As outlined in Section 2.4, implementation proof obligations are similar to those of refinement. Since we have already demonstrated refinement proofs in the previous section, we do not show implementation proofs. What we do need to show, however, is a demonstration of the correctness of implementation WHILE-loops, as these constructs appear only in implementations and have their own proof obligations. Below, we present an example demonstrating WHILE-loop correctness. Discharging the proof obligations of this loop was done interactively using the AtelierB prover, based on the five conditions (proof obligations) shown in Figure 2.10 which, together, imply that the loop is correct.
We first present the proof obligation generated by the prover, followed by a brief explanation of the basis on which it was discharged. We use the WHILE-loop of the addAttribute operation as an example.
Implementation. addAttribute
In the context of interpreting the addAttribute operation into corresponding SQL operations, the main purpose of this loop is to add a column (mapped to the attribute being added) to all tables in the SQL model that correspond to subclasses of the attribute's owning class (identified by the input parameter cName). For example, adding an attribute officeNo to superclass Person would require adding the same attribute to the Person subclasses Employee and Freelance. In SQL terms, this requires adding a column to the tables corresponding to the two subclasses.
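As a sketch of the corresponding SQL (the column type is hypothetical), the effect would be:

    ALTER TABLE Person    ADD COLUMN officeNo VARCHAR(10);
    ALTER TABLE Employee  ADD COLUMN officeNo VARCHAR(10);
    ALTER TABLE Freelance ADD COLUMN officeNo VARCHAR(10);

where the first statement corresponds to the substitution performed before the loop, and the loop contributes one ALTER TABLE per table in inh_Tables.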
Figure D.2 shows the main components of the loop. The loop initialization (marked with number 1) consists of declaring a local variable (counter) as a loop index and initializing it with the value 1. The WHILE-test (marked with number 2) compares the value of the local variable counter with the size of inh_Tables as the condition for executing the loop. The loop body (marked with number 3) fetches a table from the
1    VAR counter IN
2      counter := 1 ;                                   /* 1 */
3      WHILE
4        counter <= size (inh_Tables)                   /* 2 */
5      DO
6        VAR current_table IN
7          current_table := inh_Tables (counter) ;      /* 3 */
8          alterTable (current_table , colName) ;
9          updateTable (current_table) ;
10         counter := counter + 1
11       END
12     INVARIANT
13       counter : 1..size(inh_Tables)+1 &
14       inh_Tables = tableHierarchy(cName) &
15       ran(inh_Tables) = classToTable[closure1(superclass~)[{cName}]] &
16       {cc, att | cc : CLASS & att : PROPERTY &
17         classToTable(cc) : ran(tableHierarchy(cc)) &
18         propertyToColumn(att) : tableColumns[{classToTable(cc)}] } =
19       inheritedAttributes \/ (inh_Tables;classToTable~)
20         [1..counter-1]*{attrName}
21     VARIANT size (inh_Tables) - counter
22     END
Figure D.2: Illustration of the main elements of a B implementation loop
sequence of inherited tables and performs the alterTable and updateTable SQL operations before increasing the counter by 1. The loop invariant consists of four conjuncts (lines 13-20) that assert conditions which must remain true before the loop starts, while the loop progresses, and after the loop terminates. These predicates play a central role in verifying the loop's correctness. For the sake of presentation, we will only consider the fourth invariant conjunct (lines 16-20); the other loop invariant conjuncts can be proved similarly. Finally, the loop variant (line 21) consists of an expression that must evaluate to a natural number and is used to bound the number of loop iterations.
The fourth loop invariant conjunct asserts the equality of two expressions. On the left-hand side (LHS) of the equality is a set comprehension characterizing two properties of inherited attributes. First, it asserts that an inherited attribute of a class in the data model is an owned attribute of one of the class's superclasses. Second, it asserts that the column to which an inherited attribute is mapped belongs to the tableColumns set, extended with the column being added to the owning class's table. On the right-hand side (RHS) of the equality is a set union expression stating that, within the context of the addAttribute operation, the existing set inheritedAttributes is unioned with the cartesian product of the classes corresponding to the inherited tables processed so far with the attribute name.
Given the proof obligations specified in Figure 2.10, to verify the correctness of this loop it is sufficient to prove that:
[S](PO1 - PO5)     (D.1)
where [S] represents the substitution of the loop (lines 7-11, marked with number 3) and (PO1 - PO5) represents the five proof conditions associated with WHILE loops. In other words, we need to establish that:
[S](PO1 ∧ PO2 ∧ PO3 ∧ PO4 ∧ PO5)     (D.2)
which is equivalent to establishing each of the conditions separately. We consider each of the conditions in turn.
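For reference, the five conditions are the standard WHILE-loop proof obligations of B (cf. Figure 2.10):

(PO1) [S0] I (the loop initialization establishes the invariant);
(PO2) ∀ X · (I ∧ P ⇒ [S] I) (each iteration of the body preserves the invariant);
(PO3) ∀ X · (I ⇒ v ∈ N) (the variant evaluates to a natural number);
(PO4) ∀ X · (I ∧ P ⇒ [n := v ; S](v < n)) (each iteration decreases the variant);
(PO5) ∀ X · (I ∧ ¬P ⇒ R) (on exit, the invariant and the negated test establish the required result).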
(PO1) [S0]I
The first proof obligation simply requires that the loop invariant holds in the initial state (i.e. after the loop initialization). This proof obligation can be written as:
1   [counter := 1]
2   ({cc,att | cc: CLASS & att: PROPERTY &
3     att: ownedAttributes[closure1(superclass)[{cc}]] &
4     propertyToColumn(att): (tableColumns \/
5       {classToTable(cName) |->
6        propertyToColumn (attrName)})[{classToTable(cc)}]}
7   =
8   inheritedAttributes \/ (inh_Tables;classToTable~)[1..counter-1]*{attrName})
To simplify the presentation, we will denote the expression on the left-hand side of the loop invariant (the set comprehension, lines 2-6 above) by IHAttribs. Hence, we want to prove that:
IHAttribs = inheritedAttributes \/
(inh_Tables;classToTable~)[1..counter-1]*{attrName}
⇔
IHAttribs = inheritedAttributes \/
(inh_Tables;classToTable~)[1..0]*{attrName}
(replacing counter by its value)
⇔
IHAttribs = inheritedAttributes \/
(tableHierarchy(cName);classToTable~)[{}]*{attrName}
(the interval 1..0 is empty, since its lower bound exceeds its upper bound)
⇔
IHAttribs = inheritedAttributes \/ {}
(the first operand of the cartesian product is the empty set)
⇔
IHAttribs = inheritedAttributes
(set union with the empty set)
Keeping in mind that, before the loop is initialized, attrName was added as a column to the table corresponding to its owning class (identified by the input parameter cName), we now need to prove that, at the entry of the loop, the IHAttribs expression defining the loop invariant is equivalent to the set comprehension defining inheritedAttributes in the linking invariant, i.e.
IHAttribs =
{cc, att | cc : CLASS & att : PROPERTY &
att : ownedAttributes[closure1(superclass)[{cc}]] &
propertyToColumn(att) : (tableColumns \/
{classToTable(cName)|->
propertyToColumn (attrName)})[{classToTable(cc)}]}
which is true because the attribute added before the loop starts was not added to the tables corresponding to the owning class's subclasses, i.e.
attrName /: ownedAttributes[closure1(superclass)[{cName}]]
Accordingly, this proof obligation can be discharged.
(PO3) ∀ X · (I ⇒ v ∈ N)
The main purpose of this proof obligation is to ensure that the variant expression
provided as part of the loop definition evaluates to a natural number. Hence, we need
to prove that:
({cc,att | cc: CLASS & att: PROPERTY &
att: ownedAttributes[closure1(superclass)[{cc}]] &
propertyToColumn(att): (tableColumns \/
{classToTable(cName) |->
propertyToColumn (attrName)})[{classToTable(cc)}]}
= inheritedAttributes \/ (inh_Tables;classToTable~)[1..counter-1]*{attrName})
=>
size(inh_Tables) - counter + 1 : NAT
Using deduction, our goal becomes:
size(inh_Tables) - counter + 1 : NAT
Assuming that the loop invariant holds, we have the first property of the loop invariant, counter : 1..size(inh_Tables)+1, which implies that the maximum value counter can take is size(inh_Tables)+1. As a result, the variant of the loop does, indeed, remain in N.
(PO4) ∀ X · (I ∧ P ⇒ [n := v ; S](v < n))
This proof obligation requires that the loop body decreases the variant. Assuming that the loop invariant and the loop condition are true, we need to show that, as the loop progresses, the new value of the variant is less than the old value. Hence, we consider [n := v ; S](v < n), with n acting as a temporary variable holding the old value of the variant before the substitution is performed, i.e. we need to prove that:
size(inh_Tables) - (counter + 1) + 1 < size(inh_Tables) - counter + 1
which is clearly true.
(PO2) ∀ X · (I ∧ P ⇒ [S] I)
This proof obligation requires that whenever the invariant I and the loop condition P are both true, the substitution S in the loop body is guaranteed to establish I. This requires that:
1   ({cc,att | cc: CLASS & att: PROPERTY &
2     att: ownedAttributes[closure1(superclass)[{cc}]] &
3     propertyToColumn(att): (tableColumns \/
4       {classToTable(cName) |->
5        propertyToColumn (attrName)})[{classToTable(cc)}]}
6   =
7   inheritedAttributes \/
8   (inh_Tables;classToTable~) [1..counter-1]*{attrName})
9   &
10  (counter <= size(inh_Tables))
11  =>
12  ({cc,att | cc: CLASS & att: PROPERTY &
13    att: ownedAttributes[closure1(superclass)[{cc}]] &
14    propertyToColumn(att): (tableColumns \/
15      {tableHierarchy(cName)(counter)|->
16       propertyToColumn(attrName)})[{classToTable(cc)}]}
17  =
18  inheritedAttributes \/
19  (tableHierarchy(cName);classToTable~)[1..counter+1-1]*{attrName})
Note that in the above proof obligation the invariant property appearing in the antecedent is the invariant property before applying the substitution, while the invariant property in the consequent is the invariant property after applying the substitution (one step of a loop iteration). To simplify the presentation, we will write IHAttribs to denote the set comprehension appearing in lines 1-5 above.
We have
1..counter+1-1 = 1..counter = 1..counter-1 \/ {counter}
which gives us
IHAttribs =
inheritedAttributes \/
(tableHierarchy(cName);classToTable~)[1..counter-1 \/ {counter}]*{attrName}
(replacing counter in the RHS of the equality)
⇔
IHAttribs =
inheritedAttributes \/
(tableHierarchy(cName);classToTable~)[1..counter-1] * {attrName} \/
(tableHierarchy(cName);classToTable~)[{counter}] * {attrName}
(by applying f[A\/B] = f[A] \/ f[B])
⇔
IHAttribs =
{cc,att | cc: CLASS & att: PROPERTY &
att: ownedAttributes[closure1(superclass)[{cc}]] &
propertyToColumn(att): tableColumns[{classToTable(cc)}]} \/
(tableHierarchy(cName);classToTable~)[{counter}]*{attrName}
(expanding the first part of the set union using the set comprehension)
⇔
IHAttribs =
{cc,att | cc: CLASS & att: PROPERTY &
att: ownedAttributes[closure1(superclass)[{cc}]] &
propertyToColumn(att): tableColumns[{classToTable(cc)}]} \/
{classToTable~(tableHierarchy(cName)(counter))}*{attrName}
(applying function composition in the second part of the set union)
⇔
IHAttribs =
{cc,att | cc: CLASS & att: PROPERTY &
att: ownedAttributes[closure1(superclass)[{cc}]] &
propertyToColumn(att): (tableColumns \/
{tableHierarchy(cName)(counter) |->
propertyToColumn(attrName)})[{classToTable(cc)}]}
(re-writing the second part of the set union as part of the set comprehension)
which is exactly the set comprehension of the invariant after the substitution, so this proof obligation can be discharged.
(PO5) ∀ X · (I ∧ ¬P ⇒ R)
This proof obligation requires that the postcondition R holds on exit of the loop. R corresponds to the behavior of the equivalent operation (i.e. addAttribute in the refinement). For the inheritedAttributes property that we are interested in, the equivalent refinement substitution is:
R = inheritedAttributes \/
closure1(superclass~)[{cName}] * {attrName}
Considering the loop invariant I, we have the following four predicates:
1   I1. counter : 1..size(inh_Tables)+1 &
2   I2. inh_Tables = tableHierarchy(cName) &
3   I3. ran(inh_Tables) = classToTable[closure1(superclass~)[{cName}]] &
4   I4. ({cc,att | cc: CLASS & att: PROPERTY &
5        att: ownedAttributes[closure1(superclass)[{cc}]] &
6        propertyToColumn(att): (tableColumns \/
7         {classToTable(cName) |->
8          propertyToColumn (attrName)})[{classToTable(cc)}]})
9      = inheritedAttributes \/
10       (inh_Tables;classToTable~) [1..counter-1]*{attrName}
To simplify the presentation, we will denote the set comprehension at lines 4-8 above by IHAttribs. Assuming that the above invariant properties hold, our goal can be stated as I & not(P) => R, which gives us the following proof obligation:
not(counter <= size(inh_Tables)) &    (not(P))
counter : 1..size(inh_Tables)+1 &
inh_Tables = tableHierarchy(cName) &
ran(inh_Tables) = classToTable[closure1(superclass~)[{cName}]] &
IHAttribs = inheritedAttributes \/
(inh_Tables ; classToTable~) [1..counter-1] * {attrName}
=>
IHAttribs =
inheritedAttributes \/ closure1(superclass~)[{cName}]*{attrName}
From the linking invariant on inheritedAttributes, we have:
{cc,att | cc: CLASS & att: PROPERTY &
att: ownedAttributes[closure1(superclass)[{cc}]] &
propertyToColumn(att): (tableColumns\/
{classToTable(cName) |->
propertyToColumn (attrName)})[{classToTable(cc)}]}
which is equivalent to IHAttribs, i.e. we have:
IHAttribs = R
(using linking invariant on inheritedAttributes)
From I1 and not(P), we can deduce:
counter = size(inh_Tables)+1
⇔
inheritedAttributes \/
(inh_Tables ; classToTable~) [1..counter-1]*{attrName}
= R
(given I4, replacing the LHS of the goal equality)
⇔
inheritedAttributes \/
classToTable~[inh_Tables[1..counter-1]] * {attrName}
= R
(applying the function composition on the LHS of the equality)
⇔
inheritedAttributes \/
classToTable~[inh_Tables[1..size(inh_Tables)+1-1]] * {attrName}
= R
(replacing counter by its value)
⇔
inheritedAttributes \/
classToTable~[inh_Tables[1..size(inh_Tables)]] * {attrName}
= R
(simplifying +1 and -1 on the LHS of the equality)
⇔
inheritedAttributes \/
classToTable~[ran(inh_Tables)] * {attrName}
= R
(replacing inh_Tables[1..size(inh_Tables)] by ran(inh_Tables))
⇔
inheritedAttributes \/
classToTable~[classToTable[closure1(superclass~)[{cName}]]] * {attrName}
= R
(using loop invariant hypothesis I3)
As classToTable is injective (classToTable~ : TABLE +-> CLASS) and closure1(superclass~)[{cName}] is within the domain of classToTable, we have:
classToTable~[classToTable[closure1(superclass~)[{cName}]]] =
closure1(superclass~)[{cName}]
Replacing this in the goal gives:
inheritedAttributes \/ closure1(superclass~)[{cName}] * {attrName} = R
which holds by the definition of R, so this proof obligation is discharged.
Proof Step   AtelierB Command                 Description of the command
1            dc - do case                     Starts up a proof by case
2            ss - simplify set                Simplifies set expressions appearing in the goal
3            mp - mini proof                  Allows use of the prover without proof by case
4            fh - false hypothesis            Proves that a hypothesis contradicts the others
5            pp(rp.1) - predicate prover      Applies the prover on selected hypotheses
6            dd - direct deduction            Proves the goal under a stated hypothesis
7            ss - simplify set                Same as step 2
8            ph - particularize hypothesis    Assigns values to variables appearing in hypotheses
9            ah - add hypothesis              Adds a predicate to the hypotheses stack
10           pp(rp.1) - predicate prover      Same as step 5
11           pr - prove                       Calls the automatic prover
12           se - suggest for existential     Instantiates variables under the scope of the existential quantifier
Table D.12: Overview of AtelierB proof steps for proof obligation addAssociation.PO12
Detailed Proof Illustration Using AtelierB Prover
Overview
This section shows a detailed example of how we used the AtelierB [13] interactive prover to discharge abstract machine proof obligations. We use the proof obligation of the addAssociation primitive, outlined in Section 5.3, to illustrate the steps. We first provide a narrative description of the proof goal, and then go through the steps we followed using the tool. The initial goal to be proved is listed below.
This goal refers to the link conformance invariant, outlined in Section 5.2. The aim of this invariant is to assert the association bi-directionality property in the data model. We would like to verify that this data model property is preserved by the addAssociation operation. In the data model, each association has an assoName and two association memberEnds. If p1 and p2 are two association memberEnds linked by an association named, say, aa1 and, in addition, these two properties are in an opposite relationship (i.e. pointing to each other), this implies that there is another association name (say, aa2) linking the same two properties in the opposite direction. For example, assume that we have two classes Employee and Department, and an association between them named worksFor, linking the employees property of the Employee class to the department property of the Department class, and that these two properties are in an opposite relation. This would imply that there is another association, named e.g. employedBy, linking the same two properties but in the opposite order.
Local hypotheses &
srcOID = extent(owningClass(srcProp)) &
aa1: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
p1: dom((memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa1)) &
p2: ran((memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa1)) &
p1|->p2: opposite
=>
#aa2.(aa2: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
(memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa2) = {p2|->p1})
Table D.13: addAssociation edit vs. link conformance (opposite)
Proof Steps
This section explains the corresponding proof steps that were performed using AtelierB, going through each proof command and explaining how it relates to the illustration.
Step 1 - dc(aa1 = assoName)
Local hypotheses
srcOID = extent(owningClass(srcProp)) &
assoName: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
p1: dom((memberEnds\/{assoName|->{srcProp|->tgtProp}})(assoName)) &
p2: ran((memberEnds\/{assoName|->{srcProp|->tgtProp}})(assoName)) &
p1|->p2: opposite
=>
#aa2.(aa2: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
(memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa2) = {p2|->p1})
Table D.14: Proof goal and hypotheses after proof step 1
This command starts the case analysis. The current goal is the case where aa1 is equal to assoName; once this goal is discharged, the second case, aa1 /= assoName, will have to be discharged. Note that under the current case (i.e. aa1 = assoName), the variable aa1 has been replaced by assoName. This can be observed by comparing the hypotheses and goals in Table D.13 to those in Table D.14.
Step 2 - ss
1   Local hypotheses &
2   srcOID = extent(owningClass(srcProp)) &
3   p1 = srcProp & p2 = tgtProp & p1|->p2: opposite
4   =>
5   #aa2.(aa2: dom(memberEnds)\/{assoName} &
6   (memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa2) = {p2|->p1})
Table D.15: Proof goal and hypotheses after proof step 2
The command ss (simplify set) is used to simplify the goal. It is able to deduce
that p1 = srcProp and p2 = tgtProp, as can be seen in line 3 in Table D.15.
Step 3 - mp
Local hypotheses &
ran(memberEnds) <: PROPERTY +-> PROPERTY &
dom(memberEnds) <: ASSOCIATION &
memberEnds: ASSOCIATION <-> (PROPERTY +-> PROPERTY) &
srcProp|->tgtProp: PROPERTY*PROPERTY & tgtProp|->srcProp: PROPERTY*PROPERTY &
p1 = srcProp & p2 = tgtProp &
srcProp|->tgtProp: opposite & srcOID = extent(owningClass(srcProp))
=>
#aa2.(aa2: dom(memberEnds)\/{assoName} &
(memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa2) = {tgtProp|->srcProp})
Table D.16: Proof goal and hypotheses after proof step 3
Command mp has normalized the hypotheses and generated additional hypotheses.
We can see that new hypotheses exist that were not part of the goal, by comparing
Table D.15 to Table D.16.
Step 4 - fh(srcProp|->tgtProp: opposite)
not(srcProp|->tgtProp: opposite)
Table D.17: Proof goal after proof step 4
This command, false hypothesis fh, starts a proof by contradiction. It is used
to prove that the current case is an impossible case, by proving that one of the
hypotheses in Table D.16 introduces a contradiction. The current goal is replaced by
the negation of the hypothesis, as can be seen in Table D.17.
Step 5 - pp(rp.1)
1   not(aa1 = assoName) =>
2   ("‘Local hypotheses’" &
3   srcOID = extent(owningClass(srcProp)) &
4   aa1: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
5   p1: dom((memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa1)) &
6   p2: ran((memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa1)) &
7   p1|->p2: opposite
8   =>
9   #aa2.(aa2: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
10  (memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa2) = {p2|->p1}))
Table D.18: Proof goal and hypotheses after proof step 5
This command runs the predicate prover on the goal, adding all the hypotheses that have a symbol in common with the goal. As the goal is proved, it is replaced by the second case, i.e. the case where aa1 /= assoName, as can be seen in line 1 in Table D.18.

Local hypotheses &
srcOID = extent(owningClass(srcProp)) &
aa1: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
p1: dom((memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa1)) &
p2: ran((memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa1)) &
p1|->p2: opposite
=>
#aa2.(aa2: dom(memberEnds\/{assoName|->{srcProp|->tgtProp}}) &
(memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa2) = {p2|->p1})
Table D.19: Proof goal and hypotheses after proof step 6
Step 6 - dd
This command moves the hypothesis not(aa1 = assoName), generated in the previous step as part of the case analysis, to the global hypotheses.
Step 7 - ss
Local hypotheses &
srcOID = extent(owningClass(srcProp)) &
aa1: dom(memberEnds)\/{assoName} &
p1: dom(memberEnds(aa1)) & p2: ran(memberEnds(aa1)) &
p1|->p2: opposite
=>
#aa2.(aa2: dom(memberEnds)\/{assoName} &
(memberEnds\/{assoName|->{srcProp|->tgtProp}})(aa2) = {p2|->p1})
Table D.20: Proof goal and hypotheses after proof step 7
Using the hypothesis not(aa1 = assoName), the set simplifier is able to simplify the hypotheses on the variables aa1, p1 and p2; this can be observed by comparing these variables in Table D.19 and Table D.20.
Step 8 - ph(aa1,p1,p2, !(aa1,p1,p2).(aa1: dom(memberEnds) &
p1: dom(memberEnds(aa1)) &
p2: ran(memberEnds(aa1)) &
p1|->p2: opposite
=>
#aa2.(aa2: dom(memberEnds) &
memberEnds(aa2) = {p2|->p1})))
aa1: dom(memberEnds) &
p1: dom(memberEnds(aa1)) &
p2: ran(memberEnds(aa1)) &
p1|->p2: opposite
Table D.21: Proof sub-goals after proof step 8
This command starts the instantiation of the quantified hypothesis, using the values of aa1, p1 and p2. It is necessary to prove that the provided values meet the requirements of the universally quantified hypothesis; then the hypothesis can be instantiated. As a result, the goal is changed into four subgoals, as can be seen in Table D.21.
Step 9 - ah(aa1: dom(memberEnds)\/{assoName})
aa1: dom(memberEnds)\/{assoName} => aa1: dom(memberEnds)
Table D.22: Adding a hypothesis to the present proof goal
The prover did not succeed in discharging the first subgoal. This step adds the given hypothesis to the goal. The aim of adding this hypothesis is to ensure that the prover has all the required hypotheses, so that we are able to run the predicate prover on this sub-goal alone (using the command pp(rp.0)).
Step 10 - pp(rp.0)
aa1: dom(memberEnds)
Table D.23: Current Proof sub-goal at proof step 10
Running the predicate prover, command pp, has discharged the previous sub-goal. Now we need to discharge the remaining three sub-goals to be able to complete the instantiation of the quantified hypothesis.
Step 11 - pr
p1: dom(memberEnds(aa1)) &
p2: ran(memberEnds(aa1)) &
p1|->p2: opposite
Table D.24: Current proof sub-goals at proof step 11
All subgoals required for instantiating the hypothesis have been discharged, and the instantiated hypotheses are added. Running the prover has moved the existential hypothesis to the global hypothesis stack and generated the corresponding hypotheses for the variables.
Step 12 - se(aa2)
aa2: dom(memberEnds)\/{assoName} & memberEnds(aa2) = {p2|->p1}
Table D.25: Current proof sub-goal at proof step 12
The se command allows the existential goal to be proved by providing a value meeting the required properties. Here, the value aa2 (the one generated in the previous hypothesis) is suggested. The instantiation of the variables under the scope of the existential quantifier is the last step needed to verify that this proof obligation can be discharged.