Lecture 2 - Introduction to Database Design - Conceptual Design
ICSE 6203:
Database Management Systems
Lecture 2
Introduction to Database Design
Conceptual Design
2/21/2012
Database Design
• The database design process can be divided into six
basic steps. Semantic data models are most relevant to
the first three of these steps.
1. Requirements Analysis:
• The first step in designing a database application is to
understand:
—what data is to be stored in the database,
—what applications must be built on top of it, and
—what operations are most frequent and subject to
performance requirements.
• Often this is an informal process involving:
—discussions with user groups, a study of the current
operating environment and how it is expected to change, and
—examination of existing applications expected to be replaced or
complemented by the database system.
Database Design
2. Conceptual Database Design:
—The information gathered in the requirements analysis step
is used to develop a high-level description of the data to
be stored in the database, along with the constraints
known to hold over this data.
—This step is often carried out using the ER model.
—The goal is to create a simple description of the data that
closely matches how users and developers think of the
data (and the people and processes to be represented in
the data).
—This facilitates discussion among all the people involved in
the design process, even those who have no technical
background.
—At the same time, the initial design must be sufficiently
precise to enable a straightforward translation into a data
model supported by a commercial database system.
Database Design
3. Logical Database Design:
—A DBMS must be selected to implement the
database, and the conceptual database design must
be converted into a database schema in the data model
of the chosen DBMS.
—We will consider only relational DBMSs, and
therefore, the task in the logical design step is to
convert an ER schema into a relational database
schema.
—The result is a conceptual schema, sometimes
called the logical schema, in the relational data
model.
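As a minimal sketch of what the logical design step produces, the snippet below translates a small ER schema into relational tables with Python's built-in sqlite3 module. It assumes the banking example used later in this lecture (entity sets customer and loan, relationship set borrower); the column names, types, and the choice to model borrower as many-to-many are illustrative assumptions, not part of the lecture.

```python
import sqlite3

# Hypothetical ER schema: entity sets customer and loan,
# relationship set borrower (assumed many-to-many here).
DDL = """
CREATE TABLE customer (
    customer_id   TEXT PRIMARY KEY,
    customer_name TEXT NOT NULL,
    customer_city TEXT
);
CREATE TABLE loan (
    loan_number TEXT PRIMARY KEY,
    amount      REAL NOT NULL
);
-- A many-to-many relationship set becomes its own table; its key is the
-- union of the participating entity sets' primary keys.
CREATE TABLE borrower (
    customer_id TEXT REFERENCES customer(customer_id),
    loan_number TEXT REFERENCES loan(loan_number),
    PRIMARY KEY (customer_id, loan_number)
);
"""


def build_schema() -> sqlite3.Connection:
    """Create the relational schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


if __name__ == "__main__":
    conn = build_schema()
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(tables)  # ['borrower', 'customer', 'loan']
```

Each entity set maps to a table keyed by its primary key, and the relationship set maps to a table whose columns reference both entity tables; other mapping cardinalities may allow the relationship to be folded into one of the entity tables instead.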
Database Design
• In the implementation phase, we must code each
task in an application language (e.g., Java), using
the DBMS to access data.
• In general, division of the design process into
steps should be seen as a classification of the
kinds of steps involved in design.
• A complete database design will probably
require a subsequent tuning phase in
which all six kinds of design steps are interleaved
and repeated until the design is satisfactory.
Database Design
[Figure: overview of the database design process, from the miniworld through physical design (internal schema) and transaction implementation (application programs)]
[Figure: summary of E-R notation, including symbols for derived attributes, relationship sets R, total and disjoint ISA generalization, and (min,max) cardinality limits between entity sets E1 and E2]
[Figure: examples of the customer and loan entity sets drawn in several E-R notations, with attributes such as customer-id, customer-name, customer-street, customer-city, amount, address, state, zipcode, phone-num, date-of-birth, and age]
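If E1, E2, …, En are entity sets, then a relationship set R is a subset of
$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
where $(e_1, e_2, \ldots, e_n)$ is a relationship.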
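Because a relationship set is a subset of the Cartesian product E1 × E2 × … × En, membership can be checked directly. The sketch below uses small hypothetical customer and loan entity sets (the identifiers are made up) and a borrower relationship set of (customer, loan) pairs.

```python
from itertools import product

# Hypothetical entity sets for illustration.
customer = {"c1", "c2", "c3"}
loan = {"L17", "L23"}


def is_relationship_set(r, *entity_sets):
    """True if every tuple in r is drawn from E1 x E2 x ... x En."""
    return all(
        len(t) == len(entity_sets)
        and all(e in es for e, es in zip(t, entity_sets))
        for t in r
    )


borrower = {("c1", "L17"), ("c2", "L23")}
print(is_relationship_set(borrower, customer, loan))         # True
print(is_relationship_set({("c9", "L17")}, customer, loan))  # False: c9 is not a customer

# Equivalent check against the explicit Cartesian product (fine for small sets).
print(borrower <= set(product(customer, loan)))              # True
```

The first check avoids materializing the product, which grows as |E1| × |E2| × … × |En|; the second mirrors the set-theoretic definition literally.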
[Figure: mapping cardinalities between entity sets A and B: one-to-one, one-to-many, many-to-one, and many-to-many]
Participation Constraints
• The participation of an entity set E in a relationship set R is
said to be total if every entity in E participates in at least one
relationship in R. If only some of the entities in E participate
in a relationship in R, the participation of entity set E in
relationship R is said to be partial.
• As an example, consider the banking enterprise. We would
expect every loan entity to be related to at least one
customer through the borrower relationship. Therefore, the
participation of loan in the relationship set borrower is total.
• In contrast, an individual can be a bank customer whether or
not they have a loan with the bank. Thus, it is possible that
only some of the customer entities will be related to a loan
entity through the borrower relationship. Therefore, the
participation of the customer entity set in the borrower
relationship is partial.
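The total/partial distinction for the banking example can be sketched as a set-inclusion test: participation is total exactly when every entity of E appears in some relationship of R. The customer and loan identifiers below are hypothetical sample data.

```python
# Hypothetical banking data: entity sets and the borrower relationship set.
customers = {"c1", "c2", "c3"}
loans = {"L17", "L23"}
borrower = {("c1", "L17"), ("c2", "L17"), ("c2", "L23")}


def participation(entity_set, relationship, position):
    """Return 'total' or 'partial' for entity_set's participation.

    position is the index of entity_set's component in each relationship tuple.
    """
    participants = {r[position] for r in relationship}
    return "total" if entity_set <= participants else "partial"


print(participation(loans, borrower, 1))      # total: every loan has a borrower
print(participation(customers, borrower, 0))  # partial: c3 has no loan
```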
Relationship Sets
• The primary key of an entity set allows us to distinguish
among the various entities in the set. There must be a
similar mechanism which allows us to distinguish among
the various relationships in a relationship set.
• Let R be a relationship set involving entity sets E1, E2, …,
En. Let Ki denote the set of attributes that make up the
primary key of entity set Ei. For now, let's assume that
1) all attribute names in all primary keys are unique (this makes
the notation easier to follow, and non-unique names would not
really be a problem anyway), and
2) each entity set participates only once in the relationship.
• Then the composition of the primary key for the relationship set
depends on the set of attributes associated with the relationship
set R in the following ways:
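Under the two assumptions above, the usual textbook treatment takes the union K1 ∪ K2 ∪ … ∪ Kn of the participants' primary-key attributes as a superkey of R, whether or not R has descriptive attributes; which subset of it is the primary key also depends on the mapping cardinality. The sketch below builds that union and tests it against sample rows. The key names, the rows, and the descriptive attribute start_date are hypothetical, and a check on sample data can only refute a candidate key, never prove it.

```python
# Hypothetical primary keys K_i of the participating entity sets.
# Assumption 1 from the text: attribute names are unique across keys.
primary_keys = {
    "customer": ["customer_id"],
    "loan": ["loan_number"],
}


def relationship_superkey(participants):
    """Union K1 ∪ K2 ∪ ... ∪ Kn of the participants' primary-key attributes."""
    key = []
    for entity_set in participants:  # assumption 2: each participates once
        key.extend(primary_keys[entity_set])
    return key


def identifies_all(rows, key):
    """True if no two relationship rows agree on every attribute in key."""
    projected = [tuple(row[a] for a in key) for row in rows]
    return len(projected) == len(set(projected))


# Sample borrower relationships; start_date is a hypothetical descriptive attribute.
borrower = [
    {"customer_id": "c1", "loan_number": "L17", "start_date": "2012-02-01"},
    {"customer_id": "c2", "loan_number": "L17", "start_date": "2012-02-03"},
    {"customer_id": "c2", "loan_number": "L23", "start_date": "2012-02-05"},
]

key = relationship_superkey(["customer", "loan"])
print(key)                            # ['customer_id', 'loan_number']
print(identifies_all(borrower, key))  # True
```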
Relationship Sets
[Figure: example relationship sets with the descriptive attribute access date]
Associative Entities
• The presence of one or more attributes on a
relationship suggests to the designer that the
relationship should perhaps instead be represented
as an entity type.
• An associative entity is an entity type that
associates the instances of one or more entity types
and contains attributes that are peculiar to the
relationship between those entity instances.
• For example (see the E-R diagram on the next page),
consider an organization that wishes to record the
date (month and year) when an employee completes
each certification course. The date completed
cannot be associated with either entity set,
EMPLOYEE or COURSE, because Date_Completed is a
property of the relationship Completes.
Associative Entities
[Figure: sample E-R diagram in which the relationship Completes links EMPLOYEE and COURSE]
Some data representing the situation:
Employee_Name   Course_Title   Date_Completed
Kristi          C++            6/2005
Kristi          Java           12/2005
Debi            SQL            11/2005
Angela          SQL            10/2005
Angela          Perl           1/2006
Associative Entities
Notice that the cardinality indicators
now terminate on the associative
entity rather than on the participating
entity types. Thus, an employee who
completes more than one course will
be awarded more than one certificate.
[Figure: E-R diagram representing the situation expressed as an associative entity: Certificate (Certificate_ID, Date_Complete) links EMPLOYEE (Emp_ID, Emp_Name) and COURSE (Course_ID, Course_Name)]
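To make the associative entity concrete, the sketch below models Certificate as a record with its own single-attribute identifier plus references to EMPLOYEE and COURSE and the relationship attribute Date_Complete. It reuses the lecture's sample completion data; the certificate IDs are hypothetical, and employee names and course titles stand in for Emp_ID and Course_ID.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    """Associative entity between EMPLOYEE and COURSE (attributes follow the diagram)."""
    certificate_id: int  # the associative entity's own single-attribute identifier
    emp_id: str          # refers to an EMPLOYEE instance
    course_id: str       # refers to a COURSE instance
    date_complete: str   # attribute of the relationship itself (month/year)


# Sample completion data from the lecture; certificate IDs are hypothetical.
certificates = [
    Certificate(1, "Kristi", "C++", "6/2005"),
    Certificate(2, "Kristi", "Java", "12/2005"),
    Certificate(3, "Debi", "SQL", "11/2005"),
    Certificate(4, "Angela", "SQL", "10/2005"),
    Certificate(5, "Angela", "Perl", "1/2006"),
]

# An employee who completes more than one course holds more than one certificate.
per_employee = Counter(c.emp_id for c in certificates)
print(per_employee["Kristi"])  # 2
print(per_employee["Debi"])    # 1
```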
Associative Entities
• How do you know whether to convert a
relationship into an associative entity type?
• There are four conditions that should exist:
1. All of the relationships for the participating entity
types are “many” relationships.
2. The resulting associative entity type has
independent meaning to end users, and preferably
can be identified with a single-attribute identifier.
3. The associative entity has one or more attributes,
in addition to the identifier.
4. The associative entity participates in one or more
relationships independent of the entities related in
the associated relationship.
[Figure: E-R diagram relating loan (loan-num, amount) to payment (payment-num, payment-date, amount) through the relationship loan-payment]
Specialization
• An entity set may include sub-groupings of entities that
are distinct in some way from other entities in the set.
For instance, a subset of entities within an entity set
may have attributes that are not shared by all the
entities in the set.
— As an example, consider the entity set person, with attributes
name, street, and city. A person could further be classified as
one of the following: student or instructor. Each of these
person types is described by a set of attributes that includes all
of the attributes of the entity set person, plus possibly some
additional attributes. For example, student entities may be
further described by the attributes gpa and credit-hours-earned,
whereas instructor entities are not characterized by
these attributes but rather by a different set, such as salary and
years-employed.
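Specialization maps naturally onto subclassing. The sketch below uses Python dataclasses as an analogy (not an E-R or DBMS feature): student and instructor each extend person with their own attributes. Hyphens in attribute names are replaced by underscores to form valid identifiers.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    street: str
    city: str


@dataclass
class Student(Person):  # student ISA person
    gpa: float
    credit_hours_earned: int


@dataclass
class Instructor(Person):  # instructor ISA person
    salary: float
    years_employed: int


def attribute_names(cls):
    """All attributes of an entity type, base-class attributes first."""
    return [f.name for f in fields(cls)]


print(attribute_names(Student))
# ['name', 'street', 'city', 'gpa', 'credit_hours_earned']
print(attribute_names(Instructor))
# ['name', 'street', 'city', 'salary', 'years_employed']
```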
Specialization
[Figure: E-R diagram of person (name, street, city) specialized via ISA into instructor and student]
Generalization
• The refinement from an initial entity set into successive
levels of entity sub-groupings represents a top-down
design approach in which distinctions are made explicit.
• This same design process could also proceed in a bottom-
up approach, in which multiple entity sets are synthesized
into a higher-level entity on the basis of common
attributes. In other words, we might have first identified
the entity set students(name, street, city, gpa, credit-hours-earned)
and an entity set instructors(name, street,
city, salary, years-employed).
• This commonality of attributes is expressed by
generalization, which is a containment relationship that
exists between a higher-level entity set and one or more
lower-level entity sets.
Generalization
• In our example, person is the higher-level entity set and
instructor and student are the lower-level entity sets.
• The higher-level entity set represents the superclass and
each lower-level entity set represents a subclass. Thus,
person is the superclass of the instructor and student
subclasses.
• For all practical purposes, generalization is just the
inverse of specialization, and both processes can be
applied (almost interchangeably) in designing the
schema for some real-world scenario. Notice in the E-R
diagram on the previous page that no difference is
specified between generalization and specialization other
than how you view the picture (reading from the top
down or from the bottom up).
Attribute Inheritance
• A crucial property of the higher- and lower-level entities that
are created by specialization and generalization is attribute
inheritance.
• The attributes of the higher-level entity sets are said to be
inherited by the lower-level entity sets.
— In our example above, instructor and student both inherit all the
attributes of person (recall that person is the superclass for both
instructor and student).
Attribute Inheritance
• Higher-level entity sets do not inherit any attribute or
relationship which is defined within the lower-level
entity set.
• Typically, what is developed will be a hierarchy of
entity sets in which the highest-level entity appears at
the top of the hierarchy.
• If, in such a hierarchy, a given entity set is
involved as a lower-level entity set in only one ISA
relationship, then the inheritance is said to be single
inheritance.
• If, on the other hand, a given entity set is involved as a
lower-level entity set in more than one ISA relationship,
then the inheritance is said to be multiple inheritance
(and the resulting structure is called a lattice).
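The single/multiple distinction depends only on how many ISA relationships each entity set takes part in as the lower-level set, so it can be read off an ISA graph. In the sketch below, teaching_assistant is a hypothetical entity set added only to create a lattice. Inherited attributes flow downward only, so person gains nothing from its subclasses.

```python
# ISA edges: lower-level entity set -> its higher-level entity sets.
# "teaching_assistant" is hypothetical, added to illustrate a lattice.
ISA = {
    "student": ["person"],
    "instructor": ["person"],
    "teaching_assistant": ["student", "instructor"],
}

# Attributes declared locally on each entity set.
LOCAL_ATTRS = {
    "person": ["name", "street", "city"],
    "student": ["gpa", "credit_hours_earned"],
    "instructor": ["salary", "years_employed"],
    "teaching_assistant": [],
}


def inheritance_kind(isa):
    """'single' if each entity set is lower-level in at most one ISA relationship."""
    return "multiple" if any(len(parents) > 1 for parents in isa.values()) else "single"


def all_attributes(entity_set):
    """Inherited attributes (from every ancestor, each listed once) plus local ones."""
    result = []
    for parent in ISA.get(entity_set, []):
        for attr in all_attributes(parent):
            if attr not in result:
                result.append(attr)
    for attr in LOCAL_ATTRS[entity_set]:
        if attr not in result:
            result.append(attr)
    return result


print(inheritance_kind(ISA))  # multiple: the hierarchy is a lattice
print(all_attributes("teaching_assistant"))
# ['name', 'street', 'city', 'gpa', 'credit_hours_earned', 'salary', 'years_employed']
print(all_attributes("person"))  # ['name', 'street', 'city']
```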
Constraints on Generalization
• In order to more accurately model a real-world situation, a data
designer may choose to place constraints on a generalization (or
specialization).
• The first type of constraint involves determining which entities
can be members of a given lower-level entity set. This
membership can be defined in one of the following two ways:
Predicate-defined: In predicate-defined lower-level entity sets,
membership is evaluated on the basis of whether or not an
entity satisfies an explicit predicate (a condition).
— For example, assume that the higher-level entity set account has
the attribute account-type. All account entities are evaluated on
the defining account-type attribute. Only those entities that satisfy
the predicate account-type = “savings account” would be allowed to
belong to the lower-level entity set savings-account. Since all the
lower-level entities are evaluated on the basis of the same attribute,
this type of generalization is said to be attribute-defined.
Constraints on Generalization
User-defined: User-defined lower-level entity sets
are not constrained by a membership condition;
rather, the database user assigns entities to a
given entity set.
— For instance, suppose that after working 3 months at a
bank, an employee is assigned to one of five different
work groups. The groups would be represented as five
lower-level entity sets of the higher-level entity set
employee. A given employee is not assigned to a
specific work group automatically on the basis of an
explicit defining condition. Instead, the user
responsible for making the group assignment does so
on an individual basis, which may be arbitrary.
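The two membership styles differ in where membership comes from: a predicate computes it, while a user records it. The sketch below contrasts them. The account rows, employee IDs, and group names are hypothetical; only the predicate account-type = "savings account" comes from the lecture.

```python
# Hypothetical account entities; account_type is the defining attribute.
accounts = [
    {"account_number": "A-101", "account_type": "savings account"},
    {"account_number": "A-102", "account_type": "checking account"},
    {"account_number": "A-103", "account_type": "savings account"},
]


def predicate_members(entities, predicate):
    """Predicate-defined: membership follows from the condition alone."""
    return [e["account_number"] for e in entities if predicate(e)]


savings_account = predicate_members(
    accounts, lambda a: a["account_type"] == "savings account")

# User-defined: a user records each assignment explicitly; nothing derives it.
work_group_of = {"e1": "group-1", "e2": "group-3", "e3": "group-1"}  # hypothetical


def user_members(assignments, group):
    """Members of a user-defined lower-level entity set."""
    return sorted(emp for emp, g in assignments.items() if g == group)


print(savings_account)                         # ['A-101', 'A-103']
print(user_members(work_group_of, "group-1"))  # ['e1', 'e3']
```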
Constraints on Generalization
Overlapping: In overlapping generalizations, the same
entity may belong to more than one lower-level entity set
within a single generalization. For example, consider the
banking work group from the previous section. Suppose
that certain managers may participate in more than one
work team. A given employee (a manager) may therefore
appear in more than one of the group entity sets that are
lower-level entity sets of employee.
— Note: lower-level entity overlap is the default case; a disjointness
constraint must be placed explicitly on a generalization (or specialization).
Within the E-R model a disjointness constraint is modeled by placing the
word “disjoint” next to the triangle symbol as shown in the example below.
The meaning of this diagram should now be clear: employees and
customers are specializations of the set persons and the disjointness
constraint implies that an employee is not also a customer. If the disjoint
constraint is removed, then it is possible for an employee to also be a
customer (or viewed from the other direction, it is possible for a person to
be both a customer as well as an employee).
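A disjointness constraint says that no entity may appear in more than one lower-level entity set of the same generalization. The sketch below checks this for the employee/customer example. The person identifiers are hypothetical.

```python
def is_disjoint(*lower_level_sets):
    """True if no entity belongs to more than one of the lower-level entity sets."""
    seen = set()
    for s in lower_level_sets:
        if seen & s:  # some entity already appeared in an earlier set
            return False
        seen |= s
    return True


# Hypothetical person identifiers.
employees = {"p1", "p2"}
customers = {"p2", "p3"}

print(is_disjoint(employees, customers))  # False: p2 is both (overlapping)
print(is_disjoint({"p1"}, {"p3"}))        # True
```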
Constraints on Generalization
• A final type of constraint, the completeness constraint on a
generalization or specialization, specifies whether or not an
entity in the higher-level entity set must belong to at least
one of the lower-level entity sets within the
generalization/specialization. This type of constraint can
assume one of the following two forms:
Total generalization/specialization: Each higher-level entity
must belong to a lower-level entity.
Partial generalization/specialization: Some higher-level entities
may not belong to any lower-level entity set.
— Partial generalization is the default case. (Recall that total participation in a
relationship is represented in the E-R model by a double line; the same notation
is used to represent a total generalization.) In the example shown below,
the generalization is total and overlapping, which means that every person
must appear as an employee or a customer, and a person may also be both.
[Figure: person specialized via ISA into employee and customer, drawn as a total, overlapping generalization]
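The completeness constraint can be sketched as a coverage test: a generalization is total when the lower-level entity sets together cover the higher-level entity set. The person identifiers below are hypothetical.

```python
def completeness(higher, *lower_level_sets):
    """'total' if every higher-level entity is in some lower-level set, else 'partial'."""
    covered = set().union(*lower_level_sets)
    return "total" if higher <= covered else "partial"


# Hypothetical data: every person is an employee or a customer, and p2 is both.
persons = {"p1", "p2", "p3"}
employees = {"p1", "p2"}
customers = {"p2", "p3"}

print(completeness(persons, employees, customers))           # total (and overlapping)
print(completeness(persons | {"p4"}, employees, customers))  # partial: p4 is in neither
```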
Aggregation
• One of the limitations of the E-R model is that it cannot
express relationships among relationships. To understand
why this is important, consider the ternary relationship (3-way
relationship) works-on among employee, branch, and
job shown in the following E-R diagram.
[Figure: E-R diagram of the ternary relationship works-on among employee (emp-id, emp-name, street, city), branch (branch_id, city, assets), and job (title, level)]
Aggregation
• Given this scenario, now suppose that we want to record the managers
for tasks performed by an employee at a branch office; that is, we want
to keep track of managers for (employee, branch, job) combinations.
Let’s assume that there is an entity set manager.
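[Figure: E-R diagram with the ternary relationship works-on among employee, branch, and job, and a separate relationship manages connecting employee, branch, job, and manager]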
Aggregation
• When you look at the E-R diagram that models this
situation, it would appear that the relationship sets works-on
and manages could be combined into a single
relationship set. However, we cannot do this, since some
employee, branch, job combinations may not have a
manager.
• There is clearly redundant information in this figure,
however, since every employee, branch, job combination in
manages is also in works-on. If the manager were a value
rather than an entity, we could make manager a multivalued
attribute of the relationship works-on. But doing this would
make it more difficult (both logically and in execution cost)
to find, for example, the employee-branch-job triples for which
a manager is responsible. In any case, this option is not
available, since the manager is a manager entity.
Aggregation
• The best way to model this type of situation is to use
aggregation.
• Aggregation is an abstraction through which relationships
are treated as higher-level entities.
• Thus, in our example, we would regard the relationship
set works-on (relating the entity sets employee, branch,
and job) as a higher-level entity set called works-on.
Such an entity set is treated in the same manner as any
other entity set. We can then create a binary relationship
manages between works-on and manager to represent
who manages what tasks.
• The E-R diagram in the next slide illustrates how
aggregation is represented in the E-R model.
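With aggregation, each works-on relationship is treated as an entity, so manages becomes a binary relationship set of (works-on triple, manager) pairs. The sketch below shows the resulting constraint (every managed triple must already be in works-on), the unmanaged combinations the design permits, and the query the lecture mentions. All employee, branch, job, and manager values are hypothetical.

```python
# Hypothetical works_on relationship set of (employee, branch, job) triples.
works_on = {
    ("e1", "Downtown", "teller"),
    ("e2", "Downtown", "auditor"),
    ("e3", "Uptown", "teller"),
}

# Aggregation: manages relates a works_on triple (treated as an entity) to a manager.
manages = {
    (("e1", "Downtown", "teller"), "m1"),
    (("e3", "Uptown", "teller"), "m2"),
}


def managed_triples(manages):
    return {triple for triple, _ in manages}


def is_valid_aggregation(works_on, manages):
    """Every managed combination must be an existing works_on relationship."""
    return managed_triples(manages) <= works_on


def unmanaged(works_on, manages):
    """works_on combinations with no manager (allowed by this design)."""
    return sorted(works_on - managed_triples(manages))


def responsibilities(manager, manages):
    """The employee-branch-job triples a given manager is responsible for."""
    return sorted(triple for triple, m in manages if m == manager)


print(is_valid_aggregation(works_on, manages))  # True
print(unmanaged(works_on, manages))             # [('e2', 'Downtown', 'auditor')]
print(responsibilities("m1", manages))          # [('e1', 'Downtown', 'teller')]
```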
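Aggregation
[Figure: E-R diagram with aggregation: works-on, relating employee, branch, and job, is enclosed in a box and related to manager through the binary relationship manages]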
Multiway Relationships
[Figure: a multiway relationship among stars, movies, and studios, with the roles studio of star and producing studio and the attribute date]
[Figure: the same situation modeled with an entity set contract (date) connected to stars, movies, and studios through the binary relationships star-of, movie-of, studio-of, and producing]
[Figure: a unary relationship on employee, with the roles manager and worker]
[Figure: the customer entity set with attributes customer-id, customer-name, customer-street, and customer-city, shown in two E-R notations]
Relationships
[Figure: notation for a relationship set R with attributes att1 and att2, and for cardinality constraints]
E-R Diagrams
[Figure: notation for an overlapping generalization (ISA) and a disjoint generalization (ISA labeled disjoint) of person, each shown in two styles]
Points to Review
1. Database design has six steps: requirements analysis, conceptual
database design, logical database design, schema refinement,
physical database design, and security design. Conceptual design
should produce a high-level description of the data, and the
entity-relationship (ER) data model provides a graphical approach
to this design phase.
2. In the ER model, a real-world object is represented as an entity.
An entity set is a collection of structurally identical entities.
Entities are described using attributes. Each entity set has a
distinguished set of attributes called a key that can be used to
uniquely identify each entity.
3. A relationship is an association between two or more entities. A
relationship set is a collection of relationships that relate entities
from the same entity sets. A relationship can also have descriptive
attributes.
Points to Review
4. A key constraint between an entity set S and a relationship set restricts
instances of the relationship set by requiring that each entity of S
participate in at most one relationship. A participation constraint
between an entity set S and a relationship set restricts instances of the
relationship set by requiring that each entity of S participate in at least
one relationship. The identity and existence of a weak entity depends on
the identity and existence of another (owner) entity. Class hierarchies
organize structurally similar entities through inheritance into sub- and
superclasses. Aggregation conceptually transforms a relationship set into
an entity set such that the resulting construct can be related to other
entity sets.
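A key constraint ("at most one relationship per entity of S") can be checked by counting how often each entity appears in its role. The sketch below assumes a hypothetical Manages(employee, department) relationship set in which each department should have at most one manager.

```python
from collections import Counter


def satisfies_key_constraint(relationship, position):
    """True if each entity in the given role appears in at most one relationship."""
    counts = Counter(r[position] for r in relationship)
    return all(n <= 1 for n in counts.values())


# Hypothetical Manages(employee, department) relationship set.
manages = {("e1", "d1"), ("e1", "d2"), ("e2", "d3")}

print(satisfies_key_constraint(manages, 1))  # True: each department has at most one manager
print(satisfies_key_constraint(manages, 0))  # False: e1 manages two departments
```

Combined with total participation of departments, this would give "exactly one manager per department".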
5. Development of an ER diagram involves important modeling decisions. A
thorough understanding of the problem being modeled is necessary to
decide whether to use an attribute or an entity set, an entity or a
relationship set, a binary or ternary relationship, or aggregation.
6. Conceptual design for large enterprises is especially challenging because
data from many sources, managed by many groups, is involved.