
Ch2 ERModelingIntro Notes


Databases, okaram

ER Modeling

1 of 13

Entity-Relationship (ER) Modeling


The first step in developing any application is to understand what problem the application is supposed to solve and what functionality it should provide. We call this step analysis, or requirements gathering. For database applications, a big part of the analysis is conceptual data modeling, where you analyze what data needs to be stored in the database. In this chapter, we cover conceptual data modeling; more precisely, we cover Entity-Relationship (ER) modeling, which uses a particular kind of model, an Entity-Relationship model. Notice that in this book we do not cover how to actually get the requirements from the domain experts (users), but just how to express them as an Entity-Relationship model. We do not cover how to get or express the functionality requirements either, just how to model the data requirements as an ER model.

Within the database community, we tend to express requirements in terms of business[1] rules. A business rule is any statement that constrains any aspect of the business; in a way, business rules are the requirements for the business process, rather than just for the computer system. Business rules normally either assert business structure or somehow control business behavior. We expect most of the business rules will ultimately be automated through the DBMS and/or the computer system being built. Since business rules are business requirements, not computer ones, they will be expressed in terms familiar to the end users, not to software developers. One of the most challenging (and most interesting) parts of software development is that you are required to understand the business domain and learn to talk in the users' language. Other characteristics good business rules have (besides being expressed in terms familiar to end users) are shared with good requirements in general.

Good business rules should be declarative (that is, state what is done, not how), atomic (express only one thing), precise (have just one meaning), consistent (both internally and with all other business rules for the organization) and distinct or non-redundant (different from all the other business rules).

In order to create a database for an application, the most important issue is the definition of the data: what data needs to be stored in the database. In order to specify what goes in the database, we create a conceptual model of the data; this is a high-level, technology-independent model of the data. While creating our data model, we will be choosing names for many data elements. Since these names will be used throughout the application, it is extremely important to choose them well. Again, the names for data elements should be business-oriented rather than computer-oriented; the other important issue is to be consistent in your naming conventions. Most other issues are similar to variable naming conventions in programming languages, except that the consequences of a bad name are worse, since the names will be used throughout the application and even outside of it (for ad-hoc reports or other applications that access the same database).
By Orlando Karam, Licensed under Creative Commons, Attribution, Share-Alike http://creativecommons.org/licenses/by-sa/3.0/

[1] Here the term business refers to the organization that will use the software, whether it is designed to make money or not.

In the following sections we will be discussing the Entity-Relationship (ER) model, and the ER diagrams that we use to represent the model. Besides these diagrams, we normally have other information in textual form; at least data definitions and comments expressing constraints that are hard to express with the diagram. When we express the requirements in English[2] we use terms, which are words or phrases with a specific meaning for the business, and facts, which express relationships among those terms. Since we will be creating this model from user requirements, and so discussing the model with the users, we want a graphical notation, and one that's relatively easy to understand for 'normal' people. One of the most common models used in databases is the entity-relationship model (or ER model).
A term is a word or phrase with a specific meaning for the business. A fact is a statement that expresses a relationship among two or more terms.

1 Basic Components of an ER model


The term ER model can be used in two ways; we can use it in the general sense to refer to any model that uses entities and relationships, or to refer to a specific ER model for a specific situation. ER models are usually represented as ER diagrams, following some standard conventions. Although there is much variation in how to represent the ER model as an ER diagram, here we settle on one specific notation. The basic components of an ER model are entities and their relationships. Both entities and relationships can have attributes.

2 Entities and Attributes


Entities are the things in the real world that we want to model. We distinguish between entity types and entity instances; an entity type is a set of entities that share common characteristics, whereas an entity instance is each one of those entities. This is again the distinction between the intension and the extension of a database, and is similar to the distinction between class and object in object-oriented programming. Notice that we are always representing the particular definition one organization gives to an entity for a particular problem, not necessarily any general concept.
An entity type is a set of entities with common characteristics. An entity instance is one of the members of an entity type.

We represent entity types in an ER diagram by using a rectangle, with the name of the entity type inside it. As a convention, we use singular nouns for naming entity types (so we would use Person rather than People). Entities have properties, which we call attributes. We represent attributes by an oval, with the name of the attribute inside it. We use a line to attach attributes to the entity type they belong to. Figure 1 shows a simple representation of a Person entity type.

[2] or any other human language

Figure 1: Simple ER diagram for Person. Notice the rectangle represents the entity type (Person) and the ovals the attributes.

We classify attributes according to four independent binary categories:

Optional vs Mandatory
Simple vs Composite
Single-valued vs Multi-valued
Stored vs Derived

An attribute is mandatory if all entity instances of a given entity type must have a value for the attribute, and it is optional otherwise. For example, for a particular application we may require that all students in the database must have a first and last name (which would make those mandatory), whereas the middle name would usually be considered optional. We do not use any specific markings in our drawing to denote whether an attribute is mandatory or optional.

An attribute is simple if we cannot divide it into meaningful parts within our application; an attribute is composite if we can meaningfully divide it into parts. For a composite attribute, we will sometimes need to see it as a whole, while sometimes we will need to access its component parts. Notice that at the most basic level, any attribute other than a bit can be decomposed; for example, a string can be decomposed into its individual characters or letters, and an integer could be divided into its individual bits; the important distinction is whether it is meaningful to do so within the application. We will distinguish the parts of a composite attribute by linking them to the composite attribute rather than to the entity type (or relationship), as shown in Figure 2.

Figure 2: Name is a composite attribute of Person, with three parts: first, middle and last.

Many times, a single entity may have a set of values for an attribute rather than a single value. We call those attributes multi-valued, while attributes that have at most one value are called single-valued. Email addresses and phone numbers are common examples of multi-valued attributes, since people often have more than one email address or phone number. We use a double line for the oval to denote multi-valued attributes, as in Figure 3.

Figure 3: Email is a multi-valued attribute, denoted by double lines.
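
The attribute categories discussed so far can also be mirrored in code. The following is a minimal Python sketch and not part of the original notes; the class names `Name` and `Person`, the method `full`, and the sample values are assumptions chosen only for illustration. It shows a composite attribute with an optional part, and a multi-valued attribute held as a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Name:
    """Composite attribute: usable as a whole or through its parts."""
    first: str                    # mandatory part
    last: str                     # mandatory part
    middle: Optional[str] = None  # optional part

    def full(self) -> str:
        # Join only the parts that are present.
        return " ".join(p for p in (self.first, self.middle, self.last) if p)


@dataclass
class Person:
    name: Name                                       # composite, single-valued
    emails: List[str] = field(default_factory=list)  # multi-valued


ada = Person(Name("Ada", "Lovelace"), ["ada@example.org", "al@example.org"])
print(ada.name.full())   # the composite attribute seen as a whole
print(ada.name.last)     # one of its component parts
print(len(ada.emails))   # a multi-valued attribute holds several values
```

Whether the name is split into parts at all is a modeling decision, exactly as in the diagram: if the application never needs the parts separately, a plain string would do.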

Another important distinction is between stored and derived attributes. Most attributes will be stored in the database, but a few can be obtained from other fields in the database; these attributes are called derived. We denote derived attributes by using a dashed line in the oval, rather than the normal line. The classical example would be age (which we have actually represented as a stored attribute in figures 1 through 3); we normally do not store a person's age, since the age changes; we would rather store the date of birth and calculate the age from there. If we consider age to be important enough to include in the diagram, we would add it with dashed lines. Notice we would NOT mark date-of-birth in any special way, but would add a note of some sort in our textual documentation with the formula for calculating the age. Figure 4 illustrates representing Age as a derived attribute.
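
The note with the formula for calculating the age could be as simple as the following Python sketch. The function name `age_on` and the sample dates are assumptions made for illustration; the notes do not prescribe a particular formula, and this one counts whole years completed.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in whole years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_on(date(2000, 6, 15), date(2024, 6, 14)))  # 23
print(age_on(date(2000, 6, 15), date(2024, 6, 15)))  # 24
```

Only the date of birth is stored; the age is recomputed whenever it is needed, so it never goes stale.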

Figure 4: Age is a derived attribute (dashed line); notice date-of-birth is not especially marked; there would be a note on the definition of age detailing how to calculate it.

In the real world, different entities can always be distinguished from each other. When modeling entities for a database, we do not (normally) store all their attributes (which for a physical entity are arguably infinite); if we are not careful, we may not store enough attributes to distinguish among different entities, leading to confusion. To ensure that doesn't happen, we always mark an attribute (or a set of attributes) as the identifier for an entity type, meaning that no two entities will have the same value for that set of attributes. We denote the identifier by underlining its name. Notice that in many cases we may have more than one attribute that uniquely identifies entities; in that case we will just mark one of the possibilities as the identifier (by underlining it), and we will underline several attributes only when they are all needed, together, to identify the entities of that type. Figure 5 shows the full diagram for the Person entity type, including the use of SSN as identifier.

Figure 5: Complete ER diagram for person attributes
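
To make the meaning of an identifier concrete, here is a small Python sketch under stated assumptions: the class `PersonRegistry`, its methods, and the sample SSN values are hypothetical and only illustrate the rule that no two entity instances may share the same identifier value.

```python
class PersonRegistry:
    """Holds Person instances keyed by their identifier (here, ssn)."""

    def __init__(self):
        self._by_ssn = {}

    def add(self, ssn: str, name: str) -> None:
        # The identifier must be unique across all instances of the entity type.
        if ssn in self._by_ssn:
            raise ValueError(f"duplicate identifier: {ssn}")
        self._by_ssn[ssn] = name

    def get(self, ssn: str) -> str:
        return self._by_ssn[ssn]


registry = PersonRegistry()
registry.add("111-11-1111", "Pat Smith")
registry.add("222-22-2222", "Pat Smith")  # same name, different identifier: allowed
try:
    registry.add("111-11-1111", "Someone Else")
except ValueError as err:
    print(err)  # duplicate identifier: 111-11-1111
```

Two people with the same name remain distinguishable because the identifier, not the name, is what tells them apart.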

3 Relationship Basics
Normally, entities by themselves are not terribly useful; we need to somehow relate the entities to one another. In the ER model, relationships are used to represent associations between entities. Notice that they are the only way to represent associations among entities in the ER model[3]. In ER diagrams, we use diamonds to represent relationships among entities. When modeling relationships, we again can make the distinction between types and instances; a relationship type is a meaningful association that can occur between different entities of specified entity types, whereas a relationship instance is each one of the actual associations between entity instances. As with entities, we normally model relationship types, since the instances will eventually be stored in the database. Figure 6 shows an incomplete ER diagram for a relationship. The diagram does not illustrate cardinality constraints, which we will discuss shortly. Notice that we use verb phrases to name relationships (in this example, lives in); also, although relationships are bidirectional, and could be navigated in either direction, the name we give to a relationship assumes one specific direction (in this example, it would be understood that a person lives in a country, rather than the other direction).

[3] In the relational model, which we will discuss in other chapters, we use foreign keys to represent associations among rows in a table; in the ER model, attributes cannot be used to associate entities, only relationships are used to do that.

Figure 6: A person lives in a country (missing cardinality constraints)

When modeling relationships, one of the most important issues is the degree of a relationship, that is, the number of entities that participate in it. The simplest relationships are binary, since they relate two entities (later we will discuss relationships of different degrees). Another important issue is the cardinality of a relationship. For each side of a relationship we identify constraints on the number of instances of the opposite side an entity could be related to. For example, we could specify that a person, at any given time, lives in exactly one country, and a country may have zero or more people living in it at any given time. Figure 7 illustrates how we would represent such a situation:

Figure 7: Cardinality constraints; a person lives in exactly one country (minimum of 1, maximum of 1, represented with ||) and a country can have zero or more people living in it (represented with 0<)

On the side opposite to person (next to country) we specify the minimum and maximum number of countries a person lives in; for the minimum we would use either a 0 or a 1 (represented by a circle or a vertical line, respectively) and for the maximum either 1 (again a vertical line) or more than one (represented by < or >); in this case we use || (minimum 1, maximum 1) for person. On the side opposite to country (next to person) we specify how many people could live in the same country; we specify that the minimum is zero (0) while the maximum is more than one; the full constraint is represented as >0. The ER notation can be extended to represent specific limits if necessary, but in most situations there are no specific limits other than 0, 1 or more than one.

For binary relationships, we combine the maximum cardinality on both sides, and we say a relationship is one-to-one, one-to-many (or many-to-one) or many-to-many. The relationship in Figure 7 would be one-to-many. Notice that relationships may also have attributes; for example, if we wanted to include in our model the date a person started living in a country, we could add an attribute (say, since) to the lives in relationship, as illustrated in Figure 8.

Figure 8: Relationships can also have attributes
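
The cardinality constraints and the relationship attribute can be checked against sample data. The following Python sketch is an illustration only; the names `people`, `countries`, `lives_in` and `check_lives_in`, and all the sample values, are assumptions. Each relationship instance is a tuple linking one person to one country, with `since` stored on the relationship itself.

```python
from collections import Counter
from datetime import date

people = ["ana", "bob", "cy"]
countries = ["MX", "US", "CA"]

# Each relationship instance links one person to one country;
# `since` is an attribute of the relationship, not of either entity.
lives_in = [
    ("ana", "MX", date(2010, 1, 1)),
    ("bob", "US", date(2015, 5, 20)),
    ("cy", "US", date(2020, 9, 1)),
]


def check_lives_in(people, lives_in):
    """Return the people who violate 'lives in exactly one country'."""
    per_person = Counter(person for person, _country, _since in lives_in)
    return [p for p in people if per_person[p] != 1]


# A country may have zero or more residents, so nothing to check there;
# we only count them.
residents = Counter(country for _person, country, _since in lives_in)
print(check_lives_in(people, lives_in))      # []
print({c: residents[c] for c in countries})  # {'MX': 1, 'US': 2, 'CA': 0}
```

The one-to-many shape shows up directly in the data: each person appears in exactly one tuple, while a country may appear in none, one, or many.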

4 Recursive Relationships
Recursive relationships are those that relate two or more entities of the same type. Normally, we use the entity type to distinguish among the sides of a relationship, but with recursive relationships there are at least two sides that are the same, so we need to introduce the concept of role to distinguish among them. A role is just a name given to one of the sides of a relationship. Notice that the book calls this kind of relationship unary, but this idea does not generalize. If we actually used the book's terminology, a relationship that relates three entity instances of two different entity types would be considered binary (and so, normal and easy :). Although I accept the book's definitions as answers to exams, I prefer to define degree as the number of entity instances that participate in a relationship instance.

Recursive relationships are not terribly uncommon, since, besides other uses, they are useful to represent hierarchies. One of the best examples is the supervises relationship. In many companies, people supervise other people. So, if we want to show that relationship we could start by thinking about having two kinds of entities, say Boss and Peon, with the supervising relationship as follows.

Figure 9: Conceptual idea of Supervises. WRONG

And we would say a boss supervises zero or more peons, and a peon is supervised by zero or one bosses. Of course, we would realize this is not exactly right, since this reflects only a two-level hierarchy, and what we want is one with unlimited levels. In order to do that, we need to introduce recursion (just like in programming), so we need to fold this diagram so the two entities become one. So, our diagram would look like this:

Figure 10: Supervises, roles missing, INCOMPLETE

Now the problem would be how to distinguish between the two sides; that is, whether you can have more than one peon or more than one boss. We do that by including the role each entity instance plays in the relationship instance. Basically, we assign a name to each side, and use that to distinguish among the sides. So, the full diagram, specifying that each employee has zero or one bosses (assuming the CEO doesn't have any boss), and that each employee may supervise zero or more other employees, would look like this:

Figure 11: Supervises, final and right version

Notice that the cardinality constraints apply to the full entity type, not just to the roles. It may look funny to allow for a boss with no subordinates, but we need to remember this cardinality constraint applies to all Employees.
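
The folded, recursive version of supervises can be sketched in a few lines of Python. This is only an illustration under assumptions: the mapping `boss_of`, the functions `subordinates` and `chain_of_command`, and the employee names are all hypothetical. Storing at most one boss per employee enforces the "zero or one bosses" side by construction, and the sketch assumes the hierarchy has no cycles.

```python
# boss_of maps each employee to their boss; None means no boss (e.g. the CEO).
boss_of = {
    "ceo": None,
    "vp": "ceo",
    "lead": "vp",
    "dev": "lead",
}


def subordinates(employee):
    """Employees directly supervised by `employee` (the other role)."""
    return sorted(e for e, boss in boss_of.items() if boss == employee)


def chain_of_command(employee):
    """Follow the boss role upward until reaching someone with no boss."""
    chain = []
    boss = boss_of[employee]
    while boss is not None:
        chain.append(boss)
        boss = boss_of[boss]
    return chain


print(chain_of_command("dev"))  # ['lead', 'vp', 'ceo']
print(subordinates("ceo"))      # ['vp']
print(subordinates("dev"))      # []
```

The same entity type appears in both roles: "lead" is a boss with respect to "dev" and a subordinate with respect to "vp", which is exactly what the two-entity Boss/Peon version could not express.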

Bill of Materials
Another example is the part-of relationship, also called the bill of materials problem. This involves representing the fact that an item is made from other items; these subitems may in turn be made from other items, in a hierarchy. So, we can follow the same kind of reasoning. We start from a diagram that considers them to be two separate items:

Figure 12: Bill of materials, conceptual idea. WRONG

And then fold them into one recursive relationship as follows:

Figure 13: Bill of materials, right version, no attributes

Now, we may want to improve this diagram by adding how many of each part each item uses. The diagram would look as follows:

Figure 14: Bill of materials, attribute on relationship (my preference)
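
One common use of the quantity attribute is computing how many basic parts an item needs in total. The Python sketch below is an illustration under assumptions: the mapping `made_of`, the function `base_parts`, and the bicycle data are hypothetical, and the hierarchy is assumed to contain no cycles.

```python
# made_of[item] lists (sub_item, quantity) pairs; quantity is the
# attribute on the recursive part-of relationship.
made_of = {
    "bicycle": [("wheel", 2), ("frame", 1)],
    "wheel": [("spoke", 32), ("rim", 1)],
}


def base_parts(item, count=1, totals=None):
    """Total quantity of each item with no sub-items needed to build `item`."""
    if totals is None:
        totals = {}
    components = made_of.get(item, [])
    if not components:
        # A leaf in the hierarchy: count it directly.
        totals[item] = totals.get(item, 0) + count
        return totals
    for sub_item, quantity in components:
        # Quantities multiply along each path down the hierarchy.
        base_parts(sub_item, count * quantity, totals)
    return totals


print(base_parts("bicycle"))  # {'spoke': 64, 'rim': 2, 'frame': 1}
```

The recursion in the function mirrors the recursion in the relationship: an unlimited number of levels is handled without adding any new entity types.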

Additional Examples of Recursive Relationships


There are a few other examples that always come to my mind when talking about recursive relationships. One example that is easy to understand and is NOT used to represent a hierarchy is that of marriages. Since each marriage is between two people, this would be a recursive relationship; however, it is one-to-one and does not form a hierarchy. The diagram would look like this, with the roles identified as husband and wife.

Figure 15: Marriage is a one-to-one recursive relationship that does not form a hierarchy.

Notice that there may be constraints other than cardinalities (such as gender requirements for each role), and those constraints should be added as annotations outside the diagram. Another example, common in the academic domain, is that of course prerequisites and co-requisites. In many academic programs you are required to take certain courses before others (prerequisites); sometimes, you are allowed to take the courses concurrently (co-requisites). We can express this situation as two different recursive relationships among courses. Another way to model this would be to represent this as a single relationship, with the kind of requisite (either pre or co) as an attribute. The two diagrams would look like this:

Figure 16: Course prerequisites and corequisites as two separate relationships

Figure 17: Course prerequisites and corequisites as one combined relationship
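
The two ways of modeling requisites carry the same facts, which a short Python sketch can show. Everything named here is an assumption for illustration: the sets `prerequisite`, `corequisite` and `requisite`, the function `requisites_of`, and the course codes.

```python
# Two separate recursive relationships on Course,
# each a set of (course, required_course) pairs.
prerequisite = {("CS2", "CS1"), ("CS3", "CS2")}
corequisite = {("CS2", "MATH1")}

# One combined relationship, with the kind of requisite as an attribute.
requisite = {(c, r, "pre") for c, r in prerequisite} | {
    (c, r, "co") for c, r in corequisite
}


def requisites_of(course, kind):
    """Courses that `course` requires, filtered by kind ('pre' or 'co')."""
    return sorted(r for c, r, k in requisite if c == course and k == kind)


print(requisites_of("CS2", "pre"))  # ['CS1']
print(requisites_of("CS2", "co"))   # ['MATH1']
```

Going from two relationships to one only adds a tag to each pair, and filtering on that tag recovers the original two relationships.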

5 Additional Examples
To further clarify the basic ER concepts, we provide a few additional examples.

1. We have only one entity, called Person (everything else is represented as attributes), with the following attributes: Id (the identifier); Name, which is composed of one or more given names and one or more family names; one or more aliases; an address (composed of street, city, state, zip); date-of-birth; and age, which can be calculated from the date of birth.

Figure 18: Solution to Example 1

This example illustrates most of the different kinds of attributes, plus the fact that the representation of a given attribute depends on the situation; here name is NOT divided into first, middle, last.

2. We have two kinds of entities: Products and Categories, both with attributes id (the identifier) and name. Each product belongs to zero or more categories and each category can have zero or more products. Categories are organized in a hierarchy; each category belongs to zero or one categories, and may have zero or more sub-categories.

Figure 19: Solution to example 2

3. We want to model course and program information for a university. For each course we keep its id, its title, the number of lecture hours, the number of lab hours, and the number of credit hours (which is always equal to the number of lecture hours plus one-half of the number of lab hours). Courses may have other courses as prerequisites. We also want to keep information about programs of study. For each program we want to keep its id and its title. We also want to keep track of which courses are required for a program of study, and which courses are allowed as electives for a program of study.

This one could easily be modeled in two different ways. The first solution uses two different relationships between program and course: requires, which represents the fact that a course is required for a program, and uses as elective, which represents the fact that a program uses a course as an elective. Notice the cardinalities for the relationships weren't really specified in the requirements, but zero-or-more sounds reasonable.

Figure 20: Solution to Exercise 3, with two different relationships between program and course

The second solution uses only one relationship, with an attribute to distinguish between required and elective courses. Notice this would convey less information if the cardinalities for 'Requires' and 'Uses As Elective' were different.

Figure 21: Solution to exercise 3, with just one relationship, with a boolean attribute of the relationship indicating whether the program uses the course as a requirement or an elective
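
The credit-hours rule and the two solutions can be tried out in code. The Python sketch below is an illustration only; the class `Course`, the lists `uses`, `requires` and `uses_as_elective`, and the course and program ids are assumptions, not data from the notes. Credit hours are treated as a derived attribute, following the formula in the requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    id: str
    title: str
    lecture_hours: int
    lab_hours: int

    @property
    def credit_hours(self) -> float:
        # Derived attribute: lecture hours plus one-half of the lab hours.
        return self.lecture_hours + self.lab_hours / 2


db = Course("CS3410", "Databases", 3, 2)

# Second solution: one relationship with a boolean attribute.
# Each tuple is (program_id, course_id, is_required).
uses = [("BSCS", "CS3410", True), ("BSCS", "CS4500", False)]

# First solution: the same facts split into two relationships.
requires = [(p, c) for p, c, required in uses if required]
uses_as_elective = [(p, c) for p, c, required in uses if not required]

print(db.credit_hours)   # 4.0
print(requires)          # [('BSCS', 'CS3410')]
print(uses_as_elective)  # [('BSCS', 'CS4500')]
```

With zero-or-more on every side, converting between the two solutions loses nothing; separate cardinalities per relationship are the one thing the single-relationship version cannot state.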

4. We want to keep track of courses, sections and professors. A course has a course number (identifier) and a title. A section has a crn (identifier), semester, and year. A section belongs to exactly one course, and a course may have zero or more sections. A professor has an ssn (identifier) and a name. A professor teaches one or more sections (a section is taught by exactly one professor). A professor is qualified to teach one or more courses (a course may have one or more qualified professors). We also want to keep track of the date a professor became qualified to teach the course.

Figure 22: Solution to exercise 4
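
The minimum cardinalities in this example can be checked against sample data. The following Python sketch is one possible illustration, under assumptions: the dictionaries `courses`, `professors`, `sections` and `qualified`, the function `violations`, and all sample values are hypothetical. Storing the course and the professor inside each section record enforces the "exactly one" sides by construction, so only the "one or more" sides need checking.

```python
from datetime import date

courses = {"CS101": "Intro"}                # course number -> title
professors = {"111": "Lee", "222": "Kim"}   # ssn -> name
# crn -> (semester, year, course_number, professor_ssn)
sections = {
    "10001": ("Fall", 2024, "CS101", "111"),
    "10002": ("Spring", 2025, "CS101", "111"),
}
# (professor_ssn, course_number) -> date the professor became qualified;
# the date is an attribute of the many-to-many relationship.
qualified = {("111", "CS101"): date(2020, 8, 1)}


def violations():
    """List the minimum-cardinality rules that the sample data breaks."""
    problems = []
    teaching = {ssn for _sem, _year, _course, ssn in sections.values()}
    qualified_profs = {ssn for ssn, _course in qualified}
    qualified_courses = {course for _ssn, course in qualified}
    for ssn in professors:
        if ssn not in teaching:
            problems.append(f"professor {ssn} teaches no section")
        if ssn not in qualified_profs:
            problems.append(f"professor {ssn} is qualified for no course")
    for number in courses:
        if number not in qualified_courses:
            problems.append(f"course {number} has no qualified professor")
    return problems


print(violations())
```

In this sample, professor 222 has no sections and no qualifications, so both "one or more" rules for professors are reported as broken.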
