Ch2 ERModelingIntro Notes
ER Modeling
1 of 13
Databases, okaram
In the following sections we discuss the Entity-Relationship (ER) model, one of the most common models used in databases, and the ER diagrams we use to represent it. Besides these diagrams, we normally have other information in textual form: at least data definitions, and comments expressing constraints that are hard to express in the diagram. When we express the requirements in English we use terms, which are words or phrases with a specific meaning for the business, and facts, which express relationships among those terms. Since we will be creating this model from user requirements, and therefore discussing it with the users, we want a graphical notation that is relatively easy for 'normal' people to understand.
A term is a word or phrase with a specific meaning for the business. A fact is a statement that expresses a relationship among two or more terms.
We represent an entity type in an ER diagram by a rectangle with the name of the entity type inside it. As a convention, we use singular nouns for naming entity types (so we would use Person rather than People). Entities have properties, which we call attributes. We represent an attribute by an oval with the name of the attribute inside it, and we use a line to attach each attribute to the entity type it belongs to. Figure 1 shows a simple representation of a Person entity type.
Figure 1: Simple ER diagram for Person. Notice the rectangle represents the entity type (Person) and the ovals the attributes.
We classify attributes according to four independent binary categories: mandatory or optional, simple or composite, single-valued or multi-valued, and stored or derived.
An attribute is mandatory if all entity instances of a given entity type must have a value for the attribute, and it is optional otherwise. For example, for a particular application we may require that all students in the database must have a first and last name (which would make those mandatory), whereas the middle name would usually be considered optional. We do not use any specific markings in our drawing to denote whether an attribute is mandatory or optional.
An attribute is simple if we cannot divide it into meaningful parts within our application; an attribute is composite if we can meaningfully divide it into parts. For a composite attribute, we will sometimes need to see it as a whole, while sometimes we will need to access its component parts. Notice that at the most basic level, any attribute other than a bit can be decomposed; for example, a string can be decomposed into its individual characters or letters, and an integer could be divided into its individual bits. The important distinction is whether it is meaningful to do so within the application. We distinguish the parts of a composite attribute by linking them to the composite attribute rather than to the entity type (or relationship), as shown in Figure 2.
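To make these two classifications concrete, here is a minimal sketch using Python dataclasses. The class and field names (`Name`, `Person`, `full`) are illustrative assumptions, not part of the ER notation: an optional attribute is simply allowed to have no value, and a composite attribute becomes a small type of its own whose parts remain accessible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Name:
    """Composite attribute: meaningful as a whole and by its parts."""
    first: str                    # mandatory part
    last: str                     # mandatory part
    middle: Optional[str] = None  # optional part: may have no value

    def full(self) -> str:
        # View the composite attribute as a whole, skipping a missing middle name.
        parts = [self.first, self.middle, self.last]
        return " ".join(p for p in parts if p)


@dataclass
class Person:
    name: Name  # composite attribute
    age: int    # simple attribute: not meaningfully divisible in this application


p = Person(name=Name(first="Ada", last="Lovelace"), age=36)
print(p.name.full())  # the attribute as a whole
print(p.name.last)    # one of its component parts
```

Whether `Name` deserves its own type is exactly the "is it meaningful within the application" question from the text; an application that never needs the parts could store a single string instead.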
Figure 2: Name is a composite attribute of Person, with three parts: first, middle and last
Often, a single entity may have a set of values for an attribute rather than a single value. We call such attributes multi-valued, and we call attributes that have at most one value single-valued. Email addresses and phone numbers are common examples of multi-valued attributes, since people often have more than one email address or phone number. We use a double line for the oval to denote multi-valued attributes, as in Figure 3.
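As a rough illustration, a multi-valued attribute can be thought of as holding a set of values rather than one value. The sketch below assumes a hypothetical `Person` class with `emails` and `phones` fields; the names and sample addresses are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Person:
    ssn: str                                       # single-valued: at most one value
    emails: Set[str] = field(default_factory=set)  # multi-valued: zero or more values
    phones: Set[str] = field(default_factory=set)  # multi-valued: zero or more values


p = Person(ssn="123-45-6789")
p.emails.add("ada@example.com")
p.emails.add("ada@work.example.com")
p.emails.add("ada@example.com")  # adding the same value again changes nothing
print(len(p.emails))
```

A set is used here on the assumption that the order of the values does not matter and that repeating a value carries no meaning.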
Another important distinction is between stored and derived attributes. Most attributes will be stored in the database, but a few can be obtained from other fields in the database; these attributes are called derived. We denote derived attributes by using a dashed line for the oval, rather than the normal line. The classical example is age (which we have actually represented as a stored attribute in Figures 1 through 3). We normally do not store a person's age, since the age changes; we would rather store the date of birth and calculate the age from it. If we consider age to be important enough to include in the diagram, we would add it with dashed lines. Notice we would NOT mark date-of-birth in any special way, but would add a note of some sort in our textual documentation with the formula for calculating the age. Figure 4 illustrates representing Age as a derived attribute.
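The "formula for calculating the age" mentioned above might look like the following sketch. The function name `age_on` and the choice to pass the reference date explicitly are assumptions made for the example; the point is only that age is computed from the stored date of birth instead of being stored itself.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in whole years from the stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


dob = date(1990, 6, 15)
print(age_on(dob, date(2024, 6, 14)))  # the day before the birthday
print(age_on(dob, date(2024, 6, 15)))  # on the birthday
```

Passing `today` as a parameter keeps the calculation repeatable; an application would typically supply the current date.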
Figure 4: Age is a derived attribute (dashed line); notice date-of-birth is not specially marked. There would be a note on the definition of age detailing how to calculate it
In the real world, different entities can always be distinguished from each other. When modeling an entity for a database, we do not (normally) store all its attributes (which for a physical entity are arguably infinite); if we are not careful, we may not store enough attributes to distinguish among different entities, leading to confusion. To ensure that doesn't happen, we always mark an attribute (or a set of attributes) as the identifier for an entity type, meaning that no two entities will have the same value for that set of attributes. We denote the identifier by underlining its name. Notice that in many cases more than one attribute could uniquely identify entities; in that case we mark just one of the possibilities as the identifier (by underlining it). We underline several attributes only when they are all needed, together, to identify the entities of that type. Figure 5 shows the full diagram for the Person entity type, including the use of SSN as identifier.
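The requirement that no two entities share an identifier value can be sketched as a lookup keyed by the identifier. The `PersonRegistry` class and `DuplicateIdentifier` exception below are hypothetical names invented for this illustration; they only show the rule being enforced, with SSN as the identifier as in Figure 5.

```python
class DuplicateIdentifier(Exception):
    """Raised when a new entity repeats an existing identifier value."""


class PersonRegistry:
    """Hypothetical in-memory store keyed by the identifier (SSN)."""

    def __init__(self):
        self._by_ssn = {}  # identifier value -> attribute dict

    def add(self, ssn, name):
        if ssn in self._by_ssn:
            raise DuplicateIdentifier(ssn)
        self._by_ssn[ssn] = {"ssn": ssn, "name": name}

    def get(self, ssn):
        return self._by_ssn[ssn]


registry = PersonRegistry()
registry.add("111-11-1111", "Pat Smith")
registry.add("222-22-2222", "Pat Smith")  # same name, different identifier: allowed
try:
    registry.add("111-11-1111", "Sam Jones")
except DuplicateIdentifier as err:
    print("rejected duplicate identifier:", err)
```

Two people may share a name, but the second attempt to use an existing SSN is rejected, which is what keeps the stored entities distinguishable.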
3 Relationship Basics
Normally, entities by themselves are not terribly useful; we need to somehow relate the entities to one another. In the ER model, relationships are used to represent associations between entities. Notice that they are the only way to represent associations among entities in the ER model (see footnote 3). In ER diagrams, we use diamonds to represent relationships among entities. When modeling relationships, we again can make the distinction between types and instances; a relationship type is a meaningful association that can occur between different entities of specified entity types, whereas a relationship instance is one of the actual associations between entity instances. As with entities, we normally model relationship types, since the instances will eventually be stored in the database. Figure 6 shows an incomplete ER diagram for a relationship. The diagram does not illustrate cardinality constraints, which we will discuss shortly. Notice that we use verb phrases to name relationships (in this example, lives in); also, although relationships are bidirectional and could be navigated in either direction, the name we give to a relationship assumes one specific direction (in this example, it would be understood that a person lives in a country, rather than the other direction).
3 In the relational model, which we will discuss in other chapters, we use foreign keys to represent associations among rows in a table; in the ER model, attributes cannot be used to associate entities, only relationships are used to do that.
Figure 6: A person lives in a country (missing cardinality constraints)
When modeling relationships, one of the most important issues is the degree of a relationship, that is, the number of entities that participate in it. The simplest relationships are binary, since they relate two entities (later we will discuss relationships of different degrees). Another important issue is the cardinality of a relationship. For each side of a relationship we identify constraints on the number of instances of the opposite side an entity could be related to. For example, we could specify that a person, at any given time, lives in exactly one country, and that a country may have zero or more people living in it at any given time. Figure 7 illustrates how we would represent such a situation:
Figure 7: Cardinality constraints; a person lives in exactly one country (minimum of 1, maximum of 1, represented with ||) and a country can have zero or more people living in it (represented with 0<)
On the side opposite to person (next to country) we specify the minimum and maximum number of countries a person lives in. For the minimum we use either 0 or 1 (represented by a circle or a vertical line) and for the maximum either 1 or more than one (represented by a vertical line or by a crow's foot, drawn as < or >); in this case we use || (minimum 1, maximum 1) for person. On the side opposite to country (next to person) we specify how many people could live in the same country; the minimum is zero (0) and the maximum is more than one, so the full constraint is represented as >0. The ER notation can be extended to represent specific limits if necessary, but in most situations there are no specific limits other than 0, 1 or more than one.
For binary relationships, we combine the maximum cardinality on both sides, and we say a relationship is one-to-one, one-to-many (or many-to-one) or many-to-many. The relationship in Figure 7 would be one-to-many.
Notice that relationships may also have attributes; for example, if we wanted to include in our model the date a person started living in a country, we could add an attribute (say, since) to the lives in relationship, as illustrated in Figure 8.
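One way to see what these constraints mean is to check them against a handful of relationship instances. The sketch below is only an illustration under stated assumptions: the sample people and countries, the tuple layout `(person, country, since)`, and the function name `check_lives_in` are all invented here, and the data represents a single point in time.

```python
from collections import Counter
from datetime import date

people = {"p1", "p2", "p3"}
countries = {"US", "MX", "IS"}

# Relationship instances of "lives in": (person, country, since).
# "since" is an attribute of the relationship, not of either entity.
lives_in = [
    ("p1", "US", date(2001, 5, 1)),
    ("p2", "US", date(2010, 1, 15)),
    ("p3", "MX", date(1999, 9, 30)),
]


def check_lives_in(people, lives_in):
    """List every person who violates 'lives in exactly one country'."""
    per_person = Counter(person for person, _country, _since in lives_in)
    return [
        f"{person}: {per_person[person]} countries, expected exactly 1"
        for person in sorted(people)
        if per_person[person] != 1
    ]


# The country side is "zero or more", so any count is acceptable there.
residents = Counter(country for _person, country, _since in lives_in)

print(check_lives_in(people, lives_in))
print({c: residents[c] for c in sorted(countries)})
```

Only the person side needs a check: a country with no residents ("IS" in the sample data) and a country with several ("US") both satisfy the zero-or-more constraint.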
4 Recursive Relationships
Recursive relationships are those that relate two or more entities of the same type. Normally, we use the entity type to distinguish among the sides of a relationship, but with recursive relationships at least two sides are the same, so we need to introduce the concept of role to distinguish among them. A role is just a name given to one of the sides of a relationship.
Notice that the book calls this kind of relationship unary, but this idea does not generalize. If we actually used the book's terminology, a relationship that relates three entity instances of two different entity types would be considered binary (and so, normal and easy :). Although I accept the book's definitions as answers on exams, I prefer to define degree as the number of entity instances that participate in a relationship instance.
Recursive relationships are not terribly uncommon since, besides other uses, they are useful to represent hierarchies. One of the best examples is the supervises relationship. In many companies, people supervise other people. So, if we want to show that relationship, we could start by thinking about having two kinds of entities, say Boss and Peon, with the supervising relationship as follows.
Figure 9: Conceptual idea of Supervises. WRONG
And we would say a boss supervises zero or more peons, and a peon is supervised by zero or one bosses. Of course, we would realize this is not exactly right, since this reflects only a two-level hierarchy, and what we want is one with unlimited levels. In order to do that, we need to introduce recursion (just like in programming), so we need to fold this diagram so the two entities become one. So, our diagram would look like this:
Figure 10: Supervises, roles missing, INCOMPLETE
Now the problem is how to distinguish between the two sides; that is, whether you can have more than one peon or more than one boss. We do that by including the role each entity instance plays in the relationship instance. Basically, we assign a name to each side, and use that to distinguish between the sides. So, the full diagram, specifying that each employee has zero or one bosses (assuming the CEO doesn't have any boss), and that each employee may supervise zero or more other employees, would look like this:
Figure 11: Supervises, final and right version
Notice that the cardinality constraints apply to the full entity type, not just to the roles. It may look funny to allow for a boss with no subordinates, but we need to remember this cardinality constraint applies to all Employees.
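The parallel with recursion in programming can be made explicit with a small sketch. It assumes a made-up set of employees and stores the supervises relationship as a mapping from each employee to their boss; the names `boss_of`, `chain_of_command` and `subordinates` are illustrative only.

```python
# Each employee maps to zero or one boss (None for the employee with no boss).
boss_of = {
    "ceo": None,
    "vp": "ceo",
    "lead": "vp",
    "dev": "lead",
    "intern": "lead",
}


def chain_of_command(employee, boss_of):
    """Follow the 'is supervised by' role upward through every level."""
    chain = []
    boss = boss_of[employee]
    while boss is not None:
        chain.append(boss)
        boss = boss_of[boss]
    return chain


def subordinates(boss, boss_of):
    """Follow the 'supervises' role one level down: zero or more employees."""
    return sorted(e for e, b in boss_of.items() if b == boss)


print(chain_of_command("dev", boss_of))  # three levels above "dev"
print(subordinates("lead", boss_of))     # two direct subordinates
print(subordinates("intern", boss_of))   # an employee who supervises nobody
```

Because every entry is an employee, the same entity type plays both roles, and the hierarchy can be as deep as the data requires. Storing one boss per employee is what encodes the zero-or-one side of the relationship.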
Bill of Materials
Another example is the part-of relationship, also called the bill of materials problem. This involves representing the fact that an item is made from other items; these subitems may in turn be made from other items, in a hierarchy. So, we can follow the same kind of reasoning. From a diagram that considers them to be two separate items:
Figure 12: Bill of materials, conceptual idea. WRONG
And then fold them into one recursive relationship as follows:
Figure 13: Bill of materials, right version, no attributes
Now, we may want to improve this diagram by adding how many of each part each item uses. The diagram would look as follows:
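To see why the quantity belongs to the relationship rather than to either item, consider the following sketch. The bicycle data, the `uses` mapping and the function `base_parts` are assumptions invented for the illustration, and the function assumes the part-of hierarchy contains no cycles.

```python
from collections import Counter

# Relationship instances of "part of": (item, part) -> quantity.
# The quantity describes the pair, so it is an attribute of the relationship.
uses = {
    ("bicycle", "wheel"): 2,
    ("bicycle", "frame"): 1,
    ("wheel", "spoke"): 32,
    ("wheel", "rim"): 1,
}


def base_parts(item, uses, multiplier=1):
    """Total count of each part with no sub-parts needed to build `item`."""
    children = [(part, qty) for (parent, part), qty in uses.items() if parent == item]
    totals = Counter()
    if not children:
        # An item with no parts of its own counts as itself.
        totals[item] += multiplier
        return totals
    for part, qty in children:
        # Recurse one level down, scaling by the quantity on the relationship.
        totals.update(base_parts(part, uses, multiplier * qty))
    return totals


print(dict(base_parts("bicycle", uses)))
```

The same wheel could be used in different quantities by different items, which is why the number cannot be stored on the wheel itself.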
relationship; however, it is one-to-one and does not form a hierarchy. The diagram would look like this, with the roles identified as husband and wife.
Figure 15: Marriage is a one-to-one recursive relationship that does not form a hierarchy.
Notice that there may be constraints other than cardinalities (such as gender requirements for each role), and those constraints should be added as annotations outside the diagram.
Another example, common in the academic domain, is that of course prerequisites and co-requisites. In many academic programs you are required to take certain courses before others (prerequisites); sometimes, you are allowed to take the courses concurrently (co-requisites). We can express this situation as two different recursive relationships among courses. Another way to model this would be as a single relationship, with the kind of requisite (either pre or co) as an attribute. The diagrams would look like this:
Figure 16: Course prerequisites and co-requisites as two separate relationships
Figure 17: Course prerequisites and co-requisites as one combined relationship
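The two modeling choices carry the same information, which the sketch below tries to show by converting between them. The course names, the pair layout `(course, required_course)`, and the functions `combine` and `split` are hypothetical; the combined form also assumes a given pair of courses has only one kind of requisite.

```python
# Two separate recursive relationships: sets of (course, required_course).
prerequisite = {("CS2", "CS1"), ("DB", "CS2")}
corequisite = {("DB", "DBLab")}


def combine(prerequisite, corequisite):
    """One relationship whose attribute 'kind' is either 'pre' or 'co'."""
    combined = {pair: "pre" for pair in prerequisite}
    combined.update({pair: "co" for pair in corequisite})
    return combined


def split(requisite):
    """Recover the two separate relationships from the combined one."""
    pre = {pair for pair, kind in requisite.items() if kind == "pre"}
    co = {pair for pair, kind in requisite.items() if kind == "co"}
    return pre, co


requisite = combine(prerequisite, corequisite)
print(requisite[("DB", "CS2")])    # kind of this requisite
print(requisite[("DB", "DBLab")])  # kind of this requisite
```

Converting to the combined form and back returns the original two sets, as long as no pair appears in both.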
5 Additional Examples
To further clarify the basic ER concepts, we provide a few additional examples.
1. We have only one entity, called Person (everything else is represented as attributes), with the following attributes: Id (the identifier); name, which is composed of one or more given names and one or more family names; one or more aliases; an address (composed of street, city, state, zip); date-of-birth; and age, which can be calculated from the date of birth.
Figure 18: Solution to Example 1
This example illustrates most of the different kinds of attributes, plus the fact that the representation of a given attribute depends on the situation; here name is NOT divided into first, middle, last.
2. We have two kinds of entities: Products and Categories, both with attributes id (the identifier), and name. Each product belongs to zero or more categories and each category can have zero or more products. Categories are organized in a hierarchy; each category belongs to zero or one categories, and may have zero or more sub-categories.
Figure 19: Solution to Example 2
3. We want to model course and program information for a university. For each course we keep its id, its title, the number of lecture hours, the number of lab hours, and the number of credit hours (which is always equal to the number of lecture hours plus one-half of the number of lab hours). Courses may have other courses as prerequisites. We also want to keep information about programs of study. For each program we want to keep its id and its title. We also want to keep track of which courses are required for a program of study, and which courses are allowed as electives for a program of study.
This example can easily be modeled in two different ways. The first solution uses two different relationships between program and course: requires, which represents the fact that a course is required for a program, and uses as elective, which represents the fact that a program uses a course as an elective. Notice the cardinalities for the relationships weren't really specified in the requirements, but zero-or-more sounds reasonable.
Figure 20: Solution to Example 3, with two different relationships between program and course
The second solution uses only one relationship, with an attribute to distinguish between required and elective courses. Notice this would convey less information if the cardinalities for 'Requires' and 'Uses As Elective' were different.
Figure 21: Solution to Example 3, with just one relationship; a boolean attribute of the relationship indicates whether the program uses the course as a requirement or an elective
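Example 3 states that credit hours always equal lecture hours plus one-half of lab hours, which makes credit hours a derived attribute. A minimal sketch of that rule follows; the `Course` class, its field names and the sample course are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Course:
    id: str
    title: str
    lecture_hours: int  # stored
    lab_hours: int      # stored

    @property
    def credit_hours(self) -> float:
        # Derived: lecture hours plus one-half of the lab hours.
        return self.lecture_hours + self.lab_hours / 2


course = Course(id="DB1", title="Databases", lecture_hours=3, lab_hours=2)
print(course.credit_hours)
```

Computing the value on demand means it can never disagree with the lecture and lab hours it is derived from.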
4. We want to keep track of courses, sections and professors. A course has a course number (identifier), and a title. A section has a crn (identifier), semester, and year. A section belongs to exactly one course, and a course may have zero or more sections. A professor has ssn (identifier) and name. A professor teaches one or more sections (a section is taught by exactly one professor). A professor is qualified to teach one or more courses (a course may have one or more qualified professors). We also want to keep track of the date a professor became qualified to teach the course.