Database Chapter 2
Chapter 2
Introduction to Relational Database
By:
Dr. Hajar Esmaeil As-Suhbani
Introduction
The relational database model, introduced by Edgar F. (Ted) Codd in 1970,
revolutionized data management by using the concept of mathematical
relations (tables) to organize and manipulate data.
This model replaced earlier data models like the network model and hierarchical
model, offering greater simplicity and flexibility for programmers.
First Commercial Implementations:
Oracle DBMS: One of the first commercial relational database systems.
SQL/DS (IBM): IBM's early relational database system.
Why the Relational Model Dominates:
Simplicity: Easier to understand and use compared to older models.
Flexibility: Allows for efficient querying and manipulation of data.
Standardization: SQL (Structured Query Language) became the universal language for
relational databases.
Popular Relational Database Management Systems (RDBMS):
Microsoft: SQL Server, Access.
IBM: DB2, Informix.
Others: MySQL, PostgreSQL, Oracle Database.
SQL as the Standard:
SQL is the standard query language for interacting with relational databases, enabling
operations like data retrieval, insertion, updating, and deletion.
Structure of Relational Databases
A relational database consists of a collection of tables (a table is also called a
relation), each of which is assigned a unique name.
For example, consider the instructor table of Figure 2.1, which stores information
about instructors.
Column headers correspond to attributes.
The table has four column headers: ID, name, dept_name, and salary.
Each row (tuple) of this table records information about an instructor, consisting of the
instructor’s ID, name, dept_name, and salary.
…Structure of Relational Databases
Similarly, the course table of Figure 2.2 stores information about courses, consisting
of a course_id, title, dept_name, and credits, for each course.
Note that each instructor is identified by the value of the column ID, while each course is
identified by the value of the column course_id.
…Structure of Relational Databases
Figure 2.3 shows a third table, prereq, which stores the prerequisite courses for each
course.
…Structure of Relational Databases
Attribute Types and Domains:
1. Attribute: An attribute is a column in a relation (table) that represents a specific property
or characteristic of the entity.
o Example: In an instructor table, attributes could include name, salary, and phone_number.
2. Domain: A domain is the set of permitted values for an attribute.
o It defines the type of data that can be stored in an attribute and ensures data integrity.
o Example:
The domain of the salary attribute might be all positive integers.
The domain of the name attribute might be all valid strings of a certain length.
3. Atomicity of Domains: A domain is atomic if its values are indivisible units. This means
that each value in the domain is treated as a single, inseparable entity.
o Example:
The domain of salary is atomic because each salary value is a single number.
The domain of phone_number would not be atomic if it allowed a set of phone
numbers (e.g., {123-456-7890, 987-654-3210}), because the set can be broken down
into individual phone numbers.
…Structure of Relational Databases
The null value:
The special value null is a member of every domain.
It indicates that the value is “unknown” or does not exist.
For example, suppose as before that we include the attribute phone_number in
the instructor relation.
It may be that an instructor does not have a phone number at all, or that the
telephone number is unlisted.
We would then have to use the null value to signify that the value is unknown or
does not exist.
Null values cause a number of difficulties when we access or update the
database, and thus should be avoided if at all possible.
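To make one of these difficulties concrete, the following sketch uses Python's built-in sqlite3 module with a hypothetical instructor table (the table layout and sample rows are assumptions, not taken from the figures). A comparison involving null is neither true nor false, so the tuple with a null phone_number is dropped by both the `=` and the `<>` filter, and only `IS NULL` finds it.

```python
import sqlite3

# In-memory database with a hypothetical instructor table that includes
# the optional phone_number attribute discussed above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, phone_number TEXT)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("10101", "Srinivasan", "555-0101"),  # sample rows, assumed for illustration
     ("12121", "Wu", None)],               # Python None is stored as SQL NULL
)

# NULL = NULL is not true: it evaluates to NULL, returned to Python as None.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The tuple with a NULL phone number satisfies neither the predicate nor its negation.
equal = conn.execute(
    "SELECT name FROM instructor WHERE phone_number = '555-0101'").fetchall()
not_equal = conn.execute(
    "SELECT name FROM instructor WHERE phone_number <> '555-0101'").fetchall()

# IS NULL is the way to test for a missing value.
missing = conn.execute(
    "SELECT name FROM instructor WHERE phone_number IS NULL").fetchall()

print(null_eq, equal, not_equal, missing)  # None [('Srinivasan',)] [] [('Wu',)]
```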
Database Schema
When we talk about a database, we must differentiate between:
Database schema: the logical design of the database, consisting of a list of
attributes and their corresponding domains (does not generally change).
Database instance: a snapshot of the data in the database at a given
instant in time (changes over time).
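The distinction can be sketched in Python under a simplifying assumption: a domain is modelled as a Python type plus an optional check, the schema is a fixed list of (attribute, domain) pairs, and each instance is a set of tuples. The helper name `conforms` and the sample tuples are hypothetical.

```python
# Schema: attribute names paired with domains. As a simplifying assumption,
# a domain is a Python type plus an optional extra check.
instructor_schema = [
    ("ID", str, None),
    ("name", str, None),
    ("dept_name", str, None),
    ("salary", int, lambda v: v > 0),  # assumed domain: positive integers
]

def conforms(schema, tup):
    """Return True if every value in tup lies in its attribute's domain."""
    if len(tup) != len(schema):
        return False
    for (_attr, typ, check), value in zip(schema, tup):
        if value is None:  # null is a member of every domain
            continue
        if not isinstance(value, typ) or (check is not None and not check(value)):
            return False
    return True

# An instance: the set of tuples at one moment (sample data, assumed).
instance_t1 = {("10101", "Srinivasan", "Comp. Sci.", 65000)}
# A later instance of the SAME schema, after one insertion.
instance_t2 = instance_t1 | {("12121", "Wu", "Finance", 90000)}

print(all(conforms(instructor_schema, t) for t in instance_t2))  # True
print(conforms(instructor_schema, ("x", "Bad", "Music", -5)))     # False
```

The schema object never changes between the two instances; only the set of tuples does.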
…Database Schema
A1, A2, …, An are attributes.
R (A1, A2, …, An ) is a relation schema.
For example, consider the department relation of Figure
2.5.
The schema for that relation is:
department (dept_name, building, budget)
Note that the attribute dept_name appears in both the instructor schema and the
department schema.
Using common attributes in relation schemas is one way of relating tuples of distinct
relations.
For example, suppose we wish to find the information about all the instructors who
work in the Watson building.
We look first at the department relation to find the dept_name of all the departments
housed in Watson, then, for each such department, we look in the instructor relation to
find the information about the instructor associated with the corresponding dept_name.
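The two-step lookup described above can be sketched in Python, with each relation held as a list of tuples. The sample tuples are assumptions for illustration and need not match Figures 2.1 and 2.5.

```python
# Sample tuples, assumed for illustration.
department = [  # (dept_name, building, budget)
    ("Physics", "Watson", 70000),
    ("Biology", "Watson", 90000),
    ("Music", "Packard", 80000),
]
instructor = [  # (ID, name, dept_name, salary)
    ("22222", "Einstein", "Physics", 95000),
    ("76766", "Crick", "Biology", 72000),
    ("15151", "Mozart", "Music", 40000),
]

# Step 1: the dept_name of every department housed in Watson.
watson_depts = {d_name for (d_name, building, _budget) in department
                if building == "Watson"}

# Step 2: the instructors whose dept_name is one of those departments.
watson_instructors = [t for t in instructor if t[2] in watson_depts]
print(watson_instructors)
# [('22222', 'Einstein', 'Physics', 95000), ('76766', 'Crick', 'Biology', 72000)]
```

The common attribute dept_name is the only link between the two relations; no pointer from an instructor tuple to a department tuple is stored.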
…Database Schema
Another example, each course in a university may be offered multiple times, across
different semesters, or even within a semester.
We need a relation to describe each individual offering, or section, of the class.
The schema is:
section (course_id, sec_id, semester, year, building, room_number, time_slot_id)
…Database Schema
There are many more relations maintained in a real university database.
student (ID, name, dept_name, tot_cred)
advisor (s_id, i_id)
takes (ID, course_id, sec_id, semester, year, grade)
classroom (building, room_number, capacity)
time_slot (time_slot_id, day, start_time, end_time)
…Database Schema
Degree (or Arity) of a Relation: The number of attributes (columns) in a
relation schema.
Example: A relation instructor(id, name, salary) has a degree of 3.
Cardinality: The total number of tuples (rows) in a relation at a given time.
Example: If the instructor relation has 10 rows, its cardinality is 10.
Relational Database Schema: A collection of relation schemas (tables) and a set
of integrity constraints (rules).
Example: A database schema for a university might include tables
like instructor, student, and course, along with constraints like primary keys and
foreign keys.
Relation State (or Relation Instance): The set of tuples (rows) in a relation at
a specific time.
Example: The current data in the instructor table (e.g., 10 rows) represents its
relation state.
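A minimal sketch of degree versus cardinality, assuming a relation is modelled by its schema (a tuple of attribute names) and its current state (a set of tuples). The sample tuples are hypothetical.

```python
# Schema: fixed list of attributes. State: the tuples present right now.
schema = ("ID", "name", "salary")
state = {
    ("10101", "Srinivasan", 65000),
    ("12121", "Wu", 90000),
    ("15151", "Mozart", 40000),
}

degree = len(schema)      # number of attributes: fixed by the schema
cardinality = len(state)  # number of tuples: depends on the current state
print(degree, cardinality)  # 3 3

# Inserting a tuple changes the relation state (and cardinality), not the degree.
state.add(("22222", "Einstein", 95000))
print(degree, len(state))   # 3 4
```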
Keys
Keys are one of the basic requirements of the relational database model.
They are used to identify the tuples (rows) of a table uniquely.
We also use keys to set up relationships among the tables of a relational
database.
Tuples in the same table must be distinguishable: no two tuples in a
relation are allowed to have exactly the same values for all attributes.
Let K ⊆ R (K is a set of attributes of the relation schema R).
Different Types of Database Keys:
Candidate Key
Primary Key
Super Key
Alternate Key
Foreign Key
Composite Key
Keys:
Superkey
K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible
relation r(R).
A superkey uniquely identifies tuples in a relation.
It may include extra attributes beyond what is necessary.
Superkey attributes can have NULL values, but this can affect uniqueness.
Examples:
1.Relation: instructor(ID, name, salary)
•Superkeys:
•{ID}: Unique for each instructor.
•{ID, name}: Also unique, but name is redundant since ID alone is sufficient.
•Not a Superkey:
•{name}: Multiple instructors may share the same name.
2.Relation: student(ID, email, name)
•Superkeys:
•{ID}: Unique for each student.
•{email}: Unique for each student.
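A superkey is a property of the schema (it must hold for every possible relation r(R)), so one instance can only refute it. The sketch below, with a hypothetical helper `refutes_superkey` and made-up rows, checks whether two distinct tuples of a given instance agree on every attribute of K.

```python
def refutes_superkey(attrs, rows, K):
    """Return True if two distinct rows of this instance agree on all of K.

    One instance can only show that K is NOT a superkey; K is a superkey
    only if no legal instance of the schema contains such a pair."""
    positions = [attrs.index(a) for a in K]
    seen = set()
    for row in set(rows):  # a relation is a set, so identical rows count once
        key = tuple(row[i] for i in positions)
        if key in seen:
            return True
        seen.add(key)
    return False

attrs = ("ID", "name", "salary")
instructor = {("10101", "Kim", 80000), ("98345", "Kim", 65000)}  # hypothetical rows

print(refutes_superkey(attrs, instructor, ["ID"]))          # False
print(refutes_superkey(attrs, instructor, ["ID", "name"]))  # False
print(refutes_superkey(attrs, instructor, ["name"]))        # True
```

As in the instructor example above, {name} fails as soon as two instructors share a name, while {ID} and {ID, name} remain unique.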
Keys:
Foreign key
A relation, say r1, may include among its attributes the primary key of another relation,
say r2.
This attribute is called a foreign key from r1, referencing r2.
In other words, the attribute is the primary key of one table and appears as a foreign
key in another table.
The relation r1 is also called the referencing relation of the foreign key dependency,
and r2 is called the referenced relation of the foreign key.
For example, the attribute dept_name in instructor is a foreign key from instructor,
referencing department, since dept_name is the primary key of department.
The constraint from r1 to r2 is an example of a referential integrity constraint.
A referential integrity constraint requires that the values appearing in specified
attributes of any tuple in the referencing relation also appear in specified
attributes of at least one tuple in the referenced relation.
Example: If dept_name in instructor is a foreign key referencing department,
then every dept_name in instructor must exist in the department table.
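A referential integrity check can be sketched in Python by collecting the primary-key values of the referenced relation and reporting referencing tuples whose foreign-key value is missing. The helper `fk_violations` and all tuples (including the deliberately violating one) are hypothetical.

```python
department = {  # referenced relation: (dept_name, building, budget), assumed
    ("Physics", "Watson", 70000),
    ("Music", "Packard", 80000),
}
instructor = {  # referencing relation: (ID, name, dept_name, salary), assumed
    ("22222", "Einstein", "Physics", 95000),
    ("15151", "Mozart", "Music", 40000),
    ("99999", "Doe", "Astronomy", 50000),  # hypothetical violating tuple
}

def fk_violations(referencing, fk_pos, referenced, pk_pos):
    """Tuples of the referencing relation whose foreign-key value does not
    appear as a primary-key value in the referenced relation."""
    keys = {t[pk_pos] for t in referenced}
    return {t for t in referencing if t[fk_pos] not in keys}

print(fk_violations(instructor, 2, department, 0))
# {('99999', 'Doe', 'Astronomy', 50000)}
```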
Schema Diagrams
A database schema, along with primary key and foreign key dependencies, can be
depicted by schema diagrams.
Figure 2.8 shows the schema diagram for the university organization.
Each relation appears as a box, with the relation name at the top in blue, and the
attributes listed inside the box.
Primary key attributes are shown underlined.
Foreign key dependencies appear as arrows from the foreign key attributes of the
referencing relation to the primary key of the referenced relation.
Referential integrity constraints other than foreign key constraints are not shown
explicitly in schema diagrams.
Many database systems provide design tools with a graphical user interface for
creating schema diagrams.
Expression of relation schema
Figure 2.9 gives the relational schema that we use in our examples, with
primary key attributes underlined.
This corresponds to the approach to defining relations in the SQL data-definition
language.
Relational Query Languages
A query language is a language in which a user requests information from the
database.
These languages are usually on a level higher than that of a standard programming
language.
Query languages can be categorized as either procedural or nonprocedural.
In a procedural language, the user instructs the system to perform a
sequence of operations on the database to compute the desired result.
In a nonprocedural language, the user describes the desired information
without giving a specific procedure for obtaining that information.
Query languages used in practice include elements of both the procedural and the
nonprocedural approaches.
There are a number of “pure” query languages:
The relational algebra is procedural (it provides the fundamental techniques for extracting
data from the database), whereas the tuple relational calculus and domain relational
calculus are nonprocedural.
The relational algebra consists of a set of operations that take one or two relations as
input and produce a new relation as their result.
The relational calculus uses predicate logic to define the result desired without giving
any specific algebraic procedure for obtaining that result.
Relational Operations
All procedural relational query languages provide a set of operations that can be
applied to either a single relation or a pair of relations.
These operations have the nice and desired property that their result is always a
single relation.
This property allows one to combine several of these operations in a modular way.
Specifically, since the result of a relational query is itself a relation, relational
operations can be applied to the results of queries as well as to the given set of
relations.
Relational Operations:
Relational Algebra
Relational Algebra is a procedural language.
Just as algebraic operations on numbers take one or more numbers as input and
return a number as output, the relational algebra operations typically take one or
two relations as input and return a relation as output.
Relational Operations:
Relational Algebra: Selection
The most frequent operation is the selection of specific tuples from a single relation
(say instructor) that satisfy some particular predicate.
Example: Selecting the tuples of the instructor relation of Figure 2.1 that satisfy the
predicate “salary is greater than $85000” gives the result shown in Figure 2.10.
Query: σ salary > 85000 (instructor)
Result:
Exercises:
• σ dept_name = “Physics” (instructor)
…Relational Operations:
Relational Algebra: Selection
We allow comparison using =, ≠, >, ≥, <, ≤ in the selection predicates.
We can combine several predicates into a larger predicate by using the connectives:
∧ (and), ∨ (or), ¬ (not).
Example:
Find the instructors in physics with a salary greater than 85,000:
Query: σ dept_name = “Physics” ∧ salary > 85000 (instructor)
Result:?
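The selections on these slides can be sketched in Python by treating σ as a function that keeps the tuples satisfying a predicate. The predicate is a Python lambda over a dict of attribute values, so ∧, ∨ and ¬ become `and`, `or` and `not`. The helper `select` and the sample tuples are assumptions and need not match Figure 2.1.

```python
# Sample instructor tuples, assumed for illustration.
INSTRUCTOR_ATTRS = ("ID", "name", "dept_name", "salary")
instructor = {
    ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000),
    ("22222", "Einstein", "Physics", 95000),
    ("33456", "Gold", "Physics", 87000),
    ("45565", "Katz", "Comp. Sci.", 75000),
}

def select(predicate, relation, attrs):
    """σ_predicate(relation); predicate receives a dict attribute -> value."""
    return {t for t in relation if predicate(dict(zip(attrs, t)))}

# σ salary > 85000 (instructor)
high_paid = select(lambda r: r["salary"] > 85000, instructor, INSTRUCTOR_ATTRS)

# σ dept_name = "Physics" ∧ salary > 85000 (instructor)
physics_high = select(
    lambda r: r["dept_name"] == "Physics" and r["salary"] > 85000,
    instructor, INSTRUCTOR_ATTRS)

print(sorted(high_paid))
print(sorted(physics_high))
```

The result of each call is again a set of tuples with the same attributes, so it can be fed into further operations.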
Relational Operations:
Relational Algebra: Projection Operation
Another frequent operation is to select certain attributes (columns) from a relation.
The result is a new relation having only those selected attributes.
Example:
Relation r:
Π A,C (r)
…Relational Operations:
Relational Algebra: Projection Operation
For example, suppose we want a list of instructor IDs and salaries without listing the
name and dept_name values from the instructor relation of Figure 2.1, then the result,
shown in Figure 2.11, has the two attributes ID and salary.
Each tuple in the result is derived from a tuple of the instructor relation but with only
selected attributes shown.
Query: Π ID, salary (instructor)
Result:
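Projection can be sketched in Python as keeping only the chosen positions of each tuple. Because the result is built as a set, tuples that become identical after projection collapse into one, as they do in relational algebra (SQL, by contrast, keeps duplicates unless DISTINCT is used). The helper `project` and the sample tuples are assumptions.

```python
ATTRS = ("ID", "name", "dept_name", "salary")
instructor = {  # sample tuples, assumed for illustration
    ("12121", "Wu", "Finance", 90000),
    ("22222", "Einstein", "Physics", 95000),
    ("33456", "Gold", "Physics", 87000),
}

def project(wanted, relation, attrs):
    """Π_wanted(relation): keep only the listed attributes, in that order."""
    positions = [attrs.index(a) for a in wanted]
    # Building a set removes tuples that become identical after projection.
    return {tuple(t[i] for i in positions) for t in relation}

ids_salaries = project(("ID", "salary"), instructor, ATTRS)  # Π ID, salary (instructor)
departments = project(("dept_name",), instructor, ATTRS)     # Π dept_name (instructor)
print(sorted(ids_salaries))  # [('12121', 90000), ('22222', 95000), ('33456', 87000)]
print(sorted(departments))   # [('Finance',), ('Physics',)]
```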
…Relational Operations:
Relational Algebra: Cartesian product
Suppose we want to find the information about all instructors together with the
course_id of all courses they have taught.
We need the information in both the instructor relation and the teaches relation to
compute the required result.
The Cartesian product of instructor and teaches does bring together information
from both these relations, but unfortunately the Cartesian product associates every
instructor with every course that was taught, regardless of whether that instructor
taught that course.
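If the two relations have attributes with the same name (here, ID appears in both instructor
and teaches), the result would contain duplicate attribute names; these are distinguished by
prefixing the relation name (instructor.ID, teaches.ID) or by renaming attributes.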
The Cartesian product is often used in conjunction with other operations like
selection (σ) or projection (π) to perform more complex queries.
The Cartesian product can produce a very large result, especially if the input
relations are large, so it is often used with conditions to filter the results.
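A minimal sketch of the Cartesian product in Python, using `itertools.product` to pair every tuple of one relation with every tuple of the other. Attribute names are prefixed with the relation name so the two ID columns stay distinct. The helper `cartesian` and the sample tuples are hypothetical.

```python
from itertools import product

instructor_attrs = ("ID", "name")
instructor = {("10101", "Srinivasan"), ("12121", "Wu")}  # sample tuples, assumed
teaches_attrs = ("ID", "course_id")
teaches = {("10101", "CS-101"), ("12121", "FIN-201"), ("10101", "CS-315")}

def cartesian(r, r_name, r_attrs, s, s_name, s_attrs):
    """r × s: concatenate every tuple of r with every tuple of s."""
    # Prefix attribute names with relation names so shared names stay distinct.
    attrs = (tuple(f"{r_name}.{a}" for a in r_attrs)
             + tuple(f"{s_name}.{a}" for a in s_attrs))
    return attrs, {t + u for t, u in product(r, s)}

attrs, rel = cartesian(instructor, "instructor", instructor_attrs,
                       teaches, "teaches", teaches_attrs)
print(attrs)     # ('instructor.ID', 'instructor.name', 'teaches.ID', 'teaches.course_id')
print(len(rel))  # 6
```

The size of the result is the product of the input sizes (2 × 3 here), which is why the product is usually combined with a selection.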
…Relational Operations:
Relational Algebra: Cartesian product
Since the Cartesian-product operation associates every tuple of instructor with every
tuple of teaches, we know that if an instructor has taught a course (as recorded in the
teaches relation), then there is some tuple in instructor × teaches that contains her name
and satisfies instructor.ID = teaches.ID.
So, if we write:
σinstructor.ID=teaches.ID(instructor × teaches)
We get only those tuples of instructor × teaches that pertain to instructors and the
courses that they taught.
The result of this expression is shown in Figure 2.13.
Note that this expression results in the duplication of the instructor’s ID.
This can be easily handled by adding a projection to eliminate the column
teaches.ID.
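The selection-then-projection pattern described above can be sketched with plain Python set comprehensions. The sample tuples are assumptions; columns are addressed by position (0 = instructor.ID, 1 = name, 2 = teaches.ID, 3 = course_id).

```python
instructor = {("10101", "Srinivasan"), ("12121", "Wu")}  # (ID, name), assumed
teaches = {("10101", "CS-101"), ("12121", "FIN-201"), ("10101", "CS-315")}  # (ID, course_id)

# instructor × teaches: every pair of tuples, concatenated.
product_rel = {i + t for i in instructor for t in teaches}

# σ instructor.ID = teaches.ID: keep only pairs that really belong together.
matched = {row for row in product_rel if row[0] == row[2]}

# Π instructor.ID, name, course_id: drop the duplicated teaches.ID column.
result = {(row[0], row[1], row[3]) for row in matched}
print(sorted(result))
# [('10101', 'Srinivasan', 'CS-101'), ('10101', 'Srinivasan', 'CS-315'), ('12121', 'Wu', 'FIN-201')]
```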
Relational Operations:
Relational Algebra: Join
The join operation allows the combining of two relations by merging pairs of tuples, one
from each relation, into a single tuple.
The join operation allows us to combine a selection and a Cartesian product into a single
operation.
Consider relations r(R) and s(S), and let θ be a predicate on attributes in the schema R ∪ S.
The join r ⋈θ s is defined as σθ(r × s).
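Following the definition r ⋈θ s = σθ(r × s), a theta join can be sketched in Python as a filtered Cartesian product. The helper `theta_join`, the predicate θ chosen here and the sample tuples are all hypothetical; the assertion checks that the join equals a selection over the full product.

```python
from itertools import product

instructor = {("22222", "Einstein", "Physics", 95000),
              ("15151", "Mozart", "Music", 40000)}   # (ID, name, dept_name, salary), assumed
department = {("Physics", "Watson", 70000),
              ("Music", "Packard", 80000)}           # (dept_name, building, budget), assumed

def theta_join(r, s, theta):
    """r ⋈θ s: concatenate each pair (t, u) of r × s for which theta(t, u) holds."""
    return {t + u for t, u in product(r, s) if theta(t, u)}

# θ: instructor.dept_name = department.dept_name ∧ instructor.salary > department.budget
theta = lambda t, u: t[2] == u[0] and t[3] > u[2]
joined = theta_join(instructor, department, theta)

# Same result as selecting with θ over the full Cartesian product.
via_product = {row for row in (t + u for t, u in product(instructor, department))
               if theta(row[:4], row[4:])}
assert joined == via_product
print(joined)  # {('22222', 'Einstein', 'Physics', 95000, 'Physics', 'Watson', 70000)}
```

Real systems avoid materializing r × s; the sketch only mirrors the definition.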
…Relational Operations:
Relational Algebra: Join
Figure 2.12 shows an example of joining the tuples from the instructor and
department tables with the new tuples showing the information about each instructor
and the department in which she is working.
This result was formed by combining each tuple in the instructor relation with the
tuple in the department relation for the instructor’s department.
In the form of join shown in Figure 2.12, which is called a natural join, a tuple from
the instructor relation matches a tuple in the department relation if the values of their
dept_name attributes are the same.
All such matching pairs of tuples are present in the join result.
In general, the natural join operation on two relations matches tuples whose values
are the same on all attribute names that are common to both relations.
There are a number of different ways to join relations (as we shall see later).
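The natural join can be sketched in Python: find the attribute names common to both schemas, pair tuples that agree on all of them, and keep each common attribute only once. The helper `natural_join` and the sample tuples are assumptions.

```python
instructor_attrs = ("ID", "name", "dept_name")
instructor = {("22222", "Einstein", "Physics"), ("15151", "Mozart", "Music")}  # assumed
department_attrs = ("dept_name", "building")
department = {("Physics", "Watson"), ("Music", "Packard"), ("History", "Painter")}

def natural_join(r_attrs, r, s_attrs, s):
    """r ⋈ s: match tuples that agree on every attribute name shared by r and s."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    out_attrs = tuple(r_attrs) + tuple(s_only)
    result = set()
    for t in r:
        t_row = dict(zip(r_attrs, t))
        for u in s:
            u_row = dict(zip(s_attrs, u))
            # Match only when every shared attribute has equal values.
            if all(t_row[a] == u_row[a] for a in common):
                # Keep each common attribute once, then append s's other attributes.
                result.add(t + tuple(u_row[a] for a in s_only))
    return out_attrs, result

attrs, joined = natural_join(instructor_attrs, instructor, department_attrs, department)
print(attrs)           # ('ID', 'name', 'dept_name', 'building')
print(sorted(joined))  # [('15151', 'Mozart', 'Music', 'Packard'), ('22222', 'Einstein', 'Physics', 'Watson')]
```

The History department has no matching instructor, so it does not appear. If the schemas share no attributes, every pair matches and the natural join reduces to the Cartesian product.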
Relational Operations:
Relational Algebra: Union
Because relations are sets, we can perform normal set operations on relations.
The union operation allows us to combine two relations.
The union operation performs a set union of two “similarly structured” tables (say a table
of all graduate students and a table of all undergraduate students).
Notation: r ∪ s
For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes).
2. The attribute domains must be compatible (example: 2nd column of r deals
with the same type of values as does the 2nd column of s).
…Relational Operations:
Relational Algebra: Union
Consider a query to find the set of all courses taught in the Fall 2017 semester, the
Spring 2018 semester, or both.
The information is contained in the section relation.
To find the set of all courses taught in the Fall 2017 semester, we write:
Π course_id (σ semester = “Fall” ∧ year = 2017 (section))
To find the set of all courses taught in the Spring 2018 semester, we write:
Π course_id (σ semester = “Spring” ∧ year = 2018 (section))
To answer the query, we need the union of these two sets; that is, we need all
course ids that appear in either or both of the two relations.
We find these data by the binary operation union, denoted, as in set theory, by ∪.
So the expression needed is:
Π course_id (σ semester = “Fall” ∧ year = 2017 (section)) ∪
Π course_id (σ semester = “Spring” ∧ year = 2018 (section))
The result relation for this query appears in Figure 2.14.
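The union query can be sketched in Python with set operations. The helper `courses_in` plays the role of Π course_id (σ … (section)), and the sample section tuples are assumptions, so the result is not the one in Figure 2.14.

```python
# Sample section tuples (course_id, sec_id, semester, year), assumed for illustration.
section = {
    ("CS-101", "1", "Fall", 2017),
    ("CS-347", "1", "Fall", 2017),
    ("CS-101", "1", "Spring", 2018),
    ("CS-315", "1", "Spring", 2018),
    ("BIO-101", "1", "Summer", 2017),
}

def courses_in(semester, year):
    """Π course_id (σ semester = semester ∧ year = year (section))"""
    return {(course_id,) for (course_id, _sec, sem, yr) in section
            if sem == semester and yr == year}

fall_2017 = courses_in("Fall", 2017)
spring_2018 = courses_in("Spring", 2018)

# Both operands have arity 1 and the same domain (course_id), so ∪ is valid.
either = fall_2017 | spring_2018
print(sorted(either))  # [('CS-101',), ('CS-315',), ('CS-347',)]
```

CS-101 is offered in both semesters but appears only once in the result, because relations are sets.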
Relational Operations:
Relational Algebra: Set Intersection Operation
The set-intersection operation allows us to find tuples that are in both the input
relations.
Notation: r ∩ s
Assume:
1. r, s have the same arity.
2. attributes of r and s are compatible.
Example: Find the set of all courses taught in both the Fall 2017 and the Spring 2018
semesters.
Π course_id (σ semester = “Fall” ∧ year = 2017 (section)) ∩
Π course_id (σ semester = “Spring” ∧ year = 2018 (section))
Result
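The same query can be sketched with Python's set intersection operator `&`. The helper `courses_in` and the sample tuples are hypothetical.

```python
section = {  # (course_id, sec_id, semester, year), sample tuples assumed
    ("CS-101", "1", "Fall", 2017),
    ("CS-347", "1", "Fall", 2017),
    ("CS-101", "1", "Spring", 2018),
    ("CS-315", "1", "Spring", 2018),
}

def courses_in(semester, year):
    """Π course_id (σ semester = semester ∧ year = year (section))"""
    return {(c,) for (c, _sec, sem, yr) in section if sem == semester and yr == year}

# r ∩ s: courses offered in both semesters.
both = courses_in("Fall", 2017) & courses_in("Spring", 2018)
print(both)  # {('CS-101',)}
```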
Relational Operations:
Relational Algebra: Set Difference Operation
The set-difference operation allows us to find tuples that are in one relation but are
not in another.
Notation: r − s
Set differences must be taken between compatible relations.
1. r and s must have the same arity.
2. attribute domains of r and s must be compatible.
Example: to find all courses taught in the Fall 2017 semester, but not in the Spring
2018 semester
Π course_id (σ semester = “Fall” ∧ year = 2017 (section)) −
Π course_id (σ semester = “Spring” ∧ year = 2018 (section))
Result:
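Set difference maps onto Python's `-` operator on sets. The sketch below (hypothetical helper and sample tuples) also computes the difference in the other direction, which shows that r − s and s − r generally differ.

```python
section = {  # (course_id, sec_id, semester, year), sample tuples assumed
    ("CS-101", "1", "Fall", 2017),
    ("CS-347", "1", "Fall", 2017),
    ("CS-101", "1", "Spring", 2018),
    ("CS-315", "1", "Spring", 2018),
}

def courses_in(semester, year):
    """Π course_id (σ semester = semester ∧ year = year (section))"""
    return {(c,) for (c, _sec, sem, yr) in section if sem == semester and yr == year}

fall = courses_in("Fall", 2017)
spring = courses_in("Spring", 2018)

print(fall - spring)  # {('CS-347',)}  taught in Fall 2017 but not Spring 2018
print(spring - fall)  # {('CS-315',)}  the reverse difference is a different relation
```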
Relational Operations:
Equivalent Queries
There is more than one way to write a query in relational algebra.
Query 1:
σ dept_name = “Physics” ∧ salary > 90000 (instructor)
Query 2:
σ dept_name = “Physics” (σ salary > 90000 (instructor))
The two queries are not identical; they are, however, equivalent -- they give the same
result on any database.
Optimization:
Depending on the database system, one query might be more efficient than the other.
For example, if the salary > 90000 condition filters out many tuples, applying it first
(as in Query 2) might reduce the number of tuples to process in the second step.
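The equivalence, and the smaller intermediate result of Query 2, can be checked on sample data in Python. The tuples are assumptions, and agreement on one instance only illustrates equivalence; it does not prove it.

```python
instructor = {  # (ID, name, dept_name, salary), sample tuples assumed
    ("22222", "Einstein", "Physics", 95000),
    ("33456", "Gold", "Physics", 87000),
    ("12121", "Wu", "Finance", 90000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
}

def select(pred, rel):
    """σ_pred(rel) over a set of tuples."""
    return {t for t in rel if pred(t)}

# Query 1: one selection with a conjunctive predicate.
q1 = select(lambda t: t[2] == "Physics" and t[3] > 90000, instructor)

# Query 2: nested selections; the inner one filters on salary first.
inner = select(lambda t: t[3] > 90000, instructor)
q2 = select(lambda t: t[2] == "Physics", inner)

assert q1 == q2
print(len(instructor), len(inner))  # 4 2  (the outer selection scans only 2 tuples)
```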
…Relational Operations:
Equivalent Queries
Another Example:
Suppose you want to find the names of instructors in the Physics department
with a salary greater than 90,000.
You could write this query in multiple ways:
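Three formulations that produce the same result are (a) Π name (σ dept_name = “Physics” ∧ salary > 90000 (instructor)), (b) Π name (σ dept_name = “Physics” (σ salary > 90000 (instructor))) and (c) Π name (σ dept_name = “Physics” ∧ salary > 90000 (Π name, dept_name, salary (instructor))). The Python sketch below evaluates all three on sample data; the helpers and tuples are assumptions for illustration.

```python
ATTRS = ("ID", "name", "dept_name", "salary")
instructor = {  # sample tuples, assumed for illustration
    ("22222", "Einstein", "Physics", 95000),
    ("33456", "Gold", "Physics", 87000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
}

def select(pred, rel, attrs):
    """σ_pred(rel); pred receives a dict attribute -> value."""
    return {t for t in rel if pred(dict(zip(attrs, t)))}

def project(wanted, rel, attrs):
    """Π_wanted(rel), with duplicate elimination."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(t[i] for i in idx) for t in rel}

physics_90k = lambda r: r["dept_name"] == "Physics" and r["salary"] > 90000

# (a) one combined selection, then projection
way_a = project(("name",), select(physics_90k, instructor, ATTRS), ATTRS)

# (b) nested selections, then projection
inner = select(lambda r: r["salary"] > 90000, instructor, ATTRS)
way_b = project(("name",), select(lambda r: r["dept_name"] == "Physics", inner, ATTRS), ATTRS)

# (c) project away ID first, keeping the attributes the predicate needs
narrow = ("name", "dept_name", "salary")
way_c = project(("name",), select(physics_90k, project(narrow, instructor, ATTRS), narrow), narrow)

assert way_a == way_b == way_c
print(way_a)  # {('Einstein',)}
```

In (c) the early projection must keep dept_name and salary, since the later selection still refers to them.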
THANK YOU!