Lecture 01 Data
Database
A database is a large collection of interrelated data.
A database is a mechanized, shared, formally defined and centrally controlled collection of data
used in an organization.
A database is an organized collection of information.
Database System
A database system is an integrated collection of related data, along with details of the
interpretation of the data contained therein.
Main Task: Database systems are designed to manage large bodies of information.
The management of data involves
(a) The definition of structures for the storage of information
(b) The provision of mechanisms for the manipulation of information
Additional Task 1: The system must provide for the safety of the information stored, despite
system crashes or attempts at unauthorized access.
Additional Task 2: If data are to be shared among several users, the system must avoid possible
anomalous results.
View of Data
A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. This concern has led to the design of
complex data structures for the representation of data in the database.
Since many database-systems users are not computer trained, developers hide the complexity from
users through several levels of abstraction, to simplify users’ interactions with the system.
Three-Level Architecture
The generalized architecture of a database system is called the ANSI/SPARC model.
view level: view 1, view 2, …, view n
(defined by user or application programmer in consultation with the DBA)
logical level
(defined by DBA)
physical level
(defined by DBA for optimization)
At the physical level, a student or course record can be described as a block of consecutive storage
locations (words or bytes).
At the logical level, each such record is described by a type definition (using structure in C/C++,
record in Pascal) and the interrelationship among these record types is defined.
Finally, at the view level, computer users see a set of application programs that hide details of the
data types. Similarly, at the view level, several views of the database are defined, and database users
see these views.
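The three levels can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name student, the view name student_directory, and the sample row are assumptions, and the physical level is left entirely to SQLite.

```python
import sqlite3

# Physical level: how bytes are laid out is decided by SQLite, not by us.
conn = sqlite3.connect(":memory:")

# Logical level: a record type and its fields.
conn.execute(
    "CREATE TABLE student ("
    "student_no TEXT PRIMARY KEY, student_name TEXT, year INTEGER, term INTEGER)"
)
conn.execute("INSERT INTO student VALUES ('010201', 'A. Rahman', 1, 2)")

# View level: one view that exposes only part of the logical record.
conn.execute(
    "CREATE VIEW student_directory AS SELECT student_no, student_name FROM student"
)

directory_rows = conn.execute("SELECT * FROM student_directory").fetchall()
print(directory_rows)
```

A user of student_directory never sees the year and term fields, which is the kind of hiding the view level is meant to provide.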
Schema
The overall design of the database is called the database schema.
Database systems have several schemas, partitioned according to the levels of abstraction. At the
lowest level is the physical schema; at the intermediate level is the logical schema; at the highest level is
the subschema.
In general, database systems support one physical schema, one logical schema, and several
subschemas.
Data Independence
The ability to modify a schema definition in one level without affecting a schema definition in the
next higher level is called data independence.
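One possible illustration of physical data independence, assuming SQLite through Python's sqlite3 module: an index (a physical-level structure) is added, and the same query returns the same answer. The table contents and the index name idx_course_no are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_no TEXT, course_title TEXT, credit_hours REAL)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [("CSE-101", "Databases", 3.0), ("CSE-102", "Algorithms", 3.0)],
)

query = "SELECT course_title FROM course WHERE course_no = 'CSE-101'"
before = conn.execute(query).fetchall()

# Physical-level change: add an access structure. The logical schema is untouched.
conn.execute("CREATE INDEX idx_course_no ON course (course_no)")
after = conn.execute(query).fetchall()

print(before == after)
```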
Lecture 2
Data Models
Data model is a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints.
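The various data models fall into three different groups: (1) object-based logical models,
(2) record-based logical models, and (3) physical data models. The relational, network, and hierarchical
models described next are record-based logical models.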
Relational Model
The relational model uses a collection of tables to represent both data and the relationships among
those data.
Each table has multiple columns, and each column has a unique name.
Network Model
Data in the network model are represented by collections of records.
The relationships among data are represented by links, which can be viewed as pointers.
The records in the database are organized as collections of arbitrary graphs.
Hierarchical Model
The hierarchical model is similar to the network model in the sense that data and relationships
among data are represented by records and links, respectively.
It differs from the network model in that the records are organized as collections of trees rather
than arbitrary graphs.
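The difference between the two models can be sketched in Python by treating records as objects and links as references. The Record class, the is_tree helper, and the record names are assumptions made for this sketch, not part of either model's definition.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    name: str
    links: list = field(default_factory=list)  # links behave like pointers

def is_tree(root, records):
    """True if every record is reached exactly once when following links from root."""
    seen = set()
    stack = [root]
    while stack:
        rec = stack.pop()
        if id(rec) in seen:
            return False  # reached twice: a shared child or a cycle
        seen.add(id(rec))
        stack.extend(rec.links)
    return len(seen) == len(records)

department = Record("department")
teacher = Record("teacher")
course = Record("course")
department.links = [teacher, course]
all_records = [department, teacher, course]

hierarchical_ok = is_tree(department, all_records)   # tree: fits the hierarchical model

teacher.links.append(course)                         # course now has two incoming links
network_only = not is_tree(department, all_records)  # arbitrary graph: network model only
```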
Data Models
(3) Physical Data Models
These models are used to describe data at the lowest level.
They capture aspects of database-system implementation.
There are few physical data models in use.
Examples: The unifying model, The frame-memory model.
DBMS Facilities
A database system provides two different types of facilities:
The data definition facility or data definition language, and
The data manipulation facility or data manipulation language.
Data-Definition Language
Database management systems provide a facility known as data-definition language (DDL),
which can be used to define the conceptual schema and also give some details about how to implement this
schema in the physical devices used to store the data.
This definition includes all the entity sets and their associated attributes as well as the relationships
among the entity sets.
This definition also includes constraints on the values assigned to different attributes in the same
or different records.
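As a sketch of what DDL statements look like, the following uses SQL data-definition statements through Python's sqlite3 module. The entity sets student and course, the relationship set enrolls, and the particular constraints are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE student (
    student_no   TEXT PRIMARY KEY,
    student_name TEXT NOT NULL,
    year         INTEGER CHECK (year BETWEEN 1 AND 4)
);
CREATE TABLE course (
    course_no    TEXT PRIMARY KEY,
    course_title TEXT NOT NULL,
    credit_hours REAL CHECK (credit_hours > 0)
);
-- Relationship set between the two entity sets.
CREATE TABLE enrolls (
    student_no TEXT REFERENCES student(student_no),
    course_no  TEXT REFERENCES course(course_no),
    PRIMARY KEY (student_no, course_no)
);
""")

def violates_constraint(sql):
    """Run one statement; report whether a declared constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# year = 9 falls outside the declared range, so the insert is rejected.
rejected = violates_constraint("INSERT INTO student VALUES ('010299', 'Test', 9)")
```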
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model. There are basically two types.
Procedural DMLs: Procedural DMLs require a user to specify what data are needed and how to get those
data.
Nonprocedural DMLs: Nonprocedural DMLs require a user to specify what data are needed without
specifying how to get those data.
Nonprocedural DMLs are usually easier to learn and use than are procedural DMLs.
Since a user does not have to specify how to get the data, nonprocedural DMLs may generate
code that is not as efficient as that produced by procedural languages.
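The contrast can be sketched in Python. The loop stands in for a procedural DML (it says how to find the data), and the SQL query run through sqlite3 stands in for a nonprocedural DML (it says only what is wanted). The sample students are hypothetical.

```python
import sqlite3

students = [("010201", "Karim", 1), ("010202", "Lina", 2), ("010203", "Omar", 1)]

# Procedural style: state what is needed AND how to get it (scan, test, collect).
first_year_procedural = []
for student_no, name, year in students:
    if year == 1:
        first_year_procedural.append(name)

# Nonprocedural style: state only what is needed; the system chooses the access path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_no TEXT, student_name TEXT, year INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", students)
first_year_declarative = [
    row[0]
    for row in conn.execute(
        "SELECT student_name FROM student WHERE year = 1 ORDER BY student_no"
    )
]
```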
Database Administrator
Centralized control of the database is exerted by a person or a group of persons under the
supervision of a high-level administration. This person or group is referred to as the database administrator
(DBA).
The Functions of the DBA
(a) Schema definition
(b) Storage structure and access-method definition
(c) Schema and physical-organization modification
(d) Granting of authorization for data access
(e) Integrity-constraint specification
Database Users
There are four different types of database-system users, depending on their degree of expertise or
the mode of their interactions with the DBMS.
(a) Naïve Users
(b) Sophisticated Users
(c) Application Programmers
(d) Database Administrator
Naïve Users
Users who need not be aware of the presence of the database system or any other system
supporting their usage are considered naïve users.
For example: user of an automatic teller machine.
Other such naïve users are end users of the database who work through menu-oriented
application programs in which the type and range of response are always indicated to the user.
Sophisticated Users
These are users who may communicate with the database directly via an online terminal or
indirectly via a user interface and application program.
They are aware of the presence of the database system and may have acquired a certain amount of
expertise.
For example: Analysts who submit queries to explore data in the database
Application Programmers
Professional programmers who are responsible for developing application programs or user
interfaces utilized by the naïve users and sophisticated users fall into this category.
Specialized users who write specialized database applications that do not fit into the traditional data-
processing framework like computer-aided design systems, knowledge base and expert systems, systems
that store data with complex data types (graphics data and audio data), and environment-modeling systems
are also included in this category.
Lecture 3
Structure of a DBMS
Data Definition Language Interpreter
The DDL Interpreter converts the data definition statements into a set of tables. These tables
contain the metadata concerning the database and are in a form that can be used by other components of the
DBMS.
Data Manager
Data manager is the central software component of the DBMS. It is sometimes referred to as the
database control system.
The data manager is responsible for interfacing with the file system.
The task of enforcing constraints to maintain the consistency and integrity of the data, as well as
its security, is also performed by the data manager.
Synchronizing the simultaneous operations performed by concurrent users is under the control of
the data manager.
It is also entrusted with backup and recovery operations.
File Manager
Responsibility for the structure of the files and managing the file space rests with the file manager.
Disk Manager
The disk manager is part of the operating system of the host computer and all physical input and
output operations are performed by it.
Data Files
Data files contain the data portion of the database.
Telecommunication System
Online users, whether remote or local, communicate with the database directly or indirectly via a
user interface over communication lines.
The telecommunication system is not part of the DBMS but the DBMS works closely with the
system.
For example: CICS, IDMS-DC, TALKMASTER, INTERCOMM.
Data Dictionary
Information pertaining to the structure and usage of data contained in the database, the metadata,
is maintained in a data dictionary.
The term system catalog also describes the metadata.
The data dictionary, which is a database itself, documents the data.
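SQLite offers a small illustration of a catalog that can be queried like any other data: the sqlite_master table and the table_info pragma describe the schema. The course table here is a hypothetical example, and other systems expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (course_no TEXT PRIMARY KEY, course_title TEXT, credit_hours REAL)"
)

# Metadata: which tables exist, read from SQLite's own catalog table.
table_names = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Metadata: the columns of one table (the second field of each row is the column name).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(course)")]
```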
Statistical Data
Statistical data store statistical information about the data in the database.
This information is used by the query processor to select efficient ways to execute a query.
Access Aids
To improve the performance of a DBMS, a set of access aids in the form of indexes is usually
provided in a database system.
Commands are provided to build and destroy additional temporary indexes.
Database Access
A user’s request for data is received by the data manager.
The data manager determines the physical record required.
The data manager sends the request for a specific physical record to the file manager.
The file manager decides which physical block of secondary storage device contains the required
record.
The file manager sends the request for the appropriate block to the disk manager.
A block is the unit of physical input/output operations between primary and secondary storage.
The disk manager retrieves the block and sends it to the file manager.
The file manager sends the required record to the data manager.
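The chain of requests above can be sketched as three cooperating Python classes. Everything here is a simplification under stated assumptions: the block size of two records, the in-memory "disk", and the lookup table from student number to physical record number are all invented for the sketch.

```python
BLOCK_SIZE = 2  # records per block (assumed; tiny for illustration)

class DiskManager:
    """Performs the physical I/O: returns whole blocks from 'secondary storage'."""
    def __init__(self, blocks):
        self.blocks = blocks

    def read_block(self, block_no):
        return self.blocks[block_no]

class FileManager:
    """Decides which block holds a physical record and extracts the record."""
    def __init__(self, disk):
        self.disk = disk

    def read_record(self, record_no):
        block = self.disk.read_block(record_no // BLOCK_SIZE)
        return block[record_no % BLOCK_SIZE]

class DataManager:
    """Receives the user's request and determines the physical record required."""
    def __init__(self, files, index):
        self.files = files
        self.index = index  # student_no -> physical record number

    def fetch(self, student_no):
        return self.files.read_record(self.index[student_no])

disk = DiskManager([[("010201", "Karim"), ("010202", "Lina")], [("010203", "Omar")]])
data_manager = DataManager(FileManager(disk), {"010201": 0, "010202": 1, "010203": 2})
record = data_manager.fetch("010203")
```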
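Steps in Data Access
(Figure: the disk manager reads the block(s) from secondary storage, the file manager receives the
required block(s), the data manager receives the required record, and the response is returned to the user.)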
Advantages of DBMS
(a) Reduction of Redundancies.
(b) Data Sharing.
(c) Provision of Data Integrity.
(d) Enforcement of Security.
(e) Conflict Resolution.
(f) Data Independence
(g) Enhancement of Data Quality
(h) Centralized Control
Disadvantages of DBMS
(a) Problems Associated with Centralization.
(b) Cost of Software/Hardware and Migration.
(c) Complexity of Backup and Recovery.
Lecture 4
Entity
An entity is a ‘thing’ or ‘object’ in the real world that is distinguishable from all other objects.
For example: each student in the discipline is an entity.
An entity has a set of properties, and the values for some set of properties may uniquely identify
an entity.
For example: the Student No. 010201 uniquely identifies one particular student in the discipline.
An entity may be concrete such as a student or a book.
An entity may be abstract such as a course, or a holiday, or a concept.
Entity Set
An entity set is a set of entities of the same type that share the same properties, or attributes.
For example: the set of all persons who are enrolling in a given discipline can be defined as the
entity set student.
Similarly, the entity set course might represent the set of all courses conducted by a particular
discipline.
Entity sets do not need to be disjoint.
For example: suppose the set of all teachers of a university is defined as the entity set teacher and the
set of all students of the university is defined as the entity set student. A person entity may be a teacher
entity, a student entity, both, or neither.
Attribute
Attributes are descriptive properties possessed by each member of an entity set. An entity is
represented by a set of attributes.
For example: possible attributes of the student entity set are student-name, student-no, year, and
term.
Again, possible attributes for the course entity set are course-no, course-title, and credit-hours.
For each attribute, there is a set of permitted values, called the domain, or value set, of that
attribute.
For example: the domain of attribute student-name might be the set of all text strings of a certain
length (say, 40 characters).
Formally, an attribute of an entity set is a function that maps from the entity set into a domain.
It is possible for several attributes to have the same domain.
Composite Attributes
A composite attribute may appear as a hierarchy.
Null Attributes
A null value is used when an entity does not have a value for an attribute and has the meaning of
‘not applicable’.
For example: the value for the attribute telephone-no will be null, if the person does not have any
telephone.
Null can also designate that an attribute value is unknown.
An unknown value may be missing (the value does exist, but is presently unavailable).
For example: if the value of the attribute student-no for a particular student is null, it is assumed
that the value is missing.
An unknown value may be not known (it is not known whether or not the value actually exists).
For example: A null value for road-no attribute could mean that the address does not include a
road number, that a road number exists but it is not known, or that it is not known whether or not a road
number is part of the person’s address.
Derived Attributes
The value for this type of attribute can be derived from the values of other related attributes or
entities.
For example: consider that the person entity set has the related attributes date-of-birth and age.
The value for age can be derived from the value for date-of-birth and the current date.
In this case, age is the derived attribute.
date-of-birth may be referred to as a base attribute, or a stored attribute.
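A derived attribute can be sketched in Python as a method computed from the stored attribute. The Person class and the choice to pass the current date as a parameter (so the result is reproducible) are assumptions of this sketch.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    name: str
    date_of_birth: date  # base (stored) attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth and the current date."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

person = Person("Karim", date(2000, 6, 15))
age_day_before_birthday = person.age(date(2024, 6, 14))
age_on_birthday = person.age(date(2024, 6, 15))
```

Because age is never stored, it cannot become stale as the current date advances.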
Relationship Sets
A relationship set is a set of relationships of the same type.
Formally, it is a mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1, E2, …, En
are entity sets, then a relationship set R is a subset of
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
where (e1, e2, …, en) is a relationship.
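For the binary case (n = 2), a relationship set can be checked in Python as a subset of the Cartesian product of the two entity sets. The entity sets student and course and the relationship set enrolls below are hypothetical sample data.

```python
from itertools import product

student = {"010201", "010202"}
course = {"CSE-101", "CSE-102"}

# All possible (student, course) pairs: the Cartesian product of the entity sets.
all_pairs = set(product(student, course))

# A relationship set picks out some of those pairs.
enrolls = {("010201", "CSE-101"), ("010202", "CSE-101")}
is_relationship_set = enrolls <= all_pairs

# A pair that mentions an unknown student is not drawn from the product.
not_valid = {("999999", "CSE-101")} <= all_pairs
```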
Participation
The association between entity sets is referred to as participation.
For example: the entity sets E1, E2, …, En participate in relationship R.
Entity’s Role
The function that an entity plays in a relationship is called that entity’s role.
Since entity sets participating in a relationship set are generally distinct, roles are implicit and are
not usually specified.
Roles are useful when the meaning of a relationship needs clarification. This is the case
when the same entity set participates in a relationship set more than once, in different roles.
For example: A relationship set supervises has ordered pairs of employee entities; characterized by
(manager, worker) pairs, not (worker, manager) pairs.
Lecture 5
Use of Entity Sets or Attributes
Question 1: What constitutes an attribute?
Question 2: What constitutes an entity set?
Answer: There are no simple answers.
The distinction mainly depends on:
* The structure of the real-world enterprise being modeled.
* The semantics associated with the attribute in question.
Consider the entity set employee with attributes employee-name and telephone-no.
The definition implies that every employee has precisely one telephone number associated with
him.
Guideline: Designate a relationship set to describe an action that occurs between entities.
Mapping Constraints
Two of the most important types of constraints:
(a) Mapping Cardinalities
(b) Existence Dependencies
Mapping Cardinalities
Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity
can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets.
Note: It is always possible to replace a non-binary (n-ary, n > 2) relationship set by a number of
distinct binary relationship sets.
For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of
the following:
(a) One to One: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
(b) One to many: An entity in A is associated with any number of entities in B. An entity in B,
however, can be associated with at most one entity in A.
(c) Many to One: An entity in A is associated with at most one entity in B. An entity in B,
however, can be associated with any number of entities in A.
(d) Many to many: An entity in A is associated with any number of entities in B, and an entity in
B is associated with any number of entities in A.
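The four cases can be sketched as a Python function that inspects one instance of a binary relationship set, given as (a, b) pairs. Note the assumption: a mapping cardinality is a constraint on all legal instances, so this helper (a hypothetical name, mapping_cardinality) only reports the tightest case that the given sample satisfies.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Classify a binary relationship instance, given as (a, b) pairs, from A to B."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)  # number of B entities linked to each A entity
    b_counts = Counter(b for _, b in pairs)  # number of A entities linked to each B entity
    a_at_most_one = all(n <= 1 for n in a_counts.values())
    b_at_most_one = all(n <= 1 for n in b_counts.values())
    if a_at_most_one and b_at_most_one:
        return "one-to-one"
    if b_at_most_one:
        return "one-to-many"
    if a_at_most_one:
        return "many-to-one"
    return "many-to-many"

# One teacher linked to two courses, each course linked to one teacher.
teaches = [("t1", "c1"), ("t1", "c2")]
teaches_kind = mapping_cardinality(teaches)
```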
Existence Dependency
If the existence of entity x depends on the existence of entity y, then x is said to be existence
dependent on y.
Entity y is said to be a dominant entity.
Entity x is said to be a subordinate entity.
For example: Consider the entity set course and the entity set registration that keeps information
about all the registrations that were made in connection to a particular course.
A relationship set course-registration is formed between these two entity sets, which is one-to-many from
course to registration.
The entity set course is dominant and registration is subordinate in the relationship set course-registration.
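One common way to realize an existence dependency in a relational system is a foreign key with cascading deletion. The sketch below uses SQLite through Python's sqlite3 module; the column names and sample rows are assumptions, and cascading is only one possible policy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE course (course_no TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE registration ("
    "registration_no INTEGER PRIMARY KEY, "
    "course_no TEXT NOT NULL REFERENCES course(course_no) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO course VALUES ('CSE-101')")
conn.executemany("INSERT INTO registration VALUES (?, 'CSE-101')", [(1,), (2,)])

before = conn.execute("SELECT COUNT(*) FROM registration").fetchone()[0]

# Deleting the dominant entity removes its subordinate entities as well.
conn.execute("DELETE FROM course WHERE course_no = 'CSE-101'")
after = conn.execute("SELECT COUNT(*) FROM registration").fetchone()[0]
```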
Super Keys
A superkey is a set of one or more attributes that, taken collectively, allows users to identify
uniquely an entity in the entity set.
For example: the student-no attribute of the entity set student is sufficient to distinguish one
student entity from another.
Thus, student-no is a superkey.
Similarly, the combination of student-no and student-name is a superkey for the entity set student.
The student-name attribute of student is not a superkey, because several people might have the same name.
Candidate Keys
A superkey for which no proper subset is a superkey is called a candidate key; that is, the
candidate keys are the minimal superkeys.
For example: The student-no attribute of the entity set student is a candidate key for the entity set.
Similarly, if the combination of student-name and student-address is sufficient to distinguish among
students, that combination is also a candidate key for the entity set student.
Although the combination of the attributes student-no and student-name of student is a superkey, their
combination does not form a candidate key, since the attribute student-no alone is a candidate key.
The term primary key is used to denote a candidate key that is chosen by the database designer as
the principal means of identifying entities within an entity set.
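The definitions of superkey and candidate key can be sketched in Python over a small sample of entities. One caveat: a key is a property of every legal instance of the entity set, so a sample can only refute a proposed key, not prove it. The helper names and sample rows are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """attrs is a superkey for these rows if no two rows agree on all of attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, attributes):
    """Minimal superkeys: superkeys with no proper subset that is also a superkey."""
    keys = []
    for size in range(1, len(attributes) + 1):  # smaller attribute sets are tried first
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(k) < set(attrs) for k in keys)
            if is_superkey(rows, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys

students = [
    {"student_no": "010201", "student_name": "Karim", "year": 1},
    {"student_no": "010202", "student_name": "Karim", "year": 1},
    {"student_no": "010203", "student_name": "Lina", "year": 2},
]
found_keys = candidate_keys(students, ["student_no", "student_name", "year"])
```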
Lecture 6
Entity-Relationship Diagram
The overall logical structure of a database can be expressed graphically by an Entity-Relationship
(ER) diagram.
For example: A very simple Entity-Relationship (ER) diagram is shown below with the most
common components.
(Figure: ER diagram whose attributes include teacherID, name, courseNo, and title.)
Note: Attributes of an entity set that are members of the primary key are underlined.
(Figure: entity set E1 connected to entity set E2 through relationship set R.)
The above relationship set may be many-to-many, one-to-many, many-to-one, or one-to-one.
To distinguish among these types, direction is used as follows:
A directed line (→) from the relationship set R to the entity set E2 specifies that R is either a one-
to-one, or many-to-one relationship set, from E1 to E2; R cannot be a many-to-many or a one-to-many
relationship set, from E1 to E2.
An undirected line (no arrowhead) from the relationship set R to the entity set E2 specifies that R is
either a many-to-many, or one-to-many relationship set, from E1 to E2.
many-to-many: E1 --- R --- E2
many-to-one: E1 --- R --> E2
one-to-many: E1 <-- R --- E2
one-to-one: E1 <-- R --> E2
Discriminator
The discriminator of a weak entity set is a set of attributes that works as a means of distinguishing
among all those entities in the entity set that depend on one particular strong entity.
For example: The attribute dependentID is the discriminator of the weak entity set dependent, which
participates in the relationship set depend with the entity set employee.
The primary key of a weak entity set is formed by the primary key of the strong entity set on
which the weak entity set is existence dependent, plus the weak entity set’s discriminator.
For example: The primary key for the weak entity set dependent is formed by the primary key
employeeID of the entity set employee and its discriminator dependentID.
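In a relational schema this usually becomes a composite primary key. The sketch below uses SQLite through Python's sqlite3 module; the name column and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE employee (employeeID TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE dependent ("
    "employeeID TEXT NOT NULL REFERENCES employee(employeeID), "
    "dependentID INTEGER NOT NULL, "  # discriminator: unique only within one employee
    "name TEXT, "
    "PRIMARY KEY (employeeID, dependentID))"
)
conn.executemany("INSERT INTO employee VALUES (?)", [("E1",), ("E2",)])

# The same dependentID may repeat under different employees.
conn.executemany(
    "INSERT INTO dependent VALUES (?, ?, ?)",
    [("E1", 1, "Asha"), ("E2", 1, "Badal")],
)
dependent_count = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]

def insert_duplicate():
    """Try to reuse dependentID 1 under employee E1; report whether it was accepted."""
    try:
        conn.execute("INSERT INTO dependent VALUES ('E1', 1, 'Chitra')")
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = insert_duplicate()
```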
A weak entity set is indicated in E-R diagrams by a doubly outlined box, i.e., Double Rectangle.
The corresponding identifying relationship is indicated by a doubly outlined diamond, i.e., Double
Diamond.
The discriminator of a weak entity set is underlined with a dashed line.
Generalization
Generalization is a containment relationship that exists between a higher-level entity set and one
or more lower-level entity sets.
For all practical purposes, generalization is a simple inversion of specialization.
Generalization proceeds from the recognition that a number of entity sets share some common
properties.
Based on the commonalities among entity sets, generalization synthesizes these entity sets into a
single, higher-level entity set.
Generalization is used to emphasize the similarities among lower-level entity sets and to hide the
differences.
Generalization also permits an economy of representation in that shared attributes are not repeated.
Attribute Inheritance
Attribute inheritance is a crucial property of the higher-level and lower-level entities created by
specialization and generalization.
The attributes of higher-level entity sets are said to be inherited by the lower-level entity sets.
A lower-level entity set (or subclass) also inherits participation in the relationship sets in which its higher-
level entity set (or superclass) participates.
Attribute inheritance applies through all tiers of lower-level entity sets.
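Class inheritance in Python gives a rough analogy for attribute inheritance across tiers. The entity sets Person, Student, and GraduateStudent and their attributes are assumptions chosen for this sketch.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:  # higher-level entity set
    name: str
    address: str

@dataclass
class Student(Person):  # lower-level entity set: inherits name and address
    student_no: str

@dataclass
class GraduateStudent(Student):  # a further tier: inherits from both levels above
    thesis_title: str

# Every attribute visible at the lowest tier, inherited ones first.
inherited = [f.name for f in fields(GraduateStudent)]
```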
Design Constraints
Certain constraints may be placed on a particular generalization.
Goal: To obtain a more accurate design model of the enterprise.
User-defined: Entities are assigned to a given entity set by the database user.
For example: The assignment of students of higher-level entity set student to one of five work
groups represented by five lower-level entity sets is made on an individual basis by the teacher (the user in
charge of this decision).
The group assignment is implemented by an operation that adds an entity to an entity set.
Constraints on Multi-Membership
This relates to whether or not entities may belong to more than one lower-level entity set within a
single generalization.
Completeness Constraint
This specifies whether or not an entity in the higher-level entity set must belong to at least one
of the lower-level entity sets within a generalization.
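Both constraints can be checked over sample data with two small Python functions. The function names is_disjoint and is_total and the person, teacher, and student sets are assumptions of this sketch.

```python
def is_disjoint(lower_sets):
    """True if no entity belongs to more than one lower-level entity set."""
    seen = set()
    for members in lower_sets:
        if seen & members:
            return False
        seen |= members
    return True

def is_total(higher_set, lower_sets):
    """True if every higher-level entity belongs to at least one lower-level entity set."""
    return higher_set <= set().union(*lower_sets)

person = {"p1", "p2", "p3"}
teacher = {"p1"}
student = {"p1", "p2"}

disjoint = is_disjoint([teacher, student])   # p1 is both a teacher and a student
total = is_total(person, [teacher, student])  # p3 is neither
```

In this sample the generalization is overlapping and partial, since both checks fail.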