Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
58 views

DBE Unit 1

The document provides an introduction to database management systems (DBMS). It discusses how DBMS organize data into collections called databases and use software to control access and manage databases. It describes how DBMS address issues with traditional file-based data storage like data redundancy, inconsistency, and lack of security. The document also outlines some advantages of DBMS like improved data integrity and sharing, and disadvantages such as complexity, size, and cost of conversion from traditional file systems.

Uploaded by

gully boy
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
58 views

DBE Unit 1

The document provides an introduction to database management systems (DBMS). It discusses how DBMS organize data into collections called databases and use software to control access and manage databases. It describes how DBMS address issues with traditional file-based data storage like data redundancy, inconsistency, and lack of security. The document also outlines some advantages of DBMS like improved data integrity and sharing, and disadvantages such as complexity, size, and cost of conversion from traditional file systems.

Uploaded by

gully boy
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 175

Database Engineering

Unit 1

Introduction to DBMS
Introduction
• Data
• raw, unorganized facts that need to be processed.
• Data can be something simple and seemingly
random and useless until it is organized.

• Information
• When data is processed, organized, structured or
presented in a given context so as to make it useful,
• it is called information.
Introduction
• Database
• collection of related data
• Database Management System (DBMS)
• software that manages and controls access to the
database.
• Database application
• A program that interacts with the database at
some point in its execution.
Introduction
• Database Management System (DBMS) -
• Collection of interrelated data - database
• Set of programs to access those data

• Primary Goal –
• store and retrieve data in convenient and efficient manner

• Define structure of information


• Mechanism for manipulation
• Safety against system crash and unauthorized access
• Avoid anomalous result while sharing
Database System Applications
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax
deductions
Purpose of Database System
Database Systems versus Traditional File based
Systems
• Early days, database applications were built on top of file
systems
• Drawbacks of using file systems to store data:
• Data redundancy and inconsistency
• Multiple file formats, duplication of information in
different files
• Difficulty in accessing data
• Need to write a new program to carry out each new task
• Data isolation — multiple files and formats
• Integrity problems
• Integrity constraints (e.g. account balance < 0) become
part of program code
• Hard to add new constraints or change existing ones
• Atomicity of updates
• Failures may leave database in an inconsistent state with
partial updates carried out
• E.g. transfer of funds from one account to another should
either complete or not happen at all
• Concurrent access by multiple users
• Concurrent acces needed for performance
• Uncontrolled concurrent accesses can lead to
inconsistencies
• E.g. two people reading a balance and updating it at the
same time
• Security problems
• Database Approach
All the above limitations of the file-based approach
can be attributed to two factors:
(1) the definition of the data is embedded in the
application programs, rather than being
stored separately and independently;

(2) there is no control over the access and
manipulation of data beyond that imposed by
the application programs.
• To become more effective, a new approach was
required.
• What emerged were the database and the Database
Management System (DBMS).
History of Database Management Systems
• DBMS has its roots in the 1960s
• Apollo moon-landing project
• Which was initiated in response to President
Kennedy’s objective of landing a man on the moon
• At that time there was no system available that would
be able to handle and manage the vast amounts of
information that the project would generate.
• North American Aviation (NAA, now Rockwell
International), the prime contractor for the project,
developed software known as GUAM (Generalized
Update Access Method).
• hierarchical structure
• In the mid-1960s, IBM joined NAA to develop GUAM
into what is now known as IMS (Information
Management System).
• In the mid-1960s, another significant development
was the emergence of IDS (Integrated
Data Store) from General Electric.
• This work was headed by one of the early pioneers
of database systems, Charles Bachmann.
• This development led to a new type of database
system known as the network DBMS
• The network database was developed partly to
address the need to represent more complex data
relationships than could be modeled with hierarchical
structures, and partly to impose a database standard.
• To help establish such standards, the Conference on
Data Systems Languages (CODASYL), comprising
representatives of the US government and the world
of business and commerce, formed a List Processing
Task Force in 1965, subsequently renamed the Data
Base Task Group (DBTG) in 1967.
• The terms of reference for the DBTG were to define
standard specifications for an environment that
would allow database creation and data
manipulation.
• A draft report was issued in 1969 and the first
definitive report in 1971.
• The DBTG proposal identified three components:
• the network schema – the logical organization of the
entire database as seen by the
DBA – which includes a definition of the database
name, the type of each record, and
the components of each record type;
• the subschema – the part of the database as seen
by the user or application program;
• a data management language to define the data
characteristics and the data structure, and
to manipulate the data.
• for standardization, the DBTG specified three distinct
languages:
Data Definition Language (DDL), which enables the
DBA to define the schema;
a subschema DDL, which allows the application
programs to define the parts of the database they
require;
Data Manipulation Language (DML), to manipulate
the data.
Advantages and Disadvantages of DBMSs
• Advantages
• Control of data redundancy
• traditional file-based systems waste space by
storing the same information in more than one file.

• Data consistency
• By eliminating redundancy, we reduce the risk of
inconsistencies occurring.
• If a data item is stored only once in the database,
any update to it has to be performed only once
and the new value is available immediately to all.
• More information from the same amount of data
• With the integration of the operational data, it may
be possible for the organization to derive
additional information from the same data.

• Sharing of data
• database belongs to the entire organization and
can be shared by all authorized users.
• Improved data integrity
• Database integrity refers to the validity and
consistency of stored data.
• Integrity is expressed in terms of constraints,
• which are consistency rules that the database is
not permitted to violate.
• Constraints may apply to data items within a
single record or they may apply to relationships
between records.
• Improved security
• Database security is the protection of the
database from unauthorized users.

• Enforcement of standards
• integration allows the DBA to define and enforce
the necessary standards.
• These may include departmental, organizational,
national, or international standards.
• Economy of scale
• Combining all the organization’s operational data
into one database, and creating a set of
applications that work on this one source of data,
can result in cost savings.

• Improved data accessibility and responsiveness


• As a result of integration, data that crosses
departmental boundaries is directly
accessible to the end-users. This provides a
system with potentially much more functionality
• Increased productivity
• DBMS provides many of the standard functions
that the programmer would normally have to write
in a file-based application.
• At a basic level, the DBMS provides all the low-
level file-handling routines that are typical in
application programs.
• Increased concurrency
• Many DBMSs manage concurrent database
access and ensure problems cannot occur.
• Improved backup and recovery services
• modern DBMSs provide facilities to minimize the
amount of processing that is lost following a
failure.
• Disadvantages
• Complexity
• The provision of the functionality we expect of a
good DBMS makes the DBMS an extremely
complex piece of software.
• Database designers and developers, the data and
database administrators, and end-users must
understand this functionality to take full
advantage of it.
• Failure to understand the system can lead to bad
design decisions, which can have serious
consequences for an organization.
• Size
• The complexity and breadth of functionality makes
the DBMS an extremely large piece of software,
occupying many megabytes of disk space and
requiring substantial amounts of memory to run
efficiently.

• Cost of DBMSs
• The cost of DBMSs varies significantly, depending
on the environment and functionality provided.
• a single-user DBMS for a personal computer may
only cost US$100.
• Additional hardware costs
• The disk storage requirements for the DBMS and
the database may necessitate the purchase of
additional storage space.
• To achieve the required performance, it may
be necessary to purchase a larger machine
• Cost of conversion
• In some situations, the cost of the DBMS and
extra hardware may be insignificant
compared with the cost of converting existing
applications to run on the new DBMS and
hardware.
• This cost also includes the cost of training staff to
use these new systems
• Performance
• a file-based system is written for a specific
application, such as invoicing.
• As a result, performance is generally very good.
• However, the DBMS is written to be more
general, to cater for many applications rather than
just one.
• The effect is that some applications may not run
as fast as they used to.
• Higher impact of a failure
• The centralization of resources increases the
vulnerability of the system.
• Since all users and applications rely on the
availability of the DBMS, the failure of certain
components can bring operations to a halt.
View of Data
• Complex data structures used for efficiency
• Hides certain details of how data are stored and maintained
• Provides abstract view
• Hides complexity
• 3 Levels of Data Abstraction
• Physical Level
• Logical Level
• View Level
Physical Level
• Lowest level of abstraction
• Describes how data are actually stored
• Describes complex low level data structures in detail
• Blocks of consecutive storage locations- words, byte etc
• Administrator may be aware of certain details of the
physical organization
Logical Level
• Next higher level of abstraction
• Describes what data are stored in the database
• And what relationship exists among those data
• Describes entire database in terms of a small number of
relatively simple structures
• Do not bother with complex physical storage
• Database administrator uses logical level
• Who decides what information to keep in database
• Each record is described by a type definition
• Programmers also work at this level
View Level
• Highest level of abstraction
• Describes only part of entire database
• Exists to simplify interaction with the system
• System may provide many views for the same database
• Set of application programs that hide details of the data
types
• Views also provides security mechanisms to prevent users
from accessing certain parts of database
• type customer = record
name : string;
street : string;
city : integer;
end;
Instances and Schemas
• Database changes over time as information is inserted and
deleted
• Instance of the database –
• Collection of information stored in the database at a
particular moment
• Schema –
• The overall design of the database

• Schemas are changed infrequently


Instances and Schemas
• Similar to types and variables in programming languages
• Schema – the logical structure of the database
• e.g. the database consists of information about a set of
customers and accounts and the relationship between them

• Analogous to type information of a variable in a program


• In DBMS, Schema is the overall Design of the
Database.
• Instance is the information stored in the Database at
a particular moment.
• Database schema corresponds to the variable declaration
• The value of the variable in a program at a point in time
correspond to an instance of a database schema
• Database systems have several schemas
• Partitioned according to levels of abstraction
• Physical Schema – describes database design at physical
level
• Logical Schema – describes database design at logical
level
• Several schemas at view level, called as sub-schemas
• Describes different views of database
• Physical Data Independence –
• Logical schema is most important
• the ability to modify the physical schema without changing
the logical schema
• Applications depend on the logical schema
• In general, the interfaces between the various levels and
components should be well defined so that changes in
some parts do not seriously influence others.
Data Models
• Underlying the structure of a database is data model
• A collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints

• The Entity-Relationship Model (E-R)


• Relational Model
Relational Model
• uses a collection of tables to represent both data and
the relationships among those data.
• Each table has multiple columns, and each column
has a unique name.
• relational model is an example of a record-based
model.
• Record-based models are so named because the
database is structured in fixed-format records of
several types.
• Each table contains records of a particular type.
• Each record type defines a fixed number of fields, or
attributes.
• The columns of the table correspond to the attributes
of the record type.
• The relational data model is the most widely used
data model,
• a vast majority of current database systems are
based on the relational model.
• The relational model is at a lower level of abstraction
than the E-R model.
• Database designs are often carried out in the E-R
model, and then translated to the relational model
Other Data Models
• object-oriented data model
• is another data model that has seen increasing
attention.
• The object-oriented model can be seen as extending
the E-R model with notions of encapsulation,
methods (functions), and object identity.

• The object-relational data model combines


features of the object-oriented data model and
relational data model.
• Semi-structured data models permit the specification
of data where individual data items of the same type
may have different sets of attributes.
• This is in contrast with the data models mentioned
earlier,
• where every data item of a particular type must have
the same set of attributes.
• The extensible markup language (XML) is widely
used to represent semistructured data.
• Historically, two other data models, the network data
model and the hierarchical data model, preceded
the relational data model.
• These models were tied closely to the underlying
implementation,
• and complicated the task of modeling data.
• As a result they are little used now, except in old
database code that is still in service in some places.
The Entity-Relationship Model (E-R)
• Collection of basic objects – entities
• And relationships among these objects

• Entity – things or objects in the real world


• Distinguishable from other objects
• E. g. person, bank account
• Entities are described in database by a set of attributes
• E. g. bank account – account numbers and balance
• Some attributes used to uniquely identify the entities
• Relationship – association among several entities
• An entity set is a set of entities of the same type that
share the same properties, or attributes.
• The set of all persons who are customers at a given
bank, for example, can be defined as the entity set
customer
• Entity sets do not need to be disjoint.
• For example, it is possible to define the entity set of
all employees of a bank (employee) and the entity
set of all customers of the bank (customer).
• A person entity may be an employee entity, a
customer entity, both, or neither.
• An entity is represented by a set of attributes.
• Attributes are descriptive properties possessed by
each member of an entity set.
• Each entity has a value for each of its attributes
• Some attributes are used to uniquely identify entities
• For each attribute, there is a set of permitted values,
called the domain, or value set, of that attribute.
• A database thus includes a collection of entity sets,
each of which contains any number of entities of the
same type.
• an attribute of an entity set is a function that maps
from the entity set into a domain.
• Since an entity set may have several attributes,
• each entity can be described by a set of (attribute,
data value) pairs,
• one pair for each attribute of the entity set.
• E.g. (customer-id, 677-89-9011)
• (customer-name, Hayes)

• An attribute can be characterized by the following


attribute types
Simple and composite attributes
• Simple attributes- they are not divided into
subparts.
• Composite attributes- can be divided into subparts
(other attributes).
• E.g. attribute name could be structured as a
composite attribute consisting of first-name, middle-
initial, and last-name
• if a user will wish to refer to an entire attribute on
some occasions, and only a component of the
attribute on other occasions.
• composite attribute may appear as a hierarchy
Single-valued and multivalued attributes
• single value for a particular entity attribute
• There may be instances where an attribute has a set
of values for a specific entity.
• employee entity set with the attribute phone-number.
• An employee may have zero, one, or several phone
numbers
• and different employees may have different numbers
of phones.
• This type of attribute is said to be multivalued.
• appropriate, upper and lower bounds may be placed
on values in a multivalued attribute.
Derived attribute
• The value for this type of attribute can be derived
from the values of other related attributes or entities.
• the customer entity set has an attribute loans-held,
• which represents how many loans a customer has
from the bank.
• derive the value for this attribute by counting the
number of loan entities associated with that
customer.
• The value of a derived attribute is not stored, but is
computed when required.
• An attribute takes a null value when an entity does
not have a value for it.
• The null value may indicate “not applicable”—the
value does not exist for the entity.
• For example, one may have no middle name.
• Null can also designate that an attribute value is
unknown.
• Components of E-R diagram
• Rectangle – represents entity sets
• Ellipses – represents attributes
• Diamonds – represents relationships among entity sets
• Lines – link attributes to entity sets and entity sets to
relationship
• Double Ellipse – Multivalued Attributes
• Dashed Ellipse – Derived Attributes
• Double lines - indicate total participation of an entity
in a relationship set
• Each entity is labeled with name
Relationship Sets
• A relationship is an association among several
entities.
• a relationship that associates customer Hayes with
loan L-15.
• This relationship specifies that Hayes is a customer
with loan number L-15.
• A relationship set is a set of relationships of the
same type.
• If E1, E2, . . .,En are entity sets, then a relationship
set R is a subset of

The association between entity sets is referred to as


participation;

the entity sets E1, E2, . . .,En participate in relationship


set R
• The function that an entity plays in a relationship is
called that entity’s role.
Constraints
• constraints to which the contents of a database must
conform
• Mapping Cardinalities
• Mapping cardinalities, or cardinality ratios,
• express the number of entities to which another
entity can be associated via a relationship set.
• E.g. each account must belong to only one customer
• For a binary relationship set R between entity sets A
and B,
• the mapping cardinality must be one of the following:
• One to one :-
• An entity in A is associated with at most one entity in
B, and
• an entity in B is associated with at most one entity in
A.
• E.g. One Customer – one loan account
• One to many :-
• An entity in A is associated with any number (zero or
more) of entities in B.
• An entity in B, can be associated with at most one
entity in A.
• E.g. One Customer – many loan accounts
• Many to one :-
• An entity in A is associated with at most one entity in
B.
• An entity in B, however, can be associated with any
number (zero or more) of entities in A.
• E.g. 3 Customers - joint loan account
• Many to many :-
• An entity in A is associated with any number (zero or
more) of entities in B, and
• an entity in B is associated with any number (zero or
more) of entities in A.
• E.g. Many customers many loan accounts
Participation Constraints
• The participation of an entity set E in a relationship
set R is said to be total
• if every entity in E participates in at least one
relationship in R.
• If only some entities in E participate in relationships
in R,
• the participation of entity set E in relationship R is
said to be partial.
• For example, we expect every loan entity to be
related to at least one customer through the borrower
relationship.
• Therefore the participation of loan in the relationship
set borrower is total.
• In contrast, an individual can be a bank customer
whether or not she has a loan with the bank.
• Hence, it is possible that only some of the customer
entities are related to the loan entity set through the
borrower relationship,
• and the participation of customer in the borrower
relationship set is therefore partial.
Keys
• must have a way to specify how entities within a
given entity set are distinguished.
• individual entities are distinct the difference among
them must be expressed in terms of their attributes
• The values of the attribute values of an entity must
be such that they can uniquely identify the entity.
• no two entities in an entity set are allowed to have
exactly the same value for all attributes
• A key allows us to identify a set of attributes that
suffice to distinguish entities from each other.
• A superkey is a set of one or more attributes that,
taken collectively,
• allow us to identify uniquely an entity in the entity set.
• For example, the customer-id attribute of the entity
set customer is sufficient to distinguish one customer
entity from another.
• Thus, customer-id is a superkey.
• Similarly, the combination of customer-name and
customer-id is a superkey for the entity set customer.
• The customer-name attribute of customer is not a
superkey, because several people might have the
same name.
• candidate keys :-
• superkey may contain extraneous attributes.
• If K is a superkey, then so is any superset of K.
• superkeys for which no proper subset is a superkey.
• Such minimal superkeys are called candidate keys.
• several distinct sets of attributes could serve as a
candidate key.
• Suppose that a combination of customer-name and
customer-street is sufficient to distinguish among
members of the customer entity set.
• Then, both {customer-id} and {customer-name,
customer-street} are candidate keys.
• Although the attributes customerid and customer-
name together can distinguish customer entities,
• their combination does not form a candidate key,
• since the attribute customer-id alone is a candidate
key
• primary key denote a candidate key that is chosen
by the database designer
• as the principal means of identifying entities within an
entity set.
• A key (primary, candidate, and super) is a property of
the entity set, rather than of the individual entities.
• Any two individual entities in the set are prohibited
from having the same value on the key attributes at
the same time.
• Candidate keys must be chosen with care
• The primary key should be chosen such that its
attributes are never, or very rarely, changed.
• For instance, the address field of a person should not
be part of the primary key,
• since it is likely to change.
• Social-security numbers, on the other hand, are
guaranteed to never change.
• The primary key of a relational table uniquely
identifies each record in the table.
• It can either be a normal attribute that is guaranteed
to be unique (such as Social Security Number in a
table with no more than one record per person)
• or it can be generated by the DBMS (such as a
globally unique identifier, or GUID, in Microsoft SQL
Server).
• Primary keys may consist of a single attribute or
multiple attributes in combination.
Sample – ER Diagrams
Weak Entity Sets
• An entity set may not have sufficient attributes to
form a primary key.
• Such an entity set is termed a weak entity set.
• An entity set that has a primary key is termed a
strong entity set
• Consider the entity set payment, which has the three
attributes: payment-number, payment-date, and
payment-amount.
• each payment entity is distinct starting from 1
• But payments for different loans may share the same
payment number.
• Thus, this entity set does not have a primary key;
• it is a weak entity set.
• For a weak entity set to be meaningful, it must be
associated with another entity set, called the
identifying or owner entity set.
• the weak entity set is said to be existence
dependent on the identifying entity set.
• The identifying entity set is said to own the weak
entity set that it identifies.
• The relationship associating the weak entity set with
the identifying entity set is identifying relationship.
• The identifying relationship is many to one
• from the weak entity set to the identifying entity set,
and the participation of the weak entity set in the
• In our example, the identifying entity set for payment
is loan,
• and a relationship loan-payment that associates
payment entities with their corresponding loan
entities is the identifying relationship.
• The primary key of a weak entity set is formed by the
primary key of the identifying entity set, plus the
weak entity set’s discriminator.
• In the case of the entity set payment, its primary key
is {loan-number, payment-number},
• In E-R diagrams, a doubly outlined box indicates a
weak entity set,
• and a doubly outlined diamond indicates the
corresponding identifying relationship.
• In Figure the weak entity set payment depends on
the strong entity set loan via the relationship set loan-
payment.
• The figure also illustrates the use of double lines to
indicate total participation—
• The participation of the (weak) entity set payment in
the relationship loan-payment is total
Extended E-R Features
• Extended E-R features –
• Specialization
• Generalization
• higher- and lower-level entity sets,
• attribute inheritance
• aggregation
Specialization
• An entity set may include subgroupings of entities
that are distinct in some way from other entities in the
set.
• For instance, a subset of entities within an entity set
may have attributes that are not shared by all the
entities in the entity set.
• The E-R model provides a means for representing
these distinctive entity groupings.
• Consider an entity set person, with attributes name,
street, and city.
• A person may be further classified as one of the
following:
• Customer
• employee
• Each of these person types is described by a set of
attributes that includes all the attributes of entity set
person plus possibly additional attributes.
• For example, customer entities may be described
further by the attribute customer-id
• whereas employee entities may be described further
by the attributes employee-id and salary.
• The process of designating subgroupings within an
entity set is called specialization.
• The specialization of person allows us to distinguish
among persons according to whether they are
employees or customers.
• Another example, the bank wishes to divide accounts
into two categories, Current account and savings
account.
• In terms of an E-R diagram, specialization is depicted
by a triangle component labelled ISA, as Figure
shows.
• The label ISA stands for “is a”
• represents, for example, that a customer “is a”
person.
• The ISA relationship may also be referred to as a
superclass-subclass relationship.
• Higher- and lower-level entity sets are depicted as
regular entity sets—that is, as rectangles containing
the name of the entity set.
Generalization
• The refinement from an initial entity set into
successive levels of entity subgroupings represents a
top-down design process in which distinctions are
made explicit.
• The design process may also proceed in a bottom-
up manner, in which multiple entity sets are
synthesized into a higher-level entity set on the basis
of common features.
• customer entity set with the attributes name, street,
city, and customer-id,
• and an employee entity set with the attributes name,
street, city, employee-id, and salary.
• There are similarities between the customer entity
set and the employee entity set -they have several
attributes in common.
• This commonality can be expressed by
generalization
• which is a containment relationship that exists
between a higher-level entity set and one or more
lower-level entity sets.
• Person is the higher-level entity set and customer
and employee are lower-level entity sets.
• Higher- and lower-level entity sets also may be
designated by the terms superclass and subclass,
respectively.
• The person entity set is the superclass of the
customer and employee subclasses.
• For all practical purposes, generalization is a simple
inversion of specialization.
• apply both processes, in combination, in the course
of designing the E-R Diagram
• Differences in the two approaches may be
characterized by their starting point and overall goal.
• Specialization stems from a single entity set;
• it emphasizes differences among entities within the
set by creating distinct lower-level entity sets.
• These lower-level entity sets may have attributes, or
may participate in relationships, that do not apply to
all the entities in the higher-level entity set.
• Generalization proceeds from the recognition that a
number of entity sets share some common features
• (namely, they are described by the same attributes
and participate in the same relationship sets).
• On the basis of their commonalities, generalization
synthesizes these entity sets into a single, higher-
level entity set.
• Generalization is used to emphasize the similarities
among lower-level entity sets and to hide the
differences
• it also permits an economy of representation in that
shared attributes are not repeated.
Attribute Inheritance
• A crucial property of the higher- and lower-level
entities created by specialization and generalization
is attribute inheritance.
• The attributes of the higher-level entity sets are said
to be inherited by the lower-level entity sets.
• For example, customer and employee inherit the
attributes of person.
• customer is described by its name, street, and city
attributes, and additionally a customer-id attribute;
• employee is described by its name, street, and city
attributes, and additionally employee-id and salary
• A lower-level entity set (or subclass) also inherits
participation in the relationship sets in which its
higher-level entity (or superclass) participates.
• The officer, teller, and secretary entity sets can
participate in the works-for relationship set, since the
superclass employee participates in the works-for
relationship.
• Attribute inheritance applies through all tiers of lower-
level entity sets.
• Single Inheritance, Multiple Inheritance
Constraints on Generalizations
• One type of constraint involves determining which
entities can be members of a given lower-level entity
set.
• Such membership may be one of the following:
• Condition-defined-
• In condition-defined lower-level entity sets,
membership is evaluated on the basis of whether or
not an entity satisfies an explicit condition or
predicate.
• E.g. Account – account type saving
• User-defined- User-defined lower-level entity sets
are not constrained by a membership condition;
• rather, the database user assigns entities to a given
entity set.
• For instance, let us assume that, after 3 months of
employment, bank employees are assigned to one of
four work teams
• A second type of constraint relates to whether or not
entities may belong to more than one lower-level
entity set within a single generalization.
• The lower-level entity sets may be one of the
following:
• Disjoint- A disjointness constraint requires that an
entity belong to no more than one lower-level entity
set.
• Overlapping-In overlapping generalizations, the
same entity may belong to more than one lower-level
entity set within a single generalization.
• A final constraint, the completeness constraint on a
generalization or specialization,
• specifies whether or not an entity in the higher-level
entity set must belong to at least one of the lower-
level entity sets within the generalization/
specialization
• This constraint may be one of the following:
• Total generalization or specialization -Each higher-
level entity must belong to a lower-level entity set
• Partial generalization or specialization- Some
higher-level entities may not belong to any lower-
level entity set
• Partial generalization is the default.
• We can specify total generalization in an E-R
diagram by using a double line to connect the box
representing the higher-level entity set to the triangle
symbol.
• (This notation is similar to the notation for total
participation in a relationship.)
• The account generalization is total:
• All account entities must be either a savings account
or a current account
• The work team entity sets illustrate a partial
specialization.
• Since employees are assigned to a team only after 3
months on the job,
• some employee entities may not be members of any
of the lower-level team entity sets.
Aggregation
• One limitation of the E-R model is that it cannot
express relationships among relationships.
• consider the ternary relationship works-on, between
a employee, branch, and job
• suppose we want to record managers for tasks
performed by an employee at a branch;
• we want to record managers for (employee, branch,
job) combinations.
• One alternative for representing this relationship is to
create a quaternary relationship manages between
employee, branch, job, and manager.
Design of an E-R Database Schema
• The E-R data model gives us much flexibility in
designing a database schema to model a given
enterprise.
• how a database designer may select from the wide
range of alternatives.
• Among the designer’s decisions are:
Design Phases
• The initial phase of database design, is to
characterize fully the data needs of the prospective
database users.
• Next, the designer chooses a data model, and by
applying the concepts of the chosen data model,
translates these requirements into a conceptual
schema of the database.
• The designer reviews the schema to confirm that all
data requirements are indeed satisfied and are not in
conflict with one another.
• In a specification of functional requirements,
users describe the kinds of operations (or
transactions) that will be performed on the data.
• The process of moving from an abstract data model
to the implementation of the database proceeds in
two final design phases.
• In the logical-design phase, the designer maps the
high-level conceptual schema onto the
implementation data model of the database system
that will be used.
• The designer uses the resulting system specific
database schema in the subsequent physical-
design phase, in which the physical features of the
database are specified.
Reduction of an E-R Schema to Tables
• represent a database that conforms to an E-R
database schema by a collection of tables.
• For each entity set and for each relationship set in
the database, there is a unique table to which we
assign the name of the corresponding entity set or
relationship set.
• Each table has multiple columns, each of which has
a unique name.
• Tabular Representation of Strong Entity Sets
• Let E be a strong entity set with descriptive attributes
a1, a2, . . . , an.
• We represent this entity by a table called E with n
distinct columns,
• each of which corresponds to one of the attributes of
E.
• Each row in this table corresponds to one entity of
the entity set E.
• E.g. loan with attributes loan number and amount
• Tabular Representation of Weak Entity Sets
• Let A be a weak entity set with attributes a1, a2, . . . ,
am.
• Let B be the strong entity set on which A depends.
• Let the primary key of B consist of attributes b1, b2, .
. . , bn.
• We represent the entity set A by a table called A with
one column for each attribute of the set:
• {a1, a2, . . . , am} ∪ {b1, b2, . . . , bn}
• loan
• payment
• Tabular Representation of Relationship Sets
• Let R be a relationship set, let a1, a2, . . . , am be the
set of attributes formed by the union of the primary
keys of each of the entity sets participating in R,
• and let the descriptive attributes (if any) of R be b1,
b2, . . . , bn.
• We represent this relationship set by a table called R
with one column for each attribute of the set:
• {a1, a2, . . . , am} ∪ {b1, b2, . . . , bn}
• E.g. relationship - borrower
• Customer – primary key customer id
• Loan – primary key loan number
• Redundancy of Tables
• Combination of Tables
• Composite Attributes
• Multivalued Attributes
• Tabular Representation of Generalization
• Tabular Representation of Aggregation
Redundancy of Tables
• Specifically in case of weak entity set
• Table used for Identifying relationship is redundent
• E. g. Every(loan-number,payment-number)
combination in loan-payment would also be present
in the payment table.
• Thus, the loan-payment table is redundant.
Combination of Tables
• Combine table for account-branch with the table for
account and require only the following two tables:
• account, with attributes account-number, balance,
and branch-name
• branch, with attributes branch-name, branch-city, and
assets
• No need of table account-branch
Tabular Representation of Generalization
• Create a table for the higher-level entity set.
• For each lower-level entity set, create a table that
includes a column for each of the attributes of that
entity set
• plus a column for each attribute of the primary key of
the higher-level entity set.
Tabular Representation of Aggregation

• The table for the relationship set manages between


the aggregation of works-on and the entity set
manager includes
• a column for each attribute in the primary keys of the
entity set manager and the relationship set works-on.
Database System Structure
• A database system is partitioned into modules
• that deal with each of the responsibilities of the
overall system.
• The functional components of a database system
can be broadly divided into
• the storage manager and
• the query processor components.
• storage manager is a program module
• that provides the interface between the low level data
stored in the database and the application programs
and queries submitted to the system.
• The storage manager is responsible for the
interaction with the file manager.
• The raw data are stored on the disk using the file
system, which is usually provided by a conventional
operating system.
• The storage manager translates the various DML
statements into low-level file-system commands.
• the storage manager is responsible for storing,
retrieving, and updating data in the database.

The storage manager components include:


• Authorization and integrity manager
tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
• Transaction manager
ensures that the database remains in a consistent
(correct) state despite system failures, and
concurrent transaction executions proceed without
conflicting.
• File manager -
manages the allocation of space on disk storage and the
data structures used to represent information stored on
disk.

• Buffer manager
responsible for fetching data from disk storage into
main memory, and deciding what data to cache in
main memory.
The buffer manager is a critical part of the database
system, since it enables the database to handle data
sizes that are much larger than the size of main
memory.
• The storage manager implements several data
structures as part of the physical system
implementation:

• Data files, which store the database itself.

• Data dictionary,
which stores metadata about the structure of the
database, in particular the schema of the database.

• Indices, which provide fast access to data items that


hold particular values.
The Query Processor
• The query processor components include
• DDL interpreter, which interprets DDL statements and
records the definitions in the data dictionary.

• DML compiler, which translates DML statements in a


query language into an evaluation plan consisting of
low-level instructions that the query evaluation
engine understands.

• Query evaluation engine, which executes low-level


instructions generated by the DML compiler.
Application Architectures
• Database applications are usually partitioned into two
or three parts
• In a two-tier architecture, the application is
partitioned into a component that resides at the client
machine,
• which invokes database system functionality at the
server machine through query language statements.
• Application program interface standards like JDBC
are used for interaction between the client and the
server.
• in a three-tier architecture, the client machine acts
as merely a front end and does not contain any direct
database calls.
• Instead, the client end communicates with an
application server, usually through a forms
interface.
• The application server in turn communicates with a
database system to access data.
Roles in the Database Environment
• A primary goal of a database system is to retrieve
information from and store new information in the
database.
• People who work with a database can be categorized
as
• database users or
• database administrators.
Database Users and User Interfaces
• There are four different types of database-system
users,
• differentiated by the way they expect to interact with
the system.
• Different types of user interfaces have been
designed for the different types of users.
• Naive users
• are unsophisticated users who interact with the
system by invoking one of the application programs
that have been written previously.
• For example, a bank teller who needs to transfer $50
from account A to account B invokes a program
called transfer.
• Application programmers
• are computer professionals who write application
programs.
• Application programmers can choose from many
tools to develop user interfaces.
• Sophisticated users
• interact with the system without writing programs.
• Instead, they form their requests in a database query
language.
• They submit each such query to a query processor,
whose function is to break down DML statements into
instructions that the storage manager understands.
• Analysts who submit queries to explore data in the
database fall in this category.
• Specialized users
• are sophisticated users who write specialized
database applications that do not fit into the
traditional data-processing framework.
• Among these applications are computer-aided design
systems, knowledge base and expert systems,
systems that store data with complex data types (for
example, graphics data and audio data), and
environment-modeling systems.
Database Administrator
• One of the main reasons for using DBMSs is to have
central control of both the data and the programs that
access those data.
• A person who has such central control over the
system is called a database administrator (DBA).
• The functions of a DBA include:
• Schema definition. The DBA creates the original
database schema by executing a set of data
definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification.
The DBA carries out changes to the schema and
physical organization to reflect the changing needs of
the organization, or to alter the physical organization
to improve performance.
• Granting of authorization for data access.
• By granting different types of authorization, the
database administrator can regulate which parts of
the database various users can access.
• Routine maintenance. Examples of the database
administrator’s routine maintenance activities are:
• Periodically backing up the database,
• Ensuring that enough free disk space is available
• Monitoring jobs running on the database and
ensuring that performance is not degraded
Transaction Management
• A transaction is a collection of operations that
performs a single logical function in a database
application.
• Each transaction is a unit of both atomicity and
consistency
• ACID
• Concurrency control
Structure of Relational Databases
• Collection of tables - relation
• Columns- attributes
• Rows- record- tuple
• Domain – possible values
• Database Schema

• Keys-
• Underline the attributes forming primary key
Relational Query Languages
The Relational Algebra

• relational algebra is a procedural query language.


• It consists of a set of operations that take one or two
relations as input
• and produce a new relation as their result.
• The fundamental operations in the relational algebra
are select, project, union, set difference, Cartesian
product, and rename.
• Fundamental Operations
• select, project, and rename operations are called
unary operations,
• because they operate on one relation.
• The other three operations operate on pairs of
relations and are called binary operations.
• union, set difference, Cartesian product
The Select Operation
• select operation selects tuples that satisfy a given
predicate.
• use the lowercase Greek letter sigma (σ) to denote
selection.
• The predicate appears as a subscript to σ.
• The argument relation is in parentheses after the σ.
• Thus, to select those tuples of the loan relation
where the branch is “Perryridge,” write
• σbranch-name =“Perryridge” (loan)
• We can find all tuples in which the amount is more
than $1200 by writing
• σamount>1200 (loan)
• allow comparisons using =, =, <, ≤, >, ≥ in the
selection predicate.
• we can combine several predicates into a larger
predicate by using the connectives and (∧), or (∨),
and not (¬).
• to find those tuples pertaining to loans of more than
$1200 made by the Perryridge branch, we write
• σbranch-name =“Perryridge”∧ amount>1200 (loan)
The Project Operation
• Suppose we want to list all loan numbers and the
amount of the loans,
• but do not care about the branch name.
• The project operation allows us to produce this
relation.
• The project operation is a unary operation that
returns its argument relation, with certain attributes
left out.
• any duplicate rows are eliminated.
• Projection is denoted by the uppercase Greek letter
pi (Π).
• We list those attributes that we wish to appear in the
result as a subscript to Π.
• The argument relation follows in parentheses.
• Thus, we write the query to list all loan numbers and
the amount of the loan as
• Πloan-number, amount(loan)
Composition of Relational Operations
• the result of a relational operation is itself a relation
• Consider query “Find those customers who live in
Harrison.”
• We write:
• Πcustomer-name (σcustomer-city =“Harrison” (customer))

• instead of giving the name of a relation as the


argument of the projection operation,
• we give an expression that evaluates to a relation.
The Union Operation
• a query to find the names of all bank customers who
have either an account or a loan or both.
• To answer this query, we need the information in the
depositor relation and in the borrower relation
• We know how to find the names of all customers with
a loan in the bank:
• Πcustomer-name (borrower )
• We also know how to find the names of all customers
with an account in the bank:
• Πcustomer-name (depositor)
• To answer the query,
• we need the union of these two sets;
• we need all customer names that appear in either or
both of the two relations.
• We find these data by the binary operation union,
denoted, as in set theory, by ∪.
• So the expression needed is
• Πcustomer-name (borrower ) ∪ Πcustomer-name (depositor)
The Set Difference Operation
• The set-difference operation, denoted by −, allows
us to find tuples that are in one relation but are not in
another.
• The expression r − s produces a relation containing
those tuples in r but not in s.
• We can find all customers of the bank who have an
account but not a loan by writing
• Πcustomer-name (depositor) − Πcustomer-name (borrower )
• we must ensure that set differences are taken
between compatible relations.
• Therefore, for a set difference operation r − s to be
valid,
• we require that the relations r and s be of the same
type
• and the domains of the ith attribute of r and the ith
attribute of s be the same
The Cartesian-Product Operation
• The Cartesian-product operation, denoted by a
cross (×),
• allows us to combine information from any two
relations.
• We write the Cartesian product of relations r1 and r2
as r1 × r2.
• for r = borrower × loan is
• (borrower.customer-name, borrower.loan-number,
loan.loan-number, loan.branch-name, loan.amount)
• σbranch-name =“Perryridge”(borrower × loan)
The Natural-Join Operation
• It is often desirable to simplify certain queries that
require a Cartesian product.
• Usually, a query that involves a Cartesian product
includes a selection operation on the result of the
Cartesian product.
• Πcustomer-name, loan.loan-number, amount
(σborrower .loan-number =loan.loan-number (borrower × loan))
The Tuple Relational Calculus
• In relational-algebra expression-provide a sequence
of procedures that generates the answer to query.
• tuple relational calculus, by contrast, is a
nonprocedural query language.
• It describes the desired information without giving a
specific procedure for obtaining that information.
• A query in the tuple relational calculus is expressed
as {t | P(t)}
• it is the set of all tuples t such that predicate P is true
for t
Example Queries
• we want to find the branch-name, loan-number, and
amount for loans of over $1200:
• {t | t ∈ loan ∧ t[amount] > 1200}

• use t[A] to denote the value of tuple t on attribute A,


and use t ∈ r to denote that tuple t is in relation r.
• Suppose that we want only the loan-number
attribute, rather than all attributes of the loan relation.
• To write this query in the tuple relational calculus, we
need to write an expression for a relation on the
schema (loan-number).
• We need those tuples on (loan-number) such that
there is a tuple in loan with the amount > 1200
• To express this request, we need the construct “there
exists” from mathematical logic.
• The notation ∃ t ∈ r (Q(t))
• means “there exists a tuple t in relation r such that
predicate Q(t) is true.”
• Using this notation, we can write the query “Find the
loan number for each loan of an amount greater than
$1200” as
• {t | ∃ s ∈ loan (t[loan-number] = s[loan-number]
∧ s[amount] > 1200)}

• In English, we read the preceding expression as


• “The set of all tuples t such that there exists a tuple s
in relation loan for which the values of t and s for the
loan-number attribute are equal, and the value of s
for the amount attribute is greater than $1200.”
• Consider the query “Find the names of all customers
who have a loan from the Perryridge branch.”
• {t | ∃ s ∈ borrower (t[customer-name] =
s[customer-name] ∧ ∃ u ∈ loan (u[loan-number] =
s[loan-number ] ∧ u[branch-name] =
“Perryridge”))}
• Tuple variable u ensures that the customer is a
borrower at the Perryridge branch.
• Tuple variable s is restricted to pertain to the
same loan number as s.
• To find all customers who have a loan, an account, or
both at the bank
• {t | ∃ s ∈ borrower (t[customer-name] = s[customer-
name]) ∨ ∃ u ∈ depositor (t[customer-name] =
u[customer-name])}

• If we now want only those customers who have both


an account and a loan at the Bank
• {t | ∃ s ∈ borrower (t[customer-name] = s[customer-
name]) ∧ ∃ u ∈ depositor (t[customer-name] =
u[customer-name])}
Formal Definition
• A tuple-relational-calculus expression is of the form
• {t | P(t)}
• where P is a formula.
• Several tuple variables may appear in a formula.
The Domain Relational Calculus
• A second form of relational calculus, called domain
relational calculus,
• uses domain variables that take on values from an
attributes domain, rather than values for an entire
tuple.
• The domain relational calculus is closely related to
the tuple relational calculus.
Formal Definition
• An expression in the domain relational calculus is of
the form
• {< x1, x2, . . . , xn > | P(x1, x2, . . . , xn)}
• where x1, x2, . . . , xn represent domain variables.
• P represents a formula composed of atoms,
• An atom in the domain relational calculus has one of
the following forms:
• < x1, x2, . . . , xn > ∈ r, where r is a relation on n
attributes and x1, x2, . . . , xn are domain variables or
domain constants.
Example Queries
• Find the loan number, branch name, and amount for
loans of over $1200:
• {< l, b, a > | < l, b, a > ∈ loan ∧ a > 1200}

• Find all loan numbers for loans with an amount


greater than $1200:
• {< l > | ∃ b, a (< l, b, a > ∈ loan ∧ a > 1200)
• Find the names of all customers who have a loan
from the Perryridge branch and find the loan amount:
• {< c,a > | ∃ l (< c, l >∈ borrower ∧ ∃ b (< l, b, a > ∈
loan ∧ b = “Perryridge”))}

• Find the names of all customers who have a loan, an


account, or both at the Perryridge branch:
• {< c > | ∃ l (< c, l >∈ borrower ∧∃b, a (< l, b, a >∈ loan
∧ b = “Perryridge”)) ∨∃a (< c,a >∈ depositor ∧∃b, n (<
a, b, n >∈ account ∧ b = “Perryridge”))}

You might also like