Fundamental and Advanced Database Tutorial

Database System Concepts and Architecture


Data Model
ØA set of concepts to describe the structure of a database.
ØData models define how data items are connected to each other and how they are
processed and stored inside the system.
• e.g., specifying data types and constraints
• The structure of a database means its data types, relationships, and constraints.
• They provide abstraction: hiding the details of how the data are stored
and maintained.
Categories of Data Models
1. Conceptual (high-level, semantic) data models
üProvide concepts that are close to the way many users perceive data.
ü Also called entity-based or object-based data models.
üUse concepts such as:
• Entities - represent real-world objects
• Attributes - represent properties of an entity
• Relationships - represent associations among entities
2. Physical (low-level, internal) data models:
ü Provide concepts that describe the details of how data is stored in the computer,
ü by representing information like:
• Record formats
• Record ordering
• Access paths
3. Representational or Implementation data models:
üRepresent data using record structures.
ü Data models under this category:
vRelational Database Model – data model based on tables.
vNetwork Database Model – data model with records as nodes
and relationships between records as links.
vHierarchical Database Model – data model based on a tree (parent-child) structure.

Three-Schema Architecture
1. External or View level (individual user view)

• Contains a number of external schemas or user views

• Each external schema describes the part of the database that a particular user group is interested in
and hides the rest.

2. Conceptual or Logical level (community user view)

• Describes what data are stored in the database and the relationships among the data

3. Internal or Physical level (physical or storage view)

§ Has an internal schema that describes the details of the physical storage structures of
the database
What is Data independence ?
ü The capacity to change the schema at one level of a DBMS without having to change the schema at the
next higher level(s).
ü Only the mappings between the levels are changed
ü There are two kinds of data independence:
1. Logical data independence
• Is the capacity to change the conceptual schema without having to change the external schemas or application
programs
• When do we need to change the conceptual schema ?
• To expand the database
• Reduce the database
2. Physical data independence
• Is the capacity to change the internal schema without having to change the conceptual schema or external
schemas
• When do we need to change the internal schema?
• To improve the performance of retrieval or update of data but the conceptual schema will remain the
same if we don’t add extra record type, data item or constraint into the database.
Database Languages and Interfaces
Data Manipulation Language (DML)
• For manipulation of data in the DB
• Enable the user to perform operations like retrieval, insertion, deletion, and
modification of the data
• SQL statements like SELECT, UPDATE, INSERT, DELETE

Data Definition Language (DDL)

• DDL is used by the DBA and database designers.

• In a DBMS where no strict separation of levels is maintained,
the DDL is used to define both the internal and conceptual schemas.
• SQL statements like CREATE, DROP, RENAME, TRUNCATE
Data Control Language (DCL)

• Allows database administrators to configure security access to relational

databases
• DCL is the simplest of the SQL subsets, as it consists of only three
commands: GRANT, REVOKE, and DENY
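The DDL and DML statements above can be sketched in a short session using Python's built-in sqlite3 module (the table and column names here are purely illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the schema (CREATE; DROP and ALTER work similarly)
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: retrieve, insert, and modify the data
cur.execute("INSERT INTO student (id, name) VALUES (1, 'Abebe')")
cur.execute("UPDATE student SET name = 'Abebe M.' WHERE id = 1")
rows = cur.execute("SELECT id, name FROM student").fetchall()
print(rows)  # [(1, 'Abebe M.')]
```

Note that SQLite itself does not implement the DCL commands (GRANT, REVOKE); those apply in multi-user RDBMSs such as Oracle or SQL Server.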
Classification of Database Management Systems
• Based on the data models
ü Relational, Hierarchical, Network, Object- Oriented
• Based on number of users
ü Single-user
ü Multi-user
• Based on number of Sites
üCentralized
üDistributed
• Based on Cost
üLow cost
üMedium cost
üHigh cost
Entity Relationship (ER) Model

• E-R modeling is mainly used to create the conceptual schema for the database from the
collected system specifications

• It also contains the descriptions of entities, their attributes, and the relationships among
the entities.

• An entity-relationship model describes data in terms of the following:

• Entities

• Relationship between entities

• Attributes of entities
E/R Diagram Representation
Strong Entity
Weak Entity
Single Valued Attribute
Multivalued Attribute
Derived Attribute
Composite attribute
Key
Relationship

Introduction to the E-R Model

Example E-R diagram: an Instructor entity (FName, LName, ID No.) connected by a
Teaches relationship to a Course entity (CName, CCode, Credit).

What do you understand from this E-R diagram?
Types of Attribute
• An attribute can be:
• Simple or composite
• Single-valued or multi-valued
• Stored or derived
Simple VS Composite Attributes
1. Simple or atomic attribute: cannot be further divided into smaller components
• Composed of a single attribute with an independent existence
• Attributes that are not divisible
• Examples:
• Gender, SSN, FName
2. Composite attribute: can be divided into smaller subparts, where each
subpart is either atomic or composite
• Composite attributes can form a hierarchy
• Can be divided into further parts
• Examples:
• Name: (First Name, Last Name)
• Address: (Street, City, State, Zip Code)
Single-Valued VS Multi-Valued Attributes
3. Single-valued attributes have a single value for an entity instance
üThe majority of attributes are single-valued for a particular entity.
üExamples: Name, Date of Birth, Reg. No
4. Multi-valued attributes may have more than one value for an entity instance
üMay have lower and upper bounds on the number of values allowed for each individual
entity.
üDenoted with a double-lined ellipse
• For example:
• College degree: Bachelor, Master and PhD
• Languages: stores the names of the languages that a student speaks
• Phone number: mobile phone, office phone, home phone
• Hobby: {Reading books, Listening to music, Watching TV, Playing football}
Stored VS Derived Attributes
• The value of a derived attribute can be determined by analyzing other attributes, i.e., it can be
derived from other attributes

• Therefore, there is no need to store them in the database

• Denoted with a dashed ellipse
• Example:
• Age: can be derived from the current date and the attribute DateOfBirth
• An attribute whose value cannot be derived from the values of other attributes is called a
stored attribute
• Stored attribute: an attribute from which the values of other attributes are derived.
• E.g., birthdate of a person
Database Design
Functional Dependency & Normalization

• Database normalization is a series of steps followed to obtain a


database design that allows for consistent storage and efficient access
of data in a relational database.
• These steps reduce data redundancy and the risk of data becoming inconsistent.
• NORMALIZATION is the process of identifying the logical
associations between data items and designing a database that will
represent such associations but without suffering the update anomalies
which are;
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Cont…’
Normalization is a process that helps to:
• Reduce the total amount of redundant data in the database
• Organize data efficiently and improve data consistency
• Reduce the potential for anomalies during data operations
• Reduce the use of NULLs in the database
• Reduce the number of columns in tables
• Reduce the amount of SQL code
• Reduce the total number of indexes
The purpose of normalization is to reduce the chances for anomalies to occur in a
database, and to bring the database to a consistent state.
Functional Dependency (FD)
• Two data items A and B are said to be in a determinant/dependent relationship if
certain values of data item B always appear with certain values of data item A.
• If data item A is the determinant and B the dependent data item, then the
direction of the association is from A to B and not vice versa.
• We say "A determines B," "B is a function of A," or "A functionally governs B."
• "If A, then B." It is important to note that the value of B must be unique for a given value
of A,
• i.e., any given value of A must imply one and only one value of B, in order for
the relationship to qualify as a function.
• X → Y holds if whenever two tuples have the same value for X, they must have the
same value for Y.
• The notation is: A → B, which is read as: B is functionally dependent on A.

• Each value of A is associated with exactly one value of B.

• In general, a functional dependency is a relationship among attributes.
Example
R = (studid, name, courseno, course_name, dept, Dhead)
• Given a student_ID, we can determine the student name.
• (Note that given a student name we cannot determine the
student_ID.)
• Given a courseno, we can determine the course name.

• X → Y : X functionally determines Y
F = { studID → Name,
courseno → course_Name }
Functional dependency
• Examples: check whether each is an FD or not
• Rno → name : FD
• name → Rno : not an FD
• {Rno, name} → marks : FD
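Whether X → Y holds in a given relation instance can be checked directly from the definition (two tuples agreeing on X must agree on Y). A minimal sketch, with rows as dictionaries and illustrative sample data:

```python
def holds_fd(rows, X, Y):
    """Return True if X -> Y holds in rows: any two rows that agree
    on the attributes in X must also agree on the attributes in Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # same X value maps to two different Y values
        seen[x_val] = y_val
    return True

# Two students share a name but have distinct roll numbers:
students = [
    {"Rno": 1, "name": "Abebe", "marks": 80},
    {"Rno": 2, "name": "Abebe", "marks": 75},
]
print(holds_fd(students, ["Rno"], ["name"]))            # True
print(holds_fd(students, ["name"], ["Rno"]))            # False
print(holds_fd(students, ["Rno", "name"], ["marks"]))   # True
```

Note that such a check can only confirm an FD on one instance; a real FD is a constraint on all valid instances of the relation.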
Partial Functional Dependency
• A functional dependency X → Y is a partial dependency if there is
some attribute that can be removed from X and yet the dependency
still holds.
• If an attribute which is not a member of the primary key is dependent on some part
of the primary key (if we have a composite primary key), then that attribute is
partially functionally dependent on the primary key.
Let {A,B} be the primary key and C a non-key attribute.
Then if {A,B} → C and B → C,

• C is partially functionally dependent on {A,B}.
Example

• In a student department table with primary key {StudID, DeptID}, DeptID alone
determines DeptName.

• So DeptName is functionally dependent on {StudID, DeptID} and also on its proper
subset DeptID, i.e., a partial dependency.
Full Functional Dependency (FFD)
• If X and Y are attribute sets of a relation, Y is fully functionally dependent on
X if Y is functionally dependent on X, but not on any proper subset of X.
• If an attribute which is not a member of the primary key is not dependent on
some part of the primary key but on the whole key (if we have a composite
primary key), then that attribute is fully functionally dependent on the primary
key.
• Let {A,B} be the primary key and C a non-key attribute.
• Then if {A,B} → C holds, and neither B → C nor A → C holds,
• C is fully functionally dependent on {A,B}.
Example

• {SupplierID, ItemID} → Price is a full functional dependency because Price is not
functionally dependent on any proper subset of the determinant {SupplierID, ItemID}.
Cont…’
Transitive Dependency
• In mathematics and logic, a transitive relationship is a relationship of the
following form: "If A implies B, and if also B implies C, then A implies C."
• Example: If Mr X is a Human, and if every Human is an Animal, then
Mr X must be an Animal.
• A generalized way of describing transitive dependency:
• If A functionally governs B, AND
• If B functionally governs C,
• THEN A functionally governs C,
• provided that neither B nor C determines A (B ↛ A and C ↛ A).
In the usual notation:
{(A → B) AND (B → C)} ⟹ A → C, provided that B ↛ A and C ↛ A
Normal Forms
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
Normalization towards a logical design consists of the following steps:
• Unnormalized Form: Identify all data elements
• First Normal Form: Find the key with which you can find all data
• Second Normal Form: Remove part-key dependencies. Make all data dependent on the
whole key.
• Third Normal Form:
• Remove non-key dependencies. Make all data dependent on nothing but the key.
• For most practical purposes, databases are considered normalized if they adhere to third
normal form.

First Normal Form(1NF)
• Definition: a table (relation) is in 1NF:
Ø There are no duplicated rows in the table. Unique identifier
Ø Each cell is single-valued (i.e., there are no repeating groups, no
composite attributes).
Ø Entries in a column (attribute, field) are of the same kind.
Ø Determine the PK of the new entity
Ø Repeat steps until no more repeating groups.
Example: for First Normal Form (1NF)
• Unnormalized Form

EmpID | FirstName | LastName | Skill             | SkillType                          | School             | SchoolAdd                       | SkillLevel
12    | Abebe     | Mekuria  | SQL, VB6          | Database, Programming              | AAU, Helico        | Sidist_Kilo, Piazza             | 5, 8
16    | Lemma     | Alemu    | C++, IP           | Programming, Programming           | Unity, Jimma       | Gerji, Jimma City               | 6, 4
28    | Chane     | Kebede   | SQL               | Database                           | AAU                | Sidist_Kilo                     | 10
65    | Almaz     | Belay    | SQL, Prolog, Java | Database, Programming, Programming | Helico, Jimma, AAU | Piazza, Jimma City, Sidist_Kilo | 9, 8, 6
24    | Dereje    | Tamiru   | Oracle            | Database                           | Unity              | Gerji                           | 5
94    | Alem      | Kebede   | Cisco             | Networking                         | AAU                | Sidist_Kilo                     | 7

FIRST NORMAL FORM (1NF) - How can we make it 1NF?

• Create new rows so that each cell contains only one value (form a new relation for each non-atomic attribute or nested
relation).
• Remove all repeating groups.
• Distribute the multi-valued attributes into different rows and identify a unique identifier for the relation, so that it can be
said to be a relation in a relational database.
Example: for First Normal Form (1NF)
• Normalized Form
EmpID | FirstName | LastName | SkillID | Skill  | SkillType   | School | SchoolAdd   | SkillLevel
12    | Abebe     | Mekuria  | 1       | SQL    | Database    | AAU    | Sidist_Kilo | 5
12    | Abebe     | Mekuria  | 3       | VB6    | Programming | Helico | Piazza      | 8
16    | Lemma     | Alemu    | 2       | C++    | Programming | Unity  | Gerji       | 6
16    | Lemma     | Alemu    | 7       | IP     | Programming | Jimma  | Jimma City  | 4
28    | Chane     | Kebede   | 1       | SQL    | Database    | AAU    | Sidist_Kilo | 10
65    | Almaz     | Belay    | 1       | SQL    | Database    | Helico | Piazza      | 9
65    | Almaz     | Belay    | 5       | Prolog | Programming | Jimma  | Jimma City  | 8
65    | Almaz     | Belay    | 8       | Java   | Programming | AAU    | Sidist_Kilo | 6
24    | Dereje    | Tamiru   | 4       | Oracle | Database    | Unity  | Gerji       | 5
94    | Alem      | Kebede   | 6       | Cisco  | Networking  | AAU    | Sidist_Kilo | 7

FIRST NORMAL FORM (1NF)

Remove all repeating groups.
All attributes depend on the key.
Second Normal Form (2NF)

• No partial dependency of a non-key attribute on part of the primary key.

• Any table that is in 1NF and has a single-attribute (i.e., non-composite) key is
automatically also in 2NF.
Definition: a table (relation) is in 2NF if:
Ø It is in 1NF, and
Ø All non-key attributes are fully dependent on the entire primary key,
Ø i.e., there is no partial dependency.
• Remove part-key dependencies (partial dependencies):
• Make all attribute data dependent on the whole key.
• Decompose and set up a new relation for each partial key with its dependent
attribute(s).
Cont..’
EmpID | EmpName | ProjNo | ProjName | ProjLoc | ProjFund | ProjMangID | Incentive

Business rule: Whenever an employee participates in a project, he/she will be entitled to
an incentive.
• This schema is in 1NF since we don’t have any repeating groups or attributes with a
multi-valued property.
• To convert it to 2NF we need to remove all partial dependencies of non-key attributes on
part of the primary key.
• {EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive
• But in addition to this we have the following dependencies:
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive
• Some non-key attributes are partially dependent on part of the primary key. This can
be witnessed by analyzing the first two functional dependencies (FD1 and FD2).
• Thus, each functional dependency, with its dependent attributes, should be moved to a
new relation where the determinant will be the primary key.
Cont…’
• EMPLOYEE (EmpID, EmpName)

• PROJECT (ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)

• EMP_PROJ (EmpID, ProjNo, Incentive)
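The 2NF decomposition above can be sketched with sqlite3 (the sample data values are illustrative, not from the original slides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Decomposed 2NF schema: each determinant becomes a primary key.
cur.executescript("""
CREATE TABLE EMPLOYEE (EmpID INTEGER PRIMARY KEY, EmpName TEXT);
CREATE TABLE PROJECT  (ProjNo INTEGER PRIMARY KEY, ProjName TEXT,
                       ProjLoc TEXT, ProjFund REAL, ProjMangID INTEGER);
CREATE TABLE EMP_PROJ (EmpID INTEGER REFERENCES EMPLOYEE,
                       ProjNo INTEGER REFERENCES PROJECT,
                       Incentive REAL,
                       PRIMARY KEY (EmpID, ProjNo));
""")
cur.execute("INSERT INTO EMPLOYEE VALUES (1, 'Abebe')")
cur.execute("INSERT INTO PROJECT VALUES (10, 'ERP', 'Addis', 5000.0, 2)")
cur.execute("INSERT INTO EMP_PROJ VALUES (1, 10, 250.0)")

# EmpName is now stored once per employee, not once per assignment;
# the original wide row is recovered with a join.
row = cur.execute("""
    SELECT e.EmpName, p.ProjName, ep.Incentive
    FROM EMP_PROJ ep
    JOIN EMPLOYEE e ON e.EmpID = ep.EmpID
    JOIN PROJECT p ON p.ProjNo = ep.ProjNo
""").fetchone()
print(row)  # ('Abebe', 'ERP', 250.0)
```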
Third Normal Form (3NF)
• Eliminate columns that are dependent on another non-primary-key column.
• If attributes do not contribute to a description of the key,
• i.e., if they are not directly dependent on it, remove them to a separate table.
Definition: a table (relation) is in 3NF if:
• It is in 2NF, and
• All attributes depend on nothing but the key:
• there are no transitive dependencies between the primary key and non-primary-key
attributes.
• Generally, a table is said to be normalized if it reaches 3NF.
• A database with all tables in 3NF is said to be a normalized database.
Cont…’
Example for 3NF. Assumption: students of the same batch (same
year) live in one building or dormitory.
STUDENT
StudID | Stud_FName | Stud_LName | Dept    | Year | Dormitory
125/97 | Abebe      | Mekuria    | Info Sc | 1    | 401
654/95 | Lemma      | Alemu      | Geog    | 3    | 403
842/95 | Chane      | Kebede     | CompSc  | 3    | 403
165/97 | Alem       | Kebede     | InfoSc  | 1    | 401
985/95 | Almaz      | Belay      | Geog    | 3    | 403
Cont…’
• This schema is in 2NF since the primary key is a single attribute.
• Let’s take StudID, Year, and Dormitory and examine the dependencies:
• StudID → Year AND Year → Dormitory
• And Year cannot determine StudID, and Dormitory cannot determine StudID.
Then, transitively, StudID → Dormitory.
• To convert it to 3NF we need to remove all transitive dependencies of non-key
attributes on other non-key attributes.
• The non-primary-key attributes that depend on each other will be moved to
another table and linked with the main table using a candidate key - foreign key
relationship.
Cont….’
STUDENT
StudID | StudF_Name | StudL_Name | Dept    | Year
125/97 | Abebe      | Mekuria    | Info Sc | 1
654/95 | Lemma      | Alemu      | Geog    | 3
842/95 | Chane      | Kebede     | CompSc  | 3
165/97 | Alem       | Kebede     | InfoSc  | 1
985/95 | Almaz      | Belay      | Geog    | 3

DORM
Year | Dormitory
1    | 401
3    | 403
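The 3NF decomposition can likewise be sketched with sqlite3, showing that the dormitory of any student is still recoverable with a join (only a subset of the sample rows is loaded here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE DORM (Year INTEGER PRIMARY KEY, Dormitory TEXT);
CREATE TABLE STUDENT (StudID TEXT PRIMARY KEY, FName TEXT, LName TEXT,
                      Dept TEXT, Year INTEGER REFERENCES DORM(Year));
INSERT INTO DORM VALUES (1, '401'), (3, '403');
INSERT INTO STUDENT VALUES
  ('125/97', 'Abebe', 'Mekuria', 'Info Sc', 1),
  ('654/95', 'Lemma', 'Alemu', 'Geog', 3);
""")

# Dormitory is stored once per year, not once per student;
# StudID -> Year -> Dormitory is followed via the join.
rows = cur.execute("""
    SELECT s.StudID, s.FName, d.Dormitory
    FROM STUDENT s JOIN DORM d ON s.Year = d.Year
    ORDER BY s.StudID
""").fetchall()
print(rows)  # [('125/97', 'Abebe', '401'), ('654/95', 'Lemma', '403')]
```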
SQL LANGUAGE

Structured Query Language (SQL) is a query language that is standardized for
RDBMSs.
SQL statements (commonly referred to as 'queries') are run to retrieve or modify
the requested information from the database.
SQL supports:
Data Definition Language (DDL)
Data Manipulation Language (DML)
DQL (Data Query Language)
DCL (Data Control Language)
TCL (Transaction Control Language)
SQL Commands
The Relational Data Model and the Relational Algebra.

• Relational Model

• The data and the relations between them are organized in tables.

• A table is a collection of records and each record in a table contains the same
fields organized in columns.

• The records in the table form the rows of the table.


Cont…’

• Properties of Relational Tables:

ØValues Are Atomic

ØEach Row is Unique

ØColumn Values Are of the Same Kind

ØThe Sequence of Columns is Insignificant

ØThe Sequence of Rows is Insignificant

ØEach Column Has a Unique Name


Relational Schema
• A relation in a relational model consists of:
• The Relation schema: - that describes the column heads for the table, and
• The Relation instance: - that is the table with the set of tuples.
• The relation schema specifies:
• The relation's name
• Name for each attribute (field or column), and
• Domain of each attribute
Relational Algebra
Relational Algebra is a procedural query language that consists of a set of operations
that take one or two relations as input and produce a new relation as a result.
Fundamental Operations of Relational Algebra
 Unary Operators:
Selection :- σ
Projection :- π
Rename :- ρ
 Binary Operators:
Product (Cartesian Product) :- ×
Union :- ∪
Difference :- −
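Since relations are sets of tuples, the fundamental operations can be sketched directly with Python sets (the Student relation and its attributes are illustrative):

```python
# A relation as a set of tuples; column positions stand in for attribute names.
Student = {("Abebe", "InfoSc"), ("Lemma", "Geog"), ("Chane", "CompSc")}

def select(rel, pred):
    """Selection (sigma): keep the tuples satisfying a predicate."""
    return {t for t in rel if pred(t)}

def project(rel, idxs):
    """Projection (pi): keep only the chosen columns (duplicates collapse)."""
    return {tuple(t[i] for i in idxs) for t in rel}

def product(r, s):
    """Cartesian product (x): concatenate every pair of tuples."""
    return {t + u for t in r for u in s}

# Union and difference are the plain set operators | and - on compatible relations.
geog_students = select(Student, lambda t: t[1] == "Geog")
names = project(Student, [0])
print(geog_students)  # {('Lemma', 'Geog')}
```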
Advanced Database Tutorial

Motivation of ODBMSs

• Complex objects in emerging DBMS applications cannot be effectively
represented as records in the relational model.
• Representing information in RDBMSs requires complex and inefficient
conversion into and from the relational model to the application programming
language.
• ODBMSs provide a direct representation of objects to DBMSs, overcoming the
impedance mismatch problem.

What is Object Oriented Database? (OODB)

• A database system that incorporates all the important object-oriented concepts

• Some additional features

• Unique Object identifiers

• Persistent object handling

Object Oriented Database Management
qObject Oriented databases have evolved along two different paths:
qPersistent Object Oriented Programming Languages: (pure ODBMSs)
üStart with an OO language (e.g., C++, Java, SMALLTALK) which has a rich type
system
üAdd persistence to the objects in programming language where persistent objects
stored in databases
qObject Relational Database Management Systems (SQL Systems)
üExtend relational DBMSs with the rich type system and user-defined functions.
üProvide a convenient path for users of relational DBMSs to migrate to OO
technology
üAll major vendors (e.g., Informix, Oracle) support or will support these features of SQL.

Object Oriented Concepts

q Object:
• an observable entity in the world being modeled
• similar in concept to an entity in the E/R model
• An object consists of:
Øattributes: properties built from primitive types
Ørelationships: properties whose type is a reference to some other object or a
collection of references
Ømethods: functions that may be applied to the object.
qClass
• Similar objects with the same set of properties, describing similar real-world
concepts, are collected into a class.
Class Extents

• For each OQL class, an extent may be declared.


• Extent is the current set of objects belonging to the class.
• Similar notion to the relation in the relational model.
• Queries in OQL refer to the extent of a class and not the class directly.
Subclasses and Inheritance
• A class can be declared to be a subclass of another class.
• Subclasses inherit all the properties
• attributes
• relationships
• methods
from the superclass.
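Attributes, methods, and subclass inheritance can be illustrated with a small Python sketch (the Vehicle/Truck classes and their fields are hypothetical, not from the slides):

```python
class Vehicle:
    """Superclass: objects with attributes and a method."""
    def __init__(self, vid, speed):
        self.vid = vid        # attribute
        self.speed = speed    # attribute
    def describe(self):       # method applied to the object
        return f"{self.vid} @ {self.speed} km/h"

class Truck(Vehicle):
    """Subclass: inherits attributes and methods from Vehicle."""
    def __init__(self, vid, speed, capacity):
        super().__init__(vid, speed)
        self.capacity = capacity  # additional property of the subclass

t = Truck("T1", 80, 12)
print(t.describe())  # inherited method: T1 @ 80 km/h
```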

Multiple Inheritance
• A class may have more than one superclass.

• A class inherits properties from each of its superclasses.

• There is a potential for ambiguity: a variable with the same name may be inherited
from two superclasses. Resolutions:
• rename the variable

• choose one of them
Object Identity
• Object identity is a property of data that is created in the context of an object data model,
where an object is assigned a unique internal object identifier, or OID.

• Object identity is a stronger notion of identity than in relational DBMSs.

• Identity in relational DBMSs is value based (primary key).

• Identity in ODBMSs built into data model

• no user specified identifier is required

• An OID is a similar notion to a pointer in a programming language.

• Object identifier (OID) can be stored as attribute in object to refer to another object.
Persistence

• Objects created may have different lifetimes:


• Transient: allocated memory managed by the programming language run-time system.
üE.g., local variables in procedures have a lifetime of a procedure execution
üglobal variables have a lifetime of a program execution
• Persistent: allocated memory and stored managed by ODBMS runtime system.

ü Classes are declared to be persistence-capable or transient.

• Different languages have different mechanisms to make objects persistent:


• creation time: Object declared persistent at creation time (e.g., in C++ binding) (class must be
persistent-capable)
• persistence by reachability: object is persistent if it can be reached from a persistent object
(e.g., in Java binding) (class must be persistent-capable).
Encapsulation
Encapsulation hides the implementation details of your database(s), including their physical
schemas, from your business code.

• To encourage encapsulation, an operation is defined in two parts:

q signature or interface of the operation, specifies the operation name and arguments
(or parameters).

qmethod or body, specifies the implementation of the operation

Polymorphism

qThis refers to an operation’s ability to be applied to different types of objects; in


such a situation, an operation name may refer to several distinct implementations,
depending on the type of objects it is applied to. This feature is also called operator
overloading

q Polymorphism is the capability of an object to take multiple forms. This ability


allows the same program code to work with different data types

Complex Objects
• Unstructured complex object:
q This is provided by a DBMS and permits the storage and retrieval of large objects
that are needed by the database application.
q Typical examples of such objects are bitmap images and long text strings (such as
documents); they are also known as binary large objects, or BLOBs for short.
• Structured complex object:
§ This differs from an unstructured complex object in that the object’s structure is
defined by repeated application of the type constructors provided by the OODBMS.
§ Hence, the object structure is defined and known to the OODBMS.
§ The OODBMS also defines methods or operations on it.
Introduction to Query Processing

qQuery optimization

üThe process of choosing a suitable execution strategy for processing a query.

üQuery optimization techniques are used to choose an efficient execution plan that
will minimize the runtime as well as many other types of resources, such as the number of
disk I/Os, CPU time, and so on.

q Query Processing is the procedure of transforming a high-level query (such as SQL) into a
correct and efficient execution plan expressed in a low-level language.
Steps of query processing

Translating SQL Queries into Relational Algebra
qQuery block:
üThe basic unit that can be translated into the algebraic operators and optimized.
üA query block contains a single SELECT-FROM-WHERE expression, as well
as GROUP BY and HAVING clause if these are part of the block.
q Nested queries
ü within a query are identified as separate query blocks.
üAggregate operators in SQL must be included in the extended algebra.

Using Heuristics in Query Optimization

• Process for heuristics optimization


1. The parser of a high-level query generates an initial internal representation.
2. Heuristic rules are applied to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations, based on the
access paths available on the files involved in the query.

• The main heuristic is to apply first the operations that reduce the size of intermediate
results.
• E.g., Apply SELECT and PROJECT operations before applying other binary
operations.

Internal representation of Query Optimization
Query tree:
ØA tree data structure that corresponds to a relational algebra expression. It represents the
input relations of the query as leaf nodes of the tree, and represents the relational algebra
operations as internal nodes.

ØAn execution of the query tree consists of executing an internal node operation whenever
its operands are available and then replacing that internal node by the relation that results
from executing the operation.
Query graph:
• A graph data structure that corresponds to a relational calculus expression. It does not
indicate an order in which to perform operations. There is only a single graph
corresponding to each query.
Using Selectivity and Cost Estimates in Query Optimization
• Cost-based query optimization:
• Estimate and compare the costs of executing a query using different execution strategies and
choose the strategy with the lowest cost estimate.
• (Compare to heuristic query optimization)
• Issues
• Cost function
• Number of execution strategies to be considered
• Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
• Note: Different database systems may focus on different cost components
Cont’s
• Catalog Information Used in Cost Functions
• Information about the size of a file
• number of records (tuples) (r),
• record size (R),
• number of blocks (b)
• blocking factor (bfr)
• Information about indexes and indexing attributes of a file
• Number of levels (x) of each multilevel index
• Number of first-level index blocks (bI1)
• Number of distinct values (d) of an attribute
• Selectivity (sl) of an attribute
• Selection cardinality (s) of an attribute. (s = sl * r)
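How these catalog statistics relate can be shown with a few lines of arithmetic (all numbers here are made-up sample values, and the selectivity assumes a uniform distribution of the attribute's values):

```python
# Catalog statistics for a hypothetical file and attribute:
r = 10000           # number of records (tuples)
R = 100             # record size in bytes
B = 4096            # block size in bytes (an assumption, not in the catalog list)

bfr = B // R        # blocking factor: records that fit in one block
b = -(-r // bfr)    # number of blocks: ceiling of r / bfr

d = 50              # number of distinct values of the attribute
sl = 1 / d          # selectivity, assuming values are uniformly distributed
s = sl * r          # selection cardinality: expected records per value

print(bfr, b, s)    # 40 250 200.0
```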

Overview of Query Optimization in Oracle
• Oracle DBMS V8
• Rule-based query optimization: the optimizer chooses execution plans based
on heuristically ranked operations.
• (Currently it is being phased out.)
• Cost-based query optimization: the optimizer examines alternative access
paths and operator algorithms and chooses the execution plan with the lowest
estimated cost.
• The query cost is calculated based on the estimated usage of resources
such as the I/O, CPU, and memory needed.
• Application developers can specify hints to the Oracle query optimizer.
• The idea is that an application developer might know more about
the data.
INTRODUCTION TO TRANSACTION PROCESSING

• A Transaction:
• Logical unit of database processing that includes one or more access operations (read -
retrieval, write - insert or update, delete).
• A transaction (set of operations) may be stand-alone specified in a high level language like
SQL submitted interactively, or may be embedded within application program.
• Example : Transfer of money that amounts 100 from checking account to savings account
• Transaction boundaries:
• Begin and End transaction.
• An application program may contain several transactions separated by the Begin and
End transaction boundaries
• Basic operations are read and write
• read_item(X): Reads a database item named X into a program variable. To simplify our
notation, we assume that the program variable is also named X.
• write_item(X): Writes the value of program variable X into the database item named X

Why Concurrency Control is needed:
• The Lost Update Problem
• This occurs when two transactions that access the same database items have their
operations interleaved in a way that makes the value of some database item
incorrect. The update made by the first transaction is lost (overwritten) by the
second transaction.
• The Temporary Update (or Dirty Read) Problem
• This occurs when one transaction updates a database item and then the
transaction fails for some reason.
• The updated item is accessed by another transaction before it is changed back to
its original value.
• The Incorrect Summary Problem
• If one transaction is calculating an aggregate summary function on a number of
records while other transactions are updating some of these records, the aggregate
function may calculate some values before they are updated and others after they
are updated.
What causes a Transaction to fail
1. A computer failure (system crash):

A hardware or software error occurs in the computer system during transaction execution. If the hardware crashes, the contents of the
computer’s internal memory may be lost.

2. A transaction or system error:

Some operation in the transaction may cause it to fail, such as integer overflow or division by zero.

3. Local errors or exception conditions detected by the transaction:

ü Certain conditions necessitate cancellation of the transaction.

4. Concurrency control enforcement:

ü The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or
because several transactions are in a state of deadlock

5. Disk failure:

ü Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash.

6. Physical problems and catastrophes:

ü This refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks
or tapes by mistake, and mounting of a wrong tape by the operator.

Transaction states
• Active state
• Partially committed state
• Committed state
• Failed state
• Terminated State

Desirable Properties of Transactions
ACID properties
• Atomicity: A transaction is an atomic unit of processing; it is either
performed in its entirety or not performed at all.
• Consistency preservation: A correct execution of the transaction must
take the database from one consistent state to another.
• Isolation: A transaction should not make its updates visible to other
transactions until it is committed; this property, when enforced strictly,
solves the temporary update problem and makes cascading rollbacks of
transactions unnecessary.
• Durability or permanency: Once a transaction changes the database and
the changes are committed, these changes must never be lost because of
subsequent failure.

69
Database Concurrency Control
• Concurrency Control: the process of managing simultaneous operations on the database without having
them interfere with one another.
• Purpose of Concurrency Control
• To enforce Isolation (through mutual exclusion) among conflicting transactions.
• To preserve database consistency through consistency preserving execution of transactions.
• To resolve read-write and write-write conflicts
Two-Phase Locking Techniques
A lock is a mechanism to control concurrent access to a data item
• Locking is an operation which secures
• (a) permission to Read
• (b) permission to Write a data item for a transaction.
• Example:
• Lock (Li(X)): Data item X is locked on behalf of the requesting transaction.
• Unlocking is an operation which removes these permissions from the data item.
• Example:
• Unlock (Ui(X)): Data item X is made available to all other transactions.
• Lock and Unlock are Atomic operations. 70
Two-Phase Locking Techniques: Essential components
• Two locks modes:
• (a) shared (read) (b) exclusive (write).
• Shared mode: shared lock (X)
• More than one transaction can apply a shared lock on X to read its value, but no write
lock can be applied on X by any other transaction.
• Exclusive mode: Write lock (X)
• Only one write lock on X can exist at any time and no shared lock can be applied by any
other transaction on X.
• Conflict matrix (Y = the two locks are compatible, N = they conflict):

              Read    Write
      Read     Y        N
      Write    N        N
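The conflict matrix can be expressed as a small compatibility check (a sketch; function names are illustrative):

```python
def compatible(held_mode: str, requested_mode: str) -> bool:
    # Only shared (read) locks are mutually compatible; any
    # combination involving a write lock conflicts.
    return held_mode == "read" and requested_mode == "read"

def must_wait(held_modes, requested_mode):
    # A request must wait if it conflicts with any lock already
    # held on the item by another transaction.
    return any(not compatible(h, requested_mode) for h in held_modes)
```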

71
Two-Phase Locking Techniques: The algorithm
• Two Phases:
(a) Locking (Growing)
(b) Unlocking (Shrinking).
• Locking (Growing) Phase:
üA transaction applies locks (read or write) on desired data items one at a time.
üTransaction may obtain locks
üTransaction may not release locks
• Unlocking (Shrinking) Phase:
üA transaction unlocks its locked data items one at a time.
üTransaction may release locks
ü Transaction may not obtain locks
• Requirement:
üFor a transaction these two phases must be mutually exclusive, that is,
during the locking phase the unlocking phase must not start, and during the
unlocking phase the locking phase must not begin.
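The growing/shrinking rule can be sketched as a tiny transaction wrapper (an illustrative class, not a real DBMS API):

```python
class TwoPhaseLockingTxn:
    """Enforces two-phase locking: once the first lock is released
    (shrinking phase), no further lock may be acquired."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False  # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after unlock")
        self.locks.add(item)    # growing phase: acquire one at a time

    def unlock(self, item):
        self.shrinking = True   # growing phase is over
        self.locks.discard(item)
```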

72
DATABASE RECOVERY TECHNIQUES

• Purpose of Database Recovery:


üTo bring the database into the last consistent state, which existed prior to the failure.
üTo preserve transaction properties (Atomicity, Consistency, Isolation and
Durability).
The database may become unavailable for use due to:
üTransaction failure: Transactions may fail because of incorrect input, deadlock,
incorrect synchronization.
üSystem failure: System may fail because of addressing error, application error,
operating system fault, RAM failure, etc.
üMedia failure: Disk head crash, power disruption, etc

73
Transaction Log
• For recovery from any type of failure data values prior to modification
(BFIM - Before Image) and the new value after modification (AFIM
– After Image) are required.
These values and other information are stored in a sequential file called the
transaction log. A sample log is given below
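A minimal sketch of such a log (the record layout, transaction names, and values are illustrative):

```python
# Typical write-ahead log records: each update stores the BFIM
# (old value) and AFIM (new value) so it can be undone or redone.
log = [
    ("start",  "T1"),
    ("write",  "T1", "X", 100, 150),   # (op, txn, item, BFIM, AFIM)
    ("write",  "T1", "Y", 20, 35),
    ("commit", "T1"),
]
```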

74
Data Update

• Immediate Update: As soon as a data item is modified in cache, the disk copy is
updated.

• Deferred Update: All modified data items in the cache are written either after a
transaction ends its execution or after a fixed number of transactions have
completed their execution.

• Shadow update: The modified version of a data item does not overwrite its disk
copy but is written at a separate disk location.

• In-place update: The disk version of the data item is overwritten by the cache
version.
75
Checkpointing
• Time to time (randomly or under some criteria) the database flushes its buffer to
database disk to minimize the task of recovery
• Possible ways for flushing database cache to database disk:
1. Steal: Cache can be flushed before transaction commits.
q It avoids the need for a very large buffer space to store updated pages in
memory.
2. No-Steal: Cache cannot be flushed before transaction commit.
3. Force: Cache is immediately flushed (forced) to disk
q All pages updated by a transaction are immediately written to disk when the
transaction commits).
4. No-Force: Flushing is deferred; pages updated by a transaction need not be
written to disk when it commits.
qIf an updated page remains in the cache, this eliminates the I/O cost of
reading that page again from disk.

76
Different ways for handling recovery:

• Deferred Update (No Undo/Redo)/ (No-Steal/No-Force)


• A set of transactions records their updates in the log.
• At commit point under WAL scheme these updates are saved on database disk.
• After reboot from a failure the log is used to redo all the transactions affected by this failure. No undo is
required because no AFIM is flushed to the disk before a transaction commits.

• Immediate Update (Undo/No-redo )/ (Steal/Force)


• In this algorithm the AFIMs of a transaction are flushed to the database disk under WAL before it commits.
• For this reason the recovery manager undoes all uncommitted transactions during recovery.
• No transaction is redone.
• It is possible that a transaction had completed execution and was ready to commit, but
since it had not committed at the time of failure, it too is undone.
77
Immediate Update (Undo/redo )/ (Steal/No-Force)
ØRecovery schemes of this category apply undo and also redo for recovery.
ØIn a single-user environment no concurrency control is required but a log is maintained under
WAL.
ØNote that at any time there will be one transaction in the system and it will be either in the
commit table or in the active table.
ØThe recovery manager performs:
• Undo of a transaction if it is in the active table.
• Redo of a transaction if it is in the commit table
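The Undo/Redo policy above can be sketched against a log of (op, txn, item, BFIM, AFIM) records; the assumption here is that committed transactions are redone forward and uncommitted ones undone backward:

```python
def recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Undo pass: scan backward, restoring the BFIM of every write
    # made by a transaction that never committed.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, item, bfim, afim = rec
            db[item] = bfim
    # Redo pass: scan forward, reapplying the AFIM of every write
    # made by a committed transaction.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, bfim, afim = rec
            db[item] = afim
    return db
```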
Shadow Paging
• The AFIM does not overwrite its BFIM but is recorded at another place on the disk. Thus, at any
time a data item has its AFIM and its BFIM (the shadow copy of the data item) at two different
places on the disk.
[Figure: data items X and Y on the database disk, with shadow copies X' and Y' at separate locations]
78
Shadow Paging

• To manage access of data items by concurrent transactions two


directories (current and shadow) are used.
• The directory arrangement is illustrated below. Here a page is a data
item.
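A sketch of the two-directory arrangement (page numbers and block names are illustrative):

```python
# The shadow directory is frozen when the transaction starts; the
# current directory is updated to point at newly written page copies.
shadow  = {1: "block_10", 2: "block_11"}
current = dict(shadow)      # starts as a copy of the shadow directory

# Writing page 2 allocates a fresh disk block; the shadow still points
# at the old copy, so recovery simply discards the current directory.
current[2] = "block_57"
```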

79
Recovery in multidatabase system

• A multidatabase system is a special distributed database system where one node may be
running a relational database system under UNIX, another may be running an object-oriented
system under Windows, and so on.

• Use the Two-Phase Commit protocol (2PC):

• Phase 1: when all participating databases signal the coordinator that their part of the
multidatabase transaction (MDT) has concluded, the coordinator sends a “prepare for commit”
message to each participant to get ready for committing the transaction.

• Phase 2: If all participating databases reply ok, the transaction is successful and the
coordinator sends a commit signal to the participating DBs
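The two phases can be sketched as a coordinator loop; the participant interface below is hypothetical:

```python
class Participant:
    """Hypothetical participant site in a multidatabase transaction."""
    def __init__(self, can_commit: bool):
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:      # phase 1: vote on "prepare for commit"
        return self.can_commit

    def commit(self):               # phase 2: make changes permanent
        self.state = "committed"

    def abort(self):                # phase 2: roll back
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all participants replied OK.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.abort()
    return "abort"
```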
80
Distributed Databases and Client-Server Architectures

• A distributed database (DDB) is a collection of multiple, logically related databases
distributed over a computer network, and a distributed database management
system (DDBMS) is a software system that manages a distributed database while
making the distribution transparent to the user.

• Advantages of DDB

ØManagement of distributed data with different levels of transparency:

• This refers to the physical placement of data (files, relations, etc.) which is
not known to the user (distribution transparency).

81
CONT’D
• Distribution and Network transparency:
• Users do not have to worry about operational details of the network.
• There is Location transparency, which refers to the freedom to issue commands
from any location without affecting how they work.
• There is also Naming transparency, which allows access to any named object
(files, relations, etc.) from any location.
• Replication transparency:
• It allows copies of data items to be stored at multiple sites.
• This is done to minimize access time to the required data.
• Fragmentation transparency:
• Allows a relation to be fragmented horizontally (a subset of the tuples of a relation) or
vertically (a subset of the columns of a relation).
82
CONT’D

Ø Increased reliability and availability:


• Reliability refers to system uptime, that is, the system is running efficiently most of the time.
• Availability is the probability that the system is continuously available (usable or accessible)
during a time interval.
• A distributed database system has multiple nodes (computers) and if one fails then others are
available to do the job.
Ø Improved performance:
• A distributed DBMS fragments the database to keep data closer to where it is needed most.
• This reduces data management (access and modification) time significantly.
Ø Easier expansion (scalability):
• Allows new nodes (computers) to be added at any time without changing the entire configuration.

83
Data Fragmentation, Replication and Allocation
Ø Data Fragmentation
• Split a relation into logically related and correct parts. A relation can be fragmented in two ways:
• Horizontal Fragmentation
• Vertical Fragmentation

• Horizontal fragmentation
• A horizontal fragment is a subset of a relation containing those tuples that satisfy a selection condition.
• Consider the Employee relation with selection condition (DNO = 5). All tuples that satisfy this condition
form a subset which is a horizontal fragment of the Employee relation.
• A selection condition may be composed of several conditions connected by AND or OR.
• Derived horizontal fragmentation: applies the partitioning of a primary relation to other, secondary
relations that are related to it through foreign keys.
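The DNO = 5 example can be sketched as a selection over whole tuples (relation and attribute names follow the example; the sample data is illustrative):

```python
employees = [
    {"ssn": "111", "name": "Abebe",  "dno": 5},
    {"ssn": "222", "name": "Sara",   "dno": 4},
    {"ssn": "333", "name": "Kebede", "dno": 5},
]

# Horizontal fragmentation: keep entire tuples that satisfy DNO = 5.
fragment_dno5 = [t for t in employees if t["dno"] == 5]
```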

84
Vertical fragmentation

• It is a subset of a relation which is created by a subset of columns. Thus a vertical

fragment of a relation will contain values of selected columns. There is no selection

condition used in vertical fragmentation.

• Consider the Employee relation. A vertical fragment of it can be created by keeping the

values of Name, Bdate, Sex, and Address.

• Because there is no condition for creating a vertical fragment, each fragment must

include the primary key attribute of the parent relation Employee. In this way all

vertical fragments of a relation are connected
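Vertical fragmentation projects a subset of columns, always carrying the primary key (Ssn is assumed to be the key here) so the fragments can be rejoined. A sketch with illustrative data:

```python
employees = [
    {"ssn": "111", "name": "Abebe", "bdate": "1990-01-01",
     "sex": "M", "address": "Addis", "salary": 3000, "dno": 5},
]

# Vertical fragment: keep the primary key plus the selected columns.
cols = ("ssn", "name", "bdate", "sex", "address")
fragment = [{c: t[c] for c in cols} for t in employees]
```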

85
• Data Replication

• The database, or parts of it, may be replicated at multiple sites.

• In full replication the entire database is replicated and in partial replication some

selected part is replicated to some of the sites.

• Data replication is achieved through a replication schema.

• Data Distribution (Data Allocation)

• This is relevant only in the case of partial replication or fragmentation.

• The selected portion of the database is distributed to the database sites.


86
Types of Distributed Database Systems
• Homogeneous
• All sites of the database system have identical setup, i.e., same database
system software.

• The underlying operating system may be different.

• For example, all sites run Oracle or DB2, or Sybase or some other database
system.

• The underlying operating systems can be a mixture of Linux, Window, Unix,


etc.

87
• Heterogeneous
• Federated: Each site may run different database system but the data access is
managed through a single conceptual schema.

• This implies that the degree of local autonomy is minimum. Each site must
adhere to a centralized access policy. There may be a global schema.

• Multidatabase: There is no one conceptual global schema. For data access a


schema is constructed dynamically as needed by the application software.

88
Query Processing in Distributed Databases
• Issues
• Cost of transferring data (files and results) over the network.
• This cost is usually high so some optimization is necessary.
• Example relations: Employee at site 1 and Department at Site 2
• Employee at site 1: 10,000 rows. Row size = 100 bytes. Table size = 10,000 × 100 =
10^6 bytes.

• Department at Site 2: 100 rows. Row size = 35 bytes. Table size = 100 × 35 = 3,500
bytes.
• Q: For each employee, retrieve the employee name and the department name where
the employee works.
• Q: π Fname,Lname,Dname (Employee ⋈ Dno=Dnumber Department)

89
• Result
• The result of this query will have 10,000 tuples, assuming that every employee is
related to a department.
• Suppose each result tuple is 40 bytes long. The query is submitted at site 3 and
the result is sent to this site.
• Problem: Employee and Department relations are not present at site 3.
• Strategies:
1. Transfer Employee and Department to site 3.
• Total transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and send the result to site 3.
• Query result size = 40 * 10,000 = 400,000 bytes. Total transfer size =
400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join at site 1, and send the
result to site 3.
• Total bytes transferred = 400,000 + 3500 = 403,500 bytes.
• Optimization criteria: minimizing data transfer.
• Preferred approach: strategy 3.
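The arithmetic behind the three strategies can be checked directly (sizes taken from the example above):

```python
emp_size  = 10_000 * 100   # Employee: 10,000 rows x 100 bytes
dept_size = 100 * 35       # Department: 100 rows x 35 bytes
result_size = 10_000 * 40  # 10,000 result tuples of 40 bytes each

strategy1 = emp_size + dept_size     # ship both relations to site 3
strategy2 = emp_size + result_size   # ship Employee to site 2, result to 3
strategy3 = dept_size + result_size  # ship Department to site 1, result to 3

best = min(strategy1, strategy2, strategy3)  # minimize data transfer
```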
90
Concurrency Control and Recovery
• Distributed Databases encounter a number of concurrency control and recovery problems which are not
present in centralized databases. Some of them are listed below.
• Dealing with multiple copies of data items: The concurrency control must maintain global
consistency. Likewise the recovery mechanism must recover all copies and maintain consistency after
recovery
• Failure of individual sites: Database availability must not be affected due to the failure of one or two
sites and the recovery scheme must recover them before they are available for use.
• Communication link failure :This failure may create network partition which would affect database
availability even though all database sites may be running.
• Distributed commit: A transaction may be fragmented into subtransactions executed at a number of sites.
This requires a two- or three-phase commit approach for transaction commit.
• Distributed deadlock: Since transactions are processed at multiple sites, two or more sites may get
involved in deadlock. This must be resolved in a distributed manner.
91
Client-Server Database Architecture
• It consists of clients running client software, a set of servers which provide all
database functionalities and a reliable communication infrastructure.

[Figure: servers 1..n and clients 1..n connected through a communication network]

92
THE END!
Q&A

93
