Cs9152 Unit I
CS9152 DATABASE
TECHNOLOGY
UNIT I
DISTRIBUTED DATABASES
TEXT BOOK
1. Elisa Bertino, Barbara Catania, Gian Piero Zarri, Intelligent Database Systems, Addison-Wesley, 2001.
REFERENCES
1. Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, R. T. Snodgrass, V. S. Subrahmanian, Advanced Database Systems, Morgan Kaufmann, 1997.
2. M. Tamer Özsu, Patrick Valduriez, Principles of Distributed Database Systems, Prentice Hall International Inc., 1999.
3. C. S. R. Prabhu, Object-Oriented Database Systems, Prentice Hall of India, 1998.
4. Abdullah Uz Tansel et al., Temporal Databases: Theory, Design, and Implementation, Benjamin/Cummings Publishers, 1993.
5. Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, Third Edition, 2004.
6. Henry F. Korth, Abraham Silberschatz, S. Sudarshan, Database System Concepts, Fourth Edition, McGraw-Hill, 2002.
7. R. Elmasri, S. B. Navathe, Fundamentals of Database Systems, Pearson Education, 2004.
Syllabus:
UNIT I DISTRIBUTED DATABASES 5
Distributed Databases Vs Conventional Databases, Architecture, Fragmentation, Query Processing, Transaction Processing, Concurrency Control, Recovery.
Table of Contents
SL No. Topic
1 Introduction to Distributed Databases
2 Distributed Databases Vs Conventional Databases
3 Architecture
4 Fragmentation
5 Query Processing
6 Transaction Processing
7 Concurrency Control
8 Recovery
9 Sample Questions
10 University Questions
[Figure: diagram with four nodes labeled DBMS and four data stores]
Distributed Processing
Functions of a DDBMS
Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex or Increased complexity in system design
and implementation.
Applications of DDBMS:
Manufacturing - especially multi-plant manufacturing
Military command and control
Electronic fund transfers and electronic trading
Corporate MIS
Airline reservations
Hotel chains
Any organization which has a decentralized organization structure
Users access the distributed data via applications.
Two types of applications:
Local Applications: Applications that do not require data from other sites.
Global Applications: Applications that require data from other sites.
Types of DDBMS
Location Transparency
User does not have to know the location of the data.
Data requests automatically forwarded to appropriate sites
Local Autonomy
Local site can operate with its database when network connections fail
Each site controls its own data, security, logging, recovery
Synchronous Distributed Database
All copies of the same data are always identical
Data updates are immediately applied to all copies throughout network
Good for data integrity
High overhead, slow response times
Advantages
Increased reliability & availability
Local control
Modular growth
Lower communication costs
Faster response
Disadvantages
Software cost & complexity
Processing overhead
Data integrity
Slow response
DDBS = DB + Communication
non-centralised
DDBMS
Motivated by need to integrate operational data and to provide controlled
access
manages the Distributed database
makes the distribution transparent to the user
[Figure: two network diagrams, each connecting Site 1, Site 2, Site 3, Site 4 and Site 5]
Implicit Assumptions
Data is stored at a number of sites; each site logically consists of a single processor.
Processors at different sites are interconnected by a computer network, so this is not a multiprocessor
o multiprocessors belong to parallel database systems
A distributed database is a database, not a collection of files; the data is logically related, as exhibited in the users' access patterns
o relational data model
D-DBMS is a full-fledged DBMS
o not a remote file system, not a TP system
Dimensions of the Problem
Distribution
o Whether the components of the system are located on the same
machine or not
Heterogeneity
o Various levels (hardware, communications, operating system)
o DBMS is the important one
data model, query language, transaction management algorithms
Autonomy
o Not well understood and most troublesome
o Various versions
Design autonomy: Ability of a component DBMS to decide on
issues related to its own design.
Communication autonomy: Ability of a component DBMS to
decide whether and how to communicate with other DBMSs.
Execution autonomy: Ability of a component DBMS to execute
local operations in any manner it wants to.
Issues of a DDBMS
Allocation
Where to locate data and whether to replicate?
Data Fragmentation
Partition the database
Distributed catalog management
Distributed transactions
Distributed Queries
Making all of the above transparent to the user is the key of DDBMSs
                        Full replication       Partial replication    Partitioning
Query Processing        Easy                   Same difficulty        Same difficulty
Directory Management    Easy or nonexistent    Same difficulty        Same difficulty
Concurrency Control     Moderate               Difficult              Easy
Reliability             Very high              High                   Low
Reality                 Possible application   Realistic              Possible application
Topic 4: Fragmentation
Concept
HORIZONTAL FRAGMENTATION
[Figure: the original relation R, with attributes A1, A2, ..., An and tuples T1, ..., Tn, is split by rows: tuples T1 to T60 are stored at Site 1 and tuples T61 to Tn at Site 2.]
How to reconstruct:
R = Rs1 ∪ Rs2 ∪ ... ∪ Rsn
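The reconstruction rule for horizontal fragments can be sketched in a few lines of Python. This is only an illustration: the relation contents, the attribute names `tid` and `a1`, the cut-off at 60 tuples and the function names are assumptions, not part of the notes.

```python
def horizontal_fragments(relation, predicates):
    """Split a relation (list of dict rows) into one fragment per predicate."""
    return [[row for row in relation if pred(row)] for pred in predicates]


def reconstruct(fragments):
    """Rebuild the relation as the union of its horizontal fragments."""
    seen = set()
    result = []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:          # a union keeps each tuple once
                seen.add(key)
                result.append(row)
    return result


# Hypothetical relation R with 100 tuples (attribute names are assumptions).
R = [{"tid": i, "a1": f"v{i}"} for i in range(1, 101)]

# Site 1 holds the first 60 tuples, Site 2 holds the rest.
site1, site2 = horizontal_fragments(
    R, [lambda r: r["tid"] <= 60, lambda r: r["tid"] > 60]
)

rebuilt = sorted(reconstruct([site1, site2]), key=lambda r: r["tid"])
print(len(site1), len(site2), rebuilt == R)
```

Because the two predicates are mutually exclusive and together cover every tuple, the fragments are disjoint and their union gives back R.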
PHF Example
More Examples:
Example
Consider two tables, Emp and PAY:
Emp                      PAY
Id    Name   Dept        Dept   Sal
100          D1          D1     10K
200          D2          D2     20K
300          D3          D3     30K
PAY1
Id    Name   Dept
100          D1
200          D2
PAY2
Id    Name   Dept
300          D3
DHF Correctness
Completeness
o Referential integrity
o Let R be the member relation of a link whose owner is relation S, which is fragmented as F_S = {S_1, S_2, ..., S_n}. Furthermore, let A be the join attribute between R and S. Then, for each tuple t of R, there should be a tuple t' of S such that
t[A] = t'[A]
Reconstruction
o Same as primary horizontal fragmentation.
Disjointness
o Simple join graphs between the owner and the member fragments.
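The three correctness conditions can be checked mechanically on a small case. The sketch below uses the Emp and PAY rows from the example; the split of the owner relation PAY at a salary of 20K is an assumption made here so that the derived fragments match the ones shown, and Name values are left as None because the tables do not give them.

```python
# Owner relation PAY and member relation Emp, as in the example tables.
# Name values are not given in the notes, so they are left as None.
PAY = [("D1", 10_000), ("D2", 20_000), ("D3", 30_000)]           # (Dept, Sal)
EMP = [(100, None, "D1"), (200, None, "D2"), (300, None, "D3")]  # (Id, Name, Dept)

# Assumed primary fragmentation of the owner: Sal <= 20K and Sal > 20K.
pay1 = [p for p in PAY if p[1] <= 20_000]
pay2 = [p for p in PAY if p[1] > 20_000]


def semijoin(member, owner_fragment):
    """Member tuples whose Dept (join attribute A) appears in the owner fragment."""
    depts = {dept for dept, _sal in owner_fragment}
    return [t for t in member if t[2] in depts]


emp1 = semijoin(EMP, pay1)
emp2 = semijoin(EMP, pay2)

# Completeness: every Emp tuple finds an owner tuple with the same Dept.
complete = all(any(t[2] == p[0] for p in PAY) for t in EMP)
# Reconstruction: the union of the derived fragments gives back Emp.
reconstructed = sorted(set(emp1) | set(emp2)) == sorted(EMP)
# Disjointness: no Emp tuple lands in two fragments.
disjoint = not (set(emp1) & set(emp2))

print(emp1, emp2, complete, reconstructed, disjoint)
```

Disjointness holds here because each Dept value appears in exactly one owner fragment, which is the simple join graph case.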
b) Vertical Fragmentation (VF)
VERTICAL FRAGMENTATION
[Figure: the original relation R, with attributes A1, A2, A3, A4 and tuples t1, t2, ..., tn, is split by columns: fragment RS1 (A1, A2, TID) is stored at SITE1 and fragment RS2 (A3, A4, TID) at SITE2. TID (tuple ID) is a hidden attribute that ensures accurate and simple join reconstruction.]
How to Reconstruct:
R = Rs1 ⋈ Rs2 ⋈ ... ⋈ Rsn
Join condition: RS1.TID = RS2.TID
use(qi,Aj) =
branch-name   customer-name   tuple-id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4
Hillside      Kahn            5
Valleyview    Kahn            6
Valleyview    Green           7
deposit1 = Π branch-name, customer-name, tuple-id (employee-info)
account-number   balance   tuple-id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
A-155            62        5
A-408            1123      6
A-639            750       7
deposit2 = Π account-number, balance, tuple-id (employee-info)
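As a rough sketch of how the tuple-id makes vertical fragments joinable, the Python below builds deposit1 and deposit2 from the rows of the two tables and then rebuilds the full rows by matching tuple-ids. The tuple layout and variable names are assumptions made for the illustration.

```python
# Rows of the original relation, taken from the two fragment tables:
# (branch-name, customer-name, account-number, balance)
rows = [
    ("Hillside", "Lowman", "A-305", 500),
    ("Hillside", "Camp", "A-226", 336),
    ("Valleyview", "Camp", "A-177", 205),
    ("Valleyview", "Kahn", "A-402", 10000),
    ("Hillside", "Kahn", "A-155", 62),
    ("Valleyview", "Kahn", "A-408", 1123),
    ("Valleyview", "Green", "A-639", 750),
]

# Attach a tuple-id to every row, then project onto the two attribute groups.
with_tid = [(tid, *row) for tid, row in enumerate(rows, start=1)]
deposit1 = [(branch, customer, tid) for tid, branch, customer, _a, _b in with_tid]
deposit2 = [(account, balance, tid) for tid, _br, _c, account, balance in with_tid]

# Reconstruct with a join on tuple-id.
by_tid = {tid: (account, balance) for account, balance, tid in deposit2}
rebuilt = [(branch, customer, *by_tid[tid]) for branch, customer, tid in deposit1]

print(rebuilt == rows)
```

Without the tuple-id the two fragments share no attribute, so rows such as the two Valleyview/Kahn accounts could not be matched back to the right balances.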
Id    Name   Sal   Dept
100          10K   D1
200          20K   D2
300          30K   D3
Horizontal Fragmentation
Id    Name   Sal   Dept
100          10K   D1
200          20K   D2
Id    Name   Sal   Dept
300          30K   D3
Vertical Fragmentation
Id    Name         Id    Sal   Dept
100                100   10K   D1
200                200   20K   D2
300                300   30K   D3
MIXED FRAGMENTATION
[Figure: the original relation R is split into fragments RS1 (A1, A2, TID) at SITE1 and RS2 (A3, A4, TID) at SITE2; the hidden TID (tuple ID) attribute supports reconstruction through the join condition RS1.TID = RS2.TID.]
How to Reconstruct:
R = Rs1 ⋈ Rs2 ⋈ ... ⋈ Rsn
Advantages of Fragmentation
Horizontal:
allows parallel processing on fragments of a relation
allows a relation to be split so that tuples are located where they are
most frequently accessed
Vertical:
allows tuples to be split so that each part of the tuple is stored where it
is most frequently accessed
tuple-id attribute allows efficient joining of vertical fragments
allows parallel processing on a relation
Vertical and horizontal fragmentation can be mixed.
Fragments may be successively fragmented to an arbitrary depth.
Advantages
1. Permits a number of transactions to be executed concurrently
2. Results in parallel execution of a single query
3. Increases the level of concurrency, also referred to as intra-query concurrency
4. Increased system throughput
Disadvantages
1. Applications whose views are defined on more than one fragment may suffer performance degradation, if applications have conflicting requirements.
2. Simple tasks, like checking for dependencies, would result in chasing after data in a number of sites
PHF Vs VF :
Primary Horizontal
Fragmentation
Vertical Fragmentation
Grouping
Starts by assigning each attribute to
one fragment
SELECT vehicle_id
FROM vehicles
WHERE year = 1977;
Note that this SQL statement will still need to be translated further by the DBMS so
that the functions/methods within the DBMS program can not only process the
request, but do it in a timely manner.
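One way to see this translation step is to ask a real engine for its plan. The sketch below uses Python's standard sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the sample rows and the index name are made up for the illustration, and the plan text itself differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (vehicle_id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany(
    "INSERT INTO vehicles (vehicle_id, year) VALUES (?, ?)",
    [(1, 1975), (2, 1977), (3, 1977), (4, 1980)],   # made-up sample rows
)
conn.execute("CREATE INDEX idx_vehicles_year ON vehicles (year)")

query = "SELECT vehicle_id FROM vehicles WHERE year = 1977"

# Ask SQLite how it intends to evaluate the statement.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step)

result = [row[0] for row in conn.execute(query + " ORDER BY vehicle_id")]
print(result)
conn.close()
```

The same SQL text can be executed by a full table scan or through the index; choosing between such alternatives is the job of the query processor, not of the user.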
Basic Steps in Query Processing
1.
2.
3.
Network Topology
Wide area networks (WAN) point-to-point
o characteristics
low bandwidth
low speed
high protocol overhead
o communication cost will dominate; ignore all other cost factors
o global schedule to minimize communication cost
o local schedules according to centralized query optimization
Local area networks (LAN)
o communication cost not that dominant
o total cost function should be considered
o broadcasting can be exploited (joins)
o special algorithms exist for star networks
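The contrast between the two network types can be made concrete with a toy cost model. Everything numeric below (bandwidth, per-message delay, the size of the plan) is a hypothetical figure chosen only to show the effect, and the additive cost formula is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    bytes_per_second: float    # assumed effective bandwidth
    message_overhead_s: float  # assumed fixed delay per message


def communication_cost(net, messages, payload_bytes):
    """Seconds spent on the network for a plan."""
    return messages * net.message_overhead_s + payload_bytes / net.bytes_per_second


def total_cost(net, local_seconds, messages, payload_bytes):
    """Local processing time plus communication time."""
    return local_seconds + communication_cost(net, messages, payload_bytes)


# Hypothetical figures, chosen only to show the contrast.
wan = Network("WAN", bytes_per_second=10_000, message_overhead_s=0.5)
lan = Network("LAN", bytes_per_second=10_000_000, message_overhead_s=0.001)

# One plan: 2 s of local work, 10 messages, 1 MB shipped between sites.
for net in (wan, lan):
    comm = communication_cost(net, 10, 1_000_000)
    total = total_cost(net, 2.0, 10, 1_000_000)
    print(f"{net.name}: communication share = {comm / total:.0%}")
```

With these numbers almost all of the WAN cost is communication, which is why a WAN optimizer can ignore the other factors, while on the LAN the local work dominates and the total cost has to be considered.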
Properties of Transactions
ACID (Atomicity, Consistency, Isolation, Durability) Property
Atomicity: all or nothing
Consistency: no violation of integrity constraints
Isolation: concurrent changes are invisible and serialisable
Durability: committed updates persist
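Atomicity and consistency can be demonstrated on a single site with Python's standard sqlite3 module. This is a minimal sketch, not a distributed transaction: the account table, its CHECK constraint and the `transfer` helper are assumptions introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, "A", "B", 30)        # succeeds
failed = transfer(conn, "A", "B", 500)   # would overdraw A, so it is undone
print(ok, failed, balances(conn))
```

In the failed transfer the credit to B has already been applied when the debit violates the constraint; the rollback removes it again, so no partial result is visible.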
Reliability Protocols
Atomicity & Durability
Local recovery protocols
Global commit protocols
What is a lock?
A lock is a means of claiming usage rights on some resource.
There can be several different types of resources that can be locked and several different ways of locking those resources.
The Teradata RDBMS applies most of its locks automatically by default. The Teradata lock manager implicitly locks the following objects:
Database, Table, View and Row hash.
Users can apply four different levels of locking on Teradata resources:
Exclusive, Write, Read and Access.
Distributed Locks
Just like centralised mechanisms, but we also need to consider locks that manage replication and sub-transactions
Four modes of management possible:
o Centralised 2PL
Read any copy, update all for updates
Single site, bottleneck, failure?
o Primary Copy 2PL
Distributes locks, one copy designated primary, others slaves
Only primary copy locked for updates, slaves updated later
o Distributed 2PL
Each site manages its own data locks
All copies locked for an update, high cost of comms
o Majority Locking
Diagrammatic representation
[Figure: data items D1, D2 and D3 replicated across Site 1, Site 2 and Site 3; one copy of each item is marked as the primary copy (PC).]
Majority Locking
Extension of distributed 2PL
Does not lock all copies before an update
Needs locks on more than half of the copies of a data item to proceed
If it gets them, it informs the other sites
Otherwise it cancels the request
Only one transaction can hold an exclusive lock on a majority of the copies
Many transactions can hold a shared lock on a majority of the copies
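A toy model of the majority rule for one replicated data item is sketched below. The class name, the lock modes "S" and "X", and the `reachable` parameter (the sites a request manages to contact) are assumptions for the illustration; real lock managers also queue waiting requests, which is left out here.

```python
class MajorityLockManager:
    """Toy model: one lock entry per site holding a copy of the data item."""

    def __init__(self, sites):
        # site -> {"mode": "S" | "X" | None, "holders": set of transaction ids}
        self.sites = {s: {"mode": None, "holders": set()} for s in sites}

    def _grant(self, site, txn, mode):
        entry = self.sites[site]
        if entry["mode"] is None:
            entry["mode"], entry["holders"] = mode, {txn}
            return True
        if entry["mode"] == "S" and mode == "S":
            entry["holders"].add(txn)
            return True
        return False  # conflicts with an existing lock at this site

    def release(self, txn):
        """Drop every lock held by txn."""
        for entry in self.sites.values():
            entry["holders"].discard(txn)
            if not entry["holders"]:
                entry["mode"] = None

    def request(self, txn, mode, reachable=None):
        """Proceed only if locks are granted on more than half of all copies."""
        candidates = self.sites if reachable is None else reachable
        granted = [s for s in candidates if self._grant(s, txn, mode)]
        if len(granted) > len(self.sites) / 2:
            return True
        self.release(txn)  # no majority: cancel and give back partial locks
        return False


mgr = MajorityLockManager(["site1", "site2", "site3"])
print(mgr.request("T1", "X", reachable=["site1", "site2"]))  # 2 of 3 copies
print(mgr.request("T2", "X", reachable=["site2", "site3"]))  # only 1 of 3
mgr.release("T1")
print(mgr.request("T2", "S"), mgr.request("T3", "S"))        # shared majorities
```

Two exclusive majorities would have to overlap in at least one copy, which is why only one transaction can hold an exclusive majority at a time.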
Deadlock
T1 waiting for T2
T2 waiting for T3
T3 waiting for T1
Example
Locally:
T_ext → T3 → T1 → T_ext
T_ext → T1 → T2 → T_ext
T_ext → T2 → T3 → T_ext
Maybe deadlock?
T_ext → T3 → T1 → T2 → T_ext
Site 2 sends its WFG to site 3; site 3 combines the WFGs to
T_ext → T3 → T1 → T2 → T3 → T_ext
Definitely deadlock!
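The combination step can be sketched as a union of wait-for edges followed by a cycle check. In the Python below only the edges between transactions are modelled; the set-of-pairs representation and the function names are assumptions for the illustration.

```python
def merge_wfgs(*graphs):
    """Combine wait-for graphs given as sets of (waiter, holder) edges."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def has_cycle(edges):
    """Depth-first search for a cycle among transaction nodes."""
    adjacency = {}
    for waiter, holder in edges:
        adjacency.setdefault(waiter, set()).add(holder)

    visiting, done = set(), set()

    def visit(node):
        if node in visiting:
            return True          # back edge: the wait chain returns to node
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(nxt) for nxt in adjacency.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(adjacency))


# Local wait-for edges at the three sites of the example.
site1 = {("T3", "T1")}
site2 = {("T1", "T2")}
site3 = {("T2", "T3")}

print(has_cycle(merge_wfgs(site1, site2)))          # two sites: no cycle yet
print(has_cycle(merge_wfgs(site1, site2, site3)))   # T3 -> T1 -> T2 -> T3
```

No single site sees a cycle on its own; the deadlock only becomes certain once the edges from all three sites are combined.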
o When a failure occurs, how do the sites where the failure occurred deal with it?
o Independent: a failed site can determine the outcome of a transaction without having to obtain remote information.
Independent recovery implies non-blocking termination
Topic 8: Recovery
Purpose of Database Recovery
To bring the database into the last consistent state, which
existed prior to the failure.
To preserve transaction properties (Atomicity, Consistency,
Isolation and Durability).
Example: If the system crashes before a fund transfer transaction completes its
execution, then either one or both accounts may have incorrect value. Thus, the
database must be restored to the state before the transaction modified any of the
accounts.
Ensures database is fault tolerant, and not corrupted by software, system or media
failure
7x24 access to mission critical data.
Loss of message
By network protocol
DDBMS deals with it transparently
Site failure
Types of Failure
The database may become unavailable for use due to
Transaction failure: Transactions may fail because of incorrect input,
deadlock, incorrect synchronization.
System failure: System may fail because of addressing error,
application error, operating system fault, RAM failure, etc.
Media failure: Disk head crash, power disruption, etc.
Write-Ahead Logging
When in-place update (immediate or deferred) is used, a log is necessary for recovery, and it must be available to the recovery manager. This is achieved by the Write-Ahead Logging (WAL) protocol.
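The ordering that WAL imposes can be shown with a small in-memory model. This is a sketch under simplifying assumptions: the class and method names are invented, lists and dicts stand in for stable storage, only undo is modelled, and an old value of None is taken to mean that the key did not exist.

```python
class WalStore:
    """In-place updates protected by an undo log that is written first."""

    def __init__(self):
        self.log = []    # stands in for the log on stable storage
        self.data = {}   # stands in for the database on disk

    def write(self, txn, key, new_value):
        # WAL rule: the log record is written before the data is changed.
        self.log.append(("update", txn, key, self.data.get(key), new_value))
        self.data[key] = new_value

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        """After a crash, undo updates of transactions with no commit record."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _txn, key, old_value, _new = rec
                if old_value is None:      # key did not exist before the update
                    self.data.pop(key, None)
                else:
                    self.data[key] = old_value


store = WalStore()
store.write("T1", "x", 10)
store.commit("T1")
store.write("T2", "x", 99)   # T2 never commits: the crash is simulated here
store.write("T2", "y", 5)
store.recover()
print(store.data)
```

Because every in-place change has its old value in the log before it happens, the recovery manager can always roll back the uncommitted work of T2 and keep the committed value written by T1.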
Checkpointing
From time to time (randomly or under some criteria) the database flushes its buffer to the database disk to minimize the task of recovery.
Steal/No-Steal and Force/No-Force
Possible ways of flushing the database cache to the database disk:
Steal: Cache can be flushed before the transaction commits.
No-Steal: Cache cannot be flushed before the transaction commits.
Force: Cache is flushed (forced) to disk when the transaction commits.
No-Force: Cache need not be flushed to disk when the transaction commits.
Recovery Protocol
Protocols at the failed site to complete all transactions outstanding at the time of failure.
Classes of failures
1. Site failure
2. Lost messages
3. Network partitioning
4. Byzantine failures
Effects of failures
1. Inconsistent database
2. Transaction processing is blocked
3. Failed component unavailable
Independent Recovery
A recovering site makes a transition directly to a final state without
communicating with other sites.
Lemma
For a protocol, if a local state's concurrency set contains both an abort and a commit, it is not resilient to an arbitrary failure of a single site.
S_i cannot commit, because other sites may be in abort
S_i cannot abort, because other sites may be in commit
Rule 1: Let S be an intermediate state.
If C(S) contains a commit, assign a failure transition from S to commit
Otherwise, assign a failure transition from S to abort
Unscheduled restarts occur for one of the following reasons:
AMP or disk failure
Software failure
Parity error
Transaction recovery describes how the Teradata RDBMS restarts itself after a
system or media failure.
Two types of automatic recovery of transactions can occur when an unscheduled
restart occurs:
Single transaction recovery
RDBMS recovery
The following table details when these two automatic recovery mechanisms take
place:
This Recovery Type      Happens When
Single transaction
RDBMS
Sample Questions
Topic 1:
1. What is a distributed database? (2M)
2. Define Distributed DBMS (DDBMS) (2M)
3. What are the characteristics of DDBMS ? (2M)
4. What is the important difference between DDBMS and distributed processing? (2M)
5. What are the Functions of a DDBMS ? (2M)
6. What are the Advantages and Disadvantages of DDBSs? (2M)
Topic 2:
1. Explain Distributed Database in detail. ( 8M)
2. Explain Centralized Database System in detail. (8M).
3. List and differentiate between Distributed Databases Vs Conventional
Databases
Topic 3:
1. Explain the Client-Server Architecture with a neat diagram. (8M)
2. Explain the Distributed Database Architecture with a neat diagram. (8M)
3. What is Synchronous Distributed Database?
4. What is Asynchronous Distributed Database?
5. Explain the major issues of a DDBMS in detail. (8M)
6. Explain how Replication (Data Replication) is used in DDBMS. (8M)
Topic 4:
1. What is the concept behind fragmentation? Give examples. (8M)
2. Why do we need fragmentation? (2M)
3. Explain in detail the Types of Fragmentation and give examples for each. (16M)
4. Explain Horizontal Fragmentation (HF) in detail. (8M)
5. Explain Vertical Fragmentation (VF) in detail. (8M)
6. Explain Hybrid Fragmentation (HF) in detail. (8M)
7. Two problems :
8. What are the Advantages and Disadvantages of Fragmentation? (2M)
9. Compare and contrast PHF and VF. (8M)
Topic 5:
1.
2.
3.
4.
5.
6.
7.
8.
9.
Topic 6:
1. What is a transaction? Give examples. (2M)
2. List the responsibilities of the local transaction manager. (2M)
3. Explain the Transaction System Architecture in detail. Illustrate with a neat diagram. (8M)
4. Explain Transaction Structure in detail. (8M)
5. What are the three major Properties of Transactions ? (2M)
6. What is ACID?
7. List and describe the Transaction Processing Issues. (8M).
Topic 7:
1. What is concurrency control? (2M)
2. Explain Concurrency control mechanisms. (2M)
3. What is a lock? (2M)
4. What are Distributed Locks? (2M)
5. Describe Majority Locking. (2M)
6. Explain in detail how concurrency control is handled in DDBMS. (8M)
Topic 8:
1. What is Failure? (2M)
2. What is Recovery? (2M)
3. Is Recovery after failure? Explain. (2M)
4. Explain about Recovery Protocol. (2M)
5. What are the major Effects of failures? (8M)
6. Explain in detail about two automatic recovery mechanisms (8M)
7. Explain in detail on Two-Phase Commit Protocol. (8M)
8. Explain in detail how Recovery is handled in DDBMS. (8M)
University Questions
1. Differentiate homogeneous and heterogeneous databases with reference to distributed databases. (2M)
2. Name the fragmentations supported in a distributed system and write
examples for each. (2M)
3. Explain how concurrency control and recovery techniques are handled in
DDBMS. (16M)
4. Draw simplified physical client Architecture for distributed database systems
and discuss in detail (8M)
5. Discuss the techniques of fragmentation, data replication used in distributed
database design. (8M)