Distributed Database Design
Definition
– Distributed DBMS (DDBMS): Software system that permits the management of the
distributed database and makes the distribution transparent to users
• Governs storage and processing of logically related data over interconnected computer systems
in which both data and processing functions are distributed among several sites
Motivation for Distributed Database
DDBMS Advantages
DDBMS Disadvantages
Characteristics of Distributed Management Systems
• Must handle all necessary functions imposed by the distribution of data and
processing
DDBMS Components
Single-Site Processing, Single-Site Data (SPSD)
Multiple-Site Processing, Single-Site Data (MPSD)
Multiple-Site Processing, Multiple-Site Data (MPMD)
• Heterogeneous DDBMSs
– Integrate different types of centralized DBMSs over a network
• Fully heterogeneous DDBMS
– Support different DBMSs that may even support different data models
(relational, hierarchical, or network) running under different computer systems,
such as mainframes and microcomputers
Distributed Database Transparency Features
Distribution Transparency
Reference Architecture of DDBMS
Transaction Transparency
• Distributed transaction
– Can update or request data from several different remote sites on a network
• Remote request
– Lets a single SQL statement access data to be processed by a single
remote database processor
• Remote transaction
– Accesses data at a single remote site
• Distributed transaction
– Allows a transaction to reference several different (local or remote) DB sites
Distributed Query
Distributed Transaction
Distributed Concurrency Control
Issues in Distributed Database Design
Data Allocation
• Centralized: A single database is stored at one site, with users distributed across
the network (this is distributed processing, not a distributed database).
• Partitioned: The database is partitioned into disjoint fragments, each assigned to one
site.
• Complete Replication: A complete copy of the database is maintained at each site.
• Selective Replication: A combination of partitioning, replication, and centralization.
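The four allocation strategies above can be illustrated with a small sketch that maps fragments to sites. The function name `allocate`, the `hot` parameter, and the round-robin placement are assumptions made for illustration; a real design would place fragments according to usage patterns rather than round-robin.

```python
def allocate(fragments, sites, strategy, hot=()):
    """Return {site: set of fragments stored there} for one strategy.

    `hot` is a hypothetical parameter: the fragments that selective
    replication copies to every site.
    """
    placement = {site: set() for site in sites}
    if strategy == "centralized":
        # One site holds the whole database; the others only process.
        placement[sites[0]].update(fragments)
    elif strategy == "partitioned":
        # Each fragment is stored at exactly one site (round-robin here).
        for i, frag in enumerate(fragments):
            placement[sites[i % len(sites)]].add(frag)
    elif strategy == "complete_replication":
        # Every site keeps a full copy.
        for site in sites:
            placement[site].update(fragments)
    elif strategy == "selective_replication":
        # Hot fragments are replicated everywhere, the rest are partitioned.
        for i, frag in enumerate(fragments):
            if frag in hot:
                for site in sites:
                    placement[site].add(frag)
            else:
                placement[sites[i % len(sites)]].add(frag)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return placement
```

For example, `allocate(["f1", "f2", "f3"], ["A", "B"], "partitioned")` stores f1 and f3 at site A and f2 at site B.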
Fragment Locations
Why Fragment?
• Usage
– Applications work with partitions of a relation rather than with entire relations
• Efficiency
– Data is stored close to where it is most frequently used
– Data that is not needed by local applications is not stored
• Parallelism
– With fragments as unit of distribution, transaction can be divided into several sub-queries that operate
on fragments
• Security
– Data not required by local applications is not stored and so not available to unauthorized users
Design Considerations for Fragmentation
• Quantitative information may include:
– frequency with which a transaction is run;
– site from which a transaction is run;
– performance criteria for transactions.
• Qualitative information may include details of the transactions that are executed, such as:
– type of access (read or write);
– predicates of read operations.
Fragmentation
• A relation R is divided into fragments r1, r2, …, rn, which contain enough information to
allow reconstruction of R
• Example:
We have a relation Sells(pub, address, price, type), where type is “small” or “large”.
We can split Sells into two fragments:
– Sells_small = σ_{type = “small”}(Sells)
– Sells_large = σ_{type = “large”}(Sells)
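The Sells example can be sketched in Python by treating a relation as a list of rows and selection as a filter. The sample rows and the helper names `select` and `canonical` are assumptions for illustration; only the schema and the two predicates come from the example.

```python
# Hypothetical sample rows; only the schema comes from the example above.
sells = [
    {"pub": "Anchor", "address": "1 Dock Rd", "price": 2.50, "type": "small"},
    {"pub": "Anchor", "address": "1 Dock Rd", "price": 4.00, "type": "large"},
    {"pub": "Crown", "address": "9 Hill St", "price": 2.80, "type": "small"},
]

def select(relation, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

sells_small = select(sells, lambda row: row["type"] == "small")
sells_large = select(sells, lambda row: row["type"] == "large")

def canonical(relation):
    """Order-independent form of a relation, used only for comparison."""
    return sorted(tuple(sorted(row.items())) for row in relation)

# Reconstruction of a horizontally fragmented relation is the union of its fragments.
reconstructed = sells_small + sells_large
is_complete = canonical(reconstructed) == canonical(sells)
# No row may appear in both fragments.
is_disjoint = not any(row in sells_large for row in sells_small)
```

Because every row has type “small” or “large” and never both, the two fragments together contain every row exactly once.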
Comparison of Strategies for Data Distribution
Types of Fragmentation
Horizontal and Vertical Fragmentation
Horizontal Fragmentation
Vertical Fragmentation
Mixed Fragmentation
Example - Mixed Fragmentation
Derived Horizontal Fragmentation
• S3 = σ_{branchNo = ‘B003’}(Staff)
• S4 = σ_{branchNo = ‘B005’}(Staff)
• S5 = σ_{branchNo = ‘B007’}(Staff)
• Could use derived fragmentation for PropertyForRent, taking the semijoin with each Staff fragment on branchNo:
P_i = PropertyForRent ⋉_{branchNo} S_i, 3 ≤ i ≤ 5
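A minimal sketch of the derived fragmentation above, assuming the operator between PropertyForRent and each Staff fragment is a semijoin on branchNo. The sample rows and the names `semijoin`, `branch_of_fragment`, `staff_fragments`, and `property_fragments` are hypothetical.

```python
def semijoin(left, right, attr):
    """Rows of `left` whose `attr` value appears in at least one row of `right`."""
    keys = {row[attr] for row in right}
    return [row for row in left if row[attr] in keys]

# Hypothetical sample data; only the attribute branchNo is taken from the text.
staff = [
    {"staffNo": "S1", "branchNo": "B003"},
    {"staffNo": "S2", "branchNo": "B005"},
    {"staffNo": "S3", "branchNo": "B007"},
    {"staffNo": "S4", "branchNo": "B003"},
]
property_for_rent = [
    {"propertyNo": "P1", "branchNo": "B003"},
    {"propertyNo": "P2", "branchNo": "B007"},
    {"propertyNo": "P3", "branchNo": "B003"},
    {"propertyNo": "P4", "branchNo": "B005"},
]

# Primary horizontal fragments of Staff, indexed 3..5 as in the text.
branch_of_fragment = {3: "B003", 4: "B005", 5: "B007"}
staff_fragments = {
    i: [s for s in staff if s["branchNo"] == branch]
    for i, branch in branch_of_fragment.items()
}

# Derived fragments: each property follows the Staff fragment of its branch.
property_fragments = {
    i: semijoin(property_for_rent, s_i, "branchNo")
    for i, s_i in staff_fragments.items()
}
```

Each property row ends up at the same site as the staff of its branch, so a join between the two relations on branchNo can be evaluated locally.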
• If a relation contains more than one foreign key, one of the referenced relations must be
selected as the parent.
• The choice can be based on the fragmentation used most frequently or the fragmentation
with better join characteristics.
Correctness of Fragmentation
Completeness of Fragmentation
Reconstruction of Fragmentation
Disjointness of Fragmentation
• Disjointness: if a data item x appears in fragment ri, then it should not appear in any other
fragment.
– Exception: vertical fragmentation, where primary key
attributes must be repeated to allow reconstruction.
• For horizontal fragmentation, a data item is a tuple.
• For vertical fragmentation, a data item is an attribute.
Correctness of Horizontal Fragment
Correctness of Vertical Fragment
• Relation: Bars(name, address, licence, employees, owner)
• Fragments:
– r1 = Π_{name, address, licence}(Bars)
– r2 = Π_{name, employees, owner}(Bars)
• Correctness rules
– Completeness: Each attribute in the Bars relation appears in either r1 or r2
– Reconstruction: The Bars relation can be reconstructed from the fragments by a natural join: Bars = r1 ⋈ r2
– Disjointness: The two fragments are disjoint, except for the primary key, name, which is necessary for reconstruction
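The three correctness rules can be checked mechanically. The sketch below is one way to do it, assuming relations are lists of dicts and that name is the primary key of Bars. The helper names and sample rows are hypothetical, and the fragments tested are (name, address, licence) and (name, employees, owner), so that only the key is shared.

```python
def project(relation, attrs):
    """Relational projection with duplicate elimination."""
    seen, out = set(), []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

def natural_join(r, s):
    """Combine rows that agree on every attribute name they share."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

def check_vertical(relation, key, frag_attrs):
    """Return (complete, reconstructs, disjoint) for a vertical fragmentation."""
    fragments = [project(relation, attrs) for attrs in frag_attrs]
    # Completeness: every attribute appears in at least one fragment.
    complete = set().union(*frag_attrs) == set(relation[0])
    # Reconstruction: the natural join of the fragments gives back the relation.
    joined = fragments[0]
    for frag in fragments[1:]:
        joined = natural_join(joined, frag)
    canon = lambda rows: sorted(tuple(sorted(r.items())) for r in rows)
    reconstructs = canon(joined) == canon(relation)
    # Disjointness: apart from the key, no attribute is repeated.
    non_key = [set(attrs) - set(key) for attrs in frag_attrs]
    disjoint = all(non_key[i].isdisjoint(non_key[j])
                   for i in range(len(non_key))
                   for j in range(i + 1, len(non_key)))
    return complete, reconstructs, disjoint

# Hypothetical sample rows for Bars.
bars = [
    {"name": "Anchor", "address": "1 Dock Rd", "licence": "L1", "employees": 5, "owner": "Ann"},
    {"name": "Crown", "address": "9 Hill St", "licence": "L2", "employees": 8, "owner": "Bob"},
]
result = check_vertical(
    bars, ("name",),
    [("name", "address", "licence"), ("name", "employees", "owner")],
)
```

Repeating a non-key attribute such as address in both fragments would still reconstruct Bars, but the disjointness check would then return False.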
Transparency in Distributed Databases
• Distribution Transparency
• Transaction Transparency
• Performance Transparency
• DBMS Transparency
Distribution Transparency
Naming Transparency
Transaction Transparency
• Ensures that all distributed transactions maintain the distributed database’s integrity and
consistency.
• A distributed transaction accesses data stored at more than one location.
• Each transaction is divided into a number of subtransactions, one for each site that has to
be accessed.
• The DDBMS must ensure the indivisibility of both the global transaction and each of its
subtransactions.
• The DDBMS must therefore provide both concurrency transparency and failure transparency.
Concurrency Transparency
• Results of transactions executing concurrently must be logically consistent with the results
obtained if the transactions were executed one at a time, in some arbitrary serial order. The same
fundamental principles apply as for a centralized DBMS.
• DDBMS must ensure that global and local transactions do not interfere with each other.
• Similarly, DDBMS must ensure consistency of all subtransactions of a global transaction.
• Concurrency control techniques usually differ from those used in a centralized DBMS.
Failure Transparency
Performance Transparency
• Distributed Query Processor (DQP) maps data request into ordered sequence
of operations on local databases.
• It must consider fragmentation, replication, and allocation schemas.
• DQP has to decide:
– which fragment to access;
– which copy of a fragment to use;
– which location to use.
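The three decisions listed above can be sketched as a tiny planner. The catalog structures `fragment_branch`, `replicas`, and `site_cost`, the site names, and the single-number cost model are all assumptions for illustration. A real DQP would weigh I/O, CPU, and communication costs together.

```python
def plan_access(wanted_branches, fragment_branch, replicas, site_cost):
    """Choose, for each relevant fragment, the cheapest site holding a copy."""
    plan = {}
    for fragment, branch in fragment_branch.items():
        # Which fragment to access: skip fragments the query cannot match.
        if branch not in wanted_branches:
            continue
        # Which copy to use and which location: here each copy lives at one
        # site, so picking the cheapest site answers both questions.
        plan[fragment] = min(replicas[fragment], key=lambda site: site_cost[site])
    return plan

# Hypothetical fragmentation, replication, and allocation information.
fragment_branch = {"S3": "B003", "S4": "B005", "S5": "B007"}
replicas = {"S3": ["London", "Glasgow"], "S4": ["Glasgow"], "S5": ["London", "Aberdeen"]}
site_cost = {"London": 1, "Glasgow": 4, "Aberdeen": 6}

plan = plan_access({"B003", "B005"}, fragment_branch, replicas, site_cost)
```

A query restricted to branches B003 and B005 never touches fragment S5, and it reads S3 from the cheaper of its two copies.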
Performance Transparency - Example
Distributed Transaction Management
Two-Phase Commit Protocol
Two-phase commit – Prepare Phase
• The initiating node, called the global coordinator, asks participating nodes
other than the commit point site to promise to commit or roll back the
transaction, even if there is a failure. If any node cannot prepare, the
transaction is rolled back
Two-phase commit – Commit Phase
Two-phase commit – Other possibility
• Three possibilities
– All participants reply positive within time-out interval
• Commit transaction
– One or more participants reply negative
• Abort transaction
– One or more participants do not reply within time-out interval
• Abort transaction
Two-phase Commit – Actual commit
• Commit transaction
– The distributed transaction coordinator (DTC) sends a Commit OK message to all participants
– All participants commit
• Write from temporary durable storage to permanent durable storage
Two-phase Commit – Abort commit
• Abort transaction
– Occurs when at least one participant is not ready to commit or does not reply within the time-out interval
– DTC sends Abort message to all participants
• All participants roll back
• Changes are removed from temporary durable storage
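The prepare, commit, and abort steps described above can be sketched as a single-process simulation. The names `Vote`, `Participant`, and `two_phase_commit` are hypothetical, and the sketch deliberately omits logging, durable storage, real network time-outs, and coordinator failure, all of which an actual protocol implementation needs.

```python
from enum import Enum

class Vote(Enum):
    READY = "ready"          # participant promises it can commit
    NOT_READY = "not_ready"  # participant replies negative
    TIMEOUT = "timeout"      # no reply within the time-out interval

class Participant:
    """A site taking part in the transaction; its vote is fixed for the demo."""

    def __init__(self, name, vote):
        self.name = name
        self.vote = vote
        self.state = "active"

    def prepare(self):
        # Phase 1: a ready participant holds its changes in a prepared state.
        if self.vote is Vote.READY:
            self.state = "prepared"
        return self.vote

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes READY."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: all positive means commit; any negative or missing reply means abort.
    if all(vote is Vote.READY for vote in votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.rollback()
    return "abort"
```

For example, `two_phase_commit([Participant("A", Vote.READY), Participant("B", Vote.TIMEOUT)])` returns `"abort"` and leaves both participants in the aborted state, matching the third of the three possibilities.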
Distributed Lock Management
Query Optimization
Summary
• Current database systems can be classified by extent to which they
support processing and data distribution
• DDBMS characteristics are best described as a set of transparencies
• A transaction is formed by one or more database requests
• A database can be replicated over several different sites on a computer
network