Distributed Database Design
Definition
– Distributed DBMS (DDBMS): Software system that permits the management of the distributed
database and makes the distribution transparent to users
• Governs storage and processing of logically related data over interconnected computer
systems in which both data and processing functions are distributed among several sites
Motivation for Distributed Database
[Figure: the sites Mohali, Chennai, Bangalore, and Mangalore access a single Corporate Employee Database over a computer network]
Design Strategy
• Top-Down Approach
– Mostly for systems designed from scratch
– Good for a homogeneous environment
• Bottom-Up Approach
– Used when most of the databases already exist at several sites
[Figure: the sites Mohali, Chennai, Bangalore, and Mangalore each hold a local database and are connected by a computer network]
DDBMS Advantages
DDBMS Disadvantages
• A DDBMS must handle all necessary functions imposed by the distribution of data and processing
DDBMS Components
Single-Site Processing, Single-Site Data (SPSD)
• All processing is done on single CPU or host computer (mainframe,
midrange, or PC)
• All data are stored on host computer’s local disk
• Processing cannot be done on end user’s side of the system
• Typical of most mainframe and midrange computer DBMSs
• DBMS is located on the host computer, which is accessed by dumb
terminals connected to it
• Also typical of the first generation of single-user microcomputer databases
Multiple-Site Processing, Single-Site Data (MPSD)
Multiple-Site Processing, Multiple-Site Data (MPMD)
• Fully distributed database management system with support for multiple
data processors and transaction processors at multiple sites
• Classified as either homogeneous or heterogeneous
• Homogeneous DDBMSs
– Integrate only one type of centralized DBMS over a network
• Heterogeneous DDBMSs
– Integrate different types of centralized DBMSs over a network
• Fully heterogeneous DDBMS
– Support different DBMSs that may even support different data models
(relational, hierarchical, or network) running under different computer systems,
such as mainframes and microcomputers
Reference Architecture of DDBMS
Checklist for Fragments
• Why fragment?
• How to fragment?
• How much to fragment?
• How to test correctness?
• How to allocate?
• What are the information requirements?
Fragment Locations
Why Fragment?
• Usage
– Applications work with partitions rather than entire relations
• Efficiency
– Data is stored close to where it is most frequently used
– Data that is not needed by local applications is not stored
• Parallelism
– With fragments as the unit of distribution, a transaction can be divided into several sub-queries that operate on fragments
• Security
– Data not required by local applications is not stored and so is not available to unauthorized users
Fragmentation
• A relation R is divided into fragments r1, r2, …, rn, which contain enough information to allow reconstruction of R
• Example:
We have a relation Employee (EmpNo, EmpName, Address, Designation, PU, Location), where Location is “Bangalore” or “Mangalore”. We can split Employee into two fragments:
• EmployeeBangalore = σ_{Location = “Bangalore”}(Employee)
• EmployeeMangalore = σ_{Location = “Mangalore”}(Employee)
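The split by Location can be sketched in a few lines of Python. This is only an illustration: the sample rows are invented, and relations are modelled as lists of tuples rather than real tables.

```python
# Hypothetical sample rows for Employee(EmpNo, EmpName, Address, Designation, PU, Location).
employee = [
    (101, "Asha", "12 MG Road", "Engineer", "Retail", "Bangalore"),
    (102, "Ravi", "4 Hampankatta", "Analyst", "Banking", "Mangalore"),
    (103, "Meena", "7 Indiranagar", "Manager", "Retail", "Bangalore"),
]
LOCATION = 5  # index of the Location attribute in each tuple


def select(relation, predicate):
    """Relational selection: keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One horizontal fragment per value of Location.
employee_bangalore = select(employee, lambda row: row[LOCATION] == "Bangalore")
employee_mangalore = select(employee, lambda row: row[LOCATION] == "Mangalore")

# Reconstruction: the union of the horizontal fragments gives back Employee.
reconstructed = set(employee_bangalore) | set(employee_mangalore)
print(reconstructed == set(employee))
```

Because every tuple has exactly one Location value, each tuple lands in exactly one fragment, and the union recovers the original relation.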
Types of Fragmentation
Horizontal Fragmentation
• This strategy is determined by looking at the predicates used by transactions
• Involves finding a set of minimal (complete and relevant) predicates
• A set of predicates is complete if and only if any two tuples in the same fragment are referenced with the same probability by any application
• A predicate is relevant if there is at least one application that accesses the resulting fragments differently
Derived Horizontal Fragmentation
• If a relation contains more than one foreign key, one of the referenced relations must be selected as the parent.
• The choice can be based on the fragmentation used most frequently or the fragmentation with better join characteristics.
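A small sketch of the idea, assuming that the child relation is fragmented by a semijoin with each fragment of the chosen parent. The relations Dept and Emp, their attributes, and the sample rows are hypothetical and serve only as an illustration.

```python
# Hypothetical parent Dept(DeptNo, Location) and child Emp(EmpNo, EmpName, DeptNo).
dept = [(10, "Bangalore"), (20, "Mangalore"), (30, "Bangalore")]
emp = [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meena", 30), (4, "Kiran", 20)]


def semijoin(child, parent_fragment, fk_index, pk_index):
    """Keep the child tuples whose foreign key matches a key in the parent fragment."""
    parent_keys = {row[pk_index] for row in parent_fragment}
    return [row for row in child if row[fk_index] in parent_keys]


# Primary horizontal fragmentation of the parent on Location.
dept_bangalore = [row for row in dept if row[1] == "Bangalore"]
dept_mangalore = [row for row in dept if row[1] == "Mangalore"]

# Derived fragmentation of the child: one child fragment per parent fragment.
emp_bangalore = semijoin(emp, dept_bangalore, fk_index=2, pk_index=0)
emp_mangalore = semijoin(emp, dept_mangalore, fk_index=2, pk_index=0)

print(emp_bangalore)
print(emp_mangalore)
```

If Emp had a second foreign key, the same semijoin would be run against the fragments of whichever parent is chosen, which is why the choice of parent matters.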
Vertical Fragmentation
Mixed Fragmentation
[Figure: mixed fragmentation of a relation R]
Example - Mixed Fragmentation
Correctness of Fragmentation
Completeness of Fragmentation
Reconstruction of Fragmentation
Disjointness of Fragmentation
• Disjointness: if data item x appears in fragment ri, then it should not appear in any other fragment.
– Exception: vertical fragmentation, where primary key attributes must be repeated to allow reconstruction.
• For horizontal fragmentation, a data item is a tuple.
• For vertical fragmentation, a data item is an attribute.
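The three correctness rules (completeness, reconstruction, disjointness) can be checked mechanically. The sketch below assumes relations are sets of tuples and vertical fragments are described by their attribute sets; all function names and sample data are hypothetical.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of the relation appears in at least one fragment."""
    return all(any(row in fragment for fragment in fragments) for row in relation)


def reconstructs(relation, fragments):
    """Reconstruction (horizontal case): the union of the fragments equals the relation."""
    return set().union(*fragments) == set(relation)


def is_disjoint(fragments):
    """Disjointness (horizontal case): no tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        for row in fragment:
            if row in seen:
                return False
            seen.add(row)
    return True


def vertical_is_disjoint(attribute_sets, primary_key):
    """Vertical case: fragments may share only the primary key attributes."""
    for i, left in enumerate(attribute_sets):
        for right in attribute_sets[i + 1:]:
            if not (left & right) <= primary_key:
                return False
    return True


relation = {(1, "Bangalore"), (2, "Mangalore"), (3, "Bangalore")}
good = [{(1, "Bangalore"), (3, "Bangalore")}, {(2, "Mangalore")}]
overlapping = [{(1, "Bangalore"), (3, "Bangalore")}, {(2, "Mangalore"), (3, "Bangalore")}]

print(is_complete(relation, good), reconstructs(relation, good), is_disjoint(good))
print(is_disjoint(overlapping))
print(vertical_is_disjoint([{"EmpNo", "EmpName"}, {"EmpNo", "Location"}], {"EmpNo"}))
```

The last call illustrates the exception stated above: the two vertical fragments overlap only on the primary key EmpNo, so they still count as disjoint.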
Correctness of Horizontal Fragment
Correctness of Vertical Fragment
Data Allocation
• Four strategies regarding placement of data:
• Centralized
• Partitioned (or Fragmented)
• Complete Replication
• Selective Replication
• Centralized: Consists of a single database stored at one site, with users distributed across the network (this is not a DDB but distributed processing!)
• Partitioned: Database is partitioned into disjoint fragments, each fragment assigned to one site.
• Complete Replication: Consists of maintaining a complete copy of the database at each site.
• Selective Replication: Combination of partitioning, replication, and centralization.
Allocation Strategy
• Database Information
– Selectivity of fragments
– Size of fragments
• Application Information
– Number of read accesses of a query to a fragment
– Number of update accesses of a query to a fragment
– A matrix indicating which query updates which fragment
– A similar matrix for reads
– Originating site of each query
• Site Information
– Unit cost of storing data at a site
– Unit cost of processing at a site
• Network Information
– Communication cost between two sites
– Frame size
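To show how these inputs combine, here is a deliberately simplified sketch that places one non-replicated fragment at the site with the lowest total cost. The cost model (access count times communication cost, plus storage cost) and every number in it are assumptions made for illustration; processing cost and frame size are left out to keep it short.

```python
# Hypothetical communication cost per access between two distinct sites.
COMM_COST = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 5,
    frozenset(("B", "C")): 3,
}
# Hypothetical unit cost of storing data at each site, and the fragment size.
STORAGE_COST = {"A": 2, "B": 2, "C": 1}
FRAGMENT_SIZE = 5
# Hypothetical application information: originating site and access counts per query.
QUERIES = [
    {"origin": "A", "reads": 10, "updates": 2},
    {"origin": "B", "reads": 4, "updates": 1},
    {"origin": "C", "reads": 1, "updates": 0},
]


def comm_cost(site_a, site_b):
    """Local access is free; remote access costs the configured amount."""
    return 0 if site_a == site_b else COMM_COST[frozenset((site_a, site_b))]


def allocation_cost(site):
    """Total cost of placing the single copy of the fragment at the given site."""
    access = sum(
        (q["reads"] + q["updates"]) * comm_cost(q["origin"], site) for q in QUERIES
    )
    return access + STORAGE_COST[site] * FRAGMENT_SIZE


costs = {site: allocation_cost(site) for site in STORAGE_COST}
best_site = min(costs, key=costs.get)
print(costs, best_site)
```

With these numbers the fragment goes to site A, where most accesses originate, even though storage is cheaper at site C.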
Distributed Database Transparency Features
• Allow end user to feel like database’s only user
• Features include:
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
– Heterogeneity transparency
Distribution Transparency
Naming Transparency
Transaction Transparency
• Ensures that all distributed transactions maintain the distributed database’s integrity and consistency.
• A distributed transaction accesses data stored at more than one location.
• Each transaction is divided into a number of subtransactions, one for each site that has to be accessed.
• DDBMS must ensure the indivisibility of both the global transaction and each subtransaction.
• Must ensure both concurrency transparency and failure transparency.
Concurrency Transparency
• Results of concurrently executed transactions must be logically consistent with the results obtained if the transactions were executed one at a time, in some arbitrary serial order. Same fundamental principles as for a centralized DBMS.
• DDBMS must ensure that global and local transactions do not interfere with each other.
• Similarly, DDBMS must ensure consistency of all subtransactions of a global transaction.
• The concurrency control techniques are usually different from the ones used in a centralized DBMS.
Failure Transparency
Performance Transparency
• Distributed Query Processor (DQP) maps data request into ordered sequence
of operations on local databases.
• It must consider fragmentation, replication, and allocation schemas.
• DQP has to decide:
– which fragment to access;
– which copy of a fragment to use;
– which location to use.
• DQP produces execution strategy optimized with respect to some cost function.
• Typically, costs associated with a distributed request include:
– I/O cost;
– CPU cost;
– Communication cost.
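As a rough illustration of the copy-selection decision, the sketch below picks the replica of a fragment with the lowest combined cost. The unweighted sum used as the cost function, the `Replica` type, and all the figures are assumptions made for the example, not a description of any particular optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One copy of a fragment, with hypothetical cost estimates for reading it."""
    site: str
    io_cost: float
    cpu_cost: float
    comm_cost: float

    def total(self):
        # Assumed cost function: plain sum of the three components.
        return self.io_cost + self.cpu_cost + self.comm_cost


def cheapest(replicas):
    """Return the replica with the lowest total estimated cost."""
    return min(replicas, key=Replica.total)


# Hypothetical copies of the same fragment as seen from a query issued in Bangalore.
replicas = [
    Replica("Bangalore", io_cost=8, cpu_cost=1, comm_cost=0),
    Replica("Chennai", io_cost=2, cpu_cost=1, comm_cost=3),
    Replica("Mohali", io_cost=1, cpu_cost=1, comm_cost=10),
]

choice = cheapest(replicas)
print(choice.site, choice.total())
```

In this example the local copy is not the cheapest one: its high I/O estimate outweighs the communication cost of reading the Chennai copy, which is the kind of trade-off the cost function is meant to capture.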
Performance Transparency - Example
• Remote request
– Lets a single SQL statement access data to be processed by a single remote database processor
• Remote transaction
– Accesses data at a single remote site
• Distributed transaction
– Allows a transaction to reference several different (local or remote) DB sites
– Can update or request data from several different sites on a network
Distributed Query
Distributed Transaction
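-- Update the department row at the remote site (reached through a database link).
UPDATE Hanu.dept@sales.in.auto.com
SET loc = 'Gulbarga'
WHERE deptno = 10;
-- Update the matching employee rows in the local database.
UPDATE Hanu.emp
SET deptno = 11
WHERE deptno = 10;
-- Commit both updates together as one distributed transaction.
COMMIT;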
Distributed Concurrency Control
Distributed Transaction Management
Two-Phase Commit Protocol
Two-phase commit – Prepare Phase
• The initiating node, called the global coordinator, asks participating nodes
other than the commit point site to promise to commit or roll back the
transaction, even if there is a failure. If any node cannot prepare, the
transaction is rolled back
Two-phase commit – Commit Phase
Two-phase commit – Other possibility
• Three possibilities
– All participants reply positive within time-out interval
• Commit transaction
– One or more participants reply negative
• Abort transaction
– One or more participants do not reply within time-out interval
• Abort transaction
Two-phase Commit – Actual commit
• Commit transaction
– The coordinator (DTC) sends Commit OK message to all participants
– All participants commit
• Write from temporary durable storage to permanent durable storage
Two-phase Commit – Abort commit
• Abort transaction
– Happens when at least one participant is not ready to commit or a time-out occurs
– DTC sends Abort message to all participants
• All participants roll back
• Changes are removed from temporary durable storage
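The decision rule of the protocol can be sketched as a small in-memory simulation. It assumes a single coordinator, models a time-out as a third kind of vote, and ignores logging and recovery; the class and function names are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"      # participant promises it can commit
    ABORT = "abort"      # participant replies negative
    TIMEOUT = "timeout"  # models a participant that did not reply in time


class Participant:
    """A site taking part in the transaction, with a fixed, pre-set vote."""

    def __init__(self, name, vote):
        self.name = name
        self._vote = vote
        self.state = "active"

    def prepare(self):
        # Phase 1: a ready participant moves its changes to temporary durable storage.
        if self._vote is Vote.READY:
            self.state = "prepared"
        return self._vote

    def commit(self):
        # Phase 2 (commit): make the prepared changes permanent.
        self.state = "committed"

    def rollback(self):
        # Phase 2 (abort): discard any prepared changes.
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit only if every participant votes READY; otherwise abort everywhere."""
    votes = [p.prepare() for p in participants]
    if all(vote is Vote.READY for vote in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"


all_ready = [Participant("Bangalore", Vote.READY), Participant("Chennai", Vote.READY)]
one_silent = [Participant("Bangalore", Vote.READY), Participant("Mohali", Vote.TIMEOUT)]

print(two_phase_commit(all_ready))
print(two_phase_commit(one_silent))
```

The second run shows the time-out case: the Bangalore participant had already prepared, but it is rolled back along with the others because one site never answered.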
Distributed Lock Management
Query Optimization
Summary
• Current database systems can be classified by extent to which they
support processing and data distribution
• DDBMS characteristics are best described as a set of transparencies
• A transaction is formed by one or more database requests
• A database can be replicated over several different sites on a computer
network