Distributed Database Concepts
Concepts
Distributed Database.
A logically interrelated collection of shared data
(and a description of this data), physically
distributed over a computer network.
Distributed DBMS.
A software system that permits the management of
the distributed database and makes the
distribution transparent to users.
• Collection of logically-related shared data.
• Data split into fragments.
• Fragments may be replicated.
• Fragments/replicas allocated to sites.
• Sites linked by a communications network.
• Data at each site is under control of a DBMS.
• DBMSs handle local applications autonomously.
• Each DBMS participates in at least one global
application.
Reasons for Data Distribution
• Centralized DBMS vs. Distributed Database System
• A distributed database is a collection of data that
belongs logically to the same system but is physically
spread over the sites of a computer network
• Several factors have led to the development of DDBS:
– Distributed nature of some database applications
– Increased reliability and availability
– Allowing data sharing while maintaining some measure of local
control
– Improved performance
Component Architecture for a DDBMS
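[Figure: site 1 and site 2 linked by a computer network. Each site runs a DDBMS, a data communications (DC) component and a global data dictionary (GDD); a local DBMS (LDBMS) manages the local database (DB).]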
Data Allocation
• Centralized
– Consists of a single database and DBMS stored at one site,
with users distributed across the network.
• Partitioned
– Database partitioned into disjoint fragments,
each fragment assigned to one site.
• Complete Replication
– Consists of maintaining a complete copy of the
database at each site.
• Selective Replication
– Combination of partitioning, replication, and
centralization.
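The allocation strategies can be compared by writing each one as a map from a fragment to the set of sites that store it. Below is a minimal Python sketch. The fragment names F1 to F3, the sites S1 and S2, and every function name are hypothetical. Round-robin placement is only one arbitrary choice for the partitioned case.

```python
from typing import Dict, List, Set

# Hypothetical allocation map: fragment name -> set of sites holding a copy.
Allocation = Dict[str, Set[str]]


def partitioned(fragments: List[str], sites: List[str]) -> Allocation:
    # Each fragment goes to exactly one site (round-robin, for illustration only).
    return {f: {sites[i % len(sites)]} for i, f in enumerate(fragments)}


def complete_replication(fragments: List[str], sites: List[str]) -> Allocation:
    # Every site holds a copy of every fragment, i.e. of the whole database.
    return {f: set(sites) for f in fragments}


def selective_replication(placement: Dict[str, List[str]]) -> Allocation:
    # The designer picks, fragment by fragment, how many copies to keep and where.
    return {f: set(where) for f, where in placement.items()}


def local_fragments(alloc: Allocation, site: str) -> Set[str]:
    # Fragments that a site can read without contacting another site.
    return {f for f, where in alloc.items() if site in where}


fragments = ["F1", "F2", "F3"]
sites = ["S1", "S2"]
part = partitioned(fragments, sites)
full = complete_replication(fragments, sites)
sel = selective_replication({"F1": ["S1"], "F2": ["S1", "S2"], "F3": ["S2"]})
print(sorted(local_fragments(part, "S1")))  # ['F1', 'F3']
print(sorted(local_fragments(full, "S2")))  # ['F1', 'F2', 'F3']
print(sorted(local_fragments(sel, "S2")))   # ['F2', 'F3']
```

The set of sites per fragment makes the trade-off visible. A single site per fragment means no redundancy. All sites means every read is local, but every update must reach every copy.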
Why Fragment?
• Usage
– Applications work with views rather than
entire relations.
• Efficiency
– Data is stored close to where it is most
frequently used.
– Data that is not needed by local applications
is not stored.
• Parallelism
– With fragments as the unit of distribution, a transaction
can be divided into several subqueries that operate
on fragments.
• Security
– Data not required by local applications is not stored
and so is not available to unauthorized users.
• Disadvantages
– Performance
– Integrity.
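The parallelism point can be illustrated by splitting one selection into a subquery per fragment and unioning the partial results. Below is a minimal Python sketch under stated assumptions. The fragment contents and the names FRAGMENTS, subquery and parallel_select are hypothetical. Threads from the standard concurrent.futures module stand in for separate sites. In a real DDBMS each subquery would run on the machine that stores its fragment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical horizontal fragments of an account relation, one per site.
FRAGMENTS: Dict[str, List[Row]] = {
    "site_A": [
        {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
        {"account_number": "A-102", "branch_name": "Downtown", "balance": 400},
    ],
    "site_B": [
        {"account_number": "A-201", "branch_name": "Hillside", "balance": 900},
    ],
}


def subquery(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    # The same selection, applied to a single fragment.
    return [r for r in rows if pred(r)]


def parallel_select(pred: Callable[[Row], bool]) -> List[Row]:
    # One subquery per fragment runs concurrently; the partial results are unioned.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: subquery(rows, pred), FRAGMENTS.values()))
    return [row for part in partials for row in part]


rich = parallel_select(lambda r: r["balance"] > 450)
print([r["account_number"] for r in rich])  # ['A-101', 'A-201']
```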
Correctness of Fragmentation
• Three correctness rules:
– Completeness
– Reconstruction
– Disjointness.
• Completeness
– If relation R is decomposed into fragments
R1, R2, ..., Rn, each data item that can be found in
R must appear in at least one fragment.
• Reconstruction
– Must be possible to define a relational operation
that will reconstruct R from the fragments.
– For horizontal fragmentation, R is reconstructed with the
union operation; for vertical fragmentation, with the join operation.
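The two reconstruction operations can be checked on a small relation held as a set of tuples. Below is a minimal Python sketch. The account rows and the names reconstruct_horizontal and reconstruct_vertical are hypothetical. The vertical case assumes that account_number is a key, so a plain dictionary lookup performs the join.

```python
from typing import Set

# Hypothetical account(account_number, branch_name, balance) as a set of tuples.
account = {
    ("A-101", "Downtown", 500),
    ("A-102", "Downtown", 400),
    ("A-201", "Hillside", 900),
}

# Horizontal fragments: selections on branch_name.
h_downtown = {t for t in account if t[1] == "Downtown"}
h_other = {t for t in account if t[1] != "Downtown"}

# Vertical fragments: projections that both repeat the key account_number.
v_branch = {(t[0], t[1]) for t in account}   # (account_number, branch_name)
v_balance = {(t[0], t[2]) for t in account}  # (account_number, balance)


def reconstruct_horizontal(*fragments: Set[tuple]) -> Set[tuple]:
    # R = R1 union R2 union ... union Rn
    return set().union(*fragments)


def reconstruct_vertical(left: Set[tuple], right: Set[tuple]) -> Set[tuple]:
    # Natural join on account_number; assumes it is a key, so each
    # account_number in `right` maps to exactly one balance.
    balance_of = dict(right)
    return {(k, branch, balance_of[k]) for k, branch in left if k in balance_of}


print(reconstruct_horizontal(h_downtown, h_other) == account)  # True
print(reconstruct_vertical(v_branch, v_balance) == account)    # True
```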
• Disjointness
– If data item di appears in fragment Ri, then it
should not appear in any other fragment.
– Exception: vertical fragmentation, where
primary key attributes must be repeated to allow
reconstruction.
– For horizontal fragmentation, the data item is a tuple.
– For vertical fragmentation, the data item is an
attribute.
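The completeness and disjointness rules can be expressed as small checks. For horizontal fragments the data item is a tuple. For vertical fragments it is an attribute, and the key is allowed to repeat. Below is a minimal Python sketch. The function names and the example rows and attributes are hypothetical. vertical_ok also requires the key in every fragment, because reconstruction by join depends on it.

```python
from typing import List, Set


def is_complete(relation: Set[tuple], fragments: List[Set[tuple]]) -> bool:
    # Completeness: every tuple of R appears in at least one fragment.
    return relation <= set().union(*fragments)


def is_disjoint(fragments: List[Set[tuple]]) -> bool:
    # Disjointness (horizontal): no tuple appears in two fragments.
    seen: Set[tuple] = set()
    for frag in fragments:
        if seen & frag:
            return False
        seen |= frag
    return True


def vertical_ok(schema: Set[str], groups: List[Set[str]], key: Set[str]) -> bool:
    # Vertical case: fragments must cover exactly the schema's attributes.
    if set().union(*groups) != schema:
        return False
    seen: Set[str] = set()
    for g in groups:
        if not key <= g:
            return False  # key missing, so the join cannot rebuild R
        if seen & (g - key):
            return False  # a non-key attribute is repeated
        seen |= g - key
    return True


account = {("A-101", "Downtown", 500), ("A-201", "Hillside", 900)}
good = [{("A-101", "Downtown", 500)}, {("A-201", "Hillside", 900)}]
overlapping = [account, {("A-201", "Hillside", 900)}]
schema = {"account_number", "branch_name", "balance"}
key = {"account_number"}
print(is_complete(account, good), is_disjoint(good))                # True True
print(is_complete(account, overlapping), is_disjoint(overlapping))  # True False
print(vertical_ok(schema, [{"account_number", "branch_name"}, {"account_number", "balance"}], key))  # True
print(vertical_ok(schema, [{"account_number", "branch_name"}, {"branch_name", "balance"}], key))     # False
```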
Types of Fragmentation
• Four types of fragmentation:
– Horizontal
– Vertical
– Mixed
– Derived.
Horizontal Fragmentation
• This strategy is determined by examining the
predicates used by transactions.
• Involves finding a set of minimal (complete and
relevant) predicates.
• A set of predicates is complete if and only if any
two tuples in the same fragment are referenced with
the same probability by any application.
• A predicate is relevant if there is at least one
application that accesses the resulting fragments differently.
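One common way to turn simple predicates into horizontal fragments is minterm fragmentation. Each fragment is defined by a conjunction that takes every simple predicate either as-is or negated. Below is a minimal Python sketch of that technique. The predicates, rows and the name minterm_fragments are hypothetical. The sketch does not prune irrelevant predicates, which is the job of the minimality step described above.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Tuple[str, Callable[[Row], bool]]  # (readable name, test)


def minterm_fragments(rows: List[Row], preds: List[Predicate]) -> Dict[str, List[Row]]:
    """Split rows by every minterm of the simple predicates.

    Each tuple satisfies exactly one minterm, so the resulting
    fragments are complete and disjoint by construction.
    """
    fragments: Dict[str, List[Row]] = {}
    for signs in product([True, False], repeat=len(preds)):
        label = " AND ".join(
            name if keep else f"NOT({name})" for (name, _), keep in zip(preds, signs)
        )
        members = [r for r in rows if all(test(r) == keep for (_, test), keep in zip(preds, signs))]
        if members:  # a minterm no tuple satisfies yields no fragment
            fragments[label] = members
    return fragments


accounts = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Downtown", "balance": 700},
    {"account_number": "A-201", "branch_name": "Hillside", "balance": 900},
]
preds = [
    ("branch_name = 'Downtown'", lambda r: r["branch_name"] == "Downtown"),
    ("balance > 600", lambda r: r["balance"] > 600),
]
for label, members in minterm_fragments(accounts, preds).items():
    print(label, [m["account_number"] for m in members])
```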
Horizontal Fragmentation of account Relation
Vertical Fragmentation
• Vertical fragmentation: the schema for relation r is split
into several smaller schemas
– All schemas must contain a common candidate key (or
superkey) to ensure the lossless-join property.
– A special attribute, the tuple-id, may be added to each
schema to serve as a candidate key.
• Example: relation account with the following schema:
• Account-schema = (branch-name, account-number, balance)
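The account schema can be split vertically by tagging each tuple with a tuple-id that both fragments keep. Below is a minimal Python sketch. The rows and the names vertical_split and rejoin are hypothetical. account-number is already a candidate key and could serve the same purpose. The sketch uses a tuple-id to show the alternative mentioned above.

```python
from typing import List, Tuple

# account-schema = (branch-name, account-number, balance); rows are hypothetical.
Account = Tuple[str, str, int]
account: List[Account] = [
    ("Downtown", "A-101", 500),
    ("Hillside", "A-201", 900),
    ("Downtown", "A-102", 400),
]


def vertical_split(rows: List[Account]) -> Tuple[List[Tuple[int, str, str]], List[Tuple[int, int]]]:
    # Tag each tuple with a system-generated tuple-id that both fragments keep.
    frag1 = [(tid, branch, number) for tid, (branch, number, _) in enumerate(rows)]
    frag2 = [(tid, balance) for tid, (_, _, balance) in enumerate(rows)]
    return frag1, frag2  # (tuple-id, branch-name, account-number), (tuple-id, balance)


def rejoin(frag1: List[Tuple[int, str, str]], frag2: List[Tuple[int, int]]) -> List[Account]:
    # Join on tuple-id to recover the original tuples in their original order.
    balance_of = dict(frag2)
    return [(branch, number, balance_of[tid]) for tid, branch, number in sorted(frag1)]


f1, f2 = vertical_split(account)
print(f2)                         # [(0, 500), (1, 900), (2, 400)]
print(rejoin(f1, f2) == account)  # True
```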
Vertical Fragmentation of employee info Relation
Mixed Fragmentation
Advantages of Fragmentation
• Horizontal:
– allows parallel processing on fragments of a relation
– allows a relation to be split so that tuples are located where they
are most frequently accessed
• Vertical:
– allows tuples to be split so that each part of the tuple is stored
where it is most frequently accessed
– tuple-id attribute allows efficient joining of vertical fragments
– allows parallel processing on a relation
• Vertical and horizontal fragmentation can be mixed.
– Fragments may be successively fragmented to an arbitrary
depth.
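Because fragments may be fragmented again to arbitrary depth, a mixed scheme can be modelled as a tree. Leaves are stored fragments, and each inner node records how its children were split. Reconstruction then walks the tree, applying union at horizontal nodes and join at vertical nodes. Below is a minimal Python sketch. The Fragment class, the reconstruct function and the employee rows are hypothetical. The join assumes each key appears at most once per vertical piece.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class Fragment:
    # kind: "leaf" = stored fragment, "H" = horizontal split, "V" = vertical split.
    kind: str
    rows: List[Row] = field(default_factory=list)
    children: List["Fragment"] = field(default_factory=list)


def reconstruct(node: Fragment, key: str) -> List[Row]:
    if node.kind == "leaf":
        return list(node.rows)
    parts = [reconstruct(child, key) for child in node.children]
    if node.kind == "H":
        # Horizontal pieces recombine by union.
        return [row for part in parts for row in part]
    # Vertical pieces recombine by joining on the shared key: keep a key
    # only if every piece has a row for it (assumes keys are unique per piece).
    merged: Dict[object, Row] = {}
    hits: Dict[object, int] = {}
    for part in parts:
        for row in part:
            merged.setdefault(row[key], {}).update(row)
            hits[row[key]] = hits.get(row[key], 0) + 1
    return [row for k, row in merged.items() if hits[k] == len(parts)]


# employee(emp_id, name, dept, salary): split vertically, then the
# (emp_id, name, dept) piece is split horizontally by dept.
tree = Fragment("V", children=[
    Fragment("H", children=[
        Fragment("leaf", rows=[{"emp_id": 1, "name": "Ana", "dept": "Sales"}]),
        Fragment("leaf", rows=[{"emp_id": 2, "name": "Raj", "dept": "IT"}]),
    ]),
    Fragment("leaf", rows=[{"emp_id": 1, "salary": 50000}, {"emp_id": 2, "salary": 62000}]),
])
print(reconstruct(tree, "emp_id"))
```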
Data Replication (1)
• A relation or fragment of a relation is
replicated if it is stored redundantly in two
or more sites.
• Full replication of a relation is the case
where the relation is stored at all sites.
• Fully redundant databases are those in
which every site contains a copy of the
entire database.
Data Replication (2)
• Advantages of Replication
– Availability: failure of a site containing relation r does not result in
unavailability of r if replicas exist.
– Parallelism: queries on r may be processed by several nodes in
parallel.
– Reduced data transfer: relation r is available locally at each site
containing a replica of r.
• Disadvantages of Replication
– Increased cost of updates: each replica of relation r must be
updated.
– Increased complexity of concurrency control: concurrent updates
to distinct replicas may lead to inconsistent data unless special
concurrency control mechanisms are implemented.
• One solution: choose one copy as the primary copy and apply
concurrency control operations to that primary copy.
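The primary-copy approach can be simulated in a single process. Writers from any site serialize on a lock that belongs to the primary copy, update it, and then propagate the new value to the other replicas. Below is a minimal Python sketch under stated assumptions. ReplicatedItem and its methods are hypothetical. A threading.Lock stands in for the lock manager at the primary site. Propagation is done synchronously inside the lock for simplicity; systems that propagate asynchronously can serve stale reads from non-primary replicas.

```python
import threading
from typing import Callable, Dict, List


class ReplicatedItem:
    """A data item replicated at several sites, with one designated primary copy."""

    def __init__(self, sites: List[str], primary: str, value: int = 0):
        if primary not in sites:
            raise ValueError("primary must be one of the sites")
        self.primary = primary
        self.replicas: Dict[str, int] = {site: value for site in sites}
        # Concurrency control uses only the primary copy's lock.
        self._primary_lock = threading.Lock()

    def write(self, update: Callable[[int], int]) -> None:
        # Writers from any site serialize on the primary copy's lock,
        # update the primary, then push the new value to every other replica.
        with self._primary_lock:
            new_value = update(self.replicas[self.primary])
            for site in self.replicas:
                self.replicas[site] = new_value

    def read(self, site: str) -> int:
        return self.replicas[site]


item = ReplicatedItem(["S1", "S2", "S3"], primary="S1", value=100)


def deposit_many() -> None:
    for _ in range(500):
        item.write(lambda v: v + 1)


workers = [threading.Thread(target=deposit_many) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print({site: item.read(site) for site in ["S1", "S2", "S3"]})  # {'S1': 2100, 'S2': 2100, 'S3': 2100}
```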
Possible Network Topologies
Date’s 12 Rules for Distributed Systems
Rule 0. Fundamental Principle:
TO THE USER, A DISTRIBUTED SYSTEM SHOULD LOOK EXACTLY LIKE
A NONDISTRIBUTED SYSTEM
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction management
9. Hardware independence
10. Operating system independence
11. Network independence
12. DBMS independence
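Location, fragmentation and replication independence (rules 4 to 6) mean that the user names only a global relation. A catalog, not the user, decides which fragments and replicas to read. Below is a minimal Python sketch of that lookup. CATALOG, SITE_DATA, UP and scan are hypothetical stand-ins for a global data dictionary, per-site storage and site status. A real DDBMS would also optimize the plan and handle network failures.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical global catalog: relation -> fragment -> sites holding a replica.
CATALOG: Dict[str, Dict[str, List[str]]] = {
    "account": {"account_north": ["S1", "S3"], "account_south": ["S2"]},
}

# Hypothetical local stores at each site: fragment name -> rows.
SITE_DATA: Dict[str, Dict[str, List[Row]]] = {
    "S1": {"account_north": [{"account_number": "A-101", "balance": 500}]},
    "S2": {"account_south": [{"account_number": "A-201", "balance": 900}]},
    "S3": {"account_north": [{"account_number": "A-101", "balance": 500}]},
}

UP: Dict[str, bool] = {"S1": False, "S2": True, "S3": True}  # S1 is down


def scan(relation: str) -> List[Row]:
    """Return all rows of a global relation; the caller never names a fragment or site."""
    result: List[Row] = []
    for fragment, replicas in CATALOG[relation].items():
        # Replication independence: any available replica will do.
        site = next((s for s in replicas if UP[s]), None)
        if site is None:
            raise RuntimeError(f"no available replica of {fragment}")
        # Fragmentation independence: fragments are unioned behind the scenes.
        result.extend(SITE_DATA[site][fragment])
    return result


print([r["account_number"] for r in scan("account")])  # ['A-101', 'A-201']
```

Because S1 is marked as down, the read of account_north is served from the replica at S3 without any change to the caller's request.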
Rule 1 : Local Autonomy
Autonomy objective: Sites should be autonomous to the maximum extent possible