Lecture3-Distributed Introduction
Systems
M. Tamer Özsu
Patrick Valduriez
Distributed DBMS
• It is the software system that permits the management of the distributed database and
makes the distribution transparent to users.
• Each fragment is stored on one or more computers under the control of a separate
DBMS, with the computers connected by a communication network.
• Each site is capable of independently processing user requests that require access to
local data, and is also capable of processing data stored on other computers in the
network.
• Users access the distributed database via applications. Applications are classified
as those that do not require data from other sites (local applications) and those that do
require data from other sites (global applications).
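The local/global distinction can be made concrete with a small fragment catalog. The sketch below assumes a hypothetical catalog (`FRAGMENT_SITES`) that records which sites hold a copy of each fragment; the fragment and site names are illustrative only. An application is treated as local when every fragment it needs has a copy at its home site, and as global otherwise.

```python
# Hypothetical fragment catalog: fragment name -> sites holding a copy.
# Fragment and site names are illustrative, not taken from the lecture.
FRAGMENT_SITES: dict[str, set[str]] = {
    "EMP_LONDON": {"site1"},
    "EMP_PARIS": {"site2"},
    "PROJ": {"site1", "site3"},  # a replicated fragment
}


def classify_application(home_site: str, fragments: list[str]) -> str:
    """Return 'local' if every fragment the application needs has a copy
    at its home site; otherwise the application is 'global'."""
    for fragment in fragments:
        if home_site not in FRAGMENT_SITES[fragment]:
            return "global"
    return "local"


print(classify_application("site1", ["EMP_LONDON", "PROJ"]))  # local
print(classify_application("site1", ["EMP_PARIS"]))           # global
```

Because `PROJ` is replicated at `site1` and `site3`, an application at either of those sites can read it locally, while an application at `site2` would need a remote access.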
Centralized DBMS Environment
Centralized Vs. Distributed Databases
Why Might Data be Distributed
DDBMS - Disadvantages
• Complexity
• Cost
• Security
• Integrity control more difficult
• Lack of standards
• Lack of experience
• Database design more complex
Parallel VS. Distributed Databases
Different Architectures
Parallel VS. Distributed Databases
In Parallel Databases
• Machines are physically close to each other, e.g., in the same server room
• Machines are connected by dedicated high-speed LANs and switches
• Communication cost is assumed to be small
• Can use a shared-memory, shared-disk, or shared-nothing architecture
In Distributed Databases
• Machines can be far from each other, e.g., on different continents
• Can be connected using a general-purpose public network, e.g., the Internet
• Communication cost and problems cannot be ignored
• Usually use a shared-nothing architecture
Types of parallelism
1. Inter-query parallelism
Queries/transactions execute in parallel with one another.
2. Intra-query parallelism
A single query is executed in parallel using multiple processors or
disks (e.g., in a shared-nothing architecture) to improve the query's
response time.
3. Intra-operation parallelism
▪ Execution of a single complex or large operation in parallel on multiple
processors.
◼ Multiple instances of an operator execute concurrently, with each
instance working on a subset of the data.
◼ Intra-operator parallelism is based primarily on partitioning the input
relation into non-overlapping data segments, followed by a final merge of
the results.
For example, the ORDER BY clause of a query over millions of
records can be parallelized across multiple processors.
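The partition-then-merge pattern behind intra-operation parallelism can be sketched for the ORDER BY example. This is a minimal illustration, not a DBMS implementation: the function names are hypothetical, round-robin partitioning is one assumed choice of non-overlapping segmentation, and a thread pool stands in for separate processors (in CPython, `ProcessPoolExecutor` would be needed for true CPU parallelism).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def partition(records: list[int], n: int) -> list[list[int]]:
    """Split records into n non-overlapping segments (round-robin)."""
    return [records[i::n] for i in range(n)]


def parallel_order_by(records: list[int], workers: int = 4) -> list[int]:
    """Sort each segment with its own instance of the sort operator,
    then merge the sorted runs into one ordered result."""
    segments = partition(records, workers)
    # Each worker sorts one segment independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_segments = list(pool.map(sorted, segments))
    # Final merge step: combine the sorted runs in order.
    return list(heapq.merge(*sorted_segments))


data = [42, 7, 19, 3, 88, 1, 56, 23, 10]
print(parallel_order_by(data, workers=3))  # [1, 3, 7, 10, 19, 23, 42, 56, 88]
```

With range partitioning instead of round-robin, each worker would own a disjoint key range and the final step could simply concatenate the sorted segments rather than merge them.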
Types of parallelism
4. Inter-operation parallelism
4.1 Pipelined parallelism
Different operations execute in a pipelined fashion. For example, to
join three tables, one processor may join two of the tables and
send each result record to a second processor as soon as it is
produced. The second processor joins the incoming records with the
third table and produces the final result.
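The three-table join example can be sketched as a two-stage pipeline. In this minimal illustration, two threads stand in for the two processors and a `queue.Queue` stands in for the channel between them; the tables `A`, `B`, `C`, their contents, and the hash-lookup join strategy are all assumptions made for the sketch. Stage 2 starts joining rows with `C` while stage 1 is still producing them.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream between the two stages

# Hypothetical tables of (key, value) rows; keys are assumed unique per table.
A = [(1, "a1"), (2, "a2"), (3, "a3")]
B = [(1, "b1"), (3, "b3")]
C = [(1, "c1"), (3, "c3"), (4, "c4")]


def join_a_b(out: queue.Queue) -> None:
    """Processor 1: join A and B, sending each result row on as soon as it is produced."""
    b_index = dict(B)
    for key, a_val in A:
        if key in b_index:
            out.put((key, a_val, b_index[key]))
    out.put(SENTINEL)


def join_with_c(inp: queue.Queue, out_rows: list) -> None:
    """Processor 2: join each incoming (A JOIN B) row with C as it arrives."""
    c_index = dict(C)
    while (row := inp.get()) is not SENTINEL:
        key = row[0]
        if key in c_index:
            out_rows.append(row + (c_index[key],))


pipe: queue.Queue = queue.Queue()
results: list = []
stage1 = threading.Thread(target=join_a_b, args=(pipe,))
stage2 = threading.Thread(target=join_with_c, args=(pipe, results))
stage1.start()
stage2.start()
stage1.join()
stage2.join()
print(results)  # [(1, 'a1', 'b1', 'c1'), (3, 'a3', 'b3', 'c3')]
```

The benefit of pipelining is that the second join does not wait for the first join to materialize its full result; rows flow between the stages one at a time.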
Types of Distributed Database Systems
Homogeneous Distributed Databases
Homogeneous Distributed Databases: Autonomy
◼ Design autonomy:
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites may have different
operating systems, DBMS products and data models. Its properties
are:
• Different sites use dissimilar schemas and software.
• A site may not be aware of other sites, so there is limited
cooperation in processing user requests.
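One common way to answer a global query over sites with dissimilar schemas is a wrapper/mediator layer that translates each site's data into a shared global schema. The lecture does not prescribe this technique; the sketch below is an assumed illustration in which the `Customer` global schema, the two site layouts, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical global (integrated) schema seen by users."""
    name: str
    city: str


# Hypothetical Site A: relational-style rows with its own column names.
SITE_A_ROWS = [{"cust_nm": "Alice", "cust_city": "Paris"}]

# Hypothetical Site B: records stored as (city, name) tuples.
SITE_B_RECORDS = [("Lyon", "Bob")]


def from_site_a(row: dict) -> Customer:
    """Wrapper for Site A: map its column names onto the global schema."""
    return Customer(name=row["cust_nm"], city=row["cust_city"])


def from_site_b(record: tuple) -> Customer:
    """Wrapper for Site B: reorder its tuple fields into the global schema."""
    city, name = record
    return Customer(name=name, city=city)


def all_customers() -> list[Customer]:
    """Mediator: answer a global query by combining the translated site data."""
    return [from_site_a(r) for r in SITE_A_ROWS] + [from_site_b(r) for r in SITE_B_RECORDS]


for customer in all_customers():
    print(customer.name, customer.city)
```

Each site keeps its own schema and software; only the wrappers know how to map local structures onto the global view.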
Heterogeneous Distributed Databases
(Figure: Site 2 and Site 3 connected by a network, running different DBMSs, an object-oriented DBMS and a relational DBMS, on Linux.)
Distributed DBMS Architectures
Classification of DDBMS
(Figure: classification of DDBMSs along three dimensions (distribution, autonomy, heterogeneity), with example system types: client/server, peer-to-peer distributed DBS, federated DBS, multi-DBS, and distributed multi-DBS.)
History – Early Distribution
❑ Peer-to-Peer (P2P)
❑ Improved performance
❑ Distributed DBMS architecture