DDBS Lec1
Distributed Database
Source:
1. Principles of Distributed Database Systems
By M. Tamer Özsu, Patrick Valduriez
2. Slides available
Survey of advanced topics in Database Systems
• Early commercial systems were generally multiple-client/single-server systems in which the
distribution was mostly in terms of functionality, not data. If multiple servers
were used, clients were responsible for managing the connections to these
servers.
• Transparency of access was not widely supported, and each client had to “know”
the location of the required data. The distribution of data among multiple servers
was very primitive; systems did not support fragmentation or replication of data.
• Systems of the time were “homogeneous” in that each system could manage
only data that were stored in its own database, with no linkage to other
repositories.
• Today’s client/server systems provide significant transparency in accessing data
from multiple servers, support distributed transactions to facilitate transparency,
and execute queries over (horizontally) fragmented data.
• Object database managers have entered the marketplace and have found a
niche market in some classes of applications which are inherently distributed.
Distributed Database System
Distributed database system (DDBS) technology:
It is one of the major recent developments in the database systems area.
It is the union of what appear to be two diametrically opposed approaches to data
processing:
1. Database system and
2. Computer network technologies.
One of the major motivations behind the use of database systems is the desire to
integrate the operational data of an enterprise and to provide centralized, and thus
controlled, access to that data.
The technology of computer networks promotes a mode of work that goes against all
centralization efforts.
The most important objective of the database technology is integration, not centralization.
It is possible to achieve integration without centralization, and that is exactly what the DDB
technology attempts to achieve.
Distributed Data Processing
Distributed computing system: A number of autonomous processing elements
(not necessarily homogeneous) that are interconnected by a computer network and
that cooperate in performing their assigned tasks. The “processing element” is a
computing device that can execute a program on its own.
• Each site has autonomous processing capability and can perform local applications.
• Each site also participates in the execution of at least one global application which
requires accessing data at several sites.
[Figure: Server 1, Server 2 and Server 3, each with its own database (Database 1, 2, 3), interconnected by a communication network]
Promises of DDBSs
- In a DDBS, data can be localized so that data about the employees in the
Edmonton office are stored in Edmonton, those in the Boston office are stored in
Boston, and so forth. The same applies to the project and salary information.
We partition each of the relations and store each partition at a different
site. This is known as fragmentation. It may be preferable to duplicate
some of this data at other sites for performance and reliability reasons. The
result is a distributed database which is fragmented and replicated.
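The fragmentation-plus-replication idea can be illustrated with a small sketch. It assumes a hypothetical EMP relation of (eno, name, office) tuples that is horizontally fragmented by office, and hypothetical helpers fragment_by_office and allocate; it illustrates the concept only and is not how any particular DDBMS implements fragmentation.

```python
from collections import defaultdict

# Hypothetical EMP tuples: (eno, name, office)
EMP = [
    ("E1", "J. Doe", "Edmonton"),
    ("E2", "M. Smith", "Boston"),
    ("E3", "A. Lee", "Edmonton"),
    ("E4", "R. Davis", "Boston"),
]

def fragment_by_office(relation):
    """Horizontal fragmentation: split tuples into disjoint fragments, one per office."""
    fragments = defaultdict(list)
    for tup in relation:
        fragments[tup[2]].append(tup)
    return dict(fragments)

def allocate(fragments, replicas):
    """Store each fragment at its home site, plus a copy at every extra site in `replicas`."""
    sites = defaultdict(dict)
    for office, frag in fragments.items():
        sites[office][office] = frag  # home copy, stored where the data is mostly used
        for extra_site in replicas.get(office, []):
            sites[extra_site][office] = list(frag)  # replicated copy at another site
    return dict(sites)

fragments = fragment_by_office(EMP)
# Hypothetical choice: replicate the Edmonton fragment at Boston for reliability.
sites = allocate(fragments, replicas={"Edmonton": ["Boston"]})
```

Here the Boston site ends up holding both its own fragment and a replica of the Edmonton fragment, giving a database that is both fragmented and replicated.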
To place the database and applications across different sites, there are two alternatives:
i) Partitioned (or non-replicated) and
ii) Replicated.
Partitioned scheme: Database is divided into a number of disjoint partitions each of which is
placed at a different site.
Replicated scheme: It can be fully replicated where the entire database is stored at each site, or
partially replicated where each partition of the database is stored at more than one site but not at
all the sites.
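A minimal sketch of the distinction between these schemes, assuming an allocation is described as a mapping from each partition to the set of sites holding a copy. The function name classify_allocation and the partition and site names are hypothetical.

```python
def classify_allocation(allocation, all_sites):
    """Return the allocation scheme implied by where each partition is stored.

    `allocation` (partition -> set of sites) is a hypothetical input format.
    """
    all_sites = set(all_sites)
    copies = [set(sites) for sites in allocation.values()]
    for c in copies:
        if not c or not c <= all_sites:
            raise ValueError("each partition needs at least one copy at a known site")
    if all(len(c) == 1 for c in copies):
        return "partitioned"  # exactly one copy of every partition
    if all(c == all_sites for c in copies):
        return "fully replicated"  # every site stores the entire database
    # Anything in between (including a mix of single- and multi-copy partitions)
    return "partially replicated"
```

With sites S1, S2 and S3, storing P1 at {S1, S2} and P2 at {S2, S3} is classified as partially replicated, because each partition has more than one copy but no partition is stored at all sites.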
Query processing deals with designing algorithms that analyze queries and
convert them into a series of data manipulation operations. The problem is
how to decide on a strategy for executing each query over the network in the
most cost-effective way.
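To make "most cost-effective" concrete, the sketch below assumes a deliberately simplified, hypothetical cost model in which the only cost is the number of bytes shipped over the network. Relation R is stored at site 1, S at site 2, and the join result is needed at site 3. Real optimizers also weigh CPU and I/O costs, so this is only an illustration of choosing among strategies.

```python
def strategy_costs(size_r, size_s, size_result):
    """Bytes shipped by each candidate strategy (hypothetical communication-only model)."""
    return {
        "ship R and S to site 3, join there": size_r + size_s,
        "ship R to site 2, join, ship result to site 3": size_r + size_result,
        "ship S to site 1, join, ship result to site 3": size_s + size_result,
    }

def cheapest_strategy(size_r, size_s, size_result):
    """Pick the strategy with the lowest estimated transfer cost."""
    costs = strategy_costs(size_r, size_s, size_result)
    best = min(costs, key=costs.get)
    return best, costs[best]

best, cost = cheapest_strategy(10_000, 400_000, 2_000)
```

With a small R, a large S and a small result, shipping R to site 2 costs 12,000 bytes and wins; if the join result were much larger than both inputs, shipping both relations to site 3 would be cheaper instead.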
The objective is to optimize query execution so that the inherent parallelism of the distributed system
is used to improve the performance of executing the transaction.
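The use of parallelism can be sketched as follows: a global query is decomposed into the same subquery on each site's fragment, the subqueries run concurrently, and the partial results are unioned. Threads in a single process stand in for remote sites here, and the fragment data and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical salary fragments, one per site: (eno, salary) tuples.
FRAGMENTS = {
    "Edmonton": [("E1", 40000), ("E3", 52000)],
    "Boston": [("E2", 61000), ("E4", 38000)],
    "Paris": [("E5", 70000)],
}

def local_subquery(fragment, min_salary):
    """Runs 'at' one site: select employees in the local fragment earning at least min_salary."""
    return [eno for eno, salary in fragment if salary >= min_salary]

def global_query(fragments, min_salary):
    """Submit the subquery for every fragment concurrently, then union the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(fragments))) as pool:
        futures = [pool.submit(local_subquery, frag, min_salary)
                   for frag in fragments.values()]
        partials = [f.result() for f in futures]
    return sorted(eno for part in partials for eno in part)
```

Because the fragments are disjoint, each site can scan only its own data at the same time as the others, and the final union needs no duplicate elimination.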