
Distributed DBMS Architecture


Reference books
• Database Systems: A Practical Approach to Design, Implementation and Management, by Thomas M. Connolly and Carolyn E. Begg, Ch. 22.
• Principles of Distributed Database Systems, by M. Tamer Özsu and Patrick Valduriez, Ch. 1 (pp. 21-35).
• Internet resources.
Distributed Database Concepts
A distributed database system processes a unit of execution (a transaction) in a distributed manner; that is, a single transaction can be executed by multiple networked computers in a unified manner.
It can be defined as follows:
A distributed database (DDB) is a collection of multiple, logically related databases distributed over a computer network, and a distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user.
[Figure: shared-nothing architecture, comparing a centralized database with a distributed database]
Distributed Database System
Advantages
1. Management of distributed data with different levels of transparency: the physical placement of data (files, relations, etc.) is hidden from the user (distribution transparency).
[Figure: five sites (Site 1 to Site 5) connected by a communications network]
The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented
horizontally and stored with possible replication as shown below.
Site                      EMPLOYEES                  PROJECTS                   WORKS_ON
Chicago (headquarters)    All                        All                        All
New York                  New York                   All                        New York employees
San Francisco             San Francisco and LA       San Francisco              San Francisco employees
Los Angeles               LA                         LA and San Francisco       LA employees
Atlanta                   Atlanta                    Atlanta                    Atlanta employees

All five sites are connected by a communications network.
• Distribution and network transparency: users do not have to worry about the operational details of the network. Location transparency means that a command can be issued from any location and works the same way regardless of where it is issued. Naming transparency means that a named object (file, relation, etc.) can be accessed by the same name from any location.
• Replication transparency: copies of the same data can be stored at multiple sites, as in the example above, without the user having to know about the copies. Replication is done to minimize access time to the required data.
• Fragmentation transparency: a relation can be fragmented horizontally (into subsets of its tuples) or vertically (into subsets of its columns) without the user having to know about the fragments.
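To make location and replication transparency concrete, here is a minimal Python sketch. It assumes a hypothetical global catalog (`CATALOG`) built from the EMPLOYEES placement in the five-site example; the helpers `sites_holding` and `choose_site` are also hypothetical names. The user only names the table and the fragment it needs, and the catalog decides which site actually answers, preferring a copy at the user's own site.

```python
from __future__ import annotations

# Hypothetical global catalog: which EMPLOYEES fragments each site stores,
# copied from the five-site example ("All" means a full copy of the table).
CATALOG: dict[str, dict[str, set[str]]] = {
    "EMPLOYEES": {
        "Chicago": {"All"},
        "New York": {"New York"},
        "San Francisco": {"San Francisco", "LA"},
        "Los Angeles": {"LA"},
        "Atlanta": {"Atlanta"},
    },
}


def sites_holding(table: str, fragment: str) -> list[str]:
    """Every site storing `fragment` of `table`, directly or through a full copy."""
    placement = CATALOG[table]
    return sorted(
        site for site, frags in placement.items() if fragment in frags or "All" in frags
    )


def choose_site(table: str, fragment: str, user_site: str) -> str:
    """Resolve a request without the user naming a site: local copy first, else any copy."""
    candidates = sites_holding(table, fragment)
    if not candidates:
        raise LookupError(f"no site stores fragment {fragment!r} of {table}")
    return user_site if user_site in candidates else candidates[0]


# Usage: a query for LA employees issued in Atlanta is routed to a site holding an LA copy.
print(choose_site("EMPLOYEES", "LA", "Atlanta"))
```

Falling back to the alphabetically first candidate is only a placeholder policy; a real DDBMS would weigh network cost and site load when picking among replicas.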
2. Increased reliability and availability: reliability is the probability that the system is running (not down) at a given point in time, while availability is the probability that the system is continuously available (usable or accessible) during a time interval. A distributed database system has multiple nodes (computers), so if one fails, the others can continue to do the job.
3. Improved performance: a distributed DBMS fragments the database to keep data close to where it is needed most. This can significantly reduce data management (access and modification) time.
4. Easier expansion (scalability): new nodes (computers) can be added at any time without changing the entire configuration.
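The availability benefit of having several nodes can be quantified with a simple model. Assuming each of n copies of some data is up independently with the same probability p, the data is unreachable only when all copies are down, so its availability is 1 - (1 - p)^n. The function name below is hypothetical, and the independence assumption is optimistic, since real failures are often correlated.

```python
def availability_with_replicas(p: float, n: int) -> float:
    """Probability that at least one of n copies is reachable.

    Assumes every copy is up independently with the same probability p;
    real failures are often correlated (shared network, power), so this is optimistic.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    # The data is unavailable only if all n copies are down at once: (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


for copies in (1, 2, 3):
    print(copies, round(availability_with_replicas(0.9, copies), 4))
```

With p = 0.9, one, two and three copies give about 0.9, 0.99 and 0.999 respectively.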
Types of Distributed Database Systems

Federated Database Management Systems Issues

• Differences in data models: relational, object-oriented, hierarchical, network, etc.
• Differences in constraints: each site may have its own data access and processing constraints.
• Differences in query language: sites may use different query languages, or different versions of the same language (e.g., SQL-89 vs. SQL-92).
Types of Distributed Database Systems

• Homogeneous DDBMS
• Heterogeneous DDBMS

[Figure: homogeneous vs. heterogeneous distributed database]
Homogeneous Distributed Database Systems

• In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its properties are:
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user requests.
• The database is accessed through a single interface as if it were a single database.
Example
• An example of a homogeneous database system is an enterprise's nationwide ERP system consisting of distributed databases, all of which are Oracle.
[Figure: homogeneous database, where all sites run the same software]
Types of Homogeneous Distributed Database
• There are two types of homogeneous distributed database:
• Autonomous − autonomous distributed databases are independent databases (separate data residing in each database) that function independently but are integrated by the controlling application software.
• Non-autonomous − non-autonomous distributed databases are homogeneous databases in which data is distributed across homogeneous nodes and controlled by the DBMS at each node.
Example
• An example of an autonomous distributed database system is a set of Oracle-based data marts that manage data pertaining to sales, distribution and inventory. An example of a non-autonomous distributed database system is an Oracle-based global sales database that is partitioned across multiple databases.
Advantages of Homogeneous Distributed Database

• Easy to use
• Easy to manage
• Easy to design

Disadvantages of Homogeneous Distributed Database

• Difficult for most organizations to enforce a homogeneous environment
Heterogeneous Distributed Database Systems

In this type of database, different data centers may run different DBMS products, possibly with different underlying data models.

This occurs when sites have implemented their own databases and integration is considered later.

Translations are required to allow for:

o Different hardware.
o Different DBMS products.
o Different hardware and different DBMS products.
[Figure: heterogeneous distributed database, with sites running different DBMS products (labels: SQL, Oracle)]
Heterogeneous DDBMS

• In a heterogeneous distributed database, different sites may use different schemas and software.
• In heterogeneous systems, different nodes may have different hardware and software, and the data structures at the various nodes or locations may also be incompatible.
• Different computers and operating systems, database applications or data models may be used at each location.
Heterogeneous DDBMS (contd.)
• In a heterogeneous system, translations are required to allow communication between different sites (or DBMSs).
• A fully integrated heterogeneous system is often not technically or economically feasible. In such a system, a user at one location may be able to read, but not update, the data at another location.
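The translation step can be sketched as one translator per site that maps that site's local row format onto a common global schema. This is a minimal Python illustration under stated assumptions: the two site formats, their column names (`EMP_NAME`, `SAL_MONTHLY`, `first`, `last`, `salary_year`) and the global schema `GlobalEmployee` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class GlobalEmployee:
    """One row in the hypothetical global schema: EMPLOYEE(name, annual_salary)."""
    name: str
    annual_salary: float


def from_site_a(row: dict) -> GlobalEmployee:
    # Hypothetical site A: upper-case column names and a MONTHLY salary.
    return GlobalEmployee(name=row["EMP_NAME"], annual_salary=row["SAL_MONTHLY"] * 12)


def from_site_b(row: dict) -> GlobalEmployee:
    # Hypothetical site B: annual salary already, but the name is split over two columns.
    return GlobalEmployee(name=f"{row['first']} {row['last']}", annual_salary=row["salary_year"])


# One translator per site; supporting a new kind of site means adding one entry here.
TRANSLATORS: dict[str, Callable[[dict], GlobalEmployee]] = {"A": from_site_a, "B": from_site_b}


def global_view(local_rows: Iterable[tuple[str, dict]]) -> list[GlobalEmployee]:
    """Translate (site, local_row) pairs into rows of the global schema."""
    return [TRANSLATORS[site](row) for site, row in local_rows]


rows = [
    ("A", {"EMP_NAME": "Smith", "SAL_MONTHLY": 3000}),
    ("B", {"first": "Joyce", "last": "English", "salary_year": 25000}),
]
print(global_view(rows))
```

The sketch only translates in the read direction. Writing back would need an inverse mapping per site (for example, splitting a global name into two columns), which illustrates why read-only access to remote data is a common compromise.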
Types of Heterogeneous Distributed Databases
• Federated − the heterogeneous database systems are independent in nature and are integrated so that they function as a single database system.
• Un-federated − un-federated database systems are collections of database systems that are generally non-autonomous and employ centralized control.
Advantages of Heterogeneous Distributed Database

• Large volumes of data from different data centers can be stored under one global center.

• Remote access is done using the global schema.

• Different DBMSs may be used at each node.

Disadvantages of Heterogeneous Distributed Database

• Difficult to manage

• Difficult to design
Distributed DBMS Architecture
• The architecture of a system defines its structure.
• This means that the components of the system
are identified, the function of each component is
specified, and the interrelationships and
interactions among these components are
defined.
• DDBMS architectures are generally developed based on three parameters:
Architectural Models for Distributed
DBMSs
• Distribution − It states the physical
distribution of data across the different sites.
• Autonomy − It indicates the distribution of
control of the database system and the degree
to which each constituent DBMS can operate
independently.
• Heterogeneity − It refers to the uniformity or
dissimilarity of the data models, system
components and databases.
ANSI/SPARC Architecture
• In late 1972, the Computer and Information
Processing Committee (X3) of the American
National Standards Institute (ANSI) established a
Study Group on Database Management Systems
under the auspices of its Standards Planning and
Requirements Committee (SPARC).
• The mission of the study group was to study the
feasibility of setting up standards in this area, as
well as determining which aspects should be
standardized if it was feasible.
[Figure: ANSI/SPARC architecture]
Architectural Models
• Some of the common architectural models are:
• Client-Server Architecture for DDBMS
• Peer-to-Peer Architecture for DDBMS
• Multi-DBMS Architecture
Client-Server Architecture for DDBMS

• This is a two-level architecture in which the functionality is divided between servers and clients.
• The server functions primarily encompass data management, query processing, optimization and transaction management.
• The client functions mainly consist of the user interface. However, clients also have some functions such as consistency checking and transaction management.
Types of client/server architecture:

1) Multiple client, single server: there is only one server, which is accessed by multiple clients.

2) Multiple client, multiple server: two alternative management strategies are possible. Either each client manages its own connection to the appropriate server, or each client knows only of its home server, which then communicates with other servers as required.
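The two management strategies for the multiple-client, multiple-server case can be contrasted in a small Python sketch. The classes `Server`, `DirectClient` and `HomeServerClient` are hypothetical names; each server simply holds some whole tables in memory, and "forwarding" is a direct method call standing in for network communication.

```python
from __future__ import annotations


class Server:
    """A database server holding some whole tables (table name -> rows)."""

    def __init__(self, name: str, tables: dict[str, list[str]]) -> None:
        self.name = name
        self.tables = tables
        self.peers: list[Server] = []  # other servers this one may contact

    def local_query(self, table: str) -> list[str]:
        return self.tables[table]

    def query(self, table: str) -> list[str]:
        """Answer locally, or forward to a peer server that holds the table."""
        if table in self.tables:
            return self.local_query(table)
        for peer in self.peers:
            if table in peer.tables:
                return peer.local_query(table)
        raise LookupError(f"{self.name}: no server holds {table}")


class DirectClient:
    """Strategy 1: the client manages its own connection to the right server."""

    def __init__(self, directory: dict[str, Server]) -> None:
        self.directory = directory  # table name -> server holding it

    def query(self, table: str) -> list[str]:
        return self.directory[table].local_query(table)


class HomeServerClient:
    """Strategy 2: the client knows only its home server, which forwards as needed."""

    def __init__(self, home: Server) -> None:
        self.home = home

    def query(self, table: str) -> list[str]:
        return self.home.query(table)


s1 = Server("S1", {"EMPLOYEE": ["Smith"]})
s2 = Server("S2", {"PROJECT": ["ProductX"]})
s1.peers = [s2]
print(DirectClient({"EMPLOYEE": s1, "PROJECT": s2}).query("PROJECT"))
print(HomeServerClient(s1).query("PROJECT"))
```

Both clients get the same answer; the difference is where the knowledge of data location lives. In the first strategy every client must keep its directory up to date, while in the second only the servers need to know about each other.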
Peer-to-Peer Architecture for DDBMS

• In these systems, each peer acts both as a client and as a server for providing database services.
• The peers share their resources with other peers and coordinate their activities.
This architecture generally has four levels of schemas:
• Global Conceptual Schema − Depicts the
global logical view of data.
• Local Conceptual Schema − Depicts logical
data organization at each site.
• Local Internal Schema − Depicts physical data
organization at each site.
• External Schema − Depicts user view of data
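The relationship between three of these schema levels can be sketched in Python under simple assumptions: each site's local conceptual schema is just a list of EMPLOYEE rows (hypothetical data), the global conceptual schema presents them as one relation, and an external schema is a view over that global relation. The local internal schema (files, indexes, storage layout) is not modelled; all function names are hypothetical.

```python
from __future__ import annotations

# Local conceptual schemas (hypothetical data): each site keeps its own EMPLOYEE rows.
# The local internal schema (files, indexes, storage layout) is not modelled here.
SITE_ROWS: dict[str, list[dict]] = {
    "site1": [{"eno": 1, "name": "Smith", "city": "Chicago", "salary": 30000}],
    "site2": [{"eno": 2, "name": "Wong", "city": "Atlanta", "salary": 40000}],
}


def global_employee() -> list[dict]:
    """Global conceptual schema: EMPLOYEE as one relation, the union of the sites' rows."""
    return [row for rows in SITE_ROWS.values() for row in rows]


def directory_view() -> list[tuple[str, str]]:
    """External schema: a user view that exposes (name, city) and hides salary."""
    return sorted((row["name"], row["city"]) for row in global_employee())


print(directory_view())
```

A user of the external view never sees which site a row came from, nor the salary column.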
Multi-DBMS
• A multi-database system consists of multiple databases, each with full autonomy; it can be seen as a collection of autonomous databases (similar to federated databases).
• In a relational context, there is a separate schema for each database.
• Database integration relates data from multiple databases.
• For example, there could be a manufacturing database that records products and a separate sales database that records sales. The two databases can together make up a multi-database system.
• Multi-database systems usually reflect a
situation where, for historical reasons, the
data that an organization needs to operate is
held in multiple different databases in
different locations, and possibly from different
vendors.
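The manufacturing/sales example can be tried out with SQLite's `ATTACH DATABASE`, which lets one connection query two separate databases (here two in-memory ones) in a single statement. This is only an illustration: SQLite is not a distributed DBMS, both databases live in the same process, and the table and column names (`products`, `orders`, `pid`, `qty`) are hypothetical.

```python
import sqlite3


def build_multidatabase() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")                   # "manufacturing" database (main)
    conn.execute("ATTACH DATABASE ':memory:' AS sales")  # a separate "sales" database
    conn.execute("CREATE TABLE main.products (pid INTEGER PRIMARY KEY, pname TEXT)")
    conn.execute(
        "CREATE TABLE sales.orders (oid INTEGER PRIMARY KEY, pid INTEGER, qty INTEGER)"
    )
    conn.executemany("INSERT INTO main.products VALUES (?, ?)", [(1, "bolt"), (2, "nut")])
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?, ?)", [(10, 1, 5), (11, 1, 3), (12, 2, 7)]
    )
    return conn


def units_sold_per_product(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    # Integration query: relates rows from the two databases by product id.
    return conn.execute(
        "SELECT p.pname, SUM(o.qty) FROM main.products AS p "
        "JOIN sales.orders AS o ON o.pid = p.pid "
        "GROUP BY p.pname ORDER BY p.pname"
    ).fetchall()


print(units_sold_per_product(build_multidatabase()))
```

The two databases keep separate schemas, and nothing enforces that `sales.orders.pid` refers to an existing product; that kind of cross-database consistency is exactly what integration has to deal with.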
Design Alternatives

• The distribution design alternatives for the tables in a DDBMS are as follows:
• Non-replicated and non-fragmented
• Fully replicated
• Partially replicated
• Fragmented
• Mixed
Non-replicated & Non-fragmented

• In this design alternative, different tables are placed at different sites.
• Data is placed close to the site where it is used most.
• It is most suitable for database systems where the percentage of queries that need to join information from tables placed at different sites is low.
• If an appropriate distribution strategy is adopted,
then this design alternative helps to reduce the
communication cost during data processing.
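Whether a placement suits this alternative can be checked by counting how many queries in a workload touch tables at more than one site. The Python sketch below assumes a hypothetical placement of the EMPLOYEE, PROJECT and WORKS_ON tables and a hypothetical workload, where each query is represented only by the tables it reads.

```python
from __future__ import annotations

# Hypothetical placement: each whole table is stored at exactly one site (no copies).
PLACEMENT: dict[str, str] = {"EMPLOYEE": "Chicago", "PROJECT": "New York", "WORKS_ON": "Chicago"}


def sites_touched(tables: list[str]) -> set[str]:
    """Sites that must take part in a query reading these tables."""
    return {PLACEMENT[t] for t in tables}


def is_cross_site(tables: list[str]) -> bool:
    """True when the query spans sites, so some table must be shipped over the network."""
    return len(sites_touched(tables)) > 1


def cross_site_fraction(workload: list[list[str]]) -> float:
    """Share of queries in a workload that need data from more than one site."""
    if not workload:
        raise ValueError("empty workload")
    return sum(is_cross_site(q) for q in workload) / len(workload)


workload = [["EMPLOYEE", "WORKS_ON"], ["PROJECT"], ["EMPLOYEE", "PROJECT"], ["EMPLOYEE"]]
print(cross_site_fraction(workload))
```

Here only one query in four needs a cross-site join, because EMPLOYEE and WORKS_ON, which are frequently joined, were placed at the same site.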
Fully Replicated

• In this design alternative, one copy of all the database tables is stored at each site.
• Since each site has its own copy of the entire database, queries are very fast and require negligible communication cost.
• On the other hand, the massive redundancy makes update operations very costly, because every copy must be updated.
• Hence, this is suitable for systems where a large number of queries must be handled while the number of database updates is low.
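The read/update trade-off can be shown with a deliberately crude cost model, assuming one network message per remote operation and ignoring message size, commit protocols and concurrency control. The function names are hypothetical; the single-copy case assumes the worst case, where every operation comes from a site other than the one holding the data.

```python
def messages_fully_replicated(reads: int, writes: int, n_sites: int) -> int:
    # Every read is served by the local copy (0 messages), so `reads` adds nothing;
    # every write must be propagated to the other n_sites - 1 copies.
    return writes * (n_sites - 1)


def messages_single_copy(reads: int, writes: int) -> int:
    # Worst case with one copy: each operation is shipped to the one remote site.
    return reads + writes


for reads, writes in [(1000, 10), (10, 1000)]:
    print(reads, writes,
          messages_fully_replicated(reads, writes, n_sites=5),
          messages_single_copy(reads, writes))
```

With five sites, a read-heavy workload (1000 reads, 10 updates) costs 40 messages under full replication against 1010 with a single copy, while an update-heavy workload (10 reads, 1000 updates) reverses the picture: 4000 against 1010.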
Partially Replicated

• Copies of tables or portions of tables are stored at different sites.
• The tables are distributed according to their frequency of access.
• This takes into account the fact that the frequency of accessing the tables varies considerably from site to site.
• The number of copies of the tables (or portions) depends on how frequently the access queries execute and on which sites generate them.
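A simple frequency-based placement rule can be sketched in Python: give a copy of a table to every site whose access count reaches a threshold, and keep at least one copy at the busiest site. The rule, the threshold and the access counts are assumptions for illustration; real allocation methods also weigh update costs and storage.

```python
from __future__ import annotations


def choose_replica_sites(access_counts: dict[str, int], threshold: int) -> list[str]:
    """Sites that should hold a copy of one table (or fragment).

    A site gets a copy if it accesses the table at least `threshold` times per period;
    if no site qualifies, a single copy goes to the busiest site.
    """
    if not access_counts:
        raise ValueError("no access statistics")
    chosen = sorted(site for site, count in access_counts.items() if count >= threshold)
    if not chosen:
        chosen = [max(access_counts, key=lambda site: access_counts[site])]
    return chosen


# Hypothetical weekly access counts to PROJECT from three sites.
counts = {"Chicago": 500, "New York": 120, "Atlanta": 5}
print(choose_replica_sites(counts, threshold=100))
print(choose_replica_sites(counts, threshold=1000))
```

Raising the threshold trades faster local reads at more sites for fewer copies to keep up to date.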
Fragmented

• In this design, a table is divided into two or more pieces, referred to as fragments or partitions, and each fragment can be stored at a different site.
• This takes into account the fact that it seldom happens that all the data stored in a table is required at a given site.
• Moreover, fragmentation increases parallelism and provides better disaster recovery.
• The three fragmentation techniques are −
• Vertical fragmentation
• Horizontal fragmentation
• Hybrid fragmentation
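The three techniques can be illustrated on a small in-memory EMPLOYEE table (hypothetical rows). Horizontal fragments are selections by a predicate and are rebuilt by union; vertical fragments are projections that each keep the key `eno`, so they can be rebuilt by joining on it; a hybrid fragment applies one technique to the result of the other. This Python sketch assumes the horizontal fragments are disjoint and together cover the table, which is what makes the union an exact reconstruction.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical EMPLOYEE rows, keyed by eno.
EMPLOYEE: list[dict] = [
    {"eno": 1, "name": "Smith", "city": "Chicago", "salary": 30000},
    {"eno": 2, "name": "Wong", "city": "Atlanta", "salary": 40000},
    {"eno": 3, "name": "Zelaya", "city": "Chicago", "salary": 25000},
]


def horizontal(rows: list[dict], predicate: Callable[[dict], bool]) -> list[dict]:
    """Horizontal fragment: the subset of tuples satisfying a predicate (a selection)."""
    return [r for r in rows if predicate(r)]


def vertical(rows: list[dict], columns: list[str]) -> list[dict]:
    """Vertical fragment: a subset of columns (a projection), always keeping the key eno."""
    keep = ["eno", *[c for c in columns if c != "eno"]]
    return [{c: r[c] for c in keep} for r in rows]


def rebuild_from_horizontal(*fragments: list[dict]) -> list[dict]:
    """Reconstruct by union of disjoint horizontal fragments (sorted by key)."""
    return sorted((r for f in fragments for r in f), key=lambda r: r["eno"])


def rebuild_from_vertical(left: list[dict], right: list[dict]) -> list[dict]:
    """Reconstruct by joining two vertical fragments on the key eno."""
    by_key = {row["eno"]: row for row in right}
    return [{**row, **by_key[row["eno"]]} for row in left]


in_chicago = lambda r: r["city"] == "Chicago"
chicago = horizontal(EMPLOYEE, in_chicago)
others = horizontal(EMPLOYEE, lambda r: not in_chicago(r))
hybrid = vertical(chicago, ["name"])  # names of Chicago employees only
print(hybrid)
```

Because the key travels with every vertical fragment, the join in `rebuild_from_vertical` restores the original tuples exactly.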
