Distributed DBMS

A distributed database is a collection of multiple interconnected databases that are spread
physically across various locations and communicate via a computer network or the internet. A
Distributed Database Management System (DDBMS) manages the distributed database and provides
mechanisms that make the distribution transparent to users. In these systems, data is intentionally
distributed among multiple nodes so that the computing resources of the organization can be used
efficiently.
Features
• Databases in the collection are logically interrelated with each other. Often they represent a single
logical database.
• Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
• The processors at the sites are connected via a network. They do not form a multiprocessor
configuration.
• A distributed database is not a loosely connected file system.
• A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.
Distributed Database Management System
A distributed database management system (DDBMS) is a centralized software system that manages a
distributed database as if it were all stored in a single location.
Features
• It is used to create, retrieve, update and delete distributed databases.
• It synchronizes the database periodically and provides access mechanisms by virtue of which the
distribution becomes transparent to the users.
• It ensures that data modified at any site is updated at every site that holds a copy of it.
• It is used in application areas where large volumes of data are processed and accessed by numerous
users simultaneously.
• It is designed for heterogeneous database platforms.
• It maintains confidentiality and data integrity of the databases.
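The idea of distribution transparency can be sketched in a few lines. This is only an illustration, not how any particular DDBMS is built: the `Site` and `Catalog` classes, the site names, and the sample rows are all hypothetical. The point it shows is that the caller names a table and never names the site that stores it.

```python
class Site:
    """A hypothetical site that stores tables as lists of row dicts."""

    def __init__(self, name):
        self.name = name
        self.tables = {}


class Catalog:
    """Illustrative global catalog: maps each table name to the site holding it."""

    def __init__(self):
        self.location = {}

    def register(self, table, site, rows):
        site.tables[table] = list(rows)
        self.location[table] = site

    def select(self, table, predicate=lambda row: True):
        # The caller names only the table; the catalog resolves the site.
        site = self.location[table]
        return [row for row in site.tables[table] if predicate(row)]


# Assumed sample data, spread over two sites.
delhi = Site("delhi")
mumbai = Site("mumbai")
catalog = Catalog()
catalog.register("employee", delhi, [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}])
catalog.register("project", mumbai, [{"pid": 10, "title": "Billing"}])

# Neither query mentions a site.
employees = catalog.select("employee", lambda r: r["id"] == 2)
projects = catalog.select("project")
```

A real system would also have to handle tables that are fragmented or replicated across several sites; the sketch assumes each table lives at exactly one site.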
Factors Encouraging DDBMS

• Distributed Nature of Organizational Units − Most organizations today are subdivided into
multiple units that are physically distributed over the globe. Each unit requires its own set of local
data. Thus, the overall database of the organization becomes distributed.
• Need for Sharing of Data − The multiple organizational units often need to communicate with each
other and share their data and resources. This demands common databases or replicated databases that
should be used in a synchronized manner.
• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online Analytical
Processing (OLAP) work on different systems that may share common data. Distributed
database systems support both kinds of processing by providing synchronized data.
• Database Recovery − One of the common techniques used in DDBMS is replication of data across
different sites. Replication helps in data recovery if the database at any site is damaged. Users can
access data from other sites while the damaged site is being reconstructed. Thus, database failure
may become almost inconspicuous to users.
• Support for Multiple Application Software − Most organizations use a variety of application
software each with its specific database support. DDBMS provides a uniform functionality for using
the same data among different platforms.
Advantages of Distributed Databases
• Modular Development − In a centralized database system, expanding to new locations or new
units requires substantial effort and disrupts existing operations. In a distributed database, the
work only requires adding new computers and local data at the new site and then connecting them to
the distributed system, with no interruption to current functions.
• More Reliable − When a centralized database fails, the whole system comes to a halt. In a
distributed system, when a component fails, the system continues to function, possibly at
reduced performance. Hence a DDBMS is more reliable.
• Better Response − If data is distributed in an efficient manner, then user requests can be met from
local data itself, thus providing faster response. On the other hand, in centralized systems, all queries
have to pass through the central computer for processing, which increases the response time.
• Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not
feasible in centralized systems.
Disadvantages of Distributed Databases

• Need for complex and expensive software − DDBMS demands complex and often expensive
software to provide data transparency and co-ordination across the several sites.
• Processing overhead − Even simple operations may require a large number of communications and
additional calculations to provide uniformity in data across the sites.
• Data integrity − The need for updating data in multiple sites poses problems of data integrity.
• Overheads for improper data distribution − Responsiveness of queries is largely dependent upon
proper data distribution. Improper data distribution often leads to very slow responses to user
requests.
Distributed Database Vs Centralized Database

Centralized DBMS | Distributed DBMS
The database is stored at only one site. | The database is stored at different sites and is accessed over the network.
The data is stored at a single computer site and can be used by multiple users. | The database and DBMS software are distributed over many sites, connected by a computer network.
The database is maintained at one site. | The database is maintained at a number of different sites.

[Figures: Centralized database; Distributed database]
Types of Distributed Databases
Distributed databases can be broadly classified into homogeneous and heterogeneous distributed
database environments.
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user requests.
• The database is accessed through a single interface as if it is a single database.
Types of Homogeneous Distributed Database

There are two types of homogeneous distributed database −
• Autonomous − Each database is independent and functions on its own. The databases are integrated
by a controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes and a central or master DBMS
co-ordinates data updates across the sites.
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems, DBMS
products and data models. Its properties are −
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs, such as relational, network, hierarchical or
object-oriented.
• Query processing is complex due to dissimilar schemas. Transaction processing is complex due to
dissimilar software.
• A site may not be aware of other sites and so there is limited co-operation in processing user
requests.
Types of Heterogeneous Distributed Databases

• Federated − The heterogeneous database systems are independent in nature and integrated together
so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module through which the
databases are accessed.
Distributed DBMS Architectures
DDBMS architectures are generally developed depending on three parameters −
• Distribution − It states the physical distribution of data across the different sites.
• Autonomy − It indicates the distribution of control of the database system and the degree to which
each constituent DBMS can operate independently.
• Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system components
and databases.
Data Replication
Data replication is the process of storing separate copies of the database at two or more sites. It is a
popular fault-tolerance technique in distributed databases.
Advantages of Data Replication
• Reliability − In case of failure of any site, the database system continues to work since a copy is
available at another site(s).
• Reduction in Network Load − Since local copies of data are available, query processing can be
done with reduced network usage, particularly during prime hours. Data updating can be done at
non-prime hours.
• Quicker Response − Availability of local copies of data ensures quick query processing and
consequently quick response time.
• Simpler Transactions − Transactions require fewer joins of tables located at different sites
and minimal coordination across the network. Thus, they become simpler in nature.
Disadvantages of Data Replication

• Increased Storage Requirements − Maintaining multiple copies of data is associated with increased
storage costs. The storage space required is in multiples of the storage required for a centralized
system.
• Increased Cost and Complexity of Data Updating − Each time a data item is updated, the update
needs to be reflected in all the copies of the data at the different sites. This requires complex
synchronization techniques and protocols.
• Undesirable Application – Database coupling − If complex update mechanisms are not used,
removing data inconsistency requires complex coordination at application level. This results in
undesirable application – database coupling.
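The trade-off between reliability and update cost can be illustrated with a minimal sketch. It assumes a simple "write to every copy, read from any available copy" scheme; the `Replica` and `ReplicatedStore` classes, the site names, and the key are hypothetical and chosen only for illustration.

```python
class Replica:
    """One copy of the data at a hypothetical site."""

    def __init__(self, site):
        self.site = site
        self.data = {}
        self.up = True  # set to False to simulate a site failure


class ReplicatedStore:
    def __init__(self, replicas):
        self.replicas = replicas
        self.messages = 0  # number of update messages sent to replicas

    def write(self, key, value):
        # Every available copy is updated, so the cost of one logical write
        # grows with the number of replicas.
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                self.messages += 1

    def read(self, key):
        # Any available copy can answer; failed sites are skipped.
        for replica in self.replicas:
            if replica.up:
                return replica.data[key], replica.site
        raise RuntimeError("no replica available")


store = ReplicatedStore([Replica("A"), Replica("B"), Replica("C")])
store.write("balance:42", 100)   # one logical write, three update messages
store.replicas[0].up = False     # site A fails
value, served_by = store.read("balance:42")  # answered by the next live site
```

The sketch leaves out the hard part: a replica that was down during a write becomes stale and must be brought up to date when it recovers, which is where the complex synchronization protocols mentioned above come in.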
Some commonly used replication techniques are −
• Snapshot replication
• Near-real-time replication
• Pull replication
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are
called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid (a combination
of horizontal and vertical). Horizontal fragmentation can further be classified into two techniques:
primary horizontal fragmentation and derived horizontal fragmentation. Fragmentation should be
done in such a way that the original table can be reconstructed from the fragments whenever
required. This requirement is called “re-constructiveness.”
Advantages of Fragmentation
1. Permits a number of transactions to be executed concurrently.
2. Results in parallel execution of a single query.
3. Increases the level of concurrency within a query, also referred to as intra-query concurrency.
4. Increases system throughput.
5. Since data is stored close to the site of usage, the efficiency of the database system is increased.
6. Local query optimization techniques are sufficient for most queries since data is locally available.
7. Since irrelevant data is not available at the sites, the security and privacy of the database system can
be maintained.
Disadvantages of Fragmentation
1. Applications whose views are defined on more than one fragment may suffer performance
degradation if the applications have conflicting requirements.
2. Simple tasks, like checking for dependencies, would require chasing after data at a number of sites.
3. When data from different fragments is required, access times may be very high.
4. In the case of recursive fragmentations, the job of reconstruction will need expensive techniques.
5. Lack of backup copies of data in different sites may render the database ineffective in case of
failure of a site.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to
maintain re-constructiveness, each fragment should contain the primary key field(s) of the table.
Vertical fragmentation can be used to enforce the privacy of data.
Grouping
• Starts by assigning each attribute to one fragment
• At each step, joins some of the fragments until some criteria are satisfied.
• Results in overlapping fragments
Splitting
• Starts with relation and decides on beneficial partitioning based on the access behaviour of
applications to the attributes
• Fits more naturally within the top-down design
• Generates non-overlapping fragments
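Vertical fragmentation and its reconstruction can be sketched as a projection followed by a join on the primary key. The `STUDENT` table, its column names, and the helper functions below are assumptions made for illustration only.

```python
# Assumed sample table; regd_no plays the role of the primary key.
STUDENT = [
    {"regd_no": 1, "name": "Asha", "course": "CS", "fees": 5000},
    {"regd_no": 2, "name": "Ravi", "course": "EE", "fees": 4500},
]


def vertical_fragment(rows, key, columns):
    """Project each row onto the primary key plus the given columns."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def reconstruct(frag_a, frag_b, key):
    """Join two vertical fragments on the shared primary key."""
    index = {row[key]: row for row in frag_b}
    return [{**row, **index[row[key]]} for row in frag_a]


# Each fragment keeps regd_no so that the table can be rebuilt.
std_info = vertical_fragment(STUDENT, "regd_no", ["name", "course"])
std_fees = vertical_fragment(STUDENT, "regd_no", ["fees"])

rebuilt = reconstruct(std_info, std_fees, "regd_no")
```

Here `std_info` carries no fee data, which is the privacy benefit: a site that holds only `std_info` cannot see fees. Without the shared key in both fragments, the join in `reconstruct` would not be possible.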
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table in accordance with the values of one or more
fields. Horizontal fragmentation should also conform to the rule of re-constructiveness. Each
horizontal fragment must have all columns of the original base table.
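As a minimal sketch, horizontal fragmentation is a selection on rows and reconstruction is a union of the fragments. The `STUDENT` table and the choice of `course` as the fragmentation field are assumptions for illustration.

```python
# Assumed sample table, listed in regd_no order.
STUDENT = [
    {"regd_no": 1, "name": "Asha", "course": "CS", "fees": 5000},
    {"regd_no": 2, "name": "Ravi", "course": "EE", "fees": 4500},
    {"regd_no": 3, "name": "Meera", "course": "CS", "fees": 5000},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows (all columns) that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# The two predicates are complementary, so every row lands in exactly one fragment.
cs_students = horizontal_fragment(STUDENT, lambda r: r["course"] == "CS")
other_students = horizontal_fragment(STUDENT, lambda r: r["course"] != "CS")

# Reconstruction is the union of the fragments (sorted only to restore the order).
rebuilt = sorted(cs_students + other_students, key=lambda r: r["regd_no"])
```

Choosing predicates that are complementary is what makes the union lossless and free of duplicates; overlapping or incomplete predicates would break re-constructiveness.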
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used.
This is the most flexible fragmentation technique since it generates fragments with minimal
extraneous information. However, reconstruction of the original table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways −
• At first, generate a set of horizontal fragments; then generate vertical fragments from one or more
of the horizontal fragments.
• At first, generate a set of vertical fragments; then generate horizontal fragments from one or more
of the vertical fragments.
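The first alternative (horizontal first, then vertical) can be sketched as follows. The `STUDENT` table, the fragmentation predicate, and the helper names `select`, `project`, and `join` are hypothetical; the sketch only shows why reconstruction needs both a join and a union.

```python
KEY = "regd_no"  # assumed primary key

# Assumed sample table, listed in regd_no order.
STUDENT = [
    {"regd_no": 1, "name": "Asha", "course": "CS", "fees": 5000},
    {"regd_no": 2, "name": "Ravi", "course": "EE", "fees": 4500},
    {"regd_no": 3, "name": "Meera", "course": "CS", "fees": 5000},
]


def select(rows, predicate):
    """Horizontal step: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Vertical step: keep the primary key plus the given columns."""
    return [{c: row[c] for c in (KEY, *columns)} for row in rows]


def join(frag_a, frag_b):
    """Rebuild rows from two vertical fragments via the primary key."""
    index = {row[KEY]: row for row in frag_b}
    return [{**row, **index[row[KEY]]} for row in frag_a]


# Step 1: horizontal fragments.
cs = select(STUDENT, lambda r: r["course"] == "CS")
other = select(STUDENT, lambda r: r["course"] != "CS")

# Step 2: vertical fragments of one horizontal fragment only.
cs_info = project(cs, ["name", "course"])
cs_fees = project(cs, ["fees"])

# Reconstruction: join the vertical pieces, then union with the remaining fragment.
rebuilt = sorted(join(cs_info, cs_fees) + other, key=lambda r: r[KEY])
```

Reconstruction here takes one join and one union for just three fragments; with more levels of fragmentation the number of such steps grows, which is the expense noted above.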
