SQL Unit 3 Distributed DB

The document discusses distributed database systems which span multiple computers connected by a network. Each node stores a portion of the data and the entire database is made up of the sum of the data stored on each node. The document then covers the need for distributed databases and provides examples as well as features and challenges of distributed databases.
Copyright © All Rights Reserved

DISTRIBUTED DATABASE SYSTEM IN DBMS

A distributed database is a database system that spans multiple computers or nodes
that are connected by a network. Each node in a distributed database can store a
portion of the data, and the entire database is made up of the sum of the data stored
on each node.
In a distributed database, data is stored and processed in a distributed manner, and
the system is designed to keep the data consistent and available to users despite
network failures or other system errors. The primary goal of a distributed database is
to provide high availability, scalability, and performance to applications that require
access to large amounts of data.

NEED FOR DISTRIBUTED DATABASES


Let's assume for a moment that we have only centralized databases.
All the data would be inserted into one single database, making it so large that
querying even a single record takes a long time.
Once a fault occurs, we can no longer serve user requests, because there is only one
database.
No scaling is possible even if we wanted it, and availability is also lower, which in turn
affects throughput.
Distributed databases resolve various issues, such as availability, fault tolerance,
throughput, latency, scalability, and many other problems that can arise from using a
single machine and a single database. That's why we need distributed databases.
Distributed Databases
A distributed database is a database that is not limited to one computer system. It
consists of two or more files located on different computers or sites, either on the
same network or on entirely different networks.
These sites do not share any physical components. Distributed databases are needed
when the same data must be accessed by users around the world.
The data must be managed in such a way that, to each user, it always looks like one
single database.
By contrast, a centralized database consists of a single database file located at one
site using a single network.
Below is a reference diagram for distributed databases.

Examples of distributed databases


• Apache Ignite
• Apache Cassandra
• Apache HBase
• Amazon SimpleDB
• Clusterpoint
• FoundationDB
Features of Distributed Databases
In general, distributed databases include the following features:
Location independence: Data is physically stored at multiple sites and managed
by an independent distributed database management system (DDBMS).
Network linking: All distributed databases in a collection are linked by a network and
communicate with each other.
Distributed query processing: Distributed query processing is the procedure of
answering queries in a distributed environment.
Hardware independence: The sites where data is stored are hardware-independent,
and there is no physical contact between them.
Distributed transaction management: A distributed database keeps data consistent
across sites through commit protocols, distributed recovery methods, and distributed
concurrency control techniques.
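Commit protocols are what let a distributed transaction reach one all-or-nothing outcome across sites. As a hedged illustration, the sketch below simulates two-phase commit (2PC), a widely used commit protocol, inside a single Python process. The `Participant` class, the site names, and the `can_commit` flag are hypothetical; a real DDBMS also logs each step, uses timeouts, and exchanges these messages over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical site taking part in a distributed transaction."""
    name: str
    can_commit: bool = True       # simulates whether the site's local work succeeds
    state: str = "idle"
    data: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, updates: dict) -> bool:
        """Phase 1: stage the updates locally and vote yes (True) or no (False)."""
        if not self.can_commit:
            self.state = "aborted"
            return False
        self.pending = dict(updates)
        self.state = "prepared"
        return True

    def commit(self) -> None:
        """Phase 2, commit decision: apply the staged updates."""
        self.data.update(self.pending)
        self.pending = {}
        self.state = "committed"

    def abort(self) -> None:
        """Phase 2, abort decision: discard the staged updates."""
        self.pending = {}
        self.state = "aborted"


def two_phase_commit(participants, updates_by_site) -> bool:
    """Coordinator: commit at every site only if every site votes yes."""
    votes = [p.prepare(updates_by_site.get(p.name, {})) for p in participants]
    decision = all(votes)
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


booking = {"delhi": {"seat_12A": "asha"}, "mumbai": {"seat_12A": "asha"}}

sites = [Participant("delhi"), Participant("mumbai")]
print(two_phase_commit(sites, booking), [s.state for s in sites])
# True ['committed', 'committed']

sites = [Participant("delhi"), Participant("mumbai", can_commit=False)]
print(two_phase_commit(sites, booking), [s.state for s in sites], sites[0].data)
# False ['aborted', 'aborted'] {}
```

A single "no" vote makes every site discard its staged work, which is how the protocol keeps all sites in agreement.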
Types of Distributed Databases in DBMS
There are two types of distributed databases:
• Homogeneous distributed database.
• Heterogeneous distributed database.
Homogeneous Distributed Database
A homogeneous distributed database is a network of identical databases stored on
multiple sites.
Its properties are:
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process
user requests.
• The database is accessed through a single interface as if it is a single database.

Heterogeneous Distributed Database
It is the opposite of a homogeneous distributed database. It uses different schemas,
operating systems, DDBMS, and data models, which makes it difficult to manage.
Its properties are:
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs like relational, network,
hierarchical or object oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites, so there is limited cooperation in
processing user requests.

Note: Heterogeneous DDBMSs have local users, while homogeneous DDBMSs do not have
local users.
STRUCTURE OF DISTRIBUTED DATABASE
A distributed database system is a kind of database that is divided across more
than one location, which means it is not limited to any single computer system. It is
spread over a network of systems and is physically present on different machines in
different locations. This can be necessary when users from all over the world need
to access a specific database.
Parameters of Distributed Database Systems:
• Distribution: It describes how data is physically distributed among the several
sites.
• Autonomy: It describes how control is divided within the database system and the
degree of independence each individual DBMS has.
• Heterogeneity: It describes how similar or different the databases, system
components, and data models are.
Common Architecture Models of Distributed Database Systems:
• Client-Server Architecture of DDBMS:
This is a two-level architecture in which the main functionality is divided between
clients and servers. The server provides functions such as transaction management,
data management, query processing, and query optimization.

• Peer-to-peer Architecture of DDBMS:
In this architecture, each node or peer acts as both a server and a client and
provides its database services in both roles. The peers coordinate their efforts and
share their resources with one another.

• Multi DBMS Architecture of DDBMS:
This is an amalgam of two or more independent database systems that functions as
a single integrated database system.
CHALLENGES / TRADE-OFFS IN DISTRIBUTING THE DATABASE
In distributed databases, trade-offs include:
• Data consistency: Consistency is the idea that every node in a distributed
system sees the exact same data at the same time. It makes sure that every
subsequent read will return the updated value following a successful data
update. Strong consistency in distributed systems can be difficult to achieve
because of things like network latency and the requirement for node
coordination.
• Availability: Data must remain accessible even when parts of the network fail,
so ensuring high availability is crucial.
• Partition tolerance: Even if some nodes cannot communicate with each
other due to network issues, the distributed database should keep operating
reliably. These first three properties are the ones named in the CAP theorem: while
a network partition lasts, a system must give up either strong consistency or
availability.
• Network communication overhead: Network overhead can lead to delay
and reduced performance due to the time it takes to transmit data between
nodes.
• Fault tolerance: The system needs strategies that keep it reliable even
when components fail.
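The tension between consistency, availability, and partition tolerance can be seen in a toy model that keeps one value on two replicas, A and B. This sketch assumes a hypothetical `TwoSiteStore` class with a `mode` setting (not any real database's API): a store that favours consistency refuses requests while the sites are partitioned, whereas one that favours availability keeps answering and may return stale data.

```python
class TwoSiteStore:
    """Toy store holding one value at two sites, A and B (illustrative only)."""

    def __init__(self, mode: str):
        if mode not in ("consistent", "available"):
            raise ValueError("mode must be 'consistent' or 'available'")
        self.mode = mode
        self.value_a = 0
        self.value_b = 0
        self.partitioned = False   # True while A and B cannot communicate

    def write(self, value: int) -> bool:
        """Write at site A and propagate to site B when the network allows it."""
        if self.partitioned and self.mode == "consistent":
            return False           # refuse: B could not be kept in step with A
        self.value_a = value
        if not self.partitioned:
            self.value_b = value
        return True

    def read_from_b(self):
        """Read at site B; refused (None) or possibly stale during a partition."""
        if self.partitioned and self.mode == "consistent":
            return None
        return self.value_b


# Favouring availability: B keeps answering, but with old data.
ap = TwoSiteStore("available")
ap.write(1)
ap.partitioned = True
ap.write(2)                           # only site A receives this write
print(ap.value_a, ap.read_from_b())   # 2 1

# Favouring consistency: the system refuses requests instead of diverging.
cp = TwoSiteStore("consistent")
cp.write(1)
cp.partitioned = True
print(cp.write(2), cp.read_from_b())  # False None
```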
Advantages of Data Distribution
• Better Reliability: Distributed databases offer better reliability than
centralized databases. When a database failure occurs in a centralized
database, the system comes to a complete stop. In a distributed database,
the system keeps functioning even when a failure occurs; only performance
may degrade, which is usually tolerable.
• Modular Development: It implies that the system can be expanded by
adding new computers and local data to the new site and connecting them
to the distributed system without interruption.
• Lower Communication Cost: Locally storing data reduces communication
costs for data manipulation in distributed databases. In centralized
databases, local storage is not possible.
• Better Response Time: Because data is distributed efficiently in distributed
databases, user queries can often be answered locally, which gives a better response
time. In centralized databases, by contrast, all queries have to pass through the
central machine, which increases response time.
• Scalability: The database is easier to expand as it is already spread across
multiple systems, and it is not too complicated to add a system.
• Data Storage and Availability: If there were a natural catastrophe such as a
fire or an earthquake, not all of the data would be destroyed, because it is stored at
different locations.
Even if some of the data nodes go offline, the rest of the database can
continue its normal functions.
DISADVANTAGES OF DISTRIBUTED DATABASE
• Costly Software: Maintaining a distributed database is costly because ensuring
data transparency and coordination across multiple sites requires expensive
software.
• Large Overhead: Many operations on multiple sites require complex and
numerous calculations, causing a lot of processing overhead.
• Improper Data Distribution: If data is not properly distributed across different
sites, then responsiveness to user requests is affected. This in turn increases
the response time.
• Security Issues: It is difficult to provide security in a distributed database as
the database needs to be secured at all the locations it is stored. Moreover, the
infrastructure connecting all the nodes in a distributed database also needs to
be secured.
• Data Integrity: It is difficult to maintain data integrity in a distributed database
because of its nature. There can also be data redundancy in the database, as data is
stored at multiple locations.

DESIGN OF DISTRIBUTED DATABASES


Designing a distributed database involves two key techniques:
• Replication.
• Fragmentation.
Replication
As the name suggests, the system stores copies of data at different sites. If an entire
database is available on multiple sites, it is a fully redundant database.
The advantage of data replication is that it increases the availability of data at
different sites. Because the data is available at several sites, queries can be
processed in parallel.
However, data replication has some disadvantages as well. Data needs to be
constantly updated and synchronized with the other sites; if any site fails to do so,
the database becomes inconsistent. Constant updating also complicates concurrency
control and adds overhead for the servers.
In short, replication greatly benefits data availability, at the cost of keeping the
copies consistent.
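A minimal sketch of full replication, assuming each site is just an in-memory dictionary: every write is applied at all live sites, and a read can be served by any site that is still up. The `ReplicatedTable` class and the site names are hypothetical. The loop over all sites on every write is the synchronization overhead described above, and a site that misses a write while it is down comes back holding stale data.

```python
class ReplicatedTable:
    """Fully replicated key-value table: every site holds a full copy (illustrative)."""

    def __init__(self, site_names):
        self.copies = {name: {} for name in site_names}
        self.up = {name: True for name in site_names}

    def write(self, key, value) -> None:
        # Every write must be applied at every live site (update overhead).
        for name, copy in self.copies.items():
            if self.up[name]:
                copy[key] = value

    def read(self, key):
        # Any live site can answer, so reads survive single-site failures.
        for name, copy in self.copies.items():
            if self.up[name]:
                return copy.get(key)
        raise RuntimeError("no site is available")


t = ReplicatedTable(["site1", "site2", "site3"])
t.write("seat_12A", "free")
t.up["site1"] = False                # site1 fails
print(t.read("seat_12A"))            # free  (served by site2)
t.write("seat_12A", "booked")        # site1 misses this update
t.up["site1"] = True                 # site1 recovers without resynchronizing
print(t.copies["site1"]["seat_12A"], t.copies["site2"]["seat_12A"])  # free booked
```

The last line shows the inconsistency risk: until site1 is resynchronized, different copies disagree.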

Note: DB-1 replicates on different sites, creating copies of the same data.
Fragmentation
In fragmentation, the relations are fragmented, which means they are split into smaller
parts. Each fragment is stored at the site where it is required. The data is not
replicated, and no copies are created, so consistency is not a problem: each site
holds a different piece of information. Fragmentation therefore greatly benefits data
consistency.
The prerequisite for fragmentation is to make sure that the fragments can later be
reconstructed into the original relation without losing any data.
There are two types of fragmentation:
Horizontal Fragmentation – Splitting by rows.
Vertical fragmentation – Splitting by columns.
Horizontal Fragmentation (or Sharding)
The relation is split into groups of rows, and each group is then
assigned to one fragment.
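As an illustration, assume a hypothetical `bookings` relation held as a list of Python dictionaries. Horizontal fragmentation assigns each row to exactly one fragment by a predicate (here, the value of the `region` column), and the original relation is rebuilt by taking the union of the fragments.

```python
bookings = [  # hypothetical sample rows
    {"id": 1, "passenger": "Asha", "region": "north"},
    {"id": 2, "passenger": "Ravi", "region": "south"},
    {"id": 3, "passenger": "Meera", "region": "north"},
]


def horizontal_fragments(rows, column):
    """Split rows into fragments keyed by the value of `column` (one fragment per site)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def reconstruct_horizontal(fragments):
    """Rebuild the relation as the union of all fragments, ordered by id."""
    rows = [row for fragment in fragments.values() for row in fragment]
    return sorted(rows, key=lambda r: r["id"])


frags = horizontal_fragments(bookings, "region")
print({site: [r["id"] for r in rows] for site, rows in frags.items()})
# {'north': [1, 3], 'south': [2]}
assert reconstruct_horizontal(frags) == bookings   # no rows lost
```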
Vertical Fragmentation
The relation schema is split into groups of columns, called smaller schemas.
These smaller schemas are then assigned to separate fragments.
Each fragment must contain a common candidate key to guarantee a lossless join.
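The same idea for vertical fragmentation, again on hypothetical data: each fragment keeps the candidate key `id` plus a subset of the columns, and joining the fragments on `id` restores the original rows without loss. The function names are illustrative, not a library API.

```python
passengers = [  # hypothetical sample rows
    {"id": 1, "name": "Asha", "age": 34, "seat": "12A"},
    {"id": 2, "name": "Ravi", "age": 41, "seat": "14C"},
]


def vertical_fragment(rows, key, columns):
    """Project rows onto the key plus the given columns."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def reconstruct_vertical(frag1, frag2, key):
    """Join two vertical fragments on their shared key."""
    by_key = {row[key]: row for row in frag2}
    return [{**row, **by_key[row[key]]} for row in frag1 if row[key] in by_key]


personal = vertical_fragment(passengers, "id", ["name", "age"])  # e.g. stored at site 1
travel = vertical_fragment(passengers, "id", ["seat"])           # e.g. stored at site 2
print(travel)  # [{'id': 1, 'seat': '12A'}, {'id': 2, 'seat': '14C'}]
assert reconstruct_vertical(personal, travel, "id") == passengers   # lossless join
```

Dropping `id` from either fragment would leave nothing to join on, which is why the shared candidate key is required.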

ACID Properties in DBMS


ACID properties are a set of properties that guarantee reliable processing of
transactions in a database management system (DBMS). Transactions are a
sequence of database operations that are executed as a single unit of work, and the
ACID properties ensure that transactions are processed reliably and consistently in a
DBMS.

The Atomicity property ensures that a transaction is either executed completely or
not at all.
The Consistency property ensures that the database remains in a consistent state
before and after a transaction.
The Isolation property ensures that multiple transactions can run concurrently without
interfering with each other.
The Durability property ensures that the results of a committed transaction are
permanent and cannot be lost due to system failure.
Together, these properties ensure that transactions are processed reliably and
consistently in a DBMS, which is essential for the integrity and accuracy of data in a
database.
1. Atomicity in DBMS
Atomicity is the ACID property in DBMS that treats a transaction as an indivisible
(atomic) unit. If any operation on the data is conducted, it should either be executed
completely or not at all; it should not be interrupted or left half completed. When
performing operations in a transaction, the operations should be completed totally
rather than partially. If any of the operations is not completed fully, the whole
transaction is aborted.
Example: Sometimes a current operation is running when an operation with a higher
priority arrives. This interrupts the current operation, and the current operation is
aborted.

Consider two users who simultaneously try to book the only available seat on a train.
The first user whose booking completes reserves the seat and receives a notification.
According to atomicity, the second user's transaction is rolled back as a whole, so
none of its changes are applied, and that user is notified that no more seats are
available.

In a simpler example, a person tries to book a ticket, selects a seat, and proceeds
to the payment gateway, but the payment fails because of bank server issues. The
selected seat will not be reserved for them. A complete transaction involves reserving
the seat and completing the payment. If any step fails, the operation is aborted, and
the user is brought back to the initial state without their seat being reserved.
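The seat-and-payment example can be sketched with Python's built-in `sqlite3` module. The `seats` and `payments` tables, the `book_seat` function, and the `PaymentError` exception are hypothetical. Both steps run inside one transaction (the connection's `with` block), so a simulated payment failure also undoes the seat reservation.

```python
import sqlite3


class PaymentError(Exception):
    """Hypothetical error standing in for the bank server failing."""


def book_seat(conn, seat_no, passenger, payment_ok):
    """Reserve a seat and record the payment as one all-or-nothing transaction."""
    try:
        with conn:  # commit if the block finishes, roll back if an exception escapes
            conn.execute(
                "UPDATE seats SET booked_by = ? WHERE seat_no = ?", (passenger, seat_no)
            )
            if not payment_ok:
                raise PaymentError("bank server unavailable")
            conn.execute(
                "INSERT INTO payments (seat_no, passenger) VALUES (?, ?)",
                (seat_no, passenger),
            )
        return True
    except PaymentError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat_no TEXT PRIMARY KEY, booked_by TEXT)")
conn.execute("CREATE TABLE payments (seat_no TEXT, passenger TEXT)")
conn.execute("INSERT INTO seats VALUES ('12A', NULL)")
conn.commit()

print(book_seat(conn, "12A", "asha", payment_ok=False))        # False
print(conn.execute("SELECT booked_by FROM seats").fetchone())  # (None,): not reserved
print(book_seat(conn, "12A", "asha", payment_ok=True))         # True
print(conn.execute("SELECT booked_by FROM seats").fetchone())  # ('asha',)
```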

Atomicity in DBMS is often referred to as the ‘all or nothing’ rule.

2. Consistency in DBMS
In the train example, this ACID property verifies that the number of seats left in the
train + the number of seats booked by users = the total number of seats in the train.
After each transaction, consistency is checked to ensure nothing has gone wrong.

Example: Consider a person who is trying to book a ticket. They are able to reserve
their seat, but their payment does not go through because of bank issues. In this case,
their transaction is rolled back. But doing only that is not sufficient: the number of
available seats must also be updated. If it is not, there will be an inconsistency in
which the seat given up by the person is not accounted for, and the number of seats left
in the train + the number of seats booked by users would no longer equal the total number
of seats in the train.
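A minimal sketch of checking the seat invariant inside a transaction, again with Python's `sqlite3` module. The `train` and `bookings` tables, the capacity of 3, and the `update_counter` flag (which simulates forgetting to update the seat count) are assumptions for illustration; real systems often express such rules as constraints or triggers instead of application code.

```python
import sqlite3

TOTAL_SEATS = 3  # hypothetical train capacity


class ConsistencyError(Exception):
    """Raised when a transaction would leave the seat counts unbalanced."""


def invariant_holds(conn):
    """Seats left + seats booked must equal the total number of seats."""
    (left,) = conn.execute("SELECT seats_left FROM train").fetchone()
    (booked,) = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()
    return left + booked == TOTAL_SEATS


def book(conn, passenger, update_counter=True):
    """Record a booking; update_counter=False simulates forgetting to update seats_left."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("INSERT INTO bookings (passenger) VALUES (?)", (passenger,))
            if update_counter:
                conn.execute("UPDATE train SET seats_left = seats_left - 1")
            if not invariant_holds(conn):
                raise ConsistencyError("seats left + seats booked != total seats")
        return True
    except ConsistencyError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (seats_left INTEGER)")
conn.execute("CREATE TABLE bookings (passenger TEXT)")
conn.execute("INSERT INTO train VALUES (?)", (TOTAL_SEATS,))
conn.commit()

print(book(conn, "asha"))                        # True
print(book(conn, "ravi", update_counter=False))  # False: rolled back
print(invariant_holds(conn))                     # True
```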
3. Isolation in DBMS
Isolation is defined as a state of separation. Isolation is the ACID property in DBMS
which ensures that many transactions can take place at the same time without one
transaction's data affecting another. In other words, the result of running
transactions concurrently should be the same as if they had run one after another:
the second transaction should see the database only in the state left by a finished
first transaction, not in some intermediate state. When two or more transactions
occur at the same time, consistency must still be maintained. Any modifications made
in one transaction are not visible to other transactions until the change is committed.

Example: Suppose two people try to book the same seat simultaneously. The transactions
are serialized to maintain data consistency. The first person's transaction succeeds,
and they receive a ticket. The second person's transaction fails because the seat is
already booked, and they receive an error message indicating no available seats.

4. Durability in DBMS
The ACID property durability in DBMS means that once a transaction is completed
successfully (committed), its changes are stored permanently on disk. The database's
durability should be such that even if the system fails or crashes, committed data
survives. If a failure does occur, the recovery manager is responsible for restoring
the database to a correct state that includes every committed change. Every time we
make a change, we must use the COMMIT command to make it permanent.
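A small `sqlite3` sketch of the COMMIT boundary, assuming a hypothetical `bookings` table in a temporary file. Closing the connection without committing stands in for a crash: in Python's `sqlite3` module, `close()` does not commit, so only the committed booking is still there when the file is reopened.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bookings.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE bookings (passenger TEXT, seat_no TEXT)")
conn.execute("INSERT INTO bookings VALUES ('asha', '12A')")
conn.commit()                       # durable from this point on
conn.execute("INSERT INTO bookings VALUES ('ravi', '14C')")
conn.close()                        # closed without COMMIT: this insert is discarded

# Simulate a restart by opening a brand-new connection to the same file.
reopened = sqlite3.connect(path)
print(reopened.execute("SELECT passenger FROM bookings").fetchall())  # [('asha',)]
reopened.close()
```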

Example: Suppose a system failure in the railway management system resulted in the
loss of all booked train details. Millions of users who had paid for their seats would
be unable to board the train, causing significant financial losses and eroding trust
in the company. The situation would be particularly critical when these trains are
needed for important reasons, causing widespread panic and inconvenience. Durability
exists to prevent exactly this kind of loss.
