
Chapter 7 - Distributed Database System

This document discusses distributed databases and client-server architectures. It covers distributed database concepts, data fragmentation and replication, types of distributed database systems including homogeneous and heterogeneous, query processing in distributed databases including issues around data transfer costs, and concurrency control and recovery challenges. It also outlines a 3-tier client-server architecture.

Uploaded by

ajmelcosc0340


Chapter Seven

Distributed Databases and Client-Server Architectures

Outline

1. Distributed Database Concepts

2. Data Fragmentation, Replication and Allocation
3. Types of Distributed Database Systems
4. Query Processing
5. Concurrency Control and Recovery
6. 3-Tier Client-Server Architecture

1. Distributed Database Concepts
• A transaction can be executed by multiple networked computers in a unified manner.
• A distributed database (DDB) processes a unit of execution (a transaction) in a distributed manner.
• Definitions:
– A distributed database (DDB) is a collection of multiple logically interrelated databases distributed over a computer network.
– A distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user.
– Distribution transparency: the physical placement of data (files, relations, etc.) is not known to the user.

• The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored, with possible replication, at different sites, as shown in the accompanying figure.

Remark:
• Each site:
– has its own DBMS;
– stores fragments (replicated or unique);
– is linked to the other sites by a network;
– can handle local users;
– participates in executing at least one global request.
• Advantages of a DDB:
i. Distribution and network transparency:
• Users do not have to worry about the operational details of the network.
– Location transparency: a command works the same way regardless of where the data is stored and of the site from which the command is issued.
– Naming transparency: a named object (file, relation, etc.) can be accessed by its name from any location, without specifying where it is stored.
ii. Replication transparency:
• Copies of data can be stored at multiple sites (as in the EMPLOYEE, PROJECT, and WORKS_ON example above) without the user being aware of the copies.
• This is done to minimize access time to the required data.
iii. Fragmentation transparency:
• A relation can be fragmented horizontally (a subset of its rows) or vertically (a subset of its columns) without the user being aware of the fragments.
iv. Increased reliability and availability:
• Reliability is the probability that the system is running (not down) at a given point in time. Availability is the probability that the system is continuously available (usable or accessible) during a time interval.
• A distributed database system has multiple nodes (computers); if one fails, the others remain available to do the job.
v. Improved performance:
• A distributed DBMS fragments the database to keep data closer to where it is needed most.
• This can significantly reduce data management (access and modification) time.
vi. Easier expansion (scalability):
• New nodes (computers) can be added at any time without changing the entire configuration.
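The transparency properties above can be illustrated with a small sketch. It assumes a hypothetical global catalog (`CATALOG`) that maps each relation to its fragments and each fragment to the sites holding a copy; the fragment names, sites, and rows are invented for illustration and are not part of any real DDBMS API. The caller names only the relation, never a site or fragment.

```python
# Hypothetical global catalog: relation -> fragment -> sites that hold a copy.
CATALOG = {
    "EMPLOYEE": {
        "EMP_D5": ["site1", "site2"],  # fragment replicated at two sites
        "EMP_D4": ["site3"],           # fragment stored at one site only
    },
}

# Hypothetical local stores: site -> fragment -> rows.
_D5_ROWS = [{"Lname": "Smith", "Dno": 5}, {"Lname": "Wong", "Dno": 5}]
LOCAL_DATA = {
    "site1": {"EMP_D5": list(_D5_ROWS)},
    "site2": {"EMP_D5": list(_D5_ROWS)},
    "site3": {"EMP_D4": [{"Lname": "Zelaya", "Dno": 4}]},
}


def read_relation(name, issuing_site):
    """Return all rows of a relation; the caller never names a site or fragment."""
    rows = []
    for fragment, sites in CATALOG[name].items():
        # Replication transparency: prefer a local copy, else the first listed site.
        source = issuing_site if issuing_site in sites else sites[0]
        # Fragmentation transparency: fragments are combined (union) here.
        rows.extend(LOCAL_DATA[source][fragment])
    return rows


# Location transparency: the same request returns the same rows from any site.
for site in ("site1", "site2", "site3"):
    print(site, sorted(r["Lname"] for r in read_relation("EMPLOYEE", site)))
```

In a real DDBMS the catalog itself is distributed and possibly replicated; keeping it as one dictionary is a simplification.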
• Disadvantages of a distributed database:
i. Complexity: data replication, failure recovery, network management, etc. make the system more complex than a centralized DBMS.
ii. Cost: since a DDBMS needs more people and more hardware, maintaining and running the system can be more expensive than running a centralized system.
iii. Problem of connecting dissimilar machines: additional layers of operating system software are needed to translate and coordinate the flow of data between machines.
iv. Data integrity and security problems: because data maintained by a distributed system can be accessed at many locations in the network, controlling the integrity of the database can be difficult.
2. Data Replication and Fragmentation: Distributed Data Storage
• There are two approaches to storing a relation in a distributed database: replication and fragmentation.
I. Data Replication
• The system maintains several identical copies of the relation and stores each copy at a different site.
• In general, replication improves the performance of read operations and increases the availability of data to read-only transactions. However, update transactions incur greater overhead, because every copy must be updated.
II. Data Fragmentation
– The relation is split into logically related and correct parts (fragments).
– The main reasons for fragmenting a relation are:
• Efficiency: data that is not needed by the local applications is not stored locally.
• Parallelism: a transaction can be divided into several subqueries that operate on fragments, which increases the degree of concurrency.
– However, reconstructing the whole relation requires accessing data from all sites that contain part of the relation.
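The read/update trade-off of replication can be sketched with a read-one/write-all policy, one simple policy chosen here for illustration (the slides do not name a specific one). `ReplicatedRelation` and its access counter are hypothetical; counting site accesses is a rough stand-in for cost.

```python
class ReplicatedRelation:
    """A relation with one identical copy per site (read-one/write-all policy)."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}  # site -> {key: row}
        self.site_accesses = 0                       # rough cost measure

    def read(self, key, site):
        """A read touches only the copy at the given site."""
        self.site_accesses += 1
        return self.copies[site].get(key)

    def write(self, key, row):
        """An update must be applied to every copy to keep them identical."""
        for copy in self.copies.values():
            copy[key] = dict(row)
            self.site_accesses += 1


rel = ReplicatedRelation(["site1", "site2", "site3"])
rel.write("123", {"Lname": "Smith"})   # 3 site accesses (one per copy)
print(rel.read("123", "site2"))        # 1 site access
print(rel.site_accesses)               # 4
```

With n replicas, a read costs one access while an update costs n, which is why replication favors read-mostly workloads.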
• A relation can be fragmented in two ways:
• Horizontal fragmentation
• A horizontal fragment is a subset of the rows of a relation: those rows that satisfy a selection condition.
• For example, applying the selection condition (DNO = 5) to the EMPLOYEE relation yields the subset of rows that satisfy it; this subset is a horizontal fragment of EMPLOYEE.
• A selection condition may be composed of several conditions connected by AND or OR.
• Vertical fragmentation
• A vertical fragment is a subset of the columns of a relation; it contains only the values of the selected columns.
– For example, a vertical fragment of EMPLOYEE can be created by keeping the values of Name, Bdate, Sex, and Address.
– Because a vertical fragment is not defined by a row condition, each vertical fragment must also include the primary key attribute of the parent relation (SSN for EMPLOYEE). The shared key is what connects all vertical fragments of a relation, so they can be joined back together.
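Both kinds of fragmentation, and how each is undone, can be sketched on a few hypothetical EMPLOYEE rows (only some columns are kept for brevity). A horizontal fragment is a selection and is reconstructed by union; a vertical fragment is a projection that keeps SSN and is reconstructed by a join on SSN.

```python
from operator import itemgetter

# Hypothetical sample rows of EMPLOYEE (SSN is the primary key; some columns omitted).
EMPLOYEE = [
    {"SSN": "123", "Name": "Smith", "Bdate": "1965-01-09", "Dno": 5, "Salary": 30000},
    {"SSN": "333", "Name": "Wong", "Bdate": "1955-12-08", "Dno": 5, "Salary": 40000},
    {"SSN": "999", "Name": "Zelaya", "Bdate": "1968-07-19", "Dno": 4, "Salary": 25000},
]


def horizontal_fragment(rows, condition):
    """Selection: keep the rows that satisfy the condition."""
    return [dict(r) for r in rows if condition(r)]


def vertical_fragment(rows, columns, key="SSN"):
    """Projection: keep the chosen columns plus the primary key."""
    cols = [key] + [c for c in columns if c != key]
    return [{c: r[c] for c in cols} for r in rows]


def join_on_key(left, right, key="SSN"):
    """Join two vertical fragments on the primary key."""
    by_key = {r[key]: r for r in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


# Horizontal fragments: DNO = 5 and its complement; reconstruct with UNION.
emp_d5 = horizontal_fragment(EMPLOYEE, lambda r: r["Dno"] == 5)
emp_rest = horizontal_fragment(EMPLOYEE, lambda r: r["Dno"] != 5)
rebuilt_h = emp_d5 + emp_rest

# Vertical fragments that share SSN; reconstruct with a join on SSN.
emp_personal = vertical_fragment(EMPLOYEE, ["Name", "Bdate"])
emp_work = vertical_fragment(EMPLOYEE, ["Dno", "Salary"])
rebuilt_v = join_on_key(emp_personal, emp_work)

by_ssn = itemgetter("SSN")
print(sorted(rebuilt_h, key=by_ssn) == sorted(EMPLOYEE, key=by_ssn))  # True
print(sorted(rebuilt_v, key=by_ssn) == sorted(EMPLOYEE, key=by_ssn))  # True
```

Dropping SSN from either vertical fragment would make `join_on_key` impossible, which is the reason the primary key is repeated.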
• Correctness rules
• Three rules must be followed during fragmentation:
– Completeness: if a relation r is decomposed into fragments r1, r2, …, rn, each data item that can be found in r must appear in at least one fragment ri.
– Reconstruction: it must be possible to define a relational operation that reconstructs r from its fragments (a union of horizontal fragments, or a join of vertical fragments on the primary key).
– Disjointness: if a data item di appears in fragment ri, it should not appear in any other fragment. (For vertical fragments, the primary key repeated in every fragment is the exception.)
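For horizontal fragments, the three rules can be checked mechanically. The sketch below treats rows as set elements; the function names and sample data are hypothetical.

```python
def as_set(rows):
    """Turn rows (dicts) into hashable tuples so they can be compared as sets."""
    return {tuple(sorted(r.items())) for r in rows}


def check_horizontal(relation, fragments):
    """Check completeness, reconstruction and disjointness for horizontal fragments."""
    r = as_set(relation)
    frags = [as_set(f) for f in fragments]
    union = set().union(*frags)
    complete = r <= union                 # every row of r is in some fragment
    reconstructible = union == r          # UNION of the fragments gives back r exactly
    disjoint = sum(len(f) for f in frags) == len(union)  # no row in two fragments
    return complete, reconstructible, disjoint


EMPLOYEE = [{"SSN": "123", "Dno": 5}, {"SSN": "333", "Dno": 5}, {"SSN": "999", "Dno": 4}]
d5 = [r for r in EMPLOYEE if r["Dno"] == 5]
d4 = [r for r in EMPLOYEE if r["Dno"] == 4]
ge4 = [r for r in EMPLOYEE if r["Dno"] >= 4]

print(check_horizontal(EMPLOYEE, [d5, d4]))   # (True, True, True)
print(check_horizontal(EMPLOYEE, [d5]))       # (False, False, True): row 999 is lost
print(check_horizontal(EMPLOYEE, [d5, ge4]))  # (True, True, False): fragments overlap
```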
3. Types of Distributed Database Systems

• Homogeneous
– All sites of the database system have an identical setup, i.e., the same database system software.
– The system may have little or no local autonomy.
– The underlying operating systems can be a mixture of Linux, Windows, Unix, etc.

[Figure: a homogeneous DDBS with five sites (Site 1 to Site 5), all running Oracle on Windows, Unix, or Linux, connected by a communications network.]
• Heterogeneous
– At least one of the databases must be from a different vendor. There are two variants:
– Federated: each site may run a different database system, but data access is managed through a single conceptual schema.
• This implies that the degree of local autonomy is minimal: each site must adhere to a centralized access policy. There may be a global schema.
– Multidatabase: there is no single conceptual global schema. For data access, a schema is constructed dynamically as needed by the application software.

[Figure: a heterogeneous DDBS with five sites running relational, object-oriented, hierarchical, and network DBMSs on Unix, Windows, and Linux, connected by a communications network.]
4. Query Processing in Distributed Databases
• Issues
– Cost of transferring data (files and results) over the network.
• This cost is usually high, so some optimization is necessary.
• Example: suppose there are three sites: the relation EMPLOYEE is at site 1, DEPARTMENT is at site 2, and there is no relation at site 3.
– EMPLOYEE at site 1: 10,000 rows; row size = 100 bytes; table size = 1,000,000 bytes.
– DEPARTMENT at site 2: 100 rows; row size = 35 bytes; table size = 3,500 bytes.
– A query is initiated at site 3 to retrieve each employee's first name (15 bytes), last name (15 bytes), and department name (10 bytes), a total of 40 bytes per result row.
• Q: For each employee, retrieve the employee's Fname and Lname and the department name.
• Q: $\pi_{\text{Fname, Lname, Dname}}(\text{EMPLOYEE} \bowtie_{\text{Dno} = \text{Dnumber}} \text{DEPARTMENT})$
EMPLOYEE(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)
DEPARTMENT(Dname, Dnumber, Mgrssn, Mgrstartdate)

• Assumptions
– The result of this query will have 10,000 rows, assuming that every employee is related to a department.
– Each result row is 40 bytes long. The query is submitted at site 3, and the result is sent to this site.
– Problem: the EMPLOYEE and DEPARTMENT relations are not present at site 3.

• Which strategy minimizes the data transfer cost?

• Strategies for minimizing data transfer:
1. Transfer EMPLOYEE and DEPARTMENT to site 3 and perform the join there.
• Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes.
2. Transfer EMPLOYEE to site 2, execute the join at site 2, and send the result to site 3.
• Transfer of EMPLOYEE from site 1 to site 2: 1,000,000 bytes.
• Query result size = 40 × 10,000 = 400,000 bytes.
• Total bytes transferred = 1,000,000 + 400,000 = 1,400,000 bytes.
3. Transfer DEPARTMENT to site 1, execute the join at site 1, and send the result to site 3.
• Transfer of DEPARTMENT from site 2 to site 1: 3,500 bytes.
• Query result size = 40 × 10,000 = 400,000 bytes.
• Total bytes transferred = 3,500 + 400,000 = 403,500 bytes.
– Preferred approach: strategy 3, which transfers the fewest bytes.
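The arithmetic for the three strategies can be reproduced directly from the sizes given above; the strategy labels are just descriptive strings.

```python
# Sizes taken from the example above, in bytes.
EMPLOYEE_BYTES = 10_000 * 100    # 1,000,000 bytes, stored at site 1
DEPARTMENT_BYTES = 100 * 35      # 3,500 bytes, stored at site 2
RESULT_BYTES = 10_000 * 40       # 400,000 bytes, the result of Q

strategies = {
    "1: ship both relations to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
    "2: join at site 2, ship result to site 3": EMPLOYEE_BYTES + RESULT_BYTES,
    "3: join at site 1, ship result to site 3": DEPARTMENT_BYTES + RESULT_BYTES,
}

for name, cost in strategies.items():
    print(f"{name}: {cost:,} bytes")
print("Preferred:", min(strategies, key=strategies.get))
```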

Example 2: Consider the query
– Q′: For each department, retrieve the department name and the Fname and Lname of the department manager.
• Relational algebra expression:
– $\pi_{\text{Fname, Lname, Dname}}(\text{EMPLOYEE} \bowtie_{\text{Mgrssn} = \text{SSN}} \text{DEPARTMENT})$
• The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are:
1. Transfer EMPLOYEE and DEPARTMENT to the result site (site 3) and perform the join there.
• Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes.
2. Transfer EMPLOYEE to site 2, execute the join at site 2, and send the result to site 3.
• Site 1 to site 2: 1,000,000 bytes.
• Site 2 to site 3: query result size = 40 × 100 = 4,000 bytes.
• Total bytes transferred = 1,000,000 + 4,000 = 1,004,000 bytes.
3. Transfer DEPARTMENT to site 1, execute the join at site 1, and send the result to site 3.
• Total bytes transferred = 3,500 + 4,000 = 7,500 bytes.
Preferred strategy: strategy 3.
Example 3: Now suppose the result is needed at site 2. Possible strategies:
1. Transfer EMPLOYEE to site 2, execute the query there, and present the result to the user at site 2.
• Total bytes transferred = 1,000,000 bytes for both Q and Q′.
2. Transfer DEPARTMENT to site 1, execute the join at site 1, and send the result back to site 2.
• Total bytes transferred:
– Q: 400,000 + 3,500 = 403,500 bytes
– Q′: 4,000 + 3,500 = 7,500 bytes

• Preferred strategy: strategy 2.
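All three examples follow one rule: ship every relation that is not at the join site, then ship the result unless it is produced where it is needed. The sketch below enumerates the join site under that rule, for any result site. Like the examples, it only considers shipping whole relations to a single join site. The function names are hypothetical.

```python
# Where each relation is stored and its size in bytes (from the example).
LOCATION = {"EMPLOYEE": 1, "DEPARTMENT": 2}
SIZE = {"EMPLOYEE": 1_000_000, "DEPARTMENT": 3_500}


def transfer_cost(join_site, result_site, result_bytes):
    """Bytes shipped when the join runs at join_site and the answer goes to result_site."""
    # Ship every relation that is not already stored at the join site.
    cost = sum(SIZE[rel] for rel, site in LOCATION.items() if site != join_site)
    # Ship the result unless it is produced where it is needed.
    if join_site != result_site:
        cost += result_bytes
    return cost


def best_join_site(result_site, result_bytes, sites=(1, 2, 3)):
    """Return the cheapest join site and the cost of every candidate."""
    costs = {s: transfer_cost(s, result_site, result_bytes) for s in sites}
    return min(costs, key=costs.get), costs


for query, result_bytes in (("Q", 40 * 10_000), ("Q'", 40 * 100)):
    for result_site in (3, 2):
        site, costs = best_join_site(result_site, result_bytes)
        print(f"{query} at site {result_site}: join at site {site}, costs {costs}")
```

For every case in Examples 1 to 3 this picks site 1 (ship DEPARTMENT, the smaller relation), matching the preferred strategies above.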

5. Concurrency Control and Recovery
• Distributed databases encounter a number of concurrency control and recovery problems that are not present in centralized databases. Some of them are listed below.

– Dealing with multiple copies of data items:
• The concurrency control method must maintain global consistency. Likewise, the recovery mechanism must recover all copies and maintain consistency after recovery.
– Failure of individual sites:
• Database availability must not be affected by the failure of one or two sites, and the recovery scheme must recover failed sites before they become available for use again.

– Communication link failure:
• This failure may create a network partition, which would affect database availability even though all database sites may be running.
– Distributed commit:
• A transaction may be fragmented, and its parts may be executed by a number of sites. This requires a two-phase or three-phase commit protocol for transaction commit.
– Distributed deadlock:
• Since transactions are processed at multiple sites, two or more sites may become involved in a deadlock. This must be resolved in a distributed manner.
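The distributed commit problem is commonly handled with two-phase commit. Below is a minimal in-process sketch of the coordinator's logic. `Participant` and `two_phase_commit` are hypothetical names, and logging, time-outs, and site failures during the protocol are deliberately left out.

```python
class Participant:
    """A site that executes part of a distributed transaction (simulated)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether the local part can be made durable
        self.state = "active"

    def prepare(self):
        """Phase 1: vote YES (True) or NO (False)."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit globally only if every participant votes YES."""
    # Phase 1 (voting): ask every participant to prepare and collect all votes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): send the same global decision to every participant.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.abort()
    return "abort"


print(two_phase_commit([Participant("site1"), Participant("site2")]))  # commit
print(two_phase_commit([Participant("site1"), Participant("site2", can_commit=False)]))  # abort
```

A single NO vote forces a global abort, so no site commits a part of a transaction that another site has rolled back.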

5.1 Distributed Concurrency Control
i. Primary site technique: a single site is designated as the primary site, which serves as the coordinator for transaction management.

[Figure: five sites (Site 1 to Site 5) connected by a communications network, one of which is designated the primary site.]

• Transaction management:
– Concurrency control and commit are managed by this site.
– In two-phase locking, this site manages the locking and releasing of data items. If all transactions follow the two-phase policy at all sites, serializability is guaranteed.

– Advantages:
• It is an extension of centralized two-phase locking, so implementation and management are simple.
• Data items are locked at only one site, but they can be accessed at any site.
– Disadvantages:
• All transaction management activities go to the primary site, which is likely to overload it.
• If the primary site fails, the entire system is inaccessible.
– To aid recovery, a backup site is designated, which behaves as a shadow of the primary site. If the primary site fails, the backup site can act as the primary site.
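A minimal sketch of the single lock table that the primary site would keep, assuming shared (read) and exclusive (write) locks and release of all locks at the end of a transaction (strict two-phase locking). The class is hypothetical, and sites sending requests are simulated by direct method calls.

```python
class PrimarySiteLockManager:
    """The one lock table, kept at the primary (coordinator) site."""

    def __init__(self):
        self.shared = {}     # item -> set of transactions holding a read lock
        self.exclusive = {}  # item -> transaction holding the write lock

    def read_lock(self, txn, item):
        holder = self.exclusive.get(item)
        if holder is not None and holder != txn:
            return False  # conflict: the requesting site must wait or abort
        self.shared.setdefault(item, set()).add(txn)
        return True

    def write_lock(self, txn, item):
        readers = self.shared.get(item, set()) - {txn}
        holder = self.exclusive.get(item)
        if readers or (holder is not None and holder != txn):
            return False
        self.exclusive[item] = txn
        return True

    def release_all(self, txn):
        """Shrinking phase: release every lock held by txn (at commit or abort)."""
        for holders in self.shared.values():
            holders.discard(txn)
        self.exclusive = {i: t for i, t in self.exclusive.items() if t != txn}


primary = PrimarySiteLockManager()
print(primary.read_lock("T1", "X"))   # True: request sent from, say, site 3
print(primary.write_lock("T2", "X"))  # False: T1 still holds a read lock
primary.release_all("T1")
print(primary.write_lock("T2", "X"))  # True
```

Every request in the whole system goes through this one object, which illustrates both the simplicity and the bottleneck listed above.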
ii. Primary copy technique:
– In this approach, instead of designating one site for all locking, one copy of each data item (or data partition) is designated as its primary copy. To lock a data item, only the primary copy of that data item is locked.
• Advantages:
– Since primary copies are distributed across various sites, no single site is overloaded with locking and unlocking requests.
• Disadvantages:
– Identifying the primary copy of an item is complex. A distributed directory must be maintained, possibly at all sites.
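The sketch below shows how a lock request is routed to the site holding an item's primary copy through a directory. `DIRECTORY`, `SiteLockTable`, and `lock_item` are hypothetical, and only exclusive locks are modeled to keep it short.

```python
# Hypothetical directory (replicated at every site): item -> site of its primary copy.
DIRECTORY = {"X": "site1", "Y": "site2", "Z": "site3"}


class SiteLockTable:
    """Exclusive locks for the primary copies stored at one site (simplified)."""

    def __init__(self):
        self.locks = {}    # item -> transaction holding the lock
        self.requests = 0  # how many lock requests this site has handled

    def lock(self, txn, item):
        self.requests += 1
        if self.locks.get(item, txn) != txn:
            return False   # another transaction holds the primary copy's lock
        self.locks[item] = txn
        return True


LOCK_TABLES = {site: SiteLockTable() for site in sorted(set(DIRECTORY.values()))}


def lock_item(txn, item):
    """Look up the primary copy's site, then lock only that copy."""
    site = DIRECTORY[item]
    return site, LOCK_TABLES[site].lock(txn, item)


print(lock_item("T1", "X"))  # ('site1', True)
print(lock_item("T2", "X"))  # ('site1', False)
print(lock_item("T2", "Y"))  # ('site2', True)
print({s: t.requests for s, t in LOCK_TABLES.items()})  # load is spread over sites
```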

Recovery from a coordinator failure
• In both approaches, a coordinator site or copy may become unavailable. This requires the selection of a new coordinator.
– Primary site approach with no backup site:
• Abort and restart all active transactions at all sites, elect a new coordinator, and resume transaction processing.
– Primary site approach with a backup site:
• Suspend all active transactions, designate the backup site as the new primary site, and identify a new backup site.
• The new primary site receives all transaction management information needed to resume processing.
– Both primary and backup sites fail, or there is no backup site:
• Use an election process to select a new coordinator site.
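The slides do not specify an election algorithm. The sketch below uses a simplified bully election (the highest-numbered live site wins) as one common choice, and assumes the new backup is the highest-numbered remaining live site. Aborting or suspending active transactions is not modeled. All names are hypothetical.

```python
def bully_election(initiator, sites, alive):
    """Simplified bully election: the highest-numbered live site becomes coordinator."""
    higher = [s for s in sites if s > initiator and s in alive]
    if not higher:
        return initiator  # no higher site answered, so the initiator wins
    # A live higher site answers and takes over the election.
    return bully_election(min(higher), sites, alive)


def recover_coordinator(backup, sites, alive, initiator):
    """Choose a new primary site and a new backup site after the primary fails."""
    if backup is not None and backup in alive:
        new_primary = backup  # case: the backup site takes over
    else:
        new_primary = bully_election(initiator, sites, alive)  # case: election
    # Assumption: the highest-numbered remaining live site becomes the new backup.
    others = [s for s in alive if s != new_primary]
    new_backup = max(others) if others else None
    return new_primary, new_backup


SITES = [1, 2, 3, 4, 5]  # site 5 was the primary and has failed
print(recover_coordinator(backup=4, sites=SITES, alive={1, 2, 3, 4}, initiator=1))     # (4, 3)
print(recover_coordinator(backup=4, sites=SITES, alive={1, 2, 3}, initiator=1))        # (3, 2)
print(recover_coordinator(backup=None, sites=SITES, alive={1, 2, 3, 4}, initiator=2))  # (4, 3)
```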

iii. Concurrency control based on voting:
– There is no primary copy and no coordinator.
– A lock request is sent to all sites that hold a copy of the data item.
– If a majority of these sites grant the lock, the requesting transaction gets the data item.
– The locking decision (granted or denied) is sent to all these sites.
– To avoid an unacceptably long wait, a time-out period is defined. If the requesting transaction does not receive a majority of votes within this period, it is aborted.
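A minimal simulation of majority voting with a time-out. Reply delays and site failures are simulated attributes, not real network timing. If no majority is reached, the locks already granted are cancelled. All names are hypothetical.

```python
class Site:
    """A site holding one copy of a data item (simulated)."""

    def __init__(self, name, reply_delay=0.0, up=True):
        self.name = name
        self.reply_delay = reply_delay  # simulated seconds before its vote arrives
        self.up = up                    # a failed site never replies
        self.locks = {}                 # item -> transaction

    def vote(self, txn, item):
        """Grant the lock locally if no other transaction holds it."""
        if self.locks.get(item, txn) != txn:
            return False
        self.locks[item] = txn
        return True


def lock_by_voting(txn, item, sites, timeout):
    """Get the item only if a majority of copies grant the lock before the time-out."""
    majority = len(sites) // 2 + 1
    # Votes from failed sites, or arriving after the time-out, are not counted.
    granted = [s for s in sites if s.up and s.reply_delay <= timeout and s.vote(txn, item)]
    if len(granted) >= majority:
        return True
    # No majority: abort the request and cancel the locks already granted.
    for s in granted:
        s.locks.pop(item, None)
    return False


group = [Site("s1"), Site("s2"), Site("s3", up=False)]
print(lock_by_voting("T1", "X", group, timeout=1.0))  # True: 2 of 3 votes
print(lock_by_voting("T2", "X", group, timeout=1.0))  # False: X is held by T1
```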

Client-Server Database Architecture

• It consists of clients running client software, a set of servers that provide all database functionality, and a reliable communication infrastructure.

[Figure: servers 1 to n and clients 1 to n, connected to one another.]

Three-Tier Client-Server Architecture

Many Web applications use an architecture called the three-tier architecture, which adds an intermediate layer between the client and the database server. This intermediate layer is called the Web server (or application server). It plays an intermediary role by storing the business rules (constraints) that are used to access data from the database server.
It can also improve database security by checking a client's credentials before forwarding a request to the database server. The intermediate server accepts requests from the client, processes them, sends database commands to the database server, and then acts as a conduit for passing (partially) processed data from the database server to the clients.
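A minimal sketch of the three tiers, assuming an in-memory SQLite database (Python's standard `sqlite3` module) in place of the database server. The credential store, the `MiddleTier` class, and the 0 to 10 percent raise rule are hypothetical examples of credential checks and business rules.

```python
import sqlite3

# Tier 3: the database server (an in-memory SQLite database stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT, salary INTEGER)")
db.execute("INSERT INTO employee VALUES ('123', 'Smith', 30000)")

# Hypothetical credentials known to the middle tier (plain text only for brevity).
USERS = {"alice": "s3cret"}


class MiddleTier:
    """Tier 2: checks credentials and business rules, then queries the database."""

    def __init__(self, conn):
        self.conn = conn

    def _authenticate(self, user, password):
        if USERS.get(user) != password:
            raise PermissionError("invalid credentials")  # request never reaches tier 3

    def get_employee(self, user, password, ssn):
        self._authenticate(user, password)
        row = self.conn.execute(
            "SELECT lname, salary FROM employee WHERE ssn = ?", (ssn,)
        ).fetchone()
        return None if row is None else {"lname": row[0], "salary": row[1]}

    def give_raise(self, user, password, ssn, percent):
        self._authenticate(user, password)
        # Hypothetical business rule: a raise must be between 0 and 10 percent.
        if not 0 <= percent <= 10:
            raise ValueError("business rule violated: raise must be 0-10%")
        self.conn.execute(
            "UPDATE employee SET salary = salary * (100 + ?) / 100 WHERE ssn = ?",
            (percent, ssn),
        )
        self.conn.commit()


# Tier 1: a client only ever talks to the middle tier.
app = MiddleTier(db)
app.give_raise("alice", "s3cret", "123", 5)
print(app.get_employee("alice", "s3cret", "123"))  # {'lname': 'Smith', 'salary': 31500}
```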

• Clients reach the server for the desired service, but the server does not reach out to clients.
• The server software is responsible for local data management at a site, much like centralized DBMS software.
• The client software is responsible for most of the distribution functions.
• The communication software manages communication among clients and servers.
• An SQL query is processed as follows:
– The client parses a user query and decomposes it into a number of independent subqueries. Each subquery is sent to the appropriate site for execution.
– Each server processes its subquery and sends the result to the client.
– The client combines the results of the subqueries and produces the final result.
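The three processing steps can be sketched with two in-memory SQLite databases (standard `sqlite3` module) acting as servers, each holding a hypothetical horizontal fragment of EMPLOYEE. Because the fragments are horizontal, the same subquery can be sent to every server, and the client combines the partial results by union.

```python
import sqlite3


def make_server(rows):
    """A server holding one horizontal fragment of EMPLOYEE (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (lname TEXT, dno INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return conn


SERVERS = {
    "site1": make_server([("Smith", 5, 30000), ("Wong", 5, 40000)]),
    "site2": make_server([("Zelaya", 4, 25000), ("Wallace", 4, 43000)]),
}


def client_query(min_salary):
    """Names of employees earning more than min_salary, across all sites."""
    # 1. Decompose: the same independent subquery applies to every fragment.
    subquery = "SELECT lname FROM employee WHERE salary > ?"
    partial_results = []
    for conn in SERVERS.values():
        # 2. Each server evaluates the subquery on its local data.
        partial_results.append(conn.execute(subquery, (min_salary,)).fetchall())
    # 3. Combine: union of the partial results, sorted for display.
    return sorted(name for rows in partial_results for (name,) in rows)


print(client_query(30000))  # ['Wallace', 'Wong']
```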
