7-Distributed DB

Chapter 7 discusses distributed databases and client-server architectures, outlining key concepts such as data fragmentation, replication, and allocation. It covers the advantages of distributed database systems, including transparency, reliability, and improved performance, while also addressing query processing strategies and concurrency control challenges. The chapter emphasizes the importance of optimizing data transfer and maintaining consistency across multiple sites in a distributed environment.

Uploaded by

awekedessie250


Chapter 7

Distributed Databases and Client-Server Architectures

© 2016
Chapter 7 Outline
1. Distributed Database Concepts
2. Data Fragmentation, Replication and Allocation
3. Types of Distributed Database Systems
4. Query Processing
5. Concurrency Control and Recovery
6. 3-Tier Client-Server Architecture

Distributed Database Concepts
• A transaction can be executed by multiple networked computers in a unified manner.
• A distributed database (DDB) processes a unit of execution (a transaction) in a distributed manner.
• Definitions:
  - A distributed database (DDB) is a collection of multiple logically related databases distributed over a computer network.
  - A distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user.

Distributed Database System
• Advantages
  - Management of distributed data with different levels of transparency:
    - The physical placement of data (files, relations, etc.) is hidden from the user (distribution transparency).

Distributed Database System
• Advantages (transparency, contd.)
  - The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication, as shown in the accompanying diagram.

Distributed Database System
• Advantages (transparency, contd.)
  - Distribution and network transparency:
    - Users do not have to worry about operational details of the network.
    - Location transparency: a command can be issued from any location without affecting how it works.
    - Naming transparency: any named object (files, relations, etc.) can be accessed from any location.

Distributed Database System
• Advantages (transparency, contd.)
  - Replication transparency:
    - It allows copies of data to be stored at multiple sites, as shown in the above diagram.
    - This is done to minimize access time to the required data.
  - Fragmentation transparency:
    - It allows a relation to be fragmented horizontally (create a subset of the tuples of a relation) or vertically (create a subset of the columns of a relation).

Distributed Database System
• Other Advantages
  - Increased reliability and availability:
    - Reliability refers to system uptime, that is, the system is running efficiently most of the time.
    - Availability is the probability that the system is continuously available (usable or accessible) during a time interval.
    - A distributed database system has multiple nodes (computers); if one fails, the others are available to do the job.

Distributed Database System
• Other Advantages (contd.)
  - Improved performance:
    - A distributed DBMS fragments the database to keep data closer to where it is needed most.
    - This reduces data management (access and modification) time significantly.
  - Easier expansion (scalability):
    - New nodes (computers) can be added at any time without changing the entire configuration.

Data Fragmentation, Replication and Allocation
• Data Fragmentation
  - Split a relation into logically related and correct parts. A relation can be fragmented in two ways:
    - Horizontal fragmentation
    - Vertical fragmentation

Data Fragmentation, Replication and Allocation
• Horizontal fragmentation
  - A horizontal fragment is a subset of the tuples of a relation: those tuples that satisfy a selection condition.
  - Consider the Employee relation with the selection condition (DNO = 5). All tuples that satisfy this condition form a subset, which is a horizontal fragment of the Employee relation.
  - A selection condition may be composed of several conditions connected by AND or OR.
  - Derived horizontal fragmentation: the partitioning of a primary relation is applied to secondary relations that are related to it through foreign keys.
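The selection-based definition above can be sketched in a few lines of Python. This is only an illustration: the sample rows and the helper name `horizontal_fragment` are assumptions, not part of the slides.

```python
# Hypothetical sample rows; only three Employee attributes are kept for brevity.
EMPLOYEE = [
    {"SSN": "111", "Lname": "Smith", "Dno": 5},
    {"SSN": "222", "Lname": "Wong", "Dno": 5},
    {"SSN": "333", "Lname": "Zelaya", "Dno": 4},
    {"SSN": "444", "Lname": "Borg", "Dno": 1},
]


def horizontal_fragment(relation, condition):
    """Return the tuples of `relation` that satisfy the selection `condition`."""
    return [row for row in relation if condition(row)]


# Fragment defined by the selection condition (DNO = 5) and its complement.
emp_d5 = horizontal_fragment(EMPLOYEE, lambda r: r["Dno"] == 5)
emp_rest = horizontal_fragment(EMPLOYEE, lambda r: r["Dno"] != 5)

# A condition composed of two simpler conditions connected by OR.
emp_d4_or_d1 = horizontal_fragment(
    EMPLOYEE, lambda r: r["Dno"] == 4 or r["Dno"] == 1
)

# The two fragments are disjoint, and their union rebuilds the relation.
rebuilt = sorted(emp_d5 + emp_rest, key=lambda r: r["SSN"])
```

Because the two conditions are complementary, every tuple lands in exactly one fragment, which is what lets the union reconstruct Employee.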

Data Fragmentation, Replication and Allocation
• Vertical fragmentation
  - A vertical fragment is a subset of a relation created from a subset of its columns. It contains the values of the selected columns only. No selection condition is used in vertical fragmentation.
  - Consider the Employee relation. A vertical fragment can be created by keeping the values of Name, Bdate, Sex, and Address.
  - Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
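A minimal sketch of why the primary key must appear in every vertical fragment: it is the only thing that allows the fragments to be joined back together. The sample rows, the choice of SSN as the key, and the helper names are assumptions made for illustration.

```python
# Hypothetical Employee rows; SSN is assumed to be the primary key.
EMPLOYEE = [
    {"SSN": "111", "Name": "John Smith", "Bdate": "1965-01-09", "Sex": "M",
     "Address": "Houston", "Salary": 30000, "Dno": 5},
    {"SSN": "222", "Name": "Alicia Zelaya", "Bdate": "1968-07-19", "Sex": "F",
     "Address": "Spring", "Salary": 25000, "Dno": 4},
]


def vertical_fragment(relation, key, columns):
    """Project `relation` onto `columns`, always keeping the primary key."""
    kept = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in kept} for row in relation]


def rejoin(frag_a, frag_b, key):
    """Reconnect two vertical fragments by matching rows on the primary key."""
    by_key = {row[key]: row for row in frag_b}
    return [{**row, **by_key[row[key]]} for row in frag_a]


personal = vertical_fragment(EMPLOYEE, "SSN", ["Name", "Bdate", "Sex", "Address"])
work = vertical_fragment(EMPLOYEE, "SSN", ["Salary", "Dno"])
rebuilt = rejoin(personal, work, "SSN")
```

Without SSN in both fragments, `rejoin` would have no way to tell which salary belongs to which name.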

Data Fragmentation, Replication and Allocation
• Data Replication
  - Copies of the database are stored at multiple sites.
  - In full replication the entire database is replicated at every site; in partial replication some selected part is replicated to some of the sites.
  - Data replication is achieved through a replication schema.
• Data Distribution (Data Allocation)
  - This is relevant only in the case of partial replication or partitioning.
  - The selected portion of the database is distributed to the database sites.
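One way to picture a replication schema is as a mapping from each fragment to the sites that hold a copy. The fragment names, site numbers, and helper functions below are hypothetical and serve only to show the difference between full replication, partial replication, and pure partitioning.

```python
SITES = {1, 2, 3}

# Hypothetical replication schema: fragment name -> sites holding a copy.
allocation = {
    "EMPLOYEE_D5": {1, 2},
    "EMPLOYEE_D4": {3},
    "DEPARTMENT": {1, 2, 3},
}


def replication_kind(allocation, sites):
    """Classify an allocation as full, partitioned (no copies), or partial."""
    if all(holders == sites for holders in allocation.values()):
        return "full"
    if all(len(holders) == 1 for holders in allocation.values()):
        return "partitioned"
    return "partial"


def fragments_at(site, allocation):
    """List the fragments allocated to one site, in sorted order."""
    return sorted(name for name, holders in allocation.items() if site in holders)
```

For the schema above, `replication_kind(allocation, SITES)` reports partial replication, and `fragments_at(3, allocation)` lists what site 3 can answer locally.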

Types of Distributed Database Systems
• Homogeneous
  - All sites of the database system have an identical setup, i.e., the same database system software.
  - For example, all sites run Oracle, or DB2, or Sybase, or some other database system.
  - The underlying operating systems may differ and can be a mixture of Linux, Windows, Unix, etc.

Types of Distributed Database Systems
• Heterogeneous
  - Federated: Each site may run a different database system, but data access is managed through a single conceptual schema.
    - This implies that the degree of local autonomy is minimal. Each site must adhere to a centralized access policy. There may be a global schema.
  - Multidatabase: There is no single global conceptual schema. For data access, a schema is constructed dynamically as needed by the application software.

Query Processing in Distributed Databases
• Issues
  - Cost of transferring data (files and results) over the network.
    - This cost is usually high, so some optimization is necessary.
  - Example relations: Employee at site 1 and Department at site 2.
    - Employee at site 1: 10,000 rows. Row size = 100 bytes. Table size = 10^6 bytes.
      Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)
    - Department at site 2: 100 rows. Row size = 35 bytes. Table size = 3,500 bytes.
      Department(Dname, Dnumber, Mgrssn, Mgrstartdate)
  - Q: For each employee, retrieve the employee name and the name of the department where the employee works.
  - Q: $\pi_{\text{Fname},\,\text{Lname},\,\text{Dname}}(\text{Employee} \bowtie_{\text{Dno} = \text{Dnumber}} \text{Department})$

Query Processing in Distributed Databases
• Result
  - The result of this query will have 10,000 tuples, assuming that every employee is related to a department.
  - Suppose each result tuple is 40 bytes long. The query is submitted at site 3 and the result is sent to this site.
  - Problem: the Employee and Department relations are not present at site 3.

Query Processing in Distributed Databases
• Strategies:
  1. Transfer Employee and Department to site 3.
     - Total transfer size = 1,000,000 + 3,500 = 1,003,500 bytes.
  2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3.
     - Query result size = 40 * 10,000 = 400,000 bytes. Total transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
  3. Transfer Department to site 1, execute the join at site 1, and send the result to site 3.
     - Total transfer size = 400,000 + 3,500 = 403,500 bytes.
• Optimization criterion: minimize data transfer.
• Preferred approach: strategy 3.

Query Processing in Distributed Databases
• Consider the query
  - Q’: For each department, retrieve the department name and the name of the department manager.
• Relational algebra expression:
  - $\pi_{\text{Fname},\,\text{Lname},\,\text{Dname}}(\text{Employee} \bowtie_{\text{Mgrssn} = \text{SSN}} \text{Department})$

Query Processing in Distributed Databases
• The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are:
  1. Transfer Employee and Department to the result site and perform the join at site 3.
     - Total transfer size = 1,000,000 + 3,500 = 1,003,500 bytes.
  2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 100 = 4,000 bytes.
     - Total transfer size = 4,000 + 1,000,000 = 1,004,000 bytes.
  3. Transfer Department to site 1, execute the join at site 1, and send the result to site 3.
     - Total transfer size = 4,000 + 3,500 = 7,500 bytes.
• Preferred strategy: strategy 3.
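The arithmetic for both queries follows one pattern, so it can be checked with a short script. The sizes come from the slides; the function name `transfer_costs` and the strategy numbering as dictionary keys are illustrative choices.

```python
EMPLOYEE_BYTES = 10_000 * 100    # Employee at site 1: 1,000,000 bytes
DEPARTMENT_BYTES = 100 * 35      # Department at site 2: 3,500 bytes
RESULT_TUPLE_BYTES = 40          # size of one result tuple


def transfer_costs(result_tuples):
    """Bytes shipped by each strategy when the result site is site 3."""
    result_bytes = result_tuples * RESULT_TUPLE_BYTES
    return {
        1: EMPLOYEE_BYTES + DEPARTMENT_BYTES,  # ship both relations to site 3
        2: EMPLOYEE_BYTES + result_bytes,      # join at site 2, ship result
        3: DEPARTMENT_BYTES + result_bytes,    # join at site 1, ship result
    }


costs_q = transfer_costs(10_000)      # Q: one result tuple per employee
costs_q_prime = transfer_costs(100)   # Q': one result tuple per department

best_q = min(costs_q, key=costs_q.get)
best_q_prime = min(costs_q_prime, key=costs_q_prime.get)
```

Strategy 3 wins for both queries because it avoids shipping the large Employee relation; only the small Department relation and the result cross the network.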

Query Processing in Distributed Databases
• Now suppose the result site is site 2. Possible strategies:
  1. Transfer the Employee relation to site 2, execute the query, and present the result to the user at site 2.
     - Total transfer size = 1,000,000 bytes for both queries Q and Q’.
  2. Transfer the Department relation to site 1, execute the join at site 1, and send the result back to site 2.
     - Total transfer size for Q = 400,000 + 3,500 = 403,500 bytes.
     - Total transfer size for Q’ = 4,000 + 3,500 = 7,500 bytes.

Query Processing in Distributed Databases
• Semijoin:
  - The objective is to reduce the number of tuples in a relation before transferring it to another site.
  - Example execution of Q or Q’:
    1. Project the join attributes of Department at site 2, and transfer them to site 1. For Q, 4 * 100 = 400 bytes are transferred and for Q’, 9 * 100 = 900 bytes are transferred.
    2. Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred and for Q’, 39 * 100 = 3,900 bytes are transferred.
    3. Execute the query by joining the transferred file with Department and present the result to the user at site 2.
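Adding up the two transfers of the semijoin and comparing them with the full-relation strategies for result site 2 shows how much is saved. The byte counts are taken from the slides; the function and variable names are illustrative.

```python
def semijoin_cost(join_attr_bytes, dept_rows, shipped_tuple_bytes, matching_rows):
    """Total bytes moved by the semijoin: step 1 plus step 2."""
    step1 = join_attr_bytes * dept_rows           # join column sent to site 1
    step2 = shipped_tuple_bytes * matching_rows   # reduced Employee sent to site 2
    return step1 + step2


semijoin_q = semijoin_cost(4, 100, 34, 10_000)     # 400 + 340,000
semijoin_q_prime = semijoin_cost(9, 100, 39, 100)  # 900 + 3,900

# Full-relation strategies when the result site is site 2.
ship_employee = 1_000_000                   # strategy 1, same for Q and Q'
ship_department_q = 3_500 + 400_000         # strategy 2 for Q
ship_department_q_prime = 3_500 + 4_000     # strategy 2 for Q'
```

Under these figures the semijoin moves 340,400 bytes for Q and 4,800 bytes for Q’, less than either full-relation strategy in both cases. The gain is largest for Q’, where only the 100 manager tuples of Employee need to travel.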

Concurrency Control and Recovery
• Distributed databases encounter a number of concurrency control and recovery problems that are not present in centralized databases. Some of them are listed below.
  - Dealing with multiple copies of data items
  - Failure of individual sites
  - Communication link failure
  - Distributed commit
  - Distributed deadlock

Concurrency Control and Recovery
• Details
  - Dealing with multiple copies of data items:
    - Concurrency control must maintain global consistency. Likewise, the recovery mechanism must recover all copies and maintain consistency after recovery.
  - Failure of individual sites:
    - Database availability must not be affected by the failure of one or two sites, and the recovery scheme must recover failed sites before they are made available for use again.

Concurrency Control and Recovery
• Details (contd.)
  - Communication link failure:
    - This failure may create a network partition, which would affect database availability even though all database sites may be running.
  - Distributed commit:
    - A transaction may be split into parts that are executed at a number of sites. This requires a two-phase or three-phase commit approach for transaction commit.
  - Distributed deadlock:
    - Since transactions are processed at multiple sites, two or more sites may become involved in a deadlock. This must be resolved in a distributed manner.
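The slides name two-phase commit without describing it. As a rough sketch of the standard idea (a voting phase followed by a decision phase), and not of any particular DBMS implementation, the following uses hypothetical `Participant` objects that stand in for sites; real systems also log each step and handle time-outs, which is omitted here.

```python
class Participant:
    """Stand-in for one site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: the site votes yes only if it is able to commit its part.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (voting): a list is built so that every site is asked.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): the same outcome is sent to all sites.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote aborts the transaction everywhere, which is what keeps the sites from ending in different states.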

Concurrency Control and Recovery
• Distributed concurrency control based on a distinguished copy of a data item
  - Primary site technique: A single site is designated as the primary site, which serves as the coordinator for transaction management.

Concurrency Control and Recovery
• Transaction management:
  - Concurrency control and commit are managed by the primary site.
  - In two-phase locking, this site manages the locking and releasing of data items. If all transactions follow the two-phase policy at all sites, then serializability is guaranteed.
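The two-phase policy mentioned above means that a transaction never acquires a lock after it has released one. A minimal check of that rule, assuming a transaction is given as a hypothetical list of ("lock" or "unlock", item) pairs:

```python
def is_two_phase(operations):
    """Return True if no lock request follows an unlock in `operations`."""
    shrinking = False  # becomes True once the first lock has been released
    for action, _item in operations:
        if action == "unlock":
            shrinking = True
        elif action == "lock" and shrinking:
            return False  # a lock in the shrinking phase breaks the policy
    return True


# Growing phase (all locks) followed by shrinking phase (all unlocks).
t1 = [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]

# Releases X and then locks Y, so it is not two-phase.
t2 = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]
```

Here `is_two_phase(t1)` holds and `is_two_phase(t2)` does not.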

Concurrency Control and Recovery
• Transaction Management
  - Advantages:
    - It is an extension of centralized two-phase locking, so implementation and management are simple.
    - Data items are locked at only one site, but they can be accessed at any site.
  - Disadvantages:
    - All transaction management activities go to the primary site, which is likely to overload it.
    - If the primary site fails, the entire system is inaccessible.
  - To aid recovery, a backup site is designated that behaves as a shadow of the primary site. If the primary site fails, the backup site can act as the primary site.

Concurrency Control and Recovery
• Primary Copy Technique:
  - In this approach, a data item partition, rather than a site, is designated as the primary copy. To lock a data item, only the primary copy of the data item is locked.
  - Advantages:
    - Since primary copies are distributed over various sites, a single site is not overloaded with locking and unlocking requests.
  - Disadvantages:
    - Identification of a primary copy is complex. A distributed directory must be maintained, possibly at all sites.
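The role of the directory can be illustrated with a small sketch: each lock request is routed to the site that holds the primary copy of the item. The directory contents, site names, and the `LockManager` class are all hypothetical, and only exclusive locks are modeled.

```python
class LockManager:
    """Per-site table of exclusive locks: data item -> holding transaction."""

    def __init__(self):
        self.locked = {}

    def lock(self, item, txn):
        holder = self.locked.get(item)
        if holder is not None and holder != txn:
            return False  # another transaction already holds the lock
        self.locked[item] = txn
        return True

    def unlock(self, item, txn):
        if self.locked.get(item) == txn:
            del self.locked[item]


# Hypothetical directory: data item -> site holding its primary copy.
PRIMARY_COPY = {"X": "site1", "Y": "site2"}
SITES = {name: LockManager() for name in ("site1", "site2", "site3")}


def lock_item(item, txn):
    """Look up the primary copy and request the lock at that site only."""
    site = PRIMARY_COPY[item]
    return site, SITES[site].lock(item, txn)
```

Requests for X and Y go to different sites, which is how the load is spread; the cost is that every site needs access to `PRIMARY_COPY`.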

Concurrency Control and Recovery
• Recovery from a coordinator failure
  - In both approaches, a coordinator site or copy may become unavailable. This requires the selection of a new coordinator.
  - Primary site approach with no backup site:
    - Abort and restart all active transactions at all sites. Elect a new coordinator and initiate transaction processing.
  - Primary site approach with backup site:
    - Suspend all active transactions, designate the backup site as the primary site, and identify a new backup site. The new primary site receives all transaction management information to resume processing.
  - Primary and backup sites fail, or no backup site:
    - Use an election process to select a new coordinator site.

Concurrency Control and Recovery
• Concurrency control based on voting:
  - There is no primary copy or coordinator.
  - A lock request is sent to all sites that have a copy of the data item.
  - If a majority of sites grant the lock, the requesting transaction gets the data item.
  - Locking information (granted or denied) is sent to all these sites.
  - To avoid an unacceptably long wait, a time-out period is defined. If the requesting transaction does not get any vote information within this period, the transaction is aborted.
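The majority rule and the time-out can be sketched as a single decision function. The representation of votes (True for grant, False for deny, None for no reply before the time-out) and the function name are assumptions made for this example.

```python
def voting_lock(votes, total_copies):
    """Decide a lock request from the votes of the sites holding a copy.

    votes: site -> True (grant), False (deny), or None (no reply in time).
    """
    replies = [v for v in votes.values() if v is not None]
    if not replies:
        return "aborted"  # time-out: no vote information was received
    grants = sum(1 for v in replies if v)
    # The lock is obtained only with a strict majority of all copies.
    return "granted" if grants > total_copies / 2 else "denied"
```

With three copies, two grants are enough: `voting_lock({"s1": True, "s2": True, "s3": False}, 3)` yields a granted lock, while one grant and one missing reply does not reach a majority.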

Client-Server Database Architecture
• It consists of clients running client software, a set of servers that provide all database functionality, and a reliable communication infrastructure.
[Figure: Server 1, Server 2, ..., Server n connected to Client 1, Client 2, Client 3, ..., Client n]

Client-Server Database Architecture
• Clients reach a server for the desired service, but a server does not reach clients.
• The server software is responsible for local data management at a site, much like centralized DBMS software.
• The client software is responsible for most of the distribution functions.
• The communication software manages communication among clients and servers.

Client-Server Database Architecture
• The processing of an SQL query proceeds as follows:
  - The client parses a user query and decomposes it into a number of independent subqueries. Each subquery is sent to the appropriate site for execution.
  - Each server processes its subquery and sends the result to the client.
  - The client combines the results of the subqueries and produces the final result.
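The three steps above can be mimicked with in-memory "servers" that each hold a horizontal fragment of Employee. Everything here (server names, sample rows, function names) is a hypothetical stand-in; a real client would send SQL over the network rather than call a local function.

```python
# Hypothetical servers, each holding a horizontal fragment of Employee.
SERVERS = {
    "server1": [
        {"Lname": "Smith", "Dno": 5, "Salary": 30000},
        {"Lname": "Borg", "Dno": 1, "Salary": 55000},
    ],
    "server2": [
        {"Lname": "Wong", "Dno": 5, "Salary": 40000},
        {"Lname": "Zelaya", "Dno": 4, "Salary": 25000},
    ],
}


def run_on_server(server, predicate):
    """Server side: evaluate the subquery against local data only."""
    return [row for row in SERVERS[server] if predicate(row)]


def client_query(predicate):
    """Client side: send one subquery per server, then merge the answers."""
    partials = [run_on_server(name, predicate) for name in SERVERS]
    combined = [row for part in partials for row in part]
    return sorted(combined, key=lambda r: r["Lname"])


# All employees of department 5, gathered from both servers.
result = client_query(lambda r: r["Dno"] == 5)
```

The subqueries are independent, so each server works only on its own fragment and the client's merge step is a simple union followed by a sort.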

Recap
• Distributed Database Concepts
• Data Fragmentation, Replication and Allocation
• Types of Distributed Database Systems
• Query Processing
• Concurrency Control and Recovery
• 3-Tier Client-Server Architecture
