0% found this document useful (0 votes)

256 views

Distributed Database Design

The document discusses distributed database design. It covers topics such as: - The goals of distributed database design including distributing data and programs across network sites. - Approaches to distributed database design including top-down and bottom-up strategies. - Key considerations in distributed database design including the level of sharing, access patterns, and knowledge about access patterns. - The processes involved in distributed database design such as fragmentation, allocation, and ensuring correctness rules are followed. - Factors that influence optimal distribution decisions such as cost, performance, and information about the logical database organization, application locations and access characteristics.

Uploaded by

Pilar Hdez Bastida

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

256 views

Distributed Database Design

Uploaded by

Pilar Hdez Bastida

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 52

Distributed Database Design

Seminary of Distributed and Parallel Database Management Systems

Diego Godoy Gonzlez Cueto

Table of Contents
1.-Design Problem 1.1 What's design? 1.2 Organization of distributed systems
4.- Allocation 4.1 Allocation Problem 4.2 Optimality Measures 4.3 Alternatives 4.4 Information Requirements 4.4.1 Database

1.2.1 Level of sharing

1.2.2 Behavior of access patterns 1.2.3 Level of knowledge

4.4.2 Application
4.4.3 Site 4.4.4 Network 5.- Bottom-Up Strategy 5.1 Data Warehouse Approach 5.2 Design Process 5.3 Approaches 5.4 Relationship Schemes

1.2.4 Framework of Distribution

5.5 Generation Process

5.5.1 Matching/Integration/Mapping

What is design?

What is design?
Design is the creation of a plan or convention for the construction of an object or a system (as in architectural blueprints, engineering drawings, business processes, circuit diagrams and sewing patterns).

In order to be functional and optimize or improve a certain characteristic.

Design Problem

Design problem of distributed systems: Making decisions about the placement of data and programs across the sites of a computer network.

In DDBMS, the distribution of applications involves:

Distribution of the DDBMS software Distribution of applications that run on the database
5

Organization of distributed systems

It has been suggested that the organization of distributed systems can be investigated along three orthogonal dimensions.

Level of sharing
Behavior of access patterns Level of knowledge (on access pattern behavior)

Level of sharing
There are three possibilities in terms of this level:

No Sharing: Each application and its data execute at one site, and there is no communication with any other program or access to any data file at other sites. Level data sharing: All the programs are replicated at all the sites, but data files are not. Data-plus-program sharing: Both data and programs may be shared, meaning that a program at 7 a given site can request a service from another

Behavior of access patterns

It is possible to identify two alternatives in terms of this level. The access patterns of user requests may be: Static: So that they don't change over time. Dynamic: Which can change over time. The relationship between the distributed database design and query processing is established in this dimension.

Level of knowledge
The possibilities within this level are:

Without Information: It is not known by the designer how the users will access the database.

Complete Information: The access patterns can be predicted without deviations.

Partial Information: There are deviations from the predictions.

Framework of Distribution

Design Strategies

Top-down approach - Designing systems from scratch - Homogeneous systems - Better for tightly integrated homogeneous DDBMS

Bottom-up approach
- The databases already exist at a number of sites

- The databases should be connected to solve common tasks

Top-Down Design Process

Requirements Analysis: Defines the environment of the system. System Requirements (Objectives): Performance, reliability and availability, economics, and expandability (flexibility). Requirements document is input to two parallel activities: - Conceptual Design: Determine entity types and relationships among these entities. Entity Analysis/Functional Analysis
12

Top-Down Design Process

Distribution Design: Takes as inputs the results collected from View Design (Global Conceptual Schema -GCS- and Access Patterns).

In order to design the Local Conceptual Schema (LCS):

Distributing the entities (relations) over the sites of the of the distributed system.

Design Schema

Distribution of Relations

We first divide them into sub-relations called fragments. Therefore, the distribution design activity splits in two steps: - Fragmentation - Allocation Physical Design: Maps the LCS to the physical storage devices available at the corresponding sites.

Reasons to have observation and monitoring?

Distribution Design Issues

Why fragment at all? How should we fragment? How much should we fragment? Is there any way to test the correctness of decomposition?

How should we allocate?

What is the necessary information for fragmentation and allocation?
16

Reasons for Fragmentation

From a distribution viewpoint, there is really no need to fragment data. The important issue is to have an appropriate unit of distribution. A relation is not a suitable unit, because: - Application views are usually subsets of relations. - If the applications that have views defined on a given relation reside at different sites, two 17 alternatives can be followed (one site or some/all of

Fragmentation Difficulties

If the applications have conflicting requirements that prevent decomposition of the relation into mutually exclusive fragments, those applications whose views are defined on more than one fragment may suffer performance degradation. As a result of fragmentation, attributes participating in a dependency may be decomposed into different fragments that might be allocated to different sites. In this case, even the simpler task of checking for dependencies would result in chasing after data in 18 a number of sites.

Fragmentation Alternatives
Relation instances are essentially tables, so the issue is one of finding alternative ways of dividing a table into smaller ones. There are two ways to do so:

Horizontally
Vertically Hybrid/Mixed (Nested)

Fragmentation Alternatives

Fragmentation Alternatives
- Horizontal

- Vertical

Degree of Fragmentation
The degree of fragmentation goes from one of two extremes:

Not to fragment at all.

To fragment to the level of individual tuples (horizontal) or to the level of individual attributes (vertical). It is need to find a suitable level of fragmentation that is a compromise between the two extremes. Such a level can only be defined with respect to the 22

Degree of Fragmentation
How to do so?

In general, the applications need to be characterized with respect to a number of parameters.

According to the values of these parameters, individual fragments can be identified.

Correctness Rule of Fragmentation

There are three rules that, together, ensure the database does not undergo semantic change during fragmentation:

Completeness Reconstruction Disjointness

Completeness
For a relation instance R decomposed into fragments FR = {R1, R2, . . . ,Rn}, each data item that can be found in R can also be found in one or more of Ris

Ensures that the data in a global relation are mapped into fragments without any loss.
Horizontal Tuple Vertical - Attribute

Reconstruction
For a relation R decomposed into fragments FR = {R1, R2, . . . ,Rn}, it should be possible to define a relational operator such that:

R = Ri, Ri FR
Ensures that constraints defined on the data in the form of dependencies are preserved. ( will be different for different forms of fragmentation)
26

Disjointness
For a relation R horizontally decomposed into fragments FR = {R1, R2, . . . ,Rn} and data item di is in Rj, it is not in any other fragment Rk (k j).

It ensures that the horizontal fragments are disjoint.

If vertically decomposed, primary key attributes are repeated in all it's fragments (for reconstruction). (For these case disjointness is defined only in nonprimary key attributes).
27

Allocation Problem
Assume that there are a set of fragments:

F = {F1, F2, . . . , Fn } And a distributed system consisting of sites: S = {S1, S2 , . . . , Sm } On which a set of applications:

Q = {q1, q2, . . . , qq } is running.

The allocation problem involves finding the optimal distribution of F to S.
28

Optimality Measures
The optimality can be defined with respect of two measures:

Minimal Cost: The cost function consists of the cost of storing each Fi at a site Sj, the cost of querying Fi at site Sj, the cost of updating Fi at all sites where it is stored, and the cost of data communication. (Scheme that minimizes a combined cost function) Performance: The allocation strategy is designed to maintain a performance metric. Two well-known 29

Allocation
Assuming that the database is fragmented properly, one has to decide on the allocation of the fragments to various sites on the network.

If the data is allocated the data can be:

Replicated Maintained as a single copy

Allocation Alternatives
The reasons for Replication are:

Reliability: With multiple copies, comes good chances for some copies to be accessible even with system failures. Efficiency of read-only queries: Queries that access the same data item can be executed in parallel (since there are copies in multiple sites). However the execution of an updated query, brings troubles, since the system has to check all the copies 31 were updated correctly.

Allocation Alternatives

Partitioned DB: Non-Replicated database contains fragments that are allocated to sites, and there is only one copy of any fragment on the network.

In case of replication:
The database exists in its entirety at each site (Fully Replicated DB). Fragments are distributed to the sites in such a way that copies of a fragment may reside in multiple sites (Partially Replicated DB).
32

Allocation Alternatives

Information Requirements
Too many factors contribute to an optimal design.

The logical organization of the database. The location of the applications. The access characteristics of the applications to the database.

The properties of the computer systems at each site.

All have an influence on distribution decisions. And this complicates the formulation of the distribution problem. 34

Information Requirements
The information needed for distribution design can be divided into four categories:

Database information.

Application information.
Communication network information. (Quantitative nature) Computer system information. (Quantitative nature)
35

Database Information
We now need to define the selectivity of a fragment Fj with respect to query qi. This is the number of tuples of Fj that need to be accessed in order to process qi.

SELi (Fj). Another necessary information on the database fragments is their size. The size of a fragment Fj is given by:

SIZE(Fj) = CARD(Fj) LENGTH(Fj)

Application Information
The two important measures are the number of read accesses that a query qi makes to a fragment Fj during its execution (RRij), and its counterpart for the update accesses (URij).

We also need to define two matrices UM and RM, with elements uij and rij, respectively. A vector O to specifies the originating site of query qi.
37

Site Information

For each computer site, we need to know its storage and processing capacity. (Elaborate functions or by estimates)

The unit cost of storing data at site Sk.

There is also a need to specify the cost of processing one unit of work at site Sk. (The work unit should be identical to that of the RR and UR measures)
38

Network Information

In our model we assume the existence of a simple network where the cost of communication is defined in terms of one frame of data (cost per frame between sites Si and Sj). To enable the calculation of the number of messages, we use the size (in bytes) of one frame.

Bottom-Up Design Process

In this case, a number of databases already exist, and the design task involves integrating them into one database.

The process consists of integrating local databases with their (local) schemes into a global database with its global conceptual schema (GCS). Database integration can be either physical or logical.
The integration is aided by extract-transform-load (ETL) tools that enable extraction of data from
40

Data Warehouse Approach

Approaches

GCS Defined first: In which case the bottom-up design involves mapping LCSs to this schema. (This is the case in data warehouses)

GCS defined as an integration of parts of LCSs: In this case, the bottom-up design involves both the generation of the GCS and the mapping of individual LCSs to this GCS.

Relationship of Schemes
If the GCS is defined up-front, the relationship between the GCS and the LCS can be of two fundamental types:

Local-as-View (LAV): The GCS definition exists, and each LCS is treated as a view definition over it. (The results are constrained by the objects in the local DBMSs).
Global-as-View (GAV): The GCS is defined as a set of views over the LCSs. (The query results are constrained to the set of 43 objects that are defined in the GCS).

GAV and LAV Mapping

Database Generation Process

Then the intermediate schemes are used to generate a GCS.

Schema Matching: to determine the syntactic and semantic correspondences among the translated LCS elements. Schema Integration: of the common schema elements into a global conceptual (mediated) schema if one has not yet been defined. Schema Mapping: determines how to map the elements of each LCS to the other elements of the
45

Database Generation Process

Scheme Heterogeneity
Structural conflicts occur in four possible ways:

Type conflicts: when the same object is represented by an attribute in one schema and by an entity (relation) in another Dependency conflicts: when different relationship modes are used to represent the same thing in different schemes. Key conflicts: when different candidate keys are available and different primary keys are selected in different schemes.

Integration Methodologies
This step is only necessary if a GCS has not already been defined and matching was performed on individual LCSs.

Binary: involve the manipulation of two schemes at a time. These can occur in a stepwise (ladder) fashion or in a purely binary fashion, where each schema is integrated with one other.
Nary: offers more flexibility (more information is available) and is more general (the number of schemes can be varied depending on the integrators48 preferences).

Integration Methodologies
Binary

Nary

Mapping Methodologies

Once a GCS is defined, it is necessary to identify how the data from each of the local databases (source) can be mapped to GCS (target) while preserving semantic consistency (as defined by both the source and the target). Although schema matching has identified the correspondences between the LCSs and the GCS, it may not have identified explicitly how to obtain the global database from the local ones.
50

Mapping Methodologies

Mapping Creation: the process of creating explicit queries that map data from a local database to the global data.

Mapping Maintenance: the detection and correction of mapping inconsistencies resulting from schema evolution.

Data Cleaning
Errors in source databases can always occur, requiring cleaning in order to correctly answer user queries. Cleaning is performed as the global database is created. (Warehouse) The process needs to be performed during query processing when data are returned from the source databases. (Integration Systems)

Snowpro Advanced Data Engineer
No ratings yet
Snowpro Advanced Data Engineer
17 pages
Apache Cassandra Administrator Associate - Exam Practice Tests
From Everand
Apache Cassandra Administrator Associate - Exam Practice Tests
Cristian Scutaru
No ratings yet
J8pF3RMRxmfKRd0TFcZFw Learning Log Template Define Problems and Ask Questions With Data
No ratings yet
J8pF3RMRxmfKRd0TFcZFw Learning Log Template Define Problems and Ask Questions With Data
1 page
System Analysis and Design - Quick Guide - Tutorialspoint
No ratings yet
System Analysis and Design - Quick Guide - Tutorialspoint
70 pages
IntroToDossiers PDF
No ratings yet
IntroToDossiers PDF
119 pages
Quiz #5
No ratings yet
Quiz #5
2 pages
Oracle Enterprise Manager 12c Hands On Lab Cheat Sheet
No ratings yet
Oracle Enterprise Manager 12c Hands On Lab Cheat Sheet
2 pages
Netezza Stored Procedures Guide Rev 2014 PDF
No ratings yet
Netezza Stored Procedures Guide Rev 2014 PDF
86 pages
What Is SAP Erp5
No ratings yet
What Is SAP Erp5
20 pages
Audit - Final Revision Capsule by ICAI
No ratings yet
Audit - Final Revision Capsule by ICAI
67 pages
Data Warehouse Development Approach
No ratings yet
Data Warehouse Development Approach
25 pages
Inmon Vs Kimball
No ratings yet
Inmon Vs Kimball
32 pages
Powershell Interview Questions
No ratings yet
Powershell Interview Questions
2 pages
Database Design For Ordering and Delivery Management System
No ratings yet
Database Design For Ordering and Delivery Management System
7 pages
Informatica Cloud (IICS) Architecture
No ratings yet
Informatica Cloud (IICS) Architecture
21 pages
Mysql Server 5.7 by CJ
100% (1)
Mysql Server 5.7 by CJ
42 pages
Akshay Pratap - Informatica IICS
No ratings yet
Akshay Pratap - Informatica IICS
3 pages
2 Chapter 3 Query Optimization
No ratings yet
2 Chapter 3 Query Optimization
29 pages
ERStudioDA 9.7 QuickStart en
No ratings yet
ERStudioDA 9.7 QuickStart en
63 pages
How To Set Up REPMGR With WITNESS For PostgreSQL 10 Official Pythian®® Blog
No ratings yet
How To Set Up REPMGR With WITNESS For PostgreSQL 10 Official Pythian®® Blog
5 pages
Cloud Computing Gov Conf 1209
No ratings yet
Cloud Computing Gov Conf 1209
21 pages
Querying Microsoft SQL Server
No ratings yet
Querying Microsoft SQL Server
3 pages
Querying With T-SQL - 01
No ratings yet
Querying With T-SQL - 01
24 pages
Crystal Report Server PDF
No ratings yet
Crystal Report Server PDF
16 pages
Structured Approach To Bi Testing
No ratings yet
Structured Approach To Bi Testing
13 pages
Apache Pig
No ratings yet
Apache Pig
61 pages
MCA - BigData Notes
No ratings yet
MCA - BigData Notes
136 pages
Course Material Tableau
No ratings yet
Course Material Tableau
54 pages
C360 103 InstallationAndConfigurationGuide en
No ratings yet
C360 103 InstallationAndConfigurationGuide en
80 pages
Microsoft Dumps 70-761 v2017-01-10 by Matt 60q
No ratings yet
Microsoft Dumps 70-761 v2017-01-10 by Matt 60q
64 pages
Fundamentals of Database Systems: Lesson 1: Introduction
No ratings yet
Fundamentals of Database Systems: Lesson 1: Introduction
35 pages
AWS Certified Database Specialty - Exam Guide
No ratings yet
AWS Certified Database Specialty - Exam Guide
11 pages
20764C 01 PDF
No ratings yet
20764C 01 PDF
26 pages
SQL Replication Basic
No ratings yet
SQL Replication Basic
22 pages
Multiple Choice Questions: Principles of Database Management
No ratings yet
Multiple Choice Questions: Principles of Database Management
8 pages
SQL Server Note
No ratings yet
SQL Server Note
42 pages
4-Stored Procedures
No ratings yet
4-Stored Procedures
22 pages
SQL Server Automation (Maintenance Plan)
No ratings yet
SQL Server Automation (Maintenance Plan)
18 pages
dp-203
No ratings yet
dp-203
353 pages
Hands On Obiee 11g
No ratings yet
Hands On Obiee 11g
34 pages
Methodology For Data Validation v1.0 Rev-2016-06 Final
No ratings yet
Methodology For Data Validation v1.0 Rev-2016-06 Final
76 pages
NoSQL and SQL Data Modeling Bringing Together Data, Semantics, and Software (Hills Ted.)
No ratings yet
NoSQL and SQL Data Modeling Bringing Together Data, Semantics, and Software (Hills Ted.)
282 pages
Uso de SQL
No ratings yet
Uso de SQL
131 pages
ADBMS Parallel and Distributed Databases
No ratings yet
ADBMS Parallel and Distributed Databases
98 pages
MySQL Perf Tuning Best Practices
No ratings yet
MySQL Perf Tuning Best Practices
30 pages
Power BI - Interview Questions
No ratings yet
Power BI - Interview Questions
61 pages
MySQL and SSD: Usage Patterns
No ratings yet
MySQL and SSD: Usage Patterns
29 pages
Dbms Quiz
No ratings yet
Dbms Quiz
13 pages
Leveling Up With SQL Advanced Techniques For Transforming Data Into Insights 9781484296851 9781484296844
No ratings yet
Leveling Up With SQL Advanced Techniques For Transforming Data Into Insights 9781484296851 9781484296844
449 pages
AWS Databases
No ratings yet
AWS Databases
31 pages
Module 09 Advance Query Techniques
No ratings yet
Module 09 Advance Query Techniques
28 pages
MYSQL MCQs - 1
No ratings yet
MYSQL MCQs - 1
3 pages
Datawarehouse Tools
No ratings yet
Datawarehouse Tools
8 pages
Content Server 16.2.4 On SQL Server 2016 Performance Whitepaper
No ratings yet
Content Server 16.2.4 On SQL Server 2016 Performance Whitepaper
69 pages
DataEngineer Roadmap
No ratings yet
DataEngineer Roadmap
12 pages
MyRocks LSM Tree Database Storage Engine Serving Facebooks Social Graph
No ratings yet
MyRocks LSM Tree Database Storage Engine Serving Facebooks Social Graph
14 pages
Question: Dimension Modeling Types Along With Their Significance
No ratings yet
Question: Dimension Modeling Types Along With Their Significance
27 pages
Introduction To Data Mining
100% (1)
Introduction To Data Mining
18 pages
AppDynamics Third Edition
From Everand
AppDynamics Third Edition
Gerardus Blokdyk
No ratings yet
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
From Everand
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
Robert Johnson
No ratings yet
Johan Galtung - Breaking The Cycle of Violent Conflict
100% (3)
Johan Galtung - Breaking The Cycle of Violent Conflict
44 pages
Institutional Approach
No ratings yet
Institutional Approach
12 pages
The Filipino Way
No ratings yet
The Filipino Way
18 pages
How To Survive An Affair - Chatty Teacup Coaching PDF
100% (1)
How To Survive An Affair - Chatty Teacup Coaching PDF
7 pages
Lesson-3-The-Approaches-and-Methods-in-Teaching-EPPTLE
No ratings yet
Lesson-3-The-Approaches-and-Methods-in-Teaching-EPPTLE
16 pages
Student Teaching Handbook Linked
No ratings yet
Student Teaching Handbook Linked
14 pages
Christian Mysticism - Paul Sedir
100% (1)
Christian Mysticism - Paul Sedir
131 pages
A Debate Between Alok Rai and Shahid Amin Regarding Hindi
No ratings yet
A Debate Between Alok Rai and Shahid Amin Regarding Hindi
22 pages
Gibbs Vs Govt Digest and Fulltext
No ratings yet
Gibbs Vs Govt Digest and Fulltext
5 pages
Acknowledgement
No ratings yet
Acknowledgement
17 pages
The Seven Organisational Levels of Discourse
100% (1)
The Seven Organisational Levels of Discourse
9 pages
Use of Gamification in Vocabulary Learning: A Case Study in Macau
No ratings yet
Use of Gamification in Vocabulary Learning: A Case Study in Macau
8 pages
William Wordsworth: Ode: Intimations of Immortality
100% (1)
William Wordsworth: Ode: Intimations of Immortality
9 pages
Chapter 1
No ratings yet
Chapter 1
13 pages
12 PPG - Week 1
No ratings yet
12 PPG - Week 1
6 pages
Ten Pints of Beer
No ratings yet
Ten Pints of Beer
13 pages
Reading Part 4 Preparation 6
No ratings yet
Reading Part 4 Preparation 6
2 pages
Tai Chi Class Brochure of HARUKA WUSHU CULTURE INSTITUTE
No ratings yet
Tai Chi Class Brochure of HARUKA WUSHU CULTURE INSTITUTE
2 pages
Ethics in Science Communication
No ratings yet
Ethics in Science Communication
28 pages
Bhutan Culture & Values
50% (2)
Bhutan Culture & Values
26 pages
Cases: Ohio Lottery: Innovative Research Design Drives Winning
No ratings yet
Cases: Ohio Lottery: Innovative Research Design Drives Winning
18 pages
Formative Assessment Final
No ratings yet
Formative Assessment Final
3 pages
Daughters and Sons of Caine A Guide To The Sabbat Factions Complete
89% (9)
Daughters and Sons of Caine A Guide To The Sabbat Factions Complete
149 pages
06 Sambajon Vs Suing
No ratings yet
06 Sambajon Vs Suing
14 pages
Narrative Report On Independence Day
No ratings yet
Narrative Report On Independence Day
26 pages
Application of Hofstede
No ratings yet
Application of Hofstede
3 pages
Epsilon Delta Fun of Calculus
No ratings yet
Epsilon Delta Fun of Calculus
4 pages
92-002-Transcendental Appearance (Mumbai)
No ratings yet
92-002-Transcendental Appearance (Mumbai)
7 pages
Cultural Diversity and Globalisation: An Intercultural Hermeneutical (African) Perspective
No ratings yet
Cultural Diversity and Globalisation: An Intercultural Hermeneutical (African) Perspective
4 pages