Distributed Database
Introduction:
This is an advanced course that builds on one you must already have studied, “Database Management Systems”. It extends the concepts learnt earlier; moreover, the applications to which you will apply the concepts and techniques of this course are more advanced and complex by nature. Distributed Database Management Systems (DDBMS) use the concepts of:
1) Database Management Systems
2) Networking
The key is to identify the environments in which distributed databases should be used. You may find that using a distributed database in some situations does not prove fruitful: the implementation may develop drawbacks or become inefficient. As a computer or database expert, your basic assignment will always be that you are given a system for which you have to design or develop a solution or a database system. You will have several options available, and the interesting thing is that every one of them would work. The challenge lies in selecting the most feasible solution. For this, the merits and demerits of every approach have to be analyzed. If this evaluation is not done properly in the beginning, it is very difficult to make adjustments in the later stages. To be more precise, a Distributed Database System (DDBS) will be one of many options available to you for any system that you are asked to develop. It is your job to analyze whether the environment needs a DDBS solution or some other one. A wrong decision in this regard may introduce inefficiency rather than any advantage.
There are going to be four major components in this course:
Introductory Stuff
Architectures and Design Issues
Technological Treatment
Related Topics
Introduction:
This part will cover the introduction to the basic concepts of databases and networks. We will then look at the nature of the applications that need a DDBS, so that we are convinced that a given environment requires a DDBS before we move on to implementation.
Architecture and Design:
There are different architectures available for designing distributed systems, and we have to identify the right one, as different architectures suit different environments. Moreover, there are different approaches to implementing DDBS designs; we will study those approaches, since each suits a different environment.
Note:
Selecting the wrong architecture for an environment results in an inefficient system.
Technological Treatment:
Different design approaches of DDBS will be implemented using prevailing DDBMSs, like SQL Server and Oracle. That will give you an idea of what a real DDBS looks like.
Theoretical Aspects:
We will discuss the theoretical aspects related to the DDBS. The study of these issues will help you in administering a DDBS on one side, and on the other side it will help you in further studies and research in the DDBS. The database management systems available today do most of the administration automatically, but it is important for the database designer to know the background procedures so that the overall efficiency of the distributed database management system may be enhanced.
Recommended Books:
1- Principles of Distributed Database Systems (2nd Edition) by M. T. Ozsu, P. Valduriez
2- Distributed Database Systems by D. Bell, J. Grimson, Addison-Wesley, 1992
3- Distributed Systems: Concepts and Design, 4th Edition, by G. Coulouris, J. Dollimore, T. Kindberg, Addison-Wesley
The book mentioned at No. 1 is the main book for this course. It is famous, and one of the rare books written on the topic. The course is mainly based on this book, so you will be holding it very frequently in the coming days; make it a bedside item.
You are already familiar with the marks split, which consists of a mid-term exam, assignments and a final exam. Good luck with your course, and now let’s start reading.
History:
This part is from the first course of Databases.
Computer applications are divided into two types:
1) Data Processing Applications
information which is common to all the three systems but still being stored separately. This results in:
1) Data Redundancy
2) Expensive Changes/Modifications due to redundancy of the data
Database Approach:
To remove the defects of the file processing systems, the database approach was used, which eliminated the interdependency of the program and the data. Changes and modifications can be made easily, whether they relate to the programs or to the data itself.
Distributed Computing:
Note:
If two systems are connected through a network, the coupling may be weak; however, if they share some hardware, the coupling is strong.
In this lecture:
- Definition of a Distributed Database System (DDBS)
- The candidate applications for a DDBS
Local requirements: Each of the sites storing data in a DDBS is called a local site. Every site is mainly concerned with the requirements and data management of the users directly associated with that site, called local users.
Global perspective: Apart from catering to the local requirements, the DDBS also fulfils the global requirements. These are the requirements generated by the central management, who want to make overall decisions about the organization and want an overall picture of the organization’s data. The DDBS fulfils the global requirements in a transparent way; that is, the data for these requirements is fetched from all the local sites, merged together and presented to the global users. All this activity of fetching and merging is hidden from the global user, who gets the feeling that the data is being fetched from a single place.
In a DDBS environment, three types of access are involved:
Local access: access by a user connected to a site who accesses data from the same site.
Remote access: a user connected to one site, let’s say site 1, accesses data from site 2.
Global access: no matter where the access is made from, the data is displayed after being collected from all locations.
A user does not know where the data is coming from. To the user it appears that the data is present on the machine on which he is working.
Distributed databases; where to apply:
As mentioned before, the DDBS is one of the possible solutions for a database application. We need to analyze the environment to decide whether it calls for a DDBS. Following are some of the database applications that are strong candidates for a DDBS.
Banking Applications: Take the example of any Pakistani bank. A bank has a large number of customers, and its branches are spread across all of Pakistan (many banks also have branches around the world, which makes their candidature even stronger). In modern banking, customers access their accounts not only from within their own branch but also from outside it, for example from ATMs and branches spread across the city or country. Every time a user operates his account from anywhere in the country or the world, his account data is accessed.
Air ticketing: We now have the facility to book a seat on any airline from any location to any destination, e.g. we can book a return ticket from Lahore to Karachi and from Karachi to Lahore from the airline’s Lahore office. This system, too, has a large number of users spread across a large area. Whenever a booking is made, the data of the flights is accessed.
Business at multiple locations: A company may have offices at multiple locations, or different units such as production, warehouses and sales operating from different locations. Each site stores its data locally; however, these units need to access each other’s data, and data from all the sites is required for global access.
Distributed Database Management System:
A software system that permits the management of a distributed database and makes the distribution transparent to the users.
Just as we need a DBMS for a centralized or client-server database, we need a DDBMS for a DDBS.
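As a rough illustration of what "makes the distribution transparent" can mean for the local, remote and global access types described earlier, here is a minimal sketch. The names `Site` and `ToyDDBMS`, the account numbers and the balances are all hypothetical; this is not how any real DDBMS is implemented, only a toy that routes a request to whichever site holds the data.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One local site holding its own accounts (account number -> balance)."""
    name: str
    accounts: dict = field(default_factory=dict)


class ToyDDBMS:
    """Hypothetical facade that hides which site actually stores the data."""

    def __init__(self, sites):
        self.sites = {site.name: site for site in sites}

    def read_balance(self, connected_to, account):
        """Return (balance, access_kind) for a user connected to one site."""
        home = self.sites[connected_to]
        if account in home.accounts:
            return home.accounts[account], "local"
        for site in self.sites.values():
            if account in site.accounts:
                return site.accounts[account], "remote"
        raise KeyError(account)

    def count_balances_above(self, threshold):
        """Global access: fetch from every site and merge into one answer."""
        return sum(
            1
            for site in self.sites.values()
            for balance in site.accounts.values()
            if balance > threshold
        )


lahore = Site("Lahore", {"L-1": 500, "L-2": 12_000})
karachi = Site("Karachi", {"K-1": 30_000})
ddbms = ToyDDBMS([lahore, karachi])

print(ddbms.read_balance("Lahore", "L-1"))   # served by the user's own site
print(ddbms.read_balance("Lahore", "K-1"))   # served by another site
print(ddbms.count_balances_above(10_000))    # merged from all sites
```

The caller uses the same call for a local and a remote read and never names the site that stores the account; only the returned label reveals which kind of access took place.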
A DDBMS will behave like a normal DBMS on the local site; the additional facility that it provides is the creation and maintenance of global access, where data across multiple sites is accessed against a single query. The approach that most of the current commercial DBMS vendors (like Oracle, SQL Server, DB2, Sybase) have adopted is to provide different versions for different situations. If the user needs a desktop database for single-computer usage, a smaller version is available that does not support remote access or data distribution. For a client-server database there is another version, and for the DDBS environment the Enterprise Edition of the DBMS is provided, which supports data distribution among multiple sites, the establishing of links between these sites, and finally joining/combining data from multiple sites against a single query.
Decentralized database:
A collection of independent databases on non-networked computers. In this environment the data at multiple computers is related, but the computers are not linked, so whenever data has to be accessed from multiple computers we need some means of transferring data across them.
Distributed files:
A collection of files stored on different computers of a network is not a DDBS. Why? This is not enough for a DDBS, as the data should be logically related.
Note:
The data in a DDBS is logically related, has a common structure among files, and is accessed via the same interface.
Multiprocessor system:
Multiple processors that share some common memory.
RAM sharing: tight coupling.
Hard disk sharing: loose coupling.
Systems simply connected: share nothing.
Following diagrams explain these architectures:
[Diagrams: Shared Everything, Loose Coupling, Shared Nothing]
In this lecture:
- Resembling Setups
- Advantages/Promises of DDBS
Resembling setups:
As discussed in the previous lecture, data is managed and manipulated at multiple sites in a DDBS. There are many different architectures of a DDBS; a very general one is given below. This general architecture also establishes a picture of the DDBS in the mind that helps in understanding how a DDBS works.
Note:
A user on a local site is called a local user. Data is generated locally and accessed locally, but there are situations where you require certain reports for which the data must be collected from all sites, e.g. a bank wants to know how many customers it has with a balance of more than one crore. Local control may be desirable in many situations, as it improves efficiency in handling and managing the database. That is, local control is helpful in maintenance, backup activities, managing local user accounts, etc.
Note:
We may require global access to the data.
There are two basic reasons for the DDBS environment. To better understand these reasons, we need to look at the alternative to a DDBS, which is the centralized database or a client-server environment. Take the example of our bank database: if it is centralized, the database is stored at a single place. Suppose that in Pakistan they select a geographically central place, say Multan; then the database is stored in Multan, and whenever users anywhere in Pakistan want to use their accounts, the data is accessed from the central database in Multan. If it is a distributed environment, the bank can pick two, three, four or more places, and each database stores the data of its surrounding areas, so the load is distributed over multiple places. With this scenario in mind, let’s discuss the reasons for a DDBS:
Reduce telecom cost:
With a centralized database, all users from all over the country access it remotely, which increases the telecommunication cost. If the database is distributed, the data is closer to all users and access is local most of the time, which reduces the overall telecommunication cost.
Reduce the risk of telecom failures:
With a centralized database, all work throughout the country depends on the link with the central site. If the link fails for any reason, work at all places stops, so a failure at a single site causes damage everywhere. If the database is distributed, a failure at a single site disturbs only the users of that site; the remaining sites keep working normally. Moreover, one form of data distribution is replication, where the data is duplicated at multiple places. This form of data distribution further reduces the cost of a telecommunication failure.
Schema contains:
What has to be shown to the global user.
How we are going to set data for a thing on each site.
The type of the data stored on each site.
How we are going to merge the data present on different sites.
Note:
Global users are attached to the Distributed DBMS layer.
Promises of DDBS:
If we adopt a DDBS as the solution for a particular application, what features are we going to get?
Transparency:
A transparent system hides the implementation details from the user. There are different types of transparency, but at this point the focus is on distribution transparency, which means that the global user has no idea that the data being provided to him actually comes from multiple sites; rather, he gets the feeling that the data comes from the machine he is using. It is a very useful feature, as it saves the global user from the complexity and details of distributed access.
Data Independence:
A major advantage of the database approach is data independence: the program and the data are not dependent on each other, i.e. we can change the program with little or no change to the data, and vice versa.
In a 3-layer architecture, changes at a lower level have little or no effect on the higher level.
Logical data independence:
If we change the conceptual schema, there is little or no effect on the external level.
Physical data independence:
If we change the physical or lower level, there is little or no effect on the conceptual level.
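Logical data independence can be made concrete with a small experiment. The sketch below uses Python's standard `sqlite3` module and assumes a view plays the role of the external level; the table, view and column names are made up for illustration and do not come from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level, version 1: one table holds everything.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer VALUES (1, 'Ali', 'Lahore'), (2, 'Sara', 'Karachi');
    -- External level: the only thing the application queries.
    CREATE VIEW customer_view AS SELECT id, name, city FROM customer;
""")


def application_report(connection):
    """The 'program': it knows the view, not the underlying tables."""
    return connection.execute(
        "SELECT name, city FROM customer_view ORDER BY id"
    ).fetchall()


before = application_report(conn)

# Conceptual level, version 2: the city moves to a separate table.
conn.executescript("""
    DROP VIEW customer_view;
    CREATE TABLE customer_core (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_address (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO customer_core SELECT id, name FROM customer;
    INSERT INTO customer_address SELECT id, city FROM customer;
    DROP TABLE customer;
    -- Only the view definition is adjusted; the application is untouched.
    CREATE VIEW customer_view AS
        SELECT c.id, c.name, a.city
        FROM customer_core AS c JOIN customer_address AS a ON a.id = c.id;
""")

after = application_report(conn)
print(before == after)
```

The function `application_report` is never edited, yet it returns the same rows before and after the conceptual schema is restructured. The "little or no effect" in the definition corresponds to the one view definition that had to be rewritten.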
Network transparency:
This is another form of transparency. The user is unaware of even the existence of the network, which frees him from the problems and complexities of the network.
Replication transparency:
Replication and fragmentation are the two ways to implement a DDBS. In replication, the same data is stored on multiple sites, e.g. in the case of a bank, every branch holds the data of every other branch. Replication increases the availability of data and reduces the risk of telecom failure. The DDBS hides the replication from the end user; the advantage is that the user simply gets the benefits of the system and does not need to know or understand the technical details.
Note:
Full replication is when all the data is stored on every site, so every query can be answered locally (updates still have to reach every copy).
Fragmentation transparency:
A file or a table is broken down into smaller parts called fragments, and those fragments are stored at different locations. Fragmentation will be discussed in detail in later lectures. Briefly, however, a table can be fragmented horizontally (row-wise) or vertically (column-wise); hence we have two major types of fragmentation, horizontal and vertical. Different fragments of a table are placed at different locations. The basic objective of fragmentation and placement at different places is to maximize local access and to reduce remote access, since the latter causes cost and delay.
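The row-wise and column-wise split described above can be sketched in a few lines. This is only an illustration under assumed names: the ACCOUNT-style table, its columns and the helper functions are hypothetical, and the later lectures may define fragmentation more formally.

```python
# A toy ACCOUNT table; every row carries the key "acc_no".
accounts = [
    {"acc_no": 1, "branch": "Lahore", "owner": "Ali", "balance": 500},
    {"acc_no": 2, "branch": "Karachi", "owner": "Sara", "balance": 900},
    {"acc_no": 3, "branch": "Lahore", "owner": "Omar", "balance": 150},
]


def horizontal_fragment(rows, predicate):
    """Row-wise: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, columns, key="acc_no"):
    """Column-wise: keep some columns, always repeating the key."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


# Horizontal fragments, one per branch, each stored near its users.
lahore = horizontal_fragment(accounts, lambda r: r["branch"] == "Lahore")
karachi = horizontal_fragment(accounts, lambda r: r["branch"] == "Karachi")

# Vertical fragments: identity columns apart from the balance column.
identity = vertical_fragment(accounts, ["branch", "owner"])
money = vertical_fragment(accounts, ["balance"])

# Reconstruction: union for horizontal, join on the key for vertical.
from_horizontal = sorted(lahore + karachi, key=lambda r: r["acc_no"])
balance_by_key = {r["acc_no"]: r["balance"] for r in money}
from_vertical = [{**r, "balance": balance_by_key[r["acc_no"]]} for r in identity]

print(from_horizontal == accounts, from_vertical == accounts)
```

Each vertical fragment repeats the key column so that the original rows can be put back together; a user who only ever sees the reconstructed table has no way to tell that it was fragmented.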
Fragmentation transparency means that a user should not know that the database is fragmented. The concept of fragmentation should be kept hidden from the user.
Note:
The DBA designs the architecture of the fragments, whereas once implemented it is managed by the DDBMS.
Responsibility of transparency:
Transparency is very desirable since it hides all the technical details from the users and makes use and access very easy. However, providing transparency involves a cost that has to be borne by someone: the more transparency the DDBS environment provides, the more cost has to be paid. In this section, we discuss who is going to pay the cost of transparency, that is, who is responsible for providing transparency. There are three major players in this regard: the Language/Compiler, the Operating System and the DDBMS.
Summary
In today’s lecture we continued the discussion of distributed systems. We discussed the setups that resemble a DDBS, and there we studied distributed file systems and multiprocessor systems. In the latter type, we have shared-everything and shared-nothing systems. We then discussed the centralized C/S system, which is also a very popular architecture for databases. Then we saw different reasons to have a DDBS and the situations it suits; we compared it with its alternative and studied why a DDBS is useful for certain types of applications. Finally, we saw what advantages we are going to have if we adopt a DDBS solution.
In previous lecture:
- Resembling Setups
- Why to have a DDBS
- Advantages/Promises of DDBS
In this lecture:
So far, we have discussed the introduction to distributed systems, and distributed databases in particular. We should have some idea of what a DDBS is, the environments it suits, and the pros and cons of DDBSs. Before moving to the DDBS topics in detail, let us step back a little and discuss some background topics: the Relational Data Model and Networks. These two underlying topics will help in understanding DDBSs better. We start with the Relational Data Model and will discuss the networking concepts later.
Data Model: a set of tools or constructs used to design a database. There are two major categories of data models. Record-based data models have relatively few constructs, and the identification of records (defining the key) is the responsibility of the user. The Network, Hierarchical and Relational Data Models are of this type. The record-based data models are also called the legacy data models. Semantic data models, in contrast, have more constructs, so they are semantically rich; moreover, the identification of records is managed by the system. Examples are the Entity-Relationship and Object-Oriented data models.
The legacy data models have been and are commercially successful. That is, the DBMSs based on these data models have been the most widely used.
The relational data model is very simple, as it has only one structure, i.e. a relation or table. It has a strong mathematical foundation.
The semantic data models, like the Object-Oriented data model, could not gain popularity as a commercial choice because they lack these same two features. On one side, the OO data model is a bit difficult to understand due to its complexity; on the other, it is not as well defined due to the lack of mathematical support. However, semantic data models are heavily used as design tools: for database design, especially for conceptual and external schemas, the semantic data models are used. The reason for this choice is that they provide more constructs, so a database design in a semantic data model is more expressive and hence easier to understand.
Since the RDM is dominant in the market, our background discussion focuses only on this data model.
Relational Data Model
A data model based on the concept of a relation or table. The concept of relation has been taken from the mathematical relation. In databases, the same relation represented in a two-dimensional way is called a table, so relation and table represent the same thing. The RDM has three components:
Structure: support for the storage of data. The RDM supports only a single structure, the relation or table.