
Query Processing

Query processing is the way a Database Management System (DBMS) parses, validates, and optimizes a given query before generating low-level code understood by the database. Just as programs in any High-Level Language (HLL) are first compiled and then executed to perform different actions, query processing in a DBMS comprises two phases, viz., compile time and runtime.

With SQL, operations on a DBMS become much easier, more structured, and systematic. Queries are not just easier for users to understand; with a basic grasp of the language, they become increasingly intuitive. SQL engines have also evolved considerably: they optimize queries for us and work out the best way to execute them.

One type of DBMS is the Relational Database Management System (RDBMS), where data is stored in the form of rows and columns (in other words, in tables) that have intuitive associations with each other. Users (both applications and humans) are free to select, insert, update, and delete these rows and columns without violating the constraints specified when the relational tables were defined. Let's say you want the list of all employees who have a salary of more than 10,000.

SELECT
emp_name
FROM
employee
WHERE
salary>10000;

The problem is that the underlying system of the DBMS won't understand this statement directly. SQL, being a High-Level Language, makes it easy for users to query data based on their needs, and it bridges the communication gap with the DBMS, which does not understand human language. But for the engine to execute a query, the query first needs to be converted to a Low-Level Language. SQL queries therefore go through a processing unit that converts them into a low-level representation via Relational Algebra. Since relational algebra expressions are more complex to write than SQL queries, the DBMS expects the user to write only SQL; it then processes the query before evaluating it.
As outlined above, query processing can be divided into compile-time and run-time phases. The compile-time phase includes:

1. Parsing and Translation (Query Compilation)
2. Query Optimization
3. Evaluation (code generation)

In the runtime phase, the database engine is primarily responsible for interpreting and executing the generated plan with physical operators and delivering the query output. Note that as soon as any of the above stages encounters an error, it simply throws the error and returns without going any further; warnings, by contrast, are not fatal, so they do not stop processing.

Parsing and Translation

The first step in query processing is Parsing and Translation. The fired queries undergo
lexical, syntactic, and semantic analysis. Essentially, the query gets broken down into
different tokens and white spaces are removed along with the comments (Lexical Analysis).
In the next step, the query is checked for correctness, both syntactically and semantically. The query processor first checks whether the rules of SQL grammar have been followed (Syntactic Analysis).

Finally, the query processor checks whether the meaning of the query is valid: are the table(s) mentioned in the query present in the database? Do the column(s) referenced actually exist in those table(s)? (Semantic Analysis)

Once the above-mentioned checks pass, the flow moves on to convert all the tokens into relational expressions, graphs, and trees, which makes the query easier for the later stages to process.

Let's consider the same query (mentioned below as well) as an example and see how the flow
works.

Query:

SELECT
emp_name
FROM
employee
WHERE
salary>10000;

The above query would be divided into the following tokens: SELECT, emp_name, FROM, employee, WHERE, salary, >, 10000.

The tokens (and hence the query) get validated as follows:

 The name of the queried table (employee) is looked up in the data dictionary.
 The names of the columns mentioned in the tokens (emp_name and salary) are validated for existence.
 The types of the values being compared must match (salary and the value 10000 should have the same data type).
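
To make these checks concrete, here is a sketch of two statements that would fail them (assuming only the employee table above, with columns emp_name and salary, exists):

-- fails syntactic analysis: SELCT is not a valid SQL keyword
SELCT emp_name FROM employee WHERE salary > 10000;

-- fails semantic analysis: employee has no column named bonus
SELECT bonus FROM employee WHERE salary > 10000;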

The next step is to translate the generated set of tokens into a relational algebra query, which is easier for the optimizer to handle in the later stages.

∏ emp_name (σ salary > 10000 (employee))


Relational graphs and trees can also be generated but for the sake of simplicity, let's keep
them out of the scope for now.

Query Evaluation

Once the query processor has the above-mentioned relational forms with it, the next step is to
apply certain rules and algorithms to generate a few other powerful and efficient data
structures. These data structures help in constructing the query evaluation plans. For example,
if the relational graph was constructed, there could be multiple paths from source to
destination. A query execution plan will be generated for each of the paths.

For our sample query, one evaluation path could be to perform the selection first, followed by the projection; another would be to project first and then select. The query is kept simple and straightforward here to ensure better comprehension, but in the case of joins and views, many more such paths (evaluation plans) open up. The evaluation plans may also include annotations referring to the algorithm(s) to be used. Relational algebra that carries annotations of this sort is known as Evaluation Primitives. As you might have figured out by now, these evaluation primitives are essential: they define the sequence of operations to be performed for a given plan.
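
Written out in relational algebra, the two alternative plans for the sample query are as follows (note that when projecting first, salary must be retained so that the selection can still be applied):

Plan 1: ∏ emp_name (σ salary > 10000 (employee))
Plan 2: ∏ emp_name (σ salary > 10000 (∏ emp_name, salary (employee)))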

Query Optimization

In the next step, the DBMS picks the most efficient evaluation plan based on the cost of each plan. The aim here is to minimize query evaluation time. The optimizer also evaluates the use of any indexes present on the table and the columns being used, and it works out the best order in which to execute subqueries, so that only the best of the plans gets executed.
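
For instance, creating an index on the filtered column can change which plan is cheapest; a minimal sketch (the index name is hypothetical):

-- with this index in place, the optimizer may prefer an index range scan
-- over a full table scan for the predicate salary > 10000
CREATE INDEX idx_employee_salary ON employee (salary);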

Simply put, for any query there are multiple evaluation plans, and choosing the one that costs the least is called Query Optimization. Some of the factors weighed by the optimizer to calculate the cost of a query evaluation plan are:

 CPU time
 Number of tuples to be scanned
 Disk access time
 Number of operations
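
Most SQL engines expose the chosen plan and its estimated cost; a minimal sketch in PostgreSQL/MySQL-style syntax using our sample query:

-- displays the evaluation plan the optimizer picked, with cost estimates
EXPLAIN
SELECT emp_name FROM employee WHERE salary > 10000;
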
Query Optimization in DBMS
The query optimizer (also known simply as the optimizer) is database software that identifies the most efficient way (for example, by reducing time) for a SQL statement to access data.

Introduction to Query Optimization in DBMS

The process of selecting an efficient execution plan for processing a query is known as query
optimization.

After query parsing, the parsed query is delivered to the query optimizer, which determines how many different ways the query can be run, generates the corresponding execution plans, and selects the plan with the lowest estimated cost. The catalog manager assists the optimizer in selecting the optimum plan by supplying the information used to compute the cost of each plan.

Query optimization is used to access and modify the database in the most efficient way
possible. It is the art of obtaining necessary information in a predictable, reliable, and timely
manner. Query optimization is formally described as the process of transforming a query into
an equivalent form that may be evaluated more efficiently. The goal of query optimization is
to find an execution plan that reduces the time required to process a query. We must complete
two major tasks to attain this optimization target.

The first is to determine the optimal plan to access the database, and the second is to reduce
the time required to execute the query plan.

Purpose of the Query Optimizer in DBMS

The optimizer tries to come up with the best execution plan possible for a SQL statement.

Among all the candidate plans reviewed, the optimizer chooses the plan with the lowest cost.
The optimizer computes costs based on available facts. The cost computation takes into
account query execution factors such as I/O, CPU, and communication for a certain query in
a given context.

Sr. No   Class   Name     Role
01       10      Shreya   CR
02       10      Ritik

For example, there is a query that requests information about students who are in leadership
roles, such as being a class representative. If the optimizer statistics show that 50% of
students are in positions of leadership, the optimizer may decide that a full table search is the
most efficient. However, if data show that just a small number of students are in positions of
leadership, reading an index followed by table access by row id may be more efficient than a
full table scan.
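
A sketch of the two access paths for such a query (the table, column, and index names here are hypothetical):

-- if statistics say many students hold a role, a full table scan may win;
-- if only a few do, the optimizer may use this index plus table access by rowid
CREATE INDEX idx_student_role ON student (role);

SELECT name FROM student WHERE role = 'CR';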

Because the database has so many internal statistics and tools at its disposal, the optimizer is
frequently in a better position than the user to decide the best way to execute a statement. As
a result, the optimizer is used by all SQL statements.

Cost-Based Query Optimization in DBMS

Query optimization is the process of selecting the most efficient way to execute a SQL
statement. Because SQL is a nonprocedural language, the optimizer can merge, restructure,
and process data in any sequence.

The Optimizer assigns a numerical cost to each step of a feasible plan for a given query and environment, and then combines these values to get a total cost estimate for the plan or possible strategy. After evaluating the costs of all feasible plans, the Optimizer aims to find the plan with the lowest cost estimate. As a result, the Optimizer is sometimes known as the Cost-Based Optimizer.

 Execution Plans:

An execution plan specifies the best way to execute a SQL statement.

The plan describes the steps taken by Oracle Database to execute a SQL statement. Each step
physically retrieves or prepares rows of data from the database for the statement's user.

An execution plan shows the total cost of the plan, which is stated on line 0, as well as the
cost of each individual operation. A cost is an internal unit that appears solely in the
execution plan to allow for plan comparisons. As a result, the cost value cannot be fine-tuned
or adjusted.
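
In Oracle, for example, a plan with its per-operation costs can be displayed as follows (the filter value is an assumption):

EXPLAIN PLAN FOR
SELECT first_name, last_name
FROM hr.employees
WHERE department_id = 60;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);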

 Query Blocks

The optimizer receives a parsed representation of a SQL statement as input. Each SELECT block in the original SQL statement is internally represented by a query block. A query block can be a top-level statement, a subquery, or an unmerged view. Let's take an example where the SQL statement that follows is made up of two query blocks. The inner query block is the subquery in parentheses. The outer query block, which is the remainder of the SQL statement, obtains the names of employees in the departments whose IDs were supplied by the subquery. The query form specifies how the query blocks are connected.

SELECT first_name, last_name
FROM hr.employees
WHERE department_id IN
  (SELECT department_id
   FROM hr.departments
   WHERE location_id = 1800);

 Query Sub Plans

The optimizer creates a query sub-plan for each query block.

From the bottom up, the database optimizes query blocks separately. As a result, the database optimizes the innermost query block first and generates a sub-plan for it, before generating the plan for the outer query block, which represents the full query.

The number of possible query block plans is proportional to the number of objects in the FROM clause, and it climbs exponentially as the number of objects rises. The possibilities for a join of five tables, for example, are far greater than those for a join of two tables.
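
For instance, n tables can be arranged in n! left-deep join orders: two tables allow only 2 orders, while five tables already allow 5! = 120.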

 Analogy for the Optimizer

An online trip advisor is one analogy for the optimizer.

A biker wishes to find the most efficient bicycle path from point A to point B. A query is
analogous to the phrase "I need the quickest route from point A to point B" or "I need the
quickest route from point A to point B via point C". To choose the most efficient route, the
trip advisor employs an internal algorithm that takes into account factors such as speed and
difficulty. The biker can sway the trip advisor's judgment by saying things like "I want to
arrive as quickly as possible" or "I want the simplest route possible.”

In this example, an execution plan is a possible path generated by the travel advisor.
Internally, the advisor may divide the overall route into multiple subroutes (sub plans) and
compute the efficiency of each subroute separately. For example, the trip advisor may
estimate one subroute to take 15 minutes and be of medium difficulty, another subroute to
take 22 minutes and be of low difficulty, and so on.

Based on the user-specified goals and accessible facts about roads and traffic conditions, the
advisor selects the most efficient (lowest cost) overall route. The better the guidance, the
more accurate the statistics. For example, if the advisor is not kept up to date on traffic
delays, road closures, and poor road conditions, the proposed route may prove inefficient
(high cost).

Heuristic Query Optimization Technique

Heuristics are used to reduce the number of choices that must be made in a cost-based
approach.

Rules

Heuristic optimization transforms the expression tree by applying a set of rules that typically improve performance. These rules are as follows:
 Perform selection operations as early as possible in the query. This should be the first action for any SQL table; by doing so, we decrease the number of records the rest of the query has to process, rather than carrying all rows of every table through the query.
 Perform all projections as early as achievable in the query. Somewhat like early selection, this decreases the number of columns carried through the query.
 Perform the most restrictive joins and selection operations first. That is, operate on those sets of tables and/or views that result in relatively few records and are strictly necessary in the query. Obviously, any query executes better when tables with few records are joined.
Some systems use only heuristics, while others combine heuristics with partial cost-based optimization.

Steps in heuristic optimization

The steps involved in heuristic optimization are explained below:
 Deconstruct conjunctive selections into a sequence of single selection operations.
 Move the selection operations down the query tree for the earliest possible execution.
 Execute first those selection and join operations that produce the smallest relations.
 Replace a cartesian product operation followed by a selection operation with a join operation (see the SQL sketch after this list).
 Deconstruct projections and move them down the tree as far as possible.
 Identify those subtrees whose operations can be pipelined.
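
A sketch of the cartesian-product-to-join step in SQL, using hypothetical employee and department tables:

-- a cartesian product followed by a selection on the join condition ...
SELECT e.emp_name, d.dept_name
FROM employee e, department d
WHERE e.dept_id = d.dept_id AND e.salary > 10000;

-- ... is replaced by an explicit join, so the full cross product
-- never has to be materialized
SELECT e.emp_name, d.dept_name
FROM employee e
JOIN department d ON e.dept_id = d.dept_id
WHERE e.salary > 10000;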

Functional Dependency in DBMS

Functional Dependency is a relationship between the attributes (characteristics) of a table. The functional dependency of B on A is represented by A → B, where A and B are attributes of the relation.

What is Functional Dependency?

A relational database is a collection of data stored in rows and columns. Columns represent the characteristics of the data, while each row in a table represents a set of related data; every row in the table has the same structure. A row is sometimes referred to as a tuple.

Have a look at the Employee table below. It contains the following attributes as columns:

1. Employee_Id
2. Employee_Name
3. Employee_Department
Employee Table
Employee_Id Employee_Name Employee_Department

1 Ryan Mechanical

2 Justin Biotechnology

3 Andrew Computer Science

4 Felix Human Resource

Now that we are clear with the jargon related to functional dependency, let's discuss what
functional dependency is.

 Functional Dependency in DBMS, as the name suggests, is the relationship between attributes (characteristics) of a table.
 A relation with functional dependencies always follows a set of rules called the RAT rules, proposed by William Armstrong in 1974.
 It helps in maintaining the quality of data in the database, and the core concepts behind database normalization are based on functional dependencies.
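
In SQL, the most common way a functional dependency is enforced is through a key constraint; a minimal sketch (the column types and sizes are assumptions):

-- declaring Employee_Id as the primary key enforces
-- Employee_Id -> { Employee_Name, Employee_Department }
CREATE TABLE Employee (
    Employee_Id         INT PRIMARY KEY,
    Employee_Name       VARCHAR(50),
    Employee_Department VARCHAR(50)
);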

How to denote a Functional Dependency?

A functional dependency is denoted by an arrow “→”. The functional dependency of B on A is written A → B and read as “A determines B”.

Consider a relation with four attributes A, B, C and D,

R (ABCD)

1. A → BCD
2. B → CD

 For the first functional dependency, A → BCD, attributes B, C, and D are functionally dependent on attribute A.
 The functional dependency B → CD has two attributes, C and D, functionally dependent on attribute B.
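
These dependencies can be chained: the closure of A (the set of all attributes determined by A) can be worked out step by step:

A+ = { A }                (reflexivity)
A → BCD adds B, C, and D
A+ = { A, B, C, D }       (A determines every attribute, so A is a key of R)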

Sometimes everything on the left side of a functional dependency is referred to as the determinant set, while everything on the right side is referred to as the dependent attributes.

 Functional dependency can also be represented diagrammatically: the arrow points to the dependent attributes, and the origin of the arrow marks the determinant set.

Armstrong’s Axioms/Properties of Functional Dependency

In 1974, William Armstrong suggested a set of inference rules for functional dependencies. They are called the RAT rules, after Reflexivity, Augmentation, and Transitivity.

1. Reflexivity: If A is a set of attributes and B is a subset of A, then the functional
dependency A → B holds true.
o For example, { Employee_Id, Name } → Name is valid.

2. Augmentation: If a functional dependency A → B holds true, then appending any set of attributes to both sides of the dependency does not affect it. It remains true.
o For example, if X → Y holds true, then ZX → ZY also holds true.
o For example, if { Employee_Id, Name } → { Name } holds true, then { Employee_Id, Name, Age } → { Name, Age } also holds true.

3. Transitivity: If two functional dependencies X → Y and Y → Z hold true, then X → Z also holds true by the rule of Transitivity.
o For example, if { Employee_Id } → { Name } holds true and { Name } → { Department } holds true, then { Employee_Id } → { Department } also holds true.

Types of Functional Dependencies

1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency

Trivial Functional Dependency

 In a trivial functional dependency, the dependent is always a subset of the determinant. In other words, a functional dependency is called trivial if the attributes on the right side are a subset of the attributes on the left side of the functional dependency.
 X → Y is called a trivial functional dependency if Y is a subset of X.
 For example, consider the Employee table below.
 For example, consider the Employee table below.

Employee_Id Name Age


1 Zayn 24
2 Phobe 34
3 Hikki 26
4 David 29

 Here, { Employee_Id, Name } → { Name } is a trivial functional dependency, since the dependent Name is a subset of the determinant { Employee_Id, Name }.
 { Employee_Id } → { Employee_Id }, { Name } → { Name } and { Age } → { Age } are also trivial.

Non-Trivial Functional Dependency

 It is the opposite of a trivial functional dependency: in a non-trivial functional dependency, the dependent is not a subset of the determinant.
 X → Y is called a non-trivial functional dependency if Y is not a subset of X. That is, a functional dependency X → Y, where X is a set of attributes and Y is also a set of attributes but not a subset of X, is called non-trivial.
 For example, consider the Employee table below.
 For example, consider the Employee table below.

Employee_Id Name Age


1 Zayn 24
2 Phobe 34
3 Hikki 26
4 David 29

 Here, { Employee_Id } → { Name } is a non-trivial functional dependency, because Name (the dependent) is not a subset of Employee_Id (the determinant).
 Similarly, { Employee_Id, Name } → { Age } is also a non-trivial functional dependency.

Multivalued Functional Dependency

 In a multivalued functional dependency, the attributes in the dependent set are not dependent on each other.
 For example, given X → { Y, Z }, if there exists no functional dependency between Y and Z, then it is called a multivalued functional dependency.
 For example, consider the Employee table below.
 For example, consider the Employee table below.

Employee_Id Name Age


1 Zayn 24
2 Phobe 34
3 Hikki 26
4 David 29
5 Phobe 24
 Here, { Employee_Id } → { Name, Age } is a multivalued functional dependency, since the dependent attributes Name and Age are not functionally dependent on each other (i.e., neither Name → Age nor Age → Name exists).

Transitive Functional Dependency

 Consider two functional dependencies A → B and B → C. According to the transitivity axiom, A → C must also hold. This is called a transitive functional dependency.
 In other words, in a transitive functional dependency, the dependent is indirectly dependent on the determinant.
 For example, consider the Employee table below.
 For example, consider the Employee table below.

Employee_Id Name Department Street Number


1 Zayn CD 11
2 Phobe AB 24
3 Hikki CD 11
4 David PQ 71
5 Phobe LM 21

 Here, Employee_Id → Department and Department → Street Number hold true. Hence, by the axiom of transitivity, Employee_Id → Street Number is a valid functional dependency.

Advantages of Functional Dependency

Let's discuss some of the advantages of Functional dependency,

1. It is used to maintain the quality of data in the database.
2. It expresses facts about the database design.
3. It helps in clearly defining the meanings and constraints of databases.
4. It helps to identify bad designs.
5. Functional dependency helps remove data redundancy, so that the same values are not repeated at multiple locations in the same database table.
6. The process of normalization starts with identifying the candidate keys in the relation. Without functional dependencies, it is impossible to find candidate keys and normalize the database.

Decomposition in DBMS

Decomposition in a Database Management System means breaking a relation into multiple relations to bring it into an appropriate normal form. It helps to remove redundancy, inconsistencies, and anomalies from a database. The decomposition of a relation R in a relational schema is the process of replacing the original relation R with two or more relations, each of which contains a subset of the attributes of R; together they include all attributes of R.
If a relation is not properly decomposed, it may lead to other problems like information loss. Decomposition is of two types: lossless and lossy.

Rules for Decomposition

Whenever we decompose a relation, there are certain properties that must be satisfied to
ensure no information is lost while decomposing the relations. These properties are:

1. Lossless Join Decomposition.
2. Dependency Preserving.

Lossless Join Decomposition

A lossless join decomposition ensures two things:

 No information is lost while decomposing the original relation.
 If we join the decomposed sub-relations back together, we obtain the same relation that was decomposed.

We can follow certain rules to ensure that a decomposition is a lossless join decomposition. Let's say we have a relation R and we decompose it into R1 and R2; then the rules are:

1. The union of attributes of both the sub relations R1 and R2 must contain all the
attributes of original relation R.

R1 ∪ R2 = R

2. The intersection of attributes of both the sub relations R1 and R2 must not be null,
i.e., there should be some attributes that are present in both R1 and R2.

R1 ∩ R2 ≠ ∅

3. The intersection of attributes of both the sub relations R1 and R2 must be the
superkey of R1 or R2, or both R1 and R2.

R1 ∩ R2 = Super key of R1 or R2
Let’s see an example of a lossless join decomposition. Suppose we have the following
relation EmployeeProjectDetail as:

<EmployeeProjectDetail>

Employee_Code  Employee_Name  Employee_Email      Project_Name  Project_ID
101            John           john@demo.com       Project103    P03
101            John           john@demo.com       Project101    P01
102            Ryan           ryan@example.com    Project104    P04
103            Stephanie      stephanie@abc.com   Project102    P02

Now, we decompose this relation into EmployeeProject and ProjectDetail relations as:

<EmployeeProject>

Employee_Code  Project_ID  Employee_Name  Employee_Email
101            P03         John           john@demo.com
101            P01         John           john@demo.com
102            P04         Ryan           ryan@example.com
103            P02         Stephanie      stephanie@abc.com

The primary key of the above relation is {Employee_Code, Project_ID}.

<ProjectDetail>

Project_ID Project_Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102

The primary key of the above relation is {Project_ID}.

Now, let’s see if this is a lossless join decomposition by evaluating the rules discussed above:

Let’s first check the EmployeeProject ∪ ProjectDetail:

<EmployeeProject ∪ ProjectDetail>

Employee_Code  Project_ID  Employee_Name  Employee_Email      Project_Name
101            P03         John           john@demo.com       Project103
101            P01         John           john@demo.com       Project101
102            P04         Ryan           ryan@example.com    Project104
103            P02         Stephanie      stephanie@abc.com   Project102
As we can see, all the attributes of EmployeeProject and ProjectDetail appear in the EmployeeProject ∪ ProjectDetail relation, and it is the same as the original relation. So the first condition holds.

Now let’s check the EmployeeProject ∩ ProjectDetail:

<EmployeeProject ∩ ProjectDetail>

Project_ID
P03
P01
P04
P02

As we can see, this is not null, so the second condition holds as well. Also, EmployeeProject ∩ ProjectDetail = Project_ID, which is the super key of the ProjectDetail relation, so the third condition holds too.

Now, since all three conditions hold for our decomposition, this is a lossless join
decomposition.
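
As a sanity check, joining the decomposed relations back together in SQL should reproduce the original relation; a minimal sketch with the tables above:

-- rebuilds EmployeeProjectDetail from EmployeeProject and ProjectDetail
SELECT ep.Employee_Code, ep.Employee_Name, ep.Employee_Email,
       pd.Project_Name, ep.Project_ID
FROM EmployeeProject ep
JOIN ProjectDetail pd ON pd.Project_ID = ep.Project_ID;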

Dependency Preserving

The second property of a good decomposition is dependency preservation, which says that after decomposing a relation R into R1 and R2, all dependencies of the original relation R must be present either in R1 or in R2, or they must be derivable from the combination of the functional dependencies present in R1 and R2.

Let’s understand this from the same example above:

<EmployeeProjectDetail>

Employee_Code  Employee_Name  Employee_Email      Project_Name  Project_ID
101            John           john@demo.com       Project103    P03
101            John           john@demo.com       Project101    P01
102            Ryan           ryan@example.com    Project104    P04
103            Stephanie      stephanie@abc.com   Project102    P02

In this relation we have the following FDs:

 Employee_Code -> { Employee_Name, Employee_Email }
 Project_ID -> Project_Name

Now, after decomposing the relation into EmployeeProject and ProjectDetail as:

<EmployeeProject>

Employee_Code  Project_ID  Employee_Name  Employee_Email
101            P03         John           john@demo.com
101            P01         John           john@demo.com
102            P04         Ryan           ryan@example.com
103            P02         Stephanie      stephanie@abc.com

In this relation we have the following FDs:

 Employee_Code -> {Employee_Name, Employee_Email}

<ProjectDetail>

Project_ID Project_Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102

In this relation we have the following FDs:

 Project_ID -> Project_Name

As we can see, all FDs in EmployeeProjectDetail are part of either EmployeeProject or ProjectDetail, so this decomposition is dependency preserving.

Distributed Databases in DBMS

A distributed database is a database that is not limited to one computer system.


It is like a database that consists of two or more files located in different
computers or sites either on the same network or on an entirely different
network. Instead of storing all of the data in one database, data is divided and
stored at different locations or sites which do not share any physical
component.

Need of Distributed Database


Let's start with the databases and their types,

 A database is a structured collection of information. The data in a database can be easily accessed, managed, modified, updated, controlled, and organized.
 Databases can be broadly classified into two types, namely Distributed and Centralized databases. The question here is: why do we even need a distributed database? Let's assume for a moment that we have only centralized databases.
o We would be inserting all the data into one single database, making it so large that querying even a single record takes a long time.
o Once a fault occurs, we would no longer be able to serve user requests, as we have only one database.
o No scaling is possible even if we wanted it, and availability is also lower, which in turn affects throughput.

Distributed databases resolve various issues, such as availability, fault tolerance, throughput, latency, scalability, and many other problems that can arise from using a single machine and a single database. That's why we need distributed databases. Let's discuss them in detail.

Distributed Databases
 A distributed database is a database that is not limited to one computer system. It is
like a database that consists of two or more files located in different computers or
sites either on the same network or on an entirely different network.
 These sites do not share any physical component. Distributed databases are needed
when a particular data in the database needs to be accessed by various users globally.
It needs to be handled in such a way that for a user it always looks like one single
database.
 By contrast, a Centralized database consists of a single database file located at one
site using a single network.

 Though there are many distributed databases to choose from, some examples of
distributed databases include Apache Ignite, Apache Cassandra, Apache
HBase, Amazon SimpleDB, Clusterpoint, and FoundationDB.

Features of Distributed Databases


In general, distributed databases include the following features:

1. Location independency: Data is independently stored at multiple sites and managed by independent distributed database management systems (DDBMS).
2. Network linking: All distributed databases in a collection are linked by a network and
communicate with each other.
3. Distributed query processing: Distributed query processing is the procedure of
answering queries (which means mainly read operations on large data sets) in a
distributed environment.
o Query processing involves the transformation of a high-level
query (e.g., formulated in SQL) into a query execution plan (consisting of
lower-level query operators in some variation of relational algebra) as well as
the execution of this plan.
4. Hardware independent: The different sites where data is stored are hardware-independent. There is no physical coupling between these distributed databases, which is often accomplished through virtualization.
5. Distributed transaction management: A distributed database provides consistency through commit protocols, distributed recovery methods, and distributed concurrency control techniques, even in the presence of transaction failures.
Advantages of Distributed Database
1. Better Reliability: Distributed databases offer better reliability than centralized databases. When a failure occurs in a centralized database, the system comes to a complete stop. In a distributed database, the system keeps functioning even when a failure occurs; only performance degrades, which is tolerable.
2. Modular Development: It implies that the system can be expanded by adding new
computers and local data to the new site and connecting them to the distributed
system without interruption.
3. Lower Communication Cost: Locally storing data reduces communication costs for
data manipulation in distributed databases. In centralized databases, local storage is
not possible.
4. Better Response Time: As the data is distributed efficiently in distributed databases,
this provides a better response time when user queries are met locally. While in the
case of centralized databases, all of the queries have to pass through the central
machine which increases response time.

Disadvantages of Distributed Database


1. Costly Software: Maintaining a distributed database is costly because ensuring data transparency and coordination across multiple sites requires expensive software.
2. Large Overhead: Many operations on multiple sites require complex and numerous
calculations, causing a lot of processing overhead.
3. Improper Data Distribution: If data is not properly distributed across different sites,
then responsiveness to user requests is affected. This in turn increases the response
time.
