Introduction To Parallel Databases

Introduction to Parallel Databases
 A parallel database system improves data-processing performance by using
multiple resources, such as CPUs and disks, in parallel.
 It also parallelizes many operations, such as data loading and query
processing.
 Parallel processing divides a large task into many smaller tasks, and
executes the smaller tasks concurrently on several nodes. As a result,
the larger task completes more quickly.
 A node is a separate processor.
 Multiple processors can reside on a single machine or on multiple machines.
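A minimal sketch of the idea of dividing a large task into smaller tasks, assuming the "large task" is summing a list of numbers. The worker threads stand in for nodes, and the names `split_into_chunks` and `parallel_sum` are illustrative only. In CPython, threads do not speed up CPU-bound work, so this shows the structure of the technique rather than a real performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_chunks):
    """Split a list into at most num_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(items) // num_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_sum(values, num_workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    chunks = split_into_chunks(values, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1, 1001))
    print(parallel_sum(data))  # prints 500500
```

The smaller tasks here are independent of each other, which is what allows them to run concurrently; the final `sum` over the partial results is the only step that has to wait for all workers.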
Goals of Parallel Databases
 Improve performance:
 The performance of the system can be improved by connecting multiple CPUs
and disks in parallel.
 Improve availability of data:
 Data can be copied to multiple locations to improve its availability.
 Improve reliability:
 The reliability of the system is improved through the completeness,
accuracy and availability of data.
 Provide distributed access to data:
 Companies with branches in multiple cities can access data with the help
of a parallel database system.
Parallelism
 Divide a big problem into many smaller ones to be solved in parallel.
Inter query (independent) parallelism – Execution of different queries on
different processors, which is possible only if the queries can be executed
independently of each other. (Joining of Tables - 4)
Intra query parallelism – The same query is executed by several processors; it
may use either inter operation or intra operation parallelism.
Inter operation (pipelined) parallelism – Execution of the different operations
of a query on different processors. (Joining of Tables - 3)
Intra operation parallelism – Execution of the same operation in parallel by
multiple processors.
For example, the ORDER BY clause of a query that runs over millions of
records can be parallelized across multiple processors.
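The ORDER BY case can be sketched as follows, assuming one common way to parallelize a sort: split the rows into partitions, sort each partition on its own worker, then merge the sorted runs. The function name `parallel_order_by` is hypothetical, and worker threads stand in for processors.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_order_by(rows, key, num_workers=4):
    """Sort rows by key: sort each partition on a worker, then merge the runs."""
    if not rows:
        return []
    size = -(-len(rows) // num_workers)  # ceiling division
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Every worker performs the same operation (a sort) on different data.
        sorted_runs = list(pool.map(lambda part: sorted(part, key=key), partitions))
    # heapq.merge combines already-sorted runs into one sorted sequence.
    return list(heapq.merge(*sorted_runs, key=key))


if __name__ == "__main__":
    records = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]
    print(parallel_order_by(records, key=lambda r: r[0], num_workers=2))
```

This is intra operation parallelism because a single operation, the sort, is spread over several workers; only the final merge runs in one place.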
Techniques of Query Evaluation
The two techniques used in query evaluation are as follows:
1. Inter query parallelism
 This technique runs multiple queries on different processors simultaneously.
 Inter query parallelism improves the throughput of the system, that is, the
number of queries completed in a given time.
 For example: suppose there are 6 independent queries and each takes 3 seconds
to evaluate. Evaluated one after another, they take 18 seconds in total. With
inter query parallelism on 6 processors, all of them can finish in about 3 seconds.
However, inter query parallelism is difficult to achieve every time.

2. Intra query parallelism
 A query is divided into sub-queries that run simultaneously on different
processors, which reduces the query evaluation time.
 Intra query parallelism improves the response time of the system.
 For example: suppose a query takes 18 seconds to evaluate sequentially and can
be divided into 6 sub-queries of 3 seconds each. Running the sub-queries on 6
processors completes the evaluation in about 3 seconds.
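The arithmetic behind the 18-second and 3-second figures can be written out as a small calculation. This is an idealized sketch: it assumes the tasks are independent, take equal time, and that there is no coordination overhead. The function names are illustrative only.

```python
import math


def sequential_time(num_tasks, seconds_per_task):
    """Time to run all tasks one after another on a single processor."""
    return num_tasks * seconds_per_task


def ideal_parallel_time(num_tasks, seconds_per_task, num_processors):
    """Time when up to num_processors tasks run at once, with no overhead."""
    rounds = math.ceil(num_tasks / num_processors)
    return rounds * seconds_per_task


if __name__ == "__main__":
    print(sequential_time(6, 3))          # 18 seconds on one processor
    print(ideal_parallel_time(6, 3, 6))   # 3 seconds on six processors
    print(ideal_parallel_time(6, 3, 4))   # 6 seconds: two rounds are needed
```

With fewer processors than tasks, some processors must run a second round, so the gain is smaller than the ideal sixfold speedup.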
Optimization of Parallel Query
 Parallel query optimization is the process of selecting an efficient query
evaluation plan.
 Parallel query optimization plays an important role in minimizing the cost
of query evaluation.
 Two factors play a very important role in parallel query optimization:
 the total time spent finding the best plan.
 the amount of time required to execute the plan.
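The trade-off between the two factors can be illustrated with a small sketch. It assumes that the overall cost of a strategy is simply the time spent searching for a plan plus the time spent executing it. The strategy names and timings below are hypothetical numbers chosen for illustration, not measurements.

```python
def total_time(search_seconds, execution_seconds):
    """Overall cost: time to find the plan plus time to run it."""
    return search_seconds + execution_seconds


def best_strategy(strategies):
    """Return the name of the strategy with the lowest total time.

    strategies maps a name to a (search_seconds, execution_seconds) pair.
    """
    return min(strategies, key=lambda name: total_time(*strategies[name]))


if __name__ == "__main__":
    # Hypothetical figures: an exhaustive search finds a faster plan
    # but spends much longer looking for it.
    strategies = {
        "exhaustive search": (5.0, 2.0),
        "heuristic search": (0.5, 3.0),
    }
    print(best_strategy(strategies))  # prints heuristic search
```

Under these assumed numbers the plan that executes fastest is not the best overall choice, because the time spent finding it outweighs the saving.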
Goals of Query Optimization
 Speed up queries by finding the evaluation plans that give the fastest
result on execution.
 Increase the performance of the system.
 Select the best query evaluation plan.
 Avoid unwanted plans.
Approaches of Query Optimization
Three approaches to query optimization:
1. Horizontal partitioning: The table is partitioned row-wise; each partition
holds a subset of the rows with all of the columns.
2. Vertical partitioning: Tables are created with fewer columns; the table is
partitioned column-wise.
3. De-normalization: In this approach multiple tables are combined into one
table.
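A minimal sketch of the three approaches, using the usual definitions: horizontal partitioning splits a table by rows, vertical partitioning splits it by columns, and de-normalization combines tables into one. Tables are modeled as lists of dictionaries, keys are assumed to be integers, and the function names and sample data are illustrative only.

```python
def horizontal_partition(rows, key, num_partitions):
    """Split rows across partitions; every partition keeps all columns."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions


def vertical_partition(rows, key, column_groups):
    """Create one narrower table per column group; each keeps the key column."""
    return [
        [{col: row[col] for col in (key, *group)} for row in rows]
        for group in column_groups
    ]


def denormalize(left_rows, right_rows, key):
    """Combine two tables into one by matching rows on the key column."""
    right_by_key = {row[key]: row for row in right_rows}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left_rows
        if row[key] in right_by_key
    ]


if __name__ == "__main__":
    employees = [
        {"id": 1, "name": "Asha", "city": "Pune"},
        {"id": 2, "name": "Ben", "city": "Delhi"},
        {"id": 3, "name": "Chen", "city": "Pune"},
    ]
    print(horizontal_partition(employees, "id", 2))
    names, cities = vertical_partition(employees, "id", [["name"], ["city"]])
    print(names)
    print(denormalize(names, cities, "id"))
```

Combining the two vertical partitions on the key column gives back the original rows, which is why each vertical partition has to keep the key.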
[Figure: vertical and horizontal partitioning]
 In parallel computing, multiple processors perform the tasks assigned to them
simultaneously.
 A parallel database runs multiple instances that "share" a single physical
database.
Types of Parallel Database Architecture
Shared memory system

Examples: Sequent, SGI, Sun
 A shared memory system uses multiple processors that are attached to a global
shared memory through an intercommunication channel or communication bus.
 Shared memory systems have a large cache at each processor, so references to
the shared memory are avoided whenever possible.
 If a processor writes to a memory location, the copies of that location held
in other processors' caches must be updated or removed.
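The rule about writes can be sketched with a toy model, assuming a simple write-through policy in which a write removes (invalidates) the copies cached by other processors. Real hardware coherence protocols are considerably more involved; the class and method names here are hypothetical.

```python
class SharedMemorySystem:
    """Toy model: one global memory plus a private cache per processor."""

    def __init__(self, num_processors):
        self.memory = {}
        self.caches = [{} for _ in range(num_processors)]

    def read(self, processor, address):
        """Serve the read from the local cache, filling it from memory on a miss."""
        cache = self.caches[processor]
        if address not in cache:
            cache[address] = self.memory.get(address)
        return cache[address]

    def write(self, processor, address, value):
        """Write through to memory and remove stale copies in other caches."""
        self.memory[address] = value
        self.caches[processor][address] = value
        for other, cache in enumerate(self.caches):
            if other != processor:
                cache.pop(address, None)


if __name__ == "__main__":
    system = SharedMemorySystem(num_processors=2)
    system.write(0, "x", 1)
    print(system.read(1, "x"))  # processor 1 caches the value 1
    system.write(0, "x", 2)     # processor 1's cached copy is removed
    print(system.read(1, "x"))  # processor 1 re-reads and sees 2
```

Without the invalidation step in `write`, processor 1 would keep returning its old cached value after processor 0 changed it.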
Advantages of Shared Memory System
 Data is easily accessible to any processor.
 One processor can send messages to another efficiently.
 Processors communicate effectively through a single memory address space.
 This leads to less communication overhead.

Disadvantages of Shared Memory System
1. Adding processors can slow down the existing processors.
2. Cache coherency must be maintained. That is, if a processor tries to read
data used or modified by other processors, we need to ensure that it sees the
latest version of the data.
3. The degree of parallelism is limited. A larger number of parallel processes
might degrade performance.
Shared Disk System
Examples: VMScluster, Sysplex
 A shared disk system uses multiple processors that can all access multiple
disks through an intercommunication channel; every processor has its own
local memory.
 Each processor has its own memory, so data sharing is efficient.
 Systems built on this architecture are called clusters.
Advantages of Shared Disk System
 Fault tolerance is achieved using a shared disk system.
 Fault tolerance: if a processor or its memory fails, another processor can
complete the task.
 The interconnection to the memory is not a bottleneck (it was a bottleneck
in the shared memory architecture).
 It supports a larger number of processors than the shared memory
architecture.
Disadvantages of Shared Disk System
 A shared disk system has limited scalability, because a large amount of data
travels through the interconnection channel.
 If more processors are added, the existing processors are slowed down.
 Inter-processor communication is slow, because every processor has its own
memory.
Shared nothing disk system
Examples: Teradata, Tandem, SP2
 Each processor in a shared nothing system has its own local memory and
local disk.
 Processors communicate with each other through an intercommunication
channel.
 Any processor can act as a server for the data stored on its local disk.
Advantages of Shared nothing disk system
 Processors and disks can be added as required in a shared nothing disk
system.
 A shared nothing disk system can support many processors, which makes the
system more scalable.
 Unlike in the other two architectures, only the data requests that cannot be
answered by the local processor need to be forwarded through the
interconnection network.
Disadvantages of Shared nothing disk system
 Data partitioning is required in a shared nothing disk system.
 The cost of communication for accessing a non-local disk is much higher than
for a local disk.
 Non-local disk accesses are also more complex: if the server that receives a
request does not hold the required data, the request must be routed to the
server where the data is available.
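A minimal sketch of partitioning and request routing in a shared nothing system. It assumes integer keys and a simple modulo rule to decide which node owns a key; real systems may use hash or range partitioning instead. The class names and the `forwarded_requests` counter are hypothetical and exist only to make the extra cost of non-local access visible.

```python
class Node:
    """A processor with its own local disk, modeled here as a dictionary."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.local_disk = {}


class SharedNothingCluster:
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.forwarded_requests = 0

    def owner_of(self, key):
        """Partitioning rule (an assumption): key modulo the number of nodes."""
        return self.nodes[key % len(self.nodes)]

    def put(self, key, value):
        """Store the value on the local disk of the node that owns the key."""
        self.owner_of(key).local_disk[key] = value

    def get(self, key, receiving_node_id):
        """Answer locally if possible; otherwise forward to the owning node."""
        owner = self.owner_of(key)
        if owner.node_id != receiving_node_id:
            self.forwarded_requests += 1  # request crosses the interconnect
        return owner.local_disk.get(key)


if __name__ == "__main__":
    cluster = SharedNothingCluster(num_nodes=3)
    for key in range(6):
        cluster.put(key, f"row-{key}")
    print(cluster.get(4, receiving_node_id=1))  # local: node 1 owns key 4
    print(cluster.get(5, receiving_node_id=1))  # forwarded to node 2
    print(cluster.forwarded_requests)           # prints 1
```

Only the second request has to travel over the interconnection network, which is the cost that the partitioning scheme tries to keep low.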
Hierarchical System or Non-Uniform Memory Architecture
 A hierarchical system is a hybrid of the shared memory, shared disk and
shared nothing systems.
 The hierarchical model is also known as Non-Uniform Memory Architecture
(NUMA).
 In this system each group of processors has a local memory, but processors
from other groups can also access that memory coherently.
 NUMA uses both local and remote memory (memory from another group), so
accessing remote memory takes longer than accessing local memory.
Advantages of NUMA
 Improves the scalability of the system.
 The memory bottleneck (shortage of memory) problem is minimized in this
architecture.
Disadvantages of NUMA
 The cost of the architecture is higher compared to other architectures.

Single Instance with Exclusive Access
A single instance database system running on a symmetric multiprocessor (SMP).
The database itself is located on a set of disks.

A multi instance database system running on a symmetric multiprocessor (SMP).
The database itself is located on a set of disks.
Distributed Database System
 A distributed database is a database that is not limited to one system; it
is spread over different sites, i.e., on multiple computers or over a network
of computers.
 A distributed database system is located on various sites that don’t share
physical components.
 This may be required when a particular database needs to be accessed by
various users globally.
 It needs to be managed so that it looks like a single database to its users.
Oracle Parallel Server as Part of a Distributed Database
[Figure: distributed and client-server configurations]
Parallel Computing vs. Distributed Computing
S.NO  PARALLEL COMPUTING                                     DISTRIBUTED COMPUTING
1.    Many operations are performed simultaneously           System components are located at different locations
2.    A single computer is required                          Uses multiple computers
3.    Multiple processors perform multiple operations        Multiple computers perform multiple operations
4.    It may have shared or distributed memory               It has only distributed memory
5.    Processors communicate with each other through a bus   Computers communicate with each other through message passing
6.    Improves the system performance                        Improves system scalability, fault tolerance and resource sharing capabilities
