Parallel and Cluster Computing
Advantages:
Increasing speed and optimizing resource utilization
Disadvantages:
Complex programming models make development difficult
Several applications of parallel processing
Architectures of Parallel Computers:
PVP (Parallel Vector Processor)
SMP (Symmetric Multiprocessor)
MPP (Massively Parallel Processor)
COW (Cluster of Workstation)
DSM (Distributed Shared Memory)
Towards Inexpensive Supercomputing: Cluster Computing is Commodity Supercomputing
Share by architecture (from the chart on the original slide): Cluster 58.8%, MPP 20.0%, Other
A computer cluster is a group of linked computers, working
together closely so that in many respects they form a single
computer. The components of a cluster are commonly, but not
always, connected to each other through fast local area networks.
Clusters are usually deployed to improve performance and/or
availability over that provided by a single computer, while typically
being much more cost-effective than single computers of comparable
speed or availability.
Types of clusters:
High-availability (HA) clusters (also known as failover clusters) are implemented for the purpose of improving the availability of the services which the cluster provides. They are used for mission-critical applications.
Network load-balancing clusters operate by distributing a workload evenly over multiple back-end nodes. Typically the cluster will be configured with multiple redundant load-balancing front ends.
Science clusters (Linux), for example Beowulf clusters.
Master (also called the service node or front node): used to interact with users and manage the cluster.
Nodes: a group of computing nodes, typically with no keyboard, mouse, floppy, or video of their own.
Communication between nodes takes place over an interconnect network (Ethernet, Myrinet, ...).
In order for the master and the node computers to communicate, some sort of message-passing control structure is required. MPI (Message Passing Interface) is the most commonly used.
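As an illustration, a minimal sketch of this master/node message passing in C with MPI (rank 0 standing in for the master, all other ranks for compute nodes; this example is not from the original slides):

/* Minimal MPI sketch: rank 0 is the master, other ranks are nodes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id            */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes    */

    if (rank == 0) {
        /* Master: collect one message from every node */
        for (int src = 1; src < size; src++) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("master received %d from node %d\n", value, src);
        }
    } else {
        /* Node: send a result (here just its rank squared) to the master */
        int value = rank * rank;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}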
To construct a Beowulf cluster there are four distinct but interrelated areas of consideration:
Hardware system structure
Resource administration and management environment
Distributed programming libraries and tools
Parallel algorithms
Brief Technical Parameters:
• Maximum bandwidth
• Minimum latency
Parallel Environment
Two of the most commonly used Parallel Interface Libraries:
o PVM (Parallel Virtual Machine)
o MPI (Message Passing Interface)
Parallel interface libraries provide communication routines that support message passing. Users can call these libraries directly from their Fortran and C programs.
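For example, with a typical MPI installation such as MPICH or LAM/MPI, a C program like the sketch above is usually compiled and launched along these lines (exact command names can vary between MPI distributions):

mpicc hello.c -o hello     # compile and link against the MPI library
mpiexec -n 8 ./hello       # start 8 processes across the cluster nodes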
What is Bioinformatics?
Also called "biomedical computing", bioinformatics is the application of computer science and technology to problems in the biomolecular sciences.
Cluster Uses:
The Beowulf cluster computing design has been used by parallel processing computer systems projects to build powerful computers that can assist in bioinformatics research and data analysis.
In bioinformatics, clusters are used to run DNA string matching algorithms or protein folding applications.
They also run an algorithm known as BLAST (Basic Local Alignment Search Tool) to analyze massive sets of DNA sequences.
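As a toy sketch (not BLAST itself, and not from the original slides), the example below shows how a DNA string-matching job could be spread over the cluster with MPI: each rank scans its share of the sequences for a query pattern, and the per-rank match counts are summed on the master. The sequence data is made up for illustration.

/* Toy DNA string matching distributed with MPI (illustrative only). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    /* Hypothetical in-memory data set; a real run would read sequence files. */
    const char *sequences[] = {
        "ACGTACGTGACCT", "TTGACGGACGTAA", "CCCGTACGTACGT", "GGGTTTAAACCCG"
    };
    const int nseq = sizeof(sequences) / sizeof(sequences[0]);
    const char *query = "ACGT";

    int rank, size, local_hits = 0, total_hits = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Round-robin assignment of sequences to ranks */
    for (int i = rank; i < nseq; i += size)
        if (strstr(sequences[i], query) != NULL)
            local_hits++;

    /* Combine the per-rank counts on the master */
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d of %d sequences contain %s\n", total_hits, nseq, query);

    MPI_Finalize();
    return 0;
}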
For bioinformatics, MPICH2 is often used: an implementation of MPI that was specifically designed for use with cluster computing systems and parallel processing. It is an open-source set of libraries for various high-level programming languages that gives programmers tools to easily control how large problems are broken apart and distributed to the various computers in a cluster.
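A minimal sketch of that break-apart-and-distribute pattern, assuming a simple sum over a large array: MPI_Scatter hands each process an equal slice, and MPI_Reduce combines the partial results on the master. (For brevity the array length is assumed to divide evenly by the number of processes.)

/* Sketch of splitting a large problem across a cluster with MPI. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000  /* total problem size (hypothetical) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;
    double *data = NULL;
    double *slice = malloc(chunk * sizeof(double));

    if (rank == 0) {                 /* master prepares the full problem */
        data = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++)
            data[i] = 1.0;           /* placeholder values */
    }

    /* Each process receives its own slice of the array */
    MPI_Scatter(data, chunk, MPI_DOUBLE, slice, chunk, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    double local_sum = 0.0, total = 0.0;
    for (int i = 0; i < chunk; i++)
        local_sum += slice[i];

    /* Partial results are combined back on the master */
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total = %f\n", total);

    free(slice);
    free(data);
    MPI_Finalize();
    return 0;
}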
Protein folding: how is folding linked to disease?
Proteins are biology's workhorses, its "nanomachines." Before proteins can carry out their important functions, they assemble themselves, or "fold." The process of protein folding, while critical and fundamental to virtually all of biology, in many ways remains a mystery.
When proteins do not fold correctly, diseases such as Alzheimer's and Mad Cow disease can result.
How?
Folding@home is a distributed computing project: people from throughout the world download and run software to band together to make one of the largest supercomputers in the world. On each computer, Folding@home uses novel computational methods coupled with distributed computing to simulate these problems.
The results get back to the main server because your computer automatically uploads its results each time it finishes a work unit, and downloads a new job at that time.
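Purely as an illustration of that work-unit cycle, the loop below sketches what such a client does. The functions download_work_unit, run_simulation, and upload_result are hypothetical stand-ins, not Folding@home's actual client code.

/* Hypothetical sketch of a work-unit loop in a Folding@home-style client. */
#include <stdio.h>

typedef struct { int id; } work_unit;            /* stand-in for real job data   */
typedef struct { int id; double score; } result; /* stand-in for real output     */

/* Stubs standing in for the real client's networking and simulation code */
static work_unit download_work_unit(int n) { work_unit wu = { n }; return wu; }
static result run_simulation(work_unit wu) { result r = { wu.id, 0.0 }; return r; }
static void upload_result(result r) { printf("uploaded work unit %d\n", r.id); }

int main(void)
{
    /* The real client loops indefinitely; three iterations stand in here. */
    for (int n = 0; n < 3; n++) {
        work_unit wu = download_work_unit(n); /* get a new job              */
        result r = run_simulation(wu);        /* compute it locally         */
        upload_result(r);                     /* return the finished result */
    }
    return 0;
}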
Brief Architectural information:
Academically:
A 1000-node Beowulf cluster system used for genetic algorithm research by John Koza, Stanford University.
http://www.pssclabs.com/products_powerwulf.asp
Which parallel environments are used in building clusters?
Two of the most commonly used parallel interface libraries:
PVM (Parallel Virtual Machine)
MPI (Message Passing Interface)
References:
http://www.wikipedia.org/
http://folding.stanford.edu/
http://www.pssclabs.com/
http://www.genetic-programming.com
http://www.lam-mpi.org/mpi/mpi_top10.php