
Parallel and Cluster Computing


PARALLEL & CLUSTER COMPUTING


CS 6260
PROFESSOR: ELISE DE DONCKER
BY: LINA HUSSEIN
• Introduction
• What is cluster computing?
• Classification of Cluster Computing Technologies
• Beowulf cluster
• Construction of Beowulf Cluster
• The use of cluster computing in Bioinformatics & Parallel Computing
• Folding@Home Project
• High performance clusters (HPC)
  (Image: a 256-processor Sun cluster)
• Build Your Own Cluster!
• Mainly in parallel: split the problem into smaller tasks that are executed concurrently
• Why?
  • Absolute physical limits of hardware components
  • Economical reasons – more complex = more expensive
  • Performance limits – doubling the frequency does not double the performance
  • Large applications – demand too much memory & time
• Advantages:
  • Increasing speed & optimizing resource utilization
• Disadvantages:
  • Complex programming models – difficult development
• Several applications of parallel processing:
  • Science Computation
  • Digital Biology
  • Aerospace
  • Resources Exploration
• Architectures of Parallel Computers:
  • PVP (Parallel Vector Processor)
  • SMP (Symmetric Multiprocessor)
  • MPP (Massively Parallel Processor)
  • COW (Cluster of Workstations)
  • DSM (Distributed Shared Memory)
• Towards Inexpensive Supercomputing: Cluster Computing is the Commodity Supercomputing.
  (Chart: Architectures and Technology Trend of Supercomputers: Cluster 58.8%, MPP 20.0%, Other)
• A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.
• A cluster consists of:
  • Nodes (master + computing)
  • Network
  • Cluster middleware
  • OS
• Cluster middleware: middleware such as MPI, which permits compute clustering programs to be portable to a wide variety of clusters (see the sketch after this slide).
  (Diagram: CPUs connected by a high-speed local network)
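As an illustration of how such middleware looks from the programmer's side, here is a minimal sketch in C, assuming an MPI implementation such as MPICH or Open MPI is installed on the cluster; the same source runs unchanged on any cluster that provides MPI:

/* Minimal MPI program: each process reports its rank and the total size.
 * Assumes an MPI implementation (e.g., MPICH or Open MPI) is available. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                          /* shut down the MPI runtime */
    return 0;
}

Such a program would typically be compiled with an MPI wrapper compiler (e.g., mpicc) and launched across the cluster's nodes with mpirun or mpiexec.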
Cluster
• High availability clusters (HA) (Linux)
  • Mission-critical applications.
  • High-availability clusters (also known as Failover Clusters) are implemented for the purpose of improving the availability of services which the cluster provides.
  • Provide redundancy; eliminate single points of failure.
• Network load balancing clusters
  • Operate by distributing a workload evenly over multiple back-end nodes.
  • Typically the cluster will be configured with multiple redundant load-balancing front ends.
  • All available servers process requests.
  • Examples: web servers, mail servers, ...
• Science clusters
  • Beowulf
• A Beowulf Cluster is a computer design that uses parallel processing across multiple computers to create cheap and powerful supercomputers. A Beowulf Cluster in practice is usually a collection of generic computers, either stock systems or wholesale parts purchased independently and assembled, connected through an internal network.
• A cluster has two types of computers: a master computer and node computers. When a large problem or set of data is given to a Beowulf cluster, the master computer first runs a program that breaks the problem into small discrete pieces; it then sends a piece to each node to compute. As nodes finish their tasks, the master computer continually sends more pieces to them until the entire problem has been computed.
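A rough sketch of this master/worker scheme in C with MPI is shown below. The task representation (a single double), the do_work function, NTASKS and the tag values are illustrative assumptions, not details taken from the slides.

/* Sketch of the master/worker pattern described above, using MPI.
 * Tasks and results are plain doubles purely for illustration. */
#include <mpi.h>
#include <stdio.h>

#define NTASKS   100       /* illustrative number of pieces */
#define TAG_WORK 1
#define TAG_STOP 2

static double do_work(double task) { return task * task; }   /* stand-in computation */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                         /* master: hand out pieces, collect results */
        int sent = 0, received = 0;
        double task = 0, result;
        MPI_Status st;
        /* seed every worker with one piece */
        for (int w = 1; w < size && sent < NTASKS; w++, sent++) {
            task = (double)sent;
            MPI_Send(&task, 1, MPI_DOUBLE, w, TAG_WORK, MPI_COMM_WORLD);
        }
        /* if there are more workers than tasks, stop the extras immediately */
        for (int w = sent + 1; w < size; w++)
            MPI_Send(&task, 1, MPI_DOUBLE, w, TAG_STOP, MPI_COMM_WORLD);
        /* as workers finish, keep sending pieces until the whole problem is done */
        while (received < sent) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            received++;
            if (sent < NTASKS) {
                task = (double)sent++;
                MPI_Send(&task, 1, MPI_DOUBLE, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
            } else {
                MPI_Send(&task, 1, MPI_DOUBLE, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
            }
        }
        printf("master: %d pieces computed\n", received);
    } else {                                 /* worker: compute pieces until told to stop */
        double task, result;
        MPI_Status st;
        while (1) {
            MPI_Recv(&task, 1, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            result = do_work(task);
            MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}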
• Master (also called service node or front node): used to interact with users and manage the cluster.
• Nodes: a group of computers (computing nodes), typically without keyboard, mouse, floppy, or video.
• Communication between nodes runs over an interconnect network platform (Ethernet, Myrinet, ...).
• In order for the master and node computers to communicate, some sort of message passing control structure is required. MPI (Message Passing Interface) is the most commonly used such control.
• To construct a Beowulf cluster there are four distinct but interrelated areas of consideration:
  • Hardware system structure
  • Resource administration and management environment
  • Distributed programming libraries and tools
  • Parallel algorithms
Brief Technical Parameters:
• OS: CentOS 5, managed by Rocks (cluster distribution)
• Service node: 1 (Intel P4 2.4 GHz)
• Computing nodes: 32 (Intel P4 2.4-2.8 GHz)
• System memory: 1 GB per node
• Network platforms: Gigabit Ethernet (2 cards per node), Myrinet 2G
• Languages: C, C++, Fortran, Java
• Compilers: GNU gcc, Intel compiler, Sun Java compiler
• Parallel environment: MPICH
• Tools: Ganglia (monitoring), PBS/Torque (scheduler)
OS (Operating System)
• Three of the most commonly used OSs, all including kernel-level support for parallel programming:
• Windows NT/2000
  Mainly used to build a High Availability cluster or an NLB (Network Load Balancing) cluster, providing services such as database, file/print, web, and streaming media. Supports 2-4 SMP or 32 processors. Hardly used to build a science computing cluster.
• Redhat Linux
  The most used OS for a Beowulf cluster.
  Provides high performance and scalability / high reliability / low cost (obtained freely and uses inexpensive commodity hardware).
• SUN Solaris
  Uses expensive and unpopular hardware.
Network Platform
• Some design considerations for the interconnect network are:
  • Fast Ethernet (100 Mbps): low cost / min latency: 80 µs
  • Gigabit Ethernet (1 Gbps): expensive / min latency: 300 µs
  • Myrinet (high-speed local area networking system) (2 Gbps): the best network platform
  • Network structure: bus / switched
  • Maximum bandwidth
  • Minimum latency
Parallel Environment
• Two of the most commonly used parallel interface libraries:
  • PVM (Parallel Virtual Machine)
  • MPI (Message Passing Interface)
• Parallel interface libraries provide a group of communication interface libraries that support message passing. Users can call these libraries directly in their Fortran and C programs.
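As a small illustration of calling MPI directly from a C program (not taken from the slides), the sketch below has every process sum part of the range 1..N and then combine the partial sums with MPI_Reduce; N is an arbitrary value chosen for illustration.

/* Illustrative sketch: each process sums part of 1..N, then MPI_Reduce
 * combines the partial sums on process 0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    const long N = 1000000;                     /* illustrative problem size */
    long local = 0, total = 0;
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    for (long i = rank + 1; i <= N; i += size)  /* each rank takes every size-th term */
        local += i;
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum 1..%ld = %ld\n", N, total);
    MPI_Finalize();
    return 0;
}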

Cluster Computer Architecture
(Figure: cluster computer architecture)
• HPC platform for scientific applications
• Storage and processing of large data
• Satellite image processing
• Information retrieval, data mining
• Computing systems in an academic environment
• Geologists also use clusters to emulate and predict earthquakes and to model the interior of the Earth and the sea floor
• Clusters are even used to render and manipulate high-resolution graphics in engineering
• What is Bioinformatics?
  • Also called "biomedical computing": the application of computer science and technology to problems in the biomolecular sciences.
• Cluster uses:
  • The Beowulf cluster computing design has been used by parallel processing computer system projects to build a powerful computer that can assist in bioinformatics research and data analysis.
  • In bioinformatics, clusters are used to run DNA string matching algorithms or to run protein folding applications. They also use a computer algorithm known as BLAST (Basic Local Alignment Search Tool) to analyze massive sets of DNA sequences for bioinformatics research.
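As a toy sketch only, the example below splits a DNA string across MPI processes and counts exact occurrences of a short pattern; the sequence and pattern are made up, and this naive matching merely stands in for the far more sophisticated algorithms (such as BLAST) mentioned above.

/* Toy sketch: count exact occurrences of a short pattern in a DNA string
 * by splitting the starting positions across MPI processes. The data is
 * invented; real work would use tools such as BLAST. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    const char *seq = "ACGTACGTGACGGATCCACGTACGTGGATCCACGT";   /* illustrative data */
    const char *pat = "GATCC";
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = (int)strlen(seq), m = (int)strlen(pat);
    int chunk = (n + size - 1) / size;          /* starting positions per process */
    int start = rank * chunk;
    int end   = start + chunk;
    if (end > n - m + 1) end = n - m + 1;       /* last valid starting position */

    int local = 0, total = 0;
    for (int i = start; i < end; i++)           /* naive exact matching on this slice */
        if (strncmp(seq + i, pat, m) == 0)
            local++;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pattern %s occurs %d times\n", pat, total);
    MPI_Finalize();
    return 0;
}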
• For bioinformatics, MPICH2 is used, which is an implementation of MPI that was specifically designed for use with cluster computing systems and parallel processing. It is an open source set of libraries for various high-level programming languages that gives programmers tools to easily control how large problems are broken apart and distributed to the various computers in a cluster.
• Protein folding, and how is folding linked to disease?
  • Proteins are biology's workhorses -- its "nanomachines." Before proteins can carry out these important functions, they assemble themselves, or "fold." The process of protein folding, while critical and fundamental to virtually all of biology, in many ways remains a mystery.
  • When proteins do not fold correctly: Alzheimer's, Mad Cow disease.
• How?
  • Folding@home is a distributed computing project -- people from throughout the world download and run software to band together to make one of the largest supercomputers in the world. On each computer, Folding@home uses novel computational methods coupled to distributed computing to simulate problems.
  • The results get back to the main server: your computer will automatically upload the results to the server each time it finishes a work unit, and download a new job at that time.
Brief Architectural Information:
• Processor: AMD Opteron 2218, dual core, dual socket
• No. of master nodes: 1
• No. of computing nodes: 64
• Cluster software: ROCKS version 4.3
• Total peak performance: 1.3 TF
• Performance management: in network performance management, a set of functions that evaluate and report the behavior of:
  • telecommunications equipment
  • effectiveness of the network or network element
  • other subfunctions, such as gathering statistical information, maintaining and examining historical logs, determining system performance under natural and artificial conditions, and altering system modes of operation
• Calculation procedure for peak performance:
  • No. of nodes: 64
  • Memory (RAM): 4 GB
  • Hard disk capacity per node: 250 GB
  • Storage capacity: 4 TB
  • No. of processors and cores: 2 x 2 = 4 (dual core, dual socket)
  • CPU speed: 2.6 GHz
  • Floating point operations per clock cycle for the AMD processor: 2
  • Total peak performance = no. of nodes x no. of processors and cores x CPU speed x floating point operations per cycle = 64 x 4 x 2.6 GHz x 2 = 1.33 TF
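The arithmetic above can be checked with a few lines of C; the constants are simply the values listed on this slide.

/* Quick check of the peak-performance figure above (values from the slide). */
#include <stdio.h>

int main(void) {
    double nodes = 64, cores_per_node = 4;      /* dual core, dual socket */
    double ghz = 2.6, flops_per_cycle = 2;      /* per the slide's AMD figure */
    double peak = nodes * cores_per_node * ghz * 1e9 * flops_per_cycle;
    printf("theoretical peak: %.2f TFLOPS\n", peak / 1e12);   /* ~1.33 TFLOPS */
    return 0;
}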
• Scheduler used: Sun Grid Engine (job scheduler software tool)
• Application software and compilers:
  • Open MPI, LAM/MPI
  • C, C++, FORTRAN compilers (both GNU and Intel)
  • Bio roll: for bio-chemical applications
Academically:
• 1000-node Beowulf Cluster System
• Used for genetic algorithm research by John Koza, Stanford University
• http://www.pssclabs.com/products_powerwulf.asp
• Which parallel environments are used in building clusters?
• Two of the most commonly used parallel interface libraries:
  • PVM (Parallel Virtual Machine)
  • MPI (Message Passing Interface)
• Why MPI over PVM?
1. MPI has more than one freely available, quality implementation (LAM, MPICH and CHIMP).
2. MPI defines a 3rd party profiling mechanism.
3. MPI has full asynchronous communication (see the sketch after this list).
4. MPI groups are solid, efficient, and deterministic.
5. MPI efficiently manages message buffers.
6. MPI synchronization protects 3rd party software.
7. MPI can efficiently program MPP and clusters.
8. MPI is totally portable.
9. MPI is formally specified.
10. MPI is a standard and can be implemented with Linux, NT, and on many supercomputers.
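As an illustration of item 3 (asynchronous communication), the small sketch below has ranks 0 and 1 exchange an integer with non-blocking MPI_Isend/MPI_Irecv, leaving room to overlap computation before waiting for completion; the values exchanged are arbitrary.

/* Non-blocking (asynchronous) MPI communication: ranks 0 and 1 exchange
 * an integer and may do other work before waiting for completion. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, sendval, recvval;
    MPI_Request reqs[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2 && rank < 2) {
        int partner = 1 - rank;                 /* rank 0 talks to 1 and vice versa */
        sendval = rank * 100;                   /* arbitrary payload */
        MPI_Irecv(&recvval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);
        /* ... useful computation could overlap here while messages are in flight ... */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d received %d from rank %d\n", rank, recvval, partner);
    }
    MPI_Finalize();
    return 0;
}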
• WMU e-books library:
  • Beowulf Cluster Computing with Windows, Thomas Sterling, ISBN 9780262692755.
  • Construction of a Beowulf Cluster System for Parallel Computing, Kun Feng, Jiaqi Dong, Jinhua Zhang.
• http://cs.wmich.edu/
• http://www.wikipedia.org/
• http://folding.stanford.edu/
• http://www.pssclabs.com/
• http://www.genetic-programming.com
• http://www.lam-mpi.org/mpi/mpi_top10.php
