
Parallel and Distributed Computing


Parallel and Distributed Computing

Week 01
Introduction
23/09/2024
Introduction

What Is Parallel Computing?


Serial Computing

Traditionally, software has been written for serial computation:


• To be run on a single computer having a single Central Processing Unit (CPU)
• A problem is broken into a discrete series of instructions
• Instructions are executed one after another
• Only one instruction may execute at any moment in time

Parallel and Distributed Computing


Introduction

What Is Parallel Computing?


Serial Computing

e.g.

Parallel and Distributed Computing


Introduction

What Is Parallel Computing?

In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem:
• To be run using multiple CPUs
• A problem is broken into discrete parts that can be solved concurrently
• Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on different CPUs (see the
sketch below)
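As a minimal sketch (Python standard library only; the function name partial_sum is
made up for illustration), the same summation is computed serially and then broken
into discrete parts that run concurrently on multiple CPUs:

    from multiprocessing import Pool

    def partial_sum(bounds):
        # Each part is itself a series of instructions run on one CPU/core.
        lo, hi = bounds
        return sum(range(lo, hi))

    if __name__ == "__main__":
        n = 10_000_000

        # Serial: one instruction stream, one instruction at a time.
        serial_result = sum(range(n))

        # Parallel: the problem is broken into 4 discrete parts solved concurrently.
        chunks = [(i * n // 4, (i + 1) * n // 4) for i in range(4)]
        with Pool(processes=4) as pool:
            parallel_result = sum(pool.map(partial_sum, chunks))

        assert serial_result == parallel_result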

Parallel and Distributed Computing


Introduction

Parallel Computing

For example:

Parallel and Distributed Computing


Introduction

Parallel Computing

The computational problem should be able to:


• Be broken apart into discrete pieces of work that can be solved
simultaneously;
• Execute multiple program instructions at any moment in time;
• Be solved in less time with multiple compute resources than with a single
compute resource.

The compute resources are typically:


• A single computer with multiple processors/cores
• An arbitrary number of such computers connected by a network

Parallel and Distributed Computing


Introduction

Parallel vs Distributed Computing

Parallel Computing
When different processors/computers work on a single common goal.
e.g. ten men pulling one rope to lift one rock; supercomputers implement parallel
computing.

Distributed Computing
When several different computers work separately on a multi-faceted computing
workload.
e.g. ten men pulling ten ropes to lift ten different rocks; employees in an office
each doing their own work.

Parallel and Distributed Computing


Introduction

Parallel Computers

Virtually all stand-alone computers today are parallel from a hardware
perspective:
• Multiple functional units (L1 cache, L2 cache, branch, prefetch, decode,
floating-point, graphics processing (GPU), integer, etc.)
• Multiple execution units/cores
• Multiple hardware threads

Compute Chip with 18 cores (PU) and 16 L2 Cache units (L2)


Parallel and Distributed Computing
Introduction

Why Use Parallel Computing?


The Real World Is Massively Complex

• In the natural world, many complex, interrelated events are happening at the
same time, yet within a temporal sequence.
• Compared to serial computing, parallel computing is much better suited for
modeling, simulating and understanding complex, real world phenomena.

Parallel and Distributed Computing


Introduction

Why Use Parallel Computing?


The Real World Is Massively Complex
• For example, imagine modeling these serially:

Parallel and Distributed Computing


Introduction

Main Reasons for Using Parallel Programming


SAVE TIME AND/OR MONEY

• In theory, throwing more resources at a task will shorten its time to completion,
with potential cost savings.
• Parallel computers can be built from cheap, commodity components.

Parallel and Distributed Computing


Introduction

Main Reasons for Using Parallel Programming


SOLVE LARGER / MORE COMPLEX PROBLEMS
• Many problems are so large and/or complex that it is impractical or impossible
to solve them using a serial program, especially given limited computer
memory.
• Example: "Grand Challenge Problems"
(en.wikipedia.org/wiki/Grand_Challenge) requiring petaflops and petabytes of
computing resources.
• Example: Web search engines/databases processing millions of transactions
every second

Parallel and Distributed Computing


Introduction

Main Reasons for Using Parallel Programming


PROVIDE CONCURRENCY
• A single compute resource can only do one thing at a time. Multiple
compute resources can do many things simultaneously.
• Example: Collaborative Networks provide a global venue where people from
around the world can meet and conduct work "virtually."

Parallel and Distributed Computing


Introduction

TAKE ADVANTAGE OF NON-LOCAL RESOURCES

• Using compute resources on a wide area network, or even the Internet when
local compute resources are scarce or insufficient.
• Example: SETI@home (setiathome.berkeley.edu) has over 1.7 million users in
nearly every country in the world (May, 2018).

Parallel and Distributed Computing


Introduction

MAKE BETTER USE OF UNDERLYING PARALLEL HARDWARE

• Modern computers, even laptops, are parallel in architecture with multiple
processors/cores.
• Parallel software is specifically intended for parallel hardware with multiple
cores, threads, etc.
• In most cases, serial programs run on modern computers "waste" potential
computing power.

Parallel and Distributed Computing


Introduction

THE FUTURE

• During the past 20+ years, the trends indicated by ever faster networks,
distributed systems, and multi-processor computer architectures (even at the
desktop level) clearly show that parallelism is the future of computing.
• In this same time period, there has been a greater than 500,000x increase in
supercomputer performance, with no end currently in sight.
• The race is already on for Exascale Computing - we are entering the Exascale
era.
• Exaflop = 10^18 calculations per second
• US DOE Exascale Computing Project: https://www.exascaleproject.org

Parallel and Distributed Computing


Introduction

Main Reasons for Using Parallel Programming


THE FUTURE

Parallel and Distributed Computing


Introduction

Who Is Using Parallel Computing?


Science and Engineering

Historically, parallel computing has been considered to be "the high end of
computing," and has been used to model difficult problems in many areas of
science and engineering:
• Atmosphere, Earth, Environment
• Physics - applied, nuclear, particle, condensed matter, high pressure, fusion,
photonics
• Bioscience, Biotechnology, Genetics
• Chemistry, Molecular Sciences
• Geology, Seismology
• Mechanical Engineering - from prosthetics to spacecraft
• Electrical Engineering, Circuit Design, Microelectronics
• Computer Science, Mathematics
• Defense, Weapons

Parallel and Distributed Computing


Introduction

Who Is Using Parallel Computing?


Science and Engineering

Parallel and Distributed Computing


Introduction

Industrial and Commercial

Today, commercial applications provide an equal or greater driving force in the
development of faster computers. These applications require the processing of
large amounts of data in sophisticated ways. For example:
• "Big Data," databases, data mining
• Artificial Intelligence (AI)
• Oil exploration
• Web search engines, web based business services
• Medical imaging and diagnosis
• Pharmaceutical design
• Financial and economic modeling
• Management of national and multi-national corporations
• Advanced graphics and virtual reality, particularly in the entertainment industry
• Networked video and multi-media technologies
• Collaborative work environments

Parallel and Distributed Computing


Introduction

Industrial and Commercial

Parallel and Distributed Computing


Introduction

Global Applications
Parallel computing is now being used extensively around the world, in a wide
variety of applications.

Parallel and Distributed Computing


Introduction

Concepts and Terminology


von Neumann Computer Architecture

• Named after the Hungarian mathematician John von Neumann who first
authored the general requirements for an electronic computer in his 1945
papers.
• Also known as "stored-program computer" - both program instructions and
data are kept in electronic memory. Differs from earlier computers which were
programmed through "hard wiring".
• Since then, virtually all computers
have followed this basic design:

Parallel and Distributed Computing


Introduction

Concepts and Terminology


von Neumann Computer Architecture

Comprised of four main components:


• Memory
• Control Unit
• Arithmetic Logic Unit
• Input/Output

Parallel and Distributed Computing


Introduction

Concepts and Terminology


von Neumann Computer Architecture

• Read/write, random access memory is used to store both program instructions
and data
• Program instructions are coded data which tell the computer to do something
• Data is simply information to be used by the program
• Control unit fetches instructions/data from memory, decodes the instructions
and then sequentially coordinates operations to accomplish the programmed
task.
• Arithmetic Logic Unit performs basic arithmetic operations
• Input/Output is the interface to the human operator

Parallel computers still follow this basic design, just multiplied in units. The basic,
fundamental architecture remains the same. More info on his other remarkable
accomplishments: http://en.wikipedia.org/wiki/John_von_Neumann
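A toy illustration of the stored-program idea (a Python sketch, not any real
machine's instruction set): program instructions and data sit in the same memory,
and a control loop fetches, decodes, and executes them one after another.

    # Instructions and data share one memory; addresses 6-8 hold data.
    memory = [
        ("LOAD", 6),    # put memory[6] into the accumulator
        ("ADD", 7),     # add memory[7] to the accumulator
        ("STORE", 8),   # write the accumulator back to memory[8]
        ("PRINT", 8),   # output memory[8]
        ("HALT", None),
        None,           # unused
        40,             # data
        2,              # data
        0,              # result goes here
    ]

    pc, acc = 0, 0                  # program counter and accumulator
    while True:
        op, addr = memory[pc]       # control unit: fetch and decode
        pc += 1
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]     # arithmetic logic unit
        elif op == "STORE":
            memory[addr] = acc
        elif op == "PRINT":
            print(memory[addr])     # input/output (prints 42)
        elif op == "HALT":
            break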

Parallel and Distributed Computing


Introduction

Flynn’s Classical Taxonomy


• There are a number of different ways to classify parallel computers. Examples
are available in the references.
• One of the more widely used classifications, in use since 1966, is called
Flynn's Taxonomy.
• Flynn's taxonomy distinguishes multi-processor computer architectures
according to how they can be classified along the two independent dimensions
of Instruction Stream and Data Stream. Each of these dimensions can have
only one of two possible states: Single or Multiple.
• The matrix below defines the 4 possible classifications according to Flynn:

                                  Single Data (SD)    Multiple Data (MD)
    Single Instruction (SI)            SISD                 SIMD
    Multiple Instruction (MI)          MISD                 MIMD

Parallel and Distributed Computing


Introduction

Single Instruction, Single Data (SISD)


• A serial (non-parallel) computer
• Single Instruction: Only one instruction stream is being acted on by the CPU
during any one clock cycle
• Single Data: Only one data stream is being used as input during any one
clock cycle
• Deterministic execution
• This is the oldest type of computer
• Examples: older generation mainframes, minicomputers, workstations and
single processor/core PCs.

Parallel and Distributed Computing



Introduction

Single Instruction, Multiple Data (SIMD)


• A type of parallel computer
• Single Instruction: All processing units execute the same instruction at any
given clock cycle
• Multiple Data: Each processing unit can operate on a different data element
• Best suited for specialized problems characterized by a high degree of
regularity, such as graphics/image processing.
• Synchronous (lockstep) and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Examples:
• Processor Arrays: Thinking Machines CM-2, MasPar MP-1 & MP-2, ILLIAC IV
• Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2,
Hitachi S820, ETA10
• Most modern computers, particularly those with graphics processing units
(GPUs), employ SIMD instructions and execution units (see the sketch below).
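A rough software-level analogy (assuming NumPy is installed; whether the hardware
actually issues SIMD instructions depends on the CPU and the NumPy build): a single
operation is applied across a whole array of data elements at once.

    import numpy as np

    a = np.arange(8, dtype=np.float32)
    b = a * 2.0 + 1.0      # one conceptual "instruction" applied to every element
    print(b)               # [ 1.  3.  5.  7.  9. 11. 13. 15.]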

Parallel and Distributed Computing


Introduction

Single Instruction, Multiple Data (SIMD)

Parallel and Distributed Computing



Introduction

Multiple Instruction, Single Data (MISD)


• A type of parallel computer
• Multiple Instruction: Each processing unit operates on the data
independently via separate instruction streams.
• Single Data: A single data stream is fed into multiple processing units.
• Few (if any) actual examples of this class of parallel computer have ever
existed.
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded message

Parallel and Distributed Computing


Introduction

Multiple Instruction, Multiple Data (MIMD)


• A type of parallel computer
• Multiple Instruction: Every processor may be executing a different instruction
stream
• Multiple Data: Every processor may be working with a different data stream
• Execution can be synchronous or asynchronous, deterministic or non-
deterministic
• Currently, the most common type of parallel computer - most modern
supercomputers fall into this category.
• Examples: most current supercomputers, networked parallel computer clusters
and "grids", multi-processor SMP computers, multi-core PCs.
• Note: many MIMD architectures also include SIMD execution sub-components
(see the sketch below)
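A minimal MIMD-style sketch (Python standard library; the worker names are made up):
two processes execute different instruction streams on different data at the same
time.

    from multiprocessing import Process, Queue

    def count_words(text, results):
        # One instruction stream operating on one data stream...
        results.put(("words", len(text.split())))

    def sum_numbers(numbers, results):
        # ...and a completely different instruction stream on different data.
        results.put(("sum", sum(numbers)))

    if __name__ == "__main__":
        results = Queue()
        procs = [
            Process(target=count_words, args=("the quick brown fox", results)),
            Process(target=sum_numbers, args=([1, 2, 3, 4], results)),
        ]
        for p in procs:
            p.start()
        for _ in procs:
            print(results.get())   # ('words', 4) and ('sum', 10), in either order
        for p in procs:
            p.join()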

Parallel and Distributed Computing


Introduction

Multiple Instruction, Multiple Data (MIMD)

Parallel and Distributed Computing


Introduction

Multiple Instruction, Multiple Data (MIMD)

Parallel and Distributed Computing


Introduction

General Parallel Computing Terminology


Like everything else, parallel computing has its own jargon. Some of the more
commonly used terms associated with parallel computing are listed below. Most of
these will be discussed in more detail later.

CPU
Contemporary CPUs consist of one or more cores - a core is a distinct execution
unit with its own instruction stream. Cores within a CPU may be organized into one
or more sockets, each socket with its own distinct memory. When a CPU consists of
two or more sockets, the hardware infrastructure usually supports memory sharing
across sockets.

Node
A standalone "computer in a box." Usually comprised of multiple
CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked
together to comprise a supercomputer.

Parallel and Distributed Computing


Introduction

General Parallel Computing Terminology

Parallel and Distributed Computing


Introduction

General Parallel Computing Terminology

Task
A logically discrete section of computational work. A task is typically a program or
program-like set of instructions that is executed by a processor. A parallel
program consists of multiple tasks running on multiple processors.

Pipelining
Breaking a task into steps performed by different processor units, with inputs
streaming through, much like an assembly line; a type of parallel computing.
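A two-stage sketch of the idea (Python standard library; the stage names are made
up): the first stage streams results to the second through a queue, so both stages
can be busy at once, like stations on an assembly line.

    from multiprocessing import Process, Queue

    def produce(q):
        # Stage 1: generate (or read/parse) items and pass them downstream.
        for i in range(5):
            q.put(i * i)
        q.put(None)                    # sentinel: no more work

    def consume(q):
        # Stage 2: process items as they arrive, overlapping with stage 1.
        while (item := q.get()) is not None:
            print("stage 2 got", item)

    if __name__ == "__main__":
        q = Queue()
        stages = [Process(target=produce, args=(q,)),
                  Process(target=consume, args=(q,))]
        for p in stages:
            p.start()
        for p in stages:
            p.join()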

Shared Memory
Describes a computer architecture where all processors have direct access to
common physical memory. In a programming sense, it describes a model where
parallel tasks all have the same "picture" of memory and can directly address and
access the same logical memory locations regardless of where the physical
memory actually exists.
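In the programming-model sense, a minimal sketch (Python standard library): several
tasks directly update the same logical memory location, using a lock to keep the
updates consistent.

    from multiprocessing import Process, Value, Lock

    def add_many(counter, lock, n):
        for _ in range(n):
            with lock:                 # coordinate access to the shared location
                counter.value += 1

    if __name__ == "__main__":
        counter = Value("i", 0)        # one memory location visible to every task
        lock = Lock()
        tasks = [Process(target=add_many, args=(counter, lock, 1000))
                 for _ in range(4)]
        for t in tasks:
            t.start()
        for t in tasks:
            t.join()
        print(counter.value)           # 4000: all tasks updated the same memory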

Parallel and Distributed Computing


Introduction

General Parallel Computing Terminology

Symmetric Multi-Processor (SMP)


Shared memory hardware architecture where multiple processors share a single
address space and have equal access to all resources - memory, disk, etc.

Distributed Memory
In hardware, refers to network based memory access for physical memory that is
not common. As a programming model, tasks can only logically "see" local
machine memory and must use communications to access memory on other
machines where other tasks are executing.

Communications
Parallel tasks typically need to exchange data. There are several ways this can be
accomplished, such as through a shared memory bus or over a network.
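A minimal message-passing sketch (Python standard library; in practice this role is
usually played by a library such as MPI): the worker task cannot "see" the parent's
memory, so data moves only through explicit communication.

    from multiprocessing import Process, Pipe

    def worker(conn):
        data = conn.recv()             # data arrives as a message, not via shared memory
        conn.send(sum(data))           # the result must be sent back the same way
        conn.close()

    if __name__ == "__main__":
        parent_end, child_end = Pipe()
        p = Process(target=worker, args=(child_end,))
        p.start()
        parent_end.send([1, 2, 3, 4, 5])
        print(parent_end.recv())       # 15
        p.join()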

Parallel and Distributed Computing


Introduction

General Parallel Computing Terminology

Synchronization
The coordination of parallel tasks in real time, very often associated with
communications.
Synchronization usually involves waiting by at least one task, and can therefore
cause a parallel application's wall clock execution time to increase.

Computational Granularity
In parallel computing, granularity is a quantitative or qualitative measure of the
ratio of computation to communication.
Coarse: relatively large amounts of computational work are done between
communication events
Fine: relatively small amounts of computational work are done between
communication events

Parallel and Distributed Computing


Introduction

General Parallel Computing Terminology

Observed Speedup
Observed speedup of a code which has been parallelized, defined as:

    speedup = wall-clock time of serial execution / wall-clock time of parallel execution

One of the simplest and most widely used indicators for a parallel program's
performance.
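A rough way to measure it (a Python sketch; the observed number depends entirely on
your machine, the core count, and the per-task work size):

    import time
    from multiprocessing import Pool

    def work(x):
        return sum(i * i for i in range(x))

    if __name__ == "__main__":
        jobs = [200_000] * 16

        t0 = time.perf_counter()
        serial = [work(x) for x in jobs]
        t_serial = time.perf_counter() - t0

        t0 = time.perf_counter()
        with Pool(processes=4) as pool:
            parallel = pool.map(work, jobs)
        t_parallel = time.perf_counter() - t0

        assert serial == parallel
        print(f"observed speedup: {t_serial / t_parallel:.2f}x")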

Parallel Overhead
Required execution time that is unique to parallel tasks, as opposed to that for
doing useful work. Parallel overhead can include factors such as:
• Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel languages, libraries, operating
system, etc.
• Task termination time
Parallel and Distributed Computing
Introduction

General Parallel Computing Terminology

Massively Parallel
Refers to the hardware that comprises a given parallel system - having many
processing elements. The meaning of "many" keeps increasing, but currently, the
largest parallel computers are comprised of processing elements numbering in the
hundreds of thousands to millions.

Embarrassingly (Ideally) Parallel


Solving many similar, but independent tasks simultaneously; little to no need for
coordination between the tasks.

Scalability
Refers to a parallel system's (hardware and/or software) ability to demonstrate a
proportionate increase in parallel speedup with the addition of more resources.
Factors that contribute to scalability include:
• Hardware - particularly memory-cpu bandwidths and network communication
properties
• Application algorithm
• Related parallel overhead
• Characteristics of your specific application
Parallel and Distributed Computing
Introduction

Potential Benefits, Limits and Costs of Parallel Programming


Amdahl’s Law
• Amdahl's Law states that potential program speedup is defined by the fraction
of code (P) that can be parallelized:

    speedup = 1 / (1 - P)

• If none of the code can be parallelized, P = 0 and the speedup = 1 (no
speedup).
• If all of the code is parallelized, P = 1 and the speedup is infinite (in theory).
• If 50% of the code can be parallelized, maximum speedup = 2, meaning the
code will run twice as fast.

Parallel and Distributed Computing


Introduction

Potential Benefits, Limits and Costs of Parallel Programming


Amdahl’s Law
• Introducing the number of processors performing the parallel fraction of work,
the relationship can be modeled by:

    speedup = 1 / (P/N + S)

where P = parallel fraction, N = number of processors and S = serial fraction.

It soon becomes obvious that there are limits to the scalability of parallelism. For
example:
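A small Python sketch that tabulates this formula for a few values of N and P; it
reproduces the kind of table usually shown at this point and makes the ceiling
obvious: no matter how many processors are used, the serial fraction S limits the
speedup to at most 1/S.

    def amdahl_speedup(p, n):
        # speedup = 1 / (P/N + S), with S = 1 - P
        return 1.0 / (p / n + (1.0 - p))

    for n in (10, 100, 1000, 10000):
        row = "  ".join(f"P={p:.2f}: {amdahl_speedup(p, n):8.2f}"
                        for p in (0.50, 0.90, 0.95, 0.99))
        print(f"N={n:>6}  {row}")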

Parallel and Distributed Computing


Introduction

Potential Benefits, Limits and Costs of Parallel Programming


Amdahl’s Law
• "Famous" quote: You can spend a lifetime getting 95% of your code to be
parallel, and never achieve better than 20x speedup no matter how many
processors you throw at it!
• However, certain problems demonstrate increased performance by increasing
the problem size. For example:

• We can increase the problem size by doubling the grid dimensions and halving
the time step. This results in four times the number of grid points and twice the
number of time steps. The timings then look like:

• Problems that increase the percentage of parallel time with their size are
more scalable than problems with a fixed percentage of parallel time.
Parallel and Distributed Computing
Introduction

Potential Benefits, Limits and Costs of Parallel Programming


Amdahl’s Law

Parallel and Distributed Computing


Introduction

Complexity

In general, parallel applications are more complex than corresponding serial
applications. Not only do you have multiple instruction streams executing at the
same time, but you also have data flowing between them.

The costs of complexity are measured in programmer time in virtually every
aspect of the software development cycle:
• Design
• Coding
• Debugging
• Tuning
• Maintenance

Adhering to "good" software development practices is essential when


developing parallel applications.

Parallel and Distributed Computing


Introduction

Portability

• Thanks to standardization in several APIs, such as MPI, OpenMP and POSIX
threads, portability issues with parallel programs are not as serious as in years
past. However...
• All of the usual portability issues associated with serial programs apply to
parallel programs. For example, if you use vendor "enhancements" to Fortran,
C or C++, portability will be a problem.
• Even though standards exist for several APIs, implementations will differ in a
number of details, sometimes to the point of requiring code modifications in
order to effect portability.
• Operating systems can play a key role in code portability issues.
• Hardware architectures are characteristically highly variable and can affect
portability.

Parallel and Distributed Computing


Introduction

Resource Requirements

• The primary intent of parallel programming is to decrease execution wall clock
time; however, in order to accomplish this, more CPU time is required. For
example, a parallel code that runs in 1 hour on 8 processors actually uses 8
hours of CPU time.
• The amount of memory required can be greater for parallel codes than serial
codes, due to the need to replicate data and for overheads associated with
parallel support libraries and subsystems.
• For short running parallel programs, there can actually be a decrease in
performance compared to a similar serial implementation. The overhead costs
associated with setting up the parallel environment, task creation,
communications and task termination can comprise a significant portion of the
total execution time for short runs.

Parallel and Distributed Computing


Introduction

Scalability

• Two types of scaling based on time to solution: strong scaling and weak
scaling.
• Strong scaling (Amdahl):
• The total problem size stays fixed as more processors are added.
• The goal is to run the same problem size faster.
• Perfect scaling means the problem is solved in 1/N of the serial time (with N
processors).
• Weak scaling (Gustafson):
• The problem size per processor stays fixed as more processors are
added. The total problem size is proportional to the number of processors
used.
• The goal is to run a larger problem in the same amount of time.
• Perfect scaling means a problem N times larger runs in the same time as the
single-processor run (see the sketch below).
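A Python sketch contrasting the two models for a code with a 5% serial fraction;
the strong-scaling numbers follow Amdahl's formula above, while the weak-scaling
numbers use one common formulation of Gustafson's scaled speedup, N - (N - 1) * S.

    S = 0.05                            # serial fraction

    def strong_scaling_speedup(n):
        # Amdahl: fixed total problem size.
        return 1.0 / (S + (1.0 - S) / n)

    def weak_scaling_speedup(n):
        # Gustafson: problem size grows in proportion to the processor count.
        return n - (n - 1) * S

    for n in (1, 16, 256, 4096):
        print(f"N={n:>5}  strong={strong_scaling_speedup(n):8.2f}x"
              f"  weak={weak_scaling_speedup(n):8.2f}x")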

Parallel and Distributed Computing


Introduction

Scalability

• The ability of a parallel program's performance to scale is a result of a number
of interrelated factors. Simply adding more processors is rarely the answer.
• The algorithm may have inherent limits to scalability. At some point, adding
more resources causes performance to decrease. This is a common situation
with many parallel applications.
• Hardware factors play a significant role in scalability. Examples:
• Memory-cpu bus bandwidth on an SMP machine
• Communications network bandwidth
• Amount of memory available on any given machine or set of machines
• Processor clock speed
• Parallel support libraries and subsystems software can limit scalability
independent of your application.

Parallel and Distributed Computing
