Parallel and Distributed Computing
Week 01: Introduction
23/09/2024
Parallel Computing
When multiple processors or computers work together on a single common goal.
e.g. ten men pulling on one rope to lift a single rock; supercomputers
implement parallel computing.
Distributed Computing
When several different computers work separately on parts of a multi-faceted
computing workload.
e.g. ten men pulling ten ropes to lift ten different rocks; employees in an
office each doing their own work.
Parallel Computers
• In the natural world, many complex, interrelated events are happening at the
same time, yet within a temporal sequence.
• Compared to serial computing, parallel computing is much better suited for
modeling, simulating and understanding complex, real world phenomena.
• In theory, throwing more resources at a task will shorten its time to completion,
with potential cost savings.
• Parallel computers can be built from cheap, commodity components.
• Compute resources on a wide area network, or even the Internet, can be used
when local compute resources are scarce or insufficient.
• Example: SETI@home (setiathome.berkeley.edu) has over 1.7 million users in
nearly every country in the world (May, 2018).
THE FUTURE
• During the past 20+ years, the trends indicated by ever faster networks,
distributed systems, and multi-processor computer architectures (even at the
desktop level) clearly show that parallelism is the future of computing.
• In this same time period, there has been a greater than 500,000x increase in
supercomputer performance, with no end currently in sight.
• The race is already on for Exascale Computing - we are entering the
Exascale era
• Exaflop = 10^18 calculations per second
• US DOE Exascale Computing Project: https://www.exascaleproject.org
Global Applications
Parallel computing is now being used extensively around the world, in a wide
variety of applications.
von Neumann Architecture
• Named after the Hungarian mathematician John von Neumann, who first
authored the general requirements for an electronic computer in his 1945
papers.
• Also known as "stored-program computer" - both program instructions and
data are kept in electronic memory. Differs from earlier computers which were
programmed through "hard wiring".
• Since then, virtually all computers have followed this basic design.
Parallel computers still follow this basic design, just multiplied in units. The basic,
fundamental architecture remains the same. More info on his other remarkable
accomplishments: http://en.wikipedia.org/wiki/John_von_Neumann
CPU
Contemporary CPUs consist of one or more cores - a distinct execution unit with
its own instruction stream. Cores within a CPU may be organized into one or more
sockets - each socket with its own distinct memory. When a CPU consists of two
or more sockets, the hardware infrastructure usually supports memory sharing
across sockets.
Node
A standalone "computer in a box." Usually comprised of multiple
CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked
together to comprise a supercomputer.
Task
A logically discrete section of computational work. A task is typically a program or
program-like set of instructions that is executed by a processor. A parallel
program consists of multiple tasks running on multiple processors.
Pipelining
Breaking a task into steps performed by different processor units, with inputs
streaming through, much like an assembly line; a type of parallel computing.
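As an illustration only, the sketch below builds a two-stage pipeline from Python
threads and queues; the stage functions and toy workload are invented for the
example, not taken from the lecture.

```python
# Two pipeline "stages" (threads) connected by queues, with inputs streaming
# through like an assembly line.
import threading
import queue

raw = queue.Queue()       # feeds stage 1
halfway = queue.Queue()   # connects stage 1 to stage 2
done = queue.Queue()      # collects finished items
STOP = object()           # sentinel to shut a stage down

def stage1():
    while True:
        item = raw.get()
        if item is STOP:
            halfway.put(STOP)
            return
        halfway.put(item * 2)        # first processing step (toy work)

def stage2():
    while True:
        item = halfway.get()
        if item is STOP:
            done.put(STOP)
            return
        done.put(item + 1)           # second processing step (toy work)

threads = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
for t in threads:
    t.start()

for x in range(5):                   # stream inputs through the pipeline
    raw.put(x)
raw.put(STOP)

results = []
while (r := done.get()) is not STOP:
    results.append(r)
for t in threads:
    t.join()
print(results)                       # [1, 3, 5, 7, 9]
```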
Shared Memory
Describes a computer architecture where all processors have direct access to
common physical memory. In a programming sense, it describes a model where
parallel tasks all have the same "picture" of memory and can directly address and
access the same logical memory locations regardless of where the physical
memory actually exists.
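A minimal sketch of the shared-memory programming model, assuming Python threads
as the parallel tasks: every thread sees the same "picture" of memory and updates
the same array in place. The array size and worker count are arbitrary choices.

```python
# All threads directly address the same array; no data is copied or sent.
import threading

N_WORKERS = 4
data = [0] * 1_000            # one array, directly addressable by every thread

def worker(wid):
    # each worker fills an interleaved slice of the same shared array
    for i in range(wid, len(data), N_WORKERS):
        data[i] = i * i

threads = [threading.Thread(target=worker, args=(w,)) for w in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# the parallel result matches a serial fill of the same array
print(sum(data) == sum(i * i for i in range(len(data))))   # True
```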
Distributed Memory
In hardware, refers to network based memory access for physical memory that is
not common. As a programming model, tasks can only logically "see" local
machine memory and must use communications to access memory on other
machines where other tasks are executing.
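A minimal sketch of the distributed-memory model, assuming Python's
multiprocessing module: each process has its own private memory, so partial
results must be sent back as explicit messages. The partial-sum workload and the
use of a Pipe as the "network" are illustrative.

```python
# Each process only "sees" its own chunk and must communicate its result.
from multiprocessing import Process, Pipe

def worker(conn, chunk):
    local = sum(chunk)            # operates only on its own local data
    conn.send(local)              # explicit communication back to the parent
    conn.close()

if __name__ == "__main__":
    data = list(range(1_000))
    chunks = [data[0::2], data[1::2]]        # split the work across two tasks
    pipes, procs = [], []
    for chunk in chunks:
        parent, child = Pipe()
        p = Process(target=worker, args=(child, chunk))
        p.start()
        pipes.append(parent)
        procs.append(p)
    total = sum(conn.recv() for conn in pipes)
    for p in procs:
        p.join()
    print(total == sum(data))     # True
```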
Communications
Parallel tasks typically need to exchange data. There are several ways this can be
accomplished, such as through a shared memory bus or over a network.
Synchronization
The coordination of parallel tasks in real time, very often associated with
communications.
Synchronization usually involves waiting by at least one task, and can therefore
cause a parallel application's wall clock execution time to increase.
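A small sketch of this waiting cost, assuming Python threads and a barrier: fast
tasks must sit idle until the slowest one arrives, which is exactly the waiting
that can stretch wall-clock time. The random sleeps are placeholders for uneven
amounts of work.

```python
# A barrier forces every thread to wait for the slowest one.
import threading
import time
import random

N = 4
barrier = threading.Barrier(N)

def task(tid):
    time.sleep(random.uniform(0.0, 0.5))    # unequal amounts of "work"
    t0 = time.perf_counter()
    barrier.wait()                           # synchronization point
    waited = time.perf_counter() - t0
    print(f"task {tid} waited {waited:.3f}s at the barrier")

threads = [threading.Thread(target=task, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```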
Computational Granularity
In parallel computing, granularity is a quantitative or qualitative measure of the
ratio of computation to communication.
Coarse: relatively large amounts of computational work are done between
communication events
Fine: relatively small amounts of computational work are done between
communication events
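The ratio can be made concrete with a toy cost model; the per-item compute time
and per-message latency below are assumed values, chosen only to contrast fine
and coarse decompositions of the same total work.

```python
# Granularity as a computation-to-communication ratio under an assumed cost model.
COMPUTE_PER_ITEM = 1e-6      # seconds of arithmetic per data item (assumed)
LATENCY_PER_MSG  = 1e-4      # fixed cost of one communication event (assumed)
TOTAL_ITEMS = 1_000_000

def ratio(items_per_message):
    messages = TOTAL_ITEMS // items_per_message
    compute = TOTAL_ITEMS * COMPUTE_PER_ITEM
    communicate = messages * LATENCY_PER_MSG
    return compute / communicate

print("fine   (10 items/msg):   ", ratio(10))      # low ratio: communication dominates
print("coarse (10000 items/msg):", ratio(10_000))  # high ratio: computation dominates
```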
Observed Speedup
Observed speedup of a code which has been parallelized, defined as:
wall-clock time of serial execution / wall-clock time of parallel execution
One of the simplest and most widely used indicators of a parallel program's
performance.
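A sketch of measuring observed speedup directly, assuming a simple summation as
the parallelized code and a Python multiprocessing pool with four workers; the
measured numbers will vary by machine and may be well below 4x because of
overhead.

```python
# speedup = wall-clock serial time / wall-clock parallel time
import time
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    N = 2_000_000

    t0 = time.perf_counter()
    serial = partial_sum((0, N))
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool(4) as pool:
        chunks = [(i * N // 4, (i + 1) * N // 4) for i in range(4)]
        parallel = sum(pool.map(partial_sum, chunks))
    t_parallel = time.perf_counter() - t0

    assert serial == parallel
    print(f"observed speedup = {t_serial / t_parallel:.2f}")
```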
Parallel Overhead
Required execution time that is unique to parallel tasks, as opposed to that
required for doing useful work (a rough measurement sketch follows this list).
Parallel overhead can include factors such as:
• Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel languages, libraries, operating
system, etc.
• Task termination time
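One rough way to see pure overhead, assuming Python processes as the parallel
tasks: the workers do no useful work at all, so everything timed here is
start-up, bookkeeping, and termination cost. The task count is arbitrary.

```python
# Time spent launching and joining empty worker processes is pure overhead.
import time
from multiprocessing import Process

def do_nothing():
    pass

if __name__ == "__main__":
    t0 = time.perf_counter()
    procs = [Process(target=do_nothing) for _ in range(8)]
    for p in procs:
        p.start()       # task start-up time
    for p in procs:
        p.join()        # task termination time
    overhead = time.perf_counter() - t0
    print(f"pure overhead for 8 empty tasks: {overhead:.3f}s")
```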
Massively Parallel
Refers to the hardware that comprises a given parallel system - having many
processing elements. The meaning of "many" keeps increasing, but currently, the
largest parallel computers are comprised of processing elements numbering in the
hundreds of thousands to millions.
Scalability
Refers to a parallel system's (hardware and/or software) ability to demonstrate a
proportionate increase in parallel speedup with the addition of more resources.
Factors that contribute to scalability include:
• Hardware - particularly memory-CPU bandwidths and network communication
properties
• Application algorithm
• Related parallel overhead
• Characteristics of your specific application
• We can increase the problem size by doubling the grid dimensions and halving
the time step. This results in four times the number of grid points and twice
the number of time steps, which increases the share of parallelizable work
relative to the fixed serial portion (a worked sketch follows this list).
• Problems that increase the percentage of parallel time with their size are
more scalable than problems with a fixed percentage of parallel time.
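A small worked sketch of that reasoning, using assumed (not measured) baseline
times: enlarging the problem multiplies the parallelizable grid work by eight
(4x grid points times 2x time steps) while the fixed serial part stays the same,
so the parallel percentage rises.

```python
# Assumed baseline split of one run into parallelizable and serial time.
parallel_part = 80.0      # assumed time spent on grid-point work (parallelizable)
serial_part   = 20.0      # assumed fixed serial time

def parallel_fraction(par, ser):
    return par / (par + ser)

print(f"original problem: {parallel_fraction(parallel_part, serial_part):.1%} parallel")

# double grid dimensions (4x points) and halve the time step (2x steps):
# the parallelizable work grows 8x, the serial part stays the same
bigger_parallel = parallel_part * 4 * 2
print(f"enlarged problem: {parallel_fraction(bigger_parallel, serial_part):.1%} parallel")
```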
Complexity
Portability
Resource Requirements
Scalability
• Two types of scaling based on time to solution: strong scaling and weak
scaling (a formula sketch follows this list).
• Strong scaling (Amdahl):
• The total problem size stays fixed as more processors are added.
• Goal is to run the same problem size faster
• Perfect scaling means problem is solved in 1/P time (compared to serial)
• Weak scaling (Gustafson):
• The problem size per processor stays fixed as more processors are
added. The total problem size is proportional to the number of processors
used.
• Goal is to run larger problem in same amount of time
• Perfect scaling means a problem P times larger runs in the same time as the
single-processor run
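The two regimes can be written as formulas; the sketch below uses Amdahl's law
for strong scaling and Gustafson's law for weak scaling, with a parallel
fraction of 0.95 chosen purely as an example value.

```python
# p is the parallel fraction of the work, n the number of processors.
def amdahl(p, n):
    """Strong scaling: speedup for a fixed total problem size on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson(p, n):
    """Weak scaling: scaled speedup when the problem grows with n processors."""
    return (1.0 - p) + p * n

p = 0.95
for n in (1, 8, 64, 1024):
    print(f"n={n:5d}  Amdahl: {amdahl(p, n):7.2f}x   Gustafson: {gustafson(p, n):8.2f}x")
```

Note how Amdahl's speedup saturates (the serial fraction dominates), while
Gustafson's scaled speedup keeps growing as the problem size grows with n.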