1 Introduction To Parallel Computing
What is Parallel Computing? (1)
Basic design
– Memory is used to store both program instructions and data
– Program instructions are coded data which tell the computer to do something
– Data is simply information to be used by the program
A central processing unit (CPU) fetches instructions and/or data from memory, decodes the instructions, and then performs them sequentially.
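As a rough, purely illustrative sketch of the fetch-decode-execute cycle described above (the two-field instruction format, the opcodes, and the tiny memory array are all invented for this example), a toy CPU in C might look like:

```c
#include <stdio.h>

/* Toy example only: a made-up two-field instruction format.
   Opcode 0 = HALT, opcode 1 = ADD operand to an accumulator. */
enum { HALT = 0, ADD = 1 };
struct instr { int opcode; int operand; };

int main(void) {
    /* "Memory" holds the program; the operands act as its data. */
    struct instr memory[] = { {ADD, 5}, {ADD, 7}, {HALT, 0} };
    int acc = 0;    /* accumulator register */
    int pc  = 0;    /* program counter      */

    for (;;) {
        struct instr i = memory[pc++];        /* fetch   */
        switch (i.opcode) {                   /* decode  */
        case ADD:  acc += i.operand; break;   /* execute */
        case HALT: printf("result = %d\n", acc); return 0;
        }
    }
}
```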
Like everything else, parallel computing has its own "jargon". Some of the
more commonly used terms associated with parallel computing are listed
below. Most of these will be discussed in more detail later.
Task
– A logically discrete section of computational work. A task is
typically a program or program-like set of instructions that is
executed by a processor.
Parallel Task
– A task that can be executed by multiple processors safely
(yields correct results)
Serial Execution
– Execution of a program sequentially, one statement at a time. In the
simplest sense, this is what happens on a one-processor machine.
However, virtually all parallel programs have sections that must be
executed serially. (A minimal sketch contrasting serial and parallel
sections follows this list.)
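To make the distinction concrete, here is a minimal C/OpenMP sketch (the array size and loop body are invented for illustration): the initialization loop is a serial section, while the second loop is a parallel task whose iterations are divided safely among threads.

```c
#include <stdio.h>

#define N 1000   /* illustrative problem size */

int main(void) {
    double a[N];

    /* Serial execution: one thread, one statement at a time. */
    for (int i = 0; i < N; i++)
        a[i] = 0.0;

    /* Parallel task: the loop body can be executed safely by many
       threads, each handling a different chunk of the iterations. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * 2.0;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```

Compiled with an OpenMP-aware compiler (e.g. with -fopenmp), the second loop runs across multiple processors; the rest of the program still executes serially.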
Shared Memory
Distributed Memory
Hybrid Distributed-Shared Memory
Advantages
– Global address space provides a user-friendly programming
perspective to memory
– Data sharing between tasks is both fast and uniform due to the
proximity of memory to CPUs
Disadvantages:
– Primary disadvantage is the lack of scalability between memory
and CPUs. Adding more CPUs can geometrically increase traffic on
the shared memory-CPU path and, for cache coherent systems,
geometrically increase the traffic associated with cache/memory
management.
– The programmer is responsible for synchronization constructs that
ensure "correct" access of global memory (see the sketch after this
list).
– Expense: it becomes increasingly difficult and expensive to design
and produce shared memory machines with ever increasing
numbers of processors.
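As an illustration of that synchronization burden, the following C/OpenMP sketch (the shared counter sum and the loop bound are made up for the example) shows the kind of construct a programmer must add so that concurrent updates to shared global memory stay correct:

```c
#include <stdio.h>

int main(void) {
    long sum = 0;   /* lives in shared (global) memory, visible to all threads */

    #pragma omp parallel for
    for (long i = 0; i < 1000000; i++) {
        /* Without the atomic directive, concurrent updates to 'sum'
           race with one another and the final value is wrong.       */
        #pragma omp atomic
        sum += i;
    }

    printf("sum = %ld\n", sum);
    return 0;
}
```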
Advantages
– Memory is scalable with number of processors. Increase the
number of processors and the size of memory increases
proportionately.
– Each processor can rapidly access its own memory without
interference and without the overhead incurred with trying to
maintain cache coherency.
– Cost effectiveness: can use commodity, off-the-shelf processors
and networking.
Disadvantages
– The programmer is responsible for many of the details associated
with data communication between processors (a minimal MPI sketch
follows this list).
– It may be difficult to map existing data structures, based on global
memory, to this memory organization.
– Non-uniform memory access (NUMA) times
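As a minimal sketch of the explicit communication the programmer must manage in this model (the payload value and message tag are arbitrary, chosen only for illustration), two MPI tasks exchanging a single integer might look like:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Task 0 owns the data in its local memory and must send it explicitly. */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Task 1 must receive it explicitly; nothing is shared automatically
           between the two separate memories.                                  */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```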
Although it might not seem apparent, these models are NOT specific to
a particular type of machine or memory architecture. In fact, any of
these models can (theoretically) be implemented on any underlying
hardware.
Shared memory model on a distributed memory machine: Kendall
Square Research (KSR) ALLCACHE approach.
– Machine memory was physically distributed, but appeared to the user as a
single shared memory (global address space). Generically, this approach is
referred to as "virtual shared memory".
– Note: although KSR is no longer in business, there is no reason to suggest
that a similar implementation will not be made available by another vendor
in the future.
Message passing model on a shared memory machine: MPI on SGI Origin.
– The SGI Origin employed the CC-NUMA type of shared memory
architecture, where every task has direct access to global memory.
However, the ability to send and receive messages with MPI, as is
commonly done over a network of distributed memory machines, was
not only implemented but very commonly used.
OpenMP
– Compiler directive based; can use serial code
– Jointly defined and endorsed by a group of major computer
hardware and software vendors. The OpenMP Fortran API was
released October 28, 1997. The C/C++ API was released in late
1998.
– Portable / multi-platform, including Unix and Windows NT platforms
– Available in C/C++ and Fortran implementations
– Can be very easy and simple to use; provides for "incremental
parallelism" (see the sketch below)
Microsoft has its own implementation for threads, which is not
related to the UNIX POSIX standard or OpenMP.