Parallel Computing Terminology
• Save time and/or money: In theory, throwing more resources at a task will
shorten its time to completion, with potential cost savings. Parallel clusters can be
built from cheap, commodity components.
• Solve larger problems: Many problems are so large and/or complex that it is
impractical or impossible to solve them on a single computer, especially given
limited computer memory. For example:
o "Grand Challenge" (en.wikipedia.org/wiki/Grand_Challenge) problems
requiring PetaFLOPS and PetaBytes of computing resources.
o Web search engines/databases processing millions of transactions per
second
• Provide concurrency: A single compute resource can only do one thing at a time.
Multiple computing resources can be doing many things simultaneously. For
example, the Access Grid (www.accessgrid.org) provides a global collaboration
network where people from around the world can meet and conduct work
"virtually".
• Use of non-local resources: Using compute resources on a wide area network, or
even the Internet when local compute resources are scarce. For example:
o SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a
combined compute power of over 528 TeraFLOPS (as of August 4, 2008)
o Folding@home (folding.stanford.edu) uses over 340,000 computers for a
compute power of 4.2 PetaFLOPS (as of November 4, 2008)
• Limits to serial computing: Both physical and practical reasons pose significant
constraints to simply building ever faster serial computers:
o Transmission speeds - the speed of a serial computer is directly dependent
upon how fast data can move through hardware. Absolute limits are the
speed of light (30 cm/nanosecond) and the transmission limit of copper
wire (9 cm/nanosecond). Increasing speeds necessitate increasing
proximity of processing elements.
o Limits to miniaturization - processor technology is allowing an increasing
number of transistors to be placed on a chip. However, even with
molecular or atomic-level components, a limit will be reached on how
small components can be.
o Economic limitations - it is increasingly expensive to make a single
processor faster. Using a larger number of moderately fast commodity
processors to achieve the same (or better) performance is less expensive.
The von Neumann Architecture
• Named after the Hungarian mathematician John von Neumann, who first authored
the general requirements for an electronic computer in his 1945 papers.
• Since then, virtually all computers have followed this basic design, which differed
from earlier computers programmed through "hard wiring".
o Comprised of four main components:
Memory
Control Unit
Arithmetic Logic Unit
Input/Output
o Read/write, random access memory is used to store both program
instructions and data
Program instructions are coded data which tell the computer to do
something
Data is simply information to be used by the program
o Control unit fetches instructions/data from memory, decodes the
instructions and then sequentially coordinates operations to accomplish
the programmed task.
o Arithmetic Logic Unit performs basic arithmetic operations (a minimal
fetch-decode-execute sketch follows this list)
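Below is a minimal, illustrative sketch in C of the stored-program cycle these components implement: a single memory array holds both instructions and data, a loop plays the role of the control unit (fetch and decode), and the add operation stands in for the ALU. The toy opcodes and instruction encoding are assumptions made up for this example, not any real machine's instruction set.

/* Toy von Neumann machine: one memory for program and data,
   and a fetch-decode-execute loop. */
#include <stdio.h>

enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

int main(void) {
    /* Read/write memory holding both instructions and data. */
    int memory[16] = {
        /* program: load mem[10], add mem[11], store to mem[12], halt */
        (OP_LOAD  << 8) | 10,
        (OP_ADD   << 8) | 11,
        (OP_STORE << 8) | 12,
        (OP_HALT  << 8) | 0,
        0, 0, 0, 0, 0, 0,
        /* data */
        40, 2, 0, 0, 0, 0
    };

    int pc = 0;          /* program counter kept by the control unit */
    int accumulator = 0; /* single register operated on by the ALU   */
    int running = 1;

    while (running) {
        int instruction = memory[pc++];     /* fetch  */
        int opcode  = instruction >> 8;     /* decode */
        int operand = instruction & 0xFF;

        switch (opcode) {                   /* execute */
        case OP_LOAD:  accumulator = memory[operand];                break;
        case OP_ADD:   accumulator += memory[operand];  /* ALU op */ break;
        case OP_STORE: memory[operand] = accumulator;                break;
        case OP_HALT:  running = 0;                                  break;
        }
    }

    printf("mem[12] = %d\n", memory[12]);   /* prints 42 */
    return 0;
}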
• There are different ways to classify parallel computers. One of the more widely
used classifications, in use since 1966, is called Flynn's Taxonomy.
• Flynn's taxonomy distinguishes multi-processor computer architectures according
to how they can be classified along the two independent dimensions of
Instruction and Data. Each of these dimensions can have only one of two
possible states: Single or Multiple.
• The matrix below defines the 4 possible classifications according to Flynn:
                       Single Data    Multiple Data
Single Instruction     SISD           SIMD
Multiple Instruction   MISD           MIMD
Handler classification
The events are created by the framework based on interpreting lower-level inputs, which
may be lower-level events themselves. For example, mouse movements and clicks are
interpreted as menu selections. The events initially originate from actions on the
operating system level, such as interrupts generated by hardware devices, software
interrupt instructions, or state changes in polling. On this level, interrupt handlers and
signal handlers correspond to event handlers.
Created events are first processed by an event dispatcher within the framework. It
typically manages the associations between events and event handlers, and may queue
event handlers or events for later processing. Event dispatchers may call event handlers
directly, or wait for events to be dequeued with information about the handler to be
executed.
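The following C sketch illustrates the dispatcher pattern just described: handlers are registered per event type, and the dispatcher walks a queue of events and invokes the handler associated with each one. The event types and handler names are assumptions made up for this example.

/* Minimal event dispatcher: a table maps event types to handlers,
   and dispatch() dequeues events and calls the matching handler. */
#include <stdio.h>

enum event_type { EVENT_MOUSE_CLICK, EVENT_KEY_PRESS, EVENT_TYPE_COUNT };

struct event {
    enum event_type type;
    int data;                      /* e.g. button number or key code */
};

typedef void (*event_handler)(const struct event *);

static event_handler handlers[EVENT_TYPE_COUNT];   /* event -> handler map */

static void on_mouse_click(const struct event *e) {
    printf("mouse click, button %d\n", e->data);
}

static void on_key_press(const struct event *e) {
    printf("key press, code %d\n", e->data);
}

/* The dispatcher: process each queued event with its registered handler. */
static void dispatch(const struct event *queue, int count) {
    for (int i = 0; i < count; i++) {
        event_handler h = handlers[queue[i].type];
        if (h != NULL)
            h(&queue[i]);
    }
}

int main(void) {
    handlers[EVENT_MOUSE_CLICK] = on_mouse_click;  /* register handlers */
    handlers[EVENT_KEY_PRESS]   = on_key_press;

    struct event queue[] = {                       /* queued low-level input */
        { EVENT_MOUSE_CLICK, 1 },
        { EVENT_KEY_PRESS,  65 },
    };
    dispatch(queue, 2);
    return 0;
}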
Handling signals
Signal handlers can be installed with the signal() system call. If a signal handler is not
installed for a particular signal, the default handler is used. Otherwise the signal is
intercepted and the signal handler is invoked. Instead of providing its own handler, a
process can also request one of two predefined dispositions: ignore the signal (SIG_IGN)
or use the default signal handler (SIG_DFL). There are two signals which cannot be intercepted and
handled: SIGKILL and SIGSTOP.
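As a sketch of the mechanics just described, the C program below installs a custom handler for one signal and uses SIG_IGN and SIG_DFL for two others; the particular signals chosen (SIGINT, SIGTERM, SIGQUIT) are illustrative.

#include <signal.h>
#include <unistd.h>

static void handle_sigint(int signum) {
    (void)signum;
    /* Keep to async-signal-safe calls inside a handler. */
    write(STDOUT_FILENO, "caught SIGINT\n", 14);
}

int main(void) {
    signal(SIGINT, handle_sigint);   /* intercept SIGINT with a custom handler */
    signal(SIGTERM, SIG_IGN);        /* ignore SIGTERM                          */
    signal(SIGQUIT, SIG_DFL);        /* explicitly keep the default behavior    */

    /* signal(SIGKILL, ...) and signal(SIGSTOP, ...) would fail:
       these two signals cannot be intercepted or handled. */

    pause();                         /* block until a signal arrives */
    return 0;
}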
Risks
Signals can cause the interruption of a system call in progress, leaving it to the
application to manage a non-transparent restart.
Signal handlers should be written in a way that doesn't result in any unwanted side-
effects, e.g. errno alteration, signal mask alteration, signal disposition change, and other
global process attribute changes. Use of non-reentrant functions, e.g. malloc or printf,
inside signal handlers is also unsafe.
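The following sketch shows one common way to keep a handler within these constraints: it saves and restores errno, its only side effect is setting a volatile sig_atomic_t flag, and non-reentrant calls such as printf are left to the main loop. The use of SIGUSR1 and the names here are illustrative.

/* A handler written to avoid the risks above: preserve errno,
   set a flag, and do the real work outside the handler. */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void safe_handler(int signum) {
    int saved_errno = errno;   /* don't let the handler clobber errno */
    (void)signum;
    got_signal = 1;            /* the only side effect: set a flag */
    errno = saved_errno;
}

int main(void) {
    signal(SIGUSR1, safe_handler);

    while (!got_signal) {
        pause();               /* returns when a handled signal arrives */
    }
    /* Non-reentrant functions are safe to call here, outside the handler. */
    printf("SIGUSR1 received\n");
    return 0;
}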
A process's execution may result in the generation of a hardware exception, for instance,
if the process attempts to divide by zero or incurs a TLB miss. In Unix-like operating
systems, this event automatically changes the processor context to start executing a
kernel exception handler. For some exceptions, such as a page fault, the kernel has
sufficient information to fully handle the event and resume the process's execution. For
other exceptions, however, the kernel cannot resolve the fault on its own and must instead defer
the exception handling operation to the faulting process. This deferral is achieved via the
signal mechanism, wherein the kernel sends to the process a signal corresponding to the
current exception. For example, if a process attempted to divide by zero on an x86 CPU,
a divide error exception would be generated and cause the kernel to send the SIGFPE
signal to the process. Similarly, if the process attempted to access a memory address
outside of its virtual address space, the kernel would notify the process of this violation
via a SIGSEGV signal. The exact mapping between signal names and exceptions is
obviously dependent upon the CPU, since exception types differ between architectures.
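As an illustration of this path on a Unix-like system, the sketch below deliberately writes to an unmapped address; the resulting hardware fault is delivered to the process as SIGSEGV, and the handler restricts itself to async-signal-safe calls before exiting. The specific bad address is an arbitrary choice for the example, and the exact behavior is platform-dependent.

/* An invalid memory access becomes a SIGSEGV delivered by the kernel. */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static void segv_handler(int signum) {
    (void)signum;
    /* Only async-signal-safe calls here: write() and _exit(). */
    const char msg[] = "caught SIGSEGV: invalid memory access\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(EXIT_FAILURE);          /* resuming after SIGSEGV is not safe */
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    volatile int *bad = (volatile int *)0x10;  /* address outside the mapped space */
    *bad = 42;                                 /* the hardware fault becomes SIGSEGV */
    return 0;
}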
Amdahl's law and Gustafson's law
Most grid computing applications use middleware, software that sits between the
operating system and the application to manage network resources and standardize the
software interface. The most common grid computing middleware is the Berkeley Open
Infrastructure for Network Computing (BOINC). Often, grid computing software makes
use of "spare cycles", performing computations at times when a computer is idling.
S = 1 / (1 - P)
where S is the speed-up of the program (as a factor of its original sequential runtime), and
P is the fraction that is parallelizable. If the sequential portion of a program is 10% of the
runtime, we can get no more than a 10× speed-up, regardless of how many processors are
added. This puts an upper limit on the usefulness of adding more parallel execution units.
"When a task cannot be partitioned because of sequential constraints, the application of
more effort has no effect on the schedule. The bearing of a child takes nine months, no
matter how many women are assigned."[12]
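As a small worked example, the sketch below evaluates the common N-processor form of Amdahl's law, S(N) = 1 / ((1 - P) + P/N), which approaches the 1 / (1 - P) limit quoted above as N grows; P = 0.90 is assumed here to match the 10%-sequential example.

#include <stdio.h>

static double amdahl_speedup(double p, double n) {
    /* S(N) = 1 / ((1 - P) + P/N): the serial part stays, the parallel part shrinks. */
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    const double p = 0.90;                        /* assumed parallel fraction */
    const int procs[] = { 1, 2, 4, 8, 16, 64, 1024 };
    const int n = (int)(sizeof procs / sizeof procs[0]);

    for (int i = 0; i < n; i++)
        printf("N = %4d  speedup = %5.2f\n", procs[i],
               amdahl_speedup(p, procs[i]));

    printf("limit as N grows: %.2f\n", 1.0 / (1.0 - p));   /* 10.00 */
    return 0;
}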
Gustafson's law is another law in computer engineering, closely related to Amdahl's law.
It can be formulated as:
S(P) = P - α(P - 1)
where P is the number of processors, S is the speed-up, and α is the non-parallelizable
fraction of the process.[13] Amdahl's law assumes a fixed problem size and that the size
of the sequential section is independent of the number of processors, whereas Gustafson's
law does not make these assumptions.
To see why the sequential fraction dominates, assume that a task has two independent
parts, A and B, where A takes roughly 75% and B roughly 25% of the total runtime. With
effort, a programmer may be able to make part B five times faster, but this only reduces
the total runtime to 75% + 25%/5 = 80% of the original, a speed-up of just 1.25×. In
contrast, less work may be needed to make part A merely twice as fast, yet this reduces
the total runtime to 75%/2 + 25% = 62.5%, a speed-up of 1.6×. Optimizing the larger part
A therefore speeds up the whole computation more than optimizing part B, even though B
received the greater individual speed-up (5× versus 2×).
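For comparison with the Amdahl example above, the sketch below evaluates Gustafson's law as formulated earlier in this section; α = 0.10 is assumed so the numbers line up with the same 10% non-parallelizable fraction.

#include <stdio.h>

static double gustafson_speedup(double processors, double alpha) {
    /* S(P) = P - alpha * (P - 1) */
    return processors - alpha * (processors - 1.0);
}

int main(void) {
    const double alpha = 0.10;                    /* assumed non-parallelizable part */
    const int procs[] = { 1, 2, 4, 8, 16, 64, 1024 };
    const int n = (int)(sizeof procs / sizeof procs[0]);

    for (int i = 0; i < n; i++)
        printf("P = %4d  scaled speedup = %8.2f\n", procs[i],
               gustafson_speedup(procs[i], alpha));
    return 0;
}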