Introduction To Parallel Processing
Mr. Venkatesh Shankar
B.E., M.Tech., (PhD)
• Parallel computing is a form of computation in which many calculations are carried
out simultaneously, operating on the principle that large problems can often be
divided into smaller ones, which are then solved concurrently ("in parallel").
• There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism.
• The third generation (1962-1975): This generation was marked by the use of small-scale integrated (SSI) and medium-scale integrated (MSI) circuits as the basic building blocks. Multilayered printed circuits were used. High-level languages were greatly enhanced with intelligent compilers during this period.
• The fourth generation (1972-present): Present-generation computers emphasize the use of large-scale integrated (LSI) circuits for both logic and memory sections. High-density packaging has appeared.
• The future: Computers to be used in the 1990s may be the next generation. Very large-scale integrated (VLSI) chips will be used along with high-density modular design.
Trends Towards Parallel Processing
• According to Sidney Fernbach:
Today's large computers (mainframes) would have been considered
‘supercomputers’ 10 to 20 years ago. By the same token, today's
supercomputers will be considered 'state-of-the-art' standard equipment 10 to
20 years from now.
From an application point of view, the mainstream usage of computers is
experiencing a trend of four ascending levels of sophistication:
• Data processing
• Information processing
• Knowledge processing
• Intelligence processing
• The relationships between data, information, knowledge, and intelligence are demonstrated in Figure 1.2. The data space is the largest, including numeric data in various formats, character symbols, and multidimensional measures. Data objects are considered mutually unrelated in the space.
• As accumulated knowledge bases have expanded rapidly in recent years, there has been a strong demand to use computers for knowledge processing.
• For example, the various expert computer systems listed in Table 1.1 are used for
problem solving in the specific areas where they can reach a level of performance
comparable to that of human experts.
Definition
• Parallel processing is an efficient form of information processing which emphasizes the exploitation of concurrent events in the computing process.
• The CPU contains an arithmetic and logic unit (ALU) with an optional floating-point accelerator, and some local cache memory with an optional diagnostic memory. The operator can intervene in the CPU through the console, which is connected to a floppy disk.
• The CPU, the main memory (2^32 words of 32 bits each), and the I/O subsystems are all connected to a common bus, the synchronous backplane interconnect (SBI). Through this bus, all I/O devices can communicate with each other, with the CPU, or with the memory. Peripheral storage and I/O devices can be connected directly to the SBI through the Unibus and its controller.
Parallel Processing Mechanisms
• A number of parallel processing mechanisms have been developed in uniprocessor computers. We group them into six categories: multiplicity of functional units; parallelism and pipelining within the CPU; overlapped CPU and I/O operations; use of a hierarchical memory system; balancing of subsystem bandwidths; and multiprogramming and time sharing.
• The CDC-6600 (designed in 1964) has 10 functional units built into its CPU (Figure 1.5). These 10 units are independent of each other and may operate simultaneously. A scoreboard is used to keep track of the availability of the functional units and the registers being demanded. With 10 functional units and 24 registers available, the instruction issue rate can be significantly increased.
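A toy Python sketch of the scoreboard's bookkeeping (the unit and register names below are illustrative, not the CDC-6600's actual resource set): an instruction issues only when its functional unit and destination register are both free.

    # Toy scoreboard: an instruction issues only when its functional unit
    # and its destination register are both free; otherwise it stalls.
    busy_units = set()
    busy_regs = set()

    def try_issue(unit, dest_reg):
        if unit in busy_units or dest_reg in busy_regs:
            return False              # hazard detected: stall the instruction
        busy_units.add(unit)          # mark the resources as in use
        busy_regs.add(dest_reg)
        return True

    def complete(unit, dest_reg):
        busy_units.discard(unit)      # release the resources on completion
        busy_regs.discard(dest_reg)

    print(try_issue("ADD", "R1"))     # True:  unit and register are free
    print(try_issue("MUL", "R2"))     # True:  a different unit runs in parallel
    print(try_issue("ADD", "R3"))     # False: the ADD unit is still busy
    complete("ADD", "R1")
    print(try_issue("ADD", "R3"))     # True:  the unit has been released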
Parallelism and Pipelining within the CPU
• Parallel adders, using such techniques as carry-lookahead and carry-save, are now built into almost all ALUs. This contrasts with the bit-serial adders used in first-generation machines.
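A minimal Python sketch of the carry-lookahead idea, assuming 4-bit operands: generate and propagate signals let every carry be computed directly from the inputs, rather than rippling through each bit position in turn.

    def cla_add_4bit(a, b):
        # Carry-lookahead addition of two 4-bit numbers.
        g = [(a >> i & 1) & (b >> i & 1) for i in range(4)]   # generate signals
        p = [(a >> i & 1) | (b >> i & 1) for i in range(4)]   # propagate signals
        c = [0] * 5                                           # c[0] is the carry-in
        for i in range(4):
            # c[i+1] = g[i] OR (p[i] AND c[i]); in hardware this recurrence is
            # expanded into two-level logic so all carries appear at once.
            # Here it is computed in a loop only for clarity.
            c[i + 1] = g[i] | (p[i] & c[i])
        s = [(a >> i & 1) ^ (b >> i & 1) ^ c[i] for i in range(4)]
        return sum(bit << i for i, bit in enumerate(s)) + (c[4] << 4)

    assert cla_add_4bit(9, 7) == 16
    assert cla_add_4bit(15, 1) == 16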
• Overlapped CPU and I/O operations: DMA channels can be used to provide direct information transfer between the I/O devices and the main memory. The DMA transfer is conducted on a cycle-stealing basis, which is transparent to the CPU.
Use of Hierarchical Memory System
• Usually, the CPU is about 1000 times faster than memory access. A hierarchical memory system can be used to close the speed gap. The computer memory hierarchy is conceptually illustrated in the accompanying figure.
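A back-of-the-envelope Python sketch of how a hierarchy closes the speed gap (the latencies and hit ratio below are assumed round numbers, not measurements from any machine): with a high hit ratio, the effective access time approaches that of the fast level.

    t_cache = 10        # assumed cache access time in ns (illustrative value)
    t_main = 1000       # assumed main-memory access time in ns (illustrative value)
    hit_ratio = 0.95    # assumed fraction of accesses served by the cache

    # Hits are served at cache speed; misses fall through to main memory.
    t_eff = hit_ratio * t_cache + (1 - hit_ratio) * t_main
    print(f"effective access time: {t_eff} ns")   # 59.5 ns, ~17x faster than main memory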
Balancing of Subsystem Bandwidth
• In general, the CPU is the fastest unit in a computer, with a processor cycle time tp of tens of nanoseconds; the main memory has a cycle time tm of hundreds of nanoseconds; and the I/O devices are the slowest, with an average access time td of a few milliseconds. It is thus observed that
td > tm > tp
• The bandwidth of a system is defined as the number of operations performed per unit time. Let W be the number of words delivered per memory cycle tm. Then the maximum memory bandwidth Bm is equal to
Bm = W / tm (words per second)
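A worked instance of the formula in Python, using assumed values rather than figures from any real machine:

    tm = 500e-9    # assumed memory cycle time: 500 ns (illustrative value)
    W = 4          # assumed words delivered per memory cycle (illustrative value)

    # Maximum memory bandwidth: Bm = W / tm, in words per second.
    Bm = W / tm
    print(f"Bm = {Bm:.1e} words/s")   # 8.0e+06 words per second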
Multiprogramming and Time Sharing
• Multiprogramming: Within the same time interval, there may be multiple processes active in a computer, competing for memory, I/O, and CPU resources. We are aware of the fact that some computer programs are CPU-bound (compute intensive) and some are I/O-bound (input-output intensive).
• We can mix the execution of various types of programs in the computer to balance bandwidths among the various functional units.
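A minimal sketch of the same idea on a modern OS, assuming Python threads (the task durations are illustrative): the I/O-bound task's wait overlaps with the CPU-bound task's computation, so the machine stays busy.

    import threading
    import time

    def io_bound_task():
        time.sleep(1.0)               # simulated I/O wait; the CPU is free meanwhile

    def cpu_bound_task():
        total = 0
        for i in range(10_000_000):   # simulated computation keeping the CPU busy
            total += i
        return total

    start = time.time()
    t = threading.Thread(target=io_bound_task)
    t.start()                         # the I/O-bound task waits in the background...
    cpu_bound_task()                  # ...while the CPU-bound task uses the processor
    t.join()
    # Elapsed time is roughly the longer of the two tasks, not their sum.
    print(f"elapsed: {time.time() - start:.2f} s")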
Pipeline Computers
• In a nonpipelined computer, the four steps of instruction execution (instruction fetch, decode, operand fetch, and execute) must be completed before the next instruction can be issued. In a pipelined computer, successive instructions are executed in an overlapped fashion, as illustrated in Figure 1.10.
• A typical pipeline computer is conceptually depicted in Figure 1.11. The architecture is very similar to several commercial machines, such as the Cray-1, to be described in Chapter 4. Both scalar arithmetic pipelines and vector arithmetic pipelines are provided.
• The instruction preprocessing unit is itself pipelined, with three stages as shown. The OF (operand fetch) stage consists of two independent stages, one for fetching scalar operands and the other for vector operands.
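A small Python sketch of the overlap, assuming an ideal four-stage pipeline (IF, ID, OF, EX) with one cycle per stage; it prints which instruction occupies each stage in every cycle.

    STAGES = ["IF", "ID", "OF", "EX"]   # fetch, decode, operand fetch, execute
    N = 5                               # number of instructions to run

    # In cycle c, instruction i occupies stage (c - i), if that index is valid.
    for cycle in range(N + len(STAGES) - 1):
        slots = [f"I{i + 1}:{STAGES[cycle - i]}"
                 for i in range(N) if 0 <= cycle - i < len(STAGES)]
        print(f"cycle {cycle + 1}: " + "  ".join(slots))

    # 5 instructions finish in 5 + 4 - 1 = 8 cycles, versus 5 * 4 = 20 cycles
    # if each instruction completed all four steps before the next began.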
Array Computers
• An array processor is a synchronous parallel computer with multiple arithmetic logic units, called processing elements (PEs), that can operate in parallel in a lockstep fashion. By replication of ALUs, one can achieve spatial parallelism. The PEs are synchronized to perform the same function at the same time.
• A typical array processor is depicted in Figure 1.12. Scalar and control-type instructions are directly executed in the control unit (CU). Each PE consists of an ALU with registers and a local memory. The PEs are interconnected by a data-routing network. The interconnection pattern to be established for a specific computation is under program control from the CU.
• A vector processor (or array processor) is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors.
• A scalar processor is a CPU that performs computations on one number or set of data at a time. Most computers have scalar CPUs. A scalar processor is known as a "single instruction stream, single data stream" (SISD) CPU. Contrast this with a vector processor.
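A minimal sketch of the scalar/vector contrast, assuming NumPy is available as a software stand-in for vector hardware: the vectorized form expresses one operation over entire arrays, the way a single vector instruction would.

    import numpy as np

    a = np.arange(8, dtype=np.float64)
    b = np.arange(8, dtype=np.float64)

    # Scalar style: one element pair handled per step, as on an SISD machine.
    c_scalar = np.empty_like(a)
    for i in range(len(a)):
        c_scalar[i] = a[i] + b[i]

    # Vector style: a single operation applied to whole one-dimensional arrays,
    # analogous to the lockstep PEs of an array processor.
    c_vector = a + b

    assert np.array_equal(c_scalar, c_vector)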
A Multiprocessor System
• Research and development of multiprocessor systems are aimed at improving throughput, reliability, flexibility, and availability. A basic multiprocessor organization is conceptually depicted in the accompanying figure.
• In general, digital computers may be classified into four categories, according to the multiplicity of instruction and data streams. This scheme for classifying computer organizations was introduced by Michael J. Flynn. The four categories are SISD (single instruction stream, single data stream), SIMD (single instruction stream, multiple data streams), MISD (multiple instruction streams, single data stream), and MIMD (multiple instruction streams, multiple data streams).
• Instructions or data are defined with respect to a reference machine. An instruction stream is a sequence of instructions as executed by the machine; a data stream is a sequence of data, including input, partial, or temporary results, called for by the instruction stream.
• Computer simulations are far cheaper and faster than physical experiments.
• Computers can solve a much wider range of problems than specific laboratory equipment can.