
Figure 1.1 Astrophysical N-body simulation by Scott Linssen (undergraduate University of North Carolina at Charlotte [UNCC] student).

Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers
Barry Wilkinson and Michael Allen, Prentice Hall, 1998
Figure 1.2 Conventional computer having a single processor and memory (instructions flow from main memory to the processor; data flow to or from the processor).

Figure 1.3 Traditional shared memory multiprocessor model (processors connected to memory modules, forming one address space, through an interconnection network).

Figure 1.4 Message-passing multiprocessor model (multicomputer): computers, each a processor with local memory, exchange messages through an interconnection network.

Figure 1.5 Shared memory multiprocessor implementation: computers access a shared memory through an interconnection network carrying messages.

Figure 1.6 MPMD structure (each processor has its own program, instructions, and data).

Figure 1.7 Static link multicomputer: a network with direct links between computers, each a processor-memory (P-M) pair with a communication interface (C).

Figure 1.8 Node with a switch for internode message transfers: the processor and memory connect through a switch to links to other nodes.

Figure 1.9 A link between two nodes with separate wires in each direction.

Figure 1.10 Ring.

Figure 1.11 Two-dimensional array (mesh) of computers/processors joined by links.

Figure 1.12 Tree structure (processing elements joined by links below a root).

Figure 1.13 Three-dimensional hypercube (nodes addressed 000 through 111).

Figure 1.14 Four-dimensional hypercube (nodes addressed 0000 through 1111).

Figure 1.15 Embedding a ring onto a torus.

Figure 1.16 Embedding a mesh into a hypercube (nodal addresses formed from the y and x coordinates, e.g. 1011).

Figure 1.17 Embedding a tree into a mesh.

Figure 1.18 Distribution of flits: the head of the packet moves forward through flit buffers, controlled by request/acknowledge signal(s).

Figure 1.19 A signaling method between processors for wormhole routing: data and request/acknowledge (R/A) lines between source and destination processors (Ni and McKinley, 1993).

Figure 1.20 Network delay characteristics: network latency against distance (number of nodes between source and destination) for packet switching, wormhole routing, and circuit switching.

Figure 1.21 Deadlock in store-and-forward networks (messages circulating among nodes 1 to 4).

Figure 1.22 Multiple virtual channels, each with its own buffer, mapped onto a single physical link between nodes.

Figure 1.23 Ethernet-type single wire network connecting workstations and a workstation/file server.

Figure 1.24 Ethernet frame format, in transmission order: Preamble (64 bits), Destination address (48 bits), Source address (48 bits), Type (16 bits), Data (variable), Frame check sequence (32 bits).

Figure 1.25 Network of workstations connected via a ring.

Figure 1.26 Star connected network of workstations around a workstation/file server.

Figure 1.27 Overlapping connectivity Ethernets for a parallel programming cluster: (a) using specially designed adaptors; (b) using separate Ethernet interfaces.

Figure 1.28 Space-time diagram of a message-passing program: four processes alternate computing and waiting to send a message; the slope of a message arrow indicates the time to send it.

Figure 1.29 Parallelizing a sequential problem (Amdahl's law): (a) on one processor the time ts divides into a serial section f·ts and parallelizable sections (1 − f)ts; (b) on n processors the parallelizable part takes (1 − f)ts/n, giving tp = f·ts + (1 − f)ts/n.

Figure 1.30 Speedup factor S(n): (a) against number of processors n, for f = 0%, 5%, 10%, and 20%; (b) against serial fraction f, for n = 16 and n = 256.

Figure 2.1 Single program, multiple data operation: one source file is compiled to suit each processor, producing executables for processors 0 to n − 1.

Figure 2.2 Spawning a process: process 1 calls spawn() to start execution of process 2.

Figure 2.3 Passing a message between processes using send() and recv() library calls: send(&x, 2) in process 1 moves the data in x to y in process 2, which calls recv(&y, 1).

Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol (request to send, acknowledgment, message): (a) when send() occurs before recv(), the sending process is suspended until the acknowledgment arrives; (b) when recv() occurs before send(), the receiving process is suspended until the message arrives. In both cases, both processes then continue.

Figure 2.5 Using a message buffer: process 1 continues after send(); process 2 reads the message buffer at recv().

Figure 2.6 Broadcast operation: each of processes 0 to n − 1 calls bcast(); the data in buf at the root are copied to every process.

Figure 2.7 Scatter operation: each of processes 0 to n − 1 calls scatter(); each process receives one element of the root's buf.

Figure 2.8 Gather operation: each of processes 0 to n − 1 calls gather(); data from every process are collected into the root's buf.

Figure 2.9 Reduce operation (addition): each of processes 0 to n − 1 calls reduce(); the data are combined with + into the root's buf.

Figure 2.10 Message passing between workstations using PVM: each workstation runs a PVM daemon and an application program (executable); messages are sent through the network via the daemons.

Figure 2.11 Multiple processes allocated to each processor (workstation), with messages sent through the network via each workstation's PVM daemon.

Figure 2.12 pvm_psend() and pvm_precv() system calls: process 1 packs an array of data into a send buffer and continues after pvm_psend(); process 2 waits at pvm_precv() and receives the data into an array.

Figure 2.13 PVM packing messages, sending, and unpacking: process_1 calls pvm_initsend(), packs x, s, and y with pvm_pkint(), pvm_pkstr(), and pvm_pkfloat(), and sends with pvm_send(process_2, ...); process_2 calls pvm_recv(process_1, ...) and unpacks with pvm_upkint(), pvm_upkstr(), and pvm_upkfloat() in the same order.

Master:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pvm3.h>
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000

main() {
  int mytid, tids[PROC];
  int n = NELEM, nproc = PROC;
  int no, i, who, msgtype;
  int data[NELEM], result[PROC], tot = 0;
  char fn[255];
  FILE *fp;

  mytid = pvm_mytid();                  /* Enroll in PVM */

  /* Start slave tasks */
  no = pvm_spawn(SLAVE, (char **)0, 0, "", nproc, tids);
  if (no < nproc) {
    printf("Trouble spawning slaves\n");
    for (i = 0; i < no; i++) pvm_kill(tids[i]);
    pvm_exit(); exit(1);
  }

  /* Open input file and initialize data */
  strcpy(fn, getenv("HOME"));
  strcat(fn, "/pvm3/src/rand_data.txt");
  if ((fp = fopen(fn, "r")) == NULL) {
    printf("Can't open input file %s\n", fn);
    exit(1);
  }
  for (i = 0; i < n; i++) fscanf(fp, "%d", &data[i]);

  /* Broadcast data to slaves */
  pvm_initsend(PvmDataDefault);
  msgtype = 0;
  pvm_pkint(&nproc, 1, 1);
  pvm_pkint(tids, nproc, 1);
  pvm_pkint(&n, 1, 1);
  pvm_pkint(data, n, 1);
  pvm_mcast(tids, nproc, msgtype);

  /* Get results from slaves */
  msgtype = 5;
  for (i = 0; i < nproc; i++) {
    pvm_recv(-1, msgtype);
    pvm_upkint(&who, 1, 1);
    pvm_upkint(&result[who], 1, 1);
    printf("%d from %d\n", result[who], who);
  }

  /* Compute global sum */
  for (i = 0; i < nproc; i++) tot += result[i];
  printf("The total is %d.\n\n", tot);

  pvm_exit();                           /* Program finished. Exit PVM */
  return(0);
}

Slave:

#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000

main() {
  int mytid, tids[PROC];
  int n, me, i, msgtype;
  int x, low, high, nproc, master;
  int data[NELEM], sum = 0;

  mytid = pvm_mytid();

  /* Receive data from master */
  msgtype = 0;
  pvm_recv(-1, msgtype);
  pvm_upkint(&nproc, 1, 1);
  pvm_upkint(tids, nproc, 1);
  pvm_upkint(&n, 1, 1);
  pvm_upkint(data, n, 1);

  /* Determine my slave number */
  for (i = 0; i < nproc; i++)
    if (mytid == tids[i]) { me = i; break; }

  /* Add my portion of data */
  x = n / nproc;
  low = me * x;
  high = low + x;
  for (i = low; i < high; i++)
    sum += data[i];

  /* Send result to master */
  pvm_initsend(PvmDataDefault);
  pvm_pkint(&me, 1, 1);
  pvm_pkint(&sum, 1, 1);
  msgtype = 5;
  master = pvm_parent();
  pvm_send(master, msgtype);

  pvm_exit();                           /* Exit PVM */
  return(0);
}

Figure 2.14 Sample PVM program (master and slave).
Figure 2.15 Unsafe message passing with libraries: both process 0 and process 1 call send(...,1,...) and recv(...,0,...) around a library routine lib() that itself sends and receives. (a) Intended behavior: each message reaches its matching call. (b) Possible behavior: a message intended for one recv() is taken by another.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
  int myid, numprocs;
  int data[MAXSIZE], i, x, low, high, myresult = 0, result;
  char fn[255];
  FILE *fp;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  if (myid == 0) {                      /* Open input file and initialize data */
    strcpy(fn, getenv("HOME"));
    strcat(fn, "/MPI/rand_data.txt");
    if ((fp = fopen(fn, "r")) == NULL) {
      printf("Can't open the input file: %s\n\n", fn);
      exit(1);
    }
    for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
  }

  /* Broadcast data */
  MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

  /* Add my portion of data */
  x = MAXSIZE / numprocs;
  low = myid * x;
  high = low + x;
  for (i = low; i < high; i++)
    myresult += data[i];
  printf("I got %d from %d\n", myresult, myid);

  /* Compute global sum */
  MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (myid == 0) printf("The sum is %d.\n", result);

  MPI_Finalize();
  return 0;
}
Figure 2.16 Sample MPI program.

Figure 2.17 Theoretical communication time: a straight line of time against the number of data items (n), offset by the startup time.

Figure 2.18 Growth of the function f(x) = 4x² + 2x + 12, lying between c₁g(x) = 2x² and c₂g(x) = 6x² for x > x₀.

Figure 2.19 Broadcast in a three-dimensional hypercube (nodes 000 to 111): the message spreads across one dimension per step, reaching all nodes in three steps.

Figure 2.20 Broadcast as a tree construction: starting from P000, the set of nodes holding the message doubles each step, reaching all eight nodes P000–P111 in three steps.

Figure 2.21 Broadcast in a mesh (numbers show the step at which each node receives the message).

Figure 2.22 Broadcast on an Ethernet network: one message from the source reaches all destinations.

Figure 2.23 1-to-N fan-out broadcast: the source issues messages sequentially to N destinations.

Figure 2.24 1-to-N fan-out broadcast on a tree structure, with sequential message issue at each node.

Figure 2.25 Space-time diagram of a parallel program: shading distinguishes computing, waiting, and message-passing system routines; arrows show messages between processes.

Figure 2.26 Program profile: number of repetitions or time against statement number or region of the program.

Figure 3.1 Disconnected computational graph (embarrassingly parallel problem): independent processes take input data and produce results with no interaction.

Figure 3.2 Practical embarrassingly parallel computational graph with dynamic process creation and the master-slave approach: the master spawn()s slaves, send()s initial data, and collects results with recv().

Figure 3.3 Partitioning a 640 × 480 display into regions mapped to individual processes: (a) a square region (80 × 80) for each process; (b) a row region (640 × 10) for each process.

Figure 3.4 Mandelbrot set (real and imaginary axes from −2 to +2).

Figure 3.5 Work pool approach: the work pool holds tasks identified by coordinates such as (xa, ya) through (xe, ye); each process takes a task, returns its results, and requests a new task.

Figure 3.6 Counter termination: a count of rows outstanding in slaves is incremented when a row is sent and decremented when a row is returned; termination occurs when all rows (0 to disp_height) have been sent and the count returns to zero.

Figure 3.7 Computing π by a Monte Carlo method: a circle of area π inscribed in a 2 × 2 square of total area 4.

Figure 3.8 Function f(x) = √(1 − x²), integrated over [0, 1], being used to compute π by a Monte Carlo method.

Figure 3.9 Parallel Monte Carlo integration: a separate random-number process supplies random numbers to the slaves, which return partial sums to the master on request.

Figure 3.10 Parallel computation of a sequence x₁, x₂, …, x₂ₖ.

Figure 4.1 Partitioning a sequence of numbers x₀ … xₙ₋₁ into m parts of n/m numbers each, adding each part to form partial sums, and adding the partial sums.

Figure 4.2 Tree construction: the initial problem is divided repeatedly until the final tasks are reached.

Figure 4.3 Dividing a list x₀ … xₙ₋₁ among eight processes, P0 to P7, by successive halving.

Figure 4.4 Partial summation: the reverse of Figure 4.3, combining partial results from P0–P7 back to the final sum at P0.

Figure 4.5 Part of a search tree: found/not-found results are combined by OR operations up the tree.

Figure 4.6 Quadtree.

Figure 4.7 Dividing an image area first into four parts, then dividing each part again.

Figure 4.8 Bucket sort: scatter the unsorted numbers into buckets, sort the contents of each bucket, and merge the lists to produce the sorted numbers.

Figure 4.9 One parallel version of bucket sort: p processors, one bucket per processor; each processor sorts the contents of its bucket, and the lists are merged.

Figure 4.10 Parallel version of bucket sort: each of p processors takes n/m of the unsorted numbers and scatters them into small buckets; the small buckets are emptied into the corresponding large buckets, whose contents are sorted and the lists merged.

Figure 4.11 "All-to-all" broadcast: each of processes 0 to n − 1 sends the contents of its send buffer to every other process's receive buffer.

Figure 4.12 Effect of "all-to-all" on an array: each process Pi starts with row i (Ai,0 … Ai,3) and finishes with column i (A0,i … A3,i), transposing the array.

Figure 4.13 Numerical integration of f(x) from a to b using rectangles of width δ, with interval endpoints p and q.

Figure 4.14 More accurate numerical integration using rectangles.

Figure 4.15 Numerical integration using the trapezoidal method: each strip of width δ between p and q contributes a trapezoid with parallel sides f(p) and f(q).

Figure 4.16 Adaptive quadrature construction (regions A, B, and C under f(x)).

Figure 4.17 Adaptive quadrature with false termination: region C computes to zero, falsely suggesting that subdivision can stop.

Figure 4.18 Clustering distant bodies: a distant cluster of bodies at distance r is approximated by its center of mass.

Figure 4.19 Recursive division of two-dimensional space: particles, subdivision directions, and the corresponding partial quadtree.

Figure 4.20 Orthogonal recursive bisection method.

Figure 4.21 Process diagram for Problem 4-12(b): each leaf adds log n numbers, and a binary tree of additions produces the result.

Figure 4.22 Bisection method for finding the zero crossing location of a function f(x), given points a and b where f(a) and f(b) have opposite signs.

Figure 4.23 Convex hull (Problem 4-22).

Figure 5.1 Pipelined processes (P0 to P5).

Figure 5.2 Pipeline for an unfolded loop: each stage holds one element a[0] … a[4] and adds it to the running sum passed from sin to sout.

Figure 5.3 Pipeline for a frequency filter: the signal f(t) passes from fin to fout through stages that successively remove frequencies f0, f1, f2, f3, …, producing the filtered signal.

Figure 5.4 Space-time diagram of a pipeline: with p processes (P0–P5) and m instances, execution takes p − 1 + m pipeline cycles.

Figure 5.5 Alternative space-time diagram: each instance 0 to 4 passes through processes P0 to P5 in turn.

Figure 5.6 Pipeline processing 10 data elements: (a) pipeline structure, with input sequence d9 … d0 entering processes P0 to P9; (b) timing diagram, showing completion after p − 1 + n cycles.

Figure 5.7 Pipeline processing where information sufficient to start the next process passes to the next stage before the end of the process: (a) processes with the same execution time; (b) processes not with the same execution time.

Figure 5.8 Partitioning processes onto processors: processes P0–P11 mapped four apiece onto processors 0, 1, and 2.

[Diagram: host computer feeding a multiprocessor connected in a line configuration]
Figure 5.9 Multiprocessor system with a line configuration.

[Diagram: pipeline stages P0–P4 holding the running partial sums Σ from i = 1 up to 1, 2, 3, 4, 5]
Figure 5.10 Pipelined addition.

[Diagram: master process feeding the numbers dn−1…d2d1d0 into slaves P0–Pn−1 arranged in a ring; the sum returns to the master]

Figure 5.11 Pipelined addition of numbers with a master process and ring configuration.

[Diagram: master process distributing the numbers d0, d1, …, dn−1 directly to slaves P0–Pn−1; the sum emerges from Pn−1]
Figure 5.12 Pipelined addition of numbers with direct access to slave processes.

[Diagram: time steps (cycles) 1–10 of the numbers 4, 3, 1, 2, 5 moving through processes P0–P4; each process keeps the largest number seen so far and passes smaller numbers on]
Figure 5.13 Steps in insertion sort with five numbers.

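The steps above can be simulated sequentially: each pipeline stage keeps the largest number it has seen and passes everything smaller on to the next stage, so stage 0 ends with the largest number, stage 1 the next largest, and so on. A sketch of that idea (the function name pipeline_insertion_sort and the fixed stage limit are our assumptions, not the book's message-passing code):

```c
#include <limits.h>

/* Simulate the insertion-sort pipeline: stage[s] holds the largest
   number stage s has seen; anything smaller travels on to stage s+1.
   After all n numbers are fed in, stage[s] holds the (s+1)th largest,
   so copying the stages back in reverse gives ascending order. */
void pipeline_insertion_sort(int a[], int n)
{
    int stage[256];                      /* one "process" per number, n <= 256 */
    for (int s = 0; s < n; s++)
        stage[s] = INT_MIN;              /* INT_MIN marks an empty stage */
    for (int i = 0; i < n; i++) {
        int x = a[i];
        for (int s = 0; s < n; s++) {    /* x travels down the line */
            if (x > stage[s]) { int t = stage[s]; stage[s] = x; x = t; }
            if (x == INT_MIN) break;     /* absorbed by an empty stage */
        }
    }
    for (int i = 0; i < n; i++)          /* stage 0 holds the largest */
        a[i] = stage[n - 1 - i];
}
```

Feeding in the figure's sequence 4, 3, 1, 2, 5 leaves the stages holding 5, 4, 3, 2, 1, i.e. the sorted list read back to front.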
[Diagram: series of numbers xn−1…x1x0 entering P0; each process compares, keeps the largest number seen (xmax), and passes smaller numbers to the next stage, so P0 holds the largest number, P1 the next largest, and so on]
Figure 5.14 Pipeline for sorting using insertion sort.

[Diagram: master feeding dn−1…d2d1d0 into the line P0–Pn−1; the sorted sequence returns to the master along the same bidirectional line]
Figure 5.15 Insertion sort with results returned to the master process using a bidirectional line configuration.

[Diagram: sorting phase of 2n − 1 cycles followed by n cycles returning the sorted numbers, shown for n = 5 on processes P0–P4]
Figure 5.16 Insertion sort with results returned.

[Diagram: series of numbers xn−1…x1x0 entering P0; each process holds one prime (1st, 2nd, 3rd, …), compares for multiples, and passes on only numbers that are not multiples of its prime]
Figure 5.17 Pipeline for sieve of Eratosthenes.

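The sieve pipeline can be sketched serially in the same spirit: the first number a stage receives becomes its prime, and only non-multiples are passed on to create further stages. A sketch (sieve_pipeline is our name; the real pipeline would run the stages concurrently):

```c
/* Simulate the sieve pipeline of Figure 5.17: each stage keeps the
   first number it receives (a prime) and forwards only numbers that
   are not multiples of it.  Returns the number of primes found. */
int sieve_pipeline(int from, int to, int primes[], int max_primes)
{
    int nstages = 0;                      /* stages created so far */
    for (int x = from; x <= to; x++) {
        int passed = 1;
        for (int s = 0; s < nstages; s++)
            if (x % primes[s] == 0) { passed = 0; break; }
        if (passed && nstages < max_primes)
            primes[nstages++] = x;        /* a new stage holds a new prime */
    }
    return nstages;
}
```

For the range 2–20 the stages end up holding 2, 3, 5, 7, 11, 13, 17, 19.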
[Diagram: P0 computes x0 and sends it on; P1 receives x0 and computes x1; P2 and P3 similarly receive the earlier unknowns and compute x2 and x3]

Figure 5.18 Solving an upper triangular set of linear equations using a pipeline.

[Diagram: processes P0–P5 against time; the first value is passed onward from P0 and the final computed value appears at the last process]

Figure 5.19 Pipeline processing using back substitution.

[Diagram: per-processor operation sequences against time — P0: divide, send(x0), end; each later Pi: recv(xj) and multiply/add for each arriving xj (forwarding it with send(xj)), then divide/subtract to obtain xi, send(xi), end]

Figure 5.20 Operations in back substitution pipeline.

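The multiply/add and divide/subtract steps of Figure 5.20 amount to the following serial computation: x0 is produced first, and each later xi combines the values already passed down the pipeline. A sketch with a fixed 4×4 system (the name back_substitute and the size N are our assumptions):

```c
#define N 4

/* Back substitution for the triangular system of Figure 5.18: the ith
   equation is a[i][0]x0 + ... + a[i][i]xi = b[i].  Each stage does the
   multiply/add steps for the x values it receives, then one
   divide/subtract to produce its own unknown. */
void back_substitute(double a[N][N], double b[N], double x[N])
{
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int j = 0; j < i; j++)
            sum += a[i][j] * x[j];       /* multiply/add on received xj */
        x[i] = (b[i] - sum) / a[i][i];   /* divide/subtract, then send xi */
    }
}
```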
[Diagram: four pipeline stages, each with inputs x and a (x1–x4, a1–a4); the sequence y4y3y2y1 enters the first stage's yin and the output leaves the last stage's yout]

Figure 5.21 Pipeline for Problem 5-9.

[Diagram: digitized audio input driving a display — (a) pipeline solution; (b) direct decomposition]

Figure 5.22 Audio histogram display.

[Diagram: processes P0, P1, …, Pn−1 active for different lengths of time, then waiting at the barrier]

Figure 6.1 Processes reaching the barrier at different times.

[Diagram: processes P0, P1, …, Pn−1 each calling Barrier(); processes wait until all reach their barrier call]

Figure 6.2 Library call barriers.

[Diagram: processes P0, P1, …, Pn−1 calling Barrier(); a centralized counter C is incremented on each call and checked against n]

Figure 6.3 Barrier using a centralized counter.

Master:

    /* arrival phase */
    for (i = 0; i < n; i++)
        recv(Pany);
    /* departure phase */
    for (i = 0; i < n; i++)
        send(Pi);

Each slave process:

    Barrier:
        send(Pmaster);
        recv(Pmaster);

Figure 6.4 Barrier implementation in a message-passing system.

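On a shared memory machine the same two-phase barrier can be built around the centralized counter of Figure 6.3. A sketch using Pthreads primitives (the barrier_t type and the phase counter, which prevents a fast process from racing into the next barrier, are our additions):

```c
#include <pthread.h>

/* Centralized counter barrier: the counter is incremented on arrival
   and the last arrival (count == n) releases everyone. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_here;
    int count, n, phase;
} barrier_t;

void barrier_init(barrier_t *b, int n)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_here, NULL);
    b->count = 0; b->n = n; b->phase = 0;
}

void barrier_wait(barrier_t *b)
{
    pthread_mutex_lock(&b->lock);
    int phase = b->phase;                /* remember which barrier this is */
    if (++b->count == b->n) {            /* last to arrive: departure phase */
        b->count = 0;
        b->phase++;
        pthread_cond_broadcast(&b->all_here);
    } else {
        while (phase == b->phase)        /* arrival phase: wait for count n */
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}
```

The predicate loop around pthread_cond_wait() is required because condition variables permit spurious wakeups.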
[Diagram: tree barrier over P0–P7 — synchronizing messages sent up the tree on arrival at the barrier, then back down the tree for departure from the barrier]

Figure 6.5 Tree barrier.

[Diagram: butterfly barrier over P0–P7 — 1st stage pairs processes distance 1 apart, 2nd stage distance 2, 3rd stage distance 4]

Figure 6.6 Butterfly construction.

[Diagram: the instruction a[] = a[] + k executed as a[0]=a[0]+k; a[1]=a[1]+k; … a[n-1]=a[n-1]+k; one element per processor]

Figure 6.7 Data parallel computation.

[Diagram: prefix sum of x0–x15 in four steps; in step j (j = 0, 1, 2, 3) element i adds in the partial sum 2^j places to its left, so after the final step element i holds the sum of x0 through xi]

Figure 6.8 Data parallel prefix sum operation.

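The steps of Figure 6.8 can be sketched serially; traversing the array right to left inside each step makes every element read its neighbour's value from the *previous* step, mimicking the simultaneous reads of the data parallel version (prefix_sum is our name):

```c
/* Data parallel prefix sum: in step j every element i >= 2^j adds in
   the value 2^j places to its left.  After log2(n) steps element i
   holds x[0] + ... + x[i].  The right-to-left sweep ensures x[i-stride]
   still holds its value from the previous step when it is read. */
void prefix_sum(int x[], int n)
{
    for (int stride = 1; stride < n; stride *= 2)
        for (int i = n - 1; i >= stride; i--)
            x[i] += x[i - stride];
}
```

Applied to an array of sixteen 1s, the result is 1, 2, 3, …, 16, as in the figure's final step.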
[Graph: computed value converging toward the exact value; the error shrinks between iterations t and t + 1]

Figure 6.9 Convergence rate.

[Diagram: processes 0 to n − 1 each placing its data xi in its send buffer; after each calls Allgather(), every receive buffer holds x0…xn−1]

Figure 6.10 Allgather operation.

[Graph: execution time (τ = 1) against number of processors p from 0 to 32; computation time falls with p, communication time rises, and the overall time passes through a minimum]

Figure 6.11 Effects of computation and communication in Jacobi iteration.

[Diagram: metal plate, enlarged to show point hi,j and its four neighbours hi−1,j, hi+1,j, hi,j−1, hi,j+1]

Figure 6.12 Heat distribution problem.

[Diagram: points x1…xk2 numbered row by row, k per row; interior point xi has neighbours xi−k, xi−1, xi+1, xi+k]

Figure 6.13 Natural ordering of heat distribution problem.

Each process Pi,j (row i, column j of the mesh) executes:

    send(g, Pi-1,j);
    send(g, Pi+1,j);
    send(g, Pi,j-1);
    send(g, Pi,j+1);
    recv(w, Pi-1,j);
    recv(x, Pi+1,j);
    recv(y, Pi,j-1);
    recv(z, Pi,j+1);

Figure 6.14 Message passing for heat distribution problem.

[Diagram: partitioning among P0…Pp−1 as square blocks or as strips (columns)]

Figure 6.15 Partitioning heat distribution problem.

[Diagram: an n × n region divided into square blocks or into strips of width n/p; the lengths of the partition boundaries determine the communication]

Figure 6.16 Communication consequences of partitioning.

[Graph: tstartup (0–2000) against processors p (1–1000); the strip partition is best above the curve and the block partition best below it]

Figure 6.17 Startup times for block and strip partitions.

[Diagram: arrays held by process i and process i+1; the row of points on each side of the boundary is copied into the other process's ghost points]

Figure 6.18 Configuring the array into contiguous rows for each process, with ghost points.

[Diagram: 10 ft × 10 ft room with a 4 ft section at 100°C; the remaining boundary is at 20°C]

Figure 6.19 Room for Problem 6-14.

[Diagram: road junction with vehicles]

Figure 6.20 Road junction for Problem 6-16.

[Diagram: airflow over a surface; actual dimensions selected at will]

Figure 6.21 Figure for Problem 6-23.

[Diagram: processors P0–P5 against time — (a) imperfect load balancing leading to increased execution time; (b) perfect load balancing, all processors finishing at time t]

Figure 7.1 Load balancing.

[Diagram: master process holding a queue of tasks (the work pool); slave “worker” processes request tasks, are sent tasks, and possibly submit new tasks]

Figure 7.2 Centralized work pool.

[Diagram: master Pmaster holding the initial tasks; slave processes M0…Mn−1 each maintain a local work pool]

Figure 7.3 A distributed work pool.

[Diagram: processes exchanging requests and tasks directly with one another]

Figure 7.4 Decentralized work pool.

[Diagram: slaves Pi and Pj, each with a local selection algorithm issuing requests for tasks to the other]

Figure 7.5 Decentralized selection algorithm requesting tasks between slaves.

[Diagram: master process P0 at the head of a pipeline of processes P1, P2, P3, …, Pn−1]

Figure 7.6 Load balancing using a pipeline structure.

[Diagram: each stage holds a communication process Pcomm and a task process Ptask; Pcomm requests a task if its buffer is empty and sends a task on request if its buffer is full; Ptask takes a task from the buffer when free]

Figure 7.7 Using a communication process in line load balancing.

[Diagram: tree rooted at P0 with children P1 and P2 and descendants P3–P6; a task is passed down when requested]

Figure 7.8 Load balancing using a tree.

[Diagram: a process becomes active on receiving its first task from its parent, exchanges tasks and acknowledgments with other processes, and becomes inactive after sending the final acknowledgment to its parent]

Figure 7.9 Termination using message acknowledgments.

[Diagram: ring of processes P0, P1, P2, …, Pn−1; the token is passed to the next processor when the local termination condition is reached]

Figure 7.10 Ring termination detection algorithm.

[Diagram: the token is forwarded when both the token has arrived AND the process has terminated locally]

Figure 7.11 Process algorithm for local termination.

[Diagram: ring P0…Pn−1 with a task passed from Pi back to an earlier process Pj]

Figure 7.12 Passing task to previous processes.

[Diagram: tree of AND operations combining the terminated conditions of the processes]

Figure 7.13 Tree termination.

[Diagram: base camp A, possible intermediate camps B and D, summit F]

Figure 7.14 Climbing a mountain.

[Graph: vertices A–F with weighted edges A–B 10, B–C 8, B–D 13, B–E 24, B–F 51, C–D 14, D–E 9, E–F 17]

Figure 7.15 Graph of mountain climb.

(a) Adjacency matrix (source rows, destination columns; ∞ = no edge):

        A   B   C   D   E   F
    A   ∞  10   ∞   ∞   ∞   ∞
    B   ∞   ∞   8  13  24  51
    C   ∞   ∞   ∞  14   ∞   ∞
    D   ∞   ∞   ∞   ∞   9   ∞
    E   ∞   ∞   ∞   ∞   ∞  17
    F   ∞   ∞   ∞   ∞   ∞   ∞

(b) Adjacency list (each source vertex links to (destination, weight) pairs, NULL-terminated):

    A → (B, 10)
    B → (C, 8) → (D, 13) → (E, 24) → (F, 51)
    C → (D, 14)
    D → (E, 9)
    E → (F, 17)
    F →

Figure 7.16 Representing a graph.

[Diagram: vertex i with current distance di connected by an edge of weight wi,j to vertex j with distance dj]

Figure 7.17 Moore's shortest-path algorithm.

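The relaxation step of Figure 7.17 updates dj whenever di + wi,j improves on it. A serial sketch that simply repeats sweeps until no distance changes (the function name moore and the sweep ordering are our choices; the parallel version relaxes vertices concurrently from a queue):

```c
#include <limits.h>

#define NV 6             /* vertices A..F of Figure 7.15 */
#define INF INT_MAX      /* no edge / distance not yet found */

/* Moore's shortest-path relaxation: whenever di + wi,j < dj, set
   dj = di + wi,j.  Sweeps repeat until no distance changes. */
void moore(int w[NV][NV], int dist[NV], int source)
{
    for (int i = 0; i < NV; i++) dist[i] = INF;
    dist[source] = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < NV; i++) {
            if (dist[i] == INF) continue;       /* i not reached yet */
            for (int j = 0; j < NV; j++)
                if (w[i][j] != INF && dist[i] + w[i][j] < dist[j]) {
                    dist[j] = dist[i] + w[i][j];
                    changed = 1;
                }
        }
    }
}
```

On the mountain-climbing graph of Figure 7.15 this finds the A-to-F distance 49 via A–B–D–E–F.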
[Diagram: master process starts at the source vertex; each of the other processes (A, B, C, …) holds one vertex with its weights w[] and passes newly found distances dist to the others]

Figure 7.18 Distributed graph search.

[Diagram: maze with an entrance, a search path, and an exit]

Figure 7.19 Sample maze for Problem 7-9.

[Diagram: plan of rooms with an entrance and gold in one room]

Figure 7.20 Plan of rooms for Problem 7-10.

[Diagram: rooms A and B joined by a door, represented as graph vertices]

Figure 7.21 Graph representation for Problem 7-10.

[Diagram: processors, each with a cache, and memory modules all attached to a single bus]

Figure 8.1 Shared memory multiprocessor using a single bus.

TABLE 8.1 SOME EARLY PARALLEL PROGRAMMING LANGUAGES

Language Originator/date Comments


Concurrent Pascal Brinch Hansen, 1975a Extension to Pascal
Ada U.S. Dept. of Defense, 1979b Completely new language
Modula-P Bräunl, 1986c Extension to Modula 2
C* Thinking Machines, 1987d Extension to C for SIMD systems
Concurrent C Gehani and Roome, 1989e Extension to C
Fortran D Fox et al., 1990f Extension to Fortran for data parallel programming

a. Brinch Hansen, P. (1975), “The Programming Language Concurrent Pascal,” IEEE Trans. Software Eng.,
Vol. 1, No. 2 (June), pp. 199–207.
b. U.S. Department of Defense (1981), “The Programming Language Ada Reference Manual,” Lecture
Notes in Computer Science, No. 106, Springer-Verlag, Berlin.
c. Bräunl, T., R. Norz (1992), Modula-P User Manual, Computer Science Report, No. 5/92 (August), Univ.
Stuttgart, Germany.
d. Thinking Machines Corp. (1990), C* Programming Guide, Version 6, Thinking Machines System Docu-
mentation.
e. Gehani, N., and W. D. Roome (1989), The Concurrent C Programming Language, Silicon Press, New
Jersey.
f. Fox, G., S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu (1990), Fortran D
Language Specification, Technical Report TR90-141, Dept. of Computer Science, Rice University.

[Diagram: main program FORKs spawned processes, which may themselves FORK; JOINs collect the processes back]

Figure 8.2 FORK-JOIN construct.

[Diagram: (a) a process has its own code, heap, stack, IP, interrupt routines, and files; (b) threads within a process share the code, heap, interrupt routines, and files, but each thread has its own stack and IP]

Figure 8.3 Differences between a process and threads.

Main program:

    pthread_create(&thread1, NULL, proc1, &arg);
    .
    .
    pthread_join(thread1, *status);

Spawned thread (thread1):

    proc1(&arg)
    {
        .
        .
        return(*status);
    }
Figure 8.4 pthread_create() and pthread_join().

[Diagram: main program issuing successive pthread_create() calls; each detached thread runs to its own termination and is not joined]
Figure 8.5 Detached threads.

[Diagram: processes 1 and 2 each read shared variable x, add 1, and write back; with an unfortunate interleaving one increment is lost]

Figure 8.6 Conflict in accessing shared variable.

Each process executes:

    while (lock == 1) do_nothing;   /* busy wait */
    lock = 1;                       /* enter critical section */
    .
    critical section
    .
    lock = 0;                       /* leave critical section */

Process 2 spins in its while loop until process 1 resets lock to 0.
Figure 8.7 Control of critical sections through busy waiting.

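As written, the busy-waiting code is unsafe: both processes can observe lock == 0 and enter the critical section together. Closing the window requires an atomic read-modify-write. A sketch assuming GCC-style __sync builtins (the names acquire/release are ours):

```c
/* Spin lock built on an atomic test-and-set: the test of lock and the
   setting of lock to 1 happen as one indivisible operation, so only
   one caller can see the old value 0 and enter. */
static int lock = 0;

void acquire(void)
{
    while (__sync_lock_test_and_set(&lock, 1) == 1)
        ;                             /* spin: someone else holds the lock */
}

void release(void)
{
    __sync_lock_release(&lock);       /* atomically resets lock to 0 */
}
```

Real code would normally use pthread_mutex_lock() or C11 atomic_flag instead of rolling its own spin lock.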
[Diagram: (a) two-process deadlock — P1 holds R1 and requests R2 while P2 holds R2 and requests R1; (b) the same circular wait among n processes P1…Pn and resources R1…Rn]

Figure 8.8 Deadlock (deadly embrace).

[Diagram: main memory blocks 0–7; one block, selected by the address tag, is held in the caches of both processor 1 and processor 2 even though each processor uses different words within it]
Figure 8.9 False sharing in caches.

[Diagram: shared memory holding sum, the array a[], and addr]

Figure 8.10 Shared memory locations for Section 8.4.1 program example.

[Diagram: shared memory holding global_index, sum, the array a[], and addr]

Figure 8.11 Shared memory locations for Section 8.4.2 program example.

TABLE 8.2 LOGIC CIRCUIT DESCRIPTION FOR FIGURE 8.12

Gate Function Input 1 Input 2 Output


1 AND Test1 Test2 Gate1
2 NOT Gate1 Output1
3 OR Test3 Gate1 Output2

[Diagram: gate 1 (AND) with inputs Test1 and Test2; its output Gate1 drives gate 2 (NOT, output Output1) and gate 3 (OR, second input Test3, output Output2)]

Figure 8.12 Sample logic circuit.

[Diagram: frog crossing a river on logs moving downstream]

Figure 8.13 River and frog for Problem 8-23.

[Diagram: master signals a pool of slave threads; a thread services each request]
Figure 8.14 Thread pool for Problem 8-24.

[Diagram: a[i] compared with each of a[0]…a[n-1]; a counter x is incremented for each smaller element and finally b[x] = a[i]]

Figure 9.1 Finding the rank in parallel.

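Rank sort places each element directly at its final position: the rank of a[i] is the number of elements smaller than it. Every rank can be computed independently, which is what the parallel version exploits. A serial sketch assuming distinct values (rank_sort is our name):

```c
/* Rank sort: count how many elements are smaller than a[i], then
   place a[i] directly at b[rank].  O(n^2) comparisons serially, but
   each rank computation is independent of the others. */
void rank_sort(const int a[], int b[], int n)
{
    for (int i = 0; i < n; i++) {
        int x = 0;                       /* the counter of Figure 9.1 */
        for (int j = 0; j < n; j++)
            if (a[j] < a[i]) x++;
        b[x] = a[i];
    }
}
```

With n processors (one per element, as in Figure 9.1) each rank takes n compare steps; with n² processors the comparisons and the tree addition of Figure 9.2 reduce this further.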
[Diagram: a[i] compared with a[0]–a[3] in parallel, each comparison yielding 0 or 1; the results are added pairwise in a tree to produce the rank 0–4]

Figure 9.2 Parallelizing the rank computation.

[Diagram: master holds a[] and b[]; slaves read the numbers and each places its selected number into b[]]

Figure 9.3 Rank sort using a master and slaves.

[Diagram: sequence of steps — 1: P1 sends A to P2; 2: P2 compares A and B and keeps the larger (if A > B it loads A, else it keeps B); 3: P2 returns the smaller (if A > B it sends B, else A) to P1]

Figure 9.4 Compare and exchange on a message-passing system — Version 1.

[Diagram: step 1: P1 sends A to P2; step 2: P2 sends B to P1; step 3: both compare — P1 keeps the smaller (loads B if A > B), P2 keeps the larger (loads A if A > B)]

Figure 9.5 Compare and exchange on a message-passing system — Version 2.

[Diagram: P1 sends its four numbers to P2; P2 merges the two sublists, keeps the four higher numbers, and returns the four lower numbers to P1]

Figure 9.6 Merging two sublists — Version 1.

[Diagram: P1 and P2 exchange their original numbers; each merges all eight — P1 keeps the lower four as its final numbers, P2 keeps the higher four]

Figure 9.7 Merging two sublists — Version 2.

[Diagram: bubble sort of 4 2 7 8 5 1 3 6 — phase 1 bubbles the largest number to the right end, phase 2 places the next largest, and so on]

Figure 9.8 Steps in bubble sort.

[Diagram: phases 1–4 of bubble sort overlapped in time; each phase starts as soon as the previous phase has moved past]

Figure 9.9 Overlapping bubble sort actions in a pipeline.

[Diagram: steps 0–7 sorting 4 2 7 8 5 1 3 6 on P0–P7; even and odd pairs are compared and exchanged on alternate steps until the sorted list 1 2 3 4 5 6 7 8 appears]

Figure 9.10 Odd-even transposition sort sorting eight numbers.

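The alternating steps of Figure 9.10 can be sketched serially; each parallel step becomes one pass over either the even pairs (a[0],a[1]), (a[2],a[3]), … or the odd pairs (a[1],a[2]), (a[3],a[4]), …, and n steps suffice to sort n numbers (odd_even_sort is our name):

```c
/* Odd-even transposition sort: n steps alternating compare-and-
   exchange on even-indexed pairs and odd-indexed pairs.  Each inner
   loop corresponds to one fully parallel step of Figure 9.10. */
void odd_even_sort(int a[], int n)
{
    for (int step = 0; step < n; step++)
        for (int i = step % 2; i + 1 < n; i += 2)
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
            }
}
```

Since the pairs touched within one step are disjoint, every inner loop can run as n/2 simultaneous compare-and-exchange operations, giving the O(n) parallel time of the figure.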
[Diagram: mesh with the smallest number at one corner and the largest at the opposite corner, rows sorted in alternating directions]

Figure 9.11 Snakelike sorted list.

[Diagram: shearsort on a 4 × 4 mesh — (a) original placement of numbers; (b) phase 1 row sort in alternating directions; (c) phase 2 column sort; (d) phase 3 row sort; (e) phase 4 column sort; (f) final row sort, leaving snakelike order 1–16]

Figure 9.12 Shearsort.

[Diagram: (a) operations between elements in rows; (b) transpose operation; (c) operations between elements in rows (originally columns)]

Figure 9.13 Using the transpose operation to maintain operations in rows.

[Diagram: mergesort of 4 2 7 8 5 1 3 6 — the list is divided across P0, then P0/P4, then P0/P2/P4/P6, then all eight processes; sorted sublists are merged back up the tree to 1 2 3 4 5 6 7 8 at P0]

Figure 9.14 Mergesort using tree allocation of processes.

[Diagram: quicksort of 4 2 7 8 5 1 3 6 — P0 partitions about a pivot and passes one sublist to P4; further pivots split the work to P2 and P6, then P1 and P7, yielding the sorted list]

Figure 9.15 Quicksort using tree allocation of processes.

[Diagram: quicksort of 4 2 7 8 5 1 3 6 with each pivot (4; then 3 and 5; then 1 and 7; then 2, 6, 8) withheld by the process that used it]

Figure 9.16 Quicksort showing pivot withheld in processes.

[Diagram: work pool of sublists; slave processes request a sublist, partition it, and return new sublists to the pool]

Figure 9.17 Work pool implementation of quicksort.

[Diagram: phase 1 splits the list about pivot p1 between nodes 000–011 and 100–111; phase 2 splits about p2 and p3; phase 3 about p4–p7, one pivot per pair of nodes]

Figure 9.18 Hypercube quicksort algorithm when the numbers are originally in node 000.

[Diagram: as Figure 9.18, but each phase begins with a broadcast of the pivot (p1; then p2 and p3; then p4–p7) to the nodes involved]

Figure 9.19 Hypercube quicksort algorithm when numbers are distributed among nodes.

[Diagram: 3-cube with nodes 000–111; phase 1 exchanges across the highest-order dimension, phase 2 across the middle dimension, phase 3 across the lowest]

Figure 9.20 Hypercube quicksort communication.

[Diagram: the same three phases of pivoting and pivot broadcast with the nodes arranged in Gray code order 000, 001, 011, 010, 110, 111, 101, 100]

Figure 9.21 Quicksort hypercube algorithm with Gray code ordering.

Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers
Barry Wilkinson and Michael Allen  Prentice Hall, 1998 189
[Diagram: sorted lists a[] = 2 4 5 8 and b[] = 1 3 6 7; merging the even-indexed elements gives c[] = 1 2 5 6 and merging the odd-indexed elements gives d[] = 3 4 7 8; a final rank of compare-and-exchange operations yields the final sorted list e[] = 1 2 3 4 5 6 7 8.]
Figure 9.22 Odd-even merging of two sorted lists.
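The merge of Figure 9.22 can be sketched sequentially. This is illustrative Python, not from the text; `merge` here is a plain two-way merge standing in for the recursive even/odd merges, and equal even-length inputs are assumed:

```python
def merge(a, b):
    # Plain two-way merge of two sorted lists.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def odd_even_merge(a, b):
    # Merge the even-indexed and odd-indexed elements separately,
    # then combine with one rank of compare-and-exchange operations.
    c = merge(a[::2], b[::2])    # even indices
    d = merge(a[1::2], b[1::2])  # odd indices
    e = [c[0]]
    for i in range(len(d) - 1):
        e += [min(c[i + 1], d[i]), max(c[i + 1], d[i])]
    e.append(d[-1])
    return e
```

With the lists of Figure 9.22, `odd_even_merge([2, 4, 5, 8], [1, 3, 6, 7])` reproduces c[], d[], and e[] as shown.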
[Diagram: inputs a1–an and b1–bn feed an even mergesort and an odd mergesort; their outputs c1–c2n pass through a final rank of compare-and-exchange units.]
Figure 9.23 Odd-even mergesort.
[Diagram: value plotted against a0, a1, a2, a3, …, an−2, an−1; (a) single maximum, (b) single maximum and single minimum.]
Figure 9.24 Bitonic sequences.
[Diagram: bitonic sequence 3 5 8 9 7 4 2 1; one rank of compare-and-exchange operations produces 3 4 2 1 and 7 5 8 9, each itself a bitonic sequence.]
Figure 9.25 Creating two bitonic sequences from one bitonic sequence.
[Diagram: bitonic sequence 3 5 8 9 7 4 2 1 transformed by successive compare-and-exchange ranks: 3 4 2 1 7 5 8 9, then 2 1 3 4 7 5 8 9, then the sorted list 1 2 3 4 5 7 8 9.]
Figure 9.26 Sorting a bitonic sequence.
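The splitting step of Figure 9.25, applied recursively as in Figure 9.26, sorts any bitonic sequence. A minimal sketch (illustrative code, not from the text; the length is assumed to be a power of two):

```python
def sort_bitonic(seq):
    # Sort a bitonic sequence into ascending order.  One
    # compare-and-exchange rank (ai with ai+n/2) splits it into two
    # bitonic sequences; each half is then sorted recursively.
    n = len(seq)
    if n == 1:
        return seq
    half = n // 2
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    return sort_bitonic(lo) + sort_bitonic(hi)
```

On the sequence of Figure 9.26, the first split gives 3 4 2 1 and 7 5 8 9, matching the figure.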
[Diagram: unsorted numbers pass through ranks of bitonic sorting operations, with alternating directions of increasing numbers, to produce a sorted list.]
Figure 9.27 Bitonic mergesort.
[Diagram: bitonic mergesort on 8 3 4 7 9 2 1 5; in each step ai is compared and exchanged with ai+n/2 (n numbers). Step 1 (n = 2, ai with ai+1) forms bitonic lists of four numbers: 3 8 7 4 2 9 5 1. Steps 2–3 (n = 4 with ai+2, then n = 2 with ai+1) form a bitonic list of eight numbers: 3 4 7 8 5 9 2 1, then 3 4 7 8 9 5 2 1. Steps 4–6 sort the bitonic list: split with n = 8 (ai with ai+4) gives 3 4 2 1 9 5 7 8; split with n = 4 (ai with ai+2) gives 2 1 3 4 7 5 9 8; sort with n = 2 (ai with ai+1) gives 1 2 3 4 5 7 8 9. The lower values move to the left and the higher values to the right.]
Figure 9.28 Bitonic mergesort on eight numbers.
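The full procedure of Figure 9.28 can be sketched recursively: sort one half ascending and the other descending so their concatenation is bitonic, then merge. Illustrative Python, not from the text; a power-of-two length is assumed:

```python
def bitonic_merge(seq, ascending):
    # Compare-and-exchange ai with ai+n/2, then recurse on each half.
    n = len(seq)
    if n == 1:
        return seq
    half = n // 2
    f = min if ascending else max
    g = max if ascending else min
    lo = [f(seq[i], seq[i + half]) for i in range(half)]
    hi = [g(seq[i], seq[i + half]) for i in range(half)]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)

def bitonic_mergesort(a, ascending=True):
    # Sort one half ascending and the other descending, forming a
    # bitonic sequence, then merge it in the requested direction.
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    first = bitonic_mergesort(a[:half], True)
    second = bitonic_mergesort(a[half:], False)
    return bitonic_merge(first + second, ascending)
```

Tracing `bitonic_mergesort([8, 3, 4, 7, 9, 2, 1, 5])` reproduces the six steps of Figure 9.28.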
[Diagram: two four-element lists exchanging elements. Step 1: {25, 28, 50, 88} and {42, 43, 80, 98}; step 2: {25, 28, 42, 50} and {43, 80, 88, 98}; step 3: {25, 28, 42, 43} and {50, 80, 88, 98}. Terminates when insertions occur at the top/bottom of the lists.]
Figure 9.29 Compare-and-exchange algorithm for Problem 9-5.
[Diagram: matrix with rows a0,0 … a0,m−1 through an−1,0 … an−1,m−1, with row and column labeled.]
Figure 10.1 An n × m matrix.
[Diagram: row i of A is multiplied element by element with column j of B and the results summed to give ci,j.]
Figure 10.2 Matrix multiplication, C = A × B.
[Diagram: row i of A is multiplied with vector b and the results summed to give ci.]
Figure 10.3 Matrix-vector multiplication, c = A × b.
[Diagram: A and B partitioned into q × q arrays of submatrices; corresponding submatrix products are formed and the results summed to give the submatrices of C.]
Figure 10.4 Block matrix multiplication.
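Block matrix multiplication (Figure 10.4) can be sketched sequentially; each process in a parallel version would own one block of C and execute the innermost accumulation. Illustrative Python, not from the text; n is assumed divisible by the block size s:

```python
def block_matmul(A, B, s):
    # Multiply n x n matrices held as lists of lists, working on
    # s x s submatrices: C_pq = sum over r of A_pr * B_rq.
    n = len(A)
    q = n // s                      # number of blocks per dimension
    C = [[0] * n for _ in range(n)]
    for bi in range(q):
        for bj in range(q):
            for br in range(q):
                # Accumulate the (bi, br) x (br, bj) submatrix product.
                for i in range(bi * s, (bi + 1) * s):
                    for k in range(br * s, (br + 1) * s):
                        aik = A[i][k]
                        for j in range(bj * s, (bj + 1) * s):
                            C[i][j] += aik * B[k][j]
    return C
```

The result is identical to an unblocked multiply; blocking only changes the order of the accumulations (and, in parallel, which process performs them).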
[Diagram: (a) 4 × 4 matrices A and B partitioned into 2 × 2 submatrices A0,0 … A1,1 and B0,0 … B1,1; (b) A0,0 × B0,0 + A0,1 × B1,0 written out element by element, showing that the sum equals C0,0, whose entries are a0,0b0,0 + a0,1b1,0 + a0,2b2,0 + a0,3b3,0 and so on.]
Figure 10.5 Submatrix multiplication.
[Diagram: processor Pi,j holds row i of A (a[i][]) and column j of B (b[][j]) and computes c[i][j].]
Figure 10.6 Direct implementation of matrix multiplication.
[Diagram: P0–P3 compute the products a0,0b0,0, a0,1b1,0, a0,2b2,0, a0,3b3,0; partial sums are formed at P0 and P2, and a final addition at P0 produces c0,0.]
Figure 10.7 Accumulation using a tree construction.
[Diagram: eight processes P0–P7 compute the submatrix products of App, Apq, Aqp, Aqq with Bpp, Bpq, Bqp, Bqq; pairwise sums (P0 + P1, P2 + P3, P4 + P5, P6 + P7) give Cpp, Cpq, Cqp, Cqq.]
Figure 10.8 Submatrix multiplication and summation.
[Diagram: on the mesh of processors Pi,j, elements of A move left along rows and elements of B move up along columns.]
Figure 10.9 Movement of A and B elements.
[Diagram: row i of A is shifted i places left and column j of B is shifted j places up, so that Pi,j holds ai,j+i and bi+j,j.]
Figure 10.10 Step 2 — Alignment of elements of A and B.
[Diagram: after alignment, the elements of A at each Pi,j shift one place left and the elements of B shift one place up in each step.]
Figure 10.11 Step 4 — One-place shift of elements of A and B.
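The alignment of Figure 10.10 followed by the repeated one-place shifts of Figure 10.11 is Cannon's algorithm. It can be simulated sequentially with one matrix element per "process"; a sketch, not from the text:

```python
def cannon_matmul(A, B):
    # Simulate Cannon's algorithm on n x n matrices, one element per
    # process.  Align: shift row i of A left i places and column j of
    # B up j places; then n multiply-accumulate-and-shift steps.
    n = len(A)
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] += a[i][j] * b[i][j]
        # Shift a one place left along rows, b one place up along columns.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C
```

After the alignment, process (i, j) sees the pairs ai,k and bk,j for every k exactly once as the shifts proceed, so C accumulates the full inner product.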
[Diagram: the columns of B (b0,0 … b3,3) are pumped down into a 4 × 4 array of cells while the rows of A (a0,0 … a3,3) enter from the left, each row delayed by one cycle relative to the one above; the results c0,0 … c3,3 accumulate in the cells.]
Figure 10.12 Matrix multiplication using a systolic array.
[Diagram: the elements b0–b3 are pumped down into a linear array while the rows of A enter from the left; the results c0–c3 accumulate in the cells.]
Figure 10.13 Matrix-vector multiplication using a systolic array.
[Diagram: stepping through the rows: element aji in row j, below row i, is eliminated using row i; the columns to the left are already cleared to zero.]
Figure 10.14 Gaussian elimination.
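The sequential elimination that Figure 10.14 depicts can be written directly; this is a minimal sketch (no partial pivoting, so a nonzero diagonal is assumed), not code from the text:

```python
def gaussian_eliminate(a, b):
    # Forward elimination: step through rows i; for each row j below,
    # subtract a multiple of row i so column i below the diagonal
    # becomes zero.  Modifies a and b in place.
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            m = a[j][i] / a[i][i]       # assumes a[i][i] != 0
            for k in range(i, n):
                a[j][k] -= m * a[i][k]
            b[j] -= m * b[i]
    return a, b

def back_substitute(a, b):
    # Solve the resulting upper-triangular system.
    n = len(a)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x
```

In the parallel versions that follow (Figures 10.15 and 10.16), the inner loop over rows j is what is distributed: each process eliminates its own rows after receiving the broadcast row i.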
[Diagram: row i, with its n − i + 1 active elements (including b[i]), is broadcast to the rows below it; the columns to the left are already cleared to zero.]
Figure 10.15 Broadcast in parallel implementation of Gaussian elimination.
[Diagram: processes P0, P1, P2, …, Pn−1 each hold a row; rows are broadcast along the pipeline.]
Figure 10.16 Pipeline implementation of Gaussian elimination.
[Diagram: rows 0 to n divided into contiguous strips of n/p rows, assigned in order to P0, P1, P2, P3.]
Figure 10.17 Strip partitioning.
[Diagram: rows allocated to P0, P1, … in round-robin fashion, so each process receives rows spread across the whole matrix.]
Figure 10.18 Cyclic partitioning to equalize workload.
[Diagram: solution space for f(x, y), discretized with spacing ∆ in both x and y.]
Figure 10.19 Finite difference method.
[Diagram: 10 × 10 mesh of points x1–x100 numbered left to right, top to bottom, surrounded by boundary points (see text).]
Figure 10.20 Mesh of points numbered in natural order.
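On the mesh of Figure 10.20, each interior point is repeatedly replaced by the mean of its four neighbours (Jacobi iteration for Laplace's equation). A minimal sequential sketch, not from the text, assuming a fixed boundary value on all four edges:

```python
def jacobi_laplace(boundary, n, iterations):
    # Jacobi iteration on an n x n mesh of interior points.  The grid
    # includes the boundary ring, so it is (n+2) x (n+2); every new
    # interior value is the mean of the four old neighbouring values.
    g = [[0.0] * (n + 2) for _ in range(n + 2)]
    for k in range(n + 2):
        g[0][k] = g[n + 1][k] = g[k][0] = g[k][n + 1] = boundary
    for _ in range(iterations):
        new = [row[:] for row in g]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                new[i][j] = 0.25 * (g[i - 1][j] + g[i + 1][j] +
                                    g[i][j - 1] + g[i][j + 1])
        g = new
    return g
```

Because all updates use only old values, the interior points are independent within an iteration and can be computed in parallel; only the strip or block edges need be exchanged between processes.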
[Diagram: N × N sparse matrix A times vector x with right-hand side 0. The ith equation has nonzero entries ai,i−n = 1, ai,i−1 = 1, ai,i = −4, ai,i+1 = 1, ai,i+n = 1. Equations with a boundary point on the diagonal are included to incorporate boundary values, making some entries unnecessary for the solution and some zero (see text).]
Figure 10.21 Sparse matrix for Laplace’s equation.
[Diagram: points computed in sequential (natural) order; each point to be computed uses the updated values of the points already computed.]
Figure 10.22 Gauss-Seidel relaxation with natural order, computed sequentially.
[Diagram: mesh points coloured alternately red and black in a checkerboard pattern.]
Figure 10.23 Red-black ordering.
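With the red-black ordering of Figure 10.23, a Gauss-Seidel sweep splits into two half-sweeps: all red points (i + j even) depend only on black neighbours, and vice versa, so each colour can be updated in parallel. A sequential sketch of one sweep (illustrative, not from the text):

```python
def red_black_sweep(g):
    # One Gauss-Seidel iteration with red-black ordering on the
    # interior of grid g (which includes its boundary ring).  Update
    # all red points (i + j even) first, then all black points;
    # within each colour the updates are independent.
    n = len(g) - 2
    for colour in (0, 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if (i + j) % 2 == colour:
                    g[i][j] = 0.25 * (g[i - 1][j] + g[i + 1][j] +
                                      g[i][j - 1] + g[i][j + 1])
    return g
```

Each black update already sees the new red values, which is what gives Gauss-Seidel its faster convergence than Jacobi while keeping the parallelism within a colour.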
Figure 10.24 Nine-point stencil.

[Diagram: processors assigned to the coarsest grid points, with finer grid points between them.]
Figure 10.25 Multigrid processor allocation.
[Diagram: board with edges held at 50°C, 40°C, and 60°C; ambient temperature at edges of board = 20°C.]
Figure 10.26 Printed circuit board for Problem 10-18.
[Diagram: image with origin (0, 0) at the top left, j across and i down; each picture element (pixel) is p(i, j).]
Figure 11.1 Pixmap.
[Diagram: number of pixels plotted against gray level, 0 to 255.]
Figure 11.2 Image histogram.
[Diagram: 3 × 3 group of pixel values x0 x1 x2 / x3 x4 x5 / x6 x7 x8.]
Figure 11.3 Pixel values for a 3 × 3 group.
[Diagram: step 1 — each pixel adds the pixel from its left; step 2 — from its right; step 3 — from above; step 4 — from below.]
Figure 11.4 Four-step data transfer for the computation of mean.
[Diagram: steps (a)–(d) accumulate the row sums x0 + x1 + x2, x3 + x4 + x5, and x6 + x7 + x8 within each row, then combine the three row sums at the centre pixel.]
Figure 11.5 Parallel mean data accumulation.
[Diagram: the largest and next largest values are located along a row, then the next largest along a column.]
Figure 11.6 Approximate median algorithm requiring six steps.
[Diagram: a 3 × 3 mask of weights w0–w8 applied (⊗) to the pixels x0–x8 produces the new centre value x4′.]
Figure 11.7 Using a 3 × 3 weighted mask.
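Applying a 3 × 3 weighted mask as in Figure 11.7 can be sketched directly; the masks of Figures 11.8–11.10 are just different choices of weights and scale factor k. Illustrative code, not from the text; border pixels are simply left unchanged here:

```python
def apply_mask(image, weights, k):
    # New value of each interior pixel = k times the weighted sum of
    # the pixel and its eight neighbours (a 3 x 3 convolution).
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            s = sum(weights[a][b] * image[i - 1 + a][j - 1 + b]
                    for a in range(3) for b in range(3))
            out[i][j] = k * s
    return out
```

For example, the mean mask of Figure 11.8 is `apply_mask(image, [[1,1,1],[1,1,1],[1,1,1]], 1/9)`.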
[Diagram: mask with all nine weights 1 and scale factor k = 1/9.]
Figure 11.8 Mask to compute mean.
[Diagram: mask with centre weight 8, the other eight weights 1, and scale factor k = 1/16.]
Figure 11.9 A noise reduction mask.
[Diagram: mask with centre weight 8, the other eight weights −1, and scale factor k = 1/9.]
Figure 11.10 High-pass sharpening filter mask.
[Diagram: an intensity transition with its first and second derivatives.]
Figure 11.11 Edge detection using differentiation.
[Diagram: image f(x, y) with x across and y down; a contour of constant intensity, the gradient normal to it, and the gradient direction φ.]
Figure 11.12 Gray level gradient and direction.
[Diagram: Prewitt masks — horizontal: −1 −1 −1 / 0 0 0 / 1 1 1; vertical: −1 0 1 / −1 0 1 / −1 0 1.]
Figure 11.13 Prewitt operator.
[Diagram: Sobel masks — horizontal: −1 −2 −1 / 0 0 0 / 1 2 1; vertical: −1 0 1 / −2 0 2 / −1 0 1.]
Figure 11.14 Sobel operator.
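The two Sobel masks of Figure 11.14 estimate the gradient components; a common approximation to the gradient magnitude is |Gx| + |Gy|. A minimal sketch for one interior pixel (illustrative code, not from the text):

```python
def sobel(image, i, j):
    # Sobel gradient magnitude, approximated as |Gx| + |Gy|,
    # at interior pixel (i, j).
    gx = (-image[i-1][j-1] + image[i-1][j+1]
          - 2 * image[i][j-1] + 2 * image[i][j+1]
          - image[i+1][j-1] + image[i+1][j+1])
    gy = (-image[i-1][j-1] - 2 * image[i-1][j] - image[i-1][j+1]
          + image[i+1][j-1] + 2 * image[i+1][j] + image[i+1][j+1])
    return abs(gx) + abs(gy)
```

On a vertical step edge the response is strong and on a flat region it is zero, which is what makes the operator an edge detector.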
[Images: (a) original image (Annabel); (b) effect of Sobel operator.]
Figure 11.15 Edge detection with Sobel operator.
[Diagram: Laplace mask 0 −1 0 / −1 4 −1 / 0 −1 0.]
Figure 11.16 Laplace operator.
[Diagram: the centre pixel x4 with its upper (x1), left (x3), right (x5), and lower (x7) neighbours.]
Figure 11.17 Pixels used in Laplace operator.
Figure 11.18 Effect of Laplace operator.

[Diagram: (a) a line y = ax + b through a pixel (x1, y1) in the (x, y) plane; (b) the corresponding line b = −xa + y in parameter space, with each pixel in the image mapping to a line through the point (a, b), e.g. b = −x1a + y1.]
Figure 11.19 Mapping a line into (a, b) space.
[Diagram: (a) a line y = ax + b in the (x, y) plane with its normal of length r at angle θ, where r = x cos θ + y sin θ; (b) the point (r, θ) in the (r, θ) plane.]
Figure 11.20 Mapping a line into (r, θ) space.
[Diagram: the normal of length r at angle θ in an image coordinate system with x across and y down.]
Figure 11.21 Normal representation using image coordinate system.
[Diagram: accumulator array indexed by r (0, 5, 10, 15) and θ (0°, 10°, 20°, 30°).]
Figure 11.22 Accumulators, acc[r][θ], for the Hough transform.
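The accumulation into acc[r][θ] (Figure 11.22) can be sketched with a dictionary: every edge pixel votes for all quantized (r, θ) pairs satisfying r = x cos θ + y sin θ, and heavily voted cells indicate lines. Illustrative code, not from the text; the 10° quantization step is an assumption taken from the figure's axis labels:

```python
import math

def hough_lines(edge_pixels, theta_step=10):
    # For each edge pixel (x, y), vote for every line
    # r = x*cos(theta) + y*sin(theta) at quantized theta values.
    acc = {}
    for (x, y) in edge_pixels:
        for t in range(0, 180, theta_step):
            rad = math.radians(t)
            r = round(x * math.cos(rad) + y * math.sin(rad))
            acc[(r, t)] = acc.get((r, t), 0) + 1
    return acc
```

Ten collinear pixels along y = 5 all vote for the same cell (r = 5, θ = 90°), so that cell collects ten votes while others stay scattered.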
[Diagram: image elements xjk; transforming the rows gives Xjm, then transforming the columns gives Xlm.]
Figure 11.23 Two-dimensional DFT.
[Diagram: (a) direct convolution of image fj,k with filter hj,k (∗) giving gj,k; (b) transform the image to F(j, k), multiply (×) by H(j, k) to give G(j, k), then inverse transform to obtain g(j, k).]
Figure 11.24 Convolution using Fourier transforms.
[Diagram: a master process distributes the twiddle factors w0, w1, …, wn−1 to slave processes, which produce X[0], X[1], …, X[n−1].]
Figure 11.25 Master-slave approach for implementing the DFT directly.
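The direct DFT that the master-slave scheme of Figure 11.25 parallelizes is an O(N²) sum; each slave would compute one (or several) of the outer terms below. A sequential sketch, not from the text:

```python
import cmath

def dft(x):
    # Direct DFT: X[k] = sum over j of x[j] * w^(jk),
    # where w = e^(-2*pi*i/N) is the Nth root of unity.
    N = len(x)
    w = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[j] * w ** (j * k) for j in range(N)) for k in range(N)]
```

Each X[k] is independent of the others, which is exactly why the computation divides cleanly among slaves.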
[Diagram: process j, holding x[j], receives X[k] and a; it computes X[k] + a × x[j] and wk × a and passes both values on for the next iteration.]
Figure 11.26 One stage of a pipeline implementation of DFT algorithm.
[Diagram: (a) pipeline of processes P0–PN−1 holding x[0]–x[N−1]; the initial values X[k] = 0 and a = 1 enter P0 and the output sequence X[0], X[1], X[2], X[3], … emerges from PN−1; (b) timing diagram showing the pipeline stages P0–PN−1 producing X[0]–X[6] over time.]
Figure 11.27 Discrete Fourier transform with a pipeline.
[Diagram: the input sequence x0–xN−1 is split into even- and odd-indexed elements feeding two N/2-point DFTs, giving Xeven and Xodd; the outputs are combined as Xk = Xeven + wk Xodd and Xk+N/2 = Xeven − wk Xodd, for k = 0, 1, …, N/2 − 1.]
Figure 11.28 Decomposition of N-point DFT into two N/2-point DFTs.
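Applying the decomposition of Figure 11.28 recursively gives the radix-2 FFT. A minimal sketch (illustrative code, not from the text; the length is assumed to be a power of two):

```python
import cmath

def fft(x):
    # Radix-2 FFT: split into even- and odd-indexed N/2-point DFTs,
    # then combine with the twiddle factors w^k = e^(-2*pi*i*k/N):
    #   X[k]       = even[k] + w^k * odd[k]
    #   X[k + N/2] = even[k] - w^k * odd[k]
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    X = [0j] * N
    for k in range(N // 2):
        wk = cmath.exp(-2j * cmath.pi * k / N)
        X[k] = even[k] + wk * odd[k]
        X[k + N // 2] = even[k] - wk * odd[k]
    return X
```

The recursion halves the problem at each level, giving the O(N log N) cost that the direct O(N²) DFT lacks.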
[Diagram: butterfly network computing the four-point DFT from x0–x3 to X0–X3 with + and − combining nodes.]
Figure 11.29 Four-point discrete Fourier transform.
Xk = Σ(0,2,4,6,8,10,12,14) + wkΣ(1,3,5,7,9,11,13,15)
= {Σ(0,4,8,12) + wkΣ(2,6,10,14)} + wk{Σ(1,5,9,13) + wkΣ(3,7,11,15)}
= {[Σ(0,8) + wkΣ(4,12)] + wk[Σ(2,10) + wkΣ(6,14)]} + wk{[Σ(1,9) + wkΣ(5,13)] + wk[Σ(3,11) + wkΣ(7,15)]}

[Diagram: inputs in bit-reversed order x0 x8 x4 x12 x2 x10 x6 x14 x1 x9 x5 x13 x3 x11 x7 x15, i.e. indices 0000, 1000, 0100, 1100, 0010, 1010, 0110, 1110, 0001, 1001, 0101, 1101, 0011, 1011, 0111, 1111.]

Figure 11.30 Sixteen-point DFT decomposition.
[Diagram: four-stage butterfly network from inputs x0–x15 to outputs X0–X15.]
Figure 11.31 Sixteen-point FFT computational flow.
[Diagram: the 16 rows of the FFT computation, with inputs x0–x15 (binary row numbers 0000–1111) and outputs X0–X15, divided among four processes with P/r = 4 rows each: P0 takes rows 0000–0011, P1 rows 0100–0111, P2 rows 1000–1011, P3 rows 1100–1111.]
Figure 11.32 Mapping processors onto 16-point FFT computation.
[Diagram: x0–x15 arranged in a 4 × 4 array with one column per process — P0 holds x0, x4, x8, x12; P1 holds x1, x5, x9, x13; and so on.]
Figure 11.33 FFT using transpose algorithm — first two steps.
[Diagram: the 4 × 4 array of x0–x15 being transposed between the processes P0–P3.]
Figure 11.34 Transposing array for transpose algorithm.
[Diagram: after the transpose, P0 holds x0–x3, P1 holds x4–x7, P2 holds x8–x11, and P3 holds x12–x15.]
Figure 11.35 FFT using transpose algorithm — last two steps.
[Diagram: image on a grid with coordinates 1–7 on both axes, with a mask shown.]
Figure 11.36 Image for Problem 11-3.
[Diagram: tree of choices — first choice among C0, C1, …, Cn−1; second choice among the candidates not including the one already chosen; third choice below that.]
Figure 12.1 State space tree.
[Diagram: parent A = A1 | A2 and parent B = B1 | B2, each split at crossover point p (positions 1…p and p+1…m); child 1 = A1 | B2 and child 2 = B1 | A2.]
Figure 12.2 Single-point crossover.
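The single-point crossover of Figure 12.2 is a one-line splice. A sketch, not from the text; the crossover point is chosen at random when not supplied:

```python
import random

def single_point_crossover(parent_a, parent_b, p=None):
    # Split both parents at crossover point p (positions 1..p and
    # p+1..m in the figure's terms) and exchange the tails.
    m = len(parent_a)
    if p is None:
        p = random.randint(1, m - 1)   # at least one gene from each side
    child1 = parent_a[:p] + parent_b[p:]
    child2 = parent_b[:p] + parent_a[p:]
    return child1, child2
```

For example, crossing "AAAA" and "BBBB" at p = 2 yields the children "AABB" and "BBAA".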
[Diagram: island subpopulations with a migration path from every island to every other island.]
Figure 12.3 Island model.
[Diagram: island subpopulations connected only by limited migration paths.]
Figure 12.4 Stepping stone model.
[Diagram: a program whose instructions drive processors with local memory, all synchronized by a common clock and accessing data in a shared memory.]
Figure D.1 PRAM model.
[Diagram: distances d[0]–d[7] with successor pointers s[0]–s[7] (the last pointer null); successive pointer-jumping steps change the distances from 1 1 1 1 1 1 1 0 to 2 2 2 2 2 2 1 0, then 4 4 4 4 3 2 1 0, then 7 6 5 4 3 2 1 0.]
Figure D.2 List ranking by pointer jumping.
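The pointer jumping of Figure D.2 can be simulated sequentially; in each round every node adds its successor's distance to its own and jumps its pointer past the successor, so the distances double until every pointer reaches the end. Illustrative code, not from the text; the null pointer of the figure is represented here by a node pointing at itself:

```python
import math

def list_rank(succ):
    # List ranking by pointer jumping.  succ[i] is the successor of
    # node i; succ[i] == i marks the end of the list.  After about
    # log2(n) rounds, d[i] is the distance from i to the end.
    n = len(succ)
    d = [0 if succ[i] == i else 1 for i in range(n)]
    s = list(succ)
    for _ in range(max(1, math.ceil(math.log2(n)))):
        # All updates read the old d and s, as the PRAM steps do.
        d = [d[i] + d[s[i]] for i in range(n)]
        s = [s[s[i]] for i in range(n)]
    return d
```

On an eight-node chain the rounds produce exactly the rows of Figure D.2: 2 2 2 2 2 2 1 0, then 4 4 4 4 3 2 1 0, then 7 6 5 4 3 2 1 0.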
[Diagram: threads or processes perform local computation (maximum time w), then at most h sends or receives, then communication, then a barrier synchronization — one superstep.]
Figure D.3 A view of the bulk synchronous parallel model.
[Diagram: processor Pi sends a message to Pk; the time line shows the send overhead o, the gap g before the next message, the latency L, and the receive overhead o.]
Figure D.4 LogP parameters.