Astrophysical N-body simulation by Scott Linssen (undergraduate student, University of North Carolina at Charlotte [UNCC]).
Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers
Barry Wilkinson and Michael Allen, Prentice Hall, 1998
Main memory
[Figure labels: Memory modules; One address space; Interconnection network]
[Figure labels: Interconnection network; Messages; Processor; Local memory]
Figure 1.5 Shared memory multiprocessor implementation.
[Figure labels: Interconnection network; Messages; Processor; Shared memory; Computers]
[Figure labels, repeated for two processors: Program; Instructions; Processor; Data]
P M Computers P M
C C
P M
[Figure labels: Computer (node); Switch; Links to other nodes (on both sides); Processor; Memory]
Figure 1.9 A link between two nodes with separate wires in each direction.
[Figure labels: Link; Node; Node]
Figure 1.10 Ring.
[Figure labels: Links; Computer/processor]
[Figure labels: Root; Processing element; Links]
110 111
100 101
010 011
0110 0111 1110 1111
Ring
Nodal address
1011
10
11
01
00
A A
Root
A A
A A
[Figure labels: Packet; Head; Movement; Flit buffer; Request/Acknowledge signal(s)]
Figure 1.19 A signaling method between processors for wormhole routing (Ni and McKinley, 1993).
[Figure labels: Source processor; Destination processor; Data; R/A]
Figure 1.20 Network delay characteristics.
[Plot: network latency versus distance (number of nodes between source and destination); curves: packet switching, wormhole routing, circuit switching]
Node 4 Node 3
Messages
Figure 1.22 Multiple virtual channels mapped onto a single physical channel.
[Figure labels: Virtual channel buffer; Node; Node; Route; Physical link]
Ethernet
[Figure: frame fields as drawn, left to right; the figure also marks the direction of transmission]
Field                   Size
Frame check sequence    32 bits
Data                    variable
Type                    16 bits
Source address          48 bits
Destination address     48 bits
Preamble                64 bits
[Figure labels: Network; Workstation/file server; Workstations]
[Figure labels: Workstations; Workstation/file server]
Parallel programming cluster
[Space-time diagram: Process 1 to Process 4 against time; labels: Computing; Slope indicating time to send message]
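[Figure: a computation of sequential time ts consists of a serial section f·ts and a parallelizable section (1 − f)ts. (b) Multiple processors: with n processors the parallelizable section takes (1 − f)ts/n, and the total parallel time is tp]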
[Plots: speedup factor, S(n), with curves for f = 0%, 5%, 10%, and 20%, and curves for n = 256 and n = 16; vertical axes marked from 4 to 20]
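The speedup curves follow from the timing relation on the preceding slide: with serial fraction f, the parallel time is tp = f·ts + (1 − f)ts/n, so S(n) = ts/tp = n / (1 + (n − 1)f). A minimal sketch of that formula follows; the function name `amdahl_speedup` is illustrative and does not come from the slides.

```c
/* Speedup on n processors when a fraction f of the sequential time ts
 * must run serially:
 *   tp   = f*ts + (1 - f)*ts/n
 *   S(n) = ts/tp = n / (1 + (n - 1)*f)
 * The name amdahl_speedup is hypothetical, chosen for this sketch. */
double amdahl_speedup(double f, int n)
{
    return (double)n / (1.0 + (n - 1) * f);
}
```

Calling `amdahl_speedup(0.05, 20)` evaluates 20 / 1.95. As n grows with f fixed, the value approaches 1/f, which is why the curves for f > 0 flatten out.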
[Figure labels: Source file; Compile to suit processor; Executables]
[Figure: over time, Process 1 calls spawn(); which starts execution of Process 2]
Figure 2.3 Passing a message between processes using send() and recv() library calls.
[Figure: Process 1 executes send(&x, 2); Process 2 executes recv(&y, 1); the data moves from x to y]
Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol.
[Figure: timelines for Process 1 and Process 2 calling send() and recv(); labels: Request to send; Acknowledgment; Message; Suspend process; Both processes continue]
[Figure: over time, Process 1 calls send() and continues; the message is held in a message buffer; Process 2 calls recv() and reads the message buffer]
Process 0 Process 1 Process n − 1
Action
buf
Process 0 Process 1 Process n − 1
Action
buf
Process 0 Process 1 Process n − 1
Action
buf
Process 0 Process 1 Process n − 1
Action
buf +
[Figure: three workstations, each running a PVM daemon and an application program (executable); messages sent through network]
[Figure: three workstations, each running a PVM daemon; an application program (executable) is shown on one workstation; messages sent through network]
[Figure: Process 1 packs an array holding data into a send buffer with pvm_psend(); and continues; Process 2 calls pvm_precv(); and waits for the message, which arrives in an array to receive data]
Process_1 (packs x, s, y into the send buffer and sends the message):
pvm_initsend();
pvm_pkint( … &x …);
pvm_pkstr( … &s …);
pvm_pkfloat( … &y …);
pvm_send(process_2 … );

Process_2 (receives the message and unpacks x, s, y from the receive buffer):
pvm_recv(process_1 …);
pvm_upkint( … &x …);
pvm_upkstr( … &s …);
pvm_upkfloat(… &y … );
Master

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strcpy, strcat */
#include <pvm3.h>
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000
int main(void) {
  int mytid, tids[PROC];
  int n = NELEM, nproc = PROC;
  int no, i, who, msgtype;
  int data[NELEM], result[PROC], tot = 0;
  char fn[255];
  FILE *fp;
  mytid = pvm_mytid();   /* Enroll in PVM */

  /* Start Slave Tasks */
  no = pvm_spawn(SLAVE, (char**)0, 0, "", nproc, tids);
  if (no < nproc) {
    printf("Trouble spawning slaves \n");
    for (i = 0; i < no; i++) pvm_kill(tids[i]);
    pvm_exit(); exit(1);
  }

  /* Open Input File and Initialize Data */
  strcpy(fn, getenv("HOME"));
  strcat(fn, "/pvm3/src/rand_data.txt");
  if ((fp = fopen(fn, "r")) == NULL) {
    printf("Can't open input file %s\n", fn);
    exit(1);
  }
  for (i = 0; i < n; i++) fscanf(fp, "%d", &data[i]);
  ...

Slave

#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000
int main(void) {
  int mytid;
  int tids[PROC];
  int n, me, i, msgtype;
  int x, nproc, master;
  int data[NELEM], sum;

  mytid = pvm_mytid();

  /* Receive data from master */
  msgtype = 0;
  pvm_recv(-1, msgtype);
  pvm_upkint(&nproc, 1, 1);
  pvm_upkint(tids, nproc, 1);
  pvm_upkint(&n, 1, 1);
  pvm_upkint(data, n, 1);
  ...
[Figure: Process 0 and Process 1; Process 0 issues send(…,1,…); and Process 1 issues recv(…,0,…); both in user code and inside a library routine lib(), shown for two cases]
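#include "mpi.h"
#include <stdio.h>
#include <math.h>
#define MAXSIZE 1000
int main(int argc, char *argv[])   /* signature assumed; not in the slide text */
{
  int myid, numprocs;              /* assumed declarations */
  int data[MAXSIZE];               /* assumed declaration */
  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD,&myid);
  /* ... */
  /* broadcast data from process 0 to every process */
  MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);
  /* ... */
  MPI_Finalize();
  return 0;
}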
Figure 2.16 Sample MPI program.
Figure 2.17 Theoretical communication time.
[Plot: time versus number of data items (n); the startup time is marked on the time axis]
[Plot: curves c2g(x) = 6x^2 and c1g(x) = 2x^2 for x from 0 to 5; vertical axis from 0 to 160; the point x0 is marked on the x axis]
110 111
100 101
3rd step
010 011
2nd step
P000
Message
Step 1
P000 P001
Step 2
Steps
1 2 3
2 3 4 4
3 4 5 5
4 5 6 6
Message
Source
Sequential
Source
[Space-time diagram: Process 1, Process 2, Process 3 against time; legend: Computing; Waiting; Message-passing system routine; Message]
Figure 2.26 Program profile.
[Plot: number of repetitions or time versus statement number or regions of program (1 to 10)]
Input data
Processes
Figure 3.2 Practical embarrassingly parallel computational graph with dynamic process creation and the master-slave approach.
[Figure: the master calls spawn() and sends initial data with send(); the slaves recv() the data and send() their results; the master uses recv() to collect results]
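[Figure: a 640 × 480 display area (axes x and y) mapped onto processes in two ways: one 80 × 80 region per process, or one strip of 10 rows (640 wide) per process]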
[Plot: complex plane; real axis from −2 to +2, imaginary axis from −2 to +2]
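Assuming this plot is the display region of a Mandelbrot-set computation (the usual example for these axes), the per-pixel work can be sketched as below. The sketch iterates z = z² + c from z = 0 until |z| exceeds 2 or an iteration cap is reached. The names `mandel_iterations` and `mandel_pixel`, and the cap `max_iter`, are assumptions made for illustration; they are not taken from the slides.

```c
/* Iterates z = z*z + c from z = 0 and returns how many iterations ran
 * before |z| exceeded 2, capped at max_iter.
 * Function and parameter names are hypothetical. */
int mandel_iterations(double c_real, double c_imag, int max_iter)
{
    double z_real = 0.0, z_imag = 0.0;
    int count = 0;
    /* |z| <= 2 is tested as |z|^2 <= 4 to avoid a square root. */
    while (count < max_iter &&
           z_real * z_real + z_imag * z_imag <= 4.0) {
        double next_real = z_real * z_real - z_imag * z_imag + c_real;
        z_imag = 2.0 * z_real * z_imag + c_imag;
        z_real = next_real;
        count++;
    }
    return count;
}

/* Maps pixel (x, y) of a width x height display onto the square
 * from -2 to +2 on both axes and returns its iteration count. */
int mandel_pixel(int x, int y, int width, int height, int max_iter)
{
    double c_real = -2.0 + x * (4.0 / width);
    double c_imag = -2.0 + y * (4.0 / height);
    return mandel_iterations(c_real, c_imag, max_iter);
}
```

Each pixel is computed independently of every other pixel, which is what makes the problem a natural fit for static region assignment or a work pool.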
[Figure labels: Work pool; Task; Return results/request new task]
Figure 3.6 Counter termination.
[Figure labels: Rows outstanding in slaves (count); Row returned; Decrement; Terminate]
[Figure: a circle of area π inside a square of side 2 and total area 4]
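The area ratio in the figure (π for the circle against 4 for the enclosing square) gives a Monte Carlo estimate of π: the fraction of random points in the square that land inside the circle approaches π/4. A minimal sequential sketch follows, assuming the C library `rand()` is an adequate generator for illustration. The name `estimate_pi` and its parameters are hypothetical.

```c
#include <stdlib.h>

/* Estimates pi by sampling random points in the square from -1 to 1 on
 * both axes and counting those inside the unit circle.
 * estimate_pi, samples, and seed are names chosen for this sketch. */
double estimate_pi(long samples, unsigned int seed)
{
    long inside = 0;
    srand(seed);
    for (long i = 0; i < samples; i++) {
        double x = 2.0 * rand() / RAND_MAX - 1.0;
        double y = 2.0 * rand() / RAND_MAX - 1.0;
        if (x * x + y * y <= 1.0)
            inside++;
    }
    /* inside/samples approximates (circle area)/(square area) = pi/4. */
    return 4.0 * (double)inside / (double)samples;
}
```

In a parallel version each slave would count its own samples and return only the count, so very little data is exchanged. Each slave would then need its own independent random number stream.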
1
f(x)
1
y = 1 – x2
[Figure labels: Master; Slaves; Partial sum; Request; Random number]
x1 x2 xk-1 xk xk+1 xk+2 x2k-1 x2k
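Figure 4.1 Partitioning a sequence of numbers into parts and adding the parts.
[Figure: the sequence is split into m parts, x0 … x(n/m)−1, xn/m … x(2n/m)−1, …, x(m−1)n/m … xn−1; each part is added to give a partial sum, and the partial sums are added to give the sum]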
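The partitioning in Figure 4.1 can be sketched sequentially as follows, assuming m divides n evenly. In a message-passing version each pass of the outer loop would run on its own process and only the partial sum would be sent back. The function name `partitioned_sum` is illustrative.

```c
/* Adds n numbers by splitting them into m contiguous parts of n/m
 * numbers each, forming one partial sum per part, and then adding the
 * partial sums. Assumes m divides n. The name is hypothetical. */
long partitioned_sum(const int *x, int n, int m)
{
    int part = n / m;                 /* numbers per part */
    long sum = 0;
    for (int p = 0; p < m; p++) {     /* one pass per part (per process) */
        long partial = 0;
        for (int i = p * part; i < (p + 1) * part; i++)
            partial += x[i];
        sum += partial;               /* accumulate the partial sums */
    }
    return sum;
}
```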
Initial problem
Divide
problem
Final tasks
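[Figure: the original list x0 … xn−1 is divided in a tree; level by level the parts are held by P0; then P0, P4; then P0, P2, P4, P6; then P0, P1, P2, P3, P4, P5, P6, P7]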
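[Figure: the parts of x0 … xn−1 held by P0, P1, P2, P3, P4, P5, P6, P7 are combined in a tree; level by level the results are held by P0, P2, P4, P6; then P0, P4; then P0, which holds the final sum]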
Found/ OR
Not found
OR OR
Figure 4.6 Quadtree.
[Figure labels: Image area; First division into four parts; Second division]
[Figure: unsorted numbers are placed into buckets; the contents of the buckets are sorted; the lists are merged into the sorted numbers]
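The bucket sort stages in the figure can be sketched sequentially as below, assuming integer keys in the range 0 to range − 1 and m equal-width buckets. Each bucket is sorted with the C library `qsort()`, and the merge step is a plain concatenation because the buckets cover disjoint, increasing intervals. The names `bucket_sort`, `bucket_of`, and `compare_ints` are hypothetical.

```c
#include <stdlib.h>

static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Bucket index of a key in [0, range) when there are m buckets. */
static int bucket_of(int value, int range, int m)
{
    return (int)((long long)value * m / range);
}

/* Sorts x[0..n-1] in place. Assumes 0 <= x[i] < range and m >= 1.
 * Returns 0 on success, -1 if memory could not be allocated. */
int bucket_sort(int *x, int n, int range, int m)
{
    if (n <= 0)
        return 0;
    int *start = calloc((size_t)m + 1, sizeof *start);
    int *next = malloc((size_t)m * sizeof *next);
    int *out = malloc((size_t)n * sizeof *out);
    if (start == NULL || next == NULL || out == NULL) {
        free(start); free(next); free(out);
        return -1;
    }
    for (int i = 0; i < n; i++)          /* count keys per bucket */
        start[bucket_of(x[i], range, m) + 1]++;
    for (int b = 0; b < m; b++)          /* start[b] = offset of bucket b */
        start[b + 1] += start[b];
    for (int b = 0; b < m; b++)
        next[b] = start[b];
    for (int i = 0; i < n; i++)          /* place numbers into buckets */
        out[next[bucket_of(x[i], range, m)]++] = x[i];
    for (int b = 0; b < m; b++)          /* sort contents of each bucket */
        qsort(out + start[b], (size_t)(start[b + 1] - start[b]),
              sizeof *out, compare_ints);
    for (int i = 0; i < n; i++)          /* merge: buckets are in order */
        x[i] = out[i];
    free(start); free(next); free(out);
    return 0;
}
```

In a parallel version each bucket would be assigned to one process, so the per-bucket `qsort()` calls would run concurrently.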
[Figure: with p processors, unsorted numbers are placed into buckets; the contents of the buckets are sorted; the lists are merged into the sorted numbers]
[Figure: with p processors, each group of n/m unsorted numbers is placed into small buckets; the small buckets are emptied into large buckets; the contents of the large buckets are sorted; the lists are merged into the sorted numbers]
Process 0 Process n − 1
Send Receive
buffer buffer
Send
buffer
0 n−1 0 n−1 0 n−1 0 n−1
Figure 4.12 Effect of “all-to-all” on an array.
[Figure: before the “all-to-all”, P0 holds A0,0 A0,1 A0,2 A0,3 and P3 holds A3,0 A3,1 A3,2 A3,3; afterwards P0 holds A0,0 A1,0 A2,0 A3,0 and P3 holds A0,3 A1,3 A2,3 A3,3]
f(x)
f(p) f(q)
f(x)
f(p) f(q)
f(x)
f(p) f(q)
f(x)
C
A B
f(x)
C=0
A B
Center of mass
Subdivision
direction
Figure 4.20 Orthogonal recursive bisection method.
[Figure: groups of log n numbers are added (+) and the results are combined in a binary tree to give the result]
y
f(a)
f(x)
b
a x
Figure 4.23 Convex hull (Problem 4-22).
P0 P1 P2 P3 P4 P5
[Figure: pipeline of five stages, one for each of a[0], a[1], a[2], a[3], a[4]; each stage has input sin and output sout, and the last output is sum]
[Figure: pipeline of five stages for frequencies f0, f1, f2, f3, f4; the input f(t) enters the first stage, each stage has input fin and output fout, and the signal leaves successive stages without frequency f0, f1, f2, f3; the final output is the filtered signal]
[Space-time diagram: processes P0 to P5 against time; each process executes Instance 1, Instance 2, Instance 3, … in turn, starting one step after the process before it; the time axis is marked with spans p − 1 and m]
[Space-time diagram: Instance 0 to Instance 4 against time; each instance passes through P0, P1, P2, P3, P4, P5 in turn]
[Figure: the input sequence d9d8d7d6d5d4d3d2d1d0 enters the pipeline P0 P1 P2 P3 P4 P5 P6 P7 P8 P9]
(b) Timing diagram
[Processes P0 to P9 against time; P0 handles d0, d1, …, d9 in successive steps, and each later process starts one step after the process before it]
Figure 5.7 Pipeline processing where information passes to next stage before end of process.
(a) Processes with the same execution time
(b) Processes not with the same execution time
[Both panels: processes P0 to P5 against time; labels: Information passed to next stage; Information transfer sufficient to start next process]
Processor 0 Processor 1 Processor 2
P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11
Multiprocessor
Host
computer
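[Figure: pipeline P0, P1, P2, P3, P4; the stages are labeled with the partial sums \sum_{i=1}^{1} i, \sum_{i=1}^{2} i, \sum_{i=1}^{3} i, \sum_{i=1}^{4} i, \sum_{i=1}^{5} i]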
Figure 5.11 Pipelined addition of numbers with a master process and ring configuration.
[Figure labels: Master process; Slaves; Sum]
Figure 5.12 Pipelined addition of numbers with direct access to slave processes.
[Figure: the master process passes the numbers d0, d1, …, dn−1 directly to the slaves P0, P1, P2, …, Pn−1; the sum is returned to the master]
[Figure: insertion sort of the sequence 4, 3, 1, 2, 5 (rightmost number first) on the pipeline P0 P1 P2 P3 P4, traced over time cycles 1 to 10; at cycle 10 the processes P0 to P4 hold 5, 4, 3, 2, 1]
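The trace ends with P0 to P4 holding 5, 4, 3, 2, 1, which is consistent with each process keeping the largest number it has seen and passing the smaller one on. A minimal sequential simulation of that rule is sketched below. It pushes each number through the whole line before the next one enters, so it reproduces the final state but not the cycle-by-cycle overlap. The function name and the use of `INT_MIN` as an "empty" marker are assumptions of this sketch.

```c
#include <limits.h>

/* Simulates a line of n processes sorting n numbers. held[i] is the
 * number kept by process i; on return held[] is in descending order.
 * Assumes every input is greater than INT_MIN, which marks an empty
 * process. The function name is hypothetical. */
void pipeline_insertion_sort(const int *input, int n, int *held)
{
    for (int i = 0; i < n; i++)
        held[i] = INT_MIN;               /* nothing held yet */

    for (int k = 0; k < n; k++) {
        int travelling = input[k];       /* number entering process 0 */
        for (int i = 0; i < n; i++) {
            if (travelling > held[i]) {  /* keep larger, pass smaller on */
                int smaller = held[i];
                held[i] = travelling;
                travelling = smaller;
            }
            if (travelling == INT_MIN)
                break;                   /* nothing left to pass on */
        }
    }
}
```

Feeding the numbers in the order 5, 2, 1, 3, 4 (the sequence 4, 3, 1, 2, 5 read from the right) leaves 5, 4, 3, 2, 1 in `held[0]` to `held[4]`.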
[Figure: pipeline P0, P1, P2; smaller numbers are passed on to the next process]
Figure 5.15 Insertion sort with results returned to the master process using a bidirectional line configuration.
[Figure: the master process feeds dn−1… d2d1d0 into P0, P1, P2, …, Pn−1; the sorted sequence is returned to the master]
Figure 5.16 Insertion sort with results returned.
[Timing diagram for P0 to P4, shown for n = 5: a sorting phase of 2n − 1 steps followed by returning sorted numbers in n steps]
[Figure: the series of numbers xn−1 … x1x0 enters the pipeline P0, P1, P2; P0 passes on the numbers that are not multiples of the 1st prime number]
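The pipeline in the figure can be simulated sequentially, assuming each stage keeps the first number it receives as its prime and forwards only numbers that are not multiples of that prime. The function name `pipeline_sieve` and the fixed stage capacity `max_stages` are assumptions of this sketch.

```c
/* Feeds 2, 3, ..., limit into a simulated pipeline. Stage i holds
 * primes[i]; a number that is a multiple of a stage's prime is dropped
 * there, and a number that passes every existing stage starts a new
 * stage as the next prime. Returns the number of stages (primes) used.
 * Numbers that would need more than max_stages stages are dropped.
 * The function and parameter names are hypothetical. */
int pipeline_sieve(int limit, int *primes, int max_stages)
{
    int stages = 0;
    for (int x = 2; x <= limit; x++) {
        int i = 0;
        while (i < stages && x % primes[i] != 0)
            i++;                        /* forwarded by stage i */
        if (i == stages && stages < max_stages)
            primes[stages++] = x;       /* reached a fresh stage: prime */
    }
    return stages;
}
```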
Figure 5.18 Solving an upper triangular set of linear equations using a pipeline.
[Figure: pipeline P0, P1, P2, P3; P0 computes x0 and passes it on; P1 computes x1 and passes x0 and x1 on; P2 computes x2 and passes x0, x1, x2 on; P3 computes x3]
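A sequential sketch of the computation in Figure 5.18 follows, assuming equation i has the form a[i][0]·x0 + … + a[i][i]·xi = b[i], so that x0 is found first and each later unknown depends only on the earlier ones. Iteration i of the outer loop plays the role of stage Pi. The function name and the row-major storage of `a` are assumptions of this sketch.

```c
/* Solves a[i][0]*x[0] + ... + a[i][i]*x[i] = b[i] for i = 0..n-1.
 * a is an n x n matrix stored row-major; entries with j > i are ignored.
 * Assumes every diagonal entry a[i][i] is nonzero.
 * The function name is hypothetical. */
void triangular_solve(int n, const double *a, const double *b, double *x)
{
    for (int i = 0; i < n; i++) {            /* stage Pi */
        double sum = 0.0;
        for (int j = 0; j < i; j++)          /* one multiply/add per x[j] */
            sum += a[i * n + j] * x[j];      /* x[j] received from P(i-1) */
        x[i] = (b[i] - sum) / a[i * n + i];  /* subtract, then divide */
    }
}
```

In the pipelined version stage Pi can start its multiply/add steps as soon as x0 arrives, without waiting for all earlier unknowns, which is where the overlap in time comes from.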
P5
P4
P1
P0 P1 P2 P3 P4
divide
send(x0) ⇒ recv(x0)
end send(x0) ⇒ recv(x0)
multiply/add send(x0) ⇒ recv(x0)
divide/subtract multiply/add send(x0) ⇒ recv(x0)
send(x1) ⇒ recv(x1) multiply/add send(x1) ⇒
end send(x1) ⇒ recv(x1) multiply/add
multiply/add send(x1) ⇒ recv(x1)
divide/subtract multiply/add send(x1) ⇒
Time
send(x2) ⇒ recv(x2) multiply/add
end send(x2) ⇒ recv(x2)
multiply/add send(x2) ⇒
divide/subtract multiply/add
send(x3) ⇒ recv(x3)
end send(x3) ⇒
multiply/add
divide/subtract
send(x4) ⇒
end
x1 x2 x3 x4
x x x x
y4y3y2y1 yin yout yin yout yin yout yin yout Output
a a a a
a1 a2 a3 a4
Display Display
Audio input
(digitized)
Pipeline
Audio input
(digitized)
[Figure: processes P0, P1, P2, …, Pn−1 against time; each process is active until it reaches the barrier and then waiting]
[Figure: processes P0, P1, …, Pn−1 each call Barrier(); processes wait until all reach their barrier call]
[Figure: processes P0, P1, …, Pn−1 each call Barrier(); each call increments a counter, C, and checks for n]
Master:
  Arrival phase:    for(i=0;i<n;i++) recv(Pany);
  Departure phase:  for(i=0;i<n;i++) send(Pi);
Slave processes (each):
  Barrier: send(Pmaster); recv(Pmaster);
[Figure: processes P0 P1 P2 P3 P4 P5 P6 P7 exchange synchronizing messages, first on arrival at barrier and then on departure from barrier]
P0 P1 P2 P3 P4 P5 P6 P7
1st stage
Time
2nd stage
3rd stage
Instruction
a[] = a[] + k;
[Figure: data-parallel prefix sum of the 16 numbers x0 to x15, held at positions 0 to 15. Each step is an add: Step 1 (j = 0), Step 2 (j = 1), Step 3 (j = 2), and the final step (j = 3). The lower limit of the sum held at each position moves toward i = 0 from step to step; after the final step the sum at every position starts from i = 0]
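The four steps in the figure match the usual log-step prefix sum, in which step j adds to every element i ≥ 2^j the value held 2^j positions to its left. A sequential sketch under that assumption is given below. The inner loop stands in for the simultaneous data-parallel operation, so the values from the start of each step are copied first. The function name `prefix_sum` is illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* In-place prefix sum of x[0..n-1] in ceil(log2(n)) steps.
 * At the step with stride 2^j, every element i >= 2^j adds the value
 * that element i - 2^j held at the start of the step.
 * Returns 0 on success, -1 if the scratch copy cannot be allocated.
 * The function name is hypothetical. */
int prefix_sum(int *x, int n)
{
    if (n <= 0)
        return 0;
    int *previous = malloc((size_t)n * sizeof *previous);
    if (previous == NULL)
        return -1;
    for (int stride = 1; stride < n; stride *= 2) {   /* stride = 2^j */
        memcpy(previous, x, (size_t)n * sizeof *x);   /* values at step start */
        for (int i = stride; i < n; i++)              /* all i "at once" */
            x[i] = previous[i] + previous[i - stride];
    }
    free(previous);
    return 0;
}
```

For 16 numbers the outer loop runs with strides 1, 2, 4, and 8, matching j = 0 to j = 3 in the figure.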
Computed
value
Error
Exact value
[Figure: Process 0, Process 1, …, Process n − 1 each call Allgather(); the data items x0, x1, …, xn−1 in the send buffers are collected into the receive buffer of every process]
[Plot: execution time (τ = 1), from 0 to 2 × 10^6, versus number of processors, p, from 0 to 32; curves: Overall, Communication, Computation]
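[Figure: metal plate with points indexed by row i and column j; the enlarged view shows the point h_{i,j} and its four neighbours h_{i−1,j}, h_{i+1,j}, h_{i,j−1}, h_{i,j+1}]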
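The four-neighbour arrangement suggests the standard update in which each interior point becomes the average of its four neighbours. One such iteration is sketched below, assuming a square n × n grid stored row-major with fixed boundary values, and writing into a second array so that every update uses values from the previous iteration (a Jacobi-style step). The function name `jacobi_step` and the array names `h` and `g` are assumptions of this sketch.

```c
/* One iteration on an n x n grid stored row-major. Every interior point
 * of g becomes the average of the four neighbours of that point in h;
 * boundary points are copied unchanged.
 * The function name is hypothetical. */
void jacobi_step(int n, const double *h, double *g)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1)
                g[i * n + j] = h[i * n + j];            /* fixed boundary */
            else
                g[i * n + j] = 0.25 * (h[(i - 1) * n + j] +
                                       h[(i + 1) * n + j] +
                                       h[i * n + j - 1] +
                                       h[i * n + j + 1]);
        }
    }
}
```

Repeating the step with the roles of `h` and `g` swapped moves the interior toward the steady-state temperatures. A convergence test or a fixed iteration count would decide when to stop.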
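Figure 6.13 Natural ordering of heat distribution problem.
[Figure: points numbered x1, x2, …, xk−1, xk along the first row, ending at x_{k^2}; the point xi has neighbours xi−1, xi+1, xi−k, and xi+k]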
j send(g, Pi-1,j);
column send(g, Pi+1,j);
send(g, Pi,j-1);
i send(g, Pi,j+1);
row
recv(w, Pi-1,j);
recv(x, Pi+1,j);
recv(y, Pi,j-1);
recv(z, Pi,j+1);
send(g, Pi-1,j);
send(g, Pi+1,j);
send(g, Pi,j-1);
send(g, Pi,j+1);
recv(w, Pi-1,j);
recv(x, Pi+1,j);
recv(y, Pi,j-1);
recv(z, Pi,j+1);
P0 P1 Pp−1
P0 P1
Pp−1
n
---
p n
Square blocks
Strips
[Plot: vertical axis from 0 to 2000; horizontal axis: processors, p, from 1 to 1000 on a logarithmic scale.]
Figure 6.17 Startup times for block and strip partitions.
Process i
Array held
by process i
One row
of points
Ghost points
Copy
Array held
by process i+1
Process i+1
Figure 6.18 Configuring array into contiguous rows for each process, with ghost points.
20°C 4ft
100°C
10ft
10ft
vehicle
Airflow
Actual dimensions
selected at will
P5
P4
P
Processors 3
P2
P1
P0
Time
(a) Imperfect load balancing leading
to increased execution time
P5
P4
P
Processors 3
P2
P1
P0
t
Work pool
Queue
Tasks
Master
process
Send task
Request task
(and possibly
submit new tasks)
Slave “worker” processes
Initial tasks
Master, Pmaster
Slaves
Process
Process
Requests/tasks
Process
Process
Figure 7.4 Decentralized work pool.
Slave Pi Slave Pj
Requests Requests
Local Local
selection selection
algorithm algorithm
Master
process
P0
P1 P2 P3 Pn−1
Pcomm
If buffer empty,
make request Request for task
Ptask
P0
Task
when
requested
P1 P2
P3 P5 P4 P6
Parent
Process Final
acknowledgment
Inactive First task
Acknowledgment
Task
Other processes
Active
Figure 7.9 Termination using message acknowledgments.
Token passed to next processor
when reached local termination condition
P0 P1 P2 Pn−1
Token
AND
Task
P0 Pj Pi Pn−1
AND
Terminated AND
AND Terminated
Terminated
Summit
F
B D
A
Base camp Possible intermediate camps
F 17
E
9
51
24 D
13
14
10 8
A B C
(a) Adjacency matrix (rows: source; columns: destination)

     A   B   C   D   E   F
A    ∞  10   ∞   ∞   ∞   ∞
B    ∞   ∞   8  13  24  51
C    ∞   ∞   ∞  14   ∞   ∞
D    ∞   ∞   ∞   ∞   9   ∞
E    ∞   ∞   ∞   ∞   ∞  17
F    ∞   ∞   ∞   ∞   ∞   ∞

(b) Adjacency list (source: destination and weight; each list ends with NULL)

A: B 10
B: C 8, D 13, E 24, F 51
C: D 14
D: E 9
E: F 17
F: (empty)
Vertex j
di Vertex i wi,j
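The diagram of vertex i, its distance di and the edge weight wi,j points at the relaxation step dj = min(dj, di + wi,j). A minimal sketch of a shortest-path search built on that step, using a FIFO queue of vertices to re-examine, is given below. The names `NV`, `INF` and `moore_shortest_paths` are assumptions made for illustration; vertices A to F map to indices 0 to 5.

```c
#include <limits.h>

#define NV 6             /* vertices A..F of the example graph */
#define INF INT_MAX      /* stands in for the "no edge" entries */

/* Single-source shortest paths by repeated relaxation with a FIFO
 * vertex queue. w[i][j] is the weight of edge i -> j, or INF if absent. */
void moore_shortest_paths(int w[NV][NV], int source, int dist[NV])
{
    int queue[NV], in_queue[NV] = {0};
    int head = 0, count = 0;

    for (int v = 0; v < NV; v++) dist[v] = INF;
    dist[source] = 0;
    queue[0] = source; count = 1; in_queue[source] = 1;

    while (count > 0) {
        int i = queue[head];
        head = (head + 1) % NV; count--; in_queue[i] = 0;
        for (int j = 0; j < NV; j++) {
            if (w[i][j] == INF) continue;
            int candidate = dist[i] + w[i][j];     /* d_i + w_ij */
            if (candidate < dist[j]) {             /* keep the smaller d_j */
                dist[j] = candidate;
                if (!in_queue[j]) {                /* re-examine vertex j later */
                    queue[(head + count) % NV] = j;
                    count++; in_queue[j] = 1;
                }
            }
        }
    }
}
```

In a parallel version, the queue plays the role of a work pool from which processes take vertices.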
Master process
Start at
source
vertex
Vertex Vertex w[]
w[]
New
distance
dist
Vertex w[]
dist Process C
New
Process A distance
Other processes
dist
Process B
Entrance
Search path
Gold
Entrance
Room B
Door
Bus
Cache
TABLE 8.1 SOME EARLY PARALLEL PROGRAMMING LANGUAGES
a. Brinch Hansen, P. (1975), “The Programming Language Concurrent Pascal,” IEEE Trans. Software Eng.,
Vol. 1, No. 2 (June), pp. 199–207.
b. U.S. Department of Defense (1981), “The Programming Language Ada Reference Manual,” Lecture
Notes in Computer Science, No. 106, Springer-Verlag, Berlin.
c. Bräunl, T., R. Norz (1992), Modula-P User Manual, Computer Science Report, No. 5/92 (August), Univ.
Stuttgart, Germany.
d. Thinking Machines Corp. (1990), C* Programming Guide, Version 6, Thinking Machines System Documentation.
e. Gehani, N., and W. D. Roome (1989), The Concurrent C Programming Language, Silicon Press, New
Jersey.
f. Fox, G., S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu (1990), Fortran D
Language Specification, Technical Report TR90-141, Dept. of Computer Science, Rice University.
Main program
FORK
Spawned processes
FORK
FORK
JOIN JOIN
JOIN JOIN
Figure 8.2 FORK-JOIN construct.
Code Heap
IP
Stack
Interrupt routines
Files
(a) Process
Code Heap
Stack Thread
IP
Interrupt routines
Stack Thread
Files
IP
Main program
pthread_create(&thread1, NULL, proc1, &arg);
pthread_join(thread1, &status);

thread1
proc1(&arg)
{
return(status);
}
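The create and join pair in the figure can be written out as compilable C. The sketch below keeps the figure's names `thread1`, `proc1` and `status`; the work done by the thread (doubling an integer) and the wrapper `run_proc1` are assumptions added so that there is something to observe.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread body: doubles the integer it is handed and stores the result in
 * place. 'proc1' follows the figure's name; the work itself is assumed. */
static void *proc1(void *arg)
{
    int *value = arg;
    *value *= 2;
    return value;            /* becomes the status seen by pthread_join */
}

/* Hypothetical wrapper: runs proc1 in a new thread and waits for it.
 * Returns 0 on success and leaves the doubled value in *value. */
int run_proc1(int *value)
{
    pthread_t thread1;
    void *status = NULL;

    if (pthread_create(&thread1, NULL, proc1, value) != 0)
        return -1;
    if (pthread_join(thread1, &status) != 0)
        return -1;
    return status == value ? 0 : -1;
}
```

Note that `pthread_join` takes the address of a `void *`, which receives whatever the thread routine returned. Some toolchains need the `-pthread` flag when compiling.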
Main program
pthread_create(); Thread
pthread_create();
Thread
Thread
pthread_create(); Termination
Termination
Termination
Shared variable, x
Write Write
Read Read
+1 +1
Process 1 Process 2
Figure 8.6 Conflict in accessing shared variable.
Process 1                          Process 2
while (lock == 1) do_nothing;      while (lock == 1) do_nothing;
lock = 1;
Critical section
lock = 0;
                                   lock = 1;
                                   Critical section
                                   lock = 0;
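As written, the test `lock == 1` and the assignment `lock = 1` are separate operations, so two processes could both see 0 and both enter the critical section unless the pair is made indivisible. One way to get that, sketched here with the C11 `atomic_flag` type rather than anything shown on the slide, is an atomic test-and-set. The function names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;   /* clear means unlocked */

/* Try once: returns true if this caller obtained the lock. The read of
 * the old value and the write of 1 happen as one atomic step. */
bool spin_try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&lock, memory_order_acquire);
}

/* Busy-wait until the lock is obtained. */
void spin_lock(void)
{
    while (!spin_try_lock())
        ;                                     /* do nothing */
}

/* Release the lock so that a waiting caller can proceed. */
void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}
```

A critical section is then bracketed by `spin_lock()` and `spin_unlock()`.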
R1 R2 Resource
P1 P2 Process
R1 R2 Rn −1 Rn
P1 P2 Pn −1 Pn
Main memory
7
6
5
4
Block 3
2
1
0
Address
tag
Cache Cache
Block in cache
Processor 1 Processor 2
Figure 8.9 False sharing in caches.
sum
Array a[]
addr
Figure 8.10 Shared memory locations for Section 8.4.1 program example.
global_index sum
Array a[]
addr
Figure 8.11 Shared memory locations for Section 8.4.2 program example.
TABLE 8.2 LOGIC CIRCUIT DESCRIPTION FOR FIGURE 8.12
Test1
1 2 Output1
Test2
3 Output2
Test3
Figure 8.12 Sample logic circuit.
Log
Movement
of logs
River
Frog
Pool of threads
Master Signal
a[i] a[0] a[i] a[n-1]
Compare
Increment
counter, x
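The compare-and-count scheme in the figure is the core of rank sort: a[i] is compared with every element, the counter x records how many are smaller, and x is then the position of a[i] in the sorted list. A minimal sketch, assuming distinct values and using the hypothetical name `rank_sort`:

```c
#include <stddef.h>

/* Rank sort: for each a[i], count how many elements are smaller; that
 * count x is the index of a[i] in the sorted array b. Assumes distinct
 * values (duplicates would be written to the same slot). */
void rank_sort(const int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {        /* one processor per i in parallel */
        size_t x = 0;
        for (size_t j = 0; j < n; j++)
            if (a[i] > a[j]) x++;
        b[x] = a[i];
    }
}
```

The outer loop iterations are independent, which is what allows one processor per number.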
a[i] a[0] a[i] a[1] a[i] a[2] a[i] a[3]
Compare
0/1 0/1 0/1 0/1
Add Add
0/1/2 0/1/2
Tree
Add
0/1/2/3/4
Master
a[] b[]
Sequence of steps
P1 P2
1
A Send(A) B
If A > B send(B)
else send(A)
If A > B load A
2 else load B
Compare 3
P1 P2
1
A Send(A) B
Send(B)
2
If A > B load B If A > B load A
3 Compare Compare 3
P1 P2
Merge
88 88 98
Original 50 50 Keep
88 higher
numbers 28 28 80 numbers
25 25 50
43 98 43
42 Return
Final 42 80 lower
numbers 28 43 28
25 numbers
25 42
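When each process holds a block of numbers, compare-and-exchange becomes a merge followed by a split: the two sorted blocks are merged, one process keeps the lower half and the other the higher half. The sketch below shows only the local computation, not the message passing. The name `merge_split` is an assumption, and blocks are kept in ascending order here, whereas the figure lists them largest first.

```c
#include <stddef.h>

/* Merge two ascending blocks of n numbers each; 'low' receives the n
 * smallest and 'high' the n largest, both in ascending order. */
void merge_split(const int *p1, const int *p2, size_t n, int *low, int *high)
{
    size_t i = 0, j = 0;
    for (size_t k = 0; k < 2 * n; k++) {
        /* take from p1 when p2 is exhausted or p1's head is not larger */
        int take_p1 = (j >= n) || (i < n && p1[i] <= p2[j]);
        int value = take_p1 ? p1[i++] : p2[j++];
        if (k < n) low[k] = value;
        else       high[k - n] = value;
    }
}
```

With the figure's data, P1 ends with 25, 28, 42, 43 and P2 with 50, 80, 88, 98.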
P1 P2
Original
Merge numbers Merge
98 98 Keep
98 98
80 80 higher
88 88
43 43 numbers
80 80 (final
50 42 42 50
Keep numbers)
43 88 88 43
lower 42 42
numbers 50 50
28 28 28 28
(final 25 25
numbers) 25 25
Original
numbers
Original
sequence: 4 2 7 8 5 1 3 6
4 2 7 8 5 1 3 6
2 4 7 8 5 1 3 6
2 4 7 8 5 1 3 6
Phase 1
Place 2 4 7 8 5 1 3 6
largest
number
2 4 7 5 8 1 3 6
2 4 7 5 1 8 3 6
2 4 7 5 1 3 8 6
2 4 7 5 1 3 6 8
2 4 7 5 1 3 6 8
Phase 2
2 4 7 5 1 3 6 8
Place
next
largest 2 4 5 7 1 3 6 8
number
2 4 5 1 7 3 6 8
2 4 5 1 3 7 6 8
2 4 5 1 3 6 7 8
Phase 3
2 4 5 1 3 6 7 8
Time
Phase 1
Phase 2
2 1
Time 2 1
Phase 3
3 2 1
3 2 1
Phase 4
4 3 2 1
P0 P1 P2 P3 P4 P5 P6 P7
Step
0 4 2 7 8 5 1 3 6
1 2 4 7 8 1 5 3 6
2 2 4 7 1 8 3 5 6
3 2 4 1 7 3 8 5 6
Time 4 2 1 4 3 7 5 8 6
5 1 2 3 4 5 7 6 8
6 1 2 3 4 5 6 7 8
7 1 2 3 4 5 6 7 8
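The table traces odd-even transposition sort on eight processors: steps alternate between comparing the pairs (P0,P1), (P2,P3), ... and the pairs (P1,P2), (P3,P4), ..., and n steps are enough for n numbers. A sequential sketch, with the hypothetical name `odd_even_transposition_sort`, in which the inner loop visits the pairs that would be handled simultaneously:

```c
#include <stddef.h>

/* Odd-even transposition sort. Each step compares disjoint neighbouring
 * pairs, so on a line of n processors every pair in a step can be handled
 * at the same time; here the pairs are simply visited in a loop. */
void odd_even_transposition_sort(int *a, size_t n)
{
    for (size_t step = 0; step < n; step++) {
        /* even steps pair (0,1),(2,3),...; odd steps pair (1,2),(3,4),... */
        for (size_t i = step % 2; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}
```

Starting from 4 2 7 8 5 1 3 6, the first two passes give 2 4 7 8 1 5 3 6 and 2 4 7 1 8 3 5 6, matching the rows above.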
Smallest number
Largest number
Figure 9.11 Snakelike sorted list.
4 14 8 2 2 4 8 14 1 4 7 3
10 3 13 16 16 13 10 3 2 5 8 6
7 15 1 5 1 5 7 15 12 11 9 14
12 6 11 9 12 11 9 6 16 13 10 15
(a) Original placement (b) Phase 1 — Row sort (c) Phase 2 — Column sort
of numbers
1 3 4 7 1 3 4 2 1 2 3 4
8 6 5 2 8 6 5 7 8 7 6 5
9 11 12 14 9 11 12 10 9 10 11 12
16 15 13 10 16 15 13 14 16 15 14 13
(d) Phase 3 — Row sort (e) Phase 4 — Column sort (f) Final phase — Row sort
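The five phases above alternate row sorts (even rows ascending, odd rows descending) with column sorts, and finish with a row sort that leaves the numbers in snakelike order. A compact sketch for a 4 × 4 mesh follows. `SIDE`, `swap_if`, `sort_rows`, `sort_columns` and `shearsort` are illustrative names, and the simple neighbour-exchange passes stand in for whatever row and column sort a mesh implementation would actually use.

```c
#include <stddef.h>

#define SIDE 4   /* mesh is SIDE x SIDE; 4 matches the figure */

static void swap_if(int *lo, int *hi)      /* leave the smaller value in *lo */
{
    if (*lo > *hi) { int t = *lo; *lo = *hi; *hi = t; }
}

/* Row phase: even rows ascending left to right, odd rows descending. */
static void sort_rows(int m[SIDE][SIDE])
{
    for (size_t r = 0; r < SIDE; r++)
        for (size_t pass = 0; pass < SIDE; pass++)
            for (size_t c = 0; c + 1 < SIDE; c++) {
                if (r % 2 == 0) swap_if(&m[r][c], &m[r][c + 1]);
                else            swap_if(&m[r][c + 1], &m[r][c]);
            }
}

/* Column phase: every column ascending from top to bottom. */
static void sort_columns(int m[SIDE][SIDE])
{
    for (size_t c = 0; c < SIDE; c++)
        for (size_t pass = 0; pass < SIDE; pass++)
            for (size_t r = 0; r + 1 < SIDE; r++)
                swap_if(&m[r][c], &m[r + 1][c]);
}

void shearsort(int m[SIDE][SIDE])
{
    size_t log_side = 0;
    for (size_t s = SIDE; s > 1; s /= 2) log_side++;   /* log2(SIDE) */
    for (size_t phase = 0; phase < log_side; phase++) {
        sort_rows(m);
        sort_columns(m);
    }
    sort_rows(m);      /* final row phase: 2*log2(SIDE) + 1 phases in all */
}
```

For SIDE = 4 this performs row, column, row, column, row, the same five phases as panels (b) to (f).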
(a) Operations between elements (b) Transpose operation (c) Operations between elements
in rows in rows (originally columns)
Unsorted list
4 2 7 8 5 1 3 6 P0
4 2 7 8 5 1 3 6 P0 P4
Divide
list
4 2 7 8 5 1 3 6 P0 P2 P4 P6
4 2 7 8 5 1 3 6 P0 P1 P2 P3 P4 P5 P6 P7
2 4 7 8 1 5 3 6 P0 P2 P4 P6
Merge
2 4 7 8 1 3 5 6 P0 P4
1 2 3 4 5 6 7 8 P0
Unsorted list
Pivot
4 2 7 8 5 1 3 6 P0
3 2 1 4 5 7 8 6 P0 P4
2 1 3 4 5 7 8 6 P0 P2 P4 P6
1 2 3 6 7 8 P0 P1 P6 P7
Unsorted list
Pivot
4 2 7 8 5 1 3 6 4
3 2 1 5 7 8 6 3 5
1 2 7 8 6 1 7
2 6 8 2 6 8
Work pool
Sublists
Request
sublist Return
sublist
Slave processes
Figure 9.17 Work pool implementation of quicksort.
(a) Phase 1 000 001 010 011 100 101 110 111
≤ p1 > p1
(b) Phase 2 000 001 010 011 100 101 110 111
≤ p2 > p2 ≤ p3 > p3
(c) Phase 3 000 001 010 011 100 101 110 111
Figure 9.18 Hypercube quicksort algorithm when the numbers are originally in node 000.
Broadcast pivot, p1
(a) Phase 1 000 001 010 011 100 101 110 111
≤ p1 > p1
(b) Phase 2 000 001 010 011 100 101 110 111
≤ p2 > p2 ≤ p3 > p3
(c) Phase 3 000 001 010 011 100 101 110 111
Figure 9.19 Hypercube quicksort algorithm when numbers are distributed among nodes.
110 111
100 101
000 001
110 111
100 101
000 001
110 111
100 101
000 001
Figure 9.20 Hypercube quicksort communication.
Broadcast pivot, p1
(a) Phase 1 000 001 011 010 110 111 101 100
≤ p1 > p1
(b) Phase 2 000 001 011 010 110 111 101 100
≤ p2 > p2 ≤ p3 > p3
(c) Phase 3 000 001 011 010 110 111 101 100
a[] b[]
Sorted lists 2 4 5 8 1 3 6 7
Merge
Even indices
Odd indices Merge
c[] 1 2 5 6 d[] 3 4 7 8
Compare and
exchange
bn c2n
bn−1 c2n−1
c2n−2
Even
mergesort
b4
b3
b2
b1
an
an−1
Odd c7
mergesort c6
c5
a4 c4
a3 c3
a2 c2
a1 c1
Figure 9.23 Odd-even mergesort.
Value
a0, a1, a2, a3, … an−2, an−1 a0, a1, a2, a3, … an−2, an−1
Bitonic sequence
3 5 8 9 7 4 2 1
Compare and
exchange
3 4 2 1 7 5 8 9
Bitonic sequence Bitonic sequence
Figure 9.25 Creating two bitonic sequences from one bitonic sequence.
Unsorted numbers
3 5 8 9 7 4 2 1
Compare and
exchange
3 4 2 1 7 5 8 9
2 1 3 4 7 5 8 9
1 2 3 4 5 7 8 9
Sorted list
Figure 9.26 Sorting a bitonic sequence.
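The three rows of compare-and-exchange operations above amount to a recursive procedure: compare a[i] with a[i + n/2] across the whole sequence, then repeat on each half, which is again bitonic. A minimal sketch, assuming n is a power of two and using the hypothetical name `bitonic_merge`:

```c
#include <stddef.h>

/* Sort a bitonic sequence of n numbers (n a power of two) into ascending
 * order: compare-and-exchange a[i] with a[i + n/2], then treat each half,
 * which is itself bitonic, the same way. */
void bitonic_merge(int *a, size_t n)
{
    if (n < 2) return;
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++) {      /* independent, so parallelisable */
        if (a[i] > a[i + half]) {
            int tmp = a[i];
            a[i] = a[i + half];
            a[i + half] = tmp;
        }
    }
    bitonic_merge(a, half);
    bitonic_merge(a + half, half);
}
```

Applied to 3 5 8 9 7 4 2 1, the first level produces 3 4 2 1 7 5 8 9, as in the figure. The input must already be bitonic; an arbitrary list needs the full bitonic mergesort network first.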
Unsorted numbers
Bitonic
sorting
operation
Direction
of increasing
numbers
Sorted list
Figure 9.27 Bitonic mergesort.
Compare and exchange
ai with ai+n/2 (n numbers)
8 3 4 7 9 2 1 5 = bitonic list
Step [Fig. 9.24 (a) or (b)]
1 n=2 ai with ai+1
Form
bitonic lists 3 8 7 4 2 9 5 1
of four
numbers
2 n=4 ai with ai+2
Split
Form 3 4 7 8 5 9 2 1
bitonic list
of eight
numbers 3 n=2 ai with ai+1
Sort
3 4 7 8 9 5 2 1
88 98
50 80
Step 1 28 43
25 42
50 98
42 88
Step 2 28 80
25 43
43 98
42 88
Step 3 28 80
25 50
Column
a0,0 a0,1 a0,m−2 a0,m−1
a1,0 a1,1 a1,m−2 a1,m−1
Row
Column
Multiply Sum
j results
Row
i
ci,j
A × B = C
A × b = c
Row
sum
i ci
q Sum
Multiply results
A × B = C
a0,0 a0,1 a0,2 a0,3 b0,0 b0,1 b0,2 b0,3
×
a2,0 a2,1 a2,2 a2,3 b2,0 b2,1 b2,2 b2,3
(a) Matrices
A0,0 B0,0 A0,1 B1,0
= +
a1,0b0,0 + a1,1b1,0 a1,0b0,1 + a1,1b1,1 a1,2b2,0 + a1,3b3,0 a1,2b2,1 + a1,3b3,1
=
a1,0b0,0 + a1,1b1,0 + a1,2b2,0 + a1,3b3,0 a1,0b0,1 + a1,1b1,1 + a1,2b2,1 + a1,3b3,1
= C0,0
(b) Multiplying A0,0 × B0,0 to obtain C0,0
Column j b[][j]
Row i a[i][]
Processor Pi,j
a0,0 b0,0 a0,1 b1,0 a0,2 b2,0 a0,3 b3,0
× × × ×
P0 P1 P2 P3
+ +
P0 P2
+
P0
i j
P0 P1 P2 P3
i
P0 + P1 P2 + P3
App Apq Bpp Bpq Cpp Cpq
j
P4 + P5 P6 + P7
Aqp Aqq Bqp Bqq Cqp Cqq
P4 P5 P6 P7
Figure 10.8 Submatrix multiplication and summation.
j
i
A
Pi,j
B
Figure 10.9 Movement of A and B elements.
j
B
i
i places
A
j places ai,j+i
bi+j,j
Figure 10.10 Step 2 — Alignment of
elements of A and B.
j
B
i
A
Pi,j
b3,3
b3,2 b2,3
b3,1 b2,2 b1,3
Pumping b2,1 b1,2
b3,0 b0,3
action b1,1 b0,2
b2,0
b1,0 b0,1
b0,0
b3
b2
Pumping b1
action b0
Column
Row
Row i
aji
Step through
Row j
Cleared
to zero
Already
cleared
to zero Column i
Column
Row
n − i +1 elements
(including b[i])
Row i
Broadcast
ith row
Already
cleared
to zero
P0 P1 P2 Pn−1
Row
Row
0
P0
n/p
P1
2n/p
P2
3n/p
P3
Figure 10.17 Strip partitioning.
Row
0
n/p
P0
2n/p P1
3n/p
Solution space
∆ ∆
f(x, y)
y
x
Figure 10.19 Finite difference method.
Boundary points (see text)
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
x11 x12 x13 x14 x15 x16 x17 x18 x19 x20
x21 x22 x23 x24 x25 x26 x27 x28 x29 x30
x31 x32 x33 x34 x35 x36 x37 x38 x39 x40
x41 x42 x43 x44 x45 x46 x47 x48 x49 x50
x51 x52 x53 x54 x55 x56 x57 x58 x59 x60
x61 x62 x63 x64 x65 x66 x67 x68 x69 x70
x71 x72 x73 x74 x75 x76 x77 x78 x79 x80
x81 x82 x83 x84 x85 x86 x87 x88 x89 x90
x91 x92 x93 x94 x95 x96 x97 x98 x99 x100
Those equations with a boundary To include
point on diagonal unnecessary boundary values x1 0
for solution and some zero x2 0
entries (see text)
1 1 −4 1 1
1 1 −4 1 1
ith equation 1 1 −4 1 1 × =
ai,i−n ai,i−1 ai,i ai,i+1 ai,i+n
1 1 −4 1 1
1 1 −4 1 1
xN-1 0
xN 0
A x
Sequential order of computation
Point
computed
Point to be
computed
Red
Black
Figure 10.24 Nine-point stencil.
Coarsest grid points Finer grid points
Processor
50°C
40°C 60°C
j
Origin (0, 0)
Number
of pixels
x0 x1 x2
x3 x4 x5
x6 x7 x8
Figure 11.3 Pixel values for a 3 × 3 group.
Step 1 Step 2 Step 3 Step 4
Each pixel adds Each pixel adds Each pixel adds pixel Each pixel adds pixel
pixel from left pixel from right from above from below
x0 x1 x2 x0 x1 x2
x0 + x1 x0 + x1 + x2
x3 x4 x5 x3 x4 x5
x3 + x4 x3 + x4 + x5
x6 x7 x8 x6 x7 x8
x6 + x7 x6 + x7 + x8
x0 x1 x2 x0 x1 x2
x0 + x1 + x2 x0 + x1 + x2
x3 x4 x5 x3 x4 x5
x0 + x1 + x2
x0 + x1 + x2
x3 + x4 + x5
x3 + x4 + x5
x6 + x7 + x8
x6 x7 x8 x6 x7 x8
x6 + x7 + x8 x6 + x7 + x8
Largest Next largest
in row in row
Next largest
in column
Mask Pixels Result
w0 w1 w2 x0 x1 x2
w3 w4 w5 ⊗ x3 x4 x5 = x4'
w6 w7 w8 x6 x7 x8
Figure 11.7 Using a 3 × 3 weighted mask.
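The ⊗ operation in the figure combines the nine mask weights w0 to w8 with the nine pixels x0 to x8 to give the new centre value x4'. The sketch below assumes the usual form, a weighted sum divided by a scale value; the parameter is called `divisor` because the exact role of k is not recoverable from the figure alone. `apply_mask3` and the row-by-row image layout are likewise assumptions.

```c
#include <stddef.h>

/* Apply a 3 x 3 weighted mask centred on pixel (r, c) of an image with
 * 'cols' columns, stored row by row. The caller must keep (r, c) off the
 * image border. 'divisor' scales the weighted sum, e.g. 9 for a mask of
 * nine 1s that computes the mean. */
int apply_mask3(const int *image, size_t cols, size_t r, size_t c,
                const int w[9], int divisor)
{
    int sum = 0;
    for (size_t dr = 0; dr < 3; dr++)
        for (size_t dc = 0; dc < 3; dc++)
            sum += w[3 * dr + dc] * image[(r + dr - 1) * cols + (c + dc - 1)];
    return sum / divisor;
}
```

Every output pixel depends only on the original image, so all pixels can be computed in parallel provided the results go to a separate output array.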
Mask (scale factor 1/9, labelled k in the figure):
1 1 1
1 1 1
1 1 1
Figure 11.8 Mask to compute mean.
Mask (scale factor 1/16, labelled k in the figure):
1 1 1
1 8 1
1 1 1
Figure 11.9 A noise reduction mask.
Mask (scale factor 1/9, labelled k in the figure):
−1 −1 −1
−1 8 −1
−1 −1 −1
Figure 11.10 High-pass sharpening filter mask.
Intensity transition
First derivative
Second derivative
Figure 11.11 Edge detection using differentiation.
x
Image
y
Constant
intensity
f(x, y)
φ
−1 −1 −1 −1 0 1
0 0 0 −1 0 1
1 1 1 −1 0 1
Figure 11.13 Prewitt operator.
−1 −2 −1 −1 0 1
0 0 0 −2 0 2
1 2 1 −1 0 1
Figure 11.14 Sobel operator.
(a) Original image (Annabel) (b) Effect of Sobel operator
0 −1 0
−1 4 −1
0 −1 0
Figure 11.16 Laplace operator.
Upper pixel
x1
x3 x4 x5
Figure 11.18 Effect of Laplace operator.
y b b = −x1a + y1
y = ax + b
Pixel in image
x a
(a) (x, y) plane (b) Parameter space
y r
y = ax + b
r = x cos θ + y sin θ
(r, θ)
θ
r
x θ
(a) (x, y) plane (b) (r, θ) plane
x
θ
r
[Diagram: accumulator array; r axis marked 0, 5, 10, 15; θ axis marked 0°, 10°, 20°, 30°.]
Figure 11.22 Accumulators, acc[r][θ], for the Hough transform.
Two-dimensional transform over indices j and k: transform rows, then transform columns.
Image Transform
Filter/image
Master process sends w^0, w^1, …, w^(n−1) to the slave processes.
Stage labels: x[j], a, w^k × a, w^k.
Figure 11.26 One stage of a pipeline implementation of DFT algorithm.
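The stage can be simulated sequentially as in the sketch below. It assumes that stage j holds x[j], adds a × x[j] to the running X[k], and passes on w^k × a and w^k. The convention w = e^(−2πi/N) with no 1/N scaling, and all function names, are also assumptions.

```python
import cmath

def pipeline_stage(X_k, a, w_k, x_j):
    """One stage: accumulate a * x[j], then advance a by one power of w^k."""
    return X_k + a * x_j, a * w_k, w_k

def dft_pipeline(x):
    """Sequential simulation: pass (X[k], a, w^k) through all N stages for each k."""
    N = len(x)
    w = cmath.exp(-2j * cmath.pi / N)   # assumed sign convention
    result = []
    for k in range(N):
        X_k, a, w_k = 0, 1, w ** k      # X[k] enters as 0 and a enters as 1
        for x_j in x:                   # stage j holds x[j]
            X_k, a, w_k = pipeline_stage(X_k, a, w_k, x_j)
        result.append(X_k)
    return result

X = dft_pipeline([1, 2, 3, 4])
```

In a real pipeline the values for successive k follow one another through the stages, so the stages work concurrently; the nested loops here only reproduce the arithmetic.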
(a) Pipeline structure: stages P0, P1, P2, P3, …, PN−1 hold x[0], x[1], x[2], x[3], …, x[N−1]. X[k] enters as 0, a enters as 1, and w^k is passed from stage to stage. Output sequence: X[0], X[1], X[2], X[3], …
(b) Timing diagram: pipeline stages P0 to PN−1 plotted against time.
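Input sequence x0, x1, x2, x3, …, xN−2, xN−1: the even-indexed inputs go to one N/2-point DFT (producing Xeven) and the odd-indexed inputs to another (producing Xodd).
Transform outputs:
Xk = Xeven + w^k × Xodd
Xk+N/2 = Xeven − w^k × Xodd
k = 0, 1, …, N/2 − 1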
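The decomposition into two N/2-point DFTs can be applied recursively, as in this minimal sketch. It assumes N is a power of two and w = e^(−2πi/N) with no 1/N scaling; the function name is chosen for illustration.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    X_even = fft(x[0::2])            # N/2-point DFT of x0, x2, ...
    X_odd = fft(x[1::2])             # N/2-point DFT of x1, x3, ...
    X = [0] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * X_odd[k]   # w^k * Xodd
        X[k] = X_even[k] + t
        X[k + N // 2] = X_even[k] - t
    return X

X = fft([1, 2, 3, 4])
```

The two recursive calls are independent of each other, which is what makes the decomposition attractive for parallel execution.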
x0 + + X0
x1 + − X1
x2 + + X2
x3 + − X3
Figure 11.29 Four-point discrete Fourier transform.
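Xk = Σ(0,2,4,6,8,10,12,14) + w^k Σ(1,3,5,7,9,11,13,15)
   = {Σ(0,4,8,12) + w^k Σ(2,6,10,14)} + w^k {Σ(1,5,9,13) + w^k Σ(3,7,11,15)}
   = {[Σ(0,8) + w^k Σ(4,12)] + w^k [Σ(2,10) + w^k Σ(6,14)]} + w^k {[Σ(1,9) + w^k Σ(5,13)] + w^k [Σ(3,11) + w^k Σ(7,15)]}
(Here w^k is shorthand for the twiddle factor at each level; its value differs from level to level.)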
Sixteen-point computation: inputs x0 to x15 on the left, outputs X0 to X15 on the right.
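Process   Row (P/r)   Inputs   Outputs
P0        0000        x0       X0
P0        0001        x1       X1
P0        0010        x2       X2
P0        0011        x3       X3
P1        0100        x4       X4
P1        0101        x5       X5
P1        0110        x6       X6
P1        0111        x7       X7
P2        1000        x8       X8
P2        1001        x9       X9
P2        1010        x10      X10
(In each row number, the two high-order bits give the process P and the two low-order bits give the row r within that process.)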
P0 P1 P2 P3
x0 x1 x2 x3
x4 x5 x6 x7
x8 x9 x10 x11
P0 P1 P2 P3
x0 x1 x2 x3
x4 x5 x6 x7
x8 x9 x10 x11
P0 P1 P2 P3
x0 x4 x8 x12
x1 x5 x9 x13
x2 x6 x10 x14
First choice C0 C1 Cn−1
Third choice
Position:  1 … p    p+1 … m
Parent A:  A1       A2
Parent B:  B1       B2
Child 1:   A1       B2
Child 2:   B1       A2
Figure 12.2 Single-point crossover.
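A minimal sketch of the crossover in Figure 12.2. Representing individuals as strings and the function name are assumptions; p is the cut position, so positions 1 … p form the first part and p+1 … m the second.

```python
def single_point_crossover(parent_a, parent_b, p):
    """Swap the tails of two equal-length parents after position p."""
    child_1 = parent_a[:p] + parent_b[p:]   # A1 followed by B2
    child_2 = parent_b[:p] + parent_a[p:]   # B1 followed by A2
    return child_1, child_2

# Hypothetical 8-bit individuals, cut after position p = 3.
child_1, child_2 = single_point_crossover("11111111", "00000000", 3)
```

In practice p would usually be drawn at random for each pair of parents.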
Subpopulation. Migration path: every island sends to every other island.
Island subpopulations
A program issues instructions to processors with local memory, all driven by a common clock; data passes between the processors and shared memory.
Figure D.1 PRAM model.
Each element i holds a value d[i] and a pointer s[i]; the last pointer, s[7], is Null. Values of d[i] at successive steps:
i:      0  1  2  3  4  5  6  7
d[i]:   1  1  1  1  1  1  1  0
d[i]:   2  2  2  2  2  2  1  0
d[i]:   4  4  4  4  3  2  1  0
d[i]:   7  6  5  4  3  2  1  0
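The successive d values above are consistent with pointer jumping, sketched here as a sequential simulation. The update rule d[i] = d[i] + d[s[i]], s[i] = s[s[i]] and the function name are assumptions; None stands for the Null pointer, and copies of the arrays emulate all processors reading before any of them writes.

```python
def pointer_jumping(s):
    """s[i] is the index of the successor of element i, or None at the end."""
    n = len(s)
    s = list(s)
    d = [0 if s[i] is None else 1 for i in range(n)]
    history = [list(d)]
    while any(nxt is not None for nxt in s):
        new_d, new_s = list(d), list(s)      # synchronous step: read old, write new
        for i in range(n):
            if s[i] is not None:
                new_d[i] = d[i] + d[s[i]]
                new_s[i] = s[s[i]]
        d, s = new_d, new_s
        history.append(list(d))
    return d, history

# Eight-element list 0 -> 1 -> ... -> 7, as in the values shown above.
distances, history = pointer_jumping([1, 2, 3, 4, 5, 6, 7, None])
```

For eight elements the loop runs three times, matching the logarithmic number of steps suggested by the doubling pattern in the values.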
Threads or processes: local computation (maximum time w), then communication (maximum of h sends or receives), then barrier synchronization.
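Assuming the figure depicts one superstep of the bulk synchronous parallel (BSP) model, the time for that step is commonly estimated as w + h × g + l. The parameters g (communication cost per message) and l (barrier cost) do not appear in the figure and are assumptions here, as are the function names and the sample numbers.

```python
def superstep_time(w, h, g, l):
    """Estimated time of one step: computation + communication + barrier.

    w: maximum local computation time over all processes
    h: maximum number of sends or receives by any process
    g: assumed cost per message; l: assumed barrier synchronization cost
    """
    return w + h * g + l

def program_time(steps, g, l):
    """Total estimate for a sequence of (w, h) steps."""
    return sum(superstep_time(w, h, g, l) for w, h in steps)

# Hypothetical numbers, in arbitrary time units.
one_step = superstep_time(w=100, h=10, g=4, l=50)
total = program_time([(100, 10), (40, 2)], g=4, l=50)
```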
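Time line for processors Pi and Pk: Pi spends overhead o sending a message, the message takes latency L to reach Pk, and Pk spends overhead o receiving it; g is the gap before Pi can send its next message.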
Figure D.4 LogP parameters.
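A minimal sketch of how the parameters combine into message times. The formulas are the usual LogP estimates, not taken from the figure; the function names and the sample values are assumptions.

```python
def message_time(L, o):
    """One message: send overhead + network latency + receive overhead."""
    return o + L + o

def stream_time(n, L, o, g):
    """n back-to-back messages from one processor to another.

    Successive sends are assumed to start max(g, o) apart, so the last
    message starts after (n - 1) such intervals and then takes o + L + o.
    """
    return (n - 1) * max(g, o) + message_time(L, o)

# Hypothetical parameter values, in arbitrary time units.
single = message_time(L=6, o=2)
three = stream_time(3, L=6, o=2, g=4)
```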