Using MPI
We illustrate the collective communication commands to scatter data and gather results. Point-
to-point communication happens via a send and a recv (receive) command. Consider the
computation of the sum

S = \sum_{i=1}^{100} i.

Scattering an array of 100 numbers over 4 processors and gathering the 4 partial sums at the
root is displayed in Fig. 18.
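With 4 processes, each process receives 25 consecutive numbers and computes one partial sum.
Assuming the 100 numbers are the integers 1 through 100, split into four consecutive blocks of
25, the partial sums are

\sum_{i=1}^{25} i = 325, \quad \sum_{i=26}^{50} i = 950, \quad
\sum_{i=51}^{75} i = 1575, \quad \sum_{i=76}^{100} i = 2200,

and the root adds 325 + 950 + 1575 + 2200 = 5050.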
The scatter and gather are of the collective communication type, as every process in the
universe participates in this operation. The MPI commands to scatter and gather are
respectively MPI_Scatter and MPI_Gather .
The specifications of the MPI command to scatter data from one member to all members of a
group are described in Table 2. The specifications of the MPI command to gather data from all
members to one member in a group are listed in Table 3.
Table 2 Arguments of the MPI_Scatter command.
MPI_SCATTER(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,root,comm)
sendbuf address of send buffer
sendcount number of elements sent to each process
sendtype data type of send buffer elements
recvbuf address of receive buffer
recvcount number of elements in receive buffer
recvtype data type of receive buffer elements
root rank of sending process
comm communicator
Table 3 Arguments of the MPI_Gather command.
MPI_GATHER(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,root,comm)
sendbuf starting address of send buffer
sendcount number of elements in send buffer
sendtype data type of send buffer elements
recvbuf address of receive buffer
recvcount number of elements for any single receive
recvtype data type of receive buffer elements
root rank of receiving process
comm communicator
The code for parallel summation, in the program parallel_sum.c , illustrates the scatter and the
gather.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main ( int argc, char *argv[] )
{
   int myid,j,data[100],tosum[25],sums[4];

   MPI_Init(&argc,&argv);
   MPI_Comm_rank(MPI_COMM_WORLD,&myid);

   if(myid==0) /* the root defines the numbers 1, 2, .., 100 */
      for(j=0; j<100; j++) data[j] = j+1;

   MPI_Scatter(data,25,MPI_INT,tosum,25,MPI_INT,0,MPI_COMM_WORLD);

   sums[myid] = 0; /* every process sums its 25 numbers */
   for(j=0; j<25; j++) sums[myid] += tosum[j];

   MPI_Gather(&sums[myid],1,MPI_INT,sums,1,MPI_INT,0,MPI_COMM_WORLD);

   if(myid==0) /* after the gather, sums contains the four sums */
   {
      printf("The four sums : ");
      printf("%d",sums[0]);
      for(j=1; j<4; j++) printf(" + %d", sums[j]);
      for(j=1; j<4; j++) sums[0] += sums[j];
      printf(" = %d, which should be 5050.\n",sums[0]);
   }
   MPI_Finalize();
   return 0;
}
A session with the parallel code with 4 processes prints the four partial sums and their total
of 5050.
Point-to-point communication with MPI happens via MPI_Send and MPI_Recv. The syntax of the
blocking send operation is in Table 4. Table 5 explains the blocking receive operation.
Code for a parallel square is below. Every MPI_Send is matched by an MPI_Recv. Observe that
there are two loops in the code. One loop is explicitly executed by the root. The other,
implicit loop is executed by the mpiexec -n p command.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
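/* What follows after the headers above is a minimal sketch, not the original
   listing, of such a two-loop squaring program: it assumes the root sends the
   number i to process i and receives back its square; the variable names
   below are assumptions made for the sketch. */

int main ( int argc, char *argv[] )
{
   int myid,p,i,x,square;
   MPI_Status status;

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD,&p);
   MPI_Comm_rank(MPI_COMM_WORLD,&myid);

   if(myid == 0)          /* the explicit loop, executed by the root */
   {
      for(i=1; i<p; i++)  /* send the number i to process i */
         MPI_Send(&i,1,MPI_INT,i,0,MPI_COMM_WORLD);

      for(i=1; i<p; i++)  /* receive the square computed by process i */
      {
         MPI_Recv(&x,1,MPI_INT,i,1,MPI_COMM_WORLD,&status);
         printf("The square of %d is %d.\n",i,x);
      }
   }
   else /* the implicit loop : mpiexec -n p runs this once per process */
   {
      MPI_Recv(&x,1,MPI_INT,0,0,MPI_COMM_WORLD,&status);
      square = x*x;
      MPI_Send(&square,1,MPI_INT,0,1,MPI_COMM_WORLD);
   }
   MPI_Finalize();
   return 0;
}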
To time a computation, use MPI_Wtime(), which returns the wall clock time in seconds as a double.

double startwtime,endwtime,totalwtime;

startwtime = MPI_Wtime();
/* code to be timed */
endwtime = MPI_Wtime();
totalwtime = endwtime - startwtime;
The story of Fig. 19 can be told as follows. Consider the distribution of a pile of 8 pages among
8 people. We can do this in three stages:

1. The first person keeps half of the pile and hands the other 4 pages to a second person.
2. The two persons holding pages each keep half of what they hold and hand the other 2 pages
   to a person who has none.
3. The four persons holding pages each keep one page and hand their other page to one of the
   four persons who still have nothing.

Already from this simple example, we observe the pattern needed to formalize the algorithm. At
stage k (counting the stages from 0), processor i communicates with the processor with
identification number i + 2^k.
The algorithm for the fan out broadcast has a short description: at every stage, each processor
that has the data sends it to a processor that does not have it yet, so the number of processors
holding the data doubles.
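A minimal sketch of a fan out broadcast of one integer with MPI_Send and MPI_Recv could look as
follows; the broadcast value 572 and the variable names are arbitrary choices for the
illustration, not taken from the notes.

#include <stdio.h>
#include <mpi.h>

int main ( int argc, char *argv[] )
{
   int myid,p,k,data = 0;

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD,&p);
   MPI_Comm_rank(MPI_COMM_WORLD,&myid);

   if(myid == 0) data = 572; /* the root owns the item to broadcast */

   for(k=1; k<p; k=2*k) /* at each stage, the k processes with the data */
   {                    /* send it to the processes k positions higher  */
      if(myid < k)
      {
         if(myid + k < p)
            MPI_Send(&data,1,MPI_INT,myid+k,0,MPI_COMM_WORLD);
      }
      else if(myid < 2*k)
         MPI_Recv(&data,1,MPI_INT,myid-k,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
   }
   printf("Process %d has the data %d.\n",myid,data);

   MPI_Finalize();
   return 0;
}

With the collective MPI_Bcast the same effect is obtained in one call.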
The cost to broadcast one item is O(p) for a sequential broadcast and O(log_2(p)) for a fan out
broadcast. The cost to scatter n items is O(p × n/p) for a sequential scatter; with a fan out
scatter the number of communication stages drops from p - 1 to log_2(p).
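For instance, with p = 64 processes (a process count chosen only for illustration), the
comparison for the broadcast reads

p - 1 = 63 \quad \text{sends versus} \quad \log_2(64) = 6 \quad \text{stages}.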
Every send must have a matching recv. For the script below to continue past the send, process 1
must do the corresponding recv. On Python objects, mpi4py uses pickle to serialize the data.
Alternatively, the user can declare the MPI types explicitly.
from mpi4py import MPI

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
if(RANK == 0):
    DATA = {'a': 7, 'b': 3.14}
    COMM.send(DATA, dest=1, tag=11)
    print RANK, 'sends', DATA, 'to 1'
elif(RANK == 1):
    DATA = COMM.recv(source=0, tag=11)
    print RANK, 'received', DATA, 'from 0'
With mpi4py we can either rely on Python's dynamic typing or declare types explicitly when
processing numpy arrays. To sum an array of numbers, we distribute the numbers among the
processes, and each process computes the sum of its slice. The sums of the slices are sent to
process 0, which computes the total sum. The code for the script is listed below.
from mpi4py import MPI
import numpy as np

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
SIZE = COMM.Get_size()

N = 10
if(RANK == 0):
    DATA = np.arange(N*SIZE, dtype='i')
    for i in range(1, SIZE):      # send the i-th slice to process i
        SLICE = DATA[i*N:(i+1)*N]
        COMM.Send([SLICE, MPI.INT], dest=i)
    MYDATA = DATA[0:N]
else:
    MYDATA = np.empty(N, dtype='i')
    COMM.Recv([MYDATA, MPI.INT], source=0)
S = sum(MYDATA)
print RANK, 'has data', MYDATA, 'sum =', S
if(RANK > 0):                     # every other process sends its partial sum to the root
    COMM.send(S, dest=0)
else:                             # the root accumulates the total sum
    TOTAL = S
    for i in range(1, SIZE):
        TOTAL += COMM.recv(source=i)
    print 'the total sum :', TOTAL
Recall that Python is case sensitive and the distinction between Send and send , and between
Recv and recv is important. In particular, COMM.send and COMM.recv have no type declarations,
whereas COMM.Send and COMM.Recv have type declarations.
Exercises
1. Adjust the parallel summation to work for p processors where the dimension n of the
array is a multiple of p .
2. Use C or Python to rewrite the program to sum 100 numbers using MPI_Send and MPI_Recv
instead of MPI_Scatter and MPI_Gather .
3. Use C or Python to rewrite the program to square p numbers using MPI_Scatter and
MPI_Gather .
4. Show that a hypercube network topology has enough direct connections between
processors for a fan out broadcast.