
MPI MESSAGE PASSING INTERFACE

Parallel Processing Course, University of Tehran


Fall 1392

Distributed Computing Paradigms


Communication Models:
  Message Passing (send, receive, broadcast, ...)
  Shared Memory (load, store, lock, unlock)
Computation Models:
  Functional (Task) Parallel - MIMD
  Data Parallel - SIMD

Explicit Parallelism
Same idea as multithreading for shared memory.
Explicit parallelism is more common with message passing.
The user has explicit control over processes.
  Good: that control can be exploited for performance.
  Bad: the user has to deal with it.

Distributed Memory - Message Passing

[Figure: processors proc1 ... procN, each with its own local memory mem1 ... memN, connected by a network]

Distributed Memory - Message Passing


A variable x, a pointer p, or an array a[] refers to different memory locations, depending on the processor.
A process is a program counter and an address space.
Message passing is a programming model (it can run on any hardware).
Message passing is used for communication among processes.

Inter-process communication:
  Type: synchronous / asynchronous
  Movement of data from one process's address space to another's

Synchronous vs. Asynchronous

A synchronous communication is not complete until the message has been received.
An asynchronous communication completes as soon as the message is on its way.

What does the user have to do?


This is what we said for shared memory:
  Decide how to decompose the computation into parallel parts.
  Create (and destroy) processes to support that decomposition.
  Add synchronization to make sure dependences are covered.
Is the same true for message passing?

What does the user need to do?


Divide up the program into parallel parts.
Create and destroy processes to do the above.
Partition and distribute the data.
Communicate data at the right time.
(Sometimes) perform index translation.
Still need synchronization? Sometimes, but it often goes hand in hand with data communication.

Message Passing Systems


Provide process creation and destruction.
Provide message passing facilities (send and receive, in various flavors) to distribute and communicate data.
Provide additional synchronization facilities.

Message Passing Interface


Derived from several previous libraries: PVM, P4, Express.
A standard message-passing library that includes the best of several previous libraries.
Versions for C/C++ and Fortran.
Available for free.
Can be installed on:
  Networks of workstations
  Parallel computers (Cray T3E, IBM SP2, Parsytec PowerXplorer, others)

MPI Services

Hides details of the architecture.
Not a language or compiler specification.
Not a specific implementation or product.
Hides details of message passing and buffering.
Provides message management services:
  packaging
  send, receive
  broadcast, reduce, scatter, gather
  message modes

MPI Program Organization


MIMD: Multiple Instruction, Multiple Data
  Every processor runs a different program.
SPMD: Single Program, Multiple Data
  Every processor runs the same program.
  Each processor computes with different data.
  Variation of computation on different processors through if or switch statements.

MPI Program Organization

MIMD in an SPMD framework:
  Different processors can follow different computation paths.
  Branch on if or switch based on processor identity (as sketched below).
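
As a minimal illustration of SPMD with rank-based branching (the roles and printed messages are made up for illustration), every process runs the same executable and picks its work from its rank:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("I am the coordinator\n");   /* e.g. distribute work, collect results */
    else
        printf("I am worker %d\n", rank);   /* e.g. compute on a local piece of data */

    MPI_Finalize();
    return 0;
}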

MPI Basics
Starting and finishing
Identifying yourself
Sending and receiving messages
Communicator:
  A collection of processes
  Determines the scope to which messages are relative
  The identity of a process (its rank) is relative to a communicator
  Scope of global communications (broadcast, etc.)

MPI starting and finishing


The statement needed in every program before any other MPI code:
  MPI_Init(&argc, &argv);
The last statement of MPI code must be:
  MPI_Finalize();
The program will not terminate without this statement.

MPI Process Identification


MPI_Comm_size(comm, &size)
  Determines the number of processes.
MPI_Comm_rank(comm, &pid)
  pid is the process identifier (rank) of the caller.

MPI Messages
The message content is a sequence of bytes.
A message needs a wrapper, analogous to an envelope for a letter:

  Letter                      Message
  Address                     Destination
  Return Address              Source
  Type of Mailing (class)     Message type
  Letter Weight               Size (count)
  Country                     Communicator
  Magazine                    Broadcast

MPI Basic Send


MPI_Send(buf, count, datatype, dest, tag, comm)
  buf: address of the send buffer
  count: number of elements
  datatype: data type of the send buffer elements
  dest: process id of the destination process
  tag: message tag (ignore for now)
  comm: communicator (ignore for now)

MPI Basic Receive


MPI_Recv(buf, count, datatype, source, tag, comm, &status)
  buf: address of the receive buffer
  count: size of the receive buffer in elements
  datatype: data type of the receive buffer elements
  source: source process id, or MPI_ANY_SOURCE
  tag and comm: ignore for now
  status: status object (see the sketch below)
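
The status object can be inspected after the receive. A minimal sketch (it assumes buf and count are declared as in the parameter list above, and that integers are being received) using MPI_Get_count and the MPI_SOURCE / MPI_TAG fields:

/* assumes buf and count as above */
MPI_Status status;
int received;

MPI_Recv(buf, count, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &received);    /* how many items actually arrived */
printf("got %d ints from rank %d with tag %d\n",
       received, status.MPI_SOURCE, status.MPI_TAG);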

Data Types
The data in a message which is sent or received is described by a triple (address, count, datatype).
The following data types are supported by MPI:
  Predefined data types corresponding to data types of the programming language.
  Arrays.
  Sub-blocks of a matrix (see the derived-datatype sketch below).
  User-defined data structures.
  A set of predefined data types.
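
As a sketch of the "sub-block of a matrix" case (the matrix size, block size, and ranks are chosen only for illustration), a derived datatype built with MPI_Type_vector lets a strided block be sent in one call; run with at least two processes:

#include <mpi.h>
#include <stdio.h>
#define N 4     /* full matrix is N x N (assumed size) */
#define B 2     /* send a B x B sub-block from the top-left corner */

int main(int argc, char *argv[])
{
    int rank, i, j;
    double a[N][N], sub[B][B];
    MPI_Datatype block;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* B rows of B elements each, separated by a stride of N elements */
    MPI_Type_vector(B, B, N, MPI_DOUBLE, &block);
    MPI_Type_commit(&block);

    if (rank == 0) {
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i][j] = i * N + j;
        MPI_Send(&a[0][0], 1, block, 1, 0, MPI_COMM_WORLD);   /* one "block" item */
    } else if (rank == 1) {
        /* received as B*B contiguous doubles on the other side */
        MPI_Recv(&sub[0][0], B * B, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (i = 0; i < B; i++)
            for (j = 0; j < B; j++)
                printf("%4.0f ", sub[i][j]);
        printf("\n");
    }

    MPI_Type_free(&block);
    MPI_Finalize();
    return 0;
}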

Basic MPI types


MPI datatype           C datatype
MPI_CHAR               char
MPI_SIGNED_CHAR        signed char
MPI_UNSIGNED_CHAR      unsigned char
MPI_SHORT              signed short
MPI_UNSIGNED_SHORT     unsigned short
MPI_INT                signed int
MPI_UNSIGNED           unsigned int
MPI_LONG               signed long
MPI_UNSIGNED_LONG      unsigned long
MPI_FLOAT              float
MPI_DOUBLE             double
MPI_LONG_DOUBLE        long double

Why specify the data type when sending a message?

Because communication may take place between heterogeneous machines, which may use different data representations and lengths in memory.

Message Passing Example


#include <stdio.h>
#include <string.h>
#include "mpi.h"            /* includes MPI library code specs */

#define MAXSIZE 100

int main(int argc, char* argv[])
{
    int myRank;             /* rank (identity) of process     */
    int numProc;            /* number of processors           */
    int source;             /* rank of sender                 */
    int dest;               /* rank of destination            */
    int tag = 0;            /* tag to distinguish messages    */
    char mess[MAXSIZE];     /* message (other types possible) */
    int count;              /* number of items in message     */
    MPI_Status status;      /* status of message received     */

Message Passing Example


    MPI_Init(&argc, &argv);                     /* start MPI                */
    MPI_Comm_size(MPI_COMM_WORLD, &numProc);    /* get number of processes  */
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);     /* get rank of this process */

    /***********************************************/
    /* code to send, receive and process messages  */
    /***********************************************/

    MPI_Finalize();                             /* shut down MPI            */
}

Message Passing Example


    if (myRank != 0) {   /* all processes send to root */
        /* create message */
        sprintf(mess, "Hello from %d", myRank);
        dest = 0;                        /* destination is root     */
        count = strlen(mess) + 1;        /* include '\0' in message */
        MPI_Send(mess, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    else {   /* root (0) process receives and prints messages */
        /* from each processor in rank order */
        for (source = 1; source < numProc; source++) {
            MPI_Recv(mess, MAXSIZE, MPI_CHAR, source, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", mess);
        }
    }

Output
> mpirun -np 4 ./helloworld
Hello from 1
Hello from 2
Hello from 3

Point-to-Point communications

A synchronous communication does not complete until the message has been received.

An asynchronous communication completes as soon as the message is on its way.

Non-blocking operations

Non-blocking communication allows useful work to be performed while waiting for the communication to complete (see the sketch below).
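
A minimal sketch of the point-to-point flavors, assuming exactly two processes (ranks and the exchanged value are made up): MPI_Ssend is the explicitly synchronous send, MPI_Send is the standard send, and MPI_Irecv starts a receive that is completed later with MPI_Wait, leaving room for useful work in between.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, other, out, in;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    other = (rank + 1) % 2;              /* assumes exactly 2 processes */
    out = rank;

    /* post the receive first, without blocking */
    MPI_Irecv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &req);

    /* standard send; MPI_Ssend here would not return until the matching receive starts */
    MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);

    /* ... useful work could be done here while the message is in flight ... */

    MPI_Wait(&req, &status);             /* now the received value is valid */
    printf("rank %d received %d\n", rank, in);

    MPI_Finalize();
    return 0;
}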

Collective communications

Broadcast
A broadcast sends a message to a number of recipients

Barrier
A barrier operation synchronises a number of processors.

Reduction operations
Reduction operations reduce data from a number of processors to a single item.

Introduction to collective operations in MPI


Collective operations are called by all processes in a communicator.

MPI_Bcast distributes data from one process (the root) to all others in a communicator.
  Syntax: MPI_Bcast(void *message, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

MPI_Reduce combines data from all processes in a communicator and returns it to one process.
  Syntax: MPI_Reduce(void *message, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

In many numerical algorithms, send/receive can be replaced by Bcast/Reduce, improving both simplicity and efficiency (see the sketch below).
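
As a sketch of this pattern (the pi.c compiled later in these slides is presumably in this spirit, but its source is not reproduced here, so the names and interval count are assumptions): the root broadcasts the problem size, every process integrates its own strips of 4/(1+x^2), and a reduction combines the partial sums.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i, n = 1000000;     /* number of intervals (assumed) */
    double h, x, local = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* root broadcasts the problem size (in a real pi.c it might be read on the root) */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* each process integrates its own strips of 4/(1+x^2) over [0,1] */
    h = 1.0 / (double)n;
    for (i = rank; i < n; i += size) {
        x = h * ((double)i + 0.5);
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    /* combine the partial sums on the root */
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}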

Collective Message Passing


Broadcast: sends a message from one process to all processes in the group.
Scatter: distributes each element of a data array to a different process for computation.
Gather: the reverse of scatter; retrieves data elements into an array from multiple processes.

Collective Message Passing w/MPI


MPI_Bcast()            Broadcast from root to all other processes
MPI_Gather()           Gather values for group of processes
MPI_Scatter()          Scatters buffer in parts to group of processes
MPI_Alltoall()         Sends data from all processes to all processes
MPI_Reduce()           Combine values on all processes to a single value
MPI_Reduce_Scatter()   Combine values and scatter the results

(A short Scatter/Gather example follows.)
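
A minimal sketch of the scatter/compute/gather pattern (the array contents and the doubling step are made up; it assumes N is divisible by the number of processes):

#include <mpi.h>
#include <stdio.h>
#define N 8                     /* assumed: divisible by the number of processes */

int main(int argc, char *argv[])
{
    int rank, size, i, chunk;
    int data[N], part[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    chunk = N / size;

    if (rank == 0)                        /* root prepares the full array */
        for (i = 0; i < N; i++)
            data[i] = i;

    /* each process receives its chunk, works on it, and sends it back */
    MPI_Scatter(data, chunk, MPI_INT, part, chunk, MPI_INT, 0, MPI_COMM_WORLD);
    for (i = 0; i < chunk; i++)
        part[i] *= 2;
    MPI_Gather(part, chunk, MPI_INT, data, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (i = 0; i < N; i++)
            printf("%d ", data[i]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}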

Broadcasting a message

Broadcast: one sender, many receivers.
Includes all processes in the communicator; all processes must make an equivalent call to MPI_Bcast.
Any processor may be the sender (root), as determined by the fourth parameter.
The first three parameters specify the message, as for MPI_Send and MPI_Recv; the fifth parameter specifies the communicator.
Broadcast serves as a global synchronization.

MPI_Bcast() Syntax

MPI_Bcast(mess, count, MPI_INT, root, MPI_COMM_WORLD);
  mess: pointer to message buffer
  count: number of items sent
  MPI_INT: type of item sent (note: count and type should be the same on all processors)
  root: sending processor
  MPI_COMM_WORLD: communicator within which the broadcast takes place

Examine add.c

Compile & execute add.c

Edit add_mpi.c

MPI Matrix Multiply (w/o Index Translation)


int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    from = (myrank * n) / p;
    to   = ((myrank + 1) * n) / p;

    /* Data distribution */ ...
    /* Computation */ ...
    /* Result gathering */ ...

    MPI_Finalize();
}

MPI Matrix Multiply (w/o Index Translation)


/* Data distribution */
if (myrank != 0) {
    MPI_Recv(&a[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
    MPI_Recv(&b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
} else {
    for (i = 1; i < p; i++) {
        /* send process i its own block of rows of a, plus all of b */
        MPI_Send(&a[(i*n)/p], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD);
        MPI_Send(&b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD);
    }
}

MPI Matrix Multiply (w/o Index Translation)


/* Computation */
for (i = from; i < to; i++)
    for (j = 0; j < n; j++) {
        c[i][j] = 0;
        for (k = 0; k < n; k++)
            c[i][j] += a[i][k] * b[k][j];
    }

MPI Matrix Multiply (w/o Index Translation)


/* Result gathering */
if (myrank != 0)
    MPI_Send(&c[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD);
else
    for (i = 1; i < p; i++)
        /* receive process i's rows into its own block of c */
        MPI_Recv(&c[(i*n)/p], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
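
As suggested earlier, the send/receive loop above can be replaced by a collective. A sketch of the result gathering done with MPI_Gather instead (same variables as above; it assumes n is divisible by p and uses the MPI-2 MPI_IN_PLACE option so the root keeps its own rows where they are):

/* Result gathering with a collective */
if (myrank == 0)
    /* root's rows are already in place, so it contributes MPI_IN_PLACE */
    MPI_Gather(MPI_IN_PLACE, n*n/p, MPI_INT,
               &c[0], n*n/p, MPI_INT, 0, MPI_COMM_WORLD);
else
    MPI_Gather(&c[from], n*n/p, MPI_INT,
               NULL, 0, MPI_INT, 0, MPI_COMM_WORLD);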

Compile and run the code


Compile using:
  mpicc -o pi pi.c
or:
  mpic++ -o pi pi.cpp

Run using:
  mpirun -np <number of procs> -machinefile XXX pi

-machinefile tells MPI to run the program on the machines listed in XXX (an example machinefile is sketched below).
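
The machinefile is a plain text file listing the hosts to run on, one per line; a minimal sketch (hostnames invented; the exact syntax for per-host process counts varies between MPI implementations):

node01
node02
node03
node04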

Toward a Portable MPI Environment


MPICH: a high-performance, portable implementation of MPI (MPI-1 and MPI-2).
Runs on MPPs, clusters, and heterogeneous networks of workstations.
In a wide variety of environments, one can do:

  configure
  make
  mpicc -mpitrace myprog.c
  mpirun -np 10 myprog
    or: mpiexec -n 10 myprog
  mpirun -n 1 -host machine1 test : -n 1 -host machine2 test

to build, compile, run, and analyze performance.

Others: LAM MPI, Open MPI, vendor-specific MPIs.

MPI Sources
Standard: http://www.mpi-forum.org

Books:
  Using MPI: Portable Parallel Programming with the Message-Passing Interface, by Gropp, Lusk, and Skjellum, MIT Press, 1994.
  MPI: The Complete Reference, by Snir, Otto, Huss-Lederman, Walker, and Dongarra, MIT Press, 1996.
  Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995.
  Parallel Programming with MPI, by Peter Pacheco, Morgan Kaufmann, 1997.
  MPI: The Complete Reference, Vol. 1 and 2, MIT Press, 1998 (Fall).

Other information on the Web: http://www.mcs.anl.gov/mpi
