
MPI MESSAGE PASSING INTERFACE

Parallel Processing Course, University of Tehran


Fall 1392

Distributed Computing Paradigms


Communication Models:
  Message Passing (send, receive, broadcast, ...)
  Shared Memory (load, store, lock, unlock)
Computation Models:
  Functional (Task) Parallel - MIMD
  Data Parallel - SIMD

Explicit Parallelism
Same idea as multithreading for shared memory.
Explicit parallelism is more common with message passing.
The user has explicit control over processes.
  Good: that control can be exploited for performance.
  Bad: the user has to deal with it.

Distributed Memory - Message Passing

[Figure: processors proc1 ... procN, each with its own local memory mem1 ... memN, connected by a network]

Distributed Memory - Message Passing


A variable x, a pointer p, or an array a[] refers to different memory locations, depending on the processor.
A process is a program counter and an address space.
Message passing is a programming model (it can run on any hardware).
Message passing is used for communication among processes.

Inter-process communication:
  Type: synchronous / asynchronous
  Movement of data from one process's address space to another's

Synchronous vs. Asynchronous

A synchronous communication is not complete until the message has been received.
An asynchronous communication completes as soon as the message is on its way.

What does the user have to do?


This is what we said for shared memory:
  Decide how to decompose the computation into parallel parts.
  Create (and destroy) processes to support that decomposition.
  Add synchronization to make sure dependences are covered.
Is the same true for message passing?

What does the user need to do?


Divide up the program into parallel parts.
Create and destroy processes to do the above.
Partition and distribute the data.
Communicate data at the right time.
(Sometimes) perform index translation.
Still need synchronization? Sometimes, but it often goes hand in hand with data communication.

Message Passing Systems


Provide process creation and destruction.
Provide message passing facilities (send and receive, in various flavors) to distribute and communicate data.
Provide additional synchronization facilities.

Message Passing Interface


Derived from several previous libraries: PVM, P4, Express.
A standard message-passing library that includes the best of several previous libraries.
Versions for C/C++ and Fortran.
Available for free.
Can be installed on:
  Networks of workstations
  Parallel computers (Cray T3E, IBM SP2, Parsytec PowerXplorer, others)

MPI Services

Hides details of the architecture.
Not a language or compiler specification.
Not a specific implementation or product.
Hides details of message passing and buffering.
Provides message management services:
  packaging
  send, receive
  broadcast, reduce, scatter, gather
  message modes

MPI Program Organization


MIMD: Multiple Instruction, Multiple Data
  Every processor runs a different program.
SPMD: Single Program, Multiple Data
  Every processor runs the same program.
  Each processor computes with different data.
  Variation of computation on different processors through if or switch statements.

MPI Program Organization

MIMD in an SPMD framework:
  Different processors can follow different computation paths.
  Branch on if or switch based on processor identity (as sketched below).
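
As a minimal illustration of SPMD with rank-based branching (the roles and printed messages are made up for illustration), every process runs the same executable and picks its work from its rank:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("I am the coordinator\n");   /* e.g. distribute work, collect results */
    else
        printf("I am worker %d\n", rank);   /* e.g. compute on a local piece of data */

    MPI_Finalize();
    return 0;
}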

MPI Basics
Starting and finishing
Identifying yourself
Sending and receiving messages
Communicator:
  A collection of processes
  Determines the scope to which messages are relative
  The identity of a process (its rank) is relative to a communicator
  Scope of global communications (broadcast, etc.)

MPI starting and finishing


The statement needed in every program before any other MPI code:
  MPI_Init(&argc, &argv);
The last statement of MPI code must be:
  MPI_Finalize();
The program will not terminate without this statement.

MPI Process Identification


MPI_Comm_size(comm, &size)
  Determines the number of processes.
MPI_Comm_rank(comm, &pid)
  pid is the process identifier (rank) of the caller.

MPI Messages
The message content is a sequence of bytes.
A message needs a wrapper, analogous to an envelope for a letter:

  Letter                      Message
  Address                     Destination
  Return Address              Source
  Type of Mailing (class)     Message type
  Letter Weight               Size (count)
  Country                     Communicator
  Magazine                    Broadcast

MPI Basic Send


MPI_Send(buf, count, datatype, dest, tag, comm)
  buf: address of the send buffer
  count: number of elements
  datatype: data type of the send buffer elements
  dest: process id of the destination process
  tag: message tag (ignore for now)
  comm: communicator (ignore for now)

MPI Basic Receive


MPI_Recv(buf, count, datatype, source, tag, comm, &status)
  buf: address of the receive buffer
  count: size of the receive buffer in elements
  datatype: data type of the receive buffer elements
  source: source process id, or MPI_ANY_SOURCE
  tag and comm: ignore for now
  status: status object (see the sketch below)
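
The status object can be inspected after the receive. A minimal sketch (it assumes buf and count are declared as in the parameter list above, and that integers are being received) using MPI_Get_count and the MPI_SOURCE / MPI_TAG fields:

/* assumes buf and count as above */
MPI_Status status;
int received;

MPI_Recv(buf, count, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &received);    /* how many items actually arrived */
printf("got %d ints from rank %d with tag %d\n",
       received, status.MPI_SOURCE, status.MPI_TAG);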

Data Types
The data in a message which is sent or received is described by a triple (address, count, datatype).
The following data types are supported by MPI:
  Predefined data types corresponding to data types of the programming language.
  Arrays.
  Sub-blocks of a matrix (see the derived-datatype sketch below).
  User-defined data structures.
  A set of predefined data types.
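
As a sketch of the "sub-block of a matrix" case (the matrix size, block size, and ranks are chosen only for illustration), a derived datatype built with MPI_Type_vector lets a strided block be sent in one call; run with at least two processes:

#include <mpi.h>
#include <stdio.h>
#define N 4     /* full matrix is N x N (assumed size) */
#define B 2     /* send a B x B sub-block from the top-left corner */

int main(int argc, char *argv[])
{
    int rank, i, j;
    double a[N][N], sub[B][B];
    MPI_Datatype block;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* B rows of B elements each, separated by a stride of N elements */
    MPI_Type_vector(B, B, N, MPI_DOUBLE, &block);
    MPI_Type_commit(&block);

    if (rank == 0) {
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i][j] = i * N + j;
        MPI_Send(&a[0][0], 1, block, 1, 0, MPI_COMM_WORLD);   /* one "block" item */
    } else if (rank == 1) {
        /* received as B*B contiguous doubles on the other side */
        MPI_Recv(&sub[0][0], B * B, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (i = 0; i < B; i++)
            for (j = 0; j < B; j++)
                printf("%4.0f ", sub[i][j]);
        printf("\n");
    }

    MPI_Type_free(&block);
    MPI_Finalize();
    return 0;
}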

Basic MPI types


MPI datatype           C datatype
MPI_CHAR               char
MPI_SIGNED_CHAR        signed char
MPI_UNSIGNED_CHAR      unsigned char
MPI_SHORT              signed short
MPI_UNSIGNED_SHORT     unsigned short
MPI_INT                signed int
MPI_UNSIGNED           unsigned int
MPI_LONG               signed long
MPI_UNSIGNED_LONG      unsigned long
MPI_FLOAT              float
MPI_DOUBLE             double
MPI_LONG_DOUBLE        long double

Why specify the data type when sending a message?

Because communication may take place between heterogeneous machines, which may use different data representations and lengths in memory.

Message Passing Example


#include <stdio.h>
#include <string.h>
#include "mpi.h"            /* includes MPI library code specs */

#define MAXSIZE 100

int main(int argc, char* argv[])
{
    int myRank;             /* rank (identity) of process     */
    int numProc;            /* number of processors           */
    int source;             /* rank of sender                 */
    int dest;               /* rank of destination            */
    int tag = 0;            /* tag to distinguish messages    */
    char mess[MAXSIZE];     /* message (other types possible) */
    int count;              /* number of items in message     */
    MPI_Status status;      /* status of message received     */

Message Passing Example


    MPI_Init(&argc, &argv);                     /* start MPI                */
    MPI_Comm_size(MPI_COMM_WORLD, &numProc);    /* get number of processes  */
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);     /* get rank of this process */

    /***********************************************/
    /* code to send, receive and process messages  */
    /***********************************************/

    MPI_Finalize();                             /* shut down MPI            */
}

Message Passing Example


    if (myRank != 0) {   /* all processes send to root */
        /* create message */
        sprintf(mess, "Hello from %d", myRank);
        dest = 0;                        /* destination is root     */
        count = strlen(mess) + 1;        /* include '\0' in message */
        MPI_Send(mess, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    else {   /* root (0) process receives and prints messages */
        /* from each processor in rank order */
        for (source = 1; source < numProc; source++) {
            MPI_Recv(mess, MAXSIZE, MPI_CHAR, source, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", mess);
        }
    }

Output
> mpirun -np 4 ./helloworld
Hello from 1
Hello from 2
Hello from 3

Point-to-Point communications

A synchronous communication does not complete until the message has been received.

An asynchronous communication completes as soon as the message is on its way.

Non-blocking operations

Non-blocking communication allows useful work to be performed while waiting for the communication to complete (see the sketch below).
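
A minimal sketch of the point-to-point flavors, assuming exactly two processes (ranks and the exchanged value are made up): MPI_Ssend is the explicitly synchronous send, MPI_Send is the standard send, and MPI_Irecv starts a receive that is completed later with MPI_Wait, leaving room for useful work in between.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, other, out, in;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    other = (rank + 1) % 2;              /* assumes exactly 2 processes */
    out = rank;

    /* post the receive first, without blocking */
    MPI_Irecv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &req);

    /* standard send; MPI_Ssend here would not return until the matching receive starts */
    MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);

    /* ... useful work could be done here while the message is in flight ... */

    MPI_Wait(&req, &status);             /* now the received value is valid */
    printf("rank %d received %d\n", rank, in);

    MPI_Finalize();
    return 0;
}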

Collective communications

Broadcast
A broadcast sends a message to a number of recipients

Barrier
A barrier operation synchronises a number of processors.

Reduction operations
Reduction operations reduce data from a number of processors to a single item.

Introduction to collective operations in MPI


Collective operations are called by all processes in a communicator.

MPI_Bcast distributes data from one process (the root) to all others in a communicator.
  Syntax: MPI_Bcast(void *message, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

MPI_Reduce combines data from all processes in a communicator and returns it to one process.
  Syntax: MPI_Reduce(void *message, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

In many numerical algorithms, send/receive can be replaced by Bcast/Reduce, improving both simplicity and efficiency (see the sketch below).
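
As a sketch of this pattern (the pi.c compiled later in these slides is presumably in this spirit, but its source is not reproduced here, so the names and interval count are assumptions): the root broadcasts the problem size, every process integrates its own strips of 4/(1+x^2), and a reduction combines the partial sums.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i, n = 1000000;     /* number of intervals (assumed) */
    double h, x, local = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* root broadcasts the problem size (in a real pi.c it might be read on the root) */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* each process integrates its own strips of 4/(1+x^2) over [0,1] */
    h = 1.0 / (double)n;
    for (i = rank; i < n; i += size) {
        x = h * ((double)i + 0.5);
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    /* combine the partial sums on the root */
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}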

Collective Message Passing


Broadcast: sends a message from one process to all processes in the group.
Scatter: distributes each element of a data array to a different process for computation.
Gather: the reverse of scatter; retrieves data elements into an array from multiple processes.

Collective Message Passing w/MPI


MPI_Bcast()            Broadcast from root to all other processes
MPI_Gather()           Gather values for group of processes
MPI_Scatter()          Scatters buffer in parts to group of processes
MPI_Alltoall()         Sends data from all processes to all processes
MPI_Reduce()           Combine values on all processes to a single value
MPI_Reduce_Scatter()   Combine values and scatter the results

(A short Scatter/Gather example follows.)
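
A minimal sketch of the scatter/compute/gather pattern (the array contents and the doubling step are made up; it assumes N is divisible by the number of processes):

#include <mpi.h>
#include <stdio.h>
#define N 8                     /* assumed: divisible by the number of processes */

int main(int argc, char *argv[])
{
    int rank, size, i, chunk;
    int data[N], part[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    chunk = N / size;

    if (rank == 0)                        /* root prepares the full array */
        for (i = 0; i < N; i++)
            data[i] = i;

    /* each process receives its chunk, works on it, and sends it back */
    MPI_Scatter(data, chunk, MPI_INT, part, chunk, MPI_INT, 0, MPI_COMM_WORLD);
    for (i = 0; i < chunk; i++)
        part[i] *= 2;
    MPI_Gather(part, chunk, MPI_INT, data, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (i = 0; i < N; i++)
            printf("%d ", data[i]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}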

Broadcasting a message

Broadcast: one sender, many receivers.
Includes all processes in the communicator; all processes must make an equivalent call to MPI_Bcast.
Any processor may be the sender (root), as determined by the fourth parameter.
The first three parameters specify the message, as for MPI_Send and MPI_Recv; the fifth parameter specifies the communicator.
Broadcast serves as a global synchronization.

MPI_Bcast() Syntax

MPI_Bcast(mess, count, MPI_INT, root, MPI_COMM_WORLD);
  mess: pointer to message buffer
  count: number of items sent
  MPI_INT: type of item sent (note: count and type should be the same on all processors)
  root: sending processor
  MPI_COMM_WORLD: communicator within which the broadcast takes place

Examine add.c

Compile & execute add.c

Edit add_mpi.c

MPI Matrix Multiply (w/o Index Translation)


int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    from = (myrank * n) / p;
    to   = ((myrank + 1) * n) / p;

    /* Data distribution */ ...
    /* Computation */ ...
    /* Result gathering */ ...

    MPI_Finalize();
}

MPI Matrix Multiply (w/o Index Translation)


/* Data distribution */
if (myrank != 0) {
    MPI_Recv(&a[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
    MPI_Recv(&b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
} else {
    for (i = 1; i < p; i++) {
        /* send process i its own block of rows of a, plus all of b */
        MPI_Send(&a[(i*n)/p], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD);
        MPI_Send(&b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD);
    }
}

MPI Matrix Multiply (w/o Index Translation)


/* Computation */
for (i = from; i < to; i++)
    for (j = 0; j < n; j++) {
        c[i][j] = 0;
        for (k = 0; k < n; k++)
            c[i][j] += a[i][k] * b[k][j];
    }

MPI Matrix Multiply (w/o Index Translation)


/* Result gathering */
if (myrank != 0)
    MPI_Send(&c[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD);
else
    for (i = 1; i < p; i++)
        /* receive process i's rows into its own block of c */
        MPI_Recv(&c[(i*n)/p], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
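
As suggested earlier, the send/receive loop above can be replaced by a collective. A sketch of the result gathering done with MPI_Gather instead (same variables as above; it assumes n is divisible by p and uses the MPI-2 MPI_IN_PLACE option so the root keeps its own rows where they are):

/* Result gathering with a collective */
if (myrank == 0)
    /* root's rows are already in place, so it contributes MPI_IN_PLACE */
    MPI_Gather(MPI_IN_PLACE, n*n/p, MPI_INT,
               &c[0], n*n/p, MPI_INT, 0, MPI_COMM_WORLD);
else
    MPI_Gather(&c[from], n*n/p, MPI_INT,
               NULL, 0, MPI_INT, 0, MPI_COMM_WORLD);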

Compile and run the code


Compile using:
  mpicc -o pi pi.c
or:
  mpic++ -o pi pi.cpp

Run using:
  mpirun -np <number of procs> -machinefile XXX pi

-machinefile tells MPI to run the program on the machines listed in XXX (an example machinefile is sketched below).
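
The machinefile is a plain text file listing the hosts to run on, one per line; a minimal sketch (hostnames invented; the exact syntax for per-host process counts varies between MPI implementations):

node01
node02
node03
node04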

Toward a Portable MPI Environment


MPICH: a high-performance, portable implementation of MPI (MPI-1 and MPI-2).
Runs on MPPs, clusters, and heterogeneous networks of workstations.
In a wide variety of environments, one can do:

  configure
  make
  mpicc -mpitrace myprog.c
  mpirun -np 10 myprog
    or: mpiexec -n 10 myprog
  mpirun -n 1 -host machine1 test : -n 1 -host machine2 test

to build, compile, run, and analyze performance.

Others: LAM MPI, Open MPI, vendor-specific MPIs.

MPI Sources
Standard: http://www.mpi-forum.org

Books:
  Using MPI: Portable Parallel Programming with the Message-Passing Interface, by Gropp, Lusk, and Skjellum, MIT Press, 1994.
  MPI: The Complete Reference, by Snir, Otto, Huss-Lederman, Walker, and Dongarra, MIT Press, 1996.
  Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995.
  Parallel Programming with MPI, by Peter Pacheco, Morgan Kaufmann, 1997.
  MPI: The Complete Reference, Vol. 1 and 2, MIT Press, 1998 (Fall).

Other information on the Web: http://www.mcs.anl.gov/mpi
