
Pseudocode of MPI Programs


SCALABLE SYSTEMS

ASSIGNMENT I
SUBMITTED BY:
MOHD ASHFAQ
192CD013
MTECH CDS, 1ST SEM
Q1. Write an OpenMP program which can efficiently parallelize a prime
number generator. Given a large input N, the program should generate
all the prime numbers till N. Note the time taken by the sequential
program to generate the prime numbers.
a) Try and report the best number of OMP threads required to
parallelize this efficiently. Experiment with different numbers of
threads (2, 4, 8, ...). Compare the time taken by the parallel version
of your code with the serial code.
b) Experiment with the parallel for construct's schedule clause. Use
different schedule kinds to check whether they have any effect on the
execution time. Use different chunk sizes for each kind.

I. PSEUDOCODES:

A> Pseudocode for serial execution: -


1. Get num;
2. for( i = 2; i <= num; i++ )
{
last = sqrt(i);
flag = 0;
for( j = 2; j <= last; j++ )
{
if( i%j == 0 )
{
flag = 1;
break;
}
}
if( flag == 0 )
{
Display i; /*print the prime number found*/
}
}

B> Pseudocode for parallel execution: -


1. Get num, chunk, nthread;
2. #pragma omp parallel private(last, flag, j) /*parallel region starts*/
{
#pragma omp for schedule( <schedule_type>, chunksize )
for( i = 2; i <= num; i++ )
{
last = sqrt(i);
flag = 0;
for( j = 2; j <= last; j++ )
{
if( i%j == 0 )
{
flag = 1;
break;
}
}
if( flag == 0 )
{
Display i; /*print the prime number found*/
}
}
} /*parallel region ends*/
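
For reference, a minimal compilable C version of the parallel pseudocode is sketched below (this is an illustration, not the submitted program). It counts the primes instead of printing them so the parallel output stays deterministic, times the loop with omp_get_wtime(), and uses one schedule/chunk setting from the table that follows; build with gcc -fopenmp -lm.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    long num = (argc > 1) ? atol(argv[1]) : 1000000;  /* N, default 1000000 */
    long count = 0;

    double start = omp_get_wtime();

    /* each thread tests a subset of the candidates; schedule(dynamic, 30)
       is one of the chunk sizes tried in the observation table */
    #pragma omp parallel for schedule(dynamic, 30) reduction(+:count)
    for (long i = 2; i <= num; i++) {
        long last = (long)sqrt((double)i);
        int flag = 0;
        for (long j = 2; j <= last; j++) {
            if (i % j == 0) { flag = 1; break; }
        }
        if (flag == 0)
            count++;          /* i is prime */
    }

    double end = omp_get_wtime();
    printf("Primes up to %ld: %ld (%.6f s)\n", num, count, end - start);
    return 0;
}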

II. OBSERVATION TABLE:

A> N= 1000000, number of threads = 8


Sl. No.   Schedule kind   Chunk size   Execution time (s)
1         Static          10           1.096680
2         Static          20           1.125488
3         Static          30           1.094238
1         Dynamic         10           1.062988
2         Dynamic         20           1.079590
3         Dynamic         30           1.062500
1         Guided          10           1.078125
2         Guided          20           1.087402
3         Guided          30           1.097656

B> N = 1000000, change in thread count

Sl. No.   No. of threads   Execution time (s)
1         2                1.077637
2         4                1.007812
3         6                1.147461
4         8                1.067383
5         10               1.092773
6         16               1.109863
2. Write a sequential program to find the smallest element in an
array. Convert the same program into a parallel one using OpenMP.
Initialize the array with random numbers. Consider array sizes of
10000, 50000 and 100000. Analyze the result for the maximum number of
threads and various schedule() clauses. Based on the observations,
perform an analysis of the total execution time and explain the result
by plotting a graph.

I. PSEUDOCODES:

A> Pseudocode for serial execution: -


1. Get num;
2. for( i = 0; i < num; i++ )
{
arr[i] = rand(); /*initialize with random numbers */
}
3. Initialize small = arr[0];
4. for( i = 1; i < num; i++ )
{
if( arr[i] < small )
small = arr[i];
}
5. Print small;

B> Pseudocode for parallel execution: -


1.Get num;
2. for( i = 0; i < num; i++ )
{
arr[i] = rand(); /*initialize with random numbers */
}
3. for( i = 0; i < NUM_THREADS; i++ )
{
small[i] = arr[0];
}
4. #pragma omp parallel /*parallel region starts*/
{
tid = omp_get_thread_num();
#pragma omp for schedule( <schedule_type>, chunksize )
for( i = 0; i < num; i++ )
{
if(arr[i] < small[tid])
small[tid] = arr[i];
}
} /*parallel region ends*/

5. Initialize my_min = small[0];


6. for( i = 1; i < NUM_THREADS; i++ )
if( small[i] < my_min )
my_min = small[i];
7. Print my_min; /*Print the global minimum*/
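
The same idea can be written compactly in C if OpenMP 3.1 or later is assumed, since a reduction(min:...) clause then replaces the per-thread small[] array; the sketch below is illustrative, with the array size and schedule chosen arbitrarily (build with gcc -fopenmp).

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    int num = 100000;                       /* array size, e.g. 10000/50000/100000 */
    int *arr = malloc(num * sizeof(int));
    for (int i = 0; i < num; i++)
        arr[i] = rand();                    /* initialize with random numbers */

    int small = arr[0];
    double start = omp_get_wtime();

    /* the min reduction keeps one private minimum per thread and combines
       them at the end of the loop, like the small[NUM_THREADS] array above */
    #pragma omp parallel for schedule(static) reduction(min:small)
    for (int i = 0; i < num; i++) {
        if (arr[i] < small)
            small = arr[i];
    }

    printf("Smallest element = %d (%.6f s)\n", small, omp_get_wtime() - start);
    free(arr);
    return 0;
}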

II. OBSERVATION TABLE:

Schedule()          Execution time       Execution time       Execution time
                    (iterations=10000)   (iterations=50000)   (iterations=100000)
Sequential          0                    0                    0
Static              0.001953             0.001953             0.005859
Static, chunksize   0.001953             0.002930             0.008789
Dynamic             0.000977             0.003906             0.007812
Guided              0.000977             0.001953             0.002930
Runtime             0.000977             0.007812             0.011719

III. GRAPH

Result analysis: Static scheduling without an explicit chunk size takes
the lowest time. The runtime decides the chunk size and divides the
iterations equally among the threads before execution starts. Since the
array holds random numbers, the work needed to compare and store a value
is about the same in every chunk; it does not grow or shrink with the
iteration index. Therefore the dynamic and guided schedule kinds do not
perform better than static: they only add the runtime overhead of
computing the next chunk size (guided) or assigning chunks on demand
(dynamic).
3. Write an MPI program to calculate the product of two matrices A (of
size N*32) and B (of size 32*N), which should be an N*N matrix. Design
a parallel scheme for computing the matrix multiplication using
a. Blocking P2P (point-to-point) communication
b. Collective communication
c. Observe the running time of your programs; change some of the
parameters to see how it is associated with N and the communication
type. Write down the observations.

I. PSEUDOCODES:

A> Blocking P2P (point-to-point) communication


1. MPI_Init( &argc, &argv );
2. MPI_Comm_rank(MPI_COMM_WORLD,&taskid);
3. MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
4. if( numtasks < 2 )
{
MPI_Abort(MPI_COMM_WORLD, rc);
exit(1);
}
5. numworkers = numtasks-1;
6. if( taskid == MASTER )
{
6.1 Initialize A[NRA][NCA];
6.2 Initialize B[NCA][NCB];

/*Send matrix data to the worker tasks*/


6.3 averow = NRA/numworkers;
6.4 extra = NRA%numworkers;
6.5 offset = 0;
6.6 mtype = FROM_MASTER;
6.7 for( dest = 1; dest <= numworkers; dest++ )
{
6.7.1 rows = (dest <= extra)? averow+1 : averow;
6.7.2 MPI_Send(&offset, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
6.7.3 MPI_Send(&rows, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
6.7.4 MPI_Send(&a[offset][0], rows*NCA, MPI_DOUBLE, dest, mtype,
MPI_COMM_WORLD);
6.7.5 MPI_Send(&b, NCA*NCB, MPI_DOUBLE, dest, mtype,
MPI_COMM_WORLD);
6.7.6 offset = offset + rows;
}

/*Receive results from worker tasks*/


7. mtype = FROM_WORKER;
8. for( i = 1; i <= numworkers; i++ )
{
8.1 source = i;
8.2 MPI_Recv(&offset, 1, MPI_INT, source, mtype,
MPI_COMM_WORLD, &status);
8.3 MPI_Recv(&rows, 1, MPI_INT, source, mtype,
MPI_COMM_WORLD, &status);
8.4 MPI_Recv(&c[offset][0], rows*NCB, MPI_DOUBLE, source, mtype,
MPI_COMM_WORLD, &status);
}
9. Display resultant matrix c[NRA][NCB];
}
10. if( taskid != MASTER )
{
10.1 mtype = FROM_MASTER;
10.2 MPI_Recv(&offset, 1, MPI_INT, MASTER, mtype,
MPI_COMM_WORLD, &status);
10.3 MPI_Recv(&rows, 1, MPI_INT, MASTER, mtype,
MPI_COMM_WORLD, &status);
10.4 MPI_Recv(&a, rows*NCA, MPI_DOUBLE, MASTER, mtype,
MPI_COMM_WORLD, &status);
10.5 MPI_Recv(&b, NCA*NCB, MPI_DOUBLE, MASTER, mtype,
MPI_COMM_WORLD, &status);
10.6 for( k = 0; k < NCB; k++ ) /*calculate the product*/
{
/* rows of matrix A sent to this worker */
for( i = 0; i < rows; i++ )
{
sum = 0.0;
for( j = 0; j < NCA; j++ )
sum = sum + a[i][j]*b[j][k];
c[i][k] = sum;
}
}
10.7 mtype = FROM_WORKER;
10.8 MPI_Send(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
10.9 MPI_Send(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
10.10 MPI_Send(&c, rows*NCB, MPI_DOUBLE, MASTER, mtype,
MPI_COMM_WORLD);
}
11. MPI_Finalize();
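
A condensed, runnable C sketch of this master/worker scheme is given below for reference; it is not the submitted program. The matrix sizes and the all-ones test data are chosen to mirror the output in section II, and timing and full-matrix printing are omitted (build with mpicc, run with at least 2 processes).

#include <stdio.h>
#include <mpi.h>

#define NRA 7          /* rows of A */
#define NCA 10         /* cols of A = rows of B */
#define NCB 7          /* cols of B */
#define FROM_MASTER 1
#define FROM_WORKER 2

int main(int argc, char *argv[])
{
    int numtasks, taskid, rows, offset;
    double a[NRA][NCA], b[NCA][NCB], c[NRA][NCB];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    if (numtasks < 2) {
        if (taskid == 0) printf("Need at least two MPI tasks.\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    int numworkers = numtasks - 1;

    if (taskid == 0) {                                     /* master */
        for (int i = 0; i < NRA; i++) for (int j = 0; j < NCA; j++) a[i][j] = 1.0;
        for (int i = 0; i < NCA; i++) for (int j = 0; j < NCB; j++) b[i][j] = 1.0;

        int averow = NRA / numworkers, extra = NRA % numworkers;
        offset = 0;
        for (int dest = 1; dest <= numworkers; dest++) {   /* distribute rows */
            rows = (dest <= extra) ? averow + 1 : averow;
            MPI_Send(&offset, 1, MPI_INT, dest, FROM_MASTER, MPI_COMM_WORLD);
            MPI_Send(&rows, 1, MPI_INT, dest, FROM_MASTER, MPI_COMM_WORLD);
            MPI_Send(&a[offset][0], rows * NCA, MPI_DOUBLE, dest, FROM_MASTER, MPI_COMM_WORLD);
            MPI_Send(&b[0][0], NCA * NCB, MPI_DOUBLE, dest, FROM_MASTER, MPI_COMM_WORLD);
            offset += rows;
        }
        for (int i = 1; i <= numworkers; i++) {            /* collect results */
            MPI_Recv(&offset, 1, MPI_INT, i, FROM_WORKER, MPI_COMM_WORLD, &status);
            MPI_Recv(&rows, 1, MPI_INT, i, FROM_WORKER, MPI_COMM_WORLD, &status);
            MPI_Recv(&c[offset][0], rows * NCB, MPI_DOUBLE, i, FROM_WORKER, MPI_COMM_WORLD, &status);
        }
        printf("c[0][0] = %.1f (expect %d.0)\n", c[0][0], NCA);
    } else {                                               /* worker */
        MPI_Recv(&offset, 1, MPI_INT, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        MPI_Recv(&rows, 1, MPI_INT, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        MPI_Recv(&a[0][0], rows * NCA, MPI_DOUBLE, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        MPI_Recv(&b[0][0], NCA * NCB, MPI_DOUBLE, 0, FROM_MASTER, MPI_COMM_WORLD, &status);
        for (int k = 0; k < NCB; k++)
            for (int i = 0; i < rows; i++) {
                double sum = 0.0;
                for (int j = 0; j < NCA; j++) sum += a[i][j] * b[j][k];
                c[i][k] = sum;
            }
        MPI_Send(&offset, 1, MPI_INT, 0, FROM_WORKER, MPI_COMM_WORLD);
        MPI_Send(&rows, 1, MPI_INT, 0, FROM_WORKER, MPI_COMM_WORLD);
        MPI_Send(&c[0][0], rows * NCB, MPI_DOUBLE, 0, FROM_WORKER, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}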

B> Collective communication


1.MPI_Init(&argc, &argv);
2.MPI_Comm_size(MPI_COMM_WORLD, &size);
3.MPI_Comm_rank(MPI_COMM_WORLD, &rank);
4.if( rank == MASTER )
{
Initialise array A[N][N];
Initialise array B[N][N];
}

/*scatter rows of first matrix to different processes*/


5.MPI_Scatter(A, N*N/size, MPI_INT, aa, N*N/size,
MPI_INT,0,MPI_COMM_WORLD);
/*broadcast second matrix to all processes*/
6.MPI_Bcast(B, N*N, MPI_INT, 0, MPI_COMM_WORLD);
7.MPI_Barrier(MPI_COMM_WORLD);
/*calculate the product of the local rows held in aa[]*/
8.for( i = 0; i < N; i++ )
{
sum = 0;
for( j = 0; j < N; j++ )
sum = sum + aa[j] * B[j][i];
/*aa[] holds this process's share of the rows of A[][]*/
cc[i] = sum;
}
9.MPI_Gather(cc, N*N/size, MPI_INT, C, N*N/size, MPI_INT, 0,
MPI_COMM_WORLD);
10.MPI_Barrier(MPI_COMM_WORLD);
11.if( rank == MASTER )
{
Display resultant matrix C[N][N];
}
12.MPI_Finalize();
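
A condensed C sketch of the collective version is given below for reference, assuming N is a multiple of the number of processes so that every rank receives whole rows of A; the all-ones test data and the small N are illustrative (build with mpicc).

#include <stdio.h>
#include <mpi.h>

#define N 8    /* assumed to be a multiple of the number of processes */

int main(int argc, char *argv[])
{
    int rank, size;
    static int A[N][N], B[N][N], C[N][N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                     /* rows of A per process */
    int aa[rows][N], cc[rows][N];            /* local slices (C99 VLAs) */

    if (rank == 0)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { A[i][j] = 1; B[i][j] = 1; }

    /* scatter rows of A, broadcast all of B */
    MPI_Scatter(A, rows * N, MPI_INT, aa, rows * N, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_INT, 0, MPI_COMM_WORLD);

    for (int i = 0; i < rows; i++)
        for (int k = 0; k < N; k++) {
            int sum = 0;
            for (int j = 0; j < N; j++)
                sum += aa[i][j] * B[j][k];
            cc[i][k] = sum;
        }

    /* gather the partial result rows back into C on the root */
    MPI_Gather(cc, rows * N, MPI_INT, C, rows * N, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("C[0][0] = %d (expect %d)\n", C[0][0], N);

    MPI_Finalize();
    return 0;
}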

II. OUTPUTS

A> Point to point communication

$ mpicc 3pp.c
$ mpiexec -n 7 ./a.out
mpi_mm has started with 7 tasks.
Initializing arrays...
Array A[7][10]:
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

Array b[10][7]:
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
Sending 2 rows to task 1 offset=0
Sending 1 rows to task 2 offset=2
Sending 1 rows to task 3 offset=3
Sending 1 rows to task 4 offset=4
Sending 1 rows to task 5 offset=5
Sending 1 rows to task 6 offset=6
Received results from task 1
Received results from task 2
Received results from task 3
Received results from task 4
Received results from task 5
Received results from task 6
******************************************************
Result Matrix:

7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
******************************************************
The process took: 0.280028
******************************************************

B> Collective communication


$ mpicc 3Bcast.c
$ mpiexec -n 7 ./a.out
Array A[7][10]:
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1

Array B[7][10]:
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1

Resultant, A*B =
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7
7 7 7 7 7 7 7

******************************************************
The process took: 0.022957
******************************************************

III. OBSERVATION

Collective communication takes less time to execute than point-to-point
communication. The iterative sends and receives in the point-to-point
version add up to more execution time than the collective version, where
a single broadcast/scatter call distributes the data to all processes at
once.
4. Write an MPI program to calculate the dot product of two arrays and
to perform a reduction on the product array.
The working is as follows:
• The root process populates two arrays, each of size N.
• The arrays are distributed amongst P processes in the communicator.
• Each process calculates the dot product and the reduction operation.
• Root performs the final reduction operation to obtain the final
answer. The values for N and P are your choice. The arrays can be
filled with random numbers.
Implement two versions of this code.
a. Each process gets an equal sized chunk of both the arrays
(using MPI_Scatter).
b. Each process gets an unequal sized chunk of both the arrays
(using MPI_Scatterv).

I. PSEUDOCODES:

A> MPI_Scatter usage


1. MPI_Init(&argc,&argv);
2. MPI_Comm_size(MPI_COMM_WORLD,&size);
3. MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
4. if( MASTER )
Initialize matrix A[] and B[];
5. Initialize ch = N/size;
6. *c = malloc(sizeof(<var_type>) * ch);
7. *d = malloc(sizeof(<var_type>) * ch);
/*Scatter A[] & B[] to the slave processes*/
8. MPI_Scatter(A,ch,MPI_INT,c,ch,MPI_INT,0,MPI_COMM_WORLD);
9. MPI_Scatter(B,ch,MPI_INT,d,ch,MPI_INT,0,MPI_COMM_WORLD);
10. for( i = 0; i < N/size; i++ )
sum_temp += (c[i] * d[i]);
/*calculate partial dot product*/
11. MPI_Reduce(&sum_temp,&sum,1,MPI_INT,MPI_SUM,0,
MPI_COMM_WORLD);
12. if( MASTER )
Display sum;
/*calculated globally from MPI_Reduce in line 11*/
13. MPI_Finalize();
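
A compact runnable C sketch of this Scatter/Reduce version is shown below for reference; N = 50 and the all-ones arrays mirror the output in section II, and N is assumed to be divisible by the number of processes (build with mpicc).

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 50   /* array length, assumed divisible by the number of processes */

int main(int argc, char *argv[])
{
    int size, myrank, A[N], B[N], sum = 0, sum_temp = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0)
        for (int i = 0; i < N; i++) { A[i] = 1; B[i] = 1; }

    int ch = N / size;                       /* equal chunk per process */
    int *c = malloc(ch * sizeof(int));
    int *d = malloc(ch * sizeof(int));

    /* distribute equal pieces of both arrays */
    MPI_Scatter(A, ch, MPI_INT, c, ch, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Scatter(B, ch, MPI_INT, d, ch, MPI_INT, 0, MPI_COMM_WORLD);

    for (int i = 0; i < ch; i++)             /* partial dot product */
        sum_temp += c[i] * d[i];

    /* combine the partial sums on the root */
    MPI_Reduce(&sum_temp, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myrank == 0)
        printf("DOT PRODUCT= %d\n", sum);

    free(c); free(d);
    MPI_Finalize();
    return 0;
}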

B> MPI_Scatterv usage


1.MPI_Init(&argc,&argv);
2.MPI_Comm_size(MPI_COMM_WORLD,&size);
3.MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
4.if( myrank == MASTER )
{
Initialize matrix A[] & B[];
Compute the variable chunk sizes z1[], z2[] and displacements
disp1[], disp2[];
}
5.Initialize ch = N/size;
6.*c = malloc(sizeof(<var_type>) * 2 * ch);
7.*d = malloc(sizeof(<var_type>) * 2 * ch);
8.MPI_Scatterv(A,z1,disp1,MPI_INT,c,(myrank + 2), MPI_INT, 0,
MPI_COMM_WORLD);
9.MPI_Scatterv(B,z2,disp2,MPI_INT,d,(myrank +2), MPI_INT, 0,
MPI_COMM_WORLD);
10. for( i = 0; i < (myrank+2); i++ )
sum_temp += (c[i]*d[i]);
/*calculate partial dot product*/
11.MPI_Reduce(&sum_temp,&sum,1,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
12.if( MASTER )
Display sum;
/*calculated globally from MPI_Reduce in line 11*/
13.MPI_Finalize();
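
The Scatterv version needs a send-count array and a displacement array on the root. The sketch below is illustrative and uses one possible unequal split (the first N%size ranks get one extra element) rather than the (myrank+2) chunk sizes of the pseudocode above; with all-ones input the result is again N.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 50   /* array length */

int main(int argc, char *argv[])
{
    int size, myrank, A[N], B[N], sum = 0, sum_temp = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0)
        for (int i = 0; i < N; i++) { A[i] = 1; B[i] = 1; }

    /* unequal chunks: the first N%size ranks get one extra element */
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0, off = 0; i < size; i++) {
        counts[i] = N / size + (i < N % size ? 1 : 0);
        displs[i] = off;
        off += counts[i];
    }
    int mycount = counts[myrank];
    int *c = malloc(mycount * sizeof(int));
    int *d = malloc(mycount * sizeof(int));

    MPI_Scatterv(A, counts, displs, MPI_INT, c, mycount, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Scatterv(B, counts, displs, MPI_INT, d, mycount, MPI_INT, 0, MPI_COMM_WORLD);

    for (int i = 0; i < mycount; i++)        /* partial dot product */
        sum_temp += c[i] * d[i];

    MPI_Reduce(&sum_temp, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myrank == 0)
        printf("DOT PRODUCT= %d\n", sum);

    free(counts); free(displs); free(c); free(d);
    MPI_Finalize();
    return 0;
}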

II. OUTPUT

A> Scatter
$ mpicc 4Scatter.c
$ mpiexec -n 10 ./a.out
Matrix A:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Matrix B:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

DOT PRODUCT= 50

B> Scatterv

$ mpicc 4ScatterV.c
$ mpiexec -n 10 ./a.out
Matrix A:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Matrix B:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

DOT PRODUCT= 50
5. The value of π is computed mathematically as follows:

π = ∫₀¹ 4/(1 + x²) dx

Write an MPI program to compute π using MPI_Bcast and MPI_Reduce.
Compare the execution time of the serial code and the parallel code.

I. PSEUDOCODES:
A> main()
1.MPI_Init(&argc,&argv);
2.MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
3.MPI_Comm_rank(MPI_COMM_WORLD,&myid);
4.if( MASTER )
n = 100000; /* Total no. of evaluation points*/
5.MPI_Bcast( &n, 1, MPI_INT, 0, MPI_COMM_WORLD );
/* Share intervals with other processors */
6.Initialize sum = 0.0, h = 1.0/n;
7.for( i = myid+0.5 ; i < n; i += numprocs )
sum += dx_arctan(i*h);
8.mypi = h*sum;

9.MPI_Reduce( &mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD );
/* Consolidate and Sum Results */
10.if( MASTER )
Display pi;
11.MPI_Finalize();

B> dx_arctan(x): -
1. Return (4.0/(1.0+x*x)); /*f(x) for calculating the value of pi*/
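
A compact runnable C version of this scheme is sketched below for reference (illustrative; variable names follow the pseudocode, timing uses MPI_Wtime(), and the midpoint index is written out explicitly).

#include <stdio.h>
#include <mpi.h>

static double dx_arctan(double x) { return 4.0 / (1.0 + x * x); }

int main(int argc, char *argv[])
{
    int numprocs, myid, n = 0;
    double mypi = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0)
        n = 100000;                          /* total number of evaluation points */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    double h = 1.0 / n, sum = 0.0, t0 = MPI_Wtime();
    for (int i = myid; i < n; i += numprocs) /* midpoint rule, strided by rank */
        sum += dx_arctan((i + 0.5) * h);
    mypi = h * sum;

    /* consolidate the partial sums on the root */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0)
        printf("pi is approximately %.5f (%.6f s)\n", pi, MPI_Wtime() - t0);
    MPI_Finalize();
    return 0;
}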

II. OUTPUT:

$ mpicc 5.c
$ mpiexec -n 8 ./a.out
This is Process-7/8
This is Process-6/8
This is Process-4/8
This is Process-3/8
This is Process-1/8
This is Process-2/8
This is Process-0/8
This is Process-5/8
This program uses 8 processes

The number of intervals = 100000


###############################################
pi is approximately 3.14159
Execution time = 0.001740
###############################################
7. Write an OpenMP program to compute a histogram. Use the lock
functions omp_init_lock(), omp_set_lock(), omp_unset_lock() and
omp_destroy_lock() to handle synchronization problems among threads.

I. PSEUDOCODE:
1. Get N, max;
2. Initialize arr[N], hist[max];
3. for( i = 0; i < N ; i++ )/*initialize arr[] with random no */
arr[i] =(int) rand()%max;
4. omp_lock_t writelock;
5. omp_init_lock(&writelock);
6. omp_set_num_threads(thread_count);
7. #pragma omp parallel for schedule(<schedule_type>) /*parallelize the loop*/
for( i = 0; i < N; i++ )
{
omp_set_lock( &writelock);
hist[arr[i]] +=1;
omp_unset_lock(&writelock);
}
8. omp_destroy_lock(&writelock);
9. for( i = 0; i < max; i++ )
{
perc[i] = 100*((float)hist[i]/(float)N);
for (int j = 0; j < perc[i]; j++ )
Display “ <pattern> ” ;
Display hist[i];
}
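
A runnable C sketch of this histogram is given below for reference (illustrative; N, max and the schedule are arbitrary, and a single lock serializes every update to hist[], which is simple but limits scalability; build with gcc -fopenmp).

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    int N = 1000, max = 10;
    int *arr = malloc(N * sizeof(int));
    int *hist = calloc(max, sizeof(int));
    for (int i = 0; i < N; i++)
        arr[i] = rand() % max;               /* values in [0, max) */

    omp_lock_t writelock;
    omp_init_lock(&writelock);

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) {
        omp_set_lock(&writelock);            /* serialize updates to hist[] */
        hist[arr[i]] += 1;
        omp_unset_lock(&writelock);
    }
    omp_destroy_lock(&writelock);

    for (int i = 0; i < max; i++) {
        float perc = 100.0f * hist[i] / N;   /* bar length as a percentage */
        printf("%d:", i);
        for (int j = 0; j < perc; j++)
            printf("*");
        printf("%d\n", hist[i]);
    }
    free(arr); free(hist);
    return 0;
}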

II. OUTPUT:

$ mpicc 7.c
$ mpiexec -n 8 ./a.out
Enter the array length: 1000
Enter the largest histogram value:10
0:***********107
1:**********95
2:**********99
3:**********97
4:**********97
5:*********89
6:**********100
7:***********106
8:***********104
9:***********106
8. Write a hybrid OpenMP-MPI program to compute matrix-vector
multiplication.

I. PSEUDOCODE
1. MPI_Init(&argc,&argv);
2. MPI_Comm_size(MPI_COMM_WORLD, &size);
3. MPI_Comm_rank(MPI_COMM_WORLD, &rank);
4. *vector = malloc(sizeof(<var_type>) * columns);
/* every process needs a buffer for the vector */
5. if( rank == MASTER )
{
5.1 **arr = malloc(sizeof(<var_type>*) * rows);
5.2 for( i = 0; i < rows; i++ )
arr[i] = malloc(sizeof(<var_type>) * columns);
5.3 Initialize arr[rows][columns];
5.4 Initialise vector[columns];
5.5 row_each = rows/size;
}
/* Share the row count and the vector with all processes */
6. MPI_Bcast( &row_each, 1, MPI_INT, 0, MPI_COMM_WORLD );
7. MPI_Bcast( vector, columns, MPI_INT, 0, MPI_COMM_WORLD );
8. if( rank == MASTER )
{
8.1 for( i = 1; i < size; i++ ) /* send row_each rows to each worker */
for( j = 0; j < row_each; j++ )
MPI_Send(arr[j + i*row_each], columns, MPI_INT, i, 0,
MPI_COMM_WORLD);
8.2 for( i = 0; i < row_each; i++ ) /* master computes its own rows */
{
result[i] = 0;
for( j = 0; j < columns; j++ )
result[i] += vector[j]*arr[i][j];
}
8.3 for( i = 1; i < size; i++ ) /* collect the workers' partial results */
MPI_Recv(&result[i*row_each], row_each, MPI_INT, i, MPI_ANY_TAG,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
8.4 Display result;
}
9. else
{
9.1 **local_arr = malloc(sizeof(<var_type>*) * row_each);
9.2 for( i = 0; i < row_each; i++ )
local_arr[i] = malloc(sizeof(<var_type>) * columns);
9.3 for( i = 0; i < row_each; i++ )
MPI_Recv(local_arr[i], columns, MPI_INT, 0, MPI_ANY_TAG,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
9.4 #pragma omp parallel for private(j) /* OpenMP threads split the local rows */
for( i = 0; i < row_each; i++ )
{
product[i] = 0;
for( j = 0; j < columns; j++ )
product[i] += vector[j]*local_arr[i][j];
}
9.5 MPI_Send(product, row_each, MPI_INT, MASTER, 0, MPI_COMM_WORLD);
}
10. MPI_Finalize();
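
For reference, a condensed runnable hybrid sketch is given below. It keeps the structure of MPI distributing rows and OpenMP splitting the local rows, but uses MPI_Scatter/MPI_Gather instead of the explicit sends and receives for brevity; the 4x4 matrix and the vector are copied from the output in section II, and the number of processes is assumed to divide the row count (build with mpicc -fopenmp).

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

#define ROWS 4
#define COLS 4

int main(int argc, char *argv[])
{
    int rank, size;
    /* test data taken from the output below */
    int arr[ROWS][COLS] = { {4,7,8,6}, {4,6,7,3}, {10,2,3,8}, {1,10,4,7} };
    int vector[COLS]    = { 1, 7, 3, 7 };
    int result[ROWS];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int row_each = ROWS / size;              /* assumes ROWS % size == 0 */
    int local[ROWS][COLS], local_res[ROWS];

    /* MPI distributes whole rows; every process also gets the vector */
    MPI_Bcast(vector, COLS, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Scatter(arr, row_each * COLS, MPI_INT, local, row_each * COLS,
                MPI_INT, 0, MPI_COMM_WORLD);

    /* OpenMP parallelizes the local row loop */
    #pragma omp parallel for
    for (int i = 0; i < row_each; i++) {
        int sum = 0;
        for (int j = 0; j < COLS; j++)
            sum += vector[j] * local[i][j];
        local_res[i] = sum;
    }

    MPI_Gather(local_res, row_each, MPI_INT, result, row_each, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {                         /* expected: 119 88 89 132 */
        printf("The resultant vector is:\n");
        for (int i = 0; i < ROWS; i++)
            printf("%d\n", result[i]);
    }
    MPI_Finalize();
    return 0;
}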

II. OUTPUT:

$ mpicc 8.c -fopenmp


$ mpiexec -n 4 ./a.out
printing main matrix
4 7 8 6
4 6 7 3
10 2 3 8
1 10 4 7

printing vector:
1
7
3
7

The resultant vector is:


119
88
89
132
9. Write an MPI program to find the sum of 'n' integers on 'p'
processors using point-to-point communication library calls and a
linear array topology. Here the rank of the process is the input to
the process, and the output must be printed by the rank 0 process.

I. PSEUDOCODE
1. MPI_Init(&argc, &argv);
2. MPI_Comm_rank(MPI_COMM_WORLD, &pid);
3. MPI_Comm_size(MPI_COMM_WORLD, &np);
4. if( pid == MASTER )
{
4.1 Get n; /*size of the array*/
4.2 for( i = 0; i < n; i++ )
A[i] = rand();
/*Initialize A[] with random no */
4.3 elements_per_process = n/np;
4.4 if( np > 1 )
{
/* distributes the portion of array to child
processes to calculate their partial sums*/
4.4.1 for( i = 1; i < np - 1; i++ )
{
4.4.1.1 index = i*elements_per_process;
4.4.1.2 MPI_Send(&elements_per_process, 1, MPI_INT,
i, 0, MPI_COMM_WORLD);
4.4.1.3 MPI_Send(&a[index], elements_per_process,
MPI_INT, i, 0, MPI_COMM_WORLD);
}
/* last process adds the remaining elements */
4.4.2 index = i*elements_per_process;
4.4.3 elements_left = n - index;
4.4.4 MPI_Send(&elements_left, 1, MPI_INT, i, 0,
MPI_COMM_WORLD);
4.4.5 MPI_Send(&a[index], elements_left, MPI_INT, i, 0,
MPI_COMM_WORLD);
}
/*the master process adds its own sub-array*/
4.5 Initialize sum = 0;
4.6 for (i = 0; i < elements_per_process; i++)
sum += a[i];
/*collects partial sums from other processes*/
4.7 for( i = 1; i < np; i++ )
{
MPI_Recv(&tmp, 1, MPI_INT, MPI_ANY_SOURCE,
0, MPI_COMM_WORLD, &status);
sender = status.MPI_SOURCE;
sum += tmp;
}
4.8 Display sum; /*Print the final sum of the array*/
}
5. else /*slave processes */
{
5.1 MPI_Recv(&n_elements_received, 1, MPI_INT, 0, 0,
MPI_COMM_WORLD, &status);
5.2 MPI_Recv(&a2, n_elements_received, MPI_INT, 0, 0,
MPI_COMM_WORLD, &status);
5.3 Initialize partial_sum = 0;
5.4 for( i = 0; i < n_elements_received; i++ )
partial_sum += a2[i];
/*send the partial sum to the root process*/
5.5 MPI_Send(&partial_sum, 1, MPI_INT, 0, 0,
MPI_COMM_WORLD);
}
6. MPI_Finalize();
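
A condensed runnable C sketch of this scheme is given below for reference (illustrative: n = 50 with single-digit random values mirrors the output in section II, and the distribution logic follows steps 4.4.1 to 4.4.5; build with mpicc).

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define MAXN 1000

int main(int argc, char *argv[])
{
    int pid, np, a[MAXN], a2[MAXN];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    if (pid == 0) {
        int n = 50;
        for (int i = 0; i < n; i++)
            a[i] = rand() % 10;              /* random single-digit values */

        int per = n / np, index = 0, i;
        for (i = 1; i < np - 1; i++) {       /* middle processes get n/np each */
            index = i * per;
            MPI_Send(&per, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            MPI_Send(&a[index], per, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
        if (np > 1) {                        /* last process gets the remainder */
            index = i * per;
            int left = n - index;
            MPI_Send(&left, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            MPI_Send(&a[index], left, MPI_INT, i, 0, MPI_COMM_WORLD);
        }

        long sum = 0;
        for (int j = 0; j < per; j++)        /* master adds its own block */
            sum += a[j];
        for (int src = 1; src < np; src++) { /* collect the partial sums */
            int tmp;
            MPI_Recv(&tmp, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            sum += tmp;
        }
        printf("Sum of array is: %ld\n", sum);
    } else {
        int cnt, partial = 0;
        MPI_Recv(&cnt, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(a2, cnt, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        for (int i = 0; i < cnt; i++)
            partial += a2[i];
        MPI_Send(&partial, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}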

II. OUTPUT:

$ mpicc 9.c
$ mpiexec -n 8 ./a.out

Enter the size of array: 50

The array formed:


3 6 7 5 3 5 6 2 9 1 2 7 0 9 3 6 0 6 2 6 1 8 7 9 2 0 2 3 7 5 9
2 2 8 9 7 3 6 1 2 9 3 1 9 4 7 8 4 5 0
Sum of array is: 231
10. Write an MPI program to estimate the approximate area under the
curve y = f(x) using the trapezoidal rule, and compare the execution
time of the parallel version and the serial version.

I. PSEUDOCODE:

1. Parallel pseudocode
A> main()
1.Get a, b, n;
2.h = (b-a)/n;
3.MPI_Init(&argc, &argv);
4.MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
5.MPI_Comm_size(MPI_COMM_WORLD, &commsz);
6.local_n = n/commsz;
7.local_a = a + myrank*local_n*h;
8.local_b = local_a + local_n*h;
9.local_int = Trap(local_a, local_b, local_n, h);
10.if( myrank != MASTER )
MPI_Send( &local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD );
else
{
total_int = local_int;
for( source = 1; source < commsz; source++ )
{
MPI_Recv( &local_int, 1, MPI_DOUBLE, source, 0,
MPI_COMM_WORLD, MPI_STATUS_IGNORE );
total_int += local_int;
}
}
11.if( MASTER )
Display total_int; /*the final area under the curve*/
12.MPI_Finalize();

B> Trap( leftendpt, rightendpt, trapcount, baselen )


1.estimate = (f(leftendpt) + f(rightendpt))/2.0;
2.for( i = 1; i <= trapcount-1; i++ )
{
x = leftendpt + i*baselen;
estimate += f(x);
}
3.estimate = estimate*baselen;
4.Return estimate;

C> f(x)
1.return(1/(1+pow(x,2)));
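
A compact runnable C version of the parallel pseudocode is sketched below for reference. MPI_Reduce is used in place of the explicit Send/Recv loop of step 10, n is assumed to be divisible by the number of processes, and the interval [0, 0.5] is chosen only so that the exact value arctan(0.5) ≈ 0.463648 matches the reported output (build with mpicc, link with -lm).

#include <stdio.h>
#include <math.h>
#include <mpi.h>

static double f(double x) { return 1.0 / (1.0 + pow(x, 2)); }

/* composite trapezoidal rule on [leftendpt, rightendpt] with trapcount trapezoids */
static double Trap(double leftendpt, double rightendpt, int trapcount, double baselen)
{
    double estimate = (f(leftendpt) + f(rightendpt)) / 2.0;
    for (int i = 1; i <= trapcount - 1; i++)
        estimate += f(leftendpt + i * baselen);
    return estimate * baselen;
}

int main(int argc, char *argv[])
{
    int myrank, commsz;
    double a = 0.0, b = 0.5;                 /* interval endpoints (illustrative) */
    int n = 1000000;                         /* total number of trapezoids */
    double total_int = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &commsz);

    double h = (b - a) / n;
    int local_n = n / commsz;                /* trapezoids per process */
    double local_a = a + myrank * local_n * h;
    double local_b = local_a + local_n * h;
    double local_int = Trap(local_a, local_b, local_n, h);

    /* sum the per-process areas on the root */
    MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myrank == 0)
        printf("The integral is: %f\n", total_int);
    MPI_Finalize();
    return 0;
}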

2. Serial pseudocode


A>main()
1. Initialize a, b, n = 1000000, sum = 0.0;
2. h = (b-a)/n;
3. for( i = 1; i < n; i++ )
{
x = a + i*h;
sum = sum + f(x);
}
4. integral = (h/2)*(f(a) + f(b) + 2*sum);
5. Display integral;

B> f(x)
1. return(1/(1+pow(x,2)));

II. OUTPUT

A> Parallel code


$ mpicc 10Parallel.c -lm
$ mpiexec -n 10 ./a.out

The integral is: 0.463648


******************************************************
Execution time of the code: 0.343410
******************************************************

B> Serial code


The integral is: 0.463648
*****************************************
Execution time of code: 1.000000
*****************************************
