
OpenMP Tutorial

Seung-Jai Min
(smin@purdue.edu)

School of Electrical and Computer Engineering


Purdue University, West Lafayette, IN

ECE 563 Programming Parallel Machines 1


Parallel Programming Standards

• Thread Libraries
- Win32 API / Posix threads
• Compiler Directives (our focus)
- OpenMP (Shared memory programming)
• Message Passing Libraries
- MPI (Distributed memory programming)



Shared Memory Parallel
Programming in the Multi-Core Era

• Desktop and Laptop


– 2, 4, 8 cores and … ?
• A single node in distributed memory clusters
– Steele cluster node: 2 × 8 (16) cores
• Shared-memory hardware accelerators
– Cell processors: 1 PPE and 8 SPEs
– Nvidia Quadro GPUs: 128 processing units



OpenMP:
Some syntax details to get us started
• Most of the constructs in OpenMP are compiler
directives or pragmas.
– For C and C++, the pragmas take the form:
#pragma omp construct [clause [clause]…]
– For Fortran, the directives take one of the forms:
C$OMP construct [clause [clause]…]
!$OMP construct [clause [clause]…]
*$OMP construct [clause [clause]…]
• Include files
#include "omp.h"



How is OpenMP typically used?
• OpenMP is usually used to parallelize loops:
• Find your most time consuming loops.
• Split them up between threads.
Sequential Program:

void main()
{
  int i, k, N=1000;
  double A[N], B[N], C[N];
  for (i=0; i<N; i++) {
    A[i] = B[i] + k*C[i];
  }
}

Parallel Program:

#include "omp.h"
void main()
{
  int i, k, N=1000;
  double A[N], B[N], C[N];
  #pragma omp parallel for
  for (i=0; i<N; i++) {
    A[i] = B[i] + k*C[i];
  }
}



How is OpenMP typically used?
(Cont.)
• Single Program Multiple Data (SPMD)
Parallel Program

#include "omp.h"
void main()
{
int i, k, N=1000;
double A[N], B[N], C[N];
#pragma omp parallel for
for (i=0; i<N; i++) {
A[i] = B[i] + k*C[i];
}
}



How is OpenMP typically used?
(Cont.)
• Single Program Multiple Data (SPMD)
Thread 0:

void main()
{
  int i, k, N=1000;
  double A[N], B[N], C[N];
  lb = 0;
  ub = 250;
  for (i=lb; i<ub; i++) {
    A[i] = B[i] + k*C[i];
  }
}

Threads 1, 2, and 3 run the same code with their own loop bounds:
Thread 1: lb = 250; ub = 500;
Thread 2: lb = 500; ub = 750;
Thread 3: lb = 750; ub = 1000;



OpenMP Fork-and-Join model

printf("program begin\n");            Serial
N = 1000;

#pragma omp parallel for
for (i=0; i<N; i++)                   Parallel
  A[i] = B[i] + C[i];

M = 500;                              Serial
#pragma omp parallel for
for (j=0; j<M; j++)                   Parallel
  p[j] = q[j] - r[j];

printf("program done\n");             Serial



OpenMP Constructs
1. Parallel Regions
– #pragma omp parallel
2. Worksharing
– #pragma omp for, #pragma omp sections
3. Data Environment
– #pragma omp parallel shared/private (…)
4. Synchronization
– #pragma omp barrier
5. Runtime functions/environment variables
– int my_thread_id = omp_get_thread_num();
– omp_set_num_threads(8);



OpenMP: Structured blocks

• Most OpenMP constructs apply to structured blocks.
– Structured block: one point of entry at the top and
one point of exit at the bottom.
– The only “branches” allowed are STOP statements in
Fortran and exit() in C/C++.



OpenMP: Structured blocks

A Structured Block:

#pragma omp parallel
{
more:  do_big_job(id);
       if(++count>1) goto more;
}
printf(" All done \n");

Not A Structured Block:

if(count==1) goto more;
#pragma omp parallel
{
more:  do_big_job(id);
       if(++count>1) goto done;
}
done:  if(!really_done()) goto more;



Structured Block Boundaries
• In C/C++: a block is a single statement or a group of
statements between brackets {}

#pragma omp parallel
{
  id = omp_get_thread_num();
  A[id] = big_compute(id);
}

#pragma omp for
for (I=0; I<N; I++) {
  res[I] = big_calc(I);
  A[I] = B[I] + res[I];
}



Structured Block Boundaries
• In Fortran: a block is a single statement or a group of
statements between directive/end-directive pairs.

C$OMP PARALLEL
10    W(id) = garbage(id)
      res(id) = W(id)**2
      if(res(id) .ne. 0) goto 10
C$OMP END PARALLEL

C$OMP PARALLEL DO
      do I=1,N
        res(I) = bigComp(I)
      end do
C$OMP END PARALLEL DO



OpenMP Parallel Regions
double A[1000];
omp_set_num_threads(4);
#pragma omp parallel
{
  int ID = omp_get_thread_num();
  pooh(ID, A);
}
printf("all done\n");

A single copy of "A" is shared between all threads: the four
threads execute pooh(0,A), pooh(1,A), pooh(2,A), and pooh(3,A)
concurrently. The closing brace is an implicit barrier: threads
wait there for all threads to finish before proceeding.
The OpenMP API
Combined parallel work-share
• OpenMP shortcut: Put the “parallel” and the
work-share on the same line
int i;
double res[MAX];
#pragma omp parallel
{
  #pragma omp for
  for (i=0; i<MAX; i++) {
    res[i] = huge();
  }
}

is the same as:

int i;
double res[MAX];
#pragma omp parallel for
for (i=0; i<MAX; i++) {
  res[i] = huge();
}
Shared Memory Model
[Diagram: threads 1-5, each with its own private data, all
attached to a common shared memory]

• Data can be shared or private
• Shared data is accessible by all threads
• Private data can be accessed only by the thread that owns it
• Data transfer is transparent to the programmer
Data Environment:
Default storage attributes

• Shared memory programming model
  – Variables are shared by default
• Distributed memory programming model
  – All variables are private



Data Environment:
Default storage attributes

• Global variables are SHARED among threads
  – Fortran: COMMON blocks, SAVE variables, MODULE variables
  – C: file-scope variables, static variables

• But not everything is shared...
  – Stack variables in sub-programs called from parallel regions
    are PRIVATE
  – Automatic variables within a statement block are PRIVATE



Data Environment
int A[100][100];    /* (Global) SHARED */

int foo(int x)
{
  int count=0;      /* PRIVATE (stack variable in called function) */
  return x*count;
}

int main()
{
  int ii, jj;       /* PRIVATE */
  int B[100][100];  /* SHARED */
  #pragma omp parallel private(jj)
  {
    int kk = 1;     /* PRIVATE */
    #pragma omp for
    for (ii=0; ii<N; ii++)
      for (jj=0; jj<N; jj++)
        A[ii][jj] = foo(B[ii][jj]);
  }
}
Work Sharing Construct
Loop Construct
#pragma omp for [clause[[,] clause …] new-line
for-loops

Where clause is one of the following:

private / firstprivate / lastprivate(list)
reduction(operator: list)
schedule(kind[, chunk_size])
collapse(n)
ordered
nowait



Schedule
for (i=0; i<1100; i++)
A[i] = … ;

#pragma omp parallel for schedule (static, 250)
  chunks:  250 250 250 250 100
  threads: p0  p1  p2  p3  p0

#pragma omp parallel for schedule (static)
  chunks:  275 275 275 275
  threads: p0  p1  p2  p3

#pragma omp parallel for schedule (dynamic, 200)
  chunks:  200 200 200 200 200 100
  threads: p3  p0  p2  p3  p1  p0   (first come, first served)

#pragma omp parallel for schedule (guided, 100)
  chunks:  137 120 105 100 100 100 100 100 100 100 38
  threads: p0  p3  p0  p1  p2  p3  p0  p1  p2  p3  p0

#pragma omp parallel for schedule (auto)


Critical Construct
sum = 0;
#pragma omp parallel private (lsum)
{
  lsum = 0;
  #pragma omp for
  for (i=0; i<N; i++) {
    lsum = lsum + A[i];
  }
  #pragma omp critical
  { sum += lsum; }
}

Threads wait their turn; only one thread at a time
executes the critical section.



Reduction Clause
sum is a shared variable:

sum = 0;
#pragma omp parallel for reduction (+:sum)
for (i=0; i<N; i++)
{
sum = sum + A[i];
}



Performance Evaluation
• How do we measure performance? (or
how do we remove noise?)
#define N 24000
for (k=0; k<10; k++)
{
#pragma omp parallel for private(i, j)
for (i=1; i<N-1; i++)
for (j=1; j<N-1; j++)
a[i][j] = (b[i][j-1]+b[i][j+1])/2.0;
}



Performance Issues

• What if you see a speedup saturation?

[Plot: speedup vs. number of CPUs (1, 2, 4, 6, 8),
flattening as CPUs are added]

#define N 12000
#pragma omp parallel for private(j)
for (i=1; i<N-1; i++)
  for (j=1; j<N-1; j++)
    a[i][j] = (b[i][j-1]+b[i][j]+b[i][j+1]+
               b[i-1][j]+b[i+1][j])/5.0;



Performance Issues
• What if you see a speedup saturation?

[Plot: speedup vs. number of CPUs (1, 2, 4, 6, 8),
flattening as CPUs are added]

#define N 12000
#pragma omp parallel for private(j)
for (i=1; i<N-1; i++)
for (j=1; j<N-1; j++)
a[i][j] = b[i][j];



Loop Scheduling

• Any guideline for a chunk size?

#define N <big-number>

chunk = ???;
#pragma omp parallel for schedule (static, chunk)
for (i=2; i<N-2; i++)
  a[i] = ( b[i-2] + b[i-1] + b[i] +
           b[i+1] + b[i+2] )/5.0;



Performance Issues

• Load imbalance: triangular access pattern

#define N 12000
#pragma omp parallel for private(j)
for (i=1; i<N-1; i++)
for (j=i; j<N-1; j++)
a[i][j] = (b[i][j-1]+b[i][j]+b[i][j+1]+
           b[i-1][j]+b[i+1][j])/5.0;



Summary
• OpenMP has advantages
– Incremental parallelization
– Compared to MPI
• No data partitioning
• No communication scheduling



Resources

http://www.openmp.org
http://openmp.org/wp/resources
