DS3202:

PARALLEL PROGRAMMING
MODULE – 3
Advanced OpenMP

6th SEM
B.Tech
DSE

Advanced OpenMP Constructs
Schedule clause
• Specifies how loop iterations are divided among the team of threads.
• The schedule refers to the way in which loop indices are distributed
among threads.
• The default schedule is implementation dependent.
• Supported scheduling types are:
• Static
• Dynamic
• Guided
• Runtime
• Auto
Why Schedule Clause?
• A parallel region has at least one barrier, at its end, and may have
additional barriers within it.
• At each barrier, the other members of the team must wait for the last
thread to arrive.
• To minimize this wait time, shared work should be distributed so that
all threads arrive at the barrier at about the same time.
• If some of that shared work is contained in for constructs, the
schedule clause can be used for this purpose.

Static Clause
• Static is the default in most implementations (the OpenMP default is
implementation defined).
• Loop iterations are divided into pieces of size chunk and then statically
assigned to threads.
• If chunk is not specified, the iterations are divided contiguously and as
evenly as possible among the threads.
• The size of a chunk, denoted as chunk_size, must be a positive integer.
• When a chunk size is given, chunks are assigned to threads in round-robin
order of thread number; this is known as block-cyclic scheduling.
• The number of indices assigned to each thread is as equal as possible.
• #pragma omp parallel for schedule(static, chunk_size)
for(i=0; i<8; i++)
{
… (loop body)
}
Dynamic Clause
• Loop iterations are divided into chunks containing n iterations each and
then dynamically assigned to threads.
• When a thread finishes one chunk, it is dynamically assigned another.
• The default chunk size is 1.
• Which iterations a thread picks up depends on the relative speeds of
thread execution.
• #pragma omp parallel for schedule(dynamic, 1) is equivalent to
#pragma omp parallel for schedule(dynamic)
• #pragma omp parallel for schedule(dynamic, chunk_size)
for(i=0; i<8; i++)
{
… (loop body)
}
Guided Clause
• With guided, the chunk size shrinks dynamically as the loop progresses.
• The size of a chunk is proportional to the number of unassigned iterations
divided by the number of threads, and the size decreases toward chunk_size
(though the last chunk may be smaller than chunk_size).
• chunk_size = max(remaining_iterations / (2 × num_threads), n)
• The exact formula may differ across compiler implementations.
• If you specify n, that is the minimum chunk size each thread should get.
• Each successive chunk is smaller than the last.
• The default chunk size is 1.
• #pragma omp parallel for schedule(guided, chunk_size)
for(i=0; i<8; i++)
{
… (loop body)
}
• The guided schedule is appropriate for the case in which the threads
may arrive at varying times at a for construct with each iteration
requiring about the same amount of work.
• This situation can happen if, for example, the for construct is
preceded by one or more sections or for constructs with nowait
clauses.

Runtime Clause
• The scheduling type is determined at run time from the OMP_SCHEDULE
environment variable, e.g.:
• export OMP_SCHEDULE="static,4"
• The schedule(runtime) clause defers the scheduling decision until run
time, when it is read from that environment variable.
• It is illegal to specify a chunk size with this clause.
• #pragma omp for schedule(runtime)
for(i=0; i<8; i++)
{
… (loop body)
}
Auto Clause
• With auto, scheduling is delegated to the compiler and runtime system.
• The compiler and runtime system can choose any possible mapping of
iterations to threads (including all possible valid schedules) and these
may be different in different loops.
• That is, it delegates the scheduling decision to the compiler and/or
runtime system.
• Thus, the schedule is chosen automatically by the implementation.
• #pragma omp parallel for schedule (auto)
for(i=0; i<8; i++)
{
… (loop body)
}
Flush Directive
• Flush operation does not actually synchronize different threads.
• It just ensures that a thread’s values are made consistent with main memory.
• Flush forces data to be updated in memory so other threads see the most
recent value.
• Thread-visible variables are written back to memory at this point.
• It prevents threads from observing stale or outdated values of variables.
• For pointers in the list, note that the pointer itself is flushed, not the object it
points to.

• However, processors have their own registers and caches.
• If a thread updates shared data, the new value may first be held in a
register and then stored in the local cache.
• The updates are thus not necessarily immediately visible to other
threads.
Usage of flush directive
• The flush directive is typically used in conjunction with shared variables that
are accessed by multiple threads within a parallel region.
• Placed before a memory read operation to ensure that the most recent
values of shared variables are observed.
• Placed after a memory write operation to ensure that the updated values
are visible to other threads.
• Syntax : #pragma omp flush(list)
• where list is a comma-separated list of variables whose updates need to be
synchronized across threads.
• Naming variables in the list restricts the flush to those variables,
avoiding a flush of all thread-visible variables.
• The FLUSH directive is implied by several other directives: at each
barrier, at entry to and exit from parallel, critical, and ordered
regions, and at exit from worksharing constructs (for, sections, single).
• The flush is not implied if a NOWAIT clause is present.
Memory ordering using flush directive
• The flush directive imposes a memory ordering constraint on the execution
of the program, ensuring that updates to variables are propagated in the
specified order.
• Prevents reordering of memory operations by the compiler or hardware,
which could lead to incorrect program behavior.

Example Code-Flush Directive

Explanation of code
• In this code, we have a shared variable shared_var. Thread 0 updates the
shared variable to the value 10.
• The flush directive after the update ensures that the new value is
written back to memory where other threads can see it.
• All threads then wait at the barrier (#pragma omp barrier), which
guarantees the update has completed before anyone reads it.
• Once past the barrier, both threads print the value of the shared
variable, which is 10 in this case.
Nested Parallelism
• OpenMP parallel regions can be nested inside each other.
• That means the ability to create parallel regions within other parallel regions.
• Nested parallelism enables a hierarchical structure of parallelism.
• Nested parallelism can potentially improve the utilization of computing resources.
• You need to turn on nested parallelism by setting the OMP_NESTED
environment variable or calling omp_set_nested(), because many
implementations turn this feature off by default.
• If nested parallelism is disabled, then the new team created by a thread
encountering a parallel construct inside a parallel region consists only of the
encountering thread.
• If nested parallelism is enabled, then the new team may consist of more than one
thread.
Creation of nested parallel regions
• Nested parallel regions are created by encountering additional
parallel constructs (#pragma omp parallel) within an existing parallel
region.
• When a thread encounters a nested parallel construct, it creates a
team of threads to execute the nested parallel region.
• The number of threads created for the nested parallel region may be
determined by the num_threads clause or the environment settings.

Example code 1– Nested Parallelism

Explanation of Code
• Each thread in the outer parallel region will print its thread ID.
• Then, each thread in the outer parallel region will enter the inner parallel
region.
• Within the inner parallel region, each thread will print its thread ID again.
• This demonstrates that each thread in the outer parallel region can
spawn threads to execute the inner parallel region.

Performance Considerations in Nested Parallelism

• Nested parallelism allows for more granular control over thread allocation
and resource utilization.
• The threads from outer parallel regions can be reused or allocated to
execute inner parallel regions.
• While nested parallelism can potentially improve performance by exploiting
additional parallelism, it can also introduce overhead due to thread
management and synchronization.
• Excessive nesting may lead to diminishing returns or increased overhead.

Thread private directive
• Used to declare variables that should have private instances for each thread
in a parallel region.
• Variables declared with the "thread private" directive have private instances
for each thread in a parallel region.
• Syntax : #pragma omp threadprivate(variable_list)
• where variable_list is a comma-separated list of variables that should
have private instances for each thread.

Scope and initialization
• Scope :
• The "thread private" directive typically appears outside of parallel regions, often at
the global or file scope.
• Specifies that the listed variables should be treated as thread-private for all
subsequent parallel regions.
• Initialization :
• Each thread's copy of a threadprivate variable is initialized from the
variable's original initial value.
• Assignments made by one thread (including the master thread before a
parallel region) are not seen by the other threads' copies unless the
copyin clause is used.
Usage and memory overhead
• Usage : Commonly used for global variables or variables declared at file
scope that need to be shared across multiple parallel regions but have
private instances for each thread.
• Memory Overhead :
• Using the "thread private" directive incurs memory overhead, as each thread
maintains its own private copy of the variable.
• Hence, it should be used judiciously, especially for large variables or in scenarios with
a large number of threads.

Example code of thread private
#include <stdio.h>
#include <omp.h>

// Global variable declared as threadprivate
// (the directive must follow the declaration)
int global_var = 0;
#pragma omp threadprivate(global_var)

int main() {
    // The master thread sets its own copy to its thread ID (0)
    global_var = omp_get_thread_num();

    // Parallel region with two threads
    #pragma omp parallel num_threads(2)
    {
        // Each thread prints its own thread-private copy of the variable
        printf("Thread %d: Global Variable = %d\n", omp_get_thread_num(),
               global_var);
    }
    return 0;
}
Code explanation
• The global variable global_var is marked as threadprivate, so each thread
has its own private instance of this variable.
• Before entering the parallel region, the master thread sets its own copy
of global_var to its thread ID (which is 0 in this case).
• Within the parallel region, there are two threads.
• Thread 0 prints its own private instance of global_var, which retains the
value set outside the parallel region (0).
• Thread 1 prints its own private instance, which still holds the initial
value 0: it never sees the master thread's assignment (the copyin clause
would be needed to broadcast it).
• Data in THREADPRIVATE objects is guaranteed to persist only if the
dynamic threads mechanism is "turned off" and the number of threads in
different parallel regions remains constant. The default setting of
dynamic threads is implementation defined.

• The THREADPRIVATE directive must appear after every declaration of
a thread private variable/common block.
Difference between private and thread private
• Scope :
• The "thread private" directive is typically used at the global or file scope and specifies
that the listed variables should have private instances for each thread in all subsequent
parallel regions.
• The "private" data scope attribute clause is used within parallel constructs (e.g.,
parallel, for, sections) to declare private variables and applies only within the specific
parallel region where it is used and affects only that region.
• Usage :
• Thread private directive is used for variables that need to be shared across multiple
parallel regions but have private instances for each thread.
• Private data scope attribute clause is used to declare private variables within a specific
parallel region.

Continued………

• Initialization :
• Variables declared with the "thread private" directive are initialized once
at the beginning of the program execution.
• Private variables declared within a parallel region are uninitialized at
the start of the region; use firstprivate to initialize them from the value
of the corresponding outer variable.
• Memory overhead :
• The thread private directive incurs memory overhead, as each thread
maintains its own private copy of the variable.
• Private variables declared within a parallel region do not incur memory
overhead outside of that region.
End of Module 3
(THANK YOU)
