Lab 2: Brief Tutorial on the OpenMP Programming Model. Adrián Álvarez, Sergi Gil. PAR4207, 2019/2020
In this laboratory we are introduced to the main constructs of the OpenMP
extensions to the C programming language. In the first part we go through a set of
different code versions for the parallel computation of the number Pi, and then we are
presented with a set of examples to practise the main components of the OpenMP
programming model. In the Deliverable part of this session we need to fill in the
questionnaire. Finally, we finish the Deliverable by observing the overheads
introduced by the use of different synchronisation constructs in OpenMP.
2. OpenMP questionnaire
A) Parallel regions
1.hello.c:
1. How many times will you see the "Hello world!" message if the program is executed
with "./1.hello"?
The program prints "Hello world!" 24 times.
2. Without changing the program, how can you make it print the "Hello World!"
message 4 times?
If we execute the command line "OMP_NUM_THREADS=4 ./1.hello" we can set
the number of threads we want to use externally.
2.hello.c:
1. Is the execution of the program correct? (i.e., prints a sequence of "(Thid) Hello (Thid)
world!", Thid being the thread identifier). If not, add a data sharing clause to make it
correct.
No, it is incorrect. We need to change the directive "#pragma omp parallel num_threads(8)" to
"#pragma omp parallel private(id) num_threads(8)", so that each thread has its own copy of id.
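A minimal sketch of the corrected region (assuming, as in the lab code, that the thread identifier is stored in a variable id):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int id;
    /* private(id): each thread gets its own copy, so no thread
       overwrites the identifier read by another one */
    #pragma omp parallel private(id) num_threads(8)
    {
        id = omp_get_thread_num();
        printf("(%d) Hello (%d) world!\n", id, id);
    }
    return 0;
}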
2. Are the lines always printed in the same order? Why do the messages sometimes appear
inter-mixed? (Execute several times in order to see this).
No, the lines are not always printed in the same order, because they are printed in
parallel, not sequentially. Furthermore, the messages sometimes appear inter-mixed
because we are not controlling which threads are printing: if one thread is printing while
another one starts to print, the two messages can interleave.
3.how_many.c:
Assuming the OMP_NUM_THREADS variable is set to 8 with "export OMP_NUM_THREADS=8":
1. How many "Hello world …" lines are printed on the screen?
20 lines.
2. What does omp_get_num_threads return when invoked outside and inside a parallel
region?
Outside a parallel region it returns 1, and inside a parallel region it returns the number
of threads in the team that is currently executing.
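A small sketch illustrating both cases:

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* outside any parallel region only the initial thread exists */
    printf("outside: %d\n", omp_get_num_threads());    /* prints 1 */

    #pragma omp parallel num_threads(8)
    {
        /* inside, it returns the size of the current team */
        #pragma omp single
        printf("inside: %d\n", omp_get_num_threads()); /* prints 8 */
    }
    return 0;
}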
4.data_sharing.c:
1. Which is the value of variable x after the execution of each parallel region with different
data-sharing attributes (shared, private, firstprivate and reduction)? Is that the value you
would expect? (Execute several times if necessary.)
With shared, the variable inside the construct is the same as the one outside the
construct. In a parallel construct this means all threads see the same variable, but not
necessarily the same value, since the unsynchronised updates race with each other.
With private, the variable inside the construct is a new variable of the same type with an
undefined value. In a parallel construct this means every thread has its own copy, and the
outer variable is untouched.
With firstprivate, the variable inside the construct is again a new variable of the same
type, but it is initialized to the value of the original variable. In a parallel construct this
means all threads have a different variable with the same initial value.
With reduction, all threads accumulate values into a single variable. The compiler creates
a private copy of each variable in the list, properly initialized to the identity value of the
operator. At the end of the region, the compiler ensures that the shared variable is
properly (and safely) updated with the partial values of each thread, using the specified
operator.
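A minimal sketch of the four behaviours (the initial value 10 is just an example):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 10;

    #pragma omp parallel shared(x)       /* one variable seen by all threads */
    x++;                                 /* unsynchronised: final value is unpredictable */

    #pragma omp parallel private(x)      /* new uninitialized copy per thread */
    x = 0;                               /* must be set before use; outer x unchanged */

    #pragma omp parallel firstprivate(x) /* per-thread copy initialized to outer x */
    x++;                                 /* outer x still unchanged */

    #pragma omp parallel reduction(+:x)  /* per-thread copy initialized to 0 */
    x++;                                 /* partial values safely added to outer x at the end */

    printf("final x = %d\n", x);
    return 0;
}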
B) Loop parallelism
1.schedule.c
1. Which iterations of the loops are executed by each thread for each schedule kind?
- static: the iterations are distributed equally among the threads in contiguous blocks
(3 iterations per thread in this case).
- static, chunk=2: the iterations are again distributed equally, but in chunks of 2
iterations dealt out to the threads in round-robin order.
- dynamic, chunk=2: each chunk of 2 iterations is given to whichever thread finishes
first, until the loop runs out of iterations.
- guided, chunk=2: a variant of dynamic where the size of the chunks decreases as the
threads grab iterations, but is always at least 2.
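A sketch reproducing the four schedules on a 12-iteration loop with 4 threads (a hand-written stand-in for 1.schedule.c):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i;
    /* static: each thread gets one contiguous block of 3 iterations */
    #pragma omp parallel for schedule(static) num_threads(4)
    for (i = 0; i < 12; i++)
        printf("static: thread %d runs i=%d\n", omp_get_thread_num(), i);

    /* static,2: chunks of 2 iterations dealt out round-robin */
    #pragma omp parallel for schedule(static, 2) num_threads(4)
    for (i = 0; i < 12; i++)
        printf("static,2: thread %d runs i=%d\n", omp_get_thread_num(), i);

    /* dynamic,2: each idle thread grabs the next chunk of 2 */
    #pragma omp parallel for schedule(dynamic, 2) num_threads(4)
    for (i = 0; i < 12; i++)
        printf("dynamic,2: thread %d runs i=%d\n", omp_get_thread_num(), i);

    /* guided,2: chunk size shrinks as iterations run out, never below 2 */
    #pragma omp parallel for schedule(guided, 2) num_threads(4)
    for (i = 0; i < 12; i++)
        printf("guided,2: thread %d runs i=%d\n", omp_get_thread_num(), i);
    return 0;
}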
2.nowait.c
2. How does the sequence of printf change if the nowait clause is removed from the first
for directive?
The nowait clause removes the implicit barrier (the synchronisation time between
threads) at the end of the first loop. If it is removed, every thread must finish the first
loop before any of them starts the second one, so the messages of the two loops no
longer interleave and the program runs more slowly than with nowait.
3. What would happen if dynamic is changed to static in the schedule in both loops?
(keeping the nowait clause)
The behaviour barely changes: with this amount of iterations it is hard to appreciate any
difference between the dynamic and the static schedules.
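A sketch of the structure, assuming two consecutive worksharing loops as in 2.nowait.c:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(4)
    {
        /* nowait removes the implicit barrier at the end of this loop,
           so fast threads start the second loop while others are still
           in the first one, and the messages interleave */
        #pragma omp for schedule(dynamic, 2) nowait
        for (int i = 0; i < 12; i++)
            printf("first loop: thread %d, i=%d\n", omp_get_thread_num(), i);

        /* without the nowait above, every "second loop" line would only
           appear after all "first loop" lines */
        #pragma omp for schedule(dynamic, 2)
        for (int i = 0; i < 12; i++)
            printf("second loop: thread %d, i=%d\n", omp_get_thread_num(), i);
    }
    return 0;
}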
3.collapse.c
1. Which iterations of the loop are executed by each thread when the collapse clause is
used?
The i and j loops are folded into a single iteration space whose iterations are distributed
among all the threads, and both i and j are privatized. That means the iterations of the
two loops are combined.
2. Is the execution correct if the collapse clause is removed? Which clause (different than
collapse) should be added to make it correct?
The execution is not correct: some iterations are repeated. In order to make it correct we
need to privatize the variable j:
#pragma omp parallel for private(j)
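A sketch of both correct variants, using a small 4x4 loop nest as an example:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i, j;
    /* collapse(2) folds the 4*4 = 16 (i,j) pairs into a single
       iteration space and privatizes both i and j */
    #pragma omp parallel for collapse(2)
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            printf("thread %d: i=%d j=%d\n", omp_get_thread_num(), i, j);

    /* without collapse, only i is privatized automatically; j must be
       made private by hand or the threads race on it and repeat pairs */
    #pragma omp parallel for private(j)
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            printf("thread %d: i=%d j=%d\n", omp_get_thread_num(), i, j);
    return 0;
}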
C) Synchronization
1.datarace.c
1. Is the program always computing the correct result?
No, never: the loop updates a shared variable without any synchronisation, so the data
race makes the final result wrong.
2. Add two alternative directives to make it correct. Explain why they make the execution
correct.
One is to put #pragma omp critical and the other is to put #pragma omp atomic inside the
loop, just before the update of the shared variable. Both make the update mutually
exclusive, so only one thread at a time modifies the variable.
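A minimal sketch of both alternatives protecting a shared counter (x and y are hypothetical variables):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int x = 0, y = 0;

    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        /* critical: only one thread at a time executes the block */
        #pragma omp critical
        x++;
    }

    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        /* atomic: only the update itself is made atomic (cheaper) */
        #pragma omp atomic
        y++;
    }

    printf("x=%d y=%d\n", x, y);  /* both print 1000 */
    return 0;
}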
2.barrier.c
1. Can you predict the sequence of messages in this program? Do threads exit from the
barrier in any specific order?
Not entirely: we can predict that the messages will appear as four "going to sleep" lines,
then four "entering the barrier" lines and finally four "awake" lines, but we cannot know in
which order the threads print within each group.
3.ordered.c
1. Can you explain the order in which the ”Outside” and ”Inside” messages are printed?
Inside the ordered section we can ensure that the messages are printed in iteration order,
but outside it we cannot ensure any particular order.
2. How can you ensure that a thread always executes two consecutive iterations in order
during the execution of the ordered part of the loop body?
Writing #pragma omp ordered before the first printf.
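A sketch of an ordered loop illustrating this behaviour:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel for ordered schedule(dynamic)
    for (int i = 0; i < 8; i++) {
        printf("outside: i=%d\n", i);  /* may appear in any order */
        #pragma omp ordered
        printf("inside: i=%d\n", i);   /* always appears in loop order */
    }
    return 0;
}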
D) Tasks
1.single.c
1. Can you explain why all threads contribute to the execution of instances of the single
work-sharing construct? Why do those instances appear to be executed in bursts?
#pragma omp single makes only one thread of the team execute each instance of the
structured block, but any thread may be the one that executes a given instance, so over
time all threads contribute. The instances appear to be executed in bursts because there
is an implicit barrier at the end of the single construct.
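A sketch reproducing this behaviour with repeated single constructs:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(4)
    for (int i = 0; i < 8; i++) {
        /* every thread reaches this single, but only one (any of them)
           executes each instance; the implicit barrier at its end makes
           the team advance in lockstep, so the output comes in bursts */
        #pragma omp single
        printf("instance %d executed by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}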
2.fibtasks.c
1. Why are all tasks created and executed by the same thread? In other words, why is the
program not executing in parallel?
With #pragma omp task alone, the tasks here end up being created and immediately
executed by the same thread, and the other threads do not cooperate in executing the
work, so the program runs serially.
2. Modify the code so that the program correctly executes in parallel, returning the same
answer that the sequential execution would return.
We need to write #pragma omp single just before the first while, so that a single thread
generates the tasks and the whole team executes them.
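A sketch of the fixed pattern, with node, process and traverse as hypothetical names standing in for the list traversal of 2.fibtasks.c:

#include <stddef.h>
#include <omp.h>

/* hypothetical stand-ins for the real list code */
typedef struct node { struct node *next; } node;
static void process(node *p) { (void)p; /* work on one node */ }

void traverse(node *head) {
    #pragma omp parallel
    #pragma omp single          /* one thread generates the tasks... */
    {
        node *p = head;
        while (p != NULL) {
            #pragma omp task firstprivate(p)
            process(p);         /* ...and the whole team executes them */
            p = p->next;
        }
    }
}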
3.synchtasks.c
(Task dependency graph: Main Task at the top; then Foo1 and Foo2; then Foo3 and Foo4;
finally Foo5.)
2. Rewrite the program using only taskwait as task synchronisation mechanism (no
depend clauses allowed).
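A possible rewrite, assuming foo1 to foo5 are the functions called in 3.synchtasks.c and the dependence levels shown in the graph above: foo3 and foo4 cannot start until foo1 and foo2 have finished, and foo5 cannot start until foo3 and foo4 have finished.

#pragma omp parallel
#pragma omp single
{
    #pragma omp task
    foo1();
    #pragma omp task
    foo2();
    #pragma omp taskwait   /* foo3/foo4 need foo1 and foo2 finished */

    #pragma omp task
    foo3();
    #pragma omp task
    foo4();
    #pragma omp taskwait   /* foo5 needs foo3 and foo4 finished */

    #pragma omp task
    foo5();
}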
4.taskloop.c
1. Find out how many tasks and how many iterations each task executes when using the
grainsize and num_tasks clauses in a taskloop. You will probably have to execute the
program several times in order to have a clear answer to this question.
If a grainsize clause is present, the number of iterations assigned to each generated task
is greater than or equal to the value of the grain-size expression and less than two times
that value. So in our case each task executes 6 iterations, and we use 2 threads.
If num_tasks is specified, the taskloop creates as many tasks as the minimum of the
num_tasks expression and the number of iterations, and each task must have at least one
iteration. So the distribution is quite variable.
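A sketch illustrating both clauses on a 12-iteration loop (grainsize(3) and num_tasks(4) are example values):

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel num_threads(2)
    #pragma omp single
    {
        /* grainsize(3): each task runs at least 3 and fewer than 6 iterations */
        #pragma omp taskloop grainsize(3)
        for (int i = 0; i < 12; i++)
            printf("grainsize: thread %d, i=%d\n", omp_get_thread_num(), i);

        /* num_tasks(4): min(4, 12) = 4 tasks of 3 iterations each */
        #pragma omp taskloop num_tasks(4)
        for (int i = 0; i < 12; i++)
            printf("num_tasks: thread %d, i=%d\n", omp_get_thread_num(), i);
    }
    return 0;
}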
2. What does occur if the nogroup clause in the first taskloop is uncommented?
With nogroup, the implicit taskgroup that surrounds the taskloop is removed, so the
thread that created the tasks does not wait for all of them to finish before continuing
with the code that follows the taskloop.
3. Observing overheads
Thread creation and termination
While executing the program pi_omp_parallel we can observe the overhead times. Using
more threads increases the total execution time of the overheads, but on the other hand
the overhead per thread decreases.
Task creation and synchronization
While executing the program pi_omp_tasks we can observe the following overhead times
(total and per thread):
Threads   Total overhead   Overhead per thread
2         0.1184           0.0592
4         0.5055           0.1264
6         0.7445           0.1241
8         0.9947           0.1243
10        1.2360           0.1236
12        1.4786           0.1232
14        1.7191           0.1228
16        1.9504           0.1219
18        2.2065           0.1226
20        2.4496           0.1225
22        2.6902           0.1223
24        2.9456           0.1227
26        3.1772           0.1222
28        3.4752           0.1241
30        3.6529           0.1218
32        3.8936           0.1217
34        4.1446           0.1219
36        4.3907           0.1220
38        4.6352           0.1220
40        4.8735           0.1218
42        5.1101           0.1217
44        5.3618           0.1219
46        5.6119           0.1220
48        5.8287           0.1214
50        6.0919           0.1218
52        6.3349           0.1218
54        6.5618           0.1215
56        6.8169           0.1217
58        7.0564           0.1217
60        7.3124           0.1219
62        7.5558           0.1219
64        7.7741           0.1215
In this table we can observe that using more threads increases the total execution time
of the overheads, but on the other hand, from 4 threads onwards the overhead per thread
stays practically constant for the following executions.
4. Conclusion
After doing the deliverable and reading all the information given in the session, in
addition to the short OpenMP tutorial slides posted in Atenea, we learned more about the
different parallelization clauses and how to use them properly.
Furthermore, we studied the behaviour of the overheads in a parallel region depending on
the number of threads used in the program.