

PAR

Lab 2: Brief tutorial on OpenMP programming model

Adrián Álvarez, Sergi Gil


par4207
2019/2020
1. Introduction

In this laboratory session we are introduced to the main constructs of the OpenMP
extensions to the C programming language. In the first part we go through a set of
code versions that compute the number Pi in parallel; then we are presented with a set
of examples to practise the main components of the OpenMP programming model. In the
deliverable part of this session we have to fill in the questionnaire. Finally, we complete
the deliverable by observing the overheads introduced by the use of different
synchronisation constructs in OpenMP.
2. OpenMP questionnaire
A) Parallel regions

1.hello.c:

1. How many times will you see the "Hello world!" message if the program is executed
with "./1.hello"?
The program prints "Hello world!" 24 times: once per thread in the default team, which
has 24 threads on this machine.

2. Without changing the program, how to make it to print 4 times the "Hello World!"
message?
If we run the program as “OMP_NUM_THREADS=4 ./1.hello”, we set the number of threads
externally through the environment, without changing the program.

2.hello.c:

1. Is the execution of the program correct? (i.e., prints a sequence of "(Thid) Hello (Thid)
world!" being Thid the thread identifier). If not, add a data sharing clause to make it
correct?
No, it is incorrect: the variable id is shared, so threads overwrite each other's
identifier. We need to change the directive “#pragma omp parallel num_threads(8)” to
“#pragma omp parallel private(id) num_threads(8)” so that each thread has its own copy
of id.

2. Are the lines always printed in the same order? Why do the messages sometimes appear
intermixed? (Execute several times in order to see this).
No, the lines are not always printed in the same order, because the threads execute in
parallel rather than sequentially. The messages sometimes appear intermixed because we
do not control when each thread prints: if one thread is in the middle of printing while
another one starts printing, their output gets interleaved.

3.how_many.c:

Assuming the OMP_NUM_THREADS variable is set to 8 with "export OMP_NUM_THREADS=8"

1. How many "Hello world …" lines are printed on the screen?
20 lines.

2. What does omp_get_num_threads return when invoked outside and inside a parallel
region?
Outside a parallel region it returns 1, because the team consists of just the initial
thread; inside a parallel region it returns the number of threads in the current team.

4.data_sharing.c:

1. Which is the value of variable x after the execution of each parallel region with different
data-sharing attribute ( shared, private, firstprivate and reduction)? Is that the value you
would expect? (Execute several times if necessary).

With shared, the variable inside the construct is the same as the one outside the
construct. In a parallel construct this means all threads see the same variable,
although, without synchronisation, not necessarily a consistent value.

With private, the variable inside the construct is a new variable of the same type with
an undefined initial value. In a parallel construct this means each thread has its own
copy.

With firstprivate, the variable inside the construct is a new variable of the same type,
but it is initialised to the value the original variable had on entry. In a parallel
construct this means each thread has its own copy with the same initial value.

With reduction, all threads accumulate values into a single variable. The compiler
creates a private copy of each variable in the list, properly initialised to the
identity value of the operator. At the end of the region, the compiler ensures that the
shared variable is safely updated with the partial values of each thread, using the
specified operator.

B) Loop parallelism

1.schedule.c

1. Which iterations of the loops are executed by each thread for each schedule kind?

-Static: the iterations are distributed evenly among the threads in contiguous blocks
(3 consecutive iterations per thread in this case).
-Static(chunk=2): the iterations are grouped into chunks of 2 and the chunks are
assigned to the threads in round-robin order.
-Dynamic(chunk=2): each thread grabs a chunk of 2 iterations as soon as it finishes its
previous chunk, until the iterations run out; faster threads therefore execute more
chunks.
-Guided(chunk=2): a variant of dynamic in which the size of the chunks decreases as the
threads grab iterations, but it is always at least 2.

2.nowait.c

1. Which could be a possible sequence of printf when executing the program?


It could be any sequence of threads (e.g. 0, 2, 3, 1), because the schedule is dynamic
and, with nowait, a thread may enter the second loop before the others finish the first.

2. How does the sequence of printf change if the nowait clause is removed from the first
for directive?

The nowait clause removes the implicit barrier at the end of the first loop. If it is
removed, the barrier is restored: all threads must finish the first loop before any of
them starts the second one, so all the messages of the first loop are printed before any
message of the second loop.

3. What would happen if dynamic is changed to static in the schedule in both loops?
(keeping the nowait clause)

With schedule(static) in both loops (and the same iteration space), each thread is
assigned the same iterations in both loops. Since a thread only starts its second-loop
iterations after finishing its own first-loop iterations, for every iteration i the
message of the first loop is always printed before the corresponding message of the
second loop, even with nowait.

3.collapse.c

1. Which iterations of the loop are executed by each thread when the collapse clause is
used?

The i and j loops are folded into a single iteration space and the combined iterations
are distributed among all threads; both i and j are privatised automatically.

2. Is the execution correct if the collapse clause is removed? Which clause (different than
collapse) should be added to make it correct?

The execution is not correct: since j is shared, the threads interfere with each other's
inner loop and some iterations are repeated or skipped. To make it correct we need to
privatise the variable j:
#pragma omp parallel for private(j)

C) Synchronization

1.datarace.c

(execute several times before answering the questions)


1. Is the program always executing correctly?

No; in our executions it never produced the correct result, because the unprotected
updates to the shared variable race with each other and some increments are lost.

2. Add two alternative directives to make it correct. Explain why they make the execution
correct.
One is to put #pragma omp critical and the other is to put #pragma omp atomic, inside de
loop.
2.barrier.c

1. Can you predict the sequence of messages in this program? Do threads exit from the
barrier in any specific order?

Not entirely: we can predict the phases of the messages ("going to sleep" 4 times, then
"entering the barrier" 4 times, and finally "awake" 4 times), but not the order of the
threads within each phase; threads do not exit the barrier in any specific order.

3.ordered.c

1. Can you explain the order in which the ”Outside” and ”Inside” messages are printed?

Inside the ordered construct the messages are guaranteed to be printed in loop order;
outside it, no order can be guaranteed.

2. How can you ensure that a thread always executes two consecutive iterations in order
during the execution of the ordered part of the loop body?

Adding a schedule(static, 2) clause to the loop directive, so each thread is assigned
chunks of two consecutive iterations and executes their ordered parts back to back, in
order.

D) Tasks

1.single.c

1. Can you explain why all threads contribute to the execution of instances of the single
work-sharing construct? Why do those instances appear to be executed in bursts?

The single construct makes only one thread of the team execute the structured block, but
it is not always the same thread: every thread of the team reaches each instance, and
whichever arrives first executes it, so over many instances all threads contribute. The
instances appear to be executed in bursts because there is an implicit barrier at the end
of each single construct: the whole team waits there and then all threads race to the
next instance together.

2.fibtasks.c

1. Why all tasks are created and executed by the same thread? In other words, why the
program is not executing in parallel?

With #pragma omp task alone, only the thread that encounters the construct creates the
tasks; since the list is traversed by a single thread and no team is set up to share the
task pool, that same thread also ends up executing every task, so the program does not
run in parallel.

2. Modify the code so that the program correctly executes in parallel, returning the same
answer that the sequential execution would return.

We need to write #pragma omp single just before the first while loop, so that one thread
traverses the list and creates the tasks while the rest of the team executes them.

3.synchtasks.c

1. Draw the task dependence graph that is specified in this program

Main task
  level 1:  foo1  foo2   (independent, can run in parallel)
  level 2:  foo3  foo4   (depend on the tasks of level 1)
  level 3:  foo5         (depends on foo3 and foo4)

2. Rewrite the program using only taskwait as task synchronisation mechanism (no depend
clauses allowed)

We need to insert #pragma omp taskwait between the levels of the graph: one after
creating the tasks for foo1 and foo2, and another after creating the tasks for foo3 and
foo4, so that foo5 only starts once all its predecessors have finished.

4.taskloop.c

1. Find out how many tasks and how many iterations each task execute when using the
grainsize and num tasks clause in a taskloop. You will probably have to execute the
program several times in order to have a clear answer to this question.

If a grainsize clause is present, the number of iterations assigned to each generated
task is greater than or equal to the value of the grainsize expression and less than two
times that value. In our executions each task received 6 iterations and 2 tasks were
created.
If num_tasks is specified, the taskloop creates as many tasks as the minimum of the
num_tasks expression and the number of iterations, and each task executes at least one
iteration, so the exact split is quite variable.

2. What occurs if the nogroup clause in the first taskloop is uncommented?

If the nogroup clause is present, no implicit taskgroup region is created, so the
encountering thread does not wait for the generated tasks to finish before continuing.


3. Observing overheads
Please explain in this section of your deliverable the main results obtained and your
conclusions in terms of overheads for parallel, task and the different synchronisation
mechanisms. Include any tables/plots that support your conclusions.

Thread creation and termination

While executing the program pi_omp_parallel we measured the following thread creation
and termination overheads:

All overheads expressed in microseconds


Nthr Overhead Overhead per thread
2 1.2585 0.6292
3 1.5174 0.5058
4 1.4346 0.3587
5 1.4287 0.2857
6 1.6045 0.2674
7 1.5804 0.2258
8 1.8923 0.2365
9 2.0098 0.2233
10 2.0199 0.2020
11 1.9740 0.1795
12 2.0101 0.1675
13 2.2740 0.1749
14 2.3075 0.1648
15 2.2777 0.1518
16 2.3456 0.1466
17 2.5894 0.1523
18 2.4365 0.1354
19 2.4644 0.1297
20 2.4940 0.1247
21 2.5319 0.1206
22 2.5928 0.1179
23 2.7006 0.1174
24 2.8307 0.1179

In this table we can observe that using more threads increases the total overhead of
creating and terminating the parallel region, but the overhead per thread decreases,
since the cost is amortised over more threads.
Task creation and synchronization

While executing the program pi_omp_tasks we measured the following task creation and
synchronisation overheads:

All overheads expressed in microseconds


Ntasks Overhead Overhead per task

2 0.1184 0.0592
4 0.5055 0.1264
6 0.7445 0.1241
8 0.9947 0.1243
10 1.2360 0.1236
12 1.4786 0.1232
14 1.7191 0.1228
16 1.9504 0.1219
18 2.2065 0.1226
20 2.4496 0.1225
22 2.6902 0.1223
24 2.9456 0.1227
26 3.1772 0.1222
28 3.4752 0.1241
30 3.6529 0.1218
32 3.8936 0.1217
34 4.1446 0.1219
36 4.3907 0.1220
38 4.6352 0.1220
40 4.8735 0.1218
42 5.1101 0.1217
44 5.3618 0.1219
46 5.6119 0.1220
48 5.8287 0.1214
50 6.0919 0.1218
52 6.3349 0.1218
54 6.5618 0.1215
56 6.8169 0.1217
58 7.0564 0.1217
60 7.3124 0.1219
62 7.5558 0.1219
64 7.7741 0.1215
In this table we can observe that using more tasks increases the total creation and
synchronisation overhead roughly linearly, but from 4 tasks onwards the overhead per
task remains almost constant, at about 0.12 microseconds.

4. Conclusion

After completing the deliverable and reading all the information given in the session,
in addition to the short OpenMP tutorial slides posted in Atenea, we have learned more
about the different parallelisation constructs and how to use them properly.

Furthermore, we studied how the overheads of a parallel region depend on the number of
threads used by the program.
