Chapter 5 - OpenMP Flashcards

1
Q

What is OpenMP?

A

API for shared memory MIMD programming

Open Multi-Processing

View a system using OpenMP as a collection of autonomous cores, all having access to the same memory.

2
Q

What are some differences between omp and pthreads?

A

Both are standard APIs for shared-memory programming.

Pthreads requires the programmer to explicitly specify the behaviour of each thread. In OpenMP the programmer can simply state that a specific block of code is to be executed by multiple threads; the compiler and run-time system work out the specifics of the thread use.

Pthreads is purely a library, while OpenMP also needs compiler support.

Pthreads pros: lower level, giving the opportunity to program virtually any conceivable thread behaviour.

Pthreads cons: every detail of thread behaviour must be specified - more difficult to implement.

OpenMP pros: simpler to implement - the run-time and compiler take care of the details.

OpenMP cons: some lower-level thread behaviour may be more difficult to implement.

3
Q

What is a directives-based shared-memory API?

A

An API in which parallel behaviour is specified through special preprocessor instructions known as pragmas.

4
Q

What are pragmas?

A

Pragmas are added to allow behaviours that aren't part of the basic C specification.

Compilers that don't support a given pragma simply ignore it.

The compiler discovers pragmas during its initial scan. If it understands the text, the corresponding functionality is implemented; if not, the pragma is ignored.

5
Q

How are omp pragmas defined ?

A

They always begin with #pragma omp, followed by a directive and optional clauses.

6
Q

What is the header file of omp?

A

#include <omp.h>

7
Q

List omp directives and what they do

A

#pragma omp parallel
Specifies that the structured block of code that follows should be executed by multiple threads. The number of threads started is determined by the run-time system (typically one thread per core, but the algorithm for deciding this can be quite complicated).

#pragma omp parallel num_threads(n)
The parallel directive with a num_threads clause to request a specific number of threads. The system can't guarantee that n threads will be started, because of system limits, but most of the time it will.

#pragma omp critical
Tells the compiler that we need a mechanism ensuring the following block of code is executed mutually exclusively - only one thread at a time.

#pragma omp parallel for
Forks a team of threads to execute the following structured block, which must be a for-loop.

#pragma omp master
Only the master thread executes the following block.
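A minimal sketch of the first three directives together (my own example, not from the card), assuming a compiler with OpenMP support (e.g. gcc -fopenmp):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int count = 0;
    /* request a team of 4 threads (a request, not a guarantee) */
    #pragma omp parallel num_threads(4)
    {
        /* only one thread at a time executes this statement */
        #pragma omp critical
        count++;
    }
    printf("threads that ran the block: %d\n", count);
    return 0;
}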

8
Q

What is a clause in omp?

A

Text that follows a directive and modifies its behaviour, e.g. num_threads(4).

9
Q

What is a team in omp?

A

The collection of threads executing the block that follows the (parallel) directive

original thread + (n-1) new threads

10
Q

What happens when #pragma omp parallel is used?

A

From the start, the program is running a single thread.

When the directive is reached, n-1 additional threads are started, and the original thread continues execution as part of the team.

Each thread executes the following block of code in parallel.

When this block is completed, there is an implicit barrier. A thread that has completed the block will wait for all other threads in the team to complete. The children will then terminate and the parent continues executing the following code.

11
Q

Define these terms in omp:

master, parent, child

A

master: first thread of execution, thread 0

parent: thread that encountered parallel directive and started a team of threads. This is often the master thread.

child: Each thread started by parent

12
Q

What data does a child thread have access to

A

ID: rank, omp_get_thread_num();

number of threads in team: omp_get_num_threads();

Its own stack, and therefore its own local variables
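A small sketch (my own, not from the card) of querying rank and team size inside a parallel block:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int rank = omp_get_thread_num();   /* this thread's ID in the team */
        int size = omp_get_num_threads();  /* number of threads in the team */
        printf("Hello from thread %d of %d\n", rank, size);
    }
    return 0;
}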

13
Q

How are critical sections handled in omp to avoid condition variables?

A

#pragma omp critical

The following code block is executed by only one thread at a time.
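A hedged sketch of protecting a shared update with critical (variable names are my own):

#include <stdio.h>
#include <omp.h>

int main(void) {
    double global_sum = 0.0;
    #pragma omp parallel
    {
        double my_part = omp_get_thread_num() * 0.5;  /* stand-in for real work */
        /* only one thread at a time updates the shared sum */
        #pragma omp critical
        global_sum += my_part;
    }
    printf("sum = %f\n", global_sum);
    return 0;
}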

14
Q

What is variable scope in omp?

A

Scope of a variable refers to the set of threads that can access the variable in a parallel block.

Shared scope: Accessible by all threads
Private scope: Accessible by a single thread

15
Q

What is the default scope of variables declared outside a parallel block, and within?

A

Outside: Shared

Within: Private

16
Q

What is a reduction variable in openmp?

A

A reduction operator is a binary operation (e.g. add, mul)

A reduction is a computation that repeatedly applies the same reduction operator to a sequence of data to get a single result.

A reduction variable is where all the intermediate results of the operation are stored.

17
Q

How can reduction be used in omp?

A

#pragma omp parallel reduction(+: global_result)

Add a reduction clause to the parallel directive.

This specifies that global_result is the reduction variable.

OpenMP creates a private variable for each thread, and the run-time stores each thread's result in that variable. OpenMP then combines the private variables into the shared reduction variable (effectively inside a critical section).

The private values are initialized to the identity value of the operator:
+ : 0
- : 0
* : 1
&& : 1
…and so on
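A minimal sketch of a sum reduction (my own example; it uses parallel for rather than the card's plain parallel, purely for brevity):

#include <stdio.h>
#include <omp.h>

int main(void) {
    double a[1000], global_result = 0.0;
    for (int i = 0; i < 1000; i++) a[i] = 1.0;

    /* each thread sums into a private copy initialised to 0;
       the private copies are combined into global_result at the end */
    #pragma omp parallel for reduction(+: global_result)
    for (int i = 0; i < 1000; i++)
        global_result += a[i];

    printf("result = %f\n", global_result);   /* 1000.0 */
    return 0;
}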

18
Q

When is it beneficial to use reduction?

A

When an expensive operation (e.g. a function call) is placed inside a critical section, its execution is serialized across the threads. With a reduction, each thread accumulates into its own private copy in parallel, and only the final combination step needs mutual exclusion, so that serialization is avoided.

19
Q

How are for-loops parallelized using the parallel for-directive

A

Iterations are divided between the threads. The default partitioning of the iterations is done by the system, whereas with a plain parallel directive the work must be partitioned by the threads themselves.

A typical partitioning gives the first m/n_threads iterations to thread 0, the next m/n_threads to thread 1, and so on.

The compiler does not check for dependences between the iterations. These can cause errors during execution; the programmer needs to take care of them.
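A minimal parallel for sketch (my own example) with independent iterations and an automatically private loop variable:

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void) {
    int a[N];
    /* iterations are divided among the threads by the run-time system */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * i;

    for (int i = 0; i < N; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}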

20
Q

What are the default scope of loop variables in a parallel for directive?

A

private

21
Q

What types of for loops can be parallelized?

A

Only loops in canonical form.

Not while or do-while loops.

Only for-loops where the number of iterations can be determined from the loop statement itself (e.g. for (i = 0; i < n; i++)), prior to execution of the loop.

Loops that are infinite, or that have conditional breaks in them, cannot be parallelized.

22
Q

What is a canonical form of a loop?

A

Loops where the number of iterations can be determined prior to the execution of the loop.

23
Q

What is a loop-carried dependence?

A

A dependence between loop iterations where a value is calculated in one iteration, and the result is used in a subsequent iteration.

24
Q

What is the private clause?

A

#pragma omp parallel for private(var_name)

Gives each thread its own private copy of var_name. The private copy does not inherit the value of the original (shared) variable - it starts uninitialized.

25
Q

What is the default clause?

A

#pragma omp parallel for default(none) reduction(+: global_sum) private(x, factor) shared(n)

default(none) requires us to explicitly specify the scope of every variable used in the block that was declared outside it.
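A hedged sketch of default(none) in action (variable names are my own, not the card's):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 100;
    double sum = 0.0;
    /* default(none): every outside variable used in the loop must be scoped
       explicitly, otherwise the compiler reports an error */
    #pragma omp parallel for default(none) reduction(+: sum) shared(n)
    for (int i = 0; i < n; i++) {
        double x = i * 0.01;   /* declared inside the block, so automatically private */
        sum += x;
    }
    printf("sum = %f\n", sum);
    return 0;
}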

26
Q

What does the for directive do, and why is it useful?

A

#pragma omp for

Does not fork any new threads, but divides the iterations of the following for-loop among the threads of an already existing team.

If the threads were created earlier in the program, reusing them for the for-loop saves the overhead of forking a new team.

#pragma omp parallel
{
    /* ... code executed by the whole team ... */

    #pragma omp for
    for (...) { }
}
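A complete hedged example (my own) of one team being reused by two for directives; the implicit barrier after the first loop guarantees a[] is finished before the second loop reads it:

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void) {
    int a[N], b[N];
    #pragma omp parallel          /* team forked once */
    {
        #pragma omp for           /* reuses the existing team, no new fork */
        for (int i = 0; i < N; i++) a[i] = i;

        #pragma omp for           /* same team again */
        for (int i = 0; i < N; i++) b[i] = 2 * a[i];
    }
    printf("b[N-1] = %d\n", b[N - 1]);
    return 0;
}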

27
Q

What does the schedule clause do?

A

#pragma omp parallel for schedule(<type> [, <chunksize>])

Clause specifies how iterations are to be assigned to threads in a for or parallel for directive.

chunksize: a number of iterations executed consecutively, as in a serial loop. Only used with the static, dynamic and guided types.

Types:
static: Iterations assigned before loop is executed

dynamic/guided: Iterations assigned while loop is running. After a thread completes, it can request more iterations

auto: the compiler and/or run-time system decides the schedule; a reasonable way to let the system guess when the iterations have fairly equal workloads

runtime: Schedule determined at run-time by environment variable
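A hedged sketch (my own) of a dynamic schedule where later iterations cost more, so handing out small chunks on demand improves load balance:

#include <stdio.h>
#include <omp.h>

int main(void) {
    double sum = 0.0;
    /* chunks of 2 iterations are handed out to threads as they become free */
    #pragma omp parallel for schedule(dynamic, 2) reduction(+: sum)
    for (int i = 0; i < 20; i++) {
        double work = 0.0;
        for (int j = 0; j <= i; j++)   /* later iterations do more work */
            work += j;
        sum += work;
    }
    printf("sum = %f\n", sum);
    return 0;
}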

28
Q

What is the static schedule type?

A

Chunks of chunksize iterations are assigned to the threads in a round-robin fashion before the loop executes.

Example:
12 iterations, 3 threads

(static, 2):

thread 0: [0, 1], [6, 7]
thread 1: [2, 3], [8, 9]
thread 2: [4, 5], [10, 11]

29
Q

What is the default schedule type often?

A

schedule(static, iterations/num_threads)

30
Q

When should you use the different scheduling types?

A

Static: when each iteration takes roughly the same time to execute

Dynamic: when iterations take different amounts of time to compute

Guided: improves load balance when later iterations are more compute-heavy

31
Q

What is the dynamic schedule?

A

Iterations are broken into chunks of chunksize iterations.

Each thread executes a chunk and then requests a new one.

First-come, first-served assignment of chunks.

Dynamically assigning chunks adds a bit more overhead; a larger chunksize helps with this.

32
Q

What is the guided schedule?

A

Similar to dynamic.

But as chunks are completed, the chunk size decreases.

If no chunksize is specified, the chunk size eventually decreases to 1. If it is specified, it decreases to chunksize (the last chunk may be smaller).

33
Q

What is the runtime schedule?

A

Uses environment variable OMP_SCHEDULE.

This variable can be set to any of the static, dynamic or guided schedules (optionally with a chunksize).

You can modify this variable to try out the performance of different scheduling types.

34
Q

How does omp use barriers?

A

#pragma omp barrier

An explicit barrier: every thread in the team must reach it before any thread is allowed to continue.

35
Q

What is the atomic directive?

A

#pragma omp atomic

OpenMP assumes the computer's architecture provides some range of atomic operations.

It can only be used for critical sections consisting of a single C assignment statement of one of the following forms:

var <op>= <expression>;
var++;
++var;
var--;
--var;

The expression must not reference var.

A thread will complete the expression before any other threads starts executing it.

A critical section only performing a load-modify-store can benefit from this, as a lot of hardware is optimized for atomic load-stores.
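A minimal sketch (my own) of an atomic load-modify-store:

#include <stdio.h>
#include <omp.h>

int main(void) {
    long hits = 0;
    #pragma omp parallel
    {
        /* a single load-modify-store, so atomic is enough and is usually
           cheaper than a full critical section */
        #pragma omp atomic
        hits++;
    }
    printf("hits = %ld\n", hits);
    return 0;
}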

36
Q

What does the critical(name) directive do?

A

Two blocks protected by critical directives with different names can execute in parallel. (All unnamed critical directives share one and the same exclusion.)

37
Q

How are locks used in omp?

A

Pseudo code:

initialize lock (one thread)

// Multiple threads
attempt to lock - block until ready
critical section
unlock

Destroy lock (one thread)

Two types: simple and nested locks
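A hedged sketch (my own) of the pseudo code above using the simple-lock API:

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_lock_t lock;
    int counter = 0;

    omp_init_lock(&lock);          /* initialise once, by one thread */

    #pragma omp parallel
    {
        omp_set_lock(&lock);       /* blocks until the lock is available */
        counter++;                 /* critical section */
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);       /* destroy once, by one thread */
    printf("counter = %d\n", counter);
    return 0;
}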

38
Q

What are simple locks?

A

Can only be set once before it is unset

39
Q

What are nested locks?

A

Can be set multiple times by same thread before unset

40
Q

What is the omp task directive?

A

#pragma omp task

Variables referenced inside a task are firstprivate by default (unless they are already shared in the enclosing context).

Specifies a unit of computation.

When the program reaches this directive, the OpenMP run-time generates a new task, which is scheduled for execution at some later point - not necessarily immediately.

Tasks must be launched within a parallel block, but this is generally done by only one thread in the team.

41
Q

What is a common structure of task-programs?

A

#pragma omp parallel   // creates a team of threads
#pragma omp single     // only one thread of the team executes the following block
{
    #pragma omp task
    { /* ... */ }
}

If single is not used, every thread in the team would generate the tasks, so each task would be launched multiple times.
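A complete hedged example (my own) of this structure; the loop index is made firstprivate so each task captures its own value:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel       /* create the team */
    #pragma omp single         /* only one thread generates the tasks */
    {
        for (int i = 0; i < 4; i++) {
            #pragma omp task firstprivate(i)
            printf("task %d run by thread %d\n", i, omp_get_thread_num());
        }
        #pragma omp taskwait   /* wait for the four tasks to finish */
    }
    return 0;
}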

42
Q

What is the taskwait directive?

A

Operates as a barrier for tasks: it makes a task (or the generating thread) wait for all of its child tasks to complete.

#pragma omp task shared(j)
j = 1 + 2;

#pragma omp taskwait    // wait until the task above has finished

result = j + 1;

43
Q

How can we create conditional tasks?

A

#pragma omp task shared(i) if(n > 20)

i = func(n);

If the condition is false, the task is executed immediately by the encountering thread instead of being deferred. If i weren't declared shared, the task would work on its own private copy and the result would be lost.

44
Q

Why does omp not use signals (condition variables)

A

OpenMP threads aren't supposed to sleep, and critical sections should be kept short because waiting threads busy-wait.

45
Q

What are worksharing directives?

A

Directives that can split a given workload between threads for you.

Have an implicit barrier at the end

46
Q

What is functional decomposition?

A

When work is split by the function of its sub-tasks

e.g. pipelining

47
Q

What is data decomposition?

A

Split work by input/output of its sub-tasks

Every thread does the same thing, on a different part of the data

48
Q

What are sections?

A

#pragma omp sections encloses a set of #pragma omp section blocks.

Each section is executed by exactly one thread.
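A minimal sections sketch (my own example) using the combined parallel sections form:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("section A run by thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("section B run by thread %d\n", omp_get_thread_num());
    }
    return 0;
}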

49
Q

What does the clause nowait do?

A

Skips the implicit barrier at the end of a worksharing directive

50
Q

What does the clause firstprivate(var) do?

A

If var was shared and you are privatizing it, each private copy is initialized with the shared variable's value on entry to the region

51
Q

What does lastprivate(var) do?

A

When the threaded region finishes, the private copy from the sequentially last iteration (or last section) is stored back into the shared variable
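A hedged sketch (my own) combining firstprivate and lastprivate:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int offset = 10;   /* initial value copied into each thread's private copy */
    int last = -1;     /* receives the value from the last iteration */

    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < 8; i++) {
        last = i + offset;
    }
    /* last now holds the value from iteration i == 7, i.e. 17 */
    printf("last = %d\n", last);
    return 0;
}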

52
Q

What are the trade offs between small and big units of work when scheduling iterations in omp for?

A

Big:
- less scheduling work to do
- but there is a limit to how big blocks can usefully be: if the entire iteration space is one block, there is no parallelisation

Small:
- more disruption of the memory access pattern, and more units to distribute
- but greater flexibility to assign work to idle threads

53
Q

What is the blocksize attribute in schedule()

A

The minimum number of iterations handed to a thread at a time (the chunksize)

54
Q

What is the master/worker pattern?

A

Way to implement worksharing in queued systems

Keep active thread pool of available worker threads

Keep a queue of finite work packages

assign the next package in the queue every time a thread becomes available

55
Q

What is nested parallelism?

A

One work-package spawns more work-packages that should be distributed amongst the threads

56
Q

What is task-based programming?

A

An alternative approach to parallel programming that generates dependency graphs:

take a block of work and dispatch it for background execution

record which blocks depend on which others

assign blocks to the team of threads in an order that matches their dependencies

57
Q

What does the omp task directive do?

A

#pragma omp task

Creates arbitrary dependency graphs.

The block's context is queued internally, to be executed at the first opportunity.

Wait for task:
#pragma omp taskwait

58
Q

How can you task-ify functions?

A

#pragma omp task

void some_func(int arg1, int arg2)
{
// function body
}

Every call to this function will create a background task

59
Q

What does #pragma omp taskloop do?

A

Automates turning the iterations of a loop into tasks (typically in chunks), instead of creating each task by hand
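A hedged taskloop sketch (my own example; the grainsize clause is an optional hint for how many iterations go into each task):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int a[16];
    #pragma omp parallel
    #pragma omp single
    {
        /* the run-time turns chunks of iterations into tasks */
        #pragma omp taskloop grainsize(4)
        for (int i = 0; i < 16; i++)
            a[i] = i * i;
    }
    printf("a[15] = %d\n", a[15]);
    return 0;
}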