Factors affecting parallel performance Flashcards by Matthew Gilbert

What are the SLOW factors affecting parallel performance?

Starvation
Latency
Overhead
Waiting

How does Starvation affect parallel performance?

There is not enough parallel work to keep all the processors busy, often because the work is distributed unevenly between them.

How does Latency affect parallel performance?

The time taken for information to travel from one part of the system to another slows down the entire system.

How does Overhead affect parallel performance?

The extra work required in addition to the computation itself, such as starting and stopping OpenMP parallel regions.

How does Waiting affect parallel performance?

When multiple threads or processes access a shared resource, they contend for memory or network bandwidth, so some of them must wait until the resource is free.

What is parallel speedup?

How much faster the parallel program is than the serial version.

How is parallel speedup calculated?

S_N = T_0 / T_N

where T_0 is the run time of the serial program and T_N is the run time of the parallel program on N processors.

How is parallel efficiency calculated?

E_N = S_N / N

where S_N is the speedup on N processors.
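
As a minimal sketch of the speedup and efficiency formulas, the following Python helpers compute both from measured run times. The function names and the timings are hypothetical, chosen only for illustration.

```python
def speedup(t_serial, t_parallel):
    """Parallel speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical timings: 100 s serial, 16 s on 8 processors.
s8 = speedup(100.0, 16.0)
e8 = efficiency(100.0, 16.0, 8)
print(f"Speedup on 8 processors: {s8:.2f}, efficiency: {e8:.2f}")
```

An efficiency of 1.0 would mean the program makes perfect use of every processor; values below 1.0 reflect the starvation, latency, overhead and waiting effects described in the earlier cards.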

What is strong scaling?

When the total problem size is fixed and the number of processors is increased to reduce run time.

What is Amdahl’s law?

The idea that parallel speedup is limited by the fraction of the program that cannot be parallelised. However many processors are used, the serial fraction still takes the same time to run.

How is parallel speedup calculated using Amdahl’s Law?

S_N = 1 / (s + p/N)

where S_N is the parallel speedup on N processors, s is the fraction of the program that cannot be parallelised, and p = 1 - s is the fraction that can be parallelised.
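
To see how the serial fraction caps the speedup, here is a small Python sketch of the formula. The function name and the 5% serial fraction are assumptions made for illustration, not values from the cards.

```python
def amdahl_speedup(s, n_procs):
    """Amdahl's law: speedup = 1 / (s + p / N), with p = 1 - s."""
    p = 1.0 - s
    return 1.0 / (s + p / n_procs)


# Hypothetical program with a 5% serial fraction.
for n in (1, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.05, n), 2))

# As N grows, p / N tends to zero, so the speedup approaches 1 / s
# (here 20) but never exceeds it.
```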

What did Gustafson observe?

He observed that in practice the problem size scales with the number of processors.

What is weak scaling?

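When the problem size is increased in proportion to the number of processors, so the work per processor stays fixed. Ideally the run time remains constant and the speedup increases linearly with the number of processors, which makes it easier to make use of a large number of processors.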

What is the equation for weak scaling?

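S_N = s + pN

where S_N is the speedup on N processors, and s and p are the serial and parallel fractions of the run time (s + p = 1).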
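
A short Python sketch of the weak-scaling formula, assuming that s and p are the serial and parallel fractions of the run time with s + p = 1 (the card itself does not define them). The function name and the 5% serial fraction are hypothetical.

```python
def gustafson_speedup(s, n_procs):
    """Scaled speedup: s + p * N, with p = 1 - s (assumed definition)."""
    p = 1.0 - s
    return s + p * n_procs


# Hypothetical program with a 5% serial fraction.
for n in (1, 8, 64, 1024):
    print(n, round(gustafson_speedup(0.05, n), 2))
```

Unlike the fixed-size case, the result here keeps growing linearly with the number of processors, because the parallel part of the work grows with them.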

How are parallel regions kept synchronised?

There is an implied barrier at the end of parallel regions and worksharing constructs, which acts as a synchronisation point.

How is the implicit synchronisation point removed?

Using the nowait clause.

How does nowait improve performance?

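It can improve performance by avoiding unnecessary waiting at the end of a worksharing construct, but the program is only correct if the code that follows does not depend on results that other threads have not yet finished computing.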

How is synchronisation added?

Using the barrier directive.

What are the three loop scheduling options?

Static
Dynamic
Guided

How does static loop scheduling work?

Iterations are divided into pieces of size chunk and distributed round-robin between threads.

How does dynamic loop scheduling work?

Iterations are divided into pieces of size chunk. When a thread finishes its chunk, it is given another to work on.

How does guided loop scheduling work?

Dynamic scheduling with a decreasing chunk size. For chunk=1, the chunk size is proportional to the number of unassigned iterations divided by the number of threads in the team. Setting chunk=k sets the minimum chunk size to k.
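
The static and guided descriptions can be sketched in Python as a simplified model. This is not how any particular OpenMP runtime is implemented: the function names are hypothetical, and the guided rule used here (ceiling of remaining iterations divided by thread count) is an assumption, since the exact chunk sizes are left to the implementation.

```python
import math


def static_schedule(n_iters, n_threads, chunk):
    """Hand out fixed-size chunks round-robin, one thread after another."""
    assignment = {t: [] for t in range(n_threads)}
    for c, start in enumerate(range(0, n_iters, chunk)):
        end = min(start + chunk, n_iters)
        # Chunk number c goes to thread c mod n_threads.
        assignment[c % n_threads].append((start, end))
    return assignment


def guided_chunk_sizes(n_iters, n_threads, min_chunk=1):
    """Chunk sizes that shrink as the unassigned work runs out (assumed rule)."""
    sizes = []
    remaining = n_iters
    while remaining > 0:
        size = max(min_chunk, math.ceil(remaining / n_threads))
        size = min(size, remaining)  # the last chunk may be smaller
        sizes.append(size)
        remaining -= size
    return sizes


# Each (start, end) pair is a half-open range of iteration indices.
print(static_schedule(10, 2, 2))
print(guided_chunk_sizes(20, 4))
```

Large early chunks keep the scheduling overhead low, while small late chunks help even out the load when iterations take different amounts of time.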

What is interconnect?

The network that links the compute nodes and carries the messages passed between processes. The time spent communicating needs to be minimised to achieve high parallel efficiency and to scale to large numbers of processors.

What are two types of interconnect?

Gigabit Ethernet
InfiniBand

How do network topologies affect latency?

The number of hops between nodes affects latency. Ideally, every node is connected directly to every other node, but this only works for small networks. Bus and ring topologies are simple but do not provide enough connectivity, so more sophisticated topologies are used.

What network topology is common in HPC systems?

Fat Tree

How does MPI process placement affect overhead?

MPI processes within the same node can communicate faster than processes on different nodes. The speed of communication between processes on different nodes depends on network connectivity. The placement of MPI processes can be optimised to reduce communication time.

What are the main transmission time factors?

1. Number of hops between nodes
2. Blocking factor of the network
3. Other network traffic

How is transmission time calculated?

t = L + M/B

where L is the latency, M is the message length, and B is the bandwidth.
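
A quick worked example of the transmission time formula in Python. The function name, the 1 microsecond latency and the 10 GB/s bandwidth are hypothetical values picked for illustration, not figures for any real interconnect.

```python
def transmission_time(latency_s, message_bytes, bandwidth_bytes_per_s):
    """Transmission time: latency plus message length over bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical link: 1 microsecond latency, 10 GB/s bandwidth.
LATENCY = 1e-6
BANDWIDTH = 10e9

for size in (8, 1_000_000):
    t = transmission_time(LATENCY, size, BANDWIDTH)
    print(f"{size} bytes: {t * 1e6:.3f} microseconds")
```

For the small message the latency term dominates, while for the large message the bandwidth term dominates, which is why sending a few large messages is usually cheaper than sending many small ones.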

What is super-linear speed up?

A speedup greater than N on N processors. It can occur when, under domain decomposition, an important data structure becomes small enough to fit into cache, resulting in a sudden performance increase.

How does the ratio of halo points to domain points scale?

1/N

What are parallel overheads?

They are the extra work required to run a program in parallel. In MPI the overheads are explicit, whereas in OpenMP they are less obvious.

How are parallel overheads incorporated into Amdahl's Law?

An overhead (np)v is added to the parallel run time

(33 cards)