7 - Communication & Synchronisation Flashcards
How do work items/threads communicate?
Through memory
What is the ideal case for memory?
One type that is large, cheap and fast
What are the attributes of large, cheap and fast memory?
Large = slow/expensive
Cheap = small/slow
Fast = small/expensive
What are 4 types of GPU memory types?
Private memory, local memory, global memory, constant memory
What are the attributes of private memory?
Very fast, only accessible by a single work item, implemented as registers, tens to hundreds of bytes per work item
What are the attributes of local memory?
Fast, accessible by all work items within a single work group, user-accessible cache (scratchpad), KBs to MBs
What are the attributes of global memory?
Slow, accessible by threads from all work groups, DRAM, GB
What are the attributes of constant memory?
Fast, accessible by all threads, part of global memory but cached, not writable by the device, relatively small, KBs
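Example: a minimal OpenCL C kernel sketch showing where the four memory types appear in kernel code (the kernel name, parameter names and sizes are illustrative only, not from the source).

```c
// constant memory: read-only, cached, small
__constant float coeffs[4] = {0.1f, 0.2f, 0.3f, 0.4f};

__kernel void memory_spaces(__global const float *input,  // global memory: DRAM, GBs
                            __global float *output,
                            __local float *scratch)        // local memory: per work group, KBs
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    float x = input[gid];          // 'x' lives in private memory (registers)

    scratch[lid] = x;              // stage the value in local memory
    barrier(CLK_LOCAL_MEM_FENCE);  // make the local write visible to the work group

    output[gid] = scratch[lid] * coeffs[gid % 4];
}
```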
What should you minimise time spent on?
Memory operations
How do you minimise time spent on memory operations?
Move frequently accessed data to a faster memory
What is the order of memory speeds, from slowest to fastest?
host » global » local » private
What doesn’t benefit from moving frequently accessed data to a faster memory?
Single or sporadic accesses
When does data become global memory?
When it is transferred from host to device
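Example: a host-side sketch of that transfer (assumes `context`, `queue`, `host_data` and `n` already exist; error handling omitted). Writing to a buffer object places the data in the device's global memory.

```c
cl_int err;
cl_mem buf = clCreateBuffer(context, CL_MEM_READ_ONLY,
                            n * sizeof(float), NULL, &err);

/* Blocking write: copies host_data into global memory on the device. */
err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0,
                           n * sizeof(float), host_data, 0, NULL, NULL);
```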
How do you use local memory?
Make a local copy of frequently accessed input data so that repeated accesses hit fast local memory instead of global memory
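Example: a sketch of the local-copy pattern (kernel and parameter names are illustrative only). Each work group reads its portion of the input from slow global memory into fast local memory once, then every work item reuses it many times.

```c
__kernel void local_copy(__global const float *input,
                         __global float *output,
                         __local float *tile)   // sized to the work-group size by the host
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    int lsize = get_local_size(0);

    tile[lid] = input[gid];            // one global read per work item
    barrier(CLK_LOCAL_MEM_FENCE);      // wait until the whole tile is loaded

    float sum = 0.0f;
    for (int i = 0; i < lsize; i++)    // repeated accesses now hit local memory
        sum += tile[i];

    output[gid] = sum;
}
```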
Why do you need synchronisation?
Accesses to shared locations need to be correctly synchronised/coordinated to avoid race conditions
What are 3 types of synchronisation mechanisms?
Barriers/memory fences
Atomic operations
Separate kernel launches
What do barriers do?
Ensure that all work items within the same work group reach the same point in the code before any of them continues
Which has lower overhead, global or local memory barriers?
Local
Where should you avoid putting barriers?
In conditional statements; a barrier must be reached by all work items in the group, otherwise the kernel can deadlock
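Example: a barrier-usage sketch (illustrative names only). The commented-out version can deadlock because only some work items reach the barrier.

```c
__kernel void barrier_usage(__global const float *input,
                            __global float *output,
                            __local float *tile)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    // WRONG: barrier inside a divergent condition -> possible deadlock
    // if (input[gid] > 0.0f) {
    //     tile[lid] = input[gid];
    //     barrier(CLK_LOCAL_MEM_FENCE);
    // }

    // RIGHT: every work item reaches the barrier unconditionally
    tile[lid] = (input[gid] > 0.0f) ? input[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    output[gid] = tile[lid];
}
```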
What is impossible in modern GPU/CPU hardware?
Synchronising different work groups within a single kernel launch
How do you synchronise different workgroups?
By writing and launching separate kernels
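Example: a host-side sketch (assumes `queue`, `kernel_pass1` and `kernel_pass2` already exist, and that `queue` is an in-order command queue). The boundary between the two launches acts as a global synchronisation point: pass 2 only starts once all work groups of pass 1 have finished.

```c
size_t global_size = 1024 * 1024;

clEnqueueNDRangeKernel(queue, kernel_pass1, 1, NULL, &global_size,
                       NULL, 0, NULL, NULL);
clEnqueueNDRangeKernel(queue, kernel_pass2, 1, NULL, &global_size,
                       NULL, 0, NULL, NULL);
clFinish(queue);   /* wait for both passes to complete */
```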
What do Atomic functions do?
Provide a mechanism for atomic (without interruption) memory operations
What do Atomic functions guarantee?
Race-free execution of concurrent updates to the same memory location
How are Atomic updates performed?
Serially, so there is a performance penalty when many work items update the same location
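Example: an atomic-update sketch (kernel name and the 256-bin histogram are illustrative only). Many work items may increment the same bin, so the update must be atomic to avoid a race; updates to the same bin are serialised.

```c
__kernel void histogram(__global const uchar *input,
                        __global int *bins)       // 256 bins, zeroed by the host
{
    int gid = get_global_id(0);
    atomic_inc(&bins[input[gid]]);   // race-free, but serialised per bin
}
```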