7 - Communication & Synchronisation Flashcards
How do work items/threads communicate?
Through memory
What is the ideal case for memory?
One type that is large, cheap and fast
What are the attributes of large, cheap and fast memory?
Large = slow/expensive
Cheap = small/slow
Fast = small/expensive
What are 4 types of GPU memory types?
Private memory, local memory, global memory, constant memory
What are the attributes of private memory?
Very fast, only accessible by a single work item, implemented as registers, tens to hundreds of bytes per work item
What are the attributes of local memory?
Fast, accessible by all work items within a single work group, user-accessible cache (scratchpad), KBs to MBs
What are the attributes of global memory?
Slow, accessible by threads from all work groups, DRAM, GB
What are the attributes of constant memory?
Fast, accessible by all threads, part of global memory but cached, not writable by the device, relatively small, KBs
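Example: a minimal OpenCL C kernel sketch showing where the four memory types appear in kernel code (the kernel name, parameter names and sizes are illustrative only, not from the source).

```c
// constant memory: read-only, cached, small
__constant float coeffs[4] = {0.1f, 0.2f, 0.3f, 0.4f};

__kernel void memory_spaces(__global const float *input,  // global memory: DRAM, GBs
                            __global float *output,
                            __local float *scratch)        // local memory: per work group, KBs
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    float x = input[gid];          // 'x' lives in private memory (registers)

    scratch[lid] = x;              // stage the value in local memory
    barrier(CLK_LOCAL_MEM_FENCE);  // make the local write visible to the work group

    output[gid] = scratch[lid] * coeffs[gid % 4];
}
```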
What should you minimise time spent on?
Memory operations
How do you minimise time spent on memory operations?
Move frequently accessed data to a faster memory
What is the order of memory speeds, from slowest to fastest?
host » global » local » private
What doesn’t benefit from moving frequently accessed data to a faster memory?
Single or sporadic accesses
When does data become global memory?
When it is transferred from host to device
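Example: a host-side sketch of that transfer (assumes `context`, `queue`, `host_data` and `n` already exist; error handling omitted). Writing to a buffer object places the data in the device's global memory.

```c
cl_int err;
cl_mem buf = clCreateBuffer(context, CL_MEM_READ_ONLY,
                            n * sizeof(float), NULL, &err);

/* Blocking write: copies host_data into global memory on the device. */
err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0,
                           n * sizeof(float), host_data, 0, NULL, NULL);
```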
How do you use local memory?
Make a local copy of frequently accessed input data so that repeated accesses hit fast local memory instead of global memory
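Example: a sketch of the local-copy pattern (kernel and parameter names are illustrative only). Each work group reads its portion of the input from slow global memory into fast local memory once, then every work item reuses it many times.

```c
__kernel void local_copy(__global const float *input,
                         __global float *output,
                         __local float *tile)   // sized to the work-group size by the host
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    int lsize = get_local_size(0);

    tile[lid] = input[gid];            // one global read per work item
    barrier(CLK_LOCAL_MEM_FENCE);      // wait until the whole tile is loaded

    float sum = 0.0f;
    for (int i = 0; i < lsize; i++)    // repeated accesses now hit local memory
        sum += tile[i];

    output[gid] = sum;
}
```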
Why do you need synchronisation?
Accesses to shared locations need to be correctly synchronised/coordinated to avoid race conditions
What are 3 types of synchronisation mechanisms?
Barriers/memory fences
Atomic operations
Separate kernel launches
What do barriers do?
Ensure that all work items within the same work group reach the same point in the code before any of them continues
Which has lower overhead, global or local memory barriers?
Local
Where should you avoid putting barriers?
In conditional statements; a barrier must be reached by all work items in the group, otherwise the kernel can deadlock
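Example: a barrier-usage sketch (illustrative names only). The commented-out version can deadlock because only some work items reach the barrier.

```c
__kernel void barrier_usage(__global const float *input,
                            __global float *output,
                            __local float *tile)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    // WRONG: barrier inside a divergent condition -> possible deadlock
    // if (input[gid] > 0.0f) {
    //     tile[lid] = input[gid];
    //     barrier(CLK_LOCAL_MEM_FENCE);
    // }

    // RIGHT: every work item reaches the barrier unconditionally
    tile[lid] = (input[gid] > 0.0f) ? input[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    output[gid] = tile[lid];
}
```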
What is impossible in modern GPU/CPU hardware?
Synchronising different work groups within a single kernel launch
How do you synchronise different workgroups?
By writing and launching separate kernels
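Example: a host-side sketch (assumes `queue`, `kernel_pass1` and `kernel_pass2` already exist, and that `queue` is an in-order command queue). The boundary between the two launches acts as a global synchronisation point: pass 2 only starts once all work groups of pass 1 have finished.

```c
size_t global_size = 1024 * 1024;

clEnqueueNDRangeKernel(queue, kernel_pass1, 1, NULL, &global_size,
                       NULL, 0, NULL, NULL);
clEnqueueNDRangeKernel(queue, kernel_pass2, 1, NULL, &global_size,
                       NULL, 0, NULL, NULL);
clFinish(queue);   /* wait for both passes to complete */
```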
What do Atomic functions do?
Provide a mechanism for atomic (without interruption) memory operations
What do Atomic functions guarantee?
Race-free execution of concurrent updates to the same memory location
How are Atomic updates performed?
Serially, so there is a performance penalty when many work items update the same location
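Example: an atomic-update sketch (kernel name and the 256-bin histogram are illustrative only). Many work items may increment the same bin, so the update must be atomic to avoid a race; updates to the same bin are serialised.

```c
__kernel void histogram(__global const uchar *input,
                        __global int *bins)       // 256 bins, zeroed by the host
{
    int gid = get_global_id(0);
    atomic_inc(&bins[input[gid]]);   // race-free, but serialised per bin
}
```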