General Purpose GPUs Flashcards
Week 2.10 (24 cards)
what 3 things are CPUs limited by
- small number of threads
- high overhead per thread
- power and heat constraints
describe GPU
- originally designed for fast 3D graphics rendering
- now used for AI, simulation, medical imaging & finance
- massive parallelism - thousands of lightweight threads
- ideal for compute-heavy, data-parallel tasks
- enabled by programmer-friendly APIs
what is CUDA
- Compute Unified Device Architecture is NVIDIA’s platform for GPGPU
- provides a parallel programming model and APIs to harness the GPU’s massive core count
what is the program structure in CUDA
- programs run partly on the CPU (host) and partly on the GPU (device)
- C/C++ includes:
- host code
- device code (kernel)
- data transfer code
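A minimal sketch of this three-part split, using a hypothetical addOne kernel (error checking omitted):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// device code (kernel): runs on the GPU, one thread per element
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    // data transfer code: allocate device memory & copy input host -> device
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // host code: launch the kernel on the device (4 blocks of 256 threads)
    addOne<<<(n + 255) / 256, 256>>>(dev, n);

    // data transfer code: copy the result device -> host
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[42] = %.1f\n", host[42]); // expect 43.0
    return 0;
}
```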
describe kernel
- function written by the programmer to run on the GPU
- when launched, the kernel runs in parallel across many threads
- each thread executes the same kernel code but on different data
- threads must mostly be independent - divergent branching within a warp forces slower serial execution (sketched below)
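A hypothetical kernel sketching that divergence: even and odd threads in the same warp take different branches, so the warp executes both paths one after the other:

```cuda
__global__ void divergent(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)      // half the threads in every warp go this way...
        x[i] *= 2.0f;
    else                 // ...the other half wait, then run this path
        x[i] += 1.0f;
}
```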
how are threads launched
blocks and grids
- threads -> blocks
- blocks -> grids
- threads in blocks can share fast memory & sync
- threads execute in groups of 32 = warps
- threads, blocks & grids = software abstraction
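A sketch of the launch hierarchy, using a hypothetical whoAmI kernel (the warp breakdown assumes the fixed warp size of 32):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void whoAmI(void) {
    int i = blockIdx.x * blockDim.x + threadIdx.x; // unique global thread index
    int warp = threadIdx.x / 32;                   // threads 0-31 = warp 0, 32-63 = warp 1, ...
    if (threadIdx.x % 32 == 0)                     // one report per warp
        printf("block %d, warp %d starts at global id %d\n", blockIdx.x, warp, i);
}

int main() {
    dim3 block(256); // 256 threads per block = 8 warps of 32
    dim3 grid(4);    // 4 blocks -> 1024 threads in total
    whoAmI<<<grid, block>>>();
    cudaDeviceSynchronize(); // wait for the device so printf output is flushed
    return 0;
}
```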
how do we choose grid & block dimensions
- each block runs on a streaming multiprocessor (SM)
- each SM has a max number of threads per block
- too few blocks = idle SMs = wasted performance
- choose based on:
- problem size
- hardware constraints
- maximising GPU usage
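One common recipe, sketched: pick a block size that is a multiple of the 32-thread warp size, then round the grid up so every element gets a thread (this reuses the hypothetical addOne kernel from the earlier sketch):

```cuda
// host-side launch helper, assuming addOne and a device pointer as above
void launchAddOne(float *dev, int n) {
    int threadsPerBlock = 256;  // multiple of the warp size (32), within the per-block limit
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock; // ceiling division covers all n elements
    addOne<<<blocks, threadsPerBlock>>>(dev, n);
}
```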
what is a CPU optimised for
sequential code
- most of the chip area = cache & control logic
- handles complex logic
what is a GPU optimised for
data-parallel tasks
- most of the chip area = processing logic (SIMD)
- minimal control logic & cache
- hides memory latency by oversubscribing threads to cores
- designed to maximise FLOPs - graphics & scientific applications
how does memory & thread scheduling work with NVIDIA Fermi
- global DRAM interfaces surround the chip & connect SMs to external VRAM
- VRAM is GPU’s dedicated memory
- host interface: connects GPU to CPU through PCIe
- GigaThread scheduler assigns thread blocks to SMs
why should we maximise warp throughput in GPUs
- GPUs are most effective when many warps are active, maximising CUDA core utilisation
- Fermi architecture: dispatches 2 warps every 2 clock cycles
- structural hazards - can block warps
- memory latency - can be hidden by having other ready warps available
what is the warp execution flow
- each warp executes one instruction at a time
- threads use different execution units:
- ALU ops
- loads/stores
- Special ops
what are the 4 levels in the CUDA memory hierarchy
- registers
- L1 cache
- L2 cache
- global memory (VRAM)
what are registers used for in CUDA
- private to each thread, extremely fast
- used for temporary variables during execution
what is L1 cache used for in the CUDA memory hierarchy
- shared among threads on the same SM
- caches global memory to hide latency
what is L2 cache used for in the CUDA memory hierarchy
- shared across all SMs
- backup if data misses in L1
what is VRAM used for in CUDA
- largest but slowest
- shared across the whole GPU
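A hypothetical kernel sketching where data lives in this hierarchy:

```cuda
__global__ void scale(float *g, int n) { // g points into global memory (VRAM)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float r = g[i];  // the load passes through L2 (all SMs) and L1 (this SM) caches
    r *= 2.0f;       // r lives in a per-thread register: the fastest storage
    g[i] = r;        // the store heads back out towards global memory
}
```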
how does CUDA use shared memory for thread communication
- registers are private - inaccessible by others
- shared memory is per block - threads in the same block can share data via shared memory
- avoid write hazards: assign specific threads to write to specific shared memory locations
- synchronise threads: use barriers to ensure all writes finish before reads begin
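A sketch of this pattern: a hypothetical block-level reversal, assuming n is a multiple of blockDim.x and blocks of at most 256 threads:

```cuda
__global__ void blockReverse(float *d, int n) {
    __shared__ float tile[256];   // per-block shared memory, visible to the whole block
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    if (i < n) tile[t] = d[i];    // each thread writes only its own slot: no write hazard
    __syncthreads();              // barrier: all writes finish before any reads begin
    if (i < n) d[i] = tile[blockDim.x - 1 - t]; // now safe to read another thread's slot
}
```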
describe Intel’s Gen8 GPU execution unit
- basic compute unit
- SMT with 7 hardware threads
- contains two SIMD floating point units
- includes branch and send units for control flow & memory access
describe instruction flow in Intel’s Gen8 GPU
- cycle begins: 1 instruction fetched per active thread
- each thread has its own superscalar pipeline
describe subslices in Intel’s Gen8 GPU
- includes up to 8 EUs and shared resources
- thread dispatcher assigns threads to EUs for execution
- instruction cache is shared - stores program code for active threads
describe slice-level organisation in Intel’s Gen8 GPU
- slices group multiple subslices - e.g. 3 subslices × 8 EUs = 24 EUs
- enables sharing of temporary variables across EUs for efficient processing
when does a program benefit from GPU acceleration
- GPUs are SIMD processors with hundreds/thousands of lightweight cores
- best suited for tasks with large-scale parallelism
- ideal for processing large data sets concurrently using many lightweight threads