Lecture 11 - GPUs Flashcards
(11 cards)
Programming GPUs
OpenCL is a framework with a C-like kernel language for programming heterogeneous hardware, including GPUs. It aims for a “write once, run anywhere” approach and exposes a C API with C++ bindings.
OpenCL Specifications
The OpenCL specification is divided into four main parts: Platform model, Execution model, Memory model, and Programming model.
Platform Model
- Specifies the host/device relationship: the host is the processor that coordinates execution, and a device is a processor that executes OpenCL code. Device-side functions are called kernels, and the devices that run them don’t have to be GPUs.
- A device is divided into one or more compute units, which are further divided into processing elements, each with its own program counter.
Execution Model
- Describes how OpenCL is configured on the host and how kernels are executed: it covers setting up the OpenCL context, host-device interaction, and the concurrency model for kernel execution.
- The execution model organizes a kernel’s execution on a device into work-items, which are grouped into work-groups.
Memory Model
- Defines an abstract memory hierarchy, independent of the underlying memory architecture but closely resembling modern GPU architecture. This model can be adapted for other architectures like FPGAs.
- The programmer allocates memory to spaces within this hierarchy, and the runtime system maps these spaces onto the physical memory hierarchy. The spaces are private memory (per work-item), local memory (per work-group), and global memory (visible to all work-items).
Programming Model
Describes how the concurrency model is mapped to physical hardware, with contexts executing kernels mapped to actual device hardware units.
Typical Setup
A typical setup involves an x86 host CPU and an OpenCL GPU device. The host CPU sets up the kernel for the device and instantiates it with a specified degree of parallelism. The GPU device executes the kernel, with input/output handled via the host.
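As a rough sketch, the host-side sequence uses the standard OpenCL C API roughly as follows (error handling omitted; `src` is assumed to hold the kernel source string, `"vecadd"` and `N` are illustrative):

```c
#include <CL/cl.h>

// Sketch only: discover a platform and GPU device, build a context/queue.
cl_platform_id platform;
clGetPlatformIDs(1, &platform, NULL);
cl_device_id device;
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

// Compile the kernel source on the host and instantiate the kernel.
cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(prog, "vecadd", NULL);

// Launch with a chosen degree of parallelism: one work-item per element.
size_t global_size = N;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                       0, NULL, NULL);
```

The host stays in control throughout: it selects the device, compiles the kernel, and decides how many work-items to launch.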
Concurrency Model
OpenCL uses a hierarchical concurrency model consisting of work-groups and work-items, promoting scalability. Each kernel is launched over an n-dimensional range (NDRange): a 1-, 2-, or 3-dimensional index space of work-items, generally matching the dimensionality of the input or output data.
Vector Add Example
Vector addition adds corresponding elements from two arrays, with each work-item (thread) performing one addition. OpenCL lets you define a kernel for such an operation, with work-items and work-groups managing the parallel execution.
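The corresponding OpenCL C kernel looks roughly like this (parameter names are illustrative); each work-item reads its global ID and adds one pair of elements:

```c
__kernel void vecadd(__global const float *a,
                     __global const float *b,
                     __global float *c)
{
    int i = get_global_id(0);  // unique index of this work-item
    c[i] = a[i] + b[i];        // one addition per work-item
}
```

The host would launch this over a 1-D NDRange whose size equals the array length.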
Thread Structures
Massively parallel programs are often written such that each thread computes one part of a problem. The thread structure is usually arranged in the same shape as the data. OpenCL’s thread structure is designed to be scalable, using work-items organized into work-groups. A work-item can identify itself uniquely either by its global ID, or by its work-group ID combined with its local ID.
Memory Model Details
The OpenCL memory model defines various types of memories, closely related to GPU memory hierarchy. These include global memory, constant memory, local memory, and private memory, each with different scopes and accessibility. Memory management is explicit, requiring data movement between host and device memory.
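The explicit data movement looks roughly like this on the host side (real OpenCL calls, but `ctx`, `queue`, `h_a`, `h_c`, `d_c`, and `N` are assumed to exist from earlier setup; error checks omitted):

```c
// Allocate a device buffer and copy input from host to device memory.
cl_mem d_a = clCreateBuffer(ctx, CL_MEM_READ_ONLY,
                            N * sizeof(float), NULL, NULL);
clEnqueueWriteBuffer(queue, d_a, CL_TRUE, 0,
                     N * sizeof(float), h_a, 0, NULL, NULL);

// ... set kernel arguments and launch the kernel ...

// Copy the result from device memory back to the host.
clEnqueueReadBuffer(queue, d_c, CL_TRUE, 0,
                    N * sizeof(float), h_c, 0, NULL, NULL);
```

Nothing moves between host and device unless the program requests it, which is what "explicit memory management" means here.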