Lecture 2: The world of parallelism Flashcards

1
Q

What is the trend in GPU and CPU usage?

A

The number of cores is increasing

2
Q

What is Flynn’s Taxonomy?

A

Classification system for computer architectures based on number of instruction and data streams they can process simultaneously.

3
Q

What are the categories of Flynn’s Taxonomy?

A

SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data (Chip MPs)

4
Q

Define the four categories of Flynn’s Taxonomy

A

SISD: One instruction executed at a time,
processing one data element at a time. e.g. Traditional single processors

SIMD: One instruction executed at a time,
operating on multiple independent streams of data. e.g. Vector processors (1970s), vector units (MMX, SSE, AVX), GPUs

MIMD: Multiple sequences of instructions executed independently,
each one operating on a different stream of data. e.g. Chip Multiprocessors

SPMD: Multiple instruction streams but with the same code on
multiple independent streams of data. e.g. Data Parallel machines built from independent processors
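A toy illustration of the SISD/SIMD distinction (pure Python, function names my own): SISD issues one "instruction" per data element, while SIMD applies one instruction to a whole vector of lanes at once, as MMX/SSE/AVX units or GPUs do.

```python
def sisd_add(a, b):
    # SISD-style: one element per step, so len(a) "instructions" issued.
    return [a[i] + b[i] for i in range(len(a))]

def simd_add(a, b, width=4):
    # SIMD-style: one "instruction" per chunk of `width` lanes.
    out = []
    for i in range(0, len(a), width):
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
    return out

print(sisd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```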

5
Q

Describe interconnections and communication in parallelism

A
  1. Between cores
  2. Between cores and memory

The way the connections are organised affects the types of computation possible

6
Q

Interconnections in Chip MPs: advantages and disadvantages

A

Advantages: Faster than traditional (off-chip) interconnects, at lower cost
Disadvantages: Limited silicon and power for network

7
Q

What is a grid?

A
  1. Direct link to neighbours
  2. Private on-chip memory
  3. Staged communication with non-neighbours
  4. NxN grid worst case: 2*(N-1) steps
8
Q

What is a torus?

A
  1. Direct link to neighbours
  2. Private on-chip memory
  3. More symmetrical, more paths, shorter paths
  4. NxN grid worst case: N steps
  5. More wires, complex routing
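The worst-case figures on these two cards can be checked by brute force. A sketch (my own illustration, not from the lecture): count hops between every pair of cores on an NxN grid and an NxN torus; the torus result equals N for even N.

```python
from itertools import product

def grid_hops(a, b):
    # Manhattan distance: no wrap-around links.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def torus_hops(a, b, n):
    # Each dimension wraps around, so take the shorter direction.
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dx, n - dx) + min(dy, n - dy)

def worst_case(n, dist):
    # Maximum hop count over all pairs of cores in an n x n layout.
    nodes = list(product(range(n), repeat=2))
    return max(dist(a, b) for a in nodes for b in nodes)

n = 4
print(worst_case(n, grid_hops))                        # 2*(N-1) = 6
print(worst_case(n, lambda a, b: torus_hops(a, b, n))) # N = 4
```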
9
Q

What is the smallest kind of on-chip interconnect?

A

A grid

10
Q

In which interconnect does every core connect to four neighbours?

A

Torus

11
Q

Which interconnects are suitable for smaller and which for larger systems?

A

Grid -> Smaller
Torus -> Larger
Bus -> Smaller

12
Q

What's a key property of the torus?

A

They can be generalized further to multiple dimensions:
2D grid → folded → 2D torus (4 neighbours)
3D grid → folded → 3D torus (6 neighbours)
4D grid → folded → 4D torus (8 neighbours)
CMPs rarely go above 2D

13
Q

Which interconnect is relied on by many multiprocessors?

A

A bus, partially or fully

14
Q

What is a bus?

A
  1. All cores to all cores
  2. Simple to build
  3. Constant latency
  4. Memory can be organized in any way: private to each core or shared between cores
  5. Time-shared bus (disadvantage)
    → complexity, lower bandwidth (a fraction of a grid's)
  6. Very long wires (to connect all these cores)
    → area, routing, power, slow
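A back-of-the-envelope sketch of why a time-shared bus has a fraction of a grid's bandwidth (the bandwidth numbers here are made up for illustration): every bus transfer serialises, so per-core bandwidth shrinks as cores are added, while a grid's aggregate link bandwidth grows with core count.

```python
def bus_per_core_bw(bus_bw_gbs, cores):
    # All cores contend for the single shared medium.
    return bus_bw_gbs / cores

def grid_aggregate_bw(link_bw_gbs, n):
    # An N x N grid has 2*N*(N-1) links (N*(N-1) horizontal + vertical).
    return link_bw_gbs * 2 * n * (n - 1)

print(bus_per_core_bw(100.0, 16))  # 6.25 GB/s per core
print(grid_aggregate_bw(10.0, 4))  # 240.0 GB/s aggregate
```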
15
Q

For a large number of cores what would be the main bottleneck?

A

The bus

16
Q

Which topologies are more suitable for larger systems (scalable)?

A

1. Trees
2. Hierarchical (Crossbars, Hypercubes, Rings, MINs, etc.)

Important for high core count

17
Q

What type of switching is used in scalable topologies? Describe it.

A

Packet switching: Dividing data into small packets for efficient transmission across a network. Can follow different paths to the destination and arrive out of order.
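A toy sketch of packet switching (helper names are my own): split a message into numbered packets, deliver them out of order, and reassemble by sequence number at the destination.

```python
def packetize(message, size):
    # Tag each fixed-size chunk with a sequence number.
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    # Packets may take different paths and arrive out of order,
    # so sort by sequence number before joining.
    return "".join(data for _, data in sorted(packets))

pkts = packetize("hello, parallel world", 5)
pkts.reverse()           # simulate out-of-order arrival
print(reassemble(pkts))  # hello, parallel world
```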

18
Q

What is a reason for discontinuity in core count growth?

A

Interconnection. Larger counts require more complex network topologies.

19
Q

What is shared memory? Describe its hardware and software view.

A

Accessible from every part of the computation.

Hardware: Memory connected to all cores

Software: Global. Accessible from all threads (Reads/Writes)
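A minimal sketch of the software view of shared memory (the data-sharing model): several threads read and write one globally accessible variable, guarded by a lock.

```python
import threading

counter = 0  # shared: globally accessible from every thread
lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:  # serialise the read-modify-write on shared data
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```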

20
Q

What is distributed memory? Describe its hardware and software view.

A

Accessible from only one part of the computation.

Hardware: Memory connected to only one core

Software: Local. Accessible only by the owning thread (Message passing)
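A minimal sketch of the message-passing model (threads stand in here for processes with private memory): each worker keeps its data local and communicates only by sending values over a channel, never by touching the other's memory.

```python
import queue
import threading

def producer(outbox):
    local = [1, 2, 3, 4]    # private to this worker
    outbox.put(sum(local))  # share a value only via a message

def consumer(inbox, results):
    results.append(inbox.get())  # receive: no direct memory access

channel = queue.Queue()
results = []
t1 = threading.Thread(target=producer, args=(channel,))
t2 = threading.Thread(target=consumer, args=(channel, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [10]
```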

21
Q

What is the software view referred to as?

A

Programming model

22
Q

What are the types of programming model?

A

Serial → globally accessible memory
(SW restrictions may apply)

Data Sharing → globally accessible memory
(SW equivalent of Shared Memory)

Message Passing → thread-owned (distributed) memory
(SW equivalent of Distributed Memory)

23
Q

True or False

Programming model = Memory organisation

A

False

24
Q

Match the following memory with its programming model

  1. Shared memory
  2. Distributed memory

A. Message passing
B. Data sharing

A
  1. B
  2. A
25
Q

How efficient is simulating data sharing on distributed memory?

A

Slow

26
Q

How efficient is message passing on shared memory?

A
  1. Fast but slower than Data Sharing
  2. Extra traffic might impact bandwidth
27
Q

Which memory is better from the HW perspective?

A

Distributed Memory.
1. Easier implementation
2. Higher Bandwidth
3. Scales better (E.g. Super computers!)

28
Q

Which memory is better from the SW perspective?

A

Shared Memory.
1. Easier programming
2. Works with irregular communication

29
Q

What is one of the central conflicts of contemporary architecture?

A

Hardware is complex

30
Q

What are the software issues of complex hardware?

A

SW is exposed to the complexity (distributed memory)
→Higher SW cost
→Complicated code
→Wasted energy & cycles

31
Q

What are the hardware issues of complex hardware?

A

HW hides the complexity (shared memory)
→Higher HW cost
→Complicated design
→Wasted energy & cycles

32
Q

What type of memory is used in chip MPs?

A

Shared

33
Q

Where is distributed memory used?

A

Supercomputers

34
Q

What is the NxN worst case for torus and grid?

A

Torus -> N
Grid -> 2*(N-1)