Lecture 2: The world of parallelism Flashcards

(34 cards)

1
Q

What is the trend in GPU and CPU usage?

A

The number of cores is increasing

2
Q

What is Flynn’s Taxonomy?

A

A classification system for computer architectures based on the number of instruction and data streams they can process simultaneously.

3
Q

What are the categories of Flynn’s Taxonomy?

A

SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data (Chip MPs)

4
Q

Define the four categories of Flynn’s Taxonomy

A

SISD: One instruction executed at a time, processing one data element at a time. e.g. Traditional single processors

SIMD: One instruction executed at a time, operating on multiple independent streams of data. e.g. Vector processors (1970s), vector units (MMX, SSE, AVX), GPUs

MIMD: Multiple sequences of instructions executed independently, each one operating on a different stream of data. e.g. Chip Multiprocessors

SPMD: Multiple instruction streams but with the same code on multiple independent streams of data. e.g. Data-parallel machines built from independent processors
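The SISD/SIMD distinction can be sketched in code. The snippet below is an illustrative Python sketch (not from the lecture) contrasting an element-at-a-time loop with a NumPy expression, whose vectorised kernels are what map onto hardware vector units such as SSE/AVX:

```python
import numpy as np

data = np.arange(8, dtype=np.float64)

# SISD style: one instruction processes one data element at a time
sisd_result = [x * 2.0 + 1.0 for x in data]

# SIMD style: one (logical) instruction applied to all elements at once;
# NumPy dispatches this to vectorised kernels under the hood
simd_result = data * 2.0 + 1.0

assert np.allclose(sisd_result, simd_result)
```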

5
Q

Describe interconnections and communication in parallelism

A
  1. Between cores
  2. Between cores and memory

The layout of these connections affects the types of computation the machine supports.

6
Q

Interconnections in chip MPs: advantages and disadvantages

A

Advantages: Faster than traditional (off-chip) interconnects, and lower cost
Disadvantages: Limited silicon area and power budget for the network

7
Q

What is a grid?

A
  1. Direct link to neighbours
  2. Private on-chip memory
  3. Staged communication with non-neighbours
  4. NxN grid worst case: 2*(N-1) steps
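The 2*(N-1) worst case follows from the Manhattan distance between opposite corners of the grid. A minimal sketch (hypothetical helper name `grid_hops`) checks it:

```python
def grid_hops(a, b):
    """Hops between cores a=(x1, y1) and b=(x2, y2) in a grid with
    links only to direct neighbours: the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

N = 8
# Worst case: opposite corners of the N x N grid
worst = grid_hops((0, 0), (N - 1, N - 1))
assert worst == 2 * (N - 1)  # 14 steps for an 8x8 grid
```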
8
Q

What is a torus?

A
  1. Direct link to neighbours
  2. Private on-chip memory
  3. More symmetrical, more paths, shorter paths
  4. NxN torus worst case: N steps
  5. More wires, complex routing
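The shorter worst case comes from the wrap-around links: along each axis the distance is at most N/2. A minimal sketch (hypothetical helper name `torus_hops`, assuming even N):

```python
def torus_hops(a, b, N):
    """Hops between cores in an N x N torus: each axis can wrap
    around, so the per-axis distance is at most N // 2."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, N - dx) + min(dy, N - dy)

N = 8
# Worst case: half the ring along each dimension -> N steps total
worst = torus_hops((0, 0), (N // 2, N // 2), N)
assert worst == N
```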
9
Q

What is the smallest kind of on-chip interconnect?

A

A grid

10
Q

In which interconnect does every core connect to four neighbours?

A

Torus

11
Q

Which interconnects are suitable for smaller and which for larger systems?

A

Grid -> Smaller
Torus -> Larger
Bus -> Smaller

12
Q

What's a key property of a torus?

A

They can be generalized further to multiple dimensions:

2D grid → folded → 2D torus (4 neighbours)
3D grid → folded → 3D torus (6 neighbours)
4D grid → folded → 4D torus (8 neighbours)

CMPs rarely go above 2D
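The neighbour counts above can be checked with a small sketch (hypothetical helper name `torus_neighbours`) that enumerates the ±1 wrap-around neighbours along each axis of a d-dimensional torus, giving 2d neighbours per node:

```python
def torus_neighbours(coord, dims):
    """Neighbours of a node in a multi-dimensional torus: step +/-1
    along each axis, wrapping at the edges -> 2*d neighbours."""
    out = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            out.append(tuple(n))
    return out

# 2D torus: 4 neighbours; 3D: 6; 4D: 8
assert len(torus_neighbours((0, 0), (4, 4))) == 4
assert len(torus_neighbours((0, 0, 0), (4, 4, 4))) == 6
assert len(torus_neighbours((0, 0, 0, 0), (4, 4, 4, 4))) == 8
```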

13
Q

Which interconnect is relied on by many multiprocessors?

A

A bus, partially or fully

14
Q

What is a bus?

A
  1. All cores connected to all cores
  2. Simple to build
  3. Constant latency
  4. Memory can be organized in any way: private to each core or shared between cores
  5. Time-shared bus (disadvantage)
    → complexity, lower bandwidth (a fraction of that of a grid)
  6. Very long wires (to connect all the cores)
    → area, routing, power, slow
15
Q

For a large number of cores what would be the main bottleneck?

A

The bus

16
Q

Which topologies are more suitable for larger systems (scalable)?

A

  1. Trees
  2. Hierarchical (crossbars, hypercubes, rings, MINs, etc.)

Important for high core count

17
Q

What type of switching is used in scalable topologies? Describe it.

A

Packet switching: dividing data into small packets for efficient transmission across a network. Packets can follow different paths to the destination and may arrive out of order.
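A minimal sketch of the idea (illustrative names `to_packets`/`reassemble`, not a real network stack): sequence numbers let the receiver reassemble the message even when packets arrive out of order:

```python
from random import shuffle

def to_packets(data: bytes, size: int, msg_id: int):
    """Split a message into numbered packets so the receiver can
    reassemble it even if packets arrive out of order."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(msg_id, seq, chunk) for seq, chunk in enumerate(chunks)]

def reassemble(packets):
    # Sort by sequence number, then concatenate the payloads
    return b"".join(p[2] for p in sorted(packets, key=lambda p: p[1]))

packets = to_packets(b"parallel architectures", size=5, msg_id=1)
shuffle(packets)  # packets may take different paths through the network
assert reassemble(packets) == b"parallel architectures"
```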

18
Q

What is a reason for discontinuity in core count growth?

A

Interconnection: larger core counts require more complex network topologies.

19
Q

What is shared memory? Describe its hardware and software view.

A

Accessible from every part of the computation.

Hardware: Memory connected to all cores

Software: Global. Accessible from all threads (Reads/Writes)

20
Q

What is distributed memory? Describe its hardware and software view.

A

Accessible from only one part of the computation.

Hardware: Memory connected to only one core

Software: Local. Accessible only by the owning thread (Message passing)

21
Q

What is the software view referred to as?

A

Programming model

22
Q

What are the types of programming model?

A

Serial → globally accessible memory
(SW restrictions may apply)

Data Sharing → globally accessible memory
(SW equivalent of Shared Memory)

Message Passing → thread-owned (distributed) memory
(SW equivalent of Distributed Memory)
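The two models can be contrasted with a small Python sketch (illustrative, not from the lecture; the names `counter` and `inbox` are made up): data sharing mutates one globally visible object under a lock, while message passing sends values into a queue owned by the receiver:

```python
import threading
import queue

# Data sharing: threads read/write one globally accessible object
counter = {"value": 0}
lock = threading.Lock()

def shared_worker():
    with lock:  # coordinating access is the programmer's job
        counter["value"] += 1

# Message passing: threads never touch shared state; they send messages
inbox = queue.Queue()

def mp_worker():
    inbox.put(1)  # explicit communication instead of a shared write

threads = [threading.Thread(target=shared_worker) for _ in range(4)]
threads += [threading.Thread(target=mp_worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(inbox.get() for _ in range(4))
assert counter["value"] == 4 and total == 4
```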

23
Q

True or False

Programming model = Memory organisation

A

False. A programming model can be implemented on either memory organisation, though with different efficiency.

24
Q

Match the following memory with programming model

  1. Shared memory
  2. Distributed memory

A. Message passing
B. Data sharing

A

1 → B (Data sharing), 2 → A (Message passing)

25
Q

How efficient is simulating data sharing on distributed memory?

A

Slow

26
Q

How efficient is message passing on shared memory?

A

  1. Fast, but slower than data sharing
  2. Extra traffic might impact bandwidth

27
Q

Which memory is better from a HW perspective?

A

Distributed memory
  1. Easier implementation
  2. Higher bandwidth
  3. Scales better (e.g. supercomputers!)

28
Q

Which memory is better from a SW perspective?

A

Shared memory
  1. Easier programming
  2. Works with irregular communication

29
Q

What is one of the central conflicts of contemporary architecture?

A

Hardware is complex

30
Q

What are the software issues of complex hardware?

A

SW is exposed to it (distributed)
→ Higher SW cost
→ Complicated code
→ Wasted energy & cycles

31
Q

What are the hardware issues of complex hardware?

A

HW hides it (shared memory)
→ Higher HW cost
→ Complicated design
→ Wasted energy & cycles

32
Q

What type of memory is used in chip MPs?

A

Shared

33
Q

Where is distributed memory used?

A

Supercomputers

34
Q

What is the NxN worst case for torus and grid?

A

Torus → N steps
Grid → 2*(N-1) steps