Multicore Systems Flashcards

Week 2.10 (38 cards)

1
Q

what are the 3 drivers of growth in microprocessor performance

A
  1. higher clock frequency
  2. increased transistor density
  3. smarter on-chip architecture designs
2
Q

what 3 things are now used in processor designs to exploit ILP

A
  1. pipelining
  2. superscalar
  3. SMT
3
Q

describe pipelining

A
  • enables overlapping of instructions
  • while one instruction is in one stage, another can occupy a different stage
  • increases instruction throughput and performance
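
a quick illustration (the cycle model is a standard textbook approximation, not from the cards): with a 5-stage pipeline, 100 instructions take roughly 5 + (100 - 1) = 104 cycles instead of 5 × 100 = 500 without pipelining, about a 4.8x throughput gain
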
4
Q

limit of pipelining

A

deeper pipelines require more control logic, with diminishing returns

5
Q

describe superscalar

A
  • replicate execution units to create multiple pipelines
  • multiple instructions are issued and executed in parallel during each instruction cycle
  • parallelism is limited by instruction dependencies
6
Q

limit of superscalar

A

code rarely offers parallelism beyond 6 pipelines

7
Q

describe SMT

A
  • allows multiple threads to share pipeline resources within a single core
  • instructions from different threads can be issued in the same cycle
8
Q

limit of SMT

A

complexity in scheduling and resource sharing limits scalability

9
Q

describe the effect of increasing clock speeds & complexity on power consumption

A

increased clock speed & added complexity = increased power density

10
Q

what is the response to rising power density

A

increase on-chip cache memory, as it consumes less power than logic

11
Q

what does multicore offer in terms of performance

A

if software is parallelisable, performance scales near-linearly
- large caches are underutilised by single threads
- multiple cores and threads better exploit on-chip cache memory

12
Q

what does Amdahl’s law represent

A

theoretical speedup for a single application on N cores

13
Q

Amdahl’s law formula

A

Speedup(N) = 1/((1 - f) + f/N)
where:
- f = fraction of the program that is parallelisable
- 1 - f = fraction that is inherently sequential
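
a quick worked example (a hedged Python sketch; the function name and sample values are illustrative, not from the cards):

    def amdahl_speedup(f, n):
        """Amdahl's law: theoretical speedup on n cores when a
        fraction f of the program is parallelisable."""
        return 1.0 / ((1.0 - f) + f / n)

    print(amdahl_speedup(0.9, 8))      # ~4.71x on 8 cores, well below the ideal 8x
    print(amdahl_speedup(0.9, 10**9))  # ~10x: the sequential 10% caps speedup at 1/(1-f)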

14
Q

what does Amdahl’s law assume

A

perfect parallelism with no scheduling overhead

15
Q

what is a more realistic assumption about Amdahl’s law

A

as core count increases, overhead grows, so speedup peaks and then degrades
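
a minimal sketch of that effect, extending the formula with a hypothetical linear overhead term c*N (the overhead model is an illustration, not part of Amdahl's law):

    def speedup_with_overhead(f, n, c=0.005):
        """Amdahl's law plus a made-up per-core scheduling/communication
        overhead term c*n in the denominator."""
        return 1.0 / ((1.0 - f) + f / n + c * n)

    # speedup now peaks (here around 14 cores) and then degrades:
    for n in (2, 8, 16, 32, 64, 128):
        print(n, round(speedup_with_overhead(0.95, n), 2))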

16
Q

4 applications that scale well with multicore systems

A
  1. multithreaded native apps
  2. multiprocess applications
  3. Java applications
  4. multi-instance applications
17
Q

define threading granularity

A

refers to the smallest unit of work that can be parallelised

18
Q

what does fine-grained threading give

A

flexibility
- BUT increased overhead from management

19
Q

trade-off of threading granularity

A

parallelism vs overhead
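
a hedged sketch of the trade-off using Python's standard concurrent.futures (the work function and chunksize values are made up for illustration):

    from concurrent.futures import ProcessPoolExecutor

    def square(x):
        # stand-in for a small, independent unit of work
        return x * x

    if __name__ == "__main__":
        data = range(100_000)
        with ProcessPoolExecutor() as pool:
            # fine-grained: one item per task -> best load balancing,
            # but per-task management overhead dominates
            fine = list(pool.map(square, data, chunksize=1))
            # coarse-grained: large chunks -> little overhead,
            # but fewer opportunities to balance load across cores
            coarse = list(pool.map(square, data, chunksize=10_000))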

20
Q

what thread structure does rendering use

A

hierarchical thread structure

21
Q

what are the 4 multicore cache organisations

A
  1. dedicated L1 cache
  2. dedicated L2 cache
  3. shared L2 cache
  4. shared L3 cache
22
Q

5 advantages of shared higher-level cache

A
  1. constructive interference
  2. no data replication needed
  3. dynamic cache allocation
  4. easier inter-core communication
  5. simplified coherency
23
Q

why is constructive interference an advantage of shared higher-level cache

A

threads on different cores benefit from shared data already located in cache

24
Q

why is dynamic cache allocation an advantage of shared higher-level cache

A

cores can use more or less cache depending on workload needs

25
Q

why is simplified coherency an advantage of shared higher-level cache

A

coherence issues are limited to private (lower-level) caches, reducing overhead

26
Q

what is the trend in cache hierarchy design

A

as memory capacity and core counts increase, cache coherency becomes more important

27
Q

new cache hierarchy design

A

L1 per core, L2 shared among 2-4 cores, and a global shared L3

28
Q

why is SMT deployed in multicore design

A
  • SMT increases the number of hardware threads per chip
  • as software becomes more parallel, SMT becomes more attractive than purely superscalar designs
29
Q

define homogeneous multicore

A

all cores are identical

30
Q

define heterogeneous multicore

A

different types of cores on one chip - mixing cores in this context means using cores with different ISAs optimised for different tasks

31
Q

describe CPU/GPU multicore

A
  • GPUs support thousands of parallel threads
  • combining CPUs & GPUs improves flexibility & performance across diverse workloads
  • CPUs & GPUs share key on-chip resources such as last-level cache, interconnect & memory controllers
32
Q

what are the 2 solutions to CPU/GPU cache issues

A
  1. physical memory partitioned between CPU & GPU
  2. enabling shared access to memory & unified execution
33
Q

how does physical memory partitioned between CPU & GPU work

A
  • the CPU had to explicitly copy data to GPU memory
  • the GPU copied results back after computation
  • significant performance penalty
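
this explicit-copy model shows up in GPU array libraries; a hedged sketch using CuPy (CuPy is an assumption, not mentioned in the cards, and needs a CUDA GPU):

    import numpy as np
    import cupy as cp  # assumed library; provides NumPy-like arrays in GPU memory

    host = np.arange(1_000_000, dtype=np.float32)

    device = cp.asarray(host)   # explicit host -> GPU copy over the interconnect
    result = cp.sqrt(device)    # computation runs entirely in GPU memory
    back = cp.asnumpy(result)   # explicit GPU -> host copy of the results
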
34
Q

what are the 2 challenges with CPU/GPU multicore

A
  1. ensuring coordination & correctness between different types of cores
  2. cache sharing - differences in access patterns & sensitivity
35
Q

how does enabling shared access to memory & unified execution work

A
  • shared virtual memory
  • demand paging
  • cache coherence
  • unified programming interface
36
Q

describe CPU/DSP multicore

A

DSPs excel at ultra-fast, math-intensive operations

37
Q

uses of CPU/DSP

A
  • cellphones
  • modems
  • sound cards
  • hard drives
38
Q

describe cache design in heterogeneous multicore systems

A
  • dedicated L2 caches per processor type
  • hardware-based cache coherence preferred in SoCs