Multicore Systems Flashcards
Week 2.10 (48 cards)
what are the 3 drivers of growth in microprocessor performance
- higher clock frequency
- increased transistor density
- smarter on-chip architecture designs
what 3 things are now used in processor designs to exploit ILP
- pipelining
- superscalar
- SMT
describe pipelining
- enables overlapping of instructions
- while one instruction is in one stage, another can occupy a different stage
- increases instruction throughput and performance
limit of pipelining
deeper pipelines require more control logic, with diminishing returns
describe superscalar
- replicate execution units to create multiple pipelines
- multiple instructions are issued and executed in parallel during each instruction cycle
- parallelism is limited by instruction dependencies
limit of superscalar
code rarely offers enough parallelism to benefit beyond 6 pipelines
describe SMT
- allows multiple threads to share pipeline resources within a single core
- instructions from different threads can be issued in the same cycle
limit of SMT
complexity in scheduling and resource sharing limits scalability
describe the effect of increasing clock speed & complexity on power consumption
^ clock speed & added complexity = ^ power density
what is the response to rising power density
^ on-chip cache memory as it consumes less power than logic
what does multicore offer in terms of performance
if software is parallelisable = near-linear performance
- large caches are underutilised by single threads
- multiple cores and threads better exploit on-chip cache memory
what does amdahl’s law represent
theoretical speedup for a single application on N cores
amdahl’s law formula
Speedup(N) = 1/((1 - f) + f/N)
where:
- f = fraction of the program that is parallelisable
- 1 - f = fraction that is inherently sequential
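a quick worked sketch of the formula in Python; the values of f and N are illustrative assumptions, not from the cards:

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Amdahl's law: speedup on n cores when fraction f is parallelisable."""
    return 1.0 / ((1.0 - f) + f / n)

# Even with f = 0.9, 8 cores give well under 8x speedup.
for n in (2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.9, n), 2))
# 2 -> 1.82, 4 -> 3.08, 8 -> 4.71, 16 -> 6.4
```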
what does amdahl’s law assume
perfect parallelism with no scheduling overhead
what is a more realistic assumption about amdahl’s law
as core count increases, ^ overhead = speedup peaks & then degrades
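a minimal sketch of that peak-and-degrade behaviour; the linear overhead term k*n is an illustrative assumption, not part of amdahl’s law itself:

```python
def speedup_with_overhead(f: float, n: int, k: float = 0.01) -> float:
    """Amdahl's law plus an assumed per-core scheduling-overhead term k*n."""
    return 1.0 / ((1.0 - f) + f / n + k * n)

# With f = 0.9 and k = 0.01, speedup peaks around n = 9-10, then falls.
for n in (4, 8, 16, 32):
    print(n, round(speedup_with_overhead(0.9, n), 2))
# 4 -> 2.74, 8 -> 3.42, 16 -> 3.16, 32 -> 2.23
```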
4 applications that scale well with multicore systems
- multithreaded native apps
- multiprocess applications
- java applications
- multi-instance applications
define threading granularity
refers to the smallest unit of work that can be parallelised
what does fine-grained threading give
flexibility
- BUT ^ overhead from management
trade off of threading granularity
parallelism vs overhead
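a minimal Python sketch of that trade-off using the standard-library ThreadPoolExecutor; the work function and chunk counts are illustrative assumptions (and in CPython the GIL means this shows task-management overhead, not true CPU parallelism):

```python
from concurrent.futures import ThreadPoolExecutor

def work(items):
    # Stand-in for real per-item work (illustrative assumption).
    return sum(x * x for x in items)

data = list(range(1_000_000))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Coarse-grained: 4 big chunks = little task-management overhead,
    # but less flexibility to balance load across cores.
    coarse = list(pool.map(work, [data[i::4] for i in range(4)]))

    # Fine-grained: 1000 small chunks = better load balancing,
    # but far more scheduling overhead per unit of work.
    fine = list(pool.map(work, [data[i::1000] for i in range(1000)]))

assert sum(coarse) == sum(fine)
```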
what threading structure does rendering use
hierarchical thread structure
what are the 4 multicore cache organisations
- dedicated L1 cache
- dedicated L2 cache
- shared L2 cache
- shared L3 cache
5 advantages of shared higher-level cache
- constructive interference
- no data replication needed
- dynamic cache allocation
- easier inter-core communication
- simplified coherency
why is constructive interference an advantage of shared higher-level cache
threads on different cores benefit from shared data already located in cache
why is dynamic cache allocation an advantage of shared higher-level cache
cores can use more or less cache depending on workload needs