Caches Flashcards

1
Q

The memory wall

A

processor speed is improving faster than memory speed

2
Q

temporal locality

A

recently referenced data is likely to be referenced again soon. Reactive

3
Q

spatial locality

A

programs are more likely to reference data near recently referenced data. Proactive

4
Q

Are temporal and spatial locality used for both data and instructions?

A

Yes

5
Q

How to find average memory access time

A

latency_avg = latency_hit + %miss * latency_miss
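
For example, with assumed numbers (not from the deck): if latency_hit = 1 ns, %miss = 5%, and latency_miss = 40 ns, then latency_avg = 1 + 0.05 × 40 = 3 ns.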

6
Q

Primary caches

A

split instruction (I$) and data (D$) caches; on chip (with CPU), made of SRAM (same circuit type as CPU)

7
Q

2nd level caches

A

on chip (with CPU), made of SRAM; unified (holds both I and D)

8
Q

How large are primary caches?

A

8KB to 64KB

9
Q

How large are second level caches?

A

typically 512KB to 16MB

10
Q

4th level cache = main memory

A

Made of DRAM

11
Q

How large is fourth level cache?

A

1GB to 4GB for desktops; servers can have much more

12
Q

5th level cache

A

disk/SSD (swap and files)

13
Q

How much of a processor's area is cache?

A

30-70%

14
Q

Static RAM (SRAM)

A

6 transistors per bit
optimized for speed and density
fast (sub-nanosecond latency for small SRAM)
speed proportional to area
integrates well with standard processor logic

15
Q

Dynamic RAM (DRAM)

A

1 transistor + 1 capacitor per bit
optimized for density
slow (>40ns internal access, ~100ns pin-to-pin)
different fabrication steps

16
Q

Nonvolatile storage

A

magnetic disk, flash, STT-RAM, ReRAM, PCM

17
Q

Cache Lookup Algorithm

A

Read frame indicated by index bits

“Hit” if tag matches and valid bit is set, otherwise miss
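
A minimal C sketch of this lookup for a direct-mapped cache (the geometry and array names are assumed for illustration, not from the deck):

    #include <stdint.h>
    #include <stdbool.h>

    /* assumed geometry: 1024 frames of 64-byte blocks */
    #define OFFSET_BITS 6
    #define INDEX_BITS 10
    #define NUM_FRAMES (1 << INDEX_BITS)

    static uint64_t tags[NUM_FRAMES];   /* hypothetical tag array */
    static bool valid[NUM_FRAMES];      /* hypothetical valid bits */

    /* hit if the indexed frame's tag matches and its valid bit is set */
    bool cache_lookup(uint64_t addr) {
        uint64_t index = (addr >> OFFSET_BITS) & (NUM_FRAMES - 1);
        uint64_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
        return valid[index] && tags[index] == tag;
    }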

18
Q

Fill path also called what?

A

backside

19
Q

Cache controller

A

finite state machine - remembers miss address, accesses next level, waits for response, writes data and tag in proper locations
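
A toy C sketch of that state machine (state names and inputs are assumed; a real controller has more states and signals):

    #include <stdint.h>

    enum state { IDLE, REQUEST, WAIT, FILL };

    struct controller {
        enum state st;
        uint64_t miss_addr;   /* remembered miss address */
    };

    /* advance the FSM by one step */
    void step(struct controller *c, int miss, uint64_t addr, int response_ready) {
        switch (c->st) {
        case IDLE:    if (miss) { c->miss_addr = addr; c->st = REQUEST; } break;
        case REQUEST: /* access next level for c->miss_addr */ c->st = WAIT; break;
        case WAIT:    if (response_ready) c->st = FILL; break;  /* wait for response */
        case FILL:    /* write data and tag into the frame */ c->st = IDLE; break;
        }
    }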

20
Q

%miss (miss rate)

A

#misses / #accesses

21
Q

t_hit (hit time)

A

time to read data from (write data to) cache

22
Q

t_miss (miss penalty)

A

time to read data into cache

23
Q

Average access time: t_avg

A

t_hit + %miss * t_miss

24
Q

what roughly determines t_hit

A

cache capacity and circuits

25
Q

what roughly determines t_miss

A

lower level memory structures

26
Q

How to measure %miss?

A

hardware performance counters, simulation, paper simulation

27
Q

how to find offset

A

log_2(block size)

28
Q

how to find index

A

log_2(number of sets)

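These two cards are just bit-width computations; a minimal C sketch, with the block size and set count as assumed example values:

    #include <stdio.h>

    /* log2 of a power of two, via a simple loop */
    static int log2u(unsigned long x) {
        int n = 0;
        while (x > 1) { x >>= 1; n++; }
        return n;
    }

    int main(void) {
        unsigned long block_size = 64;  /* bytes, assumed */
        unsigned long num_sets = 512;   /* assumed */
        printf("offset bits = %d\n", log2u(block_size)); /* 6 */
        printf("index bits  = %d\n", log2u(num_sets));   /* 9 */
        return 0;
    }
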
29
Q

How to reduce %miss?

A

increase capacity | increase block size

30
Q

What happens if you increase cache capacity?

A

reduce %miss, but t_hit increases

31
Q

What is t_hit latency proportional to?

A

sqrt(capacity)

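An implication of this model (not a card from the deck): quadrupling the capacity roughly doubles t_hit, since sqrt(4C) = 2·sqrt(C).
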
32
Q

What are the advantages of increasing block size?

A

reduce %miss | reduce tag overhead

33
Q

What are the disadvantages of increasing block size?

A

potentially useless data transfer | premature replacement of useful data

34
Q

For the same size cache, will increasing the block size increase or reduce the tag overhead?

A

Increasing the block size will reduce the tag overhead

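A worked example with assumed parameters (not from the deck): with 32-bit addresses and a 4KB direct-mapped cache, 16B blocks give 256 frames, each with a 20-bit tag (5120 tag bits total); doubling to 32B blocks halves the frame count to 128, so only 2560 tag bits cover the same 4KB of data.
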
35
Q

Effects of block size on miss rate

A

spatial prefetching | interference

36
Q

Spatial prefetching

A

good: for blocks with adjacent addresses, turns miss/miss into miss/hit

37
Q

Interference

A

bad: blocks with non-adjacent addresses can end up mapped to the same frame; turns hits into misses by disallowing simultaneous residence

38
Q

What offsets the time to read/transfer/fill a larger block?

A

critical word first/early restart

39
Q

Can critical word first/early restart help with a cluster of misses?

A

No. Reads/transfers/fills of two misses can't happen simultaneously

40
Q

Name for a frame group

A

set

41
Q

Each frame in a set

A

way

42
Q

Pros and cons of increasing set associativity

A

pro: reduces conflicts | con: increases t_hit (additional tag match and muxing)

43
Q

Lookup algorithm for multi-way set associative cache

A

Use the index bits to find the set; read data/tags from all frames in that set in parallel; hit if a tag matches and its valid bit is set

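A minimal C sketch of this lookup, assuming a hypothetical 4-way geometry (the loop models the per-way tag compares that hardware does in parallel):

    #include <stdint.h>
    #include <stdbool.h>

    /* assumed geometry: 256 sets x 4 ways, 64-byte blocks */
    #define OFFSET_BITS 6
    #define INDEX_BITS 8
    #define NUM_SETS (1 << INDEX_BITS)
    #define NUM_WAYS 4

    static uint64_t tags[NUM_SETS][NUM_WAYS];
    static bool valid[NUM_SETS][NUM_WAYS];

    /* returns the matching way on a hit, -1 on a miss */
    int sa_lookup(uint64_t addr) {
        uint64_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
        uint64_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
        for (int way = 0; way < NUM_WAYS; way++)
            if (valid[index][way] && tags[index][way] == tag)
                return way;
        return -1;
    }
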
44
Q

NMRU and miss handling

A

Add MRU bits to each set; a hit updates the MRU, a miss replaces any way but the MRU

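A tiny C sketch of that victim choice (random selection among the non-MRU ways is an assumption; the card only says "any way but MRU"):

    #include <stdlib.h>

    /* pick any way except the MRU one */
    int pick_victim(int mru_way, int num_ways) {
        int v = rand() % (num_ways - 1);   /* choose among the other ways */
        return (v >= mru_way) ? v + 1 : v; /* skip over the MRU way */
    }
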
45
Q

Data and tags can be split into two different arrays and accessed in parallel. Why are multi-way set associative caches still slower than direct mapped caches?

A

Still more logic in the critical path than direct mapped caches (an additional multiplexor), so slower t_hit

46
Q

Pros and cons of higher associativity caches

A

pro: better (lower) %miss | con: t_hit increases; the more associative, the slower

47
Q

Why are instruction caches smaller/simpler?

A

don't have to worry about writing/storing

48
Q

Why are writes slower than reads?

A

For reads, tag and data can be read in parallel; a write must match the tag before the data can be written

49
Q

Stages of write pipeline

A

1) match tag 2) write to matching way | bypass to avoid load stalls, may introduce structural hazards

50
Q

Two options for when to propagate new value to lower level memory

A

1) write through | 2) write back

51
Q

Write Through

A

on hit, update cache | immediately send the write to the next level

52
Q

Write Back

A

write to lower level when block is replaced | requires an extra dirty bit per block

53
Q

Writeback buffer (WBB)

A

keeps writes off the critical path: 1) send fill request to next level 2) while waiting, write dirty block to buffer 3) when new block arrives, put it into cache 4) write buffer sends contents to next level

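A rough, runnable C sketch of those four steps (all function names are hypothetical stand-ins for controller actions, not a real API):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* hypothetical stubs standing in for real controller actions */
    static void send_fill_request(uint64_t a) { printf("fill request for %#llx\n", (unsigned long long)a); }
    static void copy_block_to_wbb(int f) { printf("dirty frame %d -> WBB\n", f); }
    static void wait_for_fill(int f) { printf("fill data -> frame %d\n", f); }
    static void drain_wbb(void) { printf("WBB contents -> next level\n"); }

    /* miss handling with a writeback buffer, per the four steps above */
    static void handle_miss(uint64_t addr, int frame, bool dirty) {
        send_fill_request(addr);              /* 1) request the missing block */
        if (dirty) copy_block_to_wbb(frame);  /* 2) park the dirty block while waiting */
        wait_for_fill(frame);                 /* 3) new block goes into the cache frame */
        drain_wbb();                          /* 4) buffer writes back off the critical path */
    }

    int main(void) { handle_miss(0x1000, 3, true); return 0; }
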
54
Q

Disadvantages of write through

A

requires additional bus bandwidth | without a write buffer, must wait for writes to complete to memory

55
Q

Advantages of write through

A

easier to implement, no need for dirty bits in cache | don't have to deal with coherence traffic at this cache level | simplifies miss handling (no writeback buffer step)

56
Q

Advantage of write back

A

uses less bandwidth since some writes don't go to memory (also saves power)

57
Q

Read vs Write Miss

A

Read miss: load can't go on without data, must stall | Write miss: no instruction waiting for data, so don't need to stall

58
Q

Store buffer

A

writes to D$ in the background | eliminates stalls on write misses | loads must search the store buffer in addition to D$

59
Q

Store vs. writeback buffer

A

store buffer: in front of D$, hides store misses | writeback buffer: behind D$, hides write backs

60
Q

Write allocate is used with what type of write (write back or write through)?

A

write back

61
Q

Write-allocate

A

when a write miss occurs, allocate a frame in the cache for the miss data

62
Q

Advantage of write allocate

A

decreases read misses

63
Q

No write allocate

A

when a write miss occurs, just write to next level; no need to allocate a cache frame for the miss data

64
Q

Pros/cons of no-write-allocate

A

potentially more read misses, but doesn't use a frame in the cache

65
Q

4 types of cache miss

A

compulsory, capacity, conflict, coherence

66
Q

compulsory cache miss

A

never seen this address before, would miss in infinite cache

67
Q

capacity cache miss

A

miss caused because cache is too small (would miss in fully associative cache)

68
Q

conflict cache miss

A

miss caused because cache associativity is too low

69
Q

coherence cache miss

A

miss due to external invalidations in shared memory multiprocessors and multicores

70
Q

How does larger block size affect the 3 C's and hit rate?

A

decreases compulsory misses (spatial locality) | increases conflict misses (fewer frames) | can increase t_miss (reading more bytes from next level) | no significant effect on t_hit

71
Q

How does a larger cache affect the 3 C's and hit rate?

A

decreases capacity misses | increases t_hit

72
Q

How does higher associativity affect the 3 C's and hit rate?

A

decreases conflict misses | increases t_hit

73
Q

local hit/miss rate

A

percent of references to this cache that hit; local miss rate = #misses / total accesses to this cache = 100% − local hit rate

74
Q

global hit/miss rate

A

global miss rate = #misses / total # of memory references

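A worked example with assumed numbers: out of 1000 memory references, suppose L1 misses 100 and L2 then misses 20 of those 100. L2's local miss rate is 20/100 = 20%, but its global miss rate is 20/1000 = 2%.
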
75
Q

inclusive caches

A

a block in the L1 is always also in the L2 | good for write through | coherence traffic only needs to check L2

76
Q

exclusive caches

A

a block is either in L1 or L2 (never both) | holds more data | coherence traffic must check both L1 and L2

77
Q

Give reads priority over writes

A

a read must check the contents of the WBB since it could hold the value being read | reduces write costs in a writeback cache: if a read miss will replace a dirty block, write the dirty block to the WBB, read memory, then write the WBB contents to memory