6.0 The memory system Flashcards

1
Q

What is the memory wall?

A

The processor is much faster than memory, so fetching data from memory limits performance

2
Q

How is memory latency hidden?

A

Using a memory hierarchy

3
Q

What does a memory hierarchy provide?

A

Hide latency

illusion of infinite capacity

relatively low cost

4
Q

What are the components in the memory hierarchy?

A

Register file (SRAM)

L1 cache (on-chip) (SRAM)

L2 cache (on-chip) (SRAM)

Main memory (DRAM)

Hard drive (Magnetic)

5
Q

Why do memory hierarchies work?

A

Programs have locality
Spatial, temporal

Move frequently used data and instructions close to the processor (in registers - done by compiler, in cache - done by hardware).
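
A small C example of both kinds of locality (illustrative only):

#include <stddef.h>

/* sum is reused every iteration: temporal locality.
   a[i] walks consecutive addresses: spatial locality, so
   neighbouring elements arrive in the same cache block. */
long sum_array(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}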

6
Q

What is temporal locality?

A

Accesses to the same location are likely to occur close together in time

7
Q

What is spatial locality?

A

Accesses to nearby locations are likely to occur close together in time

8
Q

What is a cache?

A

A cache consists of a number of equally sized locations called blocks (or lines)

9
Q

What are the three types of caches?

A

Direct mapped
Fully associative
Set associative

10
Q

What is a direct mapped cache?

A

Each block address is mapped to exactly one cache location

index = block_addr % block_count

Tag: the remaining address bits, stored alongside the block and compared on a lookup to verify that the right block is present

Simpler and faster lookup, but more conflict misses
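
A minimal C sketch of the lookup arithmetic, assuming an illustrative 64-byte block size and 256 blocks:

#include <stdint.h>

#define BLOCK_SIZE  64   /* bytes per block (assumed) */
#define BLOCK_COUNT 256  /* blocks in the cache (assumed) */

/* Split an address into offset, index and tag for a direct-mapped cache. */
static void decompose(uint64_t addr, uint64_t *tag, uint64_t *index, uint64_t *offset) {
    *offset = addr % BLOCK_SIZE;             /* byte within the block */
    uint64_t block_addr = addr / BLOCK_SIZE;
    *index  = block_addr % BLOCK_COUNT;      /* index = block_addr % block_count */
    *tag    = block_addr / BLOCK_COUNT;      /* rest of the address, stored and compared */
}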

11
Q

What is a fully associative cache?

A

All blocks can be stored anywhere in the cache.

No index

Need to search the whole cache

12
Q

What is a set associative cache?

A

Each block can map to any of n locations (ways) within one set. On a lookup, the tags of all n ways in the indexed set must be compared to know whether the address is cached.

sets = block_count / n

index = block_address % set_count

Less prone to conflict misses than direct-mapped caches
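
A hedged C sketch of the n-way lookup, assuming illustrative sizes and one valid bit plus tag per way:

#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE 64
#define WAYS       4
#define SET_COUNT  64   /* sets = block_count / n */

struct line { bool valid; uint64_t tag; };
static struct line cache[SET_COUNT][WAYS];

/* Hit if any of the n ways in the indexed set holds the tag. */
bool lookup(uint64_t addr) {
    uint64_t block_addr = addr / BLOCK_SIZE;
    uint64_t set = block_addr % SET_COUNT;   /* index = block_address % set_count */
    uint64_t tag = block_addr / SET_COUNT;
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;                     /* compare all n tags in the set */
    return false;
}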

13
Q

What does the address consist of, from the cache's perspective?

A

Block address (tag + index)

Block offset: addresses individual bytes within a block

14
Q

What are the trade offs when choosing block size?

A

Large blocks:
- Greater opportunity to exploit spatial locality
- More likely that large portions of the block are unreferenced before the block is evicted
- Take longer to read into the cache, increasing the miss penalty

15
Q

What is the 3C miss classification?

A

Compulsory misses
Capacity misses
Conflict misses

16
Q

What are compulsory misses?

A

Misses that still would occur in an infinite sized cache.

I.e. the first access to a block

17
Q

What are capacity misses?

A

Misses that would still occur even if the cache were fully associative.
The amount of data accessed within a time frame is larger than the cache capacity

18
Q

What are conflict misses?

A

Misses that occur because the cache is not fully associative.

19
Q

What are separate (split) caches?

A

Used when we know the caches have specific functions (e.g. separate data and instruction caches)

Instructions have the same format, and are only read

Often used for L1 caches. Instructions have very good locality, so the L1 captures most of the potential benefit, meaning we do not need split caches further down the memory hierarchy.

Avoids structural hazards for simple pipelines

20
Q

What is a write buffer?

A

When a cache needs to write, it writes the data into the write buffer.
This write buffer can continue writing into memory in the background, so that the pipeline does not need to stall.
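
A minimal C sketch of a write buffer as a small FIFO; the depth and interface are assumptions:

#include <stdint.h>
#include <stdbool.h>

#define WB_ENTRIES 8    /* assumed depth */

struct wb_entry { uint64_t addr; uint64_t data; };
static struct wb_entry wb[WB_ENTRIES];
static int head, tail, count;

/* Store side: enqueue and continue; stall only if the buffer is full. */
bool wb_push(uint64_t addr, uint64_t data) {
    if (count == WB_ENTRIES) return false;   /* pipeline must stall */
    wb[tail] = (struct wb_entry){addr, data};
    tail = (tail + 1) % WB_ENTRIES;
    count++;
    return true;
}

/* Memory side drains entries in the background, one per call. */
bool wb_drain(struct wb_entry *out) {
    if (count == 0) return false;
    *out = wb[head];
    head = (head + 1) % WB_ENTRIES;
    count--;
    return true;
}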

21
Q

What do we need to think about when designing caches?

A
  • How to deal with writes
  • How to allocate space in cache for writes
  • How to allocate space in cache for reads
  • What block to evict on a miss
  • Should we restrict what data can be placed in higher level caches?
  • How many concurrent misses should we support?
22
Q

What are the different ways of handling writes?

A

Write-back
Write-through

Write-allocate
Write-no-allocate

23
Q

What is write-back?

A

Update the cached value (if it is there) and write it back to memory at a later time.

The bet is that there will be multiple writes to the same cache line, so it is better to do one big write than many small ones.

Trade-off: saves bandwidth, but adds complexity

24
Q

What is write-through?

A

Write updated data to lower-level memory on all writes, in addition to updating cache.

25
Q

What is write-allocate?

A

Insert written value into cache on a write miss

26
Q

What is write-no-allocate?

A

Bypass cache on a write miss.
If the value is not in cache, write straight to the lower level memory.

27
Q

What are typical combinations of write-handling?

A

Write-back + write-allocate

Write-through + write-no-allocate
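
A toy single-line C model contrasting the two combinations; addresses, eviction policy and all names are simplifying assumptions:

#include <stdint.h>
#include <stdbool.h>

/* Toy model: one cache line and one memory word, just to contrast the policies. */
static uint64_t memory_word;
static uint64_t cached_word;
static bool     cached_valid, cached_dirty;

/* Write-back + write-allocate: update (or allocate) the cache line and mark it
   dirty; memory is only updated when the dirty line is evicted. */
void store_wb_alloc(uint64_t data) {
    cached_word  = data;
    cached_valid = true;
    cached_dirty = true;
}

void evict(void) {
    if (cached_valid && cached_dirty)
        memory_word = cached_word;   /* one big write instead of many small ones */
    cached_valid = cached_dirty = false;
}

/* Write-through + write-no-allocate: memory is updated on every write,
   the cache only if the line is already present. */
void store_wt_noalloc(uint64_t data) {
    if (cached_valid)
        cached_word = data;
    memory_word = data;
}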

28
Q

How are read allocations handled?

A

Allocate-on-fill
Allocate-on-miss

29
Q

What is allocate on fill?

A

Insert the requested data into the cache when the miss returns from the lower levels.

The cache can still hit in all ways while the miss is being serviced (may improve performance)

Higher complexity

30
Q

What is allocate on miss?

A

On a miss, a block is evicted immediately and its location is reserved for the incoming data.

31
Q

What is a replacement policy, and name three examples?

A

Decides what to evict on a miss.
In a fully associative or set-associative cache, there is more than one cache line that could be evicted.

Examples: Random, FIFO, LRU

32
Q

What is the Random replacement policy?

A

Choose a random block

Less effective, very simple

33
Q

What is the FIFO replacement policy?

A

Evict the block that was inserted longest ago

Needs to remember insertion order

Medium complexity, but more effective than random

34
Q

What is the LRU replacement policy?

A

Evict least-recently-used

Need to remember the reuse order, which is updated on every access

When you access a block, it is moved to the top of the stack/queue. On a miss, evict the tail of the queue within the set

Most complex, most effective.

Can be expensive in highly associative caches
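
A C sketch of LRU bookkeeping for one set, using an age counter per way (one common implementation choice; the details are assumptions):

#include <stdint.h>

#define WAYS 4

/* age[w] == 0 means most recently used; higher means older. */
static uint8_t age[WAYS];

/* On an access to way `hit`, everything more recent than it ages by one. */
void lru_touch(int hit) {
    for (int w = 0; w < WAYS; w++)
        if (age[w] < age[hit])
            age[w]++;
    age[hit] = 0;
}

/* On a miss, the victim is the oldest way in the set. */
int lru_victim(void) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (age[w] > age[victim])
            victim = w;
    return victim;
}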

35
Q

What are inclusive caches?

A

Any block cached at level n is also cached at level n+1

Pro:
- Simplifies coherence: if a block is not in the level n+1 cache, it is not in the level n cache either

Con:
- Reduces cache utilization, as we are in practice storing redundant copies
- Adds some overhead

36
Q

What are exclusive caches?

A

Any block in the level n cache is not in the level n+1 cache.

Simplifies coherence: a block is in at most one level

Does not reduce cache utilization, as blocks are stored in at most one cache

37
Q

Are all caches either inclusive or exclusive?

A

No, an option is to not enforce inclusivity or exclusivity.

Let the cache coherence protocol deal with the fact that any block can be cached anywhere in the hierarchy

38
Q

What are non-blocking caches?

A

The key component is the MSHR (Miss Status Holding Register):
- On a miss, go into the miss handling architecture (MHA)
- Check whether there is an MSHR outstanding for this block address
- If there is, a miss for this block is already in progress, so allocate an entry in a space called the target information, recording that this request also wants a word from this cache block
- When the miss returns, the cache walks the target information and responds to all waiting requests

The cache can service as many misses as there are MSHRs, so selecting the number of MSHRs is critical.

If the cache runs out of MSHRs it must block and can handle neither hits nor misses, because there is nowhere to record a new miss
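
A structural C sketch of MSHRs with target information; the entry counts and field names are assumptions:

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define MSHR_COUNT  8   /* outstanding misses supported (assumed) */
#define MAX_TARGETS 4   /* requests that can merge onto one miss (assumed) */

struct target {         /* who asked for which word of the block */
    uint8_t word_offset;
    uint8_t dest_reg;
};

struct mshr {
    bool          valid;
    uint64_t      block_addr;            /* miss in progress for this block */
    int           n_targets;
    struct target targets[MAX_TARGETS];
};

static struct mshr mshrs[MSHR_COUNT];

/* On a miss: merge with an existing MSHR for the block if one exists,
   otherwise allocate a new one; if none are free the cache must block. */
struct mshr *mshr_lookup_or_alloc(uint64_t block_addr) {
    struct mshr *free_entry = NULL;
    for (int i = 0; i < MSHR_COUNT; i++) {
        if (mshrs[i].valid && mshrs[i].block_addr == block_addr)
            return &mshrs[i];            /* miss already outstanding: merge */
        if (!mshrs[i].valid && !free_entry)
            free_entry = &mshrs[i];
    }
    if (free_entry) {
        free_entry->valid = true;
        free_entry->block_addr = block_addr;
        free_entry->n_targets = 0;
    }
    return free_entry;                   /* NULL means out of MSHRs: block */
}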

39
Q

What are the benefits of non-blocking caches?

A

Enables servicing cache hits while a miss is being resolved (hit-under-miss)

Enables hiding memory latency by servicing multiple misses concurrently (memory-level parallelism, MLP)

Enables merging concurrent requests to the same cache block

40
Q

What is virtual memory?

A

Every process sees a 32/64-bit address space. The physical address space is much smaller.

Virtual addresses need to be mapped to physical ones.

Because every process has the illusion of owning the complete address space, time sharing and demand paging become much easier.

41
Q

What is time sharing?

A

Being able to switch between processes at runtime

42
Q

What is demand paging?

A

The hardware figures out whether a page is available in main memory or not.

If it is not, it may not have been loaded yet; in that case, software needs to load it from disk.

This is very expensive, so instead of blocking the processor, we generate an exception (page fault).

When a page fault occurs, the OS schedules the process out, finds a page to replace, and loads the needed page from disk. Meanwhile, since this is non-blocking I/O, other processes are run.

When the page is ready the process is set as runnable again, and can be scheduled.

43
Q

How does address translation work?

A

Take the virtual address and do the address translation in hardware.

From the address translation we get a physical address that is used to access main memory.

Memory is allocated in fixed-size chunks called pages (typically 4 KB or 8 KB)
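
A hedged C sketch of the arithmetic for 4 KB pages; the flat page_table array is a hypothetical stand-in for the real page-table structure:

#include <stdint.h>

#define PAGE_SHIFT 12                          /* 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical flat table mapping virtual page number to physical page number. */
extern uint64_t page_table[];

uint64_t translate(uint64_t vaddr) {
    uint64_t vpn    = vaddr >> PAGE_SHIFT;     /* virtual page number */
    uint64_t offset = vaddr & (PAGE_SIZE - 1); /* offset within the page */
    uint64_t ppn    = page_table[vpn];         /* translation (done in HW or via OS) */
    return (ppn << PAGE_SHIFT) | offset;       /* physical address */
}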

44
Q

What are pages?

A

Memory is allocated in fixed-size chunks called pages (typically 4 KB or 8 KB)

45
Q

What is the page table?

A

Key data structure to support virtual memory.

Maintained by the OS

Two types:
- Forward page tables
- Inverted page tables

46
Q

What are forward page tables?

A

One entry per virtual page, mapping it to a physical page.

This is inefficient, as the virtual address space is much bigger than the physical one

47
Q

What are inverted page tables?

A

One entry for each physical page in memory

48
Q

What is the TLB (translation lookaside buffer)?

A

Hardware implementation of inverted page tables.

A small, highly associative hardware structure, typically with around 64 entries. Essentially a small cache for the page tables in memory.

Often there are separate I-TLBs and D-TLBs

49
Q

What happens on a TLB miss?

A

Need to fetch the address translation from the page table stored in memory (this can be done in HW or in SW).

Note that sometimes the page table may be too large to keep entirely in memory; then the OS might need to be involved to resolve the miss.

50
Q

What is one important thing when using virtual memory?

A

Important that a process cannot access physical memory that is not allocated to it.

Because of this, we must perform the address translation before we send the request to physical memory.

51
Q

How does memory access work with virtual addresses to physical ones?

A

The tag and index of the virtual address are used to access the TLB.

The TLB returns a physical page number.

Concatenate this physical page number with the page offset from the virtual address; this gives the physical address.

52
Q

What does a virtual address consist of?

A

Tag: Used to access TLB
Index: Used to access TLB
Page offset: Address within page

53
Q

What does a physical address consist of?

A

Cache Tag
Cache Index
Block offset

All used to access cache

54
Q

What is memory protection?

A

When running multiple processes, we have pages from multiple processes in memory. We do not want processes to modify each other's memory.

The different processes might use the same virtual addresses, but the OS makes sure they map to different parts of physical memory.

Processes might also share some memory (e.g. a shared library). In that case we do not want a copy in each process's memory; instead the virtual addresses map to the same part of physical memory.

55
Q

What are virtually indexed, physically tagged caches?

A

Make sure the index bits and block offset of the physical address fall within the page offset of the virtual address.

The page offset is the same in the virtual and the physical address.

In this case we can look up the cache (using the index and block offset taken from the page offset, which is identical in the virtual and physical address) at the same time as the TLB lookup, which uses the tag and index of the virtual address.

The physical page number returned by the TLB is then used as the tag: compare it against the tags read from the cache to determine whether we hit.
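
A short sketch of the constraint that makes the parallel lookup safe, with illustrative sizes (4 KB pages, 64-byte blocks, 64 sets assumed):

#include <assert.h>

#define PAGE_OFFSET_BITS  12
#define BLOCK_OFFSET_BITS 6
#define INDEX_BITS        6

/* Index + block offset must fit inside the page offset, so the cache can be
   indexed with the virtual address while the TLB translates the page number. */
static_assert(INDEX_BITS + BLOCK_OFFSET_BITS <= PAGE_OFFSET_BITS,
              "cache index must come from the page offset");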

56
Q

What are the three steps of memory operations?

A

Address calculation
Translation
Mem access

57
Q

How are loads done in out-of-order processors?

A

Loads need to wait for the register with the address info

58
Q

How are stores done in out-of-order processors?

A

Needs to wait for 2 registers (address + value)

59
Q

What is the execution flow of loads/stores in out-of-order processors?

A

Address calculations are done in the FUs.

Second stage: access the TLB to get the translation.

Third stage:
- Load: read the value from cache/memory and write it to the register
- Store: write the value to the store queue; it moves to the store buffer on completion, and is written to cache/memory on retirement

With a virtually indexed, physically tagged cache, the cache lookup can start from the virtual address in parallel with the translation.

60
Q

How does speculative execution affect loads/stores

A

Loads: do not handle a page fault unless we know we are on the correct path

Stores: Because of in-order completion, memory state is not updated speculatively

61
Q

What hazards can occur on out-of-order memory operations?

A

RAW: loads may read data before an earlier store has written its value

WAW and WAR cannot occur because writes happen in program order

62
Q

How can we accelerate loads?

A

Load bypassing
Load forwarding

63
Q

What is load bypassing?

A

Move load before prior stores

64
Q

What is load forwarding?

A

Take the value being stored and forward it to the dependent load without accessing memory.

65
Q

How can out-of-order stores be implemented in hardware

A

Stores are first placed in a reservation station.

Stores are then issued to a store unit.

When a store has both its address and its data, it is finished.

When the store becomes the oldest instruction in the ROB, it is completed.

When the cache is ready to receive the store, it is retired and the address and data are written to the cache.

66
Q

How can out-of-order loads be implemented in hardware?

A

Loads are first in reservation station

Loads are then issued to a load-unit.

The load sits in the load buffer while the address is being looked up in cache.

The cache then returns with the data that is written to a register.

When the load becomes the oldest instruction in the ROB and is ready to commit, the register becomes an architecturally visible register.

Loads are often given priority over stores, as they are often on the critical path.

67
Q

Why do bypassing and forwarding cause complexity?

A

The most recent value may not be in cache/memory, but in the store queue/buffer.

We need to search the store queue/buffer for the most recent value of that memory location.

If we hit in the store queue, we forward the data to the load.

This needs comparators to search the store queue.

There may be multiple stores to the same address; we must pick the most recent one.

This is a content-addressable memory (CAM).
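
A C sketch of the associative store-queue search used for forwarding; the queue layout and field names are assumptions:

#include <stdint.h>
#include <stdbool.h>

#define SQ_ENTRIES 16

struct sq_entry {
    bool     valid;
    bool     data_ready;
    uint64_t addr;
    uint64_t data;
};

/* Entries 0..sq_tail-1, oldest first (simplified; real queues are circular). */
static struct sq_entry store_queue[SQ_ENTRIES];
static int sq_tail;

/* Hardware searches all entries in parallel (CAM); here we scan from youngest
   to oldest so the most recent matching store wins. */
bool forward_from_sq(uint64_t load_addr, uint64_t *data) {
    for (int i = sq_tail - 1; i >= 0; i--) {
        if (store_queue[i].valid &&
            store_queue[i].data_ready &&
            store_queue[i].addr == load_addr) {
            *data = store_queue[i].data;   /* forward without accessing memory */
            return true;
        }
    }
    return false;                          /* fall through to cache/memory */
}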

68
Q

What is a potential problem when doing loads/stores out-of-order?

A

A store prior to a dependent load might not have executed yet.

The store might still be in a reservation station or in flight in the execution units.

In these cases, the value may not yet be in the store queue/buffer.

Even worse, the store address may not even be known yet.

69
Q

How do we implement out-of-order loads in hardware to solve the potential problems when it is dependent on a prior store?

A

Add a structure called the finished load buffer.

When a load has received its data from cache/memory, its address and data are put in the finished load buffer.

When the load completes, i.e. becomes the oldest instruction, it can be removed from the finished load buffer.

Meanwhile, when a store completes, we search the finished load buffer for a load with the same address as the store.

If we find one, that load was dependent on the store and has loaded the wrong data. All instructions that depended on this load, i.e. used the loaded data, are rolled back.

These are then re-executed with the correct data. This is expensive, so we can add some prediction to avoid it.