Comp Arch Flashcards Preview

Flashcards in Comp Arch Deck (35):
1

Effective speed of two layers of memory with different access times.

teff = h*tmem1 + (1 - h)*tmem2
h = hit ratio (number of cache hits / number of memory accesses)
tmem1 = the access time of the closer, faster memory
tmem2 = the access time of the farther, slower memory
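A quick worked example with made-up numbers (95% hit ratio, 2 ns cache, 60 ns main memory), just to illustrate the formula:

    # Effective access time for a two-level memory hierarchy.
    # The values below are illustrative assumptions, not from the card.
    h = 0.95        # hit ratio
    t_mem1 = 2.0    # ns, access time of the closer, faster memory
    t_mem2 = 60.0   # ns, access time of the farther, slower memory

    t_eff = h * t_mem1 + (1 - h) * t_mem2
    print(t_eff)    # 0.95*2 + 0.05*60 = 4.9 ns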

2

Sequential memory access

Read through the data as a stream until the desired data is found. Time depends on the relative positions of the start and the target. (e.g., tape)

3

Direct memory access

Jump to first block/sector, then perform sequential search. Faster than sequential because you can "jump" over some data. (e.g., hard disk)

4

Random memory access

Address identifies location exactly. Access time is consistent and independent of previous access. (e.g., RAM)

5

Associative memory access

Addressing information stored with data. Data located by comparing desired address with address portion of stored elements. Access time is consistent. (e.g., cache)

6

Direct Mapping Cache

- Each block of main memory maps to only one cache line
- Line number is determined with i = j mod m
i = line number, j = block number, m = number of lines in cache
- The memory address is divided into three parts:
the word id
the cache line number
the tag
- PROS:
Simple and inexpensive
No need for a replacement algorithm
No searching, so very fast
- CONS:
Two active blocks that map to the same line thrash, causing a high number of misses
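A minimal sketch (in Python) of the address split for a hypothetical direct-mapped cache with 16-byte blocks and 256 lines; the field widths are assumptions chosen only for illustration:

    # Direct-mapped address layout: | tag | line | word |
    WORD_BITS = 4     # 2^4 = 16 bytes per block (assumed)
    LINE_BITS = 8     # 2^8 = 256 cache lines (assumed)

    def split_address(addr):
        word = addr & ((1 << WORD_BITS) - 1)
        line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)  # i = j mod m
        tag = addr >> (WORD_BITS + LINE_BITS)
        return tag, line, word

    print(split_address(0x0001234C))   # tag=0x12, line=0x34, word=0xC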

7

Fully Associative Cache

- A memory block can load into any line of cache
- Every line's tag must be examined for a match
- The placement algorithm is independent of the size of the cache, but searching gets more expensive and slower as the cache grows.
- Memory address is interpreted as:
Least significant word bits = word position within block
Most significant bits = tag used to identify which block is stored in a particular line of cache
- PROS:
Eliminates thrashing
- CONS:
Every line's tag must be examined for a match...expensive and slow
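A minimal sketch of a fully associative lookup; the list-of-lines structure and the 4-bit word field are assumptions, and the loop only models what the hardware does with parallel comparators:

    # Fully associative: the tag of every valid line is compared against the address.
    WORD_BITS = 4   # assumed 16-byte blocks

    def lookup(lines, addr):
        tag = addr >> WORD_BITS               # most significant bits = tag
        word = addr & ((1 << WORD_BITS) - 1)  # least significant bits = word
        for valid, line_tag, block in lines:
            if valid and line_tag == tag:     # a block can sit in any line
                return block[word]            # hit
        return None                           # miss: fetch from main memory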

8

Set Associative Cache

- Cache is divided into a power of 2 number of sets
- Each set contains k blocks/lines...this is called "k-way" set associative mapping
- Each block of main memory maps to only one cache set, but can be stored in any of the k lines of that set.
- Effect of cache set size on addressing partitioning:
Two lines per set is the most common organization...this is called 2-way associative mapping
A given block can be in one of 2 lines in only one specific set
Significant improvement over direct mapping
- PROS:
Greatly reduces thrashing
Most commonly used for cache implementations
- CONS:
More expensive than direct mapping, but not significantly so
- For 2-way mapping, the LRU replacement algorithm can be implemented with a single USE bit per set: if the first line was used last, USE = 0; if the second line was used last, USE = 1.
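A minimal sketch of a 2-way set-associative lookup with a per-set USE bit; the geometry and names are assumptions for illustration:

    # 2-way set-associative lookup with a per-set USE bit (as on the card).
    # Hypothetical geometry: 16-byte blocks (4 word bits), 128 sets (7 set bits).
    WORD_BITS, SET_BITS = 4, 7

    class Set2Way:
        def __init__(self):
            self.tags = [None, None]   # one tag per line in the set
            self.use = 0               # which of the two lines was used last

    def access(sets, addr):
        block = addr >> WORD_BITS
        index = block & ((1 << SET_BITS) - 1)   # set number = block number mod number of sets
        tag = block >> SET_BITS
        s = sets[index]
        for way in (0, 1):
            if s.tags[way] == tag:              # hit in either line of the set
                s.use = way
                return True
        victim = 1 - s.use                      # LRU: replace the line not used last
        s.tags[victim] = tag
        s.use = victim
        return False

    sets = [Set2Way() for _ in range(1 << SET_BITS)]
    print(access(sets, 0x1234), access(sets, 0x1234))   # False (miss), True (hit)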

9

Least Recently Used (LRU)

Replace the block that hasn't been touched in the longest period of time

10

First in, first out (FIFO)

Replace the block that has been in the cache the longest

11

Least Frequently Used (LFU)

Replace the block that has had the fewest hits

12

Random

Just pick one. Only a slight performance hit compared with history-based algorithms like LRU, FIFO, and LFU.
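A tiny sketch of how the four policies pick a victim; the block names and bookkeeping values are made up for illustration:

    import random

    # Hypothetical per-block bookkeeping for a full cache of four blocks.
    # load_time: when the block entered the cache; last_use: most recent access;
    # hits: number of accesses since it was loaded.
    blocks = {
        "A": {"load_time": 1, "last_use": 9, "hits": 5},
        "B": {"load_time": 2, "last_use": 4, "hits": 1},
        "C": {"load_time": 3, "last_use": 8, "hits": 7},
        "D": {"load_time": 4, "last_use": 7, "hits": 2},
    }

    lru = min(blocks, key=lambda b: blocks[b]["last_use"])    # 'B': untouched longest
    fifo = min(blocks, key=lambda b: blocks[b]["load_time"])  # 'A': in cache longest
    lfu = min(blocks, key=lambda b: blocks[b]["hits"])        # 'B': fewest hits
    rand = random.choice(list(blocks))                        # any of the four
    print(lru, fifo, lfu, rand)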

13

Split cache

- One cache for code and one cache for data
- One cache feeds instruction decoder and one feeds data to execution unit
- Supports pipelined architectures and other mechanisms intended to improve performance.
- Instruction cache doesn't need to worry about writes
- Data cache doesn't need to worry about branch history or decoding information

14

Victim cache

holds blocks evicted from cache due to replacement algorithm

15

Trace cache

holds instructions already decoded from a previous execution

16

Loop buffer

a small trace cache used to speed up the execution of loops

17

Writing to cache problems

- If the cache is written to, main memory becomes invalid; if main memory is written to, the cache becomes invalid. This can occur if I/O devices or other CPUs can address main memory directly.
- Multiple CPUs may have individual caches; once one cache is written to, the copies of that data in the other caches become invalid.

18

Write Through

- All writes go to main memory as well as cache
- Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
- Simpler logic
- Lots of bus traffic
- Slows down writes

19

Write Back (a.k.a. Write Behind)

- Updates initially made in cache only
- A dirty bit or update bit for cache slot is set when update occurs
- If block is to be replaced, write to main memory only if dirty bit is set...sometimes called a "lazy write."
- Problem: other caches get out of sync
- I/O must access main memory through cache
- More complex to implement
- Must track which blocks have been modified (dirty bit) and write back when the block is to be replaced
- This means a read miss in a write-back cache could require two memory accesses:
one to write back the replaced data
one to retrieve the new block of data
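A minimal sketch of the write-back replacement path with a dirty bit per line; the Line structure and the two callbacks are assumptions for illustration:

    # Write-back: main memory is updated only when a dirty line is evicted.
    class Line:
        def __init__(self):
            self.tag, self.data = None, None
            self.valid, self.dirty = False, False

    def read(line, tag, fetch_block, write_back_block):
        if line.valid and line.tag == tag:
            return line.data                         # hit: no memory access
        if line.valid and line.dirty:                # read miss may cost TWO accesses:
            write_back_block(line.tag, line.data)    # 1) lazy write of the old block
        line.tag, line.data = tag, fetch_block(tag)  # 2) fetch of the new block
        line.valid, line.dirty = True, False
        return line.data

    def write(line, tag, data, fetch_block, write_back_block):
        read(line, tag, fetch_block, write_back_block)  # make sure the block is resident
        line.data = data
        line.dirty = True            # cache updated only; main memory is now stale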

20

Third Write Policy - Programmable

- A write back may be triggered in code
- Requires programmer to access cache control hardware to initiate the write back of a specific block.

21

Multiple Processors/Multiple Caches

- Even if a write through policy is used, other processors may have invalid data in their caches.
- In other words, if a processor updates its cache and updates main memory, a second processor may have been using the same data in its own cache, which is now invalid.

22

Solutions to Prevent Problems with Multiprocessor/cache systems

- Bus watching with write through/Distributed Control:
each cache watches the bus to see whether data it contains is being written to main memory by another processor. All processors must be using the write-through policy. On a match, either a "write invalidate" or a "write update" is performed.
- Hardware transparency/Central Control:
a "big brother" watches all caches, and upon seeing an update to any processor's cache, it updates main memory AND all of the caches.
- Non-cacheable memory:
any shared memory (identified by its region in memory) may not be cached.

23

Inclusive cache

- Any block present in a level of the cache must be present in all levels below it
- Simplifies cache coherence
- Performance is limited when the size of the larger caches is not much greater than that of the caches above them

24

Exclusive cache

- If a block is present anywhere, it is only present in one level of the cache system
- More blocks can be stored
- The protocol necessary to support exclusive caches increases cache resource use and communication

25

Non-inclusive cache

Falls between inclusive and exclusive: a block may or may not be present in more than one level of the cache system; neither property is guaranteed.

26

Line Status "Flag" Bits

- Dirty bit:
indicates that one or more words in line have been written to and need to be written to main memory
- Valid bit:
indicates whether line contains valid data loaded from memory
- Lock bit:
Prevents replacement with new block from memory
- Type:
in a unified (non-split) cache, identifies line contents as code or data

27

Pipelining

As with a manufacturing assembly line, the goal of a CPU pipeline is to accomplish the following:
- break the process into smaller steps, each step handled by an independent sub process
- when one sub process completes, result passed to next sub process, and attempt made to begin the next task
- reduces chance of idle components
- multiple tasks being operated on simultaneously improves performance
- NO SINGLE INSTRUCTION IS MADE FASTER
- ENTIRE WORKLOAD CAN BE DONE FASTER

28

Instruction Pre-fetch

- The simplest approach is to divide instruction into only 2 stages:
FETCH instruction
EXECUTE instruction
- There are times when the execution of an instruction doesn't use main memory
- In these cases, use idle bus to fetch next instruction in parallel with execution
- This is called instruction PRE-FETCH
- A JUMP OR BRANCH INVALIDATES PRE-FETCHED INSTRUCTIONS

29

Pipelining Strategy

FETCH INSTRUCTION - read next instruction into buffer
DECODE INSTRUCTION - determine the opcode
CALCULATE OPERANDS - find source operand address
FETCH OPERANDS - get source operands from memory
EXECUTE INSTRUCTION - perform indicated operation
WRITE OPERANDS - store the result

30

Pipeline Disruptions

RESOURCE LIMITATIONS - if the same resource is required for more than one stage of the pipeline, e.g., the system bus
DATA HAZARDS - if the calculated operands of a subsequent instruction depend on the write operands of a previous instruction, the pipe is stalled
CONDITIONAL PROGRAM FLOW - the next instruction after a branch cannot be fetched until we know that we're branching

31

Speedup Factor

Sk = nk/[k + (n - 1)]
k = number of pipeline stages, n = number of instructions
Without a pipeline, n instructions take n*k stage-times; with a pipeline, the first instruction takes k stage-times and each of the remaining n - 1 instructions completes one stage-time later, giving k + (n - 1). As n grows large, Sk approaches k.
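A quick check of the formula, assuming a hypothetical 6-stage pipeline:

    # Pipeline speedup Sk = n*k / (k + (n - 1)) for k stages and n instructions.
    def speedup(n, k):
        return (n * k) / (k + (n - 1))

    k = 6   # number of stages (assumed)
    for n in (1, 10, 100, 10_000):
        print(n, round(speedup(n, k), 2))   # 1.0, 4.0, 5.71, 6.0 -> approaches k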

32

Static Branch Prediction

- Predict always taken
- Predict never taken
- Predict by opcode
- Predict by offset (forward/backward)

33

Dynamic Branch Prediction

Depends on execution history
- Taken/not taken switch
- Branch history table
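One common dynamic scheme is a 2-bit saturating counter per branch, indexed by branch address; the counter and table below are an illustrative assumption, not something stated on the card:

    # 2-bit saturating counter: states 0-1 predict "not taken", 2-3 predict "taken".
    # Two wrong predictions in a row are needed to flip the prediction.
    counters = {}   # keyed by branch instruction address (the branch history table)

    def predict(pc):
        return counters.get(pc, 1) >= 2          # True = predict taken

    def update(pc, taken):
        c = counters.get(pc, 1)
        counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)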

34

Elements Stored in Branch History Table

- Address of the branch instruction itself
- Current prediction
- Bits indicating branch history
- Branch target information, i.e., where do we go if we decide to branch?

35

Factors limiting the realization of an ideal speed-up of the pipeline

- Increased cost
- Increased delay between stages
- Increased consequences of a branch