Computer Systems Flashcards

1
Q

Why is a write-invalidate protocol usually preferred to a write-update protocol?

A
  • Multiple updates to the same cache block with no intervening reads require multiple update broadcasts in an update protocol, which is wasteful over the limited bus bandwidth.
  • The invalidate protocol only requires a single invalidate broadcast on the first write.
  • Minimizing bus traffic is of paramount importance, as the bus is usually the bottleneck that limits the number of processors that can be accommodated.
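The broadcast-count argument can be made concrete with a minimal sketch (the helper functions and write counts are illustrative, not a model of any real protocol):

```python
# Sketch: bus broadcasts needed for n consecutive writes to the same
# cache block with no intervening reads from other processors.

def update_broadcasts(n_writes: int) -> int:
    # Write-update: every write broadcasts the new data on the bus.
    return n_writes

def invalidate_broadcasts(n_writes: int) -> int:
    # Write-invalidate: only the first write broadcasts an invalidate;
    # later writes hit the now-exclusive block with no bus traffic.
    return 1 if n_writes > 0 else 0

for n in (1, 10, 100):
    print(n, update_broadcasts(n), invalidate_broadcasts(n))
```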
2
Q

What is meant by a ‘load-store instruction set architecture’?

A

A load-store ISA is a general-purpose register architecture in which the operands of the arithmetic and logic instructions are located in registers, not memory. The only instructions that can access memory are loads and stores.

3
Q

How does the adoption of a load-store ISA influence the design of CPU hardware?

A
  • Facilitates fixed length, fixed format instructions that are easy to decode
  • Demands a large set of general-purpose registers to store intermediate results
  • Does not admit complex instructions, so the datapath can be controlled without excessive use of microcode.
  • Results in all instructions having similar complexity, which facilitates effective pipelining.
4
Q

Explain why the latency of the ALU is of paramount importance in computer hardware design.

A
  • According to Amdahl’s Law, improving the speed of frequent operations has the greatest effect on overall system performance (i.e. make the common case fast).
  • In most ISAs, the ALU is used at least once in most instructions, hence improving the latency of the ALU will significantly improve the speed of computer systems.
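The argument follows directly from Amdahl's Law; the fraction and speedup below are hypothetical numbers, chosen only to illustrate the calculation:

```python
# Amdahl's Law: overall speedup from accelerating a fraction f of
# execution time by a factor s.
def amdahl_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

# Hypothetical figures: if the ALU is on the critical path for 80% of
# execution time, halving its latency (s = 2) gives:
print(round(amdahl_speedup(0.8, 2.0), 3))  # 1.667
```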
5
Q

What is meant by a pipelined datapath?

A
  • Extra registers are placed between the principal datapath stages.
  • Each instruction advances through just one datapath stage per clock cycle, storing interim results in the registers.
  • In this way, several instructions can be in the pipeline at the same time.
  • Pipelining therefore increases instruction throughput.
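The throughput gain can be sketched with a back-of-the-envelope cycle count (idealized, no stalls; the stage and instruction counts are made-up figures):

```python
# Sketch: cycles to execute n instructions on a k-stage datapath,
# assuming no hazards or stalls (idealized, not a real CPU model).

def unpipelined_cycles(n: int, k: int) -> int:
    return n * k          # each instruction occupies the whole datapath

def pipelined_cycles(n: int, k: int) -> int:
    return k + (n - 1)    # fill the pipe once, then one completion per cycle

print(unpipelined_cycles(1000, 5))  # 5000
print(pipelined_cycles(1000, 5))    # 1004
```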
6
Q

What are pipeline hazards?

A

Dependencies between in-flight instructions that disrupt the operation of the pipeline.

7
Q

Data hazard

A

Occurs when an instruction requires a value before a previous instruction has written its result back to the register file.

8
Q

Branch hazard

A

Occurs when the address of the next instruction is required before an earlier conditional branch instruction has been evaluated.

9
Q

What extra hardware would be required in a pipelined datapath to support superscalar operation?

A
  • Superscalar CPUs issue > 1 instruction/clock cycle.
  • Need to duplicate essential elements (for arithmetic and storage) in the pipeline.
  • Instructions must be sequenced to avoid simultaneous use of resources (e.g. ALU and memory port).
  • Widen pipeline registers to hold the state of two instructions.
  • Widen instruction memory to fetch 64 bits per cycle.
  • Read four input registers and write two results at a time.
  • Extra hazard detection logic.
10
Q

What extra hardware would be required in a pipelined datapath to support SMT operation?

A
  • Extra hardware to deal with each process’s independent state.
  • Separate registers, PC, and TLB per thread.
  • Since issue slots and functional units are shared between threads, we can expect fewer stalls and higher throughput.
  • More cache conflicts and misses due to simultaneous updates.
11
Q

Explain why the inclusion of a cache between the CPU and main memory generally improves a computer’s performance.

A
  • Faster because of physical proximity to the CPU.
  • Constructed from static RAM instead of slower dynamic RAM.
  • Data still needs to be fetched from main memory into the cache, but this overhead is easily amortized through temporal and spatial locality of reference: the main-memory latency is paid just once for many accesses.
12
Q

What are the relative advantages of direct-mapped and set-associative caches?

A
  • DM caches have an index pointing to a unique block (i.e. no searching): low hit time.
  • May evict a block that will be referenced soon, since there is no choice of victim: higher miss rate.
  • SA caches index a set of blocks that must be searched: increased hit time.
  • Flexibility in choosing the block to replace (e.g. LRU): lower miss rate.
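The direct-mapped indexing described above can be sketched as follows (the cache parameters and the example address are illustrative assumptions, not from the cards):

```python
# Sketch: splitting a 32-bit byte address into tag / index / offset for
# a direct-mapped cache. Parameters below are illustrative.
BLOCK_SIZE = 64          # bytes per block  -> 6 offset bits
NUM_BLOCKS = 1024        # blocks in cache  -> 10 index bits

def decompose(addr: int):
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_BLOCKS
    tag = addr // (BLOCK_SIZE * NUM_BLOCKS)
    return tag, index, offset

# A direct-mapped lookup inspects exactly one block at position `index`;
# a w-way set-associative cache would instead search w tags in set
# `index` (with NUM_BLOCKS // w sets), trading hit time for miss rate.
print(decompose(0x12345678))
```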
13
Q

What requirements of a modern computer system motivate the adoption of a virtual memory system?

A
  • Should be able to write programs without having to worry about the amount of physical memory in the computer.
  • CPU should be able to execute multiple processes concurrently, with each process unaware of, and protected from, the others.
14
Q

What is the purpose of the TLB?

A

The need to look up address translations in the page table (held in main memory) would otherwise negate the benefit of a cache: every access would require an extra memory reference.

The CPU contains a small, fast cache called the TLB to store recently used page entries, consisting of an index (i.e. virtual page #) that corresponds to a physical page #.
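A toy model of the lookup, with a made-up page table (a real TLB is a small hardware cache with eviction, dirty and permission bits, not a Python dict):

```python
# Sketch of TLB behaviour: a small cache of recently used page entries.
# Page table contents and sizes here are hypothetical.
PAGE_SIZE = 4096

page_table = {0: 7, 1: 3, 2: 9}    # virtual page # -> physical page #
tlb = {}                           # small, fast subset of page_table

def translate(vaddr: int) -> int:
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in tlb:             # TLB miss: walk the page table
        tlb[vpn] = page_table[vpn] # (a real TLB would also evict an entry)
    return tlb[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1234)))      # vpn 1 -> ppn 3
```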

15
Q

TLB Properties

A
  • Total size: 32-4096 page table entries
  • Block size: 1-2 page entries
  • Hit time: < 1 clock cycle
  • Miss rate: 0.01-1%
16
Q

Interfacing peripherals -
Polling

A
  • Polling requires the least hardware.
  • CPU periodically checks a status register to see if an I/O event needs servicing and handles the transaction if needed.
  • Polling frequency must be high enough to satisfy the device’s maximum data transfer rate.
  • This process is wasteful of CPU time.
  • Used for low-frequency devices, like mice, that can tolerate infrequent polling.
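A rough sketch of why polling cost scales with the required polling rate (the clock rate, per-poll cost, and device rates are all illustrative assumptions):

```python
# Sketch: CPU overhead of polling. All figures are hypothetical.
CLOCK_HZ = 1_000_000_000      # 1 GHz CPU
CYCLES_PER_POLL = 400         # cost of one status-register check

def polling_overhead(polls_per_sec: float) -> float:
    # Fraction of CPU time consumed just by polling.
    return polls_per_sec * CYCLES_PER_POLL / CLOCK_HZ

print(polling_overhead(30))        # mouse-like device: negligible
print(polling_overhead(250_000))   # fast disk-like device: 10% of the CPU
```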
17
Q

Interfacing peripherals -
Interrupt-driven

A
  • Extra signal lines to interrupt CPU when I/O device needs attention.
  • CPU still involved in every transaction, so still heavily loaded for active devices.
  • No CPU load when device is idle, so useful for low bandwidth devices that are mostly idle like printers.
18
Q

Interfacing peripherals -
DMA

A
  • Most expensive in terms of hardware, with dedicated DMA controller.
  • CPU is interrupted by an I/O service request and hands control of the bus to the DMA controller to handle the transaction.
  • CPU interrupted after completion to check if successful.
  • DMA suitable for high-bandwidth devices like disk access.
19
Q

DMA precautions -
Virtual memory

A
  • Raises question of whether to use virtual or physical page addresses.
  • If the controller uses physical addresses, DMA transfers must be constrained to a single page, since contiguous virtual pages do not generally map to contiguous physical pages.
  • If it uses virtual addresses, the DMA controller needs the necessary page translations for the transfer.
20
Q

DMA precautions -
Cache

A
  • If DMA writes data directly from disk to memory, data in the cache will be stale and unsuitable for the CPU.
  • If cache is write-back, DMA may transfer stale data from memory to disk, bypassing newer cache data.
21
Q

DMA cache solutions

A
  • Route all DMA activity through the cache, which is wasteful of cache space and CPU bandwidth.
  • Have the OS invalidate the cache for an I/O write, or force write-backs for an I/O read.
  • Selectively flush or invalidate individual cache entries for DMA.
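A toy model of the second solution, OS invalidation on an I/O write to memory (illustrative only; real kernels use cache-maintenance instructions, not dicts):

```python
# Sketch: OS-managed coherence around a DMA input transfer.
cache = {0x1000: "old"}            # addr -> cached copy
memory = {0x1000: "old"}           # addr -> main-memory contents

def dma_write_to_memory(addr, data):
    memory[addr] = data            # DMA bypasses the cache...
    cache.pop(addr, None)          # ...so the OS invalidates the stale line

dma_write_to_memory(0x1000, "new")
# CPU read now misses in the cache and fetches the fresh data:
print(cache.get(0x1000, memory[0x1000]))  # prints "new"
```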