Architectures - Pipelining Flashcards

1
Q

What is a pipelined architecture?

A

An architecture in which more than one instruction is executed in parallel by the processor.
Every clock cycle, each instruction moves on to the next of the five stages it is composed of. The instructions being executed in parallel, whatever their current stage, have to be synchronized.

2
Q

What stages is an instruction composed of?

A
  1. Instruction Fetch
  2. Instruction Decode
  3. Execute
  4. Memory
  5. Write Back
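The overlap of the five stages can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: one instruction is fetched per cycle, there are no stalls, and the names `stage_at` and `pipeline_diagram` are invented for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) during
    clock cycle `cycle` (1-based), or None if it is not in the pipeline."""
    offset = cycle - 1 - instr_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None

def pipeline_diagram(n_instructions):
    """One row per instruction, one column per clock cycle."""
    total_cycles = n_instructions + len(STAGES) - 1
    return [
        [stage_at(i, c) or "" for c in range(1, total_cycles + 1)]
        for i in range(n_instructions)
    ]

if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(3)):
        print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

Each row is shifted one cycle to the right of the previous one, which is why several instructions are in flight in the same cycle.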
3
Q

What is the machine cycle time?

A

The time the processor requires to execute one stage. It is determined by the slowest stage

4
Q

What is the CPI?

A

clock cycles per instruction: the number of clock cycles required to complete an instruction
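For a pipelined run, the figure usually quoted is the average CPI over many instructions. A minimal sketch, assuming an ideal pipeline that fetches one instruction per cycle (the function names and the stall count are hypothetical):

```python
def total_cycles(n_instructions, n_stages=5, stall_cycles=0):
    """Cycles to run n instructions: the first one needs n_stages cycles,
    every following one completes one cycle later, plus any stall cycles."""
    return n_stages + (n_instructions - 1) + stall_cycles

def average_cpi(n_instructions, n_stages=5, stall_cycles=0):
    """Average clock cycles per instruction over the whole run."""
    return total_cycles(n_instructions, n_stages, stall_cycles) / n_instructions

if __name__ == "__main__":
    print(average_cpi(100))                   # no stalls
    print(average_cpi(100, stall_cycles=20))  # 20 stall cycles (assumed)
```

As the instruction count grows, the average CPI of the ideal pipeline approaches 1 even though each single instruction still takes 5 cycles.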

5
Q

What is the throughput? How are the un-pipelined throughput and the pipelined one related?

A

throughput = # of instructions that exit the pipeline per unit of time

Throughput (ideal pipeline) = Throughput (un-pipelined) * # of stages

The time required to execute a single instruction is higher in a pipelined architecture, but the overall throughput is higher as well.

The per-instruction time is higher because of the pipeline control overhead
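The relation between stage delays, overhead and speedup can be checked numerically. The sketch below assumes the un-pipelined machine takes the sum of the stage delays per instruction and the pipelined one is clocked at the slowest stage plus a fixed overhead; the delay values and function names are made up for the example.

```python
def unpipelined_time(stage_delays):
    """Time per instruction without pipelining: all stages in sequence."""
    return sum(stage_delays)

def pipelined_cycle_time(stage_delays, overhead=0.0):
    """Clock period of the pipeline: slowest stage plus register delay and skew."""
    return max(stage_delays) + overhead

def pipelined_latency(stage_delays, overhead=0.0):
    """Time a single instruction spends in the pipeline."""
    return len(stage_delays) * pipelined_cycle_time(stage_delays, overhead)

def speedup(stage_delays, overhead=0.0):
    """Throughput ratio pipelined / un-pipelined, ignoring hazards."""
    return unpipelined_time(stage_delays) / pipelined_cycle_time(stage_delays, overhead)

if __name__ == "__main__":
    balanced = [10, 10, 10, 10, 10]   # ns, assumed values
    print(speedup(balanced))                 # ideal case: equals the number of stages
    print(speedup(balanced, overhead=1))     # overhead lowers the speedup
    print(pipelined_latency(balanced, overhead=1), unpipelined_time(balanced))
```

With a non-zero overhead the single-instruction latency exceeds the un-pipelined time, while the throughput is still several times higher.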

6
Q

How many clock cycles does an instruction require?

A

5
4 only for branch instructions

7
Q

What do we need at the memory level to implement a pipeline? Why?

A

To implement a pipeline we need to work on different instructions at the same time without “breaking” memory coherence, so we need new structures:
4 pipeline registers:
1. IF - ID
2. ID - EX
3. EX - MEM
4. MEM - WB

8
Q

What is a hazard? Which types of hazards are there?

A

A hazard is a situation in which the processor cannot complete the execution of an instruction in the designed time

  • Structural
  • Data
  • Control
9
Q

Structural hazards: characteristics, example, solutions

A

Resource conflicts

There is a structural hazard when the processor cannot complete the operation of a certain stage in one clock cycle.

Examples:
- there is just one register-file write port but there are cycles in which two registers writes are required
- there is a single-port memory but different instructions try to access the memory together

Solutions:
- adding new hardware or improving the existing one
-> a trade-off between performance and cost is required, based on how frequent the hazards are

10
Q

Data hazards: characteristics, example, solutions

A

An instruction depends on the result of a previous one and the two are close enough to overlap in the pipeline -> the pipelined execution could change the order of the read and write operations, and this could lead to wrong results or nondeterministic behaviour

This can happen for both register operands and memory operands.
For memory operands, a data hazard can happen if stores and loads are not performed in the same stage, or if execution can go on while an instruction is waiting for a cache miss to be resolved.

Example:
ADD R1, r2, r3
SUB r4, R1, r5 -> reads R1 2 stages before it is available
AND r6, R1, r7 -> reads R1 1 stage before it is available
OR r8, R1, r9 -> OK because the write is done before the read
XOR -> OK

Solutions:
- stalling the instructions that require some data until they are available
- forwarding (or bypassing) technique
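The example above can be reproduced with a small checker. It is a sketch under stated assumptions: a 5-stage pipeline without forwarding, registers read in ID and written in WB, the write happening before the read within a cycle, and no stall propagation (each instruction is judged as if nothing before it had stalled). The operands of XOR are not given in the card, so `r10`, `r11` are hypothetical.

```python
def cycles_too_early(program):
    """For each instruction, how many cycles too early it would read a
    source register if the pipeline never stalled and had no forwarding.

    program: list of (opcode, dest, src1, src2) tuples in program order.
    """
    result = []
    for i, (_, _, *sources) in enumerate(program):
        worst = 0
        # A producer 1 instruction ahead is read 2 cycles too early,
        # 2 ahead is read 1 cycle too early, 3 or more ahead is safe.
        for distance in (1, 2):
            j = i - distance
            if j >= 0 and program[j][1] in sources:
                worst = max(worst, 3 - distance)
        result.append(worst)
    return result

if __name__ == "__main__":
    program = [
        ("ADD", "r1", "r2", "r3"),
        ("SUB", "r4", "r1", "r5"),
        ("AND", "r6", "r1", "r7"),
        ("OR",  "r8", "r1", "r9"),
        ("XOR", "r10", "r1", "r11"),  # operands assumed for illustration
    ]
    for instr, early in zip(program, cycles_too_early(program)):
        print(instr[0], "OK" if early == 0 else f"{early} cycle(s) too early")
```

A non-zero entry is the number of stall cycles that instruction would need if stalling were the only remedy.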

11
Q

What is the forwarding technique and what is it used for?

A

This technique requires special hardware that detects when a previous ALU operation writes a register that the current ALU operation reads.
In such a case the special hardware selects the result directly from the ALU output rather than from the register file.

The hardware must:
- forward data from any pipeline register to any input of any functional unit
- not forward anything if the next instruction is stalled or an interrupt occurred
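The selection made by the forwarding hardware can be sketched as a function. This is a simplified model, not a full design: it assumes a MIPS-like machine where register 0 is hard-wired to zero, it only looks at the EX/MEM and MEM/WB pipeline registers, and it leaves out the suppression on stalls and interrupts. The field names `reg_write` and `rd` are assumptions for the example.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where an operand of the instruction in EX should come from.

    ex_mem, mem_wb: dicts with 'reg_write' (bool) and 'rd' (int) describing
    the instructions currently held in those pipeline registers.
    Returns 'EX/MEM', 'MEM/WB' or 'REGFILE'.
    """
    # The most recent producer wins, so EX/MEM is checked first.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"

if __name__ == "__main__":
    ex_mem = {"reg_write": True, "rd": 1}   # e.g. ADD r1, r2, r3
    mem_wb = {"reg_write": True, "rd": 4}
    print(forward_select(1, ex_mem, mem_wb))
    print(forward_select(4, ex_mem, mem_wb))
    print(forward_select(7, ex_mem, mem_wb))
```

Checking EX/MEM before MEM/WB matters when both hold a write to the same register: the younger result is the one the current instruction must see.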

12
Q

Can all hazards be solved using data forwarding?

A

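No! Forwarding cannot be done back in time: if a result is produced after the cycle in which the dependent instruction needs it (e.g., a load immediately followed by an instruction that uses the loaded value), a stall is still required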

13
Q

What happens when an instruction is stalled?

A

The following instructions are also stalled.
The previous ones continue.

A bubble is introduced

14
Q

How do we implement a stall for an instruction in the ID stage?

A

We insert a bubble in the EX stage:
- nop instruction: the control signals in the ID/EX pipeline register are forced to zero
- forcing the pipeline register IF/ID to keep the same value
- not updating the PC’s value -> the same instruction is fetched again
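The three actions above map onto three control signals. The sketch below uses a load-use dependency as the trigger, which is one typical reason to stall an instruction in ID; the signal and field names are assumptions chosen for the example, not taken from the card.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in EX is a load whose destination register
    is a source of the instruction currently in ID."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))

def stall_controls(stall):
    """Control signals that implement a one-cycle stall of the ID stage."""
    return {
        "pc_write": not stall,          # PC keeps its value -> same IF
        "if_id_write": not stall,       # IF/ID register keeps its value
        "id_ex_control_zeroed": stall,  # a nop (bubble) enters EX
    }

if __name__ == "__main__":
    stall = load_use_hazard(True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
    print(stall, stall_controls(stall))
```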

15
Q

Control hazards: characteristics, example, solutions

A

They are related to conditional and unconditional branches that could change the PC when the following instructions have already been fetched

Solution (basic): when a branch instruction is detected the pipeline is stalled. The stall is kept short by deciding earlier whether the branch has to be taken (by moving the comparison unit one stage earlier) and by computing the new PC value earlier (by adding an adder next to the comparison unit)

16
Q

What do branches lead to? How can we reduce it?

A

Performance degradation

Techniques to reduce it:
- freezing the pipeline: stalling the pipeline when a branch instruction is detected -> the stall is shortened by deciding earlier and updating the PC -> the comparison unit is moved one stage earlier and an adder is added next to it
- predict taken: if the target address is known before the branch outcome, it may be possible to assume the branch is taken
- predict untaken: the branch is assumed not to be taken. If it turns out to be taken, the instructions fetched after the branch are undone. If the branch is not taken, there are no changes in the pipeline status
- delayed branches: the processor decodes the branch instruction but does nothing related to it yet. The processor just fills some “branch-delay” slots with instructions that have to be executed no matter the outcome -> with this technique only 30% of the branches produce a penalty. Several RISC architectures no longer support this technique
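The performance degradation can be estimated with the usual stall formula, effective CPI = base CPI + branch frequency * fraction of branches that pay a penalty * penalty cycles. In the sketch below the 30% figure is the one quoted for delayed branches, while the branch frequency and the one-cycle penalty are assumed values for illustration.

```python
def effective_cpi(branch_fraction, penalty_cycles, penalized_fraction=1.0, base_cpi=1.0):
    """Average CPI once branch stalls are added to the base CPI."""
    return base_cpi + branch_fraction * penalized_fraction * penalty_cycles

if __name__ == "__main__":
    branch_fraction = 0.2   # assumed: 20% of the instructions are branches
    penalty = 1             # assumed: 1 stall cycle per penalized branch
    print(effective_cpi(branch_fraction, penalty))                          # freeze on every branch
    print(effective_cpi(branch_fraction, penalty, penalized_fraction=0.3))  # delayed branches
```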
17
Q

How do we manage to execute multi cycle operations?

A

By modifying the EX stage: it is composed of different functional units, and it can be repeated as many times as needed.

It is composed of:
- 1 integer unit
- a floating point and integer multiplier, pipelined in 7 stages (M1 - M7)
- a floating point adder, pipelined in 4 stages (A1 - A4)
- a floating point or integer divider block

extended structure of the EX stage -> more frequent hazards

operations have longer latency

18
Q

What is latency?

A

the number of cycles that must elapse between an instruction that produces a result and an instruction that uses it

19
Q

What is the initiation interval?

A

number of cycles that must elapse between issuing two operations of the same type to the same unit
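Latency and this interval constrain different things, which a short sketch makes visible. It assumes latency is counted as the number of intervening cycles, so a latency of 0 lets the consumer issue in the very next cycle; the numeric values in the usage lines are illustrative assumptions, not figures from the cards.

```python
def issue_cycles(n_ops, interval, first_cycle=0):
    """Earliest issue cycles for n operations of the same type sent to the
    same unit, given the minimum interval between two of them."""
    return [first_cycle + k * interval for k in range(n_ops)]

def earliest_consumer_issue(producer_issue, latency):
    """Earliest cycle in which an instruction that uses the result can issue."""
    return producer_issue + latency + 1

if __name__ == "__main__":
    print(issue_cycles(3, interval=1))    # fully pipelined unit (assumed)
    print(issue_cycles(3, interval=25))   # non-pipelined divider (assumed)
    print(earliest_consumer_issue(0, latency=3))
```

A fully pipelined unit accepts a new operation every cycle even when its latency is long, whereas a non-pipelined unit forces same-type operations apart.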

20
Q

Structural hazards in case of multi-cycle operations

A

they happen because:
- the divide unit is not pipelined and several instructions could need it at the same time
- instructions have varying running times
- a single cycle may require more than 1 register write

solutions:
- adding write ports ($$$)
- detecting the structural hazard and stalling the instructions, either in the ID stage or before they enter the MEM or WB stage

21
Q

Data hazards in case of multi-cycle operations

A

Operations have longer latency, so data hazards are more frequent

Read After Write (RAW): stalls are more frequent because operations have longer latency

Write After Write (WAW): possible because instructions no longer reach WB in order; the result is a wrong overwrite
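Out-of-order arrival at WB can be detected by comparing write-back cycles. The following sketch assumes one instruction is fetched per cycle, every instruction goes through IF, ID, a variable number of EX cycles, MEM and WB, and nothing stalls; the instruction names and EX lengths in the usage lines are illustrative.

```python
def write_back_cycle(fetch_cycle, ex_cycles):
    """Cycle in which the instruction reaches WB: IF, ID, ex_cycles of EX, MEM, WB."""
    return fetch_cycle + 3 + ex_cycles

def waw_hazards(program):
    """Pairs (earlier, later) that write the same register where the later
    instruction would write no later than the earlier one.

    program: list of (name, dest, ex_cycles); instruction k is fetched in cycle k.
    """
    hazards = []
    for i, (name_i, dest_i, ex_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            name_j, dest_j, ex_j = program[j]
            if dest_i == dest_j and write_back_cycle(j, ex_j) <= write_back_cycle(i, ex_i):
                hazards.append((name_i, name_j))
    return hazards

if __name__ == "__main__":
    # A 7-cycle multiply followed by a 4-cycle add to the same register.
    print(waw_hazards([("MUL.D", "F2", 7), ("ADD.D", "F2", 4)]))
    print(waw_hazards([("ADD.D", "F2", 4), ("MUL.D", "F2", 7)]))
```

In the first case the add would write F2 before the multiply, so the older value would end up in the register.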

22
Q

MIPS R4000 Processor

A
  • 64 bit micro-processor
  • similar to MIPS64
  • pipeline with 8 stages to account for slower cache access and higher clock frequency -> memory accesses are decomposed in several stages
23
Q

Exceptions and Interrupts, difference

A
  • exceptions: they refer to an internal CPU event such as floating point overflow, MMU fault, trap…
  • interrupt: they refer to an external I/O event such as an I/O device request or a reset
24
Q

How can we classify exceptions?

A
  • synchronous: triggered always at the same point in the code
  • asynchronous: generated by external devices
  • user requested: similar to procedures
  • user coerced: out of the control of the user program
  • user maskable: the user can force the hardware not to answer to the exception request
  • user not-maskable
  • within instructions
  • between instructions
  • resume
  • terminate
25
Q

What is a restartable machine?

A

A machine that is able to handle an exception, save the state, and restart the program without affecting its execution

26
Q

Which are the steps to stop and restart an execution when an exception occurs?

A
  1. force a trap instruction in the next IF stage
  2. until the trap is taken, all the writes of the instruction that raised the exception and of all the following ones have to be turned off
  3. the exception-handling procedure receives control and immediately saves the PC of the faulting instruction
  4. there are special instructions that return the machine from the exception by reloading the PC and restarting the instruction stream
27
Q

Interrupt protocol in 80x86

A
28
Q

Interrupt protocol in ARM

A
29
Q

Precise and Imprecise exceptions

A

Precise:
- the pipeline can be stopped so that the instructions before the faulting one are completed and the instructions following the faulting one can be restarted from scratch
- they are slower
- for integer instructions they are the default
- they are complex to guarantee with multi cycle instructions

Imprecise:
- they do not satisfy the above conditions

30
Q

Solution for imprecise exceptions

A
  • accepting them
  • providing two operating modes: fast and imprecise + slow and precise
  • forcing the FP units to determine early whether an instruction could cause an exception, and issuing further instructions only if the previous ones are guaranteed not to cause an exception
  • buffering the result of each instruction until all the previous ones are completed
31
Q

Which are the MIPS possible sources for exceptions?

A
32
Q

What are two simultaneous exceptions?

A
33
Q

What do we mean by exception order?

A

there are cases in which two exceptions can occur in the opposite order to that of the instructions they relate to

34
Q

Which of the write and read operations is executed first?

A

The write one

35
Q

Which are two of the most important pipeline limits?

A
  • we need balanced stages
  • pipeline overhead: pipeline registers delay + clock skew
36
Q

Pipeline overhead

A

pipeline registers delay + clock skew

37
Q

clock skew

A

the same clock signal arrives at different components at slightly different times

38
Q

Situation - action image at page 21

A
39
Q

What is the compiler's role if the hardware supports the predict taken and predict untaken techniques?

A

To maximize the chance that the processor makes the right prediction

  • for -> predict untaken
  • do while -> predict taken
40
Q

Solution to exception order

A
  • each instruction has a status flag
  • this flag is set if the instruction generates an exception and if it is set this instruction cannot perform any write operation
  • at the last stage, if the flag is set, an exception is triggered
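The effect of the status flag can be sketched by comparing the order in which faults are detected with the order in which they are acted upon. The model assumes a 5-stage pipeline with one instruction fetched per cycle and no stalls; the function names and the fault descriptions are invented for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def detection_cycle(instr_index, stage):
    """Clock cycle (1-based) in which instruction `instr_index` is in `stage`."""
    return instr_index + STAGES.index(stage) + 1

def detection_order(faults):
    """Instruction indices sorted by the time their fault is detected.

    faults: dict mapping instruction index -> (stage, cause).
    """
    return sorted(faults, key=lambda i: detection_cycle(i, faults[i][0]))

def handling_order(faults):
    """With the status flag, a fault is only acted upon when its instruction
    reaches the last stage, and instructions get there in program order."""
    return sorted(faults)

if __name__ == "__main__":
    faults = {
        0: ("MEM", "data page fault"),        # older instruction, late stage
        1: ("IF", "instruction page fault"),  # younger instruction, early stage
    }
    print(detection_order(faults))  # the younger instruction faults first in time
    print(handling_order(faults))   # but the older one is handled first
```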
41
Q

What is a committed instruction?

A

An instruction is considered committed when it has successfully completed all stages of execution, and its effects (state changes) have been fully applied to the processor’s state (e.g., registers, memory). Committing an instruction implies that the processor has reached a point where the instruction’s effects are permanent and won’t be rolled back.

42
Q

Which problems do we have related to the state of the machine and to the ability to implement precise exceptions? Solutions?

A

Some instructions, like those using autoincrement addressing modes, alter the machine state (e.g., updating a pointer or counter) before the instruction has fully completed. If such an instruction is aborted due to an exception (an error or special condition requiring special handling), it leaves behind changes in the machine state, making it difficult to revert the system to a consistent state.

Precise exceptions require the ability to accurately track and undo the effects of instructions if an exception occurs. For instructions that alter the state early in their execution, implementing precise exceptions is challenging because the system must be able to roll back these changes if the instruction cannot be committed due to an exception.

Solutions:
- Roll-Back Mechanism: A solution to this problem is to allow the system to roll back all state changes made by an instruction if it cannot be committed. This requires keeping track of all changes and having the ability to undo them, which adds complexity to the processor design.
- Condition Codes and Data Hazards: Instructions that implicitly update condition codes (flags used for branching decisions) introduce additional complications. They can cause data hazards (situations where instruction execution depends on the result of a previous instruction that has not yet completed). Managing these requires saving and restoring condition codes in the event of an exception, adding overhead and complexity.
- Compiler Challenges: These complications also affect compiler design, as the compiler must manage the scheduling of instructions to fill potential delay slots (gaps in the pipeline caused by waiting for condition codes to be updated) and ensure correct program execution.

43
Q

How could we implement complex instructions in a pipeline?

A

Complex instructions, especially those that do not have uniform length or behavior, are difficult to implement in a pipelined architecture, where multiple instructions are partially executed in parallel at different stages. One strategy to address this is to pipeline the microinstructions (simpler instructions that together implement a complex instruction), although this introduces its own set of challenges.