8.0 Microarchitectural design options Flashcards

1
Q

What is scoreboarding?

A

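Scoreboarding is a hardware technique for detecting hazards among in-flight instructions. It is mostly used in combination with a superscalar in-order pipeline.

It assumes that register renaming is not used, so WAW and WAR hazards exist and must be handled.

It provides precise exceptions, as long as results are written back in program order.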

2
Q

How is scoreboarding implemented?

A

Extend the pipeline by splitting the issue stage in two:
- issue
- operand fetch

Add a scoreboard: a table that keeps track of every instruction that is in flight.

Scoreboarding is a centralized technique for hazard detection.

3
Q

Describe the issue stage of scoreboarding

A

Check if the functional unit where the instruction can run is available.

Check that no active instructions have the same destination register.

This takes care of WAW and structural hazards.
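The two issue checks can be sketched as follows. This is a minimal illustration, not a model of any real processor: the `Scoreboard` class, its fields, and the unit and register names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    # Hypothetical bookkeeping: functional units currently occupied, and
    # destination registers that an in-flight instruction will still write.
    busy_units: set = field(default_factory=set)
    pending_dests: set = field(default_factory=set)

    def can_issue(self, unit: str, dest: str) -> bool:
        structural_hazard = unit in self.busy_units   # unit not available
        waw_hazard = dest in self.pending_dests       # same destination in flight
        return not (structural_hazard or waw_hazard)

    def issue(self, unit: str, dest: str) -> None:
        self.busy_units.add(unit)
        self.pending_dests.add(dest)


sb = Scoreboard()
sb.issue("mul", "r1")              # a multiply writing r1 is now in flight
print(sb.can_issue("mul", "r2"))   # multiplier is busy: structural hazard
print(sb.can_issue("alu", "r1"))   # r1 has a pending write: WAW hazard
print(sb.can_issue("alu", "r2"))   # free unit, free destination
```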

4
Q

Describe the operand fetch stage of scoreboarding

A

An instruction can fetch its operands when they’re available.

Operands are available when no in-flight instruction is still in the process of writing to them.

This takes care of RAW hazards.
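A minimal sketch of the operand-availability check. The function name is hypothetical, and it assumes `pending_dests` holds only the destination registers of older instructions that have not written back yet.

```python
def operands_ready(sources, pending_dests):
    """True when no older in-flight instruction still has to write a source register."""
    return all(src not in pending_dests for src in sources)


pending = {"r1"}                                # an older instruction will write r1
print(operands_ready(["r2", "r3"], pending))    # no pending writer: can fetch
print(operands_ready(["r1", "r3"], pending))    # RAW hazard on r1: must wait
```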

5
Q

Describe the execution stage of scoreboarding

A

The functional unit executes the instruction.

When complete, the scoreboard is notified.

No registers are read or written in this stage, so it introduces no new hazards.

6
Q

Describe the writeback stage of scoreboarding

A

Writing must be delayed if a preceding instruction, in program order, has not yet fetched its operands and reads the register that is about to be written. This takes care of WAR hazards.

Writeback may cause out-of-order completion. A younger instruction that goes to a shorter-latency functional unit may complete before an older instruction in a longer-latency unit. If the older instruction then raises an exception, we do not get precise exceptions.

To ensure precise exceptions, we require instructions to write their results in program order.
The results are buffered in the pipeline registers until they can be written. Because of this, instructions must be issued in order within a functional unit to avoid deadlocks.

Out-of-order execution is still possible across different functional units, but not within one.
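Both writeback rules can be sketched in a few lines. The function names are hypothetical, and the second function assumes that several buffered results may be written in the same cycle.

```python
def war_safe(dest, older_unread_sources):
    """True if no older instruction that has not yet fetched its operands reads `dest`.

    older_unread_sources: one list of source registers per such instruction.
    """
    return all(dest not in sources for sources in older_unread_sources)


def in_order_write_cycles(finish_cycles):
    """finish_cycles[i] is the cycle in which instruction i (program order)
    finishes executing. Returns the cycle in which each result is written
    when writes must follow program order."""
    write_cycles = []
    latest = 0
    for done in finish_cycles:
        # A finished result stays buffered until all older results are written.
        latest = max(latest, done)
        write_cycles.append(latest)
    return write_cycles


print(war_safe("r1", [["r1", "r2"]]))      # an older instruction still reads r1
print(in_order_write_cycles([10, 3, 4]))   # younger results wait for the slow one
```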

7
Q

What is the difference between scoreboarding and dataflow execution with Tomasulo's algorithm?

A

Scoreboarding is a centralized scheme, whereas Tomasulo is distributed. Once the work has been distributed across the reservation stations, it proceeds without any further coordination.

Tomasulo uses register renaming (hazard handling in the frontend), while scoreboarding takes care of the hazards in the issue and writeback stages.

Scoreboarding executes instructions in order within a functional unit, whereas Tomasulo executes instructions as soon as they are ready, which means they can execute out of order.

Tomasulo dispatches instructions in order (inserting them into the ROB and the reservation stations at the same time). Scoreboarding issues instructions to different functional units out of order, once their dependences are met and the units are available.

8
Q

Describe the behaviour of in-order execution.
Name a limitation and an advantage.

A

Instructions are executed in program order, where the order is determined by the software (the programmer).

Limitations:
- One stalled instruction blocks the entire pipeline.
- The instruction-level parallelism (ILP) that the hardware can exploit is limited.

Advantages:
- ILP analysis is offloaded to the compiler, which saves hardware complexity.
- Less complex hardware can allow a higher clock speed.
- Lower power consumption because of the lower hardware complexity.

These types of processors can, for example, be useful in embedded systems.

9
Q

What are the different orders of instructions we find in a pipeline?

A

Issue order
Dispatch order
Execution order
Completion order (commit or writeback)

10
Q

What affects how much you can deviate from program order?

A

The microarchitectural techniques implemented in the given processor.

As long as hazards are respected, any ordering is allowed.

11
Q

What are some techniques that can be used when designing a processor?

A

Dataflow execution (Tomasulo)
Scoreboarding
Prediction and speculation
Caching
Register renaming

Depending on how these techniques are combined, they achieve different design points (area, performance, power, cost).

12
Q

Describe an architecture where register renaming is combined with scoreboarding

A

WAW and WAR hazards are handled automatically by renaming.

Instruction scheduling becomes simpler, at the expense of more physical area and added register-management complexity.

The issue and operand fetch stages are merged: the separation existed to handle one of the hazards that renaming now removes. Register renaming happens before the merged issue/operand fetch stage.

Issue stage:
- Issue when the operands are available
- Issue when the functional unit is available
- Handles RAW and structural hazards

Writeback stage:
- No need to stall the functional unit
- Write the result back to the rename register
- The rename register is promoted to architectural register when the instruction is the oldest in-flight instruction

The execution stage remains the same.
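The effect of renaming on the three hazard types can be illustrated with a small rename table. This is a sketch under simplifying assumptions: the `Renamer` class is hypothetical, physical registers are never freed, and `rename` raises an `IndexError` if the free list runs out.

```python
class Renamer:
    def __init__(self, arch_regs, num_phys):
        # Initially, architectural register i maps to physical register i.
        self.map = {reg: i for i, reg in enumerate(arch_regs)}
        self.free = list(range(len(arch_regs), num_phys))

    def rename(self, dest, sources):
        # Read the current source mappings first, then allocate a fresh
        # physical register for the destination.
        phys_sources = [self.map[s] for s in sources]
        phys_dest = self.free.pop(0)
        self.map[dest] = phys_dest
        return phys_dest, phys_sources


r = Renamer(["r1", "r2", "r3"], num_phys=6)
i1 = r.rename("r1", ["r2", "r3"])   # r1 <- r2 op r3
i2 = r.rename("r2", ["r1"])         # r2 <- f(r1): RAW on r1, WAR on r2 against i1
i3 = r.rename("r1", ["r3"])         # r1 <- g(r3): WAW on r1 against i1
print(i1, i2, i3)
```

After renaming, `i2` still reads the physical register that `i1` writes (the RAW dependence is kept), while the WAR and WAW conflicts disappear because each destination gets its own physical register.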

13
Q

What is a dependency chain?

A

A sequence of instructions that depend on each other.

14
Q

What is a common pattern of dependency chains?

A
1. An instruction computes the address of a load (address-generating instruction, AGI)
2. Data is loaded from memory
3. Computation is performed
4. The result is written back to memory

Within loops, steps 1 and 2 of later iterations can be overlapped with steps 3 and 4 of earlier iterations.

15
Q

What is an AGI?

A

Address-generating instruction: an instruction that computes the address used by a load.

16
Q

When looking at the performance potential, what are some observations that can be made?

A

1: Memory-hierarchy parallelism is key to achieving performance, because of the huge gap between computational speed and memory latency. If we can hide memory latency, performance increases significantly.

2: By allowing loads and AGIs to go out of order relative to the rest of the program, a significant part of the performance gap between in-order stall-on-use and dataflow out-of-order can be closed. AGIs and loads execute in order with respect to their own stream, but out of order with respect to the rest of the program. Put AGIs and loads in their own queue and let it execute independently of the other instruction queue. This requires the use of speculation.

17
Q

How can we identify AGIs and loads?

A

Loads: easy, these are just the load instructions.

Some simplifying assumptions:
- Focus on loops (there is not much to gain without loops)
- We need to know whether an instruction is an AGI, but not necessarily which load it feeds

Solution: Iterative Backward Dependency Analysis (IBDA):
- When we see a load or a previously identified AGI, mark the instruction it depends on as an AGI
- In each loop iteration, the identified dependency chain grows by one instruction
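The IBDA steps above can be sketched as a small simulation. Everything here is an illustrative assumption: the tuple layout `(pc, dest, sources, is_load)`, the five-instruction loop, and the rule that a load lists only its address registers as sources.

```python
def ibda(loop_body, iterations):
    """Return the set of identified AGIs (as sorted PCs) after each iteration."""
    loads = {pc for pc, _, _, is_load in loop_body if is_load}
    marked = set()       # PCs identified as AGIs so far
    last_writer = {}     # register -> PC of its most recent producer
    history = []
    for _ in range(iterations):
        for pc, dest, sources, is_load in loop_body:
            if is_load or pc in marked:
                # Mark the producers of this instruction's sources as AGIs.
                for src in sources:
                    producer = last_writer.get(src)
                    if producer is not None and producer not in loads:
                        marked.add(producer)
            if dest is not None:
                last_writer[dest] = pc
        history.append(sorted(marked))
    return history


loop = [
    (0, "r2", ["r1"], False),         # r2 = r1 * 8      (index scaling)
    (1, "r3", ["r2", "r0"], False),   # r3 = r2 + r0     (add base address)
    (2, "r4", ["r3"], True),          # r4 = load [r3]
    (3, "r5", ["r5", "r4"], False),   # r5 = r5 + r4     (compute, not an AGI)
    (4, "r1", ["r1"], False),         # r1 = r1 + 1      (loop counter)
]
history = ibda(loop, 4)
print(history)
```

In this example the identified set grows by one instruction per iteration until the whole address chain (PCs 1, 0, then 4) is found. The accumulate instruction at PC 3 is never marked.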

18
Q

Describe the Load Slice Core

A

2 decode stages

Uses renaming

Instruction slice table

Instruction dependency table

Has two issue queues: A queue and B queue

ALU and load/store are separate units

19
Q

What are the scheduling rules of the Load Slice Core?

A

1.
Once identified, loads and AGIs go to the bypass (B) instruction queue. Some AGIs will be missed in early loop iterations.
Loads and AGIs run ahead of the compute instructions, so memory accesses overlap with computation.

2.
The rest of the instructions go in the A instruction queue.

3.
Stores go to both the A and B queues to avoid hazards.

Instructions are executed in order within each queue, but (potentially) out of order overall.
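The steering rules can be written as one small function. This is a sketch only: the function name, the `kind` strings, and the `is_marked_agi` flag are hypothetical.

```python
def steer(kind, is_marked_agi=False):
    """Return the issue queue(s) an instruction is placed in."""
    if kind == "store":
        return ["A", "B"]      # rule 3: stores go to both queues
    if kind == "load" or is_marked_agi:
        return ["B"]           # rule 1: loads and identified AGIs bypass
    return ["A"]               # rule 2: everything else


print(steer("load"))                      # goes to the B queue
print(steer("alu", is_marked_agi=True))   # identified AGI: B queue
print(steer("alu"))                       # unidentified AGI or compute: A queue
print(steer("store"))                     # both queues
```

An AGI that has not been identified yet is steered like any other compute instruction, which is why early loop iterations gain less.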