4.0 Out-of-order execution Flashcards

1
Q

What are some limitations of pipelines?

A

1: Treats all instructions the same (same execution time for the different stages)

2: Maximum IPC is 1

3: Limited to in-order execution

2
Q

What determines the clock frequency?

A

The slowest pipeline stage.
All the stages will execute at this frequency

3
Q

What is a solution to unification (treating all instructions the same) in pipelining?

A

Diversification

4
Q

What is diversification?

A

Update the pipeline by allowing the execution phase to take a different number of cycles depending on the operation.
For example, integer operations take 1 cycle and memory operations take multiple cycles.

5
Q

What is a benefit of using diversification?

A

Shorter clock cycles

6
Q

What are some problems with diversification?

A

1: Might get out-of-order completion

2: Multiple write operations to the register file in the same clock cycle

3: Exceptions

7
Q

How can out-of-order completion happen when using diversification?

A

If the first instruction spends 3 cycles in the execute phase and the next spends 1, the second instruction will complete before the first.

If both instructions write to the same register, we have a write-after-write (WAW) hazard
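
The timing can be sketched in a few lines. This is only an illustration under assumed rules: one instruction enters the execute stage per cycle, and write-back happens in the cycle right after the last execute cycle. The function name `writeback_cycles` and the latencies are hypothetical.

```python
def writeback_cycles(latencies):
    """Return the cycle in which each instruction writes back.

    Assumption: instruction i enters execute in cycle i (one per cycle)
    and writes back in the cycle right after its last execute cycle.
    """
    return [issue + latency for issue, latency in enumerate(latencies)]


# First instruction needs 3 execute cycles, the second only 1.
cycles = writeback_cycles([3, 1])
# The second instruction writes back before the first one.
completes_out_of_order = cycles[1] < cycles[0]
```

With latencies `[2, 1]` both instructions would write back in the same cycle, which is the multiple-writes problem from the previous card.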

8
Q

What do we need to do to fix a write-after-write hazard caused by out-of-order execution?

A

Stall the later instruction until the earlier write has completed

9
Q

What are some hazards that can happen as a result of out-of-order execution?

A

Write-after-write

Multiple writes per cycle

10
Q

How can we fix the problem of having multiple writes per cycle as a result of out-of-order execution?

A

Add multiple write ports, which works if different registers are being written

Treat the write port as a structural hazard. When the hazard is detected, the pipeline delays the later write until the first write has finished

11
Q

What are precise exceptions and how can out-of-order execution affect these?

A

An exception is precise when all instructions before the excepting instruction have executed to completion, and no instruction after it has completed at all.

Out-of-order execution may allow later instructions to complete write-back before the exception happens.

12
Q

What are interrupts?

A

Due to external factors

Asynchronous to program execution

Independent of program running

Out-of-order execution is not a problem for interrupts, as these don’t need to be handled immediately

13
Q

How are interrupts handled?

A

Stop fetching new instructions

Drain pipeline (execute instructions in the pipeline)

Store state (registers, PC)

Handle interrupt

Restore program state and resume execution

14
Q

What are exceptions and how are these handled?

A

Synchronous: result of program execution

Precise exception flow:
- Store the architectural state from just before the instruction that caused the exception
- Handle the exception
- Restore the state and resume execution from the instruction that caused the exception

Can guarantee precise exceptions by stalling pipeline

15
Q

How can parallelism be exploited in pipelines?

A

Superscalar pipelines are pipelines where multiple instructions can complete each cycle.

Parallelism in time (pipelining)

Parallelism in space (superscalar execution)

Superscalar pipelines use both

16
Q

How do superscalar pipelines handle hazards?

A

Handle hazards before sending an instruction to a functional unit

17
Q

What can cause performance loss when using in-order issue?

A

Even if some instructions are completely independent of the previous ones, they must also stall when the previous instructions stall, to ensure in-order execution.
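
The cost can be made concrete with a toy model. This is a sketch under assumptions that are not part of the card: at most one instruction issues per cycle, and `ready_at[i]` (a hypothetical input) is the first cycle in which instruction i has all its operands.

```python
def issue_cycles(ready_at, in_order):
    """Return the cycle in which each instruction issues.

    ready_at[i] is the first cycle in which instruction i has all its
    operands. Assumption: at most one instruction issues per cycle.
    """
    issued = [None] * len(ready_at)
    cycle = 0
    while None in issued:
        waiting = [i for i, c in enumerate(issued) if c is None]
        # In-order issue may only look at the oldest waiting instruction.
        candidates = waiting[:1] if in_order else waiting
        for i in candidates:
            if ready_at[i] <= cycle:
                issued[i] = cycle
                break  # issue width of one
        cycle += 1
    return issued


ready_at = [0, 4, 1, 2]  # hypothetical: instruction 1 waits on a slow load
in_order_result = issue_cycles(ready_at, in_order=True)
out_of_order_result = issue_cycles(ready_at, in_order=False)
```

In this example the two independent instructions behind the stalled one issue in cycles 5 and 6 with in-order issue, but in cycles 1 and 2 when they are allowed to bypass it.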

18
Q

What are the two key ideas behind out-of-order execution?

A

Register renaming: remove all anti and output dependences

Data flow execution: instructions execute as soon as their inputs are available.

19
Q

What is the data flow limit?

A

The only thing preventing execution is that input data is not ready yet

This can be illustrated by drawing instructions as nodes in a graph and real (RAW) data dependences as edges

The height of the graph shows the minimum number of cycles needed
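
The height of the dependence graph can be computed as a longest path. The sketch below assumes each instruction has a known latency and that instructions are listed in program order; the function name `dataflow_limit` and the example program are hypothetical.

```python
def dataflow_limit(latency, raw_deps):
    """Minimum cycles needed if only RAW dependences constrain execution.

    latency:  dict instruction -> execution latency in cycles
    raw_deps: dict instruction -> list of producers it reads from
    Assumption: instructions are listed in program order, so every
    producer appears before its consumers.
    """
    finish = {}
    for instr in latency:
        # An instruction starts when its slowest producer has finished.
        start = max((finish[p] for p in raw_deps.get(instr, [])), default=0)
        finish[instr] = start + latency[instr]
    return max(finish.values(), default=0)


# Hypothetical example: i3 needs i1 and i2, i4 needs i3, i5 is independent.
latency = {"i1": 1, "i2": 3, "i3": 1, "i4": 1, "i5": 1}
raw_deps = {"i3": ["i1", "i2"], "i4": ["i3"]}
limit = dataflow_limit(latency, raw_deps)
```

Here the critical path is i2, i3, i4, so the limit is 5 cycles even though the five instructions have 7 cycles of work in total.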

20
Q

Describe the out-of-order pipeline

A

Two new stages:
- register renaming
- dispatch

New structures:
- issue buffer
- reorder buffer
- store buffer and queue

The writeback stage is divided into two stages: complete and retire

21
Q

What does the register renaming stage do?

A

Remove all WAW and WAR hazards

These dependences are caused by a lack of register names

22
Q

What does the dispatch stage do?

A

Dispatches the instructions that now only have data dependences to the issue buffers

Instructions stay in the issue buffer until all their inputs have become available, then they continue execution

23
Q

What does the reorder buffer do?

A

Instructions are dispatched to the reorder buffer as well as to the issue buffers. This makes the instructions appear to execute in order, which gives precise exceptions

24
Q

What does the completion stage do?

A

Architecture state is updated
To software, the instruction appears to be fully completed

25
Q

What does the retirement stage do?

A

Applies to stores only.

Data is written to memory hierarchy

26
Q

How do the completion and retirement phases differ for different types of instructions?

A

For non-store instructions the completion phase is the same as the retirement.

For store instructions, completion moves the store from the store queue to the store buffer, and retirement writes it to memory and removes it from the store buffer.
The store queue holds the stores that have not yet happened. The store buffer holds the stores that have completed architecturally but have not yet been written back to memory.

27
Q

What does the issue buffer do?

A

Keeps track of instructions that have not executed yet

Instructions are inserted into the buffer in program order

They execute on the functional unit when ready, and may leave the buffer out of program order

28
Q

What does the reorder buffer do?

A

Keeps track of the window of instructions currently in execution

Its main job is to maintain program order

Instructions are inserted in program order and leave the reorder buffer in program order

Software gets illusion of sequentiality, precise exceptions

29
Q

What does register renaming accomplish and how?

A

Removes anti and output dependences, leaving only real (RAW) data dependences

Achieves the data flow limit

Architectural registers are renamed to physical/rename registers

Needs more physical registers than architectural registers

Key observation: a physical register is written at most once by an instruction in execution

Done completely in hardware

30
Q

How are registers renamed?

A

Real data dependences keep the same name as the register they depend on (one instruction uses R3 and another has a real dependence on R3: renamed to F3 in both instructions)

Registers involved in anti dependences get new names (R3 is used by two instructions: named F3 in one of them and F4 in the other)

The resulting data flow shows no reused names
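
A minimal sketch of this renaming rule. Several things are assumed for illustration: instructions are `(dest, src1, src2)` triples of architectural register numbers, architectural register r initially maps to physical register r, and free registers are handed out in increasing order. The function name `rename` is hypothetical.

```python
def rename(instructions, num_arch_regs, num_phys_regs):
    """Rename (dest, src1, src2) architectural register triples.

    Assumptions: architectural register r initially maps to physical
    register r, free registers are handed out in increasing order, and
    no registers are released while renaming.
    """
    mapping = list(range(num_arch_regs))  # arch -> phys
    free = list(range(num_arch_regs, num_phys_regs))
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping, which keeps RAW dependences.
        p1, p2 = mapping[src1], mapping[src2]
        if not free:
            raise RuntimeError("no free physical register: pipeline stalls")
        # Every destination gets a fresh name, which removes WAR and WAW.
        pd = free.pop(0)
        mapping[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed


program = [
    (3, 1, 2),  # R3 = R1 op R2
    (4, 3, 1),  # R4 = R3 op R1   (RAW on R3)
    (3, 5, 6),  # R3 = R5 op R6   (WAW with the first, WAR with the second)
]
renamed = rename(program, num_arch_regs=8, num_phys_regs=16)
```

After renaming, the second instruction still reads the register written by the first, while the two writes to R3 land in different physical registers.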

31
Q

How is the architectural to physical register mapping done?

A

Need two tables.
- mapping table
- physical register file

In the frontend of the pipeline, in the renaming stage, we need a mapping table. Given an architectural register, it returns a pointer to a physical register. Size of the mapping table: (number of architectural registers) × log_2(number of physical registers) bits

In the backend pipeline we access the physical register file.
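
A worked example of the mapping table size. The register counts are assumptions chosen for illustration, and the pointer width is rounded up to a whole number of bits when the number of physical registers is not a power of two.

```python
def mapping_table_bits(num_arch_regs, num_phys_regs):
    """Mapping table size in bits: one physical-register pointer per architectural register."""
    # Bits needed to address num_phys_regs registers, i.e. ceil(log2(num_phys_regs)).
    pointer_bits = (num_phys_regs - 1).bit_length()
    return num_arch_regs * pointer_bits


# Hypothetical machine: 32 architectural and 128 physical registers.
bits = mapping_table_bits(32, 128)  # 32 entries of 7 bits each
```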

32
Q

What does a physical register consist of?

A

Each physical register has a state:
- It starts as ‘available’
- When it is allocated, it becomes a ‘rename register, value not computed’
- When the instruction that writes it has executed, it becomes a ‘rename register, value computed’
- When the instruction leaves the reorder buffer, the value must appear in program state, so the register becomes an ‘architectural register’
- When the architectural register is later overwritten, the physical register is released and becomes ‘available’ again
33
Q

How is register renaming implemented?

A

Initialization:
- All physical registers that map to architectural registers are in the ‘architectural register’ state
- All other registers are ‘available’

Renaming source registers:
- Read the physical register that corresponds to each architectural source register from the mapping table

Renaming the destination register:
- Select an ‘available’ physical register and change its state to ‘rename register, value not computed’. Update the mapping table
- If no register is ‘available’, stall the pipeline until a physical register becomes available

Instruction finishes:
- State changes to ‘rename register, value computed’

Instruction leaves the reorder buffer (complete):
- Change the state of the physical register to ‘architectural register’ AND
- The physical register previously associated with the same architectural register changes to ‘available’
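
The state changes above can be sketched as a small state machine. The class and method names (`RenameUnit`, `rename_dest`, `finish`, `commit`) are hypothetical, and the linear search for an available register is a simplification.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    RENAMED_PENDING = "rename register, value not computed"
    RENAMED_DONE = "rename register, value computed"
    ARCHITECTURAL = "architectural register"


class RenameUnit:
    def __init__(self, num_arch, num_phys):
        # Initialization: arch register r is held in physical register r.
        self.mapping = list(range(num_arch))
        self.state = ([State.ARCHITECTURAL] * num_arch
                      + [State.AVAILABLE] * (num_phys - num_arch))

    def rename_dest(self, arch):
        """Allocate a register for a destination; None means 'stall'."""
        for phys, st in enumerate(self.state):
            if st is State.AVAILABLE:
                previous = self.mapping[arch]
                self.state[phys] = State.RENAMED_PENDING
                self.mapping[arch] = phys
                return phys, previous  # previous is kept for commit time
        return None

    def finish(self, phys):
        self.state[phys] = State.RENAMED_DONE

    def commit(self, phys, previous):
        # Leaving the ROB: the new value becomes architectural, the old copy is freed.
        self.state[phys] = State.ARCHITECTURAL
        self.state[previous] = State.AVAILABLE


unit = RenameUnit(num_arch=2, num_phys=3)
phys, previous = unit.rename_dest(0)  # takes the only available register
stalled = unit.rename_dest(1)         # None: nothing available, so stall
unit.finish(phys)
unit.commit(phys, previous)           # frees the old copy of register 0
```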

34
Q

What is the final stage of the frontend pipeline?

A

Dispatch stage

35
Q

What happens during the dispatch stage?

A

Allocates space in, and dispatches instructions to, the issue buffer of their respective functional unit and the reorder buffer.

If the issue buffers OR the reorder buffer are full, the pipeline must stall.

36
Q

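What is Tomasulo’s algorithm?

A

A hardware algorithm for out-of-order execution: instructions wait in reservation stations and execute as soon as their operands are available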

37
Q

What do the reservation stations/issue buffers do?

A

Issue buffer decouples the frontend from the backend of the pipeline.

Buffering of instructions until all their operands are available.

Instructions leave the issue buffer when all operands are available and the functional unit is available. (Instruction ready signal = 1)

When execution has finished, the target ID and the result are put on the forwarding bus.

38
Q

What does a reservation station entry contain?

A

busy: is the entry in use

op1: if its valid bit is 1, contains the register value; if its valid bit is 0, contains the register ID
valid (for op1)
op2: same as op1
valid (for op2)

ready: instruction is ready to be executed

39
Q

How is an instruction woken up in the reservation station?

A

The results come in from the FUs alongside their register identifiers.

A comparator compares the incoming ID with the one stored in the operand field of the issue-buffer entry.

The output of this comparison is used as the control signal for a multiplexer that takes the FU result as input, so that the result can be written to the operand field.

Then the valid bit is set.

Both valid bits are ANDed together, and the result is stored in the ready bit of the RS entry
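
The wake-up logic can be modelled in software as follows. This is a behavioural sketch, not a hardware description; the class name `RSEntry` and its field layout are assumptions based on the entry format in the previous card.

```python
class RSEntry:
    """One reservation station entry with two operands."""

    def __init__(self, op1, valid1, op2, valid2):
        # If valid is True the op field holds a value, otherwise a register ID (tag).
        self.op = [op1, op2]
        self.valid = [valid1, valid2]
        self.ready = valid1 and valid2

    def wakeup(self, tag, result):
        """Snoop one (tag, result) pair broadcast by a functional unit."""
        for i in range(2):
            # Comparator: does the tag this operand waits for match the broadcast?
            if not self.valid[i] and self.op[i] == tag:
                self.op[i] = result  # multiplexer selects the FU result
                self.valid[i] = True
        # Ready bit is the AND of both valid bits.
        self.ready = self.valid[0] and self.valid[1]


# Hypothetical entry: operand 1 waits for register ID 12, operand 2 already holds 7.
entry = RSEntry(op1=12, valid1=False, op2=7, valid2=True)
entry.wakeup(tag=9, result=100)  # different tag: nothing happens
entry.wakeup(tag=12, result=42)  # matching tag: operand captured, entry ready
```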

40
Q

What happens when more than one instruction is ready in the RS for the same functional unit?

A

Implement policies to choose what instruction to issue first.

For example schedule oldest-first
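
A minimal sketch of an oldest-first policy, assuming each entry carries a hypothetical `seq` field that records its position in program order (smaller means older).

```python
def select_oldest_first(entries):
    """Pick the ready entry with the smallest sequence number, or None."""
    ready = [e for e in entries if e["ready"]]
    return min(ready, key=lambda e: e["seq"], default=None)


entries = [
    {"seq": 7, "ready": True},
    {"seq": 3, "ready": False},  # oldest, but still waiting for an operand
    {"seq": 5, "ready": True},
]
chosen = select_oldest_first(entries)
```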

41
Q

What does the reorder buffer do?

A

Contain all in-flight instructions.

All instructions after dispatch and before completion.

Is a circular buffer - head/tail pointers

The number of new instructions that can be added each cycle is the dispatch width. Dispatch happens at the tail, in-order dispatch.

The number of instructions that can leave the ROB each cycle is limited by the completion width. Completion happens at the head, in order
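
A minimal sketch of the ROB as a circular buffer with head and tail pointers. The class and method names are hypothetical, and entries are reduced to a name and a finished flag.

```python
class ReorderBuffer:
    def __init__(self, size):
        self.entries = [None] * size
        self.head = 0   # oldest instruction, next to complete
        self.tail = 0   # next free slot, used by dispatch
        self.count = 0

    def dispatch(self, name):
        """Insert at the tail in program order; False means the ROB is full (stall)."""
        if self.count == len(self.entries):
            return False
        self.entries[self.tail] = {"name": name, "finished": False}
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return True

    def mark_finished(self, name):
        for entry in self.entries:
            if entry is not None and entry["name"] == name:
                entry["finished"] = True

    def complete(self, width):
        """Remove up to `width` finished instructions from the head, in order."""
        done = []
        while (len(done) < width and self.count > 0
               and self.entries[self.head]["finished"]):
            done.append(self.entries[self.head]["name"])
            self.entries[self.head] = None
            self.head = (self.head + 1) % len(self.entries)
            self.count -= 1
        return done


rob = ReorderBuffer(4)
for name in ["i1", "i2", "i3"]:
    rob.dispatch(name)
rob.mark_finished("i2")       # finishes execution out of order
blocked = rob.complete(2)     # nothing leaves: i1 at the head is not finished
rob.mark_finished("i1")
completed = rob.complete(2)   # i1 and i2 leave in program order
```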

42
Q

What does a ROB entry (reorder-buffer entry) consist of?

A

busy: entry is in use

finished: instruction has finished execution and is ready to be committed

instruction address

previous reg mapping: the physical register previously mapped to the architectural register that this instruction writes to; used to deallocate that physical register

speculative: is this instruction fetched beyond a predicted branch

store: if the instruction is a store

exception?: set if the instruction generated an exception

43
Q

What is the instruction window?

A

When ROB and issue buffer are combined into one structure.

Cons: large structure and HW complexity

44
Q

What are precise exceptions?

A

We save the architectural state from before the exception occurred, handle the exception, and continue execution from the instruction that caused the exception.

45
Q

How do we get precise exceptions in out-of-order processors?

A

Set exception flag in ROB-entry

Let all instructions before the excepting instruction complete

Handle the exception when the excepting instruction is about to complete.

Then we need to undo the register renaming that was done after the exception.

46
Q

How do we undo register renaming after an exception?

A

The exception instruction is at the ROB-head.

The register mapping is restored by walking the ROB from the tail back to the head
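
The walk from tail to head can be sketched as follows, assuming each ROB entry records the architectural destination and the previous physical mapping it overwrote. The function name `rollback` and the register numbers are hypothetical.

```python
def rollback(mapping, rob_entries):
    """Undo renaming for every instruction in the ROB, youngest first.

    rob_entries: list ordered from head (oldest) to tail (youngest); each
    entry is (arch_dest, previous_phys), the mapping that the instruction
    overwrote when it was renamed.
    """
    restored = dict(mapping)
    for arch_dest, previous_phys in reversed(rob_entries):
        restored[arch_dest] = previous_phys
    return restored


# Hypothetical ROB contents, head first. The head is the excepting instruction.
rob_entries = [
    (3, 3),  # R3 renamed from physical 3 to 8
    (4, 4),  # R4 renamed from physical 4 to 9
    (3, 8),  # R3 renamed again, from physical 8 to 10
]
current = {3: 10, 4: 9}
restored = rollback(current, rob_entries)
```

The direction matters: walking from the head instead would leave R3 mapped to physical register 8, which belongs to the excepting instruction rather than to the state before it.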

47
Q

How are mispredicted branches handled in out-of-order processors?

A

Register mapping must be restored to the state just after the branch

Can do this the same way as with exceptions, but this is more work because branches are mispredicted often.

Use checkpointing instead: take a snapshot of the mapping table after each branch.
Restore the table from the snapshot after a mispredicted branch
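
A minimal sketch of checkpointing the mapping table. The class name `CheckpointedMap` and the branch identifiers are hypothetical, and the sketch ignores nested branches, where the checkpoints of younger branches would also have to be discarded on a misprediction.

```python
class CheckpointedMap:
    def __init__(self, mapping):
        self.mapping = dict(mapping)
        self.checkpoints = {}

    def checkpoint(self, branch_id):
        # Snapshot of the mapping table taken at the branch.
        self.checkpoints[branch_id] = dict(self.mapping)

    def rename(self, arch, phys):
        self.mapping[arch] = phys

    def resolve(self, branch_id, mispredicted):
        snapshot = self.checkpoints.pop(branch_id)
        if mispredicted:
            # One table copy instead of a walk through the ROB.
            self.mapping = snapshot


table = CheckpointedMap({3: 3, 4: 4})
table.checkpoint("b1")
table.rename(3, 8)   # renaming done on the predicted path
table.rename(4, 9)
table.resolve("b1", mispredicted=True)
```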

48
Q

What is data captured scheduling?

A

Data and IDs are transferred from the FUs to the reservation station

Data is set as operands

Instructions are then issued to the FUs

A downside of doing this is that there is a fair share of data copying

49
Q

What is non-data captured scheduling?

A

Only the tags (IDs) are passed around, and the register file is between the reservation station and the FUs.

50
Q

What are the advantages of non-data captured organization?

A

Reservation station is less complex because it does not store the register values, only the tags.

Reduces wiring between the FUs and the reservation station entries.

51
Q

How are RAW hazards handled in non-data captured scheduling?

A

When a result is ready from an FU, it is forwarded directly to the FU inputs of the next instruction that needs it, because it has not yet been written to the register file.

52
Q

What is Value prediction?

A

An attempt at improving on the data flow limit.

Done at the beginning of the pipeline.

Predict outcome of loads/ALU ops

Don’t need to wait for the result to be computed and can therefore execute dependent instructions sooner

A speculative technique - prediction needs to be verified.

Can work because of locality

It is not widely implemented because of the power wall and because it is not efficient. If the prediction is wrong, we have done useless work and need to restore state.
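
One simple scheme is last-value prediction, which the cards do not name: predict that an instruction produces the same value as the last time it executed. The sketch below is an assumption-laden illustration; the class name and the instruction address `0x40` are hypothetical.

```python
class LastValuePredictor:
    """Predict that an instruction produces the same value as last time."""

    def __init__(self):
        self.table = {}  # instruction address -> last observed result

    def predict(self, pc):
        return self.table.get(pc)  # None means "no prediction"

    def verify(self, pc, actual):
        """Check the prediction once the real result is known, then train the table."""
        correct = self.table.get(pc) == actual
        self.table[pc] = actual
        return correct


predictor = LastValuePredictor()
# A load at a hypothetical address 0x40 returns these values on successive executions.
outcomes = [predictor.verify(0x40, value) for value in [5, 5, 5, 7]]
```

The repeated value is predicted correctly, which is the locality the card refers to. The first execution and the change to 7 are mispredictions, and dependent instructions that ran with the wrong value would have to be re-executed.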