Lecture 9 - Simulation Flashcards

Question 1

Q

Problem (Motivation - Introduction)

Answer

A

Designing architectures is generally expensive (economically), including:
- Costs of High-Level Design
- Costs of Verification of Design
- Costs of Low-level Design

It's hard to determine whether a new architectural design works other than by simulating it.

Question 2

Q

What does a simulator simulate?

Answer

A

Component Simulator: Simulates an architectural component, e.g., branch predictor, cache.
Instruction Set Simulator: Simulates just the instruction set of a particular architecture. NOTE: might have to emulate the OS interface.
Microarchitecture simulator: Simulates the underlying microarchitecture (e.g., cache, BP, ROB, speculation…). NOTE: could be cycle-accurate, or cycle-approximate.
Full-system simulator: Simulates everything (although not necessarily the microarchitecture) and has to deal with IO, the MMU, interrupts, etc.
System-on-chip simulator: Simulates the full system on a chip including CPU, GPU, DSPs, additional processors, network, and IO.
Electronic circuit simulator: Simulates the electrical components and signals in a circuit.

Question 3

Q

Terminology

Answer

A

Host machine: The platform that is running the simulator.
Guest/target machine: The platform that is being simulated (e.g., AArch64, x86).
Cross-architecture: Simulating a different architecture than the host (e.g., x86 on AArch64).
Same-architecture: Simulating the same architecture as the host (e.g., x86 on x86).

Question 4

Q

Instruction Set Simulator

Answer

A

Simulates just the instruction set of the guest architecture.
User-mode simulation: Takes a single application binary as input and executes the guest instructions without needing to simulate system instructions.
Full-system simulation: Boots an entire operating system as if it were running on a real machine, so it needs to simulate system instructions.
Functional: Instructions alter architectural state like a real machine (e.g., register files) but don’t care about microarchitectural state.
Cycle-accurate: Requires a model of the microarchitecture in order to simulate caches, branch predictors, etc.
Paravirtualization: Program/OS under simulation “knows” it’s being simulated.

Question 5

Q

Microarchitecture Simulator

Answer

A

Simulates the underlying microarchitectural components of the platform, e.g., the caches, branch predictor, reorder buffer.
Typically cycle-accurate, which can make it slow.
Trace driven: Updates microarchitectural state based on a fixed sequence of records coming from a trace file.
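
A trace-driven model can be sketched in a few lines. The sketch below assumes a hypothetical trace format (one `R <hex address>` or `W <hex address>` record per line) and models only a tag-only direct-mapped cache; treating writes exactly like reads (allocate on miss) is a simplification, not how every cache behaves.

```python
class DirectMappedCache:
    """Tag-only direct-mapped cache: models hits/misses, not data."""

    def __init__(self, num_lines: int, line_size: int) -> None:
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # None marks an invalid line
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # allocate on miss (reads and writes alike)
        self.misses += 1
        return False


def replay_trace(cache, records):
    """Update microarchitectural state from a fixed sequence of trace records."""
    for record in records:
        _op, addr = record.split()  # 'R'/'W' is ignored by this simple model
        cache.access(int(addr, 16))
    return cache


trace = ["R 0x0", "R 0x4", "W 0x40", "R 0x0"]
cache = replay_trace(DirectMappedCache(num_lines=4, line_size=16), trace)
print(cache.hits, cache.misses)
```

This prints `1 3`: address 0x40 maps to the same line as 0x0 and evicts it, so the final access to 0x0 misses again (a conflict miss).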

Question 6

Q

Full-System Simulator

Answer

A

Simulates not only the instruction set of the architecture but also devices that might be attached to the platform, MMU for virtual memory, and interrupts.
May also include the microarchitecture, if cycle-accurate simulation is desired.
Instruction Set Simulation extended to include “system” instructions.
Emulation of devices (e.g., serial, storage, networking, etc.) needs to be implemented.
Emulation of the MMU is required.
Needs the ability to handle interrupts.
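
Device emulation is commonly built on memory-mapped IO: a guest load or store whose physical address falls inside a device's range is routed to a device model instead of RAM. A minimal sketch, assuming a hypothetical serial port at a made-up base address with a made-up register layout (offset 0 = data, offset 1 = status); MMU translation and interrupts are left out.

```python
class SerialPort:
    """Hypothetical UART model: a byte written to offset 0 is 'transmitted'."""

    def __init__(self):
        self.transmitted = []

    def write(self, offset, value):
        if offset == 0:  # data register
            self.transmitted.append(chr(value & 0xFF))

    def read(self, offset):
        return 1 if offset == 1 else 0  # offset 1: status, always "ready"


class Bus:
    """Routes guest physical loads/stores to RAM or to a mapped device."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.devices = []  # list of (base, size, device)

    def map_device(self, base, size, device):
        self.devices.append((base, size, device))

    def _lookup(self, addr):
        for base, size, device in self.devices:
            if base <= addr < base + size:
                return device, addr - base
        return None, 0

    def store8(self, addr, value):
        device, offset = self._lookup(addr)
        if device is not None:
            device.write(offset, value)  # side effect instead of a memory write
        else:
            self.ram[addr] = value & 0xFF

    def load8(self, addr):
        device, offset = self._lookup(addr)
        if device is not None:
            return device.read(offset)
        return self.ram[addr]


UART_BASE = 0x1000_0000  # hypothetical address of the serial port
bus = Bus(ram_size=4096)
uart = SerialPort()
bus.map_device(UART_BASE, 8, uart)
for ch in "hi":
    bus.store8(UART_BASE, ord(ch))
bus.store8(0x10, 42)
print("".join(uart.transmitted), bus.load8(0x10), bus.load8(UART_BASE + 1))
```

The same guest store instruction either writes memory or triggers device behaviour, depending only on the address, which is why the simulator has to intercept memory accesses.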

Question 7

Q

System-on-chip Simulator

Answer

A

Simulates multiple components that might be found on a single chip, e.g., multiple processors (CPUs, GPUs, DSPs), IO, memory, etc.
Can be used to help application developers start writing software before the hardware becomes available.
Can be modified to try experimental features - rapid prototyping.
Gives insight into how the hardware could be adapted to suit the target application, or the firmware/software.

Question 8

Q

Scope of Simulation

Answer

A

Cycle-accurate simulation.
Cycle-approximate simulation.
Instruction Set Simulation (or Functional Simulation): Just model the functional behavior of the guest architecture’s instructions.
Hybrid approaches.
Sampling-based (or statistical) simulation.

Question 9

Q

Approaches to Simulation

Answer

A

Static simulation: Translates the application program ahead of time, but is rarely used in practice because static binary translation is tricky (e.g., it is hard to separate code from data and to resolve indirect branch targets ahead of time).
Dynamic simulation:
- Interpreter: Can introduce instrumentation at runtime for measuring/debugging.
- Dynamic Binary Translation: Can introduce instrumentation at runtime for measuring/debugging and is the most popular approach with lots of implementations.

Question 10

Q

Interpretation

Answer

A

Fetch/Decode/Execute loop is SLOW but “simple”.
Each instruction is processed one-at-a-time, in order.
The execute behavior of each instruction might be a separate function, or it might be inlined into a big switch statement.
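
The loop can be sketched as follows. The ISA here is a hypothetical toy (32-bit words split into four 8-bit fields: opcode, a, b, c), invented only to show fetch, decode, and execute as separate steps; Python has no switch statement, so an if/elif chain plays that role.

```python
# Hypothetical toy ISA: 32-bit words laid out as [opcode | a | b | c], 8 bits each.
OP_LI, OP_ADD, OP_DEC, OP_BNEZ, OP_HALT = 0x01, 0x02, 0x03, 0x04, 0xFF


def enc(op, a=0, b=0, c=0):
    return (op << 24) | (a << 16) | (b << 8) | c


def interpret(program, num_regs=4, max_steps=1_000_000):
    regs = [0] * num_regs
    pc = 0
    executed = 0
    while executed < max_steps:
        word = program[pc]                 # fetch
        op = (word >> 24) & 0xFF           # decode
        a, b, c = (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
        executed += 1
        pc += 1                            # default: next sequential instruction
        # execute: the if/elif chain stands in for a big switch statement
        if op == OP_LI:                    # LI   ra, imm     -> ra = imm
            regs[a] = c
        elif op == OP_ADD:                 # ADD  ra, rb, rc  -> ra = rb + rc
            regs[a] = regs[b] + regs[c]
        elif op == OP_DEC:                 # DEC  ra          -> ra = ra - 1
            regs[a] -= 1
        elif op == OP_BNEZ:                # BNEZ rb, target  -> branch if rb != 0
            if regs[b] != 0:
                pc = c
        elif op == OP_HALT:
            break
        else:
            raise ValueError(f"illegal opcode {op:#x} at pc {pc - 1}")
    return regs, executed


# r1 = 5 + 4 + 3 + 2 + 1
program = [
    enc(OP_LI, 0, 0, 5),     # 0: r0 = 5
    enc(OP_LI, 1, 0, 0),     # 1: r1 = 0
    enc(OP_ADD, 1, 1, 0),    # 2: r1 = r1 + r0
    enc(OP_DEC, 0),          # 3: r0 = r0 - 1
    enc(OP_BNEZ, 0, 0, 2),   # 4: if r0 != 0 goto 2
    enc(OP_HALT),            # 5: stop
]
regs, executed = interpret(program)
print(regs[1], executed)
```

This prints `15 18`. Each of the 18 executed guest instructions goes through fetch and decode again, including the three loop instructions on every iteration, which is where the interpreter's overhead comes from.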

Question 11

Q

Dynamic Binary Translation

Answer

A

Considers (at least) a guest basic block of instructions at a time.
Translates the guest instructions in the block into host instructions.
Caches the translated code, and executes it.
Tricky to implement and hard to make fast, but it is SIGNIFICANTLY faster than interpretation.
Translation granularity can be basic-blocks, traces, or regions.
Optimisation approaches can be none, intra-basic-block, inter-basic-block, trace or region.
Compilation strategy can be always just-in-time, synchronous/asynchronous, or a hybrid interpreter/DBT.
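
A minimal sketch of basic-block translation with a code cache, assuming a hypothetical toy ISA (32-bit words split into opcode, a, b, c fields). Generated Python source compiled with `exec` stands in for host machine code here; a real DBT emits native instructions, and this sketch does no optimisation and no block chaining.

```python
# Hypothetical toy ISA: 32-bit words laid out as [opcode | a | b | c], 8 bits each.
OP_LI, OP_ADD, OP_DEC, OP_BNEZ, OP_HALT = 0x01, 0x02, 0x03, 0x04, 0xFF


def enc(op, a=0, b=0, c=0):
    return (op << 24) | (a << 16) | (b << 8) | c


def translate_block(program, start_pc):
    """Translate one guest basic block (up to and including its branch or halt)
    into a host function that updates `regs` and returns the next guest PC."""
    lines = ["def block(regs):"]
    pc = start_pc
    while True:
        word = program[pc]
        op, a, b, c = word >> 24, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
        pc += 1
        if op == OP_LI:
            lines.append(f"    regs[{a}] = {c}")
        elif op == OP_ADD:
            lines.append(f"    regs[{a}] = regs[{b}] + regs[{c}]")
        elif op == OP_DEC:
            lines.append(f"    regs[{a}] -= 1")
        elif op == OP_BNEZ:  # a branch ends the block: taken target or fall-through
            lines.append(f"    return {c} if regs[{b}] != 0 else {pc}")
            break
        elif op == OP_HALT:  # a halt also ends the block; None means "stop"
            lines.append("    return None")
            break
        else:
            raise ValueError(f"illegal opcode {op:#x}")
    namespace = {}
    exec("\n".join(lines), namespace)  # "compile" the block to host code
    return namespace["block"]


def run_dbt(program, num_regs=4):
    regs = [0] * num_regs
    code_cache = {}  # guest PC -> translated host function
    translations = 0
    pc = 0
    while pc is not None:
        block = code_cache.get(pc)
        if block is None:  # translate on first use only, then reuse
            block = translate_block(program, pc)
            code_cache[pc] = block
            translations += 1
        pc = block(regs)  # run the whole block with no per-instruction decode
    return regs, translations


program = [
    enc(OP_LI, 0, 0, 5),    # 0: r0 = 5
    enc(OP_LI, 1, 0, 0),    # 1: r1 = 0
    enc(OP_ADD, 1, 1, 0),   # 2: r1 = r1 + r0
    enc(OP_DEC, 0),         # 3: r0 = r0 - 1
    enc(OP_BNEZ, 0, 0, 2),  # 4: if r0 != 0 goto 2
    enc(OP_HALT),           # 5: stop
]
regs, translations = run_dbt(program)
print(regs[1], translations)
```

This prints `15 3`: the loop body starting at PC 2 is translated once and then run from the code cache on every later iteration, instead of being fetched and decoded five times. The blocks starting at PC 0 and PC 2 overlap, which this sketch simply allows.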

Question 12

Q

Simulation Speeds

Answer

A

Varies greatly depending on the scope and implementation.
Functional Simulation:
- Really good interpreter: 100+ MIPS (2-100x slowdown).
- Really good DBT: 3000 MIPS (sometimes faster than native!).
Cycle Accurate Simulation: 0.1-1 MIPS (10,000-100,000x slowdown).
Multicore Cycle Accurate Simulation: 1-5 KIPS (1,000,000x slowdown).
MIPS: Millions of (guest) Instructions Per Second, a measure of simulation throughput (KIPS: thousands of instructions per second).
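
To make these rates concrete, simulation time is guest instructions / (MIPS × 10^6). The sketch below assumes a hypothetical workload of 10^10 guest instructions and uses the rates from this card: about 100 s at 100 MIPS, but 10^7 s (roughly 116 days) at 1 KIPS.

```python
def sim_time_seconds(guest_instructions, mips):
    """Wall-clock time to simulate a workload at a given throughput (MIPS)."""
    return guest_instructions / (mips * 1e6)


WORKLOAD = 1e10  # hypothetical workload: 10 billion guest instructions
rates_mips = {
    "good interpreter": 100,
    "good DBT": 3000,
    "cycle-accurate (fast)": 1,
    "cycle-accurate (slow)": 0.1,
    "multicore cycle-accurate": 0.001,  # 1 KIPS = 0.001 MIPS
}
for name, mips in rates_mips.items():
    secs = sim_time_seconds(WORKLOAD, mips)
    print(f"{name:>26}: {secs:>12,.0f} s  ({secs / 3600:,.1f} h)")
```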

Question 13

Q

Sampling

Answer

A

A sampling simulator tries to solve the speed problem (for cycle accurate simulation).
It fully simulates only a small part of the program and runs the rest functionally by some faster means (DBT, native execution, etc.).
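
One way to pick the samples is systematic sampling: simulate every k-th interval in detail, run the others functionally, and extrapolate. The sketch below is illustrative only: the instruction stream is synthetic, and `detailed` is a stand-in "cycle-accurate" model built on hypothetical per-class latencies.

```python
import random

# Hypothetical per-class latencies used by the stand-in "detailed" model.
LATENCY = {"alu": 1, "load": 4, "branch": 2}


def detailed(chunk):
    """Stand-in for cycle-accurate simulation of a chunk: returns cycles."""
    return sum(LATENCY[kind] for kind in chunk)


def functional(chunk):
    """Stand-in for fast functional simulation: advances state, no timing."""
    return len(chunk)


def estimate_cycles(stream, interval=1000, period=10):
    """Systematic sampling: detailed-simulate 1 interval in every `period`."""
    sampled_cycles = sampled_instrs = 0
    for n, start in enumerate(range(0, len(stream), interval)):
        chunk = stream[start:start + interval]
        if n % period == 0:
            sampled_cycles += detailed(chunk)
            sampled_instrs += len(chunk)
        else:
            functional(chunk)  # timing is not modelled for this interval
    cpi = sampled_cycles / sampled_instrs
    return cpi * len(stream)   # extrapolate the sampled CPI to the whole run


rng = random.Random(42)
stream = rng.choices(["alu", "load", "branch"], weights=[6, 3, 1], k=100_000)
estimate = estimate_cycles(stream)
reference = detailed(stream)  # full detailed run, for comparison only
error = abs(estimate - reference) / reference
print(f"estimated {estimate:.0f} cycles vs {reference} (error {error:.2%})")
```

Only 10% of the stream is simulated in detail. Because this synthetic stream has no program phases, uniform sampling works well here; real programs do have phases, which is why choosing what to sample is hard.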

Question 14

Q

Challenges of Sampling Simulation

Answer

A

What to sample? (Program phase detection)
How to handle IO?
How to handle system calls (if emulating the OS layer)?
Resource sharing (and context switching)
Building an accurate statistical model.
The warming problem.

Question 15

Q

The Warming Problem

Answer

A

How do you transition from one type of simulation to another?
For example, if a sampling simulator runs in functional mode 90% of the time and does cycle-accurate simulation for the other 10%, how do we restart cycle-accurate simulation after running in functional mode? What is the (microarchitectural) state of the processor at that point?
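
The effect can be sketched with a cache model. The workload, cache size, and line size below are hypothetical: a detailed sample that starts from a cold (empty) cache, because the functional fast-forward never updated the cache model, is compared with one that starts from a cache kept up to date during fast-forwarding (functional warming).

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that tracks only which blocks are resident."""

    def __init__(self, num_lines, line_size=64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.blocks = OrderedDict()  # oldest (least recently used) first

    def access(self, addr):
        block = addr // self.line_size
        if block in self.blocks:
            self.blocks.move_to_end(block)       # mark as most recently used
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.num_lines:
            self.blocks.popitem(last=False)      # evict the LRU block
        return False


def miss_rate(cache, addrs):
    return sum(not cache.access(a) for a in addrs) / len(addrs)


# Hypothetical workload: sweep a 16 KiB array (256 lines of 64 B) 50 times.
trace = [i * 64 for i in range(256)] * 50
fast_forward, sample = trace[:10_000], trace[10_000:10_512]

# (a) Cold start: the functional fast-forward never touched the cache model.
cold_rate = miss_rate(LRUCache(num_lines=512), sample)

# (b) Functional warming: cache state is updated (no timing) while fast-forwarding.
warm_cache = LRUCache(num_lines=512)
for addr in fast_forward:
    warm_cache.access(addr)
warm_rate = miss_rate(warm_cache, sample)

print(f"cold start: {cold_rate:.0%} misses, warmed: {warm_rate:.0%} misses")
```

The cold start reports a 50% miss rate for a sample in which the warmed cache, like the real steady-state program, sees no misses at all.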

Question 16

Q

Some Solutions (to the Warming Problem):

Answer

A

Restart with the previous state (fast-forwarding): parts of the processor can be simulated during fast-forwarding to keep them warm, but this is slow and/or inaccurate.
Live-points.
Reusable warm architectural/microarchitectural checkpoints. Multiple checkpoints allow simulation parallelism.
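
Checkpoints can be sketched as snapshots of warm microarchitectural state. Here only cache tags are stored, a hypothetical simplification of what a real checkpoint holds: one functional pass records a snapshot at each sample point, and each detailed sample then starts from its own copy, independently of the others. Threads are used only to show that independence; in CPython they would not give a real speedup because of the GIL, so a real setup would use processes or separate machines.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class DirectMappedCache:
    """Tag-only direct-mapped cache (the warm state we want to checkpoint)."""

    def __init__(self, num_lines=64, line_size=64):
        self.tags = [None] * num_lines
        self.line_size = line_size

    def access(self, addr):
        block = addr // self.line_size
        index, tag = block % len(self.tags), block // len(self.tags)
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit


def create_checkpoints(trace, sample_points):
    """One functional-warming pass that snapshots the warm cache at each sample point."""
    cache, checkpoints, wanted = DirectMappedCache(), {}, set(sample_points)
    for i, addr in enumerate(trace):
        if i in wanted:
            checkpoints[i] = copy.deepcopy(cache)
        cache.access(addr)
    return checkpoints


def simulate_sample(trace, start, length, checkpoint):
    """Detailed sample starting from a private copy of a warm checkpoint."""
    cache = copy.deepcopy(checkpoint)  # copy, so the checkpoint stays reusable
    return sum(not cache.access(a) for a in trace[start:start + length])


trace = [i * 64 for i in range(80)] * 50  # hypothetical: sweep 80 lines, 64-line cache
points, SAMPLE_LEN = [1000, 2000, 3000], 100
checkpoints = create_checkpoints(trace, points)

# Samples are independent once checkpointed, so they can run in parallel.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(
        lambda s: simulate_sample(trace, s, SAMPLE_LEN, checkpoints[s]), points))
print(results)  # misses per sample
```

Because each checkpoint captures the exact cache state at its sample point, the per-sample miss counts match those of one long sequential detailed run over the same trace.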

Question 17

Q

Checkpointing

Answer

A

Two types of checkpointing:
- High-level: cache and directory tags, complete memory data (~10-200MB).
- Low-level: registers, TLB, branch predictor, cache tags, touched memory data (~150KB).

The two could be used together.
Wenisch et al. propose the low-level type for uniprocessors and the high-level type for multiprocessors.

Question 18

Q

Accuracy

Answer

A

Really hard to know.
Some simulators (usually for embedded processors) are ‘verified’ to behave identically to the hardware, within a given tolerance. This usually means they are certified by the manufacturer, and they are generally the most accurate.
Some simulators are the reference platform: the silicon is derived from the simulator.
Modern microprocessors are complex, and even a full simulator may take shortcuts or contain inaccuracies.
The real processor might have bugs, and the simulator might have bugs or be incomplete.

Question 19

Q

Power Modelling

Answer

A

Some simulators support power/energy modelling, which is an active area of research.
There is no such thing as ‘cycle-accuracy’ for power.
Estimates are usually based on some ‘power model’ whose accuracy is very difficult to determine; this is generally done by empirical experimentation.
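
A common style of power model is activity-based: energy = Σ (event count × energy per event) + static power × runtime, with the event counts taken from simulator statistics. The sketch below uses made-up per-event energies, static power, and event counts; in practice these coefficients would have to be calibrated against real hardware, which is exactly where the accuracy problem lies.

```python
# Hypothetical per-event energies (nJ) and static power; a real power model
# would be calibrated against measurements of the target hardware.
ENERGY_PER_EVENT_NJ = {"alu_op": 0.1, "l1_access": 0.5, "dram_access": 20.0}
STATIC_POWER_W = 0.5


def estimate_energy_joules(event_counts, cycles, freq_hz):
    """Activity-based model: dynamic energy from event counts + static power * time."""
    dynamic_j = sum(ENERGY_PER_EVENT_NJ[e] * n for e, n in event_counts.items()) * 1e-9
    runtime_s = cycles / freq_hz
    static_j = STATIC_POWER_W * runtime_s
    return dynamic_j + static_j


# Event counts as a simulator might report them (made-up values).
counts = {"alu_op": 1_000_000_000, "l1_access": 400_000_000, "dram_access": 1_000_000}
energy = estimate_energy_joules(counts, cycles=1_500_000_000, freq_hz=1e9)
print(f"{energy:.2f} J, average power {energy / 1.5:.2f} W")
```

Here the dynamic part is 0.32 J and the static part 0.75 J (0.5 W for 1.5 s). Changing any coefficient changes the answer, and nothing inside the simulator can tell you which coefficients are right.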

Question 20

Q

Summary

Answer

A

It’s a really hard problem.
Don’t blindly trust simulation; try it on real hardware if possible.
But it is a super-useful tool for all stages of development/design/debugging.
Fast simulators can really help to enable rapid prototyping and development, continue running legacy applications, support hardware/software co-design, and bridge the gap between hardware development and application development.
FPGA implementation is an alternative, but only the high-level design is simulated.
