Lecture 9 - Simulation Flashcards

(20 cards)

1
Q

Problem (Motivation - Introduction)

A

Designing Architectures is generally expensive (economically).
- Costs of High-Level Design
- Costs of Verification of Design
- Costs of Low-level Design

It's hard to determine whether a new architectural design works other than by simulating it.

2
Q

What does a simulator simulate?

A
  • Component Simulator: Simulates an architectural component, e.g., branch predictor, cache.
  • Instruction Set Simulator: Simulates just the instruction set of a particular architecture. NOTE: might have to emulate the OS interface.
  • Microarchitecture simulator: Simulates the underlying microarchitecture (e.g., cache, BP, ROB, speculation…). NOTE: could be cycle-accurate, or cycle-approximate.
  • Full-system simulator: Simulates everything (although not necessarily microarchitecture) and has to deal with IO, MMU, interrupts….
  • System-on-chip simulator: Simulates the full system on a chip including CPU, GPU, DSPs, additional processors, network, and IO.
  • Electronic circuit simulator: Simulates the electrical components and signals in a circuit.
3
Q

Terminology

A
  • Host machine: The platform that is running the simulator.
  • Guest/target machine: The platform that is being simulated (e.g., AArch64, x86).
  • Cross-architecture: Simulating a different architecture than the host (e.g., x86 on AArch64).
  • Same-architecture: Simulating the same architecture as the host (e.g., x86 on x86).
4
Q

Instruction Set Simulator

A
  • Simulates just the instruction set of the guest architecture.
  • User-mode simulation: Takes a single application binary as input and executes the guest instructions without needing to simulate system instructions.
  • Full-system simulation: Boots an entire operating system as if it were running on a real machine and needs to simulate system instructions.
  • Functional: Instructions alter architectural state like a real machine (e.g., register files) but don’t care about microarchitectural state.
  • Cycle-accurate: Requires a model of the microarchitecture in order to simulate caches, branch predictors, etc.
  • Paravirtualization: Program/OS under simulation “knows” it’s being simulated.
5
Q

Microarchitecture Simulator

A
  • Simulates the underlying microarchitectural components of the platform, e.g., the caches, branch predictor, reorder buffer.
  • Typically cycle-accurate but can be slow.
  • Trace driven: Updates microarchitectural state based on a fixed sequence of records coming from a trace file.
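
As a rough illustration of the trace-driven idea, the sketch below replays a fixed list of memory addresses through a tiny direct-mapped cache model. The function name `simulate_trace`, the cache geometry, and the in-memory list standing in for a trace file are all assumptions made for this example, not part of the lecture.

```python
def simulate_trace(trace, num_sets=4, block_size=16):
    """Replay a fixed sequence of addresses through a direct-mapped cache model."""
    tags = [None] * num_sets  # microarchitectural state: one tag per set
    hits = misses = 0
    for addr in trace:  # each trace record is just an address here
        block = addr // block_size
        index = block % num_sets
        tag = block // num_sets
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag  # fill the line on a miss
    return hits, misses


# Hypothetical trace; a real simulator would read these records from a file.
trace = [0x00, 0x04, 0x40, 0x00, 0x10]
hits, misses = simulate_trace(trace)
print(f"hits={hits} misses={misses}")
```

No guest instructions are executed: the model only updates cache state from the recorded addresses, which is what makes a trace-driven simulator comparatively simple.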
6
Q

Full-System Simulator

A
  • Simulates not only the instruction set of the architecture but also devices that might be attached to the platform, MMU for virtual memory, and interrupts.
  • May or may not include the microarchitecture; it is only needed if cycle-accurate simulation is desired.
  • Instruction Set Simulation extended to include “system” instructions.
  • Emulation of devices (e.g., serial, storage, networking, etc.) needs to be implemented.
  • Emulation of MMU required.
  • Needs the ability to handle interrupts.
7
Q

System-on-chip Simulator

A
  • Simulates multiple components that might be found on a single chip, i.e., multiple processors (CPUs, GPUs, DSPs), IO, memory, etc.
  • Can be used to help application developers start writing software before the hardware becomes available.
  • Can be modified to try experimental features - rapid prototyping.
  • Gives an insight into how hardware could be adapted to suit the target application, or the firmware/software.
8
Q

Scope of Simulation

A
  • Cycle-accurate simulation.
  • Cycle-approximate simulation.
  • Instruction Set Simulation (or Functional Simulation): Just model the functional behavior of the guest architecture’s instructions.
  • Hybrid approaches.
  • Sampling-based (or statistical) simulation.
9
Q

Approaches to Simulation

A
  • Static simulation: Translates application program ahead-of-time, but is not really used in practice because it’s tricky to do static binary translation.
  • Dynamic simulation:
    • Interpreter: Can introduce instrumentation at runtime for measuring/debugging.
    • Dynamic Binary Translation: Can introduce instrumentation at runtime for measuring/debugging and is the most popular approach with lots of implementations.
10
Q

Interpretation

A
  • Fetch/Decode/Execute loop is SLOW but “simple”.
  • Each instruction is processed one-at-a-time, in order.
  • Execute behavior might be a function, or it might be inline in a big switch statement.
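
The fetch/decode/execute loop can be sketched as follows. The guest ISA here is a made-up toy (tuples such as `("addi", rd, rs, imm)`), and the `if`/`elif` chain plays the role of the "big switch statement"; none of these names come from a real simulator.

```python
def interpret(program, num_regs=4):
    """Interpret a toy guest program one instruction at a time, in order."""
    regs = [0] * num_regs  # architectural state only (functional simulation)
    pc = 0
    executed = 0
    while True:
        instr = program[pc]  # fetch
        op, *args = instr    # decode
        executed += 1
        if op == "addi":     # execute: inline "switch" on the opcode
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
            pc += 1
        elif op == "add":
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
            pc += 1
        elif op == "bnez":
            rs, target = args
            pc = target if regs[rs] != 0 else pc + 1
        elif op == "halt":
            return regs, executed
        else:
            raise ValueError(f"unknown opcode: {op}")


# Hypothetical guest program: r2 = 5 + 4 + 3 + 2 + 1
program = [
    ("addi", 1, 0, 5),   # r1 = 5 (r0 is never written, so it stays 0)
    ("add", 2, 2, 1),    # r2 += r1
    ("addi", 1, 1, -1),  # r1 -= 1
    ("bnez", 1, 1),      # loop back to instruction 1 while r1 != 0
    ("halt",),
]
regs, executed = interpret(program)
print(regs[2], executed)
```

Every guest instruction pays the fetch and decode cost each time it runs, even inside a hot loop, which is the main reason interpretation is slow.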
11
Q

Dynamic Binary Translation

A
  • Considers (at least) a guest basic block of instructions at a time.
  • Translates the guest instructions in the block, into host instructions.
  • Caches the translated code, and executes it.
  • Tricky to implement and hard to make fast, but it is SIGNIFICANTLY faster than interpretation.
  • Translation granularity can be basic-blocks, traces, or regions.
  • Optimisation approaches can be none, intra-basic-block, inter-basic-block, trace or region.
  • Compilation strategy can be always just-in-time, synchronous/asynchronous, or a hybrid interpreter/DBT.
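
The translate/cache/execute cycle can be sketched at basic-block granularity as below. This is only an illustration under stated assumptions: the guest ISA is a made-up toy of tuples, the "host code" is generated Python source compiled with `exec` rather than real machine code, and `translate_block` and `run_dbt` are hypothetical names.

```python
def translate_block(program, start_pc):
    """Translate the guest basic block at start_pc into one host (Python) function."""
    lines = ["def block(regs):"]
    pc = start_pc
    while True:
        op, *args = program[pc]
        if op == "addi":
            rd, rs, imm = args
            lines.append(f"    regs[{rd}] = regs[{rs}] + {imm}")
        elif op == "add":
            rd, rs1, rs2 = args
            lines.append(f"    regs[{rd}] = regs[{rs1}] + regs[{rs2}]")
        elif op == "bnez":  # a branch ends the basic block
            rs, target = args
            lines.append(f"    return {target} if regs[{rs}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)  # "compile" the generated host code
    return namespace["block"]


def run_dbt(program, num_regs=4):
    regs = [0] * num_regs
    code_cache = {}  # guest PC -> translated block
    translations = 0
    pc = 0
    while pc is not None:
        block = code_cache.get(pc)
        if block is None:  # translate only on a cache miss
            block = translate_block(program, pc)
            code_cache[pc] = block
            translations += 1
        pc = block(regs)  # run translated code; it returns the next guest PC
    return regs, translations


# Hypothetical guest program: r2 = 5 + 4 + 3 + 2 + 1
program = [
    ("addi", 1, 0, 5),
    ("add", 2, 2, 1),
    ("addi", 1, 1, -1),
    ("bnez", 1, 1),
    ("halt",),
]
regs, translations = run_dbt(program)
print(regs[2], translations)
```

The loop body is translated once and then reused from the cache on every later iteration, so the decode cost is paid per block rather than per executed instruction.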
12
Q

Simulation Speeds

A
  • Varies greatly depending on the scope and implementation.
  • Functional Simulation: A really good interpreter reaches 100+ MIPS (2-100x slowdown); a really good DBT reaches 3000 MIPS (sometimes faster than native!).
  • Cycle Accurate Simulation: 0.1-1 MIPS (10,000-100,000x slowdown).
  • Multicore Cycle Accurate Simulation: 1-5 KIPS (1,000,000x slowdown).
  • MIPS: Millions of (guest) Instructions Per Second, a measure of simulation throughput.
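
To get a feel for these throughput figures, the short calculation below converts them into wall-clock time. The 10-billion-instruction workload is an arbitrary assumption, and the rates are taken from the upper end of each range quoted above (5 KIPS is written as 0.005 MIPS).

```python
def simulation_seconds(guest_instructions, mips):
    """Wall-clock seconds to simulate a workload at a given throughput in MIPS."""
    return guest_instructions / (mips * 1_000_000)


workload = 10_000_000_000  # hypothetical workload: 10 billion guest instructions

rates = [
    ("really good DBT", 3000),
    ("really good interpreter", 100),
    ("cycle-accurate", 1),
    ("multicore cycle-accurate", 0.005),  # 5 KIPS
]
for name, mips in rates:
    seconds = simulation_seconds(workload, mips)
    print(f"{name}: {seconds:,.0f} s ({seconds / 3600:,.1f} h)")
```

Under these assumptions the same workload takes a few seconds with DBT, about 100 seconds in the interpreter, close to three hours in a cycle-accurate simulator, and roughly 23 days in a multicore cycle-accurate one.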
13
Q

Sampling

A
  • A sampling simulator tries to solve the speed problem (for cycle accurate simulation).
  • Only actually fully simulates a small part of the code and runs the rest functionally in some other way (DBT, native implementation, etc.).
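
A minimal sketch of the idea, assuming a simple periodic sampling scheme: only the first `sample_len` instructions of every `period` go through the (stand-in) detailed model, and the measured cycles-per-instruction is extrapolated to the whole run. The cost model and all names are hypothetical.

```python
def detailed_cycles(instr):
    """Stand-in for an expensive cycle-accurate model (hypothetical costs)."""
    return 10 if instr == "load" else 1


def sampled_cycle_estimate(stream, period=100, sample_len=10):
    """Estimate total cycles from periodic detailed samples."""
    sampled_instrs = 0
    sampled_cycles = 0
    for i, instr in enumerate(stream):
        if i % period < sample_len:
            # Detailed mode: pay for the cycle-level model.
            sampled_cycles += detailed_cycles(instr)
            sampled_instrs += 1
        # Otherwise functional mode: a real simulator would only apply the
        # instruction's architectural effects here (e.g. via DBT).
    cpi = sampled_cycles / sampled_instrs
    return cpi * len(stream)


# Hypothetical instruction stream: one load in every four instructions.
stream = ["load" if i % 4 == 0 else "alu" for i in range(10_000)]
true_cycles = sum(detailed_cycles(instr) for instr in stream)
estimate = sampled_cycle_estimate(stream)
print(f"true={true_cycles} estimate={estimate:.0f}")
```

With the default window of 10 instructions the estimate overshoots the true cycle count by around 14%, because the window does not line up with the four-instruction pattern; a window of 20 matches exactly for this stream. Choosing what to sample is therefore a real design problem, not a detail.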
14
Q

Challenges of Sampling Simulation

A
  • What to sample? (Program phase detection)
  • How to handle IO?
  • How to handle system calls (if emulating the OS layer)?
  • Resource sharing (and context switching)
  • Building an accurate statistical model.
  • The warming problem.
15
Q

The Warming Problem

A
  • How do you transition from one type of simulation to another?
  • For example, if a sampling simulator works in functional mode 90% of the time, but does 10% cycle accurate simulation, how do we restart cycle accurate simulation after running in functional mode? What is the state of the processor?
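
The effect of unknown microarchitectural state can be illustrated with a toy direct-mapped cache model. The sketch measures the same detailed sample twice: once starting from an empty ("cold") cache, and once after replaying the skipped addresses through the cache first. The cache geometry, the access pattern, and the helper `run_cache` are assumptions made for this example.

```python
def run_cache(addresses, tags, block_size=16):
    """Direct-mapped cache model; mutates `tags` and returns the miss count."""
    misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % len(tags)
        tag = block // len(tags)
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses


# Hypothetical workload: a loop touching the same 8 cache blocks 10 times.
loop = [i * 16 for i in range(8)] * 10
skipped, sample = loop[:40], loop[40:]  # functional phase, then detailed phase

# Cold start: the detailed sample begins with an empty cache.
cold_misses = run_cache(sample, [None] * 8)

# Warm start: replay the skipped addresses first to rebuild the cache state.
warm_tags = [None] * 8
run_cache(skipped, warm_tags)
warm_misses = run_cache(sample, warm_tags)

print(f"cold={cold_misses} warm={warm_misses}")
```

The cold run reports misses that the real machine would not see at that point in the program, so statistics taken straight after a functional phase are biased unless the state is warmed or restored.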
16
Q

Some Solutions to the Warming Problem

A
  • Restart with the previous state (fast forwarding): parts of the processor could still be simulated while fast forwarding, but this is slow or inaccurate.
  • Live points.
  • Reusable warm architectural/micro-architectural checkpoints. Multiple checkpoints allow simulation parallelism.
17
Q

Checkpointing

A

Two types of checkpointing:
- High level: Cache and directory tags, complete memory data (~10-200MB).
- Low level: Registers, TLB, branch predictor, cache tags, touched memory data (~150KB).

  • Could use both.
  • Wenisch et al. propose low-level checkpoints for uniprocessors and high-level checkpoints for multiprocessors.
18
Q

Accuracy

A
  • Really hard to know.
  • Some simulators are ‘verified’ to perform identically (to a given tolerance) to hardware (usually embedded processors). They usually carry a certificate from the manufacturer and are generally the most accurate.
  • Some simulators are the reference platform where silicon is derived from the simulator.
  • Modern microprocessors are complex, and even in a full simulator, there may be shortcuts taken/inaccuracies.
  • The real processor might have bugs, and the simulator might have bugs or be incomplete.
19
Q

Power Modelling

A
  • Some simulators support power/energy modelling, which is an active area of research.
  • There is no such thing as ‘cycle-accuracy’ for power.
  • It is usually based on some ‘power model’. Accuracy is very difficult to determine and is generally assessed by empirical experimentation.
20
Q

Summary

A
  • It’s a really hard problem.
  • Don’t blindly trust simulation and try it on real hardware if possible.
  • But it is a super-useful tool for all stages of development/design/debugging.
  • Fast simulators can really help to enable rapid prototyping and development, continue running legacy applications, support hardware/software co-design, and bridge the gap between hardware development and application development.
  • FPGA implementation is an alternative, but only the high-level design is simulated.