Metrics and Evaluation Flashcards

1
Q

Latency

A

The time it takes to get from starting a task to finishing it.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Throughput

A

How many things we can do per second. (or over some time period?)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Throughput = 1 / Latency ?

A

Not always! Pipelining, for example, can significantly improve throughput over 1 / Latency

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Two metrics of performance

A

Latency and throughput

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Speedup

A

Speedup = N where “X is N times faster than Y”

i.e., N = Speed(X) / Speed(Y) = Throughput(X) / Throughput(Y)

For latency, N = Latency(Y)/Lateny(X)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Performance as a function of latency and throughput

A

Performance is proportional to throughput, and proportional to 1/Latency

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the types of benchmarks?

A
  • Real applications
  • Kernels
  • Synthetic benchmarks
  • Peak performance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Pros and cons of real applications for benchmarking?

A

Pro: Most representative
Con: Most difficult to set up, you need an OS, graphics cards, etc, and these might not all be set up

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a benchmark?

A

A set of programs and input data agreed upon for performance measurement

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is an application kernel (re: benchmarking)? Pros and cons?

A

Only the most time-consuming part of an application. Pro: Better than using real applications
Con: but you still might be missing things that are necessary for running the kernel, i.e., a compiler

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a synthetic benchmark?

A

Designed to emulate an application kernel, but is simpler to compile

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is peak performance (re: benchmarking)?

A

At best, how many instructions per second? This is really only good for marketing!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

When should we use each type of benchmark?

A

When designing a new machine, synthetic benchmarks. We can run them at the design stage because they have minimal requirements, enabling us to choose the best design. However, they’re not great for reporting performance to others.

Once a prototype is built, we can use kernels!

Once we have a real machine that we are trying to sell someone, we should use real applications.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How do you compute the average of several speedup calculations? How do you NOT compute it?

A

Do NOT use the arithmetic mean! This is because speedups are ratios.

Say we get our speedup of 2 for one app because throughput changes from 2 to 4 and a speedup of 8 for another app because throughput changes from 1 to 8. Ideally we would calculate average speedup directly from these throughputs like this:

(8 + 4) / (1 + 2) = 12/3 = 4

Note this is NOT equal to the average of average speedups, (2 + 8) / 2 = 5.

If we only have the average speedups, we can use their geometric mean, i.e., for n speedups, the geometric mean = the nth root of the product of all n speedups. Geometric mean of 2 and 8 is sqrt(2*8) = 4.

We can use it because the speedup of geometric means of throughputs = the geometric mean of speedups.

sqrt(48)/sqrt(21) = 4 = sqrt(2*8)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Iron Law of Performance

A

CPU Time = # of instructions in program * CPI * clock cycle time, because:

instructions/program * cycles/instruction * seconds/cycle = seconds/program (after canceling cycles and instructions)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What determines the values of each component in the Iron Law of Performance/CPU time?

A

Number of instructions in program: algorithm, compiler, instruction set

CPI: instruction set, processor design

clock cycle time: processor design, circuit design, transistor physics

17
Q

How does computer architecture affect the Iron Law of Performance/CPU time?

A

Via instruction set and processor design.

Instruction set: more complex instructions will reduce # of instructions, but increase CPI. Vice versa for simple instructions.

Processor design: We can have processor with short clock cycle at the expense of greater CPI, or longer clock cycle with lower CPI.

A good design balances these tradeoffs.

18
Q

Iron Law for Unequal Instruction Times (different CPIs for different instruction types)

A

We need to combine instructions/program and cycles/instruction into:

The sum over all instructions of (the number of instructions of each type) * (CPI for that instruction type)

19
Q

Amdahl’s Law

A

We use this to calculate overall speedup when only some instructions are affected by a change.

overall speedup = 1/ ((1-FRAC_enh) + FRAC_enh/speedup_enh)

where FRAC_enh is percentage of execution TIME affected by the enhancement. TIME ONLY, not # of instructions.

20
Q

Implications of Amdahl’s Law

A

It’s usually better to have a small speedup on a larger percentage of execution time than a large speedup on a small percentage, i.e. MAKE THE COMMON CASE FAST!

Imagine an infinite speedup on a small portion of execution time to understand why (there is a hard cap on overall speedup)

21
Q

Lhadma’s Law

A

While Amdahl tells us to make the common case fast, we shouldn’t mess up the uncommon case too much!

While percent of execution time caps the benefit of any speedup, there is no such floor for slowdowns! If you slow down a small portion enough, you can make the program take arbitrarily long.

22
Q

What happens if we keep speeding up the same part of program?

A

Diminishing returns! After each improvement, that portion of the program becomes a smaller percentage of the total, so the overall speedups become smaller.