Week 10 & 11 - CPU Performance & Design and Pipelining Flashcards

Question 1

Q

What are the 2 types of processor?

Answer

A

CPU: Central Processing Unit (very fast clock, e.g. 3GHz, but only a few cores, e.g. 8)
GPU: Graphics Processing Unit (slower clock, around 1GHz or below, but many cores (100+))

CPUs are latency-focused, and GPUs are throughput-focused.
CPU software is serial, GPU software is parallel.

Question 2

Q

What is response time and throughput?

Answer

A

Response time: how long it takes to do a task.
Throughput: total work done per unit of time (e.g. tasks per hour).

Question 3

Q

lab exam question

What is the formula for performance?
Example: 10s on A, 15s on B

Answer

A

Performance = 1 / Execution Time
Performance A / Performance B = Execution Time B / Execution Time A = 15s / 10s = 1.5

A is then 1.5 times faster than B.
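
The ratio above can be checked with a minimal Python sketch. The function name `relative_performance` is a hypothetical helper chosen for illustration; the 10 s and 15 s figures are the ones from the card.

```python
def relative_performance(time_a: float, time_b: float) -> float:
    """Return Performance_A / Performance_B, which equals time_b / time_a."""
    return time_b / time_a


# Example from the card: 10 s on A, 15 s on B.
speedup = relative_performance(10.0, 15.0)
print(f"A is {speedup:.2f} times faster than B")
```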

Question 4

Q

What are elapsed time and CPU time?

Answer

A

Elapsed time = response time: the total time to complete a task, including everything (processing, I/O, OS overhead, idle time). This is the one that determines system performance.

CPU time: time spent processing a given job (excludes I/O time and time spent on other jobs). It comprises user CPU time and system CPU time.

Question 5

Q

What does clock rate f depend on?

Answer

A

Clock rate f depends on implementation technology used and CPU organization used.

Clock frequency (rate) f: cycles per second
Clock cycle time => C = 1/f

eg.
f=1GHz
C = 1/f = (1/10^9) sec = 1ns

Question 6

Q

What is performance improved by?

Answer

A

By reducing the number of clock cycles or by increasing the clock rate.
CPU time = Clock cycles / Clock rate
Clock rate must often be traded off against cycle count.

Question 7

Q

Formula for CPU time?

Answer

A

clock cycles * clock cycle time
clock cycles / clock rate
Icount * CPI * clock cycle time
Icount * CPI / clock rate
or even
Instructions / Program *
Clock cycles / Instruction *
Seconds / Clock cycle
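
As a quick check that these forms agree, here is a small Python sketch. The program size, CPI and clock rate in the example are made-up values for illustration, not figures from the course.

```python
def cpu_time_from_cycles(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


def cpu_time_from_cpi(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Icount * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1e9 instructions, CPI of 2, 2 GHz clock.
icount, cpi, rate = 1e9, 2.0, 2e9
print(cpu_time_from_cpi(icount, cpi, rate))            # seconds
print(cpu_time_from_cycles(icount * cpi, rate))        # same value via total cycles
```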

Question 8

Q

What is a micro-operation?

Answer

A

An elementary hardware operation that can be carried out in one clock cycle.

Examples: register transfer, arithmetic, logic

Question 9

Q

Instructions can be divided into four classes according to their CPI (classes A, B, C, D). P1 has a clock rate of 2.5GHz and CPIs of 1, 2, 3, 3; P2 has a clock rate of 3GHz and CPIs of 2, 2, 2, 2. Instruction count is 1.0E6: 10% A, 20% B, 50% C, 20% D. Which is faster: P1 or P2?

a) global CPI
b) clock cycles for both?

Answer

A

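Instructions in class X = 1E6 * (fraction in class X)
e.g. Class A = 1E6 * 0.1 = 10^5

CPU time = Icount * CPI / clock rate
CPU time = SUM over classes (Icount of class * CPI of class) / clock rate

b) Clock cycles (P1) = 10^5 * 1 + 2 * 10^5 * 2 + 5 * 10^5 * 3 + 2 * 10^5 * 3 = 2.6 * 10^6
Clock cycles (P2) = 10^6 * 2 = 2.0 * 10^6

CPU time (P1) = 2.6 * 10^6 / (2.5 * 10^9) = 10.4 x 10^(-4) s
CPU time (P2) = 2.0 * 10^6 / (3 * 10^9) = 6.67 x 10^(-4) s

a) CPI = CPU time * clock rate / Icount = clock cycles / Icount
CPI (P1) = 2.6 * 10^6 / 10^6 = 2.6
CPI (P2) = 2.0 * 10^6 / 10^6 = 2.0

P2 is faster.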
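
The same calculation for both processors can be sketched in Python. The data structures and the helper names `clock_cycles` and `summarize` are illustrative choices; the numbers are the ones given in the question.

```python
INSTRUCTION_COUNT = 1_000_000
MIX_PERCENT = {"A": 10, "B": 20, "C": 50, "D": 20}  # share of each class

PROCESSORS = {
    "P1": {"clock_rate_hz": 2.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 3}},
    "P2": {"clock_rate_hz": 3.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}


def clock_cycles(cpi_by_class):
    """Sum over classes of (instructions in class * CPI of class)."""
    return sum(
        INSTRUCTION_COUNT * MIX_PERCENT[c] // 100 * cpi_by_class[c]
        for c in MIX_PERCENT
    )


def summarize(name):
    """Return (clock cycles, global CPI, CPU time in seconds) for a processor."""
    p = PROCESSORS[name]
    cycles = clock_cycles(p["cpi"])
    return cycles, cycles / INSTRUCTION_COUNT, cycles / p["clock_rate_hz"]


for name in PROCESSORS:
    cycles, cpi, seconds = summarize(name)
    print(f"{name}: cycles={cycles}, CPI={cpi:.2f}, CPU time={seconds:.3e} s")
```

The processor with the smaller CPU time is the faster one, regardless of which has the higher clock rate.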

Question 10

Q

Reducing power example:

Suppose a new CPU has:
85% of capacitive load of old CPU
15% voltage and 15% frequency reduction

What’s the power now, relative to the old CPU?

Answer

A

P new / P old =
(C old x 0.85 x (V old x 0.85)^2 x F old x 0.85)
/ (C old x V old^2 x F old) = 0.85^4 = 0.52

The new CPU uses about 52% of the old CPU's power.
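
A minimal Python sketch of the same ratio, assuming dynamic power is proportional to capacitive load x voltage^2 x frequency (the model the formula above uses). The function name `power_ratio` is hypothetical.

```python
def power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """P_new / P_old when power is proportional to C * V^2 * f."""
    return cap_scale * volt_scale ** 2 * freq_scale


# 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = power_ratio(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")
```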

Question 11

Q

Formulas for MIPS?

Millions of Instructions per Second

Answer

A

MIPS = Instruction count / (Execution time * 1E6)
= Instruction count / ((Instruction count x CPI / Clock rate) x 1E6)
= Clock rate / (CPI * 1E6)
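
Both forms of the MIPS formula can be sketched in Python as below. The helper names are hypothetical, and the example reuses a 3GHz clock with a CPI of 2 purely as sample values.

```python
def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 1e6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 1e6)."""
    return clock_rate_hz / (cpi * 1e6)


# Sample values: 3 GHz clock, CPI of 2, one million instructions.
rate, cpi, icount = 3e9, 2.0, 1e6
exec_time = icount * cpi / rate
print(mips_from_clock(rate, cpi))
print(mips_from_time(icount, exec_time))  # agrees with the line above
```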

Question 12

Q

Latency and throughput of a pipeline.

Fetch 100ps, Decode 200ps, Execute 150ps, Memory 350ps, Writeback 50ps

a) For a non-pipelined processor, what is the cycle time, latency and throughput?
b) same for pipelined

Answer

A

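a) Cycle Time = 100ps + 200ps + 150ps + 350ps + 50ps = 850ps
Latency = 850ps
Throughput = 1 instruction / 850ps

b) Cycle Time = max(100, 200, 150, 350, 50) + 20 = 370ps (the extra 20ps is the pipeline register delay this answer assumes; the question does not state it)
Latency = 5 * 370ps = 1850ps
Throughput = 1 instruction / 370ps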
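
The two cases can be compared with a short Python sketch. The 20ps register delay is an assumption carried over from the answer above, and the function names are hypothetical.

```python
STAGE_DELAYS_PS = [100, 200, 150, 350, 50]  # Fetch, Decode, Execute, Memory, Writeback
REGISTER_DELAY_PS = 20  # assumed pipeline register overhead, not given in the question


def non_pipelined(stage_delays):
    """Return (cycle time, latency) in ps: one long cycle covers every stage."""
    cycle = sum(stage_delays)
    return cycle, cycle


def pipelined(stage_delays, register_delay):
    """Return (cycle time, latency) in ps: the slowest stage sets the clock."""
    cycle = max(stage_delays) + register_delay
    latency = cycle * len(stage_delays)
    return cycle, latency


for label, (cycle, latency) in {
    "non-pipelined": non_pipelined(STAGE_DELAYS_PS),
    "pipelined": pipelined(STAGE_DELAYS_PS, REGISTER_DELAY_PS),
}.items():
    print(f"{label}: cycle={cycle}ps, latency={latency}ps, throughput=1/{cycle}ps")
```

Pipelining makes each individual instruction slower (higher latency) but finishes instructions more often (higher throughput).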

Question 13

Q

5 stages of pipelining?

Answer

A

Instruction execution in 5 stages:
○ Instruction fetch (IF)
○ Instruction Decode (ID)
○ ALU operation (EX)
○ Memory Access (MEM)
○ Write Back result to register file (WB)

Make sure you can draw the pipeline diagram: with k stages and n instructions, execution takes k + n - 1 cycles.
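
For practice, a rough Python sketch that prints such a diagram for an ideal pipeline (no stalls). The function name `pipeline_diagram` and the `.` placeholder for idle cycles are illustrative choices.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions, stages=STAGES):
    """Return one text row per instruction; columns are clock cycles."""
    k = len(stages)
    total_cycles = k + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        cells = ["."] * total_cycles
        for s, name in enumerate(stages):
            cells[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append(f"I{i + 1}: " + " ".join(f"{c:<3}" for c in cells).rstrip())
    return rows


for row in pipeline_diagram(3):
    print(row)
```

With 5 stages and 3 instructions each row has 5 + 3 - 1 = 7 columns, one per clock cycle.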

Question 14

Q

What hazards are there when it comes to pipelining?

Answer

A

- Different instructions need the same resources (structural hazards)
- Different instructions need results from other instructions (data hazards)
- Different instructions execute depending on other instructions (branch instructions, control hazards)

Question 15

Q

The instruction pipeline has the following stages: instruction fetch (IF), instruction decode (ID), operand fetch (OF), perform operation (PO) and writeback (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards.
The number of clock cycles required for completion of execution of the sequence of instructions is ______.

Answer

A

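First instruction (a 3-cycle PO one): 1 + 1 + 1 + 3 + 1 = 7 clock cycles
After that the PO stage is the bottleneck, so every later instruction adds only its PO cycles:
39 instr → 39 x 3 cycles = 117
35 instr → 35 x 2 cycles = 70
25 instr → 25 x 1 cycle = 25
Total = 7 + 117 + 70 + 25 = 219 cycles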
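
The count can be double-checked with a small simulation in Python. This is a sketch under a simplifying assumption: an instruction enters a stage as soon as it has left the previous stage and the earlier instruction has left this one. The function name `total_cycles` is hypothetical.

```python
def total_cycles(po_cycles_per_instruction):
    """Cycles to finish all instructions in an IF, ID, OF, PO, WB pipeline.

    IF, ID, OF and WB take 1 cycle; PO takes the given number of cycles.
    """
    prev_finish = [0] * 5  # finish time of the previous instruction in each stage
    for po in po_cycles_per_instruction:
        durations = [1, 1, 1, po, 1]
        t = 0
        finish = []
        for stage, duration in enumerate(durations):
            start = max(t, prev_finish[stage])  # wait for the stage to be free
            t = start + duration
            finish.append(t)
        prev_finish = finish
    return prev_finish[-1]


# 40 instructions with 3-cycle PO, 35 with 2-cycle PO, 25 with 1-cycle PO.
print(total_cycles([3] * 40 + [2] * 35 + [1] * 25))
```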

Question 16

Q

The stage delays in a 4-stage pipeline are 800, 500, 400, and 300 picoseconds. The first stage (with the delay of 800 ps) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds.
The throughput increase of the pipeline is _____ %.

Answer

A

In the initial 4-stage pipeline, the slowest stage has a delay of 800 picoseconds. So, the throughput is 1/800 picoseconds.

In the new pipeline, the stages have delays of 600, 350, 500, 400, and 300 picoseconds. So, the new throughput is 1/600 picoseconds.

The throughput increase is given by the formula:
Throughput increase = (New - Old) / Old = (1/600 - 1/800) / (1/800) = 800/600 - 1

Throughput increase = 0.3333 = 33.33%
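
The same result in a few lines of Python, as a sketch that ignores any pipeline register delay (the question gives none). The function name is hypothetical.

```python
def throughput_increase_percent(old_stage_delays, new_stage_delays):
    """Percent gain in throughput, where throughput = 1 / slowest stage delay."""
    old_cycle = max(old_stage_delays)
    new_cycle = max(new_stage_delays)
    return (old_cycle / new_cycle - 1) * 100


old = [800, 500, 400, 300]
new = [600, 350, 500, 400, 300]  # the 800 ps stage split into 600 ps and 350 ps
print(f"{throughput_increase_percent(old, new):.2f}%")
```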

Question 17

Q

Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, …, I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9.
If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is ___ ?

Answer

A

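Cycle time = max of all stage delays + buffer delay = max(5, 7, 10, 8, 6) + 1 = 11ns
When I4 takes the branch, control jumps to the target instruction I9, so only I1 to I4 and I9 to I12 complete (8 instructions). The branch outcome is known only once I4 finishes EI, so the 3 instructions fetched behind it (while I4 is in DI, FO and EI) are discarded, which costs 3 stall cycles. Total number of cycles = (5 + 8 - 1) + 3 = 15. Since 1 clock cycle = 11ns, total time = 15 * 11ns = 165ns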
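
A Python sketch of this calculation, assuming the branch is resolved at the end of EI (the 4th stage), so the stall count is one less than that stage number. The function name and parameter names are hypothetical.

```python
def branch_program_time_ns(stage_delays, buffer_delay, executed, resolve_stage):
    """Total time for a pipelined program with one taken branch.

    executed: number of instructions that actually complete.
    resolve_stage: 1-based stage at whose end the branch outcome is known (assumed).
    """
    cycle_time = max(stage_delays) + buffer_delay
    k = len(stage_delays)
    stalls = resolve_stage - 1          # wrongly fetched instructions are discarded
    cycles = k + executed - 1 + stalls  # ideal pipeline cycles plus stall cycles
    return cycles * cycle_time


# FI, DI, FO, EI, WO delays in ns; I1..I4 and I9..I12 execute (8 instructions).
print(branch_program_time_ns([5, 7, 10, 8, 6], 1, executed=8, resolve_stage=4))
```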
