Single Processor Computing Flashcards

1
Q

Von Neumann Architecture

A

A single stream of instructions, executed in order, operating on data held in a single memory.

2
Q

Control Flow

A

A prescribed sequence of instructions

3
Q

Stored Program

A

Instructions and program data stored in the same memory

4
Q

Fetch, Execute, Store

A

Fetch: load the next instruction onto the processor.
Execute: carry out the operation.
Store: write the result back to memory.

5
Q

Direct To Memory Architecture

A

Allows data to be sent directly between memory and the processor.

Operands stay in memory rather than being moved into registers first.

6
Q

Out-of-order instruction handling

A

Instructions can be processed in a different order than the one the program specifies, as long as the results are unchanged.

7
Q

Why do modern processors use out-of-order instruction handling?

A

It keeps the pipeline full: independent instructions can execute while others wait for their operands, increasing performance.

8
Q

Pipelining

A

Splitting instruction execution into stages so the CPU can work on several instructions at the same time, each in a different stage.

9
Q

Execution time for pipelined instructions?

A

t(n) = (n + n_(1/2)) · x

x = time for one basic operation (one pipeline stage)
n = number of operations
n_(1/2) = the pipeline's setup (fill) overhead, expressed as an equivalent number of operations; it is also the problem size at which half of peak speed is reached.
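
A quick numeric check of the model (the 5-stage pipeline depth and n_half = stages − 1 below are illustrative assumptions, not values from the card):

```python
# Sketch of the pipeline timing model t(n) = (n + n_half) * x.
# x: time for one basic operation (one stage); n: number of operations;
# n_half: pipeline setup overhead expressed as an operation count.

def pipelined_time(n, n_half, x):
    """Time to stream n operations through a pipeline."""
    return (n + n_half) * x

def serial_time(n, stages, x):
    """Time without pipelining: each operation passes through all stages alone."""
    return n * stages * x

# Illustrative 5-stage pipeline with n_half = stages - 1 = 4:
n, stages, x = 1000, 5, 1.0
speedup = serial_time(n, stages, x) / pipelined_time(n, stages - 1, x)
print(round(speedup, 2))  # approaches the stage count (5) as n grows
```

For large n the setup term becomes negligible, which is why pipelining pays off on long streams of operations.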

10
Q

Instruction level parallelism (ILP)

A

Finding independent instructions and running them in parallel.

11
Q

Speculative Execution

A

Executing instructions past a conditional test before its outcome is known, assuming the test will turn out a particular way; if the guess was wrong, the speculative results are discarded.

12
Q

Prefetching

A

Data can be speculatively requested before any instruction needing it is actually encountered.

13
Q

Branch Prediction

A

Guessing whether a conditional instruction will evaluate to true and executing accordingly.
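
As an illustration, here is one classic prediction scheme (a 2-bit saturating counter, a common textbook example, not a description of any particular CPU):

```python
# Sketch of a 2-bit saturating-counter branch predictor. States 0-1
# predict "not taken", states 2-3 predict "taken"; a single surprise
# (e.g. a loop exit) is not enough to flip the prediction.

class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start in weakly "taken"

    def predict(self):
        return self.state >= 2  # True means: predict the branch is taken

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
# A loop-like branch: taken 8 times, not taken once, taken 8 more times.
outcomes = [True] * 8 + [False] + [True] * 8
correct = 0
for taken in outcomes:
    correct += (p.predict() == taken)
    p.update(taken)
print(correct, "of", len(outcomes), "predicted correctly")
```

Only the single not-taken outcome is mispredicted; the counter absorbs it without changing its prediction for the following iterations.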

14
Q

von Neumann bottleneck

A

Transferring data from memory takes more time than actually processing the data.

15
Q

How does the memory hierarchy address the von Neumann bottleneck?

A

By keeping recently and frequently used data in small, fast levels (registers, caches) close to the CPU, reducing the number of times the CPU has to wait for data from slow main memory.

16
Q

Latency

A

The delay between the processor issuing a request from a memory item and the item arriving.

17
Q

Bandwidth

A

The rate at which data arrives at its destination after the initial latency is overcome.

18
Q

What role do registers play in the memory hierarchy?

A

Registers are the fastest level of the memory hierarchy; they hold the operands the CPU is actively working on, so it can access them without going to the slower caches or memory.

19
Q

What role do cache systems play in the memory hierarchy?

A

A cache stores frequently accessed data that the CPU is likely to need again, so the CPU can avoid going all the way to main memory.

20
Q

Cache Tags

A

Info keeping track of the location and status of data in the cache.

21
Q

What are the 3 types of cache misses in single-processor systems?

A

Compulsory: Caused when data is accessed for the first time and is not yet present in the cache.

Capacity: Caused by data having been evicted because the cache cannot hold all of the problem's data.

Conflict: Caused by one data item being mapped to the same cache location as another.

22
Q

What additional type of cache miss occurs in multi-core systems?

A

Coherence: multiple processor cores have copies of the same data in their cache, but those copies become inconsistent due to changes.

23
Q

What two properties of typical program memory access patterns are exploited by cache systems?

A

Temporal Locality: When a memory location is accessed, it is more likely to be accessed again.

Spatial locality: When a memory location is accessed, it is likely that nearby memory locations will also be accessed.
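
Both properties can be made concrete with a toy line-granularity cache model; the line size (8 elements), capacity (4 lines, LRU) and 32×32 matrix are illustrative assumptions:

```python
# Count cache-line loads for row-wise vs column-wise traversal of a
# row-major N x N matrix, using a tiny LRU cache of whole lines.
from collections import OrderedDict

def line_loads(addresses, line_size=8, capacity=4):
    cache, loads = OrderedDict(), 0
    for a in addresses:
        line = a // line_size
        if line in cache:
            cache.move_to_end(line)   # temporal locality: reuse keeps it resident
        else:
            loads += 1                # line must be fetched from memory
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return loads

N = 32
row_wise = [i * N + j for i in range(N) for j in range(N)]  # stride 1
col_wise = [i * N + j for j in range(N) for i in range(N)]  # stride N
print(line_loads(row_wise), line_loads(col_wise))
```

Row-wise traversal loads each line once and uses all 8 of its elements (128 loads); column-wise traversal strides past a new line on every access and reloads every one of them (1024 loads).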

24
Q

LRU Replacement Policy

A

Least Recently Used: the cache line that has gone unused for the longest time is evicted first.

25
Q

FIFO Replacement Policy

A

First In First Out: the line that has been in the cache longest is evicted first, regardless of how recently it was used.

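
The two policies can be compared on a small trace; the 3-line capacity and the trace below are illustrative:

```python
# Count misses for LRU vs FIFO replacement on the same access trace.
from collections import OrderedDict

def misses(trace, capacity, policy):
    cache, count = OrderedDict(), 0
    for line in trace:
        if line in cache:
            if policy == "LRU":
                cache.move_to_end(line)  # refresh recency; FIFO ignores hits
        else:
            count += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict oldest-in (FIFO) / least recent (LRU)
    return count

trace = ["A", "B", "C", "A", "D", "A", "B"]
print("LRU:", misses(trace, 3, "LRU"), "FIFO:", misses(trace, 3, "FIFO"))
```

On this trace LRU takes 5 misses to FIFO's 6: LRU keeps the repeatedly used line A resident, while FIFO evicts A despite its recent use because it entered the cache first.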
26
Q

Cache Line

A

The smallest unit of data held in a cache; lines are loaded and evicted as a whole.

27
Q

What property of program memory accesses do cache lines exploit?

A

Spatial locality.

28
Q

What effect does stride have on cache line efficiency?

A

With stride-1 access, every element of each loaded cache line is used. A larger stride means only part of each line is used before it is evicted; once the stride reaches the line length, each transfer brings in a whole line of which only one element is used.

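
A sketch of the effect, assuming an 8-element cache line (an illustrative size):

```python
# How many cache lines are loaded when accessing every stride-th element,
# and what fraction of the loaded data is actually used.

def lines_touched(n_elements, stride, line_size=8):
    """Distinct cache lines loaded for n_elements strided accesses."""
    return len({(i * stride) // line_size for i in range(n_elements)})

# Accessing 64 elements at different strides:
for stride in (1, 2, 8):
    loads = lines_touched(64, stride)
    used = 64 / (loads * 8)  # fraction of loaded elements actually touched
    print(f"stride {stride}: {loads} lines loaded, fraction used {used}")
```

Stride 1 uses 100% of each line; stride 2 wastes half of every transfer; a stride equal to the line length wastes 7 of every 8 elements moved from memory.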
29
Q

Cache Mapping

A

Deciding where data is stored in the cache.

30
Q

Direct Mapped Cache

A

Each memory block is mapped to only one specific cache location.

31
Q

Fully associative cache

A

Each memory block can be mapped to any cache location.

32
Q

K-way set associative cache

A

The cache is divided into a number of sets, each containing k cache lines; a memory block maps to one set but may occupy any line within it.

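
The three mapping schemes are one formula with different k; the 64-line cache below is an illustrative size:

```python
# Which cache lines may hold a given memory block under each mapping.
# Direct mapped is the k = 1 special case; fully associative is k = num_lines.

def candidate_slots(block, num_lines=64, k=1):
    num_sets = num_lines // k
    s = block % num_sets                       # the set this block maps to
    return [s * k + way for way in range(k)]   # the k lines in that set

print(len(candidate_slots(100, k=1)))    # direct mapped: 1 possible slot
print(len(candidate_slots(100, k=4)))    # 4-way set associative: 4 slots
print(len(candidate_slots(100, k=64)))   # fully associative: any of 64
```

More ways reduce conflict misses (fewer blocks forced into the exact same slot) at the cost of checking more tags per lookup.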
33
Q

Cache Memory

A

Small, fast memory used to store frequently accessed data.

34
Q

Dynamic Memory

A

Main memory (DRAM): holds the data and instructions of currently running programs; built from capacitor cells that must be periodically refreshed, making it denser and cheaper but slower than the static RAM used for caches.

35
Q

Stall

A

Delay in the execution of instructions caused by a dependency between instructions.

36
Q

Cache Miss

A

When the requested data is not available in the cache and has to be fetched from main memory.

37
Q

Prefetch data stream

A

When the processor predicts what data will be accessed next and fetches that data into the cache ahead of time.

38
Q

What is a hardware prefetch and how is it different from a prefetch intrinsic?

A

Hardware prefetch: issued automatically by the hardware when it detects a predictable access pattern.
Prefetch intrinsic: issued explicitly by the software through a compiler intrinsic.

39
Q

What is Little's Law? What does it tell us about the effect of latency on computing systems?

A

Concurrency = Bandwidth × Latency. The higher the latency, the more independent requests must be in flight to sustain a given bandwidth; if that concurrency cannot be supplied, requests build up and performance drops.

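
A worked example with illustrative numbers:

```python
# Little's Law: concurrency = bandwidth x latency. This many bytes must
# be in flight at all times to keep the memory system fully busy.

def required_concurrency(bandwidth_bytes_per_ns, latency_ns):
    return bandwidth_bytes_per_ns * latency_ns

# Illustrative system: 100 GB/s (= 100 bytes/ns) at 100 ns latency.
in_flight = required_concurrency(100, 100)
print(in_flight, "bytes =", in_flight // 64, "64-byte cache lines in flight")
```

A single core cannot usually keep 150+ outstanding cache-line requests by itself, which is why latency-hiding mechanisms (prefetching, out-of-order execution, multiple threads) matter.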
40
Q

Why are memory banks used in memory systems?

A

To increase memory bandwidth: consecutive addresses are interleaved across banks so several requests can be served in parallel. If multiple requests target the same bank (a bank conflict), they must be handled serially, which reduces bandwidth.

41
Q

What is a bank conflict?

A

When two memory operations target the same memory bank before the bank has finished serving the first request, forcing the operations to be serialized.

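
A sketch of bank interleaving, assuming addresses map to banks by address mod 8 (an illustrative scheme and bank count):

```python
# Which banks a strided access pattern hits, assuming simple interleaving:
# bank = address mod num_banks.

def banks_hit(stride, num_accesses=8, num_banks=8):
    return {(i * stride) % num_banks for i in range(num_accesses)}

print(sorted(banks_hit(1)))   # stride 1 spreads over all 8 banks: full bandwidth
print(sorted(banks_hit(8)))   # stride 8 hits only bank 0: fully serialized
```

A stride equal to (a multiple of) the number of banks funnels every access into one bank, serializing them; unit stride spreads the requests across all banks.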
42
Q

What is the TLB? What function does it have in the memory hierarchy?

A

Translation Look-aside Buffer: a cache of frequently used page table entries, providing fast address translation. When a program needs a memory location, the processor first checks the TLB; on a miss it must look up the page table in main memory, which is much slower.

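
A toy model of the TLB as a small LRU cache of page translations; the 8-entry size is deliberately tiny for illustration (real TLBs are larger, see card 44) and 4096 bytes is a typical page size:

```python
# Count TLB misses for two access patterns: sequential (good spatial
# locality) vs hopping between pages (poor spatial locality).
from collections import OrderedDict

def tlb_misses(addresses, entries=8, page_size=4096):
    tlb, miss = OrderedDict(), 0
    for a in addresses:
        page = a // page_size
        if page in tlb:
            tlb.move_to_end(page)          # translation reused: fast path
        else:
            miss += 1                      # must walk the page table in memory
            tlb[page] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)    # evict least recently used entry
    return miss

seq = list(range(0, 16 * 4096, 8))                        # 8192 sequential accesses
scattered = [(i * 37) % 16 * 4096 for i in range(8192)]   # 8192 page-hopping accesses
print(tlb_misses(seq), tlb_misses(scattered))
```

The same number of accesses over the same 16 pages gives 16 TLB misses when pages are visited one at a time, but a miss on every single access when the pattern keeps cycling through more pages than the TLB can hold.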
43
Q

What property would a program have that would cause performance degradation due to TLB misses?

A

Poor spatial locality: its accesses are scattered across many pages, so page translations are rarely reused.

44
Q

How big is a typical TLB?

A

Between 64 and 512 entries.

45
Q

What are cache-aware and cache-oblivious algorithms?

A

Cache-aware: designed to take advantage of specific, known cache sizes.
Cache-oblivious: designed to perform well across a wide range of cache sizes without requiring any knowledge of the cache parameters.

46
Q

What is loop tiling and what is it used for?

A

Breaking a loop over a large data set into nested loops over smaller blocks (tiles), so that the data for each tile fits in cache and is reused before being evicted, increasing performance.

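
A sketch of tiling applied to a matrix transpose; the tile size of 4 is an illustrative parameter one would tune to the cache size:

```python
# Tiled matrix transpose: the outer loops walk over tiles, the inner
# loops walk within one tile, so both arrays are touched block by block.

def transpose_tiled(a, n, tile=4):
    b = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):                      # loop over tile rows
        for jj in range(0, n, tile):                  # loop over tile columns
            for i in range(ii, min(ii + tile, n)):    # loop within the tile
                for j in range(jj, min(jj + tile, n)):
                    b[j][i] = a[i][j]
    return b

n = 6
a = [[i * n + j for j in range(n)] for i in range(n)]
b = transpose_tiled(a, n)
print(b[2][5] == a[5][2])  # True: result matches the untiled transpose
```

The tiled and untiled loops compute exactly the same result; only the traversal order changes, so each tile of both the source and destination stays cache-resident while it is being used.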
47
Q

What is the motivation for multi-core architectures?

A

Separate cores can work on unrelated tasks, or introduce data parallelism.

48
Q

Define Core, Socket, and Node

A

Core: an independent processing unit on a CPU chip, executing its own instruction stream.
Socket: the physical connector that holds one CPU (a multi-core chip) on the motherboard.
Node: a unit containing one or more sockets that access the same shared memory.

49
Q

Cache Coherence

A

Ensuring that all cached copies of a data item agree with each other and with main memory.

50
Q

False Sharing

A

When multiple threads access different variables that happen to lie in the same cache line, causing the line to bounce between the cores' caches as if the data were actually shared.

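
The condition can be sketched as address arithmetic, assuming a 64-byte cache line (a common size):

```python
# Two "different variables" still falsely share if their byte offsets
# fall within the same cache line; padding them a line apart avoids it.

LINE = 64  # assumed cache line size in bytes

def same_line(offset_a, offset_b, line=LINE):
    return offset_a // line == offset_b // line

# Two adjacent 8-byte per-thread counters: false sharing.
print(same_line(0, 8))       # True
# Counters padded to one per cache line: no false sharing.
print(same_line(0, LINE))    # False
```

This is why per-thread counters in performance-critical code are often padded out to one cache line each, trading a little memory for the absence of coherence traffic.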
51
Q

NUMA

A

Non-uniform memory access: for a processor running on some core, the memory attached to its own socket is faster to access than the memory attached to another socket.

52
Q

First touch phenomenon

A

Dynamically allocated memory is not physically allocated until it is first written to; the page is then placed in the memory of the socket whose core first touched it.

53
Q

What is arithmetic intensity and how do we compute it?

A

The number of operations per memory access: f(n) / n, where f(n) is the number of operations the algorithm performs and n is the number of data items it operates on.

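
Two worked examples using the usual textbook operation counts (the kernels and counts are standard; the value of n is illustrative):

```python
# Arithmetic intensity f(n) / n for two common kernels.
# Vector addition: n operations over 2n inputs + n outputs -> O(1).
# Dense matrix multiply: ~2 n^3 operations over 3 n^2 items -> O(n).

def intensity(ops, data_items):
    return ops / data_items

n = 1000
vec_add = intensity(n, 3 * n)            # stays ~1/3 regardless of n: memory bound
matmul = intensity(2 * n**3, 3 * n**2)   # grows with n: can become compute bound
print(round(vec_add, 2), round(matmul, 1))
```

Kernels whose intensity is constant in n are doomed to be memory bound; kernels whose intensity grows with n can, with good cache use, run at the processor's arithmetic peak.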
54
Q

What is the roofline model? What defines the roofline?

A

A plot of attainable performance against arithmetic intensity. The roofline is defined by two hardware limits: a slanted part set by peak memory bandwidth and a flat part set by peak arithmetic performance.

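
The roofline itself is just the minimum of the two limits; the peak numbers below are illustrative, not from any particular machine:

```python
# Roofline: attainable performance = min(peak compute, bandwidth x intensity).

def roofline(intensity, peak_flops=100.0, peak_bw=25.0):
    """Attainable GFLOP/s for a kernel of the given arithmetic intensity
    (FLOPs per byte), with illustrative peaks of 100 GFLOP/s and 25 GB/s."""
    return min(peak_flops, peak_bw * intensity)

print(roofline(1.0))    # 25.0  -> on the bandwidth slope: memory bound
print(roofline(10.0))   # 100.0 -> under the flat roof: compute bound
```

The "ridge point" where the two limits meet (here at intensity 100/25 = 4) separates kernels that should optimize memory traffic from kernels that should optimize arithmetic.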
55
Q

How can we use the roofline model to optimize code performance?

A

By locating a code's arithmetic intensity on the plot, we can see whether it is limited by memory bandwidth or by arithmetic throughput, and therefore whether to raise its intensity (for example by tiling) or to improve its use of the functional units.