CSCI 223 Midterm 2 Flashcards

1
Q

3 mechanisms in procedures

A
  1. passing control (call, return)
  2. passing data (using either registers or memory)
  3. memory management (stack - allocate/deallocate)
2
Q

x86-64 stack grows toward

A

lower memory addresses (down)

3
Q

register ? contains address of top of stack

A

%rsp

4
Q

stack operation: pushq Src

A

fetch (read) operand at Src, decrement %rsp by 8 (or 4 on a 32-bit machine), write operand at address given by %rsp

5
Q

register ? contains address of the frame pointer

A

%rbp

6
Q

stack operation: popq Dest

A

read value at address given by %rsp, increment %rsp by 8 (or 4 on 32-bit machine), store value at Dest (must be register)

7
Q

procedure control flow

A

use stack
procedure call: push return address on stack, jump to label
procedure ret: pop address from stack, jump to address

8
Q

return address

A

address of the next instruction right after call

9
Q

return value

A

%rax (64-bit) or %eax (32-bit)

10
Q

stack allocated in ?

A

frames (state for single procedure instantiation)

11
Q

contents of stack frames

A

return information, local storage (if needed), temporary storage (if needed)

12
Q

management of stack frames

A

space allocated when enter procedure (“set-up” code; includes push by call instruction) and deallocated when return (“finish” code; includes pop by ret instruction)

13
Q

we create a new stack frame when

A

we enter a new function

14
Q

calling conventions

A

caller saved and callee saved

15
Q

caller saved

A

caller saves temporary values in its frame before the call

16
Q

callee saved

A

callee saves temporary values in its frame before using, then restores them before returning to caller

17
Q

%rax is the ?, ?-saved, and can be modified by ?

A

return value, caller-saved, procedure

18
Q

%rdi, %rsi, %rdx, %rcx, %r8, %r9 are the ?, ?-saved, and can be modified by ?

A

arguments, caller-saved, procedure

19
Q

%r10, %r11 are ?-saved, and can be modified by ?

A

caller-saved, procedure

20
Q

%rbx, %r12, %r13, %r14 are ?-saved, and can be modified by ?

A

callee-saved, callee must save and restore

21
Q

%rbp is the ?, ?-saved, and can be modified by ?

A

frame pointer, callee-saved, callee must save and restore

22
Q

%rsp is the ?, ?-saved, and can be modified by ?

A

stack pointer, callee-saved (special form), restored to original value upon exit from procedure

23
Q

recursion is handled by ? calling conventions

A

normal (in the use of the stack)

24
Q

char/unsigned char

A

1 byte

2^8 (256) numbers

25
short/unsigned short
2 bytes | 2^16 numbers
26
int/unsigned int
4 bytes | 2^32 numbers
27
long/unsigned long
8 bytes | 2^64 numbers
28
integer encoding standards: unsigned integers
like converting binary to decimal: the value is the sum over all positions i of bit_i times 2^i
29
integer encoding standards: two's complement for signed integers
the most significant bit is the sign bit (0 = non-negative, 1 = negative) and carries weight -2^(w-1); the remaining bits carry positive weights 2^i, as in binary-to-decimal conversion
30
to find the negative version of an integer
1. find bit pattern for positive version 2. flip all bits 3. add 1
31
for a 16-bit integer, UMax (unsigned)
0xFFFF = 2^16-1 (need to know context to know difference between -1 and 2^16-1)
32
for a 16-bit integer, TMax (signed)
0x7FFF = 2^15-1
33
for a 16-bit integer, TMin (signed)
0x8000 = -2^15
34
for a 16-bit integer, -1
0xFFFF = -1 (need to know context to know difference between -1 and 2^16-1)
35
for a 16-bit integer, 0
0x0000 = 0 (always all zeroes for signed and unsigned)
36
the magnitude of TMin is equal to
TMax + 1
37
UMax is equal to
2*TMax + 1
38
equivalence of unsigned/signed numeric values
same encodings for nonnegative values
39
uniqueness of unsigned/signed numeric values
every bit pattern represents a unique integer value; each representable integer has a unique bit encoding
40
constants by default are considered to be ? integers unless
signed; marked with a U suffix (e.g. 42U)
41
sign extension
given w-bit signed integer x, convert it to w+k bit integer with same value
42
rule for sign extension
make k copies of sign bit
43
floating point numbers
encode rational numbers of the form v = x*2^y
44
standard for representing floating point numbers (but not all)
IEEE 754
45
single-precision for IEEE 754
32-bit
46
double precision for IEEE 754
64-bit
47
IEEE 754 sign bit is
MSB
48
IEEE 754 next set of bits
exp
49
IEEE 754 last set of bits
frac
50
IEEE 754 formula
(-1)^s (1.frac) 2^(exp - bias)
51
divide by two by shifting
right
52
multiply by two by shifting
left
53
IEEE 754 single precision bit allocation
sign bit = 1 bit | exp = 8 bits | frac = 23 bits
54
IEEE 754 double precision bit allocation
sign bit = 1 bit | exp = 11 bits | frac = 52 bits
55
normalized values for IEEE 754
when exp is neither 000...0 nor 111...1
56
bias for exp
2^(k-1) - 1
57
bias for exp for 32 bit, 64 bit
127, 1023
58
denormalized values for IEEE 754
exp = 000...0 cases: frac = 000...0 represents the zero value (+0 or -0 depending on sign bit); frac =/= 000...0 represents numbers very close to 0.0
59
special values for IEEE 754
exp = 111...1 cases: frac = 000...0 represents infinity (positive or negative depending on sign bit); frac =/= 000...0 represents Not-a-Number (NaN; when no numeric value can be determined)
60
IEEE 754 zero
```
exp  = 00...00
frac = 00...00
```
61
IEEE 754 smallest positive denormalized
```
exp  = 00...00
frac = 00...01
```
62
IEEE 754 largest denormalized
```
exp  = 00...00
frac = 11...11
```
63
IEEE 754 smallest positive normalized
```
exp  = 00...01
frac = 00...00
```
64
IEEE 754 one
```
exp  = 01...11
frac = 00...00
```
65
IEEE 754 largest normalized
```
exp  = 11...10
frac = 11...11
```
66
floating point zero is (the same/different from) the integer zero
the same
67
rounding modes
towards zero, round down (towards -inf), round up (towards +inf), nearest even (default)
68
int to double conversion
exact conversion, as long as the int fits in double's 53-bit significand (true for any int of word size <= 53 bits)
69
int to float conversion
will round according to rounding mode
70
double/float to int conversion
truncates the fractional part (rounds toward zero); behavior is undefined when the value is out of range or NaN
71
key features of RAM
traditionally packaged as a chip, basic storage unit is normally a cell (one bit per cell)
72
RAM stands for
Random-Access Memory
73
static RAM (SRAM)
used for cache; each cell stores a bit with a four or six-transistor circuit; retains value indefinitely, as long as it is kept powered; relatively insensitive to electrical noise (EMI), radiation, etc.; faster and more expensive than DRAM
74
dynamic RAM (DRAM)
used for main memory; each cell stores bit with capacitor; one transistor per bit; value must be refreshed every 10-100ms; more sensitive to disturbances than SRAM; slower and cheaper than SRAM
75
DRAM and SRAM are ? memories
volatile (lose info if powered off)
76
nonvolatile memories + examples
retain value even if powered off (ROM, PROM, EPROM, EEPROM, flash memory)
77
ROM
read-only memory; programmed during production
78
PROM
programmable ROM; can be programmed once after manufacturing
79
EPROM
erasable PROM; can be bulk erased (UV, X-ray)
80
EEPROM
electronically erasable PROM; on our system; electronic erase capability
81
flash memory
EEPROMs with partial (sector) erase capability; wears out after about 100,000 erase cycles
82
uses for nonvolatile memories
firmware programs stored in ROM, solid state disks, disk caches
83
bus
a collection of parallel wires that carry address, data, and control signals; typically shared by multiple devices
84
memory read transaction
movl A, %eax
1. CPU places address A on the memory bus
2. main memory reads A from the memory bus, retrieves word x, and places it on the bus
3. CPU reads word x from the bus and copies it into register %eax
85
memory write transaction (more complicated)
movl %eax, A
1. CPU places address A on the bus; memory reads it and waits for the corresponding data word to arrive
2. CPU places data word y on the bus
3. main memory reads data word y from the bus and stores it at address A
86
explain the power wall
CPU clock rates rose steadily until about 2003, then leveled off: power limits forced designers to add more cores at lower clock speeds instead of raising clock rates further
87
the CPU-memory performance gap
widens between DRAM, disk, and CPU speeds; the key to bridging this gap is locality
88
locality
programs tend to use data and instructions with addresses near or equal to those they have used recently
89
temporal locality
recently referenced items are likely to be referenced again in the near future
90
spatial locality
items with nearby addresses tend to be referenced close together in time
91
referencing array elements in succession is an example of ? locality
spatial
92
referencing the variable "sum" in each iteration is an example of ? locality
temporal
93
referencing instructions in sequence is an example of ? locality
spatial
94
cycling through a loop repeatedly is an example of ? locality
temporal
95
stride
the distance between two consecutively accessed addresses (e.g. stride-1 = adjacent elements, stride-N = N elements apart)
96
CPU registers hold ? retrieved from ?
words; L1 cache
97
L1/L2 cache holds ? retrieved from ?
cache lines; main memory
98
main memory holds ? retrieved from ?
disk blocks ("pages"); local disks
99
local disks hold ? retrieved from ?
files; disks on remote network servers
100
memory hierarchy
registers, L1 cache, L2 cache, main memory, local secondary storage, remote secondary storage
101
cache
a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device
102
fundamental idea of memory hierarchy
for each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1
103
why do memory hierarchies work?
because of locality, programs tend to access the data at level k more often than they access the data at level k+1. thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit
104
big idea of memory hierarchy
creates an illusion of a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top
105
cache hit
data block b is needed and found in the cache
106
cache miss
data block b is needed, not found in the cache, fetched from memory, then stored in cache
107
types of cache misses
cold (compulsory), conflict, capacity
108
cold/compulsory cache miss
occurs on the first access to a block, because the cache starts out empty
109
conflict cache miss
occurs when the level k cache is large enough, but multiple data objects all map to the same level k block
110
capacity cache miss
occurs when the set of active cache blocks (working set) is larger than the cache
111
cache memories
small, fast SRAM-based memories managed automatically in hardware that hold frequently accessed blocks of main memory
112
cache configurations
direct-mapped, E-way set associative, fully associative
113
general cache organization
```
S = 2^s sets
E = 2^e lines per set
B = 2^b bytes per cache block
cache size C = S x E x B data bytes
```
114
direct-mapped cache
1 line per set; number of lines = number of sets; fastest
115
E-way associative cache
E = # lines per set
116
fully associative cache
one set with all cache lines
117
cache read
1. locate set 2. check whether any line in the set has a matching tag 3. if yes and the line is valid: hit; locate the data starting at the block offset
118
cache block mapping: direct-mapped
block always maps to the same line (block address mod number of lines)
119
cache block mapping: fully associative
block can go anywhere
120
cache block mapping: E-way associative
block has E options in one set
121
what to do on a write hit
write-through (write immediately to memory) or write-back (defer write to memory until replacement of line; need a dirty bit to determine if the line is different from memory or not)
122
what to do on a write miss
write-allocate (load into cache, update line in cache) or no-write-allocate (writes immediately to memory)
123
typical selection for write hit/write miss
write-back + write-allocate
124
intel core i7 hierarchy has ? cores, each with ?
4; Registers, L1 d-cache, L1 i-cache, and L2 unified cache
125
the intel core i7 processor package holds
the 4 cores and the L3 unified cache (shared by all cores)
126
intel core i7 L1 i-cache and d-cache
32 KB, 8-way, access: 4 cycles
127
intel core i7 L2 unified cache
256 KB, 8-way, access: 11 cycles
128
intel core i7 L3 unified cache
8MB, 16-way, access: 30-40 cycles
129
intel core i7 block size
64 bytes for all caches
130
cache miss rate
fraction of memory references not found in cache; misses/accesses = 1 - hit rate
131
hit time
time to deliver a line in the cache to the processor (includes time to determine whether the line is in the cache)
132
miss penalty
additional time required because of a miss
133
average memory access time
AMAT = hit time + (miss rate * miss penalty)
134
99% hits is ? times as good as 97% hits
two
135
three primary strategies for cache block replacement policy
1. random (or pseudo-random) 2. least recently used (LRU) 3. first in, first out (FIFO - oldest block is replaced)
136
cache optimization
1. reduce miss rate 2. reduce miss penalty 3. reduce hit time
137
six basic cache optimizations
1. larger block size to reduce miss rate 2. larger cache to reduce miss rate 3. higher associativity to reduce miss rate 4. multilevel caches to reduce miss penalty 5. give priority to read misses over writes to reduce miss penalty 6. avoid address translation to reduce hit time
138
2:1 cache rule of thumb
a direct-mapped cache of size N has about the same miss rate as a 2-way set associative cache of size N/2; this holds in the three C's miss-rate figures for cache sizes up to 128 KB