Floating Point Flashcards

1
Q

Most modern architectures implement the ___ standard for floating-point representation

A

IEEE 754

2
Q

True or False: "Floating-point units (FPUs) are outdated and not included in CPUs anymore"

A

False. Most modern CPUs include an FPU.

3
Q

Benefits and purpose of FPUs

A

FPUs provide hardware instructions that do FP arithmetic very quickly. Without an FPU, floating-point operations must be emulated in software, which is very slow.

4
Q

Explain the difference between fixed and floating point representations

A

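Fixed: the binary point is at a fixed position, so every number has the same number of fractional digits; more precise (uniform step size) but smaller range

Floating: the point "floats" according to an exponent, so the number of fractional digits varies; less precise for large values (the step size grows with magnitude) but much larger range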
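To make the precision/range trade-off concrete, here is a small C sketch. The 16-bit fixed-point format with 8 fractional bits (`fixed8_to_double`) is a hypothetical example, and `float_add` assumes float arithmetic is rounded to single precision (as on AArch64 and x86-64).

```c
#include <stdint.h>

/* Hypothetical fixed-point format: unsigned 16-bit raw value with the
   binary point 8 bits from the right. Every step is 2^-8, but the
   largest representable value is 65535 / 256 = 255.99609375. */
double fixed8_to_double(uint16_t raw) {
    return raw / 256.0;  /* 256 = 2^8 moves the binary point left 8 places */
}

/* Add two floats and force the result to be rounded to single precision. */
float float_add(float a, float b) {
    volatile float sum = a + b;  /* volatile prevents keeping extra precision */
    return sum;
}
```

The fixed-point step is the same everywhere but the range stops just below 256; a float reaches far larger values, yet at 2^24 = 16777216 neighbouring floats are already 2 apart, so adding 1.0f is lost to rounding.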

5
Q

How is the position of the binary point established in a fixed-point number?

A

By convention. The position is not stored in the number; the programmer and the code that uses the number agree on it, so it may not be the same for every program.

6
Q

If the binary point comes after the second bit of the fixed-point number 0111 (i.e., 01.11), what decimal number results?

A

0 x 2^1 + 1 x 2^0 + 1 x 2^(-1) + 1 x 2^(-2) = 0 + 1 + 0.5 + 0.25 = 1.75
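A minimal C sketch of the same calculation, assuming an unsigned fixed-point value; `fixed_to_double` is a hypothetical helper. The binary point is passed in as a parameter because it is not stored in the bits themselves.

```c
#include <stdint.h>

/* Interpret `raw` as an unsigned fixed-point number whose binary point
   sits `frac_bits` places from the right (assumes frac_bits < 64). */
double fixed_to_double(uint32_t raw, unsigned frac_bits) {
    /* Dividing by 2^frac_bits shifts the binary point left. */
    return (double)raw / (double)(1ull << frac_bits);
}
```

For example, `fixed_to_double(0x7, 2)` reads 0111 as 01.11 and returns 1.75, while the same bits with 3 fractional bits (0.111) give 0.875.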

6
Q

What is the mantissa (significand) and what is used to represent it in a floating point representation?

A

The significant digits of the number, e.g., 4.65 in 4.65 x 10^(-4)

A fixed-point number is used to represent the mantissa

7
Q

What are the differences between floating-point single format and scientific notation?

A
  • Base 2 instead of 10
  • The significand (mantissa) and exponent are stored in binary
  • The number is normalized so that 1 <= significand < 2
    • Since the leading bit (MSB) is always 1, it is not stored
  • The exponent is biased by adding the constant 127, so it is stored as an unsigned (non-negative) integer
  • Follows the IEEE 754 standard
8
Q

What does IEEE 754 standard look like for floating point single-format numbers?

A
  • Uses 4 bytes (32 bits): 1 sign bit, 8 exponent bits, 23 fraction bits
  • Number = (-1)^s x 1.f x 2^(e-127)

s = sign bit (0 = positive, 1 = negative)
f = fractional part of significand
e = biased exponent (unbiased exponent + 127)

-check notes for diagram
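The formula can be checked in C by pulling the three fields out of a float's bits. This is a sketch assuming `float` is IEEE 754 single format (true on ARMv8 and x86-64); `f32_split` and `f32_from_fields` are illustrative names, and the rebuild only handles normal numbers (1 <= e <= 254).

```c
#include <stdint.h>
#include <string.h>

/* Fields of a single-format number. */
struct f32_fields {
    uint32_t s;  /* sign bit (bit 31) */
    uint32_t e;  /* biased exponent (bits 30-23) */
    uint32_t f;  /* fraction (bits 22-0) */
};

struct f32_fields f32_split(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the 4 bytes safely */
    struct f32_fields r = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return r;
}

/* Apply (-1)^s x 1.f x 2^(e-127) using plain double arithmetic. */
double f32_from_fields(struct f32_fields p) {
    double significand = 1.0 + p.f / 8388608.0;  /* 1.f, since 2^23 = 8388608 */
    int exp = (int)p.e - 127;                     /* remove the bias */
    double scale = 1.0;
    for (int i = 0; i < exp; i++) scale *= 2.0;   /* positive exponents */
    for (int i = 0; i > exp; i--) scale /= 2.0;   /* negative exponents */
    return (p.s ? -1.0 : 1.0) * significand * scale;
}
```

For -6.5 = -1.101b x 2^2, `f32_split` gives s = 1, e = 2 + 127 = 129 and f = 0x500000 (fraction bits 101000…), and `f32_from_fields` turns those back into -6.5.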

9
Q

Range of biased exponents allowed for normalized single-format numbers

A

1-254

  • 255 represents infinities (f = 0) and NaNs, "not a number" (f ≠ 0)
  • 0 represents zero (f = 0) and subnormal numbers (f ≠ 0), tiny values too small to normalize
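A C sketch that classifies a single-format bit pattern purely by its biased exponent and fraction, following the rules above; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum f32_kind { F32_ZERO, F32_SUBNORMAL, F32_NORMAL, F32_INFINITY, F32_NAN };

enum f32_kind f32_classify(uint32_t bits) {
    uint32_t e = (bits >> 23) & 0xFFu;  /* biased exponent */
    uint32_t f = bits & 0x7FFFFFu;      /* fraction */
    if (e == 0)   return f == 0 ? F32_ZERO : F32_SUBNORMAL;
    if (e == 255) return f == 0 ? F32_INFINITY : F32_NAN;
    return F32_NORMAL;                  /* 1 <= e <= 254 */
}
```

The sign bit plays no part: 0x80000000 (-0.0) is classified as zero just like 0x00000000.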
10
Q

Range for floating-point single format. Also approximately how many digits of precision?

A

1.175e-38 to 3.403e+38 (normalized magnitudes)
1.0 x 2^(-126) to (2.0 - ε) x 2^(127), where ε = 2^(-23)

Total range:
-3.403e+38 to -1.175e-38, a gap around 0.0, then +1.175e-38 to +3.403e+38

~7 significant decimal digits of precision (24-bit significand)
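These limits can be cross-checked against C's `<float.h>`, which defines `FLT_MIN` (smallest normal float), `FLT_MAX`, `FLT_EPSILON` (2^(-23) for single format) and `FLT_MANT_DIG` (24 significand bits; 24 x log10(2) is about 7.2 decimal digits). A sketch assuming IEEE 754 single-format `float`; `f32_limits_match` is a hypothetical name.

```c
#include <float.h>
#include <stdbool.h>

/* 0x1p-126f is a C99 hexadecimal float literal meaning 1.0 x 2^-126. */
bool f32_limits_match(void) {
    float smallest_normal = 0x1p-126f;                 /* 1.0 x 2^(-126) */
    float largest = (2.0f - FLT_EPSILON) * 0x1p127f;   /* (2.0 - eps) x 2^127 */
    return smallest_normal == FLT_MIN && largest == FLT_MAX;
}
```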

11
Q

What are the s, e and f values for |0|01111101|0000000…| (0x3E800000)

A

sign = 0
biased exponent = 125 (unbiased: 125 - 127 = -2)
fractional part of significand: 0.0 (significand 1.0), so the value is 1.0 x 2^(-2) = 0.25

12
Q

What are the s, e and f values for |0|01111101|0100000…| (0x3EA00000)

A

sign = 0
biased exponent = 125 (unbiased: 125 - 127 = -2)
fractional part of significand: 0.01 in binary (= 0.25), so the value is 1.25 x 2^(-2) = 0.3125
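Going the other way, the fields from these two cards can be packed back into a 32-bit pattern to confirm the hex values. A sketch with hypothetical helper names, assuming IEEE 754 single-format `float`; note that the bit pattern |0|01111101|0100000…| packs to 0x3EA00000.

```c
#include <stdint.h>
#include <string.h>

/* s: 1 bit, e: 8-bit biased exponent, f: 23-bit fraction. */
uint32_t f32_pack(uint32_t s, uint32_t e, uint32_t f) {
    return ((s & 1u) << 31) | ((e & 0xFFu) << 23) | (f & 0x7FFFFFu);
}

/* Reinterpret a 32-bit pattern as a float. */
float f32_from_bits(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With s = 0 and e = 125, a zero fraction packs to 0x3E800000 (0.25), and fraction bits 0100000… (only bit 21 set, i.e. 0x200000) pack to 0x3EA00000 (0.3125).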

13
Q

What does IEEE 754 standard look like for floating point double-format numbers?

A
  • 8 bytes (64 bits): 1 sign bit, 11 exponent bits, 52 fraction bits
  • Number = (-1)^s x 1.f x 2^(e-1023)
  • check diagram
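The same field extraction for a double shows the wider fields and the 1023 bias. A C sketch assuming IEEE 754 `double`; `f64_split` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Split a double into s (1 bit), e (11 bits, bias 1023) and f (52 bits). */
void f64_split(double x, uint64_t *s, uint64_t *e, uint64_t *f) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the 8 bytes safely */
    *s = bits >> 63;
    *e = (bits >> 52) & 0x7FFu;
    *f = bits & 0xFFFFFFFFFFFFFull;   /* low 52 bits */
}
```

For -2.5 = -1.01b x 2^1 this yields s = 1, e = 1 + 1023 = 1024, and f with only bit 50 set; 1.0 gives s = 0, e = 1023, f = 0.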
14
Q

Range for floating-point double format. Also approximately how many digits of precision?

A

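~2.2e-308 to ~1.8e+308 (normalized magnitudes)
~16 significant decimal digits (53-bit significand; 17 digits are enough to round-trip any double)

Also, NaNs (f ≠ 0) and infinities (f = 0) are represented with biased exponent 2047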
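C's `<float.h>` exposes the double limits as `DBL_MIN` and `DBL_MAX`, and `DBL_MANT_DIG` (53) gives the significand width; 53 x log10(2) is about 15.95 decimal digits. The sketch below, with a hypothetical helper `f64_biased_exponent`, checks where these limits and the special values sit in the exponent field, assuming IEEE 754 `double`.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Return the 11-bit biased exponent field (bits 62-52) of a double. */
unsigned f64_biased_exponent(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (unsigned)((bits >> 52) & 0x7FF);
}
```

Normal doubles use biased exponents 1 (`DBL_MIN` = 2^(-1022)) through 2046 (`DBL_MAX`); overflowing to infinity or producing a NaN (here via 0.0/0.0) lands on 2047.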

15
Q

Bit patterns for +/- infinity and the NaN from root(-1) in single format

A

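+infinity: 0x7f800000
-infinity: 0xff800000
root(-1): a NaN, e.g., 0x7fffffff (any pattern with e = 255 and f ≠ 0 is a NaN; the exact bits produced depend on the hardware)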

16
Q

Bit patterns for +/- infinity and the NaN from root(-1) in double format

A

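+infinity: 0x7ff00000 00000000
-infinity: 0xfff00000 00000000
root(-1): a NaN, e.g., 0x7fffffff ffffffff (any pattern with e = 2047 and f ≠ 0 is a NaN; the exact bits produced depend on the hardware)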
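The bit patterns in these two cards can be checked with C's `isinf`, `isnan` and `signbit` macros from `<math.h>`. A sketch assuming IEEE 754 `float`/`double`; the helper names are hypothetical, and 0.0/0.0 stands in for root(-1) since both are invalid operations that produce a NaN.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret raw bit patterns as floating-point values. */
float f32_from_bits(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

double f64_from_bits(uint64_t bits) {
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```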

17
Q

How can NaN be used?

A
  • In most instructions a NaN operand causes an exception, but a NaN can still be compared with a number using fcmp (the result is reported as unordered)
  • Can be used as an argument to some functions
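In C, comparisons involving a NaN behave like an unordered fcmp result: every `<`, `==` and `>` is false, and a NaN is not even equal to itself. A small sketch; `compares_any` is a hypothetical helper, the NaN is made with 0.0/0.0, and it assumes the compiler is not told to ignore NaNs (e.g. no fast-math options).

```c
#include <stdbool.h>

/* True if a and b are ordered with respect to each other; false when
   either is a NaN, since all three ordered comparisons then fail. */
bool compares_any(double a, double b) {
    return a < b || a == b || a > b;
}
```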
18
Q

How many floating point registers does ARMv8 have? Also what types of registers are they?

A

32 128-bit fp registers

s registers use low-order 32 bits for single-precision fp numbers

d registers use low-order 64 bits for double-precision fp numbers

19
Q

Opcode for multiply-negate of an FP number

A

fnmul (computes -(n x m))
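ARMv8's fnmul computes -(n x m) in a single instruction. The C sketch below writes the same operation; `multiply_negate` is a hypothetical name, and whether a compiler targeting ARMv8 actually emits fnmul for it depends on the compiler and flags (an assumption, not a guarantee).

```c
/* Multiply-negate: negate the product of a and b. */
float multiply_negate(float a, float b) {
    return -(a * b);
}
```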

20
Q

How is #fpimm encoded? How is it expressed?

A

With 8 bits: 1 sign bit, 3 exponent bits and 4 fraction bits

Expressed as ±(n/16) x 2^r, where n is in the range 16 to 31 and r is in the range -3 to +4 (so magnitudes from 0.125 to 31.0; 0.0 is not encodable)
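A brute-force C sketch that tests whether a value fits the ±(n/16) x 2^r form; `is_fpimm` is a hypothetical helper, not part of any toolchain. The equality checks are exact because every candidate is a small power-of-two fraction.

```c
#include <stdbool.h>

/* True if x = +/-(n/16) x 2^r with 16 <= n <= 31 and -3 <= r <= 4. */
bool is_fpimm(double x) {
    double mag = x < 0 ? -x : x;                 /* the sign bit covers +/- */
    for (int r = -3; r <= 4; r++) {
        double scale = 1.0;                      /* scale = 2^r */
        for (int i = 0; i < r; i++) scale *= 2.0;
        for (int i = 0; i > r; i--) scale /= 2.0;
        for (int n = 16; n <= 31; n++) {
            if (mag == (n / 16.0) * scale) return true;
        }
    }
    return false;                                /* includes 0.0 and NaN */
}
```

For instance, 2.5 = (20/16) x 2^1 is encodable, while 0.1 (not a short binary fraction) and 32.0 (above 31.0) are not.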

21
Q

Instruction for converting s <-> d

Instruction for converting s or d to the nearest 32-bit or 64-bit signed int

Instruction for converting s or d to the nearest unsigned int

A

fcvt

fcvtns

fcvtnu
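The closest C equivalents are plain casts, sketched below with hypothetical function names; which instructions a compiler emits for them is up to the compiler. Note that C has no cast that rounds to nearest like fcvtns: converting a floating value to an integer type truncates toward zero (the behaviour of ARMv8's fcvtzs).

```c
#include <stdint.h>

double widen(float s) { return (double)s; }   /* s -> d: always exact */
float narrow(double d) { return (float)d; }   /* d -> s: rounds to 24 bits */

/* Truncates toward zero, unlike the round-to-nearest of fcvtns. */
int64_t truncate_to_int(double d) { return (int64_t)d; }
```

Widening then narrowing returns the original float, but the reverse is lossy: 1.0 + 2^(-30) narrows to exactly 1.0f.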

22
Q

Instruction for converting signed Wn or Xn to Sd or Dd

Instruction for converting unsigned ints to floats

A

scvtf
ucvtf
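The difference between scvtf and ucvtf is how the integer bits are read. In the C sketch below (hypothetical function names), the same 32-bit pattern 0xFFFFFFFF converts to -1.0 as a signed value and to 4294967295.0 as an unsigned one; large integers may also round, since a float holds only 24 significand bits.

```c
#include <stdint.h>

double from_signed(int32_t w) { return (double)w; }     /* scvtf-style */
double from_unsigned(uint32_t w) { return (double)w; }  /* ucvtf-style */
float int_to_float(int32_t w) { return (float)w; }      /* may round */
```

For example, 16777217 (2^24 + 1) is exactly halfway between two floats and rounds to the even one, 16777216.0f.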

23
Q

What registers are used to pass fp arguments into a subroutine?

A

d0-d7 / s0-s7

24
Q

What are registers d8-d15 called?

A

Callee-saved registers: if a subroutine uses them, it must save and restore their values on the stack. Only the bottom 64 bits of each 128-bit register need to be preserved.
25
Q

Which registers can be overwritten by a subroutine?

A

Registers 0-7 and 16-31 (e.g., d0-d7 and d16-d31). The caller is responsible for saving and restoring them if their values must be preserved across a subroutine call.
26
Q

What are two things that are stored in .text?

A

String literals and floating-point values
27
Q

What is the bias we add to the exponent for floating-point double format?

A

1023