Floating Point Arithmetic Flashcards

Question

Answer 1

A

We need to be able to handle very small and very large numbers and need a representation where the decimal point can move.

Answer 2

A

The floating point representation has a base and a precision. It takes the form:

d.dddd x b^e

Where d.dddd is the significand (mantissa), which has p digits (the precision), b is the base, and e is the exponent.
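As a rough illustration of the d.dddd x b^e form with b = 2, the sketch below splits a Python float into a significand and an exponent. The helper name `decompose` is hypothetical; it relies on the standard `math.frexp`, which returns a significand in [0.5, 1), and rescales it so that there is one digit before the point.

```python
import math

def decompose(x):
    """Return (significand, exponent) such that x == significand * 2**exponent.

    For non-zero x the significand satisfies 1 <= |significand| < 2.
    """
    m, e = math.frexp(x)   # frexp gives 0.5 <= |m| < 1 with x == m * 2**e
    return m * 2, e - 1    # shift one binary digit in front of the point

significand, exponent = decompose(6.5)
print(significand, exponent)  # 1.625 2, since 6.5 == 1.625 * 2**2
```

Zero has no normalised form, so `decompose(0.0)` simply returns a zero significand.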

Answer 3

A

It is the most widely adopted standard for floating-point arithmetic. The standard specifies number representations as well as how operations should behave.

Answer 4

A

Double precision is a representation of a floating point number using 64 bits.

Answer 5

A

Floating point numbers are approximations of the actual numbers, with finite accuracy and finite range. Double precision is used because scientific and engineering calculations need its range and accuracy.

Answer 6

A

52 bits for the mantissa, 11 bits for the exponent, and 1 bit for the sign.
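The 1/11/52 split can be inspected directly. The following is a minimal sketch, assuming the usual IEEE 754 ordering (sign in the top bit, then the exponent, then the mantissa); the function name `double_fields` is made up for illustration. The exponent is returned in its stored (biased) form, where the bias for double precision is 1023.

```python
import struct

def double_fields(x):
    """Split a double into its (sign, biased exponent, mantissa) bit fields."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                    # top bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits
    mantissa = bits & ((1 << 52) - 1)    # low 52 bits
    return sign, exponent, mantissa

print(double_fields(1.0))   # (0, 1023, 0)
print(double_fields(-2.5))  # (1, 1024, 1125899906842624)
```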

Answer 7

A

A floating point number whose exponent is neither all zeros nor all ones.

Answer 8

A

All mantissa bits are 0, all exponent bits are 0 except the last.

Answer 9

A

All mantissa bits are 1, all exponent bits are 1 except the last.
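The two bit patterns above can be built by hand and compared with the limits Python reports. This is a sketch under the assumption that Python floats are IEEE 754 doubles (true on common platforms); `from_bits` is a hypothetical helper. In double precision these patterns correspond to the smallest and largest normalised numbers.

```python
import struct
import sys

def from_bits(bits):
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Exponent 000...01, mantissa all zeros.
smallest_normal = from_bits(0x0010_0000_0000_0000)
# Exponent 111...10, mantissa all ones.
largest_normal = from_bits(0x7FEF_FFFF_FFFF_FFFF)

print(smallest_normal == sys.float_info.min)  # True
print(largest_normal == sys.float_info.max)   # True
```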

Answer 10

A

The difference between 1 and the next larger representable floating point number.
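One way to see this value is to halve a candidate until adding half of it to 1 no longer changes the result. The loop below is a minimal sketch assuming double precision with the default round-to-nearest mode; `machine_epsilon` is an illustrative name, and `sys.float_info.epsilon` is the value Python itself reports.

```python
import sys

def machine_epsilon():
    """Find the gap between 1.0 and the next larger double by repeated halving."""
    eps = 1.0
    # Stop once 1.0 + eps/2 rounds back to 1.0.
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps

print(machine_epsilon() == sys.float_info.epsilon)  # True
print(machine_epsilon() == 2.0 ** -52)              # True
```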

Answer 11

A

The error introduced from the difference between the floating point approximation and the actual number represented.
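A small sketch of this error for the decimal value 0.1, which has no exact binary representation. `Fraction(0.1)` gives the exact value of the stored double, so subtracting the true value 1/10 exposes the difference; the variable names are illustrative only.

```python
from fractions import Fraction

stored = Fraction(0.1)              # exact value of the double nearest to 0.1
error = stored - Fraction(1, 10)    # difference from the actual number

print(error != 0)                   # True: 0.1 is not stored exactly
print(float(error))                 # tiny but non-zero
print(0.1 + 0.2 == 0.3)             # False: the errors do not cancel
```

The error is bounded by half the spacing between neighbouring doubles near 0.1, which is why it is tiny but not zero.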

Answer 12

A

If the error from the numerical method is smaller than the error from the floating point representation.

Answer 13

A

If the exponent bits of a floating point number are all 1’s or all 0’s.

Answer 14

A

A number with all the exponent bits as 0s and a non-zero mantissa.

Answer 15

A

They expand the range around 0 and allow for more gradual underflow.
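Gradual underflow can be observed by dividing the smallest normalised double. In this sketch, which assumes IEEE 754 doubles, the result stays non-zero until it drops below the smallest denormal, 2^-1074; the variable names are illustrative.

```python
import math
import sys

smallest_normal = sys.float_info.min          # 2**-1022
below_normal = smallest_normal / 2            # representable only as a denormal
smallest_denormal = math.ldexp(1.0, -1074)    # 1.0 * 2**-1074

print(below_normal > 0.0)             # True: no abrupt jump to zero
print(smallest_denormal)              # 5e-324
print(smallest_denormal / 2 == 0.0)   # True: underflows to zero
```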

Answer 16

A

inf - all exponent bits are 1s, all mantissa bits are 0s, sign bit is 0
-inf - all exponent bits are 1s, all mantissa bits are 0s, sign bit is 1
NaN - all exponent bits are 1s and not all mantissa bits are 0s
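The three encodings can be checked by constructing the bit patterns directly. This is a sketch assuming IEEE 754 doubles; `from_bits` is a hypothetical helper, and the NaN pattern shown is just one of many valid ones (any non-zero mantissa with an all-ones exponent is a NaN).

```python
import math
import struct

def from_bits(bits):
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

pos_inf = from_bits(0x7FF0_0000_0000_0000)  # sign 0, exponent all 1s, mantissa 0
neg_inf = from_bits(0xFFF0_0000_0000_0000)  # sign 1, exponent all 1s, mantissa 0
nan = from_bits(0x7FF8_0000_0000_0000)      # exponent all 1s, mantissa non-zero

print(pos_inf, neg_inf, nan)   # inf -inf nan
print(nan == nan)              # False: NaN compares unequal even to itself
```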

Answer 17

A

Overflow
Underflow
Divide by zero
Invalid
Inexact
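The sketch below shows how each of the five conditions above surfaces in plain Python. Python does not expose the hardware status flags, so this only illustrates the resulting values; it also raises its own `ZeroDivisionError` for float division by zero instead of returning inf. The variable names are illustrative.

```python
import math

overflow = 1e308 * 10          # result too large: becomes inf
underflow = 1e-308 * 1e-308    # result too small: becomes 0.0
invalid = math.inf - math.inf  # no meaningful result: becomes NaN
inexact = 1.0 + 1e-17          # result must be rounded: stays 1.0

try:
    1.0 / 0.0                  # divide by zero
    divide_by_zero_raised = False
except ZeroDivisionError:
    divide_by_zero_raised = True   # Python raises instead of returning inf

print(overflow, underflow, invalid, inexact, divide_by_zero_raised)
# inf 0.0 nan 1.0 True
```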

Answer 18

A

Floating point exceptions set a flag in the hardware and can be trapped for debugging.

Answer 19

A

The theoretical maximum performance of a system

Answer 20

A

The number of sockets
The number of processor cores per socket
The clock frequency
The number of operations per cycle

Answer 21

A

vector instructions
fused multiply-add

Answer 22

A

Rpeak = Sockets x CoresPerSocket x Clock x OperationsPerCycle

Answer 23

A

Multiply the node peak performance (Rpeak) by the number of compute nodes.
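The node and cluster calculations above can be sketched as two small functions. The function names and the hardware figures (2 sockets, 16 cores per socket, 2.5 GHz, 16 operations per cycle, 100 nodes) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def node_rpeak(sockets, cores_per_socket, clock_hz, ops_per_cycle):
    """Theoretical peak of one node in floating point operations per second."""
    return sockets * cores_per_socket * clock_hz * ops_per_cycle

def cluster_rpeak(nodes, node_peak):
    """Theoretical peak of a cluster of identical nodes."""
    return nodes * node_peak

# Hypothetical node: 2 sockets x 16 cores x 2.5 GHz x 16 operations per cycle.
node = node_rpeak(2, 16, 2.5e9, 16)
cluster = cluster_rpeak(100, node)

print(node / 1e12)     # 1.28 (TFLOP/s per node)
print(cluster / 1e12)  # 128.0 (TFLOP/s for 100 nodes)
```

The operations-per-cycle figure is where vector width and fused multiply-add enter, so it differs between instruction sets and precisions.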

Answer 24

A

The instruction set
Precision
