Course 1 Part 2 Flashcards

Question 1

Q

How often do you have enough data to obtain a smooth normal distribution?

Answer

A

Almost never. The smooth normal distribution (with centre X and width parameter σ) is the idealised parent distribution.

Question 2

Q

How do you calculate the mean?

Answer

A

x̄ = (x₁ + x₂ + x₃ + … + xₙ) / n

As n increases, the mean gets closer to the true value of the parent (i.e. perfect) distribution.

Question 3

Q

How to calculate spread of results?

Answer

A

dᵢ = xᵢ − x̄
The deviations are positive and negative (random errors) and, by definition, sum to 0.

dᵢ² = (xᵢ − x̄)²
Squaring the deviations makes every term non-negative, so their sum is non-zero; this sum is what the standard deviation uses.
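A minimal Python sketch of the deviations above. The readings and variable names are made up for illustration and are not course data.

```python
# Hypothetical readings (illustrative values only).
readings = [9.8, 10.1, 10.0, 9.9, 10.2]

n = len(readings)
mean = sum(readings) / n

# Deviation of each reading from the mean: d_i = x_i - mean.
deviations = [x - mean for x in readings]

# The signed deviations cancel (up to floating-point rounding) ...
print(sum(deviations))

# ... but the squared deviations do not, so they can measure spread.
squared = [d ** 2 for d in deviations]
print(sum(squared))
```

The first sum comes out at essentially zero, which is why the squared deviations are needed to describe the spread.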

Question 4

Q

Standard deviation equation

Answer

A

σ = sqrt( (Σxᵢ² / n) − x̄² )

σx = sqrt( (1/(n−1)) × Σdᵢ² )
As n increases, σ tends towards the width parameter of the parent distribution.
Don’t calculate σ if n < 5.
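A short Python sketch of both equations, assuming made-up readings. Note that the first equation is the 1/n form and the second the 1/(n−1) form, so they give slightly different numbers; the standard-library calls `statistics.pstdev` and `statistics.stdev` are used only as a cross-check.

```python
import math
import statistics

# Hypothetical readings (illustrative values only).
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
n = len(readings)
mean = sum(readings) / n

# Shortcut form: sqrt(mean of the squares - square of the mean), a 1/n formula.
sigma_n = math.sqrt(sum(x * x for x in readings) / n - mean ** 2)

# 1/(n-1) form built from the squared deviations.
sigma_n_minus_1 = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))

# Cross-check against the standard library.
print(sigma_n, statistics.pstdev(readings))
print(sigma_n_minus_1, statistics.stdev(readings))
```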

Question 5

Q

Sample standard deviation

Answer

A

If the mean has been calculated from the data, use the sample standard deviation:
σₙ₋₁ = sqrt( (1/(n−1)) × Σ(xᵢ − x̄)² ), with the sum running from i = 1 to n

• Difference between population and sample std. dev. decreases with larger n.
• The term variance is often used for σ². Be careful with variances unless you are clear about n vs. n − 1!

The −1 accounts for having used information from the data (a “degree of freedom”) in determining the mean; it can be shown that the population std. dev. (the 1/n form) underestimates σ.
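For the same data set the two formulas differ only by a factor sqrt((n − 1)/n), which follows directly from the 1/n and 1/(n−1) prefactors. A small sketch (the function name `std_ratio` is a hypothetical helper) shows how quickly the difference shrinks:

```python
import math

def std_ratio(n):
    """Ratio of the 1/n to the 1/(n-1) standard deviation for the same n points."""
    return math.sqrt((n - 1) / n)

for n in (2, 5, 10, 100, 1000):
    print(f"n = {n:4d}: sigma_n / sigma_(n-1) = {std_ratio(n):.4f}")
```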

Question 6

Q

Standard error of the mean (SEM) - very important

Answer

A

• As we increase the number of data points (n), the mean (x̄) tends towards X, and the standard deviation of the data (σx) tends towards σ.
• It can be proved that the (hypothetical) distribution function characterising x̄ is also a normal distribution centred on X but with width parameter σ/sqrt(n)

Question 7

Q

Central limit theorem (diversion)

Answer

A

• Even if parent distribution is not Gaussian, the distribution function for the sample mean tends quickly to Gaussian; this is the Central Limit Theorem.
• E.g. adding results from just 4 dice gives a very Gaussian-like shape.
• Since measurements generally involve multiple sources of uncertainty, the CLT allows us to use simple statistics.
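The dice example can be checked exactly rather than by simulation. This sketch (the function name is a hypothetical helper) enumerates every possible roll and prints a rough text histogram for 1 die and for the sum of 4 dice:

```python
from collections import Counter
from itertools import product

def dice_sum_distribution(num_dice):
    """Exact probability of each total when rolling num_dice fair six-sided dice."""
    counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=num_dice))
    total_rolls = 6 ** num_dice
    return {s: c / total_rolls for s, c in sorted(counts.items())}

for num_dice in (1, 4):
    print(f"{num_dice} dice")
    for total, p in dice_sum_distribution(num_dice).items():
        # One '#' per 0.5% of probability.
        print(f"{total:3d} {'#' * round(p * 200)}")
```

One die gives a flat distribution, while the sum of four already shows a bell-like peak at 14.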

Question 8

Q

Equation for standard error of the mean

Answer

A

σx̄ = σx/sqrt(n)

The term “standard error” on its own is vague (is it the uncertainty on x or on x̄?).
This equation, however, measures the precision to which we know x̄. Precision increases with n.

Therefore quote results as x̄ ± kσx̄
k is the “coverage factor” and is usually 1 or 2
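A minimal sketch of the SEM calculation, assuming made-up readings and the sample (1/(n−1)) standard deviation for σx:

```python
import math
import statistics

# Hypothetical readings (illustrative values only).
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
n = len(readings)

mean = statistics.mean(readings)
sigma_x = statistics.stdev(readings)  # sample standard deviation, 1/(n-1) form
sem = sigma_x / math.sqrt(n)          # standard error of the mean

k = 1  # coverage factor
print(f"mean = {mean:.3f} ± {k * sem:.3f} (k = {k})")
```

Because of the sqrt(n), halving the SEM needs roughly four times as many readings.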

Question 9

Q

How to put σx̄ into your answer
E.g. E = 12.34 ± 0.05 J and E = 12.34(5) J

Answer

A

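E = 12.34 ± 0.05 J: here 12.34 is x̄ and 0.05 is σx̄, i.e. the uncertainty is given to 1 standard dev.

E = 12.34(5) J: the same result in compact form, with the uncertainty on the last digit given in brackets.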

Question 10

Q

Uncertainty on the uncertainty
What are the rules?

Answer

A

• The SEM tells us about the uncertainty on the measured mean
• The SEM is itself only an estimate from a finite number of points, so it has its own uncertainty (the “error on the error”), which is large when n is small

Hence the following rules:
• Uncertainties are quoted to only one figure
• But good practice to use 2 figures if leading digit is 1
• Note uncertainties are rounded UP, not to nearest figure.
• 2 figures may be justified for a very large number of measurements (e.g. n > 500). For example, the NIST web site gives
elementary charge e = 1.6021766208(98) × 10⁻¹⁹ C
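The cards do not give a formula for the error on the error. A commonly quoted approximation for normally distributed data, used here as an assumption and not taken from the cards, is that the fractional uncertainty on σ is about 1/sqrt(2(n − 1)). A quick sketch shows why one figure is usually all that is justified:

```python
import math

def fractional_error_on_error(n):
    """Approximate fractional uncertainty on sigma for n normally distributed points.

    Assumes the approximation 1 / sqrt(2 (n - 1)), which is not stated in the cards.
    """
    return 1 / math.sqrt(2 * (n - 1))

for n in (5, 10, 50, 500):
    print(f"n = {n:3d}: about {100 * fractional_error_on_error(n):.0f}%")
```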

Question 11

Q

Golden rules for reporting measured results

Answer

A

The best estimate of a parameter is the mean
Its uncertainty is the standard error in the mean (SEM)
Round the uncertainty up to the appropriate number of significant figures (usually 1)
Match the number of decimal places in the mean to the uncertainty, and apply “normal” rounding
INCLUDE UNITS
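The rounding rules can be sketched with Python's `decimal` module. The function name `format_result` and the example numbers are hypothetical; the sketch always rounds the uncertainty up to 1 significant figure and assumes an uncertainty between 0 and 10 so that nothing prints in exponent notation.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP

def format_result(mean, uncertainty, unit):
    """Quote mean ± uncertainty, with the uncertainty rounded UP to 1 significant figure.

    Sketch only: assumes 0 < uncertainty < 10.
    """
    u = Decimal(str(uncertainty))
    # Position of the leading digit, e.g. 0.0432 -> -2.
    exponent = u.adjusted()
    quantum = Decimal(1).scaleb(exponent)  # e.g. Decimal("0.01")
    # Round the uncertainty up, never down.
    u_rounded = u.quantize(quantum, rounding=ROUND_CEILING)
    # Match the mean's decimal places to the uncertainty, with normal rounding.
    m_rounded = Decimal(str(mean)).quantize(quantum, rounding=ROUND_HALF_UP)
    return f"{m_rounded} ± {u_rounded} {unit}"

print(format_result(12.3372, 0.0432, "J"))  # 12.34 ± 0.05 J
```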

Question 12

Q

What are confidence intervals?

Answer

A

Integrate the normal distribution G(x) between x1 and x2 to get the probability that a value lies in that range.

Widely used limits are:
Within 1σ = 68% probability that the measured value is within σ of the true value
Within 2σ = 95%
Within 3σ = 99.7%

(I.e. there is only a 68% chance that the value lies in the range 12.6 ± 0.2)

Note: the parent distribution is not normally known, so this is of limited practical relevance.
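For a normal distribution the integral between X − kσ and X + kσ has the closed form erf(k/√2), so the quoted percentages can be reproduced with the standard library (the function name `prob_within` is a hypothetical helper):

```python
import math

def prob_within(k):
    """Probability that a normally distributed value lies within k sigma of the centre."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * prob_within(k):.1f}%")
```

This gives about 68.3%, 95.4% and 99.7%, which are usually rounded to the values quoted above.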

Question 13

Q

Key ranges in the confidence interval

Answer

A

68% probability that true value (μ / X) is within 1 x SEM (σx̄) of x̄
95% probability that true value is within 2 x SEM of x̄
99.7% probability that true value is within 3 x SEM of x̄

Side note:
2.3 ± 0.1 is termed a confidence interval (CI). Results can also be written as x = 2.3 ± 0.2 (95% confidence interval); stating the confidence level like this is recommended for k > 1.

Question 14

Q

Confidence intervals vs tolerances
- important

Answer

A

• x = 2.3 ± 0.1 without qualification is sadly ambiguous (i.e. it is not clear whether it is 68%, 95%, etc.); however, x = 2.3(1) is not, as this is always k = 1
• If this is a confidence interval with k = 1, then there is a 32% chance that the true value of x is < 2.2 or > 2.4!
• If measuring against a scale by eye, we often say we are measuring to the nearest scale point (e.g. mm). This is usually fudged into a confidence interval by writing L = 23.0 ± 0.5 mm, where 0.5 mm (half the scale interval) is a fake SEM.
• Values associated with equipment are tricky:
• ± 1 mV might mean that the true voltage is 99.7% guaranteed to be within 1 mV (effectively a CI with k = 3), so divide the quoted value by 3 to get the SEM
• A tolerance of e.g. 0.2 ml on glassware probably means that the volume is guaranteed to be within this range. This is NOT a CI.

Question 15

Q

Using error integrals to reject data points
(Principles rather than details is important)

Answer

A

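1) Calculate the mean and standard dev. of all the points (including the suspect one)
2) Calculate t(suspect) = |suspect data point − mean| / (standard dev.); this tells you how many standard deviations the suspect point is from the mean
3) Calculate the expected number of data points that are at least as bad as this one using n × (1 − P(t(suspect))), where P(t(suspect)) is the probability of a result lying within t(suspect) standard deviations of the mean (so 100% minus this probability is the chance of lying outside)
4) If the expected number is less than 0.5, the point can be rejected (Chauvenet’s criterion)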
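The steps above can be sketched as follows. The readings are made up, the function name is hypothetical, the sample (1/(n−1)) standard deviation is assumed, and the 0.5 threshold is the usual statement of Chauvenet's criterion.

```python
import math
import statistics

def expected_as_bad(data, suspect):
    """Expected number of points at least as far from the mean as `suspect`."""
    n = len(data)
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)              # from ALL points, suspect included
    t_sus = abs(suspect - mean) / sigma         # distance in standard deviations
    p_within = math.erf(t_sus / math.sqrt(2))   # P(within t_sus sigma), normal assumption
    return n * (1 - p_within)

# Hypothetical readings; 1.8 is the suspect point.
data = [3.8, 3.5, 3.9, 3.9, 3.4, 1.8]
expected = expected_as_bad(data, 1.8)
print(f"expected number as bad: {expected:.2f}")
print("reject" if expected < 0.5 else "keep")
```

Here the suspect point is about 2 standard deviations out and the expected number is below 0.5, so the criterion would allow it to be rejected.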

Question 16

Q

Rejecting data points: pros and cons

Answer

A

• Chauvenet’s criterion is an objective means of scrutinising ONE suspect data point.
• In general, it is good practice only to reject data points that are actual mistakes.
• If conclusions depend on rejecting data you don’t like, experiment is not robust. If conclusions are not significantly compromised, why mess with data?
• Best option is repeating suspect measurements, only using fancy stats as last resort.

(16 cards)