Course 1 Part 2 Flashcards

(16 cards)

1
Q

How often do you have enough data to obtain a smooth normal distribution?

A

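Almost never. The smooth normal distribution (with centre X and width parameter σ) is the ideal parent distribution, which real, finite data only approximate.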

2
Q

How do you calculate the mean?

A

x̄ = (x₁ + x₂ + x₃ + … + xₙ) / n

As n increases, the mean gets closer to the true value of the parent (i.e. perfect) distribution.

3
Q

How to calculate spread of results?

A

dᵢ = xᵢ - x̄
The deviations are both positive and negative (random errors) and, by definition, sum to 0.

dᵢ² = (xᵢ - x̄)²
Squaring each deviation makes every term non-negative, so the sum no longer cancels to 0; this sum is what the standard deviation uses.
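
As a rough illustration of the mean and the deviations, here is a minimal Python sketch. The list `data` holds made-up values chosen only for the example; it is not taken from the course.

```python
# Hypothetical repeated measurements (illustrative values only)
data = [9.8, 10.1, 10.0, 9.9, 10.2]

n = len(data)
mean = sum(data) / n                      # x̄ = (x₁ + ... + xₙ) / n
deviations = [x - mean for x in data]     # dᵢ = xᵢ - x̄
squared = [d ** 2 for d in deviations]    # dᵢ², every term is >= 0

print(f"mean = {mean:.3f}")
print(f"sum of deviations = {sum(deviations):.2e}")   # zero up to rounding error
print(f"sum of squared deviations = {sum(squared):.3f}")
```

The deviations cancel, while their squares add up to a positive number that measures the spread.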

4
Q

Standard deviation equation

A

σ = sqrt( (Σxᵢ² / n) - x̄² )   (population form, divisor n)

σx = sqrt( (1/(n-1)) × Σdᵢ² )   (sample form, divisor n-1)
As n increases, σx tends towards the width parameter σ of the parent distribution.
Don’t calculate σ if n < 5.
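
The two expressions can be evaluated side by side. This is a sketch only: `data` is a made-up set of five readings, and the names `sigma_pop` and `sigma_sample` are labels chosen here for the divisor-n and divisor-(n-1) forms.

```python
import math

data = [9.8, 10.1, 10.0, 9.9, 10.2]   # hypothetical measurements
n = len(data)
mean = sum(data) / n

# Population form: sqrt(Σxᵢ²/n - x̄²)
sigma_pop = math.sqrt(sum(x * x for x in data) / n - mean ** 2)

# Sample form: sqrt(Σdᵢ² / (n - 1))
sigma_sample = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

print(f"divisor n:     {sigma_pop:.4f}")
print(f"divisor n - 1: {sigma_sample:.4f}")
```

The two results differ by a factor sqrt(n/(n-1)), which matters for small n and fades as n grows.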

5
Q

Sample standard deviation

A

If the mean has been calculated from the data, use the sample standard deviation:

σₙ₋₁ = sqrt( (1/(n-1)) × Σ(xᵢ - x̄)² ), with the sum running over i = 1 to n

• The difference between the population and sample std. dev. decreases with larger n.
• The term variance is often used for σ². Be careful with variances unless you are clear about n vs. n – 1!

  • The -1 accounts for having used information from the data (a “degree of freedom”) in determining the mean; it can be shown that the population std. dev. underestimates σ.
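
The underestimate can be checked numerically. The sketch below assumes a Gaussian parent with σ = 1 and draws many small samples; the seed, sample size and number of trials are arbitrary choices made for the example. It compares variances (σ²), since that is where the n vs. n - 1 correction is exact on average.

```python
import random

random.seed(0)            # fixed seed so the sketch is repeatable
TRUE_SIGMA = 1.0          # width of the assumed Gaussian parent
n, trials = 5, 20000

sum_var_n = 0.0           # running total of variances using divisor n
sum_var_n1 = 0.0          # running total of variances using divisor n - 1
for _ in range(trials):
    sample = [random.gauss(0.0, TRUE_SIGMA) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)   # Σ(xᵢ - x̄)²
    sum_var_n += ss / n
    sum_var_n1 += ss / (n - 1)

mean_var_n = sum_var_n / trials
mean_var_n1 = sum_var_n1 / trials
print(f"average variance, divisor n:     {mean_var_n:.3f}")
print(f"average variance, divisor n - 1: {mean_var_n1:.3f}")
```

With n = 5 the divisor-n average should come out near (n-1)/n = 0.8 of the true variance, while the divisor-(n-1) average should sit close to 1.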
6
Q

Standard error of the mean (SEM) - very important

A

• As we increase the number of data points (n), the mean (x̄) tends towards X and the sample standard deviation (σx) tends towards σ.
• It can be proved that the (hypothetical) distribution function characterising x̄ is also a normal distribution centred on X but with width parameter σ/sqrt(n).

7
Q

Central limit theorem (diversion)

A

• Even if parent distribution is not Gaussian, the distribution function for the sample mean tends quickly to Gaussian; this is the Central Limit Theorem.
• E.g. adding results from just 4 dice gives a very Gaussian-like shape.
• Since measurements generally involve multiple sources of uncertainty, the CLT allows us to use simple statistics.
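
The dice example can be checked exactly rather than by simulation. The sketch below builds the probability distribution for the sum of 4 fair six-sided dice by repeated convolution and prints it next to a Gaussian with the same mean and variance. The helper name `gauss` is chosen here for illustration.

```python
import math

faces = range(1, 7)
dist = {0: 1.0}                       # distribution of the sum of zero dice
for _ in range(4):                    # add one die at a time (convolution)
    new = {}
    for total, p in dist.items():
        for f in faces:
            new[total + f] = new.get(total + f, 0.0) + p / 6
    dist = new

mean = sum(s * p for s, p in dist.items())
var = sum((s - mean) ** 2 * p for s, p in dist.items())

def gauss(x, mu, sigma):
    """Normal probability density with centre mu and width sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

for s in sorted(dist):
    g = gauss(s, mean, math.sqrt(var))
    print(f"{s:2d}  exact = {dist[s]:.4f}  gaussian = {g:.4f}")
```

Even though a single die has a flat distribution, the two columns already agree to within about 0.005 at every possible total.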

8
Q

Equation for standard error of the mean

A

σx̄ = σx / sqrt(n)

The term “standard error” on its own is vague (is it the uncertainty on x or on x̄?).
This equation, however, measures the precision to which we know x̄. Precision increases with n.

Therefore quote results as x̄ ± kσx̄
k is the “coverage factor”, usually 1 or 2.
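
A short sketch of the calculation, using Python's standard `statistics` module. The readings in `data` and the choice k = 2 are assumptions made for the example.

```python
import math
import statistics

data = [9.8, 10.1, 10.0, 9.9, 10.2]      # hypothetical measurements
n = len(data)
mean = statistics.mean(data)              # x̄
sigma_x = statistics.stdev(data)          # sample standard deviation (divisor n - 1)
sem = sigma_x / math.sqrt(n)              # σx̄ = σx / sqrt(n)

k = 2                                     # coverage factor (assumed choice)
print(f"mean = {mean:.3f}, sigma_x = {sigma_x:.3f}, SEM = {sem:.3f}")
print(f"result: {mean:.2f} ± {k * sem:.2f} (k = {k})")
```

Note that `statistics.stdev` uses the n - 1 divisor, while `statistics.pstdev` uses n.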

9
Q

How to put σx̄ into your answer
E.g. E = 12.34 ± 0.05 J and E = 12.34(5) J

A

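E = 12.34 ± 0.05 J - the uncertainty is given to 1 standard dev.
Here 12.34 is x̄ and 0.05 is σx̄.

E = 12.34(5) J - the same result in compact form: the bracketed digit is the uncertainty on the last quoted digit.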

10
Q

Uncertainty on the uncertainty
What are the rules?

A

• The SEM tells us about the uncertainty on the measured mean.
• What is the uncertainty on the uncertainty (or “error on the error”)? For Gaussian data the fractional uncertainty on σ is roughly 1/sqrt(2(n-1)), i.e. about 35% for n = 5, so extra digits in an uncertainty are rarely meaningful.

Hence the following rules:
• Uncertainties are quoted to only one significant figure.
• But it is good practice to use 2 figures if the leading digit is 1.
• Note that uncertainties are rounded UP, not to the nearest figure.
• 2 figures may be justified for a very large number of measurements (e.g. n > 500). For example, the NIST web site gives
elementary charge e = 1.6021766208(98) × 10⁻¹⁹ C

11
Q

Golden rules for reporting measured results

A
  1. The best estimate of a parameter is the mean
  2. Its uncertainty is the standard error in the mean (SEM)
  3. Round the uncertainty up to the appropriate number of significant figures (usually 1)
  4. Match the number of decimal places in the mean to the uncertainty, and apply “normal” rounding
  5. INCLUDE UNITS
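
Rules 3 to 5 can be sketched as a small formatting helper. The function name `report` and its arguments are hypothetical, chosen for this example. It assumes the uncertainty is positive, rounds it up to one significant figure (two if the leading digit is 1), and then rounds the mean normally to the same decimal place.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP

def report(mean, sem, unit):
    """Format 'mean ± uncertainty unit' following the rounding rules above."""
    if sem <= 0:
        raise ValueError("uncertainty must be positive")
    u = Decimal(str(sem))
    exponent = u.adjusted()                   # power of ten of the leading digit
    leading = u.as_tuple().digits[0]          # first significant digit
    sig_figs = 2 if leading == 1 else 1       # 2 figures if leading digit is 1
    quantum = Decimal(1).scaleb(exponent - sig_figs + 1)   # e.g. 0.01
    u = u.quantize(quantum, rounding=ROUND_CEILING)        # round uncertainty UP
    m = Decimal(str(mean)).quantize(quantum, rounding=ROUND_HALF_UP)  # normal rounding
    return f"{m:f} ± {u:f} {unit}"

print(report(12.3412, 0.0471, "J"))
print(report(2.3456, 0.0123, "m"))
```

The `decimal` module is used instead of floats so that the rounding acts on the decimal digits as written, rather than on their binary approximations.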
12
Q

What are confidence intervals?

A

Integrate the normal distribution G(x) between x1 and x2 to get the probability that a value lies in that range.

Widely used limits are:
1 × σ = 68% probability that the measured value is within σ of the true value
2 × σ = 95%
3 × σ = 99.7% of the data are within this

(I.e. there is only a 68% chance that the value lies in the range 12.6 ± 0.2.)

Note that the parent distribution is not normally known, so this has limited practical relevance.
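
For a normal distribution the integral from -kσ to +kσ has a closed form in terms of the error function, erf(k/sqrt(2)). The sketch below evaluates it with Python's `math.erf`; the function name `prob_within` is chosen here for illustration.

```python
import math

def prob_within(k):
    """Probability that a normally distributed value lies within k sigma of the centre."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"k = {k}: {100 * prob_within(k):.2f} %")
```

The quoted 68%, 95% and 99.7% are these values rounded.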

13
Q

Key ranges in the confidence interval

A

68% probability that true value (μ / X) is within 1 x SEM (σx̄) of x̄
95% probability that true value is within 2 x SEM of x̄
99.7% probability that true value is within 3 x SEM of x̄

Side note:
2.3 ± 0.1 is termed a confidence interval (CI), so results can also be written as x = 2.3 ± 0.2 (95% confidence interval); stating the confidence level like this is recommended for k > 1.

14
Q

Confidence intervals vs tolerances
- important

A

• x = 2.3 ± 0.1 without qualification is sadly ambiguous (i.e. it is not clear whether it is 68% or 95% etc.); however, x = 2.3(1) is not, as this is always k = 1.
• If this is a confidence interval with k = 1, then there is a 32% chance that the true value of x is <2.2 or >2.4!
• If measuring against a scale by eye, we often say we are measuring to the nearest scale point (e.g. mm). This is usually fudged into a confidence interval by writing L = 23.0 ± 0.5 mm, where 0.5 mm (half the scale interval) is a fake SEM.
• Values associated with equipment are tricky:
  • ± 1 mV might mean that the true voltage is 99.7% guaranteed to be within 1 mV (effectively a CI with k = 3), so divide by 3 to get the SEM value.
  • A tolerance of e.g. 0.2 ml on glassware probably means that the volume is guaranteed to be within this range. This is NOT a CI.

15
Q

Using error integrals to reject data points
(Principles rather than details are important)

A

1) Calculate the mean and standard dev. of all the points.
2) Then calculate t(suspect) = |mean - suspect data point| / (standard dev.). This gives the number of standard deviations the suspect point is from the mean.
3) Calculate the expected number of data points that are at least as bad as this one using n × (1 - P(t(suspect))), where P(t(suspect)) is the probability of lying within t(suspect) standard deviations of the mean, so 100% - P is the probability of lying outside.

Haven’t finished
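
The steps above can be sketched as follows. The data values and the function name `expected_outliers` are made up for the example, and the sample (n - 1) standard deviation is an assumed choice. The final comparison against 0.5 is the usual Chauvenet convention; it is an assumption here, since the card stops before that step.

```python
import math
import statistics

def expected_outliers(data, suspect):
    """Return (t_suspect, expected number of points at least this far from the mean)."""
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)              # sample standard deviation (assumed choice)
    t = abs(suspect - mean) / sigma             # step 2: distance in standard deviations
    p_within = math.erf(t / math.sqrt(2))       # P(t): probability of lying within t sigma
    return t, len(data) * (1 - p_within)        # step 3: n × (1 - P)

data = [3.8, 3.5, 3.9, 3.9, 3.4, 1.8]           # hypothetical readings; 1.8 is the suspect
t, n_bad = expected_outliers(data, 1.8)
reject = n_bad < 0.5                            # conventional Chauvenet threshold (assumption)

print(f"t_suspect = {t:.2f}, expected number = {n_bad:.2f}, reject = {reject}")
```

An expected count well below one half says a point this far out would be surprising in a sample of this size, which is the basis for flagging it.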

16
Q

Rejecting data points: pros and cons

A

• Chauvenet’s criterion is an objective means of scrutinising ONE suspect data point.
• In general, it is good practice only to reject data points that are actual mistakes.
• If conclusions depend on rejecting data you don’t like, experiment is not robust. If conclusions are not significantly compromised, why mess with data?
• Best option is repeating suspect measurements, only using fancy stats as last resort.