Statistical methods for chemical analysis Flashcards

(68 cards)

1
Q

What are statistical methods for chemical analysis?

A
  • Data
  • Distributions
  • Associations
  • Graphical methods
  • Hypothesis testing
  • Averages
  • Power
2
Q

What are the different data types, and what does each one mean?

A

* **Nominal/categorical**
o Data that you can put into a named category
- E.g., alive or dead

* **Ordinal**
o Data that you can order (and categorise)
- E.g., mild/moderate/severe

* **Interval/ratio**
o Data that has a measurement (and that you can order and categorise)
- Interval: differences between measurements are equal, but there is no true zero, e.g., time of day, temperature in °C
- Ratio: has a true zero, so ratios between values are meaningful, e.g., height, weight, percentage, concentration

3
Q

What are the 5 rules for significant figures in measurements?

A
  1. All non-zero digits **are significant**
    e.g., 563 has 3 sig. fig.
  2. All zeros between non-zero digits **are significant**
    e.g., 24006 has 5 sig. fig.
    2.404 has 4 sig. fig.
  3. Leading zeros are **not significant**
    e.g., 0.0063 has 2 sig. fig.
  4. Trailing zeros are **not significant** when there is no decimal point
    e.g., 420 has 2 sig. fig.
  5. Trailing zeros **are significant** when the number contains a decimal point
    e.g., 420.0 has 4 sig. fig.
4
Q

Data is based on measurements that are uncertain.
Not all digits have meaning (are significant), and only those digits derived from a measurement should be written down. For instance, trailing zeros, if written, must have meaning.

When doing addition/subtraction and multiplication/division, what must you ensure about significant figures?

A

**Adding and subtracting:** the answer must reflect the reliability of the least precise number
2.2 + 2.66 = 4.9 (calculator 4.86; rounded to the fewest decimal places, as 2.2 is only precise to 1 d.p.)

**Multiplication and division:** report the answer with the least number of significant figures
* 14 is not the same as 14.0: same value, but different meanings about its trustworthiness

o 2.5 x 3.42 = 8.6 (calculator 8.55; 2 s.f. in answer)
o 3.10 x 4.520 = 14.0 (calculator 14.012; 3 s.f. in answer)
o 5.042 x 20 = 100 (calculator 100.84; note 1 s.f. in answer)
o 5.042 x 20.0 = 101 (calculator 100.84; note 3 s.f. in answer)
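
The rounding rules in this card can be checked with a short script. This is a minimal sketch, assuming Python's standard `decimal` module and a hypothetical helper named `round_sig` (not part of the original card); it rounds halves upwards, which is one common convention.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(value: Decimal, sig_figs: int) -> Decimal:
    """Round value to sig_figs significant figures (halves round up)."""
    if value == 0:
        return value
    # adjusted() is the power of ten of the leading digit, e.g. 2 for 100.84
    exponent = value.adjusted() - sig_figs + 1
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)

# Addition: keep the fewest decimal places (2.2 has only one)
total = (Decimal("2.2") + Decimal("2.66")).quantize(
    Decimal("0.1"), rounding=ROUND_HALF_UP
)

# Multiplication: keep the fewest significant figures among the inputs
product_2sf = round_sig(Decimal("2.5") * Decimal("3.42"), 2)
product_3sf = round_sig(Decimal("3.10") * Decimal("4.520"), 3)
product_1sf = round_sig(Decimal("5.042") * Decimal("20"), 1)

print(total, product_2sf, product_3sf, product_1sf)
```

Decimal strings are used instead of floats so that trailing zeros such as the one in 14.0 are preserved. The 1 s.f. result is displayed in scientific notation (1E+2), which makes the single significant figure explicit.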

5
Q

How would you round when the digit to be dropped is:
* Less than 5
* Greater than 5
* 5
* Exactly 5 (followed only by zeros)

A
6
Q

What's the relationship between accuracy and precision?

A

There isn’t one

7
Q

What is accuracy?

A

A measure of the average difference between the experimental value and the true value

Differences are due to systematic errors

The true value must be known

8
Q

Every measurement has an associated uncertainty.
What's precision?

A

How close measurements are to each other

Differences are due to random errors

The distribution of the random measurements is **Gaussian or normal**

9
Q

How is the normal distribution of data described?

A

By its standard deviation (the spread of the data about the mean)

10
Q

When is a histogram used?

A

To show the distribution of the data. For normally distributed data a large sample size is better, as the histogram then approaches a bell-shaped curve

11
Q

Sometimes a histogram can have skewed data

A
12
Q

What kind of graphical method is used to compare groups and distributions?

A

Box and whisker plot

13
Q

What are the different averages and what is each one?

A
14
Q

What is the equation for calculating the mean?

A
15
Q

What is the mean?
How is it calculated?
How is it represented in a box and whisker plot?

A
16
Q

What is the mode and what type of thing do you have to look out for?

A
17
Q

If a statistical test is carried out and it gives p<0.05, what does this mean?

A

P<0.05 means that, if the two data sets came from the same distribution, there would be less than a 5% chance of seeing a difference at least this large. This suggests that they are different sets

18
Q

Draw a box and whisker plot. What does each aspect of it represent, and how would the minimum and maximum otherwise be written?

A
19
Q

For standard deviations:
* What kind of data is it used for?
* What is it a measure of?
* What does it describe?

A
20
Q

What is the equation for the standard deviation and variance?

A
21
Q

What is the **95% reference range**?

A

**Standard deviation**
o Gives an indication of spread
o 95% of observations lie within mean +/- 2 sd (actually 1.96 sd)
o 95% reference (normal) range: expect 95% of the samples in the data set to be within this range
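
As an illustration of the mean +/- 1.96 sd rule, the range can be computed with Python's standard `statistics` module. This is a sketch only: the function name `reference_range_95` and the measurement values are made up for the example, and the data is assumed to be roughly normally distributed.

```python
import statistics

def reference_range_95(values):
    """Return (lower, upper) as mean -/+ 1.96 sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return mean - 1.96 * sd, mean + 1.96 * sd

# Hypothetical replicate measurements, for illustration only
measurements = [4.8, 5.1, 5.0, 4.9, 5.2]
lower, upper = reference_range_95(measurements)
print(f"95% reference range: {lower:.2f} to {upper:.2f}")
```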

22
Q

What is standard error?

A

The standard deviation of the means of repeated representative samples is known as the standard error

23
Q

Graphical representation of a standard error

A
24
Q

What is the 95% confidence interval?

A

Standard error is a standard deviation
o Of means, rather than of data observations
o 95% of means lie within the mean (of means) +/- 2 se (the **95% confidence interval**)
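
The interval described above can be sketched in a few lines. The example assumes the usual estimate of the standard error from a single sample, se = sd / sqrt(n), and uses 1.96 in place of the rounded value 2; the function names and data are hypothetical.

```python
import math
import statistics

def standard_error(values):
    """Estimate the standard error of the mean as sd / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def confidence_interval_95(values):
    """Return (lower, upper) as mean -/+ 1.96 standard errors."""
    mean = statistics.mean(values)
    se = standard_error(values)
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical replicate measurements, for illustration only
measurements = [4.8, 5.1, 5.0, 4.9, 5.2]
lower, upper = confidence_interval_95(measurements)
print(f"mean 95% CI: {lower:.3f} to {upper:.3f}")
```

With very small samples, a multiplier taken from the t distribution is often preferred to 1.96; the sketch keeps the normal value to match the card.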

25
What is the equation for calculating standard error?
26
Error bars, deciding when to use them in your data
27
What is a **normal distribution data set**?
Data based on continuous distributions follow a mathematical distribution- usually a normal distribution
28
What do **parametric tests rely on**?
Parametric tests rely on the data being normally distributed, so plot your data to check
29
What can you use if your data is not normally distributed?
If your data is not normally distributed you may be able to transform it mathematically (e.g., log the data values, plot, and test for normality), or use a **non-parametric test**
30
What does the **central limit theorem** suggest?
Central limit theorem suggests that you can usually use parametric tests if you have a large sample size (>30)
31
When should a **non-parametric test** be used?
Non-parametric tests do not assume a particular distribution/normal distribution. Use these if your data is better represented by a median than a mean
32
Parametric tests normally assume that the variances in the sets of data are homogeneous (homoscedastic). What can be done to support this?
o Use an **F test** to check
o If **in doubt, use a non-parametric test**
33
What are **F tests**?
* F test looks to see if the ratio of the variances falls outside an expected level
* Depends on the **degrees of freedom** (n-1) in each group and the variance (s²)
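
The ratio of variances can be sketched as follows. This is a minimal example, assuming the common convention of placing the larger variance on top so that F is at least 1; the function name `f_ratio` and the data are hypothetical. The resulting F would then be compared against a critical value from F tables for the two degrees of freedom, a step not shown here.

```python
import statistics

def f_ratio(group_a, group_b):
    """Return (F, df_numerator, df_denominator) for two samples.

    The larger sample variance goes on top, so F >= 1.
    """
    var_a = statistics.variance(group_a)  # sample variance s^2
    var_b = statistics.variance(group_b)
    if var_a >= var_b:
        return var_a / var_b, len(group_a) - 1, len(group_b) - 1
    return var_b / var_a, len(group_b) - 1, len(group_a) - 1

# Hypothetical data, for illustration only
f_value, df_num, df_den = f_ratio(
    [10.0, 12.0, 14.0, 16.0, 18.0], [13.0, 14.0, 15.0]
)
print(f_value, df_num, df_den)
```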
35
When doing hypothesis tests, what is the first thing to consider?
Need to consider whether the data is **independent** (unpaired) or **dependent** (paired)
o Patients given treatment vs. patients given placebo: 2 sets of independent data
o Patients measured at baseline and then after treatment: 1 set of data (the differences), which is often normally distributed even if the original data was not
36
What is a **null hypothesis**?
* The **null hypothesis H0** assumes that there will be no observed difference because of an experiment
* The statistical test aims to look for evidence against the null hypothesis: a result that is so different from this distribution that we believe it has not occurred just by chance
* For example, if a result falls into the extremes of the distribution we might be prepared to reject the null hypothesis
* If the result does not fall into the extremes of the distribution we cannot reject the null hypothesis, but that does not mean that we accept the null hypothesis
37
What's the **alternative hypothesis**?
* The **alternative hypothesis H1** assumes that there will be an observed difference as a result of an experiment
* If what we see is not representative of the distribution expected under the null hypothesis, then we reject the null and accept the alternative hypothesis
* P<0.05: less than 5% chance of a result this extreme arising if the null hypothesis were true
* The result falls outside the 95% confidence interval
38
What is the general equation for a test statistic and what are some examples of statistical tests which can be done?
* All statistical tests involve calculating a test statistic
* The test statistic is compared with a particular distribution
* E.g., **F test, t-test, chi-squared test** etc.
39
Deciding what statistical test to use...
40
For a **t-test (or Student's t-test)**, what does the distribution describe, and what are t-tests used to compare?
* The t distribution describes sample data drawn from a normal distribution
* As the amount of data increases, it approaches the normal distribution
* T-tests are used to compare two sets of normally distributed data
41
What are the 3 different forms of t-tests and the equations for each?
**3 different forms of t-test**
o **Independent samples** t-test compares the means of 2 different groups
o **Paired samples** t-test compares means from the same group at different times
o **One sample** t-test compares the mean of a group against a known mean
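
The equations for this card are not preserved in the text, so the sketch below uses the standard textbook forms as an assumption: the one-sample statistic is (mean - known mean) / (sd / sqrt(n)), the paired test is a one-sample test on the differences, and the independent test uses a pooled variance (which assumes equal variances in the two groups). All function names are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, known_mean):
    """Return (t, degrees of freedom) for a sample against a known mean."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - known_mean) / se, n - 1

def paired_t(before, after):
    """Paired test: a one-sample test on the differences against zero."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)

def independent_t(group_a, group_b):
    """Independent test with pooled variance (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n_a + n_b - 2

# Hypothetical data, for illustration only
print(one_sample_t([5.0, 5.2, 4.8, 5.1, 4.9], 4.8))
print(paired_t([10.0, 12.0, 14.0], [11.0, 14.0, 17.0]))
print(independent_t([10.0, 12.0, 14.0, 16.0, 18.0], [11.0, 13.0, 15.0, 17.0, 19.0]))
```

Each function returns the t statistic with its degrees of freedom; the statistic would then be compared with the t distribution for those degrees of freedom.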
42
How do you calculate the degrees of freedom for multiple data sets?
**Calculating degrees of freedom of samples:**
o For two data sets: (number in sample A + number in sample B) - number of different data sets
o For one data set: degrees of freedom = n-1
43
One-tailed or two-tailed tests: what percentage do they lie in, in the normal distribution curve?
44
If there are more than 2 groups to test, what is used?
**ANOVA**
45
What is ANOVA? What is it used to compare?
* **ANOVA (analysis of variance)**
* Used to compare multiple groups in a single test; an extension of the t-test
46
What are the different types of ANOVA test you can have and what is each one used for?
* **One-way ANOVA:** compares the means of 3 or more independent groups (levels of a single independent variable)
* **MANOVA:** tests the effect of one or more independent variables on two or more dependent variables
o E.g., repeated measures over time in treated and placebo groups
* **Null:** all sample means are identical
* **Alternate:** at least one sample mean is significantly different
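
A one-way ANOVA test statistic can be sketched with the standard library. The example assumes the usual definition, F = (between-group mean square) / (within-group mean square); the function name `one_way_anova_f` and the data are hypothetical, and the comparison against F tables is not shown.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical data for three groups, for illustration only
print(one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]))
```

A large F means the group means differ by more than the scatter within groups would explain, which is evidence against the null hypothesis that all sample means are identical.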
47
When the term 'power' is used in stats, what is this describing and what is a good level of power?
Power: how many samples do I need to test?
* **Do I have enough power?**
o Is my sample size large enough to detect a significant difference where a difference truly exists (although the truth is not known to you)?
* **Questions to ask**
o What power do I need? Do I want to be 80% sure (80% power) that I will detect a difference in my test, if one really exists, or 90% sure?
o **Power = 1 - Beta** (Beta is the probability of a type II error)
o The higher the power, the more samples I will need
48
What level of significance do I want to set? If we decide that something that occurs less than 5% of the time in an experiment is unlikely to be due to chance, then we set the p value at what, and alpha becomes what? If we feel we need to be more certain that this is not a chance event, then what should we set the p and alpha values to?
o Set the **p value at p<0.05**, so **Alpha = 0.05**
o To be more certain, set **p<0.01**, so **Alpha = 0.01**
o **The lower the p value set, the more samples we will need to detect a difference where one truly exists**
49
When is power the greatest?
When the variability is reduced
50
What different things may power be?
* Power is the probability of rejecting the null hypothesis when in fact the null hypothesis is false
* Power is the probability of making a correct decision (to reject the null hypothesis) when the null hypothesis is false
* Power is the probability that a test of significance will pick up on an effect that is present
* Power is the probability that a test of significance will detect a deviation from the null hypothesis, should such a deviation exist
* Power is the probability of avoiding a type II error (a false negative)
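
To make these definitions concrete, power can be approximated for a simple case. The sketch below is an assumption, not the deck's own equation: it uses a normal (z-test) approximation for comparing two groups of equal size with a known standard deviation, and ignores the negligible chance of rejecting in the wrong direction. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test.

    difference: the true difference in means to be detected
    sd: standard deviation within each group (assumed known and equal)
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    return z.cdf(abs(difference) / se - z_crit)

power = approx_power(difference=1.0, sd=1.0, n_per_group=16)
beta = 1 - power  # probability of a type II error
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

Under this approximation, power rises when the sample size increases or the variability falls, and it drops when alpha is made stricter.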
51
What is the equation for calculating power?
52
With power there are type I and type II errors, what are each?
53
With associations how is it decided what statistical test to use?
54
For categorical data the **chi-squared test** can be used, what is the equation for this test?
55
Associations- observed data
56
Associations- expected data
57
Associations- calculations
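
The observed, expected and calculation steps for the chi-squared test can be sketched together. The equation is not preserved in the cards, so this example assumes the standard Pearson form, chi-squared = sum of (observed - expected)² / expected, with each expected count taken as (row total x column total) / grand total. The function name and the table are hypothetical.

```python
def chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for a table.

    observed: a list of rows, each row a list of counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected if rows and columns were not associated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

# Hypothetical 2 x 2 table of counts, for illustration only
print(chi_squared([[20, 30], [30, 20]]))
```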
58
The associations flow chart when looking at relationships
59
How is correlation measured and what are the different types?
60
Associations- plotting data
61
Associations method comparison...
62
What is linear regression and what is used to fit the line?
* Line is fitted using the **least squares method**
* Minimises the sum of squares of the residuals (the vertical difference of a point from the fitted line)
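
A least squares fit for a straight line y = slope * x + intercept can be sketched directly from its closed-form solution. The function name `least_squares_fit` and the data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying close to y = 2x + 1, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
slope, intercept = least_squares_fit(xs, ys)

# Residuals are the vertical distances of the points from the fitted line
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept, sum(r ** 2 for r in residuals))
```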
63
Predictions from associations
64
With associations there are r and R squared; what is each of these?
* **r is the correlation coefficient**
o Indicates the strength of the relationship between two variables
o Ranges from -1 to +1, where 0 is no correlation
* **R squared is the coefficient of determination**
o Indicates how well the x variable can be used to predict the variable on the y axis
o Ranges from 0 (poor predictor) to 1 (excellent predictor)
o R squared = 0.8 implies that the x (independent) variable explains 80% of the variation seen in the y (outcome) variable
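
Both quantities can be computed from the same sums. This sketch assumes Pearson's r and a simple linear regression with one x variable, in which case R squared is simply r squared; the function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(xs, ys)
r_squared = r ** 2  # share of the variation in y explained by x
print(f"r = {r:.3f}, R squared = {r_squared:.2f}")
```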
65
In HPLC the principles of linear regression are used to predict the concentration of an analyte based on a standard curve. In terms of validation of the method it is also important to determine the **limit of detection (LOD)** and the **limit of quantification (LOQ)**, and this can be done easily in Excel. What are the **LOD and LOQ**?
o LOD is the lowest amount of analyte that can be detected
o LOQ is the lowest amount of analyte that can be quantified with reasonable accuracy and precision
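
The card does not say how the two limits are calculated. One widely used convention, assumed here and not taken from the deck, estimates them from the calibration line as LOD = 3.3 x sd / slope and LOQ = 10 x sd / slope, where sd is the standard deviation of the residuals about the fitted line. The function name and calibration data below are hypothetical.

```python
import math

def lod_loq(concentrations, responses):
    """Return (LOD, LOQ) estimated from a linear calibration curve.

    Assumes LOD = 3.3 * sd / slope and LOQ = 10 * sd / slope, with sd the
    residual standard deviation of the least squares line.
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(concentrations, responses)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual SD uses n - 2 because two parameters were fitted
    ss_residual = sum(
        (y - (slope * x + intercept)) ** 2
        for x, y in zip(concentrations, responses)
    )
    residual_sd = math.sqrt(ss_residual / (n - 2))
    return 3.3 * residual_sd / slope, 10 * residual_sd / slope

# Hypothetical standards (concentration, detector response)
lod, loq = lod_loq([0.0, 1.0, 2.0, 3.0], [0.1, 1.9, 4.1, 5.9])
print(f"LOD = {lod:.3f}, LOQ = {loq:.3f}")
```

Both limits come out in the same units as the concentrations used for the standards.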
66
Graphing the data-HPLC analysis of caffeine
67
Part 2
Part 3
68
Part 4