CHEMOMETRICS Flashcards
When do we use permutation tests or bootstrapping?
When the observed data are sampled from an unknown or mixed distribution
When sample sizes are low
When outliers are a problem
When the distribution is too complex to estimate
Note: these resampling methods are an alternative to nonparametric approaches
How do permutation tests work, and on what basis?
Basis: if A and B are the same (null hypothesis), then the group labels don't matter
So, to test whether groups A and B are different:
Steps:
1) Calculate the observed test statistic, called t0 (for example a t statistic; the permutation test itself is nonparametric, so the statistic can be anything: ANOVA, quadratic, etc.)
2) Pool all observations into a single group
3) Randomly reassign them to groups of the same sizes as the original groups
4) Calculate the new test statistic
5) Repeat for every possible placement into groups
6) Arrange all the test statistics in ascending order: this is an empirical distribution based on the data
7) If t0 falls outside the middle 95% of the empirical distribution, reject the null hypothesis
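A minimal sketch of the steps above as an exact two-sided test, assuming the difference in group means as the test statistic (any statistic could be swapped in through the hypothetical `stat` parameter). Step 7 is expressed here as a p-value: the fraction of relabelings at least as extreme as t0; rejecting when p < 0.05 corresponds to t0 falling outside the middle 95%.

```python
from itertools import combinations
from statistics import mean


def mean_diff(a, b):
    """Test statistic: difference in group means (assumed; any statistic works)."""
    return mean(a) - mean(b)


def exact_permutation_test(group_a, group_b, stat=mean_diff):
    """Exact two-sided permutation test: evaluates every possible relabeling."""
    t0 = stat(group_a, group_b)                  # step 1: observed statistic
    pooled = list(group_a) + list(group_b)       # step 2: pool into one group
    n_a = len(group_a)
    null_stats = []
    # Steps 3-5: every way of choosing which pooled items are labelled "A"
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        null_stats.append(stat(a, b))
    null_stats.sort()                            # step 6: empirical distribution
    # Step 7 as a p-value: share of relabelings at least as extreme as t0
    p = sum(abs(s) >= abs(t0) for s in null_stats) / len(null_stats)
    return t0, p
```

For example, `exact_permutation_test([10, 11, 12], [1, 2, 3])` enumerates all 20 relabelings of the six pooled values. The number of relabelings grows combinatorially, so this is only practical for small groups.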
What is an exact test vs an approximate test in permutations?
Exact: evaluates every possible combination (relabeling). Approximate (Monte Carlo): randomly samples only some of all the possible combinations.
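A sketch of the approximate (Monte Carlo) version, again assuming a difference in means as the statistic: instead of enumerating every relabeling, the pooled data are shuffled `n_perm` times. The `n_perm` and `seed` parameters are illustrative choices, not prescribed values.

```python
import random
from statistics import mean


def approx_permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Approximate two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    t0 = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # one random relabeling
        t = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(t) >= abs(t0):
            extreme += 1
    # The +1 terms count the observed labeling itself, so p is never zero
    return t0, (extreme + 1) / (n_perm + 1)
```

With enough shuffles, the approximate p-value converges on the exact one; for the groups [10, 11, 12] and [1, 2, 3] it should land close to 0.1.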
What is bootstrapping?
Generates an empirical distribution by resampling the original sample with replacement: each new data set has the same number of samples, drawn at random from the original values (so some values appear more than once and others not at all). Each resampled data set is roughly equivalent to running the experiment again, so many resamples show where the statistic really lies instead of relying on just one data set.
(Again, this can be done with any statistic.)
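A minimal sketch of bootstrapping, assuming a percentile confidence interval is the goal (one common way to summarize the bootstrap distribution). The function name `bootstrap_ci` and its defaults are hypothetical; `stat` can be any function of a sample.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample: n draws WITH replacement from the original sample
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]
```

Usage: `bootstrap_ci([4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.4])` returns lower and upper bounds for the mean; passing `stat=statistics.median` would bootstrap the median instead.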
What is jackknifing?
A means of estimating the variance (and bias) of a statistic by subsampling: the statistic is recomputed with one sample systematically left out at a time (leave-one-out).
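A sketch of the leave-one-out jackknife, using the standard formulas var = (n - 1)/n × Σ(θ_i - θ̄)² and bias = (n - 1)(θ̄ - θ̂), where θ_i is the statistic with sample i removed. The function name is hypothetical.

```python
from statistics import mean


def jackknife(data, stat=mean):
    """Leave-one-out jackknife estimates of bias and standard error."""
    data = list(data)
    n = len(data)
    theta_hat = stat(data)
    # Recompute the statistic n times, each time leaving out one observation
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(loo) / n
    variance = (n - 1) / n * sum((t - theta_bar) ** 2 for t in loo)
    bias = (n - 1) * (theta_bar - theta_hat)
    return bias, variance ** 0.5
```

For the mean, the jackknife standard error reduces exactly to s/√n and the bias to zero, which makes it a convenient sanity check before applying it to statistics without a simple formula.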
What is k-fold cross-validation?
Used to validate a predictive model: the data are split into k subsets, and each subset is held out in turn as the validation set while the model is trained on the remaining k - 1 subsets.
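A sketch of k-fold cross-validation. The model being validated here is a straight-line least-squares fit, `fit_line`, a hypothetical stand-in for whatever predictive model (e.g. a calibration or PLS model) is actually used; the score is the root-mean-square error on each held-out fold.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (stand-in model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Hold each fold out once as the validation set; return per-fold RMSE."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in fold]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return scores
```

The mean of the per-fold scores is the usual single summary; because every observation is used for validation exactly once, it is less optimistic than scoring the model on its own training data.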
What is a time series?
A longitudinal data set collected over time. Time series analysis plots the data (what happened) and also tries to predict what happens next (forecasting).
What are the steps in time series analysis?
1) Visualize the data
2) Smooth/clean the data
3) Decompose (e.g. if the data are seasonal, such as monthly or quarterly, decompose them into a trend component, the change in level over time, and a seasonal component)
4) Show the irregular component (what is not part of the trend or seasonal pattern)
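A sketch of steps 3 and 4 as a classical additive decomposition, assuming a known seasonal period `m` (e.g. 4 for quarterly, 12 for monthly data). The trend is a centred moving average over one period, the seasonal effect is the average detrended value at each position in the season, and the irregular component is whatever remains. Function names are hypothetical.

```python
def centered_moving_average(y, m):
    """Trend estimate: centred moving average spanning one seasonal period m.

    For even m a 2 x m moving average keeps the window centred.
    Ends that cannot be centred are returned as None.
    """
    n = len(y)
    if m % 2 == 1:
        weights = [1 / m] * m
    else:
        weights = [0.5 / m] + [1 / m] * (m - 1) + [0.5 / m]
    half = len(weights) // 2
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(w * y[t - half + j] for j, w in enumerate(weights))
    return trend


def additive_decompose(y, m):
    """Classical additive decomposition: y = trend + seasonal + irregular."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasonal periods")
    trend = centered_moving_average(y, m)
    # Average the detrended values at each position within the season
    buckets = [[] for _ in range(m)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % m].append(obs - tr)
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / m
    pattern = [s - offset for s in raw]      # seasonal effects sum to zero
    seasonal = [pattern[t % m] for t in range(len(y))]
    irregular = [None if tr is None else obs - tr - s
                 for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, irregular
```

Plotting the three returned series alongside the original data is the usual way to inspect steps 3 and 4.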
What patterns do people see in time series?
Additive trend (increase over time)
Additive seasonal (goes up and down with the seasons, almost sinusoidal, with a constant amplitude)
Multiplicative (trend with seasonal swings that get larger/wider as the level rises)
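The additive and multiplicative patterns correspond to the two standard decomposition models, written here with T for trend, S for seasonal, and I for irregular (notation assumed for this card):

```latex
\text{Additive:}\quad y_t = T_t + S_t + I_t
\qquad
\text{Multiplicative:}\quad y_t = T_t \times S_t \times I_t
\;\Longrightarrow\;
\log y_t = \log T_t + \log S_t + \log I_t
```

Taking logs turns a multiplicative series into an additive one, which is why a log transform is often applied before an additive decomposition when the seasonal swings widen over time.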
How are time series smoothed?
Moving average: average each point with its neighbors; k = how many points are in the window
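A minimal sketch of a simple moving average with window `k`, using a running sum so each step costs O(1). Each output value is the mean of `k` consecutive points; assigning it to the centre of its window gives the "average the points next to you" form. The function name is hypothetical.

```python
def moving_average(y, k):
    """Simple moving average: mean of each window of k consecutive points.

    Returns len(y) - k + 1 values; value i averages y[i : i + k].
    """
    if not 1 <= k <= len(y):
        raise ValueError("k must be between 1 and len(y)")
    window_sum = sum(y[:k])
    out = [window_sum / k]
    for i in range(k, len(y)):
        window_sum += y[i] - y[i - k]        # slide the window one step
        out.append(window_sum / k)
    return out
```

For example, `moving_average([1, 2, 3, 4, 5], 3)` gives `[2.0, 3.0, 4.0]`. A larger k smooths more but blurs short-lived features.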
Exponential forecasting models
Single: a series with a constant level and an irregular component (no trend or seasonality)
Double (Holt) exponential: a series with a level and a trend
Triple (Holt-Winters) exponential: a series with a level, trend, and seasonality
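A sketch of the single and double (Holt) models using the standard recursions: level ← αy + (1 - α)(previous level + previous trend) and trend ← β(level change) + (1 - β)(previous trend). The initialization (first observation as level, first difference as trend) is one simple choice assumed here; the smoothing constants α and β would normally be tuned. Triple (Holt-Winters) adds a third, seasonal recursion and is not shown.

```python
def simple_exp_smoothing(y, alpha):
    """Single exponential smoothing: level only (no trend, no season)."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level                     # flat forecast for every future step


def holt_forecast(y, alpha, beta, horizon):
    """Double (Holt) exponential smoothing: level + trend."""
    level, trend = y[0], y[1] - y[0]         # simple initialisation (assumed)
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast extrapolates the final level along the trend
    return [level + h * trend for h in range(1, horizon + 1)]
```

Usage: `holt_forecast([2, 5, 8, 11], 0.5, 0.3, 2)` continues the straight line; the single model would instead forecast a flat level, which is why it suits only series without a trend.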
Types of Error
Type I (alpha): rejection of a true null hypothesis (false positive)
Type II (beta): non-rejection of a false null hypothesis (false negative)
What is LOD?
The lowest amount of analyte in a sample that can be detected with a specified confidence level
Is LOD agreed upon?
No; there is no single definition. It is typically based on a signal-to-noise relationship.
Draw the signal and blank distribution curves used for LOD determination: what do the shaded areas represent?
Used for LOD determination: we take the standard deviation (sd) of the blank and want our lowest reported signal to be about 3 sd above the blank mean.
The blank produces a distribution of signals; the lowest signal we analyze should lie above it, so the question is how much the two distributions may overlap.
A decision threshold is set between the two distributions. The tail of the BLANK distribution above the threshold is alpha: a false positive (type I error, a blank reported as detected). The tail of the SAMPLE distribution below the threshold is beta: a false negative (type II error, analyte present at the LOD but not detected).
We ideally want only 5% for each error. With a one-sided z of 1.645, the threshold sits 1.645 sd above the blank mean and the LOD another 1.645 sd above that, so the LOD is 2 × 1.645 ≈ 3.3 sd above the blank mean.
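A sketch that computes both error rates for an LOD placed `k_lod` blank standard deviations above the blank mean. It assumes normal blank and sample distributions with the same sd, and by default puts the decision threshold halfway between them; the function name and that default are assumptions for illustration.

```python
from statistics import NormalDist


def error_rates(k_lod, k_decision=None):
    """Alpha (false positive) and beta (false negative) for an LOD at
    k_lod blank sds above the blank mean, in units of the blank sd."""
    if k_decision is None:
        k_decision = k_lod / 2               # threshold midway (assumed)
    z = NormalDist()                         # standard normal
    alpha = 1 - z.cdf(k_decision)            # blank tail ABOVE the threshold
    beta = z.cdf(k_decision - k_lod)         # sample tail BELOW the threshold
    return alpha, beta
```

With `k_lod = 3.29` both rates come out at about 5%; with `k_lod = 3.0` each rises to roughly 6.7%, which is why 3.3 rather than 3 is quoted when 5% for both errors is the target.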
LOQ vs LOD?
LOQ uses 10 × the standard deviation (LOD uses about 3 or 3.3 ×)
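How do you calculate LOD or LOQ from signal to noise?
A signal-to-noise estimate needs to be verified with another method.
In signal units: blank mean + 3 × sd (LOD) or blank mean + 10 × sd (LOQ)
From a linear calibration curve: LOD = 3.3 × sd / b and LOQ = 10 × sd / b,
where b is the slope of the linear regression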
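A sketch of the calibration-curve route. It fits the line by ordinary least squares and, as one common choice, takes the standard deviation as the residual standard deviation of the regression (n - 2 degrees of freedom); the blank sd is an alternative. The function name is hypothetical.

```python
def lod_loq_from_calibration(conc, signal):
    """LOD = 3.3*sd/b and LOQ = 10*sd/b from a linear calibration curve.

    b is the least-squares slope; sd is the residual standard deviation
    of the regression (assumed choice; the blank sd is another option).
    """
    n = len(conc)
    mx, my = sum(conc) / n, sum(signal) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    b = sum((x - mx) * (y - my) for x, y in zip(conc, signal)) / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(conc, signal))
    sd = (ss_res / (n - 2)) ** 0.5          # n - 2 degrees of freedom
    return 3.3 * sd / b, 10 * sd / b
```

Usage with made-up illustration data: `lod_loq_from_calibration([0, 1, 2, 3, 4, 5], [0.1, 2.0, 4.1, 5.9, 8.1, 10.0])` returns both limits in concentration units; dividing by the slope is what converts from signal units to concentration.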
What are selectivity and specificity?
Selectivity: the ability of a method to determine the analyte in a complex matrix without interference
Specificity: confirms the method's ability to assess the analytes in the presence of any other components that might be present (including the matrix)
So specificity is "selectivity plus"
Accuracy vs precision
Accuracy: trueness or bias; a measure of systematic error, assessed by comparison with a reference value
Precision: closeness of agreement between repeated individual measurements under specified conditions
How to run accuracy and precision tests?
Accuracy: test against a standard (reference) material, within-run and between-run; express as bias, using a low and a high QC
Precision: use % CV
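A sketch of the two summary figures, applied separately to each QC level (low and high) and to within-run or pooled between-run replicates. Function names are hypothetical, and no acceptance limits are implied since the card does not give any.

```python
from statistics import mean, stdev


def bias_percent(measured, nominal):
    """Accuracy: % bias of the mean QC result from the reference value."""
    return (mean(measured) - nominal) / nominal * 100


def cv_percent(measured):
    """Precision: % coefficient of variation (relative standard deviation)."""
    return stdev(measured) / mean(measured) * 100
```

Usage: for five replicates of a 10.0 QC reading `[9.8, 10.2, 10.0, 10.4, 9.6]`, `bias_percent(..., 10.0)` is essentially 0% (no systematic error) while `cv_percent(...)` is about 3.2% (random scatter), showing that the two measures are independent.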
ROBUSTNESS: what is it?
The capacity of a method to remain unaffected by small variations in its parameters; test over a range of parameter values
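UNCERTAINTY
Significant sources must be identified and tabulated
2 types: A and B
Type A: evaluated statistically from repeated observations (often associated with random effects)
Type B: evaluated by other means, e.g. certificates, specifications, or judgment (often associated with systematic effects)
Example sources: user skill, sampling, environment, instrument, etc.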
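A sketch of turning a tabulated list of sources into one number. It assumes independent components expressed in the same units: a Type A value from replicates (s/√n), a Type B value from a ± tolerance treated as a rectangular distribution (a/√3), and a root-sum-of-squares combination expanded with a coverage factor k = 2. Function names are hypothetical.

```python
import math
from statistics import stdev


def type_a(replicates):
    """Type A: standard uncertainty of the mean from repeated observations."""
    return stdev(replicates) / math.sqrt(len(replicates))


def type_b_rectangular(half_width):
    """Type B: a +/- half_width tolerance with no other information,
    treated as a rectangular distribution, so u = a / sqrt(3)."""
    return half_width / math.sqrt(3)


def combined_uncertainty(components, k=2):
    """Root-sum-of-squares of independent standard uncertainties,
    plus the expanded uncertainty U = k * u_c (k = 2 ~ 95 % coverage)."""
    u_c = math.sqrt(sum(u ** 2 for u in components))
    return u_c, k * u_c
```

Usage: `combined_uncertainty([type_a(reps), type_b_rectangular(0.05)])` combines a replicate-based term with, for example, a glassware tolerance. Squaring before summing means the largest source dominates, which is why the tabulation step matters.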
Stability
Use QC samples stored under different conditions (room temperature, 4 °C, etc.) and test them against freshly prepared samples
HOW DO WE HANDLE NON-DETECTS?
Exclude or delete them from the data set (worst option)
Substitute a value (0, 1/2 LOD, LOD, etc.)
Left- and right-censoring indicate whether the unknown value is too low or too high to be measured
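A sketch of the first two options for left-censored data (values below the LOD). It assumes non-detects are recorded either as `None` or as a reported value below the LOD; `fraction` selects the 0, 1/2 LOD, or LOD substitution rule. Function names are hypothetical, and this is simple substitution, not a censored-data model.

```python
def substitute_non_detects(values, lod, fraction=0.5):
    """Replace non-detects (None or < LOD) with fraction * LOD.

    fraction = 0, 0.5 or 1 gives the 0, LOD/2 or LOD substitution rules.
    """
    return [fraction * lod if (v is None or v < lod) else v for v in values]


def drop_non_detects(values, lod):
    """'Exclude' option (generally the worst): discard non-detects."""
    return [v for v in values if v is not None and v >= lod]
```

Usage: with an LOD of 1.0, `substitute_non_detects([None, 0.3, 2.0, 5.0], 1.0)` gives `[0.5, 0.5, 2.0, 5.0]`, while `drop_non_detects` keeps only `[2.0, 5.0]`. Dropping shrinks the sample and biases the mean upward, which is why it ranks as the worst option.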