CHEMOMETRICS Flashcards

1
Q

When do we use permutation tests or bootstrapping

A

When the observed data are sampled from an unknown or mixed distribution
When sample sizes are small
When outliers are a problem
When the distribution is too complex to estimate
Note: this is an alternative to non-parametric approaches

2
Q

How do permutation tests work? On what basis?

A

Basis: if groups A and B are the same (null hypothesis), the labels don’t matter, so shuffling them should not change the test statistic much
Steps, when testing whether groups A and B are different:
1) calculate the observed test statistic (for example a t statistic, but it can be anything: ANOVA F, a quadratic fit etc.) - call it t0
2) place all observations in a single group
3) randomly reassign them to groups of the same sizes as the originals
4) calculate the new test statistic
5) repeat - for every possible placement into groups (or for a large random sample of them)
6) arrange all the test statistics in ascending order - this is an empirical distribution based on the data
7) if t0 falls outside the middle 95% of the empirical distribution, reject the null hypothesis
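
A minimal Python sketch of the steps above, as an approximate (randomly sampled) two-sided test. The difference in means as the statistic, the function name and the example data are assumptions for illustration only:

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=2000, seed=0):
    """Approximate two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    t0 = mean(a) - mean(b)               # step 1: observed statistic
    pooled = list(a) + list(b)           # step 2: pool everything
    extreme = 0
    for _ in range(n_perm):              # step 5: repeat many times
        rng.shuffle(pooled)              # step 3: random relabelling
        t = mean(pooled[:len(a)]) - mean(pooled[len(a):])   # step 4
        if abs(t) >= abs(t0):            # at least as extreme as observed
            extreme += 1
    # add-one correction keeps the p value away from exactly zero
    return t0, (extreme + 1) / (n_perm + 1)

# hypothetical example data
group_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7]
group_b = [10.2, 10.9, 10.5, 11.1, 10.4, 10.8]
t0, p = permutation_test(group_a, group_b)
```

Counting how often the shuffled statistic is at least as extreme as t0 is equivalent to sorting the statistics and checking where t0 falls.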

3
Q

What is an exact test vs approximate test in permutations?

A

Exact evaluates every possible rearrangement of the labels, whereas approximate evaluates only a random sample of them (needed when there are too many to enumerate)

4
Q

What is Bootstrapping

A

Generates an empirical distribution by resampling the original sample with replacement - basically make a bunch of data sets with the same number of observations as the original, drawn from the original values, and treat that as the equivalent of running the experiment a bunch of times. Computing the statistic on each data set shows how much it varies, instead of having just one estimate
(again, can be done with any statistic)
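
A minimal Python sketch of a percentile bootstrap confidence interval, assuming the mean as the statistic; the function name and the data are made up for illustration:

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat`."""
    rng = random.Random(seed)
    n = len(data)
    # each resample has the same size as the original, drawn with replacement
    stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# hypothetical replicate measurements
data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]
lower, upper = bootstrap_ci(data)
```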

5
Q

What is Jackknifing

A

It’s a means of estimating variance (and bias) by subsampling: systematically leave out one observation (or one block) at a time and recompute the statistic on the rest
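
A minimal Python sketch of the leave-one-out jackknife standard error. The function name and data are assumptions for illustration; for the mean, the result matches the usual s / sqrt(n):

```python
from math import sqrt
from statistics import mean, stdev

def jackknife_se(data, stat=mean):
    """Leave-one-out jackknife estimate of the standard error of `stat`."""
    n = len(data)
    # recompute the statistic n times, leaving out one observation each time
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    centre = mean(leave_one_out)
    return sqrt((n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out))

# hypothetical replicate measurements
data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
se = jackknife_se(data)
```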

6
Q

What is K fold cross validation

A

Used to validate a predictive model - splits the data into K subsets (folds); each fold is held out in turn as the validation set while the model is trained on the other K-1 folds, and the K results are averaged
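
A minimal Python sketch of how the folds are formed, assuming the data are already in random order (in practice you would shuffle first). The function name is hypothetical:

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for K fold cross validation."""
    folds = [list(range(i, n, k)) for i in range(k)]   # every k-th index
    for held_out in folds:
        train = [j for j in range(n) if j not in held_out]
        yield train, held_out

splits = list(k_fold_indices(10, 5))
```

Each observation lands in exactly one validation fold, so every point is used for testing once and for training K-1 times.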

7
Q

What is a time series?

A

Longitudinal data sets - observations recorded over time - used to describe the data (what happened) but also to predict what happens next (forecast)

8
Q

What are the steps in time series analysis

A

1) visualize the data
2) smooth / clean
3) decompose (e.g. a seasonal series, such as monthly or quarterly, can be decomposed into a trend component (change in level over time), a seasonal component and an irregular component)
4) show the irregular component (not part of trend or season)

9
Q

What are trends people see in time series

A

Additive trend (level increases over time)
Additive seasonal (goes up and down with the seasons - almost sinusoidal, constant amplitude)
Multiplicative (the seasonal swings get larger/wider as the level rises)

10
Q

How are things smoothed in time series

A

Moving average - each point is replaced by the average of itself and the points next to it - k = how many points are averaged
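
A minimal Python sketch of a centered moving average, assuming an odd k so the window is symmetric; the function name is hypothetical:

```python
def moving_average(series, k):
    """Centered moving average over k points (k must be odd)."""
    if k % 2 == 0:
        raise ValueError("k must be odd for a centered window")
    half = k // 2
    # the first and last `half` points have no full window and are dropped
    return [sum(series[i - half:i + half + 1]) / k
            for i in range(half, len(series) - half)]

smoothed = moving_average([1, 2, 3, 4, 5], 3)
```

A larger k gives a smoother curve but loses more points at each end.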

11
Q

Exponential forecasting models

A

Single exponential - for a series with a constant level and an irregular component (no trend or seasonal)
Double exponential (Holt) - for a series with a level and a trend
Triple exponential (Holt-Winters) - for a series with level, trend and seasonal components
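
A minimal Python sketch of single exponential smoothing, the simplest of the three. Starting the level at the first observation is one common choice and an assumption here, as is the function name:

```python
def simple_exp_smoothing(series, alpha):
    """Single exponential smoothing; alpha in (0, 1] weights recent points."""
    level = series[0]                  # assumed starting level
    fitted = [level]
    for y in series[1:]:
        # new level = weighted average of the new observation and the old level
        level = alpha * y + (1 - alpha) * level
        fitted.append(level)
    return fitted

fitted = simple_exp_smoothing([10, 20, 30], 0.5)
```

The last level is the forecast for every future point; Holt and Holt-Winters add similar update equations for the trend and the seasonal component.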

12
Q

Types of Error

A

I - alpha - rejection of a true null hypothesis (false positive)
II - beta - non-rejection of a false null hypothesis (false negative)

13
Q

What is LOD

A

lowest amount of analyte in a sample that can be detected (not necessarily quantified) WITHIN a specific confidence level

14
Q

is LOD agreed upon?

A

No - there is no single agreed definition - typically based on a signal-to-noise (s/n) relationship

15
Q

Draw curves for signal to noise and blank and what shades represent what

A

Used for LOD determination - we measure the stdev of the blank and want our signal to sit about 3x that above the blank
Picture two normal distributions: the blank and the lowest signal we analyze. We want the lowest signal to be above the blank, but how much overlap between the distributions do we accept?
We ideally want just 5% overlap on each side, and for that the means need to be about 3.3 stdev apart (2 x 1.645, assuming both distributions have the same stdev), with the decision threshold halfway between
The tail of the sample distribution that falls below the threshold, on the blank side, is our BETA rate - false negative
The tail of the blank distribution that falls above the threshold, on the sample side, is alpha - false positive
So 3 x or 3.3 x sd of the blank is often used to achieve that - about 5% for type I and 5% for type II error (the type I shading sits on the sample side, the type II shading on the blank side)

16
Q

LOQ vs LOD

A

LOQ uses 10 x the stdev of the blank, where LOD uses about 3 x (3.3 x)

17
Q

Calculate LOD or LOQ from signal to noise

A

need to use it with another method to verify
from the blank signal: mean of blank + 3 * stdev (LOD) or + 10 * stdev (LOQ)
with a linear cal curve: 3.3 * stdev / b (LOD) or 10 * stdev / b (LOQ)
where b is the slope of the linear regression
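
A minimal Python sketch of the calibration-curve formulas above. The function name, the blank signals and the slope are made-up values for illustration:

```python
from statistics import stdev

def lod_loq(blank_signals, slope):
    """LOD and LOQ in concentration units from blank stdev and cal slope b."""
    s = stdev(blank_signals)
    return 3.3 * s / slope, 10 * s / slope

# hypothetical blank replicates (signal units) and slope (signal per ng/mL)
blanks = [0.021, 0.018, 0.025, 0.019, 0.023, 0.020, 0.022]
lod, loq = lod_loq(blanks, slope=0.15)
```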

18
Q

What are selectivity and specificity

A

Selectivity - ability of the method to determine the analyte in a complex matrix without interference
Specificity - confirms the method's ability to assess the analytes in the presence of any other components that might be present (including matrix)
so specificity is selectivity +

19
Q

Accuracy vs precision

A

Accuracy - trueness or bias - measure of systematic error compared to a reference
Precision - closeness of repeated individual measurements under specified conditions

20
Q

How to run accuracy and precision tests

A

Accuracy - measure against a standard (reference) material - want bias within and between runs - use a low and a high QC
Precision - use % CV

21
Q

ROBUST what is it

A

capacity of the method to be unaffected by natural (small) variation - test over a range of parameters

22
Q

UNCERTAINTY

A

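significant sources must be identified and tabulated
2 types of evaluation: A and B
A - evaluated statistically from repeated measurements (often loosely described as the random part)
B - evaluated by other means, e.g. certificates, specifications, prior knowledge (often loosely described as the systematic part)
example sources - user skill, sampling, environment, instrument, etc.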

23
Q

Stability

A

use QC samples - store at room temp, 4 °C etc. - test against fresh

24
Q

HOW DO WE HANDLE NON DETECTS

A

Exclude or delete from the data set (worst)
Substitute (0, 1/2 LOD, LOD etc.)
Treat as censored data - left-censored means the unknown true value is too low (below a limit such as the LOD), right-censored means it is too high (above a limit)

25
What is survival analysis and NADA
Survival analysis - how long it will be before an event occurs (e.g. medical). NADA is Nondetects And Data Analysis
26
What is fit for purpose
ensures the analytical method fulfils certain criteria of reliability and can perform - gives us confidence and shows it's reproducible and repeatable (so as a list: reproducible, broad coverage, sensitivity and selectivity, linearity, precision, stable)
27
What are some key points for precision testing
The sample should be stable and homogeneous (representative of what's being tested). It should be applied to the whole procedure, from sample preparation through analysis. 2 factors - the precision estimate and the design of the precision experiment
28
What are the types of precision estimates
1) REPEATABILITY - within batch or intra-assay - one analyst on the same equipment over a short time period
2) Intermediate precision - made in a single lab but under variable conditions (different days, analysts, equipment etc.) - within-lab reproducibility
3) Reproducibility - different labs - different equipment (interlab)
29
Types of precision experiments
Simple replication - repeated measurements on a suitable sample - want 6-15 reps
Nested design - used when simple replication can't generate enough reps (not feasible) - replicates are grouped within batches, each batch run under different conditions - so can be interlab, intralab etc.
30
PRECISION limits - what are they and how to calc
Repeatability limit: r = t * sqrt(2) * sr - the 95% confidence limit for the difference between two results obtained under repeatability conditions
Reproducibility limit: R = t * sqrt(2) * sR
t is the two-tailed Student's t for the chosen confidence level and DOF
In practice they are calculated by multiplying the repeatability standard deviation (sr) or the reproducibility standard deviation (sR), respectively, by 2.8. The factor 2.8 is derived from 1.96 (95% of the population is within 1.96 standard deviations of the mean) times the square root of 2.
31
How to statistically evaluate precision estimates
F test
32
What is bias and how calculated and how evaluated
difference from the true (accepted) value - so just mean minus accepted value - can be expressed as a % - evaluated with a t test
33
Ruggedness study - how evaluated/set up
PLACKETT-BURMAN design - 7 parameters to study - you pick them (e.g. extraction time) and each has two levels (e.g. 30 min extraction vs 10 min). To investigate the effect of a parameter, take the difference between the average of the results at its normal level and the average of the results at its alternate level
34
Measurement of uncertainty - what is and how tested
dispersion of the values that could reasonably be attributed to the measurement - e.g. a stdev - component uncertainties can be propagated
35
ROC curves
ROC = receiver operating characteristic. An ROC curve is a graph showing the performance of a classification model at all classification thresholds: it plots the true positive rate (TPR) against the false positive rate (FPR), showing the tradeoff between sensitivity and specificity - the same idea as the LOD vs blank curves
TPR = TP / (TP + FN), i.e. out of all actual positives
FPR = FP / (FP + TN), i.e. out of all actual negatives
We often take the AUC - area under the curve. AUC ranges in value from 0 to 1 - a model with 100% wrong predictions has an AUC of 0, one with 100% right predictions has an AUC of 1, and random guessing gives about 0.5
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
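
A minimal Python sketch of building ROC points by sweeping the threshold and integrating the AUC with the trapezoid rule. The function names, labels (1 = positive) and scores are made up for illustration:

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for th in sorted(set(scores), reverse=True):
        # everything scoring at or above the threshold is called positive
        tp = sum(1 for y, s in zip(labels, scores) if s >= th and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= th and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# hypothetical classifier output
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9.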
36
What is metrology
the science of measurement - a formal system to enable informed decisions through data assessment - gives levels of confidence in what we're doing - a reliable network of measurements we can use to confidently make assessments about concentration
37
3 fundamentals of metrology
1) TRACEABILITY - links results to SI units - go from standards to where you are (higher order standards down to CRMs)
2) UNCERTAINTY (of measurement) - using the RSD of results to make claims, not individual points - the dist
3) VALIDATION (of methods etc.)
38
What is QA and examples
The planned and systematic activities implemented in a quality system so that quality requirements for a service will be fulfilled; quality assurance occurs before the data are collected, e.g. suitable lab environment, educated staff, training, documented and validated methods, preventative actions, etc.
39
What is QC
Quality Control: the observation techniques and activities used to evaluate and report quality; quality control occurs during and after data collection. Examples: blanks, spiked samples, controls, reference materials etc.
40
Example of system suitability testing
run a small number of standards (not in biological matrix) - acquire data for accuracy and precision - m/z accuracy, RT and peak shape are assessed
41
Dispersion ratio - what is it
Compares the standard deviation of the pooled QC sample with the stdev of the test samples - can use the D-ratio (MAD of QC / MAD of samples). A D-ratio of 0% means the technical variance is 0 - perfect measurement, all changes are due to biological cause. A D-ratio of 100% means all variance is due to noise - no bio info
42
What's pooled QC
a single (pooled) QC sample is generated that can be distributed evenly throughout the analytical batch
43
Batch design and pooled QC - what can you do
basically run the pooled QC at intervals throughout the batch to see time-based variance (drift)
44
Main uses of reference materials
examining skills of the analyst
controls
precision
accuracy
accreditation
measurement uncertainty
45
How to score proficiency testing results
2 steps - specify the assigned value and set the standard deviation
ASSIGNED VALUE - can be known (CRM), a REFERENCE value (one lab determines it) or a consensus value from the other labs
STDEV - set by the scheme organizer - by prescription, from the results of a reproducibility experiment, or from a general model (e.g. the Horwitz function)
46
What are Z and Q in proficiency testing
Z score is what we think - (value minus assigned value) divided by stdev - |z| of 2 or less is pretty good (satisfactory), between 2 and 3 is questionable, 3 or more is unsatisfactory
Q score - alternative to Z - (value minus assigned value) divided by the assigned value, so it takes no account of the stdev - the dist of Q is centered on 0 - relies on an EXTERNAL PRESCRIPTION of acceptability
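
A minimal Python sketch of the two scores. The function names and numbers are made up; the |z| cut-offs of 2 and 3 are the commonly used ones and are an assumption about the scheme:

```python
def z_score(result, assigned, sigma):
    """Proficiency z score: deviation from the assigned value in stdev units."""
    return (result - assigned) / sigma

def q_score(result, assigned):
    """Q score: relative deviation from the assigned value (no stdev involved)."""
    return (result - assigned) / assigned

def classify(z):
    """Conventional interpretation of |z| (assumed cut-offs of 2 and 3)."""
    if abs(z) <= 2:
        return "satisfactory"
    if abs(z) < 3:
        return "questionable"
    return "unsatisfactory"

# hypothetical lab result against an assigned value of 10.0 with sigma 0.25
z = z_score(10.5, 10.0, 0.25)
q = q_score(10.5, 10.0)
```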
47
What is a YOUDEN plot
scatter plot - plots results from multiple labs on one graph to show if labs are equivalent, outliers, inconsistent etc. x and y each represent one of the reported values (e.g. concentration of analytes A and B). Draw lines parallel to the x and y axes; where a lab's point falls relative to them indicates various things about its results - e.g. random error vs systematic error
48
SHEWHART plot
sequential plot of observations from a QC material analyzed in successive runs - mean QC result for each run against run / measurement # (the y axis shows the mean)
49
General principles of experimental design
Research method where you manipulate independent variables and look at the dependent variable. Things to do: arrange experiments for cancellation or comparison (to deal with bias); plan for replication or independent uncertainty estimates (precision); plan the statistical analysis or approach
50
Experimental designs list 4
Simple replication - series of observations on a single test material
Linear calibration design - observations at a range of levels (of some quantitative factor)
Nested - the levels of one factor are unique to each level of another factor
Factorial - factors or levels are crossed, not wholly distinct - e.g. one group can get one factor, another group can get another, and another group can get both factors at once
51
Why do we randomize in experimental design
to minimize nuisance effects - unwanted effects that influence the results - e.g. so results are not affected by run order / sampling order
53
What is blocking
Basically have all replicates/groups of test materials subject to the same nuisance effects (run at the same time) - e.g. we have sets a, b and c - we can run them separately, or run all of them in the same trial (block) so they are subject to the same effects
54
Sampling theory (define randomization, representation and composite)
Randomization - all members of the population have an equal chance of selection
Representation - have enough of the population to draw inferences about the total population
Composite - reduce effort by combining individuals to make a subset
55
List different sampling strategies
Simple - everything has an equal chance (easy, but not great for long continuous sequences, and doesn't reflect subgroups in the population)
Stratified - divide the population into segments and randomly sample each segment - good because it reduces variance further - can capture unique pockets
Systematic - first select a random starting point m, then sample at a fixed interval after it - simple and easy - covers everything regularly - cannot deal with variation tied to specific positions (e.g. periodic) - will miss it
56
4 quantities of power analysis
sample size
significance level (alpha, the probability of making a type I error)
power - one minus the probability of making a type II error (the probability of finding an effect that is there)
effect size - magnitude of the effect under the alternative (research) hypothesis
57
How do you determine how many participants are needed for a study
Power analysis - fix the sig level, power level and effect size and solve for the sample size - in R, power.t.test or the pwr package - can be done for various tests: t tests, ANOVA (need the means, common error variance etc.), linear regression, chi squared etc.
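
A minimal Python sketch of the idea for a two-sample comparison, using the normal approximation. This is an assumption made for simplicity: an exact t-based calculation, as the R functions do, gives a slightly larger n. The function name is hypothetical:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample test.

    effect_size is Cohen's d (difference in means divided by the common stdev).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the sig level
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n_medium = n_per_group(0.5)   # medium effect
n_large = n_per_group(0.8)    # large effect needs fewer participants
```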
58
What is proportionality constant k
basically the signal from the instrument = the concentration * this factor (S = k*C)
59
What is single point cal
basically just using this proportionality factor from a single standard - just one point (S = k*C) - by default this also assumes the line goes through 0
60
Sensitivity from calibration curve
sensitivity is the slope b - capability of responding reliably to changes in analyte concentration
61
What is r in cal curve -
It's the Pearson correlation coefficient - describes the relationship between response and concentration - ranges from -1 to 1. R^2 measures how closely the data fit the linear model - 0.99 means 99% of the variability in our response is accounted for by changes in concentration
62
How to evaluate matrix effect
take blank sample matrix - extract it and spike the analyte in - compare to a normal standard solution: (response in matrix / response in standard) - 1 - a negative value means suppression (positive means enhancement). OR do spiked recovery - compare the matrix unspiked to the matrix spiked (in the same matrix) - this is (spiked sample - unspiked sample) / C added x 100
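
A minimal Python sketch of both calculations. The function names and numbers are made up, and the matrix effect is expressed here as a percentage:

```python
def matrix_effect_pct(response_in_matrix, response_in_standard):
    """Matrix effect in %: negative = suppression, positive = enhancement."""
    return (response_in_matrix / response_in_standard - 1) * 100

def spike_recovery_pct(spiked, unspiked, conc_added):
    """Spiked recovery in %, all three values in concentration units."""
    return (spiked - unspiked) / conc_added * 100

# hypothetical peak areas and concentrations
me = matrix_effect_pct(80.0, 100.0)
rec = spike_recovery_pct(15.0, 5.0, 10.0)
```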
63
Types of blanks
method blank - unspiked sample taken through the method
reagent blank - just solvent
field blank - unspiked sample that goes on the trip (trip blank is the same but unopened)
64
Weighted regression
the error of a measurement is proportional to concentration, so larger concentrations carry more error - so we give higher weights to the points where the error bars are smallest (divide by n -
65
Methods of standard addition
make cal curve in sample -
66
ISTD
internal standard - types: structural analogue, stable-isotope-labeled analogue
67
Isotope dilution
basically add a known amount of an isotopically labeled version of the analyte as the internal standard and quantify from the analyte / ISTD ratio - same ISTD in all samples
68
Multi LDR
basically if the linear dynamic range is not large enough - make 2 curves
69
OMICS quantitation - how
no cal curve, so use the response ratio: response IS / conc IS = response target / conc target
70
What is a neural net
series of algorithms designed to recognize underlying relationships in a large data set (input layer - hidden layer - output layer)
71
What is machine learning
computer program that improves its performance in a task through experience
72
4 ingredients of machine learning
1) data
2) a model that specifies how input data relate to output
3) a loss function - shows how well the model performs
4) an optimization algorithm - so it can improve the model and minimize the loss function
73
What is overfitting (also underfitting)
your model matches the training set too closely and isn't generalizable (underfitting is if too loose - wrong assumptions made)
74
What is supervised machine learning and what are common types
You tell it to develop a model based on input AND known output, e.g. classification or regression
75
What is a decision tree
binary splits on predictor variables to create a tree and classify observations into one of two groups (repeatedly) - at each split we choose the predictor that best splits the two groups - want HOMOGENEITY in each group maximized (i.e. the groupings make sense)
76
What is a conditional inference tree
a decision tree where splits are based on significance tests
77
What is Random Forest
Ensemble learning approach - grows many decision trees (each on a bootstrap sample, with a random subset of predictors at each split) and combines their votes to improve classification rates
78
What are support vector machines
SVMs are SUPERVISED machine learning - for classification and regression - seek the optimal hyperplane for separating two classes in multidimensional space
79
What is a confusion matrix
a matrix of actual class (rows) vs predicted class (columns), basically:
True Neg   False Pos
False Neg  True Pos
80
Stats from a confusion matrix
Sensitivity = TP / total actual positives (FN + TP)
Specificity = TN / total actual negatives (TN + FP)
False positive rate = FP / (TN + FP)
Precision = TP / (TP + FP)
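
The same formulas as a minimal Python sketch; the function name and the counts are made up for illustration:

```python
def confusion_stats(tp, fp, fn, tn):
    """Summary statistics from the four cells of a confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (tn + fp),  # = 1 - specificity
        "precision": tp / (tp + fp),            # positive predictive value
    }

# hypothetical counts
stats = confusion_stats(tp=40, fp=10, fn=5, tn=45)
```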
81
What is PLS-DA
supervised pattern recognition - partial least squares discriminant analysis - asks if groups are different and which features explain the difference
82
How do distance based clustering methods work
1- calculate a centroid for each group
2- calculate the distance from each point to the centroid of each group
3- assign the sample to the group with the closest centroid
83
Clustering - is it supervised or un - and describe it
unsupervised - data reduction technique - exactly what it sounds like - cluster your observations
84
2 types of clusterings - bottom up and top down explain
85
How to normalize for clustering
scale - standardize to a mean of 0 and sd of 1, or divide by the max
86
Common steps for clustering
normalize, screen for outliers, calculate distances
87
What is a dendrogram
tree diagram of a hierarchical clustering - kind of like a clade diagram
88
pros and cons of dendrogram
finds compact clusters, sensitive to outliers - need to remember that it is the interpretation that makes the clustering make sense
89
What are the different linkage types
single, complete, average , centroid
90
How to interpret dendrograms
height indicates the order in which clusters were joined, read from the bottom up - height reflects the distance between the clusters being joined
91
What is k means clustering
1) select k centroids
2) assign each data point to its closest centroid
3) recalculate the centroids as the average of all data points in a cluster
4) reassign each data point to its closest centroid
5) continue steps 3 and 4 until observations are no longer reassigned
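
A minimal Python sketch of the loop above. The starting centroids are passed in explicitly so the result is reproducible; the function name and the points are made up for illustration:

```python
def k_means(points, centroids, max_iter=100):
    """Plain k-means on tuples of floats; `centroids` are the k starting centroids."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # assign each point to its closest centroid (squared Euclidean distance)
        new_assignment = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_assignment == assignment:   # nothing reassigned: done
            break
        assignment = new_assignment
        # recalculate each centroid as the mean of its members
        for c in range(len(centroids)):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids

# hypothetical 2-D observations forming two obvious groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centres = k_means(points, centroids=[points[0], points[3]])
```

In practice the starting centroids are chosen at random and the algorithm is run several times, because a poor start can converge to a poor solution.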
92
Partitioning around medoids
K means is based on means, so it is susceptible to outliers. PAM is like k means but each cluster is represented by a medoid (its most central actual observation), not a mean
93
List the variable types
Continuous - numeric, can take any value in a range
Ordinal - categorical but can be ranked, e.g. grades
Nominal - categorical and can't be ranked
Counts - non-negative integers (come from counting, not ranking)
94
How do you test for a significant relationship between two nominal (categorical) variables
Chi squared - again interpret the p value: it is the probability of obtaining a result at least as extreme as the sampled one if the variables were independent, so p less than 0.05 is taken as evidence that they are not independent
95
Chi square limitations
should be used when observations greater than 50 and individual expected frequencies are no fewer than 5 - so BIG things
96
What is Fisher's exact test of independence for
nominal or categorical variables for small sample sizes
97
What is Cochran-Mantel-Haenszel
Tests whether 2 nominal variables are conditionally independent in each stratum of a 3rd variable
98
What is measure of association
for nominal variables - if you have a significant result from an independence test, you can test the strength of that relationship, e.g. can use after chi squared
99
What is a mosaic plot
visualizes data sets with 2 or more categorical variables - colors, shadings, tile sizes etc. are all used to show the counts and relationships
100
What are generalized linear models vs logistic regression
GLMs extend linear models to outcome variables whose dist isn't normal - the outcome can be categorical (binary, or different groupings or categories such as group A / group B) or a count that takes a limited number of values, such as traffic accidents - not often normally distributed. LOGISTIC REGRESSION is the GLM used when the response is BINARY
101
Overdispersion what is it
when the observed variance is larger than the model says it should be, leading to inaccurate significance testing - can test with the deviance in R - if residual deviance / residual DOF is close to 1, no overdispersion
102
What is a poisson regression
used where the response variable is the number of events that occur, or counts - y is the response and x is the predictor variable. Interpreting the results: it's on a log scale, so e.g. if x gets an estimate of 0.022, a 1 unit increase in x is associated with a 0.022 increase in the log mean of y (the mean of y is multiplied by exp(0.022), about 1.02)
103
What is PCA and what is it used for
Unsupervised multivariate method (multivariate = simultaneous observation and analysis of more than one outcome) - for high dimension data - used to identify patterns - every feature is used to calculate the principal components (so a dimension reducing approach to summarize large data)
104
What type of data can be analyzed with PCA
multidimensional data sets (usually 2 groups, 3 reps each - biological reps, technical reps, profile analysis etc.)
105
How is PCA done -on a base level
looks at the variability of a feature (variable) across samples - and does that for each variable. Plot all observations, then draw the line of best fit through them - the line that minimizes the distances from the points to the line, which is the same as maximizing the variance of the points projected along it - that is PC1. PC2 is perpendicular to PC1 but calculated the same way (best fit among the perpendicular directions) - keep going until you stop
106
How do the PC's in PCA compare
PC 1 is the most important and captures the most info
107
What is an issue with PCA
you give up some accuracy - since you are reducing the data down and using less of it (parsimony - want to explain the data with the fewest components)
108
How are PC's calculated
based on the variance of each feature (which places it on the plot) and its magnitude (i.e. how much it influences the PC) - e.g. if looking at genes, those with the greatest variability have the greatest impact on the PCs
109
What do you need to consider before doing PCA
Scaling (to make variables and the magnitude of their influence comparable) - e.g. can do log transformation, mean centering etc. Overall: TRANSFORM - CENTER - SCALE - so transform (typically log), center (subtract the mean from each value) and scale (divide by the stdev)
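
A minimal Python sketch of transform - center - scale for one feature. The function name and intensities are made up, and the log transform requires positive values:

```python
from math import log10
from statistics import mean, stdev

def transform_center_scale(values):
    """Log-transform, mean-center and scale one feature to unit stdev."""
    logged = [log10(v) for v in values]      # transform (values must be > 0)
    m = mean(logged)                         # center: subtract the mean
    s = stdev(logged)                        # scale: divide by the stdev
    return [(v - m) / s for v in logged]

# hypothetical intensities of one feature across five samples
scaled = transform_center_scale([120.0, 95.0, 210.0, 150.0, 80.0])
```

Applied to every feature, this stops high-intensity variables from dominating the PCs simply because of their units.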
110
What is a SCREE plot
Line graph showing the proportion of variability each PC accounts for - generally has an elbow shape, as the first one or two PCs generally explain most of the variance, with the first explaining the most (as a rule of thumb the first 3 should reach 80% or else it's not a great PCA - maybe do something else)
111
What is a PCA scores plot
scores calculated for each PC are plotted against each other (generally just PC1 and PC2) - kind of like a corrgram - so plot PC1 on the x and PC2 on the y, for example, to see how the observations group
112
What is a PCA LOADING plot
shows the features (variables) and demonstrates which ones most greatly influence the PC scores (so a plot for each PC) - the farther from the origin, the greater the influence
113
What is a PCA biplot
combination of scores plot and loading plot (essentially superimposed upon each other)
114
What are some outlier tests and what do they do
Dixon (Q) - single outlier - for small data sets
Grubbs
Iglewicz-Hoaglin - robust test for multiple outliers - 2 sided - z score
115
What makes a non parametric method non parametric vs robust
non parametric - makes no assumption about the distribution (typically works on ranks or the median); robust - based on the idea that the sample population is in fact NORMAL but has significant outliers
116
When to use non parametric
small data sets, different distributions, categorical data
117
What is the wilcoxon signed rank
non parametric alternative to the two sample paired t test
118
What is the mann whitney U
non parametric alternative to the two sample independent t test
119
What is kruskal wallis
non parametric ANOVA (one way)
120
What is Spearman rank correlation
non parametric alternative to Pearson correlation (correlation of the ranks)
121
Local Regression LOWESS and LOESS
non parametric regression analysis - fits simple models to local subsets of the data to build up a smooth curve
122
What is regression used for
describes a relationship - gives an equation
123
What is a residual in regression
signed difference between observed and fitted value
124
What is correlation coefficient
degree of linear association between x and y variables
125
What is second order polynomial regression
fits y = a + bx + cx^2 - 3 params (a, b and c), so DOF = n - 3
126
Scatter Plot matrix
Plot linear relationship between a whole bunch of variables
127
Bonferroni adjust p value what does it mean
adjusts the p values for the number of tests being done - multiply each p value by the number of tests (or equivalently divide alpha by it)
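
A minimal Python sketch of the adjustment; the function name and p values are made up for illustration:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# hypothetical raw p values from three tests
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With three tests, a raw p of 0.04 is no longer significant at the 0.05 level after adjustment.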
128
What is hat statistic
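hat values measure leverage - the average hat value is p/n (p = number of model parameters, n = sample size) - observations with hat values well above that (2 to 3 times) are high leverage points / potential outliers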
129
Covariance what is
tells you how 2 data sets change together in tandem
130
Correlation
Tells you how strongly a change in one variable is associated with a change in another (strength and direction of the linear relationship)
131
Covariance vs correlation
covariance is affected by a change in scale and keeps units; correlation is unitless and ranges from -1 to 1; both are 0 when the variables are independent; correlation describes the degree to which 2 variables move together
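
A minimal Python sketch showing the scale effect: multiplying x by 10 (a unit change) multiplies the covariance by 10 but leaves the correlation alone. The data are made up for illustration:

```python
from math import sqrt
from statistics import mean

def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
x_rescaled = [v * 10 for v in x]    # same data in a different unit
```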
132
Assumptions for correlation
Normal dist
133
What is a corrgram
shows the correlations of a bunch of variables against each other graphically - also the scatter plot matrix
134
correlation does it equal causation?
no - causation means it causes it directly
135
What is ANOVA used for
analysis of variance - partitions the variance associated with 2 or more groups to test whether the population means of the groups are all equal or not - data are grouped by a factor like dose
136
How do we calculate variance in anova (or what types are there)
there's within-group variance (e.g. within each analyst) and between-group variance
137
What are assumptions for ANOVA
independence of observations
normality of residuals
homoscedasticity
138
ANOVA null and alt hypotheses
null - all group means are the same; alt - at least one is different
139
What test do we get from ANOVA
we get F - compare F calc to F crit - if F calc is less than F crit, no significant difference
140
What is a post hoc test
follow-up test after a significant ANOVA - e.g. Tukey's HSD tells you which groups are different
141
What is a confounding factor
a variable that could also explain group differences in the dependent variable - we are not interested in it - it's a nuisance variable - want to remove its effect
142
What type of anova deals with confounding factors?
ANCOVA - add your nuisance as a covariate
143
What is ANOVA with MULTIPLE dependant variables
MANOVA - multivariate analysis of variance
144
what is MANCOVA
multivariate with covariate
145
What are ANCOVA assumptions
Linearity between covariate and outcome variable at each level of the independent variable (so the covariate does in fact affect the dependent variable, linearly, in every group)
Homogeneity of regression slopes - slopes of covariate against outcome variable should be the same across groups (so no interaction between the independent variable and the covariate - same effect across all independent variable levels)
Outcome variable normally distributed
Homoscedasticity
146
What is 2 factor ANOVA
ANOVA, but with subjects assigned to groups that are a cross classification of the levels of two independent variables - e.g. with TOEFL scores as the outcome we can start with one independent variable, educational level (3 groups), and then add another, learning style (4 groups)
147
2 way or factor ANOVA - assumptions
1) dependent variable continuous
2) both independent variables have >= 2 levels
3) independence of observations
4) dependent variable normally distributed for each combo of independent variables
5) homoscedasticity
6) balanced design
148
What are 2 way ANOVA hypotheses
no difference in means of factor A
no difference in means of factor B
no interaction between A and B
149
What is 2 way factorial anova?
grouped by 2 independent variables (factors) with cross classification between the two - e.g. can have medicine type but also dose (so 2 med types and then 3 dosages for each med type)
150
Interaction plot
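2 way ANOVA plot - group means of one factor plotted across the levels of the other - non-parallel lines suggest an interaction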
151
repeated measures vs replication
repeated measures is different measurements on the same subject; replication is measurements on separate, independent replicates
152
what is MANOVA
multivariate analysis of variance - means that we have 2 or more dependent variables analyzed together (can combine with others: MANCOVA, 2 way MANOVA etc.)
153
MANOVA assumptions
independent observations
no outliers for the outcomes
multivariate normality
no multicollinearity (dependent variables can't be too strongly related)
linearity between all outcome variables for each group
homogeneity of variances
154
What is a Mahalanobis plot
Q-Q plot of the Mahalanobis distances - if the data follow a multivariate normal dist, the data points should fall on the line
155
How do you post hoc Manova
univariate one way ANOVA for each outcome, or Tukey
156
what are the 5 descriptive stats:
Frequency / counts (mode)
central tendency / location (mean)
dispersion (stdev)
position (quartiles, median etc.)
shape of observations (skew and kurtosis)
157
Steps for sig testing
state null hypothesis
state alternate hypothesis
check dist
select test
choose sig level
calc stat
obtain crit value and compare
158
What is Cohen's d
measure of effect size (standardized difference between two means)
159
What is distribution
function that shows the possible values for a variable and how often they tend to occur
160
normal dist stdev
68% within 1 stdev, 95% within 2, then 99.7% within 3
161
what are z scores
(value - mean) divided by stdev (basically normalize to the dist)
162
How to get probability of value occurring
get the z value and look it up in a table - the table gives the area to the left
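
A minimal Python sketch that replaces the table lookup with the standard library's `statistics.NormalDist` (Python 3.8+). The helper names and example numbers are made up for illustration:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, stdev 1

def z_score(x, mean, sd):
    return (x - mean) / sd

def prob_below(x, mean, sd):
    """Area to the left of x under a normal dist (what the z table gives)."""
    return std_normal.cdf(z_score(x, mean, sd))

def within_k_sd(k):
    """Fraction of a normal dist within k stdevs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

p = prob_below(110, mean=100, sd=10)    # z = 1.0
```

`within_k_sd` reproduces the familiar 68 / 95 / 99.7 figures for k = 1, 2, 3.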
163
what is central limit theorem
the dist of sample means gets closer and closer to normal the bigger the sample size (whatever the shape of the underlying dist)
164
What is skew
asymmetry of the dist - tailing (long right tail, positive skew) and fronting (long left tail, negative skew)
165
kurtosis
is peakedness - the dist can be narrow (peaked) or flat
166
How to test for normality
graphically - histogram, QQ plot; stat tests - Anderson-Darling etc.