Stats Year 2 Flashcards

(140 cards)

1
Q

what are 4 claims of science

A
  • rationality: rational methods
  • truth:
  • objectivity: can be tested and verified
  • Reality
2
Q

what is science

A

understanding/acquiring more knowledge of world through observations and experiments

3
Q

what is scientific method?

A
  • a method that provides rational truths about science/the world
  • Inquire using the PEL method: question -> hypothesis set -> presuppositions (parameters) + evidence -> logic -> conclusion
4
Q

What is the PEL method

A

PEL = Presuppositions, Evidence, Logic. Inquiry runs: question -> hypothesis set -> presuppositions (parameters) + evidence -> logic -> conclusion

5
Q

what is deduction and induction logic?

A
  • deduction: use a given model to infer the expected data. E.g. every mammal has a heart, every horse is a mammal; hence every horse has a heart
  • induction: use observed data to infer a model that represents or describes the data. E.g. every observed horse has a heart; conclusion: every horse has a heart
6
Q

what is statistics

A
  • methods to study and measure nature of the world/universe.
  • Methods to PREDICT and ESTIMATE in given/measurable parameters
7
Q

what is parameter?

A

a quantity of interest, e.g. number of viruses, volume of water, etc.

8
Q

what does stats allow us to measure?

A
  • often we can't predict what we don't know, even if we are given some known facts
  • stats measures the uncertainty in an estimate of the real value of a population parameter
  • uncertainty = probability that the estimate we obtained from the data/sample truly reflects the actual value
  • probability of truth
9
Q

what is data

A

measurement of a variable in a sample or census

10
Q

what are the types of data/variables

A
  • categorical data (not numbers): nominal variable (qualitative + no order); ordinal variable (qualitative + order)
  • numerical data (numbers/quantitative): discrete variables, continuous variables
11
Q

what is a variable

A

measurement/characteristic of interest in a population/census

12
Q

what are the types of variables in experiment

A

explanatory (independent variable - on X axis), response (dependent variable - on Y axis)

13
Q

what is observational and experimental studies?

A
  • observational studies: cause and effect not yet defined. First observe and record the variables of interest, then measure and correlate/associate.
  • experimental studies: establish treatment and control groups, then measure and test a hypothesis of cause and effect. Uses random samples
14
Q

pros and cons of observational vs experimental studies

A
  • Observational
  • pro: reflects actual present event, measures many variables simultaneously
  • con: cannot establish causation
  • Experimental
  • pros: controls variables of interest to establish causation; limits other sources of variability by controlling other factors
  • cons: does not reflect actual present/natural setting
15
Q

what is statistical inference?

A

aim of stats is to use data from a subset/sample of population to infer truth (characteristics or parameters) about the population/census

16
Q

population vs sample

A
  • population: entire collection of units that we want to research one or more parameters about
  • sample: subset of units from the population, used to estimate a population parameter
17
Q

what is sampling error?

A
  • the deviation of an estimate from its truth in its population parameter
  • the fact that the estimate is different to the truth
  • estimates based on samples are rarely exactly equal to the true population values, because a sample does not capture every member of the population
  • precision is DIRECTLY RELATED to sampling error
  • Accuracy/bias is NOT related to sampling error, it is related to systematic error (error with sampling method)
18
Q

what is relationship of sampling error and standard error

A
  • sampling error is the deviation of an estimate from the truth, and it is related to standard error
  • The standard error measures the variability (or standard deviation) of a sample statistic from sample to sample and provides an estimate of the sampling error.
  • For example, the standard error of the mean estimates how much the sample mean is expected to vary from the true population mean due to sampling error.
19
Q

what is precision and accuracy

A
  • precision is related to sampling error and standard error, and is the SPREAD of estimates from sample to sample, DUE TO sampling error
  • accuracy: systematic error/BIAS. Something is wrong with the method, causing the estimate to not reflect the population. DUE TO BIAS
20
Q

how to reduce bias/increase accuracy?

A
  • random sampling
  • placebo
  • standardize methods
  • inaccuracy can also be due to chance with a small sample size (not necessarily bias, though results can seem biased); increasing sample size reduces this chance error BUT DOES NOT REDUCE BIAS
21
Q

how to increase precision/reduce standard error/sampling error

A
  • larger sample size
  • more sample trials
  • smaller deviation and less standard error
22
Q

what is a random sample vs a convenience sample? Why sample randomly?

A
  • random sample: INDEPENDENT selection of units, each unit has an equal chance of being selected
  • convenience sample: volunteers, or units chosen by opportunity
  • random sample REDUCES bias
23
Q

what is frequency distribution?

A
  • histogram: records actual data from sample
  • discrete + continuous
  • different to PDF as it doesn’t predict, but simply used to RECORD
24
Q

what is probability density function?

A
  • a maths model/function which estimates/predicts the probability of a random variable taking a certain range of values in the population
  • used for continuous random variables -> hence the probability of one specific data point occurring is 0
  • estimated data from population
  • distribution of probabilities which might occur
  • models e.g. uniform distribution, normal distribution, etc.
25
what are ways to measure central tendency and their related ways to measure spread?
  1. median: uses IQR, range
  2. mean: often used in normal distributions, where mean = median = mode. Uses variance, or standard deviation (remember: SD = square root of variance). Standard error: the standard deviation, but for sample means
  3. mode
26
what is variance
a measure of spread: how much the individuals of a population deviate from the mean (or another measure of central tendency, though normally the mean)
  • measures of spread for continuous data: standard deviation, variance, standard error
  • variance = (SD)^2
  • SD = square root of variance
  • split into sample variance and population variance, depending on which distribution is described
27
what is sample variance and what is population variance
  • sample variance: sum of squared deviations divided by n-1; using n-1 instead of n reduces bias, and the division scales the spread for sample size
  • sample variance measures the spread of a sample distribution; population variance measures the spread of a population distribution
  • larger variance/SD = more spread out the data is
28
equation for sample variance, and sample standard deviation?
  • the divisor is n-1 for a sample and N for a population (for both variance and SD)
  • SD is used instead of variance because the square root undoes the squaring in the variance, putting spread back in the units of the data so it is more comparable
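A minimal sketch of the two divisors, checked against Python's standard `statistics` module. The five data values are made up for illustration.

```python
import math
import statistics

# Hypothetical sample of five measurements (illustrative values only)
sample = [2.0, 4.0, 4.0, 5.0, 10.0]

n = len(sample)
mean = sum(sample) / n
sum_sq = sum((y - mean) ** 2 for y in sample)  # sum of squared deviations

sample_var = sum_sq / (n - 1)      # sample variance: divide by n - 1
pop_var = sum_sq / n               # population variance: divide by N
sample_sd = math.sqrt(sample_var)  # SD = square root of variance

# The standard library agrees with the hand calculation
assert math.isclose(sample_var, statistics.variance(sample))
assert math.isclose(pop_var, statistics.pvariance(sample))
assert math.isclose(sample_sd, statistics.stdev(sample))
```

Here the sum of squared deviations is 36, so the sample variance is 36 / 4 = 9 and the sample SD is 3, while the population variance is 36 / 5 = 7.2.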
29
properties of normal distribution?
  • 68.3% of values within 1 SD of mean
  • 95% of values within about 2 SD of mean (1.96 SD to be exact)
30
what is the coefficient of variation (CV) and what is it used for?
  • CV = SD / mean (often written as a percentage)
  • large value of CV = relatively high variation
  • CV scales SD by the mean, so variation can be compared on the same scale regardless of magnitude
31
how do you measure deviance around median?
IQR
32
When do we use median or mean to represent the typical (central) value?
  • mean: affected greatly by outliers, takes magnitude into account
  • median: not affected greatly by outliers, not affected by magnitude
  • median is better for skewed distributions (or when outliers are present)
  • mean is better for symmetrical distributions (or when no outliers are present)
  • symmetrical: mean = median = mode (so it doesn't really matter)
  • uniform distribution: no mode
33
what does repeated random sampling provide?
  • gives the sampling distribution of the sample mean, which is approximately a normal distribution
  • central limit theorem = when you sample again and again and build a distribution of the means of those samples, with each sample having a large sample size, the distribution of the sample means is a NORMAL DISTRIBUTION
  • the MEAN of that distribution (the sampling distribution of Y bar) is used to estimate the true μ (mean of the population)
34
what is a sampling distribution of Y bar?
The sampling distribution of Y bar refers to the distribution you get when you consider the means of many different samples taken from the same population. Each sample has its own average value, and if you plot the frequency of these averages, you get the sampling distribution of the sample mean.
35
What is Y bar, Y hat and u in sampling distribution/population
  • μ: true mean of the population
  • Y bar: the mean of a single sample (of that sample's frequency distribution)
  • Y hat: mean of the sampling distribution (the normal distribution of many sample means)
  • Y hat is an estimate of the true value μ; the mean of the sampling distribution equals μ
36
what is standard error?
the standard deviation of a sampling distribution. It reflects precision and sampling error, hence sample size affects the precision/spread/standard error of the mean derived from the sampling distribution
37
how do we calculate standard error for sampling distribution, and sample (frequency distribution)
same equation for both: SE = SD / sqrt(n)
38
how does sample size affect standard error
  • increase in sample size = increase in precision = decrease in spread = decrease in SE
  • small SE = more precise = large n
39
what is confidence interval?
a range of values which is likely to contain the population parameter (like the mean or proportion) with a certain level of confidence.
40
what is the 95% confidence interval and how to calculate?
  • calculated as: estimate +/- (critical value x standard error)
  • it does not mean there's a 95% probability that the specific confidence interval you calculated from your sample data contains the true population parameter. The true population parameter is either in the interval or not; the interval does not tell us about the likelihood of its location within the range
  • IT IS ABOUT METHOD RELIABILITY! If you took 100 samples and built a 95% confidence interval from each one, about 95 of those intervals would contain the true population parameter
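A minimal sketch of a 95% confidence interval for a mean, assuming the large-sample rule of thumb Y bar +/- 1.96 x SE with SE = s / sqrt(n). The function name and data are hypothetical; for a small sample the 1.96 should be replaced by the t critical value with n-1 degrees of freedom.

```python
import math
import statistics

def approx_ci95(sample):
    """Approximate 95% CI for the mean: Y-bar +/- 1.96 * SE (large-sample rule)."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # SE = s / sqrt(n)
    return y_bar - 1.96 * se, y_bar + 1.96 * se

# Hypothetical measurements (illustrative values only)
lower, upper = approx_ci95([2.0, 4.0, 4.0, 5.0, 10.0])
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

The interval is centred on the sample mean and gets narrower as n grows, because SE shrinks with sqrt(n).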
41
what is random trial
process/experiment where 2+ outcomes cannot be predicted with certainty. random sampling is a type of random trial
42
what is probability?
proportion of times an event occurs when a random trial is repeated under same conditions
43
what is probability distribution and probability density function
  • probability distribution: true relative frequency of all possible values of a discrete random variable. All mutually exclusive outcomes of a random trial
  • probability density function: true relative frequency of all possible values of a continuous random variable
44
what is mutually exclusive events?
* events that cannot occur at the same time
45
what is the general addition rule for probability, given they are mutually exclusive?
add the probabilities together if the events are joined by OR: P(A or B) = P(A) + P(B)
46
what is the general addition rule for probabilities (not mutually exclusive)
P(A or B) = P(A) + P(B) - P(A and B), where union = OR and intersect = AND
47
what is the general multiplication rule for probability (joint probability - 2 events occurring dependently)
P(A and B) = P(A) x P(B|A), where AND = intersect
48
what is the general multiplication rule if independent
P(A and B) = P(A) x P(B)
49
what is the law of total probability (or conditional probability)
2 equations can be used for different purposes: the law of total probability, P(A) = sum over all B of P(A|B) x P(B), or conditional probability, P(A|B) = P(A and B) / P(B). Draw a probability tree if not sure
50
complete this probability question: what is P(male offspring), given that 20% of hosts are parasitized, a parasitized host has a 90% probability of male offspring, and a non-parasitized host has a 5% probability of male offspring?
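A worked sketch of this question using the law of total probability, reading "20% parasitized" as P(host parasitized) = 0.20. The variable names are illustrative.

```python
import math

p_parasitized = 0.20             # P(host is parasitized)
p_male_given_parasitized = 0.90  # P(male | parasitized host)
p_male_given_not = 0.05          # P(male | host not parasitized)

# Law of total probability: add up both branches of the probability tree
p_male = (p_parasitized * p_male_given_parasitized
          + (1 - p_parasitized) * p_male_given_not)

print(round(p_male, 2))  # 0.22
```

The two branches contribute 0.20 x 0.90 = 0.18 and 0.80 x 0.05 = 0.04, giving P(male) = 0.22.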
51
what is the null hypothesis
H0 = testing a claim by taking the viewpoint that there is 'no change/difference' in the parameters being tested
52
what is the alternative hypothesis?
H1 = taking the meaningful viewpoint: something has changed i.e: X is lower, or X is different to what is believed
53
what is a one-sided or 2 sided test
  • 1 sided: is less than/greater than
  • 2 sided: different to
54
what is a type 1 or type 2 error?
  • type 1: rejecting null hypothesis when null hypothesis is true (false positive)
  • type 2: failing to reject null hypothesis when null hypothesis is false (false negative)
55
what are critical values?
  • critical values are the cut-offs of the null distribution set by the significance level
  • significance level (alpha) is usually 5%
  • alpha is the false positive (type 1 error) rate we accept: at 5% we reject a true H0 no more than 5% of the time
56
what is p value?
The p-value is the probability of observing results at least as extreme as those measured in your study, given that the null hypothesis is true.
  • p < 0.05 is strong evidence against the null hypothesis, so reject H0
  • p >= 0.05 is NOT enough evidence to reject H0
57
what are the reasons affecting statistical power
  1. The size of the effect we are measuring is small (decreases power)
  2. The sample size is too small (decreases power)
  3. The variability (σ) is large (decreases power)
58
what is statistical power?
the probability of a test correctly rejecting the null hypothesis when it is false.
  • power = 1 - beta, where beta is the probability of a false negative (type 2 error)
  • power is used to calculate the sample size required in experiments
  • beta is how often you are willing to have false negatives, e.g. if beta is 0.2, you accept false negatives 20% of the time (power = 0.8)
59
what is simpsons paradox?
the existence of data for which a statistical association holds for a population but is reversed in a subpopulation. It arises when there are hidden variables which influence correlations
60
what is parsimony and ockham's razor
among the theories that fit the data equally well, choose the simplest theory.
61
what is meant by correlation not causation
  • correlation might be due to hidden variables, which seemingly tie factors together even though neither factor causes the other
62
what are the requirements for a binomial test?
  • must fit the binomial distribution: discrete variables
  • only 2 outcomes: success or failure
  • the events and trials have to be independent of each other
  • the number of events/trials (n) is finite
  • probability of success and failure is constant (fixed)
63
what is the binomial equation/distribution formula
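A minimal sketch of the standard binomial formula, P(X successes) = C(n, X) x p^X x (1 - p)^(n - X), where C(n, X) is "n choose X". The function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example: probability of exactly 2 heads in 4 fair coin tosses
print(binomial_pmf(2, 4, 0.5))  # 0.375
```

Summing `binomial_pmf` over all outcomes at least as extreme as the observed count gives the p-value for a binomial test.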
64
what is the sample proportion (p-hat)
  • unbiased estimate of the population proportion
  • p-hat becomes more precise as sample size increases
65
what are 2 ways to do binomial test
  • use the confidence interval method: calculate the 95% confidence interval and then see if the proportion stated by H0 lies within it
  • use the binomial equation to calculate the p value
66
what determines the shape of a binomial distribution?
  • binomial distributions are not necessarily symmetrical, they can be skewed
  • shape depends on n and p
  • the larger n is and the closer p is to 0.5, the more the binomial distribution looks like a normal distribution
67
For this example, please calculate using BOTH binomial methods and give a conclusion.
68
how do you create confidence interval for binomial distribution using p-hat standard error
69
what is the agresti-coull for creating confidence intervals for proportions?
70
for this question, please use the Agresti-Coull method for creating confidence intervals for proportions. Sea turtle sex ratio is thought to be 50:50, however in the population there are 131 females out of a total pop of 169. Test if the actual sex ratio is skewed towards females?
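A hedged sketch of this question, assuming the commonly taught form of the Agresti-Coull interval: p' = (X + 2) / (n + 4), then p' +/- 1.96 x sqrt(p'(1 - p') / (n + 4)). The function name is hypothetical, and the course may present the formula slightly differently.

```python
import math

def agresti_coull_ci95(successes, n):
    """Approximate 95% CI for a proportion using the '+2 / +4' adjustment
    (assumed form of the Agresti-Coull method)."""
    p_adj = (successes + 2) / (n + 4)
    se = math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - 1.96 * se, p_adj + 1.96 * se

# Sea turtle data from the card: 131 females out of 169
lower, upper = agresti_coull_ci95(131, 169)
print(f"95% CI for proportion female: {lower:.3f} to {upper:.3f}")
```

The interval runs from roughly 0.71 to 0.83. The hypothesised proportion of 0.5 lies below the whole interval, which is consistent with a sex ratio skewed towards females.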
71
Complete this question using 2 binomial methods, including the Agresti-Coull (confidence interval) method. Does buttered toast have a higher chance of landing down
72
what method do we use if we don't have a binomial variable (i.e we want to see if there is a pattern in a week's rain)
  • can use chi squared test or poisson distribution (and then tested by chi squared)
  • we can build a proportional model: a probability model where the frequency of occurrence of events is proportional to the number of opportunities
73
what is a proportional model?
* simple probability model where freq of occurence of events are proportional to the number of opportunities
74
what is an example of a constructed proportional model (chi squared test) hypothesis test given these data?
  • H0: frequency of births on each day of the week IS PROPORTIONAL to the number of times each day of the week occurs
  • H1: frequency of births on each day of the week IS NOT PROPORTIONAL to the number of times each day of the week occurs
75
when to use goodness of fit/chi squared test?
  • categorical data
  • more than 2 categories (not just success or fail)
  • ONLY tests whether there is a difference between observed and expected values
  • it does not allow specific comparisons between particular categories
  • degrees of freedom = categories - 1 - number of parameters estimated from the data (often 0)
76
what is chi squared test equation?
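A minimal sketch of the standard chi-squared statistic, the sum over categories of (observed - expected)^2 / expected. The function name and counts are made up for illustration.

```python
def chi_squared(observed, expected):
    """Chi-squared test statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for 3 categories, with equal expected frequencies
observed = [10, 20, 30]
expected = [20, 20, 20]

stat = chi_squared(observed, expected)
df = len(observed) - 1 - 0  # categories - 1 - parameters estimated (0 here)
print(stat, df)  # 10.0 2
```

With df = 2 the critical value at the 0.05 level is 5.99, so a statistic of 10.0 would lead to rejecting H0 in this made-up example.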
77
For this data, please test if there is a difference between expected and observed frequencies (proportional model).
78
how to calculate degrees of freedom for chi squared test
df = categories - 1 - number of parameters (this is often 0)
79
how do we determine the hypothesis test result for chi squared test, given chi squared value and df?
  • use the sampling distribution under H0 (the null distribution), given sig level 0.05 and the df
  • look in the stats table and find the corresponding critical value
  • if the calculated chi squared test statistic is less than the critical value, fail to reject H0; if it is greater, reject H0
80
when to not use chi squared test?
  • when any of the categories has an expected frequency < 1
  • when more than 20% of the categories have an expected frequency < 5
81
how can we tell if a pattern occurs randomly or not?
  • use the poisson distribution and then test the fit using the chi squared test
82
what is the poisson distribution? What is the formula for it?
  • a probability distribution
  • describes the number of successes (events) in a certain block of time/space, where successes happen independently of each other and with equal probability
  • formula: P(X) = e^(-μ) x μ^X / X!, where μ is the mean number of events per interval
  • for discrete data
  • used to test whether events are randomly distributed in time/space
  • e.g. if a laundromat breaks down 3 times every month on average, what is the probability that it breaks down twice next month? (can use the poisson distribution to calculate)
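A minimal sketch of the laundromat example, using the standard Poisson formula P(X) = e^(-μ) x μ^X / X! with μ = 3 breakdowns per month. The function name `poisson_pmf` is hypothetical.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x events in an interval with mean mu events."""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Laundromat: 3 breakdowns per month on average; P(exactly 2 next month)
p_two = poisson_pmf(2, 3)
print(round(p_two, 3))  # 0.224
```

So there is about a 22% chance of exactly two breakdowns next month under this model.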
83
no need to mem: but what is the difference between poisson and binomial (explained better)
The binomial distribution describes a distribution of two possible outcomes, designated as successes and failures, from a given number of trials. The Poisson distribution focuses only on the number of discrete occurrences over some interval. A Poisson experiment does not have a given number of trials (n) as a binomial experiment does. For example, whereas a binomial experiment might be used to determine how many black cars are in a random sample of 50 cars, a Poisson experiment might focus on the number of cars randomly arriving at a car wash during a 20-minute interval.
84
characteristics of poisson distribution
  • It is a discrete distribution.
  • Each occurrence is independent of the other occurrences.
  • It describes discrete occurrences over an interval.
  • The occurrences in each interval can range from zero to infinity.
  • The mean number of occurrences must be constant throughout the experiment.
85
Give answer for: did mass extinction occur randomly through time?
  • H0: The number of extinctions per time interval has a Poisson distribution
  • H1: The number of extinctions per time interval does not have a Poisson distribution
  • calculate the mean (μ, i.e. lambda) first = 4.21 over all data points
  • for each category (number of extinctions), calculate its probability using the poisson distribution
  • multiply the probability for each category by the total number of time intervals (the sum of the observed frequencies) to get the Expected value for each category
  • then using the chi squared test, calculate the X2 value, which is 23.93 (test statistic)
  • df = no. categories - 1 - no. parameters = 8-1-1 = 6; the number of parameters is 1 here because one parameter (the mean number of extinctions) was estimated from the data
  • critical value for df = 6 and sig = 0.05 is 12.59
  • 23.93 > 12.59, hence reject H0 (test statistic is greater than critical value)
86
what is test statistic and critical value in hypothesis testing?
  • test statistic is the value you calculate from the data to compare with the critical value
  • critical value is the cut-off of the null distribution at the chosen significance level (usually 0.05)
87
what is contingency analysis?
* test for association between two or more categorical variables
88
what is the relative risk?
  • relative risk is the probability of an outcome in the treatment group divided by the probability of the same outcome in a control group
89
calculate relative risk given the data below
90
what method does this question use, please calculate: Does aspirin reduce the risk of cancer?
  • calculate relative risk RR
  • a value close to 1 means little or no difference in risk between the two groups
91
what is the odds ratio short cut?
92
what is odds ratio and 2x2 tables
93
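how do we use odds ratio to test hypothesis?
  • calculate the 95% confidence interval of the odds ratio
  • must calculate SE for ln[OR] first, build the interval on the log scale, then take e^ of each limit to find the actual CI
  • if the CI does not include 1, reject H0 (odds are the same in both groups)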
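A hedged sketch of these steps, assuming the standard formulas OR = (a/c) / (b/d) and SE[ln OR] = sqrt(1/a + 1/b + 1/c + 1/d). The cell labels (a, b, c, d), the function name, and the counts are all hypothetical.

```python
import math

def odds_ratio_ci95(a, b, c, d):
    """Odds ratio and approximate 95% CI from a 2x2 table.
    a = treatment with outcome,    b = control with outcome,
    c = treatment without outcome, d = control without outcome."""
    odds_ratio = (a / c) / (b / d)
    se_ln = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    ln_or = math.log(odds_ratio)
    # Build the interval on the log scale, then back-transform with e^
    lower = math.exp(ln_or - 1.96 * se_ln)
    upper = math.exp(ln_or + 1.96 * se_ln)
    return odds_ratio, lower, upper

# Made-up counts: 15 of 100 treated and 30 of 100 controls had the outcome
or_hat, lower, upper = odds_ratio_ci95(15, 30, 85, 70)
print(f"OR = {or_hat:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In this made-up table the whole interval lies below 1, so H0 (equal odds) would be rejected.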
94
For this example please give exact working steps of a hypothesis test: using x2 contingency test (multiple variables).
95
how to calculate df for x2 contingency test?
df = (row-1)(column-1)
96
what is the equation for x2 contingency test?
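A minimal sketch of the contingency calculation, assuming the standard approach: expected count for each cell = row total x column total / grand total, then the sum of (observed - expected)^2 / expected over all cells. The function names and the table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi2_contingency_stat(table):
    """Chi-squared statistic and df = (rows - 1)(columns - 1)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Made-up 2x2 table of observed counts
stat, df = chi2_contingency_stat([[10, 20], [20, 10]])
print(round(stat, 2), df)  # 6.67 1
```

Every expected count here is 15, so each cell contributes 25 / 15 to the statistic.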
97
assumptions of the x2 contingency test
  1. No more than 20% of the cells can have an expected frequency < 5
  2. No cell can have an expected frequency < 1
98
what happens when conditions of x2 contingency test not met?
  • combine categories to increase frequencies (only if the combined categories are still meaningful)
  • use fisher's exact test: only for 2x2 tables where the x2 contingency test cannot be used
99
complete this example using fisher's exact test
100
when to use fisher's exact test?
  • for a 2x2 table (two categorical variables) when the expected frequencies of the cells are too low to use the x2 contingency test
  • like the contingency test, it tests for association between the variables
101
how to determine hypothesis test outcome? Test stats vs critical values, and significant level vs p-values
  • if test statistic > critical value, then reject H0
  • if p value is less than the significance level, then reject H0
102
difference between odds ratio, relative risk and x2 contingency tests
  • odds ratio and relative risk: used where there is a treatment group and a control group. For clinical trials and 2x2 tables
  • contingency test: for tables larger than 2x2; need to calculate expected values. Not necessarily a control vs treatment situation. Tests independence, i.e. whether an association exists
103
what is the equation for fisher's exact test?
* fisher's test provides p value NOT test stats
104
what is a probability density function?
a probability density is the true relative frequency of all possible values of a continuous random variable
105
for a normal distribution, how to calculate area under the curve?
106
equation for standard (z) normal distribution
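A minimal sketch, assuming the standard definition Z = (Y - μ) / σ and using `NormalDist` from Python's standard `statistics` module to get areas under the curve. The example values (μ = 100, σ = 15, Y = 130) are made up.

```python
from statistics import NormalDist

def z_score(y, mu, sigma):
    """Standard normal deviate: how many SDs y lies from the mean."""
    return (y - mu) / sigma

std_normal = NormalDist()  # standard normal: mean 0, SD 1

z = z_score(130, 100, 15)            # 2.0
area_above = 1 - std_normal.cdf(z)   # area under the curve above Z = 2
within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)

print(z, round(area_above, 4), round(within_1sd, 3))
```

The area within 1 SD of the mean comes out at about 0.683, matching the 68.3% rule, and the area above Z = 2 is about 0.023.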
107
what is the central limit theorem
the mean of a large number of measurements randomly sampled from a non-normal distribution is approximately normally distributed
108
how to calculate the student's t?
  • calculate the standard error from the sample: SE = SD / sqrt(n)
  • then calculate student's t: t = (sample mean - population mean) / standard error
  • degrees of freedom is n-1; n-1 because we estimated a parameter to calculate t
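The steps above can be sketched as follows, assuming SE = s / sqrt(n). The function name, the data, and the hypothesised mean are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Student's t for a one-sample test against hypothesised mean mu0.
    Returns (t, degrees of freedom)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # SE = s / sqrt(n)
    t = (statistics.mean(sample) - mu0) / se
    return t, n - 1

# Made-up sample tested against a hypothesised population mean of 2
t, df = one_sample_t([2.0, 4.0, 4.0, 5.0, 10.0], mu0=2.0)
print(round(t, 3), df)  # 2.236 4
```

The resulting t is then compared with the critical value from the t table for df = n - 1.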
109
student's t test significance level?
  • the test can be 2-tailed or 1-tailed
  • hence important to note whether it is 1-tailed (0.05 in one tail) or 2-tailed (0.025 on each side)
110
how to calculate 95% CI for population mean?
sample mean (Y bar) +/- (critical value x standard error)
111
what is the equation for a one sample t-test, and hypothesis test, assumptions?
Assumptions
  1. Data are randomly sampled from the population
  2. The variable is normally distributed in the population
112
For this data set, calculate the student's one sample t-test (using both CI method, and t-test method)
113
what happens once sample number increases?
* sample size increases, precision increases
114
what do we do when we want to compare measurements from 2 groups to test a hypothesis
* we can use either paired samples t-test, or independent samples t-test
115
what is paired samples t test and independent samples t test?
  • paired: tests whether the mean change (difference) within pairs is 0 (H0: mean change = 0)
  • independent: tests whether the means of 2 separate groups differ (H0: mean 1 = mean 2)
116
assumptions for paired-samples t-test and independent samples t-test?
Assumptions for paired samples t-test
  • sampling units are randomly sampled from the population
  • paired differences have a normal distribution in the population
Assumptions for an independent samples t-test
  • each of the 2 samples is a random sample from its population
  • the numerical variable (response, dependent) is normally distributed in each population
  • the SD and variance are the same in both populations
117
calculate this question with paired samples t-test? use both methods
118
for this question, use independent samples t-test, use both methods
119
T-test equation, df, SE, and 95% CI for independent samples t-test,
120
How to test for equality in the variances of independent samples in t-tests? The 2 samples need to have the same SD and variance in the assumptions, how do you test for that?
Use the F test:
121
what is the F test?
  • determines whether 2 variances are equal
  • H0: variance 1 = variance 2
  • H1: variance 1 does not equal variance 2
  • F = larger variance / smaller variance
  • 2 tailed test
  • df1 = n1-1, df2 = n2-1
  • check critical value
  • reject H0 if F > critical value
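A minimal sketch of the F ratio described above, with the larger sample variance on top. The function name and data are hypothetical, and the critical value still has to be looked up in an F table.

```python
import statistics

def f_ratio(group1, group2):
    """F = larger sample variance / smaller sample variance.
    Returns (F, numerator df, denominator df)."""
    v1 = statistics.variance(group1)
    v2 = statistics.variance(group2)
    if v1 >= v2:
        return v1 / v2, len(group1) - 1, len(group2) - 1
    return v2 / v1, len(group2) - 1, len(group1) - 1

# Made-up samples: variances are 9 and 2.5
f, df_num, df_den = f_ratio([2.0, 4.0, 4.0, 5.0, 10.0],
                            [1.0, 2.0, 3.0, 4.0, 5.0])
print(round(f, 2), df_num, df_den)  # 3.6 4 4
```

Putting the larger variance on top means F is always at least 1, so only the upper tail of the F table is needed.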
122
what is alternative to F test?
Levene's test for homogeneity of variances, but requires stats algorithm, more robust than F test
123
What to do when variances/SDs of a independent samples T test are not the same?
  • use Welch's approximate t-test
  • Welch's uses the same t statistic, (y1 - y2) / SE
  • difference: the SE equation is different; the df equation is different
124
How to do a welch's approximate T test?
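A hedged sketch, assuming the standard Welch formulas: SE = sqrt(s1^2/n1 + s2^2/n2), t = (Y bar 1 - Y bar 2) / SE, and the Welch-Satterthwaite approximation for df. The function name and data are hypothetical.

```python
import math
import statistics

def welch_t(group1, group2):
    """Welch's approximate t statistic and its approximate df."""
    n1, n2 = len(group1), len(group2)
    a = statistics.variance(group1) / n1  # s1^2 / n1
    b = statistics.variance(group2) / n2  # s2^2 / n2
    se = math.sqrt(a + b)
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    # Welch-Satterthwaite df (usually rounded down before using a t table)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Made-up samples with unequal variances (9 and 2.5)
t, df = welch_t([2.0, 4.0, 4.0, 5.0, 10.0], [1.0, 2.0, 3.0, 4.0, 5.0])
print(round(t, 3), round(df, 2))
```

Unlike the pooled independent samples t-test, no common variance is calculated, so the two groups are allowed to have different SDs.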
125
6 assumptions of normal distribution for statistical inference?
  1. data are randomly sampled
  2. samples are independent
  3. differences between observed and predicted values are normally distributed
  4. mean and variances of errors are independent of explanatory variables
  5. one source of unmeasured random variance
  6. variances among groups are equal (or can be adjusted using other tests)
126
what are 4 ways to determine whether data fits assumptions?
  1. Basic questions: source of data? biased? is it independent?
  2. graph the data: does it deviate from assumption?
  3. quantity vs quality: lots of variability? sample size?
  4. alternatives to normal distribution: alternative stats approaches or other distributions/data transformations
127
what are 4 ways to deal with data that doesnt meet assumptions?
  1. Ignore violations of assumptions if sample size is large, due to central limit theorem
  2. Transform data: use mathematical transformation methods to alter the distribution, e.g. natural log, arcsine, square root
  3. Use a non parametric method: methods to calculate probability that do not require the response variable to be normally distributed. They have less stats power than methods based on the normal distribution
  4. Use a permutation test (a computer-intensive resampling method): use a computer algorithm to repeatedly and randomly rearrange your sample to produce a null distribution
128
when can we ignore violations of assumptions due to central limit theorem (method 1)
  1. large sample size - then yes, can be ignored
  2. when all the samples are skewed in the same direction (not one sample distribution skewed to the left and the other skewed to the right) - then yes, can be ignored
  3. when a method exists to adjust for the differences in variance/SD, e.g. welch's t test - then yes, can be ignored
  4. if none of the above are met, then must transform
129
what are the ways and equations to transform data towards a normal distribution?
130
How to test if your data fits well to the normal distribution?
  • Shapiro-Wilk test: tests the goodness of fit of your data to a normal distribution
  • DOES NOT: tell you whether your data is/is not normally distributed
  • DOES: determine deviations from the normal distribution, tell you whether your inference from the normal distribution is flawed
131
How to do a log transformation of data, and when to use?
  1. when data are ratios or products of variables (e.g. in odds ratio)
  2. frequency distribution is skewed to the right
  3. the group with the larger mean also has the larger SD
  4. data span several orders of magnitude
  • note: if any data value is 0 the log won't work, so add 1 to the data first and then do the transformation
132
what are some non-parametric methods?
  • sign test for paired samples (makes data binomial)
  • Mann Whitney U test: comparisons of 2 groups
133
how to conduct a sign test for paired samples?
  • calculate the difference within each pair
  • assign a + or - depending on whether the difference is greater or smaller than 0
  • make null hypothesis H0: number above 0 = number below 0
  • use the binomial test to calculate the P value and test it
134
how to conduct a comparisons of 2 groups: using Mann Whitney U test
  • put both groups of data into 1 column, and rank the values from smallest to largest, assigning ranks starting from 1
  • for each group: add up the sum of the ranks
  • calculate test statistics U1 and U2. The larger one is used as the test statistic
  • use the U distribution to find the critical value at 0.05
  • the critical value depends on the sample sizes of the two groups
  • if test statistic > critical value, reject H0
135
what are 4 assumptions/limitations of Mann whitney U test?
  1. assumes randomly sampled data
  2. tests whether the data have different distributions (not a robust test for a difference in mean)
  3. MWU test can be used to test a difference in mean/median only if both groups have the same shape of distribution
  4. MWU has low stats power because it throws out magnitude and keeps only rank order. CREATES MANY TYPE II errors
136
what is a permutation test?
  • a computer-intensive resampling method (in the same family as bootstrapping)
  • a permutation test generates a null distribution for the association between 2 variables by randomly and repeatedly rearranging the values of one of the variables in the data
137
what are the steps for the permutation method?
  1. create a permuted set of data where the values of the response variable are randomly reordered
  2. calculate the measure of association for the permuted sample (the difference between means, medians, etc.)
  3. repeat the permutation process 1000 times to create a null distribution
  4. in the null distribution you have created, identify the location of your actual data, and compare with the critical value to see if it is actually significant
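The steps above can be sketched as follows for a difference between two group means. The function name, the data, and the fixed random seed are assumptions for illustration. Instead of looking up a critical value, this sketch reports the p value directly as the proportion of permuted differences at least as extreme as the observed one.

```python
import random
import statistics

def permutation_test(group1, group2, n_perm=1000, seed=0):
    """Two-sided permutation test for a difference in means.
    Returns (observed difference, approximate p value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = statistics.mean(group1) - statistics.mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # step 1: randomly reorder the values
        diff = (statistics.mean(pooled[:n1])
                - statistics.mean(pooled[n1:]))  # step 2: measure of association
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm  # steps 3-4: locate data in null distribution

# Made-up groups with a clear difference in means
obs, p = permutation_test([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6])
print(obs, p)
```

Shuffling the pooled values breaks any real link between group label and response, so the permuted differences show what would be expected if H0 were true.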
138
which test should I use?
139
quick summary of all tests
check chapt 13 stats sheet
140