Teaching block 3 - Statistics - (weeks 7,8&9) Flashcards

(53 cards)

1
Q

what are the 2 broadest categories of data

A

categorical and quantitative

2
Q

what are the 3 types of categorical data

A

binomial (binary) - two outcomes, e.g. presence / absence
nominal - unordered groups, e.g. colour (yellow)
ordinal - ordered groups, e.g. low, medium and high

3
Q

what are the 2 types of quantitative data

A

continuous and discrete
continuous data can also be interval or ratio data.

4
Q

what are the 3 main data types in EEA

A

continuous, discrete and nominal

5
Q

what practices should be avoided when carrying out statistical tests

A

cherry picking (selecting best data)
p-hacking (repeatedly re-analysing the data until a significant result appears)
HARKing (hypothesising after the results are known)
fishing experiments (testing between random variables to find an effect)

6
Q

what are good statistical practices

A

consider analysis and stats test before data collection
make sure the data are high quality, not just high quantity
document all data prep and analysis
make sure data meets assumptions of the stats test

7
Q

what is hypothesis testing

A

it tells us how different is different enough when looking at two supposedly differing groups.
it gives a formal basis for rejecting (or failing to reject) the null hypothesis

8
Q

probability distribution facts

A

area under curve = 1
they describe the probability of obtaining a value within a given range
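
Both facts can be checked numerically. The sketch below is only an illustration: it assumes a standard normal distribution (the card does not name a particular distribution) and approximates areas with the trapezium rule. The function names are made up for this example.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x (assumed example distribution)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def area_under_curve(pdf, lower, upper, steps=20000):
    """Approximate the area under pdf between lower and upper (trapezium rule)."""
    width = (upper - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(upper))
    for i in range(1, steps):
        total += pdf(lower + i * width)
    return total * width

# Area over (practically) the whole curve: should be very close to 1.
total_area = area_under_curve(normal_pdf, -10.0, 10.0)

# Probability of a value falling within the range -1.96 to 1.96.
central_area = area_under_curve(normal_pdf, -1.96, 1.96)

print(f"total area = {total_area:.4f}")
print(f"area between -1.96 and 1.96 = {central_area:.4f}")
```

The second area is the familiar "about 95% of values" range of a normal distribution, which is where a 0.05 significance level comes from.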

9
Q

what is a p value

A

the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true
choose a significance level (e.g. 0.05) before testing. this is the accepted probability of a false positive (type I error)
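
For a simple case a p-value can be worked out exactly. This is a minimal sketch, assuming a made-up experiment of 20 independent trials with a 50:50 chance under the null (for example 16 of 20 visits going to one of two equally common flower colours). The counts and function name are hypothetical, not from the course.

```python
from math import comb

def binomial_two_sided_p(successes, trials):
    """Two-sided exact p-value under a null of a 50:50 chance per trial."""
    expected = trials / 2
    observed_distance = abs(successes - expected)
    total_outcomes = 2 ** trials
    # Count every outcome at least as far from the null expectation
    # as the observed one (in either direction).
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme / total_outcomes

ALPHA = 0.05  # significance level chosen before looking at the data

p_value = binomial_two_sided_p(successes=16, trials=20)  # hypothetical counts
reject_null = p_value < ALPHA

print(f"p = {p_value:.4f}, reject null: {reject_null}")
```

A result sitting exactly on the null expectation (10 of 20) gives p = 1 with this function, because every possible outcome is at least that extreme.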

10
Q

when should a chi-squared (X^2) test be used

A

when both the explanatory and response variables are categorical

11
Q

give an example of when a X^2 test should be used

A

do hummingbirds have a preference for flower color?
when setting this experiment up make sure the numbers of red and yellow flowers are in a 1:1 ratio

12
Q

how should a X^2 test be carried out

A
  1. calculate the expected frequencies under the null hypothesis
  2. quantify the difference between the expected and observed frequencies using the X^2 equation
  3. calculate the degrees of freedom
  4. find the p-value for the X^2 value using the degrees of freedom
  5. check whether the X^2 value is past the critical value on the probability distribution
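
The steps above can be sketched in code. This is an illustration under stated assumptions: the visit counts are invented, the null is a 1:1 preference, and the p-value shortcut used here only holds for 1 degree of freedom (a general version would need a statistics library).

```python
import math

def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of hummingbird visits: [red, yellow].
observed = [62, 38]

# Step 1: expected frequencies under the null (flowers offered 1:1).
null_proportions = [0.5, 0.5]
total = sum(observed)
expected = [p * total for p in null_proportions]

# Step 2: the X^2 value.
x2 = chi_squared_statistic(observed, expected)

# Step 3: degrees of freedom = number of classes - 1.
df = len(observed) - 1

# Step 4: p-value. This closed form is valid for df = 1 only.
p_value = math.erfc(math.sqrt(x2 / 2.0))

# Step 5: compare with the critical value for df = 1 at the 0.05 level.
CRITICAL_VALUE_DF1 = 3.841
significant = x2 > CRITICAL_VALUE_DF1

print(f"X^2 = {x2:.2f}, df = {df}, p = {p_value:.4f}, significant: {significant}")
```

Counts are used throughout rather than proportions, because the size of X^2 depends on the sample size.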
13
Q

what is the chi-squared equation

A

X^2 = sum of [ (observed - expected)^2 / expected ], summed over all classes

14
Q

what is a degree of freedom?

A

the number of data points that are free to change independently without changing the statistical parameters of the sample

15
Q

why are degrees of freedom important

A

the shape of the probability distribution of the test statistic changes significantly with the degrees of freedom in the experiment, so the critical value and p-value depend on them

16
Q

what are the limitations of the chi-squared test

A
  • the expected frequency in each class should be at least 5; if not, classes should be combined
17
Q

chi-squared checklist

A
  • always calculate observed and expected VALUES not proportions
  • make sure null is biologically sound
  • use null as basis to calculate the expected values
  • always quote degrees of freedom
18
Q

when should a linear model be used

A

when the response variable is continuous

19
Q

when should an ANOVA model be used

A

continuous response variable and categorical explanatory variables

20
Q

what does an ANOVA model analyse

A

analyses the ratio of among-group variation to within-group variation (the F ratio)

21
Q

what would a significant result look like from an ANOVA test

A

the among-group variability is much greater than the within-group variability. this means there are significant differences between groups but not a lot of variation within the groups.

22
Q

what is mu (μ)

A

the global mean (across all groups)

23
Q

what does it mean if each group mean is very close to mu (μ)

A

there is little difference between the groups, so the grouping variable explains little of the variation

24
Q

what is a residual regarding ANOVA

A

the difference between each data point and its within-group mean

25
Q

what are the key assumptions of ANOVA

A

all observations are independent
residuals are normally distributed
variance of residuals is equal between groups

26
Q

what is heteroscedasticity

A

unequal variance: the spread of the residuals is not constant across groups (or across the range of the explanatory variable)

27
Q

what can be done if the data does not meet these assumptions

A

transform the data
do a non-parametric test

28
Q

what is a non-parametric test and which one should be used in place of ANOVA

A

non-parametric tests have more relaxed assumptions and can be used when the data and residuals do not meet the requirements of ANOVA. the Kruskal-Wallis test should be used here. it ranks all data values and tests whether the ranks differ between groups

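
The ranking idea can be sketched as follows. This is a simplified illustration with invented measurements: it computes the Kruskal-Wallis H statistic without the usual correction for tied values, and compares it with the chi-squared critical value for 2 degrees of freedom, which is only an approximation for samples this small.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic from the rank sums of each group (no tie correction)."""
    all_values = [v for group in groups for v in group]
    ranks = average_ranks(all_values)
    n_total = len(all_values)
    rank_term = 0.0
    index = 0
    for group in groups:
        group_ranks = ranks[index:index + len(group)]
        index += len(group)
        rank_term += sum(group_ranks) ** 2 / len(group)
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)

# Hypothetical measurements for three groups.
groups = [[2.1, 3.4, 2.9], [4.8, 5.1, 6.0], [7.7, 8.2, 9.5]]
h_value = kruskal_wallis_h(groups)
df = len(groups) - 1

CRITICAL_VALUE_DF2 = 5.991  # chi-squared critical value, df = 2, alpha = 0.05
significant = h_value > CRITICAL_VALUE_DF2

print(f"H = {h_value:.2f}, df = {df}, significant: {significant}")
```

Only the order of the values matters here, not their sizes, which is why the test tolerates non-normal data.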
29
Q

if an ANOVA test is significant, what test can be used to see which groups differ

A

Tukey's test. it is a sequence of pairwise comparisons (like t-tests) in which the p-value is adjusted to account for multiple comparisons

30
Q

what is a disadvantage of using a non-parametric test like the Kruskal-Wallis test instead of ANOVA

A

increased risk of false negative results (type II errors)

31
Q

how does ANOVA classify variation

A

it classifies the data into discrete categories and tests whether the differences between individuals falling into different categories are significant

32
Q

how are the degrees of freedom calculated with ANOVA

A
  1. among-group degrees of freedom = number of categories - 1
  2. within-group degrees of freedom = number of independent data points - number of categories

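
The ANOVA pieces covered so far (global mean, group means, within-group residuals and the two degrees of freedom) combine into the F ratio. A minimal sketch with invented data for three groups; the function and variable names are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom and F ratio."""
    all_values = [v for group in groups for v in group]
    global_mean = mean(all_values)

    # Among-group variation: how far each group mean is from the global mean.
    ss_among = sum(len(g) * (mean(g) - global_mean) ** 2 for g in groups)

    # Within-group variation: squared residuals from each group's own mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_among = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_ratio = (ss_among / df_among) / (ss_within / df_within)
    return {
        "ss_among": ss_among,
        "ss_within": ss_within,
        "df_among": df_among,
        "df_within": df_within,
        "f_ratio": f_ratio,
    }

# Hypothetical data: three groups of three measurements.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
result = one_way_anova(groups)
print(result)
```

A large F ratio means the group means are spread out relative to the scatter inside each group. The p-value then comes from the F distribution with these two degrees of freedom, which would need a statistics library.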
33
Q

when should a linear regression be used

A

when the explanatory and response variables are both continuous

34
Q

what does a linear regression do

A

tests whether the explanatory variable has a significant effect on the response variable
quantifies the relationship between the explanatory and response variables
allows values to be predicted (extrapolation)

35
Q

what do residuals show in a linear regression

A

the line doesn't perfectly match up with the data points. the residuals are the differences between the raw data and the regression line
residuals are calculated in the y direction only

36
Q

what is the equation of a linear regression line

A

y = b0 + b1x + E
(b0 = intercept, b1 = slope, E = residual error)

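
The intercept and slope are usually estimated by least squares, which minimises the squared residuals in the y direction. A minimal sketch with invented data; the variable names are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept and slope."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)

# Residuals: observed y minus the y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

With a least-squares fit the residuals always sum to (approximately) zero, so the residuals are judged by their spread and pattern rather than their total.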
37
Q

how can significance be calculated when using a linear regression

A

the slope of the line (b1) has an associated p-value, which tests the null hypothesis that the slope is zero

38
Q

which non-parametric test should be used if the data does not meet the assumptions of a linear regression

A

Spearman correlation - describes monotonic relationships

39
Q

what is a monotonic relationship

A

whenever X increases, Y consistently increases too (or consistently decreases), though not necessarily in a straight line

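
A relationship in which Y always rises as X rises does not have to be a straight line. Spearman correlation is the ordinary (Pearson) correlation calculated on the ranks of the data, so it picks up such relationships fully. A small sketch with invented data, assuming there are no tied values:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks from 1 upwards. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(simple_ranks(xs), simple_ranks(ys))

# Hypothetical curved relationship: y rises with x, but not in a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

pearson = pearson_r(xs, ys)
spearman = spearman_rho(xs, ys)

print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```

Spearman is at its maximum here because the ordering of Y matches the ordering of X exactly, while Pearson is lower because the points do not lie on a straight line.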
40
Q

what are the assumptions of a linear model

A

residuals are normally distributed
each data point is independent
the relationship between the variables is linear
no measurement error in the explanatory variable
the size of the error in the response variable does not depend on the explanatory variable (equal variance)

41
Q

what did Anscombe stress the importance of

A

plotting raw data

42
Q

why is plotting raw data so important

A
  1. checks that the data suit a linear model (look for non-normal distributions or non-linear relationships)
  2. reveals data points with high leverage
  3. helps you understand the nature of the data
  4. reveals heteroscedasticity - the variance of the error is not constant

43
Q

what are the limitations of ANOVA and linear regression

A

can only have one explanatory variable
strict assumptions to follow
response variable must be continuous

44
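Q

how can an outlier have the highest leverage

A

if its x value is far from the mean of the other x values, so that it sits away from the main cloud of points (e.g. at roughly 90° to the trend of the other data points) and can pull the regression line towards itself
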
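
In simple linear regression, leverage depends only on how far a point's x value is from the mean of x. How far the point then sits from the trend of the other points decides how much that leverage actually moves the line. A minimal sketch using the standard leverage formula for one explanatory variable, with invented x values:

```python
def leverages(x_values):
    """Leverage of each point in a simple linear regression:
    1/n + (x - mean of x)^2 / sum of squared deviations of x."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    sxx = sum((x - x_mean) ** 2 for x in x_values)
    return [1.0 / n + (x - x_mean) ** 2 / sxx for x in x_values]

# Hypothetical x values: five clustered points and one far to the right.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
point_leverages = leverages(x_values)

# Index of the point with the highest leverage.
highest = max(range(len(x_values)), key=lambda i: point_leverages[i])

for x, h in zip(x_values, point_leverages):
    print(f"x = {x:5.1f}  leverage = {h:.3f}")
print("highest leverage at x =", x_values[highest])
```

The leverages of a straight-line fit always add up to 2 (one for the intercept, one for the slope), so a single point holding most of that total dominates the fit.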
45
Q

how can a diagnostic plot be coded in R

A

plot(tree_model)
(tree_model is the fitted model object)

46
Q

what are diagnostic plots

A

a series of plots showing the residuals of a model. they can be used to determine whether the data fit the linear regression assumptions

47
Q

what do diagnostic plots check for

A

are residuals normally distributed?
do residuals have equal variance?
are there any outliers with very high leverage?

48
Q

what does the R^2 value show

A

the proportion of variance explained by the model
high R^2 = model explains most of the variance

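
R^2 can be calculated as 1 minus the ratio of the residual variation to the total variation in the response. A short sketch with invented data, fitting the line by least squares first; all names are illustrative.

```python
def r_squared(xs, ys):
    """Proportion of the variance in ys explained by a least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Variation left over after fitting the line (squared residuals).
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of the response around its mean.
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_residual / ss_total

# Hypothetical measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r2 = r_squared(xs, ys)
print(f"R^2 = {r2:.2f}")
```

An R^2 of 1 would mean every point lies exactly on the line, and 0 would mean the line explains no more than the mean of y does.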
49
Q

when should a multiple regression model be used

A

when there are 2 or more continuous explanatory variables and the response variable is also continuous

50
Q

what should be checked for using a correlation matrix when building multiple regression models

A

multicollinearity - this is when there is a near-linear relationship between explanatory variables

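
A correlation matrix is the pairwise correlation between every pair of explanatory variables. The sketch below uses invented variables and flags strongly correlated pairs. The 0.8 cut-off is an assumption chosen for illustration, not a fixed rule.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical explanatory variables (names and values invented).
explanatory = {
    "height": [10.0, 12.0, 15.0, 18.0, 20.0, 25.0],
    "diameter": [2.1, 2.4, 3.1, 3.5, 4.1, 5.0],
    "rainfall": [30.0, 12.0, 25.0, 8.0, 28.0, 15.0],
}
names = list(explanatory)

# Correlation matrix as a nested dictionary: matrix[a][b] = r between a and b.
matrix = {
    a: {b: pearson_r(explanatory[a], explanatory[b]) for b in names}
    for a in names
}

THRESHOLD = 0.8  # assumed cut-off for "near-linear"
flagged = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(matrix[a][b]) > THRESHOLD
]

for a in names:
    print(a.ljust(9), "  ".join(f"{matrix[a][b]:6.2f}" for b in names))
print("possible multicollinearity:", flagged)
```

A flagged pair suggests that one of the two variables adds little independent information, so dropping or combining one of them is worth considering before fitting the model.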
51
Q

what issues does multicollinearity cause

A

inflates the standard errors of the coefficients
unreliable p-values
poor predictive accuracy of the model
makes the estimates very imprecise (although still unbiased)

52
Q

when should an ANCOVA model be used and what does it analyse

A

models with a continuous response variable and a mix of categorical and continuous explanatory variables
it analyses covariance

53
Q

give an example of when ANCOVA should be used

A

relationship between humidity and flower length for 2 colors. ANCOVA should be used here because a categorical variable (color) may be driving some of the variance in the response variable alongside the continuous explanatory variable (humidity)