5. Aug 29th Flashcards

1
Q

Today we’re talking about assumptions

A

Anytime you analyze data with statistics, you make some sort of assumptions (or the technique you use intrinsically makes assumptions).

  • EX: Assume that your data is representative of a larger population
  • – If it’s NOT representative, your conclusions won’t reflect the truth about that larger population
2
Q

General assumptions of the linear model (5)

A

Key to regression AND the first half of the class covering general linear models

1) Your Y data is continuous
- – If it is categorical (mortality “lived/died”) then regression won’t work
- – You don’t really need to TEST this: you’ll just know

2) Your error is normally distributed
- – Some people say your Y data itself needs to be normally distributed to use the linear model > INCORRECT
- – y_i = β0 + β1·x_i + ε_i, where the error ε_i ~ Normal(0, σ)
- – Ex: comparing the size of males to females
- —– You end up with 2 modes (bimodal Y)
- —– But the ERROR is normally distributed
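
A minimal R sketch of this point (the sex/size example is from the lecture, but every number here is invented for illustration):

  # Hypothetical simulation: body size differs by sex, so Y is bimodal,
  # but the error around each group mean is normal
  set.seed(1)
  sex  <- rep(c("F", "M"), each = 100)
  size <- ifelse(sex == "F", 50, 70) + rnorm(200, mean = 0, sd = 3)

  hist(size)                              # two modes: one per sex
  fit <- lm(size ~ sex)
  hist(resid(fit))                        # the residuals (the error) look normal
  qqnorm(resid(fit)); qqline(resid(fit))  # roughly straight line = normal error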

3) You will have a linear relationship between X & Y
- “This one’s a pet peeve of mine. An assumption that is often ignored in analyses.”
- In ecology, nothing is REALLY linear
- Breaking this assumption can really muck up your results
- – STORY: From a post-doc in John Chase’s lab: buy kiddie pools from Wal-Mart, leave them in a field with water, go away for a month, come back to see what has colonized them. Wrote a paper with X = predator density (mosquito larvae) and Y = prey density. The results were clearly non-linear. He DIDN’T CARE that his linear model didn’t match the shape; he just wanted to know IF there was a relationship.
- He ALWAYS plots his data before running lm() and guesses at the shape (linear or non-linear); see the sketch below
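
A minimal R sketch of that plot-before-fit habit, with invented predator/prey numbers (not the lecture’s data):

  # Hypothetical data: prey density drops off non-linearly as predator density rises
  set.seed(2)
  predator <- runif(40, min = 0, max = 10)
  prey     <- 100 * exp(-0.4 * predator) + rnorm(40, sd = 5)

  plot(predator, prey)            # ALWAYS look at the shape first
  fit <- lm(prey ~ predator)      # a straight line clearly misses the curve
  abline(fit)
  plot(fit, which = 1)            # residuals vs fitted: a curved pattern flags non-linearity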

4) Homoscedasticity
- Homo = same, scedasticity = dispersion/variance
- Means a CONSTANT variance, a constant standard deviation
- The standard deviation around your line doesn’t change as x changes
- – Vs. Heteroscedasticity
- —– The variance changes with the values of x and y
- —– Most common form in ecology: low variation at low values of x and y, higher variation at higher values of x and y
- This is an assumption that doesn’t matter that much
- – You’ll just get higher (more conservative) p-values
- – There are techniques that account for heteroscedasticity (e.g., weighted regression), but they’re beyond this course
- Commonly seen in population abundance data
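
A minimal R sketch of that fanning-out pattern, with invented numbers:

  # Hypothetical data: the spread of y grows with x, the common ecological
  # form of heteroscedasticity
  set.seed(3)
  x <- runif(100, min = 1, max = 20)
  y <- 2 + 3 * x + rnorm(100, mean = 0, sd = 0.5 * x)  # sd increases with x

  fit <- lm(y ~ x)
  plot(x, y); abline(fit)   # the scatter fans out at higher x
  plot(fit, which = 3)      # scale-location plot: an upward trend = heteroscedasticity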

5) No autocorrelation
AKA all of your samples are independent
- Walk through how this assumption might be violated:
- Ex: sampling from a river where a factory is dumping pollutants upstream
— How does the concentration of pollution change with distance from the factory?
— You take samples every 5 meters
—— (i.e., how much the pollution concentration changes as a function of distance downstream from the factory)
— How much can the pollution possibly change in 5 meters? Not much.
— We’d get a lot of autocorrelation: the measurement at any given sample point will be highly correlated with the previous measurement
—— If you’re taking repeated measurements over time, you are likely to get autocorrelation
—— The amount will be a function of how close together those samples are temporally or spatially

In non-autocorrelated data, the errors are normally distributed but not correlated with each other
— Autocorrelated data is where each point is partly a function of the previous point (see the sketch below)

But ultimately, it doesn’t have much of an effect on slope or p-value.
We’ll get into dealing with that in mixed-effects models

Pseudoreplication != autocorrelation
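
A minimal R sketch of the river example, assuming AR(1)-style errors (the distances, decay rate, and correlation strength are all invented):

  # Hypothetical river transect: samples every 5 m downstream of a factory,
  # with errors correlated from one point to the next
  set.seed(4)
  distance  <- seq(5, 500, by = 5)
  ar_error  <- as.numeric(arima.sim(model = list(ar = 0.8), n = length(distance)))
  pollution <- 100 * exp(-0.005 * distance) + ar_error   # concentration decays downstream

  fit <- lm(pollution ~ distance)
  acf(resid(fit))   # spikes well outside the bands at lags 1, 2, ... = autocorrelated errors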

3
Q

Saying (axiom) that goes around regarding GLM

A

“ANOVA, regression, t-tests, are all ROBUST to violations of assumptions.”

  • – Your error doesn’t have to be PERFECTLY normal
  • – Y data doesn’t have to be PERFECTLY continuous
  • – AKA Even if an assumption is violated, it is likely your results won’t be affected (too much).

What really happens when you violate your assumptions?

  • Slopes are still fairly accurate (unbiased estimates of truth)
  • P-values and confidence intervals will be conservative (larger than they would be if the assumptions were not violated)
  • – ANOTHER AXIOM: The more assumptions a statistical test makes, the more powerful it is (the smaller the p-values it tends to give).
  • —– As long as the assumptions are met.
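
A minimal R sketch of the “slopes stay unbiased” part of this claim, using an invented simulation with skewed (non-normal) errors:

  # Hypothetical check: violate the normal-error assumption with skewed errors
  # and see whether the slope estimate is still unbiased
  set.seed(5)
  true_slope <- 2
  slopes <- replicate(1000, {
    x <- runif(50)
    y <- 1 + true_slope * x + (rexp(50, rate = 1) - 1)   # skewed errors with mean 0
    coef(lm(y ~ x))[["x"]]
  })
  mean(slopes)   # comes out close to 2: the slope is still an essentially unbiased estimate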