6. Sept. 5 Flashcards

1
Q

Quiz

A
  1. Stats equation for a line
    Y = β0 + β1X1 + ε, where ε ~ N(0, σ)
    Y equals beta zero plus beta one times X one, plus error that's normally distributed with a mean of zero and standard deviation of sigma

Confidence interval correct definition
If you repeated the sampling many times, 95% of such intervals would contain the true value

What are the 3 characteristics of the data and underlying relationship that influence the significance (p-value) of a regression analysis?
- Sample size, slope (effect size), and noise (residual variance)

Definition of R^2?
Proportion of variation in Y explained by variation in X
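The definition above can be checked numerically: R^2 is 1 minus the residual sum of squares over the total sum of squares. A minimal sketch on simulated data (the variable names and numbers here are illustrative, not from the notes):

```r
# Simulate data with a known linear relationship plus noise
set.seed(1)
X = runif(100, 0, 10)
Y = 2 + 3 * X + rnorm(100, mean = 0, sd = 2)
datum = data.frame(X = X, Y = Y)

results = lm(Y ~ X, data = datum)

# R^2 "by hand": proportion of variation in Y explained by variation in X
ss_res = sum(residuals(results)^2)                 # unexplained variation
ss_tot = sum((datum$Y - mean(datum$Y))^2)          # total variation in Y
r2_manual = 1 - ss_res / ss_tot

# Should match the R-squared reported by summary()
r2_summary = summary(results)$r.squared
```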

2
Q

5 Assumptions of General Linear Model

A
  1. Y is continuous
  2. Normal distribution of error
  3. Linear relationship
  4. Homoscedasticity (constant variance)
  5. No autocorrelation (lack of interdependence)
3
Q

How to look for violations of assumptions?

A

Plot it

4
Q

In R

A
datum=read.csv(file.choose())        # errors are non-normal
head(datum)
datumNonlin=read.csv(file.choose())  # non-linear relationship
head(datumNonlin)
datumHetero=read.csv(file.choose())  # heteroscedastic
head(datumHetero)
datumAuto=read.csv(file.choose())    # autocorrelated
head(datumAuto)

plot(Y~X,datum)

5
Q

Residuals Plot

A

Useful for spotting each of these violations
You have to run the analysis first

results=lm(Y~X,data=datum)
plot(residuals(results)~datum$X)
- Pulling from two different objects, so you have to use the dollar sign in datum$X to stipulate which data frame X comes from
- This takes the regression line and "flattens" it so it is now the x axis (horizontal, at 0)
- You can see the errors are non-normally distributed because the points are not balanced above and below 0

6
Q

Nonlinear

A

For the non-linear dataset:

plot(Y~X,data=datumNonlin)
resultsNonlin=lm(Y~X,data=datumNonlin)
abline(resultsNonlin)

plot(residuals(resultsNonlin)~datumNonlin$X)

7
Q

Hetero

A

plot(Y~X,data=datumHetero)
resultsHetero=lm(Y~X,data=datumHetero)
abline(resultsHetero)
plot(residuals(resultsHetero)~datumHetero$X)

8
Q

Autocorrelation

A

plot(Y~X,data=datumAuto)
resultsAuto=lm(Y~X,data=datumAuto)
abline(resultsAuto)
plot(residuals(resultsAuto)~datumAuto$X)

9
Q

Histogram of residuals

A

A way of looking at normality of residuals in a GLOBAL sense.
There are two ways of thinking about how NORMAL the data really is:
- Globally - how normal the residuals are around the entire line
- Locally - how normal each point is in relation to its neighbors

Important for things like:
If we fit a straight line through non-linear data, the residuals come out looking non-normally distributed even though the real problem is non-linearity

10
Q

Also, if you have supposedly TWO violations

A

You usually just have one BIG one, and the other is not so bad once you fix the first one

11
Q

Histogram in R

A

hist(residuals(results))

Best for looking at ____

12
Q

Bell curve "kurtosis"?

A

Where the curve is pinched: most of the distribution sits in a tiny sliver in the middle (high kurtosis)
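Kurtosis isn't in base R, but it can be computed by hand as the standardized fourth moment: roughly 3 for an ordinary bell curve, and larger when the curve is "pinched" in the middle with heavy tails. A sketch (the `kurtosis` helper is something we define here, not a base R function):

```r
set.seed(7)

# Sample kurtosis: average of the standardized values raised to the 4th power
kurtosis = function(x) mean(((x - mean(x)) / sd(x))^4)

normal_x  = rnorm(1e5)        # ordinary bell curve, kurtosis near 3
pinched_x = rt(1e5, df = 3)   # heavy-tailed, "pinched-middle" distribution

kurtosis(normal_x)
kurtosis(pinched_x)   # much larger than the normal's
```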

13
Q

ACF

A

Auto-correlation function

Both a graph AND a statistical test

datumNorm=read.csv(file.choose())
plot(Y~X,data=datumNorm)
resultsNorm=lm(Y~X,data=datumNorm)
abline(resultsNorm)

acf(residuals(resultsNorm)[order(datumNorm$X)])

- Still need things to be in X order, but we can't technically use a tilde because it's not a ____
- So you have to index with the brackets instead
- The data has to be in order spatially or temporally, and that happens with the order() command
- Not necessary if your data was already in order
- The plot will show that a point is perfectly correlated with itself (lag = 0, correlation = 1.0)

You're looking for correlation within the first few lags (1-4). Past lag 10 you really don't care much.

acf(residuals(resultsAuto) [order(datumAuto$X)])

14
Q

Nonlinear ACF

A

acf(residuals(resultsNonlin) [order(datumNonlin$X)])

It LOOKS like it's autocorrelated, but the relationship is LOCAL: neighboring points sit consistently on the same side of the line because the relationship is curved

SO, ACF is only good for autocorrelation.
- BUT remember it will also give a strong signature if you have non-linearity.

15
Q

SHIFT GEARS from assumptions to predictions

A

One of the real values of statistics is you can use them to make PREDICTIONS.
- You can easily use the GLM to make predictions

16
Q

GLM as an example

A
Y = β0 + β1X1 + ε, where ε ~ N(0, σ)
Y - biomass
X - rainfall
We can use the formula to go FORWARD and PREDICT
- Just plug everything in
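Plugging in can be sketched in R. The coefficients below are the ones that show up later for datumNorm (Y = 2.83 + 2.00 * X); the rainfall value is just an example:

```r
# Coefficients from the fitted biomass ~ rainfall line
b0 = 2.83   # intercept
b1 = 2.00   # slope

# Going FORWARD: plug a rainfall value into Y = b0 + b1*X
rainfall = 5
biomass = b0 + b1 * rainfall
biomass   # 12.83
```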
17
Q

Three things to talk about when making predictions

A
1) INTERPOLATION
- Anytime you make predictions, you need to be careful
- Ex: data from 0-10 cm of rainfall at 1 cm intervals; how much biomass did we get?
- Plug a value from inside that range into the equation
- This is INTERPOLATION: making predictions within the range of observed data
- It's GOOD, cool, one of statistics' STRENGTHS

What about if you go PAST what you measured (17 cm when you stopped at 10 cm)?

2) EXTRAPOLATION
Making predictions outside the range of observed values
- It's not BAD
- But you have to be VERY CAREFUL
- At BEST it is a hypothesis

WHY is it such a problem? Why be so careful?
- Because we have NO idea what happens to Y at that point
- Without extra data, we're being very ambitious
- Understand EXACTLY what you're doing if you choose to extrapolate

3) Measures of Uncertainty
Any time you give an estimate of truth, give a measure of uncertainty (confidence interval)
- For predictions, we use a slightly different measure: the PREDICTION interval

Confidence interval: a measure of uncertainty on the AVERAGE value of something
- Ex: the CI of a slope is the range we think the true slope might be in
- Remember, the fitted line gives the AVERAGE Y at a certain X value

Prediction interval
If we want to predict how much Y you get at a certain value of X:
- Predictions have MUCH more uncertainty
- Because they are about individual points, NOT averages (as confidence intervals are)
- They are measures of uncertainty in possible INDIVIDUAL outcomes
- Prediction intervals are much BIGGER (what we might see in ANY given outcome)

Prediction intervals are about individual outcomes (data points), while confidence intervals are about averages (slopes, means of groups)
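The difference can be seen directly by asking predict() for both interval types at the same X. This sketch uses simulated data (names and coefficients are illustrative):

```r
# Simulate data roughly matching the biomass ~ rainfall example
set.seed(42)
datum = data.frame(X = runif(50, 0, 10))
datum$Y = 2.83 + 2.00 * datum$X + rnorm(50, sd = 2)
results = lm(Y ~ X, data = datum)

NewX = data.frame(X = 5)

# Confidence interval: uncertainty in the AVERAGE Y at X = 5
ci = predict(results, NewX, interval = "confidence")
# Prediction interval: uncertainty in an INDIVIDUAL Y at X = 5
pi = predict(results, NewX, interval = "prediction")

ci_width = ci[1, "upr"] - ci[1, "lwr"]
pi_width = pi[1, "upr"] - pi[1, "lwr"]
# The prediction interval is always wider than the confidence interval
```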

18
Q

Predictions in R

A

plot(Y~X,data=datumNorm)
summary(resultsNorm)

Equation is Y = 2.83 + 2.00 * X

When I do predictions, I do them in a big sequence

x=seq(from=0,to=10,by=0.1)
x
NewX=data.frame(X=x)
head(NewX)

You have to create a new data frame that has that X data in it (the column name must match the predictor name used in the model)

19
Q

Create object

A

predictions=predict(resultsNorm, NewX, interval="prediction")
predictions

3 arguments:

  1. What you use to make the predictions (your fitted results)
  2. The inputs you want to make predictions AT (your newly created data frame)
  3. interval="prediction", which asks for prediction intervals
20
Q

Plot it last

A

matplot(NewX,predictions[,1:3],type="l")

Gives three lines: the predicted (fit) line plus the lower and upper prediction interval bounds