3. Aug 24th Flashcards

1
Q

Can the null hypothesis be true?

A

Yes, in a controlled manipulative study.

2
Q

What are the many types of regression tests all examples of?

A

The general linear model

- The most important topic of this entire class

3
Q

What is the simplest form of the general linear model?

A

Simple linear regression

- Continuous x, continuous y

4
Q

What are the two purposes of regression?

A
  1. Fit a line to data
  2. Test whether the slope of that line is significant
    - i.e., whether the p-value < 0.05
5
Q

What are the three possible causes of a large p-value?

A

If the p-value > 0.05, we don’t know whether:
A) The sample size is too small
B) The effect size is too small
C) There is too much noise

6
Q

The traditional equation for a line

A

y = mx + b

b is the y intercept
m is the slope (change in y value/change in x value)

7
Q

The stats equation for a line

A

General
Y = Beta-0 + Beta-1(X)
Beta-0 (B0) is a constant (the intercept)
Beta-1 (B1) is the regression coefficient (the slope)

Specific
y-hat = Beta-0 + Beta-1(X)
y-hat = the predicted value of the dependent variable
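
A minimal R sketch of plugging numbers into this equation, using made-up values (Beta-0 = 2, Beta-1 = 0.5) that are not from the lecture:

beta0 <- 2      # hypothetical intercept (constant)
beta1 <- 0.5    # hypothetical slope (regression coefficient)
x <- 10
y_hat <- beta0 + beta1 * x   # predicted value: 2 + 0.5*10 = 7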

8
Q

What two elements make up the relationship between every x and y (i.e. the line)?

A

Our equation PLUS epsilon

Epsilon (E) = error (normally distributed)

9
Q

What two requirements must be met to be a “best fit line”?

A
  1. Average error = 0
    - (The distances from each point to the line, added up)/n = 0
  2. The sum of squared error is minimized
    - Squaring gets rid of the negatives
    - For complicated models this is done iteratively
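
A minimal R sketch of both requirements, using a made-up data set (the names datum, X, and Y are just placeholders):

set.seed(1)
datum <- data.frame(X = 1:20)
datum$Y <- 3 + 2 * datum$X + rnorm(20)   # made-up "true" line plus normal noise
results <- lm(Y ~ X, data = datum)
mean(residuals(results))        # requirement 1: the average error is essentially 0
sum(residuals(results)^2)       # requirement 2: the SSE that lm() has minimized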
10
Q

What is the more technical name for “total variance in y”?

A

Total Sum of Squares

Total Sum of Squares (SST): the summed squared distance from the data to the null hypothesis line (0 slope)
Represents ALL the information we have
SST = SSR + SSE

SST = Σ(i=1 to n) (Yi − Y-bar)^2
Yi = any individual Y (the observed dependent variable)
Y-bar = the average/mean of y
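
A minimal R sketch of this formula, using a made-up vector y of observed values:

y <- c(2, 4, 5, 4, 7)            # made-up observations of the dependent variable
SST <- sum((y - mean(y))^2)      # summed squared distance from each Yi to Y-bar
SST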

11
Q

What 2 things does the total sum of squares partition variation into?

A

https://365datascience.com/sum-squares/

  1. SSR - sum of squares due to regression
    - The variation in y due to variation in x
    - SSR = Σ(i=1 to n) (Yi-hat − Y-bar)^2
    - Yi-hat = your predicted value of y
    - Y-bar = the average/mean of the Ys, AKA the mean of the dependent variable
  2. SSE - sum of squares due to error
    - Noise
    - SSE = Σ(i=1 to n) ei^2
    - ei = the difference between the observed value and the predicted value
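
A minimal R sketch, with made-up data, showing that this partition really does add back up to SST:

set.seed(1)
datum <- data.frame(X = 1:20)
datum$Y <- 3 + 2 * datum$X + rnorm(20)           # made-up data
results <- lm(Y ~ X, data = datum)

SST <- sum((datum$Y - mean(datum$Y))^2)          # total sum of squares
SSR <- sum((fitted(results) - mean(datum$Y))^2)  # variation in Y explained by X
SSE <- sum(residuals(results)^2)                 # leftover noise
all.equal(SST, SSR + SSE)                        # TRUE: SST = SSR + SSE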
12
Q

What type of p-value will you get if sum of square error (SSE) is larger than sum of squares regression (SSR)?

A

A large p value (greater than 0.05)

13
Q

What type of p-value will you get if sum of squares regression (SSR) is greater than sum of squares error (SSE)?

A

A small p-value (less than 0.05)

It also means that variation in y is mostly due to x

14
Q

Important take aways of regression

A

1) Best-fit lines mean:
  a) Average error = 0
  b) The sum of squared error (SSE) is minimized

2) P-values are calculated by partitioning variation in y into:
  - Sum of squares regression (SSR)
  - Sum of squares error (SSE)

15
Q

What does regression display?

A

Correlation

NOT causation

16
Q

What DO you have to do to determine causation?

A

A MANIPULATIVE experiment.

Observational studies will not prove causation.

This is the VERY reason that debates about global warming even exist.
- We don’t have 3 Earths to manipulate. We cannot do manipulative studies.

17
Q

How can you test to be absolutely sure your results in R are correct?

A

Make the data yourself.
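
A minimal R sketch of that idea, using made-up parameter values and the Biomass~Rainfall names used elsewhere in this deck: simulate data where the true intercept and slope are known, then check that lm() recovers them.

set.seed(42)
Rainfall <- runif(100, 0, 50)                        # hypothetical predictor values
Biomass <- 10 + 2 * Rainfall + rnorm(100, sd = 5)    # true intercept = 10, true slope = 2
datum <- data.frame(Rainfall, Biomass)
results <- lm(Biomass ~ Rainfall, data = datum)
summary(results)   # the estimated intercept and slope should come out close to 10 and 2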

18
Q

What does the professor not believe in?

A

“I don’t believe in randomness. Randomness is just other measurable things we haven’t measured.”

19
Q

How to load data to R

A

1) datum=read.csv(file.choose())   # opens a file browser; reads the chosen .csv into a data frame called datum

2) head(datum)   # shows the first few rows so you can check the data loaded correctly

20
Q

How to plot in R

A

plot(Y~X,data=datum)   # scatterplot of the dependent variable (Y) against the independent variable (X)

21
Q

How to run almost any regression in R

A

lm(Y~X,data=datum)   # the same formula and data arguments as plot()

22
Q

What does he recommend you save your results as?

A

results

results=lm(Biomass~Rainfall, data=datum)
summary(results)

23
Q

What single function gives you most of the data you’re looking for in your results?

A

summary()
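
A small follow-up sketch (assuming the results object from the previous card) for pulling specific numbers back out of summary():

coef(summary(results))       # table of estimates, standard errors, t values, and p-values
summary(results)$r.squared   # the R^2 value on its own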

24
Q

What is R^2?

A

The proportion of variation in y explained by x

If:
R^2 = 1, all points fall exactly on the regression line. Perfect fit.
R^2 = 0, the line has no slope (x explains none of the variation) and the points do not fall on the line

R^2 always goes up (never goes down) when you add more x variables
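
A minimal R sketch, with made-up data, showing that the R^2 reported by summary() is just SSR/SST, i.e., 1 - SSE/SST:

set.seed(7)
datum <- data.frame(X = 1:30)
datum$Y <- 5 + 1.5 * datum$X + rnorm(30, sd = 3)   # made-up data
results <- lm(Y ~ X, data = datum)

SST <- sum((datum$Y - mean(datum$Y))^2)
SSE <- sum(residuals(results)^2)
1 - SSE/SST                      # proportion of variation in Y explained by X
summary(results)$r.squared       # the same number reported by summary()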