Lecture 9 - Regression Depression Flashcards
Why is regression analysis used so often?
It models a predictor, letting us forecast the response into the future or further along the measured gradient
How is the independence assumption evaluated?
By plotting the residuals against the fitted values, in increasing order
Independence holds when the residuals fluctuate randomly about 0 (no pattern)
ie. no unbalanced groups of + or - residuals
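A minimal sketch of this visual check, assuming synthetic data and illustrative variable names (not from the lecture):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, size=x.size)  # linear trend + noise

# Fit a least-squares line, then compute fitted values and residuals.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept
residuals = y - fitted

# Independence looks like a patternless band fluctuating about 0.
plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```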
When does a lack of independence with residuals occur?
When adjacent residuals tend to be similar and thus appear to be correlated = autocorrelation
ie. groupings that consistently fall below or above the line
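One hedged way to quantify this beyond the visual check is the lag-1 correlation of adjacent residuals (an illustrative sketch, not a method from the lecture):

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Correlation between each residual and the next.

    Near +1: positive autocorrelation (runs of same-sign residuals).
    Near -1: negative autocorrelation (signs tend to alternate).
    Near 0: consistent with independence.
    """
    r = np.asarray(residuals, dtype=float)
    return np.corrcoef(r[:-1], r[1:])[0, 1]

rng = np.random.default_rng(1)
drifting = np.cumsum(rng.normal(size=100))  # adjacent values similar
independent = rng.normal(size=100)          # no dependence
print(lag1_autocorrelation(drifting))       # close to +1
print(lag1_autocorrelation(independent))    # close to 0
```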
What is positive autocorrelation (dependence)?
Positive residuals tend to be followed by other positive residuals
What is negative autocorrelation?
Positive residuals tend to be followed by negative residuals (and vice versa), so the signs alternate
What is the purpose of regression analysis?
To examine linearity between two variables
Determine if a linear relationship exists between two variables
If the R2 value is good, what might a scatterplot show that would still concern us?
If the scatterplot shows a plateau
This could mean there are constraints on the data (the response stops increasing)
Do we want to fit a curve to the data (polynomial)?
Usually not desirable: fitting a polynomial complicates the model and is often not biologically meaningful
Sometimes we might instead transform the data in this situation
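A sketch of the transformation idea, under the assumption that the plateau arises because y grows roughly like log(x); the data and helper function are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 100, 60)
y = 3.0 * np.log(x) + rng.normal(0, 0.3, size=x.size)  # plateauing curve

def r_squared(x, y):
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = slope * x + intercept
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print(r_squared(x, y))          # straight line on raw x: weaker fit
print(r_squared(np.log(x), y))  # straight line on log(x): near 1
```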
How is R2 derived, and what is the decomposition of variability?
A least-squares analysis fits the best line: the one that minimizes the unexplained variability in the dependent response variable (y)
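For concreteness, here are the standard closed-form least-squares estimates (textbook formulas, not spelled out on the card; the data are made up):

```python
import numpy as np

def least_squares_line(x, y):
    """Slope and intercept minimizing the sum of squared residuals."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    intercept = y.mean() - slope * x.mean()  # line passes through (x-bar, Y-bar)
    return slope, intercept

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(least_squares_line(x, y))  # roughly slope 1.96, intercept 0.14
```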
What is the Y-bar?
The mean of all observed y values; the fitted line always passes through the point of the means (x-bar, Y-bar), so the variability pivots around it
What is (B) in the decomposition of variability?
The vertical distance from the observed value to the expected (fitted) value = residual, error, or unexplained term
(yi - y-hat-i)
What is yi?
the observed value
What is y-hat-i?
the expected value
What is (C) in the decomposition of variability?
The Model (y-hat-i - Y-bar): the vertical distance from the expected (fitted) value to the mean of the y values (Y-bar) = the model or regression component
What is (A) in the decomposition of variability?
The sum of the two components (B) and (C)
where (A) = (B) + (C) = the total variability
Total variation equation = ???
Total variation (A) = Residual (error/unexplained, B) + Model (explained/regression, C)
Why do we sum the squares of the residuals?
To remove the signs of the differences (eg. between each observed yi and its expected y-hat-i)
Otherwise the positive and negative differences would cancel; the raw residuals sum to 0
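A small numeric check of the last two cards, on assumed synthetic data: the raw residuals cancel to ~0, while the squared components obey (A) = (B) + (C):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 40)
y = 1.5 * x + 2.0 + rng.normal(0, 1.0, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept
residuals = y - fitted

print(np.sum(residuals))  # ~0: + and - differences cancel, hence squaring

ss_total = np.sum((y - y.mean()) ** 2)       # (A) SSt, total
ss_error = np.sum(residuals ** 2)            # (B) SSe, residual/unexplained
ss_model = np.sum((fitted - y.mean()) ** 2)  # (C) SSm, model/explained
print(np.isclose(ss_total, ss_error + ss_model))  # True: A = B + C
```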
What is the equation for the total sum of squares (SSt)? And how does this relate to (A), (B), and (C)?
Total SSt (A) = Residual SSe (B) + Model SSm (C)
What is SSe a measure of?
SSe is a measure of how well the regression line fits the actual data (difference between observed and expected values)
What is SSm a measure of?
SSm is a measure of how different the fitted line y-hat-i is from Y-bar (ie. how different the slope is from 0)
What is the equation for the Coefficient of determination R2?
R2 = SSm (C)/ SSt (A, total)
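Continuing the decomposition on assumed synthetic data, R2 is the explained share of the total; for a simple linear regression it also equals the squared correlation between x and y:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 40)
y = 1.5 * x + 2.0 + rng.normal(0, 1.0, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

ss_total = np.sum((y - y.mean()) ** 2)       # SSt (A)
ss_model = np.sum((fitted - y.mean()) ** 2)  # SSm (C)

print(ss_model / ss_total)            # R2 = SSm / SSt
print(np.corrcoef(x, y)[0, 1] ** 2)   # same value for a simple linear fit
```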
What is the Model referring to in an ANOVA table?
The treatment/factor
How do you determine F with the ANOVA?
F=MSm/MSE
or F=MSm/MSres (same thing, different label)
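A sketch of the F computation for a simple linear regression (1 model df, n - 2 residual df); scipy is assumed to be available for the p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 30)
y = 1.5 * x + 2.0 + rng.normal(0, 1.0, size=x.size)
n = x.size

slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

ss_model = np.sum((fitted - y.mean()) ** 2)
ss_error = np.sum((y - fitted) ** 2)

df_model, df_error = 1, n - 2    # simple linear regression
ms_model = ss_model / df_model   # MSm
ms_error = ss_error / df_error   # MSE (= MSres)
F = ms_model / ms_error
print(F, stats.f.sf(F, df_model, df_error))  # F statistic and p-value
```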
What is the equation for the sum of squares Model (SSm)?
SSm = sum of (y-hat-i - Y-bar)2 over all observations