PSCH 443 - Midterm Flashcards
(46 cards)
What is covariance?
the average product of deviation scores; how much two sets of scores vary together. This can be understood as the extent of the variability the two variables share.
What is correlation?
the average product of z-scores; a standardized measure of covariance whose value falls b/w -1 and +1.
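A minimal sketch of both definitions in NumPy, using made-up scores; dividing by N (NumPy's default) keeps the "average product" definitions exact:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Covariance: the average product of deviation scores.
cov = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation: the average product of z-scores.
zx = (x - x.mean()) / x.std()   # np.std defaults to dividing by N
zy = (y - y.mean()) / y.std()
r = np.mean(zx * zy)

print(cov, r)   # r matches np.corrcoef(x, y)[0, 1]
```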
What is slope?
the beta weight; the predicted change in Y-hat for every one-unit increase in X; “steepness of the line.”
What is the intercept?
the constant; the value of Y when X = 0; “elevation of the line.”
What is error variance?
the average of the squared differences b/w the actual values of Y and the values predicted by the model (Y-hat).
What is SSM?
the variation in Y that our model captures.
What is SSRes?
the sum of the squared errors; error/Residual variation.
What is SST?
the total variance in Y; all the variance in Y that could possibly be accounted for, whether by the predictor variable(s) or by anything outside the model.
What is R^2?
the proportion of the variance in Y that is accounted for by the model.
What is F?
a test for the significance of a group of variables (the model as a whole); an index of how probable it is that a result from the model is due to sampling error.
What is MSm?
the variance in Y our model accounts for.
What is MSRes?
error (Residual) variance in the model.
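A minimal sketch tying the last several cards together on made-up data: the sums of squares, R^2, the mean squares, and F for a one-predictor model (SciPy supplies the p-value for F):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
n, k = len(y), 1                        # k = number of predictors

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope
a = y.mean() - b * x.mean()                          # intercept
y_hat = a + b * x

ss_t = np.sum((y - y.mean()) ** 2)      # SST: total variation in Y
ss_m = np.sum((y_hat - y.mean()) ** 2)  # SSM: variation the model captures
ss_res = np.sum((y - y_hat) ** 2)       # SSRes: leftover (error) variation

r2 = ss_m / ss_t                        # R^2: proportion of variance explained
ms_m = ss_m / k                         # MSM: model variance per df
ms_res = ss_res / (n - k - 1)           # MSRes: error variance per df
F = ms_m / ms_res
p = stats.f.sf(F, k, n - k - 1)         # probability of F under the null
print(r2, F, p)
```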
What are brute-force methods of parameter estimation?
Gradient descent/brute force:
- Start with a viable parameter value.
- Calculate the error using a slightly different value.
- Continue moving the best-guess parameter value in the direction of the smallest error.
- Repeat this process until the error is as small as it can be.
Effectively, this amounts to plugging in values until the smallest error is located.
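A minimal sketch of the idea on made-up data; the learning rate and iteration count are arbitrary illustrative choices, not values from the course:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

a, b = 0.0, 0.0   # start with viable parameter values
lr = 0.01         # how far to move each step

for _ in range(5000):
    err = y - (a + b * x)   # error at the current guess
    # Nudge each parameter in the direction that shrinks squared error.
    a += lr * 2 * err.mean()
    b += lr * 2 * (err * x).mean()

print(a, b)   # converges toward the least-squares line
```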
What is least-squares estimation?
Uses beta values to create a line of best fit from the sample data; seeks to minimize the difference b/w the predicted and actual values of Y. In linear least-squares estimation, this line takes the form:
Ŷ = a + bX, where Ŷ is the predicted value of Y, a is the constant (intercept), and b is the slope.
- the goal is to minimize error variance.
- uses the correlation b/w X and Y, the standard deviations of X and Y, and the means of X and Y to calculate the least-squares estimates.
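A minimal sketch of that recipe on made-up data: the slope comes from r and the two SDs, and the intercept from the means:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

r = np.corrcoef(x, y)[0, 1]
b = r * (y.std(ddof=1) / x.std(ddof=1))  # slope
a = y.mean() - b * x.mean()              # intercept (constant)
print(a, b)                              # the line Y-hat = a + b*X
```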
What is partial correlation?
Looks at the relationship between X and Y while holding a third variable (Z) constant; partials the covariance of Z out of both X and Y, setting aside that shared variability to examine the specific relationship b/w those two variables.
- serves as a means of controlling for the variance that Z shares with X and Y
- does not provide as clear a picture of how the model does as a whole
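A minimal sketch of a first-order partial correlation computed from the three pairwise correlations; the r values are made up:

```python
import numpy as np

rxy, rxz, ryz = 0.50, 0.40, 0.30   # made-up pairwise correlations

# Partial out Z's covariance from BOTH X and Y.
r_xy_z = (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))
print(r_xy_z)
```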
What is semipartial correlation?
Indicates the relationship between X and Y after removing the covariation of X with Z from X only (Y is left whole). This is the foundation of multiple regression.
- allows us to examine the unique effect of X on the whole of Y, while holding a third variable (Z) constant
- we can assess the unique contribution of X relative to the whole of Y (i.e., the unique percentage of total variance in Y accounted for by X, holding the effects of any other variable(s) constant)
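A minimal sketch of the matching semipartial correlation, reusing the same made-up r values; only the denominator changes, since Z is removed from X alone:

```python
import numpy as np

rxy, rxz, ryz = 0.50, 0.40, 0.30   # same made-up values as above

# Remove Z from X only; Y's total variance stays intact.
sr = (rxy - rxz * ryz) / np.sqrt(1 - rxz**2)
print(sr, sr**2)   # sr^2: unique share of Y's TOTAL variance due to X
```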
What is multiple regression?
The ultimate goal is to model the influence of several predictors on an outcome variable. This model should account for:
- unique overlap of each predictor with the outcome
- degree of overlap between predictors
- extent to which the overlap between predictors overlaps with the outcome
- the overall degree to which the predictors explain the variability in the outcome (see the sketch after the next card).
What is multiple correlation?
A measure of how well a given variable can be predicted using a linear function of a set of other variables; the correlation between the variable’s values and the best predictions that can be computed linearly from the predictive variables.
- takes on a value b/w 0 and 1 (unlike simple correlation, it cannot be negative)
- the correlation of Y and Ŷ
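A minimal sketch covering this card and the multiple regression card above, on made-up data: fit a two-predictor model by least squares, then take multiple R as the correlation of Y with Ŷ:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.0, 2.5, 6.0, 4.5, 9.0, 8.5])

X = np.column_stack([np.ones_like(x1), x1, x2])   # constant + predictors
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares b weights
y_hat = X @ coefs                                 # model's best predictions

R = np.corrcoef(y, y_hat)[0, 1]   # multiple R: correlation of Y and Y-hat
print(coefs, R, R**2)
```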
What are the basics of ANOVA?
Statistical models and their associated procedures used to analyze the differences among/between group means (i.e., the “variation” among and between groups).
An F statistic with p < .05 is considered significant; if not, we fail to reject the null. ANOVA partitions the variance in the outcome across the different group means in the model.
Explain the interpretation of regression coefficients (both b and beta).
b = Unstandardized Coefficients; the weight expressed in the raw units from the study itself.
Ex. Interpretation: “For every point gained on the GRE, one shaves .002 years (less than a day) off of their completion time.”
Beta = Standardized Coefficients; the weight expressed in SD units from the mean.
Ex. Interpretation: “For every standard deviation increase in GRE score, we predict a .201 standard deviation reduction in years to complete the MS.”
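A minimal sketch of how the two coefficient types relate, on made-up data: beta is b rescaled into SD units (for a single predictor, beta equals r):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # unstandardized b
beta = b * (x.std(ddof=1) / y.std(ddof=1))          # rescaled to SD units
print(b, beta)   # beta equals r for a one-predictor model
```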
SE (standard error) in regression
The measure of sampling error associated with each coefficient or predictor variable.
- average amount we would expect each parameter to vary if we were to take repeated samples
- affected by sample size; larger sample size means less error
- ideally, standard error would be small (see the sketch after the next card)
Significance testing (t-tests) for regression coefficients
The logic is based on whether the regression line has any slope at all: if the slope is zero in the population, no effect exists. It uses two steps:
- Tests each regression parameter against an expected value of zero for b in the population
- If p < .05, we reject the null and conclude the slope (and thus the effect) differs reliably from zero
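A minimal sketch covering this card and the SE card above, on made-up data: SE(b) from the standard simple-regression formula, then t = b / SE(b) and its two-tailed p:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
n = len(y)

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope
a = y.mean() - b * x.mean()                          # intercept
ms_res = np.sum((y - (a + b * x)) ** 2) / (n - 2)    # error variance

se_b = np.sqrt(ms_res / np.sum((x - x.mean()) ** 2)) # shrinks as n grows
t = (b - 0) / se_b                                   # step 1: test b against zero
p = 2 * stats.t.sf(abs(t), df=n - 2)                 # step 2: two-tailed p-value
print(se_b, t, p)
```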
What is multicollinearity?
Exists when there is a strong correlation b/w two or more predictors. Problems include:
- increased error in our parameters
- limits the size of R (and by extension R^2) we can observe
- increased difficulty in assessing the importance of predictors
If multicollinearity occurs, we can drop the variable, combine it w/ the variable it correlates w/ to test whether they measure the same thing, or leave it in the model provided it poses no huge issues w/ the data.
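One common diagnostic the card does not name is the variance inflation factor (VIF); a minimal sketch on made-up predictors, where VIF = 1 / (1 − R^2) from regressing one predictor on the other(s):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.2, 2.1, 2.9, 4.2, 5.1, 5.8])   # nearly a copy of x1

X = np.column_stack([np.ones_like(x2), x2])
coefs, *_ = np.linalg.lstsq(X, x1, rcond=None)   # regress x1 on x2
x1_hat = X @ coefs
r2 = 1 - np.sum((x1 - x1_hat) ** 2) / np.sum((x1 - x1.mean()) ** 2)

vif = 1 / (1 - r2)
print(vif)   # large VIFs (often > 10) flag strong predictor overlap
```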
Explain outliers.
Outliers are extreme cases on one or more of our variables; can have too much of an influence on parameter estimates and regression solution.
- univariate outliers are extreme on one variable
- multivariate outliers are extreme on combinations of variables
Outliers have the following negative impacts on the regression model:
- less normality
- skewed distributions
- results less likely to generalize to population
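A minimal sketch of flagging a univariate outlier with z-scores on made-up data; the |z| > 2 cutoff is a common rule of thumb, not a value from the card:

```python
import numpy as np

y = np.array([2.0, 3.0, 2.5, 3.1, 2.8, 14.0])   # one extreme case

z = (y - y.mean()) / y.std(ddof=1)
print(np.abs(z) > 2)   # flags the extreme case
```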