Statistical Models Flashcards
Define parameters in regression equations (slope, Y-intercept)
The intercept and weight (slope) values are called the parameters of the model.
What are the weight values?
Regression weights, or regression coefficients. A weight is the slope of the line relating the dependent Y [the response] to the independent X [the predictor]: the expected change in Y for a one-unit change in X. Y is assumed to have a constant standard deviation over multiple observations, and the weights are chosen to minimize error via least-squares estimation [equivalently, the correlation multiplied by the SD of Y divided by the SD of X].
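A minimal sketch of the slope formula b = r × (SD of Y / SD of X) in Python; the data are made up for illustration, and statistics.correlation assumes Python 3.10+.

# Slope (regression weight) from the correlation and the two SDs.
import statistics as st

X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = st.correlation(X, Y)           # Pearson correlation b/w X and Y
b = r * st.stdev(Y) / st.stdev(X)  # regression weight (slope)
print(b)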
What is the intercept?
The predicted value of Y when X = 0 [where the mean of the residuals is theoretically equal to zero]. It tells you nothing about the relationship b/w X and Y; it serves as a constant that anchors the line and indicates where it crosses the y-axis.
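A companion sketch for the intercept, using the standard identity a = mean(Y) − b·mean(X) (made-up data again):

# Intercept: the predicted value of Y when X = 0.
import statistics as st

X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

b = st.correlation(X, Y) * st.stdev(Y) / st.stdev(X)  # slope, as above
a = st.mean(Y) - b * st.mean(X)  # forces the line through (mean X, mean Y)
print(a)  # where the line crosses the y-axis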
What do these two parameters estimate?
The slope and the intercept define the linear relationship b/w two variables: the slope estimates the average rate of change in Y per unit of X, and the intercept estimates the value of Y when X = 0.
What are predicted values?
The values of Y-hat, or the values of the dependent variable predicted by the parameters of the model.
What are observed values?
The values of Y, or the actual values of the dependent variable.
How should we understand model fit?
Model fit is a measurement/representation of how well the actual values of Y correspond to the predicted values of Y.
What specific equation is used to assess/estimate model fit?
Error variance, or the average squared deviation b/w the observed and predicted values of Y. This is used because the degree of error in the model tells us how far the predicted values stray from reality.
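A short sketch computing error variance as the average squared residual (some texts divide by n − 2 instead of n; the predicted values here are made up):

# Error variance: average squared difference b/w observed and predicted Y.
Y_obs  = [2.1, 3.9, 6.2, 8.1, 9.8]
Y_pred = [2.0, 4.0, 6.0, 8.0, 10.0]  # from some hypothetical fitted model

sq_errors = [(y - yhat) ** 2 for y, yhat in zip(Y_obs, Y_pred)]
print(sum(sq_errors) / len(sq_errors))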
What is good error variance?
Small, minor differences in variability that do not detract from or misrepresent the relationship b/w Y and Y-hat. Good error variance is not necessarily explainable, but it either does not affect how accurate the model is assumed to be or can be corrected for.
What is bad error variance?
Differences in variability that are not explainable and that confound the relationship b/w Y and Y-hat. In this case, the error we are attempting to examine may be related to entirely different variables, or may not actually be relevant to the question the model is attempting to represent.
What does least squares estimation do?
Finds the line that makes the sum of squared errors [the squared deviations b/w the observed and predicted values of Y] as small as possible, and thus minimizes the total compounded error. [With no predictor, the least-squares solution reduces to the constant mean of Y.]
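A minimal check that the closed-form least-squares line beats nearby candidate lines on the sum of squared errors (illustrative data; the slope/intercept formulas are standard OLS):

# Compare sum of squared errors (SSE) across candidate lines.
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

def sse(a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

mx, my = sum(X) / len(X), sum(Y) / len(Y)
b = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
a = my - b * mx  # least-squares intercept

print(sse(a, b))        # smallest possible SSE
print(sse(a, b + 0.5))  # any other line does worse
print(sse(a + 1.0, b))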
What is a residual?
The difference b/w an observed value of Y and its predicted value [Y minus Y-hat]. Residuals are error estimates that result from the model being unable to perfectly reconstruct the actual data, b/c it cannot realistically represent every source of error in the population.
What is error variance?
The average squared difference b/w the predicted and the observed values.
What is R^2?
A squared correlation that represents the proportion of the variance in Y that is accounted for by the model. It estimates the strength of the relationship b/w the model and the response variable.
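A small sketch of R^2 as 1 − SSE/SST, which for a simple linear model equals the squared correlation (made-up values):

# R^2 = proportion of the variance in Y accounted for by the model.
Y_obs  = [2.1, 3.9, 6.2, 8.1, 9.8]
Y_pred = [2.0, 4.0, 6.0, 8.0, 10.0]

mean_y = sum(Y_obs) / len(Y_obs)
sse = sum((y - yhat) ** 2 for y, yhat in zip(Y_obs, Y_pred))  # unexplained
sst = sum((y - mean_y) ** 2 for y in Y_obs)                   # total
print(1 - sse / sst)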
Explain the purpose of statistical modeling.
To provide a mathematical representation of theories [about the relationships b/w different factors].
Explain how changing different parts of a linear equation will alter the model.
The model changes depending on which parameter is altered in the equation, and the change differs b/w linear and quadratic models.
Linear:
a-value: the elevation of the line changes; every predicted Y value shifts up or down by the same amount.
b-value: the line becomes steeper or less steep.
Quadratic:
a-value: the elevation of the curve changes; the curve shifts up or down.
b-value: the depth of the curve changes; the curve becomes steeper or shallower.
What if the slope is equal to zero?
The predicted values of Y are all equal to a [the intercept], regardless of X; the line is flat.
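A small demonstration of the last two cards: how a and b move the linear and quadratic predictions, and how b = 0 leaves every predicted value at a (illustrative X values):

# Predicted values for Y = a + bX (linear) and Y = a + bX^2 (quadratic).
X = [-2, -1, 0, 1, 2]

def linear(a, b):    return [a + b * x for x in X]
def quadratic(a, b): return [a + b * x ** 2 for x in X]

print(linear(0, 1))     # baseline line
print(linear(3, 1))     # larger a: same line, shifted up (elevation)
print(linear(0, 2))     # larger b: steeper line
print(quadratic(0, 1))  # baseline curve
print(quadratic(0, 3))  # larger b: deeper/steeper curve
print(linear(5, 0))     # b = 0: every predicted Y equals a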
Explain the goals of parameter estimation.
To find weight and intercept values that allow us to calculate the smallest discrepancy between the predicted Y values and the actual values of Y.
Explain the least-squares estimation.
Least-squares estimation finds the line that minimizes the sum of squared errors, i.e., the sum of the squared deviations b/w the observed and predicted values of Y. Because each error is squared before being summed, large discrepancies are penalized heavily, and the resulting line minimizes the total compounded error in the model.
Explain 3 sources of error variance.
Sampling error - error caused by observing a sample instead of the whole population.
Incomplete Model/Missing Variables - variables that mattered, but were not identified or applied to the model.
Imprecision in measurement - nonsystematic problems in how data was collected, how subjects interpreted the questions of interest, or w/ the process of experimentation. Random error has a normal distribution; high and low scores are roughly symmetrical.
Explain what the best guess of the y-intercept would be if we have no information about x-values.
The mean of Y, because it is the point at which the sum of the deviations above it is balanced by the sum of the deviations below it [i.e., it is a least-squares statistic]. The mean is the only value that makes the sum of squared deviations, and thus the variance, as small as possible. This is equivalent to ignoring any X serving as a predictor variable in the model.
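A quick numeric check that the mean minimizes the sum of squared deviations compared with nearby guesses (illustrative numbers):

# The mean is the least-squares "best guess" for Y with no predictor.
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
mean_y = sum(Y) / len(Y)

def ssd(guess):  # sum of squared deviations around a candidate guess
    return sum((y - guess) ** 2 for y in Y)

print(ssd(mean_y))      # smallest
print(ssd(mean_y + 1))  # any other constant does worse
print(ssd(mean_y - 1))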
Explain the goal of model comparison.
To assess which of multiple theories gives a better account of the data and better represents reality.
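A hedged sketch of model comparison: fit a linear and a quadratic model to the same data and compare their R^2 values (numpy.polyfit is one standard fitting tool; note its degree-2 fit includes a bX term, slightly more general than the deck's Y = a + bX^2, and the data are made up):

# Compare a linear and a quadratic model on the same data via R^2.
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.0, 4.5, 9.2, 15.8, 25.1])  # roughly quadratic on purpose

def r_squared(y_pred):
    sse = np.sum((Y - y_pred) ** 2)
    sst = np.sum((Y - Y.mean()) ** 2)
    return 1 - sse / sst

lin  = np.polyval(np.polyfit(X, Y, 1), X)  # straight-line fit
quad = np.polyval(np.polyfit(X, Y, 2), X)  # quadratic fit
print(r_squared(lin), r_squared(quad))     # higher R^2 = better account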
Explain why R-squared is useful.
R^2 is useful b/c it is a standard metric for interpreting model fit: it evaluates the proportion of the variance in Y accounted for by the model relative to the total variance in Y.
Identify parameters in a model, given an equation.
Linear: Y = a + bX | Quadratic: Y = a + bX^2
a = Y-intercept (the value of Y when X = 0); b = slope (the rise over the run, the steepness of the line), also called a weight
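Finally, a tiny sketch making the parameters explicit in each model form (the values of a and b are hypothetical):

# a = Y-intercept, b = slope/weight in each model.
def linear(x, a, b):     # Y = a + bX
    return a + b * x

def quadratic(x, a, b):  # Y = a + bX^2
    return a + b * x ** 2

print(linear(0, a=2, b=3))                # X = 0 returns a, the intercept
print(linear(1, 2, 3) - linear(0, 2, 3))  # rise over a run of 1: the slope b
print(quadratic(2, a=2, b=3))             # 2 + 3*4 = 14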