Exam 1 Flashcards

Question

What two values are contained in the allowance factor? What is another name for the allowance factor?

Answer 1

The allowance factor, also called the margin of error, is the product of the critical t value and the standard error of the regression coefficient. The confidence interval is the coefficient plus or minus this allowance.
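The short sketch below shows how the allowance factor and the confidence interval are computed. The coefficient and its standard error are made-up values. The critical t of 2.101 is the two-tailed α = .05 value for df = 18, read from a t table, because the Python standard library has no t distribution.

```python
# Hypothetical values, for illustration only.
b_yx = 0.85      # unstandardized regression coefficient
s_b = 0.20       # standard error of b_yx
t_crit = 2.101   # two-tailed critical t for alpha = .05, df = 18 (from a t table)

allowance = t_crit * s_b                  # allowance factor = margin of error
lower, upper = b_yx - allowance, b_yx + allowance
print(f"allowance = {allowance:.4f}; 95% CI = [{lower:.4f}, {upper:.4f}]")
```

This interval excludes 0, so in this made-up case 0 is not a plausible value of the population coefficient.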

Answer 2

If a confidence interval includes zero, then 0 is a plausible value of the true population parameter. This means the regression coefficient is not significantly different from zero at the alpha level that matches the interval (e.g., .05 for a 95% interval).

Answer 3

The unstandardized regression coefficient (byx) gives the predicted change in Y, in the raw units of Y, for every 1-unit increase in X.

Answer 4

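Standardized regression coefficients are measured in standard deviation units. The coefficient b*yx gives the predicted change in Y, in standard deviations, for a 1 standard deviation increase in X.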

Answer 5

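In simple (one-predictor) regression, the standardized regression coefficient equals the correlation: $b^*_{yx} = r_{xy}$.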
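The sketch below uses a small made-up data set to show that rescaling the raw slope by $s_x / s_y$ gives the standardized slope, which equals $r_{xy}$ when there is one predictor. The function name `simple_slopes` is hypothetical.

```python
import statistics

def simple_slopes(x, y):
    """Return (b_yx, b*_yx, r_xy) for a one-predictor regression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of products
    ssx = sum((xi - mx) ** 2 for xi in x)
    ssy = sum((yi - my) ** 2 for yi in y)
    b = sp / ssx                                             # raw units of Y per unit of X
    b_std = b * statistics.stdev(x) / statistics.stdev(y)   # SDs of Y per SD of X
    r = sp / (ssx * ssy) ** 0.5                              # Pearson correlation
    return b, b_std, r

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 8]
b, b_std, r = simple_slopes(x, y)
print(f"b_yx = {b:.4f}, b*_yx = {b_std:.4f}, r_xy = {r:.4f}")
```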

Answer 6

A loss function is a function of the data and the model's parameters that sets the criterion the solution to the analysis must satisfy. In OLS, we want to select the slope and the intercept so that the sum of the squared residuals around the line is as small as possible. We do this analytically. Starting with the loss function (the sum of squared residuals), we derive equations for the slope (byx) and the intercept (b0) that minimize it, and these equations give a unique solution.
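The sketch below writes the OLS loss as a function and compares it with the closed-form slope and intercept. The data are made up and the function names are hypothetical. The test checks that moving either coefficient away from the OLS solution increases the loss.

```python
def sse(b0, b1, x, y):
    """OLS loss: sum of squared residuals for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def ols_fit(x, y):
    """Closed-form minimizer of sse(), from setting its derivatives to zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [1.5, 3.1, 3.9, 6.2, 6.8]
b0, b1 = ols_fit(x, y)
best = sse(b0, b1, x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, minimum SSE = {best:.4f}")
```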

Answer 7

Listwise deletion (AKA casewise deletion or complete case deletion) throws out all the data for any case that is missing even one score on the variables listed for the analysis. Pairwise deletion enters every data point available for a particular pair of variables into that correlation, so different correlations can be based on different subsets of cases. Neither is an optimal or state-of-the-art method. Multiple imputation and maximum likelihood are much better approaches.
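The sketch below uses made-up scores, with None marking a missing value, to show that the two rules keep different cases. Listwise deletion keeps only complete cases. Pairwise deletion uses a separate n for each correlation. The helper names are hypothetical.

```python
import math

data = {  # None marks a missing score (hypothetical data)
    "X1": [4, 5, None, 7, 8, 6],
    "X2": [2, None, 3, 5, 6, 4],
    "Y":  [10, 12, 11, None, 15, 13],
}

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def listwise_cases(data):
    """Indices of cases with no missing score on any listed variable."""
    n = len(next(iter(data.values())))
    return [i for i in range(n) if all(col[i] is not None for col in data.values())]

def pairwise_r(data, v, w):
    """Correlation of v and w using every case observed on both; returns (r, n)."""
    pairs = [(a, b) for a, b in zip(data[v], data[w]) if a is not None and b is not None]
    a, b = zip(*pairs)
    return pearson(a, b), len(pairs)

keep = listwise_cases(data)
print("listwise n =", len(keep))
for v, w in [("X1", "X2"), ("X1", "Y"), ("X2", "Y")]:
    r, n = pairwise_r(data, v, w)
    print(f"pairwise r({v}, {w}) = {r:.3f} with n = {n}")
```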

Answer 8

A scatterplot matrix is a matrix of scatterplots that gives a visual representation of a correlation matrix. Each correlation is replaced by a small scatterplot of the data that produce that correlation. The layout follows the correlation matrix: the cell in row j and column k plots the variable in row j against the variable in column k, whose Pearson product-moment correlation is rjk.

Answer 9

A kernel density estimate is a smoothing function that is applied to a sample frequency distribution in order to provide an estimate of the shape of the distribution in the population.
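The sketch below is a minimal kernel density estimate. It assumes a Gaussian kernel and a hand-picked bandwidth of 0.6; a larger bandwidth gives a smoother estimate. The sample values are made up, and the test checks that the estimated density integrates to about 1.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a function estimating the population density from a sample.

    Each observation contributes a normal 'bump' with SD = bandwidth;
    the estimate at x is the average height of the bumps.
    """
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density

sample = [2.1, 2.9, 3.2, 3.8, 4.0, 4.4, 5.5, 6.1]
f = gaussian_kde(sample, bandwidth=0.6)
for x in (1, 3, 4, 5, 7):
    print(x, round(f(x), 3))
```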

Answer 10

A lowess line is a nonparametric smoothing function that helps us inspect whether a model is appropriate. It tracks the data closely and can be pulled by extreme scores. The percentage of points used in each local fit determines whether the line is smooth or jagged. A lower percentage of the data (e.g., 50%) makes the line more jagged, whereas a higher percentage (e.g., 80%) smooths it. A lowess line helps us detect specification error, such as a curved relationship fitted with a straight line.
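The sketch below is a simplified lowess. It makes a single pass of locally weighted linear fits with tricube weights. Full LOWESS adds robustness iterations that downweight extreme scores, and this sketch leaves them out. The `frac` parameter plays the role of the percentage of points used in each local fit. The function name and data are hypothetical.

```python
import math

def lowess(x, y, frac=0.5):
    """Locally weighted linear fit evaluated at each x (no robustness iterations)."""
    n = len(x)
    k = max(2, math.ceil(frac * n))          # number of neighbors in each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12            # distance to the k-th nearest neighbor
        # Tricube weights: near points count most, points at or beyond h count 0.
        w = [(1 - min(abs(xi - x0) / h, 1) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return fitted

# Hypothetical curved data: compare a small and a large span.
xs = list(range(12))
ys = [0.5 * (xi - 6) ** 2 + (1 if xi % 2 else -1) for xi in xs]
for frac in (0.3, 0.8):
    print(frac, [round(v, 2) for v in lowess(xs, ys, frac)])
```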

Answer 11

All t tests have the same form. The numerator is the sample statistic minus its value under the null hypothesis, and the denominator is an estimate of the standard error of the statistic. For a regression coefficient, $t = \frac{b_{yx} - 0}{s_{b_{yx}}}$, with $df = n - p - 1$.

Answer 12

The same t test applies to both the standardized and the unstandardized regression coefficient. It has the same degrees of freedom and gives the same t value.
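The sketch below uses made-up data with one predictor (p = 1). It computes $t = (b_{yx} - 0)/s_{b_{yx}}$ with $s_{b_{yx}} = \sqrt{MS_{residual} / SS_X}$, then repeats the test on z-scored X and Y, which turns the slope into the standardized coefficient. The function names are hypothetical.

```python
import math
import statistics

def slope_t(x, y):
    """t for H0: slope = 0 in a one-predictor regression; returns (t, df)."""
    n, p = len(x), 1
    mx, my = statistics.fmean(x), statistics.fmean(y)
    ssx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ssx
    b0 = my - b * mx
    ss_res = sum((yi - (b0 + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - p - 1
    s_b = math.sqrt(ss_res / df / ssx)       # standard error of the slope
    return (b - 0) / s_b, df

def zscores(v):
    m, s = statistics.fmean(v), statistics.stdev(v)
    return [(vi - m) / s for vi in v]

x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [3, 6, 5, 9, 8, 12, 11, 15]
t_raw, df = slope_t(x, y)
t_std, _ = slope_t(zscores(x), zscores(y))   # slope here is b*_yx
print(f"t (raw) = {t_raw:.4f}, t (standardized) = {t_std:.4f}, df = {df}")
```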

Answer 13

Estimation is the use of sample statistics to make an “educated guess” about the values of the corresponding population parameters. A point estimate uses a single sample statistic value as a direct estimate of the population parameter (e.g., byx as the estimate of the population coefficient Byx). An interval estimate gives a range of values within which the parameter in question likely lies (a confidence interval).

Answer 14

Unreliability attenuates (weakens) the observed correlation.

Answer 15

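Truncation (restriction) of range attenuates the correlation between variables. The total variance is much larger in a sample with a greater range. When the range is truncated, the systematic variance the variables share shrinks relative to the remaining variance, so the correlation drops.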
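The simulation below uses made-up population values (r = .6) and a fixed random seed. It truncates X to its upper range and compares the correlation before and after. The cutoff of 0.5 SD is an arbitrary choice.

```python
import random
import statistics

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(1)                            # fixed seed so the run is repeatable
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [0.6 * xi + rng.gauss(0, 0.8) for xi in x]   # population r = .6

full_r = pearson(x, y)
# Truncate on X: keep only cases in the upper part of its range.
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 0.5]
trunc_r = pearson([p[0] for p in kept], [p[1] for p in kept])
print(f"full range r = {full_r:.3f}, truncated r = {trunc_r:.3f}")
```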

Answer 16

An observed score has a random component that weakens its correlation with other variables. An observed score is made up of the true score plus some error. The reliability of a score is the proportion of its total variance that is true score variance. Reliability is therefore the proportion of a score's variance that is available to correlate with other scores.
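In classical test theory, the expected observed correlation is the true-score correlation times $\sqrt{r_{xx} r_{yy}}$. Dividing by the same factor gives the classical (Spearman) correction for attenuation. The sketch below applies both with made-up reliabilities. The function names are hypothetical.

```python
import math

def attenuated_r(true_r, rxx, ryy):
    """Expected observed correlation given the true-score correlation and reliabilities."""
    return true_r * math.sqrt(rxx * ryy)

def disattenuated_r(observed_r, rxx, ryy):
    """Correction for attenuation: estimated correlation between true scores."""
    return observed_r / math.sqrt(rxx * ryy)

obs = attenuated_r(0.50, rxx=0.80, ryy=0.70)
print(round(obs, 3))                               # 0.374, weaker than .50
print(round(disattenuated_r(obs, 0.80, 0.70), 3))  # 0.5, the true-score value
```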

Answer 17

A part-whole correlation is computed between some variable j and another variable w, which is the sum of scores on a set of variables that includes j. Significance tests should not be applied to these correlations. Even if all the component variables were independent of one another, j would still correlate with the total simply because it is part of it, and this built-in overlap is not removed from the part-whole correlation. The variables with the highest variance correlate most highly with the total score.
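The simulation below builds a total from three mutually independent components with made-up SDs of 1, 2, and 3, using a fixed seed. Each component still correlates with the total. For uncorrelated components the expected value is $s_j / s_W$, so the component with the largest variance correlates most highly with the total.

```python
import random
import statistics

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(7)
n = 5000
sds = [1, 2, 3]                                   # independent components, different SDs
parts = [[rng.gauss(0, sd) for _ in range(n)] for sd in sds]
total = [sum(vals) for vals in zip(*parts)]       # W = X1 + X2 + X3

rs = [pearson(part, total) for part in parts]
s_w = sum(sd ** 2 for sd in sds) ** 0.5           # SD of W when parts are uncorrelated
for sd, r in zip(sds, rs):
    print(f"SD {sd}: r(part, total) = {r:.3f} (theory {sd / s_w:.3f})")
```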

Answer 18

It is the model underlying OLS regression. It assumes that predictors are measured without error and that the predictors take on a fixed set of values.

Answer 19

The regression of Y on predictor X1, holding X2 constant, and the regression of Y on predictor X2, holding X1 constant. These notations stand for regression coefficients, the coefficients of the predictors in our regression equation. They are called partial regression coefficients because each measures the unique prediction of the criterion from one predictor, with any redundancy in prediction with the other predictors taken out.

Answer 20

(1) First, you predict X2 from X1 in a regression equation, which gives you the part of X2 that is redundant and overlapping with X1. (2) Second, you subtract from X2 the part that can be predicted from X1 (i.e., is redundant with X1), leaving only the part that is not redundant with X1. In equation form: (1) $\hat{X}_2 = b_1 X_1 + b_0$; (2) X2 independent of X1 $= X_2 - \hat{X}_2$.
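The sketch below follows the two steps on made-up data. It then checks that regressing Y on the residualized X2 reproduces the partial coefficient $b_{y2.1}$ from the standard two-predictor formula, an equality known as the Frisch-Waugh-Lovell theorem. The helper names are hypothetical.

```python
import statistics

def fit_line(x, y):
    """OLS intercept and slope for predicting y from x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5

# Hypothetical data.
x1 = [2, 3, 5, 6, 8, 9, 11, 12]
x2 = [1, 4, 3, 6, 5, 9, 8, 10]
y = [3, 6, 7, 10, 9, 14, 13, 17]

# Step 1: predict X2 from X1 (the part of X2 redundant with X1).
b0, b1 = fit_line(x1, x2)
x2_pred = [b0 + b1 * a for a in x1]
# Step 2: X2 independent of X1 = X2 - X2predicted.
x2_indep = [a - p for a, p in zip(x2, x2_pred)]

# Slope of Y on the residualized X2.
_, slope_resid = fit_line(x2_indep, y)

# Standard two-predictor formula for b_y2.1, for comparison.
r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
b_y2_1 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2) * statistics.stdev(y) / statistics.stdev(x2)

print(f"r(X1, X2 independent of X1) = {pearson(x1, x2_indep):.1e}")
print(f"slope on residualized X2 = {slope_resid:.6f}; b_y2.1 = {b_y2_1:.6f}")
```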

Answer 21

We want to obtain a regression coefficient for a predictor that reflects its unique prediction of Y, unrelated to the other predictors. Therefore, we statistically control for, or partial out, the portion of the predictor that is related to the other predictors. That is, we take out any variance in a predictor that overlaps with the variance of the other predictors, so that we can see the unique variance it contributes.

Answer 22

The two predictors together define a predictor plane, like the floor of a room (a two-dimensional surface). Each pair of observed predictor values is represented by one point on that plane. The criterion Y is plotted along a third axis perpendicular to the predictor plane, like a dowel rod standing on the floor. For each pair of predictor values, Y is represented by a point directly above the point (X1, X2), at a height equal to the observed criterion score. The regression plane contains all the predicted scores: above each (X1, X2) point, its height is the predicted score Yhat. The intercept tells how high the plane is off the floor at X1 = X2 = 0.

Answer 23

The data matrix (X) contains, for each case, the case number, the observed values of the predictors, and the observed value of the criterion.

Answer 24

The covariation matrix (P) contains the variation (sum of squared deviations) of X1, X2, and Y. It also contains the covariation (sum of products of deviations) between each pair of variables: X1 with X2, X1 with Y, and X2 with Y. The main diagonal contains the variation of each predictor and of the criterion. The off-diagonal elements are the covariations between pairs of variables.

Answer 25

The covariance matrix (S) contains the variances of each predictor and the criterion on the main diagonal and the covariances off the main diagonal.

Answer 26

The correlation matrix (R) contains 1s along the main diagonal, representing the correlation of each variable with itself, and the correlations between pairs of variables off the main diagonal. The correlation of each predictor with the criterion is its validity. The column of correlations of the predictors with the criterion is the validity vector.
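The sketch below builds P, S, and R in turn from a small made-up data matrix with columns X1, X2, and Y. The row index stands in for the case number. The validity vector is read off the last column of R.

```python
import math

# Hypothetical data matrix X: one row per case, columns X1, X2, Y.
X = [
    [2, 1, 3],
    [3, 4, 6],
    [5, 3, 7],
    [6, 6, 9],
    [8, 5, 10],
    [9, 9, 14],
]
n, k = len(X), len(X[0])
means = [sum(row[j] for row in X) / n for j in range(k)]
dev = [[row[j] - means[j] for j in range(k)] for row in X]   # deviation scores

# P: variations on the diagonal, covariations (sums of products) off it.
P = [[sum(d[i] * d[j] for d in dev) for j in range(k)] for i in range(k)]
# S: covariance matrix = P / (n - 1).
S = [[P[i][j] / (n - 1) for j in range(k)] for i in range(k)]
# R: correlation matrix, 1s on the diagonal.
R = [[S[i][j] / math.sqrt(S[i][i] * S[j][j]) for j in range(k)] for i in range(k)]

validity = [R[i][k - 1] for i in range(k - 1)]   # each predictor's r with Y
for name, M in (("P", P), ("S", S), ("R", R)):
    print(name, [[round(v, 3) for v in row] for row in M])
print("validity vector:", [round(v, 3) for v in validity])
```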

Answer 27

Only if X1 and X2 (the two predictors) are completely uncorrelated. The same is true for the standardized regression coefficient.

Answer 28

The squared multiple correlation is the square of the correlation between the observed criterion scores and the predicted scores. If it is 0, the predictors provide no linear prediction of the criterion.

Answer 29

A linear combination is any combination of the form $W = c_1X_1 + c_2X_2 + \dots + c_kX_k$. The regression equation derived in ordinary least squares regression is a special case in which the weights are chosen to minimize the sum of squared residuals. These weights lead to a predicted score that is maximally correlated with the criterion Y. No other set of weights yields a linear combination of the predictors with a higher correlation with the criterion than the predicted score from OLS regression.
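The sketch below checks the maximal-correlation property on made-up data with two predictors. It computes the OLS weights from the correlations, then tries 10,000 random weight pairs (fixed seed) and keeps the best correlation any of them reaches with Y. No random combination should beat R.

```python
import random
import statistics

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5

x1 = [2, 3, 5, 6, 8, 9, 11, 12]
x2 = [1, 4, 3, 6, 5, 9, 8, 10]
y = [3, 6, 7, 10, 9, 14, 13, 17]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)   # standardized OLS weights
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
s1, s2 = statistics.stdev(x1), statistics.stdev(x2)
# Raw-score OLS weights up to the common factor s_y and the intercept,
# neither of which changes a correlation.
yhat = [beta1 / s1 * a + beta2 / s2 * b for a, b in zip(x1, x2)]
R = pearson(y, yhat)                              # multiple correlation

rng = random.Random(0)
best_other = -1.0
for _ in range(10_000):
    c1, c2 = rng.uniform(-5, 5), rng.uniform(-5, 5)
    w = [c1 * a + c2 * b for a, b in zip(x1, x2)]
    best_other = max(best_other, pearson(y, w))
print(f"R (OLS weights) = {R:.4f}; best of 10,000 random weights = {best_other:.4f}")
```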

Answer 30

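(a) $R^2 = SS_{\hat{Y}} / SS_Y$; (b) $R^2 = (SS_Y - SS_{Y-\hat{Y}}) / SS_Y$; (c) correlate Y with Yhat and square the result, $R^2 = r^2_{Y\hat{Y}}$; (d) $R^2 = b^*_{y1.2}\, r_{y1} + b^*_{y2.1}\, r_{y2}$.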
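The sketch below fits a two-predictor OLS regression to made-up data and computes $R^2$ all four ways. The results should agree to rounding error.

```python
import statistics

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5

# Hypothetical data with two predictors.
x1 = [2, 3, 5, 6, 8, 9, 11, 12]
x2 = [1, 4, 3, 6, 5, 9, 8, 10]
y = [3, 6, 7, 10, 9, 14, 13, 17]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)   # b*_y1.2
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)   # b*_y2.1

# Convert to raw-score coefficients and form the predicted scores.
s_y, s_1, s_2 = (statistics.stdev(v) for v in (y, x1, x2))
b1, b2 = beta1 * s_y / s_1, beta2 * s_y / s_2
my = statistics.fmean(y)
b0 = my - b1 * statistics.fmean(x1) - b2 * statistics.fmean(x2)
yhat = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

ss_y = sum((v - my) ** 2 for v in y)
ss_yhat = sum((v - my) ** 2 for v in yhat)
ss_res = sum((v - p) ** 2 for v, p in zip(y, yhat))

r2 = {
    "(a) SS_yhat / SS_y": ss_yhat / ss_y,
    "(b) (SS_y - SS_res) / SS_y": (ss_y - ss_res) / ss_y,
    "(c) r(Y, Yhat) squared": pearson(y, yhat) ** 2,
    "(d) b*y1.2 r_y1 + b*y2.1 r_y2": beta1 * r_y1 + beta2 * r_y2,
}
for label, value in r2.items():
    print(f"{label}: {value:.6f}")
```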

Answer 31

The higher the correlation between the predictors, the larger the standard error of each regression coefficient. A good overall prediction, measured by r2multiple, decreases the standard error of each regression coefficient.

Answer 32

The more predictors (holding r2multiple constant), the larger the standard error. The more cases, the smaller the standard error.

Answer 33

Tolerance concerns the relationship among the predictors (here, two predictors). It is the proportion of variance in a predictor that does not overlap with (is independent of) the other predictor. With two predictors it equals $1 - r^2_{12}$, the term that appears in the standard error formula. Low tolerance means high redundancy. High redundancy makes the regression coefficients unstable and increases their standard errors.

Answer 34

The variance inflation factor (VIF) measures how much the redundancy of one predictor with the other predictors increases the standard error of its regression coefficient. VIF multiplies the sampling variance of the coefficient, so its square root multiplies the rest of the standard error formula. As the correlation between predictors increases, VIF increases, and so does the standard error of the regression coefficient. With two predictors, $VIF = \frac{1}{1 - r^2_{12}}$, the reciprocal of tolerance. With more predictors, $r^2_{12}$ is replaced by the squared multiple correlation of the predictor with all other predictors.
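The sketch below codes the standard error of a partial regression coefficient:

$s_{b_j} = \frac{s_Y}{s_j}\sqrt{\frac{1 - R^2}{n - p - 1}}\sqrt{VIF_j}$, where $VIF_j = 1/(1 - R^2_j)$.

The design values are made up. The loop shows tolerance, VIF, and the standard error as $r_{12}$ grows. The test also checks the effects of $R^2$, n, and p described in the cards above. The function name is hypothetical.

```python
import math

def se_partial_b(s_y, s_j, r2_multiple, n, p, r2_j):
    """Standard error of the partial regression coefficient for predictor j.

    r2_j: squared multiple correlation of predictor j with the other predictors
    (with two predictors this is r_12 squared).
    """
    tolerance = 1 - r2_j          # share of predictor j not redundant with the others
    vif = 1 / tolerance           # variance inflation factor
    return (s_y / s_j) * math.sqrt((1 - r2_multiple) / (n - p - 1)) * math.sqrt(vif)

# Hypothetical design: s_y = 10, s_j = 5, R^2 = .30, n = 100, p = 2.
base = dict(s_y=10, s_j=5, r2_multiple=0.30, n=100, p=2)
for r12 in (0.0, 0.5, 0.8, 0.95):
    tol = 1 - r12 ** 2
    se = se_partial_b(**base, r2_j=r12 ** 2)
    print(f"r12 = {r12:.2f}  tolerance = {tol:.3f}  VIF = {1 / tol:.2f}  SE = {se:.4f}")
```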

Answer 35

We need adjusted r2 because r2multiple is a positively biased estimate of the extent to which the predictors account for the criterion in the population. Adjusted r2 is not completely unbiased, but it is less biased. The larger the ratio of the number of predictors p to the sample size n, the greater the downward adjustment to r2multiple. Shrinkage is even greater when generalizing from one sample to another sample. Because r2multiple overstates prediction, an estimate of how well the equation will predict in a new sample has to be adjusted downward even further than r2adjusted.
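The sketch below assumes the usual adjustment formula, $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, which is the one SPSS reports. It applies the formula to a fixed made-up $R^2$ of .30 so the effect of p relative to n is visible. With very small n and many predictors the adjusted value can even fall below zero.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

for n, p in [(200, 2), (50, 2), (50, 10), (20, 10)]:
    print(f"n = {n:3d}, p = {p:2d}: R2 = .30 -> adjusted {adjusted_r2(0.30, n, p):.3f}")
```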

Answer 36

It shows two or more dependent variables plotted as a function of a single independent variable.

Answer 37

The linear trend is removed: when we predict the residuals from X, the slope is 0. The residuals are what is left of each observed Y score that does not fit the OLS regression line, i.e., the part of each observed score that does NOT follow a linear trend. When we plot these residuals as a function of predictor X, we no longer see the linear trend.
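The sketch below fits a line to made-up data, then regresses the residuals on X. The second slope comes out as zero up to rounding error, and the residuals also sum to zero.

```python
import statistics

def fit_line(x, y):
    """OLS intercept and slope for predicting y from x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.2, 15.8]

b0, b1 = fit_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # part of Y not on the line
_, resid_slope = fit_line(x, resid)                     # regress residuals on X
print(f"slope of Y on X = {b1:.3f}; slope of residuals on X = {resid_slope:.2e}")
```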

Answer 38

A specification error means we have fitted the wrong model to our data. This can happen through the wrong model content (omitting relevant variables) or through fitting the wrong form of a relationship. For example, when we fit a linear model to our data, r2multiple was .187 (weak prediction). When we fit a quadratic model to the same data, r2multiple was .993, which clearly fits our data better.
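The sketch below reproduces the pattern from the example above, though not its numbers, on made-up U-shaped data. It fits a straight line (X only) and a quadratic model (X and X squared as two predictors), computing $R^2$ from the correlation-based two-predictor formula. The function name is hypothetical.

```python
import statistics

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5

def r2_two_predictors(y, x1, x2):
    """Squared multiple correlation: b*y1.2 * r_y1 + b*y2.1 * r_y2."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    return beta1 * r_y1 + beta2 * r_y2

x = list(range(9))
y = [(xi - 3) ** 2 for xi in x]                                # quadratic relationship

r2_linear = pearson(y, x) ** 2                                 # straight-line model
r2_quadratic = r2_two_predictors(y, x, [xi ** 2 for xi in x])  # adds X squared
print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quadratic:.3f}")
```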

Answer 39

R: ryyhat = rmultiple, the correlation of the observed criterion scores with the predicted scores.
R Square: ryyhat2 = r2multiple, the squared multiple correlation, i.e., the square of the correlation of the observed criterion scores with the predicted scores. This is the effect size measure for the overall regression analysis.
Adjusted R Square: an adjusted estimate of the overall squared multiple correlation in the population.
Std. Error of the Estimate: the standard deviation of the residual scores, used in computing z-scores of the residuals.

Answer 40

The standard error of a sample statistic estimates how much that statistic varies from sample to sample. It therefore tells us how far, on average, our sample statistic is likely to fall from the true population value.
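The simulation below takes the sample mean as the statistic and draws 4,000 samples of n = 25 from a normal population with made-up parameters (mean 50, σ = 10), using a fixed seed. The spread of the sample means comes close to the theoretical standard error σ/√n = 2. The sketch also shows the usual single-sample estimate, s/√n.

```python
import random
import statistics

rng = random.Random(42)
mu, sigma, n, reps = 50.0, 10.0, 25, 4000

# The sampling distribution of the mean: one mean per simulated sample.
means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]
empirical_se = statistics.stdev(means)        # spread of the statistic across samples
theoretical_se = sigma / n ** 0.5             # sigma / sqrt(n)

# In practice we have one sample, so we estimate the SE as s / sqrt(n).
one_sample = [rng.gauss(mu, sigma) for _ in range(n)]
estimated_se = statistics.stdev(one_sample) / n ** 0.5
print(f"empirical SE = {empirical_se:.3f}, theoretical SE = {theoretical_se:.3f}, "
      f"one-sample estimate = {estimated_se:.3f}")
```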


(64 cards)