Summa Week 7 Flashcards Preview

Flashcards in Summa Week 7 Deck (88)
1
Q

What is regression?

A

a way of predicting the value of one variable from another

2
Q

Regression is a _____ model of the relationship between ____ variables

A

hypothetical

two

3
Q

The regression model is a ____ one.

Linear or curvilinear?

A

linear

4
Q

We describe the relationship of a regression using the equation of a ________ _____

A

straight line

5
Q

_______ association can be summarized with a line of best fit

A

bivariate

6
Q

Bivariate association can be summarized with a ______________________

A

line of best fit

7
Q

The _____________ has the least amount of error of any possible regression line

A

the line of best fit

8
Q

What do we also call the “line of best fit”?

A

the regression line

9
Q

What else do we also call the “line of best fit”?

A

the prediction line

10
Q

What is the formula for a best fit line?

A

Yi = b0 + b1Xi + ei
or, in population notation,
Yi = β0 + β1Xi + εi
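
Here is a minimal sketch (mine, not the deck’s) of estimating this equation by least squares in Python with NumPy; the data are made up:

    import numpy as np

    # Hypothetical data: predictor X, outcome Y
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Least-squares estimates of the slope (b1) and intercept (b0)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()

    Y_pred = b0 + b1 * X   # predicted values (Y')
    e = Y - Y_pred         # the error term (ei)
    print(b0, b1)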

11
Q

What is b1 in regression?

A

the regression coefficient for the predictor

12
Q

what is the predictor?

A

the variable on the horizontal axis of a scatterplot used to find a regression line

13
Q

what is another name of the gradient of the regression line?

A

slope

14
Q

what is another name of the slope of the regression line?

A

gradient

15
Q

What is the slope symbolized by?

A

b1

16
Q

What does b1 suggest regarding the relationship of a regression line?

A

the direction and/or strength of the relationship

17
Q

What does b0 mean in a regression line?

A

the intercept (value of Y when X = 0)

18
Q

In a regression line, b0 gives the value of Y when X = ?

A

0

19
Q

What else is b0?

A

the point at which the regression line crosses the Y-axis

20
Q

What is another name of the point at which the regression line crosses the Y-axis?

A

the ordinate

21
Q

When the regression line is properly fitted, the error sum of squares is ____ than that which would obtain with any other straight line.

A

smaller

22
Q

When the regression line is properly fitted, the error sum of squares is smaller than that which would obtain with any other straight line. What is this describing?

A

the least squares criterion for determining the line of best fit/regression

23
Q

What is the least squares approach?

A

the least squares line has a sum of errors (SE) and a sum of squared errors (SSE) that are the smallest of all straight-line models
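
A quick numerical check of the criterion (my sketch, same made-up data as above): the least-squares line’s SSE beats any other straight line’s.

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    def sse(b0, b1):
        # Sum of squared errors for a candidate line Y' = b0 + b1*X
        return np.sum((Y - (b0 + b1 * X)) ** 2)

    # Least-squares estimates
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()

    print(sse(b0, b1))        # the smallest achievable SSE
    print(sse(b0 + 0.5, b1))  # any other line gives a larger SSE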

24
Q

What does SE signify?

A

sum of errors in a least squares line

25
Q

What does SSE refer to?

A

the sum of squared errors in the least squares approach

26
Q

How good is the least squares line model?

A

only as good as the data given

27
Q

do we need to test how well the least squares model fits the observed data in a regression?

A

hell yeah

28
Q

What is another way of understanding regression (and by that token, ANOVA)?

A

total variation = explained variation + unexplained variation

29
Q

What is the formula for regression?

A

Σ(Y − Ȳ)² = Σ(Y′ − Ȳ)² + Σ(Y − Y′)²

(where Ȳ is the mean of Y and Y′ is the predicted value)

30
Q

What is the sum of squares?

A

the proportion of variance accounted for by the regression model

31
Q

the proportion of variance accounted for by the regression model

A

the sum of squares

32
Q

What is a symbol for the sum of squares?

A

r^2

33
Q

What is r^2?

A

the Pearson Correlation Coefficient Squared

34
Q

What is the formula for the Pearson Correlation Coefficient Squared / proportion of variance accounted for by the regression model / r^2?

A

r^2 = Σ(Y′ − Ȳ)² / Σ(Y − Ȳ)²

= Explained Variation / Total Variation
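
To make the partition and r^2 concrete, here is a sketch (same made-up data) verifying that total = explained + unexplained and computing r^2:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    Y_pred = b0 + b1 * X

    ss_total = np.sum((Y - Y.mean()) ** 2)       # Σ(Y − Ȳ)²: total variation
    ss_model = np.sum((Y_pred - Y.mean()) ** 2)  # Σ(Y′ − Ȳ)²: explained variation
    ss_error = np.sum((Y - Y_pred) ** 2)         # Σ(Y − Y′)²: unexplained variation

    print(np.isclose(ss_total, ss_model + ss_error))  # True: the partition holds
    print(ss_model / ss_total)                        # r²: proportion of variance explained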

35
Q

A regression allows you to predict Y values given a set of X values; however, it does not allow you to attribute causality to the relationship. T or F?

A

True

36
Q

The variability in Y is caused by X. Is this T or F for Pearson’s correlation coefficient squared?

A

It’s false!

The variability can be accounted for by the variability in X, but NOT necessarily caused by X

37
Q

A regression allows you to predict __ values given a set of __ values, however it does not allow you to attribute _________ to the relationship

A

Y
X
causality

38
Q

What are two methods of identifying extreme outliers?

A
- using a boxplot

- using z-scores

39
Q

How do you find extreme outliers in a boxplot?

A

SPSS: Graphs - Legacy Dialogs - Boxplot

then right-click on an individual * (extreme outlier) and select “Clear”

40
Q

How do you identify extreme outliers using z-scores?

A

SPSS: Analyze - Descriptives - Descriptives, and select the “Save standardized values as variables” option.
Eliminate cases with z-scores more than ±3 SD from the mean.
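
A rough Python stand-in for that SPSS step (my sketch; the data are simulated):

    import numpy as np

    rng = np.random.default_rng(0)
    scores = np.append(rng.normal(5.0, 1.0, 49), 25.0)  # 49 typical cases plus one extreme outlier

    z = (scores - scores.mean()) / scores.std(ddof=1)   # standardized values (z-scores)
    cleaned = scores[np.abs(z) <= 3]                    # keep cases within ±3 SD of the mean
    print(len(scores) - len(cleaned))                   # number of cases eliminated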

41
Q

What do ±3 z-scores refer to?

A

extreme outliers more than 3 SD away from the mean

42
Q

Why are extreme outliers important in regression?

A

they could pull the overall results of the study away from the estimated population parameters

43
Q

What are residuals?

A

the differences between the values of the outcome predicted by the model and the values of the outcome observed in the sample (large residuals point to extreme outliers)

44
Q

What is another term for residuals?

A

influential cases, or extreme outliers

45
Q

Influential cases are what?

A

those with an absolute value of standardized residuals greater than 3

46
Q

What are standardized residuals?

A

those that are divided by an ESTIMATE of their standard deviation
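
Putting cards 45-46 together in a sketch (made-up data; the residual SD is estimated with df = N − 2):

    import numpy as np

    X = np.arange(1.0, 21.0)  # 20 hypothetical cases
    Y = 2.0 * X + 1.0         # an outcome that sits exactly on a line...
    Y[9] = Y[9] + 15.0        # ...except case 10, pushed well off it

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - (b0 + b1 * X)

    # Divide residuals by an ESTIMATE of their standard deviation
    std_resid = resid / np.sqrt(np.sum(resid ** 2) / (len(X) - 2))
    print(np.where(np.abs(std_resid) > 3)[0])  # flags case index 9 as influential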

47
Q

What in SPSS identifies influential cases in a linear regression?

A

SPSS Casewise diagnostics

48
Q

Other methods of identifying influential cases in SPSS include:

A

the areas under Distances and Influence Statistics in the Linear Regression dialog of SPSS

49
Q

Can we safely assume linearity in regression analysis?

A

hell naw. Who can say?

50
Q

Do we assume errors are independent in a regression analysis?

A

nahhhhhhh. there could be a third or fourth variable

51
Q

Do we assume errors are normally distributed?

A

Yeah, as long as the sample size is large enough

52
Q

Do we assume homoscedasticity in regression analysis?

A

the residuals at each level of the predictor should have the same variance, but it’s not as big of a deal if violated

53
Q

What is homogeneity of variance in arrays?

A

the variance of Y for each value of X is constant in the population

54
Q

What is normality in arrays?

A

in the population, the values of Y corresponding to any specified value of X are normally distributed around the predicted Y

55
Q

spooled^2 formula?

A

spooled^2 = (df1/dftotal)(s1^2) + (df2/dftotal)(s2^2)
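
A worked example of the formula in Python (numbers invented):

    # Hypothetical group variances and sizes
    s1_sq, n1 = 4.0, 12
    s2_sq, n2 = 6.0, 10

    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    s_pooled_sq = (df1 / df_total) * s1_sq + (df2 / df_total) * s2_sq
    print(s_pooled_sq)  # weighted average of the two variances: 4.9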

56
Q

What are variable types for regression analysis?

A

the predictor variable must be quantitative or categorical, and the outcome variable must be quantitative, continuous and unbounded

57
Q

What is non-zero variance?

A

the predictor should have some variation in value

58
Q

What are predictors that are uncorrelated with “external variables”?

A

external variables are variables that haven’t been included in the regression model which influence the outcome variable

59
Q

What is the minimum sample size for regression analysis?

A

10 or 15 cases per predictor variable

60
Q

How do you visually inspect the linearity through the scatterplot of the predictor and the outcome variable?

A

SPSS: Graphs - Legacy Dialogs - Scatter/Dot - Simple Scatter; the x-axis is the predictor, the y-axis is the outcome variable. Add the “line of best fit” to assist in checking linearity. If the scatterplot follows a linear pattern (versus a curvilinear pattern), then the assumption is met

61
Q

What shape does the scatterplot of a regression analysis need to be for the assumption of linearity to be met?

A

it needs to be a line, rather than a curvilinear pattern

62
Q

How do you check the assumption of independent errors?

A

using the Durbin-Watson test for serial correlations between errors

63
Q

how can the Durbin-Watson test vary?

A

the test statistic can vary between 0 and 4, with a value of 2 meaning that the residuals are uncorrelated

64
Q

Values less than __ or ___ than ___ are definitely cause for concern; however, values closer to 2 may still be problematic depending on your sample and model

A

less than 1 or greater than 3
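
The statistic itself is easy to compute by hand; a sketch (my own, on made-up residuals) of DW = Σ(e_t − e_{t−1})² / Σ e_t²:

    import numpy as np

    # Hypothetical residuals, in case order
    e = np.array([0.5, 0.1, -0.4, 0.3, -0.2, -0.5, 0.6, -0.1])

    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
    print(dw)  # ~2.5 here: between 1 and 3, so no strong serial correlation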

65
Q

Normally distributed errors are checked in a regression analysis by…

A

- visually inspect normality through the Q-Q plot of the residuals

- statistically inspect normality: conduct z-tests on the skew and kurtosis of the residuals

66
Q

How do you visually inspect a scatterplot of the standardized residuals?

A

plot the standardized residuals, ZRESID, against the standardized predicted values, ZPRED

67
Q

The standardized residuals ____ vary as a function of the standardized predicted values: the trend is centered around zero but also that the variance around _____ is _____ uniformly and randomly

A

MUST NOT; zero; scattered

68
Q

What is the difference between homoscedasticity and heteroscedasticity?

A

a sequence or a vector of random variables is homoscedastic /ˌhoʊmoʊskəˈdæstɪk/ if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity

69
Q

How do you get a regression analysis in SPSS?

A

Analyze - Regression - Linear; enter the outcome as the DV and the predictor as the IV; OK
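
Outside SPSS, a comparable run can be sketched with Python’s statsmodels (assuming it is installed; the data are made up):

    import numpy as np
    import statsmodels.api as sm

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 11.7, 14.2, 16.1])

    model = sm.OLS(Y, sm.add_constant(X)).fit()  # outcome (DV) first, predictor (IV) plus constant
    print(model.summary())  # coefficient table, R-squared, F test, Durbin-Watson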

70
Q

Which output columns give the values for the line of best fit?

A

y-intercept (b0) = the Unstandardized Coefficients B value in the (Constant) row

71
Q

y = b0 + b1 X

A

b1 = the Unstandardized Coefficients B value in the second row (the IV)

72
Q

What is the B or bivariate correlation in regression analysis?

A

found under Standardized Coefficients (Beta), in the IV row

73
Q

What is B?

A

the standardized coefficient for the predictor variable, or the percentage associated with

74
Q

To test whether a sample b is different from the hypothesized b* (b* = 0), use df = N − 2 and the formula…

A

t = (b − b*) / sb

75
Q

If H0 is rejected it means that in the population the ______ ______ is significantly different from zero

A

regression slope

76
Q

It can be shown that b is normally distributed about b* with a standard error approximated by the formula

A

sb = sY·X / [sX · √(N − 1)]

77
Q

CI(b*) =

A

b ± (tα/2) · sY·X / [sX · √(N − 1)], with df = N − 2
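
Cards 74-77 combined in one sketch (made-up data; SciPy assumed available for the critical t):

    import numpy as np
    from scipy import stats

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 11.7, 14.2, 16.1])
    N = len(X)

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - (b0 + b1 * X)

    s_yx = np.sqrt(np.sum(resid ** 2) / (N - 2))   # standard error of estimate
    s_b = s_yx / (X.std(ddof=1) * np.sqrt(N - 1))  # standard error of b

    t = (b1 - 0) / s_b                             # test H0: b* = 0, df = N - 2
    t_crit = stats.t.ppf(1 - 0.05 / 2, df=N - 2)   # two-tailed, alpha = .05
    ci = (b1 - t_crit * s_b, b1 + t_crit * s_b)    # 95% CI for the slope
    print(t, ci)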

78
Q

How do you find the adjusted R ^ 2?

A

under the Model Summary, Adjusted R Square; e.g.,
Adjusted R^2 = .33, F(1, 198) = 99.59, p < .001
(N = 200)

79
Q

What does adjusted R^2 refer to?

A

approximately (adjusted R^2) of the variance of the DV was accounted for by its linear relationship with the IV

80
Q

SSt

A

total variability between the scores and the mean (how the individual scores vary from the sample mean)

81
Q

SSr

A

residual/error variability between the regression model and the actual data (how the individual scores vary from the regression line)

82
Q

SSm

A

model variability between the model and the mean (how the mean value of Y differs from the regression line)

83
Q

What is the purpose of the sums of squares?

A

SS uses the differences between the observed data and the mean value of Y

84
Q

If the model results in better prediction than using the mean, then we expect SSm to be much ______ than SSr

A

greater!

85
Q

Mean squared error is…

A

the sums of squares, which are total values, expressed as averages

86
Q

mean squared error can be expressed as…

A

averages

87
Q

Mean squared errors are called

A

mean squares, MS

88
Q

What is the formula for F-stat for regression analysis?

A

MSm / MSr
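
A closing sketch (same made-up data as the earlier examples) that builds MSm, MSr, and F:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 11.7, 14.2, 16.1])
    N = len(X)

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    Y_pred = b0 + b1 * X

    ss_m = np.sum((Y_pred - Y.mean()) ** 2)  # model sum of squares
    ss_r = np.sum((Y - Y_pred) ** 2)         # residual sum of squares

    ms_m = ss_m / 1        # df(model) = number of predictors, 1 here
    ms_r = ss_r / (N - 2)  # df(residual) = N - 2
    print(ms_m / ms_r)     # the F statistic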