Summa Week 7 Flashcards

regression (88 cards)

1
Q

What is regression?

A

a way of predicting the value of one variable from another

2
Q

Regression is a _____ model of the relationship between ____ variables

A

hypothetical

two

3
Q

The regression model is a ____ one.

Linear or curvilinear?

A

linear

4
Q

We describe the relationship of a regression using the equation of a ________ _____

A

straight line

5
Q

_______ association can be summarized with a line of best fit

A

bivariate

6
Q

Bivariate association can be summarized with a ______________________

A

line of best fit

7
Q

Of all possible straight lines through the data, the _____________ would have the least amount of error

A

the line of best fit

8
Q

What do we also call the “line of best fit”?

A

the regression line

9
Q

What else do we also call the “line of best fit”?

A

the prediction line

10
Q

What is the formula for a best fit line?

A

Yi = b0 + b1Xi + εi
or, with population parameters,
Yi = β0 + β1Xi + εi
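A minimal worked sketch of this equation, not part of the original cards (Python with numpy and made-up data; all variable names are illustrative only):

import numpy as np

# hypothetical data for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# least-squares estimates of the intercept b0 and slope b1 in Yi = b0 + b1*Xi + ei
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

errors = Y - (b0 + b1 * X)   # the error term for each case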

11
Q

What is b1 in regression?

A

the regression coefficient for the predictor

12
Q

what is the predictor?

A

the X variable, plotted on the horizontal axis of the scatterplot used to find the regression line

13
Q

What is another name for the gradient of the regression line?

A

slope

14
Q

What is another name for the slope of the regression line?

A

gradient

15
Q

What is the slope symbolized by?

A

b1

16
Q

What does b1 suggest regarding the relationship of a regression line?

A

the direction and/or strength of the relationship

17
Q

What does b0 mean in a regression line?

A

the intercept (value of Y when X = 0)

18
Q

b0 in a regression line is the value of Y when X = ?

A

0

19
Q

What also is b0?

A

the point at which the regression line crosses the Y-axis

20
Q

What is another name of the point at which the regression line crosses the Y-axis?

A

the ordinate

21
Q

When the regression line is properly fitted, the error sum of squares is ____ than that which would obtain with any other straight line.

A

smaller

22
Q

When the regression line is properly fitted, the error sum of squares is smaller than that which would obtain with any other straight line. What is this describing?

A

the least squares criterion for determining the line of best fit/regression

23
Q

What is the least squares approach?

A

the least squares line is the straight-line model with the smallest sum of errors (SE) and sum of squared errors (SSE)
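A quick numerical check of this criterion, not from the original cards (Python with numpy, hypothetical data): any competing straight line gives a larger SSE than the least-squares line.

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# least-squares slope and intercept
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

sse_ols = np.sum((Y - (b0 + b1 * X)) ** 2)           # SSE of the least-squares line
sse_other = np.sum((Y - ((b0 + 1) + b1 * X)) ** 2)   # SSE of an arbitrary competing line
assert sse_ols < sse_other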

24
Q

What does SE signify?

A

sum of errors in a least squares line

25
What does SSE refer to?
the sum of squared errors in the least squares line approach
26
How good is the least squares line model?
only as good as the data given
27
do we need to test how well the least squares model fits the observed data in a regression?
hell yeah
28
What is another way of understanding regression (and by that token, ANOVA)?
total variation = explained variation + unexplained variation
29
What is the formula for this partition of variation in regression?
Σ(Y − Ȳ)² = Σ(Y′ − Ȳ)² + Σ(Y − Y′)² (where Ȳ is the mean of Y and Y′ is the predicted Y)
30
What is the ratio of the explained sum of squares to the total sum of squares?
the proportion of variance accounted for by the regression model
31
the proportion of variance accounted for by the regression model
the ratio of the explained sum of squares to the total sum of squares
32
What is the symbol for this proportion of variance?
r^2
33
What is r^2?
the Pearson Correlation Coefficient Squared
34
What is the formula for the Pearson correlation coefficient squared / proportion of variance accounted for by the regression model / r²?
r² = Σ(Y′ − Ȳ)² / Σ(Y − Ȳ)² = Explained Variation / Total Variation
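A sketch that checks both the partition of variation (card 29) and this r² formula on made-up numbers (Python with numpy; not part of the original cards):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_pred = b0 + b1 * X                              # Y' (predicted values)

ss_total = np.sum((Y - Y.mean()) ** 2)            # total variation
ss_explained = np.sum((Y_pred - Y.mean()) ** 2)   # explained variation
ss_unexplained = np.sum((Y - Y_pred) ** 2)        # unexplained variation

# total = explained + unexplained, and r^2 = explained / total = Pearson r squared
print(ss_total, ss_explained + ss_unexplained)
print(ss_explained / ss_total, np.corrcoef(X, Y)[0, 1] ** 2)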
35
A regression allows you to predict Y values given a set of X values; however, it does not allow you to attribute causality to the relationship. True or False?
True
36
"The variability in Y is caused by X." Is this true or false for the Pearson correlation coefficient squared?
False! The variability in Y can be accounted for by the variability in X, but is NOT necessarily caused by X
37
A regression allows you to predict __ values given a set of __ values, however it does not allow you to attribute _________ to the relationship
Y; X; causality
38
What are two methods of identifying extreme outliers?
using a boxplot; using z-scores
39
How do you find extreme outliers in a boxplot?
SPSS: Graphs - Legacy Dialogs - Boxplot; right-click on an individual * (extreme outlier) and select "Clear"
40
How do you identify extreme outliers using z-scores?
SPSS: Analyze - Descriptives - Descriptives, and select the "Save standardized values as variables" option; then eliminate cases with z-scores beyond ±3 SD from the mean
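The same ±3 SD screening can be sketched outside SPSS; this is an illustrative Python/numpy version with simulated data, not the card's SPSS procedure:

import numpy as np

rng = np.random.default_rng(0)
scores = np.append(rng.normal(50, 10, 200), 120.0)   # 120.0 is a planted extreme case

z = (scores - scores.mean()) / scores.std(ddof=1)    # standardized values (z-scores)
kept = scores[np.abs(z) <= 3]                        # drop cases beyond +/-3 SD
print(len(scores), len(kept))                        # the planted extreme case is screened out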
41
What do z-scores of ±3 refer to?
extreme outliers more than 3 SD away from the mean
42
Why are extreme outliers important in regression?
they can pull the results of the entire study away from the estimated population parameters
43
What are residuals?
the differences between the values of the outcome predicted by the model and the values of the outcome observed in the sample
44
What do we call cases with unusually large residuals?
influential cases, or extreme outliers
45
Influential cases are what?
those with an absolute value of standardized residuals greater than 3
46
What are standardized residuals?
those that are divided by an ESTIMATE of their standard deviation
47
What in SPSS's linear regression flags cases with large standardized residuals?
Casewise diagnostics
48
Other methods to identifying influential cases in SPSS include:
the areas under Distances and Influence Statistics in the Linear Regression dialog of SPSS
49
Can we take the linearity assumption for granted in regression analysis?
No; linearity has to be checked (see the scatterplot check in cards 60-61)
50
Can we take the independence of errors for granted in a regression analysis?
No; a third or fourth variable could induce correlation between the errors, so the assumption has to be tested (see the Durbin-Watson test, card 62)
51
Do we assume errors are normally distributed?
Yeah, as long as the sample size is large enough
52
Do we assume homoscedasticity in regression analysis?
Yes: the residuals at each level of the predictor should have the same variance, though this is not as big of a deal if violated
53
What is homogeneity of variance in arrays?
the variance of Y for each value of X is constant in the population
54
What is normality in arrays?
in the population, the values of Y corresponding to any specified value of X are normally distributed around the predicted Y
55
spooled^2 formula?
spooled² = (df1 / dftotal)(s1²) + (df2 / dftotal)(s2²)
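A small worked example of this pooled-variance formula (plain Python; the group degrees of freedom and variances are made up for illustration):

# two groups: df1 = 10 with s1^2 = 4.0, df2 = 20 with s2^2 = 9.0
df1, df2 = 10, 20
df_total = df1 + df2
s1_sq, s2_sq = 4.0, 9.0

s_pooled_sq = (df1 / df_total) * s1_sq + (df2 / df_total) * s2_sq
print(s_pooled_sq)   # (10/30)*4 + (20/30)*9 = 7.33...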
56
What are variable types for regression analysis?
the predictor variable must be quantitative or categorical, and the outcome variable must be quantitative, continuous and unbounded
57
What is non-zero variance?
the predictor should have some variation in value
58
What are predictors that are uncorrelated with "external variables"?
"external variables" are variables that haven't been included in the regression model but that influence the outcome variable; the predictors should be uncorrelated with them
59
What is the minimum sample size for regression analysis?
10 or 15 cases per predictor variable
60
How do you visually inspect linearity through the scatterplot of the predictor and the outcome variable?
SPSS: Graphs - Legacy Dialogs - Scatter/Dot - Simple Scatter; the x-axis is the predictor and the y-axis is the outcome variable. Add the line of best fit to assist in checking linearity. If the scatterplot follows a linear pattern (versus a curvilinear pattern), then the assumption is met
61
What shape does the scatterplot of a regression analysis need to be for the assumption of linearity to be met?
it needs to be a line, rather than a curvilinear pattern
62
How do you check the assumption of independent errors?
using the Durbin-Watson test for serial correlations between errors
63
How can the Durbin-Watson test statistic vary?
the test statistic can vary between 0 and 4, with a value of 2 meaning that the residuals are uncorrelated
64
Values less than __ or ___ than ___ are definitely cause for concern; however, values closer to 2 may still be problematic depending on your sample and model
1; greater; 3 (values less than 1 or greater than 3)
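A sketch of how the Durbin-Watson statistic behind cards 62-64 is computed from the residuals (Python with numpy, hypothetical residuals; in practice SPSS reports the value for you):

import numpy as np

residuals = np.array([0.3, -0.1, 0.4, -0.5, 0.2, -0.3, 0.1, -0.2])   # hypothetical

# DW = sum of squared successive differences / sum of squared residuals
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(dw)   # ranges from 0 to 4; values near 2 suggest uncorrelated residuals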
65
How are normally distributed errors checked in a regression analysis?
visually inspect normality through a Q-Q plot of the residuals, and statistically inspect normality by conducting z tests on the skew and kurtosis of the residuals
66
How do you visually inspect a scatterplot of the standardized residuals?
plot the standardized residuals, ZRESID, against the standardized predicted values, ZPRED
67
The standardized residuals ____ vary as a function of the standardized predicted values: the trend should be centered around zero, and the variance around _____ should be _____ uniformly and randomly
must not; zero; scattered
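An illustrative version of this ZRESID-versus-ZPRED check outside SPSS (Python with numpy and matplotlib, simulated data; standardizing by the sample SD is only an approximation of SPSS's ZRESID):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 100)
Y = 2 + 0.5 * X + rng.normal(0, 1, 100)   # hypothetical linear data

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
pred = b0 + b1 * X
resid = Y - pred

zpred = (pred - pred.mean()) / pred.std(ddof=1)      # standardized predicted values
zresid = (resid - resid.mean()) / resid.std(ddof=1)  # approximately standardized residuals

plt.scatter(zpred, zresid)
plt.axhline(0)   # points should scatter evenly and randomly around zero
plt.xlabel("ZPRED")
plt.ylabel("ZRESID")
plt.show()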
68
What is the difference between homoscedasticity and heteroscedasticity?
a sequence or a vector of random variables is homoscedastic /ˌhoʊmoʊskəˈdæstɪk/ if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity
69
How do you run a regression analysis in SPSS?
Analyze - Regression - Linear; enter the outcome as the DV and the predictor as the IV, then click OK
70
Which SPSS output columns give the line of best fit?
y-intercept (b0) = the unstandardized coefficient B in the (Constant) row
71
y = b0 + b1X
b1 = the unstandardized coefficient B (listed beside its Std. Error) for the 2nd row of the table (the IV)
72
Where is the Beta, or bivariate correlation, found in the regression output?
under Standardized Coefficients (Beta), in the IV row
73
What is the Beta?
the standardized coefficient for the predictor variable
74
To test whether a sample b is different from the hypothesized b* (b* = 0), use df = N − 2 and the formula...
t = (b − b*) / sb
75
If H0 is rejected it means that in the population the ______ ______ is significantly different from zero
regression slope
76
It can be shown that b is normally distributed about b* with a standard error approximated by the formula
sb = sY·X / (sX · √(N − 1))
77
CI(b*) =
b ± t(α/2) · [sY·X / (sX · √(N − 1))], with df = N − 2
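A sketch that strings cards 74-77 together on made-up data (Python with numpy and scipy; sY·X here is the standard error of estimate, computed from the residuals):

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7])
N = len(X)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)

s_yx = np.sqrt(np.sum(resid ** 2) / (N - 2))      # standard error of estimate
s_b = s_yx / (X.std(ddof=1) * np.sqrt(N - 1))     # standard error of b (card 76)

t = (b1 - 0) / s_b                                # test against b* = 0 (card 74)
t_crit = stats.t.ppf(0.975, df=N - 2)             # two-tailed, alpha = .05
ci = (b1 - t_crit * s_b, b1 + t_crit * s_b)       # confidence interval (card 77)
print(t, ci)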
78
How do you find the adjusted R ^ 2?
under the Model Summary, the Adjusted R Square value; e.g., adjusted R² = .33, F(1, 198) = 99.59, p < .001 (N = 200)
79
What does adjusted R^2 refer to?
approximately (adjusted R^2) of the variance of the DV was accounted for by its linear relationship with the IV
80
SSt
total variability between the scores and the mean (how the individual scores vary from the sample mean)
81
SSr
residual/error variability between the regression model and the actual data (how the individual scores vary from the regression line)
82
SSm
model variability between the model and the mean (how the mean value of Y differs from the regression line)
83
What is the purpose of the sums of squares?
they quantify variability using the differences between the observed data, the model's predictions, and the mean value of Y
84
If the model results in better prediction than using the mean, then we expect SSm to be much ______ than SSr
greater!
85
Mean squared error is...
the sum of squares converted from a total value into an average
86
mean squared error can be expressed as...
averages (the sum of squares divided by its degrees of freedom)
87
Mean squared errors are called
mean squares, MS
88
What is the formula for F-stat for regression analysis?
F = MSm / MSr
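A sketch tying the sums of squares, mean squares, and this F ratio together (Python with numpy, made-up data; the model's df is 1 because there is a single predictor):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([1.8, 3.1, 3.9, 5.2, 5.8, 7.1])
N = len(X)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
pred = b0 + b1 * X

ss_m = np.sum((pred - Y.mean()) ** 2)   # model sum of squares (SSm)
ss_r = np.sum((Y - pred) ** 2)          # residual sum of squares (SSr)

ms_m = ss_m / 1         # MSm: model SS divided by its df (1 predictor)
ms_r = ss_r / (N - 2)   # MSr: residual SS divided by its df (N - 2)
print(ms_m / ms_r)      # the F statistic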