W1: Multiple Linear Regression Flashcards

1
Q

In multiple linear regression, there are:

A

Multiple independent variables (X) and one dependent variable (Y)

3
Q

The multiple R-squared value for a regression represents the proportion of the variation in the Y variable that can be explained by its regression on the X variables.

True or False?

A

True
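
As a rough illustration of what "proportion of variation explained" means, the sketch below computes R-squared as 1 minus (residual sum of squares / total sum of squares). The function name `r_squared` and the data values are hypothetical and are not taken from the course material; the formula assumes the fitted values come from a least-squares model with a constant term.

```python
def r_squared(observed, fitted):
    """Proportion of the variation in Y explained by the regression."""
    mean_y = sum(observed) / len(observed)
    # Total variation of Y about its mean.
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    # Variation left unexplained: the sum of squared residuals.
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_residual / ss_total


# Hypothetical observed Y-values and fitted values, for illustration only.
observed = [10.0, 12.0, 15.0, 19.0, 24.0]
fitted = [9.5, 12.5, 15.5, 19.5, 23.0]
print(round(r_squared(observed, fitted), 3))
```

A value close to 1 means most of the variation in Y is explained; a value close to 0 means most of it remains unexplained.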

4
Q

The assumptions which we need to check when we perform a multiple linear regression are (3):

A

Normality of the errors
Common variance of the errors
Independence of the errors

5
Q

For the Kolmogorov-Smirnov and Shapiro-Wilk tests of Normality, if p < 0.05 then we conclude that the Normality assumption has been satisfied.
True or False?

A

False. A p-value below 0.05 is evidence against Normality; the assumption is satisfied when p > 0.05.

6
Q

If the p-value for a correlation coefficient was p = 0.036 then the correlation would be significant at

A

5% level

7
Q

We can use multiple linear regression to allow the use of several X-variables (predictors/IV) to predict the

A

response Y

8
Q

What is the multiple linear regression model equation?

A

Y = a + (b1 * X1) + (b2 * X2) + … + e

9
Q

What is the multiple linear regression model equation? - Y

A

Y is the response (DV)

10
Q

What is the multiple linear regression model equation? - X

A

X1, X2, … are the predictors (IVs)

11
Q

What is the multiple linear regression model equation? - b1/b2

A

b1 and b2 are the slopes (gradients), one for each predictor

12
Q

What is the multiple linear regression model equation? - a

A

a is the constant (intercept)

13
Q

What is the multiple linear regression model equation? - e

A

e is the error term

14
Q

In multiple linear regression, each predictor variable (X) has its own

A

coefficient (b1/b2)

15
Q

Why is there an error term (e) in multiple linear regression?

A

Knowing the values of X1, X2, … does not allow us to predict the value of Y exactly

16
Q

What is a residual?

A

Difference between the observed Y-value and its prediction (fitted value) based on corresponding X-values

17
Q

How do you calculate a residual?

A

Residual = Observation - Fitted Value

19
Q

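If the scatterplot of residuals shows a funnel effect, what does this say about the independence and common variance of the errors?

A

The graph does not show independence and common variance: a funnel shape means the spread of the residuals is not constant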

20
Q

To test the significance of each predictor, we test the null and alternative hypotheses:

A

H0: b = 0 vs H1: b ≠ 0 (for each particular X variable)

21
Q

Generally, an R-squared above 0.6: (2)

A

Makes a model worth your attention
Means that most of the variability in the Y variable can be explained by the X variables in the multiple linear regression model

22
Q

Step 1 (In SPSS): Writing Regression Equation (2)

A

The regression equation is:
MRI Count = 237.598 + 55.236(Gender) + 1.280(PIQ) + 6.515(Height)

23
Q

Step 1 (In R): Writing Regression Equation (2)

A

The regression equation is:
Costs = -3085.657 - 86.774(Region) + 511.084(Sex) + 115.61(Age) - 2.62(Marital) + 51.16(Alcohol) + 138.00(Cigs) - 269.264(Exercise)

24
Q

How can you tell which Y and X variables are used in a multiple linear regression model in R? (4)

A
  • Y = Costs
  • X = Region, Sex, Age, Marit, Alco
  • The data is from ex.data
  • The fitted model is stored in a variable called model
25
Q

Step 2: Writing R^2 and interpreting it (In R), where R^2 is less than 60% (11.3%) (2)

A

R^2 = 0.113, so 11.3% of the variation in the Y variable (name it) can be explained by our multiple linear regression model using the X variables (e.g., X2 and X4)
Most of the variation remains unexplained
26
Q

Step 2: Writing R^2 and interpreting it (In SPSS)

A

We see R^2 is 0.618, so 61.8% of the variability in MRI count is explained by our multiple linear regression model
27
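Q

Step 3 Rule: Which p-values mean a predictor is included or not?

A

p < 0.05: the predictor is significant at the 5% level and is kept in the model; otherwise it is not significant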
28
Q

Step 3 Rule: How do you interpret significance in R? (5)

A
  • ' ' (blank) = not significant even at the 10% level (non-significant for multiple linear regression)
  • . = significant at 10% only (non-significant for multiple linear regression)
  • * = significant at 5%
  • ** = significant at 1%
  • *** = significant at 0.1%
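
The significance codes above can be sketched as a small lookup. The function name `signif_code` is hypothetical, and the handling of p-values that fall exactly on a cut-off is an assumption; the cut-offs themselves (0.001, 0.01, 0.05, 0.1) are the ones R prints in the "Signif. codes" legend of a regression summary.

```python
def signif_code(p):
    """Return the symbol R prints next to a coefficient with p-value p."""
    if p < 0.001:
        return "***"  # significant at 0.1%
    if p < 0.01:
        return "**"   # significant at 1%
    if p < 0.05:
        return "*"    # significant at 5%
    if p < 0.1:
        return "."    # significant at 10% only
    return " "        # not significant


# p-values quoted elsewhere in these cards.
for p in (0.0397, 0.123, 0.036):
    print(p, repr(signif_code(p)))
```

Under the Step 3 rule, only predictors marked with at least one * would be kept in the model.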
29
Q

Step 3: Interpreting the p-value of each predictor and whether to include it (In R) - (3)

A

The coefficient for X2 is significant at the 5% level (p = 0.0397)
The coefficient for X4 is not significant (p = 0.123)
Only X2 should be kept in the model
30
Q

Step 3: Interpreting the p-value of each predictor and whether to include it (In SPSS) - (3)

A

The coefficients for Gender, PIQ and Height are all significant at the 5% level or better, and so all can be kept in the model
31
Q

How do you write the b value?
32
Q

Step 4: Interpreting assumptions - histogram is normally distributed
33
Q

Step 4: Interpreting assumptions - histogram is not normally distributed
34
Q

Step 4: Interpreting assumptions - scatterplot shows random scatter
35
Q

Step 4: Interpreting assumptions - scatterplot does not show random scatter
36
Q

Step 5: Making a prediction and finding the residual for the following squirrel (5)
Time = 52.9382 + 21.6954(Mass) - 0.8899(Length) + 2.9466(…) + 0.5157(Distance)

A

Input the values into the equation
Time = 52.9382 + 21.6954(1.1) - 0.8899(17) + 2.9466(1.2) + 0.5157(42.37)
Time = 87.060969 (fitted value)
Residual = Observation - Fitted Value
Residual = 78 (from table) - 87.060969 = -9.060969
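
The arithmetic in this card can be checked with a short sketch. The variable names are hypothetical; the coefficients, predictor values and the observed time of 78 are the ones quoted in the card (the name of the third predictor is not given there, so the values are simply kept in card order).

```python
import math

intercept = 52.9382
# Coefficients and predictor values, in the order they appear in the card.
coefficients = [21.6954, -0.8899, 2.9466, 0.5157]
values = [1.1, 17, 1.2, 42.37]

# Fitted value: the constant plus each coefficient times its predictor value.
fitted = intercept + sum(b * x for b, x in zip(coefficients, values))

# Residual = Observation - Fitted Value, with the observation taken from the table.
observed = 78
residual = observed - fitted

print(round(fitted, 6), round(residual, 6))
```

A negative residual means the model over-predicted the time for this squirrel.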
37
Q

The p-value for the Kolmogorov-Smirnov test should be greater than 0.05 so that

A

the assumption of Normality of the errors is satisfied
38
Q

Written assumptions when they are satisfied (2)

A

The histogram of residuals and the normality tests (p = 0.749 and p = 0.182) suggest that we have no evidence against the assumption of normal errors
The scatterplot of predicted values against residuals does not show any pattern, suggesting the independence and constant variance assumptions on the errors are reasonable
39
Q

What would your next steps be in modelling confidence based on the multiple regression analysis, given that the grade and income covariates are not significant? (4)

A

Try to remove covariates from the regression
In a backwards elimination strategy we would remove the least significant covariate (income) and consider its effect on R^2 and the significance of the remaining covariates
Following this, we could remove grade to see its effect on the regression model
The best regression model is the one with a high R^2 and the fewest covariates
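
One step of the backwards elimination strategy can be sketched as follows. The function name `least_significant` and the p-values are hypothetical; only the ordering (income less significant than grade) follows the card. The sketch picks the covariate to drop and nothing more: in practice the model would be refitted after each removal, because R^2 and the remaining p-values change.

```python
def least_significant(p_values, alpha=0.05):
    """Return the covariate with the largest p-value if it is not
    significant at level alpha, otherwise None (nothing left to remove)."""
    name = max(p_values, key=p_values.get)
    return name if p_values[name] >= alpha else None


# Hypothetical p-values for the two non-significant covariates.
p_values = {"grade": 0.21, "income": 0.64}
print(least_significant(p_values))
```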