Multiple Regression Flashcards

1
Q

What type of analysis does Multiple Regression come under?

A

Analysis of dependence = “in which one variable is identified for study and is then examined in terms of its dependence on others”

2
Q

What is Multiple Regression, why is it useful?

A

It involves moving beyond the simple and often inappropriate reliance on a single variable (X) for predicting changes in Y.

It uses multiple variables to explain a single outcome, thus allowing the investigation of more complex phenomena.

3
Q

What is the coefficient of explanation for a multiple regression?

A

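Adjusted R-square = the proportion of the variation in Y that can be accounted for by the model, reported as a PERCENTAGE (%). R-square lies between 0 and 1; adjusted R-square is at most 1 and can occasionally fall below 0 for a very poor model. Unlike in simple linear regression, here it is ‘adjusted’ to account for the number of predictor variables, so adding a predictor that contributes little can lower it.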
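
As a worked illustration, adjusted R-square can be computed from the ordinary R-square with the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of cases and k the number of predictors. The function name and the model figures (R-square = 0.60, 50 cases, 3 predictors) are hypothetical, chosen only to show the arithmetic.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-square for a model with k predictors fitted to n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors + 1")
    # Scale the unexplained share (1 - R-square) by (n - 1) / (n - k - 1),
    # which penalises the model for every extra predictor.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model: R-square = 0.60, 50 cases, 3 predictors
adj = adjusted_r_squared(0.60, n=50, k=3)
print(f"Adjusted R-square = {adj:.1%}")  # Adjusted R-square = 57.4%
```

The adjusted value (57.4%) is slightly lower than the unadjusted 60%, reflecting the penalty for the three predictors.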

4
Q

What else are the independent variables called in this type of analysis?

A

‘Predictors’ or ‘regressors’ - the variables (X) that are being used to predict changes in Y

5
Q

What is the general equation for this?

A

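$$\hat{y} = a + b_1x_1 + b_2x_2 + \dots + b_Kx_K$$

where a is the intercept, $b_1, \dots, b_K$ are the partial regression coefficients and $x_1, \dots, x_K$ are the K predictor variables.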
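
The equation can be evaluated directly for a single case. A minimal sketch; `predict` and the coefficient values are hypothetical illustrations, not output from any fitted model.

```python
def predict(a: float, b: list[float], x: list[float]) -> float:
    """Evaluate y-hat = a + b1*x1 + b2*x2 + ... + bK*xK for one case."""
    if len(b) != len(x):
        raise ValueError("one coefficient is needed per predictor")
    return a + sum(bi * xi for bi, xi in zip(b, x))


# Hypothetical coefficients: intercept 2.0, b1 = 0.5, b2 = -1.0
print(predict(2.0, [0.5, -1.0], [4.0, 3.0]))  # 2.0 + 0.5*4.0 - 1.0*3.0 = 1.0
```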

6
Q

How is the problem of visualising a multiple regression dealt with?

A

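With two predictors, the model is visualised as a ‘best-fit plane’ through the data points in three dimensions (with more predictors it becomes a hyperplane, which cannot be drawn).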

Using the adjusted R-square, we can compare ‘goodness of fit’ between similar models (but NOT different studies)
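
For two predictors, the best-fit plane can be found by ordinary least squares. The sketch below solves the normal equations (X'X)b = X'y with plain Gaussian elimination so it needs only the standard library; the function names and the toy data (generated exactly from y = 1 + 2x1 + 3x2) are hypothetical. In practice a statistics package would do this fitting.

```python
def solve(A, v):
    """Solve the square linear system A @ coef = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    # Augmented matrix [A | v], copied so the caller's lists are untouched
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from the rows below
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back-substitution
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (M[r][n] - s) / M[r][r]
    return coef


def fit_plane(x1, x2, y):
    """Least-squares plane y-hat = a + b1*x1 + b2*x2, returned as [a, b1, b2]."""
    rows = [[1.0, u, w] for u, w in zip(x1, x2)]  # intercept column + two predictors
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)  # normal equations (X'X) b = X'y


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [4, 3, 11, 10, 18]
a, b1, b2 = fit_plane(x1, x2, y)
print(f"y-hat = {a:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")  # y-hat = 1.00 + 2.00*x1 + 3.00*x2
```

Because the toy data lie exactly on a plane, the fit recovers the generating coefficients; with real data the points scatter around the plane and R-square falls below 1.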

7
Q

What is scale-dependency? Why does it occur?

A

Predictor variables are often measured in different units or on different scales. The unstandardised partial regression coefficients (b) depend on those units, so comparing their sizes cannot show the relative influence of each predictor on the outcome variable; the variables must first be put on a common scale.

8
Q

How is scale-dependency dealt with?

A

Standardisation = conversion of the partial regression coefficients into standardised coefficients, called BETA VALUES, which can be compared directly.

(All variables are converted into z-units based on the distribution of their data: subtract the mean of X from each value of X, then divide by the standard deviation of X, i.e. $z = (X - \bar{X}) / s_X$.)
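
The conversion can be sketched with Python's standard `statistics` module. It assumes the usual relationship beta = b × (SD of X / SD of Y) between an unstandardised partial regression coefficient and its beta value; the height/weight data and the coefficient b = 50 kg per metre are hypothetical. Expressing the same predictor in centimetres changes b a hundredfold but leaves beta unchanged, which is the point of standardising.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw values into z-units: (X - mean of X) / SD of X."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def beta_from_b(b, x, y):
    """Beta value for predictor x, given its unstandardised coefficient b."""
    return b * stdev(x) / stdev(y)


# Hypothetical data: height as a predictor of weight (kg)
height_m = [1.50, 1.60, 1.70, 1.80]
height_cm = [h * 100 for h in height_m]
weight = [50.0, 56.0, 61.0, 68.0]

z = z_scores(height_m)
print([round(v, 3) for v in z])  # z-units have mean 0 and SD 1

# b depends on the unit of X (per metre vs per centimetre) ...
beta_m = beta_from_b(50.0, height_m, weight)
beta_cm = beta_from_b(0.5, height_cm, weight)
# ... but the beta value does not
print(round(beta_m, 3), round(beta_cm, 3))
```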

9
Q

What are the two other model outcomes needed to report results (besides adjusted R-square)?

A

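F-ratio = tests the significance of the overall regression model; it is the ratio of explained to unexplained variance, so the larger it is, the more variance the model explains relative to what it leaves unexplained

t = tests the significance of each individual coefficient in explaining the variance (null hypothesis: b = 0)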
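
Both statistics can be sketched from quantities in the model summary. The F-ratio below uses the standard identity F = (R²/K) / ((1 - R²)/(n - K - 1)), and t is a coefficient divided by its standard error. The figures are hypothetical, and turning F or t into a p-value would additionally need the F- or t-distribution (e.g. from a statistics package).

```python
def f_ratio_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F-ratio: explained variance per predictor / unexplained variance per residual df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def t_statistic(b: float, se_b: float) -> float:
    """t for one coefficient, testing the null hypothesis b = 0."""
    return b / se_b


# Hypothetical model: R-square = 0.60, n = 50 cases, K = 3 predictors
print(round(f_ratio_from_r2(0.60, n=50, k=3), 2))  # 23.0
# Hypothetical coefficient b = 0.8 with standard error 0.2
print(round(t_statistic(0.8, 0.2), 2))  # 4.0
```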

10
Q

Name the three problems that could stop the model from working effectively.

A

Multicollinearity
Heteroscedasticity
Autocorrelation

11
Q

What is multicollinearity? How is it identified and how is it addressed?

A

In multiple regression, the predictors should be independent of one another.

Multicollinearity is high correlation between two or more predictor variables, which makes it difficult to isolate the influence of each X variable on Y.

It is identified when tolerance is lower than 0.20 or VIF scores exceed 5.0 (tolerance = 1 / VIF, so the two thresholds are equivalent).

Addressed by removing one or more of the correlated variables, combining them into a single composite variable, or reducing them using a factor/principal component analysis.
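
Tolerance and VIF come from regressing each predictor on all the other predictors: tolerance = 1 - R² of that regression, and VIF = 1 / tolerance. With exactly two predictors that R² is simply the squared Pearson correlation between them, which keeps this sketch short; with more predictors, each one would have to be regressed on all the rest. The function names and data are hypothetical; the cut-offs are the card's 0.20 and 5.0.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two predictors."""
    r2 = pearson_r(x1, x2) ** 2  # R-square of x1 regressed on x2
    tolerance = 1 - r2
    return tolerance, 1 / tolerance


# Hypothetical predictors that rise almost in lockstep
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
tol, vif = tolerance_and_vif(x1, x2)
flag = tol < 0.20 or vif > 5.0
print(f"tolerance={tol:.3f}, VIF={vif:.1f}, multicollinearity={flag}")
```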

12
Q

What is heteroscedasticity? How is it identified?

A

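The residuals should be homoscedastic: their variance should stay roughly constant across all predicted values of Y. This is checked on a scatterplot of standardised residuals against standardised predicted values, which should show an even, random scatter. The normal P-P plot (points close to the diagonal) and the histogram of residuals (a Gaussian distribution) are usually inspected at the same time, although strictly they check whether the residuals are normally distributed.

When the spread of the residuals changes with the predicted values (for example, a funnel shape), the scatterplot is no longer random and the model is heteroscedastic.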

13
Q

What is autocorrelation? How is it identified?

A

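Autocorrelation is when the residuals are not independent of one another (e.g. each residual is correlated with the one before it).

It is identified with the Durbin-Watson test, whose result indicates whether the null hypothesis of no autocorrelation can be rejected. The statistic ranges from 0 to 4: values close to 2 suggest no autocorrelation, values towards 0 suggest positive and values towards 4 negative autocorrelation. It is mainly a problem with time-series data, where successive observations tend to be related.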
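
The Durbin-Watson statistic itself is simple to compute from the residuals taken in time order: the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch; the residual series are hypothetical and chosen to show the two extremes. Deciding whether to reject the null still requires the Durbin-Watson critical values for the given n and K.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals listed in time (or sequence) order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Residuals that keep flipping sign: negative autocorrelation, d above 2
print(durbin_watson([1, -1, 1, -1]))  # 3.0
# Residuals that drift in runs: positive autocorrelation, d below 2
print(durbin_watson([1, 1, -1, -1]))  # 1.0
```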

14
Q

Name the three approaches to entering predictors into a multiple regression and outline their differences.

A

SIMULTANEOUS - “cauldron pot”; all predictors are entered at once; used when there is no strong theoretical consideration underpinning the importance of the variables

HIERARCHICAL - predictors are added in an order chosen by the researcher according to their priority, usually following theoretical considerations based on past research; a logical order

STEP-WISE - forward (adding predictors one by one) or backward (taking them out one by one); predictors are entered or removed based only on their statistical significance

15
Q

How is explained variance calculated?

A

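$$\text{explained variance} = \frac{\sum (\hat{Y} - \bar{Y})^2}{K}$$

i.e. the sum of the squared differences between each best estimate of Y ($\hat{Y}$) and the mean of Y ($\bar{Y}$), divided by K (the number of predictors).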
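
A short sketch of the formula. The observations are hypothetical; the best estimates are the least-squares fit ŷ = 2 + 1.5x for hypothetical predictor values x = -1, 0, 1, so K = 1.

```python
from statistics import mean


def explained_variance(y, y_hat, k):
    """Explained variance: sum of (best estimate - mean of Y) squared, over K."""
    y_bar = mean(y)
    return sum((yh - y_bar) ** 2 for yh in y_hat) / k


# Hypothetical observations and least-squares best estimates (one predictor)
y = [0, 3, 3]
y_hat = [0.5, 2.0, 3.5]
print(explained_variance(y, y_hat, k=1))  # 4.5
```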

16
Q

How is unexplained variance calculated?

A

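$$\text{unexplained variance} = \frac{\sum (Y - \hat{Y})^2}{n - K - 1}$$

i.e. the sum of the squared differences between each individual observation of Y and its best estimate ($\hat{Y}$), divided by n - K - 1, where n is the number of observations (cases) and K is the number of predictors.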
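
A sketch of the unexplained variance, also dividing the explained variance by it to obtain the F-ratio. The observations are hypothetical; the best estimates are the least-squares fit ŷ = 2 + 1.5x for hypothetical predictor values x = -1, 0, 1, so n = 3 and K = 1.

```python
from statistics import mean


def unexplained_variance(y, y_hat, k):
    """Unexplained variance: sum of (observed Y - best estimate) squared, over n - K - 1."""
    n = len(y)
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - k - 1)


# Hypothetical observations and least-squares best estimates (one predictor)
y = [0, 3, 3]
y_hat = [0.5, 2.0, 3.5]
k = 1

unexplained = unexplained_variance(y, y_hat, k)
# Explained variance: sum((y_hat - mean(y))**2) / K
explained = sum((yh - mean(y)) ** 2 for yh in y_hat) / k
print(unexplained)               # 1.5
print(explained / unexplained)   # F-ratio = 4.5 / 1.5 = 3.0
```

Because these best estimates come from a least-squares fit, the explained (4.5) and unexplained (1.5) sums of squares add up to the total sum of squares of Y about its mean (6.0).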