Lecture 3: Regression Assumptions Flashcards

(36 cards)

1
Q

Why are statistical assumptions important in regression?

A

To ensure reliable and generalisable inferences.

2
Q

Are all assumption violations equally serious?

A

No.

3
Q

In large samples, are assumption violations usually a major concern?

A

No.

4
Q

What does the linearity assumption require?

A

A linear relationship between predictors and the outcome.

5
Q

What can be used to model nonlinearity?

A

Transformations or polynomial terms.
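
A minimal sketch of this idea, using made-up data in which the outcome is exactly the square of the predictor. The helper names (fit_line, r_squared) are illustrative only and do not come from the lecture. A straight line in x fits imperfectly, while a straight line in the transformed predictor x^2 fits exactly.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """Proportion of variance in y explained by a straight line in x."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data with a purely quadratic relationship: y = x^2.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [xi ** 2 for xi in x]

r2_raw = r_squared(x, y)                           # straight line in x
r2_squared = r_squared([xi ** 2 for xi in x], y)   # straight line in x^2

print(f"R^2 with x:   {r2_raw:.4f}")
print(f"R^2 with x^2: {r2_squared:.4f}")
```

With real data the improvement is rarely this clean, so a residual plot is still worth checking after the transformation.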

6
Q

What happens if the true relationship is nonlinear and uncorrected?

A

The model may misrepresent the data.

7
Q

What does the normality assumption refer to in regression?

A

Normal distribution of residuals.

8
Q

Is normality of residuals important for large samples?

A

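No. In large samples the sampling distribution of the coefficients is approximately normal even when the residuals are not (central limit theorem).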

9
Q

Do non-normal residuals usually bias regression estimates?

A

No.

10
Q

What do simulation studies suggest about regression with skewed data?

A

It remains robust.

11
Q

What type of outlier is extreme on one variable?

A

Univariate outlier.

12
Q

What type of outlier has an unusual combination of variable values?

A

Multivariate outlier.

13
Q

What statistic is used to measure the influence of a data point?

A

Cook’s distance.

14
Q

What value of Cook’s distance indicates high influence?

A

Greater than 1 (a common rule of thumb).
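
As a rough illustration of the cut-off, the sketch below computes Cook's distance for a one-predictor regression using the standard formula D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, where h_i is the leverage and p = 2 counts the intercept and slope. The data and the function name cooks_distances are made up for this example; the last point sits far from the others on x and off their trend.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation in a one-predictor OLS fit."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    distances = []
    for xi, e in zip(x, residuals):
        # Leverage of a point in simple regression.
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: five points on a line plus one unusual point at x = 20.
x = [1, 2, 3, 4, 5, 20]
y = [1, 2, 3, 4, 5, 5]

d = cooks_distances(x, y)
flagged = [i for i, di in enumerate(d) if di > 1]  # rule of thumb: D > 1

for i, di in enumerate(d):
    print(f"observation {i}: D = {di:.3f}")
print("flagged:", flagged)
```

Only the final observation exceeds the threshold here. A flagged point is a prompt to inspect it, not an automatic reason to delete it.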

15
Q

In small samples, are outliers more problematic?

A

Yes.

16
Q

What does homoskedasticity mean?

A

Equal variance of residuals across predictor values.

17
Q

What is it called when variance of residuals is unequal?

A

Heteroskedasticity.

18
Q

What kind of error is inflated by heteroskedasticity?

A

Type I error.

19
Q

Name one test for heteroskedasticity.

20
Q

What transformation can help with long-tailed data?

A

Log transformation of the dependent variable.
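
A small sketch of why the log helps, using a made-up outcome that spans several orders of magnitude. The skewness function below is the plain moment-based version (third central moment divided by the cubed standard deviation), and the variable names are illustrative assumptions.

```python
import math


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical long-tailed dependent variable (must be strictly positive).
outcome = [1, 10, 100, 1000, 10000]
log_outcome = [math.log(v) for v in outcome]

skew_raw = skewness(outcome)
skew_logged = skewness(log_outcome)

print(f"skewness before log: {skew_raw:.3f}")
print(f"skewness after log:  {skew_logged:.3f}")
```

The log is only defined for positive values, and coefficients from the refitted model describe changes in the log of the outcome rather than in the outcome itself.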

21
Q

What is one way to correct inference under heteroskedasticity?

A

Heteroskedasticity-consistent standard errors.
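
A hedged sketch of the idea for a one-predictor regression. It compares the classical standard error of the slope, which assumes a single residual variance, with the simplest heteroskedasticity-consistent estimator (often labelled HC0), which weights each squared residual by its own squared distance from the mean of x. Choosing HC0 is an assumption made for brevity; statistical packages usually also offer small-sample variants. The data and function name are made up.

```python
import math


def slope_standard_errors(x, y):
    """Return (slope, classical SE, HC0 SE) for a one-predictor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Classical: one pooled residual variance for every observation.
    mse = sum(e ** 2 for e in residuals) / (n - 2)
    classical_var = mse / sxx

    # HC0: each observation contributes its own squared residual.
    hc0_var = sum((xi - x_bar) ** 2 * e ** 2
                  for xi, e in zip(x, residuals)) / sxx ** 2

    return slope, math.sqrt(classical_var), math.sqrt(hc0_var)


# Hypothetical data whose noise grows with x (a fan-shaped residual plot).
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
y = [2 + xi + ei for xi, ei in zip(x, noise)]

slope, se_classical, se_hc0 = slope_standard_errors(x, y)
print(f"slope = {slope:.4f}")
print(f"classical SE = {se_classical:.4f}, HC0 SE = {se_hc0:.4f}")
```

The slope estimate is identical under both approaches; only the standard error, and therefore the test statistic and p-value, changes. In a sample this small the two standard errors can differ in either direction.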

22
Q

What is the most critical assumption in regression?

A

Independence of observations.

23
Q

What are common causes of non-independence?

A

Clustering and repeated measures.

24
Q

What does non-independence do to variability estimates?

A

Underestimates variability.

25
Q

What error increases due to non-independence?

A

Type I error.

26
Q

What statistic tests for autocorrelation in time series?

A

Durbin-Watson statistic.
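
For reference, the statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A minimal sketch, with made-up residual series and an illustrative function name:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that stay on one side for long runs
# (positive autocorrelation).
runs = [1, 1, 1, -1, -1, -1]

# Hypothetical residuals that flip sign every step
# (negative autocorrelation).
alternating = [1, -1, 1, -1, 1, -1]

dw_runs = durbin_watson(runs)
dw_alternating = durbin_watson(alternating)

print(f"long runs:   DW = {dw_runs:.3f}")
print(f"alternating: DW = {dw_alternating:.3f}")
```

The residuals must be supplied in their original time order, since shuffling them changes the statistic.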

27
Q

What model handles clustered data properly?

A

Multilevel model.

28
Q

What is multicollinearity?

A

High correlation between predictors.

29
Q

Does multicollinearity bias overall model predictions?

A

No.

30
Q

What does multicollinearity affect most?

A

Individual regression coefficients.

31
Q

What VIF value suggests a problem?

A

Greater than 5 (a common rule of thumb).

32
Q

What tolerance value indicates problematic multicollinearity?

A

Less than 0.2 (tolerance = 1/VIF, so this matches VIF > 5).
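
The two thresholds are the same rule seen from different sides, because tolerance = 1 - R^2 and VIF = 1 / tolerance, where R^2 comes from regressing one predictor on the others. With only two predictors, R^2 is simply their squared correlation, which the sketch below exploits. The data and function name are made up for illustration.

```python
def tolerance_and_vif(x1, x2):
    """Tolerance and VIF when the model has exactly two predictors."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)  # squared Pearson correlation
    tolerance = 1 - r_squared
    return tolerance, 1 / tolerance


# Hypothetical predictors.
x1 = [1, 2, 3, 4, 5]
x2_moderate = [2, 4, 5, 4, 5]   # moderately correlated with x1
x2_high = [1, 2, 3, 4, 6]       # almost a copy of x1

tol_mod, vif_mod = tolerance_and_vif(x1, x2_moderate)
tol_high, vif_high = tolerance_and_vif(x1, x2_high)

print(f"moderate: tolerance = {tol_mod:.3f}, VIF = {vif_mod:.2f}")
print(f"high:     tolerance = {tol_high:.3f}, VIF = {vif_high:.2f}")
```

With more than two predictors, each predictor's R^2 has to come from a regression on all the remaining predictors, so a pairwise correlation is no longer enough.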

33
Q

What is one solution for multicollinearity?

A

Combine predictors.

34
Q

How should researchers deal with assumption violations?

A

With transparency.

35
Q

Is regression robust to distributional violations?

A

Yes.

36
Q

Why is transparency important in reporting regression analyses?

A

To maintain credibility and avoid misuse.