Lecture 3: Regression Assumptions Alt 2 Flashcards

(40 cards)

1
Q

Why are statistical assumptions necessary in regression-based analyses?

A

To make inferential techniques tractable and generalizable, given the impossibility of designing tests for every possible data configuration.

2
Q

When are violations of regression assumptions not necessarily cause for concern?

A

Particularly in the case of distributional violations in large samples.

3
Q

What is the assumption of linearity in multiple regression?

A

The relationship between independent and dependent variables is assumed to be linear.

4
Q

What happens if the true relationship is nonlinear in a regression model?

A

A linear model may misrepresent it or underestimate the association.

5
Q

How can nonlinearity be modelled in a linear regression framework?

A

Using transformations or quadratic terms.

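As an illustration of this card (not from the lecture; the data and variable names are simulated), a quadratic term can be added while the model stays linear in its parameters, sketched here with Python's statsmodels:

```python
# Minimal sketch: modelling a curvilinear relationship with a quadratic term.
# Data are simulated; variable names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 1 + 0.5 * df["x"] + 0.8 * df["x"] ** 2 + rng.normal(size=200)

# I(x**2) squares x inside the formula; the model remains linear in its parameters
fit = smf.ols("y ~ x + I(x**2)", data=df).fit()
print(fit.params)  # the coefficient on I(x**2) picks up the curvature
```
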
6
Q

What does the assumption of normality of residuals entail?

A

Residuals are assumed to be normally distributed.

7
Q

How important is the normality of residuals for inference in large samples?

A

It is largely unimportant for inference, especially in samples larger than about 10.

8
Q

Do violations of residual normality bias regression estimates or p-values significantly?

A

No, simulation studies show regression remains robust across a variety of skewed distributions.

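A minimal simulation sketch of the kind of robustness check this card describes (the setup is assumed for illustration, not taken from the studies cited in the lecture): with heavily skewed errors and a true slope of zero, the empirical Type I error rate of the slope test should stay close to the nominal 5%.

```python
# Minimal sketch: Type I error of the OLS slope test under skewed residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, reps, alpha = 100, 2000, 0.05
rejections = 0
for _ in range(reps):
    x = rng.normal(size=n)
    e = rng.exponential(scale=1.0, size=n) - 1.0  # heavily skewed, mean-zero errors
    y = 2.0 + 0.0 * x + e                         # true slope is zero
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    rejections += fit.pvalues[1] < alpha          # count false positives
print(f"Empirical Type I error: {rejections / reps:.3f} (nominal {alpha})")
```
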
9
Q

What is a downside of using non-parametric tests to address normality violations?

A

They often introduce more problems than they solve.

10
Q

What are univariate outliers and how are they defined?

A

Univariate outliers are extreme values on a single variable, conventionally defined as lying more than 3 (or, more strictly, 3.29) standard deviations from the mean.

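A small sketch of the z-score rule from this card (the helper name and data are hypothetical, not lecture code):

```python
# Minimal sketch: flag values more than `cutoff` SDs from the mean.
import numpy as np

def flag_univariate_outliers(x, cutoff=3.29):
    z = (x - x.mean()) / x.std(ddof=1)  # standardise each value
    return np.abs(z) > cutoff

rng = np.random.default_rng(1)
x = np.append(rng.normal(size=199), 8.0)          # one deliberately extreme value
print(np.where(flag_univariate_outliers(x))[0])   # should flag only the last point
```
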
11
Q

What are multivariate outliers in regression?

A

Unusual combinations of scores across variables.

12
Q

What is Cook’s distance used for?

A

To quantify how much the fitted regression changes when a single data point is removed.

13
Q

What value of Cook’s distance indicates a very influential or possibly problematic point?

A

Greater than 1.

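A minimal sketch of computing Cook's distance with Python's statsmodels, using the D > 1 rule of thumb from the card above (data are simulated; the manipulated point is purely illustrative):

```python
# Minimal sketch: Cook's distance for each observation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=50)
y = 1 + 0.5 * x + rng.normal(size=50)
x[0], y[0] = 4.0, -6.0  # one high-leverage, badly fitting point

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = fit.get_influence().cooks_distance[0]
print(cooks_d.argmax(), cooks_d.max())  # the manipulated point should dominate
print(np.where(cooks_d > 1)[0])         # observations exceeding the D > 1 threshold
```
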
14
Q

What should be done with influential data points in regression analysis?

A

They should be examined for accuracy, and their impact reported if they substantively affect conclusions.

15
Q

Under what conditions are extreme values less problematic due to the Central Limit Theorem?

A

When they are genuine observations (not data errors) and the sample size is large.

16
Q

What should be done with extreme values in small sample sizes?

A

The regression should be run both including and excluding them, and this should be reported to the reader.

17
Q

What is the assumption of homoskedasticity in regression?

A

Equal variance of residuals across all values of the independent variable (predictor).

18
Q

What can violations of homoskedasticity (heteroskedasticity) lead to?

A

Inflated Type I error rates: standard errors become biased, making tests and confidence intervals unreliable (coefficient estimates remain unbiased but lose precision).

19
Q

How can heteroskedasticity be tested?

A

Via residual plots or formal tests like the White test.
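
A minimal sketch of the White test using statsmodels, on simulated data whose error spread grows with the predictor:

```python
# Minimal sketch: White test for heteroskedasticity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(3)
x = rng.uniform(0, 4, size=200)
y = 1 + 0.5 * x + rng.normal(size=200) * (0.5 + x)  # error SD grows with x
fit = sm.OLS(y, sm.add_constant(x)).fit()

lm_stat, lm_pval, f_stat, f_pval = het_white(fit.resid, fit.model.exog)
print(f"White test LM p-value: {lm_pval:.4f}")  # a small p flags heteroskedasticity
```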

20
Q

In what kinds of data is heteroskedasticity more likely to occur?

A

In highly skewed data with very long tails.

21
Q

What are some causes of heteroskedasticity?

A

Un-modelled variables (e.g., a moderator affecting different processes at different levels of the IV) and nonlinear effects.

22
Q

What are remedies for heteroskedasticity?

A

Transforming the dependent variable, modelling potential moderating variables, or applying heteroskedasticity-consistent standard errors.

23
Q

What does applying heteroskedasticity-consistent standard errors correct?

A

Only the inference problem (the biased standard errors); it does not change the coefficient estimates themselves.
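
A minimal sketch of requesting heteroskedasticity-consistent (here HC3) standard errors in statsmodels; note the point estimates match the ordinary fit, and only the standard errors change:

```python
# Minimal sketch: the same OLS fit with ordinary vs. HC3 standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 4, size=200)
y = 1 + 0.5 * x + rng.normal(size=200) * (0.5 + x)  # heteroskedastic errors
X = sm.add_constant(x)

ols_fit = sm.OLS(y, X).fit()
hc3_fit = sm.OLS(y, X).fit(cov_type="HC3")
print(ols_fit.params, hc3_fit.params)  # identical coefficients
print(ols_fit.bse, hc3_fit.bse)        # different (corrected) standard errors
```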

24
Q

What is arguably the most critical assumption in regression analysis?

A

Independence of observations.

25
Q

What are examples of data structures that violate independence of observations?

A

Clustered or serial data (e.g., students within schools, repeated measures, or time series).

26
Q

What is the statistical consequence of non-independent observations?

A

Inflated precision due to underestimated variability, leading to increased false positives (Type I error inflation).

27
Q

What statistic can test for autocorrelation in serial data?

A

The Durbin-Watson statistic.
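
A minimal sketch of the Durbin-Watson statistic on simulated serial data with AR(1) errors; values near 2 indicate no autocorrelation, while values well below 2 indicate positive autocorrelation:

```python
# Minimal sketch: Durbin-Watson on residuals from a model with AR(1) errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 200
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + rng.normal()  # positively autocorrelated errors
x = rng.normal(size=n)
y = 1 + 0.5 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(fit.resid))  # expected well below 2 here
```
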
28
Q

What are solutions to non-independence of observations?

A

Multilevel modelling or aggregating data within clusters.
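
A minimal sketch of the multilevel-modelling remedy: a random-intercept model for students clustered within schools, fit with statsmodels MixedLM (data and names are simulated for illustration):

```python
# Minimal sketch: random-intercept multilevel model for clustered data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n_schools, per_school = 20, 15
school = np.repeat(np.arange(n_schools), per_school)
u = rng.normal(size=n_schools)  # school-level random intercepts
x = rng.normal(size=n_schools * per_school)
y = 1 + 0.5 * x + u[school] + rng.normal(size=n_schools * per_school)
df = pd.DataFrame({"y": y, "x": x, "school": school})

# groups= tells MixedLM which observations share a cluster
fit = smf.mixedlm("y ~ x", df, groups=df["school"]).fit()
print(fit.summary())
```
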
29
Q

What is the drawback of aggregating data within clusters?

A

It reduces power, because aggregation discards available within-cluster information, even though it controls Type I error.
30
Q

What is multicollinearity in regression?

A

Excessive correlation between predictors.

31
Q

What are the consequences of multicollinearity?

A

Unstable and imprecise estimates of regression coefficients.

32
Q

Does multicollinearity bias overall model predictions (R²)?

A

No, but it impairs interpretation of individual predictors.

33
Q

What correlation levels between predictors indicate problematic multicollinearity?

A

Greater than .8 or .9.

34
Q

What tolerance value indicates problematic multicollinearity?

A

Less than .2 (i.e., the other IVs account for more than 80% of that predictor's variance).

35
Q

What variance inflation factor (VIF) value indicates problematic multicollinearity?

A

Greater than 5.
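
A minimal sketch computing VIF and tolerance (tolerance = 1/VIF) with statsmodels, on simulated predictors where x2 is deliberately built to be collinear with x1:

```python
# Minimal sketch: VIF and tolerance for each predictor; x2 is nearly
# collinear with x1, so it should exceed the VIF > 5 rule of thumb.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.3, size=200),  # deliberately collinear
    "x3": rng.normal(size=200),
})
X = sm.add_constant(X)

for i, name in enumerate(X.columns):
    if name == "const":
        continue  # the intercept's VIF is not meaningful here
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```
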
36
Q

What are remedies for multicollinearity?

A

Combining variables, deleting redundant predictors, using factor analysis, centering or standardising independent variables, or increasing sample size.

37
Q

When can multicollinearity be tolerated in regression analysis?

A

If individual coefficients aren't of interest.

38
Q

What is the recommended attitude toward assumption violations in regression analysis?

A

Understand assumption violations as common and manageable, rather than disqualifying.

39
Q

Why is transparency important in statistical analysis?

A

It ensures credibility and prevents misuse of researcher degrees of freedom to manipulate p-values.

40
Q

What are researchers encouraged to report in regression analyses?

A

Assumption checks and decisions, allowing others to assess the robustness of their findings.