Lecture 3: Regression Assumptions Alt 2 Flashcards
(40 cards)
Why are statistical assumptions necessary in regression-based analyses?
To make inferential techniques tractable and generalizable, given the impossibility of designing tests for every possible data configuration.
When are violations of regression assumptions not necessarily cause for concern?
Particularly in the case of distributional violations in large samples.
What is the assumption of linearity in multiple regression?
The relationship between independent and dependent variables is assumed to be linear.
What happens if the true relationship is nonlinear in a regression model?
A linear model may misrepresent it or underestimate the association.
How can nonlinearity be modelled in a linear regression framework?
Using transformations or quadratic terms.
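A minimal sketch of the quadratic-term approach, assuming illustrative simulated data (the variable names and coefficients below are made up, not from the lecture):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a curvilinear relationship for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = 1 + 0.5 * x + 0.8 * x**2 + rng.normal(0, 1, 200)
df = pd.DataFrame({"x": x, "y": y})

# The model stays linear in its coefficients, so ordinary OLS applies;
# I(x**2) adds the quadratic term inside the formula.
fit = smf.ols("y ~ x + I(x**2)", data=df).fit()
print(fit.params)
```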
What does the assumption of normality of residuals entail?
Residuals are assumed to be normally distributed.
How important is the normality of residuals for inference in large samples?
Largely unimportant for inference, especially in samples larger than about 10.

Do violations of residual normality bias regression estimates or p-values significantly?
No, simulation studies show regression remains robust across a variety of skewed distributions.
What is a downside of using non-parametric tests to address normality violations?
They often introduce more problems than they solve (e.g., they test a different hypothesis than the mean-based model and make effect sizes harder to interpret).
What are univariate outliers and how are they defined?
Extreme values on a single variable, commonly defined as scores more than 3 (or, more stringently, 3.29, corresponding to p < .001) standard deviations from the mean.
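A quick sketch of flagging cases by the |z| > 3.29 rule (the data here are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = np.append(rng.normal(50, 10, 500), 120.0)  # one planted extreme value

z = stats.zscore(scores)
print(np.where(np.abs(z) > 3.29)[0])  # indices of univariate outliers
```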
What are multivariate outliers in regression?
Unusual combinations of scores across variables.
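The card does not name a detection method; one common choice (an assumption here, not stated in the lecture) is Mahalanobis distance, which is large for unusual combinations of scores even when each score looks ordinary on its own:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], 300)
X = np.vstack([X, [2.5, -2.5]])  # plausible on each variable alone, unusual jointly

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", X - mu, cov_inv, X - mu)  # squared Mahalanobis distance

# Compare against a chi-square quantile with df = number of variables.
cutoff = stats.chi2.ppf(0.999, df=X.shape[1])
print(np.where(d2 > cutoff)[0])
```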
What is Cook’s distance used for?
To quantify how much the regression changes when a data point is removed.
What value of Cook’s distance indicates a very influential or possibly problematic point?
Greater than 1.
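A sketch of computing Cook's distance with statsmodels (data simulated so that the last point has both high leverage and a large residual):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.append(rng.normal(0, 1, 100), 4.0)                # last point: high leverage
y = np.append(2 * x[:-1] + rng.normal(0, 1, 100), -5.0)  # ...and far off the trend

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance
print(np.where(cooks_d > 1)[0])  # cases flagged by the D > 1 rule of thumb
```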
What should be done with influential data points in regression analysis?
They should be examined for accuracy, and their impact reported if they substantively affect conclusions.
Under what conditions are extreme values less problematic due to the Central Limit Theorem?
When they are real and the sample size is large.
What should be done with extreme values in small sample sizes?
The regression should be run both including and excluding them, and this should be reported to the reader.
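A sketch of that sensitivity analysis: fit the model with and without the extreme case and report both sets of estimates (data simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = np.append(rng.normal(0, 1, 30), 4.0)   # small sample; extreme case is the last row
y = np.append(2 * x[:-1] + rng.normal(0, 1, 30), -5.0)
X = sm.add_constant(x)

full = sm.OLS(y, X).fit()
keep = np.arange(len(y)) != len(y) - 1     # drop the extreme case
trimmed = sm.OLS(y[keep], X[keep]).fit()

# Report both so the reader can judge the impact of the extreme case.
print("including:", full.params)
print("excluding:", trimmed.params)
```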
What is the assumption of homoskedasticity in regression?
Equal variance of residuals across all values of the independent variable(s) (predictors).
What can violations of homoskedasticity (heteroskedasticity) lead to?
Inflated Type I error rates (through biased standard errors) and, when the heteroskedasticity reflects model misspecification, biased coefficient estimates.
How can heteroskedasticity be tested?
Via residual plots or formal tests like the White test.
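A sketch of the White test via statsmodels' het_white, on data simulated so that residual spread grows with the predictor:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 200)
y = 2 * x + rng.normal(0, x, 200)  # error variance increases with x

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(fit.resid, X)
print(lm_pvalue)  # a small p-value rejects homoskedasticity
```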
In what kinds of data is heteroskedasticity more likely to occur?
In highly skewed data with very long tails.
What are some causes of heteroskedasticity?
Un-modelled variables (e.g., a moderator whose effect differs across levels of the IV) and nonlinear effects.
What are remedies for heteroskedasticity?
Transforming the dependent variable, modelling potential moderating variables, or applying heteroskedasticity-consistent standard errors.
What does applying heteroskedasticity-consistent standard errors correct?
Only inference issues, not coefficient bias.
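In statsmodels this is a one-argument change at fit time; note that the point estimates are identical and only the standard errors move, matching the card (data simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 200)
y = 2 * x + rng.normal(0, x, 200)  # heteroskedastic errors
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                    # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC3")   # heteroskedasticity-consistent errors

print(ols.params, robust.params)  # identical coefficient estimates
print(ols.bse, robust.bse)        # only the standard errors differ
```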
What is arguably the most critical assumption in regression analysis?
Independence of observations.
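The card names no diagnostic; for serially ordered data, one common check (an addition here, not from the lecture) is the Durbin-Watson statistic, which sits near 2 when residuals are uncorrelated:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
x = np.arange(100.0)
e = np.zeros(100)
for t in range(1, 100):              # AR(1) errors violate independence
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 2 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(fit.resid))      # well below 2 signals positive autocorrelation
```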