Chapter 12: Assumptions Starblind Flashcards

1
Q

What are two tests that evaluate normality in a distribution?

A

The Kolmogorov–Smirnov (K-S) and Shapiro–Wilk tests.
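
Both tests are available in scipy. A minimal sketch, assuming scipy and numpy are installed and using made-up data; note that the sample is standardized before the K-S test (strictly, estimating the parameters from the data calls for the Lilliefors correction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=2, size=100)  # hypothetical sample

z = (x - x.mean()) / x.std(ddof=1)        # standardize for the K-S test
ks_stat, ks_p = stats.kstest(z, "norm")   # compare against a standard normal
sw_stat, sw_p = stats.shapiro(x)

print(f"K-S:          D = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")
```

In both cases a significant result (p < .05) indicates a significant deviation from normality.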

2
Q

During what kind of analysis is it acceptable not to have homogeneity of variance?

A

When estimating parameters you can bootstrap confidence intervals to compensate for a lack of homoscedasticity, because bootstrapping does not rely on the usual standard-error formula.
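
A minimal sketch of a percentile bootstrap for a regression slope, assuming only numpy and using made-up heteroscedastic data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1 + 0.8 * x + rng.normal(scale=0.3 * x, size=n)  # error spread grows with x

slopes = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)           # resample cases with replacement
    b1, b0 = np.polyfit(x[idx], y[idx], deg=1)
    slopes.append(b1)

lo, hi = np.percentile(slopes, [2.5, 97.5])    # percentile bootstrap CI
print(f"95% bootstrap CI for the slope: [{lo:.3f}, {hi:.3f}]")
```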

3
Q

What is multicollinearity?

A

“…when there is a strong relationship between two or more predictors.”

4
Q

With regard to normality, when should you use the method of least squares?

A

When the distribution is normal; when it is not, the least squares estimates are no longer the optimal ones.

5
Q

b0 represents what in the general linear model?

A

The intercept, or the value of the outcome variable when the predictor variable is zero.
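
For reference, the single-predictor form of the linear model these b parameters come from, written as a math block (standard notation, not quoted from the source):

```latex
% Linear model with one predictor:
% b_0 = intercept (value of Y when X = 0), b_1 = slope, \varepsilon_i = error
Y_i = b_0 + b_1 X_i + \varepsilon_i
```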

6
Q

Why are measures of homoscedasticity and normality fundamentally flawed?

A

Their goal is to detect whether the variances in a data set are equal, or whether a variable is normally distributed.

Because they are based on NHST, their ability to detect a violation depends on statistical power, and power comes largely from sample size.

When sample size is small, i.e. when heteroscedasticity or non-normality matters most, these tests lack the power to detect it.

When sample size is large, i.e. when the central limit theorem makes the sampling distribution approximately normal regardless, these tests have so much power that they flag even trivial deviations as significant.
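
The sample-size point is easy to demonstrate. A hedged sketch, assuming scipy and numpy, drawing from the same mildly skewed (gamma) population at two sample sizes; typically the small sample "passes" Shapiro–Wilk and the large one "fails" it:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in (20, 2000):
    sample = rng.gamma(shape=20.0, size=n)  # close to normal, slightly skewed
    w, p = stats.shapiro(sample)
    print(f"n = {n:4d}: W = {w:.3f}, p = {p:.4f}")
```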

7
Q

What type of variables should our predictor variable contain? How do outcome variables differ?

A

“All predictor variables must be quantitative or categorical (with two categories), and the outcome variable must be quantitative, continuous and unbounded.”

8
Q

The assumption of homogeneity of variance has been violated.

1) What two possible methods could you have calculated this with? Give a brief explanation of how each is interpreted.
2) What does this mean for your analysis?

A

1) Levene’s test: tests whether the variances of the groups are equal. If it is significant (p < .05), the variances are significantly different.

Hartley’s Fmax: the ratio of the largest group variance to the smallest group variance. If it exceeds the critical value, the variances are not equal; in some cases 2 is used as a rough default cut-off. (Both checks are sketched below.)

2) Without homoscedasticity, any formula that relies on the standard error is invalid (confidence intervals and test statistics).
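
A minimal sketch of both checks, assuming scipy and numpy and using three made-up groups, the third with a deliberately larger spread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(loc=10, scale=1.0, size=30)
g2 = rng.normal(loc=10, scale=1.0, size=30)
g3 = rng.normal(loc=10, scale=3.0, size=30)   # noticeably larger variance

# Levene's test: a significant p suggests unequal variances
w, p = stats.levene(g1, g2, g3, center="median")
print(f"Levene W = {w:.3f}, p = {p:.4f}")

# Hartley's Fmax: largest group variance over smallest group variance
variances = [np.var(g, ddof=1) for g in (g1, g2, g3)]
print(f"Fmax = {max(variances) / min(variances):.2f}")
```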

9
Q

As collinearity increases there are three issues that we should be aware of. What are these, and why should we care?

A

1) As collinearity increases, so do the estimated standard errors of the b parameters.

This means the bs in your sample are less likely to be representative of the values in the population. (This is demonstrated in the sketch below.)

2) Multicollinearity limits the fit of the overall model.

Highly correlated predictors account for the same variance in the outcome, so adding the second predictor explains little that the first did not.

3) Multicollinearity between predictors makes it difficult to assess the individual importance of a predictor.

If the predictors are highly collinear, and each accounts for similar variance in the outcome, how can we know which of the two variables is important?
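
A hedged sketch of point 1, assuming statsmodels and numpy: the same model is fitted once with an independent second predictor and once with a nearly duplicate one, and the standard error of b1 inflates in the collinear case:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 200
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

for label, x2 in [
    ("independent x2", rng.normal(size=n)),
    ("collinear x2",   x1 + 0.1 * rng.normal(size=n)),  # nearly a copy of x1
]:
    X = sm.add_constant(np.column_stack([x1, x2]))
    y = 1 + 2 * x1 + 0.5 * x2 + noise
    fit = sm.OLS(y, X).fit()
    print(f"{label}: SE(b1) = {fit.bse[1]:.3f}")
```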

10
Q

What are the best ways to assess additivity and linearity along with homoscedasticity?

A

A scatterplot, typically of the residuals against the fitted (predicted) values.

Linearity: the points should roughly follow a straight, flat band with no curvature.

Homoscedasticity: the spread of the points should be roughly equal across the graph, with no funnel (cone) shape.
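
A minimal sketch of such a plot, assuming numpy and matplotlib and using made-up data whose error spread deliberately grows with x, so the plot shows the tell-tale funnel:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=200)
y = 2 + 0.5 * x + rng.normal(scale=0.2 * x, size=200)  # heteroscedastic errors

b1, b0 = np.polyfit(x, y, deg=1)  # simple least-squares fit
fitted = b0 + b1 * x
residuals = y - fitted

plt.scatter(fitted, residuals, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```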

11
Q

If a parameter estimate is biased, what else would you expect to have error?

A

“Standard errors, confidence intervals, test statistics and p-values.”

12
Q

What are two ways that we measure multicollinearity in a model?

A

1) Variance inflation factor (VIF): measures whether a predictor has a strong linear relationship with the other predictor(s).

VIF > 10: serious multicollinearity
VIF < 10 but > 1: potential multicollinearity

2) Tolerance statistic (1/VIF): the reciprocal of the VIF.

Tolerance < 0.1: serious multicollinearity
Tolerance < 0.2: potential multicollinearity
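
Both statistics can be computed directly. A hedged sketch, assuming statsmodels and numpy, with two deliberately collinear made-up predictors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = x1 + 0.2 * rng.normal(size=200)            # x2 is nearly a copy of x1
X = sm.add_constant(np.column_stack([x1, x2]))  # design matrix with intercept

for i, name in enumerate(["x1", "x2"], start=1):  # skip the constant column
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
```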
13
Q

What is kurtosis?

A

“Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case”
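
A minimal sketch, assuming scipy and numpy: scipy's kurtosis() reports excess kurtosis, so a normal distribution scores near 0, heavy tails score above 0, and a uniform distribution scores below 0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
print(stats.kurtosis(rng.normal(size=100_000)))            # ~ 0    (normal)
print(stats.kurtosis(rng.standard_t(df=5, size=100_000)))  # > 0    (heavy tails)
print(stats.kurtosis(rng.uniform(size=100_000)))           # ~ -1.2 (light tails)
```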

14
Q

b1 represents what in the general linear model?

A

The parameter (slope) for the predictor variable: the change in the outcome associated with a one-unit change in the predictor.

15
Q

Briefly explain the assumption of no external variables.

A

External variables (variables not included in the model) must have no relationship with the model's predictors; in other words, they have been controlled for.

16
Q

The assumption of independent errors has been violated.

1) How did you test for this?
2) What does this mean for your analysis?

A

1) Durbin-Watson test: assesses whether the errors of adjacent (fixed-order) observations are correlated, i.e. whether the errors are independent. Values near 2 suggest independence.
2) Without independent errors, any formula that relies on the standard error is invalid.
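
A minimal sketch, assuming statsmodels and numpy, contrasting independent errors with a strongly autocorrelated series:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
independent = rng.normal(size=200)
autocorrelated = np.cumsum(independent)  # a random walk: adjacent values depend

print(f"independent errors:    DW = {durbin_watson(independent):.2f}")     # ~ 2
print(f"autocorrelated errors: DW = {durbin_watson(autocorrelated):.2f}")  # ~ 0
```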

17
Q

What are the three most important linear model assumptions in order?

A

1) Additivity and linearity.
2) Independent errors.
3) Homoscedasticity.