Model Selection Flashcards

(36 cards)

1
Q

What is the traditional strategy?

A

No model selection. You run one model, assume every continuous variable has a linear relationship, and include all interactions.

2
Q

Problems with traditional strategy?

A

Multicollinearity is not assessed, the data may be overfit, and not all relationships are linear.

3
Q

Historical strategy

A

Based on past models that ran just fine. One advantage is that you can compare results with past studies, because you include the same predictors and the same variables.

4
Q

Problems with historical designs

A

You may ignore variables that were not initially considered, and the design may eventually become insufficient.

5
Q

The exploratory approach

A

A wide range of models is run, and usually the one that "worked" is the one reported.

6
Q

Problems with the exploratory strategy?

A

Many experimenter degrees of freedom; the results can be too flexible and can lead to replication problems. Use a more conservative model selection approach instead (AIC and BIC are always safe bets).

7
Q

Don’t do this when labeling analyses

A

Call an exploratory analysis confirmatory.

8
Q

Theoretically driven approach

A

Only a limited number of models are run here (three at most), and each has to be theoretically driven. It is very systematic and highly focused, and less prone to overfitting.

9
Q

Problems with theoretically driven?

A

It can miss analyses of variables you have already collected. It is okay to include both kinds of models; just report everything. But to confirm a finding, you should run an additional experiment and analyze it as confirmatory.

10
Q

What is the mixed strategy?

A

Report which models were exploratory, theoretically driven, etc., each in its appropriate section. You can segment the analysis into different parts, which lessens the overfitting issue.

11
Q

Disadvantages of mixed strategy?

A

Researcher degrees of freedom

12
Q

P-focused strategy

A

Choose the model that has your key variables significant.

13
Q

Interocular Test

A

You look at the results, and it hits you between the eyes. Plot your data, which says a lot; the statistics are more like an afterthought. Worry if the graph shows no differences but the analysis is telling you otherwise.
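
A minimal "plot it first" sketch, assuming matplotlib; the two conditions and their values are simulated, purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
a = rng.normal(10, 2, size=50)   # condition A (made up)
b = rng.normal(13, 2, size=50)   # condition B (made up)

# If the difference is real, it should hit you between the eyes.
plt.boxplot([a, b])
plt.xticks([1, 2], ["condition A", "condition B"])
plt.ylabel("outcome")
plt.show()
```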

14
Q

What happens with R-squared?

A

The proportion of variance accounted for. The mean appears in the formula used to calculate it, and the mean is a measure of central tendency for a NORMAL DISTRIBUTION. That is why, in models where normality is not met, R-squared is not a good model-fit indicator.
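
For reference, the standard formula (not written out on the card) shows exactly where the mean enters:

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

The grand mean sits in the denominator, which is why R-squared inherits the mean's distributional assumptions.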

15
Q

RMSE

A

The square root of the mean squared error (the averaged squared errors from the SS formula). In other words, it is a measure of the average distance of any point from the model's predictions. But it does not in any way deal with complexity.
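
As a formula, with ŷᵢ the model's prediction for observation i:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```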

16
Q

Mallow’s Cp

A

In JMP, this appears when you run a stepwise regression. It is another index, but it is not common.
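
For reference, one common form of the statistic (not given on the card), for a candidate model with p parameters:

```latex
C_p = \frac{SSE_p}{\hat{\sigma}^2} - n + 2p
```

where SSE_p is the candidate model's residual sum of squares and the error variance in the denominator is estimated from the full model.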

17
Q

The AIC and BIC

A

The Akaike and Bayesian Information Criteria. Both penalize for complexity, but BIC penalizes it more heavily and therefore favors simpler models over complex ones.
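
The usual definitions, with k estimated parameters, sample size n, and maximized likelihood L̂:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln(n) - 2\ln\hat{L}
```

Because ln(n) exceeds 2 once n ≥ 8, BIC's complexity penalty is heavier, which is why it favors simpler models.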

18
Q

Likelihood Ratios

A

When comparing one model to another, we talk about how much more likely one model is to fit the data than the other. It gives a ratio: one model is 10 times more likely, 35 times more likely, and so on.
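
In symbols, for two candidate models with maximized likelihoods L₁ and L₂:

```latex
\Lambda = \frac{\mathcal{L}_1}{\mathcal{L}_2}
```

A ratio of 10 reads as "model 1 is 10 times more likely than model 2, given these data."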

19
Q

AIC / BIC / Likelihood ratios

A

Since AIC and BIC are based on log-likelihoods, you can include them in the model-fit inspection, or you can transform AICs and BICs back to the likelihood scale to make them more interpretable (see the sketch below).
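
A minimal sketch of one such back-transformation, the Burnham–Anderson-style evidence ratio exp(ΔAIC/2), which turns an AIC difference back into a likelihood-ratio-like number; the function name and the AIC values are made up for illustration:

```python
import math

def evidence_ratio(aic_a: float, aic_b: float) -> float:
    """Roughly how many times more likely model A is than model B,
    given their AIC values (lower AIC = better)."""
    # exp(delta_AIC / 2) converts the AIC difference back to a
    # likelihood-ratio-like scale.
    return math.exp((aic_b - aic_a) / 2)

# Hypothetical AIC values for two candidate models:
print(evidence_ratio(102.3, 107.1))  # ~11x in favor of model A
```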

20
Q

Cross validation

A

Uses two kinds of data: sample (training) data to develop a model, and a test dataset to see whether that model also applies to new data. This is an empirical way to determine whether the model will perform well outside the original sample. If it does, it cross-validates.
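
A minimal sketch of the two-dataset idea, assuming scikit-learn and simulated data (neither is specified on the card):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # simulated predictors
y = X @ [1.5, -2.0, 0.5] + rng.normal(size=200)    # simulated outcome

# Develop the model on the sample (training) data...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# ...then see whether it also applies to the held-out test data.
print("train R^2:", model.score(X_train, y_train))
print("test  R^2:", model.score(X_test, y_test))
```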

21
Q

LOO

A

Leave-one-out cross-validation. It holds one row of your data out; after the model is created, it checks how well the model does on that held-out row, and it repeats this for every single row in your dataset.
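
A minimal sketch using scikit-learn's LeaveOneOut (an assumed tool, not named on the card): the model is refit n times, each time predicting the single row that was held out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = X @ [2.0, -1.0] + rng.normal(size=30)   # simulated data

# One squared error per row, each from a model that never saw that row.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print("LOO RMSE:", np.sqrt(-scores.mean()))
```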

22
Q

Bootstrapping

A

Sampling with replacement, treating the sample distribution as the population. It gives you an empirically estimated distribution. Some people use it to make up for a small sample size.
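
A minimal resampling sketch (the skewed sample is made up): draw with replacement from the sample as if it were the population, and look at the resulting empirical distribution of the statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(scale=3.0, size=40)   # a skewed sample

# Resample with replacement many times, recomputing the mean each time.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(5000)
])

# The empirically estimated distribution gives a SE and a 95% interval.
print("bootstrap SE:", boot_means.std())
print("95% CI:", np.percentile(boot_means, [2.5, 97.5]))
```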

23
Q

When do you use bootstrapping

A

When you have very bad distributional properties, so bad that there is no standard statistical test to determine the differences. This happens, for example, with interquartile ranges.

24
Q

Where could you use this?

A

In multilevel repeated-measures designs when you have slight imbalances, for example when some values are missing within each condition. It does not apply if a whole level is missing.

25
Q

Another application of bootstrapping

A

You can bootstrap your R-squared values, which is great because you now attach an uncertainty estimate to the R-squared and can make more informed decisions (sketched below).
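
A minimal sketch of that idea (the data, the model, and the replicate count are all assumptions): refit on each resample and collect the R-squared values to get an interval around the point estimate.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = X @ [1.0, 0.5] + rng.normal(size=100)   # simulated data

r2s = []
for _ in range(2000):
    idx = rng.integers(0, len(y), size=len(y))   # resample rows with replacement
    fit = LinearRegression().fit(X[idx], y[idx])
    r2s.append(fit.score(X[idx], y[idx]))

print("R^2 95% interval:", np.percentile(r2s, [2.5, 97.5]))
```
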
26
Q

Weakness of R-squared

A

It isn’t really appropriate when the errors aren’t normally distributed. R-squared and RMSE do not control for model flexibility.
27
Q

Weaknesses of AIC and BIC

A

They can only compare models whose outcome is on the same scale in both models (e.g., log vs. log). They are very sensitive to missing data: be careful if a variable has missing values, because models fit to different rows are no longer comparable. You cannot compare an untransformed model against a transformed one.
28
Q

Overfitting

A

Driven by sample sizes that are too small. Favor robustness over p-values.
29
Q

Bias-variance tradeoff

A

With cross-validation we are estimating how big the gap is between training and test performance. Too small a gap can mean underfitting; too large a gap can mean overfitting.
30
Q

What is bias?

A

The degree to which the model does not fit the training data... makes sense.
31
Q

What is variance?

A

The size of the difference between testing and training performance (see the sketch below).
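
A minimal sketch of that gap (everything simulated): a too-simple model and a too-flexible model bracket a reasonable one, and the train-vs-test difference shows which side you are on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=80)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

for degree in (1, 3, 12):   # underfit, about right, overfit
    m = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    m.fit(x_tr, y_tr)
    print(degree, "train R^2:", round(m.score(x_tr, y_tr), 2),
          "test R^2:", round(m.score(x_te, y_te), 2))
```
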
32
Q

Ideal model

A

The one that minimizes both variance and bias.
33
Q

Cross-validation vs. AIC/BIC

A

When you don't have enough data to cross-validate, you can opt for AIC/BIC.
34
Q

Statistical comparisons

A

A statistical comparison of the two models. The chi-square (likelihood-ratio) test is the most common; it tests whether fit improves as a function of increased flexibility. But one model has to be nested within the other (sketched below).
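
A minimal nested-comparison sketch, assuming statsmodels and simulated data (the card names only the chi-square idea):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
x1, x2 = rng.normal(size=(2, 200))
y = 1.0 + 2.0 * x1 + 0.8 * x2 + rng.normal(size=200)

# The reduced model's predictors are a subset of the full model's.
reduced = sm.OLS(y, sm.add_constant(x1)).fit()
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# 2 * (difference in log-likelihoods) ~ chi-square, df = extra parameters.
lr = 2 * (full.llf - reduced.llf)
df = full.df_model - reduced.df_model
print("LR =", round(lr, 2), "p =", stats.chi2.sf(lr, df))
```
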
35
Q

What does nested mean?

A

A model is nested within another when its predictors are a subset of the other model's predictors.
36
Q

Check model is a lifesaver

A

Because when AICs are not okay (the two scales of y are different), when your models are not nested, or when you don't have enough data to cross-validate, check model is still enough to compare between models.