ML-06 - Model and feature selection Flashcards
ML-06 - Model and feature selection
How do you use validation data to check if your problem is due to high bias vs. high variance?
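Compare the training error (J_train) with the validation error (J_cv):
- High bias: J_train is high, and J_cv is about as high.
- High variance: J_train is low, but J_cv is much higher than J_train.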
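A minimal sketch of this check in Python, assuming you already have the training error `j_train` and validation error `j_cv` as numbers. The `baseline` (an error level you could realistically hope for, e.g. human-level error) and the `gap_tol` threshold are hypothetical knobs, not part of the original card.

```python
def diagnose(j_train, j_cv, baseline=0.0, gap_tol=0.05):
    """Classify a model's problem from its training and validation errors.

    baseline: error level you could realistically hope for (assumption,
              e.g. human-level error).
    gap_tol:  hypothetical threshold for what counts as "much higher".
    """
    high_bias = (j_train - baseline) > gap_tol   # training error itself is too high
    high_variance = (j_cv - j_train) > gap_tol   # large train/validation gap
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfit)"
    if high_variance:
        return "high variance (overfit)"
    return "good bias/variance trade-off"


print(diagnose(j_train=0.15, j_cv=0.16, baseline=0.02))  # high bias (underfit)
print(diagnose(j_train=0.02, j_cv=0.20, baseline=0.02))  # high variance (overfit)
```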
Which area is high bias and which is high variance?
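On a plot of error against model complexity (e.g. polynomial degree), the low-complexity side, where training and validation error are both high, is high bias; the high-complexity side, where training error is low but validation error rises, is high variance.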
If you have high bias, is your model underfit/overfit?
Underfit
If you have high variance, is your model underfit/overfit?
Overfit
If your model is underfit, do you have high bias or variance?
High bias
If your model is overfit, do you have high bias or variance?
High variance
What is the first tool to try for overfitting problems?
Regularization
What does regularization prevent?
Overfitting.
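For reference, one common form of the L2-regularized cost, assuming linear regression with squared error, m training examples and n features; the bias b is conventionally left unregularized. Larger values of the regularization parameter λ push the weights w_j toward zero.

```latex
J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}\big(\mathbf{x}^{(i)}\big) - y^{(i)} \right)^2
                 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
```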
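Describe the bias/variance trade-off as a function of the regularization parameter lambda.
Small lambda: little regularization, so the model can fit the training data too closely (high variance, overfit). Large lambda: the weights are pushed toward zero, so the model underfits (high bias). An intermediate lambda, chosen by the lowest validation error, gives the best trade-off.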
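A toy sketch of sweeping lambda, assuming a single feature and a linear model without intercept, so that the regularized fit has the closed form w = Σxy / (Σx² + λ). With only one feature it shows the high-bias side (the weight shrinks toward 0 and the training error grows as λ increases) plus choosing λ by validation error; `ridge_1d` and `sweep_lambda` are hypothetical helper names and the data is made up.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form minimizer of (1/2m)*sum((w*x - y)^2) + (lam/2m)*w^2.

    One feature, no intercept (simplifying assumption):
    w = sum(x*y) / (sum(x^2) + lam).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cost(w, xs, ys):
    """Unregularized squared-error cost, used to report J_train and J_cv."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def sweep_lambda(x_tr, y_tr, x_cv, y_cv, lambdas):
    """Fit one model per lambda; return (lam, w, J_train, J_cv) rows and the best lambda."""
    rows = []
    for lam in lambdas:
        w = ridge_1d(x_tr, y_tr, lam)
        rows.append((lam, w, cost(w, x_tr, y_tr), cost(w, x_cv, y_cv)))
    best_lam = min(rows, key=lambda row: row[3])[0]  # lowest validation error
    return rows, best_lam


rows, best = sweep_lambda([1, 2], [3, 4], [1, 2], [2, 4], [0, 0.5, 10])
for lam, w, j_train, j_cv in rows:
    print(f"lambda={lam:<4} w={w:.3f} J_train={j_train:.3f} J_cv={j_cv:.3f}")
print("best lambda:", best)  # 0.5
```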
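Describe what the error vs. training set size curves look like for a good bias/variance trade-off.
Training error starts near zero and rises as the training set grows, while validation error starts high and falls. Both converge to a low error, with only a small gap between them.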
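Describe what the error vs. training set size curves look like for high bias.
Training error rises and validation error falls as the training set grows, but both flatten out at a high error, close to each other and well above the desired performance. Adding more training data does not help much.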
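Describe what the error vs. training set size curves look like for high variance.
Training error stays low while validation error is much higher, leaving a large gap between the curves. The gap narrows as the training set grows, so getting more training data is likely to help.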
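A sketch of how such learning curves can be computed, assuming a one-feature linear model fitted by ordinary least squares, with training error J_train measured on the first m examples and validation error J_cv on a fixed validation set. The demo data is synthetic and hypothetical: the target is quadratic, so a straight line should show the high-bias shape (J_train and J_cv approach a similar, high plateau as m grows).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b (closed form, one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx if sxx > 0 else 0.0
    return w, my - w * mx


def mse(w, b, xs, ys):
    """Squared-error cost J = (1/2m) * sum of squared residuals."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def learning_curve(x_train, y_train, x_cv, y_cv, sizes):
    """Return (m, J_train, J_cv) for models trained on the first m examples."""
    curve = []
    for m in sizes:
        w, b = fit_line(x_train[:m], y_train[:m])
        curve.append((m, mse(w, b, x_train[:m], y_train[:m]), mse(w, b, x_cv, y_cv)))
    return curve


if __name__ == "__main__":
    rng = random.Random(0)

    def sample(n):
        xs = [rng.uniform(-3, 3) for _ in range(n)]
        # Hypothetical data: quadratic target, so a straight line underfits.
        ys = [x * x + rng.gauss(0, 0.3) for x in xs]
        return xs, ys

    x_tr, y_tr = sample(200)
    x_cv, y_cv = sample(100)
    for m, j_train, j_cv in learning_curve(x_tr, y_tr, x_cv, y_cv, [2, 5, 10, 50, 200]):
        print(f"m={m:4d}  J_train={j_train:.3f}  J_cv={j_cv:.3f}")
```

Plotting J_train and J_cv against m (e.g. with a plotting library of your choice) gives the curves described in the three cards above.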
What should you try if you have high variance? (3)
- Get more training data
- Try a smaller set of features (or a smaller neural network)
- Try increasing the regularization parameter lambda
What should you try if you have high bias? (3)
- Get additional features
- Feature engineering, e.g. add polynomial features
- Try decreasing the regularization parameter lambda
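A small sketch of polynomial feature expansion in plain Python, assuming a feature vector given as a list of numbers. It appends every product of up to `degree` features (e.g. x1², x1·x2, x2²); `polynomial_features` is a hypothetical helper name.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree):
    """Expand a feature vector with all monomials up to `degree` (no bias term)."""
    out = []
    for d in range(1, degree + 1):
        # Each combination of feature indices (with repetition) is one monomial.
        for combo in combinations_with_replacement(range(len(row)), d):
            out.append(prod(row[i] for i in combo))
    return out


print(polynomial_features([2, 3], 2))  # [2, 3, 4, 6, 9]  i.e. x1, x2, x1^2, x1*x2, x2^2
```

The number of features grows quickly with the degree, which is one reason this remedy for high bias can push a model toward high variance.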
What are the 3 steps of the ML design guideline?
1) Start with a small model (baseline) that’s quick to implement.
2) Decide whether more data or more features will help (guided by learning curves).
3) Perform error analysis: manually examine the examples where the model made errors.
In the ML design guideline, how do you perform error analysis? (3)
- Look at the examples your model predicted incorrectly
- Look for systematic trends in the type of errors made
- Hypothesize what cues (features) could have helped
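A sketch of the bookkeeping behind error analysis, assuming the classifier's predictions are already available. `tag_fn` is a hypothetical function you write by hand after looking at the errors; it returns the categories an example falls into, and counting those categories shows which systematic trends are most common. The spam-filter data is made up for illustration.

```python
from collections import Counter


def error_analysis(examples, predictions, labels, tag_fn):
    """Return the misclassified examples and error categories ranked by count.

    tag_fn(example) -> list of category names (hypothetical, hand-written).
    One example may fall into several categories.
    """
    wrong = [ex for ex, pred, y in zip(examples, predictions, labels) if pred != y]
    counts = Counter(tag for ex in wrong for tag in tag_fn(ex))
    return wrong, counts.most_common()


if __name__ == "__main__":
    # Hypothetical spam-filter data: 1 = spam, 0 = not spam.
    emails = ["buy n0w", "hi mom", "click http://x.example", "meeting at 3"]
    preds = [0, 0, 0, 0]
    labels = [1, 0, 1, 0]

    def tags(email):
        found = []
        if "http" in email:
            found.append("contains_url")
        if any(c.isdigit() for c in email):
            found.append("digit_in_text")
        return found

    wrong, ranked = error_analysis(emails, preds, labels, tags)
    print(wrong)   # the two spam emails the model missed
    print(ranked)  # category counts, most frequent first
```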
What is feature selection in ML?
Choosing which features to keep and which to drop, e.g. because they are redundant or not correlated with the labels (such as an ID column).
What is the goal of feature selection?
Find an optimal set of features that results in a “best model” for a problem.
What are the 4 classes of feature selection methods mentioned?
- Filter methods
- Wrapper methods
- Embedded methods
- Dimensionality reduction methods
Describe the gist of feature selection filter methods.
From the set of all features, select a subset based on a selection criterion, e.g. the correlation coefficient.
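A minimal filter-method sketch. To keep it short it uses a different, simpler selection criterion than the correlation coefficient: it drops features whose variance is at or below a threshold, since a constant column cannot help any model. The column names and the threshold are hypothetical.

```python
from statistics import pvariance


def variance_filter(columns, threshold=0.0):
    """Keep the names of features whose population variance exceeds `threshold`.

    columns: dict mapping feature name -> list of values.
    """
    return [name for name, values in columns.items() if pvariance(values) > threshold]


data = {
    "size": [50, 60, 80, 100],
    "country_code": [1, 1, 1, 1],  # constant column: carries no information
    "age": [30, 5, 20, 10],
}
print(variance_filter(data))  # ['size', 'age']
```

The criterion is computed from the data alone, without training a model, which is what distinguishes filter methods from wrapper methods.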
If you use a selection criterion to reduce a set of features to a subset, what type of feature selection method is that?
Filter methods
What does the correlation coefficient measure?
The strength and direction of the linear relationship between two variables, e.g. between two features or between a feature and the output label.
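For two variables x and y observed over m samples, the (Pearson) correlation coefficient is usually defined as below. It ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), with 0 meaning no linear relationship.

```latex
r_{xy} = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2}}
```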
How would you use the correlation coefficient to filter out unnecessary features?
If two features are highly correlated with each other, they carry largely the same information, so you can drop one of them.
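A greedy sketch of this idea, assuming features are stored as a dict of equal-length lists: a feature is kept only if its absolute Pearson correlation with every already-kept feature is below a threshold. The 0.9 default and the demo columns are hypothetical; which member of a correlated pair survives depends on column order.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: correlation is undefined, treat as 0
    return cov / sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| < threshold against every feature kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


data = {
    "size_m2": [50, 60, 80, 100],
    "size_ft2": [538, 646, 861, 1076],  # same information in different units
    "age": [30, 5, 20, 10],
}
print(drop_correlated(data))  # ['size_m2', 'age']
```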
Describe how wrapper methods work.
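Search over feature subsets, using the model's own performance to judge each subset:
- Loop until a stopping criterion is met:
  - Generate a candidate feature set
  - Test its performance (train and evaluate the model on that set)
  - Select the best performer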
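One common instance of a wrapper method is greedy forward selection, sketched below under the assumption that `score(subset)` trains your model on that feature subset and returns a validation metric where higher is better. The stopping criterion used here (stop when no added feature improves the score, or when `max_features` is reached) is one choice among several, and the toy score function is hypothetical.

```python
def forward_selection(features, score, max_features=None):
    """Greedy forward selection, one common kind of wrapper method.

    score(subset) is assumed to train the model on `subset` and return a
    validation metric where higher is better.
    """
    selected, remaining = [], list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Generate candidate sets (current selection plus one new feature each)
        # and test each candidate's performance with the supplied score function.
        scored = [(score(selected + [f]), f) for f in remaining]
        # Select the best performer (the first one wins on ties).
        cand_score, cand_feature = max(scored, key=lambda pair: pair[0])
        # Stopping criterion: no candidate improves on the current best score.
        if cand_score <= best_score:
            break
        selected.append(cand_feature)
        remaining.remove(cand_feature)
        best_score = cand_score
    return selected, best_score


# Hypothetical score: "a" and "b" help, any other feature slightly hurts.
usefulness = {"a": 5, "b": 3}


def toy_score(subset):
    return sum(usefulness.get(f, -1) for f in subset)


print(forward_selection(["a", "b", "c"], toy_score))  # (['a', 'b'], 8)
```

Because every candidate requires training and evaluating a model, wrapper methods are usually much more expensive than filter methods.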