Model Concepts Flashcards
What is model complexity?
A measure of a model's flexibility, i.e. its capacity to capture underlying patterns in the data
Linear/polynomial regression models often measure model complexity as…
The degree of the polynomial
Machine learning models often measure model complexity as…
The number of parameters in the model
A good time to stop increasing model complexity is when…
Cross-validation error starts to increase
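A minimal sketch of this stopping rule, using scikit-learn's validation_curve on toy data invented for illustration (a noisy cubic): the cross-validation error falls and then rises again as the polynomial degree grows.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve

# Toy data: a noisy cubic relationship (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(0, 3, size=200)

# Fit polynomial regressions of increasing degree, scoring each with 5-fold CV
model = make_pipeline(PolynomialFeatures(), LinearRegression())
degrees = range(1, 11)
train_scores, cv_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree",
    param_range=degrees,
    scoring="neg_mean_squared_error",
    cv=5,
)

# Stop increasing complexity once the CV error starts to rise again
for d, s in zip(degrees, cv_scores.mean(axis=1)):
    print(f"degree={d}: CV MSE={-s:.2f}")
```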
What is bias?
The tendency of a model to miss the true relationship, i.e. to be systematically inaccurate
What is variance?
The tendency of a model to be inconsistent, i.e. for its predictions to change when it is trained on different samples of the data
If we have high bias in the model, it will fail to…
Accurately capture the relationship between the features and the outcome variable; however, it will be wrong consistently
If we have high variance in the model, but low bias, the model will…
Properly identify the relationship between the features and the outcome variable, but will also fit the random noise in the training data
What is irreducible error?
Random noise in the data points that no model can capture, however good; it is typical of real-world data (e.g. measurement error)
What is the bias-variance tradeoff?
Model adjustments that decrease bias often increase variance, and vice versa; the tradeoff is therefore analogous to a complexity tradeoff
Lower degrees of complexity cause [bias/variance], while higher degrees cause [bias/variance].
Bias, variance
What is shrinkage/regularisation?
Adding a small adjustable regularisation parameter to the cost function that scales a penalty proportional to the size of the model's parameters, thereby penalising more complex models
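In symbols, a generic sketch of the regularised cost, assuming a squared-error data loss (the usual regression setting; λ is the adjustable regularisation strength and the penalty term depends on the norm chosen):

```latex
J(\theta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \, \mathrm{Penalty}(\theta)
```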
What issue does regularisation solve, and why?
It gives direct control over the bias-variance tradeoff: a higher regularisation strength yields an effectively simpler model, adding bias, while a lower strength allows a more complex model, adding variance
What is ridge regression (or L2 regularisation)?
The penalty is applied proportionally to squared coefficient values
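Under the same squared-error assumption as above, the ridge cost adds the sum of squared coefficients:

```latex
J(\theta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \theta_j^2
```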
How can we find the best regularisation parameter?
Cross-validation: evaluate each candidate regularisation parameter across the validation folds and keep the one with the lowest average validation error
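A minimal sketch using scikit-learn's RidgeCV on made-up data; the candidate alphas and their range are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Hypothetical data, for illustration only
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# RidgeCV evaluates each candidate regularisation strength (alpha) with
# cross-validation and keeps the one with the lowest validation error
alphas = np.logspace(-3, 3, 13)
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("best alpha:", model.alpha_)
```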
What is LASSO (or L1 regularisation)?
The penalty is applied proportionally to absolute coefficient values
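The LASSO cost, again assuming a squared-error data loss, swaps in absolute values:

```latex
J(\theta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \lvert \theta_j \rvert
```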
What is the difference between L1 and L2 regularisation?
Both shrink coefficients towards zero in different ways: L2 shrinks them smoothly but rarely to exactly zero, while L1 can set some coefficients exactly to zero, giving sparser (though sometimes less stable) solutions
Regularisation can perform feature selection by…
Shrinking some features’ contributions to zero
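A sketch comparing the two penalties on synthetic data (make_regression with only a few informative features; the alpha values are arbitrary): LASSO typically zeroes out the uninformative coefficients, while ridge only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data where only 3 of 10 features are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# L1 drives uninformative coefficients exactly to zero (feature selection);
# L2 only shrinks them towards zero
print("ridge zero coefficients:", np.sum(ridge.coef_ == 0))
print("lasso zero coefficients:", np.sum(lasso.coef_ == 0))
print("features kept by lasso:", np.flatnonzero(lasso.coef_))
```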
What is feature selection?
Selecting only the most important features from your data, deleting the rest
How can we perform efficient feature selection via cross-validation?
Removing features one at a time and measuring cross-validated predictive performance; if removing a feature improves or doesn't change the results, it can be dropped
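A sketch of this backward-elimination loop using scikit-learn's cross_val_score; the data and the greedy drop-one-feature-at-a-time strategy are illustrative assumptions, not the only way to do it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: 8 features, only 4 of them informative
X, y = make_regression(n_samples=150, n_features=8, n_informative=4,
                       noise=10.0, random_state=0)

def cv_error(features):
    """Mean cross-validated MSE using only the given feature columns."""
    scores = cross_val_score(LinearRegression(), X[:, features], y,
                             scoring="neg_mean_squared_error", cv=5)
    return -scores.mean()

kept = list(range(X.shape[1]))
improved = True
while improved and len(kept) > 1:
    improved = False
    baseline = cv_error(kept)
    for f in list(kept):
        trial = [k for k in kept if k != f]
        # Drop the feature if removing it improves or doesn't hurt CV error
        if cv_error(trial) <= baseline:
            kept = trial
            improved = True
            break

print("selected features:", kept)
```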
What is gradient descent?
An iterative approach to fitting a machine learning model by repeatedly adjusting its weights to reduce the loss calculated from the loss function
Why is gradient descent better than grid search and random sampling for finding optimal parameters?
Random sampling and grid search evaluate many parameter settings without using any information about the error surface, so they take too long to converge (grid search just covers the space more uniformly); gradient descent follows the gradient directly towards lower loss
How does gradient descent minimise the loss of a model?
We start at a random point in parameter space and calculate the error. We then adjust the parameters in the opposite direction of the gradient of the loss with respect to those parameters, repeating until the error stops decreasing
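A minimal NumPy sketch of these steps for a one-feature linear model; the data, learning rate, and step count are invented for illustration.

```python
import numpy as np

# Hypothetical data: y = 3x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3 * x + 1 + rng.normal(0, 0.1, 100)

# Start at a random point in parameter space (slope w, intercept b)
w, b = rng.normal(size=2)
lr = 0.1  # learning rate: the size of each adjustment step

for step in range(500):
    y_hat = w * x + b
    # Gradient of the mean squared error with respect to each parameter
    grad_w = 2 * np.mean((y_hat - y) * x)
    grad_b = 2 * np.mean(y_hat - y)
    # Step opposite the gradient, i.e. in the direction of decreasing loss
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w={w:.3f}, b={b:.3f}")  # should approach w=3, b=1
```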
What are the L1 and L2 norms?
Two ways of measuring the size of an error vector: the L1 norm is the sum of absolute errors, while the L2 norm (also known as the Euclidean distance) is the square root of the sum of squared errors.
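A quick numeric check of both norms with NumPy (the residual vector is made up):

```python
import numpy as np

errors = np.array([1.0, -2.0, 2.0])  # hypothetical residuals

l1 = np.sum(np.abs(errors))        # |1| + |-2| + |2| = 5
l2 = np.sqrt(np.sum(errors ** 2))  # sqrt(1 + 4 + 4) = 3
print(l1, l2)

# np.linalg.norm computes the same quantities directly
assert l1 == np.linalg.norm(errors, ord=1)
assert l2 == np.linalg.norm(errors, ord=2)
```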