Model Concepts Flashcards

(43 cards)

1
Q

What is model complexity?

A

A measure of how well the model can capture underlying patterns in the data

2
Q

Vector/linear regression models often measure model complexity as…

A

The polynomial degree

3
Q

Machine learning models often measure model complexity as…

A

The number of parameters in the model

4
Q

A good time to stop increasing model complexity is when…

A

Cross validation error starts to increase

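A minimal sketch of this idea in Python (the toy data and scikit-learn pipeline below are illustrative assumptions, not from the deck): fit models of increasing polynomial degree and keep the degree at which cross-validation error stops improving.

```python
# Sketch: pick the polynomial degree at which cross-validation error stops improving.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)  # noisy toy data

cv_errors = []
for degree in range(1, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # cross_val_score returns negative MSE by convention, so flip the sign to get an error
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    cv_errors.append(mse)

best_degree = int(np.argmin(cv_errors)) + 1  # the complexity just before CV error starts to rise
print(best_degree, cv_errors)
```
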
5
Q

What is bias?

A

The tendency to miss or be inaccurate

6
Q

What is variance?

A

The tendency to be inconsistent

7
Q

If we have high bias in the model, it will fail to…

A

Accurately capture the relationship between the features and the outcome variable; it will, however, be wrong consistently

8
Q

If we have high variance in the model, but low bias, the model will…

A

Properly identify the relationship between the features and the outcome variable, but it will also fit random noise in the data

9
Q

What is irreducible error?

A

Random noise in the data points that no model can capture, regardless of its complexity; it is typically present in real-world data

10
Q

What is the bias-variance tradeoff?

A

Model adjustments that decrease bias often increase variance, and vice versa; this tradeoff is therefore closely tied to the complexity tradeoff

11
Q

Lower degrees of complexity cause [bias/variance], while higher degrees cause [bias/variance].

A

Bias, variance

12
Q

What is shrinkage/regularisation?

A

Adding a small, adjustable regularisation parameter to the cost function that contributes a penalty proportional to the size of the model parameters, thereby penalising more complex models

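As a rough illustration (plain NumPy, with hypothetical names), a regularised cost function adds a penalty scaled by an adjustable parameter alpha to the ordinary squared-error loss:

```python
# Sketch of a regularised cost function: squared-error loss plus a penalty
# proportional to the size of the weights, scaled by an adjustable parameter alpha.
import numpy as np

def regularised_cost(w, X, y, alpha):
    residuals = X @ w - y
    data_loss = np.mean(residuals ** 2)   # ordinary least-squares loss
    penalty = alpha * np.sum(w ** 2)      # L2 (ridge-style) penalty on the weights
    return data_loss + penalty
```
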
13
Q

What issue does regularisation solve, and why?

A

The bias-variance tradeoff: a higher regularisation strength produces a simpler model, thereby adding bias, while weaker regularisation leaves the model more complex, adding variance

14
Q

What is ridge regression (or L2 regularisation)?

A

The penalty is applied proportionally to squared coefficient values

15
Q

How can we find the best regularisation parameter?

A

Cross-validation: evaluate each candidate regularisation parameter across the folds and choose the one with the lowest validation error

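One possible way to do this with scikit-learn is RidgeCV, which scores each candidate regularisation strength by cross-validation; the alpha grid and toy data below are illustrative.

```python
# Sketch: choose the regularisation strength by cross-validation over a grid of candidates.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5)
model.fit(X, y)
print(model.alpha_)  # the regularisation parameter with the lowest cross-validation error
```
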
16
Q

What is LASSO (or L1 regularisation)?

A

The penalty is applied proportionally to absolute coefficient values

17
Q

What is the difference between L1 and L2 regularisation?

A

Both shrink coefficients towards zero, but in different ways - L2 shrinks them smoothly and rarely makes them exactly zero, while L1 can drive some coefficients exactly to zero, producing sparse models

18
Q

Regularisation can perform feature selection by…

A

Shrinking some features’ contributions to zero
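
A small sketch of that effect, using scikit-learn's Lasso on illustrative toy data: some fitted coefficients come out exactly zero, effectively removing those features.

```python
# Sketch: L1 (LASSO) regularisation shrinking some coefficients exactly to zero,
# which effectively removes those features from the model.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# Only 3 of the 10 features are actually informative in this toy data.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0)

model = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(model.coef_)  # indices of features with non-zero coefficients
print(model.coef_)
print(selected)
```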

19
Q

What is feature selection?

A

Selecting only the most important features from your data, deleting the rest

20
Q

How can we perform efficient feature selection via cross-validation?

A

Removing features one at a time and measuring cross-validated predictive performance - if removing a feature improves the results or leaves them unchanged, that feature can be dropped (see the sketch below)
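
A simplified sketch of this procedure (a greedy backward elimination; the toy data and helper below are illustrative assumptions):

```python
# Sketch: drop one feature at a time and keep the drop if cross-validated error
# does not get worse. A greedy, simplified form of backward feature elimination.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=8, n_informative=4, noise=5.0, random_state=0)

def cv_error(X, y):
    return -cross_val_score(LinearRegression(), X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()

kept = list(range(X.shape[1]))
baseline = cv_error(X[:, kept], y)
for feature in list(kept):
    if len(kept) == 1:
        break                        # always keep at least one feature
    trial = [f for f in kept if f != feature]
    error = cv_error(X[:, trial], y)
    if error <= baseline:            # removing the feature didn't hurt (or helped)
        kept, baseline = trial, error
print(kept, baseline)
```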

21
Q

What is gradient descent?

A

An iterative approach to fitting a machine learning model by repeatedly adjusting its weights based on the loss calculated by the loss function

22
Q

Why is gradient descent better than grid search and random sampling for finding optimal parameters?

A

Random sampling and grid search evaluate many parameter settings blindly and scale poorly as the number of parameters grows (grid search simply covers the space more uniformly), whereas gradient descent uses the gradient to move directly towards lower loss

23
Q

How does gradient descent minimise the loss of a model?

A

We start at a random point in parameter space and calculate the error. We then adjust the parameters by stepping in the direction of the negative gradient of the error with respect to the parameters
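
A minimal NumPy sketch of that procedure for linear regression (the toy data and learning rate are illustrative):

```python
# Sketch: gradient descent on a linear regression loss. Start from random weights,
# then repeatedly step in the direction of the negative gradient of the error
# with respect to the parameters.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = rng.normal(size=3)                        # random starting point in parameter space
learning_rate = 0.1
for _ in range(500):
    residuals = X @ w - y
    gradient = 2 * X.T @ residuals / len(y)   # d(MSE)/dw
    w -= learning_rate * gradient
print(w)  # should end up close to true_w
```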

24
Q

What is L1 and L2 norm?

A

Two methods of calculating the size of an error vector: L1 is the sum of absolute errors, and L2 (also known as the Euclidean distance) is the square root of the sum of squared errors.

25
Q

What is meant by Lp norm?

A

The Lp norm generalises L1 and L2: it is the p-th root of the sum of the absolute errors raised to the p-th power, so L3 uses cubed errors, L4 uses fourth-power errors, and so on

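A short NumPy illustration of the L1, L2 and general Lp norms of an error vector (the values are illustrative):

```python
# Sketch: L1, L2 and general Lp norms of an error vector.
import numpy as np

errors = np.array([1.0, -2.0, 3.0])

l1 = np.sum(np.abs(errors))                    # sum of absolute errors
l2 = np.sqrt(np.sum(errors ** 2))              # Euclidean: root of the sum of squared errors
p = 3
lp = np.sum(np.abs(errors) ** p) ** (1 / p)    # general Lp: p-th root of summed p-th powers

# The same values via numpy's built-in norm:
assert np.isclose(l1, np.linalg.norm(errors, 1))
assert np.isclose(l2, np.linalg.norm(errors, 2))
assert np.isclose(lp, np.linalg.norm(errors, 3))
```
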
26
Q

The minimum error of a gradient descent graph is the point where...

A

The partial derivatives in all dimensions are exactly zero

27
Q

What is a confusion matrix?

A

A diagram used to visualise the accuracy of a classifier

28
Q

How is a confusion matrix constructed?

A

Predicted values are placed along the y axis and true values along the x axis; the cells count the true and false positives and negatives

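A small sketch of counting the four cells for a binary classifier (labels and predictions are illustrative; note that which axis holds predictions vs. truth varies by convention):

```python
# Sketch: counting the four cells of a 2x2 confusion matrix for a binary classifier.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # ground truth (illustrative)
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # model predictions (illustrative)

tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))    # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))    # false negatives
tn = np.sum((y_pred == 0) & (y_true == 0))    # true negatives

confusion = np.array([[tn, fp],
                      [fn, tp]])              # rows: true class, columns: predicted class
print(confusion)
```
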
29
Q

What is precision?

A

The probability that, given a positive prediction, the example is actually positive

30
Q

What is recall?

A

The probability that, given an actually positive example, we will correctly predict positive

31
Q

How is precision calculated?

A

The number of true positives divided by the total number of predicted positives

32
Q

How is recall calculated?

A

The number of true positives divided by the number of positives in the ground truth data

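A tiny sketch of both calculations from confusion-matrix counts (the counts are illustrative):

```python
# Sketch: precision and recall from confusion-matrix counts.
tp, fp, fn = 30, 10, 5       # illustrative counts

precision = tp / (tp + fp)   # of everything predicted positive, how much was right
recall = tp / (tp + fn)      # of all actual positives, how many we found
print(precision, recall)     # 0.75, ~0.857
```
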
33
Q

High recall and low precision implies that...

A

Most actual positives are correctly identified, but many negatives are also wrongly predicted as positive

34
Q

Low recall but high precision implies...

A

We predict negative a lot (missing many actual positives), but when we do predict positive, we are usually correct

35
Q

What is F1 score?

A

A combination of precision and recall, with a high F1 score implying a good balance between the two

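For reference, F1 is the harmonic mean of precision and recall; a tiny illustrative calculation:

```python
# Sketch: F1 as the harmonic mean of precision and recall.
precision, recall = 0.75, 0.857                       # illustrative values
f1 = 2 * precision * recall / (precision + recall)
print(f1)                                             # ~0.80
```
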
36
Q

What is sensitivity?

A

The proportion of actual positives that are correctly identified as positive

37
Q

What is specificity?

A

The proportion of actual negatives that are correctly identified as negative

38
Q

How is specificity calculated?

A

The number of true negatives divided by the number of negatives in the ground truth data (TN / (TN + FP))

39
Q

How is sensitivity calculated?

A

Exactly the same as recall: the number of true positives divided by the number of positives in the ground truth data (TP / (TP + FN))

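A tiny sketch of both measures from confusion-matrix counts (the counts are illustrative):

```python
# Sketch: sensitivity and specificity from confusion-matrix counts.
tp, fn, tn, fp = 30, 5, 50, 10     # illustrative counts

sensitivity = tp / (tp + fn)       # true positive rate; identical to recall
specificity = tn / (tn + fp)       # true negative rate
print(sensitivity, specificity)    # ~0.857, ~0.833
```
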
40
Q

What is a ROC curve?

A

A curve that combines sensitivity and specificity across classification thresholds, with a higher curve implying better performance overall

41
Q

What is the shape of an ideal ROC curve?

A

An exact right angle, implying that the area under the curve is exactly 1

42
Q

How do we construct a ROC curve?

A

At each classification threshold, plot the sensitivity on the y axis against 1 minus the specificity on the x axis

43
Q

How can we compare a ROC curve to another?

A

Calculate the area under each curve (AUC); a ROC curve with an AUC of 1 is optimal
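
A minimal NumPy sketch that builds a ROC curve by sweeping a threshold over predicted scores and then computes the AUC with a trapezoidal sum (the labels and scores are illustrative):

```python
# Sketch: build a ROC curve by sweeping a decision threshold over predicted scores,
# then compare classifiers by the area under the curve (AUC).
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # illustrative labels
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])   # predicted probabilities

tpr, fpr = [], []
for threshold in np.linspace(1.0, 0.0, 101):                   # from strict to lenient
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr.append(tp / (tp + fn))                                  # sensitivity
    fpr.append(fp / (fp + tn))                                  # 1 - specificity

fpr, tpr = np.array(fpr), np.array(tpr)
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)          # trapezoidal area under the curve
print(auc)                                                      # 1.0 would be optimal
```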