416 - Practice Final Flashcards
Questions from the CSE Practice Final 1 and 2. (25 cards)
True / False
Linear regression is a useful model to make predictions, but it is limited by the fact that we are unable to interpret the model to make inferences about the relationships between features and the output.
False
Linear regression is interpretable — its coefficients show the strength and direction of associations. While it can’t prove causation, it’s still useful for understanding relationships between variables.
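A minimal sketch of the interpretability point, using scikit-learn and made-up data (not from the exam): the fitted coefficients recover the direction and strength of each feature's association with the output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                                   # two hypothetical features
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
# Coefficient signs and magnitudes show direction and strength of association
# (not causation): roughly [3.0, -1.5] here.
print(model.coef_, model.intercept_)
```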
True / False
When given the choice of two models, the one that has smaller training error will always have smaller true error.
False
A smaller training error doesn’t guarantee smaller true error — a model can overfit the training data and perform poorly on unseen data.
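A minimal sketch with made-up data (a truly linear relationship): a degree-15 polynomial gets lower training error than a line, yet its held-out error, a stand-in for true error, is typically worse.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=40)   # truly linear relationship plus noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error (lower for degree 15)
          mean_squared_error(y_te, model.predict(X_te)))   # held-out error (typically higher)
```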
True / False
We expect a model with high variance to generalize better than a model with high bias.
False
High-variance models overfit and often generalize poorly.
Lower variance — even with some bias — often leads to better generalization.
True / False
When we use the same model complexity on a smaller dataset, overfitting is more likely.
True
With a smaller dataset, a complex model is more likely to memorize noise — increasing the risk of overfitting.
True / False
In machine learning, bias is always a bigger source of error than variance.
False
Bias and variance both contribute to error. In some cases, bias dominates (underfitting); in others, variance does (overfitting). Neither is always the bigger source.
True / False
Given an infinite amount of noiseless training data, we expect the training error for decision stumps to go to 0.
False
Decision stumps are too simple to fit complex data perfectly — even with infinite noiseless data, some training error can remain.
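A minimal sketch with hypothetical XOR-labeled data: a depth-1 tree (stump) can split on only one feature, so its training accuracy stays near 0.5 no matter how much noiseless data it sees.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(10_000, 2))   # noiseless XOR problem
y = X[:, 0] ^ X[:, 1]

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print("training accuracy:", stump.score(X, y))   # stays near 0.5, never reaches 1.0
```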
True / False
As the number of iterations goes to infinity, boosting is guaranteed to reach zero training error.
True
(under the weak-learning assumption: each weak learner does better than random guessing)
Given that assumption, boosting drives training error to zero as iterations increase; without it, the guarantee depends on the data and the weak learners.
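A minimal sketch with synthetic data (scikit-learn's default AdaBoost base learner is a decision stump): training error typically shrinks as more weak learners are added, which illustrates, but does not prove, the claim above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_informative=5, random_state=0)
for n in (1, 10, 100):
    clf = AdaBoostClassifier(n_estimators=n, random_state=0).fit(X, y)
    print(n, 1 - clf.score(X, y))   # training error typically decreases as n grows
```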
True / False
Increasing k in k-NN increases bias and decreases variance.
True
Larger k averages over more neighbors, so predictions are more stable (lower variance) but less flexible (higher bias).
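A minimal sketch with made-up data: k = 1 fits the training set perfectly (high variance), while a very large k smooths the boundary (high bias).

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 15, 101):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))   # train vs. test accuracy
```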
True / False
To determine the best value of k for k-means, it’s sufficient to run the k-means algorithm once for each value of k you want to try.
False
k-means should be run multiple times per k to avoid bad initializations.
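A minimal sketch with synthetic blobs: for each candidate k, run several random initializations (n_init) and keep the best heterogeneity (inertia_), since a single run can land on a poor local optimum.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)   # compare across k (e.g., with an elbow plot)
```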
True / False
Increasing the number of recommended items is more likely to increase the recall than decrease it.
True
Recall = TP / (TP + FN)
Recommending more items increases the chance of including relevant ones, which raises recall.
True / False
With a large dataset, nearest neighbors is more efficient at test time than logistic regression.
False
Logistic regression is faster at test time — kNN must compare each test point to all training data.
True / False
k-means converges to a global optimum for the heterogeneity objective.
False
k-means greedily minimizes the within-cluster sum of squares (WSS, the heterogeneity objective) and can get stuck in a local minimum, so the result depends on initialization and is not guaranteed to be the global optimum.
True / False
To find the best set of coefficients for logistic regression, we use gradient descent to minimize the number of examples misclassified.
False
Gradient descent minimizes a smooth loss, the negative log-likelihood (log loss), not the number of misclassified examples, which is not differentiable.
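A minimal sketch with made-up data: gradient descent on the average log loss, the smooth objective logistic regression actually optimizes; the 0/1 misclassification count has no useful gradient.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))      # predicted probabilities
    grad = X.T @ (p - y) / len(y)     # gradient of the average log loss
    w -= 0.5 * grad                   # gradient-descent step
print(w)
```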
Define
Precision
Precision = Of predicted positives, how many were actually positive?
Formula: TP / (TP + FP)
Define
Recall
Recall = Of actual positives, how many did we correctly predict?
Formula: TP / (TP + FN)
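A quick worked example with made-up labels, matching the two formulas above:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))   # 3
fp = np.sum((y_pred == 1) & (y_true == 0))   # 1
fn = np.sum((y_pred == 0) & (y_true == 1))   # 1

precision = tp / (tp + fp)   # 3 / 4 = 0.75
recall = tp / (tp + fn)      # 3 / 4 = 0.75
print(precision, recall)
```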
Why would we prefer LASSO over Ridge? (Select all)
* Helps identify important features.
* Faster to learn weights.
* Lower generalization error.
* Efficient predictions with many features.
- Helps identify important features.
- Efficient predictions with many features.
LASSO performs feature selection by shrinking some coefficients to 0.
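A minimal sketch with made-up data where only 2 of 10 features matter: the L1 penalty drives the irrelevant coefficients to exactly zero, while the L2 penalty only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=200)

print("LASSO:", Lasso(alpha=0.1).fit(X, y).coef_)   # irrelevant coefficients hit exactly 0
print("Ridge:", Ridge(alpha=0.1).fit(X, y).coef_)   # small but nonzero
```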
Select all that apply.
Symptoms of logistic regression overfitting?
* Large coefficients
* Good generalization
* Simple boundary
* Complex boundary
* Overconfident predictions
- Large coefficients
- Complex boundary
- Overconfident predictions
Overfitting happens when the model learns noise — often leading to extreme weights, wiggly decision boundaries, and very confident but wrong predictions.
Select all that apply.
What can happen when adding a regularization term to OLS?
* Increase training error
* Increase validation error
* Increase bias
* Increase variance
- Increase training error
- Increase validation error
- Increase bias
Regularization simplifies the model (↑ bias, ↓ variance).
It usually helps generalization, but if over-regularized, validation error can increase.
Regularization typically reduces variance, not increases it.
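A minimal sketch with made-up data: as the ridge penalty grows, training error rises (more bias), and validation error can also rise once the model is over-regularized.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=100)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for alpha in (0.01, 1.0, 100.0, 10_000.0):
    m = Ridge(alpha=alpha).fit(X_tr, y_tr)
    print(alpha,
          mean_squared_error(y_tr, m.predict(X_tr)),   # training error (rises with alpha)
          mean_squared_error(y_va, m.predict(X_va)))   # validation error
```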
Multiple Choice
Which increases test error more than training error?
* Bias
* Variance
Variance
High variance causes overfitting — test error increases more than training error.
True / False
A model is overfit if it has lower training error than another model.
False
Overfitting means low training error and poor generalization — not just low training error.
Multiple Choice
Which model builds trees sequentially, focusing more on hard-to-classify examples?
* Random Forest
* AdaBoost
AdaBoost
AdaBoost builds weak learners sequentially, upweighting the examples that earlier learners misclassified.
Multiple Choice
Which model is more robust to noise and mislabeled data?
* Random Forest
* AdaBoost
Random Forest
Random Forest is more robust to noise and outliers, since it averages many trees built from bootstrapped samples.
AdaBoost can overfocus on noisy or mislabeled points.
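A hedged sketch with made-up data in which roughly 15% of labels are flipped: averaging bootstrapped trees (Random Forest) tends to tolerate the noise better than AdaBoost, which keeps upweighting the mislabeled points; results vary with the data and settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, random_state=0)
rng = np.random.default_rng(6)
flip = rng.random(len(y)) < 0.15           # mislabel ~15% of examples
y_noisy = np.where(flip, 1 - y, y)

for clf in (RandomForestClassifier(random_state=0), AdaBoostClassifier(random_state=0)):
    print(type(clf).__name__, cross_val_score(clf, X, y_noisy, cv=5).mean())
```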
Multiple Choice
Which one uses full decision trees and majority voting?
* AdaBoost
* Random Forest
Random Forest
Random Forest uses full decision trees and majority voting.
Multiple Choice
Which model would you prefer on a small, clean dataset where you want max accuracy?
* AdaBoost
* Random Forest
AdaBoost
AdaBoost often performs best on small, clean datasets because it keeps refining the fit on the remaining hard examples.