416 - Practice Final Flashcards

Questions from the CSE Practice Final 1 and 2. (25 cards)

1
Q

True / False

Linear regression is a useful model to make predictions, but it is limited by the fact that we are unable to interpret the model to make inferences about the relationships between features and the output.

A

False

Linear regression is interpretable — its coefficients show the strength and direction of associations. While it can’t prove causation, it’s still useful for understanding relationships between variables.
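
A minimal sketch (assuming scikit-learn and a made-up two-feature dataset) of how the fitted coefficients can be read to interpret the association between each feature and the output:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends positively on x0 and negatively on x1 (plus noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

# The sign and magnitude of each coefficient describe the direction and
# strength of the association between that feature and the prediction.
print(model.coef_)       # roughly [ 3.0, -2.0]
print(model.intercept_)  # roughly 0.0
```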

2
Q

True / False

When given the choice of two models, the one that has smaller training error will always have smaller true error.

A

False

A smaller training error doesn’t guarantee smaller true error — a model can overfit the training data and perform poorly on unseen data.
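
A sketch of the idea (scikit-learn, synthetic noisy data): an unconstrained decision tree beats a shallow one on training error but tends to lose on held-out data:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Noisy synthetic classification problem (20% of labels flipped).
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)            # can memorize noise
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The deep tree has (near) zero training error but typically does worse
# than the shallow tree on the held-out test set.
print("deep:    train", deep.score(X_tr, y_tr), "test", deep.score(X_te, y_te))
print("shallow: train", shallow.score(X_tr, y_tr), "test", shallow.score(X_te, y_te))
```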

3
Q

True / False

We expect a model with high variance to generalize better than a model with high bias.

A

False

High-variance models overfit and often generalize poorly.
Lower variance — even with some bias — often leads to better generalization.

4
Q

True / False

When we use the same model complexity on a smaller dataset, overfitting is more likely.

A

True

With a smaller dataset, a complex model is more likely to memorize noise — increasing the risk of overfitting.

5
Q

True / False

In machine learning, bias is always a bigger source of error than variance.

A

False

Bias and variance both contribute to error. In some cases, bias dominates (underfitting); in others, variance does (overfitting). Neither is always the bigger source.

6
Q

True / False

Given an infinite amount of noiseless training data, we expect the training error for decision stumps to go to 0.

A

False

Decision stumps are too simple to fit complex data perfectly — even with infinite noiseless data, some training error can remain.
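
A small sketch (scikit-learn, hand-written XOR points) of a case where a decision stump cannot reach zero training error even though the data is noiseless:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Noiseless XOR data: no single axis-aligned split separates the classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)

# Training accuracy is stuck at 0.5: the stump is too simple for this pattern,
# and adding more (noiseless) copies of these points would not help.
print(stump.score(X, y))
```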

7
Q

True / False

As the number of iterations goes to infinity, boosting is guaranteed to reach zero training error.

A

True

Under the weak-learning assumption (each weak learner does at least slightly better than random guessing on its weighted data), AdaBoost's training error is driven to zero as the number of boosting rounds grows.
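
A sketch (scikit-learn; AdaBoostClassifier's default base learner is a depth-1 stump; the dataset here is made up and noise-free) of training error shrinking as rounds are added:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

# A clean (noise-free) classification problem.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.0, random_state=0)

boost = AdaBoostClassifier(n_estimators=300, random_state=0).fit(X, y)

# staged_score yields training accuracy after each boosting round;
# training error (1 - accuracy) typically drops toward 0 given enough rounds.
scores = list(boost.staged_score(X, y))
for rounds in (10, 50, 100, len(scores)):
    print(rounds, "rounds -> training error", round(1 - scores[rounds - 1], 3))
```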

8
Q

True / False

Increasing k in k-NN increases bias and decreases variance.

A

True

Larger k: more stable (↓variance), less flexible (↑bias).

Increasing k in k-NN increases bias (less flexible) and decreases variance (more stable predictions).
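
A sketch (scikit-learn, synthetic noisy data) of the qualitative effect of k:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 101):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # k=1 memorizes the training set (train accuracy 1.0, high variance);
    # large k averages over many neighbors (smoother predictions, higher bias).
    print(k, "train", knn.score(X_tr, y_tr), "test", knn.score(X_te, y_te))
```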

9
Q

True / False

To determine the best value of k for k-means, it’s sufficient to run the k-means algorithm once for each value of k you want to try.

A

False

k-means should be run multiple times per k to avoid bad initializations.
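
A sketch (scikit-learn, made-up blob data): `n_init` reruns k-means with several random initializations for each k and keeps the solution with the lowest heterogeneity (inertia):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in (2, 3, 4, 5):
    # n_init=10 runs 10 random initializations and keeps the best (lowest inertia),
    # guarding against a single unlucky initialization for any given k.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, "clusters -> heterogeneity (inertia):", round(km.inertia_, 1))
```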

10
Q

True / False

Increasing the number of recommended items is more likely to increase the recall than decrease it.

A

True

Recall = TP / (TP + FN)

Recommending more items increases the chance of including relevant ones, which raises recall.
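
A tiny made-up example: the recommendation list only gains items as it grows, so the set of relevant items it covers, and therefore recall, can only stay the same or rise:

```python
# Made-up example: 4 items in the catalog are actually relevant to this user.
relevant = {"a", "b", "c", "d"}

top5 = ["a", "x", "b", "y", "z"]           # catches a, b -> recall 2/4
top10 = top5 + ["c", "p", "q", "r", "s"]   # also catches c -> recall 3/4

for recs in (top5, top10):
    recall = len(relevant & set(recs)) / len(relevant)
    print(len(recs), "recommendations -> recall", recall)
```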

11
Q

True / False

With a large dataset, nearest neighbors is more efficient at test time than logistic regression.

A

False

Logistic regression is faster at test time — kNN must compare each test point to all training data.

12
Q

True / False

k-means converges to a global optimum for the heterogeneity objective.

A

False

k-means minimizes WSS; final result depends on initialization.

k-means can get stuck in local minima — it’s not guaranteed to reach the global optimum for the heterogeneity (WSS) objective.
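
A sketch (scikit-learn, made-up blob data) of the local-optimum issue: single random initializations can converge to different solutions, with different values of the heterogeneity (inertia) objective:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=6, cluster_std=2.0, random_state=0)

# With one random initialization per run, different seeds can land on
# different local optima; the printed inertias often differ.
for seed in range(5):
    km = KMeans(n_clusters=6, init="random", n_init=1, random_state=seed).fit(X)
    print("seed", seed, "-> inertia", round(km.inertia_, 1))
```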

13
Q

True / False

To find the best set of coefficients for logistic regression, we use gradient descent to minimize the number of examples misclassified.

A

False

Gradient descent minimizes the log loss (negative log likelihood), a smooth function of the coefficients; the misclassification count is not differentiable, so it cannot be optimized with gradient descent directly.
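
A minimal sketch (NumPy; the data, learning rate, and iteration count are made up) of gradient descent on the average log loss for logistic regression:

```python
import numpy as np

# Minimal gradient descent on the logistic (log) loss, not on 0/1 error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # synthetic labels

w = np.zeros(2)
lr = 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    grad = X.T @ (p - y) / len(y)      # gradient of the average log loss
    w -= lr * grad                     # step downhill on the smooth loss

# Recompute probabilities with the final weights (clipped for numerical safety).
p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print("weights:", w, "log loss:", log_loss)
```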

14
Q

Define

Precision

A

Precision = Of predicted positives, how many were actually correct?

Formula: TP / (TP + FP)

15
Q

Define

Recall

A

Recall = Of actual positives, how many did we correctly predict?

Formula: TP / (TP + FN)
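
A tiny worked example (made-up confusion-matrix counts) computing both precision and recall from the formulas above:

```python
# Made-up confusion-matrix counts for illustration.
TP, FP, FN, TN = 40, 10, 20, 30

precision = TP / (TP + FP)   # of predicted positives, fraction actually positive
recall = TP / (TP + FN)      # of actual positives, fraction we caught

print("precision:", precision)  # 40 / 50 = 0.8
print("recall:", recall)        # 40 / 60 = 0.667
```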

16
Q

Why would we prefer LASSO over Ridge? (Select all)

  • Helps identify important features.
  • Faster to learn weights.
  • Lower generalization error.
  • Efficient predictions with many features.

A
  • Helps identify important features.
  • Efficient predictions with many features.

LASSO performs feature selection by shrinking some coefficients to 0.
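
A sketch (scikit-learn; the data, where only two of ten features matter, and the alpha values are made up) of LASSO zeroing out coefficients while Ridge only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features actually matter.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# LASSO drives irrelevant coefficients exactly to 0 (feature selection);
# Ridge shrinks them toward 0 but leaves them nonzero.
print("lasso:", np.round(lasso.coef_, 2))
print("ridge:", np.round(ridge.coef_, 2))
```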

17
Q

Select all that apply.

Symptoms of logistic regression overfitting?
* Large coefficients
* Good generalization
* Simple boundary
* Complex boundary
* Overconfident predictions

A
  • Large coefficients
  • Complex boundary
  • Overconfident predictions

Overfitting happens when the model learns noise — often leading to extreme weights, wiggly decision boundaries, and very confident but wrong predictions.
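
A sketch (scikit-learn; `C` is the inverse regularization strength, and the small made-up dataset is chosen to make overfitting easy) of the large-coefficient and overconfidence symptoms:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Small training set with many features makes overfitting easy to see.
X, y = make_classification(n_samples=40, n_features=20, random_state=0)

weak_reg = LogisticRegression(C=1e6, max_iter=10_000).fit(X, y)   # almost no regularization
strong_reg = LogisticRegression(C=1.0, max_iter=10_000).fit(X, y)

# Symptoms of overfitting: much larger coefficients and near-0/1 probabilities.
print("||w|| weakly regularized :", np.linalg.norm(weak_reg.coef_))
print("||w|| regularized        :", np.linalg.norm(strong_reg.coef_))
print("max predicted probability:", weak_reg.predict_proba(X).max())
```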

18
Q

Select all that apply.

What can happen when adding a regularization term to OLS?
* Increase training error
* Increase validation error
* Increase bias
* Increase variance

A
  • Increase training error
  • Increase validation error
  • Increase bias

Regularization simplifies the model (↑ bias, ↓ variance).
It usually helps generalization, but if over-regularized, validation error can increase.
Regularization typically reduces variance, not increases it.
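
A sketch (scikit-learn, made-up data, illustrative alpha) of the training-error effect: OLS minimizes training error by construction, so the penalized fit cannot do better on the training set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=1.0, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# OLS minimizes training error by definition, so the penalized fit's
# training error is at least as large (the price paid for lower variance).
print("OLS   train MSE:", mean_squared_error(y, ols.predict(X)))
print("Ridge train MSE:", mean_squared_error(y, ridge.predict(X)))
```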

19
Q

Which increases test error more than training error?
* Bias
* Variance

A

Variance

High variance causes overfitting — test error increases more than training error.

20
Q

True / False

A model is overfit if it has lower training error than another model.

A

False

Overfitting means low training error and poor generalization — not just low training error.

21
Q

Multiple Choice

Which model builds trees sequentially, focusing more on hard-to-classify examples?
* Random Forest
* AdaBoost

A

AdaBoost

AdaBoost builds models sequentially, focusing on mistakes.

22
Q

Multiple Choice

Which model is more robust to noise and mislabeled data?
* Random Forest
* AdaBoost

A

Random Forest

Random Forest is more robust to noise and outliers, since it averages many trees built from bootstrapped samples.
AdaBoost can overfocus on noisy or mislabeled points.

23
Q

Which one uses full decision trees and majority voting?
* AdaBoost
* Random Forest

A

Random Forest

Random Forest uses full decision trees and majority voting.

24
Q

Which model would you prefer on a small, clean dataset where you want max accuracy?
* AdaBoost
* Random Forest

A

AdaBoost

On a small, clean dataset, AdaBoost's focus on correcting hard examples often yields higher accuracy, and with little label noise there is less risk of it chasing mislabeled points.

25
Q

Which model is easier to train in parallel on a compute cluster?
* AdaBoost
* Random Forest

A

Random Forest

Random Forest is easy to parallelize, since trees are trained independently.
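
A sketch (scikit-learn) of why parallelization is easy: each tree depends only on its own bootstrap sample, so `n_jobs=-1` can grow them on all cores at once, whereas each boosting round depends on the previous one:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Each tree is grown independently on a bootstrap sample, so the work can be
# split across all available cores (or across machines in a cluster).
rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0).fit(X, y)
print(rf.score(X, y))
```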