416 - Practice Final Flashcards

Questions from the CSE Practice Final 1 and 2. (25 cards)

1
Q

True / False

Linear regression is a useful model to make predictions, but it is limited by the fact that we are unable to interpret the model to make inferences about the relationships between features and the output.

A

False

Linear regression is interpretable — its coefficients show the strength and direction of associations. While it can’t prove causation, it’s still useful for understanding relationships between variables.
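
A minimal sketch (assuming scikit-learn and a made-up two-feature dataset) of how the fitted coefficients can be read to interpret the association between each feature and the output:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends positively on x0 and negatively on x1 (plus noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

# The sign and magnitude of each coefficient describe the direction and
# strength of the association between that feature and the prediction.
print(model.coef_)       # roughly [ 3.0, -2.0]
print(model.intercept_)  # roughly 0.0
```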

2
Q

True / False

When given the choice of two models, the one that has smaller training error will always have smaller true error.

A

False

A smaller training error doesn’t guarantee smaller true error — a model can overfit the training data and perform poorly on unseen data.
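
A sketch of the idea (scikit-learn, synthetic noisy data): an unconstrained decision tree beats a shallow one on training error but tends to lose on held-out data:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Noisy synthetic classification problem (20% of labels flipped).
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)            # can memorize noise
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The deep tree has (near) zero training error but typically does worse
# than the shallow tree on the held-out test set.
print("deep:    train", deep.score(X_tr, y_tr), "test", deep.score(X_te, y_te))
print("shallow: train", shallow.score(X_tr, y_tr), "test", shallow.score(X_te, y_te))
```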

3
Q

True / False

We expect a model with high variance to generalize better than a model with high bias.

A

False

High-variance models overfit and often generalize poorly.
Lower variance — even with some bias — often leads to better generalization.

4
Q

True / False

When we use the same model complexity on a smaller dataset, overfitting is more likely.

A

True

With a smaller dataset, a complex model is more likely to memorize noise — increasing the risk of overfitting.

5
Q

True / False

In machine learning, bias is always a bigger source of error than variance.

A

False

Bias and variance both contribute to error. In some cases, bias dominates (underfitting); in others, variance does (overfitting). Neither is always the bigger source.

6
Q

True / False

Given an infinite amount of noiseless training data, we expect the training error for decision stumps to go to 0.

A

False

Decision stumps are too simple to fit complex data perfectly — even with infinite noiseless data, some training error can remain.
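
A small sketch (scikit-learn, hand-written XOR points) of a case where a decision stump cannot reach zero training error even though the data is noiseless:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Noiseless XOR data: no single axis-aligned split separates the classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)

# Training accuracy is stuck at 0.5: the stump is too simple for this pattern,
# and adding more (noiseless) copies of these points would not help.
print(stump.score(X, y))
```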

7
Q

True / False

As the number of iterations goes to infinity, boosting is guaranteed to reach zero training error.

A

True

Under the weak-learning assumption (each weak learner does at least slightly better than random guessing on its weighted data), AdaBoost's training error is driven to zero as the number of boosting rounds grows.
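
A sketch (scikit-learn; AdaBoostClassifier's default base learner is a depth-1 stump; the dataset here is made up and noise-free) of training error shrinking as rounds are added:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

# A clean (noise-free) classification problem.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.0, random_state=0)

boost = AdaBoostClassifier(n_estimators=300, random_state=0).fit(X, y)

# staged_score yields training accuracy after each boosting round;
# training error (1 - accuracy) typically drops toward 0 given enough rounds.
scores = list(boost.staged_score(X, y))
for rounds in (10, 50, 100, len(scores)):
    print(rounds, "rounds -> training error", round(1 - scores[rounds - 1], 3))
```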

8
Q

True / False

Increasing k in k-NN increases bias and decreases variance.

A

True

Larger k: more stable (↓variance), less flexible (↑bias).

Increasing k in k-NN increases bias (less flexible) and decreases variance (more stable predictions).
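
A sketch (scikit-learn, synthetic noisy data) of the qualitative effect of k:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 101):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # k=1 memorizes the training set (train accuracy 1.0, high variance);
    # large k averages over many neighbors (smoother predictions, higher bias).
    print(k, "train", knn.score(X_tr, y_tr), "test", knn.score(X_te, y_te))
```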

9
Q

True / False

To determine the best value of k for k-means, it’s sufficient to run the k-means algorithm once for each value of k you want to try.

A

False

k-means should be run multiple times per k to avoid bad initializations.
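
A sketch (scikit-learn, made-up blob data): `n_init` reruns k-means with several random initializations for each k and keeps the solution with the lowest heterogeneity (inertia):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in (2, 3, 4, 5):
    # n_init=10 runs 10 random initializations and keeps the best (lowest inertia),
    # guarding against a single unlucky initialization for any given k.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, "clusters -> heterogeneity (inertia):", round(km.inertia_, 1))
```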

10
Q

True / False

Increasing the number of recommended items is more likely to increase the recall than decrease it.

A

True

Recall = TP / (TP + FN)

Recommending more items increases the chance of including relevant ones, which raises recall.
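
A tiny made-up example: the recommendation list only gains items as it grows, so the set of relevant items it covers, and therefore recall, can only stay the same or rise:

```python
# Made-up example: 4 items in the catalog are actually relevant to this user.
relevant = {"a", "b", "c", "d"}

top5 = ["a", "x", "b", "y", "z"]           # catches a, b -> recall 2/4
top10 = top5 + ["c", "p", "q", "r", "s"]   # also catches c -> recall 3/4

for recs in (top5, top10):
    recall = len(relevant & set(recs)) / len(relevant)
    print(len(recs), "recommendations -> recall", recall)
```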

11
Q

True / False

With a large dataset, nearest neighbors is more efficient at test time than logistic regression.

A

False

Logistic regression is faster at test time — kNN must compare each test point to all training data.

12
Q

True / False

k-means converges to a global optimum for the heterogeneity objective.

A

False

k-means minimizes WSS; final result depends on initialization.

k-means can get stuck in local minima — it’s not guaranteed to reach the global optimum for the heterogeneity (WSS) objective.
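
A sketch (scikit-learn, made-up blob data) of the local-optimum issue: single random initializations can converge to different solutions, with different values of the heterogeneity (inertia) objective:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=6, cluster_std=2.0, random_state=0)

# With one random initialization per run, different seeds can land on
# different local optima; the printed inertias often differ.
for seed in range(5):
    km = KMeans(n_clusters=6, init="random", n_init=1, random_state=seed).fit(X)
    print("seed", seed, "-> inertia", round(km.inertia_, 1))
```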

13
Q

True / False

To find the best set of coefficients for logistic regression, we use gradient descent to minimize the number of examples misclassified.

A

False

Gradient descent minimizes the log loss (negative log likelihood), a smooth function of the coefficients; the misclassification count is not differentiable, so it cannot be optimized with gradient descent directly.
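
A minimal sketch (NumPy; the data, learning rate, and iteration count are made up) of gradient descent on the average log loss for logistic regression:

```python
import numpy as np

# Minimal gradient descent on the logistic (log) loss, not on 0/1 error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # synthetic labels

w = np.zeros(2)
lr = 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    grad = X.T @ (p - y) / len(y)      # gradient of the average log loss
    w -= lr * grad                     # step downhill on the smooth loss

# Recompute probabilities with the final weights (clipped for numerical safety).
p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print("weights:", w, "log loss:", log_loss)
```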

14
Q

Define

Precision

A

Precision = Of predicted positives, how many were actually correct?

Formula: TP / (TP + FP)

15
Q

Define

Recall

A

Recall = Of actual positives, how many did we correctly predict?

Formula: TP / (TP + FN)
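
A tiny worked example (made-up confusion-matrix counts) computing both precision and recall from the formulas above:

```python
# Made-up confusion-matrix counts for illustration.
TP, FP, FN, TN = 40, 10, 20, 30

precision = TP / (TP + FP)   # of predicted positives, fraction actually positive
recall = TP / (TP + FN)      # of actual positives, fraction we caught

print("precision:", precision)  # 40 / 50 = 0.8
print("recall:", recall)        # 40 / 60 = 0.667
```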

16
Q

Why would we prefer LASSO over Ridge? (Select all)

  • Helps identify important features.
  • Faster to learn weights.
  • Lower generalization error.
  • Efficient predictions with many features.

A
  • Helps identify important features.
  • Efficient predictions with many features.

LASSO performs feature selection by shrinking some coefficients to 0.
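
A sketch (scikit-learn; the data, where only two of ten features matter, and the alpha values are made up) of LASSO zeroing out coefficients while Ridge only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features actually matter.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# LASSO drives irrelevant coefficients exactly to 0 (feature selection);
# Ridge shrinks them toward 0 but leaves them nonzero.
print("lasso:", np.round(lasso.coef_, 2))
print("ridge:", np.round(ridge.coef_, 2))
```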

17
Q

Select all that apply.

Symptoms of logistic regression overfitting?
* Large coefficients
* Good generalization
* Simple boundary
* Complex boundary
* Overconfident predictions

A
  • Large coefficients
  • Complex boundary
  • Overconfident predictions

Overfitting happens when the model learns noise — often leading to extreme weights, wiggly decision boundaries, and very confident but wrong predictions.
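
A sketch (scikit-learn; `C` is the inverse regularization strength, and the small made-up dataset is chosen to make overfitting easy) of the large-coefficient and overconfidence symptoms:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Small training set with many features makes overfitting easy to see.
X, y = make_classification(n_samples=40, n_features=20, random_state=0)

weak_reg = LogisticRegression(C=1e6, max_iter=10_000).fit(X, y)   # almost no regularization
strong_reg = LogisticRegression(C=1.0, max_iter=10_000).fit(X, y)

# Symptoms of overfitting: much larger coefficients and near-0/1 probabilities.
print("||w|| weakly regularized :", np.linalg.norm(weak_reg.coef_))
print("||w|| regularized        :", np.linalg.norm(strong_reg.coef_))
print("max predicted probability:", weak_reg.predict_proba(X).max())
```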

18
Q

Select all that apply.

What can happen when adding a regularization term to OLS?
* Increase training error
* Increase validation error
* Increase bias
* Increase variance

A
  • Increase training error
  • Increase validation error
  • Increase bias

Regularization simplifies the model (↑ bias, ↓ variance).
It usually helps generalization, but if over-regularized, validation error can increase.
Regularization typically reduces variance, not increases it.
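
A sketch (scikit-learn, made-up data, illustrative alpha) of the training-error effect: OLS minimizes training error by construction, so the penalized fit cannot do better on the training set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=1.0, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# OLS minimizes training error by definition, so the penalized fit's
# training error is at least as large (the price paid for lower variance).
print("OLS   train MSE:", mean_squared_error(y, ols.predict(X)))
print("Ridge train MSE:", mean_squared_error(y, ridge.predict(X)))
```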

19
Q

Which increases test error more than training error?
* Bias
* Variance

A

Variance

High variance causes overfitting — test error increases more than training error.

20
Q

True / False

A model is overfit if it has lower training error than another model.

A

False

Overfitting means low training error and poor generalization — not just low training error.

21
Q

Multiple Choice

Which model builds trees sequentially, focusing more on hard-to-classify examples?
* Random Forest
* AdaBoost

A

AdaBoost

AdaBoost builds models sequentially, focusing on mistakes.

22
Q

Multiple Choice

Which model is more robust to noise and mislabeled data?
* Random Forest
* AdaBoost

A

Random Forest

Random Forest is more robust to noise and outliers, since it averages many trees built from bootstrapped samples.
AdaBoost can overfocus on noisy or mislabeled points.

23
Q

Which one uses full decision trees and majority voting?
* AdaBoost
* Random Forest

A

Random Forest

Random Forest uses full decision trees and majority voting.

24
Q

Which model would you prefer on a small, clean dataset where you want max accuracy?
* AdaBoost
* Random Forest

A

AdaBoost

On a small, clean dataset, AdaBoost's focus on correcting hard examples often yields higher accuracy, and with little label noise there is less risk of it chasing mislabeled points.

25
Q

Which model is easier to train in parallel on a compute cluster?
* AdaBoost
* Random Forest

A

Random Forest

Random Forest is easy to parallelize, since trees are trained independently.
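
A sketch (scikit-learn) of why parallelization is easy: each tree depends only on its own bootstrap sample, so `n_jobs=-1` can grow them on all cores at once, whereas each boosting round depends on the previous one:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Each tree is grown independently on a bootstrap sample, so the work can be
# split across all available cores (or across machines in a cluster).
rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0).fit(X, y)
print(rf.score(X, y))
```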