Questions Flashcards
multiple choice (42 cards)
What is the difference between supervised and unsupervised learning?
Supervised learning uses labeled data; unsupervised learning uses unlabeled data.
Explain the concept of overfitting in machine learning. How can it be mitigated?
Overfitting is when the model learns the training data too closely, including its noise, so it performs well on training data but poorly on new data; it can be mitigated by techniques like cross-validation and regularization.
What is a confusion matrix and what are its components?
A table used to evaluate classification models; it includes true positives, true negatives, false positives, and false negatives.
Explain the difference between precision and recall.
Precision is the ratio of true positives to predicted positives; recall is the ratio of true positives to actual positives.
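To make the two ratios concrete, here is a minimal sketch (illustrative labels, not from any dataset) that derives the confusion-matrix counts from paired true/predicted labels and then computes precision and recall:

```python
# Illustrative sketch: confusion-matrix counts, then precision and recall,
# from paired true/predicted binary labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

precision = tp / (tp + fp)  # TP / predicted positives
recall = tp / (tp + fn)     # TP / actual positives
```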
What is cross-validation and why is it important?
A technique for assessing model performance by training and evaluating on different data splits; it helps detect overfitting and gives a more reliable estimate of generalization.
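The mechanics of k-fold splitting can be sketched in a few lines (a simple round-robin fold assignment, not how any particular library implements it): every sample is held out exactly once, and each fold takes a turn as the test set.

```python
# Sketch of k-fold splitting: each sample is held out exactly once,
# so every fold serves as the test set while the rest is used for training.
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for n samples and k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(10, 5))
```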
What is the purpose of regularization in machine learning models?
To reduce the model complexity and prevent overfitting.
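A small numpy sketch of L2 (ridge) regularization shows the shrinking effect directly: the closed form adds a penalty term to the normal equations, and a larger penalty pulls the weight vector toward zero (synthetic data, illustrative only).

```python
import numpy as np

# Sketch of L2 (ridge) regularization via the closed form
# w = (X^T X + lam * I)^-1 X^T y; a larger lam shrinks the weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_small = ridge(X, y, lam=0.01)   # light regularization
w_large = ridge(X, y, lam=100.0)  # heavy regularization, smaller weights
```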
How does the k-means clustering algorithm work?
It partitions data into k clusters by alternating between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, minimizing the within-cluster sum of squared distances.
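The two alternating steps can be sketched on 1-D toy data (a bare-bones Lloyd's algorithm, not a production implementation):

```python
# Toy sketch of Lloyd's algorithm for k-means on scalar data:
# alternate an assignment step (nearest centroid) and an update step
# (centroid moves to the mean of its assigned points).
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in points:                                   # assignment step
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        centroids = [sum(c) / len(c) if c else centroids[j]  # update step
                     for j, c in enumerate(clusters)]
    return centroids

centers = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], centroids=[0.0, 6.0])
```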
What is gradient descent and how is it used in machine learning?
An optimization algorithm used to minimize the loss function by iteratively updating the model’s parameters.
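The iterative update is easiest to see on a toy loss with a known minimum (a one-parameter sketch, not a real training loop):

```python
# Sketch of gradient descent on the quadratic loss L(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3); the minimum is at w = 3.
def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)   # dL/dw at the current w
        w = w - lr * grad    # step against the gradient
    return w

w_final = gradient_descent(w=0.0)  # converges toward 3
```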
Which of the following is an example of an ordinal feature?
Education Level (e.g., High School, Bachelor’s, Master’s)
What is the main difference between categorical and ordinal features?
Ordinal features have a meaningful order; nominal categorical features do not
Which of the following is an embedded method for feature selection?
LASSO regression
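The reason LASSO works as an embedded selector is that its L1 penalty sets some coefficients exactly to zero. The core mechanism, used in coordinate-descent solvers, is the soft-thresholding operator, sketched here on illustrative coefficient values:

```python
# Sketch of the soft-thresholding operator behind LASSO's sparsity:
# coefficients whose magnitude falls below the penalty become exactly
# zero, which is why LASSO doubles as an embedded feature selector.
def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

coefs = [2.5, -0.3, 0.1, -1.7]
shrunk = [soft_threshold(c, lam=0.5) for c in coefs]  # middle two -> 0.0
```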
Which of the following methods uses a model to evaluate subsets of features?
Wrapper
Why is overfitting a problem in machine learning?
The model does not generalize well to new data
What is the main objective of a Support Vector Machine?
To maximize the margin between classes
Which term is optimized in the SVM cost function?
Hinge loss with L2 regularization
What does the SVM hyperparameter C control?
The tradeoff between margin width and classification error
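The cost function and the role of C can be written out directly (a sketch of the primal soft-margin objective with hand-picked weights and toy points, not a trained model):

```python
# Sketch of the soft-margin SVM objective in primal form:
#   0.5 * ||w||^2  +  C * sum(max(0, 1 - y_i * (w . x_i + b)))
# Larger C penalizes margin violations more heavily (fewer errors,
# narrower margin); smaller C tolerates violations for a wider margin.
def svm_cost(w, b, X, y, C):
    l2 = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1 - yi * (sum(wi * xi for wi, xi in zip(w, x)) + b))
                for x, yi in zip(X, y))
    return l2 + C * hinge

X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]   # the third point violates the margin
cost_small_C = svm_cost([1.0, 0.0], 0.0, X, y, C=0.1)
cost_large_C = svm_cost([1.0, 0.0], 0.0, X, y, C=10.0)
```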
Why does SVM not output class probabilities?
It only provides a separating hyperplane and a signed distance to it, which is not a calibrated probability
What is a known disadvantage of decision trees?
They tend to overfit the data
What happens when a decision tree is trained on noisy data?
It fits the noise, which increases the model's variance
Why are ensemble methods like Random Forest used with decision trees?
To reduce variance and improve generalization
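The variance-reduction effect can be illustrated with a small simulation (noisy stand-in "trees" rather than real decision trees): averaging many noisy predictors of the same target varies far less than any single one.

```python
import random

# Simulation sketch of why averaging reduces variance: each "tree" is a
# noisy predictor of the same target; the ensemble mean fluctuates much
# less than an individual tree (the core idea behind Random Forest).
random.seed(0)
target = 10.0
n_trees, n_trials = 50, 200

single_preds, ensemble_preds = [], []
for _ in range(n_trials):
    trees = [target + random.gauss(0, 2.0) for _ in range(n_trees)]
    single_preds.append(trees[0])                 # one high-variance tree
    ensemble_preds.append(sum(trees) / n_trees)   # averaged ensemble

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```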
Why can’t decision trees extrapolate?
Their predictions are piecewise constant, limited to leaf values learned from the training data
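A one-split "stump" makes the limitation obvious (a hand-built toy, not a fitted tree): any input beyond the training range still lands in a boundary leaf and gets that leaf's constant value.

```python
# Sketch of why trees cannot extrapolate: a single split predicts the
# mean target of each leaf, so inputs far outside the training range
# still receive the nearest leaf's constant value.
def stump_predict(x, threshold=5.0, left_mean=1.0, right_mean=9.0):
    """Piecewise-constant prediction from a single split."""
    return left_mean if x < threshold else right_mean

in_range = stump_predict(6.0)     # falls in the right leaf
far_out = stump_predict(1000.0)   # same right-leaf constant, no trend
```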
Which parameter in Gradient Boosting controls learning rate?
`eta` or `learning_rate` in most APIs
What is the purpose of using subsamples in Stochastic Gradient Boosting?
To reduce variance and increase robustness
What happens when you decrease the learning rate in Gradient Boosting?
You need more trees to maintain accuracy
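The tradeoff can be sketched with a heavily simplified boosting loop (each "tree" just fits the current residual on a scalar target): a smaller learning rate shrinks each correction, so more rounds are needed to reach the same error.

```python
# Toy sketch of shrinkage in gradient boosting: each round a "tree" fits
# the current residual and is added scaled by the learning rate. A lower
# learning rate needs more rounds (trees) to reach the same accuracy.
def rounds_to_fit(target, lr, tol=1e-3):
    pred, rounds = 0.0, 0
    while abs(target - pred) > tol:
        residual = target - pred   # what the next "tree" must explain
        pred += lr * residual      # shrunken additive update
        rounds += 1
    return rounds

fast = rounds_to_fit(1.0, lr=0.3)   # fewer rounds
slow = rounds_to_fit(1.0, lr=0.05)  # many more rounds
```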