ML Part 4 Flashcards
(15 cards)
What is clustering?
An unsupervised learning task to group similar data points together.
What is k-means clustering?
An algorithm that partitions data into k clusters by repeatedly assigning each point to its nearest centroid and then moving each centroid to the mean of its assigned points.
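A minimal pure-Python sketch of the standard k-means procedure (Lloyd's algorithm): alternate an assignment step and an update step until no point changes cluster. The initial centroids are passed in by hand, an assumption made here for determinism; libraries such as scikit-learn's `KMeans` default to k-means++ seeding instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm starting from the given initial centroids."""
    centroids = [tuple(c) for c in init_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
centroids, labels = kmeans(points, init_centroids=[(0, 0), (5, 5)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, which is why practical implementations use careful seeding and sometimes several restarts.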
What is the elbow method in k-means?
A method to choose k by plotting inertia against k and picking the ‘elbow’: the point after which adding more clusters yields only small further reductions in inertia.
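The elbow is normally read off the plot by eye. The sketch below computes inertia for k = 1 to 4 on a small hypothetical dataset (three tight, well-separated squares of points) and then replaces the eye with a simple heuristic, the k with the largest second difference of the inertia curve. That heuristic and the deterministic farthest-point seeding are both assumptions for illustration, not part of the method's definition.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the final inertia."""
    centroids = farthest_point_init(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = mean(members)
    # Inertia: sum of squared distances from each point to its assigned centroid.
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))


def elbow(inertias):
    """inertias[i] is the inertia for k = i + 1. Return the k with the largest
    second difference (sharpest bend), a heuristic stand-in for eyeballing the plot."""
    bends = {k: inertias[k - 2] - 2 * inertias[k - 1] + inertias[k]
             for k in range(2, len(inertias))}
    return max(bends, key=bends.get)


# Hypothetical data: three tight 2x2 squares of points, far apart.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]
inertias = [kmeans_inertia(points, k) for k in range(1, 5)]
print([round(v, 2) for v in inertias])
print("elbow at k =", elbow(inertias))  # elbow at k = 3
```

Inertia always shrinks as k grows, so the minimum is never the answer; the useful signal is where the drop flattens out.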
What is inertia in k-means?
The sum of squared distances from each data point to its closest centroid (also called the within-cluster sum of squares); lower values mean tighter clusters.
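Written as a formula, with data points $x_1, \dots, x_n$ and centroids $\mu_1, \dots, \mu_k$ (notation introduced here, not in the original card):

```latex
\text{inertia} = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2
```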
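What are the limitations of k-means?
It assumes roughly spherical, similarly sized clusters, requires specifying k in advance, and is sensitive to centroid initialization, feature scaling, and outliers.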
What is a Support Vector Machine (SVM)?
A supervised learning algorithm that finds the hyperplane separating the classes with the maximum margin.
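A minimal sketch of training a linear soft-margin SVM in its primal form, assuming labels in {-1, +1}: full-batch subgradient descent on the regularized hinge loss. The regularization strength `lam`, learning rate `lr`, epoch count, and toy data are arbitrary illustrative choices; practical code would normally use a dedicated solver such as scikit-learn's `SVC` or `LinearSVC`.

```python
def decision(w, b, x):
    """Signed score w . x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the soft-margin objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # inside the margin: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


X = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
predictions = [1 if decision(w, b, x) >= 0 else -1 for x in X]
print(predictions)  # matches y on this separable toy set
```

The regularizer keeps ||w|| small (a wide margin), while the hinge term penalizes points that fall inside the margin; `lam` trades one against the other.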
What is a support vector?
A training point that lies on the margin (or, with a soft margin, inside it or on the wrong side); these points alone determine the decision boundary and the margin.
What is the margin in SVM?
The distance between the decision boundary and the nearest training points of either class; SVM training maximizes it.
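Given a linear SVM's weights w and bias b in the usual canonical scaling (the nearest points satisfy y(w·x + b) = 1), support vectors and the margin can be read off directly. The toy data and the solution w = (1, 0), b = 0 are a hypothetical example worked out by hand (the closest opposite-class points are (-1, 0) and (1, 0)), not the output of a solver.

```python
import math


def decision(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def support_vectors(X, y, w, b, tol=1e-9):
    # Points with y * f(x) <= 1 lie on the margin (or violate it, in the soft-margin case).
    return [x for x, yi in zip(X, y) if yi * decision(w, b, x) <= 1 + tol]


def margin(X, y, w, b):
    # Geometric distance from the hyperplane to the nearest training point.
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(yi * decision(w, b, x) for x, yi in zip(X, y)) / norm


X = [(-2, 0), (-1, 0), (1, 0), (3, 1)]
y = [-1, -1, 1, 1]
w, b = (1.0, 0.0), 0.0  # hand-derived hard-margin solution for this toy set
print(support_vectors(X, y, w, b))  # [(-1, 0), (1, 0)]
print(margin(X, y, w, b))           # 1.0
```

In canonical scaling the margin equals 1/||w||, so the full width between the two margin hyperplanes is 2/||w||; maximizing the margin is therefore the same as minimizing ||w||.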
What is the kernel trick?
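A technique that uses a kernel function to compute inner products in a high-dimensional feature space without explicitly transforming the data, so a linear separator in that space gives a non-linear boundary in the original space.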
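A small numeric check of the idea for 2-D inputs: the degree-2 polynomial kernel k(x, z) = (x · z)² equals the inner product of the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²), yet the kernel never builds φ. The specific vectors are arbitrary.

```python
import math


def phi(x):
    # Explicit degree-2 feature map for 2-D input (no bias or linear terms).
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    # Same inner product, computed entirely in the original 2-D space.
    return (x[0] * z[0] + x[1] * z[1]) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
implicit = poly2_kernel(x, z)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(implicit, explicit, math.isclose(implicit, explicit))
```

For higher degrees or the RBF kernel the explicit feature space grows very large (infinite-dimensional for RBF), which is why evaluating the kernel directly matters.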
What are common kernel types in SVM?
Linear, polynomial, RBF (Gaussian).
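The three kernels as plain functions. The formulas and the parameter names `gamma`, `coef0`, and `degree` follow the conventions in scikit-learn's SVC documentation; the default values here are illustrative and are not scikit-learn's defaults.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    # k(x, z) = <x, z>
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    # k(x, z) = (gamma * <x, z> + coef0) ** degree
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # k(x, z) = exp(-gamma * ||x - z||^2); equals 1 when x == z
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


print(linear_kernel((1, 2), (3, 4)))              # 11
print(polynomial_kernel((1, 2), (3, 4)))          # 1728.0
print(rbf_kernel((0, 0), (1, 1), gamma=0.5))      # exp(-1), about 0.368
```

A larger `gamma` makes the RBF kernel more local, so each support vector influences a smaller region and the boundary can bend more sharply.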
What is cross-entropy loss?
A classification loss that measures the difference between predicted probabilities and true labels: the average negative log-probability the model assigns to the true class.
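A minimal sketch of multiclass cross-entropy, assuming each prediction is already a probability distribution over classes (e.g. a softmax output) and labels are class indices. The clipping constant `eps` is an illustrative guard against log(0).

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Mean of -log p(true class) over samples.
    probs: list of per-sample probability lists; labels: true class indices."""
    total = 0.0
    for p, k in zip(probs, labels):
        total -= math.log(max(p[k], eps))  # clip to avoid log(0)
    return total / len(labels)


print(cross_entropy([[0.5, 0.5]], [0]))                 # ln 2, about 0.693
print(cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1]))  # about 0.164
```

A confident correct prediction costs almost nothing, while a confident wrong one (true-class probability near 0) is penalized heavily, which is what drives the model toward calibrated probabilities.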
What is ROC AUC?
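Area under the ROC curve (true positive rate vs. false positive rate across all thresholds); it equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative.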
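ROC AUC can be computed from its ranking interpretation: compare every positive score with every negative score and count the fraction of pairs ranked correctly, with ties counting one half. This O(P·N) pairwise sketch assumes binary labels in {0, 1}; sorting-based methods give the same value faster.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because only the ordering of scores matters, AUC is unaffected by any monotonic rescaling of the model's outputs.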
What is the precision-recall tradeoff?
Raising the decision threshold typically increases precision but lowers recall; lowering it does the opposite.
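Sweeping the threshold on a small hypothetical set of scores shows the tradeoff directly. A sample is predicted positive when its score is at or above the threshold. When nothing is predicted positive, precision is undefined; this sketch returns 1.0 by convention, which is an assumption (libraries handle that case differently).

```python
def precision_recall_at(labels, scores, threshold):
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention for "no positives predicted"
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for t in (0.3, 0.5):
    print(t, precision_recall_at(labels, scores, t))
# 0.3 -> precision 2/3, recall 1.0; 0.5 -> precision 1.0, recall 0.5
```

Precision is not guaranteed to rise at every threshold step (it can dip when a new positive prediction is a false positive), which is why the tradeoff holds only "typically".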
What is a learning curve?
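A plot of training and validation performance as a function of training set size, used to diagnose underfitting (both scores low) or overfitting (a large gap between them).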
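A self-contained sketch of the numbers behind a learning curve, assuming a hypothetical 1-D two-class dataset (class means 0 and 2, unit variance) and a nearest-centroid classifier chosen only for brevity. scikit-learn's `sklearn.model_selection.learning_curve` computes the same kind of scores with cross-validation.

```python
import random


def make_data(n, seed):
    # Hypothetical data: class 0 ~ N(0, 1), class 1 ~ N(2, 1); labels alternate
    # so every even-length prefix contains both classes.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append(rng.gauss(2.0 * label, 1.0))
        y.append(label)
    return X, y


def fit_centroids(X, y):
    # Nearest-centroid classifier: one mean per class.
    return {c: sum(x for x, l in zip(X, y) if l == c) / y.count(c) for c in set(y)}


def accuracy(centroids, X, y):
    preds = [min(centroids, key=lambda c: abs(x - centroids[c])) for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def learning_curve(X_train, y_train, X_val, y_val, sizes):
    train_scores, val_scores = [], []
    for n in sizes:
        model = fit_centroids(X_train[:n], y_train[:n])  # train on the first n points
        train_scores.append(accuracy(model, X_train[:n], y_train[:n]))
        val_scores.append(accuracy(model, X_val, y_val))
    return train_scores, val_scores


X_tr, y_tr = make_data(200, seed=0)
X_va, y_va = make_data(200, seed=1)
sizes = [2, 10, 50, 200]
tr, va = learning_curve(X_tr, y_tr, X_va, y_va, sizes)
for n, a, b in zip(sizes, tr, va):
    print(f"n={n:3d}  train={a:.2f}  val={b:.2f}")
```

With very little data the training score is inflated (two points are fit perfectly) and the validation score is unreliable; as n grows the two curves typically approach each other.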
What is a validation curve?
A plot of training and validation performance as a function of a single hyperparameter's value.
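A sketch of a validation curve, assuming the same kind of hypothetical 1-D two-class data and using k in a k-nearest-neighbors classifier as the hyperparameter. scikit-learn's `sklearn.model_selection.validation_curve` performs this sweep with cross-validation.

```python
import random


def make_data(n, seed):
    # Hypothetical data: class 0 ~ N(0, 1), class 1 ~ N(2, 1); labels alternate.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append(rng.gauss(2.0 * label, 1.0))
        y.append(label)
    return X, y


def knn_predict(X_train, y_train, x, k):
    # Majority vote among the k nearest training points (odd k avoids ties).
    nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))[:k]
    votes = sum(y_train[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(X_train, y_train, X, y, k):
    return sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X, y)) / len(y)


def validation_curve(X_train, y_train, X_val, y_val, ks):
    train_scores = [accuracy(X_train, y_train, X_train, y_train, k) for k in ks]
    val_scores = [accuracy(X_train, y_train, X_val, y_val, k) for k in ks]
    return train_scores, val_scores


X_tr, y_tr = make_data(100, seed=0)
X_va, y_va = make_data(100, seed=1)
ks = [1, 5, 15, 51]
tr, va = validation_curve(X_tr, y_tr, X_va, y_va, ks)
for k, a, b in zip(ks, tr, va):
    print(f"k={k:2d}  train={a:.2f}  val={b:.2f}")
```

At k = 1 each training point is its own nearest neighbor, so training accuracy is perfect (overfitting); very large k oversmooths. The value with the best validation score is the one to pick.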