Chapter 9 Flashcards

(26 cards)

1
Q

What is the main limitation of supervised learning mentioned in the presentation?

A

It requires labeled data, which is often unavailable or expensive to obtain.

2
Q

What is the goal of unsupervised learning?

A

To analyze data without labels and discover hidden structures like clusters or anomalies.

3
Q

What are common applications of clustering?

A

Customer segmentation, data analysis, anomaly detection, semi-supervised learning, and image segmentation.

4
Q

What is the K-Means clustering algorithm?

A

An algorithm that partitions data into k clusters by minimizing the distance between instances and cluster centroids.
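
A minimal usage sketch, assuming scikit-learn's KMeans (the dataset here is synthetic):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, random_state=42)  # toy data

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)      # cluster index assigned to each instance
print(kmeans.cluster_centers_)      # the 5 centroids that were found
```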

5
Q

What are the main steps in the K-Means algorithm?

A

Randomly place centroids, assign points to nearest centroid, compute new centroids, repeat until convergence.
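
A hedged NumPy sketch of that loop (illustrative only; empty clusters and other edge cases are not handled):

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random placement
    for _ in range(n_iters):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return centroids, labels
```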

6
Q

What does ‘inertia’ measure in K-Means?

A

The mean squared distance between each instance and the nearest cluster centroid.
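
In scikit-learn (assumed here), a fitted model exposes this value through its inertia_ attribute, which scikit-learn computes as the sum, rather than the mean, of the squared distances to the closest centroid:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, random_state=42)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
print(kmeans.inertia_)  # sum of squared distances to the closest centroid
```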

7
Q

What is K-Means++ initialization?

A

A method that chooses initial centroids that are far apart to improve clustering performance.
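
In scikit-learn this corresponds to the init hyperparameter; "k-means++" is the default, shown here next to purely random initialization for contrast:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, random_state=42)

plus_plus = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42).fit(X)
random_init = KMeans(n_clusters=5, init="random", n_init=10, random_state=42).fit(X)
print(plus_plus.inertia_, random_init.inertia_)  # lower inertia is better
```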

8
Q

What is the difference between hard and soft clustering?

A

Hard clustering assigns each instance to one cluster; soft clustering assigns a score or probability for each cluster.
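
With scikit-learn's KMeans (assumed here), predict() gives the hard assignment and transform() gives a soft score, namely the distance to every centroid:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, random_state=42)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)

print(kmeans.predict(X[:3]))    # hard: one cluster index per instance
print(kmeans.transform(X[:3]))  # soft: distance from each instance to all 5 centroids
```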

9
Q

How can you speed up K-Means on large datasets?

A

Use Mini-Batch K-Means, which updates centroids using small random subsets of the data.
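
A sketch using scikit-learn's MiniBatchKMeans (the batch size is an arbitrary illustrative choice):

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=5, random_state=42)

# Centroids are nudged using small random batches instead of the full dataset.
mbk = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=3, random_state=42)
labels = mbk.fit_predict(X)
print(mbk.inertia_)
```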

10
Q

What is the elbow method in K-Means?

A

A technique to determine the optimal number of clusters by identifying where inertia stops decreasing significantly.
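
A minimal sketch: fit one model per candidate k and look for the "elbow" where inertia stops dropping sharply (usually judged from a plot):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, random_state=42)

inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, 10)]
print(inertias)  # plot against k and pick the elbow
```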

11
Q

What is the silhouette score?

A

A metric that measures how similar an instance is to its own cluster compared to other clusters, ranging from -1 to +1.
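
With scikit-learn (assumed here), silhouette_score takes the data and the cluster labels; it needs at least two clusters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=5, random_state=42)

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(k, silhouette_score(X, labels))  # closer to +1 is better
```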

12
Q

What are the limitations of K-Means?

A

It performs poorly when clusters have varying sizes, varying densities, or non-spherical shapes.

13
Q

How is clustering used for image segmentation?

A

By grouping pixels with similar colors into clusters to separate regions in an image.
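
A hedged sketch of color-based segmentation: every pixel's RGB value becomes a 3-D point, the pixels are clustered, and each pixel is repainted with its cluster's centroid color (a random array stands in for a real image):

```python
import numpy as np
from sklearn.cluster import KMeans

image = np.random.rand(64, 64, 3)      # placeholder for a (height, width, 3) image
pixels = image.reshape(-1, 3)          # one row per pixel

kmeans = KMeans(n_clusters=8, n_init=10, random_state=42).fit(pixels)
segmented = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
```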

14
Q

How can clustering improve supervised learning?

A

By reducing dimensionality or generating features like distances to cluster centroids.
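
A sketch of the second idea, assuming scikit-learn: KMeans sits in a pipeline as a transformer, so each instance is replaced by its distances to the k centroids before it reaches the classifier (the digits dataset and k=50 are only illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipeline = make_pipeline(
    KMeans(n_clusters=50, n_init=10, random_state=42),  # 64 pixel features -> 50 distances
    LogisticRegression(max_iter=5000),
)
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))
```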

15
Q

What is semi-supervised learning with clustering?

A

Using a few labeled instances to label entire clusters, or propagating their labels to nearby points.
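
A hedged sketch of the first approach: cluster the unlabeled data, hand-label only the instance closest to each centroid, then propagate that label to the whole cluster (digits is just an example dataset):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

k = 50
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
dists = kmeans.fit_transform(X)                # distance of every instance to every centroid

representative_idx = dists.argmin(axis=0)      # the instance closest to each centroid
representative_labels = y[representative_idx]  # the only k labels we actually need

y_propagated = representative_labels[kmeans.labels_]  # spread each label to its whole cluster
```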

16
Q

What is DBSCAN?

A

A density-based clustering algorithm that groups together points in high-density regions and marks outliers.
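
A minimal sketch with scikit-learn's DBSCAN on a non-spherical toy dataset:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=1000, noise=0.05, random_state=42)

dbscan = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(dbscan.labels_[:10])               # -1 marks instances labeled as outliers
print(len(dbscan.core_sample_indices_))  # how many core points were found
```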

17
Q

What are the key parameters in DBSCAN?

A

eps (neighborhood radius) and min_samples (minimum number of neighbors to form a core point).

18
Q

What is a core point in DBSCAN?

A

A point with at least min_samples neighbors within its eps-radius.

19
Q

What is the main advantage of DBSCAN over K-Means?

A

DBSCAN can find clusters of arbitrary shapes and automatically detect outliers.

20
Q

What is a Gaussian Mixture Model (GMM)?

A

A probabilistic model assuming data is generated from a mixture of several Gaussian distributions.
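
A sketch assuming scikit-learn's GaussianMixture (it also illustrates the hard vs. soft assignments asked about in the next card):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

gm = GaussianMixture(n_components=3, n_init=10, random_state=42).fit(X)
print(gm.means_)                # one Gaussian mean per component
print(gm.predict(X[:3]))        # hard assignment: most likely component
print(gm.predict_proba(X[:3]))  # soft assignment: probability per component
```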

21
Q

How does GMM differ from K-Means?

A

GMM uses soft clustering with probability distributions, while K-Means uses hard assignments based on distance.

22
Q

What are the shapes that GMM can model?

A

Spherical, diagonal, or tied clusters (tied meaning all clusters share the same shape, size, and orientation).
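
In scikit-learn these constraints are set through the covariance_type hyperparameter; "full" (unconstrained ellipsoids) is the default:

```python
from sklearn.mixture import GaussianMixture

gm_spherical = GaussianMixture(n_components=3, covariance_type="spherical")
gm_diag = GaussianMixture(n_components=3, covariance_type="diag")
gm_tied = GaussianMixture(n_components=3, covariance_type="tied")
gm_full = GaussianMixture(n_components=3, covariance_type="full")  # default
```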

23
Q

How is anomaly detection performed using GMM?

A

By identifying points that fall in low-density regions of the Gaussian distribution.
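
A sketch: score the log density of every instance and flag the lowest-density ones (the 2% cutoff here is an arbitrary illustration, not a rule):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=42)
gm = GaussianMixture(n_components=3, n_init=10, random_state=42).fit(X)

densities = gm.score_samples(X)           # log density at each instance
threshold = np.percentile(densities, 2)   # bottom 2% treated as anomalies
anomalies = X[densities < threshold]
```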

24
Q

What criteria can be used to select the number of GMM components?

A

Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC).
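
A sketch: fit one model per candidate number of components and keep the one with the lowest BIC (or AIC):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

for k in range(1, 8):
    gm = GaussianMixture(n_components=k, n_init=10, random_state=42).fit(X)
    print(k, gm.bic(X), gm.aic(X))  # lower is better for both criteria
```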

25
Q

What other clustering algorithms are mentioned in the presentation?

A

Agglomerative clustering, BIRCH, Mean-Shift, Affinity Propagation, and Spectral Clustering.

26
Q

What unsupervised algorithms are used for anomaly detection?

A

PCA, Fast-MCD, Isolation Forest, Local Outlier Factor, and One-Class SVM.