K-Means Clustering Flashcards

(27 cards)

1
Q

What type of learning is K-Means?

A

Unsupervised learning.

2
Q

What is the goal of clustering?

A

To group similar data points based on structure or distance.

3
Q

What does K in K-Means refer to?

A

The number of clusters you want to divide your data into.

4
Q

What is the main assumption of K-Means about clusters?

A

That clusters are roughly spherical, similar in size, and separable by Euclidean distance.

5
Q

What does each cluster in K-Means have?

A

A centroid, representing the mean of points in that cluster.

6
Q

How does K-Means assign points to clusters?

A

By assigning each point to the nearest centroid.

7
Q

What does K-Means aim to minimize?

A

The total within-cluster sum of squares (WCSS).

8
Q

What is the formula for the K-Means objective?

A

min Σₖ Σᵢ∈Cₖ ‖xᵢ - μₖ‖²
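
As an illustration, the objective can be evaluated for a given assignment with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `wcss` and the toy data are assumptions made for the example.

```python
def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point x_i
    to the centroid mu_k of the cluster it is assigned to."""
    total = 0.0
    for x, k in zip(points, labels):
        mu = centroids[k]
        total += sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return total


# Hypothetical toy data: two clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]

print(wcss(points, labels, centroids))  # 4.0 (each point is at distance 1 from its centroid)
```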

9
Q

What is a centroid in K-Means?

A

The mean of all points assigned to a cluster.

10
Q

How is Euclidean distance computed between two 2D points?

A

√((x₁ - y₁)² + (x₂ - y₂)²)
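
A quick way to check the formula is to code it directly. The sketch below assumes points are given as 2-element tuples; the helper name `euclidean_2d` is made up for this example.

```python
import math


def euclidean_2d(x, y):
    """Euclidean distance between 2D points x = (x1, x2) and y = (y1, y2)."""
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)


print(euclidean_2d((0, 0), (3, 4)))  # 5.0 (the 3-4-5 right triangle)
```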

11
Q

What are the steps of the K-Means algorithm?

A

Initialize centroids, assign points, update centroids, repeat.
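
These four steps can be sketched in plain Python as follows. This is one possible rendering under stated assumptions: centroids are initialized by sampling k data points at random, an empty cluster keeps its previous centroid, and the names `kmeans`, `squared_distance`, and `mean` are chosen for the example.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize centroids (assumption: k randomly chosen data points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean(members)
        # Step 4: repeat (next loop iteration).
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

Which cluster gets index 0 depends on the random initialization, so only the grouping, not the label numbering, is meaningful.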

12
Q

When does K-Means stop iterating?

A

When point assignments no longer change or a max iteration limit is reached.

13
Q

What shape do K-Means decision boundaries form?

A

Linear boundaries that partition the space into Voronoi cells, one cell per centroid.

14
Q

What is the complexity of K-Means per iteration?

A

O(NKD), where N = data points, K = clusters, D = dimensions.

15
Q

Why can K-Means produce different results on different runs?

A

Because it uses random initialization of centroids.

16
Q

What is K-Means++?

A

A smarter initialization method that spreads out initial centroids.
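
A minimal sketch of the K-Means++ seeding idea: the first centroid is picked uniformly at random, and each later centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. The function name `kmeans_pp_init` is an assumption for this example, and the sketch assumes the data contains at least k distinct points.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    rng = random.Random(seed)
    # First centroid: uniformly at random from the data.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        # Far-away points get proportionally higher sampling weight;
        # already chosen points have weight 0 and cannot be picked again.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


# Hypothetical toy data.
pts = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
print(kmeans_pp_init(pts, k=2))
```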

17
Q

What problem does K-Means++ solve?

A

Convergence to a poor local optimum (or slow convergence) caused by bad initial centroid placement.

18
Q

What is the elbow method used for in K-Means?

A

To choose the number of clusters by finding the point where adding more clusters stops reducing the WCSS substantially.

19
Q

What is the silhouette score used for?

A

To evaluate how well a point fits in its cluster versus the next closest one.

20
Q

What does a high silhouette score indicate?

A

That the point is well-clustered and far from other clusters.

21
Q

What does a silhouette score near zero mean?

A

The point lies on the boundary between two clusters.

22
Q

What does a negative silhouette score indicate?

A

The point is likely in the wrong cluster.
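
The silhouette score of a point is commonly defined as s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster. The sketch below follows that definition; the function name `silhouette` and the toy data are assumptions, and it expects at least two clusters and Python 3.8+ for `math.dist`.

```python
import math


def silhouette(i, points, labels):
    """Silhouette score of points[i] under the given cluster labels."""
    own = labels[i]
    if labels.count(own) == 1:
        return 0.0  # common convention for a single-point cluster

    def mean_dist(cluster):
        members = [p for j, (p, lab) in enumerate(zip(points, labels))
                   if lab == cluster and j != i]
        return sum(math.dist(points[i], p) for p in members) / len(members)

    a = mean_dist(own)  # cohesion: own cluster
    b = min(mean_dist(c) for c in set(labels) if c != own)  # nearest other cluster
    return (b - a) / max(a, b)


# Hypothetical toy data on a line.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]

good = silhouette(0, points, [0, 0, 1, 1])  # well-clustered point
bad = silhouette(3, points, [0, 0, 1, 0])   # point placed in the wrong cluster
print(good, bad)  # positive and close to 1, then negative
```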

23
Q

Why is exhaustive clustering infeasible?

A

Because the number of possible clusterings grows combinatorially with data size.

24
Q

What are Stirling numbers used for in this context?

A

Stirling numbers of the second kind, S(N, K), count the ways to partition N data points into K non-empty clusters.
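
To get a feel for how fast this count grows, the Stirling numbers of the second kind can be computed with the standard recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1). The function name `stirling2` is chosen for this sketch.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n items into k non-empty clusters."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # The n-th item either joins one of the k existing clusters
    # or forms a new cluster on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


print(stirling2(4, 2))    # 7
print(stirling2(10, 3))   # 9330
print(len(str(stirling2(100, 5))))  # digit count for only 100 points and 5 clusters
```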

25
Q

What does the union of all clusters in K-Means cover?

A

All the data points in the dataset.

26
Q

Why is K-Means considered a 'hard' clustering method?

A

Each point is assigned to exactly one cluster with no uncertainty.

27
Q

What happens in the update step of K-Means?

A

Each centroid is updated to the mean of its assigned points.