K means clustering Flashcards by ROWAN Gomanee

What type of learning is K-Means?

Unsupervised learning.

What is the goal of clustering?

To group similar data points based on structure or distance.

What does K in K-Means refer to?

The number of clusters you want to divide your data into.

What is the main assumption of K-Means about clusters?

That clusters are roughly spherical and separable by distance.

What does each cluster in K-Means have?

A centroid, representing the mean of points in that cluster.

How does K-Means assign points to clusters?

By assigning each point to the nearest centroid.
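
The assignment step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `assign_points` and the sample data are hypothetical, and Euclidean distance is assumed.

```python
import math

def assign_points(points, centroids):
    """Return, for each point, the index of its nearest centroid.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    labels = []
    for p in points:
        # Distance from this point to every centroid
        distances = [math.dist(p, c) for c in centroids]
        # Index of the smallest distance (ties go to the lower index)
        labels.append(distances.index(min(distances)))
    return labels

# Made-up sample data: two points near each centroid
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]
print(assign_points(points, centroids))  # [0, 0, 1, 1]
```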

What does K-Means aim to minimize?

The total within-cluster sum of squares (WCSS).

What is the formula for the K-Means objective?

min Σₖ Σᵢ∈Cₖ ‖xᵢ - μₖ‖²
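
For a fixed assignment, the objective above can be evaluated directly. The sketch below assumes points and centroids are tuples of equal length; the name `wcss` and the sample data are hypothetical.

```python
def wcss(points, labels, centroids):
    """Within-cluster sum of squares: sum over points of ||x_i - mu_k||^2.

    Hypothetical helper; labels[i] is the cluster index k of points[i].
    """
    total = 0.0
    for p, k in zip(points, labels):
        # Squared Euclidean distance from the point to its own centroid
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[k]))
    return total

# Made-up data: every point sits exactly 1 unit from its centroid
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 0.0), (11.0, 0.0)]
print(wcss(points, labels, centroids))  # 4.0
```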

What is a centroid in K-Means?

The mean of all points assigned to a cluster.

How is Euclidean distance computed between two 2D points?

d = √((x₂ - x₁)² + (y₂ - y₁)²)
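
As a quick check of the formula, here is a small Python sketch. The function name `euclidean_2d` is hypothetical; the standard library's `math.dist` computes the same quantity for points of any dimension.

```python
import math

def euclidean_2d(p, q):
    """Euclidean distance between two 2D points given as (x, y) pairs."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

# The classic 3-4-5 right triangle
print(euclidean_2d((0, 0), (3, 4)))  # 5.0
```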

What are the steps of the K-Means algorithm?

Initialize centroids, assign points, update centroids, repeat.

When does K-Means stop iterating?

When point assignments no longer change or a max iteration limit is reached.
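
Putting the steps and the stopping rule together, a basic K-Means loop might look like the sketch below. It is an illustration under stated assumptions, not a production implementation: the function name `kmeans`, its parameters, and the sample data are hypothetical; initial centroids are drawn at random from the data; and an empty cluster simply keeps its previous centroid.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-Means sketch. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize: pick k distinct data points as starting centroids
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):  # stop at the iteration limit at the latest
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop when no assignment changed since the previous iteration
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its points
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Made-up data: two well-separated groups of three points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.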

What shape do K-Means decision boundaries form?

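Voronoi cells: each region contains the points closer to one centroid than to any other, so the boundaries between clusters are straight (linear).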

What is the complexity of K-Means per iteration?

O(NKD), where N = data points, K = clusters, D = dimensions.

Why can K-Means produce different results on different runs?

Because it uses random initialization of centroids.

What is K-Means++?

A smarter initialization method that spreads out initial centroids.

What problem does K-Means++ solve?

Slow convergence, or convergence to a poor local minimum, caused by bad initial centroid placement.
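
One common way to describe K-Means++ seeding is: pick the first centroid uniformly at random, then pick each further centroid with probability proportional to its squared distance from the nearest centroid chosen so far. The sketch below follows that description. The function name `kmeans_pp_init` and the sample data are hypothetical, and the code assumes the data contains at least `k` distinct points.

```python
import math
import random

def kmeans_pp_init(points, k, seed=0):
    """K-Means++ style seeding sketch. Returns k initial centroids."""
    rng = random.Random(seed)
    # First centroid: uniform random choice from the data
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Far-away points get a larger weight; already-chosen points have
        # weight 0 and so cannot be picked again
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Made-up data: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_pp_init(points, k=2))
```

Because distant points carry much larger weights, the second centroid is very likely, though not guaranteed, to come from the other group.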

What is the elbow method used for in K-Means?

To choose the number of clusters K: plot the cost (WCSS) against K and pick the point where adding more clusters stops producing large drops in cost.
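
To see the elbow on a toy example, one can run K-Means for several values of K and record the final WCSS each time. The sketch below bundles a basic K-Means with random initialization into a hypothetical helper, `kmeans_wcss`; the data is made up, and a real analysis would typically use several restarts per K.

```python
import math
import random

def kmeans_wcss(points, k, max_iter=100, seed=0):
    """Run a basic K-Means (hypothetical helper) and return the final WCSS."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step (an empty cluster keeps its old centroid)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Cost: squared distance from each point to its assigned centroid
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

# Made-up data with two obvious groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
costs = {k: kmeans_wcss(points, k) for k in range(1, 5)}
for k, cost in costs.items():
    print(k, round(cost, 3))
```

On this data the cost falls sharply from K = 1 to K = 2 and can only improve slightly after that, which is the "elbow" suggesting K = 2.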

What is the silhouette score used for?

To evaluate how well a point fits in its cluster versus the next closest one.

What does a high silhouette score indicate?

That the point is well-clustered and far from other clusters.

What does a silhouette score near zero mean?

The point lies on the boundary between two clusters.

What does a negative silhouette score indicate?

The point is likely in the wrong cluster.
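
The silhouette cards above can be made concrete with the usual definition s = (b - a) / max(a, b), where a is the mean distance from the point to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below assumes Euclidean distance and at least two clusters; the function name `silhouette` and the sample data are hypothetical.

```python
import math

def silhouette(i, points, labels):
    """Silhouette score of points[i] (hypothetical helper)."""
    own = labels[i]
    # a: mean distance to the other members of the point's own cluster
    same = [math.dist(points[i], p)
            for j, p in enumerate(points) if labels[j] == own and j != i]
    if not same:
        return 0.0  # common convention for a cluster with a single point
    a = sum(same) / len(same)
    # b: smallest mean distance to the members of any other cluster
    b = float("inf")
    for k in set(labels) - {own}:
        other = [math.dist(points[i], p)
                 for p, l in zip(points, labels) if l == k]
        b = min(b, sum(other) / len(other))
    return (b - a) / max(a, b)

# Made-up data: two tight pairs, far apart
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = silhouette(0, points, [0, 0, 1, 1])  # point 0 grouped with its neighbor
bad = silhouette(0, points, [1, 0, 1, 1])   # point 0 grouped with the far pair
print(good > 0, bad < 0)  # True True
```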

Why is exhaustive clustering infeasible?

Because the number of possible clusterings grows combinatorially with data size.

What are Stirling numbers used for in this context?

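To count the number of ways to partition data into non-empty clusters: the Stirling number of the second kind, S(N, K), is the number of ways to split N points into K non-empty clusters.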
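
Stirling numbers of the second kind satisfy the recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1), which makes the combinatorial growth easy to check numerically. The function name `stirling2` below is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n items into k non-empty groups."""
    if n == k:
        return 1  # includes S(0, 0) = 1: every item in its own group
    if k == 0 or k > n:
        return 0  # no valid partition
    # The n-th item either joins one of the k existing groups
    # or forms a new group on its own
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(4, 2))   # 7
print(stirling2(10, 3))  # 9330
```

Even 10 points and 3 clusters give 9330 candidate clusterings, so checking every partition quickly becomes impractical as the data grows.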

What does the union of all clusters in K-Means cover?

All the data points in the dataset.

Why is K-Means considered a 'hard' clustering method?

Each point is assigned to exactly one cluster, with no membership probabilities or partial assignments.

What happens in the update step of K-Means?

Each centroid is updated to the mean of its assigned points.
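
The update step is a per-cluster, per-coordinate average. In the sketch below, the function name `update_centroids` and the sample data are hypothetical, and an empty cluster is represented by `None`, which is just one possible convention.

```python
def update_centroids(points, labels, k):
    """Return the mean of the points assigned to each of the k clusters."""
    centroids = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        if not members:
            centroids.append(None)  # assumption: an empty cluster has no mean
            continue
        # Average each coordinate across the cluster's members
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids

# Made-up data: two points in cluster 0, one point in cluster 1
points = [(1, 1), (3, 3), (10, 0)]
labels = [0, 0, 1]
print(update_centroids(points, labels, k=2))  # [(2.0, 2.0), (10.0, 0.0)]
```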

K means clustering Flashcards

(27 cards)