Hierarchical clustering Flashcards by ROWAN Gomanee

What is a key limitation of K-Means that clustering alternatives aim to solve?

It requires knowing the number of clusters (K) in advance.

What does hierarchical clustering produce instead of a flat set of clusters?

A hierarchy or tree of nested clusters (dendrogram).

What are the two main types of hierarchical clustering?

Agglomerative (bottom-up) and divisive (top-down).

How does agglomerative clustering work?

It starts with each point as its own cluster and merges the closest clusters iteratively.
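The merge loop described above can be sketched in a few lines. This is a minimal, naive illustration that assumes Euclidean distance and single linkage; the function name `agglomerative`, the `n_clusters` stopping rule, and the sample points are all made up for the example, not taken from any library.

```python
from itertools import combinations
from math import dist


def agglomerative(points, n_clusters):
    """Naive bottom-up clustering with single linkage (illustrative only)."""
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # history of (cluster_a, cluster_b, merge_distance)

    def single_link(a, b):
        # Shortest distance between any pair of points across the two clusters.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that are currently closest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: single_link(clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((clusters[a], clusters[b], single_link(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges


# Hypothetical sample data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=2)
```

Recomputing every pairwise cluster distance on each pass is what makes naive versions slow; practical implementations cache a distance matrix instead.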

How does divisive clustering work?

It starts with all points in one cluster and recursively splits it into sub-clusters.

What is single linkage in hierarchical clustering?

A linkage criterion that defines the distance between two clusters as the shortest distance between any pair of points, one from each cluster.

What is complete linkage in hierarchical clustering?

A linkage criterion that defines the distance between two clusters as the longest distance between any pair of points, one from each cluster.

What is average linkage in hierarchical clustering?

A linkage criterion that defines the distance between two clusters as the average distance over all pairs of points, one from each cluster.
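The three linkage criteria differ only in how they reduce the same set of cross-cluster pairwise distances. A small sketch, assuming Euclidean distance; the helper names and the two sample clusters are hypothetical.

```python
from math import dist


def pairwise(a, b):
    # All distances between a point in cluster a and a point in cluster b.
    return [dist(p, q) for p in a for q in b]


def single_linkage(a, b):
    return min(pairwise(a, b))


def complete_linkage(a, b):
    return max(pairwise(a, b))


def average_linkage(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)


# Hypothetical clusters on a line; cross distances are 4, 6, 3 and 5.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

single = single_linkage(cluster_a, cluster_b)      # 3.0
complete = complete_linkage(cluster_a, cluster_b)  # 6.0
average = average_linkage(cluster_a, cluster_b)    # 4.5
```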

What is a key advantage of agglomerative clustering?

It is easier to implement than divisive clustering and allows flexible linkage criteria.

What is a challenge of divisive clustering?

It can be computationally expensive and complex to implement.

What type of data structure is formed by hierarchical clustering?

A dendrogram showing nested groupings of data.

What is DBSCAN designed to detect that K-Means struggles with?

Clusters of arbitrary shape and noise/outliers.

What does DBSCAN stand for?

Density-Based Spatial Clustering of Applications with Noise.

What does ε (epsilon) represent in DBSCAN?

The radius of the local neighborhood for density estimation.

What does MinPts represent in DBSCAN?

The minimum number of points that must lie within a point's ε-neighborhood for that point to count as a core point, i.e. for the region around it to be considered dense.

What is a core point in DBSCAN?

A point that has at least MinPts points within its ε-neighborhood (the count usually includes the point itself).

What is a border point in DBSCAN?

A point within ε of a core point but with fewer than MinPts neighbors itself.

What is a noise point in DBSCAN?

A point that is neither a core nor a border point; considered an outlier.
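The core, border, and noise definitions can be checked directly on a toy dataset. This sketch assumes Euclidean distance and that a point counts as a member of its own ε-neighborhood; `classify_points` and the sample points are hypothetical names and data.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' (illustrative only)."""
    # Neighborhoods include the point itself (an assumed, common convention).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps] for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}

    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            # Not dense itself, but within eps of some core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Hypothetical data: a tight group, one point on its edge, one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0), (5.0, 5.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
# labels == ['core', 'core', 'core', 'border', 'noise']
```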

How are clusters formed in DBSCAN?

By connecting core points and including their reachable border points.
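One way to picture the cluster-forming step is as a breadth-first expansion from each unvisited core point. The following is a minimal sketch under stated assumptions (Euclidean distance, brute-force neighbor search, a point counted in its own neighborhood); the `dbscan` function and `NOISE` constant here are written for illustration and are not a library API.

```python
from collections import deque
from math import dist

NOISE = -1  # label for points that end up in no cluster


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one cluster label per point."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [NOISE] * n
    cluster_id = 0

    for i in range(n):
        # Only an unassigned core point can seed a new cluster.
        if not is_core[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster_id
                    # Core points keep expanding; border points are
                    # included in the cluster but not expanded further.
                    if is_core[q]:
                        queue.append(q)
        cluster_id += 1
    return labels


# Hypothetical data: two dense squares and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 50),
]
labels = dbscan(points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never passed in; it falls out of how many separate dense regions the expansion finds.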

Why is DBSCAN good at handling outliers?

Because it labels sparse points as noise and doesn’t force them into clusters.

What is one drawback of DBSCAN?

It struggles with datasets that have varying densities.

What happens if ε is too small in DBSCAN?

Too many points are labeled as noise or clusters are fragmented.

What happens if ε is too large in DBSCAN?

Distinct clusters may merge into one, and genuine outliers may be absorbed into clusters instead of being labeled as noise.

Does DBSCAN require you to specify the number of clusters?

No, DBSCAN determines the number of clusters automatically.

How does DBSCAN handle non-convex cluster shapes?

It connects dense regions, allowing arbitrary cluster shapes.

Which clustering algorithm is sensitive to initialization?

K-Means.

What does a dendrogram represent in hierarchical clustering?

A tree showing how clusters are merged or split at different levels.
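Reading a dendrogram at a chosen height amounts to applying only the merges that happened at or below that height. The sketch below assumes a merge history stored as `(point_a, point_b, distance)` triples, where each point stands in for the cluster it belongs to; that encoding, the `cut_dendrogram` name, and the sample history are assumptions made for illustration.

```python
def cut_dendrogram(n_points, merges, height):
    """Flat clusters obtained by applying only merges at or below `height`."""
    parent = list(range(n_points))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, d in merges:
        if d <= height:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical merge history for five points, in order of increasing distance.
merges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 6.4), (3, 4, 7.1)]

low_cut = cut_dendrogram(5, merges, height=2.0)   # [[0, 1], [2, 3], [4]]
high_cut = cut_dendrogram(5, merges, height=7.0)  # [[0, 1, 2, 3], [4]]
```

Cutting lower yields more, smaller clusters; cutting higher yields fewer, larger ones, which is why no cluster count has to be fixed before the tree is built.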

Which method is better suited for finding small details in data structure?

Agglomerative hierarchical clustering, because it builds clusters bottom-up from individual points and so captures local structure first.

What is one major limitation of hierarchical clustering for large datasets?

It has high computational cost: naive agglomerative implementations take O(N³) time, and storing the pairwise distance matrix takes O(N²) memory.

(29 cards)