Hierarchical Clustering Flashcards
(29 cards)
What is a key limitation of K-Means that alternative clustering methods aim to address?
It requires knowing the number of clusters (K) in advance.
What does hierarchical clustering produce instead of a flat set of clusters?
A hierarchy or tree of nested clusters (dendrogram).
What are the two main types of hierarchical clustering?
Agglomerative (bottom-up) and divisive (top-down).
How does agglomerative clustering work?
It starts with each point as its own cluster and merges the closest clusters iteratively.
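The bottom-up merge loop can be sketched in a few lines of plain Python. This is a naive illustration rather than an optimized or library implementation: the function names, the sample points, and the use of the closest-pair (single linkage) distance are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def cluster_distance(a, b):
    """Distance between two clusters: closest pair of points (single linkage)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters):
    """Naive bottom-up clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []                       # merge history, the information a dendrogram draws
    while len(clusters) > n_clusters:
        # find the pair of clusters with the smallest distance between them
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], cluster_distance(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical sample data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=2)
for left, right, d in merges:
    print(f"merged {left} + {right} at distance {d:.2f}")
```

Each entry in `merges` records which two clusters were joined and at what distance, which is the nesting a dendrogram displays.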
How does divisive clustering work?
It starts with all points in one cluster and recursively splits it into sub-clusters.
What is single linkage in hierarchical clustering?
The distance between two clusters is the shortest distance between any pair of points, one from each cluster.
What is complete linkage in hierarchical clustering?
The distance between two clusters is the longest distance between any pair of points, one from each cluster.
What is average linkage in hierarchical clustering?
The distance between two clusters is the average distance over all pairs of points, one from each cluster.
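To compare the three linkage criteria side by side, here is a small sketch that computes each one for the same pair of clusters. The helper names and the sample clusters are made up for the example, and Euclidean distance is assumed.

```python
from math import dist


def pairwise_distances(a, b):
    """All distances between a point in cluster a and a point in cluster b."""
    return [dist(p, q) for p in a for q in b]


def single_linkage(a, b):
    return min(pairwise_distances(a, b))   # closest pair


def complete_linkage(a, b):
    return max(pairwise_distances(a, b))   # farthest pair


def average_linkage(a, b):
    d = pairwise_distances(a, b)
    return sum(d) / len(d)                 # mean over all cross-cluster pairs


# Hypothetical clusters on a line; the cross-cluster distances are 3, 5, 2, 4.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # 2.0
print(complete_linkage(cluster_a, cluster_b))  # 5.0
print(average_linkage(cluster_a, cluster_b))   # 3.5
```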
What is a key advantage of agglomerative clustering?
It is easier to implement and allows flexible linkage criteria.
What is a challenge of divisive clustering?
It can be computationally expensive and complex to implement.
What type of data structure is formed by hierarchical clustering?
A dendrogram showing nested groupings of data.
What is DBSCAN designed to detect that K-Means struggles with?
Clusters of arbitrary shape and noise/outliers.
What does DBSCAN stand for?
Density-Based Spatial Clustering of Applications with Noise.
What does ε (epsilon) represent in DBSCAN?
The radius of the local neighborhood for density estimation.
What does MinPts represent in DBSCAN?
The minimum number of points that must lie within a point's ε-neighborhood for it to count as a core point, that is, for the region to be considered dense.
What is a core point in DBSCAN?
A point with at least MinPts points within its ε-neighborhood (conventionally counting the point itself).
What is a border point in DBSCAN?
A point within ε of a core point but with fewer than MinPts neighbors itself.
What is a noise point in DBSCAN?
A point that is neither a core nor a border point; considered an outlier.
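The core, border, and noise definitions can be checked with a brute-force sketch. It assumes Euclidean distance and that a point counts as a member of its own ε-neighborhood. The function name and sample points are hypothetical.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' by brute-force neighbor counting."""
    # eps-neighborhood of every point, including the point itself (an assumed convention)
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but within eps of a core point
        else:
            labels.append("noise")   # neither core nor reachable from a core point
    return labels


# Hypothetical data: a tight square, one point on its edge, one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.2, 0.5), (5.0, 5.0)]
labels = classify_points(points, eps=0.8, min_pts=4)
print(labels)  # ['core', 'core', 'core', 'core', 'border', 'noise']
```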
How are clusters formed in DBSCAN?
By connecting core points and including their reachable border points.
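A minimal sketch of that cluster-growing step is shown below: start from an unvisited core point, then keep expanding through the neighborhoods of any further core points reached. It is a brute-force O(n²) illustration, not a library implementation. The names `dbscan` and `NOISE` and the sample data are assumptions, and a point is counted in its own neighborhood.

```python
from collections import deque
from math import dist

NOISE = -1  # label used here for points that end up in no cluster


def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...) or NOISE."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # provisional: may be relabeled as a border point later
            continue
        labels[i] = cluster_id          # i is a core point: start a new cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is also core: keep growing through it
        cluster_id += 1
    return labels


# Hypothetical data: two square blobs and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0),
    (4.5, 20.0),
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never passed in. It falls out of how many separate dense regions the expansion finds.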
Why is DBSCAN good at handling outliers?
Because it labels sparse points as noise and doesn’t force them into clusters.
What is one drawback of DBSCAN?
It struggles with datasets that have varying densities.
What happens if ε is too small in DBSCAN?
Too many points are labeled as noise or clusters are fragmented.
What happens if ε is too large in DBSCAN?
Distinct clusters may merge into one, and genuine outliers may be absorbed into clusters instead of being labeled as noise.
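The effect of ε can be seen by sweeping it over a toy dataset and counting clusters and noise points. The sketch below counts clusters as connected groups of core points. The function name, the data, and the choice of MinPts = 3 (counting the point itself) are assumptions for illustration.

```python
from math import dist


def dbscan_summary(points, eps, min_pts):
    """Return (number of clusters, number of noise points) for the given settings."""
    n = len(points)
    nbrs = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(nbrs[i]) >= min_pts}
    # each connected group of core points (linked through eps-neighborhoods) is one cluster
    seen, n_clusters = set(), 0
    for i in sorted(core):
        if i in seen:
            continue
        n_clusters += 1
        seen.add(i)
        stack = [i]
        while stack:
            u = stack.pop()
            for v in nbrs[u]:
                if v in core and v not in seen:
                    seen.add(v)
                    stack.append(v)
    # noise: not core and not within eps of any core point
    n_noise = sum(
        1 for i in range(n)
        if i not in core and not any(j in core for j in nbrs[i])
    )
    return n_clusters, n_noise


# Hypothetical data on a line: two groups of three points and one far outlier.
points = [(x, 0.0) for x in (0.0, 1.0, 2.0, 6.0, 7.0, 8.0, 20.0)]
results = {eps: dbscan_summary(points, eps, min_pts=3) for eps in (0.5, 1.0, 4.0, 12.0)}
for eps, (n_clusters, n_noise) in results.items():
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

On this data the smallest ε labels every point as noise, ε = 1.0 recovers the two groups with the outlier as noise, ε = 4.0 merges the groups into one cluster, and ε = 12.0 also pulls the outlier into that cluster.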
Does DBSCAN require you to specify the number of clusters?
No, DBSCAN determines the number of clusters automatically.