Hierarchical Clustering Flashcards

(29 cards)

1
Q

What key limitation of K-Means do alternative clustering methods aim to address?

A

It requires knowing the number of clusters (K) in advance.

2
Q

What does hierarchical clustering produce instead of a flat set of clusters?

A

A hierarchy or tree of nested clusters (dendrogram).

3
Q

What are the two main types of hierarchical clustering?

A

Agglomerative (bottom-up) and divisive (top-down).

4
Q

How does agglomerative clustering work?

A

It starts with each point as its own cluster and repeatedly merges the two closest clusters (under the chosen linkage) until a single cluster remains.

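A minimal sketch of the bottom-up procedure in Python, assuming Euclidean distance and single linkage by default. The function name `agglomerative` and the sample points are illustrative, not from any particular library; the naive loop rescans every pair of clusters on each merge, so it is only meant for tiny inputs.

```python
import math
from itertools import combinations


def agglomerative(points, linkage=min):
    """Naive bottom-up clustering. `linkage` reduces the point-to-point
    distances between two clusters to one number (min = single linkage)."""
    # Start with every point in its own cluster (a tuple of point indices).
    clusters = [(i,) for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, merge_distance), in merge order
    while len(clusters) > 1:
        # Scan every pair of current clusters for the closest one.
        best = None
        for a, b in combinations(clusters, 2):
            d = linkage(math.dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Replace the two closest clusters with their union.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (10.5, 0.0)]
for a, b, d in agglomerative(points):
    print(a, "+", b, f"at distance {d:.2f}")
# (0,) + (1,) at distance 1.00
# (2,) + (3,) at distance 1.50
# (0, 1) + (2, 3) at distance 5.00
# (4,) + (0, 1, 2, 3) at distance 5.50
```

Passing `linkage=max` gives complete linkage, and `linkage=statistics.fmean` gives average linkage. The merge list, read in order, is exactly the information a dendrogram draws.
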
5
Q

How does divisive clustering work?

A

It starts with all points in one cluster and recursively splits it into sub-clusters.

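A rough top-down counterpart, again in Python. Established divisive methods such as DIANA use more careful splitting rules; the `split_in_two` rule here (seed the split with the two farthest-apart points of a cluster, then send every point to the nearer seed) is a simplifying assumption chosen for brevity, and both function names are hypothetical.

```python
import math
from itertools import combinations


def split_in_two(points, idx):
    # Simplified split rule (an assumption, not DIANA): seed with the two
    # farthest-apart points and send every point to the nearer seed.
    s, t = max(combinations(idx, 2),
               key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))
    left = [i for i in idx
            if math.dist(points[i], points[s]) <= math.dist(points[i], points[t])]
    right = [i for i in idx if i not in left]
    return left, right


def divisive(points, idx=None):
    """Top-down clustering: start with one cluster holding every point and
    split recursively; returns the hierarchy as nested lists of indices."""
    if idx is None:
        idx = list(range(len(points)))  # all points in a single cluster
    if len(idx) == 1:
        return idx                      # a singleton cannot be split
    left, right = split_in_two(points, idx)
    if not right:                       # duplicate points: no meaningful split
        return idx
    return [divisive(points, left), divisive(points, right)]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (10.5, 0.0)]
print(divisive(points))
# [[[[0], [1]], [[2], [3]]], [4]]
```

The first split isolates the far-away point 4, and later splits separate the two tight pairs, mirroring the agglomerative merges in reverse order.
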
6
Q

What is single linkage in hierarchical clustering?

A

The distance between two clusters is the shortest distance between any point in one cluster and any point in the other.

7
Q

What is complete linkage in hierarchical clustering?

A

The distance between two clusters is the longest distance between any point in one cluster and any point in the other.

8
Q

What is average linkage in hierarchical clustering?

A

The distance between two clusters is the average of the distances between all pairs of points, one from each cluster.

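Single, complete and average linkage differ only in how they summarise the pairwise distances between two clusters. A small Python sketch, assuming Euclidean distance; the helper names and the two example clusters are made up for illustration.

```python
import math
from statistics import fmean


def pairwise(a, b):
    """Euclidean distance for every pair (p, q) with p in cluster a, q in cluster b."""
    return [math.dist(p, q) for p in a for q in b]


def single_linkage(a, b):
    return min(pairwise(a, b))    # closest pair of points


def complete_linkage(a, b):
    return max(pairwise(a, b))    # farthest pair of points


def average_linkage(a, b):
    return fmean(pairwise(a, b))  # mean over all len(a) * len(b) pairs


A = [(0.0, 0.0), (0.0, 3.0)]
B = [(4.0, 0.0), (8.0, 0.0)]
# Pairwise distances: 4, 8, 5 and sqrt(73) ~ 8.54
print(f"single:   {single_linkage(A, B):.2f}")    # 4.00
print(f"complete: {complete_linkage(A, B):.2f}")  # 8.54
print(f"average:  {average_linkage(A, B):.2f}")   # 6.39
```

Average linkage always falls between the other two, which is one reason it is often used as a compromise between single linkage's tendency to chain and complete linkage's preference for compact clusters.
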
9
Q

What is a key advantage of agglomerative clustering?

A

Compared with divisive clustering, it is simpler to implement, and the linkage criterion (single, complete, average, and others) can be chosen to suit the data.

10
Q

What is a challenge of divisive clustering?

A

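Choosing the best split is expensive: a cluster of n points can be divided into two non-empty groups in 2^(n−1) − 1 ways, so practical methods rely on heuristics and are more complex to implement.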

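To see why exhaustive divisive splitting is expensive, count the candidate two-way splits of a cluster. This brute-force Python check compares the count with the closed form 2^(n−1) − 1; for a cluster of just 50 points that is already about 5.6 × 10^14 candidates. The function name is illustrative.

```python
from itertools import combinations


def count_binary_splits(n):
    """Brute force: number of ways to split n labelled points into two
    non-empty groups, where {left, right} and {right, left} are the same split."""
    items = range(n)
    splits = set()
    for k in range(1, n):
        for left in combinations(items, k):
            right = tuple(i for i in items if i not in left)
            splits.add(frozenset([left, right]))  # unordered pair of groups
    return len(splits)


for n in range(2, 9):
    count = count_binary_splits(n)
    assert count == 2 ** (n - 1) - 1  # closed form
    print(n, count)
# 2 1
# 3 3
# 4 7
# 5 15
# 6 31
# 7 63
# 8 127
```
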
11
Q

What type of data structure is formed by hierarchical clustering?

A

A dendrogram showing nested groupings of data.

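Cutting a dendrogram at a chosen height turns the nested hierarchy into a flat clustering, and raising the cut only ever merges existing groups. For single linkage the cut has a convenient equivalent: two points share a cluster exactly when a chain of pairwise distances, each at most the cut height h, connects them. The Python sketch below uses that equivalence with a union-find; it applies to single linkage only, and the function name and points are hypothetical.

```python
import math
from itertools import combinations


def cut_single_linkage(points, h):
    """Flat clusters from cutting a single-linkage dendrogram at height h."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component's root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Any pair no farther apart than h belongs to the same flat cluster.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= h:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (10.5, 0.0)]
for h in (0.5, 1.2, 2.0, 5.2, 6.0):
    print(h, cut_single_linkage(points, h))
# 0.5 [[0], [1], [2], [3], [4]]
# 1.2 [[0, 1], [2], [3], [4]]
# 2.0 [[0, 1], [2, 3], [4]]
# 5.2 [[0, 1, 2, 3], [4]]
# 6.0 [[0, 1, 2, 3, 4]]
```

Each row is a union of groups from the row above, which is the nesting the dendrogram encodes.
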
12
Q

What is DBSCAN designed to detect that K-Means struggles with?

A

Clusters of arbitrary shape and noise/outliers.

13
Q

What does DBSCAN stand for?

A

Density-Based Spatial Clustering of Applications with Noise.

14
Q

What does ε (epsilon) represent in DBSCAN?

A

The radius of the local neighborhood for density estimation.

15
Q

What does MinPts represent in DBSCAN?

A

The minimum number of points required to form a dense region (a core point).

16
Q

What is a core point in DBSCAN?

A

A point that has at least MinPts points (usually counting itself) within its ε-neighborhood.

17
Q

What is a border point in DBSCAN?

A

A point that lies within ε of a core point but is not a core point itself, because its own ε-neighborhood holds fewer than MinPts points.

18
Q

What is a noise point in DBSCAN?

A

A point that is neither a core nor a border point; considered an outlier.
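
The three point types can be computed directly from the definitions above. A small Python sketch, assuming Euclidean distance, a closed neighborhood (distance ≤ ε) and the common convention that a point counts toward its own neighborhood; `classify_points` and the sample data are hypothetical.

```python
import math


def classify_points(points, eps, min_pts):
    """Return 'core', 'border' or 'noise' for each point.
    Assumes the eps-neighbourhood is closed (<= eps) and includes the point itself."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours[i]):
            labels.append("border")  # not dense itself, but within eps of a core point
        else:
            labels.append("noise")   # neither core nor border: an outlier
    return labels


points = [(0, 0), (1, 0), (0, 1), (1, 1), (2.2, 0), (9, 9)]
print(classify_points(points, eps=1.5, min_pts=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```
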

19
Q

How are clusters formed in DBSCAN?

A

Core points that lie within ε of each other are chained into the same cluster, and each border point joins the cluster of a core point in its ε-neighborhood.
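
A minimal O(n²) DBSCAN sketch in Python, assuming Euclidean distance and a closed ε-neighborhood that includes the point itself. It grows each cluster outward from an unvisited core point with a breadth-first search; border points are labeled but never expanded. The names and data are illustrative. The L-shaped chain in the example is not a convex blob, yet it comes out as one cluster.

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN: returns a cluster id (0, 1, ...) per point, or -1 for noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue  # already assigned, or not dense enough to start a cluster
        labels[i] = cluster
        queue = deque([i])
        while queue:  # breadth-first expansion through density-connected points
            p = queue.popleft()
            if not is_core[p]:
                continue  # border point: joins the cluster but does not expand it
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    # Points never reached from any core point are noise.
    return [NOISE if lab is None else lab for lab in labels]


# An L-shaped chain, a compact square and one isolated point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (10, 10), (10, 11), (11, 10), (11, 11), (20, 0)]
print(dbscan(points, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A border point within ε of core points from two different clusters goes to whichever cluster reaches it first, so border assignments can depend on point order; core points and noise do not.
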

20
Q

Why is DBSCAN good at handling outliers?

A

Because it labels sparse points as noise and doesn’t force them into clusters.

21
Q

What is one drawback of DBSCAN?

A

It struggles when clusters have very different densities, because a single ε and MinPts setting cannot suit both sparse and dense regions.

22
Q

What happens if ε is too small in DBSCAN?

A

Too many points are labeled as noise or clusters are fragmented.

23
Q

What happens if ε is too large in DBSCAN?

A

Distinct clusters may merge (in the extreme, into a single cluster), and true outliers can be absorbed instead of being labeled as noise.
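
The effect of ε is easy to see by sweeping it over a toy dataset. The compact Python sketch below counts DBSCAN clusters as connected groups of core points and counts noise points; it assumes Euclidean distance and a closed ε-neighborhood that includes the point itself, and the function name and data are hypothetical.

```python
import math


def dbscan_summary(points, eps, min_pts):
    """Return (number of clusters, number of noise points) for a DBSCAN run."""
    n = len(points)
    nb = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
          for i in range(n)]
    core = [len(x) >= min_pts for x in nb]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    # Core points within eps of each other end up in the same cluster.
    for i in range(n):
        for j in nb[i]:
            if core[i] and core[j]:
                parent[find(i)] = find(j)
    clusters = {find(i) for i in range(n) if core[i]}
    # Noise: not a core point and not within eps of any core point.
    noise = sum(1 for i in range(n)
                if not core[i] and not any(core[j] for j in nb[i]))
    return len(clusters), noise


# Two 2x2 squares three units apart, plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (4, 0), (4, 1), (5, 0), (5, 1),
          (2.5, 5)]
for eps in (0.5, 1.5, 3.5, 5.0):
    print(eps, dbscan_summary(points, eps, min_pts=3))
# 0.5 (0, 9)   too small: every point is noise
# 1.5 (2, 1)   two clusters and one outlier
# 3.5 (1, 1)   too large: the two squares merge
# 5.0 (1, 0)   larger still: the outlier is absorbed
```
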

24
Q

Does DBSCAN require you to specify the number of clusters?

A

No, DBSCAN determines the number of clusters automatically.

25
Q

How does DBSCAN handle non-convex cluster shapes?

A

It connects dense regions, allowing arbitrary cluster shapes.

26
Q

Which clustering algorithm is sensitive to initialization?

A

K-Means.

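
A small Python illustration of that sensitivity, using plain Lloyd's iterations on six 1-D points with K = 3. Both sets of starting centroids are hand-picked assumptions: one start converges to the three natural pairs, the other stalls in a poor local optimum with a much higher within-cluster sum of squares.

```python
def kmeans_1d(xs, centroids, max_iter=100):
    """Lloyd's algorithm in 1-D from the given starting centroids; returns the groups."""
    centroids = [float(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in centroids]
        for x in xs:
            nearest = min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            groups[nearest].append(x)
        # Update step: move each centroid to its group's mean (keep it if the group is empty).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break  # converged
        centroids = updated
    return groups


def sse(groups):
    """Within-cluster sum of squared deviations from each group's mean."""
    return sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups if g)


xs = [0, 1, 10, 11, 20, 21]
good = kmeans_1d(xs, [0, 10, 20])
bad = kmeans_1d(xs, [5, 20, 21])
print(good, sse(good))  # [[0, 1], [10, 11], [20, 21]] 1.5
print(bad, sse(bad))    # [[0, 1, 10, 11], [20], [21]] 101.0
```

This is why K-Means is usually run from several starts, or with a seeding scheme such as k-means++, and the lowest-SSE result kept.
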
27
Q

What does a dendrogram represent in hierarchical clustering?

A

A tree showing how clusters are merged or split at different levels.

28
Q

Which hierarchical method is better suited to finding fine-grained, small-scale structure in the data?

A

Agglomerative hierarchical clustering, since it builds clusters up from individual points.

29
Q

What is one major limitation of hierarchical clustering on large datasets?

A

It scales poorly: naive agglomerative implementations take O(N³) time and need O(N²) memory for the pairwise distance matrix.
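
Where the cubic cost comes from, assuming the naive scheme that stores cluster-to-cluster distances in a matrix, updates it after each merge, and rescans every pair of current clusters to find the next merge: with k clusters left, one scan costs k(k − 1)/2 comparisons, and k runs from N down to 2.

```latex
\sum_{k=2}^{N} \binom{k}{2}
  = \binom{N+1}{3}
  = \frac{(N+1)\,N\,(N-1)}{6}
  = \Theta\!\left(N^{3}\right)
```

The matrix itself holds N(N − 1)/2 distances, hence the O(N²) memory. For N = 10,000 that is about 5 × 10^7 stored distances and roughly 1.7 × 10^11 pair comparisons.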