Hierarchical Clustering Flashcards

1
Q

What are the 2 stratagies for hierarchical structuring

A

Top-down: start with one cluster, recursivly split

Bottom-up: start with singleton clusters, merge

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

How does aglomerative clustering work?

A
  1. Start with a cluster for each point
  2. Find the pair of clusters that are clostest
  3. Merge the clusters
  4. Repeat until 1 cluster
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a hierarchical tree of clusters diagram called?

A

Dendrogram

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Whats the runtime complexity of aglormerative clustering?

A

O(n2d + n3)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is single link cluster distance measure?

A

Distance between clostest points

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is complete link cluster distance measure?

A

Distance between farthest elements in cluster

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the problem with single link?

A

Produces long chains (often we want spherical clumps)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What does complete link prefer (cluster shape)?

A

Spherical clusters with constant diameter

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is average link cluster distance measure?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is centroid cluster distance measure?

A

Distance between centroids of clusters

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is Ward’s method cluster distance measure?

A

The total distance from all points from the centroid formed if the two clusters where merged

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the Lance-Williams algorithm?

A
  1. Compute D, the distance matrix between each point (singleton centroid)
  2. For N iterations
    1. Pick smallest Di,j
    2. Delete Di and Dj and add Di+j
    3. For each k != i or j update the distance (where alpha and beta are constants based on the cluster distance measure used)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly