K-Means Flashcards

1
Q

Define monothetic cluster

A

Members have some common property

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Define polythetic cluster

A

Cluster members are similar to each other

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Define hard clustering

A

Clusters do not overlap

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Define soft clustering

A

Clusters may overlap

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Define flat clustering

A

set of clusters without any explicit structure that would relate clusters to each other

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Define hierarchical clustering

A

Produces a hierarchy of clusters

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

K-D trees, monothetich or polythetic?

A

Monothetic, few cuts describe each member of the region

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

K-D trees, hard or soft boundaries?

A

Hard

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

K-D trees, flat or hierarchical?

A

Hierarchical, we can cut the tree at any depth

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

K-means, monothetic or polythetic?

A

Polythetic

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

K-means, hard or soft boundaries?

A

Hard

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

K-means, flat or hierarchical?

A

Flat

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Gaussian mixtures, monothetic or polythetic?

A

Polythetic

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Gaussian mixtures, hard or soft boundaries?

A

Soft

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Gaussian mixtures, flat or hierarchical?

A

Flat

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Agglomerative clustering, monothetic or polythetic?

A

Polythetic

17
Q

Agglomerative clustering, hard or soft boundaries?

18
Q

Agglomerative clustering, flat or hierarchical?

A

Hierarchical

19
Q

What are some common use cases for K-means? (2)

A
  • Discovering classes (unsupervised)
  • Dimensionality reduction
20
Q

How can k-means be used for dimensionality reduction?

A

Run k-means, replace features with a cluster number

21
Q

K-means clustering algorithm

A
  1. Place centroids c1, …, ck randomly
  2. Repeat until convergence
    1. For each point xi
      1. Find the nearest centroid cj
      2. Assign point xi to cluster j
    2. For each cluster, make the centroid cj closest to all the data points in the cluster (for euclidian distances this is the mean)
22
Q

What does it mean that K-means converges to a local minimum?

A

Different starting points can produce difference clusters

23
Q

Whats the variance for K-means?

A

Sum of distance to every points centroid

24
Q

What is a scree plot?

25
How can you pick a good value of K for k-means? (2)
* class labels may suggest a value (digits recognition - 10 digits) * optimize V visually from a scree plot (where *mountain* ends and *rubble* begins)
26
How can we extrinsicly excaulate a clustering algorithm?
Use it as part of another problem (e.g. removing outliers for digit recognition), and see if it helped
27
How can we intrinsicly evaluate a clustering algorithm (with reference clusters)?
* Align reference clusters Rj with system produced clusters Ci * Measure the accuracy
28
How do we measure the accuracy in intrinsic clustering evaluation (with reference clusters)?
The sum of the overlapping instances in each pair of system and reference clusters over the number of training instances
29
How can we align a reference cluster with a system cluster?
* Pick the pair of reference and system clusters with the *maximum overlap* * *Greedly reassign clusters* until each reference cluster has a unique system cluster
30
How can we intrinsicly evaluate clusters with humans?
* Sample pairs * Ask humans if they should be in the same cluster * Count errors and compute accuracy
31
What is the advantage of intrinsic evaluation with humans vs reference clusters?
* Does not require cluster alignment strategy * Can handle overlapping classes
32
How can we use K-Means as part of image recognition?
* Split image into regions * Compute statistics of regions * Use K-Means to cluster regions * Put clusters together like bag-of-words { 4 \* "C27", 15 \* "C44", ... } * Use any algorithm on the flat representation
33
What does K-Means minimize?
the aggregate intra-cluster distance