K-Means Flashcards

1
Q

Define monothetic cluster

A

Members have some common property

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Define polythetic cluster

A

Cluster members are similar to each other

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Define hard clustering

A

Clusters do not overlap

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Define soft clustering

A

Clusters may overlap

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Define flat clustering

A

set of clusters without any explicit structure that would relate clusters to each other

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Define hierarchical clustering

A

Produces a hierarchy of clusters

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

K-D trees, monothetich or polythetic?

A

Monothetic, few cuts describe each member of the region

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

K-D trees, hard or soft boundaries?

A

Hard

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

K-D trees, flat or hierarchical?

A

Hierarchical, we can cut the tree at any depth

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

K-means, monothetic or polythetic?

A

Polythetic

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

K-means, hard or soft boundaries?

A

Hard

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

K-means, flat or hierarchical?

A

Flat

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Gaussian mixtures, monothetic or polythetic?

A

Polythetic

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Gaussian mixtures, hard or soft boundaries?

A

Soft

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Gaussian mixtures, flat or hierarchical?

A

Flat

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Agglomerative clustering, monothetic or polythetic?

A

Polythetic

17
Q

Agglomerative clustering, hard or soft boundaries?

A

Hard

18
Q

Agglomerative clustering, flat or hierarchical?

A

Hierarchical

19
Q

What are some common use cases for K-means? (2)

A
  • Discovering classes (unsupervised)
  • Dimensionality reduction
20
Q

How can k-means be used for dimensionality reduction?

A

Run k-means, replace features with a cluster number

21
Q

K-means clustering algorithm

A
  1. Place centroids c1, …, ck randomly
  2. Repeat until convergence
    1. For each point xi
      1. Find the nearest centroid cj
      2. Assign point xi to cluster j
    2. For each cluster, make the centroid cj closest to all the data points in the cluster (for euclidian distances this is the mean)
22
Q

What does it mean that K-means converges to a local minimum?

A

Different starting points can produce difference clusters

23
Q

Whats the variance for K-means?

A

Sum of distance to every points centroid

24
Q

What is a scree plot?

A
25
Q

How can you pick a good value of K for k-means? (2)

A
  • class labels may suggest a value (digits recognition - 10 digits)
  • optimize V visually from a scree plot (where mountain ends and rubble begins)
26
Q

How can we extrinsicly excaulate a clustering algorithm?

A

Use it as part of another problem (e.g. removing outliers for digit recognition), and see if it helped

27
Q

How can we intrinsicly evaluate a clustering algorithm (with reference clusters)?

A
  • Align reference clusters Rj with system produced clusters Ci
  • Measure the accuracy
28
Q

How do we measure the accuracy in intrinsic clustering evaluation (with reference clusters)?

A

The sum of the overlapping instances in each pair of system and reference clusters over the number of training instances

29
Q

How can we align a reference cluster with a system cluster?

A
  • Pick the pair of reference and system clusters with the maximum overlap
  • Greedly reassign clusters until each reference cluster has a unique system cluster
30
Q

How can we intrinsicly evaluate clusters with humans?

A
  • Sample pairs
  • Ask humans if they should be in the same cluster
  • Count errors and compute accuracy
31
Q

What is the advantage of intrinsic evaluation with humans vs reference clusters?

A
  • Does not require cluster alignment strategy
  • Can handle overlapping classes
32
Q

How can we use K-Means as part of image recognition?

A
  • Split image into regions
  • Compute statistics of regions
  • Use K-Means to cluster regions
  • Put clusters together like bag-of-words { 4 * “C27”, 15 * “C44”, … }
  • Use any algorithm on the flat representation
33
Q

What does K-Means minimize?

A

the aggregate intra-cluster distance