K-Means Flashcards
Define monothetic cluster
Members have some common property
Define polythetic cluster
Cluster members are similar to each other
Define hard clustering
Clusters do not overlap
Define soft clustering
Clusters may overlap
Define flat clustering
set of clusters without any explicit structure that would relate clusters to each other
Define hierarchical clustering
Produces a hierarchy of clusters
K-D trees, monothetich or polythetic?
Monothetic, few cuts describe each member of the region
K-D trees, hard or soft boundaries?
Hard
K-D trees, flat or hierarchical?
Hierarchical, we can cut the tree at any depth
K-means, monothetic or polythetic?
Polythetic
K-means, hard or soft boundaries?
Hard
K-means, flat or hierarchical?
Flat
Gaussian mixtures, monothetic or polythetic?
Polythetic
Gaussian mixtures, hard or soft boundaries?
Soft
Gaussian mixtures, flat or hierarchical?
Flat
Agglomerative clustering, monothetic or polythetic?
Polythetic
Agglomerative clustering, hard or soft boundaries?
Hard
Agglomerative clustering, flat or hierarchical?
Hierarchical
What are some common use cases for K-means? (2)
- Discovering classes (unsupervised)
- Dimensionality reduction
How can k-means be used for dimensionality reduction?
Run k-means, replace features with a cluster number
K-means clustering algorithm
- Place centroids c1, …, ck randomly
- Repeat until convergence
- For each point xi
- Find the nearest centroid cj
- Assign point xi to cluster j
- For each cluster, make the centroid cj closest to all the data points in the cluster (for euclidian distances this is the mean)
- For each point xi
What does it mean that K-means converges to a local minimum?
Different starting points can produce difference clusters
Whats the variance for K-means?
Sum of distance to every points centroid
What is a scree plot?