IFN580 Week 8 Clustering Flashcards
(13 cards)
What is clustering?
An unsupervised ML technique that groups similar data points together
K-means terminates when?
When the cluster centres for the current iteration are identical to the previous iteration
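Example (a minimal sketch of the stopping rule, not a production implementation): the function name kmeans_1d, the 1-D data and the starting centres below are all made up for illustration. The max_iter safety cap is also an assumption added here.

```python
def kmeans_1d(points, centres, max_iter=100):
    """Tiny 1-D K-means that stops when the centres stop changing."""
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster
        # (an empty cluster keeps its old centre).
        new_centres = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        # Termination: centres identical to the previous iteration.
        if new_centres == centres:
            return centres, iteration
        centres = new_centres
    return centres, max_iter


# Illustrative data: two obvious groups, deliberately poor starting centres.
centres, iterations = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0])
print(centres, iterations)
```

The last iteration changes nothing; it only confirms that the centres are the same as in the previous pass.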
What is hierarchical clustering?
Building a hierarchy of clusters, represented by a dendrogram
How are hierarchical clusters built?
by splitting or merging clusters based on distance between them
Which clustering algorithm initially assumes that each instance represents a single cluster?
Agglomerative
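Example (a rough sketch of bottom-up merging, assuming 1-D points and single linkage, i.e. cluster distance = distance between the two closest members): the function name agglomerative_1d and the data are hypothetical. Other linkage rules (complete, average) would change only the distance line.

```python
def agglomerative_1d(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    # Start: every instance is its own cluster.
    clusters = [[p] for p in points]
    merges = []  # record of (left cluster, right cluster, distance)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


clusters, merges = agglomerative_1d([1.0, 2.0, 6.0, 7.0, 20.0], 2)
print(clusters)
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

The recorded merges, in order, are the information a dendrogram draws: which clusters were joined and at what distance.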
List some of the distance measures that are used in clustering.
Manhattan Distance
Euclidean Distance
Minkowski Distance
Dot Product
Cosine Similarity
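Example (a small sketch of the measures listed above, written with the standard library only; the function names and sample vectors are chosen here for illustration). Manhattan and Euclidean are the Minkowski distance with p = 1 and p = 2 respectively.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Sum of absolute differences (Minkowski with p = 1)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes neither is all zeros)."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


a, b = (0.0, 0.0), (3.0, 4.0)
print(manhattan(a, b), euclidean(a, b), minkowski(a, b, 3))
print(dot_product((1.0, 2.0), (3.0, 4.0)))
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))
```

Note that dot product and cosine similarity measure similarity rather than distance, so higher values mean closer points.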
How can the distance between missing values be computed?
Assume they are maximally distant
Unsupervised evaluation can be internal or external. Which of the following is an internal method for evaluating alternative clusterings produced by the K-Means algorithm?
Compare the sum of squared errors (SSE) of the alternative clusterings
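Example (a minimal sketch, assuming 1-D points and that the error of a point is its distance to the mean of its own cluster; the function name sse and both clusterings are made up for illustration):

```python
def sse(clusters):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        centre = sum(cluster) / len(cluster)
        total += sum((p - centre) ** 2 for p in cluster)
    return total


# Two alternative clusterings of the same six points, both with k = 2.
clustering_a = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
clustering_b = [[1.0, 2.0, 3.0, 10.0], [11.0, 12.0]]

sse_a = sse(clustering_a)
sse_b = sse(clustering_b)
print(sse_a, sse_b)  # the lower SSE indicates the tighter clustering
```

This is an internal method because it uses only the data and the cluster assignments, with no ground-truth labels.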
What is the range for silhouette scores?
between -1 and 1
What are silhouette scores?
A measure of the quality of a clustering solution, calculated from intra-cluster and inter-cluster distances
1 = clusters are well apart
0 = clusters are overlapping
-1 = point is in wrong cluster
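Example (a sketch of the usual per-point formula s = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the lowest mean distance to the points of any other cluster). The function name, the 1-D data and the convention of scoring a single-point cluster as 0 are assumptions made here; the sketch also assumes at least two clusters and that the points are not all identical.

```python
def silhouette_scores(points, labels):
    """Per-point silhouette scores for 1-D points with cluster labels."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [
            abs(p - q)
            for j, (q, other) in enumerate(zip(points, labels))
            if other == label and j != i
        ]
        if not same:
            scores.append(0.0)  # assumed convention for a one-point cluster
            continue
        a = sum(same) / len(same)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels)
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


points = [1.0, 2.0, 10.0, 11.0]
good = silhouette_scores(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_scores(points, [0, 1, 0, 1])   # neighbours split apart
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
print(mean_good, mean_bad)
```

The sensible grouping scores close to 1, while the deliberately wrong grouping scores below 0.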
How do you determine the optimal number of clusters when using the elbow method?
The point (the "elbow") at which the reduction in the sum of squared errors starts to level off as the number of clusters increases
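Example (a rough sketch only): the SSE values below are invented for illustration, not results from real data. Picking the k where the drop in SSE shrinks the most is one simple heuristic assumed here. In practice the elbow is often read off a plot by eye.

```python
# Hypothetical SSE for k = 1..6 (illustrative numbers only).
sse_by_k = {1: 1000.0, 2: 600.0, 3: 200.0, 4: 180.0, 5: 165.0, 6: 155.0}


def elbow(sse_by_k):
    """Return the k where the SSE reduction levels off most sharply."""
    ks = sorted(sse_by_k)
    best_k, best_bend = None, None
    # Skip the first and last k: each needs a neighbour on both sides.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev_k] - sse_by_k[k]
        drop_after = sse_by_k[k] - sse_by_k[next_k]
        bend = drop_before - drop_after  # large bend = sharp levelling off
        if best_bend is None or bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


best_k = elbow(sse_by_k)
print(best_k)
```

With these numbers, going from 2 to 3 clusters still removes a lot of error, while going from 3 to 4 removes very little, so k = 3 is the elbow.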