applied statistics terms Flashcards
(33 cards)
what is clustering about?
finding discrete groups with small differences between group members
is clustering classification?
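no, clustering is unsupervised: it finds groups without predefined labels, whereas classification assigns data to known classes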
hard clustering
each data point is only assigned to a single cluster
soft clustering
each data point is assigned to every cluster with a certain degree of strength (not used in this course)
can you add data to K-means clustering?
yes, by adding it to a cluster that already exists
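A minimal sketch of assigning a new data point to an existing cluster, assuming Euclidean distance and two hypothetical cluster centers from an earlier K-means run (the function name and the data are illustrative only):

```python
import math

def assign_to_nearest(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

# Hypothetical centers left over from an earlier k-means run
centers = [(0.0, 0.0), (10.0, 10.0)]
new_point = (8.0, 9.0)

# The new point joins the cluster whose center is nearest
label = assign_to_nearest(new_point, centers)
print(label)
```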
what method pre-specifies number of clusters?
K-means clustering
what method creates a dendrogram?
hierarchical clustering
how does k-means clustering work?
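you specify how many clusters you want (k) and it creates that many random centers. each data point is assigned to its closest center, the centers are recomputed as the mean of their assigned points, and these two steps repeat until the assignments stop changing.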
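A rough sketch of the K-means loop, not a reference implementation. It assumes Euclidean distance, starting centers drawn at random from the data, and a fixed seed for repeatability; the function name `k_means` and the toy data are made up for illustration.

```python
import math
import random

def k_means(points, k, n_iter=100, seed=0):
    """Return (labels, centers) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random starting centers taken from the data
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest center
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change, so stop
        labels = new_labels
        # Update step: each center moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = k_means(points, k=2)
print(labels)
```

On this toy data the two well-separated groups end up in different clusters; which group gets label 0 depends on the random start.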
what is agglomerative clustering?
from the bottom to the top, focusses on mergers
what’s divisive clustering?
from top to bottom, focusses on splits
how to calculate the pairwise distance for clustering binary data?
Jaccard distance (1 - intersection / union) or Manhattan
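A small sketch of the Jaccard distance for two 0/1 vectors, computed as 1 minus intersection over union. Returning 0 when both vectors are all zeros is an assumed convention, and the example vectors are made up.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)    # intersection
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)   # union
    if either == 0:
        return 0.0  # assumed convention: two all-zero vectors count as identical
    return 1 - both / either

a = [1, 1, 0, 0, 1]
b = [1, 0, 0, 1, 1]
# intersection = 2 positions, union = 4 positions
print(jaccard_distance(a, b))  # 0.5
```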
how to calculate the pairwise distance for clustering continuous data?
Euclidean or Manhattan
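As an illustration, the two distances for a pair of made-up continuous points can be sketched like this:

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute differences per coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

p = (1.0, 2.0)
q = (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```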
which distance techniques look at absolute distances?
Euclidean and Manhattan
which distance techniques look at relative distances?
Jaccard and Bray-Curtis
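One way to see what "relative" means here: a sketch of the Bray-Curtis dissimilarity (sum of absolute differences divided by the sum of all values), assuming non-negative data such as counts. Multiplying both made-up vectors by 10 leaves Bray-Curtis unchanged, while the absolute Manhattan distance grows tenfold.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity for non-negative vectors with a non-zero total."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

a = [6, 2, 0]
b = [2, 2, 4]
a10 = [10 * x for x in a]
b10 = [10 * x for x in b]

print(bray_curtis(a, b), bray_curtis(a10, b10))  # 0.5 0.5
print(manhattan(a, b), manhattan(a10, b10))      # 8 80
```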
what is linkage?
calculating the distance between (sub)clusters in hierarchical clustering
single linkage
shortest distance, two closest points in the two clusters
complete linkage
longest distance, farthest points in the two clusters
centroid linkage
distance between the centroids of the two clusters
average linkage
average distance of all the pairwise distances
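The four linkage definitions above can be compared on one pair of clusters. This is a sketch assuming Euclidean distance between points; the function names and the two toy clusters are illustrative.

```python
import math
from itertools import product

def pairwise(c1, c2):
    """All distances between a point in c1 and a point in c2."""
    return [math.dist(p, q) for p, q in product(c1, c2)]

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(v) / len(cluster) for v in zip(*cluster))

def single_linkage(c1, c2):
    return min(pairwise(c1, c2))           # two closest points

def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))           # two farthest points

def average_linkage(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)                 # mean of all pairwise distances

def centroid_linkage(c1, c2):
    return math.dist(centroid(c1), centroid(c2))

c1 = [(0.0, 0.0), (1.0, 0.0)]
c2 = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(c1, c2))    # 2.0
print(complete_linkage(c1, c2))  # 5.0
print(average_linkage(c1, c2))   # 3.5
print(centroid_linkage(c1, c2))  # 3.5
```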
Ward's minimum variance method linkage
you compute one centroid for the two clusters as if they were merged, then compute the squared distance from every data point in both clusters to that new centroid and sum them; the pair of clusters whose merger increases the within-cluster sum of squares the least is merged
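A sketch of the Ward calculation for one candidate merge, assuming squared Euclidean distances. It shows both the sum of squares around the merged centroid and the increase caused by the merge, which is the quantity usually minimized; the function names and toy clusters are hypothetical.

```python
def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(v) / len(cluster) for v in zip(*cluster))

def sum_of_squares(cluster):
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    cen = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, cen)) for p in cluster)

def ward_cost(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 were merged."""
    return sum_of_squares(c1 + c2) - sum_of_squares(c1) - sum_of_squares(c2)

c1 = [(0.0, 0.0), (2.0, 0.0)]
c2 = [(4.0, 0.0), (6.0, 0.0)]

print(sum_of_squares(c1 + c2))  # 20.0, around the merged centroid (3, 0)
print(ward_cost(c1, c2))        # 16.0, since each cluster alone has sum of squares 2.0
```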
what linkage to use when you want to do anova or regression?
Ward
how does single linkage shape clusters?
it produces elongated, chain-like clusters
how does complete linkage shape clusters?
they become compact
what is the within-cluster sum of squares if each data point is its own cluster?
0
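A quick check of why the answer is 0, using made-up points: when a cluster holds a single point, its centroid is that point, so every squared distance is zero. For contrast, the sketch also puts all points in one cluster.

```python
def wcss(clusters):
    """Within-cluster sum of squares, summed over all clusters."""
    total = 0.0
    for cluster in clusters:
        cen = tuple(sum(v) / len(cluster) for v in zip(*cluster))
        total += sum(sum((x - m) ** 2 for x, m in zip(p, cen)) for p in cluster)
    return total

points = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]

singletons = [[p] for p in points]  # every point is its own cluster
print(wcss(singletons))             # 0.0
print(wcss([points]))               # 16.0 when all points share one cluster
```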