Hierarchical clustering Flashcards
(34 cards)
Hierarchical clustering algorithm operates in _______ fashion and why
Hierarchical clustering algorithms typically operate in a greedy fashion, making locally optimal choices at each step (merging the closest clusters or splitting the largest clusters) without reconsidering previous steps.
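A minimal sketch of this greedy merging, using SciPy's agglomerative implementation (the sample points below are made up purely for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy data: five 2-D points (illustrative values only)
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [9.0, 9.0]])

# Each row of Z records one greedy step: the indices of the two closest
# clusters merged, the distance between them, and the new cluster's size.
Z = linkage(X, method="single")
print(Z)
```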
Hierarchical clustering is __________
A divide-and-conquer style of clustering
Another name for agglomerative clustering
Bottom-up approach
Another name for divisive clustering
Top-down approach
Hierarchical clustering can or can't be used for what
Hierarchical clustering can be used for outlier detection, but not for finding missing values (NA) or detecting fake values.
Hierarchical clustering is primarily used for ______ because ________
Hierarchical clustering is primarily used for exploration because it reveals the natural grouping within the data, which is very useful in exploratory data analysis.
Hierarchical clustering uses _________ visualization
Dendrogram visualization
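A small sketch of the dendrogram view, using SciPy and matplotlib (the data is the same illustrative toy set used above):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [9.0, 9.0]])
Z = linkage(X, method="complete")

# Each merge appears as a horizontal bar; its height is the merge distance.
dendrogram(Z)
plt.title("Dendrogram (complete linkage)")
plt.show()
```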
In hierarchical clustering do we need to specify the number of clusters?
No need to specify the number of clusters in hierarchical clustering
How does hierarchical clustering provide flexibility?
It allows you to choose the number of clusters by cutting the dendrogram at different levels, providing flexibility to explore the data at different granularities.
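A sketch of cutting the same tree at different levels with SciPy's fcluster (toy data, illustrative thresholds):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [9.0, 9.0]])
Z = linkage(X, method="single")

# Cutting the same hierarchy at different levels yields different granularities.
labels_coarse = fcluster(Z, t=2, criterion="maxclust")  # 2 clusters
labels_fine = fcluster(Z, t=3, criterion="maxclust")    # 3 clusters
print(labels_coarse, labels_fine)
```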
Is hierarchical clustering deterministic or not?
Hierarchical clustering is deterministic because it follows a fixed sequence of merging or splitting clusters based on defined criteria such as distance.
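A quick check of this determinism using SciPy (the random data here is just for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.RandomState(0).rand(10, 2)

# Running the same algorithm twice on the same data produces identical merges:
Z1 = linkage(X, method="complete")
Z2 = linkage(X, method="complete")
print(np.allclose(Z1, Z2))  # True -- no random initialisation is involved
```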
Linkage (definition and types)
Linkage defines how the distance between two clusters is measured, i.e. how clusters are linked.
The two main linkage techniques are single linkage and complete linkage.
Single linkage
* Another name
* Keyword
* Definition
* Formula
- Another name: Nearest neighbour method
- Keyword: shortest distance
- Definition: This linkage technique focuses on the shortest distance between data points of the two clusters.
- Formula: d(A, B) = min { d(a, b) : a ∈ A, b ∈ B }
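A minimal sketch of the single-linkage distance between two clusters, assuming Euclidean distances and made-up points:

```python
import numpy as np
from scipy.spatial.distance import cdist

def single_linkage_distance(cluster_a, cluster_b):
    """Shortest pairwise distance between points of two clusters."""
    return cdist(cluster_a, cluster_b).min()

A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 0.0]])
print(single_linkage_distance(A, B))  # 3.0 (the nearest pair of points)
```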
Complete linkage
* Another name
* Keyword
* Definition
* Formula
- Another name: Farthest neighbour method
- Keyword: longest distance
- Definition: This linkage technique focuses on the longest distance between data points of the two clusters.
- Formula: d(A, B) = max { d(a, b) : a ∈ A, b ∈ B }
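The complete-linkage counterpart of the sketch above, again assuming Euclidean distances and toy points:

```python
import numpy as np
from scipy.spatial.distance import cdist

def complete_linkage_distance(cluster_a, cluster_b):
    """Longest pairwise distance between points of two clusters."""
    return cdist(cluster_a, cluster_b).max()

A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 0.0]])
print(complete_linkage_distance(A, B))  # ~4.123 (the farthest pair of points)
```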
Agglomerative clustering keyword
Merging approach
Agglomerative clustering uses which linkage?
It can use any linkage: single linkage or complete linkage.
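A sketch showing that the linkage criterion is simply a parameter in an agglomerative implementation; here scikit-learn's AgglomerativeClustering is used with the same toy data as above:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [9.0, 9.0]])

# 'single', 'complete', and 'average' linkage are all supported options.
for link in ["single", "complete", "average"]:
    labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X)
    print(link, labels)
```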
Divisive clustering uses which linkage?
Divisive clustering uses only complete linkage.
Point to remember in agglomerative clustering problems
The average linkage technique
Divisive clustering keyword
Splitting approach
How to solve a divisive clustering problem
We create a minimum spanning tree (MST) based on the dissimilarity matrix.
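A minimal sketch of this MST-based splitting, assuming Euclidean dissimilarities and splitting the tree at its longest edge (toy data, illustrative only):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [9.0, 9.0]])
D = squareform(pdist(X))                  # dissimilarity matrix

mst = minimum_spanning_tree(D).toarray()  # MST edges as a weight matrix

# Split into two clusters by removing the longest MST edge.
i, j = np.unravel_index(np.argmax(mst), mst.shape)
mst[i, j] = 0
n_components, labels = connected_components(mst, directed=False)
print(n_components, labels)               # 2 clusters and their labels
```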
Minimal spanning tree characteristics (4)
It is a connected tree
No loops/ no closed circuits in the tree.
Each data point (node) in the tree is visited at least once.
If ‘n’ nodes are present in the tree, then (n-1) edges are present or formed in the tree.
If there are ‘n’ nodes in the MST, then ______
(n-1) edges are present (formed) in the tree.
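A quick check of the n-1 edge count using SciPy's MST routine (random toy data):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

X = np.random.RandomState(0).rand(6, 2)   # n = 6 points
mst = minimum_spanning_tree(squareform(pdist(X)))

print(mst.nnz)                            # 5 edges, i.e. n - 1
```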
Point to remember in divisive clustering problems
Explain the number of levels in the hierarchy for both agglomerative clustering and divisive clustering
Agglomerative clustering: If there are n observations, there will be n-1 levels in the hierarchy. Since n-1 merges are required to combine n observations into a single cluster, the hierarchy has n-1 levels.
Divisive clustering: The number of levels depends on the way splits occur (e.g., binary splits may create more or fewer than n-1 levels).
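A small check of the n-1 merge count for the agglomerative case, using SciPy's linkage matrix (random toy data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.RandomState(1).rand(8, 2)   # n = 8 observations
Z = linkage(X, method="average")

# One row per merge: n - 1 merges build the full hierarchy.
print(Z.shape[0])                         # 7
```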