Clustering Flashcards
(48 cards)
What is clustering?
“The process of dividing data objects into groups of similar objects.”
What is the difference between hard and soft clustering?
“In hard clustering, each data point belongs to one and only one cluster. In soft clustering, a data point can be assigned to multiple clusters according to a probability
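The difference can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the inverse-distance weighting used for the soft case is an assumption chosen for simplicity. Real soft methods, such as Gaussian mixture models, derive the memberships from a fitted model.

```python
import math

def hard_assign(point, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def soft_assign(point, centroids, tiny=1e-9):
    """Soft clustering: return one membership weight per centroid, summing to 1.

    Assumption: weights are proportional to inverse distance. `tiny` only
    guards against division by zero when a point sits on a centroid.
    """
    inverse = [1.0 / (math.dist(point, c) + tiny) for c in centroids]
    total = sum(inverse)
    return [w / total for w in inverse]

centroids = [(0.0, 0.0), (10.0, 0.0)]
point = (2.0, 0.0)

hard_label = hard_assign(point, centroids)    # one cluster only
memberships = soft_assign(point, centroids)   # a weight for every cluster
```

Here the point lies four times closer to the first centroid, so the hard label is the first cluster while the soft memberships split roughly 80/20.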
What are the applications of clustering?
1. Image segmentation
2. Customer segmentation
3. Text clustering
4. Language clustering
5. Gene clustering
6. Product segmentation
7. Among many others
What defines a cluster?
“A set of data objects that are more similar to each other than to objects in other clusters.”
Why can clustering reveal unknown groups?
“Because the algorithm forms the groups from the data itself, without predefined labels or human intervention.”
What are the main types of clustering?
“Partitioning, hierarchical, density-based, and grid-based.”
What is the advantage of hierarchical clustering methods?
“They do not require the number of clusters to be specified in advance.”
What is a dendrogram?
“A hierarchical representation of similarity relationships between objects.”
What characterizes a density-based clustering method?
“Clusters are dense regions of points separated by lower-density regions.”
What is Euclidean distance?
“A metric based on the square root of the sum of the squared differences between coordinates.”
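As a minimal sketch, the definition translates directly into Python. The function name is illustrative; the standard library's `math.dist` computes the same quantity.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d = euclidean_distance((0, 0), (3, 4))  # sqrt(3**2 + 4**2) = 5.0
```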
Why normalize data before applying clustering?
“To prevent variables on different scales from disproportionately influencing cluster formation.”
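One common way to do this is min-max scaling, sketched below. The choice of min-max (rather than, say, z-score standardization) and the sample customer data are assumptions made for illustration.

```python
def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; map all of its values to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical customers: (age in years, income in dollars).
customers = [(20, 20000), (30, 60000), (40, 100000)]
scaled = min_max_scale(customers)
```

Before scaling, income differences in the tens of thousands would swamp age differences in any distance calculation. After scaling, both columns span the same range and contribute comparably.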
What is the K-Means algorithm?
“An algorithm that groups data into K clusters based on proximity to centroids.”
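A minimal sketch of the usual iterative procedure (often called Lloyd's algorithm) is shown below. The function name, the hand-picked initial centroids, and the sample points are assumptions; practical implementations add smarter initialization and multiple restarts.

```python
import math

def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid-update steps until nothing moves."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # empty cluster: keep as is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
```

With K = 2 and these starting centroids, the first three points end up in one cluster and the last three in the other.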
What is the ‘Elbow Method’?
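“A graphical method for choosing the number of clusters in K-Means: plot the within-cluster sum of squares against K and pick the point where the curve stops dropping sharply.”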
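The quantity usually plotted is the within-cluster sum of squares (WCSS) for each candidate K. The sketch below computes it with a tiny one-dimensional K-Means; the function name, the deterministic initialization, and the sample values are all assumptions made to keep the example short.

```python
def wcss_for_k(values, k, iterations=50):
    """Run a tiny 1-D K-Means and return the within-cluster sum of squares."""
    ordered = sorted(values)
    # Assumption: deterministic start from evenly spaced positions in the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sum((v - centroids[j]) ** 2 for j, c in enumerate(clusters) for v in c)

# Hypothetical data with three visible groups around 1, 5, and 9.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
wcss = {k: wcss_for_k(values, k) for k in range(1, 6)}
# How much WCSS improves when going from K-1 to K clusters.
drops = {k: wcss[k - 1] - wcss[k] for k in range(2, 6)}
```

On this data the WCSS falls steeply up to K = 3 and barely changes afterwards, which is the "elbow" one would look for on the plot.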
How does hierarchical clustering work?
“It builds a tree structure where similar objects progressively group together.”
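A minimal sketch of the bottom-up (agglomerative) variant follows. It assumes single linkage, that is, cluster distance measured between the closest pair of members; the function name and sample points are illustrative. The recorded merge history is the information a dendrogram draws.

```python
import math

def agglomerative_merges(points):
    """Bottom-up clustering with single linkage; returns the merge history."""
    clusters = [[i] for i in range(len(points))]  # start: every point alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (20.0, 0.0)]
merges = agglomerative_merges(points)
# Each entry is (members of cluster A, members of cluster B, merge distance).
```

The closest pairs merge first and the distant point joins last. Cutting the merge history at a chosen distance yields a flat clustering without fixing the number of clusters up front.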
What is a limitation of hierarchical clustering?
“It is computationally expensive for large datasets.”
What is outlier detection?
“Identifying points that significantly differ from the normal cluster patterns.”
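One simple way to act on this, sketched below, is to flag points that lie far from every cluster centroid. The function name, the centroids, and the distance threshold are hypothetical; other approaches (density-based noise labels, statistical tests) are equally valid.

```python
import math

def find_outliers(points, centroids, max_distance):
    """Return the points farther than `max_distance` from every centroid."""
    return [p for p in points
            if min(math.dist(p, c) for c in centroids) > max_distance]

# Hypothetical centroids from an earlier clustering step.
centroids = [(1.0, 1.0), (8.0, 8.0)]
points = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (4.5, 15.0)]
outliers = find_outliers(points, centroids, max_distance=3.0)
```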
How does the DBSCAN method detect clusters?
“It groups points that have a minimum number of close neighbors
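A minimal sketch of the DBSCAN procedure is given below. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbor count, including the point itself) follow common usage but are assumptions here, as are the sample points. The brute-force neighbor search is for clarity only.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it lies in no dense region."""
    def neighbors(i):
        # Indices within distance eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand from it again
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The two dense groups receive ids 0 and 1, and the isolated point at (50, 50) is labelled as noise.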
What is Manhattan distance?
“The sum of absolute differences between coordinates.”
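A minimal sketch in Python (the function name is illustrative):

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

d = manhattan_distance((0, 0), (3, 4))  # |0 - 3| + |0 - 4| = 7
```

For the same pair of points the Euclidean distance is 5, so the two metrics can rank neighbors differently.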
What is the linkage technique in clustering?
“A way to measure distance between clusters using different criteria (single
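As an illustration, three commonly used linkage criteria (single, complete, and average) can be sketched as follows. The function names and the two sample clusters are assumptions.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """All distances between a member of A and a member of B."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of members."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of members."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Mean distance over all cross-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

single = single_linkage(cluster_a, cluster_b)      # 3.0
complete = complete_linkage(cluster_a, cluster_b)  # 6.0
average = average_linkage(cluster_a, cluster_b)    # 4.5
```

The same two clusters are 3.0, 6.0, or 4.5 apart depending on the criterion, which is why the choice of linkage can change the resulting hierarchy.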
What is PCA used for in clustering?
“Dimensionality reduction to facilitate cluster identification in high-dimensional data.”
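A rough sketch of the idea, reduced to finding only the first principal component, is shown below. It uses power iteration on the covariance matrix, which is one simple way to obtain the dominant direction; the function names, the start vector, and the sample rows are assumptions, and a numerical library would normally be used instead.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the column means and the unit direction of greatest variance."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    vector = [1.0] * dims  # assumption: arbitrary non-zero start vector
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by the covariance and renormalize.
        product = [sum(cov[i][j] * vector[j] for j in range(dims))
                   for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return means, vector

def project(rows, means, direction):
    """Reduce each row to a single coordinate along `direction`."""
    return [sum((r[d] - means[d]) * direction[d] for d in range(len(direction)))
            for r in rows]

# Hypothetical 3-D data that varies mostly along the x = y diagonal.
rows = [(1.0, 1.1, 0.0), (2.0, 1.9, 0.1), (3.0, 3.2, 0.0),
        (4.0, 3.9, 0.1), (5.0, 5.0, 0.0)]
means, component = first_principal_component(rows)
reduced = project(rows, means, component)  # one number per row instead of three
```

Clustering can then run on `reduced` (or on the first few components) rather than on the full set of original variables.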
How does clustering help in customer segmentation?
“It groups customers with similar characteristics to optimize marketing strategies.”
What are the weaknesses of K-Means?
“Sensitive to outliers and requires the number of clusters to be defined beforehand.”
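The sensitivity to outliers comes from the centroid being a mean. A small sketch with made-up points illustrates how far a single extreme value can drag it:

```python
def centroid(points):
    """Mean of the points, coordinate by coordinate."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
clean = centroid(cluster)                             # (2.0, 2.0)
with_outlier = centroid(cluster + [(102.0, 102.0)])   # (27.0, 27.0)
```

One outlier moves the centroid from the middle of the cluster to a location where none of the original points lie.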
Why can clustering be subjective?
“Because different methods can generate different groupings for the same data.”
What characterizes a well-defined cluster?
“High internal cohesion and significant separation between clusters.”
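One widely used way to quantify cohesion and separation together is the silhouette coefficient, sketched below. The function name and sample data are assumptions, and the sketch expects at least two clusters.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette: near 1 is well placed, below 0 is likely misplaced."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention for a single-point cluster
            continue
        # Cohesion a(i): mean distance to the other members of the same cluster.
        cohesion = sum(same) / len(same)
        # Separation b(i): mean distance to the nearest other cluster.
        separation = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((separation - cohesion) / max(cohesion, separation))
    return scores

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(points, [0, 0, 1, 1])  # groups nearby points together
bad = silhouette_scores(points, [0, 1, 0, 1])   # groups distant points together
good_mean = sum(good) / len(good)
bad_mean = sum(bad) / len(bad)
```

The sensible labelling scores close to 1 on average, while the labelling that pairs distant points scores below 0.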