Clustering Flashcards

(48 cards)

1
Q

What is clustering?

A

“The process of dividing data objects into groups so that objects in the same group are similar to each other.”

2
Q

What is the difference between hard and soft clustering?

A

“In hard clustering, each data point belongs to exactly one cluster. In soft clustering, a data point can belong to multiple clusters, each with a probability or degree of membership.”

3
Q

What are the applications of clustering?

A

1. Image segmentation
2. Customer segmentation
3. Text clustering
4. Language clustering
5. Gene clustering
6. Product segmentation
7. Among many others

4
Q

What defines a cluster?

A

“A set of data objects that are more similar to each other than to objects in other clusters.”

5
Q

Why can clustering reveal unknown groups?

A

“Because the groups are formed from patterns in the data itself rather than from predefined labels, they can include groupings nobody anticipated.”

6
Q

What are the main types of clustering?

A

“Partitioning, hierarchical, density-based, and grid-based.”

7
Q

What is the advantage of hierarchical clustering methods?

A

“They do not require the number of clusters to be specified in advance.”

8
Q

What is a dendrogram?

A

“A hierarchical representation of similarity relationships between objects.”

9
Q

What characterizes a density-based clustering method?

A

“Clusters are dense regions of points separated by lower-density regions.”

10
Q

What is Euclidean distance?

A

“A metric based on the square root of the sum of the squared differences between coordinates.”
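
The definition translates directly into code. A minimal Python sketch; the function name `euclidean` is illustrative rather than taken from any library:

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared differences between coordinates."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```

Python 3.8 and later also provide `math.dist`, which computes the same quantity.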

11
Q

Why normalize data before applying clustering?

A

“To prevent variables on different scales from disproportionately influencing cluster formation.”
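
One common way to put variables on a comparable scale is z-score standardization (min-max scaling is another). A minimal sketch, assuming each variable is given as a plain list of numbers; the age and income values are made up for illustration:

```python
import statistics

def standardize(values):
    """Rescale a variable to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant variable: every value sits exactly at the mean
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

# Hypothetical example data on very different scales.
ages = [20, 30, 40]
incomes = [20_000, 30_000, 40_000]
print(standardize(ages))
print(standardize(incomes))
```

Before scaling, a Euclidean distance on (age, income) would be driven almost entirely by income; after scaling, both variables produce the same z-scores here and contribute equally.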

12
Q

What is the K-Means algorithm?

A

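“An algorithm that partitions data into K clusters by assigning each point to its nearest centroid and then moving each centroid to the mean of its assigned points, repeating until the assignments stop changing.”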
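
A minimal sketch of the standard iterative procedure (Lloyd's algorithm), assuming points are equal-length numeric tuples and Euclidean distance. The initial centroids are K data points drawn at random with a fixed seed; this is a simplifying assumption, and K-Means++ initialization is a common alternative:

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-Means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points chosen at random as starting centroids
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

On this toy data the first three and the last three points end up in different clusters; which group is numbered 0 depends on the random initialization.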

13
Q

What is the ‘Elbow Method’ technique?

A

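“A graphical heuristic for choosing the number of clusters in K-Means: plot the within-cluster sum of squares against K and pick the ‘elbow’, where adding more clusters stops reducing it substantially.”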
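
To make the method concrete, here is a minimal sketch that computes the within-cluster sum of squares (WCSS) for several values of K on toy data. It bundles its own compact K-Means so it runs on its own, and it uses a deterministic farthest-point initialization; that choice is an assumption made for reproducibility, since K-Means++ or several random restarts are the usual choices:

```python
import math

def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's K-Means and return the within-cluster sum of squares (WCSS)."""
    # Farthest-point initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

# Toy data (made up for illustration): three well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
for k in range(1, 6):
    print(k, round(kmeans_wcss(points, k), 2))
```

On this data the WCSS falls steeply up to K = 3 and only slightly after that, so the elbow sits at K = 3. In practice the values are plotted and the bend is read off the curve.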

14
Q

How does hierarchical clustering work?

A

“It builds a tree structure where similar objects progressively group together.”
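
A minimal, naive sketch of the agglomerative (bottom-up) variant, assuming Euclidean distance. The function name and the convention of passing `min` (single linkage) or `max` (complete linkage) as the `linkage` argument are illustrative choices. The recorded merge heights are what a dendrogram draws:

```python
import math
from itertools import combinations

def agglomerative(points, linkage=min):
    """Naive bottom-up clustering. Returns the merge history.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters; `linkage` reduces the pairwise point distances between two
    clusters to one number (min = single linkage, max = complete linkage).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []  # (merge distance, sorted indices of the merged cluster)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        merges.append((d, sorted(clusters[a])))
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for height, members in agglomerative(points):
    print(f"merge at {height:.2f}: {members}")
```

This version recomputes every pairwise cluster distance at each step, so it only suits small inputs.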

15
Q

What is a limitation of hierarchical clustering?

A

“It is computationally expensive for large datasets.”

16
Q

What is outlier detection?

A

“Identifying points that significantly differ from the normal cluster patterns.”

17
Q

How does the DBSCAN method detect clusters?

A

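“It groups points that have a minimum number of close neighbors (within a given radius) into clusters, expanding each cluster through such dense points; points in sparse regions are labeled as noise.”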
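
A minimal, unoptimized sketch of the DBSCAN procedure, assuming Euclidean distance, a neighborhood radius `eps`, and a threshold `min_pts` that counts the point itself (the convention scikit-learn uses for `min_samples`). It compares every pair of points, so it is quadratic in the number of points:

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...), or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```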

18
Q

What is Manhattan distance?

A

“The sum of absolute differences between coordinates.”
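
As a minimal sketch (the function name `manhattan` is illustrative):

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences (also called L1 or city-block distance)."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan((0, 0), (3, 4)))  # 7
```

For the same two points the straight-line (Euclidean) distance is 5, because Manhattan distance measures travel along the axes rather than along the diagonal.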

19
Q

What is the linkage technique in clustering?

A

“A way to measure the distance between clusters using different criteria (single, complete, or average linkage, among others).”
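
Three widely used linkage criteria are single (closest pair), complete (farthest pair), and average (mean of all pairs). A minimal sketch, assuming each cluster is a list of points and Euclidean distance; other criteria such as Ward's method are not shown:

```python
import math
from statistics import fmean

def pairwise_distances(c1, c2):
    """Distances between every point of c1 and every point of c2."""
    return [math.dist(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2):
    """Distance between the closest pair of points, one from each cluster."""
    return min(pairwise_distances(c1, c2))

def complete_linkage(c1, c2):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(pairwise_distances(c1, c2))

def average_linkage(c1, c2):
    """Mean of all pairwise distances between the two clusters."""
    return fmean(pairwise_distances(c1, c2))

a = [(0, 0), (0, 1)]
b = [(3, 0), (3, 1)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
```

The choice matters: single linkage tends to chain clusters together through close neighbors, while complete linkage favors compact clusters.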

20
Q

What is the role of PCA in clustering?

A

“Dimensionality reduction to facilitate cluster identification in high-dimensional data.”
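
A minimal sketch of the idea, finding only the first principal component by power iteration on the sample covariance matrix (a textbook approach; real code would normally use a linear-algebra library) and projecting each point onto it, so the data is reduced to one dimension before clustering. It assumes the data is not constant, and the function name and parameters are illustrative:

```python
import math
import random

def first_principal_component(data, iters=500, seed=0):
    """Return (unit direction of largest variance, 1-D projections of the centered data)."""
    n, d = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(d)]
    centered = [[row[k] - means[k] for k in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projections = [sum(r[k] * v[k] for k in range(d)) for r in centered]
    return v, projections

data = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.05), (4, 4)]
direction, projected = first_principal_component(data)
print(direction, projected)
```

Here the points lie close to the diagonal, so the direction found is approximately (0.71, 0.71), up to sign.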

21
Q

How does clustering help in customer segmentation?

A

“It groups customers with similar characteristics to optimize marketing strategies.”

22
Q

What are the weaknesses of K-Means?

A

“Sensitive to outliers and requires the number of clusters to be defined beforehand.”

23
Q

Why can clustering be subjective?

A

“Because different methods can generate different groupings for the same data.”

24
Q

What characterizes a well-defined cluster?

A

“High internal cohesion and significant separation between clusters.”

25
What is the Dunn index?
"A metric to evaluate cluster separation and compactness."
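One common formulation of the Dunn index divides the smallest distance between points in different clusters by the largest cluster diameter; higher values indicate compact, well-separated clusters. Other variants define the inter-cluster distance and the diameter differently. A minimal sketch of that formulation, assuming Euclidean distance and at least two clusters:

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """clusters: a list of clusters, each a list of points. Higher is better."""
    def diameter(cluster):
        # Largest distance between two points of the same cluster.
        return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

    def separation(c1, c2):
        # Smallest distance between a point of c1 and a point of c2.
        return min(math.dist(p, q) for p in c1 for q in c2)

    min_separation = min(separation(c1, c2) for c1, c2 in combinations(clusters, 2))
    max_diameter = max(diameter(c) for c in clusters)
    return math.inf if max_diameter == 0 else min_separation / max_diameter

print(dunn_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]]))  # 5.0
```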
26
What is the advantage of DBSCAN over K-Means?
"It can find arbitrarily shaped clusters and detect outliers."
27
What happens if a dataset has high dimensionality?
"It can make cluster separation difficult and slow down modeling."
28
What is the fuzzy clustering technique?
"Allows a data point to belong to multiple clusters, with a degree of membership in each."
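Fuzzy c-means is one widely used fuzzy method. For fixed cluster centers, a point's membership in cluster j is 1 / Σ_k (d_j / d_k)^(2/(m-1)), where d_j is its distance to center j and m > 1 is the "fuzzifier". A minimal sketch of that membership step only, assuming Euclidean distance and the common default m = 2; a full algorithm would alternate it with recomputing the centers as membership-weighted means:

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership degrees of one point in each cluster (they sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:  # the point coincides with a center: full membership there
        hit = dists.index(0.0)
        return [1.0 if j == hit else 0.0 for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]

print(fuzzy_memberships((1, 0), [(0, 0), (4, 0)]))  # approximately [0.9, 0.1]
```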
29
Why choose the mean to update centroids in K-Means?
"Because the mean is the point that minimizes the sum of squared distances to the cluster's points, which is exactly the quantity K-Means minimizes."
30
What is the silhouette coefficient?
"Measures how well a point fits its own cluster compared with the nearest other cluster, on a scale from -1 to 1."
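For each point, let a be its mean distance to the other members of its own cluster and b its mean distance to the members of the nearest other cluster; its silhouette is (b - a) / max(a, b). A minimal sketch averaging that score over all points, assuming Euclidean distance and scoring points in single-member clusters as 0 (the usual convention):

```python
import math
from statistics import fmean

def silhouette(points, labels):
    """Mean silhouette coefficient over all points, between -1 and 1."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = fmean(own)
        b = min(
            fmean([math.dist(p, q) for q, lab in zip(points, labels) if lab == other])
            for other in clusters
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return fmean(scores)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(points, [0, 0, 1, 1]))  # close to 1: good grouping
print(silhouette(points, [0, 1, 0, 1]))  # negative: points sit in the wrong clusters
```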
31
What does 'partitioning clustering' mean?
"A method that divides a dataset into distinct groups based on proximity."
32
How can a dendrogram indicate the ideal number of clusters?
"By cutting the tree at a height where there is a large gap between successive merges; the number of branches the cut crosses is the number of clusters."
33
What is the Gap Statistic technique?
"Compares intra-cluster variation with expected values under a null distribution."
34
What is the difference between agglomerative and divisive methods?
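"Agglomerative methods start with individual objects and merge them into progressively larger clusters; divisive methods start with all objects in one cluster and split it into progressively smaller ones."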
35
What is the CLARANS method?
"A k-medoids-based clustering algorithm that optimizes partitions by randomized search over candidate medoids."
36
What are the challenges of clustering in big data?
"Scalability, …"
37
What is the difference between supervised and unsupervised clustering?
"Unsupervised clustering does not use labels, while supervised clustering uses labeled examples to guide how groups are formed."
38
What is grid-based clustering?
"A method that divides space into cells and uses those cells to form clusters."
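A minimal two-dimensional sketch in the spirit of grid-based methods, not a reproduction of any specific algorithm such as STING or CLIQUE. Points are binned into square cells of side `cell_size`, cells holding at least `min_count` points count as dense, and dense cells that touch (including diagonally) form one cluster; points in sparse cells get the label -1. All names and parameters are illustrative:

```python
import math
from collections import defaultdict

def grid_clusters(points, cell_size, min_count):
    """Label 2-D points by connected groups of dense grid cells (-1 = sparse cell)."""
    # 1. Bin every point into a square cell.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)
    # 2. Keep only cells with enough points.
    dense = {cell for cell, members in cells.items() if len(members) >= min_count}
    # 3. Join touching dense cells (8-neighborhood) with a depth-first search.
    labels = [-1] * len(points)
    seen = set()
    cluster = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in dense and neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        cluster += 1
    return labels

points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (5.1, 5.2), (5.3, 5.1), (5.2, 5.4), (9.5, 0.5)]
print(grid_clusters(points, cell_size=1.0, min_count=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Because the work is done per cell rather than per pair of points, the cost depends mainly on the number of occupied cells, which is the usual motivation for grid-based methods.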
39
How is clustering used in medicine?
"Identifying genetic patterns, …"
40
What is the BIRCH method?
"A hierarchical algorithm that summarizes the data in clustering feature (CF) trees."
41
Why can clusters of different sizes be problematic?
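"Because some methods (such as K-Means) implicitly assume clusters of similar size and may split large clusters or absorb small ones into their neighbors."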
42
What is probabilistic clustering?
"A form of clustering that assigns probabilities for each object to belong to a cluster."
43
What is mutual information in clustering?
"Measures clustering quality by comparing the cluster assignments with a reference labeling, such as known classes."
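Mutual information between a clustering and a reference labeling can be computed from how often each pair of labels occurs together. A minimal sketch using natural logarithms, so the result is in nats; normalized or chance-adjusted variants (NMI, AMI) are usually preferred when comparing results across datasets:

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Sum over co-occurring label pairs of p(a, b) * log(p(a, b) / (p(a) * p(b))).
    return sum(
        (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )

clusters = [0, 0, 1, 1]
reference = ["x", "x", "y", "y"]
print(mutual_information(clusters, reference))  # log(2), about 0.693
```

The label names themselves do not matter: a clustering that matches the reference up to renaming gets the maximum score, and one that is independent of it scores 0.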
44
What defines a cluster structure in density-based clustering?
"Each point must have a minimum number of close neighbors to be considered part of a cluster."
45
What happens if there are too many dimensions in clustering?
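"Distances between points become nearly uniform (all points end up roughly equally close), so distance-based clustering struggles to separate groups."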
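A small simulation illustrates the effect, assuming points drawn uniformly at random from the unit hypercube. The relative contrast (max distance - min distance) / min distance from a query point to the other points shrinks as the number of dimensions grows, meaning the nearest and farthest points become almost equally far away:

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of distances from one random query point to n random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```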
46
What is the OPTICS method?
"A clustering algorithm that orders data points to identify cluster structure."
47
What is the difference between clustering and classification?
"Clustering groups data without predefined labels, while classification assigns data to predefined classes learned from labeled examples."
48
What does it mean for a cluster to have 'cohesion'?
"Points within the cluster are strongly grouped and close to each other."