L9 Cluster Analysis Flashcards

1
Q

What is the main difference between factor and cluster analysis?

A

In factor analysis, we want to find factors of items (we group variables).
In cluster analysis, we want to find clusters of objects (we group cases).

2
Q

What is the goal of cluster analysis?

A

Find clusters such that the objects within a cluster are as similar as possible (internal homogeneity) while the clusters themselves are as distinct from each other as possible (external heterogeneity).

3
Q

Two ways of quantifying similarity

A

Distance and correlation

4
Q

Two types of distance measure

A

Euclidean & city-block distance

Which one to use depends on the case (on what you define as similar).
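As a rough illustration of how the two measures differ, the sketch below computes both for a single pair of objects. The function names and the example points are hypothetical and not part of the original cards.

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def city_block(p, q):
    # City-block (Manhattan) distance: sum of the absolute differences.
    return sum(abs(a - b) for a, b in zip(p, q))

# Two illustrative objects measured on two variables.
a, b = (0.0, 0.0), (3.0, 4.0)
d_euclid = euclidean(a, b)   # 5.0
d_city = city_block(a, b)    # 7.0
```

The same pair of objects is 5 units apart under one definition and 7 under the other, so the choice of measure can change which objects count as most similar.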

5
Q

What is agglomerative hierarchical clustering?

A

You start with every object as its own cluster and keep merging, so you create fewer and larger clusters (bottom-up).

6
Q

What is divisive hierarchical clustering?

A

You start with all objects in one large cluster and keep splitting, so you create more and smaller clusters (top-down).

7
Q

What is the procedure for deriving clusters in the agglomerative hierarchical approach?

A

Starting point: Calculate the pairwise similarity between all objects (based on distance or correlation).
Step 1: Merge the two objects with the highest similarity (P and Q) into a cluster.
Step 2: Calculate the linkage criterion between the new cluster and every other object (or cluster).
Step 3: Merge the pair of objects or clusters that minimizes the linkage criterion.
Then repeat steps 2 and 3 until all objects are in a single cluster.
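The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance and single linkage as the criterion, and the names `agglomerate`, `single_linkage`, and the example points are hypothetical.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2):
    # Assumed linkage criterion: distance between the two closest members.
    return min(euclidean(p, q) for p in c1 for q in c2)

def agglomerate(points, linkage=single_linkage):
    # Starting point: every object is its own cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Steps 2 and 3: evaluate the criterion for every pair of clusters
        # and pick the pair that minimizes it.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        height = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
merges = agglomerate(points)
heights = [h for _, _, h in merges]   # [1.0, 1.0, 5.0]
```

In this toy data set the two tight pairs are merged first at a small criterion value, and the final merge into a single cluster happens at a much larger value.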

8
Q

What does the coefficient measure in the graph?

A

It is a heterogeneity index: the larger the coefficient, the greater the heterogeneity.

9
Q

What are three linkage methods?

A

Single linkage
Complete linkage
Ward’s method

10
Q

What is single linkage?

A

You find the nearest neighbor: the distance between two clusters is the distance between their two closest members.

11
Q

What is complete linkage?

A

You find the farthest neighbor: the distance between two clusters is the distance between their two most distant members.
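To make the contrast between single and complete linkage concrete, the sketch below computes both criteria for the same two clusters. It assumes Euclidean distance; the function names and the example clusters are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2):
    # Nearest neighbor: smallest distance between any two members.
    return min(euclidean(p, q) for p in c1 for q in c2)

def complete_linkage(c1, c2):
    # Farthest neighbor: largest distance between any two members.
    return max(euclidean(p, q) for p in c1 for q in c2)

left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (6.0, 0.0)]
d_single = single_linkage(left, right)      # 3.0, from (1, 0) to (4, 0)
d_complete = complete_linkage(left, right)  # 6.0, from (0, 0) to (6, 0)
```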

12
Q

What is Ward’s method?

A

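Merge the clusters so that the total distance (variance) within the resulting cluster is as small as possible.
For each hypothetical merged cluster you compute a centroid, which is the mean value of its objects, and measure the distances of the objects to that centroid.
Most reliable method.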
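One common way to express Ward's criterion is the increase in within-cluster sum of squares that a merge would cause. The sketch below follows that formulation as an assumption; the cards do not spell out the exact formula, and the names `centroid`, `wss`, and `ward_increase` are hypothetical.

```python
def centroid(cluster):
    # Mean value of the cluster on each variable.
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def wss(cluster):
    # Sum of squared distances of the objects to their centroid.
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_increase(c1, c2):
    # Extra within-cluster variation created by merging c1 and c2.
    return wss(c1 + c2) - wss(c1) - wss(c2)

a = [(0.0, 0.0), (2.0, 0.0)]
b = [(3.0, 0.0)]
c = [(10.0, 0.0)]
inc_ab = ward_increase(a, b)   # small increase: b lies close to a
inc_ac = ward_increase(a, c)   # large increase: c lies far from a
```

Under this criterion, cluster `a` would be merged with `b` rather than `c`, because that merge adds the least within-cluster variation.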

13
Q

What are 3 indices for model evaluation? (About how many clusters to retain)

A
  • Within-cluster sum of squares (WSS)
  • Information criteria:
      • Bayesian information criterion (BIC)
      • Akaike information criterion (AIC)
14
Q

How can you minimize the within-cluster sum of squares?

A

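The smallest WSS is always obtained with the maximum number of clusters available. If every object is its own cluster, WSS is 0, so minimizing WSS alone cannot tell you how many clusters to retain.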
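A small worked example shows why WSS keeps shrinking as clusters are added. The data and the three partitions below are made up for illustration, and the helper names are hypothetical.

```python
def wss(cluster):
    # Sum of squared distances of the objects to the cluster mean.
    n = len(cluster)
    mean = tuple(sum(coords) / n for coords in zip(*cluster))
    return sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in cluster)

def total_wss(clusters):
    return sum(wss(c) for c in clusters)

data = [(1.0,), (2.0,), (9.0,), (10.0,)]

wss_1 = total_wss([data])                  # one cluster: 65.0
wss_2 = total_wss([data[:2], data[2:]])    # two clusters: 1.0
wss_4 = total_wss([[p] for p in data])     # one cluster per object: 0.0
```

The drop from one to two clusters is large, while the drop from two to four is tiny. That is the "elbow" pattern a scree plot of WSS is meant to reveal.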

15
Q

What is the advantage of the information criteria over WSS?

A

The BIC and AIC are a trade-off between model fit and model complexity: adding clusters is penalized, whereas WSS always improves with more clusters.

16
Q

Why might the scree plot not be suitable for determining the optimal number of clusters?

A

There might not be a clear elbow. In that case, use the solution with the lowest BIC instead.

17
Q

What does the Silhouette coefficient tell you?

A

It expresses how clearly the clusters are separated. It measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation).

18
Q

What is the range of the silhouette coefficient? What does a high score mean?

A

It ranges from -1 to 1. A high positive value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters.
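For a single object, the silhouette coefficient is usually computed as s = (b - a) / max(a, b), where a is the mean distance to the other objects in its own cluster and b is the smallest mean distance to the objects of any other cluster. The sketch below uses that standard definition with Euclidean distance; the function name and the example points are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def silhouette(point, own_others, other_clusters):
    # own_others: the remaining members of the point's own cluster.
    # other_clusters: list of all other clusters.
    if not own_others:
        return 0.0  # common convention for a cluster with a single object
    a = sum(euclidean(point, q) for q in own_others) / len(own_others)
    b = min(
        sum(euclidean(point, q) for q in c) / len(c) for c in other_clusters
    )
    return (b - a) / max(a, b)

far_cluster = [(4.0, 0.0), (6.0, 0.0)]

# Well-matched object: close to its own cluster, far from the other one.
s_good = silhouette((0.0, 0.0), [(0.0, 1.0)], [far_cluster])   # about 0.8

# Poorly matched object: closer to the other cluster than to its own.
s_bad = silhouette((3.0, 0.0), [(0.0, 0.0)], [far_cluster])    # negative
```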

19
Q

What if many points have a low or negative silhouette coefficient?

A

Then the clustering configuration may have too many or too few clusters.