Cluster Analysis Flashcards

1
Q

When to look at cluster patterns

A

when a PT wants to group patients according to their attributes in order to treat them better

when a PT wants to classify patients based on their individual health records in order to develop specific, appropriate management strategies

2
Q

Hierarchical clustering

A

a set of nested clusters organized as a hierarchical tree (dendrogram)

produces a set of nested clusters: individuals or clusters are merged pair by pair into progressively larger clusters until only one remains

3
Q

Non-Hierarchical clustering

A

groups individuals into clusters so that each object belongs to exactly one cluster

divides a data set of ‘n’ individuals into ‘m’ clusters

K-means clustering is the most commonly used type

4
Q

Hierarchical Clustering:

Bottom-up (agglomerative)

A

starts with each piece of data as its own cluster and then merges clusters to form larger groups

5
Q

Hierarchical Clustering:

Top-down (divisive)

A

starts with all data in one group and then partitions the data step by step using a flat clustering algorithm
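A minimal sketch of the top-down idea, assuming 1-D numeric data, 2-means as the flat clustering algorithm, and "split the cluster with the widest range" as the rule for choosing what to divide next. These choices and the function names are hypothetical illustrations, not a standard divisive method.

```python
def two_means_1d(values, iters=20):
    """Flat clustering step: split 1-D values into two groups with 2-means."""
    lo, hi = min(values), max(values)  # initial centroids: the two extremes
    for _ in range(iters):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not left or not right:
            break
        new_lo, new_hi = sum(left) / len(left), sum(right) / len(right)
        if (new_lo, new_hi) == (lo, hi):  # centroids stable: stop
            break
        lo, hi = new_lo, new_hi
    return left, right


def divisive(values, k):
    """Start with everything in one group and split until k groups exist."""
    clusters = [sorted(values)]
    while len(clusters) < k:
        widest = max(clusters, key=lambda c: max(c) - min(c))
        if max(widest) == min(widest):  # all remaining groups are constant
            break
        left, right = two_means_1d(widest)
        clusters.remove(widest)
        clusters += [sorted(left), sorted(right)]
    return sorted(clusters)
```

For example, divisive([1, 2, 3, 10, 11, 12, 30, 31], 3) first splits off [30, 31], then splits the remaining group into [1, 2, 3] and [10, 11, 12].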

6
Q

Procedure of the agglomerative approach

A
  1. assign each item to its own cluster
  2. find the closest pair of clusters and merge them into a single cluster
  3. compute the distances (similarities) between the new cluster and each of the old clusters
  4. repeat steps 2 and 3 until all items are merged into a single cluster containing the whole sample
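The four steps above can be sketched as a naive agglomerative loop. This assumes Euclidean distance and lets the caller pick the linkage (min for single linkage, max for complete linkage); the function name and merge-history format are hypothetical, and real libraries use much faster algorithms.

```python
import math


def agglomerative(points, linkage=min):
    """Naive agglomerative clustering; returns the merge history.

    Each history entry is (sorted point indices of the new cluster, merge distance).
    `linkage` combines point-to-point distances: min = single, max = complete.
    """
    # Step 1: assign each item to its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        return linkage(math.dist(points[i], points[j]) for i in a for j in b)

    # Step 4: repeat until one cluster contains the whole sample.
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merged = clusters[x] + clusters[y]
        merges.append((sorted(merged), d))
        # Step 3: distances to the new cluster are recomputed on the next pass.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges
```

With points (0, 0), (0, 1), (5, 5), (5, 6), the two tight pairs merge first at distance 1, and the final single-linkage merge happens at the distance between (0, 1) and (5, 5), which is √41.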
7
Q

Limitations of Hierarchical Clustering

A

it is necessary to specify both a distance metric and a linkage criterion, without any strong theoretical basis for the choice

selecting the number of clusters from the dendrogram can be misleading
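To see why the metric and linkage choices matter, here is a small sketch that measures the distance between the same two clusters under different settings. The helper names and the three linkage options shown are illustrative assumptions; they are common textbook choices, not an exhaustive list.

```python
import math


def manhattan(p, q):
    """City-block distance between two points."""
    return sum(abs(x - y) for x, y in zip(p, q))


LINKAGES = {
    "single": min,                              # closest pair of points
    "complete": max,                            # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),    # mean over all pairs
}


def cluster_distance(a, b, linkage="single", metric=math.dist):
    """Distance between clusters a and b under a chosen metric and linkage."""
    pair_distances = [metric(p, q) for p in a for q in b]
    return LINKAGES[linkage](pair_distances)
```

For A = [(0, 0), (0, 2)] and B = [(3, 0), (3, 2)], single linkage with Euclidean distance gives 3.0, complete linkage gives √13 ≈ 3.61, and complete linkage with Manhattan distance gives 5. Different settings change which clusters look "closest", and therefore the shape of the dendrogram.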

8
Q

K-Means Clustering

A

data is classified into K clusters

each individual data point is assigned to the cluster with the nearest mean (centroid)

9
Q

K-Means Clustering:

Procedure

A
  1. select K points as the initial centroids
  2. assign each point to its nearest centroid
  3. recompute the centroid (mean) of each group
  4. repeat steps 2 and 3 until the centroids are stable (no longer change)
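The procedure above can be sketched in plain Python. This assumes Euclidean distance and random initial centroids drawn from the data (a fixed seed keeps runs reproducible); the function name and the choice to keep an empty cluster's old centroid are assumptions of this sketch.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Basic K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 1: pick K data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

For the points (0, 0), (0, 1), (10, 10), (10, 11) with K = 2, the two nearby pairs end up in separate clusters with centroids (0, 0.5) and (10, 10.5). On messier data the result can depend on the initial centroids, which is why implementations often restart from several seeds.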
10
Q

K-Means Clustering:

Limitations

A

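the researcher must choose the number of clusters (K) in advance

a larger K generally gives a shorter distance from each point to its centroid, so this distance alone cannot be used to pick K

when every data point is its own centroid (K = n) the distance is 0, but the clustering is useless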
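A small worked sketch of this limitation: the within-cluster sum of squared distances to the centroid keeps falling as K grows and reaches 0 when every point is its own cluster. The partitions below are picked by hand for illustration rather than produced by K-means, and the function name is hypothetical.

```python
def within_cluster_ss(clusters):
    """Sum of squared distances from each point to its cluster centroid (mean)."""
    total = 0.0
    for members in clusters:
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid))
                     for p in members)
    return total


points = [(1.0,), (2.0,), (6.0,), (7.0,)]
k1 = within_cluster_ss([points])                          # K = 1
k2 = within_cluster_ss([points[:2], points[2:]])          # K = 2
k3 = within_cluster_ss([points[:2], [points[2]], [points[3]]])  # K = 3
kn = within_cluster_ss([[p] for p in points])             # K = n
```

The values drop from 26.0 (K = 1) to 1.0 (K = 2), 0.5 (K = 3) and 0.0 (K = n). The big drop from K = 1 to K = 2 reflects real structure, while the later drops only split natural groups apart.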

11
Q

Two Step Clustering

A

runs a pre-clustering step first and then a hierarchical method

  • can handle both categorical AND continuous variables
  • automatic selection of the number of clusters
  • can analyze large data sets efficiently
12
Q

Two Step Clustering:

Procedure

A
  1. a sequential approach is used to pre-cluster the cases, condensing the data into a smaller number of pre-clusters
  2. the pre-clusters are then merged with a hierarchical method into the desired number of clusters
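A simplified sketch of the two stages, assuming 1-D continuous data only. Each pre-cluster is summarized by (sum, count), loosely in the spirit of a cluster-feature summary. The distance threshold, absolute-difference distance and function names are hypothetical choices. Production two-step implementations (for example SPSS TwoStep) use richer summaries, a log-likelihood distance and automatic selection of the number of clusters, none of which is reproduced here.

```python
def centre(summary):
    """Mean of a pre-cluster stored as [sum, count]."""
    return summary[0] / summary[1]


def pre_cluster(values, threshold):
    """Step 1: one sequential pass over the cases. Each case joins the nearest
    pre-cluster if it lies within `threshold` of its mean, else starts a new one."""
    pre = []  # each entry is [sum, count]
    for v in values:
        nearest = min(pre, key=lambda s: abs(v - centre(s)), default=None)
        if nearest is not None and abs(v - centre(nearest)) <= threshold:
            nearest[0] += v
            nearest[1] += 1
        else:
            pre.append([v, 1])
    return pre


def merge_pre_clusters(pre, k):
    """Step 2: agglomeratively merge the closest pre-cluster means until k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    pre = [list(s) for s in pre]
    while len(pre) > k:
        i, j = min(((a, b) for a in range(len(pre)) for b in range(a + 1, len(pre))),
                   key=lambda ab: abs(centre(pre[ab[0]]) - centre(pre[ab[1]])))
        pre[i][0] += pre[j][0]
        pre[i][1] += pre[j][1]
        del pre[j]  # j > i, so index i is unaffected
    return sorted(centre(s) for s in pre)
```

With values [1.0, 1.2, 0.9, 5.0, 5.1, 9.8, 10.0, 10.3] and a threshold of 0.5, the single pass produces 3 pre-clusters. Merging to K = 2 joins the two lowest groups. Only the small pre-cluster summaries, not the raw cases, are passed to the hierarchical step, which is what makes the approach cheap on large data sets.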
13
Q

Cluster Quality Validation Index:

Silhouette coefficient

A

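measures how well an individual data point is clustered by comparing its average distance to the other points in its own cluster (a) with its average distance to the points in the nearest neighboring cluster (b)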
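The coefficient for point i is usually written s(i) = (b − a) / max(a, b), which always lies between −1 and 1. A minimal sketch, assuming Euclidean distance and at least two clusters. Giving a point in a singleton cluster a value of 0 is a common convention assumed here, and the function name is hypothetical.

```python
import math


def silhouette(points, labels, i):
    """Silhouette coefficient s(i) = (b - a) / max(a, b) for point i."""
    own = [j for j in range(len(points)) if labels[j] == labels[i] and j != i]
    if not own:
        return 0.0  # convention: singleton clusters score 0
    # a: mean distance to the other members of i's own cluster
    a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
    # b: mean distance to the members of the nearest other cluster
    b = min(
        sum(math.dist(points[i], points[j])
            for j in range(len(points)) if labels[j] == c) / labels.count(c)
        for c in set(labels) if c != labels[i]
    )
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom
```

For points (0, 0), (0, 2), (10, 0) with labels [0, 0, 1], the first point has a = 2 and b = 10, so s = 0.8 (well clustered). For points (0, 0), (5, 0), (6, 0) with labels [0, 0, 1], the middle point has a = 5 and b = 1, so s = −0.8, suggesting it sits in the wrong cluster.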

14
Q

Cluster Quality Validation Index:

Silhouette plot

A

displays a measure of how close each point in one cluster is to points in the neighboring cluster
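A rough text-only sketch of how a silhouette plot is laid out. Points are grouped by cluster, each point gets one bar whose length is proportional to its silhouette value, and bars are sorted from highest to lowest within each cluster. The scores are assumed to be precomputed, and the "-" marker for negative values and the function name are hypothetical presentation choices; graphical tools draw the same layout as horizontal bars.

```python
def text_silhouette_plot(scores, labels, width=20):
    """Return one text line per point, grouped by cluster and sorted
    from highest to lowest silhouette value within each cluster."""
    lines = []
    for c in sorted(set(labels)):
        values = sorted((s for s, lab in zip(scores, labels) if lab == c),
                        reverse=True)
        for s in values:
            bar = "#" * round(abs(s) * width)  # bar length ~ |s(i)|
            sign = "-" if s < 0 else " "       # mark negative values
            lines.append(f"cluster {c} {sign}{bar} {s:.2f}")
    return lines


for line in text_silhouette_plot([0.8, 0.5, -0.2], [0, 0, 1], width=10):
    print(line)
```

A cluster whose bars are mostly long has well-separated members. Short bars, or bars marked as negative, flag points that sit close to, or inside, a neighboring cluster.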

15
Q

Interpretation with Silhouette coefficient:

an individual data point with a silhouette coefficient close to 1

A

very well clustered

16
Q

Interpretation with Silhouette coefficient:

an individual data point with a silhouette coefficient around 0

A

lies on or near the boundary between two clusters

17
Q

Interpretation with Silhouette coefficient:

an individual data point with a negative silhouette coefficient

A

probably assigned to the wrong cluster

18
Q

Silhouette coefficient value

0.5-1.0

A

Good

19
Q

Silhouette coefficient value

0.2-0.5

A

Fair

20
Q

Silhouette coefficient value

-1.0 to 0.2

A

Poor
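The three ranges above can be turned into a small lookup. The cards share their boundary values (0.5 appears in both "Good" and "Fair", 0.2 in both "Fair" and "Poor"), so this sketch assumes a boundary value belongs to the higher rating. The function name is hypothetical.

```python
def rate_silhouette(value):
    """Map a silhouette value to the Good / Fair / Poor bands on these cards.

    Assumption: a boundary value (0.5 or 0.2) gets the higher rating.
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("silhouette values lie in [-1, 1]")
    if value >= 0.5:
        return "Good"
    if value >= 0.2:
        return "Fair"
    return "Poor"
```

For example, 0.7 rates as Good, 0.3 as Fair, and both 0.1 and −0.5 rate as Poor.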