Lecture 16 - Clustering Flashcards

1
Q

What is clustering?

A

Given a set of unlabelled training examples, find a way to partition the examples into classes (groups)

The resulting partition can then be used to determine the class of any new sample

2
Q

Clustering is also known as ____________ learning

A

unsupervised

3
Q

What criteria make a good partition for clustering?

A

Maximise similarity within classes

Minimise similarity between classes

Minimise the number of classes created

Maximise the ability to predict unknown attribute values from class membership
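
The first two criteria can be made concrete with a small sketch. It assumes points in the plane and uses Euclidean distance as an inverse measure of similarity (smaller distance means more similar); the helper names `mean_within` and `mean_between` and the sample data are hypothetical, chosen only for illustration. It does not cover the criteria about the number of classes or attribute prediction.

```python
from itertools import combinations, product
from math import dist


def mean_within(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_between(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    ]
    return sum(pairs) / len(pairs)


# Two partitions of the same four points (illustrative data).
good = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)]]
bad = [[(0.0, 0.0), (5.0, 5.0)], [(0.0, 1.0), (5.0, 6.0)]]

for name, partition in (("good", good), ("bad", bad)):
    print(name, mean_within(partition), mean_between(partition))
```

Under these assumptions, the better partition is the one with the smaller within-cluster distance and the larger between-cluster distance.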

4
Q

Agglomerative hierarchical clustering and k-means are both methods of what?

A

Clustering

5
Q

What is the basic procedure of agglomerative hierarchical clustering?

A

Assign each sample to its own cluster

While there are at least X clusters

Find the most similar pair of clusters

Merge them into a new larger cluster
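
The steps above might be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes points in the plane, measures cluster similarity by single-link Euclidean distance (the closest pair of points across two clusters), and reads the stopping rule as "stop once a chosen number of clusters remains". The names `single_link`, `agglomerate`, and `target_clusters` are hypothetical.

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Cluster distance = distance of the closest cross-cluster pair (assumed choice)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, target_clusters):
    # Assign each sample to its own cluster.
    clusters = [[p] for p in points]
    # Keep merging while more clusters remain than requested (assumed stopping rule).
    while len(clusters) > target_clusters:
        # Find the most similar (closest) pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge them into a new, larger cluster.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, 3)
print(result)
```

Other linkage choices (complete-link, average-link) would only change `single_link`; the merge loop stays the same.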

6
Q

What are the prerequisites for agglomerative hierarchical clustering?

A

A similarity metric for samples

All examples must be available at the start

A human analyst to determine the optimal number of clusters

7
Q

What is the basic procedure of the k-means method?

A

k = number of clusters to form

Choose k items randomly to be the initial cluster centers

repeat until no item changes cluster {

    assign each item to the cluster with the nearest center

    set each cluster center to the mean of the items in that cluster

}
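
The loop above could be written out roughly as below. This is a sketch under stated assumptions: items are points in the plane, "nearest" means Euclidean distance, ties go to the lower-numbered cluster, and a cluster that ends up empty keeps its previous center. The names `k_means`, `mean`, and `seed` are hypothetical and not part of the original card.

```python
import random
from math import dist


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def k_means(items, k, seed=0):
    rng = random.Random(seed)
    # Choose k items randomly to be the initial cluster centers.
    centers = rng.sample(items, k)
    assignment = None
    while True:
        # Assign each item to the cluster with the nearest center.
        new_assignment = [
            min(range(k), key=lambda c: dist(item, centers[c])) for item in items
        ]
        # Stop once no item has changed cluster.
        if new_assignment == assignment:
            return centers, assignment
        assignment = new_assignment
        # Set each center to the mean of the items in its cluster.
        for c in range(k):
            members = [item for item, a in zip(items, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = mean(members)


items = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = k_means(items, k=2)
print(centers, assignment)
```

The result depends on the random initial centers in general, which is why the sketch takes a `seed`; on data with less clearly separated groups, different seeds can give different partitions.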
