17-Unsupervised learning Flashcards

1
Q

What is unsupervised learning?

A

Unsupervised learning is a family of machine learning methods that find structure in data when the class labels are unknown

2
Q

What is the difference between exclusive and overlapping clustering?

A

In exclusive clustering an item can belong to only one cluster, whereas in overlapping clustering an item can belong to more than one cluster

3
Q

What is the difference between deterministic and probabilistic clustering?

A

Deterministic clustering assigns each item definitively to a cluster, whereas probabilistic clustering assigns each item a probability of belonging to each cluster

4
Q

What is the difference between hierarchical and partitioning clustering?

A

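Hierarchical clustering produces nested clusters with subset relationships (a tree of clusters), whereas partitioning clustering divides the data into a flat set of clusters with no nesting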

5
Q

What is the difference between heterogeneous and homogeneous clustering?

A

Heterogeneous clustering allows clusters of different shapes and sizes, whereas homogeneous clustering produces clusters that all share one shape

6
Q

What is the difference between partial and complete clustering?

A

Partial clustering assigns only some of the data to clusters, whereas complete clustering assigns every item to a cluster

7
Q

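What is the difference between incremental and batch clustering?

A

In batch clustering, all items are clustered at the same time, whereas in incremental clustering items are added one at a time and the clusters are updated as each arrives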

8
Q

How does the k-means algorithm work?

A

Initialise k random seed points as the initial centroids
Assign each instance to the cluster with the nearest centroid
Update each centroid to the mean of the instances assigned to it
Repeat the assignment and update steps until the centroids no longer change
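
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `seed` and `max_iter` parameters, the choice of squared Euclidean distance, seeding from existing data points, and the sample `points` are all assumptions made for the example.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means on a list of equal-length numeric tuples (illustrative)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids (assumption).
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each instance to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Step 3: move each centroid to the mean of its assigned instances.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the two groups are far enough apart that the algorithm recovers them; on harder data the result can depend on the seed.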

9
Q

What are the pros of k-means?

A

Relatively efficient (each iteration is linear in the number of instances)
Can be extended to hierarchical clustering

10
Q

What are the cons of k-means?

A

Sensitive to the random choice of initial centroids
Mean is not well defined for nominal / ordinal attributes
Sensitive to outliers
May not be able to handle clusters of different sizes
Need to specify k in advance

11
Q

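How should the k for k-means be chosen?

A

Calculate the within-cluster SSE (sum of squared distances from each instance to its centroid) for a range of k values. Within-cluster SSE decreases as k increases, so use the elbow method: pick the k at which the decrease levels off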
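
As a rough sketch of the elbow method, the snippet below runs a tiny 1-D k-means for several values of k and records the within-cluster SSE of each result. The helper names `kmeans_1d` and `within_cluster_sse`, the evenly spaced seeding rule, and the sample `values` are assumptions made for illustration only.

```python
def kmeans_1d(values, k, max_iter=100):
    """Tiny 1-D k-means; seeds are spread evenly over the sorted values (assumption)."""
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assign every value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: (v - centroids[j]) ** 2)
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return clusters

def within_cluster_sse(clusters):
    """Sum of squared distances from each value to its own cluster mean."""
    total = 0.0
    for c in clusters:
        if c:
            mean = sum(c) / len(c)
            total += sum((v - mean) ** 2 for v in c)
    return total

# Hypothetical data with three natural groups around 1, 5 and 9.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
sse_by_k = {k: within_cluster_sse(kmeans_1d(values, k)) for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

For this data the SSE drops sharply up to k = 3 and only marginally afterwards, which is the "elbow" the method looks for.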

12
Q

What are the two hierarchical clustering methods?

A

Agglomerative and divisive

13
Q

What is agglomerative hierarchical clustering?

A

Bottom-up clustering - start with each instance as its own cluster and repeatedly merge the two closest clusters
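
A minimal sketch of the bottom-up procedure, assuming single-link (minimum) distance between clusters and stopping at a requested number of clusters. The function names, the `n_clusters` stopping rule, and the sample points are hypothetical choices for this example.

```python
from itertools import combinations
from math import dist

def single_link(a, b):
    """Distance between the two closest points of clusters a and b."""
    return min(dist(p, q) for p in a for q in b)

def agglomerative(points, n_clusters, linkage=single_link):
    # Start with every instance in its own single-instance cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest under the linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i (i < j, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 2-D points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = agglomerative(points, n_clusters=3)
```

Running the loop until a single cluster remains, and recording each merge, would give the full hierarchy rather than one flat cut of it.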

14
Q

What is divisive hierarchical clustering?

A

Top-down clustering - start with one cluster containing all instances, split it into two clusters, and proceed recursively on each

15
Q

What are the graph-based measures of proximity?

A

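Minimum (single link) - distance between the two nearest points in different clusters
Maximum (complete link) - distance between the two furthest points in different clusters
Average (group average) - average distance between all pairs of points in different clusters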
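
The three measures can be compared on a small example. This is a sketch assuming Euclidean distance; the function names and the two sample clusters `a` and `b` are made up for illustration.

```python
from math import dist

def single_link(a, b):
    """Minimum: distance between the two nearest points, one from each cluster."""
    return min(dist(p, q) for p in a for q in b)

def complete_link(a, b):
    """Maximum: distance between the two furthest points, one from each cluster."""
    return max(dist(p, q) for p in a for q in b)

def average_link(a, b):
    """Average of the distances over all cross-cluster pairs of points."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

# Hypothetical clusters lying on the x-axis.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]

print(single_link(a, b))    # 3.0  (from x=1 to x=4)
print(complete_link(a, b))  # 6.0  (from x=0 to x=6)
print(average_link(a, b))   # 4.5  (mean of 4, 6, 3, 5)
```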

16
Q

How can clustering be evaluated?

A

Unsupervised: Cluster cohesion, Cluster separation
Supervised: Entropy, Purity

17
Q

What is cluster cohesion?

A

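Cluster cohesion measures how closely related the points within a cluster are. It is measured with the within-cluster SSE: the lower a cluster's SSE, the higher its cohesion (for example, scored as 1/SSE for each cluster)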

18
Q

What is cluster separation?

A

Cluster separation measures how distinct the clusters are from each other. It is measured with the between-cluster SSE
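
A small numeric sketch of within-cluster and between-cluster SSE on 1-D data. The card does not give a formula for between-cluster SSE, so the definition used here (each cluster mean's squared distance from the overall mean, weighted by cluster size) is an assumption, as are the function names and sample clusters. Under that definition the two quantities add up to the total SSE of the data.

```python
def mean(values):
    return sum(values) / len(values)

def within_sse(clusters):
    """Cohesion: squared distance of every point to its own cluster mean."""
    return sum((v - mean(c)) ** 2 for c in clusters for v in c)

def between_sse(clusters):
    """Separation (assumed definition): size-weighted squared distance
    of each cluster mean from the overall mean."""
    overall = mean([v for c in clusters for v in c])
    return sum(len(c) * (mean(c) - overall) ** 2 for c in clusters)

def total_sse(clusters):
    """Squared distance of every point to the overall mean."""
    points = [v for c in clusters for v in c]
    overall = mean(points)
    return sum((v - overall) ** 2 for v in points)

# Hypothetical clustering of six 1-D points.
clusters = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]
wss = within_sse(clusters)    # 4.0
bss = between_sse(clusters)   # 54.0
tss = total_sse(clusters)     # 58.0
```

Because the total is fixed for a given data set, lowering the within-cluster SSE (better cohesion) raises the between-cluster SSE (better separation) under this definition.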

19
Q

What is entropy in clustering?

A

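Entropy measures how mixed the true classes are within each cluster, based on the probability of each class within the cluster. The overall entropy is the sum of the per-cluster entropies weighted by cluster size. The aim is to minimise entropy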

20
Q

What is purity in clustering?

A

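The purity of a cluster is the probability of its most common class. The overall purity is the sum of the per-cluster purities weighted by cluster size. The aim is to maximise purity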
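
Both supervised measures can be computed from the true labels found in each cluster. The sketch below assumes base-2 logarithms for entropy and weights each cluster by its share of the items; the function names and the example clusters are hypothetical.

```python
from collections import Counter
from math import log2

def cluster_entropy(labels):
    """Entropy of the true-class distribution inside one cluster."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def cluster_purity(labels):
    """Proportion of the cluster taken up by its most common true class."""
    return max(Counter(labels).values()) / len(labels)

def weighted(metric, clusters):
    """Combine a per-cluster metric, weighting each cluster by its size."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * metric(c) for c in clusters)

# Hypothetical clustering: each inner list holds the true labels of one cluster.
clusters = [["a", "a", "a", "b"], ["b", "b", "b", "b"]]

entropy = weighted(cluster_entropy, clusters)  # about 0.406
purity = weighted(cluster_purity, clusters)    # 0.875
```

The second cluster is pure (entropy 0, purity 1), so only the mixed first cluster contributes to the overall entropy.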

21
Q

What are homogeneity and completeness, and how are they measured?

A

Homogeneity - every cluster contains only elements with the same true label. Measured with entropy and purity

Completeness - all members of a class are assigned to the same cluster