L11 - Unsupervised Learning Flashcards

1
Q

What is the goal of unsupervised learning?

A
  • To identify patterns and structure in unlabelled data, i.e. data with no target labels.
2
Q

Give some examples of objectives that can be achieved with unsupervised learning…

A
  • Identifying new animal species, segmenting customers, and detecting fraudulent activity.
3
Q

Unsupervised learning is used for clustering tasks. Explain how this is done…

A
  • Iterate over all points in the data, computing a distance metric between each pair of points. Points that are close to one another are then grouped into the same cluster.
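As a rough illustration, this Python sketch computes pairwise Euclidean distances and groups points that are linked by a chain of neighbours closer than a chosen `threshold`. The `threshold` parameter and the function names are hypothetical, and this simple linking rule is only one way of turning distances into clusters; algorithms such as K-means or DBSCAN use more careful rules.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def threshold_clusters(points, threshold):
    """Group points whose distance is below `threshold` (hypothetical parameter).

    Two points share a cluster if they are linked by a chain of points,
    each within `threshold` of the next (a single-linkage style rule).
    """
    n = len(points)
    labels = [-1] * n  # -1 means "not yet assigned to a cluster"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Flood-fill outwards from `start` through every close-enough point
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and euclidean(points[i], points[j]) < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

For example, `threshold_clusters([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` returns `[0, 0, 1, 1]`: the two nearby pairs form two separate clusters.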
4
Q

Unsupervised learning is used for community detection. Explain what this is and how it’s done…

A
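  • A community is a group of nodes in a network that are more densely connected to each other than to the rest of the network. Nodes that share more connections have a higher connection strength, so communities are found by grouping strongly connected nodes together. E.g. a community of school friends on Facebook will be strong due to the many mutual friendships.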
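A minimal Python sketch of the "connection strength" idea, assuming it is measured as the number of mutual friends (shared neighbours) two connected nodes have. The friendship graph and the `min_shared` cut-off are made up for illustration; practical community detection algorithms such as Louvain optimise a different quantity (modularity) rather than this simple count.

```python
# Hypothetical undirected friendship graph: person -> set of friends
friends = {
    "ana": {"ben", "cal", "dee"},
    "ben": {"ana", "cal", "dee"},
    "cal": {"ana", "ben", "dee"},
    "dee": {"ana", "ben", "cal", "eve"},
    "eve": {"dee", "fay"},
    "fay": {"eve"},
}

def connection_strength(graph, u, v):
    """Number of neighbours u and v have in common (mutual friends)."""
    return len(graph[u] & graph[v])

def strong_pairs(graph, min_shared):
    """Connected pairs sharing at least `min_shared` mutual friends."""
    pairs = []
    for u in sorted(graph):
        for v in sorted(graph[u]):
            # u < v so each undirected edge is reported once
            if u < v and connection_strength(graph, u, v) >= min_shared:
                pairs.append((u, v))
    return pairs
```

Here `strong_pairs(friends, 2)` returns every pair within the tightly knit group ana, ben, cal and dee, while the links between dee and eve and between eve and fay share no mutual friends and are left out.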
5
Q

Unsupervised learning is used for topic modelling. Explain what this is and how it’s done…

A
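  • Topic modelling identifies topics and common themes in a data set, typically a collection of text documents. The text is usually preprocessed first, e.g. by reducing words to their lemma or stem, and converted into a numeric representation such as word counts or word embeddings; words that frequently occur together are then grouped into topics.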
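A minimal sketch of the preprocessing step only, in Python. It assumes a hypothetical tiny stopword list and a deliberately crude suffix-stripping stemmer (not a real stemmer or lemmatiser); the resulting per-document word counts are the kind of input a topic model such as LDA would then group into topics.

```python
from collections import Counter

# Hypothetical, tiny stopword list for illustration only
STOPWORDS = {"the", "a", "and", "of", "to", "is", "in"}

def naive_stem(word):
    """Very crude suffix stripping; real stemmers (e.g. Porter) are far more careful."""
    for suffix in ("ing", "ed", "s"):
        # Only strip if a reasonably long stem remains
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def bag_of_stems(doc):
    """Lowercase, strip punctuation, drop stopwords, stem, then count."""
    words = [w.strip(".,!?").lower() for w in doc.split()]
    return Counter(naive_stem(w) for w in words if w and w not in STOPWORDS)
```

For example, `bag_of_stems("The players played and the player plays.")` counts `player` twice and `play` twice, which also shows the limits of naive stemming: "player" and "play" are not merged.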
6
Q

Give some examples of clustering algorithms…

A

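K-means -> Assigns each point to the nearest of K centroids, then moves each centroid to the mean of its assigned points, repeating until the assignments stop changing. K is a hyperparameter chosen by the user.

DBSCAN -> Density-Based Spatial Clustering of Applications with Noise. Finds high-density regions and creates clusters by expanding outwards from them; points in low-density regions are labelled as noise.

Hierarchical Clustering -> Builds a hierarchy of clusters, either by repeatedly dividing clusters into sub-clusters (divisive) or by repeatedly merging the closest clusters (agglomerative).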
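K-means is usually implemented with Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. The plain-Python sketch below assumes 2-D points as `(x, y)` tuples and initialises the centroids with K randomly chosen data points; the `iterations` cap and `seed` are illustrative choices, not part of any standard API.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for K-means on a list of (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids (and so assignments) no longer change
        centroids = new_centroids
    return labels, centroids
```

On two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the three points near the origin share one label and the other three share the other. Because the result depends on the initial centroids, K-means is often run several times with different seeds.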

7
Q

What are the 2 types of clustering algorithms? Define each…

A

Hard Clustering -> Each data point belongs to exactly one cluster. Used when we want to make a definite decision about the data, i.e. a point can’t belong to multiple groups: e.g. it is in A, B or C, but never more than one.

Soft Clustering -> Each data point can be assigned to multiple clusters, typically with a degree of membership (e.g. a probability) for each cluster.
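To contrast the two, this Python sketch takes a point's distances to each cluster centre and either picks the single nearest cluster (hard) or spreads a membership weight across all clusters (soft). The softmax-over-negative-distances weighting and the `temperature` parameter are assumptions chosen for illustration; methods such as Gaussian mixture models or fuzzy c-means compute memberships in their own ways.

```python
import math

def hard_assign(distances):
    """Hard clustering: index of the single nearest cluster."""
    return min(range(len(distances)), key=lambda c: distances[c])

def soft_assign(distances, temperature=1.0):
    """Soft clustering: one membership weight per cluster, summing to 1.

    Closer clusters get larger weights via a softmax over negative distances
    (an illustrative weighting scheme, not a standard library function).
    """
    scores = [math.exp(-d / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]
```

For a point at distances `[1.0, 1.0, 5.0]` from three centres, `hard_assign` is forced to pick one cluster (index 0), whereas `soft_assign` gives the first two clusters equal weight and the third almost none.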

8
Q

What is a common similarity / distance metric used for clustering?

A
  • Euclidean distance (the L2 norm of the difference between two points).
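A direct Python version of the formula: the square root of the sum of squared coordinate differences. The function name is illustrative; the standard library's `math.dist` (Python 3.8+) computes the same value.

```python
import math

def euclidean_distance(p, q):
    """L2 norm of p - q: sqrt of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For example, `euclidean_distance((0, 0), (3, 4))` is `5.0`.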
9
Q

When do we use Jaccard Similarity? How is it calculated?

A

We use Jaccard Similarity to measure the similarity between 2 sets. It is calculated as the number of elements in the intersection of the sets divided by the number of elements in their union: J(A, B) = |A ∩ B| / |A ∪ B|.

The Jaccard Distance = 1 - Jaccard Similarity.
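A minimal Python sketch of both measures using built-in sets. Returning 1.0 for two empty sets is an assumed convention, since the ratio is undefined when the union is empty.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)

def jaccard_distance(a, b):
    """Jaccard Distance = 1 - Jaccard Similarity."""
    return 1.0 - jaccard_similarity(a, b)
```

For `{"a", "b", "c"}` and `{"b", "c", "d"}`, the intersection has 2 elements and the union has 4, so the similarity is 0.5 and the distance is 0.5.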

10
Q

How do we calculate Jaccard Distance?

A
  • Jaccard Distance = 1 - Jaccard Similarity.