IFN580 Week 7 Unsupervised Learning Flashcards
(16 cards)
What is the ‘curse of dimensionality’?
when dimensionality increases, the data becomes sparse, so more data is required to learn properly
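A small NumPy sketch (not part of the original cards) that illustrates the sparsity effect: for uniformly random points, the gap between a point's nearest and farthest neighbour shrinks as dimensionality grows, so distances stop being informative. The sample size and dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# As the number of dimensions grows, the nearest and farthest
# neighbours of a point end up almost equally far away: the data
# is effectively sparse and distance loses discriminative power.
for dim in (2, 10, 100, 1000):
    X = rng.random((500, dim))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances from point 0
    print(f"dim={dim:4d}  nearest/farthest ratio: {dists.min() / dists.max():.3f}")
```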
Both supervised and unsupervised learning require at least one:
input attribute
Supervised learning differs from unsupervised learning in that supervised learning requires:
at least one output (target) attribute
When selecting features, are the attribute values changed in any way?
No, feature selection simply selects/excludes features without changing any attribute values.
Both feature selection and Principal Component Analysis reduce the number of features for a given dataset. How does the process differ between these two techniques?
Feature selection keeps a subset of the original features unchanged, whereas PCA creates new features (principal components) that are combinations of the originals
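A minimal scikit-learn sketch contrasting the two; the iris dataset, k=2, and n_components=2 are illustrative choices. The selector keeps original columns untouched, while PCA outputs entirely new columns.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keep 2 of the 4 original columns, values untouched
selector = SelectKBest(f_classif, k=2).fit(X, y)
print(selector.get_support())   # boolean mask of which columns survive
X_selected = selector.transform(X)

# PCA: 2 brand-new columns, each a weighted combination of all 4 originals
X_pca = PCA(n_components=2).fit_transform(X)
print(X_selected[0], X_pca[0])  # selected values are original; PCA values are not
```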
What is one approach for selecting the optimal number of components for PCA?
Plot the cumulative explained variance of the components and look for the elbow point
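A short sketch of that approach, assuming scikit-learn and matplotlib, with the digits dataset as a stand-in:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

pca = PCA().fit(X)  # fit with all components retained
cumulative = np.cumsum(pca.explained_variance_ratio_)

# The elbow in this curve suggests where extra components stop paying off
plt.plot(range(1, len(cumulative) + 1), cumulative, marker=".")
plt.xlabel("number of components")
plt.ylabel("cumulative explained variance")
plt.show()
```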
When does PCA work optimally?
PCA works best when the data follows a normal distribution.
What is a scenario where PCA may not perform adequately?
It is sensitive to outliers and may not work optimally when the data is sparse.
t-SNE and UMAP are ___________ methods for ___________.
machine learning, dimensionality reduction
t-SNE can only compute up to ___________ components.
3
What is perplexity?
a hyperparameter in t-SNE that controls how many neighbours each point considers
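A minimal scikit-learn sketch showing both t-SNE facts above, the component limit and the perplexity hyperparameter; the digits dataset and perplexity=30 are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)

# n_components is capped at 3 for the default Barnes-Hut method;
# perplexity roughly sets how many neighbours each point considers.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)
print(X_embedded.shape)  # (1797, 2)
```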
Which hyperparameters does UMAP use?
“nearest neighbours” (n_neighbors): controls how UMAP balances local structure versus global structure in the data.
“minimum distance” (min_dist): controls how tightly points can be packed together in the low-dimensional representation.
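A minimal sketch of both hyperparameters, assuming the umap-learn package (imported as umap); the digits dataset and the parameter values shown are illustrative, not prescribed.

```python
from sklearn.datasets import load_digits
import umap  # pip install umap-learn

X, _ = load_digits(return_X_y=True)

# n_neighbors: smaller values favour local structure, larger values global;
# min_dist: how tightly points may be packed in the low-dimensional output.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
X_embedded = reducer.fit_transform(X)
print(X_embedded.shape)  # (1797, 2)
```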
What algorithm does UMAP use for optimisation?
Uses deterministic graph Laplacian-based optimisation
What algorithm does t-SNE use for optimisation?
Uses stochastic gradient descent
Can you combine multiple dimensionality reduction approaches?
Yes, for example use PCA first to reduce to a larger intermediate number of components, then apply UMAP or t-SNE to bring those down to 2 components
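A sketch of that two-step pipeline with scikit-learn; the intermediate size of 50 components is an arbitrary illustrative choice (digits only has 64 features to begin with).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)  # 64 features per sample

# Step 1: PCA to an intermediate number of components (50 is illustrative)
X_reduced = PCA(n_components=50).fit_transform(X)

# Step 2: t-SNE from 50 components down to 2 for plotting
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X_reduced)
print(X_2d.shape)  # (1797, 2)
```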
What is PCA?
a technique that reduces the dimensionality of a dataset while preserving as much of its variance as possible
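A minimal illustration with scikit-learn: passing a float as n_components asks PCA to keep however many components are needed to reach that fraction of explained variance. The 0.95 threshold and digits dataset are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# A float n_components selects the smallest number of components
# whose cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)        # 64 features down to far fewer
print(pca.explained_variance_ratio_.sum())   # at least 0.95
```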