L11 Flashcards
(20 cards)
What are the three types of learning in machine learning?
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Supervised learning involves labeled data, unsupervised learning does not, and reinforcement learning involves learning through interaction with the environment.
What is the objective of supervised learning?
Learn a function F: Xₖ → Yₖ
The aim is to map inputs to outputs based on labeled data.
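A minimal sketch of this idea, assuming scikit-learn is available: fit a classifier F on labeled pairs (X, y), then use it to map a new input to an output.

```python
# Supervised learning sketch (scikit-learn assumed): learn F from labeled data.
from sklearn.linear_model import LogisticRegression

X = [[0.0], [1.0], [2.0], [3.0]]   # inputs
y = [0, 0, 1, 1]                   # labels
clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5]]))        # map a new input to a predicted label
```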
What is the objective of unsupervised learning?
Discover patterns, cluster/group similar items
Examples include topic clustering on Twitter or detecting fraud through outlier detection.
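A clustering example along these lines (scikit-learn's KMeans assumed): group similar points without ever seeing labels.

```python
# Unsupervised learning sketch: cluster unlabeled points into similar groups.
from sklearn.cluster import KMeans

X = [[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9]]  # no labels provided
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignment per point
```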
What is the curse of dimensionality?
Exponential increase in feature space volume with more features
This leads to difficulties in generalization, increased risk of overfitting, and sparsity of data.
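The exponential growth can be made concrete with a toy calculation: covering the unit cube [0,1]^d with a grid of side 0.1 requires 10^d cells, so a fixed-size dataset spreads ever more thinly.

```python
# Curse of dimensionality sketch: grid cells grow as 10**d,
# so points per cell shrink exponentially with dimension d.
n_points = 1_000_000
for d in (1, 2, 5, 10):
    cells = 10 ** d
    print(f"d={d}: {cells} cells, ~{n_points / cells:.4g} points per cell")
```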
What is dimensionality reduction?
Process of reducing the number of features in a dataset
It speeds up computation, improves generalization, enhances visualization, and eliminates redundant features.
What are the two main approaches to dimensionality reduction?
- Feature Selection
- Feature Extraction
Feature selection keeps a subset of the original variables, while feature extraction creates new features as combinations of the originals.
What does Principal Component Analysis (PCA) do?
Finds new axes (principal components) by rotating the coordinate system
The first component points in the direction of maximum variance; each subsequent component is orthogonal to the previous ones and captures the most remaining variance.
What is the first step in performing PCA?
Standardize data (mean 0, variance 1)
Standardization ensures that each feature contributes equally to the analysis.
What are the properties of PCA?
- Orthogonal components
- Linear combinations of original variables
- Each additional component explains less variance
PCA is commonly used for data visualization and feature extraction.
How many principal components should be kept in PCA?
Enough to reach ≥90% explained variance
The number of components depends on training data size and classifier complexity.
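The PCA steps above can be sketched as follows (scikit-learn assumed, with synthetic correlated data for illustration): standardize, fit PCA, then keep the smallest number of components reaching ≥90% cumulative explained variance.

```python
# PCA sketch: standardize, fit, keep enough components for >=90% variance.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features

X_std = StandardScaler().fit_transform(X)       # step 1: mean 0, variance 1
pca = PCA().fit(X_std)
cumvar = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumvar, 0.90)) + 1      # smallest k with >=90% variance
X_reduced = PCA(n_components=k).fit_transform(X_std)
print(k, X_reduced.shape)
```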
True or False: PCA is color-blind and ignores class labels.
True
This means PCA does not consider class information when reducing dimensions.
What is Non-negative Matrix Factorization (NMF)?
Unsupervised method for non-negative data that factorizes a matrix into non-negative latent components
NMF is useful for text mining and image analysis.
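A minimal NMF sketch (scikit-learn assumed, random non-negative data standing in for, e.g., term counts): factorize V into non-negative W and H so that V ≈ W @ H.

```python
# NMF sketch: V (n_samples x n_features) ~= W (n_samples x k) @ H (k x n_features),
# with all entries of W and H non-negative.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((20, 10))                   # non-negative data matrix
model = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(V)                 # e.g. document-topic weights
H = model.components_                      # e.g. topic-term weights
print(W.shape, H.shape)
```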
What are the strengths of NMF?
- Interpretability
- Parts-based representations
NMF is particularly useful for analyzing topics in text and features in images.
What is a downside of NMF?
Requires non-negative data and is sensitive to initialization
The underlying optimization is non-convex, so different initializations can converge to different factorizations.
What does t-SNE do?
Projects high-dimensional data into 2D or 3D while keeping local neighborhoods close
It is particularly effective for visualizing clusters in data.
What are some hyperparameters for t-SNE?
- n_components (usually 2)
- perplexity (typically 5–50)
- early_exaggeration
- learning_rate (10–1000)
- n_iter (≥250, typically 1,000)
These hyperparameters control various aspects of how t-SNE processes data.
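These settings can be sketched with scikit-learn's TSNE (assumed here); note that perplexity must be smaller than the number of samples.

```python
# t-SNE sketch: project 20-dimensional points to 2D for visualization.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))              # 60 points in 20 dimensions
emb = TSNE(n_components=2, perplexity=5, learning_rate=200.0,
           init="random", random_state=0).fit_transform(X)
print(emb.shape)
```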
What is a key limitation of t-SNE?
Emphasizes local over global structure
It is non-parametric and cannot easily project new points.
What are the strengths and weaknesses of PCA?
- Strengths: Fast, interpretable
- Weaknesses: May lose class info
PCA is a linear method that is efficient but may overlook important class distinctions.
What are the strengths and weaknesses of NMF?
- Strengths: Interpretability, text/image analysis
- Weaknesses: Non-convex optimization, needs non-negative data
NMF is useful but has specific data requirements and optimization challenges.
What are the strengths and weaknesses of t-SNE?
- Strengths: Great for visualization
- Weaknesses: Not useful for modeling
t-SNE excels at visualizing data but is not suitable for predictive modeling tasks.