L1 Flashcards

(27 cards)

1
Q

What is Machine Learning?

A

Learning from data to make predictions or extract knowledge.

2
Q

Why is Machine Learning important?

A
  • Massive growth of unstructured data: texts, audio, images, video.
  • Knowledge discovery from abundant data
3
Q

What is Supervised Learning?

A

Learning with input + labeled output.

  • Classification (discrete output)
  • Regression (continuous output)
4
Q

What is Unsupervised Learning?

A

Learning with input only to find patterns or groups.

  • Clustering, e.g. grouping news articles by topic or discovering trending topics on Twitter or in the news
  • Outlier detection, e.g. fraud detection and security systems
5
Q

What is Reinforcement Learning?

A

Learning policies via interaction & reward.

  • Reasoning under uncertainty to make optimal decisions
  • How agents should take actions in an environment to maximize cumulative reward
  • Other learning types (separate from reinforcement learning): semi-supervised learning, active learning
6
Q

What is a Model in Machine Learning?

A

Mathematical function linking input to output.

7
Q

What are Score functions?

A

Measures of how well a model fits data.
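
As a hedged illustration of this card, two common score functions are accuracy (for classification) and mean squared error (for regression). The function names and the sample values below are assumptions made for this sketch and do not come from the original cards.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (higher is better)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions (lower is better)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative data only: two of the four class predictions are correct.
acc = accuracy(["pass", "fail", "pass", "pass"], ["pass", "pass", "pass", "fail"])
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```

Which score is appropriate depends on the task: accuracy only makes sense for discrete labels, while squared error assumes numeric targets.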

8
Q

What is Feature Selection?

A

Choosing the subset of existing features that matter most for a model.

9
Q

What is Feature Extraction?

A

Transforming data into new features through mathematical operations.

10
Q

What is Classification in Supervised Learning?

A

Predicting category/discrete output (e.g., pass/fail).

11
Q

What is Regression in Supervised Learning?

A

Predicting numerical value/continuous output (e.g., salary, temperature).

12
Q

What is a DummyClassifier?

A

A simple baseline classifier that predicts using naive strategies and ignores the input features.

  • Does not generate any insight about the data
  • Serves as a baseline against which more complex classifiers/regressors are compared; naive strategies include predicting the mode (classification) or the mean (regression)
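
The baseline idea can be sketched with the standard library alone. scikit-learn ships a `DummyClassifier` in `sklearn.dummy`; the code below does not call it and instead re-implements a "most frequent label" strategy. The class name `MostFrequentBaseline` and the toy data are hypothetical.

```python
from collections import Counter


class MostFrequentBaseline:
    """Hypothetical baseline: always predicts the most common training label."""

    def fit(self, X, y):
        # The features X are ignored on purpose; only label counts matter.
        self.prediction_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.prediction_ for _ in X]


# Toy data for illustration: "pass" appears 3 times, "fail" twice.
X_train = [[1], [2], [3], [4], [5]]
y_train = ["pass", "pass", "fail", "pass", "fail"]

baseline = MostFrequentBaseline().fit(X_train, y_train)
predictions = baseline.predict([[10], [20]])
baseline_accuracy = sum(
    p == t for p, t in zip(baseline.predict(X_train), y_train)
) / len(y_train)
```

A more complex model is only worth keeping if it clearly beats the baseline score on the same data.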
13
Q

What types of data are used in Machine Learning?

A
  • Images: numeric arrays (e.g. RGB values for each pixel)
  • Text: needs preprocessing (e.g. tokenization); words and letters must be converted into a numeric format that computers can process
14
Q

What is the purpose of Scaling in preprocessing?

A

To bring numerical input attributes that have different scales onto a comparable scale.

15
Q

What is the Standard Scaler?

A

Transforms data to have a mean of 0 and a standard deviation of 1.

  • Good for roughly normal (non-skewed) distributions
  • Common method in data normalization
  • z = (x - mean) / s.d.
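
A minimal sketch of the z-score formula for a single numeric column, using only the standard library. The function name `standard_scale` and the sample values are assumptions; the sketch uses the population standard deviation (dividing by n), which is also the convention scikit-learn's `StandardScaler` follows.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (divides by n)
    if std == 0:
        raise ValueError("cannot scale a constant column")
    return [(x - mean) / std for x in values]


# Illustrative column with mean 5 and standard deviation 2.
z = standard_scale([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

After scaling, the column has mean 0 and standard deviation 1, so a value of 9 becomes a z-score of 2 (two standard deviations above the mean).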
16
Q

What is the Robust Scaler?

A

Uses the median and interquartile range (IQR) so that outliers have little influence on the scaling.

  • Better for skewed data
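
A hedged sketch of median/IQR scaling for one column. The function name `robust_scale` and the data are assumptions. Quartile conventions differ between libraries, so results on small samples may not match another implementation exactly.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale this column")
    med = median(values)
    return [(x - med) / iqr for x in values]


# 100.0 is an outlier; it barely affects the median (3.0) or the IQR (2.0).
scaled = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
```

The four typical values end up between -1 and 0.5, while the outlier stays far away instead of compressing everything else, which is what happens when the mean and standard deviation are used.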

17
Q

What is the MinMax Scaler?

A

Rescales data to a specified range, usually [0, 1].
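
The rescaling can be sketched as a linear map from the observed minimum and maximum to the target range. The function name `min_max_scale` and its `feature_range` parameter are assumptions chosen for this example.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min -> range low and max -> range high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant column")
    scale = (high - low) / (v_max - v_min)
    return [low + (x - v_min) * scale for x in values]


unit = min_max_scale([10.0, 15.0, 20.0])
symmetric = min_max_scale([10.0, 15.0, 20.0], feature_range=(-1.0, 1.0))
```

Because the minimum and maximum define the mapping, a single extreme value can squeeze all other values into a narrow band.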

18
Q

What does Normalizer do?

A

Rescales each row (sample) to unit norm.

  • Each row of data is rescaled so its norm becomes 1
  • Used only when direction matters, not magnitude
  • Helpful for histograms
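
A small sketch of row-wise normalization, assuming the Euclidean (L2) norm. The function name `normalize_rows` is hypothetical, and leaving all-zero rows unchanged is a design choice of this sketch.

```python
import math


def normalize_rows(rows):
    """Rescale each row to unit L2 (Euclidean) norm."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # Leave all-zero rows unchanged to avoid dividing by zero.
        result.append([x / norm for x in row] if norm else list(row))
    return result


# The first two rows point in the same direction but differ in length.
unit_rows = normalize_rows([[3.0, 4.0], [6.0, 8.0], [0.0, 0.0]])
```

The first two rows become identical after normalization, which shows why this is useful only when direction, not magnitude, carries the information.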
19
Q

What are Univariate Transformations?

A
  • Transformations such as log, power, and Box-Cox that make data more Gaussian.
  • For power transformations such as Box-Cox, the parameters are estimated automatically so that skewness is minimized and variance is stabilized
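
To illustrate the effect, the sketch below applies a fixed log transform to a right-skewed column and compares skewness before and after. It does not implement Box-Cox or any automatic parameter estimation. The `skewness` helper and the income figures are assumptions made up for this example.

```python
import math
from statistics import fmean, pstdev


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for symmetric data)."""
    mean, std = fmean(values), pstdev(values)
    return fmean([((x - mean) / std) ** 3 for x in values])


# Made-up, right-skewed data: a few very large values stretch the tail.
incomes = [20.0, 22.0, 25.0, 27.0, 30.0, 35.0, 40.0, 60.0, 120.0, 400.0]
log_incomes = [math.log(x) for x in incomes]

skew_before = skewness(incomes)
skew_after = skewness(log_incomes)
```

The log transform pulls in the long right tail, so `skew_after` is smaller than `skew_before`. A plain log requires strictly positive values.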
20
Q

What is Binning?

A
  • Grouping numeric values into intervals (bins) to simplify a feature.
  • Helps simple models with few parameters (e.g. linear regression); adds little for flexible models (e.g. decision trees)
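
A hedged sketch of equal-width binning, one of several possible binning schemes. The function name `equal_width_bins` and the age values are assumptions for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [0 for _ in values]
    width = (v_max - v_min) / n_bins
    # The maximum would land in bin n_bins, so clamp it into the last bin.
    return [min(int((x - v_min) / width), n_bins - 1) for x in values]


# Range 18..66 split into three bins of width 16: [18, 34), [34, 50), [50, 66].
ages = [18, 22, 25, 31, 40, 47, 58, 66]
age_bins = equal_width_bins(ages, n_bins=3)
```

Each bin index can then be used as a categorical feature, for example after one-hot encoding.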
21
Q

What is the purpose of Train/Test Split?

A

Estimates how well a model generalizes by evaluating it on held-out data that was not used for training, which reveals overfitting.
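
A minimal holdout split using only the standard library. The function name `holdout_split`, the 25% test fraction, and the fixed seed are assumptions of this sketch.

```python
import random


def holdout_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the sample indices, then hold out a fraction of them for testing."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: the label is the parity of the single feature.
X = [[i] for i in range(8)]
y = [i % 2 for i in range(8)]
X_train, X_test, y_train, y_test = holdout_split(X, y)
```

The model is fitted on `X_train`/`y_train` only; a large gap between training and test scores is the usual sign of overfitting.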

22
Q

What is Cross-Validation?

A

Estimating a model’s ability to predict new data by repeatedly training and testing on different splits of the data.

  • Helps detect overfitting and selection bias
23
Q

What is K-Fold Cross-Validation?

A

Splitting the data into K parts (folds); each fold is used once for testing while the remaining K-1 folds are used for training.

24
Q

What is Leave-One-Out Cross-Validation?

A

An extreme version of K-Fold where K equals the number of observations, so each observation is used as the test set exactly once.
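
The K-Fold and Leave-One-Out cards can be sketched with one index generator. The function name `k_fold_indices` is hypothetical. This version uses contiguous folds without shuffling, which is a simplification; in practice the data is often shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(6, k=3))          # 3-fold split of 6 samples
leave_one_out = list(k_fold_indices(4, k=4))  # k == n_samples: leave-one-out
```

Every sample appears in exactly one test fold. The final score is usually the average of the per-fold scores.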

25
Q

What are ML Pipelines?

A

Workflows that execute a sequence of Machine Learning tasks in order.
26
Q

What are typical steps in an ML Pipeline?

A
  1. Data normalization (scaling)
  2. Imputation of missing values
  3. Dimensionality reduction
  4. Classification
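
A compact sketch of how such steps can be chained, using only the standard library. All class names (`MeanImputer`, `Standardizer`, `NearestCentroid`, `Pipeline`) and the data are hypothetical. The sketch imputes before scaling so that the scaler never sees missing values, and it leaves out dimensionality reduction for brevity; both are assumptions of this example.

```python
from statistics import fmean, pstdev


class MeanImputer:
    """Replace missing values (None) with the column mean seen during fit."""

    def fit(self, rows):
        self.means_ = [fmean([v for v in col if v is not None]) for col in zip(*rows)]
        return self

    def transform(self, rows):
        return [[m if v is None else v for v, m in zip(row, self.means_)] for row in rows]


class Standardizer:
    """Scale each column to mean 0 and standard deviation 1."""

    def fit(self, rows):
        columns = [list(col) for col in zip(*rows)]
        self.means_ = [fmean(col) for col in columns]
        self.stds_ = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(row, self.means_, self.stds_)] for row in rows]


class NearestCentroid:
    """Predict the class whose mean feature vector is closest."""

    def fit(self, rows, labels):
        self.centroids_ = {}
        for label in set(labels):
            members = [r for r, l in zip(rows, labels) if l == label]
            self.centroids_[label] = [fmean(col) for col in zip(*members)]
        return self

    def predict(self, rows):
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
                for row in rows]


class Pipeline:
    """Run the transformers in order, then the final classifier."""

    def __init__(self, transformers, classifier):
        self.transformers = transformers
        self.classifier = classifier

    def fit(self, rows, labels):
        for step in self.transformers:
            rows = step.fit(rows).transform(rows)
        self.classifier.fit(rows, labels)
        return self

    def predict(self, rows):
        for step in self.transformers:
            rows = step.transform(rows)  # reuse statistics learned during fit
        return self.classifier.predict(rows)


X_train = [[1.0, 200.0], [2.0, None], [1.5, 220.0],
           [8.0, 900.0], [9.0, 950.0], [None, 1000.0]]
y_train = ["small", "small", "small", "large", "large", "large"]

model = Pipeline([MeanImputer(), Standardizer()], NearestCentroid()).fit(X_train, y_train)
predicted = model.predict([[1.2, 210.0], [8.5, None]])
```

The point of the pipeline object is that the statistics learned on the training data (column means, standard deviations) are reused unchanged at prediction time, so test data never influences preprocessing.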
27
Q

Give an example of an instance, label / class, features / attributes, feature values, and a feature vector.

A
  • Instance: "Pikachu"
  • Label / class: "mouse"
  • Features / attributes: abilities, weight, legendary
  • Feature values: "lightning rod", "2", "yes"
  • Feature vector: ("lightning rod", "2", "yes")