L1 Flashcards by jolyn Unknown

What is Machine Learning?

Learning from data to make predictions or extract knowledge.

Why is Machine Learning important?

Massive growth of unstructured data: texts, audio, images, video.
Knowledge discovery from abundant data.

What is Supervised Learning?

Learning with input + labeled output.

Classification (discrete output)
Regression (continuous output)

What is Unsupervised Learning?

Learning with input only to find patterns or groups.

e.g. clustering news topics; discovering trending topics on Twitter or in the news; outlier detection (e.g. fraud detection and security systems).

What is Reinforcement Learning?

Learning policies via interaction & reward.

Reasoning under uncertainty to make optimal decisions.
How agents should take actions in an environment to maximize some reward.
Other Types: Semi-supervised, Active Learning

What is a Model in Machine Learning?

Mathematical function linking input to output.

What are Score functions?

Measures of how well a model fits data.

What is Feature Selection?

Picking important features for a model.

What is Feature Extraction?

Transforming data into new features through mathematical operations.

What is Classification in Supervised Learning?

Predicting category/discrete output (e.g., pass/fail).

What is Regression in Supervised Learning?

Predicting numerical value/continuous output (e.g., salary, temperature).

What is a DummyClassifier?

Classifies data using simple strategies without generating insights.

Does not generate any insight about the data.
Serves as a baseline to compare against more complex classifiers/regressors, using naive strategies (mode for classification, mean for regression).
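
A minimal sketch of the baseline idea, written in plain Python rather than with any particular library. The function names `most_frequent_baseline` and `mean_baseline` and the toy labels are made up for illustration.

```python
from collections import Counter
from statistics import mean


def most_frequent_baseline(train_labels):
    """Return a classifier that always predicts the most common training label."""
    mode_label = Counter(train_labels).most_common(1)[0][0]
    # The inputs are ignored on purpose: a dummy model learns nothing from features.
    return lambda inputs: [mode_label for _ in inputs]


def mean_baseline(train_targets):
    """Return a regressor that always predicts the mean training target."""
    mean_target = mean(train_targets)
    return lambda inputs: [mean_target for _ in inputs]


# Hypothetical toy data: three "pass" labels and one "fail".
labels = ["pass", "pass", "fail", "pass"]
predict = most_frequent_baseline(labels)
predictions = predict([[1], [2], [3], [4]])
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(predictions, accuracy)
```

A more complex model is only worth keeping if it clearly beats this kind of baseline score.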

What types of data are used in Machine Learning?

Images - Numeric arrays (RGB values for each pixel)
Text - Needs preprocessing (e.g., tokenization); words/letters need to be converted into a format computers can understand

What is the purpose of Scaling in preprocessing?

To standardize input numerical attributes with different scales.

What is the Standard Scaler?

Transforms data to have a mean of 0 and a standard deviation of 1.

Common method in data normalization (good for normally distributed, non-skewed data).
z = (x - mean) / s.d.
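
The z-score formula can be sketched in a few lines of plain Python. This is an illustration only: `standard_scale` is a hypothetical helper, and it assumes the population standard deviation is the one intended.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Apply z = (x - mean) / s.d. to every value in a single feature column."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption of this sketch)
    if sigma == 0:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


scaled = standard_scale([2.0, 4.0, 6.0, 8.0])
print(scaled)
print(mean(scaled), pstdev(scaled))  # approximately 0 and 1
```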

What is the Robust Scaler?

Uses median and IQR to handle outliers.
- better for skewed data
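
A rough sketch of median/IQR scaling in plain Python. `robust_scale` is a hypothetical helper, and the quartile method ("inclusive" linear interpolation) is an assumption; other quartile conventions give slightly different numbers.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    center = median(values)
    return [(x - center) / iqr for x in values]


# The outlier 100.0 barely influences the median and IQR,
# so the other values keep a sensible spread after scaling.
robust = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
print(robust)
```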

What is the MinMax Scaler?

Rescales data to a specified range, usually [0, 1].
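
As an illustration, min-max rescaling of one feature column might look like this. `min_max_scale` and its `feature_range` parameter are names chosen for this sketch, and it assumes the column is not constant.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map the smallest value to new_lo and the largest to new_hi."""
    lo, hi = min(values), max(values)
    new_lo, new_hi = feature_range
    # Assumes hi > lo; a constant column would need special handling.
    return [new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo) for x in values]


unit_range = min_max_scale([10.0, 20.0, 30.0])
symmetric_range = min_max_scale([10.0, 20.0, 30.0], feature_range=(-1.0, 1.0))
print(unit_range, symmetric_range)
```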

What does Normalizer do?

Normalizes row-wise vector norms.

Used only when the direction of a vector matters, not its length.
Each row of data is rescaled so its norm becomes 1.
Helpful for histograms.
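
A small sketch of row-wise normalization, assuming the L2 (Euclidean) norm; `normalize_rows` is a hypothetical helper. Two rows pointing in the same direction end up identical.

```python
import math


def normalize_rows(rows):
    """Rescale each row so that its L2 norm is 1 (all-zero rows are left as zeros)."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm if norm else 0.0 for x in row])
    return normalized


# [6, 8] is just [3, 4] scaled by 2, so both rows share one direction.
unit_rows = normalize_rows([[3.0, 4.0], [6.0, 8.0]])
print(unit_rows)
```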

What are Univariate Transformations?

Transformations such as log, power, and Box-Cox that make data more Gaussian.
Parameters are estimated automatically so that skewness is minimized and variance is stabilized.
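
To illustrate the effect, the sketch below applies a fixed log transform (not Box-Cox, whose parameter would be estimated from the data) and compares skewness before and after. The `skewness` helper and the toy values are assumptions of this example.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Third standardized moment: about 0 for symmetric data, positive for a long right tail."""
    mu, sigma = mean(values), pstdev(values)
    return mean([((x - mu) / sigma) ** 3 for x in values])


raw = [1.0, 10.0, 100.0, 1000.0, 10000.0]  # heavily right-skewed
logged = [math.log10(x) for x in raw]      # evenly spaced after the transform

print(skewness(raw), skewness(logged))
```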

What is Binning?

Grouping numeric values into intervals for simplification.
Good for models with few parameters (e.g. linear regression); not effective for models with many parameters (e.g. decision trees).
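
One simple way to bin is with equal-width intervals, sketched below. `equal_width_bins` is a hypothetical helper; other strategies (for example quantile-based bins) are equally valid.

```python
def equal_width_bins(values, n_bins):
    """Assign each value the index of the equal-width interval it falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in values]


bin_ids = equal_width_bins([0.0, 5.0, 10.0, 15.0, 20.0], n_bins=4)
print(bin_ids)
```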

What is the purpose of Train/Test Split?

Helps detect overfitting by evaluating model performance on held-out data that was not used for training.
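
A minimal sketch of a shuffled split in plain Python. `split_train_test`, the 25% test fraction, and the fixed seed are all choices made for this illustration.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_rows, test_rows = split_train_test(list(range(8)))
print(len(train_rows), len(test_rows))
```

The model is fitted on `train_rows` only and scored on `test_rows`, so no test row influences training.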

What is Cross-Validation?

Evaluating a model’s ability to predict new data.

Detect overfitting / selection bias

What is K-Fold Cross-Validation?

Splitting data into K parts (folds); each fold is used once for testing while the remaining K-1 folds are used for training.
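
The fold bookkeeping can be sketched as an index generator. `k_fold_indices` is a hypothetical helper and it does not shuffle, which real workflows often do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```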

What is Leave-One-Out Cross-Validation?

An extreme version of K-Fold where K equals the number of observations, so each observation is used as the test set exactly once.
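
As a sketch, leave-one-out can be run against a trivial "predict the mean" model. The model choice, the `leave_one_out_mae` name, and the toy targets are assumptions made for illustration.

```python
from statistics import mean


def leave_one_out_mae(targets):
    """Hold out each observation once, predict it with the mean of the rest, and average the absolute errors."""
    errors = []
    for i, held_out in enumerate(targets):
        training = targets[:i] + targets[i + 1:]  # everything except observation i
        prediction = mean(training)
        errors.append(abs(prediction - held_out))
    return mean(errors)


mae = leave_one_out_mae([1.0, 2.0, 3.0, 6.0])
print(mae)
```

With N observations the model is fitted N times, which is why this scheme gets expensive on large datasets.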

What are ML Pipelines?

Workflows to execute a sequence of tasks in Machine Learning.

Steps for ML Pipelines

1. Data normalization (scaling)
2. Imputation of missing values
3. Dimensionality reduction
4. Classification
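
The idea of chaining steps can be sketched with plain functions applied in order. This toy version works on a single column, omits dimensionality reduction, and uses a threshold rule in place of a trained classifier; every name in it is hypothetical.

```python
from statistics import mean


def scale(column):
    """Min-max scale to [0, 1], leaving missing values (None) untouched."""
    present = [x for x in column if x is not None]
    lo, hi = min(present), max(present)
    return [None if x is None else (x - lo) / (hi - lo) for x in column]


def impute(column):
    """Replace missing values with the mean of the observed values."""
    fill = mean([x for x in column if x is not None])
    return [fill if x is None else x for x in column]


def classify(column, threshold=0.5):
    """Stand-in for a real classifier: label values at or above the threshold as 'high'."""
    return ["high" if x >= threshold else "low" for x in column]


def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


pipeline_labels = run_pipeline([10.0, None, 30.0, 50.0], [scale, impute, classify])
print(pipeline_labels)
```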

Give an example of instance, label / class, features / attributes, feature values, feature vector

instance -- "Pikachu"
label / class -- "mouse"
features / attributes -- abilities, weight, legendary
feature values -- "lightning rod", "2", "yes"
feature vector -- ("lightning rod", "2", "yes")
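
The same terminology can be mirrored in a small data structure. The `Instance` class below is a hypothetical illustration; the values are the ones from the card.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str       # the instance itself
    features: dict  # feature / attribute name -> feature value
    label: str      # the label / class

    def feature_vector(self, feature_names):
        """Return the feature values in a fixed order, as a tuple."""
        return tuple(self.features[name] for name in feature_names)


pikachu = Instance(
    name="Pikachu",
    features={"abilities": "lightning rod", "weight": "2", "legendary": "yes"},
    label="mouse",
)
vector = pikachu.feature_vector(["abilities", "weight", "legendary"])
print(vector)
```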

L1 Flashcards

(27 cards)