L1 Flashcards
(27 cards)
What is Machine Learning?
Learning from data to make predictions or extract knowledge.
Why is Machine Learning important?
- Massive growth of unstructured data: texts, audio, images, video.
- Knowledge discovery from abundant data
What is Supervised Learning?
Learning with input + labeled output.
- Classification (discrete output)
- Regression (continuous output)
What is Unsupervised Learning?
Learning with input only to find patterns or groups.
- e.g. clustering news topics; discovering trending topics on Twitter or in the news
- Outlier detection (e.g. fraud detection and security systems)
What is Reinforcement Learning?
Learning policies via interaction & reward.
- Reasoning under uncertainty to make optimal decisions
- How agents should take actions in an environment to maximize some reward
- Other types of learning: semi-supervised learning, active learning
What is a Model in Machine Learning?
Mathematical function linking input to output.
What are Score functions?
Measures of how well a model fits data.
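A minimal sketch of two common score functions, written by hand for illustration. The function names and toy values are assumptions, not taken from the cards: accuracy for classification and mean squared error for regression.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predictions, for illustration only.
acc = accuracy(["pass", "fail", "pass", "pass"], ["pass", "fail", "fail", "pass"])
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

print(acc)  # 3 of 4 labels match -> 0.75
print(mse)  # (1 + 0 + 4) / 3
```

Higher accuracy is better, while lower mean squared error is better, so the direction of "good" depends on the score chosen.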
What is Feature Selection?
Picking important features for a model.
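One possible way to pick features, sketched with scikit-learn's `SelectKBest` and the ANOVA F-test (`f_classif`). The toy data is made up so that only the first column separates the two classes. Other selection strategies exist, and this is just one illustration.

```python
from sklearn.feature_selection import SelectKBest, f_classif

# Toy data (made up): column 0 separates the classes, column 1 does not.
X = [[1, 5], [2, 3], [9, 4], [10, 4]]
y = [0, 0, 1, 1]

# Keep the single feature with the highest F-score.
selector = SelectKBest(score_func=f_classif, k=1)
X_selected = selector.fit_transform(X, y)
kept = selector.get_support()  # boolean mask over the original columns

print(kept)
print(X_selected)
```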
What is Feature Extraction?
Transforming data into new features through mathematical operations.
What is Classification in Supervised Learning?
Predicting category/discrete output (e.g., pass/fail).
What is Regression in Supervised Learning?
Predicting numerical value/continuous output (e.g., salary, temperature).
What is a DummyClassifier?
Classifies data using simple rules that ignore the input features.
- Does not generate any insight about the data
- Serves as a baseline: more complex classifiers/regressors are compared against its naive strategies (e.g. mean, mode)
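A short sketch of a baseline with scikit-learn's `DummyClassifier`, assuming the `most_frequent` strategy (always predict the mode of the training labels). The data is made up for illustration.

```python
from sklearn.dummy import DummyClassifier

# Made-up data: the dummy model ignores the feature values entirely.
X = [[0], [1], [2], [3], [4]]
y = ["pass", "pass", "pass", "fail", "fail"]

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)

preds = baseline.predict(X)    # always the most frequent label, "pass"
score = baseline.score(X, y)   # accuracy: 3 of 5 labels are "pass"

print(preds, score)
```

A real classifier that cannot beat this accuracy has learned little beyond the class frequencies.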
What types of data are used in Machine Learning?
- Images - Numeric arrays (RGB values for each pixel)
- Text - Needs preprocessing (e.g., tokenization); words/letters must be converted into a format computers can understand
What is the purpose of Scaling in preprocessing?
To standardize input numerical attributes with different scales.
What is the Standard Scaler?
Transforms data to have a mean of 0 and a standard deviation of 1.
- Good for normally distributed data
- Common method in data normalization (good for non-skewed data)
- z = (x - mean) / s.d.
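The z-score can be worked by hand: subtract the mean first, then divide by the standard deviation. This sketch uses only the standard library and the population standard deviation, which is also what scikit-learn's `StandardScaler` uses. The values are made up.

```python
from statistics import mean, pstdev

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

mu = mean(values)        # 5.0
sigma = pstdev(values)   # population standard deviation, 2.0

# Parentheses matter: the mean is subtracted before dividing.
z = [(x - mu) / sigma for x in values]

print(z)
```

After the transformation the values have mean 0 and standard deviation 1.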
What is the Robust Scaler?
Uses median and IQR to handle outliers.
- better for skewed data
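A small sketch with scikit-learn's `RobustScaler` on made-up data containing one outlier. With default settings it subtracts the median and divides by the interquartile range (IQR).

```python
from sklearn.preprocessing import RobustScaler

X = [[1.0], [2.0], [3.0], [4.0], [100.0]]  # 100 is an outlier

scaled = RobustScaler().fit_transform(X)

# Here median = 3 and IQR = 4 - 2 = 2, so each value becomes (x - 3) / 2.
print(scaled.ravel())
```

The outlier stays far from the rest, but it does not distort the center and spread used to scale the other values.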
What is the MinMax Scaler?
Rescales data to a specified range, usually [0, 1].
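The rescaling can be sketched by hand as follows. The helper name `min_max_scale` and the sample values are hypothetical, and the sketch assumes the values are not all identical (otherwise the denominator is zero).

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    span = hi - lo  # assumed non-zero in this sketch
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


scaled = min_max_scale([10, 20, 30, 50])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```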
What does Normalizer do?
Normalizes row-wise vector norms.
- Used only when the direction of a row vector matters, not its magnitude
- Each row of data is rescaled so its norm becomes 1
- Helpful for histograms
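Row-wise normalization can be sketched by hand, assuming the Euclidean (L2) norm, which is the default of scikit-learn's `Normalizer`. The helper name and the rows are made up, and all-zero rows are not handled.

```python
import math


def l2_normalize_rows(rows):
    """Divide each row by its Euclidean (L2) norm so the row has length 1."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        normalized.append([v / norm for v in row])  # assumes no all-zero rows
    return normalized


unit_rows = l2_normalize_rows([[3.0, 4.0], [1.0, 0.0], [0.0, 5.0]])
print(unit_rows)
```

Unlike the other scalers, this works on each sample (row) independently rather than on each feature (column).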
What are Univariate Transformations?
- Transformations such as log, power, and Box-Cox that make data more Gaussian.
- Parameters are estimated automatically so that skewness is minimized and variance is stabilized
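A sketch of an automatically fitted Box-Cox transformation using scikit-learn's `PowerTransformer`. The log-normal sample is synthetic, and the `skewness` helper is a hand-written illustration. Box-Cox requires strictly positive inputs.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))  # right-skewed, positive

# The Box-Cox parameter (lambda) is estimated from the data during fit.
pt = PowerTransformer(method="box-cox")
transformed = pt.fit_transform(skewed)


def skewness(a):
    """Third standardized moment; close to 0 for symmetric data."""
    a = a.ravel()
    return float(np.mean((a - a.mean()) ** 3) / a.std() ** 3)


print(pt.lambdas_)
print(skewness(skewed), skewness(transformed))
```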
What is Binning?
- Grouping numeric values into intervals for simplification.
- Good for models with few parameters (e.g. linear regression); not effective for models with many parameters (e.g. decision trees)
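Binning can be sketched with the standard library. The bin edges and labels are hypothetical, chosen only for illustration, and each value is placed in the interval whose lower edge it has reached.

```python
from bisect import bisect_right

edges = [18, 35, 65]                                  # hypothetical bin edges
labels = ["child", "young adult", "adult", "senior"]  # one more label than edges


def bin_value(x):
    """Return the label of the interval that x falls into."""
    return labels[bisect_right(edges, x)]


ages = [5, 18, 34, 35, 70]
binned = [bin_value(a) for a in ages]
print(binned)  # ['child', 'young adult', 'young adult', 'adult', 'senior']
```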
What is the purpose of Train/Test Split?
Detects overfitting by evaluating model performance on data held out from training.
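A minimal sketch of a split with scikit-learn's `train_test_split`, assuming a 70/30 split and a fixed `random_state` for reproducibility. The data is made up.

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]    # made-up features
y = [i % 2 for i in range(10)]  # made-up labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The model is fitted on the training part only and scored on the test part.
print(len(X_train), len(X_test))  # 7 3
```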
What is Cross-Validation?
Evaluating a model’s ability to predict new data by repeatedly training and testing on different subsets of the data.
- Detect overfitting / selection bias
What is K-Fold Cross-Validation?
Splitting data into K parts (folds); each fold is used once for testing while the remaining K-1 folds are used for training.
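The fold logic can be sketched by hand. This is an illustrative helper (hypothetical name `k_fold_indices`) that does not shuffle. Each sample lands in the test fold exactly once, and the remaining samples form the training set for that round.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```

The K test scores are typically averaged to give one overall estimate.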
What is Leave-One-Out Cross-Validation?
An extreme version of K-Fold in which K equals the number of observations, so each observation is used as the test set exactly once.
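A hand-written sketch of leave-one-out evaluation. The "model" here is a deliberately naive one that predicts the mean of the training values; it and the data are made up for illustration.

```python
def leave_one_out(data):
    """Yield (training_values, held_out_value) with one observation held out each time."""
    for i in range(len(data)):
        yield data[:i] + data[i + 1:], data[i]


values = [2.0, 4.0, 6.0, 8.0]  # made-up observations

errors = []
for train, held_out in leave_one_out(values):
    prediction = sum(train) / len(train)  # naive model: predict the training mean
    errors.append((held_out - prediction) ** 2)

loo_mse = sum(errors) / len(errors)
print(len(errors), loo_mse)
```

With n observations the model is fitted n times, which makes the procedure expensive on large datasets.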