Domain 3: Modeling Flashcards
This is an application-agnostic standard that can be used as a baseline for understanding the phases of the ML workflow.
The Cross Industry Standard Process for Data Mining (CRISP-DM)
ML lifecycle phases
- Identify Problem
- Collect and QC
- Prepare
- Visualization
- Feature Engineering
- Model Training
- Model Evaluation
- Business Workflow Integration
T/F: Business problem identification requires senior leadership buy-in.
True
What is the goal of ML?
To predict the value or class of an unknown quantity using a mathematical model.
Data that the model can use to “learn” from, which consists of independent variables and a dependent variable.
Training data
What makes a good model?
Should be able to generalize what it has learned to unseen data, namely data where the dependent variable is unknown.
What makes a poor model?
One that has simply memorized the training data. It generalizes poorly to unseen data and therefore is not usable in a business process.
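A minimal sketch of the difference between memorizing and generalizing, using made-up data where the label is exactly 2x + 1. The helper names (`fit_memorizer`, `fit_line`, `mse`) and the data are illustrative assumptions, not part of any library.

```python
def fit_memorizer(xs, ys):
    """'Model' that stores the training pairs and looks them up.

    For an unseen input it falls back to the mean training label.
    """
    table = dict(zip(xs, ys))
    fallback = sum(ys) / len(ys)
    return lambda x: table.get(x, fallback)


def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b for a single feature."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    w = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b = y_bar - w * x_bar
    return lambda x: w * x + b


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the label is exactly 2x + 1.
train_x = [0, 1, 2, 3, 4]
train_y = [2 * x + 1 for x in train_x]
test_x = [5, 6, 7]  # unseen inputs
test_y = [2 * x + 1 for x in test_x]

memorizer = fit_memorizer(train_x, train_y)
line = fit_line(train_x, train_y)

print("memorizer train/test MSE:", mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y))
print("line      train/test MSE:", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

Both models are perfect on the training data, but only the fitted line keeps its error low on the unseen inputs.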
When a model is shown labeled examples of ground truth values and learns to predict the label based on the input data or features.
Supervised learning
When you do not have labeled data available and you want the model to discover patterns in the unlabeled data.
Unsupervised learning
When a model or agent learns by interacting with its environment - similar to trial-and-error learning, where an agent is given rewards and penalties for actions taken and its aim is to maximize the long-term rewards.
Reinforcement learning
T/F: The data type (whether it is structured or unstructured) does not dictate whether learning is supervised.
True
A type of supervised learning where the label is binary, such as fraud/not fraud, cat/dog, spam/not spam
Binary classification
A type of supervised learning where the label can have more than two classes
Multiclass classification
A type of supervised learning where the label is a continuous number such as a house price
Regression
A form of supervised machine learning where a model assumes a linear relationship between the input features and the labels.
Linear models
Used when you have a continuous label (regression task), where the assumption is made that the label is linearly related to the data.
Linear regression
The idea that the label is a linear combination of the input features.
Linearity
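Written out, the linearity assumption says the predicted label is a weighted sum of the features plus an intercept. The symbols below (weights w, features x, n features) are notation assumed here for illustration.

```latex
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n = w_0 + \sum_{j=1}^{n} w_j x_j
```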
Three assumptions that need to be tested before a linear model can be accurately fit to the data.
Linearity, constant variance, and no strong correlation between features.
This is where one feature can be linearly derived from another; in the most trivial example, the two are related by a constant factor.
Multicollinearity
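One simple way to spot the trivial case is to check the Pearson correlation between two features. The sketch below uses made-up house areas, where one feature is just a unit conversion of the other; the `pearson` helper and the data are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical features: the same area in square feet and square meters,
# so one is the other multiplied by a constant.
area_sqft = [850.0, 1200.0, 1500.0, 2100.0]
area_sqm = [a * 0.092903 for a in area_sqft]

r = pearson(area_sqft, area_sqm)
print("correlation:", r)  # very close to 1, so one of the two features is redundant
```

Pairwise correlation only catches relationships between two features at a time; a feature that is a linear combination of several others needs a different check.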
What is often used in machine learning to penalize a model for learning weights that do not generalize well to unseen data, reducing overall model complexity and helping to prevent overfitting?
Regularization
This adds an L2 (quadratic) penalty on the weights, which tends to reduce the values of weights that are unimportant in predicting the labels.
Ridge
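A minimal sketch of the L2 effect, assuming the simplest possible setting: one feature, no intercept, and the objective 1/2 * sum((y - w*x)^2) + (lam/2) * w^2. Under those assumptions the best weight has the closed form shown in the code. The function name and data are illustrative.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge weight for a single feature with no intercept.

    Minimizes 0.5 * sum((y - w*x)**2) + 0.5 * lam * w**2.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical data where the unpenalized weight is exactly 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 1.0, 14.0, 100.0):
    print(lam, ridge_weight(xs, ys, lam))
```

As the penalty strength grows, the weight moves toward zero but never reaches it exactly.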
This adds an L1 (absolute value) penalty on the weights, which tends to shrink the weights of unimportant features all the way to zero, effectively eliminating those features.
Lasso
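A minimal sketch of why the L1 penalty can set a weight to exactly zero, assuming one feature, no intercept, and the objective 1/2 * sum((y - w*x)^2) + lam * |w|. In that setting the solution is a soft-thresholding step. The function names and data are illustrative assumptions.

```python
def soft_threshold(value, lam):
    """Move value toward zero by lam; return 0.0 if it is within lam of zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_weight(xs, ys, lam):
    """Lasso weight for a single feature with no intercept.

    Minimizes 0.5 * sum((y - w*x)**2) + lam * abs(w).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx


xs = [1.0, 2.0, 3.0]
strong_ys = [2.0, 4.0, 6.0]    # label clearly depends on the feature
weak_ys = [0.1, -0.1, 0.05]    # label is almost unrelated to the feature

print("strong:", lasso_weight(xs, strong_ys, 1.0))  # slightly below 2
print("weak:  ", lasso_weight(xs, weak_ys, 1.0))    # exactly 0.0
```

The weakly related feature is dropped entirely, which is the feature-elimination behavior described on the card.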
This combines ridge and lasso regularization.
Elastic net
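One common way to write the elastic net objective is a squared-error loss plus both penalties. The symbols (m examples, weights w, penalty strengths λ₁ and λ₂) and the scaling constants are assumptions made here; libraries differ in how they scale and parameterize the two terms.

```latex
\min_{w} \; \frac{1}{2} \sum_{i=1}^{m} \left( y_i - w^\top x_i \right)^2
  + \lambda_1 \sum_{j} |w_j|
  + \frac{\lambda_2}{2} \sum_{j} w_j^2
```

Setting λ₁ = 0 leaves only the L2 term (ridge), and setting λ₂ = 0 leaves only the L1 term (lasso).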
Lasso is an acronym for Least Absolute _____ and Selection Operator.
Shrinkage