Regression and Classification Flashcards
(56 cards)
Accuracy
Correct / Total. Good when the consequences of false positives and false negatives are equal (e.g., manufacturing QC, weather prediction, stock-level prediction, school grades)
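A minimal sketch of the accuracy formula, using hypothetical labels:

```python
# Accuracy = correct predictions / total predictions.
# The label lists below are made up for illustration.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)  # 0.75 (6 of 8 correct)
```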
AUC
Area under the ROC curve - represents the probability that a randomly chosen positive example will be ranked higher than a randomly chosen negative example. A coin flip scores 0.5; a perfect model scores 1. A good general way to choose between models on a balanced dataset.
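The ranking interpretation can be computed directly by comparing every positive/negative score pair (toy scores, for illustration only):

```python
# AUC as the probability that a random positive outscores a random negative.
pos_scores = [0.9, 0.8, 0.4]  # model scores for positive examples (hypothetical)
neg_scores = [0.5, 0.3, 0.2]  # model scores for negative examples (hypothetical)

wins = ties = 0
for p in pos_scores:
    for n in neg_scores:
        if p > n:
            wins += 1
        elif p == n:
            ties += 1  # ties count as half a win

auc = (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))
print(auc)  # 8 of 9 pairs ranked correctly ≈ 0.889
```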
Batch Gradient Descent
Gradient descent algorithm that processes the entire dataset to calculate the partial derivatives and update the weight and bias on each iteration
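A minimal sketch of batch gradient descent fitting y = w*x + b on a toy dataset (data and learning rate are illustrative assumptions):

```python
import numpy as np

# Toy data generated from y = 2x + 1 (hypothetical).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    error = (w * x + b) - y
    # Partial derivatives of mean squared error over the FULL batch.
    dw = 2 * np.mean(error * x)
    db = 2 * np.mean(error)
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))  # converges near w ≈ 2, b ≈ 1
```

Mini-batch and stochastic gradient descent differ only in how many examples feed each update.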
Batch Size
The number of examples used in a single learning iteration (before updating weight and bias)
Binary classification
Type of classification that predicts one of two mutually exclusive classes - positive and negative
Classification
Learning models that predict one of a defined number of categories, which can be numeric or non-numeric (e.g., dog, cat, bird)
Class-Imbalanced
A dataset for a classification problem where the number of labels of each class differs significantly
Common causes of data quality and reliability issues
Omitted, duplicate, value errors, label errors, bad sections
Confusion matrix
Used to summarize the performance of a classification algorithm. Typically a 2x2 table of actual vs. predicted, with actual classes as rows and predicted classes as columns
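A small sketch of building the 2x2 matrix by counting, with hypothetical labels:

```python
# Rows = actual class, columns = predicted class (0 = negative, 1 = positive).
actual    = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = [[0, 0], [0, 0]]  # matrix[actual][predicted]
for a, p in zip(actual, predicted):
    matrix[a][p] += 1

tn, fp = matrix[0]  # actual negative row
fn, tp = matrix[1]  # actual positive row
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=3 FP=1 FN=1 TN=3
```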
Cost function
The aggregate of the loss function across all training examples. May include some sort of penalty for model complexity (regularization).
Decision boundary
A surface or line that separates different classes predicted by a classification algorithm. It marks the boundary of one class vs. another.
Early stopping
Regularization method that stops training before training loss finishes decreasing; specifically, it stops when validation loss starts to increase (i.e., when generalization performance worsens)
Epoch
A full training pass over the entire training set (every example processed once). Iterations per epoch = dataset size / batch size
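The epoch/batch arithmetic as a quick sketch (numbers are hypothetical):

```python
import math

# With 1,000 examples and a batch size of 32, one epoch takes
# ceil(1000 / 32) weight updates (the last batch is partial).
num_examples = 1000
batch_size = 32
iterations_per_epoch = math.ceil(num_examples / batch_size)
print(iterations_per_epoch)  # 32
```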
False positive rate
FP / (FP + TN) - rate that negative is accidentally called a positive. This is the x-axis for ROC curve.
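The formula with hypothetical counts:

```python
# False positive rate: the fraction of actual negatives called positive.
fp, tn = 10, 90  # hypothetical counts from a confusion matrix
fpr = fp / (fp + tn)
print(fpr)  # 0.1
```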
Feature
Individual model input that is an individual property or characteristic typically represented by columns (or “x” in a model formula)
Feature engineering
Using intuition to design new features by transforming or combining original features (using depth and width to define “area”)
Feature scaling
Scaling the range of features to improve gradient descent performance - typically done via mean normalization or z-score normalization. Goal is generally -1 < x < 1
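A sketch of z-score normalization on a made-up feature column:

```python
import numpy as np

# Z-score normalization: (x - mean) / std, centering the feature near 0
# with unit spread. The values below are hypothetical.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
x_scaled = (x - x.mean()) / x.std()
print(x_scaled.mean(), x_scaled.std())  # ≈ 0.0 and 1.0
```

Mean normalization instead divides by the feature's range (max - min).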
Gradient Descent
An optimization algorithm that minimizes a loss function by iteratively adjusting the parameters of a model in the direction opposite the gradient (the partial derivatives, which point in the direction of steepest ascent of the loss function).
Hyperparameters
The variables that you or a hyperparameter tuning service adjust between successive runs of training a model (e.g., learning rate). This is in contrast to parameters, like weight and bias, that the model learns during training.
L1 loss
A loss function used for regression that calculates the absolute difference between actual labels and what a model predicts. L1 loss is less sensitive to outliers than L2 loss.
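A quick sketch of L1 loss on toy values:

```python
import numpy as np

# L1 loss = sum of absolute errors (values are hypothetical).
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.0, 9.0])
l1 = np.abs(actual - predicted).sum()
print(l1)  # 0.5 + 0.0 + 2.0 = 2.5
```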
L1 Regularization
A type of regularization that penalizes weights in proportion to the sum of the absolute value of the weights. L1 regularization helps drive weights of irrelevant or almost irrelevant features to exactly 0 - removing it from the model.
L2 Loss
A loss function used for regression that calculates the square of the difference between actual labels and what a model predicts. L2 loss is more sensitive to outliers than L1 loss.
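The same toy values under L2 loss; note how the squaring amplifies the largest error:

```python
import numpy as np

# L2 loss = sum of squared errors (values are hypothetical).
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.0, 9.0])
l2 = ((actual - predicted) ** 2).sum()
print(l2)  # 0.25 + 0.0 + 4.0 = 4.25
```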
L2 Regularization
A type of regularization that penalizes weights in proportion to the sum of the squares of the weights. L2 regularization helps drive outlier weights closer to zero (but not exactly zero), so weights near zero stay in the model (unlike with L1 regularization). L2 regularization always improves generalization in linear models.
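The two penalty terms side by side, for a hypothetical weight vector and regularization strength lambda:

```python
import numpy as np

# L1 penalty = lambda * sum(|w|); L2 penalty = lambda * sum(w^2).
# weights and lam are made-up values for illustration.
weights = np.array([0.5, -1.5, 2.0])
lam = 0.1

l1_penalty = lam * np.abs(weights).sum()   # 0.1 * 4.0  = 0.4
l2_penalty = lam * (weights ** 2).sum()    # 0.1 * 6.5  = 0.65
print(l1_penalty, l2_penalty)
```

Either penalty is added to the loss, so gradient descent trades prediction error against weight size.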
Label
A label is the target output or value that a model is trained to predict. It represents the answer or ground truth for a single data point in supervised learning. Also known as target.