Classification Flashcards by M&M Sweet

impute/imputation

In statistics, imputation is the process of replacing missing data with substituted values.
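A minimal sketch of mean imputation in plain Python, assuming missing values are stored as `None` (the `ages` column is hypothetical). The mean is only one choice of substitute value. Median or most-frequent values are common alternatives, and scikit-learn's `SimpleImputer` offers these through its `strategy` argument.

```python
from statistics import mean

# Hypothetical column with missing values stored as None
ages = [25, None, 40, 35, None]

# Substitute value: the mean of the observed (non-missing) entries
observed = [a for a in ages if a is not None]
fill_value = mean(observed)

# Replace each missing entry with the substitute value
imputed = [fill_value if a is None else a for a in ages]
print(imputed)
```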

confusion matrix

A cross-tabulation of our model’s predictions against actual values
A matrix (table) used to measure the performance of a machine learning algorithm
Rows: actual classes (Ci)
Columns: predicted classes (Cj)
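A short sketch using scikit-learn's `confusion_matrix` on hypothetical labels, with 1 as the positive class. Passing `labels=[0, 1]` fixes the row and column order, so the layout is `[[TN, FP], [FN, TP]]`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]

# Rows: actual classes; columns: predicted classes, both ordered as [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
# [[2 2]
#  [1 3]]
```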

What are the 4 possible outcomes of a binary classification task?

True Positive
False Positive
False Negative
True Negative
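The four outcomes come from comparing each prediction with its actual value. A plain-Python sketch on hypothetical labels, assuming 1 is the positive class (`outcome_counts` is a hypothetical helper name):

```python
def outcome_counts(y_true, y_pred, positive=1):
    """Count the four outcomes of a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct -> TP, wrong -> FP
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: wrong -> FN, correct -> TN
            counts["FN" if actual == positive else "TN"] += 1
    return counts


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]
print(outcome_counts(y_true, y_pred))  # {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 2}
```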

What is the common choice for the baseline model for a classification problem?

a model that simply predicts the most common class every single time
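A plain-Python sketch of the most-common-class baseline on hypothetical labels. scikit-learn's `DummyClassifier(strategy="most_frequent")` provides the same behavior as an estimator. The baseline's accuracy is the bar a real model should beat.

```python
from collections import Counter

# Hypothetical training and test labels
y_train = ["no", "yes", "no", "no", "yes", "no"]
y_test = ["no", "yes", "no", "no"]

# The baseline always predicts the most common class seen in training
most_common_class = Counter(y_train).most_common(1)[0][0]
baseline_preds = [most_common_class] * len(y_test)

# Baseline accuracy: share of test observations matching that class
baseline_accuracy = sum(p == a for p, a in zip(baseline_preds, y_test)) / len(y_test)
print(most_common_class, baseline_accuracy)  # no 0.75
```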

What are the common evaluation metrics for a classification problem/model?

Accuracy
Precision
Recall
Specificity
F1 score
ROC curve

What is accuracy?

The number of times we predicted correctly divided by the total number of observations: (TP + TN) / (TP + TN + FP + FN)
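The definition translates directly into code. A minimal sketch on hypothetical labels (1 = positive):

```python
def accuracy(y_true, y_pred):
    """Correct predictions divided by the total number of observations."""
    correct = sum(actual == predicted for actual, predicted in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]
print(accuracy(y_true, y_pred))  # 0.625
```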

What is precision / positive predictive value?

The percentage of our positive predictions that are correct: TP / (TP + FP)
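A minimal sketch of TP / (TP + FP) on hypothetical labels. Precision is undefined when there are no positive predictions; this sketch returns 0.0 by convention (scikit-learn's `precision_score` lets you choose through its `zero_division` argument).

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): share of positive predictions that are correct."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    # Convention for the undefined case: no positive predictions at all
    return tp / (tp + fp) if tp + fp else 0.0


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]
print(precision(y_true, y_pred))  # 0.6
```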

What is recall / true positive rate / sensitivity?

The percentage of actual positive cases that we correctly predicted as positive: TP / (TP + FN)
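A minimal sketch of TP / (TP + FN) on the same hypothetical labels. The denominator counts actual positives, not predicted positives, which is what separates recall from precision.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): share of actual positives that were predicted positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    # Convention for the undefined case: no actual positives at all
    return tp / (tp + fn) if tp + fn else 0.0


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]
print(recall(y_true, y_pred))  # 0.75
```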

What is specificity / true negative rate?

The percentage of actual negative cases that we correctly predicted as negative: TN / (TN + FP)
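A minimal plain-Python sketch of TN / (TN + FP) on hypothetical labels. For binary labels, specificity equals the recall of the negative class, so scikit-learn's `recall_score(y_true, y_pred, pos_label=0)` gives the same number.

```python
def specificity(y_true, y_pred, positive=1):
    """TN / (TN + FP): share of actual negatives that were predicted negative."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    # Convention for the undefined case: no actual negatives at all
    return tn / (tn + fp) if tn + fp else 0.0


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]
print(specificity(y_true, y_pred))  # 0.5
```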

logistic regression

A regression model that is used as a classification algorithm
Finds the values of the coefficients that weight each input variable
Assigns observations to a discrete set of classes
Predicts discrete outcomes
Comes in binomial (two classes) and multinomial (more than two classes) forms
The output is a value between 0 and 1 that represents the probability that an observation belongs to one class rather than the other.
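A minimal scikit-learn sketch on hypothetical one-feature data. `coef_` holds the learned weight for each input variable, `predict_proba` returns the value between 0 and 1 for each class, and `predict` turns that into a discrete class. With the default solver in recent scikit-learn versions, the same estimator fits a multinomial model when `y` has more than two classes.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied (one feature) vs. failed (0) / passed (1)
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# Learned coefficient (weight of the input variable) and intercept
print(model.coef_, model.intercept_)

# predict_proba returns one column per class; column 1 is P(class 1)
proba = model.predict_proba([[2], [7]])
print(proba[:, 1])

# predict assigns the discrete class (equivalent to a 0.5 probability cutoff here)
print(model.predict([[2], [7]]))
```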

regularized least squares

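A way of solving least squares regression problems
Adds an extra constraint on the solution, which is called regularization
The constraint adds a penalty term to the error, discouraging large coefficients.
An argument in scikit-learn's LogisticRegression: C, the inverse of regularization strength (smaller C means stronger regularization)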
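The sketch below fits scikit-learn's `LogisticRegression` on hypothetical data with several values of `C`, relying on the default L2 penalty. Because `C` is the inverse of regularization strength, smaller values penalize the coefficient more heavily and shrink it. `max_iter=1000` is an assumption that gives the solver room to converge on this tiny, perfectly separable dataset.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

coefs = []
for C in [0.01, 1.0, 100.0]:
    # Smaller C = larger penalty on the coefficients (stronger regularization)
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    coefs.append(model.coef_[0][0])
    print(f"C={C}: coefficient={model.coef_[0][0]:.3f}")
```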

What are the components of a decision tree?

root
condition/internal node
branches/edges
decision/leaf node
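scikit-learn's `export_text` prints a fitted tree as nested rules, which makes these components visible: the first split is the root, nested splits are internal nodes, each indented line is a branch, and `class:` lines are leaves. The data is hypothetical and chosen so the tree needs two levels of splits.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical one-feature data: classes 0, 1, 0 in three runs
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 1, 1, 0, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Root split on top, internal node nested below it, leaves as "class: ..." lines
print(export_text(tree, feature_names=["x"]))
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```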

What is a classification tree?

A decision tree used to classify the outcome variable, i.e., to predict a discrete class

What is a regression tree?

A decision tree used to predict continuous values, such as the price of a house

CART = ?

classification and regression trees
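scikit-learn's documentation describes its decision trees as an optimized version of CART, so one family covers both kinds of tree: `DecisionTreeClassifier` predicts a class, and `DecisionTreeRegressor` predicts a continuous value (the mean target of the leaf). A sketch on hypothetical house data, with `max_depth=1` chosen only to keep the trees tiny:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical feature (e.g., size) with a class label and a continuous price
X = [[1], [2], [3], [10], [11], [12]]
y_class = ["cheap", "cheap", "cheap", "pricey", "pricey", "pricey"]
y_price = [100.0, 110.0, 120.0, 500.0, 510.0, 520.0]

clf = DecisionTreeClassifier(max_depth=1).fit(X, y_class)  # classification tree
reg = DecisionTreeRegressor(max_depth=1).fit(X, y_price)   # regression tree

print(clf.predict([[2], [11]]))  # class labels
print(reg.predict([[2], [11]]))  # mean price of the matching leaf
```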

Recursive binary splitting / greedy algorithm

Consider all the features
Use a cost function (e.g., Gini impurity) to test all the different candidate split points
Select the split with the best (lowest) cost
Make that best predictor/classifier the root node, then repeat the process recursively on each resulting branch
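A from-scratch sketch of one greedy step of recursive binary splitting, assuming Gini impurity as the cost function and numeric features (`gini` and `best_split` are hypothetical helper names). It evaluates every feature and every midpoint between consecutive unique values, then keeps the lowest weighted impurity. A full tree builder would call `best_split` again on the left and right subsets.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) of the lowest-cost split."""
    best = None
    n = len(y)
    for j in range(len(X[0])):  # consider all the features
        values = sorted(set(row[j] for row in X))
        # Candidate split points: midpoints between consecutive unique values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            cost = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or cost < best[2]:
                best = (j, t, cost)
    return best


X = [[2.0, 1.0], [3.0, 1.0], [1.0, 0.0], [7.0, 0.0], [8.0, 1.0], [9.0, 0.0]]
y = ["a", "a", "a", "b", "b", "b"]
print(best_split(X, y))  # (0, 5.0, 0.0)
```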

What are Decision Trees?

A supervised machine learning method: the tree is trained on labeled data.
The training data is used to find a decision boundary, expressed as a sequence of rules.
Those rules are then used to classify observations into 2 or more classes.

What does each node represent?

A splitting point in the decision tree
Represents a single input variable (x)
Stores a split point (or category) of that variable
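scikit-learn exposes the fitted nodes through the low-level `tree_` attribute. This sketch (hypothetical data) walks those arrays to show that each root or internal node stores exactly one feature index and one threshold, while leaves, marked by `children_left == -1`, store no split.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: feature 0 separates the classes, feature 1 does not
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 7], [6, 2]]
y = [0, 0, 0, 1, 1, 1]
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

t = tree.tree_  # low-level arrays, one entry per node
for node in range(t.node_count):
    if t.children_left[node] == -1:
        # Leaf: no further split
        print(f"node {node}: leaf")
    else:
        # Split node: one input variable and one split point
        print(f"node {node}: x[{t.feature[node]}] <= {t.threshold[node]:.2f}")
```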

What are the pros of decision trees?

Simple to understand
Simple to visualize
Simple to explain the output
Requires little data preparation
No need to encode the target variable
Performs well for a broad range of problems

What is the F1 score?

The harmonic mean of precision and recall, giving both metrics equal weight: F1 = 2 × (precision × recall) / (precision + recall)
Useful when you want to optimize for both precision and recall
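The harmonic mean can be computed from precision and recall directly. A minimal sketch (`f1` is a hypothetical helper name); the inputs 0.6 and 0.75 are illustrative values, and the function returns 0.0 when both are zero to avoid dividing by zero.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (equal weight to both)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1(0.6, 0.75), 4))  # 0.6667
```

Because it is a harmonic mean, F1 stays low if either metric is low: `f1(1.0, 0.1)` is about 0.18, far below the arithmetic mean of 0.55.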

What is support?

The number of occurrences of each class in y_true (the actual values).
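Support is a per-class count of the actual labels; scikit-learn's `classification_report` prints it as the `support` column. A minimal sketch with `collections.Counter` on hypothetical labels:

```python
from collections import Counter

# Hypothetical actual labels
y_true = ["cat", "dog", "cat", "bird", "cat"]

# Support: how many times each class occurs in y_true
support = Counter(y_true)
print(dict(sorted(support.items())))  # {'bird': 1, 'cat': 3, 'dog': 1}
```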

What is overfitting?

The model fits the training data too closely (including its noise) and does not generalize well to new, unseen data.

How to avoid overfitting in a decision tree?

Mechanisms such as

Pruning.
Set the minimum number of samples required at a leaf node
Set the maximum depth
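A sketch of the three mechanisms as scikit-learn `DecisionTreeClassifier` arguments: `max_depth`, `min_samples_leaf`, and `ccp_alpha` (minimal cost-complexity pruning). The noisy dataset is synthetic, and the values 3, 10, and 0.01 are illustrative assumptions, not tuned settings. Comparing training accuracy and leaf counts shows the unconstrained tree memorizing the flipped labels.

```python
import random

from sklearn.tree import DecisionTreeClassifier

random.seed(0)
# Synthetic noisy data: label is 1 when x > 0.5, but about 20% of labels are flipped
X = [[random.random()] for _ in range(200)]
y = [int(x[0] > 0.5) if random.random() > 0.2 else int(x[0] <= 0.5) for x in X]

full = DecisionTreeClassifier(random_state=0).fit(X, y)
limited = DecisionTreeClassifier(
    max_depth=3,          # set the maximum depth
    min_samples_leaf=10,  # minimum number of samples required at a leaf node
    ccp_alpha=0.01,       # pruning strength (larger = more pruning)
    random_state=0,
).fit(X, y)

for name, model in [("full", full), ("limited", limited)]:
    print(name, "train acc:", round(model.score(X, y), 3),
          "depth:", model.get_depth(), "leaves:", model.get_n_leaves())
```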

How to handle overfitting?

Obtain more training data
Feature engineering

What is a logistic regression model?

A model that maps any real value into a number between 0 and 1 (using the sigmoid function), representing the probability that an observation is in the positive class.
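The mapping from any real value to the range (0, 1) is the sigmoid (logistic) function, 1 / (1 + e^(-z)), where z is the linear combination of the inputs and the learned coefficients. A minimal sketch; the two branches avoid overflow in `math.exp` for inputs of large magnitude.

```python
import math


def sigmoid(z):
    """Map any real z to (0, 1) without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # small for very negative z, so no overflow
    return e / (1.0 + e)


for z in [-5, 0, 5]:
    print(z, round(sigmoid(z), 4))
```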

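How does the threshold in a logistic regression model affect the metrics?

Decrease the threshold: recall increases or stays the same, because more observations are predicted positive
Increase the threshold: precision generally increases, because only the most confident predictions are labeled positive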
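A plain-Python sketch of the trade-off, assuming hypothetical labels and predicted probabilities (`precision_recall_at` is a hypothetical helper name). Each threshold turns the probabilities into 0/1 predictions before the metrics are computed.

```python
def precision_recall_at(threshold, y_true, probs):
    """Predict positive when probability >= threshold, then score."""
    y_pred = [int(p >= threshold) for p in probs]
    pairs = list(zip(y_true, y_pred))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels and model probabilities for the positive class
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.35, 0.45, 0.6, 0.7, 0.8, 0.9]
for t in [0.3, 0.5, 0.75]:
    print(t, precision_recall_at(t, y_true, probs))
```

On this data, raising the threshold from 0.3 to 0.75 moves precision from 4/7 up to 1.0 while recall drops from 1.0 to 0.5. Precision is not guaranteed to rise at every step on real data.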

What is an ROC curve?

Receiver Operating Characteristic curve
Summarizes the trade-off between the true positive rate (TPR) and the false positive rate (FPR) for a predictive model across different probability thresholds
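scikit-learn's `roc_curve` returns the FPR and TPR at each candidate threshold, and `roc_auc_score` summarizes the whole curve as the area under it (1.0 is perfect ranking, 0.5 is random). A sketch on hypothetical labels and probabilities:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.35, 0.45, 0.6, 0.7, 0.8, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, probs)
for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f}  TPR={t:.2f}")  # one point of the curve per threshold

print("AUC:", roc_auc_score(y_true, probs))  # AUC: 0.8125
```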

How to calculate a baseline prediction for a regression problem?

Use the average value of the dependent (target) variable
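A plain-Python sketch of the mean baseline, assuming hypothetical house prices. The baseline predicts the training mean for every observation, and its error (RMSE here) is the number a real regression model should beat.

```python
from statistics import mean

# Hypothetical training and test targets (e.g., house prices)
y_train = [200_000, 250_000, 300_000, 350_000]
y_test = [260_000, 310_000]

# Baseline: predict the training mean for every observation
baseline = mean(y_train)
baseline_preds = [baseline] * len(y_test)

# Root mean squared error of the baseline on the test set
rmse = (sum((a - p) ** 2 for a, p in zip(y_test, baseline_preds)) / len(y_test)) ** 0.5
print(baseline, rmse)
```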

What is encoding?

Transforming categorical variables into binary or numeric counterparts
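A minimal one-hot encoding sketch with pandas' `get_dummies` on a hypothetical `color` column. `dtype=int` asks for 0/1 integers instead of booleans, and columns that are not listed in `columns` are kept as they are.

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column
df = pd.DataFrame({"color": ["red", "green", "blue", "green"], "size": [1, 2, 3, 4]})

# One-hot encoding: one 0/1 column per category of "color"
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```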

What is the purpose of splitting data?

To evaluate the model on data it was not trained on, so that we avoid overfitting the model to one sample
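A sketch with scikit-learn's `train_test_split` on hypothetical data. `test_size=0.3`, `random_state=123`, and `stratify=y` are illustrative choices. Calling the function a second time on the training portion is one way to carve out a validation set as well.

```python
from sklearn.model_selection import train_test_split

# Hypothetical features and balanced labels
X = [[i] for i in range(10)]
y = [0, 1] * 5

# Hold out 30% of the rows; stratify keeps class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y
)
print(len(X_train), len(X_test))  # 7 3
```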

Classification Flashcards

(30 cards)