03 Classification Flashcards

1
Q

What does random_state do?

A

It makes results reproducible by seeding the random number generator.
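A minimal sketch (assuming scikit-learn's train_test_split; the data is illustrative): the same random_state yields the same split every run.

```python
# Assumption: scikit-learn is available. Same seed -> same split.
from sklearn.model_selection import train_test_split

data = list(range(10))
a_train, a_test = train_test_split(data, test_size=0.3, random_state=42)
b_train, b_test = train_test_split(data, test_size=0.3, random_state=42)
print(a_test == b_test)  # True: identical split, hence reproducible results
```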

2
Q

Does shuffling training data improve model performance?

A

Shuffle the training set, because some models perform poorly when the training instances are ordered.

3
Q

What is k-fold cross validation

A

It divides the dataset into k equal parts (folds). The model is trained on k-1 folds and validated on the remaining fold, and this is repeated k times so that each fold serves as the validation set exactly once.
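A sketch (assuming scikit-learn's KFold; 6 toy samples, k = 3): each iteration yields k-1 training folds and one validation fold.

```python
# Assumption: scikit-learn is available. 3-fold split of 6 samples.
from sklearn.model_selection import KFold

X = list(range(6))
for train_idx, val_idx in KFold(n_splits=3).split(X):
    print(train_idx, val_idx)  # each fold is the validation set exactly once
```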

4
Q

What are true positives and true negatives?

A

True positive: the actual value is 1 and the model also predicted 1.
True negative: the actual value is 0 and the model also predicted 0.

5
Q

What are false positives and false negatives?

A

False positive: the actual value is 0 but the model predicted 1.
False negative: the actual value is 1 but the model predicted 0.

6
Q

Confusion matrix format

A

Rows = actual class, columns = predicted class:
TP FN
FP TN
(Note: scikit-learn's confusion_matrix() uses the reverse layout, [[TN, FP], [FN, TP]].)
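A plain-Python sketch of the four cells, using hypothetical labels and predictions:

```python
# Hypothetical data: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # actual 1, predicted 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # actual 1, predicted 0
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # actual 0, predicted 1
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # actual 0, predicted 0
print([[tp, fn], [fp, tn]])  # [[3, 1], [1, 3]]
```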

7
Q

Define accuracy

A

The % of correct predictions made by the model.
Formula: (TP + TN) / (TP + TN + FP + FN)
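The formula above worked through with hypothetical counts:

```python
# Hypothetical confusion-matrix counts.
tp, tn, fp, fn = 3, 3, 1, 1
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.75, i.e. 75% of predictions were correct
```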

8
Q

When to use accuracy

A

It is best used when the classes are balanced, and is misleading when there is class imbalance.

9
Q

Define precision

A

Among all the positive PREDICTIONS, how many are actually positive.
Formula: TP / (TP + FP)

10
Q

Define recall

A

AKA sensitivity.
Among all the ACTUAL positives, how many did the model correctly identify as positive.
Formula: TP / (TP + FN)
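Both formulas worked through with the same hypothetical counts (TP = 3, FP = 1, FN = 1):

```python
# Hypothetical counts: 4 positive predictions, 4 actual positives.
tp, fp, fn = 3, 1, 1
precision = tp / (tp + fp)  # 0.75: of 4 positive predictions, 3 were right
recall = tp / (tp + fn)     # 0.75: of 4 actual positives, 3 were caught
print(precision, recall)
```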

11
Q

When to use Precision

A

When the objective is to minimize false positives.
e.g. convicting criminals: it is acceptable to let a few criminals go, but we must not punish an innocent person, so false positives have to be minimized.

12
Q

When to use recall

A

When the objective is to minimize false negatives.
e.g. intensive screening at airport check-in: it is acceptable to take an innocent person aside, check them, and let them go, but we cannot let a criminal through, so false negatives have to be minimized.

13
Q

What is the relation between recall and precision?

A

They are inversely related: improving one typically worsens the other (the precision/recall trade-off).

14
Q

When to use F1-Score

A

When we cannot trade off false positives against false negatives.
e.g. predicting employee promotions: we don't want to block the promotion of a deserving employee, but we also don't want to promote someone undeserving, so both false positives and false negatives must stay low.

15
Q

Define F1-Score

A

It is the harmonic mean of precision and recall. (The harmonic mean is used because if either precision or recall drops, the F1 value drops sharply.)
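A quick sketch of why the harmonic mean is used, with hypothetical precision/recall values:

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.9, 0.9), 4))  # 0.9  - balanced values
print(round(f1(0.9, 0.1), 4))  # 0.18 - one low value drags F1 down sharply
```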

16
Q

Function used to get any score.

A

cross_val_score(sgdclassifier, x_train, y_train, cv=3, scoring="accuracy")
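A runnable sketch of the call above (assuming scikit-learn; the iris dataset and SGDClassifier are illustrative stand-ins, not from the original card):

```python
# Assumption: scikit-learn is available; dataset/classifier are examples.
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = SGDClassifier(random_state=42)
scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")
print(scores)  # one accuracy score per fold
```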

17
Q

Impact of threshold on precision and recall

A

Increasing the threshold increases precision and decreases recall.
Decreasing the threshold increases recall and decreases precision.

18
Q

How can we view decision scores?

A

Using decision_function().
We cannot view the threshold itself, but we can see the raw decision scores.
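A sketch (assuming scikit-learn; the dataset, classifier, and threshold value are illustrative): decision_function() returns the raw scores, which we can compare against our own threshold instead of the default.

```python
# Assumption: scikit-learn is available; the threshold 1000 is hypothetical.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = SGDClassifier(random_state=42).fit(X, y)

scores = clf.decision_function(X[:5])  # raw scores, not 0/1 predictions
custom_threshold = 1000                # raising it trades recall for precision
preds = scores > custom_threshold
print(scores, preds)
```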

19
Q

ROC - AUC Curve

A

A plot of the True Positive Rate (TPR) against the False Positive Rate (FPR); AUC is the area under this curve.

20
Q

Define True Positive Rate (TPR)

A

Recall AKA Sensitivity

21
Q

Define False Positive Rate (FPR)

A

The fraction of negative instances that are wrongly classified as positive.
Formula: FPR = 1 - TNR, where the TNR (True Negative Rate, AKA specificity) is the fraction of negative instances correctly classified as negative.
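A sketch of the ROC/AUC cards above (assuming scikit-learn's metrics; the labels and scores are hypothetical):

```python
# Assumption: scikit-learn is available; data is a tiny hypothetical example.
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]          # hypothetical decision scores
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print(fpr, tpr)
print(auc)  # 0.75: area under the TPR-vs-FPR curve
```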

22
Q

Which function gives the class probabilities for each instance?

A

predict_proba()
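A sketch (assuming scikit-learn; the dataset and classifier are illustrative): predict_proba() returns one column per class, and each row sums to 1.

```python
# Assumption: scikit-learn is available; iris/LogisticRegression are examples.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X[:3])
print(proba.shape)  # (3, 3): 3 instances, one probability per class
```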

23
Q

What is multiclass classification

A

It distinguishes between more than two classes.

24
Q

Which models can natively handle multiclass classification

A

Random Forest & naive Bayes classifiers (they handle multiple classes natively).

25
Q

Name a few strictly binary models

A

SVM & Linear Classifiers

26
Q

How can a binary classification model be used for multiclass classification?

A
  1. One versus all (OvA, also called one-versus-the-rest)
  2. One versus one (OvO)
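Both strategies can be sketched with scikit-learn's wrappers (an assumption; iris and LinearSVC are illustrative). OvA builds one model per class; OvO builds one per pair, N(N-1)/2 for N classes.

```python
# Assumption: scikit-learn is available; iris has 3 classes.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)
print(len(ovr.estimators_))  # 3: one binary model per class
print(len(ovo.estimators_))  # 3: one per pair, 3*2/2 = 3
```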
27
Q

Define One versus all strategy

A

e.g. to classify the digits 0-9, we build 10 binary classifiers (one per digit), run each on the instance, and pick the class whose classifier gives the highest score.

28
Q

Which strategy is preferred: one versus all or one versus one?

A

One versus all is generally preferred.
Scikit-Learn also runs it by default when a binary classifier is used for a multiclass task (except for SVMs, which use OvO).

29
Q

Explain One versus One strategy

A

We build one binary classifier for each pair of classes: N(N-1)/2 classifiers for N classes.

30
Q

Best model to start with

A

Stochastic Gradient Descent
Especially with large datasets

31
Q

What is multilabel classification

A

e.g. suppose we need a model that recognises three people: me, jb, and pj.
If one picture contains me and jb, the model should output 1, 1, 0. A model that outputs multiple binary labels per instance like this is a multilabel classification model.
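A minimal sketch of the idea (assuming scikit-learn; the features, labels, and the 1-nearest-neighbour choice are all hypothetical): KNeighborsClassifier accepts a multilabel target natively.

```python
# Assumption: scikit-learn is available; data is a hypothetical toy example.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
# Two binary labels per instance, e.g. [me present, jb present].
y = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict([[1.0]]))  # [[1 1]]: both labels predicted for this instance
```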