Classification Flashcards

1
Q

When is classification used in modeling?

A

When the response (outcome variable) is nominal, i.e. categorical.

2
Q

Must the levels of the categorical response be ordered to use classification?

A

No. Classification treats the levels of the categorical response as unordered. If the levels do have a natural order (an ordinal response), other techniques that exploit that order may and should be applied.

3
Q

In classification, what is the vocabulary?

A

The vocabulary is the set of classes C into which the response value can fall. f.i. {yes, no} or {bus, tram, bike, train}

4
Q

What does a classifier do?

A

It takes the features (independent variables, IVs) as input and predicts which class in C the categorical response Y most likely belongs to.

5
Q

What is Multi-Label Classification?

A

When the instances we test may be associated with more than one class.

f.i. an image containing a house and a dog may be classified as BOTH a house and a dog.

6
Q

What is Hierarchical or Multilevel Classification?

A

The classes in C are organized in a hierarchy: a class can be divided into subclasses, which can in turn be divided further.

7
Q

What is Structured Classification?

A

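Both the input and the output have structure: the individual output labels depend on each other, so they are predicted jointly rather than one at a time.

f.i. each word of a sentence can be classified as {noun, verb, etc.}, but together the labels of a sentence have to form a valid grammatical structure for the sentence to be comprehensible.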

8
Q

In binary or multiclass classification, how do we calculate accuracy?

A

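Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP/TN are the true positives/negatives and FP/FN the false positives/negatives. For multiclass classification: accuracy = number of correctly classified instances / total number of instances.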
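A minimal Python sketch of how accuracy is computed, both from the binary counts TP, TN, FP, FN and from lists of multiclass labels. The function names and the example counts and labels are hypothetical, chosen only for illustration.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary case: correct predictions (TP + TN) over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def accuracy_from_labels(y_true: list, y_pred: list) -> float:
    """Multiclass case: share of instances whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical counts and labels, for illustration only.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
print(accuracy_from_labels(["bus", "tram", "bike"], ["bus", "bike", "bike"]))  # 2 of 3 correct
```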

9
Q

Is accuracy a good metric for classification?

A

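Only if all classes are reasonably well represented in the data, because of imbalance: when one class dominates, a classifier that always predicts that class already scores a high accuracy while never detecting the other classes.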
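A small Python illustration of why accuracy can mislead under imbalance, assuming a hypothetical data set with 95 "no" and 5 "yes" responses and a trivial classifier that always predicts "no".

```python
# Hypothetical, imbalanced data set: 95 "no" responses and only 5 "yes" responses.
y_true = ["no"] * 95 + ["yes"] * 5

# A useless classifier that always predicts the majority class.
y_pred = ["no"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Share of the actual "yes" cases that the classifier found.
yes_found = sum(t == p == "yes" for t, p in zip(y_true, y_pred)) / y_true.count("yes")

print(accuracy)   # 0.95
print(yes_found)  # 0.0
```

The accuracy looks excellent, yet not a single "yes" case is detected.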

10
Q

In classification, what is imbalance?

A

If one response class appears much more frequently than the other classes in our data, the classifier is naturally skewed towards predicting that class.

11
Q

In classification, what is recall and how do we calculate it?

A

The fraction of actual positives that are correctly predicted as positive.

Recall = TP / (TP + FN) = true positives / (true positives + false negatives)
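A minimal Python sketch of recall computed from label lists, assuming "yes" is the positive class; the function name and the labels are hypothetical.

```python
def recall(y_true: list, y_pred: list, positive: str = "yes") -> float:
    """TP / (TP + FN): share of the actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


# Hypothetical labels: 3 actual positives, 2 of which are predicted correctly.
y_true = ["yes", "yes", "yes", "no", "no"]
y_pred = ["yes", "no", "yes", "yes", "no"]
print(recall(y_true, y_pred))  # 2/3
```

Note that the false positive at index 3 does not affect recall; it only counts against precision.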

12
Q

In classification, what is precision and how do we calculate it?

A

The fraction of predicted positives that are actually positive (i.e. correct).

Precision = TP / (TP + FP) = true positives / (true positives + false positives)
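The matching sketch for precision, again assuming "yes" is the positive class and using hypothetical labels. It raises ZeroDivisionError when nothing is predicted positive, since precision is undefined in that case.

```python
def precision(y_true: list, y_pred: list, positive: str = "yes") -> float:
    """TP / (TP + FP): share of the predicted positives that are actually positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp)


# Hypothetical labels: 4 positive predictions, 2 of which are correct.
y_true = ["yes", "yes", "yes", "no", "no"]
y_pred = ["yes", "no", "yes", "yes", "yes"]
print(precision(y_true, y_pred))  # 0.5
```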

13
Q

In classification, what is the F-score?

And what does its value represent?

A

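The F-score (F1) is the harmonic mean of precision (P) and recall (R), which balances the two: F1 = 2 * P * R / (P + R).

A high F-score means that both P and R are high, because the harmonic mean is pulled towards the smaller of the two; this is generally a good thing.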
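A short Python sketch of the F1 score as the harmonic mean of precision and recall. Returning 0.0 when both are zero is a common convention, assumed here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # convention when both are zero
    return 2 * precision * recall / (precision + recall)


print(f1_score(0.9, 0.9))  # about 0.9: both high, so F is high
print(f1_score(0.9, 0.1))  # about 0.18: pulled toward the smaller value
```

The arithmetic mean of 0.9 and 0.1 would be 0.5; the harmonic mean punishes the weak recall much harder.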

14
Q

For what types of classification can we use classification as regression?

A

Only binary classification (coding the two classes as 0 and 1). With logistic regression, however, the approach can be extended so that multiple levels can be separated as well.

15
Q

In classification as regression, what type of regression is useful?

A

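Logistic regression. The logistic function maps the linear predictor (link space, any real value) into the response space (0, 1), so every prediction is a valid probability between the binary values 0 and 1 that can be compared with a threshold (e.g. 0.5) to assign a class.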
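A minimal Python sketch of the link-to-response mapping and thresholding in logistic regression. The coefficients beta0 and beta1 are hypothetical and assumed already fitted; no model fitting is shown.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps the linear predictor (link space) into (0, 1).
    math.exp overflows for extremely negative z; fine for these small values."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(x: float, beta0: float, beta1: float, threshold: float = 0.5) -> int:
    """Predict class 1 if the estimated probability P(Y = 1 | x) reaches the threshold."""
    return 1 if sigmoid(beta0 + beta1 * x) >= threshold else 0


# Hypothetical, already-fitted coefficients (no fitting is done here).
beta0, beta1 = -1.0, 1.5
for x in (-3.0, 0.0, 3.0):
    print(x, round(sigmoid(beta0 + beta1 * x), 3), classify(x, beta0, beta1))
```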

16
Q

In tree modelling using regions, explain the top-down greedy approach.

A

Top-down: start at the root and split towards the leaves.
Greedy: at each step make the best split for the current node, without looking ahead to later splits or revisiting earlier ones.
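A minimal Python sketch of one greedy step, assuming a single numeric feature and the Gini index as the split criterion; the function names and data are hypothetical. A full tree would apply this step recursively, top-down, to each resulting region.

```python
from collections import Counter


def gini(labels: list) -> float:
    """Gini impurity of a region: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x: list, y: list):
    """Greedy step: try every threshold on one feature and keep the split whose
    two child regions have the lowest weighted Gini impurity. No look-ahead."""
    n = len(y)
    best = None  # (threshold, weighted impurity)
    for threshold in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        if not left or not right:
            continue  # not a real split
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = ["bus", "bus", "bus", "bike", "bike", "bike"]
print(best_split(x, y))  # (3.0, 0.0): both child regions are pure
```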

17
Q

In tree modelling, what is the ideal region according to the Gini index?

A

A region in which all instances belong to a single class (a pure region). The Gini index of such a region is 0, its minimum; the more mixed the classes in a region, the higher the index.
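A short Python sketch of the Gini index of a single region, written in the sum p_k(1 - p_k) form; the labels are hypothetical.

```python
from collections import Counter


def gini_index(labels: list) -> float:
    """G = sum over classes k of p_k * (1 - p_k), p_k = share of class k in the region."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


print(gini_index(["yes", "yes", "yes", "yes"]))  # 0.0: pure region, the ideal
print(gini_index(["yes", "no", "yes", "no"]))    # 0.5: maximally mixed for two classes
```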

18
Q

In tree modelling, what is pruning?

A

Pruning is cutting back an existing (fully grown) tree by removing some of its branches, in order to reduce variance and overfitting.

19
Q

What are hyperparameters?

A

They are not parameters learned by the model from the data, but settings of the training process that are chosen beforehand (f.i. the tree size or the SVM budget).

20
Q

What process is used to determine how much to prune a tree?

A

The same as for other models prone to overfitting: cross-validation over the tree size, keeping the size with the lowest cross-validated error.

21
Q

In terms of bias and variance, how do the values change when pruning deep trees?

A

Pruning reduces variance but increases bias.

22
Q

In terms of bias and variance, what values are expected in deep (read: large) trees?

A

Low bias but high variance.

23
Q

What is bagging in modelling and how does it work?

A

Bagging (bootstrap aggregating) is a technique to reduce variance without increasing bias (i.e. without compromising accuracy). It is often applied to deep, unpruned trees as an alternative to pruning.

In bagging you create multiple models (trees) on multiple training sets and average their predictions (for classification: take a majority vote). Averaging reduces the variance.
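A sketch of only the aggregation step of bagging, assuming the individual trees have already been fitted on their own training sets; the per-tree predictions below are hypothetical. Numeric predictions are averaged, class predictions are combined by majority vote.

```python
from collections import Counter
from statistics import mean


def bagged_regression(predictions: list) -> float:
    """Average the numeric predictions of the individual models."""
    return mean(predictions)


def bagged_classification(predictions: list) -> str:
    """Majority vote over the class labels predicted by the individual models."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions of five trees for one new instance.
print(bagged_regression([2.0, 3.0, 2.5, 3.5, 3.0]))                  # 2.8
print(bagged_classification(["bus", "tram", "bus", "bus", "bike"]))  # bus
```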

24
Q

Where do the training sets come from when using bagging?

A

You bootstrap them: each training set is sampled with replacement from the full training set.
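A minimal Python sketch of bootstrap sampling, assuming each bootstrap sample has the same size as the original training set (the usual choice); `bootstrap_samples` is a hypothetical helper built on the standard library's `random.Random.choices`.

```python
import random


def bootstrap_samples(data: list, n_samples: int, seed: int = 0) -> list:
    """Draw n_samples training sets, each the size of data, sampled with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]


training_set = list(range(10))
for sample in bootstrap_samples(training_set, n_samples=3):
    print(sorted(sample))  # duplicates appear; some instances are left out
```

Because sampling is with replacement, each bootstrap set repeats some instances and omits others, which is what makes the fitted trees differ.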

25
Q

Why and how do Random Forests improve the tree model?

A

Random Forests build on tree bagging and reduce variance even further by decorrelating the trees. In plain bagging every split may use all the predictors, so if a few predictors are strong, most trees look alike and their predictions are highly correlated. Random Forests therefore consider only a (random) subset of the predictors at each split of each tree.
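A sketch of the one step that distinguishes a Random Forest from plain bagging: choosing the candidate predictors for a split. It assumes m = sqrt(p), a common default for classification; the predictor names are hypothetical. Setting m = p would give plain bagging again.

```python
import math
import random


def features_for_split(predictors: list, rng: random.Random, m=None) -> list:
    """Pick a random subset of m predictors to consider at one split.
    Defaults to m = sqrt(p) (assumed here; a common choice for classification)."""
    if m is None:
        m = max(1, int(math.sqrt(len(predictors))))
    return rng.sample(predictors, m)


rng = random.Random(42)
# Hypothetical predictor names.
predictors = ["age", "income", "distance", "weather", "time",
              "weekday", "price", "car", "bike"]
for split in range(3):
    print(split, features_for_split(predictors, rng))  # 3 of the 9 predictors each time
```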

26
Q

What does a Support Vector Machine do?

A

It finds a hyperplane that separates the classes in the feature space (in a 2D feature space, the hyperplane is a line).
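A minimal Python sketch of how a given hyperplane classifies points, assuming two classes coded as +1 and -1. The hyperplane x1 + x2 - 3 = 0 is hypothetical and not fitted to any data.

```python
def decision_function(w: list, b: float, x: list) -> float:
    """f(x) = w . x + b; the hyperplane is the set of points where f(x) = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: list, b: float, x: list) -> int:
    """Class +1 on one side of the hyperplane, -1 on the other."""
    return 1 if decision_function(w, b, x) >= 0 else -1


# Hypothetical hyperplane x1 + x2 - 3 = 0 in a 2D feature space.
w, b = [1.0, 1.0], -3.0
print(predict(w, b, [3.0, 2.0]), predict(w, b, [0.5, 1.0]))  # 1 -1
```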

27
Q

In SVMs, what are margins?

A

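When a hyperplane divides the space into two classes, the margin is the gap between the hyperplane and the closest points of each class. The maximal margin classifier chooses the hyperplane that makes this gap as large as possible.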
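A Python sketch that measures the margin of a given hyperplane on hypothetical training data (classes coded +1/-1). It only evaluates one candidate hyperplane; an SVM solver would search for the w and b that maximize this value.

```python
import math


def distance_to_hyperplane(w: list, b: float, x: list) -> float:
    """Signed distance of point x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w: list, b: float, X: list, y: list) -> float:
    """Smallest signed distance y_i * d(x_i) over all points; positive only if
    every point is on its correct side of the hyperplane."""
    return min(yi * distance_to_hyperplane(w, b, xi) for xi, yi in zip(X, y))


# Hypothetical training points and labels.
X = [[3.0, 2.0], [4.0, 4.0], [0.0, 1.0], [1.0, 0.0]]
y = [1, 1, -1, -1]
print(margin([1.0, 1.0], -3.0, X, y))  # about 1.414 (= sqrt(2))
```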

28
Q

What are soft margins in SVM?

A

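It accounts for the fact that a perfectly separating hyperplane does not always exist, f.i. due to noise. With soft margins, a point only has to lie at least M(1 - ε_i) units from the hyperplane on its correct side, so points may fall inside the margin or even on the wrong side of the hyperplane.

(ε_i is the slack of point i: how far it violates the (correct) margin edge. ε_i = 0: on the correct side of the margin; 0 < ε_i ≤ 1: inside the margin; ε_i > 1: on the wrong side of the hyperplane.)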
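A Python sketch that computes, for a given hyperplane and margin width M, the smallest slack ε_i each point needs so that y_i f(x_i) ≥ M(1 - ε_i) holds, with f the signed distance to the hyperplane. The hyperplane, M and the points are hypothetical; nothing is fitted.

```python
import math


def slacks(w: list, b: float, M: float, X: list, y: list) -> list:
    """Smallest eps_i >= 0 with y_i * f(x_i) >= M * (1 - eps_i),
    where f is the signed distance to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    eps = []
    for xi, yi in zip(X, y):
        f = (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm
        eps.append(max(0.0, 1.0 - yi * f / M))
    return eps


# Hypothetical hyperplane x1 = 0 with margin M = 1.
X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [-3.0, 0.0]]
y = [1, 1, 1, -1]
print(slacks([1.0, 0.0], 0.0, 1.0, X, y))  # [0.0, 0.5, 1.5, 0.0]
```

The second point sits inside the margin (0 < ε ≤ 1) and the third lies on the wrong side of the hyperplane (ε > 1).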

29
Q

In SVM, what is a budget?

A

The budget refers to the total amount of margin violation allowed when using soft margins to find your hyperplane: the sum of all slacks ε_i may not exceed it. Note that the budget is a hyperparameter!
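A tiny Python sketch of the budget constraint on the slacks; the slack values are hypothetical and `within_budget` is an illustrative helper, not part of any SVM library.

```python
def within_budget(eps: list, budget: float) -> bool:
    """Soft-margin constraint: every eps_i >= 0 and sum(eps_i) <= budget."""
    return all(e >= 0 for e in eps) and sum(eps) <= budget


eps = [0.0, 0.5, 1.5, 0.0]  # hypothetical slack values
print(within_budget(eps, budget=1.0))  # False: total violation 2.0 exceeds the budget
print(within_budget(eps, budget=3.0))  # True
```

With a budget of 0 no violations are allowed (a hard margin). Since every point on the wrong side of the hyperplane has ε_i > 1, the budget also bounds how many training points can be misclassified.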

30
Q

In SVM, what is the support set?

A

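It is the set of points that lie on the margin, inside it, or on the wrong side of the hyperplane (the support vectors).

These observations alone determine the position of the hyperplane; moving any other point, as long as it stays outside the margin on its correct side, does not change it.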
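A Python sketch that picks out the support set for a given (hypothetical, not fitted) hyperplane and margin width M: all points with y_i f(x_i) ≤ M, where f is the signed distance. The small tolerance guards against rounding for points exactly on the margin.

```python
import math


def support_vectors(w: list, b: float, M: float, X: list, y: list, tol: float = 1e-9) -> list:
    """Indices of points on the margin, inside it, or on the wrong side,
    i.e. y_i * f(x_i) <= M with f the signed distance to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    idx = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        f = (sum(wj * xj for wj, xj in zip(w, xi)) + b) / norm
        if yi * f <= M + tol:
            idx.append(i)
    return idx


# Hypothetical hyperplane x1 = 0 with margin M = 1.
X = [[2.0, 0.0], [1.0, 5.0], [0.5, -1.0], [-1.0, 2.0], [-4.0, 0.0]]
y = [1, 1, 1, -1, -1]
print(support_vectors([1.0, 0.0], 0.0, 1.0, X, y))  # [1, 2, 3]
```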

31
Q

Do I have a low or high budget in SVM when my variance is high and my bias is low?

A

A low budget: few points are allowed inside the margin, so the hyperplane depends on only a few support vectors and variance is high, while the training data are fit closely, hence low bias.

32
Q

Do I have a low or high budget in SVM when my variance is low and my bias is high?

A

A high budget: many points lie inside the margin and act as support vectors, so variance is low, but the fit to the training data is looser, hence high bias.

33
Q

In SVM for Multiclass Classification, explain the One versus All approach.

A

You fit one classifier (hyperplane) per class, each separating that class from all other classes, and assign the class whose classifier gives the largest score (f_k(x) is largest).
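A Python sketch of the One versus All decision rule, assuming the per-class hyperplanes (w_k, b_k) have already been trained; the classes and coefficients are hypothetical.

```python
def one_versus_all(classifiers: dict, x: list) -> str:
    """classifiers maps class k to (w_k, b_k) of a hyperplane trained as
    'class k vs. all other classes'. Return the class with the largest f_k(x)."""
    scores = {
        k: sum(wi * xi for wi, xi in zip(w, x)) + b
        for k, (w, b) in classifiers.items()
    }
    return max(scores, key=scores.get)


# Hypothetical, already-trained one-vs-all hyperplanes for three classes.
classifiers = {
    "bus": ([1.0, -1.0], 0.0),
    "tram": ([-1.0, 1.0], 0.5),
    "bike": ([0.0, 0.5], -1.0),
}
print(one_versus_all(classifiers, [1.0, 2.0]))  # tram (scores: bus -1.0, tram 1.5, bike 0.0)
```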

34
Q

In SVM for Multiclass Classification, explain the One versus One approach.

A

You fit a classifier for each possible pair of classes (K(K-1)/2 classifiers for K classes) and assign the class that wins most of these pairwise contests.
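A Python sketch of the One versus One voting step. The pairwise outcomes are hypothetical stand-ins for K(K-1)/2 trained classifiers; ties are broken by first occurrence here, which is an arbitrary choice.

```python
from collections import Counter
from itertools import combinations


def one_versus_one(classes: list, pairwise_winner) -> str:
    """Ask one classifier per pair of classes (K(K-1)/2 pairs) which class wins,
    then return the class with the most wins (ties broken by first occurrence)."""
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Hypothetical outcomes of the three pairwise classifiers for one test instance.
outcomes = {("bus", "tram"): "tram", ("bus", "bike"): "bus", ("tram", "bike"): "tram"}
print(one_versus_one(["bus", "tram", "bike"], lambda a, b: outcomes[(a, b)]))  # tram
```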