5.3 Predictive Analytics (EN) Flashcards

1
Q

Given the decision tree below and a test set with 20 observations, what is the accuracy of this model?
incorrect: 7
correct: 13

A

accuracy = number of correct predictions / number of observations
= 13/20 = 0.65 (65%)
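A minimal Python sketch of this accuracy calculation (the counts of 13 correct out of 20 come from the card; the helper name is just for illustration):

```python
def accuracy(n_correct, n_total):
    """Fraction of test observations the model classifies correctly."""
    return n_correct / n_total

print(accuracy(13, 20))  # 0.65, i.e. 65%
```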

2
Q

You are in the process of building a decision tree for the dataset below. In the first step, you identify attribute “Color” as the best possible attribute to split the instances in the root node of the tree. As such, you end up with the so-called “decision stump” below. You are using the misclassification error as the impurity measure for constructing the tree.

Suppose that you want to further improve the tree and therefore look into how to further split “Internal Node 3”. What is the resulting impurity when you split “Internal Node 3”, based on the best attribute available?

A

1/4 (uncertain)
Internal Node 3 (Color = blue) contains 8 of the 20 instances; splitting it on the best remaining attribute gives 3/8 true.
Misclassification error = 1 - max p(i|t), i.e. 1 minus the proportion of the majority class in the node.
A decision stump is a simple decision tree with a single decision node and two leaf nodes.

3
Q

You are constructing a decision tree for the data set below. You use the misclassification error as the impurity metric for building the tree:
Error(t) = 1 - max p(i|t)
What is the impurity gain of the split when using the best attribute for the first split when building the tree?
What is the misclassification error of the split when using the best attribute for the first split when building the tree?

A

Impurity gain of the split: 4/20 = 0.2 (uncertain)
Misclassification error of the split: 8/20 = 0.4 (uncertain)

4
Q

In linear regression, the parameter coefficients are chosen in such a way that the

sum of squared residuals or errors is maximized.
sum of squared residuals or errors is minimized.
product of squared residuals or errors is minimized.
product of squared residuals or errors is maximized.

A

sum of squared residuals or errors is minimized.

5
Q

The higher the Area under the ROC curve (AUC) the
better the performance
worse the performance

A

better the performance

6
Q

Given the following Gains chart: …
With a total client base of 10 000 people and 5000 responders on a marketing campaign, if we target the 8000 clients with the highest scores from our model, we expect to reach:

1250 responders.
5000 responders.
2500 responders.
4750 responders.

A

4750 responders.

The x-axis is the % of clients contacted, the y-axis the % of responders captured.

8000 contacted / 10 000 total = 80% contacted on the x-axis → the gains curve gives the point (0.8, 0.95).
0.95 × 5000 responders = 4750.
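A small Python sketch of the same gains-chart reading (the 0.95 value is assumed to be read off the curve at x = 0.8, as on the card):

```python
total_clients = 10_000
total_responders = 5_000
contacted = 8_000

pct_contacted = contacted / total_clients  # 0.8 on the x-axis
pct_responders_captured = 0.95             # y-value read from the gains curve at x = 0.8

print(pct_responders_captured * total_responders)  # 4750.0 expected responders
```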

7
Q

Which of the following statements is NOT CORRECT about the k-nearest neighbor classifier?

It is intuitive and easy to understand.
It has a large computing power requirement.
It needs a value for k which should be determined upfront.
It is unaffected by the presence of irrelevant variables.

A

It is unaffected by the presence of irrelevant variables.
k-NN is sensitive to irrelevant variables: they add noise to the distance calculations and can distort which neighbours count as nearest.

8
Q

When the cut-off is set at its minimum (e.g., 0), then

the sensitivity becomes 1 and the specificity becomes 1.
the sensitivity becomes 1 and the specificity becomes 0.
the sensitivity becomes 0 and the specificity becomes 1.
the sensitivity becomes 0 and the specificity becomes 0.

A

the sensitivity becomes 1 and the specificity becomes 0.

When the cut-off is set at its minimum (e.g. 0), every instance is predicted positive and there are no negative predictions. All actual positives are therefore detected (sensitivity = 1), while no actual negatives are correctly identified (specificity = 0).

9
Q

Consider a data set with 100% good customers and 0% bad customers. This data set has an entropy of
0
0,5
1
10

A

0
Entropy is a measure of impurity, often used for making splits in decision trees.
The data set has 100% good customers and 0% bad customers, so p_good = 1 and p_bad = 0.
H = -1 × log2(1) - 0 × log2(0) = 0 (with 0 × log2(0) taken as 0).
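A minimal Python sketch of the entropy formula used here (the convention 0 × log2(0) = 0 is handled by skipping zero proportions):

```python
import math

def entropy(proportions):
    """Shannon entropy of a class distribution; 0 * log2(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([1.0, 0.0]))  # 0.0 -> pure node (100% good, 0% bad)
print(entropy([0.5, 0.5]))  # 1.0 -> maximally impure 50/50 split
```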

10
Q

Confusion matrix (rows = predicted class, columns = actual class):

              actual +   actual -
predicted +      23         16
predicted -      55          6

The classification accuracy is 29/100, the error rate is 71/100, the sensitivity is 23/78 and the specificity is 6/22.

The classification accuracy is 29/100, the error rate is 71/100, the sensitivity is 6/22 and the specificity is 23/78.

The classification accuracy is 71/100, the error rate is 29/100, the sensitivity is 23/78 and the specificity is 6/22.

The classification accuracy is 71/100, the error rate is 29/100, the sensitivity is 6/22 and the specificity is 23/78.

A

The classification accuracy is 29/100, the error rate is 71/100, the sensitivity is 23/78 and the specificity is 6/22.

True Positives (TP): 23
False Positives (FP): 16
False Negatives (FN): 55
True Negatives (TN): 6

Classification accuracy = (TP + TN) / total = (23 + 6) / (23 + 16 + 55 + 6) = 29/100
Error rate = (FP + FN) / total = (16 + 55) / 100 = 71/100
Sensitivity (recall) = TP / actual positives = 23 / (23 + 55) = 23/78
Specificity = TN / actual negatives = 6 / (16 + 6) = 6/22
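A small Python sketch that computes all four metrics from the confusion-matrix cells on this card (the function name is chosen for illustration):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, error rate, sensitivity (recall) and specificity from a confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,    # 29/100
        "error_rate": (fp + fn) / total,  # 71/100
        "sensitivity": tp / (tp + fn),    # 23/78
        "specificity": tn / (tn + fp),    # 6/22
    }

print(confusion_metrics(tp=23, fp=16, fn=55, tn=6))
```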

11
Q

Which statement is NOT CORRECT?

In terms of advantages, decision trees are easy to interpret and understand, assuming they are not too big.

Decision trees are non-parametric, because no assumptions of normality, symmetric distributions, or independence are needed.

Decision trees are very robust with respect to outliers.

Decision trees are often referred to as stable classifiers since they are very insensitive to changes in the training data.

A

Decision trees are often referred to as stable classifiers since they are very insensitive to changes in the training data.

Decision trees can be very sensitive to changes in the training data: a small change can produce a completely different tree, so they are unstable classifiers.

(Non-parametric = makes no assumptions about the underlying distribution of the data.)

12
Q

Netflix decision tree
Weather=sunny; Tired; No= No netflix –> III (I)
Weather=sunny; Tired; Yes= Netflix –> IIIII (II)
Weather=Rainy; Homework= No; Netflix –> II
Weather=Rainy; Homework= yes; Tired= No; No netflix –> II (I)
Weather=Rainy; Homework= yes; Tired= yes; netflix –> II (III)

The classification accuracy is 0.35, the error rate is 0.65, the sensitivity is 0.5, the specificity is 0.8.

The classification accuracy is 0.35, the error rate is 0.65, the sensitivity is 0.8, the specificity is 0.5.

The classification accuracy is 0.65, the error rate is 0.35, the sensitivity is 0.5, the specificity is 0.8.

The classification accuracy is 0.65, the error rate is 0.35, the sensitivity is 0.8, the specificity is 0.5.

A

The classification accuracy is 0.65, the error rate is 0.35, the sensitivity is 0.8, the specificity is 0.5.

Working back from the selected answer, with 20 test observations:
TP = 8, FP = 5, FN = 2, TN = 5
Classification accuracy = (TP + TN) / total = (8 + 5) / 20 = 13/20 = 0.65
Error rate = (FP + FN) / total = (5 + 2) / 20 = 7/20 = 0.35
Sensitivity (recall) = TP / (TP + FN) = 8 / (8 + 2) = 0.8
Specificity = TN / (TN + FP) = 5 / (5 + 5) = 0.5

13
Q

To avoid overfitting from happening when building a decision tree, various strategies can be adopted. One option is to split the data into a training set and a validation set. The optimal tree is then chosen where the

training set error is maximal.
validation set error is minimal.
training set error is minimal.
validation set error is maximal

A

validation set error is minimal.

By using a validation set, you can evaluate different tree sizes and select the one that provides the best performance on unseen data, thus avoiding overfitting.

14
Q

Which is the easiest decision to make when building a decision tree?
splitting decision
stopping decision
assignment decision

A

assignment decision

The class label of a leaf is simply the majority class of the training instances that end up in it, which makes this the easiest of the three decisions.

15
Q

Using the classification error to build a decision tree, the gain of the employment split is:

0.35.
0.33.
0.4.
0.5.

A

0.4

Total: 20 instances (10 churn, 10 no churn), so the parent error = 1 - 10/20 = 0.5.
Employed = yes: 8 instances, 0 churn → error = 0.
Employed = no: 12 instances, 10 churn and 2 no churn → error = 2/12.
Weighted error after the split = (8/20) × 0 + (12/20) × (2/12) = 2/20 = 0.1.
Gain = 0.5 - 0.1 = 0.4.
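A minimal Python sketch of this gain calculation, using the class counts from the card (the helper names are illustrative):

```python
def misclass_error(counts):
    """Misclassification error of a node: 1 minus the proportion of the majority class."""
    return 1 - max(counts) / sum(counts)

def split_gain(parent_counts, children_counts):
    """Gain = parent error minus the weighted average error of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * misclass_error(c) for c in children_counts)
    return misclass_error(parent_counts) - weighted

# Parent: 10 churn / 10 no churn; employed = yes -> [0, 8]; employed = no -> [10, 2]
print(split_gain([10, 10], [[0, 8], [10, 2]]))  # ~0.4
```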

16
Q

Consider a data set with 50% good customers and 50% bad customers. This data set has an entropy of

0
0.5
1
10

A

1

With k = 2 classes, each class has a proportion of 0.5:
H = -0.5 × log2(0.5) - 0.5 × log2(0.5)
H = -0.5 × (-1) - 0.5 × (-1) = 0.5 + 0.5 = 1

17
Q

Confusion matrix (rows = predicted class, columns = actual class):

              actual +   actual -
predicted +       0          1
predicted -      42         57

The classification accuracy is 57/100, the error rate is 43/100, the sensitivity is 57/58 and the specificity is 0.

The classification accuracy is 57/100, the error rate is 43/100, the sensitivity is 0 and the specificity is 57/58.

The classification accuracy is 43/100, the error rate is 57/100, the sensitivity is 57/58 and the specificity is 0.

The classification accuracy is 43/100, the error rate is 57/100, the sensitivity is 0 and the specificity is 57/57.

A

The classification accuracy is 57/100, the error rate is 43/100, the sensitivity is 0 and the specificity is 57/58.

Confusion matrix layout used here:
TP  FP
FN  TN

ACC = (TP + TN) / all observations
ERR = (FP + FN) / all observations
SENS (true positive rate, TPR) = TP / actual positives = TP / (TP + FN)
SPEC (true negative rate, TNR) = TN / actual negatives = TN / (FP + TN)

TP = 0
FP = 1
FN = 42
TN = 57

Accuracy = 57/100, error rate = 43/100, sensitivity = 0/42 = 0, specificity = 57/58.

18
Q

When predicting fraud, the target variable is

continuous
categorical

A

categorical

19
Q

Consider the following numbers from a confusion matrix: TP=60; TN=15; FP=10; FN=15. Which statement is CORRECT?

The classification accuracy equals 75%, the sensitivity 80% and the specificity 60%.

The classification accuracy equals 75%, the sensitivity 60% and the specificity 80%.

The classification accuracy equals 80%, the sensitivity 75% and the specificity 60%.

The classification accuracy equals 25%, the sensitivity 60% and the specificity 80%.

A

The classification accuracy equals 75%, the sensitivity 80% and the specificity 60%.

Accuracy = (60 + 15)/100 = 75%, sensitivity = 60/(60 + 15) = 80%, specificity = 15/(15 + 10) = 60%.

20
Q

In the ROC curve, the diagonal represents the

perfect model.
random model.

A

random model.

21
Q

Consider a data set with 100% good customers and 0% bad customers. This data set has a classification error of

0
0,5
1
10

A

0

The classification error = 1 - max(p_good, p_bad) = 1 - 1 = 0; the data set is pure, so nothing is misclassified.

22
Q

The Area under the ROC curve (AUC) represents the probability that

a randomly chosen good gets a higher score than a randomly chosen bad.

a randomly chosen bad gets a higher score than a randomly chosen good.

a randomly chosen good gets the same score as a randomly chosen bad.

A

a randomly chosen good gets a higher score than a randomly chosen bad.

23
Q

When predicting customer lifetime value (CLV), the target variable is

continuous.
categorical

A

continuous

CLV= total expected value a customer will generate for a business over the entire duration of their relationship with that business.

24
Q

Consider the bounding function f(z)=1/(1+e^(-z)). Which statement is CORRECT?

For z=0 it becomes 0.5, for z very big it will approach 1 and for z very small it will approach 0.

For z=0 it becomes 0.5, for z very big it will approach 0 and for z very small it will approach 1.

For z=0 it becomes 1, for z very big it will approach 0 and for z very small it will approach 0.5.

For z=0 it becomes 0, for z very big it will approach 0.5 and for z very small it will approach 0.

A

For z=0 it becomes 0.5, for z very big it will approach 1 and for z very small it will approach 0.
This is the behaviour of the sigmoid (logistic) function:

For z = 0: e^(-z) = 1, so f(z) = 1/(1 + 1) = 0.5.
For very large z: e^(-z) → 0, so f(z) → 1/(1 + 0) = 1.
For very small (very negative) z: e^(-z) becomes very large, so f(z) → 0.
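A minimal Python sketch of the logistic (sigmoid) function to confirm these three cases:

```python
import math

def sigmoid(z):
    """Bounding function f(z) = 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))     # 0.5
print(sigmoid(100))   # ~1.0 for very large z
print(sigmoid(-100))  # ~0.0 for very small (very negative) z
```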

25
Q

A tree with many splits and leaf nodes is likely to

overfit.
underfit.

A

overfit

26
Q

Decision trees can be used for

for doing categorization.
for variable selection.
for segmentation.
as the final analytical model.
All of the above.

A

All of the above

27
Q

The classification accuracy, error rate, sensitivity and specificity are

independent from the cut-off chosen.
dependent upon the cut-off chosen.

A

dependent upon the cut-off chosen.

Adjusting the cut-off value changes the balance between sensitivity and specificity, which in turn affects the accuracy and the error rate.

28
Q

When building a decision tree with a training set for making the splitting decision and a validation set for making the stopping decision, the training set error is usually

higher than the validation set error.
lower than the validation set error.

A

lower than the validation set error.

The decision tree is fitted to the training data, so the training set error is typically lower than the validation set error.

29
Q

Given the decision tree below and a test set with 20 observations. Which of the records below is a False Negative (FN)?

Record with ID “3”
Record with ID “6”
Record with ID “12”
Record with ID “16”

A

Record with ID “3”
The tree predicts FALSE for this record, while its actual class is TRUE.

FN = the actual class is TRUE (in the table) but the predicted class is FALSE (from the decision tree).

30
Q

Given the gains chart below representing the performance of a certain response model by means of a lift curve. Suppose that you have a potential customer base of 4.000 clients. In other words, you can at most contact 4.000 people with a certain marketing initiative (e.g. a brochure).
How many respondents will react if you target the 1600 customers with the highest response score?

A

640
1600 contacted / 4000 total = 40% on the x-axis → the lift curve gives the point (40, 80).
The baseline for 40% of customers contacted is a 40% positive response, so 1600 × 40% = 640 respondents.

31
Q

Given the decision tree below and a test set with 20 observations, what is the number of true positives (TP)?

A

7
TP = actual class is bad AND predicted class is bad (with "bad" as the positive class).

Matrix layout:
TP  FP
FN  TN

32
Q

Given the decision tree below and a test set with 20 observations. Which of the records below is a False Negative (FN)?

A

10
FN: actual class = good, predicted class = bad

Matrix layout:
TN  FP
FN  TP

33
Q

Given the following metric: (FP+FN) / (TP+FP+TN+FN)

A

error rate

34
Q

Linear regression typically assumes that the target variable is

continuous.
categorical.

A

continuous

35
Q

Given the following metric: TN / (TN+FP). This is the

A

specificity (SPEC)

36
Q

Consider a data set with 50% good customers and 50% bad customers. This data set has a GINI of

A

0.5
Gini = 1 - (p_pos^2 + p_neg^2) = 1 - (0.25 + 0.25) = 0.5

37
Q

With a total client base of 10 000 people and 5000 responders on a marketing campaign, if we target the 6000 clients with the highest scores from our model, we expect to reach:

A

6000/10 000 = 60% contacted on the x-axis → the gains curve gives the point (0.6, 0.85).
0.85 × 5000 responders = 4250 responders.

38
Q

Using the classification error to build a decision tree, the gain of the monetary split is:

0.25.
0.15.
0.2.
0.5.

A

0.2 (uncertain)

39
Q

Consider a data set with 50% good customers and 50% bad customers. This data set has a gini of

A

Gini = 1 - (p_pos^2 + p_neg^2)
= 1 - (0.5^2 + 0.5^2)
= 1 - (0.25 + 0.25) = 0.5
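A minimal Python sketch of the Gini impurity formula used on this card:

```python
def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1 - sum(p ** 2 for p in proportions)

print(gini([0.5, 0.5]))  # 0.5 for a 50/50 split
print(gini([1.0, 0.0]))  # 0.0 for a pure node
```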

40
Q

The R-squared always varies between
The Pearson correlation coefficient always varies between

A

R-squared: 0 and 1
Pearson correlation: -1 and +1

41
Q

Given the gains chart below representing the performance of a certain response model by means of a lift curve. Suppose that you have a potential customer base of 8.000 clients. In other words, you can at most contact 8.000 people with a certain marketing initiative (e.g. a brochure).
How many respondents will react if you target the 800 customers with the highest response score?

A

240
800 contacted / 8000 total = 10% on the x-axis → the lift curve gives the point (10, 30).
800 contacted × 30% = 240 respondents.