5.3 Predictive Analytics (EN) Flashcards

1
Q

Given the decision tree below and a test set with 20 observations, what is the accuracy of this model?
incorrect: 7
correct: 13

A

accuracy = number of correct predictions / number of observations
= 13/20 = 0.65 (65%)
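A minimal Python sketch of this accuracy calculation (the counts of 13 correct out of 20 come from the card; the helper name is just for illustration):

```python
def accuracy(n_correct, n_total):
    """Fraction of test observations the model classifies correctly."""
    return n_correct / n_total

print(accuracy(13, 20))  # 0.65, i.e. 65%
```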

2
Q

You are in the process of building a decision tree for the dataset below. In the first step, you identify attribute “Color” as the best possible attribute to split the instances in the root node of the tree. As such, you end up with the so-called “decision stump” below. You are using the misclassification error as the impurity measure for constructing the tree.

Suppose that you want to further improve the tree and therefore look into how to further split “Internal Node 3”. What is the resulting impurity when you split “Internal Node 3”, based on the best attribute available?

A

1/4 (uncertain)
Internal Node 3 (Color = blue) contains 8 of the 20 instances; splitting it on the best remaining attribute gives 3/8 true.
Misclassification error = 1 - max p(i|t), i.e. 1 minus the proportion of the majority class in the node.
A decision stump is a simple decision tree with a single decision node and two leaf nodes.

3
Q

You are constructing a decision tree for the data set below. You use the misclassification error as the impurity metric for building the tree:
Error(t) = 1 - max p(i|t)
What is the impurity gain of the split when using the best attribute for the first split when building the tree?
What is the misclassification error of the split when using the best attribute for the first split when building the tree?

A

Impurity gain of the split: 4/20 = 0.2 (uncertain)
Misclassification error of the split: 8/20 = 0.4 (uncertain)

4
Q

In linear regression, the parameter coefficients are chosen in such a way that the

sum of squared residuals or errors is maximized.
sum of squared residuals or errors is minimized.
product of squared residuals or errors is minimized.
product of squared residuals or errors is maximized.

A

sum of squared residuals or errors is minimized.

5
Q

The higher the Area under the ROC curve (AUC) the
better the performance
worse the performance

A

better the performance

6
Q

Given the following Gains chart: …
With a total client base of 10 000 people and 5000 responders on a marketing campaign, if we target the 8000 clients with the highest scores from our model, we expect to reach:

1250 responders.
5000 responders.
2500 responders.
4750 responders.

A

4750 responders.

The x-axis is the % of clients contacted, the y-axis the % of responders captured.

8000 contacted / 10 000 total = 80% contacted on the x-axis → the gains curve gives the point (0.8, 0.95).
0.95 × 5000 responders = 4750.
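A small Python sketch of the same gains-chart reading (the 0.95 value is assumed to be read off the curve at x = 0.8, as on the card):

```python
total_clients = 10_000
total_responders = 5_000
contacted = 8_000

pct_contacted = contacted / total_clients  # 0.8 on the x-axis
pct_responders_captured = 0.95             # y-value read from the gains curve at x = 0.8

print(pct_responders_captured * total_responders)  # 4750.0 expected responders
```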

7
Q

Which of the following statements is NOT CORRECT about the k-nearest neighbor classifier?

It is intuitive and easy to understand.
It has a large computing power requirement.
It needs a value for k which should be determined upfront.
It is unaffected by the presence of irrelevant variables.

A

It is unaffected by the presence of irrelevant variables.
k-NN is sensitive to irrelevant variables: they add noise to the distance calculations and can distort which neighbours count as nearest.

8
Q

When the cut-off is set at its minimum (e.g., 0), then

the sensitivity becomes 1 and the specificity becomes 1.
the sensitivity becomes 1 and the specificity becomes 0.
the sensitivity becomes 0 and the specificity becomes 1.
the sensitivity becomes 0 and the specificity becomes 0.

A

the sensitivity becomes 1 and the specificity becomes 0.

When the cut-off is set at its minimum (e.g. 0), every instance is predicted positive and there are no negative predictions. All actual positives are therefore detected (sensitivity = 1), while no actual negatives are correctly identified (specificity = 0).

9
Q

Consider a data set with 100% good customers and 0% bad customers. This data set has an entropy of
0
0,5
1
10

A

0
Entropy is a measure of impurity, often used for making splits in decision trees.
The data set has 100% good customers and 0% bad customers, so p_good = 1 and p_bad = 0.
H = -1 × log2(1) - 0 × log2(0) = 0 (with 0 × log2(0) taken as 0).
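A minimal Python sketch of the entropy formula used here (the convention 0 × log2(0) = 0 is handled by skipping zero proportions):

```python
import math

def entropy(proportions):
    """Shannon entropy of a class distribution; 0 * log2(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([1.0, 0.0]))  # 0.0 -> pure node (100% good, 0% bad)
print(entropy([0.5, 0.5]))  # 1.0 -> maximally impure 50/50 split
```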

10
Q

Confusion matrix (rows = predicted class, columns = actual class):

              actual +   actual -
predicted +      23         16
predicted -      55          6

The classification accuracy is 29/100, the error rate is 71/100, the sensitivity is 23/78 and the specificity is 6/22.

The classification accuracy is 29/100, the error rate is 71/100, the sensitivity is 6/22 and the specificity is 23/78.

The classification accuracy is 71/100, the error rate is 29/100, the sensitivity is 23/78 and the specificity is 6/22.

The classification accuracy is 71/100, the error rate is 29/100, the sensitivity is 6/22 and the specificity is 23/78.

A

The classification accuracy is 29/100, the error rate is 71/100, the sensitivity is 23/78 and the specificity is 6/22.

True Positives (TP): 23
False Positives (FP): 16
False Negatives (FN): 55
True Negatives (TN): 6

Classification accuracy = (TP + TN) / total = (23 + 6) / (23 + 16 + 55 + 6) = 29/100
Error rate = (FP + FN) / total = (16 + 55) / 100 = 71/100
Sensitivity (recall) = TP / actual positives = 23 / (23 + 55) = 23/78
Specificity = TN / actual negatives = 6 / (16 + 6) = 6/22
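A small Python sketch that computes all four metrics from the confusion-matrix cells on this card (the function name is chosen for illustration):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, error rate, sensitivity (recall) and specificity from a confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,    # 29/100
        "error_rate": (fp + fn) / total,  # 71/100
        "sensitivity": tp / (tp + fn),    # 23/78
        "specificity": tn / (tn + fp),    # 6/22
    }

print(confusion_metrics(tp=23, fp=16, fn=55, tn=6))
```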

11
Q

Which statement is NOT CORRECT?

In terms of advantages, decision trees are easy to interpret and understand, assuming they are not too big.

Decision trees are non-parametric, because no assumptions of normality, symmetric distributions, or independence are needed.

Decision trees are very robust with respect to outliers.

Decision trees are often referred to as stable classifiers since they are very insensitive to changes in the training data.

A

Decision trees are often referred to as stable classifiers since they are very insensitive to changes in the training data.

Decision trees can be very sensitive to changes in the training data: a small change can produce a completely different tree, so they are unstable classifiers.

(Non-parametric = makes no assumptions about the underlying distribution of the data.)

12
Q

Netflix decision tree
Weather=sunny; Tired; No= No netflix –> III (I)
Weather=sunny; Tired; Yes= Netflix –> IIIII (II)
Weather=Rainy; Homework= No; Netflix –> II
Weather=Rainy; Homework= yes; Tired= No; No netflix –> II (I)
Weather=Rainy; Homework= yes; Tired= yes; netflix –> II (III)

The classification accuracy is 0.35, the error rate is 0.65, the sensitivity is 0.5, the specificity is 0.8.

The classification accuracy is 0.35, the error rate is 0.65, the sensitivity is 0.8, the specificity is 0.5.

The classification accuracy is 0.65, the error rate is 0.35, the sensitivity is 0.5, the specificity is 0.8.

The classification accuracy is 0.65, the error rate is 0.35, the sensitivity is 0.8, the specificity is 0.5.

A

The classification accuracy is 0.65, the error rate is 0.35, the sensitivity is 0.8, the specificity is 0.5.

Working back from the selected answer, with 20 test observations:
TP = 8, FP = 5, FN = 2, TN = 5
Classification accuracy = (TP + TN) / total = (8 + 5) / 20 = 13/20 = 0.65
Error rate = (FP + FN) / total = (5 + 2) / 20 = 7/20 = 0.35
Sensitivity (recall) = TP / (TP + FN) = 8 / (8 + 2) = 0.8
Specificity = TN / (TN + FP) = 5 / (5 + 5) = 0.5

13
Q

To avoid overfitting from happening when building a decision tree, various strategies can be adopted. One option is to split the data into a training set and a validation set. The optimal tree is then chosen where the

training set error is maximal.
validation set error is minimal.
training set error is minimal.
validation set error is maximal

A

validation set error is minimal.

By using a validation set, you can evaluate different tree sizes and select the one that provides the best performance on unseen data, thus avoiding overfitting.

14
Q

Which is the easiest decision to make when building a decision tree?
splitting decision
stopping decision
assignment decision

A

assignment decision

The class label of a leaf is simply the majority class of the training instances that end up in it, which makes this the easiest of the three decisions.

15
Q

Using the classification error to build a decision tree, the gain of the employment split is:

0.35.
0.33.
0.4.
0.5.

A

0.4

Total: 20 instances (10 churn, 10 no churn), so the parent error = 1 - 10/20 = 0.5.
Employed = yes: 8 instances, 0 churn → error = 0.
Employed = no: 12 instances, 10 churn and 2 no churn → error = 2/12.
Weighted error after the split = (8/20) × 0 + (12/20) × (2/12) = 2/20 = 0.1.
Gain = 0.5 - 0.1 = 0.4.
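A minimal Python sketch of this gain calculation, using the class counts from the card (the helper names are illustrative):

```python
def misclass_error(counts):
    """Misclassification error of a node: 1 minus the proportion of the majority class."""
    return 1 - max(counts) / sum(counts)

def split_gain(parent_counts, children_counts):
    """Gain = parent error minus the weighted average error of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * misclass_error(c) for c in children_counts)
    return misclass_error(parent_counts) - weighted

# Parent: 10 churn / 10 no churn; employed = yes -> [0, 8]; employed = no -> [10, 2]
print(split_gain([10, 10], [[0, 8], [10, 2]]))  # ~0.4
```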

16
Q

Consider a data set with 50% good customers and 50% bad customers. This data set has an entropy of

0
0.5
1
10

A

1

With k = 2 classes, each class has a proportion of 0.5:
H = -0.5 × log2(0.5) - 0.5 × log2(0.5)
H = -0.5 × (-1) - 0.5 × (-1) = 0.5 + 0.5 = 1

17
Q

Confusion matrix (rows = predicted class, columns = actual class):

              actual +   actual -
predicted +       0          1
predicted -      42         57

The classification accuracy is 57/100, the error rate is 43/100, the sensitivity is 57/58 and the specificity is 0.

The classification accuracy is 57/100, the error rate is 43/100, the sensitivity is 0 and the specificity is 57/58.

The classification accuracy is 43/100, the error rate is 57/100, the sensitivity is 57/58 and the specificity is 0.

The classification accuracy is 43/100, the error rate is 57/100, the sensitivity is 0 and the specificity is 57/57.

A

The classification accuracy is 57/100, the error rate is 43/100, the sensitivity is 0 and the specificity is 57/58.

Confusion matrix layout used here:
TP  FP
FN  TN

ACC = (TP + TN) / all observations
ERR = (FP + FN) / all observations
SENS (true positive rate, TPR) = TP / actual positives = TP / (TP + FN)
SPEC (true negative rate, TNR) = TN / actual negatives = TN / (FP + TN)

TP = 0
FP = 1
FN = 42
TN = 57

Accuracy = 57/100, error rate = 43/100, sensitivity = 0/42 = 0, specificity = 57/58.

18
Q

When predicting fraud, the target variable is

continuous
categorical

A

categorical

19
Q

Consider the following numbers from a confusion matrix: TP=60; TN=15; FP=10; FN=15. Which statement is CORRECT?

The classification accuracy equals 75%, the sensitivity 80% and the specificity 60%.

The classification accuracy equals 75%, the sensitivity 60% and the specificity 80%.

The classification accuracy equals 80%, the sensitivity 75% and the specificity 60%.

The classification accuracy equals 25%, the sensitivity 60% and the specificity 80%.

A

The classification accuracy equals 75%, the sensitivity 80% and the specificity 60%.

Accuracy = (60 + 15)/100 = 75%, sensitivity = 60/(60 + 15) = 80%, specificity = 15/(15 + 10) = 60%.

20
Q

In the ROC curve, the diagonal represents the

perfect model.
random model.

A

random model.

21
Q

Consider a data set with 100% good customers and 0% bad customers. This data set has a classification error of

0
0,5
1
10

A

0

The classification error = 1 - max(p_good, p_bad) = 1 - 1 = 0; the data set is pure, so nothing is misclassified.

22
Q

The Area under the ROC curve (AUC) represents the probability that

a randomly chosen good gets a higher score than a randomly chosen bad.

a randomly chosen bad gets a higher score than a randomly chosen good.

a randomly chosen good gets the same score as a randomly chosen bad.

A

a randomly chosen good gets a higher score than a randomly chosen bad.

23
Q

When predicting customer lifetime value (CLV), the target variable is

continuous.
categorical

A

continuous

CLV= total expected value a customer will generate for a business over the entire duration of their relationship with that business.

24
Q

Consider the bounding function f(z)=1/(1+e^(-z)). Which statement is CORRECT?

For z=0 it becomes 0.5, for z very big it will approach 1 and for z very small it will approach 0.

For z=0 it becomes 0.5, for z very big it will approach 0 and for z very small it will approach 1.

For z=0 it becomes 1, for z very big it will approach 0 and for z very small it will approach 0.5.

For z=0 it becomes 0, for z very big it will approach 0.5 and for z very small it will approach 0.

A

For z=0 it becomes 0.5, for z very big it will approach 1 and for z very small it will approach 0.
This is the behaviour of the sigmoid (logistic) function:

For z = 0: e^(-z) = 1, so f(z) = 1/(1 + 1) = 0.5.
For very large z: e^(-z) → 0, so f(z) → 1/(1 + 0) = 1.
For very small (very negative) z: e^(-z) becomes very large, so f(z) → 0.
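A minimal Python sketch of the logistic (sigmoid) function to confirm these three cases:

```python
import math

def sigmoid(z):
    """Bounding function f(z) = 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))     # 0.5
print(sigmoid(100))   # ~1.0 for very large z
print(sigmoid(-100))  # ~0.0 for very small (very negative) z
```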

25
Q

A tree with many splits and leaf nodes is likely to

overfit.
underfit.

A

overfit

26
Q

Decision trees can be used for

for doing categorization.
for variable selection.
for segmentation.
as the final analytical model.
All of the above.

A

All of the above

27
Q

The classification accuracy, error rate, sensitivity and specificity are

independent from the cut-off chosen.
dependent upon the cut-off chosen.

A

dependent upon the cut-off chosen.

Adjusting the cut-off value changes the balance between sensitivity and specificity, which in turn affects the accuracy and the error rate.

28
Q

When building a decision tree with a training set for making the splitting decision and a validation set for making the stopping decision, the training set error is usually

higher than the validation set error.
lower than the validation set error.

A

lower than the validation set error.

The decision tree is fitted to the training data, so the training set error is typically lower than the validation set error.

29
Q

Given the decision tree below and a test set with 20 observations. Which of the records below is a False Negative (FN)?

Record with ID “3”
Record with ID “6”
Record with ID “12”
Record with ID “16”

A

Record with ID “3”
The tree predicts FALSE for this record, while its actual class is TRUE.

FN = the actual class is TRUE (in the table) but the predicted class is FALSE (from the decision tree).

30
Q

Given the gains chart below representing the performance of a certain response model by means of a lift curve. Suppose that you have a potential customer base of 4.000 clients. In other words, you can at most contact 4.000 people with a certain marketing initiative (e.g. a brochure).
How many respondents will react if you target the 1600 customers with the highest response score?

A

640
1600 contacted / 4000 total = 40% on the x-axis → the lift curve gives the point (40, 80).
The baseline for 40% of customers contacted is a 40% positive response, so 1600 × 40% = 640 respondents.

31
Q

Given the decision tree below and a test set with 20 observations, what is the number of true positives (TP)?

A

7
TP = actual class is bad AND predicted class is bad (with "bad" as the positive class).

Matrix layout:
TP  FP
FN  TN

32
Q

Given the decision tree below and a test set with 20 observations. Which of the records below is a False Negative (FN)?

A

10
FN: actual class = good, predicted class = bad

Matrix layout:
TN  FP
FN  TP

33
Q

Given the following metric: (FP+FN) / (TP+FP+TN+FN)

A

error rate

34
Q

Linear regression typically assumes that the target variable is

continuous.
categorical.

A

continuous

35
Q

Given the following metric: TN / (TN+FP). This is the

A

specificity (SPEC)

36
Q

Consider a data set with 50% good customers and 50% bad customers. This data set has a GINI of

A

0.5
Gini = 1 - (p_pos^2 + p_neg^2) = 1 - (0.25 + 0.25) = 0.5

37
Q

With a total client base of 10 000 people and 5000 responders on a marketing campaign, if we target the 6000 clients with the highest scores from our model, we expect to reach:

A

6000/10 000 = 60% contacted on the x-axis → the gains curve gives the point (0.6, 0.85).
0.85 × 5000 responders = 4250 responders.

38
Q

Using the classification error to build a decision tree, the gain of the monetary split is:

0.25.
0.15.
0.2.
0.5.

A

0.2 (uncertain)

39
Q

Consider a data set with 50% good customers and 50% bad customers. This data set has a gini of

A

Gini = 1 - (p_pos^2 + p_neg^2)
= 1 - (0.5^2 + 0.5^2)
= 1 - (0.25 + 0.25) = 0.5
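A minimal Python sketch of the Gini impurity formula used on this card:

```python
def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1 - sum(p ** 2 for p in proportions)

print(gini([0.5, 0.5]))  # 0.5 for a 50/50 split
print(gini([1.0, 0.0]))  # 0.0 for a pure node
```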

40
Q

The R-squared always varies between
The Pearson correlation coefficient always varies between

A

R-squared: 0 and 1
Pearson correlation: -1 and +1

41
Q

Given the gains chart below representing the performance of a certain response model by means of a lift curve. Suppose that you have a potential customer base of 8.000 clients. In other words, you can at most contact 8.000 people with a certain marketing initiative (e.g. a brochure).
How many respondents will react if you target the 800 customers with the highest response score?

A

240
800 contacted / 8000 total = 10% on the x-axis → the lift curve gives the point (10, 30).
800 contacted × 30% = 240 respondents.