Assignment Questions Flashcards

1
Q

The marketing team requires your assistance in evaluating model XY, which has a total accuracy of 60%. This model correctly predicts 75% of the customers who will not adhere to a digital subscription. The new digital subscription will be rolled out to customers for weekly distribution at a cost of $50 a year. The current value of the paper magazine is $60 per year. The marketing team has estimated that targeting a customer will cost $2 and, in addition, will incur the cost of a one-year discount of 10% of the value of the subscription. Transferring the magazines to a digital format has associated costs for the company, which should be distributed across the whole customer base in the first year of the transition. Different business models are being considered:

  1. Fixed yearly cost of $20 per customer;
  2. Pay per use.

Considering the fixed yearly cost case, what is the expected profit for model XY?

Considering the fixed yearly cost case, using model XY, over which probability should we target each consumer in the dataset?


A
2
Q

Based on the information from the previous questions and Table 1 draw the full dendrogram when using complete linkage. Show the steps of computations.

A

Build the dendrogram by repeatedly merging the two closest customers or clusters. Because complete linkage is used, after each merge update the distance table so that the distance from the new cluster to every other cluster or customer is the furthest (maximum) distance between their members.
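
The merge-and-update procedure can be sketched in a few lines of Python. This is only an illustration: the four customers and their distances are hypothetical (Table 1 is not reproduced here), and `complete_linkage` is a made-up helper name.

```python
from itertools import combinations

def complete_linkage(dist):
    """Agglomerative clustering with complete linkage.

    dist maps frozenset({a, b}) to the distance between items a and b.
    Returns the merges in order as (cluster_a, cluster_b, merge_distance).
    """
    items = set()
    for pair in dist:
        items |= pair
    clusters = [frozenset([i]) for i in sorted(items)]

    def cluster_distance(c1, c2):
        # Complete linkage: the furthest pair of members defines the distance.
        return max(dist[frozenset([a, b])] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        # Merge the two clusters that are currently closest.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((set(c1), set(c2), cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

# Hypothetical distances between four customers (NOT the values in Table 1).
example = {
    frozenset(["A", "B"]): 1.0,
    frozenset(["A", "C"]): 4.0,
    frozenset(["A", "D"]): 5.0,
    frozenset(["B", "C"]): 3.0,
    frozenset(["B", "D"]): 6.0,
    frozenset(["C", "D"]): 2.0,
}
merges = complete_linkage(example)
for a, b, d in merges:
    print(sorted(a), sorted(b), d)
```

The merge distances give the heights at which the branches of the dendrogram join.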

3
Q

Explain the differences between hierarchical clustering and objective based clustering.

A

The main difference is that hierarchical clustering focuses on the similarities between individual instances and how those similarities link them together, while objective-based methods such as k-means clustering focus on the clusters themselves. K-means computes the distance between each instance and the cluster centers. Hierarchical clustering computes the distance between all pairs of clusters on each iteration.

In objective-based clustering, the within-cluster distance is minimised, while the distance between clusters should be maximised.

Also, objective-based clustering can take many forms, depending on the objective function used.

Objective-based methods such as k-means must also be run many times, because the clustering result depends on the initial centroid locations; hierarchical clustering does not have this dependence.

Also, you need to define the number of clusters in advance for objective-based clustering.
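
As a rough illustration of the objective-based side, the sketch below runs a fixed number of k-means steps from caller-supplied starting centroids. The points and starting centroids are hypothetical, and `kmeans` is an illustrative helper rather than any library's API; trying different starting centroids is how the dependence on initialisation shows up.

```python
def kmeans(points, centroids, steps=5):
    """Run a fixed number of k-means (Lloyd) steps from given starting centroids."""
    for _ in range(steps):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D observations and starting centroids (k = 2 chosen in advance).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, [(0, 0), (10, 10)])
print(labels, centroids)
```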

4
Q

Given table 2, plot the observations, start by randomly assigning a cluster label to each observation and do five steps of the K-means clustering.

A
5
Q

Consider the logistic regression model f(x) = −1.48 − 0.11x1 + 0.05x2, where x1 and x2 are ‘Years of Subscription’ and ‘Age’ respectively, trained on different data from that presented in Table 3. Draw the ROC curve for this model and for a random classifier. To alleviate the calculation burden, you can use the first 10 customers and use only 4 points to draw the curve. Hint: see figure 8.1. For these exercises, you are “ranking instead of classifying”.

A
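
One possible way to approach this, sketched in Python: score each customer with f(x), rank by score, and lower the threshold one customer at a time to collect (false positive rate, true positive rate) points. The customers below are hypothetical, not the rows of Table 3; tied scores are not handled; the helper names are illustrative. A random classifier corresponds to the diagonal from (0, 0) to (1, 1).

```python
def score(years, age):
    # Linear score from the question: f(x) = -1.48 - 0.11*x1 + 0.05*x2.
    return -1.48 - 0.11 * years + 0.05 * age

def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold one customer at a time."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical customers: (years of subscription, age, true label). NOT Table 3.
customers = [(1, 60, 1), (2, 50, 1), (5, 45, 0), (3, 30, 1), (8, 35, 0), (10, 25, 0)]
scores = [score(years, age) for years, age, _ in customers]
labels = [label for _, _, label in customers]
points = roc_points(scores, labels)
print(points)
```

Because the logistic function is monotonic, ranking by f(x) gives the same ordering as ranking by the predicted probability.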
6
Q

Based on the same model, draw the profit curve for this model and for a random classifier. To alleviate the calculation burden, you can use the first 10 customers and use only 4 points to draw the curve.

A
7
Q

Based on the same model, draw the cumulative response curve for this model and for a random classifier. To alleviate the calculation burden, you can use the first 10 customers and use only 4 points to draw the curve.

A

You just have to look at the confusion matrix and make the calculations.

Very straightforward if you have the cumulative response curve.

8
Q

Classify David using a naive Bayes classifier, based on the information from all customers. David is 32 years old, enjoys fishing and has been a subscriber for 3 years. Show the steps of the computations. Explain the assumptions made.

A
9
Q

Using the naive Bayes classifier you created in the previous question, classify the first three training examples. Does the model classify the first 3 training examples correctly?

A
10
Q

Research the use of the m-estimate of probability in the naive Bayes classifier. Classify David using the m-estimate of probability with an equivalent sample size m = 4. Use 3 different values for the prior estimate p for P(C = c|E).

A
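
As a starting point, the m-estimate replaces the raw frequency n_c / n with (n_c + m·p) / (n + m), where n_c is the number of matching examples, n the number of examples in the class, p the prior estimate and m the equivalent sample size. The sketch below uses hypothetical counts, not the customer table, and `m_estimate` is an illustrative helper name.

```python
def m_estimate(n_c, n, p, m):
    """m-estimate of a probability: (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# Hypothetical counts: 2 of 5 examples in a class share the attribute value.
for prior in (0.25, 0.5, 0.75):
    print(prior, m_estimate(n_c=2, n=5, p=prior, m=4))
```

With m = 0 the estimate reduces to the raw frequency, and a zero count no longer produces a zero probability once m > 0.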
11
Q

Using the Bayes’ optimal classifier you created in the previous question, classify the first three training examples. Does the model classify the first 3 training examples correctly?

A

Notice in the bottom example you need to find complete matches.

12
Q

Explain what an appropriate baseline model would be, based on the same concept as a naive Bayes classifier and based on the same concept as decision trees.

A

Naive Bayes: classify without using any attributes, i.e. predict from the class priors alone (always the majority class).

Decision tree: the most basic version of a tree, with a single split at the root (a decision stump).

13
Q

Research what a perceptron is and what the role of the activation function is.

A

A perceptron is a single-layer neural network. It is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether an input, represented by a vector of numbers, belongs to some specific class. A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold, or -1 otherwise.

“Perceptron is just a node, it’s the most basic feedforward NN. It consists one layer of one node. It does binary classification.”

The activation function decides whether a neuron should be activated. It is applied to the weighted sum of the inputs plus the bias. The reason for adding the activation function is to introduce non-linearity into the output of the neuron.
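
A minimal sketch of the perceptron output rule described above, using a step activation. The weights and threshold are hypothetical values chosen so that the unit behaves like a logical AND, and `perceptron` is an illustrative helper name.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else -1."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # Step activation: compare the linear combination with the threshold.
    return 1 if weighted_sum > threshold else -1

# Hypothetical weights and threshold that make the unit act like a logical AND.
weights, threshold = [1.0, 1.0], 1.5
outputs = [perceptron(x, weights, threshold) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```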

14
Q

In neural networks training, what is an epoch and an iteration?

A

An epoch is one forward and one backward pass of the entire training data set.

Each complete epoch consists of several iterations. An iteration is one forward and one backward pass over a single batch of the training data, so the number of iterations per epoch equals the number of batches needed to cover the data once.
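
The relationship can be checked with a small counting sketch. The sample counts and batch sizes are hypothetical, and both helper names are illustrative.

```python
import math

def iterations_per_epoch(n_samples, batch_size):
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(n_samples / batch_size)

def count_iterations(n_samples, batch_size, epochs):
    """Count batch steps by walking through the data epoch by epoch."""
    iterations = 0
    for _ in range(epochs):
        for _start in range(0, n_samples, batch_size):
            iterations += 1  # one forward and backward pass on one batch
    return iterations

print(iterations_per_epoch(1000, 100))
print(count_iterations(1000, 100, epochs=3))
```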

15
Q

Research why the softmax function is used in the output layer for classification problems.

A
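
As a starting point for this research: softmax maps a vector of real-valued output scores to positive values that sum to 1, so they can be read as class probabilities. The sketch below is a plain-Python illustration with hypothetical scores; subtracting the maximum is a common numerical-stability step and does not change the result.

```python
import math

def softmax(scores):
    """Map real-valued scores to positive values that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical output-layer scores
print(probs)
```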
16
Q

Research the difference between online learning and batch learning.

A
17
Q

Consider the diagram of a neural network with two inputs (i1 is the years of subscription and i2 is the age), two hidden neurons (including a bias) and two output neurons (o1 is the positive class), depicted in figure 1. The weights of the network are as follows: w1 = 0.15, w2 = 0.25, w3 = 0.10, w4 = 0.40, w5 = w6 = 0.55, w7 = 0.50, w8 = 0.25, b1 = 0.35, b2 = 0.6. Each of the nodes h1, h2 uses a logistic function as its activation function, while the nodes o1, o2 use a softmax function. What is the type of this neural network?

A
18
Q

Calculate the error of this network for the first 3 customers using online learning. If you run into problems with the softmax function, use a logistic function for the nodes o1, o2. How would the calculation differ if you were using batch learning with a batch size of 3?

A

Notice that the output equation (the weighted sum at each output node) already appears in the exponent of the e term of the softmax function.
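
A forward pass for one customer might look like the sketch below. Several things here are assumptions, since figure 1 and the customer table are not reproduced: the wiring (w1, w2 into h1; w3, w4 into h2; w5, w6 into o1; w7, w8 into o2; b1 shared by the hidden nodes, b2 by the output nodes), the squared-error measure, and the example input values.

```python
import math

# Weights and biases as given in the question.
W = dict(w1=0.15, w2=0.25, w3=0.10, w4=0.40, w5=0.55, w6=0.55, w7=0.50, w8=0.25)
B1, B2 = 0.35, 0.6

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(i1, i2):
    # ASSUMED wiring (figure 1 is not available): w1, w2 -> h1; w3, w4 -> h2;
    # w5, w6 -> o1; w7, w8 -> o2.
    h1 = logistic(W["w1"] * i1 + W["w2"] * i2 + B1)
    h2 = logistic(W["w3"] * i1 + W["w4"] * i2 + B1)
    net_o1 = W["w5"] * h1 + W["w6"] * h2 + B2
    net_o2 = W["w7"] * h1 + W["w8"] * h2 + B2
    # Softmax: the net inputs are the exponents of the e terms.
    e1, e2 = math.exp(net_o1), math.exp(net_o2)
    return e1 / (e1 + e2), e2 / (e1 + e2)

def squared_error(outputs, targets):
    # ASSUMED error measure: half the sum of squared differences.
    return 0.5 * sum((t - o) ** 2 for o, t in zip(outputs, targets))

# Hypothetical customer: 3 years of subscription, age 32, positive class.
o1, o2 = forward(3, 32)
error = squared_error((o1, o2), (1, 0))
print(o1, o2, error)
```

With online learning the error is computed and the weights updated after each customer; with a batch of 3 the errors (or gradients) of the three customers are combined before a single update.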

19
Q

How would the calculations of the backward pass differ if you were using a hyperbolic tangent activation function? What if you used the ReLU activation function?

A

Rather than using the derivative of the logistic (or softmax) function, we would use the derivative of the new activation function to compute the gradient: 1 − tanh²(z) for the hyperbolic tangent, and 1 for z > 0 and 0 otherwise for ReLU.
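
The derivatives that would be swapped into the backward pass can be written out as follows. This is a sketch with illustrative function names; the ReLU derivative at exactly 0 is set to 0 here by convention.

```python
import math

def logistic_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # derivative of the logistic function

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2  # derivative of the hyperbolic tangent

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # derivative of ReLU (0 at z = 0 by convention)

for z in (-2.0, 0.0, 2.0):
    print(z, logistic_grad(z), tanh_grad(z), relu_grad(z))
```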

20
Q

Research what deep neural networks are. How do they compare with neural networks?

A

A deep neural network is an artificial neural network with multiple layers between the input and output layers. Essentially, the main difference between a neural network and a deep neural network is the number of hidden layers.