Exam 2 Questions Flashcards

(30 cards)

1
Q

Recommender system for a newly launched online bookstore: 1 million books, but only 10,000 ratings. Which is the best choice of recommender system?
a. User-User collaborative filtering
b. Item-Item collaborative filtering
c. Content-based recommendation

A

c.) Content-based recommendation

Content-based recommendation does not rely on user-user or item-item similarities, but on the underlying features of the items themselves - in this case, the books. It can therefore overcome the cold-start problem that CF methods suffer from, and it can be applied to a sparse user-item matrix.

2
Q

Matrix factorization method:
Determine the Baseline Estimate of the User’s rating of an Item
* Global avg. rating r_g = 3.4
* User avg. rating r_u = 3.0 -> User bias = r_u - r_g = -0.4
* Item avg. rating r_i = 4.1 -> Item bias = r_i - r_g = +0.7

A

Baseline Estimate:
r_g + User bias + Item bias = 3.4 - 0.4 + 0.7 = 3.7
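A minimal Python sketch of this computation (the function name and signature are just for illustration):

```python
def baseline_estimate(r_g, r_u, r_i):
    """Baseline estimate: global average + user bias + item bias."""
    user_bias = r_u - r_g   # 3.0 - 3.4 = -0.4
    item_bias = r_i - r_g   # 4.1 - 3.4 = +0.7
    return r_g + user_bias + item_bias

print(round(baseline_estimate(3.4, 3.0, 4.1), 2))  # 3.7
```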

3
Q

Item-item collaborative filtering application steps: predict the rating of an unknown user-item pair.

A
  1. Calculate the average rating of each item
  2. Compute the similarity between items, e.g.:
    a. Cosine similarity
    b. Jaccard similarity
    c. Pearson similarity: first subtract each item's average from its ratings, then compute the cosine similarity on the centered ratings
  3. Calculate the weighted average (see the sketch below):
    a. Choose the n items with the highest similarity to the item in question
    b. Multiply each of the n similarities by that item's rating and sum the products
    c. Divide by the sum of the n similarities
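A minimal numpy sketch of these steps, assuming a small ratings matrix R (rows = users, columns = items) where 0 means "unrated"; the function name and the toy data are made up for illustration:

```python
import numpy as np

def predict_item_item(R, user, item, n=2):
    mask = R > 0
    # 1. Average rating of each item (over observed ratings only)
    item_avg = R.sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    # 2. Pearson similarity = cosine similarity on mean-centered ratings
    centered = np.where(mask, R - item_avg, 0.0)
    target = centered[:, item]
    norms = np.linalg.norm(centered, axis=0) * np.linalg.norm(target)
    sims = centered.T @ target / np.maximum(norms, 1e-12)
    # 3. Weighted average over the n most similar items the user has rated
    rated = np.flatnonzero(mask[user])
    rated = rated[rated != item]
    top = rated[np.argsort(sims[rated])[-n:]]
    return sims[top] @ R[user, top] / np.abs(sims[top]).sum()

R = np.array([[5, 4, 0],
              [4, 5, 3],
              [1, 2, 1]], dtype=float)
print(predict_item_item(R, user=0, item=2))  # predicted rating for the 0 entry
```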
4
Q

Application steps of Hierarchical Clustering

A
  1. Merge the two nearest clusters (in the beginning, every point is its own cluster)
  2. Compute the new centroid of the merged cluster
  3. Repeat until one cluster remains or a stopping criterion is met (see the sketch below)
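A short sketch using scipy's agglomerative implementation, which performs exactly this merge-and-recompute loop (the toy points are made up):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
# 'centroid' linkage merges the two nearest clusters, then recomputes the centroid
Z = linkage(X, method="centroid")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(labels)  # e.g. [1 1 2 2]
```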
5
Q

T/F: GMMs are a method of hard clustering.

A

False

GMMs assign each point a probability of belonging to each cluster, which makes them a soft clustering method.
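A quick scikit-learn illustration on made-up 1-D data: predict_proba exposes the soft (probabilistic) memberships, while predict collapses them to hard labels:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[0.0], [0.2], [4.0], [4.2], [2.0]])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X))  # soft: each row is a probability over the components
print(gmm.predict(X))        # hard labels, the argmax of the probabilities
```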

6
Q

T/F: The updating process (i.e., updating centers and assigning data points to centers) of K-means is similar to the updating process of EM.

A

True

Both alternate between assigning data points to centers (the assignment step / E-step) and updating the centers based on those assignments (the update step / M-step).

7
Q

T/F: EM is a globally optimized algorithm.

A

False

As is typical for iterative optimization algorithms, EM only converges to a local optimum, which depends on the initialization.

8
Q

T/F: EM includes the steps of Expectation (E) and Marginalization (M).

A

False

EM consists of the Expectation (E) and Maximization (M) steps, not Marginalization.

9
Q

Given a set of training data and two clustering algorithms - K-means, and a Gaussian mixture model trained using EM - will these two algorithms produce the same cluster centers (means) for this data set? Explain why or why not.

A

Very similar, but not necessarily equal. Since the data is separated very clearly, both K-means and the GMM will group it into two clusters. However, due to the probabilistic nature of the EM method that the GMM uses - its centers are probability-weighted means over all points - the centers of K-means and the GMM can differ slightly.

10
Q

Consider m data points X and k mixture distributions Omega, where wij denotes the membership of xi in mixture j. Let N(xi | omega_j) denote the probability that xi is drawn from the Gaussian distribution N(omega_j). Derive the optimization objective of the GMM (i.e., the likelihood of X being generated by the GMM).

A

Deriving the GMM objective:
  1. Take the weighted sum of the likelihoods of xi belonging to each Gaussian distribution omega_j:

P(xi | Omega) = sum [j=1,k] ( wij * N(xi | omega_j) )

  2. Compute the likelihood of all m data points:

L(Omega) = prod [i=1,m] ( P(xi | Omega) )

  3. Maximize L(Omega) w.r.t. wij and Omega, subject to the constraint sum [j=1,k] (wij) = 1
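A minimal numpy/scipy sketch of this objective, using shared mixture weights w_j in place of the per-point memberships wij and working in log space for numerical stability; all parameter values are made-up placeholders:

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.1, 0.3, 3.9, 4.2])   # m = 4 data points
w = np.array([0.5, 0.5])             # mixture weights, sum to 1
mu = np.array([0.0, 4.0])            # Gaussian means
sigma = np.array([1.0, 1.0])         # Gaussian standard deviations

# P(xi | Omega) = sum [j=1,k] ( w_j * N(xi | omega_j) )
p_xi = (w * norm.pdf(x[:, None], mu, sigma)).sum(axis=1)

# L(Omega) = prod [i=1,m] ( P(xi | Omega) ); take the log to avoid underflow
log_likelihood = np.log(p_xi).sum()
print(log_likelihood)
```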
11
Q

What are the three categories of Machine Learning to Rank tasks?

A
  1. Bipartite LTR
  2. K-partite LTR
  3. Real-value-based LTR
12
Q

What are the differences between RankSVM, RankBoost, and RankNet in terms of loss functions?

A

RankSVM: Hinge loss = max(1 - x, 0)
Not smooth (kink at x = 1); penalizes incorrect rankings moderately (linearly)

RankBoost: Exponential loss = exp(-x)
Smooth; penalizes incorrect rankings heavily

RankNet: Logistic loss = log(1 + exp(-x))
Smooth; penalization more moderate than the exponential loss
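A small sketch of the three pairwise losses as functions of the score difference x of a ranked pair (standard textbook formulations; individual papers differ in details):

```python
import numpy as np

def hinge_loss(x):        # RankSVM: not smooth (kink at x = 1), linear penalty
    return np.maximum(1.0 - x, 0.0)

def exponential_loss(x):  # RankBoost: smooth, penalizes wrong rankings heavily
    return np.exp(-x)

def logistic_loss(x):     # RankNet: smooth, more moderate than exponential
    return np.log(1.0 + np.exp(-x))

x = np.linspace(-2.0, 2.0, 5)
for f in (hinge_loss, exponential_loss, logistic_loss):
    print(f.__name__, f(x))  # each loss decreases monotonically in x
```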

13
Q

Why do the loss functions of RankSVM, RankBoost, and RankNet all monotonically decrease?

A

The argument of the loss function is the score difference of a ranked pair: the lower its value, the more incorrect the ranking is. Thus, lower values must be penalized more than higher values, i.e., all loss functions must monotonically decrease.

14
Q

T/F: the K-partite ranking uses a divide and conquer strategy to decompose the K-partite ranking task into a single bipartite ranking task.

A

False.

K-partite ranking decomposes the task into multiple bipartite ranking tasks, not a single one.

15
Q

Which ranking model is better? Explain why.
        Golden-standard     Model #1   Model #2
        (actual) scores
Item A       0.90             0.88     1000000
Item B       0.85             0.89       10000
Item C       0.80             0.83          10

A

Model #2 is better.

Ranking is concerned with the order of the objects, not with how close the predicted values are to the actual scores. Model #1 predicts values close to the gold standard but gets the order wrong (it ranks Item B above Item A); Model #2 predicts the order correctly and separates the items well, too.
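One way to make "order matters, not values" concrete is a rank correlation such as Kendall's tau (via scipy); the scores are taken from the table above:

```python
from scipy.stats import kendalltau

gold    = [0.90, 0.85, 0.80]       # Items A, B, C
model_1 = [0.88, 0.89, 0.83]       # ranks B above A: wrong order
model_2 = [1_000_000, 10_000, 10]  # correct order despite extreme values

tau1, _ = kendalltau(gold, model_1)
tau2, _ = kendalltau(gold, model_2)
print(tau1, tau2)  # ~0.33 (one discordant pair) vs 1.0 (perfect agreement)
```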

16
Q

What are the six key components of a reinforcement learning framework?

A
  1. Agent
  2. Environment
  3. Action
  4. Observation
  5. State
  6. Reward
17
Q

Answer True or False only: In a reinforcement learning system, after taking an action, the state of the environment must change.

A

False

An action does not have to change the state; the environment can transition back to the same state (a self-transition).

18
Q

What are the benefits of Deep-Q Network (DQN) over classic Q-learning (i.e., the Q table)?

A
  1. Can handle large and/or high-dimensional state spaces, and is thus scalable
  2. Generalizes better by learning the underlying features of the states
19
Q

Which action should the agent take when st = s2 at time t? Why?
      s1     s2     s3     s4
a1   1.20   0.60   2.50   1.40
a2   0.33   2.40   1.31   1.27
a3  -0.90  -2.80   1.00   0.80

A

Action a2.

The policy is to take the action that maximizes Q(s, a):
Q(s2, a2) = 2.40 > Q(s2, a1) = 0.60 > Q(s2, a3) = -2.80

20
Q

After taking the action suggested in Q4.1 (a2), suppose the discount factor is gamma = 0.9, the state transfers from s2 to s4 after taking action a2, and the reward r is 0.7.

Table:
      s1     s2     s3     s4
a1   1.20   0.60   2.50   1.40
a2   0.33   2.40   1.31   1.27
a3  -0.90  -2.80   1.00   0.80

Please update the Q-table and write down the updated Q-table.

Note: Only one value in the table needs updating, and you might need the Bellman Equation:

Q(st, at) = r(st, at) + gamma * max [at+1] { Q(st+1, at+1) }

A
  1. Determine the max Q-value of the new state:
    max [a] { Q(s4, a) } = Q(s4, a1) = 1.4
  2. Update the Q-value of the earlier state-action pair:
    Q'(s2, a2) = 0.7 + 0.9 * 1.4 = 1.96
  3. Updated table:

          s1     s2     s3     s4
    a1    1.20   0.60   2.50   1.40
    a2    0.33   1.96   1.31   1.27
    a3   -0.90  -2.80   1.00   0.80
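The same update as a numpy sketch (rows = actions a1-a3, columns = states s1-s4, mirroring the table above):

```python
import numpy as np

Q = np.array([[ 1.20,  0.60, 2.50, 1.40],
              [ 0.33,  2.40, 1.31, 1.27],
              [-0.90, -2.80, 1.00, 0.80]])
gamma, r = 0.9, 0.7
a, s, s_next = 1, 1, 3                    # a2, s2, s4 (0-indexed)

Q[a, s] = r + gamma * Q[:, s_next].max()  # Bellman: 0.7 + 0.9 * 1.4 = 1.96
print(Q)
```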
21
Q

What are the weaknesses of policy gradient?

A
  • Requires many transition data points to train the policy function
  • Computationally expensive
22
Q

Reinforcement learning and feature selection are two machine learning tasks. In reinforcement learning, we take actions to interact with the environment and change the state of the environment, receive rewards for the actions, and update policy functions until the reward is maximized. In feature selection, we select a subset of features, observe model accuracy, and then re-select features until the model's accuracy is maximized.
1. What are the commonalities and distinctions between reinforcement learning and feature selection?
2. Can you use reinforcement learning to conduct feature selection?
3. If yes, how are you going to design such reinforcement learning based feature selection system? If not, please explain why.

A

Similarities:
1. Both are iterative processes that maximize a value (reward / accuracy)
2. Both repeatedly choose among options (actions / feature subsets) and evaluate the outcome with a model

Distinctions:
1. RL is typically sequential - previous action/state pairs influence subsequent ones; feature selection is not sequential, the order of selections does not matter
2. RL actions change the environment; feature selection doesn't change the set of features to choose from

Yes.

Design (a simplified sketch follows below):
State: the subset of currently selected features
Action: adding or removing certain features
Reward: accuracy of the trained model
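A deliberately simplified sketch of such a system; the evaluate helper is a hypothetical stand-in for training and scoring a model, and the greedy random-search policy would be replaced by a learned policy (e.g., a DQN) in a real design:

```python
import random

def evaluate(features):
    """Hypothetical stand-in: train a model on `features`, return its accuracy."""
    return random.random()

all_features = ["f1", "f2", "f3", "f4"]
state = set()                                  # state: currently selected features
best_acc = 0.0
for step in range(20):
    # action: add an unselected feature or remove a selected one
    actions = [("add", f) for f in all_features if f not in state] \
            + [("remove", f) for f in state]
    op, f = random.choice(actions)             # random policy, for the sketch only
    new_state = state | {f} if op == "add" else state - {f}
    reward = evaluate(new_state)               # reward: model accuracy
    if reward > best_acc:                      # keep the subset if it improved
        state, best_acc = new_state, reward
print(state, best_acc)
```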

23
Q

Multiple choice: Which of the following algorithms is/are not an example of an ensemble method?

A. Learning to Rank
B. Random Forest
C. AdaBoosting
D. Decision Tree

A

A and D. Learning to Rank and a single Decision Tree are not ensemble methods; Random Forest and AdaBoost both are.

24
Q

Which of the following options is/are correct regarding the benefits of ensemble models?
1. Better performance
2. Generalized and robust models
3. Better explainability and interpretability

A. 1 and 3
B. 2 and 3
C. 1 and 2
D. 1, 2 and 3

A

C. 1 and 2. Ensembles improve performance, generalization, and robustness, but they typically reduce explainability and interpretability compared to a single model.

25
Q

True or False: Ensemble learning can only be applied to supervised learning methods.
A. True B. False

A

B. False

Ensemble ideas also apply to unsupervised learning, e.g., consensus clustering.
26
Q

True or False: Ensembles will yield bad results when there is significant diversity among the models. Note: all individual models have meaningful and good predictions.
A. True B. False

A

B. False

Diversity among good base models generally improves ensemble results, since it reduces the chance that the models make the same errors.
27
Q

True or False: An ensemble of classifiers may or may not be more accurate than any of its individual models.
A. True B. False

A

A. True
28
Q

In an election, N candidates are competing against each other and people are voting for one of the candidates. Voters don't communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the above-described election procedure? Hint: persons are like the base models of an ensemble method.
A. Bagging B. Boosting C. A or B D. None of these

A

A. Bagging

Independent voters whose votes are aggregated correspond to independently trained base models whose predictions are aggregated.
29
Q

Suppose you are working on a binary classification problem, and there are 3 models, each with 70% accuracy. If you ensemble these models using majority voting, what is the minimum accuracy you can get?
A. Always greater than 70% B. Always greater than or equal to 70% C. It can be less than 70% D. None of these

A

C. It can be less than 70%

Each model is wrong on 30% of cases, and in the worst case the errors overlap so that two models are wrong at the same time on up to (3 x 30%) / 2 = 45% of cases, giving a majority-vote accuracy as low as 55%.
30
Q

Suppose that in a binary classification problem you are given the following predictions of three models (M1, M2, M3) for five observations of the test data set. What will be the output of the ensemble model if we use the majority voting method?

M1  M2  M3
1   1   0
0   1   0
0   1   1
1   0   1
1   1   1

A

[1, 0, 1, 1, 1]
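The same majority vote computed with numpy (rows = observations, columns = M1, M2, M3):

```python
import numpy as np

preds = np.array([[1, 1, 0],
                  [0, 1, 0],
                  [0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]])
majority = (preds.sum(axis=1) >= 2).astype(int)  # 1 wins if at least 2 of 3 vote 1
print(majority)  # [1 0 1 1 1]
```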