Exam 2 Questions Flashcards
(30 cards)
Rec System for a newly launched online bookstore: 1 million books, but only 10,000 ratings. Which is the best Rec System:
a. User-User collaborative filtering
b. Item-Item collaborative filtering
c. Content-based recommendation
c.) Content-based recommendation
Content-based recommendation does not rely on User-User or Item-Item similarities, but on the underlying features of the items themselves - in this case, the books. It can therefore overcome the cold-start problem that CF methods suffer from, and it can be applied to a sparse User-Item matrix.
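A minimal sketch of the idea, assuming hypothetical book descriptions as the item features (TF-IDF plus cosine similarity stand in for any feature representation):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical book descriptions serve as item features; no ratings are needed.
books = {
    "Dune": "science fiction desert planet politics spice",
    "Foundation": "science fiction galactic empire mathematics",
    "Pride and Prejudice": "romance regency england manners marriage",
}
titles = list(books)

features = TfidfVectorizer().fit_transform(books.values())  # item feature vectors
sim = cosine_similarity(features)                           # item-to-item similarity

# Recommend the book most similar to one the user liked.
liked = titles.index("Dune")
ranked = sim[liked].argsort()[::-1]
print(next(titles[i] for i in ranked if i != liked))        # -> Foundation
```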
Matrix factorization method:
Determine the Baseline Estimate of the User’s rating of an Item
* Global avg. rating r_g = 3.4
* User avg. rating r_u = 3.0 –> User bias = -0.4
* Item avg. rating r_i = 4.1 –> Item bias = +0.7
Baseline Estimate:
r_g + User bias + Item bias = 3.4-0.4+0.7 = 3.7
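The same arithmetic as a quick Python check, using the numbers from the card:

```python
# Baseline estimate = global average + user bias + item bias.
r_g = 3.4                  # global average rating
r_u = 3.0                  # this user's average rating
r_i = 4.1                  # this item's average rating

user_bias = r_u - r_g      # -0.4
item_bias = r_i - r_g      # +0.7
baseline = r_g + user_bias + item_bias
print(round(baseline, 2))  # 3.7
```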
Item-Item based Collaborative Filtering application steps - Predict the rating of unknown User-Item pair
- Calculate average rating of items
- Compute similarity between items
a. Cosine similarity
b. Jaccard similarity
c. Pearson similarity: 1. subtract the item's average rating from the item's ratings, 2. compute the cosine similarity
- Calculate the weighted average (see the sketch after these steps)
a. Choose the n items with the highest similarity to the item in question
b. Multiply each similarity by the user's rating of the other item and sum over the n items
c. Divide by the sum of the n similarities
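A minimal NumPy sketch of these steps, assuming a small hypothetical user-item rating matrix R in which 0 marks an unknown rating:

```python
import numpy as np

# Hypothetical user x item rating matrix; 0 marks an unknown rating.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 4.0, 4.0],
])

def item_similarity(R):
    mask = R > 0
    counts = mask.sum(axis=0)
    means = np.where(counts > 0, R.sum(axis=0) / np.maximum(counts, 1), 0.0)
    centered = np.where(mask, R - means, 0.0)        # 1. subtract the item averages
    norms = np.linalg.norm(centered, axis=0) + 1e-9
    return (centered.T @ centered) / np.outer(norms, norms)  # 2. cosine similarity

def predict(R, sim, user, item, n=2):
    rated = np.where(R[user] > 0)[0]                           # items this user rated
    top = rated[np.argsort(sim[item, rated])[::-1][:n]]        # n most similar of them
    weights = sim[item, top]
    return float(weights @ R[user, top] / (np.abs(weights).sum() + 1e-9))  # weighted avg

sim = item_similarity(R)
print(round(predict(R, sim, user=0, item=2), 2))  # predicted rating for user 0, item 2
```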
Application steps of Hierarchical Clustering
- Merge the two nearest clusters (in the beginning, every point is its own cluster)
- Compute the new centroid of the merged cluster
- Repeat these steps until the desired number of clusters remains (see the sketch below)
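A brief sketch of this procedure using SciPy's agglomerative clustering on hypothetical 2-D points; centroid linkage mirrors the "recompute the centroid" step:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D data: two well-separated blobs of 10 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (10, 2)), rng.normal(5.0, 0.5, (10, 2))])

# Start with every point as its own cluster; repeatedly merge the two nearest
# clusters and recompute the merged cluster's centroid (centroid linkage).
Z = linkage(X, method="centroid")

# Cut the merge tree so that exactly two clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```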
T/F: GMMs are a method of hard clustering.
False
GMMs assign points to clusters with probabilities, i.e., soft assignments, so they are a soft clustering method.
T/F: The updating process (i.e., updating centers and assigning data points to centers) of K-means is similar to the updating process of EM.
True
Both iteratively alternate between assigning data points to centers and updating the centers (see the sketch below).
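A minimal K-means loop on hypothetical data, with comments marking which step plays the role of the E-step and which the M-step:

```python
import numpy as np

# Hypothetical 2-D data and k = 2 clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
k = 2
centers = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(20):
    # "E-step" analogue: hard-assign every point to its nearest center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # "M-step" analogue: recompute each center as the mean of its assigned points.
    centers = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
        for j in range(k)
    ])

print(centers)  # close to the true means (0, 0) and (6, 6)
```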
T/F: EM is a globally optimized algorithm.
False
As is typical for iterative optimization algorithms, EM is only locally optimized: it converges to a local optimum.
T/F: EM includes the steps of Expectation (E) and Marginalization (M).
False
EM is Expectation (E) and Maximization (M) steps, not Marginalization
Set of training data, and two clustering algorithms: K-Means, and a Gaussian Mixture Model trained using EM. Will these two clustering algorithms produce the same cluster centers (means) for this data set? Explain why or why not.
Very similar, but not necessarily equal. Since the data is separated very clearly, both K-Means and the GMM will group it into two clusters. However, due to the probabilistic nature of the EM method that the GMM uses, and the fact that it computes weighted (soft) means, the centers found by K-Means and the GMM can differ slightly.
Consider m data points X and k mixture distributions Omega, where w_ij denotes the membership of x_i in mixture j. Let N(x_i | omega_j) denote the probability that x_i is drawn from the Gaussian distribution omega_j. Derive the optimization objective of the GMM (a.k.a. the likelihood that X is generated by the GMM).
See Deriving GMM:
1. Take the weighted sum of the likelihoods of x_i under each Gaussian distribution omega_j:
P(x_i | Omega) = sum [j=1,k] ( w_ij * N(x_i | omega_j) )
2. Compute the likelihood of all m data points:
L(Omega) = prod [i=1,m] ( P(x_i | Omega) )
3. Maximize L(Omega) w.r.t. w_ij and Omega, subject to the constraint sum [j=1,k] (w_ij) = 1 (see the sketch below).
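A short sketch evaluating this objective for hypothetical data and parameters, using global mixture weights w_j in place of per-point memberships w_ij (SciPy's multivariate_normal supplies the Gaussian densities); in practice the log of L(Omega) is maximized for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical data (m = 100 points in 2-D) and mixture parameters (k = 2).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

w = np.array([0.6, 0.4])                      # mixture weights, sum to 1
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), np.eye(2)]

# P(x_i | Omega) = sum_j w_j * N(x_i | omega_j)
densities = np.column_stack(
    [multivariate_normal.pdf(X, mean=mu, cov=cov) for mu, cov in zip(means, covs)]
)
p_x = densities @ w

# L(Omega) = prod_i P(x_i | Omega); equivalently, maximize the log-likelihood.
log_likelihood = np.log(p_x).sum()
print(round(log_likelihood, 2))
```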
What are the three categories of Machine Learning to Rank tasks?
- bi-partite LTR
- k-partite LTR
- real value based LTR
What are the differences between RankSVM, RankBoost, and RankNet in terms of loss functions?
RankSVM: Hinge loss = max(1 - x, 0)
Not smooth; penalizes incorrect rankings moderately.
RankBoost: Exponential loss = exp(-x)
Smooth; penalizes badly misranked pairs very heavily.
RankNet: Logistic loss = log(1 + exp(-x))
Smooth; more moderate penalization than the exponential loss.
(All three are sketched below.)
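A small sketch of the three pairwise losses as functions of the score difference x of a pair (x > 0 means the pair is ranked correctly):

```python
import numpy as np

def hinge_loss(x):        # RankSVM
    return np.maximum(1.0 - x, 0.0)

def exponential_loss(x):  # RankBoost
    return np.exp(-x)

def logistic_loss(x):     # RankNet (natural log used here)
    return np.log(1.0 + np.exp(-x))

# All three decrease monotonically in x; the exponential loss grows fastest
# for badly misranked pairs (very negative x).
x = np.linspace(-3.0, 3.0, 7)
for name, loss in [("hinge", hinge_loss),
                   ("exponential", exponential_loss),
                   ("logistic", logistic_loss)]:
    print(f"{name:12s}", np.round(loss(x), 2))
```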
Why do the loss functions of RankSVM, RankBoost, and RankNet all
monotonically decrease?
The lower the value of the loss functions' argument x (the score difference of a pair), the more incorrectly the pair is ranked. Thus, lower values must be penalized more heavily than higher values, i.e., all loss functions must monotonically decrease.
T/F: the K-partite ranking uses a divide and conquer strategy to decompose the K-partite ranking task into a single bipartite ranking task.
False.
The K-Partite ranking decomposes into multiple bipartite ranking tasks
Which ranking model is better? Explain why.
Golden-standard (actual) ranking scores vs. the two models' predicted scores:

          Actual    Model #1   Model #2
Item A    0.9       0.88       1000000
Item B    0.85      0.89       10000
Item C    0.8       0.83       10
Model #2 is better.
Ranking is concerned with the order of the objects, not with how close the predicted values are to the actual scores. Model #2 predicts the order correctly (A > B > C) and separates the items well; Model #1 ranks Item B above Item A. (See the sketch below.)
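One way to make this concrete is to compare the models by rank correlation only, e.g., with Kendall's tau on the values from the table above:

```python
from scipy.stats import kendalltau

actual  = [0.9, 0.85, 0.8]           # golden-standard scores for items A, B, C
model_1 = [0.88, 0.89, 0.83]         # ranks B above A: one inverted pair
model_2 = [1_000_000, 10_000, 10]    # same ordering as the golden standard

tau1, _ = kendalltau(actual, model_1)
tau2, _ = kendalltau(actual, model_2)
print(tau1)   # < 1: ordering partly wrong
print(tau2)   # = 1.0: ordering fully preserved
```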
What are the key six components of a reinforcement learning framework?
- Agent
- Environment
- Action
- Observation
- State
- Reward
Answer True or False only: In a reinforcement learning system, after taking an action, the state of the environment must change.
False
What are the benefits of Deep-Q Network (DQN) over classic Q-learning (i.e., the Q table)?
- Can handle large and/or high-dimensional state spaces, and is therefore more scalable
- Generalizes better by learning the underlying features of the states (see the sketch below)
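A minimal sketch of the Q-network idea, assuming PyTorch: instead of one table row per state, a small neural network maps any state vector to one Q-value per action, so unseen or continuous states are handled by generalization:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action (replaces a Q-table row lookup)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Works for any (possibly continuous, high-dimensional) state vector.
q_net = QNetwork(state_dim=8, n_actions=3)
state = torch.randn(1, 8)                 # a state the agent may never have seen before
q_values = q_net(state)
action = int(q_values.argmax(dim=1))      # greedy action, as with the Q-table
```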
Which action should the agent take when st = s2 at time t? Why?
        s1      s2      s3      s4
a1      1.20    0.60    2.50    1.40
a2      0.33    2.40    1.31    1.27
a3     -0.90   -2.80    1.00    0.80
Action a2.
The DQN policy is to take the action that maximizes Q(s, a). For s2: Q(s2, a2) = 2.40 > Q(s2, a1) = 0.60 > Q(s2, a3) = -2.80.
After taking the action suggested in Q4.1 (a2), suppose the discount factor is g = 0.9, the state transfers from s2 to s4, and the reward r is 0.7.
Table:
        s1      s2      s3      s4
a1      1.20    0.60    2.50    1.40
a2      0.33    2.40    1.31    1.27
a3     -0.90   -2.80    1.00    0.80
Please update the Q-table and write down the updated Q-table.
Note: Only one value in the table needs updating, and you might need the Bellman Equation:
Q(st, at) = r(st, at) + g * max [at+1] { Q(st+1, at+1) }
- Determine the max Q of the new state:
max { Q(s4, a) } = Q(s4, a1) = 1.4
- Update the earlier state-action value with the Bellman equation:
Q'(s2, a2) = 0.7 + 0.9 * 1.4 = 1.96
- Update the table (a code sketch of this update follows below):
        s1      s2      s3      s4
a1      1.20    0.60    2.50    1.40
a2      0.33    1.96    1.31    1.27
a3     -0.90   -2.80    1.00    0.80
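The same update in a few lines of NumPy, reproducing the numbers from the card:

```python
import numpy as np

states, actions = ["s1", "s2", "s3", "s4"], ["a1", "a2", "a3"]
Q = np.array([
    [ 1.20,  0.60, 2.50, 1.40],   # a1
    [ 0.33,  2.40, 1.31, 1.27],   # a2
    [-0.90, -2.80, 1.00, 0.80],   # a3
])

gamma, reward = 0.9, 0.7
s, a, s_next = states.index("s2"), actions.index("a2"), states.index("s4")

# Bellman update: Q(st, at) = r(st, at) + g * max over at+1 of Q(st+1, at+1)
Q[a, s] = reward + gamma * Q[:, s_next].max()
print(round(float(Q[a, s]), 2))   # 0.7 + 0.9 * 1.4 = 1.96
```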
What are the weaknesses of policy gradient?
- Requires a lot of transition data points to train the policy function
- Computationally expensive
Reinforcement learning and feature selection are two machine learning tasks. In reinforcement learning, we take actions to interact with the environment and change the state of environment, receive rewards of the actions, update policy functions, until the reward is maximized. In feature selection, we select a subset of features, observe model accuracy, and then re-select features, until the model’s accuracy is maximized.
1. What are the commonalities and distinctions between reinforcement learning and feature selection?
2. Can you use reinforcement learning to conduct feature selection?
3. If yes, how are you going to design such reinforcement learning based feature selection system? If not, please explain why.
Similarities:
1. Iterative process to maximize a value (reward / accuracy)
2. Choosing subsets (action / subset of features) to train model
Distinctions:
1. RL is typically sequential, meaning previous action/state pairs influence the subsequent ones. Feature selection is not sequential; the order does not matter.
2. RL steps change the environment; feature selection doesn't change the set of features to choose from.
Yes.
Design:
State: Subset of selected features
Action: Adding or Subtracting certain features
Reward: Accuracy of the trained model (see the sketch below)
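A simplified sketch of this design on a hypothetical dataset, using a greedy, reward-driven search loop rather than a full policy-gradient or Q-learning agent: the state is a binary feature mask, an action toggles one feature, and the reward is cross-validated accuracy:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def reward(mask):
    """Reward = cross-validated accuracy of a model trained on the selected features."""
    if not mask.any():
        return 0.0
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=3).mean()

state = rng.integers(0, 2, X.shape[1]).astype(bool)   # state: current feature subset
best_reward = reward(state)

for _ in range(30):
    feature = rng.integers(X.shape[1])                # action: toggle one feature
    candidate = state.copy()
    candidate[feature] = not candidate[feature]
    r = reward(candidate)                             # observe the reward
    if r >= best_reward:                              # keep actions that don't hurt accuracy
        state, best_reward = candidate, r

print(round(best_reward, 3), int(state.sum()), "features selected")
```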
Multiple choice: Which of the following algorithms is/are not an example of an ensemble method?
A. Learning to Rank
B. Random Forest
C. AdaBoosting
D. Decision Tree
A. and D.
Which of the following options is/are correct regarding the benefits of ensemble models?
1. Better performance
2. Generalized and robust models
3. Better explainability and interpretability
A. 1 and 3
B. 2 and 3
C. 1 and 2
D. 1, 2 and 3
C. 1 & 2