What data does collaborative filtering use?
Only users' history of ordering or rating items
What are the data sources that can be used for a recommender system? (3)
- Users' demographic information (e.g., age, gender, location)
- Items' attributes (e.g., price, category)
- Users' history of ordering or rating items
What do we know about data for collaborative filtering?
What are the representations of the collaborative filtering problem?
User-user approach: estimate a user’s rating of an item by finding "similar" users and then looking at their ratings for that item.
Item-item approach: estimate a user’s rating of an item by finding similar items and then looking at that user's rating of these similar items.
Matrix factorization: construct two low-rank matrices whose product approximates the observed entries of the user-item rating matrix X.
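A minimal sketch of the user-user and item-item approaches, assuming a small dense NumPy rating matrix `X` with `np.nan` marking missing ratings. It uses cosine similarity over co-rated entries (one of the similarity options listed in these notes) and a similarity-weighted average of neighbors' ratings; the weighting scheme and the helper names `cosine`, `predict_user_user`, and `predict_item_item` are illustrative assumptions, not a prescribed method.

```python
import numpy as np

# Toy user-item rating matrix X (rows = users, columns = items); np.nan = missing rating.
X = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, np.nan, 4.0, 4.0],
])

def cosine(a, b):
    """Cosine similarity computed only over entries observed in both vectors."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0
    a, b = a[both], b[both]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_user_user(X, i, j):
    """Estimate X[i, j] from other users' ratings of item j, weighted by similarity to user i."""
    others = [u for u in range(X.shape[0]) if u != i and not np.isnan(X[u, j])]
    if not others:
        return np.nan
    weights = np.array([cosine(X[i], X[u]) for u in others])
    ratings = np.array([X[u, j] for u in others])
    total = np.abs(weights).sum()
    return float(weights @ ratings / total) if total > 0 else np.nan

def predict_item_item(X, i, j):
    """Estimate X[i, j] from user i's ratings of other items, weighted by similarity to item j.

    Transposing X turns items into rows, so the user-user logic applies unchanged.
    """
    return predict_user_user(X.T, j, i)

print(predict_user_user(X, 1, 1))  # user 1's missing rating of item 1, via similar users
print(predict_item_item(X, 1, 1))  # the same entry, via items similar to item 1
```

Reusing one function through a transpose highlights that the two approaches differ only in whether similarity is measured between rows (users) or columns (items).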
What's the task of collaborative filtering?
To "fill in" the missing values of the user-item rating matrix (i.e., predict the missing user ratings) based on the existing ratings
How do you convert a DataFrame to a NumPy matrix?
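A minimal sketch using pandas, assuming a hypothetical long-format ratings table with columns `user`, `item`, and `rating` (one row per observed rating). `DataFrame.to_numpy()` converts any DataFrame directly; for collaborative filtering the table is typically pivoted into a users × items layout first, so unrated pairs become `NaN`, which are the values to "fill in".

```python
import pandas as pd

# Hypothetical long-format ratings: one row per observed (user, item, rating).
ratings = pd.DataFrame({
    "user": ["alice", "alice", "bob", "carol", "carol"],
    "item": ["A", "B", "A", "B", "C"],
    "rating": [5, 3, 4, 2, 5],
})

# Direct conversion of (part of) a DataFrame to a NumPy array.
raw = ratings[["rating"]].to_numpy()  # shape (5, 1)

# Pivot into a users x items table; (user, item) pairs with no rating become NaN.
matrix_df = ratings.pivot(index="user", columns="item", values="rating")
X = matrix_df.to_numpy(dtype=float)  # shape (3, 3), NaN marks missing ratings

print(matrix_df)
```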
In the user-user approach, how is the similarity weight calculated?
Two common options:
- Pearson correlation
- Cosine similarity
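A small sketch comparing the two options, assuming rating vectors stored as NumPy arrays with `np.nan` for missing ratings and similarity computed over co-rated items only. Pearson correlation is the cosine similarity of mean-centered vectors; here the means are taken over the co-rated items, though some formulations use each user's mean over all of their ratings instead. The function names are hypothetical.

```python
import numpy as np

def co_rated(a, b):
    """Keep only the items both users rated (np.nan marks a missing rating)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    return a[both], b[both]

def cosine_similarity(a, b):
    a, b = co_rated(a, b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def pearson_correlation(a, b):
    # Center each vector on its mean over the co-rated items, then take the cosine.
    a, b = co_rated(a, b)
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

u = np.array([5.0, 3.0, 1.0, np.nan])
w = np.array([1.0, 3.0, 5.0, 2.0])  # opposite taste on the three co-rated items

print(cosine_similarity(u, w))    # positive, because all raw ratings are positive
print(pearson_correlation(u, w))  # negative, capturing the opposite preferences
```

The example shows one reason Pearson correlation is often preferred on raw ratings: cosine similarity of all-positive ratings cannot go negative, so it misses users whose preferences run opposite.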
- What's a drawback of the user-user and item-item approaches?
- How do you solve it?
- The user-rating matrix can't be kept in sparse format, which does not scale well to a large number of users/items.
- Use matrix factorization.
What's an advantage of the matrix factorization approach?
It doesn't require breaking the sparsity of the user-rating matrix, because the factorization is fit only to the observed entries.
What are "singularity issues with matrix inverses"?
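The card gives no answer, so the following is a sketch under an assumption: one common place this issue arises is the least-squares solve inside alternating least squares. Solving for a user's factor vector requires inverting the k × k matrix built from the factor vectors of the items that user rated; if the user rated fewer than k items, that matrix has rank below k and is singular. Adding a multiple of the identity (ridge regularization) makes it invertible. The names `V_obs`, `x_obs`, `k`, and `lam` are hypothetical.

```python
import numpy as np

k = 3  # number of latent factors (hypothetical)

# Factor vectors of the only 2 items this user rated: 2 rows, but k = 3 columns.
V_obs = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0]])
x_obs = np.array([4.0, 3.0])  # the user's two observed ratings

# k x k matrix that the unregularized normal equations would need to invert.
A = V_obs.T @ V_obs
rank = np.linalg.matrix_rank(A)  # 2 < k, so A is singular

# Ridge regularization: A + lam * I is positive definite, hence invertible.
lam = 0.1  # regularization strength (hypothetical value)
A_reg = A + lam * np.eye(k)
u = np.linalg.solve(A_reg, V_obs.T @ x_obs)  # the user's regularized factor vector

print(rank, np.linalg.matrix_rank(A_reg))
```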
- What is alternating least squares?
- What's an important point about it?
- It's feasible with sparse matrices, which can considerably reduce runtime and memory usage.
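A minimal ALS sketch, assuming SciPy's `scipy.sparse` CSR format where every stored entry counts as an observed rating (so a genuine rating of 0 would be lost if built from a dense array). With V fixed, each user's vector is a small ridge-regression solve over only that user's observed items, and vice versa for items, so the loops touch only stored entries and the rating matrix never has to be densified. The regularization term `lam` and the function name `als` are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def als(X, k=2, lam=0.1, iters=20, seed=0):
    """Fit X ~ U @ V.T using only the stored (observed) entries of sparse X."""
    X = sp.csr_matrix(X)
    Xt = X.T.tocsr()  # CSR of X.T gives each item's observed ratings as a row
    n_users, n_items = X.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    reg = lam * np.eye(k)
    for _ in range(iters):
        # Fix V: solve a k x k regularized least-squares problem per user.
        for i in range(n_users):
            lo, hi = X.indptr[i], X.indptr[i + 1]
            items, vals = X.indices[lo:hi], X.data[lo:hi]
            Vi = V[items]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ vals)
        # Fix U: solve the same kind of problem per item.
        for j in range(n_items):
            lo, hi = Xt.indptr[j], Xt.indptr[j + 1]
            users, vals = Xt.indices[lo:hi], Xt.data[lo:hi]
            Uj = U[users]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ vals)
    return U, V

# Usage: 4 users x 3 items with 6 observed ratings, built directly in sparse form.
rows = np.array([0, 0, 1, 2, 2, 3])
cols = np.array([0, 1, 0, 1, 2, 2])
ratings = np.array([5.0, 3.0, 4.0, 2.0, 5.0, 4.0])
X = sp.csr_matrix((ratings, (rows, cols)), shape=(4, 3))

U, V = als(X, k=2, lam=0.1, iters=50)
pred = U @ V.T  # predictions for every (user, item), including the missing ones
```

Each update exactly minimizes the regularized objective over one factor matrix with the other held fixed; the `reg` term also keeps each k × k solve invertible when a user or item has few ratings.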
What's matrix factorization?
This approach approximates the observed entries of the user-item rating matrix X as the product of two lower-rank matrices.
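The approximation can be written as an optimization problem. In this sketch the symbols are assumptions: m users, n items, rank k, S the set of observed (i, j) pairs, u_i and v_j the i-th and j-th rows of U and V. The λ regularization term is a common addition rather than something stated in these notes. A missing rating X_ij is then predicted as u_i^T v_j.

```latex
X \approx U V^\top, \qquad U \in \mathbb{R}^{m \times k},\; V \in \mathbb{R}^{n \times k},\; k \ll \min(m, n)

\min_{U, V} \; \sum_{(i,j) \in S} \left( X_{ij} - u_i^\top v_j \right)^2 + \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)
```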