Learning from User Generated Data Flashcards

(81 cards)

2
Q

A recommender system creates a list of recommendations based on a user-defined query that expresses the user’s information need. (True/False)

A

False

A recommender system is an information filtering system that provides a personalized perspective on the available item catalog based on user actions or preferences.

3
Q

An information retrieval system creates a list of documents based on a user-defined query that expresses the user’s information need. (True/False)

A

True

This is the fundamental function of an information retrieval system.

4
Q

Name the main goals pursued by the course ‘Learning from User-generated Data’ regarding recommender systems.

A

The main goals are:
* To illustrate approaches to learning from user-generated data
* To provide a sense of how recommender systems are used in real-world applications

5
Q

What are the typical inputs and outputs of recommender systems? Provide examples for each.

A

Inputs:
* User-item interactions
* User-item ratings
* Personal and item data

Outputs:
* Predicted ratings
* Filtered lists
* Recommendations for the ‘next item’

6
Q

Explain the difference between explicit and implicit user feedback and give an example for each.

A

Explicit Feedback:
* Direct indication of preference (e.g., ratings)
Implicit Feedback:
* Inferred preferences from interactions (e.g., frequency of consumption)

7
Q

What is the main assumption on which Collaborative Filtering (CF) is based?

A

The main assumption is that users who had a similar taste in the past will have a similar taste in the future.

8
Q

Describe what an interaction matrix is and why it is a key concept in many recommender systems.

A

An interaction matrix represents users (rows) and items (columns), with entries showing interactions or ratings. It is key because it forms the basis for many collaborative filtering algorithms.
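A minimal NumPy sketch of such a matrix (the ratings are made up; 0 marks a missing entry):

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = items,
# 0 = no observed interaction, 1-5 = hypothetical star ratings.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
])

n_users, n_items = R.shape        # number of users and items
observed = R > 0                  # mask of known entries
sparsity = 1 - observed.mean()    # fraction of missing entries
```

In practice such matrices are extremely sparse, which is why they are usually stored in a sparse format rather than as a dense array.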

9
Q

Explain the difference between Memory-based and Model-based Collaborative Filtering.

A

Memory-based CF:
* Stores all ratings directly
* Predictions are ad-hoc
Model-based CF:
* Factorizes the user-item matrix
* Predictions are based on a learned model

10
Q

Item-based CF scales better for several reasons: (1) item-item similarities can be calculated offline and updated from time to time, and (2) only items that the active user has rated are considered when identifying the nearest neighbors. (True/False)

A

True

11
Q

Which similarity measures are commonly used in Collaborative Filtering, and why is Adjusted Cosine Similarity relevant for item-based CF?

A

Common measures:
* Pearson’s correlation coefficient
* Cosine similarity
Adjusted cosine similarity accounts for user-specific rating bias in item-based CF.
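A sketch of this idea (NumPy assumed; `adjusted_cosine` is a hypothetical helper, with 0 encoding a missing rating):

```python
import numpy as np

def adjusted_cosine(R, i, j):
    """Adjusted cosine similarity between item columns i and j of a
    user-item rating matrix R (0 = missing). Each user's mean rating is
    subtracted first, which removes user-specific rating bias."""
    mask = R > 0
    counts = mask.sum(axis=1)
    means = np.where(counts > 0, R.sum(axis=1) / np.maximum(counts, 1), 0.0)
    co = mask[:, i] & mask[:, j]              # users who rated both items
    if not co.any():
        return 0.0
    di = R[co, i] - means[co]
    dj = R[co, j] - means[co]
    denom = np.linalg.norm(di) * np.linalg.norm(dj)
    return float(di @ dj / denom) if denom else 0.0
```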

12
Q

A major problem with using Truncated SVD for matrix factorization is missing values in the user-item rating matrix. (True/False)

A

True

13
Q

Model-based CF methods learn exclusively from implicit user feedback, which exacerbates cold-start problems. (True/False)

A

False

Model-based CF can be trained on explicit feedback (e.g., ratings) as well as on implicit feedback.

14
Q

When Stochastic Gradient Descent (SGD) is used to create an MF model, a regularization term is used to prevent overfitting. A common choice is Tikhonov regularization. (True/False)

A

True

15
Q

What are latent factors in matrix factorization, and how are they used to predict ratings?

A

Latent factors are dimensions derived from rating patterns representing users and items. Predicted ratings are calculated as the inner product of user and item latent vectors.
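A minimal sketch combining the two preceding cards: SGD over the observed entries with an L2 (Tikhonov) penalty, predicting ratings as inner products of latent vectors. All hyperparameters (`k`, `lr`, `reg`, `epochs`) are arbitrary illustrative choices:

```python
import numpy as np

def mf_sgd(R, k=2, lr=0.01, reg=0.1, epochs=200, seed=0):
    """Factorize the rating matrix R (0 = missing) into user factors P and
    item factors Q so that r_ui is approximated by the inner product
    P[u] @ Q[i], trained by SGD with an L2 (Tikhonov) penalty."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    observed = [(u, i) for u in range(n_users)
                for i in range(n_items) if R[u, i] > 0]
    for _ in range(epochs):
        for u, i in observed:
            pu = P[u].copy()
            err = R[u, i] - pu @ Q[i]                 # prediction error
            P[u] += lr * (err * Q[i] - reg * pu)      # gradient + L2 shrinkage
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q
```

Because only observed entries are visited, the missing values never influence the gradients, which is what distinguishes this approach from applying plain SVD to the raw matrix.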

16
Q

Explain the concept of Singular Value Decomposition (SVD) in the context of matrix factorization for recommender systems.

A

SVD factorizes a matrix into three matrices: user factors (U), singular values (Σ), and item factors (V^T). Multiplying them reconstructs the original matrix, and truncating to the largest singular values enables dimensionality reduction.
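A truncated-SVD sketch with NumPy. It assumes the missing values have already been imputed (plain SVD cannot handle gaps, cf. card 12); the matrix values are made up:

```python
import numpy as np

# Hypothetical dense rating matrix (missing values already mean-imputed,
# since plain SVD cannot operate on a matrix with gaps).
R = np.array([[5., 3., 1.],
              [4., 3., 1.],
              [1., 1., 5.]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                                    # keep the k largest singular values
R_k = U[:, :k] * s[:k] @ Vt[:k]          # best rank-k approximation of R
```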

17
Q

Which evaluation metric reports the precision at the point where precision and recall are equal?

A

R-precision

R-precision is the precision at rank R, where R is the total number of relevant items; at this cutoff, precision and recall coincide (the break-even point).

18
Q

Two recommender systems have created lists of 3 recommendations for the same user. Both lists achieve the same score for the Reciprocal Rank. (True/False)

A

False

The Reciprocal Rank depends only on the rank of the first relevant item, so two lists score equally only if their first relevant items appear at the same position.

19
Q

The metrics MRR (Mean Reciprocal Rank) and NDCG (Normalized Discounted Cumulative Gain) consider the position of relevant items in the recommendation list. (True/False)

A

True

20
Q

Average Precision (AP) considers relevant items that are not in the recommendation list, thus implicitly incorporating recall. (True/False)

A

True
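The role of the total relevant count can be seen in a short sketch (pure Python; `average_precision` is a hypothetical helper):

```python
def average_precision(recommended, relevant):
    """Average Precision: mean of precision@k over the ranks k at which a
    relevant item occurs, divided by the TOTAL number of relevant items --
    relevant items missing from the list lower the score, which is how
    recall enters implicitly."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```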

21
Q

Compared to CG (Cumulative Gain), DCG (Discounted Cumulative Gain) weights the item gains using the position in the recommendation list. (True/False)

A

True

22
Q

NDCG (Normalized Discounted Cumulative Gain) relates DCG to the ideal DCG to increase the interpretability of the DCG values. (True/False)

A

True
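A sketch of DCG and NDCG using the common log2 discount (other discount bases exist):

```python
import math

def dcg(gains):
    """Discounted Cumulative Gain: each gain is divided by log2(rank + 1),
    so items further down the list contribute less (ranks are 1-based)."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))

def ndcg(gains):
    """NDCG: DCG of the list divided by the DCG of the ideally
    (descending) sorted gains, yielding a score in [0, 1]."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```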

23
Q

Rank correlation coefficients such as Kendall’s τ or Spearman’s ρ can be used to compare the rankings of 2 (or more) recommender algorithms and determine the extent of their agreement. (True/False)

A

True
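Spearman's ρ can be sketched directly from its formula ρ = 1 − 6·Σd²/(n(n²−1)), assuming no ties:

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation between two rankings of the same items
    (no ties assumed): +1 for identical order, -1 for reversed order."""
    n = len(ranking_a)
    pos_a = {item: rank for rank, item in enumerate(ranking_a)}
    pos_b = {item: rank for rank, item in enumerate(ranking_b)}
    d2 = sum((pos_a[item] - pos_b[item]) ** 2 for item in ranking_a)
    return 1 - 6 * d2 / (n * (n * n - 1))
```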

24
Q

To what range are recommender-system evaluation scores typically normalized?

A

0 to 1

This normalization allows for comparisons between different systems and facilitates the interpretation of results.

25
Q

Can rank correlation coefficients like Kendall's τ or Spearman's ρ be used to compare recommender algorithms?

A

Yes

These coefficients measure the similarity or agreement between the rankings produced by different algorithms.

26
Q

Name and briefly describe the three main scenarios for evaluating recommender systems.

A

* Offline Evaluation: Based on a fixed dataset; easy to reproduce
* Online Evaluation: Conducted with real users; high validity
* User Studies: Involve surveys and observations

Each scenario has its pros and cons regarding realism, effort, and reproducibility.

27
Q

Explain the difference between Recall@k and Precision@k.

A

* Precision@k: Proportion of relevant items among the top-k recommendations
* Recall@k: Proportion of all relevant items that appear in the top-k recommendations

The choice of metric depends on the specific use case and priorities.
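The two metrics can be sketched for a single user's list (pure Python; item IDs are hypothetical):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(item in relevant for item in recommended[:k]) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in recommended[:k]) / len(relevant)
```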
28
Q

What are 'Beyond-Accuracy Metrics'?

A

* Diversity
* Novelty
* Coverage
* Serendipity
* Explainability

These metrics evaluate aspects of recommender systems beyond prediction accuracy.

29
Q

Is content-based filtering well-suited for recommending 'long-tail' items?

A

Yes

Content features are more objective and less affected by popularity biases.

30
Q

What are lemmatization and stemming used for in text processing?

A

To reduce the dimensionality of the vector space

These techniques reduce the number of unique terms in a vector space model (VSM).

31
Q

The Vector Space Model (VSM) represents each document as a fixed-dimensional vector of term weights. (True/False)

A

True

Each dimension corresponds to a unique term from the vocabulary.

32
Q

What is case folding in text preprocessing?

A

Converting all text characters to lowercase

This can introduce semantic ambiguities.

33
Q

How is the normalization requirement of the third monotonicity assumption addressed?

A

By using cosine similarity

Cosine similarity normalizes the vectors by their length.

34
Q

The logarithmic term in TF and IDF calculations is motivated by Zipf's Law. (True/False)

A

True

Zipf's Law describes the frequency of word occurrences in natural languages.
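The log-dampened TF and IDF weights can be sketched for a tiny corpus (the term counts and the helper `tfidf` are made up):

```python
import math

def tfidf(docs):
    """Log-scaled TF-IDF for a tiny corpus given as a list of
    {term: raw count} dicts. Both logarithms dampen the skewed,
    Zipf-distributed raw frequencies."""
    n_docs = len(docs)
    df = {}                                   # document frequency per term
    for counts in docs:
        for term in counts:
            df[term] = df.get(term, 0) + 1
    return [{term: (1 + math.log2(tf)) * math.log2(n_docs / df[term])
             for term, tf in counts.items()}
            for counts in docs]
```

A term occurring in every document gets IDF 0 and thus carries no discriminative weight.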
35
Q

For short texts, which TF formulation typically performs better?

A

The binary TF formulation

The logarithmic variant offers no significant advantage when words are rarely repeated.

36
Q

What is the Bag of Words (BoW) approach?

A

A text representation that ignores word order, grammar, and syntax

Advantages include simplicity and speed; disadvantages include ignoring semantic meaning.

37
Q

What are the essential steps in a text preprocessing pipeline?

A

* Noise Removal
* Lowercasing

These steps clean and standardize texts for processing.

38
Q

LSA is a probabilistic variant of LDA. (True/False)

A

False

LSA uses Singular Value Decomposition (SVD), while LDA is a probabilistic model.

39
Q

What does LSA perform on the TF-IDF representation of the term-document matrix?

A

Singular Value Decomposition (SVD)

This uncovers latent semantic structures.

40
Q

The topic modeling approaches pLSA and LDA assume conditional independence of terms and documents. (True/False)

A

True

This assumption is fundamental to both models.

41
Q

What is the overarching goal of Topic Modeling in text mining?

A

To describe documents by a set of topics rather than individual words

This helps uncover latent semantic structures.

42
Q

A recommender system that applies graph-based transitivity considers the recommendation task as a graph analysis problem. (True/False)

A

True

It uses a bipartite graph of users and items with specific path requirements.

43
Q

What is Katz centrality?

A

A node metric modeling diffusion behavior in networks

It measures the number of paths originating from a node, penalizing longer paths.
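A sketch of Katz centrality via its closed form (NumPy assumed; `alpha` must be smaller than the reciprocal of A's largest eigenvalue for the underlying geometric series to converge):

```python
import numpy as np

def katz_centrality(A, alpha=0.1):
    """Katz centrality: counts the walks originating from each node, with a
    walk of length k discounted by alpha**k. Converges only if
    alpha < 1 / (largest eigenvalue of A)."""
    n = A.shape[0]
    I = np.eye(n)
    # Closed form of sum_{k>=1} (alpha * A)^k applied to the all-ones vector.
    return (np.linalg.inv(I - alpha * A) - I) @ np.ones(n)
```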
44
Q

Graph-based transitivity operates on a bipartite graph with users and items as nodes. (True/False)

A

True

Edges represent interactions between users and items.

45
Q

What do medial node-based centrality measures consider?

A

Paths that pass through the node of interest

They assess a node's importance based on shortest paths.

46
Q

K-path centrality and degree centrality are examples of length-based centrality measures. (True/False)

A

False

They are radial centrality measures.

47
Q

What are edge measures in social networks?

A

They describe local relationships between two nodes

Examples include Tie Strength and Edge Betweenness.

48
Q

In a weighted hybrid system, weights can be learned individually for each user. (True/False)

A

True

Weights can be adjusted based on the data available for each user.

49
Q

A parallelized design of hybrid recommender systems is an example of early-fusion aggregation. (True/False)

A

False

It is an example of late-fusion aggregation.

50
Q

What is a parallelized design of hybrid recommender systems an example of?

A

A late-fusion approach

In late fusion, individual recommenders generate recommendations independently, and aggregation occurs afterwards.

51
Q

Monolithic hybridization designs always use late fusion as an aggregation approach. (True/False)

A

False

Monolithic designs typically employ early fusion, combining data sources before applying the recommendation algorithm.

52
Q

Borda rank aggregation is a late-fusion method. (True/False)

A

True

Borda rank aggregation combines ranks from multiple recommenders after the individual lists have been created.
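A sketch of Borda rank aggregation over hypothetical per-recommender lists:

```python
def borda_aggregate(rankings):
    """Borda count late fusion: in a list of length n, the item at
    0-based position p earns n - p points; points are summed across all
    recommenders and items are sorted by total score (descending)."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - pos)
    return sorted(scores, key=lambda item: -scores[item])
```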
53
Q

What is CBCF (Content-Boosted Collaborative Filtering) based on?

A

Enriching real rating data with predictions from a classifier based on content features

CBCF combines content-based and collaborative filtering to create a denser matrix of pseudo-ratings.

54
Q

Self-weighting in Content-Boosted Collaborative Filtering quantifies the algorithm's confidence in the correlation between users' rating vectors. (True/False)

A

False

Self-weighting quantifies confidence in the content-based prediction used for generating pseudo-ratings.

55
Q

What are the main goals for using hybrid recommender systems?

A

* To reflect different facets of the items or the domain
* To achieve better results (e.g., improved accuracy)
* To enable predictions in situations where a single system might not work

Hybrid systems leverage the strengths of different approaches to overcome individual limitations.

56
Q

List the three main categories of hybrid recommender designs according to Jannach et al.

A

* Parallelized Design
* Monolithic Design
* Pipelined Design

These categories classify when and how different recommendation approaches are combined.

57
Q

User context aspects include the target user's activity, mood, and spatio-temporal context. (True/False)

A

True

User context refers to dynamic aspects of the user's situation during item interaction.

58
Q

The item context in a multimedia recommender system only includes data encoded in the item's audiovisual signal. (True/False)

A

False

Item context includes additional data not directly extracted from the primary media content, such as tags or album covers.

59
Q

A music recommender system suggesting more Christmas songs in December is an example of a context-aware recommendation. (True/False)

A

True

This is an example of considering temporal context.

60
Q

Context acquisition can be explicit, implicit, or inferential. (True/False)

A

True

Explicit context is collected directly, implicit context is derived from behavior, and inferential context is inferred from existing data.

61
Q

What does the Item Purpose describe in a recommender system?

A

The purpose that the content creator had in mind when creating the item

Item Purpose reflects the creator's intention or the function of the item.

62
Q

The User Background describes dynamic information about the user. (True/False)

A

False

User Background refers to static characteristics, while dynamic needs fall under User Intent or User Context.

63
Q

The User Intent describes the purpose that the content creator had in mind when creating the item. (True/False)

A

False

User Intent describes why the user consumes an item, not the creator's purpose.

64
Q

Explain the difference between passive and active user awareness in context-aware recommender systems.

A

* Passive User Awareness: Captures context but does not change behavior
* Active User Awareness: Automatically integrates new context and adjusts recommendations

The difference lies in whether context changes trigger immediate adjustments.

65
Q

In a dimensional model of affect, the valence dimension describes the pleasantness of an emotion. (True/False)

A

True

Valence indicates the quality of an emotion, while arousal indicates its intensity.

66
Q

A common approach to integrating personality into a recommendation algorithm is to redefine user similarity based on personality traits. (True/False)

A

True

This approach helps in cold-start scenarios by enabling similarity estimation with limited interactions.

67
Q

Mood is longer-lasting and of lower intensity compared to emotion. (True/False)

A

True

Emotions are specific, intense, and short-lived reactions to stimuli.

68
Q

The assignment of a participant's responses to the BFI-44 personality instrument to the OCEAN model is achieved through harmonic mean weighting. (True/False)

A

False

The final score is typically a linear combination of the responses.

69
Q

The personality-aware recommender system by Lu and Tintarev (2018) creates recommendations by weighting items through a linear combination of rank and diversity. (True/False)

A

True

This method incorporates personality traits to adjust recommendation diversity.

70
Q

The base-level activation function in the ACT-R model describes the frequency and recency of item exposure. (True/False)

A

True

It models how frequently and recently used information is more easily retrieved.

71
Q

What are the three main categories of Psychology-informed Recommender Systems (PIRS)?

A

* Cognition-inspired Recommender Systems
* Personality-aware Recommender Systems
* Affect-aware Recommender Systems

PIRS utilize psychological insights to refine recommendations.

72
Q

Explain the concept of the Ebbinghaus forgetting curve.

A

It describes the decline in memory performance over time

In recommender systems, it models how user interests lose relevance, suggesting a time-based decay weighting for interactions.
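One way to operationalize such decay weighting, sketched as an exponential half-life decay (the 30-day half-life is an arbitrary illustrative choice):

```python
def decay_weight(age_days, half_life=30.0):
    """Exponential decay in the spirit of the Ebbinghaus curve: an
    interaction loses half of its weight every `half_life` days."""
    return 0.5 ** (age_days / half_life)
```

Each user-item interaction would then contribute `decay_weight(age)` instead of 1 when aggregating a user's preferences.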
73
Q

Common strategies for mitigating unwanted biases in a recommender system include regularization, data rebalancing, and adversarial training. (True/False)

A

True

These strategies address bias through different processing techniques.

74
Q

Many common recommendation algorithms favor popular items over less popular ones. (True/False)

A

True

This is known as popularity bias or the 'rich-get-richer' effect.
75
Q

Achieving group fairness implies that similar users are treated similarly. (True/False)

A

False

Individual fairness and group fairness are distinct concepts; one does not guarantee the other.

76
Q

What is popularity bias in recommender systems?

A

Popularity bias is the 'rich-get-richer' effect: already popular content tends to be recommended more frequently, which further reinforces the existing popularity distribution.

This phenomenon can lead to a lack of diversity in recommendations.

77
Q

Achieving group fairness guarantees individual fairness, i.e., that similar users are treated similarly. (True/False)

A

False

Individual fairness and group fairness are different concepts; achieving group fairness does not guarantee individual fairness for every user.
78
Q

A music recommender system mitigates societal bias by including an average of 50% songs by female artists in its recommendation lists. What percentage of the songs in the catalog were created by female artists?

A

42%

The recommender system strives for a 50% representation of female artists in the recommendations despite their lower share in the catalog.

79
Q

Filtering items created by the majority group of producers out of the recommendation list is considered a post-processing bias mitigation strategy. (True/False)

A

True

Post-processing strategies are applied after the recommendation algorithm has generated its initial results.

80
Q

What is the difference between 'Societal Bias' and 'Statistical Bias' in recommender systems?

A

* Societal Bias: Discrepancy between the ideal world and reality (e.g., equal representation of genders)
* Statistical Bias: Discrepancy between reality and its representation in the system or model

Societal bias concerns external norms, while statistical bias relates to data representation and model building.

81
Q

What types of harm can arise from harmful biases in recommender systems?

A

* Distributional Harm: Unjust denial of resources or advantages to a person or group
* Representational Harm: Misrepresentation or encoding of stereotypes in the system

These harms impact social and ethical dimensions beyond system performance.
* Distributional Harm: Unjust denial of resources or advantages to a person or group. * Representational Harm: Misrepresentation or encoding of stereotypes in the system. ## Footnote These harms impact social and ethical dimensions beyond system performance.