05 - Recommender Systems Evaluation Flashcards

1
Q

What questions should you ask yourself when you develop a recommender system?

A
  • Objective: What do you want to achieve with the model?
  • How to measure: evaluation methods and evaluation metrics
  • How good/relevant are the results?
2
Q

What are the goals of the business world?

A
  • A successful business
  • Maximize profit, income, and user satisfaction
  • Minimize costs
  • Get as many users as possible
  • Have the best product
3
Q

What costs may arise?

A
  • Labour costs
  • Servers
  • Legal/Licenses
  • etc.
4
Q

What is Goodhart’s Law?

A

When a measure becomes a target, it ceases to be a good measure.

5
Q

What are the three main evaluation methods and metrics?

A
  • Online Evaluations
  • Offline Evaluations
  • User Studies
6
Q

What is measured in online evaluations?

A
  • Sales
  • Profit
  • Clicks
7
Q

What is measured in offline evaluations?

A
  • Prediction errors (e.g. MAE, RMSE)
  • Accuracy
8
Q

What do user studies involve?

A
  • User feedback
  • User observations
9
Q

How does an A/B Test work?

A
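  • A typical online test
  • Users are randomly split into two groups
  • 50% of the users see variant A
  • 50% of the users see variant B
  • The online metrics (e.g. clicks, sales) of both groups are compared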
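A minimal Python sketch of how the 50/50 split and the comparison could be implemented. Everything here is an assumption for illustration: the hash-based assignment, the experiment name "rec-ab-test", the helper names assign_variant and click_through_rate, and the logged click numbers.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "rec-ab-test") -> str:
    """Deterministically assign a user to variant "A" or "B" (roughly 50/50).

    Hashing experiment name + user id gives a stable, roughly uniform split,
    so a returning user always sees the same variant.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def click_through_rate(clicks: int, impressions: int) -> float:
    """Online metric per group: clicks divided by impressions (0.0 if no impressions)."""
    return clicks / impressions if impressions else 0.0

# Usage with hypothetical logged numbers per variant: (clicks, impressions)
groups = {"A": (120, 2000), "B": (150, 2000)}
for variant, (clicks, impressions) in groups.items():
    print(variant, click_through_rate(clicks, impressions))
```

Hashing instead of flipping a coin on every request keeps each user in the same group across visits. A real test would also check whether the difference between the two rates is statistically significant before declaring a winner.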
10
Q

How does Interleaving work?

A
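  • The rankings of two systems are randomly mixed into one result list that is shown to the same user
  • Each click is credited to the system that contributed the clicked item
  • All kinds of variations (Random Mix, Top n Mix, Fixed amount Mix)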
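A minimal Python sketch of one interleaving variant. The card does not define "Random Mix" precisely, so the rule used here (a coin flip per position decides which system contributes its next not-yet-shown item) is an assumption, as are the function names and the item IDs.

```python
import random

def random_mix_interleave(ranking_a, ranking_b, length, rng=None):
    """Merge two rankings into one list and remember which system supplied each item.

    Assumed "random mix" rule: for every position, a coin flip picks system A or B,
    which contributes its highest-ranked item that is not yet in the merged list.
    """
    if rng is None:
        rng = random.Random()
    rankings = {"A": list(ranking_a), "B": list(ranking_b)}
    merged, credit = [], {}  # credit maps item -> system that contributed it
    while len(merged) < length:
        # Only systems that still have unshown items can be chosen
        candidates = [s for s, r in rankings.items() if any(i not in credit for i in r)]
        if not candidates:
            break
        system = rng.choice(candidates)
        item = next(i for i in rankings[system] if i not in credit)
        merged.append(item)
        credit[item] = system
    return merged, credit

def count_clicks(clicked_items, credit):
    """Credit each click to the system that contributed the clicked item."""
    wins = {"A": 0, "B": 0}
    for item in clicked_items:
        if item in credit:
            wins[credit[item]] += 1
    return wins

merged, credit = random_mix_interleave(["i1", "i2", "i3"], ["i3", "i4", "i5"], 4, random.Random(0))
print(merged, count_clicks(merged[:2], credit))
```

Because both systems are judged on the same user and the same result list, interleaving compares them directly instead of across two separate user groups as in an A/B test.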
11
Q

What is a typical metric for classification?

A

Accuracy
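For illustration, accuracy is the share of correct predictions. The function name and the labels below (1 = liked, 0 = not liked) are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one prediction")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Did the user like (1) or not like (0) the recommended item?
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```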

12
Q

What is a typical metric for regression?

A

Error metrics (e.g. MAE, RMSE)

13
Q

What is a typical metric for ranking?

A

Ranking metrics (e.g. MRR, MAP, nDCG)

14
Q

Can a regression task be treated as a classification task?

A
  • Regression tasks can be interpreted as classification or ranking problems
  • Define intervals of the target value and treat them as classes (then use a classification algorithm instead of a regression algorithm)
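A minimal sketch of turning star ratings into classes. The interval boundaries 2.5 and 4.0, the class names and the helper rating_to_class are hypothetical choices; any other binning works the same way.

```python
import bisect

# Hypothetical split points on a 1-5 star scale and one class per interval
BOUNDARIES = [2.5, 4.0]
CLASSES = ["dislike", "neutral", "like"]

def rating_to_class(rating: float) -> str:
    """Map a continuous rating to the class of the interval it falls into.

    Intervals: rating < 2.5 -> dislike, 2.5 <= rating < 4.0 -> neutral,
    rating >= 4.0 -> like.
    """
    return CLASSES[bisect.bisect_right(BOUNDARIES, rating)]

labels = [rating_to_class(r) for r in [1.0, 3.0, 4.5]]
print(labels)  # ['dislike', 'neutral', 'like']
```

The resulting labels can then be used as targets for any classification algorithm.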
15
Q

What regression metrics do you know?

A
  • Mean Absolute Error (MAE)
  • (Root) Mean Squared Error ((R)MSE)
16
Q

What is Mean Absolute Error (MAE)?

A

The mean of the absolute differences between predictions and observed values
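MAE written out as a small Python sketch; the function name and the ratings are made-up example values.

```python
def mean_absolute_error(predictions, observations):
    """MAE = (1/n) * sum(|prediction_i - observation_i|)."""
    if len(predictions) != len(observations) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predictions, observations)) / len(predictions)

# Hypothetical predicted vs. observed star ratings: errors 1, 0, 1, 0
print(mean_absolute_error([4.0, 3.5, 2.0, 1.0], [5.0, 3.5, 3.0, 1.0]))  # 0.5
```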

17
Q

What is the benefit of Mean Absolute Error (MAE)?

A

Intuitive: it is the average error in the same unit as the predicted values (e.g. rating stars)

18
Q

What is the drawback of RMSE?

A
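  • Not very intuitive (the square root of averaged squared errors is harder to interpret than a plain average error)
  • Punishes large errors disproportionately, because errors are squared before averaging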
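A small Python sketch of why RMSE punishes large errors more. Both hypothetical prediction sets below have an MAE of 1.0 (four errors of 1 star versus one error of 4 stars), but the RMSE doubles for the set with the single large error. The function name and values are illustrative.

```python
import math

def root_mean_squared_error(predictions, observations):
    """Square the errors, average them (MSE), then take the square root (RMSE)."""
    if len(predictions) != len(observations) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((p - o) ** 2 for p, o in zip(predictions, observations)) / len(predictions)
    return math.sqrt(mse)

observed = [1.0, 1.0, 1.0, 1.0]
even_errors = [2.0, 2.0, 2.0, 2.0]   # four errors of 1 star -> MAE = 1.0
one_outlier = [1.0, 1.0, 1.0, 5.0]   # one error of 4 stars  -> MAE = 1.0
print(root_mean_squared_error(even_errors, observed))   # 1.0
print(root_mean_squared_error(one_outlier, observed))   # 2.0
```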
19
Q

What (Ranked) Retrieval Metrics do you know?

A
  • Mean Reciprocal Rank (MRR)
  • Mean Average Precision (MAP)
  • Normalized Discounted Cumulative Gain (nDCG)
20
Q

What is Mean Reciprocal Rank (MRR)?

A
  • Measures the rank at which the first relevant result appears: the reciprocal rank 1/rank is averaged over all queries (or users)
  • Considers only the first relevant result
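MRR as a short Python sketch. The function name, rankings and relevant-item sets are hypothetical; a user with no relevant item in the list contributes 0.

```python
def mean_reciprocal_rank(rankings, relevant_items):
    """MRR = mean over queries of 1/rank of the first relevant item (0 if none)."""
    if len(rankings) != len(relevant_items) or not rankings:
        raise ValueError("need one set of relevant items per ranking")
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_items):
        for rank, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(rankings)

rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant = [{"a"}, {"e", "f"}, set()]
print(mean_reciprocal_rank(rankings, relevant))  # (1 + 1/2 + 0) / 3 = 0.5
```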
21
Q

What is Normalized Discounted Cumulative Gain (nDCG)?

A

A ranking metric with graded relevance: it rewards rankings that place more relevant items above less relevant items

22
Q

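Into which steps can the Normalized Discounted Cumulative Gain (nDCG) be divided?

A
  • Step 1: Cumulative Gain (CG) = sum of the relevance scores of the top n items
  • Step 2: Discounted Cumulative Gain (DCG): discounts each item's relevance by its rank (commonly dividing by log2(rank + 1)), so relevant items ranked lower contribute less
  • Step 3: Normalized Discounted Cumulative Gain (nDCG): divides DCG by the DCG of the ideal ranking (IDCG), which normalizes the score to the interval 0 to 1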
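The three steps as a Python sketch. Two assumptions: the discount is the common 1/log2(rank + 1) variant, and the ideal ranking (IDCG) is built by sorting the same relevance labels, which is a simplification; a full evaluation would use all items known to be relevant for the user. Function names and relevance values are hypothetical.

```python
import math

def cumulative_gain(relevances, n):
    """Step 1: CG@n = sum of the relevance scores of the top n items."""
    return sum(relevances[:n])

def discounted_cumulative_gain(relevances, n):
    """Step 2: DCG@n = sum of rel_i / log2(i + 1) for positions i = 1..n."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:n], start=1))

def normalized_dcg(relevances, n):
    """Step 3: nDCG@n = DCG@n / IDCG@n, where IDCG uses the ideal (descending) order."""
    ideal = discounted_cumulative_gain(sorted(relevances, reverse=True), n)
    return discounted_cumulative_gain(relevances, n) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance (3 = very relevant ... 0 = irrelevant) in ranked order
ranked = [1, 3, 0, 2]
print(cumulative_gain(ranked, 4), normalized_dcg(ranked, 4))
```

CG ignores the order of the top n items; DCG and nDCG drop when the highly relevant item (relevance 3) is not in first place.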
23
Q

What is Effectiveness?

A

Doing the right things (in evaluation: the quality of the results, e.g. how relevant the recommendations are)

24
Q

What is Efficiency?

A

Doing things right (in evaluation: how economically the results are produced, e.g. speed and resource usage)

25
Q

What is Performance?

A
  • Sometimes a synonym for effectiveness
  • Sometimes used as a generic term