08 - Evaluation Flashcards

1
Q

What do you need to know to assess how good or meaningful results are?

A
  • Which error metric was used
  • Which data set was used and how it was divided
  • Scale of the metric
  • Context: how do other algorithms perform?
2
Q

What can you use as a baseline? What are other algorithms that you can compare your own method with?

A
  • State-of-the-art algorithms, or the algorithm used so far
  • Simple algorithms (e.g. linear regression)
  • Mean or median
  • Highest class probability (the modal class)
  • Some simple rules
  • Random
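As a rough illustration of the simple baselines above, the sketch below scores constant predictors (mean, median, mode) with RMSE. The rating lists and the `rmse` helper are made up for this example and do not come from the flashcards; any other error metric could be substituted.

```python
import math
import statistics

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(targets))

# Hypothetical toy ratings, invented for illustration only.
train_ratings = [4, 5, 3, 4, 4, 2, 5, 4]
test_ratings = [4, 3, 5, 4]

# Each baseline predicts one constant value for every test item.
baselines = {
    "mean": statistics.mean(train_ratings),
    "median": statistics.median(train_ratings),
    "mode": statistics.mode(train_ratings),
}

baseline_rmse = {
    name: rmse([value] * len(test_ratings), test_ratings)
    for name, value in baselines.items()
}

for name, score in baseline_rmse.items():
    print(f"{name:>6} baseline RMSE: {score:.3f}")
```

A new algorithm's RMSE on the same test split only becomes meaningful once it is placed next to numbers like these.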
3
Q

Why do you need a baseline?

A
  • Without a baseline, performance evaluations of an algorithm are typically of little or no relevance
  • A baseline gives meaning to the results
4
Q

What is the ground truth for recommender systems?

A
  • Ratings, submitted ratings, relevance scores of a dataset
  • These are considered “true” but may well be false, biased, sparse or noisy
5
Q

What are the problems with the ground truth of recommender systems?

A
  • Real ground truth is difficult to measure
  • Ground truth is derived/approximated
  • It is only the best approximation that is available
  • Hard to find
6
Q

What is the gold standard?

A

The best available reference you can get

7
Q

What are the assumptions of the Central Limit Theorem?

A
  • Large number of observations
  • Large random sample with n observations
  • Observations are random (each is independent of the previous ones)
8
Q

What is the Central Limit Theorem?

A
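  • For large n, the mean (and sum) of a random sample approximately follows a normal distribution, whatever the distribution of the individual observations (given finite variance)
  • The larger n, the better the normal approximation, and the closer the sample mean tends to be to the true mean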
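The statement above can be checked with a small simulation. This is a minimal sketch under assumed settings: the uniform distribution, the sample size `n = 50`, the 2000 repetitions, and the seed are arbitrary choices for illustration, not values from the flashcards.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 50                   # observations per sample (arbitrary choice)
repetitions = 2000       # number of samples drawn (arbitrary choice)

# Each observation comes from Uniform(0, 1), which is clearly not normal.
means = [
    statistics.fmean([rng.random() for _ in range(n)])
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(means)
std_of_means = statistics.stdev(means)
# Theory: the sample mean has standard deviation sigma / sqrt(n),
# and sigma is sqrt(1/12) for Uniform(0, 1).
expected_std = math.sqrt(1 / 12) / math.sqrt(n)

# For a normal distribution, about 95% of values lie within 1.96 standard deviations.
within = sum(abs(m - 0.5) <= 1.96 * expected_std for m in means) / repetitions

print(f"mean of sample means: {mean_of_means:.4f} (true mean 0.5)")
print(f"std of sample means:  {std_of_means:.4f} (theory {expected_std:.4f})")
print(f"share within 1.96 std: {within:.3f} (normal distribution: about 0.95)")
```

Increasing `n` shrinks the spread of the sample means, which is the practical reason larger evaluations give more stable results.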
9
Q

What is statistical significance?

A
  • Describes how likely it is to observe a difference at least as large as the one found if only chance were at work (no real effect)
  • Typical thresholds: the p value should be less than 0.05 or 0.01
  • Statistically significant results can still be false or practically insignificant
10
Q

What does statistical significance mean?

A
  • Experimental data giving a p value of 0.05 means that there is only a 5% chance of getting a result at least as extreme as the observed one if no real effect exists
  • The p value provides information about the probability of obtaining evidence. It does not quantify the strength of the evidence
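One way to make the p value concrete is an exact permutation test, sketched below. The error values for the two algorithms are invented for illustration, and the function name `exact_permutation_p_value` is hypothetical; the flashcards do not prescribe any particular test.

```python
from itertools import combinations
from statistics import fmean

def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference of means.

    Returns the share of all possible relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(fmean(group_a) - fmean(group_b))
    extreme = 0
    total = 0
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        first = [pooled[i] for i in indices]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding on ties.
        if abs(fmean(first) - fmean(second)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical per-fold errors of two algorithms (made-up numbers).
errors_a = [0.80, 0.82, 0.85, 0.88, 0.90]
errors_b = [1.00, 1.02, 1.05, 1.08, 1.10]

p_value = exact_permutation_p_value(errors_a, errors_b)
print(f"p value: {p_value:.4f}")
```

The p value here says how rarely a random relabeling of the folds produces a gap this large. It says nothing about how large or how useful the gap is in practice.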
11
Q

What is p-hacking?

A

Analyzing data in many different ways until a statistically significant result appears: "If you torture your data long enough, they will confess"
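The arithmetic behind this warning can be sketched in a few lines. The calculation assumes the tests are independent and that no real effect exists in any of them; the function name is hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability of at least one 'significant' result among
    num_tests independent tests when no real effect exists."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20, 100):
    print(f"{k:>3} tests: {chance_of_false_positive(k):.3f}")
```

With 20 independent tests at a threshold of 0.05, the chance of at least one spurious "significant" finding is already about 64%.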

12
Q

Why is it important to analyze performance over time?

A

The standard, naive assumption is that performance is always the same over time. In practice it can change, and a single aggregate number hides that
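A minimal sketch of why the time dimension matters: the same metric is computed once overall and once per month. The click log below is entirely made up, and click-through rate is just one assumed choice of metric.

```python
from collections import defaultdict

# Hypothetical log of (month, clicked) pairs, invented for illustration.
log = [
    ("2023-01", 1), ("2023-01", 1), ("2023-01", 1), ("2023-01", 0),
    ("2023-02", 1), ("2023-02", 0), ("2023-02", 1), ("2023-02", 0),
    ("2023-03", 0), ("2023-03", 0), ("2023-03", 1), ("2023-03", 0),
]

# Overall click-through rate: one number for the whole period.
overall = sum(clicked for _, clicked in log) / len(log)

# Click-through rate per month: the same metric, split by time.
by_month = defaultdict(list)
for month, clicked in log:
    by_month[month].append(clicked)
per_month = {month: sum(v) / len(v) for month, v in sorted(by_month.items())}

print(f"overall: {overall:.2f}")
for month, rate in per_month.items():
    print(f"{month}: {rate:.2f}")
```

In this toy log the overall rate of 0.50 hides a steady decline from 0.75 to 0.25.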

13
Q

What is dataset pruning?

A

Removing data that does not fit your purpose

14
Q

When is it good to remove data?

A
  • Wrong data
  • Noisy data
  • Missing data