08 Evaluation Flashcards

1
Q

why do we need to evaluate?

A
  1. economic reasons
    • how effective is the solution?
  2. scientific progress
    • is a method better than its competitors?
  3. verification
    • verify that the system performs as intended
2
Q

what do we need to evaluate?

A
  1. efficiency
    - how fast
  2. coverage
    - how many pages are indexed
  3. presentation
    - effort required from the user
  4. effectiveness
    - how correct the results are
3
Q

what is the IR experimental setup?

A

maintain a test collection of documents, queries and relevance assessments (the ground truth)
- performance measures, e.g. precision and recall
- systems to compare on the same queries, e.g. TF vs TF-IDF weighting
- an experimental design
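A minimal Python sketch of this setup, using a made-up toy test collection: `qrels` holds the relevance judgments (ground truth) per query, `runs` holds the ranked lists from two hypothetical systems (`system_a`, `system_b`, standing in for e.g. TF vs TF-IDF), and precision at rank 2 is used as the measure. All names and data are illustrative assumptions.

```python
from typing import Callable

# Made-up toy test collection: relevance judgments (ground truth) per query
qrels: dict[str, set[str]] = {
    "q1": {"d1", "d3"},
    "q2": {"d2"},
}

# Ranked output of two hypothetical systems for the same queries
# (e.g. system_a could use TF weighting, system_b TF-IDF)
runs: dict[str, dict[str, list[str]]] = {
    "system_a": {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2", "d4"]},
    "system_b": {"q1": ["d1", "d3", "d2"], "q2": ["d2", "d1", "d4"]},
}

Metric = Callable[[list[str], set[str]], float]

def precision_at_2(ranking: list[str], relevant: set[str]) -> float:
    """Fraction of the top 2 ranked docs that are relevant."""
    return sum(doc in relevant for doc in ranking[:2]) / 2

def evaluate(system_runs: dict[str, list[str]], qrels: dict[str, set[str]], metric: Metric) -> float:
    """Average a per-query metric over all judged queries."""
    scores = [metric(system_runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)

for system, system_runs in runs.items():
    print(system, evaluate(system_runs, qrels, precision_at_2))
```

Keeping the collection, the measure and the systems separate lets the same judgments be reused to compare any number of systems.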

4
Q

what are the assumptions behind the evaluation?

A

the system returns a ranked list of documents for each query
- a better system will produce a better ranked list
- a better ranked list generally satisfies users better

5
Q

what is precision?

A

number of relevant docs retrieved / total number of docs retrieved
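A minimal Python sketch of the formula, treating documents as hypothetical string IDs; returning 0 when nothing is retrieved is an assumed convention, since precision is undefined in that case.

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # assumed convention: nothing retrieved -> precision 0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)

# Hypothetical query: 4 docs retrieved, 2 of them relevant
print(precision(["d1", "d2", "d3", "d4"], {"d2", "d4", "d7"}))  # 0.5
```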

6
Q

what is recall?

A

number of relevant docs retrieved / total number of relevant docs
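The same hypothetical query, sketched in Python for recall: only the denominator changes, from the retrieved docs to all relevant docs. Returning 0 when there are no relevant docs is an assumed convention.

```python
def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of all relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # assumed convention: no relevant docs -> recall 0
    hits = len(set(retrieved) & relevant)
    return hits / len(relevant)

# 2 of the 3 relevant docs were retrieved
print(recall(["d1", "d2", "d3", "d4"], {"d2", "d4", "d7"}))  # ≈ 0.667
```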

7
Q

ranking effectiveness

A
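  1. how many top-ranked docs to evaluate? e.g. top 1, 3, 5 (precision and recall at rank R)
  2. for the same query and the same rank R, if precision at R is higher, recall at R will also be higher (both count the relevant docs in the top R; precision divides by R, recall by the total number of relevant docs)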
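A minimal Python sketch comparing two hypothetical rankings for one query at ranks 1, 3 and 5. Because the numerator is shared and both denominators are fixed for a given query and R, the ranking with the higher P@R also has the higher R@R.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Relevant docs in the top k, divided by k."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k

def recall_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Relevant docs in the top k, divided by all relevant docs."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / len(relevant)

# Hypothetical judgments and two rankings for the same query
relevant = {"d1", "d3", "d5"}
ranking_a = ["d1", "d3", "d2", "d5", "d4"]
ranking_b = ["d2", "d1", "d4", "d6", "d3"]

for k in (1, 3, 5):
    for name, ranking in (("A", ranking_a), ("B", ranking_b)):
        p = precision_at_k(ranking, relevant, k)
        r = recall_at_k(ranking, relevant, k)
        print(f"k={k} {name}: P@k={p:.2f} R@k={r:.2f}")
```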
8
Q

what are the 3 methods of summarising a ranking?

A
  1. calculate recall and precision at fixed rank positions
  2. calculate precision at standard recall levels from 0.0 to 1.0
    - requires interpolation
  3. average the precision values at the rank positions where a relevant document was retrieved (average precision)
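A minimal Python sketch of method 3 (average precision) for one hypothetical query. It divides by the total number of relevant docs, so relevant docs that are never retrieved count as precision 0; this is a common convention and is assumed here.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Mean of the precision values at the ranks where relevant docs appear."""
    if not relevant:
        return 0.0  # assumed convention for queries with no relevant docs
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # dividing by all relevant docs counts unretrieved ones as precision 0
    return precision_sum / len(relevant)

# Hypothetical query: relevant docs found at ranks 1, 2 and 4
print(average_precision(["d1", "d3", "d2", "d5", "d4"], {"d1", "d3", "d5"}))
# (1/1 + 2/2 + 3/4) / 3 ≈ 0.917
```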
9
Q

what is mean average precision (MAP)?

A

summarises rankings from multiple queries by averaging their average precision (AP) values
- assumes the user is interested in finding many relevant documents for each query
- requires many relevance judgments in the test collection
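A minimal Python sketch of MAP over two hypothetical queries. It reuses the AP convention of dividing by the total number of relevant docs (an assumption), and each query contributes equally to the mean regardless of how many relevant docs it has.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Per-query AP: precision at each relevant doc's rank, averaged over all relevant docs."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs: dict[str, list[str]], qrels: dict[str, set[str]]) -> float:
    """MAP: arithmetic mean of AP over every judged query."""
    if not qrels:
        return 0.0
    aps = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(aps) / len(aps)

# Hypothetical judgments and one system's rankings for two queries
qrels = {"q1": {"d1", "d3", "d5"}, "q2": {"d2"}}
runs = {"q1": ["d1", "d3", "d2", "d5", "d4"], "q2": ["d4", "d2"]}
print(mean_average_precision(runs, qrels))  # (0.917 + 0.5) / 2 ≈ 0.708
```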

10
Q

recall-precision graphs

A

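raw recall-precision graphs zigzag and differ in shape from query to query, so they cannot show a clear general pattern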
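A minimal Python sketch that computes one (recall, precision) point per rank for a single hypothetical query: precision drops at every non-relevant doc and jumps at every relevant one, which is why the raw curve zigzags. It assumes the query has at least one relevant doc.

```python
def recall_precision_points(ranking: list[str], relevant: set[str]) -> list[tuple[float, float]]:
    """One (recall, precision) point for each rank position (assumes relevant is non-empty)."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return points

# Hypothetical query with 3 relevant docs
for recall, precision in recall_precision_points(["d2", "d1", "d4", "d6", "d3"], {"d1", "d3", "d5"}):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```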

11
Q

interpolation

A

defines precision at any recall level R as the maximum precision observed at any recall-precision point with recall equal to or higher than R

  • turns the recall-precision graph into a step function
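A minimal Python sketch of interpolation at the 11 standard recall levels 0.0, 0.1, ..., 1.0, taking hypothetical (recall, precision) points for one query as input. Using 0.0 when no point reaches a recall level is an assumed convention.

```python
# 11 standard recall levels: 0.0, 0.1, ..., 1.0
STANDARD_RECALL_LEVELS = [i / 10 for i in range(11)]

def interpolate(points: list[tuple[float, float]],
                levels: list[float] = STANDARD_RECALL_LEVELS) -> list[float]:
    """Interpolated precision at each level R: max precision over points with recall >= R."""
    interpolated = []
    for level in levels:
        candidates = [p for r, p in points if r >= level]
        # assumed convention: 0.0 if no point reaches this recall level
        interpolated.append(max(candidates) if candidates else 0.0)
    return interpolated

# Hypothetical (recall, precision) points for one query with 3 relevant docs
points = [(1 / 3, 1.0), (2 / 3, 1.0), (2 / 3, 2 / 3), (1.0, 0.75), (1.0, 0.6)]
print(interpolate(points))
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.75, 0.75]
```

The interpolated values never increase as recall increases, which is what produces the step shape.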
12
Q

joining average precision points at standard recall levels

A
