06 Vector Space Model Flashcards

1
Q

What is an n-gram?

A

A contiguous sequence of n items (characters or words) taken from a text.
E.g. “John is quicker than Mary”:
n = 1: “john”, “is”, “quicker”, “than”, “mary” (word order is lost, so this does not tell us who is quicker)
n = 2: “john is”, “is quicker”, “quicker than”, “than mary”

Used to estimate the probability of a sequence of words.
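
A minimal sketch of word-level n-gram extraction, assuming naive whitespace tokenisation and lowercasing (the helper name `word_ngrams` is hypothetical). It also shows why unigrams alone cannot say who is quicker: swapping the names leaves the unigram bag unchanged, while the bigrams differ.

```python
def word_ngrams(text, n):
    """Return the contiguous word n-grams of `text` as a list of strings."""
    tokens = text.lower().split()  # assumption: split on whitespace only
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "John is quicker than Mary"
swapped = "Mary is quicker than John"

unigrams = word_ngrams(sentence, 1)
bigrams = word_ngrams(sentence, 2)

# Unigrams ignore order, so both sentences yield the same bag of words.
same_unigram_bag = sorted(unigrams) == sorted(word_ngrams(swapped, 1))
# Bigrams keep some local order, so the two sentences can be told apart.
same_bigram_bag = sorted(bigrams) == sorted(word_ngrams(swapped, 2))

print(unigrams)
print(bigrams)
print(same_unigram_bag, same_bigram_bag)
```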

2
Q

What is TF-IDF?

A

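A term-weighting scheme: a term's weight in a document is its term frequency (TF) scaled by its inverse document frequency (IDF), so terms that are frequent in the document but rare in the collection score highest.
Applied to a whole collection, it gives a matrix of term weights (terms × documents).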
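
A small sketch of how such term weights might be computed, assuming one common variant (raw term count multiplied by log(N / df)); other variants use smoothing or log-scaled counts. The toy documents and the helper name `tf_idf` are made up for illustration.

```python
import math
from collections import Counter

# Toy collection (hypothetical example documents).
docs = {
    "d1": "john is quicker than mary",
    "d2": "mary is quicker than john",
    "d3": "john likes fast cars",
}


def tf_idf(docs):
    """Return {doc: {term: weight}} using weight = tf * log(N / df)."""
    tokenised = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenised)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenised.values():
        df.update(set(tokens))

    weights = {}
    for name, tokens in tokenised.items():
        tf = Counter(tokens)  # raw term counts in this document
        weights[name] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return weights


weights = tf_idf(docs)
for name, row in weights.items():
    print(name, {t: round(w, 3) for t, w in row.items()})
```

In this variant a term that occurs in every document (here "john") gets weight 0, since log(N / N) = 0.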

3
Q

What is an IR (information retrieval) model?

A

A model that specifies:
- how documents are represented
- how queries are represented
- how the relevance of a document to a given query is estimated

4
Q

What is the vector space model?

A

Documents and queries are represented as vectors in a common term space.
Relevance is estimated by measuring the similarity between the query vector and each document vector (e.g. cosine similarity).

Documents with similar vectors are assumed to be about similar topics.
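
A minimal sketch of ranking by vector similarity, assuming cosine similarity as the measure and sparse vectors stored as `{term: weight}` dicts. The weights and document names are made-up numbers for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical term weights for three documents and one query.
doc_vectors = {
    "d1": {"vector": 2.0, "space": 2.0, "model": 1.0},
    "d2": {"cooking": 3.0, "pasta": 1.0},
    "d3": {"vector": 1.0, "model": 1.0, "ranking": 1.0},
}
query = {"vector": 1.0, "model": 1.0}

# Rank documents from most to least similar to the query.
ranking = sorted(
    doc_vectors,
    key=lambda name: cosine(query, doc_vectors[name]),
    reverse=True,
)
print(ranking)
```

Because cosine similarity normalises by vector length, the shorter document d3 can outrank d1 even though d1 has larger raw weights on the query terms.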

5
Q

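What does it mean if the cosine similarity is negative?

A

The two vectors point in broadly opposite directions (the angle between them is greater than 90°).
With non-negative weights such as term counts or standard TF-IDF, cosine similarity cannot be negative. A negative value only appears when the weighting allows negative components, e.g. some algorithms penalise non-matching terms with negative weights.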
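
A small numeric illustration, under the assumption of dense three-term vectors: with non-negative weights the cosine stays at or above zero, and it drops below zero only once negative components appear. The penalty weights are hypothetical and not taken from any particular algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (non-zero norms assumed)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


query = [1.0, 1.0, 0.0]

# Document shares no terms with the query; all weights are non-negative.
doc_nonneg = [0.0, 0.0, 2.0]

# Hypothetical scheme: query terms missing from the document get weight -1.
doc_penalised = [-1.0, -1.0, 2.0]

sim_nonneg = cosine(query, doc_nonneg)
sim_penalised = cosine(query, doc_penalised)

print(sim_nonneg)         # no overlap, no penalty
print(sim_penalised < 0)  # penalty pushes the similarity below zero
```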

6
Q

What are the advantages of the vector space model?

A
  • simple geometric interpretation
  • easy to compute and measure
  • easy to adapt to various weighting schemes
  • provides ranked output
7
Q

What are the disadvantages of the vector space model?

A
  • high dimensionality
  • term independence assumption
  • unclear which similarity metric to use
  • no guidance on when to stop ranking