Indexing Text Flashcards

1
Q

Why do we index large collections?

A

To make similarity computation between a query and the documents fast

2
Q

What data structures are used?

A

Inverted index and forward index

3
Q

How do we index and process queries?

A

Inversion algorithms (to build the index) and query-processing (QP) algorithms (to answer queries with it)
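A minimal sketch of one query-processing strategy, term-at-a-time scoring with per-document accumulators, assuming a toy in-memory index of (doc_id, term_frequency) postings and a plain sum-of-tf score. `INDEX` and `score_query` are hypothetical names; real systems use weighting schemes such as TF-IDF or BM25.

```python
from collections import defaultdict

# Hypothetical toy inverted index: term -> list of (doc_id, term_frequency).
INDEX = {
    "text": [(1, 2), (3, 1)],
    "index": [(1, 1), (2, 3)],
    "search": [(2, 1), (3, 2)],
}

def score_query(query_terms, index):
    """Term-at-a-time scoring: only documents that appear in some query
    term's postings list are touched, via one accumulator per document."""
    accumulators = defaultdict(int)
    for term in query_terms:
        for doc_id, tf in index.get(term, []):
            accumulators[doc_id] += tf  # simple sum-of-tf score (assumption)
    # Rank by descending score, breaking ties by ascending doc_id.
    return sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))

print(score_query(["text", "index"], INDEX))  # [(1, 3), (2, 3), (3, 1)]
```

Documents that share no term with the query never get an accumulator, which is where the speedup over scanning every document comes from.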

4
Q

What are the two data structures for the lexicon?

A

Hash-based and B+ tree-based
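A rough sketch of the trade-off between the two lexicon designs. A hash table gives fast exact-match lookup but keeps no term order; a B+ tree keeps terms sorted, which also supports range and prefix lookups. Python has no built-in B+ tree, so a sorted list searched with `bisect` stands in for it here (an assumption for illustration only). `TERMS`, `lookup_exact`, and `lookup_prefix` are hypothetical names.

```python
import bisect

# Hypothetical lexicon entries: term -> term_id.
TERMS = {"index": 0, "indexing": 1, "inverted": 2, "query": 3, "text": 4}

# Hash-based lexicon: expected O(1) exact-match lookup, no ordering.
hash_lexicon = dict(TERMS)

# Ordered lexicon: a sorted list + binary search stands in for a B+ tree.
sorted_terms = sorted(TERMS)

def lookup_exact(term):
    """Exact-match lookup in the hash-based lexicon (None if absent)."""
    return hash_lexicon.get(term)

def lookup_prefix(prefix):
    """Return all terms starting with prefix, using the ordered structure."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match
        matches.append(term)
    return matches

print(lookup_exact("query"))  # 3
print(lookup_prefix("in"))    # ['index', 'indexing', 'inverted']
```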

5
Q

What is an inverted index?

A

For each term t, it stores the list of all documents that contain t (the postings list of t)
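A minimal sketch of building an inverted index over a tiny collection, assuming documents are given as a dict of doc_id to raw text and that tokenization is a naive lowercase whitespace split. `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of doc ids that contain it.

    docs: dict of doc_id -> raw text. Tokenization is a naive lowercase
    whitespace split (an assumption for illustration).
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sorted postings lists make later merging/intersection straightforward.
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {1: "text retrieval", 2: "text indexing", 3: "query indexing"}
index = build_inverted_index(docs)
print(index["text"])      # [1, 2]
print(index["indexing"])  # [2, 3]
```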

6
Q

What is a forward index?

A

A mapping from each doc-id to the term-ids that occur in that document
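A minimal sketch of a forward index, assuming whitespace tokenization and a lexicon (plain dict) that assigns term-ids in order of first appearance. `build_forward_index` is a hypothetical name; note that it fills in the lexicon it is given as a side effect.

```python
def build_forward_index(docs, lexicon):
    """Map each doc id to the list of term ids it contains, in text order.

    lexicon: dict term -> term_id, extended in place with new terms.
    """
    forward = {}
    for doc_id, text in docs.items():
        term_ids = []
        for term in text.lower().split():
            if term not in lexicon:
                lexicon[term] = len(lexicon)  # assign next unused term id
            term_ids.append(lexicon[term])
        forward[doc_id] = term_ids
    return forward

lexicon = {}
docs = {1: "text retrieval", 2: "text indexing"}
forward = build_forward_index(docs, lexicon)
print(forward)  # {1: [0, 1], 2: [0, 2]}
print(lexicon)  # {'text': 0, 'retrieval': 1, 'indexing': 2}
```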

7
Q

What is memory-based inversion?

A

Converting doc: [term, positions] entries into term: [document, positions] entries in main memory, using the dictionary (lexicon)
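A minimal sketch of memory-based inversion, assuming the per-document data is already in RAM as doc_id -> {term: [positions]} with 0-based positions. The whole inverted structure is also built in RAM, which is why this approach only works for collections that fit in memory. `invert_in_memory` is a hypothetical name.

```python
from collections import defaultdict

def invert_in_memory(forward):
    """Turn doc_id -> {term: [positions]} into
    term -> [(doc_id, [positions]), ...], with postings sorted by doc_id."""
    inverted = defaultdict(list)
    # Visiting documents in doc_id order keeps each postings list sorted.
    for doc_id in sorted(forward):
        for term, positions in forward[doc_id].items():
            inverted[term].append((doc_id, list(positions)))
    return dict(inverted)

forward = {
    2: {"text": [0], "indexing": [1]},
    1: {"text": [0, 3], "retrieval": [1]},
}
print(invert_in_memory(forward)["text"])  # [(1, [0, 3]), (2, [0])]
```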

8
Q

What is sort-based inversion?

A