Unit 5 - Web Searching Flashcards

1
Q

Data structures used for storing indices:

A
  1. Suffix tree
  2. Inverted index
  3. Citation index
  4. N-gram index
  5. Term-document matrix
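
As an illustration of one structure from this list, here is a minimal sketch of an inverted index in Python. The function name `build_inverted_index` and the toy documents are assumptions made for this example, and tokenisation is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenisation (an assumption): lowercase, split on whitespace
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical toy collection: document id -> text
docs = {
    1: "web search engines crawl the web",
    2: "an inverted index maps terms to documents",
    3: "search engines use an inverted index",
}

index = build_inverted_index(docs)
print(index["search"])    # [1, 3]
print(index["inverted"])  # [2, 3]
```

Looking up a term is a single dictionary access, which is why this structure suits keyword queries.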
2
Q

What are indices?

A

Indices are short descriptions of each web page that may include the title, creation date, size, first line, etc.

3
Q

What is XML?

A

XML stands for Extensible Markup Language. It is used for exchanging data on the Web.

It enables the separation of content (XML) from presentation (XSL).
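
A small sketch of that separation, using Python's standard `xml.etree.ElementTree` module. The `<pages>` document, its tag names, and the `render_plain` function are all hypothetical; the function is only a stand-in for a presentation layer and is not XSL.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML content: the tags describe the data, not how it looks
xml_text = """<pages>
  <page url="http://example.com/">
    <title>Home</title>
    <size>2048</size>
  </page>
  <page url="http://example.com/contact">
    <title>Contact</title>
    <size>512</size>
  </page>
</pages>"""

root = ET.fromstring(xml_text)

def render_plain(root):
    """One possible presentation of the content (a stand-in for a stylesheet)."""
    return [
        f"{page.findtext('title')} ({page.findtext('size')} bytes)"
        for page in root.findall("page")
    ]

print(render_plain(root))  # ['Home (2048 bytes)', 'Contact (512 bytes)']
```

A different rendering function could present the same XML in another way without touching the content.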

4
Q

Who created XML?

A

The W3C (World Wide Web Consortium), to provide an easy-to-use, standardised way to store self-describing data.

5
Q

INEX 2002 defined:

A

Two relevance dimensions: component coverage and topical relevance

6
Q

Four cases in Component coverage dimension

A

Exact coverage (E)
Too small (S)
Too large (L)
No coverage (N)

7
Q

Cases in Topical relevance:

A

Highly relevant (3)
Fairly relevant (2)
Marginally relevant (1)
Non-relevant (0)

8
Q

What is a search engine?

A

A search engine is a program that helps users find information stored on computers across the World Wide Web.

9
Q

Centralised crawler-indexer architecture:

A

It is used by most search engines. Crawlers gather information to a single site, where it is indexed by the indexer.

10
Q

Components of crawler indexer architecture

A

Crawlers, indexer, query engine, user interface
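
How these components fit together can be sketched roughly as follows. This is a toy model under stated assumptions, not how any real search engine is built: the `WEB` dictionary is a hypothetical in-memory stand-in for the network, and the names `crawl`, `build_index`, and `query` are invented for this example.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links)
WEB = {
    "a.example": ("web search basics", ["b.example", "c.example"]),
    "b.example": ("search engines index pages", ["a.example"]),
    "c.example": ("crawlers gather pages", []),
}

def crawl(seed):
    """Crawler: follow links breadth-first and gather page text at one site."""
    seen, queue, store = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        store[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store

def build_index(store):
    """Indexer: build a term -> set of URLs mapping from the gathered pages."""
    index = defaultdict(set)
    for url, text in store.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def query(index, terms):
    """Query engine: return the URLs that contain every query term."""
    postings = [index.get(t, set()) for t in terms.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

store = crawl("a.example")
index = build_index(store)
print(query(index, "search"))        # ['a.example', 'b.example']
print(query(index, "search pages"))  # ['b.example']
```

The user interface is omitted here; it would simply pass the user's text to `query` and display the results.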

11
Q

Problems using crawler indexer architecture

A

Dynamic nature of the web
High load on web servers
Large volume of data
Communication link problem

12
Q

Harvest distributed crawler-indexer architecture

A

Problems it addresses:
Requests from different crawlers increase the load on web servers
Objects retrieved by the crawlers are mostly useless and discarded
No coordination among the crawlers

13
Q

Components of harvest:

A

Gatherers
Brokers
Replicator
Object cache
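
The division of labour between the first two components can be sketched as below, assuming that gatherers collect pages and extract indexing information while brokers index that information and answer queries. The classes and their methods are hypothetical illustrations, not Harvest's actual interfaces, and the replicator and object cache are left out.

```python
class Gatherer:
    """Collects pages from its assigned servers and extracts indexing information."""

    def __init__(self, pages):
        self.pages = pages  # hypothetical: URL -> page text

    def summaries(self):
        # Pass on only the extracted terms, not the whole objects
        return {
            url: sorted(set(text.lower().split()))
            for url, text in self.pages.items()
        }


class Broker:
    """Indexes the information from several gatherers and answers queries."""

    def __init__(self, gatherers):
        self.index = {}
        for gatherer in gatherers:
            for url, terms in gatherer.summaries().items():
                for term in terms:
                    self.index.setdefault(term, set()).add(url)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


# Hypothetical data: each gatherer covers a different set of servers
g1 = Gatherer({"a.example": "distributed web indexing"})
g2 = Gatherer({"b.example": "web caching and indexing"})
broker = Broker([g1, g2])

print(broker.search("indexing"))  # ['a.example', 'b.example']
print(broker.search("caching"))   # ['b.example']
```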
