Question Answering Flashcards

1
Q

What is IR based QA?

A

Information-retrieval (IR)-based question answering. It uses a large corpus (e.g. the web): given a question, relevant documents are retrieved and Machine Reading Comprehension (MRC) extracts the answer from spans of text in the corpus

2
Q

What is knowledge based QA?

A

It maps a question to a query over a knowledge base - a database of facts such as DBpedia - and runs the query to get the answer

3
Q

What are some other types of QA?

A

Long-form QA, which answers "why"-type questions with long answers

Community QA, which uses question-answer pairs from sources like Quora and Stack Overflow

4
Q

What is the concept of IR-based QA?

A

Given a question, return an answer drawn from text spans within a corpus of web documents

5
Q

What model does IR-based factoid QA use?

A

Retrieve and Read model

6
Q

What does the retrieve and read model do?

A

It retrieves documents relevant to the query from an index, then uses MRC to select the best text span from those documents as the answer

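A minimal retrieve-and-read sketch in Python, assuming the rank_bm25 and Hugging Face transformers libraries; the toy corpus and the reader checkpoint are illustrative choices, not the only options:

```python
from rank_bm25 import BM25Okapi
from transformers import pipeline

# Toy corpus standing in for an indexed web collection.
corpus = [
    "Mount Everest is Earth's highest mountain, at 8,849 metres.",
    "K2 is the second-highest mountain on Earth.",
    "The Nile is commonly regarded as the longest river in the world.",
]

# Retrieve: rank documents against the query with BM25.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
question = "How tall is Mount Everest?"
top_docs = bm25.get_top_n(question.lower().split(), corpus, n=1)

# Read: an MRC model extracts the answer span from the top document.
reader = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")
print(reader(question=question, context=top_docs[0]))
# e.g. {'answer': '8,849 metres', 'score': ..., 'start': ..., 'end': ...}
```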
7
Q

What are some MRC datasets?

A

SQuAD 1.1 and 2.0
HotpotQA
Natural Questions
TyDi QA

8
Q

How does MRC work?

A

It performs answer extraction, which is a span-labelling task. The input is a question and a passage. The model has two output classifiers over the passage tokens: one predicts whether each token is the start of the answer span, the other whether it is the end.

9
Q

How is MRC encoded?

A

It uses the standard BERT model. The question and passage are encoded together, and we take the final hidden vectors T_i from BERT for each paragraph token. A classifier (in standard BERT QA, a learned start vector S and a learned end vector E) then scores each token as the start or end point of the answer.

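A sketch of the scoring step in PyTorch, assuming we already have BERT's final hidden states for the passage tokens; S and E are the learned start/end vectors described above (random tensors here stand in for trained weights):

```python
import torch

hidden_size, seq_len = 768, 384
T = torch.randn(seq_len, hidden_size)   # final BERT hidden states, one per token
S = torch.randn(hidden_size)            # learned start vector
E = torch.randn(hidden_size)            # learned end vector

start_logits = T @ S                    # score for each token being the answer start
end_logits = T @ E                      # score for each token being the answer end

start_probs = torch.softmax(start_logits, dim=0)
end_probs = torch.softmax(end_logits, dim=0)
```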
10
Q

What loss function is used for MRC?

A

Cross-entropy loss, computed for both the start-position prediction and the end-position prediction, then combined.

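A sketch of the loss under the same setup, assuming gold start/end token positions; cross-entropy is applied to the start logits and the end logits and the two losses are averaged:

```python
import torch
import torch.nn.functional as F

seq_len = 384
start_logits = torch.randn(1, seq_len)   # batch of 1
end_logits = torch.randn(1, seq_len)
gold_start = torch.tensor([17])          # gold answer span start index
gold_end = torch.tensor([21])            # gold answer span end index

loss = (F.cross_entropy(start_logits, gold_start)
        + F.cross_entropy(end_logits, gold_end)) / 2
```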
11
Q

How do we find the best span score for MRC?

A

We take the dot product of the start vector with each token embedding, added to the dot product of the end vector with each token embedding: score(i, j) = S . T_i + E . T_j. We evaluate all valid candidate spans and take the argmax to select the best span.

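A sketch of best-span selection, reusing T, S, and E from the scoring sketch above; it also enforces start <= end (card 12) and computes the [CLS]-based null score (card 13):

```python
import torch

seq_len, hidden = 384, 768
T = torch.randn(seq_len, hidden)
S, E = torch.randn(hidden), torch.randn(hidden)

start_scores = T @ S                           # S . T_i for every i
end_scores = T @ E                             # E . T_j for every j

# score(i, j) = S . T_i + E . T_j for every candidate span (i, j)
span_scores = start_scores[:, None] + end_scores[None, :]

# Keep only valid spans where start <= end.
valid = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool))
span_scores = span_scores.masked_fill(~valid, float("-inf"))

best = torch.argmax(span_scores)
best_start, best_end = divmod(best.item(), seq_len)

# No-answer handling: the null span points at [CLS] (token 0);
# predict "no answer" if the null score beats the best span score.
null_score = start_scores[0] + end_scores[0]
```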
12
Q

What do we need to ensure with the start and end embeddings?

A

That the start is always before the end

13
Q

What happens when there are no-answer questions?

A

The model predicts a span pointing at the special [CLS] token, which acts as a proxy for "no answer"

14
Q

What happens when the passage is longer than the BERT limit?

A

BERT's input limit is 512 tokens, so a sliding 512-token window is used over longer passages/documents

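A plain-Python sketch of the sliding window, assuming an already-tokenized passage; the 512 limit is BERT's, while the 128-token overlap is a common but illustrative choice. Overlapping windows ensure an answer near a boundary appears whole in at least one window:

```python
def sliding_windows(tokens, max_len=512, overlap=128):
    """Split a long token sequence into overlapping fixed-size windows."""
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += max_len - overlap   # consecutive windows share `overlap` tokens
    return windows

# Each window is scored by the reader; the best span across windows wins.
```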
15
Q

Where is IR based factual QA likely to fail?

A

Where answers are rooted in the deep web, so databases have to be consulted.

16
Q

What is the concept of knowledge-based QA?

A

Given a question, return a DB query that can get the answer

17
Q

What type of information is knowledge-based QA good for?

A

Factual information (e.g. numerical facts)

18
Q

What are graph-based QA approaches based on?

A

They are based on relational databases or triple stores of (subject, predicate, object) triples. The first task is to perform entity linking, then relation detection/linking, and finally a database query for the answer
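
As an illustration of the final step, a query against DBpedia's triple store using the SPARQLWrapper library; the entity (dbr:Mount_Everest) and predicate (dbo:elevation) stand in for the outputs of entity and relation linking and are assumptions for the example:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
# Triple pattern: subject dbr:Mount_Everest, predicate dbo:elevation,
# object bound to the variable ?elevation (the answer).
sparql.setQuery("""
    SELECT ?elevation WHERE {
        dbr:Mount_Everest dbo:elevation ?elevation .
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["elevation"]["value"])
```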

19
Q

What is entity linking?

A

It is the task of associating a mention in text with an ontology/database entry, where the input X is the question text and the output Y is the entity text span plus its entity URI.

20
Q

What is a non-neural entity linking approach?

A

TAGME uses anchor dictionaries (Wikipedia concept URIs plus the text spans that link to each URI) combined with entity disambiguation. The probability of a page given a particular anchor is estimated from co-occurrence counts in the corpus combined with relatedness to the other candidate entities
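
A toy sketch of the anchor-dictionary idea, using hypothetical precomputed statistics; real TAGME combines the anchor's link probability with a relatedness vote over the other anchors' candidates, and the simple additive combination here is illustrative:

```python
# Hypothetical anchor dictionary: anchor text -> {candidate URI: P(page | anchor)},
# estimated from how often the anchor text links to each page in Wikipedia.
anchor_dict = {
    "jaguar": {"dbr:Jaguar": 0.6, "dbr:Jaguar_Cars": 0.4},
}

# Hypothetical relatedness scores between candidate pages and context entities.
relatedness = {
    ("dbr:Jaguar_Cars", "dbr:Land_Rover"): 0.9,
    ("dbr:Jaguar", "dbr:Land_Rover"): 0.1,
}

def disambiguate(anchor, context_entity):
    """Score candidates by commonness plus relatedness to the context."""
    candidates = anchor_dict[anchor]
    return max(
        candidates,
        key=lambda uri: candidates[uri] + relatedness.get((uri, context_entity), 0.0),
    )

print(disambiguate("jaguar", "dbr:Land_Rover"))   # -> dbr:Jaguar_Cars
```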

21
Q

What model is used for Neural Entity Linking?

A

The ELQ model

22
Q

How does the ELQ model work?

A

Two encoders are used: a question encoder and an entity encoder. There are two inputs: the question, which is passed to the question encoder, and the entity candidate's title and description, which are passed to the entity encoder.

There are two outputs, each with its own classifier. The entity mention classifier predicts, for each question token, whether it is the start of an entity mention, the end of one, or part of one; i.e. it detects entity mention spans within the question. The entity linker outputs the entity URI, the actual knowledge-base entity: it compares the entity mention embedding against the entity candidate embeddings to produce a disambiguated knowledge-base entity for each entity text span in the question
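
A simplified sketch of the ELQ-style bi-encoder in PyTorch, with random tensors standing in for the two encoders' outputs; mention embeddings here are averaged token embeddings, and the dot product plays the role of the entity linker's comparison (the real model also scores "inside" tokens and handles multiple mentions):

```python
import torch

hidden = 768
q_tokens = torch.randn(12, hidden)      # question encoder output, one vector per token
cand_entities = torch.randn(5, hidden)  # entity encoder output for 5 candidates

# Mention detection: score each token as the start / end of a mention.
w_start, w_end = torch.randn(hidden), torch.randn(hidden)
start_scores = q_tokens @ w_start
end_scores = q_tokens @ w_end
m_start = int(torch.argmax(start_scores))
m_end = int(torch.argmax(end_scores).clamp(min=m_start))  # keep start <= end

# Mention embedding: average the token embeddings inside the detected span.
mention_emb = q_tokens[m_start:m_end + 1].mean(dim=0)

# Entity linking: dot product between the mention and each candidate embedding.
link_scores = cand_entities @ mention_emb
best_entity = int(torch.argmax(link_scores))   # index of the linked KB entity
```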

23
Q

What happens once we have identified the entities with knowledge based QA?

A

Once we have identified the entity mention within the question and linked it to an entity in the knowledge base, the relation needs to be identified.

24
Q

What is neural relation detection and linking?

A

It is the task of linking the relation expressed between entities in the question to a relation in the database. The input is the question text and the output is the relation text span plus the relation URI
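
Relation linking can be sketched the same way as the entity linker above: encode the question and compare it against embeddings of the knowledge base's relations. Everything below is an illustrative assumption, not a specific published model:

```python
import torch

hidden = 768
relation_uris = ["dbo:elevation", "dbo:birthPlace", "dbo:author"]
relation_embs = torch.randn(len(relation_uris), hidden)  # one embedding per KB relation

question_emb = torch.randn(hidden)   # pooled question encoding

# Pick the KB relation whose embedding best matches the question.
scores = relation_embs @ question_emb
predicted_relation = relation_uris[int(torch.argmax(scores))]
```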