Information Extraction Flashcards

1
Q

Information extraction

A

Information extraction turns the unstructured information embedded in texts into structured data, for example by populating a relational database to enable further processing.

2
Q

Relation Extraction

A

Finding and classifying semantic relations among entities mentioned in a text.

3
Q

RDF triple

A

An entity-relation-entity tuple, also called a subject-predicate-object expression.
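
As an illustration only, a triple can be held in a plain three-field record. The class name `Triple` and its field names are hypothetical and not tied to any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A subject-predicate-object expression (illustrative, not an RDF library type)."""
    subject: str
    predicate: str
    obj: str

# Example: the entity "Golden Gate Park" stands in a "location" relation
# to the entity "San Francisco".
t = Triple("Golden Gate Park", "location", "San Francisco")
```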

4
Q

5 Classes of algorithms for relation extraction

A
  • handwritten patterns
  • supervised machine learning
  • semi-supervised via bootstrapping
  • semi-supervised via distant supervision
  • unsupervised
5
Q

Semi-supervised Relation Extraction via Bootstrapping

A

If we have a few high-precision seed patterns, or seed tuples, we can bootstrap a classifier.

Bootstrapping proceeds by taking the entities in the seed pair, and then finding sentences (e.g. on the web) that contain both entities.

From all such sentences, we extract and generalize the context around the entities to learn new patterns.
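
A minimal sketch of one bootstrapping round, under strong simplifying assumptions: the "pattern" is just the literal text between the two seed entities, and candidate entities are single capitalized words. The function names and the toy sentences are hypothetical.

```python
import re

def learn_patterns(sentences, seed_pairs):
    """Collect the text between two seed entities as a crude pattern."""
    patterns = set()
    for x, y in seed_pairs:
        for s in sentences:
            m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), s)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(sentences, patterns):
    """Find (x, y) pairs of capitalized words joined by a learned pattern."""
    pairs = set()
    for p in patterns:
        rx = re.compile(r"([A-Z]\w+)" + re.escape(p) + r"([A-Z]\w+)")
        for s in sentences:
            pairs.update(rx.findall(s))
    return pairs

# Toy corpus (made up for illustration).
sentences = [
    "Ryanair has a hub at Charleroi.",
    "Lufthansa has a hub at Frankfurt.",
]
patterns = learn_patterns(sentences, {("Ryanair", "Charleroi")})
new_pairs = apply_patterns(sentences, patterns)
```

A real system would generalize the context rather than copy it verbatim, and would score each new pattern before trusting it.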

6
Q

Semantic drift

A

In semantic drift, an erroneous pattern introduces erroneous tuples, which in turn lead to further problematic patterns, so the meaning of the extracted relations ‘drifts’.

7
Q

Relation Extraction

Confidence values in bootstrapping

A

Bootstrapping systems assign confidence values to new tuples to avoid semantic drift.

Given a document collection D, a current set of tuples, T, and a proposed pattern p, we need to track two factors:

  • hits(p): the set of tuples in T that p matches while looking in D.
  • finds(p): the total set of tuples that p finds in D.

Conf(p) = (|hits(p)| / |finds(p)|) × log(|finds(p)|)
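
The score weighs a pattern's precision, |hits(p)| / |finds(p)|, by the log of how many tuples it finds. A small sketch of that arithmetic follows. The natural logarithm is an assumption (the card does not state a base), and the function name and toy tuples are hypothetical.

```python
import math

def pattern_confidence(hits, finds):
    """Conf(p) = |hits(p)| / |finds(p)| * log(|finds(p)|), natural log assumed."""
    if not finds:
        return 0.0
    return len(hits) / len(finds) * math.log(len(finds))

# Toy data: the pattern finds four tuples, two of which are already in T.
T = {("Ryanair", "Charleroi"), ("Lufthansa", "Frankfurt")}
finds = T | {("Sydney", "Circular"), ("Budget", "Ryanair")}
hits = finds & T
conf = pattern_confidence(hits, finds)  # 2/4 * log(4)
```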

8
Q

Distant Supervision for Relation Extraction

A

Distant supervision combines the advantages of bootstrapping with supervised learning.

Instead of just a handful of seeds, distant supervision uses a large database to acquire a huge number of seed examples, creates lots of noisy pattern features from all these examples, and then combines them in a supervised classifier.
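
A rough sketch of the labeling step, assuming a tiny in-memory knowledge base and using the words between the two entities as the only feature. The function name, the relation label, and the toy sentences are made up for illustration, and the supervised classifier that would consume these examples is not shown.

```python
def noisy_examples(sentences, kb):
    """Label every sentence that mentions both entities of a KB fact with that fact's relation."""
    examples = []
    for subj, rel, obj in kb:
        for s in sentences:
            i, j = s.find(subj), s.find(obj)
            if i != -1 and j > i:
                # Feature: the words between the two entity mentions.
                between = s[i + len(subj):j].strip()
                examples.append((between, rel))
    return examples

kb = [("Edwin Hubble", "place-of-birth", "Marshfield")]
sentences = [
    "Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble returned to Marshfield.",  # matches the KB pair but is a noisy label
    "Albert Einstein was born in Ulm.",       # no KB fact, so no example
]
examples = noisy_examples(sentences, kb)
```

The second sentence shows where the noise comes from: it contains both entities but does not express the relation, yet it still receives the label.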

9
Q

Unsupervised Relation Extraction

Open Information Extraction

A

The task of extracting relations from the web when there is no labeled training data and not even a predefined list of relations.

10
Q

Open Information Extraction

ReVerb 4 Steps

A
  1. Run a part-of-speech tagger and entity chunker over the sentence s.
  2. For each verb in s, find the longest sequence of words w that starts with a verb and satisfies syntactic and lexical constraints, merging adjacent matches.
  3. For each phrase w, find the nearest noun phrase x to the left which is not a relative pronoun, wh-word or existential “there”. Find the nearest noun phrase y to the right.
  4. Assign confidence c to the relation r = (x, w, y) using a confidence classifier and return it.
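
Steps 2 and 3 can be sketched on input that is already tagged and chunked. This is a heavy simplification and not the ReVerb implementation: the one-letter tag set, the regular expression, and the function name are assumptions, and the lexical constraints, the merging of adjacent matches, the pronoun filter, and the confidence classifier are all omitted.

```python
import re

# Assumed one-letter tags: N = noun-phrase chunk, V = verb, P = preposition,
# W = other word inside a relation phrase, O = anything else.
# Simplified syntactic constraint: a verb, optionally followed by other
# words and ending in a preposition.
RELATION = re.compile(r"V(?:[WN]*P)?")

def extract(chunks):
    """chunks: list of (text, tag) pairs. Returns (x, w, y) triples."""
    tags = "".join(tag for _, tag in chunks)
    triples = []
    for m in RELATION.finditer(tags):
        left = tags.rfind("N", 0, m.start())   # nearest noun phrase to the left
        right = tags.find("N", m.end())         # nearest noun phrase to the right
        if left != -1 and right != -1:
            w = " ".join(text for text, _ in chunks[m.start():m.end()])
            triples.append((chunks[left][0], w, chunks[right][0]))
    return triples

chunks = [("United", "N"), ("has", "V"), ("a hub", "N"), ("in", "P"), ("Chicago", "N")]
triples = extract(chunks)
```

Because every chunk maps to exactly one tag character, match offsets in the tag string double as indices into `chunks`.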
11
Q

Temporal expressions

A

Expressions that refer to absolute points in time, relative times, durations and sets of those.

Absolute temporal expressions can be mapped directly to calendar dates, times of day, or both.

Relative temporal expressions map to particular times through some other reference point.

Durations denote spans of time at varying levels of granularity.

12
Q

Temporal Normalization

A

The process of mapping a temporal expression to either a specific point in time, or to a duration.
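
A minimal sketch of normalizing a relative expression, assuming the document's date is known and serves as the anchor. The three-word lookup table and the function name are hypothetical, and the ISO 8601 output format is a choice made for this example.

```python
from datetime import date, timedelta

# Day offsets for a few relative expressions (illustrative, far from complete).
OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}

def normalize(expression, anchor):
    """Map a relative day expression to an ISO 8601 date, given an anchor date."""
    return (anchor + timedelta(days=OFFSETS[expression.lower()])).isoformat()

normalized = normalize("yesterday", date(2007, 7, 3))  # "2007-07-02"
```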

13
Q

Fully qualified date expression

A

Contains a year, month and day in some conventional form.
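
One way to test whether a string is fully qualified is to try a few conventional formats. The format list below is an assumption for illustration (including the US month/day order), the function name is hypothetical, and month names are parsed according to the process locale, assumed here to be English.

```python
from datetime import datetime

# A few conventional date forms (illustrative, not exhaustive).
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"]

def parse_full_date(text):
    """Return an ISO 8601 date if text has year, month and day in a known form, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

An expression such as "April 1916" lacks a day, so it matches none of the formats and the function returns `None`.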

14
Q

Event Extraction

A

The task of identifying mentions of events in texts.

15
Q

7 Allen Relations

A

A before B
A overlaps B
A meets B
A equals B
A starts B
A finishes B
A during B
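
The seven relations can be told apart by comparing interval endpoints. The sketch below assumes each interval is a (start, end) pair with start < end. The function name is hypothetical, and it returns `None` when only an inverse relation (for example, B before A) holds.

```python
def allen_relation(a, b):
    """Return which of the seven listed relations holds from interval a to interval b."""
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"
    if a1 == b0:
        return "meets"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0 and a1 < b1:
        return "starts"
    if a1 == b1 and a0 > b0:
        return "finishes"
    if a0 > b0 and a1 < b1:
        return "during"
    if a0 < b0 < a1 < b1:
        return "overlaps"
    return None  # only an inverse relation holds
```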
