C6 Flashcards

1
Q

information extraction (IE) with applications

A

discover structured information from unstructured or semi-structured text

applications: automatically identify mentions of medications and side effects in electronic health records, find company names in economic newspaper texts

2
Q

2 types of information extraction tasks

A
  • Named Entity Recognition (NER)
  • Relation extraction
3
Q

Named Entity Recognition

A

machine learning task based on sequence labelling:
- word order matters
- one entity can span multiple words
- multiple ways to refer to the same concept

=> extracted entities often need to be linked to a standard form

4
Q

sequence labelling for NER

A
  • sequence = sentence, element = word, label = entity type
  • one label per token
  • assigned tags capture both the boundary and the type
5
Q

IOB tagging

A

format of the training data:
- each token gets a label (punctuation marks are separate tokens and are labelled too)
- a beginning (B) and an inside (I) tag for each entity type
- and a single outside (O) tag for tokens that are not part of any entity

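A minimal sketch of what IOB-tagged training data looks like, using a made-up example sentence (the entity types PER and ORG are common conventions, not from this deck):

```python
# IOB-tagged example: each token gets exactly one tag.
# B-<TYPE> opens an entity, I-<TYPE> continues it, O marks non-entity tokens.
tokens = ["Tim", "Wagner", "works", "for", "American", "Airlines", "."]
tags   = ["B-PER", "I-PER", "O",     "O",   "B-ORG",    "I-ORG",    "O"]

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Note that the punctuation token `.` is labelled too (as O), and the two-word entities show why the B/I distinction is needed to mark boundaries.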
6
Q

Hidden Markov Model (HMM)

A

probabilistic sequence model: given a sequence of units (words), it computes a probability distribution over possible sequences of labels and chooses the best label sequence

probabilities are estimated by counting on a labelled training corpus

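A toy sketch of both halves of the card, with a hypothetical three-sentence labelled corpus: transition and emission probabilities are estimated by counting, and the best label sequence is found by Viterbi decoding:

```python
from collections import Counter, defaultdict

# Hypothetical labelled training corpus: (word, tag) pairs per sentence.
corpus = [
    [("United", "B-ORG"), ("rose", "O")],
    [("United", "B-ORG"), ("Airlines", "I-ORG"), ("rose", "O")],
    [("shares", "O"), ("rose", "O")],
]

# Estimate probabilities by counting on the labelled corpus.
trans = defaultdict(Counter)   # counts for P(tag_i | tag_{i-1})
emit = defaultdict(Counter)    # counts for P(word | tag)
for sent in corpus:
    prev = "<s>"
    for word, tag in sent:
        trans[prev][tag] += 1
        emit[tag][word] += 1
        prev = tag

def prob(counter, key):
    total = sum(counter.values())
    return counter[key] / total if total else 1e-6  # avoid division by zero

def viterbi(words, tagset):
    # best[i][t] = (score, backpointer) of the best path ending in tag t at i
    best = [{t: (prob(trans["<s>"], t) * prob(emit[t], words[0]), None)
             for t in tagset}]
    for i in range(1, len(words)):
        col = {}
        for t in tagset:
            score, back = max(
                (best[i - 1][p][0] * prob(trans[p], t) * prob(emit[t], words[i]), p)
                for p in tagset
            )
            col[t] = (score, back)
        best.append(col)
    # backtrack from the highest-scoring final tag
    t = max(best[-1], key=lambda t: best[-1][t][0])
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = best[i][t][1]
        path.append(t)
    return path[::-1]

print(viterbi(["United", "Airlines", "rose"], ["B-ORG", "I-ORG", "O"]))
# -> ['B-ORG', 'I-ORG', 'O']
```

This is unsmoothed maximum-likelihood estimation on a tiny corpus; a real HMM tagger would add smoothing for unseen words and transitions.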
7
Q

feature-based NER

A

supervised learning:
- each word x_i at position i is represented by a feature vector describing x_i and its context

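A minimal sketch of such a feature vector for token x_i, using the word itself, simple shape features, and a one-word context window (the feature names are hypothetical; real systems use many more features):

```python
# Build a feature dict for the token at position i, describing the token
# itself and its immediate context.
def word_features(sentence, i):
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalization hints at entities
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        "prev.lower": sentence[i - 1].lower() if i > 0 else "<s>",
        "next.lower": sentence[i + 1].lower() if i < len(sentence) - 1 else "</s>",
    }

sent = ["United", "is", "a", "unit", "of", "UAL", "Corp", "."]
print(word_features(sent, 0))
```

Any supervised classifier (or a CRF, see the later card) can then be trained on these vectors with the IOB tags as targets.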
8
Q

Part-Of-Speech tagging

A

Part-of-speech (POS) = category of words that have similar grammatical properties
- noun, verb, adjective, adverb
- pronoun, preposition, conjunction, determiner

9
Q

Conditional Random Fields (CRF)

A

It is hard to add rich features directly to generative models like HMMs => use a more powerful model: the CRF

  • discriminative undirected probabilistic graphical model
  • can take rich representations of observations (feature vectors)
  • takes previous labels and context observations into account
  • optimizes the sequence as a whole. The probability of the best sequence is computed by the Viterbi algorithm
10
Q

commonly used neural sequence model for NER

A

bi-LSTM-CRF:
- LSTM = Long Short-Term Memory, a recurrent neural network (RNN) architecture
- a bi-LSTM processes the sequence in both directions (left-to-right and right-to-left)

But for NER, a softmax output over tags is insufficient, because we need hard constraints between neighbouring tags (an I tag must follow an I or B tag) => add a CRF layer on top of the bi-LSTM output

11
Q

normalization of extracted mentions

A

suppose we have to extract company names and stock market info in newspaper text -> multiple extracted mentions can refer to the same concept

in order to normalize these, we need a list of concepts:
- knowledge bases (IMDb, Wikipedia)
- ontology

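The simplest form of normalization can be sketched as a lookup in a hand-built alias table (the table below is made up; in practice the canonical forms come from a knowledge base or ontology):

```python
# Hypothetical alias table mapping surface mentions to canonical concepts.
aliases = {
    "united": "United Airlines",
    "united airlines": "United Airlines",
    "ual": "UAL Corp.",
    "ual corp.": "UAL Corp.",
}

def normalize(mention):
    # Fall back to the raw mention when it is not in the table.
    return aliases.get(mention.lower().strip(), mention)

print(normalize("United"))     # -> United Airlines
print(normalize("UAL Corp.")) # -> UAL Corp.
```

Exact lookup breaks down for unseen variants and misspellings, which is why the next card considers classification and embedding-similarity approaches.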
12
Q

ontology linking approaches

A
  1. Define it as a text classification task with the ontology items as labels. Challenges: the label space is huge and we don’t have training data for all items
  2. Define it as a term similarity task: use embeddings trained for synonym detection
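The term similarity approach can be sketched as picking the ontology item whose embedding is closest to the mention's embedding by cosine similarity. The 3-d vectors and the medical terms below are made up for illustration (real embeddings would come from a model trained for synonym detection):

```python
import math

# Hypothetical ontology terms with toy embedding vectors.
ontology = {
    "myocardial infarction": [0.9, 0.1, 0.0],
    "headache": [0.0, 0.8, 0.3],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def link(mention_vec):
    # Return the ontology item most similar to the mention embedding.
    return max(ontology, key=lambda term: cosine(mention_vec, ontology[term]))

# A mention like "heart attack" would ideally embed near "myocardial infarction".
print(link([0.85, 0.2, 0.1]))
```

This sidesteps the huge-label-space problem of the classification formulation: no per-item training data is needed, only good embeddings.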
13
Q

give a relation extraction example and three possible methods

A

example relations: Tim Wagner is a spokesman for American Airlines; United is a unit of UAL Corp.

methods:
1. Co-occurrence based
2. Supervised learning (most reliable)
3. Distant supervision (if labelled data is limited)

14
Q

co-occurrence based relation extraction

A

assumption: entities that frequently co-occur are semantically connected

  • use a context window (e.g. sentence) to determine co-occurrence
  • we can create a network structure based on this
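A minimal sketch of the idea: count how often entity pairs share a sentence (the context window), and use the counts as edge weights of the entity network. The per-sentence entity lists stand in for hypothetical NER output:

```python
from collections import Counter
from itertools import combinations

# Hypothetical NER output: entities found in each sentence.
sentences = [
    ["Tim Wagner", "American Airlines"],
    ["United", "UAL Corp."],
    ["Tim Wagner", "American Airlines"],
]

# Count every unordered entity pair that co-occurs within a sentence.
pair_counts = Counter()
for entities in sentences:
    for a, b in combinations(sorted(set(entities)), 2):
        pair_counts[(a, b)] += 1

print(pair_counts.most_common(1))
```

Frequently co-occurring pairs (here, Tim Wagner and American Airlines) are assumed to be semantically connected; the method says nothing about *which* relation holds.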
15
Q

supervised relation extraction

A

assumption: two entities, one relation

relation extraction as a classification problem:
1. Find pairs of named entities (usually in the same sentence).
2. Apply a relation classifier to each pair. The classifier can use any supervised technique.

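The two-step setup can be sketched as follows; the keyword rule standing in for the classifier is purely illustrative (in practice any trained supervised model, e.g. over feature vectors of the entity pair and its context, would be used):

```python
from itertools import combinations

# Stand-in "classifier": a trivial keyword rule on the text between the
# two entities, in place of a real trained supervised model.
def classify_relation(sentence, e1, e2):
    between = sentence.split(e1)[-1].split(e2)[0]
    if "spokesman for" in between:
        return "employed-by"
    if "unit of" in between:
        return "part-of"
    return "no-relation"

sentence = "Tim Wagner is a spokesman for American Airlines"
entities = ["Tim Wagner", "American Airlines"]  # step 1: NER output

# Step 2: classify every entity pair in the sentence.
for e1, e2 in combinations(entities, 2):
    print(e1, "-", classify_relation(sentence, e1, e2), "-", e2)
```

The relation labels (`employed-by`, `part-of`) are hypothetical; a real system would use a fixed inventory of relation types plus a `no-relation` class.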
16
Q

distant supervision relation extraction

A

Suppose we don’t have labelled data for relation extraction, but we do have a knowledge base => How could you use the knowledge base to identify relations in the text and discover relations that are not yet in the knowledge base?

  1. Start with a large, manually created knowledge base (e.g. IMDB)
  2. Find occurrences of pairs of related entities from the database in sentences
    - Assumption: If two entities participate in a relation, any sentence that contains these entities expresses that relation
  3. Train a Relation Extraction classifier (supervised) on the found entities and their context
  4. Apply the classifier to sentences with yet unconnected other entities in order to find new relations
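Steps 1-2 above can be sketched as automatically labelling sentences with relations from a mini knowledge base; the KB and sentences below are made up for illustration:

```python
# Hypothetical mini knowledge base: related entity pairs and their relation.
kb = {("United", "UAL Corp."): "unit-of"}

sentences = [
    "United is a unit of UAL Corp.",
    "United flights were delayed on Monday.",
    "Continental is a unit of UAL Corp.",  # pair not in the KB yet
]

# Any sentence containing a KB-related pair becomes a (noisy) training example.
training_data = []
for sent in sentences:
    for (e1, e2), relation in kb.items():
        if e1 in sent and e2 in sent:
            training_data.append((sent, e1, e2, relation))

print(training_data)
```

A classifier trained on such examples (step 3) could then label the third sentence and add ("Continental", "UAL Corp.", "unit-of") to the knowledge base (step 4). The labels are noisy because the assumption in step 2 does not always hold, e.g. the second sentence would wrongly count as evidence if it happened to contain both entities.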
17
Q

what is a named entity?

A

A named entity is a sequence of words that designates some real-world entity (typically a name)

18
Q

challenges of NER

A
  • ambiguity of segmentation (where are the boundaries of an entity?)
  • type ambiguity (JFK can refer to the president or the airport)
  • shift of meaning (“president of the US” changes from Obama to Trump)
19
Q

why do we need knowledge bases or ontologies for NER?

A

multiple extracted mentions can refer to the same concept, so we want to normalize these and need a list of concepts

20
Q

relation extraction

A

identify semantic relations between entities mentioned in the text

example relations: Tim Wagner is a spokesman for American Airlines; United is a unit of UAL Corp.

methods:
1. Co-occurrence based
2. Supervised learning (most reliable)
3. Distant supervision (if labelled data is limited)

21
Q

why are POS tags informative for NER?

A

some word categories are more likely to be (part of) an entity