Sequence Labelling for Parts of Speech and Named Entities Flashcards

1
Q

Part-of-speech tagging

A

Taking a sequence of words and assigning each word a part of speech like NOUN or VERB

2
Q

Named entity recognition

A

Assigning words or phrases tags like PERSON, LOCATION or ORGANIZATION.
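
For example, in the illustrative sentence "Jane Villanueva of United Airlines spoke in Chicago", a tagger should label Jane Villanueva as PERSON, United Airlines as ORGANIZATION, and Chicago as LOCATION.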

3
Q

Sequence labelling tasks

A

Tasks in which we assign each item in an input sequence, xᵢ, a label yᵢ, so that the output sequence Y has the same length as the input sequence X.
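
A minimal illustration in Python, using POS tagging as the labelling task (tags here follow the Universal Dependencies tagset):

    # A toy sequence-labelling pair: one label per input item.
    X = ["Janet", "will", "back", "the", "bill"]
    Y = ["PROPN", "AUX", "VERB", "DET", "NOUN"]
    assert len(X) == len(Y)  # output sequence Y matches the input length
    print(list(zip(X, Y)))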

4
Q

2 Categories of Parts of Speech

Closed class

A

Parts of speech with relatively fixed membership, such as prepositions. New prepositions are rarely coined.

5
Q

2 Categories of Parts of Speech

Open class

A

Parts of speech that readily accept new members, such as nouns and verbs. New nouns and verbs are continually being created or borrowed.

6
Q

Categories of Parts of Speech

4 Major open classes

A
  • nouns
  • verbs
  • adjectives
  • adverbs
  • (smaller open class of) interjections
7
Q

Count nouns vs mass nouns

A

Count nouns can occur in the singular and plural (goat / goats, relationship / relationships) and can be counted.

Mass nouns are used when something is conceptualized as a homogeneous group (snow, salt, communism).

8
Q

Proper nouns

A

Names of specific persons or entities.

9
Q

Verbs

A

Refer to actions and processes.

10
Q

Adjectives

A

Often describe properties or qualities of nouns.

11
Q

Adverbs

A

Adverbs generally modify something.

Directional adverbs or locative adverbs specify the direction or location of some action. (home, here, downhill)

Degree adverbs specify the extent of some action, process or property (extremely, very, somewhat).

Manner adverbs describe the manner of some action or process (slowly, slinkily, delicately).

Temporal adverbs describe the time that some action took place (yesterday, Monday).

12
Q

Particle

A

Resembles a preposition or an adverb and is used in combination with a verb.

13
Q

Conjunctions

A

Join two phrases, clauses or sentences.

Coordinating conjunctions - and, or, but - join two elements of equal status.

Subordinating conjunctions are used when one of the elements has some embedded status.

14
Q

4 Common Named Entity types

A

PER (person)
LOC (location)
ORG (organization)
GPE (geo-political entity)

15
Q

Markov chain

A

A model that tells us something about the probabilities of sequences of random variables - states - each of which can take on values from some set.

These states can be words, or tags, or symbols representing anything, e.g. the weather.

A Markov chain makes a very strong assumption that if we want to predict the future in the sequence, all that matters is the current state. All states before the current state have no impact on the future except via the current state.
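
Formally, for a state sequence q₁, q₂, …, qᵢ the Markov assumption can be written as:

P(qᵢ | q₁, q₂, …, qᵢ₋₁) = P(qᵢ | qᵢ₋₁)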

16
Q

3 Components of a Markov model

A

Q = q₁, q₂, … qₙ. A set of n states

A = a₁₁, a₁₂, …, aₙₙ. A transition probability matrix, each aᵢⱼ representing the probability of moving from state i to state j, with Σⱼ aᵢⱼ = 1 for every state i.

π = π₁, π₂, …, πₙ. An initial probability distribution over states. πᵢ is the probability that the Markov chain will start in state i.
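
A minimal sketch of these components in Python, using a toy two-state weather chain (the states and probabilities are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    Q = ["HOT", "COLD"]             # the set of states
    A = np.array([[0.7, 0.3],       # a_ij = P(next state j | current state i)
                  [0.4, 0.6]])
    pi = np.array([0.5, 0.5])       # initial distribution over states

    # pi and each row of A are probability distributions.
    assert np.isclose(pi.sum(), 1.0) and np.allclose(A.sum(axis=1), 1.0)

    # Sample a sequence: each next state depends only on the current state.
    state = rng.choice(len(Q), p=pi)
    states = [Q[state]]
    for _ in range(4):
        state = rng.choice(len(Q), p=A[state])
        states.append(Q[state])
    print(states)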

17
Q

Hidden Markov Model

A

A hidden Markov model (HMM) allows us to talk about both observed events (like words in the input) and hidden events (like POS tags) that we think of as causal factors in our probabilistic model.

18
Q

5 Components of an HMM

A

Q = q₁, q₂, … qₙ. A set of n states

A = a₁₁, a₁₂, …, aₙₙ. A transition probability matrix, each aᵢⱼ representing the probability of moving from state i to state j, with Σⱼ aᵢⱼ = 1 for every state i.

O = o₁, o₂, … oₜ. A sequence of t observations, each one drawn from a vocabulary V.

B = bᵢ(oₜ). A sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation oₜ being generated from a state qᵢ.

π = π₁, π₂, …, πₙ. An initial probability distribution over states. πᵢ is the probability that the Markov chain will start in state i.
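
A minimal sketch of these components for a toy two-tag POS model in Python (the states, vocabulary, and probabilities are all illustrative):

    import numpy as np

    Q = ["DET", "NOUN"]                  # hidden states (tags)
    V = ["the", "dog", "bill"]           # vocabulary of observations
    A = np.array([[0.1, 0.9],            # a_ij = P(tag j follows tag i)
                  [0.4, 0.6]])
    B = np.array([[0.90, 0.05, 0.05],    # b_i(o) = P(word o | tag i): emissions
                  [0.00, 0.50, 0.50]])
    pi = np.array([0.8, 0.2])            # distribution over the first tag

    # Generate (word, tag) pairs: tags move as a Markov chain and each
    # hidden tag emits an observed word.
    rng = np.random.default_rng(1)
    tag = rng.choice(len(Q), p=pi)
    tags, words = [tag], [rng.choice(len(V), p=B[tag])]
    for _ in range(3):
        tag = rng.choice(len(Q), p=A[tag])
        tags.append(tag)
        words.append(rng.choice(len(V), p=B[tag]))
    print([(V[w], Q[t]) for w, t in zip(words, tags)])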

19
Q

HMM tagging as decoding

A

Given as input an HMM λ = (A, B) and a sequence of observations O = o₁, o₂, …, oₜ, find the most probable sequence of states Q = q₁, q₂, …, qₜ.
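
With the HMM's independence assumptions, this is the state sequence that maximizes the product of emission and transition probabilities:

Q̂ = argmax over Q of P(Q | O) = argmax over Q of ∏ᵢ P(oᵢ | qᵢ) · P(qᵢ | qᵢ₋₁)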

20
Q

2 common approaches to sequence modelling

A
  • A generative approach: HMM tagging
  • A discriminative approach: CRF tagging
21
Q

How are the probabilities in HMM taggers estimated

A

By maximum likelihood estimation on tag-labeled training corpora.

The Viterbi algorithm is used for decoding, finding the most likely tag sequence.
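
For example, the transition and emission probabilities are ratios of counts in the training corpus: P(tᵢ | tᵢ₋₁) = C(tᵢ₋₁, tᵢ) / C(tᵢ₋₁) and P(wᵢ | tᵢ) = C(tᵢ, wᵢ) / C(tᵢ).

A minimal Viterbi sketch in Python (probabilities kept in ordinary space for brevity; real taggers work in log space; the array layout matches the A, B, π components above):

    import numpy as np

    def viterbi(obs, A, B, pi):
        """Return the most probable state sequence for observation indices obs."""
        n_states, T = A.shape[0], len(obs)
        v = np.zeros((T, n_states))         # v[t, s]: best path probability
        bp = np.zeros((T, n_states), int)   # backpointers
        v[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            for s in range(n_states):
                scores = v[t - 1] * A[:, s] * B[s, obs[t]]
                bp[t, s] = scores.argmax()
                v[t, s] = scores.max()
        # Trace backpointers from the best final state.
        path = [int(v[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(bp[t, path[-1]]))
        return path[::-1]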

22
Q

Conditional Random Fields

A

A log-linear model trained to choose the best tag sequence given an observation sequence, based on features that condition on the output tag, the previous output tag, the entire input sequence, and the current timestep.

They use the Viterbi algorithm for inference, to choose the best sequence of tags, and a version of the Forward-Backward algorithm for training.
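
A minimal sketch of the linear-chain CRF scoring idea in Python (the feature templates and weights are illustrative, and the normalizing constant Z(X) is omitted):

    # Score a tag sequence Y for input X as the sum over positions of
    # weighted features of (previous tag, current tag, whole input, position).
    def features(y_prev, y, X, t):
        return [
            f"tag={y}",
            f"bigram={y_prev}->{y}",
            f"word={X[t]}|tag={y}",
            f"suffix3={X[t][-3:]}|tag={y}",
        ]

    def score(X, Y, weights):
        total, y_prev = 0.0, "<s>"
        for t, y in enumerate(Y):
            total += sum(weights.get(f, 0.0) for f in features(y_prev, y, X, t))
            y_prev = y
        return total  # P(Y|X) = exp(score) / Z(X), with Z summing over all Y'

    w = {"bigram=<s>->DET": 1.0, "word=the|tag=DET": 2.0}
    print(score(["the", "bill"], ["DET", "NOUN"], w))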
