Week 5 - Word Sense Disambiguation Flashcards

1
Q

Word Sense

A

One of the meanings of a word in a linguistic context

2
Q

Word Sense Disambiguation

A

(WSD) is the NLP task of selecting which sense of a word is used in a given piece of text (e.g. a sentence) from a set of multiple known possibilities

3
Q

WSD applications

A

Machine translation - lexical choices for words that need different translations for different senses

Information Retrieval - Search choices for queries that are relevant to different topics for different senses

Bioinformatics - Assign a species identifier (e.g. human, mouse) to gene and gene-product entities (e.g. proteins)

Medical Applications - Find the correct meaning of acronyms in clinical text

4
Q

Typical WSD approaches

A

Knowledge-based
- Use external lexical resources such as dictionaries and thesauri

Supervised machine learning
- Use labelled training examples

5
Q

Lesk Algorithm

A

Examine the definition overlap for all possible sense combinations among all the words in a given text

Implementation:
- Retrieve from the dictionary all sense definitions of the words in the given piece of text
- Calculate the definition overlaps for all possible sense configurations
- Choose the senses that offer the highest overlap

Disadvantage:
Very impractical for long sentences

Disambiguating all words in the sentence requires examining m1 × m2 × m3 × … × mn sense combinations, where mi is the number of senses of the i-th word
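The exhaustive search can be sketched in a few lines. The two-word dictionary below is hypothetical, for illustration only:

```python
from itertools import product

def overlap(def_a, def_b):
    """Count words shared by two sense definitions."""
    return len(set(def_a.split()) & set(def_b.split()))

def lesk_all_words(words, senses):
    """Score every sense combination by its total pairwise definition
    overlap; the search space has m1 x m2 x ... x mn combinations,
    which is what makes full Lesk impractical for long sentences."""
    best_combo, best_score = None, -1
    for combo in product(*(senses[w] for w in words)):
        score = sum(overlap(a, b)
                    for i, a in enumerate(combo)
                    for b in combo[i + 1:])
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo

# Hypothetical two-word dictionary, for illustration only
senses = {
    "play": ["perform music on an instrument", "a recreational activity"],
    "bass": ["low frequency sound in music", "freshwater fish"],
}
print(lesk_all_words(["play", "bass"], senses))
```

Here the shared word "music" makes the instrument/sound combination win.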

6
Q

Simplified Lesk Algorithm

A

A faster version of Lesk for longer sentences

Examines the overlap between each sense definition of a word and its current context (the given sentence), rather than between all sense pairs

Disambiguating all words in the sentence takes only m1 + m2 + m3 + … + mn comparisons, where mi is the number of senses of the i-th word
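A minimal sketch of the simplified algorithm, using a hypothetical mini-dictionary:

```python
def simplified_lesk(word, sentence, senses):
    """Pick the sense of `word` whose definition overlaps most with
    the words of the surrounding sentence; fall back to the first
    listed sense when nothing overlaps."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = senses[word][0], 0
    for definition in senses[word]:
        overlap = len(set(definition.lower().split()) & context)
        if overlap > best_overlap:
            best_sense, best_overlap = definition, overlap
    return best_sense

# Hypothetical mini-dictionary, for illustration only
senses = {"bass": ["low frequency sound in music",
                   "freshwater fish prized by anglers"]}
print(simplified_lesk("bass", "he plays music with a low bass sound", senses))
# -> "low frequency sound in music" (overlap: low, sound, music)
```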

7
Q

Corpus Lesk approach

A

Enhance performance using labelled data:

- Add labelled examples (sentences tagged with the correct sense) to the sense definitions

- Weight each overlapping word, e.g. by its inverse document frequency (IDF), so that rare, informative words overlapping between the target sentence and the sense definition count more than common ones
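A sketch of the IDF weighting, assuming a tiny hypothetical background corpus of word sets:

```python
import math

def idf_weights(documents):
    """idf(w) = log(N / df(w)) over a background corpus: rare,
    informative words get high weight; ubiquitous words get ~0."""
    n = len(documents)
    vocab = {w for doc in documents for w in doc}
    return {w: math.log(n / sum(w in doc for doc in documents))
            for w in vocab}

def weighted_overlap(context, definition, idf):
    """Corpus-Lesk score: sum the IDF of each overlapping word
    instead of simply counting overlaps."""
    return sum(idf.get(w, 0.0) for w in set(context) & set(definition))

# Toy background corpus (hypothetical, for illustration)
docs = [{"the", "bass", "guitar"}, {"the", "fish"}, {"the", "music", "sound"}]
idf = idf_weights(docs)
print(weighted_overlap({"the", "bass"}, {"the", "bass", "sound"}, idf))
# -> log(3) ≈ 1.0986; "the" appears in every document, so it contributes 0
```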

8
Q

Supervised machine learning - goal

A

To predict the output for an input data pattern

9
Q

Training examples

A

A set of example data patterns are provided, where the ground-truth output is known for each example

10
Q

Predictive mapping

A

A mapping from an input data pattern to the desired output, built from training examples

11
Q

Annotated training corpus

A

A collection of training examples

12
Q

Classification

A

Assign an input data pattern to one of a pre-defined set of classes (categorical output)

13
Q

Converting WSD to classification

A

An input data pattern: a word in context

Pre-defined set of classes: dictionary senses (called the tag set)

Training corpus: A collection of words tagged in context with their sense

One option is to train one classifier to identify the senses of one word; N ambiguous words in the dictionary then require building N classifiers

14
Q

Building a WSD classifier

A

Given an annotated corpus

Find a way to characterise each word pattern (along with its context) with a set of features (feature extraction)

With existing tools:
Choose a classifier (classification algorithm)

Train the classifier using the training examples

Test the trained classifier using new examples (evaluation)

15
Q

Bag of word features (WSD)

Example:
“An electric guitar and bass player stand off to one side not really part of the scene”
Using a ±2 window, what is the set of features for “bass”?

A

Based on words occurring anywhere within a window of the target word

Consider frequency (occurrence counts)

Answer: {guitar, and, player, stand}
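The window extraction can be sketched as follows (whitespace tokenisation and taking the first occurrence of the target are simplifying assumptions):

```python
def window_features(tokens, target, k=2):
    """Bag-of-words features: the words within +/-k positions of the
    (first occurrence of the) target word, excluding the target itself."""
    i = tokens.index(target)
    return set(tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k])

sentence = ("An electric guitar and bass player stand off to one side "
            "not really part of the scene")
print(window_features(sentence.lower().split(), "bass"))
# -> {'guitar', 'and', 'player', 'stand'} (set display order may vary)
```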

16
Q

Feature vector

Example:
[fishing, sound, player, fly, rod, pound, stand, runs, guitar, band]

Convert {and, guitar, player, stand} into a feature vector

A

[0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
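The conversion is a simple vocabulary lookup; a minimal sketch:

```python
def to_feature_vector(features, vocabulary):
    """Binary bag-of-words vector: 1 if the vocabulary word appears
    in the extracted feature set, else 0. Words outside the
    vocabulary (here "and") are simply dropped."""
    return [1 if word in features else 0 for word in vocabulary]

vocabulary = ["fishing", "sound", "player", "fly", "rod",
              "pound", "stand", "runs", "guitar", "band"]
print(to_feature_vector({"and", "guitar", "player", "stand"}, vocabulary))
# -> [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
```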

17
Q

Naive bayes classifier for WSD

Do Question 4 of the Week 5 homework sheet

A

See Week 5 homework solution sheet
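The homework solution is not reproduced here, but a generic naive Bayes WSD sketch with add-one smoothing, on hypothetical toy data, looks like:

```python
from collections import Counter, defaultdict
import math

def train_nb(examples):
    """examples: list of (context_words, sense). Collect the counts
    needed for p(sense) and p(word | sense)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def predict(words, sense_counts, word_counts, vocab):
    """argmax over senses of log p(s) + sum_w log p(w|s),
    with add-one smoothing on the word likelihoods."""
    total = sum(sense_counts.values())
    best, best_lp = None, float("-inf")
    for sense, sc in sense_counts.items():
        n = sum(word_counts[sense].values())
        lp = math.log(sc / total)
        for w in words:
            lp += math.log((word_counts[sense][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

# Toy labelled corpus (hypothetical senses, for illustration)
examples = [
    (["guitar", "play", "band"], "bass/music"),
    (["play", "loud", "music"], "bass/music"),
    (["caught", "fish", "river"], "bass/fish"),
]
print(predict(["play", "guitar"], *train_nb(examples)))  # -> bass/music
```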

18
Q

Classifier

A

A chosen classifier can compute (x: input feature vector, y: class):
- a discriminant function f(x)
- posterior probabilities p(y|x)
- joint probabilities p(x, y)
- conditional probabilities p(x|y)

Possible options:
- logistic regression
- Fisher's linear discriminant analysis
- naive Bayes classifier
- support vector machine
- neural networks
- k-nearest neighbour
- …

19
Q

Generating more training examples

A

One sense per collocation - A word recurring in collocation with the same word will almost surely have the same sense
e.g. “play” often occurs with the music sense of “bass”
“fish” often occurs with the fish sense of “bass”
One sense per discourse - The sense of a word is highly consistent within a document, especially topic specific words

Automatically generate more training examples with rules, to be combined with hand-labelled training examples.

This is considered semi-supervised learning
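A sketch of how such rules can generate pseudo-labelled examples (the seed collocations below are hypothetical):

```python
def auto_label(sentences, seed_collocations):
    """One-sense-per-collocation heuristic (a Yarowsky-style sketch):
    if a sentence contains a seed collocate, assign that collocate's
    sense as a pseudo-label; otherwise leave the sentence unlabelled."""
    labelled = []
    for sent in sentences:
        words = set(sent.lower().split())
        for collocate, sense in seed_collocations.items():
            if collocate in words:
                labelled.append((sent, sense))
                break
    return labelled

seeds = {"play": "bass/music", "fish": "bass/fish"}  # hypothetical seed rules
sents = ["He will play the bass tonight",
         "That bass fish was huge",
         "The bass was impressive"]
print(auto_label(sents, seeds))
```

The third sentence matches no seed, so it stays unlabelled and could be left for hand annotation or a later bootstrapping round.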

20
Q

Sequence labelling - popular machine learning techniques

A

Structured support vector machines

Conditional random field

Hidden Markov model

Recurrent neural network

21
Q

WSD Evaluations

A

Check sense accuracy (intrinsic evaluation)
- % of words tagged identically to human-annotated sense tags
- usually evaluate using held-out data from same labelled corpus (train-test split)

Task Based Evaluation (extrinsic evaluation)
Embed WSD in an end-to-end NLP task and check whether it improves task performance (e.g. feed the WSD results into a machine translation system and see if the translations get better)

22
Q

WSD baselines for comparison

A

Assign the most frequent sense
Simplified Lesk (~42% accuracy)

Human agreement on all-words corpora with WordNet-style senses is around 75%-80%
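The most-frequent-sense baseline and the intrinsic accuracy measure can be sketched together (the gold sense tags below are hypothetical toy data):

```python
from collections import Counter

def most_frequent_sense_baseline(train_senses, test_senses):
    """Always predict the sense seen most often in training, and
    report sense accuracy: the fraction of test instances whose
    gold tag matches that prediction."""
    mfs = Counter(train_senses).most_common(1)[0][0]
    return sum(gold == mfs for gold in test_senses) / len(test_senses)

# Toy gold-standard sense tags (hypothetical)
train_senses = ["bass/fish", "bass/fish", "bass/fish",
                "bass/music", "bass/music"]
test_senses = ["bass/fish", "bass/music", "bass/fish", "bass/fish"]
print(most_frequent_sense_baseline(train_senses, test_senses))  # -> 0.75
```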