Vector Semantics and Embeddings Flashcards

1
Q

Distributional hypothesis

A

Words that occur in similar contexts tend to have similar meanings.

The hypothesis was formulated in the 1950s by Joos, Harris and Firth, who noticed that words which are synonyms tended to occur in the same environment, with the amount of meaning difference between two words “corresponding to the amount of difference in their environments”.

2
Q

Vector semantics

A

Vector semantics instantiates the distributional hypothesis by learning representations of the meaning of words, called embeddings, directly from their distributions in texts.

3
Q

Representation learning

A

Self-supervised learning, where useful representations of the input text are automatically learned, instead of crafting representations by hand using feature engineering.

4
Q

Lexical semantics

A

The linguistic study of word meaning

5
Q

Propositional meaning

A

Two words are synonymous if they are substitutable for one another in any sentence without changing the truth conditions of the sentence, i.e. the situations in which the sentence would be true.

6
Q

Principle of contrast

A

A difference in linguistic form is always associated with some difference in meaning.

E.g. H₂O and water refer to the same thing, but H₂O is used in scientific contexts and would be inappropriate in a hiking guide; by this principle there are probably no perfect synonyms.

7
Q

Semantic field

A

A set of words which cover a particular semantic domain and bear structured relations with each other.

E.g. the semantic field of hospitals (surgeon, scalpel, nurse, anesthetic, hospital) or of restaurants (waiter, menu, plate, food, chef).

8
Q

Semantic frame

A

A set of words that denote perspectives or participants in a particular type of event.

E.g. a commercial transaction is a kind of event in which one entity trades money to another entity in return for some good or service, after which the good changes hands or perhaps the service is performed.

This event can be encoded lexically by using verbs like buy (the event from the perspective of the buyer), sell (from the perspective of the seller), pay (focussing on the monetary aspect), or nouns like buyer.

Frames have semantic roles (buyer, seller, goods, money) and words in a sentence can take on these roles.

9
Q

Connotations

A

Words have affective meanings: the aspects of a word’s meaning that are related to a writer’s or reader’s emotions, sentiment, opinions or evaluations.

10
Q

Sentiment

A

Language expressing positive or negative evaluation.

11
Q

Vector semantics

A

The standard way to represent word meaning in NLP.

The idea is to represent a word as a point in a multidimensional semantic space that is derived from the distributions of word neighbours.

12
Q

Embeddings

A

Vectors for representing words.

13
Q

Co-occurrence matrix

A

A way of representing how often words co-occur.

14
Q

Term-document matrix

A

Each row represents a word in the vocabulary and each column represents a document from some collection of documents.

Each cell represents the number of times a particular word occurs in a particular document.
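A minimal sketch of building such a matrix in Python; the toy documents and the split-on-whitespace tokenisation are illustrative assumptions, not from the source:

    import numpy as np

    docs = ["wit and battle", "fool and wit", "battle and soldier"]   # toy documents (assumed)
    vocab = sorted({w for d in docs for w in d.split()})              # rows = vocabulary
    term_doc = np.zeros((len(vocab), len(docs)), dtype=int)           # |V| x |D| term-document matrix

    for j, doc in enumerate(docs):
        for word in doc.split():
            term_doc[vocab.index(word), j] += 1                       # count of word in document j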

15
Q

Information retrieval

A

The task of finding the document d from the D documents in some collection that best matches a query q.
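A rough sketch of scoring documents against a query, assuming the term_doc matrix and vocab from the previous card's sketch and using the cosine similarity defined a couple of cards below; real IR systems would also apply tf-idf weighting:

    import numpy as np

    def cosine(v, w):
        return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

    def best_document(query, vocab, term_doc):
        # represent the query as a count vector over the same vocabulary
        # (assumes the query shares at least one word with the vocabulary)
        q = np.zeros(len(vocab))
        for word in query.split():
            if word in vocab:
                q[vocab.index(word)] += 1
        # rank each document column by its cosine similarity to the query vector
        scores = [cosine(q, term_doc[:, j]) for j in range(term_doc.shape[1])]
        return int(np.argmax(scores))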

16
Q

Term-term matrix

A

A matrix of dimensionality |V| × |V|, where each cell records the number of times the row word and the column word co-occur in some context in some training corpus.

The context could be the document. However, it is common to use smaller contexts, generally a window around the word, e.g. 4 words to the left and 4 words to the right.
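A minimal sketch of filling a window-based term-term matrix; the toy corpus and the window size are illustrative assumptions:

    import numpy as np

    tokens = "the quick brown fox jumped over the lazy dog".split()   # toy corpus (assumed)
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    window = 4                                                        # 4 words to the left and right

    cooc = np.zeros((len(vocab), len(vocab)), dtype=int)              # |V| x |V| term-term matrix
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                cooc[index[w], index[tokens[j]]] += 1                 # row word co-occurs with column word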

17
Q

Cosine similarity

A

cosine(v, w) = v · w / (|v| |w|)

Value ranges from -1 to 1.

But since raw frequency values are non-negative, cosine similarity for these vectors ranges from 0 to 1.
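A one-function sketch of the formula above using NumPy; the example vectors are assumed for illustration:

    import numpy as np

    def cosine(v, w):
        # dot product of the vectors divided by the product of their lengths
        return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

    cosine(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]))   # parallel vectors -> 1.0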

18
Q

tf-idf weighting

A

A product of two terms: term frequency and inverse document frequency.

w(t, d) = tf(t, d) × idf(t)

19
Q

tf-idf

term frequency

A

The frequency of the word t in the document d.

tf(t, d) = count(t, d)

Commonly a log weighting is used:

tf(t, d) = log₁₀(count(t, d) + 1)

20
Q

tf-idf

inverse document frequency

A

The document frequency df(t) of a term t is the number of documents it occurs in.

Inverse document frequency, idf, where N is the total number of documents in the collection:

idf(t) = N / df(t)

Commonly a log weighting is used:

idf(t) = log₁₀(N / df(t))

The fewer documents a term occurs in, the higher this weight.
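Putting the last three cards together, a minimal sketch of the log-weighted tf-idf value for one cell of a term-document matrix; the count, N and document frequency in the usage line are toy numbers, assumed for illustration:

    import math

    def tf(count_t_d):
        # log-weighted term frequency: log10(count + 1)
        return math.log10(count_t_d + 1)

    def idf(N, df_t):
        # log-weighted inverse document frequency: log10(N / df)
        return math.log10(N / df_t)

    def tfidf(count_t_d, N, df_t):
        return tf(count_t_d) * idf(N, df_t)

    tfidf(count_t_d=20, N=37, df_t=21)   # toy numbers (assumed)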

21
Q

Positive Pointwise Mutual Information

Intuition

A

The best way to weight the association between two words is to ask how much more two words co-occur in our corpus than we would have a priori expected them to appear by chance.

22
Q

Pointwise Mutual Information

A

A measure of how often two events x and y occur, compared with what we would expect if they were independent:

I(x, y) = log₂ [ P(x, y) / (P(x) P(y)) ]

The pointwise mutual information between a target word w and a context word c is then defined as:

PMI(w, c) = log₂ [ P(w, c) / (P(w) P(c)) ]

The numerator tells us how often we observed the two words together (assuming we compute probability by using the MLE).

The denominator tells us how often we would expect the two words to co-occur assuming they each occurred independently.
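A minimal sketch of computing PPMI from a term-term co-occurrence matrix, with the probabilities estimated by MLE as above; taking PPMI as max(PMI, 0) is the standard definition of Positive PMI, and the input counts are assumed to come from something like the cooc matrix sketched earlier:

    import numpy as np

    def ppmi(cooc):
        total = cooc.sum()
        p_wc = cooc / total                          # joint probabilities P(w, c) via MLE
        p_w = p_wc.sum(axis=1, keepdims=True)        # marginal P(w) over rows
        p_c = p_wc.sum(axis=0, keepdims=True)        # marginal P(c) over columns
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log2(p_wc / (p_w * p_c))        # PMI(w, c) = log2 [ P(w, c) / (P(w) P(c)) ]
        # PPMI: clip negative values to 0, and treat cells with zero counts as 0
        return np.where(p_wc > 0, np.maximum(pmi, 0), 0.0)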

23
Q

Word2Vec

Intuition of skip-gram

A
  1. Treat the target word and a neighbouring context word as positive examples
  2. Randomly sample other words in the lexicon to get negative samples.
  3. Use logistic regression to train a classifier to distinguish these two cases.
  4. Use the learned weights as the embeddings.
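A heavily simplified sketch of the four steps above, assuming a tokenised toy corpus and toy hyperparameters; it applies one logistic-regression-style gradient step per (target, context) pair with k random negative samples, and (unlike a real implementation such as gensim's) it does not exclude accidental true contexts from the negatives:

    import numpy as np

    rng = np.random.default_rng(0)
    tokens = "the quick brown fox jumped over the lazy dog".split()   # toy corpus (assumed)
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}

    dim, window, k, lr = 50, 2, 2, 0.05                 # embedding size, window, negatives, learning rate (assumed)
    W = rng.normal(scale=0.1, size=(len(vocab), dim))   # target-word vectors
    C = rng.normal(scale=0.1, size=(len(vocab), dim))   # context-word vectors

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for epoch in range(50):
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j == i:
                    continue
                t, pos = index[w], index[tokens[j]]
                # step 1 + 3: positive pair, pushed together by a logistic-regression gradient step
                g = sigmoid(W[t] @ C[pos]) - 1.0
                grad_t = g * C[pos]
                C[pos] -= lr * g * W[t]
                W[t] -= lr * grad_t
                # step 2 + 3: k randomly sampled words treated as negatives and pushed apart
                for neg in rng.integers(len(vocab), size=k):
                    g = sigmoid(W[t] @ C[neg])
                    grad_t = g * C[neg]
                    C[neg] -= lr * g * W[t]
                    W[t] -= lr * grad_t

    embeddings = W   # step 4: the learned target-word weights serve as the embeddings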
24
Q

First-order co-occurrence

A.k.a. syntagmatic association

A

Two words have first-order co-occurrence if they are typically nearby each other.

E.g. “wrote” is a first-order associate of “book” or “poem”.

25
Q

Second-order co-occurrence

A.k.a. paradigmatic association

A

Two words have second-order co-occurrence if they have similar neighbours.

E.g. “wrote” is a second-order associate of words like “said” or “remarked”.
