Text classification 3: Learning word embeddings Flashcards
(3 cards)
1
Q
What are count-based embedding matrices? What are cons?
A
- Each row represents a word
- Each column represents some context in which a word can occur
- Each entry (i, j) is the strength of association between the i-th word and the j-th context
Cons:
- data sparsity: many entries are unreliable because the corresponding word–context pairs were rarely or never observed
- words have very high dimension (many contexts, possibly thousands or millions). Could tackle this with dimensionality reduction (e.g., SVD)
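The count-based matrix above can be sketched in a few lines. This is a minimal illustration, assuming a symmetric window of neighboring words as the "context" (the toy corpus and window size are hypothetical, not from the card):

```python
from collections import Counter

# Hypothetical toy corpus; each neighboring word within +/-window is a context.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
window = 2

# Count word-context co-occurrences within the window.
counts = Counter()
for sent in corpus:
    for i, word in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[(word, sent[j])] += 1

# Dense matrix: rows = words, columns = contexts, entries = co-occurrence counts.
vocab = sorted({w for s in corpus for w in s})
matrix = [[counts[(w, c)] for c in vocab] for w in vocab]
```

Note the sparsity in practice: with a real vocabulary, most (word, context) pairs never co-occur, so most entries are zero, which is exactly the con the card mentions.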
2
Q
What are the dimensions of each parameter?
A
3
Q
A