NLP Flashcards

(33 cards)

1
Q

Corpus

A

A collection of text or speech data, used for training and evaluating NLP models.

2
Q

Word Sense Disambiguation

A

The process of identifying which meaning of a word is intended when several meanings are possible.

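A classic (if simple) WSD heuristic is the Lesk algorithm, which picks the sense whose dictionary gloss overlaps most with the surrounding words. A minimal sketch using NLTK's built-in lesk, assuming NLTK and its WordNet data are installed; the sentence is just toy input:

from nltk.wsd import lesk

context = "I deposited the cheque at the bank on Monday".split()
sense = lesk(context, "bank")           # chooses the WordNet synset whose gloss overlaps the context most
print(sense, "->", sense.definition())  # the chosen synset and its dictionary gloss
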
3
Q

BLEU

A

Bilingual Evaluation Understudy.
Metric used to measure the quality of a machine translation by comparing it to reference translations.

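For illustration, a minimal sentence-level BLEU sketch with NLTK (assuming the nltk package is installed); smoothing is used because short sentences often have zero higher-order n-gram matches:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]   # one or more reference translations
candidate = ["the", "cat", "is", "on", "the", "mat"]      # machine translation to score
score = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")   # 1.0 would mean a perfect n-gram match with the reference
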
4
Q

ROUGE

A

Recall-Oriented Understudy for Gisting Evaluation.
Metric used to measure the quality of machine-generated summaries by comparing them to reference summaries.

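As a rough illustration, ROUGE-1 recall is just the fraction of reference unigrams that also appear in the summary; real evaluations usually use a library such as rouge-score, but the idea fits in a few lines:

from collections import Counter

reference = "the cat sat on the mat".split()
summary = "the cat lay on the mat".split()

overlap = sum((Counter(reference) & Counter(summary)).values())  # clipped unigram matches
print(f"ROUGE-1 recall: {overlap / len(reference):.2f}")         # 5 of 6 reference unigrams recovered
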
5
Q

Hidden Markov Model (HMM)

A

Statistical model of sequences in which observations are generated by hidden states; used for part-of-speech tagging and speech recognition.

6
Q

Part-of-Speech tagging

A

Tagging each word in a sentence with its grammatical category (noun, verb, adverb, etc.).

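A minimal POS-tagging sketch with NLTK, assuming the required NLTK data (tokenizer and tagger models) has been downloaded via nltk.download:

import nltk

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'), ...]
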
7
Q

Transfer Learning

A

Reusing a model trained on one task as the starting point for a different, related task.

8
Q

N-Gram

A

A sequence of N contiguous items from a sample of text or speech.

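Extracting n-grams needs no library at all; a sketch in plain Python over a toy sentence:

def ngrams(tokens, n):
    # all windows of n contiguous tokens
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))   # bigrams: [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ...]
print(ngrams(tokens, 3))   # trigrams
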
9
Q

Lemmatization

A

Reducing words to their dictionary base form (lemma), undoing conjugations and other inflections. Similar to stemming, but based on vocabulary and morphological analysis rather than simple suffix stripping.

10
Q

Stemming

A

Reducing words to their stem by stripping affixes with heuristic rules, so the result may not be a real word. Similar to lemmatization.

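A small sketch contrasting the two previous cards with NLTK's Porter stemmer and WordNet lemmatizer (assumes the WordNet data package is downloaded):

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["running", "studies", "better"]:
    print(word,
          "| stem:", stemmer.stem(word),                     # heuristic suffix stripping, e.g. 'studi'
          "| lemma:", lemmatizer.lemmatize(word, pos="v"))   # dictionary base form, e.g. 'study'
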
11
Q

Named Entity Recognition

A

Identifying named entities in text and categorising them into predefined groups such as person, organisation and location.

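A minimal NER sketch with spaCy, assuming the small English model has been installed (python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple ORG, U.K. GPE, $1 billion MONEY
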
12
Q

Co-reference Resolution

A

Identifying which words in a text refer to the same entity.

13
Q

Stop Word

A

A commonly used word (e.g. 'the', 'is', 'of') which contributes little to a text's meaning and is often filtered out.

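A sketch of stop-word filtering with NLTK's built-in English stop-word list (assumes the stopwords data package is downloaded):

from nltk.corpus import stopwords

stops = set(stopwords.words("english"))
tokens = "this is a simple example of stop word removal".split()
print([t for t in tokens if t not in stops])   # ['simple', 'example', 'stop', 'word', 'removal']
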
14
Q

Word Embeddings

A

Vectorisation of words. Similar words are mapped to nearby vectors in vector space.

15
Q

Word2Vec

A

Word-embedding method that learns dense vectors from words' local contexts using a shallow neural network (the CBOW and skip-gram architectures).

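A minimal Word2Vec sketch with gensim; the two-sentence corpus is only a toy, so the resulting vectors are meaningless, but the API is the same for real corpora:

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "lay", "on", "the", "rug"]]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=20)

print(model.wv["cat"].shape)          # (50,) dense embedding for 'cat'
print(model.wv.most_similar("cat"))   # nearest words by cosine similarity
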
16
Q

GloVe

A

Global Vectors. Word-embedding method that learns dense vectors from global word-word co-occurrence statistics over a corpus.

17
Q

BERT

A

Bidirectional Encoder Representations from Transformers. A pre-trained Transformer encoder that produces contextual word embeddings and is widely fine-tuned for downstream NLP tasks.

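A sketch of getting contextual embeddings from a pre-trained BERT with Hugging Face transformers (assumes the transformers and torch packages; bert-base-uncased is downloaded on first use):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, num_tokens, 768): one contextual vector per token
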
18
Q

Bag-of-Words

A

Method for representing a text as the collection of its words, disregarding word order and grammar.

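A minimal bag-of-words sketch with scikit-learn's CountVectorizer over two toy documents:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)        # sparse document-term count matrix
print(vectorizer.get_feature_names_out())
print(X.toarray())                        # word order is lost; only counts remain
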
19
Q

TF-IDF

A

Term Frequency - Inverse Document Frequency
Metric measuring how important a word is to a document in a corpus, relative to its frequency in the rest of the corpus.

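A common weighting is tf-idf(t, d) = tf(t, d) * log(N / df(t)), up to variants; a minimal sketch with scikit-learn's TfidfVectorizer on toy documents:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)   # rows: documents, columns: vocabulary terms
print(X.toarray().round(2))          # words shared by every document get relatively low weights
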
20
Q

Latent Semantic Analysis

A

Analyses relationships between words and documents in a corpus to discover latent semantic structures, typically via singular value decomposition of a term-document matrix.

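A sketch of LSA as truncated SVD of a tf-idf term-document matrix, using scikit-learn and a toy corpus:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat", "the dog sat on the log",
        "stocks fell on market news", "investors bought shares"]
X = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(X)   # each document mapped into a 2-dimensional latent space
print(doc_vectors.shape)             # (4, 2)
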
21
Q

Latent Dirichlet Allocation

A

Generative probabilistic model that discovers latent topics in a document corpus by treating each document as a mixture of topics.

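A sketch of topic discovery with scikit-learn's LatentDirichletAllocation over bag-of-words counts; with such a tiny toy corpus the topics are only illustrative:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the cat chased the mouse", "dogs and cats make good pets",
        "the stock market fell sharply", "investors sold shares and bonds"]
counts = CountVectorizer(stop_words="english")
X = counts.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = counts.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"topic {i}:", [terms[j] for j in topic.argsort()[-4:]])   # top words per topic
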
22
Q

Generative Probabilistic Model

A

A model of how observed data are generated, specified as a probability distribution (often involving latent variables) from which new samples can be drawn.

23
Q

Perplexity

A

Metric measuring how well a language model predicts a sample; lower perplexity indicates better performance.

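Perplexity is the exponential of the average negative log-probability per word, PP = exp(-(1/N) * sum log p(w_i)); a sketch on a toy 3-word sample:

import math

word_probs = [0.1, 0.25, 0.05]   # model probabilities assigned to each word of the sample
perplexity = math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))
print(round(perplexity, 2))      # about 9.3; higher probabilities would give lower perplexity
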
24
Q

Componential Semantics

A

Words represented by sets of semantic components which together describe the meaning of the word.

25
Q

Distributional Semantics

A

Describing words according to the contexts they appear in.

26
Q

Thematic Distance

A

Metric measuring the similarity of words based on the angle between their vectors.

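The angle-based measure this card refers to is usually computed as cosine similarity; a sketch with toy 3-dimensional vectors and NumPy:

import numpy as np

cat = np.array([0.9, 0.1, 0.3])
dog = np.array([0.8, 0.2, 0.35])
cosine = np.dot(cat, dog) / (np.linalg.norm(cat) * np.linalg.norm(dog))
print(round(float(cosine), 3))   # close to 1.0 means a small angle, i.e. similar words
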
27
Q

Saltonian vector

A

Binary (one-hot) vector representation of a word: zero everywhere except at the index of the word in the corpus's complete word list.

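A sketch of building such a vector against a tiny vocabulary in plain Python:

vocabulary = ["cat", "dog", "mat", "sat", "the"]

def one_hot(word, vocab):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1   # 1 only at the word's position in the word list
    return vec

print(one_hot("mat", vocabulary))   # [0, 0, 1, 0, 0]
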
28
Q

Vector size reduction

29
Q

Word error rate

A

A metric measuring the relative error between a generated text and a reference text:
WER = (S + D + I) / N
where S is the number of substitutions, D deletions, I insertions, and N the number of words in the reference text.

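A sketch computing WER via word-level edit distance (the minimal number of substitutions, deletions and insertions), in plain Python:

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic-programming table of edit distances between prefixes
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sit on mat"))   # 2 errors / 6 reference words = 0.33...
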
30
Q

Connectionist Temporal Classification

A

A method for end-to-end Automatic Speech Recognition.

31
Q

Attention-based Encoder-Decoder Models

A

A method for end-to-end Automatic Speech Recognition.

32
Q

Transducer Models (RNN-T)

A

A method for end-to-end Automatic Speech Recognition.

33