NLP Flashcards
(33 cards)
Corpus
A collection of text or speech data. Used for training and evaluating NLP models.
Word Sense Disambiguation
The process of identifying which meaning of a word is intended in context, when the word has multiple possible meanings.
BLEU
Bilingual Evaluation Understudy.
Metric used to measure the quality of a machine translation, compared to reference translations.
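Worked example: one ingredient of BLEU is clipped (modified) n-gram precision. The sketch below covers unigrams only and is not the full metric, which also combines higher-order n-grams and a brevity penalty. The function name is chosen here for illustration.

```python
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    """Fraction of candidate words that also appear in the reference,
    where each word is credited at most as often as the reference contains it."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    return clipped / len(candidate)

candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()

# "the" occurs 7 times in the candidate but only 2 times in the reference,
# so only 2 of the 7 candidate words are credited.
score = clipped_unigram_precision(candidate, reference)
print(score)
```

Clipping stops a translation from scoring well just by repeating a common word.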
ROUGE
Recall-Oriented Understudy for Gisting Evaluation.
Metric used to measure the quality of machine-generated summaries, compared to reference summaries.
Hidden Markov Model (HMM)
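Statistical model in which the observed data are generated by a sequence of hidden states that follow a Markov process. Used for part-of-speech tagging and speech recognition.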
Part-of-Speech tagging
Labelling each word in a sentence with its grammatical category (noun, verb, adverb, etc.).
Transfer Learning
Reusing a model trained on one task as the starting point for a different, usually related, task.
N-Gram
A sequence of N contiguous items (such as words or characters) from a text or speech sample.
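Worked example: a minimal sketch of extracting word-level n-grams, assuming the text has already been split into tokens on whitespace. The helper name `ngrams` is chosen here for illustration.

```python
def ngrams(tokens, n):
    """Return every run of n contiguous tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)
print(bigrams)
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```

A text of k tokens yields k - N + 1 n-grams, or none if it is shorter than N.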
Lemmatization
Reducing words to their dictionary form (lemma), undoing inflections such as conjugations, e.g. "ran" becomes "run". Similar to Stemming, but the result is a valid word.
Stemming
Reducing words to their stem by stripping affixes with heuristic rules, e.g. "studies" may become "studi". Similar to Lemmatization, but the result need not be a valid word.
Named Entity Recognition
Identifying named entities in a text and classifying them into predefined categories such as person, organisation and location.
Co-reference Resolution
Identifying which words in a text refer to the same entity.
Stop Word
A commonly used word (such as "the" or "is") which contributes little to a text's content and is often filtered out before processing.
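Worked example: a minimal sketch of stop word filtering. The stop list below is a tiny hand-picked set for illustration only; real stop lists are much longer and vary by library and language.

```python
# Tiny illustrative stop list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "on", "of", "and"}

def remove_stop_words(tokens):
    """Drop tokens whose lowercase form is in the stop list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words("The cat is on the mat".split())
print(filtered)  # ['cat', 'mat']
```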
Word Embeddings
Vectorisation of words. Similar words are mapped to nearby vectors in vector space.
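Worked example: "nearby" is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings are learned from data and typically have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, not taken from any trained model.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

sim_related = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_unrelated = cosine_similarity(toy_vectors["king"], toy_vectors["banana"])
print(sim_related > sim_unrelated)  # True
```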
Word2Vec
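Word embedding method that trains a shallow neural network to predict a word from its context (CBOW) or the context from a word (skip-gram).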
GloVe
Global Vectors. Word embedding method trained on global word co-occurrence counts from a corpus.
BERT
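Bidirectional Encoder Representations from Transformers.
Pre-trained Transformer language model that produces contextual embeddings, so the same word gets different vectors in different contexts.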
Bag-of-Words
Method for representing a text as the multiset of its words (typically as word counts), without regard to order or grammar.
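Worked example: a minimal sketch of a bag-of-words representation, assuming lowercase whitespace tokenisation. The helper name `bag_of_words` is chosen here for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Map each word to the number of times it occurs; order is discarded."""
    return Counter(text.lower().split())

a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")

print(a["the"])  # 2
print(a == b)    # True
```

The two sentences mean different things but get the same representation, which is what ignoring word order costs.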
TF-IDF
Term Frequency - Inverse Document Frequency
Metric measuring how important a word is to a document in a corpus. It grows with the word's frequency in the document and shrinks as the word appears in more documents of the corpus.
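Worked example: a minimal sketch of one common TF-IDF variant, assuming tf = (count of term in document) / (document length) and idf = log(number of documents / number of documents containing the term). Libraries often use smoothed or normalised variants, so exact values differ between implementations.

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc`, where `corpus` is a list of token lists."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)  # documents containing the term
    if df == 0:
        return 0.0  # term never occurs in the corpus
    return tf * math.log(len(corpus) / df)

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

score_the = tf_idf("the", corpus[0], corpus)  # in every document, so idf = log(1) = 0
score_mat = tf_idf("mat", corpus[0], corpus)  # in one document only
score_cat = tf_idf("cat", corpus[0], corpus)  # in two documents
print(score_the, score_mat > score_cat)
```

"the" is the most frequent word in the first document but scores zero because it appears in every document.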
Latent Semantic Analysis
Analyses relationships between words and documents in a corpus, typically by applying singular value decomposition to a term-document matrix, to discover latent semantic structures.
Latent Dirichlet Allocation
Generative probabilistic model identifying topics in a document corpus. Each document is modelled as a mixture of topics, and each topic as a distribution over words.
Generative Probabilistic Model
Perplexity
Metric measuring how well a probability model, such as a language model, predicts a sample. Lower perplexity indicates better performance.
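Worked example: perplexity is the exponential of the average negative log-probability the model assigns to each item in the sample. The sketch below assumes the per-token probabilities have already been obtained from some model; the numbers are made up for illustration.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability assigned to each token."""
    n = len(token_probs)
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_prob)

# A model that gives every token probability 1/4 is as uncertain
# as a uniform choice among 4 options, so its perplexity is about 4.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

confident = perplexity([0.9, 0.8, 0.9])  # high probabilities on the true tokens
unsure = perplexity([0.1, 0.2, 0.1])     # low probabilities on the true tokens
print(confident < unsure)  # True
```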
Componential Semantics
Words represented by sets of semantic components which together describe the meaning of the word.