Topic 6: Representation Learning Flashcards

(25 cards)

1
Q

What are latent factors?

A

Latent factors are a collection of latent variables. Latent variables (also called hidden variables) are never observed in the training data.

2
Q

What are latent spaces?

A

A latent space is the model’s internal “understanding” of the data, encoded as points in a lower-dimensional, meaningful space.

A latent space is a way to represent data in a simpler, compressed form, where only the most important and informative features are kept. This helps the model understand the structure or “essence” of the data without all the noise or raw complexity.

3
Q

What are latent variables?

A

A hidden or unobservable variable that influences observed data. These variables are not directly measurable, but are inferred through models that relate them to observable data.

4
Q

What is PCA (and, briefly, t-SNE)?

A

PCA (Principal Component Analysis) is a simple method for computing linear low-dimensional representations of the data. It is a way to reduce the number of dimensions/features of the data and so avoid noisy and irrelevant features. It chooses the components that explain the most variance in the data (often a handful of components is enough to capture most of it).

t-SNE is used for unsupervised, non-linear dimensionality reduction, mainly for visualisation.
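A minimal sketch of both methods using scikit-learn; the digits dataset and the component counts are just placeholder choices:

```python
# Sketch: linear (PCA) vs. non-linear (t-SNE) dimensionality reduction.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)             # 64-dimensional inputs

pca = PCA(n_components=3)                       # keep the 3 directions of highest variance
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)            # how much variance each component explains

X_tsne = TSNE(n_components=2).fit_transform(X)  # non-linear 2-D embedding, mainly for visualisation
```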

5
Q

What is unsupervised learning of latent code (factors/space/variables)?

A

An autoencoder performs unsupervised learning by training a model to compress input data into a lower-dimensional latent code and then reconstruct it. This latent code captures the most essential, hidden structure of the data, called latent variables or factors, without using any labels (unsupervised). Through this process, the model learns to represent complex inputs in a simplified, abstract latent space that preserves the most informative features.
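A minimal sketch of this idea in PyTorch; the 784-dimensional input (e.g. flattened 28x28 images), the layer widths and the dummy batch are assumptions:

```python
# Unsupervised autoencoder sketch: compress to a latent code, then reconstruct.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))   # latent code
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        z = self.encoder(x)      # compress into latent variables
        return self.decoder(z)   # reconstruct the input from the code

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)                       # a dummy unlabelled batch
loss = nn.functional.mse_loss(model(x), x)    # reconstruction error, no labels involved
loss.backward()
optimizer.step()
```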

6
Q

What are Denoising AutoEncoders?

A

The purpose of a DAE is to remove noise. You can also think of it as a customised denoising algorithm tuned to your data.

Since we train the DAE on a specific set of data, it will be optimised to remove noise from similar data. If we train it to remove noise from images, it will work well on similar images, but not for cleaning text data.

Unlike an undercomplete AE, a DAE has the same or a higher number of neurons in the hidden layer, making it overcomplete.

The second difference is that the inputs and outputs are not identical: the outputs (targets) are the original data (e.g. images), while the inputs are the same data with some added noise.
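A sketch of the corrupted-input / clean-target training step; the simple linear model and the noise level of 0.2 are arbitrary choices for illustration:

```python
# Denoising autoencoder training step: noisy input, clean reconstruction target.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x_clean = torch.rand(64, 784)                           # original data = the target
x_noisy = x_clean + 0.2 * torch.randn_like(x_clean)     # corrupted version = the input

loss = nn.functional.mse_loss(model(x_noisy), x_clean)  # reconstruct the clean data
loss.backward()
optimizer.step()
```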

7
Q

How do you regularise AutoEncoders?

A

Autoencoders can be regularised in different ways to improve their learning of meaningful and generalisable latent representations:
1. Apply sparsity constraints, such as L1 regularisation or a KL-divergence penalty, to encourage only a few active latent units, making the representation more interpretable and efficient.
2. Add noise to the input data, as in Denoising Autoencoders (DAEs), which forces the model to learn robust features by reconstructing the original input from corrupted versions.

These regularisation techniques help guide the autoencoder to extract the underlying structure of the data more effectively (a sketch of the first option follows below).
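A sketch of the first option, adding an L1 penalty on the latent activations to the reconstruction loss; the layer sizes and the penalty weight of 1e-3 are arbitrary:

```python
# Sparse autoencoder objective: reconstruction loss + L1 penalty on the latent code.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Linear(64, 784)

x = torch.rand(32, 784)
z = encoder(x)                                   # latent activations
x_hat = decoder(z)

recon_loss = nn.functional.mse_loss(x_hat, x)
sparsity_penalty = 1e-3 * z.abs().mean()         # L1: pushes most latent units towards zero
loss = recon_loss + sparsity_penalty
loss.backward()
```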

8
Q

What are manifolds (and the manifold hypothesis)?

A

Manifold learning is a technique that simplifies the visualisation and analysis of complex, high-dimensional data sets by finding underlying low-dimensional structures.

A manifold is a topological space that looks like Euclidean space near each point. The manifold hypothesis states that real data concentrates around low-dimensional manifolds embedded in the high-dimensional input space.

With PCA: the structure of the manifold is transformed linearly.

With autoencoders: the structure of the manifold is learned, e.g. as local tangent planes.

9
Q

What are disentanglement representations?

A

Disentangled representation learning aims to learn a low-dimensional interpretable abstract representation that can identify and isolate different potential variables hidden in the high-dimensional observations.

Each dimension in the latent space corresponds to one meaningful and independent factor of variation in the data.

10
Q

What are word embeddings, and which methods exist?

A

A word embedding is a representation of a word. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning.

  • GloVe: relies on building an accurate global co-occurrence matrix; it is a context-free word embedding trained on large datasets. Semantic similarity is measured via cosine similarity (see the sketch below), and nearest neighbours in the embedding space show that even rare words end up represented similarly to related words.
  • Word2Vec: not a single algorithm but a family of model architectures and optimisations for learning word embeddings from large datasets. Embeddings learned with Word2Vec have proven successful on a variety of downstream natural language processing tasks.
    • CBOW: given the set of context words, what missing word is also likely to appear?
    • Skip-gram: given this single word, what other words are likely to appear nearby?
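A small sketch of the cosine-similarity comparison mentioned above; the toy vectors are invented purely for illustration:

```python
# Cosine similarity between word vectors (toy, hand-made vectors).
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

king  = np.array([0.8, 0.3, 0.1])
queen = np.array([0.7, 0.4, 0.2])
car   = np.array([0.1, 0.9, 0.8])

print(cosine(king, queen))   # high: words close in vector space ~ similar meaning
print(cosine(king, car))     # lower: unrelated words
```
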
11
Q

What are AutoEncoders?

A

Autoencoder (automatically encoding data): autoencoders are unsupervised deep learning models. The aim of an autoencoder is dimensionality reduction and feature discovery. An autoencoder is trained to predict its own input, but to prevent the model from learning the identity mapping, constraints are applied to the hidden units.

Autoencoder (AE): autoencoders present an efficient way to learn a representation of your data, which helps with tasks such as dimensionality reduction or feature extraction. You can even train an autoencoder to identify and remove noise from your data.

An autoencoder consists of:

  • input layer: passes the input data into the network
  • hidden layers consisting of an encoder and a decoder: process information by applying weights, biases and activation functions
  • output layer: typically matches the input layer in size
12
Q

What are good representations?

A
  • Smoothness and linearity: small changes in the latent space → small changes in the output
  • Multiple explanatory factors: each latent variable captures a different aspect of the data
  • Causal factors: captures underlying causes, not just correlations
  • Depth or a hierarchical organisation: organises information across multiple layers or levels
13
Q

Explain Word2Vec

A

Words that appear in similar contexts tend to have similar meanings. Word2Vec learns to represent each word as a vector such that words used in similar ways are close together in vector space.
Word2Vec is a predictive model for learning semantic representations of words by training a neural network to predict word-context pairs. It uses either CBOW or Skip-gram to model word relationships, and the result is a set of embeddings where semantic similarity is encoded as spatial closeness.
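A minimal usage sketch with the gensim library; the toy corpus and all hyperparameters are placeholders (sg=1 selects Skip-gram, sg=0 selects CBOW):

```python
# Learning word embeddings with gensim's Word2Vec (toy corpus).
from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "kingdom"],
             ["the", "queen", "rules", "the", "kingdom"],
             ["the", "cat", "sat", "on", "the", "mat"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: Skip-gram

vec = model.wv["king"]                        # the learned embedding vector for "king"
print(model.wv.most_similar("king", topn=2))  # nearest words in the embedding space
```

With such a tiny corpus the neighbours are not meaningful; useful embeddings require training on large corpora.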

14
Q

Explain GloVe

A

Unlike Word2Vec, which learns from local context using a predictive model (like CBOW or Skip-gram), GloVe learns from the entire global co-occurrence statistics of a corpus.
It builds a co-occurrence matrix:
- Each cell in the matrix counts how often word i appears near word j in a window across the entire text.
- The idea is: the meaning of a word is captured by how frequently it co-occurs with other words.

GloVe is a word embedding method that uses a matrix of global word co-occurrence to learn vector representations. It combines the statistical strength of count-based methods with the efficiency of embedding learning, producing word vectors that reflect semantic similarity and meaningful relationships.
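A sketch of the co-occurrence counting step only; the full GloVe objective, which fits word vectors to these counts, is omitted, and the window size of 2 is an arbitrary choice:

```python
# Counting word co-occurrences within a sliding window (the input to GloVe-style training).
from collections import defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
window = 2
cooc = defaultdict(float)

for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                cooc[(w, words[j])] += 1.0   # how often words[j] appears near w

print(cooc[("cat", "sat")])
```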

15
Q

Explain the Encoding and Decoding of latent features

A

The goal is to learn a set of basis factors (features) that can be used to reconstruct patches of natural images.

  • Encoding = converting a patch into a set of feature activations (coefficients).
  • Decoding = reconstructing the patch from those coefficients and the basis features.

Latent features: the coefficients of the basis set for each local patch; typically defined as unobserved or abstract concepts that cannot be measured directly.

Sparsity: most image patches can be described as the sum of only a few common patterns, i.e. only a few basis features are active at once.
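A sketch of this encode/decode cycle using scikit-learn's dictionary learning; the random "patches", the number of basis features and the sparsity parameter are all placeholders:

```python
# Sparse coding of image patches: learn basis features, encode as coefficients, decode.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

patches = np.random.rand(500, 64)                 # e.g. 500 flattened 8x8 patches

dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                   transform_algorithm="lasso_lars")
codes = dico.fit(patches).transform(patches)      # encoding: sparse coefficients per patch
reconstruction = codes @ dico.components_         # decoding: coefficients x basis features

print(np.mean(codes != 0))                        # fraction of active coefficients (sparsity)
```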

16
Q

What types of completeness exist for AutoEncoders?

A

Types of autoencoders:

The relationship between the number of nodes in each layer determines the type of autoencoder, e.g.:

  • Undercomplete autoencoder: has fewer nodes (dimensions) in the middle than the input and output layers. In such setups, the middle layer is usually called a "bottleneck".
  • Overcomplete autoencoder: has more nodes (dimensions) in the middle than the input and output layers.
17
Q

What is a bottleneck AutoEncoder?

A

An autoencoder is made up of:

  • encoder: a component that compresses the input data (from the train-validate-test set) into an encoded representation that is often several orders of magnitude smaller
  • bottleneck: the module containing the compressed knowledge representation, and therefore the most crucial component of the network
    • this hidden layer forces the network to learn a compressed latent representation
  • decoder: a component that helps the network "decompress" the knowledge representation and recover the data from its encoded state; the output is then compared with the ground truth
  • reconstruction loss: the type of input and output we want the autoencoder to handle has a significant impact on the loss function used to train it. The most common reconstruction losses for image data are MSE loss and L1 loss. Binary cross-entropy can also be used if the inputs and outputs are in the [0, 1] range, as they are for MNIST (a short sketch of this choice follows after this list).
  • wide latent code layers: learn the identity function
    • the latent space is large (as big as or bigger than the input)
    • the model does not need to compress; it can just copy the input directly
  • narrow latent code layers: learn undercomplete representations
    • the latent space is small (fewer dimensions than the input)
    • the model is forced to compress the input, as it cannot store everything
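A small sketch of the reconstruction-loss choice mentioned above, using PyTorch with dummy tensors in the [0, 1] range:

```python
# Choosing a reconstruction loss: MSE / L1 in general, BCE when values lie in [0, 1].
import torch
import torch.nn as nn

x     = torch.rand(16, 784)    # dummy "original" batch, values in [0, 1]
x_hat = torch.rand(16, 784)    # dummy "reconstruction", values in [0, 1]

mse = nn.functional.mse_loss(x_hat, x)
l1  = nn.functional.l1_loss(x_hat, x)
bce = nn.functional.binary_cross_entropy(x_hat, x)   # valid because values are in [0, 1]
print(mse.item(), l1.item(), bce.item())
```
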
18
Q

What are probabilistic factor models?

A

Probabilistic factor models are models where observed data is explained as being generated from a small number of hidden factors, with uncertainty modeled using probability distributions. They are powerful tools for understanding structure in complex data and are widely used in NLP and unsupervised learning.

A factor model is a generalised linear latent variable model, used for example in personality research to analyse latent structures and compare modelling options.

19
Q

What is factor analysis?

A

Factor analysis is a generalisation of PCA (a simple method for computing linear low-dimensional representations of the data). It is based on a probabilistic model, which means we can treat it as a building block for more complex models (such as mixture models) when we want to.
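A short usage sketch with scikit-learn; the random data and the number of factors are placeholders:

```python
# Factor analysis as a probabilistic, PCA-like latent variable model.
import numpy as np
from sklearn.decomposition import FactorAnalysis

X = np.random.rand(200, 10)          # placeholder data: 200 samples, 10 observed features
fa = FactorAnalysis(n_components=3)
H = fa.fit_transform(X)              # inferred latent factors for each sample
print(fa.components_.shape)          # loading matrix: (3 factors, 10 features)
```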

20
Q

What is Independent Component Analysis (ICA)?

A

Independent Component Analysis (ICA):

  • A technique in the field of data analysis that allows you to separate and identify the underlying independent sources in a multivariate data set; it makes it possible to analyse complex and highly correlated data.

ICA:
- models independent latent components $p(h)$ with minimised mutual information
- different algorithms exist for learning $p(h)$ via maximum likelihood
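A sketch of blind source separation with scikit-learn's FastICA; the two synthetic sources and the mixing matrix are invented for illustration:

```python
# Recovering independent sources from mixed signals with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # two independent sources
mixing = np.array([[1.0, 0.5],
                   [0.5, 2.0]])
X = sources @ mixing.T                                   # observed, mixed signals

ica = FastICA(n_components=2)
recovered = ica.fit_transform(X)                         # estimates of the independent sources
print(recovered.shape)
```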

21
Q

What are distributed representations?

A

Distributed representations

Localist representation:
- one neuron per concept/feature, so one neuron represents exactly one thing
- easy to associate
- "one-hot encoding"
- disadvantages: it does not scale well, since we need a new neuron for every new concept, and there is no notion of similarity: cat and lion are just as different as cat and car

Distributed representation:
- simultaneity in binding: one concept = a pattern of activity across many neurons
- can represent many more concepts with fewer units (a many-to-many relationship)
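A toy sketch contrasting the two; the dense vectors are invented just to show that similarity becomes measurable:

```python
# Localist (one-hot) vs. distributed (dense) representations.
import numpy as np

# Localist: one unit per concept; every pair of concepts is equally dissimilar.
cat_onehot  = np.array([1, 0, 0])
lion_onehot = np.array([0, 1, 0])
car_onehot  = np.array([0, 0, 1])
print(cat_onehot @ lion_onehot, cat_onehot @ car_onehot)   # 0 and 0: no notion of similarity

# Distributed: each concept is a pattern of activity across many units.
cat  = np.array([0.9, 0.8, 0.1])
lion = np.array([0.8, 0.9, 0.2])
car  = np.array([0.1, 0.2, 0.9])
print(cat @ lion, cat @ car)   # cat is now measurably closer to lion than to car
```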

22
Q

What is Latent Semantic Analysis (LSA)?

A

Latent Semantic Analysis (LSA) in a nutshell

How do we derive word embeddings?

  • Reminder: bag of words
    • count word frequency over $i \in D$ unique tokens
    • do this over your $j \in N$ documents, e.g. via TF-IDF
  • Latent semantic indexing (LSI)
    • count the frequency of tokens (e.g. words)
    • describe the rank-$k$ approximation $\hat{C}$ of the count matrix $C$ as an embedding of words in context
    • cosine similarity ranks documents given a (word) query

LSA is similar to LSI, but:
- the context $j$ is not a whole document but a local neighbourhood of words with window size $h$
- singular value decomposition (SVD) approximates the word co-occurrence counts (incremental algorithms exist)
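A sketch of the LSI/LSA pipeline with scikit-learn; the toy documents and the choice of k = 2 latent dimensions are placeholders:

```python
# LSI/LSA sketch: TF-IDF counts + truncated SVD -> low-rank "semantic" space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the rug",
        "stocks fell as markets reacted"]

C = TfidfVectorizer().fit_transform(docs)   # term-document TF-IDF matrix
svd = TruncatedSVD(n_components=2)          # rank-k approximation via SVD
doc_embeddings = svd.fit_transform(C)       # each document as a point in the latent space
print(doc_embeddings.shape)                 # (3 documents, 2 latent dimensions)
```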

23
Q

FOCUS ON UNDERSTANDING HOW REPRESENTATIONS FORM IN AUTOENCODERS AND EMBEDDING LEARNING

A

Representations in Autoencoders
- Autoencoders are trained to reconstruct inputs, forcing the model to compress information through a latent (bottleneck) layer.
- This compression forces the network to learn the most essential features of the data.
- The latent code (in the middle layer) becomes a representation of the input — ideally capturing underlying structure or “factors” (e.g., digit identity, style).
- Different regularisations (like sparsity or noise) encourage better representations.
- Takeaway: The encoder learns to form a compact and meaningful internal representation (latent space) of the data, not by being told what to learn, but by reconstructing from what it has learned.

Embedding Learning (e.g., Word Embeddings)
- Word embeddings (like Word2Vec or GloVe) represent words in a continuous vector space where semantic similarity = spatial closeness.
- These embeddings form implicitly by training a model to predict context (Skip-gram) or reconstruct target words (CBOW).
- What emerges is a space where “king - man + woman ≈ queen”, meaning the model has captured relationships without supervision.
- Takeaway: Embedding learning builds representations by training on large corpora and capturing patterns of co-occurrence. The structure of the embedding space encodes meaning.

24
Q

What do an encoder and a decoder do?

A

An encoder takes information and changes it into a special format that is easier to send or store. A decoder does the opposite: it takes that special format and turns it back into the original information.

25
Q

What is the characteristic of reconstructed speech in a bottleneck AutoEncoder?

A

An AutoEncoder (AE) is a type of neural network used to learn efficient representations (encodings) of data, usually in an unsupervised way. An AutoEncoder consists of two parts:
- Encoder: compresses input $x$ into a low-dimensional vector $z$ (the "bottleneck")
- Decoder: reconstructs $\hat{x}$ from that compressed representation $z$

$x \rightarrow \text{Encoder} \rightarrow z \rightarrow \text{Decoder} \rightarrow \hat{x}$

The goal is to make the reconstruction $\hat{x}$ as close to $x$ as possible. The reconstructed speech will typically:
- Lose fine details (e.g. prosody, speaker-specific traits)
- Retain coarse, structural content (like phonemes, rhythm, general tone)
- Sound muffled or lower-quality, especially if the bottleneck is narrow
- Still be intelligible, depending on bottleneck size

This is because the bottleneck compresses the input, so only the most essential features are kept and reconstructed.