EM Flashcards
(27 cards)
What problem does Expectation Maximization (EM) solve?
It optimizes likelihood in models with hidden or latent variables.
What is the basic idea behind EM?
Alternating between estimating hidden variables and optimizing model parameters.
What are the two main steps in EM?
E-step (Expectation) and M-step (Maximization).
What happens in the E-step of EM?
Compute the posterior distribution of the latent variables given current parameters.
What happens in the M-step of EM?
Update parameters to maximize the expected complete-data log-likelihood.
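Note: a compact way to state this M-step objective (notation assumed here, with q the E-step posterior):

```latex
Q(\theta, \theta^{\text{old}})
  = \mathbb{E}_{z \sim q}\!\left[\log p(x, z \mid \theta)\right]
  = \sum_z q(z)\,\log p(x, z \mid \theta),
\qquad
\theta^{\text{new}} = \arg\max_\theta Q(\theta, \theta^{\text{old}})
```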
Why can’t we directly maximize the likelihood in latent variable models?
Because the log of a sum over hidden variables doesn’t simplify easily.
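Note: as a sketch of why this is hard (general latent-variable notation assumed), the marginal log-likelihood is

```latex
\log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)
```

The log sits outside the sum over the hidden variable z, so it cannot be split into simple per-component terms the way the complete-data log-likelihood can.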
What is the name of the function that EM maximizes as a lower bound?
The Evidence Lower Bound (ELBO) or free energy.
Why does EM increase the log-likelihood at every iteration?
Because the E-step makes the lower bound tight at the current parameters and the M-step then maximizes that bound, so the true log-likelihood can never decrease.
When does the ELBO become tight (equal to the log-likelihood)?
When the approximate posterior matches the true posterior.
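Note: one way to see the previous two answers at once is the standard decomposition (q denotes the approximate posterior; notation assumed):

```latex
\log p(x \mid \theta)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z \mid \theta)}{q(z)}\right]}_{\text{ELBO}(q,\theta)}
  + \underbrace{\operatorname{KL}\!\left(q(z)\,\middle\|\,p(z \mid x, \theta)\right)}_{\ge 0}
```

The KL term vanishes exactly when q equals the true posterior, which is when the bound touches the log-likelihood.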
What distribution is commonly used in the E-step for GMMs?
The posterior responsibilities: p(c | x, θ).
What does qₙc represent in GMM EM?
The probability that data point xₙ belongs to cluster c.
What is the formula for qₙc in the E-step of GMMs?
qₙc = πc · N(xₙ | μc, Σc) / Σₖ πₖ · N(xₙ | μₖ, Σₖ)
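Note: a minimal E-step sketch of this formula, assuming NumPy/SciPy and hypothetical names (X for the data, pis, mus, Sigmas for the parameters):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, mus, Sigmas):
    """Compute responsibilities q[n, c] = p(c | x_n, theta) for a GMM."""
    N, C = X.shape[0], len(pis)
    q = np.zeros((N, C))
    for c in range(C):
        # Unnormalized responsibility: pi_c * N(x_n | mu_c, Sigma_c)
        q[:, c] = pis[c] * multivariate_normal.pdf(X, mean=mus[c], cov=Sigmas[c])
    # Normalize over components so each row sums to 1
    return q / q.sum(axis=1, keepdims=True)
```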
What is updated during the M-step of GMM EM?
The means, covariances, and mixing coefficients of the Gaussians.
What is the formula for updating μc in GMM EM?
μc = Σₙ qₙc xₙ / Σₙ qₙc
What is the formula for updating Σc in GMM EM?
Σc = Σₙ qₙc (xₙ - μc)(xₙ - μc)ᵀ / Σₙ qₙc
What is the formula for updating πc in GMM EM?
πc = (1/N) Σₙ qₙc
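Note: a matching M-step sketch implementing the three update formulas above (same assumed names as the E-step sketch):

```python
import numpy as np

def m_step(X, q):
    """Update GMM parameters from data X (N x D) and responsibilities q (N x C)."""
    N, D = X.shape
    Nc = q.sum(axis=0)                      # effective number of points per cluster
    pis = Nc / N                            # mixing coefficients: pi_c = (1/N) sum_n q_nc
    mus = (q.T @ X) / Nc[:, None]           # weighted means: mu_c = sum_n q_nc x_n / sum_n q_nc
    Sigmas = []
    for c in range(q.shape[1]):
        diff = X - mus[c]                   # (N, D)
        # Weighted scatter: sum_n q_nc (x_n - mu_c)(x_n - mu_c)^T / sum_n q_nc
        Sigmas.append((q[:, c, None] * diff).T @ diff / Nc[c])
    return pis, mus, np.array(Sigmas)
```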
What type of model is a Gaussian Mixture Model (GMM)?
A probabilistic generative model with latent variables.
How is EM related to K-means?
K-means is a limiting case of GMM with hard assignments and small variance.
When does GMM reduce to K-means?
When covariances are isotropic and approach zero.
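Note: a brief worked limit showing this, assuming isotropic covariances Σc = σ²I with positive mixing weights:

```latex
q_{nc} = \frac{\pi_c \exp\!\left(-\lVert x_n-\mu_c\rVert^2 / 2\sigma^2\right)}
              {\sum_k \pi_k \exp\!\left(-\lVert x_n-\mu_k\rVert^2 / 2\sigma^2\right)}
\;\longrightarrow\;
\begin{cases} 1, & c = \arg\min_k \lVert x_n-\mu_k\rVert^2 \\ 0, & \text{otherwise} \end{cases}
\quad \text{as } \sigma^2 \to 0
```

So the soft responsibilities collapse to K-means' hard nearest-mean assignments.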
What kind of assignment does K-means make?
Hard assignments (each point to one cluster).
What kind of assignment does EM in GMM make?
Soft assignments using probabilities.
What type of optimization method is EM?
An iterative coordinate ascent on a lower bound of the log-likelihood.
Why is the log-likelihood hard to compute directly in GMMs?
Because it involves the log of a sum over components.
What is a key requirement for EM to work?
That we can compute the posterior and maximize the expected log-likelihood.
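Note: putting the cards together, a minimal EM loop sketch for a GMM, reusing the hypothetical e_step and m_step helpers sketched earlier (the convergence check on the log-likelihood is an assumed choice, not prescribed by the cards):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, Sigmas):
    """log p(X | theta) = sum_n log sum_c pi_c N(x_n | mu_c, Sigma_c)."""
    dens = np.column_stack([
        pis[c] * multivariate_normal.pdf(X, mean=mus[c], cov=Sigmas[c])
        for c in range(len(pis))
    ])
    return np.log(dens.sum(axis=1)).sum()

def fit_gmm(X, pis, mus, Sigmas, max_iter=100, tol=1e-6):
    """Alternate E- and M-steps until the log-likelihood stops improving."""
    prev = -np.inf
    for _ in range(max_iter):
        q = e_step(X, pis, mus, Sigmas)          # E-step: posterior responsibilities
        pis, mus, Sigmas = m_step(X, q)          # M-step: re-estimate parameters
        curr = log_likelihood(X, pis, mus, Sigmas)
        if curr - prev < tol:                    # monotone increase of the likelihood
            break
        prev = curr
    return pis, mus, Sigmas
```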