Week 7 Flashcards
(38 cards)
“What is the core idea behind Data Augmentation (DA) algorithms?”
To simplify sampling from a target distribution π(x) by introducing auxiliary variables y and sampling from an augmented joint density f(x, y) using Gibbs sampling.
“In the formal setting of DA, what is the relationship between the target distribution π(x) and the augmented density f(x, y)?”
The target distribution is the marginal of the augmented density: ∫ f(x, y) dy = π(x).
“What is the augmented target distribution?”
The joint density f(x, y) constructed such that its marginal distribution for x is the original target distribution π(x).
“What is the general structure of a DA algorithm iteration, starting with current state x’ = x(i)?”
Step 1: Sample auxiliary variable y ~ fY|X(· | x’). Step 2: Sample target variable x ~ fX|Y(· | y). Step 3: Set the next state x(i+1) = x.
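A minimal Python sketch of this loop (the function and the two conditional samplers are illustrative names, not from the cards; all the model-specific work is in supplying the two full conditionals):

def data_augmentation(x0, sample_y_given_x, sample_x_given_y, n_iter):
    # Generic DA loop: alternate draws from the two full conditionals.
    # sample_y_given_x and sample_x_given_y are hypothetical stand-ins
    # for fY|X and fX|Y of the model at hand.
    xs = [x0]
    for _ in range(n_iter):
        y = sample_y_given_x(xs[-1])    # Step 1: draw the auxiliary variable
        xs.append(sample_x_given_y(y))  # Steps 2-3: draw x and store it; y is discarded
    return xs                           # a chain whose stationary distribution is π(x)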
“What happens to the auxiliary variable y sampled in Step 1 of a DA iteration?”
It is discarded once x has been updated in Step 2; only the x-draws are kept, since y is not part of the original target distribution π(x).
“When is a Metropolis-within-Gibbs step needed within a DA algorithm?”
When one of the full conditionals fX|Y or fY|X is not a standard distribution from which direct sampling is easy.
“What is the key idea of the Slice Sampler, a type of DA algorithm?”
Sampling from π(x) is equivalent to sampling uniformly from the region S under the graph of π(x), specifically S = {(x, u) : 0 ≤ u ≤ π(x)}.
“What is the augmented target density f(x, u) used in the Slice Sampler?”
f(x, u) is proportional to the indicator function 1S(x, u), meaning it’s uniform over the set S = {(x, u) : 0 ≤ u ≤ π(x)}.
“Describe Step 1 of the Slice Sampler algorithm, given the current state x’ = x(i).”
Simulate the auxiliary height variable u ~ Uniform(0, π(x’)).
“Describe Step 2 of the Slice Sampler algorithm, given the sampled height u.”
Simulate the next state x ~ Uniform(A), where A is the ‘slice’ A = {z : π(z) > u}.
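A concrete worked example (mine, not from the cards): for π(x) ∝ exp(-x²/2), the slice A = {z : π(z) > u} is an explicit interval, so both steps can be carried out exactly:

import math, random

def slice_sampler_normal(x0=0.0, n_iter=10_000):
    # Slice sampler for π(x) ∝ exp(-x²/2); this target is chosen because the
    # slice has closed-form endpoints, so no stepping-out procedure is needed.
    x, samples = x0, []
    for _ in range(n_iter):
        # Step 1: u ~ Uniform(0, π(x)), taken on the log scale for stability:
        # log u = log π(x) - Exponential(1).
        log_u = -x * x / 2.0 - random.expovariate(1.0)
        # Step 2: {z : exp(-z²/2) > u} = (-h, h) with h = sqrt(-2 log u).
        h = math.sqrt(-2.0 * log_u)
        x = random.uniform(-h, h)
        samples.append(x)
    return samples

For targets where A has no closed form, the uniform draw on A is typically replaced by a stepping-out/shrinkage procedure.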
“What is the probability mass function f(k; ρ) for the Yule-Simon distribution?”
f(k; ρ) = ρ * B(k, ρ + 1), for k = 1, 2, … and shape parameter ρ > 0, where B(·, ·) is the Beta function.
“How can the Yule-Simon distribution be represented as a mixture?”
If W ~ Exponential(ρ) (rate ρ) and K | W = w ~ Geometric(e⁻ʷ) (support k = 1, 2, …), then the marginal distribution of K is Yule-Simon(ρ).
“What is the integral representation of the Yule-Simon PMF derived from its mixture representation?”
f(k; ρ) = ∫[0 to ∞] e⁻ʷ * (1 - e⁻ʷ)^(k-1) * ρ * e⁻ρʷ dw
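A quick Monte Carlo sanity check of this identity (a sketch; parameter values and names are mine):

import numpy as np
from math import gamma

def yule_simon_pmf(k, rho):
    # f(k; ρ) = ρ * B(k, ρ + 1), with the Beta function written via gammas.
    return rho * gamma(k) * gamma(rho + 1.0) / gamma(k + rho + 1.0)

rho, n = 1.5, 200_000
rng = np.random.default_rng(0)
w = rng.exponential(scale=1.0 / rho, size=n)  # W ~ Exponential(ρ), i.e. rate ρ
k = rng.geometric(np.exp(-w))                 # K | W = w ~ Geometric(e⁻ʷ)
for kk in range(1, 6):                        # empirical vs. exact pmf
    print(kk, (k == kk).mean(), yule_simon_pmf(kk, rho))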
“In the DA approach for the Yule-Simon distribution, what is the augmented target distribution f(k, w)?”
f(k, w) = e⁻ʷ * (1 - e⁻ʷ)^(k-1) * ρ * e⁻ρʷ, for w > 0 and k = 1, 2, ….
“In the DA approach for Yule-Simon, what is the full conditional distribution of K given W=w?”
K | W=w ~ Geometric(e⁻ʷ).
“In the DA approach for Yule-Simon, what is the full conditional distribution f(w|k) proportional to?”
f(w|k) ∝ e⁻ʷ * (1 - e⁻ʷ)^(k-1) * e⁻ρʷ.
“By using the change of variable T = e⁻ʷ, what is the full conditional distribution T | K in the Yule-Simon DA?”
T | K ~ Beta(ρ + 1, K).
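To fill in the change-of-variable step: t = e⁻ʷ gives w = -log t and |dw/dt| = 1/t, so f(t|k) ∝ t · (1 - t)^(k-1) · t^ρ · (1/t) = t^ρ (1 - t)^(k-1) for 0 < t < 1, which is exactly the kernel of a Beta(ρ + 1, k) density.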
“Describe Step 1 of the DA algorithm for sampling from Yule-Simon(ρ), given current state k(i).”
Sample t(i+1) | k(i) ~ Beta(ρ + 1, k(i)).
“Describe Step 2 of the DA algorithm for sampling from Yule-Simon(ρ), given t(i+1).”
Compute w(i+1) = -log(t(i+1)).
“Describe Step 3 of the DA algorithm for sampling from Yule-Simon(ρ), given w(i+1).”
Sample k(i+1) ~ Geometric(e⁻ʷ⁽ⁱ⁺¹⁾).
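Putting the three steps together, a minimal Python sketch of this sampler (function name, defaults, and seed are mine):

import numpy as np

def yule_simon_da(rho, n_iter, k0=1, seed=0):
    # DA sampler for Yule-Simon(ρ): alternate t | k ~ Beta(ρ + 1, k)
    # with k | w ~ Geometric(e⁻ʷ), where w = -log t.
    rng = np.random.default_rng(seed)
    k = k0
    ks = np.empty(n_iter, dtype=np.int64)
    for i in range(n_iter):
        t = rng.beta(rho + 1.0, k)           # Step 1
        w = -np.log(t)                       # Step 2
        k = int(rng.geometric(np.exp(-w)))   # Step 3 (note exp(-w) = t)
        ks[i] = k
    return ks

Since e⁻ʷ⁽ⁱ⁺¹⁾ = t(i+1), Steps 2 and 3 together amount to k(i+1) ~ Geometric(t(i+1)); the detour through w simply makes the correspondence with the augmented density f(k, w) explicit.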
“What defines the posterior distribution in Bayesian statistics?”
π(θ|y) ∝ π(θ)L(θ; y), where π(θ) is the prior and L(θ; y) is the likelihood.
“Under what condition might the likelihood function L(θ; y) be intractable?”
When it involves integrals over latent variables (e.g., mixed models) or complex normalizing constants (e.g., Potts model, models on graphs).
“In the general mixed-effects model yjk = xjkᵀβ + bk + εjk, why is the likelihood f(yn|θ) often intractable?”
It requires integrating out the random effects bk over their distribution Fb, and this integral generally has no closed form.
“In the Potts model, why is the likelihood f(yn|θ) intractable?”
The normalizing constant involves a sum over all kⁿ possible configurations of the n pixels (k possible states per pixel), which is computationally infeasible for large n.