Week 7 Flashcards
(38 cards)
“What is the core idea behind Data Augmentation (DA) algorithms?”
To simplify sampling from a target distribution π(x) by introducing auxiliary variables y and sampling from an augmented joint density f(x, y) using Gibbs sampling.
“In the formal setting of DA, what is the relationship between the target distribution π(x) and the augmented density f(x, y)?”
The target distribution is the marginal of the augmented density: ∫ f(x, y) dy = π(x).
“What is the augmented target distribution?”
The joint density f(x, y) constructed such that its marginal distribution for x is the original target distribution π(x).
“What is the general structure of a DA algorithm iteration, starting with current state x’ = x(i)?”
Step 1: Sample auxiliary variable y ~ fY|X(· | x’). Step 2: Sample target variable x ~ fX|Y(· | y). Step 3: Set the next state x(i+1) = x.
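A minimal Python sketch of this loop (the function and the two conditional samplers are illustrative names, not from the cards; all the model-specific work is in supplying the two full conditionals):

def data_augmentation(x0, sample_y_given_x, sample_x_given_y, n_iter):
    # Generic DA loop: alternate draws from the two full conditionals.
    # sample_y_given_x and sample_x_given_y are hypothetical stand-ins
    # for fY|X and fX|Y of the model at hand.
    xs = [x0]
    for _ in range(n_iter):
        y = sample_y_given_x(xs[-1])    # Step 1: draw the auxiliary variable
        xs.append(sample_x_given_y(y))  # Steps 2-3: draw x and store it; y is discarded
    return xs                           # a chain whose stationary distribution is π(x)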
“What happens to the auxiliary variable y sampled in Step 1 of a DA iteration?”
It is discarded once x has been updated in Step 2; only the x-draws are kept, since y is not part of the original target distribution π(x).
“When is a Metropolis-within-Gibbs step needed within a DA algorithm?”
When one of the full conditionals fX|Y or fY|X is not a standard distribution from which direct sampling is easy.
“What is the key idea of the Slice Sampler, a type of DA algorithm?”
Sampling from π(x) is equivalent to sampling uniformly from the region S under the graph of π(x), specifically S = {(x, u) : 0 ≤ u ≤ π(x)}.
“What is the augmented target density f(x, u) used in the Slice Sampler?”
f(x, u) is proportional to the indicator function 1S(x, u), meaning it’s uniform over the set S = {(x, u) : 0 ≤ u ≤ π(x)}.
“Describe Step 1 of the Slice Sampler algorithm, given the current state x’ = x(i).”
Simulate the auxiliary height variable u ~ Uniform(0, π(x’)).
“Describe Step 2 of the Slice Sampler algorithm, given the sampled height u.”
Simulate the next state x ~ Uniform(A), where A is the ‘slice’ A = {z : π(z) > u}.
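A concrete worked example (mine, not from the cards): for π(x) ∝ exp(-x²/2), the slice A = {z : π(z) > u} is an explicit interval, so both steps can be carried out exactly:

import math, random

def slice_sampler_normal(x0=0.0, n_iter=10_000):
    # Slice sampler for π(x) ∝ exp(-x²/2); this target is chosen because the
    # slice has closed-form endpoints, so no stepping-out procedure is needed.
    x, samples = x0, []
    for _ in range(n_iter):
        # Step 1: u ~ Uniform(0, π(x)), taken on the log scale for stability:
        # log u = log π(x) - Exponential(1).
        log_u = -x * x / 2.0 - random.expovariate(1.0)
        # Step 2: {z : exp(-z²/2) > u} = (-h, h) with h = sqrt(-2 log u).
        h = math.sqrt(-2.0 * log_u)
        x = random.uniform(-h, h)
        samples.append(x)
    return samples

For targets where A has no closed form, the uniform draw on A is typically replaced by a stepping-out/shrinkage procedure.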
“What is the probability mass function f(k; ρ) for the Yule-Simon distribution?”
f(k; ρ) = ρ * B(k, ρ + 1), for k = 1, 2, … and shape parameter ρ > 0, where B(·, ·) is the Beta function.
“How can the Yule-Simon distribution be represented as a mixture?”
If W ~ Exponential(ρ) (rate ρ) and K | W = w ~ Geometric(e⁻ʷ) (support k = 1, 2, …), then the marginal distribution of K is Yule-Simon(ρ).
“What is the integral representation of the Yule-Simon PMF derived from its mixture representation?”
f(k; ρ) = ∫[0 to ∞] e⁻ʷ * (1 - e⁻ʷ)^(k-1) * ρ * e⁻ρʷ dw
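A quick Monte Carlo sanity check of this identity (a sketch; parameter values and names are mine):

import numpy as np
from math import gamma

def yule_simon_pmf(k, rho):
    # f(k; ρ) = ρ * B(k, ρ + 1), with the Beta function written via gammas.
    return rho * gamma(k) * gamma(rho + 1.0) / gamma(k + rho + 1.0)

rho, n = 1.5, 200_000
rng = np.random.default_rng(0)
w = rng.exponential(scale=1.0 / rho, size=n)  # W ~ Exponential(ρ), i.e. rate ρ
k = rng.geometric(np.exp(-w))                 # K | W = w ~ Geometric(e⁻ʷ)
for kk in range(1, 6):                        # empirical vs. exact pmf
    print(kk, (k == kk).mean(), yule_simon_pmf(kk, rho))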
“In the DA approach for the Yule-Simon distribution, what is the augmented target distribution f(k, w)?”
f(k, w) = e⁻ʷ * (1 - e⁻ʷ)^(k-1) * ρ * e⁻ρʷ, for w > 0 and k = 1, 2, ….
“In the DA approach for Yule-Simon, what is the full conditional distribution of K given W=w?”
K | W=w ~ Geometric(e⁻ʷ).
“In the DA approach for Yule-Simon, what is the full conditional distribution f(w|k) proportional to?”
f(w|k) ∝ e⁻ʷ * (1 - e⁻ʷ)^(k-1) * e⁻ρʷ.
“By using the change of variable T = e⁻ʷ, what is the full conditional distribution T | K in the Yule-Simon DA?”
T | K ~ Beta(ρ + 1, K).
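To fill in the change-of-variable step: t = e⁻ʷ gives w = -log t and |dw/dt| = 1/t, so f(t|k) ∝ t · (1 - t)^(k-1) · t^ρ · (1/t) = t^ρ (1 - t)^(k-1) for 0 < t < 1, which is exactly the kernel of a Beta(ρ + 1, k) density.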
“Describe Step 1 of the DA algorithm for sampling from Yule-Simon(ρ), given current state k(i).”
Sample t(i+1) | k(i) ~ Beta(ρ + 1, k(i)).
“Describe Step 2 of the DA algorithm for sampling from Yule-Simon(ρ), given t(i+1).”
Compute w(i+1) = -log(t(i+1)).
“Describe Step 3 of the DA algorithm for sampling from Yule-Simon(ρ), given w(i+1).”
Sample k(i+1) ~ Geometric(e⁻ʷ⁽ⁱ⁺¹⁾).
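Putting the three steps together, a minimal Python sketch of this sampler (function name, defaults, and seed are mine):

import numpy as np

def yule_simon_da(rho, n_iter, k0=1, seed=0):
    # DA sampler for Yule-Simon(ρ): alternate t | k ~ Beta(ρ + 1, k)
    # with k | w ~ Geometric(e⁻ʷ), where w = -log t.
    rng = np.random.default_rng(seed)
    k = k0
    ks = np.empty(n_iter, dtype=np.int64)
    for i in range(n_iter):
        t = rng.beta(rho + 1.0, k)           # Step 1
        w = -np.log(t)                       # Step 2
        k = int(rng.geometric(np.exp(-w)))   # Step 3 (note exp(-w) = t)
        ks[i] = k
    return ks

Since e⁻ʷ⁽ⁱ⁺¹⁾ = t(i+1), Steps 2 and 3 together amount to k(i+1) ~ Geometric(t(i+1)); the detour through w simply makes the correspondence with the augmented density f(k, w) explicit.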
“What defines the posterior distribution in Bayesian statistics?”
π(θ|y) ∝ π(θ)L(θ; y), where π(θ) is the prior and L(θ; y) is the likelihood.
“Under what condition might the likelihood function L(θ; y) be intractable?”
When it involves integrals over latent variables (e.g., mixed models) or complex normalizing constants (e.g., Potts model, models on graphs).
“In the general mixed-effects model yjk = xjkᵀβ + bk + εjk, why is the likelihood f(yn|θ) often intractable?”
It requires integrating out the random effects bk over their distribution Fb, and this integral generally has no closed form.
“In the Potts model, why is the likelihood f(yn|θ) intractable?”
The normalizing constant involves a sum over all kⁿ possible configurations of the n pixels (k possible states per pixel), which is computationally infeasible for large n.