Week 5 Flashcards

(49 cards)

1
Q

“What is the goal of the Metropolis-Hastings (MH) sampler?”

A

To produce a Markov chain of values x(1), x(2), … whose equilibrium distribution is a target distribution π.

2
Q

“In the MH algorithm, how is a candidate value x’ generated at iteration i?”

A

It is proposed from a transition probability (proposal distribution) q(x(i-1), x’).

3
Q

“What is the formula for the acceptance probability α(x(i-1), x’) in the general Metropolis-Hastings algorithm?”

A

α(x(i-1), x’) = min{1, [π(x’)q(x’, x(i-1))] / [π(x(i-1))q(x(i-1), x’)]}

4
Q

“How is the next state x(i) determined after calculating the acceptance probability α and drawing u ~ U(0, 1)?”

A

x(i) = x’ if u < α(x(i-1), x’), otherwise x(i) = x(i-1).

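Cards 1-4 describe one complete MH iteration. A minimal sketch of that step in Python, assuming a possibly unnormalised target density pi and proposal functions q_density and q_sample (all names here are illustrative, not from the course notes):

```python
def mh_step(x_prev, pi, q_density, q_sample, rng):
    """One general Metropolis-Hastings update targeting the density pi.

    pi        : target density (any positive multiple works, constants cancel)
    q_density : proposal density q(x_from, x_to)
    q_sample  : draws x' ~ q(x_from, .); rng is a NumPy Generator
    """
    x_prop = q_sample(x_prev, rng)          # propose x' ~ q(x(i-1), .)
    # alpha = min{1, [pi(x') q(x', x(i-1))] / [pi(x(i-1)) q(x(i-1), x')]}
    ratio = (pi(x_prop) * q_density(x_prop, x_prev)) / (
        pi(x_prev) * q_density(x_prev, x_prop)
    )
    u = rng.uniform()                       # u ~ U(0, 1)
    return x_prop if u < min(1.0, ratio) else x_prev
```
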
5
Q

“What is the transition probability q(x(i-1), x’) also called?”

A

The proposal distribution.

6
Q

“What are the two main types of Metropolis-Hastings samplers characterized by their proposal distribution?”

A

Metropolis-Hastings random walk sampler and Metropolis-Hastings independence sampler.

7
Q

“How is the proposed value x’ generated in an MH random walk sampler?”

A

x’ = x(i-1) + εi, where the distribution of εi is symmetric (e.g., Normal, t).

8
Q

“What condition simplifies the acceptance probability formula in an MH random walk sampler?”

A

The symmetry of the proposal distribution, which means q(x(i-1), x’) = q(x’, x(i-1)).

9
Q

“What is the simplified acceptance probability α(x(i-1), x’) for an MH random walk sampler?”

A

α(x(i-1), x’) = min{1, π(x’) / π(x(i-1))}

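Since the random-walk proposal is symmetric, the q terms cancel and only the target ratio is needed. A sketch of one random-walk update using the simplified α from card 9 (again with illustrative names; rng is a NumPy Generator):

```python
def mh_rw_step(x_prev, pi, tau, rng):
    """One MH random-walk update: x' = x(i-1) + eps, eps ~ N(0, tau^2)."""
    x_prop = x_prev + rng.normal(0.0, tau)      # symmetric proposal
    alpha = min(1.0, pi(x_prop) / pi(x_prev))   # q terms cancel (card 9)
    return x_prop if rng.uniform() < alpha else x_prev
```
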
10
Q

“What is the characteristic of the proposal distribution q(x(i-1), x’) in an MH independence sampler?”

A

It does not depend on the previous state x(i-1), i.e., q(x(i-1), x’) = q(x’).

11
Q

“What is the acceptance probability α(x(i-1), x’) for an MH independence sampler?”

A

α(x(i-1), x’) = min{1, [π(x’)q(x(i-1))] / [π(x(i-1))q(x’)]}

12
Q

“When does the MH independence sampler method work well?”

A

When the proposal distribution q(x) is a good approximation of the target distribution π(x).

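For comparison, a sketch of one independence-sampler update (cards 10-12). The Normal proposal here is a placeholder choice, written with the illustrative names mu_q and sd_q:

```python
from scipy.stats import norm

def mh_indep_step(x_prev, pi, mu_q, sd_q, rng):
    """One MH independence-sampler update with proposal q = N(mu_q, sd_q^2)."""
    x_prop = rng.normal(mu_q, sd_q)   # x' ~ q(.), independent of x(i-1)
    # alpha = min{1, [pi(x') q(x(i-1))] / [pi(x(i-1)) q(x')]}
    ratio = (pi(x_prop) * norm.pdf(x_prev, loc=mu_q, scale=sd_q)) / (
        pi(x_prev) * norm.pdf(x_prop, loc=mu_q, scale=sd_q)
    )
    return x_prop if rng.uniform() < min(1.0, ratio) else x_prev
```
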
13
Q

“In the outlier modelling example, what distribution is used to model the scores yi|μ?”

A

A t-distribution with ν degrees of freedom, location μ, and scale σ: yi|μ ~ tν(μ, σ²).

14
Q

“What is the formula for the probability density function fμ(yi) of the tν(μ, σ²) distribution?”

A

fμ(yi) = [Γ((ν+1)/2) / (√(νπσ²)Γ(ν/2))] * [1 + (yi - μ)² / (νσ²)]^(-(ν+1)/2)

15
Q

“In the outlier example, what were the assumed fixed values for σ² and ν?”

A

σ² = 260 and ν = 3.

16
Q

“In the outlier example, what was the prior distribution assumed for μ?”

A

μ ~ N(100, 100²).

17
Q

“How is the posterior distribution π(μ|y) related to the prior and likelihood in the outlier example?”

A

π(μ|y) ∝ π(μ) * Π[i=1 to 21] fμ(yi).

18
Q

“In the MH random walk implementation for the outlier example, what proposal distribution was used for μ’?”

A

μ’ ~ N(μ(i-1), τ²).

19
Q

“What was the acceptance probability formula used in the MH random walk for the outlier example?”

A

α(μ(i-1), μ’) = min{1, π(μ’|y) / π(μ(i-1)|y)}.

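A sketch tying cards 13-19 together. Working on the log scale is my choice for numerical stability, not something stated on the cards, and y below is placeholder data since the 21 scores themselves are not reproduced here:

```python
import numpy as np
from scipy.stats import norm, t as t_dist

NU, SIGMA = 3, np.sqrt(260.0)   # fixed values from card 15

def log_post(mu, y):
    """log pi(mu|y) up to a constant: N(100, 100^2) prior times t_3(mu, 260) likelihood."""
    return (norm.logpdf(mu, loc=100.0, scale=100.0)
            + t_dist.logpdf(y, df=NU, loc=mu, scale=SIGMA).sum())

def rw_chain(y, n_iter, tau, mu0=100.0, seed=0):
    """MH random walk with proposal mu' ~ N(mu(i-1), tau^2), cards 18-19."""
    rng = np.random.default_rng(seed)
    chain, mu = np.empty(n_iter), mu0
    for i in range(n_iter):
        mu_prop = rng.normal(mu, tau)
        # symmetric proposal, so alpha = min{1, pi(mu'|y) / pi(mu(i-1)|y)}
        if np.log(rng.uniform()) < log_post(mu_prop, y) - log_post(mu, y):
            mu = mu_prop
        chain[i] = mu
    return chain

y = np.array([98.0, 102.0, 95.0])   # placeholder data, NOT the 21 scores
chain = rw_chain(y, n_iter=10_000, tau=10.0)
```
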
20
Q

“In the outlier example MH run, what was the posterior mean of μ approximately?”

A

Approximately 94.6.

21
Q

“In the outlier example MH run, what was the 95% credible interval for μ?”

A

(87.3, 101.8).

22
Q

“What parameter controls the variance of the proposal distribution in the MH random walk sampler?”

A

The proposal standard deviation τ (or the variance τ², depending on parameterization).

23
Q

“What happens if the proposal variance τ² is too small in an MH random walk?”

A

The chain typically moves often but with very small jumps (slow exploration).

24
Q

“What happens if the proposal variance τ² is too large in an MH random walk?”

A

The chain typically moves rarely (low acceptance rate) but with large jumps when it does move.

25
Q

“What does the ‘average acceptance rate’ refer to in MH?”

A

The average of the acceptance probabilities α over the iterations of the sampler.

26
Q

“What is a general guideline for the optimal average acceptance rate for MH random walk samplers aiming for low variability?”

A

Between 0.2 and 0.3.

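One practical way to monitor this while tuning τ is to track how often the chain moves; for a continuous target an accepted proposal almost surely changes the state, so the move rate is a common proxy for the average of the α values. A sketch:

```python
import numpy as np

def move_rate(chain):
    """Fraction of iterations on which the state changed: a proxy for the
    average acceptance rate when the target is continuous."""
    chain = np.asarray(chain)
    return np.mean(chain[1:] != chain[:-1])

# e.g. tune tau so that move_rate(chain) lands roughly in the 0.2-0.3 range
```
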
27
Q

“In the MH independence sampler example for the outlier model, what distribution was proposed as an approximation q(μ)?”

A

A Normal distribution, N(μ*, σ*²).

28
Q

“In the ‘better approximation’ for the MH independence sampler, how was σ*² related to the posterior variance under a Normal likelihood assumption?”

A

σ*² was 4 times the posterior variance under the Normal likelihood assumption.

29
Q

“What is the acceptance probability formula used in the MH independence sampler implementation shown?”

A

α(μ(i-1), μ’) = min{1, [π(μ’|y)q(μ(i-1))] / [π(μ(i-1)|y)q(μ’)]}, where q is the N(μ*, σ*²) density.

30
Q

“What are the two main practical issues when using MCMC samples for inference?”

A

Convergence and mixing.

31
Q

“What does ‘convergence’ refer to in MCMC?”

A

Whether the Markov chain has run long enough to reach its equilibrium (target) distribution, making the sample representative.

32
Q

“What does ‘mixing’ refer to in MCMC?”

A

How quickly the chain explores the target distribution, related to the dependence (autocorrelation) between consecutive samples.

33
Q

“In the Gibbs sampling example for a bivariate normal with correlation ρ, what is the full conditional distribution of x1 given x2?”

A

N(ρx2, 1 - ρ²).

34
Q

“In the Gibbs sampling example for a bivariate normal with correlation ρ, what is the full conditional distribution of x2 given x1?”

A

N(ρx1, 1 - ρ²).

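Cards 33-34 translate directly into a Gibbs loop; a sketch (function and argument names are mine):

```python
import numpy as np

def gibbs_bvn(rho, n_iter, x1=0.0, x2=0.0, seed=0):
    """Gibbs sampler for the standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho**2)          # conditional s.d. from cards 33-34
    out = np.empty((n_iter, 2))
    for i in range(n_iter):
        x1 = rng.normal(rho * x2, sd)   # x1 | x2 ~ N(rho*x2, 1 - rho^2)
        x2 = rng.normal(rho * x1, sd)   # x2 | x1 ~ N(rho*x1, 1 - rho^2)
        out[i] = x1, x2
    return out

draws = gibbs_bvn(rho=0.99, n_iter=5_000)   # rho = 0.99 shows the slow mixing of card 36
```
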
35
Q

“How does the correlation ρ affect the conditional variance in the bivariate normal Gibbs sampler?”

A

As |ρ| increases towards 1, the conditional variance (1 - ρ²) decreases.

36
Q

“How does high correlation (e.g., ρ = 0.99) affect the convergence of the Gibbs sampler for the bivariate normal, especially with poor starting values?”

A

It significantly slows down convergence.

37
Q

“What is ‘initialization bias’ in MCMC?”

A

The bias in estimates caused by including early samples from the chain that were generated before it reached the target distribution, especially when starting far from the main probability mass.

38
Q

“What is the common method to avoid initialization bias?”

A

Discarding an initial portion of the Markov chain samples, known as the ‘burn-in’ period.

39
Q

“How is the estimate of a posterior mean calculated after discarding the burn-in samples (M samples)?”

A

Estimate = (1 / (N - M)) * Σ[i=M+1 to N] x(i), where N is the total number of samples.

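In code, the card-39 estimate is a one-liner once the burn-in length M has been chosen:

```python
import numpy as np

def post_burn_in_mean(chain, M):
    """(1 / (N - M)) * sum of x(M+1), ..., x(N), as on card 39."""
    return np.mean(np.asarray(chain)[M:])
```
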
40
Q

“What are the first M discarded samples called?”

A

Burn-in samples or the burn-in period.

41
Q

“What practical approach is often used to determine an appropriate burn-in length M?”

A

Inspecting trace plots of the chain’s output to visually assess when it appears to have reached stationarity.

42
Q

“How does high correlation ρ affect the dependence between consecutive samples in the bivariate normal Gibbs sampler?”

A

Higher |ρ| leads to higher dependence (autocorrelation) between consecutive samples.

43
Q

“What is the Effective Sample Size (Neff)?”

A

A measure of the number of independent samples that would provide the same amount of information (in terms of the variance of the mean estimate) as the dependent samples obtained from an MCMC run.

44
Q

“How is the variance of an MCMC estimate f̄ related to the variance of f(x) under the target distribution and Neff?”

A

V[f̄] = V[f(x)] / Neff.

45
Q

“How does Neff relate to the total number of samples N?”

A

Neff ≤ N.

46
Q

“What happens to the Effective Sample Size (Neff) as the dependence (autocorrelation) in the MCMC chain increases?”

A

Neff decreases.

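A crude Neff estimator in the spirit of cards 43-46, truncating the autocorrelation sum at the first non-positive lag; packaged estimators (e.g., effectiveSize in R's coda, ess in Python's ArviZ) use more careful windowing:

```python
import numpy as np

def effective_sample_size(chain):
    """Neff ~= N / (1 + 2 * sum of positive-lag autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    # sample autocorrelations at lags 0, 1, ..., n-1
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tail = 0.0
    for r in acf[1:]:
        if r <= 0:       # truncate at the first non-positive autocorrelation
            break
        tail += r
    return n / (1.0 + 2.0 * tail)
```
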
47
Q

“What is ‘thinning’ in MCMC?”

A

A method to reduce dependence in the retained samples by keeping only every k-th value of the chain (e.g., x(k), x(2k), x(3k), …).

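With a NumPy array, thinning is a slice; note the offset needed so the kept values are x(k), x(2k), … rather than x(1), x(k+1), …:

```python
import numpy as np

chain = np.arange(1, 101)     # stand-in for MCMC output x(1), ..., x(100)
k = 10                        # thinning interval (illustrative)
thinned = chain[k - 1::k]     # keeps x(k), x(2k), x(3k), ... -> 10, 20, ..., 100
```
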
48
Q

“What is ‘reparameterization’ as a method to deal with dependence in MCMC?”

A

Transforming the correlated variables into a new set of variables with lower (ideally zero) correlation, and running MCMC on the transformed variables.

49
Q

“In the bivariate normal example, how can X1 and X2 be reparameterized using independent Z1, Z2 ~ N(0, 1) to remove correlation?”

A

X1 = Z1, X2 = ρZ1 + √(1 - ρ²)Z2.

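Card 49's map can be checked directly by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.99
z1 = rng.standard_normal(100_000)   # Z1 ~ N(0, 1)
z2 = rng.standard_normal(100_000)   # Z2 ~ N(0, 1), independent of Z1
x1 = z1
x2 = rho * z1 + np.sqrt(1.0 - rho**2) * z2
print(np.corrcoef(x1, x2)[0, 1])    # close to 0.99; MCMC can target (Z1, Z2) instead
```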