Week 4 Flashcards

(53 cards)

1
Q

“What is the general purpose of Monte Carlo methods discussed in the context of a target distribution π?”

A

To evaluate expectations of the form Eπ(f) = ∫ f(x)π(x)dx, especially when analytical evaluation is not possible.

2
Q

“According to the Strong Law of Large Numbers (SLLN), how can Eπ(f) be approximated using i.i.d. samples X1, …, Xn from π?”

A

Eπ(f) ≈ (1/n) * Σ[i=1 to n] f(Xi) for large n.

3
Q

“In the toy example, what is the target distribution π(x) and the function f(x)?”

A

π(x) is the standard normal density (1/√(2π)) * e^(-x²/2), and f(x) = x.

4
Q

“What is the exact value of Eπ(x) when π is the standard normal distribution?”

A

Eπ(x) = 0.

5
Q

“What R function is mentioned for sampling from a Normal distribution?”

A

rnorm(n, mean, sd)

6
Q

“What R function is mentioned for computing the sample mean?”

A

mean(x)

7
Q

“What does the graph on slide 5 illustrate?”

A

It shows the convergence of the running sample mean x̄(s) = (1/s) * Σ[i=1 to s] xi towards the true mean (0 in this example) as s increases, illustrating the SLLN.

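A minimal R sketch of this toy example (the sample size n = 1000 and the seed are arbitrary choices, not taken from the slides):

set.seed(1)                       # for reproducibility
n <- 1000                         # number of i.i.d. draws (arbitrary)
x <- rnorm(n, mean = 0, sd = 1)   # samples from the standard normal target
mean(x)                           # Monte Carlo estimate of E(x) = 0

# running sample mean x̄(s), s = 1, ..., n, as in the slide 5 graph
xbar <- cumsum(x) / seq_len(n)
plot(xbar, type = "l", xlab = "s", ylab = "running mean")
abline(h = 0, lty = 2)            # true mean
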
8
Q

“What are the two main problems mentioned that can prevent direct Monte Carlo simulation?”

A

1. It’s not possible to sample directly from the target distribution π.
2. π is only known up to a normalizing constant.

9
Q

“What is a potential solution mentioned for the problem of not being able to sample directly from π?”

A

Importance Sampling.

10
Q

“What is a potential solution mentioned for both problems (cannot sample directly, unknown normalizing constant)?”

A

Markov Chain Monte Carlo (MCMC) Methods.

11
Q

“In a typical Bayesian model setup, how are the data Y1, …, Yn related to the parameter θ?”

A

Conditionally independent and identically distributed given θ, following a distribution fθ: Y1, …, Yn | θ ~ iid fθ.

12
Q

“How is the parameter θ itself modelled in a Bayesian setup?”

A

It is treated as a random variable with a prior distribution π: θ ~ π.

13
Q

“According to Bayes’ Theorem (proportional form), how is the posterior distribution π(θ|y) related to the likelihood and prior?”

A

Posterior Distribution ∝ Likelihood × Prior Distribution, specifically π(θ|y) ∝ [Π(i=1 to n) fθ(yi)] * π(θ).

14
Q

“What is the core idea behind MCMC algorithms for approximating a target distribution π?”

A

To construct a Markov Chain whose stationary distribution is the target distribution π.

15
Q

“How is the expectation Eπ(f) approximated using samples X1, …, Xn from a Markov Chain with stationary distribution π?”

A

Using the ergodic theorem (SLLN for Markov Chains): Eπ(f) ≈ (1/n) * Σ[i=1 to n] f(Xi) for large n (after burn-in).

16
Q

“How can the posterior expectation E[θ|y] be expressed as an integral?”

A

E[θ|y] = ∫ θ * π(θ|y) dθ.

17
Q

“How can the posterior cumulative distribution function P(θ < a | y) be expressed as an integral?”

A

P(θ < a | y) = ∫ I(θ < a) * π(θ|y) dθ, where I is the indicator function.

18
Q

“What is the definition of the indicator function I(A)?”

A

I(A) = 1 if A is true, and I(A) = 0 if A is not true.

19
Q

“How can the posterior expectation E[θ|y] be approximated using N samples θ(1), …, θ(N) from the posterior distribution π(θ|y)?”

A

E[θ|y] ≈ (1/N) * Σ[i=1 to N] θ(i).

20
Q

“How can the posterior probability P(θ < a | y) be approximated using N samples θ(1), …, θ(N) from the posterior distribution π(θ|y)?”

A

P(θ < a | y) ≈ (1/N) * Σ[i=1 to N] I(θ(i) < a).

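As a short R sketch: given a vector theta of N posterior draws and a threshold a (both names hypothetical), the two approximations are simply

mean(theta)        # approximates E[θ|y]
mean(theta < a)    # approximates P(θ < a | y), the average of the indicators I(θ(i) < a)
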
21
Q

“What are the two specific MCMC algorithms mentioned?”

A

The Gibbs sampler and the Metropolis-Hastings sampler.

22
Q

“What is the goal of the Gibbs Sampler in the context of a random vector (X1, …, Xd) with density π?”

A

To generate samples from the joint density π by iteratively sampling from the full conditional distributions.

23
Q

“What is the output of the Gibbs sampler algorithm?”

A

A d-dimensional Markov chain {(X1(n), …, Xd(n)), n = 0, 1, …} whose distribution converges to π.

24
Q

“How is the j-th full conditional density πj(y | xi, i ≠ j) defined?”

A

πj(y | xi, i ≠ j) = π(x1, …, xj-1, y, xj+1, …, xd) / π(x-j), where π(x-j) is the marginal density integrating out the j-th component.

25
Q

“How is the j-th full conditional density πj(y | xi, i ≠ j) related to the joint density π if we only need it up to a constant?”

A

πj(y | xi, i ≠ j) ∝ π(x1, …, xj-1, y, xj+1, …, xd), treating y as the variable and all other x’s as fixed.

26
Q

“Describe the general update step for the j-th component Xj at iteration (i+1) in the Gibbs sampler.”

A

Sample Xj(i+1) from its full conditional distribution πj using the most recently updated values for the other components: Xj(i+1) ~ πj(y | X1(i+1), …, Xj-1(i+1), Xj+1(i), …, Xd(i)).

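A schematic R sketch of one full Gibbs sweep, assuming a user-supplied function sample_full_conditional(j, x) (a hypothetical name) that draws component j from πj given the current values of the other components:

gibbs_sweep <- function(x, sample_full_conditional) {
  d <- length(x)
  for (j in seq_len(d)) {
    # components 1, ..., j-1 already hold their iteration (i+1) values;
    # components j+1, ..., d still hold their iteration (i) values
    x[j] <- sample_full_conditional(j, x)
  }
  x
}
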
27
Q

“In the semi-conjugate normal model example, what is the likelihood model for Y1, …, Yn given μ and φ?”

A

Y1, …, Yn | μ, φ ~ i.i.d. N(μ, 1/φ).

28
Q

“In the semi-conjugate normal model example, what is the prior distribution for μ?”

A

μ ~ N(μ0, σ0²).

29
Q

“In the semi-conjugate normal model example, what is the prior distribution for the precision φ?”

A

φ ~ Ga(α, β).

30
Q

“What is the target distribution for the Gibbs sampler in the semi-conjugate normal model example?”

A

The joint posterior distribution π(φ, μ | y).

31
Q

“What is the form of the full conditional distribution for φ, π1(φ|μ, y), in the semi-conjugate normal model?”

A

Gamma(α + n/2, β + (1/2) * Σ[i=1 to n] (yi - μ)²).

32
Q

“What is the form of the full conditional distribution for μ, π2(μ|φ, y), in the semi-conjugate normal model?”

A

Normal( (nφȳ + μ0/σ0²) / (nφ + 1/σ0²), 1 / (nφ + 1/σ0²) ).

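A minimal R sketch of this Gibbs sampler, assuming the data vector y and the hyperparameters alpha, beta, mu0 and sigma0sq are already defined (the variable names and the number of iterations are illustrative choices):

n <- length(y)
ybar <- mean(y)
N <- 5000                                     # number of iterations (arbitrary)
mu <- numeric(N + 1); phi <- numeric(N + 1)
mu[1] <- 0; phi[1] <- 1                       # starting values as in the example

for (i in 1:N) {
  # phi | mu, y ~ Gamma(alpha + n/2, beta + (1/2) * sum((y - mu)^2))
  phi[i + 1] <- rgamma(1, shape = alpha + n / 2,
                       rate = beta + 0.5 * sum((y - mu[i])^2))
  # mu | phi, y ~ Normal with precision n*phi + 1/sigma0sq
  prec <- n * phi[i + 1] + 1 / sigma0sq
  mu[i + 1] <- rnorm(1, mean = (n * phi[i + 1] * ybar + mu0 / sigma0sq) / prec,
                     sd = sqrt(1 / prec))
}
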
33
Q

“In the numerical example (slide 21), what were the chosen hyperparameters α, β, μ0, σ0²?”

A

α = 1, β = 10⁻⁴, μ0 = 0, σ0² = 10⁸.

34
Q

“In the numerical example (slide 21), what were the starting values μ(0) and φ(0)?”

A

μ(0) = 0, φ(0) = 1.

35
Q

“In the first Gibbs iteration (slide 22), updating φ, what Gamma distribution was sampled from?”

A

Gamma(11, 7629.4).

36
Q

“In the first Gibbs iteration (slide 23), updating μ, what Normal distribution was sampled from?”

A

Normal(24.98, 20.83).

37
Q

“What was the simulated value φ(1) after the first update?”

A

φ(1) = 0.0024.

38
Q

“What was the simulated value μ(1) after the first update?”

A

μ(1) = 25.51.

39
Q

“In the second Gibbs iteration (slide 24), updating φ, what Gamma distribution was sampled from?”

A

Gamma(11, 1392.2).

40
Q

“In the second Gibbs iteration (slide 25), updating μ, what Normal distribution was sampled from?”

A

Normal(24.98, 7.35).

41
Q

“What does a trace plot in MCMC typically show?”

A

It shows the sampled value of a parameter at each iteration of the algorithm.

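For example, in R a trace plot for a vector mu of sampled values (hypothetical name) is simply:

plot(mu, type = "l", xlab = "iteration", ylab = "mu")
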
42
Q

“What is the approximate posterior mean for μ found in the example (slide 27)?”

A

E[μ|y] ≈ 25.1.

43
Q

“What is the approximate posterior mean for σ² (=1/φ) found in the example (slide 27)?”

A

E[σ²|y] ≈ 146.7.

44
Q

“In the change-point model, what distribution is assumed for the yearly counts yi before the change-point m?”

A

yi | λ ~ Po(λ) for i = 1, …, m.

45
Q

“In the change-point model, what distribution is assumed for the yearly counts yi after the change-point m?”

A

yi | φ ~ Po(φ) for i = m+1, …, n.

46
Q

“What prior distribution is assumed for the change-point location m?”

A

m ~ Uniform({1, …, n}).

47
Q

“What prior distribution is assumed for the rates λ and φ in the change-point model?”

A

λ ~ Ga(a, b) and φ ~ Ga(a, b), independently.

48
Q

“What is the full conditional distribution for λ given (φ, m, y) in the change-point model?”

A

Ga(a + Σ[i=1 to m] yi, m + b).

49
Q

“What is the full conditional distribution for φ given (λ, m, y) in the change-point model?”

A

Ga(a + Σ[i=m+1 to n] yi, n - m + b).

50
Q

“What form does the full conditional distribution for the change-point m given (λ, φ, y) take?”

A

A discrete distribution on {1, …, n} where p(m) ∝ exp{-mλ - (n-m)φ} * λ^(Σ[i=1 to m] yi) * φ^(Σ[i=m+1 to n] yi).

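A sketch of one Gibbs iteration for this model in R, assuming a count vector y of length n, hyperparameters a and b, and the current value of m (all names illustrative); the update for m works on the log scale for numerical stability:

n <- length(y)
S <- cumsum(y)                                   # S[m] = y1 + ... + ym

# lambda | phi, m, y ~ Ga(a + S[m], m + b)
lambda <- rgamma(1, shape = a + S[m], rate = m + b)
# phi | lambda, m, y ~ Ga(a + (S[n] - S[m]), n - m + b)
phi <- rgamma(1, shape = a + S[n] - S[m], rate = n - m + b)
# m | lambda, phi, y: discrete on {1, ..., n} with p(m) as above
m_vals <- 1:n
logp <- -m_vals * lambda - (n - m_vals) * phi +
        S[m_vals] * log(lambda) + (S[n] - S[m_vals]) * log(phi)
m <- sample(m_vals, size = 1, prob = exp(logp - max(logp)))
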
51
Q

“Based on the results (slide 33), what is the likely range for the change-point year index m?”

A

Between 35 and 46 (corresponding to years 1885 to 1896).

52
Q

“Based on the results (slide 34), what is the approximate posterior mode/mean for the rate λ before the change-point?”

A

Around 3.

53
Q

“Based on the results (slide 35), what is the approximate posterior mode/mean for the rate φ after the change-point?”

A

Around 1.