MCMC methods (simulation) Flashcards
What is Markov chain simulation good for?
When it is difficult to simulate from a distribution using simple methods such as the inversion method or rejection sampling, we can instead construct a Markov chain that converges to the hard-to-simulate target distribution.
Define a Markov chain
A (time-homogeneous) Markov chain with state space Ω is a discrete-time stochastic process X_0, X_1, X_2, . . . such that
* for all integers n ≥ 0, all events F ⊆ Ω,
* for all x_0, . . . , x_n ∈ Ω:
P(X_{n+1} ∈ F | X_0 = x_0, . . . , X_n = x_n) = P(X_{n+1} ∈ F | X_n = x_n) = P(x_n, F).
The distribution of X_0 is called the initial distribution of the chain, and P is called its transition kernel.
Together, the initial distribution and the transition kernel define the distribution of the Markov chain.
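A minimal simulation sketch (illustrative, not from the cards): a two-state chain on Ω = {0, 1}, where the kernel P(x, ·) is a row of a stochastic matrix.

```python
import numpy as np

# Illustrative two-state chain on Omega = {0, 1}. The initial
# distribution and the transition kernel (here the rows of the
# stochastic matrix P) together determine the law of X_0, X_1, ...
rng = np.random.default_rng(0)
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

x = rng.choice(2, p=[0.5, 0.5])  # X_0 ~ initial distribution
path = [x]
for _ in range(10):
    x = rng.choice(2, p=P[x])    # X_{n+1} | X_n = x  ~  P(x, .)
    path.append(x)
print(path)
```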
What is an invariant/stationary distribution of a Markov chain?
An invariant/stationary distribution of a Markov chain is a distribution Π on Ω such that
X_t ∼ Π ⇒ X_{t+1} ∼ Π.
What is an invariant density?
Suppose Π is a target distribution with density π, i.e. that for all events F ⊆ Ω,
Π(F) = ∫_F π(x)dx.
We call π an invariant density if for all events F ⊆ Ω,
∫_Ω π(x)P(x, F)dx = Π(F),
i.e. X_t ∼ Π ⇒ X_{t+1} ∼ Π.
DBC ⇒ reversibility ⇒ π is an invariant density
Proposal distribution
Given the current state X_t = x, this is the distribution of the proposal Y_{t+1} for the next step of the Markov chain.
Acceptance probability
This is the probability a(x, y) ∈ [0, 1] of accepting the proposal, i.e. that X_{t+1} = Y_{t+1} = y (otherwise X_{t+1} = X_t = x). It is chosen to correct for the arbitrariness of the proposal distribution, so that the target distribution becomes invariant.
Proposal density
∀x ∈ Ω, let q(x, y) be a density function in y ∈ Ω such that for any event F ⊆ Ω,
Q(x, F) = ∫_F q(x, y)dy = P(Y_{t+1} ∈ F | X_t = x).
Rejection probability
r(x) = 1 − ∫_Ω q(x, y)a(x, y)dy, the probability that the proposal from x is rejected.
Reversibility
The chain is reversible wrt Π if
X_0 ∼ Π ⇒ (X_0, X_1) ∼ (X_1, X_0).
DBC ⇒ reversibility ⇒ π is an invariant density
Detailed balance condition
For the target density: the Markov chain satisfies DBC (wrt π) if for all x, y ∈ Ω,
π(x) p(x, y) = π(y) p(y, x),
where p(x, y) := q(x, y) a(x, y).
DBC ⇒ reversibility ⇒ π is an invariant density
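A worked sketch of the first implication, assuming the accept-or-stay decomposition of the kernel, P(x, F) = ∫_F p(x, y)dy + r(x)1_F(x):

```latex
\begin{align*}
\int_\Omega \pi(x)\,P(x, F)\,dx
  &= \int_\Omega \pi(x)\int_F p(x, y)\,dy\,dx + \int_F \pi(x)\,r(x)\,dx\\
  &= \int_F \int_\Omega \pi(y)\,p(y, x)\,dx\,dy + \int_F \pi(x)\,r(x)\,dx
     \qquad \text{(DBC, Fubini)}\\
  &= \int_F \pi(y)\,\bigl(1 - r(y)\bigr)\,dy + \int_F \pi(x)\,r(x)\,dx
   = \Pi(F).
\end{align*}
```

Hence X_t ∼ Π implies X_{t+1} ∼ Π, i.e. π is an invariant density.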
Irreducibility
A Markov chain is Π-irreducible if for any x ∈ Ω and any event F ⊆ Ω with Π(F) > 0, there exists an n such that P^n(x, F) > 0.
Intuitively, irreducibility says that the chain can reach any relevant part of Ω. Irreducibility is necessary for a limit distribution to exist.
Harris-recurrence
The chain is Harris recurrent if for any x ∈ Ω and any event F ⊆ Ω
with Π(F) > 0,
P(X_n ∈ F for infinitely many n | X_0 = x) = 1.
Harris recurrence ⇒ irreducibility ⇒ uniqueness of the invariant distribution.
Strong law of large numbers for Markov chains
For a Π-integrable function f, let θ̂_n = (1/n) Σ_{i=1}^n f(X_i) and θ = E_Π[f(X)]. If the Markov chain is irreducible, then there exists some C ⊆ Ω such that Π(C) = 1 and for all x_0 ∈ C,
P(lim_{n→∞} θ̂_n = θ | X_0 = x_0) = 1.
If it is also Harris recurrent, we can take C = Ω.
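A minimal numerical check (illustrative; the chain and the matrix P are assumptions, not from the cards): the ergodic average of f(x) = x for a two-state chain converges to E_Π[f].

```python
import numpy as np

# Two-state chain with invariant distribution Pi = (0.6, 0.4);
# the ergodic average of f(x) = x should converge to E_Pi[f] = 0.4.
rng = np.random.default_rng(1)
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

n, x, total = 100_000, 0, 0
for _ in range(n):
    x = rng.choice(2, p=P[x])
    total += x
print(total / n)  # approaches 0.4
```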
Periodic Markov chain
An irreducible Markov chain is periodic if there exists a partition Ω = A_0 ∪ A_1 ∪ · · · ∪ A_n with n > 1, Π(A_0) = 0, and
x ∈ A_1 ⇒ P(x, A_2) = 1,
x ∈ A_2 ⇒ P(x, A_3) = 1,
…
x ∈ A_{n-1} ⇒ P(x, A_n) = 1,
x ∈ A_n ⇒ P(x, A_1) = 1.
Otherwise it is aperiodic.
Markov chain convergence theorem
If the Markov chain is irreducible and aperiodic, then there exists some C ⊆ Ω such that Π(C) = 1 and for all x_0 ∈ C and events F ⊆ Ω,
lim_{n→∞} P(X_n ∈ F | X_0 = x_0) = Π(F).
If it is also Harris recurrent, we can take C = Ω.
In practice we check whether X_n seems to stabilize for large enough n.
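A small illustration of the theorem (assumed setup, reusing a two-state chain): run many independent copies from the same x_0 and compare the empirical distribution of X_n with Π.

```python
import numpy as np

# Many independent copies of a two-state chain, all started at x_0 = 0;
# the distribution of X_n approaches Pi = (0.6, 0.4) as n grows.
rng = np.random.default_rng(2)
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

chains = np.zeros(50_000, dtype=int)       # X_0 = 0 for every copy
for _ in range(50):                         # n = 50 steps
    u = rng.random(chains.size)
    chains = np.where(u < P[chains, 1], 1, 0)
print(np.bincount(chains) / chains.size)    # close to (0.6, 0.4)
```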
Metropolis-Hastings algorithm
- Choose a proposal density q(x, y), e.g. q(x, ·) = N_d(x, σ²I_d).
- Let the acceptance probability be
a(x, y) = min{1, π(y)q(y, x) / (π(x)q(x, y))}.
Let X_0 = x_0 be the initial value (with π(x_0) > 0). For n = 0, 1, . . ., iterate:
1. Generate Y_{n+1} ∼ q(X_n , ·),
2. Generate U_{n+1} ∼ Unif([0, 1])
3. If U_{n+1} ≤ a(X_n, Y_{n+1}), let X_{n+1} = Y_{n+1}; otherwise let X_{n+1} = X_n. (A runnable sketch follows.)
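A runnable sketch of the algorithm above (assumptions: the target is supplied through its log-density log_pi, and the function name and signature are illustrative):

```python
import numpy as np

def metropolis_hastings(log_pi, x0, sigma, n, rng):
    """Random-walk Metropolis-Hastings with proposal N_d(x, sigma^2 I_d).

    The Gaussian random walk is symmetric, q(x, y) = q(y, x), so the
    Hastings ratio reduces to pi(y) / pi(x).
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n, x.size))
    for i in range(n):
        y = x + sigma * rng.standard_normal(x.size)  # 1. propose Y_{n+1}
        u = rng.uniform()                            # 2. U_{n+1} ~ Unif([0, 1])
        if np.log(u) <= log_pi(y) - log_pi(x):       # 3. accept or reject
            x = y
        samples[i] = x
    return samples

# Example: sample from a standard normal target, started far out.
rng = np.random.default_rng(3)
chain = metropolis_hastings(lambda x: -0.5 * float(x @ x),
                            x0=[5.0], sigma=1.0, n=10_000, rng=rng)
print(chain.mean(), chain.std())  # roughly 0 and 1 once burn-in is discarded
```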
Hastings ratio
H(x, y) = π(y)q(y, x) / (π(x)q(x, y)), so that the acceptance probability is a(x, y) = min{1, H(x, y)}.
Properties of Metropolis-Hastings algorithm
DBC: satisfied by construction (wrt π).
Irreducibility: fulfilled if q(x, y) > 0 for all x, y ∈ Ω.
Aperiodicity: fulfilled if there exists an event F ⊆ Ω such that Π(F) > 0 and H(x, ·) < 1 for all x ∈ F.
What is the Gibbs sampler a special case of?
The Metropolis-Hastings algorithm, with acceptance probability 1.
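A minimal Gibbs sketch (assumed example: a standard bivariate normal target with correlation ρ, where both full conditionals are known in closed form):

```python
import numpy as np

# Gibbs sampler for a standard bivariate normal with correlation rho:
# X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for X2.
# Each update draws exactly from the full conditional, so it is
# accepted with probability 1.
rng = np.random.default_rng(4)
rho, n = 0.9, 10_000
x1, x2 = 0.0, 0.0
samples = np.empty((n, 2))
for i in range(n):
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal()
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal()
    samples[i] = (x1, x2)
print(np.corrcoef(samples.T)[0, 1])  # close to rho
```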
What is a trace plot and what is it used for?
If dim(Ω) = 1: plot X_t vs t.
More generally, plot various real-valued statistics f(X_t) vs t to check whether the distribution has approximately converged, i.e. the chain has escaped the burn-in.
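A plotting sketch (the AR(1) chain below is a placeholder standing in for real MCMC output):

```python
import numpy as np
import matplotlib.pyplot as plt

# Trace plot of X_t vs t; the chain is started at a deliberately bad
# value so that the initial transient (burn-in) is visible.
rng = np.random.default_rng(5)
n = 2_000
x = np.empty(n)
x[0] = 10.0
for t in range(1, n):
    x[t] = 0.95 * x[t - 1] + rng.standard_normal()

plt.plot(x)
plt.xlabel("t")
plt.ylabel("X_t")
plt.title("Trace plot")
plt.show()
```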
How should sigma in a Metropolis-Hastings algorithm be chosen?
Rule of thumb: choose σ so that the average acceptance probability (theoretical or estimated by simulation) is 20%-40%.
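A simulated version of the rule of thumb (assumed example: a standard normal target with random-walk MH), scanning σ and printing the observed acceptance rate:

```python
import numpy as np

# Estimate the average acceptance probability of random-walk MH on a
# standard normal target for a grid of step sizes sigma.
rng = np.random.default_rng(6)
log_pi = lambda x: -0.5 * x * x

for sigma in (0.1, 1.0, 2.4, 10.0):
    x, accepted, n = 0.0, 0, 20_000
    for _ in range(n):
        y = x + sigma * rng.standard_normal()
        if np.log(rng.uniform()) <= log_pi(y) - log_pi(x):
            x, accepted = y, accepted + 1
    print(f"sigma={sigma:5.1f}  acceptance rate={accepted / n:.2f}")
```

Small σ gives high acceptance but slow exploration, and large σ gives low acceptance; values in between land in the 20%-40% band.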
What is it called if the Markov chain "moves quickly around" in Ω?
The Markov chain has good mixing properties.
What is a Markov function?
Let h : N_f → [0, ∞) denote any hereditary function. If for all x ∈ N_f with h(x) > 0 and all ξ ∈ S \ x,
h(x ∪ ξ)/h(x) depends on x only through x ∩ N_ξ,
then h is said to be a Markov function (with respect to ∼).
Metropolis algorithm
It is the Metropolis-Hastings algorithm with a symmetric proposal density q, i.e. q(x, y) = q(y, x), so that the Hastings ratio simplifies to H(x, y) = π(y)/π(x).