MCMC methods (simulation) Flashcards

(24 cards)

1
Q

What is Markov chain simulation good for?

A

When it is difficult to simulate from a distribution using simple methods such as the inverse (CDF) method or rejection sampling, we can instead construct a Markov chain that converges to the hard-to-simulate distribution.

2
Q

Define a Markov chain

A

A (time-homogeneous) Markov chain with state space Ω is a discrete-time stochastic process X_0, X_1, X_2, . . . such that:
* for all integers n ≥ 0, all events F ⊆ Ω,
* for all x_0, . . . , x_n ∈ Ω:
P(X_{n+1} ∈ F | X_0 = x_0, . . . , X_n = x_n) = P(X_{n+1} ∈ F | X_n = x_n) = P(x_n, F)

The distribution of X_0 is called the initial distribution of the chain,
P is called the transition kernel of the chain.
Together, the initial distribution and the transition kernel define the distribution of the Markov chain.

3
Q

What is an invariant/stationary distribution of a Markov chain?

A

An invariant/stationary distribution of a Markov chain is a distribution Π on Ω such that
X_t ∼ Π ⇒ X_{t+1} ∼ Π

4
Q

What is an invariant density?

A

Suppose Π is a target distribution with density π, i.e. for all events F ⊆ Ω,
Π(F) = ∫_F π(x)dx.

We call π an invariant density if for all events F ⊆ Ω,
∫_Ω π(x)P(x, F)dx = Π(F),
i.e. X_t ∼ Π ⇒ X_{t+1} ∼ Π.

DBC ⇒ reversibility ⇒ π is an invariant density

5
Q

Proposal distribution

A

Given the current state X_t = x, this is the distribution of the proposal Y_{t+1} = y for the next step of the Markov chain.

6
Q

Acceptance probability

A

This is the probability a(x, y) ∈ [0, 1] of accepting the proposal, i.e. that X_{t+1} = Y_{t+1} = y (otherwise X_{t+1} = X_t = x). It is chosen to compensate for the arbitrariness of the proposal distribution, so that the target distribution becomes invariant.

7
Q

Proposal density

A

For each x ∈ Ω, let q(x, y) be a density function in y ∈ Ω such that for any
event F ⊆ Ω,
Q(x, F) = ∫_F q(x, y)dy = P(Y_{t+1} ∈ F | X_t = x)

8
Q

Rejection probability

A

r(x) = 1 − ∫_Ω q(x, y)a(x, y)dy,
i.e. the probability that, from state x, the proposal is rejected and the chain stays at x.

9
Q

Reversibility

A

The chain is reversible with respect to Π if
X_0 ∼ Π ⇒ (X_0, X_1) ∼ (X_1, X_0),
i.e. the pairs (X_0, X_1) and (X_1, X_0) have the same distribution.

DBC ⇒ reversibility ⇒ π is an invariant density

10
Q

Detailed balance condition

A

The Markov chain satisfies the detailed balance condition (DBC) with respect to the target density π if for all x, y ∈ Ω,
π(x) p(x, y) = π(y) p(y, x),
where p(x, y) := q(x, y) a(x, y).

DBC ⇒ reversibility ⇒ π is an invariant density

11
Q

Irreducibility

A

A Markov chain is Π-irreducible if for any x ∈ Ω and any event F ⊆ Ω with Π(F) > 0, there exists an n such that P^n(x, F) > 0, where P^n is the n-step transition kernel.

Intuitively, irreducibility says that we can reach any relevant
part of Ω. Irreducibility is a necessary condition for a limit distribution to exist.

12
Q

Harris-recurrence

A

The chain is Harris recurrent if for any x ∈ Ω and any event F ⊆ Ω
with Π(F) > 0,
P(X_n ∈ F for infinitely many n | X_0 = x) = 1.

Harris recurrence ⇒ irreducibility ⇒ uniqueness of the invariant distribution.

13
Q

Strong law of large numbers for Markov chains

A

If the Markov chain is irreducible (with invariant distribution Π), then there exists some C ⊆ Ω such that Π(C) = 1 and for all x_0 ∈ C,
P(lim_{n→∞} θ̂_n = θ | X_0 = x_0) = 1,
where θ̂_n = (1/n) ∑_{i=1}^n f(X_i) is the ergodic average estimating θ = E_Π[f(X)] for a Π-integrable function f.

If it is also Harris recurrent, we can take C = Ω.
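
In code, the SLLN justifies estimating θ by the ergodic average of one long run. A minimal sketch in Python/NumPy, assuming `chain` is a NumPy array of samples from an irreducible MCMC sampler (e.g. the Metropolis-Hastings sketch on a later card); the names here are illustrative:

```python
import numpy as np

# SLLN for Markov chains: theta_hat_n = (1/n) * sum_{i=1}^{n} f(X_i)
# converges almost surely to theta = E_Pi[f(X)].
def ergodic_average(chain, f):
    """Estimate theta = E_Pi[f(X)] from one long run of the chain."""
    return np.mean(f(chain))

# e.g. theta_hat = ergodic_average(chain, lambda x: x**2)  # second moment
```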

14
Q

Periodic Markov chain

A

An irreducible Markov chain is periodic if there exists a partition Ω = A_0 ∪ A_1 ∪ ⋯ ∪ A_n with n > 1, Π(A_0) = 0, and
x ∈ A_1 ⇒ P(x, A_2) = 1,
x ∈ A_2 ⇒ P(x, A_3) = 1,
⋮
x ∈ A_{n−1} ⇒ P(x, A_n) = 1,
x ∈ A_n ⇒ P(x, A_1) = 1.

Otherwise it is aperiodic.

15
Q

Markov chain convergence theorem

A

If the Markov chain is irreducible and aperiodic, then there exists some C ⊆ Ω such that Π(C) = 1 and for all x_0 ∈ C and all events F ⊆ Ω,
lim_{n→∞} P(X_n ∈ F | X_0 = x_0) = Π(F).

If it is also Harris recurrent, we can take C = Ω.
In practice we check whether the distribution of X_n seems to stabilize for large enough n (e.g. via trace plots).

16
Q

Metropolis-Hastings algorithm

A
  • Choose a proposal density q(x, y), e.g. q(x, ·) = N_d(x, σ²I_d).
  • Let the acceptance probability be
    a(x, y) = min{1, π(y)q(y, x) / (π(x)q(x, y))}.

Let X_0 = x_0 be the initial value (with π(x_0) > 0). For n = 0, 1, . . ., do iteratively:
1. Generate Y_{n+1} ∼ q(X_n, ·),
2. Generate U_{n+1} ∼ Unif([0, 1]),
3. If U_{n+1} ≤ a(X_n, Y_{n+1}), let X_{n+1} = Y_{n+1}; else let X_{n+1} = X_n.
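
A minimal sketch in Python/NumPy of random-walk Metropolis-Hastings, assuming a one-dimensional target known up to a normalizing constant (here a standard normal); all names are illustrative:

```python
import numpy as np

def metropolis_hastings(log_pi, x0, sigma, n_steps, rng=None):
    """Random-walk Metropolis-Hastings with Gaussian proposal N(x, sigma^2).

    log_pi: log of the target density, up to an additive constant.
    The Gaussian random walk is symmetric, q(x, y) = q(y, x), so the
    Hastings ratio reduces to pi(y) / pi(x).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = float(x0)
    chain = np.empty(n_steps)
    for n in range(n_steps):
        y = x + sigma * rng.standard_normal()   # 1. propose Y_{n+1} ~ q(X_n, .)
        u = rng.uniform()                       # 2. U_{n+1} ~ Unif([0, 1])
        if np.log(u) <= log_pi(y) - log_pi(x):  # 3. accept with prob. a(x, y)
            x = y
        chain[n] = x                            #    else stay: X_{n+1} = X_n
    return chain

# Example: target pi = N(0, 1), so log pi(x) = -x^2/2 up to a constant.
chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, sigma=2.0, n_steps=10_000)
```

Working with log densities avoids numerical underflow of π(x) in the tails.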

17
Q

Hastings ratio

A

H(x, y) = π(y)q(y, x) / (π(x)q(x, y)),
so that the MH acceptance probability is a(x, y) = min{1, H(x, y)}.

18
Q

Properties of Metropolis-Hastings algorithm

A

Satisfies DBC (with respect to π), hence π is an invariant density.
Irreducibility: fulfilled if q(x, y) > 0 for all x, y ∈ Ω.
Aperiodicity: fulfilled if there exists an event F ⊆ Ω with Π(F) > 0 and H(x, ·) < 1 for all x ∈ F.
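
Why the DBC holds for Metropolis-Hastings (a standard one-line check, not specific to these notes): for x ≠ y, with p(x, y) = q(x, y)a(x, y),

```latex
\pi(x)\,p(x,y)
  = \pi(x)\,q(x,y)\,\min\!\left\{1,\ \frac{\pi(y)\,q(y,x)}{\pi(x)\,q(x,y)}\right\}
  = \min\{\pi(x)\,q(x,y),\ \pi(y)\,q(y,x)\},
```

which is symmetric in x and y, hence π(x)p(x, y) = π(y)p(y, x).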

19
Q

What is the Gibbs sampler a special case of?

A

The Metropolis-Hastings algorithm, with acceptance probability 1.
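
A minimal sketch in Python/NumPy, assuming (as an illustrative choice) a bivariate standard normal target with correlation ρ, for which both full conditionals are known Gaussians:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_steps, rng=None):
    """Gibbs sampler for (X1, X2) ~ N(0, [[1, rho], [rho, 1]]).

    Each coordinate is updated by drawing from its full conditional,
    e.g. X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2); such a draw is always
    'accepted', matching acceptance probability 1 in the MH view.
    """
    rng = np.random.default_rng() if rng is None else rng
    x1 = x2 = 0.0
    s = np.sqrt(1.0 - rho**2)          # conditional standard deviation
    out = np.empty((n_steps, 2))
    for n in range(n_steps):
        x1 = rho * x2 + s * rng.standard_normal()  # draw from pi(x1 | x2)
        x2 = rho * x1 + s * rng.standard_normal()  # draw from pi(x2 | x1)
        out[n] = x1, x2
    return out

samples = gibbs_bivariate_normal(rho=0.8, n_steps=5_000)
```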

20
Q

What is a trace plot and what is it used for?

A

If dim(Ω) = 1: plot X_t against t.
In general, plot various real-valued statistics f(X_t) against t to check whether the distribution seems to have approximately converged at some point, i.e. escaped the burn-in.
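
A minimal sketch, assuming `chain` is a 1-d NumPy array of samples (e.g. from the Metropolis-Hastings sketch above):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(chain)              # X_t (or any statistic f(X_t)) against t
ax.set_xlabel("t")
ax.set_ylabel("X_t")
ax.set_title("Trace plot")  # look for stabilization after the burn-in
plt.show()
```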

21
Q

How should sigma in a Metropolis-Hastings algorithm be chosen?

A

Rule of thumb: choose σ to get an average acceptance probability of 20%–40% (computed theoretically or estimated by simulation).
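
A minimal sketch of checking this by simulation, reusing the illustrative `metropolis_hastings` function from the earlier sketch; in a continuous state space a repeated value marks a rejection, so the fraction of moves estimates the acceptance rate:

```python
import numpy as np

for sigma in (0.1, 1.0, 5.0, 20.0):
    chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0,
                                sigma=sigma, n_steps=10_000)
    # Fraction of steps where the chain actually moved.
    accept_rate = np.mean(chain[1:] != chain[:-1])
    print(f"sigma = {sigma:5.1f}   acceptance rate ~ {accept_rate:.2f}")
```

Very small σ accepts almost everything but moves slowly; very large σ rejects almost everything. Both extremes give poor mixing.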

22
Q

What is it called if the Markov chain "moves quickly around" in Ω?

A

The Markov chain has good mixing properties.

23
Q

What is a Markov function?

A

Let h : N_f → [0, ∞) denote any hereditary function. If for all x ∈ N_f with h(x) > 0 and all ξ ∈ S \ x,
h(x ∪ ξ)/h(x) depends on x only through x ∩ N_ξ,
then h is said to be a Markov function (with respect to ∼).

24
Q

Metropolis algorithm

A

It is the Metropolis-Hastings algorithm with a symmetric proposal density q, i.e. q(x, y) = q(y, x), so the acceptance probability simplifies to a(x, y) = min{1, π(y)/π(x)}.