CHAPTER 5: Continuous time Markov chain models Flashcards

1
Q

A continuous time stochastic process

A

A continuous time stochastic process is then a set of random variables X_t for real numbers t; often we assume time starts at 0 so that t ∈ R+

given known state at time 0

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

A continuous time stochastic process is a Markov chain

A

A continuous time stochastic process is a Markov chain if it has discrete states

and satisfies the condition: for
t_n bigger than t_{n-1} bigger than … bigger than t_1 ≥ 0,

P{X_{t_n} = i_n | X_{t_{n−1}} = i_{n−1}, . . . X_{t_1} = i_1} =
P{X_{t_n} = i_n | X_{t_{n−1}} = i_{n−1}}
for each n ≥ 2

  • maintain current value for random length of time and then jump to another value

TIME HOMOGENEOUS
P(X_{t+τ} = j | X_t = i) = p_{ij} (τ )
not depending on t

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

For a continuous-time chain the Chapman-Kolmogorov equations are,

A

For a continuous-time chain the Chapman-Kolmogorov equations are,
pij (t + τ ) = SUM over k of
[p_{ik}(t)p_{kj}(τ )]

which may be written in matrix terms as
P(t + τ ) = P(t)P(τ )
where P(t) = (p_{ij}(t))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

For a continuous-time chain :
matrix terms of chapman kolmogorov

P(t + τ ) = P(t)P(τ )

A

extending the chapman kolmogorov: we apply multiple times ie we only need to know how TM behaves for very small time intervals

because of this if we knew the value of P(t) for all t in a short
interval [0, δt] then we would be able to find it for any larger t too because the C-K equations would allow successive extensions to [0, 2δt], [0, 4δt] etc

we should be able to parametrize the chain in terms of the transition
probabilities over arbitrarily short times. As δt → 0 it is reasonable to insist that
P(δt) → I

(the unit matrix)

we focus on how pij → 0 (i ≠ j) and pii → 1.

*the zero step transition matrix should be identity- as 1 to 1 and any other state to itself with prob 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Matrices For a continuous-time chain :

A

in discrete time we had nth powers of TM but in this case

we cant use this fact, we cant describe the chain by the one step TM

we can only apply the chapman kolmogorov

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Suppose that for
δt very small the probability of a transition from i to j is
p_ij (δt) = g_{ij}δt + o(δt)

i ≠ j,

(infinitesimal transition scheme)

A

ie for small time step, if currently in state i transition prob tends to 0 as interval tends to 0 in linear way

ie constant (derivative) * δt

o(δt) (last term goes to 0 faster than δt goes)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Suppose that for
δt

pii(δt) = 1 + g_{ii}δt + o(δt)

(infinitesimal transition scheme

A

transition prob of i to itself

linear terms + term going to 0 faster than δt

(g_ii will be negative to ensure probs tend to 1)
where the gij are constants.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

DEF: generator matrix of the continuous Markov chain

A

Matrix G with elements g_ij is the (infinitesimal) generator of the Markov chain

ie tends to I

st 
p_ij (δt) = g_{ij}δt + o(δt) 
pii(δt) = 1 + g_{ii}δt + o(δt)

diagonal and off-diag elements

(When a continuous time Markov chain is used for modelling, it is usually not the transition probabilities pij (t) that are specified, but the generator.)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Properties of generator matrix of the continuous Markov chain

its possible to find chains which break

A

**Σ_j [p_ij(δt)] = 1 as rows of TM

we expect that
pii(δt) = 1 - Σ_{i ≠ j} [g_{ij}δt + o(δt)]

ie pii(δt) = 1 - sum of non-diagonal entries

= 1 - Σ_{i ≠ j} [g_{ij}]δt + o(δt)

so g_ii= -Σ_{i ≠ j} [g_{ij}]

(valid if first sum is finite)

will certainly be true for chains with finite state space. In general it is
not guaranteed. However, most chains used in modelling satisfy it. They are called
conservative chains

A natural way to think about gij for j ≠ i is as the instantaneous rate at which, given
that the chain is in state i, a transition to state j is expected to occur.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Conservative chains

A

Satisfy g_ii= -Σ_{i ≠ j} [g_{ij}]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

EXAMPLE 22 A two state chain (continuous)

A

Let S = {1, 2} and set
G=
[−α α
[ β −β]

α, β bigger than 0

(no constraints on being less than one but off diagonals non negative and diagonals negative)

G is the generator matrix for a continuous time Markov chain with two
states. (As previously, we could think of this as a weather model, with the states being “wet”
and “dry”.)

ie its raining now will it continue to rain for some time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

EXAMPLE 23 The Poisson process (continuous)

Consider N_t, # events in (0,t] in a Poisson process with rate λ.

A

Consider N_t, # events in (0,t] in a Poisson process with rate λ.

P(Nt+δt = n + 1|Nt = n)
= λδt + o(δt)

and that

P(Nt+δt > n + 1|Nt = n) = o(δt)

(rows sum to 0)

we can consider a Poisson process as a continuous time Markov chain on N_0 with generator
given by 
g_{n,n+1} = λ, 
g_{n,n} = −λ 
and other entries zero

The chain is not irreducible: values can only increase, so it is not possible to get from state i to state j if j less than i.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

EXAMPLE 24 Linear birth process

and d so each population member reproduces at rate λ

A

Consider pop where each member reproduces at rate λ

each member of pop has probability of a reproduction event between times t and t + δt is
λt + o(δt))

(no other ways pop can change)

Then letting N_t be the size of the population at time t,
P(N_{t+δt} = i + 1|N_t = i) = 
λ_iδ_t + o(δ_t)
 and
P(N_{t+δt} = i|N_t = i) = 
1 − λ_iδ_t + o(δ_t), with 
P(N_t+δt = j|N_t = i) being 
o(δt) if j is neither i
nor i + 1
continuous markov chain :
g_{ij} =
{λi    j=i+1
{-λi   j=1
{0   o/w

(not irreducible as cant leave state 0)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

poisson processes

A

are not irreducible

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

EXAMPLE 25: linear birth-death process

and d so each population member reproduces at rate λ

and dies at rate µ

A

Extend example 24: allow deaths so each population reproduces at rate λ:

P(Nt+δt = i + 1|Nt = i) =
λiδt + o(δt),

and dies at rate µ giving: P(N_{t+δt} = i − 1|N_t = i)
= µ_iδt + o(δt)

P(Nt+δt = i|Nt = i) = 1 − (λ + µ)iδt + o(δt)

(still not irreducible as cant leave state 0)

continuous MC:
g_ij =
{λ_i          , j = i + 1
{−(λ + µ)i   'j = i
{µi             ,j = i − 1
{0 otherwise.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

EXAMPLE 26: M/M/n queue

Imagine that we have a queue with n servers, and assume that:
• Arrivals of customers occur as a Poisson process of rate λ.
• There are n servers, who can each serve one customer at a time, and will automatically
move on to the next customer in the queue when their current customer leaves.
• Each customer currently being served completes their service at rate µ, so that the probability they leave in (t, t + δt] is µδt + o(δt).

A

Let N_t= number in the queue at time t.

*If f i ≤ n, then in (t + δt], we may have an
arrival, which is an event of a Poisson process, so has probability λδt + o(δt), or we may have a
service completion, which happens with probability µiδt + o(δt) as there are i customers being served, or we may have more than one event with probability o(δt)

gi,i+1 = λ,
gi,i−1 = µi 
gii = −(λ + µi). 

*if i is bigger than n thengi,i+1 = λ still but only n customers being served

g_i,i−1 = µn
and
g_ii = −(λ + µn)

This chain is irreducible.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

EXAMPLE 27: Generalized birth-death process

prev examples generalized

A
This has transitions
i 
→i + 1,        λi δt + o(δt);
→ i,            1 − (λi + µi) δt + o(δt);
→ i − 1,        µi δt + o(δt).
corresponding to
gij =
{λi    if j = i + 1
{−λi − µi   if j = i
{µi  if j = i − 1
{0 otherwise
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

continuous MC and transition probabilities

obtained from G

A

writing generator defs in matrix notation:

P(δt) − I = Gδt + o(δt),
δt → 0

(o(δt), goes to 0)

ie
limit as δt → 0 of
[ (P(δt)-I)/δt] = G

(one way of thinking about G is that its the derivative of t-step transition matrix at time 0)
S by S matrix or infinite state space

considering chain from starting state at time 0 to state at time t + δt

Chapman-Kolmogorov eq:
transitions over the period t + δt. expressed in terms of those between 0 and t and in final short time δt.

P(t + δt) = P(t) P(δt)
Then

[P(t + δt) − P(t)]/δt =
[P(t) (P(δt) − I)]/δt

Let δt → 0 and e limit of the left hand term is deriv P’(t) of P at time t.

ie P’(t) = P(t) G,
known as the forward differential equations.

backward differential equations from paths over 0, t+ into short (δt) and long (t)
sections rather than vice versa

P’(t) = G P(t).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

note: generator matrix

A

*All continuous chains with finite state space rows sum to 0 (we will only see in course st true)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

the forward differential equations.

A

P’(t) = P(t) G,

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

backward differential equations

A

P’(t) = G P(t).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Solutions of forwards and backward diff eqs

A

for finite chains:
the equations are matrix generalizations of the simple DE

y’(t)=g y(t) with constant g

solution: y(t)=y(0)e^{gt}

With P(0) = I, solution for the forward and backward equations

P(t) = exp(Gt),
where, for a matrix G, the exponential function is defined as
exp(Gt) =
Σ_{i=0,∞} [(Gt)^i/i!]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Spectral decomposition of G:

A

Assuming G is diagonalizable, the spectral representation of G (cf (1)) is
G = T DT −1

where T is a matrix whose columns are right eigenvectors vector(t) of G, so that
Gt = dt, (t is bold)
and D is a diagonal matrix of the corresponding eigenvalues d

D=
[0    0     ... ...]
[0  d_2  0 ... ]
[0    0   d_3..]
[...   ...    ...   ...]

*conservatism of G guaranteeing that 0 is an eigenvalue (NOT 1)

G^2= TDT-1 TDT-1= TD^2T-1

G^n = T D^nT−1, 
n = 1, 2, . . .
where D^n=
[0    0     ... ...]
[0  d^n_2  0 ... ]
[0    0   d^n_3..]
[...   ...    ...   ...]
24
Q

Solutions of forwards and backward diff eqs

expressed in matrix

A

Spectral representation of G allows exp(Gt) to be found

P(t) = P(0) exp(Gt).

exp(Gt)= Σ_{n=0,∞} [(Gt)^n/n!]
= T ( Σ_{n=0,∞} [(Dt)^n/n!]T^-1

= T
[ 1  0  ...  ...]
[0  Σ[(d_2 t)^n/n!] 0 ...]
[0   0    Σ[(d_3 t)^n/n!] ...]
[...  ...  ...  ...]
T^-1
=T
[1  0 ... ...]
[0 e^{d_2 t} 0 ...]
[0  0  e^{d_3 t} ...]
[... ... ... ...]
T^-1
25
EXAMPLE 28 Two state chain G= [-1 1] [2 -2]
* if currently in state 1 then prob of jumping to state 2 in δt is δt * in state 2 prob of jump to state 2 in δt is 2δt tend to jump faster away from state 2 than state 1 (expect more time in state 1 as t increases) G has eigenvalues 0 and -3, diagonalize G= [1 1 ][0 0][2/3 1/3] [1 -2][0 -3][1/3 -2/3] Hence P(t)= [1 1 ][0 0 ][2/3 1/3] [1 -2][0 e^{-3}][1/3 -2/3] = [ (2/3) + (1/3)e^{-3t} (1/3) -(1/3)e^{-3t}] [(2/3) - (2/3)e^{-3t} (1/3) +(2/3)e^{-3t}] (TM rows sum to 1) so P(t) tends to [2/3 1/3] [2/3 1/3] as t tends to infinity ( as e^{-3t} tends to 0) *prob of being in state 1 a long time in the future is 2/3 and 2 is 1/3 entries converge to some kind of stat dist?
26
Solution of forward equations for Poisson process
(missed calculations in lectures) ... If we treat s as fixed, this is an ordinary differential equation with respect to t, and its solution is F_{N_t}(s) = C exp (λ(s − 1)t). F_{N_t}(s) = exp (λ(s − 1)t). A Poisson distribution with parameter µ has probability generating function given by Σ{k=0,∞} [((µ^k e^−µ) /k!) s^k] = e^−µ * Σ{k=0,∞} [(µs)^k /k!] = e^−µ e^µs = e^µ(s−1) , so we can recognise F_{N_t}(s) as the probability generating function of a Poisson distribution with parameter λt. The probability generating function determines the distribution, so we do indeed have that Nt ∼ P o(λt).
27
limiting distributions
Assume that as t → ∞ the transition probabilities pij (t) converge to some limit πj for each j, whatever initial state i In matrix each row of P(t) tends to row vector bold(π ) of the π_j Given pij (t) converge to constants expect their time derivatives to converge to zero i.e. P'(t) → 0 Thus forward equations P'(t) = P(t) G as t → ∞ the first term tends to 0 and each row of second term tends to bold(π)G st limiting probabilities satisfy bold(π)G=0 (limiting distribution satisfies this)
28
Stationary dists
Say that a distribution bold(π) = (πj ) (a row vector) on the states is stationary if πP(t) = π for all t > 0.
29
Stationary dists and limiting dists
*If P(t) can be expressed as exp(Gt) then a distribution π is a stationary distribution if and only if π satisfies bold(π)G=0 condition In particular this identifies stationary and limiting distributions for finite chains. The identification holds for many infinite chains too.
30
skeleton chain
An interesting way to think about the limiting behaviour of a continuous time chain is to imagine observing the chain only at discrete times 0, δt, 2δt, . . . . The resulting sequence is a discrete time Markov chain (called a skeleton chain (obtain a discrete MC from a continuous one and deduce results from another ) *skeleton chains are automatically aperiodic so we will use existence of limiting dists
31
if continuous chain is irreducible (all states communicate in the sense that p_ij (t) bigger than 0 and p_ij(s) bigger than 0 for some t and s), then either if the chain is irreducible there are possible routes from I to j and j to I so positive transition probs
1) (STABLE CASE)the system is stable and the limiting distribution is limit as t tends to infinity of [ p_ij (t)] = π_j is the unique solution of bold(π)G = 0. (we have a limiting dist=stat dist) 2) (corres to null recccurence or transience for discrete) pij(t) → 0 for all states and bold(π)G = 0 has no solution (in discrete case we required aperiodicity but In continuous we assume this)
32
EXAMPLE 30: two state chain, find a stationary distribution for the chain in Example 22 Let S = {1, 2} and set G= [−α α [ β −β] α, β bigger than 0
``` We solve (π1 π2)* [−α α] [β −β] = (0 0) ``` −απ1 + βπ2 = 0 απ1 −βπ2 = 0 so: π2 = (α/β)π1 ``` and constraint π1 + π2 = 1 Stationary dist is bold(π) = ( (β)/(α+β) (α)/(α+β)) ```
33
Example 31. Generalized birth-death process finding stat dist ``` This has transitions i →i + 1, λi δt + o(δt); → i, 1 − (λi + µi) δt + o(δt); → i − 1, µi δt + o(δt). ``` ``` corresponding to gij = {λi if j = i + 1 {−λi − µi if j = i {µi if j = i − 1 {0 otherwise ```
``` G= [−λ_0 λ_0 0 0 0 …] [ µ1 −(λ1 + µ1) λ1 0 0 …] [0 µ2 −(λ2 + µ2) λ2 0 …] [0 0 µ3 −(λ3 + µ3) λ3 …] [ ... ... ... ... ... …] ``` The equations for the stationary distribution (π_0 π1 …)G = 0 are therefore λ_0π_0 = µ_1π_1 (λj + µj)πj = λ_{j−1}π_{j−1} + µ_{j+1}π_{j+1}, j ≥ 1 π1 = (λ0/µ1) π0, π2 = (1/µ_2){−λ_0π_0 + (λ_1 + µ_1)(λ_0π_0)/ (µ1)) = (λ0λ1/µ1µ2)π0 in general: πj = (λ0 ...λ_{j−1})/( µ1 ...µj)π0 which gives a probability distribution if and only if λ_0 ≠ 0 and Σ_{j=1, ∞} [(λ0 ...λ_{j−1})/( µ1 ...µj)] less than ∞ ie sum is finite Then the distribution has prob funct π0 = 1/ [1 + Σ_{j=1, ∞} [(λ0 ...λ_{j−1})/( µ1 ...µj)]] πj = (λ0 ...λ_{j−1})/( µ1 ...µj)π0 j = 1,2,....
34
Example 32. M/M/1 queue | For the M/M/1 queue, Example 26 with n = 1, we have λi = λ and µi = µ for all i.
So example 31 means: *Σ_{j=1, ∞} [(λ^j)/( µ^j)] less than ∞ true iff λ less than µ. (I.e. the service rate exceeds the arrival rate summing the geometric series gives π_0 = (µ−λ)/µ , and so π_j =(µ−λ)/µ (λ/µ)^j Hence the stationary distribution is a geometric distribution.
35
Example 33. Immigration-death process Consider a generalized birth-death process with λi = λ for all i, representing a constant rate of immigration, and µi = µi, as in the linear birth-death process.
The criterion in Example 31 now becomes Σ_{j=1, ∞} [(λ^j)/( µ^j j!)] less than ∞ which is always satisfied because Σ_{j=1, ∞} [(λ^j)/( µ^j j!)] = e^{λ/µ} -1 Then π_0 = 1/e^{λ/µ} and πj = λ^j/(µ^{j} j! e^{λ/µ}) = ((λ/µ)^{j} e^{−λ/µ})/ j! showing that the stationary distribution is Poisson with parameter λ/µ.
36
Evolution of the chain
How long does chain remain in current state for? Holding times Where does it go when it leaves that state? First Jump Probabilities:
37
Holding times
Suppose chain is initially in state u. denote H_i as length of time after which it first leaves I (HOLDING TIME in i) We consider : F(t) = P(Hi > t) (1−F(t) =cumulative distribution function of Hi) From the markov property (probability that chain is still in state i at time t + δt: F(t + δt) = F(t)(1 + g_{ii}δt + o(δt)) (probability still in state i at time t)*(probability that it doesn't move away between time t and t + δt) (g_ii is negative as diag entry)
38
Distribution of holding times
F(t + δt) = F(t)(1 + g_{ii}δt + o(δt)) On subtracting F(t) from both sides, dividing by δt and taking the limit it follows that F'(t) = g_{ii}F(t) The solution is F(t) = e^{g_ii t} (we knew F_0 =1 so constant included in sol) so that Hi has an exponential distribution with mean 1/(−gii) (recall that gii less than 0). For notational convenience we can write −gkk, as gk. Then Hi is exponential with mean 1/gi.
39
First Jump Probabilities:
P(i ↝ j during (t,t + δt) | leave i during (t,t + δt)) = gijδt + o(δt) /(−giiδt + o(δt)) for j ≠ i →gij/gi (ij entry of generator matrix/ ?) = ¯ pij, say, as δt → 0. The ¯ pij are called first jump probabilities. For a conservative chain (rows sum to 0) Σ_{j ≠ i} [ ¯ pij] =1, for each i so the first jump probabilities form a stochastic matrix( with zero diagonal ) Corresp discrete time chain is the first jump chain also true that if J is dest. state on leaving i P(J=j | H_i=t) = ¯ pij , j ≠ i} *calculations missed, since the chance of 2 jumps in δt is o(δt) etc
40
How the chain develops
The process remains in its initial state i for an exponentially distributed time with mean 1/gi It then jumps to another state j ≠ i chosen according to the first jump probabilities ¯ pij sequence of steps you pass through given by discrete jump chain length of time/ holding times are exp dist and is independent of the jump chain
41
Continuous markov chain is described by:
A continuous markov chain is described by generator matrix G off diagonal entries are rates of moving to state j given state i (different states) g_ij entries on diagonal are negative so g_ii is -g_i rows sum to zero usually so g_i's are found by adding the other entries in the same row * starts in a state and remains in that state for a time with exponential distribution with parameter g_i ie minus the diagonal entry in matrix * mean of exponential is 1/g_i Eventually leaves- holding time ends and moves to other state j with probability ¯ pij= gij/gi we think of these as transition probabilities as we have they sum to 1, we call the jump chain
42
To simulate a continuous markov chain we can study the discrete time chain and dont need new methods for inference: we study the jump chain
We use an exponential dist for RV and discrete markov chain for jump chain holding times with exponential distributions To see how, note that the jump chain, the discrete time chain obtained by observing the continuous time process only when it changes state, passes through the same sequence of states as the continuous time process, so properties depending on space but not time structure – for example, probabilities of absorption – can be obtained from the discrete jump chain. Note, however, that the jump chain and the underlying continuous time chain will often have different stationary distributions. ie could spend different amounts of time in different times
43
Example 34. Linear birth-death process from example 25 so each population member reproduces at rate λ and dies at rate µ
``` gi,i+1 = λi, gi,i−1 = µi and gi,i = −(λ + µ)i ``` (entries in G) This tells us that the holding time in state i: the length of time remaining in state i is exp dist with parameter (λ + µ)i probability that next event is a birth (jump chain up) p¯i,i+1 = gi,i+1/ −gi,i=λ/ (λ + µ) probability that the next event is a death is p¯i,i−1 = µ/ (λ + µ) for any i ≥ 1. State 0 is a special case; if state of chain is 0 then remains 0: corresponds to 3
44
BEHAVIOUR OF JUMP CHAIN Example 34. Linear birth-death process from example 25 so each population member reproduces at rate λ and dies at rate µ
The behaviour of the jump chain is then similar to the Gambler’s Ruin discrete time chain with the probability of jumping up being λ/(λ+µ) and that of jumping down being µ/(λ+µ), except at state 0 which cannot be left. The only difference is that in MAS275 there was an upper target N at which the gambler would stop, whereas here the population may increase indefinitely. From MAS275, the probability of reaching N before 0 starting at state i is q_{N,i} = {[1−(µ/λ)^i[/] 1−(µ/λ)^N] λ ≠ µ {i/N λ = µ. Letting N → ∞, we see that q_{N,i} → 0 if µ ≥ λ, indicating that the size of the population will not get indefinitely large and will eventually reach 0, so the population will become extinct with probability 1. (birth rate smaller than death rate- hard to reach large pop) On the other hand, if µ < λ then there are two possibilities. We have that q_{N,i} → 1 − (µ/λ)^i , indicating that, if the chain starts in state i there is a probability 1 −(µ/λ)^i that the population will become indefinitely large and not become extinct, but there is also a probability (µ/λ)^i that the population will reach 0, and hence become extinct (positive probability of getting indefinitely large and probability of becoming extinct)
45
How to fit a continuous-time Markov chain model to observation LIKELIHOOD
GIven a complete record of the chain’s evolution over the period [0, t], {x(s) : 0 ≤ s ≤ t} we know holding times and sequences of states it was in We suppose the model is parametrized by its generator matrix G. Likelihood of G is the the probability (density) of the observation as a function of the parameters *Exp continuous components and discrete jumps *the chain’s path can be described in terms of the sequence of states passed through and the lengths of holding time in each. Thus let {X¯_k : k = 0, 1, . . . , Nt} denote the sequence of states passed through by time t, Nt being the total number of jumps. The holding time in state X¯_k is denoted by H_k. (remains in state for random time H_k, exp dist with parameter entry of G) Then conditionally on X¯_0=x_0 the likelihood is **Holding time H_{N_t} is incomplete at time t, we don't know how long stays in state X¯_n_t observed values in lower case vs upper RVs We observe state 0 to n_{t-1} X¯_0 to X¯_{t-1}, holding times are exp dist terms for each holding time (where gs are same) * each transition we observe to a different state (last term corresponds to the last jump, we know its at least this value, at least this holding time up until t, the probability that this value is at least than this is used as the last term) L(G : x(s), 0 ≤ s ≤ t) = {∏ _{k=0, n_{t} − 1} [gx¯_k e{−gx¯_k h_k} p¯x¯_k x¯_{k+1}) ] e^{−gx¯_{n_t} h_{n_t} = ∏_{k=0, n_t-1} e^{−gx¯_k h_k g_{x¯_k,x¯_{k+1}}) e^{−gx¯_{n_t} h_{n_t}} = exp−(∑_i [ g_ia_i]) ∏_{i≠j} [g^{nij} _ij ] where a_i is the total time spent in state i during [0, t] and nij is the number of transitions observed from state i to state j during [0, t] *therefore we only need n_t,a_i, n_ij to find likelihood number of transitions observed, amount of time spent in each state and number of transitions between each pair of states
46
How to fit a continuous-time Markov chain model to observation LOG LIKELIHOOD
the log likelihood is l = −∑_i [g_ia_i] + ∑_{ij} [n_ij log g_{ij}] which, because g_i = −g_{ii} = ∑_{i≠j} gij , is the same as l = −∑_i ∑_{j≠i} [ g_ijai] + ∑_j≠i [nij log gij]
47
How to fit a continuous-time Markov chain model to observation MAXIMUM LIKELIHOOD ESTIMATES
partial derivative of log likelihoodwrt to each g_ij each transition rate form i to j ∂l/∂g_{ij} = −a_i + nij/gij ``` and is zero at at g^ij = n_ij/a_i i ≠ j mle is number of transitions from i to j / time that it was in state i ``` (ie dividing by the time it had the opportunity to move from i) DIAGONAL ENTRIES ESTIMATE ``` for gi = −gii follows from g^i = −g^_ii = ∑_{j ≠ i} [g^ij] = ∑_{j ≠ i} [n_ij/ai] = ni./ai ``` transitions out of state i/ amount of time in state i ie rate of transitions out of state i
48
How to fit a continuous-time Markov chain model to observation LIKELIHOOD NOTES covariance matrix variance matrix
**estimates g^ij = n_ij/a_i i ≠ j have the form no. transitions i to j per unit time spent in state i. gij as the rate at which transitions from i to j occur. ** generator elements gij are themselves functions of a lower dimension vector parameter, θ say In that case the maximum likelihood estimate θ^ would be found as the value at which l is maximum with respect to θ, and the generator elements would be estimated as gij (θ^) ** The asymptotic variance-covariance matrix of the estimates is given as usual by the inverse of the information matrix, minus 1 times the matrix of second derivatives of the log-likelihood l with respect to the parameters. (This is the observed information matrix J whose expectation is I, the expected information matrix. Either J or I may be used; in large samples they lead to equivalent results.)
49
Example 35. Inference for a linear birth-death process. Birth rate for pop n and death rate for pop n, where parameters are unknown inference for parameters we have the data: #transitions and holding times
λ_n = λ n and µ_n = µ n, for n = 0, 1, . . . , where λ and µ are unknown. The generator elements are g_ij= {λi j = i + 1 {−(λ + µ)i j = i {µ i j = i − 1 Suppose as above that the process has been observed over an interval [0, t], and the times ai and transition counts n_ij recorded. Then the log-likelihood for λ and µ is l = − ∑_i [g_ia_i] + ∑_{i,j} nij log gij = −∑_i [(λ + µ)ia_i] + ∑_i [n_{i,i+1} log(λi)] + ∑_i [n_{i,i−1} log(µi)] partial diff wrt to either parameter, births and deaths terms ∂l/∂λ = −∑_i [ia_i] + (1/λ) ∑_i [n_{i,i+1}] ∂l/∂µ = −∑_i [iai] + (1/µ) ∑_i [n_{i,i−1}] mles are: λ^ = ∑_i [n_{i,i+1}]/∑_i [iai] µ^ = ∑_i [n_{i,i−1}]/ ∑_i [iai] b =∑_[ n_{i,i+1} and d = ∑_i [n_{i,i−1}] are the total number of births and deaths respectively, and ∑[iai] is the total accumulated time, T say, (person-years say) during which birth or death could happen to one individual, so λ^ is the observed birth rate b/T, (spending more time for each individual members to born die for bigger population) and µ^ the observed death rate d/T, per individual per unit time.
50
Example 35. Inference for a linear birth-death process. ESTIMATED STANDARD ERRORS CIs
To find standard errors we calculate the observed information matrix J. From the above ∂^2l/∂λ^2 = −b/λ^2 ∂^2l/∂µ∂λ = 0 ∂^2l/∂µ^2 = −d/µ^2 so that J = [b/(λ^2) 0 ] [ 0 d/µ^2] and J−1 = [(λ^2)/b 0 ] [0 µ^2/d] Estimated standard errors for λb and µb are obtained from J−1 by taking square roots of the appropriate elements and substituting estimates for unknown parameters. For example ese (λ^) = λ^/√b and an approximate 95% confidence interval for λ is λ^ ± 2(λ^/√b)
51
Continuous-time Markov chain models and developments of them are used in many areas; for example in:
* population studies, where, for example, they have been developed to take account of age structure and spatial distributions of animal populations; * manpower planning for large organizations; * studies of social and occupational mobility; • diffusion of news, rumours and internet viruses; * competition, in ecology and in animal populations; * predator-prey phenomena; * disease modelling, including cancer growth; * epidemic modelling The following sketch of epidemic modelling aims to indicate the subject-matter and some of the demands it makes for development of ideas and techniques.
52
EPIDEMICS: cont process
Compartmentalised population SUSCEPTIBLE- one who may catch the relevant disease by contact with an infective individual INFECTIVE- infected REMOVED/IMMUNE- The removed individuals are those who, having caught the disease earlier, have now recovered and become immune or have been removed (by death or by being isolated, for example) and are no longer open to re-infection themselves or able to infect any one else For a population of n, S(t),I(t) and R(t) thus S(t)+I(t)+r(t)=n. (we only track S and I)
53
EPIDEMIC RATES
A susceptible person meeting infected can catch the disease At time t the rate of occurrence of such meetings and infections will depend on the numbers S(t) and I(t) of both susceptibles and infectives. proportional *rate of change in the number of susceptibles is dS(t)/dt = −βS(t)I(t) for some β bigger than 0 β= infection rate *reduction if infected recovers, rate proportional to I(t) dI(t)/dt = βS(t)I(t)−γI(t) = (βS(t)−γ)I(t) where γ is a constant referred to as the removal rate. * the rate of change of the number removed is dR(t)/ dt = γI(t).
54
For epidemic begins from a small number I0 of infectives at time 0.
We approximate S by S_0 for beginning of chain; we only track I for linear birth/death process solution of equations dS(t)/dt = −βS(t)I(t) dI(t)/dt = (βS_0)−γ)I(t) then gives a deterministic prediction for the progress of the epidemic until no new infections are occurring
55
EPIDEMIC RATES giving MARKOV CHAIN continuous time MC model
during (t, t+𝛿t) (I,S) → {(I+1,S-1) with prob βSI𝛿t+o(𝛿t) {(I-1),S) with probability γI𝛿t+o(𝛿t) A markov chain on the state of pairs (S,I) *if we consider at the beginning of the epidemic #S close to n and I close to 0, reduction in S slow thus the transition probabilities will be approx. I → {I+1 with prob βS_0I𝛿t+o(𝛿t) {I-1 with probability γI𝛿t+o(𝛿t) only changes in I need be tracked Transition probabs of a simple linear birth-death process with birth rate λ_i = βS_0i death rate μ_i = γi Follows that infectives die out if βS_0⩽γ (S_0 ⩽ ρ) and therefore there will be no major epidemic (γ/β = ρ) *If βS_0 bigger than γ there is still a probability ( γ/βS_0 )^l_0 of extinction so that a major epidemic MAY take place (not def will take place) Different conclusions for stochastic approach vs deterministic