Chapter 7: Markov Chains, Monte Carlo Estimation Flashcards

1
Q

What is meant by stochastic processes?

A

Stochastic processes are mathematical models that describe systems that evolve over time in a random manner. They are used to model a wide range of phenomena, such as stock prices, weather patterns, and biological populations. A stochastic process typically consists of a set of random variables that evolve over time according to certain rules or probability distributions.

More formally:
A stochastic process is a sequence X₁, X₂, … = (Xₙ)ₙ∈ℕ of RVs Xₙ : Ω → X

where:
- The set X is often called the state space of the process.
- The index n is called the time index.
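
A minimal sketch in Python of one concrete stochastic process, a simple random walk on the integers (the state space, step distribution, and start state are illustrative assumptions, not from the slides):

```python
import random

def random_walk(n_steps: int) -> list[int]:
    """Simulate (X_1, ..., X_n): each X_{n+1} adds a random +/-1 step to X_n."""
    x = 0                                # start state, chosen for illustration
    states = [x]
    for _ in range(n_steps):
        x += random.choice([-1, 1])      # the randomness driving the process
        states.append(x)
    return states

print(random_walk(10))                   # one realization of the process
```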

2
Q

Explain why Bayesian networks (BNs) are unsuitable for representing stochastic processes. Instead, what other representation is better suited, and why?

A

Stochastic processes are often unbounded in time. If we created a node in the BN for each X₁, X₂, … = (Xₙ)ₙ∈ℕ, the BN would be infinitely long. In addition, the conditional probability table of each Xₙ would have to condition on all of its predecessors, so the tables grow exponentially with n. Therefore, using BNs to represent stochastic processes is unworkable.

Instead, Markov chains can be used to represent these types of processes. In a Markov chain, the Markov property states that the probability of being in a particular state at a future time step depends only on the current state and the transition probabilities between states, not on any earlier states or the path that led to the current state (a state is conditionally independent of its earlier predecessors given the immediately preceding one). This means that the CPTs do not grow exponentially as in BNs; the representation grows only linearly with the time horizon, as long as the number of unique states remains constant, as the sketch below illustrates.
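
As a sketch of this idea: a homogeneous two-state chain needs only one small transition table, reused at every step (the states and probabilities below are a hypothetical example):

```python
import random

# One transition table p(next | current) covers every time step;
# its size is |X|^2, independent of the time horizon.
transitions = {
    "a": {"a": 0.4, "b": 0.6},
    "b": {"a": 0.5, "b": 0.5},
}

def step(state: str) -> str:
    """Sample the next state given only the current one (the Markov property)."""
    options = transitions[state]
    return random.choices(list(options), weights=list(options.values()))[0]

state = "a"
trajectory = [state]
for _ in range(10):
    state = step(state)
    trajectory.append(state)
print(trajectory)
```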

3
Q

What is meant by time homogeneity? Give a formal definition.

A

A common additional assumption is time homogeneity (also called stationarity):

p(Xₖ₊₁ | Xₖ) = p(X₂ | X₁) for all k ∈ ℕ

Then the number of required parameters is constant w.r.t. the desired time horizon: a single transition distribution is reused at every step.

4
Q

Suppose we have a homogeneous Markov chain (Xₙ)ₙ∈ℕ with state space X = {a, b}
and parameters:

p(X1 = a) = 1, p(X2 = a | X1 = a) = 0.4, p(X2 = b | X1 = a) = 0.6,
p(X1 = b) = 0, p(X2 = a | X1 = b) = 0.5, p(X2 = b | X1 = b) = 0.5.

What is p(X3 = a)?

A

0.46. Working it out by marginalizing over X₂ (and using homogeneity, i.e. p(X₃ | X₂) = p(X₂ | X₁)):

p(X₂ = a) = p(X₁ = a) · p(X₂ = a | X₁ = a) + p(X₁ = b) · p(X₂ = a | X₁ = b) = 1 · 0.4 + 0 · 0.5 = 0.4
p(X₂ = b) = 1 − p(X₂ = a) = 0.6
p(X₃ = a) = p(X₂ = a) · 0.4 + p(X₂ = b) · 0.5 = 0.16 + 0.30 = 0.46
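
The same computation as a matrix-vector product, a sketch assuming numpy is available:

```python
import numpy as np

T = np.array([[0.4, 0.6],    # row: p(X_{k+1} | X_k = a)
              [0.5, 0.5]])   # row: p(X_{k+1} | X_k = b)
p1 = np.array([1.0, 0.0])    # p(X_1) over states (a, b)

p2 = p1 @ T                  # p(X_2) = (0.4, 0.6)
p3 = p2 @ T                  # p(X_3) = (0.46, 0.54)
print(p3[0])                 # 0.46
```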

5
Q

What is meant by the limit distribution and the burn-in period of a Markov chain?

A

In a Markov chain, the limit distribution (also called the stationary or steady-state distribution) is the probability distribution that the state distribution approaches as time goes on, regardless of the initial state (for well-behaved chains). The limit distribution is important because it represents the long-term behavior of the system, i.e., the distribution over states that the system is likely to be in after a large number of time steps.

When simulating a Markov chain on a computer, it is important to run the simulation long enough for the state distribution to approach the limit distribution. Note, however, that the early states of the simulation are typically not samples from the limit distribution. These initial samples form the "burn-in period" and must be discarded before collecting data, to avoid biasing the sample. The length of the burn-in period is the number of steps needed for the chain to get close to its limit distribution (see the sketch below).
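
A hedged sketch of the burn-in idea, reusing the two-state chain from card 4: simulate, throw away the first samples, then estimate the limit distribution by counting (the burn-in length of 1000 is an arbitrary illustrative choice; the true stationary p(a) here is 5/11):

```python
import random

def stationary_estimate(n_steps: int, burn_in: int) -> float:
    """Estimate the limiting probability of state 'a', discarding burn-in samples."""
    transitions = {"a": 0.4, "b": 0.5}   # p(next = 'a' | current), from card 4
    state, count_a = "a", 0
    for t in range(n_steps):
        state = "a" if random.random() < transitions[state] else "b"
        if t >= burn_in:                 # keep only post-burn-in samples
            count_a += state == "a"
    return count_a / (n_steps - burn_in)

print(stationary_estimate(100_000, burn_in=1_000))  # ≈ 5/11 ≈ 0.4545
```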

6
Q

Explain how likelihoods of continuous random variables can be represented. How can we calculate the probability that a continuous RV takes on a value within a given interval?

A

A probability density function (PDF) is a function that describes the probability distribution of a continuous random variable. The value of the function at a given point represents the probability density at that point, which is a measure of the likelihood of the random variable taking on a value in the vicinity of that point.

The area under the curve of the PDF between two points on the x-axis represents the probability that the random variable will take on a value within that interval. More specifically, if we denote the PDF as f(x), the probability that the random variable X will take on a value in the interval [a, b] is given by the definite integral:

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
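
As a sketch, such interval probabilities can be approximated numerically; below, P(−1 ≤ X ≤ 1) for a standard normal via a simple midpoint rule (the distribution, interval, and step count are illustrative choices):

```python
import math

def normal_pdf(x: float) -> float:
    """PDF of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(integrate(normal_pdf, -1.0, 1.0))  # ≈ 0.6827 = P(-1 <= X <= 1)
```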

7
Q

What is the difference between probability density and the Area Under the Curve (AUC)?
(not strictly exam material, but good for understanding PDFs)

A

It may be helpful to think of the PDF as a smooth curve that describes the relative likelihood of different values of the random variable. The height of the curve at a given point represents the relative likelihood of the random variable taking on a value near that point. The area under the curve between two points represents the relative likelihood of the random variable taking on a value within that interval.

8
Q

What is a cumulative distribution function?

A

The cumulative distribution function (CDF) FX : ℝ → [0, 1] of X is defined for all x ∈ ℝ as

FX(x) = PX(X ≤ x) = PX({y ∈ X : y ≤ x})

9
Q

What is meant by the expected value, and how can you calculate it?

A

Intuitively, E[X] is the probability-weighted average of the possible outcomes of X.

For discrete RVs, E[X] is calculated by summing each possible value multiplied by its probability: E[X] = Σ x · p(x), where the sum runs over all x ∈ X.

The expected value of a continuous random variable X with probability density function f(x) is defined as:

E[X] = ∫ x f(x) dx over (−∞, ∞)

The integral is taken over the entire domain of the random variable X. It is a weighted average of all possible values of the random variable, where each value is weighted by its probability density.
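
Both cases in a short sketch: the discrete sum for a fair die and the continuous integral for a Uniform(0, 1) density (both illustrative examples):

```python
# Discrete: E[X] = sum of x * p(x), here for a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}
e_discrete = sum(x * p for x, p in die.items())                # 3.5

# Continuous: E[X] = integral of x * f(x) dx, here f(x) = 1 on [0, 1],
# approximated with a midpoint rule.
n = 10_000
h = 1.0 / n
e_continuous = h * sum((i + 0.5) * h * 1.0 for i in range(n))  # ≈ 0.5

print(e_discrete, e_continuous)
```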

10
Q

Provide three reasons for why numerically evaluating a sum or integral is hard in practice.

A
  1. Distribution function pX may not exist
  2. Distribution function pX may exist but be hard to compute
  3. In general, problem may involve high-dimensional sums or integrals

For example, calculating the expected value of a function defined on the sample space of a Bayesian network requires exponentially many evaluations of the network.

11
Q

How can we overcome the difficulties of calculating the expected value in practice?

A

We can use Monte Carlo estimation: approximate the expected value by drawing k samples x₁, …, xₖ from pX and averaging the function values,

E[f(X)] ≈ (1/k) · (f(x₁) + … + f(xₖ))

This estimate converges to the true expected value as k grows, and the estimation error shrinks as O(1/√k)
(read as: to reduce the error ten-fold, you need a hundred times as many samples; see the sketch below).
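
A minimal Monte Carlo sketch, assuming we can sample from pX: here we estimate E[X²] = 1 for a standard normal X (f and the distribution are illustrative choices):

```python
import random

def mc_estimate(k: int) -> float:
    """Monte Carlo estimate of E[f(X)] with f(x) = x^2, X ~ N(0, 1)."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k)) / k

for k in (100, 10_000, 1_000_000):
    print(k, mc_estimate(k))   # the error shrinks roughly like 1/sqrt(k)
```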

12
Q

Provide one method for sampling when using Monte Carlo Estimation. Also name an important assumption when using this sampling method.

A

Inverse-transform sampling is a technique for generating random samples from a probability distribution via its cumulative distribution function (CDF): draw U ~ Uniform(0, 1) and return X = F⁻¹(U), which then has CDF F. It is used in Monte Carlo simulation to generate samples from distributions that are not easy to sample from directly (see the sketch below).

Assumptions include:
- We must know (or be able to compute) the inverse CDF F⁻¹; examples include the Gaussian, Gamma, or Poisson distributions.
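
A sketch of inverse-transform sampling for the exponential distribution, whose inverse CDF has the closed form F⁻¹(u) = −ln(1 − u)/λ (the rate λ = 2 is an arbitrary illustrative choice):

```python
import math
import random

def sample_exponential(lam: float) -> float:
    """Inverse-transform sampling: draw U ~ Uniform(0, 1), return F^{-1}(U)."""
    u = random.random()              # U ~ Uniform(0, 1)
    return -math.log(1.0 - u) / lam  # inverse CDF of Exp(lam)

samples = [sample_exponential(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))   # ≈ 1/lam = 0.5
```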
