Week 2 Flashcards

1
Q

State Bayes’ Theorem for random variables Y and X in terms of their conditional and marginal densities.

A

f_{Y|X}(y|x) = [f_{X|Y}(x|y) * f_Y(y)] / f_X(x), where f_X(x) = ∫ f_{X|Y}(x|y) * f_Y(y) dy.
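
A standard worked instance (added for concreteness, not part of the original card): take Y ~ N(0, 1) and X|Y = y ~ N(y, 1). Then

f_{Y|X}(y|x) ∝ exp[-(x - y)²/2] * exp[-y²/2] ∝ exp[-(y - x/2)²],

so Y|X = x ~ N(x/2, 1/2): the posterior mean shrinks the observation halfway toward the prior mean.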

2
Q

In Bayesian analysis, how is the parameter θ treated, and what represents the initial beliefs about it?

A

θ is treated as a random variable with a prior density π₀(θ) encapsulating beliefs about θ before observing data.

3
Q

Write down the formula for the posterior distribution π(θ|x) using Bayes’ Theorem, given data x = (x₁, ..., x_n).

A

π(θ|x) = [Π_{i=1}^n f_{X|θ}(x_i|θ) * π₀(θ)] / f(x) = [L(θ, x) * π₀(θ)] / ∫ L(θ, x) * π₀(θ) dθ
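
A minimal numerical sketch of this formula (illustrative, not from the cards): it evaluates the posterior on a grid for a made-up Bernoulli(θ) sample, with the evidence f(x) approximated by a Riemann sum.

```python
import numpy as np

# Made-up Bernoulli(theta) sample, purely for illustration
x = np.array([1, 0, 1, 1, 0, 1])

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space
d = theta[1] - theta[0]                  # grid spacing
prior = np.ones_like(theta)              # flat prior pi_0(theta) on (0, 1)

# Likelihood L(theta, x) = prod_i f(x_i | theta), evaluated on the grid
likelihood = theta**x.sum() * (1 - theta)**(len(x) - x.sum())

# Evidence f(x) = integral of L(theta, x) * pi_0(theta) dtheta (Riemann sum)
evidence = (likelihood * prior).sum() * d

posterior = likelihood * prior / evidence   # pi(theta | x)
print((posterior * d).sum())                # ~1.0: the posterior integrates to one
```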

4
Q

What is the likelihood function, L(θ, x), in the context of Bayesian inference?

A

L(θ, x) = Π_{i=1}^n f_{X|θ}(x_i|θ), representing the probability (or density) of observing the data x given a specific value of the parameter θ.

5
Q

What is the term for the denominator in the Bayes’ Theorem formula for π(θ|x), and what does it represent?

A

The denominator, f(x) = ∫ L(θ, x) π₀(θ) dθ, is called the marginal likelihood or evidence. It represents the marginal probability (density) of observing the data x, integrated over all possible values of θ.

6
Q

What is the proportionality relationship used for calculating the posterior distribution, ignoring the normalizing constant?

A

π(θ|x) ∝ L(θ, x) * π₀(θ)

7
Q

Describe how Bayesian updating works sequentially when a new datum x₂ arrives after observing x₁.

A

The posterior after x₁, π(θ|x₁), becomes the prior for processing x₂. The new posterior is π(θ|x₁, x₂) ∝ f_{X|θ}(x₂|θ) * π(θ|x₁).
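
A quick sketch (with illustrative data) checking that sequential updating reproduces the batch posterior, again on a grid with a Bernoulli likelihood:

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space
d = theta[1] - theta[0]
prior = np.ones_like(theta)              # flat prior pi_0(theta)

def update(current_prior, xi):
    """One Bayesian update: posterior proportional to f(x_i | theta) * prior."""
    lik = theta**xi * (1 - theta)**(1 - xi)
    post = lik * current_prior
    return post / (post.sum() * d)       # renormalize on the grid

x1, x2 = 1, 0                            # made-up observations
sequential = update(update(prior, x1), x2)   # posterior after x1 is the prior for x2

batch = theta**(x1 + x2) * (1 - theta)**(2 - x1 - x2) * prior
batch /= batch.sum() * d                 # posterior from both observations at once

print(np.allclose(sequential, batch))    # True: same posterior either way
```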

8
Q

If T = T(X) is a sufficient statistic for θ, how does this simplify the calculation of the posterior distribution π(θ|x)?

A

The posterior distribution depends on the data x only through the value of the sufficient statistic T(x). That is, π(θ|x) ∝ g(T(x), θ) * π₀(θ), where L(θ, x) = g(T(x), θ)h(x) by the Factorization Theorem.
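
A small check of this fact (illustrative): two Bernoulli samples with the same n and the same T(x) = Σ x_i produce identical grid posteriors.

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta)              # flat prior pi_0(theta)

def posterior(x):
    # The Bernoulli likelihood factorizes through T(x) = sum(x),
    # as the Factorization Theorem guarantees.
    lik = theta**sum(x) * (1 - theta)**(len(x) - sum(x))
    post = lik * prior
    return post / post.sum()

# Different samples, same sufficient statistic: n = 5, T(x) = 3
print(np.allclose(posterior([1, 1, 1, 0, 0]), posterior([0, 1, 0, 1, 1])))  # True
```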

9
Q

What are the two main computational/analytical challenges mentioned in Bayesian inference related to the posterior and marginal likelihood?

A

1. Evaluating the marginal likelihood integral f(x) = ∫ L(θ, x) π₀(θ) dθ.
2. Determining the distributional form of the posterior π(θ|x).

10
Q

What is a conjugate prior family P for a class of likelihood distributions F = {f_{X|θ}(x|θ)}?

A

P is conjugate for F if, for any prior π₀(θ) ∈ P and any likelihood f_{X|θ}(x|θ) ∈ F, the resulting posterior distribution π(θ|x) is also in the family P.

11
Q

What is the main advantage of using a conjugate prior?

A

It leads to an analytically tractable posterior calculation, meaning the form of the posterior distribution is known and often easy to compute.
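
A concrete sketch (a standard example, not taken from the cards): the Beta family is conjugate for Bernoulli sampling, so updating reduces to adding counts, with no integral to evaluate.

```python
def beta_bernoulli_update(a, b, x):
    """Conjugate update: Beta(a, b) prior + Bernoulli data x -> Beta posterior.

    Successes add to a and failures add to b; this closed form is exactly
    the tractability that a conjugate prior buys."""
    return a + sum(x), b + len(x) - sum(x)

# Illustrative data: 7 successes and 3 failures from a uniform Beta(1, 1) prior
print(beta_bernoulli_update(1, 1, [1]*7 + [0]*3))   # (8, 4): posterior is Beta(8, 4)
```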

12
Q

Write the general form of a k-parameter exponential family pdf/pmf, f_{X|θ}(x|θ).

A

f_{X|θ}(x|θ) = h(x) * c(θ) * exp[ Σ_{j=1}^k t_j(x) * w_j(θ) ]
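
A standard instance (added for concreteness): the Poisson(θ) pmf fits this form with k = 1, since

f_{X|θ}(x|θ) = e^{-θ} * θ^x / x! = (1/x!) * e^{-θ} * exp[x * log θ],

giving h(x) = 1/x!, c(θ) = e^{-θ}, t₁(x) = x, and w₁(θ) = log θ.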

13
Q

What are the components h(x), c(θ), t_j(x), and w_j(θ) in the exponential family definition?

A

h(x) is a function of x only; c(θ) is a function of θ only (related to the normalizing constant); t_j(x) are the sufficient statistics; w_j(θ) are functions of the parameters (often called the natural parameters).

14
Q

When is an exponential family called ‘regular’?

A

The family is regular if the support of the distribution, denoted by the set X, does not depend on the parameter θ. For example, N(θ, 1) is regular, whereas Uniform(0, θ) is not, since its support (0, θ) depends on θ.

15
Q

What is the form of the conjugate prior π₀(θ) for a parameter θ of a regular k-parameter exponential family likelihood?

A

π₀(θ) = d(α, β) * [c(θ)]^α * exp[ Σ_{j=1}^k β_j * w_j(θ) ], where α and β = (β₁, ..., β_k) are hyperparameters and d(α, β) is the prior normalizing constant.
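
Continuing the Poisson instance from card 12: with c(θ) = e^{-θ} and w₁(θ) = log θ, this prior becomes π₀(θ) ∝ e^{-αθ} * exp[β₁ * log θ] = θ^{β₁} * e^{-αθ}, i.e. the kernel of a Gamma(β₁ + 1, α) density, recovering the familiar Gamma prior for a Poisson rate.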

16
Q

Given a sample x = (x₁, ..., x_n) from a regular exponential family and a conjugate prior as defined above, what is the form of the posterior distribution π(θ|x)?

A

The posterior is proportional to [c(θ)]^(α+n) * exp[ Σ_{j=1}^k (β_j + Σ_{i=1}^n t_j(x_i)) * w_j(θ) ]. It has the same form as the prior but with updated hyperparameters.

17
Q

How are the hyperparameters (α, β) updated to get the posterior hyperparameters (α*, β*) for the conjugate prior of a regular exponential family after observing data x = (x₁, ..., x_n)?

A

α* = α + n; β_j* = β_j + Σ_{i=1}^n t_j(x_i) for j = 1, ..., k.
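
A minimal sketch of this rule in code (illustrative; written for the one-parameter case, with the Poisson sufficient statistic t(x) = x as the default):

```python
def conjugate_update(alpha, beta, data, t=lambda x: x):
    """Posterior hyperparameters for a regular 1-parameter exponential family:
    alpha* = alpha + n and beta* = beta + sum_i t(x_i)."""
    return alpha + len(data), beta + sum(t(x) for x in data)

# Poisson example with made-up counts: prior hyperparameters alpha = 2, beta = 3
# (a Gamma(beta + 1, alpha) prior on the rate, per card 15)
print(conjugate_update(2, 3, [4, 1, 3]))   # (5, 11): posterior is Gamma(12, 5)
```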