Week 2 Flashcards
State Bayes’ Theorem for random variables Y and X in terms of their conditional and marginal densities.
f<sub>Y|X</sub>(y|x) = [f<sub>X|Y</sub>(x|y) * f<sub>Y</sub>(y)] / f<sub>X</sub>(x), where f<sub>X</sub>(x) = ∫ f<sub>X|Y</sub>(x|y) * f<sub>Y</sub>(y) dy.
In Bayesian analysis, how is the parameter θ treated, and what represents the initial beliefs about it?
θ is treated as a random variable with a prior density π₀(θ) encapsulating beliefs about θ before observing data.
Write down the formula for the posterior distribution π(θ|x) using Bayes’ Theorem, given data x = (x₁, ..., xₙ).
π(θ|x) = [Π<sub>i=1</sub><sup>n</sup> f<sub>X|θ</sub>(x<sub>i</sub>|θ) * π₀(θ)] / f(x) = [L(θ, x) * π₀(θ)] / ∫ L(θ, x) * π₀(θ) dθ
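The formula above can be checked numerically on a grid. This is a minimal sketch, assuming a hypothetical Bernoulli (coin-flip) likelihood with a uniform prior; the data values and grid size are illustrative, not from the course.

```python
import numpy as np

# Grid over the parameter theta (assumed example: Bernoulli likelihood,
# uniform prior pi_0(theta) = 1 on (0, 1))
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta)
x = np.array([1, 0, 1, 1, 0, 1])   # hypothetical data x = (x_1, ..., x_n)

# Likelihood L(theta, x) = prod_i f(x_i | theta), evaluated at every grid point
likelihood = np.prod(theta[None, :] ** x[:, None] *
                     (1 - theta[None, :]) ** (1 - x[:, None]), axis=0)

# Numerator of Bayes' Theorem, then divide by a Riemann-sum estimate of f(x)
unnorm = likelihood * prior
dtheta = theta[1] - theta[0]
posterior = unnorm / (unnorm.sum() * dtheta)

print(posterior.sum() * dtheta)      # ≈ 1: the posterior is a proper density
print(theta[np.argmax(posterior)])   # ≈ 0.667, near the sample mean 4/6
```

The normalizing division is exactly the role of the marginal likelihood f(x) in the denominator of the formula.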
What is the likelihood function, L(θ, x), in the context of Bayesian inference?
L(θ, x) = Π<sub>i=1</sub><sup>n</sup> f<sub>X|θ</sub>(x<sub>i</sub>|θ), representing the probability (or density) of observing the data x given a specific value of the parameter θ.
What is the term for the denominator in the Bayes’ Theorem formula for π(θ|x), and what does it represent?
The denominator, f(x) = ∫ L(θ, x) π₀(θ) dθ, is called the marginal likelihood or evidence. It represents the marginal probability (density) of observing the data x, integrated over all possible values of θ.
What is the proportionality relationship used for calculating the posterior distribution, ignoring the normalizing constant?
π(θ|x) ∝ L(θ, x) * π₀(θ)
Describe how Bayesian updating works sequentially when a new datum x₂ arrives after observing x₁.
The posterior after x₁, π(θ|x₁), becomes the prior for processing x₂. The new posterior is π(θ|x₁, x₂) ∝ f<sub>X|θ</sub>(x₂|θ) * π(θ|x₁).
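The sequential scheme gives the same answer as processing all the data at once. A minimal sketch, assuming a hypothetical Beta(a, b) prior for a Bernoulli parameter (the specific values are illustrative):

```python
# One conjugate-style update: a Bernoulli observation x in {0, 1} turns
# Beta(a, b) into Beta(a + x, b + 1 - x)
def update(a, b, x):
    return a + x, b + 1 - x

a, b = 1, 1        # assumed prior: Beta(1, 1), i.e. uniform
x1, x2 = 1, 0      # two hypothetical data points

# Sequential: the posterior after x1 becomes the prior for x2
a1, b1 = update(a, b, x1)
a12, b12 = update(a1, b1, x2)

# Batch: process (x1, x2) in one step
a_batch, b_batch = a + x1 + x2, b + 2 - x1 - x2

print((a12, b12), (a_batch, b_batch))   # identical: (2, 2) (2, 2)
```

Sequential and batch updating coincide because the likelihood factorizes over independent observations.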
If T = T(X) is a sufficient statistic for θ, how does this simplify the calculation of the posterior distribution π(θ|x)?
The posterior distribution depends on the data x only through the value of the sufficient statistic T(x). That is, π(θ|x) ∝ g(T(x), θ) * π₀(θ), where L(θ, x) = g(T(x), θ)h(x) by the Factorization Theorem.
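A quick sketch of this point, assuming a hypothetical Bernoulli sample with a Beta(1, 1) prior: since T(x) = Σᵢ xᵢ is sufficient, two samples with the same size and the same sum yield identical posteriors.

```python
# Posterior for a Bernoulli parameter under an assumed Beta(a, b) prior:
# Beta(a + T, b + n - T), where T(x) = sum of the observations
def posterior_params(xs, a=1, b=1):
    T = sum(xs)    # sufficient statistic T(x)
    n = len(xs)
    return a + T, b + n - T

# Different orderings, same n and same T(x) => same posterior
print(posterior_params([1, 1, 0, 0]))   # (3, 3)
print(posterior_params([0, 1, 0, 1]))   # (3, 3)
```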
What are the two main computational/analytical challenges mentioned in Bayesian inference related to the posterior and marginal likelihood?
1. Evaluating the marginal likelihood integral f(x) = ∫ L(θ, x) π₀(θ) dθ.
2. Determining the distributional form of the posterior π(θ|x).
What is a conjugate prior family P for a class of likelihood distributions F = {f<sub>X|θ</sub>(x|θ)}?
P is conjugate for F if, for any prior π₀(θ) ∈ P and any likelihood f<sub>X|θ</sub>(x|θ) ∈ F, the resulting posterior distribution π(θ|x) is also in the family P.
What is the main advantage of using a conjugate prior?
It leads to an analytically tractable posterior calculation, meaning the form of the posterior distribution is known and often easy to compute.
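To illustrate the tractability, here is a sketch of a standard conjugate pair: a Normal prior with a Normal likelihood of known variance. The posterior is again Normal, with closed-form parameters, so no integral needs to be evaluated. All numbers below are hypothetical.

```python
# Assumed example: prior theta ~ N(mu0, tau2), data x_i ~ N(theta, sigma2)
# with sigma2 known. The posterior is Normal with precision-weighted mean.
mu0, tau2 = 0.0, 4.0       # prior mean and variance for theta
sigma2 = 1.0               # known observation variance
x = [1.2, 0.8, 1.0, 1.4]   # hypothetical data
n = len(x)
xbar = sum(x) / n

# Closed-form posterior parameters (precisions add; means are precision-weighted)
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (mu0 / tau2 + n * xbar / sigma2)
print(post_mean, post_var)
```

The whole posterior calculation reduces to arithmetic on the hyperparameters, which is exactly the advantage conjugacy buys.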
Write the general form of a k-parameter exponential family pdf/pmf, f<sub>X|θ</sub>(x|θ).
f<sub>X|θ</sub>(x|θ) = h(x) * c(θ) * exp[ Σ<sub>j=1</sub><sup>k</sup> t<sub>j</sub>(x) * w<sub>j</sub>(θ) ]
What are the components h(x), c(θ), t<sub>j</sub>(x), and w<sub>j</sub>(θ) in the exponential family definition?
h(x) is a function of x only; c(θ) is a function of θ only (related to the normalizing constant); t<sub>j</sub>(x) are the sufficient statistics; w<sub>j</sub>(θ) are functions of the parameters (often called natural parameters).
When is an exponential family called ‘regular’?
The family is regular if the support of the distribution, denoted by the set X, does not depend on the parameter θ.
What is the form of the conjugate prior π₀(θ) for a parameter θ of a regular k-parameter exponential family likelihood?
π₀(θ) = d(α, β) * [c(θ)]<sup>α</sup> * exp[ Σ<sub>j=1</sub><sup>k</sup> β<sub>j</sub> * w<sub>j</sub>(θ) ], where α and β = (β₁, ..., βₖ) are hyperparameters and d(α, β) is the prior normalizing constant.
Given a sample x = (x₁, ..., xₙ) from a regular exponential family and a conjugate prior as defined above, what is the form of the posterior distribution π(θ|x)?
The posterior is proportional to [c(θ)]<sup>(α+n)</sup> * exp[ Σ<sub>j=1</sub><sup>k</sup> (β<sub>j</sub> + Σ<sub>i=1</sub><sup>n</sup> t<sub>j</sub>(x<sub>i</sub>)) * w<sub>j</sub>(θ) ]. It has the same form as the prior but with updated hyperparameters.
How are the hyperparameters (α, β) updated to get the posterior hyperparameters (α*, β*) for the conjugate prior of a regular exponential family after observing data x = (x₁, ..., xₙ)?
α* = α + n; β<sub>j</sub>* = β<sub>j</sub> + Σ<sub>i=1</sub><sup>n</sup> t<sub>j</sub>(x<sub>i</sub>) for j = 1, ..., k.
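The update rule above can be written as a short generic routine. A minimal sketch: the sufficient statistics t<sub>j</sub> are passed as functions, and the instantiation below assumes a Poisson likelihood (k = 1, t₁(x) = x) with hypothetical hyperparameters and data.

```python
# Generic conjugate update for a regular k-parameter exponential family:
#   alpha* = alpha + n;  beta_j* = beta_j + sum_i t_j(x_i)
def conjugate_update(alpha, beta, t, xs):
    alpha_star = alpha + len(xs)
    beta_star = [b_j + sum(t_j(x) for x in xs)
                 for b_j, t_j in zip(beta, t)]
    return alpha_star, beta_star

# Assumed Poisson example: one sufficient statistic, t_1(x) = x
alpha_star, beta_star = conjugate_update(2, [1], [lambda x: x], [3, 1, 4, 2])
print(alpha_star, beta_star)   # 6 [11]
```

The data enter only through n and the sums Σᵢ t<sub>j</sub>(xᵢ), which restates the sufficiency result from the earlier card.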