Bayesian Flashcards

1
Q

Write down Bayes' theorem (simple case)

A

$$
p(B|A) = \frac{p(A|B)p(B)}{p(A)}
$$

where

$$
p(A) = \sum_{B_i \in S_B}p(A|B_i)p(B_i)
$$
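
A minimal sketch of this in Python, applied to the pregnancy-test example discussed in the next card. The 15% prior comes from that card; the test's sensitivity and false-positive rate are not given in the source, so the values below are hypothetical:

```python
def bayes(p_A_given_B, p_B, p_A_given_not_B):
    """Posterior p(B|A) via Bayes' theorem, with p(A) expanded
    by the law of total probability over B and not-B."""
    p_A = p_A_given_B * p_B + p_A_given_not_B * (1 - p_B)
    return p_A_given_B * p_B / p_A

# Hypothetical numbers: prior p(preg) = 0.15, assumed test
# sensitivity 0.90 and false-positive rate 0.05.
print(bayes(0.90, 0.15, 0.05))  # ~0.76 = p(preg | positive test)
```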

2
Q

Describe the different components in the Bayesian formula

A

$p(B)$ is the marginal probability of $B$ and is considered "prior" information since it exists prior to our situation. In the case of a pregnancy test: "The prior information we need, $p(B) \equiv p(preg)$, is the marginal probability of being pregnant, knowing nothing beyond the fact that the woman has had a single sexual encounter. This information is considered prior information, because it is relevant information that exists prior to the test. We may know from previous research that, without any additional information (e.g., concerning the date of the last menstrual cycle), the probability of conception for any single sexual encounter is approximately 15%."

$p(B|A)$ is called the “posterior” probability. This is because it is the updated belief after observing the data.

$p(A|B)p(B)$ is called the "kernel".

3
Q

How do we go from Bayes' theorem for point probabilities to a version for probability distributions?

A

Bayesian statistics typically involves using probability distributions rather than point probabilities for the quantities in the theorem. We thus replace the point prior with a whole prior distribution, which captures our uncertainty about that quantity. The inclusion of a prior probability distribution means that the posterior is no longer a single quantity either; instead, the posterior becomes a probability distribution as well.

4
Q

Can you put probability distributions on parameters in the Bayesian perspective?

A

From the Bayesian perspective, any quantity whose true value is uncertain, including model parameters, can be represented with probability distributions. From the classical perspective, however, it is unacceptable to place probability distributions on parameters, because parameters are assumed to be fixed quantities.

5
Q

Write Bayes' formula and describe the parts in the case where we use probability distributions.

A

$$
f(\theta|data) = \frac{f(data|\theta)f(\theta)}{f(data)}
$$

where $f(\theta|data)$ is the posterior distribution for the parameter $\theta$; $f(data|\theta)$ is the sampling density for the data, which is proportional to the likelihood function, differing only by the constant that makes it a proper density function; $f(\theta)$ is the prior distribution for the parameter; and $f(data)$ is the marginal probability of the data.

Because this denominator simply scales the posterior density to make it a proper density, and because the sampling density is proportional to the likelihood function, Bayes’ Theorem for probability distributions is often stated as:

$$
Posterior ∝ Likelihood × Prior
$$

This statement is "off" from the exact posterior by the denominator $f(data)$ above, in addition to whatever normalizing constant is needed to equate the likelihood function and the sampling density $f(data|\theta)$.
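
A minimal sketch of this proportionality on a grid of $\theta$ values, normalizing numerically at the end. The Beta(2, 2) prior, the grid, and the data (7 successes in 10 trials) are all illustrative assumptions:

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)      # grid over the parameter space
prior = stats.beta.pdf(theta, 2, 2)         # assumed Beta(2, 2) prior
likelihood = stats.binom.pmf(7, 10, theta)  # assumed data: 7 successes in 10 trials

unnormalized = likelihood * prior              # Posterior ∝ Likelihood × Prior
posterior = unnormalized / unnormalized.sum()  # normalizing plays the role of f(data)
print((theta * posterior).sum())               # posterior mean, ~0.64 here
```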

6
Q

How do we specify the prior distribution?

A

Specification of an appropriate prior distribution for a parameter is the aspect of a Bayesian analysis that most substantially differentiates it from a frequentist analysis. Using the pregnancy example from above, we could replace the point estimate of .15 with a probability distribution representing the plausible values of the prior probability of pregnancy and their relative merit. For example, we may give considerable prior weight to the value .15, with diminishing weight to values far from .15. Similarly, in the polling data example, we can use a distribution to represent our prior knowledge and uncertainty regarding $\theta$.

We want a distribution that ensures $\theta \in [0,1]$.
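
As a sketch, the 15% figure could be encoded with a beta distribution (introduced in the next card) whose mean is 0.15; the particular choice Beta(3, 17) here is an illustrative assumption:

```python
from scipy import stats

# Beta(3, 17) has mean 3 / (3 + 17) = 0.15, concentrating prior
# weight near .15 with diminishing weight far from it.
prior = stats.beta(3, 17)
print(prior.mean())          # 0.15
print(prior.interval(0.95))  # central 95% of the prior mass
```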

7
Q

Describe the beta distribution and how we should set the parameters.

A

An appropriate prior distribution for an unknown such as $\theta$
is a beta distribution. The pdf of the beta distribution is:

$$
\frac{\Gamma (\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}
$$

With higher $\alpha$ and $\beta$ we get less variance in our prior distribution.

  • If we have little or no prior information, or we want to put very little stock in the information we have, we can choose values for $\alpha$ and $\beta$ that reduce the distribution to a uniform distribution, i.e., setting $\alpha = \beta = 1$. This type of prior is called "noninformative."
  • If we have considerable prior information and we want it to weigh heavily relative to the current data, we can use large values of $\alpha$ and $\beta$.
  • If we have considerable prior information but we do not wish for it to weigh heavily in the posterior distribution, we can choose moderate values of the parameters that yield a mean consistent with the previous research but a broad variance around that mean.

With little data, the choice of prior (the values of $\alpha$ and $\beta$) matters a great deal; with a lot of data, it matters much less.
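
A small sketch comparing these regimes (the specific parameter values are illustrative assumptions):

```python
from scipy import stats

# Same prior mean (0.5), increasingly informative as alpha and beta grow.
for a, b in [(1, 1), (5, 5), (50, 50)]:
    d = stats.beta(a, b)
    print(f"Beta({a},{b}): mean={d.mean():.2f}, sd={d.std():.3f}")
# Beta(1,1) is the uniform ("noninformative") case; Beta(50,50)
# would weigh heavily against the current data.
```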

8
Q

What is an improper prior?

A

A probability distribution for $\theta$, $p(\theta)$, is called improper if its integral over the parameter space does not converge. The standard example is a flat (uniform) prior over an unbounded range such as $\theta \in (-\infty, \infty)$.

This is essentially a rectangle with an infinitely long base, so its area diverges: the density is constant, but the total mass is infinite, so it cannot be normalized to integrate to 1. (By contrast, the uniform $\alpha = \beta = 1$ beta prior on $[0,1]$ is proper, because its range is bounded.)

Reasons to use improper priors are:

  • They can be thought of as an approximation to a proper prior representing very vague beliefs.
  • A uniform prior is also a quick and convenient way to start doing the analysis
    without worrying about specifying your beliefs.
9
Q

What is the posterior?

A

The posterior density represents your beliefs about $\theta$ given your prior beliefs and the beliefs embodied in the likelihood. It is the culmination of the empirical analysis. Its equivalent in frequentist econometrics would be a table with estimates of $\theta$ and their standard errors.

10
Q

How do we think about the distribution of the posterior?

A

The posterior distribution satisfies

$$
Posterior ∝ Likelihood × Prior
$$

We study this product and recognize which distribution it is. It might, e.g., turn out that it resembles a beta distribution; then the posterior is a beta distribution with, say, $\alpha = a+b$ and $\beta = c+d+e$ read off from the exponents. When the prior and likelihood are of such a form that the posterior distribution follows the same form as the prior, the prior and likelihood are said to be conjugate.
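
A minimal sketch of such a conjugate update for the beta-binomial case, a standard result: a Beta($\alpha$, $\beta$) prior combined with $k$ successes in $n$ binomial trials yields a Beta($\alpha + k$, $\beta + n - k$) posterior. The specific numbers are illustrative:

```python
from scipy import stats

alpha, beta = 2, 2  # prior Beta(2, 2)
k, n = 7, 10        # assumed data: 7 successes in 10 trials

# Conjugacy: beta prior × binomial likelihood is again beta-shaped,
# so the posterior parameters are just updated counts.
posterior = stats.beta(alpha + k, beta + n - k)
print(posterior.mean())  # 9 / 14, about 0.643
```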

11
Q

How do we report the posterior?

A

In some cases (such as with conjugate priors), we know the posterior analytically. Sometimes we don’t know the posterior analytically, but we can draw samples
from it (using Monte Carlo simulations or importance sampling).

In Bayesian analysis, we can report the mean of the posterior together with an interval in which, with say 95% probability, we believe the true parameter lies.
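
A sketch of reporting from draws, assuming we already have posterior samples; here they are generated from the illustrative Beta(9, 5) posterior above, standing in for draws obtained by Monte Carlo or importance sampling:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
draws = stats.beta(9, 5).rvs(100_000, random_state=rng)  # stand-in posterior draws

print(draws.mean())                       # posterior mean, ~0.64
print(np.percentile(draws, [2.5, 97.5]))  # equal-tailed 95% interval
```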

12
Q

What is a credible interval?

A

With the posterior density determined, we can now summarize our updated knowledge about $\theta$. If we are willing to assume that this beta distribution is approximately normal, then we can construct a 95% interval based on a normal approximation. This interval is called a "credible interval".

If we want to know the probability that $\theta$ is larger than $.5$, we integrate this beta density from .5 to 1.
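
A sketch with the illustrative Beta(9, 5) posterior; both quantities can also be read off the beta distribution exactly, without the normal approximation:

```python
from scipy import stats

posterior = stats.beta(9, 5)     # illustrative posterior

print(posterior.interval(0.95))  # equal-tailed 95% credible interval
print(posterior.sf(0.5))         # P(theta > 0.5): integral of the density from .5 to 1
```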

13
Q

What is the likelihood in Bayes' formula?

A

The likelihood $p(data|θ)$ is simply the probability of the data conditional on the parameter θ.

If the data follow a binomial distribution, with $k$ successes in $n$ trials, then the likelihood function is

$$
p(data|\theta) = \binom{n}{k}\theta^{k}(1-\theta)^{n-k}
$$
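
As a sketch, the likelihood is this pmf viewed as a function of $\theta$ with the data held fixed (the counts here are illustrative assumptions):

```python
from scipy import stats

k, n = 7, 10                                    # assumed data
for theta in (0.3, 0.5, 0.7):                   # candidate parameter values
    print(theta, stats.binom.pmf(k, n, theta))  # likelihood p(data | theta)
# The likelihood peaks near theta = k / n = 0.7.
```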

14
Q

What is the Bayesian algorithm?

A

We can formulate the Bayesian method as an algorithm (a code sketch follows the list)

  1. Formulate an economic model as a collection of probability distributions indexed by a set of parameters $θ$
  2. State your beliefs about $θ$ using a prior probability distribution, $p(θ)$
  3. Collect the data
  4. Use Bayes’ theorem to calculate your new (posterior) beliefs about $θ$
  5. Criticize the model
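
A minimal end-to-end sketch of steps 1-4 for the beta-binomial model used in earlier cards; the model, prior, and data are illustrative assumptions, and step 5 would involve checking the model against the data:

```python
from scipy import stats

# 1. Model: y ~ Binomial(n, theta), indexed by the parameter theta.
# 2. Prior beliefs about theta: Beta(2, 2).
alpha, beta = 2, 2
# 3. Data: assume 7 successes in 10 trials were collected.
k, n = 7, 10
# 4. Bayes' theorem (here, a conjugate update) gives the posterior.
posterior = stats.beta(alpha + k, beta + n - k)
print(posterior.mean(), posterior.interval(0.95))
```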
15
Q

When and why do we sample from the posterior to obtain moments such as the mean and variance?

A

Sometimes an analytical expression for the posterior is not available. In these cases, we can use computational methods to obtain moments of the posterior, or a set of draws from it, e.g., importance sampling or Monte Carlo integration methods.

If we sample from the posterior, then with enough draws we can approximate it to arbitrary precision.

16
Q

What is importance sampling?

A

Objective: obtain moments such as the mean or the variance from the posterior distribution.

In principle we could do this by sampling from the posterior and calculating sample averages. However, we often cannot sample from the posterior directly, since we do not know the marginal distribution of $y$, i.e., $p(y)$ in the denominator. Importance sampling solves this: we introduce a proposal distribution $g(\theta)$ that we know and can sample from, and reweight the draws so that averages under $g(\theta)$ approximate averages under the posterior.

After a bit of algebra we end up with the self-normalized importance sampling estimator: with draws $\theta_i \sim g(\theta)$ and weights $w(\theta_i) = p(y|\theta_i)p(\theta_i)/g(\theta_i)$,

$$
E[h(\theta)|y] \approx \frac{\sum_i h(\theta_i)w(\theta_i)}{\sum_i w(\theta_i)}
$$

in which the unknown $p(y)$ cancels.
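
A minimal sketch for the illustrative beta-binomial posterior, using a uniform proposal $g(\theta)$; the proposal and the data are assumptions, and note that only the unnormalized posterior (likelihood × prior) is ever evaluated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n = 7, 10                        # assumed data

theta = rng.uniform(0, 1, 100_000)  # draws from the proposal g = Uniform(0, 1)
# Weights w = p(y|theta) * p(theta) / g(theta), with g(theta) = 1 on [0, 1].
w = stats.binom.pmf(k, n, theta) * stats.beta.pdf(theta, 2, 2)

# Self-normalized estimator of the posterior mean; p(y) cancels in the ratio.
print(np.sum(theta * w) / np.sum(w))  # ~0.643, the Beta(9, 5) posterior mean
```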

17
Q

Describe the Metropolis algorithm for sampling from the posterior.

A

The Metropolis algorithm constructs a Markov chain whose stationary distribution is the posterior, using only the unnormalized posterior $p(y|\theta)p(\theta)$:

  1. Start from an initial value $\theta^{(0)}$
  2. At step $t$, propose $\theta^*$ from a symmetric proposal distribution centered at the current value, e.g., $\theta^* = \theta^{(t)} + \varepsilon$ with $\varepsilon$ symmetric around zero
  3. Accept the proposal with probability $\min\left(1, \frac{p(y|\theta^*)p(\theta^*)}{p(y|\theta^{(t)})p(\theta^{(t)})}\right)$; the unknown normalizing constant $p(y)$ cancels in this ratio
  4. If accepted, set $\theta^{(t+1)} = \theta^*$; otherwise set $\theta^{(t+1)} = \theta^{(t)}$
  5. Repeat; after a burn-in period, the chain's draws behave as (correlated) samples from the posterior
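
A minimal sketch for the illustrative beta-binomial posterior; the proposal scale, chain length, and burn-in are arbitrary assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n = 7, 10  # assumed data

def log_unnorm_post(theta):
    """Log of likelihood × prior; the normalizing constant p(y) is not needed."""
    if not 0 < theta < 1:
        return -np.inf  # outside the prior's support
    return stats.binom.logpmf(k, n, theta) + stats.beta.logpdf(theta, 2, 2)

theta, chain = 0.5, []
for _ in range(20_000):
    prop = theta + rng.normal(0, 0.1)  # symmetric random-walk proposal
    # Accept with probability min(1, ratio); compare in logs for stability.
    if np.log(rng.uniform()) < log_unnorm_post(prop) - log_unnorm_post(theta):
        theta = prop
    chain.append(theta)

print(np.mean(chain[2_000:]))  # posterior mean after burn-in, ~0.64
```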

18
Q

Why use Bayesian methods rather than frequentist methods?

A

If we have few variables and many observations, Bayesian methods produce results similar to frequentist methods. But Bayesian methods perform better if

  1. there are many variables relative to observations
  2. prior information is available

Bayesian methods can be especially useful for prediction models.