Bayesian inference Flashcards


Flashcards in Bayesian inference Deck (19)
1

What is Bayesian data analysis?

• Bayesian data analysis uses probability to represent uncertainty in all parts of a statistical model

2

Describe probability theory

• A random variable X is a variable that takes different values, x (its observed values), in different realizations, each value occurring with a defined probability
• (Conventionally, upper-case letters denote random variables; the corresponding lower-case letters denote their realizations.)

3

What are realisations in probability theory?

A realization, or observed value, of a random variable is the value that is actually observed (what actually happened).
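A minimal sketch of the distinction in code (assuming NumPy is available; the die example is illustrative, not from the deck): the die outcome is the random variable, and each draw is one realization.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# X = outcome of a fair six-sided die: a random variable with
# P(X = x) = 1/6 for each x in {1, ..., 6}.
faces = np.arange(1, 7)

# Each draw is one realization (observed value) of X.
x = rng.choice(faces)                     # a single realization
sample = rng.choice(faces, size=100_000)  # many realizations

# The empirical frequencies approach the defined probabilities.
for face in faces:
    print(face, np.mean(sample == face))  # each close to 1/6 ≈ 0.167
```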

4

What is independence in probability?

• Two random variables are independent if the outcome of one does not affect the probabilities of the other
o One event can occur without affecting the probability of the other event occurring
o Two events are independent if the probability of one event happening is the same no matter the outcome of the other
▪ E.g. the chance of rolling a 1 on a die is still 1/6 after flipping heads on a coin
For independent events, the conditional probability is the same as the marginal probability
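A quick simulation of this (a sketch assuming NumPy; the seed and sample size are arbitrary): the probability of rolling a 1 is the same whether or not we condition on the coin showing heads.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 100_000

coin = rng.choice(["H", "T"], size=n)  # coin flips
die = rng.integers(1, 7, size=n)       # independent die rolls

# Marginal probability of rolling a 1.
p_one = np.mean(die == 1)

# Conditional probability of rolling a 1 given the coin showed heads.
p_one_given_heads = np.mean(die[coin == "H"] == 1)

print(p_one, p_one_given_heads)  # both close to 1/6 ≈ 0.167
```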

5

What is marginal probability?

P(X = x)
The probability of one event happening, irrespective of the outcomes of all other events
You can think of marginal probabilities as the probability totals in the ‘margins’ of a joint probability table
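For example (a sketch with made-up numbers, assuming NumPy): the marginals are simply the row and column totals of the joint table.

```python
import numpy as np

# Hypothetical joint probability table P(X = x ∧ Y = y):
# rows are values of X, columns are values of Y; all entries sum to 1.
joint = np.array([
    [0.10, 0.20],
    [0.30, 0.40],
])

# Marginal probabilities are the totals in the table's 'margins'.
p_x = joint.sum(axis=1)  # P(X = x): [0.30, 0.70]
p_y = joint.sum(axis=0)  # P(Y = y): [0.40, 0.60]
print(p_x, p_y)
```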

6

What is joint probability?

P(X = x ∧ Y = y)
The probability of two events occurring together
o The joint probability is the product of the marginal probabilities (i.e. multiply the marginal probabilities together), but only if the two events are independent of each other
P(X = x ∧ Y = y) = P(X = x) × P(Y = y)
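For instance (an illustrative calculation, not from the deck): a fair coin showing heads and a fair die showing 1 are independent, so their joint probability is the product of the marginals.

```python
# Joint probability of two independent events:
# P(X = x ∧ Y = y) = P(X = x) * P(Y = y)
p_heads = 0.5   # P(coin = heads)
p_one = 1 / 6   # P(die = 1)

p_joint = p_heads * p_one
print(p_joint)  # 1/12 ≈ 0.083
```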

7

What is conditional probability?

P(X = x|Y = y) = P(X = x ∧ Y = y) / P(Y = y)
• The probability of one event (X = x), given that another (Y = y) has already occurred
• If data are obtained from two (or more) random variables, the probabilities for one may depend on the value of the other(s)
• (in this case, the events are NOT independent)
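Applying the formula to the hypothetical joint table used above (a sketch assuming NumPy): dividing each column of the joint table by the corresponding marginal P(Y = y) gives P(X = x | Y = y).

```python
import numpy as np

# Hypothetical joint probability table P(X = x ∧ Y = y) (illustrative numbers).
joint = np.array([
    [0.10, 0.20],
    [0.30, 0.40],
])

p_y = joint.sum(axis=0)    # marginal P(Y = y)

# Conditional probability: P(X = x | Y = y) = P(X = x ∧ Y = y) / P(Y = y)
p_x_given_y = joint / p_y  # divide each column by its marginal
print(p_x_given_y)         # each column sums to 1
```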

8

What are discrete and continuous probabilities?

o Discrete: summing
▪ Split the variable into chunks and assign each chunk a probability, e.g. a .01 probability of an adult being taller than 210 cm, a .03 probability of being 200-210 cm, etc., so that the probabilities of all the height chunks sum to 1
o Continuous: integration
▪ Keep splitting the chunks into smaller and more precise intervals and you eventually get a smooth curve (a density) instead of chunks, which you can then use to predict events; probabilities are obtained by integrating the curve
• The total probability always equals 100% (1), no matter how finely it is divided up
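A small numerical check of both cases (a sketch assuming NumPy and SciPy; the discrete probabilities and the standard normal density are illustrative choices):

```python
import numpy as np
from scipy.stats import norm

# Discrete: the probabilities of the chunks sum to 1.
pmf = np.array([0.01, 0.04, 0.15, 0.30, 0.30, 0.15, 0.04, 0.01])
print(pmf.sum())  # 1.0

# Continuous: the density is integrated instead of summed.
grid = np.linspace(-6, 6, 100_001)
density = norm.pdf(grid)     # standard normal density
dx = grid[1] - grid[0]
print(np.sum(density) * dx)  # ≈ 1.0 (numerical integration)
```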

9

What is probability density?

• For continuous-valued random variables, denoted by x ∈ R, the distribution is described not by specifying probabilities of individual values but by the cumulative distribution function
o Or by its derivative, the probability density function
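A numerical illustration of that relationship (a sketch assuming SciPy; the standard normal and the point x = 0.7 are arbitrary choices): the density at a point matches the numerical derivative of the CDF there.

```python
from scipy.stats import norm

x = 0.7
h = 1e-5

# pdf(x) is the derivative of the CDF:
# pdf(x) ≈ (CDF(x + h) - CDF(x - h)) / (2h)
numerical_derivative = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
print(numerical_derivative, norm.pdf(x))  # the two values agree
```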

10

Describe Bayesian probability theory

• Bayesian probability theory:
o Probability is a quantification of the degree of confidence we have for something to be the case, based on our current knowledge, including prior knowledge and the new data
• Bayesian methods enable statements to be made about the partial knowledge available (based on data) concerning some situation or ‘state of nature’ (observable or as yet unobserved) in a systematic way, using probability as a measure of uncertainty
• The guiding principle is that the state of knowledge about anything unknown is described by a probability distribution

11

Principles of Bayes theorem

• Bayes’ theorem gives the posterior probability of a model given the data
• If you’re uncertain about something, the uncertainty is described by a probability distribution called your prior distribution
• You then obtain relevant data; the new data change your uncertainty, which is then described by a new probability distribution called your posterior distribution
o Most of Bayesian inference is about how to go from prior to posterior
o The way Bayesians go from prior to posterior is to use the laws of conditional probability
o This rule can be called Bayes’ rule or Bayes’ theorem

12

Describe the Bayes theorem equation

P(M|D) = P(D|M) × P(M) / P(D)

M: model, D: data
P(M|D): the posterior probability of the model given the data
P(D|M): the probability of the data given the model (the likelihood)
P(M): the prior (marginal) probability of the model
P(D): the probability of the data, given all the evidence from all models
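The same equation as a small function (a sketch; the function name and dictionary layout are my own, not from the deck), written for a discrete set of candidate models:

```python
def posterior(likelihoods, priors):
    """Bayes' theorem for a discrete set of models.

    likelihoods[m] = P(D | M = m), priors[m] = P(M = m).
    Returns P(M = m | D) for every model m.
    """
    # P(D): the probability of the data, summed over all models.
    p_data = sum(likelihoods[m] * priors[m] for m in priors)
    # P(M | D) = P(D | M) * P(M) / P(D)
    return {m: likelihoods[m] * priors[m] / p_data for m in priors}

# Example with the deck's numbers (see cards 13-15 below):
print(posterior({1: 0.25, 0: 1.0}, {1: 0.5, 0: 0.5}))  # {1: 0.2, 0: 0.8}
```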

13

How to work out P(D|M)

▪ Here the data are two observations, y1 = 0 and y2 = 0, each with probability 0.5 under model M = 1; because the observations are independent given the model, their probabilities multiply:
▪ P(D|M=1) = P(y1 = 0 ∧ y2 = 0 | M = 1) = P(y1 = 0 | M = 1) × P(y2 = 0 | M = 1) = 0.5 × 0.5 = 0.25

14

How to work out P(D)

o The probability of the data, taking into account the evidence for all models (M = 1 and M = 0)
o P(D) = P(D|M=1) P(M=1) + P(D|M=0) P(M=0)
o P(D) = (.25 × .5) + (1 × .5)
o P(D) = .125 + .5
o P(D) = .625

15

Final step of Bayes theorem: how to work out P(M|D)

o P(M|D) = P(D|M) × P(M) / P(D)
o P(M=1|D) = P(D|M=1) × P(M=1) / P(D)
o P(M=1|D) = (.25 × .5) / .625
o P(M=1|D) = .125 / .625
o P(M=1|D) = .2
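Putting cards 13-15 together, a minimal end-to-end sketch using the numbers from the cards above (P(D|M=1) = .25, P(D|M=0) = 1, and prior probability .5 for each model):

```python
# Likelihoods of the data under each model (from cards 13-14).
p_d_given_m1 = 0.25   # P(D | M = 1)
p_d_given_m0 = 1.0    # P(D | M = 0)

# Prior probability of each model.
p_m1 = 0.5
p_m0 = 0.5

# P(D): the probability of the data, summed over all models (card 14).
p_d = p_d_given_m1 * p_m1 + p_d_given_m0 * p_m0   # 0.125 + 0.5 = 0.625

# Posterior probability of M = 1 given the data (card 15).
p_m1_given_d = p_d_given_m1 * p_m1 / p_d          # 0.125 / 0.625 = 0.2
print(p_d, p_m1_given_d)
```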

16

Critique of Null Hypothesis Significance Testing

• If H0 is correct, then this datum (D) cannot occur; D has occurred; therefore, H0 is false
o i.e. the claim is that because D has occurred, H0 must be false, otherwise the rule would be violated
• P(D|H0) ≠ P(H0|D)
o What we really want to know is the probability that the hypothesis is false (i.e. the probability of the model) given that the data have occurred, P(H0|D)
• P(D|H0) is the likelihood function
• P(H0|D) is the posterior probability
• A primary motivation for Bayesian thinking is that it facilitates a common-sense interpretation of statistical conclusions
o For instance, a Bayesian (probability) interval for an unknown quantity of interest can be directly regarded as having a high probability of containing the unknown quantity
o A frequentist (confidence) interval may strictly be interpreted only in relation to a sequence of similar inferences that might be made in repeated practice
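The inequality P(D|H0) ≠ P(H0|D) can be seen in the deck's own worked example (cards 13-15), where the likelihood of the data under a model and the posterior probability of that model are different numbers:

```python
p_d_given_m = 0.25   # P(D | M = 1): the likelihood (card 13)
p_m_given_d = 0.20   # P(M = 1 | D): the posterior (card 15)

# The two quantities answer different questions and need not be equal.
print(p_d_given_m == p_m_given_d)  # False
```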

17

What is used for significance testing in the frequentist and Bayesian approaches?

Frequentist → p value (null hypothesis significance test)
Bayesian → Bayes factor

18

What is used for estimation with uncertainty in the frequentist and Bayesian approaches?

Frequentist: maximum likelihood estimate with confidence intervals
Bayesian: posterior distribution with highest density interval

19

What is the Bayes factor?

The aim of the Bayes factor is to quantify the support for one model over another (e.g. the null over the alternative), not to judge which one is correct (which is what NHST aims to do)
K = P(D|M1) / P(D|M2)
Multiplying K by the prior odds P(M1)/P(M2) gives the posterior odds of M1 against M2
A value of K > 1 means that M1 is more strongly supported by the data under consideration than M2
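Applying this to the worked example above (a sketch; here M1 is the model labelled M = 1 and M2 the model labelled M = 0):

```python
# Likelihoods of the data from card 14.
p_d_given_m1 = 0.25   # P(D | M = 1)
p_d_given_m2 = 1.0    # P(D | M = 0)

# Bayes factor for M1 against M2.
K = p_d_given_m1 / p_d_given_m2
print(K)   # 0.25 < 1: these data support M2 (M = 0) over M1 (M = 1)
```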