Bayesian inference Flashcards Preview


Flashcards in Bayesian inference Deck (19)

What is Bayesian data analysis?

• Bayesian data analysis uses probability to represent uncertainty in all parts of a statistical model


Describe probability theory

• A random variable X is a variable that takes different values x (the observed values) in different realizations, each value with a defined probability
• (Conventionally, upper case letters denote random variables; the corresponding lower case letters denote their realizations.)


What are realisations in probability theory?

A realization, or observed value, of a random variable is the value that is actually observed (what actually happened).


What is independence in probability?

• Two random variables are independent if knowing the value of one tells you nothing about the value of the other
o One event can occur without affecting the probability of the other event occurring
o Two events are independent if the probability of one event happening is the same no matter the outcome of the other
▪ E.g. the chance of rolling a 1 on a die after flipping a head on a coin is still 1/6
• For independent events, the conditional probability equals the marginal probability: P(X = x|Y = y) = P(X = x)
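
The coin-and-die example can be checked by listing every outcome. This is a minimal sketch that assumes a fair coin and a fair six-sided die; the helper name `prob` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair, all equally likely
# (a fair coin and a fair die are assumed here).
outcomes = list(product(["H", "T"], [1, 2, 3, 4, 5, 6]))

def prob(event):
    """Probability of an event given as a true/false test on a (coin, die) outcome."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_one = prob(lambda o: o[1] == 1)                           # marginal P(die = 1)
p_head = prob(lambda o: o[0] == "H")                        # marginal P(coin = H)
p_one_and_head = prob(lambda o: o[0] == "H" and o[1] == 1)  # joint probability
p_one_given_head = p_one_and_head / p_head                  # conditional P(die = 1 | coin = H)

print(p_one, p_one_given_head)
```

Both printed values are the same fraction, which is what independence means here: conditioning on the coin does not change the probability for the die.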


What is marginal probability?

P(X = x)
The probability of one event happening, irrespective of the outcomes of all other events
You can think of marginal probability as being the probability totals in the ‘margins’ of the probability tables
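
The "margins" idea can be sketched as row and column totals of a joint table. The table below is hypothetical (the numbers are made up), and `marginals` is an illustrative helper, not part of any library.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint table P(X = x, Y = y) for two binary variables.
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

def marginals(joint):
    """Sum the joint table over the other variable to get P(X = x) and P(Y = y)."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p   # row total: sums over all values of Y
        p_y[y] += p   # column total: sums over all values of X
    return dict(p_x), dict(p_y)

p_x, p_y = marginals(joint)
print(p_x)
print(p_y)
```

Each marginal distribution sums to 1, just as the full joint table does.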


What is joint probability?

P(X = x ∧ Y = y)
The probability of two events occurring together
o The joint probability is the product of the marginal probabilities (i.e. multiply the marginal probabilities together) only if the 2 events are independent of each other
P(X = x ∧ Y = y) = P(X = x) × P(Y = y)


What is conditional probability?

P(X = x|Y = y) = P(X = x ∧ Y = y) / P(Y = y)
• The probability of an event (X = x), given that another (Y = y) has already occurred.
• If data are obtained from two (or more) random variables, the probabilities for one may depend on the value of the other(s)
• (in that case, the variables are NOT independent)
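
The definition P(X = x|Y = y) = P(X = x ∧ Y = y) / P(Y = y) can be applied directly to a joint table. This is a sketch with a hypothetical weather/traffic table; the labels, numbers and the function name `conditional` are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical joint table P(X = weather, Y = traffic); the values are made up.
joint = {
    ("rain", "jam"): Fraction(3, 10), ("rain", "clear"): Fraction(1, 10),
    ("dry", "jam"): Fraction(2, 10), ("dry", "clear"): Fraction(4, 10),
}

def conditional(joint, x, y):
    """P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)  # marginal P(Y = y)
    if p_y == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return joint.get((x, y), Fraction(0)) / p_y

p_rain = sum(p for (xx, _), p in joint.items() if xx == "rain")  # marginal P(X = rain)
p_rain_given_jam = conditional(joint, "rain", "jam")

print(p_rain, p_rain_given_jam)
```

In this table the conditional and marginal probabilities of rain differ, so the two variables are not independent.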


What are discrete and continuous probabilities?

o Discrete: probabilities are combined by summing
▪ Split the range of values into chunks, e.g. there's a .01 probability of an adult being under 120cm, a .03 probability of being 200-210cm, etc., so that the probabilities of all the height chunks sum to 1
o Continuous: probabilities are obtained by integration
▪ Keep splitting the chunks into smaller and smaller (more precise) intervals and you eventually get a smooth curve instead of chunks; probabilities are then areas under that curve
• The total probability always equals 1 (100%), no matter how it is shared out or distributed
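
A rough numerical sketch of the two cases: a discrete distribution is summed, while a continuous one is approximated by ever narrower chunks. The fair die and the normal height distribution (mean 170cm, sd 10cm) are assumptions chosen only for illustration.

```python
import math
from fractions import Fraction

# Discrete case: a fair six-sided die (assumed example); probabilities are summed.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
discrete_total = sum(die_pmf.values())

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def binned_total(lo, hi, n_bins, pdf):
    """Add up pdf(midpoint) * width over equal-width chunks (a midpoint Riemann sum)."""
    width = (hi - lo) / n_bins
    return sum(pdf(lo + (i + 0.5) * width) * width for i in range(n_bins))

def height_pdf(h):
    # Continuous case: heights in cm, assumed normal with mean 170 and sd 10 (made-up values).
    return normal_pdf(h, 170.0, 10.0)

# Use coarser and finer chunks over 120cm to 220cm.
totals = {n: binned_total(120.0, 220.0, n, height_pdf) for n in (8, 80, 800)}

print(discrete_total)
for n_bins, total in totals.items():
    print(n_bins, total)
```

However finely the range is divided, the chunk probabilities keep adding up to approximately 1; in the limit the sum becomes an integral.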


What is probability density?

• For continuous-valued random variables, denoted by x ∈ R, the probability of any single exact value is zero, so instead of specifying probabilities for individual values the distribution is described by the cumulative distribution function F(x) = P(X ≤ x)
o Or by its derivative, the probability density function f(x) = dF(x)/dx
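
The link between the two functions can be checked numerically. This sketch assumes a standard normal distribution purely as an example, and compares a finite-difference derivative of its cumulative distribution function with the known density formula.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal: P(X <= x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Probability density function of the standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, numerical_derivative(normal_cdf, x), normal_pdf(x))

# Probabilities come from differences of the CDF, i.e. areas under the density.
p_between = normal_cdf(1.0) - normal_cdf(-1.0)   # P(-1 < X <= 1)
print(p_between)
```

The two columns agree to many decimal places, and the density itself is not a probability: only areas under it are.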


Describe Bayesian probability theory

• Bayesian probability theory:
o Probability is a quantification of the degree of confidence we have that something is the case, based on our current knowledge, including prior knowledge and the new data.
• Bayesian methods enable statements to be made about the partial knowledge available (based on data) concerning some situation or ‘state of nature’ (observable or as yet unobserved) in a systematic way, using probability as a measure of uncertainty
• The guiding principle is that the state of knowledge about anything unknown is described by a probability distribution


Principles of Bayes theorem

• Bayes' theorem gives the posterior probability of a model given the data
• If you're uncertain about something, the uncertainty is described by a probability distribution called your prior distribution
• You then obtain relevant data; the new data change your uncertainty, which is then described by a new probability distribution called your posterior distribution
o Most of Bayesian inference is about how to go from prior to posterior
o The way Bayesians go from prior to posterior is to use the laws of conditional probability
o This rule is called Bayes' rule or Bayes' theorem


Describe the Bayes theorem equation

P(M|D) = P(D|M) × P(M) / P(D)

M: model, D: data
P(M|D): the posterior probability of the model given the data
P(D|M): the probability of the data given the model (the likelihood)
P(M): the prior probability of the model, before seeing the data
P(D): the marginal probability of the data, summed over all models


How to work out P(D|M)

▪ P(D|M=1) = P(y1 = 0 ∧ y2 = 0|M = 1) = P(y1 = 0|M = 1) × P(y2 = 0|M = 1) = 0.5 × 0.5 = 0.25 (the two observations are treated as independent given the model)


How to work out P(D)

o The probability of the data taking into account the evidence for all models (M=1 and M=0)
o P(D) = P(y|M=1) P(M=1) + P(y|M = 0) P(M=0)
o P(D) = (.25 × .5) + (1 × .5)
o P(D) = .125 + .5
o P(D) = .625


Final step of Bayes' theorem: how to work out P(M|D)

o P(M|D) = P(D|M) × P(M) / P(D)
o P(M=1|D) = P(y|M=1) × P(M=1) / P(D)
o P(M=1|D) = (.25 × .5) / .625
o P(M=1|D) = .125 / .625
o P(M=1|D) = .2
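
The worked example can be reproduced in a few lines. This sketch uses the numbers from the cards (priors of .5 for each model, P(D|M=1) = .25 and P(D|M=0) = 1); the function name `posterior` is an illustrative choice.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Apply Bayes' theorem to a set of models.

    prior[m] is P(M = m), likelihood[m] is P(D | M = m).
    Returns the posterior P(M = m | D) for every model, plus P(D).
    """
    evidence = sum(likelihood[m] * prior[m] for m in prior)            # P(D)
    post = {m: likelihood[m] * prior[m] / evidence for m in prior}     # P(M | D)
    return post, evidence

prior = {1: Fraction(1, 2), 0: Fraction(1, 2)}
likelihood = {
    1: Fraction(1, 2) * Fraction(1, 2),  # P(D | M = 1) = 0.5 * 0.5 = 0.25
    0: Fraction(1),                      # P(D | M = 0) = 1
}

post, p_d = posterior(prior, likelihood)
print(float(p_d))       # P(D)
print(float(post[1]))   # P(M = 1 | D)
print(float(post[0]))   # P(M = 0 | D)
```

The posterior probabilities of the two models sum to 1, so the remaining .8 goes to M = 0.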


Critique of Null Hypothesis Significance Testing

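• NHST borrows the form of this argument: if H0 is correct, then this datum (D) cannot occur. D has occurred. Therefore, H0 is false
o That is, because D has occurred, H0 must be false; otherwise the first premise would be violated. In practice "cannot occur" is replaced by "is very unlikely to occur", and the argument is then no longer logically valid
• P(D|H0) ≠ P(H0|D)
o What we really want to know is the probability that the hypothesis is true (i.e. the probability of the model) given that the data have occurred, P(H0|D)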
• P(D|H0) is the likelihood function
• P(H0|D) is the posterior probability
• A primary motivation for Bayesian thinking is that it facilitates a common-sense interpretation of statistical conclusions
o For instance, a Bayesian (probability) interval for an unknown quantity of interest can be directly regarded as having a high probability of containing the unknown quantity
o A frequentist (confidence) interval may strictly be interpreted only in relation to a sequence of similar inferences that might be made in repeated practice


What is used for significance testing in frequentist and Bayesian approaches?

Frequentist--> p value (null hypothesis significance test)
Bayes--> Bayes factor


What is used for estimation with uncertainty in frequentist and Bayesian approaches?

Frequentist: Maximum likelihood estimate with confidence intervals
Bayesian: Posterior distribution with highest density interval


What is Bayes factor?

The aim of the Bayes factor is to quantify the support the data give to one model over another (e.g. null over alternative), not to judge which one is correct (which is what NHST aims to do)
K = P(D|M1) / P(D|M2)
The posterior odds are the Bayes factor times the prior odds: P(M1|D) / P(M2|D) = K × P(M1) / P(M2)
A value of K > 1 means that M1 is more strongly supported by the data under consideration than M2.
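
As a sketch, the Bayes factor can be computed as the ratio of the probabilities of the data under the two models; multiplying it by the prior odds gives the posterior odds, and with equal priors the two are the same number. The values reuse the deck's worked example (P(D|M=1) = .25, P(D|M=0) = 1, priors of .5), with M1 standing for M = 1 and M2 for M = 0; the function names are illustrative.

```python
from fractions import Fraction

def bayes_factor(lik_m1, lik_m2):
    """K = P(D | M1) / P(D | M2): how much better M1 predicts the data than M2."""
    return lik_m1 / lik_m2

def posterior_odds(k, prior_m1, prior_m2):
    """Posterior odds P(M1 | D) / P(M2 | D) = K * prior odds."""
    return k * (prior_m1 / prior_m2)

k = bayes_factor(Fraction(1, 4), Fraction(1))             # likelihoods from the worked example
odds = posterior_odds(k, Fraction(1, 2), Fraction(1, 2))  # equal priors, so odds == k
p_m1_given_d = odds / (1 + odds)                          # convert odds back to a probability

print(k, odds, p_m1_given_d)
```

Here K is below 1, so these data support M2 (M = 0) more strongly than M1 (M = 1), in line with the posterior of .2 for M = 1.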