Lecture 5 Flashcards

(86 cards)

1
Q

What is Bayesian inference?

A

the outcome of a learning process that is governed by relative predictive success

2
Q

Bayesian learning cycle

A
  1. prior knowledge
  2. prediction
  3. data
  4. prediction error
  5. knowledge update
3
Q

If your predictive updating factor is larger than 1…

A

then your belief in that particular value of θ increases.

If it is lower than 1, it decreases.

4
Q

So your belief increases…

A

if the data are predicted better when you condition on that particular value of θ, i.e. when P(data∣θ) exceeds P(data).

5
Q

Do you want surprises in statistics?

A

No: surprise means the model predicted the data poorly, so the model loses credibility.

6
Q

theta (θ, the symbol written as a 0 with a line through it)

A

It always refers to an unknown proportion, for example of an entire population.

7
Q

A more complex model…

A

does FIT the data better, but only in terms of fitting it; it is typically very poor at prediction.

8
Q

Suppose BF₀₁ = 3. What is the correct interpretation?

A
The observed data are 3 times more likely under H₀ than under H₁, i.e. “H₀ outpredicts H₁ by a factor of 3”. It is a statement about the evidence coming from the data.

NOT: after seeing the data, H₀ is now 3 times more likely than H₁. That reading is correct ONLY when H₀ and H₁ are equally likely a priori.

9
Q

A rough conceptualisation of the Bayesian learning cycle

A

  1. prior knowledge
  2. prediction
  3. data
  4. prediction error
  5. knowledge update
  6. and back to prior knowledge again

10
Q

What does Bayes’ rule compute in terms of beliefs?

A

It computes our posterior beliefs about unknown parameters (θ) by updating prior beliefs with the observed data.

11
Q

Write out Bayes’ Rule using θ and data.

A

P(θ∣data) = P(data∣θ) ⋅ P(θ) / P(data)

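As a concrete illustration (my own sketch, not from the slides): Bayes' rule evaluated on a grid of θ values, using the lecture's 5-out-of-10 cat-people data with a uniform prior. Grid size and variable names are arbitrary.

```python
import numpy as np
from scipy.stats import binom

# Grid approximation of Bayes' rule for the running example:
# 5 "cat people" observed out of 10.
k, n = 5, 10

theta = np.linspace(0.001, 0.999, 999)     # candidate proportions
prior = np.ones_like(theta) / theta.size   # uniform prior P(theta)
likelihood = binom.pmf(k, n, theta)        # P(data | theta) for every theta
evidence = np.sum(likelihood * prior)      # P(data): average predictive success

posterior = likelihood * prior / evidence  # Bayes' rule: P(theta | data)
print(theta[np.argmax(posterior)])         # posterior mode, close to 0.5
```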
12
Q

What does 𝑃 (𝜃∣data) represent?

A

The posterior belief about the parameter θ after seeing the data.

13
Q

What does P(θ) represent?

A

The prior belief about θ before seeing any data.

14
Q

What does P(data∣θ) represent?

A

The likelihood – the probability of observing the data if θ were true.

15
Q

What does P(data) represent?

A

The evidence – the overall probability of the data under all possible θ.

16
Q

What is the informal interpretation of Bayes’ Rule?

A

Posterior = Prior × Predictive updating factor

17
Q

What does the ratio P(θ∣data) / P(θ) represent?

A

The change in support for θ after seeing data (how much belief in θ increases/decreases).

18
Q

What does the ratio P(data∣θ) / P(data) represent?

A

The predictive success of θ – how well it explains the data relative to alternatives.

19
Q

What equality connects the two ratios above?

A

P(θ∣data) / P(θ) = P(data∣θ) / P(data)

20
Q

What does this equation tell us about belief updating?

A

The change in belief (support) is proportional to the predictive success of θ.

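A numeric check of this equality, reusing the same illustrative 5-out-of-10 setup (self-contained, my own sketch): the change in support equals the predictive updating factor for every value of θ at once.

```python
import numpy as np
from scipy.stats import binom

# Same illustrative grid setup as before: 5 cat people out of 10, uniform prior.
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta) / theta.size
likelihood = binom.pmf(5, 10, theta)
evidence = np.sum(likelihood * prior)
posterior = likelihood * prior / evidence

# Change in support equals predictive success for every single theta:
# P(theta | data) / P(theta) == P(data | theta) / P(data)
print(np.allclose(posterior / prior, likelihood / evidence))  # True
```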
21
Q

What is the interpretation of “the extent to which an account of the world out-predicts another”?

A

It is how much better θ explains the data than competing hypotheses, and this drives belief change.

22
Q

What does it mean if data are very surprising under a model?

A

It suggests the model is not a good fit — it loses credibility.

23
Q

What is meant by “surprise lost is credibility gained”?

A

A model that reduces surprise (predicts well) gains support in Bayesian inference.

24
Q

What is the parameter θ in the binary outcome example?

A

The unknown proportion (e.g., of cat people), which lies between 0 and 1.

25
What is an example of a binary outcome?
Yes/No, Correct/Incorrect, Cat person/Dog person.
26
What is a common prior for θ in this example?
A uniform distribution (every value of θ equally likely).
27
In the example, what does observing 5/10 cat people tell us?
It shifts our belief to a posterior distribution centered around θ = 0.5.
28
What does the arrow from prior to posterior in the graph represent?
The change in belief caused by the data — matching the left side of Bayes’ rule.
29
What is the predictive distribution?
The distribution of possible outcomes we expect before seeing data, based on θ.
30
What does the predictive distribution look like under a uniform prior?
A broad/flat distribution where all outcomes are roughly equally likely.
31
What does the predictive distribution look like if we assume θ = 0.5?
A concentrated distribution centered around equal numbers of cat/dog people.
32
What does the arrow between two predictive distributions represent?
The change in predictive power, matching the right side of Bayes’ rule.
33
What is the final takeaway of the lecture?
The change in belief (prior → posterior) equals the difference in predictive performance — both sides of Bayes’ rule describe the same updating process from different perspectives.
34
In the case of Bob, what does the prior distribution represent?
Our uncertainty about Bob’s IQ before seeing the test results.
35
What is the posterior distribution in the Bob example?
The updated belief about Bob's IQ after combining the prior with the three test results.
36
What does it mean that the posterior distribution is narrower than the prior?
We are more confident about Bob's true IQ — less uncertainty.
37
What does area A under the prior distribution represent?
The probability that Bob’s IQ is under 70 before seeing the test results.
38
What does area B under the posterior distribution represent?
The probability that Bob’s IQ is under 70 after seeing the test results.
39
What does Point C on the posterior distribution represent?
The most likely value (mode) of Bob’s IQ after the data are seen.
40
What does Point D represent?
The likelihood ratio: how much more likely the most probable IQ value is than an IQ of 70. (Here: 1.5 times more likely.)
41
What does Point E represent?
A 95% Bayesian credible interval, indicating the interval (64.99 to 81.66) where Bob’s true IQ likely lies.
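A minimal sketch of the machinery behind the Bob example, a conjugate normal-normal update. All numbers below are illustrative stand-ins, since the lecture's actual values are not reproduced here.

```python
import numpy as np
from scipy.stats import norm

# Illustrative numbers only; not the lecture's actual values.
prior_mean, prior_sd = 100, 15   # assumed prior for IQ
scores = [73, 67, 79]            # hypothetical three test results
error_sd = 10                    # assumed measurement error per test

# Conjugate normal-normal update with known measurement SD:
n = len(scores)
post_var = 1 / (1 / prior_sd**2 + n / error_sd**2)
post_mean = post_var * (prior_mean / prior_sd**2 + sum(scores) / error_sd**2)
post_sd = np.sqrt(post_var)

print(norm.cdf(70, prior_mean, prior_sd))       # "area A": P(IQ < 70) before the tests
print(norm.cdf(70, post_mean, post_sd))         # "area B": P(IQ < 70) after the tests
print(norm.interval(0.95, post_mean, post_sd))  # a 95% credible interval
```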
42
What does it mean for binary data to be "exchangeable"?
The order of data doesn’t matter — only the total counts do.
43
What is θ in the context of binomial data?
The latent chance (probability) of getting one of the outcomes (e.g., someone being a cat person).
44
What kind of prior distribution is used initially?
A uniform prior — every proportion θ is equally likely.
45
Why is a uniform prior not always a good idea?
It can lead to extreme or unrealistic posteriors after very little data (e.g., assuming everyone is a cat person after just 1 "c").
46
What happens to the posterior after entering 1 "cat" data point?
The most likely value of θ becomes 1 (i.e., 100% cat people), which is not reasonable.
47
What happens when switching to a prior with parameters 2 and 2 (a Beta(2, 2) prior)?
The prior becomes less extreme; now the most likely values of θ are somewhere in the middle.
48
Why is the Beta(2, 2) prior better in this case?
It avoids extreme conclusions and reflects that we don't expect all cat or all dog people.
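A small sketch of this prior-sensitivity point, assuming the two priors are Beta(1, 1) (uniform) and Beta(2, 2): after a single "cat" observation, the uniform prior puts the posterior mode at 1, while Beta(2, 2) keeps it moderate.

```python
# Beta-binomial updating: a Beta(a, b) prior with k successes in n trials
# yields a Beta(a + k, b + n - k) posterior, whose mode is
# (a + k - 1) / (a + b + n - 2).
def posterior_mode(a, b, k, n):
    return (a + k - 1) / (a + b + n - 2)

# One "cat person" observed (k = 1, n = 1):
print(posterior_mode(1, 1, 1, 1))  # uniform Beta(1, 1) prior: mode 1.0, extreme
print(posterior_mode(2, 2, 1, 1))  # Beta(2, 2) prior: mode ~0.67, moderate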
49
In the lecture example, why do they switch back to the uniform prior later?
To demonstrate how the data alone can gradually correct unrealistic priors over time.
50
What happens to the posterior as more data is entered?
It becomes sharper and more centered — reflecting stronger, more precise beliefs.
51
What are some ways to summarize the posterior distribution?
Mode, mean, median, and Bayesian credible intervals.
52
What does a Bayesian credible interval represent?
The range in which the true value of θ lies with a given probability (e.g., 95%).
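A sketch of these summaries with scipy, for an illustrative posterior of Beta(15, 7) (e.g., 14 cat people out of 20 under a uniform Beta(1, 1) prior).

```python
from scipy.stats import beta

# Illustrative posterior: Beta(1 + 14, 1 + 6) = Beta(15, 7).
post = beta(15, 7)
print(post.mean())              # posterior mean
print(post.median())            # posterior median
print((15 - 1) / (15 + 7 - 2))  # posterior mode, (a - 1) / (a + b - 2)
print(post.interval(0.95))      # central 95% credible interval
```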
53
What is the closing message of the parameter estimation section?
The order of the data doesn’t matter; only the final counts do.
54
What is the general effect of adding complexity to a model?
It often fits the observed data better but risks poor predictive performance, especially for new observations between the known values.
55
What is the guiding principle when choosing between simple and complex models?
Occam’s Razor (parsimony): Prefer simpler models unless the data clearly demands complexity.
56
Who carries the burden of proof in model comparison?
The advocate of the more complex model.
57
What is the benefit of a simple (parsimonious) model?
It concentrates predictive mass on fewer outcomes — if those outcomes occur, it gains much more credibility.
58
Why are complex models called “cowardly”?
Because they spread their predictions across many outcomes, hedging bets and avoiding strong commitments.
59
What does H₀ represent?
The null hypothesis — the simpler model.
60
What does H₁ represent?
The alternative hypothesis — the more complex model.
61
In Bayesian terms, which model is better supported by the data?
The one that predicts the data best — not necessarily the one with the best fit.
62
What is the Bayesian measure for comparing predictive performance?
The Bayes Factor.
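To see how the Bayes factor rewards the parsimonious model, here is a sketch on made-up binomial data, comparing a point hypothesis θ = 0.5 against an alternative with a uniform prior on θ (under a uniform prior every count k out of n is equally likely a priori, so P(data∣H₁) = 1/(n + 1)).

```python
from scipy.stats import binom

# H0 (simple): theta = 0.5 exactly.  H1 (complex): theta uniform on [0, 1].
k, n = 5, 10                      # illustrative data
p_data_h0 = binom.pmf(k, n, 0.5)  # concentrated prediction of the simple model
p_data_h1 = 1 / (n + 1)           # spread-out prediction of the complex model
print(p_data_h0 / p_data_h1)      # BF01 ~ 2.7: the data favor the simple model
```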
63
What are the 3 key components of Bayesian hypothesis testing?
Posterior odds, prior odds, and the predictive updating factor (Bayes Factor).
64
How do you calculate posterior odds?
Posterior odds = P(H₁∣data) / P(H₀∣data)
65
What is the Bayes Factor (predictive updating factor)?
BF₁₀ = P(data∣H₁) / P(data∣H₀), the ratio of how well each model predicts the observed data.
66
What role does the Bayes Factor play?
It governs how we shift from prior to posterior beliefs.
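A tiny worked example of that shift (numbers made up): the Bayes factor multiplies the prior odds to give the posterior odds.

```python
# Posterior odds = prior odds * Bayes factor (illustrative numbers).
prior_odds = 1.0       # H1 and H0 judged equally likely a priori
bf10 = 3.0             # the data are 3x more likely under H1 than under H0
posterior_odds = prior_odds * bf10
print(posterior_odds)  # 3.0, only because the prior odds happened to be 1
```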
67
What does a Bayes Factor of 1–3 mean?
Anecdotal evidence.
68
What does a Bayes Factor of 3–10 mean?
Moderate evidence.
69
What does a Bayes Factor of 10–30 mean?
Strong evidence.
70
What does a Bayes Factor of 30–100 mean?
Very strong evidence.
71
What does a Bayes Factor over 100 mean?
Extreme evidence.
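The five cut-offs above, collected into one small helper for self-testing; the function and its name are my own convenience, not part of the lecture.

```python
def evidence_label(bf):
    """Verbal label for a Bayes factor greater than 1 (cut-offs as in the cards above)."""
    if bf > 100:
        return "extreme"
    if bf > 30:
        return "very strong"
    if bf > 10:
        return "strong"
    if bf > 3:
        return "moderate"
    return "anecdotal"

print(evidence_label(12))  # "strong"
```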
72
Suppose BF₀₁ = 3. What is the correct interpretation?
The data are 3× more likely under H₀ than under H₁. Or H₀ outpredicts H₁ by a factor of 3. Wrong: That H₀ is now 3× more likely than H₁ — this is only true if H₀ and H₁ were equally likely a priori.
73
What does the Bayes Factor actually quantify?
The relative predictive performance of the models — not the absolute belief in a hypothesis.
74
What does the Bayes Factor tell you about your beliefs?
How much your beliefs should shift, not what those beliefs are.
75
In the hypothesis “All X are Y”, what happens as you observe confirmatory data?
Each confirmatory instance increases your belief in the general law.
76
How does Bayesian inference treat repeated confirmatory evidence?
The Bayes Factor in favor of the null increases with each confirmation.
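A sketch of why each confirmation strengthens the case for a general law, assuming H₀ fixes θ = 1 ("all X are Y") and H₁ puts a uniform prior on θ; under these assumptions BF₀₁ after n confirmations works out to n + 1.

```python
from scipy.integrate import quad

def bf01_after_confirmations(n):
    """BF01 for H0: theta = 1 versus H1: theta ~ Uniform(0, 1) after n confirmations."""
    p_data_h0 = 1.0                            # the law predicts every confirming case
    p_data_h1, _ = quad(lambda t: t**n, 0, 1)  # equals 1 / (n + 1)
    return p_data_h0 / p_data_h1

print([round(bf01_after_confirmations(n)) for n in (1, 5, 10)])  # [2, 6, 11]
```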
77
Does the Bayes Factor depend on the prior distribution?
Yes. Each model's predictions average the likelihood over the prior distribution of its parameters, so changing that prior changes the Bayes factor.
78
Name a key advantage of Bayes Factors over p-values.
They quantify evidence instead of forcing a binary accept/reject decision.
79
How do Bayes Factors handle "absence of evidence"?
They distinguish between evidence of absence and absence of evidence.
80
Can Bayes Factors be used sequentially as data accumulates?
Yes — they allow continuous updating and evidence monitoring.
81
Are Bayes Factors applicable to real-world, unplanned data collection?
Yes — they do not require a fixed sampling plan.
82
What does the ratio P(θ∣data) / P(θ) represent?
The change in belief about θ caused by the data: how much more or less we believe in θ after seeing the data.
83
What does the ratio P(data∣θ) / P(data) represent?
The predictive success: how well θ explains the data compared with all other possible values of θ.
84
What is the meaning of the equation P(θ∣data) / P(θ) = P(data∣θ) / P(data)?
The degree of belief change (support) equals how well θ predicts the data (predictive success).
85
Explain this equation.
This equation shows that the degree to which I adjust my belief in a particular value of θ (the support) corresponds exactly to how well that θ predicts the observed data (the predictive success). The better θ explains the data compared with all other possibilities, the more inclined I am to believe in that θ after seeing the data. Left side = change in belief, right side = predictive power, and they are equal.