Week 2 (MLE, MAP, Bayesian Curve, Entropy) Flashcards

(27 cards)

1
Q

Stirling’s Approximation

A

ln N! ≈ N ln N − N as N → ∞
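
A standard way to see this (a derivation sketch, not from the card itself): approximate the sum of logarithms by an integral.

```latex
\ln N! \;=\; \sum_{n=1}^{N} \ln n \;\approx\; \int_{1}^{N} \ln x \, dx \;=\; N \ln N - N + 1 \;\approx\; N \ln N - N
```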

2
Q

Marginal probability, joint probability and conditional probability in the context of the grid

A

Let nij be the number of trials falling in grid cell (i, j), with column totals ci = Σj nij and row totals rj = Σi nij, where N is the total number of trials. Then:

Joint: p(X = xi, Y = yj) = nij / N

Marginal: p(X = xi) = ci / N

Conditional: p(Y = yj | X = xi) = nij / ci

3
Q

Sum rule and product rule in context of grid

A
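
Sum rule: p(X = xi) = Σj p(X = xi, Y = yj)

Product rule: p(X = xi, Y = yj) = p(Y = yj | X = xi) p(X = xi)

In the grid both follow from the counts: Σj nij/N = ci/N (summing the joints in a column gives the marginal), and (nij/ci)(ci/N) = nij/N (conditional times marginal gives the joint)
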
4
Q

Transformed densities formula

A
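
For a change of variables x = g(y), a density picks up a Jacobian factor:

py(y) = px(g(y)) |g'(y)|

so, unlike an ordinary function, the location of the maximum of a density depends on the choice of variable
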
5
Q

Variance and covariance formulae

A
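
var[x] = E[(x − E[x])^2] = E[x^2] − E[x]^2

cov[x, y] = E[(x − E[x])(y − E[y])] = E[xy] − E[x]E[y]

For random vectors: cov[x, y] = E[x y^T] − E[x]E[y^T]
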
6
Q

Multivariate Gaussian PDF

A
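
For x ∈ R^D with mean vector μ and D × D covariance matrix Σ:

N(x | μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp( −(1/2)(x − μ)^T Σ^(−1) (x − μ) )
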
7
Q

Log Likelihood and estimators of μ and σ^2 for univariate Gaussian

A
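
For i.i.d. data x = (x1, …, xN)^T:

ln p(x | μ, σ^2) = −(1/(2σ^2)) Σn (xn − μ)^2 − (N/2) ln σ^2 − (N/2) ln(2π)

Setting the derivatives with respect to μ and σ^2 to zero gives

μ_ML = (1/N) Σn xn (the sample mean)

σ^2_ML = (1/N) Σn (xn − μ_ML)^2 (the sample variance about μ_ML)
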
8
Q

Expectation of MLEs for univariate Gaussian

A
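
E[μ_ML] = μ, so the mean estimator is unbiased

E[σ^2_ML] = ((N − 1)/N) σ^2, so the variance estimator is biased: it systematically underestimates σ^2 because the variance is measured around the fitted mean μ_ML; rescaling by N/(N − 1) gives the unbiased estimator
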
9
Q

Connect the MLE to polynomial curve fitting and construct resulting likelihood function

A

Instead of taking the naive error-minimisation approach we let tn = y(xn, w) + εn, where εn is Gaussian noise distributed N(0, β^−1)

This implies p(tn | xn, w, β) = N(tn | y(xn, w), β^−1), so for i.i.d. data the likelihood is p(t | x, w, β) = Πn N(tn | y(xn, w), β^−1)
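
Taking logs makes the connection to least squares explicit (a standard step, not shown on the card): maximising over w is equivalent to minimising the sum-of-squares error.

```latex
\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)
  = -\frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2
    + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)
```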

10
Q

Apply MLE to polynomial curve fitting to produce predictive distribution

A
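
Maximising the likelihood gives w_ML (equivalent to least squares) and

1/β_ML = (1/N) Σn ( y(xn, w_ML) − tn )^2

Substituting the point estimates back in gives the predictive distribution

p(t | x, w_ML, β_ML) = N( t | y(x, w_ML), β_ML^−1 )
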
11
Q

How does MAP differ from MLE predictive distribution

A

We introduce a prior p(w | α) on the weights w and maximise the posterior p(w | x, t, α, β) ∝ p(t | x, w, β) p(w | α) instead of the likelihood alone; this addresses overfitting through regularisation

12
Q

Process for MAP

A

Where p(w | α) = N(w | 0, α^−1 I) is the Gaussian prior we assume on w
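
Written out (a standard identity under these Gaussian assumptions): taking the negative log of the posterior shows that MAP is regularised least squares with regularisation coefficient λ = α/β.

```latex
\max_{\mathbf{w}} \; p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)
\;\Longleftrightarrow\;
\min_{\mathbf{w}} \; \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2
  + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w}
```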

13
Q

How does Bayesian curve fitting build on MAP

A

We treat w as a random variable, computing the full posterior and integrating over all possible values of w to make predictions.

This addresses the limitations of the MLE and MAP approaches by quantifying the uncertainty in both the data and the model.

14
Q

Process for Bayesian curve fitting

A

Where p(w | x, t, α, β) is the posterior over w (the same posterior that MAP maximises)
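
The predictive distribution marginalises over w; with the polynomial basis φ(x) = (1, x, …, x^M)^T the integral is Gaussian and has the standard closed form (Bishop PRML §1.2.6):

```latex
p(t \mid x, \mathbf{x}, \mathbf{t})
  = \int p(t \mid x, \mathbf{w}) \, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}) \, \mathrm{d}\mathbf{w}
  = \mathcal{N}\!\left( t \mid m(x), \, s^2(x) \right)
```

where m(x) = β φ(x)^T S Σn φ(xn) tn, s^2(x) = β^−1 + φ(x)^T S φ(x) and S^−1 = α I + β Σn φ(xn) φ(xn)^T; the β^−1 term is the noise in the data and φ(x)^T S φ(x) the uncertainty in w.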

15
Q

Entropy equation discrete

A
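
For a discrete random variable x:

H[x] = −Σx p(x) ln p(x)

with the convention p ln p = 0 when p = 0; base-2 logs give entropy in bits, natural logs in nats
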
16
Q

Cross validation, method & purpose

A

To select the best degree of polynomial for curve fitting, split the data into training and validation sets multiple times. For each candidate model i = 1, …, M:
Train the model on the training set
Evaluate the error on the validation set
Choose the degree M with the lowest validation error (see the sketch below)

Purpose: to avoid overfitting by over-parametrising
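
A minimal sketch in Python/NumPy (the function name cv_error, the K-fold splitting scheme and the toy data are my own illustration, not from the course): for each degree, average the validation error over K folds and pick the degree that minimises it.

```python
import numpy as np

def cv_error(x, t, degree, k=5, seed=0):
    """Mean squared validation error of a degree-`degree` polynomial under k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)  # shuffled indices, split into k folds
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = np.polyfit(x[train], t[train], degree)  # least-squares fit (= MLE under Gaussian noise)
        errors.append(np.mean((np.polyval(w, x[val]) - t[val]) ** 2))
    return np.mean(errors)

# toy data: noisy sin(2*pi*x), as in the curve-fitting running example
x = np.linspace(0.0, 1.0, 30)
t = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(1).standard_normal(x.size)
best_M = min(range(1, 10), key=lambda d: cv_error(x, t, d))
print("chosen degree:", best_M)
```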

17
Q

Differential entropy for continuous variables
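
A

For a continuous variable x with density p(x):

H[x] = −∫ p(x) ln p(x) dx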

18
Q

Differential entropy maximised for Gaussian distribution
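
A

Among all densities on the real line with fixed mean μ and variance σ^2, differential entropy is maximised by the Gaussian N(μ, σ^2) (shown by maximising H[x] with Lagrange multipliers for normalisation, mean and variance), giving

H[x] = (1/2) ln(2πeσ^2)

which, unlike discrete entropy, can be negative (when σ^2 < 1/(2πe))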

19
Q

Conditional entropy
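
A

The average extra information needed to specify y once x is known:

H[y | x] = −∫∫ p(x, y) ln p(y | x) dy dx

satisfying H[x, y] = H[y | x] + H[x]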

20
Q

Def KL divergence
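
A

KL(p ‖ q) = −∫ p(x) ln( q(x)/p(x) ) dx

the extra information needed to encode data drawn from p using a code optimised for q. KL(p ‖ q) ≥ 0 with equality iff p = q, and it is not symmetric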

21
Q

Def mutual information

A

Mutual information I[x, y] measures the information shared between x and y, i.e. the reduction in uncertainty about x once y is observed:

I[x, y] = KL( p(x, y) ‖ p(x) p(y) ) = H[x] − H[x | y] = H[y] − H[y | x]

I[x, y] = 0 iff x and y are independent

22
Q

Decision theory process
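
A

Two stages:
1. Inference: use the training data to determine the posterior probabilities p(Ck | x) (or the joint p(x, Ck))
2. Decision: combine these posteriors with a loss function to choose the optimal action, e.g. assign each x to a class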

23
Q

Minimum expected loss for classification
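
A

With Lkj the loss for assigning a point of true class Ck to class Cj, assign each x to the class j that minimises

Σk Lkj p(Ck | x)

For 0-1 loss this reduces to choosing the class with the largest posterior p(Ck | x)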

24
Q

Decision theory steps for regression
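
A

1. Inference: determine p(t | x) from the training data
2. Decision: choose y(x) to minimise the expected loss E[L] = ∫∫ L(t, y(x)) p(x, t) dx dt

For squared loss the optimal choice is the conditional mean y(x) = E[t | x]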

25
Q

Derivation of squared loss function
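
A

Minimise the expected squared loss E[L] = ∫∫ ( y(x) − t )^2 p(x, t) dx dt. By the calculus of variations:

δE[L]/δy(x) = 2 ∫ ( y(x) − t ) p(x, t) dt = 0

Solving for y(x) using the sum and product rules:

y(x) = ∫ t p(x, t) dt / p(x) = ∫ t p(t | x) dt = E[t | x]
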
26
Q

Generative vs Discriminative approaches

A

Generative models learn the full joint distribution p(x, Ck), allowing density estimation and the handling of missing data; the posterior follows from Bayes' theorem, p(Ck | x) = p(x | Ck) p(Ck) / p(x). Discriminative models focus on the decision boundary, modelling p(Ck | x) directly; they are often simpler but less flexible.
27
Q

Notation of Bayesian curve fitting
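
A

x = (x1, …, xN)^T is the vector of training inputs and t = (t1, …, tN)^T the corresponding targets (PRML writes these vectors in bold); x denotes a new input and t its unknown target. α is the precision of the Gaussian prior over w and β the precision of the Gaussian observation noise, so the predictive distribution is written p(t | x, x, t, α, β)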