Week 2 (MLE, MAP, Bayesian Curve, Entropy) Flashcards

(27 cards)

1
Q

Stirling’s Approximation

A

ln N! ≈ N ln N − N as N → ∞
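
A standard way to see this (a derivation sketch, not from the card itself): approximate the sum of logarithms by an integral.

```latex
\ln N! \;=\; \sum_{n=1}^{N} \ln n \;\approx\; \int_{1}^{N} \ln x \, dx \;=\; N \ln N - N + 1 \;\approx\; N \ln N - N
```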

2
Q

Marginal probability, joint probability and conditional probability in the context of the grid

A

Let nij be the number of trials falling in grid cell (i, j), with column totals ci = Σj nij and row totals rj = Σi nij, where N is the total number of trials. Then:

Joint: p(X = xi, Y = yj) = nij / N

Marginal: p(X = xi) = ci / N

Conditional: p(Y = yj | X = xi) = nij / ci

3
Q

Sum rule and product rule in context of grid

A
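
Sum rule: p(X = xi) = Σj p(X = xi, Y = yj)

Product rule: p(X = xi, Y = yj) = p(Y = yj | X = xi) p(X = xi)

In the grid both follow from the counts: Σj nij/N = ci/N (summing the joints in a column gives the marginal), and (nij/ci)(ci/N) = nij/N (conditional times marginal gives the joint)
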
4
Q

Transformed densities formula

A
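
For a change of variables x = g(y), a density picks up a Jacobian factor:

py(y) = px(g(y)) |g'(y)|

so, unlike an ordinary function, the location of the maximum of a density depends on the choice of variable
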
5
Q

Variance and covariance formulae

A
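
var[x] = E[(x − E[x])^2] = E[x^2] − E[x]^2

cov[x, y] = E[(x − E[x])(y − E[y])] = E[xy] − E[x]E[y]

For random vectors: cov[x, y] = E[x y^T] − E[x]E[y^T]
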
6
Q

Multivariate Gaussian PDF

A
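
For x ∈ R^D with mean vector μ and D × D covariance matrix Σ:

N(x | μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp( −(1/2)(x − μ)^T Σ^(−1) (x − μ) )
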
7
Q

Log Likelihood and estimators of μ and σ^2 for univariate Gaussian

A
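
For i.i.d. data x = (x1, …, xN)^T:

ln p(x | μ, σ^2) = −(1/(2σ^2)) Σn (xn − μ)^2 − (N/2) ln σ^2 − (N/2) ln(2π)

Setting the derivatives with respect to μ and σ^2 to zero gives

μ_ML = (1/N) Σn xn (the sample mean)

σ^2_ML = (1/N) Σn (xn − μ_ML)^2 (the sample variance about μ_ML)
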
8
Q

Expectation of MLEs for univariate Gaussian

A
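
E[μ_ML] = μ, so the mean estimator is unbiased

E[σ^2_ML] = ((N − 1)/N) σ^2, so the variance estimator is biased: it systematically underestimates σ^2 because the variance is measured around the fitted mean μ_ML; rescaling by N/(N − 1) gives the unbiased estimator
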
9
Q

Connect the MLE to polynomial curve fitting and construct resulting likelihood function

A

Instead of taking the naive error-minimisation approach we let tn = y(xn, w) + εn, where εn is Gaussian noise distributed N(0, β^−1)

This implies p(tn | xn, w, β) = N(tn | y(xn, w), β^−1), so for i.i.d. data the likelihood is p(t | x, w, β) = Πn N(tn | y(xn, w), β^−1)
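
Taking logs makes the connection to least squares explicit (a standard step, not shown on the card): maximising over w is equivalent to minimising the sum-of-squares error.

```latex
\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)
  = -\frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2
    + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)
```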

10
Q

Apply MLE to polynomial curve fitting to produce predictive distribution

A
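
Maximising the likelihood gives w_ML (equivalent to least squares) and

1/β_ML = (1/N) Σn ( y(xn, w_ML) − tn )^2

Substituting the point estimates back in gives the predictive distribution

p(t | x, w_ML, β_ML) = N( t | y(x, w_ML), β_ML^−1 )
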
11
Q

How does MAP differ from MLE predictive distribution

A

We introduce a prior p(w | α) on the weights w and maximise the posterior p(w | x, t, α, β) ∝ p(t | x, w, β) p(w | α) instead of the likelihood alone; this addresses overfitting through regularisation

12
Q

Process for MAP

A

Where p(w | α) = N(w | 0, α^−1 I) is the Gaussian prior we assume on w
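
Written out (a standard identity under these Gaussian assumptions): taking the negative log of the posterior shows that MAP is regularised least squares with regularisation coefficient λ = α/β.

```latex
\max_{\mathbf{w}} \; p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)
\;\Longleftrightarrow\;
\min_{\mathbf{w}} \; \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2
  + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w}
```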

13
Q

How does Bayesian curve fitting build on MAP

A

We treat w as a random variable, computing the full posterior and integrating over all possible values of w to make predictions.

This addresses the limitations of the MLE and MAP approaches by quantifying the uncertainty in both the data and the model.

14
Q

Process for Bayesian curve fitting

A

Where p(w | x, t, α, β) is the posterior over w (the same posterior that MAP maximises)
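
The predictive distribution marginalises over w; with the polynomial basis φ(x) = (1, x, …, x^M)^T the integral is Gaussian and has the standard closed form (Bishop PRML §1.2.6):

```latex
p(t \mid x, \mathbf{x}, \mathbf{t})
  = \int p(t \mid x, \mathbf{w}) \, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}) \, \mathrm{d}\mathbf{w}
  = \mathcal{N}\!\left( t \mid m(x), \, s^2(x) \right)
```

where m(x) = β φ(x)^T S Σn φ(xn) tn, s^2(x) = β^−1 + φ(x)^T S φ(x) and S^−1 = α I + β Σn φ(xn) φ(xn)^T; the β^−1 term is the noise in the data and φ(x)^T S φ(x) the uncertainty in w.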

15
Q

Entropy equation discrete

A
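
For a discrete random variable x:

H[x] = −Σx p(x) ln p(x)

with the convention p ln p = 0 when p = 0; base-2 logs give entropy in bits, natural logs in nats
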
16
Q

Cross validation, method & purpose

A

To select the best degree of polynomial for curve fitting, split the data into training and validation sets multiple times. For each candidate model i = 1, …, M:
Train the model on the training set
Evaluate the error on the validation set
Choose the degree M with the lowest validation error (see the sketch below)

Purpose: to avoid overfitting by over-parametrising
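
A minimal sketch in Python/NumPy (the function name cv_error, the K-fold splitting scheme and the toy data are my own illustration, not from the course): for each degree, average the validation error over K folds and pick the degree that minimises it.

```python
import numpy as np

def cv_error(x, t, degree, k=5, seed=0):
    """Mean squared validation error of a degree-`degree` polynomial under k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)  # shuffled indices, split into k folds
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = np.polyfit(x[train], t[train], degree)  # least-squares fit (= MLE under Gaussian noise)
        errors.append(np.mean((np.polyval(w, x[val]) - t[val]) ** 2))
    return np.mean(errors)

# toy data: noisy sin(2*pi*x), as in the curve-fitting running example
x = np.linspace(0.0, 1.0, 30)
t = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(1).standard_normal(x.size)
best_M = min(range(1, 10), key=lambda d: cv_error(x, t, d))
print("chosen degree:", best_M)
```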

17
Q

Differential entropy for continuous variables
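
A

For a continuous variable x with density p(x):

H[x] = −∫ p(x) ln p(x) dx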

18
Q

Differential entropy maximised for Gaussian distribution
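
A

Among all densities on the real line with fixed mean μ and variance σ^2, differential entropy is maximised by the Gaussian N(μ, σ^2) (shown by maximising H[x] with Lagrange multipliers for normalisation, mean and variance), giving

H[x] = (1/2) ln(2πeσ^2)

which, unlike discrete entropy, can be negative (when σ^2 < 1/(2πe))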

19
Q

Conditional entropy
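
A

The average extra information needed to specify y once x is known:

H[y | x] = −∫∫ p(x, y) ln p(y | x) dy dx

satisfying H[x, y] = H[y | x] + H[x]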

20
Q

Def KL divergence
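
A

KL(p ‖ q) = −∫ p(x) ln( q(x)/p(x) ) dx

the extra information needed to encode data drawn from p using a code optimised for q. KL(p ‖ q) ≥ 0 with equality iff p = q, and it is not symmetric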

21
Q

Def mutual information

A

Mutual information I[x, y] measures the information shared between x and y, i.e. the reduction in uncertainty about x once y is observed:

I[x, y] = KL( p(x, y) ‖ p(x) p(y) ) = H[x] − H[x | y] = H[y] − H[y | x]

I[x, y] = 0 iff x and y are independent

22
Q

Decision theory process
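
A

Two stages:
1. Inference: use the training data to determine the posterior probabilities p(Ck | x) (or the joint p(x, Ck))
2. Decision: combine these posteriors with a loss function to choose the optimal action, e.g. assign each x to a class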

23
Q

Minimum expected loss for classification
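
A

With Lkj the loss for assigning a point of true class Ck to class Cj, assign each x to the class j that minimises

Σk Lkj p(Ck | x)

For 0-1 loss this reduces to choosing the class with the largest posterior p(Ck | x)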

24
Q

Decision theory steps for regression
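
A

1. Inference: determine p(t | x) from the training data
2. Decision: choose y(x) to minimise the expected loss E[L] = ∫∫ L(t, y(x)) p(x, t) dx dt

For squared loss the optimal choice is the conditional mean y(x) = E[t | x]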

25
Q

Derivation of squared loss function
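
A

Minimise the expected squared loss E[L] = ∫∫ ( y(x) − t )^2 p(x, t) dx dt. By the calculus of variations:

δE[L]/δy(x) = 2 ∫ ( y(x) − t ) p(x, t) dt = 0

Solving for y(x) using the sum and product rules:

y(x) = ∫ t p(x, t) dt / p(x) = ∫ t p(t | x) dt = E[t | x]
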
26
Q

Generative vs Discriminative approaches

A

Generative models learn the full joint distribution p(x, Ck), allowing density estimation and the handling of missing data; the posterior follows from Bayes' theorem, p(Ck | x) = p(x | Ck) p(Ck) / p(x). Discriminative models focus on the decision boundary, modelling p(Ck | x) directly; they are often simpler but less flexible.
27
Q

Notation of Bayesian curve fitting
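
A

x = (x1, …, xN)^T is the vector of training inputs and t = (t1, …, tN)^T the corresponding targets (PRML writes these vectors in bold); x denotes a new input and t its unknown target. α is the precision of the Gaussian prior over w and β the precision of the Gaussian observation noise, so the predictive distribution is written p(t | x, x, t, α, β)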