Week 5: Statistical Modelling Flashcards

1
Q

Probability Distributions

A

Probability distributions model uncertainty by quantifying our degree of belief that each possible outcome will occur.

2
Q

Probability Density Function (PDF)

A

The area under the curve between two points of a PDF is the probability of the outcome being within the two points.
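In symbols (writing X for the random variable and f for its density):

P(a \le X \le b) = \int_a^b f(x) \, dx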

3
Q

Cumulative Distribution Function (CDF)

A

The height of the curve at a point is the chance that the outcome is less than or equal to the point.
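In symbols, with F for the CDF and f for the density:

F(x) = P(X \le x) = \int_{-\infty}^{x} f(t) \, dt, \quad P(a < X \le b) = F(b) - F(a)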

4
Q

Joint Distribution

A

The probability distribution over all the random variables in a set, taken together; it assigns a probability to each combination of their values.

5
Q

Independence

A

A and B are independent if

P(A|B) = P(A),
P(B|A) = P(B),
P(A,B) = P(A)*P(B)
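For example, with A = first coin toss is heads and B = second coin toss is heads: P(A,B) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}.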

6
Q

Conditional Independence

A

A and B are conditionally independent given C

iff P(A,B|C) = P(A|C) * P(B|C), or
iff P(A|B,C) = P(A|C)

Conditional independence does not imply unconditional independence, nor the other way around.

7
Q

Representative Sample

A

A sample from a population that accurately reflects the characteristics of the population.

8
Q

Prior

A

The initial probability that hypothesis h holds without having observed the data.

P(h)

9
Q

Likelihood

A

The probability of observing data D, given some world where the hypothesis h is true.

P(D|h)

10
Q

Posterior

A

The probability that hypothesis h is true, given that we have observed dataset D.

P(h|D)

11
Q

Likelihoods

A

When modelling a random process, we don’t know the hypothesis h. We estimate the parameters of a model h by maximising the probability P(D|h) (or L(h|D)) of observing D. Hypotheses aren’t always mutually exclusive and there can be an infinite number of them.

12
Q

Maximum Likelihood Estimate (MLE)

A

Choose the parameters of h so as to maximise the likelihood L(h \mid D).

Goal is \arg \max_h \left\{ L(h \mid D) \right\}

L(h \mid D) = P(D \mid h) = \prod_{i=1}^m P(\boldsymbol{x}_i \mid h)
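In practice the log-likelihood is usually maximised instead, since it has the same maximiser:

\log L(h \mid D) = \sum_{i=1}^m \log P(\boldsymbol{x}_i \mid h)

For example, for a univariate Gaussian the MLE is \hat{\mu} = \frac{1}{m}\sum_{i=1}^m x_i and \hat{\sigma}^2 = \frac{1}{m}\sum_{i=1}^m (x_i - \hat{\mu})^2.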

13
Q

Bayesian Estimation

A

Compute a model h of maximum posterior probability P(h \mid D)

Goal is \arg \max_h \left\{ P(h \mid D) \right\}

Using Bayes Rule,

P(h \mid D) = \frac{P(D \mid h) \cdot P(h)}{P(D)}

Computing the likelihood P(D \mid h) as a product over individual attributes or observations assumes they are conditionally independent given h.
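Since P(D) does not depend on h, maximising the posterior is equivalent to maximising the numerator P(D \mid h) \cdot P(h).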

14
Q

Conditional Independence Assumption

A

Assuming the attributes are conditionally independent given the hypothesis, the likelihood factorises into a product of per-attribute probabilities. The numerator of Bayes rule then becomes this product multiplied by the prior of the hypothesis, divided by the probability of the data.
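In symbols, writing the data as attributes x_1, \dots, x_n (notation mine):

P(h \mid x_1, \dots, x_n) = \frac{P(h) \prod_{j=1}^n P(x_j \mid h)}{P(D)}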

15
Q

Probability Density Function for Normal Distribution

A

f(x) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2} \left( \frac{(x - \mu)^2}{\sigma^2} \right)}
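\mu = mean
\sigma = standard deviation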

16
Q

Laplace Estimation

A

When computing the likelihood of each possible attribute value, add 1 to the numerator and \ell to the denominator, where \ell is the number of possible values of the attribute. This avoids zero probability estimates for values that never occur in the training data.
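As a sketch of the resulting estimate for a discrete attribute with \ell possible values (count notation mine):

\hat{P}(x_j = v \mid h) = \frac{\text{count}(x_j = v, h) + 1}{\text{count}(h) + \ell}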

17
Q

Density Estimation

A

Given a dataset, compute an estimate of an underlying probability density function.

18
Q

Parametric Models

A

The number of parameters is fixed and independent of the training-set size. Parametric models are approximations of reality and make stronger assumptions than non-parametric models. They are generally more explainable and allow deeper investigation of the fitted model.

Examples include:
- Multivariate Linear Regression
- Neural Networks
- k-Means
- Gaussian

19
Q

Non-parametric Models

A

The number of parameters grows as the sample size increases. They make fewer assumptions than parametric models, which gives them more modelling power and lets them capture richer representations of the data.

Examples include:
- Decision Tree
- DBSCAN

20
Q

Multivariate Gaussian/Normal Distribution

A

f(\boldsymbol{x}) = \frac{1}{(2\pi)^{\frac{n}{2}} \left\lvert \boldsymbol{\Sigma} \right\rvert ^{\frac{1}{2}}} e^{-\frac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu})}
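\boldsymbol{\mu} = mean vector
\boldsymbol{\Sigma} = covariance matrix (with determinant \lvert \boldsymbol{\Sigma} \rvert)
n = dimensionality of \boldsymbol{x}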

21
Q

Iso-density Contours

A

On such a contour, all points \boldsymbol{x} have equal density: f(\boldsymbol{x}) = c.

This is similar to the elevation maps used in topography.

22
Q

Poisson Distribution

A

f(x) = \frac{\delta^x e^{-\delta}}{x !}

\delta = rate at which the events occur
x = random variable corresponding to the number of events
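For example, with \delta = 3: P(x = 2) = \frac{3^2 e^{-3}}{2!} \approx 0.22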

23
Q

Mixture Model

A

Consists of multiple component models, each specified by its own parameters.

f(\boldsymbol{x}) = \sum_{k=1}^K \pi_k f_k (\boldsymbol{x}; \boldsymbol{w}_k)
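Here each f_k is a component density with parameters \boldsymbol{w}_k, and the \pi_k are mixing weights with \pi_k \ge 0 and \sum_{k=1}^K \pi_k = 1.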

24
Q

Log-likelihood of a Mixture Model

A

L(\boldsymbol{\pi}, \boldsymbol{w}_1, \dots, \boldsymbol{w}_K) = \log \left[ \sum_{k=1}^K \pi_k f_k (\boldsymbol{x}; \boldsymbol{w}_k) \right]
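For a dataset D of m independent points, the log-likelihood sums this over the data:

\sum_{i=1}^m \log \left[ \sum_{k=1}^K \pi_k f_k (\boldsymbol{x}_i; \boldsymbol{w}_k) \right]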

25
Q

Gaussian Mixture Model (GMM)

A

P(\boldsymbol{x}_i) = \sum_{k=1}^K P(C_k) \, P(\boldsymbol{x}_i \mid C_k)
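Here P(C_k) is the mixing weight of component k and P(\boldsymbol{x}_i \mid C_k) is a multivariate Gaussian density with mean \boldsymbol{\mu}_k and covariance \boldsymbol{\Sigma}_k.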

26
Q

Expectation-Maximisation (EM) Algorithm

A

A well-known algorithm for fitting GMMs.

E-step: compute the responsibilities \pi_{i,k} = P(C_k \mid \boldsymbol{x}_i). Using Bayes rule, \pi_{i,k} \propto P(\boldsymbol{x}_i \mid C_k) \, P(C_k), normalised over k. Then m_k = \sum_{i=1}^m \pi_{i,k}.

M-step: compute the new means, covariances, and component weights:
\boldsymbol{\mu}_k \leftarrow \sum_{i=1}^m \left( \frac{\pi_{i,k}}{m_k} \right) \boldsymbol{x}_i
\boldsymbol{\Sigma}_k \leftarrow \sum_{i=1}^m \left( \frac{\pi_{i,k}}{m_k} \right) (\boldsymbol{x}_i - \boldsymbol{\mu}_k) (\boldsymbol{x}_i - \boldsymbol{\mu}_k)^T
\pi_k \leftarrow \frac{m_k}{m}
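A minimal NumPy sketch of one EM iteration for a GMM, following the update rules above (function and variable names are mine, not from the lecture notes; no initialisation, regularisation, or convergence check):

import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pis, mus, Sigmas):
    # X: (m, n) data; pis: (K,) weights; mus: (K, n) means; Sigmas: (K, n, n) covariances.
    m, n = X.shape
    K = len(pis)

    # E-step: responsibilities pi_{i,k} = P(C_k | x_i) via Bayes rule, normalised over k.
    resp = np.column_stack([
        pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k]) for k in range(K)
    ])
    resp /= resp.sum(axis=1, keepdims=True)
    m_k = resp.sum(axis=0)  # effective number of points per component

    # M-step: update means, covariances, and component weights.
    new_mus = (resp.T @ X) / m_k[:, None]
    new_Sigmas = np.empty_like(Sigmas)
    for k in range(K):
        diff = X - new_mus[k]
        new_Sigmas[k] = (resp[:, k, None] * diff).T @ diff / m_k[k]
    new_pis = m_k / m
    return new_pis, new_mus, new_Sigmas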