Equations Flashcards

1
Q

conditional probability, p(a|b) =

A

p(a,b) / p(b)

2
Q

bayes, p(a|b) =

A

p(b|a)p(a) / p(b)
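A quick numeric sanity check of Bayes' rule in Python (the prior and likelihoods below are made-up illustrative values, not part of the deck):

```
# hypothetical values: p(a) is the prior, p(b|a) and p(b|not a) the likelihoods
p_a = 0.01
p_b_given_a = 0.9
p_b_given_not_a = 0.05

# marginalise to get p(b), then apply Bayes' rule p(a|b) = p(b|a)p(a) / p(b)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # posterior p(a|b) ~= 0.1538
```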

3
Q

independent events, p(a,b) =

A

p(a)p(b)

4
Q

total probability/marginalisation, p(X=x) =

A

sum over y: p(x|y)p(y)

5
Q

conditional independence assumption, p(x|y) =

A

product over i: p(xi|y)

6
Q

discriminant function, f(x) =

A

sum: wx - t

7
Q

perceptron update rule, sigmoid error wj =

A

wj - (lrate)(f(x) - y)(xj)

8
Q

sigmoid/logistic regression, f(x) =

A

1 / (1 + e^-z), where z = wx + b

9
Q

log loss/cross entropy loss, L(f(x),y) =

A

-{y log f(x) + (1-y) log(1-f(x))}

10
Q

summed log loss/ cross entropy error/ negative log likelihood, E =

A

- sum i: {y log f(x) + (1-y) log(1-f(x))}
11
Q

partial derivative of cross entropy error, dE/dw =

A

sum: (f(x) - y)(x)
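As a sketch of how cards 7, 8, and 11 fit together, here is a minimal pure-Python single-sample update for a sigmoid unit (the feature vector, label, and learning rate are made up for illustration):

```
import math

def sigmoid(z):
    # card 8: f(x) = 1 / (1 + e^-z), with z = w . x
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, x, y, lrate=0.1):
    # card 11 gives dE/dwj = (f(x) - y) * xj, so card 7's update is
    # wj <- wj - lrate * (f(x) - y) * xj
    f = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    return [wj - lrate * (f - y) * xj for wj, xj in zip(w, x)]

w = sgd_step(w=[0.0, 0.0], x=[1.0, 2.0], y=1)
print(w)  # weights nudged toward the positive class: [0.05, 0.1]
```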

12
Q

partial derivative of sigmoid, dy/dz =

A

y(1-y)

13
Q

partial derivative of cross entropy error, dE/df(x) =

A

-[y(1/f(x)) - (1-y)(1/(1-f(x)))]

14
Q

specificity =

A

TN / (FP+TN)

15
Q

precision = positive predictive value =

A

TP / (TP + FP)

16
Q

recall = sensitivity = tp rate =

A

TP / P

17
Q

fp rate =

A

FP / N

18
Q

f1 measure =

A

2 / ((1/precision) + (1/recall))
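The metric cards 14-18 can all be computed from the four confusion-matrix counts; a minimal Python sketch with made-up counts:

```
# hypothetical confusion-matrix counts, chosen only for illustration
TP, FP, TN, FN = 40, 10, 45, 5
P, N = TP + FN, TN + FP   # actual positives and negatives

specificity = TN / (FP + TN)
precision   = TP / (TP + FP)          # positive predictive value
recall      = TP / P                  # sensitivity / true positive rate
fp_rate     = FP / N
f1          = 2 / ((1 / precision) + (1 / recall))

print(specificity, precision, recall, fp_rate, round(f1, 3))
```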

19
Q

pearsons correlation coefficient =

A

sum: (x - xhat)(y - yhat) / sqrt(sum: (x - xhat)^2 · sum: (y - yhat)^2)

20
Q

information gain/ mutual information =

A

I(X;Y) = H(Y) - H(Y|X)
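A small worked sketch of I(X;Y) = H(Y) - H(Y|X) in Python, using a made-up joint distribution over two binary variables (base-2 logs assumed):

```
import math

def entropy(dist):
    # H = -sum p log2 p over the outcomes of a distribution
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# hypothetical joint distribution p(x, y)
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

# H(Y|X) = sum_x p(x) H(Y | X = x)
h_y_given_x = sum(p_x[x] * entropy({y: p_xy[(x, y)] / p_x[x] for y in (0, 1)})
                  for x in (0, 1))
print(round(entropy(p_y) - h_y_given_x, 4))  # mutual information I(X;Y)
```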

21
Q

euclidean distance =

A

sqrt(sum:(x1-x2)^2)

22
Q

hamming distance =

A

sum: delta(xi not equal xj)
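Both distance cards in plain Python (the example vectors and strings are arbitrary):

```
import math

def euclidean(x1, x2):
    # sqrt(sum (x1_i - x2_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def hamming(xi, xj):
    # count of positions where the two sequences disagree
    return sum(a != b for a, b in zip(xi, xj))

print(euclidean([0, 0], [3, 4]))      # 5.0
print(hamming("karolin", "kathrin"))  # 3
```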

23
Q

neuron, y(x,w) =

A

f(wx + b)

24
Q

softmax =

A

e^zi / sum k: e^zk
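A minimal softmax sketch in Python; subtracting max(z) is a standard numerical-stability trick (not part of the card) and does not change the output:

```
import math

def softmax(z):
    # e^zi / sum_k e^zk
    m = max(z)
    exps = [math.exp(zk - m) for zk in z]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))  # probabilities that sum to 1
```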

25
gradient descent, wnew =
wold - (lrate)(dL/dw)
26
mean squared error loss, MSE =
1/n sum: (y-t)^2
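Cards 25 and 26 combine into a one-weight gradient-descent loop; a sketch with made-up data (targets generated by t = 2x, so the optimum is w = 2):

```
xs = [1.0, 2.0, 3.0]
ts = [2.0, 4.0, 6.0]
w, lrate = 0.0, 0.05

for _ in range(200):
    ys = [w * x for x in xs]
    # dMSE/dw = 2/n * sum (y - t) * x
    grad = 2 / len(xs) * sum((y - t) * x for y, t, x in zip(ys, ts, xs))
    w = w - lrate * grad   # wnew = wold - lrate * dL/dw

print(round(w, 3))  # ~2.0
```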
27
neuron gradient, with sigmoid activation and squared error loss, dL/dw =
dL/dy dy/dz dz/dw = (y - t)(y)(1 - y)(x)
28
entropy, H(X) =
- sum: p(x) log p(x)
29
- sum: p(x) log p(x)
entropy
30
L = 0.5(y-t)^2
squared error loss
31
e^zi / sum k: e^zk
softmax
32
information gain =
I(X;Y) = H(Y) - H(Y|X)
33
mutual information =
I(X;Y) = H(Y) - H(Y|X)
34
recall =
TP / P
35
sensitivity =
TP / P
36
tp rate =
TP / P
37
precision =
TP / (TP + FP)
38
positive predictive value =
TP / (TP + FP)
39
- sum i: {ylogf(x) + (1-y)log(1-f(x))}
summed log loss/ cross entropy error/ negative log likelihood
40
summed log loss =
- sum i: {ylogf(x) + (1-y)log(1-f(x))}
41
cross entropy error =
- sum i: {ylogf(x) + (1-y)log(1-f(x))}
42
negative log likelihood =
- sum i: {ylogf(x) + (1-y)log(1-f(x))}
43
log loss, L(f(x),y) =
-{ylogf(x) + (1-y)log(1-f(x))}
44
cross entropy loss, L(f(x),y) =
-{ylogf(x) + (1-y)log(1-f(x))}
45
sigmoid, f(x) =
1 / (1+e^-z)
46
logistic regression, f(x) =
1 / (1+e^-z)
47
bias update for logistic regression, t =
t + lrate(f(x) - y)
48
bias update for perceptron, t =
t + lrate(yhat - y)
49
what is P(A or B) if a) they are disjoint b) they are joint
a) P(A) + P(B) | b) P(A) + P(B) - P(A and B)
50
give the bernoulli distribution
```
P(X = 0) = 1 - p
P(X = 1) = p
```
51
give the binomial distribution
P(X = k) = (nCk)(p^k)(1-p)^(n-k)
52
give the geometric distribution
P(X = x) = (1-p)^(x-1) (p)
53
give the poisson distribution
P(X = x) = { lambda^x e^(-lambda) } / x!
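The pmfs on cards 51-53 in plain Python (math.comb needs Python 3.8+; the argument values in the final line are arbitrary):

```
import math

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) p^k (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(x, p):
    # P(X = x) = (1-p)^(x-1) p, for x = 1, 2, ...
    return (1 - p) ** (x - 1) * p

def poisson_pmf(x, lam):
    # P(X = x) = lambda^x e^(-lambda) / x!
    return lam**x * math.exp(-lam) / math.factorial(x)

print(binomial_pmf(2, 5, 0.3), geometric_pmf(3, 0.5), poisson_pmf(1, 2.0))
```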
54
if a discrete r.v. X has a pmf f(X) what is the expected value E[g(x)]
sum i: g(Xi)f(Xi)
55
if a discrete r.v. X has a pmf f(X) what is the variance V[g(x)]
E[(g(X) - E(g(X)))^2] = E[g(X)^2] - E[g(X)]^2
56
properties of Expectations | E[aX + b] =
aE[X] + b
57
properties of variance: | V[aX+b] =
a^2V[X]
58
give the equation for hinge loss
sum: -y(wx + b) = sum: -y(yhat), summed over ONLY the misclassified samples (those with y(wx + b) < 0, so each term is positive)
59
when we perform minibatch sgd, what do we times sum:dL/dW by to scale it
n / |S| (number of samples n divided by the batch size |S|)
60
what is the perceptron weight update, with hinge loss?
wj = wj - (lrate)(-yhat · y)(xj), or, applied only to the misclassified samples, wj = wj - (lrate)(-y)(xj) = wj + (lrate)(y)(xj)
61
what is the loss function (negative log-likelihood) for SGD for logistic regression
- (1/n) sum i=1..n: [yi log f(xi) + (1-yi) log(1-f(xi))] (the same cross entropy error, with 1/n to rescale by sample size)
62
the decision boundary for logistic regression is given by
d = 1 / (1 + e^-z); wx + b = log(d / (1 - d))
63
give the equation for zero mean, unit variance normalisation
(x - x_mean) / sigma
64
give the equation for restrict range normalisation
(x - x_min) / (x_max - x_min)
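Both normalisation cards in Python, on a made-up list of values (population standard deviation assumed for sigma):

```
import math

data = [2.0, 4.0, 6.0, 8.0]   # illustrative values only

mean = sum(data) / len(data)
sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

z_scored = [(x - mean) / sigma for x in data]                         # zero mean, unit variance
min_max  = [(x - min(data)) / (max(data) - min(data)) for x in data]  # restricted to [0, 1]

print(z_scored)
print(min_max)
```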
65
give the equation for fisher score, F=
(m1 - m2)^2 / (v1 + v2)
66
give a kernel for horizontal lines
1 1 1 | 0 0 0 | -1 -1 -1 (rows of a 3×3 kernel, top to bottom)
67
give a kernel for vertical lines
1 0 -1 | 1 0 -1 | 1 0 -1 (rows of a 3×3 kernel, top to bottom)
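To see one of these kernels respond to an edge, here is a minimal valid cross-correlation in pure Python, using the horizontal-line kernel from card 66 on a toy image (bright top half, dark bottom half); both the image and the helper are illustrative only:

```
def correlate(image, kernel):
    # 'valid' cross-correlation of a 2-D list with a 3x3 kernel
    out = []
    for i in range(len(image) - 2):
        row = []
        for j in range(len(image[0]) - 2):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(3) for b in range(3)))
        out.append(row)
    return out

horizontal = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]
image = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
print(correlate(image, horizontal))  # strong positive response at the edge
```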
68
give the distribution update scheme for adaboost, i.e. what do we multiply Dj(i) by
1 / (2ej) if the classification was incorrect | 1 / (2(1-ej)) if the classification was correct
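A small sketch of the AdaBoost reweighting rule with made-up numbers (uniform starting weights, weak-learner error ej = 0.2, one misclassified sample):

```
D = [0.2] * 5                                    # uniform starting distribution
incorrect = [True, False, False, False, False]   # which samples the weak learner got wrong
ej = 0.2

D = [d * (1 / (2 * ej)) if wrong else d * (1 / (2 * (1 - ej)))
     for d, wrong in zip(D, incorrect)]
print(D, sum(D))  # the misclassified sample gains weight; the weights still sum to 1
```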
69
if we know that A is conditionally independent of B given C, then P(A|B,C) = ?
P(A|C)
70
if A is conditionally independent of B given C, then P(A|B,C) = P(A|C), prove it
P(A,B|C) = P(A|C)P(B|C) (conditional independence)
P(A,B,C) / P(C) = [P(A,C) / P(C)] [P(B,C) / P(C)] (definition of conditional probability)
P(A,B,C) = P(A,C)P(B,C) / P(C) (multiply both sides by P(C))
P(A,B,C) / P(B,C) = P(A,C) / P(C) (divide both sides by P(B,C))
P(A|B,C) = P(A|C)
71
if A and B are conditionally independent given C then we know?
P(A,B|C) = P(A|C)P(B|C)
72
d e^x / dx = ?
e^x (more generally, d e^u / dx = u' e^u)
73
d ln x/ dx = ?
1 / x
74
product rule, d(uv) / dx
u dv/dx + v du/dx
75
d log f(x) / dx =
f'(x) / f(x)