Final Deep Learning Flashcards

(29 cards)

1
Q

1943

A

McCulloch/Pitts - the binary neuron - it activates when its input exceeds a threshold [s > 0]; allowed us to model neurons as non-linear functions

2
Q

1948

A

Wiener - feedback cycles in the brain/with neurons (cybernetics) - allowed artificial neurons to give and receive feedback

3
Q

1949

A

Hebb - neuroplasticity/synaptic adaptability (Hebbian learning) - taught us how to adjust/train the connections between neurons

4
Q

1957

A

Rosenblatt - single-layer perceptron - classifies with w^T x > 0 and comes with a learning rule; the first trainable artificial neuron

5
Q

Learning Rule for single layer perceptron

A

w <- w + (y - y~) f(x), where y~ is the perceptron's current prediction (see card 15)
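
A minimal sketch of one such update step in Python (numpy assumed; the names perceptron_update, fx and lr are illustrative, not from the deck):

    import numpy as np

    def perceptron_update(w, fx, y, lr=1.0):
        # One step of w <- w + lr * (y - y_hat) * f(x), with y in {0, 1}.
        y_hat = 1.0 if w @ fx > 0 else 0.0   # threshold activation [s > 0]
        return w + lr * (y - y_hat) * fx

    w = np.zeros(3)
    fx = np.array([1.0, 2.0, -1.0])          # feature vector f(x), bias folded in
    w = perceptron_update(w, fx, y=1)        # w moves toward f(x) because the prediction (0) was wrong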

6
Q

Consider tossing a coin where p = probability of heads

Write P(y) in terms of Q(p)^(R(y))

A

P(y) = p^y * (1 - p)^(1 - y)
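
A two-line check (p = 0.7 is an arbitrary example value) showing how the exponents select the right probability:

    p = 0.7                      # probability of heads
    print(p**1 * (1 - p)**0)     # y = 1 (heads): 0.7
    print(p**0 * (1 - p)**1)     # y = 0 (tails): 0.3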

7
Q

For a single layer perceptron, how do you calculate the probability of a binary output?

A

P(y = 1|x) = sigmoid(w^T f(x)), where sigmoid(s) = 1/(1 + exp(-s))

8
Q

For a binary prediction, write P(y|x) in polynomial multiplication form

A

sigmoid(w^T f(x))^y * (1 - sigmoid(w^T f(x)))^(1 - y)

9
Q

Give the formula for H(y, y~) and explain why it’s important

A

Binary cross-entropy - H(y, y~) is called the cost because it tells us how far off the prediction is

H(y, y~) = -[y log(y~) + (1 - y) log(1 - y~)]

10
Q

Give the formula for H(y, y~) and explain why it’s important

A

Binary cross-entropy - H(y, y~) is called the cost because it tells us how far off the prediction is

H(y, y~) = -[y log(y~) + (1 - y) log(1 - y~)]
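
A minimal numpy sketch of this cost (the epsilon clamp is an added numerical-safety assumption, not part of the formula):

    import numpy as np

    def bce(y, y_tilde, eps=1e-12):
        y_tilde = np.clip(y_tilde, eps, 1 - eps)   # avoid log(0)
        return -(y * np.log(y_tilde) + (1 - y) * np.log(1 - y_tilde))

    print(bce(1, 0.9))   # small cost: confident and correct
    print(bce(1, 0.1))   # large cost: confident and wrong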

11
Q

L(w, x, y) = ?

A

y log(1 + exp(-s)) + (1 - y) log(1 + exp(s)), where s = w^T f(x)

12
Q

Soft plus

A

softplus(s) = log(1 + exp(s))

13
Q

L(w, x, y) with soft plus

A

y softplus(-s) + (1 - y) softplus(s)
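
A sketch checking that this softplus form agrees with the binary cross-entropy of sigmoid(s); the stabilised softplus is an assumption added for numerical safety:

    import numpy as np

    def softplus(s):
        # log(1 + exp(s)) written so large |s| does not overflow
        return np.maximum(s, 0) + np.log1p(np.exp(-np.abs(s)))

    def loss(s, y):
        return y * softplus(-s) + (1 - y) * softplus(s)

    s, y = 2.3, 1
    sigma = 1 / (1 + np.exp(-s))
    print(loss(s, y), -np.log(sigma))   # identical values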

14
Q

Rewrite loss in terms of log likelihood

A

Substitute to get L_n = -[y_n log P(y=1|x_n) + (1 - y_n) log P(y=0|x_n)]; the loss is the negative log-likelihood

15
Q

How do we recover the perceptron learning rule w <- w + (y - y~)f(x) from minimizing the loss function of the perceptron

A

Taking the gradient of the loss with respect to w gives dL/dw = (y~ - y) f(x), where y~ = sigmoid(w^T f(x)); a gradient-descent step w <- w - lr * dL/dw is therefore w <- w + lr * (y - y~) f(x)
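
A finite-difference sketch (made-up values) confirming that the gradient of the logistic loss is (y~ - y) f(x):

    import numpy as np

    def sigmoid(s):
        return 1 / (1 + np.exp(-s))

    def loss(w, fx, y):
        s = w @ fx
        return y * np.log1p(np.exp(-s)) + (1 - y) * np.log1p(np.exp(s))

    w, fx, y = np.array([0.5, -1.0]), np.array([2.0, 1.0]), 1
    analytic = (sigmoid(w @ fx) - y) * fx
    eps = 1e-6
    numeric = np.array([(loss(w + eps * e, fx, y) - loss(w - eps * e, fx, y)) / (2 * eps)
                        for e in np.eye(2)])
    print(analytic, numeric)   # match, so w <- w - lr*grad recovers w <- w + lr*(y - y~)*f(x)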

16
Q

Let y = onehot_K(y), the y-th column of I_K, the identity matrix of size K. Give the expression for the scalar P(y|x)

A

P(y|x) = exp(B s_y) / sum_k exp(B s_k), i.e., the softmax of the class scores with inverse temperature B
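
A numpy sketch of this softmax (the max-subtraction is an added numerical-stability assumption, and the scores are made up):

    import numpy as np

    def softmax(s, beta=1.0):
        z = beta * s
        z = z - z.max()              # shift scores for numerical stability
        e = np.exp(z)
        return e / e.sum()

    scores = np.array([2.0, 0.5, -1.0])   # s_k = w_k^T f(x) for K = 3 classes
    p = softmax(scores)
    print(p, p.sum())                     # a probability vector, sums to 1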

17
Q

Probability of dataset D with a soft multiclass perceptron

A

P(D) = prod_n y_n^T y~_n, the product over all examples of the probability assigned to the correct class

18
Q

Cross Entropy for a soft multiclass perceptron

A

H(y, y~) = -log(y^T y~), the negative log of the probability assigned to the correct class

19
Q

Rewrite cross entropy loss using softmax function

A

H(y, y~) = log sum_k exp(w_k^T f(x)) - w_y^T f(x), i.e., the soft maximum (log-sum-exp) of all class scores minus the score of the correct class
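
A sketch (made-up scores) showing that the log-sum-exp form equals the usual -log of the softmax probability of the correct class:

    import numpy as np

    def logsumexp(s):
        m = s.max()
        return m + np.log(np.exp(s - m).sum())

    scores = np.array([2.0, 0.5, -1.0])   # s_k = w_k^T f(x)
    y = 0                                 # index of the correct class
    loss = logsumexp(scores) - scores[y]
    probs = np.exp(scores - logsumexp(scores))
    print(loss, -np.log(probs[y]))        # identical values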

20
Q

multiclass perceptron learning rule

A

W <- W + (y - y~) f(x)^T (an outer product: y and y~ are one-hot vectors)
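
A sketch of one update (the shapes K = 3 classes, D = 4 features are made up); the outer product only changes the rows of the true and predicted classes:

    import numpy as np

    K, D = 3, 4
    W = np.zeros((K, D))
    fx = np.random.randn(D)                  # feature vector f(x)
    y = np.eye(K)[1]                         # one-hot true label
    y_tilde = np.eye(K)[np.argmax(W @ fx)]   # one-hot prediction
    W += np.outer(y - y_tilde, fx)           # rank-one update W <- W + (y - y~) f(x)^T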

21
Q

1969

A

Minsky/Papert - the book Perceptrons - single-layer perceptrons cannot handle data that is not linearly separable (e.g., XOR) and will not converge on it, killing hopes for AI

22
Q

1973

A

UK Science Research Council - Lighthill Report - concluded that AI research was not delivering on its promises, causing an AI winter

23
Q

1982

A

Hopfield - proposed the Hopfield net, an associative memory built from McCulloch-Pitts binary neurons that recalls stored patterns by minimizing an energy function

24
Q

1986

A

Rumelhart/Hinton/Williams - backpropagation/gradient descent, w <- w - lr * dLoss/dw, as well as stochastic Hopfield nets with hidden units (Boltzmann machines). Allowed us to easily calculate the gradient of NNs with hidden layers, enabling deep networks

25
Q

How did we get out of the first AI winter?

A

Widrow, who also introduced the adaptive linear neuron (ADALINE), rebranded the research as adaptive filtering, allowing the work to continue under a different name.

26
Q

1989

A

Yann LeCun invents CNNs (convolutional neural networks); images can now be processed with far fewer parameters, allowing large inputs at low computational cost.

27
Q

1997

A

Hochreiter and Schmidhuber introduce the LSTM, allowing us to model long-term dependencies in sequences.

28
Q

Give the formal definition of a signal over X

A

X = {x^(p): Omega -> R^c, omega -> x^(p)(omega)}, where Omega is the domain and c is the number of channels
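
A concrete discrete instance of the definition (sizes made up): an RGB image as a map from pixel coordinates Omega to R^c with c = 3:

    import numpy as np

    H, W, c = 4, 5, 3
    x = np.random.rand(H, W, c)     # x: Omega -> R^3, Omega = {0..3} x {0..4}
    omega = (2, 1)                  # a point of the domain
    print(x[omega])                 # x(omega), a vector in R^3
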
29
Q

What are the three properties of natural signals exploited by CNNs and how do they do so?

A

Stationarity - exploited by parameter sharing (the same kernel is applied at every position)
Locality - exploited by small kernels that only combine neighbouring values, preserving spatial structure
Compositionality - exploited by stacking layers, so local features compose into a representation of the whole input
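
A sketch of the parameter saving behind stationarity and locality (sizes made up): a 1-D convolution reuses one small kernel at every position, while a dense map of the same shape does not:

    import numpy as np

    n, k = 1024, 3                               # input length, kernel size
    kernel = np.random.randn(k)                  # k shared parameters
    x = np.random.randn(n)
    out = np.convolve(x, kernel, mode="valid")   # each output sees only k neighbouring inputs

    dense_params = n * (n - k + 1)               # fully connected layer with the same output size
    print(kernel.size, dense_params)             # 3 vs. about one million parameters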