Final Deep Learning Flashcards
(29 cards)
1943
McCulloch & Pitts - Binary neuron activation - neurons fire when their summed input exceeds a threshold [s > 0]; allowed us to model neurons as non-linear functions
1948
Wiener - Feedback cycles in the brain/with neurons (cybernetics) - allowed us to give feedback to and get feedback from artificial neurons
1949
Hebb - Neuroplasticity/synaptic adaptability (Hebbian learning) - taught us how to adjust/train the connections between neurons
1957
Rosenblatt - Single-layer perceptron - predicts with [w^T f(x) > 0] and comes with a learning rule; the first trainable artificial neuron
Learning Rule for single layer perceptron
w <- w + (y - y~) f(x)
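A minimal NumPy sketch of this update (the threshold prediction and the feature values are illustrative, not from the cards):

    import numpy as np

    def perceptron_step(w, f_x, y, lr=1.0):
        # predict with the threshold rule [w^T f(x) > 0], then apply w <- w + lr*(y - y~)*f(x)
        y_pred = 1 if w @ f_x > 0 else 0
        return w + lr * (y - y_pred) * f_x

    w = np.zeros(3)
    w = perceptron_step(w, f_x=np.array([1.0, 2.0, 0.5]), y=1)   # w becomes [1.0, 2.0, 0.5]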
Consider tossing a coin where p = probability of heads
Write P(y) in terms of Q(p)^(R(y))
P(y) = p^y * (1-p)^(1-y)
For a single layer perceptron, how do you calculate the probability of a binary output?
P(y = 1 | x) = sigmoid(w^T f(x))
For a binary prediction, write P(y|x) in polynomial multiplication form
P(y|x) = sigmoid(w^T f(x))^y * (1 - sigmoid(w^T f(x)))^(1-y)
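A small NumPy sanity check of the two cards above, with made-up weights and features; it verifies that the two cases of P(y|x) sum to 1:

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    w = np.array([0.5, -1.0, 2.0])
    f_x = np.array([1.0, 0.3, 0.7])
    p1 = sigmoid(w @ f_x)                      # P(y = 1 | x)

    def p_of_y(y):
        # P(y|x) = sigmoid(w^T f(x))^y * (1 - sigmoid(w^T f(x)))^(1-y)
        return p1**y * (1 - p1)**(1 - y)

    assert np.isclose(p_of_y(1) + p_of_y(0), 1.0)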
Give the formula for H(y, y~) and explain why it’s important
Binary cross entropy - H(y, y~) is used as the cost because it tells us how far off the prediction y~ is from the label y
H(y, y~) = -[y log(y~) + (1-y) log(1 - y~)]
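A quick numeric illustration of why this works as a cost (the predicted probabilities are arbitrary examples):

    import numpy as np

    def bce(y, y_hat):
        # H(y, y~) = -[y log(y~) + (1-y) log(1 - y~)]
        return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    print(bce(1, 0.9))   # ~0.105: confident and correct -> small cost
    print(bce(1, 0.1))   # ~2.303: confident and wrong  -> large cost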
L(w, x, y) = ?
L(w, x, y) = y log(1 + exp(-s)) + (1-y) log(1 + exp(s)), where s = w^T f(x)
Softplus
softplus(s) = log(1 + exp(s))
L(w, x, y) with softplus
L(w, x, y) = y * softplus(-s) + (1-y) * softplus(s)
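Assuming s = w^T f(x), the softplus form should match the binary cross entropy of sigmoid(s); a small check using np.logaddexp(0, s) as a numerically stable softplus:

    import numpy as np

    def softplus(s):
        return np.logaddexp(0.0, s)            # log(1 + exp(s)), stable for large |s|

    def loss_softplus(s, y):
        return y * softplus(-s) + (1 - y) * softplus(s)

    def loss_bce(s, y):
        y_hat = 1.0 / (1.0 + np.exp(-s))       # sigmoid
        return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    for y in (0, 1):
        assert np.isclose(loss_softplus(1.7, y), loss_bce(1.7, y))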
Rewrite loss in terms of log likelihood
Substitute in the model probabilities: L_n = -[y_n log P(y=1|x_n) + (1-y_n) log P(y=0|x_n)], i.e. the loss is the negative log-likelihood, so minimizing the loss maximizes the log-likelihood
How do we recover the perceptron learning rule w <- w + (y - y~)f(x) from minimizing the loss function of the perceptron
Taking the gradient of the loss with respect to w gives dL/dw = (y~ - y) f(x); a gradient-descent step w <- w - dL/dw (with learning rate 1) is therefore w <- w + (y - y~) f(x)
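A finite-difference check (with made-up weights and features) that the gradient really is (y~ - y) f(x):

    import numpy as np

    def loss(w, f_x, y):
        s = w @ f_x
        return y * np.logaddexp(0, -s) + (1 - y) * np.logaddexp(0, s)   # softplus form

    w, f_x, y = np.array([0.2, -0.4, 1.0]), np.array([1.0, 0.5, -0.3]), 1
    y_hat = 1.0 / (1.0 + np.exp(-(w @ f_x)))

    analytic = (y_hat - y) * f_x               # claimed gradient (y~ - y) f(x)
    eps = 1e-6
    numeric = np.array([(loss(w + eps*e, f_x, y) - loss(w - eps*e, f_x, y)) / (2*eps)
                        for e in np.eye(3)])
    assert np.allclose(analytic, numeric, atol=1e-5)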
Let y = onehot_K(y), the y-th column of I_K, the identity matrix of size K. Give the expression for the scalar P(y|x)
P(y|x) = exp(B * s_y) / sum_k exp(B * s_k)
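A sketch of this probability in NumPy, treating B as a scale (inverse-temperature) parameter on arbitrary example scores:

    import numpy as np

    def softmax_prob(scores, B=1.0):
        # P(y|x) = exp(B*s_y) / sum_k exp(B*s_k), computed for every class at once
        z = B * scores
        z = z - z.max()                        # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()

    s = np.array([2.0, 0.5, -1.0])             # class scores s_k
    print(softmax_prob(s))                     # probabilities over classes, sums to 1
    print(softmax_prob(s, B=10.0))             # large B pushes toward a one-hot argmax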
Probability of dataset D with a soft multiclass perceptron
P(D) = prod_n (y_n^T y~_n), the product over all examples n of the probability assigned to the correct class
Cross Entropy for a soft multiclass perceptron
H(y, y~) = -log(y^T y~)
Rewrite cross entropy loss using softmax function
L = softmax_k(w_k^T f(x)) - w_y^T f(x), where softmax_k denotes the smooth maximum log(sum_k exp(.)) (logsumexp) of the class scores
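A small check (arbitrary scores, hypothetical class index) that -log of the true-class softmax probability equals the logsumexp-minus-true-score form on this card:

    import numpy as np

    scores = np.array([1.2, -0.3, 0.8])        # s_k = w_k^T f(x) for each class k
    y = 2                                       # index of the true class

    probs = np.exp(scores) / np.exp(scores).sum()
    ce_direct = -np.log(probs[y])               # -log(y^T y~)

    lse = np.log(np.exp(scores).sum())          # smooth max of the scores
    ce_rewritten = lse - scores[y]              # softmax_k(s_k) - s_y

    assert np.isclose(ce_direct, ce_rewritten)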
multiclass perceptron learning rule
W <- W + (y - y~) f^T(x) (an outer product: row k of W is updated by (y_k - y~_k) f(x))
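A minimal sketch of this update as an outer product in NumPy (class count, feature dimension, and values are illustrative):

    import numpy as np

    K, D = 3, 4                                 # number of classes, feature dimension
    W = np.zeros((K, D))
    f_x = np.array([1.0, 0.2, -0.5, 0.7])

    scores = W @ f_x
    y_hat = np.exp(scores) / np.exp(scores).sum()   # y~ = softmax of class scores
    y = np.eye(K)[1]                                 # one-hot label for class 1

    W = W + np.outer(y - y_hat, f_x)                 # W <- W + (y - y~) f^T(x)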
1969
Minsky/Papert - "Perceptrons" (book) - single-layer perceptrons cannot handle non-linearly separable data and will not converge on it, killing hopes for AI
1973
UK Science Research Council - Lighthill Report - concluded AI research was not promising, causing an AI winter
1982
Hopfield - Proposed the Hopfield net, an associative memory that uses McCulloch-Pitts binary neurons and works by minimizing an energy function
1986
Rumelhart/Hinton/Williams - Backpropagation/gradient descent w <- w - lr * dLoss/dw, as well as stochastic Hopfield nets with hidden units (Boltzmann machines). Allowed us to easily calculate the gradient of NNs with hidden layers, creating deep networks