Final Deep Learning Flashcards

(29 cards)

1
Q

1943

A

McCulloch/Pitts - the binary neuron - it activates when its input exceeds a threshold [s > 0]; allowed us to model neurons as non-linear functions

2
Q

1948

A

Wiener - feedback cycles in the brain/with neurons (cybernetics) - allowed artificial neurons to give and receive feedback

3
Q

1949

A

Hebb - neuroplasticity/synaptic adaptability (Hebbian learning) - taught us how to adjust/train the connections between neurons

4
Q

1957

A

Rosenblatt - single-layer perceptron - classifies with w^T x > 0 and comes with a learning rule; the first trainable artificial neuron

5
Q

Learning Rule for single layer perceptron

A

w <- w + (y - y~) f(x), where y~ is the perceptron's current prediction (see card 15)
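
A minimal sketch of one such update step in Python (numpy assumed; the names perceptron_update, fx and lr are illustrative, not from the deck):

    import numpy as np

    def perceptron_update(w, fx, y, lr=1.0):
        # One step of w <- w + lr * (y - y_hat) * f(x), with y in {0, 1}.
        y_hat = 1.0 if w @ fx > 0 else 0.0   # threshold activation [s > 0]
        return w + lr * (y - y_hat) * fx

    w = np.zeros(3)
    fx = np.array([1.0, 2.0, -1.0])          # feature vector f(x), bias folded in
    w = perceptron_update(w, fx, y=1)        # w moves toward f(x) because the prediction (0) was wrong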

6
Q

Consider tossing a coin where p = probability of heads

Write P(y) in terms of Q(p)^(R(y))

A

P(y) = p^y * (1 - p)^(1 - y)
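
A two-line check (p = 0.7 is an arbitrary example value) showing how the exponents select the right probability:

    p = 0.7                      # probability of heads
    print(p**1 * (1 - p)**0)     # y = 1 (heads): 0.7
    print(p**0 * (1 - p)**1)     # y = 0 (tails): 0.3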

7
Q

For a single layer perceptron, how do you calculate the probability of a binary output?

A

P(y = 1|x) = sigmoid(w^T f(x)), where sigmoid(s) = 1/(1 + exp(-s))

8
Q

For a binary prediction, write P(y|x) in polynomial multiplication form

A

sigmoid(w^T f(x))^y * (1 - sigmoid(w^T f(x)))^(1 - y)

9
Q

Give the formula for H(y, y~) and explain why it’s important

A

Binary cross-entropy - H(y, y~) is called the cost because it tells us how far off the prediction is

H(y, y~) = -[y log(y~) + (1 - y) log(1 - y~)]

10
Q

Give the formula for H(y, y~) and explain why it’s important

A

Binary cross-entropy - H(y, y~) is called the cost because it tells us how far off the prediction is

H(y, y~) = -[y log(y~) + (1 - y) log(1 - y~)]
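
A minimal numpy sketch of this cost (the epsilon clamp is an added numerical-safety assumption, not part of the formula):

    import numpy as np

    def bce(y, y_tilde, eps=1e-12):
        y_tilde = np.clip(y_tilde, eps, 1 - eps)   # avoid log(0)
        return -(y * np.log(y_tilde) + (1 - y) * np.log(1 - y_tilde))

    print(bce(1, 0.9))   # small cost: confident and correct
    print(bce(1, 0.1))   # large cost: confident and wrong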

11
Q

L(w, x, y) = ?

A

y log(1 + exp(-s)) + (1 - y) log(1 + exp(s)), where s = w^T f(x)

12
Q

Soft plus

A

softplus(s) = log(1 + exp(s))

13
Q

L(w, x, y) with soft plus

A

y softplus(-s) + (1 - y) softplus(s)
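
A sketch checking that this softplus form agrees with the binary cross-entropy of sigmoid(s); the stabilised softplus is an assumption added for numerical safety:

    import numpy as np

    def softplus(s):
        # log(1 + exp(s)) written so large |s| does not overflow
        return np.maximum(s, 0) + np.log1p(np.exp(-np.abs(s)))

    def loss(s, y):
        return y * softplus(-s) + (1 - y) * softplus(s)

    s, y = 2.3, 1
    sigma = 1 / (1 + np.exp(-s))
    print(loss(s, y), -np.log(sigma))   # identical values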

14
Q

Rewrite loss in terms of log likelihood

A

Substitute to get L_n = -[y_n log P(y=1|x_n) + (1 - y_n) log P(y=0|x_n)]; the loss is the negative log-likelihood

15
Q

How do we recover the perceptron learning rule w <- w + (y - y~)f(x) from minimizing the loss function of the perceptron

A

Taking the gradient of the loss with respect to w gives dL/dw = (y~ - y) f(x), where y~ = sigmoid(w^T f(x)); a gradient-descent step w <- w - lr * dL/dw is therefore w <- w + lr * (y - y~) f(x)
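
A finite-difference sketch (made-up values) confirming that the gradient of the logistic loss is (y~ - y) f(x):

    import numpy as np

    def sigmoid(s):
        return 1 / (1 + np.exp(-s))

    def loss(w, fx, y):
        s = w @ fx
        return y * np.log1p(np.exp(-s)) + (1 - y) * np.log1p(np.exp(s))

    w, fx, y = np.array([0.5, -1.0]), np.array([2.0, 1.0]), 1
    analytic = (sigmoid(w @ fx) - y) * fx
    eps = 1e-6
    numeric = np.array([(loss(w + eps * e, fx, y) - loss(w - eps * e, fx, y)) / (2 * eps)
                        for e in np.eye(2)])
    print(analytic, numeric)   # match, so w <- w - lr*grad recovers w <- w + lr*(y - y~)*f(x)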

16
Q

Let y = onehot_K(y), the y-th column of I_K, the identity matrix of size K. Give the expression for the scalar P(y|x)

A

P(y|x) = exp(B s_y) / sum_k exp(B s_k), i.e., the softmax of the class scores with inverse temperature B
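
A numpy sketch of this softmax (the max-subtraction is an added numerical-stability assumption, and the scores are made up):

    import numpy as np

    def softmax(s, beta=1.0):
        z = beta * s
        z = z - z.max()              # shift scores for numerical stability
        e = np.exp(z)
        return e / e.sum()

    scores = np.array([2.0, 0.5, -1.0])   # s_k = w_k^T f(x) for K = 3 classes
    p = softmax(scores)
    print(p, p.sum())                     # a probability vector, sums to 1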

17
Q

Probability of dataset D with a soft multiclass perceptron

A

P(D) = prod_n y_n^T y~_n, the product over all examples of the probability assigned to the correct class

18
Q

Cross Entropy for a soft multiclass perceptron

A

H(y, y~) = -log(y^T y~), the negative log of the probability assigned to the correct class

19
Q

Rewrite cross entropy loss using softmax function

A

H(y, y~) = log sum_k exp(w_k^T f(x)) - w_y^T f(x), i.e., the soft maximum (log-sum-exp) of all class scores minus the score of the correct class
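
A sketch (made-up scores) showing that the log-sum-exp form equals the usual -log of the softmax probability of the correct class:

    import numpy as np

    def logsumexp(s):
        m = s.max()
        return m + np.log(np.exp(s - m).sum())

    scores = np.array([2.0, 0.5, -1.0])   # s_k = w_k^T f(x)
    y = 0                                 # index of the correct class
    loss = logsumexp(scores) - scores[y]
    probs = np.exp(scores - logsumexp(scores))
    print(loss, -np.log(probs[y]))        # identical values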

20
Q

multiclass perceptron learning rule

A

W <- W + (y - y~) f(x)^T (an outer product: y and y~ are one-hot vectors)
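
A sketch of one update (the shapes K = 3 classes, D = 4 features are made up); the outer product only changes the rows of the true and predicted classes:

    import numpy as np

    K, D = 3, 4
    W = np.zeros((K, D))
    fx = np.random.randn(D)                  # feature vector f(x)
    y = np.eye(K)[1]                         # one-hot true label
    y_tilde = np.eye(K)[np.argmax(W @ fx)]   # one-hot prediction
    W += np.outer(y - y_tilde, fx)           # rank-one update W <- W + (y - y~) f(x)^T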

21
Q

1969

A

Minsky/Papert - the book Perceptrons - single-layer perceptrons cannot handle data that is not linearly separable (e.g., XOR) and will not converge on it, killing hopes for AI

22
Q

1973

A

UK Science Research Council - Lighthill Report - concluded that AI research was not delivering on its promises, causing an AI winter

23
Q

1982

A

Hopfield - proposed the Hopfield net, an associative memory built from McCulloch-Pitts binary neurons that recalls stored patterns by minimizing an energy function

24
Q

1986

A

Rumelhart/Hinton/Williams - backpropagation/gradient descent, w <- w - lr * dLoss/dw, as well as stochastic Hopfield nets with hidden units (Boltzmann machines). Allowed us to easily calculate the gradient of NNs with hidden layers, enabling deep networks

25
Q

How did we get out of the first AI winter?

A

Widrow, who also introduced the adaptive linear neuron (ADALINE), rebranded the research as adaptive filtering, allowing the work to continue under a different name.

26
Q

1989

A

Yann LeCun invents CNNs (convolutional neural networks); images can now be processed with far fewer parameters, allowing large inputs at low computational cost.

27
Q

1997

A

Hochreiter and Schmidhuber introduce the LSTM, allowing us to model long-term dependencies in sequences.

28
Q

Give the formal definition of a signal over X

A

X = {x^(p): Omega -> R^c, omega -> x^(p)(omega)}, where Omega is the domain and c is the number of channels
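
A concrete discrete instance of the definition (sizes made up): an RGB image as a map from pixel coordinates Omega to R^c with c = 3:

    import numpy as np

    H, W, c = 4, 5, 3
    x = np.random.rand(H, W, c)     # x: Omega -> R^3, Omega = {0..3} x {0..4}
    omega = (2, 1)                  # a point of the domain
    print(x[omega])                 # x(omega), a vector in R^3
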
29
Q

What are the three properties of natural signals exploited by CNNs and how do they do so?

A

Stationarity - exploited by parameter sharing (the same kernel is applied at every position)
Locality - exploited by small kernels that only combine neighbouring values, preserving spatial structure
Compositionality - exploited by stacking layers, so local features compose into a representation of the whole input
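
A sketch of the parameter saving behind stationarity and locality (sizes made up): a 1-D convolution reuses one small kernel at every position, while a dense map of the same shape does not:

    import numpy as np

    n, k = 1024, 3                               # input length, kernel size
    kernel = np.random.randn(k)                  # k shared parameters
    x = np.random.randn(n)
    out = np.convolve(x, kernel, mode="valid")   # each output sees only k neighbouring inputs

    dense_params = n * (n - k + 1)               # fully connected layer with the same output size
    print(kernel.size, dense_params)             # 3 vs. about one million parameters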