Backprop Flashcards

(27 cards)

1
Q

What is backpropagation used for in neural networks?

A

To compute gradients of the loss function with respect to weights.

2
Q

What is the key mathematical tool used in backpropagation?

A

The chain rule of calculus.

3
Q

What is the goal of training a neural network?

A

To minimize the loss function by updating the model’s parameters.

4
Q

What does the forward pass in a neural network compute?

A

The output prediction ŷ from input x using current weights.
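
As a minimal sketch, a forward pass for a tiny two-layer sigmoid network might look like this (all weight and input values below are made up for illustration, not taken from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes: 2 inputs -> 2 hidden units -> 1 output.
x  = np.array([0.5, 0.1])                 # input vector x
W1 = np.array([[0.4, 0.3], [0.2, 0.7]])   # hidden-layer weights
W2 = np.array([[0.6, 0.5]])               # output-layer weights

h     = sigmoid(W1 @ x)   # hidden activations
y_hat = sigmoid(W2 @ h)   # output prediction ŷ
```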

5
Q

What does the loss function measure?

A

The difference between predicted output ŷ and true output y.

6
Q

What is the formula for mean squared error (MSE)?

A

MSE = ½ Σ(yᵢ - ŷᵢ)² in the lecture's convention; the ½ cancels the exponent when differentiating. (The standard definition averages instead: MSE = (1/n) Σ(yᵢ - ŷᵢ)².)
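
For instance, evaluating this loss and its derivative on made-up numbers (a sketch, not the lecture's values):

```python
import numpy as np

y     = np.array([0.01, 0.99])   # true outputs (illustrative)
y_hat = np.array([0.75, 0.77])   # predicted outputs ŷ

loss = 0.5 * np.sum((y - y_hat) ** 2)   # J = ½ Σ(yᵢ - ŷᵢ)²
grad = y_hat - y                        # ∂J/∂ŷᵢ: the ½ cancels the 2
```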

7
Q

What does gradient descent do?

A

Updates parameters in the direction that reduces the loss.

8
Q

What is the general update rule in gradient descent?

A

w ← w - η · ∂J/∂w
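
In code, one update step with this rule could be sketched as follows (the weight matrix, gradient, and η below are arbitrary illustrations):

```python
import numpy as np

eta   = 0.5                                      # learning rate η
w     = np.array([[0.40, 0.45], [0.50, 0.55]])   # current weights
dJ_dw = np.array([[0.08, 0.02], [0.05, 0.01]])   # gradients from backprop (made up)

w = w - eta * dJ_dw   # w ← w - η · ∂J/∂w
```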

9
Q

Why do we need the chain rule in multilayer networks?

A

Because each weight affects the loss through multiple intermediate functions.
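
For example, for an output weight w with pre-activation z = w·a (where a is the previous activation) and prediction ŷ = σ(z), the chain rule splits the gradient into local factors, consistent with cards 6, 13, and 22:

$$
\frac{\partial J}{\partial w}
= \frac{\partial J}{\partial \hat{y}}
\cdot \frac{\partial \hat{y}}{\partial z}
\cdot \frac{\partial z}{\partial w}
= (\hat{y} - y) \cdot \sigma'(z) \cdot a
$$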

10
Q

What does the backward pass in backpropagation compute?

A

Gradients of the loss with respect to each weight in the network.

11
Q

How does backpropagation proceed through the network?

A

From the output layer backward to the input layer.

12
Q

What does ∂J/∂w₅ represent?

A

The partial derivative of the loss with respect to weight w₅.

13
Q

What is the derivative of the sigmoid function?

A

σ′(z) = σ(z) · (1 - σ(z))
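
A quick numerical check of this identity against a centered finite difference (a self-contained sketch):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7                                   # arbitrary test point
analytic = sigmoid(z) * (1 - sigmoid(z))  # σ′(z) = σ(z)(1 - σ(z))
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
assert abs(analytic - numeric) < 1e-8     # the two agree numerically
```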

14
Q

What does each term in the chain rule represent during backprop?

A

A local gradient of one part of the network function.

15
Q

In a two-layer network, where do we start backpropagation?

A

At the output layer.

16
Q

What does the learning rate η control?

A

The size of each update step during training.

17
Q

What is the effect of a high learning rate?

A

Potential overshooting or divergence.

18
Q

What is the effect of a very low learning rate?

A

Very slow convergence.

19
Q

In backpropagation, why do we update hidden layer weights?

A

Because they also influence the loss through their effect on the output.

20
Q

What does the forward pass compute in the numerical backprop example?

A

Layer outputs using initial weights and activations.

21
Q

What is the loss value after the first forward pass in the lecture's numerical example?

A

≈ 0.2984

22
Q

How is the gradient with respect to an output weight computed?

A

As a product of chain-rule factors: the output error (ŷ - y), the activation derivative σ′(z), and the previous layer's activation.

23
Q

What does backprop through the hidden layer require?

A

Propagating the output error through the activation and weights.
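
A scalar sketch of both gradient computations (one hidden unit, one output; all values made up), showing the output error flowing back through the output weight and the hidden activation's derivative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y   = 0.5, 1.0   # input and target (illustrative)
w1, w2 = 0.4, 0.3   # hidden and output weights (illustrative)

# Forward pass: x -> h = σ(w1·x) -> ŷ = σ(w2·h).
h     = sigmoid(w1 * x)
y_hat = sigmoid(w2 * h)

# Output weight: error × activation derivative × previous activation.
delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # uses σ′(z) = σ(z)(1 - σ(z))
dJ_dw2 = delta2 * h

# Hidden weight: propagate delta2 back through w2 and σ′ at the hidden unit.
delta1 = delta2 * w2 * h * (1 - h)
dJ_dw1 = delta1 * x
```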

24
Q

What kind of function is used as activation in the numerical example?

A

Sigmoid function.

25
Q

What does one epoch of training include?

A

A forward pass, loss computation, backward pass, and weight updates.

26
Q

What happens after computing all gradients in backprop?

A

Weights are updated and the process repeats in the next epoch.

27
Q

What is the purpose of repeating forward and backward passes?

A

To iteratively reduce the loss and improve predictions.