Backprop Flashcards

(25 cards)

1
Q

What is backpropagation used for in neural networks?

A

To compute gradients of the loss function with respect to weights.

2
Q

What is the key mathematical tool used in backpropagation?

A

The chain rule of calculus.
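
For a concrete illustration (not part of the deck): with a single sigmoid neuron ŷ = σ(w·x) and loss J = ½(y - ŷ)², the chain rule factors ∂J/∂w into local derivatives. A minimal Python sketch, with the analytic gradient checked against a finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy setup: one input, one weight, loss J = 0.5 * (y - yhat)^2
x, y, w = 1.5, 0.0, 0.8

# Forward pass
z = w * x
yhat = sigmoid(z)

# Chain rule: dJ/dw = (dJ/dyhat) * (dyhat/dz) * (dz/dw)
dJ_dyhat = yhat - y            # derivative of 0.5 * (y - yhat)^2
dyhat_dz = yhat * (1 - yhat)   # sigmoid derivative (see card 13)
dz_dw = x
grad = dJ_dyhat * dyhat_dz * dz_dw

# Numerical check via central finite difference
eps = 1e-6
J = lambda w_: 0.5 * (y - sigmoid(w_ * x)) ** 2
num_grad = (J(w + eps) - J(w - eps)) / (2 * eps)
print(grad, num_grad)  # the two values should agree closely
```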

3
Q

What is the goal of training a neural network?

A

To minimize the loss function by updating the model’s parameters.

4
Q

What does the forward pass in a neural network compute?

A

The output prediction ŷ from input x using current weights.

5
Q

What does the loss function measure?

A

The difference between predicted output ŷ and true output y.

6
Q

What is the formula for mean squared error (MSE)?

A

MSE = (1/n) Σᵢ (yᵢ - ŷᵢ)²; backprop derivations often use the half sum of squares, ½ Σᵢ (yᵢ - ŷᵢ)², so the factor of 2 cancels when differentiating.
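
A quick plain-Python sketch of both conventions mentioned above (illustrative, not part of the deck):

```python
def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

def half_sse(y_true, y_pred):
    """Half sum of squared errors, the form common in backprop derivations."""
    return 0.5 * sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))

print(mse([1.0, 2.0, 3.0], [0.5, 2.5, 3.0]))      # 0.1666...
print(half_sse([1.0, 2.0, 3.0], [0.5, 2.5, 3.0])) # 0.25
```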

7
Q

What does gradient descent do?

A

Updates parameters in the direction that reduces the loss.

8
Q

What is the general update rule in gradient descent?

A

w ← w - η · ∂J/∂w
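
A minimal sketch of this update rule in Python, assuming the gradient ∂J/∂w has already been computed (the loss here is a made-up one-dimensional example):

```python
def gradient_step(w, grad, lr=0.1):
    """One gradient-descent update: w <- w - eta * dJ/dw."""
    return w - lr * grad

# Example: J(w) = (w - 3)^2, so dJ/dw = 2 * (w - 3); the minimum is at w = 3
w = 0.0
for _ in range(50):
    w = gradient_step(w, 2 * (w - 3), lr=0.1)
print(w)  # approaches 3.0
```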

9
Q

Why do we need the chain rule in multilayer networks?

A

Because each weight affects the loss through multiple intermediate functions.

10
Q

What does the backward pass in backpropagation compute?

A

Gradients of the loss with respect to each weight in the network.

11
Q

How does backpropagation proceed through the network?

A

From the output layer backward to the input layer.

12
Q

What does ∂J/∂w₅ represent?

A

The partial derivative of the loss with respect to weight w₅.

13
Q

What is the derivative of the sigmoid function?

A

σ′(z) = σ(z) · (1 - σ(z))
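
A small Python check of this identity against a finite difference (illustrative sketch):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Identity from the card: sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

z, eps = 0.7, 1e-6
numerical = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(sigmoid_prime(z), numerical)  # should match to several decimal places
```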

14
Q

What does each term in the chain rule represent during backprop?

A

A local gradient of one part of the network function.

15
Q

In a two-layer network, where do we start backpropagation?

A

At the output layer.

16
Q

What does the learning rate η control?

A

The size of each update step during training.

17
Q

What is the effect of a high learning rate?

A

Potential overshooting or divergence.

18
Q

What is the effect of a very low learning rate?

A

Very slow convergence.

19
Q

In backpropagation, why do we update hidden layer weights?

A

Because they also influence the loss through their effect on the output.

20
Q

What does the forward pass compute in numerical backprop?

A

Layer outputs using initial weights and activations.

21
Q

How is the gradient with respect to an output weight computed?

A

Using the error, activation derivative, and the previous activation value.

22
Q

What does backprop through the hidden layer require?

A

Propagating the output error through the activation and weights.
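
To make cards 21 and 22 concrete, here is a hedged sketch for a tiny network with one sigmoid hidden unit and one sigmoid output unit; the weights, inputs, and variable names are illustrative, not from the deck:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny network: x -> h = sigmoid(w1 * x) -> yhat = sigmoid(w2 * h)
x, y = 1.0, 0.0
w1, w2 = 0.5, -0.3

# Forward pass
h = sigmoid(w1 * x)
yhat = sigmoid(w2 * h)

# Output-weight gradient (card 21): error * activation derivative * previous activation
delta_out = (yhat - y) * yhat * (1 - yhat)   # for loss J = 0.5 * (y - yhat)^2
dJ_dw2 = delta_out * h

# Hidden-weight gradient (card 22): propagate the output error
# back through w2 and the hidden unit's activation derivative
delta_hidden = delta_out * w2 * h * (1 - h)
dJ_dw1 = delta_hidden * x

print(dJ_dw2, dJ_dw1)
```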

23
Q

What does one epoch of training include?

A

A forward pass, loss computation, backward pass, and weight updates.
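
Putting the steps together, a minimal sketch of repeated epochs on the same toy two-weight network (assumptions: squared-error loss, sigmoid activations, a single training example, learning rate 0.5):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 1.0, 0.0          # single training example
w1, w2, lr = 0.5, -0.3, 0.5

for epoch in range(100):
    # 1. Forward pass
    h = sigmoid(w1 * x)
    yhat = sigmoid(w2 * h)
    # 2. Loss computation
    loss = 0.5 * (y - yhat) ** 2
    # 3. Backward pass (gradients as in cards 21-22)
    delta_out = (yhat - y) * yhat * (1 - yhat)
    dJ_dw2 = delta_out * h
    dJ_dw1 = delta_out * w2 * h * (1 - h) * x
    # 4. Weight updates
    w2 -= lr * dJ_dw2
    w1 -= lr * dJ_dw1

print(loss)  # the loss shrinks across epochs
```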

24
Q

What happens after computing all gradients in backprop?

A

Weights are updated and the process repeats in the next epoch.

25
Q

What is the purpose of repeating forward and backward passes?

A

To iteratively reduce the loss and improve predictions.