Backprop Flashcards
(24 cards)
What is backpropagation used for in neural networks?
To compute gradients of the loss function with respect to weights.
What is the key mathematical tool used in backpropagation?
The chain rule of calculus.
What is the goal of training a neural network?
To minimize the loss function by updating the model’s parameters.
What does the forward pass in a neural network compute?
The output prediction ŷ from input x using current weights.
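A minimal sketch of the forward pass for a two-layer sigmoid network (the layer sizes and function names are illustrative assumptions, not the lecture's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: input x -> hidden activations h -> prediction y_hat."""
    h = sigmoid(W1 @ x + b1)       # hidden-layer activations
    y_hat = sigmoid(W2 @ h + b2)   # output prediction
    return h, y_hat
```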
What does the loss function measure?
The difference between predicted output ŷ and true output y.
What is the formula for mean squared error (MSE)?
MSE = ½ Σᵢ (yᵢ − ŷᵢ)² (the ½ is included so it cancels when the loss is differentiated)
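A quick numerical check of the formula, with illustrative values:

```python
import numpy as np

def squared_error(y, y_hat):
    """Loss in the form used above: one half times the sum of squared differences."""
    return 0.5 * np.sum((y - y_hat) ** 2)

# Illustrative targets and predictions (not from the lecture)
print(squared_error(np.array([0.0, 1.0]), np.array([0.2, 0.7])))  # 0.5 * (0.04 + 0.09) = 0.065
```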
What does gradient descent do?
Updates parameters in the direction that reduces the loss.
What is the general update rule in gradient descent?
w ← w - η · ∂J/∂w
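A minimal sketch of the rule on a made-up one-dimensional loss J(w) = w², whose gradient 2w is known in closed form:

```python
def gradient_descent_step(w, grad, lr):
    """Apply w <- w - eta * dJ/dw."""
    return w - lr * grad

w = 3.0
for _ in range(5):
    w = gradient_descent_step(w, grad=2 * w, lr=0.1)  # dJ/dw = 2w for J(w) = w^2
print(w)  # each step moves w toward the minimum at w = 0
```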
Why do we need the chain rule in multilayer networks?
Because each weight affects the loss only through a chain of intermediate functions (later activations and layers), so its gradient is the product of the local derivatives along that chain.
What does the backward pass in backpropagation compute?
Gradients of the loss with respect to each weight in the network.
How does backpropagation proceed through the network?
From the output layer backward to the input layer.
What does ∂J/∂w₅ represent?
The partial derivative of the loss with respect to weight w₅.
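Written out with the chain rule, for the squared-error loss and a sigmoid output (here z is the output pre-activation and h the hidden activation that w₅ multiplies; this naming is an assumption about the lecture's setup):

∂J/∂w₅ = ∂J/∂ŷ · ∂ŷ/∂z · ∂z/∂w₅ = (ŷ − y) · σ′(z) · h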
What is the derivative of the sigmoid function?
σ′(z) = σ(z) · (1 − σ(z))
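A short sketch of the identity, checked numerically with a finite difference at an arbitrary point:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

z, eps = 0.5, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(sigmoid_prime(z), numeric)  # the two values agree to several decimal places
```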
What does each term in the chain rule represent during backprop?
A local gradient of one part of the network function.
In a two-layer network, where do we start backpropagation?
At the output layer.
What does the learning rate η control?
The size of each update step during training.
What is the effect of a high learning rate?
Potential overshooting or divergence.
What is the effect of a very low learning rate?
Very slow convergence.
In backpropagation, why do we update hidden layer weights?
Because they also influence the loss through their effect on the output.
What does the forward pass compute in numerical backprop?
Layer outputs using initial weights and activations.
What is the loss after the first forward pass in the lecture's numerical example?
~0.2984 (in the lecture example)
How is the gradient with respect to an output weight computed?
By multiplying the output error, the derivative of the output activation, and the hidden activation that feeds into that weight.
What does backprop through the hidden layer require?
Propagating the output error back through the output-layer weights and multiplying by the derivative of the hidden activation.
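The last few cards can be tied together in one numerical sketch. The starting values below are an assumption: they follow the widely circulated step-by-step worked example (two inputs, two sigmoid hidden units, two sigmoid outputs, targets 0.01 and 0.99), which is consistent with the ~0.2984 loss quoted above; if the lecture used different numbers, only the constants change, not the structure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed starting values (see the note above); treat them as illustrative.
x  = np.array([0.05, 0.10])                  # inputs
y  = np.array([0.01, 0.99])                  # targets
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])  # hidden-layer weights (w1..w4)
b1 = 0.35
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])  # output-layer weights (w5..w8)
b2 = 0.60

# Forward pass
h     = sigmoid(W1 @ x + b1)                 # hidden activations
y_hat = sigmoid(W2 @ h + b2)                 # output predictions
loss  = 0.5 * np.sum((y - y_hat) ** 2)
print(round(loss, 4))                        # ~0.2984 with these starting values

# Backward pass, output layer:
# delta_out = dJ/dy_hat * dy_hat/dz = (y_hat - y) * sigma'(z_out)
delta_out = (y_hat - y) * y_hat * (1 - y_hat)
dW2 = np.outer(delta_out, h)                 # dW2[0, 0] corresponds to dJ/dw5

# Backward pass, hidden layer:
# propagate the output error through W2, then through the hidden sigmoid's derivative
delta_hidden = (W2.T @ delta_out) * h * (1 - h)
dW1 = np.outer(delta_hidden, x)

print(round(dW2[0, 0], 6))                   # ~0.082167 with these starting values

# One gradient-descent update (eta = 0.5 here, chosen arbitrarily)
W2 -= 0.5 * dW2
W1 -= 0.5 * dW1
```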
What kind of function is used as activation in the numerical example?
Sigmoid function.