Optimisation Flashcards
What is the goal of optimisation in supervised learning?
To find parameters that minimize the total loss over the dataset.
What is the general form of a supervised learning optimisation objective?
θ* = arg min_θ Σᵢ L(yᵢ, ŷᵢ(θ))
In linear regression, what are we trying to minimize?
The sum of squared differences between predictions and true values.
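As a concrete illustration of the objective above, here is a minimal NumPy sketch of the linear-regression sum-of-squares loss. The dataset, array names (`X`, `y`, `theta`), and the linear model ŷ = Xθ are illustrative assumptions, not something prescribed by the cards.

```python
import numpy as np

def sum_squared_error(theta, X, y):
    """Sum over i of (y_i - y_hat_i)^2, with predictions y_hat = X @ theta."""
    residuals = y - X @ theta
    return np.sum(residuals ** 2)

# Tiny illustrative dataset: a bias column plus one feature.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(sum_squared_error(np.zeros(2), X, y))  # loss at theta = 0 is 4 + 9 + 16 = 29
```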
What does the gradient of a loss function represent?
The direction of steepest increase in loss.
What is the update rule in gradient descent?
θ ← θ - α ∇_θ L
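A minimal sketch of this update rule applied to the squared-error loss from the linear-regression sketch above; the learning rate and step count are arbitrary illustrative values.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.02, steps=2000):
    """Repeatedly apply theta <- theta - alpha * grad of the squared-error loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = -2.0 * X.T @ (y - X @ theta)  # gradient of the sum of squared errors
        theta -= alpha * grad
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(gradient_descent(X, y))  # approaches the exact solution [1, 1]
```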
What does the learning rate α control?
The size of the step taken during each update.
What happens if the learning rate is too small?
The model trains slowly and may take too long to converge.
What happens if the learning rate is too large?
The updates may overshoot the minimum and diverge.
What kind of function guarantees a single global minimum?
A convex function.
Why are neural network loss surfaces non-convex?
Because stacking many nonlinear layers makes the loss a non-convex function of the weights, so the surface contains many local minima and saddle points.
What is stochastic gradient descent (SGD)?
An optimisation method that updates parameters using the gradient of a single example or a small mini-batch rather than the full dataset.
What are the advantages of using SGD?
Each update is cheap to compute, it is more memory efficient, and the gradient noise can help escape shallow local minima and saddle points.
What are the disadvantages of SGD?
Updates are noisy and convergence can be unstable.
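A minimal mini-batch SGD sketch for the same squared-error setup; the batch size, epoch count, and random seed are arbitrary illustrative choices.

```python
import numpy as np

def sgd(X, y, alpha=0.01, batch_size=2, epochs=2000, seed=0):
    """Update theta using gradients from shuffled mini-batches, not the full dataset."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grad = -2.0 * X[idx].T @ (y[idx] - X[idx] @ theta)  # noisy mini-batch gradient
            theta -= alpha * grad
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(sgd(X, y))  # approaches the least-squares solution [1, 1]
```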
What does momentum add to gradient descent?
An inertia term that smooths updates and helps traverse valleys.
What is the formula for the momentum update step?
v_t = βv_{t-1} + (1 - β)∇_θ L; θ ← θ - αv_t
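A minimal sketch of this momentum update applied to the same squared-error gradient; β = 0.9 is a common but illustrative choice, as are the learning rate and step count.

```python
import numpy as np

def momentum_gd(X, y, alpha=0.02, beta=0.9, steps=2000):
    """v_t = beta * v_{t-1} + (1 - beta) * grad;  theta <- theta - alpha * v_t."""
    theta = np.zeros(X.shape[1])
    v = np.zeros_like(theta)
    for _ in range(steps):
        grad = -2.0 * X.T @ (y - X @ theta)
        v = beta * v + (1.0 - beta) * grad  # exponentially averaged gradient ("inertia")
        theta -= alpha * v
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(momentum_gd(X, y))  # approaches [1, 1], with smoother steps than raw gradients
```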
How does Nesterov Accelerated Gradient improve on momentum?
It computes the gradient at the predicted future position for more accurate updates.
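A minimal Nesterov-style sketch on the same quadratic loss: the gradient is evaluated at a look-ahead point θ - αβv rather than at θ. This uses one common (non-EMA) formulation of Nesterov momentum; conventions vary between texts, and the hyperparameter values are illustrative.

```python
import numpy as np

def nesterov_gd(X, y, alpha=0.005, beta=0.9, steps=2000):
    """Evaluate the gradient at the look-ahead point, then apply a momentum update."""
    theta = np.zeros(X.shape[1])
    v = np.zeros_like(theta)
    for _ in range(steps):
        lookahead = theta - alpha * beta * v     # predicted future position
        grad = -2.0 * X.T @ (y - X @ lookahead)  # gradient at the look-ahead point
        v = beta * v + grad
        theta -= alpha * v
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(nesterov_gd(X, y))  # approaches [1, 1]
```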
What is Adam short for?
Adaptive Moment Estimation.
What two ideas does Adam combine?
Momentum (a first-moment estimate) and adaptive per-parameter learning rates (a second-moment estimate, as in RMSProp).
What are the first and second moments in Adam?
The first moment is an exponential moving average of the gradients; the second moment is an exponential moving average of the squared gradients.
What is the Adam update rule?
θ ← θ - α · (m̂_t / (√v̂_t + ε))
Why do we apply bias correction in Adam?
Because the moment estimates are initialised at zero, they are biased towards zero early in training; bias correction compensates for this.
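Putting the Adam cards together, a minimal sketch with the two moment estimates, bias correction, and the update rule above; β₁ = 0.9, β₂ = 0.999, and ε = 1e-8 are the commonly quoted defaults, while the learning rate, step count, and dataset are illustrative.

```python
import numpy as np

def adam(X, y, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    """Adam: bias-corrected EMAs of the gradient (m) and squared gradient (v),
    then theta <- theta - alpha * m_hat / (sqrt(v_hat) + eps)."""
    theta = np.zeros(X.shape[1])
    m = np.zeros_like(theta)  # first moment estimate
    v = np.zeros_like(theta)  # second moment estimate
    for t in range(1, steps + 1):
        grad = -2.0 * X.T @ (y - X @ theta)
        m = beta1 * m + (1 - beta1) * grad       # EMA of gradients
        v = beta2 * v + (1 - beta2) * grad ** 2  # EMA of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction: estimates start at zero
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
print(adam(X, y))  # approaches the least-squares solution [1, 1]
```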
What is the main benefit of Adam over SGD?
It adapts learning rates for each parameter and typically converges faster.
What is a hyperparameter in optimisation?
A setting (like learning rate) that is not learned from data but chosen beforehand.
What are some key hyperparameters in training?
Learning rate, batch size, momentum coefficient, choice of optimiser, and the learning-rate schedule.
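A small illustrative sketch of how these hyperparameters might be gathered into a single training configuration; every key and value here is an assumption for illustration, not something prescribed by the cards.

```python
# Hypothetical training configuration grouping the hyperparameters above.
config = {
    "optimiser": "adam",     # optimiser choice
    "learning_rate": 1e-3,   # step size alpha
    "batch_size": 32,        # mini-batch size for SGD-style updates
    "momentum": 0.9,         # momentum coefficient beta
    "lr_schedule": "cosine", # learning-rate schedule
}
```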