Binary Classification Flashcards

(13 cards)

1
Q

What is mini-batch stochastic gradient descent (SGD) and why is it used?

A

Gradients are computed on small random subsets (mini-batches) of the data rather than on the full dataset; this saves memory, adds noise that can help escape poor local minima, and makes training feasible on large datasets.
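A minimal sketch of the idea, assuming PyTorch; the data, shapes, loss, and learning rate are purely illustrative:

    import torch

    X, y = torch.randn(1000, 10), torch.randn(1000, 1)   # toy data
    w = torch.zeros(10, 1, requires_grad=True)
    batch_size, lr = 64, 0.01

    perm = torch.randperm(len(X))                 # random order each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]      # one mini-batch
        loss = ((X[idx] @ w - y[idx]) ** 2).mean()
        loss.backward()                           # gradient from this batch only
        with torch.no_grad():
            w -= lr * w.grad                      # SGD update
            w.grad.zero_()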

2
Q

What does shuffle=True do in a PyTorch DataLoader?

A

Randomly shuffles the dataset at each epoch to improve model generalisation by presenting data in a different order.
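For example (toy dataset; names are illustrative):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.arange(10))
    loader = DataLoader(ds, batch_size=4, shuffle=True)
    for epoch in range(2):
        print([batch[0].tolist() for batch in loader])   # batch contents differ per epoch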

3
Q

What does num_workers control in a DataLoader?

A

Specifies the number of subprocesses for parallel data loading, speeding up I/O and preprocessing.
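For example (dataset and worker count are illustrative; a good num_workers depends on the machine):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 2, (256,)))
    loader = DataLoader(ds, batch_size=32, num_workers=4)   # 4 loader subprocesses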

4
Q

How do you ensure all batches are the same size in DataLoader?

A

Set drop_last=True to drop the final smaller batch if the dataset size isn’t divisible by batch_size.
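For example, with 100 samples and batch_size=32 (illustrative numbers):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.randn(100, 8))
    print(len(DataLoader(ds, batch_size=32)))                   # 4 batches, last one has only 4 samples
    print(len(DataLoader(ds, batch_size=32, drop_last=True)))   # 3 batches, all of size 32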

5
Q

How do you flatten images of shape (batch, 3, 32, 32) for a linear layer?

A

Use images.flatten(start_dim=1) or images.reshape(batch_size, -1) to get shape (batch, 3072).
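For example:

    import torch

    images = torch.randn(16, 3, 32, 32)          # (batch, channels, height, width)
    flat = images.flatten(start_dim=1)           # keep the batch dimension
    print(flat.shape)                            # torch.Size([16, 3072])
    print(images.reshape(16, -1).shape)          # equivalent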

6
Q

How is a single logistic regression neuron computed?

A

z = Σ_i w_i · x_i + b; then output ŷ = f(z) where f is the activation function.
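A sketch with a single flattened input (random values, illustrative only):

    import torch

    x = torch.randn(3072)              # one flattened image
    w = torch.randn(3072)
    b = torch.tensor(0.)
    z = torch.dot(w, x) + b            # z = Σ_i w_i · x_i + b
    y_hat = torch.sigmoid(z)           # ŷ = f(z), with f = sigmoid for logistic regression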

7
Q

What activation function is used in logistic regression and its formula?

A

Sigmoid: σ(z) = 1 / (1 + e^(-z)), producing outputs between 0 and 1 interpreted as probabilities.
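For example, the manual formula matches torch.sigmoid:

    import torch

    z = torch.tensor([-2.0, 0.0, 2.0])
    manual = 1 / (1 + torch.exp(-z))   # σ(z) = 1 / (1 + e^(-z))
    print(manual)                      # all values in (0, 1)
    print(torch.sigmoid(z))            # same result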

8
Q

What is the binary cross-entropy loss formula in the notebook?

A

Bernoulli likelihood: B(z) = σ(z)^y · (1-σ(z))^(1-y); log-likelihood: ℓ = y·log σ(z) + (1-y)·log(1-σ(z)); the binary cross-entropy loss is its negation: L = -[y·log σ(z) + (1-y)·log(1-σ(z))].
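A quick check of the formula against PyTorch's built-in loss (random toy logits and labels):

    import torch
    import torch.nn.functional as F

    z = torch.randn(5)                       # logits
    y = torch.randint(0, 2, (5,)).float()    # binary targets
    p = torch.sigmoid(z)
    manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
    print(manual, F.binary_cross_entropy(p, y))   # should agree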

9
Q

How are model parameters initialized for logistic regression?

A

weights = torch.randn(3*32*32, requires_grad=True); bias = torch.tensor(0., requires_grad=True). requires_grad=True tells autograd to track gradients for both parameters.
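Putting those parameters to use on one input (an illustrative CIFAR-sized image):

    import torch

    weights = torch.randn(3 * 32 * 32, requires_grad=True)
    bias = torch.tensor(0., requires_grad=True)

    image = torch.randn(3, 32, 32)
    z = image.flatten() @ weights + bias     # linear part of the neuron
    prob = torch.sigmoid(z)                  # predicted probability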

10
Q

What are the steps in one training iteration?

A

1) Compute predictions via model(input); 2) Compute loss; 3) loss.backward(); 4) optimizer.step() (with optimizer.zero_grad() clearing old gradients first; see card 12).
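A minimal sketch of one iteration, assuming an illustrative model, loss, and batch:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.BCEWithLogitsLoss()

    images = torch.randn(8, 3, 32, 32)
    labels = torch.randint(0, 2, (8, 1)).float()

    optimizer.zero_grad()              # clear old gradients (card 12)
    logits = model(images)             # 1) predictions
    loss = loss_fn(logits, labels)     # 2) loss
    loss.backward()                    # 3) gradients
    optimizer.step()                   # 4) parameter update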

11
Q

What does loss.backward() do?

A

Computes gradients of the loss w.r.t. all parameters and stores them in each parameter’s .grad attribute.
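For example:

    import torch

    w = torch.randn(3, requires_grad=True)
    loss = (w ** 2).sum()
    print(w.grad)          # None before the backward pass
    loss.backward()
    print(w.grad)          # now holds dloss/dw = 2 * w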

12
Q

Why call optimizer.zero_grad() before backward pass?

A

Clears old gradients to prevent accumulation from multiple backward calls.
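A small illustration of the accumulation (values are illustrative):

    import torch

    w = torch.ones(1, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.1)

    for _ in range(2):
        loss = (3 * w).sum()
        loss.backward()
        print(w.grad)      # 3., then 6.: gradients accumulate without zero_grad()

    opt.zero_grad()        # clears the gradients (None under the default set_to_none=True)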

13
Q

How do you convert predicted probabilities to class labels?

A

Threshold at 0.5: if σ(z) ≥ 0.5 then class 1, else class 0.
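For example:

    import torch

    probs = torch.tensor([0.1, 0.5, 0.9])
    labels = (probs >= 0.5).long()
    print(labels)          # tensor([0, 1, 1])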
