Binary Classification Flashcards
(13 cards)
What is mini-batch stochastic gradient descent (SGD) and why is it used?
Gradients are computed on small random subsets (mini-batches) of the data rather than the full dataset; this saves memory, adds noise that can help escape poor local minima, and makes training feasible on large datasets.
What does shuffle=True do in a PyTorch DataLoader?
Randomly shuffles the dataset at each epoch to improve model generalisation by presenting data in a different order.
What does num_workers control in a DataLoader?
Specifies the number of subprocesses for parallel data loading, speeding up I/O and preprocessing.
How do you ensure all batches are the same size in DataLoader?
Set drop_last=True to drop the final smaller batch if the dataset size isn’t divisible by batch_size.
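A minimal DataLoader sketch tying the last three cards together; `train_dataset` is a hypothetical dataset of 3×32×32 images (e.g. a torchvision dataset restricted to two classes), not something defined in these cards:

```python
from torch.utils.data import DataLoader

# train_dataset is assumed to exist, e.g. a torchvision dataset of 3x32x32 images
train_loader = DataLoader(
    train_dataset,
    batch_size=128,   # size of each mini-batch
    shuffle=True,     # reshuffle the data at the start of every epoch
    num_workers=2,    # subprocesses used for parallel data loading/preprocessing
    drop_last=True,   # drop the final batch if it is smaller than batch_size
)

for images, labels in train_loader:
    # images: (128, 3, 32, 32); labels: (128,)
    pass
```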
How do you flatten images of shape (batch, 3, 32, 32) for a linear layer?
Use tensor.flatten(start_dim=1) or images.reshape(batch_size, -1) to get shape (batch, 3072).
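Both flattening options on a hypothetical batch:

```python
import torch

images = torch.randn(128, 3, 32, 32)            # hypothetical batch of images
flat_a = images.flatten(start_dim=1)            # shape (128, 3072)
flat_b = images.reshape(images.shape[0], -1)    # same result, shape (128, 3072)
assert flat_a.shape == flat_b.shape == (128, 3 * 32 * 32)
```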
How is a single logistic regression neuron computed?
z = Σ_i w_i · x_i + b; then output ŷ = f(z) where f is the activation function.
What activation function is used in logistic regression and its formula?
Sigmoid: σ(z) = 1 / (1 + e^(-z)), producing outputs between 0 and 1 interpreted as probabilities.
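A sketch of the two cards above for one flattened input; the weights and bias here are just illustrative random values:

```python
import torch

x = torch.randn(3 * 32 * 32)      # one flattened image (hypothetical input)
w = torch.randn(3 * 32 * 32)      # one weight per input value
b = torch.tensor(0.)              # scalar bias

z = torch.dot(w, x) + b           # z = sum_i w_i * x_i + b
y_hat = torch.sigmoid(z)          # sigma(z) = 1 / (1 + exp(-z)), a value in (0, 1)
```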
What is the binary cross-entropy loss formula in the notebook?
Bernoulli likelihood B(z) = σ(z)^y · (1-σ(z))^(1-y); its log-likelihood is y·log σ(z) + (1-y)·log(1-σ(z)); the BCE loss is the negative log-likelihood: L = -[y·log σ(z) + (1-y)·log(1-σ(z))].
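A sketch of the likelihood/loss for a single example; clamping σ(z) away from 0 and 1 is an added numerical-stability assumption, and the built-in call at the end is shown only as a cross-check:

```python
import torch
import torch.nn.functional as F

z = torch.tensor(0.3)    # hypothetical logit
y = torch.tensor(1.0)    # true label, 0.0 or 1.0

p = torch.sigmoid(z).clamp(1e-7, 1 - 1e-7)               # sigma(z), clamped for stability
log_likelihood = y * torch.log(p) + (1 - y) * torch.log(1 - p)
bce_loss = -log_likelihood                                # BCE = negative log-likelihood

bce_builtin = F.binary_cross_entropy_with_logits(z, y)   # PyTorch's built-in equivalent
```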
How are model parameters initialized for logistic regression?
weights = torch.randn(3*32*32, requires_grad=True); bias = torch.tensor(0., requires_grad=True).
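The same initialization written out as runnable code (3*32*32 = 3072 matches the flattened image size from the earlier cards):

```python
import torch

weights = torch.randn(3 * 32 * 32, requires_grad=True)   # one weight per flattened input value
bias = torch.tensor(0., requires_grad=True)              # scalar bias
```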
What are the steps in one training iteration?
1) Compute predictions via model(input); 2) Compute loss; 3) loss.backward(); 4) optimizer.step().
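A minimal sketch of one full iteration using the manually created weights/bias above; the mini-batch, optimizer choice, and learning rate are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD([weights, bias], lr=1e-3)

images = torch.randn(128, 3, 32, 32)                    # hypothetical mini-batch
labels = torch.randint(0, 2, (128,)).float()            # hypothetical binary labels

optimizer.zero_grad()                                   # clear gradients from the last step
inputs = images.flatten(start_dim=1)                    # (128, 3072)
logits = inputs @ weights + bias                        # forward pass: one z per example
loss = F.binary_cross_entropy_with_logits(logits, labels)
loss.backward()                                         # compute gradients
optimizer.step()                                        # update weights and bias
```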
What does loss.backward() do?
Computes gradients of the loss w.r.t. all parameters and stores them in each parameter’s .grad attribute.
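A tiny standalone example of inspecting .grad after backward():

```python
import torch

w = torch.randn(5, requires_grad=True)
loss = (w ** 2).sum()     # a simple scalar loss
loss.backward()           # fills w.grad with d(loss)/dw
print(w.grad)             # equals 2 * w here
```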
Why call optimizer.zero_grad() before the backward pass?
Clears old gradients to prevent accumulation from multiple backward calls.
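A small demonstration of the accumulation behaviour: without clearing gradients, a second backward() adds to the existing .grad (zeroing the .grad tensor here stands in for what optimizer.zero_grad() does for each parameter):

```python
import torch

w = torch.randn(3, requires_grad=True)

w.sum().backward()
print(w.grad)         # tensor([1., 1., 1.])

w.sum().backward()
print(w.grad)         # tensor([2., 2., 2.]) -- gradients accumulated

w.grad.zero_()        # roughly what optimizer.zero_grad() does per parameter
w.sum().backward()
print(w.grad)         # tensor([1., 1., 1.]) again
```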
How do you convert predicted probabilities to class labels?
Threshold at 0.5: if σ(z) ≥ 0.5 then class 1, else class 0.
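Converting probabilities to hard labels and computing accuracy for a hypothetical batch:

```python
import torch

probs = torch.tensor([0.1, 0.7, 0.5, 0.3])    # hypothetical sigmoid outputs
labels = torch.tensor([0, 1, 1, 1])

preds = (probs >= 0.5).long()                 # threshold at 0.5: class 1 if sigma(z) >= 0.5
accuracy = (preds == labels).float().mean()
print(preds)                                  # tensor([0, 1, 1, 0])
print(accuracy)                               # tensor(0.7500)
```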