Binary Classification Flashcards

(13 cards)

1
Q

What is mini-batch stochastic gradient descent (SGD) and why is it used?

A

Gradients are computed on small random subsets (mini-batches) of the data rather than on the full dataset; this saves memory, adds noise that can help escape poor local minima, and makes training feasible on large datasets.
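A minimal sketch of the idea, assuming PyTorch; the data, shapes, loss, and learning rate are purely illustrative:

    import torch

    X, y = torch.randn(1000, 10), torch.randn(1000, 1)   # toy data
    w = torch.zeros(10, 1, requires_grad=True)
    batch_size, lr = 64, 0.01

    perm = torch.randperm(len(X))                 # random order each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]      # one mini-batch
        loss = ((X[idx] @ w - y[idx]) ** 2).mean()
        loss.backward()                           # gradient from this batch only
        with torch.no_grad():
            w -= lr * w.grad                      # SGD update
            w.grad.zero_()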

2
Q

What does shuffle=True do in a PyTorch DataLoader?

A

Randomly shuffles the dataset at each epoch to improve model generalisation by presenting data in a different order.
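For example (toy dataset; names are illustrative):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.arange(10))
    loader = DataLoader(ds, batch_size=4, shuffle=True)
    for epoch in range(2):
        print([batch[0].tolist() for batch in loader])   # batch contents differ per epoch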

3
Q

What does num_workers control in a DataLoader?

A

Specifies the number of subprocesses for parallel data loading, speeding up I/O and preprocessing.
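For example (dataset and worker count are illustrative; a good num_workers depends on the machine):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 2, (256,)))
    loader = DataLoader(ds, batch_size=32, num_workers=4)   # 4 loader subprocesses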

4
Q

How do you ensure all batches are the same size in DataLoader?

A

Set drop_last=True to drop the final smaller batch if the dataset size isn’t divisible by batch_size.
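For example, with 100 samples and batch_size=32 (illustrative numbers):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.randn(100, 8))
    print(len(DataLoader(ds, batch_size=32)))                   # 4 batches, last one has only 4 samples
    print(len(DataLoader(ds, batch_size=32, drop_last=True)))   # 3 batches, all of size 32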

5
Q

How do you flatten images of shape (batch, 3, 32, 32) for a linear layer?

A

Use images.flatten(start_dim=1) or images.reshape(batch_size, -1) to get shape (batch, 3072).
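For example:

    import torch

    images = torch.randn(16, 3, 32, 32)          # (batch, channels, height, width)
    flat = images.flatten(start_dim=1)           # keep the batch dimension
    print(flat.shape)                            # torch.Size([16, 3072])
    print(images.reshape(16, -1).shape)          # equivalent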

6
Q

How is a single logistic regression neuron computed?

A

z = Σ_i w_i · x_i + b; then output ŷ = f(z) where f is the activation function.
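A sketch with a single flattened input (random values, illustrative only):

    import torch

    x = torch.randn(3072)              # one flattened image
    w = torch.randn(3072)
    b = torch.tensor(0.)
    z = torch.dot(w, x) + b            # z = Σ_i w_i · x_i + b
    y_hat = torch.sigmoid(z)           # ŷ = f(z), with f = sigmoid for logistic regression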

7
Q

What activation function is used in logistic regression and its formula?

A

Sigmoid: σ(z) = 1 / (1 + e^(-z)), producing outputs between 0 and 1 interpreted as probabilities.
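For example, the manual formula matches torch.sigmoid:

    import torch

    z = torch.tensor([-2.0, 0.0, 2.0])
    manual = 1 / (1 + torch.exp(-z))   # σ(z) = 1 / (1 + e^(-z))
    print(manual)                      # all values in (0, 1)
    print(torch.sigmoid(z))            # same result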

8
Q

What is the binary cross-entropy loss formula in the notebook?

A

Bernoulli likelihood: B(z) = σ(z)^y · (1-σ(z))^(1-y); log-likelihood: ℓ = y·log σ(z) + (1-y)·log(1-σ(z)); the binary cross-entropy loss is its negation: L = -[y·log σ(z) + (1-y)·log(1-σ(z))].
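A quick check of the formula against PyTorch's built-in loss (random toy logits and labels):

    import torch
    import torch.nn.functional as F

    z = torch.randn(5)                       # logits
    y = torch.randint(0, 2, (5,)).float()    # binary targets
    p = torch.sigmoid(z)
    manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
    print(manual, F.binary_cross_entropy(p, y))   # should agree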

9
Q

How are model parameters initialized for logistic regression?

A

weights = torch.randn(3*32*32, requires_grad=True); bias = torch.tensor(0., requires_grad=True). requires_grad=True tells autograd to track gradients for both parameters.
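Putting those parameters to use on one input (an illustrative CIFAR-sized image):

    import torch

    weights = torch.randn(3 * 32 * 32, requires_grad=True)
    bias = torch.tensor(0., requires_grad=True)

    image = torch.randn(3, 32, 32)
    z = image.flatten() @ weights + bias     # linear part of the neuron
    prob = torch.sigmoid(z)                  # predicted probability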

10
Q

What are the steps in one training iteration?

A

1) Compute predictions via model(input); 2) Compute loss; 3) loss.backward(); 4) optimizer.step() (with optimizer.zero_grad() clearing old gradients first; see card 12).
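A minimal sketch of one iteration, assuming an illustrative model, loss, and batch:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.BCEWithLogitsLoss()

    images = torch.randn(8, 3, 32, 32)
    labels = torch.randint(0, 2, (8, 1)).float()

    optimizer.zero_grad()              # clear old gradients (card 12)
    logits = model(images)             # 1) predictions
    loss = loss_fn(logits, labels)     # 2) loss
    loss.backward()                    # 3) gradients
    optimizer.step()                   # 4) parameter update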

11
Q

What does loss.backward() do?

A

Computes gradients of the loss w.r.t. all parameters and stores them in each parameter’s .grad attribute.
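For example:

    import torch

    w = torch.randn(3, requires_grad=True)
    loss = (w ** 2).sum()
    print(w.grad)          # None before the backward pass
    loss.backward()
    print(w.grad)          # now holds dloss/dw = 2 * w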

12
Q

Why call optimizer.zero_grad() before backward pass?

A

Clears old gradients to prevent accumulation from multiple backward calls.
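A small illustration of the accumulation (values are illustrative):

    import torch

    w = torch.ones(1, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.1)

    for _ in range(2):
        loss = (3 * w).sum()
        loss.backward()
        print(w.grad)      # 3., then 6.: gradients accumulate without zero_grad()

    opt.zero_grad()        # clears the gradients (None under the default set_to_none=True)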

13
Q

How do you convert predicted probabilities to class labels?

A

Threshold at 0.5: if σ(z) ≥ 0.5 then class 1, else class 0.
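For example:

    import torch

    probs = torch.tensor([0.1, 0.5, 0.9])
    labels = (probs >= 0.5).long()
    print(labels)          # tensor([0, 1, 1])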
