L4 Flashcards

(23 cards)

1
Q

What is the primary objective of Linear Regression?

A

Linear Regression models the relationship between features and a continuous target

Minimize the difference between the predicted ŷ and the actual y

This is achieved through the adjustment of weights and bias.

2
Q

What are the parameters of a Linear Regression model?

A

Weights (w) — show how much the prediction changes per unit of a feature → positive w (more of this feature, higher predicted y) / negative w (more of this feature, lower predicted y). The weights control the slope of the fitted line.

Bias (b) — sets the line's intercept: the predicted value when every feature is 0; changing b shifts the whole line up or down.

(In logistic regression the same parameters play analogous roles: w controls the tilt/steepness of the S-shaped curve, and b sets where the 50% probability point (decision boundary) lies: ŷ = 0.5 where wx + b = 0. Increase b (make it less negative) → the curve slides left → a smaller x reaches 50%. Decrease b (more negative) → the curve slides right → a larger x is needed.)

Weights indicate the importance of features, while bias adjusts the line's intercept.
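A minimal sketch of how w and b produce a prediction (assuming NumPy; the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical example: 2 features, hand-picked weights and bias
w = np.array([3.0, -1.5])   # positive weight raises y_hat, negative lowers it
b = 2.0                     # intercept: the prediction when all features are 0

x = np.array([1.0, 4.0])    # one sample
y_hat = np.dot(w, x) + b    # linear prediction: w·x + b
print(y_hat)                # 3.0*1.0 - 1.5*4.0 + 2.0 = -1.0
```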

3
Q

Define hyperparameters in the context of Linear Regression.

A

Variables whose values are set before the training process begins — they are not learned from the data.

Examples include the regularization strength and the learning rate (in other models, e.g. k-NN, the number of neighbors).

4
Q

What is a loss function?

A

Measures how far off a prediction is for a single training example.

It is what you are trying to minimize for a single training example to achieve your objective.

Example: squared loss, (ŷ − y)² for one example; averaging it over all examples gives the cost function.

5
Q

What is the cost function in Linear Regression?

A

Average of the loss over the entire training set, e.g. Mean Squared Error (MSE): MSE = (1/n) Σᵢ (ŷᵢ − yᵢ)²

The average squared difference between what we predicted and what it really was.

It quantifies the overall prediction error.
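A small sketch of MSE as code (assuming NumPy; the values are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the per-example squared losses."""
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])
print(mse(y_true, y_pred))  # (0.25 + 0 + 4) / 3 ≈ 1.417
```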

6
Q

What is an objective function?

A

Any function that you optimize during training (e.g. maximum likelihood, divergence between classes)
Note: a loss function is a part of a cost function, which is in turn a type of objective function.

Examples include maximum likelihood and divergence between classes.

7
Q

How does gradient descent work in Linear Regression?

A

Adjusts weights to minimize the cost function (optimization)

The “slope” (gradient) tells us how much we should adjust w and b to make the loss smaller.

It uses the slope to determine adjustments needed.
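A minimal sketch of gradient descent for single-feature linear regression (assuming NumPy; the MSE gradients ∂J/∂w = (2/n)Σ(ŷ−y)x and ∂J/∂b = (2/n)Σ(ŷ−y) are standard):

```python
import numpy as np

def gradient_step(w, b, X, y, lr=0.01):
    """One gradient-descent update for y = w*x + b under MSE."""
    error = (w * X + b) - y
    dw = 2 * np.mean(error * X)   # slope of the cost with respect to w
    db = 2 * np.mean(error)       # slope of the cost with respect to b
    return w - lr * dw, b - lr * db

X = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])     # true relationship: y = 2x
w, b = 0.0, 0.0
for _ in range(1000):
    w, b = gradient_step(w, b, X, y)
print(w, b)                       # w approaches 2, b approaches 0
```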

8
Q

What does the learning rate control in gradient descent?

A

The size of the steps taken along the gradient.

Controls how big each step is (too big = overshoot the minimum, too small = very slow convergence).

Affects the speed and stability of convergence.

9
Q

What is the purpose of logistic regression?

A

To model categorical outcomes, especially in binary classification

Linear regression cannot model categorical outcomes well (e.g., pass/fail) → logistic regression solves this using the sigmoid (logistic) function, which squashes the prediction into the range (0, 1).
- It assumes a particular functional form: a sigmoid applied to a linear function of the data.

It uses a sigmoid function to output probabilities.

10
Q

What is the output range of the logistic regression model?

A

(0, 1) → interpretable as a probability.

When z is large and positive → output close to 1 (yes)
When z is large and negative → output close to 0 (no)
When z = 0 → output = 0.5 (unsure)

Note: logistic regression can handle both continuous and discrete features, and can be extended to multi-class problems.

This range is interpretable as probabilities.

11
Q

What is the decision boundary in logistic regression?

A

The point where the predicted probability = 0.5, i.e. where wx + b = 0

It indicates the threshold for classifying outcomes.

12
Q

State the formula for the logistic function.

A

y = 1 / (1 + e^(-z))

where z = wᵀx + b is the linear combination of the inputs.
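The formula as code (assuming NumPy):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ≈ [0.0000454, 0.5, 0.9999546]
```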

13
Q

What does cross-entropy loss measure in logistic regression?

A

The penalty for incorrect predictions, which is largest when the model is confidently wrong. For one example: L = −[y log ŷ + (1 − y) log(1 − ŷ)]

It emphasizes the cost of being wrong with high certainty.
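A sketch of that single-example loss as code (assuming NumPy; the probabilities are illustrative):

```python
import numpy as np

def cross_entropy(y_true, y_hat):
    """Binary cross-entropy: -[y*log(y_hat) + (1-y)*log(1-y_hat)]."""
    return -(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))

print(cross_entropy(1, 0.99))  # ≈ 0.01: confident and right -> tiny loss
print(cross_entropy(1, 0.01))  # ≈ 4.61: confident and wrong -> huge loss
```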

14
Q

Define entropy in the context of probability distributions.

A

Measures the uncertainty in a distribution: H(p) = −Σᵢ pᵢ log pᵢ (log base 2 gives entropy in bits)

Max entropy = most uncertain (e.g., [0.5, 0.5])
Min entropy = most certain (e.g., [1, 0])

Max entropy indicates maximum uncertainty, while min entropy indicates certainty.
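A quick check of the two extremes (assuming NumPy; log base 2, with 0·log 0 taken as 0):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum(p * log2(p))."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                  # skip zeros to avoid log(0)
    return -np.sum(nz * np.log2(nz))

print(entropy([0.5, 0.5]))  # 1.0 bit: maximum uncertainty
print(entropy([1.0, 0.0]))  # 0.0 bits: complete certainty
```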

15
Q

What is regularization used for?

A

To prevent overfitting, especially when training data is limited

Regularization is applied by adding a penalty term to the loss function → this penalty makes it “costly” for the model to have very large weights.

  • Prevents the model from being too sensitive to training data (prevents overfitting).
  • Encourages simpler models that are more likely to perform well on new data.
  • Helps especially when you have many features or limited data.

It adds a penalty term to the loss function to control weight sizes.
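A sketch of the penalty being added to the cost (L2 shown; the λ value and names are illustrative, assuming NumPy):

```python
import numpy as np

def ridge_cost(w, b, X, y, lam=0.1):
    """MSE plus an L2 penalty: large weights now make the cost 'expensive'."""
    y_hat = X @ w + b
    mse = np.mean((y_hat - y) ** 2)
    penalty = lam * np.sum(w ** 2)   # grows quadratically with weight size
    return mse + penalty

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
print(ridge_cost(np.array([0.5, 0.5]), 0.0, X, y))  # 1.25 (MSE) + 0.05 (penalty)
```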

16
Q

What are the two types of regularization mentioned?

A

L1 (Lasso) and L2 (Ridge)

L1 encourages sparsity; L2 discourages large weights.

17
Q

What does L1 regularization do?

A

Encourages some weights to become exactly zero.

  • Can be used for feature selection — less important features are “removed” by driving their weights to zero.

By contrast, L2:
  • Penalizes large weights, encouraging all weights to be small but not exactly zero.
  • Helps smooth the model and prevent it from fitting noise.

This can be useful for feature selection.
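A sketch contrasting the two on the same synthetic data (assuming scikit-learn's Lasso and Ridge; the α value is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)  # only 2 features matter

lasso = Lasso(alpha=0.1).fit(X, y)   # L1
ridge = Ridge(alpha=0.1).fit(X, y)   # L2

print(np.sum(lasso.coef_ == 0))  # irrelevant coefficients driven exactly to 0
print(np.sum(ridge.coef_ == 0))  # typically 0: weights shrink but stay nonzero
```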

18
Q

Describe the One-vs-Rest (OvR) approach in multi-class logistic regression.

A

Train a classifier for each class vs. all the others.

How it works:
For K classes, train K separate binary classifiers.
Classifier k predicts whether the sample is in class k or not.
Prediction:
For a new input, run all K classifiers and choose the class with the highest predicted probability.

Example:
For classes A, B, C:
Classifier 1: A vs. not A
Classifier 2: B vs. not B
Classifier 3: C vs. not C

Each classifier predicts membership in its respective class.
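A sketch using scikit-learn's OneVsRestClassifier around logistic regression (the iris dataset is a stand-in with K = 3 classes):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)   # 3 classes -> 3 binary classifiers
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))         # 3: one "class k vs. rest" model each
print(ovr.predict(X[:2]))           # class with the highest probability wins
```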

19
Q

What is the One-vs-One (OvO) approach?

A

Train a classifier for each pair of classes.

How it works:
For K classes, train a separate classifier for each pair of classes (K(K−1)/2 classifiers in total).
Each classifier distinguishes between just two classes at a time.
Prediction:
For a new input, let all the classifiers vote, and pick the class that wins the most votes.
Less common for logistic regression; more common with SVMs.
E.g., for 4 classes: 1v2, 1v3, 1v4, 2v3, 2v4, 3v4 → 4·3/2 = 6 binary classifiers, each trained on only the fraction of the data belonging to its two classes.

Total classifiers = K(K−1)/2, where K is the number of classes.
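A quick sketch of the pair enumeration and the K(K−1)/2 count (standard library only):

```python
from itertools import combinations

classes = ["1", "2", "3", "4"]
pairs = list(combinations(classes, 2))   # every unordered pair of classes
print(pairs)       # [('1', '2'), ('1', '3'), ('1', '4'), ('2', '3'), ('2', '4'), ('3', '4')]
print(len(pairs))  # 6 == 4 * 3 / 2
```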

20
Q

List the pros of Logistic Regression.

A
  • Quick to train
  • Extendable to multi-class problems
  • Less prone to overfitting when regularization is used
  • Coefficients are interpretable

These features make it a popular choice for classification tasks.

21
Q

List the cons of Logistic Regression.

A
  • Assumes a linear decision boundary
  • Not suitable for very complex decision surfaces

These limitations can affect performance in certain scenarios.

22
Q

What is the goal of gradient descent?

A
  • Find weights w that minimize the cross-entropy loss
  • Use the chain rule to differentiate the sigmoid and the loss
  • Learning rate λ: controls the step size

If the true label is 1:
- If the prediction ŷ is close to 1, loss ≈ 0 (good!)
- If the prediction ŷ is close to 0, loss is very large (bad!)

If the true label is 0:
- If the prediction ŷ is close to 0, loss ≈ 0 (good!)
- If the prediction ŷ is close to 1, loss is very large (bad!)

Interpretation:
The further your predicted probability is from the actual class, the more you’re penalized.
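Putting the pieces together — a minimal logistic-regression training loop (assuming NumPy; the data and learning rate are illustrative). Via the chain rule, the gradient of the cross-entropy through the sigmoid collapses to (ŷ − y)·x:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-feature data: label is 1 roughly when x > 2 (hypothetical)
X = np.array([[0.5], [1.0], [1.5], [2.5], [3.0], [3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = np.zeros(X.shape[1]), 0.0
lr = 0.5                              # learning rate (the card's lambda)

for _ in range(2000):
    y_hat = sigmoid(X @ w + b)
    dw = X.T @ (y_hat - y) / len(y)   # chain rule: sigmoid + cross-entropy
    db = np.mean(y_hat - y)
    w -= lr * dw
    b -= lr * db

X_new = np.array([[1.0], [3.0]])
print(sigmoid(X_new @ w + b))         # low probability for x=1, high for x=3
```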

23
Q

What is the softmax approach?

A
  • Multinomial logistic regression: a single model for all classes, rather than many binary classifiers
  • Directly computes a probability for each class via the softmax function: softmax(z)_k = e^(z_k) / Σ_j e^(z_j)
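A sketch of softmax with the usual max-subtraction for numerical stability (assuming NumPy):

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    z = z - np.max(z)   # stability trick: does not change the result
    e = np.exp(z)
    return e / np.sum(e)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.659, 0.242, 0.099], sums to 1
```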