L2 Linear Regression Flashcards
(6 cards)
Linear Regression, definition
Predict ŷ ∈ R (label, response) from x ∈ R^d (features, covariates)
Least squares model: ŷ = w_1^⊤ x + w_2 (bias), where w_1 ∈ R^d and w_2 ∈ R (that is, w ∈ R^{d+1})
Learning: choose (w_1, w_2) based on data (x^{(i)}, y^{(i)})_{i=1}^N.
Prediction: given x, predict ŷ = w_1^⊤ x + w_2.
- Closed form solution
- Gaussian probability model
- Ideal for regression, often not well suited for classification
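A minimal numpy sketch of this prediction rule and the bias-augmentation trick behind w ∈ R^{d+1} (the dimensions and variable names are illustrative, not from the card):

```python
import numpy as np

# Illustrative sizes (not from the card): N = 5 samples, d = 3 features.
X_raw = np.random.randn(5, 3)        # rows are the x^{(i)} in R^d
w1, w2 = np.random.randn(3), 0.5     # weights w_1 in R^d and bias w_2

# Prediction rule: y_hat = w_1^T x + w_2, applied to every sample at once.
y_hat = X_raw @ w1 + w2

# Equivalent augmented form: append a constant 1 feature so that w in R^{d+1}.
X = np.hstack([X_raw, np.ones((5, 1))])
w = np.append(w1, w2)
assert np.allclose(X @ w, y_hat)
```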
Linear Regression, learning
arg min_{w_1 ∈ R^d, w_2 ∈ R} 1/N Σ^N_{i=1} 1/2 (w_1^⊤ x^{(i)} + w_2 − y^{(i)})^2
Simplification: arg min_{w ∈ R^{d+1}} 1/2 ||Xw − y||^2_2, where the rows of X are (x^{(i)⊤}, 1) and y stacks the labels (dropping the 1/N factor does not change the minimizer)
Solving (setting the gradient to zero) gives the OLS (ordinary least squares) estimator ŵ = (X^⊤ X)^{−1} X^⊤ y (when the inverse exists)
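A short numpy sketch of the closed-form OLS fit on synthetic data (the data-generation step and all variable names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 3
X_raw = rng.normal(size=(N, d))
w_true, b_true = np.array([1.0, -2.0, 0.5]), 0.3
y = X_raw @ w_true + b_true + 0.1 * rng.normal(size=N)   # noisy linear labels

# Augmented design matrix: the bias becomes the last component of w.
X = np.hstack([X_raw, np.ones((N, 1))])

# OLS estimator w_hat = (X^T X)^{-1} X^T y.
# Solving the normal equations is preferred to forming the inverse explicitly.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)   # should be close to [1.0, -2.0, 0.5, 0.3]
```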
Linear Regression, problems/solutions
Problem: the OLS formula cannot be used when X^⊤ X is not invertible; this happens in particular when N < d + 1 (fewer samples than parameters)
- Pseudoinverse: ŵ = (X^⊤ X)^† X^⊤ y = X^† y, which still satisfies the “derivative condition”, i.e. (X^⊤ X)ŵ = X^⊤ y
- Ridge regression (regularisation, ensuring X^⊤ X + λI has no zero eigenvalues): arg min_{w ∈ R^{d+1}} 1/2 ∥Xw − y∥^2_2 + λ/2 ∥w∥^2_2, giving w̃ = (X^⊤ X + λI)^{−1} X^⊤ y
As λ → ∞, the components of w̃ shrink to 0 (see the sketch of both fixes below).
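A numpy sketch of both fixes, assuming an augmented design matrix X and label vector y as on the previous card (the λ value and the under-determined example are arbitrary choices for illustration):

```python
import numpy as np

def ols_pinv(X, y):
    # Pseudoinverse solution X^+ y; defined even when X^T X is singular (e.g. N < d+1).
    return np.linalg.pinv(X) @ y

def ridge(X, y, lam):
    # Ridge estimator (X^T X + lambda I)^{-1} X^T y; lambda > 0 makes the matrix invertible.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Under-determined example: fewer samples than parameters (N < d + 1).
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(4, 6)), np.ones((4, 1))])
y = rng.normal(size=4)
print(ols_pinv(X, y))      # minimum-norm least squares solution
print(ridge(X, y, 1e-2))   # regularised solution; shrinks toward 0 as lam grows
```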
Linear Regression, justifications/interpretations
- Geometric interpretation: the residual Xŵ − y is orthogonal to span(z_1, ..., z_{d+1}), the columns of X, because X^⊤(Xŵ − y) = 0.
- Probabilistic model: y | x ~ N(w^⊤ x, σ^2); solving least squares = maximizing the likelihood (see the derivation sketch after this list)
- Loss minimization (ERM)
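A sketch of the standard Gaussian-likelihood argument behind the probabilistic interpretation (the algebra is not spelled out on the card):

```latex
% Negative log-likelihood under y^{(i)} | x^{(i)} ~ N(w^T x^{(i)}, sigma^2), i.i.d.:
\[
  -\log \prod_{i=1}^{N} p\!\left(y^{(i)} \mid x^{(i)}, w\right)
  = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y^{(i)} - w^\top x^{(i)}\right)^2
    + \frac{N}{2}\log\!\left(2\pi\sigma^2\right)
\]
% The second term is constant in w, so maximizing the likelihood is exactly
% minimizing the least squares objective from the "learning" card.
```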
Empirical Risk Minimization
l_ls(y, ŷ) = 1/2 (y − ŷ)^2 is the least squares loss
ERM: arg min_f 1/N Σ^N_{i=1} l(y^{(i)}, f(x^{(i)}))
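A small sketch of the ERM objective with a pluggable loss (the helper names squared_loss and empirical_risk are made up for illustration):

```python
import numpy as np

def squared_loss(y, y_hat):
    # l_ls(y, y_hat) = 1/2 (y - y_hat)^2
    return 0.5 * (y - y_hat) ** 2

def empirical_risk(loss, f, X_raw, y):
    # (1/N) sum_i l(y^{(i)}, f(x^{(i)}))
    return np.mean([loss(yi, f(xi)) for xi, yi in zip(X_raw, y)])

# Example: empirical risk of a fixed (untrained) linear predictor.
rng = np.random.default_rng(2)
X_raw, y = rng.normal(size=(10, 3)), rng.normal(size=10)
w1, w2 = np.zeros(3), 0.0
print(empirical_risk(squared_loss, lambda x: w1 @ x + w2, X_raw, y))
```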
Least squares classification
Suppose y ∈ {−1, +1} (i.e. binary classification) with classification error loss 1[y ≠ ŷ] ≈ 1[yŷ ≤ 0]
Strategy: choose w to minimize the least squares loss l_ls(y, ŷ) = (y − ŷ)^2/2.
If y ∈ {−1, +1}, then (y − ŷ)^2/2 = y^2 (1 − yŷ)^2/2 = (1 − yŷ)^2/2, since y^2 = 1.
Predict sgn(ŵ^⊤ x) ∈ {−1, +1}
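A numpy sketch of least squares classification on synthetic ±1 labels (the data-generating rule is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 200, 2
X_raw = rng.normal(size=(N, d))
y = np.where(X_raw[:, 0] + 0.5 * X_raw[:, 1] > 0, 1.0, -1.0)   # labels in {-1, +1}

# Fit ordinary least squares directly on the +/-1 labels (with a bias column).
X = np.hstack([X_raw, np.ones((N, 1))])
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Classify with the sign of the linear score.
y_pred = np.sign(X @ w_hat)
print("training accuracy:", np.mean(y_pred == y))
```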