L2 Linear Regression Flashcards

(6 cards)

1
Q

Linear Regression, definition

A

Predict \hat{y} \in R (label, response) from x \in R^d (features, covariates).

Linear model (fit by least squares): \hat{y} = w_1^\top x + w_2 (bias), where w_1 \in R^d and w_2 \in R (equivalently, a single parameter vector w \in R^{d+1}).

Learning: choose (w_1, w_2) based on data ((x^{(i)}, y^{(i)}))_{i=1}^N.
Prediction: given x, output \hat{y} = w_1^\top x + w_2.

  • Closed-form solution
  • Gaussian probability model
  • Ideal for regression; often not well suited for classification
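
A minimal NumPy sketch of the prediction step (all names and data are illustrative, not from the original card): appending a constant 1 to x folds the bias w_2 into a single weight vector w \in R^{d+1}.

    import numpy as np

    d = 3
    rng = np.random.default_rng(0)
    w1 = rng.normal(size=d)            # weights, w_1 in R^d
    w2 = 0.5                           # bias, w_2 in R
    x = rng.normal(size=d)             # one feature vector

    y_hat = w1 @ x + w2                # prediction y_hat = w_1^T x + w_2

    # Equivalent form with the bias folded in: w in R^{d+1}, x augmented with 1
    w = np.concatenate([w1, [w2]])
    x_aug = np.concatenate([x, [1.0]])
    assert np.isclose(y_hat, w @ x_aug)
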
2
Q

Linear Regression, learning

A

arg min_{w_1 \in R^d, w_2 \in R} (1/N) \sum_{i=1}^N (1/2) (w_1^\top x^{(i)} + w_2 - y^{(i)})^2

Matrix form (append a 1 to each x^{(i)} and stack the rows into X \in R^{N \times (d+1)}): arg min_{w \in R^{d+1}} (1/2) \|Xw - y\|_2^2

Setting the gradient to zero gives the ordinary least squares (OLS) estimator \hat{w} = (X^\top X)^{-1} X^\top y (when the inverse exists).
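
A hedged NumPy sketch of the closed form on synthetic data (in practice, solving the normal equations or using np.linalg.lstsq is preferred over forming an explicit inverse):

    import numpy as np

    rng = np.random.default_rng(0)
    N, d = 50, 3
    X_raw = rng.normal(size=(N, d))
    y = X_raw @ np.array([1.0, -2.0, 0.5]) + 0.3 + 0.1 * rng.normal(size=N)

    # Augment with a column of ones so the bias is the last coordinate of w
    X = np.hstack([X_raw, np.ones((N, 1))])        # N x (d+1)

    # OLS estimator: solve the normal equations (X^T X) w = X^T y
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Equivalent, numerically safer route
    w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    assert np.allclose(w_hat, w_lstsq)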

3
Q

Linear Regression, problems/solutions

A

Problem: X^\top X may not be invertible, so (X^\top X)^{-1} does not exist; this necessarily happens when N < d+1 (fewer samples than parameters). Two fixes:

  1. Pseudoinverse: \hat{w} = (X^\top X)^\dagger X^\top y = X^\dagger y, which still satisfies the "derivative condition" (normal equations) (X^\top X)\hat{w} = X^\top y.
  2. Ridge regression (regularisation, which ensures there are no zero eigenvalues): arg min_{w \in R^{d+1}} (1/2)\|Xw - y\|_2^2 + (\lambda/2)\|w\|_2^2, giving \tilde{w} = (X^\top X + \lambda I)^{-1} X^\top y.

As \lambda \to \infty, \tilde{w} \to 0 (all coefficients are shrunk towards zero).
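
A small sketch of the ridge closed form on assumed synthetic data with N < d+1, exactly the regime where plain OLS breaks down:

    import numpy as np

    rng = np.random.default_rng(1)
    N, d = 5, 10                                   # fewer samples than parameters
    X = np.hstack([rng.normal(size=(N, d)), np.ones((N, 1))])
    y = rng.normal(size=N)

    lam = 0.1
    # Ridge estimator: (X^T X + lambda I)^{-1} X^T y; adding lambda I makes the
    # matrix positive definite, hence invertible, even though N < d+1
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ y)

    # Pseudoinverse (minimum-norm least squares) solution for comparison
    w_pinv = np.linalg.pinv(X) @ y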

4
Q

Linear Regression, justifications/interpretations

A
  • Geometric interpretation: the residual X\hat{w} − y is orthogonal to span(z_1, ..., z_{d+1}), the column span of X, because X^\top(X\hat{w} − y) = 0 (see the check below).
  • Probabilistic model: y | x ~ N(w^\top x, \sigma^2); minimizing least squares = maximizing the likelihood.
  • Loss minimization: empirical risk minimization (ERM) with the least squares loss.
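
A quick numerical check of the geometric interpretation (illustrative data only): after an OLS fit, the residual is orthogonal to every column of X.

    import numpy as np

    rng = np.random.default_rng(2)
    X = np.hstack([rng.normal(size=(30, 4)), np.ones((30, 1))])
    y = rng.normal(size=30)

    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    residual = X @ w_hat - y

    # X^T (X w_hat - y) = 0: the residual is orthogonal to the column span of X
    assert np.allclose(X.T @ residual, 0.0, atol=1e-8)
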
5
Q

Empirical Risk Minimization

A

\ell_{ls}(y, \hat{y}) = (1/2)(y − \hat{y})^2 is the least squares loss.

ERM: arg min_f (1/N) \sum_{i=1}^N \ell(y^{(i)}, f(x^{(i)})), i.e. choose the predictor f that minimizes the average loss over the training data.
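
A minimal sketch of the empirical risk under the least squares loss (function names and toy data are assumptions for illustration):

    import numpy as np

    def ls_loss(y, y_hat):
        """Least squares loss: (1/2)(y - y_hat)^2."""
        return 0.5 * (y - y_hat) ** 2

    def empirical_risk(loss, f, X, y):
        """Average loss of predictor f over the training data."""
        return np.mean([loss(y_i, f(x_i)) for x_i, y_i in zip(X, y)])

    # Example: linear predictor f(x) = w^T x on toy data
    rng = np.random.default_rng(3)
    X = rng.normal(size=(20, 3))
    y = rng.normal(size=20)
    w = rng.normal(size=3)
    risk = empirical_risk(ls_loss, lambda x: w @ x, X, y)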

6
Q

Least squares classification

A

Suppose y \in {−1, +1} (i.e. binary classification) with classification error loss 1[y \ne \hat{y}], which for a real-valued score \hat{y} is approximated by 1[y\hat{y} \le 0] (written in terms of the margin y\hat{y}).

Strategy: choose w to minimize the least squares loss \ell_{ls}(y, \hat{y}) = (y − \hat{y})^2/2 with \hat{y} = w^\top x.

Since y \in {−1, +1}, (y − \hat{y})^2/2 = y^2(1 − y\hat{y})^2/2 = (1 − y\hat{y})^2/2, so the loss depends only on the margin y\hat{y}.

Predict sgn(\hat{w}^\top x) \in {−1, +1}.
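
A hedged sketch of least squares classification on assumed synthetic ±1 labels: fit w by OLS on the ±1 targets, then classify with the sign of the score.

    import numpy as np

    rng = np.random.default_rng(4)
    N, d = 100, 2
    X = np.hstack([rng.normal(size=(N, d)), np.ones((N, 1))])
    w_true = np.array([2.0, -1.0, 0.3])
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=N))   # labels in {-1, +1}

    # Fit by minimizing the least squares loss on the +/-1 targets
    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]

    # Classify with the sign of the score w_hat^T x
    y_pred = np.sign(X @ w_hat)
    train_error = np.mean(y_pred != y)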
