L2 Linear Regression Flashcards


Flashcards in L2 Linear Regression Deck (6)
1

Linear Regression, definition

Predict \hat{y} \in R (label, response) from x \in R^d (features, covariates).

Least squares model: \hat{y} = w_1^\top x + w_2 (bias), where w_1 \in R^d and w_2 \in R (that is, w \in R^{d+1}).

Learning: choose (w_1, w_2) based on data ((x^{(i)}, y^{(i)}))_{i=1}^N.
Prediction: given x, predict \hat{y} = w_1^\top x + w_2.

- Closed-form solution
- Gaussian probability model
- Well suited for regression, often not well suited for classification
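
A minimal NumPy sketch of the prediction step; the dimensions and values are illustrative, not from the deck:

    import numpy as np

    def predict(w1, w2, x):
        # \hat{y} = w_1^T x + w_2
        return w1 @ x + w2

    w1 = np.array([0.5, -1.0, 2.0])  # w_1 in R^3 (d = 3, illustrative)
    w2 = 0.1                         # bias w_2
    x = np.array([1.0, 0.0, 3.0])
    y_hat = predict(w1, w2, x)       # 6.6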

2

Linear Regression, learning

\arg\min_{w_1 \in R^d, w_2 \in R} \frac{1}{N} \sum_{i=1}^N \frac{1}{2} (w_1^\top x^{(i)} + w_2 - y^{(i)})^2

Simplification: append a constant feature 1 to each x^{(i)} so the bias is absorbed into w, and stack the examples into X \in R^{N \times (d+1)}, y \in R^N; the objective becomes \arg\min_{w \in R^{d+1}} \frac{1}{2} \|Xw - y\|_2^2.

Setting the gradient to zero gives the OLS (ordinary least squares) estimator \hat{w} = (X^\top X)^{-1} X^\top y (when the inverse exists).
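
A minimal NumPy sketch of this closed-form fit, assuming X already contains the appended column of ones; np.linalg.solve is used instead of forming the inverse explicitly:

    import numpy as np

    def fit_ols(X, y):
        # Solve the normal equations (X^T X) w = X^T y.
        return np.linalg.solve(X.T @ X, X.T @ y)

    # Toy usage: N = 5 examples, d = 2 features plus a bias column of ones.
    rng = np.random.default_rng(0)
    X_raw = rng.normal(size=(5, 2))
    X = np.hstack([X_raw, np.ones((5, 1))])  # last weight plays the role of w_2
    y = X_raw @ np.array([1.0, -2.0]) + 0.5
    w_hat = fit_ols(X, y)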

3

Linear Regression, problems/solutions

Problem: the closed form is unusable when (X^\top X)^{-1} does not exist; in particular X^\top X is not invertible when N < d+1 (fewer examples than parameters).

1. Pseudoinverse: \hat{w} = (X^\top X)^\dagger X^\top y = X^\dagger y; it still satisfies the "derivative condition", i.e. (X^\top X)\hat{w} = X^\top y.

2. Ridge regression (regularisation; X^\top X + \lambda I has no zero eigenvalues, so it is invertible for any \lambda > 0): \arg\min_{w \in R^{d+1}} \frac{1}{2} \|Xw - y\|_2^2 + \frac{\lambda}{2} \|w\|_2^2, giving \tilde{w} = (X^\top X + \lambda I)^{-1} X^\top y (both fixes are sketched in the code below).

As \lambda \to \infty, \tilde{w}_i \to 0 for all i (the coefficients are shrunk toward zero).
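
A minimal NumPy sketch of both fixes, assuming the same X and y as above; np.linalg.pinv computes the pseudoinverse solution, and ridge adds \lambda I before solving:

    import numpy as np

    def fit_pinv(X, y):
        # Minimum-norm least squares solution via the pseudoinverse X^+.
        return np.linalg.pinv(X) @ y

    def fit_ridge(X, y, lam):
        # Ridge estimator (X^T X + lam I)^{-1} X^T y, well defined for lam > 0.
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)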

4

Linear Regression, justifications/interpretations

- Geometric interpretation: the residual X\hat{w} - y is orthogonal to \mathrm{span}(z_1, \ldots, z_{d+1}) (the columns of X), because X^\top (X\hat{w} - y) = 0.
- Probabilistic model: y \mid x \sim N(w^\top x, \sigma^2); solving the least squares problem = maximizing the likelihood (see the derivation below).
- Loss minimization (ERM)
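
A short standard derivation (not spelled out in the deck) of why maximizing the Gaussian likelihood is the same as minimizing the least squares objective; \sigma is treated as a fixed constant:

    -\log p(y^{(i)} \mid x^{(i)}; w) = \frac{1}{2\sigma^2} (y^{(i)} - w^\top x^{(i)})^2 + \frac{1}{2}\log(2\pi\sigma^2)

    \arg\max_w \prod_{i=1}^N p(y^{(i)} \mid x^{(i)}; w) = \arg\min_w \sum_{i=1}^N \frac{1}{2} (w^\top x^{(i)} - y^{(i)})^2

The additive \log term and the 1/\sigma^2 factor do not depend on w, so they do not change the minimizer.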

5

Empirical Risk Minimization

\ell_{ls}(y, \hat{y}) = \frac{1}{2} (y - \hat{y})^2 is the least squares loss

ERM: \arg\min_f \frac{1}{N} \sum_{i=1}^N \ell(y^{(i)}, f(x^{(i)}))
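
A minimal sketch of ERM for a linear f with the least squares loss, minimized by plain gradient descent; the step size and iteration count are illustrative assumptions, not from the deck:

    import numpy as np

    def erm_least_squares(X, y, lr=0.01, steps=1000):
        # Minimize (1/N) * sum_i 0.5 * (x_i^T w - y_i)^2 by gradient descent.
        N, d = X.shape
        w = np.zeros(d)
        for _ in range(steps):
            grad = X.T @ (X @ w - y) / N  # gradient of the empirical risk
            w -= lr * grad
        return w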

6

Least squares classification

Suppose y \in \{-1, +1\} (i.e. binary classification) with classification error loss \mathbb{1}[y \neq \hat{y}] \approx \mathbb{1}[y\hat{y} \leq 0].

Strategy: choose w to minimize the least squares loss \ell_{ls}(y, \hat{y}) = \frac{1}{2} (y - \hat{y})^2.

If y \in \{-1, +1\}, then \frac{1}{2} (y - \hat{y})^2 = \frac{1}{2} y^2 (1 - y\hat{y})^2 = \frac{1}{2} (1 - y\hat{y})^2.

Predict \mathrm{sgn}(\hat{w}^\top x) \in \{-1, +1\}.
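
A minimal sketch of this strategy, assuming labels in {-1, +1} and a bias column already appended to X; the fit is the same closed-form least squares solve as above, only the prediction changes:

    import numpy as np

    def fit_least_squares_classifier(X, y):
        # Same closed-form least squares fit, but y holds labels in {-1, +1}.
        return np.linalg.solve(X.T @ X, X.T @ y)

    def predict_labels(w, X):
        # Classify by the sign of the real-valued score w^T x.
        return np.sign(X @ w)  # note: np.sign returns 0 for an exactly zero score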