Linear Regression Flashcards
(8 cards)
What is supervised learning?
A learning paradigm where a model is trained on input-output pairs (x_i, y_i) to learn a mapping from inputs to outputs.
How is the linear regression model expressed using basis functions?
f(x) = ∑ₖ βₖ φₖ(x), where φₖ are basis functions (e.g., φ₀(x) = 1, φ₁(x) = x).
What loss function does linear regression use and why?
Mean Squared Error: L = (1/n) ∑ᵢ (yᵢ − f(xᵢ))²; penalizes larger errors and is differentiable.
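The loss above can be sketched in a few lines of NumPy; the example values here are illustrative, not from the deck.

```python
import numpy as np

# Mean Squared Error: average of squared residuals (1/n) * sum((y_i - f(x_i))^2)
def mse(y, y_pred):
    return np.mean((y - y_pred) ** 2)

y = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.5])
loss = mse(y, y_pred)  # (0.25 + 0.0 + 0.25) / 3
```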
What is the normal equation for the closed-form solution?
β = (Φᵀ Φ)⁻¹ Φᵀ y, where Φ is the design matrix of basis functions.
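A minimal sketch of the normal equation with the basis φ₀(x) = 1, φ₁(x) = x; the data are made up so the exact solution β = (1, 2) is known in advance. Solving the linear system is preferred over forming the explicit inverse for numerical stability.

```python
import numpy as np

# Data generated from y = 2x + 1, so the fit should recover intercept 1, slope 2.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

# Design matrix: one row per sample, columns are phi_0(x)=1 and phi_1(x)=x.
Phi = np.vstack((np.ones_like(x), x)).T

# Normal equation: solve (Phi^T Phi) beta = Phi^T y instead of inverting.
beta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
```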
How does gradient descent optimize parameters?
β_new = β_old − rate × ∂L/∂β; each step moves in the negative-gradient direction, which decreases the loss for a suitable learning rate.
What is the gradient descent update rule in linear regression?
β_new = β_old − rate × (2/n) Φᵀ (Φ β_old − y); this is a step in the negative-gradient direction of the MSE loss L = (1/n) ∑ᵢ (yᵢ − f(xᵢ))².
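The update rule can be sketched as a plain NumPy loop; the data, learning rate, and iteration count are illustrative assumptions, chosen so the loop converges to the same β the closed form would give.

```python
import numpy as np

# Data generated from y = 2x + 1; gradient descent should recover beta = (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
n = len(x)
Phi = np.vstack((np.ones(n), x)).T  # n x 2 design matrix

beta = np.zeros(2)
rate = 0.05  # small enough for convergence on this problem
for _ in range(5000):
    grad = (2 / n) * Phi.T @ (Phi @ beta - y)  # gradient of the MSE loss
    beta = beta - rate * grad                  # step against the gradient
```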
How does learning rate affect gradient descent convergence?
A small learning rate yields slow convergence; a large learning rate can cause oscillation or divergence.
How is the design matrix Φ constructed?
Φ = np.vstack((np.ones(n), x)).T, placing a column of ones (for the intercept) beside the feature column, so each row of Φ is (1, xᵢ).
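A quick sketch of the construction, with a three-point toy input to show the resulting shape:

```python
import numpy as np

# Basis functions phi_0(x) = 1, phi_1(x) = x for three samples.
x = np.array([0.0, 1.0, 2.0])
Phi = np.vstack((np.ones_like(x), x)).T  # shape (3, 2): one row per sample
```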