3. Linear Regression Flashcards
Sample covariance and correlation, least squares regression, alternative regression models
What is regression used for?
- many data sets have observations of several variables for each individual
- the aim of regression is to ‘predict’ the value of one variable, y, using observations from another variable, x
What is linear regression used for?
-linear regression is used for numerical data and uses a relation of the form:
y ≈ α + βx
-in a plot of y as a function of x, this relation describes a straight line
Paired Samples
- to fit a linear model we need observations of x and y
- it is important that these are paired samples, i.e. that for each iϵ{1,…,n} the observations xi and yi belong to the same individual
Examples of Paired Samples
- weight and height of a person
- engine power and fuel consumption of a car
Linear Regression
Constructing a Model
-assume we have observed data (xi,yi) for iϵ{1,…,n}
-to construct a model for these data, we use random variables Y1,…,Yn such that:
Yi = α + βxi + εi
-for all iϵ{1,…,n} where ε1,…,εn are i.i.d. random variables with E(εi)=0 and Var(εi)=σ²
-here we assume that the x-values are fixed and known
-thus the only random quantities in the model are Yi and εi
-the values α, β and σ² are parameters of the model; to fit the model to data we need to estimate these parameters
Linear Regression
Residuals/Errors
-starting with the model:
Yi = α + βxi + εi
-the random variables εi are called residuals or errors
-in a scatter plot they correspond to the vertical distance between the samples and the regression line
-often we assume that εi~N(0,σ²) for all i
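-as an illustration, here is a minimal R sketch (with made-up parameter values) that simulates data from this model:
set.seed(1)                            # for reproducibility
n     <- 50
alpha <- 1; beta <- 2; sigma <- 0.5    # assumed (made-up) parameter values
x   <- runif(n, 0, 10)                 # fixed, known x-values
eps <- rnorm(n, mean = 0, sd = sigma)  # residuals eps_i ~ N(0, sigma^2)
y   <- alpha + beta * x + eps          # Y_i = alpha + beta*x_i + eps_i
plot(x, y)                             # points scatter around a straight line
abline(alpha, beta)                    # the true line alpha + beta*x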
Linear Regression
Expectation of Yi
-we have the linear regression model:
Yi = α + βxi + εi
-then the expectation is given by:
E(Yi) = E(α + βxi + εi)
-the expectation of a constant is just the constant itself, and remember that xi represents a known value here:
E(Yi) = α + βxi + E(εi)
-recall that εi are modeled as random variables with E(εi)=0:
E(Yi) = α + βxi
-thus the expectation of Yi depends on xi and, at least for β≠0, the random variables Yi are not identically distributed
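-as an illustrative check in R (with made-up values), averaging many simulated copies of Yi for one fixed xi approximates α + βxi:
set.seed(1)
alpha <- 1; beta <- 2; sigma <- 0.5   # assumed (made-up) parameter values
xi <- 3                               # one fixed x-value
yi <- alpha + beta * xi + rnorm(1e5, mean = 0, sd = sigma)  # many copies of Y_i
mean(yi)           # approximately 7
alpha + beta * xi  # the exact expectation E(Y_i) = alpha + beta*x_i = 7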
What are sample covariance and correlation used for?
-to study the dependence between paired numeric variables
Sample Covariance
Definition
-the sample covariance of x1,…,xnϵℝ and y1,…,ynϵℝ is given by:
σxy = 1/(n-1) Σ(xi-x^)(yi-y^)
-where the sum is taken from i=1 to i=n, and x^ and y^ are the sample means
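-for example, computing this by hand in R on made-up data and comparing with the built-in cov():
x <- c(1, 2, 4, 7); y <- c(2, 3, 9, 12)                      # made-up paired samples
sxy <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)  # formula above
sxy        # sample covariance computed by hand
cov(x, y)  # built-in function gives the same value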
Sample Correlation
Definition
-the sample correlation of x1,…,xnϵℝ and y1,…,ynϵℝ is given by:
ρxy = σxy / √(σx²σy²)
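-again, a small R check on made-up data against the built-in cor():
x <- c(1, 2, 4, 7); y <- c(2, 3, 9, 12)  # made-up paired samples
cov(x, y) / sqrt(var(x) * var(y))        # sample correlation from the formula above
cor(x, y)                                # built-in function gives the same value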
What is the sample covariance of a sample with itself?
- we can show that the sample covariance of a sample with itself equals the sample variance
- i.e. σxx = σx²
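-a one-line check in R (made-up data):
x <- c(1, 2, 4, 7)
cov(x, x)  # sample covariance of x with itself
var(x)     # sample variance; identical value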
What values can correlation take?
-the correlation of two samples is always in the interval [-1,1]
Interpreting Correlation
ρxy≈1
- strong positive correlation, ρxy≈1 indicates that the points (xi,yi) lie close to a straight line with positive slope
- in this case y is almost completely determined by x
Interpreting Correlation
ρxy≈-1
- strong negative correlation, ρxy≈-1 indicates that the points (xi,yi) lie close to a straight line with negative slope
- in this case y is almost completely determined by x
Interpreting Correlation
ρxy≈0
- this means that there is no linear relationship between x and y which helps to predict y from x
- this could be because x and y are independent or because the relationship between x and y is non-linear
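-an illustrative R sketch: a relationship can be exact but non-linear, giving correlation near 0:
x <- seq(-3, 3, by = 0.1)  # values symmetric around 0
y <- x^2                   # y is fully determined by x, but not linearly
cor(x, y)                  # essentially 0 despite the exact relationship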
How can the sample covariance be used to estimate the covariance of random variables?
-if (X1,Y1),…,(Xn,Yn) are i.i.d. pairs of random variables, then we can show:
lim σxy(X1,…,Xn,Y1,…,Yn) = Cov(X1,Y1)
-where the limit is taken as n tends to infinity
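-a small simulation sketch in R (made-up distribution) illustrating this convergence:
set.seed(1)
n <- 1e5
x <- rnorm(n)          # X_i ~ N(0, 1)
y <- 2 * x + rnorm(n)  # so Cov(X_i, Y_i) = 2 by construction
cov(x, y)              # close to 2 for large n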
Correlation and Covariance in R
-the functions to compute sample covariances and correlations in R are cov() and cor()
Correlation and Covariance in R
Handling Missing Data
- both functions, cov() and cor(), have an optional argument use=… which controls how missing data are handled
- if use="everything" or use is not specified, the functions return NA if any input data are missing
- if use="all.obs", the functions abort with an error if any input data are missing
- if use="complete.obs", any pairs (xi,yi) where either xi or yi is missing are ignored and the covariance/correlation is computed from the remaining samples
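-for example (made-up vectors):
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, 8)
cov(x, y)                        # NA: use defaults to "everything"
cov(x, y, use = "complete.obs")  # drops the pair (NA, 6), uses the rest
# cov(x, y, use = "all.obs")     # would abort with an error here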
What is least squares regression?
- least squares is a method for determining the parameter values α, β and σ²
- methods for doing this differ mainly in how they treat outliers in the data
Least Squares Regression
Minimising the Residual Sum of Squares - Formula
-we estimate the parameters α, β and σ² using the values which minimise the residual sum of squares:
r(α,β) = Σ (yi - (α + βxi))²
-where the sum is taken from i=1 to i=n
-for given α and β, the value r(α,β) measures how close the given data points (xi,yi) lie to the regression line α+βx
-by minimising r(α, β) we find the regression line which is closest to the data
Least Squares Regression
Minimising the Residual Sum of Squares - Lemma
-assume σx²>0
-then the function r(α,β) from above takes its minimum at the point (α,β) given by:
β = σxy/σx²
α = y^ - βx^
-where x^, y^ are the sample means, σxy is the sample covariance and σx² is the sample variance
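-a short R sketch (made-up data) checking the lemma against R's built-in lm():
x <- c(1, 2, 4, 7); y <- c(2, 3, 9, 12)  # made-up paired samples
beta  <- cov(x, y) / var(x)              # beta = sigma_xy / sigma_x^2
alpha <- mean(y) - beta * mean(x)        # alpha = y-bar - beta * x-bar
c(alpha, beta)
coef(lm(y ~ x))                          # R's least squares fit gives the same values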
Least Squares Regression
Minimising the Residual Sum of Squares - Lemma Proof
-obtain a simplified expression for r(α,β) using the centred values:
xi~ = xi - x^
yi~ = yi - y^
-for fixed β, the terms involving α are minimised by taking α = y^ - βx^
-differentiate the remaining expression with respect to β and set it equal to 0 to find the stationary point
-the second derivative is positive, showing that this value of β gives the minimum of r(α,β)
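-as a numerical sanity check (made-up data), minimising r(α,β) directly with optim() recovers the same point:
x <- c(1, 2, 4, 7); y <- c(2, 3, 9, 12)          # made-up paired samples
r <- function(p) sum((y - (p[1] + p[2] * x))^2)  # r(alpha, beta)
optim(c(0, 0), r)$par                            # numerical minimiser (approximate)
beta <- cov(x, y) / var(x)
c(mean(y) - beta * mean(x), beta)                # closed form from the lemma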
Least Squares Regression
Fitted Regression Line
-now that we have used the method of least squares to determine α^ and β^, the values which minimise r(α,β)
-we can consider the fitted regression line:
y = α^ + β^x
-this is an approximation to the unknown true mean α+βx from the model
Least Squares Regression
Fitted Values
-now that we have used the method of least squares to determine α^ and β^, the values which minimise r(α,β)
-we can consider the fitted values:
yi^ = α^ + β^xi
-these are the y-values of the fitted regression line at the points xi
-if we consider εi as being the ‘noise’ or ‘errors’, then we can consider the values yi^ to be the versions of yi with the noise removed
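-for example, in R the fitted values are returned by fitted() (made-up data):
x <- c(1, 2, 4, 7); y <- c(2, 3, 9, 12)  # made-up paired samples
fit <- lm(y ~ x)                         # least squares fit
fitted(fit)                              # fitted values yi^ at each xi
coef(fit)[1] + coef(fit)[2] * x          # the same values computed by hand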