POWERPOINT 5 Flashcards
picks the line that minimizes the sum of the squares of the residuals
least squares
Yˆi = b0 + b1*Xi, where b0 = intercept and b1 = slope
linear prediction
rxy * sy/sx
rxy = corr(x, y) is the sample correlation
sx and sy are the sample standard deviations of X and Y
b1
Ybar - b1*Xbar
Xbar and Ybar are the sample mean of X and Y
b0
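The b1 and b0 cards can be checked numerically. The following is a minimal sketch in plain Python, not taken from the slides: the helper names and the five data points are made up for illustration. It computes the slope as rxy * sy/sx and the intercept as Ybar - b1*Xbar.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sample_sd(v):
    # Sample standard deviation with the n-1 divisor.
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def sample_corr(x, y):
    # Sample correlation: covariance divided by the product of the sds.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (sample_sd(x) * sample_sd(y))

def least_squares(x, y):
    # b1 = rxy * sy/sx, then b0 = Ybar - b1*Xbar.
    b1 = sample_corr(x, y) * sample_sd(y) / sample_sd(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))  # approximately b0 = 2.2, b1 = 0.6
```

For these points Xbar = 3 and Ybar = 4, so the fitted line passes through (3, 4), as the b0 formula implies.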
measure of centrality
Ybar = (1/n) Σ_{i=1}^{n} Yi
sample mean
measure of spread
sy^2 = (1/(n-1)) Σ_{i=1}^{n} (Yi - Ybar)^2
sample variance
sy = sqrt(sy^2)
sample standard deviation
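As a quick illustration of the sample mean, sample variance, and sample standard deviation cards, here is a small sketch in plain Python. The function names and the data are assumptions made for this example; the variance uses the n-1 divisor.

```python
from math import sqrt

def sample_mean(v):
    # Ybar = (1/n) * sum of the Yi
    return sum(v) / len(v)

def sample_variance(v):
    # sy^2 = (1/(n-1)) * sum of squared deviations from the mean
    m = sample_mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def sample_sd(v):
    # sy = sqrt(sy^2)
    return sqrt(sample_variance(v))

y = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical data
ybar = sample_mean(y)        # 4.0
var_y = sample_variance(y)   # 6 / 4 = 1.5
sd_y = sample_sd(y)          # sqrt(1.5), about 1.2247
print(ybar, var_y, round(sd_y, 4))
```

The standard deviation is in the same units as Y, while the variance is in squared units.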
measures the direction and strength of the linear relationship between Y and X
Cov(Y, X) = (1/(n-1)) Σ_{i=1}^{n} (Yi - Ybar)(Xi - Xbar)
sample covariance
s_{x+y}^2 = sx^2 + sy^2 + 2*Cov(X, Y)
relationship between sample variance and covariance
standardized covariance; scale invariant, so the units of measurement don’t matter; only measures linear relationships
It is always true that −1 ≤ corr(X , Y ) ≤ 1
This gives the direction (- or +) and strength (0 → 1) of the linear relationship between X and Y
Corr(X, Y) = Cov(X, Y)/(sx*sy)
correlation
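The scale-invariance claim on the correlation card can be seen in a short example. This is a sketch in plain Python with made-up data and hypothetical helper names: rescaling X (say, from meters to centimeters) multiplies the covariance by 100 but leaves the correlation unchanged.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sample_cov(x, y):
    # Sample covariance with the n-1 divisor; sample_cov(v, v) is the variance.
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def sample_corr(x, y):
    # Corr(X, Y) = Cov(X, Y) / (sx * sy)
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical X, in meters
y = [2.0, 4.0, 5.0, 4.0, 5.0]      # hypothetical Y
x_cm = [100.0 * a for a in x]      # same X, in centimeters

cov_xy = sample_cov(x, y)          # 1.5
cov_cm = sample_cov(x_cm, y)       # 150.0: covariance depends on units
r = sample_corr(x, y)              # about 0.7746
r_cm = sample_corr(x_cm, y)        # same value: correlation does not
print(cov_xy, cov_cm, round(r, 4), round(r_cm, 4))
```

Both correlations fall between -1 and 1, and the positive sign reflects that Y tends to rise with X in this sample.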
In Summary: Y = Yˆ + e where:
Yˆ is “made from X”; corr(X,Yˆ) = 1 (or −1 when the slope b1 is negative)
e is unrelated to X; corr(X,e) = 0
fitted values and residuals
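To make the fitted values and residuals card concrete, the sketch below fits a least-squares line to made-up data and checks that the residuals are uncorrelated with X. It is plain Python under stated assumptions: the data and helper names are hypothetical, and the slope is computed as Cov(X, Y)/sx^2, which is algebraically the same as rxy * sy/sx.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sample_cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def sample_corr(x, y):
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least-squares slope and intercept.
b1 = sample_cov(x, y) / sample_cov(x, x)
b0 = mean(y) - b1 * mean(x)

# Fitted values are built from X alone; residuals are what is left over.
y_hat = [b0 + b1 * a for a in x]
e = [yi - fi for yi, fi in zip(y, y_hat)]

print(round(sample_corr(x, y_hat), 6))   # 1.0 here, since the slope is positive
print(abs(sample_corr(x, e)) < 1e-9)     # True: residuals carry no linear X information
```

Each Yi is recovered exactly as Yˆi + ei, so the decomposition loses nothing.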
Σ_{i=1}^{n} (Yi - Ybar)^2
total sum of squares (SST)
Σ_{i=1}^{n} (Yˆi - Ybar)^2
regression SS (SSR)
Σ_{i=1}^{n} ei^2 = Σ_{i=1}^{n} (Yi - Yˆi)^2
error SS (SSE)
SSR + SSE
SST
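The identity SST = SSR + SSE can be verified on a small example. The sketch below is plain Python with made-up data, and it assumes the usual definitions: SST sums squared deviations of Yi from Ybar, SSR sums squared deviations of the fitted values Yˆi from Ybar, and SSE sums the squared residuals.

```python
def mean(v):
    return sum(v) / len(v)

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
xbar, ybar = mean(x), mean(y)

# Least-squares fit: slope = sum of cross-deviations / sum of squared X deviations.
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * a for a in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total variation in Y
ssr = sum((fi - ybar) ** 2 for fi in y_hat)           # variation explained by the line
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) # variation left in the residuals

print(round(sst, 4), round(ssr, 4), round(sse, 4))    # about 6.0, 3.6, 2.4
```

The decomposition holds because least-squares residuals are uncorrelated with the fitted values; it would not generally hold for an arbitrary line.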