Regression Flashcards
What questions can regression answer?
How do systems work?
-ex: how many runs the average home run is worth
-effects of economic factors on presidential elections
Make predictions about what will happen in the future?
-height in the future
-price of oil in the future
-housing demand in the next 6 months
Simple Linear Regression
-one predictor
-y = response
-x = predictor
Equation
y = a0 + a1x1
general linear regression equation
with m predictors
y = response
x = predictor
y = a0 + sum from j = 1 to m ajxj
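A minimal sketch (made-up coefficients and predictor values, not from any fitted model) of evaluating this equation with numpy:

    import numpy as np

    a0 = 2.0                            # intercept
    a = np.array([0.5, -1.2, 3.0])      # hypothetical coefficients a1..a3 (m = 3)
    x = np.array([10.0, 4.0, 0.5])      # predictor values x1..x3 for one observation

    # y = a0 + sum from j = 1 to m of aj*xj
    y_hat = a0 + np.dot(a, x)
    print(y_hat)                        # 2.0 + 5.0 - 4.8 + 1.5 = 3.7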
How do you measure the quality of a regression line’s fit?
the sum of squared errors
-distance between true response and our estimate
simple linear regression prediction error
yi = actual response
yhati = prediction
error: yi - yhati, or yi - (a0 + a1xi1)
Sum of squared errors equation
sum from i = 1 to n (yi-yhati)^2
or
sum from i = 1 to n (yi-(a0+a1xi1))^2
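A short sketch (made-up data and a hand-picked candidate line) of computing this sum of squared errors:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])   # predictors xi1
    y = np.array([2.1, 3.9, 6.2, 7.8])   # actual responses yi

    a0, a1 = 0.0, 2.0                    # candidate line y = a0 + a1*x
    y_hat = a0 + a1 * x                  # predictions yhati
    sse = np.sum((y - y_hat) ** 2)       # sum from i = 1 to n (yi - yhati)^2
    print(sse)                           # 0.1 for this data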
What is the best fit regression SLR line?
minimizes sum of squared errors
-defined by a0 and a1
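One way to find those a0 and a1 (a sketch using numpy's built-in least-squares fit on the same made-up data):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])

    # a degree-1 polynomial fit minimizes the sum of squared errors;
    # np.polyfit returns the coefficients highest power first: [a1, a0]
    a1, a0 = np.polyfit(x, y, 1)
    print(a0, a1)                        # best-fit intercept and slope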
How do we measure the quality of a model's fit?
likelihood
What is likelihood? What is maximum likelihood?
-likelihood measures the probability (density) of the observed data for any given parameter set; we treat the observed data as fixed and correct, and assume we have information about the variance
-maximum likelihood: the parameter set that gives the observed data the highest probability (density)
What is Maximum Likelihood Estimation (MLE)? What are you minimizing to calculate it?
MLE picks the set of parameters with the highest likelihood; with normally distributed errors, this is the set of parameters that minimizes the sum of squared errors
zi = observations
yi = model estimates
minimize sum from i = 1 to n (zi-yi)^2
Maximum likelihood in the context of linear regression
regression equation: y = a0 + sum from j = 1 to m ajxj
sum of squared errors = sum from i = 1 to n (zi-yi)^2
substitute the regression equation for yi in the sum of squared errors:
minimize sum from i = 1 to n (zi - (a0 + sum from j = 1 to m ajxj))^2
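A sketch (simulated data; assumes normally distributed errors, which is the assumption under which MLE and least squares coincide) showing that maximizing the likelihood gives essentially the same coefficients as minimizing the sum of squared errors:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    z = 1.5 + 2.0 * x + rng.normal(0, 1.0, size=x.size)   # observations zi, true a0=1.5, a1=2.0

    def neg_log_likelihood(params):
        a0, a1, sigma = params
        resid = z - (a0 + a1 * x)                  # zi - yi
        return -np.sum(norm.logpdf(resid, loc=0, scale=sigma))

    mle = minimize(neg_log_likelihood, x0=[0.0, 1.0, 1.0],
                   bounds=[(None, None), (None, None), (1e-6, None)])
    lsq = np.polyfit(x, z, 1)                      # least squares: [a1, a0]
    print(mle.x[:2], lsq[::-1])                    # the two (a0, a1) estimates agree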
How can you use likelihood to compare two different models?
the likelihood ratio
Akaike Information Criterion (AIC) equation. What is the penalty term and what does it do?
L*: maximum likelihood value
k: # of parameters we're investigating
AIC = 2k - 2ln(L*)
Penalty term (2k) balances likelihood with simplicity
-helps avoid overfitting
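A small sketch of computing AIC for a fitted regression, assuming normally distributed errors (so the maximized log-likelihood can be written in terms of the residuals); the residuals and parameter count here are made up:

    import numpy as np

    residuals = np.array([0.1, -0.1, 0.2, -0.2, 0.05])   # yi - yhati from some fitted model
    n, k = residuals.size, 2                             # k parameters (a0 and a1 here)

    sigma2 = np.mean(residuals ** 2)                     # MLE of the error variance
    log_L = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # maximized log-likelihood ln(L*)
    aic = 2 * k - 2 * log_L
    print(aic)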
AIC with regression? Do you want AIC to be smaller or larger?
substitute the maximum-likelihood regression fit for L*; the # of parameters is k = m+1 (a0 through am)
-we prefer models with smaller AIC; smaller AIC encourages fewer parameters and higher likelihood
corrected AIC (AICc)
-AIC works well if we have infinitely many data points
-this never happens, so we add a correction term
AICc = AIC + 2k(k+1)/(n-k-1)
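The correction as a one-line helper (sketch; k and n as defined above, values hypothetical):

    def aicc(aic, k, n):
        # AICc = AIC + 2k(k+1) / (n - k - 1)
        return aic + 2 * k * (k + 1) / (n - k - 1)

    print(aicc(aic=10.0, k=2, n=30))     # hypothetical AIC, parameter count, data size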
Comparing models with AIC
if AIC1 < AIC2, the higher-AIC model (model 2) is e^((AIC1-AIC2)/2) times as likely as the lower-AIC model to be the better one
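A sketch of the comparison with hypothetical AIC values:

    import math

    aic1, aic2 = 100.0, 104.0                     # model 1 has the lower AIC
    rel = math.exp((aic1 - aic2) / 2)             # e^((AIC1 - AIC2)/2) ~ 0.135
    print(rel)   # model 2 is only ~13.5% as likely as model 1 to be the better model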
Bayesian Information Criterion (BIC)
L*: maximum likelihood value
k: # of parameters we're investigating
n: number of data points
BIC = kln(n) - 2ln(L*)
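A sketch of the BIC computation, reusing the Gaussian-error log-likelihood from the AIC example (made-up residuals):

    import numpy as np

    residuals = np.array([0.1, -0.1, 0.2, -0.2, 0.05])
    n, k = residuals.size, 2

    sigma2 = np.mean(residuals ** 2)
    log_L = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # ln(L*)
    bic = k * np.log(n) - 2 * log_L
    print(bic)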
AIC vs BIC
-BIC's penalty term (k*ln(n)) is larger than AIC's (2k) once there are at least 8 data points (ln(n) > 2)
-BIC encourages models with fewer parameters than AIC does
-only use BIC when there are more data points than parameters
BIC comparison between 2 models on the same dataset…
-if abs(BIC1 - BIC2) > 10, the smaller-BIC model is very likely to be better
-if between 6 and 10, the smaller-BIC model is likely better
-if between 2 and 6, somewhat likely better
-if between 0 and 2, slightly likely better
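A small helper (sketch; thresholds taken from the ranges above, BIC values hypothetical) that turns a BIC difference into the qualitative judgment:

    def bic_evidence(bic1, bic2):
        # how strongly the smaller-BIC model is favored
        diff = abs(bic1 - bic2)
        if diff > 10:
            return "very likely better"
        elif diff > 6:
            return "likely better"
        elif diff > 2:
            return "somewhat likely better"
        else:
            return "slightly likely better"

    print(bic_evidence(210.4, 223.1))    # -> "very likely better"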
Is there a hard and fast rule for choosing between AIC, BIC, or maximum likelihood?
No; all 3 can give valuable information, and looking at all 3 can help you decide which model is best
Regression coefficients for predictions and forecasting
-descriptive: the response changes by coefficient * variable; in other words, a one-unit increase in a variable changes the response by that variable's coefficient
-predictive (forecasting): the same reading, applied to new data; each unit of a variable adds its coefficient's amount to the forecasted response
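A sketch of both readings with hypothetical coefficients: descriptively, a one-unit increase in a predictor changes the response by that predictor's coefficient; predictively, new values are plugged into the same equation.

    import numpy as np

    a0 = 5.0
    a = np.array([2.0, -0.5])            # hypothetical coefficients for x1, x2

    x_new = np.array([3.0, 10.0])        # new data point to forecast
    y_forecast = a0 + np.dot(a, x_new)
    print(y_forecast)                    # 5.0 + 6.0 - 5.0 = 6.0

    # descriptive reading: raising x1 by one unit changes the response by a[0] = 2.0
    y_bumped = a0 + np.dot(a, x_new + np.array([1.0, 0.0]))
    print(y_bumped - y_forecast)         # 2.0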
Which of the components of analytics can regression be used for?
Descriptive and predictive analytics
not prescriptive
Causation
one thing causes another thing
correlation
two things tend to happen together or not together
-they don't necessarily cause each other