Regression Flashcards

1
Q

What questions can regression answer?

A

How do systems work?
-ex: how many runs the average home run is worth
-effects of economic factors on a presidential election

Make predictions about what will happen in the future
-height in the future
-price of oil in the future
-housing demand in the next 6 months

2
Q

Simple Linear Regression

A

-one predictor
-y = response
-x = predictor

Equation:

y = a_0 + a_1 x_1
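
A minimal sketch of fitting this equation by least squares, using the closed-form slope and intercept; the data values are invented for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # response

a1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # slope: cov(x, y) / var(x)
a0 = y.mean() - a1 * x.mean()                   # intercept
print(a0, a1)  # roughly 0.14 and 1.96 for this data
```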

3
Q

general linear regression equation

A

with m predictors:
y = response
x_j = predictors

y = a_0 + \sum_{j=1}^{m} a_j x_j
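
A hedged sketch of fitting the general equation by ordinary least squares; the synthetic data and the true coefficient values are assumptions for illustration:

```python
import numpy as np

n, m = 100, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n, m))                    # n observations, m predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

X1 = np.column_stack([np.ones(n), X])          # prepend a column of 1s for a_0
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coeffs)                                  # approximately [1.0, 2.0, -1.0, 0.5]
```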

4
Q

How do you measure the quality of a regression line’s fit?

A

the sum of squared errors

-the squared distances between the true responses and our estimates, summed over all data points

5
Q

simple linear regression prediction error

A

y_i = actual
\hat{y}_i = prediction

y_i - \hat{y}_i, or y_i - (a_0 + a_1 x_{i1})

6
Q

Sum of squared errors equation

A

\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
or
\sum_{i=1}^{n} (y_i - (a_0 + a_1 x_{i1}))^2
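
A small illustration of computing the SSE; the data and the coefficients a0 and a1 are placeholders standing in for whatever a previous fit produced:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a0, a1 = 0.14, 1.96                      # coefficients from a previous fit

y_hat = a0 + a1 * x                      # predictions
sse = np.sum((y - y_hat) ** 2)           # sum of (y_i - y_hat_i)^2
print(sse)                               # about 0.09 for this data
```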

7
Q

What is the best-fit simple linear regression (SLR) line?

A

the line that minimizes the sum of squared errors
-defined by the coefficients a_0 and a_1

8
Q

How do we measure the quality of a model's fit?

A

likelihood

9
Q

What is likelihood? What is maximum likelihood?

A

-likelihood measures the probability (density) of the observed data under any given parameter set; we assume the observed data are the correct values and that we have information about the variance

-maximum likelihood: the parameter set that gives the observed data the highest probability

10
Q

What is Maximum Likelihood Estimation (MLE)? What are you minimizing to calculate it?

A

the set of parameters that minimizes the sum of squared errors

z_i = observations
y_i = model estimates

minimize \sum_{i=1}^{n} (z_i - y_i)^2
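
A sketch of why MLE and least squares coincide, assuming Gaussian errors with a known standard deviation (sigma and the data are my assumptions, not from the card): the log-likelihood is a constant minus SSE/(2 sigma^2), so minimizing SSE maximizes likelihood:

```python
import numpy as np
from scipy import stats

z = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # observations
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])     # model estimates
sigma = 0.2                                   # assumed error standard deviation

log_lik = np.sum(stats.norm.logpdf(z, loc=y, scale=sigma))
sse = np.sum((z - y) ** 2)
# the two printed values agree: log-likelihood = const - SSE/(2 sigma^2)
print(log_lik, -len(z) / 2 * np.log(2 * np.pi * sigma**2) - sse / (2 * sigma**2))
```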

11
Q

Maximum likelihood in the context of linear regression

A

linear regression: y = a_0 + \sum_{j=1}^{m} a_j x_j
sum of squared errors: \sum_{i=1}^{n} (z_i - y_i)^2

substitute the regression equation for y_i in the sum of squared errors:

minimize \sum_{i=1}^{n} (z_i - (a_0 + \sum_{j=1}^{m} a_j x_{ij}))^2

12
Q

How can you use likelihood to compare two different models?

A

the likelihood ratio

13
Q

Akaike Information Criterion equation. What is the penalty term and what does it do?

A

L^*: maximum likelihood value
k: number of parameters we're investigating

AIC = 2k - 2\ln(L^*)

Penalty term (2k): balances likelihood with simplicity
-helps avoid overfitting
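
A minimal helper computing AIC from a maximized log-likelihood; the function name and example values are illustrative, not from the cards:

```python
def aic(log_lik: float, k: int) -> float:
    """AIC = 2k - 2 ln(L*), where log_lik is ln(L*)."""
    return 2 * k - 2 * log_lik

print(aic(log_lik=-12.3, k=3))  # 2*3 + 24.6 = 30.6
```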

14
Q

AIC with regression? Do you want AIC to be smaller or larger?

A

substitute the regression maximum likelihood equation; the number of parameters is m+1 (m coefficients plus the constant a_0)

-we prefer models with smaller AIC; a smaller AIC encourages fewer parameters and higher likelihood

15
Q

corrected AIC

A

-AIC works well if we have infinitely many data points
-this never happens

-so we add a correction term:

AIC_c = AIC + 2k(k+1)/(n-k-1)
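
The same idea as a small helper with the correction term added; names and example values are again illustrative:

```python
def aicc(aic_value: float, k: int, n: int) -> float:
    """AICc = AIC + 2k(k+1) / (n - k - 1)."""
    return aic_value + 2 * k * (k + 1) / (n - k - 1)

print(aicc(aic_value=30.6, k=3, n=50))  # 30.6 + 24/46, about 31.12
```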

16
Q

Comparing models with AIC

A

if AIC_1 < AIC_2, the relative likelihood that the higher-AIC model is better:
e^{(AIC_1 - AIC_2)/2}
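
A one-liner applying the formula to two made-up AIC values; with AIC_1 < AIC_2 the result is the relative likelihood favoring the larger-AIC model:

```python
import math

aic1, aic2 = 30.6, 33.2                 # made-up AIC values, aic1 smaller
rel_lik = math.exp((aic1 - aic2) / 2)
print(rel_lik)                          # about 0.27
```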

17
Q

Bayesian Information Criterion (BIC)

A

L^*: maximum likelihood value
k: number of parameters we're investigating
n: number of data points

BIC = k\ln(n) - 2\ln(L^*)
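
A BIC helper mirroring the AIC one above; names and example values are illustrative:

```python
import math

def bic(log_lik: float, k: int, n: int) -> float:
    """BIC = k ln(n) - 2 ln(L*), where log_lik is ln(L*)."""
    return k * math.log(n) - 2 * log_lik

print(bic(log_lik=-12.3, k=3, n=50))  # 3*ln(50) + 24.6, about 36.3
```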

18
Q

AIC vs. BIC

A

BIC's penalty term (k ln n) is larger than AIC's (2k) whenever n > 7
-BIC encourages models with fewer parameters than AIC does

-only use BIC when there are more data points than parameters

19
Q

BIC comparison between 2 models on the same dataset

A

if |BIC_1 - BIC_2| > 10, the smaller-BIC model is very likely to be better

if between 6 and 10, the smaller-BIC model is likely better

if between 2 and 6, somewhat likely better

if between 0 and 2, slightly likely better
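
These thresholds could be wrapped in a small helper; the category strings paraphrase the card and are not standard terminology:

```python
def bic_evidence(bic1: float, bic2: float) -> str:
    """Map |BIC_1 - BIC_2| to the card's evidence categories."""
    diff = abs(bic1 - bic2)
    if diff > 10:
        return "smaller-BIC model very likely better"
    if diff > 6:
        return "smaller-BIC model likely better"
    if diff > 2:
        return "somewhat likely better"
    return "slightly likely better"

print(bic_evidence(36.3, 41.0))  # diff 4.7: somewhat likely better
```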

20
Q

Is there a hard and fast rule for choosing between AIC, BIC, or maximum likelihood?

A

No; all three can give valuable information, and looking at all three can help you decide which model is best.

21
Q

Regression coefficients for predictions and forecasting

A

the response increases by the coefficient * the variable

in other words, if the variable = 1, that increases the response by the coefficient amount (descriptive)

if we are forecasting:
-same thing, but the coefficient is interpreted as how much the response will increase when the variable = 1 (predictive)

22
Q

Which of the components of analytics can regression be used for?

A

Descriptive and predictive analytics
not prescriptive

23
Q

Causation

A

one thing causes another thing

24
Q

correlation

A

two things tend to happen together or not together
- they don’t nescessarily cause each other

25
Q

When is there causation?

A

-the cause comes before the effect
-the idea of causation makes sense
-no outside factors that could cause the relationship
-be careful before claiming causation
26
Q

Transforming data

A

-adjust the data so the fit is linear
-quadratic regression
-response transform
-Box-Cox transformation
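
One possible response transform using scipy's Box-Cox; the response values are invented, and Box-Cox requires them to be positive:

```python
import numpy as np
from scipy import stats

y = np.array([1.2, 2.5, 3.1, 7.8, 15.0, 30.5])   # positive responses
y_transformed, lam = stats.boxcox(y)              # also estimates lambda
print(lam, y_transformed)                         # transformed response is closer to linear/normal
```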
27
Q

Variable interaction

A

ex: a 2-year-old's height at adulthood. If both parents are tall, maybe the kid will be even taller, i.e., their heights interact.

-y = a_0 + a_1 x_1 + a_2 x_2 + a_3 (x_1 x_2)
-the interaction term is a new column of data that we can use as a new input x_3
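
A minimal sketch of building the interaction column x_3 = x_1 * x_2; the height numbers are invented:

```python
import numpy as np

x1 = np.array([70.0, 62.0, 68.0, 74.0])   # e.g. one parent's height
x2 = np.array([65.0, 60.0, 63.0, 70.0])   # e.g. the other parent's height
x3 = x1 * x2                              # interaction term, a new input
X = np.column_stack([x1, x2, x3])         # use all three as predictors
print(X.shape)                            # (4, 3)
```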
28
Q

p-value of a coefficient

A

-estimates the probability that the coefficient is really 0
-a form of hypothesis testing
-if the p-value > 0.05, the factor can be removed from the model
-other thresholds can be used
-higher thresholds: more factors can be included, with the possibility of including an irrelevant factor
-lower thresholds: fewer factors can be included, with the possibility of leaving out a relevant factor
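
A sketch of inspecting coefficient p-values with statsmodels; the synthetic data are an assumption, with x2 deliberately unrelated to the response:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                    # noise, unrelated to the response
y = 3.0 + 2.0 * x1 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.pvalues)   # x1's p-value is tiny; x2's is likely > 0.05, so removable
```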
29
Q

p-value warnings

A

-with large amounts of data, p-values get small even when attributes are not at all related to the response
-p-values are only probabilities, even when meaningful
-ex: with 100 attributes each having a p-value of 0.02, each has a 2% chance of not being significant, so we expect about 2 that are not really relevant
30
Q

Confidence interval

A

where the coefficient probably lies and how close it is to 0
31
Q

T-statistic

A

the coefficient divided by its standard error
-related to the p-value
32
Q

Interpreting a coefficient

A

-sometimes you discover that the coefficient, when multiplied by the attribute, still doesn't make much of a difference even if the p-value is very low
-ex: estimating household income with age as one of the attributes. If age's coefficient is 1, then even with a low p-value the attribute really isn't very important; it's unlikely to make even a $100 difference
33
Q

R-squared value (coefficient of determination)

A

-an estimate of how much variability your model accounts for
-ex: R-squared = 59% means the model accounts for about 59% of the variability in the data
-the remaining 41% is either randomness or other factors
34
Q

Adjusted R-squared

A

R-squared adjusted for the number of attributes used
35
Q

Interpreting R-squared: what is a good value?

A

-some things aren't easily modeled
-many factors can affect real-life systems, especially when humans are involved
-so R-squared values of 0.4 or 0.3 can be quite good
36
Q

What is the null hypothesis?

A

the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error
37
Q

R-squared formula

A

R^2 = 1 - SSE_residuals / SSE_total
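
A direct translation of the formula; the actual and predicted values are made up:

```python
import numpy as np

y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # actual responses
y_hat = np.array([2.1, 4.1, 6.0, 8.0, 10.0])  # model predictions

sse_residuals = np.sum((y - y_hat) ** 2)      # variability the model misses
sse_total = np.sum((y - y.mean()) ** 2)       # total variability in the data
r_squared = 1 - sse_residuals / sse_total
print(r_squared)                              # close to 1 for these near-perfect predictions
```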