Regression Flashcards

1
Q

What questions can regression answer?

A

How do systems work?
-ex: how many runs the average home run is worth
-effects of economic factors on a presidential election

Make predictions about what will happen in the future
-height in the future
-price of oil in the future
-housing demand in the next 6 months

2
Q

Simple Linear Regression

A

-one predictor
-y = response
-x = predictor

Equation:

y = a_0 + a_1 x_1
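
A minimal sketch of fitting this equation by least squares, using the closed-form slope and intercept; the data values are invented for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # response

a1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # slope: cov(x, y) / var(x)
a0 = y.mean() - a1 * x.mean()                   # intercept
print(a0, a1)  # roughly 0.14 and 1.96 for this data
```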

3
Q

general linear regression equation

A

with m predictors:
y = response
x_j = predictors

y = a_0 + \sum_{j=1}^{m} a_j x_j
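
A hedged sketch of fitting the general equation by ordinary least squares; the synthetic data and the true coefficient values are assumptions for illustration:

```python
import numpy as np

n, m = 100, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n, m))                    # n observations, m predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

X1 = np.column_stack([np.ones(n), X])          # prepend a column of 1s for a_0
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(coeffs)                                  # approximately [1.0, 2.0, -1.0, 0.5]
```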

4
Q

How do you measure the quality of a regression line’s fit?

A

the sum of squared errors

-the squared distances between the true responses and our estimates, summed over all data points

5
Q

simple linear regression prediction error

A

y_i = actual
\hat{y}_i = prediction

y_i - \hat{y}_i, or y_i - (a_0 + a_1 x_{i1})

6
Q

Sum of squared errors equation

A

\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
or
\sum_{i=1}^{n} (y_i - (a_0 + a_1 x_{i1}))^2
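
A small illustration of computing the SSE; the data and the coefficients a0 and a1 are placeholders standing in for whatever a previous fit produced:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a0, a1 = 0.14, 1.96                      # coefficients from a previous fit

y_hat = a0 + a1 * x                      # predictions
sse = np.sum((y - y_hat) ** 2)           # sum of (y_i - y_hat_i)^2
print(sse)                               # about 0.09 for this data
```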

7
Q

What is the best-fit simple linear regression (SLR) line?

A

the line that minimizes the sum of squared errors
-defined by the coefficients a_0 and a_1

8
Q

How do we measure the quality of a model's fit?

A

likelihood

9
Q

What is likelihood? What is maximum likelihood?

A

-likelihood measures the probability (density) of the observed data under any given parameter set; we assume the observed data are the correct values and that we have information about the variance

-maximum likelihood: the parameter set that gives the observed data the highest probability

10
Q

What is Maximum Likelihood Estimation (MLE)? What are you minimizing to calculate it?

A

the set of parameters that minimizes the sum of squared errors

z_i = observations
y_i = model estimates

minimize \sum_{i=1}^{n} (z_i - y_i)^2
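
A sketch of why MLE and least squares coincide, assuming Gaussian errors with a known standard deviation (sigma and the data are my assumptions, not from the card): the log-likelihood is a constant minus SSE/(2 sigma^2), so minimizing SSE maximizes likelihood:

```python
import numpy as np
from scipy import stats

z = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # observations
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])     # model estimates
sigma = 0.2                                   # assumed error standard deviation

log_lik = np.sum(stats.norm.logpdf(z, loc=y, scale=sigma))
sse = np.sum((z - y) ** 2)
# the two printed values agree: log-likelihood = const - SSE/(2 sigma^2)
print(log_lik, -len(z) / 2 * np.log(2 * np.pi * sigma**2) - sse / (2 * sigma**2))
```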

11
Q

Maximum likelihood in the context of linear regression

A

linear regression: y = a_0 + \sum_{j=1}^{m} a_j x_j
sum of squared errors: \sum_{i=1}^{n} (z_i - y_i)^2

substitute the regression equation for y_i in the sum of squared errors:

minimize \sum_{i=1}^{n} (z_i - (a_0 + \sum_{j=1}^{m} a_j x_{ij}))^2

12
Q

How can you use likelihood to compare two different models?

A

the likelihood ratio

13
Q

Akaike Information Criterion equation. What is the penalty term and what does it do?

A

L^*: maximum likelihood value
k: number of parameters we're investigating

AIC = 2k - 2\ln(L^*)

Penalty term (2k): balances likelihood with simplicity
-helps avoid overfitting
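
A minimal helper computing AIC from a maximized log-likelihood; the function name and example values are illustrative, not from the cards:

```python
def aic(log_lik: float, k: int) -> float:
    """AIC = 2k - 2 ln(L*), where log_lik is ln(L*)."""
    return 2 * k - 2 * log_lik

print(aic(log_lik=-12.3, k=3))  # 2*3 + 24.6 = 30.6
```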

14
Q

AIC with regression? Do you want AIC to be smaller or larger?

A

substitute the regression maximum likelihood equation; the number of parameters is m+1 (m coefficients plus the constant a_0)

-we prefer models with smaller AIC; a smaller AIC encourages fewer parameters and higher likelihood

15
Q

corrected AIC

A

-AIC works well if we have infinitely many data points
-this never happens

-so we add a correction term:

AIC_c = AIC + 2k(k+1)/(n-k-1)
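
The same idea as a small helper with the correction term added; names and example values are again illustrative:

```python
def aicc(aic_value: float, k: int, n: int) -> float:
    """AICc = AIC + 2k(k+1) / (n - k - 1)."""
    return aic_value + 2 * k * (k + 1) / (n - k - 1)

print(aicc(aic_value=30.6, k=3, n=50))  # 30.6 + 24/46, about 31.12
```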

16
Q

Comparing models with AIC

A

if AIC_1 < AIC_2, the relative likelihood that the higher-AIC model is better:
e^{(AIC_1 - AIC_2)/2}
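
A one-liner applying the formula to two made-up AIC values; with AIC_1 < AIC_2 the result is the relative likelihood favoring the larger-AIC model:

```python
import math

aic1, aic2 = 30.6, 33.2                 # made-up AIC values, aic1 smaller
rel_lik = math.exp((aic1 - aic2) / 2)
print(rel_lik)                          # about 0.27
```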

17
Q

Bayesian Information Criterion (BIC)

A

L^*: maximum likelihood value
k: number of parameters we're investigating
n: number of data points

BIC = k\ln(n) - 2\ln(L^*)
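
A BIC helper mirroring the AIC one above; names and example values are illustrative:

```python
import math

def bic(log_lik: float, k: int, n: int) -> float:
    """BIC = k ln(n) - 2 ln(L*), where log_lik is ln(L*)."""
    return k * math.log(n) - 2 * log_lik

print(bic(log_lik=-12.3, k=3, n=50))  # 3*ln(50) + 24.6, about 36.3
```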

18
Q

AIC vs. BIC

A

BIC's penalty term (k ln n) is larger than AIC's (2k) whenever n > 7
-BIC encourages models with fewer parameters than AIC does

-only use BIC when there are more data points than parameters

19
Q

BIC comparison between 2 models on the same dataset

A

if |BIC_1 - BIC_2| > 10, the smaller-BIC model is very likely to be better

if between 6 and 10, the smaller-BIC model is likely better

if between 2 and 6, somewhat likely better

if between 0 and 2, slightly likely better
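
These thresholds could be wrapped in a small helper; the category strings paraphrase the card and are not standard terminology:

```python
def bic_evidence(bic1: float, bic2: float) -> str:
    """Map |BIC_1 - BIC_2| to the card's evidence categories."""
    diff = abs(bic1 - bic2)
    if diff > 10:
        return "smaller-BIC model very likely better"
    if diff > 6:
        return "smaller-BIC model likely better"
    if diff > 2:
        return "somewhat likely better"
    return "slightly likely better"

print(bic_evidence(36.3, 41.0))  # diff 4.7: somewhat likely better
```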

20
Q

Is there a hard and fast rule for choosing between AIC, BIC, or maximum likelihood?

A

No; all three can give valuable information, and looking at all three can help you decide which model is best.

21
Q

Regression coefficients for predictions and forecasting

A

the response increases by the coefficient * the variable

in other words, if the variable = 1, that increases the response by the coefficient amount (descriptive)

if we are forecasting:
-same thing, but the coefficient is interpreted as how much the response will increase when the variable = 1 (predictive)

22
Q

Which of the components of analytics can regression be used for?

A

Descriptive and predictive analytics
not prescriptive

23
Q

Causation

A

one thing causes another thing

24
Q

correlation

A

two things tend to happen together or not together
- they don’t nescessarily cause each other

25
Q

When is there causation?

A

-the cause comes before the effect
-the idea of causation makes sense
-no outside factors that could cause the relationship
-be careful before claiming causation
26
Q

Transforming data

A

-adjust the data so the fit is linear
-quadratic regression
-response transform
-Box-Cox transformation
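
One possible response transform using scipy's Box-Cox; the response values are invented, and Box-Cox requires them to be positive:

```python
import numpy as np
from scipy import stats

y = np.array([1.2, 2.5, 3.1, 7.8, 15.0, 30.5])   # positive responses
y_transformed, lam = stats.boxcox(y)              # also estimates lambda
print(lam, y_transformed)                         # transformed response is closer to linear/normal
```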
27
Q

Variable interaction

A

ex: a 2-year-old's height at adulthood. If both parents are tall, maybe the kid will be even taller, i.e., their heights interact.

-y = a_0 + a_1 x_1 + a_2 x_2 + a_3 (x_1 x_2)
-the interaction term is a new column of data that we can use as a new input x_3
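
A minimal sketch of building the interaction column x_3 = x_1 * x_2; the height numbers are invented:

```python
import numpy as np

x1 = np.array([70.0, 62.0, 68.0, 74.0])   # e.g. one parent's height
x2 = np.array([65.0, 60.0, 63.0, 70.0])   # e.g. the other parent's height
x3 = x1 * x2                              # interaction term, a new input
X = np.column_stack([x1, x2, x3])         # use all three as predictors
print(X.shape)                            # (4, 3)
```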
28
Q

p-value of a coefficient

A

-estimates the probability that the coefficient is really 0
-a form of hypothesis testing
-if the p-value > 0.05, the factor can be removed from the model
-other thresholds can be used
-higher thresholds: more factors can be included, with the possibility of including an irrelevant factor
-lower thresholds: fewer factors can be included, with the possibility of leaving out a relevant factor
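
A sketch of inspecting coefficient p-values with statsmodels; the synthetic data are an assumption, with x2 deliberately unrelated to the response:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                    # noise, unrelated to the response
y = 3.0 + 2.0 * x1 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.pvalues)   # x1's p-value is tiny; x2's is likely > 0.05, so removable
```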
29
Q

p-value warnings

A

-with large amounts of data, p-values get small even when attributes are not at all related to the response
-p-values are only probabilities, even when meaningful
-ex: with 100 attributes each having a p-value of 0.02, each has a 2% chance of not being significant, so we expect about 2 that are not really relevant
30
Q

Confidence interval

A

where the coefficient probably lies and how close it is to 0
31
Q

T-statistic

A

the coefficient divided by its standard error
-related to the p-value
32
Q

Interpreting a coefficient

A

-sometimes you discover that the coefficient, when multiplied by the attribute, still doesn't make much of a difference even if the p-value is very low
-ex: estimating household income with age as one of the attributes. If age's coefficient is 1, then even with a low p-value the attribute really isn't very important; it's unlikely to make even a $100 difference
33
Q

R-squared value (coefficient of determination)

A

-an estimate of how much variability your model accounts for
-ex: R-squared = 59% means the model accounts for about 59% of the variability in the data
-the remaining 41% is either randomness or other factors
34
Q

Adjusted R-squared

A

R-squared adjusted for the number of attributes used
35
Q

Interpreting R-squared: what is a good value?

A

-some things aren't easily modeled
-many factors can affect real-life systems, especially when humans are involved
-so R-squared values of 0.4 or 0.3 can be quite good
36
Q

What is the null hypothesis?

A

the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error
37
Q

R-squared formula

A

R^2 = 1 - SSE_residuals / SSE_total
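
A direct translation of the formula; the actual and predicted values are made up:

```python
import numpy as np

y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # actual responses
y_hat = np.array([2.1, 4.1, 6.0, 8.0, 10.0])  # model predictions

sse_residuals = np.sum((y - y_hat) ** 2)      # variability the model misses
sse_total = np.sum((y - y.mean()) ** 2)       # total variability in the data
r_squared = 1 - sse_residuals / sse_total
print(r_squared)                              # close to 1 for these near-perfect predictions
```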