Week 5 Flashcards
(26 cards)
What kinds of questions can regression answer?
How do systems work? Value of a home run, effect of economic factors on presidential elections, impact of education on income, key factors in car purchasing
What will happen in the future? How tall a child will be, oil price prediction, remaining lifetime of person purchasing life insurance
Simple regression equation
y = a0 + a1x1
How to measure the quality of a line’s fit
Minimize sum of squared errors
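Example (a minimal sketch, not from the original cards): fitting y = a0 + a1x1 by minimizing the sum of squared errors with the standard closed-form formulas. The data values and the function names `fit_simple_regression` and `sum_squared_errors` are made up for illustration.

```python
def fit_simple_regression(xs, ys):
    """Return (a0, a1) that minimize the sum of squared errors for y = a0 + a1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form least-squares slope: covariance term over variance term.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    return a0, a1


def sum_squared_errors(xs, ys, a0, a1):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only to illustrate the calculation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a0, a1 = fit_simple_regression(xs, ys)
sse_best = sum_squared_errors(xs, ys, a0, a1)
# Any other line, such as one with a slightly different slope, has a larger SSE.
sse_other = sum_squared_errors(xs, ys, a0, a1 + 0.1)
print(a0, a1, sse_best, sse_other)
```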
Most basic measure of quality
Maximum likelihood
MLE
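The set of parameters that makes the observed data most likely. When the errors are independent and normally distributed, this is the same set that minimizes the sum of squared errors.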
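Why maximum likelihood and least squares agree here (a sketch under the assumption, not stated on the card, that errors are independent and normally distributed with mean 0 and constant variance \sigma^2; x_i and y_i denote the i-th observed values of x1 and y, and n is the number of data points):

```latex
\begin{align*}
L(a_0, a_1) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{(y_i - a_0 - a_1 x_i)^2}{2\sigma^2}\right) \\
\ln L(a_0, a_1) &= -\frac{n}{2}\ln(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2
\end{align*}
```

The first term does not depend on a0 or a1, so maximizing the likelihood is the same as minimizing the sum of squared errors.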
How can likelihood be used
To compare two models' performance
Commonly used functions for comparing models
AIC (Akaike information criterion)
BIC (Bayesian information criterion)
What values to prefer for AIC and BIC
AIC (prefer smaller numbers)
BIC (prefer smaller numbers)
Making AIC smaller encourages
fewer parameters k and higher likelihood
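Example (a minimal sketch using the standard definition AIC = 2k - 2 ln(L*), where L* is the maximized likelihood; the log-likelihood values below are hypothetical):

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2*ln(L*): penalizes parameters k, rewards higher likelihood."""
    return 2 * k - 2 * log_likelihood


# Hypothetical models: the larger one fits only slightly better.
aic_small = aic(log_likelihood=-120.0, k=3)
aic_large = aic(log_likelihood=-119.5, k=6)

# The extra parameters cost more than the small gain in likelihood,
# so the smaller model has the lower (preferred) AIC.
print(aic_small, aic_large)
```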
AIC has nice properties if
there are infinitely many data points
How to deal with AIC with smaller datasets
AICc (corrected AIC)
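Example (a minimal sketch of the usual small-sample correction, AICc = AIC + 2k(k+1)/(n - k - 1), with n data points and k parameters; the input values are hypothetical):

```python
def aicc(log_likelihood, k, n):
    """Corrected AIC for small samples; approaches plain AIC as n grows."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    plain_aic = 2 * k - 2 * log_likelihood
    return plain_aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical model with k = 3 parameters and log-likelihood -120.
small_n = aicc(log_likelihood=-120.0, k=3, n=20)      # noticeable correction
large_n = aicc(log_likelihood=-120.0, k=3, n=10_000)  # correction nearly vanishes
print(small_n, large_n)
```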
True or false: use BIC when there is a lot more data than parameters.
True
Rule of thumb for BIC
|BIC1 - BIC2| > 10: smaller-BIC model is very likely better
6 < |BIC1 - BIC2| < 10: smaller-BIC model is likely better
2 < |BIC1 - BIC2| < 6: smaller-BIC model is somewhat likely better
0 < |BIC1 - BIC2| < 2: smaller-BIC model is slightly likely better
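Example (a minimal sketch of the rule of thumb above, using the standard definition BIC = k ln(n) - 2 ln(L*); the function names and inputs are hypothetical, and placing a difference of exactly 2, 6, or 10 in the weaker band is an assumption, since the rule uses strict inequalities):

```python
import math


def bic(log_likelihood, k, n):
    """BIC = k*ln(n) - 2*ln(L*), with k parameters and n data points."""
    return k * math.log(n) - 2 * log_likelihood


def compare_bic(bic1, bic2):
    """Describe how strongly the smaller-BIC model is preferred."""
    diff = abs(bic1 - bic2)
    if diff > 10:
        return "very likely better"
    if diff > 6:
        return "likely better"
    if diff > 2:
        return "somewhat likely better"
    if diff > 0:
        return "slightly likely better"
    return "no preference"


# Hypothetical models fitted to the same n = 200 data points.
bic_a = bic(log_likelihood=-120.0, k=3, n=200)
bic_b = bic(log_likelihood=-119.5, k=6, n=200)
print(bic_a, bic_b, compare_bic(bic_a, bic_b))
```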
AIC takes what kind of point of view
Frequentist
Regression can’t answer what kind of problem
prescriptive
causation
one thing causes another thing
correlation
two things tend to happen or not happen together. Neither of them might cause the other.
p value
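Estimates the probability that the coefficient might really be 0. More precisely, it is the probability of seeing an estimate at least this far from 0 if the true coefficient were 0.
If p value > 0.05, remove the corresponding attribute from the model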
Other p value thresholds
higher thresholds: More factors can be included with possibility of including irrelevant factor
lower thresholds: fewer factors can be included with possibility of leaving out a relevant factor
Warnings about p values
With large amounts of data, p values get small even when attributes are not related to the response
Even when they are meaningful, p values are only probabilities, not guarantees
Confidence interval
Where the coefficient probably lies and how close it is to 0
T statistic
the coefficient divided by its standard error
related to p value
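Example (a rough sketch connecting the t statistic, p value, and confidence interval; it assumes a large sample so that a normal approximation can stand in for the t distribution, and the coefficient and standard error are hypothetical):

```python
from statistics import NormalDist


def t_statistic(coef, std_err):
    """Coefficient divided by its standard error."""
    return coef / std_err


def approx_p_value(t):
    """Two-sided p value, using a normal approximation (large-sample assumption)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def approx_confidence_interval(coef, std_err, level=0.95):
    """Approximate interval where the coefficient probably lies (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return coef - z * std_err, coef + z * std_err


# Hypothetical regression output for one attribute.
coef, std_err = 2.0, 0.5
t = t_statistic(coef, std_err)
p = approx_p_value(t)
low, high = approx_confidence_interval(coef, std_err)
# A large |t| gives a small p value and an interval that stays away from 0.
print(t, p, (low, high))
```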
Coefficient
If the coefficient multiplied by the attribute value doesn't make much difference to the response, the attribute may not matter in practice, even with a very low p value
R squared
estimate how much variability your model accounts for.
If the R squared value is 59%, then the model accounts for 59% of the variability in the data, and the remaining 41% is randomness or other factors
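Example (a minimal sketch using the standard formula R squared = 1 - SS_residual / SS_total; the data and the fitted line are hypothetical):

```python
def r_squared(ys, predictions):
    """Fraction of the variability in ys that the predictions account for."""
    y_mean = sum(ys) / len(ys)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Hypothetical data and predictions from the line y = 0.05 + 1.99*x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
predictions = [0.05 + 1.99 * x for x in xs]

r2 = r_squared(ys, predictions)
# Predicting the mean for every point accounts for none of the variability.
r2_mean_only = r_squared(ys, [sum(ys) / len(ys)] * len(ys))
print(r2, r2_mean_only)
```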