Chp3 Regression Flashcards
(54 cards)
Linear Regression Theory
A statistical process for estimating the relationship among variables
What are the two types of variables in regression?
Response Variable (dependent variable y)
Predictor Variables (independent variables, X)
What is regression used for?
Predicting and forecasting
What does linear regression try to do?
Fit a linear relationship between the predictors and the response using past data, then use it to predict future outcomes
What are model coefficients/parameters/weights?
The values multiplied by the input (predictor) values and summed to produce the predicted response
What is the error between the observed response Yi and the predicted response Yhat called?
Residual
Residual Formula
Yi - Yhat
observed response - predicted response
Residual Sum of Squares
RSS: the sum of the squared residuals
Residual Formula (simple linear regression)
Ei = Yi - (Bhat0 + Bhat1Xi)
Which is the best regression line?
The one that minimizes the sum of squared residuals
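A minimal sketch of the cards above, assuming a single predictor and a small made-up data set (the values of x and y are hypothetical, chosen only for illustration). It computes the least-squares estimates in closed form and checks that nudging either coefficient makes the residual sum of squares larger:

```python
# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form least-squares estimates for simple linear regression.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b1_hat = s_xy / s_xx                # estimated slope
b0_hat = y_mean - b1_hat * x_mean   # estimated intercept


def rss(b0, b1):
    """Residual sum of squares for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


best_rss = rss(b0_hat, b1_hat)

# Any other line gives a larger RSS than the least-squares line.
print(b0_hat, b1_hat, best_rss)
print(rss(b0_hat + 0.1, b1_hat) > best_rss)
print(rss(b0_hat, b1_hat - 0.1) > best_rss)
```

Each residual here is Yi minus the fitted value, so RSS is the quantity the "best" line minimizes.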
Multiple R-squared
Never decreases as you add more predictors, because each added predictor can only explain more of the variance in the data used to fit the model. A rising multiple R-squared therefore does not mean every predictor is a good predictor
Adjusted R-squared
Penalizes the number of predictors, so it reflects whether the predictors you have added are actually good predictors. As you add predictors, multiple R-squared and adjusted R-squared both go up at first, but at some point adjusted R-squared will plateau and drop; stop adding variables at that point
Adjusted R-squared shows
When adding more predictors makes the model worse
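A short sketch of the two R-squared cards, using the standard formulas R-squared = 1 - RSS/TSS and adjusted R-squared = 1 - (RSS/(n - p - 1)) / (TSS/(n - 1)), where TSS is the total sum of squares, n the number of observations, and p the number of predictors. The function names and all the numbers below are hypothetical, picked to show a weak extra predictor raising multiple R-squared while lowering adjusted R-squared:

```python
def r_squared(rss, tss):
    """Multiple R-squared: share of total variation explained by the model."""
    return 1.0 - rss / tss


def adjusted_r_squared(rss, tss, n, p):
    """Adjusted R-squared: penalizes the number of predictors p."""
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


# Hypothetical values: 20 observations, total sum of squares of 100.
n, tss = 20, 100.0

# Model A uses 2 predictors; model B adds a third, weak predictor
# that lowers RSS only slightly.
r2_a = r_squared(40.0, tss)
r2_b = r_squared(39.5, tss)
adj_a = adjusted_r_squared(40.0, tss, n, p=2)
adj_b = adjusted_r_squared(39.5, tss, n, p=3)

print(r2_a, r2_b)    # multiple R-squared goes up
print(adj_a, adj_b)  # adjusted R-squared goes down
```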
F statistic
Captures how well the model as a whole explains the response compared with a model that has no predictors; the bigger, the better
Degrees of freedom
How much wiggle room you have in your data set: the number of observations minus the number of parameters estimated
In hypothesis testing, what must be true to not reject the null hypothesis H0?
p-value > alpha
In hypothesis testing, what must be true to reject the null hypothesis H0?
p-value <= alpha
What is alpha in hypothesis testing?
The probability of rejecting the null hypothesis given that the null hypothesis is true
What is the p-value in hypothesis testing?
The probability of getting a result at least as extreme as the one you observed, given that the null hypothesis is true
What are the only two outcomes from hypothesis testing?
- Reject H0 in favor of H1
- Do not reject H0
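The decision rule from the cards above can be sketched as a small function. The name decide and the default alpha of 0.05 are assumptions for illustration (0.05 is a common convention, not something the cards specify):

```python
def decide(p_value, alpha=0.05):
    """Return one of the only two outcomes of a hypothesis test."""
    if p_value <= alpha:
        return "Reject H0 in favor of H1"
    return "Do not reject H0"


print(decide(0.03))  # p-value below alpha
print(decide(0.20))  # p-value above alpha
```

Note that the function has no branch that returns "accept"; a large p-value only means the data did not give enough evidence against H0.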
In hypothesis testing, we never accept ____
H1 (or H0): a test only rejects H0 in favor of H1 or fails to reject H0
If we are testing whether a drug has an effect, what are the null and alternative hypotheses?
Null (H0): the drug has no effect
Alternative (H1): the drug has some effect
What are the four questions to evaluate the fit of a regression model?
Is at least one of the predictors useful in predicting the response?
Do all the predictors help explain the response, or is only a subset of the predictors useful?
How well does the model fit the data?
Given a set of predictor values, what response value should we predict and how accurate is our prediction?
What is the hypothesis test to determine if at least one predictor is useful in predicting the response?
H0: all slope betas are 0 (B1 = B2 = ... = Bp = 0; the intercept B0 is not tested)
H1: at least one slope beta is nonzero
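This test is usually carried out with the F statistic, F = ((TSS - RSS) / p) / (RSS / (n - p - 1)), where TSS is the total sum of squares, RSS the residual sum of squares, n the number of observations, and p the number of predictors. A minimal sketch, with a hypothetical function name and made-up input values:

```python
def f_statistic(rss, tss, n, p):
    """F statistic for H0: all p slope coefficients are 0."""
    explained_per_predictor = (tss - rss) / p
    unexplained_per_df = rss / (n - p - 1)
    return explained_per_predictor / unexplained_per_df


# Hypothetical values: 5 observations, 1 predictor, TSS = 6.0, RSS = 2.4.
f_value = f_statistic(rss=2.4, tss=6.0, n=5, p=1)
print(f_value)
```

To turn the F value into a p-value, compare it with an F distribution with p and n - p - 1 degrees of freedom; a large F value (small p-value) is evidence that at least one predictor is useful.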