Exam 1 Flashcards
(40 cards)
Why use models?
To understand the relationships between variables
To predict future outcomes
To quantify differences between groups or treatments
Response variable
the variable that you want to understand/model/predict. aka - y, dependent variable
explanatory variables
the variables you already know and suspect are related to the response variable, used to figure out a pattern/model/relationship. aka - x, independent variables, predictor variables, covariates
model
a function that combines explanatory variables mathematically into estimates of the response variable
error
what’s left over; the variability in the response that your model doesn’t capture (error is somewhat of a misnomer, maybe noise is a better term)
Categorical Data
Not numerical; values fall into categories (two outcomes is the special case of a binary variable)
Quantitative variables
Numerical
Parameter
Describes entire population
Statistic
Describes sample
The four-step process
- Choose
- Fit
- Assess
- Use
Model Notation
Y = f(X) + e
ybar or xbar
averages
yhat
estimate
Y = ? (Simple Linear Regression)
Beta0 + Beta1*X + e
Yhat = ? (Simple Linear Regression)
Beta0hat + Beta1hat*X (fitted coefficients, no error term)
Naive Model
Mean + Error
Age = Agebar + e
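A minimal sketch of the naive model, assuming a small made-up list of ages (the data and variable names are illustrative only). Every prediction is the sample mean, and the error is whatever is left over for each observation.

```python
# Hypothetical sample of ages; not from any real dataset.
ages = [19, 21, 20, 24, 26]

# The naive model predicts the same value for everyone: the sample mean.
age_bar = sum(ages) / len(ages)

# Error for each observation: Age - Agebar.
errors = [age - age_bar for age in ages]

print(age_bar)
print(errors)
```

The errors around the mean always add up to zero, which is why the mean is the natural baseline prediction.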
Residuals
How far each point is from the prediction line, measured vertically
y - yhat (observed minus predicted)
Least Squares
Technique to minimize SSE
The sum of all squared residuals is at a minimum
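A rough sketch of least squares for simple linear regression, assuming made-up data and the standard closed-form formulas for the slope and intercept. The function names `least_squares` and `sse` are hypothetical helpers, not from any library.

```python
# Hypothetical data; x and y are illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def least_squares(x, y):
    """Return (b0, b1), the intercept and slope that minimize SSE."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def sse(b0, b1, x, y):
    """Sum of squared residuals for the line yhat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares(x, y)

# Nudging the fitted line in any direction should not lower SSE.
best = sse(b0, b1, x, y)
nudged = [sse(b0 + 0.1, b1, x, y), sse(b0, b1 + 0.1, x, y),
          sse(b0 - 0.1, b1, x, y), sse(b0, b1 - 0.1, x, y)]
print(b0, b1, best, min(nudged))
```

Comparing the fitted line against a few nudged lines is only a spot check, not a proof, but it shows what "minimum" means here.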
SSE
SSE = ∑(y − yhat)^2
Regression Standard Error
σhat = sqrt(SSE / (n − 2))
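A small worked sketch of the regression standard error, assuming four made-up observations and fitted values from the line yhat = 0.5 + 2.3*x at x = 1, 2, 3, 4. The helper name `regression_standard_error` is hypothetical.

```python
import math

# Hypothetical observed values and their fitted values.
y = [3.0, 5.0, 7.0, 10.0]
yhat = [2.8, 5.1, 7.4, 9.7]


def regression_standard_error(y, yhat):
    """sqrt(SSE / (n - 2)); n - 2 because two coefficients were estimated."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    return math.sqrt(sse / (n - 2))


sigma_hat = regression_standard_error(y, yhat)
print(sigma_hat)
```

Here SSE is about 0.30 and n - 2 = 2, so the result is roughly sqrt(0.15). Note the parentheses: the whole quantity n - 2 divides SSE.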
Linearity
The relationship between X and Y follows a straight line; the residual plot shows no curved pattern
Independence
Residuals do not depend on each other or on time (the order of the observations). No trend or pattern as the plot goes on
Normality of Residuals
The residuals follow a normal distribution centered at zero: symmetric, with no strong skew or heavy tails.
Equal Variance of Residuals (homoskedasticity)
The residuals have the same spread at every value of X (or of yhat); no fanning out or narrowing.