Exam 1 Flashcards by Owen Foster

Why use models?

To understand the relationships between variables

To predict future outcomes

To quantify differences between groups or treatments

Response variable

the variable that you want to understand/model/predict. aka - y, dependent variable

explanatory variables

the variables you know and think may be related to the response variable, which you use to figure out a pattern/model/relationship. aka - x, independent variable, predictor variable, covariates

model

a function that combines explanatory variables mathematically into estimates of the response variable

error

what’s left over; the variability in the response that your model doesn’t capture (error is somewhat of a misnomer – maybe noise is a better term)

Categorical Data

Values fall into categories (two or more groups), not numerical

Quantitative variables

Numerical

Parameter

Describes entire population

Statistic

Describes sample

The four-step process

Choose
Fit
Assess
Use

Model Notation

Y = f(X) + e

ybar or xbar

sample averages (means) of the y values and the x values

yhat

the model's estimate (predicted value) of y

Y = ? (Simple Linear Regression)

Beta0 + Beta1*X + e

Yhat = ? (Simple Linear Regression)

Beta0 + Beta1*X

Naive Model

Mean + Error

Age = Agebar + e
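
A minimal sketch of the naive model, assuming a handful of made-up ages (not course data): every observation gets the same prediction, the sample mean, and the error is whatever is left over.

```python
from statistics import mean

# Hypothetical ages, invented for illustration only.
ages = [21, 25, 30, 34, 40]

# The naive model predicts the sample mean for every observation.
age_bar = mean(ages)

# e = Age - Agebar: the part of each age the naive model does not capture.
errors = [age - age_bar for age in ages]

print("Agebar:", age_bar)
print("errors:", errors)
```

The errors around the mean always sum to zero, which is why the mean is the natural baseline to compare other models against.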

Residuals

How far the points are from the prediction line

y - yhat (observed minus predicted)
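
A small illustration of residuals, using the common convention of observed minus predicted. The data points and the line yhat = 2.2 + 0.6x are made-up values for this sketch, not from the course.

```python
# Hypothetical data and a hypothetical fitted line (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

# Predicted value on the line for each x.
y_hat = [b0 + b1 * xi for xi in x]

# Residual = observed y minus predicted y.
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

for xi, e in zip(x, residuals):
    print(f"x={xi}: residual={e:+.2f}")
```

A positive residual means the point sits above the line; a negative one means it sits below.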

Least Squares

Technique to minimize SSE
The sum of all squared residuals is at a minimum
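
A sketch of least squares for a single predictor, assuming the standard closed-form solution (slope = Sxy / Sxx, intercept = ybar - slope * xbar). The helper names `least_squares` and `sse` and the toy data are invented for this example.

```python
from statistics import mean

def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals for y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(x, y, b0, b1):
    """Sum of squared residuals for the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)

# Nudging the slope away from the least-squares value makes SSE larger.
best = sse(x, y, b0, b1)
worse = sse(x, y, b0, b1 + 0.1)
print(f"b0={b0:.2f}, b1={b1:.2f}, SSE={best:.2f}, SSE with nudged slope={worse:.2f}")
```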

SSE

SSE = ∑(y − yhat)^2

Regression Standard Error

σhat = sqrt(SSE / (n - 2))
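
A short worked example of SSE and the regression standard error, reading the formula as sqrt(SSE / (n - 2)). The data are made up, and b0 = 2.2, b1 = 0.6 are the least-squares coefficients for these particular points.

```python
import math

# Hypothetical toy data with its least-squares line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# SSE: add up the squared residuals.
sse = sum(e ** 2 for e in residuals)

# Regression standard error: divide by n - 2 because two coefficients were estimated.
n = len(y)
sigma_hat = math.sqrt(sse / (n - 2))

print(f"SSE={sse:.2f}, sigma_hat={sigma_hat:.4f}")
```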

Linearity

The relationship between x and y follows a line; the residuals show no curved pattern when plotted against the fitted values

Independence

Residuals do not depend on one another or on time (the order in which the data were collected). They show no trend as the plot goes on

Normality of Residuals:

The residuals follow a normal distribution: symmetric around zero, with no skewness or excess kurtosis.

Equal Variance of Residuals (homoskedasticity):

Residuals have the same variance across all values of x (the spread does not grow or shrink along the plot).

Standardized Residual

ei / σhat = (yi - yhati) / σhat. If its absolute value is greater than 3, the point is considered an outlier
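
A minimal sketch of the "greater than 3" rule, applied to the absolute value of each residual divided by σhat. The residuals and σhat below are invented numbers standing in for part of a larger hypothetical fit, and `flag_outliers` is a made-up helper name.

```python
def standardized(residuals, sigma_hat):
    """Divide each residual by the regression standard error."""
    return [e / sigma_hat for e in residuals]

def flag_outliers(residuals, sigma_hat, cutoff=3.0):
    """Return the positions whose standardized residual exceeds the cutoff in size."""
    z = standardized(residuals, sigma_hat)
    return [i for i, zi in enumerate(z) if abs(zi) > cutoff]

# Hypothetical values: a few residuals from a larger fit, and that fit's sigma_hat.
residuals = [0.5, -1.2, 0.3, 4.1, -0.7]
sigma_hat = 1.25

flagged = flag_outliers(residuals, sigma_hat)
print("flagged positions:", flagged)
```

Only the fourth residual (4.1 / 1.25 = 3.28) passes the cutoff here.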

Leverage

Points that have extreme x values can have a disproportionate influence on the slope of the regression line

Hypothesis Testing

H0: B1 = 0; HA: B1 ≠ 0

Test Statistic

t = B1hat / SE

Confidence Interval for Slope

Beta1hat +/- t* × SE
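
A sketch that ties the slope test and the slope interval together on made-up data. It assumes the usual simple-regression formula SE = σhat / sqrt(∑(x - xbar)^2), which is not stated on these cards, and takes t* = 3.182 from a t-table (95% confidence, n - 2 = 3 degrees of freedom).

```python
import math
from statistics import mean

# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Least-squares fit.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Regression standard error, then the standard error of the slope (assumed formula).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(sse / (n - 2))
se_b1 = sigma_hat / math.sqrt(sxx)

# Test statistic for H0: B1 = 0.
t_stat = b1 / se_b1

# t* read from a t-table: 95% confidence, 3 degrees of freedom.
t_star = 3.182
ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)

print(f"t = {t_stat:.3f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this toy case |t| is smaller than t*, so the interval contains 0 and H0 is not rejected at the 5% level; the test and the interval always agree in this way.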

Coefficient of determination

R^2: the proportion of the variability in the response that is explained by the model

Partitioning variability

ANOVA: (yi - ybar) = (yhati - ybar) + (yi - yhati)

SST

∑(yi - ybar)^2

SSM

∑(yhat-ybar)^2

SST, SSM, SSE Relationship

SST = SSM + SSE

R^2 =

SSM/SST
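
A quick numerical check of the partition and of R^2, on made-up data. The identity SST = SSM + SSE holds for a least-squares fit, so the sketch uses b0 = 2.2 and b1 = 0.6, which are the least-squares coefficients for these points.

```python
# Hypothetical toy data with its least-squares line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variability
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the model
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left over

r_squared = ssm / sst

print(f"SST={sst:.2f}, SSM={ssm:.2f}, SSE={sse:.2f}, R^2={r_squared:.2f}")
```

Because SST = SSM + SSE, the same R^2 can also be computed as 1 - SSE/SST.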

Confidence Interval

yhat +/- t* × σhat × sqrt(1/n + (x* - xbar)^2 / ∑(x - xbar)^2), an interval for the mean response at x = x*

Prediction Interval

yhat +/- t* × σhat × sqrt(1 + 1/n + (x* - xbar)^2 / ∑(x - xbar)^2), an interval for a single new response at x = x*
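
A sketch comparing the two intervals at the same x*, assuming the usual form yhat +/- t* × σhat × (square-root term). The function name `interval_half_widths`, the toy x values, σhat, and t* = 3.182 (95%, 3 degrees of freedom, from a t-table) are all assumptions made for this example.

```python
import math
from statistics import mean

def interval_half_widths(x, sigma_hat, t_star, x_new):
    """Return (confidence half-width, prediction half-width) at x_new."""
    n = len(x)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    core = 1 / n + (x_new - x_bar) ** 2 / sxx
    ci_half = t_star * sigma_hat * math.sqrt(core)       # mean response
    pi_half = t_star * sigma_hat * math.sqrt(1 + core)   # single new response
    return ci_half, pi_half

# Hypothetical inputs.
x = [1, 2, 3, 4, 5]
sigma_hat = math.sqrt(0.8)
t_star = 3.182

ci_half, pi_half = interval_half_widths(x, sigma_hat, t_star, x_new=4)
print(f"confidence half-width: {ci_half:.3f}")
print(f"prediction half-width: {pi_half:.3f}")
```

The extra 1 under the square root makes the prediction interval wider than the confidence interval at every x*, and both widen as x* moves away from xbar.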

MLR

Y = B0 + B1*X1 + B2*X2 +...+Bp*Xp + e

MLR with categorical data

Parallel slopes model
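
A minimal sketch of what "parallel slopes" means, assuming one quantitative predictor and one two-level categorical predictor coded as a 0/1 indicator. The coefficients and group labels are hypothetical, chosen only to show the shape of the model.

```python
# Hypothetical coefficients for yhat = b0 + b1*x + b2*indicator.
b0, b1, b2 = 10.0, 2.0, 5.0

def predict(x, group):
    """Predicted response; the indicator is 1 for group 'B' and 0 for group 'A'."""
    indicator = 1 if group == "B" else 0
    return b0 + b1 * x + b2 * indicator

# Same slope in both groups: a one-unit change in x moves yhat by b1.
slope_a = predict(4, "A") - predict(3, "A")
slope_b = predict(4, "B") - predict(3, "B")

# Different intercepts: the lines are separated by b2 at every x.
gap = predict(3, "B") - predict(3, "A")

print(f"slope in A: {slope_a}, slope in B: {slope_b}, vertical gap: {gap}")
```

The categorical variable shifts the line up or down without changing its slope, which is why the two fitted lines are parallel.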

When is a p-value significant?

p-value < .05

Exam 1 Flashcards

(40 cards)