Exam 1 Flashcards

(40 cards)

1
Q

Why use models?

A

To understand the relationships between variables

To predict future outcomes

To quantify differences between groups or treatments

2
Q

Response variable

A

the variable that you want to understand/model/predict. aka - y, dependent variable

3
Q

explanatory variables

A

the variables you know and think that they are maybe related to the response variable that you want to use to figure out a pattern/model/relationship. aka - x, independent variable, predictor variable, covariates

4
Q

model

A

a function that combines explanatory variables mathematically into estimates of the response variable

5
Q

error

A

what’s left over; the variability in the response that your model doesn’t capture (“error”
is somewhat of a misnomer – maybe “noise” is a better term)

6
Q

Categorical Data

A

Not numerical; values fall into distinct categories (binary categorical data has two outcomes)

7
Q

Quantitative variables

A

Numerical

8
Q

Parameter

A

Describes entire population

9
Q

Statistic

A

Describes sample

10
Q

The four-step process

A
  1. Choose
  2. Fit
  3. Assess
  4. Use
11
Q

Model Notation

A

Y = f(X) + e

12
Q

ybar or xbar

A

averages

13
Q

yhat

A

estimate

14
Q

Y = ? (Simple Linear Regression)

A

Beta0 + Beta1*X + e

15
Q

Yhat = ? (Simple Linear Regression)

A

Beta0 + Beta1*X

16
Q

Naive Model

A

Mean + Error

Age = Agebar + e

17
Q

Residuals

A

How far the points are from the prediction line

e = y − yhat

18
Q

Least Squares

A

Technique to minimize SSE
The value of all squared residuals is at a minimum
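
A minimal sketch of the least-squares fit in Python (data invented for illustration), computing the slope, intercept, and minimized SSE from the closed-form formulas:

```python
import numpy as np

# Invented example data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates.
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

yhat = beta0 + beta1 * x
sse = np.sum((y - yhat) ** 2)  # minimized sum of squared errors
print(beta1, beta0, sse)
```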

19
Q

SSE

A

SSE = ∑(y − yhat)^2

20
Q

Regression Standard Error

A

σhat = sqrt(SSE / (n − 2))
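
A quick sketch (residuals invented): the regression standard error divides SSE by n − 2, the degrees of freedom for simple linear regression:

```python
import math

# Hypothetical residuals (y - yhat) from a fitted line; n = 5 observations.
residuals = [0.06, -0.13, 0.18, -0.21, 0.10]
n = len(residuals)

sse = sum(e ** 2 for e in residuals)  # SSE = sum of squared residuals
sigma_hat = math.sqrt(sse / (n - 2))  # divide by n - 2, not n
print(sse, sigma_hat)
```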

21
Q

Linearity

A

The scatterplot of y vs. x resembles a line; the residuals show no curved pattern

22
Q

Independence

A

Residuals do not depend on time or collection order; they don’t trend larger or smaller across the plot

23
Q

Normality of Residuals

A

The residuals are distributed approximately normally: symmetric around zero, with no strong skew or heavy tails.

24
Q

Equal Variance of Residuals (homoskedasticity)

A

The residuals have roughly constant variance across all fitted values (no fanning out or narrowing).

25
Q

Standardized residual

A

ei / σhat = (yi − yhati) / σhat. If its absolute value is greater than 3, the point is considered an outlier.

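
A minimal sketch of flagging outliers with standardized residuals (residuals and σhat invented):

```python
# Hypothetical residuals and regression standard error.
residuals = [0.1, -0.2, 1.5, -0.1, 0.2]
sigma_hat = 0.4

standardized = [e / sigma_hat for e in residuals]
outliers = [z for z in standardized if abs(z) > 3]  # |z| > 3 flags an outlier
print(standardized, outliers)
```
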
26
Q

Leverage

A

Points with extreme x values can have a disproportionate influence on the slope of the regression line

27
Q

Hypothesis Testing

A

H0: Beta1 = 0
HA: Beta1 ≠ 0

28
Q

Test Statistic

A

t = Beta1hat / SE(Beta1hat)

29
Q

Confidence Interval for Slope

A

Beta1hat ± t* · SE(Beta1hat)

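
A sketch of the slope test statistic and its confidence interval, using made-up summary numbers (Beta1hat = 1.99, σhat ≈ 0.189, Sxx = 10, and the 95% critical value t* ≈ 3.182 for n − 2 = 3 degrees of freedom):

```python
import math

# Made-up summary numbers from a fitted simple linear regression.
beta1_hat = 1.99     # estimated slope
sigma_hat = 0.18886  # regression standard error
sxx = 10.0           # sum((x - xbar)^2)

se_beta1 = sigma_hat / math.sqrt(sxx)  # standard error of the slope
t_stat = beta1_hat / se_beta1          # test statistic for H0: Beta1 = 0
t_star = 3.182                         # 95% t critical value, df = 3
ci = (beta1_hat - t_star * se_beta1, beta1_hat + t_star * se_beta1)
print(t_stat, ci)
```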
30
Q

Coefficient of determination

A

R^2: the proportion of the variability in the response explained by the model

31
Q

Partitioning variability

A

ANOVA: (yi − ybar) = (yhati − ybar) + (yi − yhati)

32
Q

SST

A

∑(yi − ybar)^2

33
Q

SSM

A

∑(yhati − ybar)^2

34
Q

SST, SSM, SSE Relationship

A

SST = SSM + SSE

35
Q

R^2 =

A

SSM/SST

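
The partition SST = SSM + SSE and R^2 = SSM/SST can be checked numerically (invented data, fit via the closed-form least-squares formulas):

```python
import numpy as np

# Invented data; fit via the closed-form least-squares formulas.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
yhat = beta0 + beta1 * x

sst = np.sum((y - y.mean()) ** 2)     # total variability
ssm = np.sum((yhat - y.mean()) ** 2)  # variability explained by the model
sse = np.sum((y - yhat) ** 2)         # leftover variability
r2 = ssm / sst
print(sst, ssm + sse, r2)             # SST equals SSM + SSE
```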
36
Q

Confidence Interval for Mean Response

A

yhat ± t* · σhat · sqrt(1/n + (x* − xbar)^2 / ∑(xi − xbar)^2)

37
Q

Prediction Interval

A

yhat ± t* · σhat · sqrt(1 + 1/n + (x* − xbar)^2 / ∑(xi − xbar)^2)

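
A sketch comparing the two interval widths at a hypothetical x* = 4 (all summary numbers are made up; t* ≈ 3.182 for 3 degrees of freedom). The prediction interval’s extra 1 under the square root makes it wider:

```python
import math

# Made-up summary numbers: n, xbar, Sxx, sigma_hat, t critical value.
n, xbar, sxx = 5, 3.0, 10.0
sigma_hat, t_star, x_star = 0.18886, 3.182, 4.0

# Half-widths of the two intervals around yhat at x*.
ci_half = t_star * sigma_hat * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)
pi_half = t_star * sigma_hat * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)
print(ci_half, pi_half)
```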
38
Q

MLR

A

Y = Beta0 + Beta1*X1 + Beta2*X2 + ... + Betap*Xp + e

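
A minimal MLR sketch using np.linalg.lstsq with two invented predictors; because y is generated with no noise, the fit recovers the coefficients exactly:

```python
import numpy as np

# Two invented predictors; y is built with known coefficients and no noise.
rng = np.random.default_rng(0)
x1 = np.arange(10, dtype=float)
x2 = rng.normal(size=10)
y = 1.0 + 2.0 * x1 - 0.5 * x2

# Design matrix with an intercept column; least-squares solve.
X = np.column_stack([np.ones_like(x1), x1, x2])
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(betas)  # close to [1.0, 2.0, -0.5]
```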
39
Q

MLR with categorical data

A

Include the categorical variable as an indicator; it shifts the intercept, giving a parallel slopes model

40
Q

When is a result statistically significant?

A

When the p-value < 0.05 (reject H0)