Flashcards in Unit 1: Simple Linear Regression Deck (16):

1

## Why do we use simple linear regression?

### To model a response variable Y as a linear function of a single predictor variable X

2

## What is Covariance (SXY)?

###
Covariance describes the joint behavior of two random variables (X and Y).

Its sign indicates the direction of the linear relationship, but its magnitude does not tell us the strength because it depends on the units of X and Y.
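A minimal sketch of the unit dependence, using made-up height and weight values (the data and the helper name `sample_covariance` are illustrative assumptions, not part of the deck). Rescaling X from metres to centimetres multiplies the covariance by 100 without changing the relationship itself.

```python
def sample_covariance(x, y):
    """Sample covariance S_XY with the usual n - 1 denominator."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data: heights in metres, weights in kilograms.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [50.0, 58.0, 66.0, 75.0, 86.0]

cov_m = sample_covariance(height_m, weight_kg)

# The same heights expressed in centimetres.
height_cm = [h * 100 for h in height_m]
cov_cm = sample_covariance(height_cm, weight_kg)

print(cov_m, cov_cm)  # same sign, very different magnitudes
```

The sign is positive in both cases, but the magnitude changes with the units, so the size of the covariance alone says nothing about strength.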

3

## What is the correlation coefficient (R) and what does it tell us?

### The correlation coefficient (R) measures the linear relationship between two quantitative variables and falls between -1 and 1. Its sign gives the direction of the relationship and its magnitude gives the strength: values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.
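As a rough illustration, the sketch below computes R on made-up data (the helper name `pearson_r` and the values are assumptions for this example). It shows R reaching its bounds for exact lines and staying unchanged when X is rescaled.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: S_XY divided by the product of the standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_noisy = [2, 4, 5, 4, 5]  # hypothetical noisy response

r_up = pearson_r(x, [2 * xi + 1 for xi in x])     # exact increasing line
r_down = pearson_r(x, [10 - 3 * xi for xi in x])  # exact decreasing line
r_noisy = pearson_r(x, y_noisy)
r_rescaled = pearson_r([100 * xi for xi in x], y_noisy)  # X in different units

print(r_up, r_down, r_noisy, r_rescaled)
```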

4

## What is the coefficient of determination (R²)? What can it tell you about the linear relationship?

###
The coefficient of determination (R²) = SSM/SST, where SSM is the model sum of squares and SST is the total sum of squares.

It is the proportion of the variability in y explained by the linear association with x. It falls between 0 and 1.

It can tell you the strength of the relationship but not the direction.
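A small sketch of the SSM/SST calculation on made-up data (the numbers are illustrative assumptions). It also checks the decomposition SST = SSM + SSE, which is why SSM/SST equals 1 - SSE/SST.

```python
# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variability in y
ssm = sum((fi - y_bar) ** 2 for fi in fitted)            # explained by the line
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # left unexplained

r_squared = ssm / sst
print(r_squared)
```

Because R² is a ratio of sums of squares, it is never negative, which is why it carries no information about direction.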

5

## If the covariance of two variables = 0, what can you say about the independence of the variables?

### You cannot conclude that the variables are independent just because the covariance is 0. A covariance of 0 only tells you that there is no linear relationship between the variables. If two variables are KNOWN to be independent, then their covariance equals 0, but the reverse does not hold: you cannot assume independence when the covariance is 0.
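A quick counterexample, sketched with made-up values: Y is completely determined by X (Y = X squared), yet the sample covariance is 0 because the relationship is not linear.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# X is symmetric around 0 and Y depends on X exactly.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

cov_xy = sample_covariance(x, y)
print(cov_xy)
```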

6

## What is Fisher's Z Transformation?

###
It is a variance-stabilizing transformation, z = arctanh(r) = ½ ln((1 + r)/(1 - r)), that allows you to construct a confidence interval for any ρ. It can also be used to indirectly test the null hypothesis that ρ = ρ0 (the true correlation equals a hypothesized value) for any ρ0 not equal to 0.

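The sample correlation r is less variable when ρ is near the boundaries (-1 and 1), so its variance depends on ρ; this is why a variance-stabilizing transformation is needed.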
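A minimal sketch of how the transformation is typically used, assuming the standard large-sample result that arctanh(r) is approximately normal with standard error 1/sqrt(n - 3). The function names and the values r = 0.8, n = 30, ρ0 = 0.5 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z."""
    z = math.atanh(r)                      # transform r to the z scale
    se = 1 / math.sqrt(n - 3)              # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def fisher_test_stat(r, n, rho0):
    """Approximate standard normal statistic for H0: rho = rho0."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)

low, high = fisher_ci(0.8, 30)
z_stat = fisher_test_stat(0.8, 30, 0.5)
print(low, high, z_stat)
```

The interval is not symmetric around r once it is transformed back: it extends further toward 0 than toward 1.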

7

## Residuals

### Estimated error = observed Y - fitted (predicted) Y
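A short sketch of the residual calculation. The data are made up, and the coefficients `b0` and `b1` are assumed to be the least squares estimates for these data.

```python
# Hypothetical data and the least squares line fitted to them.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # assumed to be the fitted intercept and slope

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(residuals)
```

For a least squares fit that includes an intercept, the residuals sum to zero (up to floating point rounding).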

8

## What are the hypotheses for the overall F test for SLR?

###
H0: β1 = 0 (the slope of X is 0, so the intercept-only model is adequate)

H1: β1 ≠ 0 (the slope of X is not 0, so the model with X fits better)
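A rough sketch of the F statistic for these hypotheses, computed as MSM/MSE on made-up data. In SLR the model has 1 degree of freedom and the error has n - 2, and the F statistic equals the square of the t statistic for the slope.

```python
import math

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssm = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

msm = ssm / 1          # model mean square, 1 degree of freedom
mse = sse / (n - 2)    # error mean square, n - 2 degrees of freedom
f_stat = msm / mse

# t statistic for the slope; its square matches the F statistic in SLR.
t_stat = b1 / math.sqrt(mse / sxx)
print(f_stat, t_stat ** 2)
```

Here F = 4.5 on (1, 3) degrees of freedom, which is below the 5% critical value of about 10.13, so this tiny sample would not reject H0.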

9

## What are the assumptions for SLR?

###
Linearity

Independence

Normality of Error

Errors are homoskedastic

10

## What do violations of SLR assumptions look like?

###
Curved shape in the residual plot (nonlinearity)

Fanning shape in the residual plot

Heteroskedasticity of the residuals (non-constant variance)
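To illustrate the curved shape, the sketch below fits a straight line to made-up data that actually follow Y = X squared. The residuals come out positive at both ends and negative in the middle, which is the pattern a residual plot would show.

```python
# Hypothetical data with a quadratic relationship.
x = [1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Residuals from the (misspecified) straight-line fit.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(residuals)
```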

11

## What do we do when assumptions are violated?

###
For minor deviations with a large n, proceed with the analysis, because inference is robust to minor deviations from the assumptions.

For major violations, consider variable transformations or adding higher order polynomial terms.

For clear trends, consider adding predictors (MLR)

For heteroskedasticity, consider advanced regression techniques

12

## What causes the Coefficient of Determination (R²) to increase?

###
Increase in SSM

Increase in MSM

Decrease in SSE

Decrease in Residual Variance (σ²)

Stronger Linear relationship between X and Y

13

## What causes the Coefficient of Determination (R²) to decrease?

###
Decrease in SSM

Decrease in MSM

Increase in SSE

Increase in Residual Variance (σ²)

Weaker linear relationship between X and Y

14

## What are outliers?

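### Outliers are observations that lie far from the rest of the data. They include high-leverage points (unusual X values) and influential points (points that substantially change the fitted line).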

15

## Why do we use method of least squares?

###
"Closed form" solution

The estimates of β0 and β1 are identical to the Maximum Likelihood Estimates (MLE) when the errors are normally distributed

The estimates are unbiased and have the smallest possible variance among linear unbiased estimators
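The closed form can be sketched as slope = S_XY / S_XX and intercept = mean of Y minus slope times mean of X. The function names and data below are illustrative assumptions; the last lines check that nudging the coefficients away from the least squares values increases the sum of squared errors.

```python
def fit_line(x, y):
    """Closed-form least squares estimates (intercept, slope) for SLR."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
sse_best = sum_squared_errors(x, y, b0, b1)

# Any other line gives a larger SSE; two nearby lines as a spot check.
sse_steeper = sum_squared_errors(x, y, b0, b1 + 0.1)
sse_shifted = sum_squared_errors(x, y, b0 + 0.1, b1)
print(b0, b1, sse_best, sse_steeper, sse_shifted)
```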

16