Linear Regression Flashcards

1
Q

What is linear regression?

A

A tool for describing the relationships between multiple interval-scale (numeric) variables

2
Q

In linear regression, what kind of variables are the outcome and predictors?

A

The outcome and predictors are both numeric

We can have multiple predictors

3
Q

What is the fundamental idea for linear regression?

A

Fit the best regression line to the data, then try to understand that line

4
Q

What is the formula for a linear regression line?

A

Ŷ = b0 + b1X

where b0 is the intercept and b1 is the slope (with the residual included: Y = b0 + b1X + ε)
5
Q

Changing the intercept of the regression line does what to the line?

A

Raises or lowers the regression line.

6
Q

Changing the slope of the regression line does what to the regression line?

A

Alters the steepness of the regression line

7
Q

How do we know what the best regression line is?

A

The principle of ‘least squares’

8
Q

What is the principle of least squares?

A

The best regression line for data (X,Y) is the one that minimises the sum of the squared deviations between the predicted values and the actual values

This is referred to as the residual sum of squares
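
In symbols, the line chosen is the one that minimises

  SSres = Σ (Yi − Ŷi)²

where Ŷi is the value the line predicts for observation i.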

9
Q

How do you estimate a simple linear regression model in R?

A

Like ANOVA, it is done in stages

  1. lm() estimates the values of b0, b1, etc.
  2. summary() runs some hypothesis tests

The lm() function

This is the main “workhorse” function

It creates an “lm” object (i.e. a variable), which contains many quantities of interest relating to the regression
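
A minimal sketch of the two stages (the data frame and variable names are hypothetical):

  # stage 1: lm() estimates the intercept b0 and slope b1
  mod <- lm(outcome ~ predictor, data = dataset)
  coef(mod)      # the estimated coefficients

  # stage 2: summary() runs the hypothesis tests
  summary(mod)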

10
Q

What is a regression with only a single predictor called?

A

Simple linear regression

Mostly the same thing as a Pearson correlation

11
Q

In real life we usually expect that multiple variables could predict our outcome. What is this called?

A

Multiple linear regression

12
Q

The slope terms and intercept are called _____

A

Regression coefficients

13
Q

What is the test statistic in a multiple linear regression?

A

F statistic

14
Q

What is the sampling distribution of the test statistic if the null hypothesis is true in multiple linear regression?

A

F distribution

Exactly analogous to ANOVA

15
Q

What does the null hypothesis of the regression F test (analogous to ANOVA) predict?

A

No relationship between the predictors and outcome

(all slope parameters are zero)

16
Q

What does the alternative hypothesis of the regression F test (analogous to ANOVA) predict?

A

The relationship between the predictors and outcome matches the regression model

17
Q

What are the two sources of variance in a regression model?

(assume a model with K predictor variables)

A

SSmod (the variation in the outcome accounted for by the regression model) and SSres (the residual variation)
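
These feed into the F statistic from the earlier card in the usual way: F = MSmod / MSres, where MSmod = SSmod / K and MSres = SSres / (N − K − 1) for N observations and K predictors.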

18
Q

How do you test the null hypothesis that a specific predictor variable has no relationship to the outcome?

A

Use a t-test
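
In R, these per-coefficient t-tests appear in the Coefficients table printed by summary() for an lm model.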

19
Q

What is the effect size measure for multiple linear regression?

A

R²

20
Q

How do you interpret the R² effect size measure?

A

The proportion of the variance in the outcome variable that is accounted for by the regression model

(R² = 1 − SSres / SStot, where SStot is the total sum of squares)
21
Q

How do you do a multiple linear regression in R?

A
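
Use lm() with more than one predictor on the right-hand side of the model formula, then summary() on the result. A minimal sketch (hypothetical variable names):

  # predictors are combined with '+' in the model formula
  mod <- lm(outcome ~ predictor1 + predictor2, data = dataset)

  # coefficient t-tests, the F test and R-squared
  summary(mod)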
22
Q

What does the stat block for the regression model look like?

A
23
Q

How do you do regression analysis when the predictors are on different scales?

A

Using standardised regression coefficients

  • Convert all the variables to standard scores before running the regression, so all are on the same scale
  • Usually denoted as β (beta)
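
A minimal sketch of this in R (hypothetical variable names), converting each variable to z-scores with scale() before fitting:

  # standardise the outcome and all predictors, then fit as usual;
  # the resulting slopes are the standardised coefficients (betas)
  mod_std <- lm(scale(outcome) ~ scale(predictor1) + scale(predictor2),
                data = dataset)
  summary(mod_std)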
24
Q

What are some of the dangers of using regression analysis?

A

1) Outliers

(An observation with a very large residual; the model's prediction for it is way off)

2) High leverage points

(An observation whose values on the predictors are very different from the other observations'; its residual might still be small)

3) High influence points

(An outlier that also has high leverage; these are the dangerous ones)
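
One way to screen for these in R (a sketch; these diagnostic functions are in base R, and the variable names are hypothetical):

  mod <- lm(outcome ~ predictor1 + predictor2, data = dataset)
  hatvalues(mod)        # leverage of each observation
  rstandard(mod)        # standardised residuals; large values flag outliers
  cooks.distance(mod)   # influence: combines outlyingness and leverage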

25
Q

Does regression (or correlation) imply causation?

A

No

The variables are not controlled and no intervention has been made, so causation cannot be inferred