Linear Regression Flashcards

simple linear regression, multiple linear regression, model selection, diagnostics

1
Q

What is regression?

A

A way to study relationships between variables.

2
Q

What are the two main reasons we’d use regression?

A
  • description and explanation (genuine interest in the nature of the relationship between variables)
  • prediction (using variables to predict others)
3
Q

What are linear regression models?

A
  • contain explanatory variable(s) which help us explain or predict the behaviour of the response variable
  • assume constantly increasing or decreasing relationships between each explanatory variable and the response
4
Q

What structure does a linear model have?

A

response = intercept + (slope x explanatory variable) + error

yi = β0 + β1xi + εi
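A minimal Python sketch of this structure (numpy assumed; the parameter values and noise level are made up for illustration):

import numpy as np

rng = np.random.default_rng(1)

beta0, beta1 = 2.0, 0.5          # made-up intercept and slope
x = np.linspace(0, 10, 50)       # explanatory variable
eps = rng.normal(0, 1, x.size)   # error: Normal, zero mean
y = beta0 + beta1 * x + eps      # response = intercept + slope*x + error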

5
Q

What is the intercept of a linear model?

A

β0

  • the value of the response when the explanatory variable(s) are 0
  • where the regression line cuts the vertical axis
6
Q

What is the slope of a linear model?

A

β1, gradient of the regression line

7
Q

What is the error term of a linear model?

A

εi

  • not all data follow the relationship exactly
  • εi allows for deviations
  • normally distributed in the y dimension (zero mean, variance is estimated as part of the fitting process)
8
Q

What is the Least Squares (LS) criterion?

A
  • can be used to fit the regression
  • finds parameters that minimise:

Σ (data - model)^2
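A sketch of the criterion in Python (numpy assumed; made-up data). np.polyfit with deg=1 performs exactly this minimisation:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def rss(b0, b1):
    # the LS criterion: sum of (data - model)^2 for a candidate line
    return np.sum((y - (b0 + b1 * x)) ** 2)

b1_hat, b0_hat = np.polyfit(x, y, deg=1)   # the parameters minimising rss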

9
Q

What is a residual?

A

The vertical distance between the observed data and the best fit line.

10
Q

How is the slope estimated?

A

β1(hat) = (Σ (xi-x̄) * yi) / (Σ (xi-x̄)^2)

x̄ is the mean of the explanatory variable

11
Q

How is the intercept estimated?

A

β0(hat) = y̅ - (β1(hat) * x̄)

x̄ is the mean of the explanatory variable
y̅ is the mean of the response
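A sketch of both estimators in Python (numpy assumed; made-up data):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

xbar, ybar = x.mean(), y.mean()

beta1_hat = np.sum((x - xbar) * y) / np.sum((x - xbar) ** 2)   # slope
beta0_hat = ybar - beta1_hat * xbar                            # intercept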

12
Q

How is the variance estimate calculated?

A

s^2 = (1/(n - k - 1))*Σ (yi - yi(hat))^2

n is number of observations, k is number of slope parameters estimated
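A sketch in Python (numpy assumed; made-up data; k = 1 slope parameter for simple regression):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1, b0 = np.polyfit(x, y, deg=1)           # least squares fit
fitted = b0 + b1 * x

n, k = len(y), 1
s2 = np.sum((y - fitted) ** 2) / (n - k - 1)   # variance estimate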

13
Q

How do we work out how much of the total observed variation has been explained?

A

Work out the proportion of unexplained variation and subtract it from 1:

R^2 = 1 - ((Σ(yi - yi(hat))^2)/(Σ(yi - y̅)^2))

R^2 = 1 - (SSerror/SStotal)

numerator: error sum of squares
denominator: total sum of squares
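A sketch in Python (numpy assumed; made-up data):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x

ss_error = np.sum((y - fitted) ** 2)       # unexplained variation
ss_total = np.sum((y - y.mean()) ** 2)     # total variation
r2 = 1 - ss_error / ss_total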

14
Q

What is the definition of the best line?

A

The one that minimises the residual sum of squares.

15
Q

What are the main reasons to use multiple covariates?

A
  • description (interest in finding relationships between such variables)
  • prediction (knowledge of some will help us predict others)
16
Q

What is added to a simple regression model to make it a multiple regression model?

A

More explanatory variables (of the form βp*xpi).

17
Q

What model is used for the noise of a multiple regression model?

A

Normal distribution, 0 mean, variance σ^2.

18
Q

What are dummy variables?

A
  • switch on (x=1) or off (x=0) depending on the level of the factor variable
  • the first level of the group acts as the baseline; the rest switch on when applicable (n-1 dummy variables for n levels), as in the sketch below
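A minimal sketch of the n-1 coding in Python (numpy assumed; the factor and its levels are made up):

import numpy as np

factor = np.array(["a", "b", "c", "b", "a"])   # made-up 3-level factor

# level "a" is the baseline; the other n-1 = 2 levels get dummies
dummy_b = (factor == "b").astype(int)          # switches on where level is "b"
dummy_c = (factor == "c").astype(int)          # switches on where level is "c"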
19
Q

What is parameter inference?

A

In order to make general statements about model parameters we can generate ranges of plausible values for these parameters and test “no-relationship” hypotheses.

20
Q

What test statistic value is used when calculating the confidence intervals for slope parameters?

A

t(α/2, df=N-P-1)

N: total number of observations
P: number of explanatory variables fitted in the model
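A sketch in Python (scipy assumed; N, P, the estimate and its standard error are made-up values):

from scipy import stats

N, P = 50, 2                     # made-up sample size and predictor count
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=N - P - 1)

beta_hat, se = 0.8, 0.1          # made-up slope estimate and standard error
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)   # 95% CI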

21
Q

What is the null hypothesis for parameter inference?

A

H0: βp = 0

H1: βp ≠ 0

22
Q

What is the equation for the adjusted R^2?

A

Adjusted R^2 = 1 - ((N - 1)*(1 - R^2)/(N - P - 1))

N: total number of observations
P: number of explanatory variables fitted in the model
R^2: squared correlation
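A sketch in plain Python (made-up values):

N, P, r2 = 50, 3, 0.62           # made-up sample size, predictor count, R^2

adj_r2 = 1 - (N - 1) * (1 - r2) / (N - P - 1)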

23
Q

What is the standard error for the prediction on xp (xp any value)?

A

se(y(hat)) = sqrt(MSE * ((1/n) + ((xp - x̄)^2 / Σ(xi - x̄)^2)))

MSE: mean square error/residual from ANOVA table
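A sketch in Python (numpy assumed; made-up data and xp):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1, b0 = np.polyfit(x, y, deg=1)
n = len(y)
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # residual mean square

xp = 3.5                                           # any value of interest
se_yhat = np.sqrt(mse * (1 / n + (xp - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)))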

24
Q

Why do we want an appropriate number of covariates in our model? What happens if there are too many/few? What if the model is too simple/complex?

A

too few: throw away valuable information
non-essential variables: standard errors and p-values tend to be too large
too simple/complex: model will have poor predictive ability

25
Q

What happens when collinear variables are put together in a model?

A
  • model is unstable
  • inflated standard errors

26
Q

What are Variance Inflation Factors (VIFs)?

A

They detect collinearity.

VIF = 1/(1 - R^2)

R^2: obtained by regressing the covariate in question on the other covariates
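A sketch in Python (numpy assumed): two deliberately collinear made-up covariates, with R^2 obtained by regressing one on the other:

import numpy as np

rng = np.random.default_rng(2)

x1 = rng.normal(size=100)                        # made-up covariate
x2 = x1 + rng.normal(scale=0.1, size=100)        # nearly a copy of x1

b1, b0 = np.polyfit(x2, x1, deg=1)               # regress x1 on x2
resid = x1 - (b0 + b1 * x2)
r2 = 1 - np.sum(resid ** 2) / np.sum((x1 - x1.mean()) ** 2)

vif = 1 / (1 - r2)                               # large value flags collinearity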

27
Q

How should variables be removed?

A

One at a time.

28
Q

How does p-value based model selection work?

A
  • for covariates with a single associated coefficient, retention can be based on the associated p-value (a large p-value suggests omission)
29
Q

What type of regression models does the F-test work on? What can we use for other models?

A

Nested models. Can use AIC or BIC on both nested and non-nested models.

30
Q

What is Akaike’s Information Criterion (AIC)?

A

The smaller the AIC value, the better the model.

AIC = -2*log-likelihood value + 2P

P: number of estimated parameters
log-likelihood: calculated using the estimated parameters in the model
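A sketch in Python (the log-likelihoods and parameter counts are made-up values):

# made-up models: (log-likelihood, number of estimated parameters)
models = {"m1": (-120.3, 3), "m2": (-118.9, 5)}

aic = {name: -2 * ll + 2 * p for name, (ll, p) in models.items()}
best = min(aic, key=aic.get)     # smaller AIC = better model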

31
Q

What is AICc?

A

Used when the sample size isn't much larger than the number of parameters in the model.

AICc = AIC + (2P(P + 1))/(N - P - 1)

as N » P, AICc -> AIC

32
Q

What is BIC?

A

Differs from AIC by employing a penalty that changes with the sample size (N).

BIC = -2*log-likelihood value + log(N)*P

33
Q

What values of BIC represent a better model?

A

Smaller BIC values.

34
Q

How are AIC weights calculated?

A

Δi(AIC) = AICi - minimum AIC

wi(AIC) = exp{-(1/2)Δi(AIC)} / (Σk exp{-(1/2)Δk(AIC)})
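A sketch in Python (numpy assumed; made-up AIC values):

import numpy as np

aic = np.array([210.2, 212.5, 215.1])   # made-up AIC values

delta = aic - aic.min()                 # Δi(AIC)
w = np.exp(-0.5 * delta)
w /= w.sum()                            # wi(AIC); the weights sum to 1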

35
Q

What is interaction?

A
  • similar to ‘synergy’ in chemistry: a non-additive effect (eg A = +10, B = +20, A+B = -10)
  • if the interaction term is significant, the p-values associated with the main effects are irrelevant
  • interactions should always come last in the sequence of predictors
36
Q

What values can R^2 take?

A

Between 0 and 1.

37
Q

What assumptions do we make about the errors of a linear model?

A

We assume one Normal distribution provides the (independent) noise.

38
Q

How do we assess Normality?

A
  • qualitative assessment from plotting (histogram of residuals, QQ-norm plot)
  • formal test of normality (Shapiro-Wilk)
39
Q

What do QQ-norm plots tell us? And how are they formed?

A
  • plot quantiles of two sets of data against one another
  • shapes are similar -> get straight line (y=x) -> data normally dist.
  • residuals in ascending order, standardised (divide by the standard deviation), plotted against quantiles of the Normal distribution
40
Q

How is the ith point on a QQ-norm plot found?

A

p(i) = i/(n+1)
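A sketch of the construction in Python (numpy and scipy assumed; made-up residuals):

import numpy as np
from scipy import stats

resid = np.array([0.3, -1.2, 0.8, -0.1, 1.5, -0.9])   # made-up residuals

n = len(resid)
sample_q = np.sort(resid) / resid.std(ddof=1)   # ascending order, standardised
p = np.arange(1, n + 1) / (n + 1)               # p(i) = i/(n+1)
theory_q = stats.norm.ppf(p)                    # Normal quantiles to plot against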

41
Q

What is the Shapiro-Wilk test?

A
  • tests for normality
  • H0: data are normally distributed

42
Q

What is the Breusch-Pagan test?

A
  • a formal test of the constant error variance assumption (H0: the error variance is constant)
  • a model which satisfies the assumption would produce a residual plot with a horizontal line
43
Q

How do we assess independence?

A
  • Durbin-Watson test (H0: uncorrelated errors)
  • independence can be violated in ways that cannot be tested (eg pseudoreplication)

44
Q

How can we tell which variable causes non-linearity in the signal?

A

Use partial (residual) plots. These are found by adding the estimated relationship (for pth predictor βp*xpi) to the residuals (ri) of the model.

45
Q

When do we bootstrap (for linear regression models)?

A
  • horrible distribution of residuals
  • reasonably happy with the signal model
  • independence isn't an issue
46
Q

What values can correlation take?

A

The correlation coefficient (r) can take values between -1 and 1; the extremes correspond to data lying exactly on a decreasing or increasing straight line.

47
Q

How is the significance of r calculated?

A

t = r*sqrt(n - 2) / sqrt(1 - r^2)
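A sketch in Python (numpy and scipy assumed; made-up r and n):

import numpy as np
from scipy import stats

r, n = 0.6, 30                                  # made-up correlation and sample size

t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_value = 2 * stats.t.sf(abs(t), df=n - 2)      # two-sided p-value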

48
Q

Causation implies correlation. True/False?

A

True, but not the other way around (correlation does not imply causation).