Regression, GLMs and beyond Flashcards

1
Q

What test do we use if we want to consider the relationship between continuous predictor and response variables?

A

Regression
(Or correlation)

2
Q

What does the line of best fit in least squares regression do?

A

Minimises the squared deviations of the datapoints from the line
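A minimal R sketch (the data values are made up) showing what lm() minimises:

  x <- c(1, 2, 3, 4, 5)
  y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

  fit <- lm(y ~ x)     # least squares fit
  coef(fit)            # intercept and gradient of the line of best fit
  sum(resid(fit)^2)    # the quantity the fit has minimised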

3
Q

Should you just calculate the line of best fit without looking at the data?

A

No. Lots of different patterns of data can return the same line of best fit (Anscombe's quartet is the classic example); there is no substitute for plotting the data

4
Q

What does a correlation coefficient tell us?

A

The strength (and direction) of the association between two variables

5
Q

What is the correlation coefficient symbol?

A

Rho (ρ) for the population correlation; the sample estimate is often written as r

6
Q

What values can the correlation coefficient take?

A

Between -1 and 1

7
Q

What does a correlation coefficient of 1 tell us?

A

Our data lies along a perfect straight line with a positive gradient

8
Q

What does a correlation coefficient of -1 tell us?

A

Our data lies along a perfect straight line with a negative gradient

9
Q

Does the correlation coefficient tell us anything about the gradient of the line?

A

No (apart from its sign, which gives the direction); it tells us how tightly the datapoints lie along a line, not how steep that line is

10
Q

What is Pearson’s correlation for?

A

Linear relationships between two continuous variables

11
Q

Non-parametric equivalent of Pearson’s correlation

A

Spearman’s rank correlation

Can be used when the relationship is not linear
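A quick R sketch (made-up data) comparing the two tests on the same pair of variables:

  set.seed(1)
  x <- rnorm(30)
  y <- x^3 + rnorm(30, sd = 0.5)          # monotonic but not linear

  cor.test(x, y, method = "pearson")      # linear association (Pearson's r)
  cor.test(x, y, method = "spearman")     # rank-based (Spearman's rho)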

12
Q

General Linear Model for a categorical predictor

A

Y = A0 + Bi + e

Y = variable you’re predicting (response)
A0 = constant (intercept)
Bi = effect of level i of the categorical predictor (one B term per level: B1, B2, …)
e = error (normally distributed)

13
Q

General Linear Model for continuous predictors

A

Y = A0 + A1x1 + A2x2 + e

Y = variable you’re predicting (response)
A0 = constant (intercept)
A1 = gradient of the relationship with predictor variable x1
A2 = gradient of the relationship with predictor variable x2
e = error (normally distributed)

14
Q

General Linear Model for both categorical and continuous predictors

A

Y = A0 + A1x1 + Bi + e

Y = variable you’re predicting (response)
A0 = constant (intercept)
A1 = gradient of the relationship with continuous predictor x1
Bi = effect of level i of the categorical predictor (B1, B2, …)
e = error (normally distributed)
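A sketch of how these three model forms map onto R's lm() formula syntax; the data frame and variable names (y, x1, x2, group) are invented for illustration:

  set.seed(1)
  d <- data.frame(x1 = rnorm(60), x2 = rnorm(60),
                  group = factor(rep(c("a", "b", "c"), 20)))
  d$y <- 2 + 0.5 * d$x1 - d$x2 + as.numeric(d$group) + rnorm(60)

  lm(y ~ group, data = d)        # categorical predictor:   Y = A0 + Bi + e
  lm(y ~ x1 + x2, data = d)      # continuous predictors:   Y = A0 + A1x1 + A2x2 + e
  lm(y ~ x1 + group, data = d)   # both together:           Y = A0 + A1x1 + Bi + e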

15
Q

What is the test statistic for a GLM?

A

F ratio

F = treatment mean square / error mean square

(explained variation (signal) / unexplained variation (noise))
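A small R sketch (made-up data) showing where the F ratio appears:

  set.seed(2)
  d <- data.frame(group = factor(rep(c("a", "b"), each = 10)))
  d$y <- ifelse(d$group == "a", 5, 7) + rnorm(20)

  fit <- lm(y ~ group, data = d)
  anova(fit)   # F value = treatment mean square / residual (error) mean square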

16
Q

What does (Intercept) tell us on the R summary output for a regression model?

A

The y-intercept of the reference (first) group’s line, i.e. the group that isn’t mentioned by name lower down in the summary

The y-intercept is the number in the Estimate column on this row

17
Q

How to find the intercept for the other line?

A

Look for the row named after the variable (data$variable…) with the level of the group you are looking for appended

The number in the Estimate column on that row is the difference between that group’s y-intercept and the reference group’s y-intercept; add it to the (Intercept) estimate to get the other line’s intercept

18
Q

What does the middle row of the R output tell us? (data$lnMass in the lecture slides)

A

The gradient of the relationship between the two variables

In the lecture slides it is the gradient of the relationship between ln(Mass) and ln(brain size) (body mass and brain size)

When there is no interaction, the gradient is the same for both lines
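A sketch imitating the lecture example; the simulated values, the grouping variable and the lnBrain name are assumptions, only the lnMass naming follows the slides:

  set.seed(3)
  data <- data.frame(lnMass = rnorm(40, mean = 5),
                     group  = factor(rep(c("other", "primates"), 20)))
  data$lnBrain <- 1 + 0.7 * data$lnMass +
                  ifelse(data$group == "primates", 0.8, 0) + rnorm(40, sd = 0.2)

  fit <- lm(lnBrain ~ lnMass + group, data = data)
  summary(fit)
  ## (Intercept)   = intercept of the reference group's line ("other")
  ## lnMass        = shared gradient of lnBrain against lnMass
  ## groupprimates = difference between the primate intercept and the reference intercept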

19
Q

How to calculate the amount of variation in the data that we have explained by the model

A

R^2 = 1 - (residual sum of squares / total sum of squares)

This is because if we calculate how much of the variation the model hasn’t explained, we can subtract this from 1 to find out how much it has explained

The closer R^2 is to 1, the better the model (as more of the variation has been explained by the model)
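A quick R check (made-up data) that the hand calculation matches the R² reported by summary():

  set.seed(4)
  x <- rnorm(30)
  y <- 2 + 3 * x + rnorm(30)
  fit <- lm(y ~ x)

  rss <- sum(resid(fit)^2)       # residual sum of squares (unexplained variation)
  tss <- sum((y - mean(y))^2)    # total sum of squares
  1 - rss / tss                  # R^2 by hand
  summary(fit)$r.squared         # the same value reported by R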

20
Q

What is interpolation?

A

Making predictions within the range of the data that we have

Usually a reasonable thing to do

21
Q

What is extrapolation?

A

Making predictions outside of the range of the data that we have

Usually unreliable; be wary of extrapolation
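A sketch (made-up data, with x observed between 0 and 10) contrasting the two kinds of prediction:

  set.seed(5)
  x <- runif(30, min = 0, max = 10)
  y <- 1 + 2 * x + rnorm(30)
  fit <- lm(y ~ x)

  predict(fit, newdata = data.frame(x = 5))     # interpolation: inside 0-10, usually reasonable
  predict(fit, newdata = data.frame(x = 100))   # extrapolation: far outside the data, treat with caution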

22
Q

Simpson’s Paradox

A

When data from several groups are pooled together, the overall trend can differ from (or even reverse) the trend within each group, creating an apparent relationship that isn’t real or hiding one that is

Including the right explanatory (grouping) factors helps us understand the relationship and see whether it changes when the data are split into groups

23
Q

What happens if our relationship isn’t linear?

A

Transform the data to make the relationship linear

Use polynomial terms in the GLM

You have to include the linear term as well as the higher-order polynomial term(s)
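A sketch of a quadratic model in R; the data are simulated for illustration:

  set.seed(6)
  x <- seq(-3, 3, length.out = 50)
  y <- 1 + 2 * x - 1.5 * x^2 + rnorm(50)

  fit <- lm(y ~ x + I(x^2))   # linear term kept alongside the polynomial term
  summary(fit)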

24
Q

What if the relationship is not well described by polynomials?

A

Generalised linear models allow response variables to have errors which are not normally distributed (for example binomial or Poisson errors)

25
Q

Logistic regression

A

A particular form of generalised linear model that uses a binomial error distribution (with a logit link); often used for binary data, such as survival (alive/dead)
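A sketch of a logistic regression in R; the dose/survival variables are invented for illustration:

  set.seed(7)
  dose <- runif(100, min = 0, max = 10)
  survived <- rbinom(100, size = 1, prob = plogis(-2 + 0.6 * dose))   # 0 = died, 1 = survived

  fit <- glm(survived ~ dose, family = binomial)   # binomial GLM with a logit link
  summary(fit)   # coefficients are on the log-odds scale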

26
Q

Fixed effects

A

An explanatory variable where…

  • The level of the explanatory variable is meaningful
  • We wish to draw inferences about the effects of that particular level of the explanatory variable on the response variable
  • If you repeated the experiment, it is possible to repeat exactly the same levels of that variable
27
Q

Random effects

A

An explanatory variable where…

  • The level of the explanatory variable is not meaningful (difference between being participant 1 and participant 5)
  • We do not need to draw inferences about the effect of a specific level of the explanatory variable on the response variable
  • If you repeated the experiment, you wouldn’t be able to repeat exactly the same levels of the variable, but you would be able to draw new levels (or individuals) from the same population
28
Q

What does independence of error mean?

A

Knowing something about the error associated with one datapoint tells you nothing about the error associated with any other datapoint

29
Q

What is independence of error an assumption of?

A

General linear models

30
Q

Mixed models

A

Allow us to include both random and fixed explanatory variables in our model

31
Q

By identifying an explanatory variable as _____ we can fit models which accurately account for the different sources of variation in the dataset

A

Random

32
Q

What can we use to decide whether an explanatory factor should be included in the model? (to determine the importance of the different effects in mixed models)

A

Likelihood ratio tests

33
Q

What do we mean by likelihood?

A

The probability of observing our data, given the model

34
Q

What does one likelihood tell us?

A

Not a lot, but comparing likelihoods can tell us a lot

The model with the higher likelihood is the better-fitting model

35
Q

How to determine whether an explanatory factor is important…

A
  1. Fit a model with lmer() containing all the explanatory factors of interest
  2. Fit a new model with lmer() leaving out one of the explanatory factors
  3. Run a likelihood ratio test (see the sketch after this list) to determine if there is a significant difference in likelihood between the two models
  4. If removing the factor does make a significant difference to the likelihood of the model, that is evidence that the factor is important
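A sketch of this procedure using the lme4 package; the variable names (y, x, participant) and the simulated data are assumptions:

  library(lme4)
  set.seed(8)
  d <- data.frame(participant = factor(rep(1:10, each = 8)), x = rnorm(80))
  d$y <- 2 + 0.5 * d$x + rnorm(10)[d$participant] + rnorm(80, sd = 0.3)

  full    <- lmer(y ~ x + (1 | participant), data = d)   # step 1: all factors of interest
  reduced <- lmer(y ~ 1 + (1 | participant), data = d)   # step 2: drop the factor being tested
  anova(reduced, full)   # steps 3-4: likelihood ratio test; a significant difference
                         # is evidence that x is important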
36
Q

Random intercepts model

A

Assumes that the relationship between the two variables has the same slope for every person, but the intercept of the line can differ from person to person

37
Q

Random slopes and random intercepts model

A

Assumes that both the slope and the intercept of the relationship can differ from person to person
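A sketch of the two model formulas in lme4 syntax; the variable names and simulated data are assumptions:

  library(lme4)
  set.seed(9)
  d <- data.frame(participant = factor(rep(1:10, each = 8)), x = rnorm(80))
  d$y <- 2 + 0.5 * d$x + rnorm(10)[d$participant] + rnorm(80, sd = 0.3)

  m_int    <- lmer(y ~ x + (1 | participant), data = d)        # random intercepts only
  m_slopes <- lmer(y ~ x + (1 + x | participant), data = d)    # random slopes and intercepts
  anova(m_int, m_slopes)   # does allowing random slopes improve the model?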

38
Q

Argument for using the simpler model…

A

Sometimes we just want to find the simplest model that explains the data

We can examine whether including the random slopes makes a difference to the model and leave it out if it doesn’t

39
Q

Argument for using the more complicated model…

A

If we want to test whether something really has an effect on something else, it is better to keep the random-effects structure maximal, as this gives the best representation of real life and should give the “best” answer