W4: GLM 1 Flashcards

1
Q

What does the simple linear regression equation mean?
yi = b0 + b1 * xi + ei

A
  • yi = outcome variable
  • b0 = intercept (expected y when x = 0)
  • b1 = slope of the line (how much y is expected to change for a 1-unit change in x)
  • xi = predictor / explanatory variable
  • ei = residual / error term
2
Q

In which direction does the line shift when the intercept is positive (b0 > 0)?

A

Up

3
Q

In which direction does the line shift when the intercept is negative (b0 < 0)?

A

Down

4
Q

Does changing the intercept change the slope of the line?

A

Not necessarily: the intercept only moves the line up or down, so the slope can stay the same
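
A minimal sketch of the intercept cards in plain Python (not the R used elsewhere in this deck). The helper name predict and all coefficient values are made up for illustration. It suggests that changing b0 moves every predicted value by the same amount while the slope stays put.

```python
def predict(b0, b1, x):
    """Predicted y on the line y = b0 + b1 * x (error term left out)."""
    return b0 + b1 * x

xs = [0, 1, 2, 3]
base = [predict(0.0, 2.0, x) for x in xs]      # b0 = 0
raised = [predict(1.5, 2.0, x) for x in xs]    # b0 > 0: whole line shifts up
lowered = [predict(-1.5, 2.0, x) for x in xs]  # b0 < 0: whole line shifts down

# Slope = change in predicted y for a 1-unit change in x.
slope_base = base[1] - base[0]
slope_raised = raised[1] - raised[0]
slope_lowered = lowered[1] - lowered[0]
print(slope_base, slope_raised, slope_lowered)
```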

5
Q

What is the line of best fit?

A
  • Line that minimizes sum of squared residuals
  • Gives estimates of b0 and b1
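
The idea can be sketched in plain Python with the standard closed-form least-squares solution for one predictor. The function names and the data are illustrative assumptions, not course material.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimise the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sum_sq_resid(b0, b1, xs, ys):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up data
b0, b1 = fit_line(xs, ys)
best = sum_sq_resid(b0, b1, xs, ys)
# Nudging the intercept away from the estimate makes the fit worse.
worse = sum_sq_resid(b0 + 0.5, b1, xs, ys)
```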
6
Q

What are residuals?

A

Difference between observed and predicted outcomes
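
As a small illustration in plain Python (values are made up), a residual is taken here as observed minus predicted:

```python
observed = [3.0, 5.5, 6.0]
predicted = [2.5, 5.0, 7.0]  # hypothetical values on the regression line

# Positive residual: point lies above the line; negative: below it.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
print(residuals)
```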

7
Q

What is a multiple linear regression?

A
  • Model with more than 1 predictor
  • Each coefficient gives that predictor's association with the outcome, independent of (holding constant) the other predictors
8
Q

What does the multiple linear regression equation mean?
yi = b0 + b1 * x1i + … + bk * xki + ei

A
  • b0 = intercept (expected y when all predictors are 0)
  • b1 … bk = slopes (how much y is expected to change for a 1-unit change in that predictor, holding all other predictors constant)
  • ei = residual / error term
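
A rough sketch of "holding all other predictors constant" in plain Python. The helper predict and the coefficients are hypothetical. Raising x1 by 1 while x2 stays fixed changes the prediction by exactly b1.

```python
def predict(b0, slopes, xs):
    """y_hat = b0 + b1*x1 + ... + bk*xk (error term left out)."""
    return b0 + sum(b * x for b, x in zip(slopes, xs))

b0, slopes = 1.0, [2.0, -0.5]          # made-up b0, b1, b2
y_a = predict(b0, slopes, [3.0, 4.0])
y_b = predict(b0, slopes, [4.0, 4.0])  # x1 up by 1, x2 held constant
diff = y_b - y_a                       # equals b1
```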
9
Q

What do simple and multiple linear regressions assume the outcome to be?

A

Continuous and normally distributed (conditional on the predictors)

10
Q

What do simple and multiple linear regressions assume the association between explanatory variables and outcome to be?

A

Linear association

11
Q

What are generalized linear models (GLMs) used to extend…

A

To extend the linear model to other types of outcome (e.g. binary or count)

12
Q

What are 3 examples of GLMs?

A
  1. Linear regression (continuous)
  2. Logistic / probit regression (binary)
  3. Poisson / negative binomial regression (count)
13
Q

How many parameters does the normal distribution have, and what does each one control?

A

2 parameters
  • Mean: controls the location of the centre of the distribution
  • SD: controls the scale/spread of the distribution
Written as N(mean, SD); the standard normal distribution is N(0, 1)
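
A short sketch using Python's standard-library statistics.NormalDist (the mean of 100 and SD of 15 are arbitrary example values). Changing the mean and SD relocates and rescales the curve, but the share of values within 1 SD of the mean stays at roughly 68%.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)    # N(0, 1)
shifted = NormalDist(mu=100, sigma=15)  # mean moves the centre, SD sets the spread

# Proportion of the distribution within 1 SD either side of the mean.
within_1sd_standard = standard.cdf(1) - standard.cdf(-1)
within_1sd_shifted = shifted.cdf(115) - shifted.cdf(85)
print(round(within_1sd_standard, 3), round(within_1sd_shifted, 3))
```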

14
Q

What are the link function and the inverse link function conventionally called?

A

g() and g^-1()
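
As one concrete example (chosen here for illustration), the logit link used in logistic regression and its inverse can be sketched in plain Python. The function names logit and inv_logit are assumptions of this sketch.

```python
import math

def logit(p):
    """Link g(): maps a probability in (0, 1) onto the linear scale."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link g^-1(): maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

eta = logit(0.8)         # probability -> linear space
p_back = inv_logit(eta)  # linear space -> probability, recovers 0.8
```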

15
Q

What is the difference between R^2 and adjusted R^2?

A
  • R^2: never decreases when a predictor is added, even if that predictor explains almost nothing
  • Adjusted R^2: penalizes R^2 for the number of predictors, so it only rises when a new predictor improves the model more than expected by chance
    (better estimate of model fit; USE FOR INTERPRETATION)
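
A sketch of the usual adjusted R^2 formula, 1 - (1 - R^2) * (n - 1) / (n - k - 1), where n is the sample size and k the number of predictors. The function name and the numbers are illustrative. With the same R^2, a model that needed more predictors gets a lower adjusted value.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

few_predictors = adjusted_r2(0.50, n=50, k=2)
many_predictors = adjusted_r2(0.50, n=50, k=10)
print(round(few_predictors, 3), round(many_predictors, 3))
```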
16
Q

What does the F-test / F-statistic do?

A

Tests whether model is statistically significant overall or not (all predictors tested simultaneously)

17
Q

What should you do to categorical variables when including them in linear regression?

A

Dummy code them so they become numeric predictors (0s and 1s)
E.g. 1 = female, 0 = male
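
A minimal Python sketch of dummy coding, using the same 1 = female, 0 = male scheme (the data values are made up):

```python
sex = ["male", "female", "female", "male"]

# 1 = female, 0 = male, so male is the reference category.
sex_dummy = [1 if s == "female" else 0 for s in sex]
print(sex_dummy)
```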

18
Q

How do you interpret the regression coefficient for sex that has been dummy coded?

A

It is the average difference in the outcome between males (0) and females (1).
E.g. the expected value of neuroticism at the intercept (when all predictors are 0) = male participants' score
sex1 = difference between female and male participants (female minus male)
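
A sketch in plain Python, assuming sex is the only predictor in the model (with other predictors the coefficient is an adjusted difference instead). In that simple case the least-squares intercept is the male mean and the sex coefficient is the female mean minus the male mean. The scores are made up.

```python
from statistics import mean

sex = [0, 0, 0, 1, 1, 1]                      # 0 = male, 1 = female
neuroticism = [4.0, 5.0, 6.0, 6.0, 7.0, 8.0]  # made-up scores

male_mean = mean(y for s, y in zip(sex, neuroticism) if s == 0)
female_mean = mean(y for s, y in zip(sex, neuroticism) if s == 1)

b0 = male_mean                # expected score when sex = 0
b1 = female_mean - male_mean  # average female-minus-male difference
```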

19
Q

What are 3 assumptions checked by linear regression model diagnostics?

A
  1. Normality (distribution) of residuals
  2. Independent observations
  3. Homogeneity of variance (spread/variance of residuals should be about equal)
20
Q

What is the difference between b and beta in the linear regression equation?

A

b = unstandardized coefficients (raw change in units)
beta = standardized coefficients (change in SDs)
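
A sketch of the standard conversion for a single predictor, beta = b * SD(x) / SD(y), in plain Python. The function name and the data are illustrative assumptions.

```python
from statistics import stdev

def standardize(b, xs, ys):
    """Convert an unstandardized slope b into a standardized beta."""
    return b * stdev(xs) / stdev(ys)

xs = [1, 2, 3, 4, 5]
ys = [10, 20, 30, 40, 50]  # made-up data where y = 10 * x exactly
b = 10.0                   # raw slope: y rises 10 units per 1 unit of x
beta = standardize(b, xs, ys)  # in SD units: 1 SD of x -> 1 SD of y
```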

21
Q

What does b1 in the linear regression equation show?
The ___ between 2 variables

A

The direction and strength of the relationship (regression coefficient) between 2 variables

22
Q

What does the inclusion of the error term suggest about our data?

A

Real data will not fall perfectly on the regression line.
Perfect model: eta = b0 + b1 * xi (without ei)

23
Q

What are 4 examples of linear models?

A
  1. T-test
  2. ANOVAs
  3. Pearson correlations
  4. Linear regressions
24
Q

What is the type of y variable for linear regression?

A

Continuous and normally distributed (conditional on the predictors)

25
Q

What is the type of y variable for logistic regression?

A

Binary (0 / 1, yes / no, T / F)

26
Q

What is the type of y variable for Poisson regression?

A

Count (how many of something)

27
Q

What is a probability distribution?

A

The probability of each possible outcome
E.g. for a coin flip, the probability distribution is 0.5 heads, 0.5 tails.

28
Q

If y (the outcome) is assumed to be conditionally normal, what are its mean and SD?
What does this mean for the distribution of errors?

A

Mean = eta (the predicted value) and SD = sigma (the SD of the residuals)
i.e. y ~ N(eta, sigma)
It also means that the errors are normally distributed with a mean of 0 and the same SD
i.e. e ~ N(0, sigma)

29
Q

What does the acronym L.I.N.E. represent for the assumptions of linear regression?

A

Linear relationship
Independent observations and errors (uncorrelated)
Normally distributed errors (with a mean of 0, random)
Equal variance of errors

30
Q

What do GLMs do?

A

Use some function (the link function) to transform/link eta from linear space to outcome space

31
Q

Is there a link function in linear regression?

A

No transformation is needed: the outcome is already in linear space, so the link is the identity function

32
Q

What kind of model does lm() fit?

A

Linear model
lm(outcome ~ predictor, data = d), where outcome is the dependent variable

33
Q

Is the estimate of predictor variable from lm() output standardized?

A

No, it is the unstandardized coefficient.

34
Q

What does the shaded region on visreg graphs show?

A

95% confidence intervals

35
Q

What does the QQ plot / deviates plot from modelDiagnostics() test for?

A

Extreme outliers (solid black)

36
Q

What does the density plot from modelDiagnostics() test for?

A

Normal distribution of residuals

37
Q

What is the equation for the effect size, R^2?

A

variance explained / total variance
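
A sketch in plain Python, treating explained variation as total minus residual sum of squares (which holds for a least-squares fit with an intercept). The function name and the data are illustrative. The predictions used are the two group means, which is what a one-dummy regression would fit.

```python
def r_squared(observed, predicted):
    """R^2 = variance explained / total variance."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return (ss_total - ss_resid) / ss_total

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]  # group means from a made-up two-group model
r2 = r_squared(observed, predicted)
```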

38
Q

What is the equation of cohen’s f^2 (effect size) for linear regression models?

A

R^2 / (1 - R^2), where (1 - R^2) is the variance not explained

39
Q

What is the equation of cohen’s f^2 (effect size) for multiple regression models (individual predictor)?

A

(R^2AB - R^2A) / (1 - R^2AB)

  • (R^2AB - R^2A) = difference in the coefficient of determination (variance explained) between the full model (including all independent variables) and a reduced model (a subset of the independent variables)
  • (1 - R^2AB) = variance left unexplained by the full model (residual variance)
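
Both Cohen's f^2 formulas (whole model and individual predictor) can be sketched in plain Python. The function names and the R^2 values are made up for illustration.

```python
def f2_model(r2):
    """Cohen's f^2 for a whole model: explained / unexplained variance."""
    return r2 / (1 - r2)

def f2_predictor(r2_ab, r2_a):
    """Cohen's f^2 for one predictor: full model (AB) vs reduced model (A)."""
    return (r2_ab - r2_a) / (1 - r2_ab)

whole = f2_model(0.20)              # 0.20 / 0.80
single = f2_predictor(0.30, 0.20)   # 0.10 / 0.70
print(round(whole, 3), round(single, 3))
```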
40
Q

How do you show the inclusion of main effects when including an interaction term in the lm() equation?

A

Example:
neuroticism = b0 + (b1 * stress) + (b2 * sex) + (b3 * stress * sex)
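
A sketch of what the interaction term does, in plain Python with made-up coefficients and sex dummy coded as 0/1 (an assumption of this sketch). The slope of stress is b1 when sex = 0 and b1 + b3 when sex = 1.

```python
def predict(b0, b1, b2, b3, stress, sex):
    """neuroticism = b0 + b1*stress + b2*sex + b3*stress*sex (no error term)."""
    return b0 + b1 * stress + b2 * sex + b3 * stress * sex

coefs = (2.0, 0.5, 1.0, 0.25)  # hypothetical b0, b1, b2, b3

# Change in predicted neuroticism for a 1-unit rise in stress, per group.
slope_sex0 = predict(*coefs, stress=1, sex=0) - predict(*coefs, stress=0, sex=0)
slope_sex1 = predict(*coefs, stress=1, sex=1) - predict(*coefs, stress=0, sex=1)
```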

41
Q

When plotting a continuous moderator, what values are used for the breaks argument in visreg?

A

breaks = c(mean - 1 SD, mean + 1 SD)
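
The two break values themselves can be computed as below. This is a plain-Python sketch with made-up moderator scores; it does not call visreg, which is an R package.

```python
from statistics import mean, stdev

moderator = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up continuous moderator

m = mean(moderator)
sd = stdev(moderator)       # sample SD
breaks = [m - sd, m + sd]   # "low" and "high" levels of the moderator
```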

42
Q

If we have 2 predictors (multiple linear regression), what kind of best fit do we have?

A

A plane of best fit (3D); with more than 2 predictors it becomes a hyperplane

43
Q

Nothing is truly linear. Regression models are a simplification of _____

A

reality

44
Q

Normal distribution is also known as the ______ distribution

A

Gaussian

45
Q

A p-value can be used to determine effect size and magnitude (strength of relationship).
True or false?

A

False. It only tells us whether the result falls above or below our chosen significance threshold

46
Q

If the data is derived from siblings or repeated measures, what assumption does it violate?

A

Assumption of independent observations and errors (because measurements from siblings or from the same person would be correlated)

47
Q

If the loess smooth line is not flat and close to 0, what does it indicate?

A

Systematic bias in residuals

48
Q

What can you do if the assumption of homogeneity of variance is violated?

A

Remove extreme values