General linear model Flashcards

1
Q

General linear model

A

The GLM is all about expressing the relationship between variables.

For example…
What is the relation between a test score and a grouping variable?
What is the relation between pre- and post-test measures?

2
Q

GLM Stat tests

A

T-tests
ANOVA, ANCOVA, MANOVA, MANCOVA
Correlations (Pearson and Spearman)
Linear regressions and multiple regressions
Goodness of fit test/chi squares
Machine learning and prediction models

3
Q

GLM equation

A

Using this equation, we can predict the outcome variable Ŷi for participant i, as long as we have the X value for participant i.

Yi = b0 + b1xi + ei (observed)
Ŷi = b0 + b1xi (predicted; the residual ei is the difference between the two)
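A minimal numeric sketch of using the equation for prediction, with made-up values for b0, b1, and xi:

```r
b0 <- 2      # hypothetical intercept
b1 <- 0.5    # hypothetical slope
x_i <- 10    # participant i's value on the predictor

y_hat <- b0 + b1 * x_i   # predicted outcome for participant i
y_hat                    # 7
```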

4
Q

Ŷ

A

Ŷ is the estimate of the observed outcome (Y), i.e. it represents the estimated DV

5
Q

b0

A

b0 is the intercept of the regression line (where it crosses the y-axis)

6
Q

b1

A

b1 is the slope of the regression line

7
Q

xi

A

xi is the observation of the predictor (X), i.e. it represents the IV

8
Q

ei

A

ei is the residual error term: the difference between the observed and predicted Y

9
Q

i

A

i stands for the participant whose data is being used

10
Q

Correlation

A

a standardized measure of the linear relation between two variables

X and Y are interchangeable

11
Q

r-value

A

The correlation is represented by an r-value that can take any value between -1.00 and +1.00.

The magnitude represents the strength of the relation; the sign (positive or negative) represents the direction.

An absolute value of 1 is a perfect line, while a smaller value like 0.2 means the points are loosely scattered around a line.

Positive means that when the IV increases, so does the DV (and the opposite for negative)
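A quick simulated illustration (made-up data) of strength and direction:

```r
set.seed(1)
x <- 1:50
y_pos <- x + rnorm(50, sd = 5)    # strong positive relation
y_neg <- -x + rnorm(50, sd = 5)   # strong negative relation
y_rand <- rnorm(50)               # no relation

cor(x, y_pos)   # close to +1
cor(x, y_neg)   # close to -1
cor(x, y_rand)  # close to 0
```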

12
Q

Common correlation interpretations

A

For absolute correlation values (positive or negative), a common interpretation:

0.00: no relation, entirely random
0.01 to 0.30: weak
0.30 to 0.50: moderate
0.50 to 0.99: strong
1.00: perfect, identical

But these cut-offs are arbitrary; interpretation should be based on the context of the study

13
Q

Anscombe’s quartet

A

The idea that graphed datasets can look totally different yet have the same summary statistics

14
Q

Ordinary least squares regression

A

The general linear model tries to fit a line (the line of best fit) through the datapoints that is as close as possible to every datapoint

This is done by minimizing the sum of the squared distances between the line and the points, which is why it is called “ordinary least squares regression”

By default its estimates come out unstandardized, i.e. in the units of the original variables

15
Q

Coefficients for ordinary least squares regression

A

For ordinary least squares regression: the estimated regression coefficients (b0 and b1) are those that minimise the sum of the squared residuals.

Take the vertical distance between a datapoint and the fitted line
Square that distance

Repeat for all datapoints, and sum up all these squared distances

Find the line for which this sum is the smallest.
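These steps have a closed-form solution; a sketch (simulated data) showing the textbook formulas agree with what lm() finds:

```r
set.seed(2)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30)

# Closed-form least-squares estimates
b1 <- cov(x, y) / var(x)       # slope
b0 <- mean(y) - b1 * mean(x)   # intercept

fit <- lm(y ~ x)               # lm() minimises the same criterion
c(b0, b1)
coef(fit)                      # same two numbers
```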

16
Q

Unstandardized

A

Unstandardized coefficients are based on raw data and one unit changes in the IV

Unstandardized estimates are more intuitive, but can’t easily be compared across different kinds of measurements

“For every 1 min difference in average exercise per day, there’s a 0.017 difference in BMI” (unstandardized)

17
Q

Standardized

A

Standardized coefficients are based on standardized data, i.e. expressed in standard deviations.

Standardized estimates are less concrete, but can be compared across different measurements; we can also use the correlation “rules of thumb” discussed above

“For every 1 SD difference in average exercise per day, there’s a 0.176 SD difference in BMI” (standardized)
OR
“Average minutes of exercise per day explains 3% of the variance in BMI” (standardized R2)
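A sketch (simulated exercise/BMI data, not the numbers quoted above) of getting both kinds of estimate; for a single predictor, the standardized slope equals the correlation r:

```r
set.seed(3)
exercise <- rnorm(100, mean = 30, sd = 10)        # minutes per day
bmi <- 25 - 0.05 * exercise + rnorm(100, sd = 2)

# Unstandardized: BMI change per 1-minute difference
coef(lm(bmi ~ exercise))[2]

# Standardized: SD change in BMI per 1-SD difference in exercise
coef(lm(scale(bmi) ~ scale(exercise)))[2]
cor(bmi, exercise)   # identical to the standardized slope
```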

18
Q

Multiple/linear regression model formula

A

When including several predictors, we need a multiple regression model:

Ŷi = b0 + b1x1i + b2x2i + ei

We can go on adding as many predictors as makes sense:
Ŷi = b0 + b1x1i + b2x2i + b3x3i + … + ei

But instead of a single line, we are now trying to fit a plane within a 3D space (or a hyperplane, with more than two predictors) that still minimizes the distance to the observed data points
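A sketch with two simulated predictors; lm() returns one slope per predictor plus the intercept:

```r
set.seed(4)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(100)

fit <- lm(y ~ x1 + x2)   # fits the plane minimising squared residuals
coef(fit)                # b0, b1, b2
```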

19
Q

Controlling/Adjusting/Partialing out in Linear Regressions

A

All refer to the same process when you have multiple predictor variables in your regression

If you control outcome Y for predictor variable X, then check the association between variable Z and outcome Y, you’re asking: “what would be the Y~Z relation in a sample where everyone had the average level of X?”

This is not MAGIC

Predictions from regression models, even if “controlled” don’t suddenly make associations causal

All depends on where your data came from

If they’re from a randomised experiment, causal conclusions might be justified
If they’re from an observational study, probably not

20
Q

(Multiple) regression assumptions

A

Normality (of residuals) (-> if you were to plot the residuals you would see a normal distribution)

Linearity (-> the association between X and Y is linear, i.e. the slope is constant across the range of X)

Homogeneity of variance (of residuals)

Uncorrelated predictors (-> no collinearity)

Uncorrelated residuals (-> no effect of another unmeasured variable)

No highly-influential outliers
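Several of these assumptions can be eyeballed with R's built-in diagnostic plots (simulated data here):

```r
set.seed(7)
x <- rnorm(80)
y <- 1 + x + rnorm(80)
fit <- lm(y ~ x)

par(mfrow = c(2, 2))
plot(fit)   # residuals vs fitted, normal Q-Q, scale-location, leverage
```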

21
Q

T-tests and GLM

A

A subtype of GLM, equivalent to a simple linear regression with one binary predictor.

Think of the intercept as the mean of group 1,

and the slope as the distance from the intercept to the mean of group 2.

Ŷi = b0 + b1xi + ei, where b0 = mean_group1 and b1 = mean_group2 − mean_group1
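A simulated check that the regression coefficients really are the group means in disguise:

```r
set.seed(5)
group <- factor(rep(c("g1", "g2"), each = 20))
score <- c(rnorm(20, mean = 10), rnorm(20, mean = 12))

fit <- lm(score ~ group)
coef(fit)[1]                       # intercept = mean of group 1
mean(score[group == "g1"])
coef(fit)[2]                       # slope = mean(g2) - mean(g1)
mean(score[group == "g2"]) - mean(score[group == "g1"])
```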

22
Q

ANOVA

A

Comparing more than 2 groups
ANOVA (Analysis Of VAriance) is a kind of general linear model that only has categorical predictors

Even though it’s called analysis of variance, it’s actually mainly interested in differences between means

A one-way ANOVA, comparing ≥ 3 means, is equivalent to a multiple regression model

The ANOVA’s test statistic is the F-ratio: the ratio of the variance between the groups to the variance within them
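A sketch (simulated three-group data) showing that aov() and a linear model report the same F-ratio:

```r
set.seed(6)
group <- factor(rep(c("a", "b", "c"), each = 15))
score <- rnorm(45) + rep(c(0, 0.5, 1), each = 15)

summary(aov(score ~ group))    # F-ratio for the group effect
anova(lm(score ~ group))       # same F-ratio from the GLM
```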

23
Q

R: Get r-value/cor coeff

A

cor.test(dataset$variable_1 , dataset$variable_2)

24
Q

R: Plot correlation

A

plot(dataset$variable_1 , dataset$variable_2)

25
Q

R: Plot with line of best fit (unstandardized)

A

library(ggplot2)
ggplot(dataset, aes(x = IV, y = DV)) + geom_point() + stat_smooth(method = lm)

26
Q

R: linear model

A

new_name <- lm(DV ~ IV, data = dataset)
summary(new_name)

27
Q

R: Find what type of data

A

class(dataset$variable)

28
Q

R: Convert to factor

A

dataset$variable <- factor(dataset$variable)

29
Q

R: Multiple regression

A

lm(variable_1 ~ variable_2 + predictor, data = dataset)

30
Q

R: Remove outliers

A

dataset$variable[dataset$variable > threshold] <- NA   # or < threshold, as appropriate

31
Q

R: Multiple regression with interaction

A

name <- lm(variable_1 ~ variable_2 + interactionv1:interactionv2, data = dataset)   # add further predictors if necessary
summary(name)

32
Q

R: Plot interaction

A

library(interactions)
interact_plot(name, pred = IV, modx = moderator)

33
Q

R: ANOVA

A

model <- aov(DV ~ group, data = dataset)
summary(model)