General linear model Flashcards

1
Q

General linear model

A

The GLM is all about expressing the relationship between variables.

For example…
What is the relation between a test score and a grouping variable?
What is the relation between pre- and post-test measures?

2
Q

GLM Stat tests

A

T-tests
ANOVA, ANCOVA, MANOVA, MANCOVA
Correlations (Pearson and Spearman)
Linear regressions and multiple regressions
Goodness of fit test/chi squares
Machine learning and prediction models

3
Q

GLM equation

A

Using this equation, we can predict the outcome variable Ŷi for participant i, as long as we have the X value for participant i.

Yi = b0 + b1xi + ei (observed)
Ŷi = b0 + b1xi (predicted; the residual ei is the difference between the two)
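A minimal numeric sketch of using the equation for prediction, with made-up values for b0, b1, and xi:

```r
b0 <- 2      # hypothetical intercept
b1 <- 0.5    # hypothetical slope
x_i <- 10    # participant i's value on the predictor

y_hat <- b0 + b1 * x_i   # predicted outcome for participant i
y_hat                    # 7
```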

4
Q

Ŷ

A

Ŷ is the estimate of the observed outcome (Y), i.e. it represents the estimated DV

5
Q

b0

A

b0 is the intercept of the regression line (where it crosses the y-axis)

6
Q

b1

A

b1 is the slope of the regression line

7
Q

xi

A

xi is the observation of the predictor (X), i.e. it represents the IV

8
Q

ei

A

ei is the residual error term: the difference between the observed and predicted Y

9
Q

i

A

i stands for the participant whose data is being used

10
Q

Correlation

A

a standardized measure of the linear relation between two variables

X and Y are interchangeable

11
Q

r-value

A

The correlation is represented by an r-value that can take any value between -1.00 and +1.00.

The magnitude represents the strength of the relation; the sign (positive or negative) represents the direction.

An absolute value of 1 is a perfect line, while a smaller value like 0.2 means the points are loosely scattered around a line.

Positive means that when the IV increases, so does the DV (and the opposite for negative)
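A quick simulated illustration (made-up data) of strength and direction:

```r
set.seed(1)
x <- 1:50
y_pos <- x + rnorm(50, sd = 5)    # strong positive relation
y_neg <- -x + rnorm(50, sd = 5)   # strong negative relation
y_rand <- rnorm(50)               # no relation

cor(x, y_pos)   # close to +1
cor(x, y_neg)   # close to -1
cor(x, y_rand)  # close to 0
```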

12
Q

Common correlation interpretations

A

For absolute correlation values (positive or negative), a common interpretation:

0.00: no relation, entirely random
0.01 to 0.30: weak
0.30 to 0.50: moderate
0.50 to 0.99: strong
1.00: perfect, identical

But these cut-offs are arbitrary; interpretation should be based on the context of the study

13
Q

Anscombe’s quartet

A

The idea that graphed datasets can look totally different yet have the same summary statistics

14
Q

Ordinary least squares regression

A

The general linear model tries to fit a line (the line of best fit) through the datapoints that is as close as possible to every datapoint

This is done by minimizing the sum of the squared distances between the line and the points, which is why it is called “ordinary least squares regression”

By default its estimates come out unstandardized, i.e. in the units of the original variables

15
Q

Coefficients for ordinary least squares regression

A

For ordinary least squares regression: the estimated regression coefficients (b0 and b1) are those that minimise the sum of the squared residuals.

Take the vertical distance between a datapoint and the fitted line
Square that distance

Repeat for all datapoints, and sum up all these squared distances

Find the line for which this sum is the smallest.
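These steps have a closed-form solution; a sketch (simulated data) showing the textbook formulas agree with what lm() finds:

```r
set.seed(2)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30)

# Closed-form least-squares estimates
b1 <- cov(x, y) / var(x)       # slope
b0 <- mean(y) - b1 * mean(x)   # intercept

fit <- lm(y ~ x)               # lm() minimises the same criterion
c(b0, b1)
coef(fit)                      # same two numbers
```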

16
Q

Unstandardized

A

Unstandardized coefficients are based on raw data and one unit changes in the IV

Unstandardized estimates are more intuitive, but can’t easily be compared across different kinds of measurements

“For every 1 min difference in average exercise per day, there’s a 0.017 difference in BMI” (unstandardized)

17
Q

Standardized

A

Standardized coefficients are based on standardized data, i.e. expressed in standard deviations.

Standardized estimates are less concrete, but can be compared across different measurements; we can also use the correlation “rules of thumb” discussed above

“For every 1 SD difference in average exercise per day, there’s a 0.176 SD difference in BMI” (standardized)
OR
“Average minutes of exercise per day explains 3% of the variance in BMI” (standardized R2)
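A sketch (simulated exercise/BMI data, not the numbers quoted above) of getting both kinds of estimate; for a single predictor, the standardized slope equals the correlation r:

```r
set.seed(3)
exercise <- rnorm(100, mean = 30, sd = 10)        # minutes per day
bmi <- 25 - 0.05 * exercise + rnorm(100, sd = 2)

# Unstandardized: BMI change per 1-minute difference
coef(lm(bmi ~ exercise))[2]

# Standardized: SD change in BMI per 1-SD difference in exercise
coef(lm(scale(bmi) ~ scale(exercise)))[2]
cor(bmi, exercise)   # identical to the standardized slope
```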

18
Q

Multiple/linear regression model formula

A

When including several predictors, we need a multiple regression model:

Ŷi = b0 + b1x1i + b2x2i + ei

We can go on adding as many predictors as makes sense:
Ŷi = b0 + b1x1i + b2x2i + b3x3i + … + ei

But instead of a single line, we are now trying to fit a plane within a 3D space (or a hyperplane, with more than two predictors) that still minimizes the distance to the observed data points
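A sketch with two simulated predictors; lm() returns one slope per predictor plus the intercept:

```r
set.seed(4)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(100)

fit <- lm(y ~ x1 + x2)   # fits the plane minimising squared residuals
coef(fit)                # b0, b1, b2
```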

19
Q

Controlling/Adjusting/Partialing out in Linear Regressions

A

All refer to the same process when you have multiple predictor variables in your regression

If you control outcome Y for predictor variable X, then check the association between variable Z and outcome Y, you’re asking: “what would be the Y~Z relation in a sample where everyone had the average level of X?”

This is not MAGIC

Predictions from regression models, even if “controlled” don’t suddenly make associations causal

All depends on where your data came from

If they’re from a randomised experiment, causal conclusions might be justified
If they’re from an observational study, probably not

20
Q

(Multiple) regression assumptions

A

Normality (of residuals) (-> if you were to plot the residuals you would see a normal distribution)

Linearity (-> the association between X and Y is linear, i.e. the slope is constant across the range of X)

Homogeneity of variance (of residuals)

Uncorrelated predictors (-> no collinearity)

Uncorrelated residuals (-> no effect of another unmeasured variable)

No highly-influential outliers
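Several of these assumptions can be eyeballed with R's built-in diagnostic plots (simulated data here):

```r
set.seed(7)
x <- rnorm(80)
y <- 1 + x + rnorm(80)
fit <- lm(y ~ x)

par(mfrow = c(2, 2))
plot(fit)   # residuals vs fitted, normal Q-Q, scale-location, leverage
```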

21
Q

T-tests and GLM

A

A subtype of GLM, equivalent to a simple linear regression with one binary predictor.

Think of the intercept as the mean of group 1,

and the slope as the distance from the intercept to the mean of group 2.

Ŷi = b0 + b1xi + ei, where b0 = mean_group1 and b1 = mean_group2 − mean_group1
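A simulated check that the regression coefficients really are the group means in disguise:

```r
set.seed(5)
group <- factor(rep(c("g1", "g2"), each = 20))
score <- c(rnorm(20, mean = 10), rnorm(20, mean = 12))

fit <- lm(score ~ group)
coef(fit)[1]                       # intercept = mean of group 1
mean(score[group == "g1"])
coef(fit)[2]                       # slope = mean(g2) - mean(g1)
mean(score[group == "g2"]) - mean(score[group == "g1"])
```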

22
Q

ANOVA

A

Comparing more than 2 groups
ANOVA (Analysis Of VAriance) is a kind of general linear model that only has categorical predictors

Even though it’s called analysis of variance, it’s actually mainly interested in differences between means

A one-way ANOVA, comparing ≥ 3 means, is equivalent to a multiple regression model

The ANOVA’s test statistic is the F-ratio: the ratio of the variance between the groups to the variance within them
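A sketch (simulated three-group data) showing that aov() and a linear model report the same F-ratio:

```r
set.seed(6)
group <- factor(rep(c("a", "b", "c"), each = 15))
score <- rnorm(45) + rep(c(0, 0.5, 1), each = 15)

summary(aov(score ~ group))    # F-ratio for the group effect
anova(lm(score ~ group))       # same F-ratio from the GLM
```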

23
Q

R: Get r-value/cor coeff

A

cor.test(dataset$variable_1 , dataset$variable_2)

24
Q

R: Plot correlation

A

plot(dataset$variable_1 , dataset$variable_2)

25
Q

R: Plot with line of best fit (unstandardized)

A

library(ggplot2)
ggplot(dataset, aes(x = IV, y = DV)) + geom_point() + stat_smooth(method = lm)

26
Q

R: linear model

A

new_name <- lm(DV ~ IV, data = dataset)
summary(new_name)

27
Q

R: Find what type of data

A

class(dataset$variable)

28
Q

R: Convert to factor

A

dataset$variable <- factor(dataset$variable)

29
Q

R: Multiple regression

A

lm(variable_1 ~ variable_2 + predictor, data = dataset)

30
Q

R: Remove outliers

A

dataset$variable[dataset$variable > threshold] <- NA   # or < threshold, as appropriate

31
Q

R: Multiple regression with interaction

A

name <- lm(variable_1 ~ variable_2 + interactionv1:interactionv2, data = dataset)   # add further predictors if necessary
summary(name)

32
Q

R: Plot interaction

A

library(interactions)
interact_plot(name, pred = IV, modx = moderator)

33
Q

R: ANOVA

A

model <- aov(DV ~ group, data = dataset)
summary(model)