lecture 6 - experiment meets analysis Flashcards

1
Q

GLM assumptions

A
  1. linearity: any change in the regressor is associated with a proportional change in the data
    –> i.e., there is a linear relationship between regressor and data
  2. normality: residuals are normally distributed
  3. no multicollinearity: regressors are independent of each other (this assumption is often violated)
  4. independence: observations and residuals are independent of each other (e.g., different time points)
  5. homoscedasticity: the variance of the residuals is constant across all levels of the data (e.g., all time points)
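
As a minimal illustration (not part of the lecture; variable names and numbers are made up), a GLM can be fit with ordinary least squares and the assumptions above checked on its residuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical design matrix: intercept + one (e.g., HRF-convolved) task regressor
n_timepoints = 200
X = np.column_stack([np.ones(n_timepoints), rng.standard_normal(n_timepoints)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n_timepoints)  # simulated voxel signal

# ordinary least squares: beta = (X'X)^-1 X'y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# the assumptions above are checked on these residuals, e.g., their distribution
# (normality) and whether their variance is constant over time (homoscedasticity)
print(beta, residuals.std())
```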
2
Q

multicollinearity: definition + what is the problem

A

when two or more predictors in the model are highly correlated, or predictors are linear combinations of other predictors

  1. correlated regressors explain overlapping variance in the signal
  2. model coefficients (β) become unstable
    –> i.e., small changes in the data lead to large changes in the coefficients
  3. in case of perfect collinearity, there are infinite solutions to the regression
    –> meaning the model can’t uniquely determine the individual contributions of the correlated predictors.
  4. bouncing beta effect: the coefficient for the same regressor can be strongly positive or strongly negative depending on which other regressors are included
    –> the sign and size of a regressor’s coefficient can change dramatically depending on the presence of other correlated regressors in the model.
  5. coefficients are not reliable, and the resulting model does not generalize to new data
    –> most important problem
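
A toy simulation (not from the lecture; numbers are arbitrary) that makes the bouncing-beta effect concrete: with two nearly identical regressors, changing only the noise makes the coefficients swing wildly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)   # nearly collinear with x1
signal = 1.0 * x1                         # only x1 truly drives the data

for seed in range(3):
    noise = np.random.default_rng(seed).standard_normal(n)
    y = signal + noise
    X = np.column_stack([x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # the two betas bounce between large positive and negative values,
    # even though the true effect never changes
    print(beta)
```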
3
Q

multicollinearity: how can we quantify the problem

A
  1. look at the data: are stimulus features/behavioral variables correlated
    –> if yes, this will cause problems for EVERY VOXEL in the brain
  2. look at the covariance structure of the design matrix: predictors that remain highly correlated after HRF convolution can be removed if they are unnecessary and would otherwise affect important comparisons
  3. compute variance inflation factors (VIF): quantifies how much the variance of a regression coefficient increases due to multicollinearity
4
Q

VIF

A

quantifies how much variance of a regression coefficient increases due to multicollinearity

  • R^2 = variance explained in a predictor by all other predictors in the model
  • VIF = 1/(1-R^2)
  • VIF = 1, no collinearity
  • VIF = 5-10, you are in trouble: 80-90% of your predictor is explained by other predictors
  • VIF > 20 = close laptop
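
A minimal sketch of the VIF formula in Python, assuming a hypothetical design matrix X with one regressor per column (statsmodels’ variance_inflation_factor implements the same idea):

```python
import numpy as np

def vif(X):
    """VIF per column of X (n_timepoints x n_regressors), following VIF = 1/(1 - R^2)."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]                                           # predictor of interest
        others = np.delete(X, j, axis=1)                      # all remaining predictors
        others = np.column_stack([np.ones(len(y)), others])   # add intercept
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        y_hat = others @ beta
        r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)
```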
5
Q

‘solving’ multicollinearity

A

not possible, but you can
1. avoid the problem before it occurs through the experimental design
2. compensate for the problem through analytical strategies

6
Q

multicollinearity: experimental considerations

A
  1. think of the analysis before designing the experiment: determine a priori which factors need to be independent
  2. orthogonal task designs: e.g., vary each experimental component independently from all others, balance their combination
  3. separate conditions in time: add inter-trial intervals with jitter, separate task phases (e.g., stimuli & button clicks)
  4. counterbalance trial order: ensure that each condition precedes each other condition equally often (at least randomize order)
  5. block designs: group together trials of a certain condition to separate them from trials of another condition (unlike event-related designs)
7
Q

multicollinearity: analytical considerations

A
  1. reduce model complexity: remove predictors that are not needed
    –> rule of thumb: n_regressors < n_datapoints/20
  2. orthogonalization of regressors: decide which predictor gets credit for explaining overlapping variance
  3. regularized regressions (e.g., Ridge regression): penalty term (λ) added to the GLM shrinks coefficients, with larger coefficients being compressed more.
    –> λ value needs to be estimated through CV
    –> model fits the training data less well, but it generalizes better to new data
  4. dimensionality reduction: find principal components of design matrix and fit those to the data
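
A minimal sketch of item 4 using scikit-learn, with a hypothetical design matrix X and voxel time series y:

```python
import numpy as np
from sklearn.decomposition import PCA

# hypothetical inputs: design matrix X (n_timepoints x n_regressors), voxel time series y
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
y = rng.standard_normal(200)

# keep as many components as needed to capture 95% of the design variance
pca = PCA(n_components=0.95)
X_pcs = pca.fit_transform(X)

# fit the GLM on the (mutually orthogonal) principal components instead of the raw regressors
design = np.column_stack([np.ones(len(y)), X_pcs])
beta_pcs, *_ = np.linalg.lstsq(design, y, rcond=None)
print(X_pcs.shape, beta_pcs)
```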
8
Q

pros and cons of orthogonalization

A

pro: can be appropriate for covariate regressors that accompany a main regressor of interest

con: can be misleading
–> e.g., difference between model coefficients is ‘not real’ but rather reflects your decision
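
A minimal sketch of what orthogonalization amounts to in practice (hypothetical regressors): the covariate is residualized against the main regressor, so all shared variance is credited to the main regressor:

```python
import numpy as np

def orthogonalize(covariate, main):
    """Remove from `covariate` the variance it shares with `main` (least-squares projection)."""
    M = np.column_stack([np.ones(len(main)), main])   # main regressor plus intercept
    beta, *_ = np.linalg.lstsq(M, covariate, rcond=None)
    return covariate - M @ beta                       # residual = orthogonalized covariate

# the orthogonalized covariate then enters the GLM alongside the untouched main regressor;
# crediting all shared variance to the main regressor is exactly the decision this card warns about
```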

9
Q

regularization

A
  • Regularized regression is a statistical method that modifies traditional regression to prevent overfitting, which can occur when a model is too complex. It introduces a penalty term to the loss function that the optimization algorithm seeks to minimize.
  • This penalty term typically increases as the absolute value of the coefficients increases, leading to a preference for smaller coefficients overall, which can lead to simpler models that generalize better to new data.
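
A minimal ridge sketch with scikit-learn (hypothetical X and y); alpha plays the role of λ and is chosen by cross-validation:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))          # hypothetical design matrix
y = X[:, 0] + rng.standard_normal(200)      # hypothetical voxel signal

# RidgeCV picks the penalty (lambda, called alpha here) by cross-validation
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print(model.alpha_, model.coef_)            # chosen penalty and shrunken coefficients
```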
10
Q

temporal autocorrelation

A

the signal is correlated with a delayed version of itself, meaning that each value in the time series can be predicted based on the values that came before
–> also known as serial dependence

11
Q

problem with temporal autocorrelation

A

observations are not independent

  1. samples acquired close in time are very similar (e.g., because of the HRF)
  2. the amount of independent information in the data is reduced
  3. degrees of freedom are overestimated, leading standard errors to be underestimated
  4. autocorrelation leads to inflated t-statistics and an increase in false-positive results
12
Q

Temporal autocorrelation – How can we quantify the problem?

A
  1. compute autocorrelogram
  2. prewhitening
13
Q

compute autocorrelogram

A

An autocorrelogram is a plot that shows the correlation of the time series with itself at different lags.

  1. correlate time series with a delayed version of itself
  2. do this for all possible delays
  3. inspect the resulting curve (i.e., the autocorrelogram for all delays)
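
A minimal hand-rolled autocorrelogram (hypothetical time series y; statsmodels’ acf/plot_acf would give the same curve):

```python
import numpy as np

def autocorrelogram(y, max_lag=20):
    """Correlation of a (demeaned) time series with delayed copies of itself."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    out = [1.0]                                   # lag 0: perfect correlation with itself
    for lag in range(1, max_lag + 1):
        out.append(np.corrcoef(y[:-lag], y[lag:])[0, 1])
    return np.array(out)

# inspecting the resulting curve shows how quickly the serial dependence decays with increasing lag
```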
14
Q

prewhitening

A

remove autocorrelation by transforming the data such that the residuals resemble white noise

  1. fit a GLM model
  2. compute residual autocorrelation
  3. correct residual autocorrelation (e.g., through filtering)
  4. add the corrected residuals to the ‘explained (fitted) signal’
  5. re-run the GLM on corrected data

this improves fMRI reliability
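
One simplified way to implement this idea is AR(1) prewhitening (a sketch, not the exact pipeline used by fMRI packages; function and variable names are hypothetical):

```python
import numpy as np

def ar1_prewhiten(y, X):
    """Simplified AR(1) prewhitening: estimate residual autocorrelation, transform data and design, refit."""
    # 1. initial GLM fit
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # 2. lag-1 autocorrelation of the residuals
    rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # 3./4. remove the AR(1) structure from both the data and the design matrix
    y_w = y[1:] - rho * y[:-1]
    X_w = X[1:] - rho * X[:-1]
    # 5. re-run the GLM on the whitened data
    beta_w, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return beta_w, rho
```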

15
Q

Temporal autocorrelation as a feature, not a bug

A

Check for autocorrelations in your data. They might speak to your research question, or they might cause problems (e.g., violating assumptions)

16
Q

heteroscedasticity

A

physiological or thermal noise can vary over scan duration (e.g., head motion), leading to variations in the variance of residuals

17
Q

what is the problem with heteroscedasticity

A
  1. variance in residuals may change over time
  2. standard errors differ between conditions (or over time), leading to biased t-statistics
18
Q

how can we detect heteroscedasticity

A

plot the residuals against the predicted (fitted) values of the regression model
–> e.g., plot(predicted(mod), residuals(mod))

  • In a homoscedastic situation, the residuals will be randomly dispersed around the horizontal axis, with no clear pattern.
    –> flat, evenly scattered band around zero (good)
  • In contrast, a heteroscedastic pattern might show a funnel shape where residuals spread out with larger predicted values, indicating increasing variability in the residuals.
    –> funnel shape (bad)
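
A minimal sketch of such a diagnostic plot, using simulated data whose noise grows with the predicted value (so the funnel shape appears):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 200)
y = 2 * x + rng.standard_normal(200) * x      # noise grows with x -> heteroscedastic

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ beta
residuals = y - predicted

plt.scatter(predicted, residuals)             # a funnel shape here signals heteroscedasticity
plt.axhline(0, color="k")
plt.xlabel("predicted values")
plt.ylabel("residuals")
plt.show()
```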
19
Q

how can we solve heteroscedasticity

A

use estimation methods that account for non-constant residual variance
–> e.g., weighted least squares, robust standard errors
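
A minimal statsmodels sketch of both options on simulated heteroscedastic data (the 1/x² weights are an assumption tailored to this toy example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = np.linspace(1, 10, 200)
X = sm.add_constant(x)
y = 2 * x + rng.standard_normal(200) * x            # heteroscedastic noise

# option 1: keep the OLS estimates but use heteroscedasticity-robust ("sandwich") standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC3")

# option 2: weighted least squares, down-weighting the noisier observations
wls_fit = sm.WLS(y, X, weights=1.0 / x**2).fit()

print(robust_fit.bse, wls_fit.bse)                  # compare the resulting standard errors
```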

20
Q

smoothing of hemodynamic response

A
  • the hemodynamic response ‘smooths’ the time series, blurring the lines between trials and making regressors similar
    –> The HRF is a slow and gradual process, taking several seconds to rise and fall again after a brief neural event. This means that if two cognitive events occur close together in time, the slow HRFs that result from these events can overlap and summate.
    –> The overlapping responses mean that the regressors, which are meant to be distinct, will appear more similar to each other because the blood flow responses they are trying to model are not well separated in time.
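
A toy demonstration (event timings and the single-gamma HRF are made up): two event regressors that barely overlap become strongly correlated once convolved with the HRF:

```python
import numpy as np
from scipy.stats import gamma

tr = 1.0                                   # seconds per sample (hypothetical)
t = np.arange(0, 30, tr)
hrf = gamma.pdf(t, a=6)                    # crude single-gamma HRF approximation

n = 300
reg_a = np.zeros(n)
reg_b = np.zeros(n)
reg_a[::20] = 1                            # events of condition A every 20 s
reg_b[4::20] = 1                           # condition B events only 4 s later

conv_a = np.convolve(reg_a, hrf)[:n]
conv_b = np.convolve(reg_b, hrf)[:n]

# the stick functions barely overlap, but the HRF-convolved regressors are strongly correlated
print(np.corrcoef(reg_a, reg_b)[0, 1], np.corrcoef(conv_a, conv_b)[0, 1])
```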
21
Q

why do fMRI researchers need to be extra-aware of multicollinearity and autocorrelation compared to those using other techniques (e.g., EEG)?

A
  1. because HRF-convolution of regressors can increase the correlation among them
  2. because the hemodynamic response results in ‘smooth’ time series (i.e., time points are not independent)
22
Q

which GLM assumption is related to the proportional change in data with changes in the regressor

A

linearity

23
Q

why can multicollinearity of your design matrix be a problem

A
  1. coefficients can invert their sign depending on other coefficients
  2. affected model coefficients are difficult to interpret
  3. multicollinearity increases the variance of model coefficients
24
Q

Prewhitening is a technique to correct for:

A

autocorrelation

25
Q

What are reasons to use ridge regression for fMRI analysis?

A
  1. it can improve the generalization performance of the model
  2. it can compensate for the problem of multicollinearity