SOWO 918 Flashcards

1
Q

What assumptions of ordinary least squares regression make the coefficients the Best, Linear, Unbiased, Estimates of the population effects?

Name a potential violation of each assumption and a recommended practice to correct for and/or respond to that violation.

A

TL;DR:
In sum: errors are independent and normally distributed with a mean of 0.
+ Linearity
+ No multicollinearity (problematic redundancy)
+ IVs are not correlated with the error term

  1. Linearity: The DV (outcome variable) and IV (predictor variable) are linearly related. When this assumption is violated, the nonlinear relation cannot be identified by the model and the parameters cannot be trusted.

Fix: Transform the DV, or select a nonlinear modeling technique instead of OLS linear regression.

  2. Expected value (i.e., mean) of the error is 0: When this assumption is violated, the intercept is biased.

Fix: Re-specify the model w/ a better set of IVs and consider adjusting the standard errors.

  3. Errors are uncorrelated (meaning errors are independent across sample members): When independence is violated (as happens in panel or clustered data), standard errors may be biased, typically underestimated, which inflates test statistics.

Fix: Can use Generalized Least Squares or other methods for these types of data (e.g., MLM), and adjust the standard errors (e.g., use robust standard errors).

  4. Errors are homoscedastic (meaning errors have constant variance across the regression line): When homoscedasticity is violated, standard errors may be biased.

Fix: Re-specify the model w/ a better set of IVs and consider adjusting the standard errors.

  5. IVs are not correlated with the error term: When the error term is correlated with the IVs, regression coefficients may be biased. This is an indicator of endogeneity and/or omitted variable bias.

Fix: Regression model is not properly specified; therefore, re-specify the model.

  6. No perfect collinearity among IVs: If there is perfect (or near-perfect) collinearity among IVs, the model may yield a significant F test without a single significant regression coefficient. Your IVs can be correlated; they just cannot be perfectly correlated (b/c that would be redundant).

Fix: Re-specify the model omitting redundant IVs.
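One common diagnostic for the collinearity assumption is the variance inflation factor (VIF). Below is a rough sketch computing VIF by hand on simulated (made-up) predictors; in practice a dedicated routine such as statsmodels' variance_inflation_factor would typically be used:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on the remaining columns (plus an intercept). Values well above ~10
    are a common (if rough) flag for problematic collinearity.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly redundant with x1
x3 = rng.normal(size=200)                    # independent predictor
print(vif(np.column_stack([x1, x2, x3])))    # x1, x2 large; x3 near 1
```

Dropping one of the near-redundant predictors (the "omit redundant IVs" fix above) brings the remaining VIFs back toward 1.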

2
Q

Be able to draw on scenarios from your research area to write an equation for a multiple regression model using ordinary least squares regression, including an interaction term as a predictor. Be able to define each term in the model.

A

Research Question: Does a history of sexual abuse, substance use, and their intersection predict commercial sexual exploitation among adolescents?

Equation: Y = B0 + B1X1 + B2X2 + B3(X1X2) + e

Terms:
Y = Dependent variable
B0 = the intercept, i.e., the expected value of Y when all other terms are 0
B1 and B2 = the increment of change in Y related to a one-unit change in X1 and X2 (respectively) when the other variable (X2 or X1) is 0
X1 and X2 = Independent variable(s)
B3 = the interaction coefficient, or the change in the conditional effect/slope of one variable when the other variable increases by one unit ==> the change in B1 when X2 changes and the change in B2 when X1 changes
e = error, observed value minus the predicted value
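A minimal sketch of fitting this interaction model, using simulated data and made-up coefficients rather than the actual study variables (real analyses would typically use statsmodels, Stata, or R):

```python
import numpy as np

# Simulate data from Y = B0 + B1*X1 + B2*X2 + B3*(X1*X2) + e
rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)                 # stand-in for the first IV (continuous here for illustration)
x2 = rng.normal(size=n)                 # stand-in for the second IV
true = np.array([0.5, 1.0, -0.5, 0.8])  # B0, B1, B2, B3 (made up)
y = true[0] + true[1]*x1 + true[2]*x2 + true[3]*x1*x2 + rng.normal(scale=0.5, size=n)

# Design matrix: intercept, X1, X2, and the product term X1*X2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))  # estimates should land close to the true B0..B3
```

The key point is that the interaction enters the design matrix as an ordinary column, the elementwise product X1*X2.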

3
Q

What is a conditional effect in the case of ordinary least squares regression with interactions?

What are simple effects in the case of ordinary least squares regression with interactions?

A
  • The main effect is the effect of variable X on Y when the product (interaction) term (XZ) is not included in the regression equation.
  • The simple effect of variable X on Y is the effect of X when holding Z constant at 0. In Jaccard and Turrisi (2003), the authors state that the simple effect is the same thing as the conditional effect (p. 5). You can plot simple slopes at various levels of Z to probe the interaction further.

Conditional effects (in models with interaction terms):
• The conditional effect is the effect of variable X on Y when holding Z constant at a certain value
• Slopes for variables included in the interaction represent the conditional effect/slope (not the main effect) of that variable on Y when the other variable in the interaction is at zero.
• When the other variable does not equal zero, the effect of X1 is B1 + B3X2 (the conditional effect plus the interaction coefficient times the other variable)
• Note that a nonsignificant conditional effect does not necessarily mean that the variable is not influential on its own.
• Centering variables is useful b/c the conditional effect then describes the effect when the other variable is at its mean

Simple effects:
• They can be simple means for dummy/categorical second IVs (i.e., the level of y at each level of 2 categories, adjusted for variation explained by other IVs in the model) OR simple slopes (the slope of X1 at specific levels of X2)
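The simple-slope idea above (the slope of X1 at specific levels of X2) reduces to evaluating B1 + B3*X2 at chosen moderator values. A tiny sketch with made-up coefficients:

```python
def simple_slope(b1, b3, z):
    """Conditional slope of X on Y at a chosen value of the moderator Z:
    dY/dX = B1 + B3*Z for the model Y = B0 + B1*X + B2*Z + B3*X*Z + e."""
    return b1 + b3 * z

# Illustrative (made-up) coefficients: B1 = 1.0, B3 = 0.8
b1, b3 = 1.0, 0.8
for z in (-1.0, 0.0, 1.0):   # e.g., Z at -1 SD, the mean (if centered), and +1 SD
    print(f"slope of X at Z={z:+.1f}: {simple_slope(b1, b3, z):.2f}")
```

Plotting these slopes at several Z values is exactly the "plot simple slopes to probe the interaction" step.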

4
Q

What is moderation, and how does one investigate moderation using ordinary least squares regression?

A

Moderation: When the effect of a focal variable on the DV varies systematically based on the level of another variable (or, if the moderator is categorical, whether the effect/slope differs across groups)

How to investigate: Moderation is typically assessed via an interaction between a focal IV and a factor that specifies the appropriate conditions for its operation
• Examine the data descriptively per the BLUE assumptions; ideally the moderator variable is uncorrelated w/ the IV and DV and functions independently of the predictor variables
• Create interaction term based on theory, identify which variable is focal and which is the moderator
• Run sequence of models, center continuous variables in the interaction to make 0 meaningful
• Examine whether the interaction(s) is/are significant: model fit can be assessed using the AIC and BIC as well as the R2 values. A more formal F test can be run to determine whether the model with the interaction explains more variance in the outcome, Y, than the model without it. If the interaction model fits significantly better, the researcher can confidently state that there is an interaction between the variables X and Z.
• If significant, probe interaction for simple effects
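The model-comparison step above can be sketched as a nested F test on simulated data (hypothetical variables; statistical software's built-in F/anova comparison would normally be used):

```python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n); z = rng.normal(size=n)
x_c, z_c = x - x.mean(), z - z.mean()          # mean-center so 0 is meaningful
y = 1 + 0.5*x_c + 0.3*z_c + 0.6*x_c*z_c + rng.normal(size=n)

X_main = np.column_stack([np.ones(n), x_c, z_c])   # model without the interaction
X_int  = np.column_stack([X_main, x_c * z_c])      # model adding X*Z
rss0, rss1 = rss(X_main, y), rss(X_int, y)

df_num, df_den = 1, n - X_int.shape[1]             # one extra parameter in the full model
F = ((rss0 - rss1) / df_num) / (rss1 / df_den)
p = stats.f.sf(F, df_num, df_den)
print(f"F({df_num},{df_den}) = {F:.1f}, p = {p:.3g}")
```

A significant F here is the signal to proceed to probing the interaction for simple effects.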

5
Q

What effect sizes should an investigator report when conducting ordinary least squares regression, and what do they mean?

A

Effect sizes are important for understanding regression results because they help researchers determine the practical and theoretical importance of an observed effect.

Good reporting practice: coefficient, standard error, p-value, confidence intervals, standard effect size (e.g., beta, eta-squared, d)

It may be useful to standardize the betas so that the coefficients can be compared and the effect of each individual variable on Y is comparable.

R2/eta-squared (likely the most common measure of effect size for regression w/ one IV) indicates how much variation in Y is accounted for by the IV, providing a way to interpret effect size.
• Adjusted R2 is a modified version of R2 that adjusts for the number of predictors in the model (useful in multiple regression); it increases only if a new term improves the model more than would be expected by chance, and it decreases when a predictor improves the model by less than expected by chance.
• Cohen's d measures effect size as the difference between two group means divided by the pooled standard deviation.
• Hedges' g (an unbiased calculation, useful for small sample sizes) applies a small-sample bias correction to this standardized mean difference.
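A quick sketch of computing these two effect sizes by hand on toy data (the correction factor used for Hedges' g is the common approximation):

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1)*a.var(ddof=1) + (nb - 1)*b.var(ddof=1)) / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

def hedges_g(a, b):
    """Cohen's d with a small-sample bias correction (approximation 1 - 3/(4N - 9))."""
    n = len(a) + len(b)
    return cohens_d(a, b) * (1 - 3 / (4*n - 9))

a = [5.0, 6.0, 7.0, 8.0]
b = [3.0, 4.0, 5.0, 6.0]
print(round(cohens_d(a, b), 3), round(hedges_g(a, b), 3))  # g shrinks d toward 0
```

With samples this small, g is noticeably smaller than d, which is exactly the bias correction at work.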

6
Q

Be able to draw on scenarios from your research area to write an equation for a binary logistic regression model and define each term in the model.

A

Research Question: Is sexual abuse (binary = y/n) associated with the increased odds of ever experiencing commercial sexual exploitation?

Equation:
• ln(p/(1-p)) = B0 + B1(X1)
• ln(p/(1-p)) = the log odds that Y = 1 (in this context, that the child experiences commercial sexual exploitation)
• B0 = the predicted log odds of commercial sexual exploitation when the IVs = 0
• B1 = the predicted change in the log odds corresponding to a one-unit change in X1 (the IV, sexual abuse)
o Often converted to an odds ratio (OR) for interpretation as an effect size
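A from-scratch sketch of fitting this model and converting B1 to an odds ratio, using simulated data and a hypothetical binary abuse indicator (in practice one would use statsmodels' Logit or R's glm with a binomial family rather than hand-rolling the fit):

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Minimal Newton-Raphson fit of a binary logistic regression.
    Returns coefficients b for ln(p/(1-p)) = X @ b. A sketch only."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ b))     # current predicted probabilities
        W = p * (1 - p)                  # IRLS weights
        grad = X.T @ (y - p)             # score (gradient of the log-likelihood)
        hess = X.T @ (X * W[:, None])    # observed information X'WX
        b += np.linalg.solve(hess, grad)
    return b

rng = np.random.default_rng(3)
n = 2000
abuse = rng.integers(0, 2, size=n)                   # hypothetical binary IV
true_logit = -1.0 + 1.2 * abuse                      # made-up B0 = -1.0, B1 = 1.2
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

b0, b1 = fit_logit(np.column_stack([np.ones(n), abuse]), y)
print(f"B1 = {b1:.2f}, OR = {np.exp(b1):.2f}")       # OR should be near exp(1.2)
```

Exponentiating B1 is the "converted to an odds ratio" step noted above.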

7
Q

What effect sizes should an investigator report when conducting binary logistic regression, and what do they mean?

A
  • It is recommended to report the beta coefficients, which are log odds. These can be exponentiated to obtain odds ratios and also used to estimate predicted probabilities.
  • In addition, report p-values as well as AIC and BIC values for model comparison.

Odds ratios and sometimes predicted probabilities should be used to interpret an effect size, in addition to the intercept, coefficients, and model fit.

Odds ratios:
• An odds ratio is the ratio of odds for one group compared to the other group
• Odds = probability of outcome happening divided by the probability of it not happening
• OR = the odds for one condition divided by the odds for the reference condition (e.g., the odds of commercial sexual exploitation for males divided by the odds for females)
• Ranges from 0 to positive infinity
• Greater than 1 means increased odds; less than 1 means decreased odds.
• With a dichotomous IV, the OR compares the odds between the two groups; with a continuous IV, it is the factor by which the predicted odds of Y = 1 change for each one-unit change in the IV
• For OR < 1, you can take 1/OR and describe the result in terms of the odds of Y = 0

Predicted Probabilities:
• Probabilities implied by the model (kind of like simple means/slopes)
• What is the probability of the outcome Y = 1 for different levels of IV
• Example: The probability that two groups (male/female children) are likely to experience outcome (commercial sexual exploitation) in sample according to the model
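Converting fitted log odds to predicted probabilities is just the inverse logit; a sketch with made-up coefficients for a hypothetical male/female comparison:

```python
import numpy as np

def predicted_prob(log_odds):
    """Invert the logit: p = 1 / (1 + exp(-log_odds))."""
    return 1 / (1 + np.exp(-log_odds))

# Hypothetical fitted model: ln(p/(1-p)) = -1.5 + 0.9 * male
b0, b1 = -1.5, 0.9
p_female = predicted_prob(b0)            # male = 0 (reference group)
p_male   = predicted_prob(b0 + b1)       # male = 1
print(f"P(Y=1 | female) = {p_female:.3f}, P(Y=1 | male) = {p_male:.3f}")
```

Reporting these model-implied probabilities alongside the OR often makes the effect easier for readers to interpret.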

8
Q

What approach and statistic(s) would you use to assess the model fit of a logistic regression model?

A

• Test statistic: The likelihood ratio test (p < 0.05) indicates whether the model fits better than the null model. It can also be used to compare two nested models.
• In addition, check the AIC and BIC: these are only used for model comparison, and lower values indicate better fit.
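All three statistics can be computed directly from the models' log-likelihoods; a sketch using hypothetical log-likelihood values for a null and a one-predictor logit model:

```python
import numpy as np
from scipy.stats import chi2

def lr_test(llf_null, llf_full, df_diff):
    """Likelihood ratio test: G = 2*(llf_full - llf_null), compared to a
    chi-square with df equal to the number of extra parameters."""
    G = 2 * (llf_full - llf_null)
    return G, chi2.sf(G, df_diff)

def aic(llf, k):       # k = number of estimated parameters
    return 2 * k - 2 * llf

def bic(llf, k, n):    # n = sample size
    return k * np.log(n) - 2 * llf

# Hypothetical log-likelihoods (made up for illustration)
llf_null, llf_full, n = -693.1, -650.4, 1000
G, p = lr_test(llf_null, llf_full, df_diff=1)
print(f"G = {G:.1f}, p = {p:.3g}")
print(f"AIC null = {aic(llf_null, 1):.1f}, AIC full = {aic(llf_full, 2):.1f}")
```

Here the full model wins on both criteria: the LR test is significant and its AIC is lower despite the extra parameter.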

9
Q

Why are confidence intervals and effect sizes important for understanding the results of ordinary least squares and binary logistic regressions?

A

• Confidence intervals are important to communicate the precision of the model/parameter. If a confidence interval is large, the prediction may not be precise and something may be causing large standard errors. The parameters may not be reliable estimates of the population values.
• Effect sizes are important for understanding the practical and clinical importance of an effect (how large is the rate of change, how much variance in DV is explained by the model)

10
Q

What is another way to write Logit(p)?

A

ln(p/(1-p)), i.e., the natural log of the odds that Y = 1
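A quick numeric check of the identity, using scipy's logit and its inverse, expit:

```python
import numpy as np
from scipy.special import logit, expit

p = np.array([0.1, 0.5, 0.9])
by_hand = np.log(p / (1 - p))              # ln(p/(1-p))
print(by_hand)                              # logit(0.5) = 0; symmetric around it
print(np.allclose(by_hand, logit(p)))       # matches scipy's logit
print(np.allclose(expit(by_hand), p))       # expit inverts the logit
```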
