Lecture 2 - Continuous outcomes Flashcards

1
Q

Analysis of Variance (ANOVA)
What is it used for?

A

Used to compare the means of three or more groups/samples. It tests whether at least one group mean differs from the others by comparing the variation between groups with the variation within groups.
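
As an illustration of how ANOVA turns group means into one test statistic, here is a minimal sketch that computes the F statistic by hand. The function name `one_way_anova_f` and the example data are made up for this card; in practice a statistics package would also report the p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three groups of three observations each
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F means the group means are spread out much more than the scatter within groups would explain.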

2
Q

Analysis of Variance (ANOVA)
What assumptions need to be met?

A
  • Independent variable = categorical
  • Dependent variable = scale level and normally distributed
  • Variances (SD^2) are equal in all groups
3
Q

Analysis of Variance (ANOVA)
* What solutions are there if the outcome on scale level is not normally distributed?
* Which test can be used to determine equal variances in all groups?

A
  • Either transform the outcome or use the non-parametric Kruskal-Wallis test
  • Test of Homogeneity of Variances (e.g. Levene's test)
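
To show what "non-parametric" means here, the sketch below computes the Kruskal-Wallis H statistic, which works on the ranks of the observations rather than on their raw values. It is a simplified illustration: the function name `kruskal_wallis_h` and the data are made up, and the sketch assumes there are no tied values (no tie correction is applied).

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for samples without tied values."""
    pooled = sorted(x for g in groups for x in g)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    # Rank 1 goes to the smallest pooled observation
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_total = len(pooled)
    # Sum over groups of (rank sum squared / group size)
    rank_term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical data: three groups with completely separated values
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(groups)
print(h_stat)
```

Because only ranks are used, a strongly skewed outcome or an outlier has far less influence than it would in an ANOVA.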
4
Q

When three group means are compared with ANOVA and the resulting p-value is <0.001, you can conclude that the means of the three groups are not all equal. However, ANOVA only indicates that there is a significant difference somewhere between the groups. You cannot conclude which groups differ from each other.
How can you determine where the significant difference is located?

A

By doing a post-hoc procedure (i.e. using ANOVA to establish that there is a significant difference and then using the post-hoc procedure to establish where it is located).
* For three groups: performing three pairwise independent-samples t-tests
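
A short sketch of why three groups lead to three t-tests: every pair of groups is compared once. The group labels are placeholders chosen for this example.

```python
from itertools import combinations
from math import comb

# Hypothetical group labels
group_names = ["A", "B", "C"]

# Every unordered pair of groups gets one post-hoc comparison
pairs = list(combinations(group_names, 2))
n_comparisons = comb(len(group_names), 2)

print(pairs)
print(n_comparisons)
```

With four groups the same calculation gives six comparisons, so the number of tests grows quickly with the number of groups.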

5
Q

Why is a Bonferroni correction needed when doing a post-hoc procedure to determine where the significant difference is located after an ANOVA?

A

A Bonferroni correction is needed to address the inflated Type I error rate that arises when conducting multiple statistical tests. When you perform three independent-samples t-tests, each at a significance level of 0.05, the overall probability of making at least one Type I error (i.e. rejecting H0 while H0 is true) across the three tests can be as high as 3 x 0.05 = 0.15 (about 0.14 if the tests were independent of each other), instead of 0.05. The Bonferroni correction keeps the overall error rate at or below 0.05 by testing each comparison at 0.05 / 3 ≈ 0.017.
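
The arithmetic can be checked with a few lines. This is a sketch under the simplifying assumption that the three tests are independent of each other, which pairwise comparisons on the same groups are not exactly; the variable names are made up for this card.

```python
alpha = 0.05   # significance level per test
n_tests = 3    # three pairwise comparisons

# Chance of at least one Type I error if the tests were independent
fwer_if_independent = 1 - (1 - alpha) ** n_tests

# Upper bound that holds whether or not the tests are independent
fwer_upper_bound = n_tests * alpha

# Bonferroni: test each comparison at alpha / n_tests instead
bonferroni_alpha = alpha / n_tests
fwer_after_correction = 1 - (1 - bonferroni_alpha) ** n_tests

print(fwer_if_independent, fwer_upper_bound)
print(bonferroni_alpha, fwer_after_correction)
```

Without correction the overall error rate is roughly 14 to 15%; with the corrected per-test level it stays just below 5%.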

6
Q

Linear regression - continuous covariate
What is it used for?

A

Used to predict a continuous outcome Y from an independent variable X, assuming a linear relationship between them.

7
Q

Linear regression - continuous covariate
What assumptions need to be met?

A
  • Linear relationship between dependent and independent variables.
  • Independent variable discrete or continuous.
  • Dependent variable continuous.
  • Dependent variable has equal variances for each value of the independent variable.
8
Q

The formula that belongs to a linear regression with a continuous covariate is: y = b0 + b1 * x
* What is b0?
* What is b1?

A
  • b0 = mean y when x = 0 (the intercept)
  • b1 = mean difference in y when x increases by 1 (the slope)
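
To make the roles of b0 and b1 concrete, here is a minimal sketch that estimates both with ordinary least squares. The function name `fit_line` and the data are made up; the data lie exactly on y = 1 + 2x, so the fit should recover an intercept of 1 and a slope of 2.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope: change in y per unit increase in x
    b0 = y_bar - b1 * x_bar   # intercept: predicted y at x = 0
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```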
9
Q

So if you want to know whether a dependent variable is (significantly) associated with the independent variable, a linear regression model can be made.
The model has the following output:
* b0 = -2.266
* p < 0.001
* b1 = 0.031

Considering the linear regression formula, calculate y when x = 100 and when x = 101.

A
  • x = 100: y = -2.266 + 0.031 * 100 = 0.834
  • x = 101: y = -2.266 + 0.031 * 101 = 0.865
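
The same calculation as a short sketch, using the coefficients from this card (the function name `predict` is made up). It also shows that the two predictions differ by exactly b1, which is what "mean difference in y when x increases by 1" means.

```python
b0 = -2.266  # intercept from the model output
b1 = 0.031   # slope from the model output

def predict(x):
    """Predicted y for a given x under y = b0 + b1 * x."""
    return b0 + b1 * x

y_100 = predict(100)
y_101 = predict(101)
difference = y_101 - y_100  # equals b1, up to floating-point rounding

print(round(y_100, 3), round(y_101, 3), round(difference, 3))
```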
10
Q

Dummy regression
What assumptions need to be met?

A
  • Dependent variable is quantitative (e.g. 13.04).
  • Residuals surrounding the regression line are roughly normally distributed
  • Linear relationship between independent and dependent variables.
  • Variances of residuals are equal for all X values.
11
Q

You are analysing whether there is a difference in salary between people with different educational backgrounds (low, moderate, high). For this, 2 dummies are created:
* Dummy1: Moderate education coded as 1 + other (low and high education) coded as 0.
* Dummy2: High education coded as 1 + other (low and moderate education) coded as 0.

Name the overall regression formula and use this formula to calculate the mean salary for each education level.

A

Overall regression formula: salary (y) = B0 + B1 x Dummy1 + B2 x Dummy2.
* Low education (reference group): salary (y) = B0 + B1 x 0 + B2 x 0 = B0
* Moderate education: salary (y) = B0 + B1 x 1 + B2 x 0 = B0 + B1
* High education: salary (y) = B0 + B1 x 0 + B2 x 1 = B0 + B2
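
The dummy coding can be sketched as follows. The helper names and the coefficient values are purely hypothetical (they are not taken from the lecture); they only illustrate how each education level switches the dummies on or off.

```python
def dummies(level):
    """Return (Dummy1, Dummy2) for an education level; low is the reference."""
    dummy1 = 1 if level == "moderate" else 0
    dummy2 = 1 if level == "high" else 0
    return dummy1, dummy2

def predicted_salary(level, b0, b1, b2):
    """Mean salary under salary = B0 + B1 * Dummy1 + B2 * Dummy2."""
    d1, d2 = dummies(level)
    return b0 + b1 * d1 + b2 * d2

# Hypothetical coefficients, chosen only for illustration
b0, b1, b2 = 2000.0, 500.0, 1200.0

salaries = {level: predicted_salary(level, b0, b1, b2)
            for level in ("low", "moderate", "high")}
print(salaries)
```

With these made-up numbers, B1 is the salary difference between moderate and low education, and B2 the difference between high and low education.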
