Module 5 (Lecture 5, Tutorial 2, Article) Flashcards

(24 cards)

1
Q

When do you use a chi-square test (χ^2) and what does it measure?

A

When both the IV and the DV are nominal (non-metric) and the data come from one group.

It measures whether the observed frequencies differ significantly from the expected frequencies.
- Goal is to test for an association between two nominal variables = chi-square test (χ^2) (contingency analysis).
- Goal is to predict a nominal (yes/no) DV = logistic regression.
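
As a rough illustration of comparing observed with expected frequencies, the sketch below computes the chi-square statistic for a small contingency table. The counts and the function name `chi_square_statistic` are made up for this example; they do not come from the lecture.

```python
# Hypothetical 2x2 contingency table: rows = one nominal variable,
# columns = the other nominal variable (counts are invented).
observed = [[30, 10],
            [20, 40]]

def chi_square_statistic(table):
    """Return (chi2, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

chi2, dof = chi_square_statistic(observed)
print(round(chi2, 2), dof)  # prints 16.67 1
```

With 1 degree of freedom the critical value at the 5% level is about 3.84, so a statistic this large would lead to rejecting the hypothesis of no association.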

2
Q

When do you use a t-test and what does it measure?

A

Use a t-test when the DV is metric and the goal is to compare means.

It measures whether a mean difference is statistically significant.
- t-test with one group: tests whether the mean of the group differs from a known or expected value.
- t-test with two groups: tests whether the means of the two groups are significantly different.
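
The two cases above can be sketched as follows. This is a minimal illustration, not the course's procedure: the data are invented, the function names are hypothetical, and the two-group version uses Welch's form of the t statistic (it does not assume equal variances).

```python
import math
import statistics

def one_sample_t(sample, expected_mean):
    """t statistic for: does the group mean differ from expected_mean?"""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - expected_mean) / standard_error

def welch_t(group_a, group_b):
    """t statistic for: do the two group means differ? (Welch's version)"""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Invented example data.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.0, 4.8, 4.5, 4.4]

print(one_sample_t(group_a, expected_mean=5.0))
print(welch_t(group_a, group_b))
```

The further the statistic is from zero, the less plausible it is that the difference is only sampling error; the p-value comes from comparing it with the t distribution.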

3
Q

When do you use an F-test and what does it measure?

A

The F-test is used when the DV is metric.

  • use an F-test when you have 2 groups and want to test whether the variances of the two groups are different.
  • use an F-test when you have 3+ groups and want to test whether their means differ significantly (used in ANOVA).
  • also used in regression to test whether the model explains a significant portion of the variance.
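
The first two uses above can be sketched like this. The function names and numbers are invented for illustration; the ANOVA version is the standard one-way F ratio (variance between groups divided by variance within groups).

```python
import statistics

def variance_ratio_f(group_a, group_b):
    """F statistic for comparing the variances of two groups."""
    return statistics.variance(group_a) / statistics.variance(group_b)

def anova_f(groups):
    """One-way ANOVA F statistic for comparing the means of 3+ groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Invented example: three groups whose means are 2, 3 and 4.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(anova_f(groups))  # prints 3.0
```
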
4
Q

Explain the purpose of a hypothesis test for mean differences

A

A hypothesis test checks whether an observed difference in means is likely due to random sampling error or reflects a real effect.

Null hypothesis (H0) = there is no difference between the group means.
With a t-test you compare group means.
If the difference is statistically significant, you reject H0.

5
Q

What are the null and alternative hypotheses in a t-test?

A
  • H0 (null hypothesis): μ1 - μ2 = 0.
  • H1 (alternative hypothesis):
    Two-sided: μ1 - μ2 ≠ 0.
    One-sided: μ1 - μ2 < 0 or μ1 - μ2 > 0.
6
Q

What does the significance level (α) mean?

A

The significance level is the threshold below which the p-value must fall to reject the null hypothesis.

It represents the probability of making a type I error: rejecting H0 when it’s actually true.

Lower α = fewer false alarms.

7
Q

What does it mean if a test result falls in the shaded tail of the bell curve?

A

It means the result is statistically significant: it is unlikely to occur by random chance under H0, so you reject H0.

8
Q

What is power (1 - β)?

A

The chance of correctly detecting a real effect, if it exists.

It’s the chance of correctly rejecting H0 when H1 is true.

Higher power = lower chance of missing a real effect.
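
To make the idea concrete, the sketch below approximates power for a two-sided one-sample z-test. This is a simplification chosen for illustration (it assumes the standard deviation `sigma` is known); the function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect: true difference between the mean and the H0 value
    sigma:  standard deviation of the data (assumed known)
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true effect shifts the test statistic, in standard errors.
    shift = effect * math.sqrt(n) / sigma
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

print(z_test_power(effect=0.5, sigma=1.0, n=20))  # baseline
print(z_test_power(effect=0.8, sigma=1.0, n=20))  # larger effect: more power
print(z_test_power(effect=0.5, sigma=1.0, n=50))  # larger sample: more power
print(z_test_power(effect=0.5, sigma=2.0, n=20))  # more dispersion: less power
```

When the true effect is zero the function returns α, which is the type I error rate rather than power in any useful sense.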

9
Q

Difference between type I error and type II error?

A

Type I = false positive, you reject H0 when it’s actually true.

Type II = false negative, you fail to reject H0 even though H0 is false.

10
Q

What affects the β error (false negative)?

A
  • Larger effect size -> lowers β error.
  • Larger sample size -> lowers β error.
  • More data dispersion -> increases β error.
11
Q

What is the objective of a regression analysis?

A
  1. Measures the slope of the regression line.
  2. Estimates influence of X on Y.
12
Q

What is the regression formula?

A

Y = β0 + β1X

β0 is the intercept and β1 is the slope of the line (change in Y / change in X).

13
Q

What is the least squares method in regression?

A

It is a method to find the best β0 and β1 for the regression line: the values that minimise the squared differences between the actual observations and the regression line.

Steps:
  1. Add an error term to the regression formula: Y = β0 + β1X + u
  2. Rearrange so that the error term is isolated: u = Y - β0 - β1X
  3. Minimise the total squared errors (the sum of u^2 over all observations).
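
For a single IV, minimising the total squared errors has a well-known closed-form solution, sketched below. The data and the function name `fit_line` are invented; `b0` and `b1` stand for the estimated intercept and slope.

```python
import statistics

def fit_line(xs, ys):
    """Least squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Invented example data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # the error terms u
sse = sum(u ** 2 for u in residuals)  # total squared error being minimised
print(b0, b1, sse)
```

Any other choice of intercept and slope would give a larger `sse` on the same data.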
14
Q

What is R^2 in regression?

A

R^2 is the goodness-of-fit statistic in regression.
It shows how much of the variance in the DV (Y) is explained by the IV (X).

Formula (one IV): R^2 = (regression coefficient, i.e. slope)^2 x (variance of X / variance of Y).

A higher R^2 means X explains more of the variation in Y.
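
As a quick check on invented data, the sketch below computes R^2 for a one-IV regression in two ways: from the slope and the two variances, and as the share of variation in Y that is left unexplained subtracted from 1. The variable names are hypothetical.

```python
import statistics

# Invented example data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Route 1: slope^2 * variance of x / variance of y.
r2_from_slope = b1 ** 2 * statistics.variance(xs) / statistics.variance(ys)

# Route 2: 1 - (unexplained variation / total variation).
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - y_bar) ** 2 for y in ys)
r2_from_residuals = 1 - ss_res / ss_tot

print(round(r2_from_slope, 3), round(r2_from_residuals, 3))  # prints 0.6 0.6
```
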

15
Q

What are the 3 limitations of R^2?

A
  1. No rules on how high R^2 needs to be.
  2. Offers no info about how well the model performs outside the sample.
  3. Says nothing about practical importance (you can have a high R^2 but a really small slope).
16
Q

Differences between correlation analysis (3) and regression analysis (3)?

A

Correlation analysis:
1. Correlation coefficient between -1 and +1.
2. Measures linear correlation between 2 variables.
3. No theory needed (just shows correlation) and not testable.

Regression analysis:
1. Regression coefficient (unconstrained).
2. Measures the linear relationship between one DV and one or more influencing variables.
3. Theoretical understanding necessary (you need to decide which variable influences which) and testable (you can also do causal models).

17
Q

What is a multiple linear regression?

A

A statistical method used to examine the relationship between 1 DV (Y) and 2 or more IVs (X1, …).

No multicollinearity = multiple regression assumes that the IVs are not highly correlated with each other.

18
Q

What are the 4 key assumptions of linear regression?

A
  1. Linear relationship between DV and IV.
  2. Error term is normally distributed.
  3. The model should show homoscedasticity (equal spread of errors across x values).
  4. Sample size of at least 20 cases per IV.
19
Q

Which 4 variables increase the likelihood that media outlets report news about corporate social irresponsibility?

A
  1. Brand salience (how prominent a brand is in someone’s memory)
  2. Brand strength
  3. Level of negative word of mouth
  4. Domestic brand
20
Q

On what scale is “gender” coded?

A

On a nominal scale.

Use dummy variables to represent it numerically in an analysis.

21
Q

What are 3 advantages of using a multi-item scale compared to single-item scale?

A
  • Fewer variables in your regression formula.
  • Higher reliability.
  • Higher validity.
22
Q

Suppose we want to identify the factors that drive willingness to pay, and the independent variable is metric. What kind of econometric analysis could we perform?

A

Regression analysis

Both IV and DV are metric.

23
Q

Suppose that we ask our respondents whether they would join the festival, either yes or no. IV is metric.

What kind of analysis could we perform?

A

Logistic regression.

For non-metric DV (yes/no) and metric IV.
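
A minimal sketch of the idea, assuming one metric IV and a yes/no DV coded as 1/0. The data are invented and the model is fitted with plain gradient descent, which is just one simple way to estimate the coefficients; statistical software normally uses a dedicated routine.

```python
import math

def sigmoid(z):
    """Map any number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Estimate (b0, b1) in P(y = 1) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus outcome
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1

# Invented data: x = a metric score, y = would join the festival (1 = yes, 0 = no).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(sigmoid(b0 + b1 * 2))  # estimated probability of "yes" at x = 2
print(sigmoid(b0 + b1 * 7))  # estimated probability of "yes" at x = 7
```

The output is a probability of "yes" for each value of the IV rather than a predicted metric value, which is why this model fits a yes/no DV.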

24
Q

What is correlation? What is causality?

A

Correlation = when two or more events are related to each other and change together.

Causality = when one event contributes to the production of another event. The cause is partly responsible for the effect, and the effect is dependent on the cause.