Module 5 (Lecture 5, Tutorial 2, Article) Flashcards
(24 cards)
When do you use a Chi-square test (χ²) and what does it measure?
Use a Chi-square test when both the IV and the DV are nominal (non-metric) and you have one group (sample).
It measures whether the observed frequencies differ significantly from the expected frequencies.
- Goal is to test for an association between two nominal variables = Chi-square test (χ², contingency analysis).
- Goal is to predict a nominal (yes/no) DV = logistic regression.
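A minimal Python sketch of a chi-square (contingency) test; the 2×2 table and the 0.05 threshold are made-up illustrations, not from the lecture:

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows = gender (m/f), columns = bought product (yes/no)
observed = np.array([[30, 20],
                     [25, 35]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
print("expected frequencies:\n", expected)
# If p < alpha (e.g. 0.05), observed and expected frequencies differ
# significantly, i.e. the two nominal variables are associated.
```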
When do you use a t-test and what does it measure?
Use a t-test when the DV is metric and the goal is to compare means.
It measures whether a mean difference is statistically significant.
- One group (one-sample t-test): tests whether the group mean differs from a known or expected value.
- Two groups (two-sample t-test): tests whether the means of the two groups differ significantly.
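A minimal Python sketch of both cases, using simulated (hypothetical) data:

```python
import numpy as np
from scipy import stats

np.random.seed(1)
group_a = np.random.normal(loc=5.2, scale=1.0, size=30)
group_b = np.random.normal(loc=4.8, scale=1.0, size=30)

# One group: does the mean of group_a differ from an expected value of 5?
t1, p1 = stats.ttest_1samp(group_a, popmean=5.0)

# Two groups: do the means of group_a and group_b differ?
t2, p2 = stats.ttest_ind(group_a, group_b)

print(f"one-sample:  t = {t1:.2f}, p = {p1:.3f}")
print(f"two-sample:  t = {t2:.2f}, p = {p2:.3f}")
# If p < alpha, the mean difference is statistically significant (reject H0).
```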
When do you use an F-test and what does it measure?
The F-test is used when the DV is metric.
- Use an F-test when you have 2 groups and want to test whether the variances of the two groups differ.
- Use an F-test when you have 3+ groups and want to test whether their means differ significantly (used in ANOVA).
- It is also used in regression to test whether the model explains a significant portion of the variance.
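A minimal sketch of the 3+ groups case (one-way ANOVA) with hypothetical data:

```python
import numpy as np
from scipy import stats

np.random.seed(2)
g1 = np.random.normal(5.0, 1.0, 25)
g2 = np.random.normal(5.5, 1.0, 25)
g3 = np.random.normal(6.0, 1.0, 25)

f_stat, p = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p:.3f}")
# If p < alpha, at least one group mean differs significantly from the others.
```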
Explain the purpose of a hypothesis test for mean differences
A hypothesis test checks whether an observed difference in means is likely due to random sampling error or reflects a real effect.
Null hypothesis (H0) = there is no difference between the group means.
With a t-test you compare group means.
If the difference is statistically significant, you reject H0.
What are the null and alternative hypothesis in a t-test?
- H0 (null hypothesis): μ1 − μ2 = 0.
- H1 (alternative hypothesis):
  Two-sided: μ1 − μ2 ≠ 0.
  One-sided: μ1 − μ2 < 0 or μ1 − μ2 > 0.
What does the significance level (α) mean?
The significance level is the threshold below which the p-value must fall to reject the null hypothesis.
It represents the probability of making a type I error: rejecting H0 when it is actually true.
Lower α = fewer false positives.
What does it mean if a test result falls in the shaded tail of the bell curve?
It means the result is statistically significant: it is unlikely to occur by random chance under H0, so you reject H0.
What is power (1 − β)?
The chance of correctly detecting a real effect, if it exists.
It’s the chance of correctly rejecting H0 when H1 is true.
Higher power = lower chance of missing a real effect.
Difference between type I error and type II error?
Type I = false positive, you reject H0 when it’s actually true.
Type II = false negative, you fail to reject H0 even though H0 is false.
What affects the α error (false positive)?
- Larger effect size -> lowers the α error.
- Larger sample size -> lowers the α error.
- More data dispersion -> increases the α error.
What is the objective of a regression analysis?
- Measures the slope of the regression line.
- Estimates influence of X on Y.
What is the regression formula?
Y = B0 + B1X
B1 is the slope of the line (change in Y / change in X).
What is the least squares method in regression?
It is the method for finding the best B0 and B1 for the regression line: the values that minimise the squared differences between the actual observations and the regression line.
Steps:
1. Start from the regression formula with the error term: Y = B0 + B1X + u.
2. Rearrange so the error term is isolated: u = Y − B0 − B1X.
3. Minimise the total squared errors (the sum of u² over all observations).
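A minimal sketch with hypothetical data: the closed-form least squares estimates for B0 and B1, checked against numpy's built-in fit.

```python
import numpy as np

np.random.seed(3)
x = np.random.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + np.random.normal(0, 1, 50)   # "true" line plus noise

# Closed-form least squares solution for simple regression
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")

# Same result via numpy's built-in least squares fit
b1_np, b0_np = np.polyfit(x, y, deg=1)
print(f"np.polyfit: b0 = {b0_np:.2f}, b1 = {b1_np:.2f}")
```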
What is R^2 in regression?
R^2 is the goodness-of-fit statistic in regression.
It shows how much of the variance in the DV (Y) is explained by the IV (X).
Formula: R² = (regression coefficient / slope)² × (variance of X / variance of Y).
A higher R^2 means X explains more of the variation in Y.
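A minimal sketch (same kind of hypothetical data as above) showing that the explained-variance definition and the slope formula give the same R²:

```python
import numpy as np

np.random.seed(3)
x = np.random.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + np.random.normal(0, 1, 50)

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

# 1) Share of variance in Y explained by the model
r2_fit = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# 2) Formula from the card: slope^2 * (variance of X / variance of Y)
r2_formula = b1 ** 2 * (np.var(x) / np.var(y))

print(f"R^2 (explained variance) = {r2_fit:.3f}")
print(f"R^2 (slope formula)      = {r2_formula:.3f}")
```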
What are the 3 limitations of R^2?
- There are no rules for how high R² needs to be.
- It offers no information about how well the model performs out of sample.
- It says nothing about practical importance (you can have a high R² but a very small slope).
Differences between correlation analysis (3) and regression analysis (3)?
Correlation analysis:
1. Correlation coefficient between -1 and +1.
2. Measures linear correlation between 2 variables.
3. No theory needed (just shows correlation) and not testable.
Regression analysis:
1. Regression coefficient (unconstrained).
2. Measures the linear relationship between one DV and one or more influencing variables.
3. Theoretical understanding necessary (you need to decide which variable influences which) and testable (you can also do causal models).
What is a multiple linear regression?
A statistical method used to examine the relationship between 1 DV (Y) and 2 or more IVs (X1, …).
No multicollinearity: multiple regression assumes that the IVs are not highly correlated with each other.
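A minimal sketch with hypothetical data: a multiple regression with two IVs, plus a variance inflation factor (VIF) check for multicollinearity (the ~5–10 rule of thumb is a common convention, not from the lecture):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

np.random.seed(4)
df = pd.DataFrame({
    "x1": np.random.normal(0, 1, 100),
    "x2": np.random.normal(0, 1, 100),
})
df["y"] = 1.0 + 0.8 * df["x1"] - 0.3 * df["x2"] + np.random.normal(0, 1, 100)

X = sm.add_constant(df[["x1", "x2"]])   # adds the intercept (B0)
model = sm.OLS(df["y"], X).fit()
print(model.summary())

# VIF per IV; values well above ~5-10 are a common warning sign of multicollinearity
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X.values, i), 2))
```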
What are the 4 key assumptions of linear regression?
- Linear relationship between DV and IV.
- Error term is normally distributed.
- The model should show homoscedasticity (equal spread of errors across x values).
- Sample size of at least 20 cases per IV.
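One possible way (not from the lecture) to check two of these assumptions on the residuals of a fitted model, using simulated data:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

np.random.seed(5)
x = np.random.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + np.random.normal(0, 1, 100)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
resid = fit.resid

# Normality of the error term (Shapiro-Wilk): p > alpha -> no evidence against normality
w, p_norm = stats.shapiro(resid)
print("Shapiro-Wilk p =", round(p_norm, 3))

# Homoscedasticity (Breusch-Pagan): p > alpha -> no evidence of unequal error spread
bp_stat, bp_p, _, _ = het_breuschpagan(resid, X)
print("Breusch-Pagan p =", round(bp_p, 3))
```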
Which 4 variables increase the likelihood that media outlets report news about corporate social irresponsibility?
- Brand salience (how prominent a brand is in someone’s memory)
- Brand strength
- Level of negative word of mouth
- Domestic brand
On what scale is “gender” coded?
On a nominal scale.
Dummy variables can be used to describe it numerically.
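A minimal sketch with made-up data: coding a nominal variable like gender as a 0/1 dummy so it can be used in a regression.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["female", "male", "female", "male"]})
dummies = pd.get_dummies(df["gender"], drop_first=True).astype(int)  # keeps "male" as a 0/1 dummy
df = pd.concat([df, dummies], axis=1)
print(df)
# "female" becomes the reference category; the dummy's coefficient in a regression
# would measure the difference of "male" relative to "female".
```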
What are 3 advantages of using a multi-item scale compared to single-item scale?
- Fewer variables in your regression formula.
- Higher reliability.
- Higher validity.
Suppose we want to identify the factors that drive willingness to pay, and the independent variable is metric. What kind of econometric analysis could we perform?
Regression analysis
Both IV and DV are metric.
Suppose that we ask our respondents whether they would join the festival (yes or no). The IV is metric.
What kind of analysis could we perform?
Logistic regression.
For non-metric DV (yes/no) and metric IV.
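A minimal sketch with simulated data: a logistic regression with a yes/no DV (join the festival or not) and a metric IV; "age" and the coefficients are hypothetical illustrations.

```python
import numpy as np
import statsmodels.api as sm

np.random.seed(6)
age = np.random.uniform(18, 60, 200)
# Hypothetical data-generating process: younger respondents more likely to join
p_join = 1 / (1 + np.exp(-(3.0 - 0.08 * age)))
join = np.random.binomial(1, p_join)

X = sm.add_constant(age)
logit = sm.Logit(join, X).fit()
print(logit.summary())
# The coefficient on age is on the log-odds scale; a negative sign means the
# probability of joining decreases with age in this simulated example.
```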
What is correlation? What is causality?
Correlation = when two or more events are related to each other and change together.
Causality = when one event contributes to the production of another event. The cause is partly responsible for the effect, and the effect is dependent on the cause.