Tutorial 2 extra practice material prev. courses Flashcards
Statistical formulas and calculations (13 cards)
What is the formula for calculating the mean?
mean = (Sum of all values) ÷ (Number of values)
What is the formula for calculating the variance?
Variance = (Sum of the squared differences between each value and the mean) ÷ (Number of values - 1)
Here’s what that means step by step:
1. Find the mean of the data set.
2. Subtract the mean from each value to find the difference.
3. Square each difference (multiply it by itself).
4. Add up all the squared differences.
5. Divide this total by the number of values minus 1 (for a sample; for a whole population, divide by the number of values).
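The five steps above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_variance` and the data set are made up for this example and are not part of the course material.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample variance")
    mean = sum(values) / n                  # step 1: mean of the data set
    diffs = [v - mean for v in values]      # step 2: difference to the mean
    squared = [d * d for d in diffs]        # step 3: square each difference
    total = sum(squared)                    # step 4: add up the squared differences
    return total / (n - 1)                  # step 5: divide by n - 1 (sample)


data = [2, 4, 4, 4, 5, 5, 7, 9]             # made-up example data, mean = 5
variance = sample_variance(data)
print(variance)                             # 32 / 7, roughly 4.57
```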
What is the formula for calculating the standard deviation?
Standard deviation = Square root of the variance
Here’s what that means step by step:
1. Find the mean of the data set.
2. Subtract the mean from each value to find the difference.
3. Square each difference (multiply it by itself).
4. Add up all the squared differences.
5. Divide this total by the number of values minus 1 (in case of a sample). This gives the variance.
6. Take the square root of the variance. This gives the standard deviation.
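As a quick check of these steps, the sketch below computes the sample standard deviation by hand and compares it with `statistics.stdev` from the Python standard library, which also divides by n - 1. The function name `sample_std` and the data are hypothetical.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    mean = sum(values) / n
    # Steps 2 to 5: squared deviations, summed, divided by n - 1.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Step 6: the square root brings the result back to the units of the data.
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up example data
by_hand = sample_std(data)
from_library = statistics.stdev(data)
print(by_hand, from_library)         # both are sqrt(32 / 7), roughly 2.14
```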
How do we interpret the variance?
We interpret the variance as a measure of how spread out the values in a data set are around the mean: it is the average squared deviation from the mean, so it is expressed in squared units.
- A low variance means the data points are close to the mean; the values don’t vary much.
- A high variance means the data points are spread out widely from the mean; there’s a lot of variability.
How do we interpret the standard deviation?
We interpret the standard deviation as a measure of the typical distance between a data point and the mean. It tells us how spread out the values in a data set are, in the same units as the data itself (not squared).
- A small standard deviation means the data points are close to the mean: low variability.
- A large standard deviation means the data points are more spread out from the mean: high variability.
Give the linear regression model
ŷ = β₀ + β₁𝑥
ŷ = predicted value
𝑥 = independent variable
β₀ = intercept (where the line crosses the y-axis)
β₁ = slope (how much 𝑦 changes with each unit of 𝑥)
What is the formula for the slope β₁ using the least squares method?
β₁ = s𝑥𝑦 / s²𝑥
s²𝑥 = (mean of 𝑥²) − (x̄)² (variance of 𝑥)
s𝑥𝑦 = (mean of 𝑥·𝑦) − x̄ · ȳ (covariance of 𝑥 and 𝑦)
What is the formula for the intercept β₀?
β₀ = ȳ − β₁x̄
ȳ = mean of the y-values
x̄ = mean of the x-values
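The slope and intercept formulas can be tried out on a small data set. The sketch below assumes the "mean of products minus product of means" form of the variance and covariance; the function names and the data are invented for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    # Variance of x: mean of x squared minus the squared mean of x.
    s_xx = mean([xi * xi for xi in x]) - x_bar ** 2
    # Covariance of x and y: mean of x*y minus the product of the means.
    s_xy = mean([xi * yi for xi, yi in zip(x, y)]) - x_bar * y_bar
    if s_xx == 0:
        raise ValueError("all x-values are identical, the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


x = [1, 2, 3, 4, 5]                  # made-up example data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(round(b0, 4), round(b1, 4))    # 2.2 0.6
```

For this data the fitted line is ŷ = 2.2 + 0.6𝑥, so each additional unit of 𝑥 raises the predicted value by 0.6.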
How do we test whether β₁ is significant at the 5% level (two-sided t-test)?
- t-test for the regression coefficients
- Hypothesis formulation: H₀: β₁ = 0 vs. H₁: β₁ ≠ 0 (𝛼 = 0.05)
- Empirical test statistic (t-test): test statistic = β̂₁ / sᵦ̂₁, where sᵦ̂₁ is the standard error of the estimated coefficient. In simple regression, sᵦ̂₁ = √( [Σ(yᵢ − ŷᵢ)² / (n − 2)] / Σ(xᵢ − x̄)² ).
- Determination of the critical value (t-table): reject H₀ if |test statistic| > t₁₋𝛼⁄2; n−j−1, where n is the number of observations and j the number of independent variables (j = 1 here).
- Compute the empirical test statistic, compare it with the critical value, and make the test decision.
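A minimal sketch of this test for a simple regression with one independent variable follows. It assumes the usual textbook standard error of the slope, √(residual variance / Σ(x − x̄)²) with n − 2 degrees of freedom, and takes the critical value as an input that you look up in a t-table. The function name and the data are hypothetical; 3.182 is the tabulated two-sided 5% critical value for 3 degrees of freedom.

```python
import math


def slope_t_test(x, y, t_critical):
    """Two-sided t-test of H0: slope = 0 in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # estimated slope
    b0 = y_bar - b1 * x_bar              # estimated intercept
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope, using n - 2 degrees of freedom.
    se_b1 = math.sqrt((sse / (n - 2)) / sxx)
    t_stat = b1 / se_b1
    # Two-sided test: compare the absolute value with the critical value.
    return t_stat, abs(t_stat) > t_critical


x = [1, 2, 3, 4, 5]                      # made-up example data
y = [2, 4, 5, 4, 5]
# n = 5, one regressor, so df = 3; t-table value for 0.975 and df = 3.
t_stat, reject_h0 = slope_t_test(x, y, t_critical=3.182)
print(round(t_stat, 3), reject_h0)       # 2.121 False
```

Here the test statistic (about 2.12) is below the critical value, so H₀ is not rejected at the 5% level for this small sample.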
What influences the chance of a type 1 / 𝛼 error the most?
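The chosen significance level 𝛼: it sets the probability of a type 1 error. Sample size mainly influences the type 2 (β) error.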
What is a type 1/𝛼 error?
False positive: Rejecting the null hypothesis when it is actually true.
You think there is an effect or difference, but there isn’t.
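A small simulation can make the type 1 error concrete. The sketch below draws many samples for which the null hypothesis (mean = 0) is true by construction, so every rejection is a false positive. For simplicity it uses a z-test with known standard deviation 1 instead of a t-test; 1.96 is the two-sided 5% critical value of the standard normal distribution. The function name, the defaults, and the seed are arbitrary choices for illustration.

```python
import math
import random


def type1_error_rate(trials=5000, n=30, z_critical=1.96, seed=0):
    """Share of samples in which a true null hypothesis (mean = 0) is rejected."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    false_positives = 0
    for _ in range(trials):
        # Data generated under H0: true mean 0, known standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean * math.sqrt(n)   # z = mean / (sigma / sqrt(n)), sigma = 1
        if abs(z) > z_critical:
            false_positives += 1         # H0 rejected although it is true
    return false_positives / trials


rate = type1_error_rate()
print(rate)                              # expected to land near 0.05
```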
What is a type 2/β error?
False negative: Failing to reject the null hypothesis when it is actually false.
You think there is no effect or difference, but there is.
What is endogeneity?
An independent variable is correlated with the error term, for example because both are related to an unobserved (omitted) variable. The estimated coefficient is then biased.