Midterm Flashcards
Right vs left skewed distribution
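Right (positive) skew = mode < median < mean, with the long tail on the right
Left (negative) skew = mean < median < mode, with the long tail on the left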
What is the variance
The arithmetic average of the squared differences of the data values from the mean
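A minimal sketch of the variance calculation, using made-up data chosen only for illustration. It also shows the sample version, which divides by n - 1 instead of n.

```python
# Hypothetical data values, not from the course material.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: arithmetic average of the squared differences from the mean.
population_variance = sum(squared_diffs) / len(data)

# Sample variance: same numerator, but divided by n - 1.
sample_variance = sum(squared_diffs) / (len(data) - 1)

print(population_variance, sample_variance)
```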
How to calculate standard error
Standard deviation divided by the square root of the sample size
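A short sketch of the standard error of the mean, assuming a small made-up sample and the sample standard deviation (n - 1 denominator).

```python
import math
import statistics

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.stdev(sample)          # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(len(sample))       # standard error of the mean

print(sd, se)
```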
What is standard deviation
Describes the spread of values in a continuous distribution, in either a sample or a population
It is used as a descriptive statistic
What is a standard error
Measures how precisely a sample statistic (such as the sample mean) estimates the population value; it is the standard deviation of the sampling distribution
How do you calculate confidence bounds
Mean ± (t value × standard error)
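A sketch of 95% confidence bounds for a mean. The sample is made up, and the t value is assumed to be read from a t table (two-sided 95%, df = n - 1 = 7) rather than computed.

```python
import math
import statistics

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)

# Assumption: critical value looked up in a t table for df = 7, two-sided 95%.
t_value = 2.365

lower = mean - t_value * se
upper = mean + t_value * se

print(lower, upper)
```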
How to calculate standard error of the difference
Square root of (se1 squared plus se2 squared)
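A tiny sketch of the standard error of a difference between two means, assuming two independent samples. The two standard errors are hypothetical numbers.

```python
import math

# Hypothetical standard errors of two independent sample means.
se1 = 0.8
se2 = 0.6

# Square root of the sum of the squared standard errors.
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)

print(se_diff)
```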
How to calculate a chi square
For each category, take (observed frequency minus expected frequency) squared, divide by the expected frequency, then sum across all categories
What does the chi square tell us about the null hypothesis
If the chi square is larger than the critical value for the given degrees of freedom, the null hypothesis can be rejected
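A sketch of the chi square calculation and the decision rule. The frequencies are invented, and the critical value is assumed to come from a chi square table (0.05 level, df = 1).

```python
# Hypothetical observed and expected frequencies for two categories.
observed = [30, 20]
expected = [25, 25]

# Sum over categories of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumption: critical value from a chi square table, alpha = 0.05, df = 1.
critical_value = 3.841

reject_null = chi_square > critical_value

print(chi_square, reject_null)
```

Here the statistic falls below the critical value, so the null hypothesis is not rejected.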
Explain type I vs type II error
Type I error is one where you reject a null hypothesis that is actually true (a false positive)
Type II error is one where you fail to reject a null hypothesis that is actually false (a false negative)
What is the central limit theorem
Establishes that the means of repeated large samples are approximately normally distributed even when the underlying distribution of the data is not normal
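A small simulation sketch of the idea, assuming exponential data (strongly right skewed, mean 1, median about 0.69). The sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50      # observations per sample (arbitrary)
NUM_SAMPLES = 2000    # number of repeated samples (arbitrary)

# Draw many samples from a skewed distribution and keep each sample's mean.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.mean(sample_means)
median_of_means = statistics.median(sample_means)
sd_of_means = statistics.stdev(sample_means)

# In a roughly normal distribution the mean and median nearly coincide,
# and the spread of the means is close to sigma / sqrt(n) = 1 / sqrt(50).
print(mean_of_means, median_of_means, sd_of_means)
```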
What is a confidence interval?
A range of values around the sample estimate that is expected to contain the population mean with a stated level of confidence (generally 95%)
Pearson’s correlation coefficient
Measure the association between two continuous variables
R is scaled between -1 and 1
R = covariance of x and y divided by (standard deviation of x * standard deviation of y)
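A sketch of Pearson's r computed as the covariance of x and y divided by the product of their standard deviations. The paired values are hypothetical.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and sample standard deviations (all divide by n - 1).
covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

r = covariance / (sd_x * sd_y)

print(r)
```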
Correlation vs regressions
Correlation tells us how strongly associated two variables are
Regression can tell us, on average, how much a one unit increase in the independent variable changes the predicted value of the dependent variable
What does the line of best fit do
Minimizes the sum of the squared vertical (Y) distances from each observation to the line
Why do we use y hat and how does it differ
Y hat means we are producing estimated y values
For the actual values of y we need the error term, so yi = a + bxi + ei, while y hat is just a + bxi
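A sketch of a least squares line on made-up data, showing the estimated values (y hat) next to the actual values and the error terms that separate them. The variable names are illustrative.

```python
# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept for one independent variable.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x

# y hat: estimated values that lie on the line.
y_hat = [a + b * xi for xi in x]

# Error terms: actual y minus y hat, so that y = a + b*x + e.
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

print(a, b)
print(y_hat)
print(residuals)
```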
Standard error of the slope
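Given by the root mean square error divided by the square root of the sum of squared deviations of x from its mean: s.e.(b) = RMSE / (s_x * sqrt(n - 1)), where s_x is the sample standard deviation of x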
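A sketch of the standard error of the slope for a one-variable regression, assuming made-up data and a root mean square error computed with n - 2 degrees of freedom.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)   # sum of squared deviations of x

# Least squares slope and intercept.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum_sq_x
a = mean_y - b * mean_x

# Root mean square error of the residuals, with n - 2 degrees of freedom.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
rmse = math.sqrt(sse / (n - 2))

# Standard error of the slope.
se_b = rmse / math.sqrt(sum_sq_x)

print(rmse, se_b)
```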
T ratio
t = (b - β_H0) / s.e.(b), where β_H0 is the slope under the null hypothesis (usually 0)
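A sketch of the t ratio for a slope. The slope, its standard error, and the sample size are hypothetical, and the critical value is assumed to be read from a t table (two-sided 95%, df = n - 2 = 3).

```python
# Hypothetical regression output.
b = 0.6          # estimated slope
se_b = 0.2828    # standard error of the slope
beta_h0 = 0.0    # slope under the null hypothesis

t_ratio = (b - beta_h0) / se_b

# Assumption: critical value from a t table for df = 3, two-sided 95%.
t_critical = 3.182

reject_null = abs(t_ratio) > t_critical

print(t_ratio, reject_null)
```

With so few observations the t ratio of about 2.12 does not exceed the critical value, so the null hypothesis is not rejected.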
What are the five OLS assumptions
Linearity
Mean independence
Homoscedasticity
Uncorrelated disturbances
Normal disturbance
Explain linearity
Linearity - the dependent variable is a linear function of the x’s plus a population error term, e.g. y = a + β1x1 + β2x2 + e
Pertains to linearity in the parameters
Explain mean independence
Zero conditional mean
The mean value of error does not depend on any of the x’s
Assume that E(ε | x) = 0, where ε is the error term
Most important because violations (1) can generate large bias in the estimates and (2) cannot be tested for without additional data
Sources of violation: omitted variable bias
Endogeneity bias
Measurement error
Explain homoscedasticity
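The variance of the error does not depend on the x’s
The error variance (standard deviation squared) is constant
You want homoscedasticity
In a test for heteroscedasticity, the p value has to be > 0.05 (you fail to reject the null of constant variance)
Non-constant variance (heteroscedasticity) biases the standard errors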
Explain uncorrelated disturbances
The value of the error for any observation is uncorrelated with the value of the error for any other observation
Correlated errors can arise from connected observations, causal effects, or serial correlation
They shrink standard errors because observations are assumed to be more independent than they are, which raises the danger of a type I error
Explain normal disturbance
The disturbances, e, are normally distributed
Only the disturbances not the variables must be normally distributed
Normality is the least important assumption