Regression Flashcards
(38 cards)
regression analysis
a statistical method for examining relationships among variables
linear regression
a statistical model that assumes a linear relationship between two variables
population linear regression model
describes the relationship that holds between Y and X in the population
X
the independent variable or regressor
Y
the dependent variable
Beta 0
the intercept: the value of the regression line when X = 0, i.e. the point at which the line crosses the Y axis
Beta 1
the slope of the regression line: the difference in Y associated with a one-unit change in X
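u_i
the regression error term: the difference between Y_i and the population regression line for observation i (its estimated counterpart in a fitted model is the residual)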
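The symbols on the cards above combine into a single equation. A sketch in standard notation; the observation index i running from 1 to n is an assumption not spelled out on the cards:

```latex
Y_i = \beta_0 + \beta_1 X_i + u_i, \qquad i = 1, \dots, n
```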
Prediction
using the observed values of a given variable to predict the value of another variable
causal inference
to determine whether and to what extent a cause-and-effect relationship exists between variables
causality
an action is said to cause an outcome if the outcome is the direct consequence of that action
treatment group
receives the treatment
control group
does not receive the treatment (serves as the counterfactual)
observational data
data collected by observing actual behavior outside an experimental setting, e.g. surveys, administrative records, financial reports
cross-sectional data
- data collected at a single point in time for different entities
- reflects a snapshot of variables at that point
- we can use this data to study differences across entities in a single time period
panel data
- data collected for multiple entities at multiple points in time
- captures the dynamics of change over time
- allows for the analysis of temporal effects across entities
time series
- data collected for a single entity at multiple time points
- allows for the analysis of temporal effects and forecasting
ordinary least squares (OLS)
it identifies the parameters that minimize the sum of the squared residuals
residual
the vertical distance between an observed value of Y and the regression line
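The OLS and residual cards can be illustrated with a small worked fit. This is a minimal sketch for one regressor, assuming the usual closed-form solution; the function name `ols_fit` and the data are made up for illustration.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sample covariance of X and Y divided by sample variance of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sxy / sxx
    # Intercept: forces the fitted line through the point of means
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Made-up illustrative data (hypothetical, not from the cards)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0_hat, beta1_hat = ols_fit(x, y)

# Residual: observed Y minus the value the fitted line predicts
residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)  # the quantity OLS minimizes
```

With an intercept in the model, the residuals sum to zero up to rounding, which is a quick sanity check on any OLS fit.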
the sign (±) on Beta 1 for an independent variable
the direction of its association with the dependent variable
Central limit theorem
when the sample is large and properly (randomly) drawn, the sample mean is approximately normally distributed around the true mean, whatever the distribution of the underlying data
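A small simulation can make this card concrete. The sketch below assumes draws from a Uniform(0, 1) distribution (true mean 0.5, variance 1/12); the sample size, number of repetitions, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50       # observations per sample (assumed "large enough")
reps = 2000  # number of repeated samples

# Each sample is drawn from Uniform(0, 1), which is not normal itself
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(reps)
]

center = statistics.fmean(sample_means)  # should sit near the true mean 0.5
spread = statistics.stdev(sample_means)  # should sit near sqrt((1/12) / n)

# Share of sample means within 1.96 theoretical standard deviations of 0.5;
# if the sample means are roughly normal this is close to 0.95
theory_sd = (1 / 12 / n) ** 0.5
share = sum(abs(m - 0.5) <= 1.96 * theory_sd for m in sample_means) / reps
```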
standard error
for a regression as a whole (the standard error of the regression), it represents the typical distance that the observed values fall from the regression line, in units of Y
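One common formula matching this description is the standard error of the regression. A minimal sketch, assuming one regressor plus an intercept (so n - 2 degrees of freedom); the function name and the residual values are hypothetical.

```python
import math


def standard_error_of_regression(residuals, n_params=2):
    """Square root of SSR / (n - n_params); n_params=2 assumes intercept + one slope."""
    n = len(residuals)
    ssr = sum(e ** 2 for e in residuals)
    return math.sqrt(ssr / (n - n_params))


# Made-up residuals from a hypothetical fitted line
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
ser = standard_error_of_regression(residuals)
```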
t-statistic
the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error
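The ratio on this card can be written out directly. A minimal sketch with made-up numbers; the 1.96 cutoff assumes a two-sided test at the 5% level with a large-sample normal approximation.

```python
def t_statistic(estimate, hypothesized, std_error):
    """(estimated value - hypothesized value) / standard error of the estimate."""
    return (estimate - hypothesized) / std_error


# Hypothetical slope estimate and standard error (not from the cards)
beta1_hat = 0.6
se_beta1 = 0.2

# Test the null hypothesis that the true slope is zero
t = t_statistic(beta1_hat, 0.0, se_beta1)
reject_at_5pct = abs(t) > 1.96  # large-sample normal critical value (assumed)
```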
95% confidence interval
an interval that is a function of the data that contains the true parameter value 95% of the time in repeated samples
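In large samples such an interval is often built as the estimate plus or minus 1.96 standard errors. A minimal sketch under that normal-approximation assumption; the function name and the numbers are hypothetical.

```python
def confidence_interval_95(estimate, std_error):
    """Large-sample 95% interval: estimate +/- 1.96 * standard error."""
    margin = 1.96 * std_error
    return estimate - margin, estimate + margin


# Hypothetical slope estimate and standard error (not from the cards)
lower, upper = confidence_interval_95(0.6, 0.2)

# A hypothesized value outside (lower, upper) would be rejected at the 5% level
zero_inside = lower <= 0.0 <= upper
```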