Descriptive Analysis and Linear Regression Flashcards

(43 cards)

1
Q

Linear Regression Model

A
Yi = B1 + B2X2i + ... + BkXki + ui
Yi = dependent variable
Xi = explanatory/independent variable (regressor)
B1 = intercept/constant (average value of Y when X = 0)
B2 = slope coefficient
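
Example - a minimal Python sketch (names and numbers invented) of data generated by the two-variable version of this model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.uniform(0, 10, n)   # explanatory variable / regressor
u = rng.normal(0, 1, n)     # stochastic error term ui
Y = 2.0 + 0.5 * X + u       # Yi = B1 + B2*Xi + ui with B1 = 2, B2 = 0.5
```
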
2
Q

ui

A

stochastic error term

average effect of all unobserved variables

3
Q

objective of regression analysis

A

estimate the values of the Bs based on sample data

4
Q

OLS

A

Ordinary Least Squares - used to estimate the regression coefficients
finds the estimates b1 and b2 of B1 and B2 that minimise the residual sum of squares (RSS)
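
A minimal sketch of the two-variable OLS formulas in Python (toy data invented for illustration):

```python
import numpy as np

X = np.array([1., 2., 3., 4., 5.])          # toy data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# slope and intercept that minimise the residual sum of squares
b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()
rss = np.sum((Y - (b1 + b2 * X)) ** 2)      # RSS at its minimum
```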

5
Q

OLS assumptions

A
  • LRM is linear in its parameters
  • regressors = fixed/non-stochastic
  • exogeneity - expected value of error term = 0 given values of X
  • homoscedasticity - constant variance of each u given values of X
  • no multicollinearity - no exact linear relationship between regressors
  • u follows normal distribution
6
Q

OLS estimators are BLUE

A

best linear unbiased estimators

  • estimators are linear functions of Y
  • on average they are = to the true parameter values
  • they have minimum variance i.e. efficient
7
Q

standard deviation of error term =

A

standard error of the regression

estimated variance = RSS/df, so standard error = √(RSS/df)
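
Worked example (hypothetical numbers): with RSS = 50, n = 27 and k = 2, df = n - k = 25, so the estimated variance is 50/25 = 2 and the standard error of the regression is √2 ≈ 1.41.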

8
Q

n-k

A

degrees of freedom
n = sample size
k = no. of estimated parameters (including the intercept)

9
Q

hypothesis testing

A

construct H0 and Ha, e.g. H0: B2 = 0 and Ha: B2 ≠ 0

t = b2/se(b2)

10
Q

if |t| > critical value from table

A

reject null
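
A sketch of the mechanics in Python (the estimate, standard error and sample size are invented):

```python
from scipy import stats

b2, se_b2 = 0.52, 0.11          # hypothetical estimate and its standard error
n, k = 100, 2
t_stat = b2 / se_b2             # t = b2/se(b2) under H0: B2 = 0
cv = stats.t.ppf(0.975, n - k)  # two-sided critical value at the 5% level
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), n - k))
reject_null = abs(t_stat) > cv  # True -> reject the null
```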

11
Q

type 1 error

A

incorrect rejection of true null

detecting an effect that is not present

12
Q

type 2 error

A

failure to reject false null

failing to detect an effect that is present

13
Q

low p-value

A

suggests that the estimated coefficient is statistically significant

14
Q

p-value < 0.01, 0.05, 0.1

A

statistically significant at 1%, 5%, 10% levels

15
Q

dummy variables

A
0 = absence
1 = presence
16
Q

e.g. 1 if female, 0 if male

A

B2 measures the change in average wage when going from male to female
b1 = estimated average wage for men
b2 = estimated difference between men and women
b1 + b2 = estimated average wage for women
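
A sketch with statsmodels (the wage numbers are invented):

```python
import numpy as np
import statsmodels.api as sm

female = np.array([0, 0, 0, 1, 1, 1])            # 1 if female, 0 if male
wage = np.array([20., 22., 21., 18., 19., 17.])  # invented wages
res = sm.OLS(wage, sm.add_constant(female)).fit()
b1, b2 = res.params   # b1: estimated wage for men; b1 + b2: for women
```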

17
Q

if exogeneity assumption doesn’t hold

A

leads to biased estimates, so we need to adjust for omitted variables

18
Q

quadratic terms

A

capture increasing/decreasing marginal effects

have to generate a new variable (the squared regressor) and add it to the regression

19
Q

marginal effect

A

first derivative of the regression function with respect to the variable of interest
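
A sketch combining the two cards above (data invented): generate the squared variable, add it to the regression, then read the marginal effect off the first derivative:

```python
import numpy as np
import statsmodels.api as sm

exper = np.array([1., 3., 5., 8., 12., 20.])    # invented experience data
wage = np.array([12., 15., 18., 20., 21., 20.])
X = sm.add_constant(np.column_stack([exper, exper ** 2]))  # new squared variable
res = sm.OLS(wage, X).fit()
b1, b2, b3 = res.params
marginal_effect = b2 + 2 * b3 * exper   # dY/dX for Y = B1 + B2*X + B3*X^2
```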

20
Q

interaction variable

A

constructed by multiplying two regressors

allows the magnitude of the effect X has on Y to vary depending on the level of another X
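
A sketch (variables invented) of constructing the interaction and reading off the varying effect:

```python
import numpy as np
import statsmodels.api as sm

educ = np.array([10., 12., 16., 10., 12., 16.])
female = np.array([0., 0., 0., 1., 1., 1.])
wage = np.array([15., 18., 24., 13., 15., 19.])  # invented
X = sm.add_constant(np.column_stack([educ, female, educ * female]))
res = sm.OLS(wage, X).fit()
b1, b2, b3, b4 = res.params
# effect of educ on wage is b2 + b4*female: it varies with the other regressor
```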

21
Q

interpreting

A

how does the regression function respond to a change in a variable

22
Q

if it is not linear (log-log)

A

transform to a log-log model so that it is linear in the parameters

take logs of both sides and add an error term

23
Q

log-lin model

A

dependent variable in logs – %
explanatory variables in levels – units
B2 measures the relative (%) change in the dependent variable (e.g. output Q) for an absolute change in the explanatory variable (e.g. an input)

24
Q

lin-log model

A

estimates the absolute change in the dependent variable for a % (relative) change in the explanatory variable
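
A sketch of the log-log case in Python (output/input numbers invented); the log-lin and lin-log variants just log one side:

```python
import numpy as np
import statsmodels.api as sm

Q = np.array([100., 150., 210., 320., 470.])  # invented output
L = np.array([10., 14., 20., 30., 45.])       # invented input
res = sm.OLS(np.log(Q), sm.add_constant(np.log(L))).fit()
b2 = res.params[1]   # elasticity: % change in Q for a 1% change in L
```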

25
Q

lin-lin model

A

dependent and explanatory variables both in levels (units), e.g. using a linear production function

26
Q

testing for linear combinations

A

compute the se -- t-stat -- compare to critical value -- compute p-value -- reject/don't reject null

27
Q

TSS

A

total sum of squares = ESS + RSS

sum of squared deviations from the sample mean = how well we could predict the outcome w/o any regressors

28
Q

ESS

A

explained sum of squares = how much of that variation do our regressors predict

29
Q

RSS

A

residual sum of squares = outcome variation that the regressors don't explain

30
Q

R^2

A

= ESS/TSS

overall measure of goodness of fit of the estimated regression line

how much of the variation is explained by the regressors; increases when you add more regressors

31
Q

F-stat

A

tests the joint significance of all coefficients

F = (ESS/(k-1)) / (RSS/(n-k))

if F > critical value, reject the null

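A sketch of the TSS = ESS + RSS decomposition, R^2 and the F-stat (toy data):

```python
import numpy as np

X = np.array([1., 2., 3., 4., 5.])        # toy data
Y = np.array([2.0, 2.8, 4.1, 4.9, 6.2])
b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X
tss = np.sum((Y - Y.mean()) ** 2)         # total variation
ess = np.sum((Y_hat - Y.mean()) ** 2)     # variation the regressor explains
rss = np.sum((Y - Y_hat) ** 2)            # unexplained variation
r_squared = ess / tss
n, k = len(Y), 2
f_stat = (ess / (k - 1)) / (rss / (n - k))
```
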
32
Q

dummy variable trap

A

a situation of perfect multicollinearity

to distinguish between m categories we can only have m-1 dummies

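A sketch with pandas (category names invented): m categories, m-1 dummies:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "west", "south", "north"]})
# drop_first=True keeps m-1 dummies, avoiding the dummy variable trap
dummies = pd.get_dummies(df["region"], drop_first=True)
```
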
33
Q

perfect collinearity

A

exact linear relationship between two or more regressors

one predictor variable can be used to perfectly predict another

34
Q

imperfect collinearity

A

one explanatory variable approximately equals a linear combination of the other explanatory variables (plus a small error term)

35
Q

consequences of multicollinearity in the data

A

larger standard errors -- smaller t-ratios -- wider CIs -- less likely to reject null

36
Q

homoscedasticity

A

assumption that the error term has the same variance for all observations (doesn't always hold)

37
Q

heteroscedasticity

A

error terms have unequal variances for different observations

38
Q

consequences of heteroscedasticity

A

  • OLS still consistent and unbiased
  • se either too large or too small, so t-stats, F-stats, p-values etc. will be wrong
  • OLS no longer efficient

39
Q

dealing with heteroscedasticity

A

  • use a log transformation
  • keep using OLS and compute heteroscedasticity-robust standard errors
  • weighted least squares

40
Q

using a logarithmic transformation of the outcome variable

A

e.g. ln(wage) - these variables tend to have more variance at higher values

41
Q

continuing to use OLS and computing heteroscedasticity-robust standard errors

A

regress y on x

corrects the se to allow for heteroscedasticity

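A sketch with simulated heteroscedastic data; HC1 is one common robust-covariance choice in statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, x)    # error variance grows with x
X = sm.add_constant(x)
res = sm.OLS(y, X).fit(cov_type="HC1")  # same coefficients, corrected se
```
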
42
Q

weighted least squares

A

more efficient than OLS in the presence of heteroscedasticity

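A sketch of WLS under the assumption that Var(u|x) is proportional to x^2; the weights are the inverse of that assumed variance:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, x)          # heteroscedastic errors
X = sm.add_constant(x)
res = sm.WLS(y, X, weights=1.0 / x**2).fit()  # downweights noisy observations
```
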
43
Q

omission of relevant variables

A

they'll be captured by the error term

if they are correlated with the ones included, the parameters are biased and the exogeneity assumption doesn't hold