Descriptive Analysis and Linear Regression Flashcards

1
Q

Linear Regression Model

A
Yi = B1 + B2X2i + … + BkXki + ui
Yi = dependent variable
Xi = explanatory/independent variable (regressor)
B1 = intercept/constant (average value of Y when X = 0)
B2 = slope coefficient
2
Q

ui

A

stochastic error term

average effect of all unobserved variables

3
Q

objective of regression analysis

A

estimate values of Bs based on sample data

4
Q

OLS

A

Ordinary Least Squares - used to estimate the regression coefficients
finds the estimates b1 and b2 of B1 and B2 that minimise the RSS
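A minimal sketch of this (assuming Python with numpy and statsmodels, since the deck names no software; data simulated):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)   # true B1 = 2, B2 = 0.5

# Closed-form OLS for the two-variable model: b2 = cov(x, y) / var(x)
b2 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b1 = y.mean() - b2 * x.mean()

# The same estimates from statsmodels, which also minimises the RSS
res = sm.OLS(y, sm.add_constant(x)).fit()
print(b1, b2)        # manual estimates
print(res.params)    # [b1, b2] from statsmodels
```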

5
Q

OLS assumptions

A
  • LRM is linear in its parameters
  • regressors = fixed/non-stochastic
  • exogeneity - expected value of error term = 0 given values of X
  • homoscedasticity - constant variance of each u given values of X
  • no multicollinearity - no exact linear relationship between regressors
  • u follows normal distribution
6
Q

OLS estimators are BLUE

A

best linear unbiased estimators

  • estimators are linear functions of Y
  • on average they are equal to the true parameter values
  • they have minimum variance i.e. efficient
7
Q

standard deviation of error term =

A

standard error of the regression

= √(RSS/df)
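A quick check of the formula on simulated data (again assuming numpy/statsmodels):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)   # true error sd = 1

res = sm.OLS(y, sm.add_constant(x)).fit()

# Standard error of the regression: sqrt(RSS / (n - k))
ser = np.sqrt(res.ssr / res.df_resid)
print(ser)                      # should be close to the true sd of 1
print(np.sqrt(res.mse_resid))   # statsmodels' equivalent quantity
```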

8
Q

n-k

A

degrees of freedom
n = sample size
k = no. of estimated parameters (including the intercept)

9
Q

hypothesis testing

A

construct H0 and Ha, e.g. H0: B2 = 0 and Ha: B2 ≠ 0

t = b2/se(b2)
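A sketch of the test on simulated data (numpy/statsmodels/scipy assumed):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

res = sm.OLS(y, sm.add_constant(x)).fit()

# t-statistic for H0: B2 = 0 against Ha: B2 != 0
b2, se_b2 = res.params[1], res.bse[1]
t_stat = b2 / se_b2
cv = stats.t.ppf(0.975, df=res.df_resid)   # 5% two-sided critical value
print(t_stat, cv, abs(t_stat) > cv)        # reject H0 if |t| > cv
print(res.pvalues[1])                      # the corresponding p-value
```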

10
Q

if |t| > critical value from table

A

reject null

11
Q

type 1 error

A

incorrect rejection of true null

detecting an effect that is not present

12
Q

type 2 error

A

failure to reject false null

failing to detect an effect that is present

13
Q

low p-value

A

suggests that the estimated coefficient is statistically significant

14
Q

p-value < 0.01, 0.05, 0.1

A

statistically significant at 1%, 5%, 10% levels

15
Q

dummy variables

A
0 = absence
1 = presence
16
Q

e.g. 1 if female, 0 if male

A

B2 measures the change in average wage when you go from male to female
b1 = estimated wage for men
b2 = estimated diff btw men and women
b1+b2 = estimated wage for women
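A sketch of this card with made-up wage data (the gap of -3 is purely illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
female = rng.integers(0, 2, 200)                  # 1 if female, 0 if male
wage = 20 - 3 * female + rng.normal(0, 2, 200)    # hypothetical wage gap of -3

res = sm.OLS(wage, sm.add_constant(female)).fit()
b1, b2 = res.params
print(b1)         # estimated average wage for men
print(b2)         # estimated male-female difference
print(b1 + b2)    # estimated average wage for women
```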

17
Q

if exogeneity assumption doesn’t hold

A

leads to biased estimates and therefore we need to adjust for omitted variables

18
Q

quadratic terms

A

capture increasing/decreasing marginal effects

have to generate a new variable and add it to regression
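A sketch of generating the squared term and reading off the (varying) marginal effect, which also previews the next card:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x - 0.1 * x**2 + rng.normal(0, 1, 200)

X = sm.add_constant(np.column_stack([x, x**2]))   # generate x^2, add it to the regression
res = sm.OLS(y, X).fit()
b1, b2, b3 = res.params

# Marginal effect of x: dY/dx = b2 + 2*b3*x, so it changes with x
print(b2 + 2 * b3 * x.mean())   # marginal effect evaluated at the mean of x
```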

19
Q

marginal effect

A

first derivative of the regression function with respect to the variable of interest

20
Q

interaction variable

A

constructed by multiplying two regressors

allows the magnitude of the effect X has on Y to vary depending on the level of another X
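A sketch with a hypothetical dummy d interacted with x:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 200)
d = rng.integers(0, 2, 200)   # another regressor, here a dummy
y = 1.0 + 0.5 * x + 1.0 * d + 0.3 * x * d + rng.normal(0, 1, 200)

X = sm.add_constant(np.column_stack([x, d, x * d]))   # interaction = x * d
res = sm.OLS(y, X).fit()
b1, b2, b3, b4 = res.params

# The effect of x on y depends on d: b2 when d = 0, b2 + b4 when d = 1
print(b2, b2 + b4)
```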

21
Q

interpreting coefficients

A

how does the regression function respond to a change in a variable

22
Q

if the function is not linear (e.g. a Cobb-Douglas production function)

A

transform to a log-log model so that it is linear in parameters

take logs of both sides and add an error term
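A sketch with a made-up Cobb-Douglas production function Q = A * L^B2:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
L = rng.uniform(1, 100, 200)
Q = 3.0 * L**0.7 * np.exp(rng.normal(0, 0.1, 200))   # Q = A * L^B2 with noise

# Taking logs makes it linear in parameters: ln Q = ln A + B2 ln L + u
res = sm.OLS(np.log(Q), sm.add_constant(np.log(L))).fit()
print(res.params)   # [ln A, B2]; B2 ~ 0.7, the elasticity of Q w.r.t. L
```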

23
Q

log-lin model

A

dependent variable in logs – %
explanatory variables in levels – units
B2 measures relative change in output Q for an absolute change in input

24
Q

lin-log model

A

estimates the absolute change in the dependent variable for a % change in the explanatory variable

25
Q

lin-lin model

A

both variables in levels (units) – e.g. using a linear production function

26
Q

testing for linear combinations

A

compute the se – form the t-stat – compare to the critical value – compute the p-value – reject/don’t reject the null

27
Q

TSS

A

total sum of squares = ESS + RSS

sum of squared deviations of Y from the sample mean = how well we could predict the outcome w/o any regressors

28
Q

ESS

A

explained sum of squares = how much of that variation do our regressors predict

29
Q

RSS

A

residual sum of squares = outcome variation that regressors don’t explain

30
Q

R^2

A

ESS/TSS
overall measure of goodness-of-fit of the estimated regression line
how much of variation is explained by regressors
increases whenever you add more regressors, even irrelevant ones

31
Q

F-stat

A

tests the joint significance of all coefficients
F = (ESS/(k-1)) / (RSS/(n-k))
if F > critical value, reject the null
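A sketch pulling the last few cards together (TSS/ESS/RSS, R^2, F) on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

res = sm.OLS(y, sm.add_constant(x)).fit()
n, k = 100, 2                                      # k counts the intercept here

tss = np.sum((y - y.mean()) ** 2)                  # total sum of squares
ess = np.sum((res.fittedvalues - y.mean()) ** 2)   # explained sum of squares
rss = np.sum(res.resid ** 2)                       # residual sum of squares

print(np.isclose(tss, ess + rss))                  # TSS = ESS + RSS
print(ess / tss, res.rsquared)                     # R^2 two ways
print((ess / (k - 1)) / (rss / (n - k)), res.fvalue)   # F-stat two ways
```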

32
Q

dummy variable trap

A

a situation of perfect multicollinearity

to distinguish btw m categories we can only have m-1 dummies

33
Q

perfect collinearity

A

perfect linear relationship between two or more regressors

one predictor variable can be used to predict another

34
Q

imperfect collinearity

A

one regressor is approximately equal to a linear combination of the other regressors plus a small error term

35
Q

consequences of multicollinearity in the data

A

larger standard errors – smaller t-ratio – wider CI – less likely to reject null

36
Q

homoscedasticity

A

assumption that the error term has the same variance for all observations (doesn’t always hold)

37
Q

heteroscedasticity

A

error terms have unequal variances for different observations

38
Q

consequences of heteroscedasticity

A
  • OLS still consistent and unbiased
  • se either too large or too small so t-stats, F-stats, p-values etc will be wrong
  • OLS no longer efficient
39
Q

dealing with heteroscedasticity

A
  • use log transformation
  • keep using OLS and compute heteroscedasticity-robust standard errors
  • weighted least squares
40
Q

using a logarithmic transformation of the outcome variable

A

e.g. ln(wage) - these variables tend to have more variance at higher values

41
Q

continuing to use OLS and computing heteroscedasticity-robust standard errors

A

regress y on x with robust standard errors

corrects the se to allow for heteroscedasticity
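A sketch, assuming statsmodels, where cov_type="HC1" requests heteroscedasticity-robust standard errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5 * x)   # error variance grows with x

X = sm.add_constant(x)
plain = sm.OLS(y, X).fit()
robust = sm.OLS(y, X).fit(cov_type="HC1")    # robust (White) standard errors

print(plain.bse)    # conventional se, misleading under heteroscedasticity
print(robust.bse)   # robust se; the coefficient estimates are identical
```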

42
Q

weighted least squares

A

more efficient than OLS in the presence of heteroscedasticity
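A sketch assuming the error variance is proportional to x^2 (in practice you would have to model or estimate this):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5 * x)   # Var(u|x) proportional to x^2

X = sm.add_constant(x)
# Weight each observation by the inverse of its error variance
res = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(res.params, res.bse)
```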

43
Q

omission of relevant variables

A

they’ll be captured by the error term

if they are correlated with the included regressors, the parameter estimates are biased and the exogeneity assumption doesn’t hold