Dick butt (week 1-7) Flashcards

Question

Zero conditional mean

Answer 1

The values of the explanatory variables must contain no information about the mean of the unobserved factors, so the regressors must be exogenous.

Answer 2

The value of the explanatory variables must contain no information about the variance of the unobserved factors.

Answer 3

A measurable function from a set of possible outcomes to a measurable space.

Answer 4

A contemporaneous relationship between y and z.

Answer 5

A model where the past changes can affect the future.

Answer 6

Suppose that z is equal to c in all time periods before time t. At time t, z increases by one unit to c + 1 and then reverts to its previous level at time t + 1.

Answer 7

The error is independent of the explanatory variables and is normally distributed with zero mean and variance sigma^2.

Answer 8

The error is independent of the explanatory variables and is normally distributed with zero mean and variance sigma^2.

Answer 9

A) The error is always centered on our prediction

Answer 10

It aims to find the best possible fit for the regression, meaning that the sum of squared residuals is as small as possible.

Answer 11

Sample estimates of u (the regression residuals) are found by taking each sample value of y, indexed by i (from 1 to n), and subtracting the fitted (predicted) value of y.

Answer 12

The sample covariance of x and y divided by the sample variance of x.

Answer 13

The average of the y values minus the estimated slope coefficient multiplied by the average of the x values.

Answer 14

Step 1: Define the fitted values for y and the residuals | Step 2: Choose the parameters to minimize the sum of squared residuals | Step 3: Take derivatives with respect to the parameters and set them equal to 0, giving the first order conditions | Step 4: Solve for the estimated constant (intercept) | Step 5: Solve for the estimated slope coefficient by substituting in the solution for the intercept
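
The steps above can be sketched numerically. This is a minimal illustration, assuming NumPy is available; the data values and the helper name `ols_simple` are made up for the example and are not part of the course material.

```python
import numpy as np

def ols_simple(x, y):
    """Closed-form OLS for y = b0 + b1*x + u, obtained from the first order conditions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sample covariance of x and y divided by sample variance of x
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: substitute the slope into the first order condition for b0
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = ols_simple(x, y)
residuals = y - (b0 + b1 * x)   # fitted values removed from y
```

At the solution the two first order conditions hold: the residuals sum to zero and are uncorrelated with x in the sample.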

Answer 15

1) Explains variable y in terms of variables x1 to xk 2) Incorporates more explanatory factors into the model 3) Explicitly holds fixed factors that would otherwise be in the disturbance term → makes the zero conditional mean assumption more likely to hold 4) Allows for more flexibility in analysis → can hold certain variables fixed to analyse the impact of one particular variable on y 5) In a simple regression model, the estimate is biased when the included factor also picks up the impact of a correlated factor that has been left out. In multiple linear regression, this is minimized.

Answer 16

If the dependent variable is log(y) and x is in levels (non-logarithmic), the interpretation of the regression coefficient changes → 100 times the coefficient is the approximate percentage change in y if x is increased by one unit

Answer 17

Now it is an elasticity → percentage change in y/percentage change in x
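
As a small sketch of the elasticity interpretation, the example below regresses log(y) on log(x) for data constructed so that the true elasticity is 0.5. The data-generating rule `y = 3 * x**0.5` is an assumption chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data generated exactly from y = 3 * x**0.5 (elasticity 0.5)
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3.0 * x ** 0.5

# Regress log(y) on log(x); polyfit returns [slope, intercept]
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
```

Here `slope` recovers the elasticity and `intercept` recovers log(3), because taking logs turns the power relationship into a linear one.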

Answer 18

1) To estimate different relationships 2) Introducing logarithms may provide a more accurate/relevant interpretation of the true relationship between the variables 3) Fits the data better

Answer 19

A) Sample averages of y and x’s lie on the regression line

Answer 20

1) Total Sum of Squares TSS 2) Explained Sum of Squares ESS 3) Residual Sum of Squares RSS

Answer 21

TSS = ESS + RSS (in Wooldridge's notation, SST = SSE + SSR) | Total variation = explained part + unexplained part
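
The decomposition can be checked on a small made-up sample. This is a sketch assuming NumPy; the variable names `tss`, `ess` and `rss` stand for the total, explained and residual sums of squares, and the data are hypothetical.

```python
import numpy as np

# Hypothetical sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.3, 2.9, 4.8, 5.1, 6.4])

b1, b0 = np.polyfit(x, y, 1)            # OLS slope and intercept
y_hat = b0 + b1 * x                     # fitted values

tss = np.sum((y - y.mean()) ** 2)       # total variation
ess = np.sum((y_hat - y.mean()) ** 2)   # explained part
rss = np.sum((y - y_hat) ** 2)          # unexplained part
r_squared = ess / tss                   # share of variation explained
```

With an intercept in the model, `tss` equals `ess + rss` up to rounding, and `r_squared` equals the squared correlation between `y` and `y_hat`.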

Answer 22

D) All of the above

Answer 23

False. R^2 is equal to the squared correlation coefficient between the actual and fitted values of the DEPENDENT variable

Answer 24

The estimator acts as a rule that assigns each possible outcome of the sample a value of Theta, whereas the estimate is a numerical value taken on by an estimator in a particular sample of data

Answer 25

Unbiased estimator

Answer 26

Assumption SLR.1 → Linear in parameters | Assumption SLR.2 → Random sampling | Assumption SLR.3 → Sample variation in the explanatory variable; the values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of x lead to different values of y) | Assumption SLR.4 → Zero conditional mean | Assumption SLR.5 → Homoskedasticity; the value of the explanatory variable must contain no information about the variability of the unobserved factors

Answer 27

The values that characterise the true relationship between y and x in the population

Answer 28

This is measured by the variances of the estimators of constant and slope coefficients

Answer 29

With larger variability, standard errors increase, which reduces the precision of the inferences we can make from the data.

Answer 30

1. Larger for a higher variability in unobserved influences | 2. Smaller for a higher variability in explanatory variables

Answer 31

They are estimated standard deviations of the regression coefficients. They measure how precisely the regression coefficients are estimated.
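
A minimal sketch of how these standard errors are computed in simple regression under homoskedasticity, assuming NumPy. The data are hypothetical and the names (`sigma2_hat`, `sst_x`, `se_b1`) are chosen for this example only.

```python
import numpy as np

# Hypothetical sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.3, 2.9, 4.8, 5.1, 6.4])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# Estimate of the error variance: SSR / (n - 2) in simple regression
sigma2_hat = np.sum(residuals ** 2) / (n - 2)
sst_x = np.sum((x - x.mean()) ** 2)      # total sample variation in x

se_b1 = np.sqrt(sigma2_hat / sst_x)                      # standard error of the slope
se_b0 = np.sqrt(sigma2_hat * np.mean(x ** 2) / sst_x)    # standard error of the intercept
t_b1 = b1 / se_b1                                        # t statistic for H0: slope = 0
```

The formulas show the two drivers of precision: a larger error variance raises the standard errors, while more variation in x lowers them.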

Answer 32

We mean that the procedure by which the OLS estimates are obtained is unbiased when that procedure is applied across all possible random samples.

Answer 33

Because they are collinear with the intercept

Answer 34

Explanatory variables that are correlated with the error term

Answer 35

MLR.4 (Zero Conditional Mean), because it requires the expected value of u, given any values of the explanatory variables, to be zero, so every explanatory variable must be uncorrelated with u, whereas endogenous variables are variables that are correlated with the error term u.

Answer 36

Explanatory variables that are uncorrelated with the error term are called exogenous

Answer 37

A random variable is a variable that takes on numerical values that are determined probabilistically - Since a random variable is a collection of possibilities, only a realization of a random variable is observed - Uppercase letters denote RVs and lowercase letters denote their realizations

Answer 38

A continuous random variable is one that takes on any particular real value with zero probability, because it can take any value within a range

Answer 39

The sum of the pdf values f(xi) over all values of xi such that xi ≤ x

Answer 40

A conclusion reached on the basis of evidence and reasoning

Answer 41

To obtain information about a population from information contained in a sample

Answer 42

Each individual is drawn from the population at random, so the observations are independently and identically distributed

Answer 43

Divide the population into non-overlapping groups, then take a simple random sample from each group

Answer 44

population is divided into groups, some groups are randomly selected, then all individuals within the group are measured

Answer 45

population is divided into groups, some groups are randomly selected, then all individuals within the group are measured

Answer 46

the t-stat measures how many estimated SDs the estimated coefficient is away from zero

Answer 47

1. If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance 2. The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!

Answer 48

1. If a variable is statistically and economically important but has the "wrong" sign, the regression model might be misspecified ... OR this might be the truth! 2. If the sample size is small, effects might be imprecisely estimated so that things which are economically important may still be statistically insignificant.

Answer 49

The p-value is the smallest significance level at which the null hypothesis would be rejected → an alternative to the classical approach to hypothesis testing

Answer 50

When testing multiple linear restrictions using F-tests, it is possible that a group of variables have no effect on the dependent variable

Answer 51

We can test how the model would fit if these variables were dropped from the regression (testing for exclusion of multiple variables) → restricted model. Check the RSS → it cannot fall when variables are dropped, so test whether the increase is statistically significant
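
The comparison of restricted and unrestricted fits can be sketched as an F statistic. This is an illustration assuming NumPy, with simulated data in which `x2` and `x3` are irrelevant by construction; the helper `ssr` and all variable names are hypothetical.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)       # x2 and x3 have no effect (assumed setup)

const = np.ones(n)
X_ur = np.column_stack([const, x1, x2, x3])   # unrestricted model
X_r = np.column_stack([const, x1])            # restricted model: x2 and x3 dropped

q = 2                          # number of exclusion restrictions
df_ur = n - X_ur.shape[1]      # n - k - 1 in the unrestricted model
ssr_ur, ssr_r = ssr(X_ur, y), ssr(X_r, y)

# F statistic: relative increase in SSR when moving to the restricted model
f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
```

The statistic would then be compared with a critical value from the F distribution with (q, n - k - 1) degrees of freedom, which is not computed here to keep the sketch dependency-free.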

Answer 52

1. Expected values/unbiasedness under MLR.1 – MLR.4 | 2. Variance formulas under MLR.1 – MLR.5 | 3. Gauss-Markov Theorem under MLR.1 – MLR.5 | 4. Exact sampling distributions/tests under MLR.1 – MLR.6

Answer 53

1. Consistency under MLR.1 – MLR.4 | 2. Asymptotic normality/tests under MLR.1 – MLR.5 (without assuming normality of the error term)

Answer 54

An estimator is consistent if the estimate converges in probability to the true population value

Answer 55

Consistency means that the probability that the estimate is arbitrarily close to the true population value can be made arbitrarily high by increasing the sample size. - Consistency is a minimum requirement for sensible estimators - As the sample size grows large, it is more and more unlikely for an estimator to be far away from the true value. - With a larger sample size, we have more information and the estimator should get closer and closer (in a probability sense) to its true value.
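
A small simulation can make the idea concrete. The sketch below, assuming NumPy, uses the sample mean as the estimator and a normal population with mean 5; the distribution, sample sizes and the helper name `spread_of_sample_mean` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = 5.0   # hypothetical population value

def spread_of_sample_mean(n, reps=200):
    """Standard deviation of the sample mean across `reps` simulated samples of size n."""
    samples = rng.normal(loc=true_mean, scale=2.0, size=(reps, n))
    return samples.mean(axis=1).std()

spread_small = spread_of_sample_mean(10)       # estimates scatter widely
spread_large = spread_of_sample_mean(10_000)   # estimates cluster near true_mean
```

As the sample size grows, the estimates concentrate around the true value, so `spread_large` is far smaller than `spread_small`.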

Answer 56

Discrete = takes on two values (e.g. yes/no; male/female)

Answer 57

Categorical = takes on a limited number of values (e.g. states)

Answer 58

Dummy (or Indicator) Variables are qualitative measures indicating the presence or absence of an attribute or category.

Answer 59

1. The inclusion of a dummy variable allows us to estimate separate intercepts, but the same slope, for different groups 2. The intercept depends on whether d=0 or d=1: 𝛽0 is the intercept for the category assigned to 0 (the base category) and (𝛽0 + 𝛿0) is the intercept for the category assigned to 1 3. The dummy variable coefficient 𝛿0 measures the difference in the intercept between the two groups

Answer 60

If the model has a constant term, this will lead to perfect collinearity between the explanatory variables; the constant term is an x variable which takes the value 1 for all observations. This leads to perfect collinearity, and the model cannot be estimated.

Answer 61

Omit one category: for gender, which has 2 categories, either male or female must be omitted. If we have a categorical variable with m categories, include (m − 1) dummy variables to avoid the dummy variable trap (DVT), e.g. for states, which has 6 categories, only include 5 dummy variables
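
The collinearity problem can be seen directly from the rank of the regressor matrix. This is a sketch assuming NumPy, with a hypothetical categorical variable that has m = 3 categories.

```python
import numpy as np

# Hypothetical categorical variable with m = 3 categories, coded 0, 1, 2
category = np.array([0, 1, 2, 0, 1, 2, 0, 1])
dummies = (category[:, None] == np.arange(3)).astype(float)   # one column per category
const = np.ones((len(category), 1))

X_trap = np.hstack([const, dummies])         # constant + all m dummies
X_ok = np.hstack([const, dummies[:, 1:]])    # constant + (m - 1) dummies; category 0 is the base

rank_trap = np.linalg.matrix_rank(X_trap)    # 4 columns, but they are linearly dependent
rank_ok = np.linalg.matrix_rank(X_ok)        # full column rank
```

In `X_trap` the dummy columns sum to the constant column, so the matrix has fewer independent columns than parameters and OLS has no unique solution. Dropping one dummy restores full column rank.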

Answer 62

Option 1: 1. Create unrestricted and restricted models 2. Null hypothesis → all interaction effects are zero, that is, the same regression coefficients apply to men and women 3. To test the restrictions as a group, use an F-test | Option 2: 1. Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSRs of these two regressions 2. Run the regression for the restricted model and store its SSR 3. If the test is computed in this way it is called the Chow test | Important: the test assumes a constant error variance across groups

Answer 63

Linear regression when the dependent variable is binary

Answer 64

1. Predicted probabilities may be larger than one or smaller than zero. 2. Marginal probability effects are sometimes logically impossible. 3. The linear probability model is necessarily heteroskedastic so heteroskedasticity consistent standard errors need to be computed.
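
The first and third points can be illustrated with a tiny made-up dataset. This is a sketch assuming NumPy; the data are constructed so that the outcome switches from 0 to 1 at x = 5.

```python
import numpy as np

# Hypothetical data: binary outcome is 0 for x < 5 and 1 otherwise
x = np.arange(10, dtype=float)
y = (x >= 5).astype(float)

slope, intercept = np.polyfit(x, y, 1)   # linear probability model estimated by OLS
p_hat = intercept + slope * x            # fitted "probabilities"

# In the LPM, Var(y | x) = p(x) * (1 - p(x)), so the error variance changes with x
implied_var = p_hat * (1 - p_hat)
```

The fitted value at x = 0 is below zero and the fitted value at x = 9 is above one, and `implied_var` is not constant across observations, which is the source of the heteroskedasticity.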

Answer 65

1. Easy estimation and interpretation | 2. Estimated effects and predictions are often reasonably good in practice.

Answer 66

1. Choice of independent variables - Over-specification & sampling variance effects - Omitted variables & Endogeneity 2. Heteroskedasticity 3. Functional Form 4. Measurement Error 5. Missing Data, Non-random Sampling & Outliers

Answer 67

Even though the model satisfies MLR.1-MLR.4: 1. x3 may be correlated with x1 and x2 2. x3 has no effect on y after we control for x1 and x2 | Inclusion of x3 has no cost in terms of bias in the estimates of any of the parameters, because E(b3hat) = b3 = 0 | However, including irrelevant variables may increase the sampling variance

Answer 68

Omitting a relevant variable causes bias when the omitted variable is correlated with any of the other explanatory variables in the model. The estimators are also inconsistent.
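
The size of the bias can be sketched with simulated data. The example assumes NumPy and a made-up population model in which `x2` is correlated with `x1`; it checks the in-sample identity that the short-regression slope equals the long-regression slope on `x1` plus the coefficient on `x2` times the slope from regressing `x2` on `x1`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                    # x2 is correlated with x1 (assumed)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)    # hypothetical true model

const = np.ones(n)
# Long regression: y on x1 and x2
b_long, *_ = np.linalg.lstsq(np.column_stack([const, x1, x2]), y, rcond=None)
# Short regression: x2 omitted
b_short, *_ = np.linalg.lstsq(np.column_stack([const, x1]), y, rcond=None)
# Auxiliary regression of the omitted x2 on x1
delta, *_ = np.linalg.lstsq(np.column_stack([const, x1]), x2, rcond=None)

implied_short_slope = b_long[1] + b_long[2] * delta[1]
```

Because both the effect of `x2` on `y` and the correlation between `x1` and `x2` are positive in this setup, the short regression overstates the effect of `x1`.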

Answer 69

1. Omitted variables: in many cases important characteristics cannot be observed AND these are often correlated with observed explanatory information 2. Measurement error: variables are measured with error 3. Simultaneity: two or more variables are simultaneously determined (X causes Y but Y also causes X, so X is jointly determined with Y), e.g. quantity and price by demand and supply; investment and productivity; sales and advertising

Answer 70

The OLS estimator is biased and inconsistent

Answer 71

- Proxy variables method for omitted regressors (W 9.2) - IV is the most well-known method to address endogeneity problems - Fixed effects methods if 1) panel data is available, 2) endogeneity is time-constant, and 3) regressors are not time-constant - Random effects methods 1) again need panel data; 2) requires stronger assumptions

Answer 72

These are variables that are used instead of the variable of interest when that variable of interest cannot be measured directly.

Answer 73

1. We hope there is at least an imperfect linear relationship between the proxy and the unobserved variable 2. The error is uncorrelated with all the explanatory variables (𝑥1, 𝑥2 and 𝑥3∗) AND uncorrelated with the proxy 𝑥3 a) ZCM assumption for all variables used in the model b) In other words, the proxy is "just a proxy" for the omitted variable: it does not belong in the population regression and it is uncorrelated with the population regression error 3. The proxy variable is a "good" proxy for the omitted variable a) It is correlated with the omitted variable b) Using other variables in addition will not help to predict the omitted variable

Answer 74

The regression model suffers from functional form misspecification when it does not properly account for the relationship between the dependent variable and the (observed) explanatory variables E.g. a key variable has been omitted & that variable is a function of the other variable(s) in the model

Answer 75

Do the RESET test

Answer 76

- It does not provide direction on how to proceed if a model is rejected. Just tells you the current one is misspecified. - You might use it to decide between two possible models – but the test may not provide a clear winner – could accept both or reject both…

Answer 77

1. Nested tests → use F-tests for exclusion restrictions | 2. Non-nested tests → where the alternative model has different explanatory variables

Answer 78

1. Obtain the fitted values from the alternative model, and include as one of the explanatory variables in the null model. 2. If the null model is correct, the coefficient on the fitted value from the other model should be insignificant; if not, reject the null model.

Answer 79

Sometimes we have the variable we want, but it may be measured with error.

Answer 80

1. If e0 is uncorrelated with each xj (and each xj is uncorrelated with u), OLS is unbiased and consistent 2. While unbiased, variances are larger than with no measurement error 3. New composite error: u + e0 • If E(e0) ≠ 0 then b0 will be biased, which is not particularly worrying

Answer 81

1. Under CEV assumption, OLS is biased and inconsistent because the mismeasured variable is endogenous 2. The effect of the mismeasured variable suffers from attenuation bias, i.e. the magnitude of the effect will be attenuated towards zero 3. In addition, if it is multivariate regression, the effects of the other explanatory variables will be biased and inconsistent due to the measurement error in 𝑥1∗, unless they are uncorrelated with 𝑥1
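
Attenuation bias can be illustrated by simulation. This is a sketch assuming NumPy and a made-up setup in which the true regressor and the measurement error both have variance 1, so under CEV the slope is expected to shrink by Var(x*) / (Var(x*) + Var(e)) = 0.5.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
x_star = rng.normal(size=n)                               # true regressor, variance 1
y = 1.0 + 1.0 * x_star + rng.normal(scale=0.5, size=n)    # hypothetical model, true slope 1
x_obs = x_star + rng.normal(size=n)                       # classical measurement error, variance 1

slope_true, _ = np.polyfit(x_star, y, 1)   # regression on the correctly measured variable
slope_obs, _ = np.polyfit(x_obs, y, 1)     # regression on the mismeasured variable
```

The slope from the mismeasured regressor is pulled towards zero, to roughly half the true value in this setup.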

Answer 82

Missing data is a special case of sample selection, such as non random sampling. If the data is missing at random – it is just as though you have a smaller sample: Your findings will just be less precisely estimated. If the data is missing in a non-random way, then we violate our Random Sampling assumption.

Answer 83

1. Extreme values and outliers may be a particular problem for OLS because the method is based on squaring deviations 2. If outliers are the result of mistakes that occurred when keying in the data, one should just discard the affected observations 3. If outliers are the result of the data generating process, the decision whether to discard the outliers is not so easy

Answer 84

The least absolute deviations estimator minimizes the sum of absolute deviations (instead of the sum of squared deviations, i.e. OLS)

Answer 85

1. It may be more robust to outliers as deviations are not squared 2. The LAD estimator estimates the parameters of the conditional median (instead of the conditional mean with OLS) 3. Is a special case of quantile regression, which estimates parameters of conditional quantiles
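
The difference between the two criteria shows up even in an intercept-only model. The sketch below, assuming NumPy, uses a made-up sample with one outlier and a simple grid search instead of a dedicated LAD routine, which is an illustrative shortcut rather than how LAD is normally estimated.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # hypothetical sample; 100 is an outlier

def sum_abs_dev(c):
    """LAD criterion for a constant-only model."""
    return np.sum(np.abs(y - c))

def sum_sq_dev(c):
    """OLS criterion for a constant-only model."""
    return np.sum((y - c) ** 2)

# Search a grid of candidate constants for each criterion
grid = np.linspace(0.0, 100.0, 10_001)
lad_fit = grid[np.argmin([sum_abs_dev(c) for c in grid])]
ols_fit = grid[np.argmin([sum_sq_dev(c) for c in grid])]
```

The LAD fit lands on the sample median and is barely affected by the outlier, while the OLS fit lands on the sample mean and is pulled strongly towards it.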

Answer 86

Use LAD when you have outliers but still want to keep all observations → the OLS estimates are sensitive to outliers, whereas LAD is more robust to them

Answer 87

1. More computationally intensive than OLS 2. All statistical inference involving the LAD estimators is justified only as the sample size grows. 3. It does not always consistently estimate the parameters appearing in the conditional mean function 4. LAD is intended to estimate the effects on the conditional median. Generally, the mean and median are the same only when the distribution of y given the covariates x1, ..., xk is symmetric about b0 + b1x1 + ... + bkxk. (Equivalently, the population error term, u, is symmetric about zero.) 5. When LAD and OLS are applied to cases with asymmetric distributions, the estimated partial effect of, say, x1, obtained from LAD can be very different from the partial effect obtained from OLS.


(129 cards)