Regression Flashcards
(46 cards)
What can we say regression is all about?
Trying to explain movement in one variable as a result of movement in other variables
what is the difference between the explanatory variables X’s, and the explained variable Y?
Y is assumed to be stochastic, following a specific probability distribution.
The explanatory variables are assumed to be fixed in repeated samples, i.e. non-stochastic.
why do we assume X to be fixed in repeated samples?
To remove randomness from this part of the model. All randomness is then allocated to the stochastic Y variable.
Given a one-variable case where some financial theory suggests that increases in some X variable will lead to changes in Y, what is the first sensible thing to do?
Plot a scatter plot to see if the pattern is linear
what do we need to know about the equation
y = a x + b
It is an exact function. It is not realistic to expect real applications where the relationship is that exact.
how do we include an error term in the exact line y = a x + b?
We rewrite it per sample point:
y_t = a x_t + b + u_t
Or
y_t = a + b x_t + u_t
is more conventional.
u_t captures the difference between the exact line and the specific data point.
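The data-generating process above can be sketched with a small simulation; the "true" parameter values and the seed below are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
alpha, beta, sigma = 1.0, 2.0, 0.5      # hypothetical "true" parameters

x = rng.uniform(0, 10, size=100)        # regressors, treated as fixed in repeated samples
u = rng.normal(0, sigma, size=100)      # random disturbance u_t
y = alpha + beta * x + u                # y_t = alpha + beta * x_t + u_t
```

Each observed y_t sits a random vertical distance u_t away from the exact line.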
elaborate on reasons to include random disturbance term
1) The number of actual determinants (number of explanatory variables) is usually too large to be quantified perfectly, typically due to unobservability etc.
2) Some errors cannot be modeled
how do we determine the correct line (linear regression model)?
Minimizing the sum of vertical distances between each point and the line.
Measuring the distances vertically (rather than horizontally) is correct because of the assumption of non-stochastic explanatory variables: all randomness sits in Y.
most common method to generate the model line?
OLS
Two other methods than OLS that can be used to find a model for linear regression
Method of moments
Maximum likelihood
how do we find the residual?
The difference between the actual value y_t and the predicted value for y_t
what is the role of the residual in linear regression?
In OLS, we minimize the sum of the squared residuals.
broadly speaking, how do we find the functions for the OLS parameters in the single-variable case?
Build the loss function, differentiate it with respect to the parameters, equate to 0, and solve for the parameters. This works if the loss function is convex. If convex, doing this will minimize the loss function and provide a set of parameters that give the smallest residual sum of squares.
L = ∑ u_t² = ∑ (y_t − ŷ_t)² = ∑ (y_t − α − β x_t)²
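Solving the first-order conditions of this loss function gives the standard closed-form estimators; a minimal sketch in NumPy (the function name `ols_fit` is my own):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS for y_t = alpha + beta * x_t + u_t.

    beta_hat  = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    alpha_hat = y_bar - beta_hat * x_bar
    """
    x_bar, y_bar = x.mean(), y.mean()
    beta_hat = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat
```

Note that the formula for alpha_hat forces the fitted line through the sample means.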
With OLS, what can we say about the MEAN of the data points?
The fitted OLS line passes through the mean point (x̄, ȳ)
what do we need to understand about the intercept term?
It is determined by what our data set looks like. Typically, our data points will not cover the entire possible domain, as this is infeasible. Therefore, we will usually have edges in the data set, and if we try to go beyond these edges on either side, the results are essentially unknown, since the model has not been fitted on those regions.
As a result, it is useful to also include some numbers on the “valid range” of our model.
What is the difference between SRF and PRF?
The sample regression function does not include the error term, but the population regression function does.
The SRF is the estimated line, while PRF is the “true” function.
Is this linear?
It is linear in the parameters, but not linear in the variables. It can be converted to linear form, and then OLS can be used to fit the line.
If theory suggests that Y is inversely related to X, can we use OLS?
Yes. We define new variable z = 1 / x, and use this to fit. Then we just need to transform observed values of X into Z before using the model to predict.
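A minimal sketch of this transformation, reusing the closed-form OLS formulas (the names `fit_inverse` and `predict_inverse` are my own):

```python
import numpy as np

def fit_inverse(x, y):
    """Fit y_t = alpha + beta * (1 / x_t) + u_t by defining z = 1/x
    and running ordinary OLS of y on z."""
    z = 1.0 / x
    z_bar, y_bar = z.mean(), y.mean()
    beta_hat = ((z - z_bar) * (y - y_bar)).sum() / ((z - z_bar) ** 2).sum()
    alpha_hat = y_bar - beta_hat * z_bar
    return alpha_hat, beta_hat

def predict_inverse(alpha_hat, beta_hat, x_new):
    # transform observed X into Z = 1/X before using the fitted line
    return alpha_hat + beta_hat / x_new
```

The model is still linear in the parameters (alpha, beta), which is all OLS requires.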
elaborate on why this is
If we need to estimate gamma, and the estimated value for gamma changes either through time or through changes in the explanatory variables, then the linear model will not be accurate. However, if we estimate gamma in a way that makes it behave as a constant, then we can use it with OLS.
do we need to be specific about how the residuals are computed?
Yes, because y_t depends on the disturbance term
The set of assumptions related to the classical linear regression model, do they apply to the predicted model or the unobserved values?
Unobserved. The assumptions apply to the unobserved disturbances u_t; no assumptions are made about the residuals obtained from the fitted model.
elaborate on the CLRM assumptions
1) E(u_t) = 0
2) var(u_t) = sigma^2 < infinity
3) cov(u_i, u_j) = 0 for i ≠ j
4) cov(x_i, u_i) = 0
5) u_t ~ N(0, σ²), iid.
elaborate on the assumption of normality
Necessary for hypothesis testing
what can we say about the properties of the OLS parameters if the assumptions hold?
if 1-4 hold, we have BLUE.
Best Linear Unbiased Estimator
Best: the estimators have minimum variance
Linear: the estimators are linear functions of the data
Unbiased: on average, the estimators will be equal to the true population parameters
Estimator: they are estimates of the true population parameters