Regression Analysis, Time Series Analysis Flashcards
(33 cards)
How is the correlation coefficient notated?
- If measured for a population, it is called ρ (rho)
- If estimated from a sample, we use r; i.e. r estimates ρ
What is true of the correlation coefficient?
- Correlation and covariance are appropriate for use with continuous variables whose distributions have the same shape (e.g. both normally distributed)
- If these assumptions are not met, r will be ‘deflated’ and underestimates ρ.
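A minimal Python sketch of estimating ρ with the sample correlation r; the data values are made up for illustration:

```python
# Sample correlation r: covariance divided by the product of the
# standard deviations, all computed from the sample.
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
r = cov / (sx * sy)  # r estimates the population correlation rho
print(r)
```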
What are the data in a regression analysis?
- One continuous response variable (called y - dependent variable, response variable)
- One or more continuous explanatory variables (called x - independent variable, explanatory variable, predictor variable, regressor variable)
What is true of εi?
Mean of εi will be zero
What does E(Y|X=x) = β0 + β1x mean?
The expected value of Y when X = x is β0 + β1x
What is the true/population regression line?
- Yi = β0 + β1xi + εi
- β0 and β1 are constants to be estimated
- εi is a random variable with mean = 0 if our line is going through the middle of our data
How is the population regression line estimated?
- ŷ = b0 + b1x
- b0 and b1 are estimated values
- ŷ is a fitted value of the response
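A sketch of estimating b0 and b1 by least squares, with made-up data generated exactly from y = 1 + 2x so the fit recovers b0 = 1 and b1 = 2:

```python
# Least squares estimates for the simple regression line y-hat = b0 + b1*x:
#   b1 = Sxy / Sxx,  b0 = ybar - b1 * xbar
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # made up: exactly 1 + 2x

n = len(x)
mx = sum(x) / n
my = sum(y) / n

sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sxy / sxx
b0 = my - b1 * mx

y_hat = [b0 + b1 * xi for xi in x]  # fitted values of the response
print(b0, b1)
```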
What is a residual?
vertical distance between observed response and fitted value of response
How are residuals estimated?
ri estimates εi, the error variable
What is SSE?
The error sum of squares
SSE = Σ(yi - ŷi)², summed from i = 1 to n
What error assumptions do we make in regression analysis?
- In our fitting we assume the errors have a particular distribution - that is, ε ~ N(0, σε2)
- Normal distribution
- Mean = 0
- Constant variance = σε2
- Errors associated with any two y values are independent
What is sε?
- sε = standard error of the estimate
- Interpretation - standard deviation of residuals; standard error in predicting Y from the regression equation - best definition: standard deviation around prediction line
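The SSE and sε cards can be illustrated together; a Python sketch with made-up data, fitting the line by least squares as above:

```python
# SSE = sum of squared residuals; s_e = sqrt(SSE / (n - 2)) is the
# standard error of the estimate (standard deviation around the line).
import math

x = [1, 2, 3, 4]
y = [2, 3, 5, 6]  # made-up data
n = len(x)
mx, my = sum(x) / n, sum(y) / n

b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

# Residuals r_i = y_i - yhat_i estimate the errors eps_i
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in resid)
s_e = math.sqrt(sse / (n - 2))
print(sse, s_e)
```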
What are the t stats in regression analysis output?
T = test statistic for testing that the population intercept/slope equals 0 against a two-sided alternative; it is compared to a t distribution with n - 2 degrees of freedom. A p-value reported as P = 0.000 means the intercept/slope is significantly different from 0
What is S in regression analysis output?
Standard Error of the Regression (S) = the average distance that observed values fall from the regression line
What is R^2?
- Determines the strength of the association
- coefficient of determination
- measures the proportion of total variation explained, i.e.
- = explained variation / total variation = SSreg / SSy = (correlation coefficient)²
- Will be between 0 and 1; a value close to 1 indicates most of the variation in y is explained by the regression equation
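A sketch showing R² computed both ways - as explained/total variation and as the squared correlation coefficient - using made-up data:

```python
# R^2 = 1 - SSE/SSy (explained / total variation), and also r^2.
import math

x = [1, 2, 3, 4]
y = [2, 3, 5, 6]  # made-up data
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
ssy = sum((yi - my) ** 2 for yi in y)  # total variation in y

b1 = sxy / sxx
b0 = my - b1 * mx
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r2 = 1 - sse / ssy            # proportion of variation explained
r = sxy / math.sqrt(sxx * ssy)  # correlation coefficient
print(r2, r ** 2)             # the two routes agree
```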
What is important about R?
r = ±√R², where the sign of r matches the sign of the slope
What is Homoscedasticity?
If variation is constant (residuals show constant spread around zero), called homoscedastic
What is Heteroscedasticity?
If variation is non-constant (residuals show varying spread around zero), called heteroscedastic
What is true about Large Standardised Residuals?
Minitab flags “Large Standardised Residuals” with an R - about 5% of observations should be flagged if the residuals are normally distributed
What must be true to make predictions from a regression analysis?
- High R-sq, small std error of estimate
- All assumptions appear valid
- Predictions should only be made for values inside the observed limits
What does β1 represent in a multiple regression with 2 predictors?
β1 represents the expected change in Y when X1 is increased by one unit, but X2 is held constant or otherwise controlled
What is meant by additive effects of multiple regression?
Combined effects of X1 and X2 are additive - if both X1 and X2 are increased by one unit, expected change in Y would be ( β1 + β2 )
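A sketch of fitting a two-predictor multiple regression via the normal equations, with made-up data generated exactly as y = 1 + 2x1 + 3x2, so the additive effect b1 + b2 comes out as 5:

```python
# Multiple regression by solving the normal equations (X'X)b = X'y.
x1 = [0, 1, 2, 0, 1, 2]
x2 = [0, 0, 0, 1, 1, 1]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # made-up, no noise

# Design matrix rows: [intercept, x1, x2]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]

def solve3(A, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination."""
    A = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

b0, b1, b2 = solve3(XtX, Xty)
# Increasing both x1 and x2 by one unit changes the expected y by b1 + b2
print(b0, b1, b2, b1 + b2)
```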
What must be true for us to find a Least Squares solution for a multiple regression?
- Number of predictors is less than number of observations
- None of the independent variables are perfectly correlated with each other
What is true of the coefficient of multiple determination?
- Will go up as we add more explanatory terms to the model whether they are important or not
- Often we use adjusted R-sq - it compensates for adding more variables, so it is lower than R-Sq when the added variables are not “important”
- So, if comparing models with differing numbers of predictors, use Adjusted R-Sq to compare how much variation in response is explained by model
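A sketch of the adjusted R-sq formula, adj R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors; the example numbers are hypothetical:

```python
# Adjusted R-sq penalises extra predictors, unlike plain R-sq.
def adj_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a second, unimportant predictor bumps R-sq only slightly
# (0.80 -> 0.81), yet adjusted R-sq goes DOWN:
one_pred = adj_r2(0.80, 20, 1)
two_pred = adj_r2(0.81, 20, 2)
print(one_pred, two_pred)
```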