unit 3 - ch 13 - simple linear regression (slr) Flashcards
Underlying all of SLR is
chance
Chance
Correlation is passive (is)
Chance is application (of)
Are they moving in tandem (x and y)
Data always varies due to reason or chance
Chance is the foundation on which regression is built
number of sales for six salespeople (SP)
We don’t know how much each salesperson sold
Number of sales → Y variable
Guess each salesperson’s sales?
Rule: You must guess the same number for each person
Your guess?
The mode → 10 (guess 6 times)
Right 2/6 times
How much error with each guess?
E = Y-10 (guess)
total error –> ESS –> error sum of squares
ESS (mode) = sigma (Y-10)^2
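A minimal sketch of the mode guess and its error. The six sales figures are made up for illustration (the notes do not list the class data); they were picked only so that the mode is 10 and the guess is right 2 out of 6 times.

```python
# Hypothetical sales for six salespeople (NOT the original class data).
sales = [2, 5, 6, 9, 10, 10]
guess = 10  # the mode, guessed for every salesperson

errors = [y - guess for y in sales]       # E = Y - 10 for each person
hits = sum(1 for e in errors if e == 0)   # how many guesses were exactly right
ess_mode = sum(e ** 2 for e in errors)    # ESS = sigma (Y - 10)^2

print("errors:", errors)
print("right", hits, "out of", len(sales), "times")
print("ESS (mode):", ess_mode)
```

Squaring keeps the negative and positive errors from cancelling each other out.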
comparison of guesses
We want to limit our losses, but it’s like golf. It’s not that we’re gonna hit a hole in one; what we want is to make multiple good shots to eventually get to the hole
Limit your losses, don’t just aim for a hole in one
Not really really wrong
Guessing the mean is this.. Limiting our losses
Substitute the word usually for average
How much better can we do than guessing we build off of this
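To see why guessing the mean limits losses, here is a small sketch comparing the error sum of squares for three possible guesses. The sales figures are hypothetical (not the class data), and the helper name ess is just for this example.

```python
from statistics import mean, median, mode

# Hypothetical sales for six salespeople (NOT the original class data).
sales = [2, 5, 6, 9, 10, 10]

def ess(values, guess):
    """Error sum of squares when the same guess is used for everyone."""
    return sum((y - guess) ** 2 for y in values)

for name, guess in [("mode", mode(sales)),
                    ("median", median(sales)),
                    ("mean", mean(sales))]:
    print(name, "guess =", guess, "ESS =", ess(sales, guess))
```

With this data the mean gives the smallest ESS of the three. That is true in general: no single guess has a lower error sum of squares than the mean.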
predictions
Guessing to predictions
X is new here
Is the x variable and y variable correlated?
Use fx function to get r value
r = 0.9218
Use fx function for intercept
b = 2.0909
Use fx function for the slope
m = 0.8182
Line of best fit
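The same three numbers can be worked out by hand, which shows what the spreadsheet functions are doing. This is only a sketch: the x and y values are hypothetical (the notes do not list the data) and were chosen so the results land on the r, b, and m quoted above.

```python
from math import sqrt

# Hypothetical data (NOT the original class data).
x = [1, 2, 7, 8, 8, 10]     # predictor
y = [2, 5, 6, 9, 10, 10]    # number of sales

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared and cross deviations from the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)   # correlation coefficient
m = sxy / sxx               # slope
b = y_bar - m * x_bar       # intercept

print("r =", round(r, 4))
print("b =", round(b, 4))
print("m =", round(m, 4))
```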
line
y = mx + b
regression equation (line of best fit)
y hat = b + mx
y hat = predicted value
b = y-intercept
m = slope
example
y hat = 2.0909 + 0.8182(x)
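A quick sketch of using the equation to predict. The function name predict and the input value x = 8 are only for illustration.

```python
B = 2.0909   # y-intercept from the notes
M = 0.8182   # slope from the notes

def predict(x):
    """Return y hat = b + m*x for a given x."""
    return B + M * x

print(round(predict(8), 4))   # predicted sales when x = 8
print(predict(0))             # at x = 0 the prediction is just the intercept
```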
FM = full model
Using all predictive variables (x variables)
In SLR = 1 predictive variable (FM)
Chance model had no predictive variable
Full model all predictive variables
Line of best fit
SLR: common business practices
Predicting and/or forecasting
>Hiring decisions
>Inventory cycles
>Future sales
Understanding underlying elements
>Marketing strategy
>Operational efficiency
Supplement executive creativity
>Reveal new insights
SLR: Looking for relationships, making associations, drawing conclusions
Example of assumptions:
Outfit → Purchasing Power
Job → Disposition
Car → Personality
residuals (there is no perfect model)
Residuals woo!!!
Residual is another name for error
line of best fit - on exam (multiple questions)
memorize the picture in notes!!!!!!!!!
Line of best fit not perfect fit
A perfect fit would only happen if there were a perfect correlation
If r = +/- 1 or e = 0 there would be a perfect fit
Residuals (error) =
Residuals (error) = Y - Y hat
e = actual - predicted
e not r
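A short sketch of e = actual - predicted, one residual per data point. The x and y values are hypothetical (not the class data); the intercept and slope are the rounded values from the notes.

```python
# Hypothetical data (NOT the original class data).
x = [1, 2, 7, 8, 8, 10]
y = [2, 5, 6, 9, 10, 10]

b, m = 2.0909, 0.8182   # intercept and slope from the notes

y_hat = [b + m * xi for xi in x]                     # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]    # e = Y - Y hat

for xi, yi, yh, e in zip(x, y, y_hat, residuals):
    print(f"x={xi:2d}  actual={yi:2d}  predicted={yh:7.4f}  e={e:7.4f}")
```

A positive e means the line under-predicted that point; a negative e means it over-predicted.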
Ordinary least squares regression =
Ordinary least squares regression = the line with the least total (squared) error
Doesn’t care if error is positive or negative, cares about magnitude of error
Sigma (Y-Y hat)^2 (squaring!!!!)
When plotted, errors should look either randomly scattered or normally distributed
properties and qualities of residuals (error)
notation: for a sample –> e, for a population –> ε (epsilon)
if r = +1 or -1, e = 0
ordinary least squares regression: minimizes total error
sum(y - y hat) = 0, always!, so we square each error first: sigma (y - y hat)^2
sigma (y - y hat)^2 –> total error: 7.81
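A sketch that checks both properties: the plain sum of the residuals is zero, and the sum of squared residuals is the total error. The data is hypothetical (not the class data), chosen so the total error comes out close to the 7.81 quoted in the notes.

```python
# Hypothetical data (NOT the original class data).
x = [1, 2, 7, 8, 8, 10]
y = [2, 5, 6, 9, 10, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept, unrounded.
m = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
b = y_bar - m * x_bar

residuals = [yi - (b + m * xi) for xi, yi in zip(x, y)]

sum_e = sum(residuals)                  # zero, up to floating-point rounding
sse = sum(e ** 2 for e in residuals)    # sigma (y - y hat)^2

print("sum of residuals:", round(sum_e, 10))
print("total error:", round(sse, 4))
```

The unrounded slope and intercept are used here on purpose: with the rounded b and m the residuals would only sum to approximately zero.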
plot errors
randomly scattered around the x-axis
normally distributed
randomly scattered around the x-axis
> more errors as you draw closer to x-axis (in the middle)
model is reducing the error that is why
in taking random samples our error should be random
plots and data point
If the data points fall into some sort of pattern, you do not have a linear relationship. The relationship can be parabolic, etc., but it is not linear
If the residuals fall into a pattern, the relationship is not linear
chance model vs full model
??????
Total variation in the Y-variable can be divided into 2 distinct components:
Regression term
Y’s relationship with the X-variable(s)
Picked up in full model (has x in the formula)
Residual term
Random factors not in the model (error)
Years of experience, gender, age, etc. that are not in the model but can influence a salesperson’s sales
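The split into a regression term and a residual term can be checked numerically. This is a sketch on hypothetical data (not the class data); the variable names total, regression, and residual are just labels for the three sums of squares.

```python
# Hypothetical data (NOT the original class data).
x = [1, 2, 7, 8, 8, 10]
y = [2, 5, 6, 9, 10, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
m = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
b = y_bar - m * x_bar
y_hat = [b + m * xi for xi in x]

# Total variation in Y: the error of the chance model (guessing the mean).
total = sum((yi - y_bar) ** 2 for yi in y)
# Regression term: variation picked up by the full model.
regression = sum((yh - y_bar) ** 2 for yh in y_hat)
# Residual term: variation left over as error.
residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print("total      =", round(total, 4))
print("regression =", round(regression, 4))
print("residual   =", round(residual, 4))
print("regression + residual =", round(regression + residual, 4))
```

The two components add back up to the total, and regression divided by total is the share of the variation that the x-variable explains.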
Four Key Concepts for SLR
Concept 1: The coefficient of determination
coefficient of determination - RSQ - the percentage of the variation in the y-variable that is explained by the variation in the x-variable(s)
Don’t confuse the coefficient of determination (RSQ) with the correlation coefficient (r for a sample, ρ for a population)
A percentage
range = 0-1
Practical because percentages are understandable
square r
r = 0.9218
RSQ = about 85%
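The arithmetic behind that figure, as a quick check using the r value from the notes:

```python
r = 0.9218        # correlation coefficient from the notes
rsq = r ** 2      # coefficient of determination

print(round(rsq, 4))    # as a proportion between 0 and 1
print(f"{rsq:.0%}")     # as a rounded percentage
```

Squaring also drops the sign, so r = -0.9218 would give the same RSQ.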
How high does RSQ need to be?
Useful in context = good RSQ
For human behavior, RSQ tends to be lower because behavior is complex