Exam 2 - Regression Flashcards

0
Q

Y-hat = a + bx

A
Y-hat reminds us that we have deviations about the line and that the values of y specified by the line are PREDICTIONS
a - intercept
b - slope
Y-hat - predicted value of y for a given x
1
Q

Statistical model

A

An equation that fits the pattern between a response variable and possible explanatory variables, accounting for deviations from the model. Or in other words, a regression line

2
Q

What does the y-intercept tell us?

A

The value of y when x=0

3
Q

What does slope tell us?

A

The change in y for every one-unit increase in x, on average!

4
Q

As x increases by one unit, what happens to y when the slope is negative?

A

Y decreases

5
Q

As x increases by one unit, what happens to y when the slope is positive?

A

Y increases by rise/run units

6
Q

b=

A

Rise (change in y) / run (change in x)

7
Q

Interpretation of slope: rise/run

A

For every one-inch increase in height at age 4, height at age 18 increases by 1.15 inches ON AVERAGE

8
Q

Interpretation of y-intercept

A

Males who are zero inches tall at age 4 are predicted to be 23 inches tall at age 18

The intercept is the value of y when x = 0

9
Q

How to predict

A
  • collect data
  • plot data
  • predict
  • fit the data with a straight line equation
  • evaluate the equation
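A minimal Python sketch of the fit, evaluate and predict steps on this card, assuming NumPy is available. The data values are made up for illustration (they are not from the course), the helper name predict is hypothetical, and the plotting step is left out to keep it short.

```python
import numpy as np

# Hypothetical data (not from the deck):
# x = height at age 4, y = height at age 18, both in inches.
x = np.array([38.0, 39.0, 40.0, 41.0, 42.0])
y = np.array([66.0, 68.0, 69.0, 70.0, 72.0])

# Fit the data with a straight line y-hat = a + b*x (least squares).
# np.polyfit returns the slope first, then the intercept.
b, a = np.polyfit(x, y, deg=1)

def predict(x_new):
    """Return y-hat, the predicted y for a given x."""
    return a + b * x_new

# Evaluate the equation at a new x inside the observed range.
y_hat = predict(40.5)
print(f"y-hat = {a:.2f} + {b:.2f}x; prediction at x = 40.5: {y_hat:.2f}")
```

Predicting only inside the range of the observed x values is a deliberate choice here; the line says nothing reliable about x values far outside the data.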
10
Q

Residuals

A

Vertical distance between the observed y value and the line, or

The difference between the observed y value and y-hat, the value predicted by the regression line

11
Q

Squared prediction error (residual)^2

A

(Observed y - predicted y)^2 = (y - y-hat)^2

They are squared because the residuals themselves always sum to zero for the least-squares line (negative residuals below the line cancel positive residuals above it)
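To see why squaring is needed, here is a small Python sketch, assuming NumPy and a made-up data set. It computes the residuals, checks that they sum to (numerically) zero, and then sums their squares.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b, a = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
y_hat = a + b * x                # predicted values on the line

residuals = y - y_hat            # observed y minus predicted y
squared = residuals ** 2         # squared prediction errors

print("residuals:", np.round(residuals, 3))
print("sum of residuals:", round(float(residuals.sum()), 10))
print("sum of squared residuals:", round(float(squared.sum()), 3))
```

A positive entry in residuals is a point above the line, and a negative entry is a point below it.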

12
Q

Positive residuals

A

Points above the line

13
Q

Negative residuals

A

Points below the line

14
Q

The least-squares regression line is

A

The line with the smallest sum of squared errors (denoted SSE)
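A quick way to convince yourself of this, sketched in Python with NumPy and made-up data: compute SSE for the least-squares line, then for a handful of nearby lines. The helper name sse and the nudge sizes are arbitrary choices for the illustration.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def sse(a, b):
    """Sum of squared errors for the line y-hat = a + b*x."""
    return float(np.sum((y - (a + b * x)) ** 2))

b_ls, a_ls = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
best = sse(a_ls, b_ls)

# Nudge the intercept and the slope; each nudged line is a different line.
others = [sse(a_ls + da, b_ls + db)
          for da in (-0.5, 0.0, 0.5)
          for db in (-0.2, 0.0, 0.2)
          if (da, db) != (0.0, 0.0)]

print("least-squares SSE:", round(best, 3))
print("smallest SSE among the nudged lines:", round(min(others), 3))
```

Every nudged line ends up with a larger SSE than the least-squares line, which is exactly what "least squares" means.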

15
Q

Sum of Squared Deviations (residuals, errors) (SSE) represents

A
The total variation in the observed values of y about the regression line (the unexplained variation)
SSE = sum of (residuals)^2 = sum of (y - y-hat)^2
16
Q

Least - squares equation

A

Y-hat = a + bx

17
Q

Formula for a (intercept)

A

a = y-bar - b(x-bar)

Where y-bar and x-bar are the respective means of y and x

18
Q

Formula for b(slope)

A

Slope is a rate of change: the average amount y changes when x increases by 1

b = r(Sy/Sx)
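The slope and intercept formulas can be checked by hand or in code. The sketch below assumes NumPy and made-up data; it computes b from r, Sy and Sx, then a from the two means, and compares the result with NumPy's own least-squares fit.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = np.corrcoef(x, y)[0, 1]      # correlation r
s_x = np.std(x, ddof=1)          # sample standard deviation of x
s_y = np.std(y, ddof=1)          # sample standard deviation of y

b = r * s_y / s_x                # slope: b = r(Sy/Sx)
a = y.mean() - b * x.mean()      # intercept: a = y-bar - b(x-bar)

b_check, a_check = np.polyfit(x, y, deg=1)
print(f"formulas: a = {a:.3f}, b = {b:.3f}")
print(f"polyfit:  a = {a_check:.3f}, b = {b_check:.3f}")
```

ddof=1 asks NumPy for the sample standard deviation (dividing by n - 1); the slope comes out the same as long as Sx and Sy use the same divisor.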

19
Q

Least-squares regression line facts

A
  • makes the distances of the data points from the line small ONLY in the y direction
  • if we reverse the roles of the two variables, we get a different least-squares regression line
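The second fact is easy to miss, so here is a short Python sketch, assuming NumPy and made-up data. It fits y on x, then x on y, and rewrites the second line in "y = ..." form so the two slopes can be compared on the same axes.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Regression of y on x: y-hat = a_yx + b_yx * x
b_yx, a_yx = np.polyfit(x, y, deg=1)

# Regression of x on y: x-hat = a_xy + b_xy * y
b_xy, a_xy = np.polyfit(y, x, deg=1)

# Solve x = a_xy + b_xy * y for y to draw it on the same (x, y) axes.
slope_reversed = 1.0 / b_xy
intercept_reversed = -a_xy / b_xy

print(f"y on x: y = {a_yx:.2f} + {b_yx:.2f}x")
print(f"x on y: y = {intercept_reversed:.2f} + {slope_reversed:.2f}x")
```

The two lines agree only when the points fall exactly on a straight line (r = 1 or r = -1).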
20
Q

What is the connection between the correlation r and the slope b of the least-squares line?

A
Slope and r have the same sign
b = r only when Sy = Sx
Both r and b tell us the direction
If r = 0, b = 0
If r < 0, b < 0
If r > 0, b > 0
If we know the sign of r, we know the sign of b, and vice versa
21
Q

What b and r have in common

A

Always have the same sign

A change of 1 standard deviation in x corresponds to a change of r standard deviations in y-hat.

In standard deviation units, the change in y-hat is never larger than the change in x, because r is always between -1 and 1
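A small numeric check of the standard deviation statement, sketched in Python with NumPy and made-up data: move x up by one standard deviation from its mean and measure how far y-hat moves, in units of Sy.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = np.corrcoef(x, y)[0, 1]
s_x = np.std(x, ddof=1)
s_y = np.std(y, ddof=1)
b = r * s_y / s_x                # slope
a = y.mean() - b * x.mean()      # intercept

y_hat_at_mean = a + b * x.mean()
y_hat_one_sd_up = a + b * (x.mean() + s_x)

# Change in y-hat, measured in standard deviations of y.
change_in_sd_units = (y_hat_one_sd_up - y_hat_at_mean) / s_y

print("r:", round(float(r), 4))
print("change in y-hat (in Sy units):", round(float(change_in_sd_units), 4))
```

The two printed numbers match because b * Sx / Sy simplifies to r.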

22
Q

The least squares regression line always passes

A

Through the point (x-bar, y-bar)
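This follows from the intercept formula a = y-bar - b(x-bar): plugging x-bar into the line gives a + b(x-bar) = y-bar. A short Python check, assuming NumPy and made-up data:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([38.0, 39.0, 40.0, 41.0, 42.0])
y = np.array([66.0, 68.0, 69.0, 70.0, 72.0])

b, a = np.polyfit(x, y, deg=1)       # least-squares slope and intercept
y_hat_at_x_bar = a + b * x.mean()    # height of the line at x = x-bar

print("y-hat at x-bar:", round(float(y_hat_at_x_bar), 6))
print("y-bar:", round(float(y.mean()), 6))
```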

23
Q

Correlation r describes

A

The straight-line (linear) relationship between x and y

24
The square of the correlation, r^2, gives us
The percentage of variation in the values of y that is explained by the least-squares regression line. On the chart, R-sq = 0.6937, or 69.37%
25
Regression line
Is a straight line that describes how a response variable y changes as an explanatory variable x changes
26
The least-squares line is a math model used to predict
The value of y for a given x: Y-hat = a + bx
27
The least-squares regression line requires that we have
An explanatory variable and a response variable, both quantitative
28
The least-squares regression line of y on x is the line that makes
The sum of the squares of the vertical distances of the data points from the line as small as possible
29
The least-squares regression line, like any line, has
A slope and an intercept. Write y-hat in place of y. Slope b = r(Sy/Sx), where r is the correlation and Sx and Sy are the standard deviations of x and y
30
When r^2 is close to 0, the regression line
Is not a good model for the data; the scatterplot has a hamburger shape, and the regression line explains almost none of the variation in y
31
When r^2 is close to 1
The regression line should fit the data well: almost 100% of the variation in y is explained by x
32
The coefficient of determination r^2
Represents the fraction (%) of the variation in the values of y that is explained by the least-squares regression of y on x.
33
Regression is a common statistical setting, and least-squares regression is the most common method for
Fitting a regression line to data
34
The least-squares regression line always passes through
The point (x-bar, y-bar)
35
Residual
The difference between an observed value of the response variable y and the value predicted by the regression line, y-hat. Residual = observed y - predicted y = y - y-hat
36
The residuals show
How far the data are from the regression line and how well the line describes the data.
37
The mean of the least-squares residuals is
Always zero!
38
A residual plot (diagnostic plot)
Is a scatterplot of the residuals versus the observed x values (or the y-hats, which lie on the regression line)
39
If the residual plot shows uniform scatter of the points about the fitted line
Above and below with no unusual observations or systematic pattern, then the regression line captures the overall relationship well
40
Residual plot - curved pattern
Relationship is not linear
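To see what a curved residual pattern looks like in numbers, here is a Python sketch assuming NumPy. The data are made up to follow y = x^2 exactly, so a straight line is the wrong model on purpose.

```python
import numpy as np

# Hypothetical curved data: y = x^2, so the true relationship is not linear.
x = np.arange(0.0, 6.0)          # 0, 1, 2, 3, 4, 5
y = x ** 2

b, a = np.polyfit(x, y, deg=1)   # force a straight-line fit anyway
residuals = y - (a + b * x)

# Residuals are positive at both ends and negative in the middle:
# a "smile" pattern instead of random scatter.
for xi, res in zip(x, residuals):
    print(f"x = {xi:.0f}, residual = {res:+.3f}")
```

Random scatter would mix positive and negative residuals across the whole range of x; a run of same-sign residuals in the middle is the warning sign.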
41
Residual plot - megaphone
Spread about the line that increases or decreases as x increases; when the spread increases, prediction of y will be LESS accurate for larger x's
42
Individual points with large residuals are
Outliers in the vertical direction
43
Influential observation
Is an outlier in either the x or y direction which, if removed, would markedly change the value of the slope and y-intercept
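A sketch of this effect in Python, assuming NumPy. The data are made up: five ordinary points plus one point far out in the x direction. The line is fitted with and without that last point.

```python
import numpy as np

# Hypothetical data: the last point is an outlier in the x direction.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 15.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0, 2.0])

b_all, a_all = np.polyfit(x, y, deg=1)                    # all six points
b_without, a_without = np.polyfit(x[:-1], y[:-1], deg=1)  # outlier removed

print(f"with the point:    y-hat = {a_all:.2f} + {b_all:.2f}x")
print(f"without the point: y-hat = {a_without:.2f} + {b_without:.2f}x")
```

In this example the single far-out point is enough to flip the sign of the slope, which is why it counts as influential.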
44
Outlier
An observation that lies outside the overall pattern of the other observations
45
Ecological correlation
A correlation based on group averages (means) rather than on individuals.
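A made-up example of how far the two correlations can drift apart, sketched in Python with NumPy. Each row below is one hypothetical group of three individuals; the numbers were chosen so that the group means line up perfectly while the individuals do not.

```python
import numpy as np

# Hypothetical data: each row is one group of three individuals.
x = np.array([[0.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0]])
y = np.array([[2.0, 1.0, 0.0],
              [3.0, 2.0, 1.0],
              [4.0, 3.0, 2.0]])

# Correlation across all nine individuals.
r_individuals = np.corrcoef(x.ravel(), y.ravel())[0, 1]

# Ecological correlation: correlate the three group means instead.
r_groups = np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1]

print("r for individuals:", round(float(r_individuals), 4))
print("r for group means:", round(float(r_groups), 4))
```

Averaging removes the variation inside each group, so the group-level correlation here is far stronger than the relationship among individuals.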
46
Correlation measures
The direction and strength of the linear relationship between quantitative variables x and y
47
Regression models
The linear relationship between x and y and can be used to predict a value for the response variable y for a specific value of the explanatory variable x
48
What is total variation?
Sum of squared deviations about y-bar
49
What is unexplained variation?
The sum of squared residuals, or the variation not explained by the regression line
50
Regression assumptions:
(1) The relationship between x and y can be modeled by a straight line (residuals show randomness around the line). (2) The variation in the y's about the line does not depend on the values of x (residuals are similar in size for all x's)
51
If the residual conditions (assumptions) are met
The residual plot has a shoe box shape, or there is no pattern in the residuals
52
A smile or frown pattern in a residual plot indicates
Non-linear relationship - violation of conditions (assumptions)
53
Megaphone pattern in residual plot indicates
Non-constant variation (variation in y depends on x)
54
Shoe box residual plot with a point outside indicates
Outlier in either x or y direction
55
An estimated statistical model
Regression equation
56
Regression equation is an
Estimated statistical model
57
r^2 is a measure of how
Successfully the regression explains the variation in the response, y
58
The sum of squared residuals measures ...... Variation
The unexplained
59
R-sq is a measure of the fraction of variation in y that is ..... R-sq = 1 - unexplained var/total var
Explained by x
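The formula R-sq = 1 - unexplained var/total var can be worked through in a few lines of Python, assuming NumPy and made-up data. Unexplained variation is the sum of squared residuals, total variation is the sum of squared deviations about y-bar, and the result matches the square of the correlation r.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b, a = np.polyfit(x, y, deg=1)
y_hat = a + b * x

unexplained = float(np.sum((y - y_hat) ** 2))     # sum of squared residuals
total = float(np.sum((y - y.mean()) ** 2))        # squared deviations about y-bar

r_sq = 1 - unexplained / total
r = np.corrcoef(x, y)[0, 1]

print("unexplained variation:", round(unexplained, 3))
print("total variation:", round(total, 3))
print("R-sq from the formula:", round(r_sq, 4))
print("r squared:", round(float(r) ** 2, 4))
```

Whatever share of the total variation is left over as unexplained is the share the line did not account for; R-sq is the rest.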
60
Residual plots help us magnify the residuals and identify ..... Sometimes we can see ..... observations and ..... which are much more visible on the residual plot.
Problems ..... unusual ..... patterns
61
A residual plot is a ..... of the residuals plotted against the x-values
Scatterplot
62
Correlations based on ..... rather than on ..... can be misleading if they are interpreted to be about individuals
Averages ..... individuals
63
Removing an influential point from the data set will change ...
Slope and y-intercept