Flashcards in Exam 2- Regression Deck (64):

0

## Statistical model

### An equation that fits the pattern between a response variable and possible explanatory variables, accounting for deviations from the model. Or in other words, a regression line

1

## Y-hat = a + bx

###
Y-hat reminds us that we have deviations about the line and that the values of y specified by the line are PREDICTIONS

a - intercept

b - slope

Y-hat - predicted value of y for a given x

2

## What does the y-intercept tell us?

### The value of y when x=0

3

## What does slope tell us?

### The change in y for every one-unit increase in x, on average!

4

## As x increases by one unit, what happens to y when the slope is negative?

### Y decreases

5

## As x increases by one unit what happens to y when slope is positive?

### Y increases by rise/run units

6

## b=

### Rise(y)/run(x)

7

## Interpretation of slope : rise/run

### For every one-inch increase in height at age 4, height at age 18 increases by 1.15 inches ON AVERAGE

8

## Interpretation of y- intercept

###
Males who are zero inches tall at age 4 are predicted to be 23 inches tall at age 18

The intercept is the value of y when x=0
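
A minimal Python sketch of the slope and intercept interpretations, assuming the height line from these cards has intercept a = 23 and slope b = 1.15 (inches). The function name `predict` is hypothetical.

```python
# Values assumed from the cards: intercept a = 23, slope b = 1.15 (inches).
A = 23.0
B = 1.15

def predict(x, a=A, b=B):
    """Return the predicted value y-hat = a + b*x."""
    return a + b * x

at_zero = predict(0)               # the intercept: y-hat when x = 0
step = predict(41) - predict(40)   # change in y-hat for a one-inch increase in x

print(at_zero)
print(round(step, 2))
```

The first value is the intercept and the second is the slope, which is why the slope is read as a change "per one unit of x".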

9

## How to predict

###
- collect data

- plot data

- fit the data with a straight line equation

- evaluate the equation

- predict

10

## Residuals

###
Vertical distance between the observed y value and the line, or

The difference between the observed y value and y-hat, the value predicted by the regression line

11

## Squared prediction error: (residual)^2

###
(Observed y - predicted y)^2 = (y - y-hat)^2

They are squared because the residuals from the least-squares line always sum to zero (the positive residuals above the line cancel the negative residuals below it)
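
To see the cancellation, here is a small Python sketch on toy data. The five points are invented for illustration, and the line y-hat = 2.2 + 0.6x is the least-squares line for those points.

```python
# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line for this toy data: y-hat = 2.2 + 0.6x.
a, b = 2.2, 0.6

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # observed y - predicted y
squared = [e ** 2 for e in residuals]

sum_residuals = sum(residuals)   # positives and negatives cancel
sse = sum(squared)               # squaring keeps every term non-negative

print([round(e, 2) for e in residuals])
print(round(sum_residuals, 10), round(sse, 2))
```

The plain sum is zero up to rounding, so it cannot measure fit; the sum of squares (about 2.4 here) can.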

12

## Positive residuals

### Points above the line

13

## Negative residuals

### Points below the line

14

## The least-squares regression line is

### The line with the smallest sum of squared errors (denoted SSE)

15

## Sum of Squared Deviations (residuals, errors), SSE, represents

###

The variation in the observed values of y about the regression line (the unexplained variation)

SSE = sum of residuals^2 = sum of (y - y-hat)^2

16

## Least - squares equation

### Y-hat = a + bx

17

## Formula for a (intercept)

###
a = y-bar - b(x-bar)

Where y-bar and x-bar are the respective means

18

## Formula for b(slope)

###
Slope is a rate of change: the amount of change in y-hat when x increases by 1

b = r(Sy/Sx)
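
A short Python sketch of both formulas on invented toy data, using only the standard library. The helper name `correlation` is hypothetical, and sample standard deviations are assumed for Sx and Sy.

```python
from statistics import mean, stdev

# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def correlation(xs, ys):
    """Pearson r from standardized values (sample standard deviations)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)

r = correlation(xs, ys)
b = r * stdev(ys) / stdev(xs)     # slope: b = r(Sy/Sx)
a = mean(ys) - b * mean(xs)       # intercept: a = y-bar - b(x-bar)

print(round(r, 4), round(b, 4), round(a, 4))
```

For this toy data the slope comes out near 0.6 and the intercept near 2.2.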

19

## Least-squares regressions line facts

###
- makes the distances of the data points from the line small ONLY in the y direction

- if we reverse the roles of the two variables, we get a different least-squares regression line
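
A hedged Python sketch of the second fact, on invented toy data. The helper `least_squares` is hypothetical and uses the equivalent slope formula b = Sxy/Sxx rather than r(Sy/Sx).

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_yx, b_yx = least_squares(xs, ys)   # y on x: minimizes vertical distances
a_xy, b_xy = least_squares(ys, xs)   # x on y: minimizes horizontal distances

# Rewrite x-hat = a_xy + b_xy*y as a line in the (x, y) plane to compare slopes.
slope_of_reversed = 1 / b_xy

print(round(b_yx, 4), round(slope_of_reversed, 4))
```

The two slopes differ (about 0.6 versus 1.0 here), so the choice of explanatory and response variable matters.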

20

## What is the connection between the correlation r and the slope b of the least-squares line?

###
Slope and r have the same sign

b = r only when Sy = Sx

Both r and b tell us the direction

If r = 0, b = 0

If r < 0, b < 0

If r > 0, b > 0

If we know the sign of r we know the sign of b, and vice versa

21

## What b and r have in common

###
Always have the same sign

A change of 1 standard deviation in x corresponds to a change of r standard deviations in y.

In standard deviation units, the change in y-hat is never more than the change in x (because r is between -1 and 1)
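
A small Python check of the "one standard deviation in x, r standard deviations in y" statement, on invented toy data and assuming sample standard deviations.

```python
from statistics import mean, stdev

# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)
n = len(xs)

r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
b = r * s_y / s_x
a = y_bar - b * x_bar

# Move x up by one standard deviation and see how far y-hat moves.
change_in_y_hat = (a + b * (x_bar + s_x)) - (a + b * x_bar)
in_sd_units = change_in_y_hat / s_y   # should match r

print(round(r, 4), round(in_sd_units, 4))
```

Both printed values agree, which follows directly from b = r(Sy/Sx).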

22

## The least squares regression line always passes

### Through the point (x-bar, y-bar)
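
A quick Python check on invented toy data: because a = y-bar - b(x-bar), predicting at x-bar returns y-bar.

```python
from statistics import mean

# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

y_hat_at_x_bar = a + b * x_bar   # prediction at the mean of x

print(y_bar, round(y_hat_at_x_bar, 10))
```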

23

## Correlation r describes

### The straight line relationship

24

## The square of the correlation, r^2, gives us

###
The percentage (%) of variation in the values of y that is explained by the least squares regression line

On the chart R-sq=0.6937 or 69.37%

25

## Regression line

### Is a straight line that describes how a response variable y changes as an explanatory variable x changes

26

## The least squares line is a mathematical model used to predict

###
The value of y for a given x

Y-hat = a + bx

27

## Least squares regression line requires that we have

### An explanatory variable and a response variable, both quantitative

28

## The least squares regression line of y on x is the line that makes

### The sum of the squares of the vertical distances of the data points from the line as small as possible

29

## The least squares regression line, like any line, has

###
Slope and intercept

Y is replaced by y-hat (the line gives predicted values)

Slope b = r(Sy/Sx), where r is the correlation and Sx and Sy are the standard deviations of x and y

30

## When r^2 is close to 0, the regression line

### Is not a good model for the data; hamburger shape; almost none of the variation in y is explained by the regression line

31

## When r^2 is close to 1

### The regression line fits the data well; almost 100% of the variation in y is explained by x

32

## The coefficient of determination, r^2

### represents the fraction (%) of the variation in the values of y that is explained by the least squares regression of y on x.

33

## Regression is a common statistical setting, and least-squares regression is the most common method for

### Fitting a regression line to data

34

## Least squares regression line always passes through

### The point (x-bar, y-bar)

35

## Residual

###
Difference between an observed value of the response variable y and the value predicted by the regression line, y-hat

Residual = observed y - predicted y or y-hat

36

## The residuals show

### How far the data are from the regression line and how well the line describes the data.

37

## The mean of the least-squares residuals is

### Always zero!

38

## A residual plot (diagnostic plot)

### Is a scatterplot of the residuals versus the observed x values (or the y-hats, which lie on the regression line)

39

## If the residual plot shows uniform scatter of the points about the fitted line

### Above and below with no unusual observations or systematic pattern, then the regression line captures the overall relationship well

40

## Residual plot - curved pattern

### Relationship is not linear
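
A minimal Python sketch of how a curved pattern shows up, assuming invented toy data that follow y = x squared. A straight line is fitted anyway and the signs of the residuals are listed in order of x.

```python
from statistics import mean

# Toy data on a curve (y = x squared), invented for illustration.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" for e in residuals]

print(residuals)
print(signs)   # positive at both ends, negative in the middle
```

The residuals run positive, negative, positive instead of scattering randomly, which is the "smile" that signals a non-linear relationship.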

41

## Residual plot - megaphone

### Spread about the line that increases or decreases as x increases; when the spread increases, prediction of y will be LESS accurate for larger x's

42

## Individual points with large residuals are

### Outliers in the vertical direction

43

## Influential observation

### Is an outlier in either the x or y direction which, if removed, would markedly change the value of the slope and y-intercept
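
A hedged Python sketch of this idea on invented toy data: one point far out in the x direction is added, and the least-squares line is fitted with and without it. The helper `slope_intercept` is hypothetical.

```python
from statistics import mean

def slope_intercept(points):
    """Return (a, b) of the least-squares line for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

# Toy data, invented for illustration only.
base = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
unusual = (15, 2)   # far from the others in the x direction

a_without, b_without = slope_intercept(base)
a_with, b_with = slope_intercept(base + [unusual])

print(round(a_without, 3), round(b_without, 3))
print(round(a_with, 3), round(b_with, 3))
```

With this data the single added point flips the slope from positive to negative, so it is influential.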

44

## Outlier

### An observation that lies outside the overall pattern of the other observations

45

## Ecological correlation

###
A correlation based on group averages (means) rather than on individuals.
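
A small Python sketch of why this matters, using three invented groups of three individuals each. The helper `pearson_r` is hypothetical; it computes the usual correlation coefficient.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Three invented groups of individuals, each a list of (x, y) pairs.
groups = [
    [(1, 3), (2, 2), (3, 1)],
    [(4, 6), (5, 5), (6, 4)],
    [(7, 9), (8, 8), (9, 7)],
]

individuals = [p for g in groups for p in g]
r_individuals = pearson_r([x for x, _ in individuals], [y for _, y in individuals])

group_x_means = [mean([x for x, _ in g]) for g in groups]
group_y_means = [mean([y for _, y in g]) for g in groups]
r_groups = pearson_r(group_x_means, group_y_means)

print(round(r_individuals, 2), round(r_groups, 2))
```

Averaging removes the scatter within each group, so the correlation of the group means is stronger than the correlation among individuals in this example.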

46

## Correlation measures

### Direction and strength of the linear relationship between quantitative variables x and y

47

## Regression models

### The linear relationship between x and y and can be used to predict a value for the response variable y for a specific value of the explanatory variable x

48

## What is total variation?

### Sum of squared deviations about y-bar

49

## What is unexplained variation?

### Sum of squared residuals or variations not explained by regression line
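
A minimal Python sketch that computes total and unexplained variation for invented toy data and combines them as 1 - unexplained/total to get r^2.

```python
from statistics import mean

# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

total = sum((y - y_bar) ** 2 for y in ys)                           # about y-bar
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # about the line
r_squared = 1 - unexplained / total

print(total, round(unexplained, 2), round(r_squared, 2))
```

Here about 60% of the variation in y is explained by the line and the remaining 40% is left in the residuals.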

50

##
Regression assumptions:

###
The relationship between x and y can be modeled by a straight line (residuals show randomness around the line)

Variation in the y's about the line does not depend on the value of x (residuals are similar in size for all x's)

51

## If residuals conditions (assumptions) are met

### Shoebox shape: there is no pattern in the residuals

52

## Smile or frown pattern in a residual plot indicates

### Non-linear relationship - violation of conditions (assumptions)

53

## Megaphone pattern in residual plot indicates

### Non-constant variation (variation in y depends on x)

54

## Shoe box residual plot with a point outside indicates

### Outlier in either x or y direction

55

## An estimated statistical model

### Regression equation

56

## Regression equation is an

### Estimated statistical model

57

## r2 is a measure of how

### Successfully the regression explains the variation in the response, y

58

## The sum of squared residuals measures ...... Variation

### The unexplained

59

##
R-sq is a measure of the fraction of variation in y that is .....

###
Explained by x

R-sq = 1 - unexplained var/total var

60

## Residual plots help us magnify the residuals and identify ..... Sometimes we can see ..... observations and ..... which are much more visible on the residual plot.

###
Problems.

Unusual observations

Patterns

61

## A residual plot is a ..... of the residuals plotted against the x-values

### Scatterplot

62

## Correlations based on ..... rather than on ..... can be misleading if they are interpreted to be about individuals

### Averages ..... individuals

63