Chpt 13 - Simple Linear Regression Flashcards

1
Q

What statistical method models the linear relationship between two numerical variables?

A

Simple linear regression

2
Q

How do we know if two variables are linearly related?

A

They are linearly related if the mean of the response variable Y depends linearly on the value of the predictor variable X, that is, μY|X=x = β0 + β1x

3
Q

What is the linear equation that determines a fitted line?

A

Also called the regression equation

ŷ = b0 + b1x

Where:

b0 is the intercept
b1 is the slope

These are sample estimates of the population intercept β0 and slope β1 in μY|X=x = β0 + β1x

4
Q

What is a graphical method to determine the relationship between X (predictor) and Y (response)?

A

Scatter plots

5
Q

When looking at a scatter plot, if the means of Y at different values of X are close to a straight line (although not necessarily right on the line), what can be said about the relationship?

A

The variables are linearly related

6
Q

If the mean of Y decreases as the value of X increases, what type of linear relationship is this?

A

Negative

7
Q

If the mean of Y increases as the value of X increases, what type of linear relationship is this?

A

Positive

8
Q

What variable is X when looking at simple linear regression?

A

The predictor variable

9
Q

What variable is Y when looking at simple linear regression?

A

The response variable

10
Q

How do we denote the mean of Y when the predictor value is x?

A

μY|X=x

11
Q

What is the best-fitting line found using the least-squares criterion called?

A

The regression line

also called the least-squares line

12
Q

What is the least squares criterion?

A

The line that best fits a set of data points is the one with the SMALLEST possible sum of squared errors (residuals), where the errors are those made when using the fitted line to predict the y-values

In short, it gives us a rule for choosing the best line for a set of data points
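A minimal Python sketch of the criterion, assuming the used-car data from the cards below (age in years, price in $1000). It fits the line with the closed-form least-squares estimates, then checks that nudging the intercept or slope only increases the sum of squared errors (SSE). The helper names `sse` and `least_squares_line` are illustrative, not from the source.

```python
# Used-car sample from the cards below: age in years, price in $1000.
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]


def sse(b0, b1, xs, ys):
    """Sum of squared errors when the line y-hat = b0 + b1*x predicts ys."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form least-squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = least_squares_line(ages, prices)
best = sse(b0, b1, ages, prices)

# Every perturbed line we try has a larger SSE than the least-squares line.
for db0, db1 in [(0.5, 0), (-0.5, 0), (0, 0.1), (0, -0.1), (0.3, -0.05)]:
    assert sse(b0 + db0, b1 + db1, ages, prices) > best
```

Because SSE is a convex quadratic in b0 and b1 (as long as the x-values are not all equal), any change away from the least-squares values increases it.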

13
Q

Which quantity is the sum of the squared differences between the x-values and their mean?

A

Sxx

14
Q

What does Sxx represent?

A

The sum of the squared differences between the x-values and their mean: Sxx = Σ(x - x̄)²

15
Q

What is the computing formula for Sxx?

A

Sxx = Σx² - (Σx)²/n

16
Q

Which quantity measures the total variability of the yi's about their mean ȳ?

A

Syy

17
Q

What does Syy represent?

A

A measure of the total variability of the yi's about their mean ȳ: Syy = Σ(y - ȳ)²

18
Q

What is the computing formula for Syy?

A

Syy = Σy² - (Σy)²/n

19
Q

What value represents the sum of the product of the differences between x values and the mean of x and the differences between y values and the mean of y?

A

Sxy

20
Q

What does Sxy represent?

A

The sum of the products of the differences between the x-values and the mean of x and the differences between the y-values and the mean of y: Sxy = Σ(x - x̄)(y - ȳ)

21
Q

What is the computing formula for Sxy?

A

Sxy = Σxy - ((Σx)(Σy)/n)
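The computing formulas are algebraic rearrangements of the definitions Sxx = Σ(x - x̄)², Syy = Σ(y - ȳ)² and Sxy = Σ(x - x̄)(y - ȳ). A small Python sketch, with illustrative function names, checks on the used-car data from the cards below that both forms agree. Note that Sxx and Syy are just the Sxy formula applied with the same variable twice.

```python
# Used-car sample: age in years, price in $1000.
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]


def s_definition(us, vs):
    """Definition: sum of (u - u_bar) * (v - v_bar)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))


def s_computing(us, vs):
    """Computing formula: sum(u*v) - (sum u)(sum v)/n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


sxx = s_computing(ages, ages)      # Sxx = Σx² - (Σx)²/n
syy = s_computing(prices, prices)  # Syy = Σy² - (Σy)²/n
sxy = s_computing(ages, prices)    # Sxy = Σxy - (Σx)(Σy)/n

# Both forms give the same value for each of the three quantities.
for us, vs in [(ages, ages), (prices, prices), (ages, prices)]:
    assert abs(s_definition(us, vs) - s_computing(us, vs)) < 1e-9
```

By hand the computing formulas are quicker; in floating point the definitional form is less prone to cancellation when the values are large relative to their spread.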

22
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars. Age is in years and price is in $1000. The data are:

Age (years) Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σx?

A

Σx = 92

23
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars. Age is in years and price is in $1000. The data are:

Age (years) Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σx²?

A

Σx² = 724

24
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars. Age is in years and price is in $1000. The data are:

Age (years) Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σy?

A

Σy = 125

25
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars. Age is in years and price is in $1000. The data are:

Age (years) Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σy²?

A

Σy² = 1193

26
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars. Age is in years and price is in $1000. The data are:

Age (years) Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σxy?

A

Σxy = 616
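The sums asked for in these cards can be checked with a short Python sketch (variable names are illustrative), using the 15 (age, price) pairs from the table.

```python
# (age in years, price in $1000) for the 15 sampled cars
cars = [(1, 14), (1, 13), (3, 13), (4, 10), (4, 10), (5, 9), (5, 9), (6, 7),
        (7, 7), (7, 8), (8, 7), (8, 6), (10, 5), (10, 4), (13, 3)]

n = len(cars)
sum_x = sum(x for x, _ in cars)
sum_x2 = sum(x * x for x, _ in cars)
sum_y = sum(y for _, y in cars)
sum_y2 = sum(y * y for _, y in cars)
sum_xy = sum(x * y for x, y in cars)

print(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy)  # 15 92 724 125 1193 616
```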

27
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars (x = age in years, y = price in $1000). Important summary values are:

n = 15
Σx = 92
Σx² = 724
Σy = 125
Σy² = 1193
Σxy = 616

What is the value of Sxx?

A

Sxx = Σx² - (Σx)²/n

= 724 - (92²/15)

= 159.733

28
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars (x = age in years, y = price in $1000). Important summary values are:

n = 15
Σx = 92
Σx² = 724
Σy = 125
Σy² = 1193
Σxy = 616

What is the value of Syy?

A

Syy = Σy² - (Σy)²/n

= 1193 - (125²/15)

= 151.333

29
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars (x = age in years, y = price in $1000). Important summary values are:

n = 15
Σx = 92
Σx² = 724
Σy = 125
Σy² = 1193
Σxy = 616

What is the value of Sxy?

A

Sxy = Σxy - (Σx)(Σy)/n

= 616 - (92 × 125/15)

= -150.667
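The three computations above can be reproduced from the summary values in a few lines of Python. This sketch uses `fractions.Fraction` so the arithmetic stays exact and rounding happens only when printing; that choice is ours, not the source's.

```python
from fractions import Fraction

# Summary values from the card (x = age in years, y = price in $1000)
n, sum_x, sum_x2, sum_y, sum_y2, sum_xy = 15, 92, 724, 125, 1193, 616

# Computing formulas, evaluated exactly as fractions
sxx = sum_x2 - Fraction(sum_x ** 2, n)
syy = sum_y2 - Fraction(sum_y ** 2, n)
sxy = sum_xy - Fraction(sum_x * sum_y, n)

print(round(float(sxx), 3), round(float(syy), 3), round(float(sxy), 3))
# 159.733 151.333 -150.667
```

Exactly, Sxx = 2396/15, Syy = 2270/15 and Sxy = -2260/15, so the three-decimal values on the cards are rounded.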

30
Q

How do we calculate the value of slope?

A

b1 = Sxy/Sxx

31
Q

How do we calculate the value of the y-intercept?

A

b0 = ȳ - b1x̄, where x̄ and ȳ are the sample means of x and y
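The slope and intercept formulas together give a complete fitting routine. A minimal Python sketch follows; the function name `fit_line` is illustrative, and it assumes at least two distinct x-values, since otherwise Sxx = 0 and the slope is undefined.

```python
def fit_line(xs, ys):
    """Least-squares slope b1 = Sxy/Sxx and intercept b0 = y_bar - b1*x_bar."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x-values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Points lying exactly on y = 2 + 3x are recovered exactly.
b0, b1 = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(b0, b1)  # 2.0 3.0
```

On the used-car data from the surrounding cards, the same function returns a slope of about -0.9432.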

32
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars (x = age in years, y = price in $1000). Important summary values are:

n = 15
Σx = 92
Σx² = 724
Σy = 125
Σy² = 1193
Σxy = 616
Sxx = 159.733
Syy = 151.333
Sxy = -150.667

What is the value for the slope?

A

b1 = Sxy/Sxx

= -150.667/159.733

= -0.9432

33
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars (x = age in years, y = price in $1000). Important summary values are:

n = 15
Σx = 92
Σx² = 724
Σy = 125
Σy² = 1193
Σxy = 616
Sxx = 159.733
Syy = 151.333
Sxy = -150.667
b1 = -0.9432

What is the value for the intercept?

A

b0 = ȳ - b1x̄

= (Σy)/n - b1(Σx)/n

= (Σy - b1Σx)/n

= (125 - (-0.9432)(92))/15

= 14.118
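The same two steps in Python, as a sketch using the rounded values on the card. It also shows that rounding b1 to four decimals before computing b0 (as the card does) only moves b0 in the fourth decimal place.

```python
# Summary values from the card (x = age in years, y = price in $1000)
n, sum_x, sum_y = 15, 92, 125
sxx, sxy = 159.733, -150.667

b1 = sxy / sxx                          # slope, about -0.9432
b0 = (sum_y - b1 * sum_x) / n           # intercept = y_bar - b1 * x_bar

b0_card = (sum_y - (-0.9432) * sum_x) / n   # same step with b1 rounded as on the card
```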

34
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars (x = age in years, y = price in $1000). Important summary values are:

n = 15
Σx = 92
Σx² = 724
Σy = 125
Σy² = 1193
Σxy = 616
Sxx = 159.733
Syy = 151.333
Sxy = -150.667
b1 = -0.9432

What happens to the predicted sale price of a car when it becomes 1 year older?

A

Its predicted price decreases by 0.9432 thousand dollars (about $943), because the slope is b1 = -0.9432
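The slope is the change in the predicted (mean) price for a one-year increase in age. A minimal Python sketch, assuming the fitted line with b0 and b1 rounded as on the cards; the function name `predicted_price` is illustrative.

```python
b0, b1 = 14.118, -0.9432   # fitted intercept and slope from the cards (price in $1000)


def predicted_price(age):
    """Predicted mean sale price, in $1000, for a car of the given age in years."""
    return b0 + b1 * age


change = predicted_price(6) - predicted_price(5)    # one extra year of age
print(f"{change * 1000:.0f} dollars")               # -943 dollars
```

Any pair of ages one year apart gives the same difference, since ŷ(x + 1) - ŷ(x) = b1.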

35
Q

When we use the fitted regression equation to make a prediction, what should we avoid? What does this mean? Why do we avoid it?

A

We avoid extrapolation

This means predicting the value of the response variable when the value of the predictor variable is outside the observed range.

Because the linear relationship has only been observed within that range; outside it the true relationship may be different, so predictions can give really wonky numbers. For example, the fitted line predicts a negative price for a car that is 20 years old. Cheap, yes, but the seller paying you to take it away? Not likely :)
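One way to respect this in code is to refuse predictions outside the observed ages. This Python sketch assumes the fitted line from the cards and takes the observed range to be the sample's minimum and maximum ages (1 and 13 years); the names `predict_price`, `AGE_MIN` and `AGE_MAX` are illustrative.

```python
b0, b1 = 14.118, -0.9432          # fitted line from the cards (price in $1000)
AGE_MIN, AGE_MAX = 1, 13          # observed range of ages in the sample


def predict_price(age):
    """Predict price in $1000, refusing to extrapolate beyond the observed ages."""
    if not AGE_MIN <= age <= AGE_MAX:
        raise ValueError(f"age {age} is outside the observed range "
                         f"[{AGE_MIN}, {AGE_MAX}]; prediction would be extrapolation")
    return b0 + b1 * age


unchecked = b0 + b1 * 20    # what the line alone says for a 20-year-old car
print(round(unchecked, 2))  # -4.75, a negative price
```

Calling `predict_price(20)` raises a `ValueError` instead of returning that negative price.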

36
Q

What is the correlation coefficient and what is it denoted by?

A

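Denoted by r

Measures the strength and direction of the linear relationship between the response variable and the predictor variable. It always lies between -1 and 1.

Near -1: a strong negative linear relationship (exactly -1 is perfect)

Near 1: a strong positive linear relationship (exactly 1 is perfect)

Near 0: no linear relationship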

37
Q

What do the values of the correlation coefficient tell us about the relationship between the response variable and the predictor variable?

A

Near -1: a strong negative linear relationship

Near 1: a strong positive linear relationship

Near 0: no linear relationship

38
Q

How is the correlation coefficient (r) calculated?

A

r = Sxy/√(SxxSyy)

39
Q

We are analyzing the relationship between the age and sale price of a sample of 15 used cars (x = age in years, y = price in $1000). Important summary values are:

n = 15
Σx = 92
Σx² = 724
Σy = 125
Σy² = 1193
Σxy = 616
Sxx = 159.733
Syy = 151.333
Sxy = -150.667
b1 = -0.9432

What is the correlation coefficient? What does this tell us about the relationship between age of the car and the sale price?

A

r = Sxy/√(SxxSyy)

= -150.667 / √(159.733*151.333)

= -0.969

There is a strong negative linear association: as the age of a car increases, its sale price tends to decrease
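The same calculation as a short Python sketch, using the rounded Sxx, Syy and Sxy from the card.

```python
import math

# Values from the card (x = age in years, y = price in $1000)
sxx, syy, sxy = 159.733, 151.333, -150.667

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
print(round(r, 3))               # -0.969
```

The sign of r always matches the sign of the slope b1 = Sxy/Sxx, because both share the numerator Sxy and their denominators are positive.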

40
Q

What would an r=0.01 value tell us?

A

There is essentially no linear relationship

41
Q

What would an r=0.5 value tell us?

A

That there is a positive linear relationship, but it’s not very strong

42
Q

What would an r=1 value tell us?

A

A perfect positive linear relationship: all data points lie exactly on a straight line with positive slope

43
Q

What would an r=-0.5 value tell us?

A

There is a negative linear relationship, but it's not very strong

44
Q

What would an r=-1 value tell us?

A

A perfect negative linear relationship: all data points lie exactly on a straight line with negative slope

45
Q

What is the coefficient of determination and what is it denoted by?

A

Denoted by R²

A measure of how useful the regression equation is for making predictions. It is the percentage of the variation in the observed values of the response variable that is explained by the regression model.

46
Q

How can we calculate the coefficient of determination (R²)?

A

R² = r² (the square of the correlation coefficient)

The value is always between 0 and 1

47
Q

What does the value of the coefficient of determination (Rsquared) tell us?

A

The value is always between 0 and 1. When R² is near 1, the regression model explains most of the variation in the response, so it is useful for making predictions. When R² is near 0, the model explains little of the variation.

48
Q

What is the distribution of the response variable at a given predictor value called?

A

The conditional distribution, because the distribution depends on the value of x. (This is the 3D model the instructor drew, with a distribution superimposed at each x-value.)

49
Q

We have a correlation coefficient of r = -0.9691. What is the coefficient of determination?

A

R² = r²

= (-0.9691)²

= 0.9392

So 93.92% of the variation in the observed y-values can be explained by the linear regression equation

The larger the R² value, the more useful the regression equation is for making predictions
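In simple linear regression, R² can also be computed as 1 - SSE/SST, where SSE is the sum of squared residuals and SST = Syy is the total variation in y; this equals r². A Python sketch on the used-car data from the earlier cards checks that the two agree. With unrounded values both come out near 0.9391; the card's 0.9392 comes from squaring the rounded r = -0.9691.

```python
# Used-car data: age in years, price in $1000
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]

n = len(ages)
x_bar, y_bar = sum(ages) / n, sum(prices) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in prices)            # SST: total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, prices))  # unexplained variation

r_squared_from_sse = 1 - sse / syy      # share of the variation explained by the line
r = sxy / (sxx * syy) ** 0.5
print(round(r_squared_from_sse, 4), round(r ** 2, 4))  # 0.9391 0.9391
```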

50
Q

What are the mean and standard deviation of the conditional distribution called?

A

Conditional mean and conditional standard deviation

51
Q

A single observation of the response variable is very unlikely to equal the conditional mean exactly. How do we account for this in the model?

A

y = β0 + β1x + ε

Where ε is the error term, added to the model to capture the variation of individual observations around the conditional mean

52
Q

What assumptions need to be met for making inferences about a linear regression equation?

A

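Normal populations: each conditional distribution of Y is normal

Population regression line: the conditional mean of Y at X = x is β0 + β1x

Equal standard deviations: all the conditional distributions have the same standard deviation σ

Independent observations: the observations are independent of one another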
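A small simulation can make the model y = β0 + β1x + ε and its assumptions concrete. This Python sketch draws independent, normally distributed errors with the same standard deviation at every x, so each conditional distribution of Y is normal with mean β0 + β1x. The parameter values (β0 = 14, β1 = -0.9, σ = 0.8) and the seed are made up for illustration; they are not estimates from the cards.

```python
import random
import statistics

BETA0, BETA1, SIGMA = 14.0, -0.9, 0.8   # hypothetical population parameters
rng = random.Random(13)                 # fixed seed so the run is reproducible


def draw_y(x):
    """One observation: conditional mean BETA0 + BETA1*x plus an independent N(0, SIGMA) error."""
    return BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA)


# Many independent draws at x = 5 approximate the conditional distribution of Y there.
ys_at_5 = [draw_y(5) for _ in range(20000)]
cond_mean = statistics.fmean(ys_at_5)   # close to BETA0 + BETA1*5 = 9.5
cond_sd = statistics.stdev(ys_at_5)     # close to SIGMA, which is the same at every x
```

Repeating the draws at another x-value would shift the conditional mean along the line while leaving the conditional standard deviation unchanged, which is the equal-standard-deviation assumption.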

53
Q

What is a residual and what is it denoted by?

A

It is denoted by e

It is the difference between an observed y-value and the value predicted by the fitted regression line

It is an estimate of the error term ε

54
Q

What is the equation for the residual?

A

ei = yi - ŷi = yi - (b0 + b1xi)
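A minimal Python sketch computing each residual ei for the used-car data from the earlier cards. It also checks a known property of least-squares fits that include an intercept: the residuals sum to zero, up to floating-point rounding.

```python
# Used-car data: age in years, price in $1000
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]

n = len(ages)
x_bar, y_bar = sum(ages) / n, sum(prices) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))
      / sum((x - x_bar) ** 2 for x in ages))
b0 = y_bar - b1 * x_bar

# e_i = y_i - (b0 + b1*x_i): observed price minus the fitted line's prediction
residuals = [y - (b0 + b1 * x) for x, y in zip(ages, prices)]

# Least-squares residuals (with an intercept) sum to zero, up to rounding.
assert abs(sum(residuals)) < 1e-9
```

Squaring and summing these residuals gives the SSE that the least-squares criterion minimizes.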

55
Q
A