Chpt 13 - Simple Linear Regression Flashcards

1
Q

What is the statistical method to model the linear relationship between 2 numerical variables?

A

Simple linear regression

2
Q

How do we know if two variables are linearly related?

A

If the mean of the response variable Y depends linearly on the value of the predictor variable X

3
Q

What is the linear equation that determines a fitted line?

A

Also called the regression equation

Y = β₀ + β₁x

Where:

β₀ is the intercept
β₁ is the slope

4
Q

What is a graphical method to determine the relationship between X (predictor) and Y (response)?

A

Scatter plots

5
Q

When looking at a scatter plot, if the means of Y at different values of X are close to a straight line (although not necessarily right on the line), what can be said about the relationship?

A

X and Y are linearly related

6
Q

If the mean of Y decreases as the value of x increases, what type of linear relationship is this?

A

Negative

7
Q

If the mean of Y increases as the value of X increases, what type of linear relationship is this?

A

Positive

8
Q

What variable is X when looking at simple linear regression?

A

The predictor variable

9
Q

What variable is Y when looking at simple linear regression?

A

The response variable

10
Q

How do we denote the mean of Y when the predictor value is x?

A

μY|X=x

11
Q

What is the best fitted line that is found based on the least-squares criterion?

A

Regression line

or least-squares line

12
Q

What is the least squares criterion?

A

The line that best fits a set of data points is the one having the SMALLEST possible sum of the squared errors (residuals) which are made in using the fitted line to predict the y values

Basically, it helps us to determine the best line for the set of data points
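
To see the criterion in action, here is a minimal sketch in Python. The five data points are invented for illustration, and the helper name `sse` is hypothetical. It fits the line with the closed-form least-squares estimates (slope = Sxy/Sxx, intercept = ȳ - slope·x̄) and then checks that nudging the intercept or slope only makes the sum of squared errors larger.

```python
# Toy data invented for illustration (not from the flashcards).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sse(b0, b1):
    """Sum of squared errors when predicting each y with b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Closed-form least-squares estimates.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

best = sse(b0, b1)

# Nudge the intercept and slope; each of these other lines should do worse.
candidates = [(b0 + d0, b1 + d1)
              for d0 in (-0.5, 0.0, 0.5)
              for d1 in (-0.2, 0.0, 0.2)
              if (d0, d1) != (0.0, 0.0)]
worse = [sse(c0, c1) for c0, c1 in candidates]

print(f"least-squares line: y = {b0:.2f} + {b1:.2f}x, SSE = {best:.2f}")
print(f"smallest SSE among the nudged lines: {min(worse):.2f}")
```

Trying a handful of nudged lines is only a spot check, not a proof, but it shows what "smallest possible sum of squared errors" means in practice.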

13
Q

Which value represents the sum of the squared differences between each x value and the mean of x?

A

Sxx

14
Q

What does Sxx represent?

A

The sum of the squared differences between each x value and the mean of x

15
Q

What is the computing formula for Sxx?

A

Sxx = Σx² - (Σx)²/n

16
Q

Which value represents the measure of the total variability of the yᵢ values around their mean ȳ?

A

Syy

17
Q

What does Syy represent?

A

The measure of the total variability of the yᵢ values around their mean ȳ

18
Q

What is the computing formula for Syy?

A

Syy = Σy² - (Σy)²/n

19
Q

What value represents the sum of the product of the differences between x values and the mean of x and the differences between y values and the mean of y?

A

Sxy

20
Q

What does Sxy represent?

A

The sum of the product of the differences between x values and the mean of x and the differences between y values and the mean of y

21
Q

What is the computing formula for Sxy?

A

Sxy = Σxy - ((Σx)(Σy)/n)
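
The computing formulas for Sxx, Syy and Sxy are algebraic shortcuts for the "sum of deviations from the mean" definitions. A small Python sketch, using toy numbers invented for illustration and two hypothetical helper functions, suggests that both forms give the same values:

```python
# Toy data invented for illustration (not from the flashcards).
xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [10.0, 9.0, 7.0, 4.0, 3.0]


def s_definition(us, vs):
    """Sum of (u - u_bar)(v - v_bar): the defining form of Sxx, Syy, Sxy."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))


def s_computing(us, vs):
    """Sum of uv minus (sum of u)(sum of v)/n: the computing formula."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


# Sxx and Syy are the special cases where both arguments are the same list.
for name, us, vs in [("Sxx", xs, xs), ("Syy", ys, ys), ("Sxy", xs, ys)]:
    print(name, round(s_definition(us, vs), 6), round(s_computing(us, vs), 6))
```

The computing form only needs running totals (Σx, Σy, Σx², Σy², Σxy), which is why the worked cards tabulate those sums first.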

22
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. The data (age in years, price in $1000) are:

Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σx?

A

Σx = 92

23
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. The data (age in years, price in $1000) are:

Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σx²?

A

Σx² = 724

24
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. The data (age in years, price in $1000) are:

Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σy?

A

Σy = 125

25
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. The data (age in years, price in $1000) are:

Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σy²?

A

Σy² = 1193

26
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. The data (age in years, price in $1000) are:

Age Price ($1000)
1 14
1 13
3 13
4 10
4 10
5 9
5 9
6 7
7 7
7 8
8 7
8 6
10 5
10 4
13 3

What is the value of Σxy?

A

Σxy = 616

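
The five sums asked for in the cards above can be checked with a few lines of Python. This is only a sketch; the variable names are my own, and the data are copied from the table of 15 cars.

```python
# Used-car data from the cards: age in years (x) and price in $1000 (y).
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]

n = len(ages)
sum_x = sum(ages)                                   # Σx
sum_x2 = sum(x * x for x in ages)                   # Σx²
sum_y = sum(prices)                                 # Σy
sum_y2 = sum(y * y for y in prices)                 # Σy²
sum_xy = sum(x * y for x, y in zip(ages, prices))   # Σxy

print(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy)
# prints: 15 92 724 125 1193 616
```
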
27
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. Summary values (age x in years, price y in $1000) are: n = 15, Σx = 92, Σx² = 724, Σy = 125, Σy² = 1193, Σxy = 616

What is the value of Sxx?

A

Sxx = Σx² - (Σx)²/n = 724 - 92²/15 = 159.733

28
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. Summary values (age x in years, price y in $1000) are: n = 15, Σx = 92, Σx² = 724, Σy = 125, Σy² = 1193, Σxy = 616

What is the value of Syy?

A

Syy = Σy² - (Σy)²/n = 1193 - 125²/15 = 151.333

29
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. Summary values (age x in years, price y in $1000) are: n = 15, Σx = 92, Σx² = 724, Σy = 125, Σy² = 1193, Σxy = 616

What is the value of Sxy?

A

Sxy = Σxy - (Σx)(Σy)/n = 616 - (92 × 125)/15 = -150.667

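
As a quick check of the three computing formulas, here is a short Python sketch that starts from the summary sums on the cards (the variable names are my own):

```python
# Summary sums for the used-car data (from the cards above).
n, sum_x, sum_x2, sum_y, sum_y2, sum_xy = 15, 92, 724, 125, 1193, 616

sxx = sum_x2 - sum_x ** 2 / n       # Sxx = Σx² - (Σx)²/n
syy = sum_y2 - sum_y ** 2 / n       # Syy = Σy² - (Σy)²/n
sxy = sum_xy - sum_x * sum_y / n    # Sxy = Σxy - (Σx)(Σy)/n

print(f"Sxx = {sxx:.3f}, Syy = {syy:.3f}, Sxy = {sxy:.3f}")
# prints: Sxx = 159.733, Syy = 151.333, Sxy = -150.667
```
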
30
Q

How do we calculate the value of the slope?

A

b₁ = Sxy/Sxx

31
Q

How do we calculate the value of the y-intercept?

A

b₀ = ȳ - b₁x̄

32
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. Summary values (age x in years, price y in $1000) are: n = 15, Σx = 92, Σx² = 724, Σy = 125, Σy² = 1193, Σxy = 616, Sxx = 159.733, Syy = 151.333, Sxy = -150.667

What is the value of the slope?

A

b₁ = Sxy/Sxx = -150.667/159.733 = -0.9432

33
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. Summary values (age x in years, price y in $1000) are: n = 15, Σx = 92, Σx² = 724, Σy = 125, Σy² = 1193, Σxy = 616, Sxx = 159.733, Syy = 151.333, Sxy = -150.667, b₁ = -0.9432

What is the value of the intercept?

A

b₀ = ȳ - b₁x̄ = (Σy)/n - b₁(Σx)/n = (Σy - b₁Σx)/n = (125 - (-0.9432)(92))/15 = 14.118

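
The slope and intercept can be reproduced from the summary sums with the sketch below (variable names are my own). It carries full precision instead of the rounded slope, so the intercept comes out as about 14.1185, which matches the card's 14.118 up to rounding in the last digit.

```python
# Summary sums for the used-car data (from the cards above).
n, sum_x, sum_x2, sum_y, sum_xy = 15, 92, 724, 125, 616

sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

b1 = sxy / sxx                       # slope: Sxy / Sxx
b0 = sum_y / n - b1 * sum_x / n      # intercept: y_bar - b1 * x_bar

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
# prints: b1 = -0.9432, b0 = 14.1185
```
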
34
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. Summary values (age x in years, price y in $1000) are: n = 15, Σx = 92, Σx² = 724, Σy = 125, Σy² = 1193, Σxy = 616, Sxx = 159.733, Syy = 151.333, Sxy = -150.667, b₁ = -0.9432

What will happen to the price of a car when it becomes 1 year older?

A

Its price is expected to decrease by 0.9432 thousand dollars (about $943), on average

35
Q

When we use the fitted regression equation to make a prediction, what should we avoid? What does this mean? Why do we avoid it?

A

We avoid extrapolation.

This means predicting the value of the response variable when the value of the predictor variable is outside the observed range.

The fitted line is only supported by data inside the observed range, and the linear pattern may not hold beyond it, so predictions out there can be really wonky. For example, a car doesn't become worth a negative amount of money just because it is now 20 years old. Cheap, yes; someone paying you to take it away, not likely :)

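
To make the extrapolation problem concrete, here is a sketch using the fitted values from the cards (b₀ = 14.118, b₁ = -0.9432). The helper names `fitted_line` and `predict_price` are hypothetical, and raising an error outside the observed ages is just one possible way to guard against extrapolation.

```python
# Observed ages (years) from the used-car data; they span 1 to 13.
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]

B0, B1 = 14.118, -0.9432  # fitted intercept and slope from the cards


def fitted_line(age):
    """Value of the fitted line in $1000, with no range check."""
    return B0 + B1 * age


def predict_price(age):
    """Predicted price in $1000; refuses ages outside the observed range."""
    lo, hi = min(ages), max(ages)
    if not lo <= age <= hi:
        raise ValueError(f"age {age} is outside the observed range [{lo}, {hi}]")
    return fitted_line(age)


print(f"{predict_price(5):.3f}")   # inside the range
# prints: 9.402
print(f"{fitted_line(20):.3f}")    # extrapolated: a negative price
# prints: -4.746
```
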
36
Q

What is the correlation coefficient and what is it denoted by?

A

Denoted by r

Measures the strength and direction of the linear relationship between the response variable and the predictor variable

Values near -1 indicate a strong negative linear relationship
Values near 1 indicate a strong positive linear relationship
Values near 0 indicate little or no linear relationship

37
Q

What do the values of the correlation coefficient tell us about the relationship between the response variable and the predictor variable?

A

Values near -1 indicate a strong negative linear relationship
Values near 1 indicate a strong positive linear relationship
Values near 0 indicate little or no linear relationship

38
Q

How is the correlation coefficient (r) calculated?

A

r = Sxy/√(Sxx·Syy)

39
Q

We are analyzing the relationship between the ages of used cars and their sale prices. A sample of 15 cars is selected. Summary values (age x in years, price y in $1000) are: n = 15, Σx = 92, Σx² = 724, Σy = 125, Σy² = 1193, Σxy = 616, Sxx = 159.733, Syy = 151.333, Sxy = -150.667, b₁ = -0.9432

What is the correlation coefficient? What does this tell us about the relationship between the age of the car and the sale price?

A

r = Sxy/√(Sxx·Syy) = -150.667/√(159.733 × 151.333) = -0.969

There is a strong negative linear association: as the age increases, the sale price tends to decrease

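
The same calculation as a short Python sketch, starting from the summary sums (variable names are my own):

```python
import math

# Summary sums for the used-car data (from the cards above).
n, sum_x, sum_x2, sum_y, sum_y2, sum_xy = 15, 92, 724, 125, 1193, 616

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

# Correlation coefficient: r = Sxy / sqrt(Sxx * Syy)
r = sxy / math.sqrt(sxx * syy)

print(f"r = {r:.4f}")
# prints: r = -0.9691
```
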
40
Q

What would an r = 0.01 value tell us?

A

There is essentially no linear relationship

41
Q

What would an r = 0.5 value tell us?

A

There is a positive linear relationship, but it is not very strong

42
Q

What would an r = 1 value tell us?

A

A perfect positive linear relationship (all the points lie exactly on a line with positive slope)

43
Q

What would an r = -0.5 value tell us?

A

There is a negative linear relationship, but it is not very strong

44
Q

What would an r = -1 value tell us?

A

A perfect negative linear relationship (all the points lie exactly on a line with negative slope)

45
Q

What is the coefficient of determination and what is it denoted by?

A

Denoted by R²

A measure used to evaluate the utility of a regression equation for making predictions.

It measures the percentage of variation in the observed values of the response variable that can be explained by the regression model

46
Q

How can we calculate the coefficient of determination (R²)?

A

R² = r²

The value is always between 0 and 1

47
Q

What does the value of the coefficient of determination (R²) tell us?

A

The value is always between 0 and 1.

When the R² value is near 1, it indicates that the regression model is useful for making predictions

48
Q

What is the distribution of the response variable at a given predictor value called?

A

The conditional distribution, because the distribution depends on the x value (this is the 3D diagram with the distributions superimposed at each x value)

49
Q

We have a correlation coefficient value of r = -0.9691. What is the coefficient of determination?

A

R² = r² = (-0.9691)² = 0.9392

So 93.92% of the variation in the observed y-values can be explained by the linear regression equation.

The larger the R² value is, the more useful the regression equation is for making predictions

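
A sketch connecting R² = r² to the "percentage of variation explained" reading, using the used-car data from the earlier cards. The identity R² = 1 - SSE/Syy is not stated on the cards; it is the standard equivalent form for simple linear regression, and the variable names are my own. Working from the unrounded sums gives about 0.9391, slightly different in the fourth decimal from the card's 0.9392, which squares the already rounded r.

```python
# Used-car data: age in years (x) and price in $1000 (y).
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(prices) / n

sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in prices)   # total variability in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Variation left unexplained by the fitted line (sum of squared residuals).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, prices))

r_squared_from_r = sxy ** 2 / (sxx * syy)   # r squared
r_squared_from_sse = 1 - sse / syy          # share of variation explained

print(f"{r_squared_from_r:.4f} {r_squared_from_sse:.4f}")
# prints: 0.9391 0.9391
```
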
50
Q

What are the mean and standard deviation of the conditional distribution called?

A

Conditional mean and conditional standard deviation

51
Q

A single observation of the response variable is very unlikely to equal the conditional mean exactly. How do we account for this in the regression model?

A

y = β₀ + β₁x + ε

Where ε is the error term added to the model to capture the variation of individual observations around the conditional mean

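
One way to picture the error term is to simulate it. The sketch below is only illustrative: the parameter values (β₀ = 14, β₁ = -1, σ = 1) and the helper `simulate_y` are hypothetical, and it assumes normally distributed errors. Individual simulated observations at x = 5 scatter around the conditional mean β₀ + β₁x, while their average lands close to it.

```python
import random
import statistics

# Hypothetical parameter values chosen only for illustration.
BETA0, BETA1, SIGMA = 14.0, -1.0, 1.0

rng = random.Random(0)  # fixed seed so the run is repeatable


def simulate_y(x):
    """One observation: conditional mean beta0 + beta1 * x plus a normal error."""
    epsilon = rng.gauss(0.0, SIGMA)
    return BETA0 + BETA1 * x + epsilon


x = 5
draws = [simulate_y(x) for _ in range(10_000)]

print("conditional mean:", BETA0 + BETA1 * x)
print("first three observations:", [round(y, 2) for y in draws[:3]])
print("sample mean of simulated y:", round(statistics.mean(draws), 3))
print("sample sd of simulated y:", round(statistics.stdev(draws), 3))
```
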
52
Q

What assumptions need to be met for making inferences about a linear regression equation?

A

Normal populations: each conditional distribution should be normal
Linearity: the conditional mean of Y at X = x is β₀ + β₁x
Equal standard deviations: all the conditional standard deviations are equal
Independent observations

53
Q

What is a residual and what is it denoted by?

A

It is denoted by e

It is the difference between an observed value and the value predicted by the fitted line

It is an estimate of the error ε

54
Q

What is the equation for the residual?

A

eᵢ = yᵢ - (b₀ + b₁xᵢ)

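
A sketch of the residual calculation for the used-car data from the earlier cards (variable names are my own). For a least-squares line with an intercept, the residuals sum to zero up to floating-point error, which makes a handy sanity check.

```python
# Used-car data: age in years (x) and price in $1000 (y).
ages = [1, 1, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 10, 10, 13]
prices = [14, 13, 13, 10, 10, 9, 9, 7, 7, 8, 7, 6, 5, 4, 3]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(prices) / n

sxx = sum((x - x_bar) ** 2 for x in ages)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual for each car: observed price minus the price on the fitted line.
residuals = [y - (b0 + b1 * x) for x, y in zip(ages, prices)]

for x, y, e in list(zip(ages, prices, residuals))[:3]:
    print(f"age {x}, price {y}, residual {e:.3f}")

print("sum of residuals is ~0:", abs(sum(residuals)) < 1e-9)
```
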
55