chapter 14: multiple regression and model building Flashcards

1
Q

what are multiple regression models?

A

regression models that employ more than one independent variable

2
Q

why can the multiple regression model be written as:

y = B0 + B1x1 + B2x2 + E

why are there two x's?

A

because the mean level (Uy) is now B0 + B1x1 + B2x2

this means that two different independent variables, x1 and x2, are related to (or “influence”) the dependent variable y

E remains the error term that causes y to deviate from the mean level

3
Q

what is the new name for the mean level:

Uy = B0 + B1x1 + B2x2

A

the plane of means

it lies in three-dimensional space

4
Q

what is B0 in Uy = B0 + B1x1 + B2x2?

A

it is still the y intercept

5
Q

what is B1 in Uy = B0 + B1x1 + B2x2?

A

the regression parameter for the variable x1

the slope of the plane in the x1 direction

6
Q

what is B2 in Uy = B0 + B1x1 + B2x2?

A

the regression parameter for the variable x2

the slope of the plane in the x2 direction

7
Q

what is the error term E in y = B0 + B1x1 + B2x2 + E?

A

E describes the effects on y of all factors other than x1 and x2

8
Q

what is the formula for the point estimate (prediction) of y in

y = B0 + B1x1 + B2x2 + E

and what is that equation called?

A

y^ = b0 + b1x1 + b2x2

called the least squares plane; it is the estimate of the plane of means
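A minimal sketch of how b0, b1, and b2 could be computed, assuming the usual least squares approach of solving the normal equations. The function name `fit_plane` and the data are made up for illustration; in practice a statistics package would do this step.

```python
def fit_plane(x1, x2, y):
    """Return (b0, b1, b2) that minimize the sum of squared residuals."""
    # Build the normal equations (X'X) b = X'y for the columns [1, x1, x2].
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]

    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    M = [A[i] + [v[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]

    # Back-substitution.
    b = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        tail = sum(M[i][j] * b[j] for j in range(i + 1, 3))
        b[i] = (M[i][3] - tail) / M[i][i]
    return tuple(b)


# Made-up data lying exactly on the plane y = 1 + 2*x1 + 3*x2 (no error term).
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_plane(x1, x2, y)
y_hat = b0 + b1 * 2 + b2 * 2  # point prediction at x1 = 2, x2 = 2
```

Because the illustrative data contain no error, the fitted plane recovers the plane of means; with real data b0, b1, and b2 are only estimates of B0, B1, and B2.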

9
Q

why is there no error term when we use

y^ = b0 + b1x1 + b2x2

to predict a point of

y = B0 + B1x1 + B2x2 + E

A

the error term is equally likely to be positive or negative (its mean is 0), so we predict it to be 0

10
Q

what is the residual?

A

the difference between the observed and predicted values of y

11
Q

what is SSE

A

the unexplained variation

the sum of the squared residuals
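As a small illustration, residuals and SSE could be computed like this. The observed and predicted values are made up, and the function name `sse` is only for this sketch.

```python
def sse(observed, predicted):
    """Sum of squared residuals (the unexplained variation)."""
    # residual = observed value minus predicted value
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    return sum(e ** 2 for e in residuals)


# Made-up observed y values and predictions y^ from some fitted model.
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [11.0, 12.0, 14.0, 17.0]

total = sse(observed, predicted)  # residuals are -1, 0, 1, 2
```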

12
Q

what is the multiple coefficient of determination?

A

the proportion of the total variation in the n observed values of the dependent variable that is explained by the overall regression model

R^2

R^2 = explained variation / total variation
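A quick sketch of the ratio, assuming the standard decomposition total variation = explained variation + unexplained variation (SSE). The numbers and the function name `r_squared` are made up for illustration.

```python
def r_squared(explained, unexplained):
    """Multiple coefficient of determination from the two variation components."""
    # Assumption: total variation = explained + unexplained.
    total = explained + unexplained
    return explained / total


# Made-up variation values.
r2 = r_squared(explained=18.0, unexplained=2.0)
```

Here 18 of the 20 units of total variation are explained, so the model accounts for 90% of the variation in y.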

13
Q

what is the multiple correlation coefficient

A

R, the square root of R^2

14
Q

what is the adjusted R^2

A

the adjusted multiple coefficient of determination used to avoid overestimating the importance of independent variables

adjusted R^2 = (R^2 - (k / (n - 1))) * ((n - 1) / (n - (k + 1)))

n is the number of observations

k is the number of independent variables in the model
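The formula on this card can be checked with a short sketch. The input values are made up, and `adjusted_r_squared` is a name chosen only for this example.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))


# Made-up example: R^2 = 0.9 from n = 11 observations and k = 2 variables.
adj = adjusted_r_squared(0.9, n=11, k=2)

# The same quantity in the equivalent form 1 - (1 - R^2)(n - 1)/(n - (k + 1)).
alt = 1 - (1 - 0.9) * (11 - 1) / (11 - (2 + 1))
```

In this example the adjusted value (0.875) is smaller than R^2 (0.9), which reflects the penalty for the number of independent variables.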

15
Q

what are the four assumptions of the error term values in the multiple regression model?

A
  1. at any given combination of x1, x2, …, xk, the population of potential error terms has a mean value of 0
  2. constant variance assumption
  3. normality assumption
  4. independence assumption
16
Q

what is the error term constant variance assumption?

A

population of error term values has a variance that does not depend on the combination of values of x1, x2, …, xk

the different population of potential error terms corresponding to different combinations of values x1, x2, …, xk have equal variances

the constant variance is the population variance

17
Q

what is the error term normality assumption?

A

at any given combination of x1, x2, …, xk, the population of potential error terms has a normal distribution

18
Q

what is the error term independence assumption?

A

any one value of the error term E is statistically independent of any other value of E

an error term of a certain y has nothing to do with an error term of another y

19
Q

what is the point estimate of the constant variance of the different populations of error terms?

give the formula too

A

the mean square error

s^2

s^2 = SSE / (n - (k + 1))

20
Q

what is the point estimate of the standard deviation of the different populations of error terms?

give the formula too

A

the standard error

s

s = (SSE / (n - (k + 1)))^(1/2)
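A short sketch of both point estimates, using a made-up SSE. The function names are chosen only for this example.

```python
import math


def mean_square_error(sse, n, k):
    """s^2: point estimate of the constant error variance."""
    return sse / (n - (k + 1))


def standard_error(sse, n, k):
    """s: point estimate of the error standard deviation."""
    return math.sqrt(mean_square_error(sse, n, k))


# Made-up example: SSE = 18 with n = 12 observations and k = 2 variables,
# so the denominator n - (k + 1) equals 9.
s2 = mean_square_error(18.0, n=12, k=2)
s = standard_error(18.0, n=12, k=2)
```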

21
Q

in the mean square error and the standard error (the point estimate of the constant variance and the standard deviation of the different populations of error terms),

what is the meaning of the following

n - (k + 1)

A

degrees of freedom associated with SSE

22
Q

is testing the significance of the relationship between y and x1, x2, …, xk a proper way of assessing the utility of the regression model?

A

yes

23
Q

how do you test the significance of the relationship between y and x1, x2, …, xk?

A

with the F test

24
Q

what is the null hypothesis (H0) for testing the significance of the relationship between y and x1, x2, …, xk?

A

H0: B1 = B2 = … = Bk = 0

none of the independent variables x1, x2, … xk are significantly related to y

the regression relationship is not significant

25
Q

what is the alternative hypothesis (Ha) for testing the significance of the relationship between y and x1, x2, …, xk?

A

Ha: at least one of B1, B2 … Bk =/= 0

at least one of the independent variables x1, x2, … xk is significantly related to y

the regression relationship is significant

26
Q

how do you calculate the F statistic?

A

F = ((explained variation) / k) / ((unexplained variation) / (n - (k + 1)))
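A minimal sketch of the calculation with made-up numbers. The function name `f_statistic` is only for this example, and comparing F against a critical value or computing a p-value is left to an F table or software.

```python
def f_statistic(explained, unexplained, n, k):
    """Overall F statistic for a model with k independent variables."""
    numerator = explained / k
    denominator = unexplained / (n - (k + 1))
    return numerator / denominator


# Made-up example: explained variation 90, unexplained variation 10,
# n = 13 observations, k = 2 independent variables.
f_value = f_statistic(90.0, 10.0, n=13, k=2)
```

A large F value like this one is evidence against H0: B1 = B2 = … = Bk = 0.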

27
Q

how do the R^2 and adjusted R^2 differ?

A

R̅^2 differs from R^2 by taking into consideration the number of independent variables in the model

Using R̅^2 helps avoid overestimating the importance of the independent variables

28
Q

why would we test the significance of a single independent variable?

A

to gain further information about which independent variables significantly affect y

29
Q

when you test the significance of a single independent variable, how do you refer to it?

what else must you assume?

A

xj

you assume that xj is multiplied by the parameter Bj in the regression model

30
Q

what is the null hypothesis when you test xj?

A

Bj = 0

here, we say that xj is not significantly related to y

31
Q

what is the alternate hypothesis when you test xj?

A

Bj =/= 0

here, we say that xj is significantly related to y in the regression model under consideration

32
Q

what is sbj?

A

the standard error of the estimate bj

the point estimate of the population standard deviation of bj

33
Q

what test do you use to test the significance of xj?

A

the t test

34
Q

what is the formula of the t statistic used to test the significance of xj?

A

t = bj / sbj

35
Q

using the t test to test the significance of xj, when do we reject H0 in favor of Ha?

A

|t| > t alpha/2, based on n - (k + 1) degrees of freedom

equivalently, p-value < alpha (the significance level)
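A small sketch of the two-sided rejection rule. The estimates are made up, the function names are only for this example, and the critical value 2.262 is the tabled t value for alpha/2 = 0.025 with 9 degrees of freedom (for instance n = 12 and k = 2); it is passed in by hand rather than computed.

```python
def t_statistic(bj, sbj):
    """t statistic for testing H0: Bj = 0."""
    return bj / sbj


def reject_h0(t, t_critical):
    """Two-sided rule: reject H0 when |t| exceeds the critical value."""
    return abs(t) > t_critical


# Critical value taken from a t table (alpha = 0.05, 9 degrees of freedom).
T_CRIT = 2.262

# Made-up estimates for two independent variables.
t_a = t_statistic(bj=2.5, sbj=0.5)   # large relative to its standard error
t_b = t_statistic(bj=-0.4, sbj=0.5)  # small relative to its standard error

significant_a = reject_h0(t_a, T_CRIT)
significant_b = reject_h0(t_b, T_CRIT)
```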