READING 10 SIMPLE LINEAR REGRESSION Flashcards

(68 cards)

1
Q

What is the primary purpose of simple linear regression?

(A) To determine if two variables are related, without quantifying the relationship.
(B) To explain the variation in a dependent variable using the variation in a single independent variable.
(C) To predict future values of an independent variable based on a dependent variable.
(D) To analyze the correlation between multiple independent variables.

A

(B) To explain the variation in a dependent variable using the variation in a single independent variable.

Simple linear regression aims to model how changes in one variable (independent) are associated with changes in another (dependent), thus explaining the dependent variable’s variation.

2
Q

In a simple linear regression model, the variable whose variation is being explained is called the:

(A) Independent variable.
(B) Explanatory variable.
(C) Dependent variable.
(D) Predictor variable.

A

(C) Dependent variable.

The dependent variable is the outcome or response variable that we are trying to understand or predict. Its variation is what we are modeling.

3
Q

In a simple linear regression model, the variable used to explain the variation in the dependent variable is called the:

(A) Response variable.
(B) Dependent variable.
(C) Endogenous variable.
(D) Independent variable.

A

(D) Independent variable.

The independent variable is the variable that is believed to influence or explain the changes in the dependent variable. It’s the “cause” in our simple linear model.

4
Q

“Variation” in the context of linear regression refers to:

(A) The difference between the highest and lowest values of a variable.
(B) The standard deviation of a variable.
(C) The degree to which a variable differs from its mean value.
(D) The correlation between two variables.

A

(C) The degree to which a variable differs from its mean value.

Variation, often quantified by the sum of squared deviations from the mean, describes the spread or dispersion of the data points around the average.

5
Q

Suppose you are trying to predict a company’s stock price using its earnings per share (EPS). In this scenario, the dependent variable is:

(A) Earnings per share (EPS).
(B) The relationship between stock price and EPS.
(C) The prediction error.
(D) The company’s stock price.

A

(D) The company’s stock price.

We are trying to predict the stock price, so it is the variable being explained (dependent). EPS is the factor we are using for the prediction (independent).

6
Q

Another term often used to describe the independent variable in a regression analysis is:

(A) Residual.
(B) Intercept.
(C) Explanatory variable.
(D) Coefficient of determination.

A

(C) Explanatory variable.

The independent variable is used to explain the changes in the dependent variable, hence the term “explanatory variable.”

7
Q

Another term often used to describe the dependent variable in a regression analysis is:

(A) Slope.
(B) Regressor.
(C) Predicted variable.
(D) Error term.

A

(C) Predicted variable.

The dependent variable is the one we are trying to predict using the regression model.

8
Q

Understanding the difference between the dependent and independent variable is crucial because:

(A) It determines the scale of the regression coefficients.
(B) It dictates the direction of the hypothesized causal relationship being modeled.
(C) It affects the calculation of the correlation coefficient.
(D) It is only important for multiple regression, not simple linear regression.

A

(B) It dictates the direction of the hypothesized causal relationship being modeled.

The choice of which variable is dependent and which is independent reflects the assumed direction of influence. We are modeling how X affects Y, not the other way around.

9
Q

In the simple linear regression model Yi = b0 + b1(Xi) + εi, the term εi represents the:

(A) Predicted value of the dependent variable.
(B) Slope of the regression line.
(C) Residual or error term for the i-th observation.
(D) Intercept of the regression line.

A

(C) Residual or error term for the i-th observation.

10
Q

The primary goal of the Ordinary Least Squares (OLS) method in linear regression is to:

(A) Maximize the correlation between the independent and dependent variables.
(B) Minimize the sum of the absolute errors.
(C) Minimize the sum of the squared errors.
(D) Ensure that the residuals are normally distributed.

A

(C) Minimize the sum of the squared errors.

OLS estimates the regression coefficients by finding the line that minimizes the sum of the squared differences between the actual and predicted values of the dependent variable (the sum of squared errors, SSE).
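
A minimal sketch of this idea, assuming Python with NumPy; the arrays X and Y are made-up sample data and the helper sse() is hypothetical. It shows that the closed-form OLS coefficients give a smaller SSE than nearby candidate lines:

    import numpy as np

    # Hypothetical sample data
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    # OLS closed-form estimates: b1 = Cov(X, Y) / Var(X), b0 = mean(Y) - b1 * mean(X)
    b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
    b0 = Y.mean() - b1 * X.mean()

    def sse(intercept, slope):
        """Sum of squared errors for a candidate line Y_hat = intercept + slope * X."""
        return np.sum((Y - (intercept + slope * X)) ** 2)

    # The OLS line should have a lower SSE than slightly perturbed lines
    print(sse(b0, b1))          # smallest
    print(sse(b0 + 0.5, b1))    # larger
    print(sse(b0, b1 + 0.1))    # larger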

11
Q

The intercept (b0) in a simple linear regression represents the:

(A) Change in the dependent variable for a one-unit change in the independent variable.
(B) Predicted value of the dependent variable when the independent variable is zero.
(C) Average value of the dependent variable.
(D) Standard deviation of the dependent variable.

A

(B) Predicted value of the dependent variable when the independent variable is zero.

The intercept (b0) is the estimated value of the dependent variable (Y) when the independent variable (X) is equal to zero.

12
Q

The slope coefficient (b1) in a simple linear regression represents the:

(A) Predicted value of the dependent variable when the independent variable is one.
(B) Baseline value of the dependent variable when the independent variable is zero.
(C) Change in the dependent variable for a one-unit change in the independent variable.
(D) Average change in the independent variable.

A

(C) Change in the dependent variable for a one-unit change in the independent variable.

The slope coefficient (b1) quantifies the change in the dependent variable (Y) associated with a one-unit increase in the independent variable (X).

13
Q

The sum of the residuals (∑(Yi − Y^i)) in an OLS regression is typically:

(A) Minimized and always positive.
(B) Maximized.
(C) Equal to zero.
(D) Equal to the sum of squared errors.

A

(C) Equal to zero.

One of the properties of OLS is that the sum of the residuals (the differences between the actual and predicted values) is equal to zero.
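
A quick numerical check of this property, as a sketch assuming NumPy; any data fit with an intercept would do, and the X and Y arrays here are hypothetical:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    # OLS fit (np.polyfit returns [slope, intercept] for degree 1)
    b1, b0 = np.polyfit(X, Y, 1)
    residuals = Y - (b0 + b1 * X)

    print(residuals.sum())  # ~0, up to floating-point error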

14
Q

If the slope coefficient in a simple linear regression is positive, it indicates that:

(A) An increase in the independent variable is associated with a decrease in the dependent variable.
(B) There is no linear relationship between the two variables.
(C) An increase in the independent variable is associated with an increase in the dependent variable.
(D) The intercept of the regression line is also positive.

A

(C) An increase in the independent variable is associated with an increase in the dependent variable.

A positive slope coefficient signifies a direct, positive linear relationship: as the independent variable increases, the dependent variable tends to increase as well.

15
Q

The formula for the estimated slope coefficient is directly related to the:

(A) Variance of the dependent variable.
(B) Covariance between the independent and dependent variables and the variance of the independent variable.
(C) Correlation coefficient squared.
(D) Sum of squared errors.

A

(B) Covariance between the independent and dependent variables and the variance of the independent variable.
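
In formula form, b^1 = Cov(X, Y) / Var(X) and b^0 = mean(Y) − b^1 × mean(X). A minimal sketch (NumPy, hypothetical data) showing that the covariance-over-variance calculation agrees with a generic least-squares fit:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    # Slope from the definition: sample covariance over sample variance
    b1_manual = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)

    # Same slope from a generic least-squares fit
    b1_polyfit, _ = np.polyfit(X, Y, 1)

    print(b1_manual, b1_polyfit)  # the two values match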

16
Q

The intercept term in a regression model should be interpreted with caution when:

(A) The slope coefficient is statistically significant.
(B) The independent variable never takes on a value close to zero within the observed data range.
(C) The correlation between the variables is high.
(D) The sample size is large.

A

(B) The independent variable never takes on a value close to zero within the observed data range.

If the value of zero for the independent variable is far outside the range of the observed data, the intercept may not have a meaningful real-world interpretation. The linear relationship might not hold true at such extreme values.

17
Q

The difference between the observed value of the dependent variable (Yi) and the predicted value (Y^i) is known as the:

(A) Explained variation.
(B) Total variation.
(C) Residual or error.
(D) Regression coefficient.

A

(C) Residual or error.

18
Q

In a simple linear regression, if there is a perfect positive linear relationship between the independent and dependent variables, the sum of squared errors (SSE) will be:

(A) Positive and large.
(B) Positive and small.
(C) Equal to zero.
(D) Equal to the total variation.

A

(C) Equal to zero.

A perfect linear relationship means all data points lie exactly on the regression line. In this ideal scenario, there are no deviations between the actual and predicted values, resulting in a sum of squared errors (SSE) of zero.

19
Q

In a simple linear regression where a stock’s return is regressed against the market return, a slope coefficient of 1.2 indicates that for every 1% increase in the market return, the stock’s return is expected to:

(A) Decrease by 1.2%.
(B) Increase by 1.2%.
(C) Remain unchanged.
(D) Increase by 0.2%.

A

(B) Increase by 1.2%.

The slope coefficient represents the change in the dependent variable (stock’s return) for a one-unit change in the independent variable (market return). A slope of 1.2 means a 1% increase in the market return is associated with a 1.2% expected increase in the stock’s return.

20
Q

In a regression of bond yield on the policy interest rate, a slope coefficient of 0.8 implies that a 100 basis point increase in the policy interest rate is expected to lead to a change in the bond yield of:

(A) An increase of 0.8 basis points.
(B) An increase of 80 basis points.
(C) A decrease of 0.8 basis points.
(D) A decrease of 80 basis points.

A

(B) An increase of 80 basis points.

The slope of 0.8 means for every one-unit (1 basis point in this case) increase in the policy interest rate, the bond yield is expected to increase by 0.8 units (0.8 basis points). Therefore, a 100 basis point increase would lead to an expected increase of 0.8 * 100 = 80 basis points.

21
Q

When interpreting a regression intercept, it is most important to consider:

(A) The magnitude of the slope coefficient.
(B) Whether the value of zero for the independent variable is within the relevant data range.
(C) The correlation coefficient between the variables.
(D) The statistical significance of the slope coefficient.

A

(B) Whether the value of zero for the independent variable is within the relevant data range.

If the independent variable rarely or never takes on a value near zero in the observed data, the intercept may not have a practical or meaningful interpretation within the context of the model.

22
Q

An intercept of -0.5% in a regression of a company’s sales on advertising expenditure suggests that if advertising expenditure is zero, the company’s sales are expected to be:

(A) 0.5% higher than average.
(B) Zero.
(C) 0.5% lower than average.
(D) -0.5%.

A

(D) -0.5%.

The intercept is the predicted value of the dependent variable (sales) when the independent variable (advertising expenditure) is zero. Therefore, expected sales would be -0.5%. Note that in a real-world scenario, negative sales might not be economically meaningful, highlighting the caution needed in interpreting intercepts.

23
Q

A slope coefficient of -0.3 in a regression of product demand on price indicates that a $1 increase in price is expected to lead to a:

(A) Decrease in demand of 0.3 units.
(B) Increase in demand of 0.3 units.
(C) No change in demand.
(D) Decrease in demand of 3 units.

A

(A) Decrease in demand of 0.3 units.

A negative slope indicates an inverse relationship. A slope of -0.3 means that for every $1 increase in price, the demand for the product is expected to decrease by 0.3 units.

24
Q

The magnitude of the slope coefficient in a simple linear regression directly indicates the:

(A) Strength of the linear relationship.
(B) Statistical significance of the relationship.
(C) Sensitivity of the dependent variable to a one-unit change in the independent variable.
(D) Proportion of the total variation in the dependent variable explained by the model.

A

(C) Sensitivity of the dependent variable to a one-unit change in the independent variable.

The slope coefficient’s magnitude quantifies how much the dependent variable is expected to change for each unit change in the independent variable. It reflects the sensitivity of Y to X.

25
Q

In the context of regressing a stock’s excess return on the market’s excess return, the slope coefficient is often referred to as:

(A) Alpha.
(B) Standard deviation.
(C) Beta.
(D) R-squared.

A

(C) Beta.

In the Capital Asset Pricing Model (CAPM) framework, the slope coefficient from regressing a stock’s excess return on the market’s excess return is the stock’s beta, a measure of its systematic risk.

26
Q

A regression analysis yields an intercept of 10 and a slope of 2. If the independent variable has a value of 5, the predicted value of the dependent variable is:

(A) 2.
(B) 10.
(C) 20.
(D) 30.

A

(C) 20.

Using the regression equation Y^ = b^0 + b^1(X), with b^0 = 10, b^1 = 2, and X = 5: Y^ = 10 + (2 × 5) = 10 + 10 = 20.

27
Q

When interpreting a slope coefficient, its sign indicates the:

(A) Strength of the relationship.
(B) Statistical significance of the relationship.
(C) Direction of the relationship (positive or negative).
(D) Amount of unexplained variation.

A

(C) Direction of the relationship (positive or negative).

The sign of the slope coefficient (+ or -) tells us whether the relationship between the independent and dependent variables is direct (positive) or inverse (negative).

28
Q

A researcher regresses a company's profit on its research and development (R&D) expenditure and finds a statistically significant slope coefficient of 0.8. This indicates that:

(A) An increase in profit causes an increase in R&D expenditure.
(B) For every $1 increase in R&D expenditure, profit is expected to increase by $0.80.
(C) 80% of the variation in profit is explained by R&D expenditure.
(D) Profit is always 80% of R&D expenditure.

A

(B) For every $1 increase in R&D expenditure, profit is expected to increase by $0.80.

The slope coefficient of 0.8 implies that for each $1 increase in the independent variable (R&D expenditure), the dependent variable (profit) is expected to increase by $0.80. The statistical significance indicates that this relationship is unlikely to be due to random chance.

29
Q

Which of the following is a key assumption of simple linear regression regarding the relationship between the dependent and independent variables?

(A) The relationship is non-linear and monotonic.
(B) The relationship is linear.
(C) The relationship is exponential.
(D) There is no relationship between the variables.

A

(B) The relationship is linear.

A fundamental assumption of simple linear regression is that the relationship between the dependent and independent variables can be adequately modeled by a straight line.

30
Q

A researcher fits a linear regression model to data where the true underlying relationship is U-shaped. What pattern would likely be observed in a plot of the residuals against the independent variable?

(A) A random scatter of points around zero.
(B) A linear trend, either positive or negative.
(C) A systematic, non-linear pattern (e.g., U-shaped or inverted U-shaped).
(D) No discernible pattern.

A

(C) A systematic, non-linear pattern (e.g., U-shaped or inverted U-shaped).

When a linear model is fit to non-linear data, the residuals will often exhibit a systematic pattern that mirrors the non-linearity in the underlying relationship. In this case, a U-shaped pattern in the residuals would be expected.

31
Q

A random scatter of residuals around zero in a plot of residuals against the independent variable suggests that the assumption of:

(A) Homoskedasticity is violated.
(B) Autocorrelation is present.
(C) Linearity of the relationship is likely met.
(D) Normality of residuals is violated.

A

(C) Linearity of the relationship is likely met.

A random scatter of residuals indicates that the errors are not systematically related to the independent variable, supporting the assumption that a linear model is appropriate for the relationship.

32
Q

Violation of the linearity assumption in a linear regression model can lead to:

(A) Unbiased but inefficient coefficient estimates.
(B) Biased coefficient estimates and unreliable predictions.
(C) Inflated standard errors of the coefficients.
(D) Difficulty in interpreting the intercept.

A

(B) Biased coefficient estimates and unreliable predictions.

If the true relationship is non-linear but a linear model is used, the model will not accurately capture the relationship, leading to biased estimates of the coefficients and unreliable predictions of the dependent variable.

33
Q

The assumption of homoskedasticity in linear regression requires that the:

(A) Dependent variable has a constant variance.
(B) Independent variable has a constant variance.
(C) Variance of the residual terms is constant across all levels of the independent variable(s).
(D) Residual terms are normally distributed.

A

(C) Variance of the residual terms is constant across all levels of the independent variable(s).

Homoskedasticity means "same scatter" and refers to the condition where the spread or variance of the prediction errors (residuals) is constant across all values of the independent variable(s).

34
Q

A violation of the homoskedasticity assumption is known as:

(A) Autocorrelation.
(B) Multicollinearity.
(C) Heteroskedasticity.
(D) Non-normality.

A

(C) Heteroskedasticity.

Heteroskedasticity occurs when the variance of the residual terms is not constant across the levels of the independent variable(s).

35
Q

In a plot of residuals against the independent variable, what pattern would suggest heteroskedasticity?

(A) A random scatter of points around zero with a consistent spread.
(B) A horizontal band of residuals centered around zero.
(C) A funnel shape, where the spread of residuals increases or decreases as the independent variable changes.
(D) A cyclical pattern in the residuals.

A

(C) A funnel shape, where the spread of residuals increases or decreases as the independent variable changes.

A funnel shape in the residual plot, where the variance of the residuals changes systematically with the independent variable, is a common visual indicator of heteroskedasticity.

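A sketch of the kind of funnel-shaped residual plot described here, assuming NumPy and Matplotlib; the data are simulated, with the error spread deliberately made to grow with X:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    X = np.linspace(1, 100, 200)
    # Simulated heteroskedastic errors: standard deviation grows with X
    Y = 2.0 + 0.5 * X + rng.normal(0, 0.05 * X)

    b1, b0 = np.polyfit(X, Y, 1)
    residuals = Y - (b0 + b1 * X)

    plt.scatter(X, residuals, s=10)
    plt.axhline(0, color="black", linewidth=1)
    plt.xlabel("Independent variable (X)")
    plt.ylabel("Residual")
    plt.title("Funnel-shaped residuals suggest heteroskedasticity")
    plt.show()
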
36
Q

Heteroskedasticity in a linear regression model can lead to:

(A) Biased but efficient coefficient estimates.
(B) Unbiased but inefficient coefficient estimates and unreliable standard errors.
(C) Biased and inconsistent coefficient estimates.
(D) Reliable standard errors but inefficient coefficient estimates.

A

(B) Unbiased but inefficient coefficient estimates and unreliable standard errors.

While the OLS coefficient estimates remain unbiased in the presence of heteroskedasticity, they are no longer the most efficient, and the standard errors of the coefficients become unreliable, affecting hypothesis testing and confidence intervals.

37
Q

Which of the following is a consequence of heteroskedasticity that is most concerning for hypothesis testing?

(A) The regression line no longer fits the data well.
(B) The sum of squared errors is inflated.
(C) The standard errors of the regression coefficients are unreliable.
(D) The residuals are no longer normally distributed.

A

(C) The standard errors of the regression coefficients are unreliable.

Unreliable standard errors due to heteroskedasticity can lead to incorrect t-statistics and F-statistics, resulting in flawed conclusions from hypothesis tests about the significance of the regression coefficients.

38
Q

The assumption of independence of residuals in linear regression means that the error term for one observation should:

(A) Have a constant variance.
(B) Be normally distributed.
(C) Not be correlated with the error term for any other observation.
(D) Have a mean of zero.

A

(C) Not be correlated with the error term for any other observation.

The independence assumption requires that the residuals are not systematically related to each other. The error in predicting one data point should not influence the error in predicting another.

39
Q

Correlation between residual terms is known as:

(A) Heteroskedasticity.
(B) Autocorrelation or serial correlation.
(C) Multicollinearity.
(D) Non-linearity.

A

(B) Autocorrelation or serial correlation.

Autocorrelation and serial correlation are interchangeable terms for residuals that are correlated with one another; the problem arises most commonly in time-series data.

40
Q

In time series regression, a cyclical pattern in the plot of residuals against time suggests a violation of the assumption of:

(A) Homoskedasticity.
(B) Normality of residuals.
(C) Independence of residuals.
(D) Linearity.

A

(C) Independence of residuals.

A cyclical pattern indicates that the prediction errors are systematically related over time, violating the independence assumption.

41
Q

Violation of the independence of residuals (autocorrelation) can lead to:

(A) Biased coefficient estimates.
(B) Unreliable standard errors of the coefficients.
(C) Inefficient coefficient estimates but reliable standard errors.
(D) No impact on the validity of the regression model.

A

(B) Unreliable standard errors of the coefficients.

Autocorrelation typically leads to underestimated standard errors, making hypothesis tests unreliable.

42
Q

Which of the following is a method to detect autocorrelation in the residuals of a time series regression?

(A) Examining a scatter plot of residuals against the independent variable.
(B) Examining a histogram of the residuals.
(C) Using the Durbin-Watson test.
(D) Using the Breusch-Pagan test.

A

(C) Using the Durbin-Watson test.

The Durbin-Watson test is a formal statistical test specifically designed to detect first-order autocorrelation in the residuals of a time series regression.

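The Durbin-Watson statistic is DW = Σ(e_t − e_{t−1})² / Σe_t². A minimal sketch of the calculation, assuming NumPy; the residual series below is hypothetical, and the durbin_watson() helper is defined here rather than taken from a library (statsmodels also ships a function of the same name, if that package is available):

    import numpy as np

    def durbin_watson(residuals):
        """DW = sum of squared successive differences / sum of squared residuals.
        Values near 2 suggest no first-order autocorrelation; near 0, positive;
        near 4, negative."""
        e = np.asarray(residuals)
        return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

    # Hypothetical residual series with obvious positive autocorrelation
    residuals = np.array([1.0, 0.9, 0.8, 0.7, -0.2, -0.4, -0.6, -0.8, -0.9, -1.0])
    print(durbin_watson(residuals))  # well below 2, consistent with positive autocorrelation
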
43
Q

A Q-Q plot is used to assess the assumption of:

(A) Homoskedasticity.
(B) Independence of residuals.
(C) Normality of residuals.
(D) Linearity.

A

(C) Normality of residuals.

A Quantile-Quantile (Q-Q) plot compares the quantiles of the residuals to the quantiles of a normal distribution. If the residuals are normally distributed, the points on the Q-Q plot should fall roughly along a straight line.

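A sketch of such a plot, assuming SciPy and Matplotlib; the residuals here are simulated draws from a normal distribution, so the points should lie roughly on the reference line:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(1)
    residuals = rng.normal(0, 1, 200)  # simulated, roughly normal residuals

    stats.probplot(residuals, dist="norm", plot=plt)
    plt.title("Q-Q plot of residuals vs. normal quantiles")
    plt.show()
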
44
Q

The assumption of normality of residuals is most critical for:

(A) Obtaining unbiased coefficient estimates.
(B) Ensuring homoskedasticity.
(C) Conducting valid hypothesis tests and constructing confidence intervals, especially with small sample sizes.
(D) Ensuring a linear relationship between the variables.

A

(C) Conducting valid hypothesis tests and constructing confidence intervals, especially with small sample sizes.

While OLS estimators have desirable properties even without normal residuals (especially with large samples due to the Central Limit Theorem), the normality assumption is important for the validity of t-tests, F-tests, and confidence intervals.

45
Q

Observations with unusually large prediction errors or extreme values of the independent variable are called:

(A) Autocorrelated residuals.
(B) Heteroskedastic errors.
(C) Outliers.
(D) Influential points.

A

(C) Outliers.

Outliers are data points that lie far away from the general pattern of the data. They can have a disproportionate influence on the regression results.

46
Q

Outliers can have a significant impact on the:

(A) Sample size.
(B) Number of independent variables.
(C) Estimated regression line and parameter estimates.
(D) Correlation coefficient, but not the regression line.

A

(C) Estimated regression line and parameter estimates.

47
Q

A researcher observes a significant positive autocorrelation in the residuals of their regression model. What might be a potential consequence?

(A) The standard errors of the coefficients are likely overestimated.
(B) The model is correctly specified.
(C) The standard errors of the coefficients are likely underestimated, leading to inflated t-statistics.
(D) The residuals are randomly distributed.

A

(C) The standard errors of the coefficients are likely underestimated, leading to inflated t-statistics.

Positive autocorrelation typically leads to underestimated standard errors, which in turn can inflate the t-statistics and potentially lead to incorrect rejection of the null hypothesis.

48
Q

If the Durbin-Watson statistic is close to 0, it suggests:

(A) No autocorrelation.
(B) Positive autocorrelation.
(C) Negative autocorrelation.
(D) Heteroskedasticity.

A

(B) Positive autocorrelation.

The Durbin-Watson statistic ranges from 0 to 4. A value close to 2 suggests no autocorrelation, a value close to 0 suggests positive autocorrelation, and a value close to 4 suggests negative autocorrelation.

49
Q

Addressing autocorrelation in a time series regression might involve:

(A) Transforming the dependent variable to stabilize its variance.
(B) Including lagged values of the dependent variable as independent variables.
(C) Using robust standard errors that are consistent in the presence of heteroskedasticity.
(D) Removing outliers from the dataset.

A

(B) Including lagged values of the dependent variable as independent variables.

Including lagged dependent variables can help capture the time dependence in the data and reduce autocorrelation in the residuals.

50
Q

While a large sample size can mitigate the impact of non-normal residuals on hypothesis testing, it does NOT necessarily resolve issues related to:

(A) The unbiasedness of the coefficient estimates.
(B) The efficiency of the coefficient estimates.
(C) Autocorrelation or heteroskedasticity.
(D) The consistency of the coefficient estimates.

A

(C) Autocorrelation or heteroskedasticity.

The Central Limit Theorem helps with the normality assumption in large samples, but it does not address problems like autocorrelation (dependent errors) or heteroskedasticity (non-constant variance of errors). These issues require specific detection and correction methods.

51
Q

In the context of ANOVA for simple linear regression, the Total Sum of Squares (SST) measures the:

(A) Variation in the dependent variable explained by the regression model.
(B) Unexplained variation in the dependent variable.
(C) Total variation in the dependent variable around its mean.
(D) Variation of the predicted values around zero.

A

(C) Total variation in the dependent variable around its mean.

SST quantifies the total dispersion of the observed values of the dependent variable (Yi) around their average (Ȳ).

52
Q

The Sum of Squares Regression (SSR) in ANOVA for simple linear regression measures the:

(A) Total variation in the dependent variable.
(B) Unexplained variation in the dependent variable.
(C) Variation in the dependent variable explained by the independent variable.
(D) Variation of the residuals around zero.

A

(C) Variation in the dependent variable explained by the independent variable.

SSR represents the portion of the total variation in the dependent variable that is accounted for by the linear relationship with the independent variable (i.e., the variation of the predicted Y^i around Ȳ).

53
Q

The Sum of Squared Errors (SSE) in ANOVA for simple linear regression measures the:

(A) Total variation in the dependent variable.
(B) Unexplained variation in the dependent variable (the sum of squared residuals).
(C) Variation explained by the independent variable.
(D) Variation of the predicted values around their mean.

A

(B) Unexplained variation in the dependent variable (the sum of squared residuals).

SSE quantifies the amount of variation in the dependent variable that is not explained by the regression model (the sum of the squared differences between Yi and Y^i).

54
Q

The fundamental relationship between SST, SSR, and SSE is:

(A) SST = SSR - SSE
(B) SSR = SST + SSE
(C) SST = SSR + SSE
(D) SSE = SST + SSR

A

(C) SST = SSR + SSE

The total variation in the dependent variable (SST) is equal to the sum of the variation explained by the regression (SSR) and the variation that remains unexplained (SSE).

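A quick numerical check of the decomposition, as a sketch assuming NumPy and the same hypothetical X and Y data used in the earlier sketches:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    b1, b0 = np.polyfit(X, Y, 1)
    Y_hat = b0 + b1 * X

    sst = np.sum((Y - Y.mean()) ** 2)      # total variation
    ssr = np.sum((Y_hat - Y.mean()) ** 2)  # explained variation
    sse = np.sum((Y - Y_hat) ** 2)         # unexplained variation

    print(sst, ssr + sse)  # equal, up to floating-point error
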
55
Q

In simple linear regression, the degrees of freedom for the Mean Square Regression (MSR) are:

(A) n−2
(B) n−1
(C) 1
(D) n

A

(C) 1

In simple linear regression, there is one independent variable, so the degrees of freedom for the regression are k = 1. MSR = SSR/k = SSR/1 = SSR.

56
Q

In simple linear regression with n observations, the degrees of freedom for the Mean Square Error (MSE) are:

(A) 1
(B) n
(C) n−1
(D) n−2

A

(D) n−2

The degrees of freedom for the error term are n − (number of estimated parameters) = n−2 (one for the intercept and one for the slope). MSE = SSE/(n−2).

57
Q

Mean Square Regression (MSR) is calculated as:

(A) SSE/(n−2)
(B) SSR/(n−1)
(C) SSR/1
(D) SST/(n−1)

A

(C) SSR/1

MSR is the Sum of Squares Regression (SSR) divided by its degrees of freedom, which is 1 in simple linear regression.

58
Q

Mean Square Error (MSE) is calculated as:

(A) SSR/1
(B) SST/(n−1)
(C) SSE/(n−2)
(D) SSR/(n−2)

A

(C) SSE/(n−2)

MSE is the Sum of Squared Errors (SSE) divided by its degrees of freedom, which is n−2 in simple linear regression.

59
Q

ANOVA in simple linear regression helps to assess the:

(A) Significance of individual regression coefficients.
(B) Presence of autocorrelation in the residuals.
(C) Overall goodness of fit and significance of the regression relationship.
(D) Presence of heteroskedasticity.

A

(C) Overall goodness of fit and significance of the regression relationship.

ANOVA provides a framework to evaluate how well the regression model as a whole explains the variation in the dependent variable.

60
Q

A larger SSR relative to SST indicates a:

(A) Poor fit of the regression model.
(B) Stronger linear relationship and a better fit of the regression model.
(C) Higher unexplained variation in the dependent variable.
(D) Lower correlation between the variables.

A

(B) Stronger linear relationship and a better fit of the regression model.

If SSR is large compared to SST, it means a larger proportion of the total variation in Y is explained by the regression, indicating a better fit.

61
Q

The F-statistic in ANOVA for simple linear regression is calculated as:

(A) MSE/MSR
(B) SSE/SST
(C) MSR/MSE
(D) SSR/SST

A

(C) MSR/MSE

The F-statistic is the ratio of the Mean Square Regression (MSR) to the Mean Square Error (MSE). It is used to test the overall significance of the regression relationship.

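A self-contained sketch of the calculation (NumPy, hypothetical data), combining the sums of squares with their degrees of freedom:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
    n = len(Y)

    b1, b0 = np.polyfit(X, Y, 1)
    Y_hat = b0 + b1 * X

    ssr = np.sum((Y_hat - Y.mean()) ** 2)
    sse = np.sum((Y - Y_hat) ** 2)

    msr = ssr / 1        # df for regression = 1 (one independent variable)
    mse = sse / (n - 2)  # df for error = n - 2
    print(msr / mse)     # F-statistic; compare to an F(1, n-2) critical value
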
62
Q

A smaller SSE relative to SST indicates a:

(A) Weaker linear relationship.
(B) Better fit of the regression model, as less variation is unexplained.
(C) Higher total variation in the dependent variable.
(D) A slope coefficient close to zero.

A

(B) Better fit of the regression model, as less variation is unexplained.

A smaller SSE means that the regression model has smaller prediction errors and therefore provides a better fit to the data.

63
Q

A statistically significant F-statistic in the ANOVA table of a simple linear regression indicates that:

(A) The intercept is significantly different from zero.
(B) The independent variable has no linear relationship with the dependent variable.
(C) At least one of the regression coefficients (in this case, the slope) is significantly different from zero, implying a significant linear relationship.
(D) The residuals are normally distributed.

A

(C) At least one of the regression coefficients (in this case, the slope) is significantly different from zero, implying a significant linear relationship.

A significant F-statistic suggests that the regression model as a whole explains a statistically significant portion of the variation in the dependent variable, meaning there is a significant linear relationship between X and Y (since there's only one independent variable in simple linear regression).

64
Q

The Standard Error of Estimate (SEE) is a measure of the:

(A) Proportion of the total variation in the dependent variable explained by the regression.
(B) Correlation between the independent and dependent variables.
(C) Standard deviation of the residuals.
(D) Overall significance of the regression model.

A

(C) Standard deviation of the residuals.

The SEE represents the standard deviation of the prediction errors (residuals), indicating the average dispersion of the actual data points around the fitted regression line.

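In formula form, SEE = sqrt(SSE / (n − 2)), i.e. the square root of the MSE. A minimal sketch with NumPy and the same hypothetical data as above:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
    n = len(Y)

    b1, b0 = np.polyfit(X, Y, 1)
    residuals = Y - (b0 + b1 * X)

    see = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # standard error of estimate
    print(see)
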
65
Q

A lower Standard Error of Estimate (SEE) indicates a:

(A) Weaker linear relationship between the variables.
(B) Poorer fit of the regression model.
(C) Better fit of the regression model and more precise predictions.
(D) Higher coefficient of determination (R²).

A

(C) Better fit of the regression model and more precise predictions.

A lower SEE signifies that the data points are closer to the regression line, implying smaller prediction errors and a better fit of the model to the data.

66
Q

The Coefficient of Determination (R²) is defined as the:

(A) Standard deviation of the residuals divided by the standard deviation of the dependent variable.
(B) Proportion of the total variation in the dependent variable explained by the independent variable(s).
(C) Square root of the correlation coefficient.
(D) Ratio of the explained variation to the unexplained variation.

A

(B) Proportion of the total variation in the dependent variable explained by the independent variable(s).

R² measures the percentage of the total variation in the dependent variable that is accounted for by the regression model. It indicates the goodness of fit.

67
Q

If the Sum of Squares Regression (SSR) is 75 and the Total Sum of Squares (SST) is 100, the Coefficient of Determination (R²) is:

(A) 0.25
(B) 0.75
(C) 1.33
(D) -0.25

A

(B) 0.75

R² = SSR / SST = 75 / 100 = 0.75. This means that 75% of the total variation in the dependent variable is explained by the regression model.

68
Q

In simple linear regression, the Coefficient of Determination (R²) is equal to:

(A) The square root of the correlation coefficient.
(B) The correlation coefficient.
(C) The square of the correlation coefficient.
(D) 1 minus the Standard Error of Estimate.

A

(C) The square of the correlation coefficient.

For simple linear regression (with one independent variable), R² is equal to the square of the Pearson correlation coefficient (r) between the independent and dependent variables.

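A quick numerical check of this equivalence, as a sketch assuming NumPy and the hypothetical data from the earlier sketches:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    b1, b0 = np.polyfit(X, Y, 1)
    Y_hat = b0 + b1 * X

    r_squared = np.sum((Y_hat - Y.mean()) ** 2) / np.sum((Y - Y.mean()) ** 2)  # SSR / SST
    r = np.corrcoef(X, Y)[0, 1]  # Pearson correlation

    print(r_squared, r ** 2)  # the two values match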