Correlation Flashcards

1
Q

What is correlation in statistics?

A

Correlation quantifies the extent of association between two continuous variables.

2
Q

How is correlation different from regression?

A

Correlation measures the strength of a relationship between two variables, while regression explains one variable in terms of another with an equation.

3
Q

What is Pearson’s correlation coefficient (𝑟)?

A

A measure of linear correlation between two variables, ranging from -1 (perfect negative) to +1 (perfect positive).

4
Q

What does a correlation coefficient of 0 mean?

A

It means there is no linear relationship between the two variables.

5
Q

What is a perfect correlation?

A

A perfect correlation occurs when all data points fall exactly on a straight line, with 𝑟 = ±1.

6
Q

What does concurvity refer to?

A

It describes a non-linear association between two continuous variables.

7
Q

In R, how can you compute the correlation coefficient between two variables, LLL and TotalHeight?

A

# deviations of each variable from its mean
diffx <- hgt$LLL - mean(hgt$LLL)

diffy <- hgt$TotalHeight - mean(hgt$TotalHeight)

# Pearson's r: sum of cross-products over the square root of
# the product of the sums of squares
r <- sum(diffx * diffy) / sqrt(sum(diffx^2) * sum(diffy^2))
print(r)

8
Q

What is a simpler way to compute correlation in R?

A

cor(x = hgt$TotalHeight, y = hgt$LLL)

9
Q

What is covariance?

A

The numerator of the correlation formula r = cov(x, y) / (sx · sy): the average cross-product of the two variables’ deviations from their means, measuring how they vary together.

10
Q

How is correlation different from covariance?

A

Correlation standardizes covariance to a range of -1 to +1, making it comparable across different units.

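The covariance–correlation relationship on these cards can be checked numerically in R; a minimal sketch with simulated data (variable names are illustrative):

```r
# Simulated data: y depends linearly on x, plus noise
set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Correlation is covariance standardized by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
isTRUE(all.equal(r_manual, cor(x, y)))  # TRUE
```

Unlike the covariance, r_manual is unit-free and always lies between -1 and +1.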
11
Q

What are the null (𝐻0) and alternative (𝐻1) hypotheses for testing correlation?

A

H0: ρ = 0 (there is no association between the variables)

H1: ρ ≠ 0 (there is an association)

12
Q

What test statistic is used to test correlation?

A

A 𝑡-test with 𝑛−2 degrees of freedom.

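A sketch of this test statistic in R, checked against cor.test() (the data here are simulated for illustration):

```r
# t statistic for a correlation: t = r * sqrt(n - 2) / sqrt(1 - r^2)
set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)
n <- length(x)
r <- cor(x, y)

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
ct <- cor.test(x, y)
isTRUE(all.equal(unname(ct$statistic), t_stat))  # TRUE
ct$parameter  # df = n - 2 = 28
```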
13
Q

How do you compute a two-tailed 𝑝-value for correlation in R?

A

2 * pt(q = abs(t_stat), df = n - 2, lower.tail = FALSE)

(Taking abs(t_stat) ensures the two-tailed p-value is correct even when the t statistic is negative.)

14
Q

What function in R performs a correlation test?

A

cor.test(x = hgt$TotalHeight, y = hgt$LLL)

15
Q

What is the alternative hypothesis in a correlation test?

A

H1: ρ ≠ 0 (the true correlation is not zero, meaning an association exists).

16
Q

How do we interpret a very small 𝑝-value in a correlation test?

A

It suggests strong evidence against the null hypothesis, meaning there is likely an association between the variables.

17
Q

What does the confidence interval in a correlation test represent?

A

It provides a range within which the true population correlation coefficient (𝜌) is likely to lie.

18
Q

Why is correlation not the same as causation?

A

Correlation only shows an association, but a causal link requires further evidence, such as controlled experiments.

19
Q

What is a “spurious” or “nonsense” correlation?

A

A correlation between two variables that occurs due to chance or a hidden third variable rather than a causal relationship.

20
Q

What are three possible explanations for a correlation?

A

Chance (random coincidence)

A third variable affecting both

Genuine causation

21
Q

What is Anscombe’s quartet?

A

A set of four datasets that have the same correlation coefficient but different distributions, illustrating the limitations of correlation.

22
Q

What is the correlation coefficient (𝑟) for each pair in Anscombe’s quartet?

A

r=0.816 for all pairs, despite vastly different data patterns.

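Anscombe’s quartet ships with base R as the anscombe data frame, so this claim is easy to verify:

```r
# Correlation for each of the four x/y pairs in Anscombe's quartet
data(anscombe)
r_values <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
r_values  # all four values are approximately 0.816
```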
23
Q

What does Anscombe’s quartet demonstrate about correlation?

A

That correlation alone does not capture the true nature of relationships between variables; visualization is essential.

24
Q

What does the datasauRus dataset illustrate?

A

That datasets with different structures can have identical correlation values, emphasizing the need for visualization.

25
What are the three main reasons to study relationships between variables?
Description – To describe patterns in data.
Explanation – To understand causal relationships.
Prediction – To estimate unknown values.
26
What is simple linear regression?
A statistical method to model the relationship between one explanatory variable and one response variable.
27
What is the general form of a simple linear regression equation?
Y = β0 + β1X + ε, where Y is the response variable, X is the explanatory variable, β0 is the intercept, β1 is the slope, and ε is the error term.
28
What are the assumptions of simple linear regression?
Linearity – The relationship between X and Y is linear.
Independence – Observations are independent.
Homoscedasticity – Errors have constant variance.
Normality – Errors are normally distributed.
29
What is the dependent (response) variable in a regression model?
The variable we aim to explain or predict (denoted as 𝑌).
30
What is the independent (predictor) variable in a regression model?
The variable used to explain or predict the response variable (denoted as 𝑋)
31
What does the intercept (𝛽0) represent in a regression model?
It is the expected value of 𝑌 when 𝑋=0, or where the regression line crosses the y-axis.
32
When might the intercept (𝛽0) not be meaningful?
When the explanatory variable (𝑋) cannot realistically take a value of zero (e.g., age in a salary regression).
33
What does the slope (𝛽1) represent in a regression model?
It describes the expected change in 𝑌 for a one-unit increase in 𝑋.
34
What does the sign of the slope (𝛽1) indicate?
β1 > 0 → positive relationship
β1 = 0 → no relationship
β1 < 0 → negative relationship
35
What is the error term (𝜖𝑖) in a regression model?
It represents the difference between the observed and predicted values of 𝑌, accounting for variability not explained by 𝑋.
36
What assumption is made about the error term in simple linear regression?
Errors (εi) are assumed to be normally distributed with mean zero: εi ~ N(0, σ²).
37
What is the least squares (LS) criterion in regression?
It finds the line that minimizes the sum of squared residuals (SSR), ensuring the best fit for the data.
38
Why is the least squares method preferred?
It ensures that the fitted line has the smallest possible sum of squared differences between observed and predicted values, leading to optimal parameter estimates.
39
What does the least squares criterion minimize in regression?
The sum of squared residuals (SSR), ensuring the best-fitting line.
40
What are residuals in regression?
The vertical differences between observed data points and the fitted regression line.
41
What is the formula for estimating the intercept (𝛽0) in simple linear regression?
β̂0 = ȳ − β̂1x̄
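This estimate, together with the companion slope estimate β̂1 = Sxy/Sxx, can be checked against lm() on simulated data:

```r
# Least-squares estimates by hand, compared with lm()
set.seed(7)
x <- runif(50, 0, 10)
y <- 3 + 0.5 * x + rnorm(50)

b1 <- cov(x, y) / var(x)      # slope: Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)  # intercept: ybar - b1 * xbar

fit <- lm(y ~ x)
isTRUE(all.equal(unname(coef(fit)), c(b0, b1)))  # TRUE
```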
42
Why is least squares estimation preferred?
It provides optimal estimates and coincides with maximum likelihood estimation under normality assumptions.
43
What R function is used to fit a simple linear regression model?
lm(), which stands for linear model.
44
What is the basic syntax for fitting a linear model in R?
simple.lm <- lm(y ~ x, data=exData)
45
How can you retrieve the coefficients (𝛽0,𝛽1) from a fitted linear model in R?
With coef(simple.lm), or simply by printing the model object: simple.lm
46
Given the fitted model: ŷi = 12.426 + 1.902xi, what is the predicted value when x = 50?
ŷ = 12.426 + (1.902 × 50) = 107.526
47
What function in R gives predicted values for the fitted regression model?
fitted()
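fitted() returns predictions only at the observed x values; for new x values, predict() with a newdata argument is the usual tool. A sketch reusing the simple.lm and exData names from the earlier cards (the data here are made up):

```r
# Fit a simple linear model on toy data
set.seed(2)
exData <- data.frame(x = 1:10)
exData$y <- 12 + 2 * exData$x + rnorm(10)
simple.lm <- lm(y ~ x, data = exData)

fitted(simple.lm)                                  # predictions at the observed x
predict(simple.lm, newdata = data.frame(x = 5.5))  # prediction at a new x
```

Note that x = 5.5 lies inside the observed range of 1 to 10, which matters for the extrapolation warning on the next card.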
48
Why is it risky to use a regression model to make predictions outside the range of observed 𝑥 values?
The relationship may not remain linear beyond the observed data, leading to inaccurate predictions.
49
What does the Residual Standard Error (RSE) in an R regression summary represent?
The estimated standard deviation of residuals, which measures the spread of observed values around the fitted regression line.
50
What does the Adjusted R-squared value account for in regression?
It adjusts for the number of predictors, providing a more accurate measure of model fit when multiple explanatory variables are present.
51
What does the Multiple R-squared value in an R regression summary indicate?
The proportion of variance in the response variable explained by the explanatory variable(s).
52
How is the F-statistic in an R regression summary interpreted?
It tests whether at least one predictor variable is significantly associated with the response variable.
53
What is the null hypothesis for testing the significance of a regression coefficient?
H0: βi =0, meaning the predictor has no effect on the response variable.
54
What does a very small p-value (e.g., < 0.001) for a regression coefficient indicate?
Strong evidence against 𝐻0, suggesting the predictor is statistically significant.
55
In the R summary output, what does the Significance Codes section indicate?
It categorizes p-values using ***, **, *, and . to show different levels of statistical significance.
56
What does a Residuals section in an R regression summary show?
The distribution of residuals (errors), including minimum, 1st quartile, median, 3rd quartile, and maximum values.
57
Given the fitted regression equation WeightA = 102.18 + 1.72 × SST, what does the slope coefficient 1.72 mean?
For each 1-unit increase in SST, the predicted WeightA increases by 1.72 units on average.
58
What does the Residual Standard Error = 2.093 in an R summary output mean?
On average, the actual WeightA values deviate by about 2.093 units from the predicted values.
59
What is goodness of fit in regression?
It refers to how well the model explains the variability in the response variable.
60
How is R² interpreted in regression?
R² ranges from 0 to 1:
R² = 1 → perfect fit
R² = 0 → the model explains no variability
R² = 0.7954 → the model explains 79.54% of the response variable’s variability
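R² can be computed by hand as 1 − SSres/SStot and checked against summary(); a sketch on simulated data:

```r
# R-squared by hand versus summary(lm)
set.seed(3)
x <- rnorm(40)
y <- 1 + 2 * x + rnorm(40)
fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)  # unexplained variation
ss_tot <- sum((y - mean(y))^2)   # total variation
r2 <- 1 - ss_res / ss_tot
isTRUE(all.equal(r2, summary(fit)$r.squared))  # TRUE
```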
61
What does the ANOVA table show in regression analysis?
It decomposes total variability into:
Model Sum of Squares (SSModel) → explained variation
Residual Sum of Squares (SSRes) → unexplained variation
F-statistic → measures overall model significance
62
How is the F-statistic in ANOVA related to the t-statistic for a predictor?
F = t². Example: if t = 11.99, then F = 11.99² ≈ 143.8.
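The identity F = t² holds exactly in simple linear regression and can be verified from a summary() object (simulated data):

```r
# Overall F statistic equals the squared t statistic of the slope
set.seed(5)
x <- rnorm(30)
y <- 2 + x + rnorm(30)
fit <- lm(y ~ x)

t_slope <- summary(fit)$coefficients["x", "t value"]
f_stat  <- summary(fit)$fstatistic[["value"]]
isTRUE(all.equal(f_stat, t_slope^2))  # TRUE
```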
63
What does a high F-statistic and a small p-value indicate?
It suggests that at least one predictor variable significantly explains variation in the response.
64
What does Residual Standard Error (RSE) represent in the regression summary?
It is the standard deviation of residuals, measuring how much actual values deviate from predictions.
65
What are the key takeaways from regression analysis?
Regression models explain relationships between variables.
R² measures how much variation is explained.
ANOVA tests overall model significance.
The F-statistic determines whether predictors improve the model.
RSE shows the spread of residuals.
66
How is Mean Squared Error (MSE) related to RSE?
MSE = RSE². Example: if RSE = 2.093, then MSE = 2.093² ≈ 4.38.
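This relationship (the RSE reported by summary() is the square root of the residual mean square) can be verified on simulated data:

```r
# RSE reported by summary() is sqrt(MSE), with MSE = SSres / (n - 2)
set.seed(9)
x <- rnorm(25)
y <- 5 + x + rnorm(25)
fit <- lm(y ~ x)

mse <- sum(residuals(fit)^2) / df.residual(fit)   # residual mean square
isTRUE(all.equal(sqrt(mse), summary(fit)$sigma))  # TRUE
```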