Topic 2: Correlation, Simple, and Multiple Linear Regression Flashcards
correlation
displays the form, direction, and strength of a relationship
pearson’s correlation
measures the direction & strength of a linear relationship between two quantitative variables
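A minimal pure-Python sketch of the computation (the function name `pearson_r` is illustrative): r is the covariance of x and y divided by the product of their standard deviations.

```python
import math

def pearson_r(x, y):
    # Pearson's r = cov(x, y) / (sd(x) * sd(y)); the shared factors
    # of n cancel, so raw sums of deviations suffice
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# a perfectly linear positive relationship gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```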
covariance
indicates the degree to which x and y vary together
interpreting covariance
positive = x and y move in the same direction
negative = x and y move in opposite directions
0 = no linear relationship between x and y (zero covariance does not by itself imply independence)
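The sign rules above can be seen numerically; a small sketch (the function name `sample_cov` is illustrative), using the sample covariance with an n − 1 denominator:

```python
def sample_cov(x, y):
    # sample covariance: average cross-product of deviations, n - 1 denominator
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
print(sample_cov(x, [2, 4, 6, 8, 10]))  # → 5.0  (positive: move together)
print(sample_cov(x, [10, 8, 6, 4, 2]))  # → -5.0 (negative: move oppositely)
```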
when should we not use r?
- when two variables have a non-linear relationship
- observations aren’t independent
- outliers exist
- homoscedasticity is violated
- the sample size is very small
- both variables are not measured on a continuous scale
point-biserial correlation
binary & continuous variables
phi coefficient
two binary variables
spearman’s rho
- two ordinal variables
- recommended when N > 100
kendall’s tau
- two ordinal variables
- recommended when N < 100
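Both rank-based measures can be sketched in a few lines of pure Python (function names are illustrative; both versions assume no tied values, since ties require averaged ranks and the tau-b correction):

```python
from itertools import combinations

def spearman_rho(x, y):
    # classic no-ties formula: 1 - 6*sum(d^2) / (n(n^2 - 1)),
    # where d is the difference between each pair's ranks
    n = len(x)
    rx = {v: i + 1 for i, v in enumerate(sorted(x))}
    ry = {v: i + 1 for i, v in enumerate(sorted(y))}
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    # tau-a: (concordant pairs - discordant pairs) / total pairs
    pairs = list(combinations(range(len(x)), 2))
    c = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    d = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return (c - d) / len(pairs)

# a monotonic but non-linear relationship: both rank measures still give 1
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # → 1.0
print(kendall_tau([1, 2, 3, 4], [1, 4, 9, 16]))   # → 1.0
```

This is also why rank correlations are the usual fallback when Pearson's r is inappropriate for non-linear (but monotonic) relationships.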
confounder
an observed common factor that influences both variables and may explain the apparent association between them
lurking factors
potential common causes that we don’t measure
partial correlation
the correlation between two variables after the influence of another variable is removed
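A first-order partial correlation can be computed directly from the three pairwise Pearson correlations; a sketch (the function name `partial_corr` is illustrative):

```python
import numpy as np

def partial_corr(x, y, z):
    # r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)):
    # the correlation of x and y with z's influence removed
    r = np.corrcoef([x, y, z])
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return float((rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2)))

# here z is uncorrelated with both x and y, so removing it changes nothing
print(partial_corr([1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]))  # ≈ 1.0
```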
hypotheses for significance of a correlation coefficient
- H₀: ρ = 0 (no linear association between the two variables; ρ = population correlation)
- H₁: ρ ≠ 0 (linear association between the variables)
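The test statistic for these hypotheses is t = r√(n − 2) / √(1 − r²) with df = n − 2; a sketch (the function name is illustrative, and the resulting t is compared to a t table or used to get a p-value):

```python
import math

def corr_t_stat(r, n):
    # t statistic for H0: rho = 0, with df = n - 2
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# r = 0.5 from n = 27 pairs: t ≈ 2.89 on df = 25
print(corr_t_stat(0.5, 27))
```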
simple linear regression
- used to study an asymmetric linear relationship between x and y
- describes how the DV changes as a single IV changes
β
the slope: the expected change in y for a one-unit increase in x
linear regression equation
- Ŷ = ɑ + βX
- Ŷ = predicted value of y
- ɑ = intercept
- β = slope
method of least squares
- makes the sum of the squares of the vertical distances of the data points from the line as small as possible
- minimizes ss (error)
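The least-squares estimates have closed forms: β̂ = Sxy / Sxx and ɑ̂ = ȳ − β̂x̄. A pure-Python sketch (the function name `fit_line` is illustrative):

```python
def fit_line(x, y):
    # least-squares estimates: beta = Sxy / Sxx, alpha = ybar - beta * xbar
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta

# data that lie exactly on y = 2x + 1
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # → (1.0, 2.0)
```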
stating hypotheses for the significance of the slope in simple linear regression
- H₀: β = 0 (There is no linear relationship between x & y)
- H₁: β ≠ 0 (There is a linear relationship between x & y)
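The slope test uses t = β̂ / SE(β̂) with df = n − 2, where SE(β̂) = √(SS(error)/(n − 2)) / √Sxx. A sketch under those formulas (the function name is illustrative):

```python
import math

def slope_t_stat(x, y):
    # t = beta_hat / SE(beta_hat), df = n - 2, testing H0: beta = 0
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    sse = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return beta / se

# noisy upward trend: t ≈ 2.83 on df = 3
print(slope_t_stat([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]))
```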
t-test formula
t = sample statistic/ standard error
standard error
the standard deviation of the sampling distribution of a statistic
assumptions to apply t-test to slope
normally distributed errors & independence of observations
partitioning variance in simple linear regression
- ss (regression) = variation in y explained by the regression line
- ss (error) = variation in y unexplained by the regression line
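The identity SS(total) = SS(regression) + SS(error) can be checked numerically; a sketch reusing the least-squares formulas (function name illustrative):

```python
def partition_ss(x, y):
    # split total variation in y around its mean into the part the
    # fitted line explains and the residual part
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    yhat = [alpha + beta * a for a in x]
    ss_reg = sum((p - my) ** 2 for p in yhat)   # explained
    ss_err = sum((b - p) ** 2 for b, p in zip(y, yhat))  # unexplained
    ss_tot = sum((b - my) ** 2 for b in y)      # total
    return ss_reg, ss_err, ss_tot

# the two components sum to SS(total): 6.4 + 2.4 = 8.8
print(partition_ss([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]))
```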
r²
the proportion of total variation in y accounted for by the regression model
interpreting r²
0 = no explanation at all
1 = perfect explanation
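These two endpoints can be demonstrated with a small sketch, using r² = 1 − SS(error)/SS(total) (the function name is illustrative):

```python
def r_squared(y, yhat):
    # r^2 = 1 - SS(error) / SS(total): fraction of variation in y
    # around its mean that the predictions account for
    my = sum(y) / len(y)
    ss_err = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_err / ss_tot

print(r_squared([2, 4, 6, 8], [2, 4, 6, 8]))  # → 1.0 (perfect explanation)
print(r_squared([2, 4, 6, 8], [5, 5, 5, 5]))  # → 0.0 (no better than the mean)
```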