Exam 2 Flashcards
(67 cards)
What is sample correlation
Measures the strength and direction of the linear relationship between two variables. It is denoted by r, ranges from -1 to 1, and zero indicates no linear correlation
Association vs correlation
Association is about general relatedness of two variables, while correlation is about linearity specifically
rnorm() function
Generates a vector of random numbers with a normal distribution
myCor() function
Performs correlation and linear regression for all pairs of numeric columns in the input data frame.
Do outliers have an effect on correlation
Yes, they can make the correlation artificially high or low
What is jittering
Adding a small amount of random, normally distributed noise to both the x and y values. Allows us to see overlapping observations more clearly
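A minimal sketch of jittering as described above, assuming made-up integer-valued data. The variable names and the noise standard deviation of 0.1 are illustrative choices, not values from the course.

```r
set.seed(1)

# Hypothetical data: integer-valued scores, so many points overlap exactly
x <- sample(1:5, 200, replace = TRUE)
y <- sample(1:5, 200, replace = TRUE)

# Add a small amount of normally distributed noise to both coordinates
x_jit <- x + rnorm(length(x), mean = 0, sd = 0.1)
y_jit <- y + rnorm(length(y), mean = 0, sd = 0.1)

# Overlapping points now appear as small clusters instead of single dots
plot(x_jit, y_jit, xlab = "x (jittered)", ylab = "y (jittered)")
```

Base R also has a built-in jitter() function, which adds uniform rather than normal noise.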
What function allows us to calculate correlation
cor()
What function allows us to fit a regression line to data
lm()
What function allows us to calculate the confidence interval for the true correlation
cor.test()
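A short sketch showing cor(), lm(), and cor.test() together, assuming simulated data (the slope of 2 and the sample size of 100 are arbitrary choices for illustration).

```r
set.seed(42)

# Simulated data with a linear relationship plus noise
x <- rnorm(100)
y <- 2 * x + rnorm(100)

r   <- cor(x, y)        # sample correlation
fit <- lm(y ~ x)        # least-squares regression line
ct  <- cor.test(x, y)   # test and confidence interval for the true correlation

print(r)
print(coef(fit))        # intercept and slope
print(ct$conf.int)      # 95% confidence interval by default
```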
What is bootstrapping
Allows us to estimate the sampling distribution of an estimator by resampling the observed data with replacement
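A minimal sketch of a bootstrap for the correlation, assuming simulated data. The number of resamples (2000) and the use of a percentile interval are illustrative choices, not necessarily what the course uses.

```r
set.seed(123)

# Simulated data (hypothetical)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

B <- 2000                 # number of bootstrap resamples
boot_r <- numeric(B)

for (b in seq_len(B)) {
  # Resample row indices with replacement, keeping (x, y) pairs together
  idx <- sample(n, size = n, replace = TRUE)
  boot_r[b] <- cor(x[idx], y[idx])
}

# Percentile bootstrap 95% confidence interval for the correlation
boot_ci <- quantile(boot_r, c(0.025, 0.975))
print(boot_ci)
```

Resampling indices rather than x and y separately matters: shuffling them independently would destroy the pairing that the correlation measures.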
Parametric vs non parametric tests
Parametric tests assume certain conditions of the data (usually assumptions about normality, variance, standard deviation, etc.). Nonparametric tests make fewer assumptions
What does it mean if the bootstrap confidence interval is wider than the theoretical ci
The underlying assumptions of the model may not be satisfied. Heteroskedasticity and outliers can contribute to this
What is a permutation test / what does it do
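Allows us to test whether an observed difference between groups or relationship between variables could plausibly be due to chance. Permuted data is essentially just reshuffled data, and reshuffling breaks any real association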
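A minimal sketch of a permutation test for a correlation, assuming simulated data. The number of permutations (5000) and the two-sided p-value are illustrative choices.

```r
set.seed(7)

# Simulated data (hypothetical)
n <- 40
x <- rnorm(n)
y <- 0.6 * x + rnorm(n)

obs_r <- cor(x, y)        # observed statistic

P <- 5000                 # number of permutations
# sample(y) reshuffles y without replacement, breaking any link to x
perm_r <- replicate(P, cor(x, sample(y)))

# Two-sided p-value: share of permuted correlations at least as extreme
p_value <- mean(abs(perm_r) >= abs(obs_r))
print(p_value)
```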
sample() function
Takes a sample of data with or without replacement. If replace = TRUE, it can be used for bootstrapping. If replace = FALSE (the default), it can be used to reshuffle data for a permutation test
rep()
rep(x, …)
Replicates the values in x
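A quick illustration of rep() using its times and each arguments; the input vector is made up.

```r
v <- c("a", "b")

rep_times <- rep(v, times = 3)  # repeat the whole vector: a b a b a b
rep_each  <- rep(v, each = 2)   # repeat each element in turn: a a b b

print(rep_times)
print(rep_each)
```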
corrplot()
Visual representation of correlations
What are residuals?
Estimated errors of the regression (aka the differences between the observed values and the values predicted by the regression line)
How do we estimate the standard deviation of the residuals
Use sample standard deviation of the residuals as our estimate
What are the assumptions of a linear regression model
Linearity, independent errors, constant variance of errors (homoskedasticity), and normal distribution of errors
What is a good way to see if data is normally distributed
Make a normal quantile plot
If a linear model fits the data well, what should the residuals vs. fitted values plot look like?
Formless blob. No patterns
After fitting a model, what do we need to do?
See if we’ve met the model assumptions
What diagnostics do we perform to see if we’ve met model assumptions
A normal quantile plot of the residuals to check normality, and a plot of residuals vs. fitted values
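A minimal sketch of these diagnostics in base R, assuming simulated data that actually satisfies the model (the coefficients and noise level are made up). It also computes the sample standard deviation of the residuals.

```r
set.seed(10)

# Simulated data that follows a linear model with normal errors
x <- runif(100, 0, 10)
y <- 3 + 2 * x + rnorm(100, sd = 1.5)

fit <- lm(y ~ x)
res <- resid(fit)             # residuals: observed minus fitted values

# Sample standard deviation of the residuals.
# Note: summary(fit)$sigma divides by n - 2 instead of n - 1, so it differs slightly.
res_sd <- sd(res)
print(res_sd)

# Residuals vs. fitted values: should look like a formless blob around zero
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal quantile plot: points should fall close to the reference line
qqnorm(res)
qqline(res)
```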
What does r squared do
Measures the percentage of variability of Y explained by the model (the X’s)
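A short sketch, assuming simulated data, that computes r squared by hand as one minus the ratio of residual variability to total variability and compares it with the value reported by summary().

```r
set.seed(3)

# Simulated data (hypothetical)
x <- rnorm(80)
y <- 1 + 0.8 * x + rnorm(80)

fit <- lm(y ~ x)

ss_res <- sum(resid(fit)^2)        # variability left unexplained by the model
ss_tot <- sum((y - mean(y))^2)     # total variability of Y

r2_manual <- 1 - ss_res / ss_tot
r2_lm     <- summary(fit)$r.squared

print(c(manual = r2_manual, lm = r2_lm))

# With a single predictor, r squared equals the squared sample correlation
print(cor(x, y)^2)
```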