Unit 3 Flashcards
(13 cards)
What is covariance?
a measure of the relationship between two random variables and to what extent they change together
What is correlation?
quantifies the association between two continuous variables (a standardized version of covariance)
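A minimal sketch of the two definitions above, using made-up data and hypothetical helper names (`covariance`, `correlation`). It illustrates why correlation is called a standardized covariance: converting hours to minutes changes the covariance but leaves the correlation the same.

```python
import math

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    sd_x = math.sqrt(covariance(x, x))
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

# Made-up illustrative data (not from the course material)
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]
minutes = [h * 60 for h in hours]  # same information, different units

cov_raw = covariance(hours, score)       # depends on the units of x and y
cov_scaled = covariance(minutes, score)  # 60 times larger
r_raw = correlation(hours, score)        # unit-free, always between -1 and 1
r_scaled = correlation(minutes, score)   # unchanged by the rescaling
```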
What is collinearity?
when the independent variables in a regression are correlated with each other, so each variable makes little additional contribution
what three things does collinearity do?
inflates the standard errors (SE) of the coefficients, indicates redundancy among the predictors, and makes the associated p-values too high, which can affect conclusions
what is multi-collinearity?
more than two independent variables are highly correlated in a regression
what are 6 assumptions that go with correlation?
random sample or representative of the population, independent observations, x values are not used to compute y values, x values are not experimentally controlled, both x and y follow a normal (Gaussian) distribution, and all covariation is linear with no outliers
when would we use pearson correlation?
with continuous data
how does spearman rank correlation coefficient work?
separately ranks X and Y values and then computes the correlation between the two sets of ranks, it looks at the monotonic relationship
what is a monotonic relationship?
a relationship in which one variable consistently increases, or consistently decreases, as the other increases; it does not have to be linear
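The ranking procedure on the Spearman card can be sketched as follows. This is only an illustration with made-up data and hypothetical helper names (`pearson`, `ranks`, `spearman`); tied values are assumed to share the average of their ranks, which is a common convention. Because y = exp(x) always increases but is not a straight line, the Spearman coefficient equals 1 while the Pearson coefficient is lower.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank X and Y separately, then correlate the two sets of ranks."""
    return pearson(ranks(x), ranks(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [math.exp(v) for v in x]  # monotonic (always increasing) but not linear

rho = spearman(x, y)  # perfect monotonic relationship
r = pearson(x, y)     # below 1 because the relationship is curved
```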
what does the variance inflation factor show?
how well each independent variable is predicted by the combination of the other independent variables; 1 = no multicollinearity, and cutoffs for concern typically fall between 2 and 10
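One common way to compute the variance inflation factor is VIF = 1 / (1 - R²), where R² comes from regressing one independent variable on all the others. The sketch below assumes NumPy is available, uses simulated data, and the function name `vif` is hypothetical rather than taken from any library.

```python
import numpy as np

def vif(X):
    """Return one VIF per column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r_squared))
    return np.array(out)

# Simulated predictors: c is nearly a copy of a, b is unrelated to both
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)

vifs = vif(np.column_stack([a, b, c]))
# a and c should get large VIFs; b should stay close to 1
```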
what are the 5 steps of principal components analysis?
- standardize the range of continuous variables
- compute covariance matrix
- compute eigenvectors and eigenvalues of covariance matrix
- create feature vector to decide what components to keep
- recast the data
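The five steps above can be sketched with NumPy as follows. The data are simulated, the choice to keep k = 2 components is an arbitrary assumption for illustration, and the variable names are hypothetical. The last line checks that the resulting component scores are uncorrelated with each other.

```python
import numpy as np

# Simulated data: three variables, the first two strongly correlated
rng = np.random.default_rng(1)
base = rng.normal(size=300)
data = np.column_stack([
    base + 0.2 * rng.normal(size=300),
    2.0 * base + 0.2 * rng.normal(size=300),
    rng.normal(size=300),
])

# Step 1: standardize each variable (mean 0, standard deviation 1)
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Step 2: covariance matrix of the standardized variables
cov = np.cov(z, rowvar=False)

# Step 3: eigenvalues and eigenvectors (eigh is meant for symmetric matrices),
# sorted from largest to smallest eigenvalue
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 4: feature vector, keeping the k components with the most variance
explained = eigenvalues / eigenvalues.sum()  # share of variance per component
k = 2  # assumed cutoff for this illustration
feature_vector = eigenvectors[:, :k]

# Step 5: recast the data onto the kept components
scores = z @ feature_vector

# The new components are uncorrelated with each other
score_corr = np.corrcoef(scores, rowvar=False)[0, 1]
```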
what is PCA doing?
transforming a large set of variables into a smaller set that still contains most of the information
how does PCA reduce multicollinearity?
it finds how much of the variance in the independent variables is accounted for by each component, then uses the new components, which are uncorrelated with each other, in place of the original independent variables