chemometrics quiz 2 Flashcards
bicariate vs multivariate
bivariate looks at two data sets and tells how related - multivariate explains relationship between more than 2
Covariance vs correlation
Covariance - how two data sets change or vary together in tandem
Correlation - tells you when a change in one variable leads to a change in another
How do you calculate covaariance
Take mean of x and y
Take each point and subtract its x and y from their respective means and sum up
then divide by N-1
Scales of covariance vs correlation
covariance affected by change in scale - correlation isn’t
covariance keeps units correlation doesn’t
each are - when the two are independent
covariance from infinity to negative infinity correlation is from 1 to -1
How to calculate correlation
Covariance divided by (stdev x *stdev y)
how to program these
cor and cov
whats a corrgram
correlation matrix so basically the same variables on x and y axis s and see how they correlate, can be picture or colored etc - match top and bottom typically show how much and in what direction
How do p alues work with correlation
p vallue < 0.05 means correlation coefficient different than 0
Whats a scatter plot matrix?
same idea as a corrgram but each space has an actual scatter plot
How do you plot scatter plot matrix
pairs() funcion
Partial Correlation
between two quantitative variables - controlling for one or more quantitative variables
WHat is regression used for
1) IDing explanatory variables that are related to an outcome/response variable
2) Describe the form of a relationship between dependant and independent variable (general relationship()
3) Provide an equation to predict response variable from explanatory variable (cal curve)
Ordinary least squares regression what is it
quantitative depedant variable predicted from a weighted sum of predictor variables where weights are parameters estimated from data
goal of regression
choose model paramteres (Y and B! sloe and intercept) - that minimize difference between actual and predicted model
What is a residual
the difference between the observed and fitted value
what is true about sum of residuals
in a linear regression model - sum of residuals should be 0
Wghat determines best fit
minimize sum of squared differences
ASSUMPTIONS FOR LINEAR REGRESSION
Error in your x should be negligible
Error with y vlue must be normally distributed (dependant variable needs to be normally distributed)
Varian in error across y should be constant across area of interest (stdev constant)
x and y should be continuous
Whats a high leverge point and how to deal
a point that has more influlence on r^2
can deal with by having event spaced values
How to calculate regression
determine residuals then calculate residual standard dev - deviation of data points form regression line
stdev for slope
stdev for y intercept ( y - y fitted) squared and then summed
How to insepct residuals
plot them - should be scattered around zero with no pattern
How to tell how influential a data point is
COOKS DISTANCE -
How do t test and p vallue relate to regression
if p value less than 5 - significantly different than - THERE IS A RELATIONSHIP
R^2
shows how well the points fit