chemometrics quiz 2 Flashcards

1
Q

Bivariate vs. multivariate

A

Bivariate analysis looks at two variables and tells how strongly they are related; multivariate analysis describes the relationships among more than two variables

2
Q

Covariance vs correlation

A

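Covariance: how two variables change or vary together (in tandem), expressed in the units of the data
Correlation: the covariance rescaled to a unitless number between -1 and 1, showing how strongly and in which direction the two variables move together; it does not by itself show that a change in one causes a change in the other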

3
Q

How do you calculate covariance

A

Take the mean of x and the mean of y
For each point, multiply its deviation from the mean of x by its deviation from the mean of y, and sum these products
Then divide by N-1
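
The steps above can be sketched in a few lines. This is only an illustration in plain Python, not the course's own code; the function name `sample_covariance` and the data are made up for the example.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by N-1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Multiply each point's x deviation by its y deviation, then sum.
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = sample_covariance(x, y)
print(cov_xy)  # 1.5
```

Here the deviation products are 4, 0, 0, 0 and 2, which sum to 6; dividing by N-1 = 4 gives 1.5.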

4
Q

Scales of covariance vs correlation

A

Covariance is affected by a change in scale; correlation isn’t
Covariance keeps the units of the data; correlation is unitless
Each is 0 when the two variables are independent
Covariance ranges from negative infinity to infinity; correlation ranges from -1 to 1

5
Q

How to calculate correlation

A

Covariance of x and y divided by (stdev x * stdev y)
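
As a rough sketch of that formula, the Python below divides the sample covariance by the product of the two sample standard deviations. It also rescales x by 10 to show that the covariance changes with scale while the correlation does not. The helper names and data are hypothetical.

```python
import statistics

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by N-1."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (len(x) - 1)

def correlation(x, y):
    """Covariance divided by (stdev x * stdev y), using sample stdevs."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_scaled = [10.0 * xi for xi in x]  # same data in different units

r = correlation(x, y)
r_scaled = correlation(x_scaled, y)
cov_ratio = sample_covariance(x_scaled, y) / sample_covariance(x, y)

print(r, r_scaled)  # the two correlations agree
print(cov_ratio)    # the covariance grew by the scale factor
```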

6
Q

How to program these

A

cor() for correlation and cov() for covariance

7
Q

What's a corrgram

A

A picture of the correlation matrix: the same variables run along the x and y axes and each cell shows how a pair correlates, drawn as a picture, color shading, etc. The cells above and below the diagonal mirror each other and typically show how strong the correlation is and in what direction

8
Q

How do p-values work with correlation

A

A p-value < 0.05 means the correlation coefficient is significantly different from 0

9
Q

What's a scatter plot matrix?

A

Same idea as a corrgram, but each cell holds an actual scatter plot of that pair of variables

10
Q

How do you plot a scatter plot matrix

A

pairs() function

11
Q

Partial Correlation

A

The correlation between two quantitative variables while controlling for one or more other quantitative variables
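
For the simplest case, controlling for a single third variable z, the partial correlation can be computed from the three pairwise correlations. The sketch below uses that standard first-order formula; the function name and the input correlations are made up, and controlling for several variables at once needs a more general method than this.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations.
r_partial = partial_correlation(r_xy=0.5, r_xz=0.3, r_yz=0.4)
print(round(r_partial, 4))

# If z is unrelated to both x and y, controlling for it changes nothing.
unchanged = partial_correlation(r_xy=0.5, r_xz=0.0, r_yz=0.0)
print(unchanged)  # 0.5
```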

12
Q

What is regression used for

A

1) Identifying explanatory variables that are related to an outcome/response variable
2) Describing the form of the relationship between the dependent and independent variables (the general relationship)
3) Providing an equation to predict the response variable from the explanatory variable (e.g. a calibration curve)

13
Q

What is ordinary least squares regression

A

A quantitative dependent variable is predicted from a weighted sum of predictor variables, where the weights are parameters estimated from the data

14
Q

Goal of regression

A

Choose the model parameters (the coefficients, i.e. slope and intercept) that minimize the difference between the actual and predicted values

15
Q

What is a residual

A

the difference between the observed and fitted value

16
Q

What is true about the sum of residuals

A

In a linear regression model fitted by least squares with an intercept, the residuals sum to 0

17
Q

What determines best fit

A

Minimizing the sum of squared differences between the observed and fitted values
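
A small sketch can make the least-squares idea concrete for one predictor. It uses the textbook closed-form slope and intercept, then checks that the residuals sum to zero and that nudging the slope makes the sum of squared residuals worse. The function names and data are hypothetical.

```python
def fit_line(x, y):
    """Closed-form least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of (observed - fitted)^2 for a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
best_sse = sum_squared_residuals(x, y, intercept, slope)
worse_sse = sum_squared_residuals(x, y, intercept, slope + 0.1)

print(intercept, slope)      # about 2.2 and 0.6
print(sum(residuals))        # about 0 (floating-point rounding aside)
print(best_sse < worse_sse)  # True
```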

18
Q

Assumptions for linear regression

A

Error in x should be negligible
Error in the y values must be normally distributed (it is the error in the dependent variable, i.e. the residuals, that needs to be normally distributed)
Variance of the error in y should be constant across the range of interest (stdev constant)
x and y should be continuous

19
Q

What's a high leverage point and how to deal with it

A

A point that has more influence than the others on the fit (and so on r^2)
Can deal with it by using evenly spaced x values

20
Q

How to calculate regression

A

Determine the residuals (y - y fitted), square them and sum them
From that sum calculate the residual standard deviation: the deviation of the data points from the regression line
Then calculate the stdev for the slope
and the stdev for the y intercept
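
The quantities listed above can be sketched for a straight-line fit with the usual textbook formulas: the residual standard deviation uses n-2 degrees of freedom, and the slope and intercept standard deviations follow from it. This is an illustrative Python sketch with hypothetical names and data, not the course's worked example.

```python
import math

def regression_errors(x, y):
    """Residual stdev and stdevs of slope and intercept for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # (y - y fitted) squared and then summed.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_resid = math.sqrt(sse / (n - 2))  # two fitted parameters
    s_slope = s_resid / math.sqrt(sxx)
    s_intercept = s_resid * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return s_resid, s_slope, s_intercept

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

s_resid, s_slope, s_intercept = regression_errors(x, y)
print(round(s_resid, 4), round(s_slope, 4), round(s_intercept, 4))
```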

21
Q

How to inspect residuals

A

plot them - should be scattered around zero with no pattern

22
Q

How to tell how influential a data point is

A

Cook's distance
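
As a rough illustration, Cook's distance for a straight-line fit can be computed from each point's residual and its leverage (hat value). The sketch below uses the standard formula with p = 2 parameters; the function name and data are made up, and the last point is deliberately placed far from the others in x.

```python
def cooks_distances(x, y):
    """Cook's distance for each point in a straight-line least-squares fit."""
    n, p = len(x), 2  # p counts the intercept and the slope
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverages = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]

# Hypothetical data: the last point sits far from the rest in x.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 2.0]

distances = cooks_distances(x, y)
most_influential = max(range(len(distances)), key=lambda i: distances[i])
print(most_influential)  # 4, the far-out point
```

A common rule of thumb treats points with a distance well above the rest (often above 1) as worth a closer look.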

23
Q

How do the t test and p-value relate to regression

A

If the p-value for a coefficient's t test is less than 0.05, the coefficient is significantly different from 0: there is a relationship

24
Q

R^2

A

Shows how well the points fit the regression line: the fraction of the variation in y that the model explains
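
One way to see what R^2 measures is to compute it as 1 minus the ratio of the residual sum of squares to the total sum of squares. The Python below is a minimal sketch for a straight-line fit, with a hypothetical function name and data.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE/SST for a straight-line least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r2 = r_squared(x, y)
print(round(r2, 3))  # 0.6
```

For a single predictor this equals the square of the correlation coefficient between x and y.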

25
How to test your prediction analysis
Fit the regression on a random 80% of the data and hold out the other 20% to test the predictions afterwards
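
A minimal sketch of that 80/20 idea, in plain Python: shuffle the row indices, fit on the first 80%, then measure the error on the held-out 20%. The data here are simulated from a made-up line with noise, and the seed, names, and error measure (RMSE) are choices made for the example only.

```python
import random

def fit_line(x, y):
    """Closed-form least-squares intercept and slope."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(42)  # fixed seed so the split is repeatable
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5) for xi in xs]  # simulated data

# Random 80/20 split of the row indices.
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = len(indices) * 8 // 10
train_idx, test_idx = indices[:cut], indices[cut:]

intercept, slope = fit_line([xs[i] for i in train_idx],
                            [ys[i] for i in train_idx])

# Evaluate only on the rows the model never saw.
test_errors = [ys[i] - (intercept + slope * xs[i]) for i in test_idx]
rmse = (sum(e ** 2 for e in test_errors) / len(test_errors)) ** 0.5
print(len(train_idx), len(test_idx), round(rmse, 3))
```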
26
What are min-max accuracy and MAPE
Min-max accuracy and mean absolute percent error (MAPE). Min-max accuracy tells you how far the model is from a perfect model (1 is perfect). MAPE measures the same thing as an error: a MAPE of 10% means the forecast is off by 10% on average, and accuracy is 100% minus MAPE (e.g. a MAPE of 0.49 means the model is 51% accurate)
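
Both measures are simple averages over the actual and predicted values. The sketch below uses the commonly quoted definitions (mean of min/max for each pair, and mean of absolute error divided by the actual value); these definitions, the function names, and the numbers are assumptions for illustration, so check them against the course material.

```python
def min_max_accuracy(actual, predicted):
    """Mean of min(actual, predicted) / max(actual, predicted); 1 is perfect."""
    ratios = [min(a, p) / max(a, p) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)

def mape(actual, predicted):
    """Mean absolute percent error, returned as a fraction (0.10 = 10%)."""
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Hypothetical actual and predicted values (all positive).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 180.0, 300.0, 440.0]

accuracy = min_max_accuracy(actual, predicted)
error = mape(actual, predicted)
print(round(accuracy, 4))  # close to 1 means close to a perfect model
print(round(error, 4))     # 0.075, i.e. off by 7.5% on average
```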
27
Polynomial regression (2nd order): how does it change
Instead of y = a + bx, the model is y = a + bx + cx^2
28
What is multiple linear regression used for
When you have more than one predictor variable; each extra predictor adds a term to the model (e.g. a cubic polynomial has 3 predictor terms: x, x^2 and x^3)
29
How to interpret the regression slope
The slope is the change in the dependent variable for a one-unit increase in the independent variable
30
Confidence interval for regression
A 95% confidence interval says we are 95% confident that the interval contains the true value. So rather than reporting a single absolute value, the prediction is reported as a range that the actual value is expected to fall within
31
How do you test whether the errors in the dependent variable are independent
Durbin-Watson test
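
The Durbin-Watson statistic itself is easy to compute from the residuals in their original order: the sum of squared differences between consecutive residuals, divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 positive autocorrelation, and values toward 4 negative autocorrelation. The sketch below computes only the statistic, not its p-value, and the residual lists are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals given in observation order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that drift slowly: neighbours look alike.
drifting = [0.5, 0.6, 0.4, -0.5, -0.6, -0.4]
# Hypothetical residuals that flip sign every step.
alternating = [1.0, -1.0, 1.0, -1.0]

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(round(dw_drifting, 3))  # well below 2
print(dw_alternating)         # 3.0, above 2
```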
32
What is global validation
A procedure that runs a variety of tests to see whether the regression is valid: skewness, kurtosis, equal variances, etc.
33
How to test for outliers
outlierTest() gives a Bonferroni-adjusted p-value
34
What is the hat statistic
The hat value tells you whether there is a high leverage point (an unusual combination of predictor values). The average hat value is p/n (p is the number of parameters in the model including the intercept, and n is the sample size); you can plot the hat values with cutoff lines at 2 or 3 times p/n and see where the points end up
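
For a straight-line model the hat values have a simple closed form, which makes the p/n rule easy to try out. The sketch below computes them and flags points above 2 times p/n; the function name and x values are hypothetical, and models with several predictors need the full hat matrix instead.

```python
def hat_values(x):
    """Leverage of each point in a straight-line fit: 1/n + (x - mean)^2 / Sxx."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Hypothetical x values: the last one is far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
p = 2  # parameters in the model: intercept and slope
n = len(x)

hats = hat_values(x)
average_hat = p / n
flagged = [i for i, h in enumerate(hats) if h > 2 * average_hat]

print([round(h, 2) for h in hats])
print(flagged)  # [4], the point far out in x
```

The hat values always add up to p, which is why their average is p/n.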
35
What is an AV plot
Added variable plot: another way to check for influential points
36
What is an influence plot
Shows outliers, leverage, and influential observations in one plot: the size of each point is its influence, and one axis shows the hat value
37
What are corrective measures if you identify problems with a regression
Delete observations, transform variables, add or delete variables, change approach
38
What is a nested model for
To look at multiple predictors and see which set does the best job of explaining the response
39
What is AIC
Akaike information criterion: takes into account the model's fit while penalizing the number of parameters; a smaller value is preferred (can compare models with more or fewer predictors and show which is the better fit)
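
To illustrate the comparison, the sketch below scores two nested least-squares models on the same data: an intercept-only model and a straight line. It uses the common least-squares shortcut AIC = n*ln(SSE/n) + 2k, which drops constant terms, so only the difference between models is meaningful and the numbers will not match what a statistics package prints. The names and data are hypothetical.

```python
import math

def aic_least_squares(sse, n, k):
    """AIC up to an additive constant, for a least-squares fit with k coefficients."""
    return n * math.log(sse / n) + 2 * k

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Model 1: intercept only (predict the mean of y everywhere).
sse_null = sum((yi - mean_y) ** 2 for yi in y)

# Model 2: straight line fitted by least squares.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse_line = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

aic_null = aic_least_squares(sse_null, n, k=1)
aic_line = aic_least_squares(sse_line, n, k=2)
print(round(aic_null, 3), round(aic_line, 3))
print("line preferred" if aic_line < aic_null else "intercept-only preferred")
```

Here the extra parameter is worth its penalty: the line lowers the residual sum of squares enough that its AIC comes out smaller.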