Exam 2 Flashcards

(67 cards)

1
Q

What is sample correlation

A

Measures the strength of the linear relationship between two variables. It is denoted by R, and zero indicates no correlation

2
Q

Association vs correlation

A

Association is about general relatedness of two variables, while correlation is about linearity specifically

3
Q

rnorm() function

A

Generates a vector of random numbers with a normal distribution
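A minimal sketch of rnorm() in use (the mean and sd values here are illustrative):

```r
set.seed(1)
x <- rnorm(100, mean = 10, sd = 2)  # 100 draws from a normal with mean 10, sd 2
mean(x)   # near 10
sd(x)     # near 2
hist(x)   # roughly bell-shaped
```

With no mean/sd arguments, rnorm(n) draws from the standard normal.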

3
Q

myCor() function

A

Performs correlation and linear regression for all pairs of numeric columns in the input data frame.

4
Q

Do outliers have an effect on correlation

A

Yes, they can make the correlation artificially high or low

5
Q

What is jittering

A

Adding a small amount of random, normally distributed noise to both the x and y values. This allows us to see overlapping observations more clearly
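A hand-rolled jittering sketch, assuming discrete scores that overplot (the noise sd of 0.1 is an illustrative choice):

```r
set.seed(1)
x <- rep(1:5, times = 20)        # many points stack on top of each other
y <- rep(1:4, times = 25)
plot(x + rnorm(100, sd = 0.1),   # small normal noise separates
     y + rnorm(100, sd = 0.1))   # the overlapping observations
```

Base R also has jitter(), which adds uniform rather than normal noise.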

6
Q

What function allows us to calculate correlation

A

cor()
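A minimal cor() example, using the built-in mtcars data:

```r
cor(mtcars$wt, mtcars$mpg)           # correlation between two vectors
cor(mtcars[, c("mpg", "wt", "hp")])  # correlation matrix for several columns
```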

7
Q

What function allows us to fit a regression line to data

A

lm()
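A minimal lm() example, again on the built-in mtcars data:

```r
fit <- lm(mpg ~ wt, data = mtcars)  # regress mpg on weight
coef(fit)     # intercept and slope
summary(fit)  # coefficient tests, R-squared, residual standard error
```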

8
Q

What function allows us to calculate the confidence interval for the true correlation

A

cor.test()
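A sketch of cor.test() on the same built-in data:

```r
ct <- cor.test(mtcars$wt, mtcars$mpg)
ct$estimate  # sample correlation r
ct$conf.int  # 95% confidence interval for the true correlation
ct$p.value   # test of H0: true correlation = 0
```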

9
Q

What is bootstrapping

A

Allows us to estimate the distribution of an estimator by resampling the data. Bootstrap samples are drawn with replacement
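A minimal bootstrap sketch for a correlation; 1000 resamples is an arbitrary illustrative choice:

```r
set.seed(1)
n <- nrow(mtcars)
boot_r <- replicate(1000, {
  idx <- sample(1:n, n, replace = TRUE)  # resample rows WITH replacement
  cor(mtcars$wt[idx], mtcars$mpg[idx])   # recompute the estimator
})
quantile(boot_r, c(0.025, 0.975))        # bootstrap percentile 95% CI
```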

10
Q

Parametric vs non parametric tests

A

Parametric tests assume certain conditions of the data (usually assumptions about normality, variance, standard deviation, etc). Nonparametric tests make fewer assumptions

11
Q

What does it mean if the bootstrap confidence interval is wider than the theoretical ci

A

The underlying assumptions of the model may not be satisfied. Heteroskedasticity and outliers can contribute to this

12
Q

What is a permutation test / what does it do

A

Allows us to quantify the difference/relationship between groups/variables. Permuted data is essentially just reshuffled data
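A sketch of a permutation test for a difference in group means (here, mpg by transmission type in mtcars):

```r
set.seed(1)
obs <- mean(mtcars$mpg[mtcars$am == 1]) - mean(mtcars$mpg[mtcars$am == 0])
perm <- replicate(5000, {
  shuffled <- sample(mtcars$mpg)  # reshuffle WITHOUT replacement
  mean(shuffled[mtcars$am == 1]) - mean(shuffled[mtcars$am == 0])
})
mean(abs(perm) >= abs(obs))       # two-sided permutation p-value
```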

13
Q

sample() function

A

Takes a sample of data with or without replacement. If replace = TRUE, it is used for bootstrapping; if replace = FALSE, for a permutation test
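A quick illustration of the three modes:

```r
x <- 1:10
sample(x, 5)                   # 5 values, without replacement
sample(x, 10, replace = TRUE)  # bootstrap-style resample (duplicates possible)
sample(x)                      # a full permutation of x
```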

14
Q

rep()

A

rep(x, times)
Replicates the values in x a specified number of times.
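A few examples showing the main arguments:

```r
rep(1:3, times = 2)                # 1 2 3 1 2 3
rep(1:3, each = 2)                 # 1 1 2 2 3 3
rep(c("a", "b"), times = c(2, 3))  # "a" "a" "b" "b" "b"
```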

15
Q

corrplot()

A

Visual representation of correlations

16
Q

What are residuals?

A

Estimated errors of the regression (i.e., the differences between the actual values and the values estimated by the regression line)

17
Q

How do we estimate the standard deviation of the residuals

A

Use sample standard deviation of the residuals as our estimate

18
Q

What are the assumptions of a linear regression model

A

linearity and normal distribution of errors

19
Q

What is a good way to see if data is normally distributed

A

Make a normal quantile plot
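A minimal normal quantile plot sketch:

```r
x <- rnorm(100)
qqnorm(x)  # points close to a straight line suggest normality
qqline(x)  # reference line for comparison
```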

20
Q

If a linear model fits the data well, what should the residuals vs. fitted values plot look like?

A

A formless blob with no patterns

21
Q

After fitting a model, what do we need to do?

A

See if we’ve met the model assumptions

22
Q

What diagnostics do we perform to see if we’ve met model assumptions

A

A normal quantile plot to check for normality, and a plot of residuals vs. fitted values

23
Q

What does r squared do

A

Measures the percentage of variability in Y explained by the model (the X’s)

24
Influential point vs. outlier
Influential point: a point which, if removed, causes a large change in the fit of the model. Outlier: a point with a large residual
25
Steps of multiple linear regression
Identify variables (response and predictors), check relationships (plots), perform regression, identify significant predictors, check model assumptions
26
What is heteroskedasticity
A situation where the variance is not constant across all observations, e.g., residuals that get bigger as x increases
27
In simple linear regression, what is r squared
The square of the correlation
28
Explain multiple r squared
the amount of variability in the response variable Y that is explained by the regression model
29
For multiple regression, what is an ideal R squared
If a more complicated model has a much higher R squared, then it is probably a better model. If a more complicated model has only a slightly higher R squared, then the simpler model is better
30
When do you use adjusted r squared
To account for the number of terms used in a multiple regression model. It is smaller than R squared because it adds a penalty for the number of terms used in the model
31
What is the AIC and what are its assumptions
Measures the goodness of fit of a model. Assumes normally distributed errors with constant variance
32
When comparing models, is a larger or smaller AIC better
A smaller AIC is better
33
What is BIC and what are its assumptions
Similar to AIC (measures the goodness of fit of a model), but it gives a larger penalty for using more parameters to fit the model. Assumes normally distributed errors with constant variance. Again, smaller is better
34
What functions do you use to make a correlation plot
sigcorr <- cor.mtest(x, conf.level = 0.95), then corrplot.mixed(cor(x), p.mat = sigcorr$p), where x is a data frame of numeric columns
35
regsubsets()
Performs best subset selection (identifies best model based on # of predictors), where best is quantified using RSS
36
What is RSS
The sum of the squares of the residuals, i.e., the deviations of predicted values from the actual data
37
How to find best model according to R squared
which.max(x$rsq), where x is the summary of a regsubsets model object
38
What are indicator variables
Aka dummy variables. Binary variables that take the value 0 or 1. Each indicator variable is 1 if the observation is of the specified level, 0 otherwise.
39
When including categorical variables into the model, are all categorical variables entered?
No, one level of each categorical variable is not entered into the model. The omitted level becomes the reference level for the model
40
What do we call variance that is not constant
Heteroskedasticity
41
When do we use a box cox procedure
If there is heteroskedasticity and the variance is some function of the mean. Use boxcox(x), where x is a linear model (lm) of some data
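A sketch of the procedure, assuming the MASS package (where boxcox() lives) and the built-in cars data:

```r
library(MASS)
fit <- lm(dist ~ speed, data = cars)
bc <- boxcox(fit)      # plots profile log-likelihood against lambda
bc$x[which.max(bc$y)]  # lambda with the highest log-likelihood
```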
42
What is backwards stepwise regression
Variable selection method in multiple regression --> starts with all predictors and iteratively removes least significant ones until only statistically significant variables are left
43
What is best subset regression
A model selection technique that examines all possible combinations of predictor variables to find the best-fitting model for a given response variable (selecting the best SUBSET of predictors)
44
When can't you use best subset regression?
If we have categorical predictors
45
How do evaluate categorical variables
If any level (indicator variable) is significant, leave the entire categorical variable (all levels) in the model. If all levels are non-significant, remove the categorical predictor. Use an ANOVA table, which tests all levels simultaneously
46
What do interaction plots help us to do?
Look at how means change for various combinations of data & variables
48
Anova() function
Anova(x, type = 3), where x is a fitted model object (e.g., from lm() or aov()); performs type III tests
49
what are the assumptions of anova
The data have a normal distribution, the groups have the same standard deviation (but possibly different means), and all observations are independent. The conditions are considered met as long as the ratio of the largest to smallest group sample standard deviation is less than 2
50
how do we check anova assumptions
qq plot of residuals and fitted vs residuals plot
51
before you fit an anova model, what must you do
check the ratio of max/min standard deviations
52
which function(s) do we use to fit an anova model
aov() or lm()
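A minimal sketch of both fits on the same built-in data:

```r
fit1 <- aov(mpg ~ factor(cyl), data = mtcars)
summary(fit1)  # ANOVA table

fit2 <- lm(mpg ~ factor(cyl), data = mtcars)
anova(fit2)    # equivalent ANOVA table from the lm fit
```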
53
how do we fit a model without an intercept
Use “-1”, e.g., lm(y ~ x - 1)
54
Can we use confidence intervals to compare group means
No
55
How do we test for equality of variances between groups (assuming normality)? What r function? What assumptions does this test make?
Using Bartlett’s test: bartlett.test(data, groups). It assumes normality. A non-significant p-value means there is no evidence of a difference in variances
56
How do we test for equality of variances between groups (assuming no normality)? What r function? What assumptions does this test make?
Using the Levene test: leveneTest(data, groups). It does not require normality
57
When do we use Welch’s ANOVA (as opposed to a regular one-way ANOVA)? Which R function?
When we have unequal variances. oneway.test(x ~ y, data = df)
58
When do we use a Kruskal-Wallis test (as opposed to one-way ANOVA)? What R function?
A non-parametric test; we use it when the assumptions of ANOVA are not met (i.e., when variances are unequal or the data are not normally distributed). kruskal.test()
59
What is a box cox transformation? How do we interpret it
Makes the data more normally distributed. Transforms the data by applying a power (lambda) to it. Lambda is the x-axis of the Box-Cox plot
60
When do we use a two way anova? Which r function do we use?
When we have two independent variables (factors). Example: comparing average test scores of students at different schools AND different grade levels. aov()
61
When do we use summary.lm
To get regression-style summary information (coefficient estimates) from an ANOVA model
62
Why/when do we use tukey's comparison test? Which function
It tells us how different the group means are, pair by pair, adjusting for multiple comparisons. TukeyHSD(aov)
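A minimal sketch:

```r
fit <- aov(mpg ~ factor(cyl), data = mtcars)
TukeyHSD(fit)  # pairwise differences in group means with adjusted p-values and CIs
```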
63
If we use boxCox on data that is not normally distributed with unequal variance and lambda is zero, what should we do?
Log transformation
64
How do we use the Anova() function
Anova(lm, type = 3)
65
Explain covariate vs factor
A covariate is a continuous (numeric) predictor, while a factor is a categorical predictor
66
What does ancova do/mean
ANCOVA stands for analysis of covariance. We fit a separate slope between x and y for each level of some other categorical variable.
67
What is a GLM
Generalized linear model. An umbrella term for multiple kinds of linear models, including regression, one-way ANOVA, two-way ANOVA, and one- and two-sample tests