R commands Flashcards
(19 cards)
Possible model fitting for binary grouped data
glm(cbind(successes, insuccesses)~x, family=binomial(link=*))
- * = logit canonical by default, probit, cloglog;
- Different links comparable with AIC only;
- summary(fit): βs give the effect on log-odds, have to exp(β) to get the effect on odds.
Possible model for binary ungrouped data
glm(cbind(y, n-y)~x, data=d, family=binomial(link=))
glm(y~x, data=d, family=binomial(link=))
- Plots make no sense, we would have two lines (one per option);
- Deviance test is unreliable;
- Same coefficients as grouped data;
- Much larger deviances, DFs and AICs;
- Do not compare AIC with grouped data’s;
- Different overdispersion than grouped and different β’s SE;
- Can use test=”F”in anova().
Overdispersion fitting
glm(y~x, family=quasi____(link=*))
- No AIC;
- t values instead of Z;
- φ estimated with Pearson statistics (not deviance!);
- Can use test=”F” in anova test.
Goodness of fit for grouped data GLMMs
Only if enough ni for each group! Results might differ between the two!
Deviance test, for total model (predictors > 1):
qchisq(0.05, summary(fit)$df.residual, lower.tail=F)
- If lower than ResDev, reject H0
pchisq(deviance(fit), summary(fit)$df.residual, lower.tail=F)
Pearson test:
pchisq(sum(residuals(fit, type=”pearson”)^2), summary(fit)$df.residual, lower.tail=F)
Goodness of fit for ungrouped data
Hosmer-Lemeshow test:
(hl = hoslem(d$y==1, predict(fit, type=”response”))
Single predictor testing
anova(fit, test=*)
- * = “Chisq”, “LRT”, “Rao”;
- * = “F” only if quasilikelihood approach or Gamma regression;
- One test per predictor, added sequentially.
Model comparison for nested models
anova(fitmin, fitmax, test=*)
- Not for different link functions;
- *=”F” if overdispersion estimation or Gamma regression.
Model comparison for non-nested models
AIC(fit)
- NOT for grouped vs ungrouped comparisons because the scale of y is different;
- For different link functions too;
Predictions for each group
predict(fit, newdata=NULL, type=*)
- * = response gives the parameter, link gives exp(βs).
Possible model fitting for count data
glm(y~offset(log(delta))+x, data=d, family=poisson(link=*))
* = log default, identity, sqrt
- exp(Intercept) = average number of y;
- If we have an offset, exp(coef) are the effects on the rate of y, baseline count = delta * exp(β).
Offset testing for count data
H0: log(delta)=1
fit0 = glm(y~log(delta)+x, family=poisson)
C = rep(0, p)
C[2] = 1
d = 1
lht(fit0, C, rhd=d)
Change in reference category
new.x = relevel(d$x1, “newref”)
contrasts(d$x) = contr.treatment(#levels, base=#ofnewreflevel)
- Deviances stay the same;
- β’s significance might change.
Possible model fitting for nonnegative data
glm(y~x, data=d, family=Gamma(link=”log”)
- Overdispersion is always estimated via Pearson statistics;
- The statistics are now t (and not Z);
- If log is not specified, canonical one is used and signs are flipped;
- Can now use fit=”F” in anova();
- Never compare AICs with a logN model.
Unordered categories modelling
library(mnet)
multinom(y~x, d)
- Coefficients and std errors for every c-1 category and covariate;
- exp(coef) is the increase/decrease in odds of being in that category relative to the reference one;
Nested model comparison for categorical regression
dev = fitmin$dev - fitmax$dev
df = fitmax$edf - fitmin$edf
1-pchisq(dev, df)
Ordered non-sequential categories modelling
polr(y~x, data=d)
- Parametrization gives coefficients η = -β;
- Thresholds for passing categories (latent variable) are “intercepts”;
- exp(β) < 1: we favor lower categories, effect on the odds;
- exp(β*10): effect on original scale of increasing x by 10.
Possible model fitting for Generalized Linear Mixed Models
fit = glmer(y~x+(1|group), data=d)
fit = glmer(y~x+(1+x|group), data=d)
fit = glmer(y~x+(1+x||group), data=d) if we want uncorrelated gammas
- Fixed effects βs are on the linear predictor’s scale;
- Correlation is the rescaled Fisher matrix;
Model comparison for Generalized Linear Mixed Models
AIC(fit)
- Even for GLMs and GLMMs.
anova(fit1, fit2, test=”Chisq”)
- Only between same type of models.
Possible model fitting for Generalized Additive Models
fit = gam(y~s(x, bs=”tp”), data=d)
fit = gam(y~te(x1, x2), data=d) if we use bivariate, we use tensors
- Parametric coefficients (intercept) unless -1 in formula;
- edf is how many parameters I’m using to estimate the polynomial of non-linear effect
- plot(fit, shift=coef(fit)[1]) if we want to include the intercept too;
- Trust only highly data-dense zones!