R Parameterized Modeling Flashcards

1
Q

Omit Missing Values From Data Frame

A

na.omit(DataFrame)
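
A minimal sketch of what this does, using a hypothetical toy data frame `df` (not part of the original deck). na.omit() drops every row that has an NA in any column; complete.cases() is a base R alternative that selects the same rows.

```r
# Hypothetical toy data frame with missing values in both columns
df <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

# Keep only the rows that have no NA in any column
clean <- na.omit(df)
n.clean <- nrow(clean)

# complete.cases() returns a logical vector that selects the same rows
same <- df[complete.cases(df), ]
```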

2
Q

How to count missing values in a column

A

sum(is.na(DataFrame$column))
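
The idiom works because is.na() returns a logical vector and sum() counts its TRUE values. A small sketch on a hypothetical data frame `df` (an assumption for illustration), which also shows how to count every column at once:

```r
# Hypothetical toy data frame
df <- data.frame(x = c(1, NA, 3, NA), y = c(NA, 2, 3, 4))

# Number of missing values in one column
n.missing.x <- sum(is.na(df$x))

# Number of missing values in every column at once
n.missing.all <- colSums(is.na(df))

# Proportion of missing values in one column
prop.missing.x <- mean(is.na(df$x))
```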

3
Q

How to perform Best Subset Selection

A

library(ISLR)   # provides the Hitters data set
library(leaps)  # provides regsubsets()
regfit.full = regsubsets(Salary ~ ., data = Hitters)
# By default, regsubsets() only reports models with up to eight variables;
# you can override this with the nvmax option:
regfit.full = regsubsets(Salary ~ ., data = Hitters, nvmax = 19)

4
Q

How to interpret output of best subset model

A

The summary() of the fit shows the best model for each number of variables, one row per model size. For example, the row for the two-variable model marks the two selected predictor columns with an asterisk ("*").
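
A sketch of reading that output programmatically, assuming the ISLR and leaps packages are installed. The names `reg.summary` and `best.size` are chosen here for illustration; rows with a missing Salary are dropped first.

```r
library(ISLR)   # Hitters data set
library(leaps)  # regsubsets()

Hitters <- na.omit(Hitters)  # Salary contains missing values
regfit.full <- regsubsets(Salary ~ ., data = Hitters, nvmax = 19)
reg.summary <- summary(regfit.full)

# Row 2 of the printed matrix: "*" marks the predictors in the best two-variable model
two.var.row <- reg.summary$outmat[2, ]
names(two.var.row)[two.var.row == "*"]

# Pick the model size with the highest adjusted R^2 and show its coefficients
best.size <- which.max(reg.summary$adjr2)
coef(regfit.full, best.size)
```

Other criteria stored in the summary, such as `cp` and `bic`, can be used the same way with which.min().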

5
Q

How to plot the results of best subset selection

A

plot(regfit.full, scale = "adjr2")

See ?plot.regsubsets for help.

6
Q

Do forward/backward stepwise selection

A

Use the method argument of the regsubsets() function:

regfit.fwd = regsubsets(Salary ~ ., data = Hitters, method = "forward")
regfit.bwd = regsubsets(Salary ~ ., data = Hitters, method = "backward")

7
Q

Create a training/test set

A

set.seed(1)
train = sample(c(TRUE, FALSE), nrow(Hitters), replace = TRUE)
test = (!train)
regfit.best = regsubsets(Salary ~ ., data = Hitters[train, ])
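
A sketch of the same splitting idea on the built-in mtcars data, so it runs without extra packages. The data set, the 70% split size, and the lm() model are assumptions for illustration. Note that the logical-mask approach assigns each row independently, so the size of the training set is itself random; sampling row indices gives a fixed-size split instead.

```r
set.seed(1)
n <- nrow(mtcars)

# Logical mask, as on the card: each row lands in train or test at random
train <- sample(c(TRUE, FALSE), n, replace = TRUE)
test <- !train
table(train)  # the split sizes depend on the seed

# Alternative: a fixed-size split obtained by sampling row indices
train.idx <- sample(n, size = floor(0.7 * n))
fit <- lm(mpg ~ wt + hp, data = mtcars[train.idx, ])

# Evaluate on the held-out rows only
pred <- predict(fit, newdata = mtcars[-train.idx, ])
test.mse <- mean((mtcars$mpg[-train.idx] - pred)^2)
```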

8
Q

Create a matrix of X’s, which is a data structure used by many regression packages

A

model.matrix(Formula, Dataframe)
This function builds the design matrix: it adds an intercept column of 1s and dummy-codes factor variables for you. Many regression functions use it behind the scenes to fit models and predict values. You must call it yourself when a fitting function has no predict method (like regsubsets).
Remember: Formula = Y ~ .
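
A sketch of predicting by hand from a design matrix, on a hypothetical toy data frame `df`. lm() is used here only because its predict() gives something to check against; the same pattern (select the columns named by the coefficients, then multiply) is what you would apply to coefficients taken from a regsubsets fit.

```r
# Hypothetical toy data with one numeric predictor and one factor
df <- data.frame(
  y = c(3.1, 4.0, 5.2, 6.1, 6.9, 8.2),
  x = c(1, 2, 3, 4, 5, 6),
  g = factor(c("a", "b", "c", "a", "b", "c"))
)

# Design matrix: intercept column plus dummy columns for the factor
X <- model.matrix(y ~ ., data = df)
design.cols <- colnames(X)

# Manual prediction: pick the columns named by the coefficients, then multiply
fit <- lm(y ~ ., data = df)
beta <- coef(fit)
manual.pred <- drop(X[, names(beta)] %*% beta)

# Matches the built-in predict() up to numerical tolerance
matches <- all.equal(unname(manual.pred), unname(predict(fit)))
```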

9
Q

What should you type in the R console to install the “car” package?

A

install.packages("car")

10
Q

What should you type to create a matrix “a” comprising all natural numbers from 1 to 10, with 2 rows and 5 columns?

A

a = matrix(1:10, 2, 5)
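
One detail worth seeing: matrix() fills column by column unless told otherwise. A short sketch, where the matrix `b` is added here only for comparison:

```r
# Default: values fill down each column in turn
a <- matrix(1:10, nrow = 2, ncol = 5)
a.first.row <- a[1, ]   # 1 3 5 7 9

# byrow = TRUE fills across each row instead
b <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)
b.first.row <- b[1, ]   # 1 2 3 4 5

a.dim <- dim(a)         # 2 5
```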

11
Q

PCR in R

A

library(pls)

set.seed(2)
pcr.fit = pcr(Salary ~ ., data = Hitters, scale = TRUE, validation = "CV")
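
A sketch of inspecting the fit afterwards, assuming the ISLR and pls packages are installed. Dropping rows with a missing Salary first is an assumption added here, not part of the card.

```r
library(ISLR)  # Hitters data set
library(pls)   # pcr(), validationplot()

Hitters <- na.omit(Hitters)  # Salary contains missing values
set.seed(2)
pcr.fit <- pcr(Salary ~ ., data = Hitters, scale = TRUE, validation = "CV")

# Cross-validated error and variance explained for each number of components
summary(pcr.fit)

# Plot cross-validated MSE against the number of components
validationplot(pcr.fit, val.type = "MSEP")
```

The number of components is then usually chosen where the cross-validated error curve is lowest or levels off.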

12
Q

PLS in R

A

library(pls)
set.seed(1)
pls.fit = plsr(Salary ~ ., data = Hitters, subset = train, scale = TRUE, validation = "CV")
summary(pls.fit)

13
Q

PLS vs. PCR

A

PCR builds components (linear combinations of the predictors) that capture as much variance in the predictors as possible, without looking at the response. PLS searches for components that explain variance in BOTH the predictors and the response, so it usually explains more of the response variance with fewer components.
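
The difference can be seen in the first direction each method picks. The sketch below uses base R only, on the built-in mtcars data with an arbitrary choice of four predictors (both are assumptions for illustration). It computes the first principal component direction and the first PLS direction for a single response, which is proportional to X'y when the predictors are centered.

```r
# Standardized predictors and centered response (illustrative choice of columns)
X <- scale(as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")]))
y <- mtcars$mpg - mean(mtcars$mpg)

# PCR: first principal component direction, chosen without looking at y
pc.dir <- prcomp(X)$rotation[, 1]

# PLS: first direction is proportional to X'y, so it depends on the response
pls.dir <- drop(crossprod(X, y))
pls.dir <- pls.dir / sqrt(sum(pls.dir^2))  # normalize to unit length

# Scores along each direction
pc.score <- drop(X %*% pc.dir)
pls.score <- drop(X %*% pls.dir)

# The PC direction has the larger score variance ...
pc.var <- var(pc.score)
pls.var <- var(pls.score)

# ... while the PLS direction has the larger covariance with the response
pc.cov <- abs(cov(pc.score, y))
pls.cov <- abs(cov(pls.score, y))
```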

14
Q

Polynomial regression

A

fit = lm(wage~poly(age, 4), data=Wage)
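
A sketch of the same pattern on the built-in mtcars data, so it runs without the ISLR package; the data set and the degree of 2 are assumptions for illustration. By default poly() produces orthogonal polynomials, and raw = TRUE gives plain powers instead. The coefficients differ between the two, but the fitted values are the same.

```r
# Orthogonal polynomial basis (the default)
fit.orth <- lm(mpg ~ poly(hp, 2), data = mtcars)

# Raw powers hp and hp^2
fit.raw <- lm(mpg ~ poly(hp, 2, raw = TRUE), data = mtcars)

# Same fitted curve, different parameterization
same.fit <- all.equal(fitted(fit.orth), fitted(fit.raw))

# Predict over a grid of hp values, with standard errors
hp.grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50))
preds <- predict(fit.orth, newdata = hp.grid, se.fit = TRUE)
```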
