R Sampling Methods Flashcards

1
Q

How to pull a random sample in R

A

sample(). Use it to draw random row numbers for subsetting. If a data set has 392 rows, sample(392, 196) returns 196 distinct random integers between 1 and 392 (sampling is without replacement unless replace = TRUE).
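
A minimal sketch of the card's example. The seed value and the bootstrap-style line with replace = TRUE are illustrative additions, not part of the original card.

```r
# Reproducible random draw of row numbers (seed value is arbitrary)
set.seed(1)
n <- 392                     # number of rows, as in the card's example
train <- sample(n, n / 2)    # 196 distinct integers from 1..392 (no replacement by default)

length(train)                # 196 indices drawn
anyDuplicated(train)         # 0 means no index appears twice
range(train)                 # both ends lie within 1..392

# With replace = TRUE, indices may repeat (as in a bootstrap resample)
boot_idx <- sample(n, n, replace = TRUE)
```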

2
Q

Subsetting within a predictive model

A

lm(y ~ ., data = DF, subset = train)
where train is either (1) a logical vector or (2) a numeric vector of the row numbers of the data frame you want to fit on.

You can combine this with sample() to build the vector of row numbers manually.
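
A minimal sketch showing that both forms of the subset argument select the same rows. It assumes the built-in mtcars data as a stand-in for DF and an illustrative formula mpg ~ wt + hp; the seed is arbitrary.

```r
# mtcars (built into R) stands in for DF; mpg ~ wt + hp is an illustrative formula
set.seed(1)                              # arbitrary seed for a reproducible split
n <- nrow(mtcars)                        # 32 rows
train_idx <- sample(n, n / 2)            # (2) numeric vector: 16 random row numbers
train_lgl <- seq_len(n) %in% train_idx   # (1) logical vector marking the same rows

fit_idx <- lm(mpg ~ wt + hp, data = mtcars, subset = train_idx)
fit_lgl <- lm(mpg ~ wt + hp, data = mtcars, subset = train_lgl)

# Both forms select the same training rows, so the coefficients agree
all.equal(coef(fit_idx), coef(fit_lgl))

# Predict on the held-out rows (negative indexing drops the training rows)
test_pred <- predict(fit_idx, newdata = mtcars[-train_idx, ])
```

Negative indexing with the same vector gives the matching test set, which is the usual reason for building the index vector by hand.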

3
Q

How to perform cross-validation (leave-one-out by default)

A

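library(boot)   # provides cv.glm() and other cross-validation functions
library(ISLR)   # provides the Auto data set (ISLR2 also has it)
glm_fit = glm(mpg ~ horsepower, data = Auto)   # no family given, so this is linear regression, the same fit as lm()
glm_err = cv.glm(Auto, glm_fit)                # K not given, so this is LOOCV
glm_err$delta   # CV error estimates (raw, then bias-corrected); for LOOCV the two numbers are nearly identical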
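
A minimal sketch of what cv.glm() computes for LOOCV, assuming the built-in mtcars data (mpg ~ hp) as a stand-in for Auto so it runs without ISLR. The model is refit n times, each time predicting the single held-out row; the mean squared error of those predictions should match the first element of delta.

```r
library(boot)   # cv.glm(); boot ships with R as a recommended package

# mtcars stands in for Auto so the sketch runs without the ISLR package
fit <- glm(mpg ~ hp, data = mtcars)   # default gaussian family: ordinary least squares
cv_out <- cv.glm(mtcars, fit)         # K defaults to n, i.e. LOOCV
cv_out$delta                          # raw and bias-corrected LOOCV estimates

# The raw estimate by hand: refit n times, each time predicting the left-out row
n <- nrow(mtcars)
sq_err <- numeric(n)
for (i in seq_len(n)) {
  fit_i <- glm(mpg ~ hp, data = mtcars[-i, ])
  pred_i <- predict(fit_i, newdata = mtcars[i, ])
  sq_err[i] <- (mtcars$mpg[i] - pred_i)^2
}
manual_loocv <- mean(sq_err)

# Should agree with delta[1] up to floating-point tolerance
all.equal(manual_loocv, as.numeric(cv_out$delta[1]))
```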

4
Q

How to include a polynomial term in a model formula

A

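poly(x, degree), e.g. lm(mpg ~ poly(horsepower, 2), data = Auto). By default poly() builds orthogonal polynomials; poly(x, degree, raw = TRUE) uses the raw powers x, x^2, and so on.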
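
A minimal sketch comparing the polynomial forms, assuming the built-in mtcars data (mpg ~ hp) as a stand-in for Auto. The orthogonal and raw bases give different coefficients but the same fitted curve, and raw = TRUE matches writing the powers out with I().

```r
# mtcars stands in for Auto (illustrative assumption)
fit_orth <- lm(mpg ~ poly(hp, 2), data = mtcars)              # orthogonal basis (default)
fit_raw  <- lm(mpg ~ poly(hp, 2, raw = TRUE), data = mtcars)  # raw hp and hp^2
fit_I    <- lm(mpg ~ hp + I(hp^2), data = mtcars)             # same model as raw = TRUE

# Different bases, identical fitted values
all.equal(fitted(fit_orth), fitted(fit_raw))

# raw = TRUE and I(hp^2) give the same coefficients (names differ, so drop them)
all.equal(unname(coef(fit_raw)), unname(coef(fit_I)))
```

The orthogonal basis is the safer default for higher degrees because its columns are uncorrelated; the raw form is easier to read when you need the coefficients themselves.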

5
Q

How to perform k-fold cross-validation

A

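library(boot)
library(ISLR)   # Auto data set
set.seed(17)    # fold assignment is random; any seed value makes results reproducible
glm_fit = glm(mpg ~ horsepower, data = Auto)
glm_err = cv.glm(Auto, glm_fit, K = 10)   # 10-fold CV
glm_err$delta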
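
A minimal sketch of the idea behind cv.glm(..., K = 10), written out by hand. It assumes the built-in mtcars data (mpg ~ hp) as a stand-in for Auto and an arbitrary seed; the fold assignment here is its own, so the number will not match cv.glm exactly.

```r
# mtcars stands in for Auto; manual 10-fold CV for mpg ~ hp
set.seed(1)                                 # fold assignment is random
k <- 10
n <- nrow(mtcars)
folds <- sample(rep(1:k, length.out = n))   # each row gets a fold label 1..k

fold_mse <- numeric(k)
fold_size <- numeric(k)
for (j in 1:k) {
  test_rows <- which(folds == j)
  fit_j <- glm(mpg ~ hp, data = mtcars[-test_rows, ])        # train on the other k - 1 folds
  pred_j <- predict(fit_j, newdata = mtcars[test_rows, ])    # predict the held-out fold
  fold_mse[j] <- mean((mtcars$mpg[test_rows] - pred_j)^2)
  fold_size[j] <- length(test_rows)
}

# Weight each fold's MSE by its size, since 32 rows do not split evenly into 10 folds
cv_k <- sum(fold_mse * fold_size) / n
```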

6
Q

For loop in R

A

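library(boot)
library(ISLR)                  # Auto data set
set.seed(17)                   # reproducible fold assignment
cv.error.10 = rep(0, 10)       # preallocate: one CV error per polynomial degree
for (i in 1:10) {
  glm.fit = glm(mpg ~ poly(horsepower, i), data = Auto)
  cv.error.10[i] = cv.glm(Auto, glm.fit, K = 10)$delta[1]
}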

7
Q

Omit missing values from a data frame

A

na.omit(DataFrame) returns the data frame with every row that contains an NA removed.
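
A minimal sketch on a small hypothetical data frame, showing that na.omit() drops a row if any of its columns is NA, and that complete.cases() gives the same row test as a logical vector.

```r
# Hypothetical toy data: row 2 has NA in x, row 3 has NA in y
df <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))
clean <- na.omit(df)   # drops every row with an NA in any column
nrow(clean)            # 2 (rows 1 and 4 remain)
complete.cases(df)     # TRUE FALSE FALSE TRUE: the row test na.omit applies
```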

8
Q

How to count missing values in a column

A

sum(is.na(DataFrame$column))
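
A minimal sketch on a small hypothetical data frame. is.na() returns TRUE/FALSE, and sum() counts each TRUE as 1; colSums() applies the same count to every column at once.

```r
# Hypothetical toy data
df <- data.frame(x = c(1, NA, 3, NA), y = c(NA, 2, 3, 4))
sum(is.na(df$x))     # 2 missing values in column x
colSums(is.na(df))   # per-column counts: x = 2, y = 1
```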
