DSE R CODE Flashcards

(34 cards)

1
Q

reading data

A

Advertising = read.csv("Data/Advertising.csv", header = TRUE)
head(Advertising)

2
Q

Generate lm model

A

lm1 = lm(sales ~ TV, data = Advertising)

3
Q

Generate the R output table

A

summary(lm1)

4
Q

Generate confidence intervals for the coefficient of the variable and the constant (95% or 90%)

A

confint(lm1)
confint(lm1,level=0.9)
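
As a cross-check, the interval that confint() reports can be rebuilt by hand as estimate plus or minus a t quantile times the standard error. A minimal sketch, assuming the built-in mtcars data stands in for Advertising (the model and the names fit, est, se and manual are illustrative and not part of the deck):

```r
# Illustrative model on built-in data (mtcars stands in for Advertising)
fit <- lm(mpg ~ wt, data = mtcars)

est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# 90% interval: 5% in each tail, t distribution with the residual degrees of freedom
t_crit <- qt(0.95, df = df.residual(fit))
manual <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

manual
confint(fit, level = 0.9)  # should agree with the manual interval
```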

5
Q

How to plot a scatter plot of sales against TV?

Label the axes
Colour red
Solid dots
With a regression line, line width 3 and colour blue

A

plot(x = Advertising$TV, y = Advertising$sales,
     xlab = "TV", ylab = "Sales", col = "red", pch = 19)
abline(lm1, col = "blue", lwd = 3)

6
Q

How to get the coefficients from the R output table (summary table)?

A

summary(lm4)$coefficients

7
Q

How to determine whether there is a linear relationship between variables? Round to 4 d.p.

A

round(cor(Advertising), digits = 4)

8
Q

How to get the adjusted R-squared?

A

summary(lm_mpg1)$adj.r.squared

9
Q

How to load data from the ISLR package?
Get the Auto data

A

library(ISLR)
data(Auto)

10
Q

How to generate residual diagnostic plots (such as the Q-Q plot)?

A

par(mfrow = c(2, 2))
plot(lm_mpg1)

11
Q

How to generate scale-location plot?

A

plot(lm_mpg2, which = 3)

12
Q

Exclude the name variable from the linear model (mpg on everything except name)

A

lm_mpg4 = lm(mpg ~ . - name, data = Auto)

13
Q

How to add labels to nominal categorical data?

A

Auto$origin = factor(Auto$origin, labels = c("American", "European", "Japanese"))

14
Q

How to generate a logistic model?

A

glm_fit = glm(default ~ balance, data = Default, family = binomial)

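Need to put family = binomial; without it, glm() fits a Gaussian model.

binomial is written WITHOUT quotes here (it is a family function, not a string), although glm() also accepts family = "binomial".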
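
The effect of the family argument can be checked directly. A minimal sketch, assuming the built-in mtcars data (with the 0/1 column am as the response) stands in for Default; the names fit_obj, fit_chr and fit_gauss are illustrative:

```r
# mtcars stands in for Default: am is a 0/1 response, wt a numeric predictor
fit_obj <- glm(am ~ wt, data = mtcars, family = binomial)    # family function
fit_chr <- glm(am ~ wt, data = mtcars, family = "binomial")  # family named as a string

# Both calls give the same logistic fit
all.equal(coef(fit_obj), coef(fit_chr))

# Leaving family out gives a Gaussian fit, not a logistic regression
fit_gauss <- glm(am ~ wt, data = mtcars)
family(fit_gauss)$family
```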

15
Q

How to get predicted probabilities for your own data?

A

df_new = data.frame(student = c("Yes", "No"),
                    balance = c(1500, 1500), income = c(40000, 40000))

predict(glm_fit, newdata = df_new, type = "response")

16
Q

How to generate a confusion matrix by yourself? START FROM THE PREDICTION STEP

A

STEP 1: PREDICT FIRST
glm_prob = predict(glm_fit, type = "response")

STEP 2: TABLE GENERATION
table(glm_prob >= 0.5, Default$default)
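
Naming the two dimensions of the table makes it easier to tell predictions from actual classes, and the overall accuracy follows from the diagonal. A minimal sketch, assuming mtcars stands in for Default (am is the 0/1 response); the names fit, prob, conf_mat and accuracy are illustrative:

```r
# mtcars stands in for Default: am is the observed 0/1 class
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(fit, type = "response")

# Rows are predictions at the 0.5 threshold, columns are actual classes
conf_mat <- table(predicted = prob >= 0.5, actual = mtcars$am)
conf_mat

# Accuracy: share of observations on the diagonal (FALSE/0 and TRUE/1)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```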

17
Q

How to plot the ROC curve and AUC? (Start from prediction)

CURVE: LINE WIDTH 3
COLOUR BLACK

PASTE TEXT
Add a line with intercept 0 and slope 1, dashed, line width 1

A

STEP 1: PREPARE DATA
library(ROCR) # provides prediction() and performance()

pred = prediction(glm_prob, Default$default) # performed on training data

perf = performance(pred, measure = "tpr", x.measure = "fpr")

auc_perf = performance(pred, measure = "auc")@y.values[[1]]

STEP 2: PLOT ROC CURVE
plot(perf, lwd = 3, col = "black")

abline(0, 1, lwd = 1, lty = 2) # add dashed diagonal line

text(0.4, 0.8, paste("AUC =", round(auc_perf, 2))) # add text
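
The AUC can also be cross-checked without ROCR, using its interpretation as the probability that a randomly chosen positive case gets a higher predicted probability than a randomly chosen negative case. The sketch below is a rank-based calculation in base R, not the ROCR method; it assumes mtcars stands in for Default, and the names prob, y and auc_rank are illustrative:

```r
# mtcars stands in for Default: am == 1 is treated as the positive class
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(fit, type = "response")
y    <- mtcars$am

n_pos <- sum(y == 1)
n_neg <- sum(y == 0)

# Rank-sum form of the AUC; average ranks count ties as one half
r <- rank(prob)
auc_rank <- (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc_rank
```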

18
Q

How to generate a validation set?

A

set.seed(1101)

train = sample(10000, 5000, replace = FALSE)

Default_train = Default[train, ]

Default_test = Default[-train, ]
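
The call sample(10000, 5000) relies on Default having 10,000 rows. A minimal sketch of the same split written against nrow(), so that it works for any data frame; mtcars is used here as a stand-in, and the names n, train_set and test_set are illustrative:

```r
set.seed(1101)

# mtcars stands in for Default; the split size is derived from the data
n     <- nrow(mtcars)
train <- sample(n, size = floor(n / 2), replace = FALSE)

train_set <- mtcars[train, ]
test_set  <- mtcars[-train, ]  # negative index keeps every row not in train

c(train = nrow(train_set), test = nrow(test_set))
```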

19
Q

How to generate a confusion matrix from the training set?

A

glm_prob_train = predict(glm_fit2, type = "response") # glm_fit2 is assumed to be fitted on Default_train

table(glm_prob_train > 0.5, Default_train$default)

20
Q

How to get the training error after generating training results?

A

glm_pred = ifelse(glm_prob_train > 0.5, "Yes", "No")

mean(glm_pred != Default_train$default)
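
The same calculation on held-out rows gives the test error; the only change is passing newdata to predict(). A minimal sketch, assuming iris with a made-up Yes/No response stands in for Default (the column virginica and all object names are illustrative):

```r
set.seed(1101)

# iris stands in for Default; virginica is an illustrative Yes/No response
dat   <- transform(iris, virginica = factor(ifelse(Species == "virginica", "Yes", "No")))
train <- sample(nrow(dat), size = 75, replace = FALSE)
dat_train <- dat[train, ]
dat_test  <- dat[-train, ]

fit <- glm(virginica ~ Sepal.Length, data = dat_train, family = binomial)

# Training error: predictions on the rows used for fitting
pred_train <- ifelse(predict(fit, type = "response") > 0.5, "Yes", "No")
train_err  <- mean(pred_train != dat_train$virginica)

# Test error: predictions on the held-out rows via newdata
pred_test <- ifelse(predict(fit, newdata = dat_test, type = "response") > 0.5, "Yes", "No")
test_err  <- mean(pred_test != dat_test$virginica)

c(train = train_err, test = test_err)
```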

21
Q

How to plot the residuals vs leverage plot?

A

plot(lm1, which = 5)

which = 5!!!! not which = 4 (Cook's distance)
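
The quantities behind these two plots can also be pulled out directly with hatvalues() and cooks.distance(). A minimal sketch, assuming mtcars stands in for Advertising; the cutoff of twice the average leverage is a common rule of thumb, not something the deck states:

```r
# mtcars stands in for Advertising
fit <- lm(mpg ~ wt, data = mtcars)

lev  <- hatvalues(fit)        # leverage of each observation
cook <- cooks.distance(fit)   # Cook's distance of each observation

# Leverages sum to the number of coefficients p, so the average is p / n
p <- length(coef(fit))
high_lev <- names(lev)[lev > 2 * p / nrow(mtcars)]  # rule-of-thumb cutoff (assumption)
high_lev

# Observations with the largest Cook's distance
head(sort(cook, decreasing = TRUE), 3)
```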

22
Q

What is the syntax for kNN?

A

library(kknn) # note: kknn, not knn

fit = kknn(formula, train, test, k = k, kernel = "rectangular")

23
Q

How to choose not to standardize the variables for kNN?

A

fit = kknn(formula, train, test, k = k, kernel = "rectangular", scale = FALSE)
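
With kernel = "rectangular" every neighbour gets the same weight, which is ordinary unweighted kNN. To make the roles of k and scale concrete, here is a hand-rolled sketch in base R. It is an illustration of the idea only and does not use or reproduce kknn; the function knn_vote and its arguments are hypothetical names:

```r
# Hand-rolled unweighted kNN classifier (illustration only, not kknn itself)
knn_vote <- function(train_x, train_y, test_x, k, scale = TRUE) {
  train_x <- as.matrix(train_x)
  test_x  <- as.matrix(test_x)
  if (scale) {
    # Standardize both sets with the training means and standard deviations
    mu <- colMeans(train_x)
    s  <- apply(train_x, 2, sd)
    train_x <- sweep(sweep(train_x, 2, mu), 2, s, "/")
    test_x  <- sweep(sweep(test_x, 2, mu), 2, s, "/")
  }
  apply(test_x, 1, function(pt) {
    d <- sqrt(colSums((t(train_x) - pt)^2))    # Euclidean distance to each training row
    nearest <- order(d)[seq_len(k)]            # indices of the k closest rows
    names(which.max(table(train_y[nearest])))  # unweighted majority vote
  })
}

# Usage on built-in data: 100 training rows, 50 test rows
set.seed(1101)
idx  <- sample(nrow(iris), 100)
pred <- knn_vote(iris[idx, 1:4], iris$Species[idx], iris[-idx, 1:4], k = 5)
mean(pred != iris$Species[-idx])  # test misclassification rate
```

Setting scale = FALSE in this sketch skips the standardization step, so predictors measured on larger scales dominate the distance.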

24
Q

How to generate a naive Bayes model?

A

library(e1071)
fit = naiveBayes(formula, data)

25
Q

What assumption does the naive Bayes function in R rely on? (slide 19)

A

Quantitative predictors are assumed to follow a Normal distribution within each class.
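
To see what the Normal assumption means in practice, the classifier can be written out by hand: each predictor gets a class-specific mean and standard deviation, and the Normal log-densities are added across predictors (the "naive" independence step). This is a sketch of the idea in base R on the built-in iris data; it does not use e1071 and makes no claim about that package's internals:

```r
# Gaussian naive Bayes by hand (illustration only; e1071 is not used)
x <- iris[, 1:4]
y <- iris$Species

prior <- table(y) / length(y)                           # class proportions
mu    <- sapply(x, function(col) tapply(col, y, mean))  # classes x predictors
sigma <- sapply(x, function(col) tapply(col, y, sd))

# Log posterior (up to an additive constant) of each class for one observation
log_post <- function(obs) {
  sapply(levels(y), function(cl) {
    log(prior[[cl]]) +
      sum(dnorm(as.numeric(obs), mean = mu[cl, ], sd = sigma[cl, ], log = TRUE))
  })
}

# Predict the class with the highest log posterior
pred <- apply(x, 1, function(obs) names(which.max(log_post(obs))))
mean(pred == y)  # training accuracy
```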
26
Q

Do you tune parameters for naive Bayes in R? (slide 19)

A

No.
27
Q

What is the basic tree syntax?

A

tree.fit = tree(y ~ x, data = data, control = tree.control(nobs = nrow(data)))
28
Q

What is the basic rpart syntax?

A

tree.fit = rpart(y ~ x, data = data, control = rpart.control())
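
In rpart(), the control settings are passed through the named control argument. A minimal sketch on the built-in iris data; the particular values of minsplit and cp are illustrative choices, not recommendations from the deck:

```r
library(rpart)  # recommended package, included with standard R installations

# Classification tree with explicit control settings (values are illustrative)
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(minsplit = 20, cp = 0.01))
fit

# Training misclassification rate
pred <- predict(fit, newdata = iris, type = "class")
mean(pred != iris$Species)
```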
29
Q

What is the syntax for kmeans? Explain the parameters.

A

kmeans(x, centers = k, nstart = n)
"centers" specifies K; "nstart" tells R how many random initializations to perform (the best result is kept).
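
A minimal runnable sketch of the call, assuming the four numeric columns of the built-in iris data as x; the choices centers = 3 and nstart = 20 are illustrative:

```r
set.seed(1101)  # kmeans uses random starting centres

x  <- scale(iris[, 1:4])                   # standardize so no variable dominates
km <- kmeans(x, centers = 3, nstart = 20)  # K = 3, best of 20 random starts

km$tot.withinss                   # total within-cluster sum of squares
table(km$cluster, iris$Species)   # clusters compared with the known species
```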
30
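Q

What is the function for PCA?

A

prcomp(x, scale. = TRUE)
scale. = TRUE standardizes the variables (the argument is named scale.; scale = TRUE also works through partial matching).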
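
A minimal sketch on the built-in USArrests data, showing how the proportion of variance explained follows from the sdev component of the result; the names pc and pve are illustrative:

```r
# PCA on standardized variables
pc <- prcomp(USArrests, scale. = TRUE)

# Variance of each component is sdev^2; divide by the total for the proportion explained
pve <- pc$sdev^2 / sum(pc$sdev^2)
round(pve, 3)
cumsum(pve)  # cumulative proportion, useful for choosing the number of components
```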
31
Q

What is the syntax of pcr? What to do if we want to use LOOCV?

A

library(pls)
fit = pcr(y ~ ., data = x, scale = TRUE, validation = "CV")
validation = "CV" runs 10-fold CV, which is used to select M. Use validation = "LOO" for LOOCV.
32
Q

How to plot the CV MSE for pcr?

A

validationplot(fit, val.type = "MSEP")
33
Q

How to predict values with pcr?

A

pred = predict(fit, x[test, ], ncomp = M)
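
To show what predicting with M components involves, principal components regression can be written out by hand with prcomp() and lm(): fit PCA on the training predictors, regress the response on the first M scores, then project the test rows onto the same components. This is a base-R sketch of the idea, not the pls implementation; mtcars, the split size and M = 2 are illustrative assumptions:

```r
set.seed(1101)

# mtcars as stand-in data: mpg is the response, the rest are predictors
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg
train <- sample(nrow(mtcars), 22)
M <- 2  # number of components (illustrative choice)

# Step 1: PCA on the standardized training predictors
pc <- prcomp(x[train, ], scale. = TRUE)
scores_train <- pc$x[, 1:M, drop = FALSE]

# Step 2: regress the response on the first M component scores
fit <- lm(y[train] ~ scores_train)

# Step 3: project test rows with the training centring, scaling and loadings
scores_test <- predict(pc, newdata = x[-train, ])[, 1:M, drop = FALSE]
pred <- cbind(1, scores_test) %*% coef(fit)

# Test mean squared error
test_mse <- mean((y[-train] - pred)^2)
test_mse
```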
34