Week 6: Regression and Classification Flashcards

1
Q

Advantage: have the potential to accurately fit a wider range of possible shapes for f
Disadvantage: a very large number of observations is required to obtain an accurate estimate for f

A

What are the advantage and disadvantage of non-parametric methods?

2
Q

predict(model)

A

How do you predict the outcome of the model in R?

3
Q

Look for a funnel shape in the residual plot. A possible solution is to transform the response Y with a concave function such as log(Y) or sqrt(Y)

A

How can you detect non-constant variance of error terms? What is a solution?

4
Q

Use the least squares approach to minimise the RSS (residual sum of squares)

A

How do you find the coefficients of a simple linear regression model?
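
A minimal sketch of what minimising the RSS gives for one predictor, using the standard closed-form estimates and checking them against lm(). The simulated data and the true intercept and slope are assumptions made purely for illustration.

```r
# Simulated data; the true intercept (2) and slope (3) are illustrative assumptions
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)

# Closed-form least squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# RSS achieved by these estimates
rss <- sum((y - (b0 + b1 * x))^2)

# lm() minimises the same RSS, so its coefficients should agree
fit <- lm(y ~ x)
c(b0 = b0, b1 = b1)
coef(fit)
```

The hand-computed values and coef(fit) should match up to floating-point error.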

5
Q

LOOCV has higher variance than k-fold CV because its n fitted models are trained on almost identical data, so their outputs are highly correlated with each other, and the mean of highly correlated quantities has higher variance

A

Why does k-fold CV give more accurate estimates of MSE than LOOCV?

6
Q

One less than the number of levels, because there is a baseline level with no dummy variable

A

How many dummy variables will there be when there is a predictor with more than 2 levels?
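
A small sketch of the "levels minus one" rule, using model.matrix() to show the design matrix R would build. The three-level factor and its level names are made up for illustration.

```r
# Hypothetical three-level predictor; the level names are invented
region <- factor(c("East", "West", "South", "East", "South", "West"))

# model.matrix() builds the design matrix that lm() would use
X <- model.matrix(~ region)

colnames(X)          # intercept plus one dummy per non-baseline level
nlevels(region) - 1  # number of dummy variables
levels(region)[1]    # baseline level, which gets no dummy of its own
```

With R's default treatment contrasts, the first level in alphabetical order acts as the baseline.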

7
Q

  • Forward selection: begin with the null model. Then fit p simple linear regressions and add to the null model the variable that results in the lowest RSS. Continue adding variables until some stopping rule is satisfied
  • Backward selection: start with all variables, remove the variable with the largest p-value, and continue removing variables until a stopping rule is reached
  • Mixed selection: a combination of forward and backward. Start with no variables in the model. Add the variable that provides the best fit. Continue to add variables one by one. If at any point the p-value for one of the variables in the model rises above a certain threshold, remove that variable from the model. Continue until all the variables in the model have a sufficiently low p-value and all variables outside the model would have a large p-value if added to the model
A

What are the three approaches for deciding which variables to include in a model? How do they work?

8
Q

Do not make explicit assumptions about the functional form of f

A

What are non-parametric methods?

9
Q

the percentage of Falses that are identified correctly = TN/(TN+FP)

A

What is specificity?

10
Q

Randomly divide the set of observations into k groups (or folds) of approximately equal size. The first fold is treated as a validation set and the method is fit on the remaining k-1 folds. Repeat k times, holding out a different fold each time, to get k estimates of the MSE. Find the average

A

What is k-fold cross validation?
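
The procedure can be sketched in base R roughly as follows. This is an illustrative sketch rather than a reference implementation: the simulated data, the choice k = 5 and the linear model are all assumptions.

```r
set.seed(1)
n <- 100
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 1 + 2 * dat$x + rnorm(n)   # simulated data, for illustration only

k <- 5
# Randomly assign each observation to one of k folds of (near) equal size
fold <- sample(rep(1:k, length.out = n))

fold_mse <- numeric(k)
for (i in 1:k) {
  train <- dat[fold != i, ]
  valid <- dat[fold == i, ]
  fit   <- lm(y ~ x, data = train)          # fit on the other k-1 folds
  pred  <- predict(fit, newdata = valid)    # predict the held-out fold
  fold_mse[i] <- mean((valid$y - pred)^2)   # MSE on the held-out fold
}

cv_mse <- mean(fold_mse)   # k-fold CV estimate of the test MSE
cv_mse
```

Setting k equal to n in the same loop would give LOOCV.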

11
Q

irreducible and reducible error

A

What does the accuracy of Y* as a prediction for Y depend on?

12
Q

By default it outputs the log odds. To get predicted probabilities instead:
predict(lr_mod, type = "response")

A

What is the default when using predict with logistic regression? How do you change it?
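
A short sketch of the two output scales, assuming a logistic model fitted with glm() on simulated data. The name lr_mod follows the card; the data and coefficients are made up.

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.5 * x))   # simulated 0/1 outcome (illustrative)
lr_mod <- glm(y ~ x, family = binomial)

log_odds <- predict(lr_mod)                      # default is type = "link"
prob     <- predict(lr_mod, type = "response")   # predicted probabilities

# The two scales are connected by the logistic function (plogis in R)
all.equal(unname(plogis(log_odds)), unname(prob))
```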

13
Q

A plot of predicted probability versus observed proportion. For a well-calibrated model the points should lie close to a straight line with slope 1

A

What is a calibration plot?

14
Q

splits

A

What is the code for splitting data into training and test sets in R?
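
One common base-R way to make such a split is sketched below. It is not necessarily the helper the card had in mind, and the placeholder data frame and the 80/20 ratio are assumptions.

```r
set.seed(1)
dat <- data.frame(x = rnorm(100), y = rnorm(100))   # placeholder data

# Draw 80% of the row indices at random for training (the ratio is an assumption)
train_idx <- sample(nrow(dat), size = 0.8 * nrow(dat))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

nrow(train)
nrow(test)
```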

15
Q

Recursive partitioning. Find the split that makes observations as similar as possible on the outcome within each resulting group. Do that again within each resulting group. Stop when a stopping criterion is reached

A

What are classification trees?

16
Q

predict(model, newdata = data)

A

How do you predict the outcome of the model in R based on new data?

17
Q

Tend to overfit
Use them as a basic building block for ensembles

A

What is the problem with regression trees? What can they be used for?

18
Q

Compute the standard error of B0 and B1

A

How do you assess the accuracy of the coefficient estimates?

19
Q

  • Additivity assumption: that the association between a predictor X and the response Y does not depend on the values of the other predictors
  • the error terms e1, e2, … are uncorrelated
  • the error terms have a constant variance, Var(ei) = sigma squared
A

What are the assumptions of the linear model? (3)

20
Q

Use a dummy variable: code one level as 0 and the other as 1 (or as -1 and 1)

A

How do you put a categorical variable in a linear regression model?

21
Q

as p increases (more dimensions), a given observation has no nearby neighbours

A

What is the curse of dimensionality?

22
Q

bias initially decreases faster than variance increases, so the MSE declines. But at some point increasing flexibility has more impact on the variance, so the MSE increases.

A

What happens to the MSE as you increase flexibility?

23
Q

Given a value for K and a prediction point x0, KNN regression first identifies the K training observations that are closest to x0 represented by N0. Then it estimates f(x0) using the average of all the training responses in N0

A

How does k nearest neighbours regression work?
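
A bare-bones sketch of KNN regression for a single predictor. The helper knn_reg is hypothetical (written here for illustration, not taken from any package), and the toy data are made up.

```r
# Hypothetical helper: KNN regression for one numeric predictor
knn_reg <- function(x_train, y_train, x0, K) {
  d  <- abs(x_train - x0)   # distance from each training point to x0
  N0 <- order(d)[1:K]       # indices of the K closest training observations
  mean(y_train[N0])         # average their responses to estimate f(x0)
}

x_train <- c(1, 2, 3, 4, 5)
y_train <- c(2, 4, 6, 8, 10)

# The two nearest neighbours of 2.9 are x = 3 and x = 2, so the estimate is (6 + 4) / 2
knn_reg(x_train, y_train, x0 = 2.9, K = 2)
```

With K equal to the number of training points, the estimate is simply the mean of all training responses.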

24
Q

if we include an interaction in a model we should also include the main effects, even if the p-values associated with their coefficients are not significant

A

What is the hierarchical principle?

25
Q

Advantage: reduce the problem of estimating f down to one of estimating a set of parameters
Disadvantage: will usually not match the true unknown form of f

A

What is the advantage and disadvantage of parametric methods?

26
Q

Y = B0 + B1X1 + B2X2 + B3X1X2 + e
Combination of predictors

A

What are interaction terms?

27
Q

p(X) = e^(B0 + B1X) / (1 + e^(B0 + B1X))

A

What is the logistic function?
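
A quick numeric check of how the probability, odds and log odds relate. The coefficient values and the value of x are illustrative assumptions, not estimates from data.

```r
b0 <- -1; b1 <- 0.5   # illustrative coefficients
x  <- 4               # so that b0 + b1 * x = 1

p        <- exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))   # logistic function p(X)
odds     <- p / (1 - p)                                  # equals exp(b0 + b1 * x)
log_odds <- log(odds)                                    # equals b0 + b1 * x

c(p = p, odds = odds, log_odds = log_odds)
```

Base R's plogis() computes the same logistic function, so plogis(b0 + b1 * x) should agree with p.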

28
Q

the percentage of Trues that are identified correctly = TP/(TP+FN)

A

What is sensitivity/recall?

29
Q

small training MSE but large test MSE

A

What happens to MSE when model is overfitted?

30
Q

Predicting class one if Pr(Y = 1 | X = x0) > 0.5, and class two otherwise

A

What does Bayes classifier correspond to in a two-response value setting?

31
Q

lr_mod <- glm(y ~ x, family = binomial, data = data)

A

How do you fit a logistic regression model using R?

32
Q

increasing X by one unit changes the log odds by B1

A

How do you interpret B1 in a logistic regression model?

33
Q

Mean squared error
MSE = (1/n) * SUM((yi - f*(xi))^2), where f*(xi) is the predicted value for observation i

A

What is the most commonly used measure of the quality of fit? What is the formula?

34
Q

A model is perfectly calibrated if for any probability value p, a prediction of a class with confidence p is correct 100*p percent of the time

A

What is calibration?

35
Q

Z-statistic

A

What is used for hypothesis testing in logistic regression?

36
Q

pred_lr <- factor(pred_prob > 0.5, labels = c("No", "Yes"))
table(true = , predicted = pred_lr)

A

How do you produce a table of observed vs predicted results when the predictions are probabilities?

37
Q

e: cannot be predicted using X, therefore the error introduced by e cannot be reduced

A

What is irreducible error?

38
Q

TP/(TP+FP)

A

What is the positive predictive value (PPV) / precision?

39
Q

When a regression on a single predictor gives a very different result from a regression that also includes other relevant predictors, because the single predictor is correlated with those other predictors

A

What is confounding?

40
Q

ifelse(student == "Yes", 1, 0)

A

How would you turn a vector of “yes” and “nos” into a vector of 1s and 0s

41
Q

(TP+TN)/(TP+TN+FP+FN)

A

What is accuracy?
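
A worked example of accuracy alongside the other confusion-matrix measures in this deck. The true and predicted labels are hypothetical.

```r
# Hypothetical true and predicted labels (1 = positive, 0 = negative)
truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)

TP <- sum(pred == 1 & truth == 1)   # true positives
FN <- sum(pred == 0 & truth == 1)   # false negatives
FP <- sum(pred == 1 & truth == 0)   # false positives
TN <- sum(pred == 0 & truth == 0)   # true negatives

accuracy    <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)   # recall
specificity <- TN / (TN + FP)
precision   <- TP / (TP + FP)   # positive predictive value
npv         <- TN / (TN + FN)   # negative predictive value
error_rate  <- 1 - accuracy

table(true = truth, predicted = pred)
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity,
  precision = precision, npv = npv, error_rate = error_rate)
```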

42
Q

training MSE will decrease, but test MSE may not

A

What happens to MSE as model flexibility increases?

43
Q

There is a dataset, a set of competitors trying to find a prediction rule, and a referee.
The referee runs each prediction rule against a testing dataset, which is sequestered behind a Chinese wall.
The referee objectively and automatically reports the score achieved by the submitted rule.
This results in a declining error rate over time.

A

What is the common task framework / benchmarking?

44
Q

B0 can be interpreted as the average Y among non-students, B0 + B1 as the average Y among students, and B1 as the average difference in Y between students and non-students

A

How do you interpret B0 and B1 when the dummy variable is 1 if someone is a student and 0 if they are not?

45
Q

Assume that X = (X1, …, Xp) is drawn from a multivariate normal distribution with a class-specific mean vector and a common covariance matrix

A

What are the assumptions for linear discriminant analysis when p>1?

46
Q

For each observation xi, there is an associated response measurement yi

A

What is supervised learning?

47
Q

P(A|B) = P(B|A) . P(A) / P(B)

A

What is Bayes theorem in general terms?

48
Q

e is a random error term which is independent of X and has mean 0

A

In the formula Y = f(X) + e, what are the properties of e?

49
Q

Model the distribution of the predictors X separately in each of the response classes. Then use Bayes theorem to flip these around into estimates for Pr(Y=k|X=x)

A

What is a generative model for classification?

50
Q

  • assigns an observation X = x to the class for which pk(x) is the largest (assigns each observation to the most likely class given its predictor values)
  • produces the lowest possible test error rate, called the Bayes error rate
A

What does Bayes classifier do?

51
Q

TSS measures the total variance in the response Y before the regression is performed
RSS measures the variability that is left unexplained after performing the regression
TSS - RSS measures the amount of variability in the response that is explained by performing the regression

A

What do TSS, RSS and TSS - RSS each measure?

52
Q

Measures the proportion of variability in Y that can be explained using X
R^2 = (TSS-RSS)/TSS

A

What is the R squared statistic?
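
A small sketch that computes TSS, RSS and R^2 by hand and compares the result with what summary() reports. The simulated data are an assumption for illustration.

```r
set.seed(1)
x <- runif(60)
y <- 1 + 2 * x + rnorm(60, sd = 0.5)   # simulated data (illustrative)
fit <- lm(y ~ x)

tss <- sum((y - mean(y))^2)   # total variability in the response
rss <- sum(resid(fit)^2)      # variability left unexplained by the regression
r2  <- (tss - rss) / tss      # proportion of variability explained

r2
summary(fit)$r.squared        # same quantity as reported by R
cor(y, fitted(fit))^2         # squared correlation of response and fitted values
```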

53
Q

Fewer true positives and fewer false positives (i.e. fewer predicted positives overall)

A

What will happen if you increase the cutoff value?

54
Q

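Low levels of smoothness can lead to overfitting (low bias, high variance), while too much smoothness can miss real structure in f (high bias, low variance)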

A

When selecting a level of smoothness for non-parametric methods, what is the trade-off?

55
Q

A single observation (x1, y1) is used for the validation set and the remaining n-1 observations make up the training set.
Compute the MSE on the held-out observation, repeat with each of the n observations held out in turn, and average the n MSE estimates

A

What is the leave one out cross validation (LOOCV) approach?

56
Q

A plot of the true positive rate against the false positive rate across all classification thresholds. The area under the curve (AUC) summarises the overall performance of a classifier

A

What is the ROC curve?
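
A hand-rolled sketch of how the points of an ROC curve and the AUC can be computed in base R. The scores and labels are hypothetical, and the trapezoidal rule is just one simple way to get the area.

```r
# Hypothetical predicted probabilities and true 0/1 labels
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
truth <- c(1,   1,   0,   1,   0,    1,   0,   0)

# Sweep the cutoff from high to low; each cutoff gives one (FPR, TPR) point
cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(cutoffs, function(th) sum(score >= th & truth == 1) / sum(truth == 1))
fpr <- sapply(cutoffs, function(th) sum(score >= th & truth == 0) / sum(truth == 0))

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

Calling plot(fpr, tpr, type = "s") would draw the curve itself.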

57
Q

  • validation estimate of the test-error rate can be highly variable depending on which observations are included in the training set and which are included in the validation set
  • not as many observations in the training set
A

What are 2 drawbacks of the validation set approach?

58
Q

Estimate fk(x), plug it into Bayes theorem to estimate pk(x), and so approximate the Bayes classifier

A

What is the aim of linear discriminant analysis?

59
Q

the error that is introduced by approximating a real-life problem which may be very complicated, by a simpler model. In general, more flexible methods result in less bias

A

What is bias?

60
Q

log(p(X)/(1-p(X))) = B0 + B1X

A

What are the log odds from a logistic function?

61
Q

seq(0, 40, length.out = 1000)

A

Code to generate sequence of 1000 equally spaced values from 0 to 40

62
Q

B0* ± 2 · SE(B0*), and likewise B1* ± 2 · SE(B1*)
where 2 is actually the 97.5% quantile of a t-distribution with n-2 degrees of freedom (giving an approximate 95% interval)

A

How do you find the confidence intervals of the coefficient estimates?
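
A sketch of the interval computed by hand with the exact t quantile, compared with R's built-in confint(). The simulated data are an assumption for illustration.

```r
set.seed(1)
x <- runif(40, 0, 10)
y <- 5 + 0.8 * x + rnorm(40)   # simulated data (illustrative)
fit <- lm(y ~ x)

est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]
tq  <- qt(0.975, df = length(y) - 2)   # close to 2 for moderate n

manual_ci <- cbind(lower = est - tq * se, upper = est + tq * se)
manual_ci
confint(fit)   # built-in equivalent, 95% by default
```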

63
Q

models the probability that Y belongs to a particular category

A

What is a logistic regression model?

64
Q

knn(train, test, cl, k)
train is the training set with only the predictor variables, test is the test set with only the predictor variables, cl is the outcomes from the training set, k is the desired k

A

How do you make a prediction vector using knn in R?

65
Q

1 - Accuracy

A

What is the error rate?

66
Q

Y = B0 + B1X + e

A

What is the simple linear regression model?

67
Q

An unusual value for xi
for multiple regression: point that is unusual in terms of the full set of predictors

A

What is a high leverage point?

68
Q

  • only has to be fit k times compared to n times in LOOCV
  • variability in the test error estimate is lower than when using the validation set approach
A

What are the 2 advantages of k-fold cross validation?

69
Q

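F1 score = 2 × (precision × recall) / (precision + recall), the harmonic mean of precision and recall
It ignores true negatives, so it is more informative than accuracy when the class distribution is uneven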
A

What is the F1 score and what is the point of it?
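
A small numeric illustration of the F1 score as the harmonic mean of precision and recall. The counts are hypothetical.

```r
TP <- 3; FP <- 1; FN <- 2   # hypothetical counts

precision <- TP / (TP + FP)   # 0.75
recall    <- TP / (TP + FN)   # 0.6

f1        <- 2 * precision * recall / (precision + recall)   # harmonic mean
f1_counts <- 2 * TP / (2 * TP + FP + FN)                     # equivalent form in counts

c(f1 = f1, f1_counts = f1_counts)
```

True negatives never enter the calculation, which is why a large negative class does not inflate the score.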

70
Q

E(MSE) = bias-squared + variance + Var(e)

A

How can you write the expected test MSE

71
Q

An ensemble of weak learners. Make trees that are too simple, fit each new tree to focus on the observations with big residuals from the current ensemble, then combine the trees

A

What is boosting?

72
Q

The estimated standard errors will be too low, giving an unwarranted sense of confidence in the model

A

What happens to the linear model if the error terms are correlated?

73
Q

  • fk(x) is normal
  • there is a common variance across all K classes
A

What are the 2 assumptions for linear discriminant analysis when p=1?

74
Q

ei = yi - yi*, the difference between the observed and predicted response

A

What is the residual ei for each data point?

75
Q

It is still a linear model

A

What type of model is polynomial regression?

76
Q

model <- lm(y ~ x, data = data)
summary(model)

A

How do you create a linear model in R and get its coefficients, R^2, etc.?

77
Q

Sum of all the squared residuals:
RSS = e1^2 + e2^2 + … + en^2

A

What is the residual sum of squares (RSS)?

78
Q

Classifying a response variable with more than 2 classes

A

What is multinomial logistic regression?

79
Q

Cor(Y, Y*)^2, the squared correlation between the response and the fitted values

A

What is R-squared always equal to in multiple linear regression?

80
Q

Use the F-statistic. A large F-statistic (well above 1) is evidence against the null hypothesis

A

What do you need to check the significance of multiple coefficients together, e.g. B1 = B2 = … = Bp = 0?

81
Q

The density function of X for an observation from class k, i.e. Pr(X = x | Y = k)

A

What does fk(x) represent in linear discriminant analysis?

82
Q

randomly divide the available set of observations into two parts, a training set and a validation set. The model is fit on the training set and the fitted model is used to predict the responses for the observations in the validation set

A

What is the validation set approach?

83
Q

Given a positive integer K and a test observation x0, the KNN classifier first identifies the K points in the training data that are closest to x0, represented by N0. It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j. KNN classifies the test observation x0 to the class with the largest probability.

A

How does the k-nearest-neighbours classifier work?
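
The steps above can be sketched in base R as follows. The helper knn_classify is hypothetical (written here for illustration, not a package function), Euclidean distance is an assumed choice, and the toy data are made up.

```r
# Hypothetical helper: classify one test point x0 by majority vote of its K nearest neighbours
knn_classify <- function(X_train, cl, x0, K) {
  d     <- sqrt(rowSums(sweep(X_train, 2, x0)^2))   # Euclidean distance to x0
  N0    <- order(d)[1:K]                            # indices of the K closest points
  probs <- table(cl[N0]) / K                        # estimated class probabilities
  names(probs)[which.max(probs)]                    # class with the largest probability
}

X_train <- rbind(c(1, 1), c(1, 2), c(2, 1), c(5, 5), c(6, 5), c(5, 6))
cl      <- factor(c("A", "A", "A", "B", "B", "B"))

knn_classify(X_train, cl, x0 = c(2, 2), K = 3)
knn_classify(X_train, cl, x0 = c(5, 5), K = 3)
```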

84
Q

f* will not be a perfect estimate for f and will cause some error. This error is reducible because we can potentially improve the accuracy of f*

A

What is reducible error?

85
Q

AutoML performs all the steps of comparing different models using cross-validation, choosing parameters automatically
fit_aml

A

What is autoML and its code in R?

86
Q

Plot the standardised residuals. Those with absolute values greater than 3 may be outliers

A

How can you detect outliers?

87
Q

Use the proportion of misclassified observations (the error rate) rather than the MSE

A

How do you quantify the test error on classification problems?

88
Q

lda_mod <- lda(y ~ x, data = data)  # lda() comes from the MASS package

A

How do you fit a linear discriminant analysis model in R?

89
Q

A very low K value means the decision boundary is overly flexible and finds patterns in the data that don’t correspond to the Bayes decision boundary. This classifier has low bias but very high variance
A very high K value means the classifier becomes less flexible and produces a decision boundary that is close to linear. This is a low-variance but high-bias classifier

A

What is the trade-off for a low or high K value in KNN classifier?

90
Q

predict(model, newdata = data, interval = "confidence")

A

How do you predict the outcome of the model in R based on new data and get the upper and lower confidence intervals?

91
Q

  • Prediction intervals are used to answer the question how much will Y vary from Y*
  • Prediction intervals are always wider than confidence intervals because they incorporate both the reducible error and irreducible error
A

What is a prediction interval?
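
A short sketch comparing the two kinds of interval from predict() on a linear model. The simulated data and the new x values are assumptions for illustration.

```r
set.seed(1)
x <- runif(50, 0, 10)
y <- 3 + 2 * x + rnorm(50, sd = 2)   # simulated data (illustrative)
model <- lm(y ~ x)

new <- data.frame(x = c(2, 5, 8))
conf_int <- predict(model, newdata = new, interval = "confidence")
pred_int <- predict(model, newdata = new, interval = "prediction")

# Both are centred on the same fitted value, but the prediction interval is wider
conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]
cbind(conf_width, pred_width)
```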

92
Q

residual standard error:
RSE = sqrt(RSS / (n - 2))
It is the average amount that the response will deviate from the true regression line.
It is an absolute measure of the lack of fit of the model

A

How do you assess the accuracy of the linear regression model?
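
A quick check of the RSE formula against the value R reports. It uses R's built-in cars dataset purely as a convenient example.

```r
fit <- lm(dist ~ speed, data = cars)   # built-in dataset, used only for illustration

n   <- nrow(cars)
rss <- sum(resid(fit)^2)
rse <- sqrt(rss / (n - 2))   # n - 2 because two coefficients are estimated

rse
summary(fit)$sigma           # "Residual standard error" in the summary output
```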

93
Q

For each observation we observe a vector of measurements xi but no response yi. Seek to understand relationship between observations

A

What is unsupervised learning?

94
Q

If epilepsy is set as the baseline, then B(stroke)0 is interpreted as the log odds of stroke versus epilepsy given that x1 = … = xp = 0. A one unit increase in Xj is associated with a B(stroke)j increase in the log odds of stroke over epilepsy

A

How do you interpret the coefficients of multinomial logistic regression, with stroke, overdose and epilepsy as the 3 classes?

95
Q

TN/(TN+FN)

A

What is the negative predictive value (NPV)?

96
Q

Hypothesis test. Compute the t-statistic using the standard errors. The p-value is the probability of observing a value equal to |t| or larger in absolute value, assuming the coefficient is really 0. A small p-value indicates that it is unlikely to observe such a substantial association between the predictor and the response due to chance.
If the p-value is small, reject the hypothesis that the coefficient is 0

A

How do you test for significance of the coefficients?

97
Q

Change the threshold at which an observation is assigned to a class - default is 0.5

A

How do you alter the sensitivity and specificity of a classifier?

98
Q

the amount by which f* would change if we estimated it using a different training data set. In general, more flexible methods have higher variance

A

What is variance?

99
Q

table(true=, predicted=)

A

How do you produce table of observed vs predicted results when classified discretely?

100
Q

Plot the residuals ei versus the predictor xi

A

How can you detect non-linearity of data with a linear model?

101
Q

Plot the residuals as a function of time. Adjacent residuals may have similar values if they are correlated

A

How can you detect correlation in the error terms?

102
Q

following the errors or noise too closely

A

What is overfitting?

103
Q

First make an assumption about the functional form or shape of f, then use a procedure that uses the training data to fit or train the model

A

How do you find f with a parametric method?

104
Q

Develop an agent that improves its performance based on interactions with the environment

A

What is reinforcement learning?

105
Q

Collinearity reduces the accuracy of the estimates of the regression coefficients, and causes the standard error to grow

A

What happens when there is collinearity between the predictor variables?

106
Q

Advantage: It has less bias because the training set is bigger
Disadvantage: computationally expensive, because the model has to be fit n times

A

What is an advantage and a disadvantage of the LOOCV approach?

107
Q

Y = B0 + B1X1 + B2X2 + … + BpXp + e

A

What is the standard multiple linear regression formula?

108
Q

When there is a small number of observations per predictor

A

When will parametric methods outperform non-parametric methods?

109
Q

Instead of selecting a baseline class, treat all K classes symmetrically. Estimate coefficients for all K classes

A

What is softmax coding for multinomial logistic regression?

110
Q

bagged trees with feature sampling. Make trees that are too complex and average over bootstrapped samples to cancel out the overfitting parts

A

What are random forests?

111
Q

2 variables: correlation matrix
multiple variables: variance inflation factor (VIF). value exceeding 5 or 10 is problematic

A

How can you assess collinearity between 2 variables and between multiple variables?
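
A sketch of both checks in base R, computing the VIF by hand as 1 / (1 - R^2) from regressing one predictor on the others. The simulated predictors, with x2 deliberately built from x1, are an assumption for illustration.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # deliberately collinear with x1
x3 <- rnorm(n)                  # unrelated predictor

cor(cbind(x1, x2, x3))          # pairwise check: correlation matrix

# VIF for a predictor: 1 / (1 - R^2) from regressing it on the other predictors
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2 + x3))$r.squared)
vif_x3 <- 1 / (1 - summary(lm(x3 ~ x1 + x2))$r.squared)
c(x1 = vif_x1, x3 = vif_x3)
```

Here x1 should show a VIF well above the 5 to 10 rule of thumb, while x3 should stay close to 1.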

112
Q

Prediction: predict Y using Y* = f*(X)
Inference: understanding the association between Y and X

A

What is the difference between prediction and inference?

113
Q

p(X)/(1-p(X)) = e^(B0 + B1X)

A

What are the odds from a logistic function?