Interpreting Data with Advanced Statistical Models Flashcards

1
Q

A friend is trying to build a classifier for a startup. She first tries logistic regression and clearly underfits. She then tries an SVM and has the same issue. Finally, she tries logistic regression with fifth-degree polynomial features and clearly overfits. What should she do?

Try an SVM with Gaussian Kernel and a higher C value

Try an SVM with Linear Kernel and a higher C value

Try an SVM with Linear Kernel and a lower C value

Try an SVM with Gaussian Kernel and a lower C value

A

Try an SVM with Gaussian Kernel and a higher C value
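
A minimal sketch of the idea, assuming scikit-learn and a synthetic dataset (the data and the C values are assumptions chosen for illustration): a Gaussian (RBF) kernel with a larger C adds capacity on data that linear models underfit, without jumping to degree-5 polynomial features.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Non-linearly separable data that a linear decision boundary underfits.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# RBF (Gaussian) kernel plus a larger C (weaker regularization) gives more flexibility.
for kernel, C in [("linear", 1.0), ("rbf", 1.0), ("rbf", 10.0)]:
    score = cross_val_score(SVC(kernel=kernel, C=C), X, y, cv=5).mean()
    print(f"kernel={kernel}, C={C}: mean CV accuracy = {score:.3f}")
```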

2
Q

In which of the following unsupervised learning techniques do you think it is most important to apply feature scaling, if the variables are on different scales?

PCA

Anomaly Detection

K-means

Hierarchical clustering

A

PCA
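
A small illustrative sketch, assuming scikit-learn and synthetic data: without scaling, the feature with the largest numeric range dominates the principal components, regardless of the structure in the data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two features on very different scales: the second would dominate the variance.
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])

raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_
scaled_ratio = PCA(n_components=2).fit(StandardScaler().fit_transform(X)).explained_variance_ratio_

print("unscaled:", raw_ratio)    # first component is essentially the large-scale feature
print("scaled:  ", scaled_ratio) # variance split by structure, not by units
```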

3
Q

For a medical trial to detect melanoma, you have a dataset of several patients with numerous variables and the advancement of the disease. You try to find some combination of variables that predicts melanoma with high precision.

You first run PCA to reduce the dimensionality, then you run linear regression. You find that the variance explained is low because there are a couple of possible outliers, so you run an algorithm to detect outliers. Which type of learning is each step in this pipeline?

Linear Regression: Unsupervised Learning; PCA: Unsupervised Learning; Outlier finding: Supervised Learning

Linear Regression: Supervised Learning; PCA: Supervised Learning; Outlier finding: Unsupervised Learning

Linear Regression: Supervised Learning; PCA: Unsupervised Learning; Outlier finding: Supervised Learning

Linear Regression: Supervised Learning; PCA: Unsupervised Learning; Outlier finding: Unsupervised Learning

A

Linear Regression: Supervised Learning; PCA: Unsupervised Learning; Outlier finding: Unsupervised Learning
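
An illustrative sketch of such a pipeline, assuming scikit-learn and synthetic data (IsolationForest stands in for a generic outlier detector): the unsupervised steps never use the target y, while the regression step does.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                # patient variables
y = 2.0 * X[:, 0] + rng.normal(size=200)      # disease advancement

X_reduced = PCA(n_components=5).fit_transform(X)                 # unsupervised: no y used
model = LinearRegression().fit(X_reduced, y)                     # supervised: fits X to y
flags = IsolationForest(random_state=0).fit_predict(X_reduced)   # unsupervised: -1 = outlier
print("R^2:", model.score(X_reduced, y), "| flagged outliers:", (flags == -1).sum())
```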

4
Q

You optimize a model's training with gradient descent and get the following loss curve. How would you assess the quality of the curve?

[PICTURE]

As (batch) gradient descent uses all the training data at every step, the loss vs. iterations curve should decrease monotonically. Therefore this curve is wrong: either GD is badly implemented or GD was not used.

Since the loss goes up sometimes, you may be overshooting. Try reducing the learning rate.

It is normal that the loss sometimes goes up. You see this in SGD, so you are OK since you reached a minimum at 20 iterations.

Use a larger learning rate to reach the optimum in fewer iterations

A

As (batch) gradient descent uses all the training data at every step, the loss vs. iterations curve should decrease monotonically. Therefore this curve is wrong: either GD is badly implemented or GD was not used.
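
A minimal sketch of the point, assuming a simple least-squares problem with synthetic data: full-batch gradient descent with a suitably small learning rate produces a loss curve that never increases.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = 3.0 + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

theta, lr = np.zeros(2), 0.1
losses = []
for _ in range(50):
    grad = X.T @ (X @ theta - y) / len(y)        # gradient over the full training set
    theta -= lr * grad
    losses.append(np.mean((X @ theta - y) ** 2) / 2)

# With full-batch updates and a suitable learning rate, the loss is non-increasing.
assert all(a >= b for a, b in zip(losses, losses[1:]))
print("final loss:", losses[-1], "theta:", theta)
```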

5
Q

You run the following model in a regression problem, with predictors x and y and response z: z = a·x + b·y + c·x·y. You get a significant c value greater than 0. What could this indicate?

Nothing, since you do not check significance of coefficients in multiple linear regression

That a quadratic model will give even better results, since R2 will be higher

That you have a significant interaction between x and y. This indicates collinearity and you need to run multivariate techniques to reduce the dimension

That the model is wrong since it does not comply with parsimony principle

A

That you have a significant interaction between x and y. This indicates collinearity and you need to run multivariate techniques to reduce the dimension
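
An illustrative sketch, assuming statsmodels and hypothetical variables x, y, z generated with a true interaction: fit the interaction model and read the estimate and p-value of the x:y coefficient from the summary.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200), "y": rng.normal(size=200)})
df["z"] = 1.0 + 2.0 * df.x + 0.5 * df.y + 1.5 * df.x * df.y + rng.normal(size=200)

fit = smf.ols("z ~ x + y + x:y", data=df).fit()   # x:y is the interaction term
print(fit.summary().tables[1])                    # coefficient estimates and p-values
```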

6
Q

Mark fits both a simple linear regression and a quadratic regression on a given problem. However, in the rush of presenting the results, he forgets which fit is which! Can you help him label the models and assess them? The data you have is:

[PICTURE]

Model 1: Linear Model
Model 2: Quadratic Model
Evaluation: Model 1 has a high bias problem

Model 1: Linear Model
Model 2: Quadratic Model
Evaluation: Model 2 has a high bias problem

Model 1: Quadratic Model
Model 2: Linear model
Evaluation: Model 2 has a high variance problem

Model 1: Quadratic Model
Model 2: Linear model
Evaluation: Model 1 has a high variance problem

A

Model 1: Quadratic Model
Model 2: Linear model
Evaluation: Model 1 has a high variance problem
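
A sketch of how such a diagnosis can be made, assuming scikit-learn and synthetic data: compare training and validation scores for the degree-1 and degree-2 fits. Uniformly poor scores point to high bias; a large train/validation gap points to high variance. (Here the data is assumed truly quadratic, so the degree-1 fit shows the high-bias pattern.)

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=100)   # quadratic signal plus noise

X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(f"degree {degree}: train R^2 = {model.score(X_tr, y_tr):.2f}, "
          f"validation R^2 = {model.score(X_va, y_va):.2f}")
```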

7
Q

For an important medical trial to detect melanoma, you have a dataset of several patients with numerous variables and the advancement of the disease. You are trying to find some combination of variables that predicts melanoma with high precision.

You first run PCA to reduce the dimensionality, then run linear regression. You find that the variance explained is low because there are a couple of possible outliers. What would you do?

Check the significance of the model with ANOVA; low variance explained is a sign of a non-significant linear regression

As PCA outputs principal components with the most variance explained, you don’t have outliers after that step. You need a more complex regression

Using clustering, you could find possible outliers and check manually

Dimensionality reduction takes away outliers, so you must replace PCA with a more robust algorithm

A

Using clustering, you could find possible outliers and check manually
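
One possible sketch, assuming scikit-learn and synthetic data (the DBSCAN eps and min_samples values are assumptions): density-based clustering labels isolated points as noise (-1), and those rows can then be reviewed manually as candidate outliers.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 5)),    # bulk of the patients
               rng.normal(8, 1, size=(3, 5))])     # a few extreme records

labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(StandardScaler().fit_transform(X))
suspects = np.flatnonzero(labels == -1)             # noise points = possible outliers
print("rows to inspect manually:", suspects)
```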

8
Q

You have the following scatter plot of data that you want to classify. You try to classify it with the logistic regression h(x) = sigmoid(theta_0 + theta_1*x_1 + theta_2*x_2), but fail to achieve high accuracy. What should you do?

[PICTURE]

You should continue as you are, since accuracy is not a great metric

Adding quadratic terms may help, because in that space, the data becomes linearly separable

You should switch to a neural network

You should try to classify with SVM

A

Adding quadratic terms may help, because in that space, the data becomes linearly separable
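
A minimal sketch, assuming scikit-learn and a synthetic circular class boundary: the same logistic regression fits far better once quadratic features are added, because the classes become (nearly) linearly separable in the expanded feature space.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)   # class = inside a circle

linear = LogisticRegression().fit(X, y)
quadratic = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression()).fit(X, y)
print("linear features:   ", linear.score(X, y))
print("quadratic features:", quadratic.score(X, y))
```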

9
Q

Regularization adds a term to the cost function to shrink the size of the parameters. You forget which model corresponds to each value of lambda = 0.01, 0.1, 1. Can you match each value of lambda to its corresponding final model?

[PICTURE]

Model 1: lambda=1
Model 2: lambda=0.1
Model 3: lambda=0.01

Model 1: lambda=0.01
Model 2: lambda=1
Model 3: lambda=0.1

Model 1: lambda=0.01
Model 2: lambda=0.1
Model 3: lambda=1

Model 1: lambda=0.1
Model 2: lambda=1
Model 3: lambda=0.01

A

Model 1: lambda=0.1
Model 2: lambda=1
Model 3: lambda=0.01
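
An illustrative sketch, assuming scikit-learn (where lambda is exposed as Ridge's alpha parameter) and synthetic data: larger regularization shrinks the polynomial coefficients, giving smoother, less wiggly fitted curves.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, size=(30, 1)), axis=0)
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=30)

for lam in (0.01, 0.1, 1.0):
    model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=lam)).fit(X, y)
    coefs = model.named_steps["ridge"].coef_
    print(f"lambda={lam}: max |coefficient| = {np.abs(coefs).max():.2f}")
```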

10
Q

What does the Naive Bayes classifier optimize?

The variance of the classes to be similar

A robust classification rather than a great fit

The normality of the posterior probability

The posterior probability of a class, based on previous events, using different ways of calculating those probabilities

A

The posterior probability of a class, based on previous events, using different ways of calculating those probabilities
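
A minimal sketch, assuming scikit-learn and synthetic data: GaussianNB combines the class priors with per-feature Gaussian likelihoods (the "naive" conditional-independence assumption) and predicts the class with the highest posterior probability.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)), rng.normal(3, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = GaussianNB().fit(X, y)
print("posterior P(class | x) for x=[1.5, 1.5]:", clf.predict_proba([[1.5, 1.5]]))
print("predicted class:", clf.predict([[1.5, 1.5]]))
```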
