Midterm Review Flashcards

1
Q

Linear regression formula:

A

y = b0 + b1x

where…

y = output
b0 = y-intercept
b1 = slope
x = input
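
A minimal sketch of estimating b0 and b1 with the standard least-squares formulas, using numpy and made-up example data:

import numpy as np

# Hypothetical example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates: b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x  # predictions from y = b0 + b1*x
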

2
Q

Three steps in fitting a linear regression model:

A
  1. Define the model: choose the ‘class’ of functions that relates the inputs (x) to the output (y)
  2. Define your training loss
  3. Find the function in your class that gives the smallest training loss
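
A minimal sketch of the three steps in Python (numpy/scipy, hypothetical data), assuming a one-feature linear model and a squared-error training loss:

import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

# 1. Define the model: the class of functions relating x to y
def model(params, x):
    b0, b1 = params
    return b0 + b1 * x

# 2. Define the training loss (here, the sum of squared residuals)
def training_loss(params):
    return np.sum((y - model(params, x)) ** 2)

# 3. Find the function in the class with the smallest training loss
result = minimize(training_loss, x0=[0.0, 0.0])
b0, b1 = result.x
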
3
Q

What is training loss function?

A

Measures the deviation of the model's fitted values from the observed data

Large loss = poor fit

4
Q

What are residuals?

A

Errors from the model fit: data = fit + residual, i.e., residual = observed value - fitted value

5
Q

What are the two main types of loss function?

A
  1. Least Absolute Deviations (LAD, L1-norm): minimize the sum of the absolute values of the residuals; robust to outliers. The same L1 penalty underlies Lasso regularization, which can eliminate (zero out) coefficients.
  2. Ordinary Least Squares (OLS, L2-norm): minimize the sum of the squared residuals; sensitive to outliers. The same L2 penalty underlies Ridge regularization, which shrinks coefficients but does not eliminate them.
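
A small sketch of how the two losses respond to an outlier, using hypothetical residuals:

import numpy as np

# Residuals from some fitted model; the last one is an outlier
residuals = np.array([0.5, -0.3, 0.2, -0.4, 10.0])

lad_loss = np.sum(np.abs(residuals))  # L1: the outlier contributes 10
ols_loss = np.sum(residuals ** 2)     # L2: the outlier contributes 100 and dominates the loss
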
6
Q

When should you use L1-norm vs. L2-norm?

A

Use the L1-norm when the fit should be robust to outliers, or (as a regularizer) when there are many features and you are unsure whether they are all necessary, since it can drive some coefficients to zero; use the L2-norm when all the features must be kept in the model.

7
Q

What is the coefficient of determination and how is it calculated?

A

The proportion of variance in the response variable explained by the model; it measures the quality of the fit of a linear regression model, where 0 = no fit and 1 = perfect fit.

It is calculated as 1 - (Residual Sum of Squares / Total Sum of Squares), i.e., R^2 = 1 - RSS / TSS
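
A minimal sketch of the calculation, with hypothetical observed and fitted values:

import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])      # observed values
y_hat = np.array([2.8, 5.3, 6.9, 9.1])  # values fitted by the regression

rss = np.sum((y - y_hat) ** 2)          # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
r_squared = 1 - rss / tss
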

8
Q

What are robust statistics?

A

Statistics that are not greatly influenced by the inclusion of outliers.

For example, as measures of central tendency, mean is non-robust while median is robust.

L1-norm is robust.
L2-norm is non-robust.

9
Q

What is classification?

A

Prediction where the target variable Y is categorical
- often binary
- Y can take on a value in a set {o1, o2, …}
- Y is discrete and finite

10
Q

What is logistic regression?

A

A form of classification in which an S-shaped sigmoid function is used to calculate the probability of each potential output.

11
Q

What is the natural logistic regression classification rule?

A

Binary classification: F(X|B) = 1 if P(Y=1|X,B) >= 0.5, and 0 otherwise

If the sigmoid output >= 0.5, predict 1
If the sigmoid output < 0.5, predict 0

Because the sigmoid is monotone, this simplifies to the linear rule B0 + BX >= 0
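
A minimal sketch of the rule, with hypothetical fitted parameters B0 and B:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b = -1.0, np.array([0.8, 0.5])  # hypothetical fitted parameters
x = np.array([1.2, 0.7])            # one example to classify

p = sigmoid(b0 + b @ x)             # P(Y=1 | X, B)
prediction = int(p >= 0.5)

# The sigmoid crosses 0.5 exactly where its argument crosses 0,
# so this is equivalent to the linear rule b0 + b.x >= 0
assert prediction == int(b0 + b @ x >= 0)
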

12
Q

What is regularization?

A

Optimizing the likelihood plus a penalty on the size of the parameters, which keeps the optimal parameters from going to infinity (e.g., when the classes are perfectly separable) and helps prevent overfitting.
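
A minimal sketch of a penalized objective for logistic regression (L2 penalty, hypothetical penalty weight lam):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_objective(b, X, y, lam=1.0):
    # Negative log-likelihood plus a penalty on the size of the parameters
    p = sigmoid(X @ b)
    nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    penalty = lam * np.sum(b ** 2)  # larger lam = stronger shrinkage toward 0
    return nll + penalty
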

13
Q

What is the major flaw of accuracy rate as a metric for evaluating classifiers?

A

Accuracy can be misleading when classes are imbalanced: for example, if 95% of examples are negative, a classifier that always predicts negative achieves 95% accuracy. A baseline accuracy must therefore be established.

14
Q

What are Type I and Type II errors in classification?

A

Type I: False positive (Predicted +, True -)
Type II: False negative (Predicted -, True +)

15
Q

How to calculate precision and recall?

A

Precision = #True Positive / #Predicted Positive

Recall = #True Positive / #Actual Positive (all examples whose true class is positive)
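
A minimal sketch from hypothetical confusion-matrix counts:

tp, fp, fn = 40, 10, 20     # hypothetical true positives, false positives, false negatives

precision = tp / (tp + fp)  # 40 / 50 = 0.80: of the predicted positives, how many were correct
recall = tp / (tp + fn)     # 40 / 60 ≈ 0.67: of the actual positives, how many were found
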

16
Q

How to calculate F-measure?

A

2 * [(Precision * Recall) / (Precision + Recall)]
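
A minimal sketch with hypothetical precision and recall values:

precision, recall = 0.80, 0.67  # hypothetical values
f1 = 2 * (precision * recall) / (precision + recall)
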

17
Q

How to interpret the ROC curve?

A

The area under the ROC curve answers: if I randomly generate a positive example and a negative example, what is the probability my classifier puts them “in the right order” (gives the positive one the higher score)?
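
A minimal sketch of that interpretation, computed directly over all positive-negative pairs of hypothetical classifier scores:

import numpy as np

pos_scores = np.array([0.9, 0.8, 0.6, 0.4])  # scores for positive examples
neg_scores = np.array([0.7, 0.3, 0.2])       # scores for negative examples

# Fraction of (positive, negative) pairs ranked "in the right order" (ties count 1/2)
pairs = [(p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores]
auroc = np.mean(pairs)
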

18
Q

What is the range of AUROC values?

A

0.5 (no better than completely random guessing) - 1 (best, perfect classification)

19
Q

What is the most important tool to limit overfitting?

A

Withholding a test dataset.
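
A minimal sketch of withholding a test set with scikit-learn (hypothetical synthetic data):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Hold out 20% of the data that the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
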

20
Q

High training error and high testing error means the model is _____?

A

Underfit

21
Q

Low training and high testing error means the model is ______?

A

Overfit

22
Q

Two ways to combat overfitting in decision trees:

A
  1. Bagging
  2. Random Forest
23
Q

What is bagging?

A

Bootstrap Aggregation

Bootstrap B datasets, fit a deep tree (low bias, high variance) to each dataset, and average the predictions.
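
A minimal sketch of bagging by hand (numpy plus scikit-learn trees, hypothetical data):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

B = 50
trees = []
for _ in range(B):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample (draw with replacement)
    tree = DecisionTreeRegressor()              # deep tree: low bias, high variance
    trees.append(tree.fit(X[idx], y[idx]))

X_new = np.array([[0.5]])
prediction = np.mean([t.predict(X_new) for t in trees])  # averaging reduces the variance
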

24
Q

When is bagging appropriate?

A

Regression.

In classification it is not always helpful, since bagging a bad classifier can make the averaged prediction worse.

25
Q

What is Random Forest?

A

A bagged tree estimator in which each split considers only a random subsample of the features, which decorrelates the trees.
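
A minimal sketch with scikit-learn (hypothetical data); max_features controls the random subsample of features considered at each split:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each tree sees a bootstrap sample; each split sees only a random subset of the features
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
rf.fit(X, y)
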

26
Q

Why are random forests popular among data scientists?

A
  • Low bias and low variance
  • Can measure feature importance
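
A minimal sketch of reading feature importances from a fitted forest (hypothetical synthetic data):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(rf.feature_importances_)  # impurity-based importance of each feature, summing to 1
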