SKlearn Flashcards

1
Q

What is the independent variable called in sklearn?

A

Feature

2
Q

What is the dependent variable called?

A

Output, Target

3
Q

Find the R-squared in sklearn

A

reg.score(x_matrix, y)

Note that x has been reshaped into a 2D array, since sklearn expects a matrix of shape (samples, features)
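A minimal sketch with made-up data (the variable names are illustrative, not from the course):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up example data: y is an exact linear function of x (y = 2x + 1)
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 5, 7, 9, 11])

# sklearn expects a 2D feature matrix, so reshape the 1D x
x_matrix = x.reshape(-1, 1)

reg = LinearRegression()
reg.fit(x_matrix, y)

r2 = reg.score(x_matrix, y)  # R-squared of the fit
```

Since y is an exact linear function of x here, r2 comes out as 1.0.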

4
Q

Find the coefficients in sklearn

A

reg.coef_

Result is an array containing all coefficients

5
Q

Find the intercept in sklearn

A

reg.intercept_

-> Returns a float
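A quick sketch covering both attributes, with made-up data following y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2*x + 1 exactly
x_matrix = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])

reg = LinearRegression()
reg.fit(x_matrix, y)

coefs = reg.coef_            # array of coefficients, one per feature
intercept = reg.intercept_   # a single float
```

Here coefs is a one-element array close to [2.0] and intercept is a float close to 1.0.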

6
Q

Making predictions in sklearn

A

reg.predict(input)

Returns an array rather than a float, because the predict method can take more than one input sample at a time
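A minimal sketch with made-up data, showing that predict returns one value per input sample:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: y = 2*x exactly
x_matrix = np.array([[1], [2], [3]])
y = np.array([2, 4, 6])

reg = LinearRegression().fit(x_matrix, y)

# predict accepts multiple samples at once, so it returns an array
preds = reg.predict(np.array([[10], [20]]))
```

preds is an array of two values, close to [20.0, 40.0].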

7
Q

What is the ML word for observation?

A

Sample

Each row in the dataset is a sample

8
Q

How to calculate Adjusted R-squared in Python?

A

Look up the formula (e.g. write it in a markdown cell), then implement it in Python:

r2 = reg.score(x,y)
n = x.shape[0]
p = x.shape[1]

Note that x does not need to be reshaped here because it already contains two variables. Then fill these values into the formula.

Remember: Adjusted R-squared builds on R-squared and adjusts for the number of variables included in the model
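The steps above can be sketched end to end with made-up two-feature data (the data values are illustrative only), using the standard formula adj_R² = 1 - (1 - R²)(n - 1)/(n - p - 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with two features (so x is already 2D, no reshape needed)
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0],
              [4.0, 3.0], [5.0, 4.0], [6.0, 7.0]])
y = np.array([3.0, 4.0, 8.0, 9.0, 11.0, 14.0])

reg = LinearRegression().fit(x, y)

r2 = reg.score(x, y)
n = x.shape[0]  # number of samples
p = x.shape[1]  # number of features

# Adjusted R-squared penalizes for the number of features
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

By construction, adjusted R-squared is never larger than R-squared (for p >= 1).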

9
Q

What is the advantage of feature selection?

A

Simplifies models

Improves speed and prevents a series of unwanted issues arising from having too many features

10
Q

What can you do with the F-statistic?

A

Test whether model has merit

Null hypothesis is that all betas are equal to 0 -> H0: ß1 = ß2 = ß3 = 0

If all betas are 0, then the model is useless

11
Q

What is an F-Statistic?

A

Similar to the t-statistic from a t-test

A t-test tells you whether a single variable is statistically significant

An F-test tells you whether a group of variables is jointly significant

Based on the null hypothesis that all betas are equal to 0 -> H0: ß1 = ß2 = ß3 = 0

12
Q

How to interpret the P-value in the results table?

A

A low P-value (< 0.05) means that the coefficient is likely not to equal zero.

A high P-value (> 0.05) means that we cannot conclude that the explanatory variable affects the dependent variable (here: if Average_Pulse affects Calorie_Burnage).

A high P-value is also called an insignificant P-value.

13
Q

How is the P-value denoted in the results table?

A

P>|t|

14
Q

How to interpret the F-statistic? And the P-value change?

A

Compare the F-statistic with and without the variable -> a lower F-statistic means the model is closer to being non-significant

Prob(F-statistic) can still be significant, but notice the change -> if it becomes higher, drop the variable

15
Q

What will this return?

from sklearn.feature_selection import f_regression

f_regression(x,y)

A

2 arrays:
1 with the F-statistics
1 with the corresponding p-values -> Prob(F-statistic)
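A minimal sketch with synthetic data (the data generation is illustrative, not from the course), showing the two returned arrays:

```python
import numpy as np
from sklearn.feature_selection import f_regression

# Synthetic data: 10 samples, 3 features; y depends mainly on the first feature
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
y = x[:, 0] * 2.0 + rng.normal(scale=0.1, size=10)

# Returns one F-statistic and one p-value per feature
f_statistics, p_values = f_regression(x, y)
```

Both arrays have one entry per feature, so here each has shape (3,).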

16
Q

What does feature_selection.f_regression do?

A

It creates a simple linear regression between each individual feature and the dependent variable

17
Q

from sklearn.feature_selection import f_regression
f_regression(x,y)

How to extract the p-values from regression results?

A

p_values = f_regression(x,y)[1]

since the first array contains the F-statistics

18
Q

Do p-values reflect the interconnection of the features in our multiple linear regression?

A

No

19
Q

Why is Feature Scaling / Standardization needed?

A

A common problem when working with numerical data is differences in magnitude

Feature scaling transforms the data to a standard scale, so all numbers are of the same order of magnitude
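A minimal sketch with made-up data on very different scales, using sklearn's StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales
x = np.array([[1.0, 100000.0],
              [2.0, 300000.0],
              [3.0, 500000.0]])

scaler = StandardScaler()
scaler.fit(x)                   # learns the mean and std of each column
x_scaled = scaler.transform(x)  # each column now has mean 0 and std 1
```

After scaling, both columns are of the same magnitude regardless of their original units.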

20
Q

What is feature scaling also called?

A

Standardization / Normalization

21
Q

What are the coefficients and intercept called in Machine Learning?

A

Weights and Bias

22
Q

What is the reason SKlearn does not really support p-values?

A

Most ML practitioners perform some kind of feature scaling, so that the (very small) weights of unimportant variables become apparent.

23
Q

What would you need to do after standardizing the dataset?

A

After standardizing the dataset, the input for predict must be standardized as well

The input must be standardized in the same way (with the same fitted scaler)
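A minimal sketch with made-up data (y = 2x + 1), showing the same fitted scaler applied to new input before predicting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up data: y = 2*x + 1 exactly
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

scaler = StandardScaler().fit(x)
x_scaled = scaler.transform(x)

reg = LinearRegression().fit(x_scaled, y)

# New input must go through the SAME fitted scaler before predicting
new_data = np.array([[5.0]])
new_data_scaled = scaler.transform(new_data)
pred = reg.predict(new_data_scaled)
```

Because the new input is scaled consistently with the training data, the prediction for x = 5 comes out close to 11.0.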

24
Q

x_simple_matrix = x_scaled[:,0].reshape(-1,1)

What happens here?

A

You are taking the first column out of x_scaled and reshaping it into a 2D column array
