SKlearn Flashcards

1
Q

What is the independent variable called in sklearn?

A

Feature

2
Q

What is the dependent variable called?

A

Output, Target

3
Q

Find the R-squared in sklearn

A

reg.score(x_matrix, y)

Note that x has been reshaped into a 2D array, since sklearn expects a matrix of shape (samples, features)
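A minimal sketch with made-up data (the variable names are illustrative, not from the course):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up example data: y is an exact linear function of x (y = 2x + 1)
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 5, 7, 9, 11])

# sklearn expects a 2D feature matrix, so reshape the 1D x
x_matrix = x.reshape(-1, 1)

reg = LinearRegression()
reg.fit(x_matrix, y)

r2 = reg.score(x_matrix, y)  # R-squared of the fit
```

Since y is an exact linear function of x here, r2 comes out as 1.0.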

4
Q

Find the coefficients in sklearn

A

reg.coef_

Result is an array containing all coefficients

5
Q

Find the intercept in sklearn

A

reg.intercept_

-> Returns a float
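A quick sketch covering both attributes, with made-up data following y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2*x + 1 exactly
x_matrix = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])

reg = LinearRegression()
reg.fit(x_matrix, y)

coefs = reg.coef_            # array of coefficients, one per feature
intercept = reg.intercept_   # a single float
```

Here coefs is a one-element array close to [2.0] and intercept is a float close to 1.0.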

6
Q

Making predictions in sklearn

A

reg.predict(input)

Returns an array rather than a float, because the predict method can take more than one input sample at a time
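A minimal sketch with made-up data, showing that predict returns one value per input sample:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: y = 2*x exactly
x_matrix = np.array([[1], [2], [3]])
y = np.array([2, 4, 6])

reg = LinearRegression().fit(x_matrix, y)

# predict accepts multiple samples at once, so it returns an array
preds = reg.predict(np.array([[10], [20]]))
```

preds is an array of two values, close to [20.0, 40.0].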

7
Q

What is the ML word for observation?

A

Sample

Each row in the dataset is a sample

8
Q

How to calculate Adjusted R-squared in Python?

A

Look up the formula (e.g. write it in a markdown cell), then implement it in Python:

r2 = reg.score(x,y)
n = x.shape[0]
p = x.shape[1]

Note that x does not need to be reshaped here because it already contains two variables. Then fill these values into the formula.

Remember: Adjusted R-squared builds on R-squared and adjusts for the number of variables included in the model
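The steps above can be sketched end to end with made-up two-feature data (the data values are illustrative only), using the standard formula adj_R² = 1 - (1 - R²)(n - 1)/(n - p - 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with two features (so x is already 2D, no reshape needed)
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0],
              [4.0, 3.0], [5.0, 4.0], [6.0, 7.0]])
y = np.array([3.0, 4.0, 8.0, 9.0, 11.0, 14.0])

reg = LinearRegression().fit(x, y)

r2 = reg.score(x, y)
n = x.shape[0]  # number of samples
p = x.shape[1]  # number of features

# Adjusted R-squared penalizes for the number of features
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

By construction, adjusted R-squared is never larger than R-squared (for p >= 1).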

9
Q

What is the advantage of feature selection?

A

Simplifies models

Improves speed and prevents a series of unwanted issues arising from having too many features

10
Q

What can you do with the F-statistic?

A

Test whether model has merit

Null hypothesis is that all betas are equal to 0 -> H0: ß1 = ß2 = ß3 = 0

If all betas are 0, then the model is useless

11
Q

What is an F-Statistic?

A

Similar to the t-statistic from a t-test

A t-test tells you whether a single variable is statistically significant

An F-test tells you whether a group of variables is jointly significant

Based on the null hypothesis that all betas are equal to 0 -> H0: ß1 = ß2 = ß3 = 0

12
Q

How to interpret the P-value in the results table?

A

A low P-value (< 0.05) means that the coefficient is likely not to equal zero.

A high P-value (> 0.05) means that we cannot conclude that the explanatory variable affects the dependent variable (here: if Average_Pulse affects Calorie_Burnage).

A high P-value is also called an insignificant P-value.

13
Q

How is the P-value denoted in the results table?

A

P>|t|

14
Q

How to interpret the F-statistic? And the P-value change?

A

Compare the F-statistic with and without the variable -> a lower F-statistic means the model is closer to being non-significant

Prob(F-statistic) can still be significant, but notice the change -> if it becomes higher, drop the variable

15
Q

What will this return?

from sklearn.feature_selection import f_regression

f_regression(x,y)

A

2 arrays:
1 with the F-statistics
1 with the corresponding p-values -> Prob(F-statistic)
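A minimal sketch with synthetic data (the data generation is illustrative, not from the course), showing the two returned arrays:

```python
import numpy as np
from sklearn.feature_selection import f_regression

# Synthetic data: 10 samples, 3 features; y depends mainly on the first feature
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
y = x[:, 0] * 2.0 + rng.normal(scale=0.1, size=10)

# Returns one F-statistic and one p-value per feature
f_statistics, p_values = f_regression(x, y)
```

Both arrays have one entry per feature, so here each has shape (3,).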

16
Q

What does feature_selection.f_regression do?

A

It creates a simple linear regression between each individual feature and the dependent variable

17
Q

from sklearn.feature_selection import f_regression
f_regression(x,y)

How to extract the p-values from regression results?

A

p_values = f_regression(x,y)[1]

since the first array contains the F-statistics

18
Q

Do p-values reflect the interconnection of the features in our multiple linear regression?

A

No

19
Q

Why is Feature Scaling / Standardization needed?

A

A common problem when working with numerical data is differences in magnitude

Feature scaling transforms the data to a standard scale, so all numbers are of the same order of magnitude
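A minimal sketch with made-up data on very different scales, using sklearn's StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales
x = np.array([[1.0, 100000.0],
              [2.0, 300000.0],
              [3.0, 500000.0]])

scaler = StandardScaler()
scaler.fit(x)                   # learns the mean and std of each column
x_scaled = scaler.transform(x)  # each column now has mean 0 and std 1
```

After scaling, both columns are of the same magnitude regardless of their original units.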

20
Q

What is feature scaling also called?

A

Standardization / Normalization

21
Q

What are the coefficients and intercept called in Machine Learning?

A

Weights and Bias

22
Q

What is the reason SKlearn does not really support p-values?

A

Most ML practitioners perform some kind of feature scaling, so that the (very small) weights of unimportant variables become apparent.

23
Q

What would you need to do after standardizing the dataset?

A

After standardizing the dataset, the input for predict must be standardized as well

The input must be standardized in the same way (with the same fitted scaler)
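A minimal sketch with made-up data (y = 2x + 1), showing the same fitted scaler applied to new input before predicting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up data: y = 2*x + 1 exactly
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

scaler = StandardScaler().fit(x)
x_scaled = scaler.transform(x)

reg = LinearRegression().fit(x_scaled, y)

# New input must go through the SAME fitted scaler before predicting
new_data = np.array([[5.0]])
new_data_scaled = scaler.transform(new_data)
pred = reg.predict(new_data_scaled)
```

Because the new input is scaled consistently with the training data, the prediction for x = 5 comes out close to 11.0.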

24
Q

x_simple_matrix = x_scaled[:,0].reshape(-1,1)

What happens here?

A

You are taking the first column out of x_scaled and reshaping it into a 2D column array
