Regression Models Flashcards

Question 1

Q

What are three measures of error?

Answer

A

Sum of squared errors, sum of squared residuals, and total sum of squares.
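As a rough illustration of how these quantities are computed, here is a minimal sketch using NumPy. The arrays `y_actual` and `y_predicted` are made-up values, and the final R^2 line is an addition not stated on the card; it only shows how the two sums are commonly combined.

```python
import numpy as np

# Hypothetical actual and predicted values, made up for illustration.
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_predicted = np.array([2.5, 5.5, 7.0, 9.0])

# Sum of squared errors: squared gaps (residuals) between actual and predicted.
sse = np.sum((y_actual - y_predicted) ** 2)

# Total sum of squares: squared gaps between actual values and their mean.
tss = np.sum((y_actual - y_actual.mean()) ** 2)

# Coefficient of determination, the share of total variation the model explains.
r_squared = 1 - sse / tss
```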

Question 2

Q

How is sklearn used to create a linear regression model?

Answer

A

from sklearn.linear_model import LinearRegression
LR = LinearRegression()

Question 3

Q

How is an sklearn model fit once created?

Answer

A

LR = LR.fit(X_train, y_train)

Question 4

Q

How are predictions made on an sklearn model?

Answer

A

y_predict = LR.predict(X_test)
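The create, fit, and predict steps from the last three cards can be put together as follows. This is a sketch only; the training and test arrays are hypothetical data that follow y = 2x + 1 exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data following y = 2x + 1 exactly.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])
X_test = np.array([[4.0], [5.0]])

LR = LinearRegression()          # create the model
LR = LR.fit(X_train, y_train)    # fit returns the fitted model itself
y_predict = LR.predict(X_test)   # predict on unseen inputs
```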

Question 5

Q

When is a system of equations considered overdetermined?

Answer

A

A system of equations is considered overdetermined if there are more equations than unknowns.

Question 6

Q

How does the linear least squares method work?

Answer

A

Linear least squares chooses the model parameters that make the sum of the squared errors as small as possible.
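A small overdetermined system makes this concrete. The sketch below assumes NumPy's `np.linalg.lstsq` as the solver; the three equations and their values are made up for illustration.

```python
import numpy as np

# Three equations, two unknowns (slope m and intercept b): overdetermined.
#   0*m + b = 1
#   1*m + b = 2
#   2*m + b = 2
A = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 2.0])

# No (m, b) satisfies all three equations, so lstsq returns the pair
# that minimises the sum of squared errors instead.
coeffs, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)
m, b = coeffs

# Sum of squared errors at the least squares solution.
sse = np.sum((A @ coeffs - y) ** 2)
```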

Question 7

Q

How are prediction models evaluated?

Answer

A

Using performance metrics that measure the quality of a model's predictions, usually by quantifying how close y_predicted is to y_actual.
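Two common metrics of this kind are mean squared error and mean absolute error. The sketch below assumes scikit-learn's `sklearn.metrics` module; the values are hypothetical.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted values, made up for illustration.
y_actual = [10.0, 20.0, 30.0, 40.0]
y_predicted = [12.0, 18.0, 30.0, 44.0]

# Both metrics shrink towards 0 as predictions get closer to the actual values.
mse = mean_squared_error(y_actual, y_predicted)   # mean of squared gaps
mae = mean_absolute_error(y_actual, y_predicted)  # mean of absolute gaps
```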

Question 8

Q

What is the aim of an interpretation model?

Answer

A

To find insights from the data.

Question 9

Q

How is an interpretation model used?

Answer

A

The model is trained to find a function omega that best predicts y. Omega is then used to generate insights.
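For a linear model, one way to generate insights is to inspect the fitted coefficients. The sketch below treats those coefficients as playing the role of omega, which is an assumption about the card's notation. The feature names and data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features and data: y depends on "ad_spend" only (y = 3 * ad_spend + 2).
feature_names = ["ad_spend", "store_id"]
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 6.0]])
y = 3.0 * X[:, 0] + 2.0

model = LinearRegression().fit(X, y)

# Pair each feature with its fitted coefficient to see which ones drive y.
insights = dict(zip(feature_names, model.coef_))
```

Here the coefficient for `ad_spend` comes out close to 3 and the one for `store_id` close to 0, which suggests that only the first feature matters for this made-up target.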

Question 10

Q

What needs to be considered when making the best model?

Answer

A

Choosing the best loss function, trying different hyperparameters, and comparing a variety of models.

Question 11

Q

How might the linear model be enhanced?

Answer

A

By adding polynomial features.

Question 12

Q

Why do polynomials improve the performance of Regression Models?

Answer

A

Polynomials allow better prediction by fitting the curvature of the data more closely. They also allow better explanation by finding variables that explain variations in the data.
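The effect on curved data can be seen by comparing a plain linear fit with one that uses polynomial features. This is a minimal sketch, assuming scikit-learn's `PolynomialFeatures` and `make_pipeline`; the data is a made-up, purely quadratic curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: y = x^2 on a grid symmetric around 0.
X = np.linspace(-3, 3, 20).reshape(-1, 1)
y = X.ravel() ** 2

# A straight line cannot follow the curvature.
linear = LinearRegression().fit(X, y)

# Adding x^2 as a feature lets the same linear model fit the curve.
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

linear_r2 = linear.score(X, y)        # R^2 of the straight-line fit
quadratic_r2 = quadratic.score(X, y)  # R^2 of the degree-2 fit
```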

Question 13

Q

How is the best model order selected?

Answer

A

Using the Bayesian information criterion (BIC).
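One way to apply this is to fit polynomials of several orders and keep the order with the lowest BIC. The sketch below assumes Gaussian errors, for which BIC can be written, up to a constant, as n * ln(SSE / n) + k * ln(n), with n data points and k fitted parameters. The data and the helper `bic_for_order` are hypothetical.

```python
import numpy as np

# Hypothetical data: a quadratic trend plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

def bic_for_order(order):
    """BIC (up to a constant) of a polynomial fit, assuming Gaussian errors."""
    coeffs = np.polyfit(x, y, order)
    sse = np.sum((y - np.polyval(coeffs, x)) ** 2)
    n, k = x.size, order + 1  # k counts the fitted coefficients
    return n * np.log(sse / n) + k * np.log(n)

# Lower BIC is better: it rewards a good fit and penalises extra parameters.
bic_by_order = {order: bic_for_order(order) for order in range(1, 7)}
best_order = min(bic_by_order, key=bic_by_order.get)
```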

Question 14

Q

What is a holdout set of data?

Answer

A

Unseen data that will test how well the model performs.

Question 15

Q

What is the training set used for?

Answer

A

Fitting the model

Question 16

Q

What is data leakage?

Answer

A

Knowledge of the test set leaking into the training set.
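A common source of leakage is fitting a preprocessing step on the full dataset before splitting. The sketch below uses feature scaling as the example, which is an assumption chosen for illustration and not something the card mentions. The data is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data, made up for illustration.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Leaky: the scaler's mean and variance are computed from test rows too.
leaky_scaler = StandardScaler().fit(X)

# Safe: fit on the training set only, then apply the same transform to both.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```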

Question 17

Q

What is the test set used for?

Answer

A

Measuring performance by comparing predictions with actual values and measuring error.

Question 18

Q

How is a train/test split generated using sklearn?

Answer

A

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=n)
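A runnable version of the split might look like the sketch below. The data, the `test_size` of 0.25, and the `random_state` value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 12 samples with 2 features each.
X = np.arange(24).reshape(12, 2)
y = np.arange(12)

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```

With 12 samples this leaves 9 rows for training and 3 for testing.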

Question 19

Q

What is cross-validation?

Answer

A

Splitting the data into multiple pairs of training and test sets and calculating the error across each of them.
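The splitting and per-split error calculation can be written out by hand. This is a sketch, assuming scikit-learn's `KFold` with 5 folds; the data is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Hypothetical data: roughly y = 2x with small, fixed deviations.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1])

fold_errors = []
for train_idx, test_idx in KFold(n_splits=5).split(X):
    # Fit on this fold's training rows, then score on its held-out rows.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

# Average the error across all train/test pairs.
cv_error = np.mean(fold_errors)
```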

Question 20

Q

Why is cross-validation used?

Answer

A

Performance measures will be more statistically reliable, because the error is averaged over several train/test splits instead of depending on a single split.

Question 21

Q

How is cross-validation performed on a given model? (Code)

Answer

A

cross_val = cross_val_score(model, X_data, y_data, cv=10, scoring="neg_mean_squared_error")
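In context, the call might be used as in the sketch below. The data is hypothetical. Because the scoring name is `neg_mean_squared_error`, the returned scores are negated errors, so the sign is flipped to recover the mean squared error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: y = 3x plus random noise.
rng = np.random.default_rng(0)
X_data = rng.uniform(0, 10, size=(100, 1))
y_data = 3.0 * X_data.ravel() + rng.normal(scale=1.0, size=100)

model = LinearRegression()

# One score per fold; each is the negative of that fold's mean squared error.
cross_val = cross_val_score(
    model, X_data, y_data, cv=10, scoring="neg_mean_squared_error"
)

# Flip the sign and average to get the cross-validated mean squared error.
mean_mse = -cross_val.mean()
```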

Question 22

Q

What is stratified sampling?

Answer

A

A sampling technique where samples are selected so that each group appears in the same proportion as it does in the population.

Question 23

Q

Why is stratified sampling used?

Answer

A

It ensures that training and test sets have the same proportion of features of interest as the original dataset, and ensures that cross-validation is a close approximation of generalisation error.
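With scikit-learn, one way to get a stratified split is the `stratify` argument of `train_test_split`. The sketch below assumes the feature of interest is a categorical label; the 80/20 group sizes are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 80 rows in group 0 and 20 rows in group 1.
labels = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)

# stratify=labels keeps the 80/20 group balance in both parts of the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# Share of group 1 in each part; both match the 20% share in the full data.
train_share = y_train.mean()
test_share = y_test.mean()
```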
