Regression Models Flashcards

(24 cards)

1
Q

What are three measures of error?

A

Sum of squared errors (SSE), sum of squared residuals (in practice the same quantity as SSE), and total sum of squares (TSS).
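
A minimal sketch of how these sums can be computed with NumPy. The arrays `y_actual` and `y_predicted` are made-up values for illustration, and the coefficient of determination (R squared) is added here as a quantity commonly derived from SSE and TSS; it is not part of the card's answer.

```python
import numpy as np

# Hypothetical observed values and model predictions, for illustration only.
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_predicted = np.array([2.5, 5.5, 7.0, 9.5])

# Sum of squared errors: squared gaps between observations and predictions.
sse = np.sum((y_actual - y_predicted) ** 2)

# Total sum of squares: squared gaps between observations and their mean.
tss = np.sum((y_actual - y_actual.mean()) ** 2)

# Coefficient of determination, derived from the two sums above.
r_squared = 1 - sse / tss
```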

2
Q

How is sklearn used to create a linear regression model?

A

from sklearn.linear_model import LinearRegression
LR = LinearRegression()

3
Q

How is an sklearn model fit once created?

A

LR = LR.fit(X_train, y_train)

4
Q

How are predictions made on an sklearn model?

A

y_predict = LR.predict(X_test)
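
The create, fit, and predict steps from the last three cards can be put together as one short script. This is a sketch on synthetic data (an exact line, y = 2x + 1, chosen only for illustration), not a recommended workflow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 2x + 1 exactly.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 2 * X_train.ravel() + 1
X_test = np.array([[4.0], [5.0]])

LR = LinearRegression()          # create the model
LR = LR.fit(X_train, y_train)    # fit() returns the fitted estimator itself
y_predict = LR.predict(X_test)   # predictions for inputs not used in fitting
```

Because `fit` returns the estimator, the reassignment `LR = LR.fit(...)` is optional; calling `LR.fit(...)` alone has the same effect.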

5
Q

When is a system of equations considered overdetermined?

A

A system of equations is considered overdetermined if there are more equations than unknowns.

6
Q

How does the linear least squares method work?

A

Linear least squares chooses the model parameters that make the sum of the squared errors (the gaps between predicted and observed values) as small as possible.
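
As an illustration of both ideas, the sketch below sets up an overdetermined system (three equations, two unknowns) and solves it in the least squares sense with `numpy.linalg.lstsq`. The three data points are made up for the example.

```python
import numpy as np

# Three equations, two unknowns (slope and intercept): overdetermined.
# The points (0, 1), (1, 2), (2, 4) do not lie on a single line,
# so no exact solution exists.
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

# lstsq returns the coefficients that minimise the sum of squared errors
# ||A @ coef - b||^2, along with that minimum and the rank of A.
coef, residual_ss, rank, _ = np.linalg.lstsq(A, b, rcond=None)
slope, intercept = coef
```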

7
Q

How are prediction models evaluated?

A

Using performance metrics that measure the quality of a model's predictions, usually by quantifying how close y_predicted is to y_actual.
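
One possible way to compute such metrics is with `sklearn.metrics`. The sketch below uses mean squared error and R squared as two example metrics; the values in `y_actual` and `y_predicted` are invented for illustration.

```python
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values, for illustration only.
y_actual = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.5, 5.5, 7.0, 9.5]

# Mean squared error: average squared gap (lower is better).
mse = mean_squared_error(y_actual, y_predicted)

# R squared: share of variance in y_actual explained by the predictions.
r2 = r2_score(y_actual, y_predicted)
```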

8
Q

What is the aim of an interpretation model?

A

To find insights from the data.

9
Q

How is an interpretation model used?

A

The model is trained to find the parameters omega that best predict y. Omega is then inspected to generate insights about how the features relate to y.
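
For a linear model, one plausible reading of "omega" is the vector of fitted coefficients. The sketch below fits a model on synthetic data and inspects `coef_`; the feature names and the data-generating rule are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends strongly on feature 0 and weakly on feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.1 * X[:, 1]

model = LinearRegression().fit(X, y)

# The fitted coefficients play the role of omega here: their sizes and
# signs suggest which features drive the target, and in which direction.
for name, weight in zip(["feature_0", "feature_1"], model.coef_):
    print(f"{name}: {weight:.2f}")
```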

10
Q

What needs to be considered when making the best model?

A

Choosing the best cost function, trying different hyperparameters, and comparing a variety of models.

11
Q

How might the linear model be enhanced?

A

By adding polynomial features.

12
Q

Why do polynomials improve the performance of Regression Models?

A

They allow better prediction by fitting the curvature of the data more closely, and better explanation by finding variables that explain variation in the data.
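
A small sketch of the effect, assuming curved synthetic data (y = x squared) and scikit-learn's `PolynomialFeatures` as the way to add polynomial terms. A plain linear fit is compared with a degree-2 fit on the same data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data (an assumption for illustration): y = x^2.
X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = X.ravel() ** 2

# Straight-line model versus the same model on degree-2 polynomial features.
linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# R squared on the fitted data; the polynomial model captures the curvature.
r2_linear = linear.score(X, y)
r2_poly = poly.score(X, y)
```

The model is still linear in its parameters; only the inputs are expanded with polynomial terms.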

13
Q

How is the best model order selected?

A

Using the Bayesian information criterion (BIC); the order with the lowest BIC is preferred.
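
As a rough sketch of how this could look, the code below scores polynomial fits of several orders with the Gaussian-error form of the criterion, BIC = n ln(SSE / n) + k ln(n), where n is the number of points and k the number of fitted coefficients. The data, the noise level, and the candidate range of degrees are assumptions made for the example. Since the data are generated from a quadratic, degree 2 is the expected winner, though noise means this is not guaranteed.

```python
import numpy as np

# Synthetic data from a quadratic plus noise (illustrative assumption).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 60)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.3, size=x.size)

n = x.size
bic = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x, y, degree)                # least squares fit
    sse = np.sum((y - np.polyval(coeffs, x)) ** 2)   # sum of squared errors
    k = degree + 1                                   # number of coefficients
    # Goodness of fit term plus a penalty that grows with model size.
    bic[degree] = n * np.log(sse / n) + k * np.log(n)

# Lower BIC is better.
best_degree = min(bic, key=bic.get)
```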

14
Q

What is a holdout set of data?

A

Data kept out of training, and so unseen by the model, that is used to test how well the model performs.

15
Q

What is the training set used for?

A

Fitting the model

15
Q

What is data leakage?

A

Knowledge of the test set leaking into the training set.
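
One common source of leakage is preprocessing that is fitted on the full dataset before splitting. The sketch below shows a leak-free alternative, using feature scaling as the example: the scaler learns its statistics from the training rows only. The synthetic data and the choice of `StandardScaler` are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and target, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Leak-free: the mean and standard deviation come from the training set only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
# The test set is transformed with the training statistics, never refitted.
X_test_scaled = scaler.transform(X_test)
```

Calling `StandardScaler().fit(X)` on all rows before splitting would let test-set statistics influence the training data, which is the leakage the card describes.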

16
Q

What is the test set used for?

A

Measuring performance by comparing predictions with actual values and measuring error.

17
Q

How is a train/test split generated using sklearn?

A

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=n)
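
A short end-to-end sketch of the holdout idea: split the data, fit on the training set, and measure error on the test set. The data (an exact line, y = 3x + 1), the 30% test size, and the `random_state` value are choices made only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 1 exactly.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 1

# Hold out 30% of the rows; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit on the training set only, then measure error on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
test_mse = mean_squared_error(y_test, model.predict(X_test))
```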

18
Q

What is cross-validation?

A

Splitting the data into multiple pairs of training and test sets and calculating the error across each of them.
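
To make the "multiple pairs" concrete, the sketch below uses scikit-learn's `KFold` to generate five train/test index pairs over ten rows. The fold count, the shuffling, and the toy data are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten rows of toy data; only the row indices matter here.
X = np.arange(10).reshape(-1, 1)

# Five folds: each row lands in exactly one test set.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
folds = [(train_idx, test_idx) for train_idx, test_idx in kf.split(X)]

for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={train_idx}, test={test_idx}")
```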

19
Q

Why is cross validation used?

A

Averaging the error over several train/test splits gives a more reliable performance estimate, one that depends less on any single split.

20
Q

How is cross validation performed on a given model?

A

cross_val = cross_val_score(model, X_data, y_data, cv=10, scoring="neg_mean_squared_error")
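
A runnable sketch of the same call on synthetic data. The data, the noise level, and the choice of `LinearRegression` as the model are assumptions for the example. Note that this scoring option returns negated mean squared errors, so the sign is flipped before reporting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic linear data with a little noise, for illustration only.
rng = np.random.default_rng(0)
X_data = rng.normal(size=(100, 2))
y_data = X_data @ np.array([1.5, -0.5]) + rng.normal(scale=0.1, size=100)

model = LinearRegression()
scores = cross_val_score(
    model, X_data, y_data, cv=10, scoring="neg_mean_squared_error"
)

# Scores are negated MSE (so that higher is better); flip the sign
# to recover the per-fold error, then average across folds.
mse_per_fold = -scores
mean_mse = mse_per_fold.mean()
```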

21
Q

What is stratified sampling?

A

A sampling technique where samples are selected in the same proportions as they appear in the population.

22
Q

Why is stratified sampling used?

A

It ensures that the training and test sets have the same proportions of the features of interest as the original dataset, and that cross validation is a close approximation of generalisation error.
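
A minimal sketch of a stratified split, assuming an imbalanced binary label and the `stratify` argument of `train_test_split`. The 80/20 class mix and the 25% test size are made up for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced labels (illustrative): 80 rows of class 0, 20 rows of class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Share of class 1 in each subset; both match the 20% in the full data.
train_share = y_train.mean()
test_share = y_test.mean()
```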