Regression Models Flashcards
(24 cards)
What are three measures of error?
Sum of Squared Errors, Sum of squared residuals, total sum of squares.
How is sklearn used to create a linear regression model?
LR = LinearRegression()
How is an sklearn model fit once created?
LR = LR.fit(X_train, y_train)
How are predictions made on an sklearn model?
y_predict = LR.predict(X_test)
When is a system of equations considered over determined?
A system of equations is considered overdetermined if there are more equations than unknowns.
How does Linear Least Squares method work?
Linear Least Squares attempts to make the sum of the squares of the errors as small as possible.
How are prediction models evaluated?
Using performance metrics that measure the quality of a models predictions. Usually representing closeness between y_predicted and y_actual.
What is the aim of an interpretation model?
To find insights from the data.
How is an interpretation model used?
The model is trained to find a function omega that best predicts y. Omega is then used to generate insights.
What needs to be considered when making the best model?
Best cost function, different hyper parameters, comparing a variety of models.
how might the linear model be enhanced?
Using Polynomials
Why do polynomials improve the performance of Regression Models?
Allow better prediction by better fitting the curvature of the data. Allow better explanation by finding variables that explain variations in the data.
How is the best model order selected?
Using Bayes Information Criterion.
What is a holdout set of data?
Unseen data that will test how well the model performs.
What is the training set used for?
Fitting the model
What is data leakage?
Knowledge of the test set leaking into the training set.
What is the test set used for?
Measuring performance by comparing predictions with actual values and measuring error.
How is a train/test split generated using sklearn?
X_train, x_test, Y_train, y_test = train_test_split(X, y, test_size=n)
What is cross-validaiton?
Splitting the data into multiple pairs of training and test sets and calculating the error across each of them.
Why is cross validation used?
Performance measures will be more statistically significant.
How is cross validation performed on a given mode?
corss_val = cross_val_score(model, X_data, y_data, cv=10, scoring=”neg_mean_squared_error”)
What is stratified sampling?
A sampling technique where the samples are selected in the same proportion as they appear on the population.
Why is stratified sampling used?
It ensures that training and test sets have the same proportion of features of interest and the original dataset and ensures that cross validation is a close approximation of generalisation error.