Regression Models Flashcards

(34 cards)

1
Q

What is supervised learning?

A

Learning a function that maps an input and some parameters to a predicted output (label), which is then evaluated based on a ground truth value

2
Q

What is an observation in supervised learning?

A

One row of data in the dataset

3
Q

What is a feature in supervised learning?

A

One column of data in the dataset

4
Q

What are hyperparameters?

A

Parameters that we define before running the model, and are not learned directly from the data

5
Q

Models are typically trained in two phases: […] and […]

A

Training and evaluation (/prediction)

6
Q

The training phase involves…

A

Teaching the model which predictor values correspond to which outputs, i.e. learning the parameters that define the relationship between the features and the target

7
Q

The prediction/evaluation phase involves…

A

Gaining new observations and feeding them into our trained model to create a prediction

8
Q

An update rule is defined which is calculated using the value from a […].

A

Loss function

9
Q

What is a loss function?

A

A quantitative measure of error in predicted values of a model

10
Q

Problems with quantitative responses are usually […] problems, while those with qualitative responses are usually […] problems.

A

Regression, classification

11
Q

Provide an example of a regression model used for qualitative responses.

A

Logistic regression, since it estimates the probability of a choice

12
Q

What are regression models?

A

Regression models are used to model any continuous target or outcome (e.g. loss, revenue)

13
Q

How does a linear regression model create predictions?

A

By defining a straight line that attempts to pass as close as possible to each point; we can then substitute a new x value into the line’s equation to obtain the predicted value y’

14
Q

In a linear regression model, epsilon represents…

A

The error term: the random, irreducible noise in the data that the linear relationship between the features and the target cannot explain

15
Q

In a linear regression model, b0 and b1 represent…

A

Parameters representing the y-intercept and slope of the line respectively, estimated from the data (e.g. via least squares)

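A minimal sketch of the prediction step from the last three cards, assuming a single feature x and already-fitted coefficients b0 (intercept) and b1 (slope); the variable names and sample values are illustrative, not from the deck:

```python
import numpy as np

def predict(x, b0, b1):
    """Linear regression prediction: y' = b0 + b1 * x.
    The error term epsilon is not used at prediction time;
    it only describes the noise in the observed data."""
    return b0 + b1 * np.asarray(x)

# Example: intercept 2.0, slope 0.5
print(predict([1.0, 2.0, 3.0], b0=2.0, b1=0.5))  # [2.5 3.  3.5]
```
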
16
Q

What is Mean Squared Error (MSE), and how can we use it for a linear regression model?

A

MSE is the sum of squared errors divided by the number of data points, which we can use to measure the error of our model and adjust coefficients accordingly
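
A short sketch of the MSE calculation described above, assuming y_true and y_pred are sequences of the same length (the names and sample values are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: sum of squared errors divided by the number of points."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))  # (0.25 + 0 + 1) / 3 ≈ 0.417
```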

17
Q

Error metrics are useful for training a model because…

A

They summarise performance in a single, easy-to-interpret number, so we can focus on that value rather than the raw predictions when generating model/data insights

18
Q

What is a black-box model?

A

A black-box model is a learning model that does not expose or explain its internal decision process, creating a sort of ‘black box’ between input and output

19
Q

The best step-by-step practice for modelling involves…

A

Establishing the ideal cost function, developing multiple models with different hyperparameters, and comparing the results according to our loss function (establishing, developing, comparing)
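
A hedged sketch of that establish/develop/compare loop, using MSE as the loss and NumPy's polyfit with the polynomial order standing in for "different hyperparameters"; the data is made up for illustration:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.1, 4.2, 8.9, 16.1, 25.2])

# Establish the cost function (MSE), develop models with different
# hyperparameters (polynomial order), then compare by the loss.
results = {}
for order in (1, 2, 3):
    coeffs = np.polyfit(x, y, deg=order)   # fit an order-n polynomial
    y_pred = np.polyval(coeffs, x)         # predictions (here on the training data)
    results[order] = np.mean((y - y_pred) ** 2)

# In practice, compare on a held-out test set (see the later cards on testing).
print(results, "-> best order:", min(results, key=results.get))
```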

20
Q

We can calculate the parameters of our linear regression model by…

A

Writing each data point’s equation in terms of its error epsilon, summing the squared errors over all points, then taking the derivative with respect to each bn, setting it to zero, and solving (the least-squares approach).
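
For the single-feature case, minimising the sum of squared errors in this way gives the standard closed-form result (written here in LaTeX for reference):

```latex
\hat{b}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}
```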

21
Q

The variation between the regression line and the mean of the response is called…

A

Explained variation

22
Q

The variation between the observed data points and the regression line is called…

A

Unexplained variation

23
Q

Total variation is calculated as the sum of…

A

Explained and unexplained variation
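
In symbols, with y_i the observations, ŷ_i the predictions, and ȳ the mean of the response, the decomposition is usually written:

```latex
\underbrace{\sum_i (y_i - \bar{y})^2}_{\text{total variation}}
= \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\text{explained}}
+ \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\text{unexplained}}
```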

24
Q

How can we add more curvature to our linear regression model?

A

By using a polynomial regression instead
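
A minimal sketch of adding curvature by fitting a higher-order polynomial instead of a straight line, assuming NumPy and illustrative data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.5, 1.8, 4.9, 9.7])

linear_coeffs = np.polyfit(x, y, deg=1)  # straight line: b1*x + b0
quad_coeffs = np.polyfit(x, y, deg=2)    # adds an x^2 term for curvature

print(np.polyval(linear_coeffs, 2.5))    # prediction from the straight line
print(np.polyval(quad_coeffs, 2.5))      # prediction from the curved fit
```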

25

Q

If a first order polynomial regression is a linear regression that generates a straight line, a second order polynomial regression generates…

A

A curve with one local minimum/maximum

26

Q

The order of a polynomial refers to…

A

The largest exponent in any of the terms

27

Q

Are polynomial regression models linear?

A

Yes, the model is still a linear combination of the (polynomial) features, i.e. it remains linear in its parameters

28

Q

What is the bias-variance tradeoff?

A

Model adjustments that decrease bias often increase variance, and vice versa; the tradeoff is therefore analogous to a complexity tradeoff

29

Q

What can we use to measure the complexity of a model?

A

Bayesian Information Criterion (BIC)

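For reference, the usual form of the criterion is

```latex
\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})
```

where k is the number of parameters, n the number of observations, and L̂ the maximised likelihood; the k ln(n) term penalises model complexity, and lower BIC is better.
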
30

Q

Why should we always include an unseen testing set?

A

If we train and evaluate only on the training set, our model may fit it too closely (overfit), learning only how to replicate the training data rather than predict new data

31

Q

If knowledge of the test set leaks into the training set, we can encounter an issue called…

A

Data leakage

32

Q

What is cross validation?

A

Instead of using one training and testing set, you split the dataset into several folds (e.g. four), train and test the model on each resulting train/test combination, then compare the error between them

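A hedged sketch of k-fold cross validation done by hand with NumPy (four folds, matching the card), reusing a straight-line fit as the model; the data is synthetic and only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 40)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

k = 4
folds = np.array_split(rng.permutation(x.size), k)  # shuffled index folds

fold_errors = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    b1, b0 = np.polyfit(x[train_idx], y[train_idx], deg=1)  # train on k-1 folds
    y_pred = b0 + b1 * x[test_idx]                          # test on the held-out fold
    fold_errors.append(np.mean((y[test_idx] - y_pred) ** 2))

print(fold_errors, "mean:", np.mean(fold_errors))
```
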
33

Q

What is the problem with randomly sampling the training set?

A

If the data has a large bias (e.g. one class is heavily over-represented), that bias may not be properly represented in the random samples

34

Q

What problem does stratified sampling solve and how?

A

Bias in the training set; it divides the dataset into groups (strata) based on the different classes, then randomly samples from each group in proportion to that group’s share of the overall data
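
A minimal sketch of stratified sampling with NumPy, assuming a label array y with two classes; each class (stratum) is sampled with the same fraction, so class proportions are preserved in the training set (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)       # imbalanced labels: 90% class 0, 10% class 1
train_fraction = 0.8

train_idx = []
for cls in np.unique(y):
    cls_idx = np.flatnonzero(y == cls)  # indices belonging to this stratum
    n_take = int(round(train_fraction * cls_idx.size))
    train_idx.extend(rng.choice(cls_idx, size=n_take, replace=False))

print(np.bincount(y[np.array(train_idx)]))  # [72  8]: proportions preserved
```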