Prediction with the linear model Flashcards

week 4

1
Q

The aim of prediction

A

The aim of prediction is to estimate the value of a numeric target variable for new observations.

2
Q

What do we call the variable we’re estimating?

A

The variable we’re estimating is called the target, outcome, or response variable.

3
Q

What do we call the variables we use to do the estimation?

A

The variables we use to do the estimation are called the predictors, covariates, or features.

4
Q

training data

A

We typically have a training data set, which contains both the target and the predictors, so that we can build a model for prediction.

5
Q

testing data

A

We then have a testing data set, which contains only the predictors. The goal is to estimate the corresponding target values on this data set.

For testing purposes we often create an artificial test set or validation data set, so that we can check how well our model is performing by comparing the predicted target against the actual target.

6
Q

Methods of prediction

A

Linear regression models
Regression trees
Random Forests
(Simple!) Neural networks

7
Q

binary indicator variables

A

To handle factors (categorical predictors), we introduce binary indicator variables (taking the value zero or one) and use these to code each level of the factor.
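
As a minimal sketch (hypothetical data frame and factor levels), R builds these indicator columns automatically whenever a factor appears in a model formula; model.matrix() shows the resulting 0/1 coding.

# Hypothetical data: a numeric target and a factor with three levels
df <- data.frame(y = c(10, 12, 9, 15),
                 region = factor(c("north", "south", "east", "north")))

# model.matrix() shows the binary indicator (dummy) columns built for 'region':
# one level acts as the baseline, each remaining level gets its own 0/1 column
model.matrix(y ~ region, data = df)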

8
Q

Prediction intervals

A

Prediction intervals give a range of values associated with a specified level of confidence, e.g. we are 95% sure that the predicted value lies in the range (a, b).
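
A minimal sketch in R (entirely hypothetical data): predict() on a fitted lm object returns a prediction interval when asked.

# Hypothetical training data and a simple linear model
train <- data.frame(x = 1:20)
train$y <- 2 * train$x + rnorm(20)
fit <- lm(y ~ x, data = train)

# 95% prediction interval for a new observation at x = 25
predict(fit, newdata = data.frame(x = 25),
        interval = "prediction", level = 0.95)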

9
Q

difference between prediction and confidence intervals

A

A prediction interval is wider (less certain) than the corresponding confidence interval. A prediction interval is for an individual future observation, so it reflects both the uncertainty in the estimated mean and the random variation of a single observation. A confidence interval is for the mean response only, so it reflects just the uncertainty in the estimated mean.
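
A minimal sketch (hypothetical data, same set-up as in the previous card) showing the two intervals side by side; the prediction interval comes out wider.

train <- data.frame(x = 1:20)
train$y <- 2 * train$x + rnorm(20)
fit <- lm(y ~ x, data = train)
new <- data.frame(x = 25)

# Confidence interval: uncertainty about the mean response at x = 25
predict(fit, newdata = new, interval = "confidence")

# Prediction interval: uncertainty about an individual new y at x = 25 (wider)
predict(fit, newdata = new, interval = "prediction")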

10
Q

Bias

A

Bias is the systematic error in a prediction: the difference between the average (expected) prediction and the true value. It’s consistent rather than random, so even if you shoot many times, your shots will always miss in the same way (e.g. always to the left of the target).

11
Q

Variance

A

Variance is a measure of the variability of the predictions: how much your shots vary from each other. Sometimes the ball goes too far to the left, sometimes too far to the right, and sometimes it’s just right. Variance measures how much the shots differ from one another, regardless of where they are centered.

12
Q

Decomposition of the MSE

A

Mean squared error can be decomposed into squared bias and variance components, as written out below.
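
In symbols (a standard statement of the result, with \hat{y} the prediction and \mu the quantity being predicted):

\mathrm{MSE}(\hat{y}) \;=\; \mathbb{E}\big[(\hat{y}-\mu)^2\big] \;=\; \operatorname{Var}(\hat{y}) \;+\; \big(\mathbb{E}[\hat{y}]-\mu\big)^2

When the observed target itself contains noise, an additional irreducible-error (noise-variance) term appears on the right-hand side.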

13
Q

Mean squared error

A

The overall accuracy of a prediction can be measured by its mean squared error (MSE).
Mean squared error can be decomposed into variance and squared bias components.
Accepting some bias will be advantageous if it results in a more substantial decrease in variance.
In practice we will want to use a prediction model that gets the right balance between prediction variance
and bias so as to minimize the prediction MSE.

14
Q

Why is the Mean Squared Error (MSE) on the test data often much higher than the training MSE?

A

This is a common phenomenon in machine learning, typically attributed to the model’s inability to generalize well to unseen data.

Overfitting: During training, the model tries to capture all the patterns and nuances present in the training data, including noise. If the model becomes too complex or is trained for too long, it may start to memorize the training data instead of learning general patterns. This phenomenon is called overfitting. As a result, the model performs well on the training data (low training MSE) but poorly on new, unseen data (high test MSE).

Generalization: When the model encounters new data during testing, it may meet patterns or variations that it hasn’t seen before. If the model hasn’t learned to generalize well from the training data, it may struggle to make accurate predictions on this new data. This leads to higher MSE on the test data compared to the training data.

Data Distribution: Sometimes, the distribution of the test data may be different from the distribution of the training data. If the model is trained on one type of data but tested on another type, its performance may suffer. This emphasizes the importance of having representative training and test datasets.
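
A minimal R sketch of the overfitting point (entirely hypothetical data): a very flexible polynomial fits the training set almost perfectly but does much worse on a fresh test set.

set.seed(1)                       # for reproducibility
n <- 30
train <- data.frame(x = runif(n))
train$y <- sin(2 * pi * train$x) + rnorm(n, sd = 0.3)
test <- data.frame(x = runif(n))
test$y <- sin(2 * pi * test$x) + rnorm(n, sd = 0.3)

# A deliberately over-flexible model: degree-15 polynomial fitted to only 30 points
fit <- lm(y ~ poly(x, 15), data = train)

mean((train$y - predict(fit, newdata = train))^2)  # training MSE: very small
mean((test$y - predict(fit, newdata = test))^2)    # test MSE: typically much larger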

15
Q

The Bias-Variance trade-off

A

The bias-variance trade-off implies that as we increase the complexity of a model, its variance increases and its bias decreases. Conversely, as we decrease the model’s complexity, its variance decreases but its bias increases. The best model balances the two so as to minimize the overall prediction MSE.

16
Q

validation dataset.

A

In practice we will not have target values for the test cases.
This means that we cannot use the test data to obtain a reliable MSE for prediction.
One approach is to split the training data:
one part is used to ‘build the model’ (i.e. estimate the model parameters);
the other part is used as a kind of independent test dataset to compute reliable MSE values. This part is
often called the validation dataset.
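
A minimal sketch of the split (entirely hypothetical data): hold out part of the training data, fit the model on the rest, and compute the MSE on the held-out validation part.

set.seed(1)
# Hypothetical data standing in for the full training set
full <- data.frame(x = rnorm(200))
full$y <- 3 + 2 * full$x + rnorm(200)

# Split: roughly 70% for model building, 30% held out as a validation set
build.idx <- sample(seq_len(nrow(full)), size = 140)
build.data <- full[build.idx, ]
valid.data <- full[-build.idx, ]

fit <- lm(y ~ x, data = build.data)

# Validation MSE: a more honest estimate of prediction error than the training MSE
mean((valid.data$y - predict(fit, newdata = valid.data))^2)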

17
Q

Akaike information criterion

A

The Akaike information criterion, usually abbreviated to AIC, is a measure of model quality.
AIC requires only a training dataset.
The lower the AIC value, the better the model.

# Stepwise model selection guided by AIC: step() searches for the model with the lowest AIC
wage.lm.step <- step(wage.lm)
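
As a hedged follow-up sketch (hypothetical data frame wage.train and column names, not taken from the notes): AIC() can also be used to compare candidate models directly, using only the training data.

wage.lm.full <- lm(wage ~ age + education + jobclass, data = wage.train)
wage.lm.small <- lm(wage ~ age, data = wage.train)

# Lower AIC indicates the better model
AIC(wage.lm.full, wage.lm.small)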