L5 - Improving Predictive Models Flashcards

(7 cards)

1
Q

What is an improved model?

A

"Improved" can have different meanings:
* Simpler, faster to run
* More accurate estimates

2
Q

How is simple model validation performed and what are its shortcomings?

A

Divide the data into training and testing sets (typically a 70-30 split, with the larger share used for training).
Calculate the error by comparing the model's predictions with the test data outputs.

Issue! The error estimate depends on a single split, so it may not show how well the model generalises.
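
The split-and-score procedure above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the toy data, the straight-line model and the helper names (`train_test_split`, `fit_line`, `mean_squared_error`) are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(rows):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mean_squared_error(model, rows):
    """Average squared difference between predictions and true outputs."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in rows) / len(rows)

# Hypothetical toy data: y = 2x + 1 plus a little Gaussian noise.
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(20)]

train, test = train_test_split(data, test_fraction=0.3, seed=0)
model = fit_line(train)                       # fit on training data only
test_error = mean_squared_error(model, test)  # score on held-out data
print(f"train={len(train)} test={len(test)} test MSE={test_error:.3f}")
```

Changing `seed` changes which rows land in the test set, and the reported error moves with it. That sensitivity to one particular split is the weakness of simple validation.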

3
Q

What is cross-validation?

A

Cross-validation is a technique used in machine learning to evaluate the performance of a model on unseen data. It involves dividing the available data into multiple folds or subsets, using one of these folds as a validation set, and training the model on the remaining folds. This process is repeated multiple times, each time using a different fold as the validation set. Finally, the results from each validation step are averaged to produce a more robust estimate of the model’s performance.

4
Q

Outline the process of k-fold cross-validation.

A
  1. Randomly divide the data set into k subsets (folds).
  2. Reserve one fold for validation. Train the model using the other folds.
  3. Repeat with a different fold reserved for validation.
  4. Repeat until every fold has been used for validation once.
  5. Calculate the k-fold loss.

k-fold loss = (1/k) * SUM(loss_i) for i = 1..k, where loss_i is the validation loss on fold i
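
The five steps can be sketched as follows. This is an illustrative sketch only: the "predict the training mean" model, the toy data and the function names (`k_fold_indices`, `k_fold_loss`) are assumptions chosen to keep the example short.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Step 1: randomly partition indices 0..n-1 into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def k_fold_loss(rows, k, fit, loss, seed=0):
    """Steps 2-5: hold out each fold once, then average the k losses."""
    folds = k_fold_indices(len(rows), k, seed)
    fold_losses = []
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held]
        valid = [rows[i] for i in held_out]
        model = fit(train)                      # train on the other folds
        fold_losses.append(loss(model, valid))  # validate on the reserved fold
    return sum(fold_losses) / k, fold_losses

# Hypothetical model: always predict the mean of the training targets.
def fit_mean(rows):
    return sum(y for _, y in rows) / len(rows)

def mse(model, rows):
    return sum((model - y) ** 2 for _, y in rows) / len(rows)

data = [(x, float(x % 5)) for x in range(20)]
avg_loss, fold_losses = k_fold_loss(data, k=5, fit=fit_mean, loss=mse)
print("per-fold losses:", fold_losses)
print("k-fold loss:", avg_loss)
```

Every row is used for validation exactly once and for training k - 1 times, so the averaged loss is less dependent on one lucky or unlucky split.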

5
Q

What are hyperparameters?

A

Hyperparameters are configuration variables that are set before the training process of a machine learning model begins. They control the learning process itself, rather than being learned from the data. Hyperparameters are crucial for tuning the performance of a model and can significantly impact its accuracy, generalization, and other metrics.
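
A small sketch may help separate the two kinds of value. It assumes a one-feature ridge regression without an intercept, where the penalty strength `lam` is the hyperparameter and the weight `w` is the learned parameter. The data, the candidate grid and the function names are hypothetical.

```python
def fit_ridge(rows, lam):
    """Learn the weight w of y = w*x with L2 penalty strength lam.

    lam is a hyperparameter: it is fixed before training starts.
    w is a parameter: it is computed from the training data.
    """
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)

def mse(w, rows):
    """Mean squared error of the predictions w*x against y."""
    return sum((w * x - y) ** 2 for x, y in rows) / len(rows)

# Hypothetical toy data that follows y = 3x exactly.
train = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
valid = [(4.0, 12.0), (5.0, 15.0)]

# Tuning: try each candidate value and keep the one with the
# lowest error on the validation data (candidate values are assumed).
candidates = [0.0, 0.1, 1.0, 10.0]
scores = {lam: mse(fit_ridge(train, lam), valid) for lam in candidates}
best_lam = min(scores, key=scores.get)
print(scores)
print("best lam:", best_lam)
```

Because this toy data has no noise, the unpenalised fit wins. On noisy data a non-zero penalty can give a lower validation error, which is why the value is tuned rather than fixed in advance.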

6
Q

Why is it desirable to reduce the number of predictors in data?

A

Data can have hundreds or thousands of predictors. This makes learning algorithms computationally intensive and the resulting models complex.
Methods:
1. Transform Features
2. Select Features
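
As an illustration of the second method, the sketch below ranks predictors by the absolute value of their Pearson correlation with the target and keeps the strongest ones. This is only one simple selection criterion, chosen here as an assumption. The column names and values are made up, and feature transformation is not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_features(columns, target, top_n):
    """Return the names of the top_n columns by |correlation| with target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical predictors: two informative, one weak, one constant.
columns = {
    "signal":   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "inverse":  [6.0, 5.0, 4.0, 3.0, 2.0, 1.5],
    "noise":    [2.0, -1.0, 2.0, -1.0, 2.0, -1.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]

kept = select_features(columns, target, top_n=2)
print("kept predictors:", kept)
```

A model trained on the two kept columns has half as many inputs to process, at the cost of ignoring whatever small signal the dropped columns carried.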

7
Q

What is ensemble learning?

A

Multiple weak learners (such as decision trees) are grouped together and their outputs are combined, for example by voting or averaging, to create a more robust model.
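
A minimal sketch of the idea, assuming majority voting over three decision stumps (one-level decision trees). The thresholds are set by hand rather than trained, and the data is constructed so that each stump errs on different rows. All names and values are hypothetical.

```python
from collections import Counter

def make_stump(feature_index, threshold):
    """A weak learner: predicts 1 if one feature exceeds a threshold, else 0."""
    def predict(row):
        return 1 if row[feature_index] > threshold else 0
    return predict

def majority_vote(models, row):
    """Combine the models' outputs by returning the most common prediction."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]

def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

# Hypothetical data: each stump misclassifies two different rows.
rows = [
    (0.9, 0.8, 0.2),
    (0.8, 0.3, 0.9),
    (0.2, 0.7, 0.8),
    (0.1, 0.2, 0.6),
    (0.3, 0.6, 0.1),
    (0.7, 0.1, 0.4),
]
labels = [1, 1, 1, 0, 0, 0]

stumps = [make_stump(i, 0.5) for i in range(3)]
stump_scores = [accuracy(s, rows, labels) for s in stumps]
ensemble_score = accuracy(lambda r: majority_vote(stumps, r), rows, labels)
print("individual accuracies:", stump_scores)
print("ensemble accuracy:", ensemble_score)
```

The vote helps here because the stumps make their mistakes on different rows. If all three failed on the same rows, combining them would not improve anything.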
