L5 - Improving Predictive Models Flashcards
(7 cards)
What is an improved model?
Improved can have different meanings.
* Simpler, faster to run
* More accurate estimates
How is simple model validation performed and what are its shortcomings?
Divide the data into training and testing data (typically a 70-30 split: 70% for training, 30% for testing).
Calculate the error by finding the difference between model predictions and the test data outputs.
Issue! The error comes from a single split, so it may not show how well the model generalises to other unseen data.
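The split-and-score procedure above can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the synthetic data, the straight-line model, and the helper names `fit_line`, `mse` and `holdout_error` are all assumptions made for the example. Repeating the split with different random seeds shows how much the error estimate can move from one split to the next.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse(model, xs, ys):
    """Mean squared difference between predictions and true outputs."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def holdout_error(xs, ys, train_fraction=0.7, seed=0):
    """Shuffle, split into train/test, fit on train, score on test."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * len(idx))
    train, test = idx[:cut], idx[cut:]
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return mse(model, [xs[i] for i in test], [ys[i] for i in test])

# Synthetic data (assumed for illustration): y = 2x + 1 plus noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + data_rng.gauss(0, 0.5) for x in xs]

# Same data, same model, five different random splits.
errors = [holdout_error(xs, ys, seed=s) for s in range(5)]
print(errors)
```

The spread of values in `errors` is the weakness of a single split: any one of them could have been reported as "the" test error.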
What is cross validation?
Cross-validation is a technique used in machine learning to evaluate the performance of a model on unseen data. It involves dividing the available data into multiple folds or subsets, using one of these folds as a validation set, and training the model on the remaining folds. This process is repeated multiple times, each time using a different fold as the validation set. Finally, the results from each validation step are averaged to produce a more robust estimate of the model’s performance.
Outline the process of k-fold cross validation
- Randomly divide the data set into k subsets (folds)
- Reserve 1 set for validation. Train the model using the other sets.
- Repeat with a different set reserved for validation.
- Repeat until all sets have been used for validation once.
- Calculate the k-fold loss (the average of the k validation losses)
k-fold loss = (1/k) * SUM(loss_i), for i = 1..k
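The steps above can be sketched as follows. This is a minimal example under stated assumptions: the helper names `k_fold_indices` and `k_fold_loss` are hypothetical, and the "model" is a deliberately trivial baseline that predicts the mean of the training outputs, so that the fold logic stays in focus.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def k_fold_loss(xs, ys, k, fit, loss, seed=0):
    """Return (average loss over the k folds, list of per-fold losses)."""
    fold_losses = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Train on the other k-1 folds, validate on the reserved fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_losses.append(
            loss(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return sum(fold_losses) / k, fold_losses

# Assumed baseline model: always predict the mean training output.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mse_constant(model, xs, ys):
    return sum((model - y) ** 2 for y in ys) / len(ys)

rng = random.Random(7)
xs = list(range(30))
ys = [5.0 + rng.gauss(0, 1) for _ in xs]

avg_loss, per_fold = k_fold_loss(xs, ys, k=5, fit=fit_mean, loss=mse_constant)
print(per_fold, avg_loss)
```

Any model can be swapped in by passing a different `fit` and `loss` pair; the fold handling does not change.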
What are Hyperparameters?
Hyperparameters are configuration variables that are set before the training process of a machine learning model begins. They control the learning process itself, rather than being learned from the data. Hyperparameters are crucial for tuning the performance of a model and can significantly impact its accuracy, generalization, and other metrics.
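As a concrete illustration, the number of neighbours `k` in a k-nearest-neighbour regressor is a hyperparameter: it is chosen before prediction rather than learned from the data. The sketch below is an assumed example (synthetic data, hypothetical helper names `knn_predict` and `validation_mse`) that tries a few candidate values and keeps the one with the lowest validation error.

```python
import random

def knn_predict(train_x, train_y, x, k):
    """Average the outputs of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def validation_mse(train_x, train_y, val_x, val_y, k):
    """Mean squared error on the validation set for a given k."""
    return sum(
        (knn_predict(train_x, train_y, x, k) - y) ** 2
        for x, y in zip(val_x, val_y)
    ) / len(val_x)

# Synthetic data (assumed for illustration): noisy square-root curve.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(80)]
ys = [x ** 0.5 + rng.gauss(0, 0.3) for x in xs]
train_x, train_y = xs[:60], ys[:60]
val_x, val_y = xs[60:], ys[60:]

# k is fixed before each evaluation; nothing in the data "learns" it.
candidates = [1, 3, 5, 9, 15]
scores = {k: validation_mse(train_x, train_y, val_x, val_y, k) for k in candidates}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Very small `k` tends to follow the noise and very large `k` tends to over-smooth, which is why the choice affects both accuracy and generalization.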
Why is it desirable to reduce the number of predictors in data?
Data can have hundreds or thousands of predictors. This makes learning algorithms computationally intensive and the resulting models complex.
Methods:
1. Transform Features
2. Select Features
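As one possible illustration of feature selection, the sketch below ranks predictors by the absolute value of their Pearson correlation with the output and keeps the top few. This filter-style ranking is an assumption chosen for simplicity, not the only way to select features, and the names `pearson` and `select_top_features` and the synthetic data are hypothetical.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def select_top_features(columns, target, m):
    """columns maps predictor name -> values; return the m names with largest |correlation|."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:m]

# Synthetic data (assumed): two informative predictors and eight pure-noise ones.
rng = random.Random(1)
n = 200
signal_a = [rng.gauss(0, 1) for _ in range(n)]
signal_b = [rng.gauss(0, 1) for _ in range(n)]
columns = {"signal_a": signal_a, "signal_b": signal_b}
for j in range(8):
    columns[f"noise_{j}"] = [rng.gauss(0, 1) for _ in range(n)]
target = [3 * a - 2 * b + rng.gauss(0, 0.1) for a, b in zip(signal_a, signal_b)]

kept = select_top_features(columns, target, m=2)
print(kept)
```

A model trained only on the `kept` columns has ten predictors reduced to two, which is the kind of saving the card describes.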
What is ensemble learning?
Multiple weak learning models (like decision trees) can be grouped together and their outputs combined (for example by voting or averaging), creating a more robust model.
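A minimal sketch of the idea, assuming bagging with majority voting as the combination rule: each weak learner is a one-split decision tree (a "stump") trained on a bootstrap resample of the data, and the ensemble predicts whichever label most stumps vote for. The function names and the synthetic data are hypothetical, and other ensemble schemes combine models differently.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """One-split tree for 1D inputs and 0/1 labels: returns (threshold, left_label)."""
    best = None
    for t in sorted(set(xs)):
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= t else 1 - left_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    return best[1], best[2]

def stump_predict(stump, x):
    t, left_label = stump
    return left_label if x <= t else 1 - left_label

def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return models

def ensemble_predict(models, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Synthetic data (assumed): label is 1 when x > 5, with 10% of labels flipped.
rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [(1 if x > 5 else 0) ^ (1 if rng.random() < 0.1 else 0) for x in xs]

models = fit_bagged_stumps(xs, ys)
print(ensemble_predict(models, 1.0), ensemble_predict(models, 9.0))
```

Individual stumps pick slightly different thresholds because each sees a different resample; the vote smooths out those differences.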