Section 5 Model Tuning Flashcards

(28 cards)

1
Q

Define a model/classifier

A

A model (or classifier) is generally defined as a function used to relate the target variable y to the input variables X.

2
Q

What are hyperparameters (λ) in a model, in general?

A

Any parameter that cannot be estimated from the data but that has an impact on the predictive performance of the model.
Can be used to control the complexity of a model or the optimization algorithm.
They are employed in the training process to estimate the parameters.
Can be set manually for a specific predictive problem.
Their selection can be related to a model selection procedure (tuning).
Cannot be learned during the training process.

3
Q

What form can hyperparameters take

A

Characteristics of the loss function used for training and learning
Variables inherent to the algorithm and/or optimization method implemented for training
Variables relating to the complexity of the model.

4
Q

What do the model parameters w do (so we can differentiate them from hyperparameters)?

A

Characterise the specifics of a certain model and are required for predictions.
Are learned (estimated) during the training process by minimising the loss function.
Cannot be set manually.

5
Q

Explain model training

A

Process of using the training data to estimate/learn the parameters w by minimising the training loss for fixed values of the hyperparameters.
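
A minimal sketch of this idea (assuming scikit-learn's LogisticRegression as the model; the data here are synthetic): the hyperparameter is fixed before fitting, and fit() estimates the parameters w by minimising the training loss.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for a training set
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

# The regularisation strength C is a hyperparameter: fixed before training
model = LogisticRegression(C=1.0)

# Training: estimate the parameters w by minimising the training loss
model.fit(X_train, y_train)
print(model.coef_, model.intercept_)  # the learned parameters w
```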

6
Q

Explain model testing

A

Process of evaluating the predictive performance of the model on out-of-sample (test) data.

7
Q

How do the hyperparameters relate to model complexity?

A

Hyperparameters (λ) control the complexity of the model and its ability to fit the training data by means of the optimisation procedure.

8
Q

Define the complexity of a model and what are the consequences with high and low complexity

A

Complexity refers to the flexibility of a model to fit a variety of functions and a model can be made arbitrarily complex.
Models with low complexity may be too simple and unable to fit the training set well.
Models with high complexity will fit the training data perfectly, but will generalise poorly.

9
Q

Why is training performance unreliable?

A

The training performance is an optimistic estimate of a model’s performance, since it is evaluated on the same data used to fit the model.

10
Q

Comparing training and out-of-sample performance, where is a model optimally tuned?

A

There is a gap between the training and out-of-sample loss and predictive performance. This gap depends on the hyperparameters and is the quantity that matters most.
Underfitting or overfitting the data will increase the gap; an optimally tuned model will minimise it.

11
Q

What does underfitting and overfitting mean

A

Underfitting: the model does not obtain a sufficiently low loss value on the training data, i.e. its predictive performance on the training data is not good enough. This leads to large bias and poor predictive performance on the test data.
Overfitting: the model learns patterns which are specific to the training data and not general to the underlying data generating process. This corresponds to too large a gap between training and test error and poor generalisation ability of the model: (near-)zero bias, but large variance in the predictions.

12
Q

Explain tuning

A

Tuning is the act of trying to minimise the gap between the training and out-of-sample loss and accuracy. It is the process of using validation data to select the hyperparameter values that give the maximum validation predictive performance.
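
A minimal sketch of this process (assuming scikit-learn and a k-nearest-neighbours classifier, where the number of neighbours is the hyperparameter being tuned; the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a validation set for tuning
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate hyperparameter values (here: number of neighbours)
candidates = [1, 3, 5, 11, 21]
val_acc = {}
for k in candidates:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_acc[k] = accuracy_score(y_val, model.predict(X_val))

# Tuning: pick the hyperparameter value with the best validation performance
best_k = max(val_acc, key=val_acc.get)
print(val_acc, "best:", best_k)
```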

13
Q

Explain bias

A

The error introduced by approximating the data generating process by a simpler model is denoted bias.

14
Q

Explain variance

A

The variance of a model is proportional to its flexibility: the more flexible the model, the higher the variance. It reflects the model’s stability, i.e. how much its predictions would change if it were trained on a different sample.

15
Q

What is the expected generalisation error

A

Expected generalisation error = variance of the model + (bias of the model)² + variance of the error terms.
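
Written out (one standard way to state the bias–variance decomposition for squared-error loss; the notation with the fitted model f̂ and noise variance σ²_ε is assumed here, not taken from the card):

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^{2}\right]
  = \underbrace{\operatorname{Var}\!\big(\hat{f}(x)\big)}_{\text{variance of the model}}
  + \underbrace{\operatorname{Bias}\!\big(\hat{f}(x)\big)^{2}}_{\text{squared bias}}
  + \underbrace{\sigma^{2}_{\varepsilon}}_{\text{variance of the error terms}}
```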

16
Q

What is the optimal complexity in a model

A

The optimal complexity is the complexity level which balances bias and variance.

17
Q

How can we consider different hyperparameter values in a model

A

Hyperparameters λ specify/affect the characteristics of a model, so we can consider two (or more) versions of a model with the same structural form f(·) but different hyperparameter values. Tuning means using validation data to select the optimal values.

18
Q

How is cross-validation applied in model tuning?

A

We can use cross-validation to compare models with different hyperparameters, i.e. we can use cross-validation to select the optimal λ and hence the best support vector machine classifier to use.
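
A minimal sketch of this (assuming scikit-learn's SVC with an RBF kernel, where gamma plays the role of the kernel hyperparameter λ being compared; the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Two (or more) SVM classifiers differing only in the kernel hyperparameter
for gamma in [0.01, 0.1, 1.0]:
    scores = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5)
    # The average cross-validated accuracy is the basis for selecting lambda
    print(f"gamma={gamma}: mean CV accuracy = {scores.mean():.3f}")
```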

19
Q

If the training and validation procedure is implemented in a resampling framework, how are the optimal hyperparameters determined?

A

As the hyperparameter values maximising the average predictive performance over the replicates.

20
Q

How do we assess generalised predictive performance after tuning

A

Once the model has been tuned, one needs to assess its generalised predictive performance using separate test data.

21
Q

What comes first model tuning or selection?

A

Model tuning can be implemented in conjunction with model selection, whereby different types of models are compared, each across different hyperparameter instances: comparing models and tuning happen at the same time.

22
Q

Give examples of hyperparameters in logistic regression

A

The step size η of the gradient descent algorithm is a hyperparameter that controls the optimization process (no need to worry about that).
The classification threshold τ is a hyperparameter that controls how data points are classified; it is tuned with respect to predictive performance and the purpose of the model.
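
A minimal sketch of tuning the classification threshold τ on validation data (assuming scikit-learn's LogisticRegression; the F1 score is just one possible performance measure tied to the model's purpose, and the data are synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6, weights=[0.8, 0.2], random_state=2)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=2)

model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_val)[:, 1]  # estimated P(y = 1 | x)

# Try a grid of thresholds tau and keep the one with the best validation score
taus = np.linspace(0.1, 0.9, 17)
scores = [f1_score(y_val, (probs >= t).astype(int)) for t in taus]
best_tau = taus[int(np.argmax(scores))]
print("best threshold:", best_tau)
```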

23
Q

Give examples of hyperparameters in SVM classifier

A

In a support vector machine classifier, the hyperparameters of the kernel function control the complexity of the model.
Gaussian Radial Basis Function kernel (GRBF) – σ scaling coefficient
The cost parameter C is also a hyperparameter which controls complexity.
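
A minimal sketch of tuning the kernel and cost hyperparameters together (assuming scikit-learn's SVC, where gamma corresponds to the GRBF scaling and C to the cost; the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=3)

# Grid of candidate values for the GRBF scaling (gamma) and the cost (C)
param_grid = {"gamma": [0.01, 0.1, 1.0], "C": [0.1, 1.0, 10.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, search.best_score_)
```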

24
Q

Give an example of a hyperparameter for classification trees

A

For classification trees, the problem of defining a tree T can be formulated as a loss minimisation problem in which the hyperparameter λ penalises, and hence controls, the complexity of the tree.
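
A minimal sketch (assuming scikit-learn's DecisionTreeClassifier, where the cost-complexity parameter ccp_alpha plays the role of λ; the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=4)

# A larger complexity penalty (ccp_alpha) gives a smaller, simpler tree
for alpha in [0.0, 0.01, 0.05]:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=4)
    scores = cross_val_score(tree, X, y, cv=5)
    print(f"ccp_alpha={alpha}: mean CV accuracy = {scores.mean():.3f}")
```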

25
Q

Give an example of a hyperparameter for random forests

A

For random forests (and bagging), the number of variables considered at each split affects the predictive performance: the more variables considered, the more complex the ensemble.
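
A minimal sketch (assuming scikit-learn's RandomForestClassifier, where max_features is the number of variables considered at each split; the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=12, random_state=5)

# More variables considered per split => a more complex ensemble
for m in [2, 4, 8, 12]:
    rf = RandomForestClassifier(n_estimators=200, max_features=m, random_state=5)
    scores = cross_val_score(rf, X, y, cv=5)
    print(f"max_features={m}: mean CV accuracy = {scores.mean():.3f}")
```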

26
Q

Name four things you should do in order to tune and compare multiple classifiers appropriately using a cross-validation method

A

1. Split the data into a training set and a validation set (optionally split three ways to keep a separate test set for assessing final predictive performance).
2. Train the models on the training data and assess their predictive performance on the validation data.
3. When comparing across models, ensure that the training and validation data come from the same splits.
4. Select the best classifier and hyperparameter values according to the average validation accuracy, replicating the procedure multiple times if using k-fold or hold-out sample cross-validation.
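
A minimal sketch of this procedure (assuming scikit-learn and three hypothetical competing classifiers; the key point is that every classifier is trained and validated on exactly the same splits so the comparison is fair):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=6)

classifiers = {
    "logistic": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="rbf", gamma=0.1),
    "forest": RandomForestClassifier(n_estimators=200, random_state=6),
}

# Fix the splits once so all classifiers see exactly the same folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=6)
results = {name: [] for name in classifiers}

for train_idx, val_idx in cv.split(X, y):
    for name, clf in classifiers.items():
        clf.fit(X[train_idx], y[train_idx])
        results[name].append(accuracy_score(y[val_idx], clf.predict(X[val_idx])))

# Select the classifier with the best average validation accuracy
for name, accs in results.items():
    print(f"{name}: mean validation accuracy = {np.mean(accs):.3f}")
```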

27
Q

Given a collection of competing classifiers and some data, does the use of a validation set to select the best one in the collection guarantee that the selected one will always provide the best predictive performance on future unseen observations?

A

No, because:
The selected classifier may have overfitted the validation data.
Other classifiers may have higher variability, so their test-data performance could overlap with that of the selected one.
The validation set may not be fully representative of the variability in real-world data.
There may be better models outside the collection considered.

28
Q

Is classifier M2 to be preferred to classifier M1 if they are trained and validated on different splits?

A

We cannot compare them accurately, as they are not trained and validated on the same splits of the data. It is always necessary to train competing models and calculate their validation performance on the same splits of the data; otherwise we cannot tell whether a difference in performance is due to variation in the data splits or to the classifiers’ actual ability.