Model Training, Tuning and Evaluation Flashcards
(63 cards)
What is an activation function?
The function inside a neuron that takes the neuron's inputs and determines its output.
What is a rectified linear unit activation function (ReLU)?
An activation function that outputs the input directly when it is above 0, and outputs 0 when the input is below 0.
This is easy and fast to compute.
What is a parametric ReLU activation function?
The same as ReLU, except that instead of just outputting 0 on the negative half, the slope of the negative half is learned through back-propagation.
More complicated and expensive to compute.
What is a Swish activation function?
Developed by Google; defined as x · sigmoid(x). Works very well for deep neural networks.
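A minimal NumPy sketch of the three activation functions above (the PReLU slope `alpha` would normally be learned through back-propagation; the value here is just illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def parametric_relu(x, alpha=0.05):
    # alpha is illustrative; in PReLU it is a learned parameter
    return np.where(x > 0, x, alpha * x)

def swish(x):
    return x / (1.0 + np.exp(-x))  # equivalent to x * sigmoid(x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))             # negatives clipped to 0
print(parametric_relu(x))  # negatives scaled by alpha
print(swish(x))            # smooth curve through 0
```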
What activation functions do RNNs tend to use?
Non-linear activation functions, typically tanh.
What are convolutional neural networks mainly used for?
Image analysis
What does feature location invariant mean? And which type of neural network is it?
Means it doesn’t matter where within an image the key object is. Convolutional neural networks.
How does a convolutional neural network work?
Takes a source image and breaks it into chunks that are processed by convolutions. These convolutional layers are then stacked, with each layer processing increasingly complex features.
For example, it starts with lines, then shapes, then recognising and assembling those shapes, then recognising whole objects, etc.
What is something important to note about the source data for CNNs?
It must be of the appropriate dimensions: width × height × colour channels.
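A minimal PyTorch sketch of this layered structure (the layer sizes, 32×32 RGB input, and 10-class output are illustrative assumptions, not from the cards):

```python
import torch
import torch.nn as nn

# Each convolutional layer detects increasingly complex patterns:
# edges -> shapes -> objects, as described above.
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # low-level features (lines/edges)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level features (shapes)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),  # classifier over 10 hypothetical classes
)

# The input must match the expected dimensions: batch x channels x height x width.
x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image
print(model(x).shape)          # torch.Size([1, 10])
```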
How are neural networks trained? What is the name of the method?
Gradient descent
What is an epoch in gradient descent?
One full training pass over the entire training dataset.
What is the downside of the learning rate being too high or too low?
Too high: the optimiser might overshoot the optimal solution
Too low: training will take too long to reach the optimal solution
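A tiny sketch of gradient descent on a one-dimensional function, f(w) = (w − 3)², to show how epochs and the learning rate interact (the function and values are illustrative):

```python
# Minimal gradient descent on f(w) = (w - 3)**2, whose minimum is at w = 3.
# Each loop iteration is one "epoch"; learning_rate controls the step size.
w = 0.0
learning_rate = 0.1   # try 1.1 (too high: overshoots and diverges)
                      # or 0.0001 (too low: barely moves in 50 epochs)
for epoch in range(50):
    gradient = 2 * (w - 3)          # derivative of (w - 3)**2
    w -= learning_rate * gradient   # step downhill along the gradient
print(w)  # ~3.0
```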
What is the purpose of regularisation?
The practice of preventing overfitting
What are 3 high-level ways of doing regularisation?
- Use a simpler model
- Dropout - randomly drop neurons during training to reduce over-reliance on specific neurons (see the sketch after this list)
- Early stopping - automatically stop the training earlier than the dictated epochs to prevent overfitting
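A hedged PyTorch sketch of the dropout idea from the list above (the layer sizes and p=0.5 are illustrative):

```python
import torch
import torch.nn as nn

# Dropout as a regulariser: during training, each activation is zeroed with
# probability p, so the network cannot over-rely on specific neurons.
layer = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Dropout(p=0.5))

layer.train()                    # dropout is active in training mode
print(layer(torch.randn(1, 8)))  # roughly half the activations are zeroed

layer.eval()                     # dropout is disabled at inference time
print(layer(torch.randn(1, 8)))
```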
What is L1 regularisation?
A regularisation method where a regularisation term is added as your weights are learned. L1 is the sum of the absolute values of the weights.
What is L2 regularisation?
A regularisation method where a regularisation term is added as your weights are learned. L2 is the sum of the squares of the weights.
What is the difference between L1 and L2 regularisation?
With L1 the regularisation term is the sum of the absolute values of the weights; with L2 it is the sum of the squares of the weights. L2 shrinks weights but keeps every feature, whereas L1 can drive some weights to exactly 0, so entire features can disappear (feature selection).
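A short sketch of the two penalty terms (the weight values and the lambda scaling are illustrative):

```python
import numpy as np

# The chosen penalty is added to the loss, scaled by a hyperparameter lambda:
#   loss = data_loss + lam * penalty
weights = np.array([0.5, -1.2, 3.0])

l1_penalty = np.sum(np.abs(weights))  # L1: sum of absolute weights -> 4.7
l2_penalty = np.sum(weights ** 2)     # L2: sum of squared weights  -> 10.69
```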
What are 4 ways to counteract the vanishing gradient problem?
Use a multi-level hierarchy (break the levels into their own sub-networks that are individually trained)
Use long short-term memory (RNNs)
Use residual networks (an ensemble of smaller networks)
Use a different activation function (e.g., ReLU instead of sigmoid)
What is the vanishing gradient problem?
When the gradient (slope) approaches 0 as it is propagated back through the layers, the weight updates become very, very small, which can cause numerical issues and means the NN learns very slowly, especially in its earlier layers.
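A quick numerical sketch of why this happens with sigmoid activations (the derivative of the sigmoid is at most 0.25, so back-propagation through many layers multiplies many numbers below 1):

```python
# The gradient reaching early layers is, at best, a product of per-layer
# sigmoid derivatives, each capped at 0.25.
sigmoid_grad_max = 0.25
for depth in (1, 5, 10, 20):
    print(depth, sigmoid_grad_max ** depth)  # shrinks rapidly toward 0
```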
Why can accuracy be a problematic metric?
Because it can be misleading. For instance, a test for a very rare disease could be right 99.9% of the time by just always guessing that the person doesn’t have it.
What is recall?
The number of correct positive predictions out of the number of actual positives (the true positive rate).
What is precision?
The number of correct positive predictions out of the total number of positive predictions made.
What is specificity?
The number of correct negative predictions out of the number of actual negatives (the true negative rate; like recall but for negatives).
In layman’s terms, what is F1 score?
The harmonic mean of precision and recall.
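A short sketch computing all of these metrics from hypothetical confusion-matrix counts (the numbers are made up for illustration):

```python
# Hypothetical confusion-matrix counts for a rare-disease test.
tp, fp, tn, fn = 90, 10, 850, 50

accuracy    = (tp + tn) / (tp + fp + tn + fn)
recall      = tp / (tp + fn)   # correct positives / actual positives
precision   = tp / (tp + fp)   # correct positives / positive predictions
specificity = tn / (tn + fp)   # correct negatives / actual negatives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, recall, precision, specificity, f1)
```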