Chapter 5 Flashcards
Why use a DL framework?
Because we do not need to implement all these techniques from scratch in our neural network; instead, we can focus on the big picture of solving real-world problems.
Examples of DL frameworks
- TensorFlow
- PyTorch
Steps for using a DL framework
1- Write our program as a Python program.
2- Import the framework of choice as a library.
3- Use DL functions from the framework that fit in our program (see the sketch below).
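A minimal sketch of steps 1 and 2, assuming TensorFlow/Keras as the framework of choice; the version check is only illustrative.

```python
# Step 1: this file is an ordinary Python program.
# Step 2: import the framework of choice as a library.
import tensorflow as tf
from tensorflow import keras

# Illustrative: confirm the framework is available before using its DL functions.
print(tf.__version__)
```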
What are the important functions we use to build our neural networks?
- keras.Sequential: to build the network.
- keras.layers.Flatten: to handle our inputs.
- keras.layers.Dense: to handle the hidden layers, the number of neurons, and the activation function.
- keras.optimizers.SGD: to apply gradient descent and set the learning rate.
- compile: to prepare the model for training.
- fit: to start the training.
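A minimal sketch combining these functions, assuming a MNIST-like task (28×28 grayscale images, 10 classes); the layer sizes, learning rate, epochs, and batch size are illustrative choices, not prescribed values.

```python
import tensorflow as tf
from tensorflow import keras

# Illustrative data: MNIST digits, scaled to [0, 1].
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train / 255.0

# keras.Sequential builds the network layer by layer.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),                       # handle our inputs (flatten 28x28 to 784)
    keras.layers.Dense(64, activation="relu"),    # hidden layer: 64 neurons, ReLU activation
    keras.layers.Dense(10, activation="softmax")  # output layer: one neuron per class
])

# keras.optimizers.SGD applies gradient descent with the chosen learning rate.
opt = keras.optimizers.SGD(learning_rate=0.01)

# compile prepares the model for training.
model.compile(optimizer=opt,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# fit starts the training.
model.fit(x_train, y_train, epochs=5, batch_size=32)
```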
Epochs
refers to one cycle through the full training dataset.
Batch Size
The number of training samples used in one iteration.
Saturated neurons
When a neuron becomes insensitive to input changes because its derivative is 0 in the saturated region; this can cause learning to stop completely.
vanishing gradient problem
A problem where the backpropagated error is 0 and the weights are not adjusted. Saturated neurons are one of the causes.
How to avoid Saturated neurons?
Three common techniques:
- weight initialization.
- input standardization.
- batch normalization.
weight initialization.
A way to ensure that our neurons are not saturated to begin with. If a neuron has a large number of inputs, we want to initialize the weights to smaller values so that the input to the activation function has a reasonable probability of staying close to 0, which avoids saturation.
weight initialization strategies
- Glorot initialization: recommended for tanh- and sigmoid-based neurons.
- He initialization: recommended for ReLU-based neurons.
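A minimal sketch of choosing an initializer per activation in Keras; the layer widths and input shape are illustrative.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),
    # Glorot initialization for a tanh-based layer.
    keras.layers.Dense(64, activation="tanh", kernel_initializer="glorot_uniform"),
    # He initialization for a ReLU-based layer.
    keras.layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    keras.layers.Dense(10, activation="softmax")
])
```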
Input standardization.
Standardizing the input data to be centered around 0, with most values close to 0, by subtracting the mean and dividing by the standard deviation.
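A minimal sketch with NumPy, assuming x_train and x_test are hypothetical arrays of raw input features; the statistics are computed on the training set only and reused for the test set.

```python
import numpy as np

# Hypothetical raw inputs.
x_train = np.random.rand(1000, 784) * 255.0
x_test = np.random.rand(200, 784) * 255.0

# Compute mean and standard deviation on the training data.
mean = x_train.mean()
std = x_train.std()

# Subtract the mean and divide by the standard deviation.
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std  # apply the same statistics to the test data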
Batch normalization
The idea is to normalize values inside of the network as well and thereby prevent hidden neurons from becoming saturated.
The two strategies to apply batch normalization.
- apply the normalization before the activation function.
- apply the normalization after the activation function.
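A minimal sketch of both placements using keras.layers.BatchNormalization; the layer sizes are illustrative, and which placement works better is an empirical choice.

```python
from tensorflow import keras

# Strategy 1: normalization before the activation function.
before = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64),
    keras.layers.BatchNormalization(),
    keras.layers.Activation("relu"),
    keras.layers.Dense(10, activation="softmax")
])

# Strategy 2: normalization after the activation function.
after = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax")
])
```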
Examples of activation functions to avoid saturated neurons
- ReLU.
- Leaky ReLU.
- Maxout.
- ELU.
- Softplus.
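A minimal sketch of these activations in Keras; maxout is not a built-in Keras activation, so it is omitted here, and the layer sizes are illustrative.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation="relu"),      # ReLU
    keras.layers.Dense(64),
    keras.layers.LeakyReLU(),                       # leaky ReLU applied as its own layer
    keras.layers.Dense(64, activation="elu"),       # ELU
    keras.layers.Dense(64, activation="softplus"),  # softplus
    keras.layers.Dense(10, activation="softmax")
])
```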
What are the variations on gradient descent that help with faster learning?
1- Momentum.
2- Adaptive learning rate.
3- Adaptive moments.
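A minimal sketch mapping each variation to a Keras optimizer; RMSprop is used here as the adaptive-learning-rate example and Adam as the adaptive-moments example, with illustrative learning rates.

```python
from tensorflow import keras

# 1- Momentum: SGD with a momentum term.
momentum_opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# 2- Adaptive learning rate: RMSprop adapts the step size per parameter.
adaptive_lr_opt = keras.optimizers.RMSprop(learning_rate=0.001)

# 3- Adaptive moments: Adam combines momentum with an adaptive learning rate.
adam_opt = keras.optimizers.Adam(learning_rate=0.001)
```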
Gradients exploding
A problem where the gradient becomes too large at some point, causing a huge step size.
Gradient clipping
A technique to avoid exploding gradients by simply not allowing overly large values of the gradient in the weight update step.
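A minimal sketch using the clipping arguments on a Keras optimizer; the threshold values are illustrative.

```python
from tensorflow import keras

# Clip each gradient value to the range [-1.0, 1.0] before the weight update.
clipped_opt = keras.optimizers.SGD(learning_rate=0.01, clipvalue=1.0)

# Alternative: rescale the whole gradient vector if its norm exceeds 1.0.
norm_clipped_opt = keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)
```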
How to avoid overfitting?
Split the dataset into:
1- Training dataset: to train the model.
2- Validation dataset: to tune hyperparameters.
3- Test dataset: to test the model.
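A minimal sketch of a three-way split, assuming scikit-learn is available and that x and y are hypothetical full feature and label arrays; the split ratios are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical full dataset.
x = np.random.rand(1000, 784)
y = np.random.randint(0, 10, size=1000)

# First split off the test set (20%), then split a validation set from the remainder.
x_temp, x_test, y_temp, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(x_temp, y_temp, test_size=0.2, random_state=0)

# x_train/y_train: train the model.
# x_val/y_val: tune hyperparameters.
# x_test/y_test: final evaluation of the model.
```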