DL CNNs & RNNs Flashcards
How to calculate number of parameters (weights and biases) in a CNN?
((filter width * filter height * number of input channels) + 1 for the bias) * number of filters in the new layer
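A minimal sketch of that count; the layer sizes (3x3 filters, 64 input channels, 128 new filters) and the helper name `conv_params` are illustrative assumptions:

```python
def conv_params(filter_w, filter_h, in_channels, out_filters):
    # One (filter_w * filter_h * in_channels) kernel plus one bias per output filter
    weights_per_filter = filter_w * filter_h * in_channels
    return (weights_per_filter + 1) * out_filters

# Illustrative example: 3x3 filters, 64 input channels, 128 new filters
print(conv_params(3, 3, 64, 128))  # (3*3*64 + 1) * 128 = 73,856
```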
What is dropout?
Randomly setting a fraction of input units to 0 at each update during training time - helps prevent overfitting
How do we combat exploding and vanishing gradients?
- Normalization of inputs + careful initialization of weights
- Regularization
- Gradient clipping
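A minimal numpy sketch of one common form of gradient clipping (clipping by global norm); the threshold of 5.0 is an assumed value:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    # Rescale a list of gradient arrays so their combined L2 norm is at most max_norm
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / global_norm) for g in grads]
    return grads
```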
Tips for Choosing Initial weights
- Never set to all zero
- Try small values, e.g. somewhere between -0.2 and 0.2 (or scaled by fan-in)
- Biases are often initialized to 0.01 or similar small value
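A small numpy sketch of those tips; the layer sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128          # illustrative layer sizes

# Option 1: small uniform weights, e.g. in [-0.2, 0.2]
W = rng.uniform(-0.2, 0.2, size=(fan_out, fan_in))

# Option 2: scale by fan-in so activations stay well behaved as layers get wider
W = rng.normal(scale=1.0 / np.sqrt(fan_in), size=(fan_out, fan_in))

# Biases: a small constant such as 0.01 (never initialise all weights to zero)
b = np.full(fan_out, 0.01)
```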
What is regularization?
“Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.” (Goodfellow et al., 2016)
Regularization methods for regression
- Lasso - L1, encourages sparseness in weight matrices
- Ridge - L2 (weight decay/parameter shrinkage)
- Elastic net - combines Lasso and Ridge
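A sketch of how those penalties attach to a squared-error loss; the weighting `lam` and mixing ratio `l1_ratio` are assumed hyperparameters:

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1, l1_ratio=0.5, kind="elastic"):
    # Squared error plus an L1 (lasso), L2 (ridge), or combined (elastic net) penalty
    mse = np.mean((X @ w - y) ** 2)
    l1 = np.sum(np.abs(w))      # encourages sparse weights
    l2 = np.sum(w ** 2)         # shrinks weights (weight decay)
    if kind == "lasso":
        return mse + lam * l1
    if kind == "ridge":
        return mse + lam * l2
    return mse + lam * (l1_ratio * l1 + (1 - l1_ratio) * l2)
```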
Inverted dropout
During training, randomly drop units according to a dropout probability at each training step, and scale the surviving activations up by 1/(keep probability) so no rescaling is needed at test time
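A minimal numpy sketch of an inverted dropout forward pass; `keep_prob = 0.8` is an assumed value:

```python
import numpy as np

def inverted_dropout(a, keep_prob=0.8):
    # Drop each unit with probability 1 - keep_prob, then rescale survivors by
    # 1 / keep_prob so the expected activation is unchanged at test time
    mask = np.random.rand(*a.shape) < keep_prob
    return a * mask / keep_prob
```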
How does dropout work?
Spreads out the weights - the network cannot rely on any one input too much
Disadv. of dropout
Introduces another hyperparameter - dropout probability - often one for each layer
What’s another type of regularization apart from inverted dropout?
- Dataset augmentation
- Synthesize examples by flipping, rotating, cropping, distorting
- makes the model more robust to these variations
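A small sketch of synthesising an extra example from an (H, W, C) image by flipping and cropping; the 90% crop size and flip probability are assumptions:

```python
import numpy as np

def augment(image, rng=np.random.default_rng(0)):
    # Random horizontal flip
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random crop covering roughly 90% of the original height and width
    h, w, _ = image.shape
    crop_h, crop_w = int(0.9 * h), int(0.9 * w)
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]
```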
What is early stopping?
Allowing a model to train until it begins to overfit, then rolling back to the point at which the error curves on the training and validation sets begin to diverge
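A self-contained sketch of the idea; `train_one_epoch` and `validation_error` are hypothetical stand-ins for a real model's training and evaluation steps, and the patience of 5 epochs is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_one_epoch(weights):                 # hypothetical training step
    return weights + rng.normal(scale=0.01, size=weights.shape)

def validation_error(weights):                # hypothetical validation metric
    return float(np.sum(weights ** 2))

weights = rng.normal(size=10)
best_error, best_weights = float("inf"), weights
patience, bad_epochs = 5, 0

for epoch in range(100):
    weights = train_one_epoch(weights)
    err = validation_error(weights)
    if err < best_error:                      # still improving: remember this checkpoint
        best_error, best_weights, bad_epochs = err, weights, 0
    else:                                     # diverging: count towards patience
        bad_epochs += 1
        if bad_epochs >= patience:
            break

weights = best_weights                        # roll back to the best point
```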
1:1 RNN
Vanilla feedforward network without recurrence
Image classification
1:M RNN
Image captioning
M:1 RNN
Sentiment analysis
M:M RNN
Machine translation
M:M RNN
Video classification (synced sequence input and output)
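A minimal numpy sketch of the M:1 case above (e.g. sentiment analysis): the hidden "state vector" is updated once per input element and a single output is read off at the end. All dimensions and weight scales are illustrative assumptions:

```python
import numpy as np

T, input_dim, hidden_dim = 10, 8, 16          # illustrative sizes
rng = np.random.default_rng(0)

Wxh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
Whh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
Why = rng.normal(scale=0.1, size=(1, hidden_dim))
bh = np.zeros(hidden_dim)

x_seq = rng.normal(size=(T, input_dim))       # input sequence of M = T elements
h = np.zeros(hidden_dim)                      # state vector

for x_t in x_seq:
    # The state vector implicitly summarises the history of past elements
    h = np.tanh(Wxh @ x_t + Whh @ h + bh)

y = Why @ h                                   # single output, e.g. a sentiment score
```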
Why Convolutions?
The main advantages of using convolutions are parameter sharing and sparsity of connections. Parameter sharing is helpful because it reduces the number of weight parameters in a layer without losing accuracy. Additionally, the convolution operation breaks the input down into a smaller feature space, so that each output value depends on only a small number of inputs and can be adjusted quickly.
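A rough parameter-count comparison illustrating both points; the layer sizes are assumptions chosen for the arithmetic, not from the text:

```python
# 32x32x3 input, 16 output channels, 3x3 kernels (illustrative numbers)
h, w, c_in, c_out, k = 32, 32, 3, 16, 3

dense_params = (h * w * c_in) * (h * w * c_out)   # every input connected to every output
conv_params = (k * k * c_in + 1) * c_out          # one shared k x k kernel + bias per filter

print(dense_params)   # 50,331,648
print(conv_params)    # 448
```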
CNNs
Designed to process data that come in the form of multiple arrays, for example a colour image composed of three 2-D arrays containing the pixel intensities in the three colour channels
Key features of CNNs
- local connections
- shared weights
- pooling
- use of many layers
- roots in the neocognitron
Role of convolutional layer
- detect local conjunctions of features from the previous layer
Role of pooling layer
To merge semantically similar features into one
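A minimal numpy sketch of 2x2 max pooling, which merges each 2x2 neighbourhood of a feature map into a single value:

```python
import numpy as np

def max_pool_2x2(x):
    # x is an (H, W) feature map; each output value is the max of a 2x2 block
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]                      # trim to even dimensions
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```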
Success of CNNs
ImageNet 2012
Halved error rates of competing approaches
Efficient use of GPUs, ReLUs
New regularizations - dropout
Techniques to generate more training examples by deforming existing ones
RNNs are good for what tasks
Those that involve sequential input (speech and language)
- process an input sequence one element at a time, maintaining in their hidden units a ‘state vector’ that implicitly contains information about the history of all the past elements of the sequence
Useful things about CNNs
- Partial connectivity (i.e. sparse connections) - not all the units in layer i are connected to all the units in layer i + 1
- Weight sharing - different parts of the network are forced to use the same weights