the rest Flashcards
(13 cards)
What is a key difference between fully connected MLPs and CNNs?
In contrast to fully connected networks, where every input is connected to every neuron in the subsequent layer, CNNs use convolutions as their base operation.
CNNs capture local spatial relationships within an image. Using CNNs for image data allows for far fewer weights, and therefore lightweight, easier-to-train models with the same or better predictive power than fully connected networks on image-based tasks.
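A rough worked comparison (with hypothetical numbers): flattening a 32x32x3 image into a fully connected layer of 100 neurons needs 32 x 32 x 3 x 100 = 307,200 weights, while a convolutional layer with 100 filters of size 5x5x3 needs only 5 x 5 x 3 x 100 = 7,500 weights (plus biases), because the same small filters are slid across the whole image.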
Equation for the output size of a convolution
(Original Image Dimension - Filter Dimension + 2 x Padding Size) / Stride + 1
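Worked example (hypothetical numbers): a 32x32 input, a 5x5 filter, padding of 2 and stride of 1 gives (32 - 5 + 2 x 2) / 1 + 1 = 32, so the output keeps the same spatial size as the input.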
Equation for the number of parameters
F x F x D x K + K
(Filter dimensions are F x F x D, K is the number of filters; the final + K accounts for one bias term per filter)
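Worked example (hypothetical numbers): 8 filters of size 5x5x3 give 5 x 5 x 3 x 8 + 8 = 600 + 8 = 608 parameters.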
What is a problem with RNNs?
-The vanishing / exploding gradient problem. Because the network applies the same recurrent weight repeatedly, its overall contribution scales like x^n, where x is the weight and n is the number of times it is applied: if the weight is > 1 the gradient explodes, if it is < 1 the gradient vanishes.
-This makes it very hard for backpropagation to take useful gradient-descent steps towards the optimal values.
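A minimal Python illustration of this repeated-weight effect (the weight values are purely illustrative):

w_small, w_large, steps = 0.9, 1.1, 50
print(w_small ** steps)   # ~0.005 -> the contribution vanishes
print(w_large ** steps)   # ~117   -> the contribution explodes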
What is the difference in activation functions between vanilla RNNs and LSTM RNNs?
While vanilla RNNs typically use ReLU activations, LSTMs use sigmoid activations (which normalise values to between 0 and 1) and tanh (which normalises values to between -1 and 1).
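A quick NumPy sketch of the two LSTM activations (the input values are illustrative):

import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes any input into the range (0, 1)
print(sigmoid(5.0), sigmoid(-5.0))     # close to 1 and close to 0
print(np.tanh(5.0), np.tanh(-5.0))     # tanh squashes inputs into the range (-1, 1)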
How does the long-term memory % unit in an LSTM network work (first unit)?
We have two components, long-term memory and short-term memory:
-The short-term memory (some value) is multiplied by a weight and added to the input (which is also multiplied by a weight); a bias is added to the sum and the result is passed through a sigmoid activation function
-The result of the activation function is multiplied by the long-term memory value to get the output of this stage
-This first stage in the LSTM unit therefore determines what percentage of the long-term memory is remembered
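A minimal NumPy sketch of this first stage (all weights and values are illustrative, not from the cards):

import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
x, short_term, long_term = 1.0, 0.5, 2.0                  # input, short-term memory, long-term memory
forget_pct = sigmoid(0.3 * short_term + 0.7 * x + 0.1)    # % of long-term memory to remember
long_term = forget_pct * long_term                        # scaled long-term memory passed to the next stage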
How do the second and third units of an LSTM network function (% of potential memory to remember, potential long-term memory unit)?
The weighted short-term memory and the weighted input are summed together with a bias and passed through a sigmoid activation function, much like in the first unit.
The third unit functions the same way, but its activation function is tanh.
The resulting values of these two units are multiplied together and added to the output of the first unit.
This becomes the new long term memory
(these are called the memory gates)
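A NumPy sketch of these two units (self-contained; all weights and values are illustrative):

import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
x, short_term, long_term = 1.0, 0.5, 1.4                  # long_term here is the output of the first unit
remember_pct = sigmoid(0.4 * short_term + 0.6 * x + 0.0)  # second unit: % of the potential memory to remember
candidate = np.tanh(0.2 * short_term + 0.8 * x + 0.1)     # third unit: potential long-term memory
long_term = long_term + remember_pct * candidate          # new long-term memory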
What is the final stage of the LSTM sequence?
The short-term memory is multiplied by a weight, as is the input; these are summed, a bias is added, and the result is passed through a sigmoid function (determining what percentage of the memory the LSTM outputs).
Then the new long-term memory (the combined output of the first three units) is passed through a tanh function and multiplied by the result of the aforementioned sigmoid.
The result becomes the new short term memory
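A NumPy sketch of this final stage (self-contained; all weights and values are illustrative):

import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
x, short_term, long_term = 1.0, 0.5, 2.1                  # long_term is the new long-term memory from the earlier stages
output_pct = sigmoid(0.5 * short_term + 0.9 * x + 0.2)    # % of the memory that the LSTM outputs
short_term = output_pct * np.tanh(long_term)              # new short-term memory (the unit's output)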
What is an autoencoder?
A neural network that reduces the dimensionality of an input to a latent vector and then learns to reconstruct the input from that latent vector.
Learning is done by minimising the reconstruction loss.
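A minimal sketch of a fully connected autoencoder, assuming PyTorch is available (the layer sizes are illustrative):

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))   # input -> 16-dim latent vector
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784))   # latent vector -> reconstruction
x = torch.rand(32, 784)                                   # a dummy batch of flattened inputs
reconstruction = decoder(encoder(x))
loss = nn.functional.mse_loss(reconstruction, x)          # reconstruction loss to minimise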
What are the advantages of an autoencoder?
-Maps the input into a lower-dimensional bottleneck layer, which reduces the number of parameters (and is learned unsupervised)
-Much like PCA it reduces dimensionality, but because the mapping can be non-linear it reconstructs the output with significantly lower information loss than PCA
What are some uses of autoencoders?
-anomaly detection (compare the reconstruction error of a trained model on new inputs; if the data is anomalous, its reconstruction error will be much higher than on normal data)
-probability distribution estimation
-sending compressed data over networks
-unsupervised clustering
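A sketch of the anomaly-detection use, assuming the trained PyTorch encoder/decoder from the previous card and a threshold picked from errors on normal data (both are assumptions, not from the cards):

import torch
import torch.nn as nn

def reconstruction_error(sample):
    with torch.no_grad():
        return nn.functional.mse_loss(decoder(encoder(sample)), sample).item()

threshold = 0.05                                # illustrative value chosen from errors on normal training data
new_sample = torch.rand(1, 784)                 # stand-in for an incoming data point
is_anomaly = reconstruction_error(new_sample) > threshold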
What are some problems with autoencoders?
Regions of the input space not covered by the training data cannot be reconstructed effectively; the output will be garbage.
What is the difference between an autoencoder and a generative autoencoder?
-A generative autoencoder learns, as its latent space, a probability density function (modelled as a Gaussian) from which the input could have been generated; a plain autoencoder just learns a lower-dimensional representation of the input and reconstructs from it
-Generative autoencoders have 2x the encoder output neurons of an AE, as they output both a mean and a variance (defining the latent Gaussian) for each latent dimension
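A sketch of how the generative (variational) encoder differs, assuming PyTorch (names and sizes are illustrative):

import torch
import torch.nn as nn

hidden = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
to_mean = nn.Linear(64, 16)                     # one head outputs the latent means
to_logvar = nn.Linear(64, 16)                   # a second head outputs the latent (log-)variances: twice the output neurons
h = hidden(torch.rand(32, 784))
mean, logvar = to_mean(h), to_logvar(h)
z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)   # sample a latent vector from the learned Gaussian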