Deep Learning Prep Flashcards
(37 cards)
Difference between AI, Machine learning and Deep Learning
AI is the broad field of building machines that mimic intelligent behaviour. ML uses statistical methods to enable machines to improve with experience. Deep learning is a subset of ML inspired by the structure of the brain; it makes computation with multi-layered neural networks feasible.
Is deep learning better than ML?
Deep learning is more useful for working with high-dimensional data, i.e. when we have a large number of inputs or inputs with different types of data.
What is a perceptron and how does it work?
Deep learning borrows the concept of neurons functioning like biological neurons. A perceptron is a linear model used for binary classification. It models a single neuron that takes a set of inputs, each of which has a specific weight.
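A minimal NumPy sketch of a single perceptron's forward pass (the input values, weights, and bias here are arbitrary, chosen only for illustration):

import numpy as np

x = np.array([1.0, 0.5, -0.2])   # inputs to the neuron
w = np.array([0.4, -0.6, 0.9])   # one weight per input
b = 0.1                          # bias term

z = np.dot(w, x) + b             # weighted sum of inputs plus bias
output = 1 if z > 0 else 0       # step activation: the binary classification decision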
What is the role of weights and bias?
Weights decide how much influence each input has on the neuron's output and therefore which neurons will be activated. The bias shifts the activation and is normally treated as another weighted input with a constant input value of 1.
What are the activation functions?
- Linear / Identity
- Unit or Binary Step
- Sigmoid or Logistic
- Tanh
- ReLU
- Softmax
The activation function decides whether a neuron should be activated or not by calculating the weighted sum of the inputs and adding the bias. The purpose of the activation function is to introduce non-linearity into the output of the neuron.
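A short NumPy sketch of the most common of these activation functions (just the definitions, for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0, z)           # zero for negative inputs, identity otherwise

def softmax(z):
    e = np.exp(z - np.max(z))         # shift for numerical stability
    return e / e.sum()                # outputs sum to 1, forming a probability distribution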
Explain the learning process of a perceptron.
4 steps (a sketch follows this list):
- Initializing weights and threshold
- Provide input and calculate the output
- Update the weights
- Repeat steps 2 and 3
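A hedged NumPy sketch of these four steps on a toy AND-gate dataset (the data, learning rate and number of epochs are assumptions made only for illustration):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])    # toy inputs (AND gate)
y = np.array([0, 0, 0, 1])                        # expected outputs

w = np.zeros(2)                                   # step 1: initialize weights
b = 0.0                                           # ... and the bias / threshold
lr = 0.1

for epoch in range(10):                           # step 4: repeat steps 2 and 3
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0  # step 2: provide input, calculate output
        w += lr * (target - pred) * xi            # step 3: update the weights
        b += lr * (target - pred)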
What is the significance of cost or loss function?
A cost (or loss) function is a measure of the accuracy of the neural network with respect to a given training sample and the expected output. It summarizes the performance of the neural network as a whole; in deep learning the goal is to minimize the cost function, and for that we use gradient descent.
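For example, the mean squared error cost over a batch of predictions is a single line of NumPy (the values here are illustrative):

import numpy as np

y_true = np.array([1.0, 0.0, 1.0])        # expected outputs
y_pred = np.array([0.9, 0.2, 0.7])        # network outputs
mse = np.mean((y_true - y_pred) ** 2)     # the quantity gradient descent tries to minimize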
What is gradient descent?
Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, which is defined by the negative of the gradient.
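A minimal sketch of the idea on a one-variable function, f(x) = x^2, whose gradient is 2x (the starting point and learning rate are arbitrary):

x = 5.0                      # arbitrary starting point
lr = 0.1                     # learning rate (step size)
for _ in range(50):
    grad = 2 * x             # gradient of f(x) = x**2
    x = x - lr * grad        # step in the direction of the negative gradient
# x is now very close to 0, the minimum of f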
What are the benefits of mini-batch gradient descent?
It is more efficient than stochastic gradient descent. It generalizes better by finding flat minima. Mini-batches help approximate the gradient of the entire training set, which helps us avoid local minima.
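A hedged sketch of how mini-batches are drawn from the training set each epoch (the data shapes and batch size are assumptions):

import numpy as np

X = np.random.randn(1000, 10)                 # 1000 training samples, 10 features
y = np.random.randn(1000)
batch_size = 32

indices = np.random.permutation(len(X))       # shuffle once per epoch
for start in range(0, len(X), batch_size):
    batch_idx = indices[start:start + batch_size]
    X_batch, y_batch = X[batch_idx], y[batch_idx]
    # compute the gradient on this mini-batch only, then update the weights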
What are the steps for using a gradient descent algorithm?
Steps:
- Initialize random weights and biases
- Pass an input through the network and get values from the output layer
- Calculate the error between the actual value and the predicted value
- Go to each neuron which contributes to the error and then change its respective values to reduce the error
- Reiterate until you find the best weights of the network
Create gradient descent in Python

import theano.tensor as T

# our weights and biases (assumed to be Theano shared variables defined elsewhere)
params = [weights_hidden, weights_output, bias_hidden, bias_output]

# define stochastic gradient descent: step each parameter against its gradient
def sgd(cost, params, lr=0.05):
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append((p, p - g * lr))
    return updates

updates = sgd(cost, params)
What are the shortcomings of a single layer perceptron?
A single-layer perceptron cannot classify data points that are not linearly separable, and it cannot solve complex problems with a large number of parameters.
What is a multi-layer perceptron?
An MLP is a deep artificial neural network composed of more than one perceptron. It has an input layer to receive the signal, an output layer that makes a decision or prediction about the input, and, in between those two, an arbitrary number of hidden layers that are the true computational engine of the MLP. It has three types of nodes (a Keras sketch follows this list):
- Input nodes
- Hidden nodes
- Output nodes
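A minimal Keras sketch of such a network (the layer sizes and the 20-feature input are assumptions, not part of the definition):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),              # input layer: 20 features
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer, the computational engine
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer: the decision / prediction
])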
What exactly is data normalization and why do we need it?
Data normalization is a very important preprocessing step, used to rescale values to fit within a specific range and assure better convergence during backpropagation. In general it boils down to subtracting the mean from each data point and dividing by the standard deviation, so that the data has zero mean and unit variance.
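A two-line NumPy sketch of this standardization (the raw data array is an illustrative assumption):

import numpy as np

X = np.random.randn(100, 5) * 3 + 10            # raw data with arbitrary mean and scale
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance per feature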
What are better, deep NN’s or shallow ones?
At every layer a network learns new and more abstract representations of the input, so deeper networks generally work better than shallow ones.
What is weight initialization in a deep NN?
Bad initialization can prevent a NN from learning, while a good one can speed up convergence and lead to a lower overall error. The rule of thumb is to set the weights close to zero without making them too small.
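As an illustration, two widely used schemes (Glorot/Xavier and He initialization) draw small random weights whose scale depends on the layer size; a NumPy sketch, with the layer sizes assumed for the example:

import numpy as np

n_in, n_out = 256, 128
w_glorot = np.random.randn(n_in, n_out) * np.sqrt(2.0 / (n_in + n_out))  # suits tanh/sigmoid layers
w_he = np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)                # suits ReLU layers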
What is the difference between a feed forward and back propagation NN?
Feed forward: signals travel in one direction from input to output; the connections do not form cycles.
Back propagation: consists of two steps; the first is feeding the values forward, and the second is calculating the error and propagating it back to the earlier layers.
What are hyperparameters? Name a few in any NN
Hyperparameters are the variables which determine the network structure and the variables which determine how the network is trained (e.g. learning rate, number of epochs, batch size).
Explain the different hyperparameters related to the network and to training.
Network Hyperparameters:
- Number of hidden layers: the layers between the input and the output layers
- Network weight initialization: mostly uniform weight distributions are used
- Activation function: are used to introduce non linearity to the models, which allows deep learning models to learn non linear prediction boundaries. Generally the rectifier activation function (ReLu) is the most popular.
Training Hyperparameters (a short sketch follows this list):
- Learning rate: defines how quickly the network updates its parameters. A low learning rate slows down the learning process but converges smoothly; a larger learning rate speeds up learning but may not converge as smoothly. Usually a decaying learning rate is preferred to get the best of both worlds.
- Momentum: helps identify the direction of the next step using knowledge of previous steps. It helps prevent oscillation and is typically between 0.5 and 0.9.
- Number of epochs: the number of times the network is shown the training data. Too many epochs can lead to overfitting.
- Batch size: the number of samples the network processes before its parameters are updated, usually 16, 32 or 64 (an arbitrary choice).
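A hedged Keras sketch showing where these training hyperparameters appear (the model, X_train and y_train are assumed to exist already; the values are just examples in the ranges mentioned above):

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # learning rate and momentum
model.compile(optimizer=optimizer, loss="binary_crossentropy")
model.fit(X_train, y_train, epochs=20, batch_size=32)                  # number of epochs and batch size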
What is dropout?
Dropout is a regularization technique used to avoid overfitting, which increases validation accuracy and thus the generalizing power of the network.
Generally we use a small dropout value of 20%-50% of neurons. A value that is too low has minimal effect, and a value that is too high results in under-learning by the network.
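A minimal Keras sketch of a dropout layer between two dense layers (the layer sizes and the 30% rate are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),                    # randomly drops 30% of activations during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])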
While training a NN you notice that the loss does not decrease in the first few epochs. What could be the reason?
- The learning rate is too low
- Regularization parameter is too high
- Stuck at local minima
Name a few deep learning frameworks
- TensorFlow
- PyTorch
- Keras
- CNTK
- Caffe
- Chainer
What are tensors?
Tensors are the de facto representation of data in deep learning. They are multidimensional arrays that allow you to represent data with higher dimensions. In deep learning you generally deal with high-dimensional data sets, where the dimensions refer to the different features present in the data set.
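For example, a small batch of RGB images can be represented as a rank-4 tensor (the shapes are chosen purely for illustration):

import numpy as np

images = np.zeros((8, 32, 32, 3))   # 8 images, 32x32 pixels, 3 color channels
print(images.ndim, images.shape)    # 4 dimensions, shape (8, 32, 32, 3)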
List a few advantages of TensorFlow
- It has platform flexibility
- It is easily trainable on CPU as well as GPU for distributed computing
- TensorFlow has automatic differentiation capabilities (see the sketch after this list)
- It has advanced support for threads and asynchronous computation
- It is customizable and open source
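As an illustration of the automatic differentiation point above, tf.GradientTape computes gradients without any hand-derived formulas (the function y = x**2 is an arbitrary example):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2                      # any differentiable computation
grad = tape.gradient(y, x)          # TensorFlow computes dy/dx = 2x = 6.0 automatically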