Deep neural networks Flashcards

1
Q

What are the applications of deep neural networks?

A
  • Medical diagnosis, stock prediction, self-driving cars, computer vision
2
Q

What are neural networks?

A
  • They vaguely mimic the process by which the brain operates: neurons that fire bits of information.
  • Simple explanation: given some data in the form of red and blue points on the floor, the neural network finds the line that best separates them.
3
Q

boundary line general equation 2d

A

w1x1 + w2x2 + b = 0

(w1, w2 are the weights and b is the bias; points on one side of the line give a positive value, points on the other side a negative value)

4
Q

What is a perceptron?

A
  • Building block of a neural network
  • Encoding of the boundary equation into a small graph
  • Small input nodes x1, …, xn, 1 and their edges w1, …, wn, b
  • A linear function node then computes the sum of each input node multiplied by the weight on its edge.
  • The step function then returns 1 if the linear function's result is positive (or zero) and 0 if it is negative.
  • Perceptrons can act as logical operators (AND, OR, NOT), as in the sketch below.
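A minimal sketch in Python of the perceptron described above (numpy and the AND weights are my own illustrative choices, not from the card):

    import numpy as np

    def perceptron(x, w, b):
        """Weighted sum of the inputs plus the bias, then the step function."""
        linear = np.dot(w, x) + b         # w1*x1 + ... + wn*xn + b
        return 1 if linear >= 0 else 0    # step function: 1 if non-negative

    # AND as a perceptron (one possible choice of weights and bias)
    w, b = np.array([1.0, 1.0]), -1.5
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(np.array(x), w, b))   # prints 1 only for (1, 1)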
5
Q

Why are neural networks called that?

A
  • Because a perceptron works similarly to a neuron: it gets inputs and, based on those inputs, returns an output.
  • A neuron gets electrical inputs through its dendrites and then decides whether or not to emit an electrical pulse
6
Q

Perceptron trick?

A
  • To move the boundary line toward a misclassified point, adjust each weight by that point's coordinates (times a learning rate), treating the bias input as 1.
  • Example: for the point (4, 5) with bias input 1, update w1 by 4, w2 by 5, and b by 1 (each scaled by the learning rate): subtract if the point was wrongly classified as positive, add if wrongly classified as negative.
  • Applied repeatedly to every misclassified point, this becomes the perceptron algorithm (next card).

7
Q

Perceptron algorithm

A
  1. Start with random weights: w1, …, wn, b
  2. For every misclassified point (x1, …, xn), if the prediction = 0 (it should be 1):
    - For i = 1 … n, change wi to wi + a·xi
    - Change b to b + a
  3. For every misclassified point (x1, …, xn), if the prediction = 1 (it should be 0):
    - For i = 1 … n, change wi to wi - a·xi
    - Change b to b - a

(a is the learning rate; a numpy sketch follows)
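A numpy sketch of the algorithm; the toy data, learning rate, and number of passes are assumptions for illustration:

    import numpy as np

    def perceptron_step(X, y, w, b, learn_rate):
        """One pass over the data, applying the perceptron trick to misclassified points."""
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b >= 0 else 0
            if yi == 1 and pred == 0:     # misclassified as 0: add
                w, b = w + learn_rate * xi, b + learn_rate
            elif yi == 0 and pred == 1:   # misclassified as 1: subtract
                w, b = w - learn_rate * xi, b - learn_rate
        return w, b

    X = np.array([[1.0, 1.0], [2.0, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
    y = np.array([1, 1, 0, 0])
    w, b = np.random.rand(2), 0.0         # 1. start with random weights
    for _ in range(25):                   # repeat steps 2-3
        w, b = perceptron_step(X, y, w, b, learn_rate=0.1)
    print(w, b)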
8
Q

What is an error function?

A
  • Tells us how far we are from the solution
  • For neural nets the result should be continuous, not discrete
  • The error function should assign a penalty value to each point, with a larger penalty for incorrectly classified points
  • We need to move from discrete predictions to continuous ones
9
Q

What is the new perceptron?

A
  • Replace the step function with the sigmoid function:
  • sigmoid(x) = 1 / (1 + e^(-x))
  • The sigmoid gives a probability close to 1 for a big positive number, exactly 0.5 for 0, and a very small probability for negative numbers
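A quick check of the card's three claims (assuming numpy):

    import numpy as np

    def sigmoid(x):
        """Squashes any score into a probability between 0 and 1."""
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(5.0))    # ~0.993: big number -> probability near 1
    print(sigmoid(0.0))    # 0.5: zero -> exactly 0.5
    print(sigmoid(-5.0))   # ~0.007: negative number -> very small probability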
10
Q

How to do multi-class classification?

A
  • We use the softmax function.
11
Q

What is the softmax function?

A
  • It translates scores from a function (a linear function, for example) into probabilities.
  • You have scores Z1, …, Zn
  • P(class i) = e^Zi / (e^Z1 + … + e^Zn)
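The formula in numpy; the scores are made up, and subtracting the max is a standard numerical-stability trick not mentioned in the card:

    import numpy as np

    def softmax(z):
        """Turn scores Z1..Zn into probabilities that sum to 1."""
        exp_z = np.exp(z - np.max(z))   # stability trick; the ratios are unchanged
        return exp_z / exp_z.sum()

    scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores from a linear function
    print(softmax(scores))              # -> [0.659 0.242 0.099]
    print(softmax(scores).sum())        # -> 1.0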
12
Q

What is cross-entropy?

A
  • It is used for calculating the maximum likelihood of a model: the negative log of the product of the event probabilities, e.g. -ln(p1) - ln(p2) for two events.
  • Goal is to minimize the cross-entropy; better if the number is small
    Formula (yi = 1 if present in box i):
  • CE = -sum[ yi·ln(pi) + (1 - yi)·ln(1 - pi) ]
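The formula in numpy (the label and probability values are made up to show that a better model scores lower):

    import numpy as np

    def cross_entropy(y, p):
        """Binary cross-entropy: -sum[y*ln(p) + (1-y)*ln(1-p)]. Smaller is better."""
        y, p = np.asarray(y, float), np.asarray(p, float)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    print(cross_entropy([1, 1, 0], [0.9, 0.8, 0.1]))  # ~0.43: confident and correct
    print(cross_entropy([1, 1, 0], [0.6, 0.5, 0.4]))  # ~1.71: unsure, higher error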
13
Q

Multi-class cross-entropy?

A
  • CE = -sum(i) sum(j) yij·ln(pij), summing over points i and classes j
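The same idea in numpy, with made-up one-hot labels and predicted probabilities:

    import numpy as np

    def multiclass_cross_entropy(Y, P):
        """-sum over points i and classes j of Y[i][j] * ln(P[i][j])."""
        return -np.sum(np.asarray(Y) * np.log(np.asarray(P)))

    Y = [[1, 0, 0], [0, 1, 0]]              # one-hot: the true class of each point
    P = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]]  # predicted probability of each class
    print(multiclass_cross_entropy(Y, P))   # ~0.87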
14
Q

Error function

A
  • E = -(1/m)·sum(i in m)[ yi·ln(ŷi) + (1 - yi)·ln(1 - ŷi) ]

- m is the number of points; ŷi is the predicted probability for point i

15
Q

Why and how to calculate the gradient descent?

A
  • The negative of the gradient of the error, -∇E, is the direction to move in to decrease the error the most
  • The gradient is the vector of partial derivatives of the error function with respect to each weight and the bias
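A sketch of one gradient-descent step for the sigmoid perceptron; for that model with the cross-entropy error, the partial derivatives reduce to -(y - ŷ)·x (the data and learning rate are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gradient_descent_step(X, y, w, b, learn_rate=0.1):
        """Move the weights against the gradient of the error."""
        y_hat = sigmoid(X @ w + b)                       # current predictions
        w = w + learn_rate * (y - y_hat) @ X / len(y)    # -gradient w.r.t. each weight
        b = b + learn_rate * np.mean(y - y_hat)          # -gradient w.r.t. the bias
        return w, b

    X = np.array([[1.0, 1.0], [2.0, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, 0.0, 0.0])
    w, b = np.zeros(2), 0.0
    for _ in range(100):
        w, b = gradient_descent_step(X, y, w, b)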
16
Q

How to combine two perceptrons?

A
  • Calculate the probability of the first model
  • Calculate the probability of the second model
  • Add the two probabilities and apply the sigmoid function (sketched below)
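The card's recipe as a sketch (all weights here are made up; in a real network the combination would also have its own trainable weights and bias):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([0.4, 0.6])                     # one input point
    p1 = sigmoid(np.dot([5.0, -2.0], x) - 0.8)   # probability from the first model
    p2 = sigmoid(np.dot([-1.0, 4.0], x) + 0.1)   # probability from the second model
    print(sigmoid(p1 + p2))                      # add them, then apply the sigmoid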
17
Q

Layers in a neural network?

A
  • Input layer: The inputs of the model
  • Hidden layer: The first models
  • Output layer: The combination of the models
18
Q

How to do multiclass classification with a neural network?

A
  • Add more nodes to the output layer so that it gives a probability for each class (with a softmax activation)
19
Q

What does it mean to train a neural network?

A
  • It means to estimate the most appropriate weights
20
Q

What is feedforward?

A
  • Feedforward is the process neural networks use to turn the input into an output
21
Q

What is backpropagation?

A

  1. Do a feedforward operation.
  2. Compare the output of the model with the desired output.
  3. Calculate the error.
  4. Run the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
  5. Use this to update the weights, and get a better model.
  6. Continue until we have a model that is good.
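The whole loop as a numpy sketch for a tiny two-layer network; the sizes, data, and learning rate are assumptions, and ŷ - y is the output error term for sigmoid with cross-entropy:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer: 2 inputs -> 3 nodes
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer: 3 -> 1
    x, y = np.array([0.5, -0.2]), np.array([1.0])
    learn_rate = 0.5

    for _ in range(100):
        # 1. feedforward: turn the input into an output
        h = sigmoid(W1 @ x + b1)
        y_hat = sigmoid(W2 @ h + b2)
        # 2-3. compare with the desired output and get the error term
        delta_out = y_hat - y
        # 4. backpropagation: spread the error back to the hidden layer
        delta_hid = (W2.T @ delta_out) * h * (1 - h)
        # 5. update the weights to get a better model
        W2 -= learn_rate * np.outer(delta_out, h)
        b2 -= learn_rate * delta_out
        W1 -= learn_rate * np.outer(delta_hid, x)
        b1 -= learn_rate * delta_hid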

22
Q

What is Keras?

A
  • It is a library for coding neural networks
  • The general idea: first load the data, then define the network, and finally train the network.
23
Q

Simple neural net with Keras

A

    from keras.models import Sequential
    from keras.layers.core import Dense, Activation

    #Create the Sequential model
    model = Sequential()
    #1st Layer - a fully connected layer of 128 nodes; input_dim=32 declares the 32-node input layer
    model.add(Dense(128, input_dim=32))
    #2nd Layer - Add a softmax activation layer
    model.add(Activation('softmax'))
    #3rd Layer - Add a fully connected layer of 10 nodes
    model.add(Dense(10))
    #4th Layer - Add a sigmoid activation layer
    model.add(Activation('sigmoid'))

    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
    model.summary()

    #X and y are the previously loaded training features and labels
    model.fit(X, y, nb_epoch=1000, verbose=0)
    model.evaluate(X, y)

24
Q

What is an epoch?

A

It is a step in the direction opposite the gradient to decrease the error: one full feedforward and backpropagation pass over the training data.

25
Q

Batch and stochastic gradient descent

A
  • To recalculate and improve the weights with feedforward and backpropagation, split the data into batches and train on those batches. This is more efficient.
  • Instead of always using all the data for each step (batch gradient descent), you take each step with a small batch (stochastic gradient descent).
26
Q

Learning rate rule of thumb

A
  • If your model is not working, decrease the learning rate
27
Q

Learning rate decay?

A
  • If you are approaching the best solution, decrease the learning rate.
  • If the error surface is steep, take long steps
  • If it is flat (plain), take short steps
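One common decay schedule as a sketch (the formula and the numbers are my assumptions, not from the card):

    # Divide the starting rate by a factor that grows with the epoch
    initial_rate, decay = 0.1, 0.5
    for epoch in range(5):
        learning_rate = initial_rate / (1 + decay * epoch)
        print(epoch, learning_rate)   # 0.1, 0.067, 0.05, 0.04, 0.033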
28
Q

Why to test a model?

A
  • Train with one set of data and then test with a different set of data; this allows you to check that the model is not overfitting
29
Q

How to determine the number of epochs?

A
  • With a model complexity graph. A good number of epochs is one where both the training error and the testing error are low
30
Q

What is regularization?

A

It is a way to avoid model overfitting: you penalize high weight coefficients by adding lambda·(w1^2 + … + wn^2) to the error function (L2 regularization).
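The penalty term in numpy (lambda and the weights are illustrative):

    import numpy as np

    def l2_penalty(weights, lam=0.01):
        """The term added to the error: lambda * (w1^2 + ... + wn^2)."""
        return lam * np.sum(np.square(weights))

    print(l2_penalty(np.array([0.1, -0.2, 0.1])))   # 0.0006: small weights, tiny penalty
    print(l2_penalty(np.array([3.0, -4.0, 2.0])))   # 0.29: large weights, large penalty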

31
Q

what is dropout?

A
  • Randomly turning off (dropping) some nodes during each training pass, so no single part of the network dominates the training.
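A sketch of a dropout mask in numpy (the drop probability and activations are made up; dividing by 1 - p is the common "inverted dropout" scaling, not from the card):

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.2                                # probability of dropping a node
    h = np.array([0.5, 0.8, 0.1, 0.9])     # hypothetical activations of a layer
    mask = rng.random(h.shape) > p         # keep each node with probability 1 - p
    h_dropped = h * mask / (1 - p)         # dropped nodes contribute nothing this pass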
32
Q

activation functions?

A
  • Sigmoid
  • Hyperbolic tangent (tanh): similar to sigmoid but between 1 and -1. It allows bigger steps.
  • ReLU: if x >= 0 then x, else 0; i.e., the max between x and 0
  • The last activation function still needs to be a sigmoid, because the output needs to be a probability
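Quick numpy versions of tanh and ReLU (sigmoid is sketched under card 9):

    import numpy as np

    def tanh(x):
        return np.tanh(x)            # output between -1 and 1

    def relu(x):
        return np.maximum(x, 0)      # the max between x and 0

    x = np.array([-2.0, 0.0, 2.0])
    print(tanh(x))                   # [-0.964  0.     0.964]
    print(relu(x))                   # [0. 0. 2.]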
33
Q

how to avoid local minimum?

A
  • Random restarts: rerun training from different places, i.e., change the initial weight values in each restart, and keep the best result
34
Q

What is momentum?

A
  • It is a different way to estimate the step.

- The step is a linear combination of the current gradient step and the past steps, with older steps weighted less
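A sketch of the momentum update (the learning rate, beta, and gradients are illustrative; beta is the factor that weights previous steps):

    import numpy as np

    def momentum_step(gradient, velocity, learn_rate=0.1, beta=0.9):
        """New step = beta * previous step - learning rate * current gradient."""
        return beta * velocity - learn_rate * gradient

    v = np.zeros(2)
    grads = [np.array([1.0, 0.5]), np.array([0.8, 0.4]), np.array([0.0, 0.0])]
    for g in grads:
        v = momentum_step(g, v)
        print(v)   # the step keeps moving even when the current gradient is zero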

35
Q

Keras optimizers

A

SGD
This is Stochastic Gradient Descent. It uses the following parameters: Learning rate, Momentum, Nesterov Momentum

Adam
Adam (Adaptive Moment Estimation) uses a more complicated exponential decay that consists of not just considering the average (first moment), but also the variance (second moment) of the previous steps.

RMSProp
RMSProp (RMS stands for Root Mean Square) decreases the learning rate by dividing it by an exponentially decaying average of squared gradients.
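How these would be passed to compile in old-style Keras, continuing the model from card 23 (the parameter values are illustrative):

    from keras.optimizers import SGD, Adam, RMSprop

    sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)   # learning rate, momentum, Nesterov
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    # or: optimizer=Adam()  /  optimizer=RMSprop()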

36
Q

Good tips to avoid overfitting

A
  • Increase the number of batches (i.e., use smaller batches). The number of batches is the number of groups the data is divided into; more batches means more iterations, but each one is quicker.
  • Use dropout