Deep neural networks Flashcards

1
Q

What are the applications of deep neural networks?

A
  • Medical diagnosis, stock prediction, self-driving cars, computer vision
2
Q

What are neural networks?

A
  • They vaguely mimic the process by which the brain operates: neurons that fire bits of information.
  • Simple explanation: given some data in the form of red and blue points on the floor, the neural network finds the line that best separates them.
3
Q

boundary line general equation 2d

A

w1x1 + w2x2 + b = 0

(w1, w2 are the weights and b is the bias; points on one side of the line give a positive value, points on the other side a negative value)

4
Q

What is a perceptron?

A
  • Building block of a neural network
  • Encoding of the boundary equation into a small graph
  • Small input nodes x1, …, xn, 1 and their edges w1, …, wn, b
  • A linear function node then computes the sum of each input node multiplied by the weight on its edge.
  • The step function then returns 1 if the linear function's result is positive (or zero) and 0 if it is negative.
  • Perceptrons can act as logical operators (AND, OR, NOT), as in the sketch below.
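A minimal sketch in Python of the perceptron described above (numpy and the AND weights are my own illustrative choices, not from the card):

    import numpy as np

    def perceptron(x, w, b):
        """Weighted sum of the inputs plus the bias, then the step function."""
        linear = np.dot(w, x) + b         # w1*x1 + ... + wn*xn + b
        return 1 if linear >= 0 else 0    # step function: 1 if non-negative

    # AND as a perceptron (one possible choice of weights and bias)
    w, b = np.array([1.0, 1.0]), -1.5
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(np.array(x), w, b))   # prints 1 only for (1, 1)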
5
Q

Why are neural networks called that?

A
  • Because a perceptron works similarly to a neuron: it gets inputs and, based on those inputs, returns an output.
  • A neuron gets electrical inputs through its dendrites and then decides whether or not to emit an electrical pulse
6
Q

Perceptron trick?

A
  • To move the boundary line toward a misclassified point, adjust each weight by that point's coordinates (times a learning rate), treating the bias input as 1.
  • Example: for the point (4, 5) with bias input 1, update w1 by 4, w2 by 5, and b by 1 (each scaled by the learning rate): subtract if the point was wrongly classified as positive, add if wrongly classified as negative.
  • Applied repeatedly to every misclassified point, this becomes the perceptron algorithm (next card).

7
Q

Perceptron algorithm

A
  1. Start with random weights: w1, …, wn, b
  2. For every misclassified point (x1, …, xn), if the prediction = 0 (it should be 1):
    - For i = 1 … n, change wi to wi + a·xi
    - Change b to b + a
  3. For every misclassified point (x1, …, xn), if the prediction = 1 (it should be 0):
    - For i = 1 … n, change wi to wi - a·xi
    - Change b to b - a

(a is the learning rate; a numpy sketch follows)
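A numpy sketch of the algorithm; the toy data, learning rate, and number of passes are assumptions for illustration:

    import numpy as np

    def perceptron_step(X, y, w, b, learn_rate):
        """One pass over the data, applying the perceptron trick to misclassified points."""
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b >= 0 else 0
            if yi == 1 and pred == 0:     # misclassified as 0: add
                w, b = w + learn_rate * xi, b + learn_rate
            elif yi == 0 and pred == 1:   # misclassified as 1: subtract
                w, b = w - learn_rate * xi, b - learn_rate
        return w, b

    X = np.array([[1.0, 1.0], [2.0, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
    y = np.array([1, 1, 0, 0])
    w, b = np.random.rand(2), 0.0         # 1. start with random weights
    for _ in range(25):                   # repeat steps 2-3
        w, b = perceptron_step(X, y, w, b, learn_rate=0.1)
    print(w, b)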
8
Q

What is an error function?

A
  • Tells us how far we are from the solution
  • For neural nets the result should be continuous, not discrete
  • The error function should assign a penalty value to each point, with a larger penalty for incorrectly classified points
  • We need to move from discrete predictions to continuous ones
9
Q

What is the new perceptron?

A
  • Replace the step function with the sigmoid function:
  • sigmoid(x) = 1 / (1 + e^(-x))
  • The sigmoid gives a probability close to 1 for a big positive number, exactly 0.5 for 0, and a very small probability for negative numbers
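A quick check of the card's three claims (assuming numpy):

    import numpy as np

    def sigmoid(x):
        """Squashes any score into a probability between 0 and 1."""
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(5.0))    # ~0.993: big number -> probability near 1
    print(sigmoid(0.0))    # 0.5: zero -> exactly 0.5
    print(sigmoid(-5.0))   # ~0.007: negative number -> very small probability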
10
Q

How to do multi-class classification?

A
  • We use the softmax function.
11
Q

What is the softmax function?

A
  • It translates scores from a function (a linear function, for example) into probabilities.
  • You have scores Z1, …, Zn
  • P(class i) = e^Zi / (e^Z1 + … + e^Zn)
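The formula in numpy; the scores are made up, and subtracting the max is a standard numerical-stability trick not mentioned in the card:

    import numpy as np

    def softmax(z):
        """Turn scores Z1..Zn into probabilities that sum to 1."""
        exp_z = np.exp(z - np.max(z))   # stability trick; the ratios are unchanged
        return exp_z / exp_z.sum()

    scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores from a linear function
    print(softmax(scores))              # -> [0.659 0.242 0.099]
    print(softmax(scores).sum())        # -> 1.0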
12
Q

What is cross-entropy?

A
  • It is used for calculating the maximum likelihood of a model: the negative log of the product of the event probabilities, e.g. -ln(p1) - ln(p2) for two events.
  • Goal is to minimize the cross-entropy; better if the number is small
    Formula (yi = 1 if present in box i):
  • CE = -sum[ yi·ln(pi) + (1 - yi)·ln(1 - pi) ]
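The formula in numpy (the label and probability values are made up to show that a better model scores lower):

    import numpy as np

    def cross_entropy(y, p):
        """Binary cross-entropy: -sum[y*ln(p) + (1-y)*ln(1-p)]. Smaller is better."""
        y, p = np.asarray(y, float), np.asarray(p, float)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    print(cross_entropy([1, 1, 0], [0.9, 0.8, 0.1]))  # ~0.43: confident and correct
    print(cross_entropy([1, 1, 0], [0.6, 0.5, 0.4]))  # ~1.71: unsure, higher error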
13
Q

Multi-class cross-entropy?

A
  • CE = -sum(i) sum(j) yij·ln(pij), summing over points i and classes j
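The same idea in numpy, with made-up one-hot labels and predicted probabilities:

    import numpy as np

    def multiclass_cross_entropy(Y, P):
        """-sum over points i and classes j of Y[i][j] * ln(P[i][j])."""
        return -np.sum(np.asarray(Y) * np.log(np.asarray(P)))

    Y = [[1, 0, 0], [0, 1, 0]]              # one-hot: the true class of each point
    P = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]]  # predicted probability of each class
    print(multiclass_cross_entropy(Y, P))   # ~0.87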
14
Q

Error function

A
  • E = -(1/m)·sum(i in m)[ yi·ln(ŷi) + (1 - yi)·ln(1 - ŷi) ]

- m is the number of points; ŷi is the predicted probability for point i

15
Q

Why and how to calculate the gradient descent?

A
  • The negative of the gradient of the error, -∇E, is the direction to move in to decrease the error the most
  • The gradient is the vector of partial derivatives of the error function with respect to each weight and the bias
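A sketch of one gradient-descent step for the sigmoid perceptron; for that model with the cross-entropy error, the partial derivatives reduce to -(y - ŷ)·x (the data and learning rate are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gradient_descent_step(X, y, w, b, learn_rate=0.1):
        """Move the weights against the gradient of the error."""
        y_hat = sigmoid(X @ w + b)                       # current predictions
        w = w + learn_rate * (y - y_hat) @ X / len(y)    # -gradient w.r.t. each weight
        b = b + learn_rate * np.mean(y - y_hat)          # -gradient w.r.t. the bias
        return w, b

    X = np.array([[1.0, 1.0], [2.0, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, 0.0, 0.0])
    w, b = np.zeros(2), 0.0
    for _ in range(100):
        w, b = gradient_descent_step(X, y, w, b)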
16
Q

How to combine two perceptrons?

A
  • Calculate the probability of the first model
  • Calculate the probability of the second model
  • Add the two probabilities and apply the sigmoid function (sketched below)
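The card's recipe as a sketch (all weights here are made up; in a real network the combination would also have its own trainable weights and bias):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([0.4, 0.6])                     # one input point
    p1 = sigmoid(np.dot([5.0, -2.0], x) - 0.8)   # probability from the first model
    p2 = sigmoid(np.dot([-1.0, 4.0], x) + 0.1)   # probability from the second model
    print(sigmoid(p1 + p2))                      # add them, then apply the sigmoid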
17
Q

Layers in a neural network?

A
  • Input layer: The inputs of the model
  • Hidden layer: The first models
  • Output layer: The combination of the models
18
Q

How to do multiclass classification with a neural network?

A
  • Add more nodes to the output layer so that it gives a probability for each class (with a softmax activation)
19
Q

What does it mean to train a neural network?

A
  • It means to estimate the most appropriate weights
20
Q

What is feedforward?

A
  • Feedforward is the process neural networks use to turn the input into an output
21
Q

What is backpropagation?

A

  1. Do a feedforward operation.
  2. Compare the output of the model with the desired output.
  3. Calculate the error.
  4. Run the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
  5. Use this to update the weights, and get a better model.
  6. Continue until we have a model that is good.
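The whole loop as a numpy sketch for a tiny two-layer network; the sizes, data, and learning rate are assumptions, and ŷ - y is the output error term for sigmoid with cross-entropy:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer: 2 inputs -> 3 nodes
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer: 3 -> 1
    x, y = np.array([0.5, -0.2]), np.array([1.0])
    learn_rate = 0.5

    for _ in range(100):
        # 1. feedforward: turn the input into an output
        h = sigmoid(W1 @ x + b1)
        y_hat = sigmoid(W2 @ h + b2)
        # 2-3. compare with the desired output and get the error term
        delta_out = y_hat - y
        # 4. backpropagation: spread the error back to the hidden layer
        delta_hid = (W2.T @ delta_out) * h * (1 - h)
        # 5. update the weights to get a better model
        W2 -= learn_rate * np.outer(delta_out, h)
        b2 -= learn_rate * delta_out
        W1 -= learn_rate * np.outer(delta_hid, x)
        b1 -= learn_rate * delta_hid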

22
Q

What is Keras?

A
  • It is a library for coding neural networks
  • The general idea: first load the data, then define the network, and finally train the network.
23
Q

Simple neural net with Keras

A

    from keras.models import Sequential
    from keras.layers.core import Dense, Activation

    #Create the Sequential model
    model = Sequential()
    #1st Layer - a fully connected layer of 128 nodes; input_dim=32 declares the 32-node input layer
    model.add(Dense(128, input_dim=32))
    #2nd Layer - Add a softmax activation layer
    model.add(Activation('softmax'))
    #3rd Layer - Add a fully connected layer of 10 nodes
    model.add(Dense(10))
    #4th Layer - Add a sigmoid activation layer
    model.add(Activation('sigmoid'))

    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
    model.summary()

    #X and y are the previously loaded training features and labels
    model.fit(X, y, nb_epoch=1000, verbose=0)
    model.evaluate(X, y)

24
Q

What is an epoch?

A

It is a step in the direction opposite the gradient to decrease the error: one full feedforward and backpropagation pass over the training data.

25
Q

Batch and stochastic gradient descent

A
  • To recalculate and improve the weights with feedforward and backpropagation, split the data into batches and train on those batches. This is more efficient.
  • Instead of always using all the data for each step (batch gradient descent), you take each step with a small batch (stochastic gradient descent).
26
Q

Learning rate rule of thumb

A
  • If your model is not working, decrease the learning rate
27
Q

Learning rate decay?

A
  • If you are approaching the best solution, decrease the learning rate.
  • If the error surface is steep, take long steps
  • If it is flat (plain), take short steps
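One common decay schedule as a sketch (the formula and the numbers are my assumptions, not from the card):

    # Divide the starting rate by a factor that grows with the epoch
    initial_rate, decay = 0.1, 0.5
    for epoch in range(5):
        learning_rate = initial_rate / (1 + decay * epoch)
        print(epoch, learning_rate)   # 0.1, 0.067, 0.05, 0.04, 0.033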
28
Q

Why to test a model?

A
  • Train with one set of data and then test with a different set of data; this allows you to check that the model is not overfitting
29
Q

How to determine the number of epochs?

A
  • With a model complexity graph. A good number of epochs is one where both the training error and the testing error are low
30
Q

What is regularization?

A

It is a way to avoid model overfitting: you penalize high weight coefficients by adding lambda·(w1^2 + … + wn^2) to the error function (L2 regularization).
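The penalty term in numpy (lambda and the weights are illustrative):

    import numpy as np

    def l2_penalty(weights, lam=0.01):
        """The term added to the error: lambda * (w1^2 + ... + wn^2)."""
        return lam * np.sum(np.square(weights))

    print(l2_penalty(np.array([0.1, -0.2, 0.1])))   # 0.0006: small weights, tiny penalty
    print(l2_penalty(np.array([3.0, -4.0, 2.0])))   # 0.29: large weights, large penalty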

31
Q

what is dropout?

A
  • Randomly turning off (dropping) some nodes during each training pass, so no single part of the network dominates the training.
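A sketch of a dropout mask in numpy (the drop probability and activations are made up; dividing by 1 - p is the common "inverted dropout" scaling, not from the card):

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.2                                # probability of dropping a node
    h = np.array([0.5, 0.8, 0.1, 0.9])     # hypothetical activations of a layer
    mask = rng.random(h.shape) > p         # keep each node with probability 1 - p
    h_dropped = h * mask / (1 - p)         # dropped nodes contribute nothing this pass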
32
Q

activation functions?

A
  • Sigmoid
  • Hyperbolic tangent (tanh): similar to sigmoid but between 1 and -1. It allows bigger steps.
  • ReLU: if x >= 0 then x, else 0; i.e., the max between x and 0
  • The last activation function still needs to be a sigmoid, because the output needs to be a probability
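Quick numpy versions of tanh and ReLU (sigmoid is sketched under card 9):

    import numpy as np

    def tanh(x):
        return np.tanh(x)            # output between -1 and 1

    def relu(x):
        return np.maximum(x, 0)      # the max between x and 0

    x = np.array([-2.0, 0.0, 2.0])
    print(tanh(x))                   # [-0.964  0.     0.964]
    print(relu(x))                   # [0. 0. 2.]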
33
Q

how to avoid local minimum?

A
  • Random restarts: rerun training from different places, i.e., change the initial weight values in each restart, and keep the best result
34
Q

What is momentum?

A
  • It is a different way to estimate the step.

- The step is a linear combination of the current gradient step and the past steps, with older steps weighted less
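A sketch of the momentum update (the learning rate, beta, and gradients are illustrative; beta is the factor that weights previous steps):

    import numpy as np

    def momentum_step(gradient, velocity, learn_rate=0.1, beta=0.9):
        """New step = beta * previous step - learning rate * current gradient."""
        return beta * velocity - learn_rate * gradient

    v = np.zeros(2)
    grads = [np.array([1.0, 0.5]), np.array([0.8, 0.4]), np.array([0.0, 0.0])]
    for g in grads:
        v = momentum_step(g, v)
        print(v)   # the step keeps moving even when the current gradient is zero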

35
Q

Keras optimizers

A

SGD
This is Stochastic Gradient Descent. It uses the following parameters: Learning rate, Momentum, Nesterov Momentum

Adam
Adam (Adaptive Moment Estimation) uses a more complicated exponential decay that consists of not just considering the average (first moment), but also the variance (second moment) of the previous steps.

RMSProp
RMSProp (RMS stands for Root Mean Square) decreases the learning rate by dividing it by an exponentially decaying average of squared gradients.
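How these would be passed to compile in old-style Keras, continuing the model from card 23 (the parameter values are illustrative):

    from keras.optimizers import SGD, Adam, RMSprop

    sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)   # learning rate, momentum, Nesterov
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    # or: optimizer=Adam()  /  optimizer=RMSprop()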

36
Q

Good tips to avoid overfitting

A
  • Increase the number of batches (i.e., use smaller batches). The number of batches is the number of groups the data is divided into; more batches means more iterations, but each one is quicker.
  • Use dropout