03 - Multi Layer Perceptron Flashcards

1
Q

What are capacity, optimization and generalization?

A

Capacity: the range or scope of the types of functions that the model can approximate
Optimization: minimization of training error
Generalization: the model’s ability to adapt properly to new, previously unseen data, drawn from the same distribution as the one used to create the model.

2
Q

Describe a fully connected NN, including parameter sizes.

A

Usual form for a neural net layer: h = g(Wx + b)
→ h - hidden layer response, W - weight matrix (one weight vector per neuron), x - input vector, b - bias vector

Like in regression, we add a bias to be able to offset the response.
- x → n×1 vector
- h → m×1 vector
- W → m×n matrix
- Wx = a → m×1 (pre-activation response)
- b → m×1
- a + b → m×1
- h = g(a + b)
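
A minimal NumPy sketch of one such layer with the shapes above (the sizes n = 4, m = 3 and the choice of ReLU as g are illustrative assumptions, not from the lecture):

    import numpy as np

    n, m = 4, 3                    # input size n, number of neurons m (illustrative)
    x = np.random.randn(n, 1)      # input vector, n×1
    W = np.random.randn(m, n)      # weight matrix, one weight vector (row) per neuron, m×n
    b = np.zeros((m, 1))           # bias vector, m×1

    a = W @ x + b                  # pre-activation response, m×1
    h = np.maximum(0.0, a)         # h = g(Wx + b) with g = ReLU (assumed), m×1
    print(h.shape)                 # (3, 1)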

3
Q

What are hyperparameters?

A

In neural nets, we train the weights and biases. Everything else that is adjustable is a hyperparameter:

  • Number of layers, propagation type (fully connected, convolutional), activation function, loss function & its parameters, training iterations and batch size
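
As a rough illustration, such settings are often collected in a configuration that stays fixed during training (the names and values below are assumptions, not from the lecture):

    # illustrative hyperparameter configuration (values are assumptions)
    hyperparams = {
        "num_layers": 3,                  # number of layers
        "layer_type": "fully_connected",  # propagation type (fully connected, convolutional, ...)
        "activation": "relu",             # activation function
        "loss": "cross_entropy",          # loss function (and its parameters)
        "epochs": 20,                     # training iterations
        "batch_size": 64,                 # batch size
    }
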
4
Q

Explain the input, hidden and output layers in an NN

A

Input layer
- Vectorized version of input data
- Sometimes it is preprocessed
- Weights connect to the hidden layer
- Weights & biases are floats, not integers, so input needs to be converted to floats

Hidden layer(s)
- There is no general answer to how many layers to use; it depends on the task and should be tuned
- In the example the number of perceptrons expands, but normally a compressing structure is seen
- E.g. we compress pixel values → features → class probabilities
- Whether we see an expanding or a bottleneck topology is strongly application dependent

Output Layer
- Usually no activation function is used for the output layer, as the probabilities for all classes are wanted
- For regression
  - Linear outputs with MSE (Mean Squared Error)
- For classification
  - Softmax units (logistic sigmoid for two classes)
- Many other options for other applications

The output before softmax: o = Wx + b (logits)
Predicted label: ŷ = softmax(o)
The loss is computed via negative log likelihood or cross entropy: NLL/CE(o, y)
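
A small NumPy sketch of this output path, o → softmax → NLL/CE (the logit values and the true class are made up for illustration):

    import numpy as np

    def softmax(o):
        e = np.exp(o - np.max(o))        # subtract max for numerical stability
        return e / np.sum(e)

    o = np.array([2.0, 0.5, -1.0])       # logits o = Wx + b for one sample (illustrative)
    y = 0                                # true class index (assumed)

    y_hat = softmax(o)                   # predicted class probabilities, sums to 1
    loss = -np.log(y_hat[y])             # negative log likelihood / cross entropy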

5
Q

Normalization vs Standardization

A

Standardization centers data around a mean of zero and a standard deviation of one
Normalization scales data to a set range, often [0, 1], by using the minimum and maximum values.
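
A minimal NumPy sketch of both transforms on a toy feature vector (the values are made up):

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 8.0])                # toy feature values (illustrative)

    standardized = (x - x.mean()) / x.std()           # zero mean, unit standard deviation
    normalized = (x - x.min()) / (x.max() - x.min())  # scaled to the range [0, 1]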

6
Q

Typical activation functions

A

Sigmoid:
- sigma'(a) = sigma(a)(1 - sigma(a))

Tanh:
- tanh'(x) = 1 - tanh(x)^2

ReLU (rectified linear unit):
- relu'(x) = step(x)
- most used

The derivative can be expressed in terms of the function's own output, already computed in the forward pass, which makes sigmoid and tanh popular; ReLU is the simplest, where the gradient is just a step function.
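
A minimal NumPy sketch of the three activations and their derivatives, reusing the already-computed function values where possible:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def d_sigmoid(x):
        s = sigmoid(x)
        return s * (1.0 - s)              # sigma'(x) = sigma(x)(1 - sigma(x))

    def d_tanh(x):
        return 1.0 - np.tanh(x) ** 2      # tanh'(x) = 1 - tanh(x)^2

    def relu(x):
        return np.maximum(0.0, x)

    def d_relu(x):
        return (x > 0).astype(float)      # step(x): gradient is 0 or 1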

7
Q

Perceptron → Multilayer Perceptron

A

One-way computational chain, big function block with many free parameters

  • Input processing: h_1 = g_1(W_1x+b_1)
  • Processing of first hidden representation: h_2 = g_2(W_2h_1+b_2)
  • …keep on going for each layer

Earlier, width was used: many neurons but little depth. This did not work well, so now we go deep and reduce the number of neurons per layer.
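
A minimal NumPy sketch of this layer-by-layer chain; the layer sizes and the ReLU/identity choices are illustrative assumptions, not from the lecture:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    # illustrative sizes: input 8 -> hidden 16 -> hidden 8 -> output 3
    sizes = [8, 16, 8, 3]
    rng = np.random.default_rng(0)
    params = [(rng.standard_normal((m, n)) * 0.1, np.zeros((m, 1)))
              for n, m in zip(sizes[:-1], sizes[1:])]

    def forward(x, params):
        h = x
        for i, (W, b) in enumerate(params):
            a = W @ h + b                              # a_k = W_k h_{k-1} + b_k
            h = relu(a) if i < len(params) - 1 else a  # g_k = ReLU, identity on the output
        return h

    x = rng.standard_normal((sizes[0], 1))
    logits = forward(x, params)                        # shape (3, 1)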

8
Q

MLP Maximum Likelihood Recap

A

- So we have a model p, an input x, and we try to predict an output y
- We use maximum likelihood (ML) to estimate the model parameters from a training dataset:
W_{ML} = arg max_W E_{(x,y) ~ p̂_data} log p(y | x)
→ the expectation is the mean over the m training examples
p̂_data is the empirical distribution, a limited sample, since we do not have access to the full population

  • The model should follow this empirical distribution, such that we are able to predict future test cases
  • To get high classification accuracy, we need something to optimize with, like cross entropy or negative log likelihood
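
A small NumPy sketch of this objective in practice: the expectation under p̂_data becomes the mean over the m training examples, i.e. the mean cross entropy / NLL (the logits and labels below are made up for illustration):

    import numpy as np

    def softmax(o):
        e = np.exp(o - o.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    # m = 4 illustrative examples with 3 classes: rows are model logits for each input x
    o = np.array([[ 2.0, 0.1, -1.0],
                  [ 0.3, 1.5, -0.2],
                  [-1.0, 0.0,  2.2],
                  [ 0.5, 0.5,  0.5]])
    y = np.array([0, 1, 2, 0])                          # true labels (assumed)

    p = softmax(o)
    nll = -np.mean(np.log(p[np.arange(len(y)), y]))     # empirical mean over the m examples
    # maximizing the expected log-likelihood == minimizing this mean NLL / cross entropy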