C3 Flashcards

1
Q

types of decision regions

A
  1. network with a single node -> separates the input space with a single line
  2. one-hidden-layer network -> realizes a convex region: each hidden node realizes one of the lines bounding the region
  3. two-hidden-layer network -> realizes unions of convex regions (arbitrary decision regions)
2
Q

how to train multi-layer networks?

A

replace the sign function by its smooth approximation and use the gradient descent algorithm to find weights that minimize the error
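A minimal sketch of this idea: the logistic sigmoid is a smooth, differentiable stand-in for the sign/step function, so its gradient exists everywhere (the function names here are illustrative, not from the slides).

```python
import numpy as np

def step(x):
    # hard threshold: not differentiable at 0, gradient 0 elsewhere
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # smooth approximation of the step function
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # well-defined gradient everywhere -> usable by gradient descent
    s = sigmoid(x)
    return s * (1.0 - s)
```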

3
Q

weight update rule

A

gradient descent method: walk in the direction yielding the maximum decrease of the network error E
Δw_ji = −η ⋅ 𝜕E / 𝜕w_ji
w_ji ← w_ji + Δw_ji
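As a sketch, the update rule for a single linear neuron with squared error E = ½(y − w·x)²; the learning rate and toy data below are illustrative assumptions.

```python
import numpy as np

def gd_step(w, x, y, eta=0.1):
    """One gradient-descent step on E = 0.5 * (w @ x - y)**2."""
    y_hat = w @ x
    grad = (y_hat - y) * x      # dE/dw for this neuron
    return w + (-eta * grad)    # w <- w + Δw, with Δw = -eta * dE/dw

# toy usage: the error shrinks toward zero over repeated steps
w = np.zeros(2)
x, y = np.array([1.0, 2.0]), 1.0
for _ in range(100):
    w = gd_step(w, x, y)
```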

4
Q

backpropagation algorithm

A

the algorithm searches for weight values that minimize the total error of the network

consists of the repeated application of these two phases:
- forward pass: the network is activated on one example; the activations of all hidden nodes and the error of each output-layer neuron are computed
- backward pass: the network error is used to update the weights. Starting at the output layer, the error is propagated backwards through the network, layer by layer, using the generalized delta rule; finally all weights are updated.
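The two phases can be sketched for a one-hidden-layer network with sigmoid units and squared error; shapes, learning rate, and data are made-up assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(W1, W2, x, t, eta=0.5):
    # forward pass: hidden activations, then output
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # backward pass: generalized delta rule, output layer first
    delta_out = (y - t) * y * (1 - y)            # output-layer error term
    delta_hid = (W2.T @ delta_out) * h * (1 - h) # propagated to hidden layer
    # finally, update all weights
    W2 = W2 - eta * np.outer(delta_out, h)
    W1 = W1 - eta * np.outer(delta_hid, x)
    return W1, W2, y
```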

5
Q

3 update strategies

A
  • full batch mode: weights are updated after all the inputs are processed
  • (mini) batch mode: weights are updated after a small random sample of inputs is processed (Stochastic Gradient Descent)
  • online mode: weights are updated after each single input is processed
6
Q

advantages of Stochastic Gradient Descent

A
  • additional randomness helps to avoid local minima
  • huge savings of CPU time
  • easy to execute on GPU cards
7
Q

stopping criteria

A
  • total mean squared error change: backpropagation is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small
  • generalization based criterion: after each epoch the network is tested for
    generalization using a different set of examples (validation set). If the generalization performance is adequate then stop (Early Stopping: avoid overfitting)
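The generalization-based criterion can be sketched as an early-stopping loop; `train_epoch`, `val_error`, and the patience parameter are hypothetical placeholders.

```python
def train_with_early_stopping(train_epoch, val_error, max_epochs=100, patience=5):
    """Stop when validation error has not improved for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()                    # one pass over the training set
        err = val_error()                # error on the validation set
        if err < best:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                        # generalization stopped improving
    return best
```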
8
Q

3 common error functions with corresponding activation functions of the output layer

A
  • linear => SSE (sum of squared errors) (regression)
  • logistic => cross-entropy (binary)
  • softmax => cross-entropy + softmax (multiclass)
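The three pairings above can be written out as functions; a minimal sketch with made-up inputs, not a full training setup.

```python
import numpy as np

def sse(y_hat, y):
    # linear output + sum of squared errors (regression)
    return np.sum((y_hat - y) ** 2)

def binary_cross_entropy(p, y):
    # logistic output p in (0,1) + cross-entropy (binary classification)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def softmax(z):
    # softmax output for multiclass classification
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

def cross_entropy(p, y_onehot):
    # paired with softmax outputs and one-hot targets
    return -np.sum(y_onehot * np.log(p))
```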