Neural Networks Flashcards
What determines the output of a neuron?
It is the weighted sum of its inputs, plus a bias, passed through an activation function
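A minimal sketch in Python/numpy, assuming a sigmoid activation (values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # Weighted sum of the inputs, plus the bias, through the activation
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])     # inputs
w = np.array([0.1, 0.4, -0.2])     # weights
print(neuron_output(x, w, b=0.3))  # ~0.39
```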
Why are activation functions important?
They allow the network to learn non-linearities; without them, a stack of layers collapses into a single linear transformation
What is a perceptron?
A special type of ANN with:
- Real-valued inputs
- Binary output
- Threshold activation function
How are perceptrons trained?
Adjust each weight in proportion to the error, w_i <- w_i + eta * (t - o) * x_i: increase the weights when the target class is higher than the perceptron's output, decrease them when it is lower. The threshold is learned as a bias weight in the same way, as sketched below.
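A sketch of the rule (the learning rate eta and the AND example are assumptions, not from the card):

```python
import numpy as np

def train_perceptron(X, targets, eta=0.1, epochs=20):
    # w_i <- w_i + eta * (target - output) * x_i; the bias acts as the
    # (negative) threshold and is learned the same way
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            o = 1 if np.dot(w, x) + b > 0 else 0
            w += eta * (t - o) * x
            b += eta * (t - o)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(train_perceptron(X, np.array([0, 0, 0, 1])))  # learns AND
```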
What idea limits the generalisability of perceptrons?
The Perceptron Convergence Theorem states that perceptron training converges if and only if the problem is linearly separable
Hence, they can’t learn XOR
What are the general approaches to updating weights?
Online learning updates weights after every instance; offline learning does it after every epoch.
Batch (mini-batch) learning updates weights after every batch of instances
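A sketch of the three schedules on a toy linear model (the model and gradient function are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
T = X @ np.array([1.0, -2.0, 0.5])

def grad(w, Xb, Tb):
    # Mean-squared-error gradient for a linear model
    return 2 * Xb.T @ (Xb @ w - Tb) / len(Xb)

eta, w = 0.1, np.zeros(3)

for x, t in zip(X, T):          # online: update after every instance
    w -= eta * grad(w, x[None, :], np.array([t]))

w -= eta * grad(w, X, T)        # offline: one update per epoch

for i in range(0, len(X), 20):  # batch: update after every 20 instances
    w -= eta * grad(w, X[i:i+20], T[i:i+20])
```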
What algorithm is used to train neural networks?
Backpropagation:
[1] Calculate the predicted output using the current weights
[2] Calculate the error
[3] Update each weight in proportion to its gradient with respect to the error, i.e. how much changing that weight affects the error
Note: weights are updated backwards, i.e. starting at the output layer and propagating the error back through the hidden layers
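A minimal sketch of one backpropagation step for a two-layer sigmoid network (squared-error loss and the sizes are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 2)), rng.normal(size=(1, 4))
x, t, eta = np.array([0.5, -0.2]), np.array([1.0]), 0.5

h = sigmoid(W1 @ x)                      # [1] forward pass
y = sigmoid(W2 @ h)
err = y - t                              # [2] error

delta2 = err * y * (1 - y)               # [3] output layer first...
delta1 = (W2.T @ delta2) * h * (1 - h)   # ...then propagate backwards
W2 -= eta * np.outer(delta2, h)          # each weight moves against
W1 -= eta * np.outer(delta1, x)          # its gradient to the error
```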
What are some potential issues when using backpropagation?
An improper learning rate leads to divergence (too high) or slow convergence (too low)
Overfitting if training too long, using too many weights, or using too few instances
Local minima
How should variables be represented in an ANN?
Use a binary representation (i.e. one-hot encoding) for nominal variables
For numeric variables, consider scaling or standardisation
What are scaling and standardisation? When should each be used?
Scaling - map the numbers onto [0,1]; use if they are on a similar range
Standardisation - assume a normal distribution and transform to N(0,1) (subtract the mean, divide by the standard deviation); use if the values are more varied
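Both in numpy (the sample values are illustrative):

```python
import numpy as np

x = np.array([12.0, 15.0, 11.0, 20.0, 14.0])

scaled = (x - x.min()) / (x.max() - x.min())  # min-max onto [0, 1]
standardised = (x - x.mean()) / x.std()       # z-score, approx N(0, 1)
```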
What can happen if ANN weights aren’t set appropriately?
If they are all set to 0, the network will be symmetric i.e. all the weights will change together, and so it won’t train
If the weights are too high, the activations will fall in the parts of the sigmoid with a shallow gradient, and so training will be slow
How should ANN weights be set?
Using the fan-in factor, i.e. drawing each weight from a uniform random generator between -1/sqrt(d) and 1/sqrt(d), where d is the number of inputs
This ensures the variance of the weighted sum is approximately 1/3
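A sketch of the initialisation:

```python
import numpy as np

def fan_in_init(d, n_neurons, rng=np.random.default_rng()):
    # Uniform in [-1/sqrt(d), 1/sqrt(d)]; Var(U(-a, a)) = a^2 / 3, so
    # d unit-variance inputs give a weighted sum with variance
    # d * (1/d) / 3 = 1/3
    limit = 1.0 / np.sqrt(d)
    return rng.uniform(-limit, limit, size=(n_neurons, d))

W = fan_in_init(d=100, n_neurons=10)
```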
How can backpropagation be sped up?
With momentum, in which gradients from previous steps are used in addition to the current gradient
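A sketch of the update rule on a toy quadratic (the coefficient alpha = 0.9 is a typical assumed value):

```python
def grad(w):          # gradient of f(w) = w^2, minimum at 0
    return 2 * w

w, velocity, eta, alpha = 5.0, 0.0, 0.1, 0.9

for _ in range(50):
    # velocity keeps a decaying sum of the previous gradients
    velocity = alpha * velocity - eta * grad(w)
    w += velocity

print(w)  # w heads toward the minimum at 0, oscillating as it goes
```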
How can weight matrices be visualised?
With Hinton diagrams, in which each square's size is based on the weight's magnitude; it is white if the weight is positive and black if it is negative
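A compact matplotlib sketch (a minimal adaptation of the common Hinton-diagram recipe):

```python
import matplotlib.pyplot as plt
import numpy as np

def hinton(W, ax=None):
    # White squares for positive weights, black for negative;
    # square size reflects each weight's magnitude
    ax = ax or plt.gca()
    ax.patch.set_facecolor('gray')
    ax.set_aspect('equal', 'box')
    max_w = np.abs(W).max()
    for (i, j), w in np.ndenumerate(W):
        size = np.sqrt(abs(w) / max_w)
        colour = 'white' if w > 0 else 'black'
        ax.add_patch(plt.Rectangle((j - size / 2, i - size / 2),
                                   size, size, facecolor=colour))
    ax.autoscale_view()
    ax.invert_yaxis()

hinton(np.random.default_rng(0).normal(size=(8, 12)))
plt.show()
```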
What are the key principles of CNNs?
They automatically extract features to produce a feature map
They are not fully connected - convolutions with shared weights are used instead
What are the dimensions of a feature map?
In each direction, it is (image_size - filter_size) / shift + 1, where shift is the filter's stride
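For example, a 28x28 image with a 5x5 filter and a shift of 1 gives a 24x24 map:

```python
def feature_map_size(image_size, filter_size, shift=1):
    return (image_size - filter_size) // shift + 1

print(feature_map_size(28, 5))           # (28 - 5) / 1 + 1 = 24
print(feature_map_size(28, 5, shift=2))  # 12
```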
What techniques can be applied to optimise CNNs?
- Subsampling aggregates regions based on the maximal value (max pooling); this reduces the data while retaining/emphasising the information
- Weight smoothing is used when domain-specific knowledge suggests adjacent inputs are related
- Centered weight initialisation starts with higher weights in the centre, as this is often where objects are found
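A sketch of 2x2 subsampling by maximum (max pooling) in numpy:

```python
import numpy as np

def max_pool(fmap, size=2):
    # Aggregate each size x size block by its maximal value
    h, w = fmap.shape
    blocks = fmap[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)
print(max_pool(fmap))  # [[ 5.  7.] [13. 15.]]
```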
What is a weight agnostic network?
It has a single weight shared by the whole network; training is done by network topology search.
What operations occur while training a weight agnostic network?
- Insert node by splitting an existing connection
- Add connection - connect two previously unconnected nodes
- Change activation - change the activation function of a node
What are HONNs?
Higher order neural networks connect each input to multiple nodes in the first hidden layer.
The order is the number of nodes that each input connects to.
CNNs are a special type of HONN
Why are HONNs useful?
Instead of just taking the weighted sum of the inputs, they also take a weighted sum of products over combinations of inputs.
This allows them to explore higher-order relationships, i.e. products; for example, they can solve XOR (see the sketch below)
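A sketch of a single second-order unit solving XOR (the weights below are hand-picked, not learned):

```python
def second_order_unit(x1, x2):
    # Weighted sum plus a weighted product term:
    # net = w1*x1 + w2*x2 + w12*(x1*x2) + bias
    net = 1.0 * x1 + 1.0 * x2 - 2.0 * (x1 * x2) - 0.5
    return 1 if net > 0 else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, second_order_unit(x1, x2))  # XOR: 0, 1, 1, 0
```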
What are self-organizing maps?
They represent high-dimensional data in lower dimensions by mapping inputs to neurons via weight vectors
The weights are trained by competitive learning: the node whose weight vector is closest to the input is chosen to fire, and it updates its weights to reinforce those that made it win. A neighbourhood function also updates nearby nodes, which preserves the topology of the input space
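A sketch of one competitive-learning step for a 1-D map (the Gaussian neighbourhood is an assumed choice):

```python
import numpy as np

def som_step(W, x, eta=0.5, sigma=1.0):
    # Winner: the node whose weight vector is closest to the input
    bmu = np.argmin(np.linalg.norm(W - x, axis=1))
    # Neighbourhood: nodes near the winner (on the map) also move
    h = np.exp(-(np.arange(len(W)) - bmu) ** 2 / (2 * sigma ** 2))
    W += eta * h[:, None] * (x - W)  # reinforce towards the input
    return W

rng = np.random.default_rng(0)
W = rng.uniform(size=(10, 3))        # 10 map nodes, 3-D inputs
W = som_step(W, rng.uniform(size=3))
```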
What are residual neural networks?
They have shortcut connections between layers.
This makes training more effective as it reduces the vanishing gradient effect
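A sketch of a residual block: the output is the block's transformation plus its unchanged input, giving gradients an identity path to flow through (sizes and ReLU are assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(x, W1, W2):
    fx = W2 @ relu(W1 @ x)   # F(x): the block's transformation
    return relu(fx + x)      # shortcut: add the input back

rng = np.random.default_rng(0)
x = rng.normal(size=4)
out = residual_block(x, rng.normal(size=(4, 4)), rng.normal(size=(4, 4)))
```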
What is EvoCNN?
A genetic algorithm (GA) to automatically evolve network structures. It uses a two-level encoding to describe the layers and then their connections
Each mutation performs one of three actions:
- Add a new unit (convolutional, pooling, or fully connected)
- Modify an existing unit’s encoded information
- Delete an existing unit