[2] Neural Networks Flashcards

1
Q

What determines the output of a neuron?

A

It is the biased, weighted sum of its inputs, passed through an activation function

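A minimal Python/NumPy sketch of this (sigmoid is just one possible activation function, assumed here for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neuron_output(x, w, b):
        z = np.dot(w, x) + b   # biased, weighted sum of the inputs
        return sigmoid(z)      # passed through the activation function

    print(neuron_output(np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1))
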
2
Q

Why are activation functions important?

A

They allow the network to learn non-linearities

3
Q

What is a perceptron?

A

A special type of ANN with:

  • Real-valued inputs
  • Binary output
  • Threshold activation function
4
Q

How are perceptrons trained?

A

Adjust the weights (and the threshold, which acts as a bias weight) in the direction of the error: if the true class is higher than the perceptron's output, increase the weights on the active inputs; if it is lower, decrease them

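A small sketch of this rule in NumPy (function and variable names are illustrative), assuming binary 0/1 targets and a threshold activation:

    import numpy as np

    def train_perceptron(X, y, lr=0.1, epochs=10):
        w = np.zeros(X.shape[1])
        b = 0.0  # the bias plays the role of the (negative) threshold
        for _ in range(epochs):
            for x, target in zip(X, y):
                pred = 1 if np.dot(w, x) + b > 0 else 0
                # if the true class is higher than the prediction, weights go up;
                # if it is lower, they go down
                w += lr * (target - pred) * x
                b += lr * (target - pred)
        return w, b
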
5
Q

What idea limits the generalisability of perceptrons?

A

The Perceptron Convergence Theorem states that perceptron training will converge if and only if the problem is linearly separable

Hence, they can’t learn XOR

6
Q

What are the general approaches to updating weights?

A

Online learning updates weights after every instance; offline learning does it after every epoch.

Batch learning updates weights after every batch of instances

7
Q

What algorithm is used to train neural networks?

A

Backpropagation:
[1] Calculate the predicted output using the current weights
[2] Calculate the error
[3] Update each weight in proportion to the gradient of the error with respect to that weight, i.e. how much changing that weight affects the error

Note: the weights are updated backwards, i.e. starting from the weights out of the last hidden layer and working back towards the input

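A toy NumPy sketch of these three steps for a single hidden layer with sigmoid activations and squared error (biases omitted for brevity; this is illustrative, not an optimised implementation):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, t, W1, W2, lr=0.1):
        # [1] forward pass: predicted output with the current weights
        h = sigmoid(W1 @ x)
        o = sigmoid(W2 @ h)
        # [2] error at the output
        err = o - t
        # [3] update each weight in proportion to its gradient, working
        #     backwards from the output layer
        delta_o = err * o * (1 - o)
        delta_h = (W2.T @ delta_o) * h * (1 - h)
        W2 -= lr * np.outer(delta_o, h)
        W1 -= lr * np.outer(delta_h, x)
        return W1, W2
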
8
Q

What are some potential issues when using backpropagation?

A

An improper learning rate leads to divergence or slow convergence

Overfitting, if training for too long, using too many weights, or using too few instances

Getting stuck in local minima

9
Q

How should variables be represented in an ANN?

A

Use a binary representation (e.g. one-hot encoding) for nominal variables

For numeric variables, consider scaling or standardisation

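A small sketch of one-hot encoding a nominal variable (illustrative only):

    import numpy as np

    def one_hot(index, n_categories):
        # e.g. one_hot(2, 4) -> [0., 0., 1., 0.]
        v = np.zeros(n_categories)
        v[index] = 1.0
        return v
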
10
Q

What is scaling and standardization? When should each be used?

A

Scaling - scale the numbers to [0,1]; use this if they are on a similar range

Standardisation - assume a normal distribution and standardise to N(0,1); use this if the values are more varied

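A sketch of both transformations in NumPy, assuming a 1-D array of values:

    import numpy as np

    def min_max_scale(x):
        # scaling: map the values into [0, 1]
        return (x - x.min()) / (x.max() - x.min())

    def standardise(x):
        # standardisation: zero mean and unit variance, i.e. roughly N(0, 1)
        return (x - x.mean()) / x.std()
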
11
Q

What can happen if ANN weights aren’t set appropriately?

A

If they are all set to 0, the network will be symmetric, i.e. all the weights will change together, and so it won't train

If the weights are too large, the activations will lie in the parts of the sigmoid with a shallow gradient, and so training will be slow

12
Q

How should ANN weights be set?

A

Using the fan-in, i.e. drawing each weight from a uniform random generator between -1/sqrt(d) and 1/sqrt(d), where d is the number of inputs

This ensures the variance of the weighted sum is approximately 1/3

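A sketch of this initialisation; since Var(Uniform(-a, a)) = a²/3, choosing a = 1/sqrt(d) gives each weight variance 1/(3d), so the weighted sum of d unit-variance inputs has variance of roughly 1/3:

    import numpy as np

    def fan_in_init(n_neurons, d):
        # each weight drawn uniformly from [-1/sqrt(d), 1/sqrt(d)],
        # where d is the number of inputs (the fan-in)
        a = 1.0 / np.sqrt(d)
        return np.random.uniform(-a, a, size=(n_neurons, d))
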
13
Q

How can backpropagation be sped up?

A

With momentum, in which gradients from previous steps are used in addition to the current gradient

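A sketch of a momentum update (beta, often around 0.9, controls how much of the previous step is carried forward; the names are illustrative):

    def momentum_update(w, grad, velocity, lr=0.01, beta=0.9):
        # blend the previous step's direction with the current gradient
        velocity = beta * velocity - lr * grad
        return w + velocity, velocity
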
14
Q

How can weight matrices be visualised?

A

With Hinton diagrams, in which the size of each square reflects the weight's magnitude; the square is white if the weight is positive and black if it is negative

15
Q

What are the key principles of CNNs?

A

They automatically extract features to produce a feature map

They are not fully connected - convolutions with shared weights are used instead

16
Q

What are the dimensions of a feature map?

A

In each direction, it is (image_size - filter_size) / shift + 1

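For example, a 28x28 image convolved with a 5x5 filter and a shift (stride) of 1 gives (28 - 5)/1 + 1 = 24 in each direction, i.e. a 24x24 feature map.
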
17
Q

What techniques can be applied to optimise CNNs?

A
  • Subsampling aggregates based on the maximal value; this reduces the data while retaining/emphasizing the important information
  • Weight smoothing is used when domain-specific knowledge suggests adjacent inputs are related
  • Centered weight initialization starts with higher weights in the center, as this is often where objects are found
18
Q

What is a weight agnostic network?

A

It has a single weight shared by the whole network; training is done by network topology search.

19
Q

What operations occur while training a weight agnostic network?

A
  • Insert node - add a node by splitting an existing connection
  • Add connection - connect two previously unconnected nodes
  • Change activation - change the activation function of a node
20
Q

What are HONNs?

A

Higher order neural networks connect each input to multiple nodes in the first hidden layer.

The order is the number of nodes that each input connects to.

CNNs are a special type of HONN

21
Q

Why are HONNs useful?

A

Instead of just taking the weighted sum of the inputs, they also take a weighted sum of products over combinations of inputs.

This allows them to explore higher-order relationships, i.e. products; for example, they can solve XOR

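An illustrative sketch of a second-order unit, which adds weighted products of input pairs to the usual weighted sum (the exact weighting scheme here is an assumption for illustration):

    import numpy as np

    def second_order_unit(x, w_linear, w_pairs):
        # w_pairs maps an index pair (i, j) to the weight on the product x[i] * x[j]
        linear = np.dot(w_linear, x)
        products = sum(w * x[i] * x[j] for (i, j), w in w_pairs.items())
        # e.g. w_linear=np.array([1.0, 1.0]), w_pairs={(0, 1): -2.0}
        # computes XOR on binary inputs
        return linear + products
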
22
Q

What are self-organizing maps?

A

They represent high-dimensional data in lower dimensions by mapping inputs to neurons via their weights

The weights are trained by competitive learning, in which the node whose weights are closest to the input value is chosen to fire. It updates its weights to reinforce those that made it win. A neighborhood function, which also updates nearby nodes, preserves the topology

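A toy sketch of one competitive-learning step; the Gaussian neighbourhood used here is one common choice, assumed for illustration:

    import numpy as np

    def som_step(weights, grid_coords, x, lr=0.5, sigma=1.0):
        # weights: (n_nodes, d) weight vectors; grid_coords: (n_nodes, 2) map positions
        # the node whose weights are closest to the input wins (fires)
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        # the neighborhood function also pulls nearby nodes towards the input,
        # which is what preserves the topology
        dist = np.linalg.norm(grid_coords - grid_coords[winner], axis=1)
        h = np.exp(-dist ** 2 / (2 * sigma ** 2))
        return weights + lr * h[:, None] * (x - weights)
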
23
Q

What are residual neural networks?

A

They have shortcut connections between layers.

This makes training more effective as it reduces the vanishing gradient effect

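A one-line sketch of the idea, assuming `transform` stands for whatever the layers inside the block compute:

    def residual_block(x, transform):
        # the shortcut connection adds the input straight onto the block's output,
        # giving gradients a direct path back through the network
        return transform(x) + x
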
24
Q

What is EvoCNN?

A

A GA (genetic algorithm) that automatically evolves network structures. It uses a two-level encoding to describe the layers and then their connections

Each mutation performs one of three actions:

  • Add a new unit (convolutional, pooling or full)
  • Modify an existing unit’s encoded information
  • Delete an existing unit
25
Q

What are auto-encoders?

A

Neural networks that have been trained to copy their input to their output

They use an intermediate layer called the latent representation

26
Q

How is the loss of an auto-encoder calculated?

A

The difference between the input and the output (or a domain-specific variation of it)

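A sketch of the plain version, the mean squared difference between input and reconstruction (domain-specific variants, e.g. cross-entropy for binary pixels, follow the same pattern):

    import numpy as np

    def reconstruction_loss(x, x_reconstructed):
        # mean squared difference between the input and the auto-encoder's output
        return np.mean((x - x_reconstructed) ** 2)
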
27
Q

What are the main configurations of auto encoder?

A

Under-complete auto encoders have latent representations with smaller dimensions than the input and output.

Otherwise, they are over-complete

28
Q

What do under-complete auto-encoders do?

A

They learn the most salient features of the data

29
Q

How do over-complete auto-encoders work?

A

They use regularisation to avoid simply copying the data

Sparse auto-encoders try to push as many output values of the latent representation to 0 as possible

Contractive auto-encoders regularise with a derivative penalty, meaning each node's output changes only slightly if the input changes slightly. This makes them robust to small fluctuations

30
Q

What are some particular applications of auto encoders?

A

De-noising auto-encoders remove noise from the image

Variational auto-encoders modify an image in a desired way. However, the latent space might not be continuous, and so instead of a point in the latent space, a distribution is used.

31
Q

Why is cross-entropy used?

A

Its gradients are more pronounced at extreme values, leading to faster convergence

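A sketch of binary cross-entropy for a target y in {0, 1} and predicted probability p; the loss (and its gradient) grows rapidly as p approaches the wrong extreme, which is what drives the faster convergence:

    import numpy as np

    def binary_cross_entropy(y, p, eps=1e-12):
        p = np.clip(p, eps, 1 - eps)  # avoid log(0)
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))
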
32
Q

Why is ReLU often used?

A

It is fast to compute, minimises the impact of vanishing gradient, and encourages sparsity

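A sketch of ReLU itself:

    import numpy as np

    def relu(z):
        # cheap to compute; exact zeros for negative inputs encourage sparsity,
        # and the gradient is 1 for positive inputs, limiting vanishing gradients
        return np.maximum(0.0, z)
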
33
Q

What is the purpose of regularisation?

A

It prevents weights from getting too large, and pushes as many to zero as possible (allowing them to be ignored)

34
Q

What is a particular type of regularisation?

A

Lasso regularisation uses L1 to remove irrelevant variables from a linear model

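A sketch of adding an L1 (lasso) penalty to a loss; `lam` is the regularisation strength (an assumed name):

    import numpy as np

    def l1_penalised_loss(base_loss, weights, lam=0.01):
        # the absolute-value penalty pushes irrelevant weights to exactly zero
        return base_loss + lam * np.sum(np.abs(weights))
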
35
Q

How does dropout work?

A

A random percentage of neurons is removed on each mini-batch

Note: for inference, the weights must be multiplied by the retain probability (1 - p)

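A sketch of the classic formulation described above, where p is the dropout probability ("inverted dropout", which rescales during training instead, is a common equivalent variant):

    import numpy as np

    def dropout_train(activations, p=0.5):
        # training: zero out a random fraction p of the neurons
        mask = np.random.rand(*activations.shape) >= p
        return activations * mask

    def dropout_inference(activations, p=0.5):
        # inference: keep every neuron but scale by the retain probability (1 - p)
        return activations * (1 - p)
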
36
Q

What are the general strategies for transfer learning?

A

Learn shared hidden representations (e.g. DLID). This is useful if the classes are the same, but the way they are captured differs (e.g. different camera types)

Shared features - use this when the head layers do the same general task, but the tail does a particular task.