Equivariance

When you shift the input image (for example, translate a digit across it), the features shift too: the feature detections in the output maps move in the same way as the input. Convolution layers are equivariant to translation.
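A minimal 1D sketch of translation equivariance, assuming a toy "image" and a hand-picked edge kernel (both hypothetical, not from any real network). Convolving a shifted input gives the same result as shifting the feature map.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation (what deep-learning 'conv' layers compute)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def shift_right(xs, n):
    """Translate a sequence n steps to the right, filling with zeros."""
    return [0] * n + xs[:len(xs) - n]


# Toy 1D "image" containing a small bump, and an edge-detecting kernel.
signal = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
kernel = [1, 0, -1]

features = conv1d(signal, kernel)
shifted_features = conv1d(shift_right(signal, 2), kernel)

print(features)          # [-1, -2, 0, 2, 1, 0, 0, 0]
print(shifted_features)  # [0, 0, -1, -2, 0, 2, 1, 0]

# Equivariance: conv(shift(x)) == shift(conv(x)).
assert shifted_features == shift_right(features, 2)
```

The equality holds here because nothing nonzero is pushed off the edge; with real padding and borders, equivariance is only approximate near the image boundary.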

Invariance

Seen at the final output. Even if you move (translate) the digit across the image, the network will still output that it is a four. There is some rotation invariance, but if you rotate the digit too much, the network will stop predicting that it is a four. There is also some scale invariance: if you increase or decrease the size of the digit, the network will still predict the correct class.
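One way the invariance at the output can arise is by pooling away position. The sketch below assumes a hypothetical toy "classifier" (one conv, then global max pooling, then a threshold); real networks get only approximate invariance through pooling and the later layers. The location of the strongest response moves with the input, but the pooled value and the decision do not.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def shift_right(xs, n):
    """Translate a sequence n steps to the right, filling with zeros."""
    return [0] * n + xs[:len(xs) - n]


def toy_detector(signal, kernel=(1, 0, -1), threshold=2):
    """Hypothetical classifier: conv -> global max pool -> threshold."""
    fmap = conv1d(signal, list(kernel))
    pooled = max(fmap)             # global max pooling discards position
    location = fmap.index(pooled)  # where the strongest response occurred
    return pooled >= threshold, pooled, location


signal = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
for n in range(4):
    print(n, toy_detector(shift_right(signal, n)))
# 0 (True, 2, 3)
# 1 (True, 2, 4)
# 2 (True, 2, 5)
# 3 (True, 2, 6)
```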

AlexNet

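Layers in order:
1. Conv layer, 11x11 filters, stride 4, ReLU activation
2. Max pooling layer, 3x3, stride 2
3. Conv layer, 5x5 filters, padding 2, ReLU activation
4. Max pooling layer, 3x3, stride 2
5. 3 conv layers, 3x3 filters, padding 1, ReLU activation
6. Max pooling layer, 3x3, stride 2
7. Fully connected layer, 4096 units
8. Fully connected layer, 4096 units
9. Fully connected layer, 1000 units (one per class)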
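The spatial sizes through the layers above follow from the output-size formula (W - K + 2P) / S + 1. This sketch assumes a 227x227x3 input (the size at which the arithmetic works out exactly) and the usual AlexNet filter counts 96, 256, 384, 384, 256; it only tracks shapes and has no weights.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv or pooling layer: (W - K + 2P) // S + 1."""
    return (size - kernel + 2 * pad) // stride + 1


# (name, kernel, stride, padding, output channels); pooling keeps channels.
ALEXNET_FEATURES = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, None),
]


def trace_shapes(size=227, channels=3):
    """Return (name, height, width, channels) after each layer."""
    shapes = []
    for name, k, s, p, out_ch in ALEXNET_FEATURES:
        size = out_size(size, k, s, p)
        if out_ch is not None:
            channels = out_ch
        shapes.append((name, size, size, channels))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)
_, h, w, c = shapes[-1]
print("flattened input to first FC-4096 layer:", h * w * c)  # 9216
```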

VGG

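Used repeated 3x3 convolution layers (stride 1, padding 1) with 2x2 max pooling layers (stride 2) in between. The end result had a lot of parameters (about 138 million for VGG-16), most of them in the fully connected layers.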
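A quick parameter count makes "a lot of parameters" concrete. This sketch assumes the standard VGG-16 configuration (13 conv layers, 224x224 input, so the last pool outputs 7x7x512, then FC 4096, 4096, 1000) and counts weights plus biases for each layer.

```python
# Output channels per 3x3 conv layer; "M" marks a 2x2 max pool (no parameters).
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_params(num_classes=1000):
    """Return (conv_params, fc_params) for VGG-16, weights plus biases."""
    in_ch, conv = 3, 0
    for v in VGG16_CONV:
        if v == "M":
            continue
        conv += 3 * 3 * in_ch * v + v
        in_ch = v
    # 224x224 input halved by five pools -> 7x7x512 feature map.
    fc_sizes = [7 * 7 * 512, 4096, 4096, num_classes]
    fc = sum(a * b + b for a, b in zip(fc_sizes, fc_sizes[1:]))
    return conv, fc


conv, fc = vgg16_params()
print(conv, fc, conv + fc)  # 14714688 123642856 138357544
```

Roughly 89% of the parameters sit in the three fully connected layers, and the first one alone (25088 x 4096) accounts for over 100 million.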

Inception

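Used parallel filters of different sizes to get features at multiple scales: 1x1, 3x3, and 5x5 convolutions plus max pooling run side by side, and their outputs are combined by filter concatenation (stacking along the channel dimension). The smaller 1x1 filters also reduce the number of channels before the more expensive 3x3 and 5x5 filters.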
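The shape bookkeeping for one module, as a sketch. It assumes the GoogLeNet "inception (3a)" configuration on a 28x28x192 input (1x1: 64; 1x1 reduce 96 then 3x3: 128; 1x1 reduce 16 then 5x5: 32; 3x3 max pool then 1x1 projection: 32). Every branch keeps the spatial size (stride 1, "same" padding), which is what allows concatenation along channels. The second part counts multiplications for the 5x5 branch with and without the 1x1 reduction.

```python
def same_out(size, kernel):
    """Stride-1 output size with padding (kernel - 1) // 2, for odd kernels."""
    pad = (kernel - 1) // 2
    return size - kernel + 2 * pad + 1


def conv_mults(size, in_ch, out_ch, kernel):
    """Multiplications for a stride-1 'same' conv on a size x size map."""
    return size * size * out_ch * kernel * kernel * in_ch


def inception_output(size, branches):
    """branches: list of (kernel sizes along the branch, output channels)."""
    sizes = set()
    for kernels, _ in branches:
        s = size
        for k in kernels:
            s = same_out(s, k)
        sizes.add(s)
    assert len(sizes) == 1, "branches must agree spatially to be concatenated"
    # Filter concatenation: stack branch outputs along the channel axis.
    return size, sum(ch for _, ch in branches)


branches_3a = [
    ((1,), 64),     # 1x1 conv
    ((1, 3), 128),  # 1x1 reduce (96) -> 3x3 conv
    ((1, 5), 32),   # 1x1 reduce (16) -> 5x5 conv
    ((3, 1), 32),   # 3x3 max pool (stride 1) -> 1x1 projection
]
print(inception_output(28, branches_3a))  # (28, 256)

direct = conv_mults(28, 192, 32, 5)
bottleneck = conv_mults(28, 192, 16, 1) + conv_mults(28, 16, 32, 5)
print(direct, bottleneck)  # 120422400 12443648
```

With the 1x1 reduction, the 5x5 branch needs about a tenth of the multiplications.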

Optimization error

Even if the NN can perfectly model the world, the optimizer may not be able to find the weights that model the function.

Estimation Error

Even if we find the best hypothesis that minimizes training error, it doesn't mean we will generalize well to the test set.

Modeling Error

Given a NN architecture, the function that actually describes the real world may not be in the space of functions that architecture can represent: there may be no set of weights that models the real world.
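The three error sources can be written as one telescoping sum of test-error gaps. The notation is introduced here as an assumption: R is expected test error, f* is the best possible predictor, f_H is the best predictor the architecture can represent, f̂ minimizes training error within that architecture, and f̃ is what the optimizer actually returns.

```latex
R(\tilde f) - R(f^*) =
  \underbrace{R(\tilde f) - R(\hat f)}_{\text{optimization error}}
+ \underbrace{R(\hat f) - R(f_{\mathcal{H}})}_{\text{estimation error}}
+ \underbrace{R(f_{\mathcal{H}}) - R(f^*)}_{\text{modeling error}}
```

The middle terms cancel, so the identity is exact; its value is in attributing the total gap to the optimizer, the finite training set, and the choice of architecture respectively.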

Cases where transfer learning doesn't work well

If the source dataset you pretrain on is very different from the target dataset, transfer learning is less effective. If you have enough data for the target domain, transfer learning may not improve final performance; it mainly results in faster convergence.

Saliency maps

Shows the sensitivity of the loss to individual pixel changes. Large sensitivity implies important pixels. In practice, instead of the loss, we take the gradient of the classifier score for the class of interest (the pre-softmax score) with respect to the input pixels. We then take the absolute value of the gradient. We can also sum across all channels to get one value per pixel.
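A minimal sketch of the recipe above. Real implementations get the gradient with one backward pass through the network; here, as a swapped-in stand-in so the example runs without a framework, the gradient is approximated by central finite differences. The class score function is hypothetical (a linear scorer), so the expected saliency is just the absolute weights summed over channels, which makes it easy to check.

```python
from typing import Callable, List

Image = List[List[List[float]]]  # indexed [channel][row][col]


def saliency_map(score_fn: Callable[[Image], float], image: Image,
                 eps: float = 1e-4) -> List[List[float]]:
    """|d score / d pixel|, approximated by central differences, summed over channels."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    sal = [[0.0] * width for _ in range(height)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                original = image[c][i][j]
                image[c][i][j] = original + eps
                plus = score_fn(image)
                image[c][i][j] = original - eps
                minus = score_fn(image)
                image[c][i][j] = original  # restore the pixel
                grad = (plus - minus) / (2 * eps)
                sal[i][j] += abs(grad)     # absolute value, then sum over channels
    return sal


# Hypothetical pre-softmax score for one class: a weighted sum of pixels.
weights = [[[1.0, -2.0], [0.0, 0.5]],
           [[-1.0, 0.0], [3.0, 0.0]]]


def toy_class_score(img: Image) -> float:
    return sum(weights[c][i][j] * img[c][i][j]
               for c in range(2) for i in range(2) for j in range(2))


image = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
sal = saliency_map(toy_class_score, image)
print(sal)  # approximately [[2.0, 2.0], [3.0, 0.5]]
```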

Backprop for visualization

Normal backprop isn't the best choice for visualization. The gradient also picks up input pixels that decrease the feature activation, and there are probably lots of such pixels, so the result is noisy. Guided backprop improves visualizations by only passing back positive gradients: at each ReLU, in addition to the usual ReLU mask (zero gradient where the forward input was not positive), we zero out zero and negative gradients. Because this modifies the backward pass itself, it is called guided backprop.
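The only difference between normal and guided backprop is the backward rule at each ReLU. A minimal sketch on plain lists, with hypothetical forward inputs and upstream gradients:

```python
def relu_backward(forward_input, upstream_grad):
    """Standard backprop through ReLU: pass gradient where the forward input was positive."""
    return [g if x > 0 else 0.0 for x, g in zip(forward_input, upstream_grad)]


def guided_relu_backward(forward_input, upstream_grad):
    """Guided backprop: also zero out zero and negative upstream gradients."""
    return [g if x > 0 and g > 0 else 0.0
            for x, g in zip(forward_input, upstream_grad)]


x = [1.0, -2.0, 3.0, 0.5]          # inputs seen by the ReLU on the forward pass
upstream = [0.5, 0.7, -1.0, 0.0]   # gradient arriving from the layer above
print(relu_backward(x, upstream))         # [0.5, 0.0, -1.0, 0.0]
print(guided_relu_backward(x, upstream))  # [0.5, 0.0, 0.0, 0.0]
```

The third position shows the effect: normal backprop passes the negative gradient (a pixel path that decreases the activation), while guided backprop drops it.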

Adversarial examples

Inputs formed by applying small but intentionally worst-case perturbations to examples, such that the perturbed input results in the model outputting an incorrect answer.
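A sketch of a worst-case perturbation for the simplest possible model, assuming a hypothetical linear binary classifier (predict positive if w·x + b > 0). Under a max-per-pixel budget eps, the perturbation that lowers the score most is -eps·sign(w); this is the idea behind the fast gradient sign method (FGSM), since for a linear score the input gradient is just w.

```python
def sign(v):
    """Return 1, 0, or -1."""
    return (v > 0) - (v < 0)


def score(w, x, b=0.0):
    """Linear pre-threshold score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def worst_case_perturbation(w, x, eps):
    """Shift each input by eps against the sign of its weight (L-infinity budget eps)."""
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]


w = [0.5, -0.3, 0.8, -0.1]   # hypothetical trained weights
x = [0.2, 0.1, 0.1, 0.3]     # input classified as positive
x_adv = worst_case_perturbation(w, x, eps=0.1)

print(round(score(w, x), 4))      # 0.12  -> positive
print(round(score(w, x_adv), 4))  # -0.05 -> negative
```

No input changes by more than 0.1, yet the score drops by eps times the sum of |w| (0.17 here). In high dimensions that sum grows with the number of inputs, which is why imperceptibly small per-pixel changes can flip a prediction.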

Loss functions

Cross-entropy loss (CEL), mean squared error (MSE), L1, L2
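Plain-Python sketches of the listed losses. Assumption: "L1" and "L2" are read here as the absolute-error and squared-error regression losses (mean absolute error, and sum of squared errors, which is one convention among several); the notes don't specify, and they could also mean L1/L2 weight regularization. Cross-entropy takes pre-softmax scores and an integer class label.

```python
import math


def cross_entropy(logits, target):
    """-log softmax(logits)[target], using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mse(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def l1_loss(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Sum of squared errors (some texts use the mean, or half of this sum)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


pred, target = [1.0, 2.0, 3.0], [1.0, 3.0, 5.0]
print(mse(pred, target), l1_loss(pred, target), l2_loss(pred, target))
print(cross_entropy([2.0, 1.0, 0.1], target=0))
```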

Style Transfer

Focal Loss

Object detection architectures