ResNets Flashcards
(24 cards)
What problem do ResNets aim to solve?
Vanishing and exploding gradients, and the resulting degradation in training accuracy, in very deep neural networks.
What causes vanishing gradients in deep networks?
Repeated multiplication of small derivatives during backpropagation.
What is a common sign of gradient instability in deep models?
Abnormal gradient distributions, such as near-zero or spiked values.
What are three signs of unstable gradient flow?
Abnormal gradients, chaotic learning curves, and irregular layer outputs.
Why is ReLU preferred over sigmoid in deep networks?
ReLU's gradient is 1 for positive inputs, so it avoids the saturation that drives sigmoid gradients toward zero during backpropagation.
What is the core idea behind residual connections?
Instead of learning y = f(x), learn y = f(x) + x.
What is a residual block?
A network unit that adds its input to its output after a series of transformations.
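A minimal sketch of the y = f(x) + x idea, assuming PyTorch (the cards don't name a framework); ResidualUnit and its small two-layer f are illustrative names, not from the original paper:

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # f(x): any learned transformation; here a small two-layer MLP for illustration.
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        # y = f(x) + x: add the input back onto the transformation's output.
        return self.f(x) + x

x = torch.randn(8, 64)
y = ResidualUnit(64)(x)   # same shape as x: (8, 64)
```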
Why do residual connections help with training deep models?
They allow gradients to flow more easily through the network.
What does f(x) + x mean in a ResNet?
The output is the sum of the learned transformation and the original input.
What happens if f(x) learns nothing in a ResNet?
The identity connection ensures the network can still pass input forward.
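A tiny numeric illustration of this card (again assuming PyTorch): if the learned transformation contributes nothing, the addition reduces the block to the identity.

```python
import torch

x = torch.randn(4, 8)
f_x = torch.zeros_like(x)   # pretend f(x) learned nothing and outputs zeros
y = f_x + x
assert torch.equal(y, x)    # the input still passes through unchanged
```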
What analogy links ResNets and LSTMs?
Both preserve information over structure—LSTM across time, ResNet across depth.
How do residual connections relate to vanishing gradients?
They reduce the chance of vanishing gradients by providing an unimpeded gradient path.
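A short worked derivation of that gradient path (standard chain-rule algebra, not taken from the cards):

```latex
% Gradient through one residual block y = f(x) + x, by the chain rule:
\[
  \frac{\partial L}{\partial x}
  = \frac{\partial L}{\partial y}\left(\frac{\partial f(x)}{\partial x} + I\right)
  = \frac{\partial L}{\partial y}\,\frac{\partial f(x)}{\partial x}
  + \frac{\partial L}{\partial y}
\]
% Even if the first term is tiny (vanishing), the second term carries the
% upstream gradient through the identity path unattenuated.
```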
What are skip connections in ResNets not equivalent to?
Encoder-decoder skip paths like those in U-Net, which concatenate features across scales rather than adding an identity shortcut.
What does a typical ResNet block contain?
Two convolutional layers (each followed by batch normalization), a ReLU between them, and a skip connection that adds the block's input to the output before a final ReLU.
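A sketch of such a block, assuming PyTorch and stride-1, same-channel convolutions; BasicBlock and its layer names are illustrative:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))   # first conv -> BN -> ReLU
        out = self.bn2(self.conv2(out))            # second conv -> BN
        out = out + x                              # skip connection: add the input
        return self.relu(out)                      # final ReLU after the addition

x = torch.randn(1, 64, 32, 32)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

When the block changes the spatial size or channel count, the shortcut typically uses a 1x1 convolution so the shapes still match before the addition.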
What is one advantage of using residual blocks in CNNs?
They allow very deep networks to be trained effectively.
What is ResNet-34?
A 34-layer residual network designed for ImageNet-level performance.
What architecture enabled the training of 100+ layer CNNs?
Residual Neural Networks (ResNets).
What is one limitation of ResNets?
Reduced interpretability due to multiple forward paths.
What can happen if residual blocks are poorly designed?
They may default to identity mappings and learn nothing useful.
What is a computational cost of using ResNets?
More parameters and longer training time from the added layers, even though the skip connections themselves add almost no parameters.
Why might debugging ResNets be difficult?
Because of the complexity introduced by residual pathways.
What do residual blocks encourage the network to learn?
Only the difference (residual) between input and desired output.
What paper introduced ResNets?
‘Deep Residual Learning for Image Recognition’ by He et al., 2015.
Why is deeper not always better in plain CNNs?
Deeper networks can suffer from degraded training due to gradient issues.