CNN Flashcards
(30 cards)
What is a kernel (or filter) in a CNN?
A small matrix of learnable weights that is convolved across the input; the same weights are reused at every spatial location (weight sharing).
Define the convolution operation.
Element‑wise multiplication between the kernel and a local input patch, followed by summation and adding a bias, repeated across all spatial positions.
What is an activation (feature) map?
The 2‑D output produced by applying one kernel over the entire input.
What does stride control in convolution?
The number of pixels the kernel shifts after each computation; larger stride downsamples the output.
What is the receptive field of a neuron in a CNN layer?
The region of the input that a particular output activation depends on; for a single convolution layer with an F×F kernel, this is F×F, and it grows as layers are stacked.
Why is zero‑padding used?
To preserve spatial dimensions and let border pixels contribute to more output positions during convolution.
In CNN terminology, what is a channel?
The depth dimension of the data (e.g., RGB image has 3 channels); each kernel spans all input channels.
What is pooling in CNNs and why is it used?
A down‑sampling operation (usually max pooling) that summarizes local regions, providing translation invariance and reducing spatial size.
What is a ‘block’ in practical CNN architectures?
A repeated pattern of layers, typically Conv → ReLU (×N) followed by Pooling, stacked to build deep networks.
List the main steps of a forward pass through one convolution layer.
1) Select local patch, 2) Multiply element‑wise with kernel and sum, 3) Add bias, 4) Slide by stride and repeat for all positions.
Give the formula for calculating output height of a convolution layer.
H_out = ⌊(H_in + 2P − F) / S⌋ + 1 where F is kernel size, S is stride, P is padding.
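The formula on this card can be checked with a tiny helper; the values below (a 224-pixel input with a 3×3 kernel) are a common illustrative configuration, not from the cards themselves.

```python
def conv_out_size(n_in, F, S, P):
    # H_out = floor((H_in + 2P - F) / S) + 1
    return (n_in + 2 * P - F) // S + 1

# 3x3 kernel, stride 1, padding 1 -> 'same' spatial size
h_same = conv_out_size(224, F=3, S=1, P=1)   # stays 224
# stride 2 with the same kernel and padding halves the size
h_half = conv_out_size(224, F=3, S=2, P=1)   # becomes 112
```

The stride-2 case is the downsampling behaviour mentioned on the stride card.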
What is parameter sharing and why does it matter?
All positions use the same kernel weights, greatly reducing the number of parameters compared to fully‑connected layers.
Explain sparse connectivity in CNNs.
Each output activation depends only on a small local patch of the input, not the entire input, leading to computational efficiency.
Why are CNNs more efficient than equivalent fully‑connected layers?
Because of parameter sharing and sparse connectivity, they require far fewer weights and computations.
What benefit does max pooling provide?
It keeps the strongest activation in a region, providing robustness to small translations and slight distortions.
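Max pooling over non-overlapping 2×2 windows can be written as a reshape-and-max in NumPy; this minimal sketch assumes the input height and width are even.

```python
import numpy as np

def max_pool2x2(x):
    """Keep the strongest activation in each non-overlapping 2x2 window."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 0., 1.],
              [3., 4., 1., 0.],
              [0., 1., 5., 6.],
              [2., 1., 7., 8.]])
y = max_pool2x2(x)   # each 2x2 block is replaced by its maximum
```

Shifting a strong activation slightly within its window leaves the pooled output unchanged, which is the robustness to small translations described above.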
Define translation equivariance in CNNs.
Shifting the input causes an equal shift in the output feature map (Conv ∘ Shift = Shift ∘ Conv).
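The identity Conv ∘ Shift = Shift ∘ Conv can be verified numerically. In this sketch the shift wraps around with `np.roll`, so the comparison excludes the wrapped border column; `conv2d` is a naive valid convolution written just for this check.

```python
import numpy as np

def conv2d(x, k):
    H, W = x.shape
    F = k.shape[0]
    out = np.zeros((H - F + 1, W - F + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+F, j:j+F] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k = rng.standard_normal((3, 3))

y = conv2d(x, k)                  # convolve, then shift
x_shift = np.roll(x, 1, axis=1)   # shift input right by one column
y_shift = conv2d(x_shift, k)      # shift, then convolve
# equal away from the wrapped border column: Conv ∘ Shift = Shift ∘ Conv
equivariant = np.allclose(y_shift[:, 1:], y[:, :-1])
```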
Define translation invariance and how CNNs achieve it.
The output does not change when the input is shifted; pooling and deeper layers drive CNNs toward invariance.
What is the typical high‑level architecture of a CNN?
Input → [Conv → ReLU]×N → Pool → … → Flatten/Global Pool → Dense → Softmax.
How does the number of filters affect the output?
Each filter produces one feature map, so the number of filters equals the number of output channels.
Why not always use large kernels?
They require more parameters and computation; stacking small kernels (e.g., two 3×3 layers cover a 5×5 receptive field) achieves the same receptive field with fewer weights and extra non‑linearities.
What is stride 2 useful for?
It downsamples feature maps during convolution, often replacing separate pooling layers.
What role do ReLU activations play in CNNs?
They introduce non‑linearity, allowing the network to learn complex representations and mitigating vanishing gradients.
How are multiple input channels handled in convolution?
Each kernel contains a weight slice for every input channel; the per‑channel results are summed into a single value at each output position.
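The multi-channel case can be sketched directly: one kernel carries a weight slice per input channel, and the products are summed over channels as well as over the spatial window. The channel-first layout and random toy data here are illustrative assumptions.

```python
import numpy as np

C_in, H, W, F = 3, 5, 5, 3            # 3 input channels, 5x5 input, 3x3 kernel
rng = np.random.default_rng(1)
x = rng.standard_normal((C_in, H, W))
k = rng.standard_normal((C_in, F, F)) # one kernel: a weight slice per channel

out = np.zeros((H - F + 1, W - F + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        # sum over channels AND the spatial window -> one scalar per position
        out[i, j] = np.sum(x[:, i:i+F, j:j+F] * k)
```

One kernel therefore produces one output channel regardless of how many input channels it spans; stacking more kernels adds more output channels, as the filter-count card states.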
How does kernel size affect receptive field and spatial resolution?
Larger kernels increase receptive field but also parameters; smaller kernels preserve more detailed spatial information.