CNN Flashcards

(30 cards)

1
Q

What is a kernel (or filter) in a CNN?

A

A small matrix of learnable weights that is convolved across the input; the same weights are reused at every spatial location (weight sharing).

2
Q

Define the convolution operation.

A

Element‑wise multiplication between the kernel and a local input patch, followed by summation and adding a bias, repeated across all spatial positions.

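A minimal NumPy sketch of one output activation; the patch, kernel, and bias values are made up for illustration:

    import numpy as np

    # One 3x3 input patch and one 3x3 kernel (illustrative values).
    patch = np.array([[1, 0, 2],
                      [3, 1, 0],
                      [0, 2, 1]], dtype=float)
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]], dtype=float)
    bias = 0.5

    # Element-wise multiply, sum, add bias: one output activation.
    activation = np.sum(patch * kernel) + bias
    print(activation)  # (1+3+0) - (2+0+1) + 0.5 = 1.5
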
3
Q

What is an activation (feature) map?

A

The 2‑D output produced by applying one kernel over the entire input.

4
Q

What does stride control in convolution?

A

The number of pixels the kernel shifts after each computation; larger stride downsamples the output.

5
Q

What is the receptive field of a neuron in a CNN layer?

A

The region of the input that a particular output activation depends on: F×F for a single convolution layer, and growing as layers are stacked.

6
Q

Why is zero‑padding used?

A

To preserve spatial dimensions and let border pixels contribute equally during convolution.

7
Q

In CNN terminology, what is a channel?

A

The depth dimension of the data (e.g., RGB image has 3 channels); each kernel spans all input channels.

8
Q

What is pooling in CNNs and why is it used?

A

A down‑sampling operation (usually max pooling) that summarizes local regions, providing translation invariance and reducing spatial size.

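A small NumPy sketch of 2×2 max pooling with stride 2, on made-up values:

    import numpy as np

    x = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 1],
                  [0, 1, 5, 2],
                  [2, 0, 3, 4]], dtype=float)

    # 2x2 max pooling, stride 2: keep the strongest activation per region.
    pooled = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            pooled[i, j] = x[2*i:2*i+2, 2*j:2*j+2].max()
    print(pooled)  # [[4. 2.]
                   #  [2. 5.]]
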
9
Q

What is a ‘block’ in practical CNN architectures?

A

A repeated pattern of layers, typically Conv → ReLU (×N) followed by Pooling, stacked to build deep networks.

10
Q

List the main steps of a forward pass through one convolution layer.

A

1) Select local patch, 2) Multiply element‑wise with kernel and sum, 3) Add bias, 4) Slide by stride and repeat for all positions.

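A naive NumPy sketch following these four steps for a single kernel; the stride and bias values are arbitrary:

    import numpy as np

    def conv2d_single(x, kernel, bias=0.0, stride=1):
        """Forward pass of one kernel over a 2-D single-channel input."""
        F = kernel.shape[0]
        H_out = (x.shape[0] - F) // stride + 1
        W_out = (x.shape[1] - F) // stride + 1
        out = np.zeros((H_out, W_out))
        for i in range(H_out):
            for j in range(W_out):
                # 1) select patch, 2) multiply element-wise and sum, 3) add bias
                patch = x[i*stride:i*stride+F, j*stride:j*stride+F]
                out[i, j] = np.sum(patch * kernel) + bias
        return out  # 4) the loops slide the kernel over all positions

    x = np.arange(16, dtype=float).reshape(4, 4)
    print(conv2d_single(x, np.ones((3, 3)), bias=1.0))  # [[46. 55.] [82. 91.]]
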
11
Q

Give the formula for calculating output height of a convolution layer.

A

H_out = ⌊(H_in + 2P − F) / S⌋ + 1 where F is kernel size, S is stride, P is padding.

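A quick helper to sanity-check the formula; the 224/7/2/3 configuration below is just an example:

    def conv_out(H_in, F, S=1, P=0):
        return (H_in + 2 * P - F) // S + 1

    # 224-pixel input, 7x7 kernel, stride 2, padding 3:
    print(conv_out(224, F=7, S=2, P=3))  # (224 + 6 - 7)//2 + 1 = 112
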
12
Q

What is parameter sharing and why does it matter?

A

All positions use the same kernel weights, greatly reducing the number of parameters compared to fully‑connected layers.

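A rough count illustrating the saving, for a hypothetical 32×32 RGB input and 64 output maps:

    # Convolution: 64 shared 3x3x3 kernels plus biases.
    H, W, C_in, C_out, F = 32, 32, 3, 64, 3
    conv_params = C_out * (F * F * C_in + 1)      # 1,792
    # Fully-connected layer with the same input and output sizes (biases ignored).
    fc_params = (H * W * C_in) * (H * W * C_out)  # 201,326,592
    print(conv_params, fc_params)
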
13
Q

Explain sparse connectivity in CNNs.

A

Each output activation depends only on a small local patch of the input, not the entire input, leading to computational efficiency.

14
Q

Why are CNNs more efficient than equivalent fully‑connected layers?

A

Because of parameter sharing and sparse connectivity, they require far fewer weights and computations.

15
Q

What benefit does max pooling provide?

A

It keeps the strongest activation in a region, providing robustness to small translations and slight distortions.

16
Q

Define translation equivariance in CNNs.

A

Shifting the input causes an equal shift in the output feature map (Conv ∘ Shift = Shift ∘ Conv).
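
A numeric spot-check of Conv ∘ Shift = Shift ∘ Conv in 1-D with NumPy (the zero margins keep np.roll's wrap-around harmless):

    import numpy as np

    k = np.array([1.0, -1.0, 2.0])
    x = np.array([0, 0, 1, 3, 2, 0, 0, 0], dtype=float)

    a = np.roll(np.convolve(x, k, mode='same'), 1)  # shift after convolving
    b = np.convolve(np.roll(x, 1), k, mode='same')  # convolve after shifting
    print(np.allclose(a, b))  # True: the feature map shifts with the input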

17
Q

Define translation invariance and how CNNs achieve it.

A

The output does not change when the input is shifted; pooling and deeper layers drive CNNs toward invariance.

18
Q

What is the typical high‑level architecture of a CNN?

A

Input → [Conv → ReLU]×N → Pool → … → Flatten/Global Pool → Dense → Softmax.
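
A PyTorch sketch of this pattern; the layer widths and the 32×32 RGB, 10-class setup are arbitrary choices, not a prescribed architecture:

    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),   # [Conv -> ReLU]
        nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),  # ... x N (N = 2 here)
        nn.MaxPool2d(2),                                         # 32x32 -> 16x16
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                                         # 16x16 -> 8x8
        nn.Flatten(),                                            # 64 * 8 * 8 = 4096 features
        nn.Linear(64 * 8 * 8, 10),                               # Dense -> class scores
    )
    # Softmax is usually folded into the loss (e.g. nn.CrossEntropyLoss).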

19
Q

How does the number of filters affect the output?

A

Each filter produces one feature map, so the number of filters equals the number of output channels.

20
Q

Why not always use large kernels?

A

They require far more parameters and computation; stacking small kernels covers the same receptive field with fewer weights and extra non-linearity in between.
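
A quick count comparing one 5×5 layer with two stacked 3×3 layers (same 5×5 receptive field; C = 64 is a hypothetical channel count, biases ignored):

    C = 64
    params_5x5 = 5 * 5 * C * C            # 102,400 weights
    params_two_3x3 = 2 * (3 * 3 * C * C)  # 73,728 weights, plus an extra ReLU in between
    print(params_5x5, params_two_3x3)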

21
Q

What is stride 2 useful for?

A

It downsamples feature maps during convolution, often replacing separate pooling layers.

22
Q

What role do ReLU activations play in CNNs?

A

They introduce non‑linearity, allowing the network to learn complex representations and mitigating vanishing gradients.

23
Q

How are multiple input channels handled in convolution?

A

Each kernel contains a weight slice for every input channel; each slice multiplies its channel’s patch, and the per-channel results are summed (plus a bias) to produce each output activation.
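
A NumPy sketch of one output activation for a 3-channel input (channels-first layout, random values for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    patch = rng.standard_normal((3, 3, 3))   # (C_in, F, F) input patch
    kernel = rng.standard_normal((3, 3, 3))  # one kernel: a weight slice per channel
    bias = 0.1

    # Multiply each slice with its channel's patch, then sum over channels.
    activation = sum(np.sum(patch[c] * kernel[c]) for c in range(3)) + bias
    # Equivalent one-liner: np.sum(patch * kernel) + bias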

24
Q

How does kernel size affect receptive field and spatial resolution?

A

Larger kernels increase receptive field but also parameters; smaller kernels preserve more detailed spatial information.

25
Q

Differentiate feature map depth and spatial size.

A

Depth is the number of channels (feature maps); spatial size refers to height and width dimensions.

26
Q

Why does pooling reduce spatial precision?

A

It aggregates information over regions, discarding exact spatial details to gain invariance.

27
Q

Compare CNNs with fully‑connected layers in terms of translation properties.

A

CNNs are translation equivariant due to convolution, while fully‑connected layers are not; they treat every input position separately.

28
Q

What effect do stride and padding have on output size?

A

Increasing stride reduces output size; padding can counteract size reduction or preserve dimensions ("same" convolution).

29
Q

What is the main advantage of sparse connectivity for generalization?

A

By focusing on local patterns, the network learns features that generalize across spatial locations, reducing overfitting.

30
Q

Explain how pooling contributes to translation invariance.

A

Pooling outputs the same summary statistic for small input shifts, making the network’s later decisions insensitive to exact positions.
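
A tiny 1-D illustration: as long as the strongest value stays inside the pooling window, a small shift leaves the pooled summary unchanged:

    import numpy as np

    x = np.array([0.0, 3.0, 1.0, 0.0])
    x_shifted = np.array([0.0, 0.0, 3.0, 1.0])  # input shifted right by one

    # Max pooling over the window yields the same summary for both.
    print(x.max(), x_shifted.max())  # 3.0 3.0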