Chapter 14 Flashcards
(25 cards)
What is the primary motivation for using CNNs instead of dense neural networks for vision tasks?
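Dense layers need a separate weight for every pixel-to-neuron connection, so the parameter count explodes for large images; CNNs use partially connected layers with shared weights, which cuts the parameter count while preserving spatial structure.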
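To make the cost difference concrete, here is a minimal sketch that counts weights and biases for one dense layer and one convolutional layer. The image size, unit count, and filter count are illustrative assumptions, not values from the cards.

```python
def dense_layer_params(height, width, channels, units):
    """Weights plus biases for a dense layer fed a flattened image."""
    inputs = height * width * channels
    return inputs * units + units


def conv_layer_params(kernel_height, kernel_width, in_channels, filters):
    """Weights plus biases for a convolutional layer.

    The count does not depend on the image size, because each filter's
    weights are shared across every position of the input.
    """
    return kernel_height * kernel_width * in_channels * filters + filters


# Hypothetical example: a 100 x 100 grayscale image.
dense_count = dense_layer_params(100, 100, 1, units=1000)
conv_count = conv_layer_params(3, 3, 1, filters=32)
print(dense_count, conv_count)
```

The dense layer needs about ten million parameters for this input, while the convolutional layer needs a few hundred regardless of image size.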
What biological structure inspired CNNs?
The architecture of the brain's visual cortex, as studied by Hubel and Wiesel in experiments on cats and monkeys.
What is a local receptive field in the visual cortex?
A small region of the visual field that a neuron responds to.
What are convolutional layers used for in CNNs?
To detect local patterns in input data, such as edges or textures.
What is the function of pooling layers in CNNs?
To reduce spatial dimensions and promote translation invariance.
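A minimal pure-Python sketch of max pooling, assuming a 2D feature map stored as a list of lists and no padding. The function name and the sample values are illustrative.

```python
def max_pool_2d(feature_map, size=2, stride=2):
    """Max pooling without padding over a 2D list of numbers."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Collect the values under the pooling window and keep the largest.
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[6, 8], [14, 16]]
```

A 2 x 2 window with stride 2 halves the height and the width, so only a quarter of the values survive.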
What is zero-padding and why is it used?
Adding zeros around the border of the input so that the output keeps the same height and width as the input (at stride 1).
What does stride mean in the context of CNNs?
The step size, in pixels, by which the filter shifts between consecutive positions as it slides over the input.
What happens when stride > 1?
The output feature map shrinks, by roughly a factor of the stride along each spatial dimension.
What is a filter (or kernel) in a CNN?
A small matrix of weights used to scan and extract features from the input.
What is a feature map?
The output produced by applying a filter across the input.
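The sketch below slides one filter over a small image to produce a feature map. It assumes no padding, a single channel, and no bias or activation, and it computes a cross-correlation (no kernel flip), which is what deep learning libraries usually call convolution. The image and filter values are made up for illustration.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (both 2D lists) without padding."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Weighted sum of the pixels currently under the filter.
            total = 0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Dark left half, bright right half: one vertical edge in the middle.
image = [[0, 0, 1, 1] for _ in range(4)]
vertical_edge_filter = [[-1, 1], [-1, 1]]
edge_map = conv2d_valid(image, vertical_edge_filter)
print(edge_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is nonzero only in the column where the filter straddles the edge, which is the sense in which a filter "detects" a local pattern.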
Why are multiple filters used in convolutional layers?
To detect different features in the input image.
How is an image represented in TensorFlow for CNN input?
As a 3D tensor of shape [height, width, channels]; a mini-batch of images is a 4D tensor of shape [batch size, height, width, channels].
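To illustrate the layout without depending on TensorFlow, this sketch builds an image as nested Python lists indexed [row][column][channel] and reads its shape back. The helper `nested_shape` and the tiny sizes are illustrative assumptions.

```python
def nested_shape(value):
    """Return the shape of a regular nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)


height, width, channels = 2, 3, 3  # a tiny RGB image
image = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
batch = [image for _ in range(4)]  # stacking images adds a leading dimension

image_shape = nested_shape(image)
batch_shape = nested_shape(batch)
print(image_shape, batch_shape)  # (2, 3, 3) (4, 2, 3, 3)
```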
What are the types of padding in CNNs?
“Same” (with zero-padding) and “Valid” (no padding).
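The effect of padding and stride on the output size can be sketched with a small helper. It follows the usual TensorFlow conventions for "same" and "valid" padding along one spatial dimension; the function name is an assumption made for this example.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "same":
        # Zero-padding is added as needed, so only the stride matters.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the filter must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


print(conv_output_size(28, 3, stride=1, padding="valid"))  # 26
print(conv_output_size(28, 3, stride=1, padding="same"))   # 28
print(conv_output_size(28, 3, stride=2, padding="valid"))  # 13
print(conv_output_size(28, 3, stride=2, padding="same"))   # 14
```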
What is the memory challenge with convolutional layers?
High RAM usage, especially during training, because backpropagation needs every intermediate activation computed in the forward pass to be kept in memory.
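A rough back-of-the-envelope sketch of the activation memory for one convolutional layer, assuming 32-bit floats. The layer dimensions are hypothetical, and the estimate ignores weights, gradients, and framework overhead.

```python
def activation_bytes(height, width, feature_maps, batch_size=1, bytes_per_value=4):
    """Bytes needed to store one layer's output activations."""
    return batch_size * height * width * feature_maps * bytes_per_value


# Hypothetical layer: 200 feature maps of 100 x 150 values each.
one_instance = activation_bytes(100, 150, 200)
batch_of_100 = activation_bytes(100, 150, 200, batch_size=100)
print(one_instance / 1e6, "MB")  # 12.0 MB
print(batch_of_100 / 1e9, "GB")  # 1.2 GB
```

During training these activations must be kept for every layer until the backward pass has used them, so the totals add up across the network.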
What are the benefits of pooling layers?
Reduced computation and memory use, fewer parameters in later layers, and some invariance to small translations.
What is depthwise pooling?
Pooling across feature maps (depth dimension) to summarize feature activations.
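A minimal sketch of depthwise max pooling, assuming activations stored as a nested list indexed [row][column][channel] and a channel count divisible by the pool size. The function name and sample values are illustrative.

```python
def depthwise_max_pool(feature_maps, pool_size):
    """Max over groups of `pool_size` adjacent channels at every pixel."""
    pooled = []
    for row in feature_maps:
        pooled_row = []
        for pixel in row:
            if len(pixel) % pool_size != 0:
                raise ValueError("channel count must be divisible by pool_size")
            pooled_row.append(
                [max(pixel[c:c + pool_size]) for c in range(0, len(pixel), pool_size)]
            )
        pooled.append(pooled_row)
    return pooled


# One row, two pixels, four feature maps per pixel.
activations = [[[1, 5, 2, 4], [0, 3, 7, 6]]]
depth_pooled = depthwise_max_pool(activations, pool_size=2)
print(depth_pooled)  # [[[5, 4], [3, 7]]]
```

The height and width are unchanged; only the number of feature maps shrinks.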
What is a typical CNN architecture structure?
Multiple convolution-ReLU-pooling blocks followed by fully connected layers.
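To see how such a stack reshapes the data, this sketch traces tensor shapes through alternating convolution and pooling layers. It assumes every convolution uses "same" padding with stride 1 and every pooling layer uses a stride equal to its size with no padding; the filter counts and input size are hypothetical.

```python
def trace_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each layer."""
    height, width, channels = input_shape
    shapes = [(height, width, channels)]
    for kind, value in layers:
        if kind == "conv":    # value = number of filters; spatial size is kept
            channels = value
        elif kind == "pool":  # value = pool size; spatial size is divided
            height, width = height // value, width // value
        else:
            raise ValueError("unknown layer kind: " + kind)
        shapes.append((height, width, channels))
    return shapes


layers = [("conv", 64), ("pool", 2), ("conv", 128), ("pool", 2), ("conv", 256), ("pool", 2)]
shapes = trace_shapes((28, 28, 1), layers)
flattened_size = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes[-1], flattened_size)  # (3, 3, 256) 2304
```

The image gets smaller and deeper as it moves up the stack, and the flattened result is what the fully connected layers receive.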
What is the trend of kernel sizes and feature maps in deep CNNs?
The first layer may use a larger kernel (for example 5 × 5 or 7 × 7); higher layers typically use small 3 × 3 kernels, and the number of feature maps grows as the spatial size shrinks.
What is transfer learning in CNNs?
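Reusing the layers of a CNN pretrained on a large dataset for a new task, typically by freezing them and training only new final layers, then optionally fine-tuning some of the reused layers.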
What is ResNet and how does it improve CNN performance?
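A very deep CNN built from residual units, whose skip (shortcut) connections add each unit's input to its output; this keeps signals and gradients flowing and makes very deep networks trainable.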
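The core idea of a skip connection can be sketched in a few lines: the block outputs f(x) + x rather than f(x). This toy version works on plain lists, omits the activation that normally follows the addition, and uses two stand-in functions in place of real convolutional layers; all names are illustrative.

```python
def residual_block(x, layer_fn):
    """Add the block's input to the output of its inner layers."""
    fx = layer_fn(x)
    return [a + b for a, b in zip(fx, x)]


def zero_layers(x):
    """Stand-in for inner layers whose weights are close to zero."""
    return [0.0 for _ in x]


def halving_layers(x):
    """Stand-in for inner layers that have learned some transformation."""
    return [0.5 * v for v in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_layers)
shifted_out = residual_block(x, halving_layers)
print(identity_out)  # [1.0, -2.0, 3.0]
print(shifted_out)   # [1.5, -3.0, 4.5]
```

When the inner layers output zeros, the block simply copies its input, so an untrained residual unit still passes the signal through.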
What is YOLO used for?
Real-time object detection by predicting multiple bounding boxes and class probabilities in one forward pass.
What metric is used for measuring bounding box accuracy?
Intersection over Union (IoU).
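A minimal sketch of IoU for two axis-aligned boxes, assuming each box is given as (x_min, y_min, x_max, y_max). The coordinate convention and function name are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap along each axis, clamped at zero when the boxes do not meet.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0, identical boxes
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0, disjoint boxes
partial = iou((0, 0, 2, 2), (1, 1, 3, 3))  # overlap area 1, union area 7
```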
What is semantic segmentation?
Classifying each pixel in an image to determine the class of the object it belongs to.
What is the difference between object detection and semantic segmentation?
Detection identifies bounding boxes; segmentation assigns a class to each pixel.
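The two output formats can be contrasted with a small sketch: a segmentation result is a per-pixel class mask, while a detection result is a box. Here a box is derived from a mask, assuming integer class IDs and a (row_min, col_min, row_max, col_max) box convention; both are illustrative choices.

```python
def mask_to_box(mask, class_id):
    """Smallest box enclosing every pixel labeled `class_id`, or None."""
    rows = [r for r, row in enumerate(mask) if class_id in row]
    cols = [c for row in mask for c, value in enumerate(row) if value == class_id]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Segmentation output: one class ID per pixel (0 = background, 1 = object).
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
box = mask_to_box(mask, 1)
print(box)  # (1, 1, 2, 2)
```

The box covers four pixels while the mask marks only three, which shows the extra precision that per-pixel labels carry.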