Lecture 7 - CNNs Flashcards
What is image classification in machine learning?
- Image classification involves analyzing an image (represented as pixel data with color values) to assign it a label.
- Example: determining whether a picture contains a cat or not.
What is the typical data structure for an image in image classification?
- Each image is represented as a 3D array
- dimensions height × width × number of color channels
- e.g., a 64×64 image with 3 color channels gives 64×64×3 = 12,288 values per image
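As a quick sketch (using NumPy; not part of the lecture), the 3D array for such an image:

```python
import numpy as np

# A 64x64 RGB image as a 3D array: height x width x color channels.
image = np.zeros((64, 64, 3), dtype=np.uint8)

print(image.shape)  # (64, 64, 3)
print(image.size)   # 12288 input values per image
```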
What is object detection, and how does it differ from image classification?
- Object detection involves identifying objects within an image, determining how many objects are present, and specifying their locations.
- Image classification, in contrast, only assigns a label to the whole image; it does not count or locate objects.
What is neural style transfer?
Neural style transfer combines a source image with a style from another image, transferring the style onto the source image while preserving its content.
How do larger images affect normal neural network design?
Larger images have more datapoints, leading to a greater number of weights in the network. This increases computational complexity and memory requirements.
Why are regular neural networks not ideal for image classification tasks with large images?
- Regular neural networks require a very high number of weights for large images, making them computationally infeasible.
- Specialized architectures, like convolutional neural networks (CNNs), are used to manage this complexity.
What is edge detection in image processing?
- Edge detection identifies areas in an image where the intensity (brightness) changes sharply.
- These areas often represent object boundaries or transitions from one color to another.
How does a filter (kernel) work in edge detection?
- A filter scans across the image by sliding over the input grid and performs a convolution operation to compute an output.
- This operation detects changes in intensity, indicating edges.
If you apply a 3×3 filter on a 6×6 image, what will be the size of the output?
- The output will be a 4×4 matrix
- an f×f filter reduces each output dimension by f − 1 (here: 6 − 3 + 1 = 4).
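A quick check of the size rule (illustrative only):

```python
n, f = 6, 3        # input size, filter size
out = n - f + 1    # each dimension shrinks by f - 1
print(out)  # 4 -> a 4x4 output
```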
Does the orientation of the input image affect the result of edge detection?
No. Flipping the input image does not prevent edge detection: the filter still responds wherever intensity changes sharply, although the sign of the output values flips (light-to-dark edges become dark-to-light).
What is the mathematical operation performed during convolution?
The filter and a corresponding section of the input image are multiplied element-wise, and the resulting values are summed to produce one output value.
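A minimal NumPy sketch of this operation (note: deep-learning "convolution" is technically cross-correlation, since the filter is not flipped):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; at each position, multiply
    element-wise and sum to get one output value ('valid' convolution)."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 input
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter
result = conv2d_valid(image, kernel)
print(result.shape)  # (4, 4)
```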
What is the primary purpose of using convolution in edge detection?
Convolution helps extract meaningful patterns, such as edges, from the input image, facilitating feature extraction in downstream computer vision tasks.
What does a vertical filter detect in an image?
- A vertical filter detects vertical edges by emphasizing intensity differences between the left and right sides of the filter.
- If one side is much brighter, an edge is detected.
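A small illustration (NumPy, toy values of my own choosing): a bright-left / dark-right image convolved with the classic vertical-edge filter responds only at the boundary:

```python
import numpy as np

# Toy image: bright left half, dark right half -> one vertical edge.
image = np.zeros((6, 6))
image[:, :3] = 10.0

# Classic vertical-edge filter: left column minus right column.
vertical_filter = np.array([[1.0, 0.0, -1.0],
                            [1.0, 0.0, -1.0],
                            [1.0, 0.0, -1.0]])

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * vertical_filter)

# Flat regions give 0; only positions straddling the edge respond.
print(out[0])  # [ 0. 30. 30.  0.]
```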
How does a horizontal filter detect edges?
- A horizontal filter detects horizontal edges by comparing brightness between the top and bottom parts of the filter.
- It is the transposed version of a vertical filter.
What is a Sobel filter, and why is it useful?
- A Sobel filter is an edge-detection filter that gives more weight to the central row (or column) of pixels in the region being analyzed.
- It works well for detecting faint edges.
What is the purpose of a Scharr filter?
A Scharr filter is a fine-tuned version of the Sobel filter that detects edges with even greater precision.
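For reference, the standard vertical-edge versions of these two kernels, written with the same 1/0/−1 sign convention as the vertical filter above:

```python
import numpy as np

# Sobel: centre row weighted 2x relative to the outer rows.
sobel_v = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])

# Scharr: a fine-tuned 3/10/3 weighting of the same pattern.
scharr_v = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]])

print(sobel_v.sum(), scharr_v.sum())  # 0 0 -> no response on flat regions
```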
What are the two main advantages of using convolution in image processing?
- Parameter/weight sharing: the same small set of filter weights is reused at every position in the image, so the number of parameters depends on the filter size, not the image size.
- Local information: Convolution captures local patterns by taking into account the spatial relationship of neighboring pixels.
Why is the filter size considered a hyperparameter in convolutional neural networks?
- The filter size determines the receptive field and affects the output size.
- It is typically chosen as an odd number (e.g., 3×3 or 5×5) to ensure proper centering.
What is the purpose of padding in convolutional neural networks?
Padding prevents the output from shrinking after each convolution, enabling the network to go deeper while preserving the original image size.
How does padding help in edge detection tasks?
Padding ensures that pixels on the borders of the image are used as frequently as those in the center, allowing the network to accurately detect information near the edges.
How is padding typically applied to an image?
Padding adds extra rows and columns (usually filled with zeros) around the original image to maintain the desired output size.
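Zero-padding in NumPy (a sketch with p = 1):

```python
import numpy as np

image = np.ones((6, 6))
# Add one row/column of zeros on every side: 6x6 -> 8x8.
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)

print(padded.shape)  # (8, 8)
print(padded[0, 0])  # 0.0 -> the border is zeros
print(padded[1, 1])  # 1.0 -> the original image is intact inside
```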
What is the difference in output size between convolution with and without padding?
- Without padding: (n − f + 1) × (n − f + 1)
- With padding p: (n + 2p − f + 1) × (n + 2p − f + 1)
- (n = input size, f = filter size, p = padding amount); with p chosen appropriately, the output stays the same size as the input.
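Both formulas as a small helper (illustrative):

```python
def output_size(n, f, p=0):
    """Output side length: n + 2p - f + 1 (p = 0 gives the no-padding case)."""
    return n + 2 * p - f + 1

print(output_size(6, 3))     # 4 -> shrinks without padding
print(output_size(6, 3, 1))  # 6 -> p = 1 preserves the input size
```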
What are the two main benefits of using padding in CNNs?
- It allows the filter to operate at the edges, ensuring that all pixels are considered equally.
- It maintains the image size after convolution, making it easier to design deeper networks.
How is the required padding size determined for "same" convolution?
- p = (f − 1)/2
- f = filter size
- this gives an n×n output (the same size as the input)
- This formula works best when the filter size is an odd number, ensuring an integer padding value.
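The formula as code (a sketch; the assertion enforces the odd-filter condition from the card above):

```python
def same_padding(f):
    """p = (f - 1) / 2: the padding that keeps output size equal to input size."""
    assert f % 2 == 1, "an odd filter size guarantees an integer p"
    return (f - 1) // 2

print(same_padding(3))  # 1
print(same_padding(5))  # 2
```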