Midterm Flashcards

(109 cards)

1
Q

Define image processing

A

Image Processing is manipulating an image to improve its quality, extract information, or enable further analysis

2
Q

Define feature

A

A distinctive attribute or description used to label or differentiate objects in images

3
Q

Feature extraction involves two things. What are they?

A

Detection (finding features) and Description (quantifying features)

4
Q

What are invariant and covariant features?

A

Invariant features: Values remain unchanged under specific transformations (e.g., rotation, scaling)

Covariant features: Values change predictably under transformations (e.g., scaling affects area proportionally)

5
Q

What are local and global features?

A

Local features: Apply to individual image regions (e.g., corners, edges)

Global features: Describe entire images (e.g., colour histogram)

6
Q

The purpose of preprocessing techniques is to…

A

Prepare images for further analysis by reducing noise, enhancing features, and normalizing data

7
Q

Define boundary analysis

A

An analysis of the edges or outlines of objects to aid in object shape identification

8
Q

Define region analysis

A

An analysis of the areas or segments within an image to support texture and pattern recognition

9
Q

What is boundary following/tracing

A

A technique to identify the boundary of an object in a binary image

10
Q

What are the requirements for boundary following/tracing?

A
  • Must be a binary image
  • Image padded with a border of 0’s
  • Single connected region
11
Q

What are chain codes?

A

Chain codes represent the boundary of an object as a sequence of connected line segments. These segments are described using directional numbers based on connectivity

12
Q

What are the different connectivity types?

A

4-Connectivity: Segments connect pixels in horizontal and vertical directions

8-Connectivity: Segments connect pixels in horizontal, vertical, and diagonal directions (finer boundary representation than 4-C)

13
Q

What are the two types of chain codes?

A

Freeman chain codes and slope chain codes

14
Q

Define freeman chain codes

A

A boundary chain code that assigns a directional number (e.g., 0 for right, 1 for top-right, etc.) to each segment between consecutive boundary pixels (e.g., 0766666453321212)
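
A minimal sketch (not from the course notes) of how a Freeman chain code could be computed from an ordered list of boundary pixels; the direction numbering and coordinate convention below are assumptions for illustration.

    # Minimal sketch: 8-connectivity Freeman chain code from an ordered boundary.
    # Assumed direction numbering: 0=right, 1=up-right, 2=up, ..., 7=down-right,
    # with x growing to the right and y growing upward (illustrative convention).
    DIRECTIONS = {
        (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
    }

    def freeman_chain_code(boundary):
        """boundary: ordered list of (x, y) pixels tracing the object outline."""
        code = []
        # Pair each boundary pixel with the next one (wrapping around at the end).
        for (x0, y0), (x1, y1) in zip(boundary, boundary[1:] + boundary[:1]):
            code.append(DIRECTIONS[(x1 - x0, y1 - y0)])
        return code

    # Example: a tiny square traced counter-clockwise.
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(freeman_chain_code(square))   # [0, 2, 4, 6]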

15
Q

What is a strategy that could reduce the length of a boundary chain?

A

Resample fine-grained grid to a coarser grid spacing. This also helps with reducing sensitivity to noise or segmentation errors

16
Q

What are some normalization techniques for chain codes?

A

Rotation normalization and starting point normalization

17
Q

What is rotation normalization

A

A normalization technique for chain codes that uses the difference between consecutive directions (the first difference) instead of the absolute directions, making the code independent of rotation

18
Q

What is starting point normalization

A

A normalization technique for chain codes that treats the chain code as circular and shifts it to start with the smallest sequence
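
A minimal sketch (illustrative only) of both normalizations applied to an 8-direction Freeman code: the first difference for rotation normalization, and a circular shift to the smallest sequence for starting-point normalization.

    # Minimal sketch: normalizing an 8-direction Freeman chain code.
    def first_difference(code):
        """Rotation normalization: differences between consecutive directions (mod 8)."""
        n = len(code)
        return [(code[(i + 1) % n] - code[i]) % 8 for i in range(n)]

    def normalize_start(code):
        """Starting-point normalization: treat the code as circular and pick the
        rotation that forms the smallest sequence."""
        rotations = [code[i:] + code[:i] for i in range(len(code))]
        return min(rotations)

    code = [0, 7, 6, 6, 6, 6, 6, 4, 5, 3, 3, 2, 1, 2, 1, 2]   # e.g. 0766666453321212
    print(normalize_start(first_difference(code)))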

19
Q

Define slope chain codes (SCCs)

A

A chain code for boundary analysis that uses slope changes between contiguous line segments to represent a boundary

20
Q

How do you normalize a slope chain code?

A

Positive and zero slope changes are normalized to [0, 1), negative slope changes are normalized to (-1, 0)

21
Q

What are the advantages of SCCs over Freeman codes?

A
  • Provide finer granularity by utilizing a continuous slope range (-1, 1)
  • Better representation under rotation
  • Simpler process as SCCs do not require defining a grid
22
Q

Define boundary approximation using minimum-perimeter polygons (MPP)

A

Boundary approximation using polygons that minimize the total perimeter while maintaining the shape’s integrity; it provides a compact/simplified representation of object boundaries

23
Q

What are the advantages of boundary approximation using MPP?

A
  • Reduces computational complexity
  • Simplifies boundary representation for storage and analysis
  • Useful in applications like shape matching and object recognition
24
Q

Define scale-invariant feature transform (SIFT)

A

SIFT extracts features that are invariant to scale, rotation, and certain changes in illumination

25
SIFT is designed to detect and describe _______ features in images
Local
26
SIFT features are ___________
Invariant
27
Describe the first step of the SIFT algorithm: Scale Space Pyramid Construction
The scale-space pyramid construction step represents the image at multiple scales to detect features across varying object sizes
28
How would you construct a Scale Space Pyramid
Repeatedly blur (with a Gaussian filter) and downsample the image
29
Each group of blurred images in a scale space pyramid is called ________
An octave
30
Describe the second step of the SIFT algorithm: Obtain Initial Keypoints
Compute the difference of Gaussians (DoG) and find local extrema
31
How would you find the local extrema when obtaining initial keypoints in a SIFT algorithm
Compare each pixel's intensity value in the 2D DoG image to the intensity values of its 8 neighbours. The pixel is marked as an extremum if its value is greater/smaller than all its neighbours
32
Describe the third step of the SIFT algorithm: Improve Keypoint Localization Accuracy
SIFT uses mathematical interpolation (Linear + Quadratic Terms of Taylor Series Expansion) to help locate the true extremum position
33
What are the six key steps in the SIFT algorithm?
1. Construct a scale-space pyramid
2. Obtain initial keypoints
3. Improve keypoint localization accuracy
4. Delete unsuitable keypoints
5. Compute keypoint orientations
6. Compute keypoint descriptors
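A minimal usage sketch of the full pipeline with OpenCV, assuming opencv-python ≥ 4.4 where SIFT is exposed as cv2.SIFT_create; the input file name is hypothetical.

    import cv2

    img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
    sift = cv2.SIFT_create()

    # detectAndCompute runs the whole pipeline: scale-space extrema detection,
    # keypoint refinement and filtering, orientation assignment, and 128-D descriptors
    keypoints, descriptors = sift.detectAndCompute(img, None)

    print(len(keypoints), "keypoints")
    print(descriptors.shape)   # (number of keypoints, 128)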
34
How do unstable keypoints occur in a SIFT algorithm and why delete them?
Unstable keypoints can occur due to:
- Low Contrast + Noise: Keypoints with insignificant intensity changes are sensitive to noise
- Edge Responses: Keypoints along edges are not well-localized and are less robust
Removing these keypoints ensures that SIFT retains only distinctive and stable features
35
What is a keypoint descriptor?
A "unique fingerprint" for each keypoint, used to match features across images, even under changes in scale/rotation/illumination
36
How do you compute keypoint descriptors?
1. Select neighbourhood
2. Divide into subregions
3. Compute gradients
4. Create histograms
5. Combine histograms
6. Normalize the descriptor
37
What is a prototype?
Predefined patterns or templates representing specific classes, often stored in raw or processed forms for comparison
38
What is prototype matching?
Comparing unknown patterns to stored prototypes to determine the class; the similarity between the unknown and known data determines the classification
39
What are some methods for prototype-based matching?
Minimum Distance Classifier and Template Matching
40
Define Minimum Distance Classifier
Compares the unknown pattern to the mean of each class and assigns it to the class with the smallest distance
41
Define template matching
Uses correlation to find the best match between an unknown pattern and stored templates
42
What are the steps for minimum distance classification?
1. Mean calculation: Compute mean vector for each class using training data
2. Distance measurement: Measure distance between unknown pattern and each class mean
3. Class assignment: Assign unknown pattern to class with smallest distance
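A minimal NumPy sketch of these three steps on toy 2-D feature vectors (all data and class names below are made up for illustration).

    import numpy as np

    # Training feature vectors grouped by class (toy 2-D features).
    train = {
        "class_A": np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9]]),
        "class_B": np.array([[4.0, 4.2], [3.8, 4.1], [4.2, 3.9]]),
    }

    # 1. Mean calculation: one mean vector per class.
    means = {label: pts.mean(axis=0) for label, pts in train.items()}

    def classify(x):
        # 2. Distance measurement: Euclidean distance to each class mean.
        distances = {label: np.linalg.norm(x - m) for label, m in means.items()}
        # 3. Class assignment: pick the class with the smallest distance.
        return min(distances, key=distances.get)

    print(classify(np.array([1.0, 1.0])))   # class_A
    print(classify(np.array([4.1, 4.0])))   # class_B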
43
What are the steps to template matching?
1. Start with a template
2. Slide template across bigger image
3. Compare at each position
4. Find best match
44
What is a similarity score?
It is used in prototype matching and it determines how close a region of an image is to a predefined prototype
45
How is similarity score calculated?
It is calculated using the correlation coefficient, which works as follows:
1. Pixel-by-pixel comparison of the image region to the template
2. Normalization (makes brightnesses between the template and the image comparable)
3. Output score between -1 and 1 (1: perfect match, 0: no match, -1: perfect inverse match)
46
What is the limitation of the basic correlation formula? Is there a way to address this limitation?
Sensitive to intensity changes (i.e., if the image becomes brighter or darker, the correlation score will be affected). To address the limitation, use a normalized correlation formula (normalizes the correlation result to account for intensity variations in the template or the image)
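A minimal sketch of normalized-correlation template matching using OpenCV's matchTemplate with the TM_CCOEFF_NORMED score; the file names are hypothetical.

    import cv2

    image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)       # hypothetical files
    template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

    # Slide the template over the image; each position gets a score in [-1, 1].
    scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)

    # Best match = position with the highest normalized correlation.
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(scores)
    print("best score:", max_val, "at", max_loc)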
47
How does SIFT matching work?
Matching involves comparing SIFT descriptors from a known image (prototype) with descriptors from an unknown image
48
SIFT descriptors are high-dimensional vectors, which means matching directly can be computationally expensive. What strategies can be implemented to improve performance?
Best-Bin-First Search: Quickly identifies potential matches by approximating the nearest neighbours using limited computations
Clusters of Matches: To improve reliability, clusters of potential matches are identified using the generalized Hough transform, which groups matches that align well geometrically
49
What are the steps for SIFT feature matching?
1. Keypoint detection: Identify distinctive points in both images
2. Descriptor generation: Compute a 128-dimensional vector for each keypoint
3. Feature matching: Compare descriptors from both images and find the best match for each keypoint
4. Filter matches: Use techniques like Lowe’s Ratio Test and Clustering to improve accuracy
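A minimal sketch of steps 1-4 with OpenCV (SIFT detection, brute-force k-nearest-neighbour matching, and Lowe's ratio test); image file names and the 0.75 ratio threshold are illustrative assumptions.

    import cv2

    img1 = cv2.imread("prototype.png", cv2.IMREAD_GRAYSCALE)   # hypothetical files
    img2 = cv2.imread("unknown.png", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # For each descriptor in img1, find its 2 nearest neighbours in img2.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)

    # Lowe's ratio test: keep a match only if it is clearly better than the
    # second-best candidate.
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    print(len(good), "good matches")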
50
Describe Best-Bin-First Search (BBF Search)
Since comparing all features (brute force) is too slow, BBF Search focuses on the most likely matches first. This is done by:
1. Organizing descriptors into bins (data structures like KD-trees)
2. Searching in the best bin (most promising candidates)
3. Stopping early if a good match is found
A good analogy is searching for a book by starting in the correct section instead of scanning the entire library
51
Describe Clusters of Matches (Generalized Hough Transform)
Since individual matches can be noisy or incorrect, the Generalized Hough Transform identifies clusters of consistent matches. This is done by:
1. Grouping matches that agree on a geometric transformation (e.g., scaling, rotation)
2. Discarding outliers that don’t align with the cluster
A good analogy is solving a jigsaw puzzle by fitting groups of pieces together
52
What is a Neural Network (NN)?
A Neural Network (NN) is a computational system inspired by the human brain, designed to recognize patterns and solve problems
53
What is the basic structure of a neural network?
NN is composed of interconnected units called neurons organized in layers. Key components include an input layer, hidden layers, and an output layer
54
What is the difference between a biological and artificial neuron?
Biological Neurons:
- Process and transmit information in the brain
- Receive signals, integrate inputs, and send outputs
Artificial Neurons:
- Perform mathematical operations
- Use activation functions to decide outputs
55
What is the structure of an artificial neuron?
Inputs: Data features or signals
Weights: Influence the strength of each input
Bias: Adds flexibility to the decision boundary
Activation Function: Determines whether a neuron should "fire" (output)
Output: Result of processing inputs
Formula: Output = Activation(Input * Weight + Bias)
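A minimal sketch of a single artificial neuron following the formula above, with made-up input, weight, and bias values and a sigmoid activation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    inputs = np.array([0.5, 0.2, 0.1])     # data features
    weights = np.array([0.4, 0.7, -0.2])   # strength of each input
    bias = 0.1                             # shifts the decision boundary

    # Output = Activation(sum(inputs * weights) + bias)
    output = sigmoid(np.dot(inputs, weights) + bias)
    print(output)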
56
Describe weights in neural networks
Weights determine the importance of each input feature to the neuron’s output (larger = stronger influence). Weights are adjusted during training to minimize loss. Higher weights amplify corresponding inputs; lower weights diminish them. Fine-tuning weights enables the network to adapt to patterns in the data
57
Describe bias in neural networks
Bias is a trainable parameter that allows the model to shift the activation function. Bias enables the neuron to make decisions independent of weighted inputs (helps network fit data more flexibly)
58
Describe activation functions in neural networks
Activation functions introduce non-linearity to the network. They decide whether or not to 'fire' the neuron's output
59
What are the most commonly used activation functions?
Sigmoid: Smooth gradient, used for binary classification
ReLU: Efficient and widely used for hidden layers
Tanh: Zero-centred, scales outputs between -1 and 1
Softmax: Converts outputs to probabilities
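Minimal NumPy sketches of these four activation functions (illustrative definitions, not a library implementation).

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        return np.maximum(0.0, z)

    def tanh(z):
        return np.tanh(z)

    def softmax(z):
        e = np.exp(z - np.max(z))   # subtract max for numerical stability
        return e / e.sum()

    z = np.array([-1.0, 0.0, 2.0])
    print(sigmoid(z), relu(z), tanh(z), softmax(z), sep="\n")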
60
What is a Multi-Layer Perceptron (MLP)
A Multi-Layer Perceptron is a class of feed-forward neural networks consisting of multiple layers of neurons. MLPs can learn complex patterns by stacking layers. The architecture is structured in the following manner:
Input Layer: Receives the input features
Hidden Layers: Perform feature extraction through non-linear transformations
Output Layer: Provides predictions
61
What is the forward propagation process?
1. Input features are passed through the network
2. Each layer applies weights, biases, and activation functions
3. Outputs are propagated to the next layer until the final output is produced
62
What is the difference between an objective function and a loss function?
Loss Function: Measures the error for a single data point or batch of data
Objective Function: The function to be minimized (or maximized) during training (often represents the aggregate loss over the entire dataset)
63
What is backpropagation?
- The process of computing the gradients of the loss function with respect to the network's weights and biases (via the chain rule)
- Optimization algorithms then use these gradients to adjust the weights and biases to minimize the loss
64
What is a gradient?
A gradient is a vector representing the direction and rate of a function's steepest increase (or decrease). In neural networks, it typically refers to the partial derivatives of the loss function with respect to the model's parameters (weights and biases). Think of it as a 'guide' or a 'pointer', a gradient just points to the best way to get to where you want to go (reduces errors in a neural network)
65
What are Convolutional Neural Networks (CNNs)?
Specialized neural networks primarily used for image recognition and computer vision tasks. CNNs achieve state-of-the-art performance in many tasks (e.g., image classification, object detection)
66
What makes CNNs stand out from traditional machine learning?
Traditional machine learning methods require manual feature extraction, whereas CNNs learn hierarchical feature representations directly from raw data (e.g., images). CNNs also use far fewer parameters than fully connected networks (MLPs) by exploiting local connectivity and parameter sharing
67
What is the architecture of a CNN?
1. Convolution Layer 2. Pooling Layer 3. Fully Connected Layer (FC) 4. Activation Functions
68
What is the convolution layer in a CNN?
The convolution layer performs filtering by sliding filters (kernels) over the input. It learns filters that activate when they see specific features
69
What is the pooling layer in a CNN?
The pooling layer reduces spatial dimensions (e.g., max pooling). Helps reduce computation and control overfitting.
70
What is the fully connected layer (FC) in a CNN?
The fully connected layer (FC) is the final layer for classification or regression.
71
Describe what a filter (kernel), stride, and padding is in a convolution operation
Filter (kernel): A small matrix applied over the input (e.g., 3×3 or 5×5). Stride: The step size with which the filter moves across the input. Padding: Zero-padding preserves spatial dimensions.
72
What is the formula for determining the output size (OS) in a convolution neural network?
OS = (W - K + 2P)/S + 1
W: Input dimension
K: Kernel size
P: Padding
S: Stride
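A small worked example of the formula (numbers chosen for illustration).

    # Output size of a convolution layer: OS = (W - K + 2P)/S + 1
    def conv_output_size(W, K, P, S):
        return (W - K + 2 * P) // S + 1

    # 32x32 input, 5x5 kernel, padding 2, stride 1 -> spatial size preserved
    print(conv_output_size(32, 5, 2, 1))    # 32
    # 224x224 input, 7x7 kernel, padding 3, stride 2 -> halved
    print(conv_output_size(224, 7, 3, 2))   # 112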
73
The main building block of a CNN is the __________ layer
Convolutional
74
Explain the process that occurs during a convolution operation
1. Filter Sliding: The kernel moves across the input with a given stride until it has covered the full width, then moves down one row and starts at the left again; this repeats until the entire image is traversed
2. Element-wise Multiplication & Summation: At each position, the overlapping input patch is multiplied element-wise by the filter and the results are summed
3. Feature Map: The sum is stored in the feature map at the corresponding location
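A minimal NumPy sketch of this slide/multiply/sum process for a single-channel image and one filter (no padding; the image and filter values are illustrative).

    import numpy as np

    def convolve2d(image, kernel, stride=1):
        H, W = image.shape
        k = kernel.shape[0]
        out_h = (H - k) // stride + 1
        out_w = (W - k) // stride + 1
        feature_map = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                # Overlapping input patch at this filter position
                patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
                # Element-wise multiplication and summation
                feature_map[i, j] = np.sum(patch * kernel)
        return feature_map

    image = np.arange(25, dtype=float).reshape(5, 5)
    edge_filter = np.array([[-1., 0., 1.],
                            [-1., 0., 1.],
                            [-1., 0., 1.]])   # simple vertical-edge detector
    print(convolve2d(image, edge_filter))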
75
Describe the difference between grayscale image convolution and RGB (colour) image convolution
Grayscale Image Convolution:
- A grayscale image has only one channel (intensity values from 0 to 255)
- Image shape is denoted as (H×W×1)
- Convolution filter shape: (f×f×1)
- The convolution operation applies a single 2D filter over the image
- Produces a single feature map as output

RGB (Colour) Image Convolution:
- Each pixel has three separate intensity values (3 channels: Red, Green, Blue)
- Image shape is denoted as (H×W×3)
- Convolution filter shape: (f×f×3) – one filter “slice” per channel, then summed into a single feature map
- Element-wise multiplication is performed independently for each channel, and the results are summed across channels
- Produces a single feature map per filter
76
Why do we need multiple filters in a convolutional layer?
- A single filter captures only one type of feature (e.g., horizontal edges)
- A Convolutional Layer applies multiple filters to extract different features at the same time
- More filters = richer feature representation
Example:
- Filter 1: Detects vertical edges
- Filter 2: Detects horizontal edges
- Filter 3: Detects diagonal lines
77
What is depth in a convolutional layer?
- The number of filters in a convolutional layer determines its depth
- If a layer has 64 filters, it produces 64 feature maps
- The output of a convolutional layer has the shape H×W×D, where D = number of filters (depth)
78
How do CNNs learn filters?
- Filters are not manually set; they are learned during training
- The CNN adjusts filter values using backpropagation
- Each filter activates strongly when it detects a matching pattern
Over multiple layers:
- Early layers: Detect edges & textures
- Middle layers: Detect shapes & parts
- Deeper layers: Detect high-level objects (faces, animals, etc.)
79
Why do we need pooling layers in CNNs?
- Feature maps generated by Convolutional Layers are large
- Pooling reduces spatial size, keeping only the most important information
- Helps prevent overfitting by forcing CNNs to generalize
- Makes CNNs translation invariant (small shifts in the image don't affect detection)
80
What are the different types of pooling?
Max Pooling (Most Common): Takes the maximum value from each sub-region
Average Pooling (Less Common): Takes the average value from each region, retains the overall smoothness of feature maps
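A minimal NumPy sketch of 2×2 max pooling with stride 2 (values are illustrative).

    import numpy as np

    def max_pool(feature_map, size=2, stride=2):
        H, W = feature_map.shape
        out_h = (H - size) // stride + 1
        out_w = (W - size) // stride + 1
        pooled = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
                pooled[i, j] = window.max()   # keep only the strongest activation
        return pooled

    fm = np.array([[1., 3., 2., 1.],
                   [4., 6., 5., 2.],
                   [7., 8., 9., 4.],
                   [3., 2., 1., 0.]])
    print(max_pool(fm))   # [[6. 5.], [8. 9.]]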
81
Describe the difference between batch processing and single image processing?
Instead of processing one image at a time, CNNs process multiple images in parallel (a batch). The batch size is the number of images processed together before updating weights. The characteristics of both options are listed below:
Batch:
- Updates weights after computing the gradient over a batch of images
- More stable gradients, efficient GPU use
- Requires more memory
Single-Image:
- Updates weights after every image
- Faster weight updates
- Unstable training, noisy updates
82
True or False: Batch processing adds another dimension to an image tensor
True, with batch processing, an additional Batch Size (B) dimension is added: (HxWxDxB)
83
What is Batch Normalization (BN)?
Neural networks suffer from internal covariate shift, where layer activations change drastically, slowing training. Batch Normalization (BN) normalizes activations, reducing variance between batches and improving stability.
84
How does Batch Normalization (BN) work?
- Computes the mean and variance for each batch
- Normalizes activations by subtracting the mean and dividing by the standard deviation
- Applies learnable scale and shift parameters to maintain network flexibility
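A minimal NumPy sketch of these steps for one batch of activations; gamma and beta stand in for the learnable scale and shift parameters, and all values are illustrative.

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        mean = x.mean(axis=0)                      # per-feature mean over the batch
        var = x.var(axis=0)                        # per-feature variance over the batch
        x_hat = (x - mean) / np.sqrt(var + eps)    # normalize
        return gamma * x_hat + beta                # learnable scale and shift

    activations = np.array([[1.0, 200.0],
                            [2.0, 220.0],
                            [3.0, 240.0]])         # batch of 3 samples, 2 features
    print(batch_norm(activations, gamma=1.0, beta=0.0))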
85
What are the benefits of Batch Normalization (BN)?
- Faster convergence (reduces training time)
- More stable training (reduces sensitivity to learning rate)
- Reduces dependence on careful weight initialization
- Acts as a mild regularizer (reduces overfitting)
86
What is regularization?
CNNs can overfit, memorizing training data instead of generalizing. Regularization techniques help improve model generalization.
87
What is dropout?
- During training, random neurons are deactivated with probability p
- This forces the network to learn multiple representations, improving generalization
88
How does dropout work?
- In each training step, some neurons are ignored (dropped)
- During testing, all neurons are active, but their activations are scaled by the keep probability (1 - p) to compensate for the neurons dropped during training
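A minimal NumPy sketch of this behaviour, following the convention above where p is the drop probability (purely illustrative).

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout_train(activations, p=0.5):
        mask = rng.random(activations.shape) >= p   # keep each neuron with probability 1 - p
        return activations * mask

    def dropout_test(activations, p=0.5):
        return activations * (1 - p)                # all neurons active, scaled by keep probability

    a = np.array([1.0, 2.0, 3.0, 4.0])
    print(dropout_train(a))   # some entries zeroed out
    print(dropout_test(a))    # all entries kept, scaled down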
89
What are the benefits of dropout?
- Prevents overfitting
- Helps CNNs learn redundant features
- Improves model robustness
90
What are Autoencoders?
- Neural networks designed for unsupervised learning
- Learn compact representations (encodings) of input data
- Used to pre-train deep models when labelled data is scarce
Consists of two main parts:
- Encoder: Compresses input into a lower-dimensional representation
- Decoder: Reconstructs the input from this compressed representation
91
Why use deep autoencoders?
- Reduce dimensionality (feature compression)
- Learn meaningful latent representations of data
- Useful for denoising, anomaly detection, and pretraining deep models
92
What are the use cases for autoencoders?
- Image reconstruction
- Anomaly detection
- Data generation (using variational autoencoders (VAEs))
- Image segmentation (U-Net)
93
What are Variational Autoencoders (VAEs)?
VAEs learn probabilistic latent representations (used, for example, to generate pixel-wise segmentations). The encoder network converts the input into two vectors:
- Mean (μ): Centre of the latent space distribution
- Variance (σ²): Spread of the distribution
Then, instead of sampling directly from μ and σ, we generate z:
z = μ + σ⋅ϵ, where ϵ ∼ N(0, 1)
The decoder then takes the sampled latent vector z and reconstructs the original input
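A minimal NumPy sketch of the sampling step (the reparameterization trick); in a real VAE the mean and (log-)variance come from the encoder network, so the values below are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    mu = np.array([0.5, -1.0])        # mean vector from the encoder (placeholder)
    log_var = np.array([0.1, -0.3])   # encoders often predict log-variance (placeholder)
    sigma = np.exp(0.5 * log_var)     # convert to standard deviation

    eps = rng.standard_normal(mu.shape)   # eps ~ N(0, 1)
    z = mu + sigma * eps                  # sampled latent vector passed to the decoder
    print(z)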
94
Explain how an autoencoder works
- Train an autoencoder to learn unsupervised feature representations
- Use the encoder’s output as input features for a classifier
95
What are Generative Adversarial Networks (GANs)?
Generative Adversarial Networks (GANs) are a type of deep learning model used for generating new data that mimics a given dataset. Consists of two competing neural networks:
- Generator (“Artist”): Creates fake data
- Discriminator (“Critic”): Evaluates if data is real or fake
96
What is deconvolution?
Deconvolution (also called transposed convolution) is used to increase the spatial resolution of feature maps in CNNs. It helps reconstruct finer details lost during convolution. Often used in image segmentation and super-resolution tasks
97
How do deconvolution layers work?
- Works by spreading pixel values over a larger area
- Deconvolution uses a learnable kernel like standard convolution but performs an inverse process
- Unlike upsampling, deconvolution learns weights dynamically
98
What are the components in a transposed convolution layer?
Stride: Spacing between output values (upsampling factor)
Kernel: Similar concept to the convolution kernel, but effectively “spread out”
Padding & Output Shape: Calculations ensure the desired output height/width
Learnable Parameters: Weights are learned just like in forward convolution
99
Why would you use deconvolution in segmentation?
- Segmentation demands pixel-wise classification
- Deep networks (like CNNs) typically reduce resolution to capture context
- Need to “decode” feature maps back to full resolution (transposed convolution)
- To classify each pixel in the original image, we need to restore or approximate its original spatial resolution
100
What is image segmentation?
The process of dividing an image into meaningful regions. Each pixel is assigned a label corresponding to an object/class
101
What are the different types of image segmentation?
Semantic: Labels every pixel with a class
Instance: Identifies and separates individual objects within an image
Panoptic: Combination of semantic + instance segmentation (recognizes both object boundaries and individual instances)
102
What is U-Net?
A widely used model for biomedical image segmentation that helps precisely segment small objects. U-Net concatenates feature maps from the encoder to the decoder, preserving features from the earlier layers (skip connections). It consists of two parts:
- Contracting path (downsampling via convolutional layers)
- Expanding path (upsampling via deconvolution layers)
103
What is transfer learning?
Transfer Learning is a deep learning technique where a pre-trained model is adapted for a new task. Instead of training from scratch, we reuse knowledge from existing models trained on large datasets (e.g., ImageNet). This saves computational resources and improves performance on smaller datasets.
104
How does transfer learning work?
1. Select a Pre-trained Model: Choose a model trained on a large dataset (e.g., VGG16, ResNet, EfficientNet)
2. Feature Extraction or Fine-tuning:
   - Feature Extraction: Freeze convolutional layers and use them to extract useful representations
   - Fine-tuning: Unfreeze some deeper layers and retrain them on the new dataset
3. Train a New Classifier: Replace the final classification layer with a new one tailored to the target task
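A minimal feature-extraction sketch with Keras, assuming TensorFlow/Keras is installed; the class count is a placeholder and no training data is shown.

    import tensorflow as tf

    # 1. Select a pre-trained model (VGG16 trained on ImageNet), without its
    #    original classification head.
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))

    # 2. Feature extraction: freeze the convolutional layers.
    base.trainable = False

    # 3. Train a new classifier head tailored to the target task.
    num_classes = 5   # placeholder for the new task
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()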
105
What are some common image reconstruction techniques?
Denoising: Removes noise while preserving details
Inpainting: Fills in missing parts or damaged regions of an image
Super-Resolution: Enhances low-resolution images to high-resolution
106
What are some deep learning models that can be used for image reconstruction?
- CNNs
- Autoencoders
- GANs
107
What is image augmentation?
It increases dataset diversity by creating artificial/modified copies of existing data. Prevents overfitting and improves model robustness to variations
108
What are some image augmentation techniques?
Geometric Transformations: Rotation, flipping, cropping, scaling
Colour-Based Transformations: Brightness adjustment, contrast enhancement, colour jittering
Noise Addition: Gaussian noise, salt-and-pepper noise
Synthetic Data Generation: GANs and diffusion models for generating new samples
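A minimal NumPy sketch of a few of these augmentations applied to a stand-in image array (all values illustrative).

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)   # stand-in image

    flipped = image[:, ::-1, :]         # horizontal flip (geometric)
    cropped = image[8:56, 8:56, :]      # centre crop (geometric)

    # Brightness adjustment (colour-based)
    brighter = np.clip(image.astype(np.int16) + 40, 0, 255).astype(np.uint8)

    # Gaussian noise addition
    noise = rng.normal(0, 10, size=image.shape)
    noisy = np.clip(image + noise, 0, 255).astype(np.uint8)

    print(flipped.shape, cropped.shape, brighter.shape, noisy.shape)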
109
How do image augmentation and image reconstruction complement each other?
- Augmentation enhances training datasets to improve reconstruction models
- Reconstruction techniques can be used to clean augmented images
- Example: Super-resolution can be combined with augmentation for better data quality
- High-quality data + diverse training = robust models