Difficulties in Image Registration and Generative Models Flashcards

(61 cards)

1
Q

What does VAE stand for?

A

Variational Autoencoder

2
Q

What is the difference between self-supervised and unsupervised learning?

A

Self-supervised learning generates supervisory labels from the data itself, while unsupervised learning uses no labels at all and focuses on discovering patterns in the data.

3
Q

What does semantic segmentation focus on?

A

Semantic segmentation assigns a class label to each pixel, including background "stuff" such as sky or road.

4
Q

What does instance segmentation focus on?

A

Instance segmentation detects individual objects and assigns unique identifiers to each object, distinguishing between objects of the same class.

5
Q

Why is panoptic segmentation important?

A

Panoptic segmentation combines semantic and instance segmentation, providing both class labels and object identification for a complete scene understanding.

6
Q

What is the limitation of instance segmentation?

A

Instance segmentation doesn’t label background "stuff" like sky or road; it focuses only on detecting countable objects and distinguishing between objects of the same class.

7
Q

What does bottom-up saliency refer to?

A

Bottom-up saliency is based on low-level features like contrast, colour, and edges that automatically draw attention.

8
Q

What does top-down saliency refer to?

A

Top-down saliency uses high-level, task-specific context or guidance to focus attention based on the task at hand (e.g., detecting pedestrians in self-driving cars).

9
Q

What is subitizing in visual saliency modelling?

A

Subitizing refers to the ability to quickly and accurately judge the number of objects in a small set (usually 1-4 objects) without counting.

10
Q

What is context-aware saliency detection?

A

Context-aware saliency detection incorporates higher-level guidance or task-specific information (like object importance or focus on certain areas) to generate saliency maps.

11
Q

What is domain shift in saliency models for images and videos?

A

Domain shift refers to training a model on one data domain (e.g., images) and adapting it to work on another (e.g., video), which can affect performance due to differences in data characteristics like motion.

12
Q

What is the core mechanism behind GANs?

A

GANs consist of a generator that creates fake data and a discriminator that tries to detect whether the data is real or fake, improving the model over time.

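For intuition, here is a minimal sketch of that adversarial loop, assuming PyTorch; the toy 1-D data and layer sizes are illustrative, not from any particular paper.

```python
# Minimal GAN sketch: a generator maps noise to fake samples; a discriminator
# scores samples as real (1) or fake (0); the two are trained in alternation.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0      # toy "real" data: N(2, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator into labelling fakes as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```
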
13
Q

What is the difference between Pix2Pix and CycleGAN?

A

Pix2Pix uses paired data for image transformation, while CycleGAN works with unpaired data, learning to transform images from one domain to another without matching pairs.

14
Q

What does a diffusion model do in generative AI?

A

Diffusion models start with random noise and progressively denoise it to generate a realistic image, learning to reverse the noise process during training.

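A sketch of the forward (noising) half of that process, assuming PyTorch; the linear beta schedule and T = 1000 steps are illustrative choices.

```python
# Forward (noising) process of a DDPM-style diffusion model.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal fraction

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0): mostly signal for small t, mostly noise for large t."""
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps

x0 = torch.randn(1, 3, 64, 64)                   # stand-in for a training image
eps = torch.randn_like(x0)
x_t = q_sample(x0, t=500, eps=eps)               # a partially noised image
# Training fits a network eps_theta(x_t, t) to predict eps (MSE loss);
# generation then runs the learned reverse process from pure noise to an image.
```
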
15
Q

What is the application of diffusion models in healthcare?

A

Diffusion models are used to generate synthetic medical images (like heart images), reducing the need for real patient data and providing more training examples.

16
Q

What are the benefits of panoptic segmentation?

A

Panoptic segmentation combines semantic and instance segmentation, giving a complete output that labels both background (semantic) and objects (instance).

17
Q

How is the chain rule used in backpropagation?

A

The Chain Rule in backpropagation helps calculate gradients by combining the upstream gradient and the local gradient to update network parameters.

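A worked example in plain Python (hypothetical values) showing the upstream-times-local-gradient pattern for L = (wx + b)².

```python
# Chain rule by hand: each parameter's gradient is the upstream gradient
# multiplied by the local gradient at that node.
x, w, b = 2.0, 3.0, 1.0

# Forward pass
z = w * x + b          # z = 7
L = z ** 2             # L = 49

# Backward pass
dL_dz = 2 * z          # local gradient of L w.r.t. z: 14
dL_dw = dL_dz * x      # upstream (14) * local dz/dw (x = 2) = 28
dL_db = dL_dz * 1.0    # upstream (14) * local dz/db (1)     = 14
print(dL_dw, dL_db)    # 28.0 14.0
```
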
18
Q

What are the four steps in the Canny Edge Detection algorithm?

A
1. Gaussian filtering to suppress noise
2. Compute gradient magnitude and direction
3. Apply non-maximum suppression (NMS)
4. Use hysteresis thresholding to detect edges
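
Assuming OpenCV, the whole pipeline can be run in a few lines (the filename and thresholds below are illustrative).

```python
# The four Canny steps, end-to-end via OpenCV.
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)     # step 1: suppress noise
# cv2.Canny internally performs steps 2-4: Sobel gradients, non-maximum
# suppression, and hysteresis with the low/high thresholds given here.
edges = cv2.Canny(blurred, 100, 200)
cv2.imwrite("edges.png", edges)
```
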
19
Q

What is the main difference between semantic and panoptic segmentation?

A

Semantic segmentation classifies pixels into categories, while panoptic segmentation provides both class labels for the background and instance identifiers for individual objects.

20
Q

What is task-dependent saliency?

A

Task-dependent saliency (top-down saliency) adjusts the focus of attention based on the context or task at hand, such as detecting specific objects based on a goal or environment.

21
Q

What is semantic segmentation?

A

Semantic segmentation classifies each pixel of an image into predefined categories (like sky, road, or tree) but doesn’t distinguish between individual objects of the same class.

22
Q

What is instance segmentation?

A

Instance segmentation assigns unique labels to individual objects of the same class, allowing the model to detect multiple instances of the same object type (like multiple cars).

23
Q

How does panoptic segmentation differ from semantic segmentation and instance segmentation?

A

Panoptic segmentation combines both semantic segmentation (for background) and instance segmentation (for object instances), giving a full pixel-wise segmentation that identifies both the object type and instance.

24
Q

What is UNet’s primary architecture?

A

UNet’s architecture consists of an encoder (contracting path), a bottleneck (compressed features), and a decoder (expanding path with skip connections to recover spatial details), which helps with pixel-wise segmentation.

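A minimal two-level sketch of that encoder-bottleneck-decoder shape, assuming PyTorch; the channel counts are illustrative and much smaller than the original UNet's.

```python
# Tiny UNet: contracting path, bottleneck, expanding path with skip connections.
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1 = block(1, 16)                    # contracting path
        self.enc2 = block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(32, 64)             # compressed features
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = block(64, 32)                   # 64 = 32 (skip) + 32 (upsampled)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = block(32, 16)
        self.head = nn.Conv2d(16, n_classes, 1)     # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

logits = TinyUNet()(torch.randn(1, 1, 64, 64))      # -> (1, 2, 64, 64)
```
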
25

Q

How does **DeepLab** improve upon traditional segmentation methods?

A

DeepLab uses **atrous convolutions** (dilated convolutions) to capture larger contextual features without downsampling the image, along with **CRFs** (Conditional Random Fields) for refining object boundaries.

26

Q

What is the role of **atrous convolution** in DeepLab?

A

Atrous convolution allows the network to **capture larger context** without reducing the resolution, making it more efficient for high-resolution segmentation tasks.
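
A small PyTorch sketch of the idea: with dilation 2, a 3×3 kernel covers a 5×5 neighbourhood at unchanged output resolution (the tensor sizes are illustrative).

```python
# Dilation enlarges the receptive field without downsampling.
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)
standard = nn.Conv2d(8, 8, kernel_size=3, padding=1)             # 3x3 field
atrous = nn.Conv2d(8, 8, kernel_size=3, padding=2, dilation=2)   # 5x5 field
print(standard(x).shape, atrous(x).shape)  # both (1, 8, 32, 32): no resolution loss
```
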
27

Q

What is the key difference between **Faster R-CNN** and **Mask R-CNN**?

A

Mask R-CNN extends Faster R-CNN by adding an additional branch to predict **masks** for each detected object, enabling instance segmentation rather than just object detection.

28

Q

How does **Mask R-CNN** create object masks?

A

Mask R-CNN predicts a **binary mask** for each detected object by using a fully convolutional network that outputs a mask for each object instance, alongside bounding boxes and class labels.

29

Q

What does **semantic segmentation** lack that **instance segmentation** provides?

A

Semantic segmentation assigns class labels to all pixels but doesn’t distinguish between multiple objects of the same class, while instance segmentation assigns a **unique identifier** to each object instance.

30

Q

What is **spatial resolution** in the context of **semantic segmentation**?

A

Spatial resolution refers to the **level of detail** in the segmentation output, with **higher resolution** providing better pixel-level precision in object boundary detection.

31

Q

What are the **main challenges** in **semantic segmentation** for deep learning models?

A

The main challenges in semantic segmentation for deep learning models include **reduced feature map resolution**, **multiple objects at multiple scales**, and **reduced localisation accuracy**.

32

Q

What is **instance segmentation** used for in real-world applications?

A

Instance segmentation is used in applications like **autonomous vehicles**, where it's important to identify **specific objects** (like pedestrians and vehicles) in complex, cluttered environments.

33

Q

Why is **synthetic data** used in training deep learning models for segmentation?

A

**Synthetic data** is used to supplement **real data** when **annotating real images is expensive** or time-consuming. It helps create **large labelled datasets** for training deep learning models.

34

Q

What is the **main advantage** of **panoptic segmentation** over semantic and instance segmentation?

A

Panoptic segmentation provides a **complete scene understanding** by assigning class labels to background regions and instance identifiers to objects, unlike semantic or instance segmentation that only focuses on one aspect.

35

Q

What are **context-aware methods** in visual saliency?

A

Context-aware saliency detection methods take into account **higher-level task context** or surrounding object importance, guiding the attention model to focus on certain parts of an image based on its context (e.g., identifying pedestrians in a self-driving car scenario).

36

Q

How does **bottom-up saliency detection** work?

A

Bottom-up saliency detection focuses on **low-level features** like **edges, contrast, and colour** that automatically draw attention without any task-specific guidance.

37

Q

What are the two types of saliency in visual attention?

A

**Bottom-up saliency** focuses on low-level features like colour and contrast, while **top-down saliency** focuses on task-specific information or context, guiding attention based on higher-level goals or objectives.

38

Q

What is **subitizing** in visual saliency?

A

Subitizing refers to the ability to **quickly and accurately identify small numbers** of objects (usually up to 4) without counting, which relates to saliency when rapidly detecting small sets of objects.

39

Q

How does **domain adaptation** help with saliency models for both images and videos?

A

Domain adaptation allows a saliency model trained on one domain (like images) to adjust and work on another domain (like video) by **accounting for differences** such as temporal changes and motion.

40

Q

How does **Canny edge detection** handle noise in images?

A

Canny edge detection applies **Gaussian filtering** to **smooth out noise** before detecting edges, ensuring that the detected edges are less affected by random noise in the image.

41

Q

What is the **role of non-maximum suppression (NMS)** in the Canny edge detector?

A

NMS is used to **thin edges** by checking if a pixel’s gradient magnitude is the **local maximum** in the direction of the gradient, ensuring there’s only **one edge response** per actual edge.
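
A simplified NumPy sketch of that check, with gradient directions quantised to four bins (real implementations often interpolate between neighbours instead).

```python
# Non-maximum suppression: keep a pixel only if its gradient magnitude is the
# local maximum along the (quantised) gradient direction.
import numpy as np

def nms(mag, angle):
    """mag: gradient magnitude; angle: gradient direction in degrees [0, 180)."""
    out = np.zeros_like(mag)
    # Offsets to the two neighbours along each quantised direction
    offsets = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    q = (np.round(angle / 45.0) * 45).astype(int) % 180
    for i in range(1, mag.shape[0] - 1):
        for j in range(1, mag.shape[1] - 1):
            di, dj = offsets[q[i, j]]
            if mag[i, j] >= mag[i + di, j + dj] and mag[i, j] >= mag[i - di, j - dj]:
                out[i, j] = mag[i, j]   # local maximum: keep as a thin edge
    return out
```
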
42

Q

What does the **Canny edge detection algorithm** do in the second step of its process?

A

In the second step, Canny edge detection calculates the **gradient magnitude** and **direction** of the image using operators like Sobel to identify edge boundaries.

43

Q

How does **domain adaptation** affect the performance of saliency models across different domains (e.g., images vs. videos)?

A

Domain adaptation adjusts the model trained on one domain to work in another domain by minimizing performance gaps due to **different data distributions**, such as temporal changes in video data compared to static images.

44

Q

What is the primary goal of image registration?

A

The primary goal of image registration is to align one image with another by estimating an optimal transformation, minimizing the difference between the reference and target images.

45

Q

What are some common applications of image registration?

A

Image registration is used in medical imaging (e.g., aligning MRI with CT scans), monitoring disease progression, aligning multiple subject images for statistical models, and creating population-based digital brain atlases.

46

Q

What is the issue with finding the exact point to match in image registration?

A

The challenge is identifying the corresponding points between the images accurately, which involves minimizing dissimilarities between the images or features extracted from them.

47

Q

What is SIFT and how is it used in image registration?

A

SIFT (Scale-Invariant Feature Transform) is a feature detection and description method that identifies keypoints in an image that are invariant to scale, rotation, and illumination. In image registration, SIFT is used to **detect and match keypoints** between images, allowing the estimation of transformations needed to align them accurately.
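
An illustrative OpenCV sketch of SIFT-based registration (the filenames and the 0.75 ratio are assumptions; the homography model suits roughly planar scenes).

```python
# Detect SIFT keypoints in both images, match descriptors, and robustly
# estimate the transformation that aligns them.
import cv2
import numpy as np

ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
mov = cv2.imread("moving.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ref, None)
kp2, des2 = sift.detectAndCompute(mov, None)

# Keep only unambiguous matches (Lowe's ratio test)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Estimate the aligning transformation robustly with RANSAC
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```
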
48

Q

What is a rigid transformation in image registration?

A

A rigid transformation includes only translation and rotation, which preserves the structure and dimensions of the object being transformed. It is used for within-subject registration when no distortion is present.

49

Q

What is the key difference between affine and rigid transformations?

A

Affine transformations involve translation, rotation, scaling, and shearing, while rigid transformations only involve translation and rotation, preserving the original object shape.
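
The difference is easy to see in homogeneous-coordinate matrices; a NumPy sketch with illustrative parameter values follows.

```python
# Rigid vs. affine 2D transforms in homogeneous coordinates.
import numpy as np

theta, tx, ty = np.deg2rad(30), 5.0, -2.0
rigid = np.array([[np.cos(theta), -np.sin(theta), tx],
                  [np.sin(theta),  np.cos(theta), ty],
                  [0.0,            0.0,           1.0]])  # rotation + translation only

# An affine matrix additionally allows scaling and shearing; the 0.9/1.2
# scales and 0.3 shear here are illustrative values.
affine = np.array([[0.9, 0.3, tx],
                   [0.0, 1.2, ty],
                   [0.0, 0.0, 1.0]])

p = np.array([1.0, 2.0, 1.0])   # a point in homogeneous coordinates
print(rigid @ p, affine @ p)    # rigid preserves lengths/angles; affine need not
```
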
50

Q

Why is a non-rigid (deformable) transformation useful in image registration?

A

Non-rigid transformations are non-linear and allow for complex deformations like bending and warping, which is useful for tasks like inter-subject registration and distortion correction.

51

Q

What is thresholding used for in image segmentation?

A

Thresholding is a segmentation method that groups pixels based on their intensity values by setting a threshold that separates pixels into two categories (e.g., foreground and background).

52

Q

What is the main parameter in thresholding for image segmentation?

A

The main parameter in thresholding is the threshold value, which separates pixel intensities into two clusters: those above and those below the threshold.

53

Q

What is Otsu’s method for thresholding?

A

Otsu’s method automatically chooses the optimal intensity cut-off for separating an image into foreground and background: it selects the threshold that minimizes intra-class intensity variance (equivalently, maximizes inter-class variance), i.e., the cut-off at which the two groups differ most.
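
A NumPy sketch of Otsu's search, maximising the inter-class variance over all 256 candidate thresholds (the random image is a stand-in for real data).

```python
# Otsu's method: exhaustively test every threshold and keep the one that
# maximises the between-class (inter-class) variance.
import numpy as np

def otsu_threshold(img):
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()                           # intensity probabilities
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()           # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0     # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2    # inter-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

img = np.random.randint(0, 256, (64, 64))           # stand-in grayscale image
t = otsu_threshold(img)
mask = img >= t                                     # foreground/background split
```
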
54

Q

What is K-means clustering in image segmentation?

A

K-means clustering is a method where pixels are grouped into clusters based on their features (e.g., intensity, colour, or texture). The process involves iteratively assigning pixels to the nearest cluster center and updating the cluster centers.

55

Q

How does the K-means algorithm update cluster centers?

A

K-means updates the cluster centers by computing the mean of all the points assigned to that cluster, aiming to minimize intra-class variance. (Intra = within a cluster; inter = between clusters.)
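
A NumPy sketch of that assign-then-update loop on pixel intensities (k = 3 and the random pixels are illustrative; it assumes no cluster goes empty).

```python
# K-means on 1-D pixel intensities: assign each pixel to its nearest centre,
# then recompute each centre as the mean of its assigned pixels.
import numpy as np

def kmeans_1d(pixels, k=3, iters=20):
    centers = np.random.choice(pixels, k, replace=False)   # initial centres
    for _ in range(iters):
        # Assignment step: label each pixel with its nearest centre
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Update step: move each centre to the mean of its cluster
        centers = np.array([pixels[labels == c].mean() for c in range(k)])
    return labels, centers

pixels = np.random.randint(0, 256, 4096).astype(float)     # flattened image
labels, centers = kmeans_1d(pixels)
```
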
56

Q

Why are non-rigid transformations necessary in image registration?

A

Non-rigid transformations are necessary for correcting distortions and handling complex, localized deformations in images, as rigid or affine transformations are insufficient for these tasks.

57

Q

What are the four steps in the Canny edge detection algorithm?

A

The four steps in the Canny edge detection algorithm are: 1) Gaussian filtering to suppress noise, 2) compute gradient magnitude and direction, 3) apply non-maximum suppression, 4) perform hysteresis thresholding to identify edges.

58

Q

What does the chain rule do in backpropagation?

A

The chain rule in backpropagation helps compute gradients by combining the upstream gradient and local gradient to update the model's parameters iteratively.

59

Q

Why is the **Gaussian kernel size** important in Canny edge detection?

A

The Gaussian kernel size controls the amount of **smoothing** applied to the image: larger kernels smooth more, reducing noise further but potentially suppressing fine detail.
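
An OpenCV sketch comparing kernel sizes before Canny (the filename is illustrative).

```python
# Larger Gaussian kernels smooth more: fewer spurious edges from noise,
# but fine structures may disappear too.
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
light = cv2.GaussianBlur(img, (3, 3), 0)    # mild smoothing, keeps fine detail
heavy = cv2.GaussianBlur(img, (9, 9), 0)    # strong smoothing, suppresses noise
cv2.imwrite("edges_light.png", cv2.Canny(light, 100, 200))
cv2.imwrite("edges_heavy.png", cv2.Canny(heavy, 100, 200))   # fewer spurious edges
```
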
60

Q

What is domain adaptation in saliency models for image and video?

A

Domain adaptation in saliency models allows a model trained on one type of data (e.g., images) to adapt and work with another type (e.g., videos), addressing differences such as motion and temporal changes.

61

Q

What are some limitations of traditional feature-based methods in image segmentation?

A

Traditional feature-based methods may struggle with complex scenes, have high computational costs, rely on manually defined features, and often fail in cluttered environments or with overlapping objects.