Difficulties in Image Registration and Generative Models Flashcards

(61 cards)

1
Q

What does VAE stand for?

A

Variational Autoencoder

2
Q

What is the difference between self-supervised and unsupervised learning?

A

Self-supervised learning generates supervisory labels from the data itself, while unsupervised learning uses no labels at all and focuses on discovering patterns in the data.

3
Q

What does semantic segmentation focus on?

A

Semantic segmentation assigns a class label to each pixel, including background "stuff" such as sky or road.

4
Q

What does instance segmentation focus on?

A

Instance segmentation detects individual objects and assigns unique identifiers to each object, distinguishing between objects of the same class.

5
Q

Why is panoptic segmentation important?

A

Panoptic segmentation combines semantic and instance segmentation, providing both class labels and object identification for a complete scene understanding.

6
Q

What is the limitation of instance segmentation?

A

Instance segmentation doesn’t label background "stuff" like sky or road; it focuses only on detecting countable objects and distinguishing between objects of the same class.

7
Q

What does bottom-up saliency refer to?

A

Bottom-up saliency is based on low-level features like contrast, colour, and edges that automatically draw attention.

8
Q

What does top-down saliency refer to?

A

Top-down saliency uses high-level, task-specific context or guidance to focus attention based on the task at hand (e.g., detecting pedestrians in self-driving cars).

9
Q

What is subitizing in visual saliency modelling?

A

Subitizing refers to the ability to quickly and accurately judge the number of objects in a small set (usually 1-4 objects) without counting.

10
Q

What is context-aware saliency detection?

A

Context-aware saliency detection incorporates higher-level guidance or task-specific information (like object importance or focus on certain areas) to generate saliency maps.

11
Q

What is domain shift in saliency models for images and videos?

A

Domain shift refers to training a model on one data domain (e.g., images) and adapting it to work on another (e.g., video), which can affect performance due to differences in data characteristics like motion.

12
Q

What is the core mechanism behind GANs?

A

GANs consist of a generator that creates fake data and a discriminator that tries to detect whether the data is real or fake, improving the model over time.

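For intuition, here is a minimal sketch of that adversarial loop, assuming PyTorch; the toy 1-D data and layer sizes are illustrative, not from any particular paper.

```python
# Minimal GAN sketch: a generator maps noise to fake samples; a discriminator
# scores samples as real (1) or fake (0); the two are trained in alternation.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0      # toy "real" data: N(2, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator into labelling fakes as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```
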
13
Q

What is the difference between Pix2Pix and CycleGAN?

A

Pix2Pix uses paired data for image transformation, while CycleGAN works with unpaired data, learning to transform images from one domain to another without matching pairs.

14
Q

What does a diffusion model do in generative AI?

A

Diffusion models start with random noise and progressively denoise it to generate a realistic image, learning to reverse the noise process during training.

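A sketch of the forward (noising) half of that process, assuming PyTorch; the linear beta schedule and T = 1000 steps are illustrative choices.

```python
# Forward (noising) process of a DDPM-style diffusion model.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal fraction

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0): mostly signal for small t, mostly noise for large t."""
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps

x0 = torch.randn(1, 3, 64, 64)                   # stand-in for a training image
eps = torch.randn_like(x0)
x_t = q_sample(x0, t=500, eps=eps)               # a partially noised image
# Training fits a network eps_theta(x_t, t) to predict eps (MSE loss);
# generation then runs the learned reverse process from pure noise to an image.
```
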
15
Q

What is the application of diffusion models in healthcare?

A

Diffusion models are used to generate synthetic medical images (like heart images), reducing the need for real patient data and providing more training examples.

16
Q

What are the benefits of panoptic segmentation?

A

Panoptic segmentation combines semantic and instance segmentation, giving a complete output that labels both background (semantic) and objects (instance).

17
Q

How is the chain rule used in backpropagation?

A

The Chain Rule in backpropagation helps calculate gradients by combining the upstream gradient and the local gradient to update network parameters.

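A worked example in plain Python (hypothetical values) showing the upstream-times-local-gradient pattern for L = (wx + b)².

```python
# Chain rule by hand: each parameter's gradient is the upstream gradient
# multiplied by the local gradient at that node.
x, w, b = 2.0, 3.0, 1.0

# Forward pass
z = w * x + b          # z = 7
L = z ** 2             # L = 49

# Backward pass
dL_dz = 2 * z          # local gradient of L w.r.t. z: 14
dL_dw = dL_dz * x      # upstream (14) * local dz/dw (x = 2) = 28
dL_db = dL_dz * 1.0    # upstream (14) * local dz/db (1)     = 14
print(dL_dw, dL_db)    # 28.0 14.0
```
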
18
Q

What are the four steps in the Canny Edge Detection algorithm?

A
1. Gaussian filtering to suppress noise
2. Compute gradient magnitude and direction
3. Apply non-maximum suppression (NMS)
4. Use hysteresis thresholding to detect edges
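
Assuming OpenCV, the whole pipeline can be run in a few lines (the filename and thresholds below are illustrative).

```python
# The four Canny steps, end-to-end via OpenCV.
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)     # step 1: suppress noise
# cv2.Canny internally performs steps 2-4: Sobel gradients, non-maximum
# suppression, and hysteresis with the low/high thresholds given here.
edges = cv2.Canny(blurred, 100, 200)
cv2.imwrite("edges.png", edges)
```
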
19
Q

What is the main difference between semantic and panoptic segmentation?

A

Semantic segmentation classifies pixels into categories, while panoptic segmentation provides both class labels for the background and instance identifiers for individual objects.

20
Q

What is task-dependent saliency?

A

Task-dependent saliency (top-down saliency) adjusts the focus of attention based on the context or task at hand, such as detecting specific objects based on a goal or environment.

21
Q

What is semantic segmentation?

A

Semantic segmentation classifies each pixel of an image into predefined categories (like sky, road, or tree) but doesn’t distinguish between individual objects of the same class.

22
Q

What is instance segmentation?

A

Instance segmentation assigns unique labels to individual objects of the same class, allowing the model to detect multiple instances of the same object type (like multiple cars).

23
Q

How does panoptic segmentation differ from semantic segmentation and instance segmentation?

A

Panoptic segmentation combines both semantic segmentation (for background) and instance segmentation (for object instances), giving a full pixel-wise segmentation that identifies both the object type and instance.

24
Q

What is UNet’s primary architecture?

A

UNet’s architecture consists of an encoder (contracting path), a bottleneck (compressed features), and a decoder (expanding path with skip connections to recover spatial details), which helps with pixel-wise segmentation.

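A minimal two-level sketch of that encoder-bottleneck-decoder shape, assuming PyTorch; the channel counts are illustrative and much smaller than the original UNet's.

```python
# Tiny UNet: contracting path, bottleneck, expanding path with skip connections.
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1 = block(1, 16)                    # contracting path
        self.enc2 = block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(32, 64)             # compressed features
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = block(64, 32)                   # 64 = 32 (skip) + 32 (upsampled)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = block(32, 16)
        self.head = nn.Conv2d(16, n_classes, 1)     # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

logits = TinyUNet()(torch.randn(1, 1, 64, 64))      # -> (1, 2, 64, 64)
```
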
25

Q

How does **DeepLab** improve upon traditional segmentation methods?

A

DeepLab uses **atrous convolutions** (dilated convolutions) to capture larger contextual features without downsampling the image, along with **CRFs** (Conditional Random Fields) for refining object boundaries.

26

Q

What is the role of **atrous convolution** in DeepLab?

A

Atrous convolution allows the network to **capture larger context** without reducing the resolution, making it more efficient for high-resolution segmentation tasks.
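
A small PyTorch sketch of the idea: with dilation 2, a 3×3 kernel covers a 5×5 neighbourhood at unchanged output resolution (the tensor sizes are illustrative).

```python
# Dilation enlarges the receptive field without downsampling.
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)
standard = nn.Conv2d(8, 8, kernel_size=3, padding=1)             # 3x3 field
atrous = nn.Conv2d(8, 8, kernel_size=3, padding=2, dilation=2)   # 5x5 field
print(standard(x).shape, atrous(x).shape)  # both (1, 8, 32, 32): no resolution loss
```
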
27

Q

What is the key difference between **Faster R-CNN** and **Mask R-CNN**?

A

Mask R-CNN extends Faster R-CNN by adding an additional branch to predict **masks** for each detected object, enabling instance segmentation rather than just object detection.

28

Q

How does **Mask R-CNN** create object masks?

A

Mask R-CNN predicts a **binary mask** for each detected object by using a fully convolutional network that outputs a mask for each object instance, alongside bounding boxes and class labels.

29

Q

What does **semantic segmentation** lack that **instance segmentation** provides?

A

Semantic segmentation assigns class labels to all pixels but doesn’t distinguish between multiple objects of the same class, while instance segmentation assigns a **unique identifier** to each object instance.

30

Q

What is **spatial resolution** in the context of **semantic segmentation**?

A

Spatial resolution refers to the **level of detail** in the segmentation output, with **higher resolution** providing better pixel-level precision in object boundary detection.

31

Q

What are the **main challenges** in **semantic segmentation** for deep learning models?

A

The main challenges in semantic segmentation for deep learning models include **reduced feature map resolution**, **multiple objects at multiple scales**, and **reduced localisation accuracy**.

32

Q

What is **instance segmentation** used for in real-world applications?

A

Instance segmentation is used in applications like **autonomous vehicles**, where it's important to identify **specific objects** (like pedestrians and vehicles) in complex, cluttered environments.

33

Q

Why is **synthetic data** used in training deep learning models for segmentation?

A

**Synthetic data** is used to supplement **real data** when **annotating real images is expensive** or time-consuming. It helps create **large labelled datasets** for training deep learning models.

34

Q

What is the **main advantage** of **panoptic segmentation** over semantic and instance segmentation?

A

Panoptic segmentation provides a **complete scene understanding** by assigning class labels to background regions and instance identifiers to objects, unlike semantic or instance segmentation that only focuses on one aspect.

35

Q

What are **context-aware methods** in visual saliency?

A

Context-aware saliency detection methods take into account **higher-level task context** or surrounding object importance, guiding the attention model to focus on certain parts of an image based on its context (e.g., identifying pedestrians in a self-driving car scenario).

36

Q

How does **bottom-up saliency detection** work?

A

Bottom-up saliency detection focuses on **low-level features** like **edges, contrast, and colour** that automatically draw attention without any task-specific guidance.

37

Q

What are the two types of saliency in visual attention?

A

**Bottom-up saliency** focuses on low-level features like colour and contrast, while **top-down saliency** focuses on task-specific information or context, guiding attention based on higher-level goals or objectives.

38

Q

What is **subitizing** in visual saliency?

A

Subitizing refers to the ability to **quickly and accurately identify small numbers** of objects (usually up to 4) without counting, which relates to saliency when rapidly detecting small sets of objects.

39

Q

How does **domain adaptation** help with saliency models for both images and videos?

A

Domain adaptation allows a saliency model trained on one domain (like images) to adjust and work on another domain (like video) by **accounting for differences** such as temporal changes and motion.

40

Q

How does **Canny edge detection** handle noise in images?

A

Canny edge detection applies **Gaussian filtering** to **smooth out noise** before detecting edges, ensuring that the detected edges are less affected by random noise in the image.

41

Q

What is the **role of non-maximum suppression (NMS)** in the Canny edge detector?

A

NMS is used to **thin edges** by checking if a pixel’s gradient magnitude is the **local maximum** in the direction of the gradient, ensuring there’s only **one edge response** per actual edge.
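
A simplified NumPy sketch of that check, with gradient directions quantised to four bins (real implementations often interpolate between neighbours instead).

```python
# Non-maximum suppression: keep a pixel only if its gradient magnitude is the
# local maximum along the (quantised) gradient direction.
import numpy as np

def nms(mag, angle):
    """mag: gradient magnitude; angle: gradient direction in degrees [0, 180)."""
    out = np.zeros_like(mag)
    # Offsets to the two neighbours along each quantised direction
    offsets = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    q = (np.round(angle / 45.0) * 45).astype(int) % 180
    for i in range(1, mag.shape[0] - 1):
        for j in range(1, mag.shape[1] - 1):
            di, dj = offsets[q[i, j]]
            if mag[i, j] >= mag[i + di, j + dj] and mag[i, j] >= mag[i - di, j - dj]:
                out[i, j] = mag[i, j]   # local maximum: keep as a thin edge
    return out
```
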
42

Q

What does the **Canny edge detection algorithm** do in the second step of its process?

A

In the second step, Canny edge detection calculates the **gradient magnitude** and **direction** of the image using operators like Sobel to identify edge boundaries.

43

Q

How does **domain adaptation** affect the performance of saliency models across different domains (e.g., images vs. videos)?

A

Domain adaptation adjusts the model trained on one domain to work in another domain by minimizing performance gaps due to **different data distributions**, such as temporal changes in video data compared to static images.

44

Q

What is the primary goal of image registration?

A

The primary goal of image registration is to align one image with another by estimating an optimal transformation, minimizing the difference between the reference and target images.

45

Q

What are some common applications of image registration?

A

Image registration is used in medical imaging (e.g., aligning MRI with CT scans), monitoring disease progression, aligning multiple subject images for statistical models, and creating population-based digital brain atlases.

46

Q

What is the issue with finding the exact point to match in image registration?

A

The challenge is identifying the corresponding points between the images accurately, which involves minimizing dissimilarities between the images or features extracted from them.

47

Q

What is SIFT and how is it used in image registration?

A

SIFT (Scale-Invariant Feature Transform) is a feature detection and description method that identifies keypoints in an image that are invariant to scale, rotation, and illumination. In image registration, SIFT is used to **detect and match keypoints** between images, allowing the estimation of transformations needed to align them accurately.
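
An illustrative OpenCV sketch of SIFT-based registration (the filenames and the 0.75 ratio are assumptions; the homography model suits roughly planar scenes).

```python
# Detect SIFT keypoints in both images, match descriptors, and robustly
# estimate the transformation that aligns them.
import cv2
import numpy as np

ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
mov = cv2.imread("moving.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ref, None)
kp2, des2 = sift.detectAndCompute(mov, None)

# Keep only unambiguous matches (Lowe's ratio test)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Estimate the aligning transformation robustly with RANSAC
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```
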
48

Q

What is a rigid transformation in image registration?

A

A rigid transformation includes only translation and rotation, which preserves the structure and dimensions of the object being transformed. It is used for within-subject registration when no distortion is present.

49

Q

What is the key difference between affine and rigid transformations?

A

Affine transformations involve translation, rotation, scaling, and shearing, while rigid transformations only involve translation and rotation, preserving the original object shape.
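
The difference is easy to see in homogeneous-coordinate matrices; a NumPy sketch with illustrative parameter values follows.

```python
# Rigid vs. affine 2D transforms in homogeneous coordinates.
import numpy as np

theta, tx, ty = np.deg2rad(30), 5.0, -2.0
rigid = np.array([[np.cos(theta), -np.sin(theta), tx],
                  [np.sin(theta),  np.cos(theta), ty],
                  [0.0,            0.0,           1.0]])  # rotation + translation only

# An affine matrix additionally allows scaling and shearing; the 0.9/1.2
# scales and 0.3 shear here are illustrative values.
affine = np.array([[0.9, 0.3, tx],
                   [0.0, 1.2, ty],
                   [0.0, 0.0, 1.0]])

p = np.array([1.0, 2.0, 1.0])   # a point in homogeneous coordinates
print(rigid @ p, affine @ p)    # rigid preserves lengths/angles; affine need not
```
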
50

Q

Why is a non-rigid (deformable) transformation useful in image registration?

A

Non-rigid transformations are non-linear and allow for complex deformations like bending and warping, which is useful for tasks like inter-subject registration and distortion correction.

51

Q

What is thresholding used for in image segmentation?

A

Thresholding is a segmentation method that groups pixels based on their intensity values by setting a threshold that separates pixels into two categories (e.g., foreground and background).

52

Q

What is the main parameter in thresholding for image segmentation?

A

The main parameter in thresholding is the threshold value, which separates pixel intensities into two clusters: those above and those below the threshold.

53

Q

What is Otsu’s method for thresholding?

A

Otsu’s method automatically chooses the optimal intensity cut-off for separating an image into foreground and background: it selects the threshold that minimizes intra-class intensity variance (equivalently, maximizes inter-class variance), i.e., the cut-off at which the two groups differ most.
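
A NumPy sketch of Otsu's search, maximising the inter-class variance over all 256 candidate thresholds (the random image is a stand-in for real data).

```python
# Otsu's method: exhaustively test every threshold and keep the one that
# maximises the between-class (inter-class) variance.
import numpy as np

def otsu_threshold(img):
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()                           # intensity probabilities
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()           # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0     # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2    # inter-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

img = np.random.randint(0, 256, (64, 64))           # stand-in grayscale image
t = otsu_threshold(img)
mask = img >= t                                     # foreground/background split
```
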
54

Q

What is K-means clustering in image segmentation?

A

K-means clustering is a method where pixels are grouped into clusters based on their features (e.g., intensity, colour, or texture). The process involves iteratively assigning pixels to the nearest cluster center and updating the cluster centers.

55

Q

How does the K-means algorithm update cluster centers?

A

K-means updates the cluster centers by computing the mean of all the points assigned to that cluster, aiming to minimize intra-class variance. (Intra = within a cluster; inter = between clusters.)
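
A NumPy sketch of that assign-then-update loop on pixel intensities (k = 3 and the random pixels are illustrative; it assumes no cluster goes empty).

```python
# K-means on 1-D pixel intensities: assign each pixel to its nearest centre,
# then recompute each centre as the mean of its assigned pixels.
import numpy as np

def kmeans_1d(pixels, k=3, iters=20):
    centers = np.random.choice(pixels, k, replace=False)   # initial centres
    for _ in range(iters):
        # Assignment step: label each pixel with its nearest centre
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Update step: move each centre to the mean of its cluster
        centers = np.array([pixels[labels == c].mean() for c in range(k)])
    return labels, centers

pixels = np.random.randint(0, 256, 4096).astype(float)     # flattened image
labels, centers = kmeans_1d(pixels)
```
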
56

Q

Why are non-rigid transformations necessary in image registration?

A

Non-rigid transformations are necessary for correcting distortions and handling complex, localized deformations in images, as rigid or affine transformations are insufficient for these tasks.

57

Q

What are the four steps in the Canny edge detection algorithm?

A

The four steps in the Canny edge detection algorithm are: 1) Gaussian filtering to suppress noise, 2) compute gradient magnitude and direction, 3) apply non-maximum suppression, 4) perform hysteresis thresholding to identify edges.

58

Q

What does the chain rule do in backpropagation?

A

The chain rule in backpropagation helps compute gradients by combining the upstream gradient and local gradient to update the model's parameters iteratively.

59

Q

Why is the **Gaussian kernel size** important in Canny edge detection?

A

The Gaussian kernel size controls the amount of **smoothing** applied to the image: larger kernels smooth more, reducing noise further but potentially suppressing fine detail.
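
An OpenCV sketch comparing kernel sizes before Canny (the filename is illustrative).

```python
# Larger Gaussian kernels smooth more: fewer spurious edges from noise,
# but fine structures may disappear too.
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
light = cv2.GaussianBlur(img, (3, 3), 0)    # mild smoothing, keeps fine detail
heavy = cv2.GaussianBlur(img, (9, 9), 0)    # strong smoothing, suppresses noise
cv2.imwrite("edges_light.png", cv2.Canny(light, 100, 200))
cv2.imwrite("edges_heavy.png", cv2.Canny(heavy, 100, 200))   # fewer spurious edges
```
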
60

Q

What is domain adaptation in saliency models for image and video?

A

Domain adaptation in saliency models allows a model trained on one type of data (e.g., images) to adapt and work with another type (e.g., videos), addressing differences such as motion and temporal changes.

61

Q

What are some limitations of traditional feature-based methods in image segmentation?

A

Traditional feature-based methods may struggle with complex scenes, have high computational costs, rely on manually defined features, and often fail in cluttered environments or with overlapping objects.