The Final Iteration Flashcards

Keep adding here, and then study it all. And I mean all of it (40 cards)

1
Q

What is the epipolar plane?

A

The epipolar plane is the plane spanned by the two camera optical centres and a 3D scene point. Given the two optical centres and a point in one image, the plane is fully determined.

2
Q

What is an epipolar line?

A

An epipolar line is defined by both the epipolar plane and the image plane for a camera. The epipolar line emerges where the image plane intersects the epipolar plane.

3
Q

How does an epipolar plane help with stereo vision?

A

It reduces the dimension for the correspondence problem from 2D to 1D, making it more efficient.

4
Q

What is the cost volume in correspondence search?

A

The cost volume stores matching costs for each pixel over a range of disparities; each cost represents how well that pixel matches a shifted pixel in the other image.

5
Q

How is the cost volume used in correspondence search?

A

The cost volume is used to compute the disparity map by selecting the disparity with the lowest cost for each pixel (winner-takes-all).
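The winner-takes-all selection described above can be sketched in numpy. This is an illustrative sketch, not from the cards: the function name is my own, and it assumes a simple absolute-difference (SAD) cost on rectified greyscale images.

```python
import numpy as np

def disparity_from_cost_volume(left, right, max_disp):
    """Build a per-pixel cost volume over disparities 0..max_disp and pick
    the lowest-cost disparity for each pixel (winner-takes-all).
    `left` and `right` are 2D greyscale arrays of equal shape."""
    h, w = left.shape
    cost = np.full((h, w, max_disp + 1), np.inf)
    for d in range(max_disp + 1):
        # Compare each left pixel with the pixel d columns earlier in the right image
        diff = np.abs(left[:, d:].astype(float) - right[:, :w - d].astype(float))
        cost[:, d:, d] = diff
    return np.argmin(cost, axis=2)
```

In practice the per-pixel cost is usually aggregated over a window before the argmin, which makes the match far more robust to noise.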

6
Q

What is the fine-grained version of how a U-net is trained on image data for segmentation?

A
  • Feed it image-label pairs, where each label is a pixel-wise segmentation map
  • Architecture uses an encoder to downsample features, and a decoder with skip connections to reconstruct spatial details
  • During training, a loss function compares predictions with the ground-truth segmentation map
  • Network updates via backpropagation
  • Performance evaluated using a test set
7
Q

How does Background Subtraction work in Object Tracking?

A

Background subtraction works by comparing current frames to a background model using changes in pixel intensities. A common approach models the background with Gaussian Mixture Models, which allows it to handle dynamic backgrounds.

8
Q

What are advantages of using SIFT descriptors over raw pixel intensities?

A
  • Invariant to scale, rotation and minor illumination changes
  • Uses local gradient orientation histograms, which are resistant to noise and misalignment
9
Q

What are the benefits from using drop-out when training networks?

A
  • Improves generalisation
  • Reduces overfitting
  • Forces the network to learn more robust representations
10
Q

How does a Gaussian Mixture Model work?

A
  • It models each pixel as a mixture of several Gaussian distributions, representing different background states.
  • Each pixel is then compared to these different Gaussian models to determine whether it fits into the background or the foreground i.e. an object of interest.
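The per-pixel membership test above can be sketched as follows. This is a simplified illustration, not the full GMM background-subtraction algorithm: the function name, the weight cut-off, and the fixed 2.5-sigma rule are my own assumptions.

```python
import numpy as np

def is_background(value, means, variances, weights, threshold=2.5):
    """Check a pixel value against that pixel's mixture of Gaussians:
    it counts as background if it lies within `threshold` standard
    deviations of any sufficiently weighted component."""
    for mu, var, w in zip(means, variances, weights):
        if w > 0.1 and abs(value - mu) <= threshold * np.sqrt(var):
            return True
    return False
```

A full implementation would also update the component means, variances and weights online as new frames arrive.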
11
Q

How does histogram equalisation work?

A
  • Redistributes pixel intensity values so that they span the full range of possible values
  • Computes the cumulative distribution function (CDF) of the image histogram, and maps the original intensities to new ones
  • Spreads out frequent intensity values, improving contrast in low-contrast images
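The steps above can be sketched in numpy for an 8-bit greyscale image (the function name is my own; the mapping follows the standard CDF-based formula):

```python
import numpy as np

def equalise_histogram(img):
    """Histogram equalisation for a 2D uint8 greyscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                      # first non-zero CDF value
    # Map each original intensity to its equalised value in [0, 255]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```

Note the sketch assumes a non-constant image; a flat image would make the denominator zero.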
12
Q

What is the equation used to calculate disparity between two images?

A

Disparity: d = x_l - x_r
Where:
x_l = x-coordinate of the point observed in the left image
x_r = x-coordinate of the point observed in the right image

13
Q

What is the equation used to calculate depth, using disparity?

A

Z = (f × T) / D
Where:
Z = Depth
f = Focal Length
T = Real-world distance between two cameras
D = Disparity
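As a tiny worked example of the formula above (the numbers are illustrative, not from the cards):

```python
def depth_from_disparity(f, T, d):
    """Z = f * T / d: depth from focal length (pixels), baseline (metres)
    and disparity (pixels). Returns depth in metres."""
    return f * T / d

# A 700-pixel focal length, 0.1 m baseline and 35-pixel disparity
z = depth_from_disparity(700, 0.1, 35)  # 2.0 metres
```

Note that depth is inversely proportional to disparity: nearby objects have large disparities, distant objects small ones.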

14
Q

How does a Particle Filter work?

A
  • Generates particles, each of which represents a different hypothesis that ‘guesses’ the position of the object in the next time-frame
  • Each particle has an assigned weight, computed by an observation model that compares the particle's predicted measurement with real sensor data
  • Over time, particles with lower weights contribute less, leading to degeneracy
15
Q

How does resampling alleviate the problem of degeneracy in Particle Filters?

A

It duplicates high-weight particles and discards low-weight particles when generating the next set of particles.
It thereby focuses computation on more likely hypotheses and maintains tracking accuracy
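A minimal multinomial-resampling sketch in numpy (the helper name is my own; systematic resampling is a common lower-variance alternative):

```python
import numpy as np

def resample(particles, weights, rng):
    """Draw a new particle set with probability proportional to weight,
    duplicating high-weight hypotheses and discarding low-weight ones.
    Returns the new particles and reset (uniform) weights."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```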

16
Q

What is semantic segmentation?

A

Assigns a class label to each pixel in an image, grouping pixels by category without differentiating between individual objects.

17
Q

What is instance segmentation?

A

Assigns a class label to each pixel in an image, and also identifies and separates each instance of an object, i.e. unlike semantic segmentation it distinguishes individual objects of the same class.

18
Q

How is training loss computed for a denoising diffusion model?

A
  • A clean training image is noised to a randomly sampled timestep using the noise schedule
  • The network predicts the noise that was added, and the MSE between predicted and actual noise gives the loss
  • This process is repeated across timesteps throughout training
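The loss computation can be sketched in numpy. This is a simplified single-step illustration under my own assumptions: `predict_noise` stands in for the network, and `alpha_bar` is the cumulative noise-schedule product.

```python
import numpy as np

def diffusion_training_loss(x0, predict_noise, alpha_bar, t, rng):
    """One step of the denoising-diffusion objective: noise a clean image
    x0 to timestep t, ask the model for the added noise, and return the
    MSE between predicted and true noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = predict_noise(x_t, t)
    return np.mean((eps_hat - eps) ** 2)
```

A perfect model would recover the injected noise exactly and drive this loss to zero.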
19
Q

What are some disadvantages of using VAEs for generating images?

A
  • Blurry image generation, caused by the pixel-wise reconstruction loss and the probabilistic decoder averaging over plausible outputs
  • The KL-divergence term that regularises the latent space smooths it, which also contributes to blurry images
20
Q

How is illumination invariance implemented in feature detection?

A
  • HOG descriptors
  • SIFT descriptors

Both achieve illumination invariance by focusing on edge/gradient orientations rather than absolute intensities, and by normalising local patches to reduce the effect of brightness variations.

21
Q

What is a formal definition of lower-level tasks?

A

Low-level tasks involve basic image processing such as edge detection or noise reduction

22
Q

What is a formal definition of mid-level tasks?

A

Mid-level tasks involve interpreting groups of pixels, including segmentation and depth estimation

23
Q

What is a formal definition of high-level tasks?

A

High-level tasks refer to semantic understanding such as object detection, recognition and pose estimation

24
Q

What is the sensing stage in Computer Vision?

A

It involves acquiring raw image data through devices like cameras or depth sensors, and provides the initial input to the CV system.

25
Q

What is in a basic image processing pipeline?

A
  • Image acquisition
  • Pre-processing
  • Feature extraction
  • Classification
26
Q

What are the pinhole camera's limitations?

A
  • Low brightness, due to the small amount of light admitted
  • Image blur if the hole is too large
  • Diffraction effects if the hole is too small
27
Q

How are 2D coordinates derived from 3D coordinates?

A
  • Through a projection process that uses the camera projection matrix
  • The matrix combines the camera's intrinsic and extrinsic parameters: the 3D point is first transformed into the camera coordinate system, then projected onto the image plane
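The projection process can be sketched in numpy (the function name is my own; `K` is the intrinsic matrix, `R` and `t` the extrinsic rotation and translation):

```python
import numpy as np

def project_point(K, R, t, X):
    """Project a 3D world point X onto the image plane using intrinsics K
    (3x3) and extrinsics [R | t]. Returns pixel coordinates (u, v)."""
    Xc = R @ X + t              # world -> camera coordinates
    x = K @ Xc                  # camera -> homogeneous image coordinates
    return x[:2] / x[2]         # perspective divide
```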
28
Q

What are some commonly used edge detection algorithms?

A
  • Sobel
  • Canny
29
Q

How does Sobel work?

A

Sobel uses convolutional kernels to compute gradients in the horizontal and vertical directions.
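A direct (unoptimised) numpy sketch of the two Sobel kernels, using plain loops to keep the sliding-window idea explicit (the function name is my own):

```python
import numpy as np

def sobel_gradients(img):
    """Apply the 3x3 Sobel kernels to a 2D float image and return the
    horizontal and vertical gradient images (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # d/dx
    ky = kx.T                                                    # d/dy
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return gx, gy
```

The edge magnitude is then typically sqrt(gx² + gy²); real implementations use vectorised or separable convolution.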
30
Q

What is an advantage and disadvantage of using Sobel to detect edges?

A
  • Advantage: simple and computationally efficient
  • Disadvantage: sensitive to noise
31
Q

What are keypoints in feature detection?

A

Keypoints are distinct and repeatable locations in an image, such as corners or edges, that are stable under various transformations like rotation and scale.
32
Q

What are local descriptors?

A

Local descriptors capture distinctive information from small regions around keypoints in an image, and are robust to changes in scale, rotation and illumination.
33
Q

What is thresholding in regard to greyscale images?

A

Thresholding converts a greyscale image into a binary image by selecting a threshold value. Pixels with intensity values above the threshold are treated as foreground objects, whereas pixels below the threshold are treated as background.
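The rule above is a one-liner in numpy (the function name and the example threshold are my own):

```python
import numpy as np

def threshold(img, t):
    """Binarise a greyscale image: intensities above t become
    foreground (1), everything else background (0)."""
    return (img > t).astype(np.uint8)
```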
34
Q

Describe the Region Growing segmentation method

A
  • Start with user-defined or automatic selection of seed points in the image
  • Expand regions by including neighbouring pixels that meet pre-defined criteria, such as intensity similarity
  • Growth continues until no more similar pixels are found, resulting in segmented regions
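The steps above can be sketched as a breadth-first flood fill from a single seed. This is a simplified illustration under my own assumptions: 4-connectivity and a fixed intensity tolerance against the seed value.

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol):
    """Grow a region from `seed` (row, col), adding 4-connected neighbours
    whose intensity is within `tol` of the seed intensity.
    Returns a boolean mask of the segmented region."""
    h, w = img.shape
    mask = np.zeros((h, w), bool)
    seed_val = float(img[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
                    and abs(float(img[nr, nc]) - seed_val) <= tol:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

Variants compare against the running region mean rather than the seed value, which tolerates gradual intensity drift.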
35
Q

What is the formal definition of Affine Transformations?

A

They are linear mappings that preserve points, straight lines and planes, and include transformations such as rotation, translation, scaling and shearing.
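In 2D an affine map is p → A·p + b, which the special cases in the card all instantiate. A small numpy sketch (the function name and the example matrices are my own):

```python
import numpy as np

def affine_transform(points, A, b):
    """Apply the affine map p -> A @ p + b to an (N, 2) array of points.
    Rotation, translation, scaling and shearing are all special cases."""
    return points @ A.T + b

# e.g. a 90-degree rotation followed by a translation of (1, 0)
A = np.array([[0.0, -1.0], [1.0, 0.0]])
b = np.array([1.0, 0.0])
```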
36
Q

How could affine transformations be used in image registration?

A

They align two images by correcting geometric distortions, enabling one image to be mapped onto the other.
37
Q

How does intensity-based registration work?

A

It compares pixel intensity values directly between two images, rather than relying on extracted features. It is typically achieved by minimising the sum of squared differences.
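The sum-of-squared-differences measure itself is tiny (the function name is my own; a registration algorithm would minimise this over transformation parameters):

```python
import numpy as np

def ssd(img_a, img_b):
    """Sum of squared differences between two equally sized images;
    lower means more similar."""
    return float(np.sum((img_a.astype(float) - img_b.astype(float)) ** 2))
```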
38
Q

Why are geometric transformation techniques important for image alignment problems?

A

They bring different views of the same scene or object into a common coordinate system.
39
Q

What is a basic pipeline for image alignment?

A
  • Image acquisition & loading
  • Use a similarity measurement
  • Use a registration algorithm to find the best transformation parameters
  • Apply geometric transformations
40
Q

In a particle filter used for tracking, what does each particle represent? How are these particles updated at each time step?

A

Each particle represents a possible state hypothesis (e.g. object position + velocity), with an associated weight.
  • Prediction: particles are propagated using a motion model
  • Update: weights updated based on how well each particle matches new observations
  • Resampling: new particles drawn according to weight → focus on high-probability areas
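The predict–update–resample cycle can be sketched end to end in numpy. This is an illustrative 1D sketch under my own assumptions: `motion_model` and `likelihood` are caller-supplied stand-ins for the motion and observation models.

```python
import numpy as np

def particle_filter_step(particles, weights, motion_model, likelihood,
                         observation, rng):
    """One predict-update-resample cycle of a particle filter.
    `motion_model(p, rng)` propagates particles; `likelihood(p, z)` scores
    each particle against the observation z."""
    particles = motion_model(particles, rng)                # predict
    weights = weights * likelihood(particles, observation)  # update
    weights = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)  # resample
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```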