The Final Iteration Flashcards
Keep adding here, and then study it all. And I mean all of it (40 cards)
What is the epipolar plane?
The epipolar plane is the plane that passes through the two cameras' optical centres and a 3D scene point. Given the two optical centres and a point in an image, you can compute the epipolar plane.
What is an epipolar line?
An epipolar line is defined by the epipolar plane and a camera's image plane: it is the line along which the two planes intersect.
How does an epipolar plane help with stereo vision?
It reduces the dimension for the correspondence problem from 2D to 1D, making it more efficient.
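A minimal NumPy sketch of the 2D-to-1D reduction. For a rectified stereo pair with a purely horizontal baseline, the fundamental matrix takes a simple known form (up to scale); all other values here are illustrative:

```python
import numpy as np

# Fundamental matrix for a rectified pair with horizontal baseline (up to scale)
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

def epipolar_line(F, point):
    """Line coefficients (a, b, c) with a*x + b*y + c = 0 in the other image."""
    x = np.array([point[0], point[1], 1.0])
    return F @ x

a, b, c = epipolar_line(F, (120.0, 45.0))
# The line is 0*x - 1*y + 45 = 0, i.e. y = 45: the match for a pixel in
# row 45 can only lie along row 45 of the other image -- a 1D search.
```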
What is the cost volume in regards to correspondence search?
The cost volume stores matching costs for each pixel over a range of disparities, which represents how well that pixel matches with a shifted pixel in the other image.
How is the cost volume used in regards to correspondence search?
The cost volume is used to compute the disparity map: for each pixel, the disparity with the lowest cost is selected (winner-takes-all).
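A hedged toy sketch of the two cards above: a SAD cost volume for a single scanline pair, followed by winner-takes-all disparity selection (real pipelines aggregate costs over windows; the arrays here are made up):

```python
import numpy as np

left  = np.array([1, 2, 3, 4, 5, 6], dtype=float)
right = np.array([3, 4, 5, 6, 7, 8], dtype=float)  # left shifted by d = 2
max_disp = 3

width = len(left)
cost_volume = np.full((width, max_disp + 1), np.inf)
for d in range(max_disp + 1):
    for x in range(d, width):
        # Matching cost: absolute difference between left pixel x and
        # the right pixel d positions to the left (since x_r = x_l - d).
        cost_volume[x, d] = abs(left[x] - right[x - d])

disparity = np.argmin(cost_volume, axis=1)  # lowest cost wins per pixel
# Pixels with a valid match (x >= 2) recover the true disparity of 2.
```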
What is the fine-grained version of how a U-net is trained on image data for segmentation?
- Feed it image-label pairs, where each label is a pixel-wise segmentation map
- Architecture uses an encoder to downsample features, and a decoder with skip connections to reconstruct spatial details
- During training, a loss function compares predictions with the ground-truth segmentation map
- Network updates via backpropagation
- Performance evaluated using a test set
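A hedged sketch of the loss step alone: pixel-wise cross-entropy between a predicted class-probability map and a ground-truth label map (a full U-Net forward/backward pass would need a deep-learning framework; the function name and toy arrays are illustrative):

```python
import numpy as np

def pixelwise_cross_entropy(probs, labels):
    """probs: (H, W, C) softmax outputs; labels: (H, W) integer class ids."""
    h, w = labels.shape
    # Pick each pixel's predicted probability for its true class,
    # then average the negative log-likelihood over all pixels.
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.log(p_true + 1e-12).mean()

probs = np.full((2, 2, 3), 1.0 / 3.0)          # uniform prediction, 3 classes
labels = np.array([[0, 1], [2, 0]])            # ground-truth segmentation map
loss = pixelwise_cross_entropy(probs, labels)  # = ln(3) for a uniform guess
```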
How does Background Subtraction work in Object Tracking?
Background subtraction works by comparing the current frame to a background model using changes in pixel intensities. It often uses Gaussian Mixture Models, which allow it to handle dynamic backgrounds.
What are advantages of using SIFT descriptors over raw pixel intensities?
- Invariant to scale, rotation and minor illumination changes
- Uses local gradient orientation histograms, which are resistant to noise and misalignment
What are the benefits from using drop-out when training networks?
- Improves generalisation
- Reduces overfitting
- Forces the network to learn more robust representations
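A minimal sketch of "inverted" dropout on a layer's activations (all names and values are illustrative): each unit is zeroed with probability p at training time, and the survivors are scaled by 1/(1-p) so the expected activation is unchanged at test time.

```python
import numpy as np

def dropout(activations, p, rng):
    # Keep each unit with probability 1 - p, zero the rest,
    # and rescale so the expected value matches test time.
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones((4, 5))
out = dropout(a, p=0.5, rng=rng)
# Roughly half the entries are zeroed; the kept entries become 2.0.
```

Forcing different random subsets of units to carry the signal each step is what yields the more robust, less co-adapted representations mentioned above.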
How does a Gaussian Mixture Model work?
- It models each pixel as a mixture of several Gaussian distributions, representing different background states.
- Each pixel is then compared to these different Gaussian models to determine whether it fits into the background or the foreground i.e. an object of interest.
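A hedged per-pixel sketch of that comparison (thresholds and component values are made up): a pixel's intensity is checked against each background Gaussian (mean, std, weight); if it lies within k standard deviations of a sufficiently weighted component it is classed as background.

```python
def is_background(intensity, means, stds, weights, k=2.5, w_min=0.1):
    for mu, sigma, w in zip(means, stds, weights):
        if w >= w_min and abs(intensity - mu) <= k * sigma:
            return True
    return False  # no component matched: foreground, i.e. an object of interest

# Two background states, e.g. a flickering light: dark (~30) and bright (~200)
means, stds, weights = [30.0, 200.0], [5.0, 10.0], [0.6, 0.4]
is_background(32.0, means, stds, weights)   # True: fits the dark mode
is_background(120.0, means, stds, weights)  # False: matches neither mode
```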
How does histogram equalisation work?
- Redistributes pixel intensity values so that they span the full range of possible values
- Computes the cumulative distribution function (CDF) of the image histogram, and maps the original intensities to new ones
- Spreads out frequent intensity values, improving contrast in low-contrast images
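The steps above can be sketched in NumPy for an 8-bit greyscale image (the toy image is made up; real images would come from disk):

```python
import numpy as np

def equalise(img):
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map each original intensity through the normalised CDF.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# A low-contrast image (values squeezed into 100..103)...
img = np.array([[100, 100, 101, 101],
                [102, 102, 103, 103]], dtype=np.uint8)
out = equalise(img)  # ...now spans the full 0..255 range
```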
What is the equation used to calculate disparity between two images?
Disparity = Xl - Xr
Where:
Xl = Point observed in left image
Xr = Point observed in right image
What is the equation used to calculate depth, using disparity?
Z = (f × T) / D
Where:
Z = Depth
f = Focal Length
T = Real-world distance between two cameras
D = Disparity
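A quick worked example of the two equations above (all numbers are made up for illustration):

```python
f = 700.0                 # focal length, in pixels
T = 0.12                  # baseline: distance between the two cameras, in metres
x_left, x_right = 420.0, 378.0

D = x_left - x_right      # disparity = 42 pixels
Z = (f * T) / D           # depth = 700 * 0.12 / 42 = 2.0 metres
```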
How does a Particle Filter work?
- Generates particles, each of which represents a different hypothesis that ‘guesses’ the position of the object in the next time-frame
- Each particle is moved forward by a motion model, then assigned a weight computed by an observation model that compares the particle's predicted measurement with real sensor data
- Over time, particles with lower weights contribute less, leading to degeneracy
How does resampling alleviate the problem of degeneracy in Particle Filters?
It duplicates high-weight particles and discards low-weight particles when generating the next set of particles.
It thereby focuses computation on more likely hypotheses and maintains tracking accuracy
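A hedged sketch of multinomial resampling (one common scheme; particle values are illustrative): draw a new particle set with probability proportional to weight, then reset the weights to uniform.

```python
import numpy as np

def resample(particles, weights, rng):
    weights = weights / weights.sum()
    # Draw indices with probability proportional to weight: high-weight
    # particles tend to be duplicated, low-weight ones discarded.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

rng = np.random.default_rng(0)
particles = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.01, 0.01, 0.97, 0.01])  # one dominant hypothesis
new_particles, new_weights = resample(particles, weights, rng)
# Most surviving particles typically duplicate the high-weight hypothesis.
```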
What is semantic segmentation?
Assigns a class label to each pixel in an image, grouping pixels by category without differentiating between individual objects.
What is instance segmentation?
Assigns a class label to each pixel in an image, and also identifies and separates each instance of an object i.e. it identifies each object separately unlike semantic segmentation
How is training loss computed for a denoising diffusion model?
- Adds a known amount of noise to a training image at a randomly sampled timestep
- The network predicts the noise that was added, and the prediction is compared with the actual noise using the MSE loss function
- Repeats this process across timesteps and training images
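A hedged NumPy sketch of one loss evaluation at a single timestep, with a stand-in for the network's prediction (the image, schedule value, and variable names are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.random((8, 8))              # toy "clean image"
alpha_bar_t = 0.5                    # cumulative noise-schedule term at step t
eps = rng.standard_normal((8, 8))    # the actual noise added

# Forward process: x_t = sqrt(alpha_bar_t)*x0 + sqrt(1 - alpha_bar_t)*eps
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

eps_pred = np.zeros_like(eps)        # stand-in for the network's output on x_t
loss = np.mean((eps_pred - eps) ** 2)  # MSE between predicted and actual noise
```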
What are some disadvantages of using VAEs for generating images?
- Blurry image generation, due to the pixel-wise reconstruction loss and the probabilistic decoder, which average over plausible outputs
- The KL-divergence latent-space regularisation term also contributes to blurry images
How is illumination invariance implemented in feature detection?
- HOG descriptors
- SIFT descriptors
Both achieve illumination invariance by focusing on edge orientations rather than absolute intensities, and by normalising local patches to reduce brightness variations
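A minimal sketch of the patch-normalisation idea (toy data; this is the normalisation step only, not a full descriptor): subtracting a patch's mean and dividing by its standard deviation cancels additive brightness shifts and multiplicative gain changes.

```python
import numpy as np

def normalise_patch(patch, eps=1e-8):
    # Zero-mean, unit-variance normalisation of a local patch.
    return (patch - patch.mean()) / (patch.std() + eps)

rng = np.random.default_rng(0)
patch = rng.random((8, 8))
brighter = 1.5 * patch + 40.0   # same content under different illumination

# After normalisation the two patches are (near-)identical.
diff = np.abs(normalise_patch(patch) - normalise_patch(brighter)).max()
```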
What is a formal definition of lower-level tasks?
Low-level tasks involve basic image processing such as edge detection or noise reduction
What is a formal definition of mid-level tasks?
Mid-level tasks involve interpreting groups of pixels, including segmentation and depth estimation
What is a formal definition of high-level tasks?
High-level tasks refer to semantic understanding such as object detection, recognition and pose estimation
What is the sensing stage in Computer Vision?
It involves acquiring raw image data through devices like cameras or depth sensors, and provides the initial input to the CV system.