Additional Exam Notes Flashcards
The deck that brings together other points I have yet to cover properly. (39 cards)
Explain the term Epipolar Plane
It’s the plane defined by an image feature and the two optical centres, also called the plane of interpretation.
The world feature generating the known image feature and the corresponding feature in the other view must lie in this plane.
Why does the epipolar plane simplify the stereo correspondence problem?
- The intersection of the plane of interpretation and the second image plane is a straight line (called an epipolar line)
- The epipolar plane determines the epipolar lines. Given a feature extracted from one view, the corresponding feature must lie on the corresponding epipolar line in the other
- Search space is reduced from two dimensions to one
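A minimal sketch of the one-dimensional search, assuming the stereo geometry is summarised by a fundamental matrix F (not covered in these notes) so that the epipolar line in the second view is l' = F x. The matrix used here is the textbook case of a rectified pair with purely horizontal camera translation; the function names are illustrative only.

```python
def epipolar_line(F, point):
    """Return (a, b, c) such that a*x + b*y + c = 0 is the epipolar line
    in the second image for `point` (x, y) in the first image."""
    x = (point[0], point[1], 1.0)  # homogeneous coordinates
    return tuple(sum(F[r][k] * x[k] for k in range(3)) for r in range(3))


def on_line(line, point, tol=1e-9):
    """True if `point` (x, y) lies on `line` within a tolerance."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) <= tol


# Assumed example: fundamental matrix of a rectified pair (horizontal baseline).
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

line = epipolar_line(F_RECTIFIED, (10.0, 5.0))
# For this F the line is the image row y = 5, so only that row is searched.
```

In this rectified case every candidate match for a feature at row 5 lies on row 5 of the other image, which is the 2D-to-1D reduction described above.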
How is the Integral Image calculated?
- Computes, at each pixel, the sum of all pixel values above and to the left of that pixel (inclusive)
- Cumulative row sum: s(x, y) = s(x-1, y) + i(x, y), with s(-1, y) = 0
- Integral image: ii(x, y) = ii(x, y-1) + s(x, y), with ii(x, -1) = 0
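The two recurrences above can be sketched in a single pass over the image. This is a dependency-free illustration, assuming the image is a list of rows indexed as img[y][x]; the function name is made up for the example.

```python
def integral_image(img):
    """Integral image of a 2D list indexed as img[y][x]."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        s = 0  # cumulative row sum s(x, y), starting from s(-1, y) = 0
        for x in range(width):
            s += img[y][x]                        # s(x, y) = s(x-1, y) + i(x, y)
            above = ii[y - 1][x] if y > 0 else 0  # ii(x, -1) = 0
            ii[y][x] = above + s                  # ii(x, y) = ii(x, y-1) + s(x, y)
    return ii


example = integral_image([[1, 2, 3],
                          [4, 5, 6]])
```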
What benefit does using Integral images bring?
- Speed: the sum of intensity values over any rectangle needs only 4 lookups in the integral image, regardless of the rectangle's size
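A small sketch of the four-number rectangle sum. It assumes an integral image padded with a leading row and column of zeros, a common convenience that avoids special cases at the border; the function names are illustrative.

```python
def padded_integral_image(img):
    """Integral image with an extra zero row and column at the top and left."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom and columns left..right (inclusive),
    using exactly four lookups into the padded integral image."""
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])


ii = padded_integral_image([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]])
bottom_right_2x2 = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

The cost of rect_sum does not depend on the rectangle's area, which is where the speed advantage comes from.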
What are advantages of a pinhole camera?
- Simple to understand
- No lens distortion due to lack of lens
- Infinite depth of field: everything is in focus regardless of its distance from the camera
What is a one line description for Segmentation?
Assign all pixels to objects
What is a one line description for Recognition?
Identify the main object in the image
What is a one line description for Detection?
Find the location of all objects
What is a one line description for Pose?
Find all of the object parts
What are characteristics of a good feature?
- Invariant to scale and rotation
- Reflect useful object properties
- Unique and repeatable
What are example types of object recognition tasks?
- Segmentation
- Pose Estimation
- Detection
Why are Diffusion Models preferred over GANs?
- Easier to train
- Generate high-quality images
- Avoid GAN issues such as mode collapse and unstable training caused by having to balance the generator against the discriminator
What are important concepts in Background Subtraction?
- Gaussian Mixture Models
- Thresholding
What features are commonly used in Classical Computer Vision?
- HOG descriptors
- Colour histograms
- Texture Features
Describe the structure of a pinhole camera
- The pinhole camera’s optical point/centre is located on the principal axis / Z axis
- The 3D scene is projected onto the image plane through the pinhole located at the optical point
- Increasing the focal length narrows the field of view; decreasing it widens the field of view
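A rough sketch of the pinhole model, assuming the optical centre is at the origin, the image plane sits at distance f along the Z axis, and the field of view follows the standard relation 2 * atan(w / 2f). The sensor width w is an assumed parameter that the notes do not mention.

```python
import math


def project(point, focal_length):
    """Perspective projection of a 3D point (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


def field_of_view_deg(sensor_width, focal_length):
    """Horizontal field of view in degrees for an assumed sensor width."""
    return math.degrees(2.0 * math.atan(sensor_width / (2.0 * focal_length)))


image_point = project((2.0, 4.0, 10.0), focal_length=5.0)
wide = field_of_view_deg(36.0, 25.0)    # shorter focal length, wider view
narrow = field_of_view_deg(36.0, 50.0)  # longer focal length, narrower view
```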
Explain the forward process in a diffusion model
- Noise is added in a Markov Chain
- Each step’s noise is only dependent on the previous step
- This is a fixed process, containing no learned parameters.
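As an illustration, one common formulation (DDPM-style, assumed here rather than taken from these notes) adds Gaussian noise with a variance schedule beta_t: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise. The linear schedule and its default values below are assumptions for the sketch.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule; the values are illustrative."""
    if steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (steps - 1)
    return [beta_start + i * step for i in range(steps)]


def forward_step(x_prev, beta, rng):
    """One Markov step: depends only on x_prev, with no learned parameters."""
    keep, noise_scale = math.sqrt(1.0 - beta), math.sqrt(beta)
    return [keep * v + noise_scale * rng.gauss(0.0, 1.0) for v in x_prev]


def forward_chain(x0, betas, rng):
    """Apply the fixed noising steps in sequence."""
    x = list(x0)
    for beta in betas:
        x = forward_step(x, beta, rng)
    return x


rng = random.Random(0)
noisy = forward_chain([0.5, -1.0, 2.0], linear_betas(100), rng)
```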
Explain the reverse process in a Diffusion model
- Train and use a U-net style model that learns the reverse diffusion process
- This is trained by predicting the noise in the image given the current timestep
- By removing predicted noise from the image, we get an approximation of the original image
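To make the "remove the predicted noise" step concrete, here is a sketch under DDPM-style assumptions: with alpha_bar the cumulative product of (1 - beta), a noisy sample is x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, so a noise prediction gives an estimate of x0 by inverting that relation. The trained U-net is replaced by a hypothetical perfect predictor that returns the true noise.

```python
import math
import random


def add_noise(x0, alpha_bar, eps):
    """Closed-form forward sample at a given cumulative noise level."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + b * e for v, e in zip(x0, eps)]


def estimate_x0(x_t, alpha_bar, predicted_eps):
    """Invert the forward relation using the predicted noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(v - b * e) / a for v, e in zip(x_t, predicted_eps)]


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]
eps = [rng.gauss(0.0, 1.0) for _ in x0]
alpha_bar = 0.3  # assumed noise level for the example

x_t = add_noise(x0, alpha_bar, eps)
# Hypothetical stand-in for the trained network: it returns the true noise.
x0_hat = estimate_x0(x_t, alpha_bar, eps)
```

With a real model the prediction is imperfect, so samplers typically remove noise gradually over many timesteps rather than in one jump.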
Describe the process of object detection for classification
- Features are extracted using feature extraction methods
- They are then used to train a classifier model, which identifies regions in test images likely to contain objects and classifies them accordingly
Explain how a particle filter updates object tracking
- Each particle represents a hypothesis
- Predictions are made based on a motion model
- Measured positions are then used to update the particle weights based on how well they match the observation
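The predict and update steps above can be sketched for a 1D position tracker. The constant-velocity motion model and the Gaussian measurement likelihood are assumptions chosen for the example, and the function names are illustrative.

```python
import math
import random


def predict(particles, velocity, motion_noise, rng):
    """Move every hypothesis according to the motion model plus noise."""
    return [p + velocity + rng.gauss(0.0, motion_noise) for p in particles]


def update_weights(particles, measurement, measurement_noise):
    """Weight each particle by how well it matches the observation."""
    likelihoods = [
        math.exp(-0.5 * ((measurement - p) / measurement_noise) ** 2)
        for p in particles
    ]
    total = sum(likelihoods)
    if total == 0.0:  # every particle is far from the measurement
        return [1.0 / len(particles)] * len(particles)
    return [l / total for l in likelihoods]


def estimate(particles, weights):
    """Weighted mean of the hypotheses."""
    return sum(p * w for p, w in zip(particles, weights))


rng = random.Random(0)
particles = predict([0.0, 4.0, 9.0], velocity=1.0, motion_noise=0.0, rng=rng)
weights = update_weights(particles, measurement=5.0, measurement_noise=1.0)
position = estimate(particles, weights)
```

Practical trackers usually add a resampling step after the weight update, which is omitted here.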
Describe the architecture of a U-net model
- U-net uses an encoder to compress features
- Followed by a decoder to reconstruct segmentation maps
- Skip connections between matching levels of encoder and decoder to preserve spatial information
Explain both semantic and instance segmentation
- Semantic labels every pixel by class
- Instance distinguishes individual objects of the same class
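A toy illustration of the difference, assuming a semantic mask stored as a 2D list of class ids. Instance ids are derived here with 4-connected components, which is only a crude stand-in for a real instance segmentation model (touching objects of the same class would be merged).

```python
def instance_labels(semantic, target_class):
    """Give each 4-connected region of `target_class` its own instance id.

    Returns (labels, count) where labels[y][x] is 0 for other classes.
    """
    height, width = len(semantic), len(semantic[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if semantic[y][x] != target_class or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            stack = [(y, x)]
            while stack:  # flood fill the region
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic[ny][nx] == target_class
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        stack.append((ny, nx))
    return labels, count


# Semantic view: every object pixel carries the same class id (1).
semantic = [[1, 1, 0, 1],
            [0, 0, 0, 1]]
# Instance view: the two separate objects get different ids.
instances, n_instances = instance_labels(semantic, target_class=1)
```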
What are the challenges of using a single Gaussian Background Model?
- Cannot model dynamic backgrounds
- Sensitive to lighting changes
- Cannot represent multiple background modes at a pixel
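The last point can be seen in a small sketch: a pixel whose background alternates between two values (for example a flickering light) is fitted with one Gaussian, which ends up wide enough to accept a foreground value lying between the two modes. The 2.5 standard deviation threshold and the minimum standard deviation are assumed values for illustration.

```python
import statistics


def fit_pixel_model(history, min_std=1.0):
    """Fit a single Gaussian (mean, std) to one pixel's intensity history."""
    mean = statistics.fmean(history)
    std = max(statistics.pstdev(history), min_std)  # avoid a zero-width model
    return mean, std


def is_foreground(value, mean, std, k=2.5):
    """Flag the pixel as foreground if it is more than k std from the mean."""
    return abs(value - mean) > k * std


# Stable background: a clear intensity change is detected.
stable_mean, stable_std = fit_pixel_model([100, 101, 99, 100])
stable_hit = is_foreground(130, stable_mean, stable_std)

# Bimodal background: one Gaussian sits between the modes and is very wide.
flicker_mean, flicker_std = fit_pixel_model([50, 200] * 10)
flicker_hit = is_foreground(125, flicker_mean, flicker_std)
```

Here stable_hit is True while flicker_hit is False, so an object with intensity 125 would be missed at the flickering pixel; a mixture of Gaussians can model each mode separately.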
What’s an advantage and disadvantage of using a Particle Filter over a Kalman Filter?
Advantage - Particle filters handle multi-modal distributions
Disadvantage - Requires more computation
Why is SIFT useful?
- Invariant to scale and rotation, and partially invariant to illumination changes
- Robust when matching features across different views