Additional Exam Notes Flashcards by Joshua Carey-Young

Explain the term Epipolar Plane

The plane defined by an image feature and the two optical centres, also called the plane of interpretation.
The world feature that generated the known image feature, and the corresponding feature in the other view, must both lie in this plane.

Why does the epipolar plane simplify the stereo correspondence problem?

The intersection of the plane of interpretation and the second image plane is a straight line (called an epipolar line)
The epipolar plane determines the epipolar lines. Given a feature extracted from one view, the corresponding feature must lie on the corresponding epipolar line in the other.
Search space is reduced from two dimensions to one
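
As a minimal sketch of the one-dimensional search, assume an ideal rectified stereo pair, for which the fundamental matrix takes the simple form below. The matrix `F`, the helper `epipolar_line` and the sample coordinates are illustrative assumptions, not part of the notes. The epipolar line in the second image is `F @ x`, and a candidate match lies on it when its dot product with the line is zero.

```python
import numpy as np

# Fundamental matrix of an ideal rectified stereo pair (assumed for illustration).
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

def epipolar_line(F, point):
    """Return coefficients (a, b, c) of the line a*x + b*y + c = 0 in the second image."""
    x = np.array([point[0], point[1], 1.0])  # homogeneous coordinates
    return F @ x

line = epipolar_line(F, (120.0, 45.0))
# For a rectified pair the line is horizontal: -y + 45 = 0, i.e. y = 45.
candidate = np.array([97.0, 45.0, 1.0])      # same row, shifted along x
on_line = abs(candidate @ line) < 1e-9
```

Only pixels along that single row need to be compared, rather than every pixel in the second image.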

How is the Integral Image calculated?

Computes a value at each pixel that is the sum of all pixel values above and to the left of that pixel, inclusive
Cumulative row sum: s(x, y) = s(x-1, y) + i(x, y), with s(-1, y) = 0
Integral image: ii(x, y) = ii(x, y-1) + s(x, y), with ii(x, -1) = 0
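
The two recurrences can be sketched directly in Python. This is a minimal illustration assuming a small 2D NumPy array indexed as `img[y, x]`; the function name `integral_image` is hypothetical.

```python
import numpy as np

def integral_image(img):
    """Build ii from the two recurrences, treating out-of-range terms as 0."""
    h, w = img.shape
    s = np.zeros((h, w), dtype=np.int64)   # cumulative row sums
    ii = np.zeros((h, w), dtype=np.int64)  # integral image
    for y in range(h):
        for x in range(w):
            left = s[y, x - 1] if x > 0 else 0
            s[y, x] = left + img[y, x]      # s(x, y) = s(x-1, y) + i(x, y)
            above = ii[y - 1, x] if y > 0 else 0
            ii[y, x] = above + s[y, x]      # ii(x, y) = ii(x, y-1) + s(x, y)
    return ii

img = np.array([[1, 2, 3],
                [4, 5, 6]])
ii = integral_image(img)
# ii is [[1, 3, 6], [5, 12, 21]]
```

The same result can be obtained with `img.cumsum(axis=0).cumsum(axis=1)`; the explicit loops are kept here only to mirror the recurrences.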

What benefit does using Integral images bring?

Speed: the sum of intensity values over any rectangle can be calculated from only 4 values of the integral image (one per corner), regardless of the rectangle's size.
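
A hedged sketch of the four-value lookup, assuming inclusive row and column bounds and an integral image built with cumulative sums. The helper `rect_sum` and its argument order are assumptions made for this example.

```python
import numpy as np

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom and columns left..right (inclusive)."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]   # added back because it was subtracted twice
    return int(total)

img = np.arange(1, 17).reshape(4, 4)
ii = img.cumsum(axis=0).cumsum(axis=1)   # integral image via cumulative sums
box = rect_sum(ii, 1, 1, 2, 2)           # central 2x2 block: 6 + 7 + 10 + 11 = 34
```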

What are advantages of a pinhole camera?

Simple to understand
No lens distortion, because there is no lens
Infinite depth of field: the whole scene is in focus, with no depth-of-field blur

What is a one line description for segmentation?

Assign all pixels to objects

What is a one line description for Recognition?

Identify the main object in the image

What is a one line description for Detection?

Find the location of all objects

What is a one line description for Pose?

Find all of the object parts

What are characteristics of a good feature?

Invariant to scale and rotation
Reflect useful object properties
Unique and repeatable

What are example types of object recognition tasks?

Segmentation
Pose Estimation
Detection

Why are Diffusion Models preferred over GANs?

Easier to train
Generate high-quality images
Avoid GAN issues such as mode collapse and unstable training caused by having to balance the generator against the discriminator

What are important concepts in Background Subtraction?

Gaussian Mixture Models
Thresholding

What features are commonly used in Classical Computer Vision?

HOG descriptors
Colour histograms
Texture Features

Describe the structure of a pinhole camera

The pinhole camera's optical centre lies on the principal axis (the Z axis)
The 3D scene is projected onto the image plane through the pinhole, which sits at the optical centre
The focal length controls the field of view: increasing it narrows the field of view, and decreasing it widens it.
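
A minimal sketch of the projection, assuming an ideal pinhole with the optical centre at the origin, points given in camera coordinates, and the usual relations x = fX/Z and field of view = 2·atan(w / 2f). The function names and the 36 mm sensor width are illustrative assumptions.

```python
import math

def project(point, f):
    """Project a 3D point (X, Y, Z) in camera coordinates onto the image plane."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

def field_of_view_deg(sensor_width, f):
    """Horizontal field of view in degrees for a given sensor width and focal length."""
    return math.degrees(2 * math.atan(sensor_width / (2 * f)))

x, y = project((2.0, 1.0, 10.0), f=50.0)   # (10.0, 5.0)
wide = field_of_view_deg(36.0, 18.0)       # short focal length: 90 degrees
narrow = field_of_view_deg(36.0, 200.0)    # long focal length: much narrower
```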

Explain the forward process in a diffusion model

Noise is added step by step in a Markov chain
Each noisy image depends only on the image from the previous step
This is a fixed process, containing no learned parameters.
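
The forward process can be sketched as below. This assumes Gaussian noise and a linear variance schedule from 1e-4 to 0.02 over 1000 steps, a common choice that the notes do not specify; `forward_step` and `betas` are names introduced for the example.

```python
import numpy as np

def forward_step(x_prev, beta, rng):
    """One Markov step: x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * noise."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # fixed schedule (assumed); nothing is learned
x = np.ones((64, 64))                   # stand-in for an image
for beta in betas:
    x = forward_step(x, beta, rng)      # uses only the previous x
```

After the final step the array is close to pure Gaussian noise, with almost none of the original signal remaining.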

Explain the reverse process in a Diffusion model

Train and use a U-net style model that learns the reverse diffusion process
It is trained to predict the noise present in a noisy image, given the current timestep
By removing predicted noise from the image, we get an approximation of the original image
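
A sketch of the noise-removal step, assuming the common parameterisation x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·noise, which the notes do not state. No network is trained here: the true noise stands in for a perfect prediction, and `estimate_x0` and `alpha_bar_t` are hypothetical names.

```python
import numpy as np

def estimate_x0(x_t, predicted_noise, alpha_bar_t):
    """Remove the predicted noise to approximate the original image."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * predicted_noise) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))   # stand-in for a clean image
noise = rng.standard_normal(x0.shape)
alpha_bar_t = 0.5                          # assumed cumulative signal level at step t
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

# A trained U-Net would supply this; the true noise acts as a perfect prediction.
predicted_noise = noise
x0_hat = estimate_x0(x_t, predicted_noise, alpha_bar_t)
```

With an imperfect prediction the estimate is only approximate, which is why sampling is normally carried out over many small steps rather than in one jump.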

Describe the process of object detection for classification

Features are extracted using feature extraction methods
They are then used to train a classifier model, which identifies regions in test images likely to contain objects and classifies them accordingly

Explain how a particle filter updates object tracking

Each particle represents a hypothesis
Predictions are made based on a motion model
Measured positions are then used to update the particle weights based on how well they match the observation
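
A minimal sketch of one predict and update cycle for a 1D position tracker, assuming a constant-velocity motion model and a Gaussian measurement likelihood. The resampling step is a common addition that the card does not mention, and all names and noise levels are assumptions.

```python
import numpy as np

def particle_filter_step(particles, weights, velocity, measurement, rng,
                         motion_std=0.5, meas_std=1.0):
    """One predict / update / resample cycle for a 1D position tracker."""
    # Predict: move every hypothesis with the motion model plus process noise.
    particles = particles + velocity + rng.normal(0.0, motion_std, particles.shape)
    # Update: weight each particle by how well it explains the measurement.
    likelihood = np.exp(-0.5 * ((measurement - particles) / meas_std) ** 2)
    weights = weights * likelihood
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights (added, common step).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

rng = np.random.default_rng(0)
particles = rng.uniform(0.0, 20.0, 500)   # initial hypotheses spread over the scene
weights = np.full(500, 1.0 / 500)
true_pos = 5.0
for _ in range(10):
    true_pos += 1.0                                   # object moves 1 unit per frame
    measurement = true_pos + rng.normal(0.0, 1.0)     # noisy observation
    particles, weights = particle_filter_step(particles, weights, 1.0, measurement, rng)
estimate = particles.mean()   # should end up near the true position of 15
```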

Describe the architecture of a U-Net model

U-net uses an encoder to compress features
Followed by a decoder to reconstruct segmentation maps
Skip connections between matching levels of encoder and decoder to preserve spatial information

Explain both semantic and instance segmentation

Semantic segmentation labels every pixel with a class
Instance segmentation additionally distinguishes individual objects of the same class

What are the challenges of using a single Gaussian Background Model?

Cannot model dynamic backgrounds
Sensitive to lighting changes
Cannot represent multiple background modes at a pixel

What’s an advantage and disadvantage of using a Particle Filter over a Kalman Filter?

Advantage - Particle filters handle multi-modal distributions
Disadvantage - Requires more computation

Why is SIFT useful?

Invariant to scale and rotation, and partially invariant to illumination changes
Robust when matching features across different views

What is the importance of illumination modelling?

Helps accurately interpret scene radiance, ensuring features and colours are not misinterpreted due to lighting changes

What factors affect the quality of images captured in digital photography?

- Shutter speed
- Aperture
- ISO setting

What are valid applications of saliency prediction in Computer Vision?

- Smart image cropping
- Object detection
- Content-aware compression

In Stereo vision, what helps estimate depth from two images?

- Epipolar geometry: constrains the correspondence search
- Disparity maps: allow depth calculation

What are components of a Kalman filter used in tracking?

- Motion model
- Measurement update
- State prediction

Explain how ISO, Shutter Speed and Aperture interact to control image brightness

- ISO controls sensor sensitivity
- Aperture determines how much light enters the lens
- Shutter speed determines how long the sensor is exposed to light
Increasing one typically requires reducing another to maintain consistent exposure.
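
The trade-off can be illustrated with a small calculation, assuming brightness is proportional to ISO × exposure time / (f-number)². This is a simplified model, and `relative_brightness` is a hypothetical helper.

```python
import math

def relative_brightness(iso, shutter_s, f_number):
    """Image brightness up to a constant: sensitivity * exposure time / f_number^2."""
    return iso * shutter_s / (f_number ** 2)

base = relative_brightness(iso=100, shutter_s=1 / 100, f_number=4.0)
# Halve the exposure time (one stop less light) and double the ISO to compensate.
faster_shutter = relative_brightness(iso=200, shutter_s=1 / 200, f_number=4.0)
# Open the aperture by two stops (f/4 to f/2) and shorten the shutter by two stops.
wider_aperture = relative_brightness(iso=100, shutter_s=1 / 400, f_number=2.0)

same_exposure = (math.isclose(base, faster_shutter)
                 and math.isclose(base, wider_aperture))
```

All three settings give the same brightness under this model, although they differ in motion blur, depth of field and noise.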

Describe how disparity maps are used in Stereo Vision to infer depth

Disparity maps represent the pixel shifts of corresponding points between the left and right images. Depth is inversely proportional to disparity: the larger the disparity, the closer the object is to the cameras.
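
A sketch of the inverse relationship, assuming a rectified stereo pair with focal length in pixels and baseline in metres, so that Z = f·B / d. The function name and the sample values are assumptions for illustration.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Depth Z = f * B / d; pixels with zero disparity are treated as infinitely far."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

disparity = np.array([[64.0, 32.0],
                      [8.0, 0.0]])
depth = depth_from_disparity(disparity, focal_px=800.0, baseline_m=0.1)
# Larger disparity gives smaller depth: 64 px -> 1.25 m, 32 px -> 2.5 m, 8 px -> 10 m.
```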

How does a Kalman Filter handle uncertainty during object tracking?

Kalman filters maintain a mean and a covariance for state estimates. Prediction steps add uncertainty and measurement updates reduce it
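
A minimal 1D sketch of how the variance grows and shrinks, assuming a constant-position model with scalar process and measurement noise. The function names and numbers are chosen for illustration only.

```python
def predict(mean, var, process_var):
    """Prediction under a constant-position model: uncertainty grows."""
    return mean, var + process_var

def update(mean, var, measurement, meas_var):
    """Measurement update: blend by the Kalman gain, uncertainty shrinks."""
    gain = var / (var + meas_var)
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var

mean, var = 0.0, 4.0
mean_p, var_p = predict(mean, var, process_var=1.0)                    # var: 4.0 -> 5.0
mean_u, var_u = update(mean_p, var_p, measurement=2.0, meas_var=5.0)   # var: 5.0 -> 2.5
```

Here the gain is 0.5, so the updated mean of 1.0 sits halfway between the prediction and the measurement.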

What is the formal way of explaining the role of Pooling layers?

Pooling layers downsample feature maps to reduce computation and enforce spatial hierarchy.
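
As an illustration of downsampling, here is a sketch of 2x2 max pooling with stride 2 in NumPy. It assumes a single 2D feature map with even height and width; `max_pool_2x2` is a name made up for the example.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D map with even height and width."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)  # split into 2x2 blocks
    return blocks.max(axis=(1, 3))                      # keep the maximum of each block

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])
pooled = max_pool_2x2(fmap)
# pooled is [[4, 2], [2, 8]]: a quarter of the values, so later layers do less work.
```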

What is the formal way of explaining the role of Activation Functions in CNNs?

Activation functions introduce non-linearity, enabling complex pattern learning.
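
A small sketch of why the non-linearity matters, assuming two fully connected layers with random weights (the same argument applies to convolutions). Without an activation the two layers collapse into one linear map, while ReLU breaks the additivity that defines a linear function.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

# Without an activation, two layers are equivalent to one linear layer (W2 @ W1).
stacked_linear = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
layers_collapse = np.allclose(stacked_linear, collapsed)

def relu(z):
    """Rectified linear unit: negative inputs become zero."""
    return np.maximum(z, 0.0)

# ReLU is not additive: relu(-1) + relu(1) = 1, but relu(-1 + 1) = 0.
sum_of_relus = relu(-1.0) + relu(1.0)
relu_of_sum = relu(-1.0 + 1.0)
```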

What is the function of a Disparity Map in Stereo Vision?

Disparity maps capture the shift of corresponding pixels between stereo image pairs. This disparity allows calculation of depth via triangulation

Compare background subtraction using a single Gaussian model against a Gaussian Mixture Model

Single Gaussian:
- Assumes a static background
Gaussian Mixture Model:
- Can represent dynamic backgrounds, such as moving leaves, by modelling multiple modes per pixel

How does Motion Difference capture movement?

Motion difference compares sequential frames to detect pixel intensity changes, indicating movement
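
A minimal frame-differencing sketch, assuming 8-bit greyscale frames stored as NumPy arrays and an arbitrary threshold of 25. Both the threshold and the name `motion_mask` are assumptions.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose intensity changed by more than the threshold."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

prev_frame = np.zeros((4, 4), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[1:3, 1:3] = 200          # a bright object appears in the centre
mask = motion_mask(prev_frame, curr_frame)
# mask is True only for the four central pixels that changed.
```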

What is a major challenge of evaluating saliency maps?

Human attention is subjective, making ground truths hard to define. Multiple metrics exist, but interpreting results remains complex

What is the main advantage of using U-Net in image segmentation tasks?

U-Net preserves spatial detail through skip connections, which makes it effective for segmentation tasks even when only a small dataset is available.

The deck that brings together other points I have yet to cover properly. (39 cards)