final exam fixed Flashcards

1
Q
What is the objective of multiple view geometry?
A

To understand the 3D structure of a scene given multiple images taken from different perspectives.

2
Q
What is the difference between 3D reconstruction and Structure from Motion in multiple view geometry?
A

3D Reconstruction (Stereo Vision): Assumes known intrinsic (K) and extrinsic (R, T) parameters to recover the 3D scene using two cameras. Structure from Motion (SfM): Recovers the 3D scene structure and the camera poses simultaneously from multiple images/views (K might be given).

3
Q
In stereo vision with parallel cameras, what is the meaning of the following equation? Z = bf/(u1 − u2)
A

Depth (Z) = (Baseline (b) × Focal length (f)) / Disparity (u1 − u2), where b is the distance between the cameras, f is the focal length, and the disparity is the difference in image x-coordinates of the same point between the two images.
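The equation can be sanity-checked with a short sketch; the baseline, focal length, and disparity values below are made up for illustration:

```python
def depth_from_disparity(b, f, d):
    """Z = b * f / (u1 - u2): depth from a rectified, parallel stereo pair.
    b: baseline (meters), f: focal length (pixels), d: disparity (pixels)."""
    if d <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return b * f / d

# Made-up numbers: 0.5 m baseline, 700 px focal length, 35 px disparity
print(depth_from_disparity(0.5, 700, 35))  # 10.0 (meters)
```

Note that depth is inversely proportional to disparity: halving the disparity doubles the estimated depth, which is why distant points are hard to resolve.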

4
Q
In stereo vision, what is the potential issue of a small baseline?
A

A small baseline limits the depth resolution: the disparities become small, so small matching errors translate into large depth errors.

5
Q
What is the goal of triangulation?
A

To estimate the 3D coordinates of a point given its 2D projections in multiple images and the camera positions.
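A minimal sketch of linear (DLT) triangulation from two views; the projection matrices and image points below are an invented toy setup in normalized coordinates:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D image points (u, v)."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3D point is the right singular vector of A
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Toy setup: identity intrinsics, second camera translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0, 0.0, 0.0]]).T])
X = triangulate(P1, P2, (0.0, 0.0), (-0.2, 0.0))
print(X)  # ~ [0, 0, 5]: the 3D point both observations were generated from
```

In practice the observations are noisy, so the two rays do not intersect exactly; the SVD solution minimizes the algebraic error of the overdetermined system.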

6
Q
What are the benefits of stereo rectification in feature matching?
A

Stereo rectification simplifies the search for feature correspondences by making the image planes coplanar and row-aligned, so matches can be searched along horizontal scanlines.

7
Q
What is the epipolar constraint?
A

Corresponding points in one image must lie on the epipolar lines of the other image. The three vectors p1, p2, and c1c2 are coplanar.
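In normalized coordinates the constraint reads x2^T E x1 = 0 with E = [T]x R. A small numerical sketch, using an assumed relative pose:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed setup: first camera P1 = [I|0], second camera P2 = [R|t].
R = np.eye(3)
t = np.array([-1.0, 0.0, 0.0])
E = skew(t) @ R                       # essential matrix

X = np.array([0.0, 0.0, 5.0])         # a 3D point in the first camera frame
x1 = np.append(X[:2] / X[2], 1.0)     # normalized coordinates in camera 1
X2 = R @ X + t                        # same point in the camera-2 frame
x2 = np.append(X2[:2] / X2[2], 1.0)   # normalized coordinates in camera 2

print(x2 @ E @ x1)  # ~0: the correspondence satisfies the epipolar constraint
```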

8
Q
Why can’t the scale ambiguity be avoided in multiple view geometry with monocular vision?
A

Both the object's distance from the camera and the object's size are needed to determine the scale, which requires prior knowledge.

9
Q
What is the reprojection error in two-view geometry?
A

The distance between the original 2D point and the point obtained by triangulating its 3D position using the estimated (R, T) and projecting it back onto the image plane using (R, T) and (K1, K2).

10
Q
What are the possible causes for outliers in two-view geometry?
A

Changes in scale and perspective, variations in illumination, noise and blur, and occlusions.

11
Q
What is the goal of RANSAC?
A

To estimate the unknown pose from measurements X that may contain outliers, without explicitly considering the latent inlier/outlier variable Z.

12
Q
Explain the procedures of RANSAC when applied to line fitting.
A

Randomly select a minimal subset of points to estimate the line parameters. Count the inliers vs. outliers for that model. Repeat k times and keep the parameters with the largest number of inliers (equivalently, the smallest number of outliers), assuming enough inliers exist.
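These steps can be sketched for line fitting; the data, inlier threshold, and iteration count below are arbitrary choices, not part of the card:

```python
import numpy as np

def ransac_line(points, k=200, threshold=0.1, rng=None):
    """Fit y = a*x + b by RANSAC: sample minimal subsets (2 points),
    count inliers, keep the model with the most inliers."""
    rng = np.random.default_rng(rng)
    best_model, best_inliers = None, 0
    for _ in range(k):
        (x1, y1), (x2, y2) = points[rng.choice(len(points), 2, replace=False)]
        if x1 == x2:
            continue                       # vertical sample, skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = int((residuals < threshold).sum())
        if inliers > best_inliers:
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# Toy data: 30 points on y = 2x + 1 plus 10 gross outliers
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
inlier_pts = np.column_stack([x, 2 * x + 1])
outlier_pts = rng.uniform(-5, 5, size=(10, 2))
pts = np.vstack([inlier_pts, outlier_pts])

(a, b), n = ransac_line(pts)
print(a, b, n)  # a ~ 2, b ~ 1, n >= 30: the outliers are ignored
```

A least-squares fit on the same data would be dragged toward the outliers; RANSAC recovers the line because at least one sampled pair consists only of inliers.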

13
Q
What are the three main components needed to implement RANSAC?
A

Random selection of data, model estimation, and a count of the inliers consistent with this model.

14
Q
In RANSAC, why do we select the smallest number of data points required to determine the unknown parameter?
A

The fewer points we sample, the higher the probability that the subset consists entirely of inliers, which yields a better estimate of the parameter and requires fewer iterations.
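This also keeps the required number of iterations low. With inlier ratio w, minimal sample size s, and desired success probability p, the standard iteration count is k = log(1 − p) / log(1 − w^s), which a quick sketch makes concrete:

```python
import math

def ransac_iterations(p, w, s):
    """Iterations k so that, with probability p, at least one minimal sample
    of size s is outlier-free, given inlier ratio w:
    k = log(1 - p) / log(1 - w**s)."""
    return math.ceil(math.log(1 - p) / math.log(1 - w ** s))

# 99% confidence, 50% inliers: line fitting (s = 2) vs. 8-point algorithm (s = 8)
print(ransac_iterations(0.99, 0.5, 2))  # 17
print(ransac_iterations(0.99, 0.5, 8))  # 1177
```

Growing the sample size from 2 to 8 inflates the iteration count by nearly two orders of magnitude, which is exactly why minimal solvers are preferred.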

15
Q
Describe the four steps required for sequential structure from motion.
A

Feature detection -> Feature matching/tracking -> Motion estimation -> Local optimization (bundle adjustment)

16
Q
  1. What is the difference between front-end and back-end of visual odometry?
A

Front-end: Handles feature detection matching and pose estimation between two frames. Back-end: Refines pose among multiple frames.

17
Q
Describe the goal, the type of correspondence, and the name of the algorithm for bootstrapping.
A

2D-to-2D (8-point algorithm): Determines the relative pose (up to scale) and the 3D locations of features from correspondences over two views. 3D-to-2D (Camera calibration, DLT, and PnP): Determines the absolute pose and intrinsic parameters from correspondences between 3D feature locations and 2D pixel coordinates. 3D-to-3D (Point cloud registration): Determines the relative pose from correspondences between 3D feature locations.

18
Q
Describe the goal, the type of correspondence, and the name of the algorithm for localization.
A

Goal: Determine the pose for each additional view using 3D-to-2D correspondences (DLT, PnP).

19
Q
Why do we need the mapping step in visual odometry?
A

To extend the structure by selecting new keyframes and extracting new features as the number of features decreases quickly for additional views.

20
Q
Why do we need the bundle adjustment step in visual odometry?
A

Bundle adjustment refines the 3D structure and camera motion estimates by minimizing the reprojection error across multiple frames.

21
Q
How is visual SLAM different from visual odometry?
A

Visual SLAM addresses loop detection and closure to guarantee global consistency in addition to visual odometry’s goals of estimating incremental motion and guaranteeing local consistency.

22
Q
Describe the characteristics of the indirect method in visual SLAM.
A

Extracts and matches features, rejects outliers with RANSAC, and minimizes the reprojection error. Can handle large relative motion between frames but is slow due to RANSAC.

23
Q
Describe the characteristics of the direct method in visual SLAM.
A

Minimizes the photometric error of the image without extracting features using RANSAC. Uses all image information for greater robustness and accuracy. Fast because no RANSAC is used but sensitive to initial guess and cannot handle large relative motion between frames.

24
Q
What is the goal of tracking?
A

To locate a moving object in consecutive video frames.

25
Q
What are the pros and the cons of feature detection and matching compared with tracking?
A

Feature Detection and Matching: Pros: Works even with large motion between two frames. Cons: Does not utilize the additional information available when the motion between frames is small. Tracking: Pros: Utilizes that additional information from small motion between frames. Cons: May not work well with large motion between frames.

26
Q
Summary for Point Tracking
A

Block-based methods: Robust to large motions but computationally expensive. Differential methods: No search is performed; they apply to small motions, can be extended to large motions using a multi-scale implementation, and are computationally efficient for optical flow.

27
Q
Summary for KLT Template Tracking
A

KLT (Kanade-Lucas-Tomasi) is a template tracking algorithm that estimates the motion of a template image patch between consecutive frames by minimizing the sum of squared intensity differences, assuming small motion and brightness constancy.
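A one-dimensional sketch of the Lucas-Kanade idea (the signals and the 0.3-sample shift below are invented; real KLT tracks 2D patches the same way):

```python
import numpy as np

def lk_shift_1d(template, image, iters=10):
    """Estimate the sub-sample shift d such that image(x + d) ~= template(x),
    assuming small motion and brightness constancy (1D Lucas-Kanade)."""
    x = np.arange(len(template), dtype=float)
    d = 0.0
    for _ in range(iters):
        warped = np.interp(x + d, x, image)            # image resampled at shifted positions
        grad = np.gradient(warped)                     # spatial gradient I_x
        err = template - warped                        # intensity residual
        d += (grad * err).sum() / (grad * grad).sum()  # Gauss-Newton update on d
    return d

n = np.arange(200, dtype=float)
template = np.sin(2 * np.pi * n / 50)        # 4 periods over 200 samples
image = np.sin(2 * np.pi * (n - 0.3) / 50)   # same signal shifted by 0.3 samples
print(lk_shift_1d(template, image))  # ~0.3
```

Each iteration linearizes the intensity around the current estimate, which is why the method needs small motion (or a multi-scale pyramid) to converge.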

28
Q
Describe the image features that are not preserved under homography and the image feature that is preserved under homography.
A

Not preserved: parallelism of lines, distances, lengths, curvature. Preserved: collinearity of points.
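A quick numerical check, using an arbitrarily chosen homography: collinearity survives the mapping while equal distances do not:

```python
import numpy as np

def apply_h(H, p):
    """Map a 2D point through a 3x3 homography (homogeneous coordinates)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# An arbitrary homography with a nonzero projective row.
H = np.array([[1.2, 0.3, 5.0],
              [-0.1, 0.9, 2.0],
              [0.001, 0.002, 1.0]])

a, b, c = (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)   # collinear, equally spaced points
pa, pb, pc = (apply_h(H, p) for p in (a, b, c))

# Collinearity is preserved: the triangle (pa, pb, pc) has ~zero area.
area = 0.5 * abs((pb[0] - pa[0]) * (pc[1] - pa[1]) - (pc[0] - pa[0]) * (pb[1] - pa[1]))
# Distances are not: the equal spacing is destroyed.
d_ab = np.hypot(*(pb - pa))
d_bc = np.hypot(*(pc - pb))
print(area, d_ab, d_bc)
```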

29
Q
What are the pros and the cons of vision compared with IMU?
A

Vision is more accurate for slow motion and provides richer information, but struggles with fast motion, challenging scenes, and has lower output rates. IMUs excel in fast motion scenarios with higher output rates, but are less accurate for slow motion and drift over time.

30
Q
What are the pros and the cons of IMU compared with vision?
A

IMU Pros: More accurate for motion with large accelerations and angular velocities; higher output rate. IMU Cons: Inaccurate for motion with low acceleration; measurements drift over time.

31
Q
What are the differences between the loosely coupled approach and the tightly coupled approach in visual-inertial fusion?
A

In the loosely coupled approach, visual odometry (VO) and IMU estimate the pose independently, making it easier to implement but less accurate. The tightly coupled approach integrates IMU measurements directly with feature tracking and 3D reconstruction, resulting in higher accuracy but increased complexity.

32
Q
In tightly coupled visual-inertial fusion, compare filtering and optimization in terms of speed and accuracy.
A

Filtering: Repeated procedure for prediction and correction. Faster but less accurate (sensitive to linearization). Optimization: Multiple iterations to minimize the objective function. Slower (can be improved with GTSAM) but more accurate.