3. Camera Calibration Flashcards
(35 cards)
What is calibration?
It is the process of estimating the projection matrix P. Given n points with known 3D world coordinates Xi and their known image projections xi (where they appear in the image), we estimate P.
How to estimate the projection matrix in camera calibration?
P is a 3x4 matrix, and it can be written as three row vectors, one per row of P. This simplifies the calculation a bit.
For every world point whose projection coordinates we know, we can write xi = P*Xi. Since xi and P*Xi are parallel, their cross product is zero: xi x P*Xi = 0. We can then rearrange this by stacking the unknown rows of P into a single 12x1 vector p.
For each Xi-xi pair (one known correspondence), we get only 2 linearly independent equations (the third is a linear combination of the first two), so we need only 6 known correspondences for a solution.
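A minimal numpy sketch of how such a system could be assembled from the correspondences (the row layout below is one common DLT convention, not something fixed by the card):

```python
import numpy as np

def build_dlt_system(X_world, x_image):
    """Stack 2 linearly independent equations per Xi-xi pair into A (2n x 12)."""
    rows = []
    for X, x in zip(X_world, x_image):
        X = np.append(np.asarray(X, float), 1.0)   # homogeneous 3D point (4,)
        u, v = x                                   # pixel coordinates
        # two of the three rows of xi x (P Xi) = 0, written in the stacked unknowns p
        rows.append(np.concatenate([np.zeros(4), -X, v * X]))
        rows.append(np.concatenate([X, np.zeros(4), -u * X]))
    return np.vstack(rows)                         # solve A p = 0 for p (12,)
```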
How many points are needed to calibrate the camera? Why might we need more?
Only 6 are needed: each Xi-xi pair (one known correspondence) gives only 2 linearly independent equations (the third is a linear combination of the first two), so 6 correspondences suffice for a solution.
The problem is measurement error: measuring the real-world coordinates, selecting the pixels in the image that correspond to those points, the camera position in the world frame (measuring tape), rotation, etc. all carry some error. That means the more points we have, the better the estimate.
What algorithms do we use to calibrate the camera with many points?
We can use least-squares estimation, or find the (approximate) right null space of A using SVD and take the last right-singular vector, p = v12.
Explain how least-squares estimation can be used to calibrate a camera.
The idea is that we have many points in the real world and we try to find the parameters that best fit them. The equation system is A p = 0, where A is built from the coordinates (Xi and xi) and p is the projection matrix written as a vector. We minimize ||Ap||.
Problem: the trivial solution p = 0.
Solution: add a constraint that the norm of p is some non-zero number:
minimize ||Ap|| s.t. ||p|| = 1
This is also called homogeneous least squares.
What is homogeneous least squares?
We are trying to find a vector p (of size n) that satisfies A p = 0, where A is an m x n matrix and 0 has size m. We also assume m >= n and rank(A) = n. To exclude p = 0, we additionally constrain the norm of p: ||p|| = 1.
Since we are minimizing the squared error, we want to minimize ||Ap||²:
||Ap||² = pᵀ Aᵀ A p.
Using some vector calculus and a Lagrange multiplier, this can be written as an eigenvalue problem where p is an eigenvector of Aᵀ A and lambda is the corresponding eigenvalue.
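A sketch of that step in standard notation (Lagrange multiplier on the unit-norm constraint, same symbols as above):

```latex
% minimize p^T A^T A p  subject to  p^T p = 1
L(p,\lambda) = p^\top A^\top A\, p - \lambda\,(p^\top p - 1)
% setting the gradient with respect to p to zero:
\nabla_p L = 2 A^\top A\, p - 2\lambda\, p = 0
\;\Longrightarrow\; A^\top A\, p = \lambda\, p
```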
In least-squares estimation, which eigenvector should you choose?
Since we are minimizing ||Ap||² = pᵀ Aᵀ A p, and for a unit-norm eigenvector of Aᵀ A this equals lambda, we are minimizing lambda. That means we choose the eigenvector whose eigenvalue is the smallest.
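A minimal numpy sketch of this choice (assuming A has already been built as above; np.linalg.eigh returns eigenvalues in ascending order):

```python
import numpy as np

def solve_homogeneous_eig(A):
    """Minimize ||A p|| subject to ||p|| = 1 via the eigendecomposition of A^T A."""
    # A^T A is symmetric, so eigh applies; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    # the eigenvector of the smallest eigenvalue minimizes p^T A^T A p = ||A p||^2
    return eigvecs[:, 0]
```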
How to perform the SVD (singular value decomposition) in camera calibration?
Since A is not square, we cannot apply the eigenvalue decomposition to it directly; instead we use the SVD, which is the analogue for rectangular matrices.
We write A = U S Vᵀ, where
- U is an m x n matrix with orthonormal columns
- V is an n x n orthonormal matrix
- S is an n x n diagonal matrix with the singular values in descending order along the diagonal
This means Aᵀ A = V S² Vᵀ, so under our assumptions
- the eigenvectors of Aᵀ A are the right singular vectors of A, and
- the eigenvalues of Aᵀ A are the squares of the singular values of A.
To find the eigenvector of Aᵀ A with the smallest eigenvalue, we take the last right-singular vector of A.
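The same solution via SVD, as a small numpy sketch (np.linalg.svd returns singular values in descending order, so the needed vector is the last row of Vᵀ):

```python
import numpy as np

def solve_homogeneous_svd(A):
    """Minimize ||A p|| subject to ||p|| = 1 using the SVD of A."""
    _, _, Vt = np.linalg.svd(A)
    # last right-singular vector = eigenvector of A^T A with the smallest eigenvalue
    return Vt[-1]
```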
Why use SVD instead of eigenvectors?
For small vectors (with 12 values) it doesn't make much of a difference, but for very large systems there are SVD methods that compute only the one vector we need (the last right-singular vector) instead of the full eigenvector computation.
What are coplanar points and how to avoid them?
They are points that all lie on one plane, i.e. they satisfy Πᵀ Xi = 0 for some plane Π. For them we get degenerate solutions, e.g. P with rows (Πᵀ, 0, 0), (0, Πᵀ, 0), (0, 0, Πᵀ). To avoid this, either use a 3D calibration target (e.g. a checkered box) when calibrating, or take two or more photos while moving the 2D plane (a wooden board with dots) around.
What is a checkerboard?
It is usually a board with a checkerboard pattern printed on it, used for camera calibration. The board has to be moved around and multiple photos taken because its points are coplanar.
How to extract intrinsic and extrinsic parameters from the projection matrix when doing calibration?
Assume we have P (3x4). We first split it into a 3x3 matrix M and a 3x1 vector.
Next, we decompose M into an upper-triangular part K (calibration matrix; K has values on and above the main diagonal) and an orthonormal part R (rotation) using RQ decomposition. The camera centre c (camera coordinates in the world frame) is computed as the null space of P by means of SVD.
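A minimal sketch of this decomposition with numpy/scipy (the sign handling is one common convention; a full implementation would also check that R is a proper rotation):

```python
import numpy as np
from scipy.linalg import rq

def decompose_projection(P):
    """Split P into intrinsics K, rotation R and camera centre c."""
    M = P[:, :3]                      # left 3x3 block of P
    K, R = rq(M)                      # RQ decomposition: M = K R, K upper triangular
    # RQ is only unique up to signs, so force a positive diagonal on K
    T = np.diag(np.sign(np.diag(K)))
    K, R = K @ T, T @ R
    # camera centre: null space of P, i.e. the last right-singular vector of P
    _, _, Vt = np.linalg.svd(P)
    c = Vt[-1]
    c = c[:3] / c[3]                  # back from homogeneous coordinates
    return K / K[2, 2], R, c          # normalise K so that K[2, 2] = 1
```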
What is aperture? How does it affect the image quality?
Aperture is the size of the pinhole. The hole can't be infinitely small, so some light rays overlap: a point usually projects to a circle on the image plane (blur effect). If we make the hole smaller, we might get better quality (the rays are clustered better), but with a small aperture we need a longer exposure (photos take longer because we need more light). This might not be practical (wait 10 minutes for a photo? What about video?).
Even with a small aperture, we can get wave effects of the light called diffraction effects, which introduce blur even for very small apertures.
What are diffraction effects?
They happen when the aperture is very small and the wave properties of light blur the image.
How can adding a lens help with image quality? When does it not improve the quality?
Instead of only one light ray from a point reaching the sensor, multiple rays are focused onto the same point on the sensor (more light). This helps if the object is at the right distance from the lens. As soon as it is closer or farther (not in focus), the rays no longer converge to a single point but form a "circle of confusion".
What is a circle of confusion?
When we use a lens and the object is not in focus (too far away or too close to the lens), the light from a single point is spread over several points, creating the circle of confusion.
It also happens when the aperture is too big and the rays overlap a lot, creating a blur effect.
What is the lens formula, how to interpret it?
It is a formula that describes how far the object has to be from the lens to be in focus (no circle of confusion):
1/f = 1/D + 1/D’
f - focal length (distance between lens and a focal point)
D - distance between an object and the lens
D’ - distance between an image plane (sensor) and the lens
This means that the farther away the object is, the more we have to decrease the distance between the lens and the sensor (by moving either the sensor or the lens; usually the lens is moved).
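A tiny worked example of the formula (the numbers are made up for illustration):

```python
def image_distance(f_mm, D_mm):
    """Solve 1/f = 1/D + 1/D' for the lens-to-sensor distance D'."""
    return 1.0 / (1.0 / f_mm - 1.0 / D_mm)

# a 50 mm lens focused on an object 2 m away:
print(image_distance(50.0, 2000.0))   # ~51.3 mm, slightly more than f
# the same lens focused on a very distant object:
print(image_distance(50.0, 1e9))      # ~50.0 mm, D' approaches f
```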
What is the depth of field?
It is the range of distances from the lens that appear in focus.
According to the thin-lens formula only one distance is exactly in focus, but due to the finite pixel size of the sensor there is a whole range of distances that appears in focus. This range is called the depth of field.
How can the depth of field be controlled?
Even with lenses, there is an aperture in front of the lens that controls how much light passes through the lens onto the sensor.
- If we want a larger depth of field, we close the aperture, which makes the system behave more like a pinhole camera (less "out of focus" blur), but less light comes in and we have to increase the exposure.
- For a smaller depth of field, we open the aperture more; light comes in from a wider angle, which narrows the depth of field and gives a stronger blur outside of it.
What is the field of view (FoV)?
It is how much of the scene the camera can observe at a given time. It depends on the focal length and the size of the sensor.
If we have a small focal length (short cameras), the FoV is larger and there is no "zoom effect", which means we have to get closer to objects to photograph them.
If we have a larger focal length (longer cameras), the FoV is smaller, which makes objects appear larger in the image (zoom effect).
How can we calculate the field of view (FoV)?
FoV = 2 * arctan(d / (2f)), where d is the sensor size and f the focal length (some texts give only the half-angle, arctan(d / (2f))).
The larger the focal length, the smaller the FoV.
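A small numeric sketch of the formula (the 36 mm / 50 mm / 100 mm numbers are just an illustrative example):

```python
import math

def field_of_view_deg(sensor_size_mm, focal_length_mm):
    """Full field of view in degrees for a given sensor size and focal length."""
    return math.degrees(2.0 * math.atan(sensor_size_mm / (2.0 * focal_length_mm)))

# a 36 mm wide sensor behind a 50 mm lens:
print(field_of_view_deg(36.0, 50.0))   # ~39.6 degrees
# the same sensor behind a 100 mm lens: larger f, smaller FoV
print(field_of_view_deg(36.0, 100.0))  # ~20.4 degrees
```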
What are some distortions that appear when we have small focal length?
A small focal length means a bigger FoV (field of view), so we have to get closer to an object to photograph it. This makes depth differences more significant in the image (e.g. bigger noses): when you are 30 cm from a face, a 5 cm depth difference is noticeable.
This can be fixed by moving away from the object and increasing the focal length (zoom). From far away, that 5 cm difference is not noticeable and the image looks similar to a pinhole camera, without the exaggerated perspective effects.
What are some flaws when working with lenses?
- Chromatic aberration: different wavelengths of light are refracted by the lens at different angles. This distorts colors and causes color fringing.
- Spherical aberration: with spherical lenses, rays passing farther from the centre focus closer to the lens, so the lens does not focus light perfectly.
- Vignetting: with multiple lenses, the first lens might not refract all the light onto the second lens. Those rays are lost, which darkens the border of the image. Sometimes this is used as an artistic choice.
- Radial distortion: caused by imperfect lenses; it is more noticeable for rays that pass near the edge of the lens. It makes straight lines appear curved in the image.
What is chromatic aberration?
Different wavelengths of light are refracted by the lens at different angles. This distorts colors and causes color fringing.