3. Camera Calibration Flashcards
(35 cards)
What is calibration?
It is the process of estimating the projection matrix P. Given n points with known 3D world coordinates Xi and their known image projections xi (where they appear in the image), we estimate P.
How to estimate the projection matrix in camera calibration?
P is a 3x4 matrix, and it can be written as three row vectors, one per row of P. This simplifies the calculation a bit.
For every world point whose projection coordinates we know, we can write xi = P*Xi. Since xi and P*Xi are parallel, their cross product is zero: xi x P*Xi = 0. We can then rearrange this by stacking the unknown rows of P into a single 12x1 vector p.
For each Xi-xi pair (one known correspondence), we get only 2 linearly independent equations (the third is a linear combination of the first two), so we need only 6 known correspondences for a solution.
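A minimal numpy sketch of how such a system could be assembled from the correspondences (the row layout below is one common DLT convention, not something fixed by the card):

```python
import numpy as np

def build_dlt_system(X_world, x_image):
    """Stack 2 linearly independent equations per Xi-xi pair into A (2n x 12)."""
    rows = []
    for X, x in zip(X_world, x_image):
        X = np.append(np.asarray(X, float), 1.0)   # homogeneous 3D point (4,)
        u, v = x                                   # pixel coordinates
        # two of the three rows of xi x (P Xi) = 0, written in the stacked unknowns p
        rows.append(np.concatenate([np.zeros(4), -X, v * X]))
        rows.append(np.concatenate([X, np.zeros(4), -u * X]))
    return np.vstack(rows)                         # solve A p = 0 for p (12,)
```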
How many points are needed to calibrate the camera? Why might we need more?
Only 6 are needed: each Xi-xi pair (one known correspondence) gives only 2 linearly independent equations (the third is a linear combination of the first two), so 6 correspondences suffice for a solution.
The problem is measurement error: measuring the real-world coordinates, selecting the pixels in the image that correspond to those points, the camera position in the world frame (measuring tape), rotation, etc. all carry some error. That means the more points we have, the better the estimate.
What algorithms do we use to calibrate the camera with many points?
We can use least-squares estimation, or find the (approximate) right null space of A using SVD and take the last right-singular vector, p = v12.
Explain how least-squares estimation can be used to calibrate a camera.
The idea is that we have many points in the real world and we try to find the parameters that best fit them. The equation system is A p = 0, where A is built from the coordinates (Xi and xi) and p is the projection matrix written as a vector. We minimize ||Ap||.
Problem: the trivial solution p = 0.
Solution: add a constraint that the norm of p is some non-zero number:
minimize ||Ap|| s.t. ||p|| = 1
This is also called homogeneous least squares.
What is homogeneous least squares?
We are trying to find a vector p (of size n) that satisfies A p = 0, where A is an m x n matrix and 0 has size m. We also assume m >= n and rank(A) = n. To exclude p = 0, we additionally constrain the norm of p: ||p|| = 1.
Since we are minimizing the squared error, we want to minimize ||Ap||²:
||Ap||² = pᵀ Aᵀ A p.
Using some vector calculus and a Lagrange multiplier, this can be written as an eigenvalue problem where p is an eigenvector of Aᵀ A and lambda is the corresponding eigenvalue.
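A sketch of that step in standard notation (Lagrange multiplier on the unit-norm constraint, same symbols as above):

```latex
% minimize p^T A^T A p  subject to  p^T p = 1
L(p,\lambda) = p^\top A^\top A\, p - \lambda\,(p^\top p - 1)
% setting the gradient with respect to p to zero:
\nabla_p L = 2 A^\top A\, p - 2\lambda\, p = 0
\;\Longrightarrow\; A^\top A\, p = \lambda\, p
```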
In least-squares estimation, which eigenvector should you choose?
Since we are minimizing ||Ap||² = pᵀ Aᵀ A p, and for a unit-norm eigenvector of Aᵀ A this equals lambda, we are minimizing lambda. That means we choose the eigenvector whose eigenvalue is the smallest.
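A minimal numpy sketch of this choice (assuming A has already been built as above; np.linalg.eigh returns eigenvalues in ascending order):

```python
import numpy as np

def solve_homogeneous_eig(A):
    """Minimize ||A p|| subject to ||p|| = 1 via the eigendecomposition of A^T A."""
    # A^T A is symmetric, so eigh applies; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    # the eigenvector of the smallest eigenvalue minimizes p^T A^T A p = ||A p||^2
    return eigvecs[:, 0]
```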
How to perform the SVD (singular value decomposition) in camera calibration?
Since A is not square, we cannot apply the eigenvalue decomposition to it directly; instead we use the SVD, which is the analogue for rectangular matrices.
We write A = U S Vᵀ, where
- U is an m x n matrix with orthonormal columns
- V is an n x n orthonormal matrix
- S is an n x n diagonal matrix with the singular values in descending order along the diagonal
This means Aᵀ A = V S² Vᵀ, so under our assumptions
- the eigenvectors of Aᵀ A are the right singular vectors of A, and
- the eigenvalues of Aᵀ A are the squares of the singular values of A.
To find the eigenvector of Aᵀ A with the smallest eigenvalue, we take the last right-singular vector of A.
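The same solution via SVD, as a small numpy sketch (np.linalg.svd returns singular values in descending order, so the needed vector is the last row of Vᵀ):

```python
import numpy as np

def solve_homogeneous_svd(A):
    """Minimize ||A p|| subject to ||p|| = 1 using the SVD of A."""
    _, _, Vt = np.linalg.svd(A)
    # last right-singular vector = eigenvector of A^T A with the smallest eigenvalue
    return Vt[-1]
```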
Why use SVD instead of eigenvectors?
For small vectors (with 12 values) it doesn't make much of a difference, but for very large systems there are SVD methods that compute only the one vector we need (the last right-singular vector) instead of the full eigenvector computation.
What are coplanar points and how to avoid them?
They are points that all lie on one plane, i.e. they satisfy Πᵀ Xi = 0 for some plane Π. For them we get degenerate solutions, e.g. P with rows (Πᵀ, 0, 0), (0, Πᵀ, 0), (0, 0, Πᵀ). To avoid this, either use a 3D calibration target (e.g. a checkered box) when calibrating, or take two or more photos while moving the 2D plane (a wooden board with dots) around.
What is a checkerboard?
It is usually a board with a checkerboard pattern printed on it, used for camera calibration. The board has to be moved around and multiple photos taken because its points are coplanar.
How to extract intrinsic and extrinsic parameters from the projection matrix when doing calibration?
Assume we have P (3x4). We first split it into a 3x3 matrix M and a 3x1 vector.
Next, we decompose M into an upper-triangular part K (calibration matrix; K has values on and above the main diagonal) and an orthonormal part R (rotation) using RQ decomposition. The camera centre c (camera coordinates in the world frame) is computed as the null space of P by means of SVD.
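A minimal sketch of this decomposition with numpy/scipy (the sign handling is one common convention; a full implementation would also check that R is a proper rotation):

```python
import numpy as np
from scipy.linalg import rq

def decompose_projection(P):
    """Split P into intrinsics K, rotation R and camera centre c."""
    M = P[:, :3]                      # left 3x3 block of P
    K, R = rq(M)                      # RQ decomposition: M = K R, K upper triangular
    # RQ is only unique up to signs, so force a positive diagonal on K
    T = np.diag(np.sign(np.diag(K)))
    K, R = K @ T, T @ R
    # camera centre: null space of P, i.e. the last right-singular vector of P
    _, _, Vt = np.linalg.svd(P)
    c = Vt[-1]
    c = c[:3] / c[3]                  # back from homogeneous coordinates
    return K / K[2, 2], R, c          # normalise K so that K[2, 2] = 1
```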
What is aperture? How does it affect the image quality?
Aperture is the size of the pinhole. The hole can't be infinitely small, so some light rays overlap: a point usually projects to a circle on the image plane (blur effect). If we make the hole smaller, we might get better quality (the rays are clustered better), but with a small aperture we need a longer exposure (photos take longer because we need more light). This might not be practical (wait 10 minutes for a photo? What about video?).
Even with a small aperture, we can get wave effects of the light called diffraction effects, which introduce blur even for very small apertures.
What are diffraction effects?
They happen when the aperture is very small and the wave properties of light blur the image.
How can adding a lens help with image quality? When does it not improve the quality?
Instead of only one light ray from a point reaching the sensor, multiple rays are focused onto the same point on the sensor (more light). This helps if the object is at the right distance from the lens. As soon as it is closer or farther (not in focus), the rays no longer converge to a single point but form a "circle of confusion".
What is a circle of confusion?
When we use a lens and the object is not in focus (too far away or too close to the lens), the light from a single point is spread over several points, creating the circle of confusion.
It also happens when the aperture is too big and the rays overlap a lot, creating a blur effect.
What is the lens formula, how to interpret it?
It is a formula that describes how far the object has to be from the lens to be in focus (no circle of confusion):
1/f = 1/D + 1/D’
f - focal length (distance between lens and a focal point)
D - distance between an object and the lens
D’ - distance between an image plane (sensor) and the lens
This means that the farther away the object is, the more we have to decrease the distance between the lens and the sensor (by moving either the sensor or the lens; usually the lens is moved).
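A tiny worked example of the formula (the numbers are made up for illustration):

```python
def image_distance(f_mm, D_mm):
    """Solve 1/f = 1/D + 1/D' for the lens-to-sensor distance D'."""
    return 1.0 / (1.0 / f_mm - 1.0 / D_mm)

# a 50 mm lens focused on an object 2 m away:
print(image_distance(50.0, 2000.0))   # ~51.3 mm, slightly more than f
# the same lens focused on a very distant object:
print(image_distance(50.0, 1e9))      # ~50.0 mm, D' approaches f
```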
What is the depth of field?
It is the range of distances from the lens that appear in focus.
According to the thin-lens formula only one distance is exactly in focus, but due to the finite pixel size of the sensor there is a whole range of distances that appears in focus. This range is called the depth of field.
How can the depth of field be controlled?
Even with lenses, there is an aperture in front of the lens that controls how much light passes through the lens onto the sensor.
- If we want a larger depth of field, we close the aperture, which makes the system behave more like a pinhole camera (less "out of focus" blur), but less light comes in and we have to increase the exposure.
- For a smaller depth of field, we open the aperture more; light comes in from a wider angle, which narrows the depth of field and gives a stronger blur outside of it.
What is the field of view (FoV)?
It is how much of the scene the camera can observe at a given time. It depends on the focal length and the size of the sensor.
If we have a small focal length (short cameras), the FoV is larger and there is no "zoom effect", which means we have to get closer to objects to photograph them.
If we have a larger focal length (longer cameras), the FoV is smaller, which makes objects appear larger in the image (zoom effect).
How can we calculate the field of view (FoV)?
FoV = 2 * arctan(d / (2f)), where d is the sensor size and f the focal length (some texts give only the half-angle, arctan(d / (2f))).
The larger the focal length, the smaller the FoV.
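A small numeric sketch of the formula (the 36 mm / 50 mm / 100 mm numbers are just an illustrative example):

```python
import math

def field_of_view_deg(sensor_size_mm, focal_length_mm):
    """Full field of view in degrees for a given sensor size and focal length."""
    return math.degrees(2.0 * math.atan(sensor_size_mm / (2.0 * focal_length_mm)))

# a 36 mm wide sensor behind a 50 mm lens:
print(field_of_view_deg(36.0, 50.0))   # ~39.6 degrees
# the same sensor behind a 100 mm lens: larger f, smaller FoV
print(field_of_view_deg(36.0, 100.0))  # ~20.4 degrees
```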
What are some distortions that appear when we have small focal length?
A small focal length means a bigger FoV (field of view), so we have to get closer to an object to photograph it. This makes depth differences more significant in the image (e.g. bigger noses): when you are 30 cm from a face, a 5 cm depth difference is noticeable.
This can be fixed by moving away from the object and increasing the focal length (zoom). From far away, that 5 cm difference is not noticeable and the image looks similar to a pinhole camera, without the exaggerated perspective effects.
What are some flaws when working with lenses?
- Chromatic aberration: different wavelengths of light are refracted by the lens at different angles. This distorts colors and causes color fringing.
- Spherical aberration: with spherical lenses, rays passing farther from the centre focus closer to the lens, so the lens does not focus light perfectly.
- Vignetting: with multiple lenses, the first lens might not refract all the light onto the second lens. Those rays are lost, which darkens the border of the image. Sometimes this is used as an artistic choice.
- Radial distortion: caused by imperfect lenses; it is more noticeable for rays that pass near the edge of the lens. It makes straight lines appear curved in the image.
What is chromatic aberration?
Different wavelengths of light are refracted by the lens at different angles. This distorts colors and causes color fringing.