L10: Object detection Flashcards

1
Q

Linemod

❗️❗️❗️The modalities used by Linemod:

A
  1. Color features:
    Local gradients → take the maximum gradient across the R/G/B channels at each pixel position (no greyscale conversion)
  2. Depth features:
    Depth gradient → local surface normal vector, estimated from point cloud data (has an orientation/direction)
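The max-over-channels color gradient can be sketched in numpy (an illustrative simplification; central differences and the function name are my own, not from the Linemod paper):

```python
import numpy as np

def color_gradient(img):
    """Per-pixel gradient taken from the color channel with the largest
    gradient magnitude (no greyscale conversion). img: H x W x 3 floats."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Central-difference gradients per channel (borders left at zero)
    gx[:, 1:-1, :] = (img[:, 2:, :] - img[:, :-2, :]) / 2.0
    gy[1:-1, :, :] = (img[2:, :, :] - img[:-2, :, :]) / 2.0
    mag = np.hypot(gx, gy)                    # H x W x 3 magnitudes
    best = np.argmax(mag, axis=2)             # channel with max magnitude
    ii, jj = np.indices(best.shape)
    return gx[ii, jj, best], gy[ii, jj, best], mag[ii, jj, best]
```

Taking the strongest channel (rather than converting to greyscale first) keeps gradients that would cancel out in a luminance image.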
2
Q

Linemod

❗️❗️❗️How features are quantized

A

The resulting directions are quantized into predefined orientation bins.

  • Color gradients → Bin the gradient directions into a set of predefined orientation bins over the 0 to 180 degree range. Negative (opposite-sign) gradient directions are omitted, i.e. folded into the same range.
  • Normal vectors → 3D orientations. Normals pointing out towards the camera lie inside a 3D cone; each normal vector is binned into the nearest of a set of predefined sections of that cone.

Both gradients are normalized, so only the direction remains. After quantization, each gradient is represented by an integer bin index instead of coordinates.
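A minimal sketch of orientation binning (the bin count here is an assumption for illustration; the paper uses a small fixed number of bins):

```python
import numpy as np

N_BINS = 8  # assumed bin count, for illustration only

def quantize_orientation(gx, gy):
    """Map a 2D gradient direction to an integer bin index in [0, N_BINS).
    The sign is ignored, so angles are folded into [0, 180) degrees."""
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    return int(angle // (180.0 / N_BINS))
```

Note that opposite gradients (e.g. a dark-to-light vs. light-to-dark edge) land in the same bin, which is what makes the feature robust to contrast reversals.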

3
Q

Linemod

❗️❗️❗️The matching function for color/depth gradients, both pixel-wise and for an image window

A

Cross-correlation between the quantized color and depth gradients:
- Color domain: dot product of each object template gradient with the corresponding image gradient. Since the gradient sign is omitted, anti-parallel gradients also match (absolute value of the cosine).
- Depth domain: same dot product, but surface normals have a defined direction, so only the positive cosine counts (anti-parallel normals do not match).
For an image window, the per-pixel similarities of all template features are summed.
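The pixel-wise similarity can be sketched as follows (my own reading of the card: absolute cosine for color, clamped-positive cosine for normals; function names are illustrative):

```python
import numpy as np

def color_similarity(t, g):
    """Template vs. image color gradient: |cos| of the angle between the
    normalized 2D gradients (sign ignored, so anti-parallel also matches)."""
    t = t / np.linalg.norm(t)
    g = g / np.linalg.norm(g)
    return abs(float(np.dot(t, g)))

def depth_similarity(t, g):
    """Surface normals have a defined direction, so only the positive
    cosine counts: anti-parallel normals score zero."""
    t = t / np.linalg.norm(t)
    g = g / np.linalg.norm(g)
    return max(0.0, float(np.dot(t, g)))
```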

4
Q

Linemod

What is Linemod?

A

An object detection method based on template matching, using multimodal (color + depth) templates.
Advantage of Linemod: plain template matching requires a large number of templates; Linemod reduces that number and speeds matching up, and it can handle scale, viewpoint, and illumination changes.

5
Q

What levels are there?

A

Instance-level → Detect a specific instance of an object

Category-level → Detect an instance of a certain object type (like dog, fridge, oven, dining_table, etc)

6
Q

Linemod

What is spreading and binarization?

A

Spreading introduces a tolerance to small shifts and deformations: each quantized orientation is spread to its neighboring pixel locations. Binarization then encodes the set of spread orientations at each location as a bitmask, so matching reduces to fast bitwise operations and lookup tables. Together they speed up the matching process.
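A sketch of the spreading step (illustrative; the real implementation uses precomputed response lookup tables on top of these bitmasks):

```python
import numpy as np

def spread_orientations(bins, T=3):
    """bins: H x W array of quantized orientation indices (-1 = no feature).
    Returns H x W integer bitmasks where bit k is set if orientation k
    occurs anywhere in the T x T neighborhood (the 'spreading' tolerance)."""
    H, W = bins.shape
    r = T // 2
    padded = np.full((H + 2 * r, W + 2 * r), -1, dtype=int)
    padded[r:r + H, r:r + W] = bins
    masks = np.zeros((H, W), dtype=int)
    for dy in range(T):
        for dx in range(T):
            win = padded[dy:dy + H, dx:dx + W]
            # set bit 'win' wherever a feature exists in this shifted copy
            masks |= np.where(win >= 0, 1 << win.clip(min=0), 0)
    return masks
```

A template feature with bin k then matches a location if `masks[y, x] & (1 << k)` is nonzero, i.e. the template orientation only has to occur *somewhere* in the neighborhood.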

7
Q

Linemod

What is being matched in the gradients?

A

Compares the quantized gradient features of the object template with the corresponding gradient features extracted from the input image.

  • This can be done in both the color and the depth domain.
8
Q

CenterNet

What is CenterNet?

A

It is a category-level detector (it can also be used at instance level).
It is trained to predict bounding boxes around the detected objects.
- It predicts object centers plus object sizes within the image.

9
Q

CenterNet

What are anchor boxes?

A

Anchor boxes are fixed initial bounding-box guesses, tiled over the image at a set of pixel locations, scales, and aspect ratios. The network classifies each anchor and then regresses ("stretches") it to fit the object it has classified.
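A minimal sketch of anchor generation at one location (scales and ratios here are illustrative values, not from any particular detector):

```python
def make_anchors(cx, cy, scales=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """Fixed initial box guesses centred at (cx, cy): one (x1, y1, x2, y2)
    box per (scale, aspect-ratio) pair, all with area scale**2."""
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * r ** 0.5, s / r ** 0.5
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

With these defaults, every pixel location considered by the detector gets 6 candidate boxes, which is why anchor-based detectors evaluate far more candidates than a center-point detector.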

10
Q

CenterNet

❗️❗️❗️How 2D detections are parameterized, and how this is different from regular anchor-based detectors

A

CenterNet scans the image with stride R = 4. At each stride location the classifier predicts whether it is an object center. The object size is also predicted, and an offset correction compensates for the inaccuracy introduced by striding.

In anchor-based detectors, an anchor counts as positive when its overlap with the ground truth is IoU > 0.7; CenterNet instead assigns positives purely by location (one center per object), with no overlap thresholds.

Strides: means the detector moves over the image in 4-pixel steps in the vertical and horizontal directions.
IoU: Intersection over Union (the higher, the better!)
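IoU is the standard overlap measure referenced above; a minimal implementation for axis-aligned (x1, y1, x2, y2) boxes:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

IoU is 1.0 for identical boxes, 0.0 for disjoint ones, so the > 0.7 threshold demands a fairly tight match.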

11
Q

CenterNet

❗️❗️❗️How the three-term loss is built and what the terms mean

A

Training is guided by a three-term loss:
1. Classification ("focal") loss → focuses training on objects, compensating for the overwhelming amount of background. Penalizes wrong center predictions with a modified log-loss.
2. Size loss → penalizes the discrepancy between the predicted and the true object size (a distance penalty; the paper uses an L1 loss).
3. Offset loss → compensates for the downsampling from stride R = 4: when predictions are scaled up 4×, there is a residual offset from the ground-truth center position. The network actively predicts this offset (the offset between the downscaled ground-truth center and the predicted one).
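The focal term can be sketched as follows (a numpy sketch of the penalty-reduced focal loss with the paper's alpha = 2, beta = 4; the ground-truth heatmap is 1 at centers and Gaussian-decayed around them):

```python
import numpy as np

def center_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """Penalty-reduced pixel-wise focal loss on the center heatmap.
    pred, gt: arrays of scores in [0, 1]; gt == 1 exactly at object centers."""
    pos = gt == 1
    # Positive locations: down-weighted when the prediction is already high
    pos_loss = -((1 - pred[pos]) ** alpha * np.log(pred[pos] + eps)).sum()
    # Negative locations: (1 - gt)**beta softens the penalty near true centers
    neg_loss = -(((1 - gt[~pos]) ** beta) * (pred[~pos] ** alpha)
                 * np.log(1 - pred[~pos] + eps)).sum()
    n = max(pos.sum(), 1)  # normalize by the number of objects
    return (pos_loss + neg_loss) / n
```

The `(1 - pred)**alpha` and `pred**alpha` factors are what make the loss "focal": easy, confidently correct pixels contribute almost nothing, so the abundant background cannot dominate training.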

12
Q

CenterNet

❗️❗️❗️How the prediction output is converted back to a bounding box in the full image

A

The box is recovered from the predicted center, size, and offset. Nearby pixels can also get classified as object centers, so an 8-neighbor non-maximum suppression (a center is kept only if it is the maximum of its 3×3 neighborhood) removes overlapping center predictions.
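A sketch of the decoding step (illustrative shapes and names; assumes sizes are predicted in full-image pixels and offsets in output-grid units):

```python
import numpy as np

def decode_centers(heat, wh, offset, R=4, thresh=0.5):
    """Turn network outputs back into full-image boxes.
    heat: H x W center scores; wh: H x W x 2 sizes (full-image pixels);
    offset: H x W x 2 sub-stride corrections; R: output stride.
    A peak is kept only if it is the maximum of its 3x3 (8-neighbor)
    neighborhood -- CenterNet's replacement for box NMS."""
    H, W = heat.shape
    pad = np.pad(heat, 1, constant_values=-np.inf)
    # Max over the 3x3 neighborhood (self + 8 neighbors) of each pixel
    neigh = np.max([pad[dy:dy + H, dx:dx + W]
                    for dy in range(3) for dx in range(3)], axis=0)
    boxes = []
    for y, x in zip(*np.where((heat >= neigh) & (heat > thresh))):
        cx = (x + offset[y, x, 0]) * R   # offset-corrected, scaled back up
        cy = (y + offset[y, x, 1]) * R
        w, h = wh[y, x]
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, heat[y, x]))
    return boxes
```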

13
Q

CenterNet

❗️❗️❗️How to repurpose CenterNet to other tasks, e.g. 3D detection and human body pose estimation by joint positions

A

It can predict more things than only a center point and 2D box:
- 3D box detection → replace/extend the regression heads for the new prediction task (e.g. depth, 3D size, orientation)
- Human body pose estimation → regress a set of joint locations relative to the center
