L12: 3D classification Flashcards

1
Q

What is 3D Classification?

A

3D classification refers to the task of classifying or categorizing three-dimensional (3D) objects or scenes into different classes or categories. It involves analyzing the spatial structure and characteristics of 3D data to make predictions about their class labels.

2
Q

How do Deep Sets and PointNet achieve invariance?

A

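By pooling the per-point responses with an order-independent operation, so the result does not change when the points are reordered.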

Deep Sets: sum-pooling - collapse the responses to a single sum
PointNet: max-pooling - collapse the responses to a single value, the maximum
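A minimal sketch of why both pooling choices ignore the order of the points. The `responses` array is made-up data standing in for per-point network outputs, and `sum_pool` and `max_pool` are illustrative names, not functions from either paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-point responses: 5 points, 4 feature channels each.
responses = rng.normal(size=(5, 4))

# The same responses with the points listed in a different order.
shuffled = responses[rng.permutation(5)]

def sum_pool(x):
    """Deep Sets-style pooling: add the responses over the point axis."""
    return x.sum(axis=0)

def max_pool(x):
    """PointNet-style pooling: keep the per-channel maximum over the point axis."""
    return x.max(axis=0)

# Both pooled vectors have one entry per channel, whatever the point order.
same_sum = np.allclose(sum_pool(responses), sum_pool(shuffled))
same_max = np.array_equal(max_pool(responses), max_pool(shuffled))
```

Both `same_sum` and `same_max` should be true, because neither a sum nor a maximum depends on the order of its arguments.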

3
Q

What do the two methods (Deep Sets and PointNet) take as input?

A

A point cloud of an object

4
Q

What does equivariance mean?

A

Loose definition: the output “follows” the disturbance applied to the input.

5
Q

What are Point Clouds?

A

A data structure consisting of an unordered set of 3D points

6
Q

How can Deep Sets be used for 3D Classification?

A

Deep Sets can be used to process 3D point clouds by treating each point as an element in the set. The encoder network processes each point’s features, and the pooling function aggregates the point features into a global representation. This global representation can then be fed into a fully connected layer for classification.
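The pipeline described above (encode each point, sum-pool, classify) can be sketched as follows. This is an untrained toy with random weights, not a reference implementation: the layer sizes, `NUM_CLASSES`, and the function name `deep_sets_scores` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 3  # hypothetical number of object classes

# Random, untrained weights; a real model would learn these.
W_enc = rng.normal(size=(3, 16))            # encoder: xyz -> 16 features
W_cls = rng.normal(size=(16, NUM_CLASSES))  # classifier on the pooled vector

def deep_sets_scores(points):
    """points: (N, 3) array. Returns a (NUM_CLASSES,) vector of class scores."""
    per_point = np.maximum(points @ W_enc, 0.0)  # encode every point (ReLU)
    pooled = per_point.sum(axis=0)               # sum-pool over the set
    return pooled @ W_cls                        # fully connected layer

cloud = rng.normal(size=(100, 3))  # made-up point cloud with 100 points
scores = deep_sets_scores(cloud)
predicted_class = int(np.argmax(scores))
```

Feeding the same cloud with its points in reverse order, `deep_sets_scores(cloud[::-1])`, should give the same scores up to floating-point rounding.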

7
Q

How can PointNet be used for 3D Classification?

A

PointNet can be used by treating the input point cloud as a set of points. The shared MLPs process the features of each point independently, and the max pooling operation aggregates the point features into a global representation. The global feature vector can then be passed through fully connected layers for classification.
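A rough sketch of that classification path, with the emphasis on what "shared" means: the same weights are applied to every point. The weights are random and untrained, the layer sizes and the four classes are invented for the example, and the T-Net alignment step is left out.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random, untrained weights; sizes are illustrative only.
W1, b1 = rng.normal(size=(3, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 64)), np.zeros(64)
W_fc = rng.normal(size=(64, 4))  # 4 hypothetical classes

def shared_mlp(x):
    """Apply the same two-layer MLP to one point (3,) or to all points (N, 3)."""
    h = np.maximum(x @ W1 + b1, 0.0)
    return np.maximum(h @ W2 + b2, 0.0)

def pointnet_logits(points):
    """points: (N, 3) array. Returns one score per class."""
    features = shared_mlp(points)           # (N, 64), each row computed independently
    global_feature = features.max(axis=0)   # max pooling over the points -> (64,)
    return global_feature @ W_fc            # fully connected classification layer

cloud = rng.normal(size=(50, 3))  # made-up point cloud
logits = pointnet_logits(cloud)
```

Because every row of `features` depends only on its own point, `shared_mlp(cloud)[7]` matches `shared_mlp(cloud[7])`, and shuffling the rows of `cloud` leaves `logits` unchanged up to rounding.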

8
Q

What do equivariance and invariance mean?

A

Equivariance refers to the property where the output of a neural network transforms in a corresponding way when the input data undergoes a transformation. In other words, if the input is transformed, the output is transformed in the same way.

Invariance, on the other hand, refers to the property where the output of a neural network remains unchanged or invariant to certain transformations applied to the input data. In other words, regardless of the transformation applied to the input, the output remains the same.
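To make the two definitions concrete, here is a small check that uses reordering of the points as the transformation. The layer weights `W` and the function names are hypothetical; any per-point layer followed by pooling would behave the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))  # hypothetical per-point layer weights

def per_point_layer(points):
    """Equivariant to point order: row i of the output depends only on point i."""
    return np.tanh(points @ W)

def pooled_feature(points):
    """Invariant to point order: maximum over the point axis."""
    return per_point_layer(points).max(axis=0)

points = rng.normal(size=(6, 3))
perm = rng.permutation(6)  # the transformation: a reordering of the points

# Equivariance: transforming the input transforms the output the same way.
equivariant_ok = np.allclose(per_point_layer(points[perm]),
                             per_point_layer(points)[perm])

# Invariance: transforming the input leaves the output unchanged.
invariant_ok = np.allclose(pooled_feature(points[perm]),
                           pooled_feature(points))
```

Both checks should pass: the per-point outputs are reordered exactly like the input, while the pooled feature is identical for either ordering.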

9
Q

When do we want equivariance?

A

Equivariance is desirable when the transformation information is meaningful and should be preserved in the output. The network's features then change along with the input instead of discarding the transformation, as in segmentation, where the per-point labels should follow the points when the cloud is reordered or moved.

10
Q

When do we want invariance?

A

Invariance is useful when the specific transformation applied to the input is considered irrelevant or when the desired output should not depend on that transformation. It allows the network to learn higher-level features that are invariant to certain transformations, leading to more robust and generalizable representations.

11
Q

What does it mean that PointNet takes a permutation-invariant approach?

A

It means that the architecture is designed to process unordered point clouds without relying on any specific order or permutation of the points. It treats each point independently and aggregates their features to obtain a global representation of the entire point cloud.

12
Q

PointNet: How can we ignore translations?

A

By centering (demeaning) the inputs
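Centering can be sketched in a couple of lines. The function name `center` and the example offset are made up for illustration.

```python
import numpy as np

def center(points):
    """Subtract the centroid so the mean of the cloud sits at the origin."""
    return points - points.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(20, 3))               # made-up point cloud
shifted = cloud + np.array([5.0, -2.0, 10.0])  # the same object, translated

centered_original = center(cloud)
centered_shifted = center(shifted)
```

After centering, the original and the translated cloud agree up to floating-point rounding, so whatever network comes next never sees the translation.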

13
Q

PointNet: How can we ignore the rotations?

A

Two possibilities:

Data augmentation: create many rotated copies of each point cloud and train on all of them.

  • DRAWBACK: many more training iterations are needed

Or do as PointNet does and add a transformation network (T-Net) to the layers.

The T-Net has 9 output neurons. Reshape these 9 numbers into a 3x3 matrix and multiply it with the input point cloud. The hope is that the T-Net learns to “transform” every point cloud into a new space where the rotation doesn’t matter (much).
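The reshape-and-multiply step can be sketched as below. This is an untrained stand-in, not the published architecture: the weights are random, the single pooled layer is a simplification, and adding the identity matrix is an assumption made here so the toy starts close to "no change".

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained stand-in weights; sizes are illustrative only.
W_feat = rng.normal(size=(3, 32)) * 0.1
W_out = rng.normal(size=(32, 9)) * 0.1

def t_net(points):
    """Predict a 3x3 matrix from the whole cloud. points: (N, 3) array."""
    # Summarize the cloud with a per-point layer followed by max pooling.
    features = np.maximum(points @ W_feat, 0.0).max(axis=0)  # (32,)
    nine = features @ W_out                                  # the 9 output neurons
    # Assumption: add the identity so an untrained net barely alters the input.
    return nine.reshape(3, 3) + np.eye(3)

def align(points):
    """Multiply every point by the predicted 3x3 matrix."""
    return points @ t_net(points)

cloud = rng.normal(size=(40, 3))  # made-up point cloud
matrix = t_net(cloud)
aligned = align(cloud)
```

The output `aligned` has the same shape as the input, so the rest of the network can consume it unchanged. Nothing forces the learned matrix to be a pure rotation; it is simply whatever 3x3 matrix the 9 outputs form.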

14
Q

PointNet: What is the purpose of segmentation in it?

A

Adding a segmentation network to PointNet lets it separate the different parts of a point cloud.

For example, given a knife, it can label which points make up the blade and which make up the handle.
