College 4 Flashcards

1
Q

Name and define 3 properties of convolution.

A
  1. Sparse connectivity: the convolution kernel is much smaller than the input, so each output unit connects to only a few inputs (fewer connections).
  2. Parameter sharing: the kernel coefficients are identical for every input location.
  3. Equivariant representations: the convolution output covaries with the input (if you shift the input image, the output is the same but shifted by the same amount).
2
Q

If you have a 2d convolution with a 32x32 input and a 3x3 filter, how many parameters do you have to learn? And what will be the size of the feature map?

A
  1. 9 (the 3x3 kernel weights)
  2. 30 x 30 (32 - 3 + 1 = 30, assuming stride 1 and no padding)
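
A quick PyTorch check of this card (a sketch; a single input and output channel is assumed, and bias is disabled so the parameter count matches the 9 kernel weights):

import torch
import torch.nn as nn

# 3x3 filter over a 32x32 single-channel input, stride 1, no padding
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)
x = torch.randn(1, 1, 32, 32)

print(sum(p.numel() for p in conv.parameters()))   # 9 (the 3x3 kernel weights)
print(conv(x).shape)                               # torch.Size([1, 1, 30, 30])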

3
Q

If you apply 6 filters in a convolution, how many feature maps do you get?

A

6

4
Q

What is the filter size of a 2d convolution for an input with N channels?

A

(3, 3, N), e.g. for a 3x3 spatial kernel: the filter always extends over all N input channels.

5
Q

Given the 4x4 input:
1,1,2,4
5,6,7,8
3,2,1,0
1,2,3,4

What is the result of applying 2d max pooling with a 2x2 filter and stride 2?

A

6,8

3,4
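
A minimal NumPy check of this result (the reshape trick below is just one way to implement 2x2 max pooling with stride 2):

import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# Split the 4x4 input into non-overlapping 2x2 blocks (stride 2)
# and take the maximum of each block.
blocks = x.reshape(2, 2, 2, 2).swapaxes(1, 2)
pooled = blocks.max(axis=(2, 3))

print(pooled)
# [[6 8]
#  [3 4]]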

6
Q

If you have a convolution over a 7 by 7 input
Filter size: 3x3
stride: 1
What is the output size?

A

5x5

7
Q

If you have a convolution over a 7 by 7 input
Filter size: 3x3
stride: 2
What is the output size?

A

3x3

8
Q

What can be an advantage of maxpooling?

A

more robustness (to little shifts in the input) / better generalisation

9
Q

What can be an advantage of increasing strides?

A

efficiency / space reduction

10
Q
If you have a convolution over an 8 by 8 input
Filter size: 3x3
stride: 3
padding: 2
What is the output size?
A

4x4

11
Q

How do you calculate the width and height of the output feature map?

A

width_out = ((width_in - filter_width + 2 x padding) / stride) + 1

height_out = ((height_in - filter_height + 2 x padding) / stride) + 1

(the division is rounded down)
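
A small Python sketch of this formula (the function name conv_output_size is just for illustration); the printed values reproduce the other cards in this deck:

def conv_output_size(input_size, filter_size, stride=1, padding=0):
    # floor division: the last partial window is discarded
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(7, 3, stride=1))             # 5  (card 6)
print(conv_output_size(7, 3, stride=2))             # 3  (card 7)
print(conv_output_size(8, 3, stride=3, padding=2))  # 4  (card 10)
print(conv_output_size(5, 3, stride=2, padding=1))  # 3  (card 12)
print(conv_output_size(64, 4, stride=2))            # 31 (card 13)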

12
Q
If you have a convolution over a 5 by 5 input
Filter size: 3x3
stride: 2
padding: 1
What is the output size?
A

3x3

13
Q

If you have a convolution over a 64 by 64 by 3 input
Filter size: 4x4x3
filters: 32
stride: 2
What is the number of feature maps?
What is the output width and height?
What is the number of parameters of the convolutional layer?

A
output width: 31
output height: 31
number of feature maps: 32
number of parameters: 1568
(4 x 4 x 3 weights per filter x 32 filters + 32 biases, one per filter)
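
A quick PyTorch sanity check of these numbers (a sketch using the standard nn.Conv2d layer, which has one bias per filter):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=4, stride=2)
x = torch.randn(1, 3, 64, 64)

print(conv(x).shape)                               # torch.Size([1, 32, 31, 31])
print(sum(p.numel() for p in conv.parameters()))   # 1568 = 4*4*3*32 weights + 32 biases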
14
Q

Define: transposed convolution

A

A fixed, hand-chosen upsampling transformation is not always useful, so a more robust way to upsample is to learn some filters that map a feature map to a larger one.

15
Q

Apply transposed convolution (stride 1, no padding):
input =
0, 1
2, 3

kernel =
0, 1
2, 3

A

0, 0, 1
0, 4, 6
4, 12, 9
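
The same computation in PyTorch (a sketch, assuming stride 1 and no padding as in the answer above):

import torch
import torch.nn.functional as F

x = torch.tensor([[0., 1.],
                  [2., 3.]]).reshape(1, 1, 2, 2)   # (batch, channels, H, W)
k = torch.tensor([[0., 1.],
                  [2., 3.]]).reshape(1, 1, 2, 2)   # (in_channels, out_channels, kH, kW)

# Each input value scales the kernel and is added into the output
# at the corresponding offset, producing a 3x3 map.
y = F.conv_transpose2d(x, k)
print(y.squeeze())
# tensor([[ 0.,  0.,  1.],
#         [ 0.,  4.,  6.],
#         [ 4., 12.,  9.]])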

16
Q

Name an application of transposed convolution

A

automatic colorization (encoder-decoder)

17
Q

What is the size of a filter in a 1 by 1 convolution?

A

1x1

18
Q

Why would you apply a 1 by 1 convolution?

A

The deeper you get into the network, the more feature maps you have. If the network gets too big, you can apply a 1x1 convolution to reduce the number of feature maps (the channel dimension), producing a compressed representation.

19
Q

How does 1x1 convolution work?

A

Input feature map of shape (W, H, N).
Apply m filters of size (1, 1, N), with m < N.
The output has shape (W, H, m): the spatial size is preserved and the number of feature maps is reduced from N to m.
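
A minimal PyTorch illustration of these shapes (N = 64 and m = 16 are arbitrary example values):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)   # feature map with N = 64 channels, 32x32 spatial size
conv1x1 = nn.Conv2d(in_channels=64, out_channels=16, kernel_size=1)   # m = 16 filters of size (1, 1, 64)

print(conv1x1(x).shape)          # torch.Size([1, 16, 32, 32]): spatial size kept, channels reduced 64 -> 16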

20
Q

What is the goal of inception architectures?

A

Increasing the depth and width of the network while keeping the computational budget constant

21
Q

How does the naïve version of the inception module work?

A
The information from the previous layer goes through:
1x1 convolutions
3x3 convolutions
5x5 convolutions
3x3 max pooling

and the resulting feature maps are then concatenated (along the channel dimension)

22
Q

What was the problem with the naïve version of the inception module?

A

It did not keep the computational budget constant.

23
Q

What were the improvements made to the naïve version of the inception module?

A

1x1 convolutions were applied before the 3x3 and 5x5 convolutions, and also after the 3x3 max pooling, to reduce the number of channels and keep the computational cost down (see the sketch below).
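
A compact PyTorch sketch of an inception module with these 1x1 reductions (the channel counts are arbitrary example values, not the ones from the GoogLeNet paper):

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        # branch 1: plain 1x1 convolution
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        # branch 2: 1x1 reduction, then 3x3 convolution
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),
                                nn.Conv2d(16, 24, kernel_size=3, padding=1))
        # branch 3: 1x1 reduction, then 5x5 convolution
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 4, kernel_size=1),
                                nn.Conv2d(4, 8, kernel_size=5, padding=2))
        # branch 4: 3x3 max pooling, then 1x1 convolution
        self.b4 = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 8, kernel_size=1))

    def forward(self, x):
        # concatenate the four branches along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(InceptionModule(32)(x).shape)   # torch.Size([1, 56, 28, 28]) = 16 + 24 + 8 + 8 channels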

24
Q

What happens when you add more layers (get a deeper network)?

A

The intuition is that more layers mean more parameters, a more powerful network, and better performance, but that is not always the case: a 20-layer network can outperform a 56-layer network. Each layer passes its output on to the next layer, which passes it on again, and so on. If one layer extracts information that is not very useful, the next layer has to try to learn something from that non-useful information. The later layers receive input that has been processed many times and has probably lost information that was present in the original input.

25
Q

What is the idea of ResNets?

A

Give the network the option to simply copy the input if the function F(x) (the information passed on from the previous layer) is not informative: the block outputs F(x) + x, so the skip connection adds the input to the output (see the sketch below).
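
A minimal residual block sketch in PyTorch (the two-convolution body and the channel count are illustrative assumptions, not the exact ResNet design):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions that keep the spatial size and channel count
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # skip connection: the input is added to F(x), so the block can
        # fall back to (roughly) the identity if F(x) is not informative
        return self.relu(self.f(x) + x)

x = torch.randn(1, 32, 16, 16)
print(ResidualBlock(32)(x).shape)   # torch.Size([1, 32, 16, 16])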

26
Q

What are attention mechanisms?

A

Attention mechanisms highlight the most informative features in an image. This process runs in parallel to the feature extraction and creates a mask that suppresses the features that are not important (see the sketch below).
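
A toy sketch of such a mask in PyTorch (this sigmoid-mask design is a generic illustration, not a specific published attention module):

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # runs alongside the main features: predicts one attention value per spatial location
        self.mask_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, features):
        mask = torch.sigmoid(self.mask_conv(features))   # values in (0, 1), shape (B, 1, H, W)
        return features * mask                           # unimportant locations are scaled toward zero

features = torch.randn(1, 32, 14, 14)
print(SpatialAttention(32)(features).shape)   # torch.Size([1, 32, 14, 14])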

27
Q

What are the main methods for object detection?

A
  • Region proposals - R-CNN (Fast R-CNN and Faster R-CNN)
  • You Only Look Once - YOLO
  • Single Shot MultiBox Detector - SSD
  • RetinaNet