# Quiz 2 Flashcards

1
Q

You have an input volume of 32×32×3. What are the dimensions of
the resulting volume after convolving a 5×5 kernel with zero padding,
stride of 1, and 2 filters?

A

With zero padding (P = 0) and a stride of 1, each spatial dimension is (32 − 5)/1 + 1 = 28, and the output depth equals the number of filters.

Therefore, the resulting volume is 28 × 28 × 2.
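The output shape and the parameter count can both be checked with a small helper (a sketch; the function names are my own):

```python
def conv_output_shape(h, w, k, p=0, s=1, filters=1):
    """Output volume of a conv layer: ((dim - k + 2P) / S + 1) per spatial dim."""
    out_h = (h - k + 2 * p) // s + 1
    out_w = (w - k + 2 * p) // s + 1
    return (out_h, out_w, filters)

def conv_param_count(k, depth, filters):
    """(k * k * depth + 1) * filters -- weights plus one bias per filter."""
    return (k * k * depth + 1) * filters

print(conv_output_shape(32, 32, k=5, p=0, s=1, filters=2))  # (28, 28, 2)
print(conv_param_count(5, 3, 2))  # 152
```

The same helpers reproduce the answers to the 1×1-filter and 300×300 cards below.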

2
Q
Consider a document collection of 100 documents. Given a query q, the set of documents relevant to the user is D* = {d3, d12, d34, d56, d98}. An IR system retrieves the documents D = {d3, d12, d35, d56, d66, d88, d95}.

• Compute the number of True-Positives, True-Negatives, False-Positives, and False-Negatives.

• Compute Precision, Recall, and Accuracy.
A
TP = 3, TN = 91, FP = 4, FN = 2

Precision = 3/7, Recall = 3/5, Accuracy = 94/100
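These counts fall out of basic set operations (a sketch; the variable names are mine):

```python
relevant = {"d3", "d12", "d34", "d56", "d98"}          # D*
retrieved = {"d3", "d12", "d35", "d56", "d66", "d88", "d95"}  # D
N = 100  # collection size

tp = len(relevant & retrieved)   # retrieved and relevant
fp = len(retrieved - relevant)   # retrieved but not relevant
fn = len(relevant - retrieved)   # relevant but not retrieved
tn = N - tp - fp - fn            # everything else

precision = tp / len(retrieved)
recall = tp / len(relevant)
accuracy = (tp + tn) / N
print(tp, tn, fp, fn)            # 3 91 4 2
```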
3
Q

You have an input volume of 32×32×3. What are the dimensions of
the resulting volume after convolving a 5×5 kernel with zero padding,
stride of 1, and 2 filters?
How many weights and biases would you have?

A

k1 × k2 × depth × (num. of filters) + (num. of filters)

= 5 × 5 × 3 × 2 + 2 = 152

4
Q

Output size of vanilla Convolution

A

(H − k1 + 1) × (W − k2 + 1)

5
Q

Suppose you have an input volume of dimension 64x64x16. How many
parameters would a single 1x1 convolutional filter have, including the
bias?

A

17

6
Q

Suppose your input is a 300 by 300 color (RGB) image, and you use
a convolutional layer with 100 filters that are each 5x5. How many
parameters does this layer have including the bias parameters?

A

7600

7
Q

You have an input volume that is 63×63×16 and convolve it with 32
filters that are each 7×7, with a stride of 1. You want to use a "same"
convolution, so the output is also 63×63. What padding P do you need?

A

((63 − 7 + 2P) / 1) + 1 = 63

Solve for P = 3
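The algebra can be checked directly (a sketch; for stride 1, "same" padding is always (k − 1)/2):

```python
# For a 'same' convolution with stride 1: (n - k + 2P)/1 + 1 = n  =>  P = (k - 1)/2
n, k = 63, 7
P = (k - 1) // 2
assert (n - k + 2 * P) // 1 + 1 == n  # output spatial size stays 63
print(P)  # 3
```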

8
Q

Sigmoid

A

0 to 1

Saturates at both ends (kills gradients); computing it requires an exponential term

9
Q

Tanh

A

-1 to 1 (centered at 0)

Still computationally heavy

10
Q

Relu

A

No saturation on positive end

Can cause dead neurons (zero gradient when x ≤ 0)

Cheap to compute

11
Q

Leaky relu

A

Small slope for x < 0 keeps negative units alive (the slope is a fixed constant; the PReLU variant makes it a learnable parameter)

No saturation

Still cheap to compute
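The four activations above can be written in a few lines each (a sketch in NumPy; the 0.01 leaky slope is a common default, not from the slides):

```python
import numpy as np

def sigmoid(x):
    """Range (0, 1); uses exp, saturates at both ends."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Range (-1, 1), zero-centered; still uses exp."""
    return np.tanh(x)

def relu(x):
    """No saturation on the positive end; zero gradient for x <= 0."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Small negative slope avoids dead neurons."""
    return np.where(x > 0, x, alpha * x)
```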

12
Q

Which activation is best?

A

ReLU is typical starting point

Sigmoid is typically avoided

13
Q

Initialization

A

Initialization that is close to a good (local) minima will converge faster and to a better solution

Initializing values to a constant value leads to a degenerate solution!

Xavier Initialization → Lesson 3, Slide 26
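A minimal sketch of Xavier-style initialization (assuming the 1/fan_in variance variant; Glorot's paper uses 2/(fan_in + fan_out)):

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=None):
    """Scale weight variance by fan-in so activation magnitudes
    stay roughly constant from layer to layer."""
    if rng is None:
        rng = np.random.default_rng(0)
    std = np.sqrt(1.0 / fan_in)  # variance 1/fan_in
    return rng.normal(0.0, std, size=(fan_in, fan_out))
```

Initializing every weight to the same constant instead would make all units in a layer compute identical outputs and receive identical gradients, which is the degenerate solution the card warns about.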

14
Q

Issues with optimizers

A

Ill-conditioned loss surface (curvature varies greatly across directions, so a single global learning rate fits poorly)

15
Q

Optimization types

A

RMSProp

Keeps a moving average of squared gradients

Uses these gradient statistics to adapt (reduce) the learning rate per parameter across iterations

Adam

Maintains both first and second moment statistics for gradients
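The RMSProp update can be sketched as follows (an illustrative implementation with common defaults, not taken from the slides):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update: the moving average of squared gradients
    shrinks the effective step for parameters with large gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache
```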

16
Q

Drop out

A

Dropout: for each node, keep its output with probability p; activations of dropped nodes are set to zero

In practice, implement with a mask calculated each iteration

During testing, no nodes are dropped (activations are scaled by p at test time, or equivalently by 1/p during training — "inverted dropout")

Can be seen as:

Training an ensemble of 2^n thinned networks (n = number of nodes), or

Forcing the model not to rely too heavily on any particular feature
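An inverted-dropout forward pass can be sketched as follows (assuming p is the keep probability, as in the card):

```python
import numpy as np

def dropout_forward(x, p=0.5, train=True, rng=None):
    """Inverted dropout: during training keep each unit with probability p
    and scale kept activations by 1/p, so the test-time pass is unchanged."""
    if not train:
        return x  # no nodes dropped at test time
    if rng is None:
        rng = np.random.default_rng(0)
    mask = (rng.random(x.shape) < p) / p  # new mask each iteration
    return x * mask
```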

17
Q

Techniques for handling class imbalance?

A

Sampling

SMOTE (Synthetic Minority Oversampling Technique)

Identify a minority example's nearest neighbors in feature space, select a subset of those neighbors, then uniformly sample new points along the line segments connecting the example to them

Cost-based learning

Focal Loss

Down-weights easy examples (well-classified, high-probability examples) so training focuses on the hard ones
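The binary case of focal loss can be sketched as follows (an illustrative implementation; γ = 2 is a common default, not from the slides):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: the (1 - p_t)^gamma factor down-weights easy,
    well-classified examples; gamma = 0 recovers plain cross-entropy."""
    p_t = np.where(y == 1, p, 1 - p)  # probability assigned to the true class
    return -((1 - p_t) ** gamma) * np.log(p_t)
```

For example, a confidently correct prediction (p_t = 0.95) contributes far less loss than a poorly classified one (p_t = 0.2), which is the down-weighting the card describes.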