Quiz 3 - CNN Architecture, Visualization, Advanced CV Architecture Flashcards

Question 1

Q

T/F: Visualization makes assessing interpretability easy

Answer

A

False

Visualization leads to some interpretable representations, but they may be misleading or uninformative
Assessing interpretability is difficult
- Requires user studies to show usefulness
Neural networks learn distributed representation
- no one node represents a particular feature
- makes interpretation difficult

Question 2

Q

Steps to obtaining Gradient of Activation with respect to input

Answer

A

Pick a neuron
Run forward method up to layer we care about
Find gradient of its activation w.r.t input image
Can first find highest activated image patches using its corresponding neuron (based on receptive field)

Question 3

Q

T/F: A single-pixel change can make a NN wrong

Answer

A

True (single-pixel attacks)

Question 4

Q

Shape vs. Texture Bias

Answer

A

Ex: take picture of cat and apply texture of elephant
- Humans are biased towards shape (will see cat)
- Neural Networks are biased towards texture (will classify cat as elephant, likely)

Question 5

Q

Estimation Error

Answer

A

Even the weights that best minimize training error may not generalize to the test set (i.e. overfitting, or non-generalizable features in the training data)

Question 6

Q

Limitations to Transfer Learning

Answer

A

If source dataset you train on is very different from target dataset
If you have enough data for the target domain, it just results in faster convergence

Question 7

Q

____ can be used to detect dataset bias

Answer

A

Gradient-based visualizations

Question 8

Q

Saliency Maps

Answer

A

Shows us what we think the neural network may find important in the input
- sensitivity of loss to individual pixel changes
- large sensitivity implies important pixels

Question 9

Q

What is non-semantic shift for label data?

Answer

A

Two images of the same category that differ in appearance or style (the domain changes, the label does not)

Ex: Two pictures of bird but different – one a picture one a sketch

Question 10

Q

T/F: CNNs have scale invariance

Answer

A

True - but only some

Question 11

Q

low-labeled setting: domain generalization

Answer

A

Source
- multiple labeled
target
- unknown
shift
- non-semantic

Question 12

Q

T/F: For larger networks, estimation error can increase

Answer

A

True - With a small amount of data and a large number of parameters, we could overfit

Question 13

Q

Backward Pass: Deconvnet

Answer

A

Pass back only the positive gradients

Question 14

Q

AlexNet - Key aspects

Answer

A

ReLU instead of sigmoid/tanh
Specialized normalization layers
PCA-based data augmentation
Dropout
Ensembling

Question 15

Q

Gram Matrix

Answer

A

Take a pair of channels in a feature map of n layers
- Get correlation (dot product) between features and then sum it up
Feed into larger matrix (Gram) to get correlation of all features
Get Gram matrix loss for style image with respect to generated image
Get content loss (difference between feature maps) for content image with respect to generated image
Sum up the losses with parameters (alpha, beta) for proportion of total loss contributed by each term
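
A minimal sketch of the Gram matrix step, assuming each channel of a feature map has already been flattened to a list of H*W activations. The names `gram_matrix`, `style_loss` and the toy values are illustrative, not from the original notes.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of H*W activations.
    Returns the C x C matrix of channel-by-channel dot products."""
    c = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(c)] for i in range(c)]

def style_loss(gen_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(gen_features), gram_matrix(style_features)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)

# Two channels over a 2x2 feature map, flattened to 4 values each.
feats = [[1.0, 0.0, 2.0, 1.0],
         [0.0, 1.0, 1.0, 3.0]]
G = gram_matrix(feats)

# Permuting spatial positions identically in every channel leaves G unchanged,
# which is why the Gram matrix discards most spatial information.
shuffled = [[ch[k] for k in (3, 0, 2, 1)] for ch in feats]
```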

Question 16

Q

Low-labeled setting: Semi-supervised learning

Answer

A

Source
- single labeled (usually much less)
target
- single unlabeled
shift
- none

Question 17

Q

low-labeled setting: cross-category transfer

Answer

A

Source
- single labeled
target
- single unlabeled
shift
- semantic

Question 18

Q

T/F: We can generate images from scratch using gradients to obtain an image with maximized score for a given class?

Answer

A

True - Image optimization

Question 19

Q

Creating alternating layers in a CNN (convolution/non-linear, pooling, and fully connected layers at the end) results in a ________ receptive field.

Answer

A

It results in an increasing receptive field for a particular pixel deep inside the network.

Question 20

Q

What is the problem for visualization in modern Neural Networks?

Answer

A

Small filters such as 3x3

Small convolution outputs are hard to interpret

Question 21

Q

Increasing the depth of a NN leads to ___ error (higher/lower)

Answer

A

higher - hard to optimize (but can be mitigated with residual blocks/skip connections)

Question 22

Q

Since the outputs of convolution and pooling layers are ______ we can __________ them

Answer

A

Since the outputs of convolution and pooling layers are (multi-channel) images we can sequence them just as any other layer

Question 23

Q

What is semantic shift for labeled images?

Answer

A

Both are images, but of different things (the categories themselves differ)

Question 24

Q

Most parameters in the ___ layer of a CNN

Answer

A

Fully Connected Layer - input x output dimensionality + bias
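
A rough parameter count makes the comparison concrete. The layer sizes below are VGG-like values chosen for illustration (an assumption, not from the card).

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weights of one conv layer plus one bias per output channel."""
    return k_h * k_w * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Input x output dimensionality, plus one bias per output node."""
    return n_in * n_out + n_out

conv = conv_params(3, 3, 512, 512)      # a deep 3x3 conv layer
fc = fc_params(7 * 7 * 512, 4096)       # first FC layer on a 7x7x512 feature map
print(conv, fc)
```

With these sizes the single fully connected layer holds roughly 40 times as many parameters as the conv layer.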

Question 25

Q

Normal backpropagation is not always the best choice for gradient-based visualizations because…?

Answer

A

You may get parts of image that decrease the feature activation
- likely lots of these input pixels

Question 26

Q

Grad-CAM

Answer

A

Feed image through CNN (only convolution part) for last Convolution Feature Map (most abstract features closest to classification on the network).
Following CNN with any Task-specific network (classification, question/answering)
Backprop until convolution
1. Obtain a feature map the size of the original feature maps
2. Obtain per-channel weighting (global average pooling for each channel of gradient) for neuron importance, then normalize
Multiply feature maps with their weighting
Feed through ReLU to obtain only positive features
Final result, values that are important will have higher values
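
The weighting and ReLU steps can be sketched as below, assuming the last conv feature maps and the gradient of the task score with respect to them are already available (in practice they come from a forward and backward pass through a real network). The final normalization step is omitted, and `grad_cam` with its toy inputs is illustrative.

```python
def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: K channels, each an H x W nested list.
    gradients[k] is d(score)/d(feature_maps[k])."""
    k = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Per-channel weight: global average pooling of that channel's gradient.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for c in range(k):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weights[c] * feature_maps[c][i][j]
    # ReLU keeps only locations with a positive influence on the score.
    return [[max(0.0, v) for v in row] for row in cam]

A = [[[1.0, 0.0], [0.0, 2.0]],
     [[0.0, 3.0], [1.0, 0.0]]]
dA = [[[0.5, 0.5], [0.5, 0.5]],
      [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(A, dA)
```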

Question 27

Q

VGG - Key Aspects

Answer

A

Repeating particular blocks of layers
- 3x3 conv with small strides
- 2x2 max pooling stride 2
Very large number of parameters

Question 28

Q

Convolution layers have the property of _____ and output has the property of _______

(choose translation equivariance or invariance for each)

Answer

A

Convolution layers have the property of translation equivariance and output has the property of invariance

Note: Some rotation invariance and scale invariance (only some)

Question 29

Q

Visualizing Neural Network Methods

Answer

A

Weights (kernels)
- See what edges are detected in kernels
Activations
- What does image look like in activation layer
Gradients
- Assess what is used for the optimization itself
Robustness
- See what weaknesses/bias are of NN

Question 30

Q

The gradient of the Convolution layer Kernel is equivalent to the _________

Answer

A

Cross-Correlation between the upstream gradient and input (until K₁xK₂ output)
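
A small numeric check of this statement, assuming a single-channel input, stride 1, no padding, and a loss equal to the sum of the outputs (so the upstream gradient is all ones). All names and values are illustrative.

```python
def cross_correlate(x, k):
    """Valid cross-correlation of 2D lists x (H x W) and k (K1 x K2)."""
    h, w, k1, k2 = len(x), len(x[0]), len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(k1) for b in range(k2))
             for j in range(w - k2 + 1)] for i in range(h - k1 + 1)]

x = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [2.0, 1.0, 1.0]]
kernel = [[1.0, -1.0],
          [0.5, 2.0]]

y = cross_correlate(x, kernel)                  # forward pass, 2x2 output
upstream = [[1.0] * len(y[0]) for _ in y]       # dL/dy for L = sum(y)

# dL/dK: slide the upstream gradient over the input.
# The result has the kernel's size, K1 x K2.
grad_kernel = cross_correlate(x, upstream)
```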

Question 31

Q

Defenses for adversarial attacks

Answer

A

training with adversarial examples
perturbations, noise, or re-encoding of inputs
there are no universal methods to prevent attacks

Question 32

Q

T/F: Computer vision segmentation algorithms can be applied directly to gradients to get image segments

Question 33

Q

Exploring the space of possible architecture (methods)

Answer

A

Evolutionary Learning and Reinforcement Learning
Prune over-parameterized networks
Learning of repeated blocks is typical

Question 34

Q

The gradient of the loss with respect to the input image is equivalent to ____

Answer

A

Convolution between the upstream gradient and the kernel
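
A sketch of this for one conv layer, assuming a single channel, stride 1, no padding, and a loss equal to the sum of the outputs. The scatter loop below is one way to write the full convolution of the upstream gradient with the kernel. Names and values are illustrative.

```python
def cross_correlate(x, k):
    """Forward pass: valid cross-correlation of x (H x W) with k (K1 x K2)."""
    h, w, k1, k2 = len(x), len(x[0]), len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(k1) for b in range(k2))
             for j in range(w - k2 + 1)] for i in range(h - k1 + 1)]

def input_gradient(upstream, k):
    """Full convolution of the upstream gradient with the kernel."""
    k1, k2 = len(k), len(k[0])
    oh, ow = len(upstream), len(upstream[0])
    grad = [[0.0] * (ow + k2 - 1) for _ in range(oh + k1 - 1)]
    for i in range(oh):
        for j in range(ow):
            for a in range(k1):
                for b in range(k2):
                    # y[i][j] used x[i+a][j+b] * k[a][b] in the forward pass.
                    grad[i + a][j + b] += upstream[i][j] * k[a][b]
    return grad

x = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [2.0, 1.0, 1.0]]
kernel = [[1.0, -1.0],
          [0.5, 2.0]]
upstream = [[1.0, 1.0], [1.0, 1.0]]     # dL/dy for L = sum(y)
grad_input = input_gradient(upstream, kernel)   # same size as x
```

The center pixel is touched by every kernel position, so its gradient is the sum of all kernel weights; corner pixels see only one kernel weight each.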

Question 35

Q

Backward Pass:

Guided Backpropagation

Answer

A

Zero out gradient for negative values in forward pass
Zero out negative gradients
Only propagate positive influence
Like a combination of backprop and deconvnet
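
The three backward rules differ only in how gradients pass through a ReLU. A minimal sketch on flat lists (the function name and `mode` strings are illustrative):

```python
def relu_backward(forward_input, upstream, mode="backprop"):
    """Gradient through a ReLU under three visualization rules."""
    out = []
    for x, g in zip(forward_input, upstream):
        if mode == "backprop":       # zero where the forward input was negative
            out.append(g if x > 0 else 0.0)
        elif mode == "deconvnet":    # zero where the gradient itself is negative
            out.append(g if g > 0 else 0.0)
        elif mode == "guided":       # apply both rules
            out.append(g if (x > 0 and g > 0) else 0.0)
        else:
            raise ValueError(mode)
    return out

x_fwd = [1.0, -1.0, 2.0, -3.0]
grad = [0.5, 0.7, -0.2, -0.4]
for m in ("backprop", "deconvnet", "guided"):
    print(m, relu_backward(x_fwd, grad, m))
```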

Question 36

Q

Gradient Ascent

Answer

A

Compute the gradient of the score for a particular class with respect to the input image
- Add the learning rate times gradient to maximize score (not subtracting)
Algorithm
- Start from random/zero image
- Compute forward pass
- Compute gradients
- Perform Ascent
- Iterate
Note: Uses scores to avoid minimizing other class scores
Need regularization as well
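
A toy version of the algorithm, assuming a linear class score s(x) = w . x so the gradient can be written by hand (a real network would need a forward pass and backprop at each step). The L2 penalty `lam` stands in for the regularization mentioned above. All names and values are illustrative.

```python
def class_score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def gradient_ascent(w, steps=50, lr=0.1, lam=0.1):
    """Maximize w . x - lam * ||x||^2 over the image x."""
    x = [0.0] * len(w)                                  # start from a zero image
    for _ in range(steps):
        grad = [wi - 2 * lam * xi for wi, xi in zip(w, x)]
        x = [xi + lr * gi for xi, gi in zip(x, grad)]   # add, not subtract
    return x

w_cat = [0.5, -1.0, 2.0, 0.0]      # hypothetical weights for one class
image = gradient_ascent(w_cat)
print(class_score(w_cat, image))
```

Without the penalty term the pixel values would grow without bound; with it they settle toward w / (2 * lam).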

Question 37

Q

How do we represent similarity in terms of textures?

Answer

A

Should remove most spatial information
Key ideas revolved around summary statistics
Gram Matrix
- feature correlations

Question 38

Q

We can take the activations of any layer (FC, conv, etc.) and perform _____________

Answer

A

dimensionality reduction
- often to reduce to two dimensions for plotting
- PCA
- t-SNE (most common)
  - non-linear mapping to preserve pair-wise distances
good for visualizing decision boundaries (esp non-linear)

Question 39

Q

What is the power-law region for data effectiveness?

Answer

A

Region where, given sufficient data, generalization error falls along a straight line on a log-log plot of error versus dataset size

Question 40

Q

Modeling Error

Answer

A

Given a NN architecture, actual model that represents the real world may not be in that space. There may be no set of weights that model the real world.

Ie. a simple architecture or function may not be able to model complex reality (potentially low capacity)

Question 41

Q

What can you do to train a CNN if you don’t have enough data?

Answer

A

Transfer Learning -

Train on large-scale dataset and optimize parameters
Take custom data set and initialize the network with weights trained before (step 1)
Replace last layer with new fully-connected layer for output nodes per category
Continue to train on the new dataset: either finetune (update all parameters) or, if there is not enough data, freeze the feature layers and update only the last layer's weights

Question 42

Q

low-labeled setting: few-shot learning

Answer

A

Source
- single labeled
target
- single few-labeled
shift
- semantic

Question 43

Q

Most memory usage is in the ___ layers of a CNN

Answer

A

convolution layers - large output

Question 44

Q

Residual block/ skip connections

Answer

A

Allow information from a layer to propagate to any future layer through an identity mapping (i.e. no transform)

can help with better gradient flow

Question 45

Q

low-labeled setting: domain adaptation

Answer

A

Source
- single labeled
target
- single unlabeled
shift
- non-semantic

Question 46

Q

T/F: Saliency maps use the loss to assess importance of input pixels

Answer

A

False

In practice, saliency maps find gradient of the classifier scores (pre-softmax)
softmax and then loss function adds some complexity (weird effects in terms of the gradient)

Question 47

Q

How to preserve the content of an image

Answer

A

Match features at different layers
Use a loss for this
- optimize image by minimizing the difference between the images (content and generated images)
Multiple losses
- Backward edges going to same node are summed
- Loss is sum of the difference across the identified layers

Question 48

Q

Optimization Error

Answer

A

Optimization algorithm may not be able to find the weights that 100% model the world

Question 49

Q

T/F: We have reached the point in complex CNN architectures where more data is not/barely improving performance

Answer

A

False - The ‘Irreducible Error Region’ has not been reached

Question 50

Q

What does an input pixel affect at the output in convolution?

Answer

A

Neighborhood around it (where part of the kernel touches it)

Question 51

Q

Visualizing Weights for CNN Layers

Answer

A

Fully Connected Layers
- Reshape weights for a node back into size of image, then scale to 0-255
Convolution Layers
- For each kernel, scale values from 0-255 and observe:
  - oriented edges
  - color
  - texture

Question 52

Q

Receptive Field

Answer

A

Defines what set of input pixels in the original image affect the value of a particular node deep in the neural network.
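
The receptive field size of a stack of conv and pooling layers can be computed with a standard recurrence. A sketch, assuming square kernels and no dilation; the layer stack is a made-up example.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride), ordered from input to output.
    Returns the side length of the input patch seen by one output node."""
    rf, jump = 1, 1          # jump = distance in input pixels between neighbors
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# conv3, conv3, pool2, conv3, conv3, pool2
stack = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
sizes = [receptive_field(stack[:n]) for n in range(1, len(stack) + 1)]
print(sizes)
```

The sizes grow with depth, and grow faster after each stride-2 pooling layer.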

Question 53

Q

Where does a kernel pixel affect an output image during the convolution operation?

Answer

A

Everywhere!

The pixels in the kernel stride across the entire input image

Question 54

Q

low-labeled setting: un/self-supervised

Answer

A

Source
- single labeled
target
- many labeled
shift
- both/task

Question 55

Q

For larger networks, optimization error will likely ___ in size

Answer

A

increase - dynamics of optimization could get more difficult with deeper network

Question 56

Q

AlexNet - Architecture

Answer

A

Horizontal split architecture - couldn’t fit into one GPU

conv -> max pool -> norm (x2)

conv x 3 -> max pool

fully connected x3

Question 57

Q

T/F: CNNs do not have rotation invariance

Answer

A

False - They have some

Question 58

Q

A way to increase class scores or activations for an image

Answer

A

Gradient Ascent - optimization of an image to increase score for a particular class

Question 59

Q

Effectiveness of Transfer Learning

Answer

A

Surprisingly effective

Features learned for 1000 object categories will work well for the 1001st!

Generalizes even across tasks (classification to object detection)

Question 60

Q

For larger networks, modeling error will ___ in size

Answer

A

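likely decrease in size - a larger network has more capacity, so its space of functions is more likely to contain a good model of reality.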

Question 61

Q

What was used to show the benefits of Neural Networks?

Answer

A

Large-scale data benchmarking

Question 62

Q

Inception Architecture

Answer

A

Repeated blocks composed of simple layers
parallel filters of different sizes
- 1x1 convolution, 3x3 convolution, 5x5 convolution, 3x3 max pooling -> filter concatenation
- increases computational complexity (4 times)

Question 63

Q

T/F: You need a large amount of pixel changes to make a network confidently wrong

Answer

A

False - Gradient ascent perturbations can make model confidently wrong (adversarial noise)
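
A toy illustration of adversarial noise, assuming a linear classifier where a positive score means the original class. Every pixel moves by only 0.02 on a [0, 1] scale, in the direction given by the sign of the gradient, yet the score flips. The classifier, sizes, and values are all made up for illustration.

```python
def score(w, x, b=0.0):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return (v > 0) - (v < 0)

n = 10000                                                # number of pixels
w = [0.01 if i % 2 == 0 else -0.01 for i in range(n)]    # d(score)/d(pixel)
x = [0.51 if i % 2 == 0 else 0.49 for i in range(n)]     # clean image

eps = 0.02
# Step each pixel slightly against the gradient of the current class score.
x_adv = [xi - eps * sign(wi) for xi, wi in zip(x, w)]

clean, adv = score(w, x), score(w, x_adv)
print(clean, adv)    # about +1 versus about -1
```

The score changes by eps times the sum of |w|, so in high dimensions a tiny per-pixel change adds up to a large shift.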

Question 64

Q

Key elements of practical application of saliency maps

Answer

A

Find gradient of classifier scores (pre soft-max), instead of loss
take absolute value of gradients
sum across channels
- We don’t care specifically about RGB specifics
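
The last two steps can be sketched as follows, assuming the gradient of the pre-softmax class score with respect to each input pixel has already been computed by backprop. The gradient values here are invented.

```python
def saliency_map(score_grad):
    """score_grad: C x H x W nested list of d(class score)/d(pixel).
    Returns an H x W map: absolute value, then sum across channels."""
    c, h, w = len(score_grad), len(score_grad[0]), len(score_grad[0][0])
    return [[sum(abs(score_grad[ch][i][j]) for ch in range(c))
             for j in range(w)] for i in range(h)]

grad_rgb = [
    [[0.1, -0.4], [0.0, 0.2]],     # R
    [[-0.3, 0.1], [0.0, -0.2]],    # G
    [[0.2, 0.0], [0.0, 0.6]],      # B
]
sal = saliency_map(grad_rgb)
```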

Answer 64

A

Visualization of activation/filter
Larger early in the network
Looking at activations across the input
- which images have the highest activation?

Answer 65

A

Each kernel has size of entire input
- Equivalent to Wx+b
- output is one scalar
One kernel per output node

Answer 66

A

Probability distribution over classes for each pixel.

Answer 67

A

Convolutions work on arbitrary input sizes (because of striding)

Answer 68

A

In max-unpooling, contributions from multiple windows are summed.

Answer 69

A

Take each input pixel, multiply by learnable kernel, “stamp” it on output
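
This stamping operation is usually called transposed convolution. A minimal single-channel sketch, with stride 2 and an all-ones kernel chosen only to make the overlaps easy to see:

```python
def transposed_conv2d(x, k, stride=2):
    """Stamp a copy of kernel k, scaled by each input pixel, onto the output.
    Where stamps overlap, their contributions are summed."""
    h, w, k1, k2 = len(x), len(x[0]), len(k), len(k[0])
    out = [[0.0] * ((w - 1) * stride + k2)
           for _ in range((h - 1) * stride + k1)]
    for i in range(h):
        for j in range(w):
            for a in range(k1):
                for b in range(k2):
                    out[i * stride + a][j * stride + b] += x[i][j] * k[a][b]
    return out

x = [[1.0, 2.0],
     [3.0, 4.0]]
k = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0]]
up = transposed_conv2d(x, k, stride=2)      # 2x2 input -> 5x5 output
```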

Answer 70

A

Begin with a pre-trained trunk/backbone (e.g. network pretrained on ImageNet)

Answer 71

A

skip connections

Answer 72

A

Given an image, output a list of bounding boxes with probability distribution over classes per box

Answer 73

A

Variable number of boxes

Need to determine candidate regions (position and scale) first

Answer 74

A

multi-headed
- classification
  - predicting distribution over class labels
- regression
  - predicting bounding box for each image region
both heads share features
jointly optimized (summing gradients)

Answer 75

A

Combining redundant boxes to find bounding box for object in image
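
Assuming this card refers to non-maximum suppression (the question text is missing), a common greedy version looks like the sketch below. Boxes are (x1, y1, x2, y2); the 0.5 threshold and the sample boxes are arbitrary.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much,
    then repeat with the remaining boxes. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```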

Answer 76

A

uses grid idea as anchors
- different scales
- different aspect ratios
tricks used to increase resolution (decrease subsampling ratio)

Answer 77

A

Single-scale

faster for same size than SSD

Answer 78

A

large-scale object detection, segmentation, and captioning dataset

Answer 79

A

For each bounding box, calculate intersection over union (IoU)
- extract intersection over union with closest ground truth
Keep only those with IoU > threshold
Calculate Precision/Recall curve across classification probability threshold
Calculate average precision (AP) over recall of [0, 0.1, 0.2, …, 1.0]
Average over all categories to get mean Average Precision (mAP)
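
A sketch of the 11-point AP computation for one class, assuming each detection has already been marked true or false positive by the IoU threshold step. The function name and the sample detections are illustrative.

```python
def average_precision_11pt(detections, num_gt):
    """detections: list of (confidence, is_true_positive) for one class.
    num_gt: number of ground-truth boxes for that class."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(detections, start=1):
        tp += 1 if is_tp else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    total = 0.0
    for t in range(11):                  # recall levels 0.0, 0.1, ..., 1.0
        level = t / 10
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11

dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
ap = average_precision_11pt(dets, num_gt=2)

# mAP is the plain mean of per-class AP values.
per_class_ap = [ap, average_precision_11pt([(0.9, True), (0.8, True)], 2)]
mean_ap = sum(per_class_ap) / len(per_class_ap)
```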

Answer 80

A

Find regions of interests (ROIs) with object-like things
Classify those regions (refine their bounding boxes)

Answer 81

A

unsupervised (non-learned) algorithms
downsides
- 1+ second per image
- returns thousands of mostly background images
resize each candidate to full input size and classify

Answer 82

A

Takes 1+ second per image
return thousands of (mostly background) boxes

Answer 83

A

Computations for convolutions are re-done for each image patch, even if overlapping

Answer 84

A

Reuse computation by finding regions in feature maps
- feature extraction once per image

Answer 85

A

Variable input size to FC layers due to different feature map sizes

Answer 86

A

ROI Pooling
- Given an arbitrarily-sized feature map, we can use pooling across a grid (ROI Pooling Layer) to convert to fixed-sized representation
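
A minimal sketch of the idea on a single-channel region. The bin boundaries (floor for the start, ceiling for the end, so bins may overlap by a row or column) are one simple choice and an assumption here, not a specific library's behavior.

```python
def ceil_div(a, b):
    return -(-a // b)

def roi_max_pool(region, out_h=2, out_w=2):
    """Max-pool an H x W region over an out_h x out_w grid of bins,
    so any input size maps to the same fixed output size."""
    h, w = len(region), len(region[0])
    pooled = []
    for i in range(out_h):
        r0, r1 = (i * h) // out_h, ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            c0, c1 = (j * w) // out_w, ceil_div((j + 1) * w, out_w)
            row.append(max(region[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled

big = [[1, 2, 3, 4, 5],
       [6, 7, 8, 9, 10],
       [11, 12, 13, 14, 15]]
small = [[4, 1],
         [2, 3]]
pooled_big = roi_max_pool(big)        # 3x5 region -> 2x2
pooled_small = roi_max_pool(small)    # 2x2 region -> 2x2
```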

Answer 87

A

Use Neural Networks for the region proposal
- Region Proposal Network (RPN)
  - output: objectness score
  - top k selected for classification
  - complexity in implementation due to some non differentiable parts (gradient with respect to bounding box coordinates)

Answer 88

A

Neural Network model to find regions of objects
Uses anchors in a grid
- k anchor boxes
  - various sizes and shapes
    - hyperparameters
- 2k scores
  - object or not-object like
- 4k coordinates
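
A sketch of anchor generation over a grid, to show where k, 2k, and 4k come from. The scales, aspect ratios, stride, and grid size are illustrative hyperparameters, not values from the notes.

```python
def make_anchors(grid_h, grid_w, stride, scales, ratios):
    """Return (x1, y1, x2, y2) anchors: one per (cell, scale, ratio)."""
    anchors = []
    for gy in range(grid_h):
        for gx in range(grid_w):
            cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride   # cell center
            for s in scales:
                for r in ratios:                # r = width / height
                    w, h = s * r ** 0.5, s / r ** 0.5           # area stays s*s
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

scales, ratios = [64, 128], [0.5, 1.0, 2.0]
k = len(scales) * len(ratios)           # anchors per grid cell
anchors = make_anchors(2, 3, 16, scales, ratios)

scores_per_cell = 2 * k                 # object / not-object per anchor
coords_per_cell = 4 * k                 # box coordinates per anchor
```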

Answer 89

A

Two-stage object detection methods are slower but more accurate
