AI Flashcards
Given a convolution layer with 3 input channels, 64 output channels, kernel size 4x4, stride 2, dilation 3, and padding 1, what is the parameter size of this convolution layer?
3x64x4x4
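A minimal PyTorch sketch (assuming the default bias=True): the weight tensor has shape (out_channels, in_channels, kH, kW), and stride, dilation, and padding add no parameters.

import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=4,
                 stride=2, dilation=3, padding=1)
print(conv.weight.shape)    # torch.Size([64, 3, 4, 4])
print(conv.weight.numel())  # 3072 = 3 * 64 * 4 * 4 weight parameters
print(conv.bias.numel())    # 64 additional bias parameters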
In PyTorch (import torch.nn as nn), which of the following layers downsamples the input size to half?
nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=2, padding=1, dilation=1)
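A quick shape check on a hypothetical 32x32 input shows the halving: output size = floor((H + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1 = floor((H - 1) / 2) + 1 = H/2 for even H.

import torch
import torch.nn as nn

layer = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3,
                  stride=2, padding=1, dilation=1)
x = torch.randn(1, 3, 32, 32)   # hypothetical input
print(layer(x).shape)           # torch.Size([1, 64, 16, 16])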
Which of the following statements is true about convolution layers?
A convolution layer is linear, and it is often used together with an activation function.
In the design of an auto-encoder, the encoder and decoder should follow the exact same structure.
FALSE
All regularizations (e.g., L1 norm, L2 norm) penalize larger parameters.
TRUE
When updating parameters using gradient descent, which way of calculating the loss works better (i.e., gives a better trade-off between efficiency and robustness)?
Calculate the loss for a mini-batch of data examples in every iteration
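A minimal sketch of the mini-batch approach, using a hypothetical toy model and dataset: the loss and gradient are computed on a small batch per iteration, which is cheaper than full-batch updates and less noisy than single-example updates.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(512, 10), torch.randn(512, 1)   # hypothetical dataset
batch_size = 32
for i in range(0, len(x), batch_size):             # one epoch of mini-batches
    xb, yb = x[i:i + batch_size], y[i:i + batch_size]
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)                  # loss over the mini-batch only
    loss.backward()
    optimizer.step()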
MaxPooling preserves detected features and downsamples the feature map (image).
TRUE
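A minimal sketch: MaxPool2d keeps the strongest activation in each 2x2 window (preserving the detected feature) while halving the spatial size.

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
feature_map = torch.randn(1, 64, 28, 28)   # hypothetical feature map
print(pool(feature_map).shape)             # torch.Size([1, 64, 14, 14])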
What is the size of the receptive field for two stacked dilated convolution layers with kernel size 3x3, stride 1, and dilation 2?
9x9
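The arithmetic, sketched in a few lines: the effective kernel size is dilation * (k - 1) + 1 = 5, and with stride 1 each stacked layer adds (5 - 1) to the receptive field, giving 1 + 4 + 4 = 9.

k, dilation = 3, 2
k_eff = dilation * (k - 1) + 1   # effective kernel size: 5
rf = 1
for _ in range(2):               # two stacked layers, stride 1 everywhere
    rf += k_eff - 1              # each layer extends the field by 4
print(rf)                        # 9  -> receptive field is 9x9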
In a CNN, two conv layers cannot be connected directly; a pooling layer must be used between them.
FALSE
In the design of a CNN, a fully connected layer usually contains many more parameters than the conv layers.
TRUE
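A sketch with hypothetical shapes: a fully connected layer mapping a flattened 64x7x7 feature map to 4096 units has far more parameters than a typical 3x3 conv layer.

import torch.nn as nn

fc = nn.Linear(64 * 7 * 7, 4096)
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)

print(sum(p.numel() for p in fc.parameters()))    # 12,849,152
print(sum(p.numel() for p in conv.parameters()))  # 73,856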
What is the purpose of the ReLU activation function in a CNN?
To introduce non-linearity
What is the main advantage of using dropout in a CNN?
Preventing overfitting
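A minimal sketch: during training, Dropout randomly zeroes activations so the network cannot rely on any single unit, which reduces overfitting; it is automatically disabled in eval mode.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)
drop.train()
print(drop(x))   # roughly half the entries zeroed, survivors scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))   # identity at evaluation time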
In mini-batch SGD training, an important practice is to shuffle the training data before every epoch. Why?
It helps the training converge faster and prevents bias.
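A minimal sketch with hypothetical tensors: DataLoader with shuffle=True reshuffles the dataset at the start of every epoch, so mini-batch composition differs between epochs.

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)
for epoch in range(2):
    for xb, yb in loader:   # batches are drawn in a new random order each epoch
        pass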
Logistic regression is widely used to solve classification problems by predicting probabilities of discrete (or categorical) values.
TRUE
Which of the following statements is true about activation functions in the context of neural networks and backpropagation?
Activation functions like ReLU (Rectified Linear Unit) introduce non-linear properties to the model, allowing it to learn complex patterns.
Which case is overfitting?
Training error is low, but testing error is high
What approach could be used to handle overfitting?
Use regularization
Besides penalizing larger parameters, which regularization makes parameters more sparse?
L1 norm
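A minimal sketch (hypothetical model and batch): the L1 penalty is the sum of absolute parameter values added to the task loss; its constant-magnitude gradient pushes many weights exactly to zero, unlike L2, which only shrinks them.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)   # hypothetical batch

l1_lambda = 1e-3
task_loss = nn.functional.mse_loss(model(x), y)
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = task_loss + l1_lambda * l1_penalty
loss.backward()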
In backpropagation, which claim is true?
The backward pass uses the information preserved in the forward pass to calculate gradients.
As an activation function, tanh avoids the vanishing gradient problem.
FALSE
As an activation function, ReLU solves the vanishing gradient problem.
TRUE
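A minimal sketch of why: tanh saturates, so its gradient shrinks toward zero for large inputs and shrinks further as layers are stacked, while ReLU's gradient is exactly 1 for any positive input.

import torch

x = torch.tensor([0.5, 3.0, 10.0], requires_grad=True)
torch.tanh(x).sum().backward()
print(x.grad)                   # ~[0.786, 0.0099, 0.0000] (vanishing for large inputs)

x = torch.tensor([0.5, 3.0, 10.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)                   # [1., 1., 1.] (gradient not attenuated)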
About SGD optimization, which statement is not correct?
Randomly initializing the parameters will affect the performance.
In reinforcement learning, what is the benefit of using a network instead of a lookup table?
Generalization
Which approach do we usually use to train an autoencoder model?
We usually train the encoder model and the decoder model together.
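A minimal sketch with hypothetical layer sizes: the encoder and decoder are composed into one model and optimized jointly on the reconstruction loss.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())
autoencoder = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
x = torch.rand(32, 784)                            # hypothetical input batch
loss = nn.functional.mse_loss(autoencoder(x), x)   # reconstruction loss
loss.backward()
optimizer.step()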