hoai_exam2024_retry02 Flashcards
(40 cards)
Correct answers: b and c.
b. … is, e.g., a polynomial of degree 1.
Linear regression models describe relationships as linear equations, which are polynomials of degree 1 (e.g., y = wx + b).
c. … has a closed-form solution.
Linear regression can be solved in closed form via the Normal Equation, θ = (XᵀX)⁻¹Xᵀy, provided the design matrix X has full column rank (so that XᵀX is invertible).
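A minimal sketch of the closed-form solution on a toy dataset, using PyTorch's linear-algebra routines (the data and variable names are illustrative):

```python
import torch

# Toy data: 100 samples, 3 features (illustrative values)
X = torch.randn(100, 3)
true_w = torch.tensor([[2.0], [-1.0], [0.5]])
y = X @ true_w + 0.1 * torch.randn(100, 1)

# Prepend a bias column of ones to form the design matrix
X_b = torch.cat([torch.ones(100, 1), X], dim=1)

# Normal Equation: theta = (X^T X)^{-1} X^T y
# (requires X_b to have full column rank so that X^T X is invertible)
theta = torch.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta)  # first entry ≈ bias (~0), remaining entries ≈ true_w
```

In practice `torch.linalg.lstsq` is numerically preferable, but the explicit inverse mirrors the Normal Equation as written above.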
Correct answer: c
c. … may increase the validation performance.
Dropout is a regularization technique that helps prevent overfitting by randomly dropping out neurons during training. This can lead to improved generalization and better validation performance.
Explanation of other options:
* a. … is only used during validation time.
Incorrect: Dropout is applied only during training, not during validation or testing. At validation/test time all neurons are active; the rescaling of activations is handled either at test time (classic dropout) or already during training (inverted dropout, as in PyTorch).
* b. … is always used during training as well as validation time.
Incorrect: Dropout is applied only during training, not during validation/testing.
* d. … increases the validation loss in order to decrease the training loss.
Incorrect: It is the other way around. Dropout typically increases the training loss by injecting noise during training, which makes the model more robust and often decreases the validation loss.
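A minimal sketch of this train/eval behavior in PyTorch (layer sizes are arbitrary); note that PyTorch uses inverted dropout, so rescaling happens during training and no scaling is needed at evaluation time:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # zeroes ~50% of activations, but only in training mode
    nn.Linear(32, 1),
)
x = torch.randn(4, 10)

model.train()                               # dropout active
print(torch.allclose(model(x), model(x)))   # usually False: different neurons dropped

model.eval()                                # dropout disabled: all neurons used
print(torch.allclose(model(x), model(x)))   # True: deterministic output
```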
Correct answers: a and b.
a. … dynamically control the learning rate.
Learning rate schedules adjust the learning rate during training based on a predefined strategy, such as reducing it after certain epochs or when validation performance plateaus.
b. … might improve the speed at which the network learns.
By reducing the learning rate at the right time, learning rate schedules can help the network converge faster and more effectively to a good solution.
Explanation of other options:
* c. … guarantee to find the global minimum of the loss.
Incorrect: Learning rate schedules improve optimization but do not guarantee finding the global minimum, especially in non-convex loss surfaces like those in neural networks.
* d. … cannot be applied in fully-connected neural networks.
Incorrect: Learning rate schedules can be applied to any type of neural network, including fully connected networks.
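As a concrete illustration, a minimal PyTorch sketch using `StepLR` (model, data, and schedule parameters are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy loss stands in for real training
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
    if epoch % 10 == 0:
        print(epoch, scheduler.get_last_lr())  # 0.1 -> 0.05 -> 0.025
```

`ReduceLROnPlateau` is the variant that reacts to a stalled validation metric instead of a fixed epoch count.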
Correct answers: b and d.
b. … allow to create deeper neural networks while maintaining trainability.
Residual connections help address the vanishing gradient problem by allowing gradients to flow more easily through the network, making it feasible to train much deeper networks.
d. … create shortcuts for gradients.
Residual connections provide shortcut paths that bypass one or more layers, enabling gradients to flow directly back during backpropagation, thus improving gradient propagation.
Explanation of other options:
* a. … can only be used in convolution neural networks.
Incorrect: Residual connections can be used in various types of neural networks, not just convolutional ones. They are a general concept applicable wherever a deep architecture is used.
* c. … reduce the spatial size of the input.
Incorrect: Residual connections do not alter the spatial size of the input. Instead, they add the input back to the output of a layer (or block).
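A minimal residual block sketch in PyTorch (channel counts are arbitrary), showing both the gradient shortcut and the fact that the spatial size is preserved:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x, so shapes must match."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # the `+ x` shortcut lets gradients bypass the convs

block = ResidualBlock(16)
x = torch.randn(1, 16, 8, 8)
print(block(x).shape)  # torch.Size([1, 16, 8, 8]): spatial size unchanged
```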
Correct answers: b, c, and d.
b. Automatic differentiation.
Frameworks like PyTorch provide automatic differentiation, which simplifies calculating gradients for backpropagation during neural network training.
c. Easy switching of computations between CPU and GPU.
PyTorch makes it simple to move tensors and computations between CPU and GPU by using .to(device) or .cuda() methods.
d. Straightforward construction of neural networks.
PyTorch’s dynamic computation graph and torch.nn module allow for an intuitive and flexible way to build and experiment with neural networks.
Explanation of the incorrect option:
* a. Developed without any influence by industry.
Incorrect: PyTorch was developed by Facebook's AI Research lab (FAIR), so it was shaped by industry from the start; that influence has contributed to its practical design and wide adoption.
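A short sketch touching all three correct points (autograd, device switching, and building networks with `torch.nn`); the CUDA branch only runs if a GPU is available:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# b) Automatic differentiation: gradients are computed for you
x = torch.tensor([2.0, 3.0], requires_grad=True)
(x ** 2).sum().backward()
print(x.grad)  # tensor([4., 6.]), i.e. d/dx of x^2 is 2x

# c) Switching between CPU and GPU is a single call
model = nn.Linear(2, 1).to(device)   # d) networks built from torch.nn modules
y = model(x.detach().to(device))
```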
Correct answers: a and c.
a. A stride of 2 affects the amount of performed convolution calculations.
A stride of 2 means the kernel is applied only at every second position, so fewer convolution calculations are performed than with a stride of 1, and the output becomes correspondingly smaller.
c. The kernel is moved across the input by a step size of 2.
A stride of 2 means that the convolutional kernel moves across the input feature map with a step size of 2 in both the horizontal and vertical directions.
Explanation of the incorrect options:
* b. A stride of 2 is the same as max pooling with a size of 2.
Incorrect: While both operations reduce the spatial dimensions of the input, they are not equivalent. Max pooling simply takes the maximum value in each region, whereas a stride-2 convolution still computes a learned weighted sum over each region.
* d. A stride of 2 is the maximum possible value.
Incorrect: The stride can be any positive integer. Values of 1 or 2 are the most common, but larger strides (e.g., 3 or 4) are also used in some architectures.
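To make the difference concrete, a small sketch comparing output shapes (the 8×8 input size is arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)  # one single-channel 8x8 input

conv_s1 = nn.Conv2d(1, 1, kernel_size=3, stride=1)
conv_s2 = nn.Conv2d(1, 1, kernel_size=3, stride=2)
pool = nn.MaxPool2d(kernel_size=2)

print(conv_s1(x).shape)  # torch.Size([1, 1, 6, 6]): kernel applied at every position
print(conv_s2(x).shape)  # torch.Size([1, 1, 3, 3]): roughly 4x fewer kernel applications
print(pool(x).shape)     # torch.Size([1, 1, 4, 4]): pooling takes a max, learns nothing
```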