week 5 - chatgpt Flashcards
What is the main advantage of using deep neural networks over shallow ones?
Deep networks can represent complex functions with far fewer parameters than shallow ones by composing hierarchical features (e.g., edges, then parts, then objects), making them more parameter-efficient for complex tasks.
What is the vanishing gradient problem in deep networks?
As gradients are backpropagated through many layers, they can become very small, preventing early layers from learning effectively.
What is the exploding gradient problem in deep networks?
Gradients can grow exponentially as they are backpropagated, leading to unstable updates and divergence during training.
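A minimal numpy sketch of both problems (layer count and weight scales are illustrative, not from the cards): backpropagation multiplies the gradient by one factor per layer, so repeated factors below 1 shrink it toward zero and factors above 1 blow it up.

```python
import numpy as np

def backprop_gradient_norm(num_layers, weight_scale, seed=0):
    """Push a gradient back through random linear layers and return its norm."""
    rng = np.random.default_rng(seed)
    grad = np.ones(10)
    for _ in range(num_layers):
        W = weight_scale * rng.standard_normal((10, 10))
        grad = W.T @ grad                      # chain rule: one matrix factor per layer
    return np.linalg.norm(grad)

print(backprop_gradient_norm(50, weight_scale=0.05))  # ~0: vanishing
print(backprop_gradient_norm(50, weight_scale=1.0))   # astronomically large: exploding
```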
How does the ReLU activation function help with vanishing gradients?
ReLU's derivative is exactly 1 for positive inputs, so gradients flowing through active units are not repeatedly scaled down during backpropagation.
What is the difference between ReLU, LReLU, and PReLU?
ReLU outputs 0 for negative inputs; Leaky ReLU (LReLU) instead uses a small fixed slope for negative inputs; PReLU learns that slope from data.
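A small numpy sketch of the three variants (the 0.01 leak and the PReLU `alpha` value are illustrative, not from the cards):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                 # negatives clipped to 0

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)      # small fixed negative slope

def prelu(x, alpha):
    return np.where(x > 0, x, alpha * x)      # alpha is learned during training

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x), prelu(x, alpha=0.25))
```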
Why is weight initialization important in deep networks?
Proper initialization (e.g., Xavier or He) maintains variance across layers, avoiding vanishing or exploding activations and gradients.
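A minimal numpy sketch of the variance-preserving idea behind He initialization (layer sizes are made up; Xavier differs only in the scaling factor):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 512, 256

# He: Var(W) = 2 / fan_in compensates for ReLU zeroing half the pre-activations;
# Xavier uses 2 / (fan_in + fan_out) instead.
W_he = rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

x = rng.standard_normal(fan_in)
h = np.maximum(0.0, W_he @ x)          # one ReLU layer
print(np.mean(x**2), np.mean(h**2))    # second moments stay on the same scale
```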
What is the purpose of batch normalization?
It standardizes layer inputs to have zero mean and unit variance, which stabilizes and speeds up training.
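A minimal numpy sketch of the normalization step (the learnable scale `gamma`, shift `beta`, and epsilon are standard but assumed here):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature over the batch, then rescale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

batch = np.random.default_rng(0).normal(5.0, 3.0, size=(32, 4))
normed = batch_norm(batch)
print(normed.mean(axis=0).round(3), normed.std(axis=0).round(3))
```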
How do skip connections help train deep networks?
They allow gradients to bypass some layers, mitigating the vanishing gradient problem and enabling very deep architectures.
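A minimal sketch of a residual (skip) connection, assuming a toy two-layer block in numpy with made-up weights: because the output is x + F(x), the derivative always contains an identity term, so the gradient has a direct path around F.

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = x + F(x): the identity path lets gradients bypass the inner layers."""
    h = np.maximum(0.0, W1 @ x)    # inner transformation F with a ReLU
    return x + W2 @ h              # skip connection adds the input back

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
W1 = rng.standard_normal((d, d)) * 0.1
W2 = rng.standard_normal((d, d)) * 0.1
print(residual_block(x, W1, W2))
```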
What is the main idea behind convolutional neural networks (CNNs)?
CNNs use local filters and weight sharing to detect spatial hierarchies in data, such as edges, textures, and shapes in images.
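A minimal numpy sketch of one 3x3 filter slid over an image, showing the local receptive field and weight sharing (the image size and edge-detecting filter are illustrative):

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide one kernel over the image; the same weights are reused at every position."""
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k, j:j + k]       # local receptive field
            out[i, j] = np.sum(patch * kernel)    # shared weights at every location
    return out

image = np.random.default_rng(0).random((8, 8))
edge_filter = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)  # vertical edges
print(conv2d_single(image, edge_filter).shape)    # (6, 6)
```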
What role do pooling layers play in CNNs?
Pooling layers reduce spatial dimensions and provide a degree of local translation invariance, helping the network generalize across small input variations.
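A minimal numpy sketch of 2x2 max pooling over non-overlapping windows, which halves each spatial dimension:

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 window."""
    H, W = x.shape
    return x[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))   # 2x2 output; small shifts of the input often leave it unchanged
```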
What is the purpose of 1x1 convolutions in CNNs?
They are used to reduce or expand the number of channels without affecting spatial dimensions, aiding computational efficiency.
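A minimal numpy sketch of a 1x1 convolution as a per-pixel linear map across channels (the channel counts are illustrative): spatial size is unchanged, only the channel dimension shrinks.

```python
import numpy as np

def conv1x1(x, W):
    """x: (C_in, H, W) feature map; W: (C_out, C_in). Mixes channels only."""
    return np.einsum('oc,chw->ohw', W, x)    # same spatial size, new channel count

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 16, 16))        # 64 channels, 16x16 spatial
W = rng.standard_normal((16, 64)) * 0.1      # reduce to 16 channels
print(conv1x1(x, W).shape)                   # (16, 16, 16)
```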
What is dropout and why is it used?
Dropout randomly deactivates neurons during training, forcing redundancy and improving generalization by reducing overfitting.
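A minimal numpy sketch of (inverted) dropout at training time, assuming a drop probability of 0.5:

```python
import numpy as np

def dropout(x, p, rng):
    """Zero each unit with probability p; rescale survivors so the expected activation is unchanged."""
    mask = rng.random(x.shape) >= p          # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)              # inverted dropout: nothing to undo at test time

rng = np.random.default_rng(0)
print(dropout(np.ones(10), p=0.5, rng=rng))  # roughly half zeros, survivors scaled to 2.0
```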